* [PATCH 0/8] drm/i915/mtl: Define MOCS and PAT tables for MTL
From: fei.yang @ 2023-04-19 23:00 UTC
  To: intel-gfx; +Cc: Fei Yang, dri-devel

From: Fei Yang <fei.yang@intel.com>

This series includes the patches needed to enable MTL. It also adds
a new extension to the GEM_CREATE uAPI to let user space set the
cache policy for buffer objects.
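
For illustration, here is a minimal user-space sketch of how the new
extension is meant to be used. The struct and macro names below come
from the final patch of the series and should be read as assumptions
in the context of this cover letter:

    /* Hypothetical usage sketch; error handling omitted */
    struct drm_i915_gem_create_ext_set_pat set_pat = {
            .base = { .name = I915_GEM_CREATE_EXT_SET_PAT },
            .pat_index = 2, /* e.g. uncached on MTL */
    };
    struct drm_i915_gem_create_ext create = {
            .size = 4096,
            .extensions = (__u64)(uintptr_t)&set_pat,
    };
    int ret = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create);
    /* on success, create.handle is a BO with the requested policy */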

v2: address review comments and checkpatch warnings
v3: make mtl_ggtt_pte_encode static

Fei Yang (7):
  drm/i915/mtl: Set has_llc=0
  drm/i915/mtl: Add PTE encode function
  drm/i915/mtl: workaround coherency issue for Media
  drm/i915/mtl: end support for set caching ioctl
  drm/i915: preparation for using PAT index
  drm/i915: use pat_index instead of cache_level
  drm/i915: Allow user to set cache at BO creation

Madhumitha Tolakanahalli Pradeep (1):
  drm/i915/mtl: Define MOCS and PAT tables for MTL

 drivers/gpu/drm/i915/display/intel_dpt.c      | 14 ++--
 drivers/gpu/drm/i915/gem/i915_gem_create.c    | 36 ++++++++
 drivers/gpu/drm/i915/gem/i915_gem_domain.c    | 30 +++----
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 10 ++-
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.c    | 67 ++++++++++++++-
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |  8 ++
 .../gpu/drm/i915/gem/i915_gem_object_types.h  | 26 +++++-
 drivers/gpu/drm/i915/gem/i915_gem_pages.c     |  5 +-
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c     |  9 +-
 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c  |  2 -
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |  4 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  | 16 ++--
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
 .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c    |  2 +-
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c          | 10 ++-
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 76 ++++++++++++-----
 drivers/gpu/drm/i915/gt/gen8_ppgtt.h          |  3 +-
 drivers/gpu/drm/i915/gt/intel_ggtt.c          | 84 +++++++++++++------
 drivers/gpu/drm/i915/gt/intel_gt_regs.h       |  6 +-
 drivers/gpu/drm/i915/gt/intel_gtt.c           | 47 ++++++++++-
 drivers/gpu/drm/i915/gt/intel_gtt.h           | 38 ++++++---
 drivers/gpu/drm/i915/gt/intel_migrate.c       | 47 ++++++-----
 drivers/gpu/drm/i915/gt/intel_migrate.h       | 13 ++-
 drivers/gpu/drm/i915/gt/intel_mocs.c          | 76 ++++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  6 +-
 drivers/gpu/drm/i915/gt/selftest_migrate.c    | 47 ++++++-----
 drivers/gpu/drm/i915/gt/selftest_mocs.c       |  2 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c      |  8 +-
 drivers/gpu/drm/i915/gt/selftest_timeline.c   |  2 +-
 drivers/gpu/drm/i915/gt/selftest_tlb.c        |  4 +-
 drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c     | 13 +++
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  7 ++
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  6 ++
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      | 10 ++-
 drivers/gpu/drm/i915/i915_debugfs.c           | 55 +++++++++---
 drivers/gpu/drm/i915/i915_gem.c               | 16 +++-
 drivers/gpu/drm/i915/i915_gpu_error.c         |  8 +-
 drivers/gpu/drm/i915/i915_pci.c               | 76 +++++++++++++++--
 drivers/gpu/drm/i915/i915_vma.c               | 16 ++--
 drivers/gpu/drm/i915/i915_vma.h               |  2 +-
 drivers/gpu/drm/i915/i915_vma_types.h         |  2 -
 drivers/gpu/drm/i915/intel_device_info.h      |  5 ++
 drivers/gpu/drm/i915/selftests/i915_gem.c     |  5 +-
 .../gpu/drm/i915/selftests/i915_gem_evict.c   |  4 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 15 ++--
 .../drm/i915/selftests/intel_memory_region.c  |  4 +-
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  9 ++
 drivers/gpu/drm/i915/selftests/mock_gtt.c     |  8 +-
 include/uapi/drm/i915_drm.h                   | 36 ++++++++
 tools/include/uapi/drm/i915_drm.h             | 36 ++++++++
 52 files changed, 812 insertions(+), 226 deletions(-)

-- 
2.25.1



* [PATCH 1/8] drm/i915/mtl: Set has_llc=0
From: fei.yang @ 2023-04-19 23:00 UTC
  To: intel-gfx; +Cc: Andrzej Hajda, Nirmoy Das, Fei Yang, dri-devel, Andi Shyti

From: Fei Yang <fei.yang@intel.com>

On MTL, the LLC is not shared between the GT and the CPU, so set
has_llc=0.
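
For context (an illustrative note, not part of the patch), HAS_LLC()
is just the accessor for the device-info flag cleared here:

    /* i915_drv.h (assumed definition for this kernel version) */
    #define HAS_LLC(i915) (INTEL_INFO(i915)->has_llc)

so any path that keys coherency decisions off HAS_LLC() will now
treat MTL as a platform whose GT cannot allocate into the CPU's LLC.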

Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
---
 drivers/gpu/drm/i915/i915_pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index d64e074d7457..272a8ba37b64 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1147,6 +1147,7 @@ static const struct intel_device_info mtl_info = {
 	.has_flat_ccs = 0,
 	.has_gmd_id = 1,
 	.has_guc_deprivilege = 1,
+	.has_llc = 0,
 	.has_mslice_steering = 0,
 	.has_snoop = 1,
 	.__runtime.memory_regions = REGION_SMEM | REGION_STOLEN_LMEM,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [Intel-gfx] [PATCH 1/8] drm/i915/mtl: Set has_llc=0
@ 2023-04-19 23:00   ` fei.yang
  0 siblings, 0 replies; 76+ messages in thread
From: fei.yang @ 2023-04-19 23:00 UTC (permalink / raw)
  To: intel-gfx; +Cc: Andrzej Hajda, Nirmoy Das, dri-devel

From: Fei Yang <fei.yang@intel.com>

On MTL, LLC is not shared between GT and CPU, set has_llc=0.

Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
---
 drivers/gpu/drm/i915/i915_pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index d64e074d7457..272a8ba37b64 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1147,6 +1147,7 @@ static const struct intel_device_info mtl_info = {
 	.has_flat_ccs = 0,
 	.has_gmd_id = 1,
 	.has_guc_deprivilege = 1,
+	.has_llc = 0,
 	.has_mslice_steering = 0,
 	.has_snoop = 1,
 	.__runtime.memory_regions = REGION_SMEM | REGION_STOLEN_LMEM,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 2/8] drm/i915/mtl: Define MOCS and PAT tables for MTL
From: fei.yang @ 2023-04-19 23:00 UTC
  To: intel-gfx
  Cc: Andi Shyti, Lucas De Marchi, dri-devel,
	Madhumitha Tolakanahalli Pradeep, Andrzej Hajda, Matt Roper,
	Nirmoy Das

From: Madhumitha Tolakanahalli Pradeep <madhumitha.tolakanahalli.pradeep@intel.com>

On MTL, the GT can no longer allocate on the LLC - only the CPU can.
This, together with the addition of L4 cache support, calls for a
MOCS/PAT table update.
Also, the PAT index registers are multicast for the primary GT, and
there is an address jump from index 7 to 8. This patch makes sure
these registers are programmed the proper way.
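
To make that jump concrete, here is how the new _PAT_INDEX() macro is
expected to expand (a derived illustration based on the
_PICK_EVEN_2RANGES() semantics, not part of the patch):

    /*
     * _PICK_EVEN_2RANGES(index, 8, 0x4800, 0x4804, 0x4848, 0x484c):
     * indices 0-7 land in the first register range, 8+ in the second.
     *
     *   _PAT_INDEX(0) -> 0x4800
     *   _PAT_INDEX(7) -> 0x481c  (last register before the jump)
     *   _PAT_INDEX(8) -> 0x4848  (the address jump)
     *   _PAT_INDEX(9) -> 0x484c
     */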

BSpec: 44509, 45101, 44235

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Madhumitha Tolakanahalli Pradeep <madhumitha.tolakanahalli.pradeep@intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_regs.h |  6 +-
 drivers/gpu/drm/i915/gt/intel_gtt.c     | 47 ++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_gtt.h     | 20 ++++++-
 drivers/gpu/drm/i915/gt/intel_mocs.c    | 76 +++++++++++++++++++++++--
 drivers/gpu/drm/i915/gt/selftest_mocs.c |  2 +-
 5 files changed, 143 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index fd1f9cd35e9d..e8c3b762a92a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -356,7 +356,11 @@
 #define GEN7_TLB_RD_ADDR			_MMIO(0x4700)
 
 #define GEN12_PAT_INDEX(index)			_MMIO(0x4800 + (index) * 4)
-#define XEHP_PAT_INDEX(index)			MCR_REG(0x4800 + (index) * 4)
+#define _PAT_INDEX(index)			_PICK_EVEN_2RANGES(index, 8, \
+								   0x4800, 0x4804, \
+								   0x4848, 0x484c)
+#define XEHP_PAT_INDEX(index)			MCR_REG(_PAT_INDEX(index))
+#define XELPMP_PAT_INDEX(index)			_MMIO(_PAT_INDEX(index))
 
 #define XEHP_TILE0_ADDR_RANGE			MCR_REG(0x4900)
 #define   XEHP_TILE_LMEM_RANGE_SHIFT		8
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 4f436ba7a3c8..2f6a9be0ffe6 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -468,6 +468,44 @@ void gtt_write_workarounds(struct intel_gt *gt)
 	}
 }
 
+static void xelpmp_setup_private_ppat(struct intel_uncore *uncore)
+{
+	intel_uncore_write(uncore, XELPMP_PAT_INDEX(0),
+			   MTL_PPAT_L4_0_WB);
+	intel_uncore_write(uncore, XELPMP_PAT_INDEX(1),
+			   MTL_PPAT_L4_1_WT);
+	intel_uncore_write(uncore, XELPMP_PAT_INDEX(2),
+			   MTL_PPAT_L4_3_UC);
+	intel_uncore_write(uncore, XELPMP_PAT_INDEX(3),
+			   MTL_PPAT_L4_0_WB | MTL_2_COH_1W);
+	intel_uncore_write(uncore, XELPMP_PAT_INDEX(4),
+			   MTL_PPAT_L4_0_WB | MTL_3_COH_2W);
+
+	/*
+	 * Remaining PAT entries are left at the hardware-default
+	 * fully-cached setting
+	 */
+}
+
+static void xelpg_setup_private_ppat(struct intel_gt *gt)
+{
+	intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(0),
+				     MTL_PPAT_L4_0_WB);
+	intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(1),
+				     MTL_PPAT_L4_1_WT);
+	intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(2),
+				     MTL_PPAT_L4_3_UC);
+	intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(3),
+				     MTL_PPAT_L4_0_WB | MTL_2_COH_1W);
+	intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(4),
+				     MTL_PPAT_L4_0_WB | MTL_3_COH_2W);
+
+	/*
+	 * Remaining PAT entries are left at the hardware-default
+	 * fully-cached setting
+	 */
+}
+
 static void tgl_setup_private_ppat(struct intel_uncore *uncore)
 {
 	/* TGL doesn't support LLC or AGE settings */
@@ -603,7 +641,14 @@ void setup_private_pat(struct intel_gt *gt)
 
 	GEM_BUG_ON(GRAPHICS_VER(i915) < 8);
 
-	if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
+	if (gt->type == GT_MEDIA) {
+		xelpmp_setup_private_ppat(gt->uncore);
+		return;
+	}
+
+	if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70))
+		xelpg_setup_private_ppat(gt);
+	else if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
 		xehp_setup_private_ppat(gt);
 	else if (GRAPHICS_VER(i915) >= 12)
 		tgl_setup_private_ppat(uncore);
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 69ce55f517f5..854ec09fd588 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -88,9 +88,18 @@ typedef u64 gen8_pte_t;
 #define BYT_PTE_SNOOPED_BY_CPU_CACHES	REG_BIT(2)
 #define BYT_PTE_WRITEABLE		REG_BIT(1)
 
+#define MTL_PPGTT_PTE_PAT3	BIT_ULL(62)
 #define GEN12_PPGTT_PTE_LM	BIT_ULL(11)
+#define GEN12_PPGTT_PTE_PAT2	BIT_ULL(7)
+#define GEN12_PPGTT_PTE_NC	BIT_ULL(5)
+#define GEN12_PPGTT_PTE_PAT1	BIT_ULL(4)
+#define GEN12_PPGTT_PTE_PAT0	BIT_ULL(3)
 
-#define GEN12_GGTT_PTE_LM	BIT_ULL(1)
+#define GEN12_GGTT_PTE_LM		BIT_ULL(1)
+#define MTL_GGTT_PTE_PAT0		BIT_ULL(52)
+#define MTL_GGTT_PTE_PAT1		BIT_ULL(53)
+#define GEN12_GGTT_PTE_ADDR_MASK	GENMASK_ULL(45, 12)
+#define MTL_GGTT_PTE_PAT_MASK		GENMASK_ULL(53, 52)
 
 #define GEN12_PDE_64K BIT(6)
 #define GEN12_PTE_PS64 BIT(8)
@@ -147,6 +156,15 @@ typedef u64 gen8_pte_t;
 #define GEN8_PDE_IPS_64K BIT(11)
 #define GEN8_PDE_PS_2M   BIT(7)
 
+#define MTL_PPAT_L4_CACHE_POLICY_MASK	REG_GENMASK(3, 2)
+#define MTL_PAT_INDEX_COH_MODE_MASK	REG_GENMASK(1, 0)
+#define MTL_PPAT_L4_3_UC	REG_FIELD_PREP(MTL_PPAT_L4_CACHE_POLICY_MASK, 3)
+#define MTL_PPAT_L4_1_WT	REG_FIELD_PREP(MTL_PPAT_L4_CACHE_POLICY_MASK, 1)
+#define MTL_PPAT_L4_0_WB	REG_FIELD_PREP(MTL_PPAT_L4_CACHE_POLICY_MASK, 0)
+#define MTL_3_COH_2W	REG_FIELD_PREP(MTL_PAT_INDEX_COH_MODE_MASK, 3)
+#define MTL_2_COH_1W	REG_FIELD_PREP(MTL_PAT_INDEX_COH_MODE_MASK, 2)
+#define MTL_0_COH_NON	REG_FIELD_PREP(MTL_PAT_INDEX_COH_MODE_MASK, 0)
+
 enum i915_cache_level;
 
 struct drm_i915_gem_object;
diff --git a/drivers/gpu/drm/i915/gt/intel_mocs.c b/drivers/gpu/drm/i915/gt/intel_mocs.c
index 69b489e8dfed..89570f137b2c 100644
--- a/drivers/gpu/drm/i915/gt/intel_mocs.c
+++ b/drivers/gpu/drm/i915/gt/intel_mocs.c
@@ -40,6 +40,10 @@ struct drm_i915_mocs_table {
 #define LE_COS(value)		((value) << 15)
 #define LE_SSE(value)		((value) << 17)
 
+/* Defines for the tables (GLOB_MOCS_0 - GLOB_MOCS_16) */
+#define _L4_CACHEABILITY(value)	((value) << 2)
+#define IG_PAT(value)		((value) << 8)
+
 /* Defines for the tables (LNCFMOCS0 - LNCFMOCS31) - two entries per word */
 #define L3_ESC(value)		((value) << 0)
 #define L3_SCC(value)		((value) << 1)
@@ -50,6 +54,7 @@ struct drm_i915_mocs_table {
 /* Helper defines */
 #define GEN9_NUM_MOCS_ENTRIES	64  /* 63-64 are reserved, but configured. */
 #define PVC_NUM_MOCS_ENTRIES	3
+#define MTL_NUM_MOCS_ENTRIES	16
 
 /* (e)LLC caching options */
 /*
@@ -73,6 +78,12 @@ struct drm_i915_mocs_table {
 #define L3_2_RESERVED		_L3_CACHEABILITY(2)
 #define L3_3_WB			_L3_CACHEABILITY(3)
 
+/* L4 caching options */
+#define L4_0_WB			_L4_CACHEABILITY(0)
+#define L4_1_WT			_L4_CACHEABILITY(1)
+#define L4_2_RESERVED		_L4_CACHEABILITY(2)
+#define L4_3_UC			_L4_CACHEABILITY(3)
+
 #define MOCS_ENTRY(__idx, __control_value, __l3cc_value) \
 	[__idx] = { \
 		.control_value = __control_value, \
@@ -416,6 +427,57 @@ static const struct drm_i915_mocs_entry pvc_mocs_table[] = {
 	MOCS_ENTRY(2, 0, L3_3_WB),
 };
 
+static const struct drm_i915_mocs_entry mtl_mocs_table[] = {
+	/* Error - Reserved for Non-Use */
+	MOCS_ENTRY(0,
+		   IG_PAT(0),
+		   L3_LKUP(1) | L3_3_WB),
+	/* Cached - L3 + L4 */
+	MOCS_ENTRY(1,
+		   IG_PAT(1),
+		   L3_LKUP(1) | L3_3_WB),
+	/* L4 - GO:L3 */
+	MOCS_ENTRY(2,
+		   IG_PAT(1),
+		   L3_LKUP(1) | L3_1_UC),
+	/* Uncached - GO:L3 */
+	MOCS_ENTRY(3,
+		   IG_PAT(1) | L4_3_UC,
+		   L3_LKUP(1) | L3_1_UC),
+	/* L4 - GO:Mem */
+	MOCS_ENTRY(4,
+		   IG_PAT(1),
+		   L3_LKUP(1) | L3_GLBGO(1) | L3_1_UC),
+	/* Uncached - GO:Mem */
+	MOCS_ENTRY(5,
+		   IG_PAT(1) | L4_3_UC,
+		   L3_LKUP(1) | L3_GLBGO(1) | L3_1_UC),
+	/* L4 - L3:NoLKUP; GO:L3 */
+	MOCS_ENTRY(6,
+		   IG_PAT(1),
+		   L3_1_UC),
+	/* Uncached - L3:NoLKUP; GO:L3 */
+	MOCS_ENTRY(7,
+		   IG_PAT(1) | L4_3_UC,
+		   L3_1_UC),
+	/* L4 - L3:NoLKUP; GO:Mem */
+	MOCS_ENTRY(8,
+		   IG_PAT(1),
+		   L3_GLBGO(1) | L3_1_UC),
+	/* Uncached - L3:NoLKUP; GO:Mem */
+	MOCS_ENTRY(9,
+		   IG_PAT(1) | L4_3_UC,
+		   L3_GLBGO(1) | L3_1_UC),
+	/* Display - L3; L4:WT */
+	MOCS_ENTRY(14,
+		   IG_PAT(1) | L4_1_WT,
+		   L3_LKUP(1) | L3_3_WB),
+	/* CCS - Non-Displayable */
+	MOCS_ENTRY(15,
+		   IG_PAT(1),
+		   L3_GLBGO(1) | L3_1_UC),
+};
+
 enum {
 	HAS_GLOBAL_MOCS = BIT(0),
 	HAS_ENGINE_MOCS = BIT(1),
@@ -445,7 +507,13 @@ static unsigned int get_mocs_settings(const struct drm_i915_private *i915,
 	memset(table, 0, sizeof(struct drm_i915_mocs_table));
 
 	table->unused_entries_index = I915_MOCS_PTE;
-	if (IS_PONTEVECCHIO(i915)) {
+	if (IS_METEORLAKE(i915)) {
+		table->size = ARRAY_SIZE(mtl_mocs_table);
+		table->table = mtl_mocs_table;
+		table->n_entries = MTL_NUM_MOCS_ENTRIES;
+		table->uc_index = 9;
+		table->unused_entries_index = 1;
+	} else if (IS_PONTEVECCHIO(i915)) {
 		table->size = ARRAY_SIZE(pvc_mocs_table);
 		table->table = pvc_mocs_table;
 		table->n_entries = PVC_NUM_MOCS_ENTRIES;
@@ -646,9 +714,9 @@ void intel_mocs_init_engine(struct intel_engine_cs *engine)
 		init_l3cc_table(engine->gt, &table);
 }
 
-static u32 global_mocs_offset(void)
+static u32 global_mocs_offset(struct intel_gt *gt)
 {
-	return i915_mmio_reg_offset(GEN12_GLOBAL_MOCS(0));
+	return i915_mmio_reg_offset(GEN12_GLOBAL_MOCS(0)) + gt->uncore->gsi_offset;
 }
 
 void intel_set_mocs_index(struct intel_gt *gt)
@@ -671,7 +739,7 @@ void intel_mocs_init(struct intel_gt *gt)
 	 */
 	flags = get_mocs_settings(gt->i915, &table);
 	if (flags & HAS_GLOBAL_MOCS)
-		__init_mocs_table(gt->uncore, &table, global_mocs_offset());
+		__init_mocs_table(gt->uncore, &table, global_mocs_offset(gt));
 
 	/*
 	 * Initialize the L3CC table as part of mocs initalization to make
diff --git a/drivers/gpu/drm/i915/gt/selftest_mocs.c b/drivers/gpu/drm/i915/gt/selftest_mocs.c
index ca009a6a13bd..730796346514 100644
--- a/drivers/gpu/drm/i915/gt/selftest_mocs.c
+++ b/drivers/gpu/drm/i915/gt/selftest_mocs.c
@@ -137,7 +137,7 @@ static int read_mocs_table(struct i915_request *rq,
 		return 0;
 
 	if (HAS_GLOBAL_MOCS_REGISTERS(rq->engine->i915))
-		addr = global_mocs_offset();
+		addr = global_mocs_offset(rq->engine->gt);
 	else
 		addr = mocs_offset(rq->engine);
 
-- 
2.25.1



* [PATCH 3/8] drm/i915/mtl: Add PTE encode function
From: fei.yang @ 2023-04-19 23:00 UTC
  To: intel-gfx; +Cc: Andrzej Hajda, Nirmoy Das, Fei Yang, dri-devel, Andi Shyti

From: Fei Yang <fei.yang@intel.com>

PTE encoding is platform dependent. This patch implements the PTE
encode functions for MTL and ensures the correct one is used by
calling the vm->pte_encode function pointer instead of hardcoding
the gen8 version.
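
As a cross-reference (derived from this series, not from the patch
itself), the PAT bits chosen by mtl_pte_encode() select the following
entries in the MTL PPAT table programmed by patch 2:

    /*
     *   I915_CACHE_NONE -> PAT1        -> PAT index 2 -> L4 UC
     *   I915_CACHE_WT   -> PAT0        -> PAT index 1 -> L4 WT
     *   I915_CACHE_LLC  -> PAT0 | PAT1 -> PAT index 3 -> L4 WB,
     *                                                    1-way coherent
     */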

Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
---
 drivers/gpu/drm/i915/display/intel_dpt.c |  2 +-
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c     | 45 ++++++++++++++++++++----
 drivers/gpu/drm/i915/gt/intel_ggtt.c     | 36 +++++++++++++++++--
 3 files changed, 72 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c
index b8027392144d..c5eacfdba1a5 100644
--- a/drivers/gpu/drm/i915/display/intel_dpt.c
+++ b/drivers/gpu/drm/i915/display/intel_dpt.c
@@ -300,7 +300,7 @@ intel_dpt_create(struct intel_framebuffer *fb)
 	vm->vma_ops.bind_vma    = dpt_bind_vma;
 	vm->vma_ops.unbind_vma  = dpt_unbind_vma;
 
-	vm->pte_encode = gen8_ggtt_pte_encode;
+	vm->pte_encode = vm->gt->ggtt->vm.pte_encode;
 
 	dpt->obj = dpt_obj;
 	dpt->obj->is_dpt = true;
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index 4daaa6f55668..11b91e0453c8 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -55,6 +55,34 @@ static u64 gen8_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
+static u64 mtl_pte_encode(dma_addr_t addr,
+			  enum i915_cache_level level,
+			  u32 flags)
+{
+	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
+
+	if (unlikely(flags & PTE_READ_ONLY))
+		pte &= ~GEN8_PAGE_RW;
+
+	if (flags & PTE_LM)
+		pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;
+
+	switch (level) {
+	case I915_CACHE_NONE:
+		pte |= GEN12_PPGTT_PTE_PAT1;
+		break;
+	case I915_CACHE_LLC:
+	case I915_CACHE_L3_LLC:
+		pte |= GEN12_PPGTT_PTE_PAT0 | GEN12_PPGTT_PTE_PAT1;
+		break;
+	case I915_CACHE_WT:
+		pte |= GEN12_PPGTT_PTE_PAT0;
+		break;
+	}
+
+	return pte;
+}
+
 static void gen8_ppgtt_notify_vgt(struct i915_ppgtt *ppgtt, bool create)
 {
 	struct drm_i915_private *i915 = ppgtt->vm.i915;
@@ -427,7 +455,7 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
 		      u32 flags)
 {
 	struct i915_page_directory *pd;
-	const gen8_pte_t pte_encode = gen8_pte_encode(0, cache_level, flags);
+	const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, cache_level, flags);
 	gen8_pte_t *vaddr;
 
 	pd = i915_pd_entry(pdp, gen8_pd_index(idx, 2));
@@ -580,7 +608,7 @@ static void gen8_ppgtt_insert_huge(struct i915_address_space *vm,
 				   enum i915_cache_level cache_level,
 				   u32 flags)
 {
-	const gen8_pte_t pte_encode = gen8_pte_encode(0, cache_level, flags);
+	const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags);
 	unsigned int rem = sg_dma_len(iter->sg);
 	u64 start = vma_res->start;
 
@@ -743,7 +771,7 @@ static void gen8_ppgtt_insert_entry(struct i915_address_space *vm,
 	GEM_BUG_ON(pt->is_compact);
 
 	vaddr = px_vaddr(pt);
-	vaddr[gen8_pd_index(idx, 0)] = gen8_pte_encode(addr, level, flags);
+	vaddr[gen8_pd_index(idx, 0)] = vm->pte_encode(addr, level, flags);
 	drm_clflush_virt_range(&vaddr[gen8_pd_index(idx, 0)], sizeof(*vaddr));
 }
 
@@ -773,7 +801,7 @@ static void __xehpsdv_ppgtt_insert_entry_lm(struct i915_address_space *vm,
 	}
 
 	vaddr = px_vaddr(pt);
-	vaddr[gen8_pd_index(idx, 0) / 16] = gen8_pte_encode(addr, level, flags);
+	vaddr[gen8_pd_index(idx, 0) / 16] = vm->pte_encode(addr, level, flags);
 }
 
 static void xehpsdv_ppgtt_insert_entry(struct i915_address_space *vm,
@@ -820,8 +848,8 @@ static int gen8_init_scratch(struct i915_address_space *vm)
 		pte_flags |= PTE_LM;
 
 	vm->scratch[0]->encode =
-		gen8_pte_encode(px_dma(vm->scratch[0]),
-				I915_CACHE_NONE, pte_flags);
+		vm->pte_encode(px_dma(vm->scratch[0]),
+			       I915_CACHE_NONE, pte_flags);
 
 	for (i = 1; i <= vm->top; i++) {
 		struct drm_i915_gem_object *obj;
@@ -963,7 +991,10 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
 	 */
 	ppgtt->vm.alloc_scratch_dma = alloc_pt_dma;
 
-	ppgtt->vm.pte_encode = gen8_pte_encode;
+	if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
+		ppgtt->vm.pte_encode = mtl_pte_encode;
+	else
+		ppgtt->vm.pte_encode = gen8_pte_encode;
 
 	ppgtt->vm.bind_async_flags = I915_VMA_LOCAL_BIND;
 	ppgtt->vm.insert_entries = gen8_ppgtt_insert;
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 3c7f1ed92f5b..20915edc8bd9 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -220,6 +220,33 @@ static void guc_ggtt_invalidate(struct i915_ggtt *ggtt)
 	}
 }
 
+static u64 mtl_ggtt_pte_encode(dma_addr_t addr,
+			       enum i915_cache_level level,
+			       u32 flags)
+{
+	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT;
+
+	WARN_ON_ONCE(addr & ~GEN12_GGTT_PTE_ADDR_MASK);
+
+	if (flags & PTE_LM)
+		pte |= GEN12_GGTT_PTE_LM;
+
+	switch (level) {
+	case I915_CACHE_NONE:
+		pte |= MTL_GGTT_PTE_PAT1;
+		break;
+	case I915_CACHE_LLC:
+	case I915_CACHE_L3_LLC:
+		pte |= MTL_GGTT_PTE_PAT0 | MTL_GGTT_PTE_PAT1;
+		break;
+	case I915_CACHE_WT:
+		pte |= MTL_GGTT_PTE_PAT0;
+		break;
+	}
+
+	return pte;
+}
+
 u64 gen8_ggtt_pte_encode(dma_addr_t addr,
 			 enum i915_cache_level level,
 			 u32 flags)
@@ -247,7 +274,7 @@ static void gen8_ggtt_insert_page(struct i915_address_space *vm,
 	gen8_pte_t __iomem *pte =
 		(gen8_pte_t __iomem *)ggtt->gsm + offset / I915_GTT_PAGE_SIZE;
 
-	gen8_set_pte(pte, gen8_ggtt_pte_encode(addr, level, flags));
+	gen8_set_pte(pte, ggtt->vm.pte_encode(addr, level, flags));
 
 	ggtt->invalidate(ggtt);
 }
@@ -257,8 +284,8 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
 				     enum i915_cache_level level,
 				     u32 flags)
 {
-	const gen8_pte_t pte_encode = gen8_ggtt_pte_encode(0, level, flags);
 	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
+	const gen8_pte_t pte_encode = ggtt->vm.pte_encode(0, level, flags);
 	gen8_pte_t __iomem *gte;
 	gen8_pte_t __iomem *end;
 	struct sgt_iter iter;
@@ -981,7 +1008,10 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
 	ggtt->vm.vma_ops.bind_vma    = intel_ggtt_bind_vma;
 	ggtt->vm.vma_ops.unbind_vma  = intel_ggtt_unbind_vma;
 
-	ggtt->vm.pte_encode = gen8_ggtt_pte_encode;
+	if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70))
+		ggtt->vm.pte_encode = mtl_ggtt_pte_encode;
+	else
+		ggtt->vm.pte_encode = gen8_ggtt_pte_encode;
 
 	return ggtt_probe_common(ggtt, size);
 }
-- 
2.25.1



* [PATCH 4/8] drm/i915/mtl: workaround coherency issue for Media
From: fei.yang @ 2023-04-19 23:00 UTC
  To: intel-gfx; +Cc: Nirmoy Das, Fei Yang, dri-devel, Andi Shyti

From: Fei Yang <fei.yang@intel.com>

This patch implements Wa_22016122933.

On MTL, memory writes initiated by the Media tile update the whole
cache line, even for partial writes. This creates a coherency
problem for cacheable memory if the CPU and GPU write data to
different locations within a single cache line. CTB communication
is impacted by this issue because the head and tail pointers are
adjacent words within a cache line (see struct guc_ct_buffer_desc),
where one is written by GuC and the other by the host.
This patch circumvents the issue by making CPU/GPU shared memory
uncacheable (WC on the CPU side, and PAT index 2 for the GPU). Also,
for the CTB, which is updated by both the CPU and GuC, an mfence
instruction is added to make sure CPU writes are visible to the GPU
right away (flushing the write-combining buffer).
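
A rough sketch of the publishing pattern this enforces (illustrative
only; treating intel_guc_write_barrier() as a store fence such as
wmb()/sfence is an assumption about its implementation):

    /*
     * With a WC mapping, CPU stores can linger in write-combining
     * buffers, so GuC polling desc->head might not see the update.
     */
    WRITE_ONCE(desc->head, head);           /* update shared descriptor */
    intel_guc_write_barrier(ct_to_guc(ct)); /* drain WC buffers */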

While fixing the CTB issue, we noticed some random GSC firmware
loading failures because the shared buffers are cacheable (WB) on
the CPU side but uncached on the GPU side. To fix these, such shared
buffers need to be mapped as WC on the CPU side as well. Since not
all of these allocations go through the GuC allocator, to avoid too
many code changes i915_coherent_map_type() is now hard-coded to
return WC for MTL.

BSpec: 45101

Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_pages.c |  5 ++++-
 drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c | 13 +++++++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc.c    |  7 +++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  6 ++++++
 4 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index ecd86130b74f..89fc8ea6bcfc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -469,7 +469,10 @@ enum i915_map_type i915_coherent_map_type(struct drm_i915_private *i915,
 					  struct drm_i915_gem_object *obj,
 					  bool always_coherent)
 {
-	if (i915_gem_object_is_lmem(obj))
+	/*
+	 * Wa_22016122933: always return I915_MAP_WC for MTL
+	 */
+	if (i915_gem_object_is_lmem(obj) || IS_METEORLAKE(i915))
 		return I915_MAP_WC;
 	if (HAS_LLC(i915) || always_coherent)
 		return I915_MAP_WB;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
index 1d9fdfb11268..236673c02f9a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
@@ -110,6 +110,13 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
 	if (obj->base.size < gsc->fw.size)
 		return -ENOSPC;
 
+	/*
+	 * Wa_22016122933: For MTL the shared memory needs to be mapped
+	 * as WC on CPU side and UC (PAT index 2) on GPU side
+	 */
+	if (IS_METEORLAKE(i915))
+		i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
+
 	dst = i915_gem_object_pin_map_unlocked(obj,
 					       i915_coherent_map_type(i915, obj, true));
 	if (IS_ERR(dst))
@@ -125,6 +132,12 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
 	memset(dst, 0, obj->base.size);
 	memcpy(dst, src, gsc->fw.size);
 
+	/*
+	 * Wa_22016122933: Making sure the data in dst is
+	 * visible to GSC right away
+	 */
+	intel_guc_write_barrier(&gt->uc.guc);
+
 	i915_gem_object_unpin_map(gsc->fw.obj);
 	i915_gem_object_unpin_map(obj);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index e89f16ecf1ae..c9f20385f6a0 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -744,6 +744,13 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
+	/*
+	 * Wa_22016122933: For MTL the shared memory needs to be mapped
+	 * as WC on CPU side and UC (PAT index 2) on GPU side
+	 */
+	if (IS_METEORLAKE(gt->i915))
+		i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
+
 	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
 	if (IS_ERR(vma))
 		goto err;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 1803a633ed64..99a0a89091e7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -902,6 +902,12 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	/* now update descriptor */
 	WRITE_ONCE(desc->head, head);
 
+	/*
+	 * Wa_22016122933: Making sure the head update is
+	 * visible to GuC right away
+	 */
+	intel_guc_write_barrier(ct_to_guc(ct));
+
 	return available - len;
 
 corrupted:
-- 
2.25.1



* [PATCH 5/8] drm/i915/mtl: end support for set caching ioctl
From: fei.yang @ 2023-04-19 23:00 UTC
  To: intel-gfx; +Cc: Andrzej Hajda, Fei Yang, dri-devel, Andi Shyti

From: Fei Yang <fei.yang@intel.com>

The design is to keep a buffer object's caching policy immutable
throughout its life cycle. This patch ends support for the set
caching ioctl from MTL onward. While doing that, we also set BOs to
be 1-way coherent at creation time, because the GPU no longer
automatically snoops the CPU cache. For UMDs that need to fine-tune
the caching policy for their BOs, a follow-up patch will extend the
GEM_CREATE uAPI to allow UMDs to specify the caching mode at BO
creation time.
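
From user space, the change shows up as the ioctl starting to fail;
a hedged sketch of what a UMD fallback might look like (the helper
at the end is hypothetical):

    struct drm_i915_gem_caching arg = {
            .handle = handle,
            .caching = I915_CACHING_CACHED,
    };

    if (ioctl(fd, DRM_IOCTL_I915_GEM_SET_CACHING, &arg) < 0 &&
        errno == EOPNOTSUPP)
            /* MTL+: choose the policy at creation instead, via the
             * GEM_CREATE extension added later in this series */
            set_policy_at_creation();   /* hypothetical fallback */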

Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c  | 9 ++++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
index d2d5a24301b2..bb3575b1479f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
@@ -337,6 +337,9 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
 	if (IS_DGFX(i915))
 		return -ENODEV;
 
+	if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70))
+		return -EOPNOTSUPP;
+
 	switch (args->caching) {
 	case I915_CACHING_NONE:
 		level = I915_CACHE_NONE;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 37d1efcd3ca6..cad4a6017f4b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -601,7 +601,14 @@ static int shmem_object_init(struct intel_memory_region *mem,
 	obj->write_domain = I915_GEM_DOMAIN_CPU;
 	obj->read_domains = I915_GEM_DOMAIN_CPU;
 
-	if (HAS_LLC(i915))
+	/*
+	 * MTL doesn't snoop the CPU cache by default for GPU access
+	 * (namely 1-way coherency). However, some UMDs currently depend
+	 * on that, so make 1-way coherency the default setting for MTL.
+	 * A follow-up patch will extend the GEM_CREATE uAPI to allow
+	 * UMDs to specify the caching mode at BO creation time.
+	 */
+	if (HAS_LLC(i915) || (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70)))
 		/* On some devices, we can have the GPU use the LLC (the CPU
 		 * cache) for about a 10% performance improvement
 		 * compared to uncached.  Graphics requests other than
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 6/8] drm/i915: preparation for using PAT index
  2023-04-19 23:00 ` [Intel-gfx] " fei.yang
@ 2023-04-19 23:00   ` fei.yang
  -1 siblings, 0 replies; 76+ messages in thread
From: fei.yang @ 2023-04-19 23:00 UTC (permalink / raw)
  To: intel-gfx; +Cc: Matt Roper, Chris Wilson, Fei Yang, dri-devel, Andi Shyti

From: Fei Yang <fei.yang@intel.com>

This patch prepares for replacing enum i915_cache_level with the PAT
index. Caching policy for buffer objects is set through the PAT index in
the PTE; the old i915_cache_level is not sufficient to represent all
caching modes supported by the hardware.

Prepare the transition by adding platform-dependent data structures and
helper functions to translate cache_level to pat_index:

cachelevel_to_pat: a platform-dependent array mapping cache_level to
                   pat_index.

max_pat_index: the maximum PAT index supported by the hardware. Needed for
               validating the PAT index passed in from user space.

i915_gem_get_pat_index: function to convert cache_level to PAT index.

obj_to_i915(obj): macro moved to header file for wider usage.

I915_MAX_CACHE_LEVEL: upper bound of i915_cache_level for the
                      convenience of coding.
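
As a quick illustration (a sketch, not part of the patch) of how the new
helper resolves cache levels against the MTL_CACHELEVEL table added
below; 'i915' is assumed to point at an MTL device's drm_i915_private:

  /* Per MTL_CACHELEVEL: NONE -> 2, LLC/L3_LLC -> 3, WT -> 1 */
  unsigned int pat_uc = i915_gem_get_pat_index(i915, I915_CACHE_NONE); /* 2 */
  unsigned int pat_wb = i915_gem_get_pat_index(i915, I915_CACHE_LLC);  /* 3 */
  unsigned int pat_wt = i915_gem_get_pat_index(i915, I915_CACHE_WT);   /* 1 */

  /* A level >= I915_MAX_CACHE_LEVEL trips drm_WARN_ON() and returns 0. */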

Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |  9 +++
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |  4 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  1 +
 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c  |  2 -
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  6 ++
 drivers/gpu/drm/i915/gt/intel_ggtt.c          |  6 ++
 drivers/gpu/drm/i915/i915_pci.c               | 75 +++++++++++++++++--
 drivers/gpu/drm/i915/intel_device_info.h      |  5 ++
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  9 +++
 9 files changed, 107 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 4666bb82f312..8c70a0ec7d2f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -45,6 +45,15 @@ static struct kmem_cache *slab_objects;
 
 static const struct drm_gem_object_funcs i915_gem_object_funcs;
 
+unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
+				    enum i915_cache_level level)
+{
+	if (drm_WARN_ON(&i915->drm, level >= I915_MAX_CACHE_LEVEL))
+		return 0;
+
+	return INTEL_INFO(i915)->cachelevel_to_pat[level];
+}
+
 struct drm_i915_gem_object *i915_gem_object_alloc(void)
 {
 	struct drm_i915_gem_object *obj;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 885ccde9dc3c..4c92e17b4337 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -20,6 +20,8 @@
 
 enum intel_region_id;
 
+#define obj_to_i915(obj__) to_i915((obj__)->base.dev)
+
 static inline bool i915_gem_object_size_2big(u64 size)
 {
 	struct drm_i915_gem_object *obj;
@@ -30,6 +32,8 @@ static inline bool i915_gem_object_size_2big(u64 size)
 	return false;
 }
 
+unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
+				    enum i915_cache_level level);
 void i915_gem_init__objects(struct drm_i915_private *i915);
 
 void i915_objects_module_exit(void);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 830c11431ee8..41b35abccf88 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -194,6 +194,7 @@ enum i915_cache_level {
 	 * engine.
 	 */
 	I915_CACHE_WT,
+	I915_MAX_CACHE_LEVEL,
 };
 
 enum i915_map_type {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
index b1672e054b21..214763942aa2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
@@ -460,8 +460,6 @@ void i915_gem_shrinker_taints_mutex(struct drm_i915_private *i915,
 	fs_reclaim_release(GFP_KERNEL);
 }
 
-#define obj_to_i915(obj__) to_i915((obj__)->base.dev)
-
 /**
  * i915_gem_object_make_unshrinkable - Hide the object from the shrinker. By
  * default all object types that support shrinking(see IS_SHRINKABLE), will also
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index 11b91e0453c8..7a4b1d1afce9 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -78,6 +78,12 @@ static u64 mtl_pte_encode(dma_addr_t addr,
 	case I915_CACHE_WT:
 		pte |= GEN12_PPGTT_PTE_PAT0;
 		break;
+	default:
+		/* This should never happen. Added to deal with the compile
+		 * error due to the addition of I915_MAX_CACHE_LEVEL. Will
+		 * be removed by the pat_index patch.
+		 */
+		break;
 	}
 
 	return pte;
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 20915edc8bd9..c8390d03fce2 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -242,6 +242,12 @@ static u64 mtl_ggtt_pte_encode(dma_addr_t addr,
 	case I915_CACHE_WT:
 		pte |= MTL_GGTT_PTE_PAT0;
 		break;
+	default:
+		/* This should never happen. Added to deal with the compile
+		 * error due to the addition of I915_MAX_CACHE_LEVEL. Will
+		 * be removed by the pat_index patch.
+		 */
+		break;
 	}
 
 	return pte;
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 272a8ba37b64..4ca0ea8fce9b 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -30,6 +30,7 @@
 #include "display/intel_display_driver.h"
 #include "gt/intel_gt_regs.h"
 #include "gt/intel_sa_media.h"
+#include "gem/i915_gem_object_types.h"
 
 #include "i915_driver.h"
 #include "i915_drv.h"
@@ -164,6 +165,38 @@
 		.gamma_lut_tests = DRM_COLOR_LUT_NON_DECREASING, \
 	}
 
+#define LEGACY_CACHELEVEL \
+	.cachelevel_to_pat = { \
+		[I915_CACHE_NONE]   = 0, \
+		[I915_CACHE_LLC]    = 1, \
+		[I915_CACHE_L3_LLC] = 2, \
+		[I915_CACHE_WT]     = 3, \
+	}
+
+#define TGL_CACHELEVEL \
+	.cachelevel_to_pat = { \
+		[I915_CACHE_NONE]   = 3, \
+		[I915_CACHE_LLC]    = 0, \
+		[I915_CACHE_L3_LLC] = 0, \
+		[I915_CACHE_WT]     = 2, \
+	}
+
+#define PVC_CACHELEVEL \
+	.cachelevel_to_pat = { \
+		[I915_CACHE_NONE]   = 0, \
+		[I915_CACHE_LLC]    = 3, \
+		[I915_CACHE_L3_LLC] = 3, \
+		[I915_CACHE_WT]     = 2, \
+	}
+
+#define MTL_CACHELEVEL \
+	.cachelevel_to_pat = { \
+		[I915_CACHE_NONE]   = 2, \
+		[I915_CACHE_LLC]    = 3, \
+		[I915_CACHE_L3_LLC] = 3, \
+		[I915_CACHE_WT]     = 1, \
+	}
+
 /* Keep in gen based order, and chronological order within a gen */
 
 #define GEN_DEFAULT_PAGE_SIZES \
@@ -189,11 +222,13 @@
 	.has_snoop = true, \
 	.has_coherent_ggtt = false, \
 	.dma_mask_size = 32, \
+	.max_pat_index = 3, \
 	I9XX_PIPE_OFFSETS, \
 	I9XX_CURSOR_OFFSETS, \
 	I9XX_COLORS, \
 	GEN_DEFAULT_PAGE_SIZES, \
-	GEN_DEFAULT_REGIONS
+	GEN_DEFAULT_REGIONS, \
+	LEGACY_CACHELEVEL
 
 #define I845_FEATURES \
 	GEN(2), \
@@ -210,11 +245,13 @@
 	.has_snoop = true, \
 	.has_coherent_ggtt = false, \
 	.dma_mask_size = 32, \
+	.max_pat_index = 3, \
 	I845_PIPE_OFFSETS, \
 	I845_CURSOR_OFFSETS, \
 	I845_COLORS, \
 	GEN_DEFAULT_PAGE_SIZES, \
-	GEN_DEFAULT_REGIONS
+	GEN_DEFAULT_REGIONS, \
+	LEGACY_CACHELEVEL
 
 static const struct intel_device_info i830_info = {
 	I830_FEATURES,
@@ -249,11 +286,13 @@ static const struct intel_device_info i865g_info = {
 	.has_snoop = true, \
 	.has_coherent_ggtt = true, \
 	.dma_mask_size = 32, \
+	.max_pat_index = 3, \
 	I9XX_PIPE_OFFSETS, \
 	I9XX_CURSOR_OFFSETS, \
 	I9XX_COLORS, \
 	GEN_DEFAULT_PAGE_SIZES, \
-	GEN_DEFAULT_REGIONS
+	GEN_DEFAULT_REGIONS, \
+	LEGACY_CACHELEVEL
 
 static const struct intel_device_info i915g_info = {
 	GEN3_FEATURES,
@@ -341,11 +380,13 @@ static const struct intel_device_info pnv_m_info = {
 	.has_snoop = true, \
 	.has_coherent_ggtt = true, \
 	.dma_mask_size = 36, \
+	.max_pat_index = 3, \
 	I9XX_PIPE_OFFSETS, \
 	I9XX_CURSOR_OFFSETS, \
 	I9XX_COLORS, \
 	GEN_DEFAULT_PAGE_SIZES, \
-	GEN_DEFAULT_REGIONS
+	GEN_DEFAULT_REGIONS, \
+	LEGACY_CACHELEVEL
 
 static const struct intel_device_info i965g_info = {
 	GEN4_FEATURES,
@@ -395,11 +436,13 @@ static const struct intel_device_info gm45_info = {
 	/* ilk does support rc6, but we do not implement [power] contexts */ \
 	.has_rc6 = 0, \
 	.dma_mask_size = 36, \
+	.max_pat_index = 3, \
 	I9XX_PIPE_OFFSETS, \
 	I9XX_CURSOR_OFFSETS, \
 	ILK_COLORS, \
 	GEN_DEFAULT_PAGE_SIZES, \
-	GEN_DEFAULT_REGIONS
+	GEN_DEFAULT_REGIONS, \
+	LEGACY_CACHELEVEL
 
 static const struct intel_device_info ilk_d_info = {
 	GEN5_FEATURES,
@@ -429,13 +472,15 @@ static const struct intel_device_info ilk_m_info = {
 	.has_rc6p = 0, \
 	.has_rps = true, \
 	.dma_mask_size = 40, \
+	.max_pat_index = 3, \
 	.__runtime.ppgtt_type = INTEL_PPGTT_ALIASING, \
 	.__runtime.ppgtt_size = 31, \
 	I9XX_PIPE_OFFSETS, \
 	I9XX_CURSOR_OFFSETS, \
 	ILK_COLORS, \
 	GEN_DEFAULT_PAGE_SIZES, \
-	GEN_DEFAULT_REGIONS
+	GEN_DEFAULT_REGIONS, \
+	LEGACY_CACHELEVEL
 
 #define SNB_D_PLATFORM \
 	GEN6_FEATURES, \
@@ -482,13 +527,15 @@ static const struct intel_device_info snb_m_gt2_info = {
 	.has_reset_engine = true, \
 	.has_rps = true, \
 	.dma_mask_size = 40, \
+	.max_pat_index = 3, \
 	.__runtime.ppgtt_type = INTEL_PPGTT_ALIASING, \
 	.__runtime.ppgtt_size = 31, \
 	IVB_PIPE_OFFSETS, \
 	IVB_CURSOR_OFFSETS, \
 	IVB_COLORS, \
 	GEN_DEFAULT_PAGE_SIZES, \
-	GEN_DEFAULT_REGIONS
+	GEN_DEFAULT_REGIONS, \
+	LEGACY_CACHELEVEL
 
 #define IVB_D_PLATFORM \
 	GEN7_FEATURES, \
@@ -542,6 +589,7 @@ static const struct intel_device_info vlv_info = {
 	.display.has_gmch = 1,
 	.display.has_hotplug = 1,
 	.dma_mask_size = 40,
+	.max_pat_index = 3,
 	.__runtime.ppgtt_type = INTEL_PPGTT_ALIASING,
 	.__runtime.ppgtt_size = 31,
 	.has_snoop = true,
@@ -553,6 +601,7 @@ static const struct intel_device_info vlv_info = {
 	I9XX_COLORS,
 	GEN_DEFAULT_PAGE_SIZES,
 	GEN_DEFAULT_REGIONS,
+	LEGACY_CACHELEVEL,
 };
 
 #define G75_FEATURES  \
@@ -640,6 +689,7 @@ static const struct intel_device_info chv_info = {
 	.has_logical_ring_contexts = 1,
 	.display.has_gmch = 1,
 	.dma_mask_size = 39,
+	.max_pat_index = 3,
 	.__runtime.ppgtt_type = INTEL_PPGTT_FULL,
 	.__runtime.ppgtt_size = 32,
 	.has_reset_engine = 1,
@@ -651,6 +701,7 @@ static const struct intel_device_info chv_info = {
 	CHV_COLORS,
 	GEN_DEFAULT_PAGE_SIZES,
 	GEN_DEFAULT_REGIONS,
+	LEGACY_CACHELEVEL,
 };
 
 #define GEN9_DEFAULT_PAGE_SIZES \
@@ -890,9 +941,11 @@ static const struct intel_device_info jsl_info = {
 		[TRANSCODER_DSI_1] = TRANSCODER_DSI1_OFFSET, \
 	}, \
 	TGL_CURSOR_OFFSETS, \
+	TGL_CACHELEVEL, \
 	.has_global_mocs = 1, \
 	.has_pxp = 1, \
-	.display.has_dsb = 1
+	.display.has_dsb = 1, \
+	.max_pat_index = 3
 
 static const struct intel_device_info tgl_info = {
 	GEN12_FEATURES,
@@ -1014,6 +1067,7 @@ static const struct intel_device_info adl_p_info = {
 	.__runtime.graphics.ip.ver = 12, \
 	.__runtime.graphics.ip.rel = 50, \
 	XE_HP_PAGE_SIZES, \
+	TGL_CACHELEVEL, \
 	.dma_mask_size = 46, \
 	.has_3d_pipeline = 1, \
 	.has_64bit_reloc = 1, \
@@ -1032,6 +1086,7 @@ static const struct intel_device_info adl_p_info = {
 	.has_reset_engine = 1, \
 	.has_rps = 1, \
 	.has_runtime_pm = 1, \
+	.max_pat_index = 3, \
 	.__runtime.ppgtt_size = 48, \
 	.__runtime.ppgtt_type = INTEL_PPGTT_FULL
 
@@ -1108,11 +1163,13 @@ static const struct intel_device_info pvc_info = {
 	PLATFORM(INTEL_PONTEVECCHIO),
 	NO_DISPLAY,
 	.has_flat_ccs = 0,
+	.max_pat_index = 7,
 	.__runtime.platform_engine_mask =
 		BIT(BCS0) |
 		BIT(VCS0) |
 		BIT(CCS0) | BIT(CCS1) | BIT(CCS2) | BIT(CCS3),
 	.require_force_probe = 1,
+	PVC_CACHELEVEL,
 };
 
 #define XE_LPDP_FEATURES	\
@@ -1150,9 +1207,11 @@ static const struct intel_device_info mtl_info = {
 	.has_llc = 0,
 	.has_mslice_steering = 0,
 	.has_snoop = 1,
+	.max_pat_index = 4,
 	.__runtime.memory_regions = REGION_SMEM | REGION_STOLEN_LMEM,
 	.__runtime.platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(CCS0),
 	.require_force_probe = 1,
+	MTL_CACHELEVEL,
 };
 
 #undef PLATFORM
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index f032f2500f50..959a4080840c 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -35,6 +35,8 @@
 #include "gt/intel_context_types.h"
 #include "gt/intel_sseu.h"
 
+#include "gem/i915_gem_object_types.h"
+
 struct drm_printer;
 struct drm_i915_private;
 struct intel_gt_definition;
@@ -308,6 +310,9 @@ struct intel_device_info {
 	 * Initial runtime info. Do not access outside of i915_driver_create().
 	 */
 	const struct intel_runtime_info __runtime;
+
+	u32 cachelevel_to_pat[I915_MAX_CACHE_LEVEL];
+	u32 max_pat_index;
 };
 
 struct intel_driver_caps {
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index f6a7c0bd2955..0eda8b4ee17f 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -123,7 +123,9 @@ struct drm_i915_private *mock_gem_device(void)
 	static struct dev_iommu fake_iommu = { .priv = (void *)-1 };
 #endif
 	struct drm_i915_private *i915;
+	struct intel_device_info *i915_info;
 	struct pci_dev *pdev;
+	unsigned int i;
 	int ret;
 
 	pdev = kzalloc(sizeof(*pdev), GFP_KERNEL);
@@ -180,6 +182,13 @@ struct drm_i915_private *mock_gem_device(void)
 		I915_GTT_PAGE_SIZE_2M;
 
 	RUNTIME_INFO(i915)->memory_regions = REGION_SMEM;
+
+	/* simply use legacy cache level for mock device */
+	i915_info = (struct intel_device_info *)INTEL_INFO(i915);
+	i915_info->max_pat_index = 3;
+	for (i = 0; i < I915_MAX_CACHE_LEVEL; i++)
+		i915_info->cachelevel_to_pat[i] = i;
+
 	intel_memory_regions_hw_probe(i915);
 
 	spin_lock_init(&i915->gpu_error.lock);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 7/8] drm/i915: use pat_index instead of cache_level
  2023-04-19 23:00 ` [Intel-gfx] " fei.yang
@ 2023-04-19 23:00   ` fei.yang
  -1 siblings, 0 replies; 76+ messages in thread
From: fei.yang @ 2023-04-19 23:00 UTC (permalink / raw)
  To: intel-gfx; +Cc: Matt Roper, Chris Wilson, Fei Yang, dri-devel, Andi Shyti

From: Fei Yang <fei.yang@intel.com>

Currently the KMD uses enum i915_cache_level to set the caching policy
for buffer objects. This is flaky because the PAT index, which really
controls the caching behavior in the PTE, has far more levels than are
defined in the enum. In addition, the PAT index is platform dependent;
translating between i915_cache_level and PAT index is unreliable and
makes the code more complicated.

From the UMD's perspective there is also a need to set the caching
policy for performance fine-tuning. It's much easier for the UMD to use
the PAT index directly, because the behavior of each PAT index is
clearly defined in the Bspec. Having the abstracted i915_cache_level
sitting in between would only cause more ambiguity.

For these reasons this patch replaces i915_cache_level with the PAT
index. Also note that cache_level is not completely removed yet; the KMD
still needs to create buffer objects with simple cache settings such as
cached, uncached, or write-through. For such simple cases, using
cache_level helps keep the code simple.
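
To make the conversion concrete, here is a condensed sketch (not part of
the patch) of the two call-site patterns the hunks below produce; the
helper names object_is_uncached() and insert_scratch_page() are made up
for illustration:

  /* Comparison sites go through i915_gem_object_has_cache_level(). */
  static bool object_is_uncached(struct drm_i915_gem_object *obj)
  {
          /*
           * Never compare obj->pat_index against an enum value directly;
           * the helper also covers the case where the UMD owns the
           * caching policy (cache_level == I915_CACHE_INVAL).
           */
          return i915_gem_object_has_cache_level(obj, I915_CACHE_NONE);
  }

  /* Legacy insert paths translate cache_level at the boundary. */
  static void insert_scratch_page(struct i915_ggtt *ggtt, dma_addr_t addr,
                                  u64 offset)
  {
          ggtt->vm.insert_page(&ggtt->vm, addr, offset,
                               i915_gem_get_pat_index(ggtt->vm.i915,
                                                      I915_CACHE_NONE),
                               0);
  }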

Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/display/intel_dpt.c      | 12 +--
 drivers/gpu/drm/i915/gem/i915_gem_domain.c    | 27 ++----
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 10 ++-
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.c    | 52 +++++++++++-
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |  4 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  | 25 +++++-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |  4 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  | 16 ++--
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
 .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c    |  2 +-
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c          | 10 ++-
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 71 ++++++++--------
 drivers/gpu/drm/i915/gt/gen8_ppgtt.h          |  3 +-
 drivers/gpu/drm/i915/gt/intel_ggtt.c          | 82 +++++++++----------
 drivers/gpu/drm/i915/gt/intel_gtt.h           | 20 ++---
 drivers/gpu/drm/i915/gt/intel_migrate.c       | 47 ++++++-----
 drivers/gpu/drm/i915/gt/intel_migrate.h       | 13 ++-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  6 +-
 drivers/gpu/drm/i915/gt/selftest_migrate.c    | 47 ++++++-----
 drivers/gpu/drm/i915/gt/selftest_reset.c      |  8 +-
 drivers/gpu/drm/i915/gt/selftest_timeline.c   |  2 +-
 drivers/gpu/drm/i915/gt/selftest_tlb.c        |  4 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      | 10 ++-
 drivers/gpu/drm/i915/i915_debugfs.c           | 55 ++++++++++---
 drivers/gpu/drm/i915/i915_gem.c               | 16 +++-
 drivers/gpu/drm/i915/i915_gpu_error.c         |  8 +-
 drivers/gpu/drm/i915/i915_vma.c               | 16 ++--
 drivers/gpu/drm/i915/i915_vma.h               |  2 +-
 drivers/gpu/drm/i915/i915_vma_types.h         |  2 -
 drivers/gpu/drm/i915/selftests/i915_gem.c     |  5 +-
 .../gpu/drm/i915/selftests/i915_gem_evict.c   |  4 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 15 ++--
 .../drm/i915/selftests/intel_memory_region.c  |  4 +-
 drivers/gpu/drm/i915/selftests/mock_gtt.c     |  8 +-
 36 files changed, 378 insertions(+), 239 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c
index c5eacfdba1a5..7c5fddb203ba 100644
--- a/drivers/gpu/drm/i915/display/intel_dpt.c
+++ b/drivers/gpu/drm/i915/display/intel_dpt.c
@@ -43,24 +43,24 @@ static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
 static void dpt_insert_page(struct i915_address_space *vm,
 			    dma_addr_t addr,
 			    u64 offset,
-			    enum i915_cache_level level,
+			    unsigned int pat_index,
 			    u32 flags)
 {
 	struct i915_dpt *dpt = i915_vm_to_dpt(vm);
 	gen8_pte_t __iomem *base = dpt->iomem;
 
 	gen8_set_pte(base + offset / I915_GTT_PAGE_SIZE,
-		     vm->pte_encode(addr, level, flags));
+		     vm->pte_encode(addr, pat_index, flags));
 }
 
 static void dpt_insert_entries(struct i915_address_space *vm,
 			       struct i915_vma_resource *vma_res,
-			       enum i915_cache_level level,
+			       unsigned int pat_index,
 			       u32 flags)
 {
 	struct i915_dpt *dpt = i915_vm_to_dpt(vm);
 	gen8_pte_t __iomem *base = dpt->iomem;
-	const gen8_pte_t pte_encode = vm->pte_encode(0, level, flags);
+	const gen8_pte_t pte_encode = vm->pte_encode(0, pat_index, flags);
 	struct sgt_iter sgt_iter;
 	dma_addr_t addr;
 	int i;
@@ -83,7 +83,7 @@ static void dpt_clear_range(struct i915_address_space *vm,
 static void dpt_bind_vma(struct i915_address_space *vm,
 			 struct i915_vm_pt_stash *stash,
 			 struct i915_vma_resource *vma_res,
-			 enum i915_cache_level cache_level,
+			 unsigned int pat_index,
 			 u32 flags)
 {
 	u32 pte_flags;
@@ -98,7 +98,7 @@ static void dpt_bind_vma(struct i915_address_space *vm,
 	if (vma_res->bi.lmem)
 		pte_flags |= PTE_LM;
 
-	vm->insert_entries(vm, vma_res, cache_level, pte_flags);
+	vm->insert_entries(vm, vma_res, pat_index, pte_flags);
 
 	vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
index bb3575b1479f..d5fd4c9cd9f8 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
@@ -27,8 +27,8 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
 	if (IS_DGFX(i915))
 		return false;
 
-	return !(obj->cache_level == I915_CACHE_NONE ||
-		 obj->cache_level == I915_CACHE_WT);
+	return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
+		 i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
 }
 
 bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
@@ -267,7 +267,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 {
 	int ret;
 
-	if (obj->cache_level == cache_level)
+	if (i915_gem_object_has_cache_level(obj, cache_level))
 		return 0;
 
 	ret = i915_gem_object_wait(obj,
@@ -278,10 +278,8 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 		return ret;
 
 	/* Always invalidate stale cachelines */
-	if (obj->cache_level != cache_level) {
-		i915_gem_object_set_cache_coherency(obj, cache_level);
-		obj->cache_dirty = true;
-	}
+	i915_gem_object_set_cache_coherency(obj, cache_level);
+	obj->cache_dirty = true;
 
 	/* The cache-level will be applied when each vma is rebound. */
 	return i915_gem_object_unbind(obj,
@@ -306,20 +304,13 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
 		goto out;
 	}
 
-	switch (obj->cache_level) {
-	case I915_CACHE_LLC:
-	case I915_CACHE_L3_LLC:
+	if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||
+	    i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
 		args->caching = I915_CACHING_CACHED;
-		break;
-
-	case I915_CACHE_WT:
+	else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
 		args->caching = I915_CACHING_DISPLAY;
-		break;
-
-	default:
+	else
 		args->caching = I915_CACHING_NONE;
-		break;
-	}
 out:
 	rcu_read_unlock();
 	return err;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 3aeede6aee4d..d42915516636 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -642,7 +642,7 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
 
 	return (cache->has_llc ||
 		obj->cache_dirty ||
-		obj->cache_level != I915_CACHE_NONE);
+		!i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
 }
 
 static int eb_reserve_vma(struct i915_execbuffer *eb,
@@ -1323,8 +1323,10 @@ static void *reloc_iomap(struct i915_vma *batch,
 	offset = cache->node.start;
 	if (drm_mm_node_allocated(&cache->node)) {
 		ggtt->vm.insert_page(&ggtt->vm,
-				     i915_gem_object_get_dma_address(obj, page),
-				     offset, I915_CACHE_NONE, 0);
+			i915_gem_object_get_dma_address(obj, page),
+			offset,
+			i915_gem_get_pat_index(ggtt->vm.i915, I915_CACHE_NONE),
+			0);
 	} else {
 		offset += page << PAGE_SHIFT;
 	}
@@ -1464,7 +1466,7 @@ eb_relocate_entry(struct i915_execbuffer *eb,
 			reloc_cache_unmap(&eb->reloc_cache);
 			mutex_lock(&vma->vm->mutex);
 			err = i915_vma_bind(target->vma,
-					    target->vma->obj->cache_level,
+					    target->vma->obj->pat_index,
 					    PIN_GLOBAL, NULL, NULL);
 			mutex_unlock(&vma->vm->mutex);
 			reloc_cache_remap(&eb->reloc_cache, ev->vma->obj);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index 3dbacdf0911a..50c30efa08a3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -383,7 +383,8 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
 	}
 
 	/* Access to snoopable pages through the GTT is incoherent. */
-	if (obj->cache_level != I915_CACHE_NONE && !HAS_LLC(i915)) {
+	if (!(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
+	      HAS_LLC(i915))) {
 		ret = -EFAULT;
 		goto err_unpin;
 	}
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 8c70a0ec7d2f..27c948350b5b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -54,6 +54,25 @@ unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
 	return INTEL_INFO(i915)->cachelevel_to_pat[level];
 }
 
+bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
+				     enum i915_cache_level lvl)
+{
+	/*
+	 * cache_level == I915_CACHE_INVAL indicates the UMD has set the
+	 * caching policy through pat_index, in which case the KMD should
+	 * leave coherency to be managed by user space; simply return
+	 * true here.
+	 */
+	if (obj->cache_level == I915_CACHE_INVAL)
+		return true;
+
+	/*
+	 * Otherwise the pat_index should have been converted from cache_level
+	 * so that the following comparison is valid.
+	 */
+	return obj->pat_index == i915_gem_get_pat_index(obj_to_i915(obj), lvl);
+}
+
 struct drm_i915_gem_object *i915_gem_object_alloc(void)
 {
 	struct drm_i915_gem_object *obj;
@@ -133,7 +152,7 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
 {
 	struct drm_i915_private *i915 = to_i915(obj->base.dev);
 
-	obj->cache_level = cache_level;
+	obj->pat_index = i915_gem_get_pat_index(i915, cache_level);
 
 	if (cache_level != I915_CACHE_NONE)
 		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
@@ -148,6 +167,37 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
 		!IS_DGFX(i915);
 }
 
+/**
+ * i915_gem_object_set_pat_index - set PAT index to be used in PTE encode
+ * @obj: #drm_i915_gem_object
+ * @pat_index: PAT index
+ *
+ * This is a clone of i915_gem_object_set_cache_coherency taking pat index
+ * instead of cache_level as its second argument.
+ */
+void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
+				   unsigned int pat_index)
+{
+	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+
+	if (obj->pat_index == pat_index)
+		return;
+
+	obj->pat_index = pat_index;
+
+	if (pat_index != i915_gem_get_pat_index(i915, I915_CACHE_NONE))
+		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
+				       I915_BO_CACHE_COHERENT_FOR_WRITE);
+	else if (HAS_LLC(i915))
+		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
+	else
+		obj->cache_coherent = 0;
+
+	obj->cache_dirty =
+		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
+		!IS_DGFX(i915);
+}
+
 bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
 {
 	struct drm_i915_private *i915 = to_i915(obj->base.dev);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 4c92e17b4337..6f00aab10015 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -34,6 +34,8 @@ static inline bool i915_gem_object_size_2big(u64 size)
 
 unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
 				    enum i915_cache_level level);
+bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
+				     enum i915_cache_level lvl);
 void i915_gem_init__objects(struct drm_i915_private *i915);
 
 void i915_objects_module_exit(void);
@@ -764,6 +766,8 @@ bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj);
 
 void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
 					 unsigned int cache_level);
+void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
+				   unsigned int pat_index);
 bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj);
 void i915_gem_object_flush_if_display(struct drm_i915_gem_object *obj);
 void i915_gem_object_flush_if_display_locked(struct drm_i915_gem_object *obj);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 41b35abccf88..132ce01dee9f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -195,6 +195,7 @@ enum i915_cache_level {
 	 */
 	I915_CACHE_WT,
 	I915_MAX_CACHE_LEVEL,
+	I915_CACHE_INVAL = I915_MAX_CACHE_LEVEL,
 };
 
 enum i915_map_type {
@@ -358,10 +359,28 @@ struct drm_i915_gem_object {
 #define I915_BO_FLAG_STRUCT_PAGE BIT(0) /* Object backed by struct pages */
 #define I915_BO_FLAG_IOMEM       BIT(1) /* Object backed by IO memory */
 	/**
-	 * @cache_level: The desired GTT caching level.
+	 * @pat_index: The desired PAT index.
+	 *
+	 * See hardware specification for valid PAT indices for each platform.
+	 * This field used to contain a value of enum i915_cache_level. It's
+	 * changed to an unsigned int because PAT indices are being used by
+	 * both UMD and KMD for caching policy control after GEN12.
+	 * For backward compatibility, this field will continue to contain
+	 * the value of i915_cache_level for pre-GEN12 platforms so that the
+	 * PTE encode functions for these legacy platforms can stay the same.
+	 * In the meantime, platform-specific tables are created to translate
+	 * i915_cache_level into a PAT index; for more details check the
+	 * macros defined in i915/i915_pci.c, e.g. PVC_CACHELEVEL.
+	 */
+	unsigned int pat_index:6;
+	/**
+	 * @cache_level: Indicates whether pat_index was set by the UMD
 	 *
-	 * See enum i915_cache_level for possible values, along with what
-	 * each does.
+	 * This used to hold the desired GTT caching level, but is now
+	 * replaced by pat_index. It's kept here so the KMD can tell whether
+	 * pat_index was set by the UMD or converted from enum
+	 * i915_cache_level. This field is 0 by default, and set to
+	 * I915_CACHE_INVAL when pat_index is set by the UMD.
 	 */
 	unsigned int cache_level:3;
 	/**
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index ee492d823f1b..3b094d36a0b0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -565,7 +565,9 @@ static void dbg_poison(struct i915_ggtt *ggtt,
 
 		ggtt->vm.insert_page(&ggtt->vm, addr,
 				     ggtt->error_capture.start,
-				     I915_CACHE_NONE, 0);
+				     i915_gem_get_pat_index(ggtt->vm.i915,
+							    I915_CACHE_NONE),
+				     0);
 		mb();
 
 		s = io_mapping_map_wc(&ggtt->iomap,
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 69eb20ed4d47..e40761e13c2a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -214,7 +214,8 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
 
 		intel_engine_pm_get(to_gt(i915)->migrate.context->engine);
 		ret = intel_context_migrate_clear(to_gt(i915)->migrate.context, deps,
-						  dst_st->sgl, dst_level,
+						  dst_st->sgl,
+						  i915_gem_get_pat_index(i915, dst_level),
 						  i915_ttm_gtt_binds_lmem(dst_mem),
 						  0, &rq);
 	} else {
@@ -227,12 +228,13 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
 		src_level = i915_ttm_cache_level(i915, bo->resource, src_ttm);
 		intel_engine_pm_get(to_gt(i915)->migrate.context->engine);
 		ret = intel_context_migrate_copy(to_gt(i915)->migrate.context,
-						 deps, src_rsgt->table.sgl,
-						 src_level,
-						 i915_ttm_gtt_binds_lmem(bo->resource),
-						 dst_st->sgl, dst_level,
-						 i915_ttm_gtt_binds_lmem(dst_mem),
-						 &rq);
+					deps, src_rsgt->table.sgl,
+					i915_gem_get_pat_index(i915, src_level),
+					i915_ttm_gtt_binds_lmem(bo->resource),
+					dst_st->sgl,
+					i915_gem_get_pat_index(i915, dst_level),
+					i915_ttm_gtt_binds_lmem(dst_mem),
+					&rq);
 
 		i915_refct_sgt_put(src_rsgt);
 	}
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index defece0bcb81..ebb68ac9cd5e 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -354,7 +354,7 @@ fake_huge_pages_object(struct drm_i915_private *i915, u64 size, bool single)
 
 	obj->write_domain = I915_GEM_DOMAIN_CPU;
 	obj->read_domains = I915_GEM_DOMAIN_CPU;
-	obj->cache_level = I915_CACHE_NONE;
+	obj->pat_index = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
 
 	return obj;
 }
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
index fe6c37fd7859..a93a90b15907 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
@@ -219,7 +219,7 @@ static int __igt_lmem_pages_migrate(struct intel_gt *gt,
 			continue;
 
 		err = intel_migrate_clear(&gt->migrate, &ww, deps,
-					  obj->mm.pages->sgl, obj->cache_level,
+					  obj->mm.pages->sgl, obj->pat_index,
 					  i915_gem_object_is_lmem(obj),
 					  0xdeadbeaf, &rq);
 		if (rq) {
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 56279908ed30..a93d8f9f8bc1 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -1222,7 +1222,7 @@ static int __igt_mmap_migrate(struct intel_memory_region **placements,
 	}
 
 	err = intel_context_migrate_clear(to_gt(i915)->migrate.context, NULL,
-					  obj->mm.pages->sgl, obj->cache_level,
+					  obj->mm.pages->sgl, obj->pat_index,
 					  i915_gem_object_is_lmem(obj),
 					  expand32(POISON_INUSE), &rq);
 	i915_gem_object_unpin_pages(obj);
diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
index 5aaacc53fa4c..c2bdc133c89a 100644
--- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
@@ -109,7 +109,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 
 static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 				      struct i915_vma_resource *vma_res,
-				      enum i915_cache_level cache_level,
+				      unsigned int pat_index,
 				      u32 flags)
 {
 	struct i915_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
@@ -117,7 +117,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	unsigned int first_entry = vma_res->start / I915_GTT_PAGE_SIZE;
 	unsigned int act_pt = first_entry / GEN6_PTES;
 	unsigned int act_pte = first_entry % GEN6_PTES;
-	const u32 pte_encode = vm->pte_encode(0, cache_level, flags);
+	const u32 pte_encode = vm->pte_encode(0, pat_index, flags);
 	struct sgt_dma iter = sgt_dma(vma_res);
 	gen6_pte_t *vaddr;
 
@@ -227,7 +227,9 @@ static int gen6_ppgtt_init_scratch(struct gen6_ppgtt *ppgtt)
 
 	vm->scratch[0]->encode =
 		vm->pte_encode(px_dma(vm->scratch[0]),
-			       I915_CACHE_NONE, PTE_READ_ONLY);
+			       i915_gem_get_pat_index(vm->i915,
+						      I915_CACHE_NONE),
+			       PTE_READ_ONLY);
 
 	vm->scratch[1] = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
 	if (IS_ERR(vm->scratch[1])) {
@@ -278,7 +280,7 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 static void pd_vma_bind(struct i915_address_space *vm,
 			struct i915_vm_pt_stash *stash,
 			struct i915_vma_resource *vma_res,
-			enum i915_cache_level cache_level,
+			unsigned int pat_index,
 			u32 unused)
 {
 	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index 7a4b1d1afce9..c046813514f4 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -56,7 +56,7 @@ static u64 gen8_pte_encode(dma_addr_t addr,
 }
 
 static u64 mtl_pte_encode(dma_addr_t addr,
-			  enum i915_cache_level level,
+			  unsigned int pat_index,
 			  u32 flags)
 {
 	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
@@ -67,24 +67,17 @@ static u64 mtl_pte_encode(dma_addr_t addr,
 	if (flags & PTE_LM)
 		pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;
 
-	switch (level) {
-	case I915_CACHE_NONE:
-		pte |= GEN12_PPGTT_PTE_PAT1;
-		break;
-	case I915_CACHE_LLC:
-	case I915_CACHE_L3_LLC:
-		pte |= GEN12_PPGTT_PTE_PAT0 | GEN12_PPGTT_PTE_PAT1;
-		break;
-	case I915_CACHE_WT:
+	if (pat_index & BIT(0))
 		pte |= GEN12_PPGTT_PTE_PAT0;
-		break;
-	default:
-		/* This should never happen. Added to deal with the compile
-		 * error due to the addition of I915_MAX_CACHE_LEVEL. Will
-		 * be removed by the pat_index patch.
-		 */
-		break;
-	}
+
+	if (pat_index & BIT(1))
+		pte |= GEN12_PPGTT_PTE_PAT1;
+
+	if (pat_index & BIT(2))
+		pte |= GEN12_PPGTT_PTE_PAT2;
+
+	if (pat_index & BIT(3))
+		pte |= MTL_PPGTT_PTE_PAT3;
 
 	return pte;
 }
@@ -457,11 +450,11 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
 		      struct i915_page_directory *pdp,
 		      struct sgt_dma *iter,
 		      u64 idx,
-		      enum i915_cache_level cache_level,
+		      unsigned int pat_index,
 		      u32 flags)
 {
 	struct i915_page_directory *pd;
-	const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, cache_level, flags);
+	const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, pat_index, flags);
 	gen8_pte_t *vaddr;
 
 	pd = i915_pd_entry(pdp, gen8_pd_index(idx, 2));
@@ -504,10 +497,10 @@ static void
 xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
 			  struct i915_vma_resource *vma_res,
 			  struct sgt_dma *iter,
-			  enum i915_cache_level cache_level,
+			  unsigned int pat_index,
 			  u32 flags)
 {
-	const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags);
+	const gen8_pte_t pte_encode = vm->pte_encode(0, pat_index, flags);
 	unsigned int rem = sg_dma_len(iter->sg);
 	u64 start = vma_res->start;
 	u64 end = start + vma_res->vma_size;
@@ -611,10 +604,10 @@ xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
 static void gen8_ppgtt_insert_huge(struct i915_address_space *vm,
 				   struct i915_vma_resource *vma_res,
 				   struct sgt_dma *iter,
-				   enum i915_cache_level cache_level,
+				   unsigned int pat_index,
 				   u32 flags)
 {
-	const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags);
+	const gen8_pte_t pte_encode = vm->pte_encode(0, pat_index, flags);
 	unsigned int rem = sg_dma_len(iter->sg);
 	u64 start = vma_res->start;
 
@@ -734,7 +727,7 @@ static void gen8_ppgtt_insert_huge(struct i915_address_space *vm,
 
 static void gen8_ppgtt_insert(struct i915_address_space *vm,
 			      struct i915_vma_resource *vma_res,
-			      enum i915_cache_level cache_level,
+			      unsigned int pat_index,
 			      u32 flags)
 {
 	struct i915_ppgtt * const ppgtt = i915_vm_to_ppgtt(vm);
@@ -742,9 +735,9 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
 
 	if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) {
 		if (HAS_64K_PAGES(vm->i915))
-			xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
+			xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, pat_index, flags);
 		else
-			gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
+			gen8_ppgtt_insert_huge(vm, vma_res, &iter, pat_index, flags);
 	} else  {
 		u64 idx = vma_res->start >> GEN8_PTE_SHIFT;
 
@@ -753,7 +746,7 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
 				gen8_pdp_for_page_index(vm, idx);
 
 			idx = gen8_ppgtt_insert_pte(ppgtt, pdp, &iter, idx,
-						    cache_level, flags);
+						    pat_index, flags);
 		} while (idx);
 
 		vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE;
@@ -763,7 +756,7 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
 static void gen8_ppgtt_insert_entry(struct i915_address_space *vm,
 				    dma_addr_t addr,
 				    u64 offset,
-				    enum i915_cache_level level,
+				    unsigned int pat_index,
 				    u32 flags)
 {
 	u64 idx = offset >> GEN8_PTE_SHIFT;
@@ -777,14 +770,14 @@ static void gen8_ppgtt_insert_entry(struct i915_address_space *vm,
 	GEM_BUG_ON(pt->is_compact);
 
 	vaddr = px_vaddr(pt);
-	vaddr[gen8_pd_index(idx, 0)] = vm->pte_encode(addr, level, flags);
+	vaddr[gen8_pd_index(idx, 0)] = vm->pte_encode(addr, pat_index, flags);
 	drm_clflush_virt_range(&vaddr[gen8_pd_index(idx, 0)], sizeof(*vaddr));
 }
 
 static void __xehpsdv_ppgtt_insert_entry_lm(struct i915_address_space *vm,
 					    dma_addr_t addr,
 					    u64 offset,
-					    enum i915_cache_level level,
+					    unsigned int pat_index,
 					    u32 flags)
 {
 	u64 idx = offset >> GEN8_PTE_SHIFT;
@@ -807,20 +800,20 @@ static void __xehpsdv_ppgtt_insert_entry_lm(struct i915_address_space *vm,
 	}
 
 	vaddr = px_vaddr(pt);
-	vaddr[gen8_pd_index(idx, 0) / 16] = vm->pte_encode(addr, level, flags);
+	vaddr[gen8_pd_index(idx, 0) / 16] = vm->pte_encode(addr, pat_index, flags);
 }
 
 static void xehpsdv_ppgtt_insert_entry(struct i915_address_space *vm,
 				       dma_addr_t addr,
 				       u64 offset,
-				       enum i915_cache_level level,
+				       unsigned int pat_index,
 				       u32 flags)
 {
 	if (flags & PTE_LM)
 		return __xehpsdv_ppgtt_insert_entry_lm(vm, addr, offset,
-						       level, flags);
+						       pat_index, flags);
 
-	return gen8_ppgtt_insert_entry(vm, addr, offset, level, flags);
+	return gen8_ppgtt_insert_entry(vm, addr, offset, pat_index, flags);
 }
 
 static int gen8_init_scratch(struct i915_address_space *vm)
@@ -855,7 +848,9 @@ static int gen8_init_scratch(struct i915_address_space *vm)
 
 	vm->scratch[0]->encode =
 		vm->pte_encode(px_dma(vm->scratch[0]),
-			       I915_CACHE_NONE, pte_flags);
+			       i915_gem_get_pat_index(vm->i915,
+						      I915_CACHE_NONE),
+			       pte_flags);
 
 	for (i = 1; i <= vm->top; i++) {
 		struct drm_i915_gem_object *obj;
@@ -873,7 +868,9 @@ static int gen8_init_scratch(struct i915_address_space *vm)
 		}
 
 		fill_px(obj, vm->scratch[i - 1]->encode);
-		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_NONE);
+		obj->encode = gen8_pde_encode(px_dma(obj),
+					      i915_gem_get_pat_index(vm->i915,
+								     I915_CACHE_NONE));
 
 		vm->scratch[i] = obj;
 	}
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
index f541d19264b4..19c635441642 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
@@ -10,13 +10,12 @@
 
 struct i915_address_space;
 struct intel_gt;
-enum i915_cache_level;
 
 struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
 				     unsigned long lmem_pt_obj_flags);
 
 u64 gen8_ggtt_pte_encode(dma_addr_t addr,
-			 enum i915_cache_level level,
+			 unsigned int pat_index,
 			 u32 flags);
 
 #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index c8390d03fce2..2a7942fac798 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -221,7 +221,7 @@ static void guc_ggtt_invalidate(struct i915_ggtt *ggtt)
 }
 
 static u64 mtl_ggtt_pte_encode(dma_addr_t addr,
-			       enum i915_cache_level level,
+			       unsigned int pat_index,
 			       u32 flags)
 {
 	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT;
@@ -231,30 +231,17 @@ static u64 mtl_ggtt_pte_encode(dma_addr_t addr,
 	if (flags & PTE_LM)
 		pte |= GEN12_GGTT_PTE_LM;
 
-	switch (level) {
-	case I915_CACHE_NONE:
-		pte |= MTL_GGTT_PTE_PAT1;
-		break;
-	case I915_CACHE_LLC:
-	case I915_CACHE_L3_LLC:
-		pte |= MTL_GGTT_PTE_PAT0 | MTL_GGTT_PTE_PAT1;
-		break;
-	case I915_CACHE_WT:
+	if (pat_index & BIT(0))
 		pte |= MTL_GGTT_PTE_PAT0;
-		break;
-	default:
-		/* This should never happen. Added to deal with the compile
-		 * error due to the addition of I915_MAX_CACHE_LEVEL. Will
-		 * be removed by the pat_index patch.
-		 */
-		break;
-	}
+
+	if (pat_index & BIT(1))
+		pte |= MTL_GGTT_PTE_PAT1;
 
 	return pte;
 }
 
 u64 gen8_ggtt_pte_encode(dma_addr_t addr,
-			 enum i915_cache_level level,
+			 unsigned int pat_index,
 			 u32 flags)
 {
 	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT;
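
Note the asymmetry with the PPGTT encode earlier in this patch:
mtl_ggtt_pte_encode() consumes only bits 0 and 1 of pat_index, because the
MTL GGTT PTE carries just two PAT bits. A hypothetical helper, not part of
this patch, spelling out the constraint that implies:

/*
 * Illustrative only: with two PAT bits in the GGTT PTE, only PAT
 * indices 0-3 are representable for global GTT mappings on MTL.
 */
static inline bool mtl_ggtt_pat_index_valid(unsigned int pat_index)
{
	return pat_index < 4;
}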
@@ -273,25 +260,25 @@ static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
 static void gen8_ggtt_insert_page(struct i915_address_space *vm,
 				  dma_addr_t addr,
 				  u64 offset,
-				  enum i915_cache_level level,
+				  unsigned int pat_index,
 				  u32 flags)
 {
 	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
 	gen8_pte_t __iomem *pte =
 		(gen8_pte_t __iomem *)ggtt->gsm + offset / I915_GTT_PAGE_SIZE;
 
-	gen8_set_pte(pte, ggtt->vm.pte_encode(addr, level, flags));
+	gen8_set_pte(pte, ggtt->vm.pte_encode(addr, pat_index, flags));
 
 	ggtt->invalidate(ggtt);
 }
 
 static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
 				     struct i915_vma_resource *vma_res,
-				     enum i915_cache_level level,
+				     unsigned int pat_index,
 				     u32 flags)
 {
 	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
-	const gen8_pte_t pte_encode = ggtt->vm.pte_encode(0, level, flags);
+	const gen8_pte_t pte_encode = ggtt->vm.pte_encode(0, pat_index, flags);
 	gen8_pte_t __iomem *gte;
 	gen8_pte_t __iomem *end;
 	struct sgt_iter iter;
@@ -348,14 +335,14 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
 static void gen6_ggtt_insert_page(struct i915_address_space *vm,
 				  dma_addr_t addr,
 				  u64 offset,
-				  enum i915_cache_level level,
+				  unsigned int pat_index,
 				  u32 flags)
 {
 	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
 	gen6_pte_t __iomem *pte =
 		(gen6_pte_t __iomem *)ggtt->gsm + offset / I915_GTT_PAGE_SIZE;
 
-	iowrite32(vm->pte_encode(addr, level, flags), pte);
+	iowrite32(vm->pte_encode(addr, pat_index, flags), pte);
 
 	ggtt->invalidate(ggtt);
 }
@@ -368,7 +355,7 @@ static void gen6_ggtt_insert_page(struct i915_address_space *vm,
  */
 static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
 				     struct i915_vma_resource *vma_res,
-				     enum i915_cache_level level,
+				     unsigned int pat_index,
 				     u32 flags)
 {
 	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
@@ -385,7 +372,7 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
 		iowrite32(vm->scratch[0]->encode, gte++);
 	end += (vma_res->node_size + vma_res->guard) / I915_GTT_PAGE_SIZE;
 	for_each_sgt_daddr(addr, iter, vma_res->bi.pages)
-		iowrite32(vm->pte_encode(addr, level, flags), gte++);
+		iowrite32(vm->pte_encode(addr, pat_index, flags), gte++);
 	GEM_BUG_ON(gte > end);
 
 	/* Fill the allocated but "unused" space beyond the end of the buffer */
@@ -420,14 +407,15 @@ struct insert_page {
 	struct i915_address_space *vm;
 	dma_addr_t addr;
 	u64 offset;
-	enum i915_cache_level level;
+	unsigned int pat_index;
 };
 
 static int bxt_vtd_ggtt_insert_page__cb(void *_arg)
 {
 	struct insert_page *arg = _arg;
 
-	gen8_ggtt_insert_page(arg->vm, arg->addr, arg->offset, arg->level, 0);
+	gen8_ggtt_insert_page(arg->vm, arg->addr, arg->offset,
+			      arg->pat_index, 0);
 	bxt_vtd_ggtt_wa(arg->vm);
 
 	return 0;
@@ -436,10 +424,10 @@ static int bxt_vtd_ggtt_insert_page__cb(void *_arg)
 static void bxt_vtd_ggtt_insert_page__BKL(struct i915_address_space *vm,
 					  dma_addr_t addr,
 					  u64 offset,
-					  enum i915_cache_level level,
+					  unsigned int pat_index,
 					  u32 unused)
 {
-	struct insert_page arg = { vm, addr, offset, level };
+	struct insert_page arg = { vm, addr, offset, pat_index };
 
 	stop_machine(bxt_vtd_ggtt_insert_page__cb, &arg, NULL);
 }
@@ -447,7 +435,7 @@ static void bxt_vtd_ggtt_insert_page__BKL(struct i915_address_space *vm,
 struct insert_entries {
 	struct i915_address_space *vm;
 	struct i915_vma_resource *vma_res;
-	enum i915_cache_level level;
+	unsigned int pat_index;
 	u32 flags;
 };
 
@@ -455,7 +443,8 @@ static int bxt_vtd_ggtt_insert_entries__cb(void *_arg)
 {
 	struct insert_entries *arg = _arg;
 
-	gen8_ggtt_insert_entries(arg->vm, arg->vma_res, arg->level, arg->flags);
+	gen8_ggtt_insert_entries(arg->vm, arg->vma_res,
+				 arg->pat_index, arg->flags);
 	bxt_vtd_ggtt_wa(arg->vm);
 
 	return 0;
@@ -463,10 +452,10 @@ static int bxt_vtd_ggtt_insert_entries__cb(void *_arg)
 
 static void bxt_vtd_ggtt_insert_entries__BKL(struct i915_address_space *vm,
 					     struct i915_vma_resource *vma_res,
-					     enum i915_cache_level level,
+					     unsigned int pat_index,
 					     u32 flags)
 {
-	struct insert_entries arg = { vm, vma_res, level, flags };
+	struct insert_entries arg = { vm, vma_res, pat_index, flags };
 
 	stop_machine(bxt_vtd_ggtt_insert_entries__cb, &arg, NULL);
 }
@@ -495,7 +484,7 @@ static void gen6_ggtt_clear_range(struct i915_address_space *vm,
 void intel_ggtt_bind_vma(struct i915_address_space *vm,
 			 struct i915_vm_pt_stash *stash,
 			 struct i915_vma_resource *vma_res,
-			 enum i915_cache_level cache_level,
+			 unsigned int pat_index,
 			 u32 flags)
 {
 	u32 pte_flags;
@@ -512,7 +501,7 @@ void intel_ggtt_bind_vma(struct i915_address_space *vm,
 	if (vma_res->bi.lmem)
 		pte_flags |= PTE_LM;
 
-	vm->insert_entries(vm, vma_res, cache_level, pte_flags);
+	vm->insert_entries(vm, vma_res, pat_index, pte_flags);
 	vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE;
 }
 
@@ -661,7 +650,7 @@ static int init_ggtt(struct i915_ggtt *ggtt)
 static void aliasing_gtt_bind_vma(struct i915_address_space *vm,
 				  struct i915_vm_pt_stash *stash,
 				  struct i915_vma_resource *vma_res,
-				  enum i915_cache_level cache_level,
+				  unsigned int pat_index,
 				  u32 flags)
 {
 	u32 pte_flags;
@@ -673,10 +662,10 @@ static void aliasing_gtt_bind_vma(struct i915_address_space *vm,
 
 	if (flags & I915_VMA_LOCAL_BIND)
 		ppgtt_bind_vma(&i915_vm_to_ggtt(vm)->alias->vm,
-			       stash, vma_res, cache_level, flags);
+			       stash, vma_res, pat_index, flags);
 
 	if (flags & I915_VMA_GLOBAL_BIND)
-		vm->insert_entries(vm, vma_res, cache_level, pte_flags);
+		vm->insert_entries(vm, vma_res, pat_index, pte_flags);
 
 	vma_res->bound_flags |= flags;
 }
@@ -933,7 +922,9 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size)
 
 	ggtt->vm.scratch[0]->encode =
 		ggtt->vm.pte_encode(px_dma(ggtt->vm.scratch[0]),
-				    I915_CACHE_NONE, pte_flags);
+				    i915_gem_get_pat_index(i915,
+							   I915_CACHE_NONE),
+				    pte_flags);
 
 	return 0;
 }
@@ -1022,6 +1013,11 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
 	return ggtt_probe_common(ggtt, size);
 }
 
+/*
+ * For pre-gen8 platforms pat_index is the same as enum i915_cache_level,
+ * so these PTE encode functions are left using cache_level.
+ * See translation table LEGACY_CACHELEVEL.
+ */
 static u64 snb_pte_encode(dma_addr_t addr,
 			  enum i915_cache_level level,
 			  u32 flags)
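
For reference, a sketch of the LEGACY_CACHELEVEL table the comment above
points at, assuming the identity mapping that makes pat_index numerically
equal to the enum value on pre-gen8 parts (the real table is defined
alongside the platform descriptors in i915_pci.c):

#define LEGACY_CACHELEVEL \
	.cachelevel_to_pat = { \
		[I915_CACHE_NONE]   = 0, \
		[I915_CACHE_LLC]    = 1, \
		[I915_CACHE_L3_LLC] = 2, \
		[I915_CACHE_WT]     = 3, \
	}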
@@ -1302,7 +1298,9 @@ bool i915_ggtt_resume_vm(struct i915_address_space *vm)
 		 */
 		vma->resource->bound_flags = 0;
 		vma->ops->bind_vma(vm, NULL, vma->resource,
-				   obj ? obj->cache_level : 0,
+				   obj ? obj->pat_index :
+					 i915_gem_get_pat_index(vm->i915,
+								I915_CACHE_NONE),
 				   was_bound);
 
 		if (obj) { /* only used during resume => exclusive access */
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 854ec09fd588..be767e13b1e5 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -165,8 +165,6 @@ typedef u64 gen8_pte_t;
 #define MTL_2_COH_1W	REG_FIELD_PREP(MTL_PAT_INDEX_COH_MODE_MASK, 2)
 #define MTL_0_COH_NON	REG_FIELD_PREP(MTL_PAT_INDEX_COH_MODE_MASK, 0)
 
-enum i915_cache_level;
-
 struct drm_i915_gem_object;
 struct i915_fence_reg;
 struct i915_vma;
@@ -234,7 +232,7 @@ struct i915_vma_ops {
 	void (*bind_vma)(struct i915_address_space *vm,
 			 struct i915_vm_pt_stash *stash,
 			 struct i915_vma_resource *vma_res,
-			 enum i915_cache_level cache_level,
+			 unsigned int pat_index,
 			 u32 flags);
 	/*
 	 * Unmap an object from an address space. This usually consists of
@@ -306,7 +304,7 @@ struct i915_address_space {
 		(*alloc_scratch_dma)(struct i915_address_space *vm, int sz);
 
 	u64 (*pte_encode)(dma_addr_t addr,
-			  enum i915_cache_level level,
+			  unsigned int pat_index,
 			  u32 flags); /* Create a valid PTE */
 #define PTE_READ_ONLY	BIT(0)
 #define PTE_LM		BIT(1)
@@ -321,20 +319,20 @@ struct i915_address_space {
 	void (*insert_page)(struct i915_address_space *vm,
 			    dma_addr_t addr,
 			    u64 offset,
-			    enum i915_cache_level cache_level,
+			    unsigned int pat_index,
 			    u32 flags);
 	void (*insert_entries)(struct i915_address_space *vm,
 			       struct i915_vma_resource *vma_res,
-			       enum i915_cache_level cache_level,
+			       unsigned int pat_index,
 			       u32 flags);
 	void (*raw_insert_page)(struct i915_address_space *vm,
 				dma_addr_t addr,
 				u64 offset,
-				enum i915_cache_level cache_level,
+				unsigned int pat_index,
 				u32 flags);
 	void (*raw_insert_entries)(struct i915_address_space *vm,
 				   struct i915_vma_resource *vma_res,
-				   enum i915_cache_level cache_level,
+				   unsigned int pat_index,
 				   u32 flags);
 	void (*cleanup)(struct i915_address_space *vm);
 
@@ -581,7 +579,7 @@ void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt,
 void intel_ggtt_bind_vma(struct i915_address_space *vm,
 			 struct i915_vm_pt_stash *stash,
 			 struct i915_vma_resource *vma_res,
-			 enum i915_cache_level cache_level,
+			 unsigned int pat_index,
 			 u32 flags);
 void intel_ggtt_unbind_vma(struct i915_address_space *vm,
 			   struct i915_vma_resource *vma_res);
@@ -639,7 +637,7 @@ void
 __set_pd_entry(struct i915_page_directory * const pd,
 	       const unsigned short idx,
 	       struct i915_page_table *pt,
-	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level));
+	       u64 (*encode)(const dma_addr_t, const unsigned int pat_index));
 
 #define set_pd_entry(pd, idx, to) \
 	__set_pd_entry((pd), (idx), px_pt(to), gen8_pde_encode)
@@ -659,7 +657,7 @@ void gen6_ggtt_invalidate(struct i915_ggtt *ggtt);
 void ppgtt_bind_vma(struct i915_address_space *vm,
 		    struct i915_vm_pt_stash *stash,
 		    struct i915_vma_resource *vma_res,
-		    enum i915_cache_level cache_level,
+		    unsigned int pat_index,
 		    u32 flags);
 void ppgtt_unbind_vma(struct i915_address_space *vm,
 		      struct i915_vma_resource *vma_res);
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 3f638f198796..117c3d05af3e 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -45,7 +45,9 @@ static void xehpsdv_toggle_pdes(struct i915_address_space *vm,
 	 * Insert a dummy PTE into every PT that will map to LMEM to ensure
 	 * we have a correctly setup PDE structure for later use.
 	 */
-	vm->insert_page(vm, 0, d->offset, I915_CACHE_NONE, PTE_LM);
+	vm->insert_page(vm, 0, d->offset,
+			i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
+			PTE_LM);
 	GEM_BUG_ON(!pt->is_compact);
 	d->offset += SZ_2M;
 }
@@ -63,7 +65,9 @@ static void xehpsdv_insert_pte(struct i915_address_space *vm,
 	 * alignment is 64K underneath for the pt, and we are careful
 	 * not to access the space in the void.
 	 */
-	vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE, PTE_LM);
+	vm->insert_page(vm, px_dma(pt), d->offset,
+			i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
+			PTE_LM);
 	d->offset += SZ_64K;
 }
 
@@ -73,7 +77,8 @@ static void insert_pte(struct i915_address_space *vm,
 {
 	struct insert_pte_data *d = data;
 
-	vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE,
+	vm->insert_page(vm, px_dma(pt), d->offset,
+			i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
 			i915_gem_object_is_lmem(pt->base) ? PTE_LM : 0);
 	d->offset += PAGE_SIZE;
 }
@@ -356,13 +361,13 @@ static int max_pte_pkt_size(struct i915_request *rq, int pkt)
 
 static int emit_pte(struct i915_request *rq,
 		    struct sgt_dma *it,
-		    enum i915_cache_level cache_level,
+		    unsigned int pat_index,
 		    bool is_lmem,
 		    u64 offset,
 		    int length)
 {
 	bool has_64K_pages = HAS_64K_PAGES(rq->engine->i915);
-	const u64 encode = rq->context->vm->pte_encode(0, cache_level,
+	const u64 encode = rq->context->vm->pte_encode(0, pat_index,
 						       is_lmem ? PTE_LM : 0);
 	struct intel_ring *ring = rq->ring;
 	int pkt, dword_length;
@@ -673,17 +678,17 @@ int
 intel_context_migrate_copy(struct intel_context *ce,
 			   const struct i915_deps *deps,
 			   struct scatterlist *src,
-			   enum i915_cache_level src_cache_level,
+			   unsigned int src_pat_index,
 			   bool src_is_lmem,
 			   struct scatterlist *dst,
-			   enum i915_cache_level dst_cache_level,
+			   unsigned int dst_pat_index,
 			   bool dst_is_lmem,
 			   struct i915_request **out)
 {
 	struct sgt_dma it_src = sg_sgt(src), it_dst = sg_sgt(dst), it_ccs;
 	struct drm_i915_private *i915 = ce->engine->i915;
 	u64 ccs_bytes_to_cpy = 0, bytes_to_cpy;
-	enum i915_cache_level ccs_cache_level;
+	unsigned int ccs_pat_index;
 	u32 src_offset, dst_offset;
 	u8 src_access, dst_access;
 	struct i915_request *rq;
@@ -707,12 +712,12 @@ intel_context_migrate_copy(struct intel_context *ce,
 		dst_sz = scatter_list_length(dst);
 		if (src_is_lmem) {
 			it_ccs = it_dst;
-			ccs_cache_level = dst_cache_level;
+			ccs_pat_index = dst_pat_index;
 			ccs_is_src = false;
 		} else if (dst_is_lmem) {
 			bytes_to_cpy = dst_sz;
 			it_ccs = it_src;
-			ccs_cache_level = src_cache_level;
+			ccs_pat_index = src_pat_index;
 			ccs_is_src = true;
 		}
 
@@ -773,7 +778,7 @@ intel_context_migrate_copy(struct intel_context *ce,
 		src_sz = calculate_chunk_sz(i915, src_is_lmem,
 					    bytes_to_cpy, ccs_bytes_to_cpy);
 
-		len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem,
+		len = emit_pte(rq, &it_src, src_pat_index, src_is_lmem,
 			       src_offset, src_sz);
 		if (!len) {
 			err = -EINVAL;
@@ -784,7 +789,7 @@ intel_context_migrate_copy(struct intel_context *ce,
 			goto out_rq;
 		}
 
-		err = emit_pte(rq, &it_dst, dst_cache_level, dst_is_lmem,
+		err = emit_pte(rq, &it_dst, dst_pat_index, dst_is_lmem,
 			       dst_offset, len);
 		if (err < 0)
 			goto out_rq;
@@ -811,7 +816,7 @@ intel_context_migrate_copy(struct intel_context *ce,
 				goto out_rq;
 
 			ccs_sz = GET_CCS_BYTES(i915, len);
-			err = emit_pte(rq, &it_ccs, ccs_cache_level, false,
+			err = emit_pte(rq, &it_ccs, ccs_pat_index, false,
 				       ccs_is_src ? src_offset : dst_offset,
 				       ccs_sz);
 			if (err < 0)
@@ -979,7 +984,7 @@ int
 intel_context_migrate_clear(struct intel_context *ce,
 			    const struct i915_deps *deps,
 			    struct scatterlist *sg,
-			    enum i915_cache_level cache_level,
+			    unsigned int pat_index,
 			    bool is_lmem,
 			    u32 value,
 			    struct i915_request **out)
@@ -1027,7 +1032,7 @@ intel_context_migrate_clear(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		len = emit_pte(rq, &it, cache_level, is_lmem, offset, CHUNK_SZ);
+		len = emit_pte(rq, &it, pat_index, is_lmem, offset, CHUNK_SZ);
 		if (len <= 0) {
 			err = len;
 			goto out_rq;
@@ -1074,10 +1079,10 @@ int intel_migrate_copy(struct intel_migrate *m,
 		       struct i915_gem_ww_ctx *ww,
 		       const struct i915_deps *deps,
 		       struct scatterlist *src,
-		       enum i915_cache_level src_cache_level,
+		       unsigned int src_pat_index,
 		       bool src_is_lmem,
 		       struct scatterlist *dst,
-		       enum i915_cache_level dst_cache_level,
+		       unsigned int dst_pat_index,
 		       bool dst_is_lmem,
 		       struct i915_request **out)
 {
@@ -1098,8 +1103,8 @@ int intel_migrate_copy(struct intel_migrate *m,
 		goto out;
 
 	err = intel_context_migrate_copy(ce, deps,
-					 src, src_cache_level, src_is_lmem,
-					 dst, dst_cache_level, dst_is_lmem,
+					 src, src_pat_index, src_is_lmem,
+					 dst, dst_pat_index, dst_is_lmem,
 					 out);
 
 	intel_context_unpin(ce);
@@ -1113,7 +1118,7 @@ intel_migrate_clear(struct intel_migrate *m,
 		    struct i915_gem_ww_ctx *ww,
 		    const struct i915_deps *deps,
 		    struct scatterlist *sg,
-		    enum i915_cache_level cache_level,
+		    unsigned int pat_index,
 		    bool is_lmem,
 		    u32 value,
 		    struct i915_request **out)
@@ -1134,7 +1139,7 @@ intel_migrate_clear(struct intel_migrate *m,
 	if (err)
 		goto out;
 
-	err = intel_context_migrate_clear(ce, deps, sg, cache_level,
+	err = intel_context_migrate_clear(ce, deps, sg, pat_index,
 					  is_lmem, value, out);
 
 	intel_context_unpin(ce);
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.h b/drivers/gpu/drm/i915/gt/intel_migrate.h
index ccc677ec4aa3..11fc09a00c4b 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.h
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.h
@@ -16,7 +16,6 @@ struct i915_request;
 struct i915_gem_ww_ctx;
 struct intel_gt;
 struct scatterlist;
-enum i915_cache_level;
 
 int intel_migrate_init(struct intel_migrate *m, struct intel_gt *gt);
 
@@ -26,20 +25,20 @@ int intel_migrate_copy(struct intel_migrate *m,
 		       struct i915_gem_ww_ctx *ww,
 		       const struct i915_deps *deps,
 		       struct scatterlist *src,
-		       enum i915_cache_level src_cache_level,
+		       unsigned int src_pat_index,
 		       bool src_is_lmem,
 		       struct scatterlist *dst,
-		       enum i915_cache_level dst_cache_level,
+		       unsigned int dst_pat_index,
 		       bool dst_is_lmem,
 		       struct i915_request **out);
 
 int intel_context_migrate_copy(struct intel_context *ce,
 			       const struct i915_deps *deps,
 			       struct scatterlist *src,
-			       enum i915_cache_level src_cache_level,
+			       unsigned int src_pat_index,
 			       bool src_is_lmem,
 			       struct scatterlist *dst,
-			       enum i915_cache_level dst_cache_level,
+			       unsigned int dst_pat_index,
 			       bool dst_is_lmem,
 			       struct i915_request **out);
 
@@ -48,7 +47,7 @@ intel_migrate_clear(struct intel_migrate *m,
 		    struct i915_gem_ww_ctx *ww,
 		    const struct i915_deps *deps,
 		    struct scatterlist *sg,
-		    enum i915_cache_level cache_level,
+		    unsigned int pat_index,
 		    bool is_lmem,
 		    u32 value,
 		    struct i915_request **out);
@@ -56,7 +55,7 @@ int
 intel_context_migrate_clear(struct intel_context *ce,
 			    const struct i915_deps *deps,
 			    struct scatterlist *sg,
-			    enum i915_cache_level cache_level,
+			    unsigned int pat_index,
 			    bool is_lmem,
 			    u32 value,
 			    struct i915_request **out);
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index 7ecfa672f738..f0da3555c6db 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -98,7 +98,7 @@ void
 __set_pd_entry(struct i915_page_directory * const pd,
 	       const unsigned short idx,
 	       struct i915_page_table * const to,
-	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level))
+	       u64 (*encode)(const dma_addr_t, const unsigned int))
 {
 	/* Each thread pre-pins the pd, and we may have a thread per pde. */
 	GEM_BUG_ON(atomic_read(px_used(pd)) > NALLOC * I915_PDES);
@@ -181,7 +181,7 @@ struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt,
 void ppgtt_bind_vma(struct i915_address_space *vm,
 		    struct i915_vm_pt_stash *stash,
 		    struct i915_vma_resource *vma_res,
-		    enum i915_cache_level cache_level,
+		    unsigned int pat_index,
 		    u32 flags)
 {
 	u32 pte_flags;
@@ -199,7 +199,7 @@ void ppgtt_bind_vma(struct i915_address_space *vm,
 	if (vma_res->bi.lmem)
 		pte_flags |= PTE_LM;
 
-	vm->insert_entries(vm, vma_res, cache_level, pte_flags);
+	vm->insert_entries(vm, vma_res, pat_index, pte_flags);
 	wmb();
 }
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_migrate.c b/drivers/gpu/drm/i915/gt/selftest_migrate.c
index e677f2da093d..3def5ca72dec 100644
--- a/drivers/gpu/drm/i915/gt/selftest_migrate.c
+++ b/drivers/gpu/drm/i915/gt/selftest_migrate.c
@@ -137,7 +137,7 @@ static int copy(struct intel_migrate *migrate,
 static int intel_context_copy_ccs(struct intel_context *ce,
 				  const struct i915_deps *deps,
 				  struct scatterlist *sg,
-				  enum i915_cache_level cache_level,
+				  unsigned int pat_index,
 				  bool write_to_ccs,
 				  struct i915_request **out)
 {
@@ -185,7 +185,7 @@ static int intel_context_copy_ccs(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		len = emit_pte(rq, &it, cache_level, true, offset, CHUNK_SZ);
+		len = emit_pte(rq, &it, pat_index, true, offset, CHUNK_SZ);
 		if (len <= 0) {
 			err = len;
 			goto out_rq;
@@ -223,7 +223,7 @@ intel_migrate_ccs_copy(struct intel_migrate *m,
 		       struct i915_gem_ww_ctx *ww,
 		       const struct i915_deps *deps,
 		       struct scatterlist *sg,
-		       enum i915_cache_level cache_level,
+		       unsigned int pat_index,
 		       bool write_to_ccs,
 		       struct i915_request **out)
 {
@@ -243,7 +243,7 @@ intel_migrate_ccs_copy(struct intel_migrate *m,
 	if (err)
 		goto out;
 
-	err = intel_context_copy_ccs(ce, deps, sg, cache_level,
+	err = intel_context_copy_ccs(ce, deps, sg, pat_index,
 				     write_to_ccs, out);
 
 	intel_context_unpin(ce);
@@ -300,7 +300,7 @@ static int clear(struct intel_migrate *migrate,
 			/* Write the obj data into ccs surface */
 			err = intel_migrate_ccs_copy(migrate, &ww, NULL,
 						     obj->mm.pages->sgl,
-						     obj->cache_level,
+						     obj->pat_index,
 						     true, &rq);
 			if (rq && !err) {
 				if (i915_request_wait(rq, 0, HZ) < 0) {
@@ -351,7 +351,7 @@ static int clear(struct intel_migrate *migrate,
 
 			err = intel_migrate_ccs_copy(migrate, &ww, NULL,
 						     obj->mm.pages->sgl,
-						     obj->cache_level,
+						     obj->pat_index,
 						     false, &rq);
 			if (rq && !err) {
 				if (i915_request_wait(rq, 0, HZ) < 0) {
@@ -414,9 +414,9 @@ static int __migrate_copy(struct intel_migrate *migrate,
 			  struct i915_request **out)
 {
 	return intel_migrate_copy(migrate, ww, NULL,
-				  src->mm.pages->sgl, src->cache_level,
+				  src->mm.pages->sgl, src->pat_index,
 				  i915_gem_object_is_lmem(src),
-				  dst->mm.pages->sgl, dst->cache_level,
+				  dst->mm.pages->sgl, dst->pat_index,
 				  i915_gem_object_is_lmem(dst),
 				  out);
 }
@@ -428,9 +428,9 @@ static int __global_copy(struct intel_migrate *migrate,
 			 struct i915_request **out)
 {
 	return intel_context_migrate_copy(migrate->context, NULL,
-					  src->mm.pages->sgl, src->cache_level,
+					  src->mm.pages->sgl, src->pat_index,
 					  i915_gem_object_is_lmem(src),
-					  dst->mm.pages->sgl, dst->cache_level,
+					  dst->mm.pages->sgl, dst->pat_index,
 					  i915_gem_object_is_lmem(dst),
 					  out);
 }
@@ -455,7 +455,7 @@ static int __migrate_clear(struct intel_migrate *migrate,
 {
 	return intel_migrate_clear(migrate, ww, NULL,
 				   obj->mm.pages->sgl,
-				   obj->cache_level,
+				   obj->pat_index,
 				   i915_gem_object_is_lmem(obj),
 				   value, out);
 }
@@ -468,7 +468,7 @@ static int __global_clear(struct intel_migrate *migrate,
 {
 	return intel_context_migrate_clear(migrate->context, NULL,
 					   obj->mm.pages->sgl,
-					   obj->cache_level,
+					   obj->pat_index,
 					   i915_gem_object_is_lmem(obj),
 					   value, out);
 }
@@ -648,7 +648,7 @@ static int live_emit_pte_full_ring(void *arg)
 	 */
 	pr_info("%s emite_pte ring space=%u\n", __func__, rq->ring->space);
 	it = sg_sgt(obj->mm.pages->sgl);
-	len = emit_pte(rq, &it, obj->cache_level, false, 0, CHUNK_SZ);
+	len = emit_pte(rq, &it, obj->pat_index, false, 0, CHUNK_SZ);
 	if (!len) {
 		err = -EINVAL;
 		goto out_rq;
@@ -844,7 +844,7 @@ static int wrap_ktime_compare(const void *A, const void *B)
 
 static int __perf_clear_blt(struct intel_context *ce,
 			    struct scatterlist *sg,
-			    enum i915_cache_level cache_level,
+			    unsigned int pat_index,
 			    bool is_lmem,
 			    size_t sz)
 {
@@ -858,7 +858,7 @@ static int __perf_clear_blt(struct intel_context *ce,
 
 		t0 = ktime_get();
 
-		err = intel_context_migrate_clear(ce, NULL, sg, cache_level,
+		err = intel_context_migrate_clear(ce, NULL, sg, pat_index,
 						  is_lmem, 0, &rq);
 		if (rq) {
 			if (i915_request_wait(rq, 0, MAX_SCHEDULE_TIMEOUT) < 0)
@@ -904,7 +904,8 @@ static int perf_clear_blt(void *arg)
 
 		err = __perf_clear_blt(gt->migrate.context,
 				       dst->mm.pages->sgl,
-				       I915_CACHE_NONE,
+				       i915_gem_get_pat_index(gt->i915,
+							      I915_CACHE_NONE),
 				       i915_gem_object_is_lmem(dst),
 				       sizes[i]);
 
@@ -919,10 +920,10 @@ static int perf_clear_blt(void *arg)
 
 static int __perf_copy_blt(struct intel_context *ce,
 			   struct scatterlist *src,
-			   enum i915_cache_level src_cache_level,
+			   unsigned int src_pat_index,
 			   bool src_is_lmem,
 			   struct scatterlist *dst,
-			   enum i915_cache_level dst_cache_level,
+			   unsigned int dst_pat_index,
 			   bool dst_is_lmem,
 			   size_t sz)
 {
@@ -937,9 +938,9 @@ static int __perf_copy_blt(struct intel_context *ce,
 		t0 = ktime_get();
 
 		err = intel_context_migrate_copy(ce, NULL,
-						 src, src_cache_level,
+						 src, src_pat_index,
 						 src_is_lmem,
-						 dst, dst_cache_level,
+						 dst, dst_pat_index,
 						 dst_is_lmem,
 						 &rq);
 		if (rq) {
@@ -994,10 +995,12 @@ static int perf_copy_blt(void *arg)
 
 		err = __perf_copy_blt(gt->migrate.context,
 				      src->mm.pages->sgl,
-				      I915_CACHE_NONE,
+				      i915_gem_get_pat_index(gt->i915,
+							     I915_CACHE_NONE),
 				      i915_gem_object_is_lmem(src),
 				      dst->mm.pages->sgl,
-				      I915_CACHE_NONE,
+				      i915_gem_get_pat_index(gt->i915,
+							     I915_CACHE_NONE),
 				      i915_gem_object_is_lmem(dst),
 				      sz);
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c
index a9e0a91bc0e0..79aa6ac66ad2 100644
--- a/drivers/gpu/drm/i915/gt/selftest_reset.c
+++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
@@ -86,7 +86,9 @@ __igt_reset_stolen(struct intel_gt *gt,
 
 		ggtt->vm.insert_page(&ggtt->vm, dma,
 				     ggtt->error_capture.start,
-				     I915_CACHE_NONE, 0);
+				     i915_gem_get_pat_index(gt->i915,
+							    I915_CACHE_NONE),
+				     0);
 		mb();
 
 		s = io_mapping_map_wc(&ggtt->iomap,
@@ -127,7 +129,9 @@ __igt_reset_stolen(struct intel_gt *gt,
 
 		ggtt->vm.insert_page(&ggtt->vm, dma,
 				     ggtt->error_capture.start,
-				     I915_CACHE_NONE, 0);
+				     i915_gem_get_pat_index(gt->i915,
+							    I915_CACHE_NONE),
+				     0);
 		mb();
 
 		s = io_mapping_map_wc(&ggtt->iomap,
diff --git a/drivers/gpu/drm/i915/gt/selftest_timeline.c b/drivers/gpu/drm/i915/gt/selftest_timeline.c
index 9f536c251179..39c3ec12df1a 100644
--- a/drivers/gpu/drm/i915/gt/selftest_timeline.c
+++ b/drivers/gpu/drm/i915/gt/selftest_timeline.c
@@ -836,7 +836,7 @@ static int setup_watcher(struct hwsp_watcher *w, struct intel_gt *gt,
 		return PTR_ERR(obj);
 
 	/* keep the same cache settings as timeline */
-	i915_gem_object_set_cache_coherency(obj, tl->hwsp_ggtt->obj->cache_level);
+	i915_gem_object_set_pat_index(obj, tl->hwsp_ggtt->obj->pat_index);
 	w->map = i915_gem_object_pin_map_unlocked(obj,
 						  page_unmask_bits(tl->hwsp_ggtt->obj->mm.mapping));
 	if (IS_ERR(w->map)) {
diff --git a/drivers/gpu/drm/i915/gt/selftest_tlb.c b/drivers/gpu/drm/i915/gt/selftest_tlb.c
index e6cac1f15d6e..4493c8518e91 100644
--- a/drivers/gpu/drm/i915/gt/selftest_tlb.c
+++ b/drivers/gpu/drm/i915/gt/selftest_tlb.c
@@ -36,6 +36,8 @@ pte_tlbinv(struct intel_context *ce,
 	   u64 length,
 	   struct rnd_state *prng)
 {
+	const unsigned int pat_index =
+		i915_gem_get_pat_index(ce->vm->i915, I915_CACHE_NONE);
 	struct drm_i915_gem_object *batch;
 	struct drm_mm_node vb_node;
 	struct i915_request *rq;
@@ -155,7 +157,7 @@ pte_tlbinv(struct intel_context *ce,
 		/* Flip the PTE between A and B */
 		if (i915_gem_object_is_lmem(vb->obj))
 			pte_flags |= PTE_LM;
-		ce->vm->insert_entries(ce->vm, &vb_res, 0, pte_flags);
+		ce->vm->insert_entries(ce->vm, &vb_res, pat_index, pte_flags);
 
 		/* Flush the PTE update to concurrent HW */
 		tlbinv(ce->vm, addr & -length, length);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
index a82a53dbbc86..145681ae20a5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
@@ -890,9 +890,15 @@ static void uc_fw_bind_ggtt(struct intel_uc_fw *uc_fw)
 		pte_flags |= PTE_LM;
 
 	if (ggtt->vm.raw_insert_entries)
-		ggtt->vm.raw_insert_entries(&ggtt->vm, dummy, I915_CACHE_NONE, pte_flags);
+		ggtt->vm.raw_insert_entries(&ggtt->vm, dummy,
+					    i915_gem_get_pat_index(ggtt->vm.i915,
+								   I915_CACHE_NONE),
+					    pte_flags);
 	else
-		ggtt->vm.insert_entries(&ggtt->vm, dummy, I915_CACHE_NONE, pte_flags);
+		ggtt->vm.insert_entries(&ggtt->vm, dummy,
+					i915_gem_get_pat_index(ggtt->vm.i915,
+							       I915_CACHE_NONE),
+					pte_flags);
 }
 
 static void uc_fw_unbind_ggtt(struct intel_uc_fw *uc_fw)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 41389a32e998..9a4922da3a71 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -139,21 +139,56 @@ static const char *stringify_vma_type(const struct i915_vma *vma)
 	return "ppgtt";
 }
 
-static const char *i915_cache_level_str(struct drm_i915_private *i915, int type)
-{
-	switch (type) {
-	case I915_CACHE_NONE: return " uncached";
-	case I915_CACHE_LLC: return HAS_LLC(i915) ? " LLC" : " snooped";
-	case I915_CACHE_L3_LLC: return " L3+LLC";
-	case I915_CACHE_WT: return " WT";
-	default: return "";
+static const char *i915_cache_level_str(struct drm_i915_gem_object *obj)
+{
+	struct drm_i915_private *i915 = obj_to_i915(obj);
+
+	if (IS_METEORLAKE(i915)) {
+		switch (obj->pat_index) {
+		case 0: return " WB";
+		case 1: return " WT";
+		case 2: return " UC";
+		case 3: return " WB (1-Way Coh)";
+		case 4: return " WB (2-Way Coh)";
+		default: return " not defined";
+		}
+	} else if (IS_PONTEVECCHIO(i915)) {
+		switch (obj->pat_index) {
+		case 0: return " UC";
+		case 1: return " WC";
+		case 2: return " WT";
+		case 3: return " WB";
+		case 4: return " WT (CLOS1)";
+		case 5: return " WB (CLOS1)";
+		case 6: return " WT (CLOS2)";
+		case 7: return " WT (CLOS2)";
+		default: return " not defined";
+		}
+	} else if (GRAPHICS_VER(i915) >= 12) {
+		switch (obj->pat_index) {
+		case 0: return " WB";
+		case 1: return " WC";
+		case 2: return " WT";
+		case 3: return " UC";
+		default: return " not defined";
+		}
+	} else {
+		if (i915_gem_object_has_cache_level(obj, I915_CACHE_NONE))
+			return " uncached";
+		else if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC))
+			return HAS_LLC(i915) ? " LLC" : " snooped";
+		else if (i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
+			return " L3+LLC";
+		else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
+			return " WT";
+		else
+			return " not defined";
 	}
 }
 
 void
 i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 {
-	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
 	struct i915_vma *vma;
 	int pin_count = 0;
 
@@ -165,7 +200,7 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		   obj->base.size / 1024,
 		   obj->read_domains,
 		   obj->write_domain,
-		   i915_cache_level_str(dev_priv, obj->cache_level),
+		   i915_cache_level_str(obj),
 		   obj->mm.dirty ? " dirty" : "",
 		   obj->mm.madv == I915_MADV_DONTNEED ? " purgeable" : "");
 	if (obj->base.name)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0a78bdbd36b1..63207b0740b3 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -420,8 +420,12 @@ i915_gem_gtt_pread(struct drm_i915_gem_object *obj,
 		page_length = remain < page_length ? remain : page_length;
 		if (drm_mm_node_allocated(&node)) {
 			ggtt->vm.insert_page(&ggtt->vm,
-					     i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
-					     node.start, I915_CACHE_NONE, 0);
+					i915_gem_object_get_dma_address(obj,
+									offset >> PAGE_SHIFT),
+					node.start,
+					i915_gem_get_pat_index(i915,
+							       I915_CACHE_NONE),
+					0);
 		} else {
 			page_base += offset & PAGE_MASK;
 		}
@@ -598,8 +602,12 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
 			/* flush the write before we modify the GGTT */
 			intel_gt_flush_ggtt_writes(ggtt->vm.gt);
 			ggtt->vm.insert_page(&ggtt->vm,
-					     i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
-					     node.start, I915_CACHE_NONE, 0);
+					i915_gem_object_get_dma_address(obj,
+									offset >> PAGE_SHIFT),
+					node.start,
+					i915_gem_get_pat_index(i915,
+							       I915_CACHE_NONE),
+					0);
 			wmb(); /* flush modifications to the GGTT (insert_page) */
 		} else {
 			page_base += offset & PAGE_MASK;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index f020c0086fbc..2556cabea02c 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1117,10 +1117,14 @@ i915_vma_coredump_create(const struct intel_gt *gt,
 			mutex_lock(&ggtt->error_mutex);
 			if (ggtt->vm.raw_insert_page)
 				ggtt->vm.raw_insert_page(&ggtt->vm, dma, slot,
-							 I915_CACHE_NONE, 0);
+						i915_gem_get_pat_index(gt->i915,
+								       I915_CACHE_NONE),
+						0);
 			else
 				ggtt->vm.insert_page(&ggtt->vm, dma, slot,
-						     I915_CACHE_NONE, 0);
+						i915_gem_get_pat_index(gt->i915,
+								       I915_CACHE_NONE),
+						0);
 			mb();
 
 			s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 20a44788999e..a814775a363d 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -315,7 +315,7 @@ struct i915_vma_work {
 	struct i915_vma_resource *vma_res;
 	struct drm_i915_gem_object *obj;
 	struct i915_sw_dma_fence_cb cb;
-	enum i915_cache_level cache_level;
+	unsigned int pat_index;
 	unsigned int flags;
 };
 
@@ -334,7 +334,7 @@ static void __vma_bind(struct dma_fence_work *work)
 		return;
 
 	vma_res->ops->bind_vma(vma_res->vm, &vw->stash,
-			       vma_res, vw->cache_level, vw->flags);
+			       vma_res, vw->pat_index, vw->flags);
 }
 
 static void __vma_release(struct dma_fence_work *work)
@@ -426,7 +426,7 @@ i915_vma_resource_init_from_vma(struct i915_vma_resource *vma_res,
 /**
  * i915_vma_bind - Sets up PTEs for a VMA in its corresponding address space.
  * @vma: VMA to map
- * @cache_level: mapping cache level
+ * @pat_index: PAT index to set in PTE
  * @flags: flags like global or local mapping
  * @work: preallocated worker for allocating and binding the PTE
  * @vma_res: pointer to a preallocated vma resource. The resource is either
@@ -437,7 +437,7 @@ i915_vma_resource_init_from_vma(struct i915_vma_resource *vma_res,
  * Note that DMA addresses are also the only part of the SG table we care about.
  */
 int i915_vma_bind(struct i915_vma *vma,
-		  enum i915_cache_level cache_level,
+		  unsigned int pat_index,
 		  u32 flags,
 		  struct i915_vma_work *work,
 		  struct i915_vma_resource *vma_res)
@@ -507,7 +507,7 @@ int i915_vma_bind(struct i915_vma *vma,
 		struct dma_fence *prev;
 
 		work->vma_res = i915_vma_resource_get(vma->resource);
-		work->cache_level = cache_level;
+		work->pat_index = pat_index;
 		work->flags = bind_flags;
 
 		/*
@@ -537,7 +537,7 @@ int i915_vma_bind(struct i915_vma *vma,
 
 			return ret;
 		}
-		vma->ops->bind_vma(vma->vm, NULL, vma->resource, cache_level,
+		vma->ops->bind_vma(vma->vm, NULL, vma->resource, pat_index,
 				   bind_flags);
 	}
 
@@ -814,7 +814,7 @@ i915_vma_insert(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 	color = 0;
 
 	if (i915_vm_has_cache_coloring(vma->vm))
-		color = vma->obj->cache_level;
+		color = vma->obj->pat_index;
 
 	if (flags & PIN_OFFSET_FIXED) {
 		u64 offset = flags & PIN_OFFSET_MASK;
@@ -1518,7 +1518,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 
 	GEM_BUG_ON(!vma->pages);
 	err = i915_vma_bind(vma,
-			    vma->obj->cache_level,
+			    vma->obj->pat_index,
 			    flags, work, vma_res);
 	vma_res = NULL;
 	if (err)
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index ed5c9d682a1b..31a8f8aa5558 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -250,7 +250,7 @@ i915_vma_compare(struct i915_vma *vma,
 
 struct i915_vma_work *i915_vma_work(void);
 int i915_vma_bind(struct i915_vma *vma,
-		  enum i915_cache_level cache_level,
+		  unsigned int pat_index,
 		  u32 flags,
 		  struct i915_vma_work *work,
 		  struct i915_vma_resource *vma_res);
diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
index 77fda2244d16..64472b7f0e77 100644
--- a/drivers/gpu/drm/i915/i915_vma_types.h
+++ b/drivers/gpu/drm/i915/i915_vma_types.h
@@ -32,8 +32,6 @@
 
 #include "gem/i915_gem_object_types.h"
 
-enum i915_cache_level;
-
 /**
  * DOC: Global GTT views
  *
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem.c b/drivers/gpu/drm/i915/selftests/i915_gem.c
index d91d0ade8abd..61da4ed9d521 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem.c
@@ -57,7 +57,10 @@ static void trash_stolen(struct drm_i915_private *i915)
 		u32 __iomem *s;
 		int x;
 
-		ggtt->vm.insert_page(&ggtt->vm, dma, slot, I915_CACHE_NONE, 0);
+		ggtt->vm.insert_page(&ggtt->vm, dma, slot,
+				     i915_gem_get_pat_index(i915,
+							    I915_CACHE_NONE),
+				     0);
 
 		s = io_mapping_map_atomic_wc(&ggtt->iomap, slot);
 		for (x = 0; x < PAGE_SIZE / sizeof(u32); x++) {
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
index 37068542aafe..f13a4d265814 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
@@ -245,7 +245,7 @@ static int igt_evict_for_cache_color(void *arg)
 	struct drm_mm_node target = {
 		.start = I915_GTT_PAGE_SIZE * 2,
 		.size = I915_GTT_PAGE_SIZE,
-		.color = I915_CACHE_LLC,
+		.color = i915_gem_get_pat_index(gt->i915, I915_CACHE_LLC),
 	};
 	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
@@ -308,7 +308,7 @@ static int igt_evict_for_cache_color(void *arg)
 	/* Attempt to remove the first *pinned* vma, by removing the (empty)
 	 * neighbour -- this should fail.
 	 */
-	target.color = I915_CACHE_L3_LLC;
+	target.color = i915_gem_get_pat_index(gt->i915, I915_CACHE_L3_LLC);
 
 	mutex_lock(&ggtt->vm.mutex);
 	err = i915_gem_evict_for_node(&ggtt->vm, NULL, &target, 0);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 154801f1c468..36940ef10108 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -135,7 +135,7 @@ fake_dma_object(struct drm_i915_private *i915, u64 size)
 
 	obj->write_domain = I915_GEM_DOMAIN_CPU;
 	obj->read_domains = I915_GEM_DOMAIN_CPU;
-	obj->cache_level = I915_CACHE_NONE;
+	obj->pat_index = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
 
 	/* Preallocate the "backing storage" */
 	if (i915_gem_object_pin_pages_unlocked(obj))
@@ -359,7 +359,9 @@ static int lowlevel_hole(struct i915_address_space *vm,
 
 			with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
 			  vm->insert_entries(vm, mock_vma_res,
-						   I915_CACHE_NONE, 0);
+					     i915_gem_get_pat_index(vm->i915,
+								    I915_CACHE_NONE),
+					     0);
 		}
 		count = n;
 
@@ -1377,7 +1379,10 @@ static int igt_ggtt_page(void *arg)
 
 		ggtt->vm.insert_page(&ggtt->vm,
 				     i915_gem_object_get_dma_address(obj, 0),
-				     offset, I915_CACHE_NONE, 0);
+				     offset,
+				     i915_gem_get_pat_index(i915,
+							    I915_CACHE_NONE),
+				     0);
 	}
 
 	order = i915_random_order(count, &prng);
@@ -1510,7 +1515,7 @@ static int reserve_gtt_with_resource(struct i915_vma *vma, u64 offset)
 	mutex_lock(&vm->mutex);
 	err = i915_gem_gtt_reserve(vm, NULL, &vma->node, obj->base.size,
 				   offset,
-				   obj->cache_level,
+				   obj->pat_index,
 				   0);
 	if (!err) {
 		i915_vma_resource_init_from_vma(vma_res, vma);
@@ -1690,7 +1695,7 @@ static int insert_gtt_with_resource(struct i915_vma *vma)
 
 	mutex_lock(&vm->mutex);
 	err = i915_gem_gtt_insert(vm, NULL, &vma->node, obj->base.size, 0,
-				  obj->cache_level, 0, vm->total, 0);
+				  obj->pat_index, 0, vm->total, 0);
 	if (!err) {
 		i915_vma_resource_init_from_vma(vma_res, vma);
 		vma->resource = vma_res;
diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
index 3b18e5905c86..d985d9bae2e8 100644
--- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
@@ -1070,7 +1070,9 @@ static int igt_lmem_write_cpu(void *arg)
 	/* Put the pages into a known state -- from the gpu for added fun */
 	intel_engine_pm_get(engine);
 	err = intel_context_migrate_clear(engine->gt->migrate.context, NULL,
-					  obj->mm.pages->sgl, I915_CACHE_NONE,
+					  obj->mm.pages->sgl,
+					  i915_gem_get_pat_index(i915,
+								 I915_CACHE_NONE),
 					  true, 0xdeadbeaf, &rq);
 	if (rq) {
 		dma_resv_add_fence(obj->base.resv, &rq->fence,
diff --git a/drivers/gpu/drm/i915/selftests/mock_gtt.c b/drivers/gpu/drm/i915/selftests/mock_gtt.c
index ece97e4faacb..a516c0aa88fd 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gtt.c
@@ -27,21 +27,21 @@
 static void mock_insert_page(struct i915_address_space *vm,
 			     dma_addr_t addr,
 			     u64 offset,
-			     enum i915_cache_level level,
+			     unsigned int pat_index,
 			     u32 flags)
 {
 }
 
 static void mock_insert_entries(struct i915_address_space *vm,
 				struct i915_vma_resource *vma_res,
-				enum i915_cache_level level, u32 flags)
+				unsigned int pat_index, u32 flags)
 {
 }
 
 static void mock_bind_ppgtt(struct i915_address_space *vm,
 			    struct i915_vm_pt_stash *stash,
 			    struct i915_vma_resource *vma_res,
-			    enum i915_cache_level cache_level,
+			    unsigned int pat_index,
 			    u32 flags)
 {
 	GEM_BUG_ON(flags & I915_VMA_GLOBAL_BIND);
@@ -94,7 +94,7 @@ struct i915_ppgtt *mock_ppgtt(struct drm_i915_private *i915, const char *name)
 static void mock_bind_ggtt(struct i915_address_space *vm,
 			   struct i915_vm_pt_stash *stash,
 			   struct i915_vma_resource *vma_res,
-			   enum i915_cache_level cache_level,
+			   unsigned int pat_index,
 			   u32 flags)
 {
 }
-- 
2.25.1



* [Intel-gfx] [PATCH 7/8] drm/i915: use pat_index instead of cache_level
@ 2023-04-19 23:00   ` fei.yang
  0 siblings, 0 replies; 76+ messages in thread
From: fei.yang @ 2023-04-19 23:00 UTC (permalink / raw)
  To: intel-gfx; +Cc: Matt Roper, Chris Wilson, dri-devel

From: Fei Yang <fei.yang@intel.com>

Currently the KMD is using enum i915_cache_level to set the caching policy
for buffer objects. This is flaky because the PAT index, which really
controls the caching behavior in the PTE, has far more levels than what's
defined in the enum. In addition, the PAT index is platform dependent;
having to translate between i915_cache_level and the PAT index is
unreliable and makes the code more complicated.

From the UMD's perspective there is also a need to set the caching policy
for performance fine-tuning. It's much easier for the UMD to use the PAT
index directly, because the behavior of each PAT index is clearly defined
in the Bspec. Having the abstracted i915_cache_level sitting in between
would only cause more ambiguity.

For these reasons this patch replaces i915_cache_level with the PAT index.
Also note that cache_level is not completely removed yet: the KMD still
needs to create buffer objects with simple cache settings such as cached,
uncached, or writethrough. For such simple cases, using cache_level helps
keep the code simple.

Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
---
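
For KMD-internal call sites the translation is a single table lookup.
Below is a minimal sketch of the helper this patch leans on, reconstructed
from the i915_gem_object.c hunk further down; the drm_WARN_ON bounds check
is an assumption, not quoted from this mail:

unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
				    enum i915_cache_level level)
{
	if (drm_WARN_ON(&i915->drm, level >= I915_MAX_CACHE_LEVEL))
		return 0;

	return INTEL_INFO(i915)->cachelevel_to_pat[level];
}

Call sites that used to pass I915_CACHE_NONE directly now wrap it, as seen
throughout the hunks below:

	vm->insert_page(vm, addr, offset,
			i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
			pte_flags);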
 drivers/gpu/drm/i915/display/intel_dpt.c      | 12 +--
 drivers/gpu/drm/i915/gem/i915_gem_domain.c    | 27 ++----
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 10 ++-
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.c    | 52 +++++++++++-
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |  4 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  | 25 +++++-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |  4 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  | 16 ++--
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
 .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c    |  2 +-
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c          | 10 ++-
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 71 ++++++++--------
 drivers/gpu/drm/i915/gt/gen8_ppgtt.h          |  3 +-
 drivers/gpu/drm/i915/gt/intel_ggtt.c          | 82 +++++++++----------
 drivers/gpu/drm/i915/gt/intel_gtt.h           | 20 ++---
 drivers/gpu/drm/i915/gt/intel_migrate.c       | 47 ++++++-----
 drivers/gpu/drm/i915/gt/intel_migrate.h       | 13 ++-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  6 +-
 drivers/gpu/drm/i915/gt/selftest_migrate.c    | 47 ++++++-----
 drivers/gpu/drm/i915/gt/selftest_reset.c      |  8 +-
 drivers/gpu/drm/i915/gt/selftest_timeline.c   |  2 +-
 drivers/gpu/drm/i915/gt/selftest_tlb.c        |  4 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      | 10 ++-
 drivers/gpu/drm/i915/i915_debugfs.c           | 55 ++++++++++---
 drivers/gpu/drm/i915/i915_gem.c               | 16 +++-
 drivers/gpu/drm/i915/i915_gpu_error.c         |  8 +-
 drivers/gpu/drm/i915/i915_vma.c               | 16 ++--
 drivers/gpu/drm/i915/i915_vma.h               |  2 +-
 drivers/gpu/drm/i915/i915_vma_types.h         |  2 -
 drivers/gpu/drm/i915/selftests/i915_gem.c     |  5 +-
 .../gpu/drm/i915/selftests/i915_gem_evict.c   |  4 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 15 ++--
 .../drm/i915/selftests/intel_memory_region.c  |  4 +-
 drivers/gpu/drm/i915/selftests/mock_gtt.c     |  8 +-
 36 files changed, 378 insertions(+), 239 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c
index c5eacfdba1a5..7c5fddb203ba 100644
--- a/drivers/gpu/drm/i915/display/intel_dpt.c
+++ b/drivers/gpu/drm/i915/display/intel_dpt.c
@@ -43,24 +43,24 @@ static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
 static void dpt_insert_page(struct i915_address_space *vm,
 			    dma_addr_t addr,
 			    u64 offset,
-			    enum i915_cache_level level,
+			    unsigned int pat_index,
 			    u32 flags)
 {
 	struct i915_dpt *dpt = i915_vm_to_dpt(vm);
 	gen8_pte_t __iomem *base = dpt->iomem;
 
 	gen8_set_pte(base + offset / I915_GTT_PAGE_SIZE,
-		     vm->pte_encode(addr, level, flags));
+		     vm->pte_encode(addr, pat_index, flags));
 }
 
 static void dpt_insert_entries(struct i915_address_space *vm,
 			       struct i915_vma_resource *vma_res,
-			       enum i915_cache_level level,
+			       unsigned int pat_index,
 			       u32 flags)
 {
 	struct i915_dpt *dpt = i915_vm_to_dpt(vm);
 	gen8_pte_t __iomem *base = dpt->iomem;
-	const gen8_pte_t pte_encode = vm->pte_encode(0, level, flags);
+	const gen8_pte_t pte_encode = vm->pte_encode(0, pat_index, flags);
 	struct sgt_iter sgt_iter;
 	dma_addr_t addr;
 	int i;
@@ -83,7 +83,7 @@ static void dpt_clear_range(struct i915_address_space *vm,
 static void dpt_bind_vma(struct i915_address_space *vm,
 			 struct i915_vm_pt_stash *stash,
 			 struct i915_vma_resource *vma_res,
-			 enum i915_cache_level cache_level,
+			 unsigned int pat_index,
 			 u32 flags)
 {
 	u32 pte_flags;
@@ -98,7 +98,7 @@ static void dpt_bind_vma(struct i915_address_space *vm,
 	if (vma_res->bi.lmem)
 		pte_flags |= PTE_LM;
 
-	vm->insert_entries(vm, vma_res, cache_level, pte_flags);
+	vm->insert_entries(vm, vma_res, pat_index, pte_flags);
 
 	vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
index bb3575b1479f..d5fd4c9cd9f8 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
@@ -27,8 +27,8 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
 	if (IS_DGFX(i915))
 		return false;
 
-	return !(obj->cache_level == I915_CACHE_NONE ||
-		 obj->cache_level == I915_CACHE_WT);
+	return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
+		 i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
 }
 
 bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
@@ -267,7 +267,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 {
 	int ret;
 
-	if (obj->cache_level == cache_level)
+	if (i915_gem_object_has_cache_level(obj, cache_level))
 		return 0;
 
 	ret = i915_gem_object_wait(obj,
@@ -278,10 +278,8 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 		return ret;
 
 	/* Always invalidate stale cachelines */
-	if (obj->cache_level != cache_level) {
-		i915_gem_object_set_cache_coherency(obj, cache_level);
-		obj->cache_dirty = true;
-	}
+	i915_gem_object_set_cache_coherency(obj, cache_level);
+	obj->cache_dirty = true;
 
 	/* The cache-level will be applied when each vma is rebound. */
 	return i915_gem_object_unbind(obj,
@@ -306,20 +304,13 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
 		goto out;
 	}
 
-	switch (obj->cache_level) {
-	case I915_CACHE_LLC:
-	case I915_CACHE_L3_LLC:
+	if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||
+	    i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
 		args->caching = I915_CACHING_CACHED;
-		break;
-
-	case I915_CACHE_WT:
+	else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
 		args->caching = I915_CACHING_DISPLAY;
-		break;
-
-	default:
+	else
 		args->caching = I915_CACHING_NONE;
-		break;
-	}
 out:
 	rcu_read_unlock();
 	return err;
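
The switch on obj->cache_level becomes a set of predicate calls here. A
sketch of the predicate's contract, matching the implementation added in
the i915_gem_object.c hunk further down (the comment wording is a summary,
not quoted from the patch):

/*
 * Returns true if the object's pat_index translates back to lvl, or if
 * the UMD has taken over the caching policy by setting pat_index
 * directly (cache_level == I915_CACHE_INVAL), in which case any queried
 * level is treated as a match.
 */
bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
				     enum i915_cache_level lvl);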
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 3aeede6aee4d..d42915516636 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -642,7 +642,7 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
 
 	return (cache->has_llc ||
 		obj->cache_dirty ||
-		obj->cache_level != I915_CACHE_NONE);
+		!i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
 }
 
 static int eb_reserve_vma(struct i915_execbuffer *eb,
@@ -1323,8 +1323,10 @@ static void *reloc_iomap(struct i915_vma *batch,
 	offset = cache->node.start;
 	if (drm_mm_node_allocated(&cache->node)) {
 		ggtt->vm.insert_page(&ggtt->vm,
-				     i915_gem_object_get_dma_address(obj, page),
-				     offset, I915_CACHE_NONE, 0);
+			i915_gem_object_get_dma_address(obj, page),
+			offset,
+			i915_gem_get_pat_index(ggtt->vm.i915, I915_CACHE_NONE),
+			0);
 	} else {
 		offset += page << PAGE_SHIFT;
 	}
@@ -1464,7 +1466,7 @@ eb_relocate_entry(struct i915_execbuffer *eb,
 			reloc_cache_unmap(&eb->reloc_cache);
 			mutex_lock(&vma->vm->mutex);
 			err = i915_vma_bind(target->vma,
-					    target->vma->obj->cache_level,
+					    target->vma->obj->pat_index,
 					    PIN_GLOBAL, NULL, NULL);
 			mutex_unlock(&vma->vm->mutex);
 			reloc_cache_remap(&eb->reloc_cache, ev->vma->obj);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index 3dbacdf0911a..50c30efa08a3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -383,7 +383,8 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
 	}
 
 	/* Access to snoopable pages through the GTT is incoherent. */
-	if (obj->cache_level != I915_CACHE_NONE && !HAS_LLC(i915)) {
+	if (!(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
+	      HAS_LLC(i915))) {
 		ret = -EFAULT;
 		goto err_unpin;
 	}
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 8c70a0ec7d2f..27c948350b5b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -54,6 +54,25 @@ unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
 	return INTEL_INFO(i915)->cachelevel_to_pat[level];
 }
 
+bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
+				     enum i915_cache_level lvl)
+{
+	/*
+	 * cache_level == I915_CACHE_INVAL indicates the UMD has set the
+	 * caching policy through pat_index, in which case the KMD should
+	 * leave coherency to be managed by user space; simply return
+	 * true here.
+	 */
+	if (obj->cache_level == I915_CACHE_INVAL)
+		return true;
+
+	/*
+	 * Otherwise the pat_index should have been converted from cache_level
+	 * so that the following comparison is valid.
+	 */
+	return obj->pat_index == i915_gem_get_pat_index(obj_to_i915(obj), lvl);
+}
+
 struct drm_i915_gem_object *i915_gem_object_alloc(void)
 {
 	struct drm_i915_gem_object *obj;
@@ -133,7 +152,7 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
 {
 	struct drm_i915_private *i915 = to_i915(obj->base.dev);
 
-	obj->cache_level = cache_level;
+	obj->pat_index = i915_gem_get_pat_index(i915, cache_level);
 
 	if (cache_level != I915_CACHE_NONE)
 		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
@@ -148,6 +167,37 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
 		!IS_DGFX(i915);
 }
 
+/**
+ * i915_gem_object_set_pat_index - set PAT index to be used in PTE encode
+ * @obj: #drm_i915_gem_object
+ * @pat_index: PAT index
+ *
+ * This is a clone of i915_gem_object_set_cache_coherency taking pat_index
+ * instead of cache_level as its second argument.
+ */
+void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
+				   unsigned int pat_index)
+{
+	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+
+	if (obj->pat_index == pat_index)
+		return;
+
+	obj->pat_index = pat_index;
+
+	if (pat_index != i915_gem_get_pat_index(i915, I915_CACHE_NONE))
+		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
+				       I915_BO_CACHE_COHERENT_FOR_WRITE);
+	else if (HAS_LLC(i915))
+		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
+	else
+		obj->cache_coherent = 0;
+
+	obj->cache_dirty =
+		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
+		!IS_DGFX(i915);
+}
+
 bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
 {
 	struct drm_i915_private *i915 = to_i915(obj->base.dev);
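
The new i915_gem_object_has_cache_level() helper is the pivot of this series:
once a UMD has programmed pat_index directly, the KMD can no longer map the
index back to an enum i915_cache_level, so every query conservatively answers
true. A self-contained model of that behavior follows; the table values are
illustrative (loosely in the spirit of LEGACY_CACHELEVEL), not authoritative
for any platform:

#include <assert.h>
#include <stdbool.h>

enum cache_level { CACHE_NONE, CACHE_LLC, CACHE_L3_LLC, CACHE_WT,
		   MAX_CACHE_LEVEL, CACHE_INVAL = MAX_CACHE_LEVEL };

/* per-platform translation table, indexed by cache level */
static const unsigned int cachelevel_to_pat[MAX_CACHE_LEVEL] = {
	[CACHE_NONE]   = 0,
	[CACHE_LLC]    = 1,
	[CACHE_L3_LLC] = 2,
	[CACHE_WT]     = 3,
};

struct obj { unsigned int pat_index; unsigned int cache_level; };

static bool has_cache_level(const struct obj *o, enum cache_level lvl)
{
	/* pat_index came straight from the UMD: KMD stays out of it */
	if (o->cache_level == CACHE_INVAL)
		return true;
	return o->pat_index == cachelevel_to_pat[lvl];
}

int main(void)
{
	struct obj kmd = { .pat_index = cachelevel_to_pat[CACHE_WT] };
	struct obj umd = { .pat_index = 5, .cache_level = CACHE_INVAL };

	assert(has_cache_level(&kmd, CACHE_WT));
	assert(!has_cache_level(&kmd, CACHE_NONE));
	assert(has_cache_level(&umd, CACHE_NONE));	/* always true */
	return 0;
}
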
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 4c92e17b4337..6f00aab10015 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -34,6 +34,8 @@ static inline bool i915_gem_object_size_2big(u64 size)
 
 unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
 				    enum i915_cache_level level);
+bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
+				     enum i915_cache_level lvl);
 void i915_gem_init__objects(struct drm_i915_private *i915);
 
 void i915_objects_module_exit(void);
@@ -764,6 +766,8 @@ bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj);
 
 void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
 					 unsigned int cache_level);
+void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
+				   unsigned int pat_index);
 bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj);
 void i915_gem_object_flush_if_display(struct drm_i915_gem_object *obj);
 void i915_gem_object_flush_if_display_locked(struct drm_i915_gem_object *obj);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 41b35abccf88..132ce01dee9f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -195,6 +195,7 @@ enum i915_cache_level {
 	 */
 	I915_CACHE_WT,
 	I915_MAX_CACHE_LEVEL,
+	I915_CACHE_INVAL = I915_MAX_CACHE_LEVEL,
 };
 
 enum i915_map_type {
@@ -358,10 +359,28 @@ struct drm_i915_gem_object {
 #define I915_BO_FLAG_STRUCT_PAGE BIT(0) /* Object backed by struct pages */
 #define I915_BO_FLAG_IOMEM       BIT(1) /* Object backed by IO memory */
 	/**
-	 * @cache_level: The desired GTT caching level.
+	 * @pat_index: The desired PAT index.
+	 *
+	 * See hardware specification for valid PAT indices for each platform.
+	 * This field used to contain a value of enum i915_cache_level. It's
+	 * changed to an unsigned int because PAT indices are being used by
+	 * both UMD and KMD for caching policy control after GEN12.
+	 * For backward compatibility, this field will continue to contain
+	 * the i915_cache_level value for pre-GEN12 platforms so that the PTE
+	 * encode functions for these legacy platforms can stay the same.
+	 * Meanwhile, platform-specific tables are created to translate
+	 * i915_cache_level into pat_index; for more details check the macros
+	 * defined in i915/i915_pci.c, e.g. PVC_CACHELEVEL.
+	 */
+	unsigned int pat_index:6;
+	/**
+	 * @cache_level: Indicate whether pat_index is set by UMD
 	 *
-	 * See enum i915_cache_level for possible values, along with what
-	 * each does.
+	 * This used to hold the desired GTT caching level, but it has been
+	 * replaced by pat_index. It's kept here so the KMD can tell whether
+	 * the pat_index was set by the UMD or converted from i915_cache_level.
+	 * This field is 0 by default, and I915_CACHE_INVAL if the pat_index
+	 * was set by the UMD.
 	 */
 	unsigned int cache_level:3;
 	/**
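
Because both fields above are bitfields, their widths bound what they can
represent: pat_index:6 covers indices 0..63, and cache_level:3 can still hold
I915_CACHE_INVAL (== I915_MAX_CACHE_LEVEL == 4). A standalone sketch of the
packing, with the widths copied from this patch:

#include <assert.h>
#include <stdio.h>

struct bo_cache_state {
	unsigned int pat_index:6;	/* platform PAT index, 0..63 */
	unsigned int cache_level:3;	/* 0 by default, 4 (INVAL) if UMD-set */
};

int main(void)
{
	struct bo_cache_state s = { .pat_index = 63, .cache_level = 4 };

	assert(s.pat_index == 63 && s.cache_level == 4);  /* no truncation */
	printf("both fields fit in %zu byte(s) of storage\n", sizeof(s));
	return 0;
}
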
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index ee492d823f1b..3b094d36a0b0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -565,7 +565,9 @@ static void dbg_poison(struct i915_ggtt *ggtt,
 
 		ggtt->vm.insert_page(&ggtt->vm, addr,
 				     ggtt->error_capture.start,
-				     I915_CACHE_NONE, 0);
+				     i915_gem_get_pat_index(ggtt->vm.i915,
+							    I915_CACHE_NONE),
+				     0);
 		mb();
 
 		s = io_mapping_map_wc(&ggtt->iomap,
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 69eb20ed4d47..e40761e13c2a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -214,7 +214,8 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
 
 		intel_engine_pm_get(to_gt(i915)->migrate.context->engine);
 		ret = intel_context_migrate_clear(to_gt(i915)->migrate.context, deps,
-						  dst_st->sgl, dst_level,
+						  dst_st->sgl,
+						  i915_gem_get_pat_index(i915, dst_level),
 						  i915_ttm_gtt_binds_lmem(dst_mem),
 						  0, &rq);
 	} else {
@@ -227,12 +228,13 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
 		src_level = i915_ttm_cache_level(i915, bo->resource, src_ttm);
 		intel_engine_pm_get(to_gt(i915)->migrate.context->engine);
 		ret = intel_context_migrate_copy(to_gt(i915)->migrate.context,
-						 deps, src_rsgt->table.sgl,
-						 src_level,
-						 i915_ttm_gtt_binds_lmem(bo->resource),
-						 dst_st->sgl, dst_level,
-						 i915_ttm_gtt_binds_lmem(dst_mem),
-						 &rq);
+					deps, src_rsgt->table.sgl,
+					i915_gem_get_pat_index(i915, src_level),
+					i915_ttm_gtt_binds_lmem(bo->resource),
+					dst_st->sgl,
+					i915_gem_get_pat_index(i915, dst_level),
+					i915_ttm_gtt_binds_lmem(dst_mem),
+					&rq);
 
 		i915_refct_sgt_put(src_rsgt);
 	}
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index defece0bcb81..ebb68ac9cd5e 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -354,7 +354,7 @@ fake_huge_pages_object(struct drm_i915_private *i915, u64 size, bool single)
 
 	obj->write_domain = I915_GEM_DOMAIN_CPU;
 	obj->read_domains = I915_GEM_DOMAIN_CPU;
-	obj->cache_level = I915_CACHE_NONE;
+	obj->pat_index = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
 
 	return obj;
 }
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
index fe6c37fd7859..a93a90b15907 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
@@ -219,7 +219,7 @@ static int __igt_lmem_pages_migrate(struct intel_gt *gt,
 			continue;
 
 		err = intel_migrate_clear(&gt->migrate, &ww, deps,
-					  obj->mm.pages->sgl, obj->cache_level,
+					  obj->mm.pages->sgl, obj->pat_index,
 					  i915_gem_object_is_lmem(obj),
 					  0xdeadbeaf, &rq);
 		if (rq) {
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 56279908ed30..a93d8f9f8bc1 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -1222,7 +1222,7 @@ static int __igt_mmap_migrate(struct intel_memory_region **placements,
 	}
 
 	err = intel_context_migrate_clear(to_gt(i915)->migrate.context, NULL,
-					  obj->mm.pages->sgl, obj->cache_level,
+					  obj->mm.pages->sgl, obj->pat_index,
 					  i915_gem_object_is_lmem(obj),
 					  expand32(POISON_INUSE), &rq);
 	i915_gem_object_unpin_pages(obj);
diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
index 5aaacc53fa4c..c2bdc133c89a 100644
--- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
@@ -109,7 +109,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 
 static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 				      struct i915_vma_resource *vma_res,
-				      enum i915_cache_level cache_level,
+				      unsigned int pat_index,
 				      u32 flags)
 {
 	struct i915_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
@@ -117,7 +117,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	unsigned int first_entry = vma_res->start / I915_GTT_PAGE_SIZE;
 	unsigned int act_pt = first_entry / GEN6_PTES;
 	unsigned int act_pte = first_entry % GEN6_PTES;
-	const u32 pte_encode = vm->pte_encode(0, cache_level, flags);
+	const u32 pte_encode = vm->pte_encode(0, pat_index, flags);
 	struct sgt_dma iter = sgt_dma(vma_res);
 	gen6_pte_t *vaddr;
 
@@ -227,7 +227,9 @@ static int gen6_ppgtt_init_scratch(struct gen6_ppgtt *ppgtt)
 
 	vm->scratch[0]->encode =
 		vm->pte_encode(px_dma(vm->scratch[0]),
-			       I915_CACHE_NONE, PTE_READ_ONLY);
+			       i915_gem_get_pat_index(vm->i915,
+						      I915_CACHE_NONE),
+			       PTE_READ_ONLY);
 
 	vm->scratch[1] = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
 	if (IS_ERR(vm->scratch[1])) {
@@ -278,7 +280,7 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 static void pd_vma_bind(struct i915_address_space *vm,
 			struct i915_vm_pt_stash *stash,
 			struct i915_vma_resource *vma_res,
-			enum i915_cache_level cache_level,
+			unsigned int pat_index,
 			u32 unused)
 {
 	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index 7a4b1d1afce9..c046813514f4 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -56,7 +56,7 @@ static u64 gen8_pte_encode(dma_addr_t addr,
 }
 
 static u64 mtl_pte_encode(dma_addr_t addr,
-			  enum i915_cache_level level,
+			  unsigned int pat_index,
 			  u32 flags)
 {
 	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
@@ -67,24 +67,17 @@ static u64 mtl_pte_encode(dma_addr_t addr,
 	if (flags & PTE_LM)
 		pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;
 
-	switch (level) {
-	case I915_CACHE_NONE:
-		pte |= GEN12_PPGTT_PTE_PAT1;
-		break;
-	case I915_CACHE_LLC:
-	case I915_CACHE_L3_LLC:
-		pte |= GEN12_PPGTT_PTE_PAT0 | GEN12_PPGTT_PTE_PAT1;
-		break;
-	case I915_CACHE_WT:
+	if (pat_index & BIT(0))
 		pte |= GEN12_PPGTT_PTE_PAT0;
-		break;
-	default:
-		/* This should never happen. Added to deal with the compile
-		 * error due to the addition of I915_MAX_CACHE_LEVEL. Will
-		 * be removed by the pat_index patch.
-		 */
-		break;
-	}
+
+	if (pat_index & BIT(1))
+		pte |= GEN12_PPGTT_PTE_PAT1;
+
+	if (pat_index & BIT(2))
+		pte |= GEN12_PPGTT_PTE_PAT2;
+
+	if (pat_index & BIT(3))
+		pte |= MTL_PPGTT_PTE_PAT3;
 
 	return pte;
 }
@@ -457,11 +450,11 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
 		      struct i915_page_directory *pdp,
 		      struct sgt_dma *iter,
 		      u64 idx,
-		      enum i915_cache_level cache_level,
+		      unsigned int pat_index,
 		      u32 flags)
 {
 	struct i915_page_directory *pd;
-	const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, cache_level, flags);
+	const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, pat_index, flags);
 	gen8_pte_t *vaddr;
 
 	pd = i915_pd_entry(pdp, gen8_pd_index(idx, 2));
@@ -504,10 +497,10 @@ static void
 xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
 			  struct i915_vma_resource *vma_res,
 			  struct sgt_dma *iter,
-			  enum i915_cache_level cache_level,
+			  unsigned int pat_index,
 			  u32 flags)
 {
-	const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags);
+	const gen8_pte_t pte_encode = vm->pte_encode(0, pat_index, flags);
 	unsigned int rem = sg_dma_len(iter->sg);
 	u64 start = vma_res->start;
 	u64 end = start + vma_res->vma_size;
@@ -611,10 +604,10 @@ xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
 static void gen8_ppgtt_insert_huge(struct i915_address_space *vm,
 				   struct i915_vma_resource *vma_res,
 				   struct sgt_dma *iter,
-				   enum i915_cache_level cache_level,
+				   unsigned int pat_index,
 				   u32 flags)
 {
-	const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags);
+	const gen8_pte_t pte_encode = vm->pte_encode(0, pat_index, flags);
 	unsigned int rem = sg_dma_len(iter->sg);
 	u64 start = vma_res->start;
 
@@ -734,7 +727,7 @@ static void gen8_ppgtt_insert_huge(struct i915_address_space *vm,
 
 static void gen8_ppgtt_insert(struct i915_address_space *vm,
 			      struct i915_vma_resource *vma_res,
-			      enum i915_cache_level cache_level,
+			      unsigned int pat_index,
 			      u32 flags)
 {
 	struct i915_ppgtt * const ppgtt = i915_vm_to_ppgtt(vm);
@@ -742,9 +735,9 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
 
 	if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) {
 		if (HAS_64K_PAGES(vm->i915))
-			xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
+			xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, pat_index, flags);
 		else
-			gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
+			gen8_ppgtt_insert_huge(vm, vma_res, &iter, pat_index, flags);
 	} else  {
 		u64 idx = vma_res->start >> GEN8_PTE_SHIFT;
 
@@ -753,7 +746,7 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
 				gen8_pdp_for_page_index(vm, idx);
 
 			idx = gen8_ppgtt_insert_pte(ppgtt, pdp, &iter, idx,
-						    cache_level, flags);
+						    pat_index, flags);
 		} while (idx);
 
 		vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE;
@@ -763,7 +756,7 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
 static void gen8_ppgtt_insert_entry(struct i915_address_space *vm,
 				    dma_addr_t addr,
 				    u64 offset,
-				    enum i915_cache_level level,
+				    unsigned int pat_index,
 				    u32 flags)
 {
 	u64 idx = offset >> GEN8_PTE_SHIFT;
@@ -777,14 +770,14 @@ static void gen8_ppgtt_insert_entry(struct i915_address_space *vm,
 	GEM_BUG_ON(pt->is_compact);
 
 	vaddr = px_vaddr(pt);
-	vaddr[gen8_pd_index(idx, 0)] = vm->pte_encode(addr, level, flags);
+	vaddr[gen8_pd_index(idx, 0)] = vm->pte_encode(addr, pat_index, flags);
 	drm_clflush_virt_range(&vaddr[gen8_pd_index(idx, 0)], sizeof(*vaddr));
 }
 
 static void __xehpsdv_ppgtt_insert_entry_lm(struct i915_address_space *vm,
 					    dma_addr_t addr,
 					    u64 offset,
-					    enum i915_cache_level level,
+					    unsigned int pat_index,
 					    u32 flags)
 {
 	u64 idx = offset >> GEN8_PTE_SHIFT;
@@ -807,20 +800,20 @@ static void __xehpsdv_ppgtt_insert_entry_lm(struct i915_address_space *vm,
 	}
 
 	vaddr = px_vaddr(pt);
-	vaddr[gen8_pd_index(idx, 0) / 16] = vm->pte_encode(addr, level, flags);
+	vaddr[gen8_pd_index(idx, 0) / 16] = vm->pte_encode(addr, pat_index, flags);
 }
 
 static void xehpsdv_ppgtt_insert_entry(struct i915_address_space *vm,
 				       dma_addr_t addr,
 				       u64 offset,
-				       enum i915_cache_level level,
+				       unsigned int pat_index,
 				       u32 flags)
 {
 	if (flags & PTE_LM)
 		return __xehpsdv_ppgtt_insert_entry_lm(vm, addr, offset,
-						       level, flags);
+						       pat_index, flags);
 
-	return gen8_ppgtt_insert_entry(vm, addr, offset, level, flags);
+	return gen8_ppgtt_insert_entry(vm, addr, offset, pat_index, flags);
 }
 
 static int gen8_init_scratch(struct i915_address_space *vm)
@@ -855,7 +848,9 @@ static int gen8_init_scratch(struct i915_address_space *vm)
 
 	vm->scratch[0]->encode =
 		vm->pte_encode(px_dma(vm->scratch[0]),
-			       I915_CACHE_NONE, pte_flags);
+			       i915_gem_get_pat_index(vm->i915,
+						      I915_CACHE_NONE),
+			       pte_flags);
 
 	for (i = 1; i <= vm->top; i++) {
 		struct drm_i915_gem_object *obj;
@@ -873,7 +868,9 @@ static int gen8_init_scratch(struct i915_address_space *vm)
 		}
 
 		fill_px(obj, vm->scratch[i - 1]->encode);
-		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_NONE);
+		obj->encode = gen8_pde_encode(px_dma(obj),
+					      i915_gem_get_pat_index(vm->i915,
+								     I915_CACHE_NONE));
 
 		vm->scratch[i] = obj;
 	}
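
Note how the rewritten mtl_pte_encode() no longer interprets a cache level at
all: it simply scatters the four pat_index bits into the PTE's PAT bit
positions. A standalone model of that scatter; the bit positions (PAT0..PAT2
at bits 3/4/7, PAT3 at bit 62) are assumed from the gen12/MTL PTE layout and
should be treated as illustrative:

#include <assert.h>
#include <stdint.h>

#define PTE_PAT0 (1ull << 3)
#define PTE_PAT1 (1ull << 4)
#define PTE_PAT2 (1ull << 7)
#define PTE_PAT3 (1ull << 62)

static uint64_t scatter_pat(unsigned int pat_index)
{
	uint64_t pte = 0;

	if (pat_index & 1)
		pte |= PTE_PAT0;
	if (pat_index & 2)
		pte |= PTE_PAT1;
	if (pat_index & 4)
		pte |= PTE_PAT2;
	if (pat_index & 8)
		pte |= PTE_PAT3;
	return pte;
}

int main(void)
{
	/* index 3 (0b0011) sets PAT0|PAT1, matching what the removed
	 * switch produced for the I915_CACHE_LLC case */
	assert(scatter_pat(3) == (PTE_PAT0 | PTE_PAT1));
	assert(scatter_pat(8) == PTE_PAT3);
	return 0;
}
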
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
index f541d19264b4..19c635441642 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
@@ -10,13 +10,12 @@
 
 struct i915_address_space;
 struct intel_gt;
-enum i915_cache_level;
 
 struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
 				     unsigned long lmem_pt_obj_flags);
 
 u64 gen8_ggtt_pte_encode(dma_addr_t addr,
-			 enum i915_cache_level level,
+			 unsigned int pat_index,
 			 u32 flags);
 
 #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index c8390d03fce2..2a7942fac798 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -221,7 +221,7 @@ static void guc_ggtt_invalidate(struct i915_ggtt *ggtt)
 }
 
 static u64 mtl_ggtt_pte_encode(dma_addr_t addr,
-			       enum i915_cache_level level,
+			       unsigned int pat_index,
 			       u32 flags)
 {
 	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT;
@@ -231,30 +231,17 @@ static u64 mtl_ggtt_pte_encode(dma_addr_t addr,
 	if (flags & PTE_LM)
 		pte |= GEN12_GGTT_PTE_LM;
 
-	switch (level) {
-	case I915_CACHE_NONE:
-		pte |= MTL_GGTT_PTE_PAT1;
-		break;
-	case I915_CACHE_LLC:
-	case I915_CACHE_L3_LLC:
-		pte |= MTL_GGTT_PTE_PAT0 | MTL_GGTT_PTE_PAT1;
-		break;
-	case I915_CACHE_WT:
+	if (pat_index & BIT(0))
 		pte |= MTL_GGTT_PTE_PAT0;
-		break;
-	default:
-		/* This should never happen. Added to deal with the compile
-		 * error due to the addition of I915_MAX_CACHE_LEVEL. Will
-		 * be removed by the pat_index patch.
-		 */
-		break;
-	}
+
+	if (pat_index & BIT(1))
+		pte |= MTL_GGTT_PTE_PAT1;
 
 	return pte;
 }
 
 u64 gen8_ggtt_pte_encode(dma_addr_t addr,
-			 enum i915_cache_level level,
+			 unsigned int pat_index,
 			 u32 flags)
 {
 	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT;
@@ -273,25 +260,25 @@ static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
 static void gen8_ggtt_insert_page(struct i915_address_space *vm,
 				  dma_addr_t addr,
 				  u64 offset,
-				  enum i915_cache_level level,
+				  unsigned int pat_index,
 				  u32 flags)
 {
 	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
 	gen8_pte_t __iomem *pte =
 		(gen8_pte_t __iomem *)ggtt->gsm + offset / I915_GTT_PAGE_SIZE;
 
-	gen8_set_pte(pte, ggtt->vm.pte_encode(addr, level, flags));
+	gen8_set_pte(pte, ggtt->vm.pte_encode(addr, pat_index, flags));
 
 	ggtt->invalidate(ggtt);
 }
 
 static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
 				     struct i915_vma_resource *vma_res,
-				     enum i915_cache_level level,
+				     unsigned int pat_index,
 				     u32 flags)
 {
 	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
-	const gen8_pte_t pte_encode = ggtt->vm.pte_encode(0, level, flags);
+	const gen8_pte_t pte_encode = ggtt->vm.pte_encode(0, pat_index, flags);
 	gen8_pte_t __iomem *gte;
 	gen8_pte_t __iomem *end;
 	struct sgt_iter iter;
@@ -348,14 +335,14 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
 static void gen6_ggtt_insert_page(struct i915_address_space *vm,
 				  dma_addr_t addr,
 				  u64 offset,
-				  enum i915_cache_level level,
+				  unsigned int pat_index,
 				  u32 flags)
 {
 	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
 	gen6_pte_t __iomem *pte =
 		(gen6_pte_t __iomem *)ggtt->gsm + offset / I915_GTT_PAGE_SIZE;
 
-	iowrite32(vm->pte_encode(addr, level, flags), pte);
+	iowrite32(vm->pte_encode(addr, pat_index, flags), pte);
 
 	ggtt->invalidate(ggtt);
 }
@@ -368,7 +355,7 @@ static void gen6_ggtt_insert_page(struct i915_address_space *vm,
  */
 static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
 				     struct i915_vma_resource *vma_res,
-				     enum i915_cache_level level,
+				     unsigned int pat_index,
 				     u32 flags)
 {
 	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
@@ -385,7 +372,7 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
 		iowrite32(vm->scratch[0]->encode, gte++);
 	end += (vma_res->node_size + vma_res->guard) / I915_GTT_PAGE_SIZE;
 	for_each_sgt_daddr(addr, iter, vma_res->bi.pages)
-		iowrite32(vm->pte_encode(addr, level, flags), gte++);
+		iowrite32(vm->pte_encode(addr, pat_index, flags), gte++);
 	GEM_BUG_ON(gte > end);
 
 	/* Fill the allocated but "unused" space beyond the end of the buffer */
@@ -420,14 +407,15 @@ struct insert_page {
 	struct i915_address_space *vm;
 	dma_addr_t addr;
 	u64 offset;
-	enum i915_cache_level level;
+	unsigned int pat_index;
 };
 
 static int bxt_vtd_ggtt_insert_page__cb(void *_arg)
 {
 	struct insert_page *arg = _arg;
 
-	gen8_ggtt_insert_page(arg->vm, arg->addr, arg->offset, arg->level, 0);
+	gen8_ggtt_insert_page(arg->vm, arg->addr, arg->offset,
+			      arg->pat_index, 0);
 	bxt_vtd_ggtt_wa(arg->vm);
 
 	return 0;
@@ -436,10 +424,10 @@ static int bxt_vtd_ggtt_insert_page__cb(void *_arg)
 static void bxt_vtd_ggtt_insert_page__BKL(struct i915_address_space *vm,
 					  dma_addr_t addr,
 					  u64 offset,
-					  enum i915_cache_level level,
+					  unsigned int pat_index,
 					  u32 unused)
 {
-	struct insert_page arg = { vm, addr, offset, level };
+	struct insert_page arg = { vm, addr, offset, pat_index };
 
 	stop_machine(bxt_vtd_ggtt_insert_page__cb, &arg, NULL);
 }
@@ -447,7 +435,7 @@ static void bxt_vtd_ggtt_insert_page__BKL(struct i915_address_space *vm,
 struct insert_entries {
 	struct i915_address_space *vm;
 	struct i915_vma_resource *vma_res;
-	enum i915_cache_level level;
+	unsigned int pat_index;
 	u32 flags;
 };
 
@@ -455,7 +443,8 @@ static int bxt_vtd_ggtt_insert_entries__cb(void *_arg)
 {
 	struct insert_entries *arg = _arg;
 
-	gen8_ggtt_insert_entries(arg->vm, arg->vma_res, arg->level, arg->flags);
+	gen8_ggtt_insert_entries(arg->vm, arg->vma_res,
+				 arg->pat_index, arg->flags);
 	bxt_vtd_ggtt_wa(arg->vm);
 
 	return 0;
@@ -463,10 +452,10 @@ static int bxt_vtd_ggtt_insert_entries__cb(void *_arg)
 
 static void bxt_vtd_ggtt_insert_entries__BKL(struct i915_address_space *vm,
 					     struct i915_vma_resource *vma_res,
-					     enum i915_cache_level level,
+					     unsigned int pat_index,
 					     u32 flags)
 {
-	struct insert_entries arg = { vm, vma_res, level, flags };
+	struct insert_entries arg = { vm, vma_res, pat_index, flags };
 
 	stop_machine(bxt_vtd_ggtt_insert_entries__cb, &arg, NULL);
 }
@@ -495,7 +484,7 @@ static void gen6_ggtt_clear_range(struct i915_address_space *vm,
 void intel_ggtt_bind_vma(struct i915_address_space *vm,
 			 struct i915_vm_pt_stash *stash,
 			 struct i915_vma_resource *vma_res,
-			 enum i915_cache_level cache_level,
+			 unsigned int pat_index,
 			 u32 flags)
 {
 	u32 pte_flags;
@@ -512,7 +501,7 @@ void intel_ggtt_bind_vma(struct i915_address_space *vm,
 	if (vma_res->bi.lmem)
 		pte_flags |= PTE_LM;
 
-	vm->insert_entries(vm, vma_res, cache_level, pte_flags);
+	vm->insert_entries(vm, vma_res, pat_index, pte_flags);
 	vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE;
 }
 
@@ -661,7 +650,7 @@ static int init_ggtt(struct i915_ggtt *ggtt)
 static void aliasing_gtt_bind_vma(struct i915_address_space *vm,
 				  struct i915_vm_pt_stash *stash,
 				  struct i915_vma_resource *vma_res,
-				  enum i915_cache_level cache_level,
+				  unsigned int pat_index,
 				  u32 flags)
 {
 	u32 pte_flags;
@@ -673,10 +662,10 @@ static void aliasing_gtt_bind_vma(struct i915_address_space *vm,
 
 	if (flags & I915_VMA_LOCAL_BIND)
 		ppgtt_bind_vma(&i915_vm_to_ggtt(vm)->alias->vm,
-			       stash, vma_res, cache_level, flags);
+			       stash, vma_res, pat_index, flags);
 
 	if (flags & I915_VMA_GLOBAL_BIND)
-		vm->insert_entries(vm, vma_res, cache_level, pte_flags);
+		vm->insert_entries(vm, vma_res, pat_index, pte_flags);
 
 	vma_res->bound_flags |= flags;
 }
@@ -933,7 +922,9 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size)
 
 	ggtt->vm.scratch[0]->encode =
 		ggtt->vm.pte_encode(px_dma(ggtt->vm.scratch[0]),
-				    I915_CACHE_NONE, pte_flags);
+				    i915_gem_get_pat_index(i915,
+							   I915_CACHE_NONE),
+				    pte_flags);
 
 	return 0;
 }
@@ -1022,6 +1013,11 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
 	return ggtt_probe_common(ggtt, size);
 }
 
+/*
+ * For pre-gen8 platforms pat_index is the same as enum i915_cache_level,
+ * so these PTE encode functions continue to take cache_level.
+ * See translation table LEGACY_CACHELEVEL.
+ */
 static u64 snb_pte_encode(dma_addr_t addr,
 			  enum i915_cache_level level,
 			  u32 flags)
@@ -1302,7 +1298,9 @@ bool i915_ggtt_resume_vm(struct i915_address_space *vm)
 		 */
 		vma->resource->bound_flags = 0;
 		vma->ops->bind_vma(vm, NULL, vma->resource,
-				   obj ? obj->cache_level : 0,
+				   obj ? obj->pat_index :
+					 i915_gem_get_pat_index(vm->i915,
+								I915_CACHE_NONE),
 				   was_bound);
 
 		if (obj) { /* only used during resume => exclusive access */
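
Two details worth calling out in the GGTT changes above: mtl_ggtt_pte_encode()
consumes only the low two pat_index bits because the GGTT PTE exposes just
PAT0/PAT1, so only PAT indices 0..3 are representable there; and the resume
path now substitutes the I915_CACHE_NONE index when a vma has no backing
object. A sketch of the two-bit scatter, with the bit positions (52/53 for
MTL_GGTT_PTE_PAT0/1) assumed rather than taken from this patch:

#include <assert.h>
#include <stdint.h>

#define GGTT_PTE_PAT0 (1ull << 52)	/* assumed position */
#define GGTT_PTE_PAT1 (1ull << 53)	/* assumed position */

static uint64_t ggtt_scatter_pat(unsigned int pat_index)
{
	uint64_t pte = 0;

	/* only two PAT bits exist in the GGTT PTE; indices above 3
	 * silently truncate, so callers must pick a GGTT-legal index */
	if (pat_index & 1)
		pte |= GGTT_PTE_PAT0;
	if (pat_index & 2)
		pte |= GGTT_PTE_PAT1;
	return pte;
}

int main(void)
{
	assert(ggtt_scatter_pat(3) == (GGTT_PTE_PAT0 | GGTT_PTE_PAT1));
	assert(ggtt_scatter_pat(4) == 0);	/* PAT2 bit has nowhere to go */
	return 0;
}
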
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 854ec09fd588..be767e13b1e5 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -165,8 +165,6 @@ typedef u64 gen8_pte_t;
 #define MTL_2_COH_1W	REG_FIELD_PREP(MTL_PAT_INDEX_COH_MODE_MASK, 2)
 #define MTL_0_COH_NON	REG_FIELD_PREP(MTL_PAT_INDEX_COH_MODE_MASK, 0)
 
-enum i915_cache_level;
-
 struct drm_i915_gem_object;
 struct i915_fence_reg;
 struct i915_vma;
@@ -234,7 +232,7 @@ struct i915_vma_ops {
 	void (*bind_vma)(struct i915_address_space *vm,
 			 struct i915_vm_pt_stash *stash,
 			 struct i915_vma_resource *vma_res,
-			 enum i915_cache_level cache_level,
+			 unsigned int pat_index,
 			 u32 flags);
 	/*
 	 * Unmap an object from an address space. This usually consists of
@@ -306,7 +304,7 @@ struct i915_address_space {
 		(*alloc_scratch_dma)(struct i915_address_space *vm, int sz);
 
 	u64 (*pte_encode)(dma_addr_t addr,
-			  enum i915_cache_level level,
+			  unsigned int pat_index,
 			  u32 flags); /* Create a valid PTE */
 #define PTE_READ_ONLY	BIT(0)
 #define PTE_LM		BIT(1)
@@ -321,20 +319,20 @@ struct i915_address_space {
 	void (*insert_page)(struct i915_address_space *vm,
 			    dma_addr_t addr,
 			    u64 offset,
-			    enum i915_cache_level cache_level,
+			    unsigned int pat_index,
 			    u32 flags);
 	void (*insert_entries)(struct i915_address_space *vm,
 			       struct i915_vma_resource *vma_res,
-			       enum i915_cache_level cache_level,
+			       unsigned int pat_index,
 			       u32 flags);
 	void (*raw_insert_page)(struct i915_address_space *vm,
 				dma_addr_t addr,
 				u64 offset,
-				enum i915_cache_level cache_level,
+				unsigned int pat_index,
 				u32 flags);
 	void (*raw_insert_entries)(struct i915_address_space *vm,
 				   struct i915_vma_resource *vma_res,
-				   enum i915_cache_level cache_level,
+				   unsigned int pat_index,
 				   u32 flags);
 	void (*cleanup)(struct i915_address_space *vm);
 
@@ -581,7 +579,7 @@ void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt,
 void intel_ggtt_bind_vma(struct i915_address_space *vm,
 			 struct i915_vm_pt_stash *stash,
 			 struct i915_vma_resource *vma_res,
-			 enum i915_cache_level cache_level,
+			 unsigned int pat_index,
 			 u32 flags);
 void intel_ggtt_unbind_vma(struct i915_address_space *vm,
 			   struct i915_vma_resource *vma_res);
@@ -639,7 +637,7 @@ void
 __set_pd_entry(struct i915_page_directory * const pd,
 	       const unsigned short idx,
 	       struct i915_page_table *pt,
-	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level));
+	       u64 (*encode)(const dma_addr_t, const unsigned int pat_index));
 
 #define set_pd_entry(pd, idx, to) \
 	__set_pd_entry((pd), (idx), px_pt(to), gen8_pde_encode)
@@ -659,7 +657,7 @@ void gen6_ggtt_invalidate(struct i915_ggtt *ggtt);
 void ppgtt_bind_vma(struct i915_address_space *vm,
 		    struct i915_vm_pt_stash *stash,
 		    struct i915_vma_resource *vma_res,
-		    enum i915_cache_level cache_level,
+		    unsigned int pat_index,
 		    u32 flags);
 void ppgtt_unbind_vma(struct i915_address_space *vm,
 		      struct i915_vma_resource *vma_res);
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 3f638f198796..117c3d05af3e 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -45,7 +45,9 @@ static void xehpsdv_toggle_pdes(struct i915_address_space *vm,
 	 * Insert a dummy PTE into every PT that will map to LMEM to ensure
 	 * we have a correctly setup PDE structure for later use.
 	 */
-	vm->insert_page(vm, 0, d->offset, I915_CACHE_NONE, PTE_LM);
+	vm->insert_page(vm, 0, d->offset,
+			i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
+			PTE_LM);
 	GEM_BUG_ON(!pt->is_compact);
 	d->offset += SZ_2M;
 }
@@ -63,7 +65,9 @@ static void xehpsdv_insert_pte(struct i915_address_space *vm,
 	 * alignment is 64K underneath for the pt, and we are careful
 	 * not to access the space in the void.
 	 */
-	vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE, PTE_LM);
+	vm->insert_page(vm, px_dma(pt), d->offset,
+			i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
+			PTE_LM);
 	d->offset += SZ_64K;
 }
 
@@ -73,7 +77,8 @@ static void insert_pte(struct i915_address_space *vm,
 {
 	struct insert_pte_data *d = data;
 
-	vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE,
+	vm->insert_page(vm, px_dma(pt), d->offset,
+			i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
 			i915_gem_object_is_lmem(pt->base) ? PTE_LM : 0);
 	d->offset += PAGE_SIZE;
 }
@@ -356,13 +361,13 @@ static int max_pte_pkt_size(struct i915_request *rq, int pkt)
 
 static int emit_pte(struct i915_request *rq,
 		    struct sgt_dma *it,
-		    enum i915_cache_level cache_level,
+		    unsigned int pat_index,
 		    bool is_lmem,
 		    u64 offset,
 		    int length)
 {
 	bool has_64K_pages = HAS_64K_PAGES(rq->engine->i915);
-	const u64 encode = rq->context->vm->pte_encode(0, cache_level,
+	const u64 encode = rq->context->vm->pte_encode(0, pat_index,
 						       is_lmem ? PTE_LM : 0);
 	struct intel_ring *ring = rq->ring;
 	int pkt, dword_length;
@@ -673,17 +678,17 @@ int
 intel_context_migrate_copy(struct intel_context *ce,
 			   const struct i915_deps *deps,
 			   struct scatterlist *src,
-			   enum i915_cache_level src_cache_level,
+			   unsigned int src_pat_index,
 			   bool src_is_lmem,
 			   struct scatterlist *dst,
-			   enum i915_cache_level dst_cache_level,
+			   unsigned int dst_pat_index,
 			   bool dst_is_lmem,
 			   struct i915_request **out)
 {
 	struct sgt_dma it_src = sg_sgt(src), it_dst = sg_sgt(dst), it_ccs;
 	struct drm_i915_private *i915 = ce->engine->i915;
 	u64 ccs_bytes_to_cpy = 0, bytes_to_cpy;
-	enum i915_cache_level ccs_cache_level;
+	unsigned int ccs_pat_index;
 	u32 src_offset, dst_offset;
 	u8 src_access, dst_access;
 	struct i915_request *rq;
@@ -707,12 +712,12 @@ intel_context_migrate_copy(struct intel_context *ce,
 		dst_sz = scatter_list_length(dst);
 		if (src_is_lmem) {
 			it_ccs = it_dst;
-			ccs_cache_level = dst_cache_level;
+			ccs_pat_index = dst_pat_index;
 			ccs_is_src = false;
 		} else if (dst_is_lmem) {
 			bytes_to_cpy = dst_sz;
 			it_ccs = it_src;
-			ccs_cache_level = src_cache_level;
+			ccs_pat_index = src_pat_index;
 			ccs_is_src = true;
 		}
 
@@ -773,7 +778,7 @@ intel_context_migrate_copy(struct intel_context *ce,
 		src_sz = calculate_chunk_sz(i915, src_is_lmem,
 					    bytes_to_cpy, ccs_bytes_to_cpy);
 
-		len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem,
+		len = emit_pte(rq, &it_src, src_pat_index, src_is_lmem,
 			       src_offset, src_sz);
 		if (!len) {
 			err = -EINVAL;
@@ -784,7 +789,7 @@ intel_context_migrate_copy(struct intel_context *ce,
 			goto out_rq;
 		}
 
-		err = emit_pte(rq, &it_dst, dst_cache_level, dst_is_lmem,
+		err = emit_pte(rq, &it_dst, dst_pat_index, dst_is_lmem,
 			       dst_offset, len);
 		if (err < 0)
 			goto out_rq;
@@ -811,7 +816,7 @@ intel_context_migrate_copy(struct intel_context *ce,
 				goto out_rq;
 
 			ccs_sz = GET_CCS_BYTES(i915, len);
-			err = emit_pte(rq, &it_ccs, ccs_cache_level, false,
+			err = emit_pte(rq, &it_ccs, ccs_pat_index, false,
 				       ccs_is_src ? src_offset : dst_offset,
 				       ccs_sz);
 			if (err < 0)
@@ -979,7 +984,7 @@ int
 intel_context_migrate_clear(struct intel_context *ce,
 			    const struct i915_deps *deps,
 			    struct scatterlist *sg,
-			    enum i915_cache_level cache_level,
+			    unsigned int pat_index,
 			    bool is_lmem,
 			    u32 value,
 			    struct i915_request **out)
@@ -1027,7 +1032,7 @@ intel_context_migrate_clear(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		len = emit_pte(rq, &it, cache_level, is_lmem, offset, CHUNK_SZ);
+		len = emit_pte(rq, &it, pat_index, is_lmem, offset, CHUNK_SZ);
 		if (len <= 0) {
 			err = len;
 			goto out_rq;
@@ -1074,10 +1079,10 @@ int intel_migrate_copy(struct intel_migrate *m,
 		       struct i915_gem_ww_ctx *ww,
 		       const struct i915_deps *deps,
 		       struct scatterlist *src,
-		       enum i915_cache_level src_cache_level,
+		       unsigned int src_pat_index,
 		       bool src_is_lmem,
 		       struct scatterlist *dst,
-		       enum i915_cache_level dst_cache_level,
+		       unsigned int dst_pat_index,
 		       bool dst_is_lmem,
 		       struct i915_request **out)
 {
@@ -1098,8 +1103,8 @@ int intel_migrate_copy(struct intel_migrate *m,
 		goto out;
 
 	err = intel_context_migrate_copy(ce, deps,
-					 src, src_cache_level, src_is_lmem,
-					 dst, dst_cache_level, dst_is_lmem,
+					 src, src_pat_index, src_is_lmem,
+					 dst, dst_pat_index, dst_is_lmem,
 					 out);
 
 	intel_context_unpin(ce);
@@ -1113,7 +1118,7 @@ intel_migrate_clear(struct intel_migrate *m,
 		    struct i915_gem_ww_ctx *ww,
 		    const struct i915_deps *deps,
 		    struct scatterlist *sg,
-		    enum i915_cache_level cache_level,
+		    unsigned int pat_index,
 		    bool is_lmem,
 		    u32 value,
 		    struct i915_request **out)
@@ -1134,7 +1139,7 @@ intel_migrate_clear(struct intel_migrate *m,
 	if (err)
 		goto out;
 
-	err = intel_context_migrate_clear(ce, deps, sg, cache_level,
+	err = intel_context_migrate_clear(ce, deps, sg, pat_index,
 					  is_lmem, value, out);
 
 	intel_context_unpin(ce);
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.h b/drivers/gpu/drm/i915/gt/intel_migrate.h
index ccc677ec4aa3..11fc09a00c4b 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.h
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.h
@@ -16,7 +16,6 @@ struct i915_request;
 struct i915_gem_ww_ctx;
 struct intel_gt;
 struct scatterlist;
-enum i915_cache_level;
 
 int intel_migrate_init(struct intel_migrate *m, struct intel_gt *gt);
 
@@ -26,20 +25,20 @@ int intel_migrate_copy(struct intel_migrate *m,
 		       struct i915_gem_ww_ctx *ww,
 		       const struct i915_deps *deps,
 		       struct scatterlist *src,
-		       enum i915_cache_level src_cache_level,
+		       unsigned int src_pat_index,
 		       bool src_is_lmem,
 		       struct scatterlist *dst,
-		       enum i915_cache_level dst_cache_level,
+		       unsigned int dst_pat_index,
 		       bool dst_is_lmem,
 		       struct i915_request **out);
 
 int intel_context_migrate_copy(struct intel_context *ce,
 			       const struct i915_deps *deps,
 			       struct scatterlist *src,
-			       enum i915_cache_level src_cache_level,
+			       unsigned int src_pat_index,
 			       bool src_is_lmem,
 			       struct scatterlist *dst,
-			       enum i915_cache_level dst_cache_level,
+			       unsigned int dst_pat_index,
 			       bool dst_is_lmem,
 			       struct i915_request **out);
 
@@ -48,7 +47,7 @@ intel_migrate_clear(struct intel_migrate *m,
 		    struct i915_gem_ww_ctx *ww,
 		    const struct i915_deps *deps,
 		    struct scatterlist *sg,
-		    enum i915_cache_level cache_level,
+		    unsigned int pat_index,
 		    bool is_lmem,
 		    u32 value,
 		    struct i915_request **out);
@@ -56,7 +55,7 @@ int
 intel_context_migrate_clear(struct intel_context *ce,
 			    const struct i915_deps *deps,
 			    struct scatterlist *sg,
-			    enum i915_cache_level cache_level,
+			    unsigned int pat_index,
 			    bool is_lmem,
 			    u32 value,
 			    struct i915_request **out);
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index 7ecfa672f738..f0da3555c6db 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -98,7 +98,7 @@ void
 __set_pd_entry(struct i915_page_directory * const pd,
 	       const unsigned short idx,
 	       struct i915_page_table * const to,
-	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level))
+	       u64 (*encode)(const dma_addr_t, const unsigned int))
 {
 	/* Each thread pre-pins the pd, and we may have a thread per pde. */
 	GEM_BUG_ON(atomic_read(px_used(pd)) > NALLOC * I915_PDES);
@@ -181,7 +181,7 @@ struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt,
 void ppgtt_bind_vma(struct i915_address_space *vm,
 		    struct i915_vm_pt_stash *stash,
 		    struct i915_vma_resource *vma_res,
-		    enum i915_cache_level cache_level,
+		    unsigned int pat_index,
 		    u32 flags)
 {
 	u32 pte_flags;
@@ -199,7 +199,7 @@ void ppgtt_bind_vma(struct i915_address_space *vm,
 	if (vma_res->bi.lmem)
 		pte_flags |= PTE_LM;
 
-	vm->insert_entries(vm, vma_res, cache_level, pte_flags);
+	vm->insert_entries(vm, vma_res, pat_index, pte_flags);
 	wmb();
 }
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_migrate.c b/drivers/gpu/drm/i915/gt/selftest_migrate.c
index e677f2da093d..3def5ca72dec 100644
--- a/drivers/gpu/drm/i915/gt/selftest_migrate.c
+++ b/drivers/gpu/drm/i915/gt/selftest_migrate.c
@@ -137,7 +137,7 @@ static int copy(struct intel_migrate *migrate,
 static int intel_context_copy_ccs(struct intel_context *ce,
 				  const struct i915_deps *deps,
 				  struct scatterlist *sg,
-				  enum i915_cache_level cache_level,
+				  unsigned int pat_index,
 				  bool write_to_ccs,
 				  struct i915_request **out)
 {
@@ -185,7 +185,7 @@ static int intel_context_copy_ccs(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		len = emit_pte(rq, &it, cache_level, true, offset, CHUNK_SZ);
+		len = emit_pte(rq, &it, pat_index, true, offset, CHUNK_SZ);
 		if (len <= 0) {
 			err = len;
 			goto out_rq;
@@ -223,7 +223,7 @@ intel_migrate_ccs_copy(struct intel_migrate *m,
 		       struct i915_gem_ww_ctx *ww,
 		       const struct i915_deps *deps,
 		       struct scatterlist *sg,
-		       enum i915_cache_level cache_level,
+		       unsigned int pat_index,
 		       bool write_to_ccs,
 		       struct i915_request **out)
 {
@@ -243,7 +243,7 @@ intel_migrate_ccs_copy(struct intel_migrate *m,
 	if (err)
 		goto out;
 
-	err = intel_context_copy_ccs(ce, deps, sg, cache_level,
+	err = intel_context_copy_ccs(ce, deps, sg, pat_index,
 				     write_to_ccs, out);
 
 	intel_context_unpin(ce);
@@ -300,7 +300,7 @@ static int clear(struct intel_migrate *migrate,
 			/* Write the obj data into ccs surface */
 			err = intel_migrate_ccs_copy(migrate, &ww, NULL,
 						     obj->mm.pages->sgl,
-						     obj->cache_level,
+						     obj->pat_index,
 						     true, &rq);
 			if (rq && !err) {
 				if (i915_request_wait(rq, 0, HZ) < 0) {
@@ -351,7 +351,7 @@ static int clear(struct intel_migrate *migrate,
 
 			err = intel_migrate_ccs_copy(migrate, &ww, NULL,
 						     obj->mm.pages->sgl,
-						     obj->cache_level,
+						     obj->pat_index,
 						     false, &rq);
 			if (rq && !err) {
 				if (i915_request_wait(rq, 0, HZ) < 0) {
@@ -414,9 +414,9 @@ static int __migrate_copy(struct intel_migrate *migrate,
 			  struct i915_request **out)
 {
 	return intel_migrate_copy(migrate, ww, NULL,
-				  src->mm.pages->sgl, src->cache_level,
+				  src->mm.pages->sgl, src->pat_index,
 				  i915_gem_object_is_lmem(src),
-				  dst->mm.pages->sgl, dst->cache_level,
+				  dst->mm.pages->sgl, dst->pat_index,
 				  i915_gem_object_is_lmem(dst),
 				  out);
 }
@@ -428,9 +428,9 @@ static int __global_copy(struct intel_migrate *migrate,
 			 struct i915_request **out)
 {
 	return intel_context_migrate_copy(migrate->context, NULL,
-					  src->mm.pages->sgl, src->cache_level,
+					  src->mm.pages->sgl, src->pat_index,
 					  i915_gem_object_is_lmem(src),
-					  dst->mm.pages->sgl, dst->cache_level,
+					  dst->mm.pages->sgl, dst->pat_index,
 					  i915_gem_object_is_lmem(dst),
 					  out);
 }
@@ -455,7 +455,7 @@ static int __migrate_clear(struct intel_migrate *migrate,
 {
 	return intel_migrate_clear(migrate, ww, NULL,
 				   obj->mm.pages->sgl,
-				   obj->cache_level,
+				   obj->pat_index,
 				   i915_gem_object_is_lmem(obj),
 				   value, out);
 }
@@ -468,7 +468,7 @@ static int __global_clear(struct intel_migrate *migrate,
 {
 	return intel_context_migrate_clear(migrate->context, NULL,
 					   obj->mm.pages->sgl,
-					   obj->cache_level,
+					   obj->pat_index,
 					   i915_gem_object_is_lmem(obj),
 					   value, out);
 }
@@ -648,7 +648,7 @@ static int live_emit_pte_full_ring(void *arg)
 	 */
 	pr_info("%s emite_pte ring space=%u\n", __func__, rq->ring->space);
 	it = sg_sgt(obj->mm.pages->sgl);
-	len = emit_pte(rq, &it, obj->cache_level, false, 0, CHUNK_SZ);
+	len = emit_pte(rq, &it, obj->pat_index, false, 0, CHUNK_SZ);
 	if (!len) {
 		err = -EINVAL;
 		goto out_rq;
@@ -844,7 +844,7 @@ static int wrap_ktime_compare(const void *A, const void *B)
 
 static int __perf_clear_blt(struct intel_context *ce,
 			    struct scatterlist *sg,
-			    enum i915_cache_level cache_level,
+			    unsigned int pat_index,
 			    bool is_lmem,
 			    size_t sz)
 {
@@ -858,7 +858,7 @@ static int __perf_clear_blt(struct intel_context *ce,
 
 		t0 = ktime_get();
 
-		err = intel_context_migrate_clear(ce, NULL, sg, cache_level,
+		err = intel_context_migrate_clear(ce, NULL, sg, pat_index,
 						  is_lmem, 0, &rq);
 		if (rq) {
 			if (i915_request_wait(rq, 0, MAX_SCHEDULE_TIMEOUT) < 0)
@@ -904,7 +904,8 @@ static int perf_clear_blt(void *arg)
 
 		err = __perf_clear_blt(gt->migrate.context,
 				       dst->mm.pages->sgl,
-				       I915_CACHE_NONE,
+				       i915_gem_get_pat_index(gt->i915,
+							      I915_CACHE_NONE),
 				       i915_gem_object_is_lmem(dst),
 				       sizes[i]);
 
@@ -919,10 +920,10 @@ static int perf_clear_blt(void *arg)
 
 static int __perf_copy_blt(struct intel_context *ce,
 			   struct scatterlist *src,
-			   enum i915_cache_level src_cache_level,
+			   unsigned int src_pat_index,
 			   bool src_is_lmem,
 			   struct scatterlist *dst,
-			   enum i915_cache_level dst_cache_level,
+			   unsigned int dst_pat_index,
 			   bool dst_is_lmem,
 			   size_t sz)
 {
@@ -937,9 +938,9 @@ static int __perf_copy_blt(struct intel_context *ce,
 		t0 = ktime_get();
 
 		err = intel_context_migrate_copy(ce, NULL,
-						 src, src_cache_level,
+						 src, src_pat_index,
 						 src_is_lmem,
-						 dst, dst_cache_level,
+						 dst, dst_pat_index,
 						 dst_is_lmem,
 						 &rq);
 		if (rq) {
@@ -994,10 +995,12 @@ static int perf_copy_blt(void *arg)
 
 		err = __perf_copy_blt(gt->migrate.context,
 				      src->mm.pages->sgl,
-				      I915_CACHE_NONE,
+				      i915_gem_get_pat_index(gt->i915,
+							     I915_CACHE_NONE),
 				      i915_gem_object_is_lmem(src),
 				      dst->mm.pages->sgl,
-				      I915_CACHE_NONE,
+				      i915_gem_get_pat_index(gt->i915,
+							     I915_CACHE_NONE),
 				      i915_gem_object_is_lmem(dst),
 				      sz);
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c
index a9e0a91bc0e0..79aa6ac66ad2 100644
--- a/drivers/gpu/drm/i915/gt/selftest_reset.c
+++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
@@ -86,7 +86,9 @@ __igt_reset_stolen(struct intel_gt *gt,
 
 		ggtt->vm.insert_page(&ggtt->vm, dma,
 				     ggtt->error_capture.start,
-				     I915_CACHE_NONE, 0);
+				     i915_gem_get_pat_index(gt->i915,
+							    I915_CACHE_NONE),
+				     0);
 		mb();
 
 		s = io_mapping_map_wc(&ggtt->iomap,
@@ -127,7 +129,9 @@ __igt_reset_stolen(struct intel_gt *gt,
 
 		ggtt->vm.insert_page(&ggtt->vm, dma,
 				     ggtt->error_capture.start,
-				     I915_CACHE_NONE, 0);
+				     i915_gem_get_pat_index(gt->i915,
+							    I915_CACHE_NONE),
+				     0);
 		mb();
 
 		s = io_mapping_map_wc(&ggtt->iomap,
diff --git a/drivers/gpu/drm/i915/gt/selftest_timeline.c b/drivers/gpu/drm/i915/gt/selftest_timeline.c
index 9f536c251179..39c3ec12df1a 100644
--- a/drivers/gpu/drm/i915/gt/selftest_timeline.c
+++ b/drivers/gpu/drm/i915/gt/selftest_timeline.c
@@ -836,7 +836,7 @@ static int setup_watcher(struct hwsp_watcher *w, struct intel_gt *gt,
 		return PTR_ERR(obj);
 
 	/* keep the same cache settings as timeline */
-	i915_gem_object_set_cache_coherency(obj, tl->hwsp_ggtt->obj->cache_level);
+	i915_gem_object_set_pat_index(obj, tl->hwsp_ggtt->obj->pat_index);
 	w->map = i915_gem_object_pin_map_unlocked(obj,
 						  page_unmask_bits(tl->hwsp_ggtt->obj->mm.mapping));
 	if (IS_ERR(w->map)) {
diff --git a/drivers/gpu/drm/i915/gt/selftest_tlb.c b/drivers/gpu/drm/i915/gt/selftest_tlb.c
index e6cac1f15d6e..4493c8518e91 100644
--- a/drivers/gpu/drm/i915/gt/selftest_tlb.c
+++ b/drivers/gpu/drm/i915/gt/selftest_tlb.c
@@ -36,6 +36,8 @@ pte_tlbinv(struct intel_context *ce,
 	   u64 length,
 	   struct rnd_state *prng)
 {
+	const unsigned int pat_index =
+		i915_gem_get_pat_index(ce->vm->i915, I915_CACHE_NONE);
 	struct drm_i915_gem_object *batch;
 	struct drm_mm_node vb_node;
 	struct i915_request *rq;
@@ -155,7 +157,7 @@ pte_tlbinv(struct intel_context *ce,
 		/* Flip the PTE between A and B */
 		if (i915_gem_object_is_lmem(vb->obj))
 			pte_flags |= PTE_LM;
-		ce->vm->insert_entries(ce->vm, &vb_res, 0, pte_flags);
+		ce->vm->insert_entries(ce->vm, &vb_res, pat_index, pte_flags);
 
 		/* Flush the PTE update to concurrent HW */
 		tlbinv(ce->vm, addr & -length, length);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
index a82a53dbbc86..145681ae20a5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
@@ -890,9 +890,15 @@ static void uc_fw_bind_ggtt(struct intel_uc_fw *uc_fw)
 		pte_flags |= PTE_LM;
 
 	if (ggtt->vm.raw_insert_entries)
-		ggtt->vm.raw_insert_entries(&ggtt->vm, dummy, I915_CACHE_NONE, pte_flags);
+		ggtt->vm.raw_insert_entries(&ggtt->vm, dummy,
+					    i915_gem_get_pat_index(ggtt->vm.i915,
+								   I915_CACHE_NONE),
+					    pte_flags);
 	else
-		ggtt->vm.insert_entries(&ggtt->vm, dummy, I915_CACHE_NONE, pte_flags);
+		ggtt->vm.insert_entries(&ggtt->vm, dummy,
+					i915_gem_get_pat_index(ggtt->vm.i915,
+							       I915_CACHE_NONE),
+					pte_flags);
 }
 
 static void uc_fw_unbind_ggtt(struct intel_uc_fw *uc_fw)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 41389a32e998..9a4922da3a71 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -139,21 +139,56 @@ static const char *stringify_vma_type(const struct i915_vma *vma)
 	return "ppgtt";
 }
 
-static const char *i915_cache_level_str(struct drm_i915_private *i915, int type)
-{
-	switch (type) {
-	case I915_CACHE_NONE: return " uncached";
-	case I915_CACHE_LLC: return HAS_LLC(i915) ? " LLC" : " snooped";
-	case I915_CACHE_L3_LLC: return " L3+LLC";
-	case I915_CACHE_WT: return " WT";
-	default: return "";
+static const char *i915_cache_level_str(struct drm_i915_gem_object *obj)
+{
+	struct drm_i915_private *i915 = obj_to_i915(obj);
+
+	if (IS_METEORLAKE(i915)) {
+		switch (obj->pat_index) {
+		case 0: return " WB";
+		case 1: return " WT";
+		case 2: return " UC";
+		case 3: return " WB (1-Way Coh)";
+		case 4: return " WB (2-Way Coh)";
+		default: return " not defined";
+		}
+	} else if (IS_PONTEVECCHIO(i915)) {
+		switch (obj->pat_index) {
+		case 0: return " UC";
+		case 1: return " WC";
+		case 2: return " WT";
+		case 3: return " WB";
+		case 4: return " WT (CLOS1)";
+		case 5: return " WB (CLOS1)";
+		case 6: return " WT (CLOS2)";
+		case 7: return " WB (CLOS2)";
+		default: return " not defined";
+		}
+	} else if (GRAPHICS_VER(i915) >= 12) {
+		switch (obj->pat_index) {
+		case 0: return " WB";
+		case 1: return " WC";
+		case 2: return " WT";
+		case 3: return " UC";
+		default: return " not defined";
+		}
+	} else {
+		if (i915_gem_object_has_cache_level(obj, I915_CACHE_NONE))
+			return " uncached";
+		else if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC))
+			return HAS_LLC(i915) ? " LLC" : " snooped";
+		else if (i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
+			return " L3+LLC";
+		else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
+			return " WT";
+		else
+			return " not defined";
 	}
 }
 
 void
 i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 {
-	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
 	struct i915_vma *vma;
 	int pin_count = 0;
 
@@ -165,7 +200,7 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		   obj->base.size / 1024,
 		   obj->read_domains,
 		   obj->write_domain,
-		   i915_cache_level_str(dev_priv, obj->cache_level),
+		   i915_cache_level_str(obj),
 		   obj->mm.dirty ? " dirty" : "",
 		   obj->mm.madv == I915_MADV_DONTNEED ? " purgeable" : "");
 	if (obj->base.name)
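
The chained per-platform switches in i915_cache_level_str() could equally be
table-driven. A minimal sketch of that alternative, with the MTL strings
copied from the hunk above and a bounds check standing in for the
"not defined" default:

#include <stdio.h>

static const char * const mtl_pat_str[] = {
	" WB", " WT", " UC", " WB (1-Way Coh)", " WB (2-Way Coh)",
};

static const char *pat_str(const char * const *tbl, unsigned int n,
			   unsigned int idx)
{
	return idx < n ? tbl[idx] : " not defined";
}

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

int main(void)
{
	printf("%s\n", pat_str(mtl_pat_str, ARRAY_LEN(mtl_pat_str), 4));
	printf("%s\n", pat_str(mtl_pat_str, ARRAY_LEN(mtl_pat_str), 9));
	return 0;
}
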
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0a78bdbd36b1..63207b0740b3 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -420,8 +420,12 @@ i915_gem_gtt_pread(struct drm_i915_gem_object *obj,
 		page_length = remain < page_length ? remain : page_length;
 		if (drm_mm_node_allocated(&node)) {
 			ggtt->vm.insert_page(&ggtt->vm,
-					     i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
-					     node.start, I915_CACHE_NONE, 0);
+					i915_gem_object_get_dma_address(obj,
+									offset >> PAGE_SHIFT),
+					node.start,
+					i915_gem_get_pat_index(i915,
+							       I915_CACHE_NONE),
+					0);
 		} else {
 			page_base += offset & PAGE_MASK;
 		}
@@ -598,8 +602,12 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
 			/* flush the write before we modify the GGTT */
 			intel_gt_flush_ggtt_writes(ggtt->vm.gt);
 			ggtt->vm.insert_page(&ggtt->vm,
-					     i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
-					     node.start, I915_CACHE_NONE, 0);
+					i915_gem_object_get_dma_address(obj,
+									offset >> PAGE_SHIFT),
+					node.start,
+					i915_gem_get_pat_index(i915,
+							       I915_CACHE_NONE),
+					0);
 			wmb(); /* flush modifications to the GGTT (insert_page) */
 		} else {
 			page_base += offset & PAGE_MASK;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index f020c0086fbc..2556cabea02c 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1117,10 +1117,14 @@ i915_vma_coredump_create(const struct intel_gt *gt,
 			mutex_lock(&ggtt->error_mutex);
 			if (ggtt->vm.raw_insert_page)
 				ggtt->vm.raw_insert_page(&ggtt->vm, dma, slot,
-							 I915_CACHE_NONE, 0);
+						i915_gem_get_pat_index(gt->i915,
+								       I915_CACHE_NONE),
+						0);
 			else
 				ggtt->vm.insert_page(&ggtt->vm, dma, slot,
-						     I915_CACHE_NONE, 0);
+						i915_gem_get_pat_index(gt->i915,
+								       I915_CACHE_NONE),
+						0);
 			mb();
 
 			s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 20a44788999e..a814775a363d 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -315,7 +315,7 @@ struct i915_vma_work {
 	struct i915_vma_resource *vma_res;
 	struct drm_i915_gem_object *obj;
 	struct i915_sw_dma_fence_cb cb;
-	enum i915_cache_level cache_level;
+	unsigned int pat_index;
 	unsigned int flags;
 };
 
@@ -334,7 +334,7 @@ static void __vma_bind(struct dma_fence_work *work)
 		return;
 
 	vma_res->ops->bind_vma(vma_res->vm, &vw->stash,
-			       vma_res, vw->cache_level, vw->flags);
+			       vma_res, vw->pat_index, vw->flags);
 }
 
 static void __vma_release(struct dma_fence_work *work)
@@ -426,7 +426,7 @@ i915_vma_resource_init_from_vma(struct i915_vma_resource *vma_res,
 /**
 * i915_vma_bind - Sets up PTEs for a VMA in its corresponding address space.
  * @vma: VMA to map
- * @cache_level: mapping cache level
+ * @pat_index: PAT index to set in PTE
  * @flags: flags like global or local mapping
  * @work: preallocated worker for allocating and binding the PTE
  * @vma_res: pointer to a preallocated vma resource. The resource is either
@@ -437,7 +437,7 @@ i915_vma_resource_init_from_vma(struct i915_vma_resource *vma_res,
  * Note that DMA addresses are also the only part of the SG table we care about.
  */
 int i915_vma_bind(struct i915_vma *vma,
-		  enum i915_cache_level cache_level,
+		  unsigned int pat_index,
 		  u32 flags,
 		  struct i915_vma_work *work,
 		  struct i915_vma_resource *vma_res)
@@ -507,7 +507,7 @@ int i915_vma_bind(struct i915_vma *vma,
 		struct dma_fence *prev;
 
 		work->vma_res = i915_vma_resource_get(vma->resource);
-		work->cache_level = cache_level;
+		work->pat_index = pat_index;
 		work->flags = bind_flags;
 
 		/*
@@ -537,7 +537,7 @@ int i915_vma_bind(struct i915_vma *vma,
 
 			return ret;
 		}
-		vma->ops->bind_vma(vma->vm, NULL, vma->resource, cache_level,
+		vma->ops->bind_vma(vma->vm, NULL, vma->resource, pat_index,
 				   bind_flags);
 	}
 
@@ -814,7 +814,7 @@ i915_vma_insert(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 	color = 0;
 
 	if (i915_vm_has_cache_coloring(vma->vm))
-		color = vma->obj->cache_level;
+		color = vma->obj->pat_index;
 
 	if (flags & PIN_OFFSET_FIXED) {
 		u64 offset = flags & PIN_OFFSET_MASK;
@@ -1518,7 +1518,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 
 	GEM_BUG_ON(!vma->pages);
 	err = i915_vma_bind(vma,
-			    vma->obj->cache_level,
+			    vma->obj->pat_index,
 			    flags, work, vma_res);
 	vma_res = NULL;
 	if (err)
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index ed5c9d682a1b..31a8f8aa5558 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -250,7 +250,7 @@ i915_vma_compare(struct i915_vma *vma,
 
 struct i915_vma_work *i915_vma_work(void);
 int i915_vma_bind(struct i915_vma *vma,
-		  enum i915_cache_level cache_level,
+		  unsigned int pat_index,
 		  u32 flags,
 		  struct i915_vma_work *work,
 		  struct i915_vma_resource *vma_res);
diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
index 77fda2244d16..64472b7f0e77 100644
--- a/drivers/gpu/drm/i915/i915_vma_types.h
+++ b/drivers/gpu/drm/i915/i915_vma_types.h
@@ -32,8 +32,6 @@
 
 #include "gem/i915_gem_object_types.h"
 
-enum i915_cache_level;
-
 /**
  * DOC: Global GTT views
  *
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem.c b/drivers/gpu/drm/i915/selftests/i915_gem.c
index d91d0ade8abd..61da4ed9d521 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem.c
@@ -57,7 +57,10 @@ static void trash_stolen(struct drm_i915_private *i915)
 		u32 __iomem *s;
 		int x;
 
-		ggtt->vm.insert_page(&ggtt->vm, dma, slot, I915_CACHE_NONE, 0);
+		ggtt->vm.insert_page(&ggtt->vm, dma, slot,
+				     i915_gem_get_pat_index(i915,
+							    I915_CACHE_NONE),
+				     0);
 
 		s = io_mapping_map_atomic_wc(&ggtt->iomap, slot);
 		for (x = 0; x < PAGE_SIZE / sizeof(u32); x++) {
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
index 37068542aafe..f13a4d265814 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
@@ -245,7 +245,7 @@ static int igt_evict_for_cache_color(void *arg)
 	struct drm_mm_node target = {
 		.start = I915_GTT_PAGE_SIZE * 2,
 		.size = I915_GTT_PAGE_SIZE,
-		.color = I915_CACHE_LLC,
+		.color = i915_gem_get_pat_index(gt->i915, I915_CACHE_LLC),
 	};
 	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
@@ -308,7 +308,7 @@ static int igt_evict_for_cache_color(void *arg)
 	/* Attempt to remove the first *pinned* vma, by removing the (empty)
 	 * neighbour -- this should fail.
 	 */
-	target.color = I915_CACHE_L3_LLC;
+	target.color = i915_gem_get_pat_index(gt->i915, I915_CACHE_L3_LLC);
 
 	mutex_lock(&ggtt->vm.mutex);
 	err = i915_gem_evict_for_node(&ggtt->vm, NULL, &target, 0);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 154801f1c468..36940ef10108 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -135,7 +135,7 @@ fake_dma_object(struct drm_i915_private *i915, u64 size)
 
 	obj->write_domain = I915_GEM_DOMAIN_CPU;
 	obj->read_domains = I915_GEM_DOMAIN_CPU;
-	obj->cache_level = I915_CACHE_NONE;
+	obj->pat_index = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
 
 	/* Preallocate the "backing storage" */
 	if (i915_gem_object_pin_pages_unlocked(obj))
@@ -359,7 +359,9 @@ static int lowlevel_hole(struct i915_address_space *vm,
 
 			with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
 			  vm->insert_entries(vm, mock_vma_res,
-						   I915_CACHE_NONE, 0);
+					     i915_gem_get_pat_index(vm->i915,
+								    I915_CACHE_NONE),
+					     0);
 		}
 		count = n;
 
@@ -1377,7 +1379,10 @@ static int igt_ggtt_page(void *arg)
 
 		ggtt->vm.insert_page(&ggtt->vm,
 				     i915_gem_object_get_dma_address(obj, 0),
-				     offset, I915_CACHE_NONE, 0);
+				     offset,
+				     i915_gem_get_pat_index(i915,
+							    I915_CACHE_NONE),
+				     0);
 	}
 
 	order = i915_random_order(count, &prng);
@@ -1510,7 +1515,7 @@ static int reserve_gtt_with_resource(struct i915_vma *vma, u64 offset)
 	mutex_lock(&vm->mutex);
 	err = i915_gem_gtt_reserve(vm, NULL, &vma->node, obj->base.size,
 				   offset,
-				   obj->cache_level,
+				   obj->pat_index,
 				   0);
 	if (!err) {
 		i915_vma_resource_init_from_vma(vma_res, vma);
@@ -1690,7 +1695,7 @@ static int insert_gtt_with_resource(struct i915_vma *vma)
 
 	mutex_lock(&vm->mutex);
 	err = i915_gem_gtt_insert(vm, NULL, &vma->node, obj->base.size, 0,
-				  obj->cache_level, 0, vm->total, 0);
+				  obj->pat_index, 0, vm->total, 0);
 	if (!err) {
 		i915_vma_resource_init_from_vma(vma_res, vma);
 		vma->resource = vma_res;
diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
index 3b18e5905c86..d985d9bae2e8 100644
--- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
@@ -1070,7 +1070,9 @@ static int igt_lmem_write_cpu(void *arg)
 	/* Put the pages into a known state -- from the gpu for added fun */
 	intel_engine_pm_get(engine);
 	err = intel_context_migrate_clear(engine->gt->migrate.context, NULL,
-					  obj->mm.pages->sgl, I915_CACHE_NONE,
+					  obj->mm.pages->sgl,
+					  i915_gem_get_pat_index(i915,
+								 I915_CACHE_NONE),
 					  true, 0xdeadbeaf, &rq);
 	if (rq) {
 		dma_resv_add_fence(obj->base.resv, &rq->fence,
diff --git a/drivers/gpu/drm/i915/selftests/mock_gtt.c b/drivers/gpu/drm/i915/selftests/mock_gtt.c
index ece97e4faacb..a516c0aa88fd 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gtt.c
@@ -27,21 +27,21 @@
 static void mock_insert_page(struct i915_address_space *vm,
 			     dma_addr_t addr,
 			     u64 offset,
-			     enum i915_cache_level level,
+			     unsigned int pat_index,
 			     u32 flags)
 {
 }
 
 static void mock_insert_entries(struct i915_address_space *vm,
 				struct i915_vma_resource *vma_res,
-				enum i915_cache_level level, u32 flags)
+				unsigned int pat_index, u32 flags)
 {
 }
 
 static void mock_bind_ppgtt(struct i915_address_space *vm,
 			    struct i915_vm_pt_stash *stash,
 			    struct i915_vma_resource *vma_res,
-			    enum i915_cache_level cache_level,
+			    unsigned int pat_index,
 			    u32 flags)
 {
 	GEM_BUG_ON(flags & I915_VMA_GLOBAL_BIND);
@@ -94,7 +94,7 @@ struct i915_ppgtt *mock_ppgtt(struct drm_i915_private *i915, const char *name)
 static void mock_bind_ggtt(struct i915_address_space *vm,
 			   struct i915_vm_pt_stash *stash,
 			   struct i915_vma_resource *vma_res,
-			   enum i915_cache_level cache_level,
+			   unsigned int pat_index,
 			   u32 flags)
 {
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 8/8] drm/i915: Allow user to set cache at BO creation
  2023-04-19 23:00 ` [Intel-gfx] " fei.yang
@ 2023-04-19 23:00   ` fei.yang
  -1 siblings, 0 replies; 76+ messages in thread
From: fei.yang @ 2023-04-19 23:00 UTC (permalink / raw)
  To: intel-gfx; +Cc: Matt Roper, Chris Wilson, Fei Yang, dri-devel, Andi Shyti

From: Fei Yang <fei.yang@intel.com>

To comply with the design that buffer objects shall have an immutable
cache setting throughout their life cycle, the {set, get}_caching
ioctls are no longer supported from MTL onward. With that change,
caching policy can only be set at object creation time. The current
code applies a default (platform dependent) cache setting to all
objects, which is not optimal for performance tuning. This patch
extends the existing gem_create uAPI to let user space set the PAT
index for an object at creation time.
The new extension is platform independent, so UMDs can switch to using
this extension on older platforms as well, while {set, get}_caching
remain supported on those legacy platforms for compatibility reasons.
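
Below is a sketch of how user space might use the extension
(illustrative only: the PAT index value is platform specific, and the
fallback helper is hypothetical):

	struct drm_i915_gem_create_ext_set_pat set_pat_ext = {
		.base = { .name = I915_GEM_CREATE_EXT_SET_PAT },
		.pat_index = 2, /* e.g. uncached (UC) on MTL */
	};
	struct drm_i915_gem_create_ext create_ext = {
		.size = 4096,
		.extensions = (uintptr_t)&set_pat_ext,
	};

	/* Kernels that predate the extension reject the unknown name,
	 * so user space can fall back to the default cache setting. */
	if (ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext))
		fall_back_to_default_caching(); /* hypothetical */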

Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_create.c | 36 ++++++++++++++++++++++
 drivers/gpu/drm/i915/gem/i915_gem_object.c |  6 ++++
 include/uapi/drm/i915_drm.h                | 36 ++++++++++++++++++++++
 tools/include/uapi/drm/i915_drm.h          | 36 ++++++++++++++++++++++
 4 files changed, 114 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index bfe1dbda4cb7..723c3ddd6c74 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -245,6 +245,7 @@ struct create_ext {
 	unsigned int n_placements;
 	unsigned int placement_mask;
 	unsigned long flags;
+	unsigned int pat_index;
 };
 
 static void repr_placements(char *buf, size_t size,
@@ -394,11 +395,39 @@ static int ext_set_protected(struct i915_user_extension __user *base, void *data
 	return 0;
 }
 
+static int ext_set_pat(struct i915_user_extension __user *base, void *data)
+{
+	struct create_ext *ext_data = data;
+	struct drm_i915_private *i915 = ext_data->i915;
+	struct drm_i915_gem_create_ext_set_pat ext;
+	unsigned int max_pat_index;
+
+	BUILD_BUG_ON(sizeof(struct drm_i915_gem_create_ext_set_pat) !=
+		     offsetofend(struct drm_i915_gem_create_ext_set_pat, rsvd));
+
+	if (copy_from_user(&ext, base, sizeof(ext)))
+		return -EFAULT;
+
+	max_pat_index = INTEL_INFO(i915)->max_pat_index;
+
+	if (ext.pat_index > max_pat_index) {
+		drm_dbg(&i915->drm, "PAT index is invalid: %u\n",
+			ext.pat_index);
+		return -EINVAL;
+	}
+
+	ext_data->pat_index = ext.pat_index;
+
+	return 0;
+}
+
 static const i915_user_extension_fn create_extensions[] = {
 	[I915_GEM_CREATE_EXT_MEMORY_REGIONS] = ext_set_placements,
 	[I915_GEM_CREATE_EXT_PROTECTED_CONTENT] = ext_set_protected,
+	[I915_GEM_CREATE_EXT_SET_PAT] = ext_set_pat,
 };
 
+#define PAT_INDEX_NOT_SET	0xffff
 /**
  * i915_gem_create_ext_ioctl - Creates a new mm object and returns a handle to it.
  * @dev: drm device pointer
@@ -418,6 +447,7 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 	if (args->flags & ~I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS)
 		return -EINVAL;
 
+	ext_data.pat_index = PAT_INDEX_NOT_SET;
 	ret = i915_user_extensions(u64_to_user_ptr(args->extensions),
 				   create_extensions,
 				   ARRAY_SIZE(create_extensions),
@@ -454,5 +484,11 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
 
+	if (ext_data.pat_index != PAT_INDEX_NOT_SET) {
+		i915_gem_object_set_pat_index(obj, ext_data.pat_index);
+		/* Mark pat_index is set by UMD */
+		obj->cache_level = I915_CACHE_INVAL;
+	}
+
 	return i915_gem_publish(obj, file, &args->size, &args->handle);
 }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 27c948350b5b..61651f7e5806 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -209,6 +209,12 @@ bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
 	if (!(obj->flags & I915_BO_ALLOC_USER))
 		return false;
 
+	/*
+	 * Always flush cache for UMD objects at creation time.
+	 */
+	if (obj->cache_level == I915_CACHE_INVAL)
+		return true;
+
 	/*
 	 * EHL and JSL add the 'Bypass LLC' MOCS entry, which should make it
 	 * possible for userspace to bypass the GTT caching bits set by the
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index dba7c5a5b25e..03c5c314846e 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3630,9 +3630,13 @@ struct drm_i915_gem_create_ext {
 	 *
 	 * For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see
 	 * struct drm_i915_gem_create_ext_protected_content.
+	 *
+	 * For I915_GEM_CREATE_EXT_SET_PAT usage see
+	 * struct drm_i915_gem_create_ext_set_pat.
 	 */
 #define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
 #define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1
+#define I915_GEM_CREATE_EXT_SET_PAT 2
 	__u64 extensions;
 };
 
@@ -3747,6 +3751,38 @@ struct drm_i915_gem_create_ext_protected_content {
 	__u32 flags;
 };
 
+/**
+ * struct drm_i915_gem_create_ext_set_pat - The
+ * I915_GEM_CREATE_EXT_SET_PAT extension.
+ *
+ * If this extension is provided, the specified caching policy (PAT index) is
+ * applied to the buffer object.
+ *
+ * Below is an example on how to create an object with specific caching policy:
+ *
+ * .. code-block:: C
+ *
+ *      struct drm_i915_gem_create_ext_set_pat set_pat_ext = {
+ *              .base = { .name = I915_GEM_CREATE_EXT_SET_PAT },
+ *              .pat_index = 0,
+ *      };
+ *      struct drm_i915_gem_create_ext create_ext = {
+ *              .size = PAGE_SIZE,
+ *              .extensions = (uintptr_t)&set_pat_ext,
+ *      };
+ *
+ *      int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
+ *      if (err) ...
+ */
+struct drm_i915_gem_create_ext_set_pat {
+	/** @base: Extension link. See struct i915_user_extension. */
+	struct i915_user_extension base;
+	/** @pat_index: PAT index to be set */
+	__u32 pat_index;
+	/** @rsvd: reserved for future use */
+	__u32 rsvd;
+};
+
 /* ID of the protected content session managed by i915 when PXP is active */
 #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
 
diff --git a/tools/include/uapi/drm/i915_drm.h b/tools/include/uapi/drm/i915_drm.h
index 8df261c5ab9b..8cdcdb5fac26 100644
--- a/tools/include/uapi/drm/i915_drm.h
+++ b/tools/include/uapi/drm/i915_drm.h
@@ -3607,9 +3607,13 @@ struct drm_i915_gem_create_ext {
 	 *
 	 * For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see
 	 * struct drm_i915_gem_create_ext_protected_content.
+	 *
+	 * For I915_GEM_CREATE_EXT_SET_PAT usage see
+	 * struct drm_i915_gem_create_ext_set_pat.
 	 */
 #define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
 #define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1
+#define I915_GEM_CREATE_EXT_SET_PAT 2
 	__u64 extensions;
 };
 
@@ -3724,6 +3728,38 @@ struct drm_i915_gem_create_ext_protected_content {
 	__u32 flags;
 };
 
+/**
+ * struct drm_i915_gem_create_ext_set_pat - The
+ * I915_GEM_CREATE_EXT_SET_PAT extension.
+ *
+ * If this extension is provided, the specified caching policy (PAT index) is
+ * applied to the buffer object.
+ *
+ * Below is an example on how to create an object with specific caching policy:
+ *
+ * .. code-block:: C
+ *
+ *      struct drm_i915_gem_create_ext_set_pat set_pat_ext = {
+ *              .base = { .name = I915_GEM_CREATE_EXT_SET_PAT },
+ *              .pat_index = 0,
+ *      };
+ *      struct drm_i915_gem_create_ext create_ext = {
+ *              .size = PAGE_SIZE,
+ *              .extensions = (uintptr_t)&set_pat_ext,
+ *      };
+ *
+ *      int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
+ *      if (err) ...
+ */
+struct drm_i915_gem_create_ext_set_pat {
+	/** @base: Extension link. See struct i915_user_extension. */
+	struct i915_user_extension base;
+	/** @pat_index: PAT index to be set */
+	__u32 pat_index;
+	/** @rsvd: reserved for future use */
+	__u32 rsvd;
+};
+
 /* ID of the protected content session managed by i915 when PXP is active */
 #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915/mtl: Define MOCS and PAT tables for MTL (rev8)
  2023-04-19 23:00 ` [Intel-gfx] " fei.yang
                   ` (8 preceding siblings ...)
  (?)
@ 2023-04-19 23:29 ` Patchwork
  -1 siblings, 0 replies; 76+ messages in thread
From: Patchwork @ 2023-04-19 23:29 UTC (permalink / raw)
  To: Yang, Fei; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/mtl: Define MOCS and PAT tables for MTL (rev8)
URL   : https://patchwork.freedesktop.org/series/115980/
State : warning

== Summary ==

Error: dim sparse failed
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/mtl: Define MOCS and PAT tables for MTL (rev8)
  2023-04-19 23:00 ` [Intel-gfx] " fei.yang
                   ` (9 preceding siblings ...)
  (?)
@ 2023-04-19 23:51 ` Patchwork
  -1 siblings, 0 replies; 76+ messages in thread
From: Patchwork @ 2023-04-19 23:51 UTC (permalink / raw)
  To: Yang, Fei; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/mtl: Define MOCS and PAT tables for MTL (rev8)
URL   : https://patchwork.freedesktop.org/series/115980/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_13029 -> Patchwork_115980v8
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_115980v8 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_115980v8, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115980v8/index.html

Participating hosts (38 -> 37)
------------------------------

  Missing    (1): fi-snb-2520m 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_115980v8:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live@execlists:
    - fi-apl-guc:         [PASS][1] -> [DMESG-FAIL][2] +1 similar issue
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13029/fi-apl-guc/igt@i915_selftest@live@execlists.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115980v8/fi-apl-guc/igt@i915_selftest@live@execlists.html
    - fi-glk-j4005:       [PASS][3] -> [ABORT][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13029/fi-glk-j4005/igt@i915_selftest@live@execlists.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115980v8/fi-glk-j4005/igt@i915_selftest@live@execlists.html

  * igt@i915_selftest@live@gt_engines:
    - fi-glk-j4005:       [PASS][5] -> [FAIL][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13029/fi-glk-j4005/igt@i915_selftest@live@gt_engines.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115980v8/fi-glk-j4005/igt@i915_selftest@live@gt_engines.html

  * igt@i915_selftest@live@gt_mocs:
    - fi-glk-j4005:       [PASS][7] -> [DMESG-FAIL][8] +2 similar issues
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13029/fi-glk-j4005/igt@i915_selftest@live@gt_mocs.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115980v8/fi-glk-j4005/igt@i915_selftest@live@gt_mocs.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@i915_selftest@live@requests:
    - {bat-mtlp-8}:       NOTRUN -> [ABORT][9]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115980v8/bat-mtlp-8/igt@i915_selftest@live@requests.html

  
Known issues
------------

  Here are the changes found in Patchwork_115980v8 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_suspend@basic-s3@smem:
    - bat-rpls-1:         NOTRUN -> [ABORT][10] ([i915#6687] / [i915#7978])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115980v8/bat-rpls-1/igt@gem_exec_suspend@basic-s3@smem.html

  * igt@i915_selftest@live@gt_heartbeat:
    - fi-kbl-soraka:      [PASS][11] -> [DMESG-FAIL][12] ([i915#5334] / [i915#7872])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13029/fi-kbl-soraka/igt@i915_selftest@live@gt_heartbeat.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115980v8/fi-kbl-soraka/igt@i915_selftest@live@gt_heartbeat.html

  * igt@i915_selftest@live@hangcheck:
    - bat-dg2-8:          [PASS][13] -> [ABORT][14] ([i915#7913] / [i915#7979])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13029/bat-dg2-8/igt@i915_selftest@live@hangcheck.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115980v8/bat-dg2-8/igt@i915_selftest@live@hangcheck.html

  * igt@i915_selftest@live@slpc:
    - bat-rpls-1:         NOTRUN -> [DMESG-FAIL][15] ([i915#6367] / [i915#7996])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115980v8/bat-rpls-1/igt@i915_selftest@live@slpc.html

  * igt@kms_pipe_crc_basic@read-crc:
    - bat-adlp-9:         NOTRUN -> [SKIP][16] ([i915#3546]) +1 similar issue
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115980v8/bat-adlp-9/igt@kms_pipe_crc_basic@read-crc.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@gt_mocs:
    - {bat-mtlp-8}:       [ABORT][17] ([i915#8369]) -> [PASS][18]
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13029/bat-mtlp-8/igt@i915_selftest@live@gt_mocs.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115980v8/bat-mtlp-8/igt@i915_selftest@live@gt_mocs.html

  * igt@i915_selftest@live@gt_pm:
    - {bat-mtlp-8}:       [DMESG-FAIL][19] ([i915#8370]) -> [PASS][20]
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13029/bat-mtlp-8/igt@i915_selftest@live@gt_pm.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115980v8/bat-mtlp-8/igt@i915_selftest@live@gt_pm.html

  * igt@i915_selftest@live@requests:
    - bat-rpls-1:         [ABORT][21] ([i915#7911]) -> [PASS][22]
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13029/bat-rpls-1/igt@i915_selftest@live@requests.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115980v8/bat-rpls-1/igt@i915_selftest@live@requests.html

  * igt@i915_selftest@live@workarounds:
    - bat-adlm-1:         [DMESG-FAIL][23] -> [PASS][24]
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13029/bat-adlm-1/igt@i915_selftest@live@workarounds.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115980v8/bat-adlm-1/igt@i915_selftest@live@workarounds.html

  * igt@kms_pipe_crc_basic@suspend-read-crc@pipe-a-edp-1:
    - bat-rplp-1:         [FAIL][25] ([fdo#103375]) -> [PASS][26] +5 similar issues
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13029/bat-rplp-1/igt@kms_pipe_crc_basic@suspend-read-crc@pipe-a-edp-1.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115980v8/bat-rplp-1/igt@kms_pipe_crc_basic@suspend-read-crc@pipe-a-edp-1.html

  
#### Warnings ####

  * igt@kms_chamelium_frames@dp-crc-fast:
    - fi-kbl-soraka:      [INCOMPLETE][27] -> [SKIP][28] ([fdo#109271])
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13029/fi-kbl-soraka/igt@kms_chamelium_frames@dp-crc-fast.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115980v8/fi-kbl-soraka/igt@kms_chamelium_frames@dp-crc-fast.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103375]: https://bugs.freedesktop.org/show_bug.cgi?id=103375
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [i915#3546]: https://gitlab.freedesktop.org/drm/intel/issues/3546
  [i915#5334]: https://gitlab.freedesktop.org/drm/intel/issues/5334
  [i915#6367]: https://gitlab.freedesktop.org/drm/intel/issues/6367
  [i915#6687]: https://gitlab.freedesktop.org/drm/intel/issues/6687
  [i915#7872]: https://gitlab.freedesktop.org/drm/intel/issues/7872
  [i915#7911]: https://gitlab.freedesktop.org/drm/intel/issues/7911
  [i915#7913]: https://gitlab.freedesktop.org/drm/intel/issues/7913
  [i915#7978]: https://gitlab.freedesktop.org/drm/intel/issues/7978
  [i915#7979]: https://gitlab.freedesktop.org/drm/intel/issues/7979
  [i915#7996]: https://gitlab.freedesktop.org/drm/intel/issues/7996
  [i915#8369]: https://gitlab.freedesktop.org/drm/intel/issues/8369
  [i915#8370]: https://gitlab.freedesktop.org/drm/intel/issues/8370


Build changes
-------------

  * Linux: CI_DRM_13029 -> Patchwork_115980v8

  CI-20190529: 20190529
  CI_DRM_13029: 5eae4746072c2d127c8bd21c76036072aec806a4 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_7260: 5a0dab0153d184b4497e5e25305699f76a20b303 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_115980v8: 5eae4746072c2d127c8bd21c76036072aec806a4 @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

e22d629e0ae7 drm/i915: Allow user to set cache at BO creation
204ed6727086 drm/i915: use pat_index instead of cache_level
dbbf115ee963 drm/i915: preparation for using PAT index
3d0146b9febb drm/i915/mtl: end support for set caching ioctl
fd9c0cec5544 drm/i915/mtl: workaround coherency issue for Media
87e4d08f2809 drm/i915/mtl: Add PTE encode function
32550005f5f4 drm/i915/mtl: Define MOCS and PAT tables for MTL
b60ac82926c6 drm/i915/mtl: Set has_llc=0

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115980v8/index.html

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Intel-gfx] [PATCH 4/8] drm/i915/mtl: workaround coherency issue for Media
  2023-04-19 23:00   ` [Intel-gfx] " fei.yang
  (?)
@ 2023-04-20  8:26   ` Andrzej Hajda
  -1 siblings, 0 replies; 76+ messages in thread
From: Andrzej Hajda @ 2023-04-20  8:26 UTC (permalink / raw)
  To: fei.yang, intel-gfx; +Cc: dri-devel, Nirmoy Das

On 20.04.2023 01:00, fei.yang@intel.com wrote:
> From: Fei Yang <fei.yang@intel.com>
> 
> This patch implements Wa_22016122933.
> 
> In MTL, memory writes initiated by the Media tile update the whole
> cache line even for partial writes. This creates a coherency problem
> for cacheable memory if both the CPU and the GPU are writing data to
> different locations within a single cache line. CTB communication is
> impacted by this issue because the head and tail pointers are
> adjacent words within a cache line (see struct guc_ct_buffer_desc),
> where one is written by GuC and the other by the host.
> This patch circumvents the issue by making the CPU/GPU shared memory
> uncacheable (WC on the CPU side, and PAT index 2 for the GPU). Also,
> for the CTB, which is updated by both the CPU and GuC, an mfence
> instruction is added to make sure CPU writes are visible to the GPU
> right away (flushing the write-combining buffer).
> 
> While fixing the CTB issue, we noticed some random GSC firmware
> loading failures because the shared buffers are cacheable (WB) on the
> CPU side but uncached on the GPU side. To fix these issues we need to
> map such shared buffers as WC on the CPU side. Since such allocations
> are not all done through the GuC allocator, to avoid too many code
> changes, i915_coherent_map_type() is now hard-coded to return WC for
> MTL.
> 
> BSpec: 45101
> 
> Signed-off-by: Fei Yang <fei.yang@intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> Acked-by: Nirmoy Das <nirmoy.das@intel.com>

Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>

Regards
Andrzej
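
As a rough illustration of the write-combining hazard described in the
commit message above (a conceptual sketch only; the actual patch calls
intel_guc_write_barrier(), the plain wmb() here is a simplification):

	/* Stores to WC-mapped memory can linger in the CPU's
	 * write-combining buffer; publish them with a store fence
	 * before GuC polls the descriptor. */
	WRITE_ONCE(desc->head, head);
	wmb(); /* on x86 an sfence, flushing pending WC stores */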
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_pages.c |  5 ++++-
>   drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c | 13 +++++++++++++
>   drivers/gpu/drm/i915/gt/uc/intel_guc.c    |  7 +++++++
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  6 ++++++
>   4 files changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index ecd86130b74f..89fc8ea6bcfc 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -469,7 +469,10 @@ enum i915_map_type i915_coherent_map_type(struct drm_i915_private *i915,
>   					  struct drm_i915_gem_object *obj,
>   					  bool always_coherent)
>   {
> -	if (i915_gem_object_is_lmem(obj))
> +	/*
> +	 * Wa_22016122933: always return I915_MAP_WC for MTL
> +	 */
> +	if (i915_gem_object_is_lmem(obj) || IS_METEORLAKE(i915))
>   		return I915_MAP_WC;
>   	if (HAS_LLC(i915) || always_coherent)
>   		return I915_MAP_WB;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> index 1d9fdfb11268..236673c02f9a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> @@ -110,6 +110,13 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
>   	if (obj->base.size < gsc->fw.size)
>   		return -ENOSPC;
>   
> +	/*
> +	 * Wa_22016122933: For MTL the shared memory needs to be mapped
> +	 * as WC on CPU side and UC (PAT index 2) on GPU side
> +	 */
> +	if (IS_METEORLAKE(i915))
> +		i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
> +
>   	dst = i915_gem_object_pin_map_unlocked(obj,
>   					       i915_coherent_map_type(i915, obj, true));
>   	if (IS_ERR(dst))
> @@ -125,6 +132,12 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
>   	memset(dst, 0, obj->base.size);
>   	memcpy(dst, src, gsc->fw.size);
>   
> +	/*
> +	 * Wa_22016122933: Making sure the data in dst is
> +	 * visible to GSC right away
> +	 */
> +	intel_guc_write_barrier(&gt->uc.guc);
> +
>   	i915_gem_object_unpin_map(gsc->fw.obj);
>   	i915_gem_object_unpin_map(obj);
>   
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index e89f16ecf1ae..c9f20385f6a0 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -744,6 +744,13 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size)
>   	if (IS_ERR(obj))
>   		return ERR_CAST(obj);
>   
> +	/*
> +	 * Wa_22016122933: For MTL the shared memory needs to be mapped
> +	 * as WC on CPU side and UC (PAT index 2) on GPU side
> +	 */
> +	if (IS_METEORLAKE(gt->i915))
> +		i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
> +
>   	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>   	if (IS_ERR(vma))
>   		goto err;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 1803a633ed64..99a0a89091e7 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -902,6 +902,12 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>   	/* now update descriptor */
>   	WRITE_ONCE(desc->head, head);
>   
> +	/*
> +	 * Wa_22016122933: Making sure the head update is
> +	 * visible to GuC right away
> +	 */
> +	intel_guc_write_barrier(ct_to_guc(ct));
> +
>   	return available - len;
>   
>   corrupted:


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Intel-gfx] [PATCH 6/8] drm/i915: preparation for using PAT index
  2023-04-19 23:00   ` [Intel-gfx] " fei.yang
  (?)
@ 2023-04-20  8:45   ` Andrzej Hajda
  -1 siblings, 0 replies; 76+ messages in thread
From: Andrzej Hajda @ 2023-04-20  8:45 UTC (permalink / raw)
  To: fei.yang, intel-gfx; +Cc: Chris Wilson, Matt Roper, dri-devel

On 20.04.2023 01:00, fei.yang@intel.com wrote:
> From: Fei Yang <fei.yang@intel.com>
> 
> This patch is a preparation for replacing enum i915_cache_level with
> PAT index. Caching policy for buffer objects is set through the PAT
> index in the PTE; the old i915_cache_level is not sufficient to
> represent all the caching modes supported by the hardware.
> 
> Prepare for the transition by adding some platform-dependent data
> structures and helper functions to translate cache_level to pat_index.
> 
> cachelevel_to_pat: a platform dependent array mapping cache_level to
>                     pat_index.
> 
> max_pat_index: the maximum PAT index supported by the hardware. Needed for
>                 validating the PAT index passed in from user space.
> 
> i915_gem_get_pat_index: function to convert cache_level to PAT index.
> 
> obj_to_i915(obj): macro moved to header file for wider usage.
> 
> I915_MAX_CACHE_LEVEL: upper bound of i915_cache_level for the
>                        convenience of coding.
> 
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Signed-off-by: Fei Yang <fei.yang@intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_object.c    |  9 +++
>   drivers/gpu/drm/i915/gem/i915_gem_object.h    |  4 +
>   .../gpu/drm/i915/gem/i915_gem_object_types.h  |  1 +
>   drivers/gpu/drm/i915/gem/i915_gem_shrinker.c  |  2 -
>   drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  6 ++
>   drivers/gpu/drm/i915/gt/intel_ggtt.c          |  6 ++
>   drivers/gpu/drm/i915/i915_pci.c               | 75 +++++++++++++++++--
>   drivers/gpu/drm/i915/intel_device_info.h      |  5 ++
>   .../gpu/drm/i915/selftests/mock_gem_device.c  |  9 +++
>   9 files changed, 107 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 4666bb82f312..8c70a0ec7d2f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -45,6 +45,15 @@ static struct kmem_cache *slab_objects;
>   
>   static const struct drm_gem_object_funcs i915_gem_object_funcs;
>   
> +unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
> +				    enum i915_cache_level level)
> +{
> +	if (drm_WARN_ON(&i915->drm, level >= I915_MAX_CACHE_LEVEL))
> +		return 0;
> +
> +	return INTEL_INFO(i915)->cachelevel_to_pat[level];
> +}
> +
>   struct drm_i915_gem_object *i915_gem_object_alloc(void)
>   {
>   	struct drm_i915_gem_object *obj;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> index 885ccde9dc3c..4c92e17b4337 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> @@ -20,6 +20,8 @@
>   
>   enum intel_region_id;
>   
> +#define obj_to_i915(obj__) to_i915((obj__)->base.dev)
> +

While it is being moved, it could also be converted to an inline
function; up to you.
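
For reference, the inline form could look like this (just a sketch, not
part of the patch):

	static inline struct drm_i915_private *
	obj_to_i915(struct drm_i915_gem_object *obj)
	{
		return to_i915(obj->base.dev);
	}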


>   static inline bool i915_gem_object_size_2big(u64 size)
>   {
>   	struct drm_i915_gem_object *obj;
> @@ -30,6 +32,8 @@ static inline bool i915_gem_object_size_2big(u64 size)
>   	return false;
>   }
>   
> +unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
> +				    enum i915_cache_level level);
>   void i915_gem_init__objects(struct drm_i915_private *i915);
>   
>   void i915_objects_module_exit(void);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index 830c11431ee8..41b35abccf88 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -194,6 +194,7 @@ enum i915_cache_level {
>   	 * engine.
>   	 */
>   	I915_CACHE_WT,
> +	I915_MAX_CACHE_LEVEL,
>   };
>   
>   enum i915_map_type {
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
> index b1672e054b21..214763942aa2 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
> @@ -460,8 +460,6 @@ void i915_gem_shrinker_taints_mutex(struct drm_i915_private *i915,
>   	fs_reclaim_release(GFP_KERNEL);
>   }
>   
> -#define obj_to_i915(obj__) to_i915((obj__)->base.dev)
> -
>   /**
>    * i915_gem_object_make_unshrinkable - Hide the object from the shrinker. By
>    * default all object types that support shrinking(see IS_SHRINKABLE), will also
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index 11b91e0453c8..7a4b1d1afce9 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -78,6 +78,12 @@ static u64 mtl_pte_encode(dma_addr_t addr,
>   	case I915_CACHE_WT:
>   		pte |= GEN12_PPGTT_PTE_PAT0;
>   		break;
> +	default:
> +		/* This should never happen. Added to deal with the compile
> +		 * error due to the addition of I915_MAX_CACHE_LEVEL. Will
> +		 * be removed by the pat_index patch.
> +		 */
> +		break;
>   	}
>   
>   	return pte;
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index 20915edc8bd9..c8390d03fce2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -242,6 +242,12 @@ static u64 mtl_ggtt_pte_encode(dma_addr_t addr,
>   	case I915_CACHE_WT:
>   		pte |= MTL_GGTT_PTE_PAT0;
>   		break;
> +	default:
> +		/* This should never happen. Added to deal with the compile
> +		 * error due to the addition of I915_MAX_CACHE_LEVEL. Will
> +		 * be removed by the pat_index patch.
> +		 */
> +		break;
>   	}
>   
>   	return pte;
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index 272a8ba37b64..4ca0ea8fce9b 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -30,6 +30,7 @@
>   #include "display/intel_display_driver.h"
>   #include "gt/intel_gt_regs.h"
>   #include "gt/intel_sa_media.h"
> +#include "gem/i915_gem_object_types.h"
>   
>   #include "i915_driver.h"
>   #include "i915_drv.h"
> @@ -164,6 +165,38 @@
>   		.gamma_lut_tests = DRM_COLOR_LUT_NON_DECREASING, \
>   	}
>   
> +#define LEGACY_CACHELEVEL \
> +	.cachelevel_to_pat = { \
> +		[I915_CACHE_NONE]   = 0, \
> +		[I915_CACHE_LLC]    = 1, \
> +		[I915_CACHE_L3_LLC] = 2, \
> +		[I915_CACHE_WT]     = 3, \
> +	}
> +
> +#define TGL_CACHELEVEL \
> +	.cachelevel_to_pat = { \
> +		[I915_CACHE_NONE]   = 3, \
> +		[I915_CACHE_LLC]    = 0, \
> +		[I915_CACHE_L3_LLC] = 0, \
> +		[I915_CACHE_WT]     = 2, \
> +	}
> +
> +#define PVC_CACHELEVEL \
> +	.cachelevel_to_pat = { \
> +		[I915_CACHE_NONE]   = 0, \
> +		[I915_CACHE_LLC]    = 3, \
> +		[I915_CACHE_L3_LLC] = 3, \
> +		[I915_CACHE_WT]     = 2, \
> +	}
> +
> +#define MTL_CACHELEVEL \
> +	.cachelevel_to_pat = { \
> +		[I915_CACHE_NONE]   = 2, \
> +		[I915_CACHE_LLC]    = 3, \
> +		[I915_CACHE_L3_LLC] = 3, \
> +		[I915_CACHE_WT]     = 1, \
> +	}
> +
>   /* Keep in gen based order, and chronological order within a gen */
>   
>   #define GEN_DEFAULT_PAGE_SIZES \
> @@ -189,11 +222,13 @@
>   	.has_snoop = true, \
>   	.has_coherent_ggtt = false, \
>   	.dma_mask_size = 32, \
> +	.max_pat_index = 3, \
>   	I9XX_PIPE_OFFSETS, \
>   	I9XX_CURSOR_OFFSETS, \
>   	I9XX_COLORS, \
>   	GEN_DEFAULT_PAGE_SIZES, \
> -	GEN_DEFAULT_REGIONS
> +	GEN_DEFAULT_REGIONS, \
> +	LEGACY_CACHELEVEL
>   
>   #define I845_FEATURES \
>   	GEN(2), \
> @@ -210,11 +245,13 @@
>   	.has_snoop = true, \
>   	.has_coherent_ggtt = false, \
>   	.dma_mask_size = 32, \
> +	.max_pat_index = 3, \
>   	I845_PIPE_OFFSETS, \
>   	I845_CURSOR_OFFSETS, \
>   	I845_COLORS, \
>   	GEN_DEFAULT_PAGE_SIZES, \
> -	GEN_DEFAULT_REGIONS
> +	GEN_DEFAULT_REGIONS, \
> +	LEGACY_CACHELEVEL
>   
>   static const struct intel_device_info i830_info = {
>   	I830_FEATURES,
> @@ -249,11 +286,13 @@ static const struct intel_device_info i865g_info = {
>   	.has_snoop = true, \
>   	.has_coherent_ggtt = true, \
>   	.dma_mask_size = 32, \
> +	.max_pat_index = 3, \
>   	I9XX_PIPE_OFFSETS, \
>   	I9XX_CURSOR_OFFSETS, \
>   	I9XX_COLORS, \
>   	GEN_DEFAULT_PAGE_SIZES, \
> -	GEN_DEFAULT_REGIONS
> +	GEN_DEFAULT_REGIONS, \
> +	LEGACY_CACHELEVEL
>   
>   static const struct intel_device_info i915g_info = {
>   	GEN3_FEATURES,
> @@ -341,11 +380,13 @@ static const struct intel_device_info pnv_m_info = {
>   	.has_snoop = true, \
>   	.has_coherent_ggtt = true, \
>   	.dma_mask_size = 36, \
> +	.max_pat_index = 3, \
>   	I9XX_PIPE_OFFSETS, \
>   	I9XX_CURSOR_OFFSETS, \
>   	I9XX_COLORS, \
>   	GEN_DEFAULT_PAGE_SIZES, \
> -	GEN_DEFAULT_REGIONS
> +	GEN_DEFAULT_REGIONS, \
> +	LEGACY_CACHELEVEL
>   
>   static const struct intel_device_info i965g_info = {
>   	GEN4_FEATURES,
> @@ -395,11 +436,13 @@ static const struct intel_device_info gm45_info = {
>   	/* ilk does support rc6, but we do not implement [power] contexts */ \
>   	.has_rc6 = 0, \
>   	.dma_mask_size = 36, \
> +	.max_pat_index = 3, \
>   	I9XX_PIPE_OFFSETS, \
>   	I9XX_CURSOR_OFFSETS, \
>   	ILK_COLORS, \
>   	GEN_DEFAULT_PAGE_SIZES, \
> -	GEN_DEFAULT_REGIONS
> +	GEN_DEFAULT_REGIONS, \
> +	LEGACY_CACHELEVEL
>   
>   static const struct intel_device_info ilk_d_info = {
>   	GEN5_FEATURES,
> @@ -429,13 +472,15 @@ static const struct intel_device_info ilk_m_info = {
>   	.has_rc6p = 0, \
>   	.has_rps = true, \
>   	.dma_mask_size = 40, \
> +	.max_pat_index = 3, \
>   	.__runtime.ppgtt_type = INTEL_PPGTT_ALIASING, \
>   	.__runtime.ppgtt_size = 31, \
>   	I9XX_PIPE_OFFSETS, \
>   	I9XX_CURSOR_OFFSETS, \
>   	ILK_COLORS, \
>   	GEN_DEFAULT_PAGE_SIZES, \
> -	GEN_DEFAULT_REGIONS
> +	GEN_DEFAULT_REGIONS, \
> +	LEGACY_CACHELEVEL
>   
>   #define SNB_D_PLATFORM \
>   	GEN6_FEATURES, \
> @@ -482,13 +527,15 @@ static const struct intel_device_info snb_m_gt2_info = {
>   	.has_reset_engine = true, \
>   	.has_rps = true, \
>   	.dma_mask_size = 40, \
> +	.max_pat_index = 3, \
>   	.__runtime.ppgtt_type = INTEL_PPGTT_ALIASING, \
>   	.__runtime.ppgtt_size = 31, \
>   	IVB_PIPE_OFFSETS, \
>   	IVB_CURSOR_OFFSETS, \
>   	IVB_COLORS, \
>   	GEN_DEFAULT_PAGE_SIZES, \
> -	GEN_DEFAULT_REGIONS
> +	GEN_DEFAULT_REGIONS, \
> +	LEGACY_CACHELEVEL
>   
>   #define IVB_D_PLATFORM \
>   	GEN7_FEATURES, \
> @@ -542,6 +589,7 @@ static const struct intel_device_info vlv_info = {
>   	.display.has_gmch = 1,
>   	.display.has_hotplug = 1,
>   	.dma_mask_size = 40,
> +	.max_pat_index = 3,
>   	.__runtime.ppgtt_type = INTEL_PPGTT_ALIASING,
>   	.__runtime.ppgtt_size = 31,
>   	.has_snoop = true,
> @@ -553,6 +601,7 @@ static const struct intel_device_info vlv_info = {
>   	I9XX_COLORS,
>   	GEN_DEFAULT_PAGE_SIZES,
>   	GEN_DEFAULT_REGIONS,
> +	LEGACY_CACHELEVEL,
>   };
>   
>   #define G75_FEATURES  \
> @@ -640,6 +689,7 @@ static const struct intel_device_info chv_info = {
>   	.has_logical_ring_contexts = 1,
>   	.display.has_gmch = 1,
>   	.dma_mask_size = 39,
> +	.max_pat_index = 3,
>   	.__runtime.ppgtt_type = INTEL_PPGTT_FULL,
>   	.__runtime.ppgtt_size = 32,
>   	.has_reset_engine = 1,
> @@ -651,6 +701,7 @@ static const struct intel_device_info chv_info = {
>   	CHV_COLORS,
>   	GEN_DEFAULT_PAGE_SIZES,
>   	GEN_DEFAULT_REGIONS,
> +	LEGACY_CACHELEVEL,
>   };
>   
>   #define GEN9_DEFAULT_PAGE_SIZES \
> @@ -890,9 +941,11 @@ static const struct intel_device_info jsl_info = {
>   		[TRANSCODER_DSI_1] = TRANSCODER_DSI1_OFFSET, \
>   	}, \
>   	TGL_CURSOR_OFFSETS, \
> +	TGL_CACHELEVEL, \
>   	.has_global_mocs = 1, \
>   	.has_pxp = 1, \
> -	.display.has_dsb = 1
> +	.display.has_dsb = 1, \
> +	.max_pat_index = 3
>   
>   static const struct intel_device_info tgl_info = {
>   	GEN12_FEATURES,
> @@ -1014,6 +1067,7 @@ static const struct intel_device_info adl_p_info = {
>   	.__runtime.graphics.ip.ver = 12, \
>   	.__runtime.graphics.ip.rel = 50, \
>   	XE_HP_PAGE_SIZES, \
> +	TGL_CACHELEVEL, \
>   	.dma_mask_size = 46, \
>   	.has_3d_pipeline = 1, \
>   	.has_64bit_reloc = 1, \
> @@ -1032,6 +1086,7 @@ static const struct intel_device_info adl_p_info = {
>   	.has_reset_engine = 1, \
>   	.has_rps = 1, \
>   	.has_runtime_pm = 1, \
> +	.max_pat_index = 3, \
>   	.__runtime.ppgtt_size = 48, \
>   	.__runtime.ppgtt_type = INTEL_PPGTT_FULL
>   
> @@ -1108,11 +1163,13 @@ static const struct intel_device_info pvc_info = {
>   	PLATFORM(INTEL_PONTEVECCHIO),
>   	NO_DISPLAY,
>   	.has_flat_ccs = 0,
> +	.max_pat_index = 7,
>   	.__runtime.platform_engine_mask =
>   		BIT(BCS0) |
>   		BIT(VCS0) |
>   		BIT(CCS0) | BIT(CCS1) | BIT(CCS2) | BIT(CCS3),
>   	.require_force_probe = 1,
> +	PVC_CACHELEVEL,
>   };
>   
>   #define XE_LPDP_FEATURES	\
> @@ -1150,9 +1207,11 @@ static const struct intel_device_info mtl_info = {
>   	.has_llc = 0,
>   	.has_mslice_steering = 0,
>   	.has_snoop = 1,
> +	.max_pat_index = 4,
>   	.__runtime.memory_regions = REGION_SMEM | REGION_STOLEN_LMEM,
>   	.__runtime.platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(CCS0),
>   	.require_force_probe = 1,
> +	MTL_CACHELEVEL,
>   };
>   
>   #undef PLATFORM
> diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
> index f032f2500f50..959a4080840c 100644
> --- a/drivers/gpu/drm/i915/intel_device_info.h
> +++ b/drivers/gpu/drm/i915/intel_device_info.h
> @@ -35,6 +35,8 @@
>   #include "gt/intel_context_types.h"
>   #include "gt/intel_sseu.h"
>   
> +#include "gem/i915_gem_object_types.h"
> +
>   struct drm_printer;
>   struct drm_i915_private;
>   struct intel_gt_definition;
> @@ -308,6 +310,9 @@ struct intel_device_info {
>   	 * Initial runtime info. Do not access outside of i915_driver_create().
>   	 */
>   	const struct intel_runtime_info __runtime;
> +
> +	u32 cachelevel_to_pat[I915_MAX_CACHE_LEVEL];
> +	u32 max_pat_index;
>   };
>   
>   struct intel_driver_caps {
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> index f6a7c0bd2955..0eda8b4ee17f 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> @@ -123,7 +123,9 @@ struct drm_i915_private *mock_gem_device(void)
>   	static struct dev_iommu fake_iommu = { .priv = (void *)-1 };
>   #endif
>   	struct drm_i915_private *i915;
> +	struct intel_device_info *i915_info;
>   	struct pci_dev *pdev;
> +	unsigned int i;
>   	int ret;
>   
>   	pdev = kzalloc(sizeof(*pdev), GFP_KERNEL);
> @@ -180,6 +182,13 @@ struct drm_i915_private *mock_gem_device(void)
>   		I915_GTT_PAGE_SIZE_2M;
>   
>   	RUNTIME_INFO(i915)->memory_regions = REGION_SMEM;
> +
> +	/* simply use legacy cache level for mock device */
> +	i915_info = (struct intel_device_info *)INTEL_INFO(i915);
> +	i915_info->max_pat_index = 3;
> +	for (i = 0; i < I915_MAX_CACHE_LEVEL; i++)
> +		i915_info->cachelevel_to_pat[i] = i;

Pity LEGACY_CACHELEVEL cannot be used here.
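
A hypothetical way to share the values would be a plain table that the
mock device copies at runtime (a sketch only):

	/* Hypothetical: mirror LEGACY_CACHELEVEL's identity mapping */
	static const u32 legacy_pat[I915_MAX_CACHE_LEVEL] = { 0, 1, 2, 3 };

	memcpy(i915_info->cachelevel_to_pat, legacy_pat,
	       sizeof(i915_info->cachelevel_to_pat));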

Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>

Regards
Andrzej


> +
>   	intel_memory_regions_hw_probe(i915);
>   
>   	spin_lock_init(&i915->gpu_error.lock);


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Intel-gfx] [PATCH 7/8] drm/i915: use pat_index instead of cache_level
  2023-04-19 23:00   ` [Intel-gfx] " fei.yang
  (?)
@ 2023-04-20 10:13   ` Andrzej Hajda
  2023-04-20 12:39     ` Tvrtko Ursulin
  -1 siblings, 1 reply; 76+ messages in thread
From: Andrzej Hajda @ 2023-04-20 10:13 UTC (permalink / raw)
  To: fei.yang, intel-gfx; +Cc: Chris Wilson, Matt Roper, dri-devel

On 20.04.2023 01:00, fei.yang@intel.com wrote:
> From: Fei Yang <fei.yang@intel.com>
> 
> Currently the KMD is using enum i915_cache_level to set the caching
> policy for buffer objects. This is flaky because the PAT index, which
> really controls the caching behavior in the PTE, has far more levels
> than what's defined in the enum. In addition, the PAT index is
> platform dependent; having to translate between i915_cache_level and
> PAT index is not reliable and makes the code more complicated.
> 
> From the UMD's perspective there is also a need to set the caching
> policy for performance fine-tuning. It's much easier for the UMD to
> use the PAT index directly, because the behavior of each PAT index is
> clearly defined in the Bspec. Having the abstract i915_cache_level
> sitting in between would only cause more ambiguity.
> 
> For these reasons this patch replaces i915_cache_level with the PAT
> index. Also note that cache_level is not completely removed yet,
> because the KMD still needs to create buffer objects with simple cache
> settings such as cached, uncached, or write-through. For such simple
> cases, using cache_level helps simplify the code.

It seems like quite a fundamental change to me. Does this "not
completely removed yet" mean that at some point in the future we will
not have support for generic cache levels at all? That seems strange
to me. Even looking at the number of users of i915_gem_get_pat_index
below, it seems very unlikely.

And if support for the generic levels is going to stay, maybe it would
be better to make their usage more convenient. All the conversions of
	f(..., cache_level, ...)
to
	f(..., i915_gem_get_pat_index(i915, cache_level), ...)
look quite ugly to me.

Maybe extend cache_level to support pat_index somehow, for example:
enum i915_cache_level {
	I915_CACHE_NONE = 0,
	I915_CACHE_...,
	...
	I915_CACHE_1ST_PAT_INDEX = 0x100,
}

so real_pat_index = cache_level - I915_CACHE_1ST_PAT_INDEX

and in the case of a generic level there would be a platform-dependent
conversion to real_pat_index?
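
For concreteness, a minimal sketch of the conversion helper under this
proposal might look like (hypothetical, not in the patch):

	static unsigned int cache_level_to_pat_index(struct drm_i915_private *i915,
						     enum i915_cache_level level)
	{
		/* Levels at or above the marker encode a raw PAT index */
		if (level >= I915_CACHE_1ST_PAT_INDEX)
			return level - I915_CACHE_1ST_PAT_INDEX;

		/* Otherwise use the per-platform translation table */
		return INTEL_INFO(i915)->cachelevel_to_pat[level];
	}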

I do not know the whole picture, so maybe this is all wrong for some
reason; just asking :)

Regards
Andrzej


> 
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Signed-off-by: Fei Yang <fei.yang@intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/display/intel_dpt.c      | 12 +--
>   drivers/gpu/drm/i915/gem/i915_gem_domain.c    | 27 ++----
>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 10 ++-
>   drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
>   drivers/gpu/drm/i915/gem/i915_gem_object.c    | 52 +++++++++++-
>   drivers/gpu/drm/i915/gem/i915_gem_object.h    |  4 +
>   .../gpu/drm/i915/gem/i915_gem_object_types.h  | 25 +++++-
>   drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |  4 +-
>   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  | 16 ++--
>   .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
>   .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
>   .../drm/i915/gem/selftests/i915_gem_mman.c    |  2 +-
>   drivers/gpu/drm/i915/gt/gen6_ppgtt.c          | 10 ++-
>   drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 71 ++++++++--------
>   drivers/gpu/drm/i915/gt/gen8_ppgtt.h          |  3 +-
>   drivers/gpu/drm/i915/gt/intel_ggtt.c          | 82 +++++++++----------
>   drivers/gpu/drm/i915/gt/intel_gtt.h           | 20 ++---
>   drivers/gpu/drm/i915/gt/intel_migrate.c       | 47 ++++++-----
>   drivers/gpu/drm/i915/gt/intel_migrate.h       | 13 ++-
>   drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  6 +-
>   drivers/gpu/drm/i915/gt/selftest_migrate.c    | 47 ++++++-----
>   drivers/gpu/drm/i915/gt/selftest_reset.c      |  8 +-
>   drivers/gpu/drm/i915/gt/selftest_timeline.c   |  2 +-
>   drivers/gpu/drm/i915/gt/selftest_tlb.c        |  4 +-
>   drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      | 10 ++-
>   drivers/gpu/drm/i915/i915_debugfs.c           | 55 ++++++++++---
>   drivers/gpu/drm/i915/i915_gem.c               | 16 +++-
>   drivers/gpu/drm/i915/i915_gpu_error.c         |  8 +-
>   drivers/gpu/drm/i915/i915_vma.c               | 16 ++--
>   drivers/gpu/drm/i915/i915_vma.h               |  2 +-
>   drivers/gpu/drm/i915/i915_vma_types.h         |  2 -
>   drivers/gpu/drm/i915/selftests/i915_gem.c     |  5 +-
>   .../gpu/drm/i915/selftests/i915_gem_evict.c   |  4 +-
>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 15 ++--
>   .../drm/i915/selftests/intel_memory_region.c  |  4 +-
>   drivers/gpu/drm/i915/selftests/mock_gtt.c     |  8 +-
>   36 files changed, 378 insertions(+), 239 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c
> index c5eacfdba1a5..7c5fddb203ba 100644
> --- a/drivers/gpu/drm/i915/display/intel_dpt.c
> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c
> @@ -43,24 +43,24 @@ static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
>   static void dpt_insert_page(struct i915_address_space *vm,
>   			    dma_addr_t addr,
>   			    u64 offset,
> -			    enum i915_cache_level level,
> +			    unsigned int pat_index,
>   			    u32 flags)
>   {
>   	struct i915_dpt *dpt = i915_vm_to_dpt(vm);
>   	gen8_pte_t __iomem *base = dpt->iomem;
>   
>   	gen8_set_pte(base + offset / I915_GTT_PAGE_SIZE,
> -		     vm->pte_encode(addr, level, flags));
> +		     vm->pte_encode(addr, pat_index, flags));
>   }
>   
>   static void dpt_insert_entries(struct i915_address_space *vm,
>   			       struct i915_vma_resource *vma_res,
> -			       enum i915_cache_level level,
> +			       unsigned int pat_index,
>   			       u32 flags)
>   {
>   	struct i915_dpt *dpt = i915_vm_to_dpt(vm);
>   	gen8_pte_t __iomem *base = dpt->iomem;
> -	const gen8_pte_t pte_encode = vm->pte_encode(0, level, flags);
> +	const gen8_pte_t pte_encode = vm->pte_encode(0, pat_index, flags);
>   	struct sgt_iter sgt_iter;
>   	dma_addr_t addr;
>   	int i;
> @@ -83,7 +83,7 @@ static void dpt_clear_range(struct i915_address_space *vm,
>   static void dpt_bind_vma(struct i915_address_space *vm,
>   			 struct i915_vm_pt_stash *stash,
>   			 struct i915_vma_resource *vma_res,
> -			 enum i915_cache_level cache_level,
> +			 unsigned int pat_index,
>   			 u32 flags)
>   {
>   	u32 pte_flags;
> @@ -98,7 +98,7 @@ static void dpt_bind_vma(struct i915_address_space *vm,
>   	if (vma_res->bi.lmem)
>   		pte_flags |= PTE_LM;
>   
> -	vm->insert_entries(vm, vma_res, cache_level, pte_flags);
> +	vm->insert_entries(vm, vma_res, pat_index, pte_flags);
>   
>   	vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE;
>   
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> index bb3575b1479f..d5fd4c9cd9f8 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> @@ -27,8 +27,8 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>   	if (IS_DGFX(i915))
>   		return false;
>   
> -	return !(obj->cache_level == I915_CACHE_NONE ||
> -		 obj->cache_level == I915_CACHE_WT);
> +	return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
> +		 i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
>   }
>   
>   bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
> @@ -267,7 +267,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>   {
>   	int ret;
>   
> -	if (obj->cache_level == cache_level)
> +	if (i915_gem_object_has_cache_level(obj, cache_level))
>   		return 0;
>   
>   	ret = i915_gem_object_wait(obj,
> @@ -278,10 +278,8 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>   		return ret;
>   
>   	/* Always invalidate stale cachelines */
> -	if (obj->cache_level != cache_level) {
> -		i915_gem_object_set_cache_coherency(obj, cache_level);
> -		obj->cache_dirty = true;
> -	}
> +	i915_gem_object_set_cache_coherency(obj, cache_level);
> +	obj->cache_dirty = true;
>   
>   	/* The cache-level will be applied when each vma is rebound. */
>   	return i915_gem_object_unbind(obj,
> @@ -306,20 +304,13 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
>   		goto out;
>   	}
>   
> -	switch (obj->cache_level) {
> -	case I915_CACHE_LLC:
> -	case I915_CACHE_L3_LLC:
> +	if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||
> +	    i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
>   		args->caching = I915_CACHING_CACHED;
> -		break;
> -
> -	case I915_CACHE_WT:
> +	else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
>   		args->caching = I915_CACHING_DISPLAY;
> -		break;
> -
> -	default:
> +	else
>   		args->caching = I915_CACHING_NONE;
> -		break;
> -	}
>   out:
>   	rcu_read_unlock();
>   	return err;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 3aeede6aee4d..d42915516636 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -642,7 +642,7 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
>   
>   	return (cache->has_llc ||
>   		obj->cache_dirty ||
> -		obj->cache_level != I915_CACHE_NONE);
> +		!i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
>   }
>   
>   static int eb_reserve_vma(struct i915_execbuffer *eb,
> @@ -1323,8 +1323,10 @@ static void *reloc_iomap(struct i915_vma *batch,
>   	offset = cache->node.start;
>   	if (drm_mm_node_allocated(&cache->node)) {
>   		ggtt->vm.insert_page(&ggtt->vm,
> -				     i915_gem_object_get_dma_address(obj, page),
> -				     offset, I915_CACHE_NONE, 0);
> +			i915_gem_object_get_dma_address(obj, page),
> +			offset,
> +			i915_gem_get_pat_index(ggtt->vm.i915, I915_CACHE_NONE),
> +			0);
>   	} else {
>   		offset += page << PAGE_SHIFT;
>   	}
> @@ -1464,7 +1466,7 @@ eb_relocate_entry(struct i915_execbuffer *eb,
>   			reloc_cache_unmap(&eb->reloc_cache);
>   			mutex_lock(&vma->vm->mutex);
>   			err = i915_vma_bind(target->vma,
> -					    target->vma->obj->cache_level,
> +					    target->vma->obj->pat_index,
>   					    PIN_GLOBAL, NULL, NULL);
>   			mutex_unlock(&vma->vm->mutex);
>   			reloc_cache_remap(&eb->reloc_cache, ev->vma->obj);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> index 3dbacdf0911a..50c30efa08a3 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> @@ -383,7 +383,8 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>   	}
>   
>   	/* Access to snoopable pages through the GTT is incoherent. */
> -	if (obj->cache_level != I915_CACHE_NONE && !HAS_LLC(i915)) {
> +	if (!(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
> +	      HAS_LLC(i915))) {
>   		ret = -EFAULT;
>   		goto err_unpin;
>   	}
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 8c70a0ec7d2f..27c948350b5b 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -54,6 +54,25 @@ unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
>   	return INTEL_INFO(i915)->cachelevel_to_pat[level];
>   }
>   
> +bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
> +				     enum i915_cache_level lvl)

The name suggests an object can have more than one cache level; maybe
that's only my impression, up to you.

> +{
> +	/*
> +	 * cache_level == I915_CACHE_INVAL indicates the UMD has set the
> +	 * caching policy through pat_index, in which case the KMD should
> +	 * leave the coherency to be managed by user space, so simply return
> +	 * true here.
> +	 */
> +	if (obj->cache_level == I915_CACHE_INVAL)
> +		return true;
> +
> +	/*
> +	 * Otherwise the pat_index should have been converted from cache_level
> +	 * so that the following comparison is valid.
> +	 */
> +	return obj->pat_index == i915_gem_get_pat_index(obj_to_i915(obj), lvl);
> +}
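
Also, just to check my understanding: with this, a check like

	obj->cache_level == I915_CACHE_NONE

becomes

	i915_gem_object_has_cache_level(obj, I915_CACHE_NONE)

and short-circuits to true for *every* level once the UMD has taken
over via pat_index, i.e. the KMD stops second-guessing the caching
policy for such objects entirely?
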
> +
>   struct drm_i915_gem_object *i915_gem_object_alloc(void)
>   {
>   	struct drm_i915_gem_object *obj;
> @@ -133,7 +152,7 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>   {
>   	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>   
> -	obj->cache_level = cache_level;
> +	obj->pat_index = i915_gem_get_pat_index(i915, cache_level);
>   
>   	if (cache_level != I915_CACHE_NONE)
>   		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
> @@ -148,6 +167,37 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>   		!IS_DGFX(i915);
>   }
>   
> +/**
> + * i915_gem_object_set_pat_index - set PAT index to be used in PTE encode
> + * @obj: #drm_i915_gem_object
> + * @pat_index: PAT index
> + *
> + * This is a clone of i915_gem_object_set_cache_coherency taking pat index
> + * instead of cache_level as its second argument.
> + */
> +void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
> +				   unsigned int pat_index)
> +{
> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +
> +	if (obj->pat_index == pat_index)
> +		return;
> +
> +	obj->pat_index = pat_index;
> +
> +	if (pat_index != i915_gem_get_pat_index(i915, I915_CACHE_NONE))
> +		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
> +				       I915_BO_CACHE_COHERENT_FOR_WRITE);
> +	else if (HAS_LLC(i915))
> +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
> +	else
> +		obj->cache_coherent = 0;
> +
> +	obj->cache_dirty =
> +		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
> +		!IS_DGFX(i915);
> +}
> +
>   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
>   {
>   	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> index 4c92e17b4337..6f00aab10015 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> @@ -34,6 +34,8 @@ static inline bool i915_gem_object_size_2big(u64 size)
>   
>   unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
>   				    enum i915_cache_level level);
> +bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
> +				     enum i915_cache_level lvl);
>   void i915_gem_init__objects(struct drm_i915_private *i915);
>   
>   void i915_objects_module_exit(void);
> @@ -764,6 +766,8 @@ bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj);
>   
>   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>   					 unsigned int cache_level);
> +void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
> +				   unsigned int pat_index);
>   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj);
>   void i915_gem_object_flush_if_display(struct drm_i915_gem_object *obj);
>   void i915_gem_object_flush_if_display_locked(struct drm_i915_gem_object *obj);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index 41b35abccf88..132ce01dee9f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -195,6 +195,7 @@ enum i915_cache_level {
>   	 */
>   	I915_CACHE_WT,
>   	I915_MAX_CACHE_LEVEL,
> +	I915_CACHE_INVAL = I915_MAX_CACHE_LEVEL,
>   };
>   
>   enum i915_map_type {
> @@ -358,10 +359,28 @@ struct drm_i915_gem_object {
>   #define I915_BO_FLAG_STRUCT_PAGE BIT(0) /* Object backed by struct pages */
>   #define I915_BO_FLAG_IOMEM       BIT(1) /* Object backed by IO memory */
>   	/**
> -	 * @cache_level: The desired GTT caching level.
> +	 * @pat_index: The desired PAT index.
> +	 *
> +	 * See hardware specification for valid PAT indices for each platform.
> +	 * This field used to contain a value of enum i915_cache_level. It was
> +	 * changed to an unsigned int because PAT indices are used by both the
> +	 * UMD and the KMD for caching policy control after GEN12.
> +	 * For backward compatibility, this field will continue to contain the
> +	 * i915_cache_level value for pre-GEN12 platforms so that the PTE
> +	 * encode functions for these legacy platforms can stay the same.
> +	 * In the meantime, platform specific tables are created to translate
> +	 * i915_cache_level into a PAT index; for more details check the macros
> +	 * defined in i915/i915_pci.c, e.g. PVC_CACHELEVEL.
> +	 */
> +	unsigned int pat_index:6;
> +	/**
> +	 * @cache_level: Indicates whether pat_index was set by the UMD
>   	 *
> -	 * See enum i915_cache_level for possible values, along with what
> -	 * each does.
> +	 * This used to hold the desired GTT caching level, but is now
> +	 * replaced by pat_index. It's kept here so the KMD can tell whether
> +	 * the pat_index was set by the UMD or converted from enum
> +	 * i915_cache_level. This field is 0 by default, but is set to
> +	 * I915_CACHE_INVAL if the pat_index was set by the UMD.
>   	 */
>   	unsigned int cache_level:3;
>   	/**
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> index ee492d823f1b..3b094d36a0b0 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> @@ -565,7 +565,9 @@ static void dbg_poison(struct i915_ggtt *ggtt,
>   
>   		ggtt->vm.insert_page(&ggtt->vm, addr,
>   				     ggtt->error_capture.start,
> -				     I915_CACHE_NONE, 0);
> +				     i915_gem_get_pat_index(ggtt->vm.i915,
> +							    I915_CACHE_NONE),
> +				     0);
>   		mb();
>   
>   		s = io_mapping_map_wc(&ggtt->iomap,
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> index 69eb20ed4d47..e40761e13c2a 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> @@ -214,7 +214,8 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>   
>   		intel_engine_pm_get(to_gt(i915)->migrate.context->engine);
>   		ret = intel_context_migrate_clear(to_gt(i915)->migrate.context, deps,
> -						  dst_st->sgl, dst_level,
> +						  dst_st->sgl,
> +						  i915_gem_get_pat_index(i915, dst_level),
>   						  i915_ttm_gtt_binds_lmem(dst_mem),
>   						  0, &rq);
>   	} else {
> @@ -227,12 +228,13 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>   		src_level = i915_ttm_cache_level(i915, bo->resource, src_ttm);
>   		intel_engine_pm_get(to_gt(i915)->migrate.context->engine);
>   		ret = intel_context_migrate_copy(to_gt(i915)->migrate.context,
> -						 deps, src_rsgt->table.sgl,
> -						 src_level,
> -						 i915_ttm_gtt_binds_lmem(bo->resource),
> -						 dst_st->sgl, dst_level,
> -						 i915_ttm_gtt_binds_lmem(dst_mem),
> -						 &rq);
> +					deps, src_rsgt->table.sgl,
> +					i915_gem_get_pat_index(i915, src_level),
> +					i915_ttm_gtt_binds_lmem(bo->resource),
> +					dst_st->sgl,
> +					i915_gem_get_pat_index(i915, dst_level),
> +					i915_ttm_gtt_binds_lmem(dst_mem),
> +					&rq);
>   
>   		i915_refct_sgt_put(src_rsgt);
>   	}
> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> index defece0bcb81..ebb68ac9cd5e 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> @@ -354,7 +354,7 @@ fake_huge_pages_object(struct drm_i915_private *i915, u64 size, bool single)
>   
>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
> -	obj->cache_level = I915_CACHE_NONE;
> +	obj->pat_index = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
>   
>   	return obj;
>   }
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
> index fe6c37fd7859..a93a90b15907 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
> @@ -219,7 +219,7 @@ static int __igt_lmem_pages_migrate(struct intel_gt *gt,
>   			continue;
>   
>   		err = intel_migrate_clear(&gt->migrate, &ww, deps,
> -					  obj->mm.pages->sgl, obj->cache_level,
> +					  obj->mm.pages->sgl, obj->pat_index,
>   					  i915_gem_object_is_lmem(obj),
>   					  0xdeadbeaf, &rq);
>   		if (rq) {
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> index 56279908ed30..a93d8f9f8bc1 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> @@ -1222,7 +1222,7 @@ static int __igt_mmap_migrate(struct intel_memory_region **placements,
>   	}
>   
>   	err = intel_context_migrate_clear(to_gt(i915)->migrate.context, NULL,
> -					  obj->mm.pages->sgl, obj->cache_level,
> +					  obj->mm.pages->sgl, obj->pat_index,
>   					  i915_gem_object_is_lmem(obj),
>   					  expand32(POISON_INUSE), &rq);
>   	i915_gem_object_unpin_pages(obj);
> diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> index 5aaacc53fa4c..c2bdc133c89a 100644
> --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> @@ -109,7 +109,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
>   
>   static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>   				      struct i915_vma_resource *vma_res,
> -				      enum i915_cache_level cache_level,
> +				      unsigned int pat_index,
>   				      u32 flags)
>   {
>   	struct i915_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
> @@ -117,7 +117,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>   	unsigned int first_entry = vma_res->start / I915_GTT_PAGE_SIZE;
>   	unsigned int act_pt = first_entry / GEN6_PTES;
>   	unsigned int act_pte = first_entry % GEN6_PTES;
> -	const u32 pte_encode = vm->pte_encode(0, cache_level, flags);
> +	const u32 pte_encode = vm->pte_encode(0, pat_index, flags);
>   	struct sgt_dma iter = sgt_dma(vma_res);
>   	gen6_pte_t *vaddr;
>   
> @@ -227,7 +227,9 @@ static int gen6_ppgtt_init_scratch(struct gen6_ppgtt *ppgtt)
>   
>   	vm->scratch[0]->encode =
>   		vm->pte_encode(px_dma(vm->scratch[0]),
> -			       I915_CACHE_NONE, PTE_READ_ONLY);
> +			       i915_gem_get_pat_index(vm->i915,
> +						      I915_CACHE_NONE),
> +			       PTE_READ_ONLY);
>   
>   	vm->scratch[1] = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
>   	if (IS_ERR(vm->scratch[1])) {
> @@ -278,7 +280,7 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
>   static void pd_vma_bind(struct i915_address_space *vm,
>   			struct i915_vm_pt_stash *stash,
>   			struct i915_vma_resource *vma_res,
> -			enum i915_cache_level cache_level,
> +			unsigned int pat_index,
>   			u32 unused)
>   {
>   	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index 7a4b1d1afce9..c046813514f4 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -56,7 +56,7 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>   }
>   
>   static u64 mtl_pte_encode(dma_addr_t addr,
> -			  enum i915_cache_level level,
> +			  unsigned int pat_index,
>   			  u32 flags)
>   {
>   	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
> @@ -67,24 +67,17 @@ static u64 mtl_pte_encode(dma_addr_t addr,
>   	if (flags & PTE_LM)
>   		pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;
>   
> -	switch (level) {
> -	case I915_CACHE_NONE:
> -		pte |= GEN12_PPGTT_PTE_PAT1;
> -		break;
> -	case I915_CACHE_LLC:
> -	case I915_CACHE_L3_LLC:
> -		pte |= GEN12_PPGTT_PTE_PAT0 | GEN12_PPGTT_PTE_PAT1;
> -		break;
> -	case I915_CACHE_WT:
> +	if (pat_index & BIT(0))
>   		pte |= GEN12_PPGTT_PTE_PAT0;
> -		break;
> -	default:
> -		/* This should never happen. Added to deal with the compile
> -		 * error due to the addition of I915_MAX_CACHE_LEVEL. Will
> -		 * be removed by the pat_index patch.
> -		 */
> -		break;
> -	}
> +
> +	if (pat_index & BIT(1))
> +		pte |= GEN12_PPGTT_PTE_PAT1;
> +
> +	if (pat_index & BIT(2))
> +		pte |= GEN12_PPGTT_PTE_PAT2;
> +
> +	if (pat_index & BIT(3))
> +		pte |= MTL_PPGTT_PTE_PAT3;
>   
>   	return pte;
>   }
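
Just to make sure I read this right: the PAT index is simply decomposed
bit by bit into the PTE PAT bits, e.g.:

	pat_index 0 (0b0000) -> no PAT bits set
	pat_index 3 (0b0011) -> GEN12_PPGTT_PTE_PAT0 | GEN12_PPGTT_PTE_PAT1
	pat_index 5 (0b0101) -> GEN12_PPGTT_PTE_PAT0 | GEN12_PPGTT_PTE_PAT2

so with 4 bits this can address up to 16 PAT entries on MTL?
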
> @@ -457,11 +450,11 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
>   		      struct i915_page_directory *pdp,
>   		      struct sgt_dma *iter,
>   		      u64 idx,
> -		      enum i915_cache_level cache_level,
> +		      unsigned int pat_index,
>   		      u32 flags)
>   {
>   	struct i915_page_directory *pd;
> -	const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, cache_level, flags);
> +	const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, pat_index, flags);
>   	gen8_pte_t *vaddr;
>   
>   	pd = i915_pd_entry(pdp, gen8_pd_index(idx, 2));
> @@ -504,10 +497,10 @@ static void
>   xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
>   			  struct i915_vma_resource *vma_res,
>   			  struct sgt_dma *iter,
> -			  enum i915_cache_level cache_level,
> +			  unsigned int pat_index,
>   			  u32 flags)
>   {
> -	const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags);
> +	const gen8_pte_t pte_encode = vm->pte_encode(0, pat_index, flags);
>   	unsigned int rem = sg_dma_len(iter->sg);
>   	u64 start = vma_res->start;
>   	u64 end = start + vma_res->vma_size;
> @@ -611,10 +604,10 @@ xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
>   static void gen8_ppgtt_insert_huge(struct i915_address_space *vm,
>   				   struct i915_vma_resource *vma_res,
>   				   struct sgt_dma *iter,
> -				   enum i915_cache_level cache_level,
> +				   unsigned int pat_index,
>   				   u32 flags)
>   {
> -	const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags);
> +	const gen8_pte_t pte_encode = vm->pte_encode(0, pat_index, flags);
>   	unsigned int rem = sg_dma_len(iter->sg);
>   	u64 start = vma_res->start;
>   
> @@ -734,7 +727,7 @@ static void gen8_ppgtt_insert_huge(struct i915_address_space *vm,
>   
>   static void gen8_ppgtt_insert(struct i915_address_space *vm,
>   			      struct i915_vma_resource *vma_res,
> -			      enum i915_cache_level cache_level,
> +			      unsigned int pat_index,
>   			      u32 flags)
>   {
>   	struct i915_ppgtt * const ppgtt = i915_vm_to_ppgtt(vm);
> @@ -742,9 +735,9 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
>   
>   	if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) {
>   		if (HAS_64K_PAGES(vm->i915))
> -			xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
> +			xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, pat_index, flags);
>   		else
> -			gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
> +			gen8_ppgtt_insert_huge(vm, vma_res, &iter, pat_index, flags);
>   	} else  {
>   		u64 idx = vma_res->start >> GEN8_PTE_SHIFT;
>   
> @@ -753,7 +746,7 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
>   				gen8_pdp_for_page_index(vm, idx);
>   
>   			idx = gen8_ppgtt_insert_pte(ppgtt, pdp, &iter, idx,
> -						    cache_level, flags);
> +						    pat_index, flags);
>   		} while (idx);
>   
>   		vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE;
> @@ -763,7 +756,7 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
>   static void gen8_ppgtt_insert_entry(struct i915_address_space *vm,
>   				    dma_addr_t addr,
>   				    u64 offset,
> -				    enum i915_cache_level level,
> +				    unsigned int pat_index,
>   				    u32 flags)
>   {
>   	u64 idx = offset >> GEN8_PTE_SHIFT;
> @@ -777,14 +770,14 @@ static void gen8_ppgtt_insert_entry(struct i915_address_space *vm,
>   	GEM_BUG_ON(pt->is_compact);
>   
>   	vaddr = px_vaddr(pt);
> -	vaddr[gen8_pd_index(idx, 0)] = vm->pte_encode(addr, level, flags);
> +	vaddr[gen8_pd_index(idx, 0)] = vm->pte_encode(addr, pat_index, flags);
>   	drm_clflush_virt_range(&vaddr[gen8_pd_index(idx, 0)], sizeof(*vaddr));
>   }
>   
>   static void __xehpsdv_ppgtt_insert_entry_lm(struct i915_address_space *vm,
>   					    dma_addr_t addr,
>   					    u64 offset,
> -					    enum i915_cache_level level,
> +					    unsigned int pat_index,
>   					    u32 flags)
>   {
>   	u64 idx = offset >> GEN8_PTE_SHIFT;
> @@ -807,20 +800,20 @@ static void __xehpsdv_ppgtt_insert_entry_lm(struct i915_address_space *vm,
>   	}
>   
>   	vaddr = px_vaddr(pt);
> -	vaddr[gen8_pd_index(idx, 0) / 16] = vm->pte_encode(addr, level, flags);
> +	vaddr[gen8_pd_index(idx, 0) / 16] = vm->pte_encode(addr, pat_index, flags);
>   }
>   
>   static void xehpsdv_ppgtt_insert_entry(struct i915_address_space *vm,
>   				       dma_addr_t addr,
>   				       u64 offset,
> -				       enum i915_cache_level level,
> +				       unsigned int pat_index,
>   				       u32 flags)
>   {
>   	if (flags & PTE_LM)
>   		return __xehpsdv_ppgtt_insert_entry_lm(vm, addr, offset,
> -						       level, flags);
> +						       pat_index, flags);
>   
> -	return gen8_ppgtt_insert_entry(vm, addr, offset, level, flags);
> +	return gen8_ppgtt_insert_entry(vm, addr, offset, pat_index, flags);
>   }
>   
>   static int gen8_init_scratch(struct i915_address_space *vm)
> @@ -855,7 +848,9 @@ static int gen8_init_scratch(struct i915_address_space *vm)
>   
>   	vm->scratch[0]->encode =
>   		vm->pte_encode(px_dma(vm->scratch[0]),
> -			       I915_CACHE_NONE, pte_flags);
> +			       i915_gem_get_pat_index(vm->i915,
> +						      I915_CACHE_NONE),
> +			       pte_flags);
>   
>   	for (i = 1; i <= vm->top; i++) {
>   		struct drm_i915_gem_object *obj;
> @@ -873,7 +868,9 @@ static int gen8_init_scratch(struct i915_address_space *vm)
>   		}
>   
>   		fill_px(obj, vm->scratch[i - 1]->encode);
> -		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_NONE);
> +		obj->encode = gen8_pde_encode(px_dma(obj),
> +					      i915_gem_get_pat_index(vm->i915,
> +								     I915_CACHE_NONE));
>   
>   		vm->scratch[i] = obj;
>   	}
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
> index f541d19264b4..19c635441642 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
> @@ -10,13 +10,12 @@
>   
>   struct i915_address_space;
>   struct intel_gt;
> -enum i915_cache_level;
>   
>   struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
>   				     unsigned long lmem_pt_obj_flags);
>   
>   u64 gen8_ggtt_pte_encode(dma_addr_t addr,
> -			 enum i915_cache_level level,
> +			 unsigned int pat_index,
>   			 u32 flags);
>   
>   #endif
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index c8390d03fce2..2a7942fac798 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -221,7 +221,7 @@ static void guc_ggtt_invalidate(struct i915_ggtt *ggtt)
>   }
>   
>   static u64 mtl_ggtt_pte_encode(dma_addr_t addr,
> -			       enum i915_cache_level level,
> +			       unsigned int pat_index,
>   			       u32 flags)
>   {
>   	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT;
> @@ -231,30 +231,17 @@ static u64 mtl_ggtt_pte_encode(dma_addr_t addr,
>   	if (flags & PTE_LM)
>   		pte |= GEN12_GGTT_PTE_LM;
>   
> -	switch (level) {
> -	case I915_CACHE_NONE:
> -		pte |= MTL_GGTT_PTE_PAT1;
> -		break;
> -	case I915_CACHE_LLC:
> -	case I915_CACHE_L3_LLC:
> -		pte |= MTL_GGTT_PTE_PAT0 | MTL_GGTT_PTE_PAT1;
> -		break;
> -	case I915_CACHE_WT:
> +	if (pat_index & BIT(0))
>   		pte |= MTL_GGTT_PTE_PAT0;
> -		break;
> -	default:
> -		/* This should never happen. Added to deal with the compile
> -		 * error due to the addition of I915_MAX_CACHE_LEVEL. Will
> -		 * be removed by the pat_index patch.
> -		 */
> -		break;
> -	}
> +
> +	if (pat_index & BIT(1))
> +		pte |= MTL_GGTT_PTE_PAT1;
>   
>   	return pte;
>   }
>   
>   u64 gen8_ggtt_pte_encode(dma_addr_t addr,
> -			 enum i915_cache_level level,
> +			 unsigned int pat_index,
>   			 u32 flags)
>   {
>   	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT;
> @@ -273,25 +260,25 @@ static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
>   static void gen8_ggtt_insert_page(struct i915_address_space *vm,
>   				  dma_addr_t addr,
>   				  u64 offset,
> -				  enum i915_cache_level level,
> +				  unsigned int pat_index,
>   				  u32 flags)
>   {
>   	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
>   	gen8_pte_t __iomem *pte =
>   		(gen8_pte_t __iomem *)ggtt->gsm + offset / I915_GTT_PAGE_SIZE;
>   
> -	gen8_set_pte(pte, ggtt->vm.pte_encode(addr, level, flags));
> +	gen8_set_pte(pte, ggtt->vm.pte_encode(addr, pat_index, flags));
>   
>   	ggtt->invalidate(ggtt);
>   }
>   
>   static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
>   				     struct i915_vma_resource *vma_res,
> -				     enum i915_cache_level level,
> +				     unsigned int pat_index,
>   				     u32 flags)
>   {
>   	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
> -	const gen8_pte_t pte_encode = ggtt->vm.pte_encode(0, level, flags);
> +	const gen8_pte_t pte_encode = ggtt->vm.pte_encode(0, pat_index, flags);
>   	gen8_pte_t __iomem *gte;
>   	gen8_pte_t __iomem *end;
>   	struct sgt_iter iter;
> @@ -348,14 +335,14 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
>   static void gen6_ggtt_insert_page(struct i915_address_space *vm,
>   				  dma_addr_t addr,
>   				  u64 offset,
> -				  enum i915_cache_level level,
> +				  unsigned int pat_index,
>   				  u32 flags)
>   {
>   	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
>   	gen6_pte_t __iomem *pte =
>   		(gen6_pte_t __iomem *)ggtt->gsm + offset / I915_GTT_PAGE_SIZE;
>   
> -	iowrite32(vm->pte_encode(addr, level, flags), pte);
> +	iowrite32(vm->pte_encode(addr, pat_index, flags), pte);
>   
>   	ggtt->invalidate(ggtt);
>   }
> @@ -368,7 +355,7 @@ static void gen6_ggtt_insert_page(struct i915_address_space *vm,
>    */
>   static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
>   				     struct i915_vma_resource *vma_res,
> -				     enum i915_cache_level level,
> +				     unsigned int pat_index,
>   				     u32 flags)
>   {
>   	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
> @@ -385,7 +372,7 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
>   		iowrite32(vm->scratch[0]->encode, gte++);
>   	end += (vma_res->node_size + vma_res->guard) / I915_GTT_PAGE_SIZE;
>   	for_each_sgt_daddr(addr, iter, vma_res->bi.pages)
> -		iowrite32(vm->pte_encode(addr, level, flags), gte++);
> +		iowrite32(vm->pte_encode(addr, pat_index, flags), gte++);
>   	GEM_BUG_ON(gte > end);
>   
>   	/* Fill the allocated but "unused" space beyond the end of the buffer */
> @@ -420,14 +407,15 @@ struct insert_page {
>   	struct i915_address_space *vm;
>   	dma_addr_t addr;
>   	u64 offset;
> -	enum i915_cache_level level;
> +	unsigned int pat_index;
>   };
>   
>   static int bxt_vtd_ggtt_insert_page__cb(void *_arg)
>   {
>   	struct insert_page *arg = _arg;
>   
> -	gen8_ggtt_insert_page(arg->vm, arg->addr, arg->offset, arg->level, 0);
> +	gen8_ggtt_insert_page(arg->vm, arg->addr, arg->offset,
> +			      arg->pat_index, 0);
>   	bxt_vtd_ggtt_wa(arg->vm);
>   
>   	return 0;
> @@ -436,10 +424,10 @@ static int bxt_vtd_ggtt_insert_page__cb(void *_arg)
>   static void bxt_vtd_ggtt_insert_page__BKL(struct i915_address_space *vm,
>   					  dma_addr_t addr,
>   					  u64 offset,
> -					  enum i915_cache_level level,
> +					  unsigned int pat_index,
>   					  u32 unused)
>   {
> -	struct insert_page arg = { vm, addr, offset, level };
> +	struct insert_page arg = { vm, addr, offset, pat_index };
>   
>   	stop_machine(bxt_vtd_ggtt_insert_page__cb, &arg, NULL);
>   }
> @@ -447,7 +435,7 @@ static void bxt_vtd_ggtt_insert_page__BKL(struct i915_address_space *vm,
>   struct insert_entries {
>   	struct i915_address_space *vm;
>   	struct i915_vma_resource *vma_res;
> -	enum i915_cache_level level;
> +	unsigned int pat_index;
>   	u32 flags;
>   };
>   
> @@ -455,7 +443,8 @@ static int bxt_vtd_ggtt_insert_entries__cb(void *_arg)
>   {
>   	struct insert_entries *arg = _arg;
>   
> -	gen8_ggtt_insert_entries(arg->vm, arg->vma_res, arg->level, arg->flags);
> +	gen8_ggtt_insert_entries(arg->vm, arg->vma_res,
> +				 arg->pat_index, arg->flags);
>   	bxt_vtd_ggtt_wa(arg->vm);
>   
>   	return 0;
> @@ -463,10 +452,10 @@ static int bxt_vtd_ggtt_insert_entries__cb(void *_arg)
>   
>   static void bxt_vtd_ggtt_insert_entries__BKL(struct i915_address_space *vm,
>   					     struct i915_vma_resource *vma_res,
> -					     enum i915_cache_level level,
> +					     unsigned int pat_index,
>   					     u32 flags)
>   {
> -	struct insert_entries arg = { vm, vma_res, level, flags };
> +	struct insert_entries arg = { vm, vma_res, pat_index, flags };
>   
>   	stop_machine(bxt_vtd_ggtt_insert_entries__cb, &arg, NULL);
>   }
> @@ -495,7 +484,7 @@ static void gen6_ggtt_clear_range(struct i915_address_space *vm,
>   void intel_ggtt_bind_vma(struct i915_address_space *vm,
>   			 struct i915_vm_pt_stash *stash,
>   			 struct i915_vma_resource *vma_res,
> -			 enum i915_cache_level cache_level,
> +			 unsigned int pat_index,
>   			 u32 flags)
>   {
>   	u32 pte_flags;
> @@ -512,7 +501,7 @@ void intel_ggtt_bind_vma(struct i915_address_space *vm,
>   	if (vma_res->bi.lmem)
>   		pte_flags |= PTE_LM;
>   
> -	vm->insert_entries(vm, vma_res, cache_level, pte_flags);
> +	vm->insert_entries(vm, vma_res, pat_index, pte_flags);
>   	vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE;
>   }
>   
> @@ -661,7 +650,7 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>   static void aliasing_gtt_bind_vma(struct i915_address_space *vm,
>   				  struct i915_vm_pt_stash *stash,
>   				  struct i915_vma_resource *vma_res,
> -				  enum i915_cache_level cache_level,
> +				  unsigned int pat_index,
>   				  u32 flags)
>   {
>   	u32 pte_flags;
> @@ -673,10 +662,10 @@ static void aliasing_gtt_bind_vma(struct i915_address_space *vm,
>   
>   	if (flags & I915_VMA_LOCAL_BIND)
>   		ppgtt_bind_vma(&i915_vm_to_ggtt(vm)->alias->vm,
> -			       stash, vma_res, cache_level, flags);
> +			       stash, vma_res, pat_index, flags);
>   
>   	if (flags & I915_VMA_GLOBAL_BIND)
> -		vm->insert_entries(vm, vma_res, cache_level, pte_flags);
> +		vm->insert_entries(vm, vma_res, pat_index, pte_flags);
>   
>   	vma_res->bound_flags |= flags;
>   }
> @@ -933,7 +922,9 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size)
>   
>   	ggtt->vm.scratch[0]->encode =
>   		ggtt->vm.pte_encode(px_dma(ggtt->vm.scratch[0]),
> -				    I915_CACHE_NONE, pte_flags);
> +				    i915_gem_get_pat_index(i915,
> +							   I915_CACHE_NONE),
> +				    pte_flags);
>   
>   	return 0;
>   }
> @@ -1022,6 +1013,11 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
>   	return ggtt_probe_common(ggtt, size);
>   }
>   
> +/*
> + * For pre-gen8 platforms pat_index is the same as enum i915_cache_level,
> + * so these PTE encode functions are left using cache_level.
> + * See translation table LEGACY_CACHELEVEL.
> + */
>   static u64 snb_pte_encode(dma_addr_t addr,
>   			  enum i915_cache_level level,
>   			  u32 flags)
> @@ -1302,7 +1298,9 @@ bool i915_ggtt_resume_vm(struct i915_address_space *vm)
>   		 */
>   		vma->resource->bound_flags = 0;
>   		vma->ops->bind_vma(vm, NULL, vma->resource,
> -				   obj ? obj->cache_level : 0,
> +				   obj ? obj->pat_index :
> +					 i915_gem_get_pat_index(vm->i915,
> +								I915_CACHE_NONE),
>   				   was_bound);
>   
>   		if (obj) { /* only used during resume => exclusive access */
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index 854ec09fd588..be767e13b1e5 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -165,8 +165,6 @@ typedef u64 gen8_pte_t;
>   #define MTL_2_COH_1W	REG_FIELD_PREP(MTL_PAT_INDEX_COH_MODE_MASK, 2)
>   #define MTL_0_COH_NON	REG_FIELD_PREP(MTL_PAT_INDEX_COH_MODE_MASK, 0)
>   
> -enum i915_cache_level;
> -
>   struct drm_i915_gem_object;
>   struct i915_fence_reg;
>   struct i915_vma;
> @@ -234,7 +232,7 @@ struct i915_vma_ops {
>   	void (*bind_vma)(struct i915_address_space *vm,
>   			 struct i915_vm_pt_stash *stash,
>   			 struct i915_vma_resource *vma_res,
> -			 enum i915_cache_level cache_level,
> +			 unsigned int pat_index,
>   			 u32 flags);
>   	/*
>   	 * Unmap an object from an address space. This usually consists of
> @@ -306,7 +304,7 @@ struct i915_address_space {
>   		(*alloc_scratch_dma)(struct i915_address_space *vm, int sz);
>   
>   	u64 (*pte_encode)(dma_addr_t addr,
> -			  enum i915_cache_level level,
> +			  unsigned int pat_index,
>   			  u32 flags); /* Create a valid PTE */
>   #define PTE_READ_ONLY	BIT(0)
>   #define PTE_LM		BIT(1)
> @@ -321,20 +319,20 @@ struct i915_address_space {
>   	void (*insert_page)(struct i915_address_space *vm,
>   			    dma_addr_t addr,
>   			    u64 offset,
> -			    enum i915_cache_level cache_level,
> +			    unsigned int pat_index,
>   			    u32 flags);
>   	void (*insert_entries)(struct i915_address_space *vm,
>   			       struct i915_vma_resource *vma_res,
> -			       enum i915_cache_level cache_level,
> +			       unsigned int pat_index,
>   			       u32 flags);
>   	void (*raw_insert_page)(struct i915_address_space *vm,
>   				dma_addr_t addr,
>   				u64 offset,
> -				enum i915_cache_level cache_level,
> +				unsigned int pat_index,
>   				u32 flags);
>   	void (*raw_insert_entries)(struct i915_address_space *vm,
>   				   struct i915_vma_resource *vma_res,
> -				   enum i915_cache_level cache_level,
> +				   unsigned int pat_index,
>   				   u32 flags);
>   	void (*cleanup)(struct i915_address_space *vm);
>   
> @@ -581,7 +579,7 @@ void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt,
>   void intel_ggtt_bind_vma(struct i915_address_space *vm,
>   			 struct i915_vm_pt_stash *stash,
>   			 struct i915_vma_resource *vma_res,
> -			 enum i915_cache_level cache_level,
> +			 unsigned int pat_index,
>   			 u32 flags);
>   void intel_ggtt_unbind_vma(struct i915_address_space *vm,
>   			   struct i915_vma_resource *vma_res);
> @@ -639,7 +637,7 @@ void
>   __set_pd_entry(struct i915_page_directory * const pd,
>   	       const unsigned short idx,
>   	       struct i915_page_table *pt,
> -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level));
> +	       u64 (*encode)(const dma_addr_t, const unsigned int pat_index));
>   
>   #define set_pd_entry(pd, idx, to) \
>   	__set_pd_entry((pd), (idx), px_pt(to), gen8_pde_encode)
> @@ -659,7 +657,7 @@ void gen6_ggtt_invalidate(struct i915_ggtt *ggtt);
>   void ppgtt_bind_vma(struct i915_address_space *vm,
>   		    struct i915_vm_pt_stash *stash,
>   		    struct i915_vma_resource *vma_res,
> -		    enum i915_cache_level cache_level,
> +		    unsigned int pat_index,
>   		    u32 flags);
>   void ppgtt_unbind_vma(struct i915_address_space *vm,
>   		      struct i915_vma_resource *vma_res);
> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
> index 3f638f198796..117c3d05af3e 100644
> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> @@ -45,7 +45,9 @@ static void xehpsdv_toggle_pdes(struct i915_address_space *vm,
>   	 * Insert a dummy PTE into every PT that will map to LMEM to ensure
>   	 * we have a correctly setup PDE structure for later use.
>   	 */
> -	vm->insert_page(vm, 0, d->offset, I915_CACHE_NONE, PTE_LM);
> +	vm->insert_page(vm, 0, d->offset,
> +			i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
> +			PTE_LM);
>   	GEM_BUG_ON(!pt->is_compact);
>   	d->offset += SZ_2M;
>   }
> @@ -63,7 +65,9 @@ static void xehpsdv_insert_pte(struct i915_address_space *vm,
>   	 * alignment is 64K underneath for the pt, and we are careful
>   	 * not to access the space in the void.
>   	 */
> -	vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE, PTE_LM);
> +	vm->insert_page(vm, px_dma(pt), d->offset,
> +			i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
> +			PTE_LM);
>   	d->offset += SZ_64K;
>   }
>   
> @@ -73,7 +77,8 @@ static void insert_pte(struct i915_address_space *vm,
>   {
>   	struct insert_pte_data *d = data;
>   
> -	vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE,
> +	vm->insert_page(vm, px_dma(pt), d->offset,
> +			i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
>   			i915_gem_object_is_lmem(pt->base) ? PTE_LM : 0);
>   	d->offset += PAGE_SIZE;
>   }
> @@ -356,13 +361,13 @@ static int max_pte_pkt_size(struct i915_request *rq, int pkt)
>   
>   static int emit_pte(struct i915_request *rq,
>   		    struct sgt_dma *it,
> -		    enum i915_cache_level cache_level,
> +		    unsigned int pat_index,
>   		    bool is_lmem,
>   		    u64 offset,
>   		    int length)
>   {
>   	bool has_64K_pages = HAS_64K_PAGES(rq->engine->i915);
> -	const u64 encode = rq->context->vm->pte_encode(0, cache_level,
> +	const u64 encode = rq->context->vm->pte_encode(0, pat_index,
>   						       is_lmem ? PTE_LM : 0);
>   	struct intel_ring *ring = rq->ring;
>   	int pkt, dword_length;
> @@ -673,17 +678,17 @@ int
>   intel_context_migrate_copy(struct intel_context *ce,
>   			   const struct i915_deps *deps,
>   			   struct scatterlist *src,
> -			   enum i915_cache_level src_cache_level,
> +			   unsigned int src_pat_index,
>   			   bool src_is_lmem,
>   			   struct scatterlist *dst,
> -			   enum i915_cache_level dst_cache_level,
> +			   unsigned int dst_pat_index,
>   			   bool dst_is_lmem,
>   			   struct i915_request **out)
>   {
>   	struct sgt_dma it_src = sg_sgt(src), it_dst = sg_sgt(dst), it_ccs;
>   	struct drm_i915_private *i915 = ce->engine->i915;
>   	u64 ccs_bytes_to_cpy = 0, bytes_to_cpy;
> -	enum i915_cache_level ccs_cache_level;
> +	unsigned int ccs_pat_index;
>   	u32 src_offset, dst_offset;
>   	u8 src_access, dst_access;
>   	struct i915_request *rq;
> @@ -707,12 +712,12 @@ intel_context_migrate_copy(struct intel_context *ce,
>   		dst_sz = scatter_list_length(dst);
>   		if (src_is_lmem) {
>   			it_ccs = it_dst;
> -			ccs_cache_level = dst_cache_level;
> +			ccs_pat_index = dst_pat_index;
>   			ccs_is_src = false;
>   		} else if (dst_is_lmem) {
>   			bytes_to_cpy = dst_sz;
>   			it_ccs = it_src;
> -			ccs_cache_level = src_cache_level;
> +			ccs_pat_index = src_pat_index;
>   			ccs_is_src = true;
>   		}
>   
> @@ -773,7 +778,7 @@ intel_context_migrate_copy(struct intel_context *ce,
>   		src_sz = calculate_chunk_sz(i915, src_is_lmem,
>   					    bytes_to_cpy, ccs_bytes_to_cpy);
>   
> -		len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem,
> +		len = emit_pte(rq, &it_src, src_pat_index, src_is_lmem,
>   			       src_offset, src_sz);
>   		if (!len) {
>   			err = -EINVAL;
> @@ -784,7 +789,7 @@ intel_context_migrate_copy(struct intel_context *ce,
>   			goto out_rq;
>   		}
>   
> -		err = emit_pte(rq, &it_dst, dst_cache_level, dst_is_lmem,
> +		err = emit_pte(rq, &it_dst, dst_pat_index, dst_is_lmem,
>   			       dst_offset, len);
>   		if (err < 0)
>   			goto out_rq;
> @@ -811,7 +816,7 @@ intel_context_migrate_copy(struct intel_context *ce,
>   				goto out_rq;
>   
>   			ccs_sz = GET_CCS_BYTES(i915, len);
> -			err = emit_pte(rq, &it_ccs, ccs_cache_level, false,
> +			err = emit_pte(rq, &it_ccs, ccs_pat_index, false,
>   				       ccs_is_src ? src_offset : dst_offset,
>   				       ccs_sz);
>   			if (err < 0)
> @@ -979,7 +984,7 @@ int
>   intel_context_migrate_clear(struct intel_context *ce,
>   			    const struct i915_deps *deps,
>   			    struct scatterlist *sg,
> -			    enum i915_cache_level cache_level,
> +			    unsigned int pat_index,
>   			    bool is_lmem,
>   			    u32 value,
>   			    struct i915_request **out)
> @@ -1027,7 +1032,7 @@ intel_context_migrate_clear(struct intel_context *ce,
>   		if (err)
>   			goto out_rq;
>   
> -		len = emit_pte(rq, &it, cache_level, is_lmem, offset, CHUNK_SZ);
> +		len = emit_pte(rq, &it, pat_index, is_lmem, offset, CHUNK_SZ);
>   		if (len <= 0) {
>   			err = len;
>   			goto out_rq;
> @@ -1074,10 +1079,10 @@ int intel_migrate_copy(struct intel_migrate *m,
>   		       struct i915_gem_ww_ctx *ww,
>   		       const struct i915_deps *deps,
>   		       struct scatterlist *src,
> -		       enum i915_cache_level src_cache_level,
> +		       unsigned int src_pat_index,
>   		       bool src_is_lmem,
>   		       struct scatterlist *dst,
> -		       enum i915_cache_level dst_cache_level,
> +		       unsigned int dst_pat_index,
>   		       bool dst_is_lmem,
>   		       struct i915_request **out)
>   {
> @@ -1098,8 +1103,8 @@ int intel_migrate_copy(struct intel_migrate *m,
>   		goto out;
>   
>   	err = intel_context_migrate_copy(ce, deps,
> -					 src, src_cache_level, src_is_lmem,
> -					 dst, dst_cache_level, dst_is_lmem,
> +					 src, src_pat_index, src_is_lmem,
> +					 dst, dst_pat_index, dst_is_lmem,
>   					 out);
>   
>   	intel_context_unpin(ce);
> @@ -1113,7 +1118,7 @@ intel_migrate_clear(struct intel_migrate *m,
>   		    struct i915_gem_ww_ctx *ww,
>   		    const struct i915_deps *deps,
>   		    struct scatterlist *sg,
> -		    enum i915_cache_level cache_level,
> +		    unsigned int pat_index,
>   		    bool is_lmem,
>   		    u32 value,
>   		    struct i915_request **out)
> @@ -1134,7 +1139,7 @@ intel_migrate_clear(struct intel_migrate *m,
>   	if (err)
>   		goto out;
>   
> -	err = intel_context_migrate_clear(ce, deps, sg, cache_level,
> +	err = intel_context_migrate_clear(ce, deps, sg, pat_index,
>   					  is_lmem, value, out);
>   
>   	intel_context_unpin(ce);
> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.h b/drivers/gpu/drm/i915/gt/intel_migrate.h
> index ccc677ec4aa3..11fc09a00c4b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_migrate.h
> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.h
> @@ -16,7 +16,6 @@ struct i915_request;
>   struct i915_gem_ww_ctx;
>   struct intel_gt;
>   struct scatterlist;
> -enum i915_cache_level;
>   
>   int intel_migrate_init(struct intel_migrate *m, struct intel_gt *gt);
>   
> @@ -26,20 +25,20 @@ int intel_migrate_copy(struct intel_migrate *m,
>   		       struct i915_gem_ww_ctx *ww,
>   		       const struct i915_deps *deps,
>   		       struct scatterlist *src,
> -		       enum i915_cache_level src_cache_level,
> +		       unsigned int src_pat_index,
>   		       bool src_is_lmem,
>   		       struct scatterlist *dst,
> -		       enum i915_cache_level dst_cache_level,
> +		       unsigned int dst_pat_index,
>   		       bool dst_is_lmem,
>   		       struct i915_request **out);
>   
>   int intel_context_migrate_copy(struct intel_context *ce,
>   			       const struct i915_deps *deps,
>   			       struct scatterlist *src,
> -			       enum i915_cache_level src_cache_level,
> +			       unsigned int src_pat_index,
>   			       bool src_is_lmem,
>   			       struct scatterlist *dst,
> -			       enum i915_cache_level dst_cache_level,
> +			       unsigned int dst_pat_index,
>   			       bool dst_is_lmem,
>   			       struct i915_request **out);
>   
> @@ -48,7 +47,7 @@ intel_migrate_clear(struct intel_migrate *m,
>   		    struct i915_gem_ww_ctx *ww,
>   		    const struct i915_deps *deps,
>   		    struct scatterlist *sg,
> -		    enum i915_cache_level cache_level,
> +		    unsigned int pat_index,
>   		    bool is_lmem,
>   		    u32 value,
>   		    struct i915_request **out);
> @@ -56,7 +55,7 @@ int
>   intel_context_migrate_clear(struct intel_context *ce,
>   			    const struct i915_deps *deps,
>   			    struct scatterlist *sg,
> -			    enum i915_cache_level cache_level,
> +			    unsigned int pat_index,
>   			    bool is_lmem,
>   			    u32 value,
>   			    struct i915_request **out);
> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> index 7ecfa672f738..f0da3555c6db 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> @@ -98,7 +98,7 @@ void
>   __set_pd_entry(struct i915_page_directory * const pd,
>   	       const unsigned short idx,
>   	       struct i915_page_table * const to,
> -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level))
> +	       u64 (*encode)(const dma_addr_t, const unsigned int))
>   {
>   	/* Each thread pre-pins the pd, and we may have a thread per pde. */
>   	GEM_BUG_ON(atomic_read(px_used(pd)) > NALLOC * I915_PDES);
> @@ -181,7 +181,7 @@ struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt,
>   void ppgtt_bind_vma(struct i915_address_space *vm,
>   		    struct i915_vm_pt_stash *stash,
>   		    struct i915_vma_resource *vma_res,
> -		    enum i915_cache_level cache_level,
> +		    unsigned int pat_index,
>   		    u32 flags)
>   {
>   	u32 pte_flags;
> @@ -199,7 +199,7 @@ void ppgtt_bind_vma(struct i915_address_space *vm,
>   	if (vma_res->bi.lmem)
>   		pte_flags |= PTE_LM;
>   
> -	vm->insert_entries(vm, vma_res, cache_level, pte_flags);
> +	vm->insert_entries(vm, vma_res, pat_index, pte_flags);
>   	wmb();
>   }
>   
> diff --git a/drivers/gpu/drm/i915/gt/selftest_migrate.c b/drivers/gpu/drm/i915/gt/selftest_migrate.c
> index e677f2da093d..3def5ca72dec 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_migrate.c
> @@ -137,7 +137,7 @@ static int copy(struct intel_migrate *migrate,
>   static int intel_context_copy_ccs(struct intel_context *ce,
>   				  const struct i915_deps *deps,
>   				  struct scatterlist *sg,
> -				  enum i915_cache_level cache_level,
> +				  unsigned int pat_index,
>   				  bool write_to_ccs,
>   				  struct i915_request **out)
>   {
> @@ -185,7 +185,7 @@ static int intel_context_copy_ccs(struct intel_context *ce,
>   		if (err)
>   			goto out_rq;
>   
> -		len = emit_pte(rq, &it, cache_level, true, offset, CHUNK_SZ);
> +		len = emit_pte(rq, &it, pat_index, true, offset, CHUNK_SZ);
>   		if (len <= 0) {
>   			err = len;
>   			goto out_rq;
> @@ -223,7 +223,7 @@ intel_migrate_ccs_copy(struct intel_migrate *m,
>   		       struct i915_gem_ww_ctx *ww,
>   		       const struct i915_deps *deps,
>   		       struct scatterlist *sg,
> -		       enum i915_cache_level cache_level,
> +		       unsigned int pat_index,
>   		       bool write_to_ccs,
>   		       struct i915_request **out)
>   {
> @@ -243,7 +243,7 @@ intel_migrate_ccs_copy(struct intel_migrate *m,
>   	if (err)
>   		goto out;
>   
> -	err = intel_context_copy_ccs(ce, deps, sg, cache_level,
> +	err = intel_context_copy_ccs(ce, deps, sg, pat_index,
>   				     write_to_ccs, out);
>   
>   	intel_context_unpin(ce);
> @@ -300,7 +300,7 @@ static int clear(struct intel_migrate *migrate,
>   			/* Write the obj data into ccs surface */
>   			err = intel_migrate_ccs_copy(migrate, &ww, NULL,
>   						     obj->mm.pages->sgl,
> -						     obj->cache_level,
> +						     obj->pat_index,
>   						     true, &rq);
>   			if (rq && !err) {
>   				if (i915_request_wait(rq, 0, HZ) < 0) {
> @@ -351,7 +351,7 @@ static int clear(struct intel_migrate *migrate,
>   
>   			err = intel_migrate_ccs_copy(migrate, &ww, NULL,
>   						     obj->mm.pages->sgl,
> -						     obj->cache_level,
> +						     obj->pat_index,
>   						     false, &rq);
>   			if (rq && !err) {
>   				if (i915_request_wait(rq, 0, HZ) < 0) {
> @@ -414,9 +414,9 @@ static int __migrate_copy(struct intel_migrate *migrate,
>   			  struct i915_request **out)
>   {
>   	return intel_migrate_copy(migrate, ww, NULL,
> -				  src->mm.pages->sgl, src->cache_level,
> +				  src->mm.pages->sgl, src->pat_index,
>   				  i915_gem_object_is_lmem(src),
> -				  dst->mm.pages->sgl, dst->cache_level,
> +				  dst->mm.pages->sgl, dst->pat_index,
>   				  i915_gem_object_is_lmem(dst),
>   				  out);
>   }
> @@ -428,9 +428,9 @@ static int __global_copy(struct intel_migrate *migrate,
>   			 struct i915_request **out)
>   {
>   	return intel_context_migrate_copy(migrate->context, NULL,
> -					  src->mm.pages->sgl, src->cache_level,
> +					  src->mm.pages->sgl, src->pat_index,
>   					  i915_gem_object_is_lmem(src),
> -					  dst->mm.pages->sgl, dst->cache_level,
> +					  dst->mm.pages->sgl, dst->pat_index,
>   					  i915_gem_object_is_lmem(dst),
>   					  out);
>   }
> @@ -455,7 +455,7 @@ static int __migrate_clear(struct intel_migrate *migrate,
>   {
>   	return intel_migrate_clear(migrate, ww, NULL,
>   				   obj->mm.pages->sgl,
> -				   obj->cache_level,
> +				   obj->pat_index,
>   				   i915_gem_object_is_lmem(obj),
>   				   value, out);
>   }
> @@ -468,7 +468,7 @@ static int __global_clear(struct intel_migrate *migrate,
>   {
>   	return intel_context_migrate_clear(migrate->context, NULL,
>   					   obj->mm.pages->sgl,
> -					   obj->cache_level,
> +					   obj->pat_index,
>   					   i915_gem_object_is_lmem(obj),
>   					   value, out);
>   }
> @@ -648,7 +648,7 @@ static int live_emit_pte_full_ring(void *arg)
>   	 */
>   	pr_info("%s emite_pte ring space=%u\n", __func__, rq->ring->space);
>   	it = sg_sgt(obj->mm.pages->sgl);
> -	len = emit_pte(rq, &it, obj->cache_level, false, 0, CHUNK_SZ);
> +	len = emit_pte(rq, &it, obj->pat_index, false, 0, CHUNK_SZ);
>   	if (!len) {
>   		err = -EINVAL;
>   		goto out_rq;
> @@ -844,7 +844,7 @@ static int wrap_ktime_compare(const void *A, const void *B)
>   
>   static int __perf_clear_blt(struct intel_context *ce,
>   			    struct scatterlist *sg,
> -			    enum i915_cache_level cache_level,
> +			    unsigned int pat_index,
>   			    bool is_lmem,
>   			    size_t sz)
>   {
> @@ -858,7 +858,7 @@ static int __perf_clear_blt(struct intel_context *ce,
>   
>   		t0 = ktime_get();
>   
> -		err = intel_context_migrate_clear(ce, NULL, sg, cache_level,
> +		err = intel_context_migrate_clear(ce, NULL, sg, pat_index,
>   						  is_lmem, 0, &rq);
>   		if (rq) {
>   			if (i915_request_wait(rq, 0, MAX_SCHEDULE_TIMEOUT) < 0)
> @@ -904,7 +904,8 @@ static int perf_clear_blt(void *arg)
>   
>   		err = __perf_clear_blt(gt->migrate.context,
>   				       dst->mm.pages->sgl,
> -				       I915_CACHE_NONE,
> +				       i915_gem_get_pat_index(gt->i915,
> +							      I915_CACHE_NONE),
>   				       i915_gem_object_is_lmem(dst),
>   				       sizes[i]);
>   
> @@ -919,10 +920,10 @@ static int perf_clear_blt(void *arg)
>   
>   static int __perf_copy_blt(struct intel_context *ce,
>   			   struct scatterlist *src,
> -			   enum i915_cache_level src_cache_level,
> +			   unsigned int src_pat_index,
>   			   bool src_is_lmem,
>   			   struct scatterlist *dst,
> -			   enum i915_cache_level dst_cache_level,
> +			   unsigned int dst_pat_index,
>   			   bool dst_is_lmem,
>   			   size_t sz)
>   {
> @@ -937,9 +938,9 @@ static int __perf_copy_blt(struct intel_context *ce,
>   		t0 = ktime_get();
>   
>   		err = intel_context_migrate_copy(ce, NULL,
> -						 src, src_cache_level,
> +						 src, src_pat_index,
>   						 src_is_lmem,
> -						 dst, dst_cache_level,
> +						 dst, dst_pat_index,
>   						 dst_is_lmem,
>   						 &rq);
>   		if (rq) {
> @@ -994,10 +995,12 @@ static int perf_copy_blt(void *arg)
>   
>   		err = __perf_copy_blt(gt->migrate.context,
>   				      src->mm.pages->sgl,
> -				      I915_CACHE_NONE,
> +				      i915_gem_get_pat_index(gt->i915,
> +							     I915_CACHE_NONE),
>   				      i915_gem_object_is_lmem(src),
>   				      dst->mm.pages->sgl,
> -				      I915_CACHE_NONE,
> +				      i915_gem_get_pat_index(gt->i915,
> +							     I915_CACHE_NONE),
>   				      i915_gem_object_is_lmem(dst),
>   				      sz);
>   
> diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c
> index a9e0a91bc0e0..79aa6ac66ad2 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_reset.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
> @@ -86,7 +86,9 @@ __igt_reset_stolen(struct intel_gt *gt,
>   
>   		ggtt->vm.insert_page(&ggtt->vm, dma,
>   				     ggtt->error_capture.start,
> -				     I915_CACHE_NONE, 0);
> +				     i915_gem_get_pat_index(gt->i915,
> +							    I915_CACHE_NONE),
> +				     0);
>   		mb();
>   
>   		s = io_mapping_map_wc(&ggtt->iomap,
> @@ -127,7 +129,9 @@ __igt_reset_stolen(struct intel_gt *gt,
>   
>   		ggtt->vm.insert_page(&ggtt->vm, dma,
>   				     ggtt->error_capture.start,
> -				     I915_CACHE_NONE, 0);
> +				     i915_gem_get_pat_index(gt->i915,
> +							    I915_CACHE_NONE),
> +				     0);
>   		mb();
>   
>   		s = io_mapping_map_wc(&ggtt->iomap,
> diff --git a/drivers/gpu/drm/i915/gt/selftest_timeline.c b/drivers/gpu/drm/i915/gt/selftest_timeline.c
> index 9f536c251179..39c3ec12df1a 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_timeline.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_timeline.c
> @@ -836,7 +836,7 @@ static int setup_watcher(struct hwsp_watcher *w, struct intel_gt *gt,
>   		return PTR_ERR(obj);
>   
>   	/* keep the same cache settings as timeline */
> -	i915_gem_object_set_cache_coherency(obj, tl->hwsp_ggtt->obj->cache_level);
> +	i915_gem_object_set_pat_index(obj, tl->hwsp_ggtt->obj->pat_index);
>   	w->map = i915_gem_object_pin_map_unlocked(obj,
>   						  page_unmask_bits(tl->hwsp_ggtt->obj->mm.mapping));
>   	if (IS_ERR(w->map)) {
> diff --git a/drivers/gpu/drm/i915/gt/selftest_tlb.c b/drivers/gpu/drm/i915/gt/selftest_tlb.c
> index e6cac1f15d6e..4493c8518e91 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_tlb.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_tlb.c
> @@ -36,6 +36,8 @@ pte_tlbinv(struct intel_context *ce,
>   	   u64 length,
>   	   struct rnd_state *prng)
>   {
> +	const unsigned int pat_index =
> +		i915_gem_get_pat_index(ce->vm->i915, I915_CACHE_NONE);
>   	struct drm_i915_gem_object *batch;
>   	struct drm_mm_node vb_node;
>   	struct i915_request *rq;
> @@ -155,7 +157,7 @@ pte_tlbinv(struct intel_context *ce,
>   		/* Flip the PTE between A and B */
>   		if (i915_gem_object_is_lmem(vb->obj))
>   			pte_flags |= PTE_LM;
> -		ce->vm->insert_entries(ce->vm, &vb_res, 0, pte_flags);
> +		ce->vm->insert_entries(ce->vm, &vb_res, pat_index, pte_flags);
>   
>   		/* Flush the PTE update to concurrent HW */
>   		tlbinv(ce->vm, addr & -length, length);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
> index a82a53dbbc86..145681ae20a5 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
> @@ -890,9 +890,15 @@ static void uc_fw_bind_ggtt(struct intel_uc_fw *uc_fw)
>   		pte_flags |= PTE_LM;
>   
>   	if (ggtt->vm.raw_insert_entries)
> -		ggtt->vm.raw_insert_entries(&ggtt->vm, dummy, I915_CACHE_NONE, pte_flags);
> +		ggtt->vm.raw_insert_entries(&ggtt->vm, dummy,
> +					    i915_gem_get_pat_index(ggtt->vm.i915,
> +								   I915_CACHE_NONE),
> +					    pte_flags);
>   	else
> -		ggtt->vm.insert_entries(&ggtt->vm, dummy, I915_CACHE_NONE, pte_flags);
> +		ggtt->vm.insert_entries(&ggtt->vm, dummy,
> +					i915_gem_get_pat_index(ggtt->vm.i915,
> +							       I915_CACHE_NONE),
> +					pte_flags);
>   }
>   
>   static void uc_fw_unbind_ggtt(struct intel_uc_fw *uc_fw)
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 41389a32e998..9a4922da3a71 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -139,21 +139,56 @@ static const char *stringify_vma_type(const struct i915_vma *vma)
>   	return "ppgtt";
>   }
>   
> -static const char *i915_cache_level_str(struct drm_i915_private *i915, int type)
> -{
> -	switch (type) {
> -	case I915_CACHE_NONE: return " uncached";
> -	case I915_CACHE_LLC: return HAS_LLC(i915) ? " LLC" : " snooped";
> -	case I915_CACHE_L3_LLC: return " L3+LLC";
> -	case I915_CACHE_WT: return " WT";
> -	default: return "";
> +static const char *i915_cache_level_str(struct drm_i915_gem_object *obj)
> +{
> +	struct drm_i915_private *i915 = obj_to_i915(obj);
> +
> +	if (IS_METEORLAKE(i915)) {
> +		switch (obj->pat_index) {
> +		case 0: return " WB";
> +		case 1: return " WT";
> +		case 2: return " UC";
> +		case 3: return " WB (1-Way Coh)";
> +		case 4: return " WB (2-Way Coh)";
> +		default: return " not defined";
> +		}
> +	} else if (IS_PONTEVECCHIO(i915)) {
> +		switch (obj->pat_index) {
> +		case 0: return " UC";
> +		case 1: return " WC";
> +		case 2: return " WT";
> +		case 3: return " WB";
> +		case 4: return " WT (CLOS1)";
> +		case 5: return " WB (CLOS1)";
> +		case 6: return " WT (CLOS2)";
> +		case 7: return " WB (CLOS2)";
> +		default: return " not defined";
> +		}
> +	} else if (GRAPHICS_VER(i915) >= 12) {
> +		switch (obj->pat_index) {
> +		case 0: return " WB";
> +		case 1: return " WC";
> +		case 2: return " WT";
> +		case 3: return " UC";
> +		default: return " not defined";
> +		}
> +	} else {
> +		if (i915_gem_object_has_cache_level(obj, I915_CACHE_NONE))
> +			return " uncached";
> +		else if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC))
> +			return HAS_LLC(i915) ? " LLC" : " snooped";
> +		else if (i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
> +			return " L3+LLC";
> +		else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
> +			return " WT";
> +		else
> +			return " not defined";
>   	}
>   }
>   
>   void
>   i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>   {
> -	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
>   	struct i915_vma *vma;
>   	int pin_count = 0;
>   
> @@ -165,7 +200,7 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>   		   obj->base.size / 1024,
>   		   obj->read_domains,
>   		   obj->write_domain,
> -		   i915_cache_level_str(dev_priv, obj->cache_level),
> +		   i915_cache_level_str(obj),
>   		   obj->mm.dirty ? " dirty" : "",
>   		   obj->mm.madv == I915_MADV_DONTNEED ? " purgeable" : "");
>   	if (obj->base.name)
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 0a78bdbd36b1..63207b0740b3 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -420,8 +420,12 @@ i915_gem_gtt_pread(struct drm_i915_gem_object *obj,
>   		page_length = remain < page_length ? remain : page_length;
>   		if (drm_mm_node_allocated(&node)) {
>   			ggtt->vm.insert_page(&ggtt->vm,
> -					     i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
> -					     node.start, I915_CACHE_NONE, 0);
> +					i915_gem_object_get_dma_address(obj,
> +									offset >> PAGE_SHIFT),
> +					node.start,
> +					i915_gem_get_pat_index(i915,
> +							       I915_CACHE_NONE),
> +					0);
>   		} else {
>   			page_base += offset & PAGE_MASK;
>   		}
> @@ -598,8 +602,12 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
>   			/* flush the write before we modify the GGTT */
>   			intel_gt_flush_ggtt_writes(ggtt->vm.gt);
>   			ggtt->vm.insert_page(&ggtt->vm,
> -					     i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
> -					     node.start, I915_CACHE_NONE, 0);
> +					i915_gem_object_get_dma_address(obj,
> +									offset >> PAGE_SHIFT),
> +					node.start,
> +					i915_gem_get_pat_index(i915,
> +							       I915_CACHE_NONE),
> +					0);
>   			wmb(); /* flush modifications to the GGTT (insert_page) */
>   		} else {
>   			page_base += offset & PAGE_MASK;
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index f020c0086fbc..2556cabea02c 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1117,10 +1117,14 @@ i915_vma_coredump_create(const struct intel_gt *gt,
>   			mutex_lock(&ggtt->error_mutex);
>   			if (ggtt->vm.raw_insert_page)
>   				ggtt->vm.raw_insert_page(&ggtt->vm, dma, slot,
> -							 I915_CACHE_NONE, 0);
> +						i915_gem_get_pat_index(gt->i915,
> +								       I915_CACHE_NONE),
> +						0);
>   			else
>   				ggtt->vm.insert_page(&ggtt->vm, dma, slot,
> -						     I915_CACHE_NONE, 0);
> +						i915_gem_get_pat_index(gt->i915,
> +								       I915_CACHE_NONE),
> +						0);
>   			mb();
>   
>   			s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 20a44788999e..a814775a363d 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -315,7 +315,7 @@ struct i915_vma_work {
>   	struct i915_vma_resource *vma_res;
>   	struct drm_i915_gem_object *obj;
>   	struct i915_sw_dma_fence_cb cb;
> -	enum i915_cache_level cache_level;
> +	unsigned int pat_index;
>   	unsigned int flags;
>   };
>   
> @@ -334,7 +334,7 @@ static void __vma_bind(struct dma_fence_work *work)
>   		return;
>   
>   	vma_res->ops->bind_vma(vma_res->vm, &vw->stash,
> -			       vma_res, vw->cache_level, vw->flags);
> +			       vma_res, vw->pat_index, vw->flags);
>   }
>   
>   static void __vma_release(struct dma_fence_work *work)
> @@ -426,7 +426,7 @@ i915_vma_resource_init_from_vma(struct i915_vma_resource *vma_res,
>   /**
>    * i915_vma_bind - Sets up PTEs for an VMA in it's corresponding address space.
>    * @vma: VMA to map
> - * @cache_level: mapping cache level
> + * @pat_index: PAT index to set in PTE
>    * @flags: flags like global or local mapping
>    * @work: preallocated worker for allocating and binding the PTE
>    * @vma_res: pointer to a preallocated vma resource. The resource is either
> @@ -437,7 +437,7 @@ i915_vma_resource_init_from_vma(struct i915_vma_resource *vma_res,
>    * Note that DMA addresses are also the only part of the SG table we care about.
>    */
>   int i915_vma_bind(struct i915_vma *vma,
> -		  enum i915_cache_level cache_level,
> +		  unsigned int pat_index,
>   		  u32 flags,
>   		  struct i915_vma_work *work,
>   		  struct i915_vma_resource *vma_res)
> @@ -507,7 +507,7 @@ int i915_vma_bind(struct i915_vma *vma,
>   		struct dma_fence *prev;
>   
>   		work->vma_res = i915_vma_resource_get(vma->resource);
> -		work->cache_level = cache_level;
> +		work->pat_index = pat_index;
>   		work->flags = bind_flags;
>   
>   		/*
> @@ -537,7 +537,7 @@ int i915_vma_bind(struct i915_vma *vma,
>   
>   			return ret;
>   		}
> -		vma->ops->bind_vma(vma->vm, NULL, vma->resource, cache_level,
> +		vma->ops->bind_vma(vma->vm, NULL, vma->resource, pat_index,
>   				   bind_flags);
>   	}
>   
> @@ -814,7 +814,7 @@ i915_vma_insert(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
>   	color = 0;
>   
>   	if (i915_vm_has_cache_coloring(vma->vm))
> -		color = vma->obj->cache_level;
> +		color = vma->obj->pat_index;
>   
>   	if (flags & PIN_OFFSET_FIXED) {
>   		u64 offset = flags & PIN_OFFSET_MASK;
> @@ -1518,7 +1518,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
>   
>   	GEM_BUG_ON(!vma->pages);
>   	err = i915_vma_bind(vma,
> -			    vma->obj->cache_level,
> +			    vma->obj->pat_index,
>   			    flags, work, vma_res);
>   	vma_res = NULL;
>   	if (err)
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index ed5c9d682a1b..31a8f8aa5558 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -250,7 +250,7 @@ i915_vma_compare(struct i915_vma *vma,
>   
>   struct i915_vma_work *i915_vma_work(void);
>   int i915_vma_bind(struct i915_vma *vma,
> -		  enum i915_cache_level cache_level,
> +		  unsigned int pat_index,
>   		  u32 flags,
>   		  struct i915_vma_work *work,
>   		  struct i915_vma_resource *vma_res);
> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
> index 77fda2244d16..64472b7f0e77 100644
> --- a/drivers/gpu/drm/i915/i915_vma_types.h
> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
> @@ -32,8 +32,6 @@
>   
>   #include "gem/i915_gem_object_types.h"
>   
> -enum i915_cache_level;
> -
>   /**
>    * DOC: Global GTT views
>    *
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem.c b/drivers/gpu/drm/i915/selftests/i915_gem.c
> index d91d0ade8abd..61da4ed9d521 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem.c
> @@ -57,7 +57,10 @@ static void trash_stolen(struct drm_i915_private *i915)
>   		u32 __iomem *s;
>   		int x;
>   
> -		ggtt->vm.insert_page(&ggtt->vm, dma, slot, I915_CACHE_NONE, 0);
> +		ggtt->vm.insert_page(&ggtt->vm, dma, slot,
> +				     i915_gem_get_pat_index(i915,
> +							    I915_CACHE_NONE),
> +				     0);
>   
>   		s = io_mapping_map_atomic_wc(&ggtt->iomap, slot);
>   		for (x = 0; x < PAGE_SIZE / sizeof(u32); x++) {
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> index 37068542aafe..f13a4d265814 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> @@ -245,7 +245,7 @@ static int igt_evict_for_cache_color(void *arg)
>   	struct drm_mm_node target = {
>   		.start = I915_GTT_PAGE_SIZE * 2,
>   		.size = I915_GTT_PAGE_SIZE,
> -		.color = I915_CACHE_LLC,
> +		.color = i915_gem_get_pat_index(gt->i915, I915_CACHE_LLC),
>   	};
>   	struct drm_i915_gem_object *obj;
>   	struct i915_vma *vma;
> @@ -308,7 +308,7 @@ static int igt_evict_for_cache_color(void *arg)
>   	/* Attempt to remove the first *pinned* vma, by removing the (empty)
>   	 * neighbour -- this should fail.
>   	 */
> -	target.color = I915_CACHE_L3_LLC;
> +	target.color = i915_gem_get_pat_index(gt->i915, I915_CACHE_L3_LLC);
>   
>   	mutex_lock(&ggtt->vm.mutex);
>   	err = i915_gem_evict_for_node(&ggtt->vm, NULL, &target, 0);
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> index 154801f1c468..36940ef10108 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> @@ -135,7 +135,7 @@ fake_dma_object(struct drm_i915_private *i915, u64 size)
>   
>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
> -	obj->cache_level = I915_CACHE_NONE;
> +	obj->pat_index = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
>   
>   	/* Preallocate the "backing storage" */
>   	if (i915_gem_object_pin_pages_unlocked(obj))
> @@ -359,7 +359,9 @@ static int lowlevel_hole(struct i915_address_space *vm,
>   
>   			with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
>   			  vm->insert_entries(vm, mock_vma_res,
> -						   I915_CACHE_NONE, 0);
> +					     i915_gem_get_pat_index(vm->i915,
> +								    I915_CACHE_NONE),
> +					     0);
>   		}
>   		count = n;
>   
> @@ -1377,7 +1379,10 @@ static int igt_ggtt_page(void *arg)
>   
>   		ggtt->vm.insert_page(&ggtt->vm,
>   				     i915_gem_object_get_dma_address(obj, 0),
> -				     offset, I915_CACHE_NONE, 0);
> +				     offset,
> +				     i915_gem_get_pat_index(i915,
> +							    I915_CACHE_NONE),
> +				     0);
>   	}
>   
>   	order = i915_random_order(count, &prng);
> @@ -1510,7 +1515,7 @@ static int reserve_gtt_with_resource(struct i915_vma *vma, u64 offset)
>   	mutex_lock(&vm->mutex);
>   	err = i915_gem_gtt_reserve(vm, NULL, &vma->node, obj->base.size,
>   				   offset,
> -				   obj->cache_level,
> +				   obj->pat_index,
>   				   0);
>   	if (!err) {
>   		i915_vma_resource_init_from_vma(vma_res, vma);
> @@ -1690,7 +1695,7 @@ static int insert_gtt_with_resource(struct i915_vma *vma)
>   
>   	mutex_lock(&vm->mutex);
>   	err = i915_gem_gtt_insert(vm, NULL, &vma->node, obj->base.size, 0,
> -				  obj->cache_level, 0, vm->total, 0);
> +				  obj->pat_index, 0, vm->total, 0);
>   	if (!err) {
>   		i915_vma_resource_init_from_vma(vma_res, vma);
>   		vma->resource = vma_res;
> diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
> index 3b18e5905c86..d985d9bae2e8 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
> @@ -1070,7 +1070,9 @@ static int igt_lmem_write_cpu(void *arg)
>   	/* Put the pages into a known state -- from the gpu for added fun */
>   	intel_engine_pm_get(engine);
>   	err = intel_context_migrate_clear(engine->gt->migrate.context, NULL,
> -					  obj->mm.pages->sgl, I915_CACHE_NONE,
> +					  obj->mm.pages->sgl,
> +					  i915_gem_get_pat_index(i915,
> +								 I915_CACHE_NONE),
>   					  true, 0xdeadbeaf, &rq);
>   	if (rq) {
>   		dma_resv_add_fence(obj->base.resv, &rq->fence,
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gtt.c b/drivers/gpu/drm/i915/selftests/mock_gtt.c
> index ece97e4faacb..a516c0aa88fd 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gtt.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gtt.c
> @@ -27,21 +27,21 @@
>   static void mock_insert_page(struct i915_address_space *vm,
>   			     dma_addr_t addr,
>   			     u64 offset,
> -			     enum i915_cache_level level,
> +			     unsigned int pat_index,
>   			     u32 flags)
>   {
>   }
>   
>   static void mock_insert_entries(struct i915_address_space *vm,
>   				struct i915_vma_resource *vma_res,
> -				enum i915_cache_level level, u32 flags)
> +				unsigned int pat_index, u32 flags)
>   {
>   }
>   
>   static void mock_bind_ppgtt(struct i915_address_space *vm,
>   			    struct i915_vm_pt_stash *stash,
>   			    struct i915_vma_resource *vma_res,
> -			    enum i915_cache_level cache_level,
> +			    unsigned int pat_index,
>   			    u32 flags)
>   {
>   	GEM_BUG_ON(flags & I915_VMA_GLOBAL_BIND);
> @@ -94,7 +94,7 @@ struct i915_ppgtt *mock_ppgtt(struct drm_i915_private *i915, const char *name)
>   static void mock_bind_ggtt(struct i915_address_space *vm,
>   			   struct i915_vm_pt_stash *stash,
>   			   struct i915_vma_resource *vma_res,
> -			   enum i915_cache_level cache_level,
> +			   unsigned int pat_index,
>   			   u32 flags)
>   {
>   }


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Intel-gfx] [PATCH 1/8] drm/i915/mtl: Set has_llc=0
  2023-04-19 23:00   ` [Intel-gfx] " fei.yang
@ 2023-04-20 10:20     ` Das, Nirmoy
  -1 siblings, 0 replies; 76+ messages in thread
From: Das, Nirmoy @ 2023-04-20 10:20 UTC (permalink / raw)
  To: fei.yang, intel-gfx; +Cc: Andi Shyti, dri-devel, Andrzej Hajda, Nirmoy Das

We have multiple bugs that require this, and it can be picked up
irrespective of this series. I have sent a trybot patch for this and,
once that passes, I will push this one.


https://patchwork.freedesktop.org/series/116746/


Nirmoy

On 4/20/2023 1:00 AM, fei.yang@intel.com wrote:
> From: Fei Yang <fei.yang@intel.com>
>
> On MTL, LLC is not shared between GT and CPU, set has_llc=0.
>
> Signed-off-by: Fei Yang <fei.yang@intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_pci.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index d64e074d7457..272a8ba37b64 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -1147,6 +1147,7 @@ static const struct intel_device_info mtl_info = {
>   	.has_flat_ccs = 0,
>   	.has_gmd_id = 1,
>   	.has_guc_deprivilege = 1,
> +	.has_llc = 0,
>   	.has_mslice_steering = 0,
>   	.has_snoop = 1,
>   	.__runtime.memory_regions = REGION_SMEM | REGION_STOLEN_LMEM,

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Intel-gfx] [PATCH 0/8] drm/i915/mtl: Define MOCS and PAT tables for MTL
  2023-04-19 23:00 ` [Intel-gfx] " fei.yang
                   ` (10 preceding siblings ...)
  (?)
@ 2023-04-20 11:30 ` Andi Shyti
  -1 siblings, 0 replies; 76+ messages in thread
From: Andi Shyti @ 2023-04-20 11:30 UTC (permalink / raw)
  To: fei.yang; +Cc: intel-gfx, dri-devel

Hi Fei,

On Wed, Apr 19, 2023 at 04:00:50PM -0700, fei.yang@intel.com wrote:
> From: Fei Yang <fei.yang@intel.com>
> 
> The series includes patches needed to enable MTL.
> Also add new extension for GEM_CREATE uAPI to let
> user space set cache policy for buffer objects.
> 
> v2: addressing review comments and checkpatch warnings
> v3: make mtl_ggtt_pte_encode static

This series is a mixture of different series: bug fixes along
with the PAT table work. Maybe a good solution is to split it
into several smaller series, so that CI can perform better
per-patch testing.

Nirmoy has already sent the first two patches of this series for
testing. The 4th patch can also be taken out of this series.

Andi

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Intel-gfx] [PATCH 4/8] drm/i915/mtl: workaround coherency issue for Media
  2023-04-19 23:00   ` [Intel-gfx] " fei.yang
@ 2023-04-20 11:36     ` Das, Nirmoy
  -1 siblings, 0 replies; 76+ messages in thread
From: Das, Nirmoy @ 2023-04-20 11:36 UTC (permalink / raw)
  To: fei.yang, intel-gfx; +Cc: Andi Shyti, dri-devel, Nirmoy Das

This is an important fix and can be pushed without depending on this series.

I will send this out to the mailing list separately for CI.
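
As a rough illustration, condensed from the diff below, the CTB read
path change boils down to this pattern (with the descriptor living in
memory that is now mapped WC on the CPU side):

	/* CPU-side update of a word shared with the GuC */
	WRITE_ONCE(desc->head, head);

	/*
	 * Wa_22016122933: flush the CPU write-combining buffer so the
	 * head update is visible to the GuC right away.
	 */
	intel_guc_write_barrier(ct_to_guc(ct));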


Regards,

Nirmoy

On 4/20/2023 1:00 AM, fei.yang@intel.com wrote:
> From: Fei Yang <fei.yang@intel.com>
>
> This patch implements Wa_22016122933.
>
> In MTL, memory writes initiated by the Media tile update the whole
> cache line even for partial writes. This creates a coherency problem
> for cacheable memory if both the CPU and the GPU are writing data to
> different locations within a single cache line. CTB communication is
> impacted by this issue because the head and tail pointers are
> adjacent words within a cache line (see struct guc_ct_buffer_desc),
> where one is written by the GuC and the other by the host.
> This patch circumvents the issue by making CPU/GPU shared memory
> uncacheable (WC on the CPU side, and PAT index 2 for the GPU). Also,
> for the CTB, which is updated by both the CPU and the GuC, an mfence
> instruction is added to make sure CPU writes are visible to the GPU
> right away (flushing the write-combining buffer).
>
> While fixing the CTB issue, we noticed some random GSC firmware
> loading failures because the shared buffers are cacheable (WB) on the
> CPU side but uncached on the GPU side. To fix these issues we need to
> map such shared buffers as WC on the CPU side. Since such allocations
> are not all done through the GuC allocator, to avoid too many code
> changes, i915_coherent_map_type() is now hard coded to return WC for
> MTL.
>
> BSpec: 45101
>
> Signed-off-by: Fei Yang <fei.yang@intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> Acked-by: Nirmoy Das <nirmoy.das@intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_pages.c |  5 ++++-
>   drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c | 13 +++++++++++++
>   drivers/gpu/drm/i915/gt/uc/intel_guc.c    |  7 +++++++
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  6 ++++++
>   4 files changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index ecd86130b74f..89fc8ea6bcfc 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -469,7 +469,10 @@ enum i915_map_type i915_coherent_map_type(struct drm_i915_private *i915,
>   					  struct drm_i915_gem_object *obj,
>   					  bool always_coherent)
>   {
> -	if (i915_gem_object_is_lmem(obj))
> +	/*
> +	 * Wa_22016122933: always return I915_MAP_WC for MTL
> +	 */
> +	if (i915_gem_object_is_lmem(obj) || IS_METEORLAKE(i915))
>   		return I915_MAP_WC;
>   	if (HAS_LLC(i915) || always_coherent)
>   		return I915_MAP_WB;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> index 1d9fdfb11268..236673c02f9a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> @@ -110,6 +110,13 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
>   	if (obj->base.size < gsc->fw.size)
>   		return -ENOSPC;
>   
> +	/*
> +	 * Wa_22016122933: For MTL the shared memory needs to be mapped
> +	 * as WC on CPU side and UC (PAT index 2) on GPU side
> +	 */
> +	if (IS_METEORLAKE(i915))
> +		i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
> +
>   	dst = i915_gem_object_pin_map_unlocked(obj,
>   					       i915_coherent_map_type(i915, obj, true));
>   	if (IS_ERR(dst))
> @@ -125,6 +132,12 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
>   	memset(dst, 0, obj->base.size);
>   	memcpy(dst, src, gsc->fw.size);
>   
> +	/*
> +	 * Wa_22016122933: Making sure the data in dst is
> +	 * visible to GSC right away
> +	 */
> +	intel_guc_write_barrier(&gt->uc.guc);
> +
>   	i915_gem_object_unpin_map(gsc->fw.obj);
>   	i915_gem_object_unpin_map(obj);
>   
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index e89f16ecf1ae..c9f20385f6a0 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -744,6 +744,13 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size)
>   	if (IS_ERR(obj))
>   		return ERR_CAST(obj);
>   
> +	/*
> +	 * Wa_22016122933: For MTL the shared memory needs to be mapped
> +	 * as WC on CPU side and UC (PAT index 2) on GPU side
> +	 */
> +	if (IS_METEORLAKE(gt->i915))
> +		i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
> +
>   	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>   	if (IS_ERR(vma))
>   		goto err;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 1803a633ed64..99a0a89091e7 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -902,6 +902,12 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>   	/* now update descriptor */
>   	WRITE_ONCE(desc->head, head);
>   
> +	/*
> +	 * Wa_22016122933: Making sure the head update is
> +	 * visible to GuC right away
> +	 */
> +	intel_guc_write_barrier(ct_to_guc(ct));
> +
>   	return available - len;
>   
>   corrupted:

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 8/8] drm/i915: Allow user to set cache at BO creation
  2023-04-19 23:00   ` [Intel-gfx] " fei.yang
@ 2023-04-20 11:39     ` Andi Shyti
  -1 siblings, 0 replies; 76+ messages in thread
From: Andi Shyti @ 2023-04-20 11:39 UTC (permalink / raw)
  To: fei.yang
  Cc: Chris Wilson, intel-gfx, dri-devel, Lionel Landwerlin,
	Andi Shyti, Jordan Justen, Matt Roper, Nirmoy Das

Hi Fei,

> To comply with the design that buffer objects shall have an immutable
> cache setting throughout their life cycle, the {set, get}_caching
> ioctl's are no longer supported from MTL onward. With that change,
> caching policy can only be set at object creation time. The current
> code applies a default (platform dependent) cache setting for all
> objects. However this is not optimal for performance tuning. The
> patch extends the existing gem_create uAPI to let user space set the
> PAT index for the object at creation time.
> The new extension is platform independent, so UMDs can switch to
> using this extension for older platforms as well, while {set,
> get}_caching are still supported on these legacy platforms for
> compatibility reasons.
> 
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Signed-off-by: Fei Yang <fei.yang@intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Because this is an API change, we need some more information
here.

First of all, you need to CC the userspace developers who have been
working on top of your series and get their acks.

I also believe this series has been tested on a separate repository;
could you link it in the commit message?

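For reference, a minimal sketch of how user space might exercise the
proposed extension (my best guess at the names; treat them as
illustrative until the uAPI lands, and assume fd is an open
render-node fd):

	/* hypothetical usage of the proposed GEM_CREATE extension */
	struct drm_i915_gem_create_ext_set_pat set_pat = {
		.base = { .name = I915_GEM_CREATE_EXT_SET_PAT },
		.pat_index = 2, /* e.g. UC on MTL, per the PAT table */
	};
	struct drm_i915_gem_create_ext create = {
		.size = 4096,
		.extensions = (uintptr_t)&set_pat,
	};
	int ret = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create);
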
Thanks,
Andi

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Intel-gfx] [PATCH 7/8] drm/i915: use pat_index instead of cache_level
  2023-04-20 10:13   ` Andrzej Hajda
@ 2023-04-20 12:39     ` Tvrtko Ursulin
  2023-04-20 20:34       ` Yang, Fei
  0 siblings, 1 reply; 76+ messages in thread
From: Tvrtko Ursulin @ 2023-04-20 12:39 UTC (permalink / raw)
  To: Andrzej Hajda, fei.yang, intel-gfx; +Cc: Matt Roper, Chris Wilson, dri-devel


On 20/04/2023 11:13, Andrzej Hajda wrote:
> On 20.04.2023 01:00, fei.yang@intel.com wrote:
>> From: Fei Yang <fei.yang@intel.com>
>>
>> Currently the KMD is using enum i915_cache_level to set caching
>> policy for buffer objects. This is flaky because the PAT index,
>> which really controls the caching behavior in the PTE, has far more
>> levels than what's defined in the enum. In addition, the PAT index
>> is platform dependent, so having to translate between
>> i915_cache_level and the PAT index is not reliable, and it makes the
>> code more complicated.

How is it flaky and not reliable, when the series proposes to leave it in place and even claims (lower in the commit message) that using cache levels simplifies the code? Maybe just the commit message needs work.

>> From UMD's perspective there is also a necessity to set caching
>> policy for performance fine tuning. It's much easier for the UMD to
>> directly use PAT index because the behavior of each PAT index is
>> clearly defined in Bspec. Having the abstracted i915_cache_level
>> sitting in between would only cause more ambiguity.
>>
>> For these reasons this patch replaces i915_cache_level with PAT
>> index. Also note, the cache_level is not completely removed yet,
>> because the KMD still has the need of creating buffer objects with
>> simple cache settings such as cached, uncached, or writethrough. For
>> such simple cases, using cache_level would help simplify the code.
> 
> It seems like quite a fundamental change to me. Does this "not
> completely removed yet" mean that at some point in the future we will
> not have support for generic cache levels at all? That seems strange
> to me. Even looking at the number of users of i915_gem_get_pat_index
> below, it seems very unlikely.
> 
> And if the support for generic levels is staying, maybe it would be
> better to make usage of it more convenient. All the conversions of
>      f(..., cache_level, ...)
> to
>      f(..., i915_gem_get_pat_index(i915, cache_level), ...)
> look quite ugly to me.
> 
> Maybe extend cache level to support pat index somehow, for example:
> enum i915_cache_level {
>      I915_CACHE_NONE = 0,
>      I915_CACHE_...,
>      ...
>      I915_CACHE_1ST_PAT_INDEX = 0x100,
> }
> 
> so real_pat_index = cache_level - I915_CACHE_1ST_PAT_INDEX
> 
> and in the case of a generic level there would be a platform
> dependent conversion to real_pat_index?
> 
> I do not know the whole picture so maybe this is all wrong for some 
> reason, just asking :)

It looks a bit unsightly to me too, so yes please, a brainstorm on whether it can be made more elegant and less intrusive would be appreciated; a rough sketch of the idea is below.
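
For illustration only, a minimal C sketch of that overlaid-enum idea
(hypothetical, not part of the series; to_pat_index is a made-up name):

	/*
	 * Hypothetical: overlay raw PAT indices on top of the generic
	 * cache levels so existing f(..., cache_level, ...) call sites
	 * keep their signatures.
	 */
	enum i915_cache_level {
		I915_CACHE_NONE = 0,
		I915_CACHE_LLC,
		I915_CACHE_L3_LLC,
		I915_CACHE_WT,
		I915_MAX_CACHE_LEVEL,
		/* values >= this encode a platform PAT index directly */
		I915_CACHE_1ST_PAT_INDEX = 0x100,
	};

	static inline unsigned int
	to_pat_index(struct drm_i915_private *i915, enum i915_cache_level lvl)
	{
		if (lvl >= I915_CACHE_1ST_PAT_INDEX)
			return lvl - I915_CACHE_1ST_PAT_INDEX;

		/* otherwise the platform specific lookup from this series */
		return i915_gem_get_pat_index(i915, lvl);
	}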

>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> Signed-off-by: Fei Yang <fei.yang@intel.com>
>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/display/intel_dpt.c      | 12 +--
>>   drivers/gpu/drm/i915/gem/i915_gem_domain.c    | 27 ++----
>>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 10 ++-
>>   drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_object.c    | 52 +++++++++++-
>>   drivers/gpu/drm/i915/gem/i915_gem_object.h    |  4 +
>>   .../gpu/drm/i915/gem/i915_gem_object_types.h  | 25 +++++-
>>   drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |  4 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  | 16 ++--
>>   .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
>>   .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
>>   .../drm/i915/gem/selftests/i915_gem_mman.c    |  2 +-
>>   drivers/gpu/drm/i915/gt/gen6_ppgtt.c          | 10 ++-
>>   drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 71 ++++++++--------
>>   drivers/gpu/drm/i915/gt/gen8_ppgtt.h          |  3 +-
>>   drivers/gpu/drm/i915/gt/intel_ggtt.c          | 82 +++++++++----------
>>   drivers/gpu/drm/i915/gt/intel_gtt.h           | 20 ++---
>>   drivers/gpu/drm/i915/gt/intel_migrate.c       | 47 ++++++-----
>>   drivers/gpu/drm/i915/gt/intel_migrate.h       | 13 ++-
>>   drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  6 +-
>>   drivers/gpu/drm/i915/gt/selftest_migrate.c    | 47 ++++++-----
>>   drivers/gpu/drm/i915/gt/selftest_reset.c      |  8 +-
>>   drivers/gpu/drm/i915/gt/selftest_timeline.c   |  2 +-
>>   drivers/gpu/drm/i915/gt/selftest_tlb.c        |  4 +-
>>   drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      | 10 ++-
>>   drivers/gpu/drm/i915/i915_debugfs.c           | 55 ++++++++++---
>>   drivers/gpu/drm/i915/i915_gem.c               | 16 +++-
>>   drivers/gpu/drm/i915/i915_gpu_error.c         |  8 +-
>>   drivers/gpu/drm/i915/i915_vma.c               | 16 ++--
>>   drivers/gpu/drm/i915/i915_vma.h               |  2 +-
>>   drivers/gpu/drm/i915/i915_vma_types.h         |  2 -
>>   drivers/gpu/drm/i915/selftests/i915_gem.c     |  5 +-
>>   .../gpu/drm/i915/selftests/i915_gem_evict.c   |  4 +-
>>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 15 ++--
>>   .../drm/i915/selftests/intel_memory_region.c  |  4 +-
>>   drivers/gpu/drm/i915/selftests/mock_gtt.c     |  8 +-
>>   36 files changed, 378 insertions(+), 239 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c 
>> b/drivers/gpu/drm/i915/display/intel_dpt.c
>> index c5eacfdba1a5..7c5fddb203ba 100644
>> --- a/drivers/gpu/drm/i915/display/intel_dpt.c
>> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c
>> @@ -43,24 +43,24 @@ static void gen8_set_pte(void __iomem *addr, 
>> gen8_pte_t pte)
>>   static void dpt_insert_page(struct i915_address_space *vm,
>>                   dma_addr_t addr,
>>                   u64 offset,
>> -                enum i915_cache_level level,
>> +                unsigned int pat_index,
>>                   u32 flags)
>>   {
>>       struct i915_dpt *dpt = i915_vm_to_dpt(vm);
>>       gen8_pte_t __iomem *base = dpt->iomem;
>>       gen8_set_pte(base + offset / I915_GTT_PAGE_SIZE,
>> -             vm->pte_encode(addr, level, flags));
>> +             vm->pte_encode(addr, pat_index, flags));
>>   }
>>   static void dpt_insert_entries(struct i915_address_space *vm,
>>                      struct i915_vma_resource *vma_res,
>> -                   enum i915_cache_level level,
>> +                   unsigned int pat_index,
>>                      u32 flags)
>>   {
>>       struct i915_dpt *dpt = i915_vm_to_dpt(vm);
>>       gen8_pte_t __iomem *base = dpt->iomem;
>> -    const gen8_pte_t pte_encode = vm->pte_encode(0, level, flags);
>> +    const gen8_pte_t pte_encode = vm->pte_encode(0, pat_index, flags);
>>       struct sgt_iter sgt_iter;
>>       dma_addr_t addr;
>>       int i;
>> @@ -83,7 +83,7 @@ static void dpt_clear_range(struct 
>> i915_address_space *vm,
>>   static void dpt_bind_vma(struct i915_address_space *vm,
>>                struct i915_vm_pt_stash *stash,
>>                struct i915_vma_resource *vma_res,
>> -             enum i915_cache_level cache_level,
>> +             unsigned int pat_index,
>>                u32 flags)
>>   {
>>       u32 pte_flags;
>> @@ -98,7 +98,7 @@ static void dpt_bind_vma(struct i915_address_space *vm,
>>       if (vma_res->bi.lmem)
>>           pte_flags |= PTE_LM;
>> -    vm->insert_entries(vm, vma_res, cache_level, pte_flags);
>> +    vm->insert_entries(vm, vma_res, pat_index, pte_flags);
>>       vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE;
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>> index bb3575b1479f..d5fd4c9cd9f8 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>> @@ -27,8 +27,8 @@ static bool gpu_write_needs_clflush(struct 
>> drm_i915_gem_object *obj)
>>       if (IS_DGFX(i915))
>>           return false;
>> -    return !(obj->cache_level == I915_CACHE_NONE ||
>> -         obj->cache_level == I915_CACHE_WT);
>> +    return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
>> +         i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
>>   }
>>   bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>> @@ -267,7 +267,7 @@ int i915_gem_object_set_cache_level(struct 
>> drm_i915_gem_object *obj,
>>   {
>>       int ret;
>> -    if (obj->cache_level == cache_level)
>> +    if (i915_gem_object_has_cache_level(obj, cache_level))
>>           return 0;
>>       ret = i915_gem_object_wait(obj,
>> @@ -278,10 +278,8 @@ int i915_gem_object_set_cache_level(struct 
>> drm_i915_gem_object *obj,
>>           return ret;
>>       /* Always invalidate stale cachelines */
>> -    if (obj->cache_level != cache_level) {
>> -        i915_gem_object_set_cache_coherency(obj, cache_level);
>> -        obj->cache_dirty = true;
>> -    }
>> +    i915_gem_object_set_cache_coherency(obj, cache_level);
>> +    obj->cache_dirty = true;
>>       /* The cache-level will be applied when each vma is rebound. */
>>       return i915_gem_object_unbind(obj,
>> @@ -306,20 +304,13 @@ int i915_gem_get_caching_ioctl(struct drm_device 
>> *dev, void *data,
>>           goto out;
>>       }
>> -    switch (obj->cache_level) {
>> -    case I915_CACHE_LLC:
>> -    case I915_CACHE_L3_LLC:
>> +    if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||
>> +        i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
>>           args->caching = I915_CACHING_CACHED;
>> -        break;
>> -
>> -    case I915_CACHE_WT:
>> +    else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
>>           args->caching = I915_CACHING_DISPLAY;
>> -        break;
>> -
>> -    default:
>> +    else
>>           args->caching = I915_CACHING_NONE;
>> -        break;
>> -    }
>>   out:
>>       rcu_read_unlock();
>>       return err;
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> index 3aeede6aee4d..d42915516636 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> @@ -642,7 +642,7 @@ static inline int use_cpu_reloc(const struct 
>> reloc_cache *cache,
>>       return (cache->has_llc ||
>>           obj->cache_dirty ||
>> -        obj->cache_level != I915_CACHE_NONE);
>> +        !i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
>>   }
>>   static int eb_reserve_vma(struct i915_execbuffer *eb,
>> @@ -1323,8 +1323,10 @@ static void *reloc_iomap(struct i915_vma *batch,
>>       offset = cache->node.start;
>>       if (drm_mm_node_allocated(&cache->node)) {
>>           ggtt->vm.insert_page(&ggtt->vm,
>> -                     i915_gem_object_get_dma_address(obj, page),
>> -                     offset, I915_CACHE_NONE, 0);
>> +            i915_gem_object_get_dma_address(obj, page),
>> +            offset,
>> +            i915_gem_get_pat_index(ggtt->vm.i915, I915_CACHE_NONE),
>> +            0);
>>       } else {
>>           offset += page << PAGE_SHIFT;
>>       }
>> @@ -1464,7 +1466,7 @@ eb_relocate_entry(struct i915_execbuffer *eb,
>>               reloc_cache_unmap(&eb->reloc_cache);
>>               mutex_lock(&vma->vm->mutex);
>>               err = i915_vma_bind(target->vma,
>> -                        target->vma->obj->cache_level,
>> +                        target->vma->obj->pat_index,
>>                           PIN_GLOBAL, NULL, NULL);
>>               mutex_unlock(&vma->vm->mutex);
>>               reloc_cache_remap(&eb->reloc_cache, ev->vma->obj);
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> index 3dbacdf0911a..50c30efa08a3 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> @@ -383,7 +383,8 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>>       }
>>       /* Access to snoopable pages through the GTT is incoherent. */
>> -    if (obj->cache_level != I915_CACHE_NONE && !HAS_LLC(i915)) {
>> +    if (!(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
>> +          HAS_LLC(i915))) {
>>           ret = -EFAULT;
>>           goto err_unpin;
>>       }
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> index 8c70a0ec7d2f..27c948350b5b 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> @@ -54,6 +54,25 @@ unsigned int i915_gem_get_pat_index(struct 
>> drm_i915_private *i915,
>>       return INTEL_INFO(i915)->cachelevel_to_pat[level];
>>   }
>> +bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object 
>> *obj,
>> +                     enum i915_cache_level lvl)
> 
> The name suggests an object can have multiple cache levels; maybe
> that's only my impression, up to you.
> 
>> +{
>> +    /*
>> +     * cache_level == I915_CACHE_INVAL indicates the UMD's have set the
>> +     * caching policy through pat_index, in which case the KMD should
>> +     * leave the coherency to be managed by user space, simply return
>> +     * true here.
>> +     */
>> +    if (obj->cache_level == I915_CACHE_INVAL)
>> +        return true;

It's a "bit" counter intuitive that answer "has cache level" is yes when cache level is set to invalid!

I worry about us ending up with an impenetrable code base, so I hope this can be improved; one possible shape is sketched below.
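
As a sketch only (i915_gem_object_pat_set_by_user is a made-up name),
naming the special case might make the intent clearer:

	static inline bool
	i915_gem_object_pat_set_by_user(const struct drm_i915_gem_object *obj)
	{
		/* the UMD set pat_index directly; coherency is left to it */
		return obj->cache_level == I915_CACHE_INVAL;
	}

	bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
					     enum i915_cache_level lvl)
	{
		if (i915_gem_object_pat_set_by_user(obj))
			return true;

		return obj->pat_index ==
		       i915_gem_get_pat_index(obj_to_i915(obj), lvl);
	}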

>> +
>> +    /*
>> +     * Otherwise the pat_index should have been converted from 
>> cache_level
>> +     * so that the following comparison is valid.
>> +     */
>> +    return obj->pat_index == i915_gem_get_pat_index(obj_to_i915(obj), 
>> lvl);
>> +}
>> +
>>   struct drm_i915_gem_object *i915_gem_object_alloc(void)
>>   {
>>       struct drm_i915_gem_object *obj;
>> @@ -133,7 +152,7 @@ void i915_gem_object_set_cache_coherency(struct 
>> drm_i915_gem_object *obj,
>>   {
>>       struct drm_i915_private *i915 = to_i915(obj->base.dev);
>> -    obj->cache_level = cache_level;
>> +    obj->pat_index = i915_gem_get_pat_index(i915, cache_level);
>>       if (cache_level != I915_CACHE_NONE)
>>           obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
>> @@ -148,6 +167,37 @@ void i915_gem_object_set_cache_coherency(struct 
>> drm_i915_gem_object *obj,
>>           !IS_DGFX(i915);
>>   }
>> +/**
>> + * i915_gem_object_set_pat_index - set PAT index to be used in PTE 
>> encode
>> + * @obj: #drm_i915_gem_object
>> + * @pat_index: PAT index
>> + *
>> + * This is a clone of i915_gem_object_set_cache_coherency taking pat 
>> index
>> + * instead of cache_level as its second argument.
>> + */
>> +void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
>> +                   unsigned int pat_index)
>> +{
>> +    struct drm_i915_private *i915 = to_i915(obj->base.dev);
>> +
>> +    if (obj->pat_index == pat_index)
>> +        return;
>> +
>> +    obj->pat_index = pat_index;
>> +
>> +    if (pat_index != i915_gem_get_pat_index(i915, I915_CACHE_NONE))
>> +        obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
>> +                       I915_BO_CACHE_COHERENT_FOR_WRITE);
>> +    else if (HAS_LLC(i915))
>> +        obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
>> +    else
>> +        obj->cache_coherent = 0;
>> +
>> +    obj->cache_dirty =
>> +        !(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
>> +        !IS_DGFX(i915);
>> +}
>> +
>>   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
>>   {
>>       struct drm_i915_private *i915 = to_i915(obj->base.dev);
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
>> b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> index 4c92e17b4337..6f00aab10015 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> @@ -34,6 +34,8 @@ static inline bool i915_gem_object_size_2big(u64 size)
>>   unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
>>                       enum i915_cache_level level);
>> +bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object 
>> *obj,
>> +                     enum i915_cache_level lvl);
>>   void i915_gem_init__objects(struct drm_i915_private *i915);
>>   void i915_objects_module_exit(void);
>> @@ -764,6 +766,8 @@ bool i915_gem_object_has_unknown_state(struct 
>> drm_i915_gem_object *obj);
>>   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object 
>> *obj,
>>                        unsigned int cache_level);
>> +void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
>> +                   unsigned int pat_index);
>>   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj);
>>   void i915_gem_object_flush_if_display(struct drm_i915_gem_object *obj);
>>   void i915_gem_object_flush_if_display_locked(struct 
>> drm_i915_gem_object *obj);
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
>> b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>> index 41b35abccf88..132ce01dee9f 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>> @@ -195,6 +195,7 @@ enum i915_cache_level {
>>        */
>>       I915_CACHE_WT,
>>       I915_MAX_CACHE_LEVEL,
>> +    I915_CACHE_INVAL = I915_MAX_CACHE_LEVEL,
>>   };
>>   enum i915_map_type {
>> @@ -358,10 +359,28 @@ struct drm_i915_gem_object {
>>   #define I915_BO_FLAG_STRUCT_PAGE BIT(0) /* Object backed by struct 
>> pages */
>>   #define I915_BO_FLAG_IOMEM       BIT(1) /* Object backed by IO 
>> memory */
>>       /**
>> -     * @cache_level: The desired GTT caching level.
>> +     * @pat_index: The desired PAT index.
>> +     *
>> +     * See hardware specification for valid PAT indices for each 
>> platform.

Side note for the last patch in the series: the UAPI blurb next to the u32 index needs to at least point to some public PRM which lists the PATs and their configuration, I would think. Otherwise it's not fully transparent how to use the feature.

>> +     * This field used to contain a value of enum i915_cache_level. 

What does this mean? Nothing is changed to unsigned here; a new field is just added.

>> +     * It's
>> +     * changed to an unsigned int because PAT indices are being used by
>> +     * both UMD and KMD for caching policy control after GEN12.
>> +     * For backward compatibility, this field will continue to contain
>> +     * value of i915_cache_level for pre-GEN12 platforms so that the PTE

Pat_index:6 is a copy of cache_level:3 pre-Gen12?

But when I look at changes like:

@@ -1302,7 +1298,9 @@ bool i915_ggtt_resume_vm(struct i915_address_space *vm)
  		 */
  		vma->resource->bound_flags = 0;
  		vma->ops->bind_vma(vm, NULL, vma->resource,
-				   obj ? obj->cache_level : 0,
+				   obj ? obj->pat_index :
+					 i915_gem_get_pat_index(vm->i915,
+								I915_CACHE_NONE),
  				   was_bound);

That suggests it is not a copy but that obj->pat_index is always valid and directly a PAT index.

In which case a new cache_level enum value saying "use pat instead" may indeed be nicer, as Andrzej suggested.

Although it is not clear to me at a glance that we need both. Maybe all in-driver object creation can use cache_level but immediately convert to a PAT index internally, and just not store cache_level? I haven't looked at it in detail, is my disclaimer though. I guess it may boil down to whether i915 ever needs to read back cache_level, other than at the top entry points like setting it; a minimal sketch of that idea follows.
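
A minimal sketch of that convert-at-the-boundary idea (assuming the
object stores only pat_index; set_default_caching is a made-up name):

	/*
	 * Hypothetical: resolve the generic cache level to a PAT index at
	 * the creation entry point, so nothing downstream ever reads
	 * cache_level back.
	 */
	static void set_default_caching(struct drm_i915_gem_object *obj,
					enum i915_cache_level lvl)
	{
		obj->pat_index = i915_gem_get_pat_index(obj_to_i915(obj), lvl);
	}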

>> +     * encode functions for these legacy platforms can stay the same.
>> +     * In the meantime platform specific tables are created to translate
>> +     * i915_cache_level into pat index, for more details check the 
>> macros
>> +     * defined i915/i915_pci.c, e.g. PVC_CACHELEVEL.
>> +     */
>> +    unsigned int pat_index:6;

The existing bitfields take up 7 bits. I'd check here with pahole whether making pat_index a full u8 and changing the existing ones to u8 field:bits ends up better overall.
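
For example, something like this would show the resulting layout and
holes (assuming an i915 build with debug info):

	pahole -C drm_i915_gem_object drivers/gpu/drm/i915/i915.ko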

>> +    /**
>> +     * @cache_level: Indicate whether pat_index is set by UMD
>>        *
>> -     * See enum i915_cache_level for possible values, along with what
>> -     * each does.
>> +     * This used to hold desired GTT caching level, but is now 
>> replaced by
>> +     * pat_index. It's kept here for KMD to tell whether the 
>> pat_index is
>> +     * set by UMD or converted from enum i915_cache_level.
>> +     * This field should be 0 by default, but I915_CACHE_INVAL if the
>> +     * pat_index is set by UMD.
>>        */
>>       unsigned int cache_level:3;
>>       /**
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>> index ee492d823f1b..3b094d36a0b0 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>> @@ -565,7 +565,9 @@ static void dbg_poison(struct i915_ggtt *ggtt,
>>           ggtt->vm.insert_page(&ggtt->vm, addr,
>>                        ggtt->error_capture.start,
>> -                     I915_CACHE_NONE, 0);
>> +                     i915_gem_get_pat_index(ggtt->vm.i915,
>> +                                I915_CACHE_NONE),
>> +                     0);
>>           mb();
>>           s = io_mapping_map_wc(&ggtt->iomap,
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> index 69eb20ed4d47..e40761e13c2a 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> @@ -214,7 +214,8 @@ static struct dma_fence 
>> *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>>           intel_engine_pm_get(to_gt(i915)->migrate.context->engine);
>>           ret = 
>> intel_context_migrate_clear(to_gt(i915)->migrate.context, deps,
>> -                          dst_st->sgl, dst_level,
>> +                          dst_st->sgl,
>> +                          i915_gem_get_pat_index(i915, dst_level),
>>                             i915_ttm_gtt_binds_lmem(dst_mem),
>>                             0, &rq);
>>       } else {
>> @@ -227,12 +228,13 @@ static struct dma_fence 
>> *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>>           src_level = i915_ttm_cache_level(i915, bo->resource, src_ttm);
>>           intel_engine_pm_get(to_gt(i915)->migrate.context->engine);
>>           ret = intel_context_migrate_copy(to_gt(i915)->migrate.context,
>> -                         deps, src_rsgt->table.sgl,
>> -                         src_level,
>> -                         i915_ttm_gtt_binds_lmem(bo->resource),
>> -                         dst_st->sgl, dst_level,
>> -                         i915_ttm_gtt_binds_lmem(dst_mem),
>> -                         &rq);
>> +                    deps, src_rsgt->table.sgl,
>> +                    i915_gem_get_pat_index(i915, src_level),
>> +                    i915_ttm_gtt_binds_lmem(bo->resource),
>> +                    dst_st->sgl,
>> +                    i915_gem_get_pat_index(i915, dst_level),
>> +                    i915_ttm_gtt_binds_lmem(dst_mem),
>> +                    &rq);
>>           i915_refct_sgt_put(src_rsgt);
>>       }
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c 
>> b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>> index defece0bcb81..ebb68ac9cd5e 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>> @@ -354,7 +354,7 @@ fake_huge_pages_object(struct drm_i915_private 
>> *i915, u64 size, bool single)
>>       obj->write_domain = I915_GEM_DOMAIN_CPU;
>>       obj->read_domains = I915_GEM_DOMAIN_CPU;
>> -    obj->cache_level = I915_CACHE_NONE;
>> +    obj->pat_index = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
>>       return obj;
>>   }
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c 
>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
>> index fe6c37fd7859..a93a90b15907 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
>> @@ -219,7 +219,7 @@ static int __igt_lmem_pages_migrate(struct 
>> intel_gt *gt,
>>               continue;
>>           err = intel_migrate_clear(&gt->migrate, &ww, deps,
>> -                      obj->mm.pages->sgl, obj->cache_level,
>> +                      obj->mm.pages->sgl, obj->pat_index,
>>                         i915_gem_object_is_lmem(obj),
>>                         0xdeadbeaf, &rq);
>>           if (rq) {
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>> index 56279908ed30..a93d8f9f8bc1 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>> @@ -1222,7 +1222,7 @@ static int __igt_mmap_migrate(struct 
>> intel_memory_region **placements,
>>       }
>>       err = intel_context_migrate_clear(to_gt(i915)->migrate.context, 
>> NULL,
>> -                      obj->mm.pages->sgl, obj->cache_level,
>> +                      obj->mm.pages->sgl, obj->pat_index,
>>                         i915_gem_object_is_lmem(obj),
>>                         expand32(POISON_INUSE), &rq);
>>       i915_gem_object_unpin_pages(obj);
>> diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c 
>> b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
>> index 5aaacc53fa4c..c2bdc133c89a 100644
>> --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
>> +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
>> @@ -109,7 +109,7 @@ static void gen6_ppgtt_clear_range(struct 
>> i915_address_space *vm,
>>   static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>>                         struct i915_vma_resource *vma_res,
>> -                      enum i915_cache_level cache_level,
>> +                      unsigned int pat_index,
>>                         u32 flags)
>>   {
>>       struct i915_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
>> @@ -117,7 +117,7 @@ static void gen6_ppgtt_insert_entries(struct 
>> i915_address_space *vm,
>>       unsigned int first_entry = vma_res->start / I915_GTT_PAGE_SIZE;
>>       unsigned int act_pt = first_entry / GEN6_PTES;
>>       unsigned int act_pte = first_entry % GEN6_PTES;
>> -    const u32 pte_encode = vm->pte_encode(0, cache_level, flags);
>> +    const u32 pte_encode = vm->pte_encode(0, pat_index, flags);
>>       struct sgt_dma iter = sgt_dma(vma_res);
>>       gen6_pte_t *vaddr;
>> @@ -227,7 +227,9 @@ static int gen6_ppgtt_init_scratch(struct 
>> gen6_ppgtt *ppgtt)
>>       vm->scratch[0]->encode =
>>           vm->pte_encode(px_dma(vm->scratch[0]),
>> -                   I915_CACHE_NONE, PTE_READ_ONLY);
>> +                   i915_gem_get_pat_index(vm->i915,
>> +                              I915_CACHE_NONE),
>> +                   PTE_READ_ONLY);
>>       vm->scratch[1] = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
>>       if (IS_ERR(vm->scratch[1])) {
>> @@ -278,7 +280,7 @@ static void gen6_ppgtt_cleanup(struct 
>> i915_address_space *vm)
>>   static void pd_vma_bind(struct i915_address_space *vm,
>>               struct i915_vm_pt_stash *stash,
>>               struct i915_vma_resource *vma_res,
>> -            enum i915_cache_level cache_level,
>> +            unsigned int pat_index,
>>               u32 unused)
>>   {
>>       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c 
>> b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> index 7a4b1d1afce9..c046813514f4 100644
>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> @@ -56,7 +56,7 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>>   }
>>   static u64 mtl_pte_encode(dma_addr_t addr,
>> -              enum i915_cache_level level,
>> +              unsigned int pat_index,
>>                 u32 flags)
>>   {
>>       gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>> @@ -67,24 +67,17 @@ static u64 mtl_pte_encode(dma_addr_t addr,
>>       if (flags & PTE_LM)
>>           pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;
>> -    switch (level) {
>> -    case I915_CACHE_NONE:
>> -        pte |= GEN12_PPGTT_PTE_PAT1;
>> -        break;
>> -    case I915_CACHE_LLC:
>> -    case I915_CACHE_L3_LLC:
>> -        pte |= GEN12_PPGTT_PTE_PAT0 | GEN12_PPGTT_PTE_PAT1;
>> -        break;
>> -    case I915_CACHE_WT:
>> +    if (pat_index & BIT(0))
>>           pte |= GEN12_PPGTT_PTE_PAT0;
>> -        break;
>> -    default:
>> -        /* This should never happen. Added to deal with the compile
>> -         * error due to the addition of I915_MAX_CACHE_LEVEL. Will
>> -         * be removed by the pat_index patch.
>> -         */
>> -        break;
>> -    }
>> +
>> +    if (pat_index & BIT(1))
>> +        pte |= GEN12_PPGTT_PTE_PAT1;
>> +
>> +    if (pat_index & BIT(2))
>> +        pte |= GEN12_PPGTT_PTE_PAT2;
>> +
>> +    if (pat_index & BIT(3))
>> +        pte |= MTL_PPGTT_PTE_PAT3;
>>       return pte;
>>   }
>> @@ -457,11 +450,11 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
>>                 struct i915_page_directory *pdp,
>>                 struct sgt_dma *iter,
>>                 u64 idx,
>> -              enum i915_cache_level cache_level,
>> +              unsigned int pat_index,
>>                 u32 flags)
>>   {
>>       struct i915_page_directory *pd;
>> -    const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, 
>> cache_level, flags);
>> +    const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, pat_index, 
>> flags);
>>       gen8_pte_t *vaddr;
>>       pd = i915_pd_entry(pdp, gen8_pd_index(idx, 2));
>> @@ -504,10 +497,10 @@ static void
>>   xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
>>                 struct i915_vma_resource *vma_res,
>>                 struct sgt_dma *iter,
>> -              enum i915_cache_level cache_level,
>> +              unsigned int pat_index,
>>                 u32 flags)
>>   {
>> -    const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags);
>> +    const gen8_pte_t pte_encode = vm->pte_encode(0, pat_index, flags);
>>       unsigned int rem = sg_dma_len(iter->sg);
>>       u64 start = vma_res->start;
>>       u64 end = start + vma_res->vma_size;
>> @@ -611,10 +604,10 @@ xehpsdv_ppgtt_insert_huge(struct 
>> i915_address_space *vm,
>>   static void gen8_ppgtt_insert_huge(struct i915_address_space *vm,
>>                      struct i915_vma_resource *vma_res,
>>                      struct sgt_dma *iter,
>> -                   enum i915_cache_level cache_level,
>> +                   unsigned int pat_index,
>>                      u32 flags)
>>   {
>> -    const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags);
>> +    const gen8_pte_t pte_encode = vm->pte_encode(0, pat_index, flags);
>>       unsigned int rem = sg_dma_len(iter->sg);
>>       u64 start = vma_res->start;
>> @@ -734,7 +727,7 @@ static void gen8_ppgtt_insert_huge(struct 
>> i915_address_space *vm,
>>   static void gen8_ppgtt_insert(struct i915_address_space *vm,
>>                     struct i915_vma_resource *vma_res,
>> -                  enum i915_cache_level cache_level,
>> +                  unsigned int pat_index,
>>                     u32 flags)
>>   {
>>       struct i915_ppgtt * const ppgtt = i915_vm_to_ppgtt(vm);
>> @@ -742,9 +735,9 @@ static void gen8_ppgtt_insert(struct 
>> i915_address_space *vm,
>>       if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) {
>>           if (HAS_64K_PAGES(vm->i915))
>> -            xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, 
>> cache_level, flags);
>> +            xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, pat_index, 
>> flags);
>>           else
>> -            gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, 
>> flags);
>> +            gen8_ppgtt_insert_huge(vm, vma_res, &iter, pat_index, 
>> flags);
>>       } else  {
>>           u64 idx = vma_res->start >> GEN8_PTE_SHIFT;
>> @@ -753,7 +746,7 @@ static void gen8_ppgtt_insert(struct 
>> i915_address_space *vm,
>>                   gen8_pdp_for_page_index(vm, idx);
>>               idx = gen8_ppgtt_insert_pte(ppgtt, pdp, &iter, idx,
>> -                            cache_level, flags);
>> +                            pat_index, flags);
>>           } while (idx);
>>           vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE;
>> @@ -763,7 +756,7 @@ static void gen8_ppgtt_insert(struct 
>> i915_address_space *vm,
>>   static void gen8_ppgtt_insert_entry(struct i915_address_space *vm,
>>                       dma_addr_t addr,
>>                       u64 offset,
>> -                    enum i915_cache_level level,
>> +                    unsigned int pat_index,
>>                       u32 flags)
>>   {
>>       u64 idx = offset >> GEN8_PTE_SHIFT;
>> @@ -777,14 +770,14 @@ static void gen8_ppgtt_insert_entry(struct 
>> i915_address_space *vm,
>>       GEM_BUG_ON(pt->is_compact);
>>       vaddr = px_vaddr(pt);
>> -    vaddr[gen8_pd_index(idx, 0)] = vm->pte_encode(addr, level, flags);
>> +    vaddr[gen8_pd_index(idx, 0)] = vm->pte_encode(addr, pat_index, 
>> flags);
>>       drm_clflush_virt_range(&vaddr[gen8_pd_index(idx, 0)], 
>> sizeof(*vaddr));
>>   }
>>   static void __xehpsdv_ppgtt_insert_entry_lm(struct 
>> i915_address_space *vm,
>>                           dma_addr_t addr,
>>                           u64 offset,
>> -                        enum i915_cache_level level,
>> +                        unsigned int pat_index,
>>                           u32 flags)
>>   {
>>       u64 idx = offset >> GEN8_PTE_SHIFT;
>> @@ -807,20 +800,20 @@ static void 
>> __xehpsdv_ppgtt_insert_entry_lm(struct i915_address_space *vm,
>>       }
>>       vaddr = px_vaddr(pt);
>> -    vaddr[gen8_pd_index(idx, 0) / 16] = vm->pte_encode(addr, level, 
>> flags);
>> +    vaddr[gen8_pd_index(idx, 0) / 16] = vm->pte_encode(addr, 
>> pat_index, flags);
>>   }
>>   static void xehpsdv_ppgtt_insert_entry(struct i915_address_space *vm,
>>                          dma_addr_t addr,
>>                          u64 offset,
>> -                       enum i915_cache_level level,
>> +                       unsigned int pat_index,
>>                          u32 flags)
>>   {
>>       if (flags & PTE_LM)
>>           return __xehpsdv_ppgtt_insert_entry_lm(vm, addr, offset,
>> -                               level, flags);
>> +                               pat_index, flags);
>> -    return gen8_ppgtt_insert_entry(vm, addr, offset, level, flags);
>> +    return gen8_ppgtt_insert_entry(vm, addr, offset, pat_index, flags);
>>   }
>>   static int gen8_init_scratch(struct i915_address_space *vm)
>> @@ -855,7 +848,9 @@ static int gen8_init_scratch(struct 
>> i915_address_space *vm)
>>       vm->scratch[0]->encode =
>>           vm->pte_encode(px_dma(vm->scratch[0]),
>> -                   I915_CACHE_NONE, pte_flags);
>> +                   i915_gem_get_pat_index(vm->i915,
>> +                              I915_CACHE_NONE),
>> +                   pte_flags);
>>       for (i = 1; i <= vm->top; i++) {
>>           struct drm_i915_gem_object *obj;
>> @@ -873,7 +868,9 @@ static int gen8_init_scratch(struct 
>> i915_address_space *vm)
>>           }
>>           fill_px(obj, vm->scratch[i - 1]->encode);
>> -        obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_NONE);
>> +        obj->encode = gen8_pde_encode(px_dma(obj),
>> +                          i915_gem_get_pat_index(vm->i915,
>> +                                     I915_CACHE_NONE));
>>           vm->scratch[i] = obj;
>>       }
>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h 
>> b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
>> index f541d19264b4..19c635441642 100644
>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
>> @@ -10,13 +10,12 @@
>>   struct i915_address_space;
>>   struct intel_gt;
>> -enum i915_cache_level;
>>   struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
>>                        unsigned long lmem_pt_obj_flags);
>>   u64 gen8_ggtt_pte_encode(dma_addr_t addr,
>> -             enum i915_cache_level level,
>> +             unsigned int pat_index,
>>                u32 flags);
>>   #endif
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
>> b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> index c8390d03fce2..2a7942fac798 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> @@ -221,7 +221,7 @@ static void guc_ggtt_invalidate(struct i915_ggtt 
>> *ggtt)
>>   }
>>   static u64 mtl_ggtt_pte_encode(dma_addr_t addr,
>> -                   enum i915_cache_level level,
>> +                   unsigned int pat_index,
>>                      u32 flags)
>>   {
>>       gen8_pte_t pte = addr | GEN8_PAGE_PRESENT;
>> @@ -231,30 +231,17 @@ static u64 mtl_ggtt_pte_encode(dma_addr_t addr,
>>       if (flags & PTE_LM)
>>           pte |= GEN12_GGTT_PTE_LM;
>> -    switch (level) {
>> -    case I915_CACHE_NONE:
>> -        pte |= MTL_GGTT_PTE_PAT1;
>> -        break;
>> -    case I915_CACHE_LLC:
>> -    case I915_CACHE_L3_LLC:
>> -        pte |= MTL_GGTT_PTE_PAT0 | MTL_GGTT_PTE_PAT1;
>> -        break;
>> -    case I915_CACHE_WT:
>> +    if (pat_index & BIT(0))
>>           pte |= MTL_GGTT_PTE_PAT0;
>> -        break;
>> -    default:
>> -        /* This should never happen. Added to deal with the compile
>> -         * error due to the addition of I915_MAX_CACHE_LEVEL. Will
>> -         * be removed by the pat_index patch.
>> -         */
>> -        break;
>> -    }
>> +
>> +    if (pat_index & BIT(1))
>> +        pte |= MTL_GGTT_PTE_PAT1;
>>       return pte;
>>   }
>>   u64 gen8_ggtt_pte_encode(dma_addr_t addr,
>> -             enum i915_cache_level level,
>> +             unsigned int pat_index,
>>                u32 flags)
>>   {
>>       gen8_pte_t pte = addr | GEN8_PAGE_PRESENT;
>> @@ -273,25 +260,25 @@ static void gen8_set_pte(void __iomem *addr, 
>> gen8_pte_t pte)
>>   static void gen8_ggtt_insert_page(struct i915_address_space *vm,
>>                     dma_addr_t addr,
>>                     u64 offset,
>> -                  enum i915_cache_level level,
>> +                  unsigned int pat_index,
>>                     u32 flags)
>>   {
>>       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
>>       gen8_pte_t __iomem *pte =
>>           (gen8_pte_t __iomem *)ggtt->gsm + offset / I915_GTT_PAGE_SIZE;
>> -    gen8_set_pte(pte, ggtt->vm.pte_encode(addr, level, flags));
>> +    gen8_set_pte(pte, ggtt->vm.pte_encode(addr, pat_index, flags));
>>       ggtt->invalidate(ggtt);
>>   }
>>   static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
>>                        struct i915_vma_resource *vma_res,
>> -                     enum i915_cache_level level,
>> +                     unsigned int pat_index,
>>                        u32 flags)
>>   {
>>       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
>> -    const gen8_pte_t pte_encode = ggtt->vm.pte_encode(0, level, flags);
>> +    const gen8_pte_t pte_encode = ggtt->vm.pte_encode(0, pat_index, 
>> flags);
>>       gen8_pte_t __iomem *gte;
>>       gen8_pte_t __iomem *end;
>>       struct sgt_iter iter;
>> @@ -348,14 +335,14 @@ static void gen8_ggtt_clear_range(struct 
>> i915_address_space *vm,
>>   static void gen6_ggtt_insert_page(struct i915_address_space *vm,
>>                     dma_addr_t addr,
>>                     u64 offset,
>> -                  enum i915_cache_level level,
>> +                  unsigned int pat_index,
>>                     u32 flags)
>>   {
>>       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
>>       gen6_pte_t __iomem *pte =
>>           (gen6_pte_t __iomem *)ggtt->gsm + offset / I915_GTT_PAGE_SIZE;
>> -    iowrite32(vm->pte_encode(addr, level, flags), pte);
>> +    iowrite32(vm->pte_encode(addr, pat_index, flags), pte);
>>       ggtt->invalidate(ggtt);
>>   }
>> @@ -368,7 +355,7 @@ static void gen6_ggtt_insert_page(struct 
>> i915_address_space *vm,
>>    */
>>   static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
>>                        struct i915_vma_resource *vma_res,
>> -                     enum i915_cache_level level,
>> +                     unsigned int pat_index,
>>                        u32 flags)
>>   {
>>       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
>> @@ -385,7 +372,7 @@ static void gen6_ggtt_insert_entries(struct 
>> i915_address_space *vm,
>>           iowrite32(vm->scratch[0]->encode, gte++);
>>       end += (vma_res->node_size + vma_res->guard) / I915_GTT_PAGE_SIZE;
>>       for_each_sgt_daddr(addr, iter, vma_res->bi.pages)
>> -        iowrite32(vm->pte_encode(addr, level, flags), gte++);
>> +        iowrite32(vm->pte_encode(addr, pat_index, flags), gte++);
>>       GEM_BUG_ON(gte > end);
>>       /* Fill the allocated but "unused" space beyond the end of the 
>> buffer */
>> @@ -420,14 +407,15 @@ struct insert_page {
>>       struct i915_address_space *vm;
>>       dma_addr_t addr;
>>       u64 offset;
>> -    enum i915_cache_level level;
>> +    unsigned int pat_index;
>>   };
>>   static int bxt_vtd_ggtt_insert_page__cb(void *_arg)
>>   {
>>       struct insert_page *arg = _arg;
>> -    gen8_ggtt_insert_page(arg->vm, arg->addr, arg->offset, 
>> arg->level, 0);
>> +    gen8_ggtt_insert_page(arg->vm, arg->addr, arg->offset,
>> +                  arg->pat_index, 0);
>>       bxt_vtd_ggtt_wa(arg->vm);
>>       return 0;
>> @@ -436,10 +424,10 @@ static int bxt_vtd_ggtt_insert_page__cb(void *_arg)
>>   static void bxt_vtd_ggtt_insert_page__BKL(struct i915_address_space 
>> *vm,
>>                         dma_addr_t addr,
>>                         u64 offset,
>> -                      enum i915_cache_level level,
>> +                      unsigned int pat_index,
>>                         u32 unused)
>>   {
>> -    struct insert_page arg = { vm, addr, offset, level };
>> +    struct insert_page arg = { vm, addr, offset, pat_index };
>>       stop_machine(bxt_vtd_ggtt_insert_page__cb, &arg, NULL);
>>   }
>> @@ -447,7 +435,7 @@ static void bxt_vtd_ggtt_insert_page__BKL(struct 
>> i915_address_space *vm,
>>   struct insert_entries {
>>       struct i915_address_space *vm;
>>       struct i915_vma_resource *vma_res;
>> -    enum i915_cache_level level;
>> +    unsigned int pat_index;
>>       u32 flags;
>>   };
>> @@ -455,7 +443,8 @@ static int bxt_vtd_ggtt_insert_entries__cb(void 
>> *_arg)
>>   {
>>       struct insert_entries *arg = _arg;
>> -    gen8_ggtt_insert_entries(arg->vm, arg->vma_res, arg->level, 
>> arg->flags);
>> +    gen8_ggtt_insert_entries(arg->vm, arg->vma_res,
>> +                 arg->pat_index, arg->flags);
>>       bxt_vtd_ggtt_wa(arg->vm);
>>       return 0;
>> @@ -463,10 +452,10 @@ static int bxt_vtd_ggtt_insert_entries__cb(void 
>> *_arg)
>>   static void bxt_vtd_ggtt_insert_entries__BKL(struct 
>> i915_address_space *vm,
>>                            struct i915_vma_resource *vma_res,
>> -                         enum i915_cache_level level,
>> +                         unsigned int pat_index,
>>                            u32 flags)
>>   {
>> -    struct insert_entries arg = { vm, vma_res, level, flags };
>> +    struct insert_entries arg = { vm, vma_res, pat_index, flags };
>>       stop_machine(bxt_vtd_ggtt_insert_entries__cb, &arg, NULL);
>>   }
>> @@ -495,7 +484,7 @@ static void gen6_ggtt_clear_range(struct 
>> i915_address_space *vm,
>>   void intel_ggtt_bind_vma(struct i915_address_space *vm,
>>                struct i915_vm_pt_stash *stash,
>>                struct i915_vma_resource *vma_res,
>> -             enum i915_cache_level cache_level,
>> +             unsigned int pat_index,
>>                u32 flags)
>>   {
>>       u32 pte_flags;
>> @@ -512,7 +501,7 @@ void intel_ggtt_bind_vma(struct i915_address_space 
>> *vm,
>>       if (vma_res->bi.lmem)
>>           pte_flags |= PTE_LM;
>> -    vm->insert_entries(vm, vma_res, cache_level, pte_flags);
>> +    vm->insert_entries(vm, vma_res, pat_index, pte_flags);
>>       vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE;
>>   }
>> @@ -661,7 +650,7 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>>   static void aliasing_gtt_bind_vma(struct i915_address_space *vm,
>>                     struct i915_vm_pt_stash *stash,
>>                     struct i915_vma_resource *vma_res,
>> -                  enum i915_cache_level cache_level,
>> +                  unsigned int pat_index,
>>                     u32 flags)
>>   {
>>       u32 pte_flags;
>> @@ -673,10 +662,10 @@ static void aliasing_gtt_bind_vma(struct 
>> i915_address_space *vm,
>>       if (flags & I915_VMA_LOCAL_BIND)
>>           ppgtt_bind_vma(&i915_vm_to_ggtt(vm)->alias->vm,
>> -                   stash, vma_res, cache_level, flags);
>> +                   stash, vma_res, pat_index, flags);
>>       if (flags & I915_VMA_GLOBAL_BIND)
>> -        vm->insert_entries(vm, vma_res, cache_level, pte_flags);
>> +        vm->insert_entries(vm, vma_res, pat_index, pte_flags);
>>       vma_res->bound_flags |= flags;
>>   }
>> @@ -933,7 +922,9 @@ static int ggtt_probe_common(struct i915_ggtt 
>> *ggtt, u64 size)
>>       ggtt->vm.scratch[0]->encode =
>>           ggtt->vm.pte_encode(px_dma(ggtt->vm.scratch[0]),
>> -                    I915_CACHE_NONE, pte_flags);
>> +                    i915_gem_get_pat_index(i915,
>> +                               I915_CACHE_NONE),
>> +                    pte_flags);
>>       return 0;
>>   }
>> @@ -1022,6 +1013,11 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
>>       return ggtt_probe_common(ggtt, size);
>>   }
>> +/*
>> + * For pre-gen8 platforms pat_index is the same as enum 
>> i915_cache_level,
>> + * so these PTE encode functions are left with using cache_level.
>> + * See translation table LEGACY_CACHELEVEL.
>> + */
>>   static u64 snb_pte_encode(dma_addr_t addr,
>>                 enum i915_cache_level level,
>>                 u32 flags)
>> @@ -1302,7 +1298,9 @@ bool i915_ggtt_resume_vm(struct 
>> i915_address_space *vm)
>>            */
>>           vma->resource->bound_flags = 0;
>>           vma->ops->bind_vma(vm, NULL, vma->resource,
>> -                   obj ? obj->cache_level : 0,
>> +                   obj ? obj->pat_index :
>> +                     i915_gem_get_pat_index(vm->i915,
>> +                                I915_CACHE_NONE),
>>                      was_bound);
>>           if (obj) { /* only used during resume => exclusive access */
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h 
>> b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> index 854ec09fd588..be767e13b1e5 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> @@ -165,8 +165,6 @@ typedef u64 gen8_pte_t;
>>   #define MTL_2_COH_1W    REG_FIELD_PREP(MTL_PAT_INDEX_COH_MODE_MASK, 2)
>>   #define MTL_0_COH_NON    REG_FIELD_PREP(MTL_PAT_INDEX_COH_MODE_MASK, 0)
>> -enum i915_cache_level;
>> -
>>   struct drm_i915_gem_object;
>>   struct i915_fence_reg;
>>   struct i915_vma;
>> @@ -234,7 +232,7 @@ struct i915_vma_ops {
>>       void (*bind_vma)(struct i915_address_space *vm,
>>                struct i915_vm_pt_stash *stash,
>>                struct i915_vma_resource *vma_res,
>> -             enum i915_cache_level cache_level,
>> +             unsigned int pat_index,
>>                u32 flags);
>>       /*
>>        * Unmap an object from an address space. This usually consists of
>> @@ -306,7 +304,7 @@ struct i915_address_space {
>>           (*alloc_scratch_dma)(struct i915_address_space *vm, int sz);
>>       u64 (*pte_encode)(dma_addr_t addr,
>> -              enum i915_cache_level level,
>> +              unsigned int pat_index,
>>                 u32 flags); /* Create a valid PTE */
>>   #define PTE_READ_ONLY    BIT(0)
>>   #define PTE_LM        BIT(1)
>> @@ -321,20 +319,20 @@ struct i915_address_space {
>>       void (*insert_page)(struct i915_address_space *vm,
>>                   dma_addr_t addr,
>>                   u64 offset,
>> -                enum i915_cache_level cache_level,
>> +                unsigned int pat_index,
>>                   u32 flags);
>>       void (*insert_entries)(struct i915_address_space *vm,
>>                      struct i915_vma_resource *vma_res,
>> -                   enum i915_cache_level cache_level,
>> +                   unsigned int pat_index,
>>                      u32 flags);
>>       void (*raw_insert_page)(struct i915_address_space *vm,
>>                   dma_addr_t addr,
>>                   u64 offset,
>> -                enum i915_cache_level cache_level,
>> +                unsigned int pat_index,
>>                   u32 flags);
>>       void (*raw_insert_entries)(struct i915_address_space *vm,
>>                      struct i915_vma_resource *vma_res,
>> -                   enum i915_cache_level cache_level,
>> +                   unsigned int pat_index,
>>                      u32 flags);
>>       void (*cleanup)(struct i915_address_space *vm);
>> @@ -581,7 +579,7 @@ void ppgtt_init(struct i915_ppgtt *ppgtt, struct 
>> intel_gt *gt,
>>   void intel_ggtt_bind_vma(struct i915_address_space *vm,
>>                struct i915_vm_pt_stash *stash,
>>                struct i915_vma_resource *vma_res,
>> -             enum i915_cache_level cache_level,
>> +             unsigned int pat_index,
>>                u32 flags);
>>   void intel_ggtt_unbind_vma(struct i915_address_space *vm,
>>                  struct i915_vma_resource *vma_res);
>> @@ -639,7 +637,7 @@ void
>>   __set_pd_entry(struct i915_page_directory * const pd,
>>              const unsigned short idx,
>>              struct i915_page_table *pt,
>> -           u64 (*encode)(const dma_addr_t, const enum 
>> i915_cache_level));
>> +           u64 (*encode)(const dma_addr_t, const unsigned int 
>> pat_index));
>>   #define set_pd_entry(pd, idx, to) \
>>       __set_pd_entry((pd), (idx), px_pt(to), gen8_pde_encode)
>> @@ -659,7 +657,7 @@ void gen6_ggtt_invalidate(struct i915_ggtt *ggtt);
>>   void ppgtt_bind_vma(struct i915_address_space *vm,
>>               struct i915_vm_pt_stash *stash,
>>               struct i915_vma_resource *vma_res,
>> -            enum i915_cache_level cache_level,
>> +            unsigned int pat_index,
>>               u32 flags);
>>   void ppgtt_unbind_vma(struct i915_address_space *vm,
>>                 struct i915_vma_resource *vma_res);
>> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c 
>> b/drivers/gpu/drm/i915/gt/intel_migrate.c
>> index 3f638f198796..117c3d05af3e 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
>> @@ -45,7 +45,9 @@ static void xehpsdv_toggle_pdes(struct 
>> i915_address_space *vm,
>>        * Insert a dummy PTE into every PT that will map to LMEM to ensure
>>        * we have a correctly setup PDE structure for later use.
>>        */
>> -    vm->insert_page(vm, 0, d->offset, I915_CACHE_NONE, PTE_LM);
>> +    vm->insert_page(vm, 0, d->offset,
>> +            i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
>> +            PTE_LM);
>>       GEM_BUG_ON(!pt->is_compact);
>>       d->offset += SZ_2M;
>>   }
>> @@ -63,7 +65,9 @@ static void xehpsdv_insert_pte(struct 
>> i915_address_space *vm,
>>        * alignment is 64K underneath for the pt, and we are careful
>>        * not to access the space in the void.
>>        */
>> -    vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE, PTE_LM);
>> +    vm->insert_page(vm, px_dma(pt), d->offset,
>> +            i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
>> +            PTE_LM);
>>       d->offset += SZ_64K;
>>   }
>> @@ -73,7 +77,8 @@ static void insert_pte(struct i915_address_space *vm,
>>   {
>>       struct insert_pte_data *d = data;
>> -    vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE,
>> +    vm->insert_page(vm, px_dma(pt), d->offset,
>> +            i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
>>               i915_gem_object_is_lmem(pt->base) ? PTE_LM : 0);
>>       d->offset += PAGE_SIZE;
>>   }
>> @@ -356,13 +361,13 @@ static int max_pte_pkt_size(struct i915_request 
>> *rq, int pkt)
>>   static int emit_pte(struct i915_request *rq,
>>               struct sgt_dma *it,
>> -            enum i915_cache_level cache_level,
>> +            unsigned int pat_index,
>>               bool is_lmem,
>>               u64 offset,
>>               int length)
>>   {
>>       bool has_64K_pages = HAS_64K_PAGES(rq->engine->i915);
>> -    const u64 encode = rq->context->vm->pte_encode(0, cache_level,
>> +    const u64 encode = rq->context->vm->pte_encode(0, pat_index,
>>                                  is_lmem ? PTE_LM : 0);
>>       struct intel_ring *ring = rq->ring;
>>       int pkt, dword_length;
>> @@ -673,17 +678,17 @@ int
>>   intel_context_migrate_copy(struct intel_context *ce,
>>                  const struct i915_deps *deps,
>>                  struct scatterlist *src,
>> -               enum i915_cache_level src_cache_level,
>> +               unsigned int src_pat_index,
>>                  bool src_is_lmem,
>>                  struct scatterlist *dst,
>> -               enum i915_cache_level dst_cache_level,
>> +               unsigned int dst_pat_index,
>>                  bool dst_is_lmem,
>>                  struct i915_request **out)
>>   {
>>       struct sgt_dma it_src = sg_sgt(src), it_dst = sg_sgt(dst), it_ccs;
>>       struct drm_i915_private *i915 = ce->engine->i915;
>>       u64 ccs_bytes_to_cpy = 0, bytes_to_cpy;
>> -    enum i915_cache_level ccs_cache_level;
>> +    unsigned int ccs_pat_index;
>>       u32 src_offset, dst_offset;
>>       u8 src_access, dst_access;
>>       struct i915_request *rq;
>> @@ -707,12 +712,12 @@ intel_context_migrate_copy(struct intel_context 
>> *ce,
>>           dst_sz = scatter_list_length(dst);
>>           if (src_is_lmem) {
>>               it_ccs = it_dst;
>> -            ccs_cache_level = dst_cache_level;
>> +            ccs_pat_index = dst_pat_index;
>>               ccs_is_src = false;
>>           } else if (dst_is_lmem) {
>>               bytes_to_cpy = dst_sz;
>>               it_ccs = it_src;
>> -            ccs_cache_level = src_cache_level;
>> +            ccs_pat_index = src_pat_index;
>>               ccs_is_src = true;
>>           }
>> @@ -773,7 +778,7 @@ intel_context_migrate_copy(struct intel_context *ce,
>>           src_sz = calculate_chunk_sz(i915, src_is_lmem,
>>                           bytes_to_cpy, ccs_bytes_to_cpy);
>> -        len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem,
>> +        len = emit_pte(rq, &it_src, src_pat_index, src_is_lmem,
>>                      src_offset, src_sz);
>>           if (!len) {
>>               err = -EINVAL;
>> @@ -784,7 +789,7 @@ intel_context_migrate_copy(struct intel_context *ce,
>>               goto out_rq;
>>           }
>> -        err = emit_pte(rq, &it_dst, dst_cache_level, dst_is_lmem,
>> +        err = emit_pte(rq, &it_dst, dst_pat_index, dst_is_lmem,
>>                      dst_offset, len);
>>           if (err < 0)
>>               goto out_rq;
>> @@ -811,7 +816,7 @@ intel_context_migrate_copy(struct intel_context *ce,
>>                   goto out_rq;
>>               ccs_sz = GET_CCS_BYTES(i915, len);
>> -            err = emit_pte(rq, &it_ccs, ccs_cache_level, false,
>> +            err = emit_pte(rq, &it_ccs, ccs_pat_index, false,
>>                          ccs_is_src ? src_offset : dst_offset,
>>                          ccs_sz);
>>               if (err < 0)
>> @@ -979,7 +984,7 @@ int
>>   intel_context_migrate_clear(struct intel_context *ce,
>>                   const struct i915_deps *deps,
>>                   struct scatterlist *sg,
>> -                enum i915_cache_level cache_level,
>> +                unsigned int pat_index,
>>                   bool is_lmem,
>>                   u32 value,
>>                   struct i915_request **out)
>> @@ -1027,7 +1032,7 @@ intel_context_migrate_clear(struct intel_context 
>> *ce,
>>           if (err)
>>               goto out_rq;
>> -        len = emit_pte(rq, &it, cache_level, is_lmem, offset, CHUNK_SZ);
>> +        len = emit_pte(rq, &it, pat_index, is_lmem, offset, CHUNK_SZ);
>>           if (len <= 0) {
>>               err = len;
>>               goto out_rq;
>> @@ -1074,10 +1079,10 @@ int intel_migrate_copy(struct intel_migrate *m,
>>                  struct i915_gem_ww_ctx *ww,
>>                  const struct i915_deps *deps,
>>                  struct scatterlist *src,
>> -               enum i915_cache_level src_cache_level,
>> +               unsigned int src_pat_index,
>>                  bool src_is_lmem,
>>                  struct scatterlist *dst,
>> -               enum i915_cache_level dst_cache_level,
>> +               unsigned int dst_pat_index,
>>                  bool dst_is_lmem,
>>                  struct i915_request **out)
>>   {
>> @@ -1098,8 +1103,8 @@ int intel_migrate_copy(struct intel_migrate *m,
>>           goto out;
>>       err = intel_context_migrate_copy(ce, deps,
>> -                     src, src_cache_level, src_is_lmem,
>> -                     dst, dst_cache_level, dst_is_lmem,
>> +                     src, src_pat_index, src_is_lmem,
>> +                     dst, dst_pat_index, dst_is_lmem,
>>                        out);
>>       intel_context_unpin(ce);
>> @@ -1113,7 +1118,7 @@ intel_migrate_clear(struct intel_migrate *m,
>>               struct i915_gem_ww_ctx *ww,
>>               const struct i915_deps *deps,
>>               struct scatterlist *sg,
>> -            enum i915_cache_level cache_level,
>> +            unsigned int pat_index,
>>               bool is_lmem,
>>               u32 value,
>>               struct i915_request **out)
>> @@ -1134,7 +1139,7 @@ intel_migrate_clear(struct intel_migrate *m,
>>       if (err)
>>           goto out;
>> -    err = intel_context_migrate_clear(ce, deps, sg, cache_level,
>> +    err = intel_context_migrate_clear(ce, deps, sg, pat_index,
>>                         is_lmem, value, out);
>>       intel_context_unpin(ce);
>> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.h 
>> b/drivers/gpu/drm/i915/gt/intel_migrate.h
>> index ccc677ec4aa3..11fc09a00c4b 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_migrate.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.h
>> @@ -16,7 +16,6 @@ struct i915_request;
>>   struct i915_gem_ww_ctx;
>>   struct intel_gt;
>>   struct scatterlist;
>> -enum i915_cache_level;
>>   int intel_migrate_init(struct intel_migrate *m, struct intel_gt *gt);
>> @@ -26,20 +25,20 @@ int intel_migrate_copy(struct intel_migrate *m,
>>                  struct i915_gem_ww_ctx *ww,
>>                  const struct i915_deps *deps,
>>                  struct scatterlist *src,
>> -               enum i915_cache_level src_cache_level,
>> +               unsigned int src_pat_index,
>>                  bool src_is_lmem,
>>                  struct scatterlist *dst,
>> -               enum i915_cache_level dst_cache_level,
>> +               unsigned int dst_pat_index,
>>                  bool dst_is_lmem,
>>                  struct i915_request **out);
>>   int intel_context_migrate_copy(struct intel_context *ce,
>>                      const struct i915_deps *deps,
>>                      struct scatterlist *src,
>> -                   enum i915_cache_level src_cache_level,
>> +                   unsigned int src_pat_index,
>>                      bool src_is_lmem,
>>                      struct scatterlist *dst,
>> -                   enum i915_cache_level dst_cache_level,
>> +                   unsigned int dst_pat_index,
>>                      bool dst_is_lmem,
>>                      struct i915_request **out);
>> @@ -48,7 +47,7 @@ intel_migrate_clear(struct intel_migrate *m,
>>               struct i915_gem_ww_ctx *ww,
>>               const struct i915_deps *deps,
>>               struct scatterlist *sg,
>> -            enum i915_cache_level cache_level,
>> +            unsigned int pat_index,
>>               bool is_lmem,
>>               u32 value,
>>               struct i915_request **out);
>> @@ -56,7 +55,7 @@ int
>>   intel_context_migrate_clear(struct intel_context *ce,
>>                   const struct i915_deps *deps,
>>                   struct scatterlist *sg,
>> -                enum i915_cache_level cache_level,
>> +                unsigned int pat_index,
>>                   bool is_lmem,
>>                   u32 value,
>>                   struct i915_request **out);
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c 
>> b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>> index 7ecfa672f738..f0da3555c6db 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>> @@ -98,7 +98,7 @@ void
>>   __set_pd_entry(struct i915_page_directory * const pd,
>>              const unsigned short idx,
>>              struct i915_page_table * const to,
>> -           u64 (*encode)(const dma_addr_t, const enum i915_cache_level))
>> +           u64 (*encode)(const dma_addr_t, const unsigned int))
>>   {
>>       /* Each thread pre-pins the pd, and we may have a thread per 
>> pde. */
>>       GEM_BUG_ON(atomic_read(px_used(pd)) > NALLOC * I915_PDES);
>> @@ -181,7 +181,7 @@ struct i915_ppgtt *i915_ppgtt_create(struct 
>> intel_gt *gt,
>>   void ppgtt_bind_vma(struct i915_address_space *vm,
>>               struct i915_vm_pt_stash *stash,
>>               struct i915_vma_resource *vma_res,
>> -            enum i915_cache_level cache_level,
>> +            unsigned int pat_index,
>>               u32 flags)
>>   {
>>       u32 pte_flags;
>> @@ -199,7 +199,7 @@ void ppgtt_bind_vma(struct i915_address_space *vm,
>>       if (vma_res->bi.lmem)
>>           pte_flags |= PTE_LM;
>> -    vm->insert_entries(vm, vma_res, cache_level, pte_flags);
>> +    vm->insert_entries(vm, vma_res, pat_index, pte_flags);
>>       wmb();
>>   }
>> diff --git a/drivers/gpu/drm/i915/gt/selftest_migrate.c 
>> b/drivers/gpu/drm/i915/gt/selftest_migrate.c
>> index e677f2da093d..3def5ca72dec 100644
>> --- a/drivers/gpu/drm/i915/gt/selftest_migrate.c
>> +++ b/drivers/gpu/drm/i915/gt/selftest_migrate.c
>> @@ -137,7 +137,7 @@ static int copy(struct intel_migrate *migrate,
>>   static int intel_context_copy_ccs(struct intel_context *ce,
>>                     const struct i915_deps *deps,
>>                     struct scatterlist *sg,
>> -                  enum i915_cache_level cache_level,
>> +                  unsigned int pat_index,
>>                     bool write_to_ccs,
>>                     struct i915_request **out)
>>   {
>> @@ -185,7 +185,7 @@ static int intel_context_copy_ccs(struct 
>> intel_context *ce,
>>           if (err)
>>               goto out_rq;
>> -        len = emit_pte(rq, &it, cache_level, true, offset, CHUNK_SZ);
>> +        len = emit_pte(rq, &it, pat_index, true, offset, CHUNK_SZ);
>>           if (len <= 0) {
>>               err = len;
>>               goto out_rq;
>> @@ -223,7 +223,7 @@ intel_migrate_ccs_copy(struct intel_migrate *m,
>>                  struct i915_gem_ww_ctx *ww,
>>                  const struct i915_deps *deps,
>>                  struct scatterlist *sg,
>> -               enum i915_cache_level cache_level,
>> +               unsigned int pat_index,
>>                  bool write_to_ccs,
>>                  struct i915_request **out)
>>   {
>> @@ -243,7 +243,7 @@ intel_migrate_ccs_copy(struct intel_migrate *m,
>>       if (err)
>>           goto out;
>> -    err = intel_context_copy_ccs(ce, deps, sg, cache_level,
>> +    err = intel_context_copy_ccs(ce, deps, sg, pat_index,
>>                        write_to_ccs, out);
>>       intel_context_unpin(ce);
>> @@ -300,7 +300,7 @@ static int clear(struct intel_migrate *migrate,
>>               /* Write the obj data into ccs surface */
>>               err = intel_migrate_ccs_copy(migrate, &ww, NULL,
>>                                obj->mm.pages->sgl,
>> -                             obj->cache_level,
>> +                             obj->pat_index,
>>                                true, &rq);
>>               if (rq && !err) {
>>                   if (i915_request_wait(rq, 0, HZ) < 0) {
>> @@ -351,7 +351,7 @@ static int clear(struct intel_migrate *migrate,
>>               err = intel_migrate_ccs_copy(migrate, &ww, NULL,
>>                                obj->mm.pages->sgl,
>> -                             obj->cache_level,
>> +                             obj->pat_index,
>>                                false, &rq);
>>               if (rq && !err) {
>>                   if (i915_request_wait(rq, 0, HZ) < 0) {
>> @@ -414,9 +414,9 @@ static int __migrate_copy(struct intel_migrate 
>> *migrate,
>>                 struct i915_request **out)
>>   {
>>       return intel_migrate_copy(migrate, ww, NULL,
>> -                  src->mm.pages->sgl, src->cache_level,
>> +                  src->mm.pages->sgl, src->pat_index,
>>                     i915_gem_object_is_lmem(src),
>> -                  dst->mm.pages->sgl, dst->cache_level,
>> +                  dst->mm.pages->sgl, dst->pat_index,
>>                     i915_gem_object_is_lmem(dst),
>>                     out);
>>   }
>> @@ -428,9 +428,9 @@ static int __global_copy(struct intel_migrate 
>> *migrate,
>>                struct i915_request **out)
>>   {
>>       return intel_context_migrate_copy(migrate->context, NULL,
>> -                      src->mm.pages->sgl, src->cache_level,
>> +                      src->mm.pages->sgl, src->pat_index,
>>                         i915_gem_object_is_lmem(src),
>> -                      dst->mm.pages->sgl, dst->cache_level,
>> +                      dst->mm.pages->sgl, dst->pat_index,
>>                         i915_gem_object_is_lmem(dst),
>>                         out);
>>   }
>> @@ -455,7 +455,7 @@ static int __migrate_clear(struct intel_migrate 
>> *migrate,
>>   {
>>       return intel_migrate_clear(migrate, ww, NULL,
>>                      obj->mm.pages->sgl,
>> -                   obj->cache_level,
>> +                   obj->pat_index,
>>                      i915_gem_object_is_lmem(obj),
>>                      value, out);
>>   }
>> @@ -468,7 +468,7 @@ static int __global_clear(struct intel_migrate 
>> *migrate,
>>   {
>>       return intel_context_migrate_clear(migrate->context, NULL,
>>                          obj->mm.pages->sgl,
>> -                       obj->cache_level,
>> +                       obj->pat_index,
>>                          i915_gem_object_is_lmem(obj),
>>                          value, out);
>>   }
>> @@ -648,7 +648,7 @@ static int live_emit_pte_full_ring(void *arg)
>>        */
>>       pr_info("%s emite_pte ring space=%u\n", __func__, rq->ring->space);
>>       it = sg_sgt(obj->mm.pages->sgl);
>> -    len = emit_pte(rq, &it, obj->cache_level, false, 0, CHUNK_SZ);
>> +    len = emit_pte(rq, &it, obj->pat_index, false, 0, CHUNK_SZ);
>>       if (!len) {
>>           err = -EINVAL;
>>           goto out_rq;
>> @@ -844,7 +844,7 @@ static int wrap_ktime_compare(const void *A, const 
>> void *B)
>>   static int __perf_clear_blt(struct intel_context *ce,
>>                   struct scatterlist *sg,
>> -                enum i915_cache_level cache_level,
>> +                unsigned int pat_index,
>>                   bool is_lmem,
>>                   size_t sz)
>>   {
>> @@ -858,7 +858,7 @@ static int __perf_clear_blt(struct intel_context *ce,
>>           t0 = ktime_get();
>> -        err = intel_context_migrate_clear(ce, NULL, sg, cache_level,
>> +        err = intel_context_migrate_clear(ce, NULL, sg, pat_index,
>>                             is_lmem, 0, &rq);
>>           if (rq) {
>>               if (i915_request_wait(rq, 0, MAX_SCHEDULE_TIMEOUT) < 0)
>> @@ -904,7 +904,8 @@ static int perf_clear_blt(void *arg)
>>           err = __perf_clear_blt(gt->migrate.context,
>>                          dst->mm.pages->sgl,
>> -                       I915_CACHE_NONE,
>> +                       i915_gem_get_pat_index(gt->i915,
>> +                                  I915_CACHE_NONE),
>>                          i915_gem_object_is_lmem(dst),
>>                          sizes[i]);
>> @@ -919,10 +920,10 @@ static int perf_clear_blt(void *arg)
>>   static int __perf_copy_blt(struct intel_context *ce,
>>                  struct scatterlist *src,
>> -               enum i915_cache_level src_cache_level,
>> +               unsigned int src_pat_index,
>>                  bool src_is_lmem,
>>                  struct scatterlist *dst,
>> -               enum i915_cache_level dst_cache_level,
>> +               unsigned int dst_pat_index,
>>                  bool dst_is_lmem,
>>                  size_t sz)
>>   {
>> @@ -937,9 +938,9 @@ static int __perf_copy_blt(struct intel_context *ce,
>>           t0 = ktime_get();
>>           err = intel_context_migrate_copy(ce, NULL,
>> -                         src, src_cache_level,
>> +                         src, src_pat_index,
>>                            src_is_lmem,
>> -                         dst, dst_cache_level,
>> +                         dst, dst_pat_index,
>>                            dst_is_lmem,
>>                            &rq);
>>           if (rq) {
>> @@ -994,10 +995,12 @@ static int perf_copy_blt(void *arg)
>>           err = __perf_copy_blt(gt->migrate.context,
>>                         src->mm.pages->sgl,
>> -                      I915_CACHE_NONE,
>> +                      i915_gem_get_pat_index(gt->i915,
>> +                                 I915_CACHE_NONE),
>>                         i915_gem_object_is_lmem(src),
>>                         dst->mm.pages->sgl,
>> -                      I915_CACHE_NONE,
>> +                      i915_gem_get_pat_index(gt->i915,
>> +                                 I915_CACHE_NONE),
>>                         i915_gem_object_is_lmem(dst),
>>                         sz);
>> diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c 
>> b/drivers/gpu/drm/i915/gt/selftest_reset.c
>> index a9e0a91bc0e0..79aa6ac66ad2 100644
>> --- a/drivers/gpu/drm/i915/gt/selftest_reset.c
>> +++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
>> @@ -86,7 +86,9 @@ __igt_reset_stolen(struct intel_gt *gt,
>>           ggtt->vm.insert_page(&ggtt->vm, dma,
>>                        ggtt->error_capture.start,
>> -                     I915_CACHE_NONE, 0);
>> +                     i915_gem_get_pat_index(gt->i915,
>> +                                I915_CACHE_NONE),
>> +                     0);
>>           mb();
>>           s = io_mapping_map_wc(&ggtt->iomap,
>> @@ -127,7 +129,9 @@ __igt_reset_stolen(struct intel_gt *gt,
>>           ggtt->vm.insert_page(&ggtt->vm, dma,
>>                        ggtt->error_capture.start,
>> -                     I915_CACHE_NONE, 0);
>> +                     i915_gem_get_pat_index(gt->i915,
>> +                                I915_CACHE_NONE),
>> +                     0);
>>           mb();
>>           s = io_mapping_map_wc(&ggtt->iomap,
>> diff --git a/drivers/gpu/drm/i915/gt/selftest_timeline.c 
>> b/drivers/gpu/drm/i915/gt/selftest_timeline.c
>> index 9f536c251179..39c3ec12df1a 100644
>> --- a/drivers/gpu/drm/i915/gt/selftest_timeline.c
>> +++ b/drivers/gpu/drm/i915/gt/selftest_timeline.c
>> @@ -836,7 +836,7 @@ static int setup_watcher(struct hwsp_watcher *w, 
>> struct intel_gt *gt,
>>           return PTR_ERR(obj);
>>       /* keep the same cache settings as timeline */
>> -    i915_gem_object_set_cache_coherency(obj, 
>> tl->hwsp_ggtt->obj->cache_level);
>> +    i915_gem_object_set_pat_index(obj, tl->hwsp_ggtt->obj->pat_index);
>>       w->map = i915_gem_object_pin_map_unlocked(obj,
>>                             
>> page_unmask_bits(tl->hwsp_ggtt->obj->mm.mapping));
>>       if (IS_ERR(w->map)) {
>> diff --git a/drivers/gpu/drm/i915/gt/selftest_tlb.c 
>> b/drivers/gpu/drm/i915/gt/selftest_tlb.c
>> index e6cac1f15d6e..4493c8518e91 100644
>> --- a/drivers/gpu/drm/i915/gt/selftest_tlb.c
>> +++ b/drivers/gpu/drm/i915/gt/selftest_tlb.c
>> @@ -36,6 +36,8 @@ pte_tlbinv(struct intel_context *ce,
>>          u64 length,
>>          struct rnd_state *prng)
>>   {
>> +    const unsigned int pat_index =
>> +        i915_gem_get_pat_index(ce->vm->i915, I915_CACHE_NONE);
>>       struct drm_i915_gem_object *batch;
>>       struct drm_mm_node vb_node;
>>       struct i915_request *rq;
>> @@ -155,7 +157,7 @@ pte_tlbinv(struct intel_context *ce,
>>           /* Flip the PTE between A and B */
>>           if (i915_gem_object_is_lmem(vb->obj))
>>               pte_flags |= PTE_LM;
>> -        ce->vm->insert_entries(ce->vm, &vb_res, 0, pte_flags);
>> +        ce->vm->insert_entries(ce->vm, &vb_res, pat_index, pte_flags);
>>           /* Flush the PTE update to concurrent HW */
>>           tlbinv(ce->vm, addr & -length, length);
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c 
>> b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
>> index a82a53dbbc86..145681ae20a5 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
>> @@ -890,9 +890,15 @@ static void uc_fw_bind_ggtt(struct intel_uc_fw 
>> *uc_fw)
>>           pte_flags |= PTE_LM;
>>       if (ggtt->vm.raw_insert_entries)
>> -        ggtt->vm.raw_insert_entries(&ggtt->vm, dummy, 
>> I915_CACHE_NONE, pte_flags);
>> +        ggtt->vm.raw_insert_entries(&ggtt->vm, dummy,
>> +                        i915_gem_get_pat_index(ggtt->vm.i915,
>> +                                   I915_CACHE_NONE),
>> +                        pte_flags);
>>       else
>> -        ggtt->vm.insert_entries(&ggtt->vm, dummy, I915_CACHE_NONE, 
>> pte_flags);
>> +        ggtt->vm.insert_entries(&ggtt->vm, dummy,
>> +                    i915_gem_get_pat_index(ggtt->vm.i915,
>> +                                   I915_CACHE_NONE),
>> +                    pte_flags);
>>   }
>>   static void uc_fw_unbind_ggtt(struct intel_uc_fw *uc_fw)
>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
>> b/drivers/gpu/drm/i915/i915_debugfs.c
>> index 41389a32e998..9a4922da3a71 100644
>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>> @@ -139,21 +139,56 @@ static const char *stringify_vma_type(const 
>> struct i915_vma *vma)
>>       return "ppgtt";
>>   }
>> -static const char *i915_cache_level_str(struct drm_i915_private 
>> *i915, int type)
>> -{
>> -    switch (type) {
>> -    case I915_CACHE_NONE: return " uncached";
>> -    case I915_CACHE_LLC: return HAS_LLC(i915) ? " LLC" : " snooped";
>> -    case I915_CACHE_L3_LLC: return " L3+LLC";
>> -    case I915_CACHE_WT: return " WT";
>> -    default: return "";
>> +static const char *i915_cache_level_str(struct drm_i915_gem_object *obj)
>> +{
>> +    struct drm_i915_private *i915 = obj_to_i915(obj);
>> +
>> +    if (IS_METEORLAKE(i915)) {
>> +        switch (obj->pat_index) {
>> +        case 0: return " WB";
>> +        case 1: return " WT";
>> +        case 2: return " UC";
>> +        case 3: return " WB (1-Way Coh)";
>> +        case 4: return " WB (2-Way Coh)";
>> +        default: return " not defined";

Is "not defined" possible?

Also, it may be nicer to handle the leading space in the caller.
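
Ie. return plain names from the helper and let the caller do something like (sketch):

	seq_printf(m, " %s", i915_cache_level_str(obj));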

>> +        }
>> +    } else if (IS_PONTEVECCHIO(i915)) {
>> +        switch (obj->pat_index) {
>> +        case 0: return " UC";
>> +        case 1: return " WC";
>> +        case 2: return " WT";
>> +        case 3: return " WB";
>> +        case 4: return " WT (CLOS1)";
>> +        case 5: return " WB (CLOS1)";
>> +        case 6: return " WT (CLOS2)";
>> +        case 7: return " WT (CLOS2)";
>> +        default: return " not defined";
>> +        }
>> +    } else if (GRAPHICS_VER(i915) >= 12) {
>> +        switch (obj->pat_index) {
>> +        case 0: return " WB";
>> +        case 1: return " WC";
>> +        case 2: return " WT";
>> +        case 3: return " UC";
>> +        default: return " not defined";
>> +        }
>> +    } else {

Is this correct if a legacy platform used the set_pat extension? I don't see that it is disallowed.

Would it simplify things to add a reverse table to device info, like cachelevel_to_pat but for pat_index to names? I guess it depends on what names the PRMs use for PATs on legacy platforms. Is it consistent with the above UC/WC/WB/... or with the below names?
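
As a sketch of what I mean (table and field names hypothetical, strings to be checked against the PRMs):

	/* In intel_device_info, next to cachelevel_to_pat: */
	const char *pat_to_name[16];

	/* i915_pci.c, per platform, mirroring the PVC_CACHELEVEL style: */
	#define MTL_PAT_NAMES \
		.pat_to_name = { \
			[0] = "WB", \
			[1] = "WT", \
			[2] = "UC", \
			[3] = "WB (1-Way Coh)", \
			[4] = "WB (2-Way Coh)", \
		}

Then i915_cache_level_str() collapses into a table lookup with a single fallback.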

>> +        if (i915_gem_object_has_cache_level(obj, I915_CACHE_NONE))
>> +            return " uncached";

UC for consistency?

>> +        else if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC))
>> +            return HAS_LLC(i915) ? " LLC" : " snooped";
>> +        else if (i915_gem_object_has_cache_level(obj, 
>> I915_CACHE_L3_LLC))
>> +            return " L3+LLC";

Is this correct if !HAS_LLC?

>> +        else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
>> +            return " WT";
>> +        else
>> +            return " not defined";

Current code prints nothing in the default case.

But is this even reachable, or should it be a MISSING_CASE warning?
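
Ie. perhaps simply:

	default:
		MISSING_CASE(obj->pat_index);
		return "";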

>>       }
>>   }
>>   void
>>   i915_debugfs_describe_obj(struct seq_file *m, struct 
>> drm_i915_gem_object *obj)
>>   {
>> -    struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
>>       struct i915_vma *vma;
>>       int pin_count = 0;
>> @@ -165,7 +200,7 @@ i915_debugfs_describe_obj(struct seq_file *m, 
>> struct drm_i915_gem_object *obj)
>>              obj->base.size / 1024,
>>              obj->read_domains,
>>              obj->write_domain,
>> -           i915_cache_level_str(dev_priv, obj->cache_level),
>> +           i915_cache_level_str(obj),
>>              obj->mm.dirty ? " dirty" : "",
>>              obj->mm.madv == I915_MADV_DONTNEED ? " purgeable" : "");
>>       if (obj->base.name)
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c 
>> b/drivers/gpu/drm/i915/i915_gem.c
>> index 0a78bdbd36b1..63207b0740b3 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -420,8 +420,12 @@ i915_gem_gtt_pread(struct drm_i915_gem_object *obj,
>>           page_length = remain < page_length ? remain : page_length;
>>           if (drm_mm_node_allocated(&node)) {
>>               ggtt->vm.insert_page(&ggtt->vm,
>> -                         i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
>> -                         node.start, I915_CACHE_NONE, 0);
>> +                    i915_gem_object_get_dma_address(obj,
>> +                                    offset >> PAGE_SHIFT),
>> +                    node.start,
>> +                    i915_gem_get_pat_index(i915,
>> +                                   I915_CACHE_NONE),

For the call sites which use const levels you could at least do something
like i915->pat_cache_none, or the (I know, not very popular) static inline
i915_gem_get_pat_index so it can be cheaply evaluated at runtime. Not sure
really, just throwing out ideas which may be invalid if a more elegant
refactoring is possible.
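
For illustration, roughly (the i915->pat_uc field name is invented here,
not something the series adds):

  /* cached once at driver init, after the platform PAT table is known */
  i915->pat_uc = i915_gem_get_pat_index(i915, I915_CACHE_NONE);

  /* and then at the hot call sites */
  ggtt->vm.insert_page(&ggtt->vm,
                       i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
                       node.start, i915->pat_uc, 0);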

Regards,

Tvrtko

>> +                    0);
>>           } else {
>>               page_base += offset & PAGE_MASK;
>>           }
>> @@ -598,8 +602,12 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
>>               /* flush the write before we modify the GGTT */
>>               intel_gt_flush_ggtt_writes(ggtt->vm.gt);
>>               ggtt->vm.insert_page(&ggtt->vm,
>> -                         i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
>> -                         node.start, I915_CACHE_NONE, 0);
>> +                    i915_gem_object_get_dma_address(obj,
>> +                                    offset >> PAGE_SHIFT),
>> +                    node.start,
>> +                    i915_gem_get_pat_index(i915,
>> +                                   I915_CACHE_NONE),
>> +                    0);
>>               wmb(); /* flush modifications to the GGTT (insert_page) */
>>           } else {
>>               page_base += offset & PAGE_MASK;
>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
>> index f020c0086fbc..2556cabea02c 100644
>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>> @@ -1117,10 +1117,14 @@ i915_vma_coredump_create(const struct intel_gt *gt,
>>               mutex_lock(&ggtt->error_mutex);
>>               if (ggtt->vm.raw_insert_page)
>>                   ggtt->vm.raw_insert_page(&ggtt->vm, dma, slot,
>> -                             I915_CACHE_NONE, 0);
>> +                        i915_gem_get_pat_index(gt->i915,
>> +                                       I915_CACHE_NONE),
>> +                        0);
>>               else
>>                   ggtt->vm.insert_page(&ggtt->vm, dma, slot,
>> -                             I915_CACHE_NONE, 0);
>> +                        i915_gem_get_pat_index(gt->i915,
>> +                                       I915_CACHE_NONE),
>> +                        0);
>>               mb();
>>               s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);
>> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
>> index 20a44788999e..a814775a363d 100644
>> --- a/drivers/gpu/drm/i915/i915_vma.c
>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>> @@ -315,7 +315,7 @@ struct i915_vma_work {
>>       struct i915_vma_resource *vma_res;
>>       struct drm_i915_gem_object *obj;
>>       struct i915_sw_dma_fence_cb cb;
>> -    enum i915_cache_level cache_level;
>> +    unsigned int pat_index;
>>       unsigned int flags;
>>   };
>> @@ -334,7 +334,7 @@ static void __vma_bind(struct dma_fence_work *work)
>>           return;
>>       vma_res->ops->bind_vma(vma_res->vm, &vw->stash,
>> -                   vma_res, vw->cache_level, vw->flags);
>> +                   vma_res, vw->pat_index, vw->flags);
>>   }
>>   static void __vma_release(struct dma_fence_work *work)
>> @@ -426,7 +426,7 @@ i915_vma_resource_init_from_vma(struct i915_vma_resource *vma_res,
>>   /**
>>    * i915_vma_bind - Sets up PTEs for a VMA in its corresponding address space.
>>    * @vma: VMA to map
>> - * @cache_level: mapping cache level
>> + * @pat_index: PAT index to set in PTE
>>    * @flags: flags like global or local mapping
>>    * @work: preallocated worker for allocating and binding the PTE
>>    * @vma_res: pointer to a preallocated vma resource. The resource is either
>> @@ -437,7 +437,7 @@ i915_vma_resource_init_from_vma(struct i915_vma_resource *vma_res,
>>    * Note that DMA addresses are also the only part of the SG table we care about.
>>    */
>>   int i915_vma_bind(struct i915_vma *vma,
>> -          enum i915_cache_level cache_level,
>> +          unsigned int pat_index,
>>             u32 flags,
>>             struct i915_vma_work *work,
>>             struct i915_vma_resource *vma_res)
>> @@ -507,7 +507,7 @@ int i915_vma_bind(struct i915_vma *vma,
>>           struct dma_fence *prev;
>>           work->vma_res = i915_vma_resource_get(vma->resource);
>> -        work->cache_level = cache_level;
>> +        work->pat_index = pat_index;
>>           work->flags = bind_flags;
>>           /*
>> @@ -537,7 +537,7 @@ int i915_vma_bind(struct i915_vma *vma,
>>               return ret;
>>           }
>> -        vma->ops->bind_vma(vma->vm, NULL, vma->resource, cache_level,
>> +        vma->ops->bind_vma(vma->vm, NULL, vma->resource, pat_index,
>>                      bind_flags);
>>       }
>> @@ -814,7 +814,7 @@ i915_vma_insert(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
>>       color = 0;
>>       if (i915_vm_has_cache_coloring(vma->vm))
>> -        color = vma->obj->cache_level;
>> +        color = vma->obj->pat_index;
>>       if (flags & PIN_OFFSET_FIXED) {
>>           u64 offset = flags & PIN_OFFSET_MASK;
>> @@ -1518,7 +1518,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
>>       GEM_BUG_ON(!vma->pages);
>>       err = i915_vma_bind(vma,
>> -                vma->obj->cache_level,
>> +                vma->obj->pat_index,
>>                   flags, work, vma_res);
>>       vma_res = NULL;
>>       if (err)
>> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
>> index ed5c9d682a1b..31a8f8aa5558 100644
>> --- a/drivers/gpu/drm/i915/i915_vma.h
>> +++ b/drivers/gpu/drm/i915/i915_vma.h
>> @@ -250,7 +250,7 @@ i915_vma_compare(struct i915_vma *vma,
>>   struct i915_vma_work *i915_vma_work(void);
>>   int i915_vma_bind(struct i915_vma *vma,
>> -          enum i915_cache_level cache_level,
>> +          unsigned int pat_index,
>>             u32 flags,
>>             struct i915_vma_work *work,
>>             struct i915_vma_resource *vma_res);
>> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
>> index 77fda2244d16..64472b7f0e77 100644
>> --- a/drivers/gpu/drm/i915/i915_vma_types.h
>> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
>> @@ -32,8 +32,6 @@
>>   #include "gem/i915_gem_object_types.h"
>> -enum i915_cache_level;
>> -
>>   /**
>>    * DOC: Global GTT views
>>    *
>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem.c b/drivers/gpu/drm/i915/selftests/i915_gem.c
>> index d91d0ade8abd..61da4ed9d521 100644
>> --- a/drivers/gpu/drm/i915/selftests/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem.c
>> @@ -57,7 +57,10 @@ static void trash_stolen(struct drm_i915_private *i915)
>>           u32 __iomem *s;
>>           int x;
>> -        ggtt->vm.insert_page(&ggtt->vm, dma, slot, I915_CACHE_NONE, 0);
>> +        ggtt->vm.insert_page(&ggtt->vm, dma, slot,
>> +                     i915_gem_get_pat_index(i915,
>> +                                I915_CACHE_NONE),
>> +                     0);
>>           s = io_mapping_map_atomic_wc(&ggtt->iomap, slot);
>>           for (x = 0; x < PAGE_SIZE / sizeof(u32); x++) {
>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>> index 37068542aafe..f13a4d265814 100644
>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>> @@ -245,7 +245,7 @@ static int igt_evict_for_cache_color(void *arg)
>>       struct drm_mm_node target = {
>>           .start = I915_GTT_PAGE_SIZE * 2,
>>           .size = I915_GTT_PAGE_SIZE,
>> -        .color = I915_CACHE_LLC,
>> +        .color = i915_gem_get_pat_index(gt->i915, I915_CACHE_LLC),
>>       };
>>       struct drm_i915_gem_object *obj;
>>       struct i915_vma *vma;
>> @@ -308,7 +308,7 @@ static int igt_evict_for_cache_color(void *arg)
>>       /* Attempt to remove the first *pinned* vma, by removing the (empty)
>>        * neighbour -- this should fail.
>>        */
>> -    target.color = I915_CACHE_L3_LLC;
>> +    target.color = i915_gem_get_pat_index(gt->i915, I915_CACHE_L3_LLC);
>>       mutex_lock(&ggtt->vm.mutex);
>>       err = i915_gem_evict_for_node(&ggtt->vm, NULL, &target, 0);
>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>> index 154801f1c468..36940ef10108 100644
>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>> @@ -135,7 +135,7 @@ fake_dma_object(struct drm_i915_private *i915, u64 size)
>>       obj->write_domain = I915_GEM_DOMAIN_CPU;
>>       obj->read_domains = I915_GEM_DOMAIN_CPU;
>> -    obj->cache_level = I915_CACHE_NONE;
>> +    obj->pat_index = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
>>       /* Preallocate the "backing storage" */
>>       if (i915_gem_object_pin_pages_unlocked(obj))
>> @@ -359,7 +359,9 @@ static int lowlevel_hole(struct i915_address_space *vm,
>>               with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
>>                 vm->insert_entries(vm, mock_vma_res,
>> -                           I915_CACHE_NONE, 0);
>> +                         i915_gem_get_pat_index(vm->i915,
>> +                                    I915_CACHE_NONE),
>> +                         0);
>>           }
>>           count = n;
>> @@ -1377,7 +1379,10 @@ static int igt_ggtt_page(void *arg)
>>           ggtt->vm.insert_page(&ggtt->vm,
>>                        i915_gem_object_get_dma_address(obj, 0),
>> -                     offset, I915_CACHE_NONE, 0);
>> +                     offset,
>> +                     i915_gem_get_pat_index(i915,
>> +                                I915_CACHE_NONE),
>> +                     0);
>>       }
>>       order = i915_random_order(count, &prng);
>> @@ -1510,7 +1515,7 @@ static int reserve_gtt_with_resource(struct i915_vma *vma, u64 offset)
>>       mutex_lock(&vm->mutex);
>>       err = i915_gem_gtt_reserve(vm, NULL, &vma->node, obj->base.size,
>>                      offset,
>> -                   obj->cache_level,
>> +                   obj->pat_index,
>>                      0);
>>       if (!err) {
>>           i915_vma_resource_init_from_vma(vma_res, vma);
>> @@ -1690,7 +1695,7 @@ static int insert_gtt_with_resource(struct i915_vma *vma)
>>       mutex_lock(&vm->mutex);
>>       err = i915_gem_gtt_insert(vm, NULL, &vma->node, obj->base.size, 0,
>> -                  obj->cache_level, 0, vm->total, 0);
>> +                  obj->pat_index, 0, vm->total, 0);
>>       if (!err) {
>>           i915_vma_resource_init_from_vma(vma_res, vma);
>>           vma->resource = vma_res;
>> diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
>> index 3b18e5905c86..d985d9bae2e8 100644
>> --- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
>> +++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
>> @@ -1070,7 +1070,9 @@ static int igt_lmem_write_cpu(void *arg)
>>       /* Put the pages into a known state -- from the gpu for added fun */
>>       intel_engine_pm_get(engine);
>>       err = intel_context_migrate_clear(engine->gt->migrate.context, NULL,
>> -                      obj->mm.pages->sgl, I915_CACHE_NONE,
>> +                      obj->mm.pages->sgl,
>> +                      i915_gem_get_pat_index(i915,
>> +                                 I915_CACHE_NONE),
>>                         true, 0xdeadbeaf, &rq);
>>       if (rq) {
>>           dma_resv_add_fence(obj->base.resv, &rq->fence,
>> diff --git a/drivers/gpu/drm/i915/selftests/mock_gtt.c b/drivers/gpu/drm/i915/selftests/mock_gtt.c
>> index ece97e4faacb..a516c0aa88fd 100644
>> --- a/drivers/gpu/drm/i915/selftests/mock_gtt.c
>> +++ b/drivers/gpu/drm/i915/selftests/mock_gtt.c
>> @@ -27,21 +27,21 @@
>>   static void mock_insert_page(struct i915_address_space *vm,
>>                    dma_addr_t addr,
>>                    u64 offset,
>> -                 enum i915_cache_level level,
>> +                 unsigned int pat_index,
>>                    u32 flags)
>>   {
>>   }
>>   static void mock_insert_entries(struct i915_address_space *vm,
>>                   struct i915_vma_resource *vma_res,
>> -                enum i915_cache_level level, u32 flags)
>> +                unsigned int pat_index, u32 flags)
>>   {
>>   }
>>   static void mock_bind_ppgtt(struct i915_address_space *vm,
>>                   struct i915_vm_pt_stash *stash,
>>                   struct i915_vma_resource *vma_res,
>> -                enum i915_cache_level cache_level,
>> +                unsigned int pat_index,
>>                   u32 flags)
>>   {
>>       GEM_BUG_ON(flags & I915_VMA_GLOBAL_BIND);
>> @@ -94,7 +94,7 @@ struct i915_ppgtt *mock_ppgtt(struct drm_i915_private *i915, const char *name)
>>   static void mock_bind_ggtt(struct i915_address_space *vm,
>>                  struct i915_vm_pt_stash *stash,
>>                  struct i915_vma_resource *vma_res,
>> -               enum i915_cache_level cache_level,
>> +               unsigned int pat_index,
>>                  u32 flags)
>>   {
>>   }
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Intel-gfx] [PATCH 8/8] drm/i915: Allow user to set cache at BO creation
  2023-04-20 11:39     ` [Intel-gfx] " Andi Shyti
  (?)
@ 2023-04-20 13:06     ` Tvrtko Ursulin
  2023-04-20 16:11       ` Yang, Fei
  -1 siblings, 1 reply; 76+ messages in thread
From: Tvrtko Ursulin @ 2023-04-20 13:06 UTC (permalink / raw)
  To: Andi Shyti, fei.yang
  Cc: Matt Roper, intel-gfx, Chris Wilson, dri-devel, Nirmoy Das


On 20/04/2023 12:39, Andi Shyti wrote:
> Hi Fei,
> 
>> To comply with the design that buffer objects shall have immutable
>> cache settings throughout their life cycle, {set, get}_caching ioctls
>> are no longer supported from MTL onward. With that change caching
>> policy can only be set at object creation time. The current code
>> applies a default (platform dependent) cache setting for all objects.
>> However this is not optimal for performance tuning. The patch extends
>> the existing gem_create uAPI to let user set PAT index for the object
>> at creation time.
>> The new extension is platform independent, so UMDs can switch to using
>> this extension for older platforms as well, while {set, get}_caching are
>> still supported on these legacy platforms for compatibility reasons.
>>
>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> Cc: Andi Shyti <andi.shyti@linux.intel.com>
>> Signed-off-by: Fei Yang <fei.yang@intel.com>
>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> 
> because this is an API change, we need some more information
> here.
> 
> First of all you need to CC the userspace guys that have been
> working on top of your series and get their acks.

Yes, and a link to a Mesa merge request which uses the uapi should be 
included.

IGTs should be ready too before we can merge. I glanced over igt-dev but
did not spot anything.

Regards,

Tvrtko

> 
> I also believe that this series has been tested on a
> separate repository; would you link it in the commit message?
> 
> Thanks,
> Andi

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Intel-gfx] [PATCH 8/8] drm/i915: Allow user to set cache at BO creation
  2023-04-20 13:06     ` Tvrtko Ursulin
@ 2023-04-20 16:11       ` Yang, Fei
  2023-04-20 16:29           ` Andi Shyti
  2023-04-21 20:48           ` Jordan Justen
  0 siblings, 2 replies; 76+ messages in thread
From: Yang, Fei @ 2023-04-20 16:11 UTC (permalink / raw)
  To: Tvrtko Ursulin, Andi Shyti
  Cc: Roper, Matthew D, intel-gfx, Chris Wilson, dri-devel, Das, Nirmoy

> On 20/04/2023 12:39, Andi Shyti wrote:
>> Hi Fei,
>>
>>> To comply with the design that buffer objects shall have immutable
>>> cache settings throughout their life cycle, {set, get}_caching ioctls
>>> are no longer supported from MTL onward. With that change caching
>>> policy can only be set at object creation time. The current code
>>> applies a default (platform dependent) cache setting for all objects.
>>> However this is not optimal for performance tuning. The patch extends
>>> the existing gem_create uAPI to let user set PAT index for the object
>>> at creation time.
>>> The new extension is platform independent, so UMDs can switch to using
>>> this extension for older platforms as well, while {set, get}_caching are
>>> still supported on these legacy platforms for compatibility reasons.
>>>
>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>>> Cc: Matt Roper <matthew.d.roper@intel.com>
>>> Cc: Andi Shyti <andi.shyti@linux.intel.com>
>>> Signed-off-by: Fei Yang <fei.yang@intel.com>
>>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>>
>> because this is an API change, we need some more information
>> here.
>>
>> First of all you need to CC the userspace guys that have been
>> working on top of your series and get their acks.
>
> Yes, and a link to a Mesa merge request which uses the uapi should be
> included.

Working with Mesa team on this, stay tuned.

> IGTs should be ready too before we can merge. I glanced over igt-dev but
> did not spot anything.

There is an IGT patch posted to gfx-internal-devel, titled "test/gem_create: exercise gem_create_ext_set_pat"

> Regards,
>
> Tvrtko
>
>>
>> I also believe that this series has been tested on a
>> separate repository; would you link it in the commit message?
>>
>> Thanks,
>> Andi


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Intel-gfx] [PATCH 8/8] drm/i915: Allow user to set cache at BO creation
  2023-04-20 16:11       ` Yang, Fei
@ 2023-04-20 16:29           ` Andi Shyti
  2023-04-21 20:48           ` Jordan Justen
  1 sibling, 0 replies; 76+ messages in thread
From: Andi Shyti @ 2023-04-20 16:29 UTC (permalink / raw)
  To: Yang, Fei
  Cc: Tvrtko Ursulin, Chris Wilson, intel-gfx, dri-devel, Andi Shyti,
	Roper, Matthew D, Das, Nirmoy

Hi Fei,

> >>> To comply with the design that buffer objects shall have immutable
> >>> cache settings throughout their life cycle, {set, get}_caching ioctls
> >>> are no longer supported from MTL onward. With that change caching
> >>> policy can only be set at object creation time. The current code
> >>> applies a default (platform dependent) cache setting for all objects.
> >>> However this is not optimal for performance tuning. The patch extends
> >>> the existing gem_create uAPI to let user set PAT index for the object
> >>> at creation time.
> >>> The new extension is platform independent, so UMDs can switch to using
> >>> this extension for older platforms as well, while {set, get}_caching are
> >>> still supported on these legacy platforms for compatibility reasons.
> >>>
> >>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> >>> Cc: Matt Roper <matthew.d.roper@intel.com>
> >>> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> >>> Signed-off-by: Fei Yang <fei.yang@intel.com>
> >>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> >>
> >> because this is an API change, we need some more information
> >> here.
> >>
> >> First of all you need to CC the userspace guys that have been
> >> working on top of your series and get their acks.
> >
> > Yes, and a link to a Mesa merge request which uses the uapi should be
> > included.
> 
> Working with Mesa team on this, stay tuned.
> 
> > IGTs should be ready too before we can merge. I glanced over igt-dev but
> > did not spot anything.
> 
> There is an IGT patch posted to gfx-internal-devel, titled "test/gem_create:
> exercise gem_create_ext_set_pat"

Any chance to have it ported upstream? It's OK even if it's not
merged (at least on my side) but some public interface testing is
needed.

If you do post it upstream you could add in the cover letter:

Test-with: <mail-id>

where the mail-id refers to the upstream patch of the test.

Andi

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Intel-gfx] [PATCH 2/8] drm/i915/mtl: Define MOCS and PAT tables for MTL
  2023-04-19 23:00   ` [Intel-gfx] " fei.yang
  (?)
@ 2023-04-20 20:29   ` Matt Roper
  -1 siblings, 0 replies; 76+ messages in thread
From: Matt Roper @ 2023-04-20 20:29 UTC (permalink / raw)
  To: fei.yang
  Cc: intel-gfx, Lucas De Marchi, dri-devel,
	Madhumitha Tolakanahalli Pradeep, Andrzej Hajda, Nirmoy Das

On Wed, Apr 19, 2023 at 04:00:52PM -0700, fei.yang@intel.com wrote:
> From: Madhumitha Tolakanahalli Pradeep <madhumitha.tolakanahalli.pradeep@intel.com>
> 
> On MTL, GT can no longer allocate on LLC - only the CPU can.
> This, along with the addition of support for L4 cache, calls for
> a MOCS/PAT table update.
> Also the PAT index registers are multicasted for the primary GT,
> and there is an address jump from index 7 to 8. This patch
> makes sure that these registers are programmed in the proper way.
> 
> BSpec: 44509, 45101, 44235
> 
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> Signed-off-by: Madhumitha Tolakanahalli Pradeep <madhumitha.tolakanahalli.pradeep@intel.com>
> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
> Signed-off-by: Fei Yang <fei.yang@intel.com>
> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_gt_regs.h |  6 +-
>  drivers/gpu/drm/i915/gt/intel_gtt.c     | 47 ++++++++++++++-
>  drivers/gpu/drm/i915/gt/intel_gtt.h     | 20 ++++++-
>  drivers/gpu/drm/i915/gt/intel_mocs.c    | 76 +++++++++++++++++++++++--
>  drivers/gpu/drm/i915/gt/selftest_mocs.c |  2 +-
>  5 files changed, 143 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> index fd1f9cd35e9d..e8c3b762a92a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> @@ -356,7 +356,11 @@
>  #define GEN7_TLB_RD_ADDR			_MMIO(0x4700)
>  
>  #define GEN12_PAT_INDEX(index)			_MMIO(0x4800 + (index) * 4)
> -#define XEHP_PAT_INDEX(index)			MCR_REG(0x4800 + (index) * 4)
> +#define _PAT_INDEX(index)			_PICK_EVEN_2RANGES(index, 8, \
> +								   0x4800, 0x4804, \
> +								   0x4848, 0x484c)
> +#define XEHP_PAT_INDEX(index)			MCR_REG(_PAT_INDEX(index))
> +#define XELPMP_PAT_INDEX(index)			_MMIO(_PAT_INDEX(index))
>  
>  #define XEHP_TILE0_ADDR_RANGE			MCR_REG(0x4900)
>  #define   XEHP_TILE_LMEM_RANGE_SHIFT		8
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index 4f436ba7a3c8..2f6a9be0ffe6 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -468,6 +468,44 @@ void gtt_write_workarounds(struct intel_gt *gt)
>  	}
>  }
>  
> +static void xelpmp_setup_private_ppat(struct intel_uncore *uncore)
> +{
> +	intel_uncore_write(uncore, XELPMP_PAT_INDEX(0),
> +			   MTL_PPAT_L4_0_WB);
> +	intel_uncore_write(uncore, XELPMP_PAT_INDEX(1),
> +			   MTL_PPAT_L4_1_WT);
> +	intel_uncore_write(uncore, XELPMP_PAT_INDEX(2),
> +			   MTL_PPAT_L4_3_UC);
> +	intel_uncore_write(uncore, XELPMP_PAT_INDEX(3),
> +			   MTL_PPAT_L4_0_WB | MTL_2_COH_1W);
> +	intel_uncore_write(uncore, XELPMP_PAT_INDEX(4),
> +			   MTL_PPAT_L4_0_WB | MTL_3_COH_2W);
> +
> +	/*
> +	 * Remaining PAT entries are left at the hardware-default
> +	 * fully-cached setting
> +	 */
> +}
> +
> +static void xelpg_setup_private_ppat(struct intel_gt *gt)
> +{
> +	intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(0),
> +				     MTL_PPAT_L4_0_WB);
> +	intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(1),
> +				     MTL_PPAT_L4_1_WT);
> +	intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(2),
> +				     MTL_PPAT_L4_3_UC);
> +	intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(3),
> +				     MTL_PPAT_L4_0_WB | MTL_2_COH_1W);
> +	intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(4),
> +				     MTL_PPAT_L4_0_WB | MTL_3_COH_2W);
> +
> +	/*
> +	 * Remaining PAT entries are left at the hardware-default
> +	 * fully-cached setting
> +	 */
> +}
> +
>  static void tgl_setup_private_ppat(struct intel_uncore *uncore)
>  {
>  	/* TGL doesn't support LLC or AGE settings */
> @@ -603,7 +641,14 @@ void setup_private_pat(struct intel_gt *gt)
>  
>  	GEM_BUG_ON(GRAPHICS_VER(i915) < 8);
>  
> -	if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
> +	if (gt->type == GT_MEDIA) {
> +		xelpmp_setup_private_ppat(gt->uncore);
> +		return;
> +	}
> +
> +	if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70))
> +		xelpg_setup_private_ppat(gt);
> +	else if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
>  		xehp_setup_private_ppat(gt);
>  	else if (GRAPHICS_VER(i915) >= 12)
>  		tgl_setup_private_ppat(uncore);
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index 69ce55f517f5..854ec09fd588 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -88,9 +88,18 @@ typedef u64 gen8_pte_t;
>  #define BYT_PTE_SNOOPED_BY_CPU_CACHES	REG_BIT(2)
>  #define BYT_PTE_WRITEABLE		REG_BIT(1)
>  
> +#define MTL_PPGTT_PTE_PAT3	BIT_ULL(62)
>  #define GEN12_PPGTT_PTE_LM	BIT_ULL(11)
> +#define GEN12_PPGTT_PTE_PAT2	BIT_ULL(7)
> +#define GEN12_PPGTT_PTE_NC	BIT_ULL(5)
> +#define GEN12_PPGTT_PTE_PAT1	BIT_ULL(4)
> +#define GEN12_PPGTT_PTE_PAT0	BIT_ULL(3)
>  
> -#define GEN12_GGTT_PTE_LM	BIT_ULL(1)
> +#define GEN12_GGTT_PTE_LM		BIT_ULL(1)
> +#define MTL_GGTT_PTE_PAT0		BIT_ULL(52)
> +#define MTL_GGTT_PTE_PAT1		BIT_ULL(53)
> +#define GEN12_GGTT_PTE_ADDR_MASK	GENMASK_ULL(45, 12)
> +#define MTL_GGTT_PTE_PAT_MASK		GENMASK_ULL(53, 52)

All these PTE bits should probably move to the next patch that deals
with PTE encoding.  They aren't related to the table settings being
defined in this patch.

>  
>  #define GEN12_PDE_64K BIT(6)
>  #define GEN12_PTE_PS64 BIT(8)
> @@ -147,6 +156,15 @@ typedef u64 gen8_pte_t;
>  #define GEN8_PDE_IPS_64K BIT(11)
>  #define GEN8_PDE_PS_2M   BIT(7)
>  
> +#define MTL_PPAT_L4_CACHE_POLICY_MASK	REG_GENMASK(3, 2)
> +#define MTL_PAT_INDEX_COH_MODE_MASK	REG_GENMASK(1, 0)
> +#define MTL_PPAT_L4_3_UC	REG_FIELD_PREP(MTL_PPAT_L4_CACHE_POLICY_MASK, 3)
> +#define MTL_PPAT_L4_1_WT	REG_FIELD_PREP(MTL_PPAT_L4_CACHE_POLICY_MASK, 1)
> +#define MTL_PPAT_L4_0_WB	REG_FIELD_PREP(MTL_PPAT_L4_CACHE_POLICY_MASK, 0)
> +#define MTL_3_COH_2W	REG_FIELD_PREP(MTL_PAT_INDEX_COH_MODE_MASK, 3)
> +#define MTL_2_COH_1W	REG_FIELD_PREP(MTL_PAT_INDEX_COH_MODE_MASK, 2)
> +#define MTL_0_COH_NON	REG_FIELD_PREP(MTL_PAT_INDEX_COH_MODE_MASK, 0)

This last definition isn't used or needed.

> +
>  enum i915_cache_level;
>  
>  struct drm_i915_gem_object;
> diff --git a/drivers/gpu/drm/i915/gt/intel_mocs.c b/drivers/gpu/drm/i915/gt/intel_mocs.c
> index 69b489e8dfed..89570f137b2c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_mocs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_mocs.c
> @@ -40,6 +40,10 @@ struct drm_i915_mocs_table {
>  #define LE_COS(value)		((value) << 15)
>  #define LE_SSE(value)		((value) << 17)
>  
> +/* Defines for the tables (GLOB_MOCS_0 - GLOB_MOCS_16) */
> +#define _L4_CACHEABILITY(value)	((value) << 2)
> +#define IG_PAT(value)		((value) << 8)
> +
>  /* Defines for the tables (LNCFMOCS0 - LNCFMOCS31) - two entries per word */
>  #define L3_ESC(value)		((value) << 0)
>  #define L3_SCC(value)		((value) << 1)
> @@ -50,6 +54,7 @@ struct drm_i915_mocs_table {
>  /* Helper defines */
>  #define GEN9_NUM_MOCS_ENTRIES	64  /* 63-64 are reserved, but configured. */
>  #define PVC_NUM_MOCS_ENTRIES	3
> +#define MTL_NUM_MOCS_ENTRIES	16
>  
>  /* (e)LLC caching options */
>  /*
> @@ -73,6 +78,12 @@ struct drm_i915_mocs_table {
>  #define L3_2_RESERVED		_L3_CACHEABILITY(2)
>  #define L3_3_WB			_L3_CACHEABILITY(3)
>  
> +/* L4 caching options */
> +#define L4_0_WB			_L4_CACHEABILITY(0)
> +#define L4_1_WT			_L4_CACHEABILITY(1)
> +#define L4_2_RESERVED		_L4_CACHEABILITY(2)
> +#define L4_3_UC			_L4_CACHEABILITY(3)
> +
>  #define MOCS_ENTRY(__idx, __control_value, __l3cc_value) \
>  	[__idx] = { \
>  		.control_value = __control_value, \
> @@ -416,6 +427,57 @@ static const struct drm_i915_mocs_entry pvc_mocs_table[] = {
>  	MOCS_ENTRY(2, 0, L3_3_WB),
>  };
>  
> +static const struct drm_i915_mocs_entry mtl_mocs_table[] = {
> +	/* Error - Reserved for Non-Use */
> +	MOCS_ENTRY(0,
> +		   IG_PAT(0),
> +		   L3_LKUP(1) | L3_3_WB),
> +	/* Cached - L3 + L4 */
> +	MOCS_ENTRY(1,
> +		   IG_PAT(1),
> +		   L3_LKUP(1) | L3_3_WB),
> +	/* L4 - GO:L3 */
> +	MOCS_ENTRY(2,
> +		   IG_PAT(1),
> +		   L3_LKUP(1) | L3_1_UC),
> +	/* Uncached - GO:L3 */
> +	MOCS_ENTRY(3,
> +		   IG_PAT(1) | L4_3_UC,
> +		   L3_LKUP(1) | L3_1_UC),
> +	/* L4 - GO:Mem */
> +	MOCS_ENTRY(4,
> +		   IG_PAT(1),
> +		   L3_LKUP(1) | L3_GLBGO(1) | L3_1_UC),
> +	/* Uncached - GO:Mem */
> +	MOCS_ENTRY(5,
> +		   IG_PAT(1) | L4_3_UC,
> +		   L3_LKUP(1) | L3_GLBGO(1) | L3_1_UC),
> +	/* L4 - L3:NoLKUP; GO:L3 */
> +	MOCS_ENTRY(6,
> +		   IG_PAT(1),
> +		   L3_1_UC),
> +	/* Uncached - L3:NoLKUP; GO:L3 */
> +	MOCS_ENTRY(7,
> +		   IG_PAT(1) | L4_3_UC,
> +		   L3_1_UC),
> +	/* L4 - L3:NoLKUP; GO:Mem */
> +	MOCS_ENTRY(8,
> +		   IG_PAT(1),
> +		   L3_GLBGO(1) | L3_1_UC),
> +	/* Uncached - L3:NoLKUP; GO:Mem */
> +	MOCS_ENTRY(9,
> +		   IG_PAT(1) | L4_3_UC,
> +		   L3_GLBGO(1) | L3_1_UC),
> +	/* Display - L3; L4:WT */
> +	MOCS_ENTRY(14,
> +		   IG_PAT(1) | L4_1_WT,
> +		   L3_LKUP(1) | L3_3_WB),
> +	/* CCS - Non-Displayable */
> +	MOCS_ENTRY(15,
> +		   IG_PAT(1),
> +		   L3_GLBGO(1) | L3_1_UC),
> +};
> +
>  enum {
>  	HAS_GLOBAL_MOCS = BIT(0),
>  	HAS_ENGINE_MOCS = BIT(1),
> @@ -445,7 +507,13 @@ static unsigned int get_mocs_settings(const struct drm_i915_private *i915,
>  	memset(table, 0, sizeof(struct drm_i915_mocs_table));
>  
>  	table->unused_entries_index = I915_MOCS_PTE;
> -	if (IS_PONTEVECCHIO(i915)) {
> +	if (IS_METEORLAKE(i915)) {
> +		table->size = ARRAY_SIZE(mtl_mocs_table);
> +		table->table = mtl_mocs_table;
> +		table->n_entries = MTL_NUM_MOCS_ENTRIES;
> +		table->uc_index = 9;
> +		table->unused_entries_index = 1;
> +	} else if (IS_PONTEVECCHIO(i915)) {
>  		table->size = ARRAY_SIZE(pvc_mocs_table);
>  		table->table = pvc_mocs_table;
>  		table->n_entries = PVC_NUM_MOCS_ENTRIES;
> @@ -646,9 +714,9 @@ void intel_mocs_init_engine(struct intel_engine_cs *engine)
>  		init_l3cc_table(engine->gt, &table);
>  }
>  
> -static u32 global_mocs_offset(void)
> +static u32 global_mocs_offset(struct intel_gt *gt)
>  {
> -	return i915_mmio_reg_offset(GEN12_GLOBAL_MOCS(0));
> +	return i915_mmio_reg_offset(GEN12_GLOBAL_MOCS(0)) + gt->uncore->gsi_offset;

The main use of this function is as a parameter to __init_mocs_table().
The value ultimately gets used in intel_uncore_write_fw, which will
already apply the GSI offset automatically; the extra addition here is
unnecessary.

It seems this is an attempt to work around the secondary usage in a
selftest (where the value is encoded into a MI_STORE_REGISTER_MEM_GEN8).
Since the selftest is building register offsets into the GPU
instructions it is generating, it would probably make more sense to move
the GSI translation into the selftest itself so that it's clear what's
happening and why.

Fixing the selftest should also probably be done in a separate patch.
We can keep this patch focused on the primary goal of providing the new
tables documented in the bspec.
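
I.e. read_mocs_table() in the selftest would do roughly (untested sketch):

  if (HAS_GLOBAL_MOCS_REGISTERS(rq->engine->i915))
          /*
           * The raw offset gets baked into MI_STORE_REGISTER_MEM by the
           * selftest, so the GSI offset has to be applied by hand here.
           */
          addr = global_mocs_offset() + rq->engine->gt->uncore->gsi_offset;
  else
          addr = mocs_offset(rq->engine);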


Matt

>  }
>  
>  void intel_set_mocs_index(struct intel_gt *gt)
> @@ -671,7 +739,7 @@ void intel_mocs_init(struct intel_gt *gt)
>  	 */
>  	flags = get_mocs_settings(gt->i915, &table);
>  	if (flags & HAS_GLOBAL_MOCS)
> -		__init_mocs_table(gt->uncore, &table, global_mocs_offset());
> +		__init_mocs_table(gt->uncore, &table, global_mocs_offset(gt));
>  
>  	/*
>  	 * Initialize the L3CC table as part of mocs initialization to make
> diff --git a/drivers/gpu/drm/i915/gt/selftest_mocs.c b/drivers/gpu/drm/i915/gt/selftest_mocs.c
> index ca009a6a13bd..730796346514 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_mocs.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_mocs.c
> @@ -137,7 +137,7 @@ static int read_mocs_table(struct i915_request *rq,
>  		return 0;
>  
>  	if (HAS_GLOBAL_MOCS_REGISTERS(rq->engine->i915))
> -		addr = global_mocs_offset();
> +		addr = global_mocs_offset(rq->engine->gt);
>  	else
>  		addr = mocs_offset(rq->engine);
>  
> -- 
> 2.25.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Intel-gfx] [PATCH 7/8] drm/i915: use pat_index instead of cache_level
  2023-04-20 12:39     ` Tvrtko Ursulin
@ 2023-04-20 20:34       ` Yang, Fei
  0 siblings, 0 replies; 76+ messages in thread
From: Yang, Fei @ 2023-04-20 20:34 UTC (permalink / raw)
  To: Tvrtko Ursulin, Hajda, Andrzej, intel-gfx
  Cc: Roper, Matthew D, Chris Wilson, dri-devel

> On 20/04/2023 11:13, Andrzej Hajda wrote:
>> On 20.04.2023 01:00, fei.yang@intel.com wrote:
>>> From: Fei Yang <fei.yang@intel.com>
>>>
>>> Currently the KMD is using enum i915_cache_level to set caching policy for
>>> buffer objects. This is flaky because the PAT index which really controls
>>> the caching behavior in PTE has far more levels than what's defined in the
>>> enum. In addition, the PAT index is platform dependent; having to translate
>>> between i915_cache_level and PAT index is not reliable, and makes the code
>>> more complicated.
>
> How is it flaky and not reliable - yet the series proposes to leave it in
> place and even claims using cache levels simplifies the code (lower in the
> commit message). Maybe just the commit message needs work.

If you look into the PTE encode functions, using cache_level there is not even
correct. There is no way to map the 4 possible cache levels to all available
PAT indices, let alone that the number of PAT indices varies from platform to
platform.

The architectural design allows UMDs to directly set the PAT index; this is not
possible with the use of cache_level in the KMD. PAT index from UMD ->
cache_level -> PAT index bits in PTE? Such a translation is too much of a
headache.

However, in kernel space cache_level is still a good enough abstraction for
KMD objects. Removing it would require code like this at every call site
(the UC indices below are quoted from the per-platform PAT tables):

if (need_uncached) {    /* i.e. what is I915_CACHE_NONE today */
      if (IS_METEORLAKE(i915))
            pat_index = 2;
      else if (IS_PONTEVECCHIO(i915))
            pat_index = 0;
      else if (GRAPHICS_VER(i915) >= 12)
            pat_index = 3;
}

>>>  From UMD's perspective there is also a necessity to set caching policy
>>> for performance fine tuning. It's much easier for the UMD to directly use
>>> PAT index because the behavior of each PAT index is clearly defined in
>>> Bspec. Having the abstracted i915_cache_level sitting in between would
>>> only cause more ambiguity.
>>>
>>> For these reasons this patch replaces i915_cache_level with PAT index.
>>> Also note, the cache_level is not completely removed yet, because the
>>> KMD still has the need of creating buffer objects with simple cache
>>> settings such as cached, uncached, or writethrough. For such simple
>>> cases, using cache_level would help simplify the code.
>>
>> It seems quite a fundamental change to me. Does this "not completely
>> removed yet" mean that at some point in the future we will not have
>> support for generic cache levels at all?

I think further simplification is possible if there is a consistent PAT
table across all platforms. Looking at Xe 2/3, that seems to be the trend.

>> Seems strange to me. Even looking at the
>> number of users of i915_gem_get_pat_index below it seems very unlikely.
>>
>> And if the support for generic levels stays, maybe it would be better
>> to make usage of them more convenient. All conversions of
>>      f(..., cache_level, ...)
>> to
>>      f(..., i915_gem_get_pat_index(i915, cache_level), ...)
>> look quite ugly to me.
>>
>> Maybe extending cache level to support pat index somehow, for example:
>> enum i915_cache_level {
>>      I915_CACHE_NONE = 0,
>>      I915_CACHE_...,
>>      ...
>>      I915_CACHE_1ST_PAT_INDEX = 0x100,
>> }
>>
>> so real_pat_index = cache_level - I915_CACHE_1ST_PAT_INDEX
>>
>> and in case of a generic level there will be a platform-dependent
>> conversion to real_pat_index?
>>
>> I do not know the whole picture so maybe this is all wrong for some
>> reason, just asking :)
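
For reference, a conversion helper under that scheme would look roughly like
the below (sketch of the idea only, reusing the series' cachelevel_to_pat
table):

  static unsigned int to_pat_index(struct drm_i915_private *i915,
                                   enum i915_cache_level level)
  {
          if (level >= I915_CACHE_1ST_PAT_INDEX)
                  return level - I915_CACHE_1ST_PAT_INDEX;

          return INTEL_INFO(i915)->cachelevel_to_pat[level];
  }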
>
> It looks a bit unsightly to me too, so yes please, a brainstorm on whether
> it can be made more elegant and less intrusive would be appreciated.

I still think using pat_index without abstraction is better...

>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>>> Cc: Matt Roper <matthew.d.roper@intel.com>
>>> Signed-off-by: Fei Yang <fei.yang@intel.com>
>>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/display/intel_dpt.c      | 12 +--
>>>   drivers/gpu/drm/i915/gem/i915_gem_domain.c    | 27 ++----
>>>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 10 ++-
>>>   drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
>>>   drivers/gpu/drm/i915/gem/i915_gem_object.c    | 52 +++++++++++-
>>>   drivers/gpu/drm/i915/gem/i915_gem_object.h    |  4 +
>>>   .../gpu/drm/i915/gem/i915_gem_object_types.h  | 25 +++++-
>>>   drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |  4 +-
>>>   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  | 16 ++--
>>>   .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
>>>   .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
>>>   .../drm/i915/gem/selftests/i915_gem_mman.c    |  2 +-
>>>   drivers/gpu/drm/i915/gt/gen6_ppgtt.c          | 10 ++-
>>>   drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 71 ++++++++--------
>>>   drivers/gpu/drm/i915/gt/gen8_ppgtt.h          |  3 +-
>>>   drivers/gpu/drm/i915/gt/intel_ggtt.c          | 82 +++++++++----------
>>>   drivers/gpu/drm/i915/gt/intel_gtt.h           | 20 ++---
>>>   drivers/gpu/drm/i915/gt/intel_migrate.c       | 47 ++++++-----
>>>   drivers/gpu/drm/i915/gt/intel_migrate.h       | 13 ++-
>>>   drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  6 +-
>>>   drivers/gpu/drm/i915/gt/selftest_migrate.c    | 47 ++++++-----
>>>   drivers/gpu/drm/i915/gt/selftest_reset.c      |  8 +-
>>>   drivers/gpu/drm/i915/gt/selftest_timeline.c   |  2 +-
>>>   drivers/gpu/drm/i915/gt/selftest_tlb.c        |  4 +-
>>>   drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      | 10 ++-
>>>   drivers/gpu/drm/i915/i915_debugfs.c           | 55 ++++++++++---
>>>   drivers/gpu/drm/i915/i915_gem.c               | 16 +++-
>>>   drivers/gpu/drm/i915/i915_gpu_error.c         |  8 +-
>>>   drivers/gpu/drm/i915/i915_vma.c               | 16 ++--
>>>   drivers/gpu/drm/i915/i915_vma.h               |  2 +-
>>>   drivers/gpu/drm/i915/i915_vma_types.h         |  2 -
>>>   drivers/gpu/drm/i915/selftests/i915_gem.c     |  5 +-
>>>   .../gpu/drm/i915/selftests/i915_gem_evict.c   |  4 +-
>>>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 15 ++--
>>>   .../drm/i915/selftests/intel_memory_region.c  |  4 +-
>>>   drivers/gpu/drm/i915/selftests/mock_gtt.c     |  8 +-
>>>   36 files changed, 378 insertions(+), 239 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c
>>> index c5eacfdba1a5..7c5fddb203ba 100644
>>> --- a/drivers/gpu/drm/i915/display/intel_dpt.c
>>> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c
>>> @@ -43,24 +43,24 @@ static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
>>>   static void dpt_insert_page(struct i915_address_space *vm,
>>>                   dma_addr_t addr,
>>>                   u64 offset,
>>> -                enum i915_cache_level level,
>>> +                unsigned int pat_index,
>>>                   u32 flags)
>>>   {
>>>       struct i915_dpt *dpt = i915_vm_to_dpt(vm);
>>>       gen8_pte_t __iomem *base = dpt->iomem;
>>>       gen8_set_pte(base + offset / I915_GTT_PAGE_SIZE,
>>> -             vm->pte_encode(addr, level, flags));
>>> +             vm->pte_encode(addr, pat_index, flags));
>>>   }
>>>   static void dpt_insert_entries(struct i915_address_space *vm,
>>>                      struct i915_vma_resource *vma_res,
>>> -                   enum i915_cache_level level,
>>> +                   unsigned int pat_index,
>>>                      u32 flags)
>>>   {
>>>       struct i915_dpt *dpt = i915_vm_to_dpt(vm);
>>>       gen8_pte_t __iomem *base = dpt->iomem;
>>> -    const gen8_pte_t pte_encode = vm->pte_encode(0, level, flags);
>>> +    const gen8_pte_t pte_encode = vm->pte_encode(0, pat_index, flags);
>>>       struct sgt_iter sgt_iter;
>>>       dma_addr_t addr;
>>>       int i;
>>> @@ -83,7 +83,7 @@ static void dpt_clear_range(struct i915_address_space *vm,
>>>   static void dpt_bind_vma(struct i915_address_space *vm,
>>>                struct i915_vm_pt_stash *stash,
>>>                struct i915_vma_resource *vma_res,
>>> -             enum i915_cache_level cache_level,
>>> +             unsigned int pat_index,
>>>                u32 flags)
>>>   {
>>>       u32 pte_flags;
>>> @@ -98,7 +98,7 @@ static void dpt_bind_vma(struct i915_address_space *vm,
>>>       if (vma_res->bi.lmem)
>>>           pte_flags |= PTE_LM;
>>> -    vm->insert_entries(vm, vma_res, cache_level, pte_flags);
>>> +    vm->insert_entries(vm, vma_res, pat_index, pte_flags);
>>>       vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE;
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>> index bb3575b1479f..d5fd4c9cd9f8 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>> @@ -27,8 +27,8 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>>>       if (IS_DGFX(i915))
>>>           return false;
>>> -    return !(obj->cache_level == I915_CACHE_NONE ||
>>> -         obj->cache_level == I915_CACHE_WT);
>>> +    return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
>>> +         i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
>>>   }
>>>   bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>>> @@ -267,7 +267,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>>>   {
>>>       int ret;
>>> -    if (obj->cache_level == cache_level)
>>> +    if (i915_gem_object_has_cache_level(obj, cache_level))
>>>           return 0;
>>>       ret = i915_gem_object_wait(obj,
>>> @@ -278,10 +278,8 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>>>           return ret;
>>>       /* Always invalidate stale cachelines */
>>> -    if (obj->cache_level != cache_level) {
>>> -        i915_gem_object_set_cache_coherency(obj, cache_level);
>>> -        obj->cache_dirty = true;
>>> -    }
>>> +    i915_gem_object_set_cache_coherency(obj, cache_level);
>>> +    obj->cache_dirty = true;
>>>       /* The cache-level will be applied when each vma is rebound. */
>>>       return i915_gem_object_unbind(obj,
>>> @@ -306,20 +304,13 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
>>>           goto out;
>>>       }
>>> -    switch (obj->cache_level) {
>>> -    case I915_CACHE_LLC:
>>> -    case I915_CACHE_L3_LLC:
>>> +    if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||
>>> +        i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
>>>           args->caching = I915_CACHING_CACHED;
>>> -        break;
>>> -
>>> -    case I915_CACHE_WT:
>>> +    else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
>>>           args->caching = I915_CACHING_DISPLAY;
>>> -        break;
>>> -
>>> -    default:
>>> +    else
>>>           args->caching = I915_CACHING_NONE;
>>> -        break;
>>> -    }
>>>   out:
>>>       rcu_read_unlock();
>>>       return err;
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>> index 3aeede6aee4d..d42915516636 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>> @@ -642,7 +642,7 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
>>>       return (cache->has_llc ||
>>>           obj->cache_dirty ||
>>> -        obj->cache_level != I915_CACHE_NONE);
>>> +        !i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
>>>   }
>>>   static int eb_reserve_vma(struct i915_execbuffer *eb,
>>> @@ -1323,8 +1323,10 @@ static void *reloc_iomap(struct i915_vma *batch,
>>>       offset = cache->node.start;
>>>       if (drm_mm_node_allocated(&cache->node)) {
>>>           ggtt->vm.insert_page(&ggtt->vm,
>>> -                     i915_gem_object_get_dma_address(obj, page),
>>> -                     offset, I915_CACHE_NONE, 0);
>>> +            i915_gem_object_get_dma_address(obj, page),
>>> +            offset,
>>> +            i915_gem_get_pat_index(ggtt->vm.i915, I915_CACHE_NONE),
>>> +            0);
>>>       } else {
>>>           offset += page << PAGE_SHIFT;
>>>       }
>>> @@ -1464,7 +1466,7 @@ eb_relocate_entry(struct i915_execbuffer *eb,
>>>               reloc_cache_unmap(&eb->reloc_cache);
>>>               mutex_lock(&vma->vm->mutex);
>>>               err = i915_vma_bind(target->vma,
>>> -                        target->vma->obj->cache_level,
>>> +                        target->vma->obj->pat_index,
>>>                           PIN_GLOBAL, NULL, NULL);
>>>               mutex_unlock(&vma->vm->mutex);
>>>               reloc_cache_remap(&eb->reloc_cache, ev->vma->obj);
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> index 3dbacdf0911a..50c30efa08a3 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> @@ -383,7 +383,8 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>>>       }
>>>       /* Access to snoopable pages through the GTT is incoherent. */
>>> -    if (obj->cache_level != I915_CACHE_NONE && !HAS_LLC(i915)) {
>>> +    if (!(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
>>> +          HAS_LLC(i915))) {
>>>           ret = -EFAULT;
>>>           goto err_unpin;
>>>       }
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> index 8c70a0ec7d2f..27c948350b5b 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> @@ -54,6 +54,25 @@ unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
>>>       return INTEL_INFO(i915)->cachelevel_to_pat[level];
>>>   }
>>> +bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
>>> +                     enum i915_cache_level lvl)
>>
>> The name suggests the object can have more than one cache level; maybe
>> that's only my impression, up to you.
>>
>>> +{
>>> +    /*
>>> +     * cache_level == I915_CACHE_INVAL indicates the UMD's have set the
>>> +     * caching policy through pat_index, in which case the KMD should
>>> +     * leave the coherency to be managed by user space, simply return
>>> +     * true here.
>>> +     */
>>> +    if (obj->cache_level == I915_CACHE_INVAL)
>>> +        return true;
>
> It's a "bit" counter intuitive that answer "has cache level" is yes when
> cache level is set to invalid!

The only case for this condition to be true is an object created by UMD through
GEM_CREATE with PAT index specified by set_pat extension. In this case the KMD
is not supposed to touch the setting.
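
I.e. the set_pat extension path ends up doing roughly (sketch; "ext" stands
for the extension argument):

  obj->pat_index = ext.pat_index;
  /* from now on the caching policy is owned by userspace */
  obj->cache_level = I915_CACHE_INVAL;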

> I worry about us creating an impenetrable code base here, so I hope this can be improved.
>
>>> +
>>> +    /*
>>> +     * Otherwise the pat_index should have been converted from cache_level
>>> +     * so that the following comparison is valid.
>>> +     */
>>> +    return obj->pat_index == i915_gem_get_pat_index(obj_to_i915(obj), lvl);
>>> +}
>>> +
>>>   struct drm_i915_gem_object *i915_gem_object_alloc(void)
>>>   {
>>>       struct drm_i915_gem_object *obj;
>>> @@ -133,7 +152,7 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>>>   {
>>>       struct drm_i915_private *i915 = to_i915(obj->base.dev);
>>> -    obj->cache_level = cache_level;
>>> +    obj->pat_index = i915_gem_get_pat_index(i915, cache_level);
>>>       if (cache_level != I915_CACHE_NONE)
>>>           obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
>>> @@ -148,6 +167,37 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>>>           !IS_DGFX(i915);
>>>   }
>>> +/**
>>> + * i915_gem_object_set_pat_index - set PAT index to be used in PTE encode
>>> + * @obj: #drm_i915_gem_object
>>> + * @pat_index: PAT index
>>> + *
>>> + * This is a clone of i915_gem_object_set_cache_coherency taking pat index
>>> + * instead of cache_level as its second argument.
>>> + */
>>> +void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
>>> +                   unsigned int pat_index)
>>> +{
>>> +    struct drm_i915_private *i915 = to_i915(obj->base.dev);
>>> +
>>> +    if (obj->pat_index == pat_index)
>>> +        return;
>>> +
>>> +    obj->pat_index = pat_index;
>>> +
>>> +    if (pat_index != i915_gem_get_pat_index(i915, I915_CACHE_NONE))
>>> +        obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
>>> +                       I915_BO_CACHE_COHERENT_FOR_WRITE);
>>> +    else if (HAS_LLC(i915))
>>> +        obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
>>> +    else
>>> +        obj->cache_coherent = 0;
>>> +
>>> +    obj->cache_dirty =
>>> +        !(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
>>> +        !IS_DGFX(i915);
>>> +}
>>> +
>>>   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
>>>   {
>>>       struct drm_i915_private *i915 = to_i915(obj->base.dev);
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>> index 4c92e17b4337..6f00aab10015 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>> @@ -34,6 +34,8 @@ static inline bool i915_gem_object_size_2big(u64 size)
>>>   unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
>>>                       enum i915_cache_level level);
>>> +bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
>>> +                     enum i915_cache_level lvl);
>>>   void i915_gem_init__objects(struct drm_i915_private *i915);
>>>   void i915_objects_module_exit(void);
>>> @@ -764,6 +766,8 @@ bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj);
>>>   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>>>                        unsigned int cache_level);
>>> +void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
>>> +                   unsigned int pat_index);
>>>   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj);
>>>   void i915_gem_object_flush_if_display(struct drm_i915_gem_object *obj);
>>>   void i915_gem_object_flush_if_display_locked(struct drm_i915_gem_object *obj);
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>> index 41b35abccf88..132ce01dee9f 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>> @@ -195,6 +195,7 @@ enum i915_cache_level {
>>>        */
>>>       I915_CACHE_WT,
>>>       I915_MAX_CACHE_LEVEL,
>>> +    I915_CACHE_INVAL = I915_MAX_CACHE_LEVEL,
>>>   };
>>>   enum i915_map_type {
>>> @@ -358,10 +359,28 @@ struct drm_i915_gem_object {
>>>   #define I915_BO_FLAG_STRUCT_PAGE BIT(0) /* Object backed by struct pages */
>>>   #define I915_BO_FLAG_IOMEM       BIT(1) /* Object backed by IO memory */
>>>       /**
>>> -     * @cache_level: The desired GTT caching level.
>>> +     * @pat_index: The desired PAT index.
>>> +     *
>>> +     * See hardware specification for valid PAT indices for each platform.
>
> Side note for the last patch in the series - the UAPI blurb next to u32
> index needs to at least point to some public PRM

That would be the Bspec.
As far as the design goes, the UMDs should be aware of the PAT index and its
platform dependent nature. The KMD is doing a sanity check so that the PAT
index won't go past the boundary (e.g. 4 for MTL, 7 for PVC, 3 for TGL/ADL).
I believe the UMDs must be maintaining a list for each platform...
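
Roughly (sketch only; assuming a per-platform max_pat_index in the device
info, next to the cachelevel_to_pat table):

  if (ext.pat_index > INTEL_INFO(i915)->max_pat_index)
          return -EINVAL;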

>  which lists the PATs
> and their configuration I would think. Otherwise it's not fully
> transparent how to use the feature.
>
>>> +     * This field used to contain a value of enum i915_cache_level.
>
> What does this mean? Nothing is changed to unsigned here, just a new field is added.

This comment needs update. Will do in the next version.

>>> +     * It's changed to an unsigned int because PAT indices are being used by
>>> +     * both UMD and KMD for caching policy control after GEN12.
>>> +     * For backward compatibility, this field will continue to contain
>>> +     * value of i915_cache_level for pre-GEN12 platforms so that the PTE
>
> Pat_index:6 is a copy of cache_level:3 pre-Gen12?

There was no PAT index defined on pre-GEN12 platforms, so this is just to say
that pat_index behaves the same as cache_level on those platforms.

> But when I look at changes like:
>
>@@ -1302,7 +1298,9 @@ bool i915_ggtt_resume_vm(struct i915_address_space *vm)
>                  */
>                 vma->resource->bound_flags = 0;
>                 vma->ops->bind_vma(vm, NULL, vma->resource,
>-                                  obj ? obj->cache_level : 0,
>+                                  obj ? obj->pat_index :
>+                                        i915_gem_get_pat_index(vm->i915,
>+                                                               I915_CACHE_NONE),
>                                    was_bound);
>
> That suggests it is not a copy but that obj->pat_index is always
> valid and directly a PAT index.
>
> In which case a new cache_level enum value to say "use pat instead" may
> indeed be nicer, as Andrzej suggested.

I'm not sure I understand the concern here. I915_CACHE_NONE is 0; this is
just selecting between obj->pat_index and the PAT index for UC, depending
on whether the obj is valid.
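Spelled out, the quoted hunk computes:

    /* sketch of the selection above */
    unsigned int pat_index = obj ? obj->pat_index :
            i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE);

    vma->ops->bind_vma(vm, NULL, vma->resource, pat_index, was_bound);

i.e. the same UC default as before, just expressed as a PAT index.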

> Although it is not clear to me at a glance that we need both. Maybe all
> in-driver object creation can use cache_level but immediately convert to
> PAT internally and just not store cache_level? I haven't looked in detail,
> is my disclaimer though.. I guess it may boil down to whether i915 ever
> needs to read back cache_level, other than on the top entry points like
> setting it or so.
>
>>> +     * encode functions for these legacy platforms can stay the same.
>>> +     * In the meantime platform-specific tables are created to translate
>>> +     * i915_cache_level into pat_index; for more details check the macros
>>> +     * defined in i915/i915_pci.c, e.g. PVC_CACHELEVEL.
>>> +     */
>>> +    unsigned int pat_index:6;
>
> The existing bitfields take up 7 bits. I'd check here with pahole whether
> making pat_index a full u8 and changing the existing ones to u8 field:bits
> ends up better overall.
>
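For reference, the comparison would be between something like these two
layouts (illustrative only; pahole on the real struct is authoritative):

    struct layout_now { unsigned int pat_index:6, cache_level:3; };
    struct layout_alt { u8 pat_index; u8 cache_level:3; };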
>>> +    /**
>>> +     * @cache_level: Indicate whether pat_index is set by UMD
>>>        *
>>> -     * See enum i915_cache_level for possible values, along with what
>>> -     * each does.
>>> +     * This used to hold the desired GTT caching level, but is now
>>> +     * replaced by pat_index. It's kept here for the KMD to tell whether
>>> +     * the pat_index is set by UMD or converted from enum i915_cache_level.
>>> +     * This field is 0 by default, but set to I915_CACHE_INVAL if the
>>> +     * pat_index is set by UMD.
>>>        */
>>>       unsigned int cache_level:3;
>>>       /**
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>>> b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>>> index ee492d823f1b..3b094d36a0b0 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>>> @@ -565,7 +565,9 @@ static void dbg_poison(struct i915_ggtt *ggtt,
>>>           ggtt->vm.insert_page(&ggtt->vm, addr,
>>>                        ggtt->error_capture.start,
>>> -                     I915_CACHE_NONE, 0);
>>> +                     i915_gem_get_pat_index(ggtt->vm.i915,
>>> +                                I915_CACHE_NONE),
>>> +                     0);
>>>           mb();
>>>           s = io_mapping_map_wc(&ggtt->iomap,
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>>> index 69eb20ed4d47..e40761e13c2a 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>>> @@ -214,7 +214,8 @@ static struct dma_fence
>>> *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>>>           intel_engine_pm_get(to_gt(i915)->migrate.context->engine);
>>>           ret = intel_context_migrate_clear(to_gt(i915)->migrate.context, deps,
>>> -                          dst_st->sgl, dst_level,
>>> +                          dst_st->sgl,
>>> +                          i915_gem_get_pat_index(i915, dst_level),
>>>                             i915_ttm_gtt_binds_lmem(dst_mem),
>>>                             0, &rq);
>>>       } else {
>>> @@ -227,12 +228,13 @@ static struct dma_fence
>>> *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>>>           src_level = i915_ttm_cache_level(i915, bo->resource, src_ttm);
>>>           intel_engine_pm_get(to_gt(i915)->migrate.context->engine);
>>>           ret = intel_context_migrate_copy(to_gt(i915)->migrate.context,
>>> -                         deps, src_rsgt->table.sgl,
>>> -                         src_level,
>>> -                         i915_ttm_gtt_binds_lmem(bo->resource),
>>> -                         dst_st->sgl, dst_level,
>>> -                         i915_ttm_gtt_binds_lmem(dst_mem),
>>> -                         &rq);
>>> +                    deps, src_rsgt->table.sgl,
>>> +                    i915_gem_get_pat_index(i915, src_level),
>>> +                    i915_ttm_gtt_binds_lmem(bo->resource),
>>> +                    dst_st->sgl,
>>> +                    i915_gem_get_pat_index(i915, dst_level),
>>> +                    i915_ttm_gtt_binds_lmem(dst_mem),
>>> +                    &rq);
>>>           i915_refct_sgt_put(src_rsgt);
>>>       }
>>> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>>> b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>>> index defece0bcb81..ebb68ac9cd5e 100644
>>> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>>> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>>> @@ -354,7 +354,7 @@ fake_huge_pages_object(struct drm_i915_private
>>> *i915, u64 size, bool single)
>>>       obj->write_domain = I915_GEM_DOMAIN_CPU;
>>>       obj->read_domains = I915_GEM_DOMAIN_CPU;
>>> -    obj->cache_level = I915_CACHE_NONE;
>>> +    obj->pat_index = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
>>>       return obj;
>>>   }
>>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
>>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
>>> index fe6c37fd7859..a93a90b15907 100644
>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
>>> @@ -219,7 +219,7 @@ static int __igt_lmem_pages_migrate(struct
>>> intel_gt *gt,
>>>               continue;
>>>           err = intel_migrate_clear(&gt->migrate, &ww, deps,
>>> -                      obj->mm.pages->sgl, obj->cache_level,
>>> +                      obj->mm.pages->sgl, obj->pat_index,
>>>                         i915_gem_object_is_lmem(obj),
>>>                         0xdeadbeaf, &rq);
>>>           if (rq) {
>>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>>> index 56279908ed30..a93d8f9f8bc1 100644
>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>>> @@ -1222,7 +1222,7 @@ static int __igt_mmap_migrate(struct
>>> intel_memory_region **placements,
>>>       }
>>>       err = intel_context_migrate_clear(to_gt(i915)->migrate.context,
>>> NULL,
>>> -                      obj->mm.pages->sgl, obj->cache_level,
>>> +                      obj->mm.pages->sgl, obj->pat_index,
>>>                         i915_gem_object_is_lmem(obj),
>>>                         expand32(POISON_INUSE), &rq);
>>>       i915_gem_object_unpin_pages(obj);
>>> diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
>>> b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
>>> index 5aaacc53fa4c..c2bdc133c89a 100644
>>> --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
>>> @@ -109,7 +109,7 @@ static void gen6_ppgtt_clear_range(struct
>>> i915_address_space *vm,
>>>   static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>>>                         struct i915_vma_resource *vma_res,
>>> -                      enum i915_cache_level cache_level,
>>> +                      unsigned int pat_index,
>>>                         u32 flags)
>>>   {
>>>       struct i915_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
>>> @@ -117,7 +117,7 @@ static void gen6_ppgtt_insert_entries(struct
>>> i915_address_space *vm,
>>>       unsigned int first_entry = vma_res->start / I915_GTT_PAGE_SIZE;
>>>       unsigned int act_pt = first_entry / GEN6_PTES;
>>>       unsigned int act_pte = first_entry % GEN6_PTES;
>>> -    const u32 pte_encode = vm->pte_encode(0, cache_level, flags);
>>> +    const u32 pte_encode = vm->pte_encode(0, pat_index, flags);
>>>       struct sgt_dma iter = sgt_dma(vma_res);
>>>       gen6_pte_t *vaddr;
>>> @@ -227,7 +227,9 @@ static int gen6_ppgtt_init_scratch(struct
>>> gen6_ppgtt *ppgtt)
>>>       vm->scratch[0]->encode =
>>>           vm->pte_encode(px_dma(vm->scratch[0]),
>>> -                   I915_CACHE_NONE, PTE_READ_ONLY);
>>> +                   i915_gem_get_pat_index(vm->i915,
>>> +                              I915_CACHE_NONE),
>>> +                   PTE_READ_ONLY);
>>>       vm->scratch[1] = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
>>>       if (IS_ERR(vm->scratch[1])) {
>>> @@ -278,7 +280,7 @@ static void gen6_ppgtt_cleanup(struct
>>> i915_address_space *vm)
>>>   static void pd_vma_bind(struct i915_address_space *vm,
>>>               struct i915_vm_pt_stash *stash,
>>>               struct i915_vma_resource *vma_res,
>>> -            enum i915_cache_level cache_level,
>>> +            unsigned int pat_index,
>>>               u32 unused)
>>>   {
>>>       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
>>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>> b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>> index 7a4b1d1afce9..c046813514f4 100644
>>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>> @@ -56,7 +56,7 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>>>   }
>>>   static u64 mtl_pte_encode(dma_addr_t addr,
>>> -              enum i915_cache_level level,
>>> +              unsigned int pat_index,
>>>                 u32 flags)
>>>   {
>>>       gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>>> @@ -67,24 +67,17 @@ static u64 mtl_pte_encode(dma_addr_t addr,
>>>       if (flags & PTE_LM)
>>>           pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;
>>> -    switch (level) {
>>> -    case I915_CACHE_NONE:
>>> -        pte |= GEN12_PPGTT_PTE_PAT1;
>>> -        break;
>>> -    case I915_CACHE_LLC:
>>> -    case I915_CACHE_L3_LLC:
>>> -        pte |= GEN12_PPGTT_PTE_PAT0 | GEN12_PPGTT_PTE_PAT1;
>>> -        break;
>>> -    case I915_CACHE_WT:
>>> +    if (pat_index & BIT(0))
>>>           pte |= GEN12_PPGTT_PTE_PAT0;
>>> -        break;
>>> -    default:
>>> -        /* This should never happen. Added to deal with the compile
>>> -         * error due to the addition of I915_MAX_CACHE_LEVEL. Will
>>> -         * be removed by the pat_index patch.
>>> -         */
>>> -        break;
>>> -    }
>>> +
>>> +    if (pat_index & BIT(1))
>>> +        pte |= GEN12_PPGTT_PTE_PAT1;
>>> +
>>> +    if (pat_index & BIT(2))
>>> +        pte |= GEN12_PPGTT_PTE_PAT2;
>>> +
>>> +    if (pat_index & BIT(3))
>>> +        pte |= MTL_PPGTT_PTE_PAT3;
>>>       return pte;
>>>   }
>>> @@ -457,11 +450,11 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
>>>                 struct i915_page_directory *pdp,
>>>                 struct sgt_dma *iter,
>>>                 u64 idx,
>>> -              enum i915_cache_level cache_level,
>>> +              unsigned int pat_index,
>>>                 u32 flags)
>>>   {
>>>       struct i915_page_directory *pd;
>>> -    const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0,
>>> cache_level, flags);
>>> +    const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, pat_index,
>>> flags);
>>>       gen8_pte_t *vaddr;
>>>       pd = i915_pd_entry(pdp, gen8_pd_index(idx, 2));
>>> @@ -504,10 +497,10 @@ static void
>>>   xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
>>>                 struct i915_vma_resource *vma_res,
>>>                 struct sgt_dma *iter,
>>> -              enum i915_cache_level cache_level,
>>> +              unsigned int pat_index,
>>>                 u32 flags)
>>>   {
>>> -    const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags);
>>> +    const gen8_pte_t pte_encode = vm->pte_encode(0, pat_index, flags);
>>>       unsigned int rem = sg_dma_len(iter->sg);
>>>       u64 start = vma_res->start;
>>>       u64 end = start + vma_res->vma_size;
>>> @@ -611,10 +604,10 @@ xehpsdv_ppgtt_insert_huge(struct
>>> i915_address_space *vm,
>>>   static void gen8_ppgtt_insert_huge(struct i915_address_space *vm,
>>>                      struct i915_vma_resource *vma_res,
>>>                      struct sgt_dma *iter,
>>> -                   enum i915_cache_level cache_level,
>>> +                   unsigned int pat_index,
>>>                      u32 flags)
>>>   {
>>> -    const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags);
>>> +    const gen8_pte_t pte_encode = vm->pte_encode(0, pat_index, flags);
>>>       unsigned int rem = sg_dma_len(iter->sg);
>>>       u64 start = vma_res->start;
>>> @@ -734,7 +727,7 @@ static void gen8_ppgtt_insert_huge(struct
>>> i915_address_space *vm,
>>>   static void gen8_ppgtt_insert(struct i915_address_space *vm,
>>>                     struct i915_vma_resource *vma_res,
>>> -                  enum i915_cache_level cache_level,
>>> +                  unsigned int pat_index,
>>>                     u32 flags)
>>>   {
>>>       struct i915_ppgtt * const ppgtt = i915_vm_to_ppgtt(vm);
>>> @@ -742,9 +735,9 @@ static void gen8_ppgtt_insert(struct
>>> i915_address_space *vm,
>>>       if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) {
>>>           if (HAS_64K_PAGES(vm->i915))
>>> -            xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter,
>>> cache_level, flags);
>>> +            xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, pat_index,
>>> flags);
>>>           else
>>> -            gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level,
>>> flags);
>>> +            gen8_ppgtt_insert_huge(vm, vma_res, &iter, pat_index,
>>> flags);
>>>       } else  {
>>>           u64 idx = vma_res->start >> GEN8_PTE_SHIFT;
>>> @@ -753,7 +746,7 @@ static void gen8_ppgtt_insert(struct
>>> i915_address_space *vm,
>>>                   gen8_pdp_for_page_index(vm, idx);
>>>               idx = gen8_ppgtt_insert_pte(ppgtt, pdp, &iter, idx,
>>> -                            cache_level, flags);
>>> +                            pat_index, flags);
>>>           } while (idx);
>>>           vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE;
>>> @@ -763,7 +756,7 @@ static void gen8_ppgtt_insert(struct
>>> i915_address_space *vm,
>>>   static void gen8_ppgtt_insert_entry(struct i915_address_space *vm,
>>>                       dma_addr_t addr,
>>>                       u64 offset,
>>> -                    enum i915_cache_level level,
>>> +                    unsigned int pat_index,
>>>                       u32 flags)
>>>   {
>>>       u64 idx = offset >> GEN8_PTE_SHIFT;
>>> @@ -777,14 +770,14 @@ static void gen8_ppgtt_insert_entry(struct
>>> i915_address_space *vm,
>>>       GEM_BUG_ON(pt->is_compact);
>>>       vaddr = px_vaddr(pt);
>>> -    vaddr[gen8_pd_index(idx, 0)] = vm->pte_encode(addr, level, flags);
>>> +    vaddr[gen8_pd_index(idx, 0)] = vm->pte_encode(addr, pat_index,
>>> flags);
>>>       drm_clflush_virt_range(&vaddr[gen8_pd_index(idx, 0)],
>>> sizeof(*vaddr));
>>>   }
>>>   static void __xehpsdv_ppgtt_insert_entry_lm(struct
>>> i915_address_space *vm,
>>>                           dma_addr_t addr,
>>>                           u64 offset,
>>> -                        enum i915_cache_level level,
>>> +                        unsigned int pat_index,
>>>                           u32 flags)
>>>   {
>>>       u64 idx = offset >> GEN8_PTE_SHIFT;
>>> @@ -807,20 +800,20 @@ static void
>>> __xehpsdv_ppgtt_insert_entry_lm(struct i915_address_space *vm,
>>>       }
>>>       vaddr = px_vaddr(pt);
>>> -    vaddr[gen8_pd_index(idx, 0) / 16] = vm->pte_encode(addr, level,
>>> flags);
>>> +    vaddr[gen8_pd_index(idx, 0) / 16] = vm->pte_encode(addr,
>>> pat_index, flags);
>>>   }
>>>   static void xehpsdv_ppgtt_insert_entry(struct i915_address_space *vm,
>>>                          dma_addr_t addr,
>>>                          u64 offset,
>>> -                       enum i915_cache_level level,
>>> +                       unsigned int pat_index,
>>>                          u32 flags)
>>>   {
>>>       if (flags & PTE_LM)
>>>           return __xehpsdv_ppgtt_insert_entry_lm(vm, addr, offset,
>>> -                               level, flags);
>>> +                               pat_index, flags);
>>> -    return gen8_ppgtt_insert_entry(vm, addr, offset, level, flags);
>>> +    return gen8_ppgtt_insert_entry(vm, addr, offset, pat_index, flags);
>>>   }
>>>   static int gen8_init_scratch(struct i915_address_space *vm)
>>> @@ -855,7 +848,9 @@ static int gen8_init_scratch(struct
>>> i915_address_space *vm)
>>>       vm->scratch[0]->encode =
>>>           vm->pte_encode(px_dma(vm->scratch[0]),
>>> -                   I915_CACHE_NONE, pte_flags);
>>> +                   i915_gem_get_pat_index(vm->i915,
>>> +                              I915_CACHE_NONE),
>>> +                   pte_flags);
>>>       for (i = 1; i <= vm->top; i++) {
>>>           struct drm_i915_gem_object *obj;
>>> @@ -873,7 +868,9 @@ static int gen8_init_scratch(struct
>>> i915_address_space *vm)
>>>           }
>>>           fill_px(obj, vm->scratch[i - 1]->encode);
>>> -        obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_NONE);
>>> +        obj->encode = gen8_pde_encode(px_dma(obj),
>>> +                          i915_gem_get_pat_index(vm->i915,
>>> +                                     I915_CACHE_NONE));
>>>           vm->scratch[i] = obj;
>>>       }
>>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
>>> b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
>>> index f541d19264b4..19c635441642 100644
>>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
>>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
>>> @@ -10,13 +10,12 @@
>>>   struct i915_address_space;
>>>   struct intel_gt;
>>> -enum i915_cache_level;
>>>   struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
>>>                        unsigned long lmem_pt_obj_flags);
>>>   u64 gen8_ggtt_pte_encode(dma_addr_t addr,
>>> -             enum i915_cache_level level,
>>> +             unsigned int pat_index,
>>>                u32 flags);
>>>   #endif
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>> b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>> index c8390d03fce2..2a7942fac798 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>> @@ -221,7 +221,7 @@ static void guc_ggtt_invalidate(struct i915_ggtt
>>> *ggtt)
>>>   }
>>>   static u64 mtl_ggtt_pte_encode(dma_addr_t addr,
>>> -                   enum i915_cache_level level,
>>> +                   unsigned int pat_index,
>>>                      u32 flags)
>>>   {
>>>       gen8_pte_t pte = addr | GEN8_PAGE_PRESENT;
>>> @@ -231,30 +231,17 @@ static u64 mtl_ggtt_pte_encode(dma_addr_t addr,
>>>       if (flags & PTE_LM)
>>>           pte |= GEN12_GGTT_PTE_LM;
>>> -    switch (level) {
>>> -    case I915_CACHE_NONE:
>>> -        pte |= MTL_GGTT_PTE_PAT1;
>>> -        break;
>>> -    case I915_CACHE_LLC:
>>> -    case I915_CACHE_L3_LLC:
>>> -        pte |= MTL_GGTT_PTE_PAT0 | MTL_GGTT_PTE_PAT1;
>>> -        break;
>>> -    case I915_CACHE_WT:
>>> +    if (pat_index & BIT(0))
>>>           pte |= MTL_GGTT_PTE_PAT0;
>>> -        break;
>>> -    default:
>>> -        /* This should never happen. Added to deal with the compile
>>> -         * error due to the addition of I915_MAX_CACHE_LEVEL. Will
>>> -         * be removed by the pat_index patch.
>>> -         */
>>> -        break;
>>> -    }
>>> +
>>> +    if (pat_index & BIT(1))
>>> +        pte |= MTL_GGTT_PTE_PAT1;
>>>       return pte;
>>>   }
>>>   u64 gen8_ggtt_pte_encode(dma_addr_t addr,
>>> -             enum i915_cache_level level,
>>> +             unsigned int pat_index,
>>>                u32 flags)
>>>   {
>>>       gen8_pte_t pte = addr | GEN8_PAGE_PRESENT;
>>> @@ -273,25 +260,25 @@ static void gen8_set_pte(void __iomem *addr,
>>> gen8_pte_t pte)
>>>   static void gen8_ggtt_insert_page(struct i915_address_space *vm,
>>>                     dma_addr_t addr,
>>>                     u64 offset,
>>> -                  enum i915_cache_level level,
>>> +                  unsigned int pat_index,
>>>                     u32 flags)
>>>   {
>>>       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
>>>       gen8_pte_t __iomem *pte =
>>>           (gen8_pte_t __iomem *)ggtt->gsm + offset / I915_GTT_PAGE_SIZE;
>>> -    gen8_set_pte(pte, ggtt->vm.pte_encode(addr, level, flags));
>>> +    gen8_set_pte(pte, ggtt->vm.pte_encode(addr, pat_index, flags));
>>>       ggtt->invalidate(ggtt);
>>>   }
>>>   static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
>>>                        struct i915_vma_resource *vma_res,
>>> -                     enum i915_cache_level level,
>>> +                     unsigned int pat_index,
>>>                        u32 flags)
>>>   {
>>>       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
>>> -    const gen8_pte_t pte_encode = ggtt->vm.pte_encode(0, level, flags);
>>> +    const gen8_pte_t pte_encode = ggtt->vm.pte_encode(0, pat_index,
>>> flags);
>>>       gen8_pte_t __iomem *gte;
>>>       gen8_pte_t __iomem *end;
>>>       struct sgt_iter iter;
>>> @@ -348,14 +335,14 @@ static void gen8_ggtt_clear_range(struct
>>> i915_address_space *vm,
>>>   static void gen6_ggtt_insert_page(struct i915_address_space *vm,
>>>                     dma_addr_t addr,
>>>                     u64 offset,
>>> -                  enum i915_cache_level level,
>>> +                  unsigned int pat_index,
>>>                     u32 flags)
>>>   {
>>>       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
>>>       gen6_pte_t __iomem *pte =
>>>           (gen6_pte_t __iomem *)ggtt->gsm + offset / I915_GTT_PAGE_SIZE;
>>> -    iowrite32(vm->pte_encode(addr, level, flags), pte);
>>> +    iowrite32(vm->pte_encode(addr, pat_index, flags), pte);
>>>       ggtt->invalidate(ggtt);
>>>   }
>>> @@ -368,7 +355,7 @@ static void gen6_ggtt_insert_page(struct
>>> i915_address_space *vm,
>>>    */
>>>   static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
>>>                        struct i915_vma_resource *vma_res,
>>> -                     enum i915_cache_level level,
>>> +                     unsigned int pat_index,
>>>                        u32 flags)
>>>   {
>>>       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
>>> @@ -385,7 +372,7 @@ static void gen6_ggtt_insert_entries(struct
>>> i915_address_space *vm,
>>>           iowrite32(vm->scratch[0]->encode, gte++);
>>>       end += (vma_res->node_size + vma_res->guard) / I915_GTT_PAGE_SIZE;
>>>       for_each_sgt_daddr(addr, iter, vma_res->bi.pages)
>>> -        iowrite32(vm->pte_encode(addr, level, flags), gte++);
>>> +        iowrite32(vm->pte_encode(addr, pat_index, flags), gte++);
>>>       GEM_BUG_ON(gte > end);
>>>       /* Fill the allocated but "unused" space beyond the end of the
>>> buffer */
>>> @@ -420,14 +407,15 @@ struct insert_page {
>>>       struct i915_address_space *vm;
>>>       dma_addr_t addr;
>>>       u64 offset;
>>> -    enum i915_cache_level level;
>>> +    unsigned int pat_index;
>>>   };
>>>   static int bxt_vtd_ggtt_insert_page__cb(void *_arg)
>>>   {
>>>       struct insert_page *arg = _arg;
>>> -    gen8_ggtt_insert_page(arg->vm, arg->addr, arg->offset,
>>> arg->level, 0);
>>> +    gen8_ggtt_insert_page(arg->vm, arg->addr, arg->offset,
>>> +                  arg->pat_index, 0);
>>>       bxt_vtd_ggtt_wa(arg->vm);
>>>       return 0;
>>> @@ -436,10 +424,10 @@ static int bxt_vtd_ggtt_insert_page__cb(void *_arg)
>>>   static void bxt_vtd_ggtt_insert_page__BKL(struct i915_address_space
>>> *vm,
>>>                         dma_addr_t addr,
>>>                         u64 offset,
>>> -                      enum i915_cache_level level,
>>> +                      unsigned int pat_index,
>>>                         u32 unused)
>>>   {
>>> -    struct insert_page arg = { vm, addr, offset, level };
>>> +    struct insert_page arg = { vm, addr, offset, pat_index };
>>>       stop_machine(bxt_vtd_ggtt_insert_page__cb, &arg, NULL);
>>>   }
>>> @@ -447,7 +435,7 @@ static void bxt_vtd_ggtt_insert_page__BKL(struct
>>> i915_address_space *vm,
>>>   struct insert_entries {
>>>       struct i915_address_space *vm;
>>>       struct i915_vma_resource *vma_res;
>>> -    enum i915_cache_level level;
>>> +    unsigned int pat_index;
>>>       u32 flags;
>>>   };
>>> @@ -455,7 +443,8 @@ static int bxt_vtd_ggtt_insert_entries__cb(void
>>> *_arg)
>>>   {
>>>       struct insert_entries *arg = _arg;
>>> -    gen8_ggtt_insert_entries(arg->vm, arg->vma_res, arg->level,
>>> arg->flags);
>>> +    gen8_ggtt_insert_entries(arg->vm, arg->vma_res,
>>> +                 arg->pat_index, arg->flags);
>>>       bxt_vtd_ggtt_wa(arg->vm);
>>>       return 0;
>>> @@ -463,10 +452,10 @@ static int bxt_vtd_ggtt_insert_entries__cb(void
>>> *_arg)
>>>   static void bxt_vtd_ggtt_insert_entries__BKL(struct
>>> i915_address_space *vm,
>>>                            struct i915_vma_resource *vma_res,
>>> -                         enum i915_cache_level level,
>>> +                         unsigned int pat_index,
>>>                            u32 flags)
>>>   {
>>> -    struct insert_entries arg = { vm, vma_res, level, flags };
>>> +    struct insert_entries arg = { vm, vma_res, pat_index, flags };
>>>       stop_machine(bxt_vtd_ggtt_insert_entries__cb, &arg, NULL);
>>>   }
>>> @@ -495,7 +484,7 @@ static void gen6_ggtt_clear_range(struct
>>> i915_address_space *vm,
>>>   void intel_ggtt_bind_vma(struct i915_address_space *vm,
>>>                struct i915_vm_pt_stash *stash,
>>>                struct i915_vma_resource *vma_res,
>>> -             enum i915_cache_level cache_level,
>>> +             unsigned int pat_index,
>>>                u32 flags)
>>>   {
>>>       u32 pte_flags;
>>> @@ -512,7 +501,7 @@ void intel_ggtt_bind_vma(struct i915_address_space
>>> *vm,
>>>       if (vma_res->bi.lmem)
>>>           pte_flags |= PTE_LM;
>>> -    vm->insert_entries(vm, vma_res, cache_level, pte_flags);
>>> +    vm->insert_entries(vm, vma_res, pat_index, pte_flags);
>>>       vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE;
>>>   }
>>> @@ -661,7 +650,7 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>>>   static void aliasing_gtt_bind_vma(struct i915_address_space *vm,
>>>                     struct i915_vm_pt_stash *stash,
>>>                     struct i915_vma_resource *vma_res,
>>> -                  enum i915_cache_level cache_level,
>>> +                  unsigned int pat_index,
>>>                     u32 flags)
>>>   {
>>>       u32 pte_flags;
>>> @@ -673,10 +662,10 @@ static void aliasing_gtt_bind_vma(struct
>>> i915_address_space *vm,
>>>       if (flags & I915_VMA_LOCAL_BIND)
>>>           ppgtt_bind_vma(&i915_vm_to_ggtt(vm)->alias->vm,
>>> -                   stash, vma_res, cache_level, flags);
>>> +                   stash, vma_res, pat_index, flags);
>>>       if (flags & I915_VMA_GLOBAL_BIND)
>>> -        vm->insert_entries(vm, vma_res, cache_level, pte_flags);
>>> +        vm->insert_entries(vm, vma_res, pat_index, pte_flags);
>>>       vma_res->bound_flags |= flags;
>>>   }
>>> @@ -933,7 +922,9 @@ static int ggtt_probe_common(struct i915_ggtt
>>> *ggtt, u64 size)
>>>       ggtt->vm.scratch[0]->encode =
>>>           ggtt->vm.pte_encode(px_dma(ggtt->vm.scratch[0]),
>>> -                    I915_CACHE_NONE, pte_flags);
>>> +                    i915_gem_get_pat_index(i915,
>>> +                               I915_CACHE_NONE),
>>> +                    pte_flags);
>>>       return 0;
>>>   }
>>> @@ -1022,6 +1013,11 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
>>>       return ggtt_probe_common(ggtt, size);
>>>   }
>>> +/*
>>> + * For pre-gen8 platforms pat_index is the same as enum i915_cache_level,
>>> + * so these PTE encode functions are left using cache_level.
>>> + * See translation table LEGACY_CACHELEVEL.
>>> + */
>>>   static u64 snb_pte_encode(dma_addr_t addr,
>>>                 enum i915_cache_level level,
>>>                 u32 flags)
>>> @@ -1302,7 +1298,9 @@ bool i915_ggtt_resume_vm(struct
>>> i915_address_space *vm)
>>>            */
>>>           vma->resource->bound_flags = 0;
>>>           vma->ops->bind_vma(vm, NULL, vma->resource,
>>> -                   obj ? obj->cache_level : 0,
>>> +                   obj ? obj->pat_index :
>>> +                     i915_gem_get_pat_index(vm->i915,
>>> +                                I915_CACHE_NONE),
>>>                      was_bound);
>>>           if (obj) { /* only used during resume => exclusive access */
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> index 854ec09fd588..be767e13b1e5 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> @@ -165,8 +165,6 @@ typedef u64 gen8_pte_t;
>>>   #define MTL_2_COH_1W    REG_FIELD_PREP(MTL_PAT_INDEX_COH_MODE_MASK, 2)
>>>   #define MTL_0_COH_NON    REG_FIELD_PREP(MTL_PAT_INDEX_COH_MODE_MASK, 0)
>>> -enum i915_cache_level;
>>> -
>>>   struct drm_i915_gem_object;
>>>   struct i915_fence_reg;
>>>   struct i915_vma;
>>> @@ -234,7 +232,7 @@ struct i915_vma_ops {
>>>       void (*bind_vma)(struct i915_address_space *vm,
>>>                struct i915_vm_pt_stash *stash,
>>>                struct i915_vma_resource *vma_res,
>>> -             enum i915_cache_level cache_level,
>>> +             unsigned int pat_index,
>>>                u32 flags);
>>>       /*
>>>        * Unmap an object from an address space. This usually consists of
>>> @@ -306,7 +304,7 @@ struct i915_address_space {
>>>           (*alloc_scratch_dma)(struct i915_address_space *vm, int sz);
>>>       u64 (*pte_encode)(dma_addr_t addr,
>>> -              enum i915_cache_level level,
>>> +              unsigned int pat_index,
>>>                 u32 flags); /* Create a valid PTE */
>>>   #define PTE_READ_ONLY    BIT(0)
>>>   #define PTE_LM        BIT(1)
>>> @@ -321,20 +319,20 @@ struct i915_address_space {
>>>       void (*insert_page)(struct i915_address_space *vm,
>>>                   dma_addr_t addr,
>>>                   u64 offset,
>>> -                enum i915_cache_level cache_level,
>>> +                unsigned int pat_index,
>>>                   u32 flags);
>>>       void (*insert_entries)(struct i915_address_space *vm,
>>>                      struct i915_vma_resource *vma_res,
>>> -                   enum i915_cache_level cache_level,
>>> +                   unsigned int pat_index,
>>>                      u32 flags);
>>>       void (*raw_insert_page)(struct i915_address_space *vm,
>>>                   dma_addr_t addr,
>>>                   u64 offset,
>>> -                enum i915_cache_level cache_level,
>>> +                unsigned int pat_index,
>>>                   u32 flags);
>>>       void (*raw_insert_entries)(struct i915_address_space *vm,
>>>                      struct i915_vma_resource *vma_res,
>>> -                   enum i915_cache_level cache_level,
>>> +                   unsigned int pat_index,
>>>                      u32 flags);
>>>       void (*cleanup)(struct i915_address_space *vm);
>>> @@ -581,7 +579,7 @@ void ppgtt_init(struct i915_ppgtt *ppgtt, struct
>>> intel_gt *gt,
>>>   void intel_ggtt_bind_vma(struct i915_address_space *vm,
>>>                struct i915_vm_pt_stash *stash,
>>>                struct i915_vma_resource *vma_res,
>>> -             enum i915_cache_level cache_level,
>>> +             unsigned int pat_index,
>>>                u32 flags);
>>>   void intel_ggtt_unbind_vma(struct i915_address_space *vm,
>>>                  struct i915_vma_resource *vma_res);
>>> @@ -639,7 +637,7 @@ void
>>>   __set_pd_entry(struct i915_page_directory * const pd,
>>>              const unsigned short idx,
>>>              struct i915_page_table *pt,
>>> -           u64 (*encode)(const dma_addr_t, const enum
>>> i915_cache_level));
>>> +           u64 (*encode)(const dma_addr_t, const unsigned int
>>> pat_index));
>>>   #define set_pd_entry(pd, idx, to) \
>>>       __set_pd_entry((pd), (idx), px_pt(to), gen8_pde_encode)
>>> @@ -659,7 +657,7 @@ void gen6_ggtt_invalidate(struct i915_ggtt *ggtt);
>>>   void ppgtt_bind_vma(struct i915_address_space *vm,
>>>               struct i915_vm_pt_stash *stash,
>>>               struct i915_vma_resource *vma_res,
>>> -            enum i915_cache_level cache_level,
>>> +            unsigned int pat_index,
>>>               u32 flags);
>>>   void ppgtt_unbind_vma(struct i915_address_space *vm,
>>>                 struct i915_vma_resource *vma_res);
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c
>>> b/drivers/gpu/drm/i915/gt/intel_migrate.c
>>> index 3f638f198796..117c3d05af3e 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
>>> @@ -45,7 +45,9 @@ static void xehpsdv_toggle_pdes(struct
>>> i915_address_space *vm,
>>>        * Insert a dummy PTE into every PT that will map to LMEM to ensure
>>>        * we have a correctly setup PDE structure for later use.
>>>        */
>>> -    vm->insert_page(vm, 0, d->offset, I915_CACHE_NONE, PTE_LM);
>>> +    vm->insert_page(vm, 0, d->offset,
>>> +            i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
>>> +            PTE_LM);
>>>       GEM_BUG_ON(!pt->is_compact);
>>>       d->offset += SZ_2M;
>>>   }
>>> @@ -63,7 +65,9 @@ static void xehpsdv_insert_pte(struct
>>> i915_address_space *vm,
>>>        * alignment is 64K underneath for the pt, and we are careful
>>>        * not to access the space in the void.
>>>        */
>>> -    vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE, PTE_LM);
>>> +    vm->insert_page(vm, px_dma(pt), d->offset,
>>> +            i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
>>> +            PTE_LM);
>>>       d->offset += SZ_64K;
>>>   }
>>> @@ -73,7 +77,8 @@ static void insert_pte(struct i915_address_space *vm,
>>>   {
>>>       struct insert_pte_data *d = data;
>>> -    vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE,
>>> +    vm->insert_page(vm, px_dma(pt), d->offset,
>>> +            i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
>>>               i915_gem_object_is_lmem(pt->base) ? PTE_LM : 0);
>>>       d->offset += PAGE_SIZE;
>>>   }
>>> @@ -356,13 +361,13 @@ static int max_pte_pkt_size(struct i915_request
>>> *rq, int pkt)
>>>   static int emit_pte(struct i915_request *rq,
>>>               struct sgt_dma *it,
>>> -            enum i915_cache_level cache_level,
>>> +            unsigned int pat_index,
>>>               bool is_lmem,
>>>               u64 offset,
>>>               int length)
>>>   {
>>>       bool has_64K_pages = HAS_64K_PAGES(rq->engine->i915);
>>> -    const u64 encode = rq->context->vm->pte_encode(0, cache_level,
>>> +    const u64 encode = rq->context->vm->pte_encode(0, pat_index,
>>>                                  is_lmem ? PTE_LM : 0);
>>>       struct intel_ring *ring = rq->ring;
>>>       int pkt, dword_length;
>>> @@ -673,17 +678,17 @@ int
>>>   intel_context_migrate_copy(struct intel_context *ce,
>>>                  const struct i915_deps *deps,
>>>                  struct scatterlist *src,
>>> -               enum i915_cache_level src_cache_level,
>>> +               unsigned int src_pat_index,
>>>                  bool src_is_lmem,
>>>                  struct scatterlist *dst,
>>> -               enum i915_cache_level dst_cache_level,
>>> +               unsigned int dst_pat_index,
>>>                  bool dst_is_lmem,
>>>                  struct i915_request **out)
>>>   {
>>>       struct sgt_dma it_src = sg_sgt(src), it_dst = sg_sgt(dst), it_ccs;
>>>       struct drm_i915_private *i915 = ce->engine->i915;
>>>       u64 ccs_bytes_to_cpy = 0, bytes_to_cpy;
>>> -    enum i915_cache_level ccs_cache_level;
>>> +    unsigned int ccs_pat_index;
>>>       u32 src_offset, dst_offset;
>>>       u8 src_access, dst_access;
>>>       struct i915_request *rq;
>>> @@ -707,12 +712,12 @@ intel_context_migrate_copy(struct intel_context
>>> *ce,
>>>           dst_sz = scatter_list_length(dst);
>>>           if (src_is_lmem) {
>>>               it_ccs = it_dst;
>>> -            ccs_cache_level = dst_cache_level;
>>> +            ccs_pat_index = dst_pat_index;
>>>               ccs_is_src = false;
>>>           } else if (dst_is_lmem) {
>>>               bytes_to_cpy = dst_sz;
>>>               it_ccs = it_src;
>>> -            ccs_cache_level = src_cache_level;
>>> +            ccs_pat_index = src_pat_index;
>>>               ccs_is_src = true;
>>>           }
>>> @@ -773,7 +778,7 @@ intel_context_migrate_copy(struct intel_context *ce,
>>>           src_sz = calculate_chunk_sz(i915, src_is_lmem,
>>>                           bytes_to_cpy, ccs_bytes_to_cpy);
>>> -        len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem,
>>> +        len = emit_pte(rq, &it_src, src_pat_index, src_is_lmem,
>>>                      src_offset, src_sz);
>>>           if (!len) {
>>>               err = -EINVAL;
>>> @@ -784,7 +789,7 @@ intel_context_migrate_copy(struct intel_context *ce,
>>>               goto out_rq;
>>>           }
>>> -        err = emit_pte(rq, &it_dst, dst_cache_level, dst_is_lmem,
>>> +        err = emit_pte(rq, &it_dst, dst_pat_index, dst_is_lmem,
>>>                      dst_offset, len);
>>>           if (err < 0)
>>>               goto out_rq;
>>> @@ -811,7 +816,7 @@ intel_context_migrate_copy(struct intel_context *ce,
>>>                   goto out_rq;
>>>               ccs_sz = GET_CCS_BYTES(i915, len);
>>> -            err = emit_pte(rq, &it_ccs, ccs_cache_level, false,
>>> +            err = emit_pte(rq, &it_ccs, ccs_pat_index, false,
>>>                          ccs_is_src ? src_offset : dst_offset,
>>>                          ccs_sz);
>>>               if (err < 0)
>>> @@ -979,7 +984,7 @@ int
>>>   intel_context_migrate_clear(struct intel_context *ce,
>>>                   const struct i915_deps *deps,
>>>                   struct scatterlist *sg,
>>> -                enum i915_cache_level cache_level,
>>> +                unsigned int pat_index,
>>>                   bool is_lmem,
>>>                   u32 value,
>>>                   struct i915_request **out)
>>> @@ -1027,7 +1032,7 @@ intel_context_migrate_clear(struct intel_context
>>> *ce,
>>>           if (err)
>>>               goto out_rq;
>>> -        len = emit_pte(rq, &it, cache_level, is_lmem, offset, CHUNK_SZ);
>>> +        len = emit_pte(rq, &it, pat_index, is_lmem, offset, CHUNK_SZ);
>>>           if (len <= 0) {
>>>               err = len;
>>>               goto out_rq;
>>> @@ -1074,10 +1079,10 @@ int intel_migrate_copy(struct intel_migrate *m,
>>>                  struct i915_gem_ww_ctx *ww,
>>>                  const struct i915_deps *deps,
>>>                  struct scatterlist *src,
>>> -               enum i915_cache_level src_cache_level,
>>> +               unsigned int src_pat_index,
>>>                  bool src_is_lmem,
>>>                  struct scatterlist *dst,
>>> -               enum i915_cache_level dst_cache_level,
>>> +               unsigned int dst_pat_index,
>>>                  bool dst_is_lmem,
>>>                  struct i915_request **out)
>>>   {
>>> @@ -1098,8 +1103,8 @@ int intel_migrate_copy(struct intel_migrate *m,
>>>           goto out;
>>>       err = intel_context_migrate_copy(ce, deps,
>>> -                     src, src_cache_level, src_is_lmem,
>>> -                     dst, dst_cache_level, dst_is_lmem,
>>> +                     src, src_pat_index, src_is_lmem,
>>> +                     dst, dst_pat_index, dst_is_lmem,
>>>                        out);
>>>       intel_context_unpin(ce);
>>> @@ -1113,7 +1118,7 @@ intel_migrate_clear(struct intel_migrate *m,
>>>               struct i915_gem_ww_ctx *ww,
>>>               const struct i915_deps *deps,
>>>               struct scatterlist *sg,
>>> -            enum i915_cache_level cache_level,
>>> +            unsigned int pat_index,
>>>               bool is_lmem,
>>>               u32 value,
>>>               struct i915_request **out)
>>> @@ -1134,7 +1139,7 @@ intel_migrate_clear(struct intel_migrate *m,
>>>       if (err)
>>>           goto out;
>>> -    err = intel_context_migrate_clear(ce, deps, sg, cache_level,
>>> +    err = intel_context_migrate_clear(ce, deps, sg, pat_index,
>>>                         is_lmem, value, out);
>>>       intel_context_unpin(ce);
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.h
>>> b/drivers/gpu/drm/i915/gt/intel_migrate.h
>>> index ccc677ec4aa3..11fc09a00c4b 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_migrate.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.h
>>> @@ -16,7 +16,6 @@ struct i915_request;
>>>   struct i915_gem_ww_ctx;
>>>   struct intel_gt;
>>>   struct scatterlist;
>>> -enum i915_cache_level;
>>>   int intel_migrate_init(struct intel_migrate *m, struct intel_gt *gt);
>>> @@ -26,20 +25,20 @@ int intel_migrate_copy(struct intel_migrate *m,
>>>                  struct i915_gem_ww_ctx *ww,
>>>                  const struct i915_deps *deps,
>>>                  struct scatterlist *src,
>>> -               enum i915_cache_level src_cache_level,
>>> +               unsigned int src_pat_index,
>>>                  bool src_is_lmem,
>>>                  struct scatterlist *dst,
>>> -               enum i915_cache_level dst_cache_level,
>>> +               unsigned int dst_pat_index,
>>>                  bool dst_is_lmem,
>>>                  struct i915_request **out);
>>>   int intel_context_migrate_copy(struct intel_context *ce,
>>>                      const struct i915_deps *deps,
>>>                      struct scatterlist *src,
>>> -                   enum i915_cache_level src_cache_level,
>>> +                   unsigned int src_pat_index,
>>>                      bool src_is_lmem,
>>>                      struct scatterlist *dst,
>>> -                   enum i915_cache_level dst_cache_level,
>>> +                   unsigned int dst_pat_index,
>>>                      bool dst_is_lmem,
>>>                      struct i915_request **out);
>>> @@ -48,7 +47,7 @@ intel_migrate_clear(struct intel_migrate *m,
>>>               struct i915_gem_ww_ctx *ww,
>>>               const struct i915_deps *deps,
>>>               struct scatterlist *sg,
>>> -            enum i915_cache_level cache_level,
>>> +            unsigned int pat_index,
>>>               bool is_lmem,
>>>               u32 value,
>>>               struct i915_request **out);
>>> @@ -56,7 +55,7 @@ int
>>>   intel_context_migrate_clear(struct intel_context *ce,
>>>                   const struct i915_deps *deps,
>>>                   struct scatterlist *sg,
>>> -                enum i915_cache_level cache_level,
>>> +                unsigned int pat_index,
>>>                   bool is_lmem,
>>>                   u32 value,
>>>                   struct i915_request **out);
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>>> b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>>> index 7ecfa672f738..f0da3555c6db 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>>> @@ -98,7 +98,7 @@ void
>>>   __set_pd_entry(struct i915_page_directory * const pd,
>>>              const unsigned short idx,
>>>              struct i915_page_table * const to,
>>> -           u64 (*encode)(const dma_addr_t, const enum i915_cache_level))
>>> +           u64 (*encode)(const dma_addr_t, const unsigned int))
>>>   {
>>>       /* Each thread pre-pins the pd, and we may have a thread per
>>> pde. */
>>>       GEM_BUG_ON(atomic_read(px_used(pd)) > NALLOC * I915_PDES);
>>> @@ -181,7 +181,7 @@ struct i915_ppgtt *i915_ppgtt_create(struct
>>> intel_gt *gt,
>>>   void ppgtt_bind_vma(struct i915_address_space *vm,
>>>               struct i915_vm_pt_stash *stash,
>>>               struct i915_vma_resource *vma_res,
>>> -            enum i915_cache_level cache_level,
>>> +            unsigned int pat_index,
>>>               u32 flags)
>>>   {
>>>       u32 pte_flags;
>>> @@ -199,7 +199,7 @@ void ppgtt_bind_vma(struct i915_address_space *vm,
>>>       if (vma_res->bi.lmem)
>>>           pte_flags |= PTE_LM;
>>> -    vm->insert_entries(vm, vma_res, cache_level, pte_flags);
>>> +    vm->insert_entries(vm, vma_res, pat_index, pte_flags);
>>>       wmb();
>>>   }
>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_migrate.c
>>> b/drivers/gpu/drm/i915/gt/selftest_migrate.c
>>> index e677f2da093d..3def5ca72dec 100644
>>> --- a/drivers/gpu/drm/i915/gt/selftest_migrate.c
>>> +++ b/drivers/gpu/drm/i915/gt/selftest_migrate.c
>>> @@ -137,7 +137,7 @@ static int copy(struct intel_migrate *migrate,
>>>   static int intel_context_copy_ccs(struct intel_context *ce,
>>>                     const struct i915_deps *deps,
>>>                     struct scatterlist *sg,
>>> -                  enum i915_cache_level cache_level,
>>> +                  unsigned int pat_index,
>>>                     bool write_to_ccs,
>>>                     struct i915_request **out)
>>>   {
>>> @@ -185,7 +185,7 @@ static int intel_context_copy_ccs(struct
>>> intel_context *ce,
>>>           if (err)
>>>               goto out_rq;
>>> -        len = emit_pte(rq, &it, cache_level, true, offset, CHUNK_SZ);
>>> +        len = emit_pte(rq, &it, pat_index, true, offset, CHUNK_SZ);
>>>           if (len <= 0) {
>>>               err = len;
>>>               goto out_rq;
>>> @@ -223,7 +223,7 @@ intel_migrate_ccs_copy(struct intel_migrate *m,
>>>                  struct i915_gem_ww_ctx *ww,
>>>                  const struct i915_deps *deps,
>>>                  struct scatterlist *sg,
>>> -               enum i915_cache_level cache_level,
>>> +               unsigned int pat_index,
>>>                  bool write_to_ccs,
>>>                  struct i915_request **out)
>>>   {
>>> @@ -243,7 +243,7 @@ intel_migrate_ccs_copy(struct intel_migrate *m,
>>>       if (err)
>>>           goto out;
>>> -    err = intel_context_copy_ccs(ce, deps, sg, cache_level,
>>> +    err = intel_context_copy_ccs(ce, deps, sg, pat_index,
>>>                        write_to_ccs, out);
>>>       intel_context_unpin(ce);
>>> @@ -300,7 +300,7 @@ static int clear(struct intel_migrate *migrate,
>>>               /* Write the obj data into ccs surface */
>>>               err = intel_migrate_ccs_copy(migrate, &ww, NULL,
>>>                                obj->mm.pages->sgl,
>>> -                             obj->cache_level,
>>> +                             obj->pat_index,
>>>                                true, &rq);
>>>               if (rq && !err) {
>>>                   if (i915_request_wait(rq, 0, HZ) < 0) {
>>> @@ -351,7 +351,7 @@ static int clear(struct intel_migrate *migrate,
>>>               err = intel_migrate_ccs_copy(migrate, &ww, NULL,
>>>                                obj->mm.pages->sgl,
>>> -                             obj->cache_level,
>>> +                             obj->pat_index,
>>>                                false, &rq);
>>>               if (rq && !err) {
>>>                   if (i915_request_wait(rq, 0, HZ) < 0) {
>>> @@ -414,9 +414,9 @@ static int __migrate_copy(struct intel_migrate
>>> *migrate,
>>>                 struct i915_request **out)
>>>   {
>>>       return intel_migrate_copy(migrate, ww, NULL,
>>> -                  src->mm.pages->sgl, src->cache_level,
>>> +                  src->mm.pages->sgl, src->pat_index,
>>>                     i915_gem_object_is_lmem(src),
>>> -                  dst->mm.pages->sgl, dst->cache_level,
>>> +                  dst->mm.pages->sgl, dst->pat_index,
>>>                     i915_gem_object_is_lmem(dst),
>>>                     out);
>>>   }
>>> @@ -428,9 +428,9 @@ static int __global_copy(struct intel_migrate
>>> *migrate,
>>>                struct i915_request **out)
>>>   {
>>>       return intel_context_migrate_copy(migrate->context, NULL,
>>> -                      src->mm.pages->sgl, src->cache_level,
>>> +                      src->mm.pages->sgl, src->pat_index,
>>>                         i915_gem_object_is_lmem(src),
>>> -                      dst->mm.pages->sgl, dst->cache_level,
>>> +                      dst->mm.pages->sgl, dst->pat_index,
>>>                         i915_gem_object_is_lmem(dst),
>>>                         out);
>>>   }
>>> @@ -455,7 +455,7 @@ static int __migrate_clear(struct intel_migrate
>>> *migrate,
>>>   {
>>>       return intel_migrate_clear(migrate, ww, NULL,
>>>                      obj->mm.pages->sgl,
>>> -                   obj->cache_level,
>>> +                   obj->pat_index,
>>>                      i915_gem_object_is_lmem(obj),
>>>                      value, out);
>>>   }
>>> @@ -468,7 +468,7 @@ static int __global_clear(struct intel_migrate
>>> *migrate,
>>>   {
>>>       return intel_context_migrate_clear(migrate->context, NULL,
>>>                          obj->mm.pages->sgl,
>>> -                       obj->cache_level,
>>> +                       obj->pat_index,
>>>                          i915_gem_object_is_lmem(obj),
>>>                          value, out);
>>>   }
>>> @@ -648,7 +648,7 @@ static int live_emit_pte_full_ring(void *arg)
>>>        */
>>>       pr_info("%s emite_pte ring space=%u\n", __func__, rq->ring->space);
>>>       it = sg_sgt(obj->mm.pages->sgl);
>>> -    len = emit_pte(rq, &it, obj->cache_level, false, 0, CHUNK_SZ);
>>> +    len = emit_pte(rq, &it, obj->pat_index, false, 0, CHUNK_SZ);
>>>       if (!len) {
>>>           err = -EINVAL;
>>>           goto out_rq;
>>> @@ -844,7 +844,7 @@ static int wrap_ktime_compare(const void *A, const
>>> void *B)
>>>   static int __perf_clear_blt(struct intel_context *ce,
>>>                   struct scatterlist *sg,
>>> -                enum i915_cache_level cache_level,
>>> +                unsigned int pat_index,
>>>                   bool is_lmem,
>>>                   size_t sz)
>>>   {
>>> @@ -858,7 +858,7 @@ static int __perf_clear_blt(struct intel_context *ce,
>>>           t0 = ktime_get();
>>> -        err = intel_context_migrate_clear(ce, NULL, sg, cache_level,
>>> +        err = intel_context_migrate_clear(ce, NULL, sg, pat_index,
>>>                             is_lmem, 0, &rq);
>>>           if (rq) {
>>>               if (i915_request_wait(rq, 0, MAX_SCHEDULE_TIMEOUT) < 0)
>>> @@ -904,7 +904,8 @@ static int perf_clear_blt(void *arg)
>>>           err = __perf_clear_blt(gt->migrate.context,
>>>                          dst->mm.pages->sgl,
>>> -                       I915_CACHE_NONE,
>>> +                       i915_gem_get_pat_index(gt->i915,
>>> +                                  I915_CACHE_NONE),
>>>                          i915_gem_object_is_lmem(dst),
>>>                          sizes[i]);
>>> @@ -919,10 +920,10 @@ static int perf_clear_blt(void *arg)
>>>   static int __perf_copy_blt(struct intel_context *ce,
>>>                  struct scatterlist *src,
>>> -               enum i915_cache_level src_cache_level,
>>> +               unsigned int src_pat_index,
>>>                  bool src_is_lmem,
>>>                  struct scatterlist *dst,
>>> -               enum i915_cache_level dst_cache_level,
>>> +               unsigned int dst_pat_index,
>>>                  bool dst_is_lmem,
>>>                  size_t sz)
>>>   {
>>> @@ -937,9 +938,9 @@ static int __perf_copy_blt(struct intel_context *ce,
>>>           t0 = ktime_get();
>>>           err = intel_context_migrate_copy(ce, NULL,
>>> -                         src, src_cache_level,
>>> +                         src, src_pat_index,
>>>                            src_is_lmem,
>>> -                         dst, dst_cache_level,
>>> +                         dst, dst_pat_index,
>>>                            dst_is_lmem,
>>>                            &rq);
>>>           if (rq) {
>>> @@ -994,10 +995,12 @@ static int perf_copy_blt(void *arg)
>>>           err = __perf_copy_blt(gt->migrate.context,
>>>                         src->mm.pages->sgl,
>>> -                      I915_CACHE_NONE,
>>> +                      i915_gem_get_pat_index(gt->i915,
>>> +                                 I915_CACHE_NONE),
>>>                         i915_gem_object_is_lmem(src),
>>>                         dst->mm.pages->sgl,
>>> -                      I915_CACHE_NONE,
>>> +                      i915_gem_get_pat_index(gt->i915,
>>> +                                 I915_CACHE_NONE),
>>>                         i915_gem_object_is_lmem(dst),
>>>                         sz);
>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c
>>> b/drivers/gpu/drm/i915/gt/selftest_reset.c
>>> index a9e0a91bc0e0..79aa6ac66ad2 100644
>>> --- a/drivers/gpu/drm/i915/gt/selftest_reset.c
>>> +++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
>>> @@ -86,7 +86,9 @@ __igt_reset_stolen(struct intel_gt *gt,
>>>           ggtt->vm.insert_page(&ggtt->vm, dma,
>>>                        ggtt->error_capture.start,
>>> -                     I915_CACHE_NONE, 0);
>>> +                     i915_gem_get_pat_index(gt->i915,
>>> +                                I915_CACHE_NONE),
>>> +                     0);
>>>           mb();
>>>           s = io_mapping_map_wc(&ggtt->iomap,
>>> @@ -127,7 +129,9 @@ __igt_reset_stolen(struct intel_gt *gt,
>>>           ggtt->vm.insert_page(&ggtt->vm, dma,
>>>                        ggtt->error_capture.start,
>>> -                     I915_CACHE_NONE, 0);
>>> +                     i915_gem_get_pat_index(gt->i915,
>>> +                                I915_CACHE_NONE),
>>> +                     0);
>>>           mb();
>>>           s = io_mapping_map_wc(&ggtt->iomap,
>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_timeline.c
>>> b/drivers/gpu/drm/i915/gt/selftest_timeline.c
>>> index 9f536c251179..39c3ec12df1a 100644
>>> --- a/drivers/gpu/drm/i915/gt/selftest_timeline.c
>>> +++ b/drivers/gpu/drm/i915/gt/selftest_timeline.c
>>> @@ -836,7 +836,7 @@ static int setup_watcher(struct hwsp_watcher *w,
>>> struct intel_gt *gt,
>>>           return PTR_ERR(obj);
>>>       /* keep the same cache settings as timeline */
>>> -    i915_gem_object_set_cache_coherency(obj,
>>> tl->hwsp_ggtt->obj->cache_level);
>>> +    i915_gem_object_set_pat_index(obj, tl->hwsp_ggtt->obj->pat_index);
>>>       w->map = i915_gem_object_pin_map_unlocked(obj,
>>>                   page_unmask_bits(tl->hwsp_ggtt->obj->mm.mapping));
>>>       if (IS_ERR(w->map)) {
>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_tlb.c
>>> b/drivers/gpu/drm/i915/gt/selftest_tlb.c
>>> index e6cac1f15d6e..4493c8518e91 100644
>>> --- a/drivers/gpu/drm/i915/gt/selftest_tlb.c
>>> +++ b/drivers/gpu/drm/i915/gt/selftest_tlb.c
>>> @@ -36,6 +36,8 @@ pte_tlbinv(struct intel_context *ce,
>>>          u64 length,
>>>          struct rnd_state *prng)
>>>   {
>>> +    const unsigned int pat_index =
>>> +        i915_gem_get_pat_index(ce->vm->i915, I915_CACHE_NONE);
>>>       struct drm_i915_gem_object *batch;
>>>       struct drm_mm_node vb_node;
>>>       struct i915_request *rq;
>>> @@ -155,7 +157,7 @@ pte_tlbinv(struct intel_context *ce,
>>>           /* Flip the PTE between A and B */
>>>           if (i915_gem_object_is_lmem(vb->obj))
>>>               pte_flags |= PTE_LM;
>>> -        ce->vm->insert_entries(ce->vm, &vb_res, 0, pte_flags);
>>> +        ce->vm->insert_entries(ce->vm, &vb_res, pat_index, pte_flags);
>>>           /* Flush the PTE update to concurrent HW */
>>>           tlbinv(ce->vm, addr & -length, length);
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
>>> b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
>>> index a82a53dbbc86..145681ae20a5 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
>>> @@ -890,9 +890,15 @@ static void uc_fw_bind_ggtt(struct intel_uc_fw
>>> *uc_fw)
>>>           pte_flags |= PTE_LM;
>>>       if (ggtt->vm.raw_insert_entries)
>>> -        ggtt->vm.raw_insert_entries(&ggtt->vm, dummy,
>>> I915_CACHE_NONE, pte_flags);
>>> +        ggtt->vm.raw_insert_entries(&ggtt->vm, dummy,
>>> +                        i915_gem_get_pat_index(ggtt->vm.i915,
>>> +                                   I915_CACHE_NONE),
>>> +                        pte_flags);
>>>       else
>>> -        ggtt->vm.insert_entries(&ggtt->vm, dummy, I915_CACHE_NONE,
>>> pte_flags);
>>> +        ggtt->vm.insert_entries(&ggtt->vm, dummy,
>>> +                    i915_gem_get_pat_index(ggtt->vm.i915,
>>> +                                   I915_CACHE_NONE),
>>> +                    pte_flags);
>>>   }
>>>   static void uc_fw_unbind_ggtt(struct intel_uc_fw *uc_fw)
>>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c
>>> b/drivers/gpu/drm/i915/i915_debugfs.c
>>> index 41389a32e998..9a4922da3a71 100644
>>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>>> @@ -139,21 +139,56 @@ static const char *stringify_vma_type(const
>>> struct i915_vma *vma)
>>>       return "ppgtt";
>>>   }
>>> -static const char *i915_cache_level_str(struct drm_i915_private
>>> *i915, int type)
>>> -{
>>> -    switch (type) {
>>> -    case I915_CACHE_NONE: return " uncached";
>>> -    case I915_CACHE_LLC: return HAS_LLC(i915) ? " LLC" : " snooped";
>>> -    case I915_CACHE_L3_LLC: return " L3+LLC";
>>> -    case I915_CACHE_WT: return " WT";
>>> -    default: return "";
>>> +static const char *i915_cache_level_str(struct drm_i915_gem_object *obj)
>>> +{
>>> +    struct drm_i915_private *i915 = obj_to_i915(obj);
>>> +
>>> +    if (IS_METEORLAKE(i915)) {
>>> +        switch (obj->pat_index) {
>>> +        case 0: return " WB";
>>> +        case 1: return " WT";
>>> +        case 2: return " UC";
>>> +        case 3: return " WB (1-Way Coh)";
>>> +        case 4: return " WB (2-Way Coh)";
>>> +        default: return " not defined";
>
> Is not defined possible?

Not possible, because there is a sanity check on pat_index. But I think I got
a compilation warning without the default case (pat_index is a plain
unsigned int, not an enum), and due to the treat-warnings-as-errors flag the
compilation actually failed.

> Also, it may be nicer to handle the leading space in the caller.

will check...

>>> +        }
>>> +    } else if (IS_PONTEVECCHIO(i915)) {
>>> +        switch (obj->pat_index) {
>>> +        case 0: return " UC";
>>> +        case 1: return " WC";
>>> +        case 2: return " WT";
>>> +        case 3: return " WB";
>>> +        case 4: return " WT (CLOS1)";
>>> +        case 5: return " WB (CLOS1)";
>>> +        case 6: return " WT (CLOS2)";
>>> +        case 7: return " WT (CLOS2)";
>>> +        default: return " not defined";
>>> +        }
>>> +    } else if (GRAPHICS_VER(i915) >= 12) {
>>> +        switch (obj->pat_index) {
>>> +        case 0: return " WB";
>>> +        case 1: return " WC";
>>> +        case 2: return " WT";
>>> +        case 3: return " UC";
>>> +        default: return " not defined";
>>> +        }
>>> +    } else {
>
> Is this correct if a legacy platform used the set pat extension?
> I don't see that it is disallowed.

That is allowed. For legacy platforms pat_index is the same as cache_level.
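
(For reference, the legacy translation table added in the preparation patch
is 1:1:

    [I915_CACHE_NONE]   = 0,
    [I915_CACHE_LLC]    = 1,
    [I915_CACHE_L3_LLC] = 2,
    [I915_CACHE_WT]     = 3,

so on those platforms the pat_index values line up with the old enum.)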

> Would it simplify things to add a reverse table to device info, so like
> cachelevel_to_pat, just for pat_index to names? I guess it depends what
> names PRMs use for PATs on legacy platforms. Is it consistent with the
> above UC/WC/WB/... or with the below names.

I was trying to avoid adding tables, but couldn't get away without the
cachelevel_to_pat translation. Well, I guess having constant tables defined
locally here might not be too bad...
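
Something like this, using MTL as an example (table name made up):

    static const char * const mtl_pat_names[] = {
        [0] = "WB",
        [1] = "WT",
        [2] = "UC",
        [3] = "WB (1-Way Coh)",
        [4] = "WB (2-Way Coh)",
    };

and i915_cache_level_str() would just index it with obj->pat_index after a
bounds check.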

>>> +        if (i915_gem_object_has_cache_level(obj, I915_CACHE_NONE))
>>> +            return " uncached";
>
> UC for consistency?

Will update.

>>> +        else if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC))
>>> +            return HAS_LLC(i915) ? " LLC" : " snooped";
>>> +        else if (i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
>>> +            return " L3+LLC";
>
> Is this correct if !HAS_LLC?

Nice catch. Will update.

>>> +        else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
>>> +            return " WT";
>>> +        else
>>> +            return " not defined";
>
> Current code prints nothing for the default switch statement.
>
> But is this even reachable or should it be MISSING_CASE warning?

hmm... I guess it's possible, will re-examine the code.
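
If it really turns out to be unreachable, something like

    default:
        MISSING_CASE(obj->pat_index);
        return " unknown";

(return string made up) would at least make the problem loud.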

>>>       }
>>>   }
>>>   void
>>>   i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>>>   {
>>> -    struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
>>>       struct i915_vma *vma;
>>>       int pin_count = 0;
>>> @@ -165,7 +200,7 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>>>              obj->base.size / 1024,
>>>              obj->read_domains,
>>>              obj->write_domain,
>>> -           i915_cache_level_str(dev_priv, obj->cache_level),
>>> +           i915_cache_level_str(obj),
>>>              obj->mm.dirty ? " dirty" : "",
>>>              obj->mm.madv == I915_MADV_DONTNEED ? " purgeable" : "");
>>>       if (obj->base.name)
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index 0a78bdbd36b1..63207b0740b3 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -420,8 +420,12 @@ i915_gem_gtt_pread(struct drm_i915_gem_object *obj,
>>>           page_length = remain < page_length ? remain : page_length;
>>>           if (drm_mm_node_allocated(&node)) {
>>>               ggtt->vm.insert_page(&ggtt->vm,
>>> -                         i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
>>> -                         node.start, I915_CACHE_NONE, 0);
>>> +                    i915_gem_object_get_dma_address(obj,
>>> +                                    offset >> PAGE_SHIFT),
>>> +                    node.start,
>>> +                    i915_gem_get_pat_index(i915,
>>> +                                   I915_CACHE_NONE),
>
> For the callsites which use const levels you could at least do something
> like i915->pat_cache_none, or I know the not very popular static inline
> i915_gem_get_pat_index so it can be evaluated at runtime. Not sure really,
> throwing out ideas which may be invalid if a more elegant refactoring is
> possible.

You meant defining each cache_level in the i915 structure instead of having
a constant table to do the translation?
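
I.e. something like this (field name made up), filled in once at probe time:

    /* in struct drm_i915_private */
    unsigned int pat_uc; /* i915_gem_get_pat_index(i915, I915_CACHE_NONE) */

so the call sites above could just pass i915->pat_uc?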

-Fei

> Regards,
>
> Tvrtko
>
>>> +                    0);
>>>           } else {
>>>               page_base += offset & PAGE_MASK;
>>>           }
>>> @@ -598,8 +602,12 @@ static int i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
>>>               /* flush the write before we modify the GGTT */
>>>               intel_gt_flush_ggtt_writes(ggtt->vm.gt);
>>>               ggtt->vm.insert_page(&ggtt->vm,
>>> -                         i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
>>> -                         node.start, I915_CACHE_NONE, 0);
>>> +                    i915_gem_object_get_dma_address(obj,
>>> +                                    offset >> PAGE_SHIFT),
>>> +                    node.start,
>>> +                    i915_gem_get_pat_index(i915,
>>> +                                   I915_CACHE_NONE),
>>> +                    0);
>>>               wmb(); /* flush modifications to the GGTT (insert_page) */
>>>           } else {
>>>               page_base += offset & PAGE_MASK;
>>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
>>> index f020c0086fbc..2556cabea02c 100644
>>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>>> @@ -1117,10 +1117,14 @@ i915_vma_coredump_create(const struct intel_gt *gt,
>>>               mutex_lock(&ggtt->error_mutex);
>>>               if (ggtt->vm.raw_insert_page)
>>>                   ggtt->vm.raw_insert_page(&ggtt->vm, dma, slot,
>>> -                             I915_CACHE_NONE, 0);
>>> +                        i915_gem_get_pat_index(gt->i915,
>>> +                                       I915_CACHE_NONE),
>>> +                        0);
>>>               else
>>>                   ggtt->vm.insert_page(&ggtt->vm, dma, slot,
>>> -                             I915_CACHE_NONE, 0);
>>> +                        i915_gem_get_pat_index(gt->i915,
>>> +                                       I915_CACHE_NONE),
>>> +                        0);
>>>               mb();
>>>               s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);
>>> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
>>> index 20a44788999e..a814775a363d 100644
>>> --- a/drivers/gpu/drm/i915/i915_vma.c
>>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>>> @@ -315,7 +315,7 @@ struct i915_vma_work {
>>>       struct i915_vma_resource *vma_res;
>>>       struct drm_i915_gem_object *obj;
>>>       struct i915_sw_dma_fence_cb cb;
>>> -    enum i915_cache_level cache_level;
>>> +    unsigned int pat_index;
>>>       unsigned int flags;
>>>   };
>>> @@ -334,7 +334,7 @@ static void __vma_bind(struct dma_fence_work *work)
>>>           return;
>>>       vma_res->ops->bind_vma(vma_res->vm, &vw->stash,
>>> -                   vma_res, vw->cache_level, vw->flags);
>>> +                   vma_res, vw->pat_index, vw->flags);
>>>   }
>>>   static void __vma_release(struct dma_fence_work *work)
>>> @@ -426,7 +426,7 @@ i915_vma_resource_init_from_vma(struct i915_vma_resource *vma_res,
>>>   /**
>>>    * i915_vma_bind - Sets up PTEs for an VMA in it's corresponding address space.
>>>    * @vma: VMA to map
>>> - * @cache_level: mapping cache level
>>> + * @pat_index: PAT index to set in PTE
>>>    * @flags: flags like global or local mapping
>>>    * @work: preallocated worker for allocating and binding the PTE
>>>    * @vma_res: pointer to a preallocated vma resource. The resource is either
>>> @@ -437,7 +437,7 @@ i915_vma_resource_init_from_vma(struct i915_vma_resource *vma_res,
>>>    * Note that DMA addresses are also the only part of the SG table we care about.
>>>    */
>>>   int i915_vma_bind(struct i915_vma *vma,
>>> -          enum i915_cache_level cache_level,
>>> +          unsigned int pat_index,
>>>             u32 flags,
>>>             struct i915_vma_work *work,
>>>             struct i915_vma_resource *vma_res)
>>> @@ -507,7 +507,7 @@ int i915_vma_bind(struct i915_vma *vma,
>>>           struct dma_fence *prev;
>>>           work->vma_res = i915_vma_resource_get(vma->resource);
>>> -        work->cache_level = cache_level;
>>> +        work->pat_index = pat_index;
>>>           work->flags = bind_flags;
>>>           /*
>>> @@ -537,7 +537,7 @@ int i915_vma_bind(struct i915_vma *vma,
>>>               return ret;
>>>           }
>>> -        vma->ops->bind_vma(vma->vm, NULL, vma->resource, cache_level,
>>> +        vma->ops->bind_vma(vma->vm, NULL, vma->resource, pat_index,
>>>                      bind_flags);
>>>       }
>>> @@ -814,7 +814,7 @@ i915_vma_insert(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
>>>       color = 0;
>>>       if (i915_vm_has_cache_coloring(vma->vm))
>>> -        color = vma->obj->cache_level;
>>> +        color = vma->obj->pat_index;
>>>       if (flags & PIN_OFFSET_FIXED) {
>>>           u64 offset = flags & PIN_OFFSET_MASK;
>>> @@ -1518,7 +1518,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
>>>       GEM_BUG_ON(!vma->pages);
>>>       err = i915_vma_bind(vma,
>>> -                vma->obj->cache_level,
>>> +                vma->obj->pat_index,
>>>                   flags, work, vma_res);
>>>       vma_res = NULL;
>>>       if (err)
>>> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
>>> index ed5c9d682a1b..31a8f8aa5558 100644
>>> --- a/drivers/gpu/drm/i915/i915_vma.h
>>> +++ b/drivers/gpu/drm/i915/i915_vma.h
>>> @@ -250,7 +250,7 @@ i915_vma_compare(struct i915_vma *vma,
>>>   struct i915_vma_work *i915_vma_work(void);
>>>   int i915_vma_bind(struct i915_vma *vma,
>>> -          enum i915_cache_level cache_level,
>>> +          unsigned int pat_index,
>>>             u32 flags,
>>>             struct i915_vma_work *work,
>>>             struct i915_vma_resource *vma_res);
>>> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
>>> index 77fda2244d16..64472b7f0e77 100644
>>> --- a/drivers/gpu/drm/i915/i915_vma_types.h
>>> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
>>> @@ -32,8 +32,6 @@
>>>   #include "gem/i915_gem_object_types.h"
>>> -enum i915_cache_level;
>>> -
>>>   /**
>>>    * DOC: Global GTT views
>>>    *
>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem.c b/drivers/gpu/drm/i915/selftests/i915_gem.c
>>> index d91d0ade8abd..61da4ed9d521 100644
>>> --- a/drivers/gpu/drm/i915/selftests/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem.c
>>> @@ -57,7 +57,10 @@ static void trash_stolen(struct drm_i915_private *i915)
>>>           u32 __iomem *s;
>>>           int x;
>>> -        ggtt->vm.insert_page(&ggtt->vm, dma, slot, I915_CACHE_NONE, 0);
>>> +        ggtt->vm.insert_page(&ggtt->vm, dma, slot,
>>> +                     i915_gem_get_pat_index(i915,
>>> +                                I915_CACHE_NONE),
>>> +                     0);
>>>           s = io_mapping_map_atomic_wc(&ggtt->iomap, slot);
>>>           for (x = 0; x < PAGE_SIZE / sizeof(u32); x++) {
>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>>> index 37068542aafe..f13a4d265814 100644
>>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>>> @@ -245,7 +245,7 @@ static int igt_evict_for_cache_color(void *arg)
>>>       struct drm_mm_node target = {
>>>           .start = I915_GTT_PAGE_SIZE * 2,
>>>           .size = I915_GTT_PAGE_SIZE,
>>> -        .color = I915_CACHE_LLC,
>>> +        .color = i915_gem_get_pat_index(gt->i915, I915_CACHE_LLC),
>>>       };
>>>       struct drm_i915_gem_object *obj;
>>>       struct i915_vma *vma;
>>> @@ -308,7 +308,7 @@ static int igt_evict_for_cache_color(void *arg)
>>>       /* Attempt to remove the first *pinned* vma, by removing the (empty)
>>>        * neighbour -- this should fail.
>>>        */
>>> -    target.color = I915_CACHE_L3_LLC;
>>> +    target.color = i915_gem_get_pat_index(gt->i915, I915_CACHE_L3_LLC);
>>>       mutex_lock(&ggtt->vm.mutex);
>>>       err = i915_gem_evict_for_node(&ggtt->vm, NULL, &target, 0);
>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>> index 154801f1c468..36940ef10108 100644
>>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>> @@ -135,7 +135,7 @@ fake_dma_object(struct drm_i915_private *i915, u64 size)
>>>       obj->write_domain = I915_GEM_DOMAIN_CPU;
>>>       obj->read_domains = I915_GEM_DOMAIN_CPU;
>>> -    obj->cache_level = I915_CACHE_NONE;
>>> +    obj->pat_index = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
>>>       /* Preallocate the "backing storage" */
>>>       if (i915_gem_object_pin_pages_unlocked(obj))
>>> @@ -359,7 +359,9 @@ static int lowlevel_hole(struct i915_address_space *vm,
>>>               with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
>>>                 vm->insert_entries(vm, mock_vma_res,
>>> -                           I915_CACHE_NONE, 0);
>>> +                         i915_gem_get_pat_index(vm->i915,
>>> +                                    I915_CACHE_NONE),
>>> +                         0);
>>>           }
>>>           count = n;
>>> @@ -1377,7 +1379,10 @@ static int igt_ggtt_page(void *arg)
>>>           ggtt->vm.insert_page(&ggtt->vm,
>>>                        i915_gem_object_get_dma_address(obj, 0),
>>> -                     offset, I915_CACHE_NONE, 0);
>>> +                     offset,
>>> +                     i915_gem_get_pat_index(i915,
>>> +                                I915_CACHE_NONE),
>>> +                     0);
>>>       }
>>>       order = i915_random_order(count, &prng);
>>> @@ -1510,7 +1515,7 @@ static int reserve_gtt_with_resource(struct i915_vma *vma, u64 offset)
>>>       mutex_lock(&vm->mutex);
>>>       err = i915_gem_gtt_reserve(vm, NULL, &vma->node, obj->base.size,
>>>                      offset,
>>> -                   obj->cache_level,
>>> +                   obj->pat_index,
>>>                      0);
>>>       if (!err) {
>>>           i915_vma_resource_init_from_vma(vma_res, vma);
>>> @@ -1690,7 +1695,7 @@ static int insert_gtt_with_resource(struct i915_vma *vma)
>>>       mutex_lock(&vm->mutex);
>>>       err = i915_gem_gtt_insert(vm, NULL, &vma->node, obj->base.size, 0,
>>> -                  obj->cache_level, 0, vm->total, 0);
>>> +                  obj->pat_index, 0, vm->total, 0);
>>>       if (!err) {
>>>           i915_vma_resource_init_from_vma(vma_res, vma);
>>>           vma->resource = vma_res;
>>> diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
>>> index 3b18e5905c86..d985d9bae2e8 100644
>>> --- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
>>> +++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
>>> @@ -1070,7 +1070,9 @@ static int igt_lmem_write_cpu(void *arg)
>>>       /* Put the pages into a known state -- from the gpu for added fun */
>>>       intel_engine_pm_get(engine);
>>>       err = intel_context_migrate_clear(engine->gt->migrate.context, NULL,
>>> -                      obj->mm.pages->sgl, I915_CACHE_NONE,
>>> +                      obj->mm.pages->sgl,
>>> +                      i915_gem_get_pat_index(i915,
>>> +                                 I915_CACHE_NONE),
>>>                         true, 0xdeadbeaf, &rq);
>>>       if (rq) {
>>>           dma_resv_add_fence(obj->base.resv, &rq->fence,
>>> diff --git a/drivers/gpu/drm/i915/selftests/mock_gtt.c b/drivers/gpu/drm/i915/selftests/mock_gtt.c
>>> index ece97e4faacb..a516c0aa88fd 100644
>>> --- a/drivers/gpu/drm/i915/selftests/mock_gtt.c
>>> +++ b/drivers/gpu/drm/i915/selftests/mock_gtt.c
>>> @@ -27,21 +27,21 @@
>>>   static void mock_insert_page(struct i915_address_space *vm,
>>>                    dma_addr_t addr,
>>>                    u64 offset,
>>> -                 enum i915_cache_level level,
>>> +                 unsigned int pat_index,
>>>                    u32 flags)
>>>   {
>>>   }
>>>   static void mock_insert_entries(struct i915_address_space *vm,
>>>                   struct i915_vma_resource *vma_res,
>>> -                enum i915_cache_level level, u32 flags)
>>> +                unsigned int pat_index, u32 flags)
>>>   {
>>>   }
>>>   static void mock_bind_ppgtt(struct i915_address_space *vm,
>>>                   struct i915_vm_pt_stash *stash,
>>>                   struct i915_vma_resource *vma_res,
>>> -                enum i915_cache_level cache_level,
>>> +                unsigned int pat_index,
>>>                   u32 flags)
>>>   {
>>>       GEM_BUG_ON(flags & I915_VMA_GLOBAL_BIND);
>>> @@ -94,7 +94,7 @@ struct i915_ppgtt *mock_ppgtt(struct drm_i915_private *i915, const char *name)
>>>   static void mock_bind_ggtt(struct i915_address_space *vm,
>>>                  struct i915_vm_pt_stash *stash,
>>>                  struct i915_vma_resource *vma_res,
>>> -               enum i915_cache_level cache_level,
>>> +               unsigned int pat_index,
>>>                  u32 flags)
>>>   {
>>>   }
>>



* Re: [Intel-gfx] [PATCH 3/8] drm/i915/mtl: Add PTE encode function
  2023-04-19 23:00   ` [Intel-gfx] " fei.yang
  (?)
@ 2023-04-20 20:40   ` Matt Roper
  2023-04-21 17:27     ` Yang, Fei
  -1 siblings, 1 reply; 76+ messages in thread
From: Matt Roper @ 2023-04-20 20:40 UTC (permalink / raw)
  To: fei.yang; +Cc: intel-gfx, dri-devel, Andrzej Hajda, Nirmoy Das

On Wed, Apr 19, 2023 at 04:00:53PM -0700, fei.yang@intel.com wrote:
> From: Fei Yang <fei.yang@intel.com>
> 
> PTE encode functions are platform dependent. This patch implements
> PTE functions for MTL, and ensures the correct PTE encode function
> is used by calling pte_encode function pointer instead of the
> hardcoded gen8 version of PTE encode.
> 
> Signed-off-by: Fei Yang <fei.yang@intel.com>
> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> Acked-by: Nirmoy Das <nirmoy.das@intel.com>

Bspec: 45015, 45040

> ---
>  drivers/gpu/drm/i915/display/intel_dpt.c |  2 +-
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c     | 45 ++++++++++++++++++++----
>  drivers/gpu/drm/i915/gt/intel_ggtt.c     | 36 +++++++++++++++++--
>  3 files changed, 72 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c
> index b8027392144d..c5eacfdba1a5 100644
> --- a/drivers/gpu/drm/i915/display/intel_dpt.c
> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c
> @@ -300,7 +300,7 @@ intel_dpt_create(struct intel_framebuffer *fb)
>  	vm->vma_ops.bind_vma    = dpt_bind_vma;
>  	vm->vma_ops.unbind_vma  = dpt_unbind_vma;
>  
> -	vm->pte_encode = gen8_ggtt_pte_encode;
> +	vm->pte_encode = vm->gt->ggtt->vm.pte_encode;
>  
>  	dpt->obj = dpt_obj;
>  	dpt->obj->is_dpt = true;
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index 4daaa6f55668..11b91e0453c8 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -55,6 +55,34 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>  	return pte;
>  }
>  
> +static u64 mtl_pte_encode(dma_addr_t addr,
> +			  enum i915_cache_level level,
> +			  u32 flags)
> +{
> +	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
> +
> +	if (unlikely(flags & PTE_READ_ONLY))
> +		pte &= ~GEN8_PAGE_RW;
> +
> +	if (flags & PTE_LM)
> +		pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;

GEN12_PPGTT_PTE_NC got defined in the previous patch as BIT(5).  But
according to bspec 45040, bit 5 is ignored in the PTE encoding.  What is
this trying to do?


Matt

> +
> +	switch (level) {
> +	case I915_CACHE_NONE:
> +		pte |= GEN12_PPGTT_PTE_PAT1;
> +		break;
> +	case I915_CACHE_LLC:
> +	case I915_CACHE_L3_LLC:
> +		pte |= GEN12_PPGTT_PTE_PAT0 | GEN12_PPGTT_PTE_PAT1;
> +		break;
> +	case I915_CACHE_WT:
> +		pte |= GEN12_PPGTT_PTE_PAT0;
> +		break;
> +	}
> +
> +	return pte;
> +}
> +
>  static void gen8_ppgtt_notify_vgt(struct i915_ppgtt *ppgtt, bool create)
>  {
>  	struct drm_i915_private *i915 = ppgtt->vm.i915;
> @@ -427,7 +455,7 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
>  		      u32 flags)
>  {
>  	struct i915_page_directory *pd;
> -	const gen8_pte_t pte_encode = gen8_pte_encode(0, cache_level, flags);
> +	const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, cache_level, flags);
>  	gen8_pte_t *vaddr;
>  
>  	pd = i915_pd_entry(pdp, gen8_pd_index(idx, 2));
> @@ -580,7 +608,7 @@ static void gen8_ppgtt_insert_huge(struct i915_address_space *vm,
>  				   enum i915_cache_level cache_level,
>  				   u32 flags)
>  {
> -	const gen8_pte_t pte_encode = gen8_pte_encode(0, cache_level, flags);
> +	const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags);
>  	unsigned int rem = sg_dma_len(iter->sg);
>  	u64 start = vma_res->start;
>  
> @@ -743,7 +771,7 @@ static void gen8_ppgtt_insert_entry(struct i915_address_space *vm,
>  	GEM_BUG_ON(pt->is_compact);
>  
>  	vaddr = px_vaddr(pt);
> -	vaddr[gen8_pd_index(idx, 0)] = gen8_pte_encode(addr, level, flags);
> +	vaddr[gen8_pd_index(idx, 0)] = vm->pte_encode(addr, level, flags);
>  	drm_clflush_virt_range(&vaddr[gen8_pd_index(idx, 0)], sizeof(*vaddr));
>  }
>  
> @@ -773,7 +801,7 @@ static void __xehpsdv_ppgtt_insert_entry_lm(struct i915_address_space *vm,
>  	}
>  
>  	vaddr = px_vaddr(pt);
> -	vaddr[gen8_pd_index(idx, 0) / 16] = gen8_pte_encode(addr, level, flags);
> +	vaddr[gen8_pd_index(idx, 0) / 16] = vm->pte_encode(addr, level, flags);
>  }
>  
>  static void xehpsdv_ppgtt_insert_entry(struct i915_address_space *vm,
> @@ -820,8 +848,8 @@ static int gen8_init_scratch(struct i915_address_space *vm)
>  		pte_flags |= PTE_LM;
>  
>  	vm->scratch[0]->encode =
> -		gen8_pte_encode(px_dma(vm->scratch[0]),
> -				I915_CACHE_NONE, pte_flags);
> +		vm->pte_encode(px_dma(vm->scratch[0]),
> +			       I915_CACHE_NONE, pte_flags);
>  
>  	for (i = 1; i <= vm->top; i++) {
>  		struct drm_i915_gem_object *obj;
> @@ -963,7 +991,10 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
>  	 */
>  	ppgtt->vm.alloc_scratch_dma = alloc_pt_dma;
>  
> -	ppgtt->vm.pte_encode = gen8_pte_encode;
> +	if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
> +		ppgtt->vm.pte_encode = mtl_pte_encode;
> +	else
> +		ppgtt->vm.pte_encode = gen8_pte_encode;
>  
>  	ppgtt->vm.bind_async_flags = I915_VMA_LOCAL_BIND;
>  	ppgtt->vm.insert_entries = gen8_ppgtt_insert;
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index 3c7f1ed92f5b..20915edc8bd9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -220,6 +220,33 @@ static void guc_ggtt_invalidate(struct i915_ggtt *ggtt)
>  	}
>  }
>  
> +static u64 mtl_ggtt_pte_encode(dma_addr_t addr,
> +			       enum i915_cache_level level,
> +			       u32 flags)
> +{
> +	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT;
> +
> +	WARN_ON_ONCE(addr & ~GEN12_GGTT_PTE_ADDR_MASK);
> +
> +	if (flags & PTE_LM)
> +		pte |= GEN12_GGTT_PTE_LM;
> +
> +	switch (level) {
> +	case I915_CACHE_NONE:
> +		pte |= MTL_GGTT_PTE_PAT1;
> +		break;
> +	case I915_CACHE_LLC:
> +	case I915_CACHE_L3_LLC:
> +		pte |= MTL_GGTT_PTE_PAT0 | MTL_GGTT_PTE_PAT1;
> +		break;
> +	case I915_CACHE_WT:
> +		pte |= MTL_GGTT_PTE_PAT0;
> +		break;
> +	}
> +
> +	return pte;
> +}
> +
>  u64 gen8_ggtt_pte_encode(dma_addr_t addr,
>  			 enum i915_cache_level level,
>  			 u32 flags)
> @@ -247,7 +274,7 @@ static void gen8_ggtt_insert_page(struct i915_address_space *vm,
>  	gen8_pte_t __iomem *pte =
>  		(gen8_pte_t __iomem *)ggtt->gsm + offset / I915_GTT_PAGE_SIZE;
>  
> -	gen8_set_pte(pte, gen8_ggtt_pte_encode(addr, level, flags));
> +	gen8_set_pte(pte, ggtt->vm.pte_encode(addr, level, flags));
>  
>  	ggtt->invalidate(ggtt);
>  }
> @@ -257,8 +284,8 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
>  				     enum i915_cache_level level,
>  				     u32 flags)
>  {
> -	const gen8_pte_t pte_encode = gen8_ggtt_pte_encode(0, level, flags);
>  	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
> +	const gen8_pte_t pte_encode = ggtt->vm.pte_encode(0, level, flags);
>  	gen8_pte_t __iomem *gte;
>  	gen8_pte_t __iomem *end;
>  	struct sgt_iter iter;
> @@ -981,7 +1008,10 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
>  	ggtt->vm.vma_ops.bind_vma    = intel_ggtt_bind_vma;
>  	ggtt->vm.vma_ops.unbind_vma  = intel_ggtt_unbind_vma;
>  
> -	ggtt->vm.pte_encode = gen8_ggtt_pte_encode;
> +	if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70))
> +		ggtt->vm.pte_encode = mtl_ggtt_pte_encode;
> +	else
> +		ggtt->vm.pte_encode = gen8_ggtt_pte_encode;
>  
>  	return ggtt_probe_common(ggtt, size);
>  }
> -- 
> 2.25.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


* Re: [Intel-gfx] [PATCH 4/8] drm/i915/mtl: workaround coherency issue for Media
  2023-04-19 23:00   ` [Intel-gfx] " fei.yang
                     ` (2 preceding siblings ...)
  (?)
@ 2023-04-20 20:52   ` Matt Roper
  -1 siblings, 0 replies; 76+ messages in thread
From: Matt Roper @ 2023-04-20 20:52 UTC (permalink / raw)
  To: fei.yang; +Cc: intel-gfx, dri-devel, Nirmoy Das

On Wed, Apr 19, 2023 at 04:00:54PM -0700, fei.yang@intel.com wrote:
> From: Fei Yang <fei.yang@intel.com>
> 
> This patch implements Wa_22016122933.
> 
> In MTL, memory writes initiated by Media tile update the whole
> cache line even for partial writes. This creates a coherency
> problem for cacheable memory if both CPU and GPU are writing data
> to different locations within a single cache line. CTB communication
> is impacted by this issue because the head and tail pointers are
> adjacent words within a cache line (see struct guc_ct_buffer_desc),
> where one is written by GuC and the other by the host.
> This patch circumvents the issue by making CPU/GPU shared memory
> uncacheable (WC on CPU side, and PAT index 2 for GPU). Also for
> CTB which is being updated by both CPU and GuC, mfence instruction
> is added to make sure the CPU writes are visible to GPU right away
> (flush the write combining buffer).

Is this description accurate?  This patch doesn't insert an mfence
instruction itself, it just calls intel_guc_write_barrier().  On
platforms like MTL that aren't using local memory, that issues a wmb()
barrier, which I believe is implemented as an sfence, not mfence.  You'd
need to be doing a mb() call to get an mfence.
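
(For reference, the x86-64 kernel barriers are roughly, simplified from
arch/x86/include/asm/barrier.h:

	#define mb()	asm volatile("mfence" ::: "memory")	/* full barrier */
	#define wmb()	asm volatile("sfence" ::: "memory")	/* store barrier */

so the distinction matters if anything ever relies on this being a full
fence.)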

I think in general this level of explanation is unnecessary; you can
just give a high-level description indicating that we force the
write-combine buffer to be flushed and not give the low-level specifics
of what instruction that translates to at the x86 level.

Aside from simplifying the commit message,

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>

> 
> While fixing the CTB issue, we noticed some random GSC firmware
> loading failure because the share buffers are cacheable (WB) on CPU
> side but uncached on GPU side. To fix these issues we need to map
> such shared buffers as WC on CPU side. Since such allocations are
> not all done through GuC allocator, to avoid too many code changes,
> the i915_coherent_map_type() is now hard coded to return WC for MTL.
> 
> BSpec: 45101
> 
> Signed-off-by: Fei Yang <fei.yang@intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> Acked-by: Nirmoy Das <nirmoy.das@intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c |  5 ++++-
>  drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c | 13 +++++++++++++
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c    |  7 +++++++
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  6 ++++++
>  4 files changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index ecd86130b74f..89fc8ea6bcfc 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -469,7 +469,10 @@ enum i915_map_type i915_coherent_map_type(struct drm_i915_private *i915,
>  					  struct drm_i915_gem_object *obj,
>  					  bool always_coherent)
>  {
> -	if (i915_gem_object_is_lmem(obj))
> +	/*
> +	 * Wa_22016122933: always return I915_MAP_WC for MTL
> +	 */
> +	if (i915_gem_object_is_lmem(obj) || IS_METEORLAKE(i915))
>  		return I915_MAP_WC;
>  	if (HAS_LLC(i915) || always_coherent)
>  		return I915_MAP_WB;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> index 1d9fdfb11268..236673c02f9a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> @@ -110,6 +110,13 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
>  	if (obj->base.size < gsc->fw.size)
>  		return -ENOSPC;
>  
> +	/*
> +	 * Wa_22016122933: For MTL the shared memory needs to be mapped
> +	 * as WC on CPU side and UC (PAT index 2) on GPU side
> +	 */
> +	if (IS_METEORLAKE(i915))
> +		i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
> +
>  	dst = i915_gem_object_pin_map_unlocked(obj,
>  					       i915_coherent_map_type(i915, obj, true));
>  	if (IS_ERR(dst))
> @@ -125,6 +132,12 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
>  	memset(dst, 0, obj->base.size);
>  	memcpy(dst, src, gsc->fw.size);
>  
> +	/*
> +	 * Wa_22016122933: Making sure the data in dst is
> +	 * visible to GSC right away
> +	 */
> +	intel_guc_write_barrier(&gt->uc.guc);
> +
>  	i915_gem_object_unpin_map(gsc->fw.obj);
>  	i915_gem_object_unpin_map(obj);
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index e89f16ecf1ae..c9f20385f6a0 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -744,6 +744,13 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size)
>  	if (IS_ERR(obj))
>  		return ERR_CAST(obj);
>  
> +	/*
> +	 * Wa_22016122933: For MTL the shared memory needs to be mapped
> +	 * as WC on CPU side and UC (PAT index 2) on GPU side
> +	 */
> +	if (IS_METEORLAKE(gt->i915))
> +		i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
> +
>  	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>  	if (IS_ERR(vma))
>  		goto err;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 1803a633ed64..99a0a89091e7 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -902,6 +902,12 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	/* now update descriptor */
>  	WRITE_ONCE(desc->head, head);
>  
> +	/*
> +	 * Wa_22016122933: Making sure the head update is
> +	 * visible to GuC right away
> +	 */
> +	intel_guc_write_barrier(ct_to_guc(ct));
> +
>  	return available - len;
>  
>  corrupted:
> -- 
> 2.25.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


* Re: [Intel-gfx] [PATCH 5/8] drm/i915/mtl: end support for set caching ioctl
  2023-04-19 23:00   ` [Intel-gfx] " fei.yang
  (?)
@ 2023-04-20 21:05   ` Matt Roper
  -1 siblings, 0 replies; 76+ messages in thread
From: Matt Roper @ 2023-04-20 21:05 UTC (permalink / raw)
  To: fei.yang; +Cc: intel-gfx, dri-devel, Andrzej Hajda

On Wed, Apr 19, 2023 at 04:00:55PM -0700, fei.yang@intel.com wrote:
> From: Fei Yang <fei.yang@intel.com>
> 
> The design is to keep Buffer Object's caching policy immutable through
> out its life cycle. This patch ends the support for set caching ioctl
> from MTL onward. While doing that we also set BO's to be 1-way coherent
> at creation time because GPU is no longer automatically snooping CPU
> cache. For UMD's need to fine tune the caching policy for BO's, a follow
> up patch will extend the GEM_CREATE uAPI to allow UMD's specify caching
> mode at BO creation time.

Nitpick:  I don't think "UMD" is a term that anyone really uses outside
of Intel.  It's probably better to just say "userspace" instead of
"UMD" since that's more accurate anyway.


Matt

> 
> Signed-off-by: Fei Yang <fei.yang@intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++
>  drivers/gpu/drm/i915/gem/i915_gem_shmem.c  | 9 ++++++++-
>  2 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> index d2d5a24301b2..bb3575b1479f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> @@ -337,6 +337,9 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
>  	if (IS_DGFX(i915))
>  		return -ENODEV;
>  
> +	if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70))
> +		return -EOPNOTSUPP;
> +
>  	switch (args->caching) {
>  	case I915_CACHING_NONE:
>  		level = I915_CACHE_NONE;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> index 37d1efcd3ca6..cad4a6017f4b 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> @@ -601,7 +601,14 @@ static int shmem_object_init(struct intel_memory_region *mem,
>  	obj->write_domain = I915_GEM_DOMAIN_CPU;
>  	obj->read_domains = I915_GEM_DOMAIN_CPU;
>  
> -	if (HAS_LLC(i915))
> +	/*
> +	 * MTL doesn't snoop CPU cache by default for GPU access (namely
> +	 * 1-way coherency). However some UMD's are currently depending on
> +	 * that. Make 1-way coherent the default setting for MTL. A follow
> +	 * up patch will extend the GEM_CREATE uAPI to allow UMD's specify
> +	 * caching mode at BO creation time
> +	 */
> +	if (HAS_LLC(i915) || (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70)))
>  		/* On some devices, we can have the GPU use the LLC (the CPU
>  		 * cache) for about a 10% performance improvement
>  		 * compared to uncached.  Graphics requests other than
> -- 
> 2.25.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


* Re: [Intel-gfx] [PATCH 6/8] drm/i915: preparation for using PAT index
  2023-04-19 23:00   ` [Intel-gfx] " fei.yang
  (?)
  (?)
@ 2023-04-20 21:14   ` Matt Roper
  -1 siblings, 0 replies; 76+ messages in thread
From: Matt Roper @ 2023-04-20 21:14 UTC (permalink / raw)
  To: fei.yang; +Cc: intel-gfx, Chris Wilson, dri-devel

On Wed, Apr 19, 2023 at 04:00:56PM -0700, fei.yang@intel.com wrote:
> From: Fei Yang <fei.yang@intel.com>
> 
> This patch is a preparation for replacing enum i915_cache_level with PAT
> index. Caching policy for buffer objects is set through the PAT index in
> PTE, the old i915_cache_level is not sufficient to represent all caching
> modes supported by the hardware.
> 
> Preparing the transition by adding some platform dependent data structures
> and helper functions to translate the cache_level to pat_index.
> 
> cachelevel_to_pat: a platform dependent array mapping cache_level to
>                    pat_index.
> 
> max_pat_index: the maximum PAT index supported by the hardware. Needed for
>                validating the PAT index passed in from user space.

The description here doesn't quite match how it's being used.  For
platforms like MTL, the hardware supports PAT indices 0-15.  The bspec
only gives us values to program for the first 5 of those entries and we
leave the rest at their hardware default (fully cached).  In the code
below, you're setting max_pat_index to the size of the bspec-defined
table (i.e., max=4 on MTL).  That's fine, but it means the description
here ("maximum...supported by hardware") is inaccurate.


Matt

> 
> i915_gem_get_pat_index: function to convert cache_level to PAT index.
> 
> obj_to_i915(obj): macro moved to header file for wider usage.
> 
> I915_MAX_CACHE_LEVEL: upper bound of i915_cache_level for the
>                       convenience of coding.
> 
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Signed-off-by: Fei Yang <fei.yang@intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_object.c    |  9 +++
>  drivers/gpu/drm/i915/gem/i915_gem_object.h    |  4 +
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  1 +
>  drivers/gpu/drm/i915/gem/i915_gem_shrinker.c  |  2 -
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  6 ++
>  drivers/gpu/drm/i915/gt/intel_ggtt.c          |  6 ++
>  drivers/gpu/drm/i915/i915_pci.c               | 75 +++++++++++++++++--
>  drivers/gpu/drm/i915/intel_device_info.h      |  5 ++
>  .../gpu/drm/i915/selftests/mock_gem_device.c  |  9 +++
>  9 files changed, 107 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 4666bb82f312..8c70a0ec7d2f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -45,6 +45,15 @@ static struct kmem_cache *slab_objects;
>  
>  static const struct drm_gem_object_funcs i915_gem_object_funcs;
>  
> +unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
> +				    enum i915_cache_level level)
> +{
> +	if (drm_WARN_ON(&i915->drm, level >= I915_MAX_CACHE_LEVEL))
> +		return 0;
> +
> +	return INTEL_INFO(i915)->cachelevel_to_pat[level];
> +}
> +
>  struct drm_i915_gem_object *i915_gem_object_alloc(void)
>  {
>  	struct drm_i915_gem_object *obj;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> index 885ccde9dc3c..4c92e17b4337 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> @@ -20,6 +20,8 @@
>  
>  enum intel_region_id;
>  
> +#define obj_to_i915(obj__) to_i915((obj__)->base.dev)
> +
>  static inline bool i915_gem_object_size_2big(u64 size)
>  {
>  	struct drm_i915_gem_object *obj;
> @@ -30,6 +32,8 @@ static inline bool i915_gem_object_size_2big(u64 size)
>  	return false;
>  }
>  
> +unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
> +				    enum i915_cache_level level);
>  void i915_gem_init__objects(struct drm_i915_private *i915);
>  
>  void i915_objects_module_exit(void);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index 830c11431ee8..41b35abccf88 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -194,6 +194,7 @@ enum i915_cache_level {
>  	 * engine.
>  	 */
>  	I915_CACHE_WT,
> +	I915_MAX_CACHE_LEVEL,
>  };
>  
>  enum i915_map_type {
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
> index b1672e054b21..214763942aa2 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
> @@ -460,8 +460,6 @@ void i915_gem_shrinker_taints_mutex(struct drm_i915_private *i915,
>  	fs_reclaim_release(GFP_KERNEL);
>  }
>  
> -#define obj_to_i915(obj__) to_i915((obj__)->base.dev)
> -
>  /**
>   * i915_gem_object_make_unshrinkable - Hide the object from the shrinker. By
>   * default all object types that support shrinking(see IS_SHRINKABLE), will also
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index 11b91e0453c8..7a4b1d1afce9 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -78,6 +78,12 @@ static u64 mtl_pte_encode(dma_addr_t addr,
>  	case I915_CACHE_WT:
>  		pte |= GEN12_PPGTT_PTE_PAT0;
>  		break;
> +	default:
> +		/* This should never happen. Added to deal with the compile
> +		 * error due to the addition of I915_MAX_CACHE_LEVEL. Will
> +		 * be removed by the pat_index patch.
> +		 */
> +		break;
>  	}
>  
>  	return pte;
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index 20915edc8bd9..c8390d03fce2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -242,6 +242,12 @@ static u64 mtl_ggtt_pte_encode(dma_addr_t addr,
>  	case I915_CACHE_WT:
>  		pte |= MTL_GGTT_PTE_PAT0;
>  		break;
> +	default:
> +		/* This should never happen. Added to deal with the compile
> +		 * error due to the addition of I915_MAX_CACHE_LEVEL. Will
> +		 * be removed by the pat_index patch.
> +		 */
> +		break;
>  	}
>  
>  	return pte;
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index 272a8ba37b64..4ca0ea8fce9b 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -30,6 +30,7 @@
>  #include "display/intel_display_driver.h"
>  #include "gt/intel_gt_regs.h"
>  #include "gt/intel_sa_media.h"
> +#include "gem/i915_gem_object_types.h"
>  
>  #include "i915_driver.h"
>  #include "i915_drv.h"
> @@ -164,6 +165,38 @@
>  		.gamma_lut_tests = DRM_COLOR_LUT_NON_DECREASING, \
>  	}
>  
> +#define LEGACY_CACHELEVEL \
> +	.cachelevel_to_pat = { \
> +		[I915_CACHE_NONE]   = 0, \
> +		[I915_CACHE_LLC]    = 1, \
> +		[I915_CACHE_L3_LLC] = 2, \
> +		[I915_CACHE_WT]     = 3, \
> +	}
> +
> +#define TGL_CACHELEVEL \
> +	.cachelevel_to_pat = { \
> +		[I915_CACHE_NONE]   = 3, \
> +		[I915_CACHE_LLC]    = 0, \
> +		[I915_CACHE_L3_LLC] = 0, \
> +		[I915_CACHE_WT]     = 2, \
> +	}
> +
> +#define PVC_CACHELEVEL \
> +	.cachelevel_to_pat = { \
> +		[I915_CACHE_NONE]   = 0, \
> +		[I915_CACHE_LLC]    = 3, \
> +		[I915_CACHE_L3_LLC] = 3, \
> +		[I915_CACHE_WT]     = 2, \
> +	}
> +
> +#define MTL_CACHELEVEL \
> +	.cachelevel_to_pat = { \
> +		[I915_CACHE_NONE]   = 2, \
> +		[I915_CACHE_LLC]    = 3, \
> +		[I915_CACHE_L3_LLC] = 3, \
> +		[I915_CACHE_WT]     = 1, \
> +	}
> +
>  /* Keep in gen based order, and chronological order within a gen */
>  
>  #define GEN_DEFAULT_PAGE_SIZES \
> @@ -189,11 +222,13 @@
>  	.has_snoop = true, \
>  	.has_coherent_ggtt = false, \
>  	.dma_mask_size = 32, \
> +	.max_pat_index = 3, \
>  	I9XX_PIPE_OFFSETS, \
>  	I9XX_CURSOR_OFFSETS, \
>  	I9XX_COLORS, \
>  	GEN_DEFAULT_PAGE_SIZES, \
> -	GEN_DEFAULT_REGIONS
> +	GEN_DEFAULT_REGIONS, \
> +	LEGACY_CACHELEVEL
>  
>  #define I845_FEATURES \
>  	GEN(2), \
> @@ -210,11 +245,13 @@
>  	.has_snoop = true, \
>  	.has_coherent_ggtt = false, \
>  	.dma_mask_size = 32, \
> +	.max_pat_index = 3, \
>  	I845_PIPE_OFFSETS, \
>  	I845_CURSOR_OFFSETS, \
>  	I845_COLORS, \
>  	GEN_DEFAULT_PAGE_SIZES, \
> -	GEN_DEFAULT_REGIONS
> +	GEN_DEFAULT_REGIONS, \
> +	LEGACY_CACHELEVEL
>  
>  static const struct intel_device_info i830_info = {
>  	I830_FEATURES,
> @@ -249,11 +286,13 @@ static const struct intel_device_info i865g_info = {
>  	.has_snoop = true, \
>  	.has_coherent_ggtt = true, \
>  	.dma_mask_size = 32, \
> +	.max_pat_index = 3, \
>  	I9XX_PIPE_OFFSETS, \
>  	I9XX_CURSOR_OFFSETS, \
>  	I9XX_COLORS, \
>  	GEN_DEFAULT_PAGE_SIZES, \
> -	GEN_DEFAULT_REGIONS
> +	GEN_DEFAULT_REGIONS, \
> +	LEGACY_CACHELEVEL
>  
>  static const struct intel_device_info i915g_info = {
>  	GEN3_FEATURES,
> @@ -341,11 +380,13 @@ static const struct intel_device_info pnv_m_info = {
>  	.has_snoop = true, \
>  	.has_coherent_ggtt = true, \
>  	.dma_mask_size = 36, \
> +	.max_pat_index = 3, \
>  	I9XX_PIPE_OFFSETS, \
>  	I9XX_CURSOR_OFFSETS, \
>  	I9XX_COLORS, \
>  	GEN_DEFAULT_PAGE_SIZES, \
> -	GEN_DEFAULT_REGIONS
> +	GEN_DEFAULT_REGIONS, \
> +	LEGACY_CACHELEVEL
>  
>  static const struct intel_device_info i965g_info = {
>  	GEN4_FEATURES,
> @@ -395,11 +436,13 @@ static const struct intel_device_info gm45_info = {
>  	/* ilk does support rc6, but we do not implement [power] contexts */ \
>  	.has_rc6 = 0, \
>  	.dma_mask_size = 36, \
> +	.max_pat_index = 3, \
>  	I9XX_PIPE_OFFSETS, \
>  	I9XX_CURSOR_OFFSETS, \
>  	ILK_COLORS, \
>  	GEN_DEFAULT_PAGE_SIZES, \
> -	GEN_DEFAULT_REGIONS
> +	GEN_DEFAULT_REGIONS, \
> +	LEGACY_CACHELEVEL
>  
>  static const struct intel_device_info ilk_d_info = {
>  	GEN5_FEATURES,
> @@ -429,13 +472,15 @@ static const struct intel_device_info ilk_m_info = {
>  	.has_rc6p = 0, \
>  	.has_rps = true, \
>  	.dma_mask_size = 40, \
> +	.max_pat_index = 3, \
>  	.__runtime.ppgtt_type = INTEL_PPGTT_ALIASING, \
>  	.__runtime.ppgtt_size = 31, \
>  	I9XX_PIPE_OFFSETS, \
>  	I9XX_CURSOR_OFFSETS, \
>  	ILK_COLORS, \
>  	GEN_DEFAULT_PAGE_SIZES, \
> -	GEN_DEFAULT_REGIONS
> +	GEN_DEFAULT_REGIONS, \
> +	LEGACY_CACHELEVEL
>  
>  #define SNB_D_PLATFORM \
>  	GEN6_FEATURES, \
> @@ -482,13 +527,15 @@ static const struct intel_device_info snb_m_gt2_info = {
>  	.has_reset_engine = true, \
>  	.has_rps = true, \
>  	.dma_mask_size = 40, \
> +	.max_pat_index = 3, \
>  	.__runtime.ppgtt_type = INTEL_PPGTT_ALIASING, \
>  	.__runtime.ppgtt_size = 31, \
>  	IVB_PIPE_OFFSETS, \
>  	IVB_CURSOR_OFFSETS, \
>  	IVB_COLORS, \
>  	GEN_DEFAULT_PAGE_SIZES, \
> -	GEN_DEFAULT_REGIONS
> +	GEN_DEFAULT_REGIONS, \
> +	LEGACY_CACHELEVEL
>  
>  #define IVB_D_PLATFORM \
>  	GEN7_FEATURES, \
> @@ -542,6 +589,7 @@ static const struct intel_device_info vlv_info = {
>  	.display.has_gmch = 1,
>  	.display.has_hotplug = 1,
>  	.dma_mask_size = 40,
> +	.max_pat_index = 3,
>  	.__runtime.ppgtt_type = INTEL_PPGTT_ALIASING,
>  	.__runtime.ppgtt_size = 31,
>  	.has_snoop = true,
> @@ -553,6 +601,7 @@ static const struct intel_device_info vlv_info = {
>  	I9XX_COLORS,
>  	GEN_DEFAULT_PAGE_SIZES,
>  	GEN_DEFAULT_REGIONS,
> +	LEGACY_CACHELEVEL,
>  };
>  
>  #define G75_FEATURES  \
> @@ -640,6 +689,7 @@ static const struct intel_device_info chv_info = {
>  	.has_logical_ring_contexts = 1,
>  	.display.has_gmch = 1,
>  	.dma_mask_size = 39,
> +	.max_pat_index = 3,
>  	.__runtime.ppgtt_type = INTEL_PPGTT_FULL,
>  	.__runtime.ppgtt_size = 32,
>  	.has_reset_engine = 1,
> @@ -651,6 +701,7 @@ static const struct intel_device_info chv_info = {
>  	CHV_COLORS,
>  	GEN_DEFAULT_PAGE_SIZES,
>  	GEN_DEFAULT_REGIONS,
> +	LEGACY_CACHELEVEL,
>  };
>  
>  #define GEN9_DEFAULT_PAGE_SIZES \
> @@ -890,9 +941,11 @@ static const struct intel_device_info jsl_info = {
>  		[TRANSCODER_DSI_1] = TRANSCODER_DSI1_OFFSET, \
>  	}, \
>  	TGL_CURSOR_OFFSETS, \
> +	TGL_CACHELEVEL, \
>  	.has_global_mocs = 1, \
>  	.has_pxp = 1, \
> -	.display.has_dsb = 1
> +	.display.has_dsb = 1, \
> +	.max_pat_index = 3
>  
>  static const struct intel_device_info tgl_info = {
>  	GEN12_FEATURES,
> @@ -1014,6 +1067,7 @@ static const struct intel_device_info adl_p_info = {
>  	.__runtime.graphics.ip.ver = 12, \
>  	.__runtime.graphics.ip.rel = 50, \
>  	XE_HP_PAGE_SIZES, \
> +	TGL_CACHELEVEL, \
>  	.dma_mask_size = 46, \
>  	.has_3d_pipeline = 1, \
>  	.has_64bit_reloc = 1, \
> @@ -1032,6 +1086,7 @@ static const struct intel_device_info adl_p_info = {
>  	.has_reset_engine = 1, \
>  	.has_rps = 1, \
>  	.has_runtime_pm = 1, \
> +	.max_pat_index = 3, \
>  	.__runtime.ppgtt_size = 48, \
>  	.__runtime.ppgtt_type = INTEL_PPGTT_FULL
>  
> @@ -1108,11 +1163,13 @@ static const struct intel_device_info pvc_info = {
>  	PLATFORM(INTEL_PONTEVECCHIO),
>  	NO_DISPLAY,
>  	.has_flat_ccs = 0,
> +	.max_pat_index = 7,
>  	.__runtime.platform_engine_mask =
>  		BIT(BCS0) |
>  		BIT(VCS0) |
>  		BIT(CCS0) | BIT(CCS1) | BIT(CCS2) | BIT(CCS3),
>  	.require_force_probe = 1,
> +	PVC_CACHELEVEL,
>  };
>  
>  #define XE_LPDP_FEATURES	\
> @@ -1150,9 +1207,11 @@ static const struct intel_device_info mtl_info = {
>  	.has_llc = 0,
>  	.has_mslice_steering = 0,
>  	.has_snoop = 1,
> +	.max_pat_index = 4,
>  	.__runtime.memory_regions = REGION_SMEM | REGION_STOLEN_LMEM,
>  	.__runtime.platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(CCS0),
>  	.require_force_probe = 1,
> +	MTL_CACHELEVEL,
>  };
>  
>  #undef PLATFORM
> diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
> index f032f2500f50..959a4080840c 100644
> --- a/drivers/gpu/drm/i915/intel_device_info.h
> +++ b/drivers/gpu/drm/i915/intel_device_info.h
> @@ -35,6 +35,8 @@
>  #include "gt/intel_context_types.h"
>  #include "gt/intel_sseu.h"
>  
> +#include "gem/i915_gem_object_types.h"
> +
>  struct drm_printer;
>  struct drm_i915_private;
>  struct intel_gt_definition;
> @@ -308,6 +310,9 @@ struct intel_device_info {
>  	 * Initial runtime info. Do not access outside of i915_driver_create().
>  	 */
>  	const struct intel_runtime_info __runtime;
> +
> +	u32 cachelevel_to_pat[I915_MAX_CACHE_LEVEL];
> +	u32 max_pat_index;
>  };
>  
>  struct intel_driver_caps {
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> index f6a7c0bd2955..0eda8b4ee17f 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> @@ -123,7 +123,9 @@ struct drm_i915_private *mock_gem_device(void)
>  	static struct dev_iommu fake_iommu = { .priv = (void *)-1 };
>  #endif
>  	struct drm_i915_private *i915;
> +	struct intel_device_info *i915_info;
>  	struct pci_dev *pdev;
> +	unsigned int i;
>  	int ret;
>  
>  	pdev = kzalloc(sizeof(*pdev), GFP_KERNEL);
> @@ -180,6 +182,13 @@ struct drm_i915_private *mock_gem_device(void)
>  		I915_GTT_PAGE_SIZE_2M;
>  
>  	RUNTIME_INFO(i915)->memory_regions = REGION_SMEM;
> +
> +	/* simply use legacy cache level for mock device */
> +	i915_info = (struct intel_device_info *)INTEL_INFO(i915);
> +	i915_info->max_pat_index = 3;
> +	for (i = 0; i < I915_MAX_CACHE_LEVEL; i++)
> +		i915_info->cachelevel_to_pat[i] = i;
> +
>  	intel_memory_regions_hw_probe(i915);
>  
>  	spin_lock_init(&i915->gpu_error.lock);
> -- 
> 2.25.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


* Re: [Intel-gfx] [PATCH 7/8] drm/i915: use pat_index instead of cache_level
  2023-04-19 23:00   ` [Intel-gfx] " fei.yang
  (?)
  (?)
@ 2023-04-21  8:43   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 76+ messages in thread
From: Tvrtko Ursulin @ 2023-04-21  8:43 UTC (permalink / raw)
  To: fei.yang, intel-gfx; +Cc: Chris Wilson, Matt Roper, dri-devel


On 20/04/2023 00:00, fei.yang@intel.com wrote:
> From: Fei Yang <fei.yang@intel.com>
> 
> Currently the KMD is using enum i915_cache_level to set caching policy for
> buffer objects. This is flaky because the PAT index which really controls
> the caching behavior in PTE has far more levels than what's defined in the
> enum. In addition, the PAT index is platform dependent, having to translate
> between i915_cache_level and PAT index is not reliable, and makes the code
> more complicated.
> 
> From UMD's perspective there is also a necessity to set caching policy for
> performance fine tuning. It's much easier for the UMD to directly use PAT
> index because the behavior of each PAT index is clearly defined in Bspec.
> Having the abstracted i915_cache_level sitting in between would only cause
> more ambiguity.
> 
> For these reasons this patch replaces i915_cache_level with PAT index. Also
> note, the cache_level is not completely removed yet, because the KMD still
> has the need of creating buffer objects with simple cache settings such as
> cached, uncached, or writethrough. For such simple cases, using cache_level
> would help simplify the code.
> 
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Signed-off-by: Fei Yang <fei.yang@intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

[snip]

> @@ -306,20 +304,13 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
>   		goto out;
>   	}
>   
> -	switch (obj->cache_level) {
> -	case I915_CACHE_LLC:
> -	case I915_CACHE_L3_LLC:
> +	if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||
> +	    i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
>   		args->caching = I915_CACHING_CACHED;
> -		break;
> -
> -	case I915_CACHE_WT:
> +	else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
>   		args->caching = I915_CACHING_DISPLAY;
> -		break;
> -
> -	default:
> +	else
>   		args->caching = I915_CACHING_NONE;
> -		break;
> -	}
>   out:
>   	rcu_read_unlock();
>   	return err;

[snip]

> +bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
> +				     enum i915_cache_level lvl)
> +{
> +	/*
> +	 * cache_level == I915_CACHE_INVAL indicates the UMD's have set the
> +	 * caching policy through pat_index, in which case the KMD should
> +	 * leave the coherency to be managed by user space, simply return
> +	 * true here.
> +	 */
> +	if (obj->cache_level == I915_CACHE_INVAL)
> +		return true;
> +
> +	/*
> +	 * Otherwise the pat_index should have been converted from cache_level
> +	 * so that the following comparison is valid.
> +	 */
> +	return obj->pat_index == i915_gem_get_pat_index(obj_to_i915(obj), lvl);
> +}
> +

Isn't i915_gem_get_caching_ioctl always going to report 
I915_CACHING_CACHED if any PAT index has been set?

Not sure if that is okay or not, or if it only needs mentioning in the 
commit, I am still reading through it all.

Regards,

Tvrtko


* Re: [Intel-gfx] [PATCH 7/8] drm/i915: use pat_index instead of cache_level
  2023-04-19 23:00   ` [Intel-gfx] " fei.yang
                     ` (2 preceding siblings ...)
  (?)
@ 2023-04-21 10:17   ` Tvrtko Ursulin
  2023-04-23  6:12       ` Yang, Fei
  -1 siblings, 1 reply; 76+ messages in thread
From: Tvrtko Ursulin @ 2023-04-21 10:17 UTC (permalink / raw)
  To: fei.yang, intel-gfx; +Cc: Chris Wilson, Matt Roper, dri-devel



On 20/04/2023 00:00, fei.yang@intel.com wrote:
> From: Fei Yang <fei.yang@intel.com>
> 
> Currently the KMD is using enum i915_cache_level to set caching policy for
> buffer objects. This is flaky because the PAT index which really controls
> the caching behavior in PTE has far more levels than what's defined in the
> enum. In addition, the PAT index is platform dependent, having to translate
> between i915_cache_level and PAT index is not reliable, and makes the code
> more complicated.
> 
> From UMD's perspective there is also a necessity to set caching policy for
> performance fine tuning. It's much easier for the UMD to directly use PAT
> index because the behavior of each PAT index is clearly defined in Bspec.
> Having the abstracted i915_cache_level sitting in between would only cause
> more ambiguity.
> 
> For these reasons this patch replaces i915_cache_level with PAT index. Also
> note, the cache_level is not completely removed yet, because the KMD still
> has the need of creating buffer objects with simple cache settings such as
> cached, uncached, or writethrough. For such simple cases, using cache_level
> would help simplify the code.
> 
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Signed-off-by: Fei Yang <fei.yang@intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

[snip]

>   
>   bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
> @@ -267,7 +267,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>   {
>   	int ret;
>   
> -	if (obj->cache_level == cache_level)
> +	if (i915_gem_object_has_cache_level(obj, cache_level))
>   		return 0;

When userspace calls i915_gem_set_caching_ioctl after having set the PAT index explicitly, this will make it silently succeed regardless of the cache level passed in, no? Because of:

+bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
+				     enum i915_cache_level lvl)
+{
+	/*
+	 * cache_level == I915_CACHE_INVAL indicates the UMD's have set the
+	 * caching policy through pat_index, in which case the KMD should
+	 * leave the coherency to be managed by user space, simply return
+	 * true here.
+	 */
+	if (obj->cache_level == I915_CACHE_INVAL)
+		return true;

I think we need to let userspace know it is doing it wrong by returning 
an error.
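
Perhaps something like this at the top of 
i915_gem_object_set_cache_level() (rough sketch, untested, exact errno 
TBD):

	/*
	 * Once userspace has taken over caching policy via PAT index
	 * at object creation, reject the legacy set caching path
	 * instead of pretending it succeeded.
	 */
	if (obj->cache_level == I915_CACHE_INVAL)
		return -EOPNOTSUPP;

	if (i915_gem_object_has_cache_level(obj, cache_level))
		return 0;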

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Intel-gfx] [PATCH 7/8] drm/i915: use pat_index instead of cache_level
  2023-04-19 23:00   ` [Intel-gfx] " fei.yang
                     ` (3 preceding siblings ...)
  (?)
@ 2023-04-21 11:39   ` Tvrtko Ursulin
  2023-04-23  6:52       ` Yang, Fei
  -1 siblings, 1 reply; 76+ messages in thread
From: Tvrtko Ursulin @ 2023-04-21 11:39 UTC (permalink / raw)
  To: fei.yang, intel-gfx; +Cc: Chris Wilson, Matt Roper, dri-devel


On 20/04/2023 00:00, fei.yang@intel.com wrote:
> From: Fei Yang <fei.yang@intel.com>
> 
> Currently the KMD is using enum i915_cache_level to set caching policy for
> buffer objects. This is flaky because the PAT index which really controls
> the caching behavior in PTE has far more levels than what's defined in the
> enum. In addition, the PAT index is platform dependent, having to translate
> between i915_cache_level and PAT index is not reliable, and makes the code
> more complicated.
> 
> From UMD's perspective there is also a necessity to set caching policy for
> performance fine tuning. It's much easier for the UMD to directly use PAT
> index because the behavior of each PAT index is clearly defined in Bspec.
> Having the abstracted i915_cache_level sitting in between would only cause
> more ambiguity.
> 
> For these reasons this patch replaces i915_cache_level with PAT index. Also
> note, the cache_level is not completely removed yet, because the KMD still
> has the need of creating buffer objects with simple cache settings such as
> cached, uncached, or writethrough. For such simple cases, using cache_level
> would help simplify the code.
> 
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Signed-off-by: Fei Yang <fei.yang@intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

I think I have some ideas on how to perhaps make this simpler, please 
bear with me.

In my mind the get/set caching ioctls need to start failing once an 
explicit PAT index has been set by userspace. Or at least not return 
false information.

And I don't like i915_gem_object_has_cache_level and 
i915_gem_get_pat_index as a refactoring step.

It also seems that the driver needs to be able to query the caching 
mode regardless of the route by which it was set.

So how about this.

Three callers which query the caching mode: use_cpu_reloc, vm_fault_gtt, 
gpu_write_needs_clflush.

We convert them to be like:

i915_gem_object_has_caching_mode(obj, PAT_UC / PAT_WT / ...);

Then, apart from the per-platform tables mapping cache level to PAT 
index, you add tables which map PAT index to caching modes (PAT_UC, 
etc - naming TBD, whether they are plain enums or bitmasks also TBD; I 
haven't looked at the bspec to see how exactly it works).

You would use that table in the i915_gem_object_has_caching_mode 
helper, called from the above three functions instead of the direct 
obj->cache_level comparisons.

I am assuming that, for instance, cache_level == I915_CACHE_NONE would 
be equivalent to i915_gem_object_has_caching_mode(obj, PAT_UC), etc.
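
To make that concrete, a rough sketch of the shape I have in mind 
(names and table entries are made up for illustration, the real 
entries would come from the bspec):

	/* Caching modes, decoupled from cache_level and pat_index. */
	#define PAT_UC	BIT(0)
	#define PAT_WT	BIT(1)
	#define PAT_WB	BIT(2)
	#define PAT_WC	BIT(3)

	/* Hypothetical per-platform table, indexed by PAT index. */
	static const unsigned int mtl_pat_to_mode[] = {
		[0] = PAT_WB,
		[1] = PAT_WT,
		[2] = PAT_UC,
		[3] = PAT_WB,	/* WB, 1-way coherent */
		[4] = PAT_WB,	/* WB, 2-way coherent */
	};

	static bool
	i915_gem_object_has_caching_mode(const struct drm_i915_gem_object *obj,
					 unsigned int mode)
	{
		return INTEL_INFO(obj_to_i915(obj))->pat_to_mode[obj->pat_index] & mode;
	}

The three callers would then read e.g.:

	/* vm_fault_gtt(): snoopable pages are incoherent over the GGTT */
	if (!i915_gem_object_has_caching_mode(obj, PAT_UC) && !HAS_LLC(i915)) {
		ret = -EFAULT;
		goto err_unpin;
	}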

The same mapping table could also be used in debugfs 
(i915_cache_level_str) to universally describe any obj->pat_index, with 
no need to have anything platform dependent there.

In the set caching ioctl you always set obj->pat_index, so low level 
code can always just use that.

Unless I am missing something (possible), I think that way we end up 
with no i915_gem_get_pat_index calls sprinkled around and also no 
confusing i915_gem_object_has_cache_level.

obj->pat_index would be the single source of truth, while 
obj->cache_level would be just a legacy field for the get/set_caching 
ioctls - not used in the internal driver flows.

We would need an additional field for storing the boolean of whether 
userspace had overridden the PAT.
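
For example (sketch, name TBD), in struct drm_i915_gem_object:

	/*
	 * Set when userspace picked the PAT index directly at object
	 * creation; the legacy cache_level field then carries no
	 * meaning.
	 */
	unsigned int pat_set_by_user:1;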

Regards,

Tvrtko

> ---
>   drivers/gpu/drm/i915/display/intel_dpt.c      | 12 +--
>   drivers/gpu/drm/i915/gem/i915_gem_domain.c    | 27 ++----
>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 10 ++-
>   drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
>   drivers/gpu/drm/i915/gem/i915_gem_object.c    | 52 +++++++++++-
>   drivers/gpu/drm/i915/gem/i915_gem_object.h    |  4 +
>   .../gpu/drm/i915/gem/i915_gem_object_types.h  | 25 +++++-
>   drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |  4 +-
>   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  | 16 ++--
>   .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
>   .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
>   .../drm/i915/gem/selftests/i915_gem_mman.c    |  2 +-
>   drivers/gpu/drm/i915/gt/gen6_ppgtt.c          | 10 ++-
>   drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 71 ++++++++--------
>   drivers/gpu/drm/i915/gt/gen8_ppgtt.h          |  3 +-
>   drivers/gpu/drm/i915/gt/intel_ggtt.c          | 82 +++++++++----------
>   drivers/gpu/drm/i915/gt/intel_gtt.h           | 20 ++---
>   drivers/gpu/drm/i915/gt/intel_migrate.c       | 47 ++++++-----
>   drivers/gpu/drm/i915/gt/intel_migrate.h       | 13 ++-
>   drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  6 +-
>   drivers/gpu/drm/i915/gt/selftest_migrate.c    | 47 ++++++-----
>   drivers/gpu/drm/i915/gt/selftest_reset.c      |  8 +-
>   drivers/gpu/drm/i915/gt/selftest_timeline.c   |  2 +-
>   drivers/gpu/drm/i915/gt/selftest_tlb.c        |  4 +-
>   drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      | 10 ++-
>   drivers/gpu/drm/i915/i915_debugfs.c           | 55 ++++++++++---
>   drivers/gpu/drm/i915/i915_gem.c               | 16 +++-
>   drivers/gpu/drm/i915/i915_gpu_error.c         |  8 +-
>   drivers/gpu/drm/i915/i915_vma.c               | 16 ++--
>   drivers/gpu/drm/i915/i915_vma.h               |  2 +-
>   drivers/gpu/drm/i915/i915_vma_types.h         |  2 -
>   drivers/gpu/drm/i915/selftests/i915_gem.c     |  5 +-
>   .../gpu/drm/i915/selftests/i915_gem_evict.c   |  4 +-
>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 15 ++--
>   .../drm/i915/selftests/intel_memory_region.c  |  4 +-
>   drivers/gpu/drm/i915/selftests/mock_gtt.c     |  8 +-
>   36 files changed, 378 insertions(+), 239 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c
> index c5eacfdba1a5..7c5fddb203ba 100644
> --- a/drivers/gpu/drm/i915/display/intel_dpt.c
> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c
> @@ -43,24 +43,24 @@ static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
>   static void dpt_insert_page(struct i915_address_space *vm,
>   			    dma_addr_t addr,
>   			    u64 offset,
> -			    enum i915_cache_level level,
> +			    unsigned int pat_index,
>   			    u32 flags)
>   {
>   	struct i915_dpt *dpt = i915_vm_to_dpt(vm);
>   	gen8_pte_t __iomem *base = dpt->iomem;
>   
>   	gen8_set_pte(base + offset / I915_GTT_PAGE_SIZE,
> -		     vm->pte_encode(addr, level, flags));
> +		     vm->pte_encode(addr, pat_index, flags));
>   }
>   
>   static void dpt_insert_entries(struct i915_address_space *vm,
>   			       struct i915_vma_resource *vma_res,
> -			       enum i915_cache_level level,
> +			       unsigned int pat_index,
>   			       u32 flags)
>   {
>   	struct i915_dpt *dpt = i915_vm_to_dpt(vm);
>   	gen8_pte_t __iomem *base = dpt->iomem;
> -	const gen8_pte_t pte_encode = vm->pte_encode(0, level, flags);
> +	const gen8_pte_t pte_encode = vm->pte_encode(0, pat_index, flags);
>   	struct sgt_iter sgt_iter;
>   	dma_addr_t addr;
>   	int i;
> @@ -83,7 +83,7 @@ static void dpt_clear_range(struct i915_address_space *vm,
>   static void dpt_bind_vma(struct i915_address_space *vm,
>   			 struct i915_vm_pt_stash *stash,
>   			 struct i915_vma_resource *vma_res,
> -			 enum i915_cache_level cache_level,
> +			 unsigned int pat_index,
>   			 u32 flags)
>   {
>   	u32 pte_flags;
> @@ -98,7 +98,7 @@ static void dpt_bind_vma(struct i915_address_space *vm,
>   	if (vma_res->bi.lmem)
>   		pte_flags |= PTE_LM;
>   
> -	vm->insert_entries(vm, vma_res, cache_level, pte_flags);
> +	vm->insert_entries(vm, vma_res, pat_index, pte_flags);
>   
>   	vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE;
>   
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> index bb3575b1479f..d5fd4c9cd9f8 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> @@ -27,8 +27,8 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>   	if (IS_DGFX(i915))
>   		return false;
>   
> -	return !(obj->cache_level == I915_CACHE_NONE ||
> -		 obj->cache_level == I915_CACHE_WT);
> +	return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
> +		 i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
>   }
>   
>   bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
> @@ -267,7 +267,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>   {
>   	int ret;
>   
> -	if (obj->cache_level == cache_level)
> +	if (i915_gem_object_has_cache_level(obj, cache_level))
>   		return 0;
>   
>   	ret = i915_gem_object_wait(obj,
> @@ -278,10 +278,8 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>   		return ret;
>   
>   	/* Always invalidate stale cachelines */
> -	if (obj->cache_level != cache_level) {
> -		i915_gem_object_set_cache_coherency(obj, cache_level);
> -		obj->cache_dirty = true;
> -	}
> +	i915_gem_object_set_cache_coherency(obj, cache_level);
> +	obj->cache_dirty = true;
>   
>   	/* The cache-level will be applied when each vma is rebound. */
>   	return i915_gem_object_unbind(obj,
> @@ -306,20 +304,13 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
>   		goto out;
>   	}
>   
> -	switch (obj->cache_level) {
> -	case I915_CACHE_LLC:
> -	case I915_CACHE_L3_LLC:
> +	if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||
> +	    i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
>   		args->caching = I915_CACHING_CACHED;
> -		break;
> -
> -	case I915_CACHE_WT:
> +	else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
>   		args->caching = I915_CACHING_DISPLAY;
> -		break;
> -
> -	default:
> +	else
>   		args->caching = I915_CACHING_NONE;
> -		break;
> -	}
>   out:
>   	rcu_read_unlock();
>   	return err;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 3aeede6aee4d..d42915516636 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -642,7 +642,7 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
>   
>   	return (cache->has_llc ||
>   		obj->cache_dirty ||
> -		obj->cache_level != I915_CACHE_NONE);
> +		!i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
>   }
>   
>   static int eb_reserve_vma(struct i915_execbuffer *eb,
> @@ -1323,8 +1323,10 @@ static void *reloc_iomap(struct i915_vma *batch,
>   	offset = cache->node.start;
>   	if (drm_mm_node_allocated(&cache->node)) {
>   		ggtt->vm.insert_page(&ggtt->vm,
> -				     i915_gem_object_get_dma_address(obj, page),
> -				     offset, I915_CACHE_NONE, 0);
> +			i915_gem_object_get_dma_address(obj, page),
> +			offset,
> +			i915_gem_get_pat_index(ggtt->vm.i915, I915_CACHE_NONE),
> +			0);
>   	} else {
>   		offset += page << PAGE_SHIFT;
>   	}
> @@ -1464,7 +1466,7 @@ eb_relocate_entry(struct i915_execbuffer *eb,
>   			reloc_cache_unmap(&eb->reloc_cache);
>   			mutex_lock(&vma->vm->mutex);
>   			err = i915_vma_bind(target->vma,
> -					    target->vma->obj->cache_level,
> +					    target->vma->obj->pat_index,
>   					    PIN_GLOBAL, NULL, NULL);
>   			mutex_unlock(&vma->vm->mutex);
>   			reloc_cache_remap(&eb->reloc_cache, ev->vma->obj);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> index 3dbacdf0911a..50c30efa08a3 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> @@ -383,7 +383,8 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>   	}
>   
>   	/* Access to snoopable pages through the GTT is incoherent. */
> -	if (obj->cache_level != I915_CACHE_NONE && !HAS_LLC(i915)) {
> +	if (!(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
> +	      HAS_LLC(i915))) {
>   		ret = -EFAULT;
>   		goto err_unpin;
>   	}
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 8c70a0ec7d2f..27c948350b5b 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -54,6 +54,25 @@ unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
>   	return INTEL_INFO(i915)->cachelevel_to_pat[level];
>   }
>   
> +bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
> +				     enum i915_cache_level lvl)
> +{
> +	/*
> +	 * cache_level == I915_CACHE_INVAL indicates the UMD's have set the
> +	 * caching policy through pat_index, in which case the KMD should
> +	 * leave the coherency to be managed by user space, simply return
> +	 * true here.
> +	 */
> +	if (obj->cache_level == I915_CACHE_INVAL)
> +		return true;
> +
> +	/*
> +	 * Otherwise the pat_index should have been converted from cache_level
> +	 * so that the following comparison is valid.
> +	 */
> +	return obj->pat_index == i915_gem_get_pat_index(obj_to_i915(obj), lvl);
> +}
> +
>   struct drm_i915_gem_object *i915_gem_object_alloc(void)
>   {
>   	struct drm_i915_gem_object *obj;
> @@ -133,7 +152,7 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>   {
>   	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>   
> -	obj->cache_level = cache_level;
> +	obj->pat_index = i915_gem_get_pat_index(i915, cache_level);
>   
>   	if (cache_level != I915_CACHE_NONE)
>   		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
> @@ -148,6 +167,37 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>   		!IS_DGFX(i915);
>   }
>   
> +/**
> + * i915_gem_object_set_pat_index - set PAT index to be used in PTE encode
> + * @obj: #drm_i915_gem_object
> + * @pat_index: PAT index
> + *
> + * This is a clone of i915_gem_object_set_cache_coherency taking pat index
> + * instead of cache_level as its second argument.
> + */
> +void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
> +				   unsigned int pat_index)
> +{
> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +
> +	if (obj->pat_index == pat_index)
> +		return;
> +
> +	obj->pat_index = pat_index;
> +
> +	if (pat_index != i915_gem_get_pat_index(i915, I915_CACHE_NONE))
> +		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
> +				       I915_BO_CACHE_COHERENT_FOR_WRITE);
> +	else if (HAS_LLC(i915))
> +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
> +	else
> +		obj->cache_coherent = 0;
> +
> +	obj->cache_dirty =
> +		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
> +		!IS_DGFX(i915);
> +}
> +
>   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
>   {
>   	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> index 4c92e17b4337..6f00aab10015 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> @@ -34,6 +34,8 @@ static inline bool i915_gem_object_size_2big(u64 size)
>   
>   unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
>   				    enum i915_cache_level level);
> +bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
> +				     enum i915_cache_level lvl);
>   void i915_gem_init__objects(struct drm_i915_private *i915);
>   
>   void i915_objects_module_exit(void);
> @@ -764,6 +766,8 @@ bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj);
>   
>   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>   					 unsigned int cache_level);
> +void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
> +				   unsigned int pat_index);
>   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj);
>   void i915_gem_object_flush_if_display(struct drm_i915_gem_object *obj);
>   void i915_gem_object_flush_if_display_locked(struct drm_i915_gem_object *obj);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index 41b35abccf88..132ce01dee9f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -195,6 +195,7 @@ enum i915_cache_level {
>   	 */
>   	I915_CACHE_WT,
>   	I915_MAX_CACHE_LEVEL,
> +	I915_CACHE_INVAL = I915_MAX_CACHE_LEVEL,
>   };
>   
>   enum i915_map_type {
> @@ -358,10 +359,28 @@ struct drm_i915_gem_object {
>   #define I915_BO_FLAG_STRUCT_PAGE BIT(0) /* Object backed by struct pages */
>   #define I915_BO_FLAG_IOMEM       BIT(1) /* Object backed by IO memory */
>   	/**
> -	 * @cache_level: The desired GTT caching level.
> +	 * @pat_index: The desired PAT index.
> +	 *
> +	 * See hardware specification for valid PAT indices for each platform.
> +	 * This field used to contain a value of enum i915_cache_level. It's
> +	 * changed to an unsigned int because PAT indices are being used by
> +	 * both UMD and KMD for caching policy control after GEN12.
> +	 * For backward compatibility, this field will continue to contain
> +	 * value of i915_cache_level for pre-GEN12 platforms so that the PTE
> +	 * encode functions for these legacy platforms can stay the same.
> +	 * In the meantime platform specific tables are created to translate
> +	 * i915_cache_level into pat index, for more details check the macros
> +	 * defined i915/i915_pci.c, e.g. PVC_CACHELEVEL.
> +	 */
> +	unsigned int pat_index:6;
> +	/**
> +	 * @cache_level: Indicate whether pat_index is set by UMD
>   	 *
> -	 * See enum i915_cache_level for possible values, along with what
> -	 * each does.
> +	 * This used to hold desired GTT caching level, but is now replaced by
> +	 * pat_index. It's kept here for KMD to tell whether the pat_index is
> +	 * set by UMD or converted from enum i915_cache_level.
> +	 * This field should be 0 by default, but I915_CACHE_INVAL if the
> +	 * pat_index is set by UMD.
>   	 */
>   	unsigned int cache_level:3;
>   	/**
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> index ee492d823f1b..3b094d36a0b0 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> @@ -565,7 +565,9 @@ static void dbg_poison(struct i915_ggtt *ggtt,
>   
>   		ggtt->vm.insert_page(&ggtt->vm, addr,
>   				     ggtt->error_capture.start,
> -				     I915_CACHE_NONE, 0);
> +				     i915_gem_get_pat_index(ggtt->vm.i915,
> +							    I915_CACHE_NONE),
> +				     0);
>   		mb();
>   
>   		s = io_mapping_map_wc(&ggtt->iomap,
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> index 69eb20ed4d47..e40761e13c2a 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> @@ -214,7 +214,8 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>   
>   		intel_engine_pm_get(to_gt(i915)->migrate.context->engine);
>   		ret = intel_context_migrate_clear(to_gt(i915)->migrate.context, deps,
> -						  dst_st->sgl, dst_level,
> +						  dst_st->sgl,
> +						  i915_gem_get_pat_index(i915, dst_level),
>   						  i915_ttm_gtt_binds_lmem(dst_mem),
>   						  0, &rq);
>   	} else {
> @@ -227,12 +228,13 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>   		src_level = i915_ttm_cache_level(i915, bo->resource, src_ttm);
>   		intel_engine_pm_get(to_gt(i915)->migrate.context->engine);
>   		ret = intel_context_migrate_copy(to_gt(i915)->migrate.context,
> -						 deps, src_rsgt->table.sgl,
> -						 src_level,
> -						 i915_ttm_gtt_binds_lmem(bo->resource),
> -						 dst_st->sgl, dst_level,
> -						 i915_ttm_gtt_binds_lmem(dst_mem),
> -						 &rq);
> +					deps, src_rsgt->table.sgl,
> +					i915_gem_get_pat_index(i915, src_level),
> +					i915_ttm_gtt_binds_lmem(bo->resource),
> +					dst_st->sgl,
> +					i915_gem_get_pat_index(i915, dst_level),
> +					i915_ttm_gtt_binds_lmem(dst_mem),
> +					&rq);
>   
>   		i915_refct_sgt_put(src_rsgt);
>   	}
> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> index defece0bcb81..ebb68ac9cd5e 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> @@ -354,7 +354,7 @@ fake_huge_pages_object(struct drm_i915_private *i915, u64 size, bool single)
>   
>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
> -	obj->cache_level = I915_CACHE_NONE;
> +	obj->pat_index = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
>   
>   	return obj;
>   }
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
> index fe6c37fd7859..a93a90b15907 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
> @@ -219,7 +219,7 @@ static int __igt_lmem_pages_migrate(struct intel_gt *gt,
>   			continue;
>   
>   		err = intel_migrate_clear(&gt->migrate, &ww, deps,
> -					  obj->mm.pages->sgl, obj->cache_level,
> +					  obj->mm.pages->sgl, obj->pat_index,
>   					  i915_gem_object_is_lmem(obj),
>   					  0xdeadbeaf, &rq);
>   		if (rq) {
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> index 56279908ed30..a93d8f9f8bc1 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> @@ -1222,7 +1222,7 @@ static int __igt_mmap_migrate(struct intel_memory_region **placements,
>   	}
>   
>   	err = intel_context_migrate_clear(to_gt(i915)->migrate.context, NULL,
> -					  obj->mm.pages->sgl, obj->cache_level,
> +					  obj->mm.pages->sgl, obj->pat_index,
>   					  i915_gem_object_is_lmem(obj),
>   					  expand32(POISON_INUSE), &rq);
>   	i915_gem_object_unpin_pages(obj);
> diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> index 5aaacc53fa4c..c2bdc133c89a 100644
> --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> @@ -109,7 +109,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
>   
>   static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>   				      struct i915_vma_resource *vma_res,
> -				      enum i915_cache_level cache_level,
> +				      unsigned int pat_index,
>   				      u32 flags)
>   {
>   	struct i915_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
> @@ -117,7 +117,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>   	unsigned int first_entry = vma_res->start / I915_GTT_PAGE_SIZE;
>   	unsigned int act_pt = first_entry / GEN6_PTES;
>   	unsigned int act_pte = first_entry % GEN6_PTES;
> -	const u32 pte_encode = vm->pte_encode(0, cache_level, flags);
> +	const u32 pte_encode = vm->pte_encode(0, pat_index, flags);
>   	struct sgt_dma iter = sgt_dma(vma_res);
>   	gen6_pte_t *vaddr;
>   
> @@ -227,7 +227,9 @@ static int gen6_ppgtt_init_scratch(struct gen6_ppgtt *ppgtt)
>   
>   	vm->scratch[0]->encode =
>   		vm->pte_encode(px_dma(vm->scratch[0]),
> -			       I915_CACHE_NONE, PTE_READ_ONLY);
> +			       i915_gem_get_pat_index(vm->i915,
> +						      I915_CACHE_NONE),
> +			       PTE_READ_ONLY);
>   
>   	vm->scratch[1] = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
>   	if (IS_ERR(vm->scratch[1])) {
> @@ -278,7 +280,7 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
>   static void pd_vma_bind(struct i915_address_space *vm,
>   			struct i915_vm_pt_stash *stash,
>   			struct i915_vma_resource *vma_res,
> -			enum i915_cache_level cache_level,
> +			unsigned int pat_index,
>   			u32 unused)
>   {
>   	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index 7a4b1d1afce9..c046813514f4 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -56,7 +56,7 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>   }
>   
>   static u64 mtl_pte_encode(dma_addr_t addr,
> -			  enum i915_cache_level level,
> +			  unsigned int pat_index,
>   			  u32 flags)
>   {
>   	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
> @@ -67,24 +67,17 @@ static u64 mtl_pte_encode(dma_addr_t addr,
>   	if (flags & PTE_LM)
>   		pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;
>   
> -	switch (level) {
> -	case I915_CACHE_NONE:
> -		pte |= GEN12_PPGTT_PTE_PAT1;
> -		break;
> -	case I915_CACHE_LLC:
> -	case I915_CACHE_L3_LLC:
> -		pte |= GEN12_PPGTT_PTE_PAT0 | GEN12_PPGTT_PTE_PAT1;
> -		break;
> -	case I915_CACHE_WT:
> +	if (pat_index & BIT(0))
>   		pte |= GEN12_PPGTT_PTE_PAT0;
> -		break;
> -	default:
> -		/* This should never happen. Added to deal with the compile
> -		 * error due to the addition of I915_MAX_CACHE_LEVEL. Will
> -		 * be removed by the pat_index patch.
> -		 */
> -		break;
> -	}
> +
> +	if (pat_index & BIT(1))
> +		pte |= GEN12_PPGTT_PTE_PAT1;
> +
> +	if (pat_index & BIT(2))
> +		pte |= GEN12_PPGTT_PTE_PAT2;
> +
> +	if (pat_index & BIT(3))
> +		pte |= MTL_PPGTT_PTE_PAT3;
>   
>   	return pte;
>   }
> @@ -457,11 +450,11 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
>   		      struct i915_page_directory *pdp,
>   		      struct sgt_dma *iter,
>   		      u64 idx,
> -		      enum i915_cache_level cache_level,
> +		      unsigned int pat_index,
>   		      u32 flags)
>   {
>   	struct i915_page_directory *pd;
> -	const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, cache_level, flags);
> +	const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, pat_index, flags);
>   	gen8_pte_t *vaddr;
>   
>   	pd = i915_pd_entry(pdp, gen8_pd_index(idx, 2));
> @@ -504,10 +497,10 @@ static void
>   xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
>   			  struct i915_vma_resource *vma_res,
>   			  struct sgt_dma *iter,
> -			  enum i915_cache_level cache_level,
> +			  unsigned int pat_index,
>   			  u32 flags)
>   {
> -	const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags);
> +	const gen8_pte_t pte_encode = vm->pte_encode(0, pat_index, flags);
>   	unsigned int rem = sg_dma_len(iter->sg);
>   	u64 start = vma_res->start;
>   	u64 end = start + vma_res->vma_size;
> @@ -611,10 +604,10 @@ xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
>   static void gen8_ppgtt_insert_huge(struct i915_address_space *vm,
>   				   struct i915_vma_resource *vma_res,
>   				   struct sgt_dma *iter,
> -				   enum i915_cache_level cache_level,
> +				   unsigned int pat_index,
>   				   u32 flags)
>   {
> -	const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags);
> +	const gen8_pte_t pte_encode = vm->pte_encode(0, pat_index, flags);
>   	unsigned int rem = sg_dma_len(iter->sg);
>   	u64 start = vma_res->start;
>   
> @@ -734,7 +727,7 @@ static void gen8_ppgtt_insert_huge(struct i915_address_space *vm,
>   
>   static void gen8_ppgtt_insert(struct i915_address_space *vm,
>   			      struct i915_vma_resource *vma_res,
> -			      enum i915_cache_level cache_level,
> +			      unsigned int pat_index,
>   			      u32 flags)
>   {
>   	struct i915_ppgtt * const ppgtt = i915_vm_to_ppgtt(vm);
> @@ -742,9 +735,9 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
>   
>   	if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) {
>   		if (HAS_64K_PAGES(vm->i915))
> -			xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
> +			xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, pat_index, flags);
>   		else
> -			gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
> +			gen8_ppgtt_insert_huge(vm, vma_res, &iter, pat_index, flags);
>   	} else  {
>   		u64 idx = vma_res->start >> GEN8_PTE_SHIFT;
>   
> @@ -753,7 +746,7 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
>   				gen8_pdp_for_page_index(vm, idx);
>   
>   			idx = gen8_ppgtt_insert_pte(ppgtt, pdp, &iter, idx,
> -						    cache_level, flags);
> +						    pat_index, flags);
>   		} while (idx);
>   
>   		vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE;
> @@ -763,7 +756,7 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
>   static void gen8_ppgtt_insert_entry(struct i915_address_space *vm,
>   				    dma_addr_t addr,
>   				    u64 offset,
> -				    enum i915_cache_level level,
> +				    unsigned int pat_index,
>   				    u32 flags)
>   {
>   	u64 idx = offset >> GEN8_PTE_SHIFT;
> @@ -777,14 +770,14 @@ static void gen8_ppgtt_insert_entry(struct i915_address_space *vm,
>   	GEM_BUG_ON(pt->is_compact);
>   
>   	vaddr = px_vaddr(pt);
> -	vaddr[gen8_pd_index(idx, 0)] = vm->pte_encode(addr, level, flags);
> +	vaddr[gen8_pd_index(idx, 0)] = vm->pte_encode(addr, pat_index, flags);
>   	drm_clflush_virt_range(&vaddr[gen8_pd_index(idx, 0)], sizeof(*vaddr));
>   }
>   
>   static void __xehpsdv_ppgtt_insert_entry_lm(struct i915_address_space *vm,
>   					    dma_addr_t addr,
>   					    u64 offset,
> -					    enum i915_cache_level level,
> +					    unsigned int pat_index,
>   					    u32 flags)
>   {
>   	u64 idx = offset >> GEN8_PTE_SHIFT;
> @@ -807,20 +800,20 @@ static void __xehpsdv_ppgtt_insert_entry_lm(struct i915_address_space *vm,
>   	}
>   
>   	vaddr = px_vaddr(pt);
> -	vaddr[gen8_pd_index(idx, 0) / 16] = vm->pte_encode(addr, level, flags);
> +	vaddr[gen8_pd_index(idx, 0) / 16] = vm->pte_encode(addr, pat_index, flags);
>   }
>   
>   static void xehpsdv_ppgtt_insert_entry(struct i915_address_space *vm,
>   				       dma_addr_t addr,
>   				       u64 offset,
> -				       enum i915_cache_level level,
> +				       unsigned int pat_index,
>   				       u32 flags)
>   {
>   	if (flags & PTE_LM)
>   		return __xehpsdv_ppgtt_insert_entry_lm(vm, addr, offset,
> -						       level, flags);
> +						       pat_index, flags);
>   
> -	return gen8_ppgtt_insert_entry(vm, addr, offset, level, flags);
> +	return gen8_ppgtt_insert_entry(vm, addr, offset, pat_index, flags);
>   }
>   
>   static int gen8_init_scratch(struct i915_address_space *vm)
> @@ -855,7 +848,9 @@ static int gen8_init_scratch(struct i915_address_space *vm)
>   
>   	vm->scratch[0]->encode =
>   		vm->pte_encode(px_dma(vm->scratch[0]),
> -			       I915_CACHE_NONE, pte_flags);
> +			       i915_gem_get_pat_index(vm->i915,
> +						      I915_CACHE_NONE),
> +			       pte_flags);
>   
>   	for (i = 1; i <= vm->top; i++) {
>   		struct drm_i915_gem_object *obj;
> @@ -873,7 +868,9 @@ static int gen8_init_scratch(struct i915_address_space *vm)
>   		}
>   
>   		fill_px(obj, vm->scratch[i - 1]->encode);
> -		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_NONE);
> +		obj->encode = gen8_pde_encode(px_dma(obj),
> +					      i915_gem_get_pat_index(vm->i915,
> +								     I915_CACHE_NONE));
>   
>   		vm->scratch[i] = obj;
>   	}
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
> index f541d19264b4..19c635441642 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
> @@ -10,13 +10,12 @@
>   
>   struct i915_address_space;
>   struct intel_gt;
> -enum i915_cache_level;
>   
>   struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
>   				     unsigned long lmem_pt_obj_flags);
>   
>   u64 gen8_ggtt_pte_encode(dma_addr_t addr,
> -			 enum i915_cache_level level,
> +			 unsigned int pat_index,
>   			 u32 flags);
>   
>   #endif
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index c8390d03fce2..2a7942fac798 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -221,7 +221,7 @@ static void guc_ggtt_invalidate(struct i915_ggtt *ggtt)
>   }
>   
>   static u64 mtl_ggtt_pte_encode(dma_addr_t addr,
> -			       enum i915_cache_level level,
> +			       unsigned int pat_index,
>   			       u32 flags)
>   {
>   	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT;
> @@ -231,30 +231,17 @@ static u64 mtl_ggtt_pte_encode(dma_addr_t addr,
>   	if (flags & PTE_LM)
>   		pte |= GEN12_GGTT_PTE_LM;
>   
> -	switch (level) {
> -	case I915_CACHE_NONE:
> -		pte |= MTL_GGTT_PTE_PAT1;
> -		break;
> -	case I915_CACHE_LLC:
> -	case I915_CACHE_L3_LLC:
> -		pte |= MTL_GGTT_PTE_PAT0 | MTL_GGTT_PTE_PAT1;
> -		break;
> -	case I915_CACHE_WT:
> +	if (pat_index & BIT(0))
>   		pte |= MTL_GGTT_PTE_PAT0;
> -		break;
> -	default:
> -		/* This should never happen. Added to deal with the compile
> -		 * error due to the addition of I915_MAX_CACHE_LEVEL. Will
> -		 * be removed by the pat_index patch.
> -		 */
> -		break;
> -	}
> +
> +	if (pat_index & BIT(1))
> +		pte |= MTL_GGTT_PTE_PAT1;
>   
>   	return pte;
>   }
>   
>   u64 gen8_ggtt_pte_encode(dma_addr_t addr,
> -			 enum i915_cache_level level,
> +			 unsigned int pat_index,
>   			 u32 flags)
>   {
>   	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT;
> @@ -273,25 +260,25 @@ static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
>   static void gen8_ggtt_insert_page(struct i915_address_space *vm,
>   				  dma_addr_t addr,
>   				  u64 offset,
> -				  enum i915_cache_level level,
> +				  unsigned int pat_index,
>   				  u32 flags)
>   {
>   	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
>   	gen8_pte_t __iomem *pte =
>   		(gen8_pte_t __iomem *)ggtt->gsm + offset / I915_GTT_PAGE_SIZE;
>   
> -	gen8_set_pte(pte, ggtt->vm.pte_encode(addr, level, flags));
> +	gen8_set_pte(pte, ggtt->vm.pte_encode(addr, pat_index, flags));
>   
>   	ggtt->invalidate(ggtt);
>   }
>   
>   static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
>   				     struct i915_vma_resource *vma_res,
> -				     enum i915_cache_level level,
> +				     unsigned int pat_index,
>   				     u32 flags)
>   {
>   	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
> -	const gen8_pte_t pte_encode = ggtt->vm.pte_encode(0, level, flags);
> +	const gen8_pte_t pte_encode = ggtt->vm.pte_encode(0, pat_index, flags);
>   	gen8_pte_t __iomem *gte;
>   	gen8_pte_t __iomem *end;
>   	struct sgt_iter iter;
> @@ -348,14 +335,14 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
>   static void gen6_ggtt_insert_page(struct i915_address_space *vm,
>   				  dma_addr_t addr,
>   				  u64 offset,
> -				  enum i915_cache_level level,
> +				  unsigned int pat_index,
>   				  u32 flags)
>   {
>   	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
>   	gen6_pte_t __iomem *pte =
>   		(gen6_pte_t __iomem *)ggtt->gsm + offset / I915_GTT_PAGE_SIZE;
>   
> -	iowrite32(vm->pte_encode(addr, level, flags), pte);
> +	iowrite32(vm->pte_encode(addr, pat_index, flags), pte);
>   
>   	ggtt->invalidate(ggtt);
>   }
> @@ -368,7 +355,7 @@ static void gen6_ggtt_insert_page(struct i915_address_space *vm,
>    */
>   static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
>   				     struct i915_vma_resource *vma_res,
> -				     enum i915_cache_level level,
> +				     unsigned int pat_index,
>   				     u32 flags)
>   {
>   	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
> @@ -385,7 +372,7 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
>   		iowrite32(vm->scratch[0]->encode, gte++);
>   	end += (vma_res->node_size + vma_res->guard) / I915_GTT_PAGE_SIZE;
>   	for_each_sgt_daddr(addr, iter, vma_res->bi.pages)
> -		iowrite32(vm->pte_encode(addr, level, flags), gte++);
> +		iowrite32(vm->pte_encode(addr, pat_index, flags), gte++);
>   	GEM_BUG_ON(gte > end);
>   
>   	/* Fill the allocated but "unused" space beyond the end of the buffer */
> @@ -420,14 +407,15 @@ struct insert_page {
>   	struct i915_address_space *vm;
>   	dma_addr_t addr;
>   	u64 offset;
> -	enum i915_cache_level level;
> +	unsigned int pat_index;
>   };
>   
>   static int bxt_vtd_ggtt_insert_page__cb(void *_arg)
>   {
>   	struct insert_page *arg = _arg;
>   
> -	gen8_ggtt_insert_page(arg->vm, arg->addr, arg->offset, arg->level, 0);
> +	gen8_ggtt_insert_page(arg->vm, arg->addr, arg->offset,
> +			      arg->pat_index, 0);
>   	bxt_vtd_ggtt_wa(arg->vm);
>   
>   	return 0;
> @@ -436,10 +424,10 @@ static int bxt_vtd_ggtt_insert_page__cb(void *_arg)
>   static void bxt_vtd_ggtt_insert_page__BKL(struct i915_address_space *vm,
>   					  dma_addr_t addr,
>   					  u64 offset,
> -					  enum i915_cache_level level,
> +					  unsigned int pat_index,
>   					  u32 unused)
>   {
> -	struct insert_page arg = { vm, addr, offset, level };
> +	struct insert_page arg = { vm, addr, offset, pat_index };
>   
>   	stop_machine(bxt_vtd_ggtt_insert_page__cb, &arg, NULL);
>   }
> @@ -447,7 +435,7 @@ static void bxt_vtd_ggtt_insert_page__BKL(struct i915_address_space *vm,
>   struct insert_entries {
>   	struct i915_address_space *vm;
>   	struct i915_vma_resource *vma_res;
> -	enum i915_cache_level level;
> +	unsigned int pat_index;
>   	u32 flags;
>   };
>   
> @@ -455,7 +443,8 @@ static int bxt_vtd_ggtt_insert_entries__cb(void *_arg)
>   {
>   	struct insert_entries *arg = _arg;
>   
> -	gen8_ggtt_insert_entries(arg->vm, arg->vma_res, arg->level, arg->flags);
> +	gen8_ggtt_insert_entries(arg->vm, arg->vma_res,
> +				 arg->pat_index, arg->flags);
>   	bxt_vtd_ggtt_wa(arg->vm);
>   
>   	return 0;
> @@ -463,10 +452,10 @@ static int bxt_vtd_ggtt_insert_entries__cb(void *_arg)
>   
>   static void bxt_vtd_ggtt_insert_entries__BKL(struct i915_address_space *vm,
>   					     struct i915_vma_resource *vma_res,
> -					     enum i915_cache_level level,
> +					     unsigned int pat_index,
>   					     u32 flags)
>   {
> -	struct insert_entries arg = { vm, vma_res, level, flags };
> +	struct insert_entries arg = { vm, vma_res, pat_index, flags };
>   
>   	stop_machine(bxt_vtd_ggtt_insert_entries__cb, &arg, NULL);
>   }
> @@ -495,7 +484,7 @@ static void gen6_ggtt_clear_range(struct i915_address_space *vm,
>   void intel_ggtt_bind_vma(struct i915_address_space *vm,
>   			 struct i915_vm_pt_stash *stash,
>   			 struct i915_vma_resource *vma_res,
> -			 enum i915_cache_level cache_level,
> +			 unsigned int pat_index,
>   			 u32 flags)
>   {
>   	u32 pte_flags;
> @@ -512,7 +501,7 @@ void intel_ggtt_bind_vma(struct i915_address_space *vm,
>   	if (vma_res->bi.lmem)
>   		pte_flags |= PTE_LM;
>   
> -	vm->insert_entries(vm, vma_res, cache_level, pte_flags);
> +	vm->insert_entries(vm, vma_res, pat_index, pte_flags);
>   	vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE;
>   }
>   
> @@ -661,7 +650,7 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>   static void aliasing_gtt_bind_vma(struct i915_address_space *vm,
>   				  struct i915_vm_pt_stash *stash,
>   				  struct i915_vma_resource *vma_res,
> -				  enum i915_cache_level cache_level,
> +				  unsigned int pat_index,
>   				  u32 flags)
>   {
>   	u32 pte_flags;
> @@ -673,10 +662,10 @@ static void aliasing_gtt_bind_vma(struct i915_address_space *vm,
>   
>   	if (flags & I915_VMA_LOCAL_BIND)
>   		ppgtt_bind_vma(&i915_vm_to_ggtt(vm)->alias->vm,
> -			       stash, vma_res, cache_level, flags);
> +			       stash, vma_res, pat_index, flags);
>   
>   	if (flags & I915_VMA_GLOBAL_BIND)
> -		vm->insert_entries(vm, vma_res, cache_level, pte_flags);
> +		vm->insert_entries(vm, vma_res, pat_index, pte_flags);
>   
>   	vma_res->bound_flags |= flags;
>   }
> @@ -933,7 +922,9 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size)
>   
>   	ggtt->vm.scratch[0]->encode =
>   		ggtt->vm.pte_encode(px_dma(ggtt->vm.scratch[0]),
> -				    I915_CACHE_NONE, pte_flags);
> +				    i915_gem_get_pat_index(i915,
> +							   I915_CACHE_NONE),
> +				    pte_flags);
>   
>   	return 0;
>   }
> @@ -1022,6 +1013,11 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
>   	return ggtt_probe_common(ggtt, size);
>   }
>   
> +/*
> + * For pre-gen8 platforms pat_index is the same as enum i915_cache_level,
> + * so these PTE encode functions are left with using cache_level.
> + * See translation table LEGACY_CACHELEVEL.
> + */
>   static u64 snb_pte_encode(dma_addr_t addr,
>   			  enum i915_cache_level level,
>   			  u32 flags)
> @@ -1302,7 +1298,9 @@ bool i915_ggtt_resume_vm(struct i915_address_space *vm)
>   		 */
>   		vma->resource->bound_flags = 0;
>   		vma->ops->bind_vma(vm, NULL, vma->resource,
> -				   obj ? obj->cache_level : 0,
> +				   obj ? obj->pat_index :
> +					 i915_gem_get_pat_index(vm->i915,
> +								I915_CACHE_NONE),
>   				   was_bound);
>   
>   		if (obj) { /* only used during resume => exclusive access */
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index 854ec09fd588..be767e13b1e5 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -165,8 +165,6 @@ typedef u64 gen8_pte_t;
>   #define MTL_2_COH_1W	REG_FIELD_PREP(MTL_PAT_INDEX_COH_MODE_MASK, 2)
>   #define MTL_0_COH_NON	REG_FIELD_PREP(MTL_PAT_INDEX_COH_MODE_MASK, 0)
>   
> -enum i915_cache_level;
> -
>   struct drm_i915_gem_object;
>   struct i915_fence_reg;
>   struct i915_vma;
> @@ -234,7 +232,7 @@ struct i915_vma_ops {
>   	void (*bind_vma)(struct i915_address_space *vm,
>   			 struct i915_vm_pt_stash *stash,
>   			 struct i915_vma_resource *vma_res,
> -			 enum i915_cache_level cache_level,
> +			 unsigned int pat_index,
>   			 u32 flags);
>   	/*
>   	 * Unmap an object from an address space. This usually consists of
> @@ -306,7 +304,7 @@ struct i915_address_space {
>   		(*alloc_scratch_dma)(struct i915_address_space *vm, int sz);
>   
>   	u64 (*pte_encode)(dma_addr_t addr,
> -			  enum i915_cache_level level,
> +			  unsigned int pat_index,
>   			  u32 flags); /* Create a valid PTE */
>   #define PTE_READ_ONLY	BIT(0)
>   #define PTE_LM		BIT(1)
> @@ -321,20 +319,20 @@ struct i915_address_space {
>   	void (*insert_page)(struct i915_address_space *vm,
>   			    dma_addr_t addr,
>   			    u64 offset,
> -			    enum i915_cache_level cache_level,
> +			    unsigned int pat_index,
>   			    u32 flags);
>   	void (*insert_entries)(struct i915_address_space *vm,
>   			       struct i915_vma_resource *vma_res,
> -			       enum i915_cache_level cache_level,
> +			       unsigned int pat_index,
>   			       u32 flags);
>   	void (*raw_insert_page)(struct i915_address_space *vm,
>   				dma_addr_t addr,
>   				u64 offset,
> -				enum i915_cache_level cache_level,
> +				unsigned int pat_index,
>   				u32 flags);
>   	void (*raw_insert_entries)(struct i915_address_space *vm,
>   				   struct i915_vma_resource *vma_res,
> -				   enum i915_cache_level cache_level,
> +				   unsigned int pat_index,
>   				   u32 flags);
>   	void (*cleanup)(struct i915_address_space *vm);
>   
> @@ -581,7 +579,7 @@ void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt,
>   void intel_ggtt_bind_vma(struct i915_address_space *vm,
>   			 struct i915_vm_pt_stash *stash,
>   			 struct i915_vma_resource *vma_res,
> -			 enum i915_cache_level cache_level,
> +			 unsigned int pat_index,
>   			 u32 flags);
>   void intel_ggtt_unbind_vma(struct i915_address_space *vm,
>   			   struct i915_vma_resource *vma_res);
> @@ -639,7 +637,7 @@ void
>   __set_pd_entry(struct i915_page_directory * const pd,
>   	       const unsigned short idx,
>   	       struct i915_page_table *pt,
> -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level));
> +	       u64 (*encode)(const dma_addr_t, const unsigned int pat_index));
>   
>   #define set_pd_entry(pd, idx, to) \
>   	__set_pd_entry((pd), (idx), px_pt(to), gen8_pde_encode)
> @@ -659,7 +657,7 @@ void gen6_ggtt_invalidate(struct i915_ggtt *ggtt);
>   void ppgtt_bind_vma(struct i915_address_space *vm,
>   		    struct i915_vm_pt_stash *stash,
>   		    struct i915_vma_resource *vma_res,
> -		    enum i915_cache_level cache_level,
> +		    unsigned int pat_index,
>   		    u32 flags);
>   void ppgtt_unbind_vma(struct i915_address_space *vm,
>   		      struct i915_vma_resource *vma_res);
> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
> index 3f638f198796..117c3d05af3e 100644
> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> @@ -45,7 +45,9 @@ static void xehpsdv_toggle_pdes(struct i915_address_space *vm,
>   	 * Insert a dummy PTE into every PT that will map to LMEM to ensure
>   	 * we have a correctly setup PDE structure for later use.
>   	 */
> -	vm->insert_page(vm, 0, d->offset, I915_CACHE_NONE, PTE_LM);
> +	vm->insert_page(vm, 0, d->offset,
> +			i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
> +			PTE_LM);
>   	GEM_BUG_ON(!pt->is_compact);
>   	d->offset += SZ_2M;
>   }
> @@ -63,7 +65,9 @@ static void xehpsdv_insert_pte(struct i915_address_space *vm,
>   	 * alignment is 64K underneath for the pt, and we are careful
>   	 * not to access the space in the void.
>   	 */
> -	vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE, PTE_LM);
> +	vm->insert_page(vm, px_dma(pt), d->offset,
> +			i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
> +			PTE_LM);
>   	d->offset += SZ_64K;
>   }
>   
> @@ -73,7 +77,8 @@ static void insert_pte(struct i915_address_space *vm,
>   {
>   	struct insert_pte_data *d = data;
>   
> -	vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE,
> +	vm->insert_page(vm, px_dma(pt), d->offset,
> +			i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
>   			i915_gem_object_is_lmem(pt->base) ? PTE_LM : 0);
>   	d->offset += PAGE_SIZE;
>   }
> @@ -356,13 +361,13 @@ static int max_pte_pkt_size(struct i915_request *rq, int pkt)
>   
>   static int emit_pte(struct i915_request *rq,
>   		    struct sgt_dma *it,
> -		    enum i915_cache_level cache_level,
> +		    unsigned int pat_index,
>   		    bool is_lmem,
>   		    u64 offset,
>   		    int length)
>   {
>   	bool has_64K_pages = HAS_64K_PAGES(rq->engine->i915);
> -	const u64 encode = rq->context->vm->pte_encode(0, cache_level,
> +	const u64 encode = rq->context->vm->pte_encode(0, pat_index,
>   						       is_lmem ? PTE_LM : 0);
>   	struct intel_ring *ring = rq->ring;
>   	int pkt, dword_length;
> @@ -673,17 +678,17 @@ int
>   intel_context_migrate_copy(struct intel_context *ce,
>   			   const struct i915_deps *deps,
>   			   struct scatterlist *src,
> -			   enum i915_cache_level src_cache_level,
> +			   unsigned int src_pat_index,
>   			   bool src_is_lmem,
>   			   struct scatterlist *dst,
> -			   enum i915_cache_level dst_cache_level,
> +			   unsigned int dst_pat_index,
>   			   bool dst_is_lmem,
>   			   struct i915_request **out)
>   {
>   	struct sgt_dma it_src = sg_sgt(src), it_dst = sg_sgt(dst), it_ccs;
>   	struct drm_i915_private *i915 = ce->engine->i915;
>   	u64 ccs_bytes_to_cpy = 0, bytes_to_cpy;
> -	enum i915_cache_level ccs_cache_level;
> +	unsigned int ccs_pat_index;
>   	u32 src_offset, dst_offset;
>   	u8 src_access, dst_access;
>   	struct i915_request *rq;
> @@ -707,12 +712,12 @@ intel_context_migrate_copy(struct intel_context *ce,
>   		dst_sz = scatter_list_length(dst);
>   		if (src_is_lmem) {
>   			it_ccs = it_dst;
> -			ccs_cache_level = dst_cache_level;
> +			ccs_pat_index = dst_pat_index;
>   			ccs_is_src = false;
>   		} else if (dst_is_lmem) {
>   			bytes_to_cpy = dst_sz;
>   			it_ccs = it_src;
> -			ccs_cache_level = src_cache_level;
> +			ccs_pat_index = src_pat_index;
>   			ccs_is_src = true;
>   		}
>   
> @@ -773,7 +778,7 @@ intel_context_migrate_copy(struct intel_context *ce,
>   		src_sz = calculate_chunk_sz(i915, src_is_lmem,
>   					    bytes_to_cpy, ccs_bytes_to_cpy);
>   
> -		len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem,
> +		len = emit_pte(rq, &it_src, src_pat_index, src_is_lmem,
>   			       src_offset, src_sz);
>   		if (!len) {
>   			err = -EINVAL;
> @@ -784,7 +789,7 @@ intel_context_migrate_copy(struct intel_context *ce,
>   			goto out_rq;
>   		}
>   
> -		err = emit_pte(rq, &it_dst, dst_cache_level, dst_is_lmem,
> +		err = emit_pte(rq, &it_dst, dst_pat_index, dst_is_lmem,
>   			       dst_offset, len);
>   		if (err < 0)
>   			goto out_rq;
> @@ -811,7 +816,7 @@ intel_context_migrate_copy(struct intel_context *ce,
>   				goto out_rq;
>   
>   			ccs_sz = GET_CCS_BYTES(i915, len);
> -			err = emit_pte(rq, &it_ccs, ccs_cache_level, false,
> +			err = emit_pte(rq, &it_ccs, ccs_pat_index, false,
>   				       ccs_is_src ? src_offset : dst_offset,
>   				       ccs_sz);
>   			if (err < 0)
> @@ -979,7 +984,7 @@ int
>   intel_context_migrate_clear(struct intel_context *ce,
>   			    const struct i915_deps *deps,
>   			    struct scatterlist *sg,
> -			    enum i915_cache_level cache_level,
> +			    unsigned int pat_index,
>   			    bool is_lmem,
>   			    u32 value,
>   			    struct i915_request **out)
> @@ -1027,7 +1032,7 @@ intel_context_migrate_clear(struct intel_context *ce,
>   		if (err)
>   			goto out_rq;
>   
> -		len = emit_pte(rq, &it, cache_level, is_lmem, offset, CHUNK_SZ);
> +		len = emit_pte(rq, &it, pat_index, is_lmem, offset, CHUNK_SZ);
>   		if (len <= 0) {
>   			err = len;
>   			goto out_rq;
> @@ -1074,10 +1079,10 @@ int intel_migrate_copy(struct intel_migrate *m,
>   		       struct i915_gem_ww_ctx *ww,
>   		       const struct i915_deps *deps,
>   		       struct scatterlist *src,
> -		       enum i915_cache_level src_cache_level,
> +		       unsigned int src_pat_index,
>   		       bool src_is_lmem,
>   		       struct scatterlist *dst,
> -		       enum i915_cache_level dst_cache_level,
> +		       unsigned int dst_pat_index,
>   		       bool dst_is_lmem,
>   		       struct i915_request **out)
>   {
> @@ -1098,8 +1103,8 @@ int intel_migrate_copy(struct intel_migrate *m,
>   		goto out;
>   
>   	err = intel_context_migrate_copy(ce, deps,
> -					 src, src_cache_level, src_is_lmem,
> -					 dst, dst_cache_level, dst_is_lmem,
> +					 src, src_pat_index, src_is_lmem,
> +					 dst, dst_pat_index, dst_is_lmem,
>   					 out);
>   
>   	intel_context_unpin(ce);
> @@ -1113,7 +1118,7 @@ intel_migrate_clear(struct intel_migrate *m,
>   		    struct i915_gem_ww_ctx *ww,
>   		    const struct i915_deps *deps,
>   		    struct scatterlist *sg,
> -		    enum i915_cache_level cache_level,
> +		    unsigned int pat_index,
>   		    bool is_lmem,
>   		    u32 value,
>   		    struct i915_request **out)
> @@ -1134,7 +1139,7 @@ intel_migrate_clear(struct intel_migrate *m,
>   	if (err)
>   		goto out;
>   
> -	err = intel_context_migrate_clear(ce, deps, sg, cache_level,
> +	err = intel_context_migrate_clear(ce, deps, sg, pat_index,
>   					  is_lmem, value, out);
>   
>   	intel_context_unpin(ce);
> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.h b/drivers/gpu/drm/i915/gt/intel_migrate.h
> index ccc677ec4aa3..11fc09a00c4b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_migrate.h
> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.h
> @@ -16,7 +16,6 @@ struct i915_request;
>   struct i915_gem_ww_ctx;
>   struct intel_gt;
>   struct scatterlist;
> -enum i915_cache_level;
>   
>   int intel_migrate_init(struct intel_migrate *m, struct intel_gt *gt);
>   
> @@ -26,20 +25,20 @@ int intel_migrate_copy(struct intel_migrate *m,
>   		       struct i915_gem_ww_ctx *ww,
>   		       const struct i915_deps *deps,
>   		       struct scatterlist *src,
> -		       enum i915_cache_level src_cache_level,
> +		       unsigned int src_pat_index,
>   		       bool src_is_lmem,
>   		       struct scatterlist *dst,
> -		       enum i915_cache_level dst_cache_level,
> +		       unsigned int dst_pat_index,
>   		       bool dst_is_lmem,
>   		       struct i915_request **out);
>   
>   int intel_context_migrate_copy(struct intel_context *ce,
>   			       const struct i915_deps *deps,
>   			       struct scatterlist *src,
> -			       enum i915_cache_level src_cache_level,
> +			       unsigned int src_pat_index,
>   			       bool src_is_lmem,
>   			       struct scatterlist *dst,
> -			       enum i915_cache_level dst_cache_level,
> +			       unsigned int dst_pat_index,
>   			       bool dst_is_lmem,
>   			       struct i915_request **out);
>   
> @@ -48,7 +47,7 @@ intel_migrate_clear(struct intel_migrate *m,
>   		    struct i915_gem_ww_ctx *ww,
>   		    const struct i915_deps *deps,
>   		    struct scatterlist *sg,
> -		    enum i915_cache_level cache_level,
> +		    unsigned int pat_index,
>   		    bool is_lmem,
>   		    u32 value,
>   		    struct i915_request **out);
> @@ -56,7 +55,7 @@ int
>   intel_context_migrate_clear(struct intel_context *ce,
>   			    const struct i915_deps *deps,
>   			    struct scatterlist *sg,
> -			    enum i915_cache_level cache_level,
> +			    unsigned int pat_index,
>   			    bool is_lmem,
>   			    u32 value,
>   			    struct i915_request **out);
> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> index 7ecfa672f738..f0da3555c6db 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> @@ -98,7 +98,7 @@ void
>   __set_pd_entry(struct i915_page_directory * const pd,
>   	       const unsigned short idx,
>   	       struct i915_page_table * const to,
> -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level))
> +	       u64 (*encode)(const dma_addr_t, const unsigned int))
>   {
>   	/* Each thread pre-pins the pd, and we may have a thread per pde. */
>   	GEM_BUG_ON(atomic_read(px_used(pd)) > NALLOC * I915_PDES);
> @@ -181,7 +181,7 @@ struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt,
>   void ppgtt_bind_vma(struct i915_address_space *vm,
>   		    struct i915_vm_pt_stash *stash,
>   		    struct i915_vma_resource *vma_res,
> -		    enum i915_cache_level cache_level,
> +		    unsigned int pat_index,
>   		    u32 flags)
>   {
>   	u32 pte_flags;
> @@ -199,7 +199,7 @@ void ppgtt_bind_vma(struct i915_address_space *vm,
>   	if (vma_res->bi.lmem)
>   		pte_flags |= PTE_LM;
>   
> -	vm->insert_entries(vm, vma_res, cache_level, pte_flags);
> +	vm->insert_entries(vm, vma_res, pat_index, pte_flags);
>   	wmb();
>   }
>   
> diff --git a/drivers/gpu/drm/i915/gt/selftest_migrate.c b/drivers/gpu/drm/i915/gt/selftest_migrate.c
> index e677f2da093d..3def5ca72dec 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_migrate.c
> @@ -137,7 +137,7 @@ static int copy(struct intel_migrate *migrate,
>   static int intel_context_copy_ccs(struct intel_context *ce,
>   				  const struct i915_deps *deps,
>   				  struct scatterlist *sg,
> -				  enum i915_cache_level cache_level,
> +				  unsigned int pat_index,
>   				  bool write_to_ccs,
>   				  struct i915_request **out)
>   {
> @@ -185,7 +185,7 @@ static int intel_context_copy_ccs(struct intel_context *ce,
>   		if (err)
>   			goto out_rq;
>   
> -		len = emit_pte(rq, &it, cache_level, true, offset, CHUNK_SZ);
> +		len = emit_pte(rq, &it, pat_index, true, offset, CHUNK_SZ);
>   		if (len <= 0) {
>   			err = len;
>   			goto out_rq;
> @@ -223,7 +223,7 @@ intel_migrate_ccs_copy(struct intel_migrate *m,
>   		       struct i915_gem_ww_ctx *ww,
>   		       const struct i915_deps *deps,
>   		       struct scatterlist *sg,
> -		       enum i915_cache_level cache_level,
> +		       unsigned int pat_index,
>   		       bool write_to_ccs,
>   		       struct i915_request **out)
>   {
> @@ -243,7 +243,7 @@ intel_migrate_ccs_copy(struct intel_migrate *m,
>   	if (err)
>   		goto out;
>   
> -	err = intel_context_copy_ccs(ce, deps, sg, cache_level,
> +	err = intel_context_copy_ccs(ce, deps, sg, pat_index,
>   				     write_to_ccs, out);
>   
>   	intel_context_unpin(ce);
> @@ -300,7 +300,7 @@ static int clear(struct intel_migrate *migrate,
>   			/* Write the obj data into ccs surface */
>   			err = intel_migrate_ccs_copy(migrate, &ww, NULL,
>   						     obj->mm.pages->sgl,
> -						     obj->cache_level,
> +						     obj->pat_index,
>   						     true, &rq);
>   			if (rq && !err) {
>   				if (i915_request_wait(rq, 0, HZ) < 0) {
> @@ -351,7 +351,7 @@ static int clear(struct intel_migrate *migrate,
>   
>   			err = intel_migrate_ccs_copy(migrate, &ww, NULL,
>   						     obj->mm.pages->sgl,
> -						     obj->cache_level,
> +						     obj->pat_index,
>   						     false, &rq);
>   			if (rq && !err) {
>   				if (i915_request_wait(rq, 0, HZ) < 0) {
> @@ -414,9 +414,9 @@ static int __migrate_copy(struct intel_migrate *migrate,
>   			  struct i915_request **out)
>   {
>   	return intel_migrate_copy(migrate, ww, NULL,
> -				  src->mm.pages->sgl, src->cache_level,
> +				  src->mm.pages->sgl, src->pat_index,
>   				  i915_gem_object_is_lmem(src),
> -				  dst->mm.pages->sgl, dst->cache_level,
> +				  dst->mm.pages->sgl, dst->pat_index,
>   				  i915_gem_object_is_lmem(dst),
>   				  out);
>   }
> @@ -428,9 +428,9 @@ static int __global_copy(struct intel_migrate *migrate,
>   			 struct i915_request **out)
>   {
>   	return intel_context_migrate_copy(migrate->context, NULL,
> -					  src->mm.pages->sgl, src->cache_level,
> +					  src->mm.pages->sgl, src->pat_index,
>   					  i915_gem_object_is_lmem(src),
> -					  dst->mm.pages->sgl, dst->cache_level,
> +					  dst->mm.pages->sgl, dst->pat_index,
>   					  i915_gem_object_is_lmem(dst),
>   					  out);
>   }
> @@ -455,7 +455,7 @@ static int __migrate_clear(struct intel_migrate *migrate,
>   {
>   	return intel_migrate_clear(migrate, ww, NULL,
>   				   obj->mm.pages->sgl,
> -				   obj->cache_level,
> +				   obj->pat_index,
>   				   i915_gem_object_is_lmem(obj),
>   				   value, out);
>   }
> @@ -468,7 +468,7 @@ static int __global_clear(struct intel_migrate *migrate,
>   {
>   	return intel_context_migrate_clear(migrate->context, NULL,
>   					   obj->mm.pages->sgl,
> -					   obj->cache_level,
> +					   obj->pat_index,
>   					   i915_gem_object_is_lmem(obj),
>   					   value, out);
>   }
> @@ -648,7 +648,7 @@ static int live_emit_pte_full_ring(void *arg)
>   	 */
>   	pr_info("%s emite_pte ring space=%u\n", __func__, rq->ring->space);
>   	it = sg_sgt(obj->mm.pages->sgl);
> -	len = emit_pte(rq, &it, obj->cache_level, false, 0, CHUNK_SZ);
> +	len = emit_pte(rq, &it, obj->pat_index, false, 0, CHUNK_SZ);
>   	if (!len) {
>   		err = -EINVAL;
>   		goto out_rq;
> @@ -844,7 +844,7 @@ static int wrap_ktime_compare(const void *A, const void *B)
>   
>   static int __perf_clear_blt(struct intel_context *ce,
>   			    struct scatterlist *sg,
> -			    enum i915_cache_level cache_level,
> +			    unsigned int pat_index,
>   			    bool is_lmem,
>   			    size_t sz)
>   {
> @@ -858,7 +858,7 @@ static int __perf_clear_blt(struct intel_context *ce,
>   
>   		t0 = ktime_get();
>   
> -		err = intel_context_migrate_clear(ce, NULL, sg, cache_level,
> +		err = intel_context_migrate_clear(ce, NULL, sg, pat_index,
>   						  is_lmem, 0, &rq);
>   		if (rq) {
>   			if (i915_request_wait(rq, 0, MAX_SCHEDULE_TIMEOUT) < 0)
> @@ -904,7 +904,8 @@ static int perf_clear_blt(void *arg)
>   
>   		err = __perf_clear_blt(gt->migrate.context,
>   				       dst->mm.pages->sgl,
> -				       I915_CACHE_NONE,
> +				       i915_gem_get_pat_index(gt->i915,
> +							      I915_CACHE_NONE),
>   				       i915_gem_object_is_lmem(dst),
>   				       sizes[i]);
>   
> @@ -919,10 +920,10 @@ static int perf_clear_blt(void *arg)
>   
>   static int __perf_copy_blt(struct intel_context *ce,
>   			   struct scatterlist *src,
> -			   enum i915_cache_level src_cache_level,
> +			   unsigned int src_pat_index,
>   			   bool src_is_lmem,
>   			   struct scatterlist *dst,
> -			   enum i915_cache_level dst_cache_level,
> +			   unsigned int dst_pat_index,
>   			   bool dst_is_lmem,
>   			   size_t sz)
>   {
> @@ -937,9 +938,9 @@ static int __perf_copy_blt(struct intel_context *ce,
>   		t0 = ktime_get();
>   
>   		err = intel_context_migrate_copy(ce, NULL,
> -						 src, src_cache_level,
> +						 src, src_pat_index,
>   						 src_is_lmem,
> -						 dst, dst_cache_level,
> +						 dst, dst_pat_index,
>   						 dst_is_lmem,
>   						 &rq);
>   		if (rq) {
> @@ -994,10 +995,12 @@ static int perf_copy_blt(void *arg)
>   
>   		err = __perf_copy_blt(gt->migrate.context,
>   				      src->mm.pages->sgl,
> -				      I915_CACHE_NONE,
> +				      i915_gem_get_pat_index(gt->i915,
> +							     I915_CACHE_NONE),
>   				      i915_gem_object_is_lmem(src),
>   				      dst->mm.pages->sgl,
> -				      I915_CACHE_NONE,
> +				      i915_gem_get_pat_index(gt->i915,
> +							     I915_CACHE_NONE),
>   				      i915_gem_object_is_lmem(dst),
>   				      sz);
>   
> diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c
> index a9e0a91bc0e0..79aa6ac66ad2 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_reset.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
> @@ -86,7 +86,9 @@ __igt_reset_stolen(struct intel_gt *gt,
>   
>   		ggtt->vm.insert_page(&ggtt->vm, dma,
>   				     ggtt->error_capture.start,
> -				     I915_CACHE_NONE, 0);
> +				     i915_gem_get_pat_index(gt->i915,
> +							    I915_CACHE_NONE),
> +				     0);
>   		mb();
>   
>   		s = io_mapping_map_wc(&ggtt->iomap,
> @@ -127,7 +129,9 @@ __igt_reset_stolen(struct intel_gt *gt,
>   
>   		ggtt->vm.insert_page(&ggtt->vm, dma,
>   				     ggtt->error_capture.start,
> -				     I915_CACHE_NONE, 0);
> +				     i915_gem_get_pat_index(gt->i915,
> +							    I915_CACHE_NONE),
> +				     0);
>   		mb();
>   
>   		s = io_mapping_map_wc(&ggtt->iomap,
> diff --git a/drivers/gpu/drm/i915/gt/selftest_timeline.c b/drivers/gpu/drm/i915/gt/selftest_timeline.c
> index 9f536c251179..39c3ec12df1a 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_timeline.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_timeline.c
> @@ -836,7 +836,7 @@ static int setup_watcher(struct hwsp_watcher *w, struct intel_gt *gt,
>   		return PTR_ERR(obj);
>   
>   	/* keep the same cache settings as timeline */
> -	i915_gem_object_set_cache_coherency(obj, tl->hwsp_ggtt->obj->cache_level);
> +	i915_gem_object_set_pat_index(obj, tl->hwsp_ggtt->obj->pat_index);
>   	w->map = i915_gem_object_pin_map_unlocked(obj,
>   						  page_unmask_bits(tl->hwsp_ggtt->obj->mm.mapping));
>   	if (IS_ERR(w->map)) {
> diff --git a/drivers/gpu/drm/i915/gt/selftest_tlb.c b/drivers/gpu/drm/i915/gt/selftest_tlb.c
> index e6cac1f15d6e..4493c8518e91 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_tlb.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_tlb.c
> @@ -36,6 +36,8 @@ pte_tlbinv(struct intel_context *ce,
>   	   u64 length,
>   	   struct rnd_state *prng)
>   {
> +	const unsigned int pat_index =
> +		i915_gem_get_pat_index(ce->vm->i915, I915_CACHE_NONE);
>   	struct drm_i915_gem_object *batch;
>   	struct drm_mm_node vb_node;
>   	struct i915_request *rq;
> @@ -155,7 +157,7 @@ pte_tlbinv(struct intel_context *ce,
>   		/* Flip the PTE between A and B */
>   		if (i915_gem_object_is_lmem(vb->obj))
>   			pte_flags |= PTE_LM;
> -		ce->vm->insert_entries(ce->vm, &vb_res, 0, pte_flags);
> +		ce->vm->insert_entries(ce->vm, &vb_res, pat_index, pte_flags);
>   
>   		/* Flush the PTE update to concurrent HW */
>   		tlbinv(ce->vm, addr & -length, length);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
> index a82a53dbbc86..145681ae20a5 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
> @@ -890,9 +890,15 @@ static void uc_fw_bind_ggtt(struct intel_uc_fw *uc_fw)
>   		pte_flags |= PTE_LM;
>   
>   	if (ggtt->vm.raw_insert_entries)
> -		ggtt->vm.raw_insert_entries(&ggtt->vm, dummy, I915_CACHE_NONE, pte_flags);
> +		ggtt->vm.raw_insert_entries(&ggtt->vm, dummy,
> +					    i915_gem_get_pat_index(ggtt->vm.i915,
> +								   I915_CACHE_NONE),
> +					    pte_flags);
>   	else
> -		ggtt->vm.insert_entries(&ggtt->vm, dummy, I915_CACHE_NONE, pte_flags);
> +		ggtt->vm.insert_entries(&ggtt->vm, dummy,
> +					i915_gem_get_pat_index(ggtt->vm.i915,
> +							       I915_CACHE_NONE),
> +					pte_flags);
>   }
>   
>   static void uc_fw_unbind_ggtt(struct intel_uc_fw *uc_fw)
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 41389a32e998..9a4922da3a71 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -139,21 +139,56 @@ static const char *stringify_vma_type(const struct i915_vma *vma)
>   	return "ppgtt";
>   }
>   
> -static const char *i915_cache_level_str(struct drm_i915_private *i915, int type)
> -{
> -	switch (type) {
> -	case I915_CACHE_NONE: return " uncached";
> -	case I915_CACHE_LLC: return HAS_LLC(i915) ? " LLC" : " snooped";
> -	case I915_CACHE_L3_LLC: return " L3+LLC";
> -	case I915_CACHE_WT: return " WT";
> -	default: return "";
> +static const char *i915_cache_level_str(struct drm_i915_gem_object *obj)
> +{
> +	struct drm_i915_private *i915 = obj_to_i915(obj);
> +
> +	if (IS_METEORLAKE(i915)) {
> +		switch (obj->pat_index) {
> +		case 0: return " WB";
> +		case 1: return " WT";
> +		case 2: return " UC";
> +		case 3: return " WB (1-Way Coh)";
> +		case 4: return " WB (2-Way Coh)";
> +		default: return " not defined";
> +		}
> +	} else if (IS_PONTEVECCHIO(i915)) {
> +		switch (obj->pat_index) {
> +		case 0: return " UC";
> +		case 1: return " WC";
> +		case 2: return " WT";
> +		case 3: return " WB";
> +		case 4: return " WT (CLOS1)";
> +		case 5: return " WB (CLOS1)";
> +		case 6: return " WT (CLOS2)";
> +		case 7: return " WB (CLOS2)";
> +		default: return " not defined";
> +		}
> +	} else if (GRAPHICS_VER(i915) >= 12) {
> +		switch (obj->pat_index) {
> +		case 0: return " WB";
> +		case 1: return " WC";
> +		case 2: return " WT";
> +		case 3: return " UC";
> +		default: return " not defined";
> +		}
> +	} else {
> +		if (i915_gem_object_has_cache_level(obj, I915_CACHE_NONE))
> +			return " uncached";
> +		else if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC))
> +			return HAS_LLC(i915) ? " LLC" : " snooped";
> +		else if (i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
> +			return " L3+LLC";
> +		else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
> +			return " WT";
> +		else
> +			return " not defined";
>   	}
>   }
>   
>   void
>   i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>   {
> -	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
>   	struct i915_vma *vma;
>   	int pin_count = 0;
>   
> @@ -165,7 +200,7 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>   		   obj->base.size / 1024,
>   		   obj->read_domains,
>   		   obj->write_domain,
> -		   i915_cache_level_str(dev_priv, obj->cache_level),
> +		   i915_cache_level_str(obj),
>   		   obj->mm.dirty ? " dirty" : "",
>   		   obj->mm.madv == I915_MADV_DONTNEED ? " purgeable" : "");
>   	if (obj->base.name)
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 0a78bdbd36b1..63207b0740b3 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -420,8 +420,12 @@ i915_gem_gtt_pread(struct drm_i915_gem_object *obj,
>   		page_length = remain < page_length ? remain : page_length;
>   		if (drm_mm_node_allocated(&node)) {
>   			ggtt->vm.insert_page(&ggtt->vm,
> -					     i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
> -					     node.start, I915_CACHE_NONE, 0);
> +					i915_gem_object_get_dma_address(obj,
> +									offset >> PAGE_SHIFT),
> +					node.start,
> +					i915_gem_get_pat_index(i915,
> +							       I915_CACHE_NONE),
> +					0);
>   		} else {
>   			page_base += offset & PAGE_MASK;
>   		}
> @@ -598,8 +602,12 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
>   			/* flush the write before we modify the GGTT */
>   			intel_gt_flush_ggtt_writes(ggtt->vm.gt);
>   			ggtt->vm.insert_page(&ggtt->vm,
> -					     i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
> -					     node.start, I915_CACHE_NONE, 0);
> +					i915_gem_object_get_dma_address(obj,
> +									offset >> PAGE_SHIFT),
> +					node.start,
> +					i915_gem_get_pat_index(i915,
> +							       I915_CACHE_NONE),
> +					0);
>   			wmb(); /* flush modifications to the GGTT (insert_page) */
>   		} else {
>   			page_base += offset & PAGE_MASK;
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index f020c0086fbc..2556cabea02c 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1117,10 +1117,14 @@ i915_vma_coredump_create(const struct intel_gt *gt,
>   			mutex_lock(&ggtt->error_mutex);
>   			if (ggtt->vm.raw_insert_page)
>   				ggtt->vm.raw_insert_page(&ggtt->vm, dma, slot,
> -							 I915_CACHE_NONE, 0);
> +						i915_gem_get_pat_index(gt->i915,
> +								       I915_CACHE_NONE),
> +						0);
>   			else
>   				ggtt->vm.insert_page(&ggtt->vm, dma, slot,
> -						     I915_CACHE_NONE, 0);
> +						i915_gem_get_pat_index(gt->i915,
> +								       I915_CACHE_NONE),
> +						0);
>   			mb();
>   
>   			s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 20a44788999e..a814775a363d 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -315,7 +315,7 @@ struct i915_vma_work {
>   	struct i915_vma_resource *vma_res;
>   	struct drm_i915_gem_object *obj;
>   	struct i915_sw_dma_fence_cb cb;
> -	enum i915_cache_level cache_level;
> +	unsigned int pat_index;
>   	unsigned int flags;
>   };
>   
> @@ -334,7 +334,7 @@ static void __vma_bind(struct dma_fence_work *work)
>   		return;
>   
>   	vma_res->ops->bind_vma(vma_res->vm, &vw->stash,
> -			       vma_res, vw->cache_level, vw->flags);
> +			       vma_res, vw->pat_index, vw->flags);
>   }
>   
>   static void __vma_release(struct dma_fence_work *work)
> @@ -426,7 +426,7 @@ i915_vma_resource_init_from_vma(struct i915_vma_resource *vma_res,
>   /**
>    * i915_vma_bind - Sets up PTEs for an VMA in it's corresponding address space.
>    * @vma: VMA to map
> - * @cache_level: mapping cache level
> + * @pat_index: PAT index to set in PTE
>    * @flags: flags like global or local mapping
>    * @work: preallocated worker for allocating and binding the PTE
>    * @vma_res: pointer to a preallocated vma resource. The resource is either
> @@ -437,7 +437,7 @@ i915_vma_resource_init_from_vma(struct i915_vma_resource *vma_res,
>    * Note that DMA addresses are also the only part of the SG table we care about.
>    */
>   int i915_vma_bind(struct i915_vma *vma,
> -		  enum i915_cache_level cache_level,
> +		  unsigned int pat_index,
>   		  u32 flags,
>   		  struct i915_vma_work *work,
>   		  struct i915_vma_resource *vma_res)
> @@ -507,7 +507,7 @@ int i915_vma_bind(struct i915_vma *vma,
>   		struct dma_fence *prev;
>   
>   		work->vma_res = i915_vma_resource_get(vma->resource);
> -		work->cache_level = cache_level;
> +		work->pat_index = pat_index;
>   		work->flags = bind_flags;
>   
>   		/*
> @@ -537,7 +537,7 @@ int i915_vma_bind(struct i915_vma *vma,
>   
>   			return ret;
>   		}
> -		vma->ops->bind_vma(vma->vm, NULL, vma->resource, cache_level,
> +		vma->ops->bind_vma(vma->vm, NULL, vma->resource, pat_index,
>   				   bind_flags);
>   	}
>   
> @@ -814,7 +814,7 @@ i915_vma_insert(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
>   	color = 0;
>   
>   	if (i915_vm_has_cache_coloring(vma->vm))
> -		color = vma->obj->cache_level;
> +		color = vma->obj->pat_index;
>   
>   	if (flags & PIN_OFFSET_FIXED) {
>   		u64 offset = flags & PIN_OFFSET_MASK;
> @@ -1518,7 +1518,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
>   
>   	GEM_BUG_ON(!vma->pages);
>   	err = i915_vma_bind(vma,
> -			    vma->obj->cache_level,
> +			    vma->obj->pat_index,
>   			    flags, work, vma_res);
>   	vma_res = NULL;
>   	if (err)
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index ed5c9d682a1b..31a8f8aa5558 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -250,7 +250,7 @@ i915_vma_compare(struct i915_vma *vma,
>   
>   struct i915_vma_work *i915_vma_work(void);
>   int i915_vma_bind(struct i915_vma *vma,
> -		  enum i915_cache_level cache_level,
> +		  unsigned int pat_index,
>   		  u32 flags,
>   		  struct i915_vma_work *work,
>   		  struct i915_vma_resource *vma_res);
> diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
> index 77fda2244d16..64472b7f0e77 100644
> --- a/drivers/gpu/drm/i915/i915_vma_types.h
> +++ b/drivers/gpu/drm/i915/i915_vma_types.h
> @@ -32,8 +32,6 @@
>   
>   #include "gem/i915_gem_object_types.h"
>   
> -enum i915_cache_level;
> -
>   /**
>    * DOC: Global GTT views
>    *
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem.c b/drivers/gpu/drm/i915/selftests/i915_gem.c
> index d91d0ade8abd..61da4ed9d521 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem.c
> @@ -57,7 +57,10 @@ static void trash_stolen(struct drm_i915_private *i915)
>   		u32 __iomem *s;
>   		int x;
>   
> -		ggtt->vm.insert_page(&ggtt->vm, dma, slot, I915_CACHE_NONE, 0);
> +		ggtt->vm.insert_page(&ggtt->vm, dma, slot,
> +				     i915_gem_get_pat_index(i915,
> +							    I915_CACHE_NONE),
> +				     0);
>   
>   		s = io_mapping_map_atomic_wc(&ggtt->iomap, slot);
>   		for (x = 0; x < PAGE_SIZE / sizeof(u32); x++) {
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> index 37068542aafe..f13a4d265814 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> @@ -245,7 +245,7 @@ static int igt_evict_for_cache_color(void *arg)
>   	struct drm_mm_node target = {
>   		.start = I915_GTT_PAGE_SIZE * 2,
>   		.size = I915_GTT_PAGE_SIZE,
> -		.color = I915_CACHE_LLC,
> +		.color = i915_gem_get_pat_index(gt->i915, I915_CACHE_LLC),
>   	};
>   	struct drm_i915_gem_object *obj;
>   	struct i915_vma *vma;
> @@ -308,7 +308,7 @@ static int igt_evict_for_cache_color(void *arg)
>   	/* Attempt to remove the first *pinned* vma, by removing the (empty)
>   	 * neighbour -- this should fail.
>   	 */
> -	target.color = I915_CACHE_L3_LLC;
> +	target.color = i915_gem_get_pat_index(gt->i915, I915_CACHE_L3_LLC);
>   
>   	mutex_lock(&ggtt->vm.mutex);
>   	err = i915_gem_evict_for_node(&ggtt->vm, NULL, &target, 0);
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> index 154801f1c468..36940ef10108 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> @@ -135,7 +135,7 @@ fake_dma_object(struct drm_i915_private *i915, u64 size)
>   
>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
> -	obj->cache_level = I915_CACHE_NONE;
> +	obj->pat_index = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
>   
>   	/* Preallocate the "backing storage" */
>   	if (i915_gem_object_pin_pages_unlocked(obj))
> @@ -359,7 +359,9 @@ static int lowlevel_hole(struct i915_address_space *vm,
>   
>   			with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
>   			  vm->insert_entries(vm, mock_vma_res,
> -						   I915_CACHE_NONE, 0);
> +					     i915_gem_get_pat_index(vm->i915,
> +								    I915_CACHE_NONE),
> +					     0);
>   		}
>   		count = n;
>   
> @@ -1377,7 +1379,10 @@ static int igt_ggtt_page(void *arg)
>   
>   		ggtt->vm.insert_page(&ggtt->vm,
>   				     i915_gem_object_get_dma_address(obj, 0),
> -				     offset, I915_CACHE_NONE, 0);
> +				     offset,
> +				     i915_gem_get_pat_index(i915,
> +							    I915_CACHE_NONE),
> +				     0);
>   	}
>   
>   	order = i915_random_order(count, &prng);
> @@ -1510,7 +1515,7 @@ static int reserve_gtt_with_resource(struct i915_vma *vma, u64 offset)
>   	mutex_lock(&vm->mutex);
>   	err = i915_gem_gtt_reserve(vm, NULL, &vma->node, obj->base.size,
>   				   offset,
> -				   obj->cache_level,
> +				   obj->pat_index,
>   				   0);
>   	if (!err) {
>   		i915_vma_resource_init_from_vma(vma_res, vma);
> @@ -1690,7 +1695,7 @@ static int insert_gtt_with_resource(struct i915_vma *vma)
>   
>   	mutex_lock(&vm->mutex);
>   	err = i915_gem_gtt_insert(vm, NULL, &vma->node, obj->base.size, 0,
> -				  obj->cache_level, 0, vm->total, 0);
> +				  obj->pat_index, 0, vm->total, 0);
>   	if (!err) {
>   		i915_vma_resource_init_from_vma(vma_res, vma);
>   		vma->resource = vma_res;
> diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
> index 3b18e5905c86..d985d9bae2e8 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
> @@ -1070,7 +1070,9 @@ static int igt_lmem_write_cpu(void *arg)
>   	/* Put the pages into a known state -- from the gpu for added fun */
>   	intel_engine_pm_get(engine);
>   	err = intel_context_migrate_clear(engine->gt->migrate.context, NULL,
> -					  obj->mm.pages->sgl, I915_CACHE_NONE,
> +					  obj->mm.pages->sgl,
> +					  i915_gem_get_pat_index(i915,
> +								 I915_CACHE_NONE),
>   					  true, 0xdeadbeaf, &rq);
>   	if (rq) {
>   		dma_resv_add_fence(obj->base.resv, &rq->fence,
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gtt.c b/drivers/gpu/drm/i915/selftests/mock_gtt.c
> index ece97e4faacb..a516c0aa88fd 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gtt.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gtt.c
> @@ -27,21 +27,21 @@
>   static void mock_insert_page(struct i915_address_space *vm,
>   			     dma_addr_t addr,
>   			     u64 offset,
> -			     enum i915_cache_level level,
> +			     unsigned int pat_index,
>   			     u32 flags)
>   {
>   }
>   
>   static void mock_insert_entries(struct i915_address_space *vm,
>   				struct i915_vma_resource *vma_res,
> -				enum i915_cache_level level, u32 flags)
> +				unsigned int pat_index, u32 flags)
>   {
>   }
>   
>   static void mock_bind_ppgtt(struct i915_address_space *vm,
>   			    struct i915_vm_pt_stash *stash,
>   			    struct i915_vma_resource *vma_res,
> -			    enum i915_cache_level cache_level,
> +			    unsigned int pat_index,
>   			    u32 flags)
>   {
>   	GEM_BUG_ON(flags & I915_VMA_GLOBAL_BIND);
> @@ -94,7 +94,7 @@ struct i915_ppgtt *mock_ppgtt(struct drm_i915_private *i915, const char *name)
>   static void mock_bind_ggtt(struct i915_address_space *vm,
>   			   struct i915_vm_pt_stash *stash,
>   			   struct i915_vma_resource *vma_res,
> -			   enum i915_cache_level cache_level,
> +			   unsigned int pat_index,
>   			   u32 flags)
>   {
>   }
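
For readers tracking the mechanical conversion above: every call site
that used to pass I915_CACHE_NONE and friends now routes through
i915_gem_get_pat_index(). Judging by the diffs in this series, that
helper is a bounds-checked lookup into a per-platform table carried in
the device info; a minimal sketch:

        unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
                                            enum i915_cache_level level)
        {
                /* A cache level outside the table is a programming error */
                if (drm_WARN_ON(&i915->drm, level >= I915_MAX_CACHE_LEVEL))
                        return 0;

                /* cachelevel_to_pat[] is populated per platform in i915_pci.c */
                return INTEL_INFO(i915)->cachelevel_to_pat[level];
        }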

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Intel-gfx] [PATCH 3/8] drm/i915/mtl: Add PTE encode function
  2023-04-20 20:40   ` Matt Roper
@ 2023-04-21 17:27     ` Yang, Fei
  2023-04-21 17:42       ` Matt Roper
  0 siblings, 1 reply; 76+ messages in thread
From: Yang, Fei @ 2023-04-21 17:27 UTC (permalink / raw)
  To: Roper, Matthew D; +Cc: intel-gfx, dri-devel, Hajda, Andrzej, Das, Nirmoy


> On Wed, Apr 19, 2023 at 04:00:53PM -0700, fei.yang@intel.com wrote:
>> From: Fei Yang <fei.yang@intel.com>
>>
>> PTE encode functions are platform dependent. This patch implements
>> PTE functions for MTL, and ensures the correct PTE encode function
>> is used by calling pte_encode function pointer instead of the
>> hardcoded gen8 version of PTE encode.
>>
>> Signed-off-by: Fei Yang <fei.yang@intel.com>
>> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>> Acked-by: Nirmoy Das <nirmoy.das@intel.com>
>
> Bspec: 45015, 45040
>
>> ---
>>  drivers/gpu/drm/i915/display/intel_dpt.c |  2 +-
>>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c     | 45 ++++++++++++++++++++----
>>  drivers/gpu/drm/i915/gt/intel_ggtt.c     | 36 +++++++++++++++++--
>>  3 files changed, 72 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c
>> index b8027392144d..c5eacfdba1a5 100644
>> --- a/drivers/gpu/drm/i915/display/intel_dpt.c
>> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c
>> @@ -300,7 +300,7 @@ intel_dpt_create(struct intel_framebuffer *fb)
>>        vm->vma_ops.bind_vma    = dpt_bind_vma;
>>        vm->vma_ops.unbind_vma  = dpt_unbind_vma;
>>
>> -     vm->pte_encode = gen8_ggtt_pte_encode;
>> +     vm->pte_encode = vm->gt->ggtt->vm.pte_encode;
>>
>>        dpt->obj = dpt_obj;
>>        dpt->obj->is_dpt = true;
>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> index 4daaa6f55668..11b91e0453c8 100644
>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> @@ -55,6 +55,34 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>>        return pte;
>>  }
>>
>> +static u64 mtl_pte_encode(dma_addr_t addr,
>> +                       enum i915_cache_level level,
>> +                       u32 flags)
>> +{
>> +     gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>> +
>> +     if (unlikely(flags & PTE_READ_ONLY))
>> +             pte &= ~GEN8_PAGE_RW;
>> +
>> +     if (flags & PTE_LM)
>> +             pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;
>
> GEN12_PPGTT_PTE_NC got defined in the previous patch as BIT(5).  But
> according to bspec 45040, bit 5 is ignored in the PTE encoding.  What is
> this trying to do?

This takes effect only for PTE_LM, doesn't affect MTL.
PTE_NC is needed for PVC (use of access counter).

I believe this function was written based on the one for PVC. And this function
did get extended to cover all gen12 in a later patch.

-Fei

> Matt
>
>> +
>> +     switch (level) {
>> +     case I915_CACHE_NONE:
>> +             pte |= GEN12_PPGTT_PTE_PAT1;
>> +             break;
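
The "extended to cover all gen12" version Fei mentions switches from a
cache-level switch statement to encoding the PAT index bit by bit. A
sketch of that shape, assuming the GEN12_PPGTT_PTE_PAT0/1/2 and
GEN12_PPGTT_PTE_LM definitions added earlier in the series:

        static u64 mtl_pte_encode(dma_addr_t addr,
                                  unsigned int pat_index,
                                  u32 flags)
        {
                gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;

                if (unlikely(flags & PTE_READ_ONLY))
                        pte &= ~GEN8_PAGE_RW;

                if (flags & PTE_LM)
                        pte |= GEN12_PPGTT_PTE_LM;

                /* The PAT index is a 3-bit field scattered across the PTE */
                if (pat_index & BIT(0))
                        pte |= GEN12_PPGTT_PTE_PAT0;
                if (pat_index & BIT(1))
                        pte |= GEN12_PPGTT_PTE_PAT1;
                if (pat_index & BIT(2))
                        pte |= GEN12_PPGTT_PTE_PAT2;

                return pte;
        }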



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Intel-gfx] [PATCH 3/8] drm/i915/mtl: Add PTE encode function
  2023-04-21 17:27     ` Yang, Fei
@ 2023-04-21 17:42       ` Matt Roper
  2023-04-23  7:37           ` Yang, Fei
  0 siblings, 1 reply; 76+ messages in thread
From: Matt Roper @ 2023-04-21 17:42 UTC (permalink / raw)
  To: Yang, Fei; +Cc: intel-gfx, dri-devel, Hajda, Andrzej, Das, Nirmoy

On Fri, Apr 21, 2023 at 10:27:22AM -0700, Yang, Fei wrote:
>    > On Wed, Apr 19, 2023 at 04:00:53PM -0700, fei.yang@intel.com wrote:
>    >> From: Fei Yang <fei.yang@intel.com>
>    >>
>    >> PTE encode functions are platform dependent. This patch implements
>    >> PTE functions for MTL, and ensures the correct PTE encode function
>    >> is used by calling pte_encode function pointer instead of the
>    >> hardcoded gen8 version of PTE encode.
>    >>
>    >> Signed-off-by: Fei Yang <fei.yang@intel.com>
>    >> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
>    >> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>    >> Acked-by: Nirmoy Das <nirmoy.das@intel.com>
>    >
>    > Bspec: 45015, 45040
>    >
>    >> ---
>    >>  drivers/gpu/drm/i915/display/intel_dpt.c |  2 +-
>    >>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c     | 45 ++++++++++++++++++++----
>    >>  drivers/gpu/drm/i915/gt/intel_ggtt.c     | 36 +++++++++++++++++--
>    >>  3 files changed, 72 insertions(+), 11 deletions(-)
>    >>
>    >> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c
>    >> index b8027392144d..c5eacfdba1a5 100644
>    >> --- a/drivers/gpu/drm/i915/display/intel_dpt.c
>    >> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c
>    >> @@ -300,7 +300,7 @@ intel_dpt_create(struct intel_framebuffer *fb)
>    >>        vm->vma_ops.bind_vma    = dpt_bind_vma;
>    >>        vm->vma_ops.unbind_vma  = dpt_unbind_vma;
>    >>
>    >> -     vm->pte_encode = gen8_ggtt_pte_encode;
>    >> +     vm->pte_encode = vm->gt->ggtt->vm.pte_encode;
>    >>
>    >>        dpt->obj = dpt_obj;
>    >>        dpt->obj->is_dpt = true;
>    >> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>    >> index 4daaa6f55668..11b91e0453c8 100644
>    >> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>    >> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>    >> @@ -55,6 +55,34 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>    >>        return pte;
>    >>  }
>    >>
>    >> +static u64 mtl_pte_encode(dma_addr_t addr,
>    >> +                       enum i915_cache_level level,
>    >> +                       u32 flags)
>    >> +{
>    >> +     gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>    >> +
>    >> +     if (unlikely(flags & PTE_READ_ONLY))
>    >> +             pte &= ~GEN8_PAGE_RW;
>    >> +
>    >> +     if (flags & PTE_LM)
>    >> +             pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;
>    >
>    > GEN12_PPGTT_PTE_NC got defined in the previous patch as BIT(5).  But
>    > according to bspec 45040, bit 5 is ignored in the PTE encoding.  What is
>    > this trying to do?
>    This takes effect only for PTE_LM, doesn't affect MTL.
>    PTE_NC is needed for PVC (use of access counter).
>    I believe this function was written based on the one for PVC. And this
>    function did get extended to cover all gen12 in a later patch.

Even though MTL doesn't have local memory, PTE_LM is supposed to be used
on MTL for access to BAR2 stolen memory.


Matt

>    -Fei
>    > Matt
>    >
>    >> +
>    >> +     switch (level) {
>    >> +     case I915_CACHE_NONE:
>    >> +             pte |= GEN12_PPGTT_PTE_PAT1;
>    >> +             break;

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Intel-gfx] [PATCH 8/8] drm/i915: Allow user to set cache at BO creation
  2023-04-20 16:11       ` Yang, Fei
@ 2023-04-21 20:48           ` Jordan Justen
  2023-04-21 20:48           ` Jordan Justen
  1 sibling, 0 replies; 76+ messages in thread
From: Jordan Justen @ 2023-04-21 20:48 UTC (permalink / raw)
  To: Yang, Fei, Andi Shyti, Tvrtko Ursulin, Alan Previn
  Cc: Roper, Matthew D, intel-gfx, dri-devel, Daniele Ceraolo Spurio,
	Lionel Landwerlin, Chris Wilson, Das, Nirmoy

On 2023-04-20 09:11:18, Yang, Fei wrote:
> > On 20/04/2023 12:39, Andi Shyti wrote:
> >> Hi Fei,
> >>
> >>> To comply with the design that buffer objects shall have immutable
> >>> cache setting throughout their life cycle, {set, get}_caching ioctl's
> >>> are no longer supported from MTL onward. With that change caching
> >>> policy can only be set at object creation time. The current code
> >>> applies a default (platform dependent) cache setting for all objects.
> >>> However this is not optimal for performance tuning. The patch extends
> >>> the existing gem_create uAPI to let user set PAT index for the object
> >>> at creation time.
> >>> The new extension is platform independent, so UMD's can switch to using
> >>> this extension for older platforms as well, while {set, get}_caching are
> >>> still supported on these legacy platforms for compatibility reasons.
> >>>
> >>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> >>> Cc: Matt Roper <matthew.d.roper@intel.com>
> >>> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> >>> Signed-off-by: Fei Yang <fei.yang@intel.com>
> >>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> >>
> >> because this is an API change, we need some more information
> >> here.
> >>
> >> First of all you need to CC the userspace guys that have been
> >> working on top of your series and get their ack's.
> >
> > Yes, and a link to a Mesa merge request which uses the uapi should be
> > included.
> 
> Working with Mesa team on this, stay tuned.
> 

I would like to see the extension detection issue handled before
ack'ing this.

How about a new DRM_I915_QUERY_GEM_CREATE_EXTENSIONS item, that
returns a u64 array of usable extension names for
DRM_IOCTL_I915_GEM_CREATE_EXT?

A similar DRM_I915_QUERY_GEM_CONTEXT_CREATE_EXTENSIONS could also
provide an alternative to Alan's "drm/i915/uapi/pxp: Add a GET_PARAM
for PXP", and more easily allow advertising future new extensions for
context/buffer creation.

-Jordan
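
For reference, the extension being debated (patch 8/8) is a small
chained struct in the uAPI; a sketch based on this series, so names
and layout may differ in whatever finally lands:

        struct drm_i915_gem_create_ext_set_pat {
                /* chained via base.name = I915_GEM_CREATE_EXT_SET_PAT */
                struct i915_user_extension base;
                /* platform-specific PAT index, as documented in Bspec */
                __u32 pat_index;
                /* must be zero */
                __u32 rsvd;
        };

The query items sketched above would let userspace enumerate supported
creation extensions up front rather than probing with trial ioctls.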

^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: [Intel-gfx] [PATCH 7/8] drm/i915: use pat_index instead of cache_level
  2023-04-21 10:17   ` Tvrtko Ursulin
@ 2023-04-23  6:12       ` Yang, Fei
  0 siblings, 0 replies; 76+ messages in thread
From: Yang, Fei @ 2023-04-23  6:12 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx; +Cc: Chris Wilson, Roper, Matthew D, dri-devel

> On 20/04/2023 00:00, fei.yang@intel.com wrote:
>> From: Fei Yang <fei.yang@intel.com>
>>
>> Currently the KMD is using enum i915_cache_level to set caching policy
>> for buffer objects. This is flaky because the PAT index which really
>> controls the caching behavior in PTE has far more levels than what's
>> defined in the enum. In addition, the PAT index is platform dependent,
>> having to translate between i915_cache_level and PAT index is not
>> reliable, and makes the code more complicated.
>>
>> From UMD's perspective there is also a necessity to set caching policy for
>> performance fine tuning. It's much easier for the UMD to directly use
>> PAT index because the behavior of each PAT index is clearly defined in Bspec.
>> Having the abstracted i915_cache_level sitting in between would only
>> cause more ambiguity.
>>
>> For these reasons this patch replaces i915_cache_level with PAT index.
>> Also note, the cache_level is not completely removed yet, because the
>> KMD still has the need of creating buffer objects with simple cache
>> settings such as cached, uncached, or writethrough. For such simple
>> cases, using cache_level would help simplify the code.
>>
>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> Signed-off-by: Fei Yang <fei.yang@intel.com>
>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>
> [snip]
>
>>
>>   bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object
>> *obj) @@ -267,7 +267,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>>   {
>>      int ret;
>>
>> -    if (obj->cache_level == cache_level)
>> +    if (i915_gem_object_has_cache_level(obj, cache_level))
>>              return 0;
>
> When userspace calls i915_gem_set_caching_ioctl

We are ending the support for set_caching_ioctl.

> after having set the PAT index explicitly this will make it silently succeed
> regardless of the cache level passed in, no? Because of:

Yes, that's the point. For objects created by userspace with PAT index set,
KMD is not supposed to touch the setting.

> +bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
> +                                  enum i915_cache_level lvl)
> +{
> +     /*
> +      * cache_level == I915_CACHE_INVAL indicates the UMD's have set the
> +      * caching policy through pat_index, in which case the KMD should
> +      * leave the coherency to be managed by user space, simply return
> +      * true here.
> +      */
> +     if (obj->cache_level == I915_CACHE_INVAL)
> +             return true;
>
> I think we need to let it know it is doing it wrong with an error.

This is not an error; by design, userspace should know exactly what it's doing.

-Fei

> Regards,
>
> Tvrtko
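
Concretely, the convention Fei is defending can be read as the
following sketch (the helper name is illustrative, not from the
series):

        /*
         * A user-supplied PAT index poisons the legacy field, so
         * i915_gem_object_has_cache_level() answers true for every
         * query and the KMD stops second-guessing the caching policy.
         */
        static void object_set_user_pat(struct drm_i915_gem_object *obj,
                                        unsigned int pat_index)
        {
                obj->pat_index = pat_index;
                obj->cache_level = I915_CACHE_INVAL;
        }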

^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: [Intel-gfx] [PATCH 7/8] drm/i915: use pat_index instead of cache_level
  2023-04-21 11:39   ` Tvrtko Ursulin
@ 2023-04-23  6:52       ` Yang, Fei
  0 siblings, 0 replies; 76+ messages in thread
From: Yang, Fei @ 2023-04-23  6:52 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx; +Cc: Chris Wilson, Roper, Matthew D, dri-devel

> On 20/04/2023 00:00, fei.yang@intel.com wrote:
>> From: Fei Yang <fei.yang@intel.com>
>>
>> Currently the KMD is using enum i915_cache_level to set caching policy for
>> buffer objects. This is flaky because the PAT index which really controls
>> the caching behavior in PTE has far more levels than what's defined in the
>> enum. In addition, the PAT index is platform dependent, having to translate
>> between i915_cache_level and PAT index is not reliable, and makes the code
>> more complicated.
>>
>> From UMD's perspective there is also a necessity to set caching policy for
>> performance fine tuning. It's much easier for the UMD to directly use PAT
>> index because the behavior of each PAT index is clearly defined in Bspec.
>> Having the abstracted i915_cache_level sitting in between would only cause
>> more ambiguity.
>>
>> For these reasons this patch replaces i915_cache_level with PAT index. Also
>> note, the cache_level is not completely removed yet, because the KMD still
>> has the need of creating buffer objects with simple cache settings such as
>> cached, uncached, or writethrough. For such simple cases, using cache_level
>> would help simplify the code.
>>
>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> Signed-off-by: Fei Yang <fei.yang@intel.com>
>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>
> I think I have some ideas on how to perhaps make this simpler, please
> bear with me.
>
> In my mind get/set caching ioctls need to be failing once explicit pat
> index has been set by userspace. Or at least not return false information.

By design we are ending the support for set caching ioctl. The patch is included
in this series, "drm/i915/mtl: end support for set caching ioctl"

+       if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70))
+               return -EOPNOTSUPP;
+

> And I don't like i915_gem_object_has_cache_level and
> i915_gem_get_pat_index as a refactoring step.
>
> It also seems that the driver has a need to query the caching mode set
> regardless of the route (of setting).

Only for objects created by the KMD. For UMD-created objects with the
PAT index set, the KMD should never touch the setting.

> So how about this.
>
> Three callers which query the caching mode: use_cpu_reloc, vm_fault_gtt,
> gpu_write_needs_clflush.
>
> We convert them to be like:
>
> i915_gem_object_has_caching_mode(obj, PAT_UC / PAT_WT / ...);

PAT_UC/WT/WB are platform dependent (https://gfxspecs.intel.com/Predator/Home/Index/45101);
to perform this check you would have to do something like:

if (MTL)
        ...
else if (PVC)
        ...
else if (GEN12)
        ...
else
        ...

> Then apart from the per platform tables for mapping between cache level
> to pat index, you add tables which map pat index to caching modes
> (PAT_UC, etc, naming TBD, just enums or bitmasks also TBD, I haven't
> looked at the bspec to see how exactly it works).
>
> You would use that table in the i915_gem_object_has_caching_mode helper,
> called from the above three functions instead of obj->cache_level direct
> comparison.
>
> I am assuming at least for instance cache_level != I915_CACHE_NONE would
> be equivalent to i915_gem_object_has_caching_mode(obj, PAT_UC), etc.

So far the kernel only needs the 4 cache levels defined in enum
i915_cache_level; it doesn't need to understand all PAT indices. By
design, if userspace is setting the PAT index directly, the kernel only
needs to pass the setting through to the PTE.

For objects created by kernel (including objects created by userspace without
specifying pat index), there are only 4 options (defined in the cachelevel_to_pat).

For objects created by userspace with PAT index set (GEM_CREATE + set_pat extension),
kernel should not touch the setting, just pass it to the PAT index bits in PTE.

That's why I was only checking cache_level. Handling PAT index is much more
complicated because of its platform dependent nature and even the number of
PAT indices varies from platform to platform. Fortunately kernel doesn't need
to understand that.

-Fei

> Same mapping table could also be used in debugfs (i915_cache_level_str)
> to universally describe any obj->pat_index, with no need to have
> anything platform dependend there.
>
> In set caching set you always set obj->pat_index and so low level code
> can always just use that.
>
> Unless I am missing something (possible) I think like that we end up
> with no i915_gem_get_pat_index sprinkled around and also no confusing
> i915_gem_object_has_cache_level.
>
> Obj->pat_index would be a single point of truth, while obj->cache_level
> is just a legacy field for get/set_caching ioctl - not used in the
> internal driver flows.
>
> We would need an additional field for storing the boolean of whether
> userspace had overriden the PAT.
>
> Regards,
>
> Tvrtko
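
To make the trade-off concrete: the forward cache_level -> PAT mapping
Fei describes is one small table per platform in the device info,
along the lines of this sketch (the MTL values follow the debugfs
table earlier in the thread; treat the exact numbers as illustrative):

        #define LEGACY_CACHELEVEL \
                .cachelevel_to_pat = { \
                        [I915_CACHE_NONE]   = 0, \
                        [I915_CACHE_LLC]    = 1, \
                        [I915_CACHE_L3_LLC] = 2, \
                        [I915_CACHE_WT]     = 3, \
                }

        #define MTL_CACHELEVEL \
                .cachelevel_to_pat = { \
                        [I915_CACHE_NONE]   = 2, /* UC */ \
                        [I915_CACHE_LLC]    = 3, /* WB, 1-way coherent */ \
                        [I915_CACHE_L3_LLC] = 3, \
                        [I915_CACHE_WT]     = 1, /* WT */ \
                }

The reverse pat_index -> caching-mode table Tvrtko proposes would need
the same per-platform treatment, which is the extra surface Fei is
pushing back on.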

^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: [Intel-gfx] [PATCH 3/8] drm/i915/mtl: Add PTE encode function
  2023-04-21 17:42       ` Matt Roper
@ 2023-04-23  7:37           ` Yang, Fei
  0 siblings, 0 replies; 76+ messages in thread
From: Yang, Fei @ 2023-04-23  7:37 UTC (permalink / raw)
  To: Roper, Matthew D; +Cc: intel-gfx, dri-devel, Hajda, Andrzej, Das, Nirmoy

> On Fri, Apr 21, 2023 at 10:27:22AM -0700, Yang, Fei wrote:
>>> On Wed, Apr 19, 2023 at 04:00:53PM -0700, fei.yang@intel.com wrote:
>>>> From: Fei Yang <fei.yang@intel.com>
>>>>
>>>> PTE encode functions are platform dependent. This patch implements
>>>> PTE functions for MTL, and ensures the correct PTE encode function
>>>> is used by calling pte_encode function pointer instead of the
>>>> hardcoded gen8 version of PTE encode.
>>>>
>>>> Signed-off-by: Fei Yang <fei.yang@intel.com>
>>>> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
>>>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>>>> Acked-by: Nirmoy Das <nirmoy.das@intel.com>
>>>
>>> Bspec: 45015, 45040
>>>
>>>> ---
>>>>  drivers/gpu/drm/i915/display/intel_dpt.c |  2 +-
>>>>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c     | 45 ++++++++++++++++++++----
>>>>  drivers/gpu/drm/i915/gt/intel_ggtt.c     | 36 +++++++++++++++++--
>>>>  3 files changed, 72 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c
>>>> index b8027392144d..c5eacfdba1a5 100644
>>>> --- a/drivers/gpu/drm/i915/display/intel_dpt.c
>>>> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c
>>>> @@ -300,7 +300,7 @@ intel_dpt_create(struct intel_framebuffer *fb)
>>>>        vm->vma_ops.bind_vma    = dpt_bind_vma;
>>>>        vm->vma_ops.unbind_vma  = dpt_unbind_vma;
>>>>
>>>> -     vm->pte_encode = gen8_ggtt_pte_encode;
>>>> +     vm->pte_encode = vm->gt->ggtt->vm.pte_encode;
>>>>
>>>>        dpt->obj = dpt_obj;
>>>>        dpt->obj->is_dpt = true;
>>>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>>> index 4daaa6f55668..11b91e0453c8 100644
>>>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>>> @@ -55,6 +55,34 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>>>>        return pte;
>>>>  }
>>>>
>>>> +static u64 mtl_pte_encode(dma_addr_t addr,
>>>> +                       enum i915_cache_level level,
>>>> +                       u32 flags)
>>>> +{
>>>> +     gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>>>> +
>>>> +     if (unlikely(flags & PTE_READ_ONLY))
>>>> +             pte &= ~GEN8_PAGE_RW;
>>>> +
>>>> +     if (flags & PTE_LM)
>>>> +             pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;
>>>
>>> GEN12_PPGTT_PTE_NC got defined in the previous patch as BIT(5).  But
>>> according to bspec 45040, bit 5 is ignored in the PTE encoding.  What is
>>> this trying to do?
>>
>> This takes effect only for PTE_LM, doesn't affect MTL.
>> PTE_NC is needed for PVC (use of access counter).
>> I believe this function was written based on the one for PVC. And this
>> function did get extended to cover all gen12 in a later patch.
>
> Even though MTL doesn't have local memory, PTE_LM is supposed to be
> used on MTL for access to BAR2 stolen memory.

You were right, but I still think this code is fine because this bit is
ignored for MTL anyway and it is needed for other platforms with LMEM.
Otherwise this code would need some sort of platform check, which is
hard to do because we don't have platform info here.
Or we would have to define another PTE encode function for platforms
needing PTE_NC just for this one difference, and then manage the
function pointer correctly.

-Fei

> Matt
>
>> -Fei
>>> Matt
>>>
>>>> +
>>>> +     switch (level) {
>>>> +     case I915_CACHE_NONE:
>>>> +             pte |= GEN12_PPGTT_PTE_PAT1;
>>>> +             break;
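
The function-pointer management mentioned above happens once at vm
creation rather than per call; a sketch of the selection, reusing the
IP-version check quoted earlier in the thread:

        if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
                ppgtt->vm.pte_encode = mtl_pte_encode;
        else
                ppgtt->vm.pte_encode = gen8_pte_encode;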

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Intel-gfx] [PATCH 3/8] drm/i915/mtl: Add PTE encode function
@ 2023-04-23  7:37           ` Yang, Fei
  0 siblings, 0 replies; 76+ messages in thread
From: Yang, Fei @ 2023-04-23  7:37 UTC (permalink / raw)
  To: Roper, Matthew D; +Cc: intel-gfx, dri-devel, Hajda, Andrzej, Das, Nirmoy

> On Fri, Apr 21, 2023 at 10:27:22AM -0700, Yang, Fei wrote:
>>> On Wed, Apr 19, 2023 at 04:00:53PM -0700, fei.yang@intel.com wrote:
>>>> From: Fei Yang <fei.yang@intel.com>
>>>>
>>>> PTE encode functions are platform dependent. This patch implements
>>>> PTE functions for MTL, and ensures the correct PTE encode function
>>>> is used by calling pte_encode function pointer instead of the
>>>> hardcoded gen8 version of PTE encode.
>>>>
>>>> Signed-off-by: Fei Yang <fei.yang@intel.com>
>>>> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
>>>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>>>> Acked-by: Nirmoy Das <nirmoy.das@intel.com>
>>>
>>> Bspec: 45015, 45040
>>>
>>>> ---
>>>>  drivers/gpu/drm/i915/display/intel_dpt.c |  2 +-
>>>>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c     | 45 ++++++++++++++++++++----
>>>>  drivers/gpu/drm/i915/gt/intel_ggtt.c     | 36 +++++++++++++++++--
>>>>  3 files changed, 72 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c
>>b/drivers/gpu/drm/i915/display/intel_dpt.c
>>>> index b8027392144d..c5eacfdba1a5 100644
>>>> --- a/drivers/gpu/drm/i915/display/intel_dpt.c
>>>> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c
>>>> @@ -300,7 +300,7 @@ intel_dpt_create(struct intel_framebuffer *fb)
>>>>        vm->vma_ops.bind_vma    = dpt_bind_vma;
>>>>        vm->vma_ops.unbind_vma  = dpt_unbind_vma;
>>>>
>>>> -     vm->pte_encode = gen8_ggtt_pte_encode;
>>>> +     vm->pte_encode = vm->gt->ggtt->vm.pte_encode;
>>>>
>>>>        dpt->obj = dpt_obj;
>>>>        dpt->obj->is_dpt = true;
>>>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>>>  b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>>> index 4daaa6f55668..11b91e0453c8 100644
>>>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>>> @@ -55,6 +55,34 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>>>>        return pte;
>>>>  }
>>>>
>>>> +static u64 mtl_pte_encode(dma_addr_t addr,
>>>> +                       enum i915_cache_level level,
>>>> +                       u32 flags)
>>>> +{
>>>> +     gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>>>> +
>>>> +     if (unlikely(flags & PTE_READ_ONLY))
>>>> +             pte &= ~GEN8_PAGE_RW;
>>>> +
>>>> +     if (flags & PTE_LM)
>>>> +             pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;
>>>
>>> GEN12_PPGTT_PTE_NC got defined in the previous patch as BIT(5).  But
>>> according to bspec 45040, bit 5 is ignored in the PTE encoding.  What is
>>> this trying to do?
>>
>> This takes effect only for PTE_LM, doesn't affect MTL.
>> PTE_NC is needed for PVC (use of access counter).
>> I believe this function was written based on the one for PVC. And this
>> function did get extended to cover all gen12 in a later patch.
>
> Even though MTL doesn't have local memory, PTE_LM is supposed to be
> used on MTL for access to BAR2 stolen memory.

You were right, but I still think this code is fine because this bit is
ignored for MTL anyway and it is needed for other platforms with LMEM.
Otherwise this code would have some sort of platform checking which is
hard to do because we don't have platform info here.
Or we would have to define another PTE encode function for platforms
needing PTE_NC just for this one difference, then manage the function
pointer correctly.

-Fei

> Matt
>
>> -Fei
>>> Matt
>>>
>>>> +
>>>> +     switch (level) {
>>>> +     case I915_CACHE_NONE:
>>>> +             pte |= GEN12_PPGTT_PTE_PAT1;
>>>> +             break;

^ permalink raw reply	[flat|nested] 76+ messages in thread
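
The hunk quoted above cuts off mid-switch. For reference, a minimal
sketch of how the remaining cache levels plausibly map on MTL, where the
PAT index is selected via the PAT0/PAT1 PTE bits (the index assignments
below follow the common reading of the MTL PAT table -- index 1 WT,
index 2 UC, index 3 one-way coherent WB -- and are an assumption, not
text from the patch):

	switch (level) {
	case I915_CACHE_NONE:
		pte |= GEN12_PPGTT_PTE_PAT1;	/* PAT index 2: UC */
		break;
	case I915_CACHE_LLC:
	case I915_CACHE_L3_LLC:
		pte |= GEN12_PPGTT_PTE_PAT0 |	/* PAT index 3: WB, */
		       GEN12_PPGTT_PTE_PAT1;	/* 1-way coherent */
		break;
	case I915_CACHE_WT:
		pte |= GEN12_PPGTT_PTE_PAT0;	/* PAT index 1: WT */
		break;
	}

	return pte;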

* Re: [Intel-gfx] [PATCH 7/8] drm/i915: use pat_index instead of cache_level
  2023-04-23  6:12       ` Yang, Fei
  (?)
@ 2023-04-24  8:41       ` Tvrtko Ursulin
  -1 siblings, 0 replies; 76+ messages in thread
From: Tvrtko Ursulin @ 2023-04-24  8:41 UTC (permalink / raw)
  To: Yang, Fei, intel-gfx; +Cc: Chris Wilson, Roper, Matthew D, dri-devel


On 23/04/2023 07:12, Yang, Fei wrote:
>> On 20/04/2023 00:00, fei.yang@intel.com wrote:
>>> From: Fei Yang <fei.yang@intel.com>
>>>
>>> Currently the KMD is using enum i915_cache_level to set caching policy
>>> for buffer objects. This is flaky because the PAT index which really
>>> controls the caching behavior in PTE has far more levels than what's
>>> defined in the enum. In addition, the PAT index is platform dependent,
>>> having to translate between i915_cache_level and PAT index is not
>>> reliable, and makes the code more complicated.
>>>
>>>  From UMD's perspective there is also a necessity to set caching policy for
>>> performance fine tuning. It's much easier for the UMD to directly use
>>> PAT index because the behavior of each PAT index is clearly defined in Bspec.
>>> Having the abstracted i915_cache_level sitting in between would only
>>> cause more ambiguity.
>>>
>>> For these reasons this patch replaces i915_cache_level with PAT index.
>>> Also note, the cache_level is not completely removed yet, because the
>>> KMD still has the need of creating buffer objects with simple cache
>>> settings such as cached, uncached, or writethrough. For such simple
>>> cases, using cache_level would help simplify the code.
>>>
>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>>> Cc: Matt Roper <matthew.d.roper@intel.com>
>>> Signed-off-by: Fei Yang <fei.yang@intel.com>
>>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>>
>> [snip]
>>
>>>
>>>    bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object
>>> *obj) @@ -267,7 +267,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>>>    {
>>>       int ret;
>>>
>>> -    if (obj->cache_level == cache_level)
>>> +    if (i915_gem_object_has_cache_level(obj, cache_level))
>>>               return 0;
>>
>> When userspace calls i915_gem_set_caching_ioctl
> 
> We are ending the support for set_caching_ioctl.

Not on all platforms.

>> after having set the PAT index explicitly this will make it silently succeed
>> regardless of the cache level passed in, no? Because of:
> 
> Yes, that's the point. For objects created by userspace with PAT index set,
> KMD is not supposed to touch the setting.

Why would that be a reason to lie to it? What would be the problem
with telling it of the mistake?

>> +bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
>> +                                  enum i915_cache_level lvl)
>> +{
>> +     /*
>> +      * cache_level == I915_CACHE_INVAL indicates the UMD's have set the
>> +      * caching policy through pat_index, in which case the KMD should
>> +      * leave the coherency to be managed by user space, simply return
>> +      * true here.
>> +      */
>> +     if (obj->cache_level == I915_CACHE_INVAL)
>> +             return true;
>>
>> I think we need to let it know it is doing it wrong with an error.
> 
> This is not an error, by design userspace should know exactly what it's doing.

IMO when return values can be misleading that means the API is not great.

I don't see a good reason to lie to both in-kernel callers and to
userspace (set_caching).

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 76+ messages in thread
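
To make the objection above concrete: a minimal sketch of the behaviour
Tvrtko is arguing for, assuming a flag (named pat_set_by_user here
purely for illustration) recorded when the create extension sets the
PAT index:

	int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
					    enum i915_cache_level cache_level)
	{
		/*
		 * Fail instead of silently reporting success once
		 * userspace has taken ownership of the PAT index.
		 */
		if (obj->pat_set_by_user)
			return -EOPNOTSUPP;

		if (obj->cache_level == cache_level)
			return 0;
		/* ... rest of the function unchanged ... */
	}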

* Re: [Intel-gfx] [PATCH 8/8] drm/i915: Allow user to set cache at BO creation
       [not found]             ` <168232538771.392286.3227368099155268955@jljusten-skl>
@ 2023-04-24  9:08                 ` Tvrtko Ursulin
  0 siblings, 0 replies; 76+ messages in thread
From: Tvrtko Ursulin @ 2023-04-24  9:08 UTC (permalink / raw)
  To: Jordan Justen, Teres Alexis, Alan Previn, Yang, Fei, Andi Shyti
  Cc: Roper, Matthew D, Intel-gfx, Landwerlin, Lionel G,
	Ceraolo Spurio, Daniele, DRI Development, Chris Wilson, Das,
	Nirmoy


[fixed mailing lists addresses]

On 24/04/2023 09:36, Jordan Justen wrote:
> On 2023-04-23 00:05:06, Yang, Fei wrote:
>>> On 2023-04-20 09:11:18, Yang, Fei wrote:
>>>>> On 20/04/2023 12:39, Andi Shyti wrote:
>>>>>> Hi Fei,
>>>>>>
>>>>>> because this is an API change, we need some more information here.
>>>>>>
>>>>>> First of all you need to CC the userspace guys that have been
>>>>>> working on top of your series and get their ack's.
>>>>>
>>>>> Yes, and a link to a Mesa merge request which uses the uapi should
>>>>> be included.
>>>>
>>>> Working with Mesa team on this, stay tuned.
>>>>
>>>
>>> I would like to see the extension detection issue handled
>>> before ack'ing this.
>>>
>>> How about a new DRM_I915_QUERY_GEM_CREATE_EXTENSIONS item, that returns
>>> a u64 array of usable extension names for DRM_IOCTL_I915_GEM_CREATE_EXT?
>>
>> I agree a query mechanism is necessary, but that should be generic for all
>> uAPI's, not just for GEM_CREATE.
>>
>>> A similar DRM_I915_QUERY_GEM_CONTEXT_CREATE_EXTENSIONS could also provide
>>> an alternative to Alan's "drm/i915/uapi/pxp: Add a GET_PARAM for PXP",
>>> and more easily allow advertising future new extensions for context/buffer
>>> creation.
>>
>> I think we should have a discussion and come up with a sustainable design for
>> the query uAPI, rather than just put in a quick hack for this.
> 
> I think you are being a bit too quick to dismiss my idea as a quick
> hack... Nevetheless, I would love to hear an alternate suggestion.
> Just as long as it's not, "let's figure this out later, because I need
> to add this feature now".
> 
> I don't think "just try to use it and if it fails, I guess it isn't
> supported" is reasonable. So, if we can't do better, at least add a
> GET_PARAM. Yeah, it's a quick hack, but it's better than nothing.

Being able to "list" supported extensions sounds like a reasonable principle, albeit a departure from the design direction to date. Which means there are probably no quick solutions. Also, AFAIU, only PXP context create is the problematic one, right? Everything else is pretty much instant or delayed allocation so super cheap to probe by attempting to use.

If I got that right and given this series is about drm_i915_gem_create_ext I don't think this side discussion should be blocking it.

Furthermore the PXP context create story is even more complicated, in that it is not just about querying whether the extension is supported; the expensive check itself is more involved.

Going back to implementation details for this proposed new feature, one alternative to query could be something like:

   drm_i915_gem_create_ext.flags |= I915_GEM_CREATE_EXT_FLAG_PROBE_EXTENSIONS;

That would be somewhat more lightweight to implement than the i915_query route. And it appears it would work for all ioctls which support extensions apart from i915_context_param_engines.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Intel-gfx] [PATCH 7/8] drm/i915: use pat_index instead of cache_level
  2023-04-23  6:52       ` Yang, Fei
  (?)
@ 2023-04-24  9:22       ` Tvrtko Ursulin
  -1 siblings, 0 replies; 76+ messages in thread
From: Tvrtko Ursulin @ 2023-04-24  9:22 UTC (permalink / raw)
  To: Yang, Fei, intel-gfx; +Cc: Chris Wilson, Roper, Matthew D, dri-devel


On 23/04/2023 07:52, Yang, Fei wrote:
>> On 20/04/2023 00:00, fei.yang@intel.com wrote:
>>> From: Fei Yang <fei.yang@intel.com>
>>>
>>> Currently the KMD is using enum i915_cache_level to set caching policy for
>>> buffer objects. This is flaky because the PAT index which really controls
>>> the caching behavior in PTE has far more levels than what's defined in the
>>> enum. In addition, the PAT index is platform dependent, having to translate
>>> between i915_cache_level and PAT index is not reliable, and makes the code
>>> more complicated.
>>>
>>> From UMD's perspective there is also a necessity to set caching policy for
>>> performance fine tuning. It's much easier for the UMD to directly use PAT
>>> index because the behavior of each PAT index is clearly defined in Bspec.
>>> Having the abstracted i915_cache_level sitting in between would only cause
>>> more ambiguity.
>>>
>>> For these reasons this patch replaces i915_cache_level with PAT index. Also
>>> note, the cache_level is not completely removed yet, because the KMD still
>>> has the need of creating buffer objects with simple cache settings such as
>>> cached, uncached, or writethrough. For such simple cases, using cache_level
>>> would help simplify the code.
>>>
>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>>> Cc: Matt Roper <matthew.d.roper@intel.com>
>>> Signed-off-by: Fei Yang <fei.yang@intel.com>
>>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>>
>> I think I have some ideas on how to perhaps make this simpler, please bear
>> with me.
>>
>> In my mind get/set caching ioctls need to fail once an explicit pat
>> index has been set by userspace. Or at least not return false information.
> 
> By design we are ending the support for set caching ioctl. The patch is included
> in this series, "drm/i915/mtl: end support for set caching ioctl"
> 
> +       if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70))
> +               return -EOPNOTSUPP;
> +
> 
>> And I don't like i915_gem_object_has_cache_level and
>> i915_gem_get_pat_index as a refactoring step.
>>
>> It also seems that the driver has a need to query the caching mode set
>> regardless of the route (of setting).
> 
> Only for the objects created by the KMD. For UMD created objects with PAT
> index set KMD should never touch the setting.
> 
>> So how about this.
>>
>> Three callers which query the caching mode: use_cpu_reloc, vm_fault_gtt,
>> gpu_write_needs_clflush.
>>
>> We convert them to be like:
>>
>> i915_gem_object_has_caching_mode(obj, PAT_UC / PAT_WT / ...);
> 
> PAT_UC/WT/WB are platform dependent (https://gfxspecs.intel.com/Predator/Home/Index/45101),
> performing this check you would have to do something like,
> 
> if (MTL)
>          ...
> else if (PVC)
>          ...
> else if (GEN12)
>          ...
> else
>          ...

No, it would be doable with a table as I suggested below. And that table
could be re-used for debugfs pretty-printing, simplifying that code too.

>> Then apart from the per platform tables for mapping between cache level
>> to pat index, you add tables which map pat index to caching modes
>> (PAT_UC, etc, naming TBD, just enums or bitmasks also TBD, I haven't
>> looked at the bspec to see how exactly it works).
>>
>> You would use that table in the i915_gem_object_has_caching_mode helper,
>> called from the above three functions instead of obj->cache_level direct
>> comparison.
>>
>> I am assuming at least for instance cache_level != I915_CACHE_NONE would
>> be equivalent to i915_gem_object_has_caching_mode(obj, PAT_UC), etc.
> 
> So far kernel only needs 4 cache levels defined in enum i915_cache_level,
> kernel doesn't need to understand all PAT indices. By desgin if the userspace
> is setting PAT index directly, kernel only needs to pass the setting to PTE.
> 
> For objects created by kernel (including objects created by userspace without
> specifying pat index), there are only 4 options (defined in the cachelevel_to_pat).
> 
> For objects created by userspace with PAT index set (GEM_CREATE + set_pat extension),
> kernel should not touch the setting, just pass it to the PAT index bits in PTE.
> 
> That's why I was only checking cache_level. Handling PAT index is much more
> complicated because of its platform dependent nature and even the number of
> PAT indices varies from platform to platform. Fortunately kernel doesn't need
> to understand that.

Yeah but I think you maybe missed the spirit of my proposal - which is 
to simplify the internal code paths by not having the duality of 
cache_level-vs-pat almost all the way down. But instead cut it at the 
top API level.

You have this:

+	.cachelevel_to_pat = { \
+		[I915_CACHE_NONE]   = 0, \
+		[I915_CACHE_LLC]    = 1, \
+		[I915_CACHE_L3_LLC] = 2, \
+		[I915_CACHE_WT]     = 3, \
+	}

I propose to add something like:

.legacy_platform_pat = { /* pat index to driver logical flags */
     [0] = PAT_UC,
     [1] = PAT_WB | PAT_LLC, /* Just illustrating the principle */
};

i915->platform_pat = &legacy_platform_pat

i915_gem_object_has_caching_mode(obj, PAT_UC)
...
	return i915->platform_pat & PAT_UC == PAT_UC /* give or take */


get/set_caching_ioctl
{
...
	if (obj->has_pat_index) /* set in the extension only */
		return -EINVAL;

debugfs:

i915_show_pat_flags(i915->platform_pat[obj->pat_index]); /* generic! */

etc...

Set obj->pat_index in the extension or set_cache_level _only_. No 
internal code paths then need to use anything but it. No sprinkling of 
conversion helpers needed or dubious has_cache_level query.

Regards,

Tvrtko

>> Same mapping table could also be used in debugfs (i915_cache_level_str)
>> to universally describe any obj->pat_index, with no need to have
>> anything platform dependend there.
>>
>> In set caching set you always set obj->pat_index and so low level code
>> can always just use that.
>>
>> Unless I am missing something (possible), I think that way we end up
>> with no i915_gem_get_pat_index sprinkled around and also no confusing
>> i915_gem_object_has_cache_level.
>>
>> Obj->pat_index would be a single point of truth, while obj->cache_level
>> is just a legacy field for get/set_caching ioctl - not used in the
>> internal driver flows.
>>
>> We would need an additional field for storing the boolean of whether
>> userspace had overriden the PAT.
>>
>> Regards,
>>
>> Tvrtko

^ permalink raw reply	[flat|nested] 76+ messages in thread
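
A slightly fuller sketch of the lookup outlined above, reusing Tvrtko's
names (legacy_platform_pat, platform_pat); the flag names and the
per-index rows are invented for illustration and are not bspec values:

	/* driver-logical caching flags, names illustrative */
	#define PAT_UC	BIT(0)
	#define PAT_WT	BIT(1)
	#define PAT_WB	BIT(2)
	#define PAT_LLC	BIT(3)

	/* pat index -> logical flags, one such table per platform */
	static const u32 legacy_platform_pat[] = {
		[0] = PAT_UC,
		[1] = PAT_WB | PAT_LLC,
		[2] = PAT_WB,
		[3] = PAT_WT,
	};

	bool i915_gem_object_has_caching_mode(const struct drm_i915_gem_object *obj,
					      u32 mode)
	{
		const struct drm_i915_private *i915 = to_i915(obj->base.dev);

		/* i915->platform_pat points at the table for this platform */
		return (i915->platform_pat[obj->pat_index] & mode) == mode;
	}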

* Re: [Intel-gfx] [PATCH 8/8] drm/i915: Allow user to set cache at BO creation
  2023-04-24  9:08                 ` Tvrtko Ursulin
@ 2023-04-24 17:13                   ` Jordan Justen
  -1 siblings, 0 replies; 76+ messages in thread
From: Jordan Justen @ 2023-04-24 17:13 UTC (permalink / raw)
  To: Teres Alexis, Alan Previn, Yang, Fei, Andi Shyti, Tvrtko Ursulin
  Cc: Roper, Matthew D, Intel-gfx, Landwerlin, Lionel G,
	Ceraolo Spurio, Daniele, DRI Development, Chris Wilson, Das,
	Nirmoy

On 2023-04-24 02:08:43, Tvrtko Ursulin wrote:
> 
> Being able to "list" supported extensions sounds like a reasonable
> principle, albeit a departure from the design direction to date.
> Which means there are probably no quick solutions. Also, AFAIU, only
> PXP context create is the problematic one, right? Everything else is
> pretty much instant or delayed allocation so super cheap to probe by
> attempting to use.
> 
> If I got that right and given this series is about
> drm_i915_gem_create_ext I don't think this side discussion should be
> blocking it.

This still leaves the issue of no reasonable detection mechanism for
the extension. If the discussion gets too complicated, then can we add
a GET_PARAM for the SET_PAT extension? I'm hoping we could either come
up with something better reasonably quickly, or i915/Xe can add a new
param for each new extension until a better approach is available.

> Furthermore the PXP context create story is even more complicated,
> in that it is not just about querying whether the extension is
> supported; the expensive check itself is more involved.
> 
> Going back to implementation details for this proposed new feature,
> one alternative to query could be something like:
> 
>    drm_i915_gem_create_ext.flags |= I915_GEM_CREATE_EXT_FLAG_PROBE_EXTENSIONS;
> 
> That would be somewhat more lightweight to implement than the
> i915_query route. And it appears it would work for all ioctls which
> support extensions apart from i915_context_param_engines.

This seems little better than the "try it, and if it works then it's
supported".

I'm not suggesting that userspace should be able to check that
scenario x+y+z will work, but more a list of extensions that
conceivably could work. Normally this should just be a matter of the
kernel unconditionally adding the newly implemented extension to the
list returned in the query call.

If a GET_PARAM can be made for the PXP case, then it seems like a
query item returning CONTEXT_CREATE extensions could conditionally
omit that extension just as easily as implementing the proposed new
GET_PARAM.

-Jordan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Intel-gfx] [PATCH 3/8] drm/i915/mtl: Add PTE encode function
  2023-04-23  7:37           ` Yang, Fei
  (?)
@ 2023-04-24 17:20           ` Matt Roper
  2023-04-24 18:41             ` Yang, Fei
  -1 siblings, 1 reply; 76+ messages in thread
From: Matt Roper @ 2023-04-24 17:20 UTC (permalink / raw)
  To: Yang, Fei; +Cc: intel-gfx, dri-devel, Hajda, Andrzej, Das, Nirmoy

On Sun, Apr 23, 2023 at 12:37:27AM -0700, Yang, Fei wrote:
> > On Fri, Apr 21, 2023 at 10:27:22AM -0700, Yang, Fei wrote:
> >>> On Wed, Apr 19, 2023 at 04:00:53PM -0700, fei.yang@intel.com wrote:
> >>>> From: Fei Yang <fei.yang@intel.com>
> >>>>
> >>>> PTE encode functions are platform dependent. This patch implements
> >>>> PTE functions for MTL, and ensures the correct PTE encode function
> >>>> is used by calling pte_encode function pointer instead of the
> >>>> hardcoded gen8 version of PTE encode.
> >>>>
> >>>> Signed-off-by: Fei Yang <fei.yang@intel.com>
> >>>> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
> >>>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> >>>> Acked-by: Nirmoy Das <nirmoy.das@intel.com>
> >>>
> >>> Bspec: 45015, 45040
> >>>
> >>>> ---
> >>>>  drivers/gpu/drm/i915/display/intel_dpt.c |  2 +-
> >>>>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c     | 45 ++++++++++++++++++++----
> >>>>  drivers/gpu/drm/i915/gt/intel_ggtt.c     | 36 +++++++++++++++++--
> >>>>  3 files changed, 72 insertions(+), 11 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c
> >>b/drivers/gpu/drm/i915/display/intel_dpt.c
> >>>> index b8027392144d..c5eacfdba1a5 100644
> >>>> --- a/drivers/gpu/drm/i915/display/intel_dpt.c
> >>>> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c
> >>>> @@ -300,7 +300,7 @@ intel_dpt_create(struct intel_framebuffer *fb)
> >>>>        vm->vma_ops.bind_vma    = dpt_bind_vma;
> >>>>        vm->vma_ops.unbind_vma  = dpt_unbind_vma;
> >>>>
> >>>> -     vm->pte_encode = gen8_ggtt_pte_encode;
> >>>> +     vm->pte_encode = vm->gt->ggtt->vm.pte_encode;
> >>>>
> >>>>        dpt->obj = dpt_obj;
> >>>>        dpt->obj->is_dpt = true;
> >>>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> >>>>  b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> >>>> index 4daaa6f55668..11b91e0453c8 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> >>>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> >>>> @@ -55,6 +55,34 @@ static u64 gen8_pte_encode(dma_addr_t addr,
> >>>>        return pte;
> >>>>  }
> >>>>
> >>>> +static u64 mtl_pte_encode(dma_addr_t addr,
> >>>> +                       enum i915_cache_level level,
> >>>> +                       u32 flags)
> >>>> +{
> >>>> +     gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
> >>>> +
> >>>> +     if (unlikely(flags & PTE_READ_ONLY))
> >>>> +             pte &= ~GEN8_PAGE_RW;
> >>>> +
> >>>> +     if (flags & PTE_LM)
> >>>> +             pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;
> >>>
> >>> GEN12_PPGTT_PTE_NC got defined in the previous patch as BIT(5).  But
> >>> according to bspec 45040, bit 5 is ignored in the PTE encoding.  What is
> >>> this trying to do?
> >>
> >> This takes effect only for PTE_LM, doesn't affect MTL.
> >> PTE_NC is needed for PVC (use of access counter).
> >> I believe this function was written based on the one for PVC. And this
> >> function did get extended to cover all gen12 in a later patch.
> >
> > Even though MTL doesn't have local memory, PTE_LM is supposed to be
> > used on MTL for access to BAR2 stolen memory.
> 
> You were right, but I still think this code is fine because this bit is
> ignored for MTL anyway and it is needed for other platforms with LMEM.
> Otherwise this code would have some sort of platform checking which is
> hard to do because we don't have platform info here.
> Or we would have to define another PTE encode function for platforms
> needing PTE_NC just for this one difference, then manage the function
> pointer correctly.

MTL is the only platform that uses this function right now:

   +       if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
   +               ppgtt->vm.pte_encode = mtl_pte_encode;
   +       else
   +               ppgtt->vm.pte_encode = gen8_pte_encode;

If this is intended for PVC, then you have it in the wrong function to
begin with (and it also shouldn't be in a patch labelled "mtl").  If
you're trying to future-proof for some post-MTL discrete platform, then
such code should be saved until we enable that platform so that it can
be properly reviewed.


Matt

> 
> -Fei
> 
> > Matt
> >
> >> -Fei
> >>> Matt
> >>>
> >>>> +
> >>>> +     switch (level) {
> >>>> +     case I915_CACHE_NONE:
> >>>> +             pte |= GEN12_PPGTT_PTE_PAT1;
> >>>> +             break;

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Intel-gfx] [PATCH 3/8] drm/i915/mtl: Add PTE encode function
  2023-04-24 17:20           ` Matt Roper
@ 2023-04-24 18:41             ` Yang, Fei
  0 siblings, 0 replies; 76+ messages in thread
From: Yang, Fei @ 2023-04-24 18:41 UTC (permalink / raw)
  To: Roper, Matthew D; +Cc: intel-gfx, dri-devel, Hajda, Andrzej, Das, Nirmoy

> On Sun, Apr 23, 2023 at 12:37:27AM -0700, Yang, Fei wrote:
>>> On Fri, Apr 21, 2023 at 10:27:22AM -0700, Yang, Fei wrote:
>>>>> On Wed, Apr 19, 2023 at 04:00:53PM -0700, fei.yang@intel.com wrote:
>>>>>> From: Fei Yang <fei.yang@intel.com>
>>>>>>
>>>>>> PTE encode functions are platform dependent. This patch implements
>>>>>> PTE functions for MTL, and ensures the correct PTE encode function
>>>>>> is used by calling pte_encode function pointer instead of the
>>>>>> hardcoded gen8 version of PTE encode.
>>>>>>
>>>>>> Signed-off-by: Fei Yang <fei.yang@intel.com>
>>>>>> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
>>>>>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>>>>>> Acked-by: Nirmoy Das <nirmoy.das@intel.com>
>>>>>
>>>>> Bspec: 45015, 45040
>>>>>
>>>>>> ---
>>>>>>  drivers/gpu/drm/i915/display/intel_dpt.c |  2 +-
>>>>>>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c     | 45 ++++++++++++++++++++----
>>>>>>  drivers/gpu/drm/i915/gt/intel_ggtt.c     | 36 +++++++++++++++++--
>>>>>>  3 files changed, 72 insertions(+), 11 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c
>>>>b/drivers/gpu/drm/i915/display/intel_dpt.c
>>>>>> index b8027392144d..c5eacfdba1a5 100644
>>>>>> --- a/drivers/gpu/drm/i915/display/intel_dpt.c
>>>>>> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c
>>>>>> @@ -300,7 +300,7 @@ intel_dpt_create(struct intel_framebuffer *fb)
>>>>>>        vm->vma_ops.bind_vma    = dpt_bind_vma;
>>>>>>        vm->vma_ops.unbind_vma  = dpt_unbind_vma;
>>>>>>
>>>>>> -     vm->pte_encode = gen8_ggtt_pte_encode;
>>>>>> +     vm->pte_encode = vm->gt->ggtt->vm.pte_encode;
>>>>>>
>>>>>>        dpt->obj = dpt_obj;
>>>>>>        dpt->obj->is_dpt = true;
>>>>>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>>>>>  b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>>>>> index 4daaa6f55668..11b91e0453c8 100644
>>>>>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>>>>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>>>>> @@ -55,6 +55,34 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>>>>>>        return pte;
>>>>>>  }
>>>>>>
>>>>>> +static u64 mtl_pte_encode(dma_addr_t addr,
>>>>>> +                       enum i915_cache_level level,
>>>>>> +                       u32 flags)
>>>>>> +{
>>>>>> +     gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>>>>>> +
>>>>>> +     if (unlikely(flags & PTE_READ_ONLY))
>>>>>> +             pte &= ~GEN8_PAGE_RW;
>>>>>> +
>>>>>> +     if (flags & PTE_LM)
>>>>>> +             pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;
>>>>>
>>>>> GEN12_PPGTT_PTE_NC got defined in the previous patch as BIT(5).  But
>>>>> according to bspec 45040, bit 5 is ignored in the PTE encoding.  What is
>>>>> this trying to do?
>>>>
>>>> This takes effect only for PTE_LM, doesn't affect MTL.
>>>> PTE_NC is needed for PVC (use of access counter).
>>>> I believe this function was written based on the one for PVC. And this
>>>> function did get extended to cover all gen12 in a later patch.
>>>
>>> Even though MTL doesn't have local memory, PTE_LM is supposed to be
>>> used on MTL for access to BAR2 stolen memory.
>>
>> You were right, but I still think this code is fine because this bit is
>> ignored for MTL anyway and it is needed for other platforms with LMEM.
>> Otherwise this code would have some sort of platform checking which is
>> hard to do because we don't have platform info here.
>> Or we would have to define another PTE encode function for platforms
>> needing PTE_NC just for this one difference, then manage the function
>> pointer correctly.
>
> MTL is the only platform that uses this function right now:
>
>   +       if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
>   +               ppgtt->vm.pte_encode = mtl_pte_encode;
>   +       else
>   +               ppgtt->vm.pte_encode = gen8_pte_encode;
>
> If this is intended for PVC, then you have it in the wrong function to
> begin with (and it also shouldn't be in a patch labelled "mtl").  If
> you're trying to future-proof for some post-MTL discrete platform, then
> such code should be saved until we enable that platform so that it can
> be properly reviewed.

dropped GEN12_PPGTT_PTE_NC bit in v2 of https://patchwork.freedesktop.org/series/116900/

> Matt
>
>>
>> -Fei
>>
>>> Matt
>>>
>>>> -Fei
>>>>> Matt
>>>>>
>>>>>> +
>>>>>> +     switch (level) {
>>>>>> +     case I915_CACHE_NONE:
>>>>>> +             pte |= GEN12_PPGTT_PTE_PAT1;
>>>>>> +             break;

^ permalink raw reply	[flat|nested] 76+ messages in thread
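
For clarity, the v2 change referenced above amounts to dropping the
PVC-only bit from the local-memory path, i.e. (a sketch, assuming the
rest of mtl_pte_encode is unchanged):

	if (flags & PTE_LM)
		pte |= GEN12_PPGTT_PTE_LM;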

* IOCTL feature detection (Was: Re: [Intel-gfx] [PATCH 8/8] drm/i915: Allow user to set cache at BO creation)
  2023-04-24 17:13                   ` Jordan Justen
@ 2023-04-25 13:41                     ` Joonas Lahtinen
  -1 siblings, 0 replies; 76+ messages in thread
From: Joonas Lahtinen @ 2023-04-25 13:41 UTC (permalink / raw)
  To: Teres Alexis, Alan Previn, Yang, Fei, Andi Shyti, Jordan Justen,
	Tvrtko Ursulin
  Cc: Roper, Matthew D, Intel-gfx, DRI Development, Chris Wilson,
	Faith Ekstrand, Das, Nirmoy

(+ Faith and Daniel as they have been involved in previous discussions)

Quoting Jordan Justen (2023-04-24 20:13:00)
> On 2023-04-24 02:08:43, Tvrtko Ursulin wrote:
> > 
> > Being able to "list" supported extensions sounds like a reasonable
> > principle, albeit a departure from the design direction to date.
> > Which means there are probably no quick solutions. Also, AFAIU, only
> > PXP context create is the problematic one, right? Everything else is
> > pretty much instant or delayed allocation so super cheap to probe by
> > attempting to use.
> > 
> > If I got that right and given this series is about
> > drm_i915_gem_create_ext I don't think this side discussion should be
> > blocking it.
> 
> This still leaves the issue of no reasonable detection mechanism for
> the extension.

I remember this exact discussion being repeated at least a few times in
the past 5 years. Previously the conclusion was always that detection by
trying to use the extension leads to the least amount of uAPI surface. There
is also no concern about a mismatch when backporting the query and the
functionality patches.

Why exactly do you think it is more reasonable to do the following?

check_if_extension_query_is_supported();
<check retval>
check_if_extension_xyz_is_supported();
<check retval>

VS

create_[context,buffer,whatever]_with_extension();
<check errno>
destroy_[context,buffer,whatever]();

For contexts and buffers there's no overhead, and we're keeping the uAPI
surface smaller (less bugs, less validation effort). Additionally we
support checking combinations of extensions A, B and C without extra
effort.

If we're not returning enough clear errnos, then that is something to
make sure we do.

Regards, Joonas

> If the discussion gets too complicated, then can we add
> a GET_PARAM for the SET_PAT extension? I'm hoping we could either come
> up with something better reasonably quickly, or i915/Xe can add a new
> param for each new extension until a better approach is available.
> 
> > Furthermore the PXP context create story is even more complicated,
> > in that it is not just about querying whether the extension is
> > supported; the expensive check itself is more involved.
> > 
> > Going back to implementation details for this proposed new feature,
> > one alternative to query could be something like:
> > 
> >    drm_i915_gem_create_ext.flags |= I915_GEM_CREATE_EXT_FLAG_PROBE_EXTENSIONS;
> > 
> > That would be somewhat more lightweight to implement than the
> > i915_query route. And it appears it would work for all ioctls which
> > support extensions apart from i915_context_param_engines.
> 
> This seems little better than the "try it, and if it works then it's
> supported".
> 
> I'm not suggesting that userspace should be able to check that
> scenario x+y+z will work, but more a list of extensions that
> > conceivably could work. Normally this should just be a matter of the
> kernel unconditionally adding the newly implemented extension to the
> list returned in the query call.
> 
> If a GET_PARAM can be made for the PXP case, then it seems like a
> query item returning CONTEXT_CREATE extensions could conditionally
> omit that extension just as easily as implementing the proposed new
> GET_PARAM.
> 
> -Jordan

^ permalink raw reply	[flat|nested] 76+ messages in thread
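
As a concrete illustration of the probe-by-use pattern described above,
a small userspace sketch against the SET_PAT extension proposed in
patch 8 (the extension name and struct layout assume the series lands
as posted):

	#include <stdbool.h>
	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <drm/i915_drm.h>

	static bool has_set_pat(int drm_fd)
	{
		struct drm_i915_gem_create_ext_set_pat set_pat = {
			.base.name = I915_GEM_CREATE_EXT_SET_PAT,
			.pat_index = 0,	/* PAT index 0 exists on every platform */
		};
		struct drm_i915_gem_create_ext create = {
			.size = 4096,
			.extensions = (uintptr_t)&set_pat,
		};
		struct drm_gem_close close_args = {};

		if (ioctl(drm_fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create))
			return false;	/* e.g. EINVAL: extension unknown */

		close_args.handle = create.handle;
		ioctl(drm_fd, DRM_IOCTL_GEM_CLOSE, &close_args);
		return true;
	}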

* Re: IOCTL feature detection (Was: Re: [Intel-gfx] [PATCH 8/8] drm/i915: Allow user to set cache at BO creation)
  2023-04-25 13:41                     ` [Intel-gfx] IOCTL feature detection (Was: " Joonas Lahtinen
@ 2023-04-25 17:21                       ` Teres Alexis, Alan Previn
  -1 siblings, 0 replies; 76+ messages in thread
From: Teres Alexis, Alan Previn @ 2023-04-25 17:21 UTC (permalink / raw)
  To: joonas.lahtinen, Justen, Jordan L, Yang, Fei, tvrtko.ursulin, andi.shyti
  Cc: chris.p.wilson, Intel-gfx, dri-devel, Roper, Matthew D,
	faith.ekstrand, Das,  Nirmoy

On Tue, 2023-04-25 at 16:41 +0300, Joonas Lahtinen wrote:
> (+ Faith and Daniel as they have been involved in previous discussions)
An orthogonal (but loosely related) question: Is PXP the only subsystem with
this problem: it uses a delayed worker to complete all of its init dependencies,
but they take so long that by the time the compositor's Mesa init comes up, PXP
context creation blocks for too long and may still fail, with userspace
incorrectly assuming PXP is not supported (when we don't have a GET_PARAM for it)?
I believe HuC has a similar issue - but it shows up in the cmd buffers rather than
in the uAPI. Do we have any other subsystems with this problem (a dependency on
other startups)?


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: IOCTL feature detection (Was: Re: [Intel-gfx] [PATCH 8/8] drm/i915: Allow user to set cache at BO creation)
  2023-04-25 13:41                     ` [Intel-gfx] IOCTL feature detection (Was: " Joonas Lahtinen
@ 2023-04-25 18:19                       ` Jordan Justen
  -1 siblings, 0 replies; 76+ messages in thread
From: Jordan Justen @ 2023-04-25 18:19 UTC (permalink / raw)
  To: Teres Alexis, Alan Previn, Yang, Fei, Andi Shyti,
	Joonas Lahtinen, Tvrtko Ursulin
  Cc: Roper, Matthew D, Intel-gfx, DRI Development, Chris Wilson,
	Faith Ekstrand, Das, Nirmoy

On 2023-04-25 06:41:54, Joonas Lahtinen wrote:
> (+ Faith and Daniel as they have been involved in previous discussions)
> 
> Quoting Jordan Justen (2023-04-24 20:13:00)
> > On 2023-04-24 02:08:43, Tvrtko Ursulin wrote:
> > > 
> > > Being able to "list" supported extensions sounds like a reasonable
> > > principle, albeit a departure from the design direction to date.
> > > Which means there are probably no quick solutions. Also, AFAIU, only
> > > PXP context create is the problematic one, right? Everything else is
> > > pretty much instant or delayed allocation so super cheap to probe by
> > > attempting to use.
> > > 
> > > If I got that right and given this series is about
> > > drm_i915_gem_create_ext I don't think this side discussion should be
> > > blocking it.
> > 
> > This still leaves the issue of no reasonable detection mechanism for
> > the extension.
> 
> I remember this exact discussion being repeated at least a few times in
> the past 5 years. Previously the conclusion was always that detection by
> trying to use the extension leads to the least amount of uAPI surface. There
> is also no concern about a mismatch when backporting the query and the
> functionality patches.
> 
> Why exactly do you think it is more reasonable to do the following?
> 
> check_if_extension_query_is_supported();
> <check retval>
> check_if_extension_xyz_is_supported();
> <check retval>

As I've mentioned a couple of times, I'm asking for a query item that
returns all the extensions that conceivably could work.

In theory it could be made a single query item which somehow then
enumerates if something is a context extension or bo extension. But,
it seems simpler for all if we just have a couple query items
returning a simple u64 array for the few classes of extensions.

> VS
> 
> create_[context,buffer,whatever]_with_extension();
> <check errno>
> destroy_[context,buffer,whatever]();
> 
> For contexts and buffers there's no overhead,

There's no overhead to creating and destroying things? I think the 8s
timeout when trying to create a protected content context shows it's not
always quite that simple.

Over time userspace will have to continue growing this set of
create/destroy tests as new extensions are added. Simply so we can
discover what extensions are supported.

This doesn't seem like a very robust way to advertise extensions for
an api.

Another point ... say you need to debug why some simple app is failing
to start the driver. One tool might be strace. But now you have a
bunch of noise of calls from the driver creating and destroying things
just to discover what extensions the kernel supports. It would be nice
if all this was handled with a query item vs a bunch of
create/destroys.

> and we're keeping the uAPI surface smaller (less bugs, less
> validation effort).

I understand wanting to keep the kernel uapi and implementation
simple.

Is it too complicated to return a u64 array for a query item
indicating the extensions supported for the various classes of
extensions? I think in most cases it'll just be essentially shuffling
across an array of u64 items. (Since most known extensions will always
be supported by the kernel.)

> Additionally we support checking combinations of extensions A, B and
> C without extra effort.

Regarding combinations of extensions, is that really something
userspace needs to probe? I would think if userspace knows about the
possible extensions, then it's on userspace to use combinations
appropriately.

But, in detecting extensions, it is possible that, say extension B
relies on extension A. Now userspace may have to probe to see if
extension A is supported, and then probe for extension B while using
extension A. Essentially, probing for an extension could become a bit
complicated. Vs the kernel just giving us a u64 array of known
extensions...

-Jordan

^ permalink raw reply	[flat|nested] 76+ messages in thread
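
To make the shape of the query item proposed above concrete -- note
that DRM_I915_QUERY_GEM_CREATE_EXTENSIONS is hypothetical and does not
exist in the uAPI, only drm_i915_query_item/drm_i915_query and
DRM_IOCTL_I915_QUERY are real:

	#define DRM_I915_QUERY_GEM_CREATE_EXTENSIONS	64 /* hypothetical */

	struct drm_i915_query_item item = {
		.query_id = DRM_I915_QUERY_GEM_CREATE_EXTENSIONS,
	};
	struct drm_i915_query query = {
		.num_items = 1,
		.items_ptr = (uintptr_t)&item,
	};

	/*
	 * A first call with item.length == 0 would ask the kernel for the
	 * required size; a second call with item.data_ptr set would then
	 * be filled with a __u64 array of supported extension names.
	 */
	ioctl(drm_fd, DRM_IOCTL_I915_QUERY, &query);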

* Re: IOCTL feature detection (Was: Re: [Intel-gfx] [PATCH 8/8] drm/i915: Allow user to set cache at BO creation)
  2023-04-25 13:41                     ` [Intel-gfx] IOCTL feature detection (Was: " Joonas Lahtinen
@ 2023-04-26 11:52                       ` Daniel Vetter
  -1 siblings, 0 replies; 76+ messages in thread
From: Daniel Vetter @ 2023-04-26 11:52 UTC (permalink / raw)
  To: Joonas Lahtinen
  Cc: Tvrtko Ursulin, Andi Shyti, Teres Alexis, Alan Previn,
	Jordan Justen, Intel-gfx, DRI Development, Yang, Fei,
	Chris Wilson, Roper, Matthew D, Faith Ekstrand, Das, Nirmoy

On Tue, Apr 25, 2023 at 04:41:54PM +0300, Joonas Lahtinen wrote:
> (+ Faith and Daniel as they have been involved in previous discussions)
> 
> Quoting Jordan Justen (2023-04-24 20:13:00)
> > On 2023-04-24 02:08:43, Tvrtko Ursulin wrote:
> > > 
> > > Being able to "list" supported extensions sounds like a reasonable
> > > principle, albeit a departure from the design direction to date.
> > > Which means there are probably no quick solutions. Also, AFAIU, only
> > > PXP context create is the problematic one, right? Everything else is
> > > pretty much instant or delayed allocation so super cheap to probe by
> > > attempting to use.
> > > 
> > > If I got that right and given this series is about
> > > drm_i915_gem_create_ext I don't think this side discussion should be
> > > blocking it.
> > 
> > This still leaves the issue of no reasonable detection mechanism for
> > the extension.
> 
> I remember this exact discussion being repeated at least a few times in
> the past 5 years. Previously the conclusion was always that detection by
> trying to use the extension leads to the least amount of uAPI surface. There
> is also no concern about a mismatch when backporting the query and the
> functionality patches.
> 
> Why exactly do you think it is more reasonable to do the following?
> 
> check_if_extension_query_is_supported();
> <check retval>
> check_if_extension_xyz_is_supported();
> <check retval>
> 
> VS
> 
> create_[context,buffer,whatever]_with_extension();
> <check errno>
> destroy_[context,buffer,whatever]();
> 
> For contexts and buffers there's no overhead, and we're keeping the uAPI
> surface smaller (less bugs, less validation effort). Additionally we
> support checking combinations of extensions A, B and C without extra
> effort.
> 
> If we're not returning enough clear errnos, then that is something to
> make sure we do.

Joonas asked me to put my thoughts here:

- in general the "feature discovery by trying it out" approach is most
  robust and hence preferred, but it's also not something that's required
  when there are good reasons against it

- the more a feature spans drivers/modules, the more it should be
  discovered by trying it out, e.g. dma-buf fence import/export was a huge
  discussion, luckily mesa devs figured out how to transparently fall back
  at runtime so we didn't end up merging the separate feature flag (I
  think at least, can't find it). pxp being split across i915/me/fw/who
  knows what else is kinda similar so I'd heavily lean towards discovery
  by creating a context

- pxp taking 8s to init a ctx sounds very broken, irrespective of anything
  else

- in practice there's not really a combinatorial explosion, for a lot of
  reasons:
  - kernel and userspace tend to assume/require implied features where it
    makes sense, e.g. in kms atomic implies planes and universal planes.
    mesa has been doing the same
  - you could go all the way to what radeon/amdgpu have done for years with
    a single incrementing version. Probably not flexible enough for intel.
  - every pciid is it's own uapi, so a feature only needs to be tested on
    platforms where it didn't ship from the start. Also cuts down on
    runtime discovery a lot
  - mesa tends to only support down to current lts kernels and not older,
    which again cuts the combinations a lot

- I did look through upstream kernel docs for general (driver) uapi
  recommendations, but there isn't really anything about good taste stuff,
  just a lot about not screwing up compatibility across kernels/platforms.

tldr; prefer discovery, don't sweat it if not, definitely don't
overengineer this with some magic "give me all extensions" thing because
that results in guaranteed cross-component backporting pains sooner or
later. Or inconsistency, which defeats the point.
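
For the BO-create case specifically, the create/destroy probe Joonas
sketched above really is tiny. A minimal userspace sketch, assuming the
SET_PAT extension from patch 8/8 lands as proposed (pat_index 0 is just a
probe value here, and error handling is trimmed):

  #include <stdbool.h>
  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <drm/i915_drm.h>

  static bool has_set_pat(int fd)
  {
          struct drm_i915_gem_create_ext_set_pat set_pat = {
                  .base.name = I915_GEM_CREATE_EXT_SET_PAT,
                  .pat_index = 0,
          };
          struct drm_i915_gem_create_ext create = {
                  .size = 4096,
                  .extensions = (uintptr_t)&set_pat,
          };
          struct drm_gem_close close_args = { 0 };

          /* an unknown extension name makes the ioctl fail (-EINVAL) */
          if (ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create))
                  return false;

          close_args.handle = create.handle;
          ioctl(fd, DRM_IOCTL_GEM_CLOSE, &close_args);
          return true;
  }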

Cheers, Daniel
 
> Regards, Joonas
> 
> > If the discussion gets too complicated, then can we add
> > a GET_PARAM for the SET_PAT extension? I'm hoping we could either come
> > up with something better reasonably quickly, or i915/Xe can add a new
> > param for each new extension until a better approach is available.
> > 
> > > Furthermore the PXP context create story is even more complicated,
> > > in a way that it is not just about querying whether the extension is
> > > supported, but the expensive check is something more complicated.
> > > 
> > > Going back to implementation details for this proposed new feature,
> > > one alternative to query could be something like:
> > > 
> > >    drm_i915_gem_create_ext.flags |= I915_GEM_CREATE_EXT_FLAG_PROBE_EXTENSIONS;
> > > 
> > > That would be somewhat more lightweight to implement than the
> > > i915_query route. And it appears it would work for all ioctls which
> > > support extensions apart from i915_context_param_engines.
> > 
> > This seems little better than the "try it, and if it works then it's
> > supported".
> > 
> > I'm not suggesting that userspace should be able to check that
> > scenario x+y+z will work, but more a list of extensions that
> > conceivably could work. Normally this should just be a matter of the
> > kernel unconditionally adding the newly implemented extension to the
> > list returned in the query call.
> > 
> > If a GET_PARAM can be made for the PXP case, then it seems like a
> > query item returning CONTEXT_CREATE extensions could conditionally
> > omit that extension just as easily as implementing the proposed new
> > GET_PARAM.
> > 
> > -Jordan

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: IOCTL feature detection (Was: Re: [Intel-gfx] [PATCH 8/8] drm/i915: Allow user to set cache at BO creation)
  2023-04-26 11:52                       ` [Intel-gfx] IOCTL feature detection (Was: " Daniel Vetter
@ 2023-04-26 16:48                         ` Teres Alexis, Alan Previn
  -1 siblings, 0 replies; 76+ messages in thread
From: Teres Alexis, Alan Previn @ 2023-04-26 16:48 UTC (permalink / raw)
  To: daniel, joonas.lahtinen
  Cc: tvrtko.ursulin, Yang, Fei, Justen, Jordan L, Intel-gfx,
	dri-devel, Ceraolo Spurio, Daniele, andi.shyti, chris.p.wilson,
	Roper, Matthew D, faith.ekstrand, Das, Nirmoy

On Wed, 2023-04-26 at 13:52 +0200, Daniel Vetter wrote:
> On Tue, Apr 25, 2023 at 04:41:54PM +0300, Joonas Lahtinen wrote:
> > (+ Faith and Daniel as they have been involved in previous discussions)
> > Quoting Jordan Justen (2023-04-24 20:13:00)
> > > On 2023-04-24 02:08:43, Tvrtko Ursulin wrote:
> > > > 
> > > > 
> 
alan:snip

> - the more a feature spans drivers/modules, the more it should be
>   discovered by trying it out, e.g. dma-buf fence import/export was a huge
>   discussion, luckily mesa devs figured out how to transparently fall back
>   at runtime so we didn't end up merging the separate feature flag (I
>   think at least, can't find it). pxp being split across i915/me/fw/who
>   knows what else is kinda similar so I'd heavily lean towards discovery
>   by creating a context
> 
> - pxp taking 8s to init a ctx sounds very broken, irrespective of anything
>   else
> 

Alan: Please be aware that:
1. the wait-timeout was changed to 1 second sometime back.
2. I'm not deciding the time-out. I initially wanted to keep it at the same
timeout as ADL (250 millisec) - and ask the UMD to retry if the user needs it
(as per the same ADL behavior). Daniele requested to move it to 8 seconds -
but through the review process, we reduced it to 1 second.
3. In any case, that's just the wait-timeout - and we know it won't succeed
until ~6 seconds after i915 (~9 secs after boot). The issue isn't our hardware
or i915 - it's the component driver load <-- this is what's broken.

Details: PXP context is dependent on gsc-fw load, huc-firmware load, mei-gsc-proxy
component driver load + bind, huc-authentication and gsc-proxy-init-handshake.
Most of the above steps begin rather quickly during i915 driver load - the
delay seems to come from a very late mei-gsc-proxy component driver load. In
fact the parent mei-me driver is only getting loaded ~6 seconds after i915
init is done. That blocks the gsc-proxy-init-handshake and huc-authentication
and lastly PXP.

That said, what is broken is why it takes so long to get the component drivers
to come up. NOTE: PXP isn't really doing anything differently in the context
creation flow (in terms of time-consuming steps compared to ADL) besides the
extra waits on these dependencies.

We can actually go back to the original timeout of 250 millisecs like we have
in ADL, but it will fail if Mesa calls in too early (though it will succeed
later) ... or we can create the GET_PARAMs.

A better idea would be to figure out how to control the driver load order and
force mei driver + components to get called right after i915. I was informed
there is no way to control this and changes here will likely not be accepted
upstream.

++ Daniele - can you chime in?

Take note that ADL has the same issue, but for whatever reason the dependent
mei component on ADL loaded much sooner - so it was never an issue that was
caught, but it still existed at ADL merge time (if users customize the kernel +
compositor for fastboot it will happen). (I realize I haven't tested ADL with
the new kernel configs that we use to also boot PXP on MTL - I wonder if the
new mei configs are causing the delay - i.e. an ADL customer could suddenly see
this 6 sec delay too - something I have to check now.)

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: IOCTL feature detection (Was: Re: [Intel-gfx] [PATCH 8/8] drm/i915: Allow user to set cache at BO creation)
  2023-04-26 16:48                         ` [Intel-gfx] IOCTL feature detection (Was: " Teres Alexis, Alan Previn
@ 2023-04-26 18:10                           ` Ceraolo Spurio, Daniele
  -1 siblings, 0 replies; 76+ messages in thread
From: Ceraolo Spurio, Daniele @ 2023-04-26 18:10 UTC (permalink / raw)
  To: Teres Alexis, Alan Previn, daniel, joonas.lahtinen
  Cc: tvrtko.ursulin, Yang, Fei, Justen, Jordan L, Intel-gfx,
	dri-devel, andi.shyti, chris.p.wilson, Roper, Matthew D,
	faith.ekstrand, Das, Nirmoy



On 4/26/2023 9:48 AM, Teres Alexis, Alan Previn wrote:
> On Wed, 2023-04-26 at 13:52 +0200, Daniel Vetter wrote:
>> On Tue, Apr 25, 2023 at 04:41:54PM +0300, Joonas Lahtinen wrote:
>>> (+ Faith and Daniel as they have been involved in previous discussions)
>>> Quoting Jordan Justen (2023-04-24 20:13:00)
>>>> On 2023-04-24 02:08:43, Tvrtko Ursulin wrote:
>>>>>
> alan:snip
>
>> - the more a feature spans drivers/modules, the more it should be
>>    discovered by trying it out, e.g. dma-buf fence import/export was a huge
>>    discussion, luckily mesa devs figured out how to transparently fall back
>>    at runtime so we didn't end up merging the separate feature flag (I
>>    think at least, can't find it). pxp being split across i915/me/fw/who
>>    knows what else is kinda similar so I'd heavily lean towards discovery
>>    by creating a context
>>
>> - pxp taking 8s to init a ctx sounds very broken, irrespective of anything
>>    else

I think there has been a bit of confusion in regard to this timeout and 
where it applies, so let me try to clarify to make sure we're all on 
the same page (Alan has already explained most of it below, but I'm 
going to go into a bit more detail and I want to make sure it's all in 
one place for reference).

Before we can do any PXP operation, dependencies need to be satisfied, 
some of which are outside of i915. For MTL, these are:

- GSC FW needs to be loaded (~250 ms)
- HuC FW needs to be authenticated for PXP ops (~20 ms)
- MEI modules need to be bound (depends on the probe ordering, but usually 
  a few secs)
- GSC SW proxy via MEI needs to be established (~500 ms normally, but can 
  take a few seconds on the first boot after a firmware update)

Because these can take several seconds in total to complete, to avoid 
stalling driver load/resume for that long we moved 
the i915-side operations to a separate worker and we register i915 
before they've completed. This means that we can get a PXP context 
creation call before all the dependencies are in place, in which case we 
do need to wait and that's where the 8s come from. After all the pieces 
are in place, a PXP context creation call is much faster (up to ~150 ms, 
which is the time required to start the PXP session if it is not already 
running).
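
Schematically the split looks like this (a simplified sketch of the
pattern, not the actual i915 code - names like pxp->ready and
pxp_init_worker are illustrative):

  /* driver load: kick the dependencies off to a worker, don't block */
  init_completion(&pxp->ready);
  INIT_WORK(&pxp->init_work, pxp_init_worker); /* fw load, mei bind... */
  queue_work(system_unbound_wq, &pxp->init_work);

  /* PXP context create: bounded wait if userspace races the worker */
  if (!wait_for_completion_timeout(&pxp->ready, msecs_to_jiffies(8000)))
          return -ENXIO; /* dependencies still not up */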

The reason why we suggested a dedicated getparam was to avoid requiring 
early users to wait for all of that to happen just to check the 
capability. By the time a user actually wants to use PXP, we're likely 
done with the prep steps (or at least we're far along with them) and 
therefore the wait will be short.
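
Roughly, from userspace, such a check would just be (the param name below
is made up - no such PXP param exists today - the point is only that it
can return immediately without waiting on the firmware/mei dependencies):

  int pxp_ready = 0;
  struct drm_i915_getparam gp = {
          .param = I915_PARAM_HAS_PXP,    /* hypothetical param id */
          .value = &pxp_ready,
  };

  if (ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp) == 0 && pxp_ready)
          /* safe to advertise protected-content support */;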

> Alan: Please be aware that:
> 1. the wait-timeout was changed to 1 second sometime back.
> 2. I'm not deciding the time-out. I initially wanted to keep it at the same
> timeout as ADL (250 millisec) - and ask the UMD to retry if the user needs it
> (as per the same ADL behavior). Daniele requested to move it to 8 seconds -
> but through the review process, we reduced it to 1 second.
> 3. In any case, that's just the wait-timeout - and we know it won't succeed
> until ~6 seconds after i915 (~9 secs after boot). The issue isn't our hardware
> or i915 - it's the component driver load <-- this is what's broken.

I think the question here is whether the mei driver is taking a long 
time to probe or if it is just being probed late. In the latter case, I 
wouldn't call it broken.

>
> Details: PXP context is dependent on gsc-fw load, huc-firmware load, mei-gsc-proxy
> component driver load + bind, huc-authentication and gsc-proxy-init-handshake.
> Most of the above steps begin rather quickly during i915 driver load - the
> delay seems to come from a very late mei-gsc-proxy component driver load. In
> fact the parent mei-me driver is only getting loaded ~6 seconds after i915
> init is done. That blocks the gsc-proxy-init-handshake and huc-authentication
> and lastly PXP.
>
> That said, what is broken is why it takes so long to get the component drivers
> to come up. NOTE: PXP isn't really doing anything differently in the context
> creation flow (in terms of time-consuming steps compared to ADL) besides the
> extra waits on these dependencies.
>
> We can actually go back to the original timeout of 250 millisecs like we have
> in ADL, but it will fail if Mesa calls in too early (though it will succeed
> later) ... or we can create the GET_PARAMs.
>
> A better idea would be to figure out how to control the driver load order and
> force mei driver + components to get called right after i915. I was informed
> there is no way to control this and changes here will likely not be accepted
> upstream.

we could add a device link to mark i915 as a consumer of mei, but I 
believe that wouldn't work, for 2 reasons:

1 - on discrete, mei binds to a child device of i915, so the dependency 
is reversed
2 - the link might just delay the i915 load to after the mei load, which 
I'm not sure is something we want (and at that point we could also 
just wait for mei to bind from within the i915 load).
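
(For reference, the device-link idea would boil down to a single call -
a hedged sketch, where the consumer/supplier device pointers are
illustrative:

  /* make i915 (consumer) defer behind mei (supplier) */
  device_link_add(i915_dev, mei_dev, DL_FLAG_AUTOREMOVE_CONSUMER);

- which is exactly what runs into the two problems above.)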

Daniele

>
> ++ Daniele - can you chime in?
>
> Take note that ADL has the same issue, but for whatever reason the dependent
> mei component on ADL loaded much sooner - so it was never an issue that was
> caught, but it still existed at ADL merge time (if users customize the kernel +
> compositor for fastboot it will happen). (I realize I haven't tested ADL with
> the new kernel configs that we use to also boot PXP on MTL - I wonder if the
> new mei configs are causing the delay - i.e. an ADL customer could suddenly see
> this 6 sec delay too - something I have to check now.)


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: IOCTL feature detection (Was: Re: [Intel-gfx] [PATCH 8/8] drm/i915: Allow user to set cache at BO creation)
  2023-04-26 11:52                       ` [Intel-gfx] IOCTL feature detection (Was: " Daniel Vetter
@ 2023-04-26 20:04                         ` Jordan Justen
  -1 siblings, 0 replies; 76+ messages in thread
From: Jordan Justen @ 2023-04-26 20:04 UTC (permalink / raw)
  To: Daniel Vetter, Joonas Lahtinen
  Cc: Tvrtko Ursulin, Andi Shyti, Teres Alexis, Alan Previn, Roper,
	Matthew D, Intel-gfx, DRI Development, Yang, Fei, Chris Wilson,
	Faith Ekstrand, Das, Nirmoy

On 2023-04-26 04:52:43, Daniel Vetter wrote:
> 
> Joonas asked me to put my thoughts here:
> 
> - in general the "feature discovery by trying it out" approach is most
>   robust and hence preferred, but it's also not something that's required
>   when there are good reasons against it

More robust in what sense?

Userspace has to set up some scenario to use the interface, which
hopefully is not too complex. It's difficult to predict what it may
look like in the future since we are talking about undefined
extensions.

Userspace has to rely on the kernel making creating and destroying
this thing essentially free. For
I915_GEM_CREATE_EXT_PROTECTED_CONTENT, that did not work out. For
I915_GEM_CREATE_EXT_SET_PAT, just wondering, since the PAT indices are
platform specific, what might happen if we tried to choose some common
index value to detect the extension in a generic manner? Could it
potentially lead to unexpected behavior, or maybe just an error? I
guess it's really extension-specific what kind of issues could
potentially arise.

> tldr; prefer discovery, don't sweat it if not, definitely don't
> overengineer this with some magic "give me all extensions" thing because
> that results in guaranteed cross-component backporting pains sooner or
> later. Or inconsistency, which defeats the point.

I guess I don't know the full context of your concerns here.

For returning the gem-create extensions, isn't this just a matter of
returning the valid indices into the create_extensions array in
i915_gem_create.c? Is that an over-engineered query?
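
Something along these lines is all I have in mind (the query id and
struct below are entirely made up, just to show the shape):

  /* hypothetical i915_query item - none of these names exist today */
  struct drm_i915_query_gem_create_extensions {
          __u32 count;    /* number of entries in names[] */
          __u32 names[];  /* the I915_GEM_CREATE_EXT_* values known to
                           * the kernel's create_extensions[] table */
  };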

It seems weird that there's a reasonably well defined "extension"
mechanism here, but no way for the kernel to just tell us what
extensions it knows about.

-Jordan

^ permalink raw reply	[flat|nested] 76+ messages in thread

end of thread, other threads:[~2023-04-26 20:04 UTC | newest]

Thread overview: 76+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-19 23:00 [PATCH 0/8] drm/i915/mtl: Define MOCS and PAT tables for MTL fei.yang
2023-04-19 23:00 ` [Intel-gfx] " fei.yang
2023-04-19 23:00 ` [PATCH 1/8] drm/i915/mtl: Set has_llc=0 fei.yang
2023-04-19 23:00   ` [Intel-gfx] " fei.yang
2023-04-20 10:20   ` Das, Nirmoy
2023-04-20 10:20     ` Das, Nirmoy
2023-04-19 23:00 ` [PATCH 2/8] drm/i915/mtl: Define MOCS and PAT tables for MTL fei.yang
2023-04-19 23:00   ` [Intel-gfx] " fei.yang
2023-04-20 20:29   ` Matt Roper
2023-04-19 23:00 ` [PATCH 3/8] drm/i915/mtl: Add PTE encode function fei.yang
2023-04-19 23:00   ` [Intel-gfx] " fei.yang
2023-04-20 20:40   ` Matt Roper
2023-04-21 17:27     ` Yang, Fei
2023-04-21 17:42       ` Matt Roper
2023-04-23  7:37         ` Yang, Fei
2023-04-23  7:37           ` Yang, Fei
2023-04-24 17:20           ` Matt Roper
2023-04-24 18:41             ` Yang, Fei
2023-04-19 23:00 ` [PATCH 4/8] drm/i915/mtl: workaround coherency issue for Media fei.yang
2023-04-19 23:00   ` [Intel-gfx] " fei.yang
2023-04-20  8:26   ` Andrzej Hajda
2023-04-20 11:36   ` Das, Nirmoy
2023-04-20 11:36     ` Das, Nirmoy
2023-04-20 20:52   ` Matt Roper
2023-04-19 23:00 ` [PATCH 5/8] drm/i915/mtl: end support for set caching ioctl fei.yang
2023-04-19 23:00   ` [Intel-gfx] " fei.yang
2023-04-20 21:05   ` Matt Roper
2023-04-19 23:00 ` [PATCH 6/8] drm/i915: preparation for using PAT index fei.yang
2023-04-19 23:00   ` [Intel-gfx] " fei.yang
2023-04-20  8:45   ` Andrzej Hajda
2023-04-20 21:14   ` Matt Roper
2023-04-19 23:00 ` [PATCH 7/8] drm/i915: use pat_index instead of cache_level fei.yang
2023-04-19 23:00   ` [Intel-gfx] " fei.yang
2023-04-20 10:13   ` Andrzej Hajda
2023-04-20 12:39     ` Tvrtko Ursulin
2023-04-20 20:34       ` Yang, Fei
2023-04-21  8:43   ` Tvrtko Ursulin
2023-04-21 10:17   ` Tvrtko Ursulin
2023-04-23  6:12     ` Yang, Fei
2023-04-23  6:12       ` Yang, Fei
2023-04-24  8:41       ` Tvrtko Ursulin
2023-04-21 11:39   ` Tvrtko Ursulin
2023-04-23  6:52     ` Yang, Fei
2023-04-23  6:52       ` Yang, Fei
2023-04-24  9:22       ` Tvrtko Ursulin
2023-04-19 23:00 ` [PATCH 8/8] drm/i915: Allow user to set cache at BO creation fei.yang
2023-04-19 23:00   ` [Intel-gfx] " fei.yang
2023-04-20 11:39   ` Andi Shyti
2023-04-20 11:39     ` [Intel-gfx] " Andi Shyti
2023-04-20 13:06     ` Tvrtko Ursulin
2023-04-20 16:11       ` Yang, Fei
2023-04-20 16:29         ` Andi Shyti
2023-04-20 16:29           ` Andi Shyti
2023-04-21 20:48         ` Jordan Justen
2023-04-21 20:48           ` Jordan Justen
     [not found]           ` <BYAPR11MB2567F03AD43D7E2DE2628D5D9A669@BYAPR11MB2567.namprd11.prod.outlook.com>
     [not found]             ` <168232538771.392286.3227368099155268955@jljusten-skl>
2023-04-24  9:08               ` Tvrtko Ursulin
2023-04-24  9:08                 ` Tvrtko Ursulin
2023-04-24 17:13                 ` Jordan Justen
2023-04-24 17:13                   ` Jordan Justen
2023-04-25 13:41                   ` IOCTL feature detection (Was: Re: [Intel-gfx] [PATCH 8/8] drm/i915: Allow user to set cache at BO creation) Joonas Lahtinen
2023-04-25 13:41                     ` [Intel-gfx] IOCTL feature detection (Was: " Joonas Lahtinen
2023-04-25 17:21                     ` IOCTL feature detection (Was: Re: [Intel-gfx] " Teres Alexis, Alan Previn
2023-04-25 17:21                       ` [Intel-gfx] IOCTL feature detection (Was: " Teres Alexis, Alan Previn
2023-04-25 18:19                     ` IOCTL feature detection (Was: Re: [Intel-gfx] " Jordan Justen
2023-04-25 18:19                       ` [Intel-gfx] IOCTL feature detection (Was: " Jordan Justen
2023-04-26 11:52                     ` IOCTL feature detection (Was: Re: [Intel-gfx] " Daniel Vetter
2023-04-26 11:52                       ` [Intel-gfx] IOCTL feature detection (Was: " Daniel Vetter
2023-04-26 16:48                       ` IOCTL feature detection (Was: Re: [Intel-gfx] " Teres Alexis, Alan Previn
2023-04-26 16:48                         ` [Intel-gfx] IOCTL feature detection (Was: " Teres Alexis, Alan Previn
2023-04-26 18:10                         ` IOCTL feature detection (Was: Re: [Intel-gfx] " Ceraolo Spurio, Daniele
2023-04-26 18:10                           ` [Intel-gfx] IOCTL feature detection (Was: " Ceraolo Spurio, Daniele
2023-04-26 20:04                       ` IOCTL feature detection (Was: Re: [Intel-gfx] " Jordan Justen
2023-04-26 20:04                         ` [Intel-gfx] IOCTL feature detection (Was: " Jordan Justen
2023-04-19 23:29 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915/mtl: Define MOCS and PAT tables for MTL (rev8) Patchwork
2023-04-19 23:51 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2023-04-20 11:30 ` [Intel-gfx] [PATCH 0/8] drm/i915/mtl: Define MOCS and PAT tables for MTL Andi Shyti
