* [RFC 0/8] Another take on PAT/object cache mode refactoring
@ 2023-07-27 14:54 ` Tvrtko Ursulin
  0 siblings, 0 replies; 59+ messages in thread
From: Tvrtko Ursulin @ 2023-07-27 14:54 UTC (permalink / raw)
  To: Intel-gfx, dri-devel; +Cc: Matt Roper, Fei Yang, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Good news is that I realized the series can be split after all. Bad news is
that it is still a lot to go through.

  drm/i915: Skip clflush after GPU writes on Meteorlake

This is based on what Fei found out from the hardware architects. If we agree
on the function this helper should achieve, a follow-up is to check whether
other snoopable platforms behave the same.

  drm/i915: Split PTE encode between Gen12 and Meteorlake

Not that closely related, but I feel we don't need to run impossible code on
platforms before Meteorlake. Shouldn't be controversial.

  drm/i915: Cache PAT index used by the driver

This one shouldn't be controversial either. Just eliminates a pile of calls to
i915_gem_get_pat_index().
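
For illustration, call sites of this shape (taken from the patch itself):

  ggtt->vm.insert_page(&ggtt->vm, dma, slot,
                       i915_gem_get_pat_index(i915, I915_CACHE_NONE),
                       0);

collapse into:

  ggtt->vm.insert_page(&ggtt->vm, dma, slot, i915->pat_uc, 0);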

  drm/i915: Refactor PAT/object cache handling

This is where most of the code is, including the "table reversal" logic which
makes i915 understand the caching modes behind PAT indices.
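
As a rough sketch of the "reversal" idea (the helper name below is made up,
not from the actual patch): the forward direction already exists as the
per-platform cachelevel_to_pat[] table which i915_gem_get_pat_index()
consults, so the new logic essentially asks the opposite question:

static enum i915_cache_level
cache_level_from_pat(struct drm_i915_private *i915, unsigned int pat_index)
{
	int i;

	/* Walk the existing forward table looking for a matching entry. */
	for (i = 0; i < I915_MAX_CACHE_LEVEL; i++) {
		if (INTEL_INFO(i915)->cachelevel_to_pat[i] == pat_index)
			return i;
	}

	/* No match, e.g. an index only reachable via the set_pat extension. */
	return I915_CACHE_NONE;
}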

Review for taste and general "does it make sense" is needed here. Also take
extra care with the boolean logic conversion, since I was pulling
obj->user_pat_set out from inside i915_gem_object_has_cache_level to the call
sites.
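
The shape of that conversion, sketched (the exact polarity of the user PAT
special case at each call site is precisely what needs checking):

	/* Before: the special case is hidden inside the helper. */
	if (i915_gem_object_has_cache_level(obj, I915_CACHE_NONE))
		do_flush(obj);

	/* After: the special case is explicit at the call site. */
	if (obj->user_pat_set ||
	    i915_gem_object_has_cache_level(obj, I915_CACHE_NONE))
		do_flush(obj);

Here do_flush() is just a stand-in for whatever the call site actually does.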

All the magic "if user PAT is set assume the worst" checks are still left in
with this patch.

  drm/i915: Improve the vm_fault_gtt user PAT index restriction
  drm/i915: Lift the user PAT restriction from gpu_write_needs_clflush
  drm/i915: Lift the user PAT restriction from use_cpu_reloc
  drm/i915: Refine the caching check in i915_gem_object_can_bypass_llc

This bunch is what removes the "user PAT set" special casing.

Each of them probably has a different reason why the original cache level
check was there, so as many extra pairs of eyes as possible are needed to
verify both that I have correctly understood the underlying reason in each
case, and that I haven't fumbled the logic at a rudimentary level. Or perhaps
it is possible to simplify this further, maybe by using more of the
I915_BO_CACHE_COHERENT_FOR_... flags, or something.
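
For example, and purely hypothetically, if the cache_coherent flags were kept
accurate on every platform then a check like gpu_write_needs_clflush() might
reduce to something along these lines (whether FOR_WRITE is even the right
bit to test is exactly the sort of question that needs answering first):

static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
{
	/* Flush only if GPU writes are not coherent with the CPU cache. */
	return !(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE);
}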

Overall, a lot of scrutiny is needed for most of the series since it is
complicated and I am juggling multiple things.

Cc: Fei Yang <fei.yang@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>

Tvrtko Ursulin (8):
  drm/i915: Skip clflush after GPU writes on Meteorlake
  drm/i915: Split PTE encode between Gen12 and Meteorlake
  drm/i915: Cache PAT index used by the driver
  drm/i915: Refactor PAT/object cache handling
  drm/i915: Improve the vm_fault_gtt user PAT index restriction
  drm/i915: Lift the user PAT restriction from gpu_write_needs_clflush
  drm/i915: Lift the user PAT restriction from use_cpu_reloc
  drm/i915: Refine the caching check in i915_gem_object_can_bypass_llc

 drivers/gpu/drm/i915/Makefile                 |   1 +
 drivers/gpu/drm/i915/gem/i915_gem_domain.c    |  67 ++++++---
 drivers/gpu/drm/i915/gem/i915_gem_domain.h    |   5 +-
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  11 +-
 drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  12 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.c    | 135 ++++++++++--------
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |  11 +-
 .../gpu/drm/i915/gem/i915_gem_object_types.h  | 116 +--------------
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c     |   8 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |   9 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  46 +++---
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   2 +-
 .../drm/i915/gem/selftests/huge_gem_object.c  |   2 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |   5 +-
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |   4 +-
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  40 ++++--
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |   2 +-
 drivers/gpu/drm/i915/gt/intel_ggtt.c          |  33 ++---
 drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c     |   4 +-
 drivers/gpu/drm/i915/gt/intel_gtt.c           |   2 +-
 drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +-
 drivers/gpu/drm/i915/gt/intel_migrate.c       |  11 +-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   6 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |   4 +-
 drivers/gpu/drm/i915/gt/intel_timeline.c      |   2 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   2 +-
 drivers/gpu/drm/i915/gt/selftest_migrate.c    |   9 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c      |  14 +-
 drivers/gpu/drm/i915/gt/selftest_tlb.c        |   5 +-
 .../gpu/drm/i915/gt/selftest_workarounds.c    |   2 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      |   8 +-
 drivers/gpu/drm/i915/i915_cache.c             |  93 ++++++++++++
 drivers/gpu/drm/i915/i915_cache.h             |  81 +++++++++++
 drivers/gpu/drm/i915/i915_debugfs.c           |  53 +------
 drivers/gpu/drm/i915/i915_driver.c            |   5 +
 drivers/gpu/drm/i915/i915_drv.h               |   2 +
 drivers/gpu/drm/i915/i915_gem.c               |  21 +--
 drivers/gpu/drm/i915/i915_gpu_error.c         |   8 +-
 drivers/gpu/drm/i915/i915_pci.c               |  84 ++++++-----
 drivers/gpu/drm/i915/i915_perf.c              |   2 +-
 drivers/gpu/drm/i915/intel_device_info.h      |   6 +-
 drivers/gpu/drm/i915/selftests/i915_gem.c     |   5 +-
 .../gpu/drm/i915/selftests/i915_gem_evict.c   |   8 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  11 +-
 drivers/gpu/drm/i915/selftests/igt_spinner.c  |   2 +-
 .../drm/i915/selftests/intel_memory_region.c  |   4 +-
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  14 +-
 48 files changed, 513 insertions(+), 469 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_cache.c
 create mode 100644 drivers/gpu/drm/i915/i915_cache.h

-- 
2.39.2


* [RFC 1/8] drm/i915: Skip clflush after GPU writes on Meteorlake
  2023-07-27 14:54 ` [Intel-gfx] " Tvrtko Ursulin
@ 2023-07-27 14:54   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 59+ messages in thread
From: Tvrtko Ursulin @ 2023-07-27 14:54 UTC (permalink / raw)
  To: Intel-gfx, dri-devel
  Cc: Thomas Hellström, Matt Roper, Matthew Auld, Fei Yang,
	Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

On Meteorlake the CPU cache will not contain stale data after GPU access,
since a write-invalidate snooping protocol is used, which means there is no
need to flush before potentially transitioning the buffer to a non-coherent
domain.

Use the opportunity to document the situation on discrete too.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_domain.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
index ffddec1d2a76..57db9c581bf6 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
@@ -24,9 +24,22 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
 {
 	struct drm_i915_private *i915 = to_i915(obj->base.dev);
 
+	/*
+	 * Discrete GPUs never dirty the CPU cache.
+	 */
 	if (IS_DGFX(i915))
 		return false;
 
+	/*
+	 * Cache snooping on Meteorlake is using write-invalidate so GPU writes
+	 * never end up in the CPU cache.
+	 *
+	 * QQQ: Do other snooping platforms behave identically and could we
+	 *      therefore write this as "if !HAS_LLC(i915) && HAS_SNOOP(i915)"?
+	 */
+	if (IS_METEORLAKE(i915))
+		return false;
+
 	/*
 	 * For objects created by userspace through GEM_CREATE with pat_index
 	 * set by set_pat extension, i915_gem_object_has_cache_level() will
-- 
2.39.2


* [RFC 2/8] drm/i915: Split PTE encode between Gen12 and Meteorlake
  2023-07-27 14:54 ` [Intel-gfx] " Tvrtko Ursulin
@ 2023-07-27 14:54   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 59+ messages in thread
From: Tvrtko Ursulin @ 2023-07-27 14:54 UTC (permalink / raw)
  To: Intel-gfx, dri-devel; +Cc: Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

No need to run extra instructions which will never trigger on platforms
before Meteorlake.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index c8568e5d1147..862ac1d2de25 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -63,6 +63,30 @@ static u64 gen12_pte_encode(dma_addr_t addr,
 {
 	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
 
+	if (unlikely(flags & PTE_READ_ONLY))
+		pte &= ~GEN8_PAGE_RW;
+
+	if (flags & PTE_LM)
+		pte |= GEN12_PPGTT_PTE_LM;
+
+	if (pat_index & BIT(0))
+		pte |= GEN12_PPGTT_PTE_PAT0;
+
+	if (pat_index & BIT(1))
+		pte |= GEN12_PPGTT_PTE_PAT1;
+
+	if (pat_index & BIT(2))
+		pte |= GEN12_PPGTT_PTE_PAT2;
+
+	return pte;
+}
+
+static u64 mtl_pte_encode(dma_addr_t addr,
+			  unsigned int pat_index,
+			  u32 flags)
+{
+	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
+
 	if (unlikely(flags & PTE_READ_ONLY))
 		pte &= ~GEN8_PAGE_RW;
 
@@ -995,6 +1019,8 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
 	 */
 	ppgtt->vm.alloc_scratch_dma = alloc_pt_dma;
 
+	if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
+		ppgtt->vm.pte_encode = mtl_pte_encode;
-	if (GRAPHICS_VER(gt->i915) >= 12)
+	else if (GRAPHICS_VER(gt->i915) >= 12)
 		ppgtt->vm.pte_encode = gen12_pte_encode;
 	else
-- 
2.39.2


* [RFC 3/8] drm/i915: Cache PAT index used by the driver
  2023-07-27 14:54 ` [Intel-gfx] " Tvrtko Ursulin
@ 2023-07-27 14:54   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 59+ messages in thread
From: Tvrtko Ursulin @ 2023-07-27 14:54 UTC (permalink / raw)
  To: Intel-gfx, dri-devel; +Cc: Matt Roper, Fei Yang, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Eliminate a bunch of runtime calls to i915_gem_get_pat_index() by caching
the interesting PAT indices in struct drm_i915_private. They are static
per platform so there is no need to consult a function every time.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
---
 drivers/gpu/drm/i915/Makefile                 |  1 +
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  3 +--
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |  7 ++---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  | 26 ++++++++++++-------
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |  4 +--
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  4 +--
 drivers/gpu/drm/i915/gt/intel_ggtt.c          |  8 ++----
 drivers/gpu/drm/i915/gt/intel_migrate.c       | 11 +++-----
 drivers/gpu/drm/i915/gt/selftest_migrate.c    |  9 +++----
 drivers/gpu/drm/i915/gt/selftest_reset.c      | 14 +++-------
 drivers/gpu/drm/i915/gt/selftest_tlb.c        |  5 ++--
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      |  8 ++----
 drivers/gpu/drm/i915/i915_cache.c             | 18 +++++++++++++
 drivers/gpu/drm/i915/i915_cache.h             | 13 ++++++++++
 drivers/gpu/drm/i915/i915_driver.c            |  3 +++
 drivers/gpu/drm/i915/i915_drv.h               |  2 ++
 drivers/gpu/drm/i915/i915_gem.c               |  8 ++----
 drivers/gpu/drm/i915/i915_gpu_error.c         |  8 ++----
 drivers/gpu/drm/i915/selftests/i915_gem.c     |  5 +---
 .../gpu/drm/i915/selftests/i915_gem_evict.c   |  4 +--
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 11 +++-----
 .../drm/i915/selftests/intel_memory_region.c  |  4 +--
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  2 ++
 24 files changed, 89 insertions(+), 91 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_cache.c
 create mode 100644 drivers/gpu/drm/i915/i915_cache.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index c5fc91cd58e7..905a51a16588 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -35,6 +35,7 @@ subdir-ccflags-y += -I$(srctree)/$(src)
 # core driver code
 i915-y += i915_driver.o \
 	  i915_drm_client.o \
+	  i915_cache.o \
 	  i915_config.o \
 	  i915_getparam.o \
 	  i915_ioctl.o \
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 5a687a3686bd..0a1d40220020 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1330,8 +1330,7 @@ static void *reloc_iomap(struct i915_vma *batch,
 		ggtt->vm.insert_page(&ggtt->vm,
 				     i915_gem_object_get_dma_address(obj, page),
 				     offset,
-				     i915_gem_get_pat_index(ggtt->vm.i915,
-							    I915_CACHE_NONE),
+				     eb->i915->pat_uc,
 				     0);
 	} else {
 		offset += page << PAGE_SHIFT;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index 5b0a5cf9a98a..1c8eb806b7d3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -563,11 +563,8 @@ static void dbg_poison(struct i915_ggtt *ggtt,
 	while (size) {
 		void __iomem *s;
 
-		ggtt->vm.insert_page(&ggtt->vm, addr,
-				     ggtt->error_capture.start,
-				     i915_gem_get_pat_index(ggtt->vm.i915,
-							    I915_CACHE_NONE),
-				     0);
+		ggtt->vm.insert_page(&ggtt->vm, addr, ggtt->error_capture.start,
+				     ggtt->vm.i915->pat_uc, 0);
 		mb();
 
 		s = io_mapping_map_wc(&ggtt->iomap,
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 7078af2f8f79..6bd6c239f4ac 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -58,6 +58,16 @@ i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
 		I915_CACHE_NONE;
 }
 
+static unsigned int
+i915_ttm_cache_pat(struct drm_i915_private *i915, struct ttm_resource *res,
+		   struct ttm_tt *ttm)
+{
+	return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
+		!i915_ttm_gtt_binds_lmem(res) &&
+		ttm->caching == ttm_cached) ? i915->pat_wb :
+		i915->pat_uc;
+}
+
 static struct intel_memory_region *
 i915_ttm_region(struct ttm_device *bdev, int ttm_mem_type)
 {
@@ -196,7 +206,7 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
 	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
 	struct i915_request *rq;
 	struct ttm_tt *src_ttm = bo->ttm;
-	enum i915_cache_level src_level, dst_level;
+	unsigned int src_pat, dst_pat;
 	int ret;
 
 	if (!to_gt(i915)->migrate.context || intel_gt_is_wedged(to_gt(i915)))
@@ -206,16 +216,15 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
 	if (I915_SELFTEST_ONLY(fail_gpu_migration))
 		clear = true;
 
-	dst_level = i915_ttm_cache_level(i915, dst_mem, dst_ttm);
+	dst_pat = i915_ttm_cache_pat(i915, dst_mem, dst_ttm);
 	if (clear) {
 		if (bo->type == ttm_bo_type_kernel &&
 		    !I915_SELFTEST_ONLY(fail_gpu_migration))
 			return ERR_PTR(-EINVAL);
 
 		intel_engine_pm_get(to_gt(i915)->migrate.context->engine);
-		ret = intel_context_migrate_clear(to_gt(i915)->migrate.context, deps,
-						  dst_st->sgl,
-						  i915_gem_get_pat_index(i915, dst_level),
+		ret = intel_context_migrate_clear(to_gt(i915)->migrate.context,
+						  deps, dst_st->sgl, dst_pat,
 						  i915_ttm_gtt_binds_lmem(dst_mem),
 						  0, &rq);
 	} else {
@@ -225,14 +234,13 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
 		if (IS_ERR(src_rsgt))
 			return ERR_CAST(src_rsgt);
 
-		src_level = i915_ttm_cache_level(i915, bo->resource, src_ttm);
+		src_pat = i915_ttm_cache_pat(i915, bo->resource, src_ttm);
 		intel_engine_pm_get(to_gt(i915)->migrate.context->engine);
 		ret = intel_context_migrate_copy(to_gt(i915)->migrate.context,
 						 deps, src_rsgt->table.sgl,
-						 i915_gem_get_pat_index(i915, src_level),
+						 src_pat,
 						 i915_ttm_gtt_binds_lmem(bo->resource),
-						 dst_st->sgl,
-						 i915_gem_get_pat_index(i915, dst_level),
+						 dst_st->sgl, dst_pat,
 						 i915_ttm_gtt_binds_lmem(dst_mem),
 						 &rq);
 
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index 6b9f6cf50bf6..6bddd733d796 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -354,7 +354,7 @@ fake_huge_pages_object(struct drm_i915_private *i915, u64 size, bool single)
 
 	obj->write_domain = I915_GEM_DOMAIN_CPU;
 	obj->read_domains = I915_GEM_DOMAIN_CPU;
-	obj->pat_index = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
+	obj->pat_index = i915->pat_uc;
 
 	return obj;
 }
diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
index c2bdc133c89a..fb69f667652a 100644
--- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
@@ -226,9 +226,7 @@ static int gen6_ppgtt_init_scratch(struct gen6_ppgtt *ppgtt)
 		return ret;
 
 	vm->scratch[0]->encode =
-		vm->pte_encode(px_dma(vm->scratch[0]),
-			       i915_gem_get_pat_index(vm->i915,
-						      I915_CACHE_NONE),
+		vm->pte_encode(px_dma(vm->scratch[0]), vm->i915->pat_uc,
 			       PTE_READ_ONLY);
 
 	vm->scratch[1] = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index 862ac1d2de25..675f71f06e89 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -874,9 +874,7 @@ static int gen8_init_scratch(struct i915_address_space *vm)
 		pte_flags |= PTE_LM;
 
 	vm->scratch[0]->encode =
-		vm->pte_encode(px_dma(vm->scratch[0]),
-			       i915_gem_get_pat_index(vm->i915,
-						      I915_CACHE_NONE),
+		vm->pte_encode(px_dma(vm->scratch[0]), vm->i915->pat_uc,
 			       pte_flags);
 
 	for (i = 1; i <= vm->top; i++) {
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index dd0ed941441a..fca61ddca8ad 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -921,9 +921,7 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size)
 		pte_flags |= PTE_LM;
 
 	ggtt->vm.scratch[0]->encode =
-		ggtt->vm.pte_encode(px_dma(ggtt->vm.scratch[0]),
-				    i915_gem_get_pat_index(i915,
-							   I915_CACHE_NONE),
+		ggtt->vm.pte_encode(px_dma(ggtt->vm.scratch[0]), i915->pat_uc,
 				    pte_flags);
 
 	return 0;
@@ -1298,9 +1296,7 @@ bool i915_ggtt_resume_vm(struct i915_address_space *vm)
 		 */
 		vma->resource->bound_flags = 0;
 		vma->ops->bind_vma(vm, NULL, vma->resource,
-				   obj ? obj->pat_index :
-					 i915_gem_get_pat_index(vm->i915,
-								I915_CACHE_NONE),
+				   obj ? obj->pat_index : vm->i915->pat_uc,
 				   was_bound);
 
 		if (obj) { /* only used during resume => exclusive access */
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 576e5ef0289b..b7a61b02f64c 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -45,9 +45,7 @@ static void xehpsdv_toggle_pdes(struct i915_address_space *vm,
 	 * Insert a dummy PTE into every PT that will map to LMEM to ensure
 	 * we have a correctly setup PDE structure for later use.
 	 */
-	vm->insert_page(vm, 0, d->offset,
-			i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
-			PTE_LM);
+	vm->insert_page(vm, 0, d->offset, vm->i915->pat_uc, PTE_LM);
 	GEM_BUG_ON(!pt->is_compact);
 	d->offset += SZ_2M;
 }
@@ -65,9 +63,7 @@ static void xehpsdv_insert_pte(struct i915_address_space *vm,
 	 * alignment is 64K underneath for the pt, and we are careful
 	 * not to access the space in the void.
 	 */
-	vm->insert_page(vm, px_dma(pt), d->offset,
-			i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
-			PTE_LM);
+	vm->insert_page(vm, px_dma(pt), d->offset, vm->i915->pat_uc, PTE_LM);
 	d->offset += SZ_64K;
 }
 
@@ -77,8 +73,7 @@ static void insert_pte(struct i915_address_space *vm,
 {
 	struct insert_pte_data *d = data;
 
-	vm->insert_page(vm, px_dma(pt), d->offset,
-			i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
+	vm->insert_page(vm, px_dma(pt), d->offset, vm->i915->pat_uc,
 			i915_gem_object_is_lmem(pt->base) ? PTE_LM : 0);
 	d->offset += PAGE_SIZE;
 }
diff --git a/drivers/gpu/drm/i915/gt/selftest_migrate.c b/drivers/gpu/drm/i915/gt/selftest_migrate.c
index 3def5ca72dec..a67ede65d816 100644
--- a/drivers/gpu/drm/i915/gt/selftest_migrate.c
+++ b/drivers/gpu/drm/i915/gt/selftest_migrate.c
@@ -904,8 +904,7 @@ static int perf_clear_blt(void *arg)
 
 		err = __perf_clear_blt(gt->migrate.context,
 				       dst->mm.pages->sgl,
-				       i915_gem_get_pat_index(gt->i915,
-							      I915_CACHE_NONE),
+				       gt->i915->pat_uc,
 				       i915_gem_object_is_lmem(dst),
 				       sizes[i]);
 
@@ -995,12 +994,10 @@ static int perf_copy_blt(void *arg)
 
 		err = __perf_copy_blt(gt->migrate.context,
 				      src->mm.pages->sgl,
-				      i915_gem_get_pat_index(gt->i915,
-							     I915_CACHE_NONE),
+				      gt->i915->pat_uc,
 				      i915_gem_object_is_lmem(src),
 				      dst->mm.pages->sgl,
-				      i915_gem_get_pat_index(gt->i915,
-							     I915_CACHE_NONE),
+				      gt->i915->pat_uc,
 				      i915_gem_object_is_lmem(dst),
 				      sz);
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c
index 79aa6ac66ad2..327dc9294e0f 100644
--- a/drivers/gpu/drm/i915/gt/selftest_reset.c
+++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
@@ -84,11 +84,8 @@ __igt_reset_stolen(struct intel_gt *gt,
 		void __iomem *s;
 		void *in;
 
-		ggtt->vm.insert_page(&ggtt->vm, dma,
-				     ggtt->error_capture.start,
-				     i915_gem_get_pat_index(gt->i915,
-							    I915_CACHE_NONE),
-				     0);
+		ggtt->vm.insert_page(&ggtt->vm, dma, ggtt->error_capture.start,
+				     gt->i915->pat_uc, 0);
 		mb();
 
 		s = io_mapping_map_wc(&ggtt->iomap,
@@ -127,11 +124,8 @@ __igt_reset_stolen(struct intel_gt *gt,
 		void *in;
 		u32 x;
 
-		ggtt->vm.insert_page(&ggtt->vm, dma,
-				     ggtt->error_capture.start,
-				     i915_gem_get_pat_index(gt->i915,
-							    I915_CACHE_NONE),
-				     0);
+		ggtt->vm.insert_page(&ggtt->vm, dma, ggtt->error_capture.start,
+				     gt->i915->pat_uc, 0);
 		mb();
 
 		s = io_mapping_map_wc(&ggtt->iomap,
diff --git a/drivers/gpu/drm/i915/gt/selftest_tlb.c b/drivers/gpu/drm/i915/gt/selftest_tlb.c
index 3bd6b540257b..6049f01be219 100644
--- a/drivers/gpu/drm/i915/gt/selftest_tlb.c
+++ b/drivers/gpu/drm/i915/gt/selftest_tlb.c
@@ -36,8 +36,6 @@ pte_tlbinv(struct intel_context *ce,
 	   u64 length,
 	   struct rnd_state *prng)
 {
-	const unsigned int pat_index =
-		i915_gem_get_pat_index(ce->vm->i915, I915_CACHE_NONE);
 	struct drm_i915_gem_object *batch;
 	struct drm_mm_node vb_node;
 	struct i915_request *rq;
@@ -157,7 +155,8 @@ pte_tlbinv(struct intel_context *ce,
 		/* Flip the PTE between A and B */
 		if (i915_gem_object_is_lmem(vb->obj))
 			pte_flags |= PTE_LM;
-		ce->vm->insert_entries(ce->vm, &vb_res, pat_index, pte_flags);
+		ce->vm->insert_entries(ce->vm, &vb_res, ce->vm->i915->pat_uc,
+				       pte_flags);
 
 		/* Flush the PTE update to concurrent HW */
 		tlbinv(ce->vm, addr & -length, length);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
index 7aadad5639c3..8b7aa8c5a99d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
@@ -1053,14 +1053,10 @@ static void uc_fw_bind_ggtt(struct intel_uc_fw *uc_fw)
 
 	if (ggtt->vm.raw_insert_entries)
 		ggtt->vm.raw_insert_entries(&ggtt->vm, vma_res,
-					    i915_gem_get_pat_index(ggtt->vm.i915,
-								   I915_CACHE_NONE),
-					    pte_flags);
+					    ggtt->vm.i915->pat_uc, pte_flags);
 	else
 		ggtt->vm.insert_entries(&ggtt->vm, vma_res,
-					i915_gem_get_pat_index(ggtt->vm.i915,
-							       I915_CACHE_NONE),
-					pte_flags);
+					ggtt->vm.i915->pat_uc, pte_flags);
 }
 
 static void uc_fw_unbind_ggtt(struct intel_uc_fw *uc_fw)
diff --git a/drivers/gpu/drm/i915/i915_cache.c b/drivers/gpu/drm/i915/i915_cache.c
new file mode 100644
index 000000000000..06eb5933c719
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_cache.c
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#include "i915_cache.h"
+#include "i915_drv.h"
+
+void i915_cache_init(struct drm_i915_private *i915)
+{
+	i915->pat_uc = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
+	drm_info(&i915->drm, "Using PAT index %u for uncached access\n",
+		 i915->pat_uc);
+
+	i915->pat_wb = i915_gem_get_pat_index(i915, I915_CACHE_LLC);
+	drm_info(&i915->drm, "Using PAT index %u for write-back access\n",
+		 i915->pat_wb);
+}
diff --git a/drivers/gpu/drm/i915/i915_cache.h b/drivers/gpu/drm/i915/i915_cache.h
new file mode 100644
index 000000000000..cb68936fb8a2
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_cache.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#ifndef __I915_CACHE_H__
+#define __I915_CACHE_H__
+
+struct drm_i915_private;
+
+void i915_cache_init(struct drm_i915_private *i915);
+
+#endif /* __I915_CACHE_H__ */
diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index 294b022de22b..bb2223cc3470 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -80,6 +80,7 @@
 #include "soc/intel_dram.h"
 #include "soc/intel_gmch.h"
 
+#include "i915_cache.h"
 #include "i915_debugfs.h"
 #include "i915_driver.h"
 #include "i915_drm_client.h"
@@ -240,6 +241,8 @@ static int i915_driver_early_probe(struct drm_i915_private *dev_priv)
 	i915_memcpy_init_early(dev_priv);
 	intel_runtime_pm_init_early(&dev_priv->runtime_pm);
 
+	i915_cache_init(dev_priv);
+
 	ret = i915_workqueues_init(dev_priv);
 	if (ret < 0)
 		return ret;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 682ef2b5c7d5..f5c591a762df 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -250,6 +250,8 @@ struct drm_i915_private {
 	unsigned int hpll_freq;
 	unsigned int czclk_freq;
 
+	unsigned int pat_uc, pat_wb;
+
 	/**
 	 * wq - Driver workqueue for GEM.
 	 *
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1f65bb33dd21..896aa48ed089 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -422,9 +422,7 @@ i915_gem_gtt_pread(struct drm_i915_gem_object *obj,
 			ggtt->vm.insert_page(&ggtt->vm,
 					     i915_gem_object_get_dma_address(obj,
 									     offset >> PAGE_SHIFT),
-					     node.start,
-					     i915_gem_get_pat_index(i915,
-								    I915_CACHE_NONE), 0);
+					     node.start, i915->pat_uc, 0);
 		} else {
 			page_base += offset & PAGE_MASK;
 		}
@@ -603,9 +601,7 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
 			ggtt->vm.insert_page(&ggtt->vm,
 					     i915_gem_object_get_dma_address(obj,
 									     offset >> PAGE_SHIFT),
-					     node.start,
-					     i915_gem_get_pat_index(i915,
-								    I915_CACHE_NONE), 0);
+					     node.start, i915->pat_uc, 0);
 			wmb(); /* flush modifications to the GGTT (insert_page) */
 		} else {
 			page_base += offset & PAGE_MASK;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 4008bb09fdb5..31975a79730c 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1124,14 +1124,10 @@ i915_vma_coredump_create(const struct intel_gt *gt,
 			mutex_lock(&ggtt->error_mutex);
 			if (ggtt->vm.raw_insert_page)
 				ggtt->vm.raw_insert_page(&ggtt->vm, dma, slot,
-							 i915_gem_get_pat_index(gt->i915,
-										I915_CACHE_NONE),
-							 0);
+							 gt->i915->pat_uc, 0);
 			else
 				ggtt->vm.insert_page(&ggtt->vm, dma, slot,
-						     i915_gem_get_pat_index(gt->i915,
-									    I915_CACHE_NONE),
-						     0);
+						     gt->i915->pat_uc, 0);
 			mb();
 
 			s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem.c b/drivers/gpu/drm/i915/selftests/i915_gem.c
index 61da4ed9d521..e620f73793a5 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem.c
@@ -57,10 +57,7 @@ static void trash_stolen(struct drm_i915_private *i915)
 		u32 __iomem *s;
 		int x;
 
-		ggtt->vm.insert_page(&ggtt->vm, dma, slot,
-				     i915_gem_get_pat_index(i915,
-							    I915_CACHE_NONE),
-				     0);
+		ggtt->vm.insert_page(&ggtt->vm, dma, slot, i915->pat_uc, 0);
 
 		s = io_mapping_map_atomic_wc(&ggtt->iomap, slot);
 		for (x = 0; x < PAGE_SIZE / sizeof(u32); x++) {
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
index f8fe3681c3dc..f910ec9b6d2b 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
@@ -246,7 +246,7 @@ static int igt_evict_for_cache_color(void *arg)
 	struct drm_mm_node target = {
 		.start = I915_GTT_PAGE_SIZE * 2,
 		.size = I915_GTT_PAGE_SIZE,
-		.color = i915_gem_get_pat_index(gt->i915, I915_CACHE_LLC),
+		.color = gt->i915->pat_wb,
 	};
 	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
@@ -309,7 +309,7 @@ static int igt_evict_for_cache_color(void *arg)
 	/* Attempt to remove the first *pinned* vma, by removing the (empty)
 	 * neighbour -- this should fail.
 	 */
-	target.color = i915_gem_get_pat_index(gt->i915, I915_CACHE_L3_LLC);
+	target.color = gt->i915->pat_uc;
 
 	mutex_lock(&ggtt->vm.mutex);
 	err = i915_gem_evict_for_node(&ggtt->vm, NULL, &target, 0);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 5c397a2df70e..c96b7f7d7853 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -135,7 +135,7 @@ fake_dma_object(struct drm_i915_private *i915, u64 size)
 
 	obj->write_domain = I915_GEM_DOMAIN_CPU;
 	obj->read_domains = I915_GEM_DOMAIN_CPU;
-	obj->pat_index = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
+	obj->pat_index = i915->pat_uc;
 
 	/* Preallocate the "backing storage" */
 	if (i915_gem_object_pin_pages_unlocked(obj))
@@ -358,9 +358,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
 			mock_vma_res->start = addr;
 
 			with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
-			  vm->insert_entries(vm, mock_vma_res,
-					     i915_gem_get_pat_index(vm->i915,
-								    I915_CACHE_NONE),
+			  vm->insert_entries(vm, mock_vma_res, vm->i915->pat_uc,
 					     0);
 		}
 		count = n;
@@ -1379,10 +1377,7 @@ static int igt_ggtt_page(void *arg)
 
 		ggtt->vm.insert_page(&ggtt->vm,
 				     i915_gem_object_get_dma_address(obj, 0),
-				     offset,
-				     i915_gem_get_pat_index(i915,
-							    I915_CACHE_NONE),
-				     0);
+				     offset, i915->pat_uc, 0);
 	}
 
 	order = i915_random_order(count, &prng);
diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
index d985d9bae2e8..b82fe0ef8cd7 100644
--- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
@@ -1070,9 +1070,7 @@ static int igt_lmem_write_cpu(void *arg)
 	/* Put the pages into a known state -- from the gpu for added fun */
 	intel_engine_pm_get(engine);
 	err = intel_context_migrate_clear(engine->gt->migrate.context, NULL,
-					  obj->mm.pages->sgl,
-					  i915_gem_get_pat_index(i915,
-								 I915_CACHE_NONE),
+					  obj->mm.pages->sgl, i915->pat_uc,
 					  true, 0xdeadbeaf, &rq);
 	if (rq) {
 		dma_resv_add_fence(obj->base.resv, &rq->fence,
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index da0b269606c5..1d1a457e2aee 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -181,6 +181,8 @@ struct drm_i915_private *mock_gem_device(void)
 	/* Set up device info and initial runtime info. */
 	intel_device_info_driver_create(i915, pdev->device, &mock_info);
 
+	i915_cache_init(i915);
+
 	dev_pm_domain_set(&pdev->dev, &pm_domain);
 	pm_runtime_enable(&pdev->dev);
 	pm_runtime_dont_use_autosuspend(&pdev->dev);
-- 
2.39.2


index 4008bb09fdb5..31975a79730c 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1124,14 +1124,10 @@ i915_vma_coredump_create(const struct intel_gt *gt,
 			mutex_lock(&ggtt->error_mutex);
 			if (ggtt->vm.raw_insert_page)
 				ggtt->vm.raw_insert_page(&ggtt->vm, dma, slot,
-							 i915_gem_get_pat_index(gt->i915,
-										I915_CACHE_NONE),
-							 0);
+							 gt->i915->pat_uc, 0);
 			else
 				ggtt->vm.insert_page(&ggtt->vm, dma, slot,
-						     i915_gem_get_pat_index(gt->i915,
-									    I915_CACHE_NONE),
-						     0);
+						     gt->i915->pat_uc, 0);
 			mb();
 
 			s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem.c b/drivers/gpu/drm/i915/selftests/i915_gem.c
index 61da4ed9d521..e620f73793a5 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem.c
@@ -57,10 +57,7 @@ static void trash_stolen(struct drm_i915_private *i915)
 		u32 __iomem *s;
 		int x;
 
-		ggtt->vm.insert_page(&ggtt->vm, dma, slot,
-				     i915_gem_get_pat_index(i915,
-							    I915_CACHE_NONE),
-				     0);
+		ggtt->vm.insert_page(&ggtt->vm, dma, slot, i915->pat_uc, 0);
 
 		s = io_mapping_map_atomic_wc(&ggtt->iomap, slot);
 		for (x = 0; x < PAGE_SIZE / sizeof(u32); x++) {
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
index f8fe3681c3dc..f910ec9b6d2b 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
@@ -246,7 +246,7 @@ static int igt_evict_for_cache_color(void *arg)
 	struct drm_mm_node target = {
 		.start = I915_GTT_PAGE_SIZE * 2,
 		.size = I915_GTT_PAGE_SIZE,
-		.color = i915_gem_get_pat_index(gt->i915, I915_CACHE_LLC),
+		.color = gt->i915->pat_wb,
 	};
 	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
@@ -309,7 +309,7 @@ static int igt_evict_for_cache_color(void *arg)
 	/* Attempt to remove the first *pinned* vma, by removing the (empty)
 	 * neighbour -- this should fail.
 	 */
-	target.color = i915_gem_get_pat_index(gt->i915, I915_CACHE_L3_LLC);
+	target.color = gt->i915->pat_uc;
 
 	mutex_lock(&ggtt->vm.mutex);
 	err = i915_gem_evict_for_node(&ggtt->vm, NULL, &target, 0);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 5c397a2df70e..c96b7f7d7853 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -135,7 +135,7 @@ fake_dma_object(struct drm_i915_private *i915, u64 size)
 
 	obj->write_domain = I915_GEM_DOMAIN_CPU;
 	obj->read_domains = I915_GEM_DOMAIN_CPU;
-	obj->pat_index = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
+	obj->pat_index = i915->pat_uc;
 
 	/* Preallocate the "backing storage" */
 	if (i915_gem_object_pin_pages_unlocked(obj))
@@ -358,9 +358,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
 			mock_vma_res->start = addr;
 
 			with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
-			  vm->insert_entries(vm, mock_vma_res,
-					     i915_gem_get_pat_index(vm->i915,
-								    I915_CACHE_NONE),
+			  vm->insert_entries(vm, mock_vma_res, vm->i915->pat_uc,
 					     0);
 		}
 		count = n;
@@ -1379,10 +1377,7 @@ static int igt_ggtt_page(void *arg)
 
 		ggtt->vm.insert_page(&ggtt->vm,
 				     i915_gem_object_get_dma_address(obj, 0),
-				     offset,
-				     i915_gem_get_pat_index(i915,
-							    I915_CACHE_NONE),
-				     0);
+				     offset, i915->pat_uc, 0);
 	}
 
 	order = i915_random_order(count, &prng);
diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
index d985d9bae2e8..b82fe0ef8cd7 100644
--- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
@@ -1070,9 +1070,7 @@ static int igt_lmem_write_cpu(void *arg)
 	/* Put the pages into a known state -- from the gpu for added fun */
 	intel_engine_pm_get(engine);
 	err = intel_context_migrate_clear(engine->gt->migrate.context, NULL,
-					  obj->mm.pages->sgl,
-					  i915_gem_get_pat_index(i915,
-								 I915_CACHE_NONE),
+					  obj->mm.pages->sgl, i915->pat_uc,
 					  true, 0xdeadbeaf, &rq);
 	if (rq) {
 		dma_resv_add_fence(obj->base.resv, &rq->fence,
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index da0b269606c5..1d1a457e2aee 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -181,6 +181,8 @@ struct drm_i915_private *mock_gem_device(void)
 	/* Set up device info and initial runtime info. */
 	intel_device_info_driver_create(i915, pdev->device, &mock_info);
 
+	i915_cache_init(i915);
+
 	dev_pm_domain_set(&pdev->dev, &pm_domain);
 	pm_runtime_enable(&pdev->dev);
 	pm_runtime_dont_use_autosuspend(&pdev->dev);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [RFC 4/8] drm/i915: Refactor PAT/object cache handling
  2023-07-27 14:54 ` [Intel-gfx] " Tvrtko Ursulin
@ 2023-07-27 14:55   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 59+ messages in thread
From: Tvrtko Ursulin @ 2023-07-27 14:55 UTC (permalink / raw)
  To: Intel-gfx, dri-devel
  Cc: Matt Roper, Chris Wilson, Andi Shyti, Fei Yang, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Commit 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
introduced PAT indices into the i915 internal APIs, partially replacing
the use of the driver internal cache_level, but it also made a few
sub-optimal design decisions which this patch tries to improve upon.

The principal change here is to invert the per-platform cache level to
PAT index table which was added by the referenced commit, and by doing
so enable i915 to understand the cache mode behind each PAT index,
changing them from opaque to transparent.
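
Taking Gen12 as an example (from the i915_pci.c hunk below), a sketch of
the reversed lookup direction:

  /* Before: per-platform table mapped cache level -> PAT index. */
  cachelevel_to_pat[I915_CACHE_NONE] = 3;

  /* After: per-platform table maps PAT index -> cache mode, ... */
  cache_modes[3] = I915_CACHE(UC);

  /* ... and a PAT index for a mode is found by searching that table. */
  pat = i915_cache_find_pat(i915, I915_CACHE_NONE);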

Once we have the inverted table we are able to remove the hidden,
unconditional "return true" from i915_gem_object_has_cache_level and
make the involved code paths clearer.
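
As an illustration, in use_cpu_reloc() (see the execbuffer hunk below)
the user PAT special case is now spelled out at the call site instead of
being implied by the helper:

  /* Before: the helper silently returned true for user PAT objects. */
  !i915_gem_object_has_cache_level(obj, I915_CACHE_NONE)

  /* After: the special case is explicit. */
  !(obj->pat_set_by_user ||
    i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC))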

To achieve this we replace the enum i915_cache_level with i915_cache_t,
which carries a more detailed representation of each cache mode (base
mode plus flags).
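
The encoding itself is a simple mode-plus-flags bitfield (see the
i915_cache.h hunk below); for instance:

  typedef u16 i915_cache_t; /* low byte: mode, high byte: flags */

  /* I915_CACHE(WB, COH1W, COH2W) expands, roughly, to: */
  (i915_cache_t)(I915_CACHE_MODE_WB |
		 ((I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W) << 8))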

In this way we are able to express the differences between the
write-back coherency settings on Meteorlake, which in turn enables us
to map the i915 "cached" mode to the correct Meteorlake PAT index.
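
Concretely, with the Meteorlake PAT layout (visible in the old debugfs
table removed below), the fully coherent write-back entry is PAT index 4:

  /* I915_CACHE_CACHED == I915_CACHE(WB, COH1W, COH2W) */
  i915_cache_find_pat(i915, I915_CACHE_CACHED); /* -> 4 on Meteorlake */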

We can also replace the platform dependent cache-mode-to-string code in
debugfs and elsewhere with a single implementation based on i915_cache_t.
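
For example, the common fully coherent mode would now print as:

  char buf[I915_CACHE_NAME_LEN];

  i915_cache_print(buf, sizeof(buf), NULL, I915_CACHE_CACHED);
  /* buf == "WB-2-Way-Coherent"; the 1-way flag is folded into 2-way. */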

v2:
 * Fix PAT-to-cache-mode table for PVC. (Fei)
 * Cache display caching mode too. (Fei)
 * Improve and document criteria in i915_gem_object_can_bypass_llc() (Matt)

v3:
 * Checkpatch issues.
 * Cache mode flags check fixed.

v4:
 * Fix intel_device_info->cache_modes array size. (Matt)
 * Boolean cache mode and flags query. (Matt)
 * Reduce number of cache macros with some macro magic.
 * One more checkpatch fix.
 * Tweak tables to show legacy and Gen12 WB is fully coherent.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
References: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_domain.c    |  60 +++++----
 drivers/gpu/drm/i915/gem/i915_gem_domain.h    |   5 +-
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   3 +-
 drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |   4 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.c    | 117 ++++++++++--------
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |  11 +-
 .../gpu/drm/i915/gem/i915_gem_object_types.h  | 116 +----------------
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c     |   8 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  20 +--
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   2 +-
 .../drm/i915/gem/selftests/huge_gem_object.c  |   2 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |   3 +-
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  10 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |   2 +-
 drivers/gpu/drm/i915/gt/intel_ggtt.c          |  25 ++--
 drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c     |   4 +-
 drivers/gpu/drm/i915/gt/intel_gtt.c           |   2 +-
 drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   6 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |   4 +-
 drivers/gpu/drm/i915/gt/intel_timeline.c      |   2 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   2 +-
 .../gpu/drm/i915/gt/selftest_workarounds.c    |   2 +-
 drivers/gpu/drm/i915/i915_cache.c             |  89 +++++++++++--
 drivers/gpu/drm/i915/i915_cache.h             |  70 ++++++++++-
 drivers/gpu/drm/i915/i915_debugfs.c           |  53 ++------
 drivers/gpu/drm/i915/i915_driver.c            |   4 +-
 drivers/gpu/drm/i915/i915_gem.c               |  13 --
 drivers/gpu/drm/i915/i915_pci.c               |  84 +++++++------
 drivers/gpu/drm/i915/i915_perf.c              |   2 +-
 drivers/gpu/drm/i915/intel_device_info.h      |   6 +-
 .../gpu/drm/i915/selftests/i915_gem_evict.c   |   4 +-
 drivers/gpu/drm/i915/selftests/igt_spinner.c  |   2 +-
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  14 +--
 36 files changed, 391 insertions(+), 367 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
index 57db9c581bf6..c15f83de33af 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
@@ -8,6 +8,7 @@
 #include "display/intel_frontbuffer.h"
 #include "gt/intel_gt.h"
 
+#include "i915_cache.h"
 #include "i915_drv.h"
 #include "i915_gem_clflush.h"
 #include "i915_gem_domain.h"
@@ -41,14 +42,17 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
 		return false;
 
 	/*
-	 * For objects created by userspace through GEM_CREATE with pat_index
-	 * set by set_pat extension, i915_gem_object_has_cache_level() will
-	 * always return true, because the coherency of such object is managed
-	 * by userspace. Othereise the call here would fall back to checking
-	 * whether the object is un-cached or write-through.
+	 * Always flush cache for UMD objects with PAT index set.
 	 */
-	return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
-		 i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
+	if (obj->pat_set_by_user)
+		return true;
+
+	/*
+	 * Fully coherent cached access may end up with data in the CPU cache
+	 * which hasn't hit memory yet.
+	 */
+	return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
+	       i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W);
 }
 
 bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
@@ -268,7 +272,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 /**
  * i915_gem_object_set_cache_level - Changes the cache-level of an object across all VMA.
  * @obj: object to act on
- * @cache_level: new cache level to set for the object
+ * @cache: new caching mode to set for the object
  *
  * After this function returns, the object will be in the new cache-level
  * across all GTT and the contents of the backing storage will be coherent,
@@ -281,18 +285,28 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
  * that all direct access to the scanout remains coherent.
  */
 int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
-				    enum i915_cache_level cache_level)
+				    i915_cache_t cache)
 {
-	int ret;
+	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+	int pat, ret;
 
-	/*
-	 * For objects created by userspace through GEM_CREATE with pat_index
-	 * set by set_pat extension, simply return 0 here without touching
-	 * the cache setting, because such objects should have an immutable
-	 * cache setting by desgin and always managed by userspace.
-	 */
-	if (i915_gem_object_has_cache_level(obj, cache_level))
+	pat = i915_cache_find_pat(i915, cache);
+	if (pat < 0) {
+		char buf[I915_CACHE_NAME_LEN];
+
+		i915_cache_print(buf, sizeof(buf), NULL, cache);
+		drm_err_ratelimited(&i915->drm,
+				    "Attempting to use unknown caching mode %s!\n",
+				    buf);
+
+		return -EINVAL;
+	} else if (pat == obj->pat_index) {
 		return 0;
+	} else if (obj->pat_set_by_user) {
+		drm_notice_once(&i915->drm,
+				"Attempting to change caching mode on an object with fixed PAT!\n");
+		return -EINVAL;
+	}
 
 	ret = i915_gem_object_wait(obj,
 				   I915_WAIT_INTERRUPTIBLE |
@@ -302,7 +316,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 		return ret;
 
 	/* Always invalidate stale cachelines */
-	i915_gem_object_set_cache_coherency(obj, cache_level);
+	i915_gem_object_set_pat_index(obj, pat);
 	obj->cache_dirty = true;
 
 	/* The cache-level will be applied when each vma is rebound. */
@@ -337,10 +351,10 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
 		goto out;
 	}
 
-	if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||
-	    i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
+	if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
+	    i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W))
 		args->caching = I915_CACHING_CACHED;
-	else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
+	else if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT))
 		args->caching = I915_CACHING_DISPLAY;
 	else
 		args->caching = I915_CACHING_NONE;
@@ -355,7 +369,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
 	struct drm_i915_private *i915 = to_i915(dev);
 	struct drm_i915_gem_caching *args = data;
 	struct drm_i915_gem_object *obj;
-	enum i915_cache_level level;
+	i915_cache_t level;
 	int ret = 0;
 
 	if (IS_DGFX(i915))
@@ -378,7 +392,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
 		if (!HAS_LLC(i915) && !HAS_SNOOP(i915))
 			return -ENODEV;
 
-		level = I915_CACHE_LLC;
+		level = I915_CACHE_CACHED;
 		break;
 	case I915_CACHING_DISPLAY:
 		level = HAS_WT(i915) ? I915_CACHE_WT : I915_CACHE_NONE;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.h b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
index 9622df962bfc..6da5c351f6fd 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
@@ -6,10 +6,11 @@
 #ifndef __I915_GEM_DOMAIN_H__
 #define __I915_GEM_DOMAIN_H__
 
+#include "i915_cache.h"
+
 struct drm_i915_gem_object;
-enum i915_cache_level;
 
 int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
-				    enum i915_cache_level cache_level);
+				    i915_cache_t cache);
 
 #endif /* __I915_GEM_DOMAIN_H__ */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 0a1d40220020..9d6e49c8a4c6 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -648,7 +648,8 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
 	 */
 	return (cache->has_llc ||
 		obj->cache_dirty ||
-		!i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
+		!(obj->pat_set_by_user ||
+		  i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)));
 }
 
 static int eb_reserve_vma(struct i915_execbuffer *eb,
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
index 6bc26b4b06b8..88c360c3d6a3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
@@ -170,7 +170,7 @@ __i915_gem_object_create_internal(struct drm_i915_private *i915,
 	obj->read_domains = I915_GEM_DOMAIN_CPU;
 	obj->write_domain = I915_GEM_DOMAIN_CPU;
 
-	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
+	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
 	i915_gem_object_set_cache_coherency(obj, cache_level);
 
 	return obj;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index aa4d842d4c5a..cd7f8ded0d6f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -382,7 +382,6 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
 		goto err_reset;
 	}
 
-	/* Access to snoopable pages through the GTT is incoherent. */
 	/*
 	 * For objects created by userspace through GEM_CREATE with pat_index
 	 * set by set_pat extension, coherency is managed by userspace, make
@@ -391,7 +390,8 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
 	 * objects. Otherwise this helper function would fall back to checking
 	 * whether the object is un-cached.
 	 */
-	if (!(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
+	if (!((obj->pat_set_by_user ||
+	       i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)) ||
 	      HAS_LLC(i915))) {
 		ret = -EFAULT;
 		goto err_unpin;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 3dc4fbb67d2b..ec1f0be43d0d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -45,33 +45,6 @@ static struct kmem_cache *slab_objects;
 
 static const struct drm_gem_object_funcs i915_gem_object_funcs;
 
-unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
-				    enum i915_cache_level level)
-{
-	if (drm_WARN_ON(&i915->drm, level >= I915_MAX_CACHE_LEVEL))
-		return 0;
-
-	return INTEL_INFO(i915)->cachelevel_to_pat[level];
-}
-
-bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
-				     enum i915_cache_level lvl)
-{
-	/*
-	 * In case the pat_index is set by user space, this kernel mode
-	 * driver should leave the coherency to be managed by user space,
-	 * simply return true here.
-	 */
-	if (obj->pat_set_by_user)
-		return true;
-
-	/*
-	 * Otherwise the pat_index should have been converted from cache_level
-	 * so that the following comparison is valid.
-	 */
-	return obj->pat_index == i915_gem_get_pat_index(obj_to_i915(obj), lvl);
-}
-
 struct drm_i915_gem_object *i915_gem_object_alloc(void)
 {
 	struct drm_i915_gem_object *obj;
@@ -144,30 +117,72 @@ void __i915_gem_object_fini(struct drm_i915_gem_object *obj)
 	dma_resv_fini(&obj->base._resv);
 }
 
+bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
+				    enum i915_cache_mode mode)
+{
+	struct drm_i915_private *i915 = obj_to_i915(obj);
+	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
+
+	return I915_CACHE_MODE(cache) == mode;
+}
+
+bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
+				    unsigned int flag)
+{
+	struct drm_i915_private *i915 = obj_to_i915(obj);
+	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
+
+	return I915_CACHE_FLAGS(cache) & flag;
+}
+
+static void __i915_gem_object_update_coherency(struct drm_i915_gem_object *obj)
+{
+	struct drm_i915_private *i915 = obj_to_i915(obj);
+	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
+	const unsigned int flags = I915_CACHE_FLAGS(cache);
+	const unsigned int mode = I915_CACHE_MODE(cache);
+
+	if (mode == I915_CACHE_MODE_WC ||
+	    mode == I915_CACHE_MODE_WT ||
+	    (mode == I915_CACHE_MODE_WB && (flags & I915_CACHE_FLAG_COH2W)))
+		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ |
+				      I915_BO_CACHE_COHERENT_FOR_WRITE;
+	else if (HAS_LLC(i915))
+		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
+	else
+		obj->cache_coherent = 0;
+
+	obj->cache_dirty =
+		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
+		!IS_DGFX(i915);
+}
+
 /**
  * i915_gem_object_set_cache_coherency - Mark up the object's coherency levels
- * for a given cache_level
+ * for a given caching mode
  * @obj: #drm_i915_gem_object
- * @cache_level: cache level
+ * @cache: cache mode
  */
 void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
-					 unsigned int cache_level)
+					 i915_cache_t cache)
 {
-	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+	struct drm_i915_private *i915 = obj_to_i915(obj);
+	int found;
 
-	obj->pat_index = i915_gem_get_pat_index(i915, cache_level);
+	found = i915_cache_find_pat(i915, cache);
+	if (found < 0) {
+		char buf[I915_CACHE_NAME_LEN];
 
-	if (cache_level != I915_CACHE_NONE)
-		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
-				       I915_BO_CACHE_COHERENT_FOR_WRITE);
-	else if (HAS_LLC(i915))
-		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
-	else
-		obj->cache_coherent = 0;
+		i915_cache_print(buf, sizeof(buf), NULL, cache);
+		drm_err_ratelimited(&i915->drm, "Unknown cache mode %s!\n",
+				    buf);
 
-	obj->cache_dirty =
-		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
-		!IS_DGFX(i915);
+		found = i915->pat_uc;
+	}
+
+	obj->pat_index = found;
+
+	__i915_gem_object_update_coherency(obj);
 }
 
 /**
@@ -181,24 +196,18 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
 void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
 				   unsigned int pat_index)
 {
-	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+	struct drm_i915_private *i915 = obj_to_i915(obj);
 
 	if (obj->pat_index == pat_index)
 		return;
 
+	if (drm_WARN_ON_ONCE(&i915->drm,
+			     pat_index > INTEL_INFO(i915)->max_pat_index))
+		return;
+
 	obj->pat_index = pat_index;
 
-	if (pat_index != i915_gem_get_pat_index(i915, I915_CACHE_NONE))
-		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
-				       I915_BO_CACHE_COHERENT_FOR_WRITE);
-	else if (HAS_LLC(i915))
-		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
-	else
-		obj->cache_coherent = 0;
-
-	obj->cache_dirty =
-		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
-		!IS_DGFX(i915);
+	__i915_gem_object_update_coherency(obj);
 }
 
 bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 884a17275b3a..a5d4ee19d9be 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -13,6 +13,7 @@
 
 #include "display/intel_frontbuffer.h"
 #include "intel_memory_region.h"
+#include "i915_cache.h"
 #include "i915_gem_object_types.h"
 #include "i915_gem_gtt.h"
 #include "i915_gem_ww.h"
@@ -32,10 +33,6 @@ static inline bool i915_gem_object_size_2big(u64 size)
 	return false;
 }
 
-unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
-				    enum i915_cache_level level);
-bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
-				     enum i915_cache_level lvl);
 void i915_gem_init__objects(struct drm_i915_private *i915);
 
 void i915_objects_module_exit(void);
@@ -764,8 +761,12 @@ int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
 				      bool intr);
 bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj);
 
+bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
+				    enum i915_cache_mode mode);
+bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
+				    unsigned int flag);
 void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
-					 unsigned int cache_level);
+					 i915_cache_t cache);
 void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
 				   unsigned int pat_index);
 bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 8de2b91b3edf..6790e13ad262 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -14,6 +14,7 @@
 #include <uapi/drm/i915_drm.h>
 
 #include "i915_active.h"
+#include "i915_cache.h"
 #include "i915_selftest.h"
 #include "i915_vma_resource.h"
 
@@ -116,93 +117,6 @@ struct drm_i915_gem_object_ops {
 	const char *name; /* friendly name for debug, e.g. lockdep classes */
 };
 
-/**
- * enum i915_cache_level - The supported GTT caching values for system memory
- * pages.
- *
- * These translate to some special GTT PTE bits when binding pages into some
- * address space. It also determines whether an object, or rather its pages are
- * coherent with the GPU, when also reading or writing through the CPU cache
- * with those pages.
- *
- * Userspace can also control this through struct drm_i915_gem_caching.
- */
-enum i915_cache_level {
-	/**
-	 * @I915_CACHE_NONE:
-	 *
-	 * GPU access is not coherent with the CPU cache. If the cache is dirty
-	 * and we need the underlying pages to be coherent with some later GPU
-	 * access then we need to manually flush the pages.
-	 *
-	 * On shared LLC platforms reads and writes through the CPU cache are
-	 * still coherent even with this setting. See also
-	 * &drm_i915_gem_object.cache_coherent for more details. Due to this we
-	 * should only ever use uncached for scanout surfaces, otherwise we end
-	 * up over-flushing in some places.
-	 *
-	 * This is the default on non-LLC platforms.
-	 */
-	I915_CACHE_NONE = 0,
-	/**
-	 * @I915_CACHE_LLC:
-	 *
-	 * GPU access is coherent with the CPU cache. If the cache is dirty,
-	 * then the GPU will ensure that access remains coherent, when both
-	 * reading and writing through the CPU cache. GPU writes can dirty the
-	 * CPU cache.
-	 *
-	 * Not used for scanout surfaces.
-	 *
-	 * Applies to both platforms with shared LLC(HAS_LLC), and snooping
-	 * based platforms(HAS_SNOOP).
-	 *
-	 * This is the default on shared LLC platforms.  The only exception is
-	 * scanout objects, where the display engine is not coherent with the
-	 * CPU cache. For such objects I915_CACHE_NONE or I915_CACHE_WT is
-	 * automatically applied by the kernel in pin_for_display, if userspace
-	 * has not done so already.
-	 */
-	I915_CACHE_LLC,
-	/**
-	 * @I915_CACHE_L3_LLC:
-	 *
-	 * Explicitly enable the Gfx L3 cache, with coherent LLC.
-	 *
-	 * The Gfx L3 sits between the domain specific caches, e.g
-	 * sampler/render caches, and the larger LLC. LLC is coherent with the
-	 * GPU, but L3 is only visible to the GPU, so likely needs to be flushed
-	 * when the workload completes.
-	 *
-	 * Not used for scanout surfaces.
-	 *
-	 * Only exposed on some gen7 + GGTT. More recent hardware has dropped
-	 * this explicit setting, where it should now be enabled by default.
-	 */
-	I915_CACHE_L3_LLC,
-	/**
-	 * @I915_CACHE_WT:
-	 *
-	 * Write-through. Used for scanout surfaces.
-	 *
-	 * The GPU can utilise the caches, while still having the display engine
-	 * be coherent with GPU writes, as a result we don't need to flush the
-	 * CPU caches when moving out of the render domain. This is the default
-	 * setting chosen by the kernel, if supported by the HW, otherwise we
-	 * fallback to I915_CACHE_NONE. On the CPU side writes through the CPU
-	 * cache still need to be flushed, to remain coherent with the display
-	 * engine.
-	 */
-	I915_CACHE_WT,
-	/**
-	 * @I915_MAX_CACHE_LEVEL:
-	 *
-	 * Mark the last entry in the enum. Used for defining cachelevel_to_pat
-	 * array for cache_level to pat translation table.
-	 */
-	I915_MAX_CACHE_LEVEL,
-};
-
 enum i915_map_type {
 	I915_MAP_WB = 0,
 	I915_MAP_WC,
@@ -403,16 +317,6 @@ struct drm_i915_gem_object {
 	/**
 	 * @cache_coherent:
 	 *
-	 * Note: with the change above which replaced @cache_level with pat_index,
-	 * the use of @cache_coherent is limited to the objects created by kernel
-	 * or by userspace without pat index specified.
-	 * Check for @pat_set_by_user to find out if an object has pat index set
-	 * by userspace. The ioctl's to change cache settings have also been
-	 * disabled for the objects with pat index set by userspace. Please don't
-	 * assume @cache_coherent having the flags set as describe here. A helper
-	 * function i915_gem_object_has_cache_level() provides one way to bypass
-	 * the use of this field.
-	 *
 	 * Track whether the pages are coherent with the GPU if reading or
 	 * writing through the CPU caches. The largely depends on the
 	 * @cache_level setting.
@@ -447,7 +351,7 @@ struct drm_i915_gem_object {
 	 * flushing the surface just before doing the scanout.  This does mean
 	 * we might unnecessarily flush non-scanout objects in some places, but
 	 * the default assumption is that all normal objects should be using
-	 * I915_CACHE_LLC, at least on platforms with the shared LLC.
+	 * I915_CACHE_CACHED, at least on platforms with the shared LLC.
 	 *
 	 * Supported values:
 	 *
@@ -486,16 +390,6 @@ struct drm_i915_gem_object {
 	/**
 	 * @cache_dirty:
 	 *
-	 * Note: with the change above which replaced cache_level with pat_index,
-	 * the use of @cache_dirty is limited to the objects created by kernel
-	 * or by userspace without pat index specified.
-	 * Check for @pat_set_by_user to find out if an object has pat index set
-	 * by userspace. The ioctl's to change cache settings have also been
-	 * disabled for the objects with pat_index set by userspace. Please don't
-	 * assume @cache_dirty is set as describe here. Also see helper function
-	 * i915_gem_object_has_cache_level() for possible ways to bypass the use
-	 * of this field.
-	 *
 	 * Track if we are we dirty with writes through the CPU cache for this
 	 * object. As a result reading directly from main memory might yield
 	 * stale data.
@@ -531,9 +425,9 @@ struct drm_i915_gem_object {
 	 *
 	 *   1. All userspace objects, by default, have @cache_level set as
 	 *   I915_CACHE_NONE. The only exception is userptr objects, where we
-	 *   instead force I915_CACHE_LLC, but we also don't allow userspace to
-	 *   ever change the @cache_level for such objects. Another special case
-	 *   is dma-buf, which doesn't rely on @cache_dirty,  but there we
+	 *   instead force I915_CACHE_CACHED, but we also don't allow userspace
+	 *   to ever change the @cache_level for such objects. Another special
+	 *   case is dma-buf, which doesn't rely on @cache_dirty,  but there we
 	 *   always do a forced flush when acquiring the pages, if there is a
 	 *   chance that the pages can be read directly from main memory with
 	 *   the GPU.
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 8f1633c3fb93..aba908f0349f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -584,7 +584,7 @@ static int shmem_object_init(struct intel_memory_region *mem,
 	static struct lock_class_key lock_class;
 	struct drm_i915_private *i915 = mem->i915;
 	struct address_space *mapping;
-	unsigned int cache_level;
+	i915_cache_t cache;
 	gfp_t mask;
 	int ret;
 
@@ -628,11 +628,11 @@ static int shmem_object_init(struct intel_memory_region *mem,
 		 * However, we maintain the display planes as UC, and so
 		 * need to rebind when first used as such.
 		 */
-		cache_level = I915_CACHE_LLC;
+		cache = I915_CACHE_CACHED;
 	else
-		cache_level = I915_CACHE_NONE;
+		cache = I915_CACHE_NONE;
 
-	i915_gem_object_set_cache_coherency(obj, cache_level);
+	i915_gem_object_set_cache_coherency(obj, cache);
 
 	i915_gem_object_init_memory_region(obj, mem);
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index 1c8eb806b7d3..cc907a1f1c53 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -691,7 +691,7 @@ static int __i915_gem_object_create_stolen(struct intel_memory_region *mem,
 
 	obj->stolen = stolen;
 	obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
-	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
+	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
 	i915_gem_object_set_cache_coherency(obj, cache_level);
 
 	if (WARN_ON(!i915_gem_object_trylock(obj, NULL)))
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 6bd6c239f4ac..107176d1757b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -48,14 +48,14 @@ void i915_ttm_migrate_set_ban_memcpy(bool ban)
 }
 #endif
 
-static enum i915_cache_level
-i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
-		     struct ttm_tt *ttm)
+static i915_cache_t
+i915_ttm_cache(struct drm_i915_private *i915, struct ttm_resource *res,
+	       struct ttm_tt *ttm)
 {
 	return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
 		!i915_ttm_gtt_binds_lmem(res) &&
-		ttm->caching == ttm_cached) ? I915_CACHE_LLC :
-		I915_CACHE_NONE;
+		ttm->caching == ttm_cached) ? I915_CACHE_CACHED :
+					      I915_CACHE_NONE;
 }
 
 static unsigned int
@@ -112,8 +112,8 @@ void i915_ttm_adjust_domains_after_move(struct drm_i915_gem_object *obj)
 void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
 {
 	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
-	unsigned int cache_level;
 	unsigned int mem_flags;
+	i915_cache_t cache;
 	unsigned int i;
 	int mem_type;
 
@@ -126,13 +126,13 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
 	if (!bo->resource) {
 		mem_flags = I915_BO_FLAG_STRUCT_PAGE;
 		mem_type = I915_PL_SYSTEM;
-		cache_level = I915_CACHE_NONE;
+		cache = I915_CACHE_NONE;
 	} else {
 		mem_flags = i915_ttm_cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
 			I915_BO_FLAG_STRUCT_PAGE;
 		mem_type = bo->resource->mem_type;
-		cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
-						   bo->ttm);
+		cache = i915_ttm_cache(to_i915(bo->base.dev), bo->resource,
+				       bo->ttm);
 	}
 
 	/*
@@ -157,7 +157,7 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
 	obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
 	obj->mem_flags |= mem_flags;
 
-	i915_gem_object_set_cache_coherency(obj, cache_level);
+	i915_gem_object_set_cache_coherency(obj, cache);
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 1d3ebdf4069b..5d2891981bd4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -553,7 +553,7 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
 	obj->mem_flags = I915_BO_FLAG_STRUCT_PAGE;
 	obj->read_domains = I915_GEM_DOMAIN_CPU;
 	obj->write_domain = I915_GEM_DOMAIN_CPU;
-	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
+	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
 
 	obj->userptr.ptr = args->user_ptr;
 	obj->userptr.notifier_seq = ULONG_MAX;
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
index bac957755068..77d04be5e9d7 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
@@ -123,7 +123,7 @@ huge_gem_object(struct drm_i915_private *i915,
 
 	obj->read_domains = I915_GEM_DOMAIN_CPU;
 	obj->write_domain = I915_GEM_DOMAIN_CPU;
-	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
+	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
 	i915_gem_object_set_cache_coherency(obj, cache_level);
 	obj->scratch = phys_size;
 
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index 6bddd733d796..6ca5b9dbc414 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -200,9 +200,10 @@ huge_pages_object(struct drm_i915_private *i915,
 	obj->write_domain = I915_GEM_DOMAIN_CPU;
 	obj->read_domains = I915_GEM_DOMAIN_CPU;
 
-	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
+	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
 	i915_gem_object_set_cache_coherency(obj, cache_level);
 
+
 	obj->mm.page_mask = page_mask;
 
 	return obj;
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index 675f71f06e89..3c93a73cf6b1 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -16,11 +16,11 @@
 #include "intel_gtt.h"
 
 static u64 gen8_pde_encode(const dma_addr_t addr,
-			   const enum i915_cache_level level)
+			   const enum i915_cache_mode cache_mode)
 {
 	u64 pde = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
 
-	if (level != I915_CACHE_NONE)
+	if (cache_mode != I915_CACHE_MODE_UC)
 		pde |= PPAT_CACHED_PDE;
 	else
 		pde |= PPAT_UNCACHED;
@@ -43,10 +43,10 @@ static u64 gen8_pte_encode(dma_addr_t addr,
 	 * See translation table defined by LEGACY_CACHELEVEL.
 	 */
 	switch (pat_index) {
-	case I915_CACHE_NONE:
+	case I915_CACHE_MODE_UC:
 		pte |= PPAT_UNCACHED;
 		break;
-	case I915_CACHE_WT:
+	case I915_CACHE_MODE_WT:
 		pte |= PPAT_DISPLAY_ELLC;
 		break;
 	default:
@@ -893,7 +893,7 @@ static int gen8_init_scratch(struct i915_address_space *vm)
 		}
 
 		fill_px(obj, vm->scratch[i - 1]->encode);
-		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_NONE);
+		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_MODE_UC);
 
 		vm->scratch[i] = obj;
 	}
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index ee15486fed0d..f1e59e512d14 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1103,7 +1103,7 @@ static int init_status_page(struct intel_engine_cs *engine)
 		return PTR_ERR(obj);
 	}
 
-	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
+	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
 
 	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
 	if (IS_ERR(vma)) {
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index fca61ddca8ad..ab5f654e7557 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -1011,11 +1011,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
 	return ggtt_probe_common(ggtt, size);
 }
 
-/*
- * For pre-gen8 platforms pat_index is the same as enum i915_cache_level,
- * so the switch-case statements in these PTE encode functions are still valid.
- * See translation table LEGACY_CACHELEVEL.
- */
 static u64 snb_pte_encode(dma_addr_t addr,
 			  unsigned int pat_index,
 			  u32 flags)
@@ -1023,11 +1018,11 @@ static u64 snb_pte_encode(dma_addr_t addr,
 	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
 
 	switch (pat_index) {
-	case I915_CACHE_L3_LLC:
-	case I915_CACHE_LLC:
+	case I915_CACHE_MODE_WB:
+	case __I915_CACHE_MODE_WB_L3:
 		pte |= GEN6_PTE_CACHE_LLC;
 		break;
-	case I915_CACHE_NONE:
+	case I915_CACHE_MODE_UC:
 		pte |= GEN6_PTE_UNCACHED;
 		break;
 	default:
@@ -1044,13 +1039,13 @@ static u64 ivb_pte_encode(dma_addr_t addr,
 	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
 
 	switch (pat_index) {
-	case I915_CACHE_L3_LLC:
+	case __I915_CACHE_MODE_WB_L3:
 		pte |= GEN7_PTE_CACHE_L3_LLC;
 		break;
-	case I915_CACHE_LLC:
+	case I915_CACHE_MODE_WB:
 		pte |= GEN6_PTE_CACHE_LLC;
 		break;
-	case I915_CACHE_NONE:
+	case I915_CACHE_MODE_UC:
 		pte |= GEN6_PTE_UNCACHED;
 		break;
 	default:
@@ -1069,7 +1064,7 @@ static u64 byt_pte_encode(dma_addr_t addr,
 	if (!(flags & PTE_READ_ONLY))
 		pte |= BYT_PTE_WRITEABLE;
 
-	if (pat_index != I915_CACHE_NONE)
+	if (pat_index != I915_CACHE_MODE_UC)
 		pte |= BYT_PTE_SNOOPED_BY_CPU_CACHES;
 
 	return pte;
@@ -1081,7 +1076,7 @@ static u64 hsw_pte_encode(dma_addr_t addr,
 {
 	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
 
-	if (pat_index != I915_CACHE_NONE)
+	if (pat_index != I915_CACHE_MODE_UC)
 		pte |= HSW_WB_LLC_AGE3;
 
 	return pte;
@@ -1094,9 +1089,9 @@ static u64 iris_pte_encode(dma_addr_t addr,
 	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
 
 	switch (pat_index) {
-	case I915_CACHE_NONE:
+	case I915_CACHE_MODE_UC:
 		break;
-	case I915_CACHE_WT:
+	case I915_CACHE_MODE_WT:
 		pte |= HSW_WT_ELLC_LLC_AGE3;
 		break;
 	default:
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
index 866c416afb73..803c41ac4ccb 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
@@ -21,7 +21,7 @@ static void gmch_ggtt_insert_page(struct i915_address_space *vm,
 				  unsigned int pat_index,
 				  u32 unused)
 {
-	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
+	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
 		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
 
 	intel_gmch_gtt_insert_page(addr, offset >> PAGE_SHIFT, flags);
@@ -32,7 +32,7 @@ static void gmch_ggtt_insert_entries(struct i915_address_space *vm,
 				     unsigned int pat_index,
 				     u32 unused)
 {
-	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
+	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
 		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
 
 	intel_gmch_gtt_insert_sg_entries(vma_res->bi.pages, vma_res->start >> PAGE_SHIFT,
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 065099362a98..48055304537a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -676,7 +676,7 @@ __vm_create_scratch_for_read(struct i915_address_space *vm, unsigned long size)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
+	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
 
 	vma = i915_vma_instance(obj, vm, NULL);
 	if (IS_ERR(vma)) {
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 7192a534a654..af4277c1d577 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -636,7 +636,8 @@ void
 __set_pd_entry(struct i915_page_directory * const pd,
 	       const unsigned short idx,
 	       struct i915_page_table *pt,
-	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level));
+	       u64 (*encode)(const dma_addr_t,
+			     const enum i915_cache_mode cache_mode));
 
 #define set_pd_entry(pd, idx, to) \
 	__set_pd_entry((pd), (idx), px_pt(to), gen8_pde_encode)
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index 436756bfbb1a..3e461d4f3693 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -98,14 +98,16 @@ void
 __set_pd_entry(struct i915_page_directory * const pd,
 	       const unsigned short idx,
 	       struct i915_page_table * const to,
-	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level))
+	       u64 (*encode)(const dma_addr_t,
+			     const enum i915_cache_mode cache_mode))
 {
 	/* Each thread pre-pins the pd, and we may have a thread per pde. */
 	GEM_BUG_ON(atomic_read(px_used(pd)) > NALLOC * I915_PDES);
 
 	atomic_inc(px_used(pd));
 	pd->entry[idx] = to;
-	write_dma_entry(px_base(pd), idx, encode(px_dma(to), I915_CACHE_LLC));
+	write_dma_entry(px_base(pd), idx,
+			encode(px_dma(to), I915_CACHE_MODE_WB));
 }
 
 void
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 92085ffd23de..9131d228d285 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -551,7 +551,9 @@ alloc_context_vma(struct intel_engine_cs *engine)
 	 * later platforms don't have L3 control bits in the PTE.
 	 */
 	if (IS_IVYBRIDGE(i915))
-		i915_gem_object_set_cache_coherency(obj, I915_CACHE_L3_LLC);
+		i915_gem_object_set_cache_coherency(obj,
+						    I915_CACHE_CACHED |
+						    __I915_CACHE_FLAG(L3));
 
 	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
 	if (IS_ERR(vma)) {
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
index b9640212d659..025ce54c886d 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.c
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
@@ -26,7 +26,7 @@ static struct i915_vma *hwsp_alloc(struct intel_gt *gt)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
+	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
 
 	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
 	if (IS_ERR(vma))
diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index 8b0d84f2aad2..fc278fa463b0 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -64,7 +64,7 @@ static int hang_init(struct hang *h, struct intel_gt *gt)
 		goto err_hws;
 	}
 
-	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_LLC);
+	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_CACHED);
 	vaddr = i915_gem_object_pin_map_unlocked(h->hws, I915_MAP_WB);
 	if (IS_ERR(vaddr)) {
 		err = PTR_ERR(vaddr);
diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
index 14a8b25b6204..d25990d33d44 100644
--- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
@@ -111,7 +111,7 @@ read_nonprivs(struct intel_context *ce)
 	if (IS_ERR(result))
 		return result;
 
-	i915_gem_object_set_cache_coherency(result, I915_CACHE_LLC);
+	i915_gem_object_set_cache_coherency(result, I915_CACHE_CACHED);
 
 	cs = i915_gem_object_pin_map_unlocked(result, I915_MAP_WB);
 	if (IS_ERR(cs)) {
diff --git a/drivers/gpu/drm/i915/i915_cache.c b/drivers/gpu/drm/i915/i915_cache.c
index 06eb5933c719..f4ba1cb430d3 100644
--- a/drivers/gpu/drm/i915/i915_cache.c
+++ b/drivers/gpu/drm/i915/i915_cache.c
@@ -6,13 +6,88 @@
 #include "i915_cache.h"
 #include "i915_drv.h"
 
-void i915_cache_init(struct drm_i915_private *i915)
+int i915_cache_init(struct drm_i915_private *i915)
 {
-	i915->pat_uc = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
-	drm_info(&i915->drm, "Using PAT index %u for uncached access\n",
-		 i915->pat_uc);
+	int ret;
 
-	i915->pat_wb = i915_gem_get_pat_index(i915, I915_CACHE_LLC);
-	drm_info(&i915->drm, "Using PAT index %u for write-back access\n",
-		 i915->pat_wb);
+	ret = i915_cache_find_pat(i915, I915_CACHE_NONE);
+	if (ret < 0) {
+		drm_err(&i915->drm,
+			"Failed to find PAT index for uncached access\n");
+		return -ENODEV;
+	}
+	drm_info(&i915->drm, "Using PAT index %u for uncached access\n", ret);
+	i915->pat_uc = ret;
+
+	ret = i915_cache_find_pat(i915, I915_CACHE_CACHED);
+	if (ret < 0) {
+		drm_err(&i915->drm,
+			"Failed to find PAT index for write-back access\n");
+		return -ENODEV;
+	}
+	drm_info(&i915->drm, "Using PAT index %u for write-back access\n", ret);
+	i915->pat_wb = ret;
+
+	return 0;
+}
+
+int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache)
+{
+	const struct intel_device_info *info = INTEL_INFO(i915);
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(info->cache_modes); i++) {
+		if (info->cache_modes[i] == cache)
+			return i;
+	}
+
+	return -1;
+}
+
+void i915_cache_print(char *buf, size_t buflen, const char *suffix,
+		      i915_cache_t cache)
+{
+	const enum i915_cache_mode mode = I915_CACHE_MODE(cache);
+	static const char * const mode_str[] = {
+		[I915_CACHE_MODE_UC] = "UC",
+		[I915_CACHE_MODE_WB] = "WB",
+		[I915_CACHE_MODE_WT] = "WT",
+		[I915_CACHE_MODE_WC] = "WC",
+	};
+	static const char * const flag_str[] = {
+		[ilog2(I915_CACHE_FLAG_COH1W)] = "1-Way-Coherent",
+		[ilog2(I915_CACHE_FLAG_COH2W)] = "2-Way-Coherent",
+		[ilog2(I915_CACHE_FLAG_L3)] =    "L3",
+		[ilog2(I915_CACHE_FLAG_CLOS1)] = "CLOS1",
+		[ilog2(I915_CACHE_FLAG_CLOS2)] = "CLOS2",
+	};
+
+	if (mode >= ARRAY_SIZE(mode_str)) {
+		snprintf(buf, buflen, "0x%x%s", cache, suffix ?: "");
+	} else {
+		unsigned long flags = I915_CACHE_FLAGS(cache);
+		unsigned long bit;
+		int ret;
+
+		ret = snprintf(buf, buflen, "%s", mode_str[mode]);
+		buf += ret;
+		buflen -= ret;
+
+		/*
+		 * Don't print "1-way-2-way", it would be confusing and 2-way
+		 * implies 1-way anyway.
+		 */
+		if ((flags & (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W)) ==
+		    (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W))
+			flags &= ~I915_CACHE_FLAG_COH1W;
+
+		for_each_set_bit(bit, &flags, BITS_PER_TYPE(i915_cache_t)) {
+			ret = snprintf(buf, buflen, "-%s", flag_str[bit]);
+			buf += ret;
+			buflen -= ret;
+		}
+
+		if (suffix)
+			snprintf(buf, buflen, "%s", suffix);
+	}
 }
diff --git a/drivers/gpu/drm/i915/i915_cache.h b/drivers/gpu/drm/i915/i915_cache.h
index cb68936fb8a2..d9e97318b942 100644
--- a/drivers/gpu/drm/i915/i915_cache.h
+++ b/drivers/gpu/drm/i915/i915_cache.h
@@ -6,8 +6,76 @@
 #ifndef __I915_CACHE_H__
 #define __I915_CACHE_H__
 
+#include <linux/types.h>
+
+struct drm_printer;
+
 struct drm_i915_private;
 
-void i915_cache_init(struct drm_i915_private *i915);
+typedef u16 i915_cache_t;
+
+/* Cache modes */
+enum i915_cache_mode {
+	I915_CACHE_MODE_UC = 0,
+	I915_CACHE_MODE_WB,
+	__I915_CACHE_MODE_WB_L3, /* Special do-not-use entry for legacy 1:1 mapping. */
+	I915_CACHE_MODE_WT,
+	I915_CACHE_MODE_WC,
+	I915_NUM_CACHE_MODES
+};
+
+/* Cache mode flag bits */
+#define I915_CACHE_FLAG_COH1W	(0x1)
+#define I915_CACHE_FLAG_COH2W	(0x2) /* 1-way needs to be set too. */
+#define I915_CACHE_FLAG_L3	(0x4)
+#define I915_CACHE_FLAG_CLOS1	(0x8)
+#define I915_CACHE_FLAG_CLOS2	(0x10)
+
+/*
+ * Overloaded I915_CACHE() macro based on:
+ *  https://stackoverflow.com/questions/3046889/optional-parameters-with-c-macros
+ *
+ * It is possible to call I915_CACHE with mode and zero or more flags as
+ * separate arguments. I.e. these all work:
+ *
+ *   I915_CACHE(WB)
+ *   I915_CACHE(WB, COH1W, COH2W)
+ *   I915_CACHE(WB, COH1W, COH2W, L3)
+ */
+
+#define __I915_CACHE_FLAG(f) (I915_CACHE_FLAG_##f << 8)
+#define __I915_CACHE(m, f) ((i915_cache_t)(I915_CACHE_MODE_##m | (f)))
+
+#define I915_CACHE_4(m, f1, f2, f3)	__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2) | __I915_CACHE_FLAG(f3))
+#define I915_CACHE_3(m, f1, f2)		__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2))
+#define I915_CACHE_2(m, f1)		__I915_CACHE(m, __I915_CACHE_FLAG(f1))
+#define I915_CACHE_1(m)			__I915_CACHE(m, 0)
+#define I915_CACHE_0(m)			__I915_CACHE(WC, 0)
+
+#define FUNC_CHOOSER(_f1, _f2, _f3, _f4, _f5, ...) _f5
+#define FUNC_RECOMPOSER(argsWithParentheses) FUNC_CHOOSER argsWithParentheses
+#define CHOOSE_FROM_ARG_COUNT(...) FUNC_RECOMPOSER((__VA_ARGS__, I915_CACHE_4, I915_CACHE_3, I915_CACHE_2, I915_CACHE_1, ))
+#define NO_ARG_EXPANDER() ,,,I915_CACHE_0
+#define MACRO_CHOOSER(...) CHOOSE_FROM_ARG_COUNT(NO_ARG_EXPANDER __VA_ARGS__ ())
+
+#define I915_CACHE(...) MACRO_CHOOSER(__VA_ARGS__)(__VA_ARGS__)
+
+/* i915_cache_t mode and flags extraction helpers. */
+#define I915_CACHE_MODE(cache) \
+	((enum i915_cache_mode)(((i915_cache_t)(cache)) & 0xff))
+#define I915_CACHE_FLAGS(cache) \
+	((unsigned int)((((i915_cache_t)(cache) & 0xff00)) >> 8))
+
+/* Helpers for i915 caching modes. */
+#define I915_CACHE_NONE		I915_CACHE(UC)
+#define I915_CACHE_CACHED	I915_CACHE(WB, COH1W, COH2W)
+#define I915_CACHE_WT		I915_CACHE(WT)
+
+int i915_cache_init(struct drm_i915_private *i915);
+int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache);
+void i915_cache_print(char *buf, size_t buflen, const char *suffix,
+		      i915_cache_t cache);
+
+#define I915_CACHE_NAME_LEN (40)
 
 #endif /* __I915_CACHE_H__ */
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 4de44cf1026d..4ec292011546 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -140,57 +140,18 @@ static const char *stringify_vma_type(const struct i915_vma *vma)
 	return "ppgtt";
 }
 
-static const char *i915_cache_level_str(struct drm_i915_gem_object *obj)
-{
-	struct drm_i915_private *i915 = obj_to_i915(obj);
-
-	if (IS_METEORLAKE(i915)) {
-		switch (obj->pat_index) {
-		case 0: return " WB";
-		case 1: return " WT";
-		case 2: return " UC";
-		case 3: return " WB (1-Way Coh)";
-		case 4: return " WB (2-Way Coh)";
-		default: return " not defined";
-		}
-	} else if (IS_PONTEVECCHIO(i915)) {
-		switch (obj->pat_index) {
-		case 0: return " UC";
-		case 1: return " WC";
-		case 2: return " WT";
-		case 3: return " WB";
-		case 4: return " WT (CLOS1)";
-		case 5: return " WB (CLOS1)";
-		case 6: return " WT (CLOS2)";
-		case 7: return " WT (CLOS2)";
-		default: return " not defined";
-		}
-	} else if (GRAPHICS_VER(i915) >= 12) {
-		switch (obj->pat_index) {
-		case 0: return " WB";
-		case 1: return " WC";
-		case 2: return " WT";
-		case 3: return " UC";
-		default: return " not defined";
-		}
-	} else {
-		switch (obj->pat_index) {
-		case 0: return " UC";
-		case 1: return HAS_LLC(i915) ?
-			       " LLC" : " snooped";
-		case 2: return " L3+LLC";
-		case 3: return " WT";
-		default: return " not defined";
-		}
-	}
-}
-
 void
 i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 {
+	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+	char buf[I915_CACHE_NAME_LEN];
 	struct i915_vma *vma;
 	int pin_count = 0;
 
+	i915_cache_print(buf, sizeof(buf),
+			 obj->pat_set_by_user ? "!" : NULL,
+			 INTEL_INFO(i915)->cache_modes[obj->pat_index]);
+
 	seq_printf(m, "%pK: %c%c%c %8zdKiB %02x %02x %s%s%s",
 		   &obj->base,
 		   get_tiling_flag(obj),
@@ -199,7 +160,7 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		   obj->base.size / 1024,
 		   obj->read_domains,
 		   obj->write_domain,
-		   i915_cache_level_str(obj),
+		   buf,
 		   obj->mm.dirty ? " dirty" : "",
 		   obj->mm.madv == I915_MADV_DONTNEED ? " purgeable" : "");
 	if (obj->base.name)
diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index bb2223cc3470..8663388a524f 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -241,7 +241,9 @@ static int i915_driver_early_probe(struct drm_i915_private *dev_priv)
 	i915_memcpy_init_early(dev_priv);
 	intel_runtime_pm_init_early(&dev_priv->runtime_pm);
 
-	i915_cache_init(dev_priv);
+	ret = i915_cache_init(dev_priv);
+	if (ret < 0)
+		return ret;
 
 	ret = i915_workqueues_init(dev_priv);
 	if (ret < 0)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 896aa48ed089..814705cfeb12 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1144,19 +1144,6 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 	unsigned int i;
 	int ret;
 
-	/*
-	 * In the proccess of replacing cache_level with pat_index a tricky
-	 * dependency is created on the definition of the enum i915_cache_level.
-	 * in case this enum is changed, PTE encode would be broken.
-	 * Add a WARNING here. And remove when we completely quit using this
-	 * enum
-	 */
-	BUILD_BUG_ON(I915_CACHE_NONE != 0 ||
-		     I915_CACHE_LLC != 1 ||
-		     I915_CACHE_L3_LLC != 2 ||
-		     I915_CACHE_WT != 3 ||
-		     I915_MAX_CACHE_LEVEL != 4);
-
 	/* We need to fallback to 4K pages if host doesn't support huge gtt. */
 	if (intel_vgpu_active(dev_priv) && !intel_vgpu_has_huge_gtt(dev_priv))
 		RUNTIME_INFO(dev_priv)->page_sizes = I915_GTT_PAGE_SIZE_4K;
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index fcacdc21643c..565a60a1645d 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -32,6 +32,7 @@
 #include "gt/intel_sa_media.h"
 #include "gem/i915_gem_object_types.h"
 
+#include "i915_cache.h"
 #include "i915_driver.h"
 #include "i915_drv.h"
 #include "i915_pci.h"
@@ -43,36 +44,43 @@
 	.__runtime.graphics.ip.ver = (x), \
 	.__runtime.media.ip.ver = (x)
 
-#define LEGACY_CACHELEVEL \
-	.cachelevel_to_pat = { \
-		[I915_CACHE_NONE]   = 0, \
-		[I915_CACHE_LLC]    = 1, \
-		[I915_CACHE_L3_LLC] = 2, \
-		[I915_CACHE_WT]     = 3, \
+#define LEGACY_CACHE_MODES \
+	.cache_modes = { \
+		[I915_CACHE_MODE_UC] 	  = I915_CACHE(UC), \
+		[I915_CACHE_MODE_WB] 	  = I915_CACHE(WB, COH1W, COH2W), \
+		[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, COH2W, L3), \
+		[I915_CACHE_MODE_WT] 	  = I915_CACHE(WT), \
 	}
 
-#define TGL_CACHELEVEL \
-	.cachelevel_to_pat = { \
-		[I915_CACHE_NONE]   = 3, \
-		[I915_CACHE_LLC]    = 0, \
-		[I915_CACHE_L3_LLC] = 0, \
-		[I915_CACHE_WT]     = 2, \
+#define GEN12_CACHE_MODES \
+	.cache_modes = { \
+		[0] = I915_CACHE(WB, COH1W, COH2W), \
+		[1] = I915_CACHE(WC), \
+		[2] = I915_CACHE(WT), \
+		[3] = I915_CACHE(UC), \
 	}
 
-#define PVC_CACHELEVEL \
-	.cachelevel_to_pat = { \
-		[I915_CACHE_NONE]   = 0, \
-		[I915_CACHE_LLC]    = 3, \
-		[I915_CACHE_L3_LLC] = 3, \
-		[I915_CACHE_WT]     = 2, \
+/* FIXME: is WB 1-way or 2-way coherent for PAT indices 3, 5 and 7? */
+
+#define PVC_CACHE_MODES \
+	.cache_modes = { \
+		[0] = I915_CACHE(UC), \
+		[1] = I915_CACHE(WC), \
+		[2] = I915_CACHE(WT), \
+		[3] = I915_CACHE(WB, COH1W), \
+		[4] = I915_CACHE(WT, CLOS1), \
+		[5] = I915_CACHE(WB, COH1W, CLOS1), \
+		[6] = I915_CACHE(WT, CLOS2), \
+		[7] = I915_CACHE(WB, COH1W, CLOS2), \
 	}
 
-#define MTL_CACHELEVEL \
-	.cachelevel_to_pat = { \
-		[I915_CACHE_NONE]   = 2, \
-		[I915_CACHE_LLC]    = 3, \
-		[I915_CACHE_L3_LLC] = 3, \
-		[I915_CACHE_WT]     = 1, \
+#define MTL_CACHE_MODES \
+	.cache_modes = { \
+		[0] = I915_CACHE(WB), \
+		[1] = I915_CACHE(WT), \
+		[2] = I915_CACHE(UC), \
+		[3] = I915_CACHE(WB, COH1W), \
+		[4] = I915_CACHE(WB, COH1W, COH2W), \
 	}
 
 /* Keep in gen based order, and chronological order within a gen */
@@ -97,7 +105,7 @@
 	.max_pat_index = 3, \
 	GEN_DEFAULT_PAGE_SIZES, \
 	GEN_DEFAULT_REGIONS, \
-	LEGACY_CACHELEVEL
+	LEGACY_CACHE_MODES
 
 #define I845_FEATURES \
 	GEN(2), \
@@ -112,7 +120,7 @@
 	.max_pat_index = 3, \
 	GEN_DEFAULT_PAGE_SIZES, \
 	GEN_DEFAULT_REGIONS, \
-	LEGACY_CACHELEVEL
+	LEGACY_CACHE_MODES
 
 static const struct intel_device_info i830_info = {
 	I830_FEATURES,
@@ -145,7 +153,7 @@ static const struct intel_device_info i865g_info = {
 	.max_pat_index = 3, \
 	GEN_DEFAULT_PAGE_SIZES, \
 	GEN_DEFAULT_REGIONS, \
-	LEGACY_CACHELEVEL
+	LEGACY_CACHE_MODES
 
 static const struct intel_device_info i915g_info = {
 	GEN3_FEATURES,
@@ -208,7 +216,7 @@ static const struct intel_device_info pnv_m_info = {
 	.max_pat_index = 3, \
 	GEN_DEFAULT_PAGE_SIZES, \
 	GEN_DEFAULT_REGIONS, \
-	LEGACY_CACHELEVEL
+	LEGACY_CACHE_MODES
 
 static const struct intel_device_info i965g_info = {
 	GEN4_FEATURES,
@@ -252,7 +260,7 @@ static const struct intel_device_info gm45_info = {
 	.max_pat_index = 3, \
 	GEN_DEFAULT_PAGE_SIZES, \
 	GEN_DEFAULT_REGIONS, \
-	LEGACY_CACHELEVEL
+	LEGACY_CACHE_MODES
 
 static const struct intel_device_info ilk_d_info = {
 	GEN5_FEATURES,
@@ -282,7 +290,7 @@ static const struct intel_device_info ilk_m_info = {
 	.__runtime.ppgtt_size = 31, \
 	GEN_DEFAULT_PAGE_SIZES, \
 	GEN_DEFAULT_REGIONS, \
-	LEGACY_CACHELEVEL
+	LEGACY_CACHE_MODES
 
 #define SNB_D_PLATFORM \
 	GEN6_FEATURES, \
@@ -330,7 +338,7 @@ static const struct intel_device_info snb_m_gt2_info = {
 	.__runtime.ppgtt_size = 31, \
 	GEN_DEFAULT_PAGE_SIZES, \
 	GEN_DEFAULT_REGIONS, \
-	LEGACY_CACHELEVEL
+	LEGACY_CACHE_MODES
 
 #define IVB_D_PLATFORM \
 	GEN7_FEATURES, \
@@ -387,7 +395,7 @@ static const struct intel_device_info vlv_info = {
 	.platform_engine_mask = BIT(RCS0) | BIT(VCS0) | BIT(BCS0),
 	GEN_DEFAULT_PAGE_SIZES,
 	GEN_DEFAULT_REGIONS,
-	LEGACY_CACHELEVEL,
+	LEGACY_CACHE_MODES
 };
 
 #define G75_FEATURES  \
@@ -473,7 +481,7 @@ static const struct intel_device_info chv_info = {
 	.has_coherent_ggtt = false,
 	GEN_DEFAULT_PAGE_SIZES,
 	GEN_DEFAULT_REGIONS,
-	LEGACY_CACHELEVEL,
+	LEGACY_CACHE_MODES
 };
 
 #define GEN9_DEFAULT_PAGE_SIZES \
@@ -536,7 +544,7 @@ static const struct intel_device_info skl_gt4_info = {
 	.max_pat_index = 3, \
 	GEN9_DEFAULT_PAGE_SIZES, \
 	GEN_DEFAULT_REGIONS, \
-	LEGACY_CACHELEVEL
+	LEGACY_CACHE_MODES
 
 static const struct intel_device_info bxt_info = {
 	GEN9_LP_FEATURES,
@@ -640,7 +648,7 @@ static const struct intel_device_info jsl_info = {
 #define GEN12_FEATURES \
 	GEN11_FEATURES, \
 	GEN(12), \
-	TGL_CACHELEVEL, \
+	GEN12_CACHE_MODES, \
 	.has_global_mocs = 1, \
 	.has_pxp = 1, \
 	.max_pat_index = 3
@@ -708,7 +716,7 @@ static const struct intel_device_info adl_p_info = {
 	.__runtime.graphics.ip.ver = 12, \
 	.__runtime.graphics.ip.rel = 50, \
 	XE_HP_PAGE_SIZES, \
-	TGL_CACHELEVEL, \
+	GEN12_CACHE_MODES, \
 	.dma_mask_size = 46, \
 	.has_3d_pipeline = 1, \
 	.has_64bit_reloc = 1, \
@@ -803,7 +811,7 @@ static const struct intel_device_info pvc_info = {
 		BIT(VCS0) |
 		BIT(CCS0) | BIT(CCS1) | BIT(CCS2) | BIT(CCS3),
 	.require_force_probe = 1,
-	PVC_CACHELEVEL,
+	PVC_CACHE_MODES
 };
 
 static const struct intel_gt_definition xelpmp_extra_gt[] = {
@@ -838,7 +846,7 @@ static const struct intel_device_info mtl_info = {
 	.memory_regions = REGION_SMEM | REGION_STOLEN_LMEM,
 	.platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(CCS0),
 	.require_force_probe = 1,
-	MTL_CACHELEVEL,
+	MTL_CACHE_MODES
 };
 
 #undef PLATFORM
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 04bc1f4a1115..973175a64534 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1870,7 +1870,7 @@ static int alloc_oa_buffer(struct i915_perf_stream *stream)
 		return PTR_ERR(bo);
 	}
 
-	i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC);
+	i915_gem_object_set_cache_coherency(bo, I915_CACHE_CACHED);
 
 	/* PreHSW required 512K alignment, HSW requires 16M */
 	vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL);
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index dbfe6443457b..2ce13b7c48cb 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -27,6 +27,8 @@
 
 #include <uapi/drm/i915_drm.h>
 
+#include "i915_cache.h"
+
 #include "intel_step.h"
 
 #include "gt/intel_engine_types.h"
@@ -243,8 +245,8 @@ struct intel_device_info {
 	 */
 	const struct intel_runtime_info __runtime;
 
-	u32 cachelevel_to_pat[I915_MAX_CACHE_LEVEL];
-	u32 max_pat_index;
+	i915_cache_t cache_modes[8];
+	unsigned int max_pat_index;
 };
 
 struct intel_driver_caps {
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
index f910ec9b6d2b..ba821e48baa5 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
@@ -267,7 +267,7 @@ static int igt_evict_for_cache_color(void *arg)
 		err = PTR_ERR(obj);
 		goto cleanup;
 	}
-	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
+	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
 	quirk_add(obj, &objects);
 
 	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
@@ -283,7 +283,7 @@ static int igt_evict_for_cache_color(void *arg)
 		err = PTR_ERR(obj);
 		goto cleanup;
 	}
-	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
+	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
 	quirk_add(obj, &objects);
 
 	/* Neighbouring; same colour - should fit */
diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
index 3c5e0952f1b8..4cfc5000d6ff 100644
--- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
+++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
@@ -23,7 +23,7 @@ int igt_spinner_init(struct igt_spinner *spin, struct intel_gt *gt)
 		err = PTR_ERR(spin->hws);
 		goto err;
 	}
-	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_LLC);
+	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_CACHED);
 
 	spin->obj = i915_gem_object_create_internal(gt->i915, PAGE_SIZE);
 	if (IS_ERR(spin->obj)) {
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index 1d1a457e2aee..8ae77bcf27fa 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -126,13 +126,13 @@ static const struct intel_device_info mock_info = {
 	.memory_regions = REGION_SMEM,
 	.platform_engine_mask = BIT(0),
 
-	/* simply use legacy cache level for mock device */
+	/* Simply use legacy cache modes for the mock device. */
 	.max_pat_index = 3,
-	.cachelevel_to_pat = {
-		[I915_CACHE_NONE]   = 0,
-		[I915_CACHE_LLC]    = 1,
-		[I915_CACHE_L3_LLC] = 2,
-		[I915_CACHE_WT]     = 3,
+	.cache_modes = {
+		[0] = I915_CACHE(UC),
+		[1] = I915_CACHE(WB, COH1W, COH2W),
+		[2] = I915_CACHE(WB, COH1W, COH2W, L3),
+		[3] = I915_CACHE(WT),
 	},
 };
 
@@ -181,7 +181,7 @@ struct drm_i915_private *mock_gem_device(void)
 	/* Set up device info and initial runtime info. */
 	intel_device_info_driver_create(i915, pdev->device, &mock_info);
 
-	i915_cache_init(i915);
+	WARN_ON(i915_cache_init(i915));
 
 	dev_pm_domain_set(&pdev->dev, &pm_domain);
 	pm_runtime_enable(&pdev->dev);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [Intel-gfx] [RFC 4/8] drm/i915: Refactor PAT/object cache handling
@ 2023-07-27 14:55   ` Tvrtko Ursulin
  0 siblings, 0 replies; 59+ messages in thread
From: Tvrtko Ursulin @ 2023-07-27 14:55 UTC (permalink / raw)
  To: Intel-gfx, dri-devel; +Cc: Matt Roper, Chris Wilson

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Commit 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
introduced PAT indices to the i915 internal APIs, partially replacing
the use of the driver internal cache_level, but also baked in a few
sub-optimal design decisions which this patch tries to improve upon.

The principal change here is to invert the per platform cache level to
PAT index table which was added by the referenced commit, and by doing
so enable i915 to understand the cache mode behind each PAT index,
changing the indices from opaque to transparent.
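
For illustration (values taken from the Meteorlake tables in this
patch), the old scheme recorded one PAT index per driver cache level,
while the new scheme records one cache mode description per PAT index:

  /* Old: driver cache level -> PAT index. */
  .cachelevel_to_pat = { [I915_CACHE_NONE] = 2, ... },

  /* New: PAT index -> description of the caching mode. */
  .cache_modes = { [2] = I915_CACHE(UC), ... },

This lets i915 go in both directions: i915_cache_find_pat() scans
cache_modes[] to find the PAT index for a wanted mode, while
cache_modes[obj->pat_index] says what a given index actually means.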

Once we have the inverted table we are able to remove the hidden
unconditional "return true" (taken whenever the PAT index was set by
userspace) from i915_gem_object_has_cache_level and make the involved
code paths clearer.

To achieve this we replace the enum i915_cache_level with i915_cache_t,
a more detailed representation of each cache mode (base mode plus
flags).
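
For instance, with the definitions added by this patch the mode lives in
the low byte and the flags in the high byte of i915_cache_t:

  I915_CACHE(WB)               == 0x0001
  I915_CACHE(WB, COH1W)        == 0x0101
  I915_CACHE(WB, COH1W, COH2W) == 0x0301 /* I915_CACHE_CACHED */

with I915_CACHE_MODE() and I915_CACHE_FLAGS() extracting the two halves.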

In this way we are able to express the differences between the
write-back coherency settings on Meteorlake, which in turn enables us
to map the i915 "cached" mode to the correct Meteorlake PAT index.
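
Concretely, Meteorlake has three write-back PAT entries of which only
the fully coherent one matches the driver "cached" mode:

  [0] = I915_CACHE(WB),               /* not coherent */
  [3] = I915_CACHE(WB, COH1W),
  [4] = I915_CACHE(WB, COH1W, COH2W), /* == I915_CACHE_CACHED */

so i915_cache_find_pat(i915, I915_CACHE_CACHED) now resolves to PAT
index 4.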

We can also replace the platform dependent cache-mode-to-string code in
debugfs and elsewhere with a single implementation based on
i915_cache_t.
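
For example, with the new helper added by this patch:

  char buf[I915_CACHE_NAME_LEN];

  i915_cache_print(buf, sizeof(buf), NULL, I915_CACHE_CACHED);
  /* buf now holds "WB-2-Way-Coherent" (1-way is implied and elided). */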

v2:
 * Fix PAT-to-cache-mode table for PVC. (Fei)
 * Cache display caching mode too. (Fei)
 * Improve and document criteria in i915_gem_object_can_bypass_llc() (Matt)

v3:
 * Checkpatch issues.
 * Cache mode flags check fixed.

v4:
 * Fix intel_device_info->cache_modes array size. (Matt)
 * Boolean cache mode and flags query. (Matt)
 * Reduce number of cache macros with some macro magic.
 * One more checkpatch fix.
 * Tweak tables to show that legacy and Gen12 WB are fully coherent.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
References: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_domain.c    |  60 +++++----
 drivers/gpu/drm/i915/gem/i915_gem_domain.h    |   5 +-
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   3 +-
 drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |   4 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.c    | 117 ++++++++++--------
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |  11 +-
 .../gpu/drm/i915/gem/i915_gem_object_types.h  | 116 +----------------
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c     |   8 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  20 +--
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   2 +-
 .../drm/i915/gem/selftests/huge_gem_object.c  |   2 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |   3 +-
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  10 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |   2 +-
 drivers/gpu/drm/i915/gt/intel_ggtt.c          |  25 ++--
 drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c     |   4 +-
 drivers/gpu/drm/i915/gt/intel_gtt.c           |   2 +-
 drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   6 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |   4 +-
 drivers/gpu/drm/i915/gt/intel_timeline.c      |   2 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   2 +-
 .../gpu/drm/i915/gt/selftest_workarounds.c    |   2 +-
 drivers/gpu/drm/i915/i915_cache.c             |  89 +++++++++++--
 drivers/gpu/drm/i915/i915_cache.h             |  70 ++++++++++-
 drivers/gpu/drm/i915/i915_debugfs.c           |  53 ++------
 drivers/gpu/drm/i915/i915_driver.c            |   4 +-
 drivers/gpu/drm/i915/i915_gem.c               |  13 --
 drivers/gpu/drm/i915/i915_pci.c               |  84 +++++++------
 drivers/gpu/drm/i915/i915_perf.c              |   2 +-
 drivers/gpu/drm/i915/intel_device_info.h      |   6 +-
 .../gpu/drm/i915/selftests/i915_gem_evict.c   |   4 +-
 drivers/gpu/drm/i915/selftests/igt_spinner.c  |   2 +-
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  14 +--
 36 files changed, 391 insertions(+), 367 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
index 57db9c581bf6..c15f83de33af 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
@@ -8,6 +8,7 @@
 #include "display/intel_frontbuffer.h"
 #include "gt/intel_gt.h"
 
+#include "i915_cache.h"
 #include "i915_drv.h"
 #include "i915_gem_clflush.h"
 #include "i915_gem_domain.h"
@@ -41,14 +42,17 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
 		return false;
 
 	/*
-	 * For objects created by userspace through GEM_CREATE with pat_index
-	 * set by set_pat extension, i915_gem_object_has_cache_level() will
-	 * always return true, because the coherency of such object is managed
-	 * by userspace. Othereise the call here would fall back to checking
-	 * whether the object is un-cached or write-through.
+	 * Always flush the cache for UMD objects with a user-set PAT index.
 	 */
-	return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
-		 i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
+	if (obj->pat_set_by_user)
+		return true;
+
+	/*
+	 * Fully coherent cached access may end up with data in the CPU cache
+	 * which hasn't hit memory yet.
+	 */
+	return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
+	       i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W);
 }
 
 bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
@@ -268,7 +272,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 /**
  * i915_gem_object_set_cache_level - Changes the cache-level of an object across all VMA.
  * @obj: object to act on
- * @cache_level: new cache level to set for the object
+ * @cache: new caching mode to set for the object
  *
  * After this function returns, the object will be in the new cache-level
  * across all GTT and the contents of the backing storage will be coherent,
@@ -281,18 +285,28 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
  * that all direct access to the scanout remains coherent.
  */
 int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
-				    enum i915_cache_level cache_level)
+				    i915_cache_t cache)
 {
-	int ret;
+	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+	int pat, ret;
 
-	/*
-	 * For objects created by userspace through GEM_CREATE with pat_index
-	 * set by set_pat extension, simply return 0 here without touching
-	 * the cache setting, because such objects should have an immutable
-	 * cache setting by desgin and always managed by userspace.
-	 */
-	if (i915_gem_object_has_cache_level(obj, cache_level))
+	pat = i915_cache_find_pat(i915, cache);
+	if (pat < 0) {
+		char buf[I915_CACHE_NAME_LEN];
+
+		i915_cache_print(buf, sizeof(buf), NULL, cache);
+		drm_err_ratelimited(&i915->drm,
+				    "Attempting to use unknown caching mode %s!\n",
+				    buf);
+
+		return -EINVAL;
+	} else if (pat == obj->pat_index) {
 		return 0;
+	} else if (obj->pat_set_by_user) {
+		drm_notice_once(&i915->drm,
+				"Attempting to change caching mode on an object with fixed PAT!\n");
+		return -EINVAL;
+	}
 
 	ret = i915_gem_object_wait(obj,
 				   I915_WAIT_INTERRUPTIBLE |
@@ -302,7 +316,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 		return ret;
 
 	/* Always invalidate stale cachelines */
-	i915_gem_object_set_cache_coherency(obj, cache_level);
+	i915_gem_object_set_pat_index(obj, pat);
 	obj->cache_dirty = true;
 
 	/* The cache-level will be applied when each vma is rebound. */
@@ -337,10 +351,10 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
 		goto out;
 	}
 
-	if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||
-	    i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
+	if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
+	    i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W))
 		args->caching = I915_CACHING_CACHED;
-	else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
+	else if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT))
 		args->caching = I915_CACHING_DISPLAY;
 	else
 		args->caching = I915_CACHING_NONE;
@@ -355,7 +369,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
 	struct drm_i915_private *i915 = to_i915(dev);
 	struct drm_i915_gem_caching *args = data;
 	struct drm_i915_gem_object *obj;
-	enum i915_cache_level level;
+	i915_cache_t level;
 	int ret = 0;
 
 	if (IS_DGFX(i915))
@@ -378,7 +392,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
 		if (!HAS_LLC(i915) && !HAS_SNOOP(i915))
 			return -ENODEV;
 
-		level = I915_CACHE_LLC;
+		level = I915_CACHE_CACHED;
 		break;
 	case I915_CACHING_DISPLAY:
 		level = HAS_WT(i915) ? I915_CACHE_WT : I915_CACHE_NONE;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.h b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
index 9622df962bfc..6da5c351f6fd 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
@@ -6,10 +6,11 @@
 #ifndef __I915_GEM_DOMAIN_H__
 #define __I915_GEM_DOMAIN_H__
 
+#include "i915_cache.h"
+
 struct drm_i915_gem_object;
-enum i915_cache_level;
 
 int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
-				    enum i915_cache_level cache_level);
+				    i915_cache_t cache);
 
 #endif /* __I915_GEM_DOMAIN_H__ */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 0a1d40220020..9d6e49c8a4c6 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -648,7 +648,8 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
 	 */
 	return (cache->has_llc ||
 		obj->cache_dirty ||
-		!i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
+		!(obj->pat_set_by_user ||
+		  i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)));
 }
 
 static int eb_reserve_vma(struct i915_execbuffer *eb,
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
index 6bc26b4b06b8..88c360c3d6a3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
@@ -170,7 +170,7 @@ __i915_gem_object_create_internal(struct drm_i915_private *i915,
 	obj->read_domains = I915_GEM_DOMAIN_CPU;
 	obj->write_domain = I915_GEM_DOMAIN_CPU;
 
-	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
+	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
 	i915_gem_object_set_cache_coherency(obj, cache_level);
 
 	return obj;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index aa4d842d4c5a..cd7f8ded0d6f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -382,7 +382,6 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
 		goto err_reset;
 	}
 
-	/* Access to snoopable pages through the GTT is incoherent. */
 	/*
 	 * For objects created by userspace through GEM_CREATE with pat_index
 	 * set by set_pat extension, coherency is managed by userspace, make
@@ -391,7 +390,8 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
 	 * objects. Otherwise this helper function would fall back to checking
 	 * whether the object is un-cached.
 	 */
-	if (!(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
+	if (!((obj->pat_set_by_user ||
+	       i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)) ||
 	      HAS_LLC(i915))) {
 		ret = -EFAULT;
 		goto err_unpin;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 3dc4fbb67d2b..ec1f0be43d0d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -45,33 +45,6 @@ static struct kmem_cache *slab_objects;
 
 static const struct drm_gem_object_funcs i915_gem_object_funcs;
 
-unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
-				    enum i915_cache_level level)
-{
-	if (drm_WARN_ON(&i915->drm, level >= I915_MAX_CACHE_LEVEL))
-		return 0;
-
-	return INTEL_INFO(i915)->cachelevel_to_pat[level];
-}
-
-bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
-				     enum i915_cache_level lvl)
-{
-	/*
-	 * In case the pat_index is set by user space, this kernel mode
-	 * driver should leave the coherency to be managed by user space,
-	 * simply return true here.
-	 */
-	if (obj->pat_set_by_user)
-		return true;
-
-	/*
-	 * Otherwise the pat_index should have been converted from cache_level
-	 * so that the following comparison is valid.
-	 */
-	return obj->pat_index == i915_gem_get_pat_index(obj_to_i915(obj), lvl);
-}
-
 struct drm_i915_gem_object *i915_gem_object_alloc(void)
 {
 	struct drm_i915_gem_object *obj;
@@ -144,30 +117,72 @@ void __i915_gem_object_fini(struct drm_i915_gem_object *obj)
 	dma_resv_fini(&obj->base._resv);
 }
 
+bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
+				    enum i915_cache_mode mode)
+{
+	struct drm_i915_private *i915 = obj_to_i915(obj);
+	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
+
+	return I915_CACHE_MODE(cache) == mode;
+}
+
+bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
+				    unsigned int flag)
+{
+	struct drm_i915_private *i915 = obj_to_i915(obj);
+	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
+
+	return I915_CACHE_FLAGS(cache) & flag;
+}
+
+static void __i915_gem_object_update_coherency(struct drm_i915_gem_object *obj)
+{
+	struct drm_i915_private *i915 = obj_to_i915(obj);
+	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
+	const unsigned int flags = I915_CACHE_FLAGS(cache);
+	const unsigned int mode = I915_CACHE_MODE(cache);
+
+	if (mode == I915_CACHE_MODE_WC ||
+	    mode == I915_CACHE_MODE_WT ||
+	    (mode == I915_CACHE_MODE_WB && (flags & I915_CACHE_FLAG_COH2W)))
+		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ |
+				      I915_BO_CACHE_COHERENT_FOR_WRITE;
+	else if (HAS_LLC(i915))
+		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
+	else
+		obj->cache_coherent = 0;
+
+	obj->cache_dirty =
+		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
+		!IS_DGFX(i915);
+}
+
 /**
  * i915_gem_object_set_cache_coherency - Mark up the object's coherency levels
- * for a given cache_level
+ * for a given caching mode
  * @obj: #drm_i915_gem_object
- * @cache_level: cache level
+ * @cache: cache mode
  */
 void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
-					 unsigned int cache_level)
+					 i915_cache_t cache)
 {
-	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+	struct drm_i915_private *i915 = obj_to_i915(obj);
+	int found;
 
-	obj->pat_index = i915_gem_get_pat_index(i915, cache_level);
+	found = i915_cache_find_pat(i915, cache);
+	if (found < 0) {
+		char buf[I915_CACHE_NAME_LEN];
 
-	if (cache_level != I915_CACHE_NONE)
-		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
-				       I915_BO_CACHE_COHERENT_FOR_WRITE);
-	else if (HAS_LLC(i915))
-		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
-	else
-		obj->cache_coherent = 0;
+		i915_cache_print(buf, sizeof(buf), NULL, cache);
+		drm_err_ratelimited(&i915->drm, "Unknown cache mode %s!\n",
+				    buf);
 
-	obj->cache_dirty =
-		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
-		!IS_DGFX(i915);
+		found = i915->pat_uc;
+	}
+
+	obj->pat_index = found;
+
+	__i915_gem_object_update_coherency(obj);
 }
 
 /**
@@ -181,24 +196,18 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
 void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
 				   unsigned int pat_index)
 {
-	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+	struct drm_i915_private *i915 = obj_to_i915(obj);
 
 	if (obj->pat_index == pat_index)
 		return;
 
+	if (drm_WARN_ON_ONCE(&i915->drm,
+			     pat_index > INTEL_INFO(i915)->max_pat_index))
+		return;
+
 	obj->pat_index = pat_index;
 
-	if (pat_index != i915_gem_get_pat_index(i915, I915_CACHE_NONE))
-		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
-				       I915_BO_CACHE_COHERENT_FOR_WRITE);
-	else if (HAS_LLC(i915))
-		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
-	else
-		obj->cache_coherent = 0;
-
-	obj->cache_dirty =
-		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
-		!IS_DGFX(i915);
+	__i915_gem_object_update_coherency(obj);
 }
 
 bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 884a17275b3a..a5d4ee19d9be 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -13,6 +13,7 @@
 
 #include "display/intel_frontbuffer.h"
 #include "intel_memory_region.h"
+#include "i915_cache.h"
 #include "i915_gem_object_types.h"
 #include "i915_gem_gtt.h"
 #include "i915_gem_ww.h"
@@ -32,10 +33,6 @@ static inline bool i915_gem_object_size_2big(u64 size)
 	return false;
 }
 
-unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
-				    enum i915_cache_level level);
-bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
-				     enum i915_cache_level lvl);
 void i915_gem_init__objects(struct drm_i915_private *i915);
 
 void i915_objects_module_exit(void);
@@ -764,8 +761,12 @@ int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
 				      bool intr);
 bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj);
 
+bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
+				    enum i915_cache_mode mode);
+bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
+				    unsigned int flag);
 void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
-					 unsigned int cache_level);
+					 i915_cache_t cache);
 void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
 				   unsigned int pat_index);
 bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 8de2b91b3edf..6790e13ad262 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -14,6 +14,7 @@
 #include <uapi/drm/i915_drm.h>
 
 #include "i915_active.h"
+#include "i915_cache.h"
 #include "i915_selftest.h"
 #include "i915_vma_resource.h"
 
@@ -116,93 +117,6 @@ struct drm_i915_gem_object_ops {
 	const char *name; /* friendly name for debug, e.g. lockdep classes */
 };
 
-/**
- * enum i915_cache_level - The supported GTT caching values for system memory
- * pages.
- *
- * These translate to some special GTT PTE bits when binding pages into some
- * address space. It also determines whether an object, or rather its pages are
- * coherent with the GPU, when also reading or writing through the CPU cache
- * with those pages.
- *
- * Userspace can also control this through struct drm_i915_gem_caching.
- */
-enum i915_cache_level {
-	/**
-	 * @I915_CACHE_NONE:
-	 *
-	 * GPU access is not coherent with the CPU cache. If the cache is dirty
-	 * and we need the underlying pages to be coherent with some later GPU
-	 * access then we need to manually flush the pages.
-	 *
-	 * On shared LLC platforms reads and writes through the CPU cache are
-	 * still coherent even with this setting. See also
-	 * &drm_i915_gem_object.cache_coherent for more details. Due to this we
-	 * should only ever use uncached for scanout surfaces, otherwise we end
-	 * up over-flushing in some places.
-	 *
-	 * This is the default on non-LLC platforms.
-	 */
-	I915_CACHE_NONE = 0,
-	/**
-	 * @I915_CACHE_LLC:
-	 *
-	 * GPU access is coherent with the CPU cache. If the cache is dirty,
-	 * then the GPU will ensure that access remains coherent, when both
-	 * reading and writing through the CPU cache. GPU writes can dirty the
-	 * CPU cache.
-	 *
-	 * Not used for scanout surfaces.
-	 *
-	 * Applies to both platforms with shared LLC(HAS_LLC), and snooping
-	 * based platforms(HAS_SNOOP).
-	 *
-	 * This is the default on shared LLC platforms.  The only exception is
-	 * scanout objects, where the display engine is not coherent with the
-	 * CPU cache. For such objects I915_CACHE_NONE or I915_CACHE_WT is
-	 * automatically applied by the kernel in pin_for_display, if userspace
-	 * has not done so already.
-	 */
-	I915_CACHE_LLC,
-	/**
-	 * @I915_CACHE_L3_LLC:
-	 *
-	 * Explicitly enable the Gfx L3 cache, with coherent LLC.
-	 *
-	 * The Gfx L3 sits between the domain specific caches, e.g
-	 * sampler/render caches, and the larger LLC. LLC is coherent with the
-	 * GPU, but L3 is only visible to the GPU, so likely needs to be flushed
-	 * when the workload completes.
-	 *
-	 * Not used for scanout surfaces.
-	 *
-	 * Only exposed on some gen7 + GGTT. More recent hardware has dropped
-	 * this explicit setting, where it should now be enabled by default.
-	 */
-	I915_CACHE_L3_LLC,
-	/**
-	 * @I915_CACHE_WT:
-	 *
-	 * Write-through. Used for scanout surfaces.
-	 *
-	 * The GPU can utilise the caches, while still having the display engine
-	 * be coherent with GPU writes, as a result we don't need to flush the
-	 * CPU caches when moving out of the render domain. This is the default
-	 * setting chosen by the kernel, if supported by the HW, otherwise we
-	 * fallback to I915_CACHE_NONE. On the CPU side writes through the CPU
-	 * cache still need to be flushed, to remain coherent with the display
-	 * engine.
-	 */
-	I915_CACHE_WT,
-	/**
-	 * @I915_MAX_CACHE_LEVEL:
-	 *
-	 * Mark the last entry in the enum. Used for defining cachelevel_to_pat
-	 * array for cache_level to pat translation table.
-	 */
-	I915_MAX_CACHE_LEVEL,
-};
-
 enum i915_map_type {
 	I915_MAP_WB = 0,
 	I915_MAP_WC,
@@ -403,16 +317,6 @@ struct drm_i915_gem_object {
 	/**
 	 * @cache_coherent:
 	 *
-	 * Note: with the change above which replaced @cache_level with pat_index,
-	 * the use of @cache_coherent is limited to the objects created by kernel
-	 * or by userspace without pat index specified.
-	 * Check for @pat_set_by_user to find out if an object has pat index set
-	 * by userspace. The ioctl's to change cache settings have also been
-	 * disabled for the objects with pat index set by userspace. Please don't
-	 * assume @cache_coherent having the flags set as describe here. A helper
-	 * function i915_gem_object_has_cache_level() provides one way to bypass
-	 * the use of this field.
-	 *
 	 * Track whether the pages are coherent with the GPU if reading or
 	 * writing through the CPU caches. The largely depends on the
 	 * @cache_level setting.
@@ -447,7 +351,7 @@ struct drm_i915_gem_object {
 	 * flushing the surface just before doing the scanout.  This does mean
 	 * we might unnecessarily flush non-scanout objects in some places, but
 	 * the default assumption is that all normal objects should be using
-	 * I915_CACHE_LLC, at least on platforms with the shared LLC.
+	 * I915_CACHE_CACHED, at least on platforms with the shared LLC.
 	 *
 	 * Supported values:
 	 *
@@ -486,16 +390,6 @@ struct drm_i915_gem_object {
 	/**
 	 * @cache_dirty:
 	 *
-	 * Note: with the change above which replaced cache_level with pat_index,
-	 * the use of @cache_dirty is limited to the objects created by kernel
-	 * or by userspace without pat index specified.
-	 * Check for @pat_set_by_user to find out if an object has pat index set
-	 * by userspace. The ioctl's to change cache settings have also been
-	 * disabled for the objects with pat_index set by userspace. Please don't
-	 * assume @cache_dirty is set as describe here. Also see helper function
-	 * i915_gem_object_has_cache_level() for possible ways to bypass the use
-	 * of this field.
-	 *
 	 * Track if we are we dirty with writes through the CPU cache for this
 	 * object. As a result reading directly from main memory might yield
 	 * stale data.
@@ -531,9 +425,9 @@ struct drm_i915_gem_object {
 	 *
 	 *   1. All userspace objects, by default, have @cache_level set as
 	 *   I915_CACHE_NONE. The only exception is userptr objects, where we
-	 *   instead force I915_CACHE_LLC, but we also don't allow userspace to
-	 *   ever change the @cache_level for such objects. Another special case
-	 *   is dma-buf, which doesn't rely on @cache_dirty,  but there we
+	 *   instead force I915_CACHE_CACHED, but we also don't allow userspace
+	 *   to ever change the @cache_level for such objects. Another special
+	 *   case is dma-buf, which doesn't rely on @cache_dirty,  but there we
 	 *   always do a forced flush when acquiring the pages, if there is a
 	 *   chance that the pages can be read directly from main memory with
 	 *   the GPU.
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 8f1633c3fb93..aba908f0349f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -584,7 +584,7 @@ static int shmem_object_init(struct intel_memory_region *mem,
 	static struct lock_class_key lock_class;
 	struct drm_i915_private *i915 = mem->i915;
 	struct address_space *mapping;
-	unsigned int cache_level;
+	i915_cache_t cache;
 	gfp_t mask;
 	int ret;
 
@@ -628,11 +628,11 @@ static int shmem_object_init(struct intel_memory_region *mem,
 		 * However, we maintain the display planes as UC, and so
 		 * need to rebind when first used as such.
 		 */
-		cache_level = I915_CACHE_LLC;
+		cache = I915_CACHE_CACHED;
 	else
-		cache_level = I915_CACHE_NONE;
+		cache = I915_CACHE_NONE;
 
-	i915_gem_object_set_cache_coherency(obj, cache_level);
+	i915_gem_object_set_cache_coherency(obj, cache);
 
 	i915_gem_object_init_memory_region(obj, mem);
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index 1c8eb806b7d3..cc907a1f1c53 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -691,7 +691,7 @@ static int __i915_gem_object_create_stolen(struct intel_memory_region *mem,
 
 	obj->stolen = stolen;
 	obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
-	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
+	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
 	i915_gem_object_set_cache_coherency(obj, cache_level);
 
 	if (WARN_ON(!i915_gem_object_trylock(obj, NULL)))
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 6bd6c239f4ac..107176d1757b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -48,14 +48,14 @@ void i915_ttm_migrate_set_ban_memcpy(bool ban)
 }
 #endif
 
-static enum i915_cache_level
-i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
-		     struct ttm_tt *ttm)
+static i915_cache_t
+i915_ttm_cache(struct drm_i915_private *i915, struct ttm_resource *res,
+	       struct ttm_tt *ttm)
 {
 	return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
 		!i915_ttm_gtt_binds_lmem(res) &&
-		ttm->caching == ttm_cached) ? I915_CACHE_LLC :
-		I915_CACHE_NONE;
+		ttm->caching == ttm_cached) ? I915_CACHE_CACHED :
+					      I915_CACHE_NONE;
 }
 
 static unsigned int
@@ -112,8 +112,8 @@ void i915_ttm_adjust_domains_after_move(struct drm_i915_gem_object *obj)
 void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
 {
 	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
-	unsigned int cache_level;
 	unsigned int mem_flags;
+	i915_cache_t cache;
 	unsigned int i;
 	int mem_type;
 
@@ -126,13 +126,13 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
 	if (!bo->resource) {
 		mem_flags = I915_BO_FLAG_STRUCT_PAGE;
 		mem_type = I915_PL_SYSTEM;
-		cache_level = I915_CACHE_NONE;
+		cache = I915_CACHE_NONE;
 	} else {
 		mem_flags = i915_ttm_cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
 			I915_BO_FLAG_STRUCT_PAGE;
 		mem_type = bo->resource->mem_type;
-		cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
-						   bo->ttm);
+		cache = i915_ttm_cache(to_i915(bo->base.dev), bo->resource,
+				       bo->ttm);
 	}
 
 	/*
@@ -157,7 +157,7 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
 	obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
 	obj->mem_flags |= mem_flags;
 
-	i915_gem_object_set_cache_coherency(obj, cache_level);
+	i915_gem_object_set_cache_coherency(obj, cache);
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 1d3ebdf4069b..5d2891981bd4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -553,7 +553,7 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
 	obj->mem_flags = I915_BO_FLAG_STRUCT_PAGE;
 	obj->read_domains = I915_GEM_DOMAIN_CPU;
 	obj->write_domain = I915_GEM_DOMAIN_CPU;
-	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
+	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
 
 	obj->userptr.ptr = args->user_ptr;
 	obj->userptr.notifier_seq = ULONG_MAX;
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
index bac957755068..77d04be5e9d7 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
@@ -123,7 +123,7 @@ huge_gem_object(struct drm_i915_private *i915,
 
 	obj->read_domains = I915_GEM_DOMAIN_CPU;
 	obj->write_domain = I915_GEM_DOMAIN_CPU;
-	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
+	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
 	i915_gem_object_set_cache_coherency(obj, cache_level);
 	obj->scratch = phys_size;
 
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index 6bddd733d796..6ca5b9dbc414 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -200,9 +200,9 @@ huge_pages_object(struct drm_i915_private *i915,
 	obj->write_domain = I915_GEM_DOMAIN_CPU;
 	obj->read_domains = I915_GEM_DOMAIN_CPU;
 
-	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
+	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
 	i915_gem_object_set_cache_coherency(obj, cache_level);
 
 	obj->mm.page_mask = page_mask;
 
 	return obj;
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index 675f71f06e89..3c93a73cf6b1 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -16,11 +16,11 @@
 #include "intel_gtt.h"
 
 static u64 gen8_pde_encode(const dma_addr_t addr,
-			   const enum i915_cache_level level)
+			   const enum i915_cache_mode cache_mode)
 {
 	u64 pde = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
 
-	if (level != I915_CACHE_NONE)
+	if (cache_mode != I915_CACHE_MODE_UC)
 		pde |= PPAT_CACHED_PDE;
 	else
 		pde |= PPAT_UNCACHED;
@@ -43,10 +43,10 @@ static u64 gen8_pte_encode(dma_addr_t addr,
 	 * See translation table defined by LEGACY_CACHELEVEL.
 	 */
 	switch (pat_index) {
-	case I915_CACHE_NONE:
+	case I915_CACHE_MODE_UC:
 		pte |= PPAT_UNCACHED;
 		break;
-	case I915_CACHE_WT:
+	case I915_CACHE_MODE_WT:
 		pte |= PPAT_DISPLAY_ELLC;
 		break;
 	default:
@@ -893,7 +893,7 @@ static int gen8_init_scratch(struct i915_address_space *vm)
 		}
 
 		fill_px(obj, vm->scratch[i - 1]->encode);
-		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_NONE);
+		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_MODE_UC);
 
 		vm->scratch[i] = obj;
 	}
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index ee15486fed0d..f1e59e512d14 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1103,7 +1103,7 @@ static int init_status_page(struct intel_engine_cs *engine)
 		return PTR_ERR(obj);
 	}
 
-	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
+	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
 
 	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
 	if (IS_ERR(vma)) {
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index fca61ddca8ad..ab5f654e7557 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -1011,11 +1011,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
 	return ggtt_probe_common(ggtt, size);
 }
 
-/*
- * For pre-gen8 platforms pat_index is the same as enum i915_cache_level,
- * so the switch-case statements in these PTE encode functions are still valid.
- * See translation table LEGACY_CACHELEVEL.
- */
 static u64 snb_pte_encode(dma_addr_t addr,
 			  unsigned int pat_index,
 			  u32 flags)
@@ -1023,11 +1018,11 @@ static u64 snb_pte_encode(dma_addr_t addr,
 	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
 
 	switch (pat_index) {
-	case I915_CACHE_L3_LLC:
-	case I915_CACHE_LLC:
+	case I915_CACHE_MODE_WB:
+	case __I915_CACHE_MODE_WB_L3:
 		pte |= GEN6_PTE_CACHE_LLC;
 		break;
-	case I915_CACHE_NONE:
+	case I915_CACHE_MODE_UC:
 		pte |= GEN6_PTE_UNCACHED;
 		break;
 	default:
@@ -1044,13 +1039,13 @@ static u64 ivb_pte_encode(dma_addr_t addr,
 	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
 
 	switch (pat_index) {
-	case I915_CACHE_L3_LLC:
+	case __I915_CACHE_MODE_WB_L3:
 		pte |= GEN7_PTE_CACHE_L3_LLC;
 		break;
-	case I915_CACHE_LLC:
+	case I915_CACHE_MODE_WB:
 		pte |= GEN6_PTE_CACHE_LLC;
 		break;
-	case I915_CACHE_NONE:
+	case I915_CACHE_MODE_UC:
 		pte |= GEN6_PTE_UNCACHED;
 		break;
 	default:
@@ -1069,7 +1064,7 @@ static u64 byt_pte_encode(dma_addr_t addr,
 	if (!(flags & PTE_READ_ONLY))
 		pte |= BYT_PTE_WRITEABLE;
 
-	if (pat_index != I915_CACHE_NONE)
+	if (pat_index != I915_CACHE_MODE_UC)
 		pte |= BYT_PTE_SNOOPED_BY_CPU_CACHES;
 
 	return pte;
@@ -1081,7 +1076,7 @@ static u64 hsw_pte_encode(dma_addr_t addr,
 {
 	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
 
-	if (pat_index != I915_CACHE_NONE)
+	if (pat_index != I915_CACHE_MODE_UC)
 		pte |= HSW_WB_LLC_AGE3;
 
 	return pte;
@@ -1094,9 +1089,9 @@ static u64 iris_pte_encode(dma_addr_t addr,
 	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
 
 	switch (pat_index) {
-	case I915_CACHE_NONE:
+	case I915_CACHE_MODE_UC:
 		break;
-	case I915_CACHE_WT:
+	case I915_CACHE_MODE_WT:
 		pte |= HSW_WT_ELLC_LLC_AGE3;
 		break;
 	default:
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
index 866c416afb73..803c41ac4ccb 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
@@ -21,7 +21,7 @@ static void gmch_ggtt_insert_page(struct i915_address_space *vm,
 				  unsigned int pat_index,
 				  u32 unused)
 {
-	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
+	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
 		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
 
 	intel_gmch_gtt_insert_page(addr, offset >> PAGE_SHIFT, flags);
@@ -32,7 +32,7 @@ static void gmch_ggtt_insert_entries(struct i915_address_space *vm,
 				     unsigned int pat_index,
 				     u32 unused)
 {
-	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
+	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
 		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
 
 	intel_gmch_gtt_insert_sg_entries(vma_res->bi.pages, vma_res->start >> PAGE_SHIFT,
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 065099362a98..48055304537a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -676,7 +676,7 @@ __vm_create_scratch_for_read(struct i915_address_space *vm, unsigned long size)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
+	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
 
 	vma = i915_vma_instance(obj, vm, NULL);
 	if (IS_ERR(vma)) {
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 7192a534a654..af4277c1d577 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -636,7 +636,8 @@ void
 __set_pd_entry(struct i915_page_directory * const pd,
 	       const unsigned short idx,
 	       struct i915_page_table *pt,
-	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level));
+	       u64 (*encode)(const dma_addr_t,
+			     const enum i915_cache_mode cache_mode));
 
 #define set_pd_entry(pd, idx, to) \
 	__set_pd_entry((pd), (idx), px_pt(to), gen8_pde_encode)
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index 436756bfbb1a..3e461d4f3693 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -98,14 +98,16 @@ void
 __set_pd_entry(struct i915_page_directory * const pd,
 	       const unsigned short idx,
 	       struct i915_page_table * const to,
-	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level))
+	       u64 (*encode)(const dma_addr_t,
+			     const enum i915_cache_mode cache_mode))
 {
 	/* Each thread pre-pins the pd, and we may have a thread per pde. */
 	GEM_BUG_ON(atomic_read(px_used(pd)) > NALLOC * I915_PDES);
 
 	atomic_inc(px_used(pd));
 	pd->entry[idx] = to;
-	write_dma_entry(px_base(pd), idx, encode(px_dma(to), I915_CACHE_LLC));
+	write_dma_entry(px_base(pd), idx,
+			encode(px_dma(to), I915_CACHE_MODE_WB));
 }
 
 void
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 92085ffd23de..9131d228d285 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -551,7 +551,9 @@ alloc_context_vma(struct intel_engine_cs *engine)
 	 * later platforms don't have L3 control bits in the PTE.
 	 */
 	if (IS_IVYBRIDGE(i915))
-		i915_gem_object_set_cache_coherency(obj, I915_CACHE_L3_LLC);
+		i915_gem_object_set_cache_coherency(obj,
+						    I915_CACHE_CACHED |
+						    __I915_CACHE_FLAG(L3));
 
 	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
 	if (IS_ERR(vma)) {
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
index b9640212d659..025ce54c886d 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.c
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
@@ -26,7 +26,7 @@ static struct i915_vma *hwsp_alloc(struct intel_gt *gt)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
+	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
 
 	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
 	if (IS_ERR(vma))
diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index 8b0d84f2aad2..fc278fa463b0 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -64,7 +64,7 @@ static int hang_init(struct hang *h, struct intel_gt *gt)
 		goto err_hws;
 	}
 
-	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_LLC);
+	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_CACHED);
 	vaddr = i915_gem_object_pin_map_unlocked(h->hws, I915_MAP_WB);
 	if (IS_ERR(vaddr)) {
 		err = PTR_ERR(vaddr);
diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
index 14a8b25b6204..d25990d33d44 100644
--- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
@@ -111,7 +111,7 @@ read_nonprivs(struct intel_context *ce)
 	if (IS_ERR(result))
 		return result;
 
-	i915_gem_object_set_cache_coherency(result, I915_CACHE_LLC);
+	i915_gem_object_set_cache_coherency(result, I915_CACHE_CACHED);
 
 	cs = i915_gem_object_pin_map_unlocked(result, I915_MAP_WB);
 	if (IS_ERR(cs)) {
diff --git a/drivers/gpu/drm/i915/i915_cache.c b/drivers/gpu/drm/i915/i915_cache.c
index 06eb5933c719..f4ba1cb430d3 100644
--- a/drivers/gpu/drm/i915/i915_cache.c
+++ b/drivers/gpu/drm/i915/i915_cache.c
@@ -6,13 +6,88 @@
 #include "i915_cache.h"
 #include "i915_drv.h"
 
-void i915_cache_init(struct drm_i915_private *i915)
+int i915_cache_init(struct drm_i915_private *i915)
 {
-	i915->pat_uc = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
-	drm_info(&i915->drm, "Using PAT index %u for uncached access\n",
-		 i915->pat_uc);
+	int ret;
 
-	i915->pat_wb = i915_gem_get_pat_index(i915, I915_CACHE_LLC);
-	drm_info(&i915->drm, "Using PAT index %u for write-back access\n",
-		 i915->pat_wb);
+	ret = i915_cache_find_pat(i915, I915_CACHE_NONE);
+	if (ret < 0) {
+		drm_err(&i915->drm,
+			"Failed to find PAT index for uncached access\n");
+		return -ENODEV;
+	}
+	drm_info(&i915->drm, "Using PAT index %u for uncached access\n", ret);
+	i915->pat_uc = ret;
+
+	ret = i915_cache_find_pat(i915, I915_CACHE_CACHED);
+	if (ret < 0) {
+		drm_err(&i915->drm,
+			"Failed to find PAT index for write-back access\n");
+		return -ENODEV;
+	}
+	drm_info(&i915->drm, "Using PAT index %u for write-back access\n", ret);
+	i915->pat_wb = ret;
+
+	return 0;
+}
+
+int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache)
+{
+	const struct intel_device_info *info = INTEL_INFO(i915);
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(info->cache_modes); i++) {
+		if (info->cache_modes[i] == cache)
+			return i;
+	}
+
+	return -1;
+}
+
+void i915_cache_print(char *buf, size_t buflen, const char *suffix,
+		      i915_cache_t cache)
+{
+	const enum i915_cache_mode mode = I915_CACHE_MODE(cache);
+	static const char * const mode_str[] = {
+		[I915_CACHE_MODE_UC] = "UC",
+		[I915_CACHE_MODE_WB] = "WB",
+		[I915_CACHE_MODE_WT] = "WT",
+		[I915_CACHE_MODE_WC] = "WC",
+	};
+	static const char * const flag_str[] = {
+		[ilog2(I915_CACHE_FLAG_COH1W)] = "1-Way-Coherent",
+		[ilog2(I915_CACHE_FLAG_COH2W)] = "2-Way-Coherent",
+		[ilog2(I915_CACHE_FLAG_L3)] =    "L3",
+		[ilog2(I915_CACHE_FLAG_CLOS1)] = "CLOS1",
+		[ilog2(I915_CACHE_FLAG_CLOS2)] = "CLOS2",
+	};
+
+	if (mode >= ARRAY_SIZE(mode_str) || !mode_str[mode]) {
+		snprintf(buf, buflen, "0x%x%s", cache, suffix ?: "");
+	} else {
+		unsigned long flags = I915_CACHE_FLAGS(cache);
+		unsigned long bit;
+		int ret;
+
+		ret = scnprintf(buf, buflen, "%s", mode_str[mode]);
+		buf += ret;
+		buflen -= ret;
+
+		/*
+		 * Don't print "1-way-2-way", it would be confusing and 2-way
+		 * implies 1-way anyway.
+		 */
+		if ((flags & (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W)) ==
+		    (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W))
+			flags &= ~I915_CACHE_FLAG_COH1W;
+
+		for_each_set_bit(bit, &flags, BITS_PER_TYPE(i915_cache_t)) {
+			ret = scnprintf(buf, buflen, "-%s", flag_str[bit]);
+			buf += ret;
+			buflen -= ret;
+		}
+
+		if (suffix)
+			snprintf(buf, buflen, "%s", suffix);
+	}
 }
diff --git a/drivers/gpu/drm/i915/i915_cache.h b/drivers/gpu/drm/i915/i915_cache.h
index cb68936fb8a2..d9e97318b942 100644
--- a/drivers/gpu/drm/i915/i915_cache.h
+++ b/drivers/gpu/drm/i915/i915_cache.h
@@ -6,8 +6,79 @@
 #ifndef __I915_CACHE_H__
 #define __I915_CACHE_H__
 
+#include <linux/types.h>
+
+struct drm_printer;
+
 struct drm_i915_private;
 
-void i915_cache_init(struct drm_i915_private *i915);
+typedef u16 i915_cache_t;
+
+/* Cache modes */
+enum i915_cache_mode {
+	I915_CACHE_MODE_UC = 0,
+	I915_CACHE_MODE_WB,
+	__I915_CACHE_MODE_WB_L3, /* Special do-not-use entry for legacy 1:1 mapping. */
+	I915_CACHE_MODE_WT,
+	I915_CACHE_MODE_WC,
+	I915_NUM_CACHE_MODES
+};
+
+/* Cache mode flag bits */
+#define I915_CACHE_FLAG_COH1W	(0x1)
+#define I915_CACHE_FLAG_COH2W	(0x2) /* 1-way needs to be set too. */
+#define I915_CACHE_FLAG_L3	(0x4)
+#define I915_CACHE_FLAG_CLOS1	(0x8)
+#define I915_CACHE_FLAG_CLOS2	(0x10)
+
+/*
+ * Overloaded I915_CACHE() macro based on:
+ *  https://stackoverflow.com/questions/3046889/optional-parameters-with-c-macros
+ *
+ * It is possible to call I915_CACHE with a mode and zero or more flags as
+ * separate arguments. I.e. these all work:
+ *
+ *   I915_CACHE(WB)
+ *   I915_CACHE(WB, COH1W, COH2W)
+ *   I915_CACHE(WB, COH1W, COH2W, L3)
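+ *
+ * For example I915_CACHE(WB, COH1W) expands, via I915_CACHE_2, to
+ * (i915_cache_t)(I915_CACHE_MODE_WB | (I915_CACHE_FLAG_COH1W << 8)).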
+ */
+
+#define __I915_CACHE_FLAG(f) (I915_CACHE_FLAG_##f << 8)
+#define __I915_CACHE(m, f) ((i915_cache_t)(I915_CACHE_MODE_##m | (f)))
+
+#define I915_CACHE_4(m, f1, f2, f3)	__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2) | __I915_CACHE_FLAG(f3))
+#define I915_CACHE_3(m, f1, f2)		__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2))
+#define I915_CACHE_2(m, f1)		__I915_CACHE(m, __I915_CACHE_FLAG(f1))
+#define I915_CACHE_1(m)			__I915_CACHE(m, 0)
+#define I915_CACHE_0(m)			__I915_CACHE(WC, 0)
+
+#define FUNC_CHOOSER(_f1, _f2, _f3, _f4, _f5, ...) _f5
+#define FUNC_RECOMPOSER(argsWithParentheses) FUNC_CHOOSER argsWithParentheses
+#define CHOOSE_FROM_ARG_COUNT(...) FUNC_RECOMPOSER((__VA_ARGS__, I915_CACHE_4, I915_CACHE_3, I915_CACHE_2, I915_CACHE_1, ))
+#define NO_ARG_EXPANDER() ,,,I915_CACHE_0
+#define MACRO_CHOOSER(...) CHOOSE_FROM_ARG_COUNT(NO_ARG_EXPANDER __VA_ARGS__ ())
+
+#define I915_CACHE(...) MACRO_CHOOSER(__VA_ARGS__)(__VA_ARGS__)
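+
+/*
+ * For example I915_CACHE(WB, COH1W, COH2W) selects I915_CACHE_3() and
+ * yields I915_CACHE_MODE_WB in the low byte with the COH1W and COH2W
+ * flag bits in the high byte.
+ */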
+
+/* i915_cache_t mode and flags extraction helpers. */
+#define I915_CACHE_MODE(cache) \
+	((enum i915_cache_mode)(((i915_cache_t)(cache)) & 0xff))
+#define I915_CACHE_FLAGS(cache) \
+	((unsigned int)((((i915_cache_t)(cache) & 0xff00)) >> 8))
+
+/* Helpers for i915 caching modes. */
+#define I915_CACHE_NONE		I915_CACHE(UC)
+#define I915_CACHE_CACHED	I915_CACHE(WB, COH1W, COH2W)
+#define I915_CACHE_WT		I915_CACHE(WT)
+
+int i915_cache_init(struct drm_i915_private *i915);
+int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache);
+void i915_cache_print(char *buf, size_t buflen, const char *suffix,
+		      i915_cache_t cache);
+
+#define I915_CACHE_NAME_LEN (40)
 
 #endif /* __I915_CACHE_H__ */
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 4de44cf1026d..4ec292011546 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -140,57 +140,18 @@ static const char *stringify_vma_type(const struct i915_vma *vma)
 	return "ppgtt";
 }
 
-static const char *i915_cache_level_str(struct drm_i915_gem_object *obj)
-{
-	struct drm_i915_private *i915 = obj_to_i915(obj);
-
-	if (IS_METEORLAKE(i915)) {
-		switch (obj->pat_index) {
-		case 0: return " WB";
-		case 1: return " WT";
-		case 2: return " UC";
-		case 3: return " WB (1-Way Coh)";
-		case 4: return " WB (2-Way Coh)";
-		default: return " not defined";
-		}
-	} else if (IS_PONTEVECCHIO(i915)) {
-		switch (obj->pat_index) {
-		case 0: return " UC";
-		case 1: return " WC";
-		case 2: return " WT";
-		case 3: return " WB";
-		case 4: return " WT (CLOS1)";
-		case 5: return " WB (CLOS1)";
-		case 6: return " WT (CLOS2)";
-		case 7: return " WT (CLOS2)";
-		default: return " not defined";
-		}
-	} else if (GRAPHICS_VER(i915) >= 12) {
-		switch (obj->pat_index) {
-		case 0: return " WB";
-		case 1: return " WC";
-		case 2: return " WT";
-		case 3: return " UC";
-		default: return " not defined";
-		}
-	} else {
-		switch (obj->pat_index) {
-		case 0: return " UC";
-		case 1: return HAS_LLC(i915) ?
-			       " LLC" : " snooped";
-		case 2: return " L3+LLC";
-		case 3: return " WT";
-		default: return " not defined";
-		}
-	}
-}
-
 void
 i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 {
+	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+	char buf[I915_CACHE_NAME_LEN];
 	struct i915_vma *vma;
 	int pin_count = 0;
 
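+	/* A trailing "!" in the name marks a PAT index set by userspace. */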
+	i915_cache_print(buf, sizeof(buf),
+			 obj->pat_set_by_user ? "!" : NULL,
+			 INTEL_INFO(i915)->cache_modes[obj->pat_index]);
+
 	seq_printf(m, "%pK: %c%c%c %8zdKiB %02x %02x %s%s%s",
 		   &obj->base,
 		   get_tiling_flag(obj),
@@ -199,7 +160,7 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		   obj->base.size / 1024,
 		   obj->read_domains,
 		   obj->write_domain,
-		   i915_cache_level_str(obj),
+		   buf,
 		   obj->mm.dirty ? " dirty" : "",
 		   obj->mm.madv == I915_MADV_DONTNEED ? " purgeable" : "");
 	if (obj->base.name)
diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index bb2223cc3470..8663388a524f 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -241,7 +241,9 @@ static int i915_driver_early_probe(struct drm_i915_private *dev_priv)
 	i915_memcpy_init_early(dev_priv);
 	intel_runtime_pm_init_early(&dev_priv->runtime_pm);
 
-	i915_cache_init(dev_priv);
+	ret = i915_cache_init(dev_priv);
+	if (ret < 0)
+		return ret;
 
 	ret = i915_workqueues_init(dev_priv);
 	if (ret < 0)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 896aa48ed089..814705cfeb12 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1144,19 +1144,6 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 	unsigned int i;
 	int ret;
 
-	/*
-	 * In the proccess of replacing cache_level with pat_index a tricky
-	 * dependency is created on the definition of the enum i915_cache_level.
-	 * in case this enum is changed, PTE encode would be broken.
-	 * Add a WARNING here. And remove when we completely quit using this
-	 * enum
-	 */
-	BUILD_BUG_ON(I915_CACHE_NONE != 0 ||
-		     I915_CACHE_LLC != 1 ||
-		     I915_CACHE_L3_LLC != 2 ||
-		     I915_CACHE_WT != 3 ||
-		     I915_MAX_CACHE_LEVEL != 4);
-
 	/* We need to fallback to 4K pages if host doesn't support huge gtt. */
 	if (intel_vgpu_active(dev_priv) && !intel_vgpu_has_huge_gtt(dev_priv))
 		RUNTIME_INFO(dev_priv)->page_sizes = I915_GTT_PAGE_SIZE_4K;
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index fcacdc21643c..565a60a1645d 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -32,6 +32,7 @@
 #include "gt/intel_sa_media.h"
 #include "gem/i915_gem_object_types.h"
 
+#include "i915_cache.h"
 #include "i915_driver.h"
 #include "i915_drv.h"
 #include "i915_pci.h"
@@ -43,36 +44,43 @@
 	.__runtime.graphics.ip.ver = (x), \
 	.__runtime.media.ip.ver = (x)
 
-#define LEGACY_CACHELEVEL \
-	.cachelevel_to_pat = { \
-		[I915_CACHE_NONE]   = 0, \
-		[I915_CACHE_LLC]    = 1, \
-		[I915_CACHE_L3_LLC] = 2, \
-		[I915_CACHE_WT]     = 3, \
+#define LEGACY_CACHE_MODES \
+	.cache_modes = { \
+		[I915_CACHE_MODE_UC] 	  = I915_CACHE(UC), \
+		[I915_CACHE_MODE_WB] 	  = I915_CACHE(WB, COH1W, COH2W), \
+		[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, COH2W, L3), \
+		[I915_CACHE_MODE_WT] 	  = I915_CACHE(WT), \
 	}
 
-#define TGL_CACHELEVEL \
-	.cachelevel_to_pat = { \
-		[I915_CACHE_NONE]   = 3, \
-		[I915_CACHE_LLC]    = 0, \
-		[I915_CACHE_L3_LLC] = 0, \
-		[I915_CACHE_WT]     = 2, \
+#define GEN12_CACHE_MODES \
+	.cache_modes = { \
+		[0] = I915_CACHE(WB, COH1W, COH2W), \
+		[1] = I915_CACHE(WC), \
+		[2] = I915_CACHE(WT), \
+		[3] = I915_CACHE(UC), \
 	}
 
-#define PVC_CACHELEVEL \
-	.cachelevel_to_pat = { \
-		[I915_CACHE_NONE]   = 0, \
-		[I915_CACHE_LLC]    = 3, \
-		[I915_CACHE_L3_LLC] = 3, \
-		[I915_CACHE_WT]     = 2, \
+/* FIXME: are PAT indices 3, 5 and 7 1-way or 2-way coherent? */
+
+#define PVC_CACHE_MODES \
+	.cache_modes = { \
+		[0] = I915_CACHE(UC), \
+		[1] = I915_CACHE(WC), \
+		[2] = I915_CACHE(WT), \
+		[3] = I915_CACHE(WB, COH1W), \
+		[4] = I915_CACHE(WT, CLOS1), \
+		[5] = I915_CACHE(WB, COH1W, CLOS1), \
+		[6] = I915_CACHE(WT, CLOS2), \
+		[7] = I915_CACHE(WB, COH1W, CLOS2), \
 	}
 
-#define MTL_CACHELEVEL \
-	.cachelevel_to_pat = { \
-		[I915_CACHE_NONE]   = 2, \
-		[I915_CACHE_LLC]    = 3, \
-		[I915_CACHE_L3_LLC] = 3, \
-		[I915_CACHE_WT]     = 1, \
+#define MTL_CACHE_MODES \
+	.cache_modes = { \
+		[0] = I915_CACHE(WB), \
+		[1] = I915_CACHE(WT), \
+		[2] = I915_CACHE(UC), \
+		[3] = I915_CACHE(WB, COH1W), \
+		[4] = I915_CACHE(WB, COH1W, COH2W), \
 	}
 
 /* Keep in gen based order, and chronological order within a gen */
@@ -97,7 +105,7 @@
 	.max_pat_index = 3, \
 	GEN_DEFAULT_PAGE_SIZES, \
 	GEN_DEFAULT_REGIONS, \
-	LEGACY_CACHELEVEL
+	LEGACY_CACHE_MODES
 
 #define I845_FEATURES \
 	GEN(2), \
@@ -112,7 +120,7 @@
 	.max_pat_index = 3, \
 	GEN_DEFAULT_PAGE_SIZES, \
 	GEN_DEFAULT_REGIONS, \
-	LEGACY_CACHELEVEL
+	LEGACY_CACHE_MODES
 
 static const struct intel_device_info i830_info = {
 	I830_FEATURES,
@@ -145,7 +153,7 @@ static const struct intel_device_info i865g_info = {
 	.max_pat_index = 3, \
 	GEN_DEFAULT_PAGE_SIZES, \
 	GEN_DEFAULT_REGIONS, \
-	LEGACY_CACHELEVEL
+	LEGACY_CACHE_MODES
 
 static const struct intel_device_info i915g_info = {
 	GEN3_FEATURES,
@@ -208,7 +216,7 @@ static const struct intel_device_info pnv_m_info = {
 	.max_pat_index = 3, \
 	GEN_DEFAULT_PAGE_SIZES, \
 	GEN_DEFAULT_REGIONS, \
-	LEGACY_CACHELEVEL
+	LEGACY_CACHE_MODES
 
 static const struct intel_device_info i965g_info = {
 	GEN4_FEATURES,
@@ -252,7 +260,7 @@ static const struct intel_device_info gm45_info = {
 	.max_pat_index = 3, \
 	GEN_DEFAULT_PAGE_SIZES, \
 	GEN_DEFAULT_REGIONS, \
-	LEGACY_CACHELEVEL
+	LEGACY_CACHE_MODES
 
 static const struct intel_device_info ilk_d_info = {
 	GEN5_FEATURES,
@@ -282,7 +290,7 @@ static const struct intel_device_info ilk_m_info = {
 	.__runtime.ppgtt_size = 31, \
 	GEN_DEFAULT_PAGE_SIZES, \
 	GEN_DEFAULT_REGIONS, \
-	LEGACY_CACHELEVEL
+	LEGACY_CACHE_MODES
 
 #define SNB_D_PLATFORM \
 	GEN6_FEATURES, \
@@ -330,7 +338,7 @@ static const struct intel_device_info snb_m_gt2_info = {
 	.__runtime.ppgtt_size = 31, \
 	GEN_DEFAULT_PAGE_SIZES, \
 	GEN_DEFAULT_REGIONS, \
-	LEGACY_CACHELEVEL
+	LEGACY_CACHE_MODES
 
 #define IVB_D_PLATFORM \
 	GEN7_FEATURES, \
@@ -387,7 +395,7 @@ static const struct intel_device_info vlv_info = {
 	.platform_engine_mask = BIT(RCS0) | BIT(VCS0) | BIT(BCS0),
 	GEN_DEFAULT_PAGE_SIZES,
 	GEN_DEFAULT_REGIONS,
-	LEGACY_CACHELEVEL,
+	LEGACY_CACHE_MODES
 };
 
 #define G75_FEATURES  \
@@ -473,7 +481,7 @@ static const struct intel_device_info chv_info = {
 	.has_coherent_ggtt = false,
 	GEN_DEFAULT_PAGE_SIZES,
 	GEN_DEFAULT_REGIONS,
-	LEGACY_CACHELEVEL,
+	LEGACY_CACHE_MODES
 };
 
 #define GEN9_DEFAULT_PAGE_SIZES \
@@ -536,7 +544,7 @@ static const struct intel_device_info skl_gt4_info = {
 	.max_pat_index = 3, \
 	GEN9_DEFAULT_PAGE_SIZES, \
 	GEN_DEFAULT_REGIONS, \
-	LEGACY_CACHELEVEL
+	LEGACY_CACHE_MODES
 
 static const struct intel_device_info bxt_info = {
 	GEN9_LP_FEATURES,
@@ -640,7 +648,7 @@ static const struct intel_device_info jsl_info = {
 #define GEN12_FEATURES \
 	GEN11_FEATURES, \
 	GEN(12), \
-	TGL_CACHELEVEL, \
+	GEN12_CACHE_MODES, \
 	.has_global_mocs = 1, \
 	.has_pxp = 1, \
 	.max_pat_index = 3
@@ -708,7 +716,7 @@ static const struct intel_device_info adl_p_info = {
 	.__runtime.graphics.ip.ver = 12, \
 	.__runtime.graphics.ip.rel = 50, \
 	XE_HP_PAGE_SIZES, \
-	TGL_CACHELEVEL, \
+	GEN12_CACHE_MODES, \
 	.dma_mask_size = 46, \
 	.has_3d_pipeline = 1, \
 	.has_64bit_reloc = 1, \
@@ -803,7 +811,7 @@ static const struct intel_device_info pvc_info = {
 		BIT(VCS0) |
 		BIT(CCS0) | BIT(CCS1) | BIT(CCS2) | BIT(CCS3),
 	.require_force_probe = 1,
-	PVC_CACHELEVEL,
+	PVC_CACHE_MODES
 };
 
 static const struct intel_gt_definition xelpmp_extra_gt[] = {
@@ -838,7 +846,7 @@ static const struct intel_device_info mtl_info = {
 	.memory_regions = REGION_SMEM | REGION_STOLEN_LMEM,
 	.platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(CCS0),
 	.require_force_probe = 1,
-	MTL_CACHELEVEL,
+	MTL_CACHE_MODES
 };
 
 #undef PLATFORM
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 04bc1f4a1115..973175a64534 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1870,7 +1870,7 @@ static int alloc_oa_buffer(struct i915_perf_stream *stream)
 		return PTR_ERR(bo);
 	}
 
-	i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC);
+	i915_gem_object_set_cache_coherency(bo, I915_CACHE_CACHED);
 
 	/* PreHSW required 512K alignment, HSW requires 16M */
 	vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL);
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index dbfe6443457b..2ce13b7c48cb 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -27,6 +27,8 @@
 
 #include <uapi/drm/i915_drm.h>
 
+#include "i915_cache.h"
+
 #include "intel_step.h"
 
 #include "gt/intel_engine_types.h"
@@ -243,8 +245,8 @@ struct intel_device_info {
 	 */
 	const struct intel_runtime_info __runtime;
 
-	u32 cachelevel_to_pat[I915_MAX_CACHE_LEVEL];
-	u32 max_pat_index;
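+	/*
+	 * Per-platform table of cache mode descriptors, indexed by PAT
+	 * index.
+	 */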
+	i915_cache_t cache_modes[8];
+	unsigned int max_pat_index;
 };
 
 struct intel_driver_caps {
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
index f910ec9b6d2b..ba821e48baa5 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
@@ -267,7 +267,7 @@ static int igt_evict_for_cache_color(void *arg)
 		err = PTR_ERR(obj);
 		goto cleanup;
 	}
-	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
+	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
 	quirk_add(obj, &objects);
 
 	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
@@ -283,7 +283,7 @@ static int igt_evict_for_cache_color(void *arg)
 		err = PTR_ERR(obj);
 		goto cleanup;
 	}
-	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
+	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
 	quirk_add(obj, &objects);
 
 	/* Neighbouring; same colour - should fit */
diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
index 3c5e0952f1b8..4cfc5000d6ff 100644
--- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
+++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
@@ -23,7 +23,7 @@ int igt_spinner_init(struct igt_spinner *spin, struct intel_gt *gt)
 		err = PTR_ERR(spin->hws);
 		goto err;
 	}
-	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_LLC);
+	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_CACHED);
 
 	spin->obj = i915_gem_object_create_internal(gt->i915, PAGE_SIZE);
 	if (IS_ERR(spin->obj)) {
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index 1d1a457e2aee..8ae77bcf27fa 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -126,13 +126,13 @@ static const struct intel_device_info mock_info = {
 	.memory_regions = REGION_SMEM,
 	.platform_engine_mask = BIT(0),
 
-	/* simply use legacy cache level for mock device */
+	/* Simply use legacy cache modes for the mock device. */
 	.max_pat_index = 3,
-	.cachelevel_to_pat = {
-		[I915_CACHE_NONE]   = 0,
-		[I915_CACHE_LLC]    = 1,
-		[I915_CACHE_L3_LLC] = 2,
-		[I915_CACHE_WT]     = 3,
+	.cache_modes = {
+		[0] = I915_CACHE(UC),
+		[1] = I915_CACHE(WB, COH1W),
+		[2] = I915_CACHE(WB, COH1W, COH2W, L3),
+		[3] = I915_CACHE(WT),
 	},
 };
 
@@ -181,7 +181,7 @@ struct drm_i915_private *mock_gem_device(void)
 	/* Set up device info and initial runtime info. */
 	intel_device_info_driver_create(i915, pdev->device, &mock_info);
 
-	i915_cache_init(i915);
+	WARN_ON(i915_cache_init(i915));
 
 	dev_pm_domain_set(&pdev->dev, &pm_domain);
 	pm_runtime_enable(&pdev->dev);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [RFC 5/8] drm/i915: Improve the vm_fault_gtt user PAT index restriction
  2023-07-27 14:54 ` [Intel-gfx] " Tvrtko Ursulin
@ 2023-07-27 14:55   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 59+ messages in thread
From: Tvrtko Ursulin @ 2023-07-27 14:55 UTC (permalink / raw)
  To: Intel-gfx, dri-devel; +Cc: Matt Roper, Fei Yang, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Now that i915 understands the caching modes behind PAT indices, we can
refine the check in vm_fault_gtt() to not reject the uncached PAT if it
was set by userspace on a snoopable platform.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_mman.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index cd7f8ded0d6f..9aa6ecf68432 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -382,17 +382,9 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
 		goto err_reset;
 	}
 
-	/*
-	 * For objects created by userspace through GEM_CREATE with pat_index
-	 * set by set_pat extension, coherency is managed by userspace, make
-	 * sure we don't fail handling the vm fault by calling
-	 * i915_gem_object_has_cache_level() which always return true for such
-	 * objects. Otherwise this helper function would fall back to checking
-	 * whether the object is un-cached.
-	 */
-	if (!((obj->pat_set_by_user ||
-	       i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)) ||
-	      HAS_LLC(i915))) {
+	/* Access to snoopable pages through the GTT is incoherent. */
+	if (!i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC) &&
+	    !HAS_LLC(i915)) {
 		ret = -EFAULT;
 		goto err_unpin;
 	}
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [RFC 6/8] drm/i915: Lift the user PAT restriction from gpu_write_needs_clflush
  2023-07-27 14:54 ` [Intel-gfx] " Tvrtko Ursulin
@ 2023-07-27 14:55   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 59+ messages in thread
From: Tvrtko Ursulin @ 2023-07-27 14:55 UTC (permalink / raw)
  To: Intel-gfx, dri-devel; +Cc: Matt Roper, Fei Yang, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Now that i915 understands the caching modes behind PAT indices, and having
also special-cased the fully coherent Meteorlake snooping mode, we can
remove the user PAT check from gpu_write_needs_clflush().

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_domain.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
index c15f83de33af..bf3a2fa0e539 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
@@ -41,12 +41,6 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
 	if (IS_METEORLAKE(i915))
 		return false;
 
-	/*
-	 * Always flush cache for UMD objects with PAT index set.
-	 */
-	if (obj->pat_set_by_user)
-		return true;
-
 	/*
 	 * Fully coherent cached access may end up with data in the CPU cache
 	 * which hasn't hit memory yet.
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [RFC 7/8] drm/i915: Lift the user PAT restriction from use_cpu_reloc
  2023-07-27 14:54 ` [Intel-gfx] " Tvrtko Ursulin
@ 2023-07-27 14:55   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 59+ messages in thread
From: Tvrtko Ursulin @ 2023-07-27 14:55 UTC (permalink / raw)
  To: Intel-gfx, dri-devel; +Cc: Matt Roper, Fei Yang, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Now that i915 understands the caching modes behind PAT indices, we can
refine the check in use_cpu_reloc() to not reject the uncached PAT if it
was set by userspace.

Instead it can decide based on the presence of full coherency, which
should be functionally equivalent on legacy platforms. We can ignore WT
since it is only used by the display, and we can ignore Meteorlake since
it will fail on the existing "has_llc" condition before the object cache
mode check.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 9d6e49c8a4c6..f74b33670bad 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -640,16 +640,9 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
 	if (DBG_FORCE_RELOC == FORCE_GTT_RELOC)
 		return false;
 
-	/*
-	 * For objects created by userspace through GEM_CREATE with pat_index
-	 * set by set_pat extension, i915_gem_object_has_cache_level() always
-	 * return true, otherwise the call would fall back to checking whether
-	 * the object is un-cached.
-	 */
 	return (cache->has_llc ||
 		obj->cache_dirty ||
-		!(obj->pat_set_by_user ||
-		  i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)));
+		i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W));
 }
 
 static int eb_reserve_vma(struct i915_execbuffer *eb,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [RFC 8/8] drm/i915: Refine the caching check in i915_gem_object_can_bypass_llc
  2023-07-27 14:54 ` [Intel-gfx] " Tvrtko Ursulin
@ 2023-07-27 14:55   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 59+ messages in thread
From: Tvrtko Ursulin @ 2023-07-27 14:55 UTC (permalink / raw)
  To: Intel-gfx, dri-devel; +Cc: Matt Roper, Fei Yang, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Now that i915 understands the caching modes behind PAT indices, we can
refine the check in i915_gem_object_can_bypass_llc() to stop assuming any
user PAT can bypass the shared cache (if there is any).

Instead we can use the absence of I915_BO_CACHE_COHERENT_FOR_WRITE as the
criterion; the flag is set for all caching modes where writes from the CPU
side (in this case buffer clears before handing buffers over to userspace)
are fully coherent with respect to reads from the GPU.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index ec1f0be43d0d..8c4b54bd3911 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -221,12 +221,6 @@ bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
 	if (!(obj->flags & I915_BO_ALLOC_USER))
 		return false;
 
-	/*
-	 * Always flush cache for UMD objects at creation time.
-	 */
-	if (obj->pat_set_by_user)
-		return true;
-
 	/*
 	 * EHL and JSL add the 'Bypass LLC' MOCS entry, which should make it
 	 * possible for userspace to bypass the GTT caching bits set by the
@@ -239,7 +233,17 @@ bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
 	 * it, but since i915 takes the stance of always zeroing memory before
 	 * handing it to userspace, we need to prevent this.
 	 */
-	return IS_JSL_EHL(i915);
+	if (IS_JSL_EHL(i915))
+		return true;
+
+	/*
+	 * Any caching mode where writes via the CPU cache are not coherent
+	 * with the GPU needs explicit flushing so the GPU cannot see stale data.
+	 */
+	if (!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE))
+		return true;
+
+	return false;
 }
 
 static void i915_gem_close_object(struct drm_gem_object *gem, struct drm_file *file)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Another take on PAT/object cache mode refactoring
  2023-07-27 14:54 ` [Intel-gfx] " Tvrtko Ursulin
                   ` (8 preceding siblings ...)
  (?)
@ 2023-07-27 19:43 ` Patchwork
  -1 siblings, 0 replies; 59+ messages in thread
From: Patchwork @ 2023-07-27 19:43 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Another take on PAT/object cache mode refactoring
URL   : https://patchwork.freedesktop.org/series/121450/
State : warning

== Summary ==

Error: dim checkpatch failed
7ea1d7a9ac31 drm/i915: Skip clflush after GPU writes on Meteorlake
5658be9fc6d2 drm/i915: Split PTE encode between Gen12 and Meteorlake
7ae0f4a5fb34 drm/i915: Cache PAT index used by the driver
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 6, in <module>
    from ply import lex, yacc
ModuleNotFoundError: No module named 'ply'
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 6, in <module>
    from ply import lex, yacc
ModuleNotFoundError: No module named 'ply'
-:8: WARNING:TYPO_SPELLING: 'platfrom' may be misspelled - perhaps 'platform'?
#8: 
per platfrom so no need to consult a function every time.
    ^^^^^^^^

-:337: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#337: 
new file mode 100644

total: 0 errors, 2 warnings, 0 checks, 412 lines checked
b3cd816012f4 drm/i915: Refactor PAT/object cache handling
-:738: CHECK:LINE_SPACING: Please don't use multiple blank lines
#738: FILE: drivers/gpu/drm/i915/gem/selftests/huge_pages.c:206:
 
+

-:1148: WARNING:LONG_LINE: line length of 126 exceeds 100 columns
#1148: FILE: drivers/gpu/drm/i915/i915_cache.h:49:
+#define I915_CACHE_4(m, f1, f2, f3)	__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2) | __I915_CACHE_FLAG(f3))

-:1149: WARNING:LONG_LINE: line length of 102 exceeds 100 columns
#1149: FILE: drivers/gpu/drm/i915/i915_cache.h:50:
+#define I915_CACHE_3(m, f1, f2)		__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2))

-:1155: CHECK:CAMELCASE: Avoid CamelCase: <argsWithParentheses>
#1155: FILE: drivers/gpu/drm/i915/i915_cache.h:56:
+#define FUNC_RECOMPOSER(argsWithParentheses) FUNC_CHOOSER argsWithParentheses

-:1155: ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#1155: FILE: drivers/gpu/drm/i915/i915_cache.h:56:
+#define FUNC_RECOMPOSER(argsWithParentheses) FUNC_CHOOSER argsWithParentheses

-:1156: WARNING:LONG_LINE: line length of 123 exceeds 100 columns
#1156: FILE: drivers/gpu/drm/i915/i915_cache.h:57:
+#define CHOOSE_FROM_ARG_COUNT(...) FUNC_RECOMPOSER((__VA_ARGS__, I915_CACHE_4, I915_CACHE_3, I915_CACHE_2, I915_CACHE_1, ))

-:1156: ERROR:SPACING: space prohibited before that close parenthesis ')'
#1156: FILE: drivers/gpu/drm/i915/i915_cache.h:57:
+#define CHOOSE_FROM_ARG_COUNT(...) FUNC_RECOMPOSER((__VA_ARGS__, I915_CACHE_4, I915_CACHE_3, I915_CACHE_2, I915_CACHE_1, ))

-:1157: ERROR:SPACING: space required after that ',' (ctx:OxO)
#1157: FILE: drivers/gpu/drm/i915/i915_cache.h:58:
+#define NO_ARG_EXPANDER() ,,,I915_CACHE_0
                            ^

-:1157: ERROR:SPACING: space required after that ',' (ctx:OxV)
#1157: FILE: drivers/gpu/drm/i915/i915_cache.h:58:
+#define NO_ARG_EXPANDER() ,,,I915_CACHE_0
                             ^

-:1157: ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#1157: FILE: drivers/gpu/drm/i915/i915_cache.h:58:
+#define NO_ARG_EXPANDER() ,,,I915_CACHE_0

-:1158: WARNING:SPACING: space prohibited between function name and open parenthesis '('
#1158: FILE: drivers/gpu/drm/i915/i915_cache.h:59:
+#define MACRO_CHOOSER(...) CHOOSE_FROM_ARG_COUNT(NO_ARG_EXPANDER __VA_ARGS__ ())

-:1321: WARNING:SPACE_BEFORE_TAB: please, no space before tabs
#1321: FILE: drivers/gpu/drm/i915/i915_pci.c:49:
+^I^I[I915_CACHE_MODE_UC] ^I  = I915_CACHE(UC), \$

-:1322: WARNING:SPACE_BEFORE_TAB: please, no space before tabs
#1322: FILE: drivers/gpu/drm/i915/i915_pci.c:50:
+^I^I[I915_CACHE_MODE_WB] ^I  = I915_CACHE(WB, COH1W, COH2W), \$

-:1324: WARNING:SPACE_BEFORE_TAB: please, no space before tabs
#1324: FILE: drivers/gpu/drm/i915/i915_pci.c:52:
+^I^I[I915_CACHE_MODE_WT] ^I  = I915_CACHE(WT), \$

total: 5 errors, 7 warnings, 2 checks, 1325 lines checked
34422af90547 drm/i915: Improve the vm_fault_gtt user PAT index restriction
310ac748bf11 drm/i915: Lift the user PAT restriction from gpu_write_needs_clflush
c685b68ad021 drm/i915: Lift the user PAT restriction from use_cpu_reloc
84f57227fb16 drm/i915: Refine the caching check in i915_gem_object_can_bypass_llc



^ permalink raw reply	[flat|nested] 59+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Another take on PAT/object cache mode refactoring
  2023-07-27 14:54 ` [Intel-gfx] " Tvrtko Ursulin
                   ` (9 preceding siblings ...)
  (?)
@ 2023-07-27 19:43 ` Patchwork
  -1 siblings, 0 replies; 59+ messages in thread
From: Patchwork @ 2023-07-27 19:43 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Another take on PAT/object cache mode refactoring
URL   : https://patchwork.freedesktop.org/series/121450/
State : warning

== Summary ==

Error: dim sparse failed
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.



^ permalink raw reply	[flat|nested] 59+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for Another take on PAT/object cache mode refactoring
  2023-07-27 14:54 ` [Intel-gfx] " Tvrtko Ursulin
                   ` (10 preceding siblings ...)
  (?)
@ 2023-07-27 20:01 ` Patchwork
  -1 siblings, 0 replies; 59+ messages in thread
From: Patchwork @ 2023-07-27 20:01 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Another take on PAT/object cache mode refactoring
URL   : https://patchwork.freedesktop.org/series/121450/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_13432 -> Patchwork_121450v1
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/index.html

Participating hosts (43 -> 42)
------------------------------

  Missing    (1): fi-snb-2520m 

Known issues
------------

  Here are the changes found in Patchwork_121450v1 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@core_auth@basic-auth:
    - fi-kbl-soraka:      [PASS][1] -> [INCOMPLETE][2] ([i915#1982] / [i915#8011])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/fi-kbl-soraka/igt@core_auth@basic-auth.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/fi-kbl-soraka/igt@core_auth@basic-auth.html

  * igt@gem_exec_suspend@basic-s0@smem:
    - bat-jsl-3:          [PASS][3] -> [ABORT][4] ([i915#5122])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/bat-jsl-3/igt@gem_exec_suspend@basic-s0@smem.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/bat-jsl-3/igt@gem_exec_suspend@basic-s0@smem.html

  * igt@gem_exec_suspend@basic-s3@smem:
    - bat-rpls-2:         NOTRUN -> [ABORT][5] ([i915#6687] / [i915#7978] / [i915#8668])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/bat-rpls-2/igt@gem_exec_suspend@basic-s3@smem.html

  * igt@gem_lmem_swapping@parallel-random-engines:
    - bat-adlp-9:         NOTRUN -> [SKIP][6] ([i915#4613]) +3 similar issues
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/bat-adlp-9/igt@gem_lmem_swapping@parallel-random-engines.html

  * igt@i915_module_load@load:
    - bat-mtlp-6:         [PASS][7] -> [ABORT][8] ([i915#8141])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/bat-mtlp-6/igt@i915_module_load@load.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/bat-mtlp-6/igt@i915_module_load@load.html
    - bat-mtlp-8:         [PASS][9] -> [ABORT][10] ([i915#8141])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/bat-mtlp-8/igt@i915_module_load@load.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/bat-mtlp-8/igt@i915_module_load@load.html

  * igt@i915_pm_rpm@basic-rte:
    - fi-cfl-guc:         [PASS][11] -> [FAIL][12] ([i915#7940])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/fi-cfl-guc/igt@i915_pm_rpm@basic-rte.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/fi-cfl-guc/igt@i915_pm_rpm@basic-rte.html

  * igt@i915_pm_rpm@module-reload:
    - bat-adlp-9:         NOTRUN -> [FAIL][13] ([i915#7940])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/bat-adlp-9/igt@i915_pm_rpm@module-reload.html

  * igt@i915_pm_rps@basic-api:
    - bat-adlp-9:         NOTRUN -> [SKIP][14] ([i915#6621])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/bat-adlp-9/igt@i915_pm_rps@basic-api.html

  * igt@i915_selftest@live@requests:
    - bat-rpls-1:         [PASS][15] -> [ABORT][16] ([i915#7911] / [i915#7920] / [i915#7982])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/bat-rpls-1/igt@i915_selftest@live@requests.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/bat-rpls-1/igt@i915_selftest@live@requests.html

  * igt@i915_selftest@live@slpc:
    - bat-rpls-2:         NOTRUN -> [DMESG-WARN][17] ([i915#6367])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/bat-rpls-2/igt@i915_selftest@live@slpc.html

  * igt@i915_suspend@basic-s3-without-i915:
    - bat-jsl-3:          [PASS][18] -> [FAIL][19] ([fdo#103375])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/bat-jsl-3/igt@i915_suspend@basic-s3-without-i915.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/bat-jsl-3/igt@i915_suspend@basic-s3-without-i915.html

  * igt@kms_chamelium_hpd@common-hpd-after-suspend:
    - bat-adlp-9:         NOTRUN -> [SKIP][20] ([i915#7828])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/bat-adlp-9/igt@kms_chamelium_hpd@common-hpd-after-suspend.html

  * igt@kms_pipe_crc_basic@nonblocking-crc-frame-sequence:
    - bat-dg2-11:         NOTRUN -> [SKIP][21] ([i915#1845] / [i915#5354]) +3 similar issues
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/bat-dg2-11/igt@kms_pipe_crc_basic@nonblocking-crc-frame-sequence.html

  * igt@kms_psr@cursor_plane_move:
    - bat-rplp-1:         NOTRUN -> [ABORT][22] ([i915#8434])
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/bat-rplp-1/igt@kms_psr@cursor_plane_move.html

  * igt@kms_psr@primary_page_flip:
    - bat-rplp-1:         NOTRUN -> [SKIP][23] ([i915#1072])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/bat-rplp-1/igt@kms_psr@primary_page_flip.html

  * igt@prime_vgem@basic-fence-read:
    - bat-adlp-9:         NOTRUN -> [SKIP][24] ([fdo#109295] / [i915#3291] / [i915#3708]) +2 similar issues
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/bat-adlp-9/igt@prime_vgem@basic-fence-read.html

  
#### Possible fixes ####

  * igt@i915_pm_rpm@basic-pci-d3-state:
    - fi-tgl-1115g4:      [FAIL][25] ([i915#7940]) -> [PASS][26]
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/fi-tgl-1115g4/igt@i915_pm_rpm@basic-pci-d3-state.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/fi-tgl-1115g4/igt@i915_pm_rpm@basic-pci-d3-state.html
    - bat-adlp-9:         [FAIL][27] ([i915#7940]) -> [PASS][28]
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/bat-adlp-9/igt@i915_pm_rpm@basic-pci-d3-state.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/bat-adlp-9/igt@i915_pm_rpm@basic-pci-d3-state.html

  * igt@i915_pm_rpm@basic-rte:
    - bat-adlp-9:         [ABORT][29] ([i915#7977] / [i915#8668]) -> [PASS][30]
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/bat-adlp-9/igt@i915_pm_rpm@basic-rte.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/bat-adlp-9/igt@i915_pm_rpm@basic-rte.html

  * igt@i915_pm_rpm@module-reload:
    - fi-rkl-11600:       [FAIL][31] ([i915#7940]) -> [PASS][32]
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/fi-rkl-11600/igt@i915_pm_rpm@module-reload.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/fi-rkl-11600/igt@i915_pm_rpm@module-reload.html
    - fi-skl-guc:         [FAIL][33] ([i915#7940]) -> [PASS][34]
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/fi-skl-guc/igt@i915_pm_rpm@module-reload.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/fi-skl-guc/igt@i915_pm_rpm@module-reload.html

  * igt@i915_selftest@live@hangcheck:
    - bat-dg2-8:          [DMESG-FAIL][35] ([i915#6998] / [i915#7913]) -> [PASS][36]
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/bat-dg2-8/igt@i915_selftest@live@hangcheck.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/bat-dg2-8/igt@i915_selftest@live@hangcheck.html

  * igt@i915_selftest@live@reset:
    - bat-rpls-2:         [ABORT][37] ([i915#4983] / [i915#7461] / [i915#7913] / [i915#8347]) -> [PASS][38]
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/bat-rpls-2/igt@i915_selftest@live@reset.html
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/bat-rpls-2/igt@i915_selftest@live@reset.html

  * igt@kms_addfb_basic@no-handle:
    - fi-kbl-soraka:      [INCOMPLETE][39] -> [PASS][40]
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/fi-kbl-soraka/igt@kms_addfb_basic@no-handle.html
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/fi-kbl-soraka/igt@kms_addfb_basic@no-handle.html

  * igt@kms_pipe_crc_basic@read-crc-frame-sequence@pipe-d-edp-1:
    - bat-rplp-1:         [ABORT][41] ([i915#8442] / [i915#8668]) -> [PASS][42]
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/bat-rplp-1/igt@kms_pipe_crc_basic@read-crc-frame-sequence@pipe-d-edp-1.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/bat-rplp-1/igt@kms_pipe_crc_basic@read-crc-frame-sequence@pipe-d-edp-1.html

  
#### Warnings ####

  * igt@i915_pm_rpm@basic-rte:
    - fi-kbl-guc:         [FAIL][43] ([i915#7940]) -> [FAIL][44] ([i915#8843])
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/fi-kbl-guc/igt@i915_pm_rpm@basic-rte.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/fi-kbl-guc/igt@i915_pm_rpm@basic-rte.html

  
  [fdo#103375]: https://bugs.freedesktop.org/show_bug.cgi?id=103375
  [fdo#109295]: https://bugs.freedesktop.org/show_bug.cgi?id=109295
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#3291]: https://gitlab.freedesktop.org/drm/intel/issues/3291
  [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4983]: https://gitlab.freedesktop.org/drm/intel/issues/4983
  [i915#5122]: https://gitlab.freedesktop.org/drm/intel/issues/5122
  [i915#5354]: https://gitlab.freedesktop.org/drm/intel/issues/5354
  [i915#6367]: https://gitlab.freedesktop.org/drm/intel/issues/6367
  [i915#6621]: https://gitlab.freedesktop.org/drm/intel/issues/6621
  [i915#6687]: https://gitlab.freedesktop.org/drm/intel/issues/6687
  [i915#6998]: https://gitlab.freedesktop.org/drm/intel/issues/6998
  [i915#7461]: https://gitlab.freedesktop.org/drm/intel/issues/7461
  [i915#7828]: https://gitlab.freedesktop.org/drm/intel/issues/7828
  [i915#7911]: https://gitlab.freedesktop.org/drm/intel/issues/7911
  [i915#7913]: https://gitlab.freedesktop.org/drm/intel/issues/7913
  [i915#7920]: https://gitlab.freedesktop.org/drm/intel/issues/7920
  [i915#7940]: https://gitlab.freedesktop.org/drm/intel/issues/7940
  [i915#7977]: https://gitlab.freedesktop.org/drm/intel/issues/7977
  [i915#7978]: https://gitlab.freedesktop.org/drm/intel/issues/7978
  [i915#7982]: https://gitlab.freedesktop.org/drm/intel/issues/7982
  [i915#8011]: https://gitlab.freedesktop.org/drm/intel/issues/8011
  [i915#8141]: https://gitlab.freedesktop.org/drm/intel/issues/8141
  [i915#8347]: https://gitlab.freedesktop.org/drm/intel/issues/8347
  [i915#8434]: https://gitlab.freedesktop.org/drm/intel/issues/8434
  [i915#8442]: https://gitlab.freedesktop.org/drm/intel/issues/8442
  [i915#8668]: https://gitlab.freedesktop.org/drm/intel/issues/8668
  [i915#8843]: https://gitlab.freedesktop.org/drm/intel/issues/8843


Build changes
-------------

  * Linux: CI_DRM_13432 -> Patchwork_121450v1

  CI-20190529: 20190529
  CI_DRM_13432: 069a79d6af09879060345da9f8b886a73b7810a8 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_7406: 1d6fd796607099d189e85d1fd305160363b961f2 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_121450v1: 069a79d6af09879060345da9f8b886a73b7810a8 @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

3118ce62a19d drm/i915: Refine the caching check in i915_gem_object_can_bypass_llc
d93637c182d5 drm/i915: Lift the user PAT restriction from use_cpu_reloc
b105cd97c3f4 drm/i915: Lift the user PAT restriction from gpu_write_needs_clflush
ce7fda866356 drm/i915: Improve the vm_fault_gtt user PAT index restriction
23b6c9039a5f drm/i915: Refactor PAT/object cache handling
dfc61cae30bd drm/i915: Cache PAT index used by the driver
48ebce044f0f drm/i915: Split PTE encode between Gen12 and Meteorlake
16d7c6aba89d drm/i915: Skip clflush after GPU writes on Meteorlake

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/index.html

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC 1/8] drm/i915: Skip clflush after GPU writes on Meteorlake
  2023-07-27 14:54   ` [Intel-gfx] " Tvrtko Ursulin
@ 2023-07-27 22:19     ` Matt Roper
  -1 siblings, 0 replies; 59+ messages in thread
From: Matt Roper @ 2023-07-27 22:19 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Thomas Hellström, Tvrtko Ursulin, Intel-gfx, dri-devel,
	Matthew Auld, Fei Yang

On Thu, Jul 27, 2023 at 03:54:57PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> On Meteorlake CPU cache will not contain stale data after GPU access since
> write-invalidate protocol is used, which means there is no need to flush
> before potentially transitioning the buffer to a non-coherent domain.
> 
> Use the opportunity to document the situation on discrete too.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>

> ---
>  drivers/gpu/drm/i915/gem/i915_gem_domain.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> index ffddec1d2a76..57db9c581bf6 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> @@ -24,9 +24,22 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>  {
>  	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>  
> +	/*
> +	 * Discrete GPUs never dirty the CPU cache.
> +	 */
>  	if (IS_DGFX(i915))
>  		return false;
>  
> +	/*
> +	 * Cache snooping on Meteorlake is using write-invalidate so GPU writes
> +	 * never end up in the CPU cache.
> +	 *
> + * QQQ: Do other snooping platforms behave identically and could we
> +	 *      therefore write this as "if !HAS_LLC(i915) && HAS_SNOOP(i915)"?
> +	 */
> +	if (IS_METEORLAKE(i915))
> +		return false;
> +
>  	/*
>  	 * For objects created by userspace through GEM_CREATE with pat_index
>  	 * set by set_pat extension, i915_gem_object_has_cache_level() will
> -- 
> 2.39.2
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Intel-gfx] [RFC 2/8] drm/i915: Split PTE encode between Gen12 and Meteorlake
  2023-07-27 14:54   ` [Intel-gfx] " Tvrtko Ursulin
  (?)
@ 2023-07-27 22:25   ` Matt Roper
  2023-07-28  8:18     ` Tvrtko Ursulin
  -1 siblings, 1 reply; 59+ messages in thread
From: Matt Roper @ 2023-07-27 22:25 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx, dri-devel

On Thu, Jul 27, 2023 at 03:54:58PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> No need to run extra instructions which will never trigger on platforms
> before Meteorlake.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index c8568e5d1147..862ac1d2de25 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -63,6 +63,30 @@ static u64 gen12_pte_encode(dma_addr_t addr,
>  {
>  	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>  
> +	if (unlikely(flags & PTE_READ_ONLY))
> +		pte &= ~GEN8_PAGE_RW;
> +
> +	if (flags & PTE_LM)
> +		pte |= GEN12_PPGTT_PTE_LM;
> +
> +	if (pat_index & BIT(0))
> +		pte |= GEN12_PPGTT_PTE_PAT0;
> +
> +	if (pat_index & BIT(1))
> +		pte |= GEN12_PPGTT_PTE_PAT1;
> +
> +	if (pat_index & BIT(2))
> +		pte |= GEN12_PPGTT_PTE_PAT2;
> +
> +	return pte;
> +}
> +
> +static u64 mtl_pte_encode(dma_addr_t addr,
> +			  unsigned int pat_index,
> +			  u32 flags)
> +{
> +	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
> +

Would it be more readable to start with

        gen8_pte_t pte = gen12_pte_encode(addr, pat_index, flags);

and then |-in only the MTL-specific bit(s) as appropriate?
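
Something like this minimal sketch perhaps (untested, and assuming the
only MTL-specific addition on top of the Gen12 encoding is the PAT[3]
bit, GEN12_PPGTT_PTE_PAT3):

	static u64 mtl_pte_encode(dma_addr_t addr,
				  unsigned int pat_index,
				  u32 flags)
	{
		/* Start from the common Gen12 encoding... */
		gen8_pte_t pte = gen12_pte_encode(addr, pat_index, flags);

		/* ...and OR in the extra PAT bit MTL understands. */
		if (pat_index & BIT(3))
			pte |= GEN12_PPGTT_PTE_PAT3;

		return pte;
	}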

>  	if (unlikely(flags & PTE_READ_ONLY))
>  		pte &= ~GEN8_PAGE_RW;
>  
> @@ -995,6 +1019,8 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
>  	 */
>  	ppgtt->vm.alloc_scratch_dma = alloc_pt_dma;
>  
> +	if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
> +		ppgtt->vm.pte_encode = mtl_pte_encode;
>  	if (GRAPHICS_VER(gt->i915) >= 12)
>  		ppgtt->vm.pte_encode = gen12_pte_encode;

I think you wanted 'else if' here.  Otherwise you clobber the MTL
function pointer.
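
I.e. something along these lines (sketch only, assuming the pre-Gen12
path keeps using gen8_pte_encode):

	if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
		ppgtt->vm.pte_encode = mtl_pte_encode;
	else if (GRAPHICS_VER(gt->i915) >= 12)
		ppgtt->vm.pte_encode = gen12_pte_encode;
	else
		ppgtt->vm.pte_encode = gen8_pte_encode;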


Matt

>  	else
> -- 
> 2.39.2
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC 3/8] drm/i915: Cache PAT index used by the driver
  2023-07-27 14:54   ` [Intel-gfx] " Tvrtko Ursulin
@ 2023-07-27 22:44     ` Matt Roper
  -1 siblings, 0 replies; 59+ messages in thread
From: Matt Roper @ 2023-07-27 22:44 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx, Fei Yang, dri-devel, Tvrtko Ursulin

On Thu, Jul 27, 2023 at 03:54:59PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Eliminate a bunch of runtime calls to i915_gem_get_pat_index() by caching
> the interesting PAT indices in struct drm_i915_private. They are static
> per platform so there is no need to consult a function every time.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> ---
>  drivers/gpu/drm/i915/Makefile                 |  1 +
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  3 +--
>  drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |  7 ++---
>  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  | 26 ++++++++++++-------
>  .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
>  drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |  4 +--
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  4 +--
>  drivers/gpu/drm/i915/gt/intel_ggtt.c          |  8 ++----
>  drivers/gpu/drm/i915/gt/intel_migrate.c       | 11 +++-----
>  drivers/gpu/drm/i915/gt/selftest_migrate.c    |  9 +++----
>  drivers/gpu/drm/i915/gt/selftest_reset.c      | 14 +++-------
>  drivers/gpu/drm/i915/gt/selftest_tlb.c        |  5 ++--
>  drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      |  8 ++----
>  drivers/gpu/drm/i915/i915_cache.c             | 18 +++++++++++++
>  drivers/gpu/drm/i915/i915_cache.h             | 13 ++++++++++
>  drivers/gpu/drm/i915/i915_driver.c            |  3 +++
>  drivers/gpu/drm/i915/i915_drv.h               |  2 ++
>  drivers/gpu/drm/i915/i915_gem.c               |  8 ++----
>  drivers/gpu/drm/i915/i915_gpu_error.c         |  8 ++----
>  drivers/gpu/drm/i915/selftests/i915_gem.c     |  5 +---
>  .../gpu/drm/i915/selftests/i915_gem_evict.c   |  4 +--
>  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 11 +++-----
>  .../drm/i915/selftests/intel_memory_region.c  |  4 +--
>  .../gpu/drm/i915/selftests/mock_gem_device.c  |  2 ++
>  24 files changed, 89 insertions(+), 91 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/i915_cache.c
>  create mode 100644 drivers/gpu/drm/i915/i915_cache.h
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index c5fc91cd58e7..905a51a16588 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -35,6 +35,7 @@ subdir-ccflags-y += -I$(srctree)/$(src)
>  # core driver code
>  i915-y += i915_driver.o \
>  	  i915_drm_client.o \
> +	  i915_cache.o \
>  	  i915_config.o \
>  	  i915_getparam.o \
>  	  i915_ioctl.o \
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 5a687a3686bd..0a1d40220020 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -1330,8 +1330,7 @@ static void *reloc_iomap(struct i915_vma *batch,
>  		ggtt->vm.insert_page(&ggtt->vm,
>  				     i915_gem_object_get_dma_address(obj, page),
>  				     offset,
> -				     i915_gem_get_pat_index(ggtt->vm.i915,
> -							    I915_CACHE_NONE),
> +				     eb->i915->pat_uc,
>  				     0);
>  	} else {
>  		offset += page << PAGE_SHIFT;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> index 5b0a5cf9a98a..1c8eb806b7d3 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> @@ -563,11 +563,8 @@ static void dbg_poison(struct i915_ggtt *ggtt,
>  	while (size) {
>  		void __iomem *s;
>  
> -		ggtt->vm.insert_page(&ggtt->vm, addr,
> -				     ggtt->error_capture.start,
> -				     i915_gem_get_pat_index(ggtt->vm.i915,
> -							    I915_CACHE_NONE),
> -				     0);
> +		ggtt->vm.insert_page(&ggtt->vm, addr, ggtt->error_capture.start,
> +				     ggtt->vm.i915->pat_uc, 0);
>  		mb();
>  
>  		s = io_mapping_map_wc(&ggtt->iomap,
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> index 7078af2f8f79..6bd6c239f4ac 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> @@ -58,6 +58,16 @@ i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
>  		I915_CACHE_NONE;
>  }
>  
> +static unsigned int
> +i915_ttm_cache_pat(struct drm_i915_private *i915, struct ttm_resource *res,
> +		   struct ttm_tt *ttm)
> +{
> +	return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
> +		!i915_ttm_gtt_binds_lmem(res) &&

This matches the existing logic of i915_ttm_cache_level(), but do you
know why LMEM buffers are always set to uncached?  I don't understand
that part.

> +		ttm->caching == ttm_cached) ? i915->pat_wb :
> +		i915->pat_uc;
> +}
> +
>  static struct intel_memory_region *
>  i915_ttm_region(struct ttm_device *bdev, int ttm_mem_type)
>  {
> @@ -196,7 +206,7 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>  	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
>  	struct i915_request *rq;
>  	struct ttm_tt *src_ttm = bo->ttm;
> -	enum i915_cache_level src_level, dst_level;
> +	unsigned int src_pat, dst_pat;
>  	int ret;
>  
>  	if (!to_gt(i915)->migrate.context || intel_gt_is_wedged(to_gt(i915)))
> @@ -206,16 +216,15 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>  	if (I915_SELFTEST_ONLY(fail_gpu_migration))
>  		clear = true;
>  
> -	dst_level = i915_ttm_cache_level(i915, dst_mem, dst_ttm);
> +	dst_pat = i915_ttm_cache_pat(i915, dst_mem, dst_ttm);
>  	if (clear) {
>  		if (bo->type == ttm_bo_type_kernel &&
>  		    !I915_SELFTEST_ONLY(fail_gpu_migration))
>  			return ERR_PTR(-EINVAL);
>  
>  		intel_engine_pm_get(to_gt(i915)->migrate.context->engine);
> -		ret = intel_context_migrate_clear(to_gt(i915)->migrate.context, deps,
> -						  dst_st->sgl,
> -						  i915_gem_get_pat_index(i915, dst_level),
> +		ret = intel_context_migrate_clear(to_gt(i915)->migrate.context,
> +						  deps, dst_st->sgl, dst_pat,
>  						  i915_ttm_gtt_binds_lmem(dst_mem),
>  						  0, &rq);
>  	} else {
> @@ -225,14 +234,13 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>  		if (IS_ERR(src_rsgt))
>  			return ERR_CAST(src_rsgt);
>  
> -		src_level = i915_ttm_cache_level(i915, bo->resource, src_ttm);
> +		src_pat = i915_ttm_cache_pat(i915, bo->resource, src_ttm);
>  		intel_engine_pm_get(to_gt(i915)->migrate.context->engine);
>  		ret = intel_context_migrate_copy(to_gt(i915)->migrate.context,
>  						 deps, src_rsgt->table.sgl,
> -						 i915_gem_get_pat_index(i915, src_level),
> +						 src_pat,
>  						 i915_ttm_gtt_binds_lmem(bo->resource),
> -						 dst_st->sgl,
> -						 i915_gem_get_pat_index(i915, dst_level),
> +						 dst_st->sgl, dst_pat,
>  						 i915_ttm_gtt_binds_lmem(dst_mem),
>  						 &rq);
>  
> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> index 6b9f6cf50bf6..6bddd733d796 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> @@ -354,7 +354,7 @@ fake_huge_pages_object(struct drm_i915_private *i915, u64 size, bool single)
>  
>  	obj->write_domain = I915_GEM_DOMAIN_CPU;
>  	obj->read_domains = I915_GEM_DOMAIN_CPU;
> -	obj->pat_index = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
> +	obj->pat_index = i915->pat_uc;
>  
>  	return obj;
>  }
> diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> index c2bdc133c89a..fb69f667652a 100644
> --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> @@ -226,9 +226,7 @@ static int gen6_ppgtt_init_scratch(struct gen6_ppgtt *ppgtt)
>  		return ret;
>  
>  	vm->scratch[0]->encode =
> -		vm->pte_encode(px_dma(vm->scratch[0]),
> -			       i915_gem_get_pat_index(vm->i915,
> -						      I915_CACHE_NONE),
> +		vm->pte_encode(px_dma(vm->scratch[0]), vm->i915->pat_uc,
>  			       PTE_READ_ONLY);
>  
>  	vm->scratch[1] = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index 862ac1d2de25..675f71f06e89 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -874,9 +874,7 @@ static int gen8_init_scratch(struct i915_address_space *vm)
>  		pte_flags |= PTE_LM;
>  
>  	vm->scratch[0]->encode =
> -		vm->pte_encode(px_dma(vm->scratch[0]),
> -			       i915_gem_get_pat_index(vm->i915,
> -						      I915_CACHE_NONE),
> +		vm->pte_encode(px_dma(vm->scratch[0]), vm->i915->pat_uc,
>  			       pte_flags);
>  
>  	for (i = 1; i <= vm->top; i++) {
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index dd0ed941441a..fca61ddca8ad 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -921,9 +921,7 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size)
>  		pte_flags |= PTE_LM;
>  
>  	ggtt->vm.scratch[0]->encode =
> -		ggtt->vm.pte_encode(px_dma(ggtt->vm.scratch[0]),
> -				    i915_gem_get_pat_index(i915,
> -							   I915_CACHE_NONE),
> +		ggtt->vm.pte_encode(px_dma(ggtt->vm.scratch[0]), i915->pat_uc,
>  				    pte_flags);
>  
>  	return 0;
> @@ -1298,9 +1296,7 @@ bool i915_ggtt_resume_vm(struct i915_address_space *vm)
>  		 */
>  		vma->resource->bound_flags = 0;
>  		vma->ops->bind_vma(vm, NULL, vma->resource,
> -				   obj ? obj->pat_index :
> -					 i915_gem_get_pat_index(vm->i915,
> -								I915_CACHE_NONE),
> +				   obj ? obj->pat_index : vm->i915->pat_uc,
>  				   was_bound);
>  
>  		if (obj) { /* only used during resume => exclusive access */
> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
> index 576e5ef0289b..b7a61b02f64c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> @@ -45,9 +45,7 @@ static void xehpsdv_toggle_pdes(struct i915_address_space *vm,
>  	 * Insert a dummy PTE into every PT that will map to LMEM to ensure
>  	 * we have a correctly setup PDE structure for later use.
>  	 */
> -	vm->insert_page(vm, 0, d->offset,
> -			i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
> -			PTE_LM);
> +	vm->insert_page(vm, 0, d->offset, vm->i915->pat_uc, PTE_LM);
>  	GEM_BUG_ON(!pt->is_compact);
>  	d->offset += SZ_2M;
>  }
> @@ -65,9 +63,7 @@ static void xehpsdv_insert_pte(struct i915_address_space *vm,
>  	 * alignment is 64K underneath for the pt, and we are careful
>  	 * not to access the space in the void.
>  	 */
> -	vm->insert_page(vm, px_dma(pt), d->offset,
> -			i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
> -			PTE_LM);
> +	vm->insert_page(vm, px_dma(pt), d->offset, vm->i915->pat_uc, PTE_LM);
>  	d->offset += SZ_64K;
>  }
>  
> @@ -77,8 +73,7 @@ static void insert_pte(struct i915_address_space *vm,
>  {
>  	struct insert_pte_data *d = data;
>  
> -	vm->insert_page(vm, px_dma(pt), d->offset,
> -			i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
> +	vm->insert_page(vm, px_dma(pt), d->offset, vm->i915->pat_uc,
>  			i915_gem_object_is_lmem(pt->base) ? PTE_LM : 0);
>  	d->offset += PAGE_SIZE;
>  }
> diff --git a/drivers/gpu/drm/i915/gt/selftest_migrate.c b/drivers/gpu/drm/i915/gt/selftest_migrate.c
> index 3def5ca72dec..a67ede65d816 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_migrate.c
> @@ -904,8 +904,7 @@ static int perf_clear_blt(void *arg)
>  
>  		err = __perf_clear_blt(gt->migrate.context,
>  				       dst->mm.pages->sgl,
> -				       i915_gem_get_pat_index(gt->i915,
> -							      I915_CACHE_NONE),
> +				       gt->i915->pat_uc,
>  				       i915_gem_object_is_lmem(dst),
>  				       sizes[i]);
>  
> @@ -995,12 +994,10 @@ static int perf_copy_blt(void *arg)
>  
>  		err = __perf_copy_blt(gt->migrate.context,
>  				      src->mm.pages->sgl,
> -				      i915_gem_get_pat_index(gt->i915,
> -							     I915_CACHE_NONE),
> +				      gt->i915->pat_uc,
>  				      i915_gem_object_is_lmem(src),
>  				      dst->mm.pages->sgl,
> -				      i915_gem_get_pat_index(gt->i915,
> -							     I915_CACHE_NONE),
> +				      gt->i915->pat_uc,
>  				      i915_gem_object_is_lmem(dst),
>  				      sz);
>  
> diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c
> index 79aa6ac66ad2..327dc9294e0f 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_reset.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
> @@ -84,11 +84,8 @@ __igt_reset_stolen(struct intel_gt *gt,
>  		void __iomem *s;
>  		void *in;
>  
> -		ggtt->vm.insert_page(&ggtt->vm, dma,
> -				     ggtt->error_capture.start,
> -				     i915_gem_get_pat_index(gt->i915,
> -							    I915_CACHE_NONE),
> -				     0);
> +		ggtt->vm.insert_page(&ggtt->vm, dma, ggtt->error_capture.start,
> +				     gt->i915->pat_uc, 0);
>  		mb();
>  
>  		s = io_mapping_map_wc(&ggtt->iomap,
> @@ -127,11 +124,8 @@ __igt_reset_stolen(struct intel_gt *gt,
>  		void *in;
>  		u32 x;
>  
> -		ggtt->vm.insert_page(&ggtt->vm, dma,
> -				     ggtt->error_capture.start,
> -				     i915_gem_get_pat_index(gt->i915,
> -							    I915_CACHE_NONE),
> -				     0);
> +		ggtt->vm.insert_page(&ggtt->vm, dma, ggtt->error_capture.start,
> +				     gt->i915->pat_uc, 0);
>  		mb();
>  
>  		s = io_mapping_map_wc(&ggtt->iomap,
> diff --git a/drivers/gpu/drm/i915/gt/selftest_tlb.c b/drivers/gpu/drm/i915/gt/selftest_tlb.c
> index 3bd6b540257b..6049f01be219 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_tlb.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_tlb.c
> @@ -36,8 +36,6 @@ pte_tlbinv(struct intel_context *ce,
>  	   u64 length,
>  	   struct rnd_state *prng)
>  {
> -	const unsigned int pat_index =
> -		i915_gem_get_pat_index(ce->vm->i915, I915_CACHE_NONE);
>  	struct drm_i915_gem_object *batch;
>  	struct drm_mm_node vb_node;
>  	struct i915_request *rq;
> @@ -157,7 +155,8 @@ pte_tlbinv(struct intel_context *ce,
>  		/* Flip the PTE between A and B */
>  		if (i915_gem_object_is_lmem(vb->obj))
>  			pte_flags |= PTE_LM;
> -		ce->vm->insert_entries(ce->vm, &vb_res, pat_index, pte_flags);
> +		ce->vm->insert_entries(ce->vm, &vb_res, ce->vm->i915->pat_uc,
> +				       pte_flags);
>  
>  		/* Flush the PTE update to concurrent HW */
>  		tlbinv(ce->vm, addr & -length, length);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
> index 7aadad5639c3..8b7aa8c5a99d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
> @@ -1053,14 +1053,10 @@ static void uc_fw_bind_ggtt(struct intel_uc_fw *uc_fw)
>  
>  	if (ggtt->vm.raw_insert_entries)
>  		ggtt->vm.raw_insert_entries(&ggtt->vm, vma_res,
> -					    i915_gem_get_pat_index(ggtt->vm.i915,
> -								   I915_CACHE_NONE),
> -					    pte_flags);
> +					    ggtt->vm.i915->pat_uc, pte_flags);
>  	else
>  		ggtt->vm.insert_entries(&ggtt->vm, vma_res,
> -					i915_gem_get_pat_index(ggtt->vm.i915,
> -							       I915_CACHE_NONE),
> -					pte_flags);
> +					ggtt->vm.i915->pat_uc, pte_flags);
>  }
>  
>  static void uc_fw_unbind_ggtt(struct intel_uc_fw *uc_fw)
> diff --git a/drivers/gpu/drm/i915/i915_cache.c b/drivers/gpu/drm/i915/i915_cache.c
> new file mode 100644
> index 000000000000..06eb5933c719
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_cache.c
> @@ -0,0 +1,18 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2023 Intel Corporation
> + */
> +
> +#include "i915_cache.h"
> +#include "i915_drv.h"
> +
> +void i915_cache_init(struct drm_i915_private *i915)
> +{
> +	i915->pat_uc = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
> +	drm_info(&i915->drm, "Using PAT index %u for uncached access\n",
> +		 i915->pat_uc);
> +
> +	i915->pat_wb = i915_gem_get_pat_index(i915, I915_CACHE_LLC);
> +	drm_info(&i915->drm, "Using PAT index %u for write-back access\n",
> +		 i915->pat_wb);
> +}
> diff --git a/drivers/gpu/drm/i915/i915_cache.h b/drivers/gpu/drm/i915/i915_cache.h
> new file mode 100644
> index 000000000000..cb68936fb8a2
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_cache.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2023 Intel Corporation
> + */
> +
> +#ifndef __I915_CACHE_H__
> +#define __I915_CACHE_H__
> +
> +struct drm_i915_private;
> +
> +void i915_cache_init(struct drm_i915_private *i915);
> +
> +#endif /* __I915_CACHE_H__ */
> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> index 294b022de22b..bb2223cc3470 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -80,6 +80,7 @@
>  #include "soc/intel_dram.h"
>  #include "soc/intel_gmch.h"
>  
> +#include "i915_cache.h"
>  #include "i915_debugfs.h"
>  #include "i915_driver.h"
>  #include "i915_drm_client.h"
> @@ -240,6 +241,8 @@ static int i915_driver_early_probe(struct drm_i915_private *dev_priv)
>  	i915_memcpy_init_early(dev_priv);
>  	intel_runtime_pm_init_early(&dev_priv->runtime_pm);
>  
> +	i915_cache_init(dev_priv);
> +
>  	ret = i915_workqueues_init(dev_priv);
>  	if (ret < 0)
>  		return ret;
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 682ef2b5c7d5..f5c591a762df 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -250,6 +250,8 @@ struct drm_i915_private {
>  	unsigned int hpll_freq;
>  	unsigned int czclk_freq;
>  
> +	unsigned int pat_uc, pat_wb;
> +
>  	/**
>  	 * wq - Driver workqueue for GEM.
>  	 *
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 1f65bb33dd21..896aa48ed089 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -422,9 +422,7 @@ i915_gem_gtt_pread(struct drm_i915_gem_object *obj,
>  			ggtt->vm.insert_page(&ggtt->vm,
>  					     i915_gem_object_get_dma_address(obj,
>  									     offset >> PAGE_SHIFT),
> -					     node.start,
> -					     i915_gem_get_pat_index(i915,
> -								    I915_CACHE_NONE), 0);
> +					     node.start, i915->pat_uc, 0);
>  		} else {
>  			page_base += offset & PAGE_MASK;
>  		}
> @@ -603,9 +601,7 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
>  			ggtt->vm.insert_page(&ggtt->vm,
>  					     i915_gem_object_get_dma_address(obj,
>  									     offset >> PAGE_SHIFT),
> -					     node.start,
> -					     i915_gem_get_pat_index(i915,
> -								    I915_CACHE_NONE), 0);
> +					     node.start, i915->pat_uc, 0);
>  			wmb(); /* flush modifications to the GGTT (insert_page) */
>  		} else {
>  			page_base += offset & PAGE_MASK;
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 4008bb09fdb5..31975a79730c 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1124,14 +1124,10 @@ i915_vma_coredump_create(const struct intel_gt *gt,
>  			mutex_lock(&ggtt->error_mutex);
>  			if (ggtt->vm.raw_insert_page)
>  				ggtt->vm.raw_insert_page(&ggtt->vm, dma, slot,
> -							 i915_gem_get_pat_index(gt->i915,
> -										I915_CACHE_NONE),
> -							 0);
> +							 gt->i915->pat_uc, 0);
>  			else
>  				ggtt->vm.insert_page(&ggtt->vm, dma, slot,
> -						     i915_gem_get_pat_index(gt->i915,
> -									    I915_CACHE_NONE),
> -						     0);
> +						     gt->i915->pat_uc, 0);
>  			mb();
>  
>  			s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem.c b/drivers/gpu/drm/i915/selftests/i915_gem.c
> index 61da4ed9d521..e620f73793a5 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem.c
> @@ -57,10 +57,7 @@ static void trash_stolen(struct drm_i915_private *i915)
>  		u32 __iomem *s;
>  		int x;
>  
> -		ggtt->vm.insert_page(&ggtt->vm, dma, slot,
> -				     i915_gem_get_pat_index(i915,
> -							    I915_CACHE_NONE),
> -				     0);
> +		ggtt->vm.insert_page(&ggtt->vm, dma, slot, i915->pat_uc, 0);
>  
>  		s = io_mapping_map_atomic_wc(&ggtt->iomap, slot);
>  		for (x = 0; x < PAGE_SIZE / sizeof(u32); x++) {
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> index f8fe3681c3dc..f910ec9b6d2b 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> @@ -246,7 +246,7 @@ static int igt_evict_for_cache_color(void *arg)
>  	struct drm_mm_node target = {
>  		.start = I915_GTT_PAGE_SIZE * 2,
>  		.size = I915_GTT_PAGE_SIZE,
> -		.color = i915_gem_get_pat_index(gt->i915, I915_CACHE_LLC),
> +		.color = gt->i915->pat_wb,
>  	};
>  	struct drm_i915_gem_object *obj;
>  	struct i915_vma *vma;
> @@ -309,7 +309,7 @@ static int igt_evict_for_cache_color(void *arg)
>  	/* Attempt to remove the first *pinned* vma, by removing the (empty)
>  	 * neighbour -- this should fail.
>  	 */
> -	target.color = i915_gem_get_pat_index(gt->i915, I915_CACHE_L3_LLC);
> +	target.color = gt->i915->pat_uc;

This one doesn't look correct.  On most platforms I915_CACHE_L3_LLC maps
to the same wb PAT as I915_CACHE_LLC.  Only on legacy platforms does it
differ, and it maps to something different than either pat_uc or pat_wb
there.
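
For reference, my reading of the cachelevel_to_pat tables in
i915_pci.c (condensed; worth double-checking against the actual
tables):

        /* Legacy platforms: L3_LLC has its own PAT entry. */
        [I915_CACHE_NONE]   = 0,  /* pat_uc */
        [I915_CACHE_LLC]    = 1,  /* pat_wb */
        [I915_CACHE_L3_LLC] = 2,  /* neither pat_uc nor pat_wb */
        [I915_CACHE_WT]     = 3,

        /* TGL and newer: L3_LLC aliases the same wb entry as LLC. */
        [I915_CACHE_NONE]   = 3,
        [I915_CACHE_LLC]    = 0,
        [I915_CACHE_L3_LLC] = 0,
        [I915_CACHE_WT]     = 2,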


Matt

>  
>  	mutex_lock(&ggtt->vm.mutex);
>  	err = i915_gem_evict_for_node(&ggtt->vm, NULL, &target, 0);
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> index 5c397a2df70e..c96b7f7d7853 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> @@ -135,7 +135,7 @@ fake_dma_object(struct drm_i915_private *i915, u64 size)
>  
>  	obj->write_domain = I915_GEM_DOMAIN_CPU;
>  	obj->read_domains = I915_GEM_DOMAIN_CPU;
> -	obj->pat_index = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
> +	obj->pat_index = i915->pat_uc;
>  
>  	/* Preallocate the "backing storage" */
>  	if (i915_gem_object_pin_pages_unlocked(obj))
> @@ -358,9 +358,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
>  			mock_vma_res->start = addr;
>  
>  			with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
> -			  vm->insert_entries(vm, mock_vma_res,
> -					     i915_gem_get_pat_index(vm->i915,
> -								    I915_CACHE_NONE),
> +			  vm->insert_entries(vm, mock_vma_res, vm->i915->pat_uc,
>  					     0);
>  		}
>  		count = n;
> @@ -1379,10 +1377,7 @@ static int igt_ggtt_page(void *arg)
>  
>  		ggtt->vm.insert_page(&ggtt->vm,
>  				     i915_gem_object_get_dma_address(obj, 0),
> -				     offset,
> -				     i915_gem_get_pat_index(i915,
> -							    I915_CACHE_NONE),
> -				     0);
> +				     offset, i915->pat_uc, 0);
>  	}
>  
>  	order = i915_random_order(count, &prng);
> diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
> index d985d9bae2e8..b82fe0ef8cd7 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
> @@ -1070,9 +1070,7 @@ static int igt_lmem_write_cpu(void *arg)
>  	/* Put the pages into a known state -- from the gpu for added fun */
>  	intel_engine_pm_get(engine);
>  	err = intel_context_migrate_clear(engine->gt->migrate.context, NULL,
> -					  obj->mm.pages->sgl,
> -					  i915_gem_get_pat_index(i915,
> -								 I915_CACHE_NONE),
> +					  obj->mm.pages->sgl, i915->pat_uc,
>  					  true, 0xdeadbeaf, &rq);
>  	if (rq) {
>  		dma_resv_add_fence(obj->base.resv, &rq->fence,
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> index da0b269606c5..1d1a457e2aee 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> @@ -181,6 +181,8 @@ struct drm_i915_private *mock_gem_device(void)
>  	/* Set up device info and initial runtime info. */
>  	intel_device_info_driver_create(i915, pdev->device, &mock_info);
>  
> +	i915_cache_init(i915);
> +
>  	dev_pm_domain_set(&pdev->dev, &pm_domain);
>  	pm_runtime_enable(&pdev->dev);
>  	pm_runtime_dont_use_autosuspend(&pdev->dev);
> -- 
> 2.39.2
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC 4/8] drm/i915: Refactor PAT/object cache handling
  2023-07-27 14:55   ` [Intel-gfx] " Tvrtko Ursulin
@ 2023-07-27 23:57     ` Matt Roper
  -1 siblings, 0 replies; 59+ messages in thread
From: Matt Roper @ 2023-07-27 23:57 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Fei Yang, Tvrtko Ursulin, Intel-gfx, dri-devel, Andi Shyti, Chris Wilson

On Thu, Jul 27, 2023 at 03:55:00PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Commit 9275277d5324 ("drm/i915: use pat_index instead of cache_level") has
> introduced PAT indices to i915 internal APIs, partially replacing the
> usage of driver internal cache_level, but has also added a few sub-
> optimal design decisions which this patch tries to improve upon.
> 
> Principal change here is to invert the per platform cache level to PAT
> index table which was added by the referenced commit, and by doing so
> enable i915 to understand the cache mode between PAT indices, changing
> them from opaque to transparent.
> 
> Once we have the inverted table we are able to remove the hidden false
> "return true" from i915_gem_object_has_cache_level and make the involved
> code path clearer.
> 
> To achieve this we replace the enum i915_cache_level with i915_cache_t,
> composed of a more detailed representation of each cache mode (base mode
> plus flags).
> 
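A sketch of the shape this takes, for readers skimming (not the exact
definitions from the patch; those are in the i915_cache.h hunk below):

	typedef u16 i915_cache_t;

	/* Low byte: base cache mode; high byte: coherency flags. */
	#define I915_CACHE(mode) \
		((i915_cache_t)(I915_CACHE_MODE_##mode))
	#define __I915_CACHE(mode, flags) \
		((i915_cache_t)(I915_CACHE_MODE_##mode | (flags) << 8))

so that e.g. I915_CACHE_NONE becomes plain UC while I915_CACHE_CACHED
is WB plus coherency flags such as I915_CACHE_FLAG_COH2W.
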
> In this way we are able to express the differences between different
> write-back mode coherency settings on Meteorlake, which in turn enables us
> to map the i915 "cached" mode to the correct Meteorlake PAT index.
> 
> We can also replace the platform-dependent cache-mode-to-string code in
> debugfs and elsewhere with a single implementation based on i915_cache_t.
> 
> v2:
>  * Fix PAT-to-cache-mode table for PVC. (Fei)
>  * Cache display caching mode too. (Fei)
>  * Improve and document criteria in i915_gem_object_can_bypass_llc() (Matt)
> 
> v3:
>  * Checkpatch issues.
>  * Cache mode flags check fixed.
> 
> v4:
>  * Fix intel_device_info->cache_modes array size. (Matt)
>  * Boolean cache mode and flags query. (Matt)
>  * Reduce number of cache macros with some macro magic.
>  * One more checkpatch fix.
>  * Tweak tables to show legacy and Gen12 WB is fully coherent.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> References: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_domain.c    |  60 +++++----
>  drivers/gpu/drm/i915/gem/i915_gem_domain.h    |   5 +-
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   3 +-
>  drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_mman.c      |   4 +-
>  drivers/gpu/drm/i915/gem/i915_gem_object.c    | 117 ++++++++++--------
>  drivers/gpu/drm/i915/gem/i915_gem_object.h    |  11 +-
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  | 116 +----------------
>  drivers/gpu/drm/i915/gem/i915_gem_shmem.c     |   8 +-
>  drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |   2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  20 +--
>  drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   2 +-
>  .../drm/i915/gem/selftests/huge_gem_object.c  |   2 +-
>  .../gpu/drm/i915/gem/selftests/huge_pages.c   |   3 +-
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  10 +-
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c     |   2 +-
>  drivers/gpu/drm/i915/gt/intel_ggtt.c          |  25 ++--
>  drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c     |   4 +-
>  drivers/gpu/drm/i915/gt/intel_gtt.c           |   2 +-
>  drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +-
>  drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   6 +-
>  .../gpu/drm/i915/gt/intel_ring_submission.c   |   4 +-
>  drivers/gpu/drm/i915/gt/intel_timeline.c      |   2 +-
>  drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   2 +-
>  .../gpu/drm/i915/gt/selftest_workarounds.c    |   2 +-
>  drivers/gpu/drm/i915/i915_cache.c             |  89 +++++++++++--
>  drivers/gpu/drm/i915/i915_cache.h             |  70 ++++++++++-
>  drivers/gpu/drm/i915/i915_debugfs.c           |  53 ++------
>  drivers/gpu/drm/i915/i915_driver.c            |   4 +-
>  drivers/gpu/drm/i915/i915_gem.c               |  13 --
>  drivers/gpu/drm/i915/i915_pci.c               |  84 +++++++------
>  drivers/gpu/drm/i915/i915_perf.c              |   2 +-
>  drivers/gpu/drm/i915/intel_device_info.h      |   6 +-
>  .../gpu/drm/i915/selftests/i915_gem_evict.c   |   4 +-
>  drivers/gpu/drm/i915/selftests/igt_spinner.c  |   2 +-
>  .../gpu/drm/i915/selftests/mock_gem_device.c  |  14 +--
>  36 files changed, 391 insertions(+), 367 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> index 57db9c581bf6..c15f83de33af 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> @@ -8,6 +8,7 @@
>  #include "display/intel_frontbuffer.h"
>  #include "gt/intel_gt.h"
>  
> +#include "i915_cache.h"
>  #include "i915_drv.h"
>  #include "i915_gem_clflush.h"
>  #include "i915_gem_domain.h"
> @@ -41,14 +42,17 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>  		return false;
>  
>  	/*
> -	 * For objects created by userspace through GEM_CREATE with pat_index
> -	 * set by set_pat extension, i915_gem_object_has_cache_level() will
> -	 * always return true, because the coherency of such object is managed
> -	 * by userspace. Othereise the call here would fall back to checking
> -	 * whether the object is un-cached or write-through.
> +	 * Always flush cache for UMD objects with PAT index set.
>  	 */
> -	return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
> -		 i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
> +	if (obj->pat_set_by_user)
> +		return true;
> +
> +	/*
> +	 * Fully coherent cached access may end up with data in the CPU cache
> +	 * which hasn't hit memory yet.
> +	 */
> +	return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
> +	       i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W);
>  }
>  
>  bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
> @@ -268,7 +272,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>  /**
>   * i915_gem_object_set_cache_level - Changes the cache-level of an object across all VMA.
>   * @obj: object to act on
> - * @cache_level: new cache level to set for the object
> + * @cache: new caching mode to set for the object
>   *
>   * After this function returns, the object will be in the new cache-level
>   * across all GTT and the contents of the backing storage will be coherent,
> @@ -281,18 +285,28 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>   * that all direct access to the scanout remains coherent.
>   */
>  int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> -				    enum i915_cache_level cache_level)
> +				    i915_cache_t cache)
>  {
> -	int ret;
> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +	int pat, ret;
>  
> -	/*
> -	 * For objects created by userspace through GEM_CREATE with pat_index
> -	 * set by set_pat extension, simply return 0 here without touching
> -	 * the cache setting, because such objects should have an immutable
> -	 * cache setting by desgin and always managed by userspace.
> -	 */
> -	if (i915_gem_object_has_cache_level(obj, cache_level))
> +	pat = i915_cache_find_pat(i915, cache);
> +	if (pat < 0) {
> +		char buf[I915_CACHE_NAME_LEN];
> +
> +		i915_cache_print(buf, sizeof(buf), NULL, cache);
> +		drm_err_ratelimited(&i915->drm,
> +				    "Attempting to use unknown caching mode %s!\n",
> +				    buf);
> +
> +		return -EINVAL;
> +	} else if (pat == obj->pat_index) {
>  		return 0;
> +	} else if (obj->pat_set_by_user) {
> +		drm_notice_once(&i915->drm,
> +				"Attempting to change caching mode on an object with fixed PAT!\n");
> +		return -EINVAL;
> +	}
>  
>  	ret = i915_gem_object_wait(obj,
>  				   I915_WAIT_INTERRUPTIBLE |
> @@ -302,7 +316,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>  		return ret;
>  
>  	/* Always invalidate stale cachelines */
> -	i915_gem_object_set_cache_coherency(obj, cache_level);
> +	i915_gem_object_set_pat_index(obj, pat);
>  	obj->cache_dirty = true;
>  
>  	/* The cache-level will be applied when each vma is rebound. */
> @@ -337,10 +351,10 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
>  		goto out;
>  	}
>  
> -	if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||
> -	    i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
> +	if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
> +	    i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W))
>  		args->caching = I915_CACHING_CACHED;
> -	else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
> +	else if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT))
>  		args->caching = I915_CACHING_DISPLAY;
>  	else
>  		args->caching = I915_CACHING_NONE;
> @@ -355,7 +369,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
>  	struct drm_i915_private *i915 = to_i915(dev);
>  	struct drm_i915_gem_caching *args = data;
>  	struct drm_i915_gem_object *obj;
> -	enum i915_cache_level level;
> +	i915_cache_t level;
>  	int ret = 0;
>  
>  	if (IS_DGFX(i915))
> @@ -378,7 +392,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
>  		if (!HAS_LLC(i915) && !HAS_SNOOP(i915))
>  			return -ENODEV;
>  
> -		level = I915_CACHE_LLC;
> +		level = I915_CACHE_CACHED;
>  		break;
>  	case I915_CACHING_DISPLAY:
>  		level = HAS_WT(i915) ? I915_CACHE_WT : I915_CACHE_NONE;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.h b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
> index 9622df962bfc..6da5c351f6fd 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
> @@ -6,10 +6,11 @@
>  #ifndef __I915_GEM_DOMAIN_H__
>  #define __I915_GEM_DOMAIN_H__
>  
> +#include "i915_cache.h"
> +
>  struct drm_i915_gem_object;
> -enum i915_cache_level;
>  
>  int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> -				    enum i915_cache_level cache_level);
> +				    i915_cache_t cache);
>  
>  #endif /* __I915_GEM_DOMAIN_H__ */
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 0a1d40220020..9d6e49c8a4c6 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -648,7 +648,8 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
>  	 */
>  	return (cache->has_llc ||
>  		obj->cache_dirty ||
> -		!i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
> +		!(obj->pat_set_by_user ||
> +		  i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)));
>  }
>  
>  static int eb_reserve_vma(struct i915_execbuffer *eb,
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> index 6bc26b4b06b8..88c360c3d6a3 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> @@ -170,7 +170,7 @@ __i915_gem_object_create_internal(struct drm_i915_private *i915,
>  	obj->read_domains = I915_GEM_DOMAIN_CPU;
>  	obj->write_domain = I915_GEM_DOMAIN_CPU;
>  
> -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
> +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>  	i915_gem_object_set_cache_coherency(obj, cache_level);
>  
>  	return obj;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> index aa4d842d4c5a..cd7f8ded0d6f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> @@ -382,7 +382,6 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>  		goto err_reset;
>  	}
>  
> -	/* Access to snoopable pages through the GTT is incoherent. */
>  	/*
>  	 * For objects created by userspace through GEM_CREATE with pat_index
>  	 * set by set_pat extension, coherency is managed by userspace, make
> @@ -391,7 +390,8 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>  	 * objects. Otherwise this helper function would fall back to checking
>  	 * whether the object is un-cached.
>  	 */
> -	if (!(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
> +	if (!((obj->pat_set_by_user ||
> +	       i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)) ||
>  	      HAS_LLC(i915))) {
>  		ret = -EFAULT;
>  		goto err_unpin;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 3dc4fbb67d2b..ec1f0be43d0d 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -45,33 +45,6 @@ static struct kmem_cache *slab_objects;
>  
>  static const struct drm_gem_object_funcs i915_gem_object_funcs;
>  
> -unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
> -				    enum i915_cache_level level)
> -{
> -	if (drm_WARN_ON(&i915->drm, level >= I915_MAX_CACHE_LEVEL))
> -		return 0;
> -
> -	return INTEL_INFO(i915)->cachelevel_to_pat[level];
> -}
> -
> -bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
> -				     enum i915_cache_level lvl)
> -{
> -	/*
> -	 * In case the pat_index is set by user space, this kernel mode
> -	 * driver should leave the coherency to be managed by user space,
> -	 * simply return true here.
> -	 */
> -	if (obj->pat_set_by_user)
> -		return true;
> -
> -	/*
> -	 * Otherwise the pat_index should have been converted from cache_level
> -	 * so that the following comparison is valid.
> -	 */
> -	return obj->pat_index == i915_gem_get_pat_index(obj_to_i915(obj), lvl);
> -}
> -
>  struct drm_i915_gem_object *i915_gem_object_alloc(void)
>  {
>  	struct drm_i915_gem_object *obj;
> @@ -144,30 +117,72 @@ void __i915_gem_object_fini(struct drm_i915_gem_object *obj)
>  	dma_resv_fini(&obj->base._resv);
>  }
>  
> +bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
> +				    enum i915_cache_mode mode)
> +{
> +	struct drm_i915_private *i915 = obj_to_i915(obj);
> +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
> +
> +	return I915_CACHE_MODE(cache) == mode;
> +}
> +
> +bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
> +				    unsigned int flag)
> +{
> +	struct drm_i915_private *i915 = obj_to_i915(obj);
> +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
> +
> +	return I915_CACHE_FLAGS(cache) & flag;
> +}
> +
> +static void __i915_gem_object_update_coherency(struct drm_i915_gem_object *obj)
> +{
> +	struct drm_i915_private *i915 = obj_to_i915(obj);
> +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
> +	const unsigned int flags = I915_CACHE_FLAGS(cache);
> +	const unsigned int mode = I915_CACHE_MODE(cache);
> +
> +	if (mode == I915_CACHE_MODE_WC ||
> +	    mode == I915_CACHE_MODE_WT ||
> +	    (mode == I915_CACHE_MODE_WB && (flags & I915_CACHE_FLAG_COH2W)))
> +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ |
> +				      I915_BO_CACHE_COHERENT_FOR_WRITE;
> +	else if (HAS_LLC(i915))
> +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
> +	else
> +		obj->cache_coherent = 0;
> +
> +	obj->cache_dirty =
> +		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
> +		!IS_DGFX(i915);
> +}
> +
>  /**
>   * i915_gem_object_set_cache_coherency - Mark up the object's coherency levels
> - * for a given cache_level
> + * for a given caching mode
>   * @obj: #drm_i915_gem_object
> - * @cache_level: cache level
> + * @cache: cache mode
>   */
>  void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
> -					 unsigned int cache_level)
> +					 i915_cache_t cache)
>  {
> -	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +	struct drm_i915_private *i915 = obj_to_i915(obj);
> +	int found;
>  
> -	obj->pat_index = i915_gem_get_pat_index(i915, cache_level);
> +	found = i915_cache_find_pat(i915, cache);
> +	if (found < 0) {
> +		char buf[I915_CACHE_NAME_LEN];
>  
> -	if (cache_level != I915_CACHE_NONE)
> -		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
> -				       I915_BO_CACHE_COHERENT_FOR_WRITE);
> -	else if (HAS_LLC(i915))
> -		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
> -	else
> -		obj->cache_coherent = 0;
> +		i915_cache_print(buf, sizeof(buf), NULL, cache);
> +		drm_err_ratelimited(&i915->drm, "Unknown cache mode %s!\n",
> +				    buf);
>  
> -	obj->cache_dirty =
> -		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
> -		!IS_DGFX(i915);
> +		found = i915->pat_uc;
> +	}
> +
> +	obj->pat_index = found;
> +
> +	__i915_gem_object_update_coherency(obj);
>  }
>  
>  /**
> @@ -181,24 +196,18 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>  void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
>  				   unsigned int pat_index)
>  {
> -	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>  
>  	if (obj->pat_index == pat_index)
>  		return;
>  
> +	if (drm_WARN_ON_ONCE(&i915->drm,
> +			     pat_index > INTEL_INFO(i915)->max_pat_index))
> +		return;
> +
>  	obj->pat_index = pat_index;
>  
> -	if (pat_index != i915_gem_get_pat_index(i915, I915_CACHE_NONE))
> -		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
> -				       I915_BO_CACHE_COHERENT_FOR_WRITE);
> -	else if (HAS_LLC(i915))
> -		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
> -	else
> -		obj->cache_coherent = 0;
> -
> -	obj->cache_dirty =
> -		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
> -		!IS_DGFX(i915);
> +	__i915_gem_object_update_coherency(obj);
>  }
>  
>  bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> index 884a17275b3a..a5d4ee19d9be 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> @@ -13,6 +13,7 @@
>  
>  #include "display/intel_frontbuffer.h"
>  #include "intel_memory_region.h"
> +#include "i915_cache.h"
>  #include "i915_gem_object_types.h"
>  #include "i915_gem_gtt.h"
>  #include "i915_gem_ww.h"
> @@ -32,10 +33,6 @@ static inline bool i915_gem_object_size_2big(u64 size)
>  	return false;
>  }
>  
> -unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
> -				    enum i915_cache_level level);
> -bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
> -				     enum i915_cache_level lvl);
>  void i915_gem_init__objects(struct drm_i915_private *i915);
>  
>  void i915_objects_module_exit(void);
> @@ -764,8 +761,12 @@ int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
>  				      bool intr);
>  bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj);
>  
> +bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
> +				    enum i915_cache_mode mode);
> +bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
> +				    unsigned int flag);
>  void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
> -					 unsigned int cache_level);
> +					 i915_cache_t cache);
>  void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
>  				   unsigned int pat_index);
>  bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index 8de2b91b3edf..6790e13ad262 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -14,6 +14,7 @@
>  #include <uapi/drm/i915_drm.h>
>  
>  #include "i915_active.h"
> +#include "i915_cache.h"
>  #include "i915_selftest.h"
>  #include "i915_vma_resource.h"
>  
> @@ -116,93 +117,6 @@ struct drm_i915_gem_object_ops {
>  	const char *name; /* friendly name for debug, e.g. lockdep classes */
>  };
>  
> -/**
> - * enum i915_cache_level - The supported GTT caching values for system memory
> - * pages.
> - *
> - * These translate to some special GTT PTE bits when binding pages into some
> - * address space. It also determines whether an object, or rather its pages are
> - * coherent with the GPU, when also reading or writing through the CPU cache
> - * with those pages.
> - *
> - * Userspace can also control this through struct drm_i915_gem_caching.
> - */
> -enum i915_cache_level {
> -	/**
> -	 * @I915_CACHE_NONE:
> -	 *
> -	 * GPU access is not coherent with the CPU cache. If the cache is dirty
> -	 * and we need the underlying pages to be coherent with some later GPU
> -	 * access then we need to manually flush the pages.
> -	 *
> -	 * On shared LLC platforms reads and writes through the CPU cache are
> -	 * still coherent even with this setting. See also
> -	 * &drm_i915_gem_object.cache_coherent for more details. Due to this we
> -	 * should only ever use uncached for scanout surfaces, otherwise we end
> -	 * up over-flushing in some places.
> -	 *
> -	 * This is the default on non-LLC platforms.
> -	 */
> -	I915_CACHE_NONE = 0,
> -	/**
> -	 * @I915_CACHE_LLC:
> -	 *
> -	 * GPU access is coherent with the CPU cache. If the cache is dirty,
> -	 * then the GPU will ensure that access remains coherent, when both
> -	 * reading and writing through the CPU cache. GPU writes can dirty the
> -	 * CPU cache.
> -	 *
> -	 * Not used for scanout surfaces.
> -	 *
> -	 * Applies to both platforms with shared LLC(HAS_LLC), and snooping
> -	 * based platforms(HAS_SNOOP).
> -	 *
> -	 * This is the default on shared LLC platforms.  The only exception is
> -	 * scanout objects, where the display engine is not coherent with the
> -	 * CPU cache. For such objects I915_CACHE_NONE or I915_CACHE_WT is
> -	 * automatically applied by the kernel in pin_for_display, if userspace
> -	 * has not done so already.
> -	 */
> -	I915_CACHE_LLC,
> -	/**
> -	 * @I915_CACHE_L3_LLC:
> -	 *
> -	 * Explicitly enable the Gfx L3 cache, with coherent LLC.
> -	 *
> -	 * The Gfx L3 sits between the domain specific caches, e.g
> -	 * sampler/render caches, and the larger LLC. LLC is coherent with the
> -	 * GPU, but L3 is only visible to the GPU, so likely needs to be flushed
> -	 * when the workload completes.
> -	 *
> -	 * Not used for scanout surfaces.
> -	 *
> -	 * Only exposed on some gen7 + GGTT. More recent hardware has dropped
> -	 * this explicit setting, where it should now be enabled by default.
> -	 */
> -	I915_CACHE_L3_LLC,
> -	/**
> -	 * @I915_CACHE_WT:
> -	 *
> -	 * Write-through. Used for scanout surfaces.
> -	 *
> -	 * The GPU can utilise the caches, while still having the display engine
> -	 * be coherent with GPU writes, as a result we don't need to flush the
> -	 * CPU caches when moving out of the render domain. This is the default
> -	 * setting chosen by the kernel, if supported by the HW, otherwise we
> -	 * fallback to I915_CACHE_NONE. On the CPU side writes through the CPU
> -	 * cache still need to be flushed, to remain coherent with the display
> -	 * engine.
> -	 */
> -	I915_CACHE_WT,
> -	/**
> -	 * @I915_MAX_CACHE_LEVEL:
> -	 *
> -	 * Mark the last entry in the enum. Used for defining cachelevel_to_pat
> -	 * array for cache_level to pat translation table.
> -	 */
> -	I915_MAX_CACHE_LEVEL,
> -};
> -
>  enum i915_map_type {
>  	I915_MAP_WB = 0,
>  	I915_MAP_WC,
> @@ -403,16 +317,6 @@ struct drm_i915_gem_object {
>  	/**
>  	 * @cache_coherent:
>  	 *
> -	 * Note: with the change above which replaced @cache_level with pat_index,
> -	 * the use of @cache_coherent is limited to the objects created by kernel
> -	 * or by userspace without pat index specified.
> -	 * Check for @pat_set_by_user to find out if an object has pat index set
> -	 * by userspace. The ioctl's to change cache settings have also been
> -	 * disabled for the objects with pat index set by userspace. Please don't
> -	 * assume @cache_coherent having the flags set as describe here. A helper
> -	 * function i915_gem_object_has_cache_level() provides one way to bypass
> -	 * the use of this field.
> -	 *
>  	 * Track whether the pages are coherent with the GPU if reading or
>  	 * writing through the CPU caches. The largely depends on the
>  	 * @cache_level setting.
> @@ -447,7 +351,7 @@ struct drm_i915_gem_object {
>  	 * flushing the surface just before doing the scanout.  This does mean
>  	 * we might unnecessarily flush non-scanout objects in some places, but
>  	 * the default assumption is that all normal objects should be using
> -	 * I915_CACHE_LLC, at least on platforms with the shared LLC.
> +	 * I915_CACHE_CACHED, at least on platforms with the shared LLC.
>  	 *
>  	 * Supported values:
>  	 *
> @@ -486,16 +390,6 @@ struct drm_i915_gem_object {
>  	/**
>  	 * @cache_dirty:
>  	 *
> -	 * Note: with the change above which replaced cache_level with pat_index,
> -	 * the use of @cache_dirty is limited to the objects created by kernel
> -	 * or by userspace without pat index specified.
> -	 * Check for @pat_set_by_user to find out if an object has pat index set
> -	 * by userspace. The ioctl's to change cache settings have also been
> -	 * disabled for the objects with pat_index set by userspace. Please don't
> -	 * assume @cache_dirty is set as describe here. Also see helper function
> -	 * i915_gem_object_has_cache_level() for possible ways to bypass the use
> -	 * of this field.
> -	 *
>  	 * Track if we are we dirty with writes through the CPU cache for this
>  	 * object. As a result reading directly from main memory might yield
>  	 * stale data.
> @@ -531,9 +425,9 @@ struct drm_i915_gem_object {
>  	 *
>  	 *   1. All userspace objects, by default, have @cache_level set as
>  	 *   I915_CACHE_NONE. The only exception is userptr objects, where we
> -	 *   instead force I915_CACHE_LLC, but we also don't allow userspace to
> -	 *   ever change the @cache_level for such objects. Another special case
> -	 *   is dma-buf, which doesn't rely on @cache_dirty,  but there we
> +	 *   instead force I915_CACHE_CACHED, but we also don't allow userspace
> +	 *   to ever change the @cache_level for such objects. Another special
> +	 *   case is dma-buf, which doesn't rely on @cache_dirty,  but there we
>  	 *   always do a forced flush when acquiring the pages, if there is a
>  	 *   chance that the pages can be read directly from main memory with
>  	 *   the GPU.
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> index 8f1633c3fb93..aba908f0349f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> @@ -584,7 +584,7 @@ static int shmem_object_init(struct intel_memory_region *mem,
>  	static struct lock_class_key lock_class;
>  	struct drm_i915_private *i915 = mem->i915;
>  	struct address_space *mapping;
> -	unsigned int cache_level;
> +	i915_cache_t cache;
>  	gfp_t mask;
>  	int ret;
>  
> @@ -628,11 +628,11 @@ static int shmem_object_init(struct intel_memory_region *mem,
>  		 * However, we maintain the display planes as UC, and so
>  		 * need to rebind when first used as such.
>  		 */
> -		cache_level = I915_CACHE_LLC;
> +		cache = I915_CACHE_CACHED;
>  	else
> -		cache_level = I915_CACHE_NONE;
> +		cache = I915_CACHE_NONE;
>  
> -	i915_gem_object_set_cache_coherency(obj, cache_level);
> +	i915_gem_object_set_cache_coherency(obj, cache);
>  
>  	i915_gem_object_init_memory_region(obj, mem);
>  
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> index 1c8eb806b7d3..cc907a1f1c53 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> @@ -691,7 +691,7 @@ static int __i915_gem_object_create_stolen(struct intel_memory_region *mem,
>  
>  	obj->stolen = stolen;
>  	obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
> -	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
> +	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>  	i915_gem_object_set_cache_coherency(obj, cache_level);
>  
>  	if (WARN_ON(!i915_gem_object_trylock(obj, NULL)))
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> index 6bd6c239f4ac..107176d1757b 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> @@ -48,14 +48,14 @@ void i915_ttm_migrate_set_ban_memcpy(bool ban)
>  }
>  #endif
>  
> -static enum i915_cache_level
> -i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
> -		     struct ttm_tt *ttm)
> +static i915_cache_t
> +i915_ttm_cache(struct drm_i915_private *i915, struct ttm_resource *res,
> +	       struct ttm_tt *ttm)
>  {
>  	return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
>  		!i915_ttm_gtt_binds_lmem(res) &&
> -		ttm->caching == ttm_cached) ? I915_CACHE_LLC :
> -		I915_CACHE_NONE;
> +		ttm->caching == ttm_cached) ? I915_CACHE_CACHED :
> +					      I915_CACHE_NONE;
>  }
>  
>  static unsigned int
> @@ -112,8 +112,8 @@ void i915_ttm_adjust_domains_after_move(struct drm_i915_gem_object *obj)
>  void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>  {
>  	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
> -	unsigned int cache_level;
>  	unsigned int mem_flags;
> +	i915_cache_t cache;
>  	unsigned int i;
>  	int mem_type;
>  
> @@ -126,13 +126,13 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>  	if (!bo->resource) {
>  		mem_flags = I915_BO_FLAG_STRUCT_PAGE;
>  		mem_type = I915_PL_SYSTEM;
> -		cache_level = I915_CACHE_NONE;
> +		cache = I915_CACHE_NONE;
>  	} else {
>  		mem_flags = i915_ttm_cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
>  			I915_BO_FLAG_STRUCT_PAGE;
>  		mem_type = bo->resource->mem_type;
> -		cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
> -						   bo->ttm);
> +		cache = i915_ttm_cache(to_i915(bo->base.dev), bo->resource,
> +				       bo->ttm);
>  	}
>  
>  	/*
> @@ -157,7 +157,7 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>  	obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
>  	obj->mem_flags |= mem_flags;
>  
> -	i915_gem_object_set_cache_coherency(obj, cache_level);
> +	i915_gem_object_set_cache_coherency(obj, cache);
>  }
>  
>  /**
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> index 1d3ebdf4069b..5d2891981bd4 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> @@ -553,7 +553,7 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
>  	obj->mem_flags = I915_BO_FLAG_STRUCT_PAGE;
>  	obj->read_domains = I915_GEM_DOMAIN_CPU;
>  	obj->write_domain = I915_GEM_DOMAIN_CPU;
> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>  
>  	obj->userptr.ptr = args->user_ptr;
>  	obj->userptr.notifier_seq = ULONG_MAX;
> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
> index bac957755068..77d04be5e9d7 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
> @@ -123,7 +123,7 @@ huge_gem_object(struct drm_i915_private *i915,
>  
>  	obj->read_domains = I915_GEM_DOMAIN_CPU;
>  	obj->write_domain = I915_GEM_DOMAIN_CPU;
> -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
> +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>  	i915_gem_object_set_cache_coherency(obj, cache_level);
>  	obj->scratch = phys_size;
>  
> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> index 6bddd733d796..6ca5b9dbc414 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> @@ -200,9 +200,9 @@ huge_pages_object(struct drm_i915_private *i915,
>  	obj->write_domain = I915_GEM_DOMAIN_CPU;
>  	obj->read_domains = I915_GEM_DOMAIN_CPU;
>  
> -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
> +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>  	i915_gem_object_set_cache_coherency(obj, cache_level);
>  
>  	obj->mm.page_mask = page_mask;
>  
>  	return obj;
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index 675f71f06e89..3c93a73cf6b1 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -16,11 +16,11 @@
>  #include "intel_gtt.h"
>  
>  static u64 gen8_pde_encode(const dma_addr_t addr,
> -			   const enum i915_cache_level level)
> +			   const enum i915_cache_mode cache_mode)
>  {
>  	u64 pde = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>  
> -	if (level != I915_CACHE_NONE)
> +	if (cache_mode != I915_CACHE_MODE_UC)
>  		pde |= PPAT_CACHED_PDE;
>  	else
>  		pde |= PPAT_UNCACHED;
> @@ -43,10 +43,10 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>  	 * See translation table defined by LEGACY_CACHELEVEL.
>  	 */
>  	switch (pat_index) {
> -	case I915_CACHE_NONE:
> +	case I915_CACHE_MODE_UC:
>  		pte |= PPAT_UNCACHED;
>  		break;
> -	case I915_CACHE_WT:
> +	case I915_CACHE_MODE_WT:
>  		pte |= PPAT_DISPLAY_ELLC;
>  		break;
>  	default:
> @@ -893,7 +893,7 @@ static int gen8_init_scratch(struct i915_address_space *vm)
>  		}
>  
>  		fill_px(obj, vm->scratch[i - 1]->encode);
> -		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_NONE);
> +		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_MODE_UC);
>  
>  		vm->scratch[i] = obj;
>  	}
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index ee15486fed0d..f1e59e512d14 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -1103,7 +1103,7 @@ static int init_status_page(struct intel_engine_cs *engine)
>  		return PTR_ERR(obj);
>  	}
>  
> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>  
>  	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>  	if (IS_ERR(vma)) {
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index fca61ddca8ad..ab5f654e7557 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -1011,11 +1011,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
>  	return ggtt_probe_common(ggtt, size);
>  }
>  
> -/*
> - * For pre-gen8 platforms pat_index is the same as enum i915_cache_level,
> - * so the switch-case statements in these PTE encode functions are still valid.
> - * See translation table LEGACY_CACHELEVEL.
> - */
>  static u64 snb_pte_encode(dma_addr_t addr,
>  			  unsigned int pat_index,
>  			  u32 flags)
> @@ -1023,11 +1018,11 @@ static u64 snb_pte_encode(dma_addr_t addr,
>  	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>  
>  	switch (pat_index) {
> -	case I915_CACHE_L3_LLC:
> -	case I915_CACHE_LLC:
> +	case I915_CACHE_MODE_WB:
> +	case __I915_CACHE_MODE_WB_L3:
>  		pte |= GEN6_PTE_CACHE_LLC;
>  		break;
> -	case I915_CACHE_NONE:
> +	case I915_CACHE_MODE_UC:
>  		pte |= GEN6_PTE_UNCACHED;
>  		break;
>  	default:
> @@ -1044,13 +1039,13 @@ static u64 ivb_pte_encode(dma_addr_t addr,
>  	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>  
>  	switch (pat_index) {
> -	case I915_CACHE_L3_LLC:
> +	case __I915_CACHE_MODE_WB_L3:
>  		pte |= GEN7_PTE_CACHE_L3_LLC;
>  		break;
> -	case I915_CACHE_LLC:
> +	case I915_CACHE_MODE_WB:
>  		pte |= GEN6_PTE_CACHE_LLC;
>  		break;
> -	case I915_CACHE_NONE:
> +	case I915_CACHE_MODE_UC:
>  		pte |= GEN6_PTE_UNCACHED;
>  		break;
>  	default:
> @@ -1069,7 +1064,7 @@ static u64 byt_pte_encode(dma_addr_t addr,
>  	if (!(flags & PTE_READ_ONLY))
>  		pte |= BYT_PTE_WRITEABLE;
>  
> -	if (pat_index != I915_CACHE_NONE)
> +	if (pat_index != I915_CACHE_MODE_UC)
>  		pte |= BYT_PTE_SNOOPED_BY_CPU_CACHES;
>  
>  	return pte;
> @@ -1081,7 +1076,7 @@ static u64 hsw_pte_encode(dma_addr_t addr,
>  {
>  	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>  
> -	if (pat_index != I915_CACHE_NONE)
> +	if (pat_index != I915_CACHE_MODE_UC)
>  		pte |= HSW_WB_LLC_AGE3;
>  
>  	return pte;
> @@ -1094,9 +1089,9 @@ static u64 iris_pte_encode(dma_addr_t addr,
>  	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>  
>  	switch (pat_index) {
> -	case I915_CACHE_NONE:
> +	case I915_CACHE_MODE_UC:
>  		break;
> -	case I915_CACHE_WT:
> +	case I915_CACHE_MODE_WT:
>  		pte |= HSW_WT_ELLC_LLC_AGE3;
>  		break;
>  	default:
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
> index 866c416afb73..803c41ac4ccb 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
> @@ -21,7 +21,7 @@ static void gmch_ggtt_insert_page(struct i915_address_space *vm,
>  				  unsigned int pat_index,
>  				  u32 unused)
>  {
> -	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
> +	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
>  		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
>  
>  	intel_gmch_gtt_insert_page(addr, offset >> PAGE_SHIFT, flags);
> @@ -32,7 +32,7 @@ static void gmch_ggtt_insert_entries(struct i915_address_space *vm,
>  				     unsigned int pat_index,
>  				     u32 unused)
>  {
> -	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
> +	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
>  		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
>  
>  	intel_gmch_gtt_insert_sg_entries(vma_res->bi.pages, vma_res->start >> PAGE_SHIFT,
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index 065099362a98..48055304537a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -676,7 +676,7 @@ __vm_create_scratch_for_read(struct i915_address_space *vm, unsigned long size)
>  	if (IS_ERR(obj))
>  		return ERR_CAST(obj);
>  
> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>  
>  	vma = i915_vma_instance(obj, vm, NULL);
>  	if (IS_ERR(vma)) {
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index 7192a534a654..af4277c1d577 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -636,7 +636,8 @@ void
>  __set_pd_entry(struct i915_page_directory * const pd,
>  	       const unsigned short idx,
>  	       struct i915_page_table *pt,
> -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level));
> +	       u64 (*encode)(const dma_addr_t,
> +			     const enum i915_cache_mode cache_mode));
>  
>  #define set_pd_entry(pd, idx, to) \
>  	__set_pd_entry((pd), (idx), px_pt(to), gen8_pde_encode)
> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> index 436756bfbb1a..3e461d4f3693 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> @@ -98,14 +98,16 @@ void
>  __set_pd_entry(struct i915_page_directory * const pd,
>  	       const unsigned short idx,
>  	       struct i915_page_table * const to,
> -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level))
> +	       u64 (*encode)(const dma_addr_t,
> +			     const enum i915_cache_mode cache_mode))
>  {
>  	/* Each thread pre-pins the pd, and we may have a thread per pde. */
>  	GEM_BUG_ON(atomic_read(px_used(pd)) > NALLOC * I915_PDES);
>  
>  	atomic_inc(px_used(pd));
>  	pd->entry[idx] = to;
> -	write_dma_entry(px_base(pd), idx, encode(px_dma(to), I915_CACHE_LLC));
> +	write_dma_entry(px_base(pd), idx,
> +			encode(px_dma(to), I915_CACHE_MODE_WB));
>  }
>  
>  void
> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> index 92085ffd23de..9131d228d285 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> @@ -551,7 +551,9 @@ alloc_context_vma(struct intel_engine_cs *engine)
>  	 * later platforms don't have L3 control bits in the PTE.
>  	 */
>  	if (IS_IVYBRIDGE(i915))
> -		i915_gem_object_set_cache_coherency(obj, I915_CACHE_L3_LLC);
> +		i915_gem_object_set_cache_coherency(obj,
> +						    I915_CACHE_CACHED |
> +						    __I915_CACHE_FLAG(L3));
>  
>  	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>  	if (IS_ERR(vma)) {
> diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
> index b9640212d659..025ce54c886d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
> +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
> @@ -26,7 +26,7 @@ static struct i915_vma *hwsp_alloc(struct intel_gt *gt)
>  	if (IS_ERR(obj))
>  		return ERR_CAST(obj);
>  
> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>  
>  	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>  	if (IS_ERR(vma))
> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> index 8b0d84f2aad2..fc278fa463b0 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> @@ -64,7 +64,7 @@ static int hang_init(struct hang *h, struct intel_gt *gt)
>  		goto err_hws;
>  	}
>  
> -	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_LLC);
> +	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_CACHED);
>  	vaddr = i915_gem_object_pin_map_unlocked(h->hws, I915_MAP_WB);
>  	if (IS_ERR(vaddr)) {
>  		err = PTR_ERR(vaddr);
> diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> index 14a8b25b6204..d25990d33d44 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> @@ -111,7 +111,7 @@ read_nonprivs(struct intel_context *ce)
>  	if (IS_ERR(result))
>  		return result;
>  
> -	i915_gem_object_set_cache_coherency(result, I915_CACHE_LLC);
> +	i915_gem_object_set_cache_coherency(result, I915_CACHE_CACHED);
>  
>  	cs = i915_gem_object_pin_map_unlocked(result, I915_MAP_WB);
>  	if (IS_ERR(cs)) {
> diff --git a/drivers/gpu/drm/i915/i915_cache.c b/drivers/gpu/drm/i915/i915_cache.c
> index 06eb5933c719..f4ba1cb430d3 100644
> --- a/drivers/gpu/drm/i915/i915_cache.c
> +++ b/drivers/gpu/drm/i915/i915_cache.c
> @@ -6,13 +6,88 @@
>  #include "i915_cache.h"
>  #include "i915_drv.h"
>  
> -void i915_cache_init(struct drm_i915_private *i915)
> +int i915_cache_init(struct drm_i915_private *i915)
>  {
> -	i915->pat_uc = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
> -	drm_info(&i915->drm, "Using PAT index %u for uncached access\n",
> -		 i915->pat_uc);
> +	int ret;
>  
> -	i915->pat_wb = i915_gem_get_pat_index(i915, I915_CACHE_LLC);
> -	drm_info(&i915->drm, "Using PAT index %u for write-back access\n",
> -		 i915->pat_wb);
> +	ret = i915_cache_find_pat(i915, I915_CACHE_NONE);
> +	if (ret < 0) {
> +		drm_err(&i915->drm,
> +			"Failed to find PAT index for uncached access\n");
> +		return -ENODEV;
> +	}
> +	drm_info(&i915->drm, "Using PAT index %u for uncached access\n", ret);
> +	i915->pat_uc = ret;
> +
> +	ret = i915_cache_find_pat(i915, I915_CACHE_CACHED);
> +	if (ret < 0) {
> +		drm_err(&i915->drm,
> +			"Failed to find PAT index for write-back access\n");
> +		return -ENODEV;
> +	}
> +	drm_info(&i915->drm, "Using PAT index %u for write-back access\n", ret);
> +	i915->pat_wb = ret;
> +
> +	return 0;
> +}
> +
> +int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache)
> +{
> +	const struct intel_device_info *info = INTEL_INFO(i915);
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(info->cache_modes); i++) {
> +		if (info->cache_modes[i] == cache)
> +			return i;
> +	}
> +
> +	return -1;
> +}
> +
> +void i915_cache_print(char *buf, size_t buflen, const char *suffix,
> +		      i915_cache_t cache)
> +{
> +	const enum i915_cache_mode mode = I915_CACHE_MODE(cache);
> +	static const char * const mode_str[] = {
> +		[I915_CACHE_MODE_UC] = "UC",
> +		[I915_CACHE_MODE_WB] = "WB",
> +		[I915_CACHE_MODE_WT] = "WT",
> +		[I915_CACHE_MODE_WC] = "WC",
> +	};
> +	static const char * const flag_str[] = {
> +		[ilog2(I915_CACHE_FLAG_COH1W)] = "1-Way-Coherent",
> +		[ilog2(I915_CACHE_FLAG_COH2W)] = "2-Way-Coherent",
> +		[ilog2(I915_CACHE_FLAG_L3)] =    "L3",
> +		[ilog2(I915_CACHE_FLAG_CLOS1)] = "CLOS1",
> +		[ilog2(I915_CACHE_FLAG_CLOS2)] = "CLOS2",
> +	};
> +
> +	if (mode >= ARRAY_SIZE(mode_str) || !mode_str[mode]) {
> +		snprintf(buf, buflen, "0x%x%s", cache, suffix ?: "");
> +	} else {
> +		unsigned long flags = I915_CACHE_FLAGS(cache);
> +		unsigned long bit;
> +		int ret;
> +
> +		ret = scnprintf(buf, buflen, "%s", mode_str[mode]);
> +		buf += ret;
> +		buflen -= ret;
> +
> +		/*
> +		 * Don't print "1-way-2-way", it would be confusing and 2-way
> +		 * implies 1-way anyway.
> +		 */
> +		if ((flags & (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W)) ==
> +		    (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W))
> +			flags &= ~I915_CACHE_FLAG_COH1W;
> +
> +		for_each_set_bit(bit, &flags, BITS_PER_TYPE(i915_cache_t)) {
> +			ret = scnprintf(buf, buflen, "-%s", flag_str[bit]);
> +			buf += ret;
> +			buflen -= ret;
> +		}
> +
> +		if (suffix)
> +			snprintf(buf, buflen, "%s", suffix);
> +	}
>  }
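
As a sanity check of the reverse lookup, with the MTL table from later
in this patch I would expect:

	i915_cache_find_pat(i915, I915_CACHE_NONE)   == 2
	i915_cache_find_pat(i915, I915_CACHE_CACHED) == 4

i.e. the UC and fully coherent WB entries, which looks right to me,
assuming I am reading the tables correctly.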
> diff --git a/drivers/gpu/drm/i915/i915_cache.h b/drivers/gpu/drm/i915/i915_cache.h
> index cb68936fb8a2..d9e97318b942 100644
> --- a/drivers/gpu/drm/i915/i915_cache.h
> +++ b/drivers/gpu/drm/i915/i915_cache.h
> @@ -6,8 +6,76 @@
>  #ifndef __I915_CACHE_H__
>  #define __I915_CACHE_H__
>  
> +#include <linux/types.h>
> +
> +struct drm_printer;
> +
>  struct drm_i915_private;
>  
> -void i915_cache_init(struct drm_i915_private *i915);
> +typedef u16 i915_cache_t;
> +
> +/* Cache modes */
> +enum i915_cache_mode {
> +	I915_CACHE_MODE_UC = 0,
> +	I915_CACHE_MODE_WB,
> +	__I915_CACHE_MODE_WB_L3, /* Special do-not-use entry for legacy 1:1 mapping. */
> +	I915_CACHE_MODE_WT,
> +	I915_CACHE_MODE_WC,
> +	I915_NUM_CACHE_MODES
> +};
> +
> +/* Cache mode flag bits */
> +#define I915_CACHE_FLAG_COH1W	(0x1)
> +#define I915_CACHE_FLAG_COH2W	(0x2) /* 1-way needs to be set too. */
> +#define I915_CACHE_FLAG_L3	(0x4)
> +#define I915_CACHE_FLAG_CLOS1	(0x8)
> +#define I915_CACHE_FLAG_CLOS2	(0x10)
> +
> +/*
> + * Overloaded I915_CACHE() macro based on:
> + *  https://stackoverflow.com/questions/3046889/optional-parameters-with-c-macros
> + *
> + * It is possible to call I915_CACHE with mode and zero or more flags as
> + * separate arguments. Ie these all work:
> + *
> + *   I915_CACHE(WB)
> + *   I915_CACHE(WB, COH1W, COH2W)
> + *   I915_CACHE(WB, COH1W, COH2W, L3)
> + */
> +
> +#define __I915_CACHE_FLAG(f) (I915_CACHE_FLAG_##f << 8)
> +#define __I915_CACHE(m, f) ((i915_cache_t)(I915_CACHE_MODE_##m | (f)))
> +
> +#define I915_CACHE_4(m, f1, f2, f3)	__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2) | __I915_CACHE_FLAG(f3))
> +#define I915_CACHE_3(m, f1, f2)		__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2))
> +#define I915_CACHE_2(m, f1)		__I915_CACHE(m, __I915_CACHE_FLAG(f1))
> +#define I915_CACHE_1(m)			__I915_CACHE(m, 0)
> +#define I915_CACHE_0(m)			__I915_CACHE(WC, 0)
> +
> +#define FUNC_CHOOSER(_f1, _f2, _f3, _f4, _f5, ...) _f5
> +#define FUNC_RECOMPOSER(argsWithParentheses) FUNC_CHOOSER argsWithParentheses
> +#define CHOOSE_FROM_ARG_COUNT(...) FUNC_RECOMPOSER((__VA_ARGS__, I915_CACHE_4, I915_CACHE_3, I915_CACHE_2, I915_CACHE_1, ))
> +#define NO_ARG_EXPANDER() ,,,I915_CACHE_0
> +#define MACRO_CHOOSER(...) CHOOSE_FROM_ARG_COUNT(NO_ARG_EXPANDER __VA_ARGS__ ())
> +
> +#define I915_CACHE(...) MACRO_CHOOSER(__VA_ARGS__)(__VA_ARGS__)
> +
> +/* i915_cache_t mode and flags extraction helpers. */
> +#define I915_CACHE_MODE(cache) \
> +	((enum i915_cache_mode)(((i915_cache_t)(cache)) & 0xff))
> +#define I915_CACHE_FLAGS(cache) \
> +	((unsigned int)((((i915_cache_t)(cache) & 0xff00)) >> 8))
> +
> +/* Helpers for i915 caching modes. */
> +#define I915_CACHE_NONE		I915_CACHE(UC)
> +#define I915_CACHE_CACHED	I915_CACHE(WB, COH1W, COH2W)
> +#define I915_CACHE_WT		I915_CACHE(WT)
> +
> +int i915_cache_init(struct drm_i915_private *i915);
> +int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache);
> +void i915_cache_print(char *buf, size_t buflen, const char *suffix,
> +		      i915_cache_t cache);
> +
> +#define I915_CACHE_NAME_LEN (40)
>  
>  #endif /* __I915_CACHE_H__ */
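
If I untangled the macro chooser correctly, I915_CACHE(WB, COH1W, COH2W)
resolves to I915_CACHE_3() and packs the mode into the low byte and the
flags into the high byte, i.e.:

	/* I915_CACHE_CACHED evaluates to 0x0301. */
	I915_CACHE_MODE_WB | ((I915_CACHE_FLAG_COH1W |
			       I915_CACHE_FLAG_COH2W) << 8)

Might be worth spelling that layout out in a comment next to the
extraction helpers.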
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 4de44cf1026d..4ec292011546 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -140,57 +140,18 @@ static const char *stringify_vma_type(const struct i915_vma *vma)
>  	return "ppgtt";
>  }
>  
> -static const char *i915_cache_level_str(struct drm_i915_gem_object *obj)
> -{
> -	struct drm_i915_private *i915 = obj_to_i915(obj);
> -
> -	if (IS_METEORLAKE(i915)) {
> -		switch (obj->pat_index) {
> -		case 0: return " WB";
> -		case 1: return " WT";
> -		case 2: return " UC";
> -		case 3: return " WB (1-Way Coh)";
> -		case 4: return " WB (2-Way Coh)";
> -		default: return " not defined";
> -		}
> -	} else if (IS_PONTEVECCHIO(i915)) {
> -		switch (obj->pat_index) {
> -		case 0: return " UC";
> -		case 1: return " WC";
> -		case 2: return " WT";
> -		case 3: return " WB";
> -		case 4: return " WT (CLOS1)";
> -		case 5: return " WB (CLOS1)";
> -		case 6: return " WT (CLOS2)";
> -		case 7: return " WT (CLOS2)";
> -		default: return " not defined";
> -		}
> -	} else if (GRAPHICS_VER(i915) >= 12) {
> -		switch (obj->pat_index) {
> -		case 0: return " WB";
> -		case 1: return " WC";
> -		case 2: return " WT";
> -		case 3: return " UC";
> -		default: return " not defined";
> -		}
> -	} else {
> -		switch (obj->pat_index) {
> -		case 0: return " UC";
> -		case 1: return HAS_LLC(i915) ?
> -			       " LLC" : " snooped";
> -		case 2: return " L3+LLC";
> -		case 3: return " WT";
> -		default: return " not defined";
> -		}
> -	}
> -}
> -
>  void
>  i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>  {
> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +	char buf[I915_CACHE_NAME_LEN];
>  	struct i915_vma *vma;
>  	int pin_count = 0;
>  
> +	i915_cache_print(buf, sizeof(buf),
> +			 obj->pat_set_by_user ? "!" : NULL,
> +			 INTEL_INFO(i915)->cache_modes[obj->pat_index]);
> +
>  	seq_printf(m, "%pK: %c%c%c %8zdKiB %02x %02x %s%s%s",
>  		   &obj->base,
>  		   get_tiling_flag(obj),
> @@ -199,7 +160,7 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>  		   obj->base.size / 1024,
>  		   obj->read_domains,
>  		   obj->write_domain,
> -		   i915_cache_level_str(obj),
> +		   buf,
>  		   obj->mm.dirty ? " dirty" : "",
>  		   obj->mm.madv == I915_MADV_DONTNEED ? " purgeable" : "");
>  	if (obj->base.name)
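
The unified printing is a nice cleanup, by the way. If I follow
i915_cache_print() correctly, an MTL object on PAT index 4 with a
user-set PAT would now print as "WB-2-Way-Coherent!", the "!" suffix
marking the user-set index.
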
> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> index bb2223cc3470..8663388a524f 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -241,7 +241,9 @@ static int i915_driver_early_probe(struct drm_i915_private *dev_priv)
>  	i915_memcpy_init_early(dev_priv);
>  	intel_runtime_pm_init_early(&dev_priv->runtime_pm);
>  
> -	i915_cache_init(dev_priv);
> +	ret = i915_cache_init(dev_priv);
> +	if (ret < 0)
> +		return ret;
>  
>  	ret = i915_workqueues_init(dev_priv);
>  	if (ret < 0)
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 896aa48ed089..814705cfeb12 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1144,19 +1144,6 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
>  	unsigned int i;
>  	int ret;
>  
> -	/*
> -	 * In the proccess of replacing cache_level with pat_index a tricky
> -	 * dependency is created on the definition of the enum i915_cache_level.
> -	 * in case this enum is changed, PTE encode would be broken.
> -	 * Add a WARNING here. And remove when we completely quit using this
> -	 * enum
> -	 */
> -	BUILD_BUG_ON(I915_CACHE_NONE != 0 ||
> -		     I915_CACHE_LLC != 1 ||
> -		     I915_CACHE_L3_LLC != 2 ||
> -		     I915_CACHE_WT != 3 ||
> -		     I915_MAX_CACHE_LEVEL != 4);
> -
>  	/* We need to fallback to 4K pages if host doesn't support huge gtt. */
>  	if (intel_vgpu_active(dev_priv) && !intel_vgpu_has_huge_gtt(dev_priv))
>  		RUNTIME_INFO(dev_priv)->page_sizes = I915_GTT_PAGE_SIZE_4K;
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index fcacdc21643c..565a60a1645d 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -32,6 +32,7 @@
>  #include "gt/intel_sa_media.h"
>  #include "gem/i915_gem_object_types.h"
>  
> +#include "i915_cache.h"
>  #include "i915_driver.h"
>  #include "i915_drv.h"
>  #include "i915_pci.h"
> @@ -43,36 +44,43 @@
>  	.__runtime.graphics.ip.ver = (x), \
>  	.__runtime.media.ip.ver = (x)
>  
> -#define LEGACY_CACHELEVEL \
> -	.cachelevel_to_pat = { \
> -		[I915_CACHE_NONE]   = 0, \
> -		[I915_CACHE_LLC]    = 1, \
> -		[I915_CACHE_L3_LLC] = 2, \
> -		[I915_CACHE_WT]     = 3, \
> +#define LEGACY_CACHE_MODES \
> +	.cache_modes = { \
> +		[I915_CACHE_MODE_UC]	  = I915_CACHE(UC), \
> +		[I915_CACHE_MODE_WB]	  = I915_CACHE(WB, COH1W, COH2W), \

Reading bspec 2863 (bdw) indicates that the CPU being able to snoop the
GPU's L3 was a new feature in gen8.  So for HSW and earlier, any
coherency was only 1-way (GPU could be coherent with CPU's caches, but
not vice-versa).  Only starting with gen8 did we get 2-way coherency as
an option where the CPU would also be coherent with the GPU cache (and
with gen8 and beyond you could still select 1-way instead of 2-way
coherency with instruction-level granularity via MOCS).  There are also
some legacy platforms (e.g., EHL/JSL on bspec 13948) where the IA wasn't
coherent with GPU L3 so we were back to 1-way coherency.

So should we split LEGACY_CACHE_MODES into two tables with different
coherency settings attached to I915_CACHE_MODE_WB?
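
For illustration, a split along those lines might look something like
this (hypothetical macro names, and the exact coherency flags would
still need to be confirmed against the bspec):

	#define HSW_LEGACY_CACHE_MODES \
		.cache_modes = { \
			[I915_CACHE_MODE_UC]	  = I915_CACHE(UC), \
			[I915_CACHE_MODE_WB]	  = I915_CACHE(WB, COH1W), \
			[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, L3), \
			[I915_CACHE_MODE_WT]	  = I915_CACHE(WT), \
		}

	#define BDW_LEGACY_CACHE_MODES \
		.cache_modes = { \
			[I915_CACHE_MODE_UC]	  = I915_CACHE(UC), \
			[I915_CACHE_MODE_WB]	  = I915_CACHE(WB, COH1W, COH2W), \
			[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, COH2W, L3), \
			[I915_CACHE_MODE_WT]	  = I915_CACHE(WT), \
		}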

> +		[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, COH2W, L3), \
> +		[I915_CACHE_MODE_WT]	  = I915_CACHE(WT), \
>  	}
>  
> -#define TGL_CACHELEVEL \
> -	.cachelevel_to_pat = { \
> -		[I915_CACHE_NONE]   = 3, \
> -		[I915_CACHE_LLC]    = 0, \
> -		[I915_CACHE_L3_LLC] = 0, \
> -		[I915_CACHE_WT]     = 2, \
> +#define GEN12_CACHE_MODES \
> +	.cache_modes = { \
> +		[0] = I915_CACHE(WB, COH1W, COH2W), \
> +		[1] = I915_CACHE(WC), \
> +		[2] = I915_CACHE(WT), \
> +		[3] = I915_CACHE(UC), \
>  	}
>  
> -#define PVC_CACHELEVEL \
> -	.cachelevel_to_pat = { \
> -		[I915_CACHE_NONE]   = 0, \
> -		[I915_CACHE_LLC]    = 3, \
> -		[I915_CACHE_L3_LLC] = 3, \
> -		[I915_CACHE_WT]     = 2, \
> +/* FIXME: is WB 1-way or 2-way coherent for PAT indices 3, 5 and 7? */
> +
> +#define PVC_CACHE_MODES \
> +	.cache_modes = { \
> +		[0] = I915_CACHE(UC), \
> +		[1] = I915_CACHE(WC), \
> +		[2] = I915_CACHE(WT), \
> +		[3] = I915_CACHE(WB, COH1W), \
> +		[4] = I915_CACHE(WT, CLOS1), \
> +		[5] = I915_CACHE(WB, COH1W, CLOS1), \
> +		[6] = I915_CACHE(WT, CLOS2), \
> +		[7] = I915_CACHE(WB, COH1W, CLOS2), \
>  	}
>  
> -#define MTL_CACHELEVEL \
> -	.cachelevel_to_pat = { \
> -		[I915_CACHE_NONE]   = 2, \
> -		[I915_CACHE_LLC]    = 3, \
> -		[I915_CACHE_L3_LLC] = 3, \
> -		[I915_CACHE_WT]     = 1, \
> +#define MTL_CACHE_MODES \
> +	.cache_modes = { \
> +		[0] = I915_CACHE(WB), \
> +		[1] = I915_CACHE(WT), \
> +		[2] = I915_CACHE(UC), \
> +		[3] = I915_CACHE(WB, COH1W), \
> +		[4] = I915_CACHE(WB, COH1W, COH2W), \

We may want a comment on this one since the "2W" part is sort of a lie.
Bspec 63884 has a programming note for MTL that says

        "...Except for system atomics, setting Coherency Mode to 10 or
        11 results in this same one-way coherent behavior..."

So if we ask for 2W, we actually only get 1W behavior except in a very
narrow set of cases.
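
Maybe something as simple as this would do (comment wording is just a
sketch):

	/*
	 * Nominally 2-way coherent, but outside of system atomics the
	 * hardware degrades this to 1-way coherent behaviour. See the
	 * programming note on Bspec 63884.
	 */
	[4] = I915_CACHE(WB, COH1W, COH2W), \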


Matt

>  	}
>  
>  /* Keep in gen based order, and chronological order within a gen */
> @@ -97,7 +105,7 @@
>  	.max_pat_index = 3, \
>  	GEN_DEFAULT_PAGE_SIZES, \
>  	GEN_DEFAULT_REGIONS, \
> -	LEGACY_CACHELEVEL
> +	LEGACY_CACHE_MODES
>  
>  #define I845_FEATURES \
>  	GEN(2), \
> @@ -112,7 +120,7 @@
>  	.max_pat_index = 3, \
>  	GEN_DEFAULT_PAGE_SIZES, \
>  	GEN_DEFAULT_REGIONS, \
> -	LEGACY_CACHELEVEL
> +	LEGACY_CACHE_MODES
>  
>  static const struct intel_device_info i830_info = {
>  	I830_FEATURES,
> @@ -145,7 +153,7 @@ static const struct intel_device_info i865g_info = {
>  	.max_pat_index = 3, \
>  	GEN_DEFAULT_PAGE_SIZES, \
>  	GEN_DEFAULT_REGIONS, \
> -	LEGACY_CACHELEVEL
> +	LEGACY_CACHE_MODES
>  
>  static const struct intel_device_info i915g_info = {
>  	GEN3_FEATURES,
> @@ -208,7 +216,7 @@ static const struct intel_device_info pnv_m_info = {
>  	.max_pat_index = 3, \
>  	GEN_DEFAULT_PAGE_SIZES, \
>  	GEN_DEFAULT_REGIONS, \
> -	LEGACY_CACHELEVEL
> +	LEGACY_CACHE_MODES
>  
>  static const struct intel_device_info i965g_info = {
>  	GEN4_FEATURES,
> @@ -252,7 +260,7 @@ static const struct intel_device_info gm45_info = {
>  	.max_pat_index = 3, \
>  	GEN_DEFAULT_PAGE_SIZES, \
>  	GEN_DEFAULT_REGIONS, \
> -	LEGACY_CACHELEVEL
> +	LEGACY_CACHE_MODES
>  
>  static const struct intel_device_info ilk_d_info = {
>  	GEN5_FEATURES,
> @@ -282,7 +290,7 @@ static const struct intel_device_info ilk_m_info = {
>  	.__runtime.ppgtt_size = 31, \
>  	GEN_DEFAULT_PAGE_SIZES, \
>  	GEN_DEFAULT_REGIONS, \
> -	LEGACY_CACHELEVEL
> +	LEGACY_CACHE_MODES
>  
>  #define SNB_D_PLATFORM \
>  	GEN6_FEATURES, \
> @@ -330,7 +338,7 @@ static const struct intel_device_info snb_m_gt2_info = {
>  	.__runtime.ppgtt_size = 31, \
>  	GEN_DEFAULT_PAGE_SIZES, \
>  	GEN_DEFAULT_REGIONS, \
> -	LEGACY_CACHELEVEL
> +	LEGACY_CACHE_MODES
>  
>  #define IVB_D_PLATFORM \
>  	GEN7_FEATURES, \
> @@ -387,7 +395,7 @@ static const struct intel_device_info vlv_info = {
>  	.platform_engine_mask = BIT(RCS0) | BIT(VCS0) | BIT(BCS0),
>  	GEN_DEFAULT_PAGE_SIZES,
>  	GEN_DEFAULT_REGIONS,
> -	LEGACY_CACHELEVEL,
> +	LEGACY_CACHE_MODES
>  };
>  
>  #define G75_FEATURES  \
> @@ -473,7 +481,7 @@ static const struct intel_device_info chv_info = {
>  	.has_coherent_ggtt = false,
>  	GEN_DEFAULT_PAGE_SIZES,
>  	GEN_DEFAULT_REGIONS,
> -	LEGACY_CACHELEVEL,
> +	LEGACY_CACHE_MODES
>  };
>  
>  #define GEN9_DEFAULT_PAGE_SIZES \
> @@ -536,7 +544,7 @@ static const struct intel_device_info skl_gt4_info = {
>  	.max_pat_index = 3, \
>  	GEN9_DEFAULT_PAGE_SIZES, \
>  	GEN_DEFAULT_REGIONS, \
> -	LEGACY_CACHELEVEL
> +	LEGACY_CACHE_MODES
>  
>  static const struct intel_device_info bxt_info = {
>  	GEN9_LP_FEATURES,
> @@ -640,7 +648,7 @@ static const struct intel_device_info jsl_info = {
>  #define GEN12_FEATURES \
>  	GEN11_FEATURES, \
>  	GEN(12), \
> -	TGL_CACHELEVEL, \
> +	GEN12_CACHE_MODES, \
>  	.has_global_mocs = 1, \
>  	.has_pxp = 1, \
>  	.max_pat_index = 3
> @@ -708,7 +716,7 @@ static const struct intel_device_info adl_p_info = {
>  	.__runtime.graphics.ip.ver = 12, \
>  	.__runtime.graphics.ip.rel = 50, \
>  	XE_HP_PAGE_SIZES, \
> -	TGL_CACHELEVEL, \
> +	GEN12_CACHE_MODES, \
>  	.dma_mask_size = 46, \
>  	.has_3d_pipeline = 1, \
>  	.has_64bit_reloc = 1, \
> @@ -803,7 +811,7 @@ static const struct intel_device_info pvc_info = {
>  		BIT(VCS0) |
>  		BIT(CCS0) | BIT(CCS1) | BIT(CCS2) | BIT(CCS3),
>  	.require_force_probe = 1,
> -	PVC_CACHELEVEL,
> +	PVC_CACHE_MODES
>  };
>  
>  static const struct intel_gt_definition xelpmp_extra_gt[] = {
> @@ -838,7 +846,7 @@ static const struct intel_device_info mtl_info = {
>  	.memory_regions = REGION_SMEM | REGION_STOLEN_LMEM,
>  	.platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(CCS0),
>  	.require_force_probe = 1,
> -	MTL_CACHELEVEL,
> +	MTL_CACHE_MODES
>  };
>  
>  #undef PLATFORM
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 04bc1f4a1115..973175a64534 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1870,7 +1870,7 @@ static int alloc_oa_buffer(struct i915_perf_stream *stream)
>  		return PTR_ERR(bo);
>  	}
>  
> -	i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC);
> +	i915_gem_object_set_cache_coherency(bo, I915_CACHE_CACHED);
>  
>  	/* PreHSW required 512K alignment, HSW requires 16M */
>  	vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL);
> diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
> index dbfe6443457b..2ce13b7c48cb 100644
> --- a/drivers/gpu/drm/i915/intel_device_info.h
> +++ b/drivers/gpu/drm/i915/intel_device_info.h
> @@ -27,6 +27,8 @@
>  
>  #include <uapi/drm/i915_drm.h>
>  
> +#include "i915_cache.h"
> +
>  #include "intel_step.h"
>  
>  #include "gt/intel_engine_types.h"
> @@ -243,8 +245,8 @@ struct intel_device_info {
>  	 */
>  	const struct intel_runtime_info __runtime;
>  
> -	u32 cachelevel_to_pat[I915_MAX_CACHE_LEVEL];
> -	u32 max_pat_index;
> +	i915_cache_t cache_modes[8];
> +	unsigned int max_pat_index;
>  };
>  
>  struct intel_driver_caps {
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> index f910ec9b6d2b..ba821e48baa5 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> @@ -267,7 +267,7 @@ static int igt_evict_for_cache_color(void *arg)
>  		err = PTR_ERR(obj);
>  		goto cleanup;
>  	}
> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>  	quirk_add(obj, &objects);
>  
>  	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
> @@ -283,7 +283,7 @@ static int igt_evict_for_cache_color(void *arg)
>  		err = PTR_ERR(obj);
>  		goto cleanup;
>  	}
> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>  	quirk_add(obj, &objects);
>  
>  	/* Neighbouring; same colour - should fit */
> diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
> index 3c5e0952f1b8..4cfc5000d6ff 100644
> --- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
> +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
> @@ -23,7 +23,7 @@ int igt_spinner_init(struct igt_spinner *spin, struct intel_gt *gt)
>  		err = PTR_ERR(spin->hws);
>  		goto err;
>  	}
> -	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_LLC);
> +	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_CACHED);
>  
>  	spin->obj = i915_gem_object_create_internal(gt->i915, PAGE_SIZE);
>  	if (IS_ERR(spin->obj)) {
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> index 1d1a457e2aee..8ae77bcf27fa 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> @@ -126,13 +126,13 @@ static const struct intel_device_info mock_info = {
>  	.memory_regions = REGION_SMEM,
>  	.platform_engine_mask = BIT(0),
>  
> -	/* simply use legacy cache level for mock device */
> +	/* Simply use legacy cache modes for the mock device. */
>  	.max_pat_index = 3,
> -	.cachelevel_to_pat = {
> -		[I915_CACHE_NONE]   = 0,
> -		[I915_CACHE_LLC]    = 1,
> -		[I915_CACHE_L3_LLC] = 2,
> -		[I915_CACHE_WT]     = 3,
> +	.cache_modes = {
> +		[0] = I915_CACHE(UC),
> +		[1] = I915_CACHE(WB, COH1W, COH2W),
> +		[2] = I915_CACHE(WB, COH1W, COH2W, L3),
> +		[3] = I915_CACHE(WT),
>  	},
>  };
>  
> @@ -181,7 +181,7 @@ struct drm_i915_private *mock_gem_device(void)
>  	/* Set up device info and initial runtime info. */
>  	intel_device_info_driver_create(i915, pdev->device, &mock_info);
>  
> -	i915_cache_init(i915);
> +	WARN_ON(i915_cache_init(i915));
>  
>  	dev_pm_domain_set(&pdev->dev, &pm_domain);
>  	pm_runtime_enable(&pdev->dev);
> -- 
> 2.39.2
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Intel-gfx] [RFC 4/8] drm/i915: Refactor PAT/object cache handling
@ 2023-07-27 23:57     ` Matt Roper
  0 siblings, 0 replies; 59+ messages in thread
From: Matt Roper @ 2023-07-27 23:57 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx, dri-devel, Chris Wilson

On Thu, Jul 27, 2023 at 03:55:00PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Commit 9275277d5324 ("drm/i915: use pat_index instead of cache_level") has
> introduced PAT indices to i915 internal APIs, partially replacing the
> usage of the driver-internal cache_level, but has also made a few sub-
> optimal design decisions which this patch tries to improve upon.
> 
> Principal change here is to invert the per-platform cache-level-to-PAT-
> index table which was added by the referenced commit, and by doing so
> enable i915 to understand the cache mode behind each PAT index, changing
> them from opaque to transparent.
> 
> Once we have the inverted table we are able to remove the hidden false
> "return true" from i915_gem_object_has_cache_level and make the involved
> code path clearer.
> 
> To achieve this we replace the enum i915_cache_level with i915_cache_t,
> composed of a more detailed representation of each cache mode (base mode
> plus flags).
> 
> In this way we are able to express the differences between different
> write-back mode coherency settings on Meteorlake, which in turn enables us
> to map the i915 "cached" mode to the correct Meteorlake PAT index.
> 
> We can also replace the platform dependent cache mode to string code in
> debugfs and elsewhere by the single implementation based on i915_cache_t.
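
To restate the reversal for anyone skimming the thread: previously i915
could only go from cache level to PAT index, so a PAT index arriving from
userspace was opaque to the driver. With the table keyed by PAT index both
directions work. Roughly, using names from this patch:

        /* PAT index -> cache mode: a plain array lookup. */
        i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];

        /* Cache mode -> PAT index: search the same table. */
        int pat = i915_cache_find_pat(i915, I915_CACHE_CACHED);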
> 
> v2:
>  * Fix PAT-to-cache-mode table for PVC. (Fei)
>  * Cache display caching mode too. (Fei)
>  * Improve and document criteria in i915_gem_object_can_bypass_llc() (Matt)
> 
> v3:
>  * Checkpatch issues.
>  * Cache mode flags check fixed.
> 
> v4:
>  * Fix intel_device_info->cache_modes array size. (Matt)
>  * Boolean cache mode and flags query. (Matt)
>  * Reduce number of cache macros with some macro magic.
>  * One more checkpatch fix.
>  * Tweak tables to show legacy and Gen12 WB is fully coherent.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> References: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_domain.c    |  60 +++++----
>  drivers/gpu/drm/i915/gem/i915_gem_domain.h    |   5 +-
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   3 +-
>  drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_mman.c      |   4 +-
>  drivers/gpu/drm/i915/gem/i915_gem_object.c    | 117 ++++++++++--------
>  drivers/gpu/drm/i915/gem/i915_gem_object.h    |  11 +-
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  | 116 +----------------
>  drivers/gpu/drm/i915/gem/i915_gem_shmem.c     |   8 +-
>  drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |   2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  20 +--
>  drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   2 +-
>  .../drm/i915/gem/selftests/huge_gem_object.c  |   2 +-
>  .../gpu/drm/i915/gem/selftests/huge_pages.c   |   3 +-
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  10 +-
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c     |   2 +-
>  drivers/gpu/drm/i915/gt/intel_ggtt.c          |  25 ++--
>  drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c     |   4 +-
>  drivers/gpu/drm/i915/gt/intel_gtt.c           |   2 +-
>  drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +-
>  drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   6 +-
>  .../gpu/drm/i915/gt/intel_ring_submission.c   |   4 +-
>  drivers/gpu/drm/i915/gt/intel_timeline.c      |   2 +-
>  drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   2 +-
>  .../gpu/drm/i915/gt/selftest_workarounds.c    |   2 +-
>  drivers/gpu/drm/i915/i915_cache.c             |  89 +++++++++++--
>  drivers/gpu/drm/i915/i915_cache.h             |  70 ++++++++++-
>  drivers/gpu/drm/i915/i915_debugfs.c           |  53 ++------
>  drivers/gpu/drm/i915/i915_driver.c            |   4 +-
>  drivers/gpu/drm/i915/i915_gem.c               |  13 --
>  drivers/gpu/drm/i915/i915_pci.c               |  84 +++++++------
>  drivers/gpu/drm/i915/i915_perf.c              |   2 +-
>  drivers/gpu/drm/i915/intel_device_info.h      |   6 +-
>  .../gpu/drm/i915/selftests/i915_gem_evict.c   |   4 +-
>  drivers/gpu/drm/i915/selftests/igt_spinner.c  |   2 +-
>  .../gpu/drm/i915/selftests/mock_gem_device.c  |  14 +--
>  36 files changed, 391 insertions(+), 367 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> index 57db9c581bf6..c15f83de33af 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> @@ -8,6 +8,7 @@
>  #include "display/intel_frontbuffer.h"
>  #include "gt/intel_gt.h"
>  
> +#include "i915_cache.h"
>  #include "i915_drv.h"
>  #include "i915_gem_clflush.h"
>  #include "i915_gem_domain.h"
> @@ -41,14 +42,17 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>  		return false;
>  
>  	/*
> -	 * For objects created by userspace through GEM_CREATE with pat_index
> -	 * set by set_pat extension, i915_gem_object_has_cache_level() will
> -	 * always return true, because the coherency of such object is managed
> -	 * by userspace. Othereise the call here would fall back to checking
> -	 * whether the object is un-cached or write-through.
> +	 * Always flush cache for UMD objects with PAT index set.
>  	 */
> -	return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
> -		 i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
> +	if (obj->pat_set_by_user)
> +		return true;
> +
> +	/*
> +	 * Fully coherent cached access may end up with data in the CPU cache
> +	 * which hasn't hit memory yet.
> +	 */
> +	return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
> +	       i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W);
>  }
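
For review purposes, the decision table the new check implements, assuming
the Meteorlake cache_modes table added later in this patch (PAT index in
parentheses):

        UC (2)                  -> no flush (nothing cached to write back)
        WT (1)                  -> no flush (writes go straight to memory)
        WB (0) / WB, 1-way (3)  -> no flush (GPU writes do not land in the CPU cache)
        WB, 2-way (4)           -> flush (GPU writes may sit in the CPU cache only)
        any mode, user-set PAT  -> flush (play it safe)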
>  
>  bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
> @@ -268,7 +272,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>  /**
>   * i915_gem_object_set_cache_level - Changes the cache-level of an object across all VMA.
>   * @obj: object to act on
> - * @cache_level: new cache level to set for the object
> + * @cache: new caching mode to set for the object
>   *
>   * After this function returns, the object will be in the new cache-level
>   * across all GTT and the contents of the backing storage will be coherent,
> @@ -281,18 +285,28 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>   * that all direct access to the scanout remains coherent.
>   */
>  int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> -				    enum i915_cache_level cache_level)
> +				    i915_cache_t cache)
>  {
> -	int ret;
> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +	int pat, ret;
>  
> -	/*
> -	 * For objects created by userspace through GEM_CREATE with pat_index
> -	 * set by set_pat extension, simply return 0 here without touching
> -	 * the cache setting, because such objects should have an immutable
> -	 * cache setting by desgin and always managed by userspace.
> -	 */
> -	if (i915_gem_object_has_cache_level(obj, cache_level))
> +	pat = i915_cache_find_pat(i915, cache);
> +	if (pat < 0) {
> +		char buf[I915_CACHE_NAME_LEN];
> +
> +		i915_cache_print(buf, sizeof(buf), NULL, cache);
> +		drm_err_ratelimited(&i915->drm,
> +				    "Attempting to use unknown caching mode %s!\n",
> +				    buf);
> +
> +		return -EINVAL;
> +	} else if (pat == obj->pat_index) {
>  		return 0;
> +	} else if (obj->pat_set_by_user) {
> +		drm_notice_once(&i915->drm,
> +				"Attempting to change caching mode on an object with fixed PAT!\n");
> +		return -EINVAL;
> +	}
>  
>  	ret = i915_gem_object_wait(obj,
>  				   I915_WAIT_INTERRUPTIBLE |
> @@ -302,7 +316,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>  		return ret;
>  
>  	/* Always invalidate stale cachelines */
> -	i915_gem_object_set_cache_coherency(obj, cache_level);
> +	i915_gem_object_set_pat_index(obj, pat);
>  	obj->cache_dirty = true;
>  
>  	/* The cache-level will be applied when each vma is rebound. */
> @@ -337,10 +351,10 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
>  		goto out;
>  	}
>  
> -	if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||
> -	    i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
> +	if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
> +	    i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W))
>  		args->caching = I915_CACHING_CACHED;
> -	else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
> +	else if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT))
>  		args->caching = I915_CACHING_DISPLAY;
>  	else
>  		args->caching = I915_CACHING_NONE;
> @@ -355,7 +369,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
>  	struct drm_i915_private *i915 = to_i915(dev);
>  	struct drm_i915_gem_caching *args = data;
>  	struct drm_i915_gem_object *obj;
> -	enum i915_cache_level level;
> +	i915_cache_t level;
>  	int ret = 0;
>  
>  	if (IS_DGFX(i915))
> @@ -378,7 +392,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
>  		if (!HAS_LLC(i915) && !HAS_SNOOP(i915))
>  			return -ENODEV;
>  
> -		level = I915_CACHE_LLC;
> +		level = I915_CACHE_CACHED;
>  		break;
>  	case I915_CACHING_DISPLAY:
>  		level = HAS_WT(i915) ? I915_CACHE_WT : I915_CACHE_NONE;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.h b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
> index 9622df962bfc..6da5c351f6fd 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
> @@ -6,10 +6,11 @@
>  #ifndef __I915_GEM_DOMAIN_H__
>  #define __I915_GEM_DOMAIN_H__
>  
> +#include "i915_cache.h"
> +
>  struct drm_i915_gem_object;
> -enum i915_cache_level;
>  
>  int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> -				    enum i915_cache_level cache_level);
> +				    i915_cache_t cache);
>  
>  #endif /* __I915_GEM_DOMAIN_H__ */
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 0a1d40220020..9d6e49c8a4c6 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -648,7 +648,8 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
>  	 */
>  	return (cache->has_llc ||
>  		obj->cache_dirty ||
> -		!i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
> +		!(obj->pat_set_by_user ||
> +		  i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)));
>  }
>  
>  static int eb_reserve_vma(struct i915_execbuffer *eb,
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> index 6bc26b4b06b8..88c360c3d6a3 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> @@ -170,7 +170,7 @@ __i915_gem_object_create_internal(struct drm_i915_private *i915,
>  	obj->read_domains = I915_GEM_DOMAIN_CPU;
>  	obj->write_domain = I915_GEM_DOMAIN_CPU;
>  
> -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
> +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>  	i915_gem_object_set_cache_coherency(obj, cache_level);
>  
>  	return obj;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> index aa4d842d4c5a..cd7f8ded0d6f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> @@ -382,7 +382,6 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>  		goto err_reset;
>  	}
>  
> -	/* Access to snoopable pages through the GTT is incoherent. */
>  	/*
>  	 * For objects created by userspace through GEM_CREATE with pat_index
>  	 * set by set_pat extension, coherency is managed by userspace, make
> @@ -391,7 +390,8 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>  	 * objects. Otherwise this helper function would fall back to checking
>  	 * whether the object is un-cached.
>  	 */
> -	if (!(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
> +	if (!((obj->pat_set_by_user ||
> +	       i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)) ||
>  	      HAS_LLC(i915))) {
>  		ret = -EFAULT;
>  		goto err_unpin;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 3dc4fbb67d2b..ec1f0be43d0d 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -45,33 +45,6 @@ static struct kmem_cache *slab_objects;
>  
>  static const struct drm_gem_object_funcs i915_gem_object_funcs;
>  
> -unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
> -				    enum i915_cache_level level)
> -{
> -	if (drm_WARN_ON(&i915->drm, level >= I915_MAX_CACHE_LEVEL))
> -		return 0;
> -
> -	return INTEL_INFO(i915)->cachelevel_to_pat[level];
> -}
> -
> -bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
> -				     enum i915_cache_level lvl)
> -{
> -	/*
> -	 * In case the pat_index is set by user space, this kernel mode
> -	 * driver should leave the coherency to be managed by user space,
> -	 * simply return true here.
> -	 */
> -	if (obj->pat_set_by_user)
> -		return true;
> -
> -	/*
> -	 * Otherwise the pat_index should have been converted from cache_level
> -	 * so that the following comparison is valid.
> -	 */
> -	return obj->pat_index == i915_gem_get_pat_index(obj_to_i915(obj), lvl);
> -}
> -
>  struct drm_i915_gem_object *i915_gem_object_alloc(void)
>  {
>  	struct drm_i915_gem_object *obj;
> @@ -144,30 +117,72 @@ void __i915_gem_object_fini(struct drm_i915_gem_object *obj)
>  	dma_resv_fini(&obj->base._resv);
>  }
>  
> +bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
> +				    enum i915_cache_mode mode)
> +{
> +	struct drm_i915_private *i915 = obj_to_i915(obj);
> +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
> +
> +	return I915_CACHE_MODE(cache) == mode;
> +}
> +
> +bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
> +				    unsigned int flag)
> +{
> +	struct drm_i915_private *i915 = obj_to_i915(obj);
> +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
> +
> +	return I915_CACHE_FLAGS(cache) & flag;
> +}
> +
> +static void __i915_gem_object_update_coherency(struct drm_i915_gem_object *obj)
> +{
> +	struct drm_i915_private *i915 = obj_to_i915(obj);
> +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
> +	const unsigned int flags = I915_CACHE_FLAGS(cache);
> +	const unsigned int mode = I915_CACHE_MODE(cache);
> +
> +	if (mode == I915_CACHE_MODE_WC ||
> +	    mode == I915_CACHE_MODE_WT ||
> +	    (mode == I915_CACHE_MODE_WB && (flags & I915_CACHE_FLAG_COH2W)))
> +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ |
> +				      I915_BO_CACHE_COHERENT_FOR_WRITE;
> +	else if (HAS_LLC(i915))
> +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
> +	else
> +		obj->cache_coherent = 0;
> +
> +	obj->cache_dirty =
> +		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
> +		!IS_DGFX(i915);
> +}
> +
>  /**
>   * i915_gem_object_set_cache_coherency - Mark up the object's coherency levels
> - * for a given cache_level
> + * for a given caching mode
>   * @obj: #drm_i915_gem_object
> - * @cache_level: cache level
> + * @cache: cache mode
>   */
>  void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
> -					 unsigned int cache_level)
> +					 i915_cache_t cache)
>  {
> -	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +	struct drm_i915_private *i915 = obj_to_i915(obj);
> +	int found;
>  
> -	obj->pat_index = i915_gem_get_pat_index(i915, cache_level);
> +	found = i915_cache_find_pat(i915, cache);
> +	if (found < 0) {
> +		char buf[I915_CACHE_NAME_LEN];
>  
> -	if (cache_level != I915_CACHE_NONE)
> -		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
> -				       I915_BO_CACHE_COHERENT_FOR_WRITE);
> -	else if (HAS_LLC(i915))
> -		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
> -	else
> -		obj->cache_coherent = 0;
> +		i915_cache_print(buf, sizeof(buf), NULL, cache);
> +		drm_err_ratelimited(&i915->drm, "Unknown cache mode %s!\n",
> +				    buf);
>  
> -	obj->cache_dirty =
> -		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
> -		!IS_DGFX(i915);
> +		found = i915->pat_uc;
> +	}
> +
> +	obj->pat_index = found;
> +
> +	__i915_gem_object_update_coherency(obj);
>  }
>  
>  /**
> @@ -181,24 +196,18 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>  void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
>  				   unsigned int pat_index)
>  {
> -	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>  
>  	if (obj->pat_index == pat_index)
>  		return;
>  
> +	if (drm_WARN_ON_ONCE(&i915->drm,
> +			     pat_index > INTEL_INFO(i915)->max_pat_index))
> +		return;
> +
>  	obj->pat_index = pat_index;
>  
> -	if (pat_index != i915_gem_get_pat_index(i915, I915_CACHE_NONE))
> -		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
> -				       I915_BO_CACHE_COHERENT_FOR_WRITE);
> -	else if (HAS_LLC(i915))
> -		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
> -	else
> -		obj->cache_coherent = 0;
> -
> -	obj->cache_dirty =
> -		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
> -		!IS_DGFX(i915);
> +	__i915_gem_object_update_coherency(obj);
>  }
>  
>  bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> index 884a17275b3a..a5d4ee19d9be 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> @@ -13,6 +13,7 @@
>  
>  #include "display/intel_frontbuffer.h"
>  #include "intel_memory_region.h"
> +#include "i915_cache.h"
>  #include "i915_gem_object_types.h"
>  #include "i915_gem_gtt.h"
>  #include "i915_gem_ww.h"
> @@ -32,10 +33,6 @@ static inline bool i915_gem_object_size_2big(u64 size)
>  	return false;
>  }
>  
> -unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
> -				    enum i915_cache_level level);
> -bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
> -				     enum i915_cache_level lvl);
>  void i915_gem_init__objects(struct drm_i915_private *i915);
>  
>  void i915_objects_module_exit(void);
> @@ -764,8 +761,12 @@ int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
>  				      bool intr);
>  bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj);
>  
> +bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
> +				    enum i915_cache_mode mode);
> +bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
> +				    unsigned int flag);
>  void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
> -					 unsigned int cache_level);
> +					 i915_cache_t cache);
>  void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
>  				   unsigned int pat_index);
>  bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index 8de2b91b3edf..6790e13ad262 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -14,6 +14,7 @@
>  #include <uapi/drm/i915_drm.h>
>  
>  #include "i915_active.h"
> +#include "i915_cache.h"
>  #include "i915_selftest.h"
>  #include "i915_vma_resource.h"
>  
> @@ -116,93 +117,6 @@ struct drm_i915_gem_object_ops {
>  	const char *name; /* friendly name for debug, e.g. lockdep classes */
>  };
>  
> -/**
> - * enum i915_cache_level - The supported GTT caching values for system memory
> - * pages.
> - *
> - * These translate to some special GTT PTE bits when binding pages into some
> - * address space. It also determines whether an object, or rather its pages are
> - * coherent with the GPU, when also reading or writing through the CPU cache
> - * with those pages.
> - *
> - * Userspace can also control this through struct drm_i915_gem_caching.
> - */
> -enum i915_cache_level {
> -	/**
> -	 * @I915_CACHE_NONE:
> -	 *
> -	 * GPU access is not coherent with the CPU cache. If the cache is dirty
> -	 * and we need the underlying pages to be coherent with some later GPU
> -	 * access then we need to manually flush the pages.
> -	 *
> -	 * On shared LLC platforms reads and writes through the CPU cache are
> -	 * still coherent even with this setting. See also
> -	 * &drm_i915_gem_object.cache_coherent for more details. Due to this we
> -	 * should only ever use uncached for scanout surfaces, otherwise we end
> -	 * up over-flushing in some places.
> -	 *
> -	 * This is the default on non-LLC platforms.
> -	 */
> -	I915_CACHE_NONE = 0,
> -	/**
> -	 * @I915_CACHE_LLC:
> -	 *
> -	 * GPU access is coherent with the CPU cache. If the cache is dirty,
> -	 * then the GPU will ensure that access remains coherent, when both
> -	 * reading and writing through the CPU cache. GPU writes can dirty the
> -	 * CPU cache.
> -	 *
> -	 * Not used for scanout surfaces.
> -	 *
> -	 * Applies to both platforms with shared LLC(HAS_LLC), and snooping
> -	 * based platforms(HAS_SNOOP).
> -	 *
> -	 * This is the default on shared LLC platforms.  The only exception is
> -	 * scanout objects, where the display engine is not coherent with the
> -	 * CPU cache. For such objects I915_CACHE_NONE or I915_CACHE_WT is
> -	 * automatically applied by the kernel in pin_for_display, if userspace
> -	 * has not done so already.
> -	 */
> -	I915_CACHE_LLC,
> -	/**
> -	 * @I915_CACHE_L3_LLC:
> -	 *
> -	 * Explicitly enable the Gfx L3 cache, with coherent LLC.
> -	 *
> -	 * The Gfx L3 sits between the domain specific caches, e.g
> -	 * sampler/render caches, and the larger LLC. LLC is coherent with the
> -	 * GPU, but L3 is only visible to the GPU, so likely needs to be flushed
> -	 * when the workload completes.
> -	 *
> -	 * Not used for scanout surfaces.
> -	 *
> -	 * Only exposed on some gen7 + GGTT. More recent hardware has dropped
> -	 * this explicit setting, where it should now be enabled by default.
> -	 */
> -	I915_CACHE_L3_LLC,
> -	/**
> -	 * @I915_CACHE_WT:
> -	 *
> -	 * Write-through. Used for scanout surfaces.
> -	 *
> -	 * The GPU can utilise the caches, while still having the display engine
> -	 * be coherent with GPU writes, as a result we don't need to flush the
> -	 * CPU caches when moving out of the render domain. This is the default
> -	 * setting chosen by the kernel, if supported by the HW, otherwise we
> -	 * fallback to I915_CACHE_NONE. On the CPU side writes through the CPU
> -	 * cache still need to be flushed, to remain coherent with the display
> -	 * engine.
> -	 */
> -	I915_CACHE_WT,
> -	/**
> -	 * @I915_MAX_CACHE_LEVEL:
> -	 *
> -	 * Mark the last entry in the enum. Used for defining cachelevel_to_pat
> -	 * array for cache_level to pat translation table.
> -	 */
> -	I915_MAX_CACHE_LEVEL,
> -};
> -
>  enum i915_map_type {
>  	I915_MAP_WB = 0,
>  	I915_MAP_WC,
> @@ -403,16 +317,6 @@ struct drm_i915_gem_object {
>  	/**
>  	 * @cache_coherent:
>  	 *
> -	 * Note: with the change above which replaced @cache_level with pat_index,
> -	 * the use of @cache_coherent is limited to the objects created by kernel
> -	 * or by userspace without pat index specified.
> -	 * Check for @pat_set_by_user to find out if an object has pat index set
> -	 * by userspace. The ioctl's to change cache settings have also been
> -	 * disabled for the objects with pat index set by userspace. Please don't
> -	 * assume @cache_coherent having the flags set as describe here. A helper
> -	 * function i915_gem_object_has_cache_level() provides one way to bypass
> -	 * the use of this field.
> -	 *
>  	 * Track whether the pages are coherent with the GPU if reading or
>  	 * writing through the CPU caches. The largely depends on the
>  	 * @cache_level setting.
> @@ -447,7 +351,7 @@ struct drm_i915_gem_object {
>  	 * flushing the surface just before doing the scanout.  This does mean
>  	 * we might unnecessarily flush non-scanout objects in some places, but
>  	 * the default assumption is that all normal objects should be using
> -	 * I915_CACHE_LLC, at least on platforms with the shared LLC.
> +	 * I915_CACHE_CACHED, at least on platforms with the shared LLC.
>  	 *
>  	 * Supported values:
>  	 *
> @@ -486,16 +390,6 @@ struct drm_i915_gem_object {
>  	/**
>  	 * @cache_dirty:
>  	 *
> -	 * Note: with the change above which replaced cache_level with pat_index,
> -	 * the use of @cache_dirty is limited to the objects created by kernel
> -	 * or by userspace without pat index specified.
> -	 * Check for @pat_set_by_user to find out if an object has pat index set
> -	 * by userspace. The ioctl's to change cache settings have also been
> -	 * disabled for the objects with pat_index set by userspace. Please don't
> -	 * assume @cache_dirty is set as describe here. Also see helper function
> -	 * i915_gem_object_has_cache_level() for possible ways to bypass the use
> -	 * of this field.
> -	 *
>  	 * Track if we are we dirty with writes through the CPU cache for this
>  	 * object. As a result reading directly from main memory might yield
>  	 * stale data.
> @@ -531,9 +425,9 @@ struct drm_i915_gem_object {
>  	 *
>  	 *   1. All userspace objects, by default, have @cache_level set as
>  	 *   I915_CACHE_NONE. The only exception is userptr objects, where we
> -	 *   instead force I915_CACHE_LLC, but we also don't allow userspace to
> -	 *   ever change the @cache_level for such objects. Another special case
> -	 *   is dma-buf, which doesn't rely on @cache_dirty,  but there we
> +	 *   instead force I915_CACHE_CACHED, but we also don't allow userspace
> +	 *   to ever change the @cache_level for such objects. Another special
> +	 *   case is dma-buf, which doesn't rely on @cache_dirty,  but there we
>  	 *   always do a forced flush when acquiring the pages, if there is a
>  	 *   chance that the pages can be read directly from main memory with
>  	 *   the GPU.
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> index 8f1633c3fb93..aba908f0349f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> @@ -584,7 +584,7 @@ static int shmem_object_init(struct intel_memory_region *mem,
>  	static struct lock_class_key lock_class;
>  	struct drm_i915_private *i915 = mem->i915;
>  	struct address_space *mapping;
> -	unsigned int cache_level;
> +	i915_cache_t cache;
>  	gfp_t mask;
>  	int ret;
>  
> @@ -628,11 +628,11 @@ static int shmem_object_init(struct intel_memory_region *mem,
>  		 * However, we maintain the display planes as UC, and so
>  		 * need to rebind when first used as such.
>  		 */
> -		cache_level = I915_CACHE_LLC;
> +		cache = I915_CACHE_CACHED;
>  	else
> -		cache_level = I915_CACHE_NONE;
> +		cache = I915_CACHE_NONE;
>  
> -	i915_gem_object_set_cache_coherency(obj, cache_level);
> +	i915_gem_object_set_cache_coherency(obj, cache);
>  
>  	i915_gem_object_init_memory_region(obj, mem);
>  
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> index 1c8eb806b7d3..cc907a1f1c53 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> @@ -691,7 +691,7 @@ static int __i915_gem_object_create_stolen(struct intel_memory_region *mem,
>  
>  	obj->stolen = stolen;
>  	obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
> -	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
> +	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>  	i915_gem_object_set_cache_coherency(obj, cache_level);
>  
>  	if (WARN_ON(!i915_gem_object_trylock(obj, NULL)))
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> index 6bd6c239f4ac..107176d1757b 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> @@ -48,14 +48,14 @@ void i915_ttm_migrate_set_ban_memcpy(bool ban)
>  }
>  #endif
>  
> -static enum i915_cache_level
> -i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
> -		     struct ttm_tt *ttm)
> +static i915_cache_t
> +i915_ttm_cache(struct drm_i915_private *i915, struct ttm_resource *res,
> +	       struct ttm_tt *ttm)
>  {
>  	return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
>  		!i915_ttm_gtt_binds_lmem(res) &&
> -		ttm->caching == ttm_cached) ? I915_CACHE_LLC :
> -		I915_CACHE_NONE;
> +		ttm->caching == ttm_cached) ? I915_CACHE_CACHED :
> +					      I915_CACHE_NONE;
>  }
>  
>  static unsigned int
> @@ -112,8 +112,8 @@ void i915_ttm_adjust_domains_after_move(struct drm_i915_gem_object *obj)
>  void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>  {
>  	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
> -	unsigned int cache_level;
>  	unsigned int mem_flags;
> +	i915_cache_t cache;
>  	unsigned int i;
>  	int mem_type;
>  
> @@ -126,13 +126,13 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>  	if (!bo->resource) {
>  		mem_flags = I915_BO_FLAG_STRUCT_PAGE;
>  		mem_type = I915_PL_SYSTEM;
> -		cache_level = I915_CACHE_NONE;
> +		cache = I915_CACHE_NONE;
>  	} else {
>  		mem_flags = i915_ttm_cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
>  			I915_BO_FLAG_STRUCT_PAGE;
>  		mem_type = bo->resource->mem_type;
> -		cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
> -						   bo->ttm);
> +		cache = i915_ttm_cache(to_i915(bo->base.dev), bo->resource,
> +				       bo->ttm);
>  	}
>  
>  	/*
> @@ -157,7 +157,7 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>  	obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
>  	obj->mem_flags |= mem_flags;
>  
> -	i915_gem_object_set_cache_coherency(obj, cache_level);
> +	i915_gem_object_set_cache_coherency(obj, cache);
>  }
>  
>  /**
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> index 1d3ebdf4069b..5d2891981bd4 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> @@ -553,7 +553,7 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
>  	obj->mem_flags = I915_BO_FLAG_STRUCT_PAGE;
>  	obj->read_domains = I915_GEM_DOMAIN_CPU;
>  	obj->write_domain = I915_GEM_DOMAIN_CPU;
> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>  
>  	obj->userptr.ptr = args->user_ptr;
>  	obj->userptr.notifier_seq = ULONG_MAX;
> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
> index bac957755068..77d04be5e9d7 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
> @@ -123,7 +123,7 @@ huge_gem_object(struct drm_i915_private *i915,
>  
>  	obj->read_domains = I915_GEM_DOMAIN_CPU;
>  	obj->write_domain = I915_GEM_DOMAIN_CPU;
> -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
> +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>  	i915_gem_object_set_cache_coherency(obj, cache_level);
>  	obj->scratch = phys_size;
>  
> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> index 6bddd733d796..6ca5b9dbc414 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> @@ -200,9 +200,9 @@ huge_pages_object(struct drm_i915_private *i915,
>  	obj->write_domain = I915_GEM_DOMAIN_CPU;
>  	obj->read_domains = I915_GEM_DOMAIN_CPU;
>  
> -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
> +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>  	i915_gem_object_set_cache_coherency(obj, cache_level);
>  
>  	obj->mm.page_mask = page_mask;
>  
>  	return obj;
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index 675f71f06e89..3c93a73cf6b1 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -16,11 +16,11 @@
>  #include "intel_gtt.h"
>  
>  static u64 gen8_pde_encode(const dma_addr_t addr,
> -			   const enum i915_cache_level level)
> +			   const enum i915_cache_mode cache_mode)
>  {
>  	u64 pde = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>  
> -	if (level != I915_CACHE_NONE)
> +	if (cache_mode != I915_CACHE_MODE_UC)
>  		pde |= PPAT_CACHED_PDE;
>  	else
>  		pde |= PPAT_UNCACHED;
> @@ -43,10 +43,10 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>  	 * See translation table defined by LEGACY_CACHELEVEL.
>  	 */
>  	switch (pat_index) {
> -	case I915_CACHE_NONE:
> +	case I915_CACHE_MODE_UC:
>  		pte |= PPAT_UNCACHED;
>  		break;
> -	case I915_CACHE_WT:
> +	case I915_CACHE_MODE_WT:
>  		pte |= PPAT_DISPLAY_ELLC;
>  		break;
>  	default:
> @@ -893,7 +893,7 @@ static int gen8_init_scratch(struct i915_address_space *vm)
>  		}
>  
>  		fill_px(obj, vm->scratch[i - 1]->encode);
> -		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_NONE);
> +		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_MODE_UC);
>  
>  		vm->scratch[i] = obj;
>  	}
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index ee15486fed0d..f1e59e512d14 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -1103,7 +1103,7 @@ static int init_status_page(struct intel_engine_cs *engine)
>  		return PTR_ERR(obj);
>  	}
>  
> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>  
>  	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>  	if (IS_ERR(vma)) {
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index fca61ddca8ad..ab5f654e7557 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -1011,11 +1011,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
>  	return ggtt_probe_common(ggtt, size);
>  }
>  
> -/*
> - * For pre-gen8 platforms pat_index is the same as enum i915_cache_level,
> - * so the switch-case statements in these PTE encode functions are still valid.
> - * See translation table LEGACY_CACHELEVEL.
> - */
>  static u64 snb_pte_encode(dma_addr_t addr,
>  			  unsigned int pat_index,
>  			  u32 flags)
> @@ -1023,11 +1018,11 @@ static u64 snb_pte_encode(dma_addr_t addr,
>  	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>  
>  	switch (pat_index) {
> -	case I915_CACHE_L3_LLC:
> -	case I915_CACHE_LLC:
> +	case I915_CACHE_MODE_WB:
> +	case __I915_CACHE_MODE_WB_L3:
>  		pte |= GEN6_PTE_CACHE_LLC;
>  		break;
> -	case I915_CACHE_NONE:
> +	case I915_CACHE_MODE_UC:
>  		pte |= GEN6_PTE_UNCACHED;
>  		break;
>  	default:
> @@ -1044,13 +1039,13 @@ static u64 ivb_pte_encode(dma_addr_t addr,
>  	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>  
>  	switch (pat_index) {
> -	case I915_CACHE_L3_LLC:
> +	case __I915_CACHE_MODE_WB_L3:
>  		pte |= GEN7_PTE_CACHE_L3_LLC;
>  		break;
> -	case I915_CACHE_LLC:
> +	case I915_CACHE_MODE_WB:
>  		pte |= GEN6_PTE_CACHE_LLC;
>  		break;
> -	case I915_CACHE_NONE:
> +	case I915_CACHE_MODE_UC:
>  		pte |= GEN6_PTE_UNCACHED;
>  		break;
>  	default:
> @@ -1069,7 +1064,7 @@ static u64 byt_pte_encode(dma_addr_t addr,
>  	if (!(flags & PTE_READ_ONLY))
>  		pte |= BYT_PTE_WRITEABLE;
>  
> -	if (pat_index != I915_CACHE_NONE)
> +	if (pat_index != I915_CACHE_MODE_UC)
>  		pte |= BYT_PTE_SNOOPED_BY_CPU_CACHES;
>  
>  	return pte;
> @@ -1081,7 +1076,7 @@ static u64 hsw_pte_encode(dma_addr_t addr,
>  {
>  	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>  
> -	if (pat_index != I915_CACHE_NONE)
> +	if (pat_index != I915_CACHE_MODE_UC)
>  		pte |= HSW_WB_LLC_AGE3;
>  
>  	return pte;
> @@ -1094,9 +1089,9 @@ static u64 iris_pte_encode(dma_addr_t addr,
>  	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>  
>  	switch (pat_index) {
> -	case I915_CACHE_NONE:
> +	case I915_CACHE_MODE_UC:
>  		break;
> -	case I915_CACHE_WT:
> +	case I915_CACHE_MODE_WT:
>  		pte |= HSW_WT_ELLC_LLC_AGE3;
>  		break;
>  	default:
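
One more thought on this file: these legacy PTE encoders still take
pat_index but now match it against enum i915_cache_mode values, so they
quietly rely on that enum staying 1:1 with the legacy PAT table (which is
presumably why __I915_CACHE_MODE_WB_L3 sits where it does in the enum).
Since the old BUILD_BUG_ON in i915_gem_init() goes away later in this
patch, maybe keep a compile-time guard somewhere, e.g. (sketch only):

        BUILD_BUG_ON(I915_CACHE_MODE_UC != 0 ||
                     I915_CACHE_MODE_WB != 1 ||
                     __I915_CACHE_MODE_WB_L3 != 2 ||
                     I915_CACHE_MODE_WT != 3);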
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
> index 866c416afb73..803c41ac4ccb 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
> @@ -21,7 +21,7 @@ static void gmch_ggtt_insert_page(struct i915_address_space *vm,
>  				  unsigned int pat_index,
>  				  u32 unused)
>  {
> -	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
> +	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
>  		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
>  
>  	intel_gmch_gtt_insert_page(addr, offset >> PAGE_SHIFT, flags);
> @@ -32,7 +32,7 @@ static void gmch_ggtt_insert_entries(struct i915_address_space *vm,
>  				     unsigned int pat_index,
>  				     u32 unused)
>  {
> -	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
> +	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
>  		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
>  
>  	intel_gmch_gtt_insert_sg_entries(vma_res->bi.pages, vma_res->start >> PAGE_SHIFT,
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index 065099362a98..48055304537a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -676,7 +676,7 @@ __vm_create_scratch_for_read(struct i915_address_space *vm, unsigned long size)
>  	if (IS_ERR(obj))
>  		return ERR_CAST(obj);
>  
> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>  
>  	vma = i915_vma_instance(obj, vm, NULL);
>  	if (IS_ERR(vma)) {
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index 7192a534a654..af4277c1d577 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -636,7 +636,8 @@ void
>  __set_pd_entry(struct i915_page_directory * const pd,
>  	       const unsigned short idx,
>  	       struct i915_page_table *pt,
> -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level));
> +	       u64 (*encode)(const dma_addr_t,
> +			     const enum i915_cache_mode cache_mode));
>  
>  #define set_pd_entry(pd, idx, to) \
>  	__set_pd_entry((pd), (idx), px_pt(to), gen8_pde_encode)
> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> index 436756bfbb1a..3e461d4f3693 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> @@ -98,14 +98,16 @@ void
>  __set_pd_entry(struct i915_page_directory * const pd,
>  	       const unsigned short idx,
>  	       struct i915_page_table * const to,
> -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level))
> +	       u64 (*encode)(const dma_addr_t,
> +			     const enum i915_cache_mode cache_mode))
>  {
>  	/* Each thread pre-pins the pd, and we may have a thread per pde. */
>  	GEM_BUG_ON(atomic_read(px_used(pd)) > NALLOC * I915_PDES);
>  
>  	atomic_inc(px_used(pd));
>  	pd->entry[idx] = to;
> -	write_dma_entry(px_base(pd), idx, encode(px_dma(to), I915_CACHE_LLC));
> +	write_dma_entry(px_base(pd), idx,
> +			encode(px_dma(to), I915_CACHE_MODE_WB));
>  }
>  
>  void
> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> index 92085ffd23de..9131d228d285 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> @@ -551,7 +551,9 @@ alloc_context_vma(struct intel_engine_cs *engine)
>  	 * later platforms don't have L3 control bits in the PTE.
>  	 */
>  	if (IS_IVYBRIDGE(i915))
> -		i915_gem_object_set_cache_coherency(obj, I915_CACHE_L3_LLC);
> +		i915_gem_object_set_cache_coherency(obj,
> +						    I915_CACHE_CACHED |
> +						    __I915_CACHE_FLAG(L3));
>  
>  	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>  	if (IS_ERR(vma)) {
> diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
> index b9640212d659..025ce54c886d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
> +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
> @@ -26,7 +26,7 @@ static struct i915_vma *hwsp_alloc(struct intel_gt *gt)
>  	if (IS_ERR(obj))
>  		return ERR_CAST(obj);
>  
> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>  
>  	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>  	if (IS_ERR(vma))
> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> index 8b0d84f2aad2..fc278fa463b0 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> @@ -64,7 +64,7 @@ static int hang_init(struct hang *h, struct intel_gt *gt)
>  		goto err_hws;
>  	}
>  
> -	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_LLC);
> +	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_CACHED);
>  	vaddr = i915_gem_object_pin_map_unlocked(h->hws, I915_MAP_WB);
>  	if (IS_ERR(vaddr)) {
>  		err = PTR_ERR(vaddr);
> diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> index 14a8b25b6204..d25990d33d44 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> @@ -111,7 +111,7 @@ read_nonprivs(struct intel_context *ce)
>  	if (IS_ERR(result))
>  		return result;
>  
> -	i915_gem_object_set_cache_coherency(result, I915_CACHE_LLC);
> +	i915_gem_object_set_cache_coherency(result, I915_CACHE_CACHED);
>  
>  	cs = i915_gem_object_pin_map_unlocked(result, I915_MAP_WB);
>  	if (IS_ERR(cs)) {
> diff --git a/drivers/gpu/drm/i915/i915_cache.c b/drivers/gpu/drm/i915/i915_cache.c
> index 06eb5933c719..f4ba1cb430d3 100644
> --- a/drivers/gpu/drm/i915/i915_cache.c
> +++ b/drivers/gpu/drm/i915/i915_cache.c
> @@ -6,13 +6,88 @@
>  #include "i915_cache.h"
>  #include "i915_drv.h"
>  
> -void i915_cache_init(struct drm_i915_private *i915)
> +int i915_cache_init(struct drm_i915_private *i915)
>  {
> -	i915->pat_uc = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
> -	drm_info(&i915->drm, "Using PAT index %u for uncached access\n",
> -		 i915->pat_uc);
> +	int ret;
>  
> -	i915->pat_wb = i915_gem_get_pat_index(i915, I915_CACHE_LLC);
> -	drm_info(&i915->drm, "Using PAT index %u for write-back access\n",
> -		 i915->pat_wb);
> +	ret = i915_cache_find_pat(i915, I915_CACHE_NONE);
> +	if (ret < 0) {
> +		drm_err(&i915->drm,
> +			"Failed to find PAT index for uncached access\n");
> +		return -ENODEV;
> +	}
> +	drm_info(&i915->drm, "Using PAT index %u for uncached access\n", ret);
> +	i915->pat_uc = ret;
> +
> +	ret = i915_cache_find_pat(i915, I915_CACHE_CACHED);
> +	if (ret < 0) {
> +		drm_err(&i915->drm,
> +			"Failed to find PAT index for write-back access\n");
> +		return -ENODEV;
> +	}
> +	drm_info(&i915->drm, "Using PAT index %u for write-back access\n", ret);
> +	i915->pat_wb = ret;
> +
> +	return 0;
> +}
> +
> +int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache)
> +{
> +	const struct intel_device_info *info = INTEL_INFO(i915);
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(info->cache_modes); i++) {
> +		if (info->cache_modes[i] == cache)
> +			return i;
> +	}
> +
> +	return -1;
> +}
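
To illustrate the lookup with the Meteorlake table from i915_pci.c below:

        /* I915_CACHE(WB, COH1W, COH2W) is MTL PAT entry 4. */
        int pat = i915_cache_find_pat(i915, I915_CACHE_CACHED);    /* 4 */

        /* Combinations absent from the table are rejected, not guessed. */
        pat = i915_cache_find_pat(i915, I915_CACHE(WB, L3));       /* -1 */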
> +
> +void i915_cache_print(char *buf, size_t buflen, const char *suffix,
> +		      i915_cache_t cache)
> +{
> +	const enum i915_cache_mode mode = I915_CACHE_MODE(cache);
> +	static const char * const mode_str[] = {
> +		[I915_CACHE_MODE_UC] = "UC",
> +		[I915_CACHE_MODE_WB] = "WB",
> +		[I915_CACHE_MODE_WT] = "WT",
> +		[I915_CACHE_MODE_WC] = "WC",
> +	};
> +	static const char * const flag_str[] = {
> +		[ilog2(I915_CACHE_FLAG_COH1W)] = "1-Way-Coherent",
> +		[ilog2(I915_CACHE_FLAG_COH2W)] = "2-Way-Coherent",
> +		[ilog2(I915_CACHE_FLAG_L3)] =    "L3",
> +		[ilog2(I915_CACHE_FLAG_CLOS1)] = "CLOS1",
> +		[ilog2(I915_CACHE_FLAG_CLOS2)] = "CLOS2",
> +	};
> +
> +	if (mode >= ARRAY_SIZE(mode_str) || !mode_str[mode]) {
> +		snprintf(buf, buflen, "0x%x%s", cache, suffix ?: "");
> +	} else {
> +		unsigned long flags = I915_CACHE_FLAGS(cache);
> +		unsigned long bit;
> +		int ret;
> +
> +		ret = snprintf(buf, buflen, "%s", mode_str[mode]);
> +		buf += ret;
> +		buflen -= ret;
> +
> +		/*
> +		 * Don't print "1-way-2-way", it would be confusing and 2-way
> +		 * implies 1-way anyway.
> +		 */
> +		if ((flags & (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W)) ==
> +		    (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W))
> +			flags &= ~I915_CACHE_FLAG_COH1W;
> +
> +		for_each_set_bit(bit, &flags, BITS_PER_TYPE(i915_cache_t)) {
> +			ret = snprintf(buf, buflen, "-%s", flag_str[bit]);
> +			buf += ret;
> +			buflen -= ret;
> +		}
> +
> +		if (suffix)
> +			snprintf(buf, buflen, "%s", suffix);
> +	}
>  }
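
For reference, the strings this produces for table entries used in this
patch (the debugfs code below passes "!" as the suffix for user-set PAT):

        I915_CACHE(UC)                    -> "UC"
        I915_CACHE(WT)                    -> "WT"
        I915_CACHE(WB, COH1W)             -> "WB-1-Way-Coherent"
        I915_CACHE(WB, COH1W, COH2W)      -> "WB-2-Way-Coherent"
        I915_CACHE(WB, COH1W, COH2W, L3)  -> "WB-2-Way-Coherent-L3"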
> diff --git a/drivers/gpu/drm/i915/i915_cache.h b/drivers/gpu/drm/i915/i915_cache.h
> index cb68936fb8a2..d9e97318b942 100644
> --- a/drivers/gpu/drm/i915/i915_cache.h
> +++ b/drivers/gpu/drm/i915/i915_cache.h
> @@ -6,8 +6,76 @@
>  #ifndef __I915_CACHE_H__
>  #define __I915_CACHE_H__
>  
> +#include <linux/types.h>
> +
> +struct drm_printer;
> +
>  struct drm_i915_private;
>  
> -void i915_cache_init(struct drm_i915_private *i915);
> +typedef u16 i915_cache_t;
> +
> +/* Cache modes */
> +enum i915_cache_mode {
> +	I915_CACHE_MODE_UC = 0,
> +	I915_CACHE_MODE_WB,
> +	__I915_CACHE_MODE_WB_L3, /* Special do-not-use entry for legacy 1:1 mapping. */
> +	I915_CACHE_MODE_WT,
> +	I915_CACHE_MODE_WC,
> +	I915_NUM_CACHE_MODES
> +};
> +
> +/* Cache mode flag bits */
> +#define I915_CACHE_FLAG_COH1W	(0x1)
> +#define I915_CACHE_FLAG_COH2W	(0x2) /* 1-way needs to be set too. */
> +#define I915_CACHE_FLAG_L3	(0x4)
> +#define I915_CACHE_FLAG_CLOS1	(0x8)
> +#define I915_CACHE_FLAG_CLOS2	(0x10)
> +
> +/*
> + * Overloaded I915_CACHE() macro based on:
> + *  https://stackoverflow.com/questions/3046889/optional-parameters-with-c-macros
> + *
> + * It is possible to call I915_CACHE with mode and zero or more flags as
> + * separate arguments. Ie these all work:
> + *
> + *   I915_CACHE(WB)
> + *   I915_CACHE(WB, COH1W, COH2W)
> + *   I915_CACHE(WB, COH1W, COH2W, L3)
> + */
> +
> +#define __I915_CACHE_FLAG(f) (I915_CACHE_FLAG_##f << 8)
> +#define __I915_CACHE(m, f) ((i915_cache_t)(I915_CACHE_MODE_##m | (f)))
> +
> +#define I915_CACHE_4(m, f1, f2, f3)	__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2) | __I915_CACHE_FLAG(f3))
> +#define I915_CACHE_3(m, f1, f2)		__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2))
> +#define I915_CACHE_2(m, f1)		__I915_CACHE(m, __I915_CACHE_FLAG(f1))
> +#define I915_CACHE_1(m)			__I915_CACHE(m, 0)
> +#define I915_CACHE_0(m)			__I915_CACHE(WC, 0)
> +
> +#define FUNC_CHOOSER(_f1, _f2, _f3, _f4, _f5, ...) _f5
> +#define FUNC_RECOMPOSER(argsWithParentheses) FUNC_CHOOSER argsWithParentheses
> +#define CHOOSE_FROM_ARG_COUNT(...) FUNC_RECOMPOSER((__VA_ARGS__, I915_CACHE_4, I915_CACHE_3, I915_CACHE_2, I915_CACHE_1, ))
> +#define NO_ARG_EXPANDER() ,,,I915_CACHE_0
> +#define MACRO_CHOOSER(...) CHOOSE_FROM_ARG_COUNT(NO_ARG_EXPANDER __VA_ARGS__ ())
> +
> +#define I915_CACHE(...) MACRO_CHOOSER(__VA_ARGS__)(__VA_ARGS__)
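
Spelling out a few expansions of the chooser, since the encoding is easy
to misread (low byte is the mode, high byte the flags):

        I915_CACHE(WB)                    == 0x0001
        I915_CACHE(WB, COH1W)             == 0x0101
        I915_CACHE(WB, COH1W, COH2W)      == 0x0301
        I915_CACHE(WB, COH1W, COH2W, L3)  == 0x0701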
> +
> +/* i915_cache_t mode and flags extraction helpers. */
> +#define I915_CACHE_MODE(cache) \
> +	((enum i915_cache_mode)(((i915_cache_t)(cache)) & 0xff))
> +#define I915_CACHE_FLAGS(cache) \
> +	((unsigned int)((((i915_cache_t)(cache) & 0xff00)) >> 8))
> +
> +/* Helpers for i915 caching modes. */
> +#define I915_CACHE_NONE		I915_CACHE(UC)
> +#define I915_CACHE_CACHED	I915_CACHE(WB, COH1W, COH2W)
> +#define I915_CACHE_WT		I915_CACHE(WT)
> +
> +int i915_cache_init(struct drm_i915_private *i915);
> +int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache);
> +void i915_cache_print(char *buf, size_t buflen, const char *suffix,
> +		      i915_cache_t cache);
> +
> +#define I915_CACHE_NAME_LEN (40)
>  
>  #endif /* __I915_CACHE_H__ */
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 4de44cf1026d..4ec292011546 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -140,57 +140,18 @@ static const char *stringify_vma_type(const struct i915_vma *vma)
>  	return "ppgtt";
>  }
>  
> -static const char *i915_cache_level_str(struct drm_i915_gem_object *obj)
> -{
> -	struct drm_i915_private *i915 = obj_to_i915(obj);
> -
> -	if (IS_METEORLAKE(i915)) {
> -		switch (obj->pat_index) {
> -		case 0: return " WB";
> -		case 1: return " WT";
> -		case 2: return " UC";
> -		case 3: return " WB (1-Way Coh)";
> -		case 4: return " WB (2-Way Coh)";
> -		default: return " not defined";
> -		}
> -	} else if (IS_PONTEVECCHIO(i915)) {
> -		switch (obj->pat_index) {
> -		case 0: return " UC";
> -		case 1: return " WC";
> -		case 2: return " WT";
> -		case 3: return " WB";
> -		case 4: return " WT (CLOS1)";
> -		case 5: return " WB (CLOS1)";
> -		case 6: return " WT (CLOS2)";
> -		case 7: return " WT (CLOS2)";
> -		default: return " not defined";
> -		}
> -	} else if (GRAPHICS_VER(i915) >= 12) {
> -		switch (obj->pat_index) {
> -		case 0: return " WB";
> -		case 1: return " WC";
> -		case 2: return " WT";
> -		case 3: return " UC";
> -		default: return " not defined";
> -		}
> -	} else {
> -		switch (obj->pat_index) {
> -		case 0: return " UC";
> -		case 1: return HAS_LLC(i915) ?
> -			       " LLC" : " snooped";
> -		case 2: return " L3+LLC";
> -		case 3: return " WT";
> -		default: return " not defined";
> -		}
> -	}
> -}
> -
>  void
>  i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>  {
> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +	char buf[I915_CACHE_NAME_LEN];
>  	struct i915_vma *vma;
>  	int pin_count = 0;
>  
> +	i915_cache_print(buf, sizeof(buf),
> +			 obj->pat_set_by_user ? "!" : NULL,
> +			 INTEL_INFO(i915)->cache_modes[obj->pat_index]);
> +
>  	seq_printf(m, "%pK: %c%c%c %8zdKiB %02x %02x %s%s%s",
>  		   &obj->base,
>  		   get_tiling_flag(obj),
> @@ -199,7 +160,7 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>  		   obj->base.size / 1024,
>  		   obj->read_domains,
>  		   obj->write_domain,
> -		   i915_cache_level_str(obj),
> +		   buf,
>  		   obj->mm.dirty ? " dirty" : "",
>  		   obj->mm.madv == I915_MADV_DONTNEED ? " purgeable" : "");
>  	if (obj->base.name)
> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> index bb2223cc3470..8663388a524f 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -241,7 +241,9 @@ static int i915_driver_early_probe(struct drm_i915_private *dev_priv)
>  	i915_memcpy_init_early(dev_priv);
>  	intel_runtime_pm_init_early(&dev_priv->runtime_pm);
>  
> -	i915_cache_init(dev_priv);
> +	ret = i915_cache_init(dev_priv);
> +	if (ret < 0)
> +		return ret;
>  
>  	ret = i915_workqueues_init(dev_priv);
>  	if (ret < 0)
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 896aa48ed089..814705cfeb12 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1144,19 +1144,6 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
>  	unsigned int i;
>  	int ret;
>  
> -	/*
> -	 * In the proccess of replacing cache_level with pat_index a tricky
> -	 * dependency is created on the definition of the enum i915_cache_level.
> -	 * in case this enum is changed, PTE encode would be broken.
> -	 * Add a WARNING here. And remove when we completely quit using this
> -	 * enum
> -	 */
> -	BUILD_BUG_ON(I915_CACHE_NONE != 0 ||
> -		     I915_CACHE_LLC != 1 ||
> -		     I915_CACHE_L3_LLC != 2 ||
> -		     I915_CACHE_WT != 3 ||
> -		     I915_MAX_CACHE_LEVEL != 4);
> -
>  	/* We need to fallback to 4K pages if host doesn't support huge gtt. */
>  	if (intel_vgpu_active(dev_priv) && !intel_vgpu_has_huge_gtt(dev_priv))
>  		RUNTIME_INFO(dev_priv)->page_sizes = I915_GTT_PAGE_SIZE_4K;
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index fcacdc21643c..565a60a1645d 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -32,6 +32,7 @@
>  #include "gt/intel_sa_media.h"
>  #include "gem/i915_gem_object_types.h"
>  
> +#include "i915_cache.h"
>  #include "i915_driver.h"
>  #include "i915_drv.h"
>  #include "i915_pci.h"
> @@ -43,36 +44,43 @@
>  	.__runtime.graphics.ip.ver = (x), \
>  	.__runtime.media.ip.ver = (x)
>  
> -#define LEGACY_CACHELEVEL \
> -	.cachelevel_to_pat = { \
> -		[I915_CACHE_NONE]   = 0, \
> -		[I915_CACHE_LLC]    = 1, \
> -		[I915_CACHE_L3_LLC] = 2, \
> -		[I915_CACHE_WT]     = 3, \
> +#define LEGACY_CACHE_MODES \
> +	.cache_modes = { \
> +		[I915_CACHE_MODE_UC] 	  = I915_CACHE(UC), \
> +		[I915_CACHE_MODE_WB] 	  = I915_CACHE(WB, COH1W, COH2W), \

Reading bspec 2863 (bdw) indicates that the CPU being able to snoop the
GPU's L3 was a new feature in gen8.  So for HSW and earlier, any
coherency was only 1-way (GPU could be coherent with CPU's caches, but
not vice-versa).  Only starting with gen8 did we get 2-way coherency as
an option where the CPU would also be coherent with the GPU cache (and
with gen8 and beyond you could still select 1-way instead of 2-way
coherency with instruction-level granularity via MOCS).  There are also
some legacy platforms (e.g., EHL/JSL on bspec 13948) where the IA wasn't
coherent with GPU L3 so we were back to 1-way coherency.

So should we split LEGACY_CACHE_MODES into two tables with different
coherency settings attached to I915_CACHE_MODE_WB?

> +		[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, COH2W, L3), \
> +		[I915_CACHE_MODE_WT] 	  = I915_CACHE(WT), \
>  	}
>  
> -#define TGL_CACHELEVEL \
> -	.cachelevel_to_pat = { \
> -		[I915_CACHE_NONE]   = 3, \
> -		[I915_CACHE_LLC]    = 0, \
> -		[I915_CACHE_L3_LLC] = 0, \
> -		[I915_CACHE_WT]     = 2, \
> +#define GEN12_CACHE_MODES \
> +	.cache_modes = { \
> +		[0] = I915_CACHE(WB, COH1W, COH2W), \
> +		[1] = I915_CACHE(WC), \
> +		[2] = I915_CACHE(WT), \
> +		[3] = I915_CACHE(UC), \
>  	}
>  
> -#define PVC_CACHELEVEL \
> -	.cachelevel_to_pat = { \
> -		[I915_CACHE_NONE]   = 0, \
> -		[I915_CACHE_LLC]    = 3, \
> -		[I915_CACHE_L3_LLC] = 3, \
> -		[I915_CACHE_WT]     = 2, \
> +/* FIXME: is WB 1-way or 2-way coherent for PAT indices 3, 5 and 7? */
> +
> +#define PVC_CACHE_MODES \
> +	.cache_modes = { \
> +		[0] = I915_CACHE(UC), \
> +		[1] = I915_CACHE(WC), \
> +		[2] = I915_CACHE(WT), \
> +		[3] = I915_CACHE(WB, COH1W), \
> +		[4] = I915_CACHE(WT, CLOS1), \
> +		[5] = I915_CACHE(WB, COH1W, CLOS1), \
> +		[6] = I915_CACHE(WT, CLOS2), \
> +		[7] = I915_CACHE(WB, COH1W, CLOS2), \
>  	}
>  
> -#define MTL_CACHELEVEL \
> -	.cachelevel_to_pat = { \
> -		[I915_CACHE_NONE]   = 2, \
> -		[I915_CACHE_LLC]    = 3, \
> -		[I915_CACHE_L3_LLC] = 3, \
> -		[I915_CACHE_WT]     = 1, \
> +#define MTL_CACHE_MODES \
> +	.cache_modes = { \
> +		[0] = I915_CACHE(WB), \
> +		[1] = I915_CACHE(WT), \
> +		[2] = I915_CACHE(UC), \
> +		[3] = I915_CACHE(WB, COH1W), \
> +		[4] = I915_CACHE(WB, COH1W, COH2W), \

We may want a comment on this one since the "2W" part is sort of a lie.
Bspec 63884 has a programming note for MTL that says

        "...Except for system atomics, setting Coherency Mode to 10 or
        11 results in this same one-way coherent behavior..."

So if we ask for 2W, we actually only get 1W behavior except in a very
narrow set of cases.


Matt

>  	}
>  
>  /* Keep in gen based order, and chronological order within a gen */
> @@ -97,7 +105,7 @@
>  	.max_pat_index = 3, \
>  	GEN_DEFAULT_PAGE_SIZES, \
>  	GEN_DEFAULT_REGIONS, \
> -	LEGACY_CACHELEVEL
> +	LEGACY_CACHE_MODES
>  
>  #define I845_FEATURES \
>  	GEN(2), \
> @@ -112,7 +120,7 @@
>  	.max_pat_index = 3, \
>  	GEN_DEFAULT_PAGE_SIZES, \
>  	GEN_DEFAULT_REGIONS, \
> -	LEGACY_CACHELEVEL
> +	LEGACY_CACHE_MODES
>  
>  static const struct intel_device_info i830_info = {
>  	I830_FEATURES,
> @@ -145,7 +153,7 @@ static const struct intel_device_info i865g_info = {
>  	.max_pat_index = 3, \
>  	GEN_DEFAULT_PAGE_SIZES, \
>  	GEN_DEFAULT_REGIONS, \
> -	LEGACY_CACHELEVEL
> +	LEGACY_CACHE_MODES
>  
>  static const struct intel_device_info i915g_info = {
>  	GEN3_FEATURES,
> @@ -208,7 +216,7 @@ static const struct intel_device_info pnv_m_info = {
>  	.max_pat_index = 3, \
>  	GEN_DEFAULT_PAGE_SIZES, \
>  	GEN_DEFAULT_REGIONS, \
> -	LEGACY_CACHELEVEL
> +	LEGACY_CACHE_MODES
>  
>  static const struct intel_device_info i965g_info = {
>  	GEN4_FEATURES,
> @@ -252,7 +260,7 @@ static const struct intel_device_info gm45_info = {
>  	.max_pat_index = 3, \
>  	GEN_DEFAULT_PAGE_SIZES, \
>  	GEN_DEFAULT_REGIONS, \
> -	LEGACY_CACHELEVEL
> +	LEGACY_CACHE_MODES
>  
>  static const struct intel_device_info ilk_d_info = {
>  	GEN5_FEATURES,
> @@ -282,7 +290,7 @@ static const struct intel_device_info ilk_m_info = {
>  	.__runtime.ppgtt_size = 31, \
>  	GEN_DEFAULT_PAGE_SIZES, \
>  	GEN_DEFAULT_REGIONS, \
> -	LEGACY_CACHELEVEL
> +	LEGACY_CACHE_MODES
>  
>  #define SNB_D_PLATFORM \
>  	GEN6_FEATURES, \
> @@ -330,7 +338,7 @@ static const struct intel_device_info snb_m_gt2_info = {
>  	.__runtime.ppgtt_size = 31, \
>  	GEN_DEFAULT_PAGE_SIZES, \
>  	GEN_DEFAULT_REGIONS, \
> -	LEGACY_CACHELEVEL
> +	LEGACY_CACHE_MODES
>  
>  #define IVB_D_PLATFORM \
>  	GEN7_FEATURES, \
> @@ -387,7 +395,7 @@ static const struct intel_device_info vlv_info = {
>  	.platform_engine_mask = BIT(RCS0) | BIT(VCS0) | BIT(BCS0),
>  	GEN_DEFAULT_PAGE_SIZES,
>  	GEN_DEFAULT_REGIONS,
> -	LEGACY_CACHELEVEL,
> +	LEGACY_CACHE_MODES
>  };
>  
>  #define G75_FEATURES  \
> @@ -473,7 +481,7 @@ static const struct intel_device_info chv_info = {
>  	.has_coherent_ggtt = false,
>  	GEN_DEFAULT_PAGE_SIZES,
>  	GEN_DEFAULT_REGIONS,
> -	LEGACY_CACHELEVEL,
> +	LEGACY_CACHE_MODES
>  };
>  
>  #define GEN9_DEFAULT_PAGE_SIZES \
> @@ -536,7 +544,7 @@ static const struct intel_device_info skl_gt4_info = {
>  	.max_pat_index = 3, \
>  	GEN9_DEFAULT_PAGE_SIZES, \
>  	GEN_DEFAULT_REGIONS, \
> -	LEGACY_CACHELEVEL
> +	LEGACY_CACHE_MODES
>  
>  static const struct intel_device_info bxt_info = {
>  	GEN9_LP_FEATURES,
> @@ -640,7 +648,7 @@ static const struct intel_device_info jsl_info = {
>  #define GEN12_FEATURES \
>  	GEN11_FEATURES, \
>  	GEN(12), \
> -	TGL_CACHELEVEL, \
> +	GEN12_CACHE_MODES, \
>  	.has_global_mocs = 1, \
>  	.has_pxp = 1, \
>  	.max_pat_index = 3
> @@ -708,7 +716,7 @@ static const struct intel_device_info adl_p_info = {
>  	.__runtime.graphics.ip.ver = 12, \
>  	.__runtime.graphics.ip.rel = 50, \
>  	XE_HP_PAGE_SIZES, \
> -	TGL_CACHELEVEL, \
> +	GEN12_CACHE_MODES, \
>  	.dma_mask_size = 46, \
>  	.has_3d_pipeline = 1, \
>  	.has_64bit_reloc = 1, \
> @@ -803,7 +811,7 @@ static const struct intel_device_info pvc_info = {
>  		BIT(VCS0) |
>  		BIT(CCS0) | BIT(CCS1) | BIT(CCS2) | BIT(CCS3),
>  	.require_force_probe = 1,
> -	PVC_CACHELEVEL,
> +	PVC_CACHE_MODES
>  };
>  
>  static const struct intel_gt_definition xelpmp_extra_gt[] = {
> @@ -838,7 +846,7 @@ static const struct intel_device_info mtl_info = {
>  	.memory_regions = REGION_SMEM | REGION_STOLEN_LMEM,
>  	.platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(CCS0),
>  	.require_force_probe = 1,
> -	MTL_CACHELEVEL,
> +	MTL_CACHE_MODES
>  };
>  
>  #undef PLATFORM
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 04bc1f4a1115..973175a64534 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1870,7 +1870,7 @@ static int alloc_oa_buffer(struct i915_perf_stream *stream)
>  		return PTR_ERR(bo);
>  	}
>  
> -	i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC);
> +	i915_gem_object_set_cache_coherency(bo, I915_CACHE_CACHED);
>  
>  	/* PreHSW required 512K alignment, HSW requires 16M */
>  	vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL);
> diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
> index dbfe6443457b..2ce13b7c48cb 100644
> --- a/drivers/gpu/drm/i915/intel_device_info.h
> +++ b/drivers/gpu/drm/i915/intel_device_info.h
> @@ -27,6 +27,8 @@
>  
>  #include <uapi/drm/i915_drm.h>
>  
> +#include "i915_cache.h"
> +
>  #include "intel_step.h"
>  
>  #include "gt/intel_engine_types.h"
> @@ -243,8 +245,8 @@ struct intel_device_info {
>  	 */
>  	const struct intel_runtime_info __runtime;
>  
> -	u32 cachelevel_to_pat[I915_MAX_CACHE_LEVEL];
> -	u32 max_pat_index;
> +	i915_cache_t cache_modes[8];
> +	unsigned int max_pat_index;
>  };
>  
>  struct intel_driver_caps {
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> index f910ec9b6d2b..ba821e48baa5 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> @@ -267,7 +267,7 @@ static int igt_evict_for_cache_color(void *arg)
>  		err = PTR_ERR(obj);
>  		goto cleanup;
>  	}
> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>  	quirk_add(obj, &objects);
>  
>  	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
> @@ -283,7 +283,7 @@ static int igt_evict_for_cache_color(void *arg)
>  		err = PTR_ERR(obj);
>  		goto cleanup;
>  	}
> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>  	quirk_add(obj, &objects);
>  
>  	/* Neighbouring; same colour - should fit */
> diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
> index 3c5e0952f1b8..4cfc5000d6ff 100644
> --- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
> +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
> @@ -23,7 +23,7 @@ int igt_spinner_init(struct igt_spinner *spin, struct intel_gt *gt)
>  		err = PTR_ERR(spin->hws);
>  		goto err;
>  	}
> -	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_LLC);
> +	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_CACHED);
>  
>  	spin->obj = i915_gem_object_create_internal(gt->i915, PAGE_SIZE);
>  	if (IS_ERR(spin->obj)) {
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> index 1d1a457e2aee..8ae77bcf27fa 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> @@ -126,13 +126,13 @@ static const struct intel_device_info mock_info = {
>  	.memory_regions = REGION_SMEM,
>  	.platform_engine_mask = BIT(0),
>  
> -	/* simply use legacy cache level for mock device */
> +	/* Simply use legacy cache modes for the mock device. */
>  	.max_pat_index = 3,
> -	.cachelevel_to_pat = {
> -		[I915_CACHE_NONE]   = 0,
> -		[I915_CACHE_LLC]    = 1,
> -		[I915_CACHE_L3_LLC] = 2,
> -		[I915_CACHE_WT]     = 3,
> +	.cache_modes = {
> +		[0] = I915_CACHE(UC),
> +		[1] = I915_CACHE(WB, COH1W),
> +		[2] = I915_CACHE(WB, COH1W, COH2W, L3),
> +		[3] = I915_CACHE(WT),
>  	},
>  };
>  
> @@ -181,7 +181,7 @@ struct drm_i915_private *mock_gem_device(void)
>  	/* Set up device info and initial runtime info. */
>  	intel_device_info_driver_create(i915, pdev->device, &mock_info);
>  
> -	i915_cache_init(i915);
> +	WARN_ON(i915_cache_init(i915));
>  
>  	dev_pm_domain_set(&pdev->dev, &pm_domain);
>  	pm_runtime_enable(&pdev->dev);
> -- 
> 2.39.2
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC 5/8] drm/i915: Improve the vm_fault_gtt user PAT index restriction
  2023-07-27 14:55   ` [Intel-gfx] " Tvrtko Ursulin
@ 2023-07-28  0:04     ` Matt Roper
  -1 siblings, 0 replies; 59+ messages in thread
From: Matt Roper @ 2023-07-28  0:04 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx, Fei Yang, dri-devel, Tvrtko Ursulin

On Thu, Jul 27, 2023 at 03:55:01PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Now that i915 understands the caching modes behind PAT indices, we can
> refine the check in vm_fault_gtt() to not reject the uncached PAT if it
> was set by userspace on a snoopable platform.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_mman.c | 14 +++-----------
>  1 file changed, 3 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> index cd7f8ded0d6f..9aa6ecf68432 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> @@ -382,17 +382,9 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>  		goto err_reset;
>  	}
>  
> -	/*
> -	 * For objects created by userspace through GEM_CREATE with pat_index
> -	 * set by set_pat extension, coherency is managed by userspace, make
> -	 * sure we don't fail handling the vm fault by calling
> -	 * i915_gem_object_has_cache_level() which always return true for such
> -	 * objects. Otherwise this helper function would fall back to checking
> -	 * whether the object is un-cached.
> -	 */
> -	if (!((obj->pat_set_by_user ||
> -	       i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)) ||
> -	      HAS_LLC(i915))) {
> +	/* Access to snoopable pages through the GTT is incoherent. */

This comment was removed in the previous patch, but now it came back
here.  Should we have just left it be in the previous patch?

I'm not really clear on what it means either.  Are we using "GTT" as
shorthand to refer to the aperture here?


Matt

> +	if (!i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC) &&
> +	    !HAS_LLC(i915)) {
>  		ret = -EFAULT;
>  		goto err_unpin;
>  	}
> -- 
> 2.39.2
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC 6/8] drm/i915: Lift the user PAT restriction from gpu_write_needs_clflush
  2023-07-27 14:55   ` [Intel-gfx] " Tvrtko Ursulin
@ 2023-07-28  0:05     ` Matt Roper
  -1 siblings, 0 replies; 59+ messages in thread
From: Matt Roper @ 2023-07-28  0:05 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx, Fei Yang, dri-devel, Tvrtko Ursulin

On Thu, Jul 27, 2023 at 03:55:02PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Now that i915 understands the caching modes behind PAT indices, and having
> also special cased the Meteorlake snooping fully coherent mode, we can
> remove the user PAT check from gpu_write_needs_clflush().
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>

> ---
>  drivers/gpu/drm/i915/gem/i915_gem_domain.c | 6 ------
>  1 file changed, 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> index c15f83de33af..bf3a2fa0e539 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> @@ -41,12 +41,6 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>  	if (IS_METEORLAKE(i915))
>  		return false;
>  
> -	/*
> -	 * Always flush cache for UMD objects with PAT index set.
> -	 */
> -	if (obj->pat_set_by_user)
> -		return true;
> -
>  	/*
>  	 * Fully coherent cached access may end up with data in the CPU cache
>  	 * which hasn't hit memory yet.
> -- 
> 2.39.2
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC 7/8] drm/i915: Lift the user PAT restriction from use_cpu_reloc
  2023-07-27 14:55   ` [Intel-gfx] " Tvrtko Ursulin
@ 2023-07-28  0:09     ` Matt Roper
  -1 siblings, 0 replies; 59+ messages in thread
From: Matt Roper @ 2023-07-28  0:09 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx, Fei Yang, dri-devel, Tvrtko Ursulin

On Thu, Jul 27, 2023 at 03:55:03PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Now that i915 understands the caching modes behind PAT indices, we can
> refine the check in use_cpu_reloc() to not reject the uncached PAT if it
> was set by userspace.
> 
> Instead it can decide based on the presence of full coherency which
> should be functionally equivalent on legacy platforms. We can ignore WT
> since it is only used by the display, and we can ignore Meteorlake since
> it will fail on the existing "has_llc" condition before the object cache
> mode check.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 9d6e49c8a4c6..f74b33670bad 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -640,16 +640,9 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
>  	if (DBG_FORCE_RELOC == FORCE_GTT_RELOC)
>  		return false;
>  
> -	/*
> -	 * For objects created by userspace through GEM_CREATE with pat_index
> -	 * set by set_pat extension, i915_gem_object_has_cache_level() always
> -	 * return true, otherwise the call would fall back to checking whether
> -	 * the object is un-cached.
> -	 */
>  	return (cache->has_llc ||
>  		obj->cache_dirty ||
> -		!(obj->pat_set_by_user ||
> -		  i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)));
> +		i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W));

My understanding of relocations is minimal, but does 2W actually matter
here (CPU snooping GPU caches)?  I would have expected only 1W coherency
to be necessary (GPU snooping CPU caches)?
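
I.e. something like this (untested, and assuming a COH1W counterpart
to I915_CACHE_FLAG_COH2W exists):

	return (cache->has_llc ||
		obj->cache_dirty ||
		i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH1W));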


Matt

>  }
>  
>  static int eb_reserve_vma(struct i915_execbuffer *eb,
> -- 
> 2.39.2
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC 4/8] drm/i915: Refactor PAT/object cache handling
  2023-07-27 23:57     ` [Intel-gfx] " Matt Roper
@ 2023-07-28  0:17       ` Matt Roper
  -1 siblings, 0 replies; 59+ messages in thread
From: Matt Roper @ 2023-07-28  0:17 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Fei Yang, Tvrtko Ursulin, Intel-gfx, dri-devel, Andi Shyti, Chris Wilson

On Thu, Jul 27, 2023 at 04:57:53PM -0700, Matt Roper wrote:
> On Thu, Jul 27, 2023 at 03:55:00PM +0100, Tvrtko Ursulin wrote:
> > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > 
> > Commit 9275277d5324 ("drm/i915: use pat_index instead of cache_level") has
> > introduced PAT indices to i915 internal APIs, partially replacing the
> > usage of driver internal cache_level, but has also added a few sub-
> > optimal design decisions which this patch tries to improve upon.
> > 
> > Principal change here is to invert the per platform cache level to PAT
> > index table which was added by the referenced commit, and by doing so
> > enable i915 to understand the cache mode between PAT indices, changing
> > them from opaque to transparent.
> > 
> > Once we have the inverted table we are able to remove the hidden false
> > "return true" from i915_gem_object_has_cache_level and make the involved
> > code path clearer.
> > 
> > To achieve this we replace the enum i915_cache_level with i915_cache_t,
> > composed of a more detailed representation of each cache mode (base mode
> > plus flags).
> > 
> > In this way we are able to express the differences between different
> > write-back mode coherency settings on Meteorlake, which in turn enables us
> > to map the i915 "cached" mode to the correct Meteorlake PAT index.
> > 
> > We can also replace the platform dependent cache mode to string code in
> > debugfs and elsewhere by the single implementation based on i915_cache_t.
> > 
> > v2:
> >  * Fix PAT-to-cache-mode table for PVC. (Fei)
> >  * Cache display caching mode too. (Fei)
> >  * Improve and document criteria in i915_gem_object_can_bypass_llc() (Matt)
> > 
> > v3:
> >  * Checkpatch issues.
> >  * Cache mode flags check fixed.
> > 
> > v4:
> >  * Fix intel_device_info->cache_modes array size. (Matt)
> >  * Boolean cache mode and flags query. (Matt)
> >  * Reduce number of cache macros with some macro magic.
> >  * One more checkpatch fix.
> >  * Tweak tables to show legacy and Gen12 WB is fully coherent.
> > 
> > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > References: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
> > Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> > Cc: Fei Yang <fei.yang@intel.com>
> > Cc: Andi Shyti <andi.shyti@linux.intel.com>
> > Cc: Matt Roper <matthew.d.roper@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_domain.c    |  60 +++++----
> >  drivers/gpu/drm/i915/gem/i915_gem_domain.h    |   5 +-
> >  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   3 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_mman.c      |   4 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_object.c    | 117 ++++++++++--------
> >  drivers/gpu/drm/i915/gem/i915_gem_object.h    |  11 +-
> >  .../gpu/drm/i915/gem/i915_gem_object_types.h  | 116 +----------------
> >  drivers/gpu/drm/i915/gem/i915_gem_shmem.c     |   8 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |   2 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  20 +--
> >  drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   2 +-
> >  .../drm/i915/gem/selftests/huge_gem_object.c  |   2 +-
> >  .../gpu/drm/i915/gem/selftests/huge_pages.c   |   3 +-
> >  drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  10 +-
> >  drivers/gpu/drm/i915/gt/intel_engine_cs.c     |   2 +-
> >  drivers/gpu/drm/i915/gt/intel_ggtt.c          |  25 ++--
> >  drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c     |   4 +-
> >  drivers/gpu/drm/i915/gt/intel_gtt.c           |   2 +-
> >  drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +-
> >  drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   6 +-
> >  .../gpu/drm/i915/gt/intel_ring_submission.c   |   4 +-
> >  drivers/gpu/drm/i915/gt/intel_timeline.c      |   2 +-
> >  drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   2 +-
> >  .../gpu/drm/i915/gt/selftest_workarounds.c    |   2 +-
> >  drivers/gpu/drm/i915/i915_cache.c             |  89 +++++++++++--
> >  drivers/gpu/drm/i915/i915_cache.h             |  70 ++++++++++-
> >  drivers/gpu/drm/i915/i915_debugfs.c           |  53 ++------
> >  drivers/gpu/drm/i915/i915_driver.c            |   4 +-
> >  drivers/gpu/drm/i915/i915_gem.c               |  13 --
> >  drivers/gpu/drm/i915/i915_pci.c               |  84 +++++++------
> >  drivers/gpu/drm/i915/i915_perf.c              |   2 +-
> >  drivers/gpu/drm/i915/intel_device_info.h      |   6 +-
> >  .../gpu/drm/i915/selftests/i915_gem_evict.c   |   4 +-
> >  drivers/gpu/drm/i915/selftests/igt_spinner.c  |   2 +-
> >  .../gpu/drm/i915/selftests/mock_gem_device.c  |  14 +--
> >  36 files changed, 391 insertions(+), 367 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> > index 57db9c581bf6..c15f83de33af 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> > @@ -8,6 +8,7 @@
> >  #include "display/intel_frontbuffer.h"
> >  #include "gt/intel_gt.h"
> >  
> > +#include "i915_cache.h"
> >  #include "i915_drv.h"
> >  #include "i915_gem_clflush.h"
> >  #include "i915_gem_domain.h"
> > @@ -41,14 +42,17 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
> >  		return false;
> >  
> >  	/*
> > -	 * For objects created by userspace through GEM_CREATE with pat_index
> > -	 * set by set_pat extension, i915_gem_object_has_cache_level() will
> > -	 * always return true, because the coherency of such object is managed
> > -	 * by userspace. Othereise the call here would fall back to checking
> > -	 * whether the object is un-cached or write-through.
> > +	 * Always flush cache for UMD objects with PAT index set.
> >  	 */
> > -	return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
> > -		 i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
> > +	if (obj->pat_set_by_user)
> > +		return true;
> > +
> > +	/*
> > +	 * Fully coherent cached access may end up with data in the CPU cache
> > +	 * which hasn't hit memory yet.
> > +	 */
> > +	return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
> > +	       i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W);
> >  }
> >  
> >  bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
> > @@ -268,7 +272,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
> >  /**
> >   * i915_gem_object_set_cache_level - Changes the cache-level of an object across all VMA.
> >   * @obj: object to act on
> > - * @cache_level: new cache level to set for the object
> > + * @cache: new caching mode to set for the object
> >   *
> >   * After this function returns, the object will be in the new cache-level
> >   * across all GTT and the contents of the backing storage will be coherent,
> > @@ -281,18 +285,28 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
> >   * that all direct access to the scanout remains coherent.
> >   */
> >  int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> > -				    enum i915_cache_level cache_level)
> > +				    i915_cache_t cache)
> >  {
> > -	int ret;
> > +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > +	int pat, ret;
> >  
> > -	/*
> > -	 * For objects created by userspace through GEM_CREATE with pat_index
> > -	 * set by set_pat extension, simply return 0 here without touching
> > -	 * the cache setting, because such objects should have an immutable
> > -	 * cache setting by desgin and always managed by userspace.
> > -	 */
> > -	if (i915_gem_object_has_cache_level(obj, cache_level))
> > +	pat = i915_cache_find_pat(i915, cache);
> > +	if (pat < 0) {
> > +		char buf[I915_CACHE_NAME_LEN];
> > +
> > +		i915_cache_print(buf, sizeof(buf), NULL, cache);
> > +		drm_err_ratelimited(&i915->drm,
> > +				    "Attempting to use unknown caching mode %s!\n",
> > +				    buf);
> > +
> > +		return -EINVAL;
> > +	} else if (pat == obj->pat_index) {
> >  		return 0;
> > +	} else if (obj->pat_set_by_user) {
> > +		drm_notice_once(&i915->drm,
> > +				"Attempting to change caching mode on an object with fixed PAT!\n");
> > +		return -EINVAL;
> > +	}
> >  
> >  	ret = i915_gem_object_wait(obj,
> >  				   I915_WAIT_INTERRUPTIBLE |
> > @@ -302,7 +316,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> >  		return ret;
> >  
> >  	/* Always invalidate stale cachelines */
> > -	i915_gem_object_set_cache_coherency(obj, cache_level);
> > +	i915_gem_object_set_pat_index(obj, pat);
> >  	obj->cache_dirty = true;
> >  
> >  	/* The cache-level will be applied when each vma is rebound. */
> > @@ -337,10 +351,10 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
> >  		goto out;
> >  	}
> >  
> > -	if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||
> > -	    i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
> > +	if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
> > +	    i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W))
> >  		args->caching = I915_CACHING_CACHED;
> > -	else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
> > +	else if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT))
> >  		args->caching = I915_CACHING_DISPLAY;
> >  	else
> >  		args->caching = I915_CACHING_NONE;
> > @@ -355,7 +369,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
> >  	struct drm_i915_private *i915 = to_i915(dev);
> >  	struct drm_i915_gem_caching *args = data;
> >  	struct drm_i915_gem_object *obj;
> > -	enum i915_cache_level level;
> > +	i915_cache_t level;
> >  	int ret = 0;
> >  
> >  	if (IS_DGFX(i915))
> > @@ -378,7 +392,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
> >  		if (!HAS_LLC(i915) && !HAS_SNOOP(i915))
> >  			return -ENODEV;
> >  
> > -		level = I915_CACHE_LLC;
> > +		level = I915_CACHE_CACHED;
> >  		break;
> >  	case I915_CACHING_DISPLAY:
> >  		level = HAS_WT(i915) ? I915_CACHE_WT : I915_CACHE_NONE;
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.h b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
> > index 9622df962bfc..6da5c351f6fd 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.h
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
> > @@ -6,10 +6,11 @@
> >  #ifndef __I915_GEM_DOMAIN_H__
> >  #define __I915_GEM_DOMAIN_H__
> >  
> > +#include "i915_cache.h"
> > +
> >  struct drm_i915_gem_object;
> > -enum i915_cache_level;
> >  
> >  int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> > -				    enum i915_cache_level cache_level);
> > +				    i915_cache_t cache);
> >  
> >  #endif /* __I915_GEM_DOMAIN_H__ */
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > index 0a1d40220020..9d6e49c8a4c6 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > @@ -648,7 +648,8 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
> >  	 */
> >  	return (cache->has_llc ||
> >  		obj->cache_dirty ||
> > -		!i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
> > +		!(obj->pat_set_by_user ||
> > +		  i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)));
> >  }
> >  
> >  static int eb_reserve_vma(struct i915_execbuffer *eb,
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> > index 6bc26b4b06b8..88c360c3d6a3 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> > @@ -170,7 +170,7 @@ __i915_gem_object_create_internal(struct drm_i915_private *i915,
> >  	obj->read_domains = I915_GEM_DOMAIN_CPU;
> >  	obj->write_domain = I915_GEM_DOMAIN_CPU;
> >  
> > -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
> > +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
> >  	i915_gem_object_set_cache_coherency(obj, cache_level);
> >  
> >  	return obj;
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > index aa4d842d4c5a..cd7f8ded0d6f 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > @@ -382,7 +382,6 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
> >  		goto err_reset;
> >  	}
> >  
> > -	/* Access to snoopable pages through the GTT is incoherent. */
> >  	/*
> >  	 * For objects created by userspace through GEM_CREATE with pat_index
> >  	 * set by set_pat extension, coherency is managed by userspace, make
> > @@ -391,7 +390,8 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
> >  	 * objects. Otherwise this helper function would fall back to checking
> >  	 * whether the object is un-cached.
> >  	 */
> > -	if (!(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
> > +	if (!((obj->pat_set_by_user ||
> > +	       i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)) ||
> >  	      HAS_LLC(i915))) {
> >  		ret = -EFAULT;
> >  		goto err_unpin;
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > index 3dc4fbb67d2b..ec1f0be43d0d 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > @@ -45,33 +45,6 @@ static struct kmem_cache *slab_objects;
> >  
> >  static const struct drm_gem_object_funcs i915_gem_object_funcs;
> >  
> > -unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
> > -				    enum i915_cache_level level)
> > -{
> > -	if (drm_WARN_ON(&i915->drm, level >= I915_MAX_CACHE_LEVEL))
> > -		return 0;
> > -
> > -	return INTEL_INFO(i915)->cachelevel_to_pat[level];
> > -}
> > -
> > -bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
> > -				     enum i915_cache_level lvl)
> > -{
> > -	/*
> > -	 * In case the pat_index is set by user space, this kernel mode
> > -	 * driver should leave the coherency to be managed by user space,
> > -	 * simply return true here.
> > -	 */
> > -	if (obj->pat_set_by_user)
> > -		return true;
> > -
> > -	/*
> > -	 * Otherwise the pat_index should have been converted from cache_level
> > -	 * so that the following comparison is valid.
> > -	 */
> > -	return obj->pat_index == i915_gem_get_pat_index(obj_to_i915(obj), lvl);
> > -}
> > -
> >  struct drm_i915_gem_object *i915_gem_object_alloc(void)
> >  {
> >  	struct drm_i915_gem_object *obj;
> > @@ -144,30 +117,72 @@ void __i915_gem_object_fini(struct drm_i915_gem_object *obj)
> >  	dma_resv_fini(&obj->base._resv);
> >  }
> >  
> > +bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
> > +				    enum i915_cache_mode mode)
> > +{
> > +	struct drm_i915_private *i915 = obj_to_i915(obj);
> > +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
> > +
> > +	return I915_CACHE_MODE(cache) == mode;
> > +}
> > +
> > +bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
> > +				    unsigned int flag)
> > +{
> > +	struct drm_i915_private *i915 = obj_to_i915(obj);
> > +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
> > +
> > +	return I915_CACHE_FLAGS(cache) & flag;
> > +}
> > +
> > +static void __i915_gem_object_update_coherency(struct drm_i915_gem_object *obj)
> > +{
> > +	struct drm_i915_private *i915 = obj_to_i915(obj);
> > +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
> > +	const unsigned int flags = I915_CACHE_FLAGS(cache);
> > +	const unsigned int mode = I915_CACHE_MODE(cache);
> > +
> > +	if (mode == I915_CACHE_MODE_WC ||
> > +	    mode == I915_CACHE_MODE_WT ||
> > +	    (mode == I915_CACHE_MODE_WB && (flags & I915_CACHE_FLAG_COH2W)))

Shouldn't we only need 1W coherency here?  With 1-way coherency GPU
reads will snoop the CPU cache and GPU writes will invalidate the CPU
cache.  2-way only matters for how CPU reads/writes interact with the
GPU cache.
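
I.e. (untested, assuming an I915_CACHE_FLAG_COH1W flag to match the
2W one):

	if (mode == I915_CACHE_MODE_WC ||
	    mode == I915_CACHE_MODE_WT ||
	    (mode == I915_CACHE_MODE_WB && (flags & I915_CACHE_FLAG_COH1W)))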


Matt

> > +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ |
> > +				      I915_BO_CACHE_COHERENT_FOR_WRITE;
> > +	else if (HAS_LLC(i915))
> > +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
> > +	else
> > +		obj->cache_coherent = 0;
> > +
> > +	obj->cache_dirty =
> > +		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
> > +		!IS_DGFX(i915);
> > +}
> > +
> >  /**
> >   * i915_gem_object_set_cache_coherency - Mark up the object's coherency levels
> > - * for a given cache_level
> > + * for a given caching mode
> >   * @obj: #drm_i915_gem_object
> > - * @cache_level: cache level
> > + * @cache: cache mode
> >   */
> >  void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
> > -					 unsigned int cache_level)
> > +					 i915_cache_t cache)
> >  {
> > -	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > +	struct drm_i915_private *i915 = obj_to_i915(obj);
> > +	int found;
> >  
> > -	obj->pat_index = i915_gem_get_pat_index(i915, cache_level);
> > +	found = i915_cache_find_pat(i915, cache);
> > +	if (found < 0) {
> > +		char buf[I915_CACHE_NAME_LEN];
> >  
> > -	if (cache_level != I915_CACHE_NONE)
> > -		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
> > -				       I915_BO_CACHE_COHERENT_FOR_WRITE);
> > -	else if (HAS_LLC(i915))
> > -		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
> > -	else
> > -		obj->cache_coherent = 0;
> > +		i915_cache_print(buf, sizeof(buf), NULL, cache);
> > +		drm_err_ratelimited(&i915->drm, "Unknown cache mode %s!\n",
> > +				    buf);
> >  
> > -	obj->cache_dirty =
> > -		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
> > -		!IS_DGFX(i915);
> > +		found = i915->pat_uc;
> > +	}
> > +
> > +	obj->pat_index = found;
> > +
> > +	__i915_gem_object_update_coherency(obj);
> >  }
> >  
> >  /**
> > @@ -181,24 +196,18 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
> >  void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
> >  				   unsigned int pat_index)
> >  {
> > -	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > +	struct drm_i915_private *i915 = obj_to_i915(obj);
> >  
> >  	if (obj->pat_index == pat_index)
> >  		return;
> >  
> > +	if (drm_WARN_ON_ONCE(&i915->drm,
> > +			     pat_index > INTEL_INFO(i915)->max_pat_index))
> > +		return;
> > +
> >  	obj->pat_index = pat_index;
> >  
> > -	if (pat_index != i915_gem_get_pat_index(i915, I915_CACHE_NONE))
> > -		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
> > -				       I915_BO_CACHE_COHERENT_FOR_WRITE);
> > -	else if (HAS_LLC(i915))
> > -		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
> > -	else
> > -		obj->cache_coherent = 0;
> > -
> > -	obj->cache_dirty =
> > -		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
> > -		!IS_DGFX(i915);
> > +	__i915_gem_object_update_coherency(obj);
> >  }
> >  
> >  bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > index 884a17275b3a..a5d4ee19d9be 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > @@ -13,6 +13,7 @@
> >  
> >  #include "display/intel_frontbuffer.h"
> >  #include "intel_memory_region.h"
> > +#include "i915_cache.h"
> >  #include "i915_gem_object_types.h"
> >  #include "i915_gem_gtt.h"
> >  #include "i915_gem_ww.h"
> > @@ -32,10 +33,6 @@ static inline bool i915_gem_object_size_2big(u64 size)
> >  	return false;
> >  }
> >  
> > -unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
> > -				    enum i915_cache_level level);
> > -bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
> > -				     enum i915_cache_level lvl);
> >  void i915_gem_init__objects(struct drm_i915_private *i915);
> >  
> >  void i915_objects_module_exit(void);
> > @@ -764,8 +761,12 @@ int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
> >  				      bool intr);
> >  bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj);
> >  
> > +bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
> > +				    enum i915_cache_mode mode);
> > +bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
> > +				    unsigned int flag);
> >  void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
> > -					 unsigned int cache_level);
> > +					 i915_cache_t cache);
> >  void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
> >  				   unsigned int pat_index);
> >  bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj);
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > index 8de2b91b3edf..6790e13ad262 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > @@ -14,6 +14,7 @@
> >  #include <uapi/drm/i915_drm.h>
> >  
> >  #include "i915_active.h"
> > +#include "i915_cache.h"
> >  #include "i915_selftest.h"
> >  #include "i915_vma_resource.h"
> >  
> > @@ -116,93 +117,6 @@ struct drm_i915_gem_object_ops {
> >  	const char *name; /* friendly name for debug, e.g. lockdep classes */
> >  };
> >  
> > -/**
> > - * enum i915_cache_level - The supported GTT caching values for system memory
> > - * pages.
> > - *
> > - * These translate to some special GTT PTE bits when binding pages into some
> > - * address space. It also determines whether an object, or rather its pages are
> > - * coherent with the GPU, when also reading or writing through the CPU cache
> > - * with those pages.
> > - *
> > - * Userspace can also control this through struct drm_i915_gem_caching.
> > - */
> > -enum i915_cache_level {
> > -	/**
> > -	 * @I915_CACHE_NONE:
> > -	 *
> > -	 * GPU access is not coherent with the CPU cache. If the cache is dirty
> > -	 * and we need the underlying pages to be coherent with some later GPU
> > -	 * access then we need to manually flush the pages.
> > -	 *
> > -	 * On shared LLC platforms reads and writes through the CPU cache are
> > -	 * still coherent even with this setting. See also
> > -	 * &drm_i915_gem_object.cache_coherent for more details. Due to this we
> > -	 * should only ever use uncached for scanout surfaces, otherwise we end
> > -	 * up over-flushing in some places.
> > -	 *
> > -	 * This is the default on non-LLC platforms.
> > -	 */
> > -	I915_CACHE_NONE = 0,
> > -	/**
> > -	 * @I915_CACHE_LLC:
> > -	 *
> > -	 * GPU access is coherent with the CPU cache. If the cache is dirty,
> > -	 * then the GPU will ensure that access remains coherent, when both
> > -	 * reading and writing through the CPU cache. GPU writes can dirty the
> > -	 * CPU cache.
> > -	 *
> > -	 * Not used for scanout surfaces.
> > -	 *
> > -	 * Applies to both platforms with shared LLC(HAS_LLC), and snooping
> > -	 * based platforms(HAS_SNOOP).
> > -	 *
> > -	 * This is the default on shared LLC platforms.  The only exception is
> > -	 * scanout objects, where the display engine is not coherent with the
> > -	 * CPU cache. For such objects I915_CACHE_NONE or I915_CACHE_WT is
> > -	 * automatically applied by the kernel in pin_for_display, if userspace
> > -	 * has not done so already.
> > -	 */
> > -	I915_CACHE_LLC,
> > -	/**
> > -	 * @I915_CACHE_L3_LLC:
> > -	 *
> > -	 * Explicitly enable the Gfx L3 cache, with coherent LLC.
> > -	 *
> > -	 * The Gfx L3 sits between the domain specific caches, e.g
> > -	 * sampler/render caches, and the larger LLC. LLC is coherent with the
> > -	 * GPU, but L3 is only visible to the GPU, so likely needs to be flushed
> > -	 * when the workload completes.
> > -	 *
> > -	 * Not used for scanout surfaces.
> > -	 *
> > -	 * Only exposed on some gen7 + GGTT. More recent hardware has dropped
> > -	 * this explicit setting, where it should now be enabled by default.
> > -	 */
> > -	I915_CACHE_L3_LLC,
> > -	/**
> > -	 * @I915_CACHE_WT:
> > -	 *
> > -	 * Write-through. Used for scanout surfaces.
> > -	 *
> > -	 * The GPU can utilise the caches, while still having the display engine
> > -	 * be coherent with GPU writes, as a result we don't need to flush the
> > -	 * CPU caches when moving out of the render domain. This is the default
> > -	 * setting chosen by the kernel, if supported by the HW, otherwise we
> > -	 * fallback to I915_CACHE_NONE. On the CPU side writes through the CPU
> > -	 * cache still need to be flushed, to remain coherent with the display
> > -	 * engine.
> > -	 */
> > -	I915_CACHE_WT,
> > -	/**
> > -	 * @I915_MAX_CACHE_LEVEL:
> > -	 *
> > -	 * Mark the last entry in the enum. Used for defining cachelevel_to_pat
> > -	 * array for cache_level to pat translation table.
> > -	 */
> > -	I915_MAX_CACHE_LEVEL,
> > -};
> > -
> >  enum i915_map_type {
> >  	I915_MAP_WB = 0,
> >  	I915_MAP_WC,
> > @@ -403,16 +317,6 @@ struct drm_i915_gem_object {
> >  	/**
> >  	 * @cache_coherent:
> >  	 *
> > -	 * Note: with the change above which replaced @cache_level with pat_index,
> > -	 * the use of @cache_coherent is limited to the objects created by kernel
> > -	 * or by userspace without pat index specified.
> > -	 * Check for @pat_set_by_user to find out if an object has pat index set
> > -	 * by userspace. The ioctl's to change cache settings have also been
> > -	 * disabled for the objects with pat index set by userspace. Please don't
> > -	 * assume @cache_coherent having the flags set as describe here. A helper
> > -	 * function i915_gem_object_has_cache_level() provides one way to bypass
> > -	 * the use of this field.
> > -	 *
> >  	 * Track whether the pages are coherent with the GPU if reading or
> >  	 * writing through the CPU caches. The largely depends on the
> >  	 * @cache_level setting.
> > @@ -447,7 +351,7 @@ struct drm_i915_gem_object {
> >  	 * flushing the surface just before doing the scanout.  This does mean
> >  	 * we might unnecessarily flush non-scanout objects in some places, but
> >  	 * the default assumption is that all normal objects should be using
> > -	 * I915_CACHE_LLC, at least on platforms with the shared LLC.
> > +	 * I915_CACHE_CACHED, at least on platforms with the shared LLC.
> >  	 *
> >  	 * Supported values:
> >  	 *
> > @@ -486,16 +390,6 @@ struct drm_i915_gem_object {
> >  	/**
> >  	 * @cache_dirty:
> >  	 *
> > -	 * Note: with the change above which replaced cache_level with pat_index,
> > -	 * the use of @cache_dirty is limited to the objects created by kernel
> > -	 * or by userspace without pat index specified.
> > -	 * Check for @pat_set_by_user to find out if an object has pat index set
> > -	 * by userspace. The ioctl's to change cache settings have also been
> > -	 * disabled for the objects with pat_index set by userspace. Please don't
> > -	 * assume @cache_dirty is set as describe here. Also see helper function
> > -	 * i915_gem_object_has_cache_level() for possible ways to bypass the use
> > -	 * of this field.
> > -	 *
> >  	 * Track if we are we dirty with writes through the CPU cache for this
> >  	 * object. As a result reading directly from main memory might yield
> >  	 * stale data.
> > @@ -531,9 +425,9 @@ struct drm_i915_gem_object {
> >  	 *
> >  	 *   1. All userspace objects, by default, have @cache_level set as
> >  	 *   I915_CACHE_NONE. The only exception is userptr objects, where we
> > -	 *   instead force I915_CACHE_LLC, but we also don't allow userspace to
> > -	 *   ever change the @cache_level for such objects. Another special case
> > -	 *   is dma-buf, which doesn't rely on @cache_dirty,  but there we
> > +	 *   instead force I915_CACHE_CACHED, but we also don't allow userspace
> > +	 *   to ever change the @cache_level for such objects. Another special
> > +	 *   case is dma-buf, which doesn't rely on @cache_dirty,  but there we
> >  	 *   always do a forced flush when acquiring the pages, if there is a
> >  	 *   chance that the pages can be read directly from main memory with
> >  	 *   the GPU.
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> > index 8f1633c3fb93..aba908f0349f 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> > @@ -584,7 +584,7 @@ static int shmem_object_init(struct intel_memory_region *mem,
> >  	static struct lock_class_key lock_class;
> >  	struct drm_i915_private *i915 = mem->i915;
> >  	struct address_space *mapping;
> > -	unsigned int cache_level;
> > +	i915_cache_t cache;
> >  	gfp_t mask;
> >  	int ret;
> >  
> > @@ -628,11 +628,11 @@ static int shmem_object_init(struct intel_memory_region *mem,
> >  		 * However, we maintain the display planes as UC, and so
> >  		 * need to rebind when first used as such.
> >  		 */
> > -		cache_level = I915_CACHE_LLC;
> > +		cache = I915_CACHE_CACHED;
> >  	else
> > -		cache_level = I915_CACHE_NONE;
> > +		cache = I915_CACHE_NONE;
> >  
> > -	i915_gem_object_set_cache_coherency(obj, cache_level);
> > +	i915_gem_object_set_cache_coherency(obj, cache);
> >  
> >  	i915_gem_object_init_memory_region(obj, mem);
> >  
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> > index 1c8eb806b7d3..cc907a1f1c53 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> > @@ -691,7 +691,7 @@ static int __i915_gem_object_create_stolen(struct intel_memory_region *mem,
> >  
> >  	obj->stolen = stolen;
> >  	obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
> > -	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
> > +	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
> >  	i915_gem_object_set_cache_coherency(obj, cache_level);
> >  
> >  	if (WARN_ON(!i915_gem_object_trylock(obj, NULL)))
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> > index 6bd6c239f4ac..107176d1757b 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> > @@ -48,14 +48,14 @@ void i915_ttm_migrate_set_ban_memcpy(bool ban)
> >  }
> >  #endif
> >  
> > -static enum i915_cache_level
> > -i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
> > -		     struct ttm_tt *ttm)
> > +static i915_cache_t
> > +i915_ttm_cache(struct drm_i915_private *i915, struct ttm_resource *res,
> > +	       struct ttm_tt *ttm)
> >  {
> >  	return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
> >  		!i915_ttm_gtt_binds_lmem(res) &&
> > -		ttm->caching == ttm_cached) ? I915_CACHE_LLC :
> > -		I915_CACHE_NONE;
> > +		ttm->caching == ttm_cached) ? I915_CACHE_CACHED :
> > +					      I915_CACHE_NONE;
> >  }
> >  
> >  static unsigned int
> > @@ -112,8 +112,8 @@ void i915_ttm_adjust_domains_after_move(struct drm_i915_gem_object *obj)
> >  void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
> >  {
> >  	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
> > -	unsigned int cache_level;
> >  	unsigned int mem_flags;
> > +	i915_cache_t cache;
> >  	unsigned int i;
> >  	int mem_type;
> >  
> > @@ -126,13 +126,13 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
> >  	if (!bo->resource) {
> >  		mem_flags = I915_BO_FLAG_STRUCT_PAGE;
> >  		mem_type = I915_PL_SYSTEM;
> > -		cache_level = I915_CACHE_NONE;
> > +		cache = I915_CACHE_NONE;
> >  	} else {
> >  		mem_flags = i915_ttm_cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
> >  			I915_BO_FLAG_STRUCT_PAGE;
> >  		mem_type = bo->resource->mem_type;
> > -		cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
> > -						   bo->ttm);
> > +		cache = i915_ttm_cache(to_i915(bo->base.dev), bo->resource,
> > +				       bo->ttm);
> >  	}
> >  
> >  	/*
> > @@ -157,7 +157,7 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
> >  	obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
> >  	obj->mem_flags |= mem_flags;
> >  
> > -	i915_gem_object_set_cache_coherency(obj, cache_level);
> > +	i915_gem_object_set_cache_coherency(obj, cache);
> >  }
> >  
> >  /**
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > index 1d3ebdf4069b..5d2891981bd4 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > @@ -553,7 +553,7 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> >  	obj->mem_flags = I915_BO_FLAG_STRUCT_PAGE;
> >  	obj->read_domains = I915_GEM_DOMAIN_CPU;
> >  	obj->write_domain = I915_GEM_DOMAIN_CPU;
> > -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> > +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
> >  
> >  	obj->userptr.ptr = args->user_ptr;
> >  	obj->userptr.notifier_seq = ULONG_MAX;
> > diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
> > index bac957755068..77d04be5e9d7 100644
> > --- a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
> > +++ b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
> > @@ -123,7 +123,7 @@ huge_gem_object(struct drm_i915_private *i915,
> >  
> >  	obj->read_domains = I915_GEM_DOMAIN_CPU;
> >  	obj->write_domain = I915_GEM_DOMAIN_CPU;
> > -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
> > +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
> >  	i915_gem_object_set_cache_coherency(obj, cache_level);
> >  	obj->scratch = phys_size;
> >  
> > diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> > index 6bddd733d796..6ca5b9dbc414 100644
> > --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> > +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> > @@ -200,9 +200,10 @@ huge_pages_object(struct drm_i915_private *i915,
> >  	obj->write_domain = I915_GEM_DOMAIN_CPU;
> >  	obj->read_domains = I915_GEM_DOMAIN_CPU;
> >  
> > -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
> > +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
> >  	i915_gem_object_set_cache_coherency(obj, cache_level);
> >  
> > +
> >  	obj->mm.page_mask = page_mask;
> >  
> >  	return obj;
> > diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> > index 675f71f06e89..3c93a73cf6b1 100644
> > --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> > +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> > @@ -16,11 +16,11 @@
> >  #include "intel_gtt.h"
> >  
> >  static u64 gen8_pde_encode(const dma_addr_t addr,
> > -			   const enum i915_cache_level level)
> > +			   const enum i915_cache_mode cache_mode)
> >  {
> >  	u64 pde = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
> >  
> > -	if (level != I915_CACHE_NONE)
> > +	if (cache_mode != I915_CACHE_MODE_UC)
> >  		pde |= PPAT_CACHED_PDE;
> >  	else
> >  		pde |= PPAT_UNCACHED;
> > @@ -43,10 +43,10 @@ static u64 gen8_pte_encode(dma_addr_t addr,
> >  	 * See translation table defined by LEGACY_CACHELEVEL.
> >  	 */
> >  	switch (pat_index) {
> > -	case I915_CACHE_NONE:
> > +	case I915_CACHE_MODE_UC:
> >  		pte |= PPAT_UNCACHED;
> >  		break;
> > -	case I915_CACHE_WT:
> > +	case I915_CACHE_MODE_WT:
> >  		pte |= PPAT_DISPLAY_ELLC;
> >  		break;
> >  	default:
> > @@ -893,7 +893,7 @@ static int gen8_init_scratch(struct i915_address_space *vm)
> >  		}
> >  
> >  		fill_px(obj, vm->scratch[i - 1]->encode);
> > -		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_NONE);
> > +		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_MODE_UC);
> >  
> >  		vm->scratch[i] = obj;
> >  	}
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > index ee15486fed0d..f1e59e512d14 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > @@ -1103,7 +1103,7 @@ static int init_status_page(struct intel_engine_cs *engine)
> >  		return PTR_ERR(obj);
> >  	}
> >  
> > -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> > +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
> >  
> >  	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
> >  	if (IS_ERR(vma)) {
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > index fca61ddca8ad..ab5f654e7557 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > @@ -1011,11 +1011,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
> >  	return ggtt_probe_common(ggtt, size);
> >  }
> >  
> > -/*
> > - * For pre-gen8 platforms pat_index is the same as enum i915_cache_level,
> > - * so the switch-case statements in these PTE encode functions are still valid.
> > - * See translation table LEGACY_CACHELEVEL.
> > - */
> >  static u64 snb_pte_encode(dma_addr_t addr,
> >  			  unsigned int pat_index,
> >  			  u32 flags)
> > @@ -1023,11 +1018,11 @@ static u64 snb_pte_encode(dma_addr_t addr,
> >  	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
> >  
> >  	switch (pat_index) {
> > -	case I915_CACHE_L3_LLC:
> > -	case I915_CACHE_LLC:
> > +	case I915_CACHE_MODE_WB:
> > +	case __I915_CACHE_MODE_WB_L3:
> >  		pte |= GEN6_PTE_CACHE_LLC;
> >  		break;
> > -	case I915_CACHE_NONE:
> > +	case I915_CACHE_MODE_UC:
> >  		pte |= GEN6_PTE_UNCACHED;
> >  		break;
> >  	default:
> > @@ -1044,13 +1039,13 @@ static u64 ivb_pte_encode(dma_addr_t addr,
> >  	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
> >  
> >  	switch (pat_index) {
> > -	case I915_CACHE_L3_LLC:
> > +	case __I915_CACHE_MODE_WB_L3:
> >  		pte |= GEN7_PTE_CACHE_L3_LLC;
> >  		break;
> > -	case I915_CACHE_LLC:
> > +	case I915_CACHE_MODE_WB:
> >  		pte |= GEN6_PTE_CACHE_LLC;
> >  		break;
> > -	case I915_CACHE_NONE:
> > +	case I915_CACHE_MODE_UC:
> >  		pte |= GEN6_PTE_UNCACHED;
> >  		break;
> >  	default:
> > @@ -1069,7 +1064,7 @@ static u64 byt_pte_encode(dma_addr_t addr,
> >  	if (!(flags & PTE_READ_ONLY))
> >  		pte |= BYT_PTE_WRITEABLE;
> >  
> > -	if (pat_index != I915_CACHE_NONE)
> > +	if (pat_index != I915_CACHE_MODE_UC)
> >  		pte |= BYT_PTE_SNOOPED_BY_CPU_CACHES;
> >  
> >  	return pte;
> > @@ -1081,7 +1076,7 @@ static u64 hsw_pte_encode(dma_addr_t addr,
> >  {
> >  	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
> >  
> > -	if (pat_index != I915_CACHE_NONE)
> > +	if (pat_index != I915_CACHE_MODE_UC)
> >  		pte |= HSW_WB_LLC_AGE3;
> >  
> >  	return pte;
> > @@ -1094,9 +1089,9 @@ static u64 iris_pte_encode(dma_addr_t addr,
> >  	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
> >  
> >  	switch (pat_index) {
> > -	case I915_CACHE_NONE:
> > +	case I915_CACHE_MODE_UC:
> >  		break;
> > -	case I915_CACHE_WT:
> > +	case I915_CACHE_MODE_WT:
> >  		pte |= HSW_WT_ELLC_LLC_AGE3;
> >  		break;
> >  	default:
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
> > index 866c416afb73..803c41ac4ccb 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
> > @@ -21,7 +21,7 @@ static void gmch_ggtt_insert_page(struct i915_address_space *vm,
> >  				  unsigned int pat_index,
> >  				  u32 unused)
> >  {
> > -	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
> > +	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
> >  		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
> >  
> >  	intel_gmch_gtt_insert_page(addr, offset >> PAGE_SHIFT, flags);
> > @@ -32,7 +32,7 @@ static void gmch_ggtt_insert_entries(struct i915_address_space *vm,
> >  				     unsigned int pat_index,
> >  				     u32 unused)
> >  {
> > -	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
> > +	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
> >  		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
> >  
> >  	intel_gmch_gtt_insert_sg_entries(vma_res->bi.pages, vma_res->start >> PAGE_SHIFT,
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > index 065099362a98..48055304537a 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > @@ -676,7 +676,7 @@ __vm_create_scratch_for_read(struct i915_address_space *vm, unsigned long size)
> >  	if (IS_ERR(obj))
> >  		return ERR_CAST(obj);
> >  
> > -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> > +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
> >  
> >  	vma = i915_vma_instance(obj, vm, NULL);
> >  	if (IS_ERR(vma)) {
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > index 7192a534a654..af4277c1d577 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > @@ -636,7 +636,8 @@ void
> >  __set_pd_entry(struct i915_page_directory * const pd,
> >  	       const unsigned short idx,
> >  	       struct i915_page_table *pt,
> > -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level));
> > +	       u64 (*encode)(const dma_addr_t,
> > +			     const enum i915_cache_mode cache_mode));
> >  
> >  #define set_pd_entry(pd, idx, to) \
> >  	__set_pd_entry((pd), (idx), px_pt(to), gen8_pde_encode)
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> > index 436756bfbb1a..3e461d4f3693 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> > @@ -98,14 +98,16 @@ void
> >  __set_pd_entry(struct i915_page_directory * const pd,
> >  	       const unsigned short idx,
> >  	       struct i915_page_table * const to,
> > -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level))
> > +	       u64 (*encode)(const dma_addr_t,
> > +			     const enum i915_cache_mode cache_mode))
> >  {
> >  	/* Each thread pre-pins the pd, and we may have a thread per pde. */
> >  	GEM_BUG_ON(atomic_read(px_used(pd)) > NALLOC * I915_PDES);
> >  
> >  	atomic_inc(px_used(pd));
> >  	pd->entry[idx] = to;
> > -	write_dma_entry(px_base(pd), idx, encode(px_dma(to), I915_CACHE_LLC));
> > +	write_dma_entry(px_base(pd), idx,
> > +			encode(px_dma(to), I915_CACHE_MODE_WB));
> >  }
> >  
> >  void
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > index 92085ffd23de..9131d228d285 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > @@ -551,7 +551,9 @@ alloc_context_vma(struct intel_engine_cs *engine)
> >  	 * later platforms don't have L3 control bits in the PTE.
> >  	 */
> >  	if (IS_IVYBRIDGE(i915))
> > -		i915_gem_object_set_cache_coherency(obj, I915_CACHE_L3_LLC);
> > +		i915_gem_object_set_cache_coherency(obj,
> > +						    I915_CACHE_CACHED |
> > +						    __I915_CACHE_FLAG(L3));
> >  
> >  	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
> >  	if (IS_ERR(vma)) {
> > diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
> > index b9640212d659..025ce54c886d 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
> > @@ -26,7 +26,7 @@ static struct i915_vma *hwsp_alloc(struct intel_gt *gt)
> >  	if (IS_ERR(obj))
> >  		return ERR_CAST(obj);
> >  
> > -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> > +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
> >  
> >  	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
> >  	if (IS_ERR(vma))
> > diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> > index 8b0d84f2aad2..fc278fa463b0 100644
> > --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> > +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> > @@ -64,7 +64,7 @@ static int hang_init(struct hang *h, struct intel_gt *gt)
> >  		goto err_hws;
> >  	}
> >  
> > -	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_LLC);
> > +	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_CACHED);
> >  	vaddr = i915_gem_object_pin_map_unlocked(h->hws, I915_MAP_WB);
> >  	if (IS_ERR(vaddr)) {
> >  		err = PTR_ERR(vaddr);
> > diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> > index 14a8b25b6204..d25990d33d44 100644
> > --- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> > +++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> > @@ -111,7 +111,7 @@ read_nonprivs(struct intel_context *ce)
> >  	if (IS_ERR(result))
> >  		return result;
> >  
> > -	i915_gem_object_set_cache_coherency(result, I915_CACHE_LLC);
> > +	i915_gem_object_set_cache_coherency(result, I915_CACHE_CACHED);
> >  
> >  	cs = i915_gem_object_pin_map_unlocked(result, I915_MAP_WB);
> >  	if (IS_ERR(cs)) {
> > diff --git a/drivers/gpu/drm/i915/i915_cache.c b/drivers/gpu/drm/i915/i915_cache.c
> > index 06eb5933c719..f4ba1cb430d3 100644
> > --- a/drivers/gpu/drm/i915/i915_cache.c
> > +++ b/drivers/gpu/drm/i915/i915_cache.c
> > @@ -6,13 +6,88 @@
> >  #include "i915_cache.h"
> >  #include "i915_drv.h"
> >  
> > -void i915_cache_init(struct drm_i915_private *i915)
> > +int i915_cache_init(struct drm_i915_private *i915)
> >  {
> > -	i915->pat_uc = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
> > -	drm_info(&i915->drm, "Using PAT index %u for uncached access\n",
> > -		 i915->pat_uc);
> > +	int ret;
> >  
> > -	i915->pat_wb = i915_gem_get_pat_index(i915, I915_CACHE_LLC);
> > -	drm_info(&i915->drm, "Using PAT index %u for write-back access\n",
> > -		 i915->pat_wb);
> > +	ret = i915_cache_find_pat(i915, I915_CACHE_NONE);
> > +	if (ret < 0) {
> > +		drm_err(&i915->drm,
> > +			"Failed to find PAT index for uncached access\n");
> > +		return -ENODEV;
> > +	}
> > +	drm_info(&i915->drm, "Using PAT index %u for uncached access\n", ret);
> > +	i915->pat_uc = ret;
> > +
> > +	ret = i915_cache_find_pat(i915, I915_CACHE_CACHED);
> > +	if (ret < 0) {
> > +		drm_err(&i915->drm,
> > +			"Failed to find PAT index for write-back access\n");
> > +		return -ENODEV;
> > +	}
> > +	drm_info(&i915->drm, "Using PAT index %u for write-back access\n", ret);
> > +	i915->pat_wb = ret;
> > +
> > +	return 0;
> > +}
> > +
> > +int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache)
> > +{
> > +	const struct intel_device_info *info = INTEL_INFO(i915);
> > +	int i;
> > +
> > +	for (i = 0; i < ARRAY_SIZE(info->cache_modes); i++) {
> > +		if (info->cache_modes[i] == cache)
> > +			return i;
> > +	}
> > +
> > +	return -1;
> > +}
> > +
> > +void i915_cache_print(char *buf, size_t buflen, const char *suffix,
> > +		      i915_cache_t cache)
> > +{
> > +	const enum i915_cache_mode mode = I915_CACHE_MODE(cache);
> > +	static const char * const mode_str[] = {
> > +		[I915_CACHE_MODE_UC] = "UC",
> > +		[I915_CACHE_MODE_WB] = "WB",
> > +		[I915_CACHE_MODE_WT] = "WT",
> > +		[I915_CACHE_MODE_WC] = "WC",
> > +	};
> > +	static const char * const flag_str[] = {
> > +		[ilog2(I915_CACHE_FLAG_COH1W)] = "1-Way-Coherent",
> > +		[ilog2(I915_CACHE_FLAG_COH2W)] = "2-Way-Coherent",
> > +		[ilog2(I915_CACHE_FLAG_L3)] =    "L3",
> > +		[ilog2(I915_CACHE_FLAG_CLOS1)] = "CLOS1",
> > +		[ilog2(I915_CACHE_FLAG_CLOS2)] = "CLOS2",
> > +	};
> > +
> > +	if (mode >= ARRAY_SIZE(mode_str) || !mode_str[mode]) {
> > +		snprintf(buf, buflen, "0x%x%s", cache, suffix ?: "");
> > +	} else {
> > +		unsigned long flags = I915_CACHE_FLAGS(cache);
> > +		unsigned long bit;
> > +		int ret;
> > +
> > +		ret = snprintf(buf, buflen, "%s", mode_str[mode]);
> > +		buf += ret;
> > +		buflen -= ret;
> > +
> > +		/*
> > +		 * Don't print "1-way-2-way", it would be confusing and 2-way
> > +		 * implies 1-way anyway.
> > +		 */
> > +		if ((flags & (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W)) ==
> > +		    (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W))
> > +			flags &= ~I915_CACHE_FLAG_COH1W;
> > +
> > +		for_each_set_bit(bit, &flags, BITS_PER_TYPE(i915_cache_t)) {
> > +			ret = snprintf(buf, buflen, "-%s", flag_str[bit]);
> > +			buf += ret;
> > +			buflen -= ret;
> > +		}
> > +
> > +		if (suffix)
> > +			snprintf(buf, buflen, "%s", suffix);
> > +	}
> >  }
> > diff --git a/drivers/gpu/drm/i915/i915_cache.h b/drivers/gpu/drm/i915/i915_cache.h
> > index cb68936fb8a2..d9e97318b942 100644
> > --- a/drivers/gpu/drm/i915/i915_cache.h
> > +++ b/drivers/gpu/drm/i915/i915_cache.h
> > @@ -6,8 +6,76 @@
> >  #ifndef __I915_CACHE_H__
> >  #define __I915_CACHE_H__
> >  
> > +#include <linux/types.h>
> > +
> > +struct drm_printer;
> > +
> >  struct drm_i915_private;
> >  
> > -void i915_cache_init(struct drm_i915_private *i915);
> > +typedef u16 i915_cache_t;
> > +
> > +/* Cache modes */
> > +enum i915_cache_mode {
> > +	I915_CACHE_MODE_UC = 0,
> > +	I915_CACHE_MODE_WB,
> > +	__I915_CACHE_MODE_WB_L3, /* Special do-not-use entry for legacy 1:1 mapping. */
> > +	I915_CACHE_MODE_WT,
> > +	I915_CACHE_MODE_WC,
> > +	I915_NUM_CACHE_MODES
> > +};
> > +
> > +/* Cache mode flag bits */
> > +#define I915_CACHE_FLAG_COH1W	(0x1)
> > +#define I915_CACHE_FLAG_COH2W	(0x2) /* 1-way needs to be set too. */
> > +#define I915_CACHE_FLAG_L3	(0x4)
> > +#define I915_CACHE_FLAG_CLOS1	(0x8)
> > +#define I915_CACHE_FLAG_CLOS2	(0x10)
> > +
> > +/*
> > + * Overloaded I915_CACHE() macro based on:
> > + *  https://stackoverflow.com/questions/3046889/optional-parameters-with-c-macros
> > + *
> > + * It is possible to call I915_CACHE with mode and zero or more flags as
> > + * separate arguments. Ie these all work:
> > + *
> > + *   I915_CACHE(WB)
> > + *   I915_CACHE(WB, COH1W, COH2W)
> > + *   I915_CACHE(WB, COH1W, COH2W, L3)
> > + */
> > +
> > +#define __I915_CACHE_FLAG(f) (I915_CACHE_FLAG_##f << 8)
> > +#define __I915_CACHE(m, f) ((i915_cache_t)(I915_CACHE_MODE_##m | (f)))
> > +
> > +#define I915_CACHE_4(m, f1, f2, f3)	__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2) | __I915_CACHE_FLAG(f3))
> > +#define I915_CACHE_3(m, f1, f2)		__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2))
> > +#define I915_CACHE_2(m, f1)		__I915_CACHE(m, __I915_CACHE_FLAG(f1))
> > +#define I915_CACHE_1(m)			__I915_CACHE(m, 0)
> > +#define I915_CACHE_0(m)			__I915_CACHE(WC, 0)
> > +
> > +#define FUNC_CHOOSER(_f1, _f2, _f3, _f4, _f5, ...) _f5
> > +#define FUNC_RECOMPOSER(argsWithParentheses) FUNC_CHOOSER argsWithParentheses
> > +#define CHOOSE_FROM_ARG_COUNT(...) FUNC_RECOMPOSER((__VA_ARGS__, I915_CACHE_4, I915_CACHE_3, I915_CACHE_2, I915_CACHE_1, ))
> > +#define NO_ARG_EXPANDER() ,,,I915_CACHE_0
> > +#define MACRO_CHOOSER(...) CHOOSE_FROM_ARG_COUNT(NO_ARG_EXPANDER __VA_ARGS__ ())
> > +
> > +#define I915_CACHE(...) MACRO_CHOOSER(__VA_ARGS__)(__VA_ARGS__)
> > +
> > +/* i915_cache_t mode and flags extraction helpers. */
> > +#define I915_CACHE_MODE(cache) \
> > +	((enum i915_cache_mode)(((i915_cache_t)(cache)) & 0xff))
> > +#define I915_CACHE_FLAGS(cache) \
> > +	((unsigned int)((((i915_cache_t)(cache) & 0xff00)) >> 8))
> > +
> > +/* Helpers for i915 caching modes. */
> > +#define I915_CACHE_NONE		I915_CACHE(UC)
> > +#define I915_CACHE_CACHED	I915_CACHE(WB, COH1W, COH2W)
> > +#define I915_CACHE_WT		I915_CACHE(WT)
> > +
> > +int i915_cache_init(struct drm_i915_private *i915);
> > +int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache);
> > +void i915_cache_print(char *buf, size_t buflen, const char *suffix,
> > +		      i915_cache_t cache);
> > +
> > +#define I915_CACHE_NAME_LEN (40)
> >  
> >  #endif /* __I915_CACHE_H__ */
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > index 4de44cf1026d..4ec292011546 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -140,57 +140,18 @@ static const char *stringify_vma_type(const struct i915_vma *vma)
> >  	return "ppgtt";
> >  }
> >  
> > -static const char *i915_cache_level_str(struct drm_i915_gem_object *obj)
> > -{
> > -	struct drm_i915_private *i915 = obj_to_i915(obj);
> > -
> > -	if (IS_METEORLAKE(i915)) {
> > -		switch (obj->pat_index) {
> > -		case 0: return " WB";
> > -		case 1: return " WT";
> > -		case 2: return " UC";
> > -		case 3: return " WB (1-Way Coh)";
> > -		case 4: return " WB (2-Way Coh)";
> > -		default: return " not defined";
> > -		}
> > -	} else if (IS_PONTEVECCHIO(i915)) {
> > -		switch (obj->pat_index) {
> > -		case 0: return " UC";
> > -		case 1: return " WC";
> > -		case 2: return " WT";
> > -		case 3: return " WB";
> > -		case 4: return " WT (CLOS1)";
> > -		case 5: return " WB (CLOS1)";
> > -		case 6: return " WT (CLOS2)";
> > -		case 7: return " WT (CLOS2)";
> > -		default: return " not defined";
> > -		}
> > -	} else if (GRAPHICS_VER(i915) >= 12) {
> > -		switch (obj->pat_index) {
> > -		case 0: return " WB";
> > -		case 1: return " WC";
> > -		case 2: return " WT";
> > -		case 3: return " UC";
> > -		default: return " not defined";
> > -		}
> > -	} else {
> > -		switch (obj->pat_index) {
> > -		case 0: return " UC";
> > -		case 1: return HAS_LLC(i915) ?
> > -			       " LLC" : " snooped";
> > -		case 2: return " L3+LLC";
> > -		case 3: return " WT";
> > -		default: return " not defined";
> > -		}
> > -	}
> > -}
> > -
> >  void
> >  i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
> >  {
> > +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > +	char buf[I915_CACHE_NAME_LEN];
> >  	struct i915_vma *vma;
> >  	int pin_count = 0;
> >  
> > +	i915_cache_print(buf, sizeof(buf),
> > +			 obj->pat_set_by_user ? "!" : NULL,
> > +			 INTEL_INFO(i915)->cache_modes[obj->pat_index]);
> > +
> >  	seq_printf(m, "%pK: %c%c%c %8zdKiB %02x %02x %s%s%s",
> >  		   &obj->base,
> >  		   get_tiling_flag(obj),
> > @@ -199,7 +160,7 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
> >  		   obj->base.size / 1024,
> >  		   obj->read_domains,
> >  		   obj->write_domain,
> > -		   i915_cache_level_str(obj),
> > +		   buf,
> >  		   obj->mm.dirty ? " dirty" : "",
> >  		   obj->mm.madv == I915_MADV_DONTNEED ? " purgeable" : "");
> >  	if (obj->base.name)
> > diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> > index bb2223cc3470..8663388a524f 100644
> > --- a/drivers/gpu/drm/i915/i915_driver.c
> > +++ b/drivers/gpu/drm/i915/i915_driver.c
> > @@ -241,7 +241,9 @@ static int i915_driver_early_probe(struct drm_i915_private *dev_priv)
> >  	i915_memcpy_init_early(dev_priv);
> >  	intel_runtime_pm_init_early(&dev_priv->runtime_pm);
> >  
> > -	i915_cache_init(dev_priv);
> > +	ret = i915_cache_init(dev_priv);
> > +	if (ret < 0)
> > +		return ret;
> >  
> >  	ret = i915_workqueues_init(dev_priv);
> >  	if (ret < 0)
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 896aa48ed089..814705cfeb12 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -1144,19 +1144,6 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
> >  	unsigned int i;
> >  	int ret;
> >  
> > -	/*
> > -	 * In the proccess of replacing cache_level with pat_index a tricky
> > -	 * dependency is created on the definition of the enum i915_cache_level.
> > -	 * in case this enum is changed, PTE encode would be broken.
> > -	 * Add a WARNING here. And remove when we completely quit using this
> > -	 * enum
> > -	 */
> > -	BUILD_BUG_ON(I915_CACHE_NONE != 0 ||
> > -		     I915_CACHE_LLC != 1 ||
> > -		     I915_CACHE_L3_LLC != 2 ||
> > -		     I915_CACHE_WT != 3 ||
> > -		     I915_MAX_CACHE_LEVEL != 4);
> > -
> >  	/* We need to fallback to 4K pages if host doesn't support huge gtt. */
> >  	if (intel_vgpu_active(dev_priv) && !intel_vgpu_has_huge_gtt(dev_priv))
> >  		RUNTIME_INFO(dev_priv)->page_sizes = I915_GTT_PAGE_SIZE_4K;
> > diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> > index fcacdc21643c..565a60a1645d 100644
> > --- a/drivers/gpu/drm/i915/i915_pci.c
> > +++ b/drivers/gpu/drm/i915/i915_pci.c
> > @@ -32,6 +32,7 @@
> >  #include "gt/intel_sa_media.h"
> >  #include "gem/i915_gem_object_types.h"
> >  
> > +#include "i915_cache.h"
> >  #include "i915_driver.h"
> >  #include "i915_drv.h"
> >  #include "i915_pci.h"
> > @@ -43,36 +44,43 @@
> >  	.__runtime.graphics.ip.ver = (x), \
> >  	.__runtime.media.ip.ver = (x)
> >  
> > -#define LEGACY_CACHELEVEL \
> > -	.cachelevel_to_pat = { \
> > -		[I915_CACHE_NONE]   = 0, \
> > -		[I915_CACHE_LLC]    = 1, \
> > -		[I915_CACHE_L3_LLC] = 2, \
> > -		[I915_CACHE_WT]     = 3, \
> > +#define LEGACY_CACHE_MODES \
> > +	.cache_modes = { \
> > +		[I915_CACHE_MODE_UC] 	  = I915_CACHE(UC), \
> > +		[I915_CACHE_MODE_WB] 	  = I915_CACHE(WB, COH1W, COH2W), \
> 
> Reading bspec 2863 (bdw) indicates that the CPU being able to snoop the
> GPU's L3 was a new feature in gen8.  So for HSW and earlier, any
> coherency was only 1-way (GPU could be coherent with CPU's caches, but
> not vice-versa).  Only starting with gen8 did we get 2-way coherency as
> an option where the CPU would also be coherent with the GPU cache (and
> with gen8 and beyond you could still select 1-way instead of 2-way
> coherency with instruction-level granularity via MOCS).  There are also
> some legacy platforms (e.g., EHL/JSL on bspec 13948) where the IA wasn't
> coherent with GPU L3 so we were back to 1-way coherency.
> 
> So should we split LEGACY_CACHE_MODES into two tables with different
> coherency settings attached to I915_CACHE_MODE_WB?
> 
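> For illustration only, a split along those lines might look roughly like
> the sketch below. The table names are made up and the per-platform
> coherency flags are assumptions that would need confirming against bspec
> (with EHL/JSL presumably also picking the 1-way table):
> 
> 	/* Hypothetical: HSW and earlier, where only the GPU snoops CPU caches. */
> 	#define LEGACY_1W_CACHE_MODES \
> 		.cache_modes = { \
> 			[I915_CACHE_MODE_UC]	  = I915_CACHE(UC), \
> 			[I915_CACHE_MODE_WB]	  = I915_CACHE(WB, COH1W), \
> 			[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, L3), \
> 			[I915_CACHE_MODE_WT]	  = I915_CACHE(WT), \
> 		}
> 
> 	/* Hypothetical: gen8+, where the CPU can also snoop the GPU caches. */
> 	#define LEGACY_2W_CACHE_MODES \
> 		.cache_modes = { \
> 			[I915_CACHE_MODE_UC]	  = I915_CACHE(UC), \
> 			[I915_CACHE_MODE_WB]	  = I915_CACHE(WB, COH1W, COH2W), \
> 			[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, COH2W, L3), \
> 			[I915_CACHE_MODE_WT]	  = I915_CACHE(WT), \
> 		}
> 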
> > +		[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, COH2W, L3), \
> > +		[I915_CACHE_MODE_WT] 	  = I915_CACHE(WT), \
> >  	}
> >  
> > -#define TGL_CACHELEVEL \
> > -	.cachelevel_to_pat = { \
> > -		[I915_CACHE_NONE]   = 3, \
> > -		[I915_CACHE_LLC]    = 0, \
> > -		[I915_CACHE_L3_LLC] = 0, \
> > -		[I915_CACHE_WT]     = 2, \
> > +#define GEN12_CACHE_MODES \
> > +	.cache_modes = { \
> > +		[0] = I915_CACHE(WB, COH1W, COH2W), \
> > +		[1] = I915_CACHE(WC), \
> > +		[2] = I915_CACHE(WT), \
> > +		[3] = I915_CACHE(UC), \
> >  	}
> >  
> > -#define PVC_CACHELEVEL \
> > -	.cachelevel_to_pat = { \
> > -		[I915_CACHE_NONE]   = 0, \
> > -		[I915_CACHE_LLC]    = 3, \
> > -		[I915_CACHE_L3_LLC] = 3, \
> > -		[I915_CACHE_WT]     = 2, \
> > +/* FIXME: are PAT indices 3, 5 and 7 1-way or 2-way coherent? */
> > +
> > +#define PVC_CACHE_MODES \
> > +	.cache_modes = { \
> > +		[0] = I915_CACHE(UC), \
> > +		[1] = I915_CACHE(WC), \
> > +		[2] = I915_CACHE(WT), \
> > +		[3] = I915_CACHE(WB, COH1W), \
> > +		[4] = I915_CACHE(WT, CLOS1), \
> > +		[5] = I915_CACHE(WB, COH1W, CLOS1), \
> > +		[6] = I915_CACHE(WT, CLOS2), \
> > +		[7] = I915_CACHE(WB, COH1W, CLOS2), \
> >  	}
> >  
> > -#define MTL_CACHELEVEL \
> > -	.cachelevel_to_pat = { \
> > -		[I915_CACHE_NONE]   = 2, \
> > -		[I915_CACHE_LLC]    = 3, \
> > -		[I915_CACHE_L3_LLC] = 3, \
> > -		[I915_CACHE_WT]     = 1, \
> > +#define MTL_CACHE_MODES \
> > +	.cache_modes = { \
> > +		[0] = I915_CACHE(WB), \
> > +		[1] = I915_CACHE(WT), \
> > +		[2] = I915_CACHE(UC), \
> > +		[3] = I915_CACHE(WB, COH1W), \
> > +		[4] = I915_CACHE(WB, COH1W, COH2W), \
> 
> We may want a comment on this one since the "2W" part is sort of a lie.
> Bspec 63884 has a programming note for MTL that says
> 
>         "...Except for system atomics, setting Coherency Mode to 10 or
>         11 results in this same one-way coherent behavior..."
> 
> So if we ask for 2W, we actually only get 1W behavior except in a very
> narrow set of cases.
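> 
> A hedged sketch of such a note, placed next to the MTL table entry
> (wording is only a suggestion):
> 
> 	/*
> 	 * Note: advertised as 2-way coherent, but per the programming
> 	 * note on bspec 63884 the hardware behaves as 1-way coherent
> 	 * for everything except system atomics.
> 	 */
> 	[4] = I915_CACHE(WB, COH1W, COH2W), \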
> 
> 
> Matt
> 
> >  	}
> >  
> >  /* Keep in gen based order, and chronological order within a gen */
> > @@ -97,7 +105,7 @@
> >  	.max_pat_index = 3, \
> >  	GEN_DEFAULT_PAGE_SIZES, \
> >  	GEN_DEFAULT_REGIONS, \
> > -	LEGACY_CACHELEVEL
> > +	LEGACY_CACHE_MODES
> >  
> >  #define I845_FEATURES \
> >  	GEN(2), \
> > @@ -112,7 +120,7 @@
> >  	.max_pat_index = 3, \
> >  	GEN_DEFAULT_PAGE_SIZES, \
> >  	GEN_DEFAULT_REGIONS, \
> > -	LEGACY_CACHELEVEL
> > +	LEGACY_CACHE_MODES
> >  
> >  static const struct intel_device_info i830_info = {
> >  	I830_FEATURES,
> > @@ -145,7 +153,7 @@ static const struct intel_device_info i865g_info = {
> >  	.max_pat_index = 3, \
> >  	GEN_DEFAULT_PAGE_SIZES, \
> >  	GEN_DEFAULT_REGIONS, \
> > -	LEGACY_CACHELEVEL
> > +	LEGACY_CACHE_MODES
> >  
> >  static const struct intel_device_info i915g_info = {
> >  	GEN3_FEATURES,
> > @@ -208,7 +216,7 @@ static const struct intel_device_info pnv_m_info = {
> >  	.max_pat_index = 3, \
> >  	GEN_DEFAULT_PAGE_SIZES, \
> >  	GEN_DEFAULT_REGIONS, \
> > -	LEGACY_CACHELEVEL
> > +	LEGACY_CACHE_MODES
> >  
> >  static const struct intel_device_info i965g_info = {
> >  	GEN4_FEATURES,
> > @@ -252,7 +260,7 @@ static const struct intel_device_info gm45_info = {
> >  	.max_pat_index = 3, \
> >  	GEN_DEFAULT_PAGE_SIZES, \
> >  	GEN_DEFAULT_REGIONS, \
> > -	LEGACY_CACHELEVEL
> > +	LEGACY_CACHE_MODES
> >  
> >  static const struct intel_device_info ilk_d_info = {
> >  	GEN5_FEATURES,
> > @@ -282,7 +290,7 @@ static const struct intel_device_info ilk_m_info = {
> >  	.__runtime.ppgtt_size = 31, \
> >  	GEN_DEFAULT_PAGE_SIZES, \
> >  	GEN_DEFAULT_REGIONS, \
> > -	LEGACY_CACHELEVEL
> > +	LEGACY_CACHE_MODES
> >  
> >  #define SNB_D_PLATFORM \
> >  	GEN6_FEATURES, \
> > @@ -330,7 +338,7 @@ static const struct intel_device_info snb_m_gt2_info = {
> >  	.__runtime.ppgtt_size = 31, \
> >  	GEN_DEFAULT_PAGE_SIZES, \
> >  	GEN_DEFAULT_REGIONS, \
> > -	LEGACY_CACHELEVEL
> > +	LEGACY_CACHE_MODES
> >  
> >  #define IVB_D_PLATFORM \
> >  	GEN7_FEATURES, \
> > @@ -387,7 +395,7 @@ static const struct intel_device_info vlv_info = {
> >  	.platform_engine_mask = BIT(RCS0) | BIT(VCS0) | BIT(BCS0),
> >  	GEN_DEFAULT_PAGE_SIZES,
> >  	GEN_DEFAULT_REGIONS,
> > -	LEGACY_CACHELEVEL,
> > +	LEGACY_CACHE_MODES
> >  };
> >  
> >  #define G75_FEATURES  \
> > @@ -473,7 +481,7 @@ static const struct intel_device_info chv_info = {
> >  	.has_coherent_ggtt = false,
> >  	GEN_DEFAULT_PAGE_SIZES,
> >  	GEN_DEFAULT_REGIONS,
> > -	LEGACY_CACHELEVEL,
> > +	LEGACY_CACHE_MODES
> >  };
> >  
> >  #define GEN9_DEFAULT_PAGE_SIZES \
> > @@ -536,7 +544,7 @@ static const struct intel_device_info skl_gt4_info = {
> >  	.max_pat_index = 3, \
> >  	GEN9_DEFAULT_PAGE_SIZES, \
> >  	GEN_DEFAULT_REGIONS, \
> > -	LEGACY_CACHELEVEL
> > +	LEGACY_CACHE_MODES
> >  
> >  static const struct intel_device_info bxt_info = {
> >  	GEN9_LP_FEATURES,
> > @@ -640,7 +648,7 @@ static const struct intel_device_info jsl_info = {
> >  #define GEN12_FEATURES \
> >  	GEN11_FEATURES, \
> >  	GEN(12), \
> > -	TGL_CACHELEVEL, \
> > +	GEN12_CACHE_MODES, \
> >  	.has_global_mocs = 1, \
> >  	.has_pxp = 1, \
> >  	.max_pat_index = 3
> > @@ -708,7 +716,7 @@ static const struct intel_device_info adl_p_info = {
> >  	.__runtime.graphics.ip.ver = 12, \
> >  	.__runtime.graphics.ip.rel = 50, \
> >  	XE_HP_PAGE_SIZES, \
> > -	TGL_CACHELEVEL, \
> > +	GEN12_CACHE_MODES, \
> >  	.dma_mask_size = 46, \
> >  	.has_3d_pipeline = 1, \
> >  	.has_64bit_reloc = 1, \
> > @@ -803,7 +811,7 @@ static const struct intel_device_info pvc_info = {
> >  		BIT(VCS0) |
> >  		BIT(CCS0) | BIT(CCS1) | BIT(CCS2) | BIT(CCS3),
> >  	.require_force_probe = 1,
> > -	PVC_CACHELEVEL,
> > +	PVC_CACHE_MODES
> >  };
> >  
> >  static const struct intel_gt_definition xelpmp_extra_gt[] = {
> > @@ -838,7 +846,7 @@ static const struct intel_device_info mtl_info = {
> >  	.memory_regions = REGION_SMEM | REGION_STOLEN_LMEM,
> >  	.platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(CCS0),
> >  	.require_force_probe = 1,
> > -	MTL_CACHELEVEL,
> > +	MTL_CACHE_MODES
> >  };
> >  
> >  #undef PLATFORM
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> > index 04bc1f4a1115..973175a64534 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -1870,7 +1870,7 @@ static int alloc_oa_buffer(struct i915_perf_stream *stream)
> >  		return PTR_ERR(bo);
> >  	}
> >  
> > -	i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC);
> > +	i915_gem_object_set_cache_coherency(bo, I915_CACHE_CACHED);
> >  
> >  	/* PreHSW required 512K alignment, HSW requires 16M */
> >  	vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL);
> > diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
> > index dbfe6443457b..2ce13b7c48cb 100644
> > --- a/drivers/gpu/drm/i915/intel_device_info.h
> > +++ b/drivers/gpu/drm/i915/intel_device_info.h
> > @@ -27,6 +27,8 @@
> >  
> >  #include <uapi/drm/i915_drm.h>
> >  
> > +#include "i915_cache.h"
> > +
> >  #include "intel_step.h"
> >  
> >  #include "gt/intel_engine_types.h"
> > @@ -243,8 +245,8 @@ struct intel_device_info {
> >  	 */
> >  	const struct intel_runtime_info __runtime;
> >  
> > -	u32 cachelevel_to_pat[I915_MAX_CACHE_LEVEL];
> > -	u32 max_pat_index;
> > +	i915_cache_t cache_modes[8];
> > +	unsigned int max_pat_index;
> >  };
> >  
> >  struct intel_driver_caps {
> > diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> > index f910ec9b6d2b..ba821e48baa5 100644
> > --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> > +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> > @@ -267,7 +267,7 @@ static int igt_evict_for_cache_color(void *arg)
> >  		err = PTR_ERR(obj);
> >  		goto cleanup;
> >  	}
> > -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> > +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
> >  	quirk_add(obj, &objects);
> >  
> >  	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
> > @@ -283,7 +283,7 @@ static int igt_evict_for_cache_color(void *arg)
> >  		err = PTR_ERR(obj);
> >  		goto cleanup;
> >  	}
> > -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> > +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
> >  	quirk_add(obj, &objects);
> >  
> >  	/* Neighbouring; same colour - should fit */
> > diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
> > index 3c5e0952f1b8..4cfc5000d6ff 100644
> > --- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
> > +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
> > @@ -23,7 +23,7 @@ int igt_spinner_init(struct igt_spinner *spin, struct intel_gt *gt)
> >  		err = PTR_ERR(spin->hws);
> >  		goto err;
> >  	}
> > -	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_LLC);
> > +	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_CACHED);
> >  
> >  	spin->obj = i915_gem_object_create_internal(gt->i915, PAGE_SIZE);
> >  	if (IS_ERR(spin->obj)) {
> > diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > index 1d1a457e2aee..8ae77bcf27fa 100644
> > --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > @@ -126,13 +126,13 @@ static const struct intel_device_info mock_info = {
> >  	.memory_regions = REGION_SMEM,
> >  	.platform_engine_mask = BIT(0),
> >  
> > -	/* simply use legacy cache level for mock device */
> > +	/* Simply use legacy cache modes for the mock device. */
> >  	.max_pat_index = 3,
> > -	.cachelevel_to_pat = {
> > -		[I915_CACHE_NONE]   = 0,
> > -		[I915_CACHE_LLC]    = 1,
> > -		[I915_CACHE_L3_LLC] = 2,
> > -		[I915_CACHE_WT]     = 3,
> > +	.cache_modes = {
> > +		[0] = I915_CACHE(UC),
> > +		[1] = I915_CACHE(WB, COH1W),
> > +		[2] = I915_CACHE(WB, COH1W, COH2W, L3),
> > +		[3] = I915_CACHE(WT),
> >  	},
> >  };
> >  
> > @@ -181,7 +181,7 @@ struct drm_i915_private *mock_gem_device(void)
> >  	/* Set up device info and initial runtime info. */
> >  	intel_device_info_driver_create(i915, pdev->device, &mock_info);
> >  
> > -	i915_cache_init(i915);
> > +	WARN_ON(i915_cache_init(i915));
> >  
> >  	dev_pm_domain_set(&pdev->dev, &pm_domain);
> >  	pm_runtime_enable(&pdev->dev);
> > -- 
> > 2.39.2
> > 
> 
> -- 
> Matt Roper
> Graphics Software Engineer
> Linux GPU Platform Enablement
> Intel Corporation

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Intel-gfx] [RFC 4/8] drm/i915: Refactor PAT/object cache handling
@ 2023-07-28  0:17       ` Matt Roper
  0 siblings, 0 replies; 59+ messages in thread
From: Matt Roper @ 2023-07-28  0:17 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx, dri-devel, Chris Wilson

On Thu, Jul 27, 2023 at 04:57:53PM -0700, Matt Roper wrote:
> On Thu, Jul 27, 2023 at 03:55:00PM +0100, Tvrtko Ursulin wrote:
> > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > 
> > Commit 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
> > introduced PAT indices to i915 internal APIs, partially replacing the
> > usage of the driver internal cache_level, but also made a few sub-
> > optimal design decisions which this patch tries to improve upon.
> > 
> > The principal change here is to invert the per-platform cache level to
> > PAT index table which was added by the referenced commit, and by doing so
> > enable i915 to understand the cache mode behind each PAT index, changing
> > them from opaque to transparent.
> > 
> > Once we have the inverted table we are able to remove the hidden,
> > misleading "return true" from i915_gem_object_has_cache_level and make
> > the involved code path clearer.
> > 
> > To achieve this we replace the enum i915_cache_level with i915_cache_t,
> > composed of a more detailed representation of each cache mode (base mode
> > plus flags).
> > 
> > In this way we are able to express the differences between the various
> > write-back coherency settings on Meteorlake, which in turn enables us
> > to map the i915 "cached" mode to the correct Meteorlake PAT index.
> > 
> > We can also replace the platform-dependent cache-mode-to-string code in
> > debugfs and elsewhere with a single implementation based on i915_cache_t.
> > 
> > v2:
> >  * Fix PAT-to-cache-mode table for PVC. (Fei)
> >  * Cache display caching mode too. (Fei)
> >  * Improve and document criteria in i915_gem_object_can_bypass_llc() (Matt)
> > 
> > v3:
> >  * Checkpatch issues.
> >  * Cache mode flags check fixed.
> > 
> > v4:
> >  * Fix intel_device_info->cache_modes array size. (Matt)
> >  * Boolean cache mode and flags query. (Matt)
> >  * Reduce number of cache macros with some macro magic.
> >  * One more checkpatch fix.
> >  * Tweak tables to show legacy and Gen12 WB is fully coherent.
> > 
> > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > References: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
> > Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> > Cc: Fei Yang <fei.yang@intel.com>
> > Cc: Andi Shyti <andi.shyti@linux.intel.com>
> > Cc: Matt Roper <matthew.d.roper@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_domain.c    |  60 +++++----
> >  drivers/gpu/drm/i915/gem/i915_gem_domain.h    |   5 +-
> >  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   3 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_mman.c      |   4 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_object.c    | 117 ++++++++++--------
> >  drivers/gpu/drm/i915/gem/i915_gem_object.h    |  11 +-
> >  .../gpu/drm/i915/gem/i915_gem_object_types.h  | 116 +----------------
> >  drivers/gpu/drm/i915/gem/i915_gem_shmem.c     |   8 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |   2 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  20 +--
> >  drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   2 +-
> >  .../drm/i915/gem/selftests/huge_gem_object.c  |   2 +-
> >  .../gpu/drm/i915/gem/selftests/huge_pages.c   |   3 +-
> >  drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  10 +-
> >  drivers/gpu/drm/i915/gt/intel_engine_cs.c     |   2 +-
> >  drivers/gpu/drm/i915/gt/intel_ggtt.c          |  25 ++--
> >  drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c     |   4 +-
> >  drivers/gpu/drm/i915/gt/intel_gtt.c           |   2 +-
> >  drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +-
> >  drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   6 +-
> >  .../gpu/drm/i915/gt/intel_ring_submission.c   |   4 +-
> >  drivers/gpu/drm/i915/gt/intel_timeline.c      |   2 +-
> >  drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   2 +-
> >  .../gpu/drm/i915/gt/selftest_workarounds.c    |   2 +-
> >  drivers/gpu/drm/i915/i915_cache.c             |  89 +++++++++++--
> >  drivers/gpu/drm/i915/i915_cache.h             |  70 ++++++++++-
> >  drivers/gpu/drm/i915/i915_debugfs.c           |  53 ++------
> >  drivers/gpu/drm/i915/i915_driver.c            |   4 +-
> >  drivers/gpu/drm/i915/i915_gem.c               |  13 --
> >  drivers/gpu/drm/i915/i915_pci.c               |  84 +++++++------
> >  drivers/gpu/drm/i915/i915_perf.c              |   2 +-
> >  drivers/gpu/drm/i915/intel_device_info.h      |   6 +-
> >  .../gpu/drm/i915/selftests/i915_gem_evict.c   |   4 +-
> >  drivers/gpu/drm/i915/selftests/igt_spinner.c  |   2 +-
> >  .../gpu/drm/i915/selftests/mock_gem_device.c  |  14 +--
> >  36 files changed, 391 insertions(+), 367 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> > index 57db9c581bf6..c15f83de33af 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> > @@ -8,6 +8,7 @@
> >  #include "display/intel_frontbuffer.h"
> >  #include "gt/intel_gt.h"
> >  
> > +#include "i915_cache.h"
> >  #include "i915_drv.h"
> >  #include "i915_gem_clflush.h"
> >  #include "i915_gem_domain.h"
> > @@ -41,14 +42,17 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
> >  		return false;
> >  
> >  	/*
> > -	 * For objects created by userspace through GEM_CREATE with pat_index
> > -	 * set by set_pat extension, i915_gem_object_has_cache_level() will
> > -	 * always return true, because the coherency of such object is managed
> > -	 * by userspace. Othereise the call here would fall back to checking
> > -	 * whether the object is un-cached or write-through.
> > +	 * Always flush cache for UMD objects with PAT index set.
> >  	 */
> > -	return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
> > -		 i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
> > +	if (obj->pat_set_by_user)
> > +		return true;
> > +
> > +	/*
> > +	 * Fully coherent cached access may end up with data in the CPU cache
> > +	 * which hasn't hit memory yet.
> > +	 */
> > +	return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
> > +	       i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W);
> >  }
> >  
> >  bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
> > @@ -268,7 +272,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
> >  /**
> >   * i915_gem_object_set_cache_level - Changes the cache-level of an object across all VMA.
> >   * @obj: object to act on
> > - * @cache_level: new cache level to set for the object
> > + * @cache: new caching mode to set for the object
> >   *
> >   * After this function returns, the object will be in the new cache-level
> >   * across all GTT and the contents of the backing storage will be coherent,
> > @@ -281,18 +285,28 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
> >   * that all direct access to the scanout remains coherent.
> >   */
> >  int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> > -				    enum i915_cache_level cache_level)
> > +				    i915_cache_t cache)
> >  {
> > -	int ret;
> > +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > +	int pat, ret;
> >  
> > -	/*
> > -	 * For objects created by userspace through GEM_CREATE with pat_index
> > -	 * set by set_pat extension, simply return 0 here without touching
> > -	 * the cache setting, because such objects should have an immutable
> > -	 * cache setting by desgin and always managed by userspace.
> > -	 */
> > -	if (i915_gem_object_has_cache_level(obj, cache_level))
> > +	pat = i915_cache_find_pat(i915, cache);
> > +	if (pat < 0) {
> > +		char buf[I915_CACHE_NAME_LEN];
> > +
> > +		i915_cache_print(buf, sizeof(buf), NULL, cache);
> > +		drm_err_ratelimited(&i915->drm,
> > +				    "Attempting to use unknown caching mode %s!\n",
> > +				    buf);
> > +
> > +		return -EINVAL;
> > +	} else if (pat == obj->pat_index) {
> >  		return 0;
> > +	} else if (obj->pat_set_by_user) {
> > +		drm_notice_once(&i915->drm,
> > +				"Attempting to change caching mode on an object with fixed PAT!\n");
> > +		return -EINVAL;
> > +	}
> >  
> >  	ret = i915_gem_object_wait(obj,
> >  				   I915_WAIT_INTERRUPTIBLE |
> > @@ -302,7 +316,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> >  		return ret;
> >  
> >  	/* Always invalidate stale cachelines */
> > -	i915_gem_object_set_cache_coherency(obj, cache_level);
> > +	i915_gem_object_set_pat_index(obj, pat);
> >  	obj->cache_dirty = true;
> >  
> >  	/* The cache-level will be applied when each vma is rebound. */
> > @@ -337,10 +351,10 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
> >  		goto out;
> >  	}
> >  
> > -	if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||
> > -	    i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
> > +	if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
> > +	    i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W))
> >  		args->caching = I915_CACHING_CACHED;
> > -	else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
> > +	else if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT))
> >  		args->caching = I915_CACHING_DISPLAY;
> >  	else
> >  		args->caching = I915_CACHING_NONE;
> > @@ -355,7 +369,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
> >  	struct drm_i915_private *i915 = to_i915(dev);
> >  	struct drm_i915_gem_caching *args = data;
> >  	struct drm_i915_gem_object *obj;
> > -	enum i915_cache_level level;
> > +	i915_cache_t level;
> >  	int ret = 0;
> >  
> >  	if (IS_DGFX(i915))
> > @@ -378,7 +392,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
> >  		if (!HAS_LLC(i915) && !HAS_SNOOP(i915))
> >  			return -ENODEV;
> >  
> > -		level = I915_CACHE_LLC;
> > +		level = I915_CACHE_CACHED;
> >  		break;
> >  	case I915_CACHING_DISPLAY:
> >  		level = HAS_WT(i915) ? I915_CACHE_WT : I915_CACHE_NONE;
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.h b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
> > index 9622df962bfc..6da5c351f6fd 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.h
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
> > @@ -6,10 +6,11 @@
> >  #ifndef __I915_GEM_DOMAIN_H__
> >  #define __I915_GEM_DOMAIN_H__
> >  
> > +#include "i915_cache.h"
> > +
> >  struct drm_i915_gem_object;
> > -enum i915_cache_level;
> >  
> >  int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> > -				    enum i915_cache_level cache_level);
> > +				    i915_cache_t cache);
> >  
> >  #endif /* __I915_GEM_DOMAIN_H__ */
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > index 0a1d40220020..9d6e49c8a4c6 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > @@ -648,7 +648,8 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
> >  	 */
> >  	return (cache->has_llc ||
> >  		obj->cache_dirty ||
> > -		!i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
> > +		!(obj->pat_set_by_user ||
> > +		  i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)));
> >  }
> >  
> >  static int eb_reserve_vma(struct i915_execbuffer *eb,
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> > index 6bc26b4b06b8..88c360c3d6a3 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> > @@ -170,7 +170,7 @@ __i915_gem_object_create_internal(struct drm_i915_private *i915,
> >  	obj->read_domains = I915_GEM_DOMAIN_CPU;
> >  	obj->write_domain = I915_GEM_DOMAIN_CPU;
> >  
> > -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
> > +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
> >  	i915_gem_object_set_cache_coherency(obj, cache_level);
> >  
> >  	return obj;
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > index aa4d842d4c5a..cd7f8ded0d6f 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > @@ -382,7 +382,6 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
> >  		goto err_reset;
> >  	}
> >  
> > -	/* Access to snoopable pages through the GTT is incoherent. */
> >  	/*
> >  	 * For objects created by userspace through GEM_CREATE with pat_index
> >  	 * set by set_pat extension, coherency is managed by userspace, make
> > @@ -391,7 +390,8 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
> >  	 * objects. Otherwise this helper function would fall back to checking
> >  	 * whether the object is un-cached.
> >  	 */
> > -	if (!(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
> > +	if (!((obj->pat_set_by_user ||
> > +	       i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)) ||
> >  	      HAS_LLC(i915))) {
> >  		ret = -EFAULT;
> >  		goto err_unpin;
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > index 3dc4fbb67d2b..ec1f0be43d0d 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > @@ -45,33 +45,6 @@ static struct kmem_cache *slab_objects;
> >  
> >  static const struct drm_gem_object_funcs i915_gem_object_funcs;
> >  
> > -unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
> > -				    enum i915_cache_level level)
> > -{
> > -	if (drm_WARN_ON(&i915->drm, level >= I915_MAX_CACHE_LEVEL))
> > -		return 0;
> > -
> > -	return INTEL_INFO(i915)->cachelevel_to_pat[level];
> > -}
> > -
> > -bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
> > -				     enum i915_cache_level lvl)
> > -{
> > -	/*
> > -	 * In case the pat_index is set by user space, this kernel mode
> > -	 * driver should leave the coherency to be managed by user space,
> > -	 * simply return true here.
> > -	 */
> > -	if (obj->pat_set_by_user)
> > -		return true;
> > -
> > -	/*
> > -	 * Otherwise the pat_index should have been converted from cache_level
> > -	 * so that the following comparison is valid.
> > -	 */
> > -	return obj->pat_index == i915_gem_get_pat_index(obj_to_i915(obj), lvl);
> > -}
> > -
> >  struct drm_i915_gem_object *i915_gem_object_alloc(void)
> >  {
> >  	struct drm_i915_gem_object *obj;
> > @@ -144,30 +117,72 @@ void __i915_gem_object_fini(struct drm_i915_gem_object *obj)
> >  	dma_resv_fini(&obj->base._resv);
> >  }
> >  
> > +bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
> > +				    enum i915_cache_mode mode)
> > +{
> > +	struct drm_i915_private *i915 = obj_to_i915(obj);
> > +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
> > +
> > +	return I915_CACHE_MODE(cache) == mode;
> > +}
> > +
> > +bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
> > +				    unsigned int flag)
> > +{
> > +	struct drm_i915_private *i915 = obj_to_i915(obj);
> > +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
> > +
> > +	return I915_CACHE_FLAGS(cache) & flag;
> > +}
> > +
> > +static void __i915_gem_object_update_coherency(struct drm_i915_gem_object *obj)
> > +{
> > +	struct drm_i915_private *i915 = obj_to_i915(obj);
> > +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
> > +	const unsigned int flags = I915_CACHE_FLAGS(cache);
> > +	const unsigned int mode = I915_CACHE_MODE(cache);
> > +
> > +	if (mode == I915_CACHE_MODE_WC ||
> > +	    mode == I915_CACHE_MODE_WT ||
> > +	    (mode == I915_CACHE_MODE_WB && (flags & I915_CACHE_FLAG_COH2W)))

Shouldn't we only need 1W coherency here?  With 1-way coherency GPU
reads will snoop the CPU cache and GPU writes will invalidate the CPU
cache.  2-way only matters for how CPU reads/writes interact with the
GPU cache.
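
If so, the write-back leg of the check would presumably relax to testing
the 1-way flag, along the lines of this untested sketch against the
helpers added by this patch:

	if (mode == I915_CACHE_MODE_WC ||
	    mode == I915_CACHE_MODE_WT ||
	    (mode == I915_CACHE_MODE_WB && (flags & I915_CACHE_FLAG_COH1W)))
		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ |
				      I915_BO_CACHE_COHERENT_FOR_WRITE;

so that 1-way coherent write-back objects would also be treated as
coherent for GPU writes, per the reasoning above.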


Matt

> > +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ |
> > +				      I915_BO_CACHE_COHERENT_FOR_WRITE;
> > +	else if (HAS_LLC(i915))
> > +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
> > +	else
> > +		obj->cache_coherent = 0;
> > +
> > +	obj->cache_dirty =
> > +		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
> > +		!IS_DGFX(i915);
> > +}
> > +
> >  /**
> >   * i915_gem_object_set_cache_coherency - Mark up the object's coherency levels
> > - * for a given cache_level
> > + * for a given caching mode
> >   * @obj: #drm_i915_gem_object
> > - * @cache_level: cache level
> > + * @cache: cache mode
> >   */
> >  void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
> > -					 unsigned int cache_level)
> > +					 i915_cache_t cache)
> >  {
> > -	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > +	struct drm_i915_private *i915 = obj_to_i915(obj);
> > +	int found;
> >  
> > -	obj->pat_index = i915_gem_get_pat_index(i915, cache_level);
> > +	found = i915_cache_find_pat(i915, cache);
> > +	if (found < 0) {
> > +		char buf[I915_CACHE_NAME_LEN];
> >  
> > -	if (cache_level != I915_CACHE_NONE)
> > -		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
> > -				       I915_BO_CACHE_COHERENT_FOR_WRITE);
> > -	else if (HAS_LLC(i915))
> > -		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
> > -	else
> > -		obj->cache_coherent = 0;
> > +		i915_cache_print(buf, sizeof(buf), NULL, cache);
> > +		drm_err_ratelimited(&i915->drm, "Unknown cache mode %s!\n",
> > +				    buf);
> >  
> > -	obj->cache_dirty =
> > -		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
> > -		!IS_DGFX(i915);
> > +		found = i915->pat_uc;
> > +	}
> > +
> > +	obj->pat_index = found;
> > +
> > +	__i915_gem_object_update_coherency(obj);
> >  }
> >  
> >  /**
> > @@ -181,24 +196,18 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
> >  void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
> >  				   unsigned int pat_index)
> >  {
> > -	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > +	struct drm_i915_private *i915 = obj_to_i915(obj);
> >  
> >  	if (obj->pat_index == pat_index)
> >  		return;
> >  
> > +	if (drm_WARN_ON_ONCE(&i915->drm,
> > +			     pat_index > INTEL_INFO(i915)->max_pat_index))
> > +		return;
> > +
> >  	obj->pat_index = pat_index;
> >  
> > -	if (pat_index != i915_gem_get_pat_index(i915, I915_CACHE_NONE))
> > -		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
> > -				       I915_BO_CACHE_COHERENT_FOR_WRITE);
> > -	else if (HAS_LLC(i915))
> > -		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
> > -	else
> > -		obj->cache_coherent = 0;
> > -
> > -	obj->cache_dirty =
> > -		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
> > -		!IS_DGFX(i915);
> > +	__i915_gem_object_update_coherency(obj);
> >  }
> >  
> >  bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > index 884a17275b3a..a5d4ee19d9be 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > @@ -13,6 +13,7 @@
> >  
> >  #include "display/intel_frontbuffer.h"
> >  #include "intel_memory_region.h"
> > +#include "i915_cache.h"
> >  #include "i915_gem_object_types.h"
> >  #include "i915_gem_gtt.h"
> >  #include "i915_gem_ww.h"
> > @@ -32,10 +33,6 @@ static inline bool i915_gem_object_size_2big(u64 size)
> >  	return false;
> >  }
> >  
> > -unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
> > -				    enum i915_cache_level level);
> > -bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
> > -				     enum i915_cache_level lvl);
> >  void i915_gem_init__objects(struct drm_i915_private *i915);
> >  
> >  void i915_objects_module_exit(void);
> > @@ -764,8 +761,12 @@ int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
> >  				      bool intr);
> >  bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj);
> >  
> > +bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
> > +				    enum i915_cache_mode mode);
> > +bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
> > +				    unsigned int flag);
> >  void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
> > -					 unsigned int cache_level);
> > +					 i915_cache_t cache);
> >  void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
> >  				   unsigned int pat_index);
> >  bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj);
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > index 8de2b91b3edf..6790e13ad262 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > @@ -14,6 +14,7 @@
> >  #include <uapi/drm/i915_drm.h>
> >  
> >  #include "i915_active.h"
> > +#include "i915_cache.h"
> >  #include "i915_selftest.h"
> >  #include "i915_vma_resource.h"
> >  
> > @@ -116,93 +117,6 @@ struct drm_i915_gem_object_ops {
> >  	const char *name; /* friendly name for debug, e.g. lockdep classes */
> >  };
> >  
> > -/**
> > - * enum i915_cache_level - The supported GTT caching values for system memory
> > - * pages.
> > - *
> > - * These translate to some special GTT PTE bits when binding pages into some
> > - * address space. It also determines whether an object, or rather its pages are
> > - * coherent with the GPU, when also reading or writing through the CPU cache
> > - * with those pages.
> > - *
> > - * Userspace can also control this through struct drm_i915_gem_caching.
> > - */
> > -enum i915_cache_level {
> > -	/**
> > -	 * @I915_CACHE_NONE:
> > -	 *
> > -	 * GPU access is not coherent with the CPU cache. If the cache is dirty
> > -	 * and we need the underlying pages to be coherent with some later GPU
> > -	 * access then we need to manually flush the pages.
> > -	 *
> > -	 * On shared LLC platforms reads and writes through the CPU cache are
> > -	 * still coherent even with this setting. See also
> > -	 * &drm_i915_gem_object.cache_coherent for more details. Due to this we
> > -	 * should only ever use uncached for scanout surfaces, otherwise we end
> > -	 * up over-flushing in some places.
> > -	 *
> > -	 * This is the default on non-LLC platforms.
> > -	 */
> > -	I915_CACHE_NONE = 0,
> > -	/**
> > -	 * @I915_CACHE_LLC:
> > -	 *
> > -	 * GPU access is coherent with the CPU cache. If the cache is dirty,
> > -	 * then the GPU will ensure that access remains coherent, when both
> > -	 * reading and writing through the CPU cache. GPU writes can dirty the
> > -	 * CPU cache.
> > -	 *
> > -	 * Not used for scanout surfaces.
> > -	 *
> > -	 * Applies to both platforms with shared LLC(HAS_LLC), and snooping
> > -	 * based platforms(HAS_SNOOP).
> > -	 *
> > -	 * This is the default on shared LLC platforms.  The only exception is
> > -	 * scanout objects, where the display engine is not coherent with the
> > -	 * CPU cache. For such objects I915_CACHE_NONE or I915_CACHE_WT is
> > -	 * automatically applied by the kernel in pin_for_display, if userspace
> > -	 * has not done so already.
> > -	 */
> > -	I915_CACHE_LLC,
> > -	/**
> > -	 * @I915_CACHE_L3_LLC:
> > -	 *
> > -	 * Explicitly enable the Gfx L3 cache, with coherent LLC.
> > -	 *
> > -	 * The Gfx L3 sits between the domain specific caches, e.g
> > -	 * sampler/render caches, and the larger LLC. LLC is coherent with the
> > -	 * GPU, but L3 is only visible to the GPU, so likely needs to be flushed
> > -	 * when the workload completes.
> > -	 *
> > -	 * Not used for scanout surfaces.
> > -	 *
> > -	 * Only exposed on some gen7 + GGTT. More recent hardware has dropped
> > -	 * this explicit setting, where it should now be enabled by default.
> > -	 */
> > -	I915_CACHE_L3_LLC,
> > -	/**
> > -	 * @I915_CACHE_WT:
> > -	 *
> > -	 * Write-through. Used for scanout surfaces.
> > -	 *
> > -	 * The GPU can utilise the caches, while still having the display engine
> > -	 * be coherent with GPU writes, as a result we don't need to flush the
> > -	 * CPU caches when moving out of the render domain. This is the default
> > -	 * setting chosen by the kernel, if supported by the HW, otherwise we
> > -	 * fallback to I915_CACHE_NONE. On the CPU side writes through the CPU
> > -	 * cache still need to be flushed, to remain coherent with the display
> > -	 * engine.
> > -	 */
> > -	I915_CACHE_WT,
> > -	/**
> > -	 * @I915_MAX_CACHE_LEVEL:
> > -	 *
> > -	 * Mark the last entry in the enum. Used for defining cachelevel_to_pat
> > -	 * array for cache_level to pat translation table.
> > -	 */
> > -	I915_MAX_CACHE_LEVEL,
> > -};
> > -
> >  enum i915_map_type {
> >  	I915_MAP_WB = 0,
> >  	I915_MAP_WC,
> > @@ -403,16 +317,6 @@ struct drm_i915_gem_object {
> >  	/**
> >  	 * @cache_coherent:
> >  	 *
> > -	 * Note: with the change above which replaced @cache_level with pat_index,
> > -	 * the use of @cache_coherent is limited to the objects created by kernel
> > -	 * or by userspace without pat index specified.
> > -	 * Check for @pat_set_by_user to find out if an object has pat index set
> > -	 * by userspace. The ioctl's to change cache settings have also been
> > -	 * disabled for the objects with pat index set by userspace. Please don't
> > -	 * assume @cache_coherent having the flags set as describe here. A helper
> > -	 * function i915_gem_object_has_cache_level() provides one way to bypass
> > -	 * the use of this field.
> > -	 *
> >  	 * Track whether the pages are coherent with the GPU if reading or
> >  	 * writing through the CPU caches. This largely depends on the
> >  	 * @cache_level setting.
> > @@ -447,7 +351,7 @@ struct drm_i915_gem_object {
> >  	 * flushing the surface just before doing the scanout.  This does mean
> >  	 * we might unnecessarily flush non-scanout objects in some places, but
> >  	 * the default assumption is that all normal objects should be using
> > -	 * I915_CACHE_LLC, at least on platforms with the shared LLC.
> > +	 * I915_CACHE_CACHED, at least on platforms with the shared LLC.
> >  	 *
> >  	 * Supported values:
> >  	 *
> > @@ -486,16 +390,6 @@ struct drm_i915_gem_object {
> >  	/**
> >  	 * @cache_dirty:
> >  	 *
> > -	 * Note: with the change above which replaced cache_level with pat_index,
> > -	 * the use of @cache_dirty is limited to the objects created by kernel
> > -	 * or by userspace without pat index specified.
> > -	 * Check for @pat_set_by_user to find out if an object has pat index set
> > -	 * by userspace. The ioctl's to change cache settings have also been
> > -	 * disabled for the objects with pat_index set by userspace. Please don't
> > -	 * assume @cache_dirty is set as describe here. Also see helper function
> > -	 * i915_gem_object_has_cache_level() for possible ways to bypass the use
> > -	 * of this field.
> > -	 *
> >  	 * Track if we are dirty with writes through the CPU cache for this
> >  	 * object. As a result reading directly from main memory might yield
> >  	 * stale data.
> > @@ -531,9 +425,9 @@ struct drm_i915_gem_object {
> >  	 *
> >  	 *   1. All userspace objects, by default, have @cache_level set as
> >  	 *   I915_CACHE_NONE. The only exception is userptr objects, where we
> > -	 *   instead force I915_CACHE_LLC, but we also don't allow userspace to
> > -	 *   ever change the @cache_level for such objects. Another special case
> > -	 *   is dma-buf, which doesn't rely on @cache_dirty,  but there we
> > +	 *   instead force I915_CACHE_CACHED, but we also don't allow userspace
> > +	 *   to ever change the @cache_level for such objects. Another special
> > +	 *   case is dma-buf, which doesn't rely on @cache_dirty, but there we
> >  	 *   always do a forced flush when acquiring the pages, if there is a
> >  	 *   chance that the pages can be read directly from main memory with
> >  	 *   the GPU.
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> > index 8f1633c3fb93..aba908f0349f 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> > @@ -584,7 +584,7 @@ static int shmem_object_init(struct intel_memory_region *mem,
> >  	static struct lock_class_key lock_class;
> >  	struct drm_i915_private *i915 = mem->i915;
> >  	struct address_space *mapping;
> > -	unsigned int cache_level;
> > +	i915_cache_t cache;
> >  	gfp_t mask;
> >  	int ret;
> >  
> > @@ -628,11 +628,11 @@ static int shmem_object_init(struct intel_memory_region *mem,
> >  		 * However, we maintain the display planes as UC, and so
> >  		 * need to rebind when first used as such.
> >  		 */
> > -		cache_level = I915_CACHE_LLC;
> > +		cache = I915_CACHE_CACHED;
> >  	else
> > -		cache_level = I915_CACHE_NONE;
> > +		cache = I915_CACHE_NONE;
> >  
> > -	i915_gem_object_set_cache_coherency(obj, cache_level);
> > +	i915_gem_object_set_cache_coherency(obj, cache);
> >  
> >  	i915_gem_object_init_memory_region(obj, mem);
> >  
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> > index 1c8eb806b7d3..cc907a1f1c53 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> > @@ -691,7 +691,7 @@ static int __i915_gem_object_create_stolen(struct intel_memory_region *mem,
> >  
> >  	obj->stolen = stolen;
> >  	obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
> > -	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
> > +	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
> >  	i915_gem_object_set_cache_coherency(obj, cache_level);
> >  
> >  	if (WARN_ON(!i915_gem_object_trylock(obj, NULL)))
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> > index 6bd6c239f4ac..107176d1757b 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> > @@ -48,14 +48,14 @@ void i915_ttm_migrate_set_ban_memcpy(bool ban)
> >  }
> >  #endif
> >  
> > -static enum i915_cache_level
> > -i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
> > -		     struct ttm_tt *ttm)
> > +static i915_cache_t
> > +i915_ttm_cache(struct drm_i915_private *i915, struct ttm_resource *res,
> > +	       struct ttm_tt *ttm)
> >  {
> >  	return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
> >  		!i915_ttm_gtt_binds_lmem(res) &&
> > -		ttm->caching == ttm_cached) ? I915_CACHE_LLC :
> > -		I915_CACHE_NONE;
> > +		ttm->caching == ttm_cached) ? I915_CACHE_CACHED :
> > +					      I915_CACHE_NONE;
> >  }
> >  
> >  static unsigned int
> > @@ -112,8 +112,8 @@ void i915_ttm_adjust_domains_after_move(struct drm_i915_gem_object *obj)
> >  void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
> >  {
> >  	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
> > -	unsigned int cache_level;
> >  	unsigned int mem_flags;
> > +	i915_cache_t cache;
> >  	unsigned int i;
> >  	int mem_type;
> >  
> > @@ -126,13 +126,13 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
> >  	if (!bo->resource) {
> >  		mem_flags = I915_BO_FLAG_STRUCT_PAGE;
> >  		mem_type = I915_PL_SYSTEM;
> > -		cache_level = I915_CACHE_NONE;
> > +		cache = I915_CACHE_NONE;
> >  	} else {
> >  		mem_flags = i915_ttm_cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
> >  			I915_BO_FLAG_STRUCT_PAGE;
> >  		mem_type = bo->resource->mem_type;
> > -		cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
> > -						   bo->ttm);
> > +		cache = i915_ttm_cache(to_i915(bo->base.dev), bo->resource,
> > +				       bo->ttm);
> >  	}
> >  
> >  	/*
> > @@ -157,7 +157,7 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
> >  	obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
> >  	obj->mem_flags |= mem_flags;
> >  
> > -	i915_gem_object_set_cache_coherency(obj, cache_level);
> > +	i915_gem_object_set_cache_coherency(obj, cache);
> >  }
> >  
> >  /**
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > index 1d3ebdf4069b..5d2891981bd4 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > @@ -553,7 +553,7 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> >  	obj->mem_flags = I915_BO_FLAG_STRUCT_PAGE;
> >  	obj->read_domains = I915_GEM_DOMAIN_CPU;
> >  	obj->write_domain = I915_GEM_DOMAIN_CPU;
> > -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> > +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
> >  
> >  	obj->userptr.ptr = args->user_ptr;
> >  	obj->userptr.notifier_seq = ULONG_MAX;
> > diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
> > index bac957755068..77d04be5e9d7 100644
> > --- a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
> > +++ b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
> > @@ -123,7 +123,7 @@ huge_gem_object(struct drm_i915_private *i915,
> >  
> >  	obj->read_domains = I915_GEM_DOMAIN_CPU;
> >  	obj->write_domain = I915_GEM_DOMAIN_CPU;
> > -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
> > +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
> >  	i915_gem_object_set_cache_coherency(obj, cache_level);
> >  	obj->scratch = phys_size;
> >  
> > diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> > index 6bddd733d796..6ca5b9dbc414 100644
> > --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> > +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> > @@ -200,9 +200,9 @@ huge_pages_object(struct drm_i915_private *i915,
> >  	obj->write_domain = I915_GEM_DOMAIN_CPU;
> >  	obj->read_domains = I915_GEM_DOMAIN_CPU;
> >  
> > -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
> > +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
> >  	i915_gem_object_set_cache_coherency(obj, cache_level);
> >  
> >  	obj->mm.page_mask = page_mask;
> >  
> >  	return obj;
> > diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> > index 675f71f06e89..3c93a73cf6b1 100644
> > --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> > +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> > @@ -16,11 +16,11 @@
> >  #include "intel_gtt.h"
> >  
> >  static u64 gen8_pde_encode(const dma_addr_t addr,
> > -			   const enum i915_cache_level level)
> > +			   const enum i915_cache_mode cache_mode)
> >  {
> >  	u64 pde = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
> >  
> > -	if (level != I915_CACHE_NONE)
> > +	if (cache_mode != I915_CACHE_MODE_UC)
> >  		pde |= PPAT_CACHED_PDE;
> >  	else
> >  		pde |= PPAT_UNCACHED;
> > @@ -43,10 +43,10 @@ static u64 gen8_pte_encode(dma_addr_t addr,
> >  	 * See translation table defined by LEGACY_CACHELEVEL.
> >  	 */
> >  	switch (pat_index) {
> > -	case I915_CACHE_NONE:
> > +	case I915_CACHE_MODE_UC:
> >  		pte |= PPAT_UNCACHED;
> >  		break;
> > -	case I915_CACHE_WT:
> > +	case I915_CACHE_MODE_WT:
> >  		pte |= PPAT_DISPLAY_ELLC;
> >  		break;
> >  	default:
> > @@ -893,7 +893,7 @@ static int gen8_init_scratch(struct i915_address_space *vm)
> >  		}
> >  
> >  		fill_px(obj, vm->scratch[i - 1]->encode);
> > -		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_NONE);
> > +		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_MODE_UC);
> >  
> >  		vm->scratch[i] = obj;
> >  	}
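
Note the PTE/PDE encoders in this file still switch on pat_index while
naming i915_cache_mode values, i.e. they keep relying on the legacy 1:1
pat_index <-> cache mode mapping (and the comment above the switch still
refers to LEGACY_CACHELEVEL although the table is renamed to
LEGACY_CACHE_MODES in this series). Since the old BUILD_BUG_ON guarding
the enum values goes away later in the patch, a minimal compile-time
guard for the new enum could be kept instead (hypothetical sketch, not
part of the patch):

	BUILD_BUG_ON(I915_CACHE_MODE_UC != 0 ||
		     I915_CACHE_MODE_WB != 1 ||
		     __I915_CACHE_MODE_WB_L3 != 2 ||
		     I915_CACHE_MODE_WT != 3);
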
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > index ee15486fed0d..f1e59e512d14 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > @@ -1103,7 +1103,7 @@ static int init_status_page(struct intel_engine_cs *engine)
> >  		return PTR_ERR(obj);
> >  	}
> >  
> > -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> > +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
> >  
> >  	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
> >  	if (IS_ERR(vma)) {
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > index fca61ddca8ad..ab5f654e7557 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > @@ -1011,11 +1011,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
> >  	return ggtt_probe_common(ggtt, size);
> >  }
> >  
> > -/*
> > - * For pre-gen8 platforms pat_index is the same as enum i915_cache_level,
> > - * so the switch-case statements in these PTE encode functions are still valid.
> > - * See translation table LEGACY_CACHELEVEL.
> > - */
> >  static u64 snb_pte_encode(dma_addr_t addr,
> >  			  unsigned int pat_index,
> >  			  u32 flags)
> > @@ -1023,11 +1018,11 @@ static u64 snb_pte_encode(dma_addr_t addr,
> >  	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
> >  
> >  	switch (pat_index) {
> > -	case I915_CACHE_L3_LLC:
> > -	case I915_CACHE_LLC:
> > +	case I915_CACHE_MODE_WB:
> > +	case __I915_CACHE_MODE_WB_L3:
> >  		pte |= GEN6_PTE_CACHE_LLC;
> >  		break;
> > -	case I915_CACHE_NONE:
> > +	case I915_CACHE_MODE_UC:
> >  		pte |= GEN6_PTE_UNCACHED;
> >  		break;
> >  	default:
> > @@ -1044,13 +1039,13 @@ static u64 ivb_pte_encode(dma_addr_t addr,
> >  	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
> >  
> >  	switch (pat_index) {
> > -	case I915_CACHE_L3_LLC:
> > +	case __I915_CACHE_MODE_WB_L3:
> >  		pte |= GEN7_PTE_CACHE_L3_LLC;
> >  		break;
> > -	case I915_CACHE_LLC:
> > +	case I915_CACHE_MODE_WB:
> >  		pte |= GEN6_PTE_CACHE_LLC;
> >  		break;
> > -	case I915_CACHE_NONE:
> > +	case I915_CACHE_MODE_UC:
> >  		pte |= GEN6_PTE_UNCACHED;
> >  		break;
> >  	default:
> > @@ -1069,7 +1064,7 @@ static u64 byt_pte_encode(dma_addr_t addr,
> >  	if (!(flags & PTE_READ_ONLY))
> >  		pte |= BYT_PTE_WRITEABLE;
> >  
> > -	if (pat_index != I915_CACHE_NONE)
> > +	if (pat_index != I915_CACHE_MODE_UC)
> >  		pte |= BYT_PTE_SNOOPED_BY_CPU_CACHES;
> >  
> >  	return pte;
> > @@ -1081,7 +1076,7 @@ static u64 hsw_pte_encode(dma_addr_t addr,
> >  {
> >  	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
> >  
> > -	if (pat_index != I915_CACHE_NONE)
> > +	if (pat_index != I915_CACHE_MODE_UC)
> >  		pte |= HSW_WB_LLC_AGE3;
> >  
> >  	return pte;
> > @@ -1094,9 +1089,9 @@ static u64 iris_pte_encode(dma_addr_t addr,
> >  	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
> >  
> >  	switch (pat_index) {
> > -	case I915_CACHE_NONE:
> > +	case I915_CACHE_MODE_UC:
> >  		break;
> > -	case I915_CACHE_WT:
> > +	case I915_CACHE_MODE_WT:
> >  		pte |= HSW_WT_ELLC_LLC_AGE3;
> >  		break;
> >  	default:
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
> > index 866c416afb73..803c41ac4ccb 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
> > @@ -21,7 +21,7 @@ static void gmch_ggtt_insert_page(struct i915_address_space *vm,
> >  				  unsigned int pat_index,
> >  				  u32 unused)
> >  {
> > -	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
> > +	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
> >  		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
> >  
> >  	intel_gmch_gtt_insert_page(addr, offset >> PAGE_SHIFT, flags);
> > @@ -32,7 +32,7 @@ static void gmch_ggtt_insert_entries(struct i915_address_space *vm,
> >  				     unsigned int pat_index,
> >  				     u32 unused)
> >  {
> > -	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
> > +	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
> >  		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
> >  
> >  	intel_gmch_gtt_insert_sg_entries(vma_res->bi.pages, vma_res->start >> PAGE_SHIFT,
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > index 065099362a98..48055304537a 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > @@ -676,7 +676,7 @@ __vm_create_scratch_for_read(struct i915_address_space *vm, unsigned long size)
> >  	if (IS_ERR(obj))
> >  		return ERR_CAST(obj);
> >  
> > -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> > +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
> >  
> >  	vma = i915_vma_instance(obj, vm, NULL);
> >  	if (IS_ERR(vma)) {
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > index 7192a534a654..af4277c1d577 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > @@ -636,7 +636,8 @@ void
> >  __set_pd_entry(struct i915_page_directory * const pd,
> >  	       const unsigned short idx,
> >  	       struct i915_page_table *pt,
> > -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level));
> > +	       u64 (*encode)(const dma_addr_t,
> > +			     const enum i915_cache_mode cache_mode));
> >  
> >  #define set_pd_entry(pd, idx, to) \
> >  	__set_pd_entry((pd), (idx), px_pt(to), gen8_pde_encode)
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> > index 436756bfbb1a..3e461d4f3693 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> > @@ -98,14 +98,16 @@ void
> >  __set_pd_entry(struct i915_page_directory * const pd,
> >  	       const unsigned short idx,
> >  	       struct i915_page_table * const to,
> > -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level))
> > +	       u64 (*encode)(const dma_addr_t,
> > +			     const enum i915_cache_mode cache_mode))
> >  {
> >  	/* Each thread pre-pins the pd, and we may have a thread per pde. */
> >  	GEM_BUG_ON(atomic_read(px_used(pd)) > NALLOC * I915_PDES);
> >  
> >  	atomic_inc(px_used(pd));
> >  	pd->entry[idx] = to;
> > -	write_dma_entry(px_base(pd), idx, encode(px_dma(to), I915_CACHE_LLC));
> > +	write_dma_entry(px_base(pd), idx,
> > +			encode(px_dma(to), I915_CACHE_MODE_WB));
> >  }
> >  
> >  void
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > index 92085ffd23de..9131d228d285 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > @@ -551,7 +551,9 @@ alloc_context_vma(struct intel_engine_cs *engine)
> >  	 * later platforms don't have L3 control bits in the PTE.
> >  	 */
> >  	if (IS_IVYBRIDGE(i915))
> > -		i915_gem_object_set_cache_coherency(obj, I915_CACHE_L3_LLC);
> > +		i915_gem_object_set_cache_coherency(obj,
> > +						    I915_CACHE_CACHED |
> > +						    __I915_CACHE_FLAG(L3));
> >  
> >  	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
> >  	if (IS_ERR(vma)) {
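
Purely a readability thought: assuming the I915_CACHE() macro from
i915_cache.h below, this call site could also be written without the
manual flag OR, which expands to the same value:

	i915_gem_object_set_cache_coherency(obj,
					    I915_CACHE(WB, COH1W, COH2W, L3));
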
> > diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
> > index b9640212d659..025ce54c886d 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
> > @@ -26,7 +26,7 @@ static struct i915_vma *hwsp_alloc(struct intel_gt *gt)
> >  	if (IS_ERR(obj))
> >  		return ERR_CAST(obj);
> >  
> > -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> > +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
> >  
> >  	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
> >  	if (IS_ERR(vma))
> > diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> > index 8b0d84f2aad2..fc278fa463b0 100644
> > --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> > +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> > @@ -64,7 +64,7 @@ static int hang_init(struct hang *h, struct intel_gt *gt)
> >  		goto err_hws;
> >  	}
> >  
> > -	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_LLC);
> > +	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_CACHED);
> >  	vaddr = i915_gem_object_pin_map_unlocked(h->hws, I915_MAP_WB);
> >  	if (IS_ERR(vaddr)) {
> >  		err = PTR_ERR(vaddr);
> > diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> > index 14a8b25b6204..d25990d33d44 100644
> > --- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> > +++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> > @@ -111,7 +111,7 @@ read_nonprivs(struct intel_context *ce)
> >  	if (IS_ERR(result))
> >  		return result;
> >  
> > -	i915_gem_object_set_cache_coherency(result, I915_CACHE_LLC);
> > +	i915_gem_object_set_cache_coherency(result, I915_CACHE_CACHED);
> >  
> >  	cs = i915_gem_object_pin_map_unlocked(result, I915_MAP_WB);
> >  	if (IS_ERR(cs)) {
> > diff --git a/drivers/gpu/drm/i915/i915_cache.c b/drivers/gpu/drm/i915/i915_cache.c
> > index 06eb5933c719..f4ba1cb430d3 100644
> > --- a/drivers/gpu/drm/i915/i915_cache.c
> > +++ b/drivers/gpu/drm/i915/i915_cache.c
> > @@ -6,13 +6,88 @@
> >  #include "i915_cache.h"
> >  #include "i915_drv.h"
> >  
> > -void i915_cache_init(struct drm_i915_private *i915)
> > +int i915_cache_init(struct drm_i915_private *i915)
> >  {
> > -	i915->pat_uc = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
> > -	drm_info(&i915->drm, "Using PAT index %u for uncached access\n",
> > -		 i915->pat_uc);
> > +	int ret;
> >  
> > -	i915->pat_wb = i915_gem_get_pat_index(i915, I915_CACHE_LLC);
> > -	drm_info(&i915->drm, "Using PAT index %u for write-back access\n",
> > -		 i915->pat_wb);
> > +	ret = i915_cache_find_pat(i915, I915_CACHE_NONE);
> > +	if (ret < 0) {
> > +		drm_err(&i915->drm,
> > +			"Failed to find PAT index for uncached access\n");
> > +		return -ENODEV;
> > +	}
> > +	drm_info(&i915->drm, "Using PAT index %u for uncached access\n", ret);
> > +	i915->pat_uc = ret;
> > +
> > +	ret = i915_cache_find_pat(i915, I915_CACHE_CACHED);
> > +	if (ret < 0) {
> > +		drm_err(&i915->drm,
> > +			"Failed to find PAT index for write-back access\n");
> > +		return -ENODEV;
> > +	}
> > +	drm_info(&i915->drm, "Using PAT index %u for write-back access\n", ret);
> > +	i915->pat_wb = ret;
> > +
> > +	return 0;
> > +}
> > +
> > +int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache)
> > +{
> > +	const struct intel_device_info *info = INTEL_INFO(i915);
> > +	int i;
> > +
> > +	for (i = 0; i < ARRAY_SIZE(info->cache_modes); i++) {
> > +		if (info->cache_modes[i] == cache)
> > +			return i;
> > +	}
> > +
> > +	return -1;
> > +}
> > +
> > +void i915_cache_print(char *buf, size_t buflen, const char *suffix,
> > +		      i915_cache_t cache)
> > +{
> > +	const enum i915_cache_mode mode = I915_CACHE_MODE(cache);
> > +	static const char * const mode_str[] = {
> > +		[I915_CACHE_MODE_UC] = "UC",
> > +		[I915_CACHE_MODE_WB] = "WB",
> > +		[I915_CACHE_MODE_WT] = "WT",
> > +		[I915_CACHE_MODE_WC] = "WC",
> > +	};
> > +	static const char * const flag_str[] = {
> > +		[ilog2(I915_CACHE_FLAG_COH1W)] = "1-Way-Coherent",
> > +		[ilog2(I915_CACHE_FLAG_COH2W)] = "2-Way-Coherent",
> > +		[ilog2(I915_CACHE_FLAG_L3)] =    "L3",
> > +		[ilog2(I915_CACHE_FLAG_CLOS1)] = "CLOS1",
> > +		[ilog2(I915_CACHE_FLAG_CLOS2)] = "CLOS2",
> > +	};
> > +
> > +	if (mode >= ARRAY_SIZE(mode_str) || !mode_str[mode]) {
> > +		snprintf(buf, buflen, "0x%x%s", cache, suffix ?: "");
> > +	} else {
> > +		unsigned long flags = I915_CACHE_FLAGS(cache);
> > +		unsigned long bit;
> > +		int ret;
> > +
> > +		ret = scnprintf(buf, buflen, "%s", mode_str[mode]);
> > +		buf += ret;
> > +		buflen -= ret;
> > +
> > +		/*
> > +		 * Don't print "1-way-2-way", it would be confusing and 2-way
> > +		 * implies 1-way anyway.
> > +		 */
> > +		if ((flags & (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W)) ==
> > +		    (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W))
> > +			flags &= ~I915_CACHE_FLAG_COH1W;
> > +
> > +		for_each_set_bit(bit, &flags, BITS_PER_TYPE(i915_cache_t)) {
> > +			ret = scnprintf(buf, buflen, "-%s", flag_str[bit]);
> > +			buf += ret;
> > +			buflen -= ret;
> > +		}
> > +
> > +		if (suffix)
> > +			snprintf(buf, buflen, "%s", suffix);
> > +	}
> >  }
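
A quick worked example of the reversal logic, using the MTL_CACHE_MODES
table further down (illustration only, not part of the patch):

	char buf[I915_CACHE_NAME_LEN];
	int idx;

	idx = i915_cache_find_pat(i915, I915_CACHE_CACHED);
	/* idx == 4 on MTL: the WB + COH1W + COH2W entry. */

	i915_cache_print(buf, sizeof(buf), NULL, I915_CACHE_CACHED);
	/* buf == "WB-2-Way-Coherent"; COH1W is folded into COH2W. */

With a non-NULL suffix, such as the "!" debugfs passes for user-set PAT
indices, the suffix is appended at the end.
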
> > diff --git a/drivers/gpu/drm/i915/i915_cache.h b/drivers/gpu/drm/i915/i915_cache.h
> > index cb68936fb8a2..d9e97318b942 100644
> > --- a/drivers/gpu/drm/i915/i915_cache.h
> > +++ b/drivers/gpu/drm/i915/i915_cache.h
> > @@ -6,8 +6,76 @@
> >  #ifndef __I915_CACHE_H__
> >  #define __I915_CACHE_H__
> >  
> > +#include <linux/types.h>
> > +
> > +struct drm_printer;
> > +
> >  struct drm_i915_private;
> >  
> > -void i915_cache_init(struct drm_i915_private *i915);
> > +typedef u16 i915_cache_t;
> > +
> > +/* Cache modes */
> > +enum i915_cache_mode {
> > +	I915_CACHE_MODE_UC = 0,
> > +	I915_CACHE_MODE_WB,
> > +	__I915_CACHE_MODE_WB_L3, /* Special do-not-use entry for legacy 1:1 mapping. */
> > +	I915_CACHE_MODE_WT,
> > +	I915_CACHE_MODE_WC,
> > +	I915_NUM_CACHE_MODES
> > +};
> > +
> > +/* Cache mode flag bits */
> > +#define I915_CACHE_FLAG_COH1W	(0x1)
> > +#define I915_CACHE_FLAG_COH2W	(0x2) /* 1-way needs to be set too. */
> > +#define I915_CACHE_FLAG_L3	(0x4)
> > +#define I915_CACHE_FLAG_CLOS1	(0x8)
> > +#define I915_CACHE_FLAG_CLOS2	(0x10)
> > +
> > +/*
> > + * Overloaded I915_CACHE() macro based on:
> > + *  https://stackoverflow.com/questions/3046889/optional-parameters-with-c-macros
> > + *
> > + * It is possible to call I915_CACHE with mode and zero or more flags as
> > + * separate arguments. Ie these all work:
> > + *
> > + *   I915_CACHE(WB)
> > + *   I915_CACHE(WB, COH1W, COH2W)
> > + *   I915_CACHE(WB, COH1W, COH2W, L3)
> > + */
> > +
> > +#define __I915_CACHE_FLAG(f) (I915_CACHE_FLAG_##f << 8)
> > +#define __I915_CACHE(m, f) ((i915_cache_t)(I915_CACHE_MODE_##m | (f)))
> > +
> > +#define I915_CACHE_4(m, f1, f2, f3)	__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2) | __I915_CACHE_FLAG(f3))
> > +#define I915_CACHE_3(m, f1, f2)		__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2))
> > +#define I915_CACHE_2(m, f1)		__I915_CACHE(m, __I915_CACHE_FLAG(f1))
> > +#define I915_CACHE_1(m)			__I915_CACHE(m, 0)
> > +#define I915_CACHE_0(m)			__I915_CACHE(WC, 0)
> > +
> > +#define FUNC_CHOOSER(_f1, _f2, _f3, _f4, _f5, ...) _f5
> > +#define FUNC_RECOMPOSER(argsWithParentheses) FUNC_CHOOSER argsWithParentheses
> > +#define CHOOSE_FROM_ARG_COUNT(...) FUNC_RECOMPOSER((__VA_ARGS__, I915_CACHE_4, I915_CACHE_3, I915_CACHE_2, I915_CACHE_1, ))
> > +#define NO_ARG_EXPANDER() ,,,I915_CACHE_0
> > +#define MACRO_CHOOSER(...) CHOOSE_FROM_ARG_COUNT(NO_ARG_EXPANDER __VA_ARGS__ ())
> > +
> > +#define I915_CACHE(...) MACRO_CHOOSER(__VA_ARGS__)(__VA_ARGS__)
> > +
> > +/* i915_cache_t mode and flags extraction helpers. */
> > +#define I915_CACHE_MODE(cache) \
> > +	((enum i915_cache_mode)(((i915_cache_t)(cache)) & 0xff))
> > +#define I915_CACHE_FLAGS(cache) \
> > +	((unsigned int)((((i915_cache_t)(cache) & 0xff00)) >> 8))
> > +
> > +/* Helpers for i915 caching modes. */
> > +#define I915_CACHE_NONE		I915_CACHE(UC)
> > +#define I915_CACHE_CACHED	I915_CACHE(WB, COH1W, COH2W)
> > +#define I915_CACHE_WT		I915_CACHE(WT)
> > +
> > +int i915_cache_init(struct drm_i915_private *i915);
> > +int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache);
> > +void i915_cache_print(char *buf, size_t buflen, const char *suffix,
> > +		      i915_cache_t cache);
> > +
> > +#define I915_CACHE_NAME_LEN (40)
> >  
> >  #endif /* __I915_CACHE_H__ */
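
To make the argument-counting trick concrete, a few expansions worked
out by hand (worth double-checking):

	I915_CACHE(WB)                /* -> I915_CACHE_1 -> 0x0001 */
	I915_CACHE(WB, COH1W)         /* -> I915_CACHE_2 -> 0x0101 */
	I915_CACHE(WB, COH1W, COH2W)  /* -> I915_CACHE_3 -> 0x0301 */

i.e. the mode lives in the low byte and the flags in the high byte, as
the I915_CACHE_MODE()/I915_CACHE_FLAGS() extractors assume. The
zero-argument path (I915_CACHE_0 hardcoding WC) looks unused and is
possibly worth a second look.
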
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > index 4de44cf1026d..4ec292011546 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -140,57 +140,18 @@ static const char *stringify_vma_type(const struct i915_vma *vma)
> >  	return "ppgtt";
> >  }
> >  
> > -static const char *i915_cache_level_str(struct drm_i915_gem_object *obj)
> > -{
> > -	struct drm_i915_private *i915 = obj_to_i915(obj);
> > -
> > -	if (IS_METEORLAKE(i915)) {
> > -		switch (obj->pat_index) {
> > -		case 0: return " WB";
> > -		case 1: return " WT";
> > -		case 2: return " UC";
> > -		case 3: return " WB (1-Way Coh)";
> > -		case 4: return " WB (2-Way Coh)";
> > -		default: return " not defined";
> > -		}
> > -	} else if (IS_PONTEVECCHIO(i915)) {
> > -		switch (obj->pat_index) {
> > -		case 0: return " UC";
> > -		case 1: return " WC";
> > -		case 2: return " WT";
> > -		case 3: return " WB";
> > -		case 4: return " WT (CLOS1)";
> > -		case 5: return " WB (CLOS1)";
> > -		case 6: return " WT (CLOS2)";
> > -		case 7: return " WT (CLOS2)";
> > -		default: return " not defined";
> > -		}
> > -	} else if (GRAPHICS_VER(i915) >= 12) {
> > -		switch (obj->pat_index) {
> > -		case 0: return " WB";
> > -		case 1: return " WC";
> > -		case 2: return " WT";
> > -		case 3: return " UC";
> > -		default: return " not defined";
> > -		}
> > -	} else {
> > -		switch (obj->pat_index) {
> > -		case 0: return " UC";
> > -		case 1: return HAS_LLC(i915) ?
> > -			       " LLC" : " snooped";
> > -		case 2: return " L3+LLC";
> > -		case 3: return " WT";
> > -		default: return " not defined";
> > -		}
> > -	}
> > -}
> > -
> >  void
> >  i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
> >  {
> > +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > +	char buf[I915_CACHE_NAME_LEN];
> >  	struct i915_vma *vma;
> >  	int pin_count = 0;
> >  
> > +	i915_cache_print(buf, sizeof(buf),
> > +			 obj->pat_set_by_user ? "!" : NULL,
> > +			 INTEL_INFO(i915)->cache_modes[obj->pat_index]);
> > +
> >  	seq_printf(m, "%pK: %c%c%c %8zdKiB %02x %02x %s%s%s",
> >  		   &obj->base,
> >  		   get_tiling_flag(obj),
> > @@ -199,7 +160,7 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
> >  		   obj->base.size / 1024,
> >  		   obj->read_domains,
> >  		   obj->write_domain,
> > -		   i915_cache_level_str(obj),
> > +		   buf,
> >  		   obj->mm.dirty ? " dirty" : "",
> >  		   obj->mm.madv == I915_MADV_DONTNEED ? " purgeable" : "");
> >  	if (obj->base.name)
> > diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> > index bb2223cc3470..8663388a524f 100644
> > --- a/drivers/gpu/drm/i915/i915_driver.c
> > +++ b/drivers/gpu/drm/i915/i915_driver.c
> > @@ -241,7 +241,9 @@ static int i915_driver_early_probe(struct drm_i915_private *dev_priv)
> >  	i915_memcpy_init_early(dev_priv);
> >  	intel_runtime_pm_init_early(&dev_priv->runtime_pm);
> >  
> > -	i915_cache_init(dev_priv);
> > +	ret = i915_cache_init(dev_priv);
> > +	if (ret < 0)
> > +		return ret;
> >  
> >  	ret = i915_workqueues_init(dev_priv);
> >  	if (ret < 0)
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 896aa48ed089..814705cfeb12 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -1144,19 +1144,6 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
> >  	unsigned int i;
> >  	int ret;
> >  
> > -	/*
> > -	 * In the proccess of replacing cache_level with pat_index a tricky
> > -	 * dependency is created on the definition of the enum i915_cache_level.
> > -	 * in case this enum is changed, PTE encode would be broken.
> > -	 * Add a WARNING here. And remove when we completely quit using this
> > -	 * enum
> > -	 */
> > -	BUILD_BUG_ON(I915_CACHE_NONE != 0 ||
> > -		     I915_CACHE_LLC != 1 ||
> > -		     I915_CACHE_L3_LLC != 2 ||
> > -		     I915_CACHE_WT != 3 ||
> > -		     I915_MAX_CACHE_LEVEL != 4);
> > -
> >  	/* We need to fallback to 4K pages if host doesn't support huge gtt. */
> >  	if (intel_vgpu_active(dev_priv) && !intel_vgpu_has_huge_gtt(dev_priv))
> >  		RUNTIME_INFO(dev_priv)->page_sizes = I915_GTT_PAGE_SIZE_4K;
> > diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> > index fcacdc21643c..565a60a1645d 100644
> > --- a/drivers/gpu/drm/i915/i915_pci.c
> > +++ b/drivers/gpu/drm/i915/i915_pci.c
> > @@ -32,6 +32,7 @@
> >  #include "gt/intel_sa_media.h"
> >  #include "gem/i915_gem_object_types.h"
> >  
> > +#include "i915_cache.h"
> >  #include "i915_driver.h"
> >  #include "i915_drv.h"
> >  #include "i915_pci.h"
> > @@ -43,36 +44,43 @@
> >  	.__runtime.graphics.ip.ver = (x), \
> >  	.__runtime.media.ip.ver = (x)
> >  
> > -#define LEGACY_CACHELEVEL \
> > -	.cachelevel_to_pat = { \
> > -		[I915_CACHE_NONE]   = 0, \
> > -		[I915_CACHE_LLC]    = 1, \
> > -		[I915_CACHE_L3_LLC] = 2, \
> > -		[I915_CACHE_WT]     = 3, \
> > +#define LEGACY_CACHE_MODES \
> > +	.cache_modes = { \
> > +		[I915_CACHE_MODE_UC] 	  = I915_CACHE(UC), \
> > +		[I915_CACHE_MODE_WB] 	  = I915_CACHE(WB, COH1W, COH2W), \
> 
> Reading bspec 2863 (bdw) indicates that the CPU being able to snoop the
> GPU's L3 was a new feature in gen8.  So for HSW and earlier, any
> coherency was only 1-way (GPU could be coherent with CPU's caches, but
> not vice-versa).  Only starting with gen8 did we get 2-way coherency as
> an option where the CPU would also be coherent with the GPU cache (and
> with gen8 and beyond you could still select 1-way instead of 2-way
> coherency with instruction-level granularity via MOCS).  There are also
> some legacy platforms (e.g., EHL/JSL on bspec 13948) where the IA wasn't
> coherent with GPU L3 so we were back to 1-way coherency.
> 
> So should we split LEGACY_CACHE_MODES into two tables with different
> coherency settings attached to I915_CACHE_MODE_WB?
> 
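If the table does get split, a rough sketch of what that could look like
(the coherency flags here are only my reading of the bspec notes above,
and the GEN8_CACHE_MODES name is made up):

	#define GEN8_CACHE_MODES \
		.cache_modes = { \
			[I915_CACHE_MODE_UC]	  = I915_CACHE(UC), \
			[I915_CACHE_MODE_WB]	  = I915_CACHE(WB, COH1W, COH2W), \
			[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, COH2W, L3), \
			[I915_CACHE_MODE_WT]	  = I915_CACHE(WT), \
		}

with LEGACY_CACHE_MODES then keeping only COH1W on the WB entries for
the pre-gen8 (and EHL/JSL-like) platforms.
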
> > +		[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, COH2W, L3), \
> > +		[I915_CACHE_MODE_WT] 	  = I915_CACHE(WT), \
> >  	}
> >  
> > -#define TGL_CACHELEVEL \
> > -	.cachelevel_to_pat = { \
> > -		[I915_CACHE_NONE]   = 3, \
> > -		[I915_CACHE_LLC]    = 0, \
> > -		[I915_CACHE_L3_LLC] = 0, \
> > -		[I915_CACHE_WT]     = 2, \
> > +#define GEN12_CACHE_MODES \
> > +	.cache_modes = { \
> > +		[0] = I915_CACHE(WB, COH1W, COH2W), \
> > +		[1] = I915_CACHE(WC), \
> > +		[2] = I915_CACHE(WT), \
> > +		[3] = I915_CACHE(UC), \
> >  	}
> >  
> > -#define PVC_CACHELEVEL \
> > -	.cachelevel_to_pat = { \
> > -		[I915_CACHE_NONE]   = 0, \
> > -		[I915_CACHE_LLC]    = 3, \
> > -		[I915_CACHE_L3_LLC] = 3, \
> > -		[I915_CACHE_WT]     = 2, \
> > +/* FIXME: is WB 1-way or 2-way coherent for PAT indices 3, 5 and 7? */
> > +
> > +#define PVC_CACHE_MODES \
> > +	.cache_modes = { \
> > +		[0] = I915_CACHE(UC), \
> > +		[1] = I915_CACHE(WC), \
> > +		[2] = I915_CACHE(WT), \
> > +		[3] = I915_CACHE(WB, COH1W), \
> > +		[4] = I915_CACHE(WT, CLOS1), \
> > +		[5] = I915_CACHE(WB, COH1W, CLOS1), \
> > +		[6] = I915_CACHE(WT, CLOS2), \
> > +		[7] = I915_CACHE(WB, COH1W, CLOS2), \
> >  	}
> >  
> > -#define MTL_CACHELEVEL \
> > -	.cachelevel_to_pat = { \
> > -		[I915_CACHE_NONE]   = 2, \
> > -		[I915_CACHE_LLC]    = 3, \
> > -		[I915_CACHE_L3_LLC] = 3, \
> > -		[I915_CACHE_WT]     = 1, \
> > +#define MTL_CACHE_MODES \
> > +	.cache_modes = { \
> > +		[0] = I915_CACHE(WB), \
> > +		[1] = I915_CACHE(WT), \
> > +		[2] = I915_CACHE(UC), \
> > +		[3] = I915_CACHE(WB, COH1W), \
> > +		[4] = I915_CACHE(WB, COH1W, COH2W), \
> 
> We may want a comment on this one since the "2W" part is sort of a lie.
> Bspec 63884 has a programming note for MTL that says
> 
>         "...Except for system atomics, setting Coherency Mode to 10 or
>         11 results in this same one-way coherent behavior..."
> 
> So if we ask for 2W, we actually only get 1W behavior except in a very
> narrow set of cases.
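
A possible wording for that note, as a draft comment in the table
(phrasing to be double-checked against bspec 63884):

	[3] = I915_CACHE(WB, COH1W), \
	/*
	 * Bspec 63884: with the exception of system atomics, MTL
	 * coherency modes 10 and 11 behave as one-way coherent, so
	 * COH2W here records what was requested from the PAT rather
	 * than what the HW guarantees.
	 */
	[4] = I915_CACHE(WB, COH1W, COH2W), \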
> 
> 
> Matt
> 
> >  	}
> >  
> >  /* Keep in gen based order, and chronological order within a gen */
> > @@ -97,7 +105,7 @@
> >  	.max_pat_index = 3, \
> >  	GEN_DEFAULT_PAGE_SIZES, \
> >  	GEN_DEFAULT_REGIONS, \
> > -	LEGACY_CACHELEVEL
> > +	LEGACY_CACHE_MODES
> >  
> >  #define I845_FEATURES \
> >  	GEN(2), \
> > @@ -112,7 +120,7 @@
> >  	.max_pat_index = 3, \
> >  	GEN_DEFAULT_PAGE_SIZES, \
> >  	GEN_DEFAULT_REGIONS, \
> > -	LEGACY_CACHELEVEL
> > +	LEGACY_CACHE_MODES
> >  
> >  static const struct intel_device_info i830_info = {
> >  	I830_FEATURES,
> > @@ -145,7 +153,7 @@ static const struct intel_device_info i865g_info = {
> >  	.max_pat_index = 3, \
> >  	GEN_DEFAULT_PAGE_SIZES, \
> >  	GEN_DEFAULT_REGIONS, \
> > -	LEGACY_CACHELEVEL
> > +	LEGACY_CACHE_MODES
> >  
> >  static const struct intel_device_info i915g_info = {
> >  	GEN3_FEATURES,
> > @@ -208,7 +216,7 @@ static const struct intel_device_info pnv_m_info = {
> >  	.max_pat_index = 3, \
> >  	GEN_DEFAULT_PAGE_SIZES, \
> >  	GEN_DEFAULT_REGIONS, \
> > -	LEGACY_CACHELEVEL
> > +	LEGACY_CACHE_MODES
> >  
> >  static const struct intel_device_info i965g_info = {
> >  	GEN4_FEATURES,
> > @@ -252,7 +260,7 @@ static const struct intel_device_info gm45_info = {
> >  	.max_pat_index = 3, \
> >  	GEN_DEFAULT_PAGE_SIZES, \
> >  	GEN_DEFAULT_REGIONS, \
> > -	LEGACY_CACHELEVEL
> > +	LEGACY_CACHE_MODES
> >  
> >  static const struct intel_device_info ilk_d_info = {
> >  	GEN5_FEATURES,
> > @@ -282,7 +290,7 @@ static const struct intel_device_info ilk_m_info = {
> >  	.__runtime.ppgtt_size = 31, \
> >  	GEN_DEFAULT_PAGE_SIZES, \
> >  	GEN_DEFAULT_REGIONS, \
> > -	LEGACY_CACHELEVEL
> > +	LEGACY_CACHE_MODES
> >  
> >  #define SNB_D_PLATFORM \
> >  	GEN6_FEATURES, \
> > @@ -330,7 +338,7 @@ static const struct intel_device_info snb_m_gt2_info = {
> >  	.__runtime.ppgtt_size = 31, \
> >  	GEN_DEFAULT_PAGE_SIZES, \
> >  	GEN_DEFAULT_REGIONS, \
> > -	LEGACY_CACHELEVEL
> > +	LEGACY_CACHE_MODES
> >  
> >  #define IVB_D_PLATFORM \
> >  	GEN7_FEATURES, \
> > @@ -387,7 +395,7 @@ static const struct intel_device_info vlv_info = {
> >  	.platform_engine_mask = BIT(RCS0) | BIT(VCS0) | BIT(BCS0),
> >  	GEN_DEFAULT_PAGE_SIZES,
> >  	GEN_DEFAULT_REGIONS,
> > -	LEGACY_CACHELEVEL,
> > +	LEGACY_CACHE_MODES
> >  };
> >  
> >  #define G75_FEATURES  \
> > @@ -473,7 +481,7 @@ static const struct intel_device_info chv_info = {
> >  	.has_coherent_ggtt = false,
> >  	GEN_DEFAULT_PAGE_SIZES,
> >  	GEN_DEFAULT_REGIONS,
> > -	LEGACY_CACHELEVEL,
> > +	LEGACY_CACHE_MODES
> >  };
> >  
> >  #define GEN9_DEFAULT_PAGE_SIZES \
> > @@ -536,7 +544,7 @@ static const struct intel_device_info skl_gt4_info = {
> >  	.max_pat_index = 3, \
> >  	GEN9_DEFAULT_PAGE_SIZES, \
> >  	GEN_DEFAULT_REGIONS, \
> > -	LEGACY_CACHELEVEL
> > +	LEGACY_CACHE_MODES
> >  
> >  static const struct intel_device_info bxt_info = {
> >  	GEN9_LP_FEATURES,
> > @@ -640,7 +648,7 @@ static const struct intel_device_info jsl_info = {
> >  #define GEN12_FEATURES \
> >  	GEN11_FEATURES, \
> >  	GEN(12), \
> > -	TGL_CACHELEVEL, \
> > +	GEN12_CACHE_MODES, \
> >  	.has_global_mocs = 1, \
> >  	.has_pxp = 1, \
> >  	.max_pat_index = 3
> > @@ -708,7 +716,7 @@ static const struct intel_device_info adl_p_info = {
> >  	.__runtime.graphics.ip.ver = 12, \
> >  	.__runtime.graphics.ip.rel = 50, \
> >  	XE_HP_PAGE_SIZES, \
> > -	TGL_CACHELEVEL, \
> > +	GEN12_CACHE_MODES, \
> >  	.dma_mask_size = 46, \
> >  	.has_3d_pipeline = 1, \
> >  	.has_64bit_reloc = 1, \
> > @@ -803,7 +811,7 @@ static const struct intel_device_info pvc_info = {
> >  		BIT(VCS0) |
> >  		BIT(CCS0) | BIT(CCS1) | BIT(CCS2) | BIT(CCS3),
> >  	.require_force_probe = 1,
> > -	PVC_CACHELEVEL,
> > +	PVC_CACHE_MODES
> >  };
> >  
> >  static const struct intel_gt_definition xelpmp_extra_gt[] = {
> > @@ -838,7 +846,7 @@ static const struct intel_device_info mtl_info = {
> >  	.memory_regions = REGION_SMEM | REGION_STOLEN_LMEM,
> >  	.platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(CCS0),
> >  	.require_force_probe = 1,
> > -	MTL_CACHELEVEL,
> > +	MTL_CACHE_MODES
> >  };
> >  
> >  #undef PLATFORM
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> > index 04bc1f4a1115..973175a64534 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -1870,7 +1870,7 @@ static int alloc_oa_buffer(struct i915_perf_stream *stream)
> >  		return PTR_ERR(bo);
> >  	}
> >  
> > -	i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC);
> > +	i915_gem_object_set_cache_coherency(bo, I915_CACHE_CACHED);
> >  
> >  	/* PreHSW required 512K alignment, HSW requires 16M */
> >  	vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL);
> > diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
> > index dbfe6443457b..2ce13b7c48cb 100644
> > --- a/drivers/gpu/drm/i915/intel_device_info.h
> > +++ b/drivers/gpu/drm/i915/intel_device_info.h
> > @@ -27,6 +27,8 @@
> >  
> >  #include <uapi/drm/i915_drm.h>
> >  
> > +#include "i915_cache.h"
> > +
> >  #include "intel_step.h"
> >  
> >  #include "gt/intel_engine_types.h"
> > @@ -243,8 +245,8 @@ struct intel_device_info {
> >  	 */
> >  	const struct intel_runtime_info __runtime;
> >  
> > -	u32 cachelevel_to_pat[I915_MAX_CACHE_LEVEL];
> > -	u32 max_pat_index;
> > +	i915_cache_t cache_modes[8];
> > +	unsigned int max_pat_index;
> >  };
> >  
> >  struct intel_driver_caps {
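
One subtlety with the fixed-size table: the unused tail of cache_modes[]
is zero-initialised, and zero happens to equal I915_CACHE(UC), so
i915_cache_find_pat() can in principle match a slot past max_pat_index.
It works out today because every table defines UC at a lower index, but
bounding the loop might be safer; a sketch (assuming max_pat_index
remains the highest valid index):

	for (i = 0; i <= info->max_pat_index; i++) {
		if (info->cache_modes[i] == cache)
			return i;
	}
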
> > diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> > index f910ec9b6d2b..ba821e48baa5 100644
> > --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> > +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> > @@ -267,7 +267,7 @@ static int igt_evict_for_cache_color(void *arg)
> >  		err = PTR_ERR(obj);
> >  		goto cleanup;
> >  	}
> > -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> > +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
> >  	quirk_add(obj, &objects);
> >  
> >  	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
> > @@ -283,7 +283,7 @@ static int igt_evict_for_cache_color(void *arg)
> >  		err = PTR_ERR(obj);
> >  		goto cleanup;
> >  	}
> > -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> > +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
> >  	quirk_add(obj, &objects);
> >  
> >  	/* Neighbouring; same colour - should fit */
> > diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
> > index 3c5e0952f1b8..4cfc5000d6ff 100644
> > --- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
> > +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
> > @@ -23,7 +23,7 @@ int igt_spinner_init(struct igt_spinner *spin, struct intel_gt *gt)
> >  		err = PTR_ERR(spin->hws);
> >  		goto err;
> >  	}
> > -	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_LLC);
> > +	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_CACHED);
> >  
> >  	spin->obj = i915_gem_object_create_internal(gt->i915, PAGE_SIZE);
> >  	if (IS_ERR(spin->obj)) {
> > diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > index 1d1a457e2aee..8ae77bcf27fa 100644
> > --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > @@ -126,13 +126,13 @@ static const struct intel_device_info mock_info = {
> >  	.memory_regions = REGION_SMEM,
> >  	.platform_engine_mask = BIT(0),
> >  
> > -	/* simply use legacy cache level for mock device */
> > +	/* Simply use legacy cache modes for the mock device. */
> >  	.max_pat_index = 3,
> > -	.cachelevel_to_pat = {
> > -		[I915_CACHE_NONE]   = 0,
> > -		[I915_CACHE_LLC]    = 1,
> > -		[I915_CACHE_L3_LLC] = 2,
> > -		[I915_CACHE_WT]     = 3,
> > +	.cache_modes = {
> > +		[0] = I915_CACHE(UC),
> > +		[1] = I915_CACHE(WB, COH1W),
> > +		[2] = I915_CACHE(WB, COH1W, COH2W, L3),
> > +		[3] = I915_CACHE(WT),
> >  	},
> >  };
> >  
> > @@ -181,7 +181,7 @@ struct drm_i915_private *mock_gem_device(void)
> >  	/* Set up device info and initial runtime info. */
> >  	intel_device_info_driver_create(i915, pdev->device, &mock_info);
> >  
> > -	i915_cache_init(i915);
> > +	WARN_ON(i915_cache_init(i915));
> >  
> >  	dev_pm_domain_set(&pdev->dev, &pm_domain);
> >  	pm_runtime_enable(&pdev->dev);
> > -- 
> > 2.39.2
> > 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [Intel-gfx] ✗ Fi.CI.IGT: failure for Another take on PAT/object cache mode refactoring
  2023-07-27 14:54 ` [Intel-gfx] " Tvrtko Ursulin
                   ` (11 preceding siblings ...)
  (?)
@ 2023-07-28  1:03 ` Patchwork
  -1 siblings, 0 replies; 59+ messages in thread
From: Patchwork @ 2023-07-28  1:03 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 41272 bytes --]

== Series Details ==

Series: Another take on PAT/object cache mode refactoring
URL   : https://patchwork.freedesktop.org/series/121450/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_13432_full -> Patchwork_121450v1_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_121450v1_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_121450v1_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (10 -> 10)
------------------------------

  No changes in participating hosts

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_121450v1_full:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@mock@timelines:
    - shard-apl:          [PASS][1] -> [ABORT][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-apl4/igt@i915_selftest@mock@timelines.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-apl1/igt@i915_selftest@mock@timelines.html
    - shard-glk:          [PASS][3] -> [ABORT][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-glk4/igt@i915_selftest@mock@timelines.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-glk7/igt@i915_selftest@mock@timelines.html
    - shard-dg2:          [PASS][5] -> [ABORT][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-dg2-1/igt@i915_selftest@mock@timelines.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg2-3/igt@i915_selftest@mock@timelines.html
    - shard-rkl:          [PASS][7] -> [ABORT][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-rkl-6/igt@i915_selftest@mock@timelines.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-rkl-1/igt@i915_selftest@mock@timelines.html
    - shard-dg1:          [PASS][9] -> [ABORT][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-dg1-19/igt@i915_selftest@mock@timelines.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-16/igt@i915_selftest@mock@timelines.html
    - shard-tglu:         [PASS][11] -> [ABORT][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-tglu-5/igt@i915_selftest@mock@timelines.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-tglu-9/igt@i915_selftest@mock@timelines.html

  * igt@syncobj_wait@multi-wait-submitted:
    - shard-dg2:          [PASS][13] -> [TIMEOUT][14] +1 similar issue
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-dg2-6/igt@syncobj_wait@multi-wait-submitted.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg2-6/igt@syncobj_wait@multi-wait-submitted.html

  
#### Warnings ####

  * igt@kms_ccs@pipe-b-crc-primary-rotation-180-4_tiled_mtl_rc_ccs_cc:
    - shard-dg2:          [SKIP][15] ([i915#5354]) -> [TIMEOUT][16]
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-dg2-6/igt@kms_ccs@pipe-b-crc-primary-rotation-180-4_tiled_mtl_rc_ccs_cc.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg2-6/igt@kms_ccs@pipe-b-crc-primary-rotation-180-4_tiled_mtl_rc_ccs_cc.html

  * igt@kms_ccs@pipe-d-random-ccs-data-y_tiled_gen12_rc_ccs:
    - shard-dg2:          [SKIP][17] ([i915#3689] / [i915#5354]) -> [TIMEOUT][18]
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-dg2-6/igt@kms_ccs@pipe-d-random-ccs-data-y_tiled_gen12_rc_ccs.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg2-6/igt@kms_ccs@pipe-d-random-ccs-data-y_tiled_gen12_rc_ccs.html

  * igt@kms_cursor_crc@cursor-onscreen-32x32:
    - shard-dg2:          [SKIP][19] ([i915#3555]) -> [TIMEOUT][20]
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-dg2-6/igt@kms_cursor_crc@cursor-onscreen-32x32.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg2-6/igt@kms_cursor_crc@cursor-onscreen-32x32.html

  * igt@kms_frontbuffer_tracking@fbcpsr-rgb101010-draw-mmap-cpu:
    - shard-dg2:          [SKIP][21] ([i915#3458]) -> [TIMEOUT][22]
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-dg2-6/igt@kms_frontbuffer_tracking@fbcpsr-rgb101010-draw-mmap-cpu.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg2-6/igt@kms_frontbuffer_tracking@fbcpsr-rgb101010-draw-mmap-cpu.html

  * igt@kms_tv_load_detect@load-detect:
    - shard-dg2:          [SKIP][23] ([fdo#109309]) -> [TIMEOUT][24]
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-dg2-6/igt@kms_tv_load_detect@load-detect.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg2-6/igt@kms_tv_load_detect@load-detect.html

  
Known issues
------------

  Here are the changes found in Patchwork_121450v1_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@drm_fdinfo@virtual-busy:
    - shard-dg1:          NOTRUN -> [SKIP][25] ([i915#8414])
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@drm_fdinfo@virtual-busy.html

  * igt@gem_barrier_race@remote-request@rcs0:
    - shard-apl:          [PASS][26] -> [ABORT][27] ([i915#7461] / [i915#8211] / [i915#8234])
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-apl2/igt@gem_barrier_race@remote-request@rcs0.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-apl6/igt@gem_barrier_race@remote-request@rcs0.html

  * igt@gem_busy@close-race:
    - shard-tglu:         [PASS][28] -> [ABORT][29] ([i915#6016])
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-tglu-2/igt@gem_busy@close-race.html
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-tglu-4/igt@gem_busy@close-race.html

  * igt@gem_ctx_persistence@legacy-engines-queued:
    - shard-snb:          NOTRUN -> [SKIP][30] ([fdo#109271] / [i915#1099]) +4 similar issues
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-snb6/igt@gem_ctx_persistence@legacy-engines-queued.html

  * igt@gem_eio@kms:
    - shard-dg1:          NOTRUN -> [FAIL][31] ([i915#5784])
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@gem_eio@kms.html

  * igt@gem_exec_await@wide-contexts:
    - shard-dg2:          [PASS][32] -> [TIMEOUT][33] ([i915#5892])
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-dg2-6/igt@gem_exec_await@wide-contexts.html
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg2-6/igt@gem_exec_await@wide-contexts.html

  * igt@gem_exec_fair@basic-none-solo@rcs0:
    - shard-apl:          [PASS][34] -> [FAIL][35] ([i915#2842])
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-apl6/igt@gem_exec_fair@basic-none-solo@rcs0.html
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-apl6/igt@gem_exec_fair@basic-none-solo@rcs0.html

  * igt@gem_exec_fair@basic-pace@rcs0:
    - shard-glk:          [PASS][36] -> [FAIL][37] ([i915#2842])
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-glk7/igt@gem_exec_fair@basic-pace@rcs0.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-glk5/igt@gem_exec_fair@basic-pace@rcs0.html

  * igt@gem_exec_fair@basic-throttle:
    - shard-dg1:          NOTRUN -> [SKIP][38] ([i915#3539])
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@gem_exec_fair@basic-throttle.html

  * igt@gem_exec_flush@basic-wb-rw-before-default:
    - shard-dg1:          NOTRUN -> [SKIP][39] ([i915#3539] / [i915#4852])
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@gem_exec_flush@basic-wb-rw-before-default.html

  * igt@gem_exec_reloc@basic-wc-read-active:
    - shard-dg1:          NOTRUN -> [SKIP][40] ([i915#3281])
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@gem_exec_reloc@basic-wc-read-active.html

  * igt@gem_exec_suspend@basic-s4-devices@smem:
    - shard-tglu:         [PASS][41] -> [ABORT][42] ([i915#7975] / [i915#8213])
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-tglu-2/igt@gem_exec_suspend@basic-s4-devices@smem.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-tglu-10/igt@gem_exec_suspend@basic-s4-devices@smem.html

  * igt@gem_fenced_exec_thrash@no-spare-fences:
    - shard-dg1:          NOTRUN -> [SKIP][43] ([i915#4860])
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@gem_fenced_exec_thrash@no-spare-fences.html

  * igt@gem_lmem_swapping@basic:
    - shard-apl:          NOTRUN -> [SKIP][44] ([fdo#109271] / [i915#4613])
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-apl4/igt@gem_lmem_swapping@basic.html

  * igt@gem_lmem_swapping@smem-oom@lmem0:
    - shard-dg2:          [PASS][45] -> [TIMEOUT][46] ([i915#5493])
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-dg2-8/igt@gem_lmem_swapping@smem-oom@lmem0.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg2-7/igt@gem_lmem_swapping@smem-oom@lmem0.html

  * igt@gem_mmap@short-mmap:
    - shard-dg1:          NOTRUN -> [SKIP][47] ([i915#4083])
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@gem_mmap@short-mmap.html

  * igt@gem_mmap_gtt@cpuset-big-copy-xy:
    - shard-dg1:          NOTRUN -> [SKIP][48] ([i915#4077])
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@gem_mmap_gtt@cpuset-big-copy-xy.html

  * igt@gem_userptr_blits@forbidden-operations:
    - shard-dg1:          NOTRUN -> [SKIP][49] ([i915#3282])
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@gem_userptr_blits@forbidden-operations.html

  * igt@gem_userptr_blits@map-fixed-invalidate-overlap-busy:
    - shard-dg1:          NOTRUN -> [SKIP][50] ([i915#3297] / [i915#4880])
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@gem_userptr_blits@map-fixed-invalidate-overlap-busy.html

  * igt@gen9_exec_parse@basic-rejected-ctx-param:
    - shard-dg1:          NOTRUN -> [SKIP][51] ([i915#2527])
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@gen9_exec_parse@basic-rejected-ctx-param.html

  * igt@i915_pm_dc@dc6-dpms:
    - shard-dg1:          NOTRUN -> [SKIP][52] ([i915#3361])
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@i915_pm_dc@dc6-dpms.html

  * igt@i915_pm_dc@dc9-dpms:
    - shard-tglu:         [PASS][53] -> [SKIP][54] ([i915#4281])
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-tglu-10/igt@i915_pm_dc@dc9-dpms.html
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-tglu-5/igt@i915_pm_dc@dc9-dpms.html

  * igt@i915_pm_lpsp@screens-disabled:
    - shard-dg1:          NOTRUN -> [SKIP][55] ([i915#1902])
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@i915_pm_lpsp@screens-disabled.html

  * igt@i915_pm_rpm@gem-execbuf-stress@smem0:
    - shard-dg1:          [PASS][56] -> [FAIL][57] ([i915#7940]) +3 similar issues
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-dg1-17/igt@i915_pm_rpm@gem-execbuf-stress@smem0.html
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@i915_pm_rpm@gem-execbuf-stress@smem0.html
    - shard-tglu:         [PASS][58] -> [FAIL][59] ([i915#7940])
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-tglu-9/igt@i915_pm_rpm@gem-execbuf-stress@smem0.html
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-tglu-7/igt@i915_pm_rpm@gem-execbuf-stress@smem0.html

  * igt@i915_pm_rps@engine-order:
    - shard-apl:          NOTRUN -> [FAIL][60] ([i915#6537])
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-apl4/igt@i915_pm_rps@engine-order.html

  * igt@i915_pm_sseu@full-enable:
    - shard-apl:          [PASS][61] -> [FAIL][62] ([i915#3524])
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-apl2/igt@i915_pm_sseu@full-enable.html
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-apl6/igt@i915_pm_sseu@full-enable.html

  * igt@i915_suspend@fence-restore-untiled:
    - shard-snb:          NOTRUN -> [DMESG-WARN][63] ([i915#8841])
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-snb6/igt@i915_suspend@fence-restore-untiled.html

  * igt@kms_addfb_basic@basic-x-tiled-legacy:
    - shard-dg1:          NOTRUN -> [SKIP][64] ([i915#4212])
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@kms_addfb_basic@basic-x-tiled-legacy.html

  * igt@kms_async_flips@crc@pipe-d-hdmi-a-4:
    - shard-dg1:          NOTRUN -> [FAIL][65] ([i915#8247]) +3 similar issues
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-14/igt@kms_async_flips@crc@pipe-d-hdmi-a-4.html

  * igt@kms_atomic_transition@plane-all-modeset-transition-fencing-internal-panels:
    - shard-snb:          NOTRUN -> [SKIP][66] ([fdo#109271] / [i915#1769])
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-snb6/igt@kms_atomic_transition@plane-all-modeset-transition-fencing-internal-panels.html

  * igt@kms_big_fb@x-tiled-64bpp-rotate-270:
    - shard-dg1:          NOTRUN -> [SKIP][67] ([i915#3638])
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@kms_big_fb@x-tiled-64bpp-rotate-270.html

  * igt@kms_big_joiner@basic:
    - shard-dg1:          NOTRUN -> [SKIP][68] ([i915#2705])
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@kms_big_joiner@basic.html

  * igt@kms_ccs@pipe-a-missing-ccs-buffer-y_tiled_gen12_mc_ccs:
    - shard-apl:          NOTRUN -> [SKIP][69] ([fdo#109271] / [i915#3886])
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-apl4/igt@kms_ccs@pipe-a-missing-ccs-buffer-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-b-bad-aux-stride-yf_tiled_ccs:
    - shard-dg1:          NOTRUN -> [SKIP][70] ([i915#3689] / [i915#5354] / [i915#6095]) +2 similar issues
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@kms_ccs@pipe-b-bad-aux-stride-yf_tiled_ccs.html

  * igt@kms_ccs@pipe-b-missing-ccs-buffer-y_tiled_gen12_mc_ccs:
    - shard-dg1:          NOTRUN -> [SKIP][71] ([i915#3689] / [i915#3886] / [i915#5354] / [i915#6095])
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@kms_ccs@pipe-b-missing-ccs-buffer-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-c-bad-rotation-90-4_tiled_mtl_mc_ccs:
    - shard-dg1:          NOTRUN -> [SKIP][72] ([i915#5354] / [i915#6095]) +3 similar issues
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@kms_ccs@pipe-c-bad-rotation-90-4_tiled_mtl_mc_ccs.html

  * igt@kms_cdclk@plane-scaling@pipe-c-hdmi-a-3:
    - shard-dg2:          NOTRUN -> [SKIP][73] ([i915#4087]) +3 similar issues
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg2-1/igt@kms_cdclk@plane-scaling@pipe-c-hdmi-a-3.html

  * igt@kms_chamelium_frames@hdmi-crc-nonplanar-formats:
    - shard-dg1:          NOTRUN -> [SKIP][74] ([i915#7828]) +2 similar issues
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@kms_chamelium_frames@hdmi-crc-nonplanar-formats.html

  * igt@kms_content_protection@atomic@pipe-a-dp-4:
    - shard-dg2:          NOTRUN -> [TIMEOUT][75] ([i915#7173])
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg2-11/igt@kms_content_protection@atomic@pipe-a-dp-4.html

  * igt@kms_content_protection@srm@pipe-a-dp-1:
    - shard-apl:          NOTRUN -> [TIMEOUT][76] ([i915#7173])
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-apl4/igt@kms_content_protection@srm@pipe-a-dp-1.html

  * igt@kms_cursor_crc@cursor-rapid-movement-32x32:
    - shard-dg1:          NOTRUN -> [SKIP][77] ([i915#3555])
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@kms_cursor_crc@cursor-rapid-movement-32x32.html

  * igt@kms_cursor_legacy@cursorb-vs-flipb-atomic-transitions:
    - shard-snb:          NOTRUN -> [SKIP][78] ([fdo#109271] / [fdo#111767])
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-snb6/igt@kms_cursor_legacy@cursorb-vs-flipb-atomic-transitions.html

  * igt@kms_dsc@dsc-with-output-formats:
    - shard-dg1:          NOTRUN -> [SKIP][79] ([i915#3555] / [i915#3840])
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@kms_dsc@dsc-with-output-formats.html

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-pri-indfb-draw-mmap-cpu:
    - shard-dg2:          [PASS][80] -> [FAIL][81] ([i915#6880]) +1 similar issue
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-dg2-1/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-pri-indfb-draw-mmap-cpu.html
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg2-11/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-pri-indfb-draw-mmap-cpu.html

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-indfb-draw-mmap-gtt:
    - shard-dg1:          NOTRUN -> [SKIP][82] ([i915#8708]) +3 similar issues
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-indfb-draw-mmap-gtt.html

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-shrfb-pgflip-blt:
    - shard-dg1:          NOTRUN -> [SKIP][83] ([fdo#111825]) +8 similar issues
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-shrfb-pgflip-blt.html

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-shrfb-draw-blt:
    - shard-dg1:          NOTRUN -> [SKIP][84] ([i915#3458]) +2 similar issues
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-shrfb-draw-blt.html

  * igt@kms_frontbuffer_tracking@psr-rgb101010-draw-mmap-cpu:
    - shard-apl:          NOTRUN -> [SKIP][85] ([fdo#109271]) +33 similar issues
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-apl4/igt@kms_frontbuffer_tracking@psr-rgb101010-draw-mmap-cpu.html

  * igt@kms_hdr@static-swap:
    - shard-dg2:          NOTRUN -> [SKIP][86] ([i915#3555] / [i915#8228])
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg2-1/igt@kms_hdr@static-swap.html

  * igt@kms_plane_scaling@plane-scaler-with-rotation-unity-scaling@pipe-d-hdmi-a-4:
    - shard-dg1:          NOTRUN -> [SKIP][87] ([i915#5176]) +7 similar issues
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@kms_plane_scaling@plane-scaler-with-rotation-unity-scaling@pipe-d-hdmi-a-4.html

  * igt@kms_plane_scaling@plane-upscale-with-rotation-20x20@pipe-a-hdmi-a-1:
    - shard-rkl:          NOTRUN -> [SKIP][88] ([i915#5176]) +3 similar issues
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-rkl-7/igt@kms_plane_scaling@plane-upscale-with-rotation-20x20@pipe-a-hdmi-a-1.html

  * igt@kms_plane_scaling@planes-downscale-factor-0-25-upscale-20x20@pipe-c-hdmi-a-1:
    - shard-dg1:          NOTRUN -> [SKIP][89] ([i915#5235]) +3 similar issues
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-19/igt@kms_plane_scaling@planes-downscale-factor-0-25-upscale-20x20@pipe-c-hdmi-a-1.html

  * igt@kms_plane_scaling@planes-downscale-factor-0-25@pipe-d-hdmi-a-3:
    - shard-dg2:          NOTRUN -> [SKIP][90] ([i915#5235]) +11 similar issues
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg2-5/igt@kms_plane_scaling@planes-downscale-factor-0-25@pipe-d-hdmi-a-3.html

  * igt@kms_plane_scaling@planes-downscale-factor-0-5-unity-scaling@pipe-b-vga-1:
    - shard-snb:          NOTRUN -> [SKIP][91] ([fdo#109271]) +286 similar issues
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-snb6/igt@kms_plane_scaling@planes-downscale-factor-0-5-unity-scaling@pipe-b-vga-1.html

  * igt@kms_psr@dpms:
    - shard-dg1:          NOTRUN -> [SKIP][92] ([i915#1072])
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@kms_psr@dpms.html

  * igt@runner@aborted:
    - shard-mtlp:         NOTRUN -> ([FAIL][93], [FAIL][94], [FAIL][95], [FAIL][96], [FAIL][97], [FAIL][98], [FAIL][99], [FAIL][100], [FAIL][101], [FAIL][102], [FAIL][103], [FAIL][104], [FAIL][105], [FAIL][106], [FAIL][107], [FAIL][108], [FAIL][109], [FAIL][110], [FAIL][111], [FAIL][112], [FAIL][113], [FAIL][114], [FAIL][115], [FAIL][116], [FAIL][117]) ([i915#7812])
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-mtlp-3/igt@runner@aborted.html
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-mtlp-1/igt@runner@aborted.html
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-mtlp-1/igt@runner@aborted.html
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-mtlp-3/igt@runner@aborted.html
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-mtlp-4/igt@runner@aborted.html
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-mtlp-1/igt@runner@aborted.html
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-mtlp-4/igt@runner@aborted.html
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-mtlp-1/igt@runner@aborted.html
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-mtlp-2/igt@runner@aborted.html
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-mtlp-7/igt@runner@aborted.html
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-mtlp-6/igt@runner@aborted.html
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-mtlp-5/igt@runner@aborted.html
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-mtlp-5/igt@runner@aborted.html
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-mtlp-7/igt@runner@aborted.html
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-mtlp-6/igt@runner@aborted.html
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-mtlp-6/igt@runner@aborted.html
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-mtlp-8/igt@runner@aborted.html
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-mtlp-4/igt@runner@aborted.html
   [111]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-mtlp-5/igt@runner@aborted.html
   [112]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-mtlp-8/igt@runner@aborted.html
   [113]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-mtlp-7/igt@runner@aborted.html
   [114]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-mtlp-5/igt@runner@aborted.html
   [115]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-mtlp-7/igt@runner@aborted.html
   [116]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-mtlp-7/igt@runner@aborted.html
   [117]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-mtlp-8/igt@runner@aborted.html

  * igt@v3d/v3d_submit_cl@single-in-sync:
    - shard-dg1:          NOTRUN -> [SKIP][118] ([i915#2575])
   [118]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@v3d/v3d_submit_cl@single-in-sync.html

  * igt@vc4/vc4_tiling@get-bad-flags:
    - shard-dg1:          NOTRUN -> [SKIP][119] ([i915#7711])
   [119]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@vc4/vc4_tiling@get-bad-flags.html

  
#### Possible fixes ####

  * igt@gem_ctx_exec@basic-nohangcheck:
    - shard-tglu:         [FAIL][120] ([i915#6268]) -> [PASS][121]
   [120]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-tglu-4/igt@gem_ctx_exec@basic-nohangcheck.html
   [121]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-tglu-2/igt@gem_ctx_exec@basic-nohangcheck.html

  * {igt@gem_ctx_freq@sysfs@gt0}:
    - shard-dg2:          [FAIL][122] ([i915#6786]) -> [PASS][123]
   [122]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-dg2-5/igt@gem_ctx_freq@sysfs@gt0.html
   [123]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg2-8/igt@gem_ctx_freq@sysfs@gt0.html

  * igt@gem_ctx_isolation@preservation-s3@vecs0:
    - shard-apl:          [ABORT][124] ([i915#180]) -> [PASS][125]
   [124]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-apl2/igt@gem_ctx_isolation@preservation-s3@vecs0.html
   [125]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-apl4/igt@gem_ctx_isolation@preservation-s3@vecs0.html

  * igt@gem_eio@hibernate:
    - shard-dg1:          [ABORT][126] ([i915#4391] / [i915#7975] / [i915#8213]) -> [PASS][127]
   [126]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-dg1-14/igt@gem_eio@hibernate.html
   [127]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@gem_eio@hibernate.html

  * igt@gem_eio@unwedge-stress:
    - shard-dg1:          [FAIL][128] ([i915#5784]) -> [PASS][129]
   [128]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-dg1-16/igt@gem_eio@unwedge-stress.html
   [129]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-15/igt@gem_eio@unwedge-stress.html

  * igt@gem_exec_fair@basic-deadline:
    - shard-glk:          [FAIL][130] ([i915#2846]) -> [PASS][131]
   [130]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-glk3/igt@gem_exec_fair@basic-deadline.html
   [131]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-glk4/igt@gem_exec_fair@basic-deadline.html

  * igt@gem_exec_fair@basic-pace-share@rcs0:
    - shard-rkl:          [FAIL][132] ([i915#2842]) -> [PASS][133] +2 similar issues
   [132]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-rkl-2/igt@gem_exec_fair@basic-pace-share@rcs0.html
   [133]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-rkl-7/igt@gem_exec_fair@basic-pace-share@rcs0.html

  * igt@i915_module_load@reload-with-fault-injection:
    - shard-dg2:          [DMESG-WARN][134] ([i915#7061] / [i915#8617]) -> [PASS][135]
   [134]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-dg2-3/igt@i915_module_load@reload-with-fault-injection.html
   [135]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg2-2/igt@i915_module_load@reload-with-fault-injection.html

  * igt@i915_pm_rc6_residency@rc6-idle@rcs0:
    - shard-dg1:          [FAIL][136] ([i915#3591]) -> [PASS][137] +2 similar issues
   [136]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-dg1-17/igt@i915_pm_rc6_residency@rc6-idle@rcs0.html
   [137]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@i915_pm_rc6_residency@rc6-idle@rcs0.html

  * igt@i915_pm_rpm@dpms-mode-unset-non-lpsp:
    - shard-dg1:          [SKIP][138] ([i915#1397]) -> [PASS][139] +1 similar issue
   [138]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-dg1-19/igt@i915_pm_rpm@dpms-mode-unset-non-lpsp.html
   [139]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-15/igt@i915_pm_rpm@dpms-mode-unset-non-lpsp.html

  * igt@i915_pm_rpm@dpms-non-lpsp:
    - shard-rkl:          [SKIP][140] ([i915#1397]) -> [PASS][141]
   [140]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-rkl-7/igt@i915_pm_rpm@dpms-non-lpsp.html
   [141]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-rkl-2/igt@i915_pm_rpm@dpms-non-lpsp.html

  * igt@i915_pm_rpm@gem-execbuf-stress@extra-wait-smem0:
    - shard-dg1:          [FAIL][142] ([i915#7940]) -> [PASS][143]
   [142]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-dg1-17/igt@i915_pm_rpm@gem-execbuf-stress@extra-wait-smem0.html
   [143]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-18/igt@i915_pm_rpm@gem-execbuf-stress@extra-wait-smem0.html

  * igt@i915_pm_rpm@modeset-non-lpsp:
    - shard-dg2:          [SKIP][144] ([i915#1397]) -> [PASS][145]
   [144]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-dg2-10/igt@i915_pm_rpm@modeset-non-lpsp.html
   [145]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg2-11/igt@i915_pm_rpm@modeset-non-lpsp.html

  * igt@i915_pm_rpm@pm-caching:
    - shard-tglu:         [FAIL][146] ([i915#7940]) -> [PASS][147]
   [146]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-tglu-2/igt@i915_pm_rpm@pm-caching.html
   [147]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-tglu-10/igt@i915_pm_rpm@pm-caching.html

  * igt@i915_suspend@basic-s3-without-i915:
    - shard-rkl:          [FAIL][148] ([fdo#103375]) -> [PASS][149]
   [148]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-rkl-7/igt@i915_suspend@basic-s3-without-i915.html
   [149]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-rkl-7/igt@i915_suspend@basic-s3-without-i915.html

  * igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size:
    - shard-glk:          [FAIL][150] ([i915#2346]) -> [PASS][151]
   [150]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-glk7/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size.html
   [151]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-glk5/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size.html

  * igt@kms_flip@2x-plain-flip-ts-check@ab-hdmi-a1-hdmi-a2:
    - shard-glk:          [FAIL][152] ([i915#2122]) -> [PASS][153]
   [152]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-glk3/igt@kms_flip@2x-plain-flip-ts-check@ab-hdmi-a1-hdmi-a2.html
   [153]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-glk4/igt@kms_flip@2x-plain-flip-ts-check@ab-hdmi-a1-hdmi-a2.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible@a-dp1:
    - shard-apl:          [FAIL][154] ([i915#79]) -> [PASS][155]
   [154]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-apl7/igt@kms_flip@flip-vs-expired-vblank-interruptible@a-dp1.html
   [155]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-apl6/igt@kms_flip@flip-vs-expired-vblank-interruptible@a-dp1.html

  * igt@kms_flip@flip-vs-suspend@a-hdmi-a3:
    - shard-dg2:          [FAIL][156] ([fdo#103375]) -> [PASS][157] +3 similar issues
   [156]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-dg2-5/igt@kms_flip@flip-vs-suspend@a-hdmi-a3.html
   [157]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg2-3/igt@kms_flip@flip-vs-suspend@a-hdmi-a3.html

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-blt:
    - shard-dg2:          [FAIL][158] ([i915#6880]) -> [PASS][159] +1 similar issue
   [158]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-dg2-11/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-blt.html
   [159]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg2-1/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-blt.html

  
#### Warnings ####

  * igt@i915_pm_rc6_residency@rc6-idle@rcs0:
    - shard-tglu:         [FAIL][160] ([i915#2681] / [i915#3591]) -> [WARN][161] ([i915#2681])
   [160]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-tglu-9/igt@i915_pm_rc6_residency@rc6-idle@rcs0.html
   [161]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-tglu-7/igt@i915_pm_rc6_residency@rc6-idle@rcs0.html

  * igt@kms_content_protection@mei_interface:
    - shard-dg2:          [SKIP][162] ([i915#7118]) -> [SKIP][163] ([i915#7118] / [i915#7162])
   [162]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-dg2-1/igt@kms_content_protection@mei_interface.html
   [163]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg2-11/igt@kms_content_protection@mei_interface.html

  * igt@kms_fbcon_fbt@psr:
    - shard-rkl:          [SKIP][164] ([fdo#110189] / [i915#3955]) -> [SKIP][165] ([i915#3955])
   [164]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-rkl-1/igt@kms_fbcon_fbt@psr.html
   [165]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-rkl-4/igt@kms_fbcon_fbt@psr.html

  * igt@kms_force_connector_basic@force-load-detect:
    - shard-rkl:          [SKIP][166] ([fdo#109285] / [i915#4098]) -> [SKIP][167] ([fdo#109285])
   [166]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-rkl-1/igt@kms_force_connector_basic@force-load-detect.html
   [167]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-rkl-6/igt@kms_force_connector_basic@force-load-detect.html

  * igt@kms_frontbuffer_tracking@fbc-suspend:
    - shard-snb:          [DMESG-WARN][168] ([i915#8841]) -> [SKIP][169] ([fdo#109271])
   [168]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-snb6/igt@kms_frontbuffer_tracking@fbc-suspend.html
   [169]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-snb4/igt@kms_frontbuffer_tracking@fbc-suspend.html

  * igt@kms_plane@plane-panning-bottom-right-suspend@pipe-a-planes:
    - shard-snb:          [DMESG-WARN][170] ([i915#8841]) -> [DMESG-FAIL][171] ([fdo#103375])
   [170]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-snb4/igt@kms_plane@plane-panning-bottom-right-suspend@pipe-a-planes.html
   [171]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-snb7/igt@kms_plane@plane-panning-bottom-right-suspend@pipe-a-planes.html

  * igt@kms_psr@cursor_plane_move:
    - shard-dg1:          [SKIP][172] ([i915#1072] / [i915#4078]) -> [SKIP][173] ([i915#1072]) +1 similar issue
   [172]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-dg1-16/igt@kms_psr@cursor_plane_move.html
   [173]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-17/igt@kms_psr@cursor_plane_move.html

  * igt@kms_psr@primary_page_flip:
    - shard-dg1:          [SKIP][174] ([i915#1072]) -> [SKIP][175] ([i915#1072] / [i915#4078])
   [174]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13432/shard-dg1-15/igt@kms_psr@primary_page_flip.html
   [175]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/shard-dg1-16/igt@kms_psr@primary_page_flip.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103375]: https://bugs.freedesktop.org/show_bug.cgi?id=103375
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109309]: https://bugs.freedesktop.org/show_bug.cgi?id=109309
  [fdo#110189]: https://bugs.freedesktop.org/show_bug.cgi?id=110189
  [fdo#111767]: https://bugs.freedesktop.org/show_bug.cgi?id=111767
  [fdo#111825]: https://bugs.freedesktop.org/show_bug.cgi?id=111825
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#1099]: https://gitlab.freedesktop.org/drm/intel/issues/1099
  [i915#1397]: https://gitlab.freedesktop.org/drm/intel/issues/1397
  [i915#1769]: https://gitlab.freedesktop.org/drm/intel/issues/1769
  [i915#180]: https://gitlab.freedesktop.org/drm/intel/issues/180
  [i915#1902]: https://gitlab.freedesktop.org/drm/intel/issues/1902
  [i915#2122]: https://gitlab.freedesktop.org/drm/intel/issues/2122
  [i915#2346]: https://gitlab.freedesktop.org/drm/intel/issues/2346
  [i915#2527]: https://gitlab.freedesktop.org/drm/intel/issues/2527
  [i915#2575]: https://gitlab.freedesktop.org/drm/intel/issues/2575
  [i915#2681]: https://gitlab.freedesktop.org/drm/intel/issues/2681
  [i915#2705]: https://gitlab.freedesktop.org/drm/intel/issues/2705
  [i915#2842]: https://gitlab.freedesktop.org/drm/intel/issues/2842
  [i915#2846]: https://gitlab.freedesktop.org/drm/intel/issues/2846
  [i915#3281]: https://gitlab.freedesktop.org/drm/intel/issues/3281
  [i915#3282]: https://gitlab.freedesktop.org/drm/intel/issues/3282
  [i915#3297]: https://gitlab.freedesktop.org/drm/intel/issues/3297
  [i915#3361]: https://gitlab.freedesktop.org/drm/intel/issues/3361
  [i915#3458]: https://gitlab.freedesktop.org/drm/intel/issues/3458
  [i915#3524]: https://gitlab.freedesktop.org/drm/intel/issues/3524
  [i915#3539]: https://gitlab.freedesktop.org/drm/intel/issues/3539
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3591]: https://gitlab.freedesktop.org/drm/intel/issues/3591
  [i915#3638]: https://gitlab.freedesktop.org/drm/intel/issues/3638
  [i915#3689]: https://gitlab.freedesktop.org/drm/intel/issues/3689
  [i915#3840]: https://gitlab.freedesktop.org/drm/intel/issues/3840
  [i915#3886]: https://gitlab.freedesktop.org/drm/intel/issues/3886
  [i915#3955]: https://gitlab.freedesktop.org/drm/intel/issues/3955
  [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077
  [i915#4078]: https://gitlab.freedesktop.org/drm/intel/issues/4078
  [i915#4083]: https://gitlab.freedesktop.org/drm/intel/issues/4083
  [i915#4087]: https://gitlab.freedesktop.org/drm/intel/issues/4087
  [i915#4098]: https://gitlab.freedesktop.org/drm/intel/issues/4098
  [i915#4212]: https://gitlab.freedesktop.org/drm/intel/issues/4212
  [i915#4281]: https://gitlab.freedesktop.org/drm/intel/issues/4281
  [i915#4391]: https://gitlab.freedesktop.org/drm/intel/issues/4391
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4852]: https://gitlab.freedesktop.org/drm/intel/issues/4852
  [i915#4860]: https://gitlab.freedesktop.org/drm/intel/issues/4860
  [i915#4880]: https://gitlab.freedesktop.org/drm/intel/issues/4880
  [i915#5176]: https://gitlab.freedesktop.org/drm/intel/issues/5176
  [i915#5235]: https://gitlab.freedesktop.org/drm/intel/issues/5235
  [i915#5354]: https://gitlab.freedesktop.org/drm/intel/issues/5354
  [i915#5493]: https://gitlab.freedesktop.org/drm/intel/issues/5493
  [i915#5784]: https://gitlab.freedesktop.org/drm/intel/issues/5784
  [i915#5892]: https://gitlab.freedesktop.org/drm/intel/issues/5892
  [i915#6016]: https://gitlab.freedesktop.org/drm/intel/issues/6016
  [i915#6095]: https://gitlab.freedesktop.org/drm/intel/issues/6095
  [i915#6268]: https://gitlab.freedesktop.org/drm/intel/issues/6268
  [i915#6537]: https://gitlab.freedesktop.org/drm/intel/issues/6537
  [i915#6786]: https://gitlab.freedesktop.org/drm/intel/issues/6786
  [i915#6880]: https://gitlab.freedesktop.org/drm/intel/issues/6880
  [i915#7061]: https://gitlab.freedesktop.org/drm/intel/issues/7061
  [i915#7118]: https://gitlab.freedesktop.org/drm/intel/issues/7118
  [i915#7162]: https://gitlab.freedesktop.org/drm/intel/issues/7162
  [i915#7173]: https://gitlab.freedesktop.org/drm/intel/issues/7173
  [i915#7461]: https://gitlab.freedesktop.org/drm/intel/issues/7461
  [i915#7711]: https://gitlab.freedesktop.org/drm/intel/issues/7711
  [i915#7812]: https://gitlab.freedesktop.org/drm/intel/issues/7812
  [i915#7828]: https://gitlab.freedesktop.org/drm/intel/issues/7828
  [i915#79]: https://gitlab.freedesktop.org/drm/intel/issues/79
  [i915#7940]: https://gitlab.freedesktop.org/drm/intel/issues/7940
  [i915#7975]: https://gitlab.freedesktop.org/drm/intel/issues/7975
  [i915#8211]: https://gitlab.freedesktop.org/drm/intel/issues/8211
  [i915#8213]: https://gitlab.freedesktop.org/drm/intel/issues/8213
  [i915#8228]: https://gitlab.freedesktop.org/drm/intel/issues/8228
  [i915#8234]: https://gitlab.freedesktop.org/drm/intel/issues/8234
  [i915#8247]: https://gitlab.freedesktop.org/drm/intel/issues/8247
  [i915#8414]: https://gitlab.freedesktop.org/drm/intel/issues/8414
  [i915#8502]: https://gitlab.freedesktop.org/drm/intel/issues/8502
  [i915#8617]: https://gitlab.freedesktop.org/drm/intel/issues/8617
  [i915#8661]: https://gitlab.freedesktop.org/drm/intel/issues/8661
  [i915#8708]: https://gitlab.freedesktop.org/drm/intel/issues/8708
  [i915#8709]: https://gitlab.freedesktop.org/drm/intel/issues/8709
  [i915#8841]: https://gitlab.freedesktop.org/drm/intel/issues/8841


Build changes
-------------

  * Linux: CI_DRM_13432 -> Patchwork_121450v1

  CI-20190529: 20190529
  CI_DRM_13432: 069a79d6af09879060345da9f8b886a73b7810a8 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_7406: 1d6fd796607099d189e85d1fd305160363b961f2 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_121450v1: 069a79d6af09879060345da9f8b886a73b7810a8 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121450v1/index.html

[-- Attachment #2: Type: text/html, Size: 47850 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [RFC 1/8] drm/i915: Skip clflush after GPU writes on Meteorlake
  2023-07-27 14:54   ` [Intel-gfx] " Tvrtko Ursulin
@ 2023-07-28  5:50     ` Yang, Fei
  -1 siblings, 0 replies; 59+ messages in thread
From: Yang, Fei @ 2023-07-28  5:50 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx, dri-devel
  Cc: Thomas Hellström, Roper, Matthew D, Auld, Matthew, Ursulin, Tvrtko

> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> On Meteorlake the CPU cache will not contain stale data after GPU
> access since the write-invalidate snooping protocol is used, which
> means there is no need to flush before potentially transitioning
> the buffer to a non-coherent domain.
>
> Use the opportunity to document the situation on discrete too.

LGTM.
Reviewed-by: Fei Yang <fei.yang@intel.com>

> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_domain.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> index ffddec1d2a76..57db9c581bf6 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> @@ -24,9 +24,22 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)  {
>       struct drm_i915_private *i915 = to_i915(obj->base.dev);
>
> +     /*
> +      * Discrete GPUs never dirty the CPU cache.
> +      */
>       if (IS_DGFX(i915))
>               return false;
>
> +     /*
> +      * Cache snooping on Meteorlake is using write-invalidate so GPU writes
> +      * never end up in the CPU cache.
> +      *
> +      * QQQ: Do other snooping platforms behave identically and could we
> +      *      therefore write this as "if !HAS_LLC(i915) && HAS_SNOOP(i915)"?
> +      */
> +     if (IS_METEORLAKE(i915))
> +             return false;
> +
>       /*
>        * For objects created by userspace through GEM_CREATE with pat_index
>        * set by set_pat extension, i915_gem_object_has_cache_level() will
> --
> 2.39.2
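
If the answer to the QQQ above turns out to be yes, the Meteorlake special
case could presumably fold into a single snooping check along these lines.
This is a sketch only, reusing the IS_DGFX()/HAS_LLC()/HAS_SNOOP() helpers
and the final cache-level check from the quoted patch; whether every
non-LLC snooping platform really uses write-invalidate is exactly the open
question:

static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
{
	struct drm_i915_private *i915 = to_i915(obj->base.dev);

	/* Discrete GPUs never dirty the CPU cache. */
	if (IS_DGFX(i915))
		return false;

	/*
	 * Hypothetical generalisation of the IS_METEORLAKE() check: if
	 * every snoopable platform without an LLC snoops with
	 * write-invalidate, GPU writes can never linger in the CPU
	 * cache on any of them.
	 */
	if (!HAS_LLC(i915) && HAS_SNOOP(i915))
		return false;

	return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
		 i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
}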

^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [RFC 4/8] drm/i915: Refactor PAT/object cache handling
  2023-07-27 14:55   ` [Intel-gfx] " Tvrtko Ursulin
@ 2023-07-28  7:14     ` Yang, Fei
  -1 siblings, 0 replies; 59+ messages in thread
From: Yang, Fei @ 2023-07-28  7:14 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx, dri-devel
  Cc: Roper, Matthew D, Chris Wilson, Andi Shyti, Ursulin, Tvrtko

[snip]
> @@ -41,14 +42,17 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>               return false;
>
>       /*
> -      * For objects created by userspace through GEM_CREATE with pat_index
> -      * set by set_pat extension, i915_gem_object_has_cache_level() will
> -      * always return true, because the coherency of such object is managed

i915_gem_object_has_cache_level() always returning true means this function
always returns false.

> -      * by userspace. Othereise the call here would fall back to checking
> -      * whether the object is un-cached or write-through.
> +      * Always flush cache for UMD objects with PAT index set.

(obj->pat_set_by_user == true) indicates the UMD knows how to handle the
coherency; forcing a clflush in the KMD would be redundant.

>        */
> -     return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
> -              i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
> +     if (obj->pat_set_by_user)
> +             return true;

return false;

> +
> +     /*
> +      * Fully coherent cached access may end up with data in the CPU cache
> +      * which hasn't hit memory yet.
> +      */
> +     return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
> +            i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W);

Why check COH2W here? The old logic was: if UC or WT, return false; otherwise
return true. So as long as the cache mode is WB it should be sufficient to
return true here, right?

>  }
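
Putting those review comments together, the behaviour being argued for would
look roughly like the sketch below. This is the reviewer's suggestion, not
the patch as posted, and it assumes the IS_DGFX()/IS_METEORLAKE() early
returns from earlier in the series stay in place:

static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
{
	struct drm_i915_private *i915 = to_i915(obj->base.dev);

	/* Early returns as elsewhere in the series. */
	if (IS_DGFX(i915) || IS_METEORLAKE(i915))
		return false;

	/*
	 * UMD objects with a PAT index set manage their own coherency,
	 * so a forced clflush from the KMD would be redundant.
	 */
	if (obj->pat_set_by_user)
		return false;

	/*
	 * Mirror the old !(UC || WT) check: any write-back cacheable
	 * mode may leave dirty data in the CPU cache, whether or not
	 * it is 2-way coherent.
	 */
	return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB);
}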

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Intel-gfx] [RFC 2/8] drm/i915: Split PTE encode between Gen12 and Meteorlake
  2023-07-27 22:25   ` Matt Roper
@ 2023-07-28  8:18     ` Tvrtko Ursulin
  2023-07-28 14:41       ` Matt Roper
  0 siblings, 1 reply; 59+ messages in thread
From: Tvrtko Ursulin @ 2023-07-28  8:18 UTC (permalink / raw)
  To: Matt Roper; +Cc: Intel-gfx, dri-devel


On 27/07/2023 23:25, Matt Roper wrote:
> On Thu, Jul 27, 2023 at 03:54:58PM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> No need to run extra instructions which will never trigger on platforms
>> before Meteorlake.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 26 ++++++++++++++++++++++++++
>>   1 file changed, 26 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> index c8568e5d1147..862ac1d2de25 100644
>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> @@ -63,6 +63,30 @@ static u64 gen12_pte_encode(dma_addr_t addr,
>>   {
>>   	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>>   
>> +	if (unlikely(flags & PTE_READ_ONLY))
>> +		pte &= ~GEN8_PAGE_RW;
>> +
>> +	if (flags & PTE_LM)
>> +		pte |= GEN12_PPGTT_PTE_LM;
>> +
>> +	if (pat_index & BIT(0))
>> +		pte |= GEN12_PPGTT_PTE_PAT0;
>> +
>> +	if (pat_index & BIT(1))
>> +		pte |= GEN12_PPGTT_PTE_PAT1;
>> +
>> +	if (pat_index & BIT(2))
>> +		pte |= GEN12_PPGTT_PTE_PAT2;
>> +
>> +	return pte;
>> +}
>> +
>> +static u64 mtl_pte_encode(dma_addr_t addr,
>> +			  unsigned int pat_index,
>> +			  u32 flags)
>> +{
>> +	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>> +
> 
> Would it be more readable to start with
> 
>          gen8_pte_t pte = gen12_pte_encode(addr, pat_index, flags);
> 
> and then |-in only the MTL-specific bit(s) as appropriate?
> 
>>   	if (unlikely(flags & PTE_READ_ONLY))
>>   		pte &= ~GEN8_PAGE_RW;
>>   
>> @@ -995,6 +1019,8 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
>>   	 */
>>   	ppgtt->vm.alloc_scratch_dma = alloc_pt_dma;
>>   
>> +	if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
>> +		ppgtt->vm.pte_encode = mtl_pte_encode;
>>   	if (GRAPHICS_VER(gt->i915) >= 12)
>>   		ppgtt->vm.pte_encode = gen12_pte_encode;
> 
> I think you wanted 'else if' here.  Otherwise you clobber the MTL
> function pointer.

Doh, this was a strong fail... Yes and yes. I even had it like you
suggest in that patch I mentioned to you earlier:
https://patchwork.freedesktop.org/patch/546013/?series=120341&rev=2.

Do you have an opinion on that one perhaps?

Thanks,

Tvrtko
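
For reference, folding both of Matt's points back into the patch would give
something like the sketch below. This is a reconstruction, not the posted
code: it assumes the full patch defines a GEN12_PPGTT_PTE_PAT3 bit for the
fourth PAT index bit, which the quoted hunk does not show.

static u64 mtl_pte_encode(dma_addr_t addr,
			  unsigned int pat_index,
			  u32 flags)
{
	/* Start from the common Gen12 encoding... */
	gen8_pte_t pte = gen12_pte_encode(addr, pat_index, flags);

	/* ...and OR in only the Meteorlake-specific PAT bit. */
	if (pat_index & BIT(3))
		pte |= GEN12_PPGTT_PTE_PAT3;

	return pte;
}

And the vfunc selection with the missing 'else' restored, so Meteorlake no
longer gets clobbered by the Gen12 assignment:

	if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
		ppgtt->vm.pte_encode = mtl_pte_encode;
	else if (GRAPHICS_VER(gt->i915) >= 12)
		ppgtt->vm.pte_encode = gen12_pte_encode;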

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC 3/8] drm/i915: Cache PAT index used by the driver
  2023-07-27 22:44     ` [Intel-gfx] " Matt Roper
@ 2023-07-28 12:03       ` Tvrtko Ursulin
  -1 siblings, 0 replies; 59+ messages in thread
From: Tvrtko Ursulin @ 2023-07-28 12:03 UTC (permalink / raw)
  To: Matt Roper; +Cc: Intel-gfx, Fei Yang, dri-devel, Tvrtko Ursulin


On 27/07/2023 23:44, Matt Roper wrote:
> On Thu, Jul 27, 2023 at 03:54:59PM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Eliminate a bunch of runtime calls to i915_gem_get_pat_index() by caching
>> the interesting PAT indices in struct drm_i915_private. They are static
>> per platform so there is no need to consult a function every time.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> Cc: Fei Yang <fei.yang@intel.com>
>> ---
>>   drivers/gpu/drm/i915/Makefile                 |  1 +
>>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  3 +--
>>   drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |  7 ++---
>>   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  | 26 ++++++++++++-------
>>   .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
>>   drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |  4 +--
>>   drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  4 +--
>>   drivers/gpu/drm/i915/gt/intel_ggtt.c          |  8 ++----
>>   drivers/gpu/drm/i915/gt/intel_migrate.c       | 11 +++-----
>>   drivers/gpu/drm/i915/gt/selftest_migrate.c    |  9 +++----
>>   drivers/gpu/drm/i915/gt/selftest_reset.c      | 14 +++-------
>>   drivers/gpu/drm/i915/gt/selftest_tlb.c        |  5 ++--
>>   drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      |  8 ++----
>>   drivers/gpu/drm/i915/i915_cache.c             | 18 +++++++++++++
>>   drivers/gpu/drm/i915/i915_cache.h             | 13 ++++++++++
>>   drivers/gpu/drm/i915/i915_driver.c            |  3 +++
>>   drivers/gpu/drm/i915/i915_drv.h               |  2 ++
>>   drivers/gpu/drm/i915/i915_gem.c               |  8 ++----
>>   drivers/gpu/drm/i915/i915_gpu_error.c         |  8 ++----
>>   drivers/gpu/drm/i915/selftests/i915_gem.c     |  5 +---
>>   .../gpu/drm/i915/selftests/i915_gem_evict.c   |  4 +--
>>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 11 +++-----
>>   .../drm/i915/selftests/intel_memory_region.c  |  4 +--
>>   .../gpu/drm/i915/selftests/mock_gem_device.c  |  2 ++
>>   24 files changed, 89 insertions(+), 91 deletions(-)
>>   create mode 100644 drivers/gpu/drm/i915/i915_cache.c
>>   create mode 100644 drivers/gpu/drm/i915/i915_cache.h
>>
>> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
>> index c5fc91cd58e7..905a51a16588 100644
>> --- a/drivers/gpu/drm/i915/Makefile
>> +++ b/drivers/gpu/drm/i915/Makefile
>> @@ -35,6 +35,7 @@ subdir-ccflags-y += -I$(srctree)/$(src)
>>   # core driver code
>>   i915-y += i915_driver.o \
>>   	  i915_drm_client.o \
>> +	  i915_cache.o \
>>   	  i915_config.o \
>>   	  i915_getparam.o \
>>   	  i915_ioctl.o \
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> index 5a687a3686bd..0a1d40220020 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> @@ -1330,8 +1330,7 @@ static void *reloc_iomap(struct i915_vma *batch,
>>   		ggtt->vm.insert_page(&ggtt->vm,
>>   				     i915_gem_object_get_dma_address(obj, page),
>>   				     offset,
>> -				     i915_gem_get_pat_index(ggtt->vm.i915,
>> -							    I915_CACHE_NONE),
>> +				     eb->i915->pat_uc,
>>   				     0);
>>   	} else {
>>   		offset += page << PAGE_SHIFT;
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>> index 5b0a5cf9a98a..1c8eb806b7d3 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>> @@ -563,11 +563,8 @@ static void dbg_poison(struct i915_ggtt *ggtt,
>>   	while (size) {
>>   		void __iomem *s;
>>   
>> -		ggtt->vm.insert_page(&ggtt->vm, addr,
>> -				     ggtt->error_capture.start,
>> -				     i915_gem_get_pat_index(ggtt->vm.i915,
>> -							    I915_CACHE_NONE),
>> -				     0);
>> +		ggtt->vm.insert_page(&ggtt->vm, addr, ggtt->error_capture.start,
>> +				     ggtt->vm.i915->pat_uc, 0);
>>   		mb();
>>   
>>   		s = io_mapping_map_wc(&ggtt->iomap,
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> index 7078af2f8f79..6bd6c239f4ac 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> @@ -58,6 +58,16 @@ i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
>>   		I915_CACHE_NONE;
>>   }
>>   
>> +static unsigned int
>> +i915_ttm_cache_pat(struct drm_i915_private *i915, struct ttm_resource *res,
>> +		   struct ttm_tt *ttm)
>> +{
>> +	return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
>> +		!i915_ttm_gtt_binds_lmem(res) &&
> 
> This matches the existing logic of i915_ttm_cache_level(), but do you
> know why LMEM buffers are always set to uncached?  I don't understand
> that part.

I am not sure - I was thinking about that myself - why not WC? A WC PAT
exists on Gen12, but maybe using it wouldn't help any with blitter moves.

> 
>> +		ttm->caching == ttm_cached) ? i915->pat_wb :
>> +		i915->pat_uc;
>> +}
>> +
>>   static struct intel_memory_region *
>>   i915_ttm_region(struct ttm_device *bdev, int ttm_mem_type)
>>   {
>> @@ -196,7 +206,7 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>>   	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
>>   	struct i915_request *rq;
>>   	struct ttm_tt *src_ttm = bo->ttm;
>> -	enum i915_cache_level src_level, dst_level;
>> +	unsigned int src_pat, dst_pat;
>>   	int ret;
>>   
>>   	if (!to_gt(i915)->migrate.context || intel_gt_is_wedged(to_gt(i915)))
>> @@ -206,16 +216,15 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>>   	if (I915_SELFTEST_ONLY(fail_gpu_migration))
>>   		clear = true;
>>   
>> -	dst_level = i915_ttm_cache_level(i915, dst_mem, dst_ttm);
>> +	dst_pat = i915_ttm_cache_pat(i915, dst_mem, dst_ttm);
>>   	if (clear) {
>>   		if (bo->type == ttm_bo_type_kernel &&
>>   		    !I915_SELFTEST_ONLY(fail_gpu_migration))
>>   			return ERR_PTR(-EINVAL);
>>   
>>   		intel_engine_pm_get(to_gt(i915)->migrate.context->engine);
>> -		ret = intel_context_migrate_clear(to_gt(i915)->migrate.context, deps,
>> -						  dst_st->sgl,
>> -						  i915_gem_get_pat_index(i915, dst_level),
>> +		ret = intel_context_migrate_clear(to_gt(i915)->migrate.context,
>> +						  deps, dst_st->sgl, dst_pat,
>>   						  i915_ttm_gtt_binds_lmem(dst_mem),
>>   						  0, &rq);
>>   	} else {
>> @@ -225,14 +234,13 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>>   		if (IS_ERR(src_rsgt))
>>   			return ERR_CAST(src_rsgt);
>>   
>> -		src_level = i915_ttm_cache_level(i915, bo->resource, src_ttm);
>> +		src_pat = i915_ttm_cache_pat(i915, bo->resource, src_ttm);
>>   		intel_engine_pm_get(to_gt(i915)->migrate.context->engine);
>>   		ret = intel_context_migrate_copy(to_gt(i915)->migrate.context,
>>   						 deps, src_rsgt->table.sgl,
>> -						 i915_gem_get_pat_index(i915, src_level),
>> +						 src_pat,
>>   						 i915_ttm_gtt_binds_lmem(bo->resource),
>> -						 dst_st->sgl,
>> -						 i915_gem_get_pat_index(i915, dst_level),
>> +						 dst_st->sgl, dst_pat,
>>   						 i915_ttm_gtt_binds_lmem(dst_mem),
>>   						 &rq);
>>   
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>> index 6b9f6cf50bf6..6bddd733d796 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>> @@ -354,7 +354,7 @@ fake_huge_pages_object(struct drm_i915_private *i915, u64 size, bool single)
>>   
>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>> -	obj->pat_index = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
>> +	obj->pat_index = i915->pat_uc;
>>   
>>   	return obj;
>>   }
>> diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
>> index c2bdc133c89a..fb69f667652a 100644
>> --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
>> +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
>> @@ -226,9 +226,7 @@ static int gen6_ppgtt_init_scratch(struct gen6_ppgtt *ppgtt)
>>   		return ret;
>>   
>>   	vm->scratch[0]->encode =
>> -		vm->pte_encode(px_dma(vm->scratch[0]),
>> -			       i915_gem_get_pat_index(vm->i915,
>> -						      I915_CACHE_NONE),
>> +		vm->pte_encode(px_dma(vm->scratch[0]), vm->i915->pat_uc,
>>   			       PTE_READ_ONLY);
>>   
>>   	vm->scratch[1] = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> index 862ac1d2de25..675f71f06e89 100644
>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> @@ -874,9 +874,7 @@ static int gen8_init_scratch(struct i915_address_space *vm)
>>   		pte_flags |= PTE_LM;
>>   
>>   	vm->scratch[0]->encode =
>> -		vm->pte_encode(px_dma(vm->scratch[0]),
>> -			       i915_gem_get_pat_index(vm->i915,
>> -						      I915_CACHE_NONE),
>> +		vm->pte_encode(px_dma(vm->scratch[0]), vm->i915->pat_uc,
>>   			       pte_flags);
>>   
>>   	for (i = 1; i <= vm->top; i++) {
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> index dd0ed941441a..fca61ddca8ad 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> @@ -921,9 +921,7 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size)
>>   		pte_flags |= PTE_LM;
>>   
>>   	ggtt->vm.scratch[0]->encode =
>> -		ggtt->vm.pte_encode(px_dma(ggtt->vm.scratch[0]),
>> -				    i915_gem_get_pat_index(i915,
>> -							   I915_CACHE_NONE),
>> +		ggtt->vm.pte_encode(px_dma(ggtt->vm.scratch[0]), i915->pat_uc,
>>   				    pte_flags);
>>   
>>   	return 0;
>> @@ -1298,9 +1296,7 @@ bool i915_ggtt_resume_vm(struct i915_address_space *vm)
>>   		 */
>>   		vma->resource->bound_flags = 0;
>>   		vma->ops->bind_vma(vm, NULL, vma->resource,
>> -				   obj ? obj->pat_index :
>> -					 i915_gem_get_pat_index(vm->i915,
>> -								I915_CACHE_NONE),
>> +				   obj ? obj->pat_index : vm->i915->pat_uc,
>>   				   was_bound);
>>   
>>   		if (obj) { /* only used during resume => exclusive access */
>> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
>> index 576e5ef0289b..b7a61b02f64c 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
>> @@ -45,9 +45,7 @@ static void xehpsdv_toggle_pdes(struct i915_address_space *vm,
>>   	 * Insert a dummy PTE into every PT that will map to LMEM to ensure
>>   	 * we have a correctly setup PDE structure for later use.
>>   	 */
>> -	vm->insert_page(vm, 0, d->offset,
>> -			i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
>> -			PTE_LM);
>> +	vm->insert_page(vm, 0, d->offset, vm->i915->pat_uc, PTE_LM);
>>   	GEM_BUG_ON(!pt->is_compact);
>>   	d->offset += SZ_2M;
>>   }
>> @@ -65,9 +63,7 @@ static void xehpsdv_insert_pte(struct i915_address_space *vm,
>>   	 * alignment is 64K underneath for the pt, and we are careful
>>   	 * not to access the space in the void.
>>   	 */
>> -	vm->insert_page(vm, px_dma(pt), d->offset,
>> -			i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
>> -			PTE_LM);
>> +	vm->insert_page(vm, px_dma(pt), d->offset, vm->i915->pat_uc, PTE_LM);
>>   	d->offset += SZ_64K;
>>   }
>>   
>> @@ -77,8 +73,7 @@ static void insert_pte(struct i915_address_space *vm,
>>   {
>>   	struct insert_pte_data *d = data;
>>   
>> -	vm->insert_page(vm, px_dma(pt), d->offset,
>> -			i915_gem_get_pat_index(vm->i915, I915_CACHE_NONE),
>> +	vm->insert_page(vm, px_dma(pt), d->offset, vm->i915->pat_uc,
>>   			i915_gem_object_is_lmem(pt->base) ? PTE_LM : 0);
>>   	d->offset += PAGE_SIZE;
>>   }
>> diff --git a/drivers/gpu/drm/i915/gt/selftest_migrate.c b/drivers/gpu/drm/i915/gt/selftest_migrate.c
>> index 3def5ca72dec..a67ede65d816 100644
>> --- a/drivers/gpu/drm/i915/gt/selftest_migrate.c
>> +++ b/drivers/gpu/drm/i915/gt/selftest_migrate.c
>> @@ -904,8 +904,7 @@ static int perf_clear_blt(void *arg)
>>   
>>   		err = __perf_clear_blt(gt->migrate.context,
>>   				       dst->mm.pages->sgl,
>> -				       i915_gem_get_pat_index(gt->i915,
>> -							      I915_CACHE_NONE),
>> +				       gt->i915->pat_uc,
>>   				       i915_gem_object_is_lmem(dst),
>>   				       sizes[i]);
>>   
>> @@ -995,12 +994,10 @@ static int perf_copy_blt(void *arg)
>>   
>>   		err = __perf_copy_blt(gt->migrate.context,
>>   				      src->mm.pages->sgl,
>> -				      i915_gem_get_pat_index(gt->i915,
>> -							     I915_CACHE_NONE),
>> +				      gt->i915->pat_uc,
>>   				      i915_gem_object_is_lmem(src),
>>   				      dst->mm.pages->sgl,
>> -				      i915_gem_get_pat_index(gt->i915,
>> -							     I915_CACHE_NONE),
>> +				      gt->i915->pat_uc,
>>   				      i915_gem_object_is_lmem(dst),
>>   				      sz);
>>   
>> diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c
>> index 79aa6ac66ad2..327dc9294e0f 100644
>> --- a/drivers/gpu/drm/i915/gt/selftest_reset.c
>> +++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
>> @@ -84,11 +84,8 @@ __igt_reset_stolen(struct intel_gt *gt,
>>   		void __iomem *s;
>>   		void *in;
>>   
>> -		ggtt->vm.insert_page(&ggtt->vm, dma,
>> -				     ggtt->error_capture.start,
>> -				     i915_gem_get_pat_index(gt->i915,
>> -							    I915_CACHE_NONE),
>> -				     0);
>> +		ggtt->vm.insert_page(&ggtt->vm, dma, ggtt->error_capture.start,
>> +				     gt->i915->pat_uc, 0);
>>   		mb();
>>   
>>   		s = io_mapping_map_wc(&ggtt->iomap,
>> @@ -127,11 +124,8 @@ __igt_reset_stolen(struct intel_gt *gt,
>>   		void *in;
>>   		u32 x;
>>   
>> -		ggtt->vm.insert_page(&ggtt->vm, dma,
>> -				     ggtt->error_capture.start,
>> -				     i915_gem_get_pat_index(gt->i915,
>> -							    I915_CACHE_NONE),
>> -				     0);
>> +		ggtt->vm.insert_page(&ggtt->vm, dma, ggtt->error_capture.start,
>> +				     gt->i915->pat_uc, 0);
>>   		mb();
>>   
>>   		s = io_mapping_map_wc(&ggtt->iomap,
>> diff --git a/drivers/gpu/drm/i915/gt/selftest_tlb.c b/drivers/gpu/drm/i915/gt/selftest_tlb.c
>> index 3bd6b540257b..6049f01be219 100644
>> --- a/drivers/gpu/drm/i915/gt/selftest_tlb.c
>> +++ b/drivers/gpu/drm/i915/gt/selftest_tlb.c
>> @@ -36,8 +36,6 @@ pte_tlbinv(struct intel_context *ce,
>>   	   u64 length,
>>   	   struct rnd_state *prng)
>>   {
>> -	const unsigned int pat_index =
>> -		i915_gem_get_pat_index(ce->vm->i915, I915_CACHE_NONE);
>>   	struct drm_i915_gem_object *batch;
>>   	struct drm_mm_node vb_node;
>>   	struct i915_request *rq;
>> @@ -157,7 +155,8 @@ pte_tlbinv(struct intel_context *ce,
>>   		/* Flip the PTE between A and B */
>>   		if (i915_gem_object_is_lmem(vb->obj))
>>   			pte_flags |= PTE_LM;
>> -		ce->vm->insert_entries(ce->vm, &vb_res, pat_index, pte_flags);
>> +		ce->vm->insert_entries(ce->vm, &vb_res, ce->vm->i915->pat_uc,
>> +				       pte_flags);
>>   
>>   		/* Flush the PTE update to concurrent HW */
>>   		tlbinv(ce->vm, addr & -length, length);
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
>> index 7aadad5639c3..8b7aa8c5a99d 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
>> @@ -1053,14 +1053,10 @@ static void uc_fw_bind_ggtt(struct intel_uc_fw *uc_fw)
>>   
>>   	if (ggtt->vm.raw_insert_entries)
>>   		ggtt->vm.raw_insert_entries(&ggtt->vm, vma_res,
>> -					    i915_gem_get_pat_index(ggtt->vm.i915,
>> -								   I915_CACHE_NONE),
>> -					    pte_flags);
>> +					    ggtt->vm.i915->pat_uc, pte_flags);
>>   	else
>>   		ggtt->vm.insert_entries(&ggtt->vm, vma_res,
>> -					i915_gem_get_pat_index(ggtt->vm.i915,
>> -							       I915_CACHE_NONE),
>> -					pte_flags);
>> +					ggtt->vm.i915->pat_uc, pte_flags);
>>   }
>>   
>>   static void uc_fw_unbind_ggtt(struct intel_uc_fw *uc_fw)
>> diff --git a/drivers/gpu/drm/i915/i915_cache.c b/drivers/gpu/drm/i915/i915_cache.c
>> new file mode 100644
>> index 000000000000..06eb5933c719
>> --- /dev/null
>> +++ b/drivers/gpu/drm/i915/i915_cache.c
>> @@ -0,0 +1,18 @@
>> +// SPDX-License-Identifier: MIT
>> +/*
>> + * Copyright © 2023 Intel Corporation
>> + */
>> +
>> +#include "i915_cache.h"
>> +#include "i915_drv.h"
>> +
>> +void i915_cache_init(struct drm_i915_private *i915)
>> +{
>> +	i915->pat_uc = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
>> +	drm_info(&i915->drm, "Using PAT index %u for uncached access\n",
>> +		 i915->pat_uc);
>> +
>> +	i915->pat_wb = i915_gem_get_pat_index(i915, I915_CACHE_LLC);
>> +	drm_info(&i915->drm, "Using PAT index %u for write-back access\n",
>> +		 i915->pat_wb);
>> +}
>> diff --git a/drivers/gpu/drm/i915/i915_cache.h b/drivers/gpu/drm/i915/i915_cache.h
>> new file mode 100644
>> index 000000000000..cb68936fb8a2
>> --- /dev/null
>> +++ b/drivers/gpu/drm/i915/i915_cache.h
>> @@ -0,0 +1,13 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright © 2023 Intel Corporation
>> + */
>> +
>> +#ifndef __I915_CACHE_H__
>> +#define __I915_CACHE_H__
>> +
>> +struct drm_i915_private;
>> +
>> +void i915_cache_init(struct drm_i915_private *i915);
>> +
>> +#endif /* __I915_CACHE_H__ */
>> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
>> index 294b022de22b..bb2223cc3470 100644
>> --- a/drivers/gpu/drm/i915/i915_driver.c
>> +++ b/drivers/gpu/drm/i915/i915_driver.c
>> @@ -80,6 +80,7 @@
>>   #include "soc/intel_dram.h"
>>   #include "soc/intel_gmch.h"
>>   
>> +#include "i915_cache.h"
>>   #include "i915_debugfs.h"
>>   #include "i915_driver.h"
>>   #include "i915_drm_client.h"
>> @@ -240,6 +241,8 @@ static int i915_driver_early_probe(struct drm_i915_private *dev_priv)
>>   	i915_memcpy_init_early(dev_priv);
>>   	intel_runtime_pm_init_early(&dev_priv->runtime_pm);
>>   
>> +	i915_cache_init(dev_priv);
>> +
>>   	ret = i915_workqueues_init(dev_priv);
>>   	if (ret < 0)
>>   		return ret;
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 682ef2b5c7d5..f5c591a762df 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -250,6 +250,8 @@ struct drm_i915_private {
>>   	unsigned int hpll_freq;
>>   	unsigned int czclk_freq;
>>   
>> +	unsigned int pat_uc, pat_wb;
>> +
>>   	/**
>>   	 * wq - Driver workqueue for GEM.
>>   	 *
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 1f65bb33dd21..896aa48ed089 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -422,9 +422,7 @@ i915_gem_gtt_pread(struct drm_i915_gem_object *obj,
>>   			ggtt->vm.insert_page(&ggtt->vm,
>>   					     i915_gem_object_get_dma_address(obj,
>>   									     offset >> PAGE_SHIFT),
>> -					     node.start,
>> -					     i915_gem_get_pat_index(i915,
>> -								    I915_CACHE_NONE), 0);
>> +					     node.start, i915->pat_uc, 0);
>>   		} else {
>>   			page_base += offset & PAGE_MASK;
>>   		}
>> @@ -603,9 +601,7 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
>>   			ggtt->vm.insert_page(&ggtt->vm,
>>   					     i915_gem_object_get_dma_address(obj,
>>   									     offset >> PAGE_SHIFT),
>> -					     node.start,
>> -					     i915_gem_get_pat_index(i915,
>> -								    I915_CACHE_NONE), 0);
>> +					     node.start, i915->pat_uc, 0);
>>   			wmb(); /* flush modifications to the GGTT (insert_page) */
>>   		} else {
>>   			page_base += offset & PAGE_MASK;
>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
>> index 4008bb09fdb5..31975a79730c 100644
>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>> @@ -1124,14 +1124,10 @@ i915_vma_coredump_create(const struct intel_gt *gt,
>>   			mutex_lock(&ggtt->error_mutex);
>>   			if (ggtt->vm.raw_insert_page)
>>   				ggtt->vm.raw_insert_page(&ggtt->vm, dma, slot,
>> -							 i915_gem_get_pat_index(gt->i915,
>> -										I915_CACHE_NONE),
>> -							 0);
>> +							 gt->i915->pat_uc, 0);
>>   			else
>>   				ggtt->vm.insert_page(&ggtt->vm, dma, slot,
>> -						     i915_gem_get_pat_index(gt->i915,
>> -									    I915_CACHE_NONE),
>> -						     0);
>> +						     gt->i915->pat_uc, 0);
>>   			mb();
>>   
>>   			s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);
>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem.c b/drivers/gpu/drm/i915/selftests/i915_gem.c
>> index 61da4ed9d521..e620f73793a5 100644
>> --- a/drivers/gpu/drm/i915/selftests/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem.c
>> @@ -57,10 +57,7 @@ static void trash_stolen(struct drm_i915_private *i915)
>>   		u32 __iomem *s;
>>   		int x;
>>   
>> -		ggtt->vm.insert_page(&ggtt->vm, dma, slot,
>> -				     i915_gem_get_pat_index(i915,
>> -							    I915_CACHE_NONE),
>> -				     0);
>> +		ggtt->vm.insert_page(&ggtt->vm, dma, slot, i915->pat_uc, 0);
>>   
>>   		s = io_mapping_map_atomic_wc(&ggtt->iomap, slot);
>>   		for (x = 0; x < PAGE_SIZE / sizeof(u32); x++) {
>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>> index f8fe3681c3dc..f910ec9b6d2b 100644
>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>> @@ -246,7 +246,7 @@ static int igt_evict_for_cache_color(void *arg)
>>   	struct drm_mm_node target = {
>>   		.start = I915_GTT_PAGE_SIZE * 2,
>>   		.size = I915_GTT_PAGE_SIZE,
>> -		.color = i915_gem_get_pat_index(gt->i915, I915_CACHE_LLC),
>> +		.color = gt->i915->pat_wb,
>>   	};
>>   	struct drm_i915_gem_object *obj;
>>   	struct i915_vma *vma;
>> @@ -309,7 +309,7 @@ static int igt_evict_for_cache_color(void *arg)
>>   	/* Attempt to remove the first *pinned* vma, by removing the (empty)
>>   	 * neighbour -- this should fail.
>>   	 */
>> -	target.color = i915_gem_get_pat_index(gt->i915, I915_CACHE_L3_LLC);
>> +	target.color = gt->i915->pat_uc;
> 
> This one doesn't look correct.  On most platforms I915_CACHE_L3_LLC maps
> to the same wb PAT as I915_CACHE_LLC.  Only on legacy platforms does it
> differ, and it maps to something different than either pat_uc or pat_wb
> there.

AFAICT this is just a fake color in a mock test, so the actual caching 
modes do not matter. All that matters is that two different values are 
used. I will put a comment in the test and switch it to two arbitrary 
constants instead of anything caching related.
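
Roughly something like this (untested sketch, the MOCK_COLOR_* names 
are made up here - any two distinct values would do):

	/* Two arbitrary mock colors; no caching semantics implied. */
	#define MOCK_COLOR_A	1
	#define MOCK_COLOR_B	2

	struct drm_mm_node target = {
		.start = I915_GTT_PAGE_SIZE * 2,
		.size = I915_GTT_PAGE_SIZE,
		.color = MOCK_COLOR_A,
	};

	...

	/* Flip to the other fake color when poking the pinned neighbour. */
	target.color = MOCK_COLOR_B;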

Regards,

Tvrtko

>>   
>>   	mutex_lock(&ggtt->vm.mutex);
>>   	err = i915_gem_evict_for_node(&ggtt->vm, NULL, &target, 0);
>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>> index 5c397a2df70e..c96b7f7d7853 100644
>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>> @@ -135,7 +135,7 @@ fake_dma_object(struct drm_i915_private *i915, u64 size)
>>   
>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>> -	obj->pat_index = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
>> +	obj->pat_index = i915->pat_uc;
>>   
>>   	/* Preallocate the "backing storage" */
>>   	if (i915_gem_object_pin_pages_unlocked(obj))
>> @@ -358,9 +358,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
>>   			mock_vma_res->start = addr;
>>   
>>   			with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
>> -			  vm->insert_entries(vm, mock_vma_res,
>> -					     i915_gem_get_pat_index(vm->i915,
>> -								    I915_CACHE_NONE),
>> +			  vm->insert_entries(vm, mock_vma_res, vm->i915->pat_uc,
>>   					     0);
>>   		}
>>   		count = n;
>> @@ -1379,10 +1377,7 @@ static int igt_ggtt_page(void *arg)
>>   
>>   		ggtt->vm.insert_page(&ggtt->vm,
>>   				     i915_gem_object_get_dma_address(obj, 0),
>> -				     offset,
>> -				     i915_gem_get_pat_index(i915,
>> -							    I915_CACHE_NONE),
>> -				     0);
>> +				     offset, i915->pat_uc, 0);
>>   	}
>>   
>>   	order = i915_random_order(count, &prng);
>> diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
>> index d985d9bae2e8..b82fe0ef8cd7 100644
>> --- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
>> +++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
>> @@ -1070,9 +1070,7 @@ static int igt_lmem_write_cpu(void *arg)
>>   	/* Put the pages into a known state -- from the gpu for added fun */
>>   	intel_engine_pm_get(engine);
>>   	err = intel_context_migrate_clear(engine->gt->migrate.context, NULL,
>> -					  obj->mm.pages->sgl,
>> -					  i915_gem_get_pat_index(i915,
>> -								 I915_CACHE_NONE),
>> +					  obj->mm.pages->sgl, i915->pat_uc,
>>   					  true, 0xdeadbeaf, &rq);
>>   	if (rq) {
>>   		dma_resv_add_fence(obj->base.resv, &rq->fence,
>> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>> index da0b269606c5..1d1a457e2aee 100644
>> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>> @@ -181,6 +181,8 @@ struct drm_i915_private *mock_gem_device(void)
>>   	/* Set up device info and initial runtime info. */
>>   	intel_device_info_driver_create(i915, pdev->device, &mock_info);
>>   
>> +	i915_cache_init(i915);
>> +
>>   	dev_pm_domain_set(&pdev->dev, &pm_domain);
>>   	pm_runtime_enable(&pdev->dev);
>>   	pm_runtime_dont_use_autosuspend(&pdev->dev);
>> -- 
>> 2.39.2
>>
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC 4/8] drm/i915: Refactor PAT/object cache handling
  2023-07-27 23:57     ` [Intel-gfx] " Matt Roper
@ 2023-07-28 12:23       ` Tvrtko Ursulin
  -1 siblings, 0 replies; 59+ messages in thread
From: Tvrtko Ursulin @ 2023-07-28 12:23 UTC (permalink / raw)
  To: Matt Roper
  Cc: Fei Yang, Tvrtko Ursulin, Intel-gfx, dri-devel, Andi Shyti, Chris Wilson


On 28/07/2023 00:57, Matt Roper wrote:
> On Thu, Jul 27, 2023 at 03:55:00PM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Commit 9275277d5324 ("drm/i915: use pat_index instead of cache_level") has
>> introduced PAT indices to i915 internal APIs, partially replacing the
>> usage of driver internal cache_level, but has also added a few sub-
>> optimal design decisions which this patch tries to improve upon.
>>
>> The principal change here is to invert the per-platform cache level to
>> PAT index table which was added by the referenced commit, and by doing
>> so enable i915 to understand the caching mode behind each PAT index,
>> turning the indices from opaque into transparent values.
>>
>> Once we have the inverted table we are able to remove the hidden,
>> potentially false, "return true" from i915_gem_object_has_cache_level
>> and make the involved code path clearer.
>>
>> To achieve this we replace the enum i915_cache_level with i915_cache_t,
>> composed of a more detailed representation of each cache mode (base mode
>> plus flags).
>>
>> In this way we are able to express the differences between different
>> write-back mode coherency settings on Meteorlake, which in turn enables us
>> to map the i915 "cached" mode to the correct Meteorlake PAT index.
>>
>> We can also replace the platform dependent cache mode to string code in
>> debugfs and elsewhere by the single implementation based on i915_cache_t.
>>
>> v2:
>>   * Fix PAT-to-cache-mode table for PVC. (Fei)
>>   * Cache display caching mode too. (Fei)
>>   * Improve and document criteria in i915_gem_object_can_bypass_llc() (Matt)
>>
>> v3:
>>   * Checkpatch issues.
>>   * Cache mode flags check fixed.
>>
>> v4:
>>   * Fix intel_device_info->cache_modes array size. (Matt)
>>   * Boolean cache mode and flags query. (Matt)
>>   * Reduce number of cache macros with some macro magic.
>>   * One more checkpatch fix.
>>   * Tweak tables to show legacy and Gen12 WB is fully coherent.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> References: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>> Cc: Fei Yang <fei.yang@intel.com>
>> Cc: Andi Shyti <andi.shyti@linux.intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_domain.c    |  60 +++++----
>>   drivers/gpu/drm/i915/gem/i915_gem_domain.h    |   5 +-
>>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   3 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_mman.c      |   4 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_object.c    | 117 ++++++++++--------
>>   drivers/gpu/drm/i915/gem/i915_gem_object.h    |  11 +-
>>   .../gpu/drm/i915/gem/i915_gem_object_types.h  | 116 +----------------
>>   drivers/gpu/drm/i915/gem/i915_gem_shmem.c     |   8 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |   2 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  20 +--
>>   drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   2 +-
>>   .../drm/i915/gem/selftests/huge_gem_object.c  |   2 +-
>>   .../gpu/drm/i915/gem/selftests/huge_pages.c   |   3 +-
>>   drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  10 +-
>>   drivers/gpu/drm/i915/gt/intel_engine_cs.c     |   2 +-
>>   drivers/gpu/drm/i915/gt/intel_ggtt.c          |  25 ++--
>>   drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c     |   4 +-
>>   drivers/gpu/drm/i915/gt/intel_gtt.c           |   2 +-
>>   drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +-
>>   drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   6 +-
>>   .../gpu/drm/i915/gt/intel_ring_submission.c   |   4 +-
>>   drivers/gpu/drm/i915/gt/intel_timeline.c      |   2 +-
>>   drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   2 +-
>>   .../gpu/drm/i915/gt/selftest_workarounds.c    |   2 +-
>>   drivers/gpu/drm/i915/i915_cache.c             |  89 +++++++++++--
>>   drivers/gpu/drm/i915/i915_cache.h             |  70 ++++++++++-
>>   drivers/gpu/drm/i915/i915_debugfs.c           |  53 ++------
>>   drivers/gpu/drm/i915/i915_driver.c            |   4 +-
>>   drivers/gpu/drm/i915/i915_gem.c               |  13 --
>>   drivers/gpu/drm/i915/i915_pci.c               |  84 +++++++------
>>   drivers/gpu/drm/i915/i915_perf.c              |   2 +-
>>   drivers/gpu/drm/i915/intel_device_info.h      |   6 +-
>>   .../gpu/drm/i915/selftests/i915_gem_evict.c   |   4 +-
>>   drivers/gpu/drm/i915/selftests/igt_spinner.c  |   2 +-
>>   .../gpu/drm/i915/selftests/mock_gem_device.c  |  14 +--
>>   36 files changed, 391 insertions(+), 367 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>> index 57db9c581bf6..c15f83de33af 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>> @@ -8,6 +8,7 @@
>>   #include "display/intel_frontbuffer.h"
>>   #include "gt/intel_gt.h"
>>   
>> +#include "i915_cache.h"
>>   #include "i915_drv.h"
>>   #include "i915_gem_clflush.h"
>>   #include "i915_gem_domain.h"
>> @@ -41,14 +42,17 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>>   		return false;
>>   
>>   	/*
>> -	 * For objects created by userspace through GEM_CREATE with pat_index
>> -	 * set by set_pat extension, i915_gem_object_has_cache_level() will
>> -	 * always return true, because the coherency of such object is managed
>> -	 * by userspace. Othereise the call here would fall back to checking
>> -	 * whether the object is un-cached or write-through.
>> +	 * Always flush cache for UMD objects with PAT index set.
>>   	 */
>> -	return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
>> -		 i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
>> +	if (obj->pat_set_by_user)
>> +		return true;
>> +
>> +	/*
>> +	 * Fully coherent cached access may end up with data in the CPU cache
>> +	 * which hasn't hit memory yet.
>> +	 */
>> +	return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
>> +	       i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W);
>>   }
>>   
>>   bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>> @@ -268,7 +272,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>>   /**
>>    * i915_gem_object_set_cache_level - Changes the cache-level of an object across all VMA.
>>    * @obj: object to act on
>> - * @cache_level: new cache level to set for the object
>> + * @cache: new caching mode to set for the object
>>    *
>>    * After this function returns, the object will be in the new cache-level
>>    * across all GTT and the contents of the backing storage will be coherent,
>> @@ -281,18 +285,28 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>>    * that all direct access to the scanout remains coherent.
>>    */
>>   int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>> -				    enum i915_cache_level cache_level)
>> +				    i915_cache_t cache)
>>   {
>> -	int ret;
>> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>> +	int pat, ret;
>>   
>> -	/*
>> -	 * For objects created by userspace through GEM_CREATE with pat_index
>> -	 * set by set_pat extension, simply return 0 here without touching
>> -	 * the cache setting, because such objects should have an immutable
>> -	 * cache setting by desgin and always managed by userspace.
>> -	 */
>> -	if (i915_gem_object_has_cache_level(obj, cache_level))
>> +	pat = i915_cache_find_pat(i915, cache);
>> +	if (pat < 0) {
>> +		char buf[I915_CACHE_NAME_LEN];
>> +
>> +		i915_cache_print(buf, sizeof(buf), NULL, cache);
>> +		drm_err_ratelimited(&i915->drm,
>> +				    "Attempting to use unknown caching mode %s!\n",
>> +				    buf);
>> +
>> +		return -EINVAL;
>> +	} else if (pat == obj->pat_index) {
>>   		return 0;
>> +	} else if (obj->pat_set_by_user) {
>> +		drm_notice_once(&i915->drm,
>> +				"Attempting to change caching mode on an object with fixed PAT!\n");
>> +		return -EINVAL;
>> +	}
>>   
>>   	ret = i915_gem_object_wait(obj,
>>   				   I915_WAIT_INTERRUPTIBLE |
>> @@ -302,7 +316,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>>   		return ret;
>>   
>>   	/* Always invalidate stale cachelines */
>> -	i915_gem_object_set_cache_coherency(obj, cache_level);
>> +	i915_gem_object_set_pat_index(obj, pat);
>>   	obj->cache_dirty = true;
>>   
>>   	/* The cache-level will be applied when each vma is rebound. */
>> @@ -337,10 +351,10 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
>>   		goto out;
>>   	}
>>   
>> -	if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||
>> -	    i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
>> +	if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
>> +	    i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W))
>>   		args->caching = I915_CACHING_CACHED;
>> -	else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
>> +	else if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT))
>>   		args->caching = I915_CACHING_DISPLAY;
>>   	else
>>   		args->caching = I915_CACHING_NONE;
>> @@ -355,7 +369,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
>>   	struct drm_i915_private *i915 = to_i915(dev);
>>   	struct drm_i915_gem_caching *args = data;
>>   	struct drm_i915_gem_object *obj;
>> -	enum i915_cache_level level;
>> +	i915_cache_t level;
>>   	int ret = 0;
>>   
>>   	if (IS_DGFX(i915))
>> @@ -378,7 +392,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
>>   		if (!HAS_LLC(i915) && !HAS_SNOOP(i915))
>>   			return -ENODEV;
>>   
>> -		level = I915_CACHE_LLC;
>> +		level = I915_CACHE_CACHED;
>>   		break;
>>   	case I915_CACHING_DISPLAY:
>>   		level = HAS_WT(i915) ? I915_CACHE_WT : I915_CACHE_NONE;
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.h b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
>> index 9622df962bfc..6da5c351f6fd 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
>> @@ -6,10 +6,11 @@
>>   #ifndef __I915_GEM_DOMAIN_H__
>>   #define __I915_GEM_DOMAIN_H__
>>   
>> +#include "i915_cache.h"
>> +
>>   struct drm_i915_gem_object;
>> -enum i915_cache_level;
>>   
>>   int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>> -				    enum i915_cache_level cache_level);
>> +				    i915_cache_t cache);
>>   
>>   #endif /* __I915_GEM_DOMAIN_H__ */
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> index 0a1d40220020..9d6e49c8a4c6 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> @@ -648,7 +648,8 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
>>   	 */
>>   	return (cache->has_llc ||
>>   		obj->cache_dirty ||
>> -		!i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
>> +		!(obj->pat_set_by_user ||
>> +		  i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)));
>>   }
>>   
>>   static int eb_reserve_vma(struct i915_execbuffer *eb,
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
>> index 6bc26b4b06b8..88c360c3d6a3 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
>> @@ -170,7 +170,7 @@ __i915_gem_object_create_internal(struct drm_i915_private *i915,
>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>>   
>> -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
>> +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>>   	i915_gem_object_set_cache_coherency(obj, cache_level);
>>   
>>   	return obj;
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> index aa4d842d4c5a..cd7f8ded0d6f 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> @@ -382,7 +382,6 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>>   		goto err_reset;
>>   	}
>>   
>> -	/* Access to snoopable pages through the GTT is incoherent. */
>>   	/*
>>   	 * For objects created by userspace through GEM_CREATE with pat_index
>>   	 * set by set_pat extension, coherency is managed by userspace, make
>> @@ -391,7 +390,8 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>>   	 * objects. Otherwise this helper function would fall back to checking
>>   	 * whether the object is un-cached.
>>   	 */
>> -	if (!(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
>> +	if (!((obj->pat_set_by_user ||
>> +	       i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)) ||
>>   	      HAS_LLC(i915))) {
>>   		ret = -EFAULT;
>>   		goto err_unpin;
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> index 3dc4fbb67d2b..ec1f0be43d0d 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> @@ -45,33 +45,6 @@ static struct kmem_cache *slab_objects;
>>   
>>   static const struct drm_gem_object_funcs i915_gem_object_funcs;
>>   
>> -unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
>> -				    enum i915_cache_level level)
>> -{
>> -	if (drm_WARN_ON(&i915->drm, level >= I915_MAX_CACHE_LEVEL))
>> -		return 0;
>> -
>> -	return INTEL_INFO(i915)->cachelevel_to_pat[level];
>> -}
>> -
>> -bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
>> -				     enum i915_cache_level lvl)
>> -{
>> -	/*
>> -	 * In case the pat_index is set by user space, this kernel mode
>> -	 * driver should leave the coherency to be managed by user space,
>> -	 * simply return true here.
>> -	 */
>> -	if (obj->pat_set_by_user)
>> -		return true;
>> -
>> -	/*
>> -	 * Otherwise the pat_index should have been converted from cache_level
>> -	 * so that the following comparison is valid.
>> -	 */
>> -	return obj->pat_index == i915_gem_get_pat_index(obj_to_i915(obj), lvl);
>> -}
>> -
>>   struct drm_i915_gem_object *i915_gem_object_alloc(void)
>>   {
>>   	struct drm_i915_gem_object *obj;
>> @@ -144,30 +117,72 @@ void __i915_gem_object_fini(struct drm_i915_gem_object *obj)
>>   	dma_resv_fini(&obj->base._resv);
>>   }
>>   
>> +bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
>> +				    enum i915_cache_mode mode)
>> +{
>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>> +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
>> +
>> +	return I915_CACHE_MODE(cache) == mode;
>> +}
>> +
>> +bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
>> +				    unsigned int flag)
>> +{
>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>> +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
>> +
>> +	return I915_CACHE_FLAGS(cache) & flag;
>> +}
>> +
>> +static void __i915_gem_object_update_coherency(struct drm_i915_gem_object *obj)
>> +{
>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>> +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
>> +	const unsigned int flags = I915_CACHE_FLAGS(cache);
>> +	const unsigned int mode = I915_CACHE_MODE(cache);
>> +
>> +	if (mode == I915_CACHE_MODE_WC ||
>> +	    mode == I915_CACHE_MODE_WT ||
>> +	    (mode == I915_CACHE_MODE_WB && (flags & I915_CACHE_FLAG_COH2W)))
>> +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ |
>> +				      I915_BO_CACHE_COHERENT_FOR_WRITE;
>> +	else if (HAS_LLC(i915))
>> +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
>> +	else
>> +		obj->cache_coherent = 0;
>> +
>> +	obj->cache_dirty =
>> +		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
>> +		!IS_DGFX(i915);
>> +}
>> +
>>   /**
>>    * i915_gem_object_set_cache_coherency - Mark up the object's coherency levels
>> - * for a given cache_level
>> + * for a given caching mode
>>    * @obj: #drm_i915_gem_object
>> - * @cache_level: cache level
>> + * @cache: cache mode
>>    */
>>   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>> -					 unsigned int cache_level)
>> +					 i915_cache_t cache)
>>   {
>> -	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>> +	int found;
>>   
>> -	obj->pat_index = i915_gem_get_pat_index(i915, cache_level);
>> +	found = i915_cache_find_pat(i915, cache);
>> +	if (found < 0) {
>> +		char buf[I915_CACHE_NAME_LEN];
>>   
>> -	if (cache_level != I915_CACHE_NONE)
>> -		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
>> -				       I915_BO_CACHE_COHERENT_FOR_WRITE);
>> -	else if (HAS_LLC(i915))
>> -		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
>> -	else
>> -		obj->cache_coherent = 0;
>> +		i915_cache_print(buf, sizeof(buf), NULL, cache);
>> +		drm_err_ratelimited(&i915->drm, "Unknown cache mode %s!\n",
>> +				    buf);
>>   
>> -	obj->cache_dirty =
>> -		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
>> -		!IS_DGFX(i915);
>> +		found = i915->pat_uc;
>> +	}
>> +
>> +	obj->pat_index = found;
>> +
>> +	__i915_gem_object_update_coherency(obj);
>>   }
>>   
>>   /**
>> @@ -181,24 +196,18 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>>   void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
>>   				   unsigned int pat_index)
>>   {
>> -	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>>   
>>   	if (obj->pat_index == pat_index)
>>   		return;
>>   
>> +	if (drm_WARN_ON_ONCE(&i915->drm,
>> +			     pat_index > INTEL_INFO(i915)->max_pat_index))
>> +		return;
>> +
>>   	obj->pat_index = pat_index;
>>   
>> -	if (pat_index != i915_gem_get_pat_index(i915, I915_CACHE_NONE))
>> -		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
>> -				       I915_BO_CACHE_COHERENT_FOR_WRITE);
>> -	else if (HAS_LLC(i915))
>> -		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
>> -	else
>> -		obj->cache_coherent = 0;
>> -
>> -	obj->cache_dirty =
>> -		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
>> -		!IS_DGFX(i915);
>> +	__i915_gem_object_update_coherency(obj);
>>   }
>>   
>>   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> index 884a17275b3a..a5d4ee19d9be 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> @@ -13,6 +13,7 @@
>>   
>>   #include "display/intel_frontbuffer.h"
>>   #include "intel_memory_region.h"
>> +#include "i915_cache.h"
>>   #include "i915_gem_object_types.h"
>>   #include "i915_gem_gtt.h"
>>   #include "i915_gem_ww.h"
>> @@ -32,10 +33,6 @@ static inline bool i915_gem_object_size_2big(u64 size)
>>   	return false;
>>   }
>>   
>> -unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
>> -				    enum i915_cache_level level);
>> -bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
>> -				     enum i915_cache_level lvl);
>>   void i915_gem_init__objects(struct drm_i915_private *i915);
>>   
>>   void i915_objects_module_exit(void);
>> @@ -764,8 +761,12 @@ int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
>>   				      bool intr);
>>   bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj);
>>   
>> +bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
>> +				    enum i915_cache_mode mode);
>> +bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
>> +				    unsigned int flag);
>>   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>> -					 unsigned int cache_level);
>> +					 i915_cache_t cache);
>>   void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
>>   				   unsigned int pat_index);
>>   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj);
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>> index 8de2b91b3edf..6790e13ad262 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>> @@ -14,6 +14,7 @@
>>   #include <uapi/drm/i915_drm.h>
>>   
>>   #include "i915_active.h"
>> +#include "i915_cache.h"
>>   #include "i915_selftest.h"
>>   #include "i915_vma_resource.h"
>>   
>> @@ -116,93 +117,6 @@ struct drm_i915_gem_object_ops {
>>   	const char *name; /* friendly name for debug, e.g. lockdep classes */
>>   };
>>   
>> -/**
>> - * enum i915_cache_level - The supported GTT caching values for system memory
>> - * pages.
>> - *
>> - * These translate to some special GTT PTE bits when binding pages into some
>> - * address space. It also determines whether an object, or rather its pages are
>> - * coherent with the GPU, when also reading or writing through the CPU cache
>> - * with those pages.
>> - *
>> - * Userspace can also control this through struct drm_i915_gem_caching.
>> - */
>> -enum i915_cache_level {
>> -	/**
>> -	 * @I915_CACHE_NONE:
>> -	 *
>> -	 * GPU access is not coherent with the CPU cache. If the cache is dirty
>> -	 * and we need the underlying pages to be coherent with some later GPU
>> -	 * access then we need to manually flush the pages.
>> -	 *
>> -	 * On shared LLC platforms reads and writes through the CPU cache are
>> -	 * still coherent even with this setting. See also
>> -	 * &drm_i915_gem_object.cache_coherent for more details. Due to this we
>> -	 * should only ever use uncached for scanout surfaces, otherwise we end
>> -	 * up over-flushing in some places.
>> -	 *
>> -	 * This is the default on non-LLC platforms.
>> -	 */
>> -	I915_CACHE_NONE = 0,
>> -	/**
>> -	 * @I915_CACHE_LLC:
>> -	 *
>> -	 * GPU access is coherent with the CPU cache. If the cache is dirty,
>> -	 * then the GPU will ensure that access remains coherent, when both
>> -	 * reading and writing through the CPU cache. GPU writes can dirty the
>> -	 * CPU cache.
>> -	 *
>> -	 * Not used for scanout surfaces.
>> -	 *
>> -	 * Applies to both platforms with shared LLC(HAS_LLC), and snooping
>> -	 * based platforms(HAS_SNOOP).
>> -	 *
>> -	 * This is the default on shared LLC platforms.  The only exception is
>> -	 * scanout objects, where the display engine is not coherent with the
>> -	 * CPU cache. For such objects I915_CACHE_NONE or I915_CACHE_WT is
>> -	 * automatically applied by the kernel in pin_for_display, if userspace
>> -	 * has not done so already.
>> -	 */
>> -	I915_CACHE_LLC,
>> -	/**
>> -	 * @I915_CACHE_L3_LLC:
>> -	 *
>> -	 * Explicitly enable the Gfx L3 cache, with coherent LLC.
>> -	 *
>> -	 * The Gfx L3 sits between the domain specific caches, e.g
>> -	 * sampler/render caches, and the larger LLC. LLC is coherent with the
>> -	 * GPU, but L3 is only visible to the GPU, so likely needs to be flushed
>> -	 * when the workload completes.
>> -	 *
>> -	 * Not used for scanout surfaces.
>> -	 *
>> -	 * Only exposed on some gen7 + GGTT. More recent hardware has dropped
>> -	 * this explicit setting, where it should now be enabled by default.
>> -	 */
>> -	I915_CACHE_L3_LLC,
>> -	/**
>> -	 * @I915_CACHE_WT:
>> -	 *
>> -	 * Write-through. Used for scanout surfaces.
>> -	 *
>> -	 * The GPU can utilise the caches, while still having the display engine
>> -	 * be coherent with GPU writes, as a result we don't need to flush the
>> -	 * CPU caches when moving out of the render domain. This is the default
>> -	 * setting chosen by the kernel, if supported by the HW, otherwise we
>> -	 * fallback to I915_CACHE_NONE. On the CPU side writes through the CPU
>> -	 * cache still need to be flushed, to remain coherent with the display
>> -	 * engine.
>> -	 */
>> -	I915_CACHE_WT,
>> -	/**
>> -	 * @I915_MAX_CACHE_LEVEL:
>> -	 *
>> -	 * Mark the last entry in the enum. Used for defining cachelevel_to_pat
>> -	 * array for cache_level to pat translation table.
>> -	 */
>> -	I915_MAX_CACHE_LEVEL,
>> -};
>> -
>>   enum i915_map_type {
>>   	I915_MAP_WB = 0,
>>   	I915_MAP_WC,
>> @@ -403,16 +317,6 @@ struct drm_i915_gem_object {
>>   	/**
>>   	 * @cache_coherent:
>>   	 *
>> -	 * Note: with the change above which replaced @cache_level with pat_index,
>> -	 * the use of @cache_coherent is limited to the objects created by kernel
>> -	 * or by userspace without pat index specified.
>> -	 * Check for @pat_set_by_user to find out if an object has pat index set
>> -	 * by userspace. The ioctl's to change cache settings have also been
>> -	 * disabled for the objects with pat index set by userspace. Please don't
>> -	 * assume @cache_coherent having the flags set as describe here. A helper
>> -	 * function i915_gem_object_has_cache_level() provides one way to bypass
>> -	 * the use of this field.
>> -	 *
>>   	 * Track whether the pages are coherent with the GPU if reading or
>>   	 * writing through the CPU caches. This largely depends on the
>>   	 * @cache_level setting.
>> @@ -447,7 +351,7 @@ struct drm_i915_gem_object {
>>   	 * flushing the surface just before doing the scanout.  This does mean
>>   	 * we might unnecessarily flush non-scanout objects in some places, but
>>   	 * the default assumption is that all normal objects should be using
>> -	 * I915_CACHE_LLC, at least on platforms with the shared LLC.
>> +	 * I915_CACHE_CACHED, at least on platforms with the shared LLC.
>>   	 *
>>   	 * Supported values:
>>   	 *
>> @@ -486,16 +390,6 @@ struct drm_i915_gem_object {
>>   	/**
>>   	 * @cache_dirty:
>>   	 *
>> -	 * Note: with the change above which replaced cache_level with pat_index,
>> -	 * the use of @cache_dirty is limited to the objects created by kernel
>> -	 * or by userspace without pat index specified.
>> -	 * Check for @pat_set_by_user to find out if an object has pat index set
>> -	 * by userspace. The ioctl's to change cache settings have also been
>> -	 * disabled for the objects with pat_index set by userspace. Please don't
>> -	 * assume @cache_dirty is set as describe here. Also see helper function
>> -	 * i915_gem_object_has_cache_level() for possible ways to bypass the use
>> -	 * of this field.
>> -	 *
>>   	 * Track if we are dirty with writes through the CPU cache for this
>>   	 * object. As a result reading directly from main memory might yield
>>   	 * stale data.
>> @@ -531,9 +425,9 @@ struct drm_i915_gem_object {
>>   	 *
>>   	 *   1. All userspace objects, by default, have @cache_level set as
>>   	 *   I915_CACHE_NONE. The only exception is userptr objects, where we
>> -	 *   instead force I915_CACHE_LLC, but we also don't allow userspace to
>> -	 *   ever change the @cache_level for such objects. Another special case
>> -	 *   is dma-buf, which doesn't rely on @cache_dirty,  but there we
>> +	 *   instead force I915_CACHE_CACHED, but we also don't allow userspace
>> +	 *   to ever change the @cache_level for such objects. Another special
>> +	 *   case is dma-buf, which doesn't rely on @cache_dirty, but there we
>>   	 *   always do a forced flush when acquiring the pages, if there is a
>>   	 *   chance that the pages can be read directly from main memory with
>>   	 *   the GPU.
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>> index 8f1633c3fb93..aba908f0349f 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>> @@ -584,7 +584,7 @@ static int shmem_object_init(struct intel_memory_region *mem,
>>   	static struct lock_class_key lock_class;
>>   	struct drm_i915_private *i915 = mem->i915;
>>   	struct address_space *mapping;
>> -	unsigned int cache_level;
>> +	i915_cache_t cache;
>>   	gfp_t mask;
>>   	int ret;
>>   
>> @@ -628,11 +628,11 @@ static int shmem_object_init(struct intel_memory_region *mem,
>>   		 * However, we maintain the display planes as UC, and so
>>   		 * need to rebind when first used as such.
>>   		 */
>> -		cache_level = I915_CACHE_LLC;
>> +		cache = I915_CACHE_CACHED;
>>   	else
>> -		cache_level = I915_CACHE_NONE;
>> +		cache = I915_CACHE_NONE;
>>   
>> -	i915_gem_object_set_cache_coherency(obj, cache_level);
>> +	i915_gem_object_set_cache_coherency(obj, cache);
>>   
>>   	i915_gem_object_init_memory_region(obj, mem);
>>   
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>> index 1c8eb806b7d3..cc907a1f1c53 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>> @@ -691,7 +691,7 @@ static int __i915_gem_object_create_stolen(struct intel_memory_region *mem,
>>   
>>   	obj->stolen = stolen;
>>   	obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
>> -	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
>> +	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>>   	i915_gem_object_set_cache_coherency(obj, cache_level);
>>   
>>   	if (WARN_ON(!i915_gem_object_trylock(obj, NULL)))
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> index 6bd6c239f4ac..107176d1757b 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> @@ -48,14 +48,14 @@ void i915_ttm_migrate_set_ban_memcpy(bool ban)
>>   }
>>   #endif
>>   
>> -static enum i915_cache_level
>> -i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
>> -		     struct ttm_tt *ttm)
>> +static i915_cache_t
>> +i915_ttm_cache(struct drm_i915_private *i915, struct ttm_resource *res,
>> +	       struct ttm_tt *ttm)
>>   {
>>   	return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
>>   		!i915_ttm_gtt_binds_lmem(res) &&
>> -		ttm->caching == ttm_cached) ? I915_CACHE_LLC :
>> -		I915_CACHE_NONE;
>> +		ttm->caching == ttm_cached) ? I915_CACHE_CACHED :
>> +					      I915_CACHE_NONE;
>>   }
>>   
>>   static unsigned int
>> @@ -112,8 +112,8 @@ void i915_ttm_adjust_domains_after_move(struct drm_i915_gem_object *obj)
>>   void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>>   {
>>   	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
>> -	unsigned int cache_level;
>>   	unsigned int mem_flags;
>> +	i915_cache_t cache;
>>   	unsigned int i;
>>   	int mem_type;
>>   
>> @@ -126,13 +126,13 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>>   	if (!bo->resource) {
>>   		mem_flags = I915_BO_FLAG_STRUCT_PAGE;
>>   		mem_type = I915_PL_SYSTEM;
>> -		cache_level = I915_CACHE_NONE;
>> +		cache = I915_CACHE_NONE;
>>   	} else {
>>   		mem_flags = i915_ttm_cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
>>   			I915_BO_FLAG_STRUCT_PAGE;
>>   		mem_type = bo->resource->mem_type;
>> -		cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
>> -						   bo->ttm);
>> +		cache = i915_ttm_cache(to_i915(bo->base.dev), bo->resource,
>> +				       bo->ttm);
>>   	}
>>   
>>   	/*
>> @@ -157,7 +157,7 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>>   	obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
>>   	obj->mem_flags |= mem_flags;
>>   
>> -	i915_gem_object_set_cache_coherency(obj, cache_level);
>> +	i915_gem_object_set_cache_coherency(obj, cache);
>>   }
>>   
>>   /**
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>> index 1d3ebdf4069b..5d2891981bd4 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>> @@ -553,7 +553,7 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
>>   	obj->mem_flags = I915_BO_FLAG_STRUCT_PAGE;
>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>   
>>   	obj->userptr.ptr = args->user_ptr;
>>   	obj->userptr.notifier_seq = ULONG_MAX;
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
>> index bac957755068..77d04be5e9d7 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
>> @@ -123,7 +123,7 @@ huge_gem_object(struct drm_i915_private *i915,
>>   
>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>> -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
>> +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>>   	i915_gem_object_set_cache_coherency(obj, cache_level);
>>   	obj->scratch = phys_size;
>>   
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>> index 6bddd733d796..6ca5b9dbc414 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>> @@ -200,9 +200,10 @@ huge_pages_object(struct drm_i915_private *i915,
>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>>   
>> -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
>> +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>>   	i915_gem_object_set_cache_coherency(obj, cache_level);
>>   
>> +
>>   	obj->mm.page_mask = page_mask;
>>   
>>   	return obj;
>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> index 675f71f06e89..3c93a73cf6b1 100644
>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> @@ -16,11 +16,11 @@
>>   #include "intel_gtt.h"
>>   
>>   static u64 gen8_pde_encode(const dma_addr_t addr,
>> -			   const enum i915_cache_level level)
>> +			   const enum i915_cache_mode cache_mode)
>>   {
>>   	u64 pde = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>>   
>> -	if (level != I915_CACHE_NONE)
>> +	if (cache_mode != I915_CACHE_MODE_UC)
>>   		pde |= PPAT_CACHED_PDE;
>>   	else
>>   		pde |= PPAT_UNCACHED;
>> @@ -43,10 +43,10 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>>   	 * See translation table defined by LEGACY_CACHELEVEL.
>>   	 */
>>   	switch (pat_index) {
>> -	case I915_CACHE_NONE:
>> +	case I915_CACHE_MODE_UC:
>>   		pte |= PPAT_UNCACHED;
>>   		break;
>> -	case I915_CACHE_WT:
>> +	case I915_CACHE_MODE_WT:
>>   		pte |= PPAT_DISPLAY_ELLC;
>>   		break;
>>   	default:
>> @@ -893,7 +893,7 @@ static int gen8_init_scratch(struct i915_address_space *vm)
>>   		}
>>   
>>   		fill_px(obj, vm->scratch[i - 1]->encode);
>> -		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_NONE);
>> +		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_MODE_UC);
>>   
>>   		vm->scratch[i] = obj;
>>   	}
>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>> index ee15486fed0d..f1e59e512d14 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>> @@ -1103,7 +1103,7 @@ static int init_status_page(struct intel_engine_cs *engine)
>>   		return PTR_ERR(obj);
>>   	}
>>   
>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>   
>>   	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>>   	if (IS_ERR(vma)) {
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> index fca61ddca8ad..ab5f654e7557 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> @@ -1011,11 +1011,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
>>   	return ggtt_probe_common(ggtt, size);
>>   }
>>   
>> -/*
>> - * For pre-gen8 platforms pat_index is the same as enum i915_cache_level,
>> - * so the switch-case statements in these PTE encode functions are still valid.
>> - * See translation table LEGACY_CACHELEVEL.
>> - */
>>   static u64 snb_pte_encode(dma_addr_t addr,
>>   			  unsigned int pat_index,
>>   			  u32 flags)
>> @@ -1023,11 +1018,11 @@ static u64 snb_pte_encode(dma_addr_t addr,
>>   	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>>   
>>   	switch (pat_index) {
>> -	case I915_CACHE_L3_LLC:
>> -	case I915_CACHE_LLC:
>> +	case I915_CACHE_MODE_WB:
>> +	case __I915_CACHE_MODE_WB_L3:
>>   		pte |= GEN6_PTE_CACHE_LLC;
>>   		break;
>> -	case I915_CACHE_NONE:
>> +	case I915_CACHE_MODE_UC:
>>   		pte |= GEN6_PTE_UNCACHED;
>>   		break;
>>   	default:
>> @@ -1044,13 +1039,13 @@ static u64 ivb_pte_encode(dma_addr_t addr,
>>   	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>>   
>>   	switch (pat_index) {
>> -	case I915_CACHE_L3_LLC:
>> +	case __I915_CACHE_MODE_WB_L3:
>>   		pte |= GEN7_PTE_CACHE_L3_LLC;
>>   		break;
>> -	case I915_CACHE_LLC:
>> +	case I915_CACHE_MODE_WB:
>>   		pte |= GEN6_PTE_CACHE_LLC;
>>   		break;
>> -	case I915_CACHE_NONE:
>> +	case I915_CACHE_MODE_UC:
>>   		pte |= GEN6_PTE_UNCACHED;
>>   		break;
>>   	default:
>> @@ -1069,7 +1064,7 @@ static u64 byt_pte_encode(dma_addr_t addr,
>>   	if (!(flags & PTE_READ_ONLY))
>>   		pte |= BYT_PTE_WRITEABLE;
>>   
>> -	if (pat_index != I915_CACHE_NONE)
>> +	if (pat_index != I915_CACHE_MODE_UC)
>>   		pte |= BYT_PTE_SNOOPED_BY_CPU_CACHES;
>>   
>>   	return pte;
>> @@ -1081,7 +1076,7 @@ static u64 hsw_pte_encode(dma_addr_t addr,
>>   {
>>   	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>>   
>> -	if (pat_index != I915_CACHE_NONE)
>> +	if (pat_index != I915_CACHE_MODE_UC)
>>   		pte |= HSW_WB_LLC_AGE3;
>>   
>>   	return pte;
>> @@ -1094,9 +1089,9 @@ static u64 iris_pte_encode(dma_addr_t addr,
>>   	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>>   
>>   	switch (pat_index) {
>> -	case I915_CACHE_NONE:
>> +	case I915_CACHE_MODE_UC:
>>   		break;
>> -	case I915_CACHE_WT:
>> +	case I915_CACHE_MODE_WT:
>>   		pte |= HSW_WT_ELLC_LLC_AGE3;
>>   		break;
>>   	default:
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
>> index 866c416afb73..803c41ac4ccb 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
>> @@ -21,7 +21,7 @@ static void gmch_ggtt_insert_page(struct i915_address_space *vm,
>>   				  unsigned int pat_index,
>>   				  u32 unused)
>>   {
>> -	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
>> +	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
>>   		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
>>   
>>   	intel_gmch_gtt_insert_page(addr, offset >> PAGE_SHIFT, flags);
>> @@ -32,7 +32,7 @@ static void gmch_ggtt_insert_entries(struct i915_address_space *vm,
>>   				     unsigned int pat_index,
>>   				     u32 unused)
>>   {
>> -	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
>> +	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
>>   		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
>>   
>>   	intel_gmch_gtt_insert_sg_entries(vma_res->bi.pages, vma_res->start >> PAGE_SHIFT,
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> index 065099362a98..48055304537a 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> @@ -676,7 +676,7 @@ __vm_create_scratch_for_read(struct i915_address_space *vm, unsigned long size)
>>   	if (IS_ERR(obj))
>>   		return ERR_CAST(obj);
>>   
>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>   
>>   	vma = i915_vma_instance(obj, vm, NULL);
>>   	if (IS_ERR(vma)) {
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> index 7192a534a654..af4277c1d577 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> @@ -636,7 +636,8 @@ void
>>   __set_pd_entry(struct i915_page_directory * const pd,
>>   	       const unsigned short idx,
>>   	       struct i915_page_table *pt,
>> -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level));
>> +	       u64 (*encode)(const dma_addr_t,
>> +			     const enum i915_cache_mode cache_mode));
>>   
>>   #define set_pd_entry(pd, idx, to) \
>>   	__set_pd_entry((pd), (idx), px_pt(to), gen8_pde_encode)
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>> index 436756bfbb1a..3e461d4f3693 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>> @@ -98,14 +98,16 @@ void
>>   __set_pd_entry(struct i915_page_directory * const pd,
>>   	       const unsigned short idx,
>>   	       struct i915_page_table * const to,
>> -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level))
>> +	       u64 (*encode)(const dma_addr_t,
>> +			     const enum i915_cache_mode cache_mode))
>>   {
>>   	/* Each thread pre-pins the pd, and we may have a thread per pde. */
>>   	GEM_BUG_ON(atomic_read(px_used(pd)) > NALLOC * I915_PDES);
>>   
>>   	atomic_inc(px_used(pd));
>>   	pd->entry[idx] = to;
>> -	write_dma_entry(px_base(pd), idx, encode(px_dma(to), I915_CACHE_LLC));
>> +	write_dma_entry(px_base(pd), idx,
>> +			encode(px_dma(to), I915_CACHE_MODE_WB));
>>   }
>>   
>>   void
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>> index 92085ffd23de..9131d228d285 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>> @@ -551,7 +551,9 @@ alloc_context_vma(struct intel_engine_cs *engine)
>>   	 * later platforms don't have L3 control bits in the PTE.
>>   	 */
>>   	if (IS_IVYBRIDGE(i915))
>> -		i915_gem_object_set_cache_coherency(obj, I915_CACHE_L3_LLC);
>> +		i915_gem_object_set_cache_coherency(obj,
>> +						    I915_CACHE_CACHED |
>> +						    __I915_CACHE_FLAG(L3));
>>   
>>   	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>>   	if (IS_ERR(vma)) {
>> diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
>> index b9640212d659..025ce54c886d 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
>> @@ -26,7 +26,7 @@ static struct i915_vma *hwsp_alloc(struct intel_gt *gt)
>>   	if (IS_ERR(obj))
>>   		return ERR_CAST(obj);
>>   
>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>   
>>   	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>>   	if (IS_ERR(vma))
>> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>> index 8b0d84f2aad2..fc278fa463b0 100644
>> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>> @@ -64,7 +64,7 @@ static int hang_init(struct hang *h, struct intel_gt *gt)
>>   		goto err_hws;
>>   	}
>>   
>> -	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_CACHED);
>>   	vaddr = i915_gem_object_pin_map_unlocked(h->hws, I915_MAP_WB);
>>   	if (IS_ERR(vaddr)) {
>>   		err = PTR_ERR(vaddr);
>> diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>> index 14a8b25b6204..d25990d33d44 100644
>> --- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>> +++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>> @@ -111,7 +111,7 @@ read_nonprivs(struct intel_context *ce)
>>   	if (IS_ERR(result))
>>   		return result;
>>   
>> -	i915_gem_object_set_cache_coherency(result, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(result, I915_CACHE_CACHED);
>>   
>>   	cs = i915_gem_object_pin_map_unlocked(result, I915_MAP_WB);
>>   	if (IS_ERR(cs)) {
>> diff --git a/drivers/gpu/drm/i915/i915_cache.c b/drivers/gpu/drm/i915/i915_cache.c
>> index 06eb5933c719..f4ba1cb430d3 100644
>> --- a/drivers/gpu/drm/i915/i915_cache.c
>> +++ b/drivers/gpu/drm/i915/i915_cache.c
>> @@ -6,13 +6,88 @@
>>   #include "i915_cache.h"
>>   #include "i915_drv.h"
>>   
>> -void i915_cache_init(struct drm_i915_private *i915)
>> +int i915_cache_init(struct drm_i915_private *i915)
>>   {
>> -	i915->pat_uc = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
>> -	drm_info(&i915->drm, "Using PAT index %u for uncached access\n",
>> -		 i915->pat_uc);
>> +	int ret;
>>   
>> -	i915->pat_wb = i915_gem_get_pat_index(i915, I915_CACHE_LLC);
>> -	drm_info(&i915->drm, "Using PAT index %u for write-back access\n",
>> -		 i915->pat_wb);
>> +	ret = i915_cache_find_pat(i915, I915_CACHE_NONE);
>> +	if (ret < 0) {
>> +		drm_err(&i915->drm,
>> +			"Failed to find PAT index for uncached access\n");
>> +		return -ENODEV;
>> +	}
>> +	drm_info(&i915->drm, "Using PAT index %u for uncached access\n", ret);
>> +	i915->pat_uc = ret;
>> +
>> +	ret = i915_cache_find_pat(i915, I915_CACHE_CACHED);
>> +	if (ret < 0) {
>> +		drm_err(&i915->drm,
>> +			"Failed to find PAT index for write-back access\n");
>> +		return -ENODEV;
>> +	}
>> +	drm_info(&i915->drm, "Using PAT index %u for write-back access\n", ret);
>> +	i915->pat_wb = ret;
>> +
>> +	return 0;
>> +}
>> +
>> +int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache)
>> +{
>> +	const struct intel_device_info *info = INTEL_INFO(i915);
>> +	int i;
>> +
>> +	for (i = 0; i < ARRAY_SIZE(info->cache_modes); i++) {
>> +		if (info->cache_modes[i] == cache)
>> +			return i;
>> +	}
>> +
>> +	return -1;
>> +}
>> +
>> +void i915_cache_print(char *buf, size_t buflen, const char *suffix,
>> +		      i915_cache_t cache)
>> +{
>> +	const enum i915_cache_mode mode = I915_CACHE_MODE(cache);
>> +	static const char * const mode_str[] = {
>> +		[I915_CACHE_MODE_UC] = "UC",
>> +		[I915_CACHE_MODE_WB] = "WB",
>> +		[I915_CACHE_MODE_WT] = "WT",
>> +		[I915_CACHE_MODE_WC] = "WC",
>> +	};
>> +	static const char * const flag_str[] = {
>> +		[ilog2(I915_CACHE_FLAG_COH1W)] = "1-Way-Coherent",
>> +		[ilog2(I915_CACHE_FLAG_COH2W)] = "2-Way-Coherent",
>> +		[ilog2(I915_CACHE_FLAG_L3)] =    "L3",
>> +		[ilog2(I915_CACHE_FLAG_CLOS1)] = "CLOS1",
>> +		[ilog2(I915_CACHE_FLAG_CLOS2)] = "CLOS2",
>> +	};
>> +
>> +	if (mode >= ARRAY_SIZE(mode_str)) {
>> +		snprintf(buf, buflen, "0x%x%s", cache, suffix ?: "");
>> +	} else {
>> +		unsigned long flags = I915_CACHE_FLAGS(cache);
>> +		unsigned long bit;
>> +		int ret;
>> +
>> +		ret = scnprintf(buf, buflen, "%s", mode_str[mode]);
>> +		buf += ret;
>> +		buflen -= ret;
>> +
>> +		/*
>> +		 * Don't print "1-way-2-way", it would be confusing and 2-way
>> +		 * implies 1-way anyway.
>> +		 */
>> +		if ((flags & (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W)) ==
>> +		    (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W))
>> +			flags &= ~I915_CACHE_FLAG_COH1W;
>> +
>> +		for_each_set_bit(bit, &flags, BITS_PER_TYPE(i915_cache_t)) {
>> +			ret = scnprintf(buf, buflen, "-%s", flag_str[bit]);
>> +			buf += ret;
>> +			buflen -= ret;
>> +		}
>> +
>> +		if (suffix)
>> +			snprintf(buf, buflen, "%s", suffix);
>> +	}
>>   }
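
To make the output concrete, a worked example of what the above prints
for a fully coherent WB entry, such as Meteorlake PAT index 4 from the
table later in this patch:

	i915_cache_print(buf, sizeof(buf), "!",
			 I915_CACHE(WB, COH1W, COH2W));
	  -> "WB-2-Way-Coherent!"

The 1-way flag is swallowed by the check above and the "!" suffix is
what debugfs passes in for objects with a user set PAT index.
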
>> diff --git a/drivers/gpu/drm/i915/i915_cache.h b/drivers/gpu/drm/i915/i915_cache.h
>> index cb68936fb8a2..d9e97318b942 100644
>> --- a/drivers/gpu/drm/i915/i915_cache.h
>> +++ b/drivers/gpu/drm/i915/i915_cache.h
>> @@ -6,8 +6,76 @@
>>   #ifndef __I915_CACHE_H__
>>   #define __I915_CACHE_H__
>>   
>> +#include <linux/types.h>
>> +
>> +struct drm_printer;
>> +
>>   struct drm_i915_private;
>>   
>> -void i915_cache_init(struct drm_i915_private *i915);
>> +typedef u16 i915_cache_t;
>> +
>> +/* Cache modes */
>> +enum i915_cache_mode {
>> +	I915_CACHE_MODE_UC = 0,
>> +	I915_CACHE_MODE_WB,
>> +	__I915_CACHE_MODE_WB_L3, /* Special do-not-use entry for legacy 1:1 mapping. */
>> +	I915_CACHE_MODE_WT,
>> +	I915_CACHE_MODE_WC,
>> +	I915_NUM_CACHE_MODES
>> +};
>> +
>> +/* Cache mode flag bits */
>> +#define I915_CACHE_FLAG_COH1W	(0x1)
>> +#define I915_CACHE_FLAG_COH2W	(0x2) /* 1-way needs to be set too. */
>> +#define I915_CACHE_FLAG_L3	(0x4)
>> +#define I915_CACHE_FLAG_CLOS1	(0x8)
>> +#define I915_CACHE_FLAG_CLOS2	(0x10)
>> +
>> +/*
>> + * Overloaded I915_CACHE() macro based on:
>> + *  https://stackoverflow.com/questions/3046889/optional-parameters-with-c-macros
>> + *
>> + * It is possible to call I915_CACHE with mode and zero or more flags as
>> + * separate arguments. I.e. these all work:
>> + *
>> + *   I915_CACHE(WB)
>> + *   I915_CACHE(WB, COH1W, COH2W)
>> + *   I915_CACHE(WB, COH1W, COH2W, L3)
>> + */
>> +
>> +#define __I915_CACHE_FLAG(f) (I915_CACHE_FLAG_##f << 8)
>> +#define __I915_CACHE(m, f) ((i915_cache_t)(I915_CACHE_MODE_##m | (f)))
>> +
>> +#define I915_CACHE_4(m, f1, f2, f3)	__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2) | __I915_CACHE_FLAG(f3))
>> +#define I915_CACHE_3(m, f1, f2)		__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2))
>> +#define I915_CACHE_2(m, f1)		__I915_CACHE(m, __I915_CACHE_FLAG(f1))
>> +#define I915_CACHE_1(m)			__I915_CACHE(m, 0)
>> +#define I915_CACHE_0(m)			__I915_CACHE(WC, 0)
>> +
>> +#define FUNC_CHOOSER(_f1, _f2, _f3, _f4, _f5, ...) _f5
>> +#define FUNC_RECOMPOSER(argsWithParentheses) FUNC_CHOOSER argsWithParentheses
>> +#define CHOOSE_FROM_ARG_COUNT(...) FUNC_RECOMPOSER((__VA_ARGS__, I915_CACHE_4, I915_CACHE_3, I915_CACHE_2, I915_CACHE_1, ))
>> +#define NO_ARG_EXPANDER() ,,,I915_CACHE_0
>> +#define MACRO_CHOOSER(...) CHOOSE_FROM_ARG_COUNT(NO_ARG_EXPANDER __VA_ARGS__ ())
>> +
>> +#define I915_CACHE(...) MACRO_CHOOSER(__VA_ARGS__)(__VA_ARGS__)
>> +
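
A worked expansion for anyone reviewing the macro magic - this is just
what the preprocessor produces, written out by hand:

	I915_CACHE(WB, COH1W)
	  -> I915_CACHE_2(WB, COH1W)
	  -> (i915_cache_t)(I915_CACHE_MODE_WB | (I915_CACHE_FLAG_COH1W << 8))

I.e. mode in the low byte and flags in the high byte, which is what the
extraction helpers just below undo.
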
>> +/* i915_cache_t mode and flags extraction helpers. */
>> +#define I915_CACHE_MODE(cache) \
>> +	((enum i915_cache_mode)(((i915_cache_t)(cache)) & 0xff))
>> +#define I915_CACHE_FLAGS(cache) \
>> +	((unsigned int)((((i915_cache_t)(cache) & 0xff00)) >> 8))
>> +
>> +/* Helpers for i915 caching modes. */
>> +#define I915_CACHE_NONE		I915_CACHE(UC)
>> +#define I915_CACHE_CACHED	I915_CACHE(WB, COH1W, COH2W)
>> +#define I915_CACHE_WT		I915_CACHE(WT)
>> +
>> +int i915_cache_init(struct drm_i915_private *i915);
>> +int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache);
>> +void i915_cache_print(char *buf, size_t buflen, const char *suffix,
>> +		      i915_cache_t cache);
>> +
>> +#define I915_CACHE_NAME_LEN (40)
>>   
>>   #endif /* __I915_CACHE_H__ */
>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>> index 4de44cf1026d..4ec292011546 100644
>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>> @@ -140,57 +140,18 @@ static const char *stringify_vma_type(const struct i915_vma *vma)
>>   	return "ppgtt";
>>   }
>>   
>> -static const char *i915_cache_level_str(struct drm_i915_gem_object *obj)
>> -{
>> -	struct drm_i915_private *i915 = obj_to_i915(obj);
>> -
>> -	if (IS_METEORLAKE(i915)) {
>> -		switch (obj->pat_index) {
>> -		case 0: return " WB";
>> -		case 1: return " WT";
>> -		case 2: return " UC";
>> -		case 3: return " WB (1-Way Coh)";
>> -		case 4: return " WB (2-Way Coh)";
>> -		default: return " not defined";
>> -		}
>> -	} else if (IS_PONTEVECCHIO(i915)) {
>> -		switch (obj->pat_index) {
>> -		case 0: return " UC";
>> -		case 1: return " WC";
>> -		case 2: return " WT";
>> -		case 3: return " WB";
>> -		case 4: return " WT (CLOS1)";
>> -		case 5: return " WB (CLOS1)";
>> -		case 6: return " WT (CLOS2)";
>> -		case 7: return " WT (CLOS2)";
>> -		default: return " not defined";
>> -		}
>> -	} else if (GRAPHICS_VER(i915) >= 12) {
>> -		switch (obj->pat_index) {
>> -		case 0: return " WB";
>> -		case 1: return " WC";
>> -		case 2: return " WT";
>> -		case 3: return " UC";
>> -		default: return " not defined";
>> -		}
>> -	} else {
>> -		switch (obj->pat_index) {
>> -		case 0: return " UC";
>> -		case 1: return HAS_LLC(i915) ?
>> -			       " LLC" : " snooped";
>> -		case 2: return " L3+LLC";
>> -		case 3: return " WT";
>> -		default: return " not defined";
>> -		}
>> -	}
>> -}
>> -
>>   void
>>   i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>>   {
>> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>> +	char buf[I915_CACHE_NAME_LEN];
>>   	struct i915_vma *vma;
>>   	int pin_count = 0;
>>   
>> +	i915_cache_print(buf, sizeof(buf),
>> +			 obj->pat_set_by_user ? "!" : NULL,
>> +			 INTEL_INFO(i915)->cache_modes[obj->pat_index]);
>> +
>>   	seq_printf(m, "%pK: %c%c%c %8zdKiB %02x %02x %s%s%s",
>>   		   &obj->base,
>>   		   get_tiling_flag(obj),
>> @@ -199,7 +160,7 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>>   		   obj->base.size / 1024,
>>   		   obj->read_domains,
>>   		   obj->write_domain,
>> -		   i915_cache_level_str(obj),
>> +		   buf,
>>   		   obj->mm.dirty ? " dirty" : "",
>>   		   obj->mm.madv == I915_MADV_DONTNEED ? " purgeable" : "");
>>   	if (obj->base.name)
>> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
>> index bb2223cc3470..8663388a524f 100644
>> --- a/drivers/gpu/drm/i915/i915_driver.c
>> +++ b/drivers/gpu/drm/i915/i915_driver.c
>> @@ -241,7 +241,9 @@ static int i915_driver_early_probe(struct drm_i915_private *dev_priv)
>>   	i915_memcpy_init_early(dev_priv);
>>   	intel_runtime_pm_init_early(&dev_priv->runtime_pm);
>>   
>> -	i915_cache_init(dev_priv);
>> +	ret = i915_cache_init(dev_priv);
>> +	if (ret < 0)
>> +		return ret;
>>   
>>   	ret = i915_workqueues_init(dev_priv);
>>   	if (ret < 0)
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 896aa48ed089..814705cfeb12 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -1144,19 +1144,6 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
>>   	unsigned int i;
>>   	int ret;
>>   
>> -	/*
>> -	 * In the proccess of replacing cache_level with pat_index a tricky
>> -	 * dependency is created on the definition of the enum i915_cache_level.
>> -	 * in case this enum is changed, PTE encode would be broken.
>> -	 * Add a WARNING here. And remove when we completely quit using this
>> -	 * enum
>> -	 */
>> -	BUILD_BUG_ON(I915_CACHE_NONE != 0 ||
>> -		     I915_CACHE_LLC != 1 ||
>> -		     I915_CACHE_L3_LLC != 2 ||
>> -		     I915_CACHE_WT != 3 ||
>> -		     I915_MAX_CACHE_LEVEL != 4);
>> -
>>   	/* We need to fallback to 4K pages if host doesn't support huge gtt. */
>>   	if (intel_vgpu_active(dev_priv) && !intel_vgpu_has_huge_gtt(dev_priv))
>>   		RUNTIME_INFO(dev_priv)->page_sizes = I915_GTT_PAGE_SIZE_4K;
>> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
>> index fcacdc21643c..565a60a1645d 100644
>> --- a/drivers/gpu/drm/i915/i915_pci.c
>> +++ b/drivers/gpu/drm/i915/i915_pci.c
>> @@ -32,6 +32,7 @@
>>   #include "gt/intel_sa_media.h"
>>   #include "gem/i915_gem_object_types.h"
>>   
>> +#include "i915_cache.h"
>>   #include "i915_driver.h"
>>   #include "i915_drv.h"
>>   #include "i915_pci.h"
>> @@ -43,36 +44,43 @@
>>   	.__runtime.graphics.ip.ver = (x), \
>>   	.__runtime.media.ip.ver = (x)
>>   
>> -#define LEGACY_CACHELEVEL \
>> -	.cachelevel_to_pat = { \
>> -		[I915_CACHE_NONE]   = 0, \
>> -		[I915_CACHE_LLC]    = 1, \
>> -		[I915_CACHE_L3_LLC] = 2, \
>> -		[I915_CACHE_WT]     = 3, \
>> +#define LEGACY_CACHE_MODES \
>> +	.cache_modes = { \
>> +		[I915_CACHE_MODE_UC] 	  = I915_CACHE(UC), \
>> +		[I915_CACHE_MODE_WB] 	  = I915_CACHE(WB, COH1W, COH2W), \
> 
> Reading bspec 2863 (bdw) indicates that the CPU being able to snoop the
> GPU's L3 was a new feature in gen8.  So for HSW and earlier, any
> coherency was only 1-way (GPU could be coherent with CPU's caches, but
> not vice-versa).  Only starting with gen8 did we get 2-way coherency as
> an option where the CPU would also be coherent with the GPU cache (and
> with gen8 and beyond you could still select 1-way instead of 2-way
> coherency with instruction-level granularity via MOCS).  There are also

Did you mean Gen9 here? For me 2863 leads to "L3 Coherency SKL+" and 
the text says "Gen9".

[Comes back later.]

I think so, 2770 is BDW.

> some legacy platforms (e.g., EHL/JSL on bspec 13948) where the IA wasn't
> coherent with GPU L3 so we were back to 1-way coherency.

> So should we split LEGACY_CACHE_MODES into two tables with different
> coherency settings attached to I915_CACHE_MODE_WB?

Looks like it. Marking as TODO for next respin.
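
Roughly something like this, as an untested sketch - GEN8_CACHE_MODES
is a name I am making up here, and the exact cutoff still needs
checking against the bspec:

	#define LEGACY_CACHE_MODES \
		.cache_modes = { \
			[I915_CACHE_MODE_UC]	  = I915_CACHE(UC), \
			[I915_CACHE_MODE_WB]	  = I915_CACHE(WB, COH1W), \
			[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, L3), \
			[I915_CACHE_MODE_WT]	  = I915_CACHE(WT), \
		}

	#define GEN8_CACHE_MODES \
		.cache_modes = { \
			[I915_CACHE_MODE_UC]	  = I915_CACHE(UC), \
			[I915_CACHE_MODE_WB]	  = I915_CACHE(WB, COH1W, COH2W), \
			[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, COH2W, L3), \
			[I915_CACHE_MODE_WT]	  = I915_CACHE(WT), \
		}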

Many thanks for helping with the research here!

Regards,

Tvrtko

> 
>> +		[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, COH2W, L3), \
>> +		[I915_CACHE_MODE_WT] 	  = I915_CACHE(WT), \
>>   	}
>>   
>> -#define TGL_CACHELEVEL \
>> -	.cachelevel_to_pat = { \
>> -		[I915_CACHE_NONE]   = 3, \
>> -		[I915_CACHE_LLC]    = 0, \
>> -		[I915_CACHE_L3_LLC] = 0, \
>> -		[I915_CACHE_WT]     = 2, \
>> +#define GEN12_CACHE_MODES \
>> +	.cache_modes = { \
>> +		[0] = I915_CACHE(WB, COH1W, COH2W), \
>> +		[1] = I915_CACHE(WC), \
>> +		[2] = I915_CACHE(WT), \
>> +		[3] = I915_CACHE(UC), \
>>   	}
>>   
>> -#define PVC_CACHELEVEL \
>> -	.cachelevel_to_pat = { \
>> -		[I915_CACHE_NONE]   = 0, \
>> -		[I915_CACHE_LLC]    = 3, \
>> -		[I915_CACHE_L3_LLC] = 3, \
>> -		[I915_CACHE_WT]     = 2, \
>> +/* FIXME: is WB 1-way or 2-way coherent for PAT indices 3, 5 and 7? */
>> +
>> +#define PVC_CACHE_MODES \
>> +	.cache_modes = { \
>> +		[0] = I915_CACHE(UC), \
>> +		[1] = I915_CACHE(WC), \
>> +		[2] = I915_CACHE(WT), \
>> +		[3] = I915_CACHE(WB, COH1W), \
>> +		[4] = I915_CACHE(WT, CLOS1), \
>> +		[5] = I915_CACHE(WB, COH1W, CLOS1), \
>> +		[6] = I915_CACHE(WT, CLOS2), \
>> +		[7] = I915_CACHE(WB, COH1W, CLOS2), \
>>   	}
>>   
>> -#define MTL_CACHELEVEL \
>> -	.cachelevel_to_pat = { \
>> -		[I915_CACHE_NONE]   = 2, \
>> -		[I915_CACHE_LLC]    = 3, \
>> -		[I915_CACHE_L3_LLC] = 3, \
>> -		[I915_CACHE_WT]     = 1, \
>> +#define MTL_CACHE_MODES \
>> +	.cache_modes = { \
>> +		[0] = I915_CACHE(WB), \
>> +		[1] = I915_CACHE(WT), \
>> +		[2] = I915_CACHE(UC), \
>> +		[3] = I915_CACHE(WB, COH1W), \
>> +		[4] = I915_CACHE(WB, COH1W, COH2W), \
> 
> We may want a comment on this one since the "2W" part is sort of a lie.
> Bspec 63884 has a programming note for MTL that says
> 
>          "...Except for system atomics, setting Coherency Mode to 10 or
>          11 results in this same one-way coherent behavior..."
> 
> So if we ask for 2W, we actually only get 1W behavior except in a very
> narrow set of cases.
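
Good point. Perhaps a comment along these lines - my wording, not the
bspec's:

	[4] = I915_CACHE(WB, COH1W, COH2W), /* 2-way only for system
					       atomics, otherwise 1-way
					       (bspec 63884) */

Noted for the respin.
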
> 
> 
> Matt
> 
>>   	}
>>   
>>   /* Keep in gen based order, and chronological order within a gen */
>> @@ -97,7 +105,7 @@
>>   	.max_pat_index = 3, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   #define I845_FEATURES \
>>   	GEN(2), \
>> @@ -112,7 +120,7 @@
>>   	.max_pat_index = 3, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   static const struct intel_device_info i830_info = {
>>   	I830_FEATURES,
>> @@ -145,7 +153,7 @@ static const struct intel_device_info i865g_info = {
>>   	.max_pat_index = 3, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   static const struct intel_device_info i915g_info = {
>>   	GEN3_FEATURES,
>> @@ -208,7 +216,7 @@ static const struct intel_device_info pnv_m_info = {
>>   	.max_pat_index = 3, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   static const struct intel_device_info i965g_info = {
>>   	GEN4_FEATURES,
>> @@ -252,7 +260,7 @@ static const struct intel_device_info gm45_info = {
>>   	.max_pat_index = 3, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   static const struct intel_device_info ilk_d_info = {
>>   	GEN5_FEATURES,
>> @@ -282,7 +290,7 @@ static const struct intel_device_info ilk_m_info = {
>>   	.__runtime.ppgtt_size = 31, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   #define SNB_D_PLATFORM \
>>   	GEN6_FEATURES, \
>> @@ -330,7 +338,7 @@ static const struct intel_device_info snb_m_gt2_info = {
>>   	.__runtime.ppgtt_size = 31, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   #define IVB_D_PLATFORM \
>>   	GEN7_FEATURES, \
>> @@ -387,7 +395,7 @@ static const struct intel_device_info vlv_info = {
>>   	.platform_engine_mask = BIT(RCS0) | BIT(VCS0) | BIT(BCS0),
>>   	GEN_DEFAULT_PAGE_SIZES,
>>   	GEN_DEFAULT_REGIONS,
>> -	LEGACY_CACHELEVEL,
>> +	LEGACY_CACHE_MODES
>>   };
>>   
>>   #define G75_FEATURES  \
>> @@ -473,7 +481,7 @@ static const struct intel_device_info chv_info = {
>>   	.has_coherent_ggtt = false,
>>   	GEN_DEFAULT_PAGE_SIZES,
>>   	GEN_DEFAULT_REGIONS,
>> -	LEGACY_CACHELEVEL,
>> +	LEGACY_CACHE_MODES
>>   };
>>   
>>   #define GEN9_DEFAULT_PAGE_SIZES \
>> @@ -536,7 +544,7 @@ static const struct intel_device_info skl_gt4_info = {
>>   	.max_pat_index = 3, \
>>   	GEN9_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   static const struct intel_device_info bxt_info = {
>>   	GEN9_LP_FEATURES,
>> @@ -640,7 +648,7 @@ static const struct intel_device_info jsl_info = {
>>   #define GEN12_FEATURES \
>>   	GEN11_FEATURES, \
>>   	GEN(12), \
>> -	TGL_CACHELEVEL, \
>> +	GEN12_CACHE_MODES, \
>>   	.has_global_mocs = 1, \
>>   	.has_pxp = 1, \
>>   	.max_pat_index = 3
>> @@ -708,7 +716,7 @@ static const struct intel_device_info adl_p_info = {
>>   	.__runtime.graphics.ip.ver = 12, \
>>   	.__runtime.graphics.ip.rel = 50, \
>>   	XE_HP_PAGE_SIZES, \
>> -	TGL_CACHELEVEL, \
>> +	GEN12_CACHE_MODES, \
>>   	.dma_mask_size = 46, \
>>   	.has_3d_pipeline = 1, \
>>   	.has_64bit_reloc = 1, \
>> @@ -803,7 +811,7 @@ static const struct intel_device_info pvc_info = {
>>   		BIT(VCS0) |
>>   		BIT(CCS0) | BIT(CCS1) | BIT(CCS2) | BIT(CCS3),
>>   	.require_force_probe = 1,
>> -	PVC_CACHELEVEL,
>> +	PVC_CACHE_MODES
>>   };
>>   
>>   static const struct intel_gt_definition xelpmp_extra_gt[] = {
>> @@ -838,7 +846,7 @@ static const struct intel_device_info mtl_info = {
>>   	.memory_regions = REGION_SMEM | REGION_STOLEN_LMEM,
>>   	.platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(CCS0),
>>   	.require_force_probe = 1,
>> -	MTL_CACHELEVEL,
>> +	MTL_CACHE_MODES
>>   };
>>   
>>   #undef PLATFORM
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index 04bc1f4a1115..973175a64534 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -1870,7 +1870,7 @@ static int alloc_oa_buffer(struct i915_perf_stream *stream)
>>   		return PTR_ERR(bo);
>>   	}
>>   
>> -	i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(bo, I915_CACHE_CACHED);
>>   
>>   	/* PreHSW required 512K alignment, HSW requires 16M */
>>   	vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL);
>> diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
>> index dbfe6443457b..2ce13b7c48cb 100644
>> --- a/drivers/gpu/drm/i915/intel_device_info.h
>> +++ b/drivers/gpu/drm/i915/intel_device_info.h
>> @@ -27,6 +27,8 @@
>>   
>>   #include <uapi/drm/i915_drm.h>
>>   
>> +#include "i915_cache.h"
>> +
>>   #include "intel_step.h"
>>   
>>   #include "gt/intel_engine_types.h"
>> @@ -243,8 +245,8 @@ struct intel_device_info {
>>   	 */
>>   	const struct intel_runtime_info __runtime;
>>   
>> -	u32 cachelevel_to_pat[I915_MAX_CACHE_LEVEL];
>> -	u32 max_pat_index;
>> +	i915_cache_t cache_modes[8];
>> +	unsigned int max_pat_index;
>>   };
>>   
>>   struct intel_driver_caps {
>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>> index f910ec9b6d2b..ba821e48baa5 100644
>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>> @@ -267,7 +267,7 @@ static int igt_evict_for_cache_color(void *arg)
>>   		err = PTR_ERR(obj);
>>   		goto cleanup;
>>   	}
>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>   	quirk_add(obj, &objects);
>>   
>>   	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
>> @@ -283,7 +283,7 @@ static int igt_evict_for_cache_color(void *arg)
>>   		err = PTR_ERR(obj);
>>   		goto cleanup;
>>   	}
>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>   	quirk_add(obj, &objects);
>>   
>>   	/* Neighbouring; same colour - should fit */
>> diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
>> index 3c5e0952f1b8..4cfc5000d6ff 100644
>> --- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
>> +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
>> @@ -23,7 +23,7 @@ int igt_spinner_init(struct igt_spinner *spin, struct intel_gt *gt)
>>   		err = PTR_ERR(spin->hws);
>>   		goto err;
>>   	}
>> -	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_CACHED);
>>   
>>   	spin->obj = i915_gem_object_create_internal(gt->i915, PAGE_SIZE);
>>   	if (IS_ERR(spin->obj)) {
>> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>> index 1d1a457e2aee..8ae77bcf27fa 100644
>> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>> @@ -126,13 +126,13 @@ static const struct intel_device_info mock_info = {
>>   	.memory_regions = REGION_SMEM,
>>   	.platform_engine_mask = BIT(0),
>>   
>> -	/* simply use legacy cache level for mock device */
>> +	/* Simply use legacy cache modes for the mock device. */
>>   	.max_pat_index = 3,
>> -	.cachelevel_to_pat = {
>> -		[I915_CACHE_NONE]   = 0,
>> -		[I915_CACHE_LLC]    = 1,
>> -		[I915_CACHE_L3_LLC] = 2,
>> -		[I915_CACHE_WT]     = 3,
>> +	.cache_modes = {
>> +		[0] = I915_CACHE(UC),
>> +		[1] = I915_CACHE(WB, COH1W),
>> +		[2] = I915_CACHE(WB, COH1W, COH2W, L3),
>> +		[3] = I915_CACHE(WT),
>>   	},
>>   };
>>   
>> @@ -181,7 +181,7 @@ struct drm_i915_private *mock_gem_device(void)
>>   	/* Set up device info and initial runtime info. */
>>   	intel_device_info_driver_create(i915, pdev->device, &mock_info);
>>   
>> -	i915_cache_init(i915);
>> +	WARN_ON(i915_cache_init(i915));
>>   
>>   	dev_pm_domain_set(&pdev->dev, &pm_domain);
>>   	pm_runtime_enable(&pdev->dev);
>> -- 
>> 2.39.2
>>
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Intel-gfx] [RFC 4/8] drm/i915: Refactor PAT/object cache handling
@ 2023-07-28 12:23       ` Tvrtko Ursulin
  0 siblings, 0 replies; 59+ messages in thread
From: Tvrtko Ursulin @ 2023-07-28 12:23 UTC (permalink / raw)
  To: Matt Roper; +Cc: Intel-gfx, dri-devel, Chris Wilson


On 28/07/2023 00:57, Matt Roper wrote:
> On Thu, Jul 27, 2023 at 03:55:00PM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Commit 9275277d5324 ("drm/i915: use pat_index instead of cache_level") has
>> introduced PAT indices to i915 internal APIs, partially replacing the
>> usage of the driver-internal cache_level, but it also made a few
>> sub-optimal design decisions which this patch tries to improve upon.
>>
>> The principal change here is to invert the per-platform cache level to
>> PAT index table which was added by the referenced commit, and by doing
>> so enable i915 to understand the cache mode behind each PAT index,
>> changing them from opaque to transparent.
>>
>> Once we have the inverted table we are able to remove the hidden,
>> unconditional "return true" from i915_gem_object_has_cache_level and
>> make the involved code paths clearer.
>>
>> To achieve this we replace the enum i915_cache_level with i915_cache_t,
>> composed of a more detailed representation of each cache mode (base mode
>> plus flags).
>>
>> In this way we are able to express the differences between the write-
>> back coherency settings on Meteorlake, which in turn enables us to map
>> the i915 "cached" mode to the correct Meteorlake PAT index.
>>
>> We can also replace the platform-dependent cache-mode-to-string code in
>> debugfs and elsewhere with a single implementation based on i915_cache_t.
>>
>> v2:
>>   * Fix PAT-to-cache-mode table for PVC. (Fei)
>>   * Cache display caching mode too. (Fei)
>>   * Improve and document criteria in i915_gem_object_can_bypass_llc() (Matt)
>>
>> v3:
>>   * Checkpatch issues.
>>   * Cache mode flags check fixed.
>>
>> v4:
>>   * Fix intel_device_info->cache_modes array size. (Matt)
>>   * Boolean cache mode and flags query. (Matt)
>>   * Reduce number of cache macros with some macro magic.
>>   * One more checkpatch fix.
>>   * Tweak tables to show legacy and Gen12 WB is fully coherent.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> References: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>> Cc: Fei Yang <fei.yang@intel.com>
>> Cc: Andi Shyti <andi.shyti@linux.intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_domain.c    |  60 +++++----
>>   drivers/gpu/drm/i915/gem/i915_gem_domain.h    |   5 +-
>>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   3 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_mman.c      |   4 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_object.c    | 117 ++++++++++--------
>>   drivers/gpu/drm/i915/gem/i915_gem_object.h    |  11 +-
>>   .../gpu/drm/i915/gem/i915_gem_object_types.h  | 116 +----------------
>>   drivers/gpu/drm/i915/gem/i915_gem_shmem.c     |   8 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |   2 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  20 +--
>>   drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   2 +-
>>   .../drm/i915/gem/selftests/huge_gem_object.c  |   2 +-
>>   .../gpu/drm/i915/gem/selftests/huge_pages.c   |   3 +-
>>   drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  10 +-
>>   drivers/gpu/drm/i915/gt/intel_engine_cs.c     |   2 +-
>>   drivers/gpu/drm/i915/gt/intel_ggtt.c          |  25 ++--
>>   drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c     |   4 +-
>>   drivers/gpu/drm/i915/gt/intel_gtt.c           |   2 +-
>>   drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +-
>>   drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   6 +-
>>   .../gpu/drm/i915/gt/intel_ring_submission.c   |   4 +-
>>   drivers/gpu/drm/i915/gt/intel_timeline.c      |   2 +-
>>   drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   2 +-
>>   .../gpu/drm/i915/gt/selftest_workarounds.c    |   2 +-
>>   drivers/gpu/drm/i915/i915_cache.c             |  89 +++++++++++--
>>   drivers/gpu/drm/i915/i915_cache.h             |  70 ++++++++++-
>>   drivers/gpu/drm/i915/i915_debugfs.c           |  53 ++------
>>   drivers/gpu/drm/i915/i915_driver.c            |   4 +-
>>   drivers/gpu/drm/i915/i915_gem.c               |  13 --
>>   drivers/gpu/drm/i915/i915_pci.c               |  84 +++++++------
>>   drivers/gpu/drm/i915/i915_perf.c              |   2 +-
>>   drivers/gpu/drm/i915/intel_device_info.h      |   6 +-
>>   .../gpu/drm/i915/selftests/i915_gem_evict.c   |   4 +-
>>   drivers/gpu/drm/i915/selftests/igt_spinner.c  |   2 +-
>>   .../gpu/drm/i915/selftests/mock_gem_device.c  |  14 +--
>>   36 files changed, 391 insertions(+), 367 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>> index 57db9c581bf6..c15f83de33af 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>> @@ -8,6 +8,7 @@
>>   #include "display/intel_frontbuffer.h"
>>   #include "gt/intel_gt.h"
>>   
>> +#include "i915_cache.h"
>>   #include "i915_drv.h"
>>   #include "i915_gem_clflush.h"
>>   #include "i915_gem_domain.h"
>> @@ -41,14 +42,17 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>>   		return false;
>>   
>>   	/*
>> -	 * For objects created by userspace through GEM_CREATE with pat_index
>> -	 * set by set_pat extension, i915_gem_object_has_cache_level() will
>> -	 * always return true, because the coherency of such object is managed
>> -	 * by userspace. Othereise the call here would fall back to checking
>> -	 * whether the object is un-cached or write-through.
>> +	 * Always flush cache for UMD objects with PAT index set.
>>   	 */
>> -	return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
>> -		 i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
>> +	if (obj->pat_set_by_user)
>> +		return true;
>> +
>> +	/*
>> +	 * Fully coherent cached access may end up with data in the CPU cache
>> +	 * which hasn't hit memory yet.
>> +	 */
>> +	return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
>> +	       i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W);
>>   }
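
Self-review note on the boolean conversion, since this is one of the
spots called out in the cover letter:

	old: !(UC || WT)  - flush for LLC and L3_LLC, i.e. the WB modes
	new: WB && COH2W  - flush for fully coherent WB

The two match for kernel created objects, given the tables in this
patch mark legacy and Gen12 WB as fully coherent, while objects with a
user set PAT index deliberately move from "never flush" (the old
hidden return true) to "always flush" via the early return above.
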
>>   
>>   bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>> @@ -268,7 +272,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>>   /**
>>    * i915_gem_object_set_cache_level - Changes the cache-level of an object across all VMA.
>>    * @obj: object to act on
>> - * @cache_level: new cache level to set for the object
>> + * @cache: new caching mode to set for the object
>>    *
>>    * After this function returns, the object will be in the new cache-level
>>    * across all GTT and the contents of the backing storage will be coherent,
>> @@ -281,18 +285,28 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>>    * that all direct access to the scanout remains coherent.
>>    */
>>   int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>> -				    enum i915_cache_level cache_level)
>> +				    i915_cache_t cache)
>>   {
>> -	int ret;
>> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>> +	int pat, ret;
>>   
>> -	/*
>> -	 * For objects created by userspace through GEM_CREATE with pat_index
>> -	 * set by set_pat extension, simply return 0 here without touching
>> -	 * the cache setting, because such objects should have an immutable
>> -	 * cache setting by desgin and always managed by userspace.
>> -	 */
>> -	if (i915_gem_object_has_cache_level(obj, cache_level))
>> +	pat = i915_cache_find_pat(i915, cache);
>> +	if (pat < 0) {
>> +		char buf[I915_CACHE_NAME_LEN];
>> +
>> +		i915_cache_print(buf, sizeof(buf), NULL, cache);
>> +		drm_err_ratelimited(&i915->drm,
>> +				    "Attempting to use unknown caching mode %s!\n",
>> +				    buf);
>> +
>> +		return -EINVAL;
>> +	} else if (pat == obj->pat_index) {
>>   		return 0;
>> +	} else if (obj->pat_set_by_user) {
>> +		drm_notice_once(&i915->drm,
>> +				"Attempting to change caching mode on an object with fixed PAT!\n");
>> +		return -EINVAL;
>> +	}
>>   
>>   	ret = i915_gem_object_wait(obj,
>>   				   I915_WAIT_INTERRUPTIBLE |
>> @@ -302,7 +316,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>>   		return ret;
>>   
>>   	/* Always invalidate stale cachelines */
>> -	i915_gem_object_set_cache_coherency(obj, cache_level);
>> +	i915_gem_object_set_pat_index(obj, pat);
>>   	obj->cache_dirty = true;
>>   
>>   	/* The cache-level will be applied when each vma is rebound. */
>> @@ -337,10 +351,10 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
>>   		goto out;
>>   	}
>>   
>> -	if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||
>> -	    i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
>> +	if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
>> +	    i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W))
>>   		args->caching = I915_CACHING_CACHED;
>> -	else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
>> +	else if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT))
>>   		args->caching = I915_CACHING_DISPLAY;
>>   	else
>>   		args->caching = I915_CACHING_NONE;
>> @@ -355,7 +369,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
>>   	struct drm_i915_private *i915 = to_i915(dev);
>>   	struct drm_i915_gem_caching *args = data;
>>   	struct drm_i915_gem_object *obj;
>> -	enum i915_cache_level level;
>> +	i915_cache_t level;
>>   	int ret = 0;
>>   
>>   	if (IS_DGFX(i915))
>> @@ -378,7 +392,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
>>   		if (!HAS_LLC(i915) && !HAS_SNOOP(i915))
>>   			return -ENODEV;
>>   
>> -		level = I915_CACHE_LLC;
>> +		level = I915_CACHE_CACHED;
>>   		break;
>>   	case I915_CACHING_DISPLAY:
>>   		level = HAS_WT(i915) ? I915_CACHE_WT : I915_CACHE_NONE;
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.h b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
>> index 9622df962bfc..6da5c351f6fd 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
>> @@ -6,10 +6,11 @@
>>   #ifndef __I915_GEM_DOMAIN_H__
>>   #define __I915_GEM_DOMAIN_H__
>>   
>> +#include "i915_cache.h"
>> +
>>   struct drm_i915_gem_object;
>> -enum i915_cache_level;
>>   
>>   int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>> -				    enum i915_cache_level cache_level);
>> +				    i915_cache_t cache);
>>   
>>   #endif /* __I915_GEM_DOMAIN_H__ */
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> index 0a1d40220020..9d6e49c8a4c6 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> @@ -648,7 +648,8 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
>>   	 */
>>   	return (cache->has_llc ||
>>   		obj->cache_dirty ||
>> -		!i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
>> +		!(obj->pat_set_by_user ||
>> +		  i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)));
>>   }
>>   
>>   static int eb_reserve_vma(struct i915_execbuffer *eb,
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
>> index 6bc26b4b06b8..88c360c3d6a3 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
>> @@ -170,7 +170,7 @@ __i915_gem_object_create_internal(struct drm_i915_private *i915,
>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>>   
>> -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
>> +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>>   	i915_gem_object_set_cache_coherency(obj, cache_level);
>>   
>>   	return obj;
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> index aa4d842d4c5a..cd7f8ded0d6f 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> @@ -382,7 +382,6 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>>   		goto err_reset;
>>   	}
>>   
>> -	/* Access to snoopable pages through the GTT is incoherent. */
>>   	/*
>>   	 * For objects created by userspace through GEM_CREATE with pat_index
>>   	 * set by set_pat extension, coherency is managed by userspace, make
>> @@ -391,7 +390,8 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>>   	 * objects. Otherwise this helper function would fall back to checking
>>   	 * whether the object is un-cached.
>>   	 */
>> -	if (!(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
>> +	if (!((obj->pat_set_by_user ||
>> +	       i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)) ||
>>   	      HAS_LLC(i915))) {
>>   		ret = -EFAULT;
>>   		goto err_unpin;
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> index 3dc4fbb67d2b..ec1f0be43d0d 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> @@ -45,33 +45,6 @@ static struct kmem_cache *slab_objects;
>>   
>>   static const struct drm_gem_object_funcs i915_gem_object_funcs;
>>   
>> -unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
>> -				    enum i915_cache_level level)
>> -{
>> -	if (drm_WARN_ON(&i915->drm, level >= I915_MAX_CACHE_LEVEL))
>> -		return 0;
>> -
>> -	return INTEL_INFO(i915)->cachelevel_to_pat[level];
>> -}
>> -
>> -bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
>> -				     enum i915_cache_level lvl)
>> -{
>> -	/*
>> -	 * In case the pat_index is set by user space, this kernel mode
>> -	 * driver should leave the coherency to be managed by user space,
>> -	 * simply return true here.
>> -	 */
>> -	if (obj->pat_set_by_user)
>> -		return true;
>> -
>> -	/*
>> -	 * Otherwise the pat_index should have been converted from cache_level
>> -	 * so that the following comparison is valid.
>> -	 */
>> -	return obj->pat_index == i915_gem_get_pat_index(obj_to_i915(obj), lvl);
>> -}
>> -
>>   struct drm_i915_gem_object *i915_gem_object_alloc(void)
>>   {
>>   	struct drm_i915_gem_object *obj;
>> @@ -144,30 +117,72 @@ void __i915_gem_object_fini(struct drm_i915_gem_object *obj)
>>   	dma_resv_fini(&obj->base._resv);
>>   }
>>   
>> +bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
>> +				    enum i915_cache_mode mode)
>> +{
>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>> +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
>> +
>> +	return I915_CACHE_MODE(cache) == mode;
>> +}
>> +
>> +bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
>> +				    unsigned int flag)
>> +{
>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>> +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
>> +
>> +	return I915_CACHE_FLAGS(cache) & flag;
>> +}
>> +
>> +static void __i915_gem_object_update_coherency(struct drm_i915_gem_object *obj)
>> +{
>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>> +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
>> +	const unsigned int flags = I915_CACHE_FLAGS(cache);
>> +	const unsigned int mode = I915_CACHE_MODE(cache);
>> +
>> +	if (mode == I915_CACHE_MODE_WC ||
>> +	    mode == I915_CACHE_MODE_WT ||
>> +	    (mode == I915_CACHE_MODE_WB && (flags & I915_CACHE_FLAG_COH2W)))
>> +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ |
>> +				      I915_BO_CACHE_COHERENT_FOR_WRITE;
>> +	else if (HAS_LLC(i915))
>> +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
>> +	else
>> +		obj->cache_coherent = 0;
>> +
>> +	obj->cache_dirty =
>> +		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
>> +		!IS_DGFX(i915);
>> +}
>> +
>>   /**
>>    * i915_gem_object_set_cache_coherency - Mark up the object's coherency levels
>> - * for a given cache_level
>> + * for a given caching mode
>>    * @obj: #drm_i915_gem_object
>> - * @cache_level: cache level
>> + * @cache: cache mode
>>    */
>>   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>> -					 unsigned int cache_level)
>> +					 i915_cache_t cache)
>>   {
>> -	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>> +	int found;
>>   
>> -	obj->pat_index = i915_gem_get_pat_index(i915, cache_level);
>> +	found = i915_cache_find_pat(i915, cache);
>> +	if (found < 0) {
>> +		char buf[I915_CACHE_NAME_LEN];
>>   
>> -	if (cache_level != I915_CACHE_NONE)
>> -		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
>> -				       I915_BO_CACHE_COHERENT_FOR_WRITE);
>> -	else if (HAS_LLC(i915))
>> -		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
>> -	else
>> -		obj->cache_coherent = 0;
>> +		i915_cache_print(buf, sizeof(buf), NULL, cache);
>> +		drm_err_ratelimited(&i915->drm, "Unknown cache mode %s!\n",
>> +				    buf);
>>   
>> -	obj->cache_dirty =
>> -		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
>> -		!IS_DGFX(i915);
>> +		found = i915->pat_uc;
>> +	}
>> +
>> +	obj->pat_index = found;
>> +
>> +	__i915_gem_object_update_coherency(obj);
>>   }
>>   
>>   /**
>> @@ -181,24 +196,18 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>>   void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
>>   				   unsigned int pat_index)
>>   {
>> -	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>>   
>>   	if (obj->pat_index == pat_index)
>>   		return;
>>   
>> +	if (drm_WARN_ON_ONCE(&i915->drm,
>> +			     pat_index > INTEL_INFO(i915)->max_pat_index))
>> +		return;
>> +
>>   	obj->pat_index = pat_index;
>>   
>> -	if (pat_index != i915_gem_get_pat_index(i915, I915_CACHE_NONE))
>> -		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
>> -				       I915_BO_CACHE_COHERENT_FOR_WRITE);
>> -	else if (HAS_LLC(i915))
>> -		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
>> -	else
>> -		obj->cache_coherent = 0;
>> -
>> -	obj->cache_dirty =
>> -		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
>> -		!IS_DGFX(i915);
>> +	__i915_gem_object_update_coherency(obj);
>>   }
>>   
>>   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> index 884a17275b3a..a5d4ee19d9be 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> @@ -13,6 +13,7 @@
>>   
>>   #include "display/intel_frontbuffer.h"
>>   #include "intel_memory_region.h"
>> +#include "i915_cache.h"
>>   #include "i915_gem_object_types.h"
>>   #include "i915_gem_gtt.h"
>>   #include "i915_gem_ww.h"
>> @@ -32,10 +33,6 @@ static inline bool i915_gem_object_size_2big(u64 size)
>>   	return false;
>>   }
>>   
>> -unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
>> -				    enum i915_cache_level level);
>> -bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
>> -				     enum i915_cache_level lvl);
>>   void i915_gem_init__objects(struct drm_i915_private *i915);
>>   
>>   void i915_objects_module_exit(void);
>> @@ -764,8 +761,12 @@ int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
>>   				      bool intr);
>>   bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj);
>>   
>> +bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
>> +				    enum i915_cache_mode mode);
>> +bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
>> +				    unsigned int flag);
>>   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>> -					 unsigned int cache_level);
>> +					 i915_cache_t cache);
>>   void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
>>   				   unsigned int pat_index);
>>   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj);
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>> index 8de2b91b3edf..6790e13ad262 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>> @@ -14,6 +14,7 @@
>>   #include <uapi/drm/i915_drm.h>
>>   
>>   #include "i915_active.h"
>> +#include "i915_cache.h"
>>   #include "i915_selftest.h"
>>   #include "i915_vma_resource.h"
>>   
>> @@ -116,93 +117,6 @@ struct drm_i915_gem_object_ops {
>>   	const char *name; /* friendly name for debug, e.g. lockdep classes */
>>   };
>>   
>> -/**
>> - * enum i915_cache_level - The supported GTT caching values for system memory
>> - * pages.
>> - *
>> - * These translate to some special GTT PTE bits when binding pages into some
>> - * address space. It also determines whether an object, or rather its pages are
>> - * coherent with the GPU, when also reading or writing through the CPU cache
>> - * with those pages.
>> - *
>> - * Userspace can also control this through struct drm_i915_gem_caching.
>> - */
>> -enum i915_cache_level {
>> -	/**
>> -	 * @I915_CACHE_NONE:
>> -	 *
>> -	 * GPU access is not coherent with the CPU cache. If the cache is dirty
>> -	 * and we need the underlying pages to be coherent with some later GPU
>> -	 * access then we need to manually flush the pages.
>> -	 *
>> -	 * On shared LLC platforms reads and writes through the CPU cache are
>> -	 * still coherent even with this setting. See also
>> -	 * &drm_i915_gem_object.cache_coherent for more details. Due to this we
>> -	 * should only ever use uncached for scanout surfaces, otherwise we end
>> -	 * up over-flushing in some places.
>> -	 *
>> -	 * This is the default on non-LLC platforms.
>> -	 */
>> -	I915_CACHE_NONE = 0,
>> -	/**
>> -	 * @I915_CACHE_LLC:
>> -	 *
>> -	 * GPU access is coherent with the CPU cache. If the cache is dirty,
>> -	 * then the GPU will ensure that access remains coherent, when both
>> -	 * reading and writing through the CPU cache. GPU writes can dirty the
>> -	 * CPU cache.
>> -	 *
>> -	 * Not used for scanout surfaces.
>> -	 *
>> -	 * Applies to both platforms with shared LLC(HAS_LLC), and snooping
>> -	 * based platforms(HAS_SNOOP).
>> -	 *
>> -	 * This is the default on shared LLC platforms.  The only exception is
>> -	 * scanout objects, where the display engine is not coherent with the
>> -	 * CPU cache. For such objects I915_CACHE_NONE or I915_CACHE_WT is
>> -	 * automatically applied by the kernel in pin_for_display, if userspace
>> -	 * has not done so already.
>> -	 */
>> -	I915_CACHE_LLC,
>> -	/**
>> -	 * @I915_CACHE_L3_LLC:
>> -	 *
>> -	 * Explicitly enable the Gfx L3 cache, with coherent LLC.
>> -	 *
>> -	 * The Gfx L3 sits between the domain specific caches, e.g
>> -	 * sampler/render caches, and the larger LLC. LLC is coherent with the
>> -	 * GPU, but L3 is only visible to the GPU, so likely needs to be flushed
>> -	 * when the workload completes.
>> -	 *
>> -	 * Not used for scanout surfaces.
>> -	 *
>> -	 * Only exposed on some gen7 + GGTT. More recent hardware has dropped
>> -	 * this explicit setting, where it should now be enabled by default.
>> -	 */
>> -	I915_CACHE_L3_LLC,
>> -	/**
>> -	 * @I915_CACHE_WT:
>> -	 *
>> -	 * Write-through. Used for scanout surfaces.
>> -	 *
>> -	 * The GPU can utilise the caches, while still having the display engine
>> -	 * be coherent with GPU writes, as a result we don't need to flush the
>> -	 * CPU caches when moving out of the render domain. This is the default
>> -	 * setting chosen by the kernel, if supported by the HW, otherwise we
>> -	 * fallback to I915_CACHE_NONE. On the CPU side writes through the CPU
>> -	 * cache still need to be flushed, to remain coherent with the display
>> -	 * engine.
>> -	 */
>> -	I915_CACHE_WT,
>> -	/**
>> -	 * @I915_MAX_CACHE_LEVEL:
>> -	 *
>> -	 * Mark the last entry in the enum. Used for defining cachelevel_to_pat
>> -	 * array for cache_level to pat translation table.
>> -	 */
>> -	I915_MAX_CACHE_LEVEL,
>> -};
>> -
>>   enum i915_map_type {
>>   	I915_MAP_WB = 0,
>>   	I915_MAP_WC,
>> @@ -403,16 +317,6 @@ struct drm_i915_gem_object {
>>   	/**
>>   	 * @cache_coherent:
>>   	 *
>> -	 * Note: with the change above which replaced @cache_level with pat_index,
>> -	 * the use of @cache_coherent is limited to the objects created by kernel
>> -	 * or by userspace without pat index specified.
>> -	 * Check for @pat_set_by_user to find out if an object has pat index set
>> -	 * by userspace. The ioctl's to change cache settings have also been
>> -	 * disabled for the objects with pat index set by userspace. Please don't
>> -	 * assume @cache_coherent having the flags set as describe here. A helper
>> -	 * function i915_gem_object_has_cache_level() provides one way to bypass
>> -	 * the use of this field.
>> -	 *
>>   	 * Track whether the pages are coherent with the GPU if reading or
>>   	 * writing through the CPU caches. The largely depends on the
>>   	 * @cache_level setting.
>> @@ -447,7 +351,7 @@ struct drm_i915_gem_object {
>>   	 * flushing the surface just before doing the scanout.  This does mean
>>   	 * we might unnecessarily flush non-scanout objects in some places, but
>>   	 * the default assumption is that all normal objects should be using
>> -	 * I915_CACHE_LLC, at least on platforms with the shared LLC.
>> +	 * I915_CACHE_CACHED, at least on platforms with the shared LLC.
>>   	 *
>>   	 * Supported values:
>>   	 *
>> @@ -486,16 +390,6 @@ struct drm_i915_gem_object {
>>   	/**
>>   	 * @cache_dirty:
>>   	 *
>> -	 * Note: with the change above which replaced cache_level with pat_index,
>> -	 * the use of @cache_dirty is limited to the objects created by kernel
>> -	 * or by userspace without pat index specified.
>> -	 * Check for @pat_set_by_user to find out if an object has pat index set
>> -	 * by userspace. The ioctl's to change cache settings have also been
>> -	 * disabled for the objects with pat_index set by userspace. Please don't
>> -	 * assume @cache_dirty is set as describe here. Also see helper function
>> -	 * i915_gem_object_has_cache_level() for possible ways to bypass the use
>> -	 * of this field.
>> -	 *
>>   	 * Track if we are we dirty with writes through the CPU cache for this
>>   	 * object. As a result reading directly from main memory might yield
>>   	 * stale data.
>> @@ -531,9 +425,9 @@ struct drm_i915_gem_object {
>>   	 *
>>   	 *   1. All userspace objects, by default, have @cache_level set as
>>   	 *   I915_CACHE_NONE. The only exception is userptr objects, where we
>> -	 *   instead force I915_CACHE_LLC, but we also don't allow userspace to
>> -	 *   ever change the @cache_level for such objects. Another special case
>> -	 *   is dma-buf, which doesn't rely on @cache_dirty,  but there we
>> +	 *   instead force I915_CACHE_CACHED, but we also don't allow userspace
>> +	 *   to ever change the @cache_level for such objects. Another special
>> +	 *   case is dma-buf, which doesn't rely on @cache_dirty,  but there we
>>   	 *   always do a forced flush when acquiring the pages, if there is a
>>   	 *   chance that the pages can be read directly from main memory with
>>   	 *   the GPU.
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>> index 8f1633c3fb93..aba908f0349f 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>> @@ -584,7 +584,7 @@ static int shmem_object_init(struct intel_memory_region *mem,
>>   	static struct lock_class_key lock_class;
>>   	struct drm_i915_private *i915 = mem->i915;
>>   	struct address_space *mapping;
>> -	unsigned int cache_level;
>> +	i915_cache_t cache;
>>   	gfp_t mask;
>>   	int ret;
>>   
>> @@ -628,11 +628,11 @@ static int shmem_object_init(struct intel_memory_region *mem,
>>   		 * However, we maintain the display planes as UC, and so
>>   		 * need to rebind when first used as such.
>>   		 */
>> -		cache_level = I915_CACHE_LLC;
>> +		cache = I915_CACHE_CACHED;
>>   	else
>> -		cache_level = I915_CACHE_NONE;
>> +		cache = I915_CACHE_NONE;
>>   
>> -	i915_gem_object_set_cache_coherency(obj, cache_level);
>> +	i915_gem_object_set_cache_coherency(obj, cache);
>>   
>>   	i915_gem_object_init_memory_region(obj, mem);
>>   
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>> index 1c8eb806b7d3..cc907a1f1c53 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>> @@ -691,7 +691,7 @@ static int __i915_gem_object_create_stolen(struct intel_memory_region *mem,
>>   
>>   	obj->stolen = stolen;
>>   	obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
>> -	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
>> +	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>>   	i915_gem_object_set_cache_coherency(obj, cache_level);
>>   
>>   	if (WARN_ON(!i915_gem_object_trylock(obj, NULL)))
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> index 6bd6c239f4ac..107176d1757b 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> @@ -48,14 +48,14 @@ void i915_ttm_migrate_set_ban_memcpy(bool ban)
>>   }
>>   #endif
>>   
>> -static enum i915_cache_level
>> -i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
>> -		     struct ttm_tt *ttm)
>> +static i915_cache_t
>> +i915_ttm_cache(struct drm_i915_private *i915, struct ttm_resource *res,
>> +	       struct ttm_tt *ttm)
>>   {
>>   	return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
>>   		!i915_ttm_gtt_binds_lmem(res) &&
>> -		ttm->caching == ttm_cached) ? I915_CACHE_LLC :
>> -		I915_CACHE_NONE;
>> +		ttm->caching == ttm_cached) ? I915_CACHE_CACHED :
>> +					      I915_CACHE_NONE;
>>   }
>>   
>>   static unsigned int
>> @@ -112,8 +112,8 @@ void i915_ttm_adjust_domains_after_move(struct drm_i915_gem_object *obj)
>>   void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>>   {
>>   	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
>> -	unsigned int cache_level;
>>   	unsigned int mem_flags;
>> +	i915_cache_t cache;
>>   	unsigned int i;
>>   	int mem_type;
>>   
>> @@ -126,13 +126,13 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>>   	if (!bo->resource) {
>>   		mem_flags = I915_BO_FLAG_STRUCT_PAGE;
>>   		mem_type = I915_PL_SYSTEM;
>> -		cache_level = I915_CACHE_NONE;
>> +		cache = I915_CACHE_NONE;
>>   	} else {
>>   		mem_flags = i915_ttm_cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
>>   			I915_BO_FLAG_STRUCT_PAGE;
>>   		mem_type = bo->resource->mem_type;
>> -		cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
>> -						   bo->ttm);
>> +		cache = i915_ttm_cache(to_i915(bo->base.dev), bo->resource,
>> +				       bo->ttm);
>>   	}
>>   
>>   	/*
>> @@ -157,7 +157,7 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>>   	obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
>>   	obj->mem_flags |= mem_flags;
>>   
>> -	i915_gem_object_set_cache_coherency(obj, cache_level);
>> +	i915_gem_object_set_cache_coherency(obj, cache);
>>   }
>>   
>>   /**
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>> index 1d3ebdf4069b..5d2891981bd4 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>> @@ -553,7 +553,7 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
>>   	obj->mem_flags = I915_BO_FLAG_STRUCT_PAGE;
>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>   
>>   	obj->userptr.ptr = args->user_ptr;
>>   	obj->userptr.notifier_seq = ULONG_MAX;
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
>> index bac957755068..77d04be5e9d7 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
>> @@ -123,7 +123,7 @@ huge_gem_object(struct drm_i915_private *i915,
>>   
>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>> -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
>> +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>>   	i915_gem_object_set_cache_coherency(obj, cache_level);
>>   	obj->scratch = phys_size;
>>   
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>> index 6bddd733d796..6ca5b9dbc414 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>> @@ -200,9 +200,10 @@ huge_pages_object(struct drm_i915_private *i915,
>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>>   
>> -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
>> +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>>   	i915_gem_object_set_cache_coherency(obj, cache_level);
>>   
>> +
>>   	obj->mm.page_mask = page_mask;
>>   
>>   	return obj;
>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> index 675f71f06e89..3c93a73cf6b1 100644
>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> @@ -16,11 +16,11 @@
>>   #include "intel_gtt.h"
>>   
>>   static u64 gen8_pde_encode(const dma_addr_t addr,
>> -			   const enum i915_cache_level level)
>> +			   const enum i915_cache_mode cache_mode)
>>   {
>>   	u64 pde = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>>   
>> -	if (level != I915_CACHE_NONE)
>> +	if (cache_mode != I915_CACHE_MODE_UC)
>>   		pde |= PPAT_CACHED_PDE;
>>   	else
>>   		pde |= PPAT_UNCACHED;
>> @@ -43,10 +43,10 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>>   	 * See translation table defined by LEGACY_CACHELEVEL.
>>   	 */
>>   	switch (pat_index) {
>> -	case I915_CACHE_NONE:
>> +	case I915_CACHE_MODE_UC:
>>   		pte |= PPAT_UNCACHED;
>>   		break;
>> -	case I915_CACHE_WT:
>> +	case I915_CACHE_MODE_WT:
>>   		pte |= PPAT_DISPLAY_ELLC;
>>   		break;
>>   	default:
>> @@ -893,7 +893,7 @@ static int gen8_init_scratch(struct i915_address_space *vm)
>>   		}
>>   
>>   		fill_px(obj, vm->scratch[i - 1]->encode);
>> -		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_NONE);
>> +		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_MODE_UC);
>>   
>>   		vm->scratch[i] = obj;
>>   	}
>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>> index ee15486fed0d..f1e59e512d14 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>> @@ -1103,7 +1103,7 @@ static int init_status_page(struct intel_engine_cs *engine)
>>   		return PTR_ERR(obj);
>>   	}
>>   
>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>   
>>   	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>>   	if (IS_ERR(vma)) {
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> index fca61ddca8ad..ab5f654e7557 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> @@ -1011,11 +1011,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
>>   	return ggtt_probe_common(ggtt, size);
>>   }
>>   
>> -/*
>> - * For pre-gen8 platforms pat_index is the same as enum i915_cache_level,
>> - * so the switch-case statements in these PTE encode functions are still valid.
>> - * See translation table LEGACY_CACHELEVEL.
>> - */
>>   static u64 snb_pte_encode(dma_addr_t addr,
>>   			  unsigned int pat_index,
>>   			  u32 flags)
>> @@ -1023,11 +1018,11 @@ static u64 snb_pte_encode(dma_addr_t addr,
>>   	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>>   
>>   	switch (pat_index) {
>> -	case I915_CACHE_L3_LLC:
>> -	case I915_CACHE_LLC:
>> +	case I915_CACHE_MODE_WB:
>> +	case __I915_CACHE_MODE_WB_L3:
>>   		pte |= GEN6_PTE_CACHE_LLC;
>>   		break;
>> -	case I915_CACHE_NONE:
>> +	case I915_CACHE_MODE_UC:
>>   		pte |= GEN6_PTE_UNCACHED;
>>   		break;
>>   	default:
>> @@ -1044,13 +1039,13 @@ static u64 ivb_pte_encode(dma_addr_t addr,
>>   	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>>   
>>   	switch (pat_index) {
>> -	case I915_CACHE_L3_LLC:
>> +	case __I915_CACHE_MODE_WB_L3:
>>   		pte |= GEN7_PTE_CACHE_L3_LLC;
>>   		break;
>> -	case I915_CACHE_LLC:
>> +	case I915_CACHE_MODE_WB:
>>   		pte |= GEN6_PTE_CACHE_LLC;
>>   		break;
>> -	case I915_CACHE_NONE:
>> +	case I915_CACHE_MODE_UC:
>>   		pte |= GEN6_PTE_UNCACHED;
>>   		break;
>>   	default:
>> @@ -1069,7 +1064,7 @@ static u64 byt_pte_encode(dma_addr_t addr,
>>   	if (!(flags & PTE_READ_ONLY))
>>   		pte |= BYT_PTE_WRITEABLE;
>>   
>> -	if (pat_index != I915_CACHE_NONE)
>> +	if (pat_index != I915_CACHE_MODE_UC)
>>   		pte |= BYT_PTE_SNOOPED_BY_CPU_CACHES;
>>   
>>   	return pte;
>> @@ -1081,7 +1076,7 @@ static u64 hsw_pte_encode(dma_addr_t addr,
>>   {
>>   	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>>   
>> -	if (pat_index != I915_CACHE_NONE)
>> +	if (pat_index != I915_CACHE_MODE_UC)
>>   		pte |= HSW_WB_LLC_AGE3;
>>   
>>   	return pte;
>> @@ -1094,9 +1089,9 @@ static u64 iris_pte_encode(dma_addr_t addr,
>>   	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>>   
>>   	switch (pat_index) {
>> -	case I915_CACHE_NONE:
>> +	case I915_CACHE_MODE_UC:
>>   		break;
>> -	case I915_CACHE_WT:
>> +	case I915_CACHE_MODE_WT:
>>   		pte |= HSW_WT_ELLC_LLC_AGE3;
>>   		break;
>>   	default:
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
>> index 866c416afb73..803c41ac4ccb 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
>> @@ -21,7 +21,7 @@ static void gmch_ggtt_insert_page(struct i915_address_space *vm,
>>   				  unsigned int pat_index,
>>   				  u32 unused)
>>   {
>> -	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
>> +	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
>>   		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
>>   
>>   	intel_gmch_gtt_insert_page(addr, offset >> PAGE_SHIFT, flags);
>> @@ -32,7 +32,7 @@ static void gmch_ggtt_insert_entries(struct i915_address_space *vm,
>>   				     unsigned int pat_index,
>>   				     u32 unused)
>>   {
>> -	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
>> +	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
>>   		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
>>   
>>   	intel_gmch_gtt_insert_sg_entries(vma_res->bi.pages, vma_res->start >> PAGE_SHIFT,
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> index 065099362a98..48055304537a 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> @@ -676,7 +676,7 @@ __vm_create_scratch_for_read(struct i915_address_space *vm, unsigned long size)
>>   	if (IS_ERR(obj))
>>   		return ERR_CAST(obj);
>>   
>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>   
>>   	vma = i915_vma_instance(obj, vm, NULL);
>>   	if (IS_ERR(vma)) {
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> index 7192a534a654..af4277c1d577 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> @@ -636,7 +636,8 @@ void
>>   __set_pd_entry(struct i915_page_directory * const pd,
>>   	       const unsigned short idx,
>>   	       struct i915_page_table *pt,
>> -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level));
>> +	       u64 (*encode)(const dma_addr_t,
>> +			     const enum i915_cache_mode cache_mode));
>>   
>>   #define set_pd_entry(pd, idx, to) \
>>   	__set_pd_entry((pd), (idx), px_pt(to), gen8_pde_encode)
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>> index 436756bfbb1a..3e461d4f3693 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>> @@ -98,14 +98,16 @@ void
>>   __set_pd_entry(struct i915_page_directory * const pd,
>>   	       const unsigned short idx,
>>   	       struct i915_page_table * const to,
>> -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level))
>> +	       u64 (*encode)(const dma_addr_t,
>> +			     const enum i915_cache_mode cache_mode))
>>   {
>>   	/* Each thread pre-pins the pd, and we may have a thread per pde. */
>>   	GEM_BUG_ON(atomic_read(px_used(pd)) > NALLOC * I915_PDES);
>>   
>>   	atomic_inc(px_used(pd));
>>   	pd->entry[idx] = to;
>> -	write_dma_entry(px_base(pd), idx, encode(px_dma(to), I915_CACHE_LLC));
>> +	write_dma_entry(px_base(pd), idx,
>> +			encode(px_dma(to), I915_CACHE_MODE_WB));
>>   }
>>   
>>   void
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>> index 92085ffd23de..9131d228d285 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>> @@ -551,7 +551,9 @@ alloc_context_vma(struct intel_engine_cs *engine)
>>   	 * later platforms don't have L3 control bits in the PTE.
>>   	 */
>>   	if (IS_IVYBRIDGE(i915))
>> -		i915_gem_object_set_cache_coherency(obj, I915_CACHE_L3_LLC);
>> +		i915_gem_object_set_cache_coherency(obj,
>> +						    I915_CACHE_CACHED |
>> +						    __I915_CACHE_FLAG(L3));
>>   
>>   	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>>   	if (IS_ERR(vma)) {
>> diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
>> index b9640212d659..025ce54c886d 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
>> @@ -26,7 +26,7 @@ static struct i915_vma *hwsp_alloc(struct intel_gt *gt)
>>   	if (IS_ERR(obj))
>>   		return ERR_CAST(obj);
>>   
>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>   
>>   	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>>   	if (IS_ERR(vma))
>> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>> index 8b0d84f2aad2..fc278fa463b0 100644
>> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>> @@ -64,7 +64,7 @@ static int hang_init(struct hang *h, struct intel_gt *gt)
>>   		goto err_hws;
>>   	}
>>   
>> -	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_CACHED);
>>   	vaddr = i915_gem_object_pin_map_unlocked(h->hws, I915_MAP_WB);
>>   	if (IS_ERR(vaddr)) {
>>   		err = PTR_ERR(vaddr);
>> diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>> index 14a8b25b6204..d25990d33d44 100644
>> --- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>> +++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>> @@ -111,7 +111,7 @@ read_nonprivs(struct intel_context *ce)
>>   	if (IS_ERR(result))
>>   		return result;
>>   
>> -	i915_gem_object_set_cache_coherency(result, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(result, I915_CACHE_CACHED);
>>   
>>   	cs = i915_gem_object_pin_map_unlocked(result, I915_MAP_WB);
>>   	if (IS_ERR(cs)) {
>> diff --git a/drivers/gpu/drm/i915/i915_cache.c b/drivers/gpu/drm/i915/i915_cache.c
>> index 06eb5933c719..f4ba1cb430d3 100644
>> --- a/drivers/gpu/drm/i915/i915_cache.c
>> +++ b/drivers/gpu/drm/i915/i915_cache.c
>> @@ -6,13 +6,88 @@
>>   #include "i915_cache.h"
>>   #include "i915_drv.h"
>>   
>> -void i915_cache_init(struct drm_i915_private *i915)
>> +int i915_cache_init(struct drm_i915_private *i915)
>>   {
>> -	i915->pat_uc = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
>> -	drm_info(&i915->drm, "Using PAT index %u for uncached access\n",
>> -		 i915->pat_uc);
>> +	int ret;
>>   
>> -	i915->pat_wb = i915_gem_get_pat_index(i915, I915_CACHE_LLC);
>> -	drm_info(&i915->drm, "Using PAT index %u for write-back access\n",
>> -		 i915->pat_wb);
>> +	ret = i915_cache_find_pat(i915, I915_CACHE_NONE);
>> +	if (ret < 0) {
>> +		drm_err(&i915->drm,
>> +			"Failed to find PAT index for uncached access\n");
>> +		return -ENODEV;
>> +	}
>> +	drm_info(&i915->drm, "Using PAT index %u for uncached access\n", ret);
>> +	i915->pat_uc = ret;
>> +
>> +	ret = i915_cache_find_pat(i915, I915_CACHE_CACHED);
>> +	if (ret < 0) {
>> +		drm_err(&i915->drm,
>> +			"Failed to find PAT index for write-back access\n");
>> +		return -ENODEV;
>> +	}
>> +	drm_info(&i915->drm, "Using PAT index %u for write-back access\n", ret);
>> +	i915->pat_wb = ret;
>> +
>> +	return 0;
>> +}
>> +
>> +int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache)
>> +{
>> +	const struct intel_device_info *info = INTEL_INFO(i915);
>> +	int i;
>> +
>> +	for (i = 0; i < ARRAY_SIZE(info->cache_modes); i++) {
>> +		if (info->cache_modes[i] == cache)
>> +			return i;
>> +	}
>> +
>> +	return -1;
>> +}
>> +
>> +void i915_cache_print(char *buf, size_t buflen, const char *suffix,
>> +		      i915_cache_t cache)
>> +{
>> +	const enum i915_cache_mode mode = I915_CACHE_MODE(cache);
>> +	static const char * const mode_str[] = {
>> +		[I915_CACHE_MODE_UC] = "UC",
>> +		[I915_CACHE_MODE_WB] = "WB",
>> +		[I915_CACHE_MODE_WT] = "WT",
>> +		[I915_CACHE_MODE_WC] = "WC",
>> +	};
>> +	static const char * const flag_str[] = {
>> +		[ilog2(I915_CACHE_FLAG_COH1W)] = "1-Way-Coherent",
>> +		[ilog2(I915_CACHE_FLAG_COH2W)] = "2-Way-Coherent",
>> +		[ilog2(I915_CACHE_FLAG_L3)] =    "L3",
>> +		[ilog2(I915_CACHE_FLAG_CLOS1)] = "CLOS1",
>> +		[ilog2(I915_CACHE_FLAG_CLOS2)] = "CLOS2",
>> +	};
>> +
>> +	if (mode >= ARRAY_SIZE(mode_str) || !mode_str[mode]) {
>> +		snprintf(buf, buflen, "0x%x%s", cache, suffix ?: "");
>> +	} else {
>> +		unsigned long flags = I915_CACHE_FLAGS(cache);
>> +		unsigned long bit;
>> +		int ret;
>> +
>> +		ret = snprintf(buf, buflen, "%s", mode_str[mode]);
>> +		buf += ret;
>> +		buflen -= ret;
>> +
>> +		/*
>> +		 * Don't print "1-way-2-way", it would be confusing and 2-way
>> +		 * implies 1-way anyway.
>> +		 */
>> +		if ((flags & (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W)) ==
>> +		    (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W))
>> +			flags &= ~I915_CACHE_FLAG_COH1W;
>> +
>> +		for_each_set_bit(bit, &flags, BITS_PER_TYPE(i915_cache_t)) {
>> +			ret = snprintf(buf, buflen, "-%s", flag_str[bit]);
>> +			buf += ret;
>> +			buflen -= ret;
>> +		}
>> +
>> +		if (suffix)
>> +			snprintf(buf, buflen, "%s", suffix);
>> +	}
>>   }
>> diff --git a/drivers/gpu/drm/i915/i915_cache.h b/drivers/gpu/drm/i915/i915_cache.h
>> index cb68936fb8a2..d9e97318b942 100644
>> --- a/drivers/gpu/drm/i915/i915_cache.h
>> +++ b/drivers/gpu/drm/i915/i915_cache.h
>> @@ -6,8 +6,76 @@
>>   #ifndef __I915_CACHE_H__
>>   #define __I915_CACHE_H__
>>   
>> +#include <linux/types.h>
>> +
>> +struct drm_printer;
>> +
>>   struct drm_i915_private;
>>   
>> -void i915_cache_init(struct drm_i915_private *i915);
>> +typedef u16 i915_cache_t;
>> +
>> +/* Cache modes */
>> +enum i915_cache_mode {
>> +	I915_CACHE_MODE_UC = 0,
>> +	I915_CACHE_MODE_WB,
>> +	__I915_CACHE_MODE_WB_L3, /* Special do-not-use entry for legacy 1:1 mapping. */
>> +	I915_CACHE_MODE_WT,
>> +	I915_CACHE_MODE_WC,
>> +	I915_NUM_CACHE_MODES
>> +};
>> +
>> +/* Cache mode flag bits */
>> +#define I915_CACHE_FLAG_COH1W	(0x1)
>> +#define I915_CACHE_FLAG_COH2W	(0x2) /* 1-way needs to be set too. */
>> +#define I915_CACHE_FLAG_L3	(0x4)
>> +#define I915_CACHE_FLAG_CLOS1	(0x8)
>> +#define I915_CACHE_FLAG_CLOS2	(0x10)
>> +
>> +/*
>> + * Overloaded I915_CACHE() macro based on:
>> + *  https://stackoverflow.com/questions/3046889/optional-parameters-with-c-macros
>> + *
>> + * It is possible to call I915_CACHE with mode and zero or more flags as
>> + * separate arguments. Ie these all work:
>> + *
>> + *   I915_CACHE(WB)
>> + *   I915_CACHE(WB, COH1W, COH2W)
>> + *   I915_CACHE(WB, COH1W, COH2W, L3)
>> + */
>> +
>> +#define __I915_CACHE_FLAG(f) (I915_CACHE_FLAG_##f << 8)
>> +#define __I915_CACHE(m, f) ((i915_cache_t)(I915_CACHE_MODE_##m | (f)))
>> +
>> +#define I915_CACHE_4(m, f1, f2, f3)	__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2) | __I915_CACHE_FLAG(f3))
>> +#define I915_CACHE_3(m, f1, f2)		__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2))
>> +#define I915_CACHE_2(m, f1)		__I915_CACHE(m, __I915_CACHE_FLAG(f1))
>> +#define I915_CACHE_1(m)			__I915_CACHE(m, 0)
>> +#define I915_CACHE_0(m)			__I915_CACHE(WC, 0)
>> +
>> +#define FUNC_CHOOSER(_f1, _f2, _f3, _f4, _f5, ...) _f5
>> +#define FUNC_RECOMPOSER(argsWithParentheses) FUNC_CHOOSER argsWithParentheses
>> +#define CHOOSE_FROM_ARG_COUNT(...) FUNC_RECOMPOSER((__VA_ARGS__, I915_CACHE_4, I915_CACHE_3, I915_CACHE_2, I915_CACHE_1, ))
>> +#define NO_ARG_EXPANDER() ,,,I915_CACHE_0
>> +#define MACRO_CHOOSER(...) CHOOSE_FROM_ARG_COUNT(NO_ARG_EXPANDER __VA_ARGS__ ())
>> +
>> +#define I915_CACHE(...) MACRO_CHOOSER(__VA_ARGS__)(__VA_ARGS__)
>> +
>> +/* i915_cache_t mode and flags extraction helpers. */
>> +#define I915_CACHE_MODE(cache) \
>> +	((enum i915_cache_mode)(((i915_cache_t)(cache)) & 0xff))
>> +#define I915_CACHE_FLAGS(cache) \
>> +	((unsigned int)((((i915_cache_t)(cache) & 0xff00)) >> 8))
>> +
>> +/* Helpers for i915 caching modes. */
>> +#define I915_CACHE_NONE		I915_CACHE(UC)
>> +#define I915_CACHE_CACHED	I915_CACHE(WB, COH1W, COH2W)
>> +#define I915_CACHE_WT		I915_CACHE(WT)
>> +
>> +int i915_cache_init(struct drm_i915_private *i915);
>> +int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache);
>> +void i915_cache_print(char *buf, size_t buflen, const char *suffix,
>> +		      i915_cache_t cache);
>> +
>> +#define I915_CACHE_NAME_LEN (40)
>>   
>>   #endif /* __I915_CACHE_H__ */
>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>> index 4de44cf1026d..4ec292011546 100644
>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>> @@ -140,57 +140,18 @@ static const char *stringify_vma_type(const struct i915_vma *vma)
>>   	return "ppgtt";
>>   }
>>   
>> -static const char *i915_cache_level_str(struct drm_i915_gem_object *obj)
>> -{
>> -	struct drm_i915_private *i915 = obj_to_i915(obj);
>> -
>> -	if (IS_METEORLAKE(i915)) {
>> -		switch (obj->pat_index) {
>> -		case 0: return " WB";
>> -		case 1: return " WT";
>> -		case 2: return " UC";
>> -		case 3: return " WB (1-Way Coh)";
>> -		case 4: return " WB (2-Way Coh)";
>> -		default: return " not defined";
>> -		}
>> -	} else if (IS_PONTEVECCHIO(i915)) {
>> -		switch (obj->pat_index) {
>> -		case 0: return " UC";
>> -		case 1: return " WC";
>> -		case 2: return " WT";
>> -		case 3: return " WB";
>> -		case 4: return " WT (CLOS1)";
>> -		case 5: return " WB (CLOS1)";
>> -		case 6: return " WT (CLOS2)";
>> -		case 7: return " WT (CLOS2)";
>> -		default: return " not defined";
>> -		}
>> -	} else if (GRAPHICS_VER(i915) >= 12) {
>> -		switch (obj->pat_index) {
>> -		case 0: return " WB";
>> -		case 1: return " WC";
>> -		case 2: return " WT";
>> -		case 3: return " UC";
>> -		default: return " not defined";
>> -		}
>> -	} else {
>> -		switch (obj->pat_index) {
>> -		case 0: return " UC";
>> -		case 1: return HAS_LLC(i915) ?
>> -			       " LLC" : " snooped";
>> -		case 2: return " L3+LLC";
>> -		case 3: return " WT";
>> -		default: return " not defined";
>> -		}
>> -	}
>> -}
>> -
>>   void
>>   i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>>   {
>> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>> +	char buf[I915_CACHE_NAME_LEN];
>>   	struct i915_vma *vma;
>>   	int pin_count = 0;
>>   
>> +	i915_cache_print(buf, sizeof(buf),
>> +			 obj->pat_set_by_user ? "!" : NULL,
>> +			 INTEL_INFO(i915)->cache_modes[obj->pat_index]);
>> +
>>   	seq_printf(m, "%pK: %c%c%c %8zdKiB %02x %02x %s%s%s",
>>   		   &obj->base,
>>   		   get_tiling_flag(obj),
>> @@ -199,7 +160,7 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>>   		   obj->base.size / 1024,
>>   		   obj->read_domains,
>>   		   obj->write_domain,
>> -		   i915_cache_level_str(obj),
>> +		   buf,
>>   		   obj->mm.dirty ? " dirty" : "",
>>   		   obj->mm.madv == I915_MADV_DONTNEED ? " purgeable" : "");
>>   	if (obj->base.name)
>> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
>> index bb2223cc3470..8663388a524f 100644
>> --- a/drivers/gpu/drm/i915/i915_driver.c
>> +++ b/drivers/gpu/drm/i915/i915_driver.c
>> @@ -241,7 +241,9 @@ static int i915_driver_early_probe(struct drm_i915_private *dev_priv)
>>   	i915_memcpy_init_early(dev_priv);
>>   	intel_runtime_pm_init_early(&dev_priv->runtime_pm);
>>   
>> -	i915_cache_init(dev_priv);
>> +	ret = i915_cache_init(dev_priv);
>> +	if (ret < 0)
>> +		return ret;
>>   
>>   	ret = i915_workqueues_init(dev_priv);
>>   	if (ret < 0)
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 896aa48ed089..814705cfeb12 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -1144,19 +1144,6 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
>>   	unsigned int i;
>>   	int ret;
>>   
>> -	/*
>> -	 * In the proccess of replacing cache_level with pat_index a tricky
>> -	 * dependency is created on the definition of the enum i915_cache_level.
>> -	 * in case this enum is changed, PTE encode would be broken.
>> -	 * Add a WARNING here. And remove when we completely quit using this
>> -	 * enum
>> -	 */
>> -	BUILD_BUG_ON(I915_CACHE_NONE != 0 ||
>> -		     I915_CACHE_LLC != 1 ||
>> -		     I915_CACHE_L3_LLC != 2 ||
>> -		     I915_CACHE_WT != 3 ||
>> -		     I915_MAX_CACHE_LEVEL != 4);
>> -
>>   	/* We need to fallback to 4K pages if host doesn't support huge gtt. */
>>   	if (intel_vgpu_active(dev_priv) && !intel_vgpu_has_huge_gtt(dev_priv))
>>   		RUNTIME_INFO(dev_priv)->page_sizes = I915_GTT_PAGE_SIZE_4K;
>> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
>> index fcacdc21643c..565a60a1645d 100644
>> --- a/drivers/gpu/drm/i915/i915_pci.c
>> +++ b/drivers/gpu/drm/i915/i915_pci.c
>> @@ -32,6 +32,7 @@
>>   #include "gt/intel_sa_media.h"
>>   #include "gem/i915_gem_object_types.h"
>>   
>> +#include "i915_cache.h"
>>   #include "i915_driver.h"
>>   #include "i915_drv.h"
>>   #include "i915_pci.h"
>> @@ -43,36 +44,43 @@
>>   	.__runtime.graphics.ip.ver = (x), \
>>   	.__runtime.media.ip.ver = (x)
>>   
>> -#define LEGACY_CACHELEVEL \
>> -	.cachelevel_to_pat = { \
>> -		[I915_CACHE_NONE]   = 0, \
>> -		[I915_CACHE_LLC]    = 1, \
>> -		[I915_CACHE_L3_LLC] = 2, \
>> -		[I915_CACHE_WT]     = 3, \
>> +#define LEGACY_CACHE_MODES \
>> +	.cache_modes = { \
>> +		[I915_CACHE_MODE_UC] 	  = I915_CACHE(UC), \
>> +		[I915_CACHE_MODE_WB] 	  = I915_CACHE(WB, COH1W, COH2W), \
> 
> Reading bspec 2863 (bdw) indicates that the CPU being able to snoop the
> GPU's L3 was a new feature in gen8.  So for HSW and earlier, any
> coherency was only 1-way (GPU could be coherent with CPU's caches, but
> not vice-versa).  Only starting with gen8 did we get 2-way coherency as
> an option where the CPU would also be coherent with the GPU cache (and
> with gen8 and beyond you could still select 1-way instead of 2-way
> coherency with instruction-level granularity via MOCS).  There are also

Did you mean Gen9 here? For me 2863 leads to "L3 Coherency SKL+" and 
the text says "Gen9".

[Comes back later.]

I think so, 2770 is BDW.

> some legacy platforms (e.g., EHL/JSL on bspec 13948) where the IA wasn't
> coherent with GPU L3 so we were back to 1-way coherency.

> So should we split LEGACY_CACHE_MODES into two tables with different
> coherency settings attached to I915_CACHE_MODE_WB?

Looks like it. Marking as TODO for next respin.
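
Something like this untested sketch is what I have in mind, with 
placeholder macro names; pre-gen8 (and, per your note, EHL/JSL) would 
take the 1-way flavour and gen8+ the 2-way one:

	#define LEGACY_1W_CACHE_MODES \
		.cache_modes = { \
			[I915_CACHE_MODE_UC]	  = I915_CACHE(UC), \
			[I915_CACHE_MODE_WB]	  = I915_CACHE(WB, COH1W), \
			[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, L3), \
			[I915_CACHE_MODE_WT]	  = I915_CACHE(WT), \
		}

	#define LEGACY_2W_CACHE_MODES \
		.cache_modes = { \
			[I915_CACHE_MODE_UC]	  = I915_CACHE(UC), \
			[I915_CACHE_MODE_WB]	  = I915_CACHE(WB, COH1W, COH2W), \
			[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, COH2W, L3), \
			[I915_CACHE_MODE_WT]	  = I915_CACHE(WT), \
		}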

Many thanks for helping with the research here!

Regards,

Tvrtko

> 
>> +		[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, COH2W, L3), \
>> +		[I915_CACHE_MODE_WT] 	  = I915_CACHE(WT), \
>>   	}
>>   
>> -#define TGL_CACHELEVEL \
>> -	.cachelevel_to_pat = { \
>> -		[I915_CACHE_NONE]   = 3, \
>> -		[I915_CACHE_LLC]    = 0, \
>> -		[I915_CACHE_L3_LLC] = 0, \
>> -		[I915_CACHE_WT]     = 2, \
>> +#define GEN12_CACHE_MODES \
>> +	.cache_modes = { \
>> +		[0] = I915_CACHE(WB, COH1W, COH2W), \
>> +		[1] = I915_CACHE(WC), \
>> +		[2] = I915_CACHE(WT), \
>> +		[3] = I915_CACHE(UC), \
>>   	}
>>   
>> -#define PVC_CACHELEVEL \
>> -	.cachelevel_to_pat = { \
>> -		[I915_CACHE_NONE]   = 0, \
>> -		[I915_CACHE_LLC]    = 3, \
>> -		[I915_CACHE_L3_LLC] = 3, \
>> -		[I915_CACHE_WT]     = 2, \
>> +/* FIXME: is WB 1-way or 2-way coherent for PAT indices 3, 5 and 7? */
>> +
>> +#define PVC_CACHE_MODES \
>> +	.cache_modes = { \
>> +		[0] = I915_CACHE(UC), \
>> +		[1] = I915_CACHE(WC), \
>> +		[2] = I915_CACHE(WT), \
>> +		[3] = I915_CACHE(WB, COH1W), \
>> +		[4] = I915_CACHE(WT, CLOS1), \
>> +		[5] = I915_CACHE(WB, COH1W, CLOS1), \
>> +		[6] = I915_CACHE(WT, CLOS2), \
>> +		[7] = I915_CACHE(WB, COH1W, CLOS2), \
>>   	}
>>   
>> -#define MTL_CACHELEVEL \
>> -	.cachelevel_to_pat = { \
>> -		[I915_CACHE_NONE]   = 2, \
>> -		[I915_CACHE_LLC]    = 3, \
>> -		[I915_CACHE_L3_LLC] = 3, \
>> -		[I915_CACHE_WT]     = 1, \
>> +#define MTL_CACHE_MODES \
>> +	.cache_modes = { \
>> +		[0] = I915_CACHE(WB), \
>> +		[1] = I915_CACHE(WT), \
>> +		[2] = I915_CACHE(UC), \
>> +		[3] = I915_CACHE(WB, COH1W), \
>> +		[4] = I915_CACHE(WB, COH1W, COH2W), \
> 
> We may want a comment on this one since the "2W" part is sort of a lie.
> Bspec 63884 has a programming note for MTL that says
> 
>          "...Except for system atomics, setting Coherency Mode to 10 or
>          11 results in this same one-way coherent behavior..."
> 
> So if we ask for 2W, we actually only get 1W behavior except in a very
> narrow set of cases.
> 
> 
> Matt
> 
>>   	}
>>   
>>   /* Keep in gen based order, and chronological order within a gen */
>> @@ -97,7 +105,7 @@
>>   	.max_pat_index = 3, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   #define I845_FEATURES \
>>   	GEN(2), \
>> @@ -112,7 +120,7 @@
>>   	.max_pat_index = 3, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   static const struct intel_device_info i830_info = {
>>   	I830_FEATURES,
>> @@ -145,7 +153,7 @@ static const struct intel_device_info i865g_info = {
>>   	.max_pat_index = 3, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   static const struct intel_device_info i915g_info = {
>>   	GEN3_FEATURES,
>> @@ -208,7 +216,7 @@ static const struct intel_device_info pnv_m_info = {
>>   	.max_pat_index = 3, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   static const struct intel_device_info i965g_info = {
>>   	GEN4_FEATURES,
>> @@ -252,7 +260,7 @@ static const struct intel_device_info gm45_info = {
>>   	.max_pat_index = 3, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   static const struct intel_device_info ilk_d_info = {
>>   	GEN5_FEATURES,
>> @@ -282,7 +290,7 @@ static const struct intel_device_info ilk_m_info = {
>>   	.__runtime.ppgtt_size = 31, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   #define SNB_D_PLATFORM \
>>   	GEN6_FEATURES, \
>> @@ -330,7 +338,7 @@ static const struct intel_device_info snb_m_gt2_info = {
>>   	.__runtime.ppgtt_size = 31, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   #define IVB_D_PLATFORM \
>>   	GEN7_FEATURES, \
>> @@ -387,7 +395,7 @@ static const struct intel_device_info vlv_info = {
>>   	.platform_engine_mask = BIT(RCS0) | BIT(VCS0) | BIT(BCS0),
>>   	GEN_DEFAULT_PAGE_SIZES,
>>   	GEN_DEFAULT_REGIONS,
>> -	LEGACY_CACHELEVEL,
>> +	LEGACY_CACHE_MODES
>>   };
>>   
>>   #define G75_FEATURES  \
>> @@ -473,7 +481,7 @@ static const struct intel_device_info chv_info = {
>>   	.has_coherent_ggtt = false,
>>   	GEN_DEFAULT_PAGE_SIZES,
>>   	GEN_DEFAULT_REGIONS,
>> -	LEGACY_CACHELEVEL,
>> +	LEGACY_CACHE_MODES
>>   };
>>   
>>   #define GEN9_DEFAULT_PAGE_SIZES \
>> @@ -536,7 +544,7 @@ static const struct intel_device_info skl_gt4_info = {
>>   	.max_pat_index = 3, \
>>   	GEN9_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   static const struct intel_device_info bxt_info = {
>>   	GEN9_LP_FEATURES,
>> @@ -640,7 +648,7 @@ static const struct intel_device_info jsl_info = {
>>   #define GEN12_FEATURES \
>>   	GEN11_FEATURES, \
>>   	GEN(12), \
>> -	TGL_CACHELEVEL, \
>> +	GEN12_CACHE_MODES, \
>>   	.has_global_mocs = 1, \
>>   	.has_pxp = 1, \
>>   	.max_pat_index = 3
>> @@ -708,7 +716,7 @@ static const struct intel_device_info adl_p_info = {
>>   	.__runtime.graphics.ip.ver = 12, \
>>   	.__runtime.graphics.ip.rel = 50, \
>>   	XE_HP_PAGE_SIZES, \
>> -	TGL_CACHELEVEL, \
>> +	GEN12_CACHE_MODES, \
>>   	.dma_mask_size = 46, \
>>   	.has_3d_pipeline = 1, \
>>   	.has_64bit_reloc = 1, \
>> @@ -803,7 +811,7 @@ static const struct intel_device_info pvc_info = {
>>   		BIT(VCS0) |
>>   		BIT(CCS0) | BIT(CCS1) | BIT(CCS2) | BIT(CCS3),
>>   	.require_force_probe = 1,
>> -	PVC_CACHELEVEL,
>> +	PVC_CACHE_MODES
>>   };
>>   
>>   static const struct intel_gt_definition xelpmp_extra_gt[] = {
>> @@ -838,7 +846,7 @@ static const struct intel_device_info mtl_info = {
>>   	.memory_regions = REGION_SMEM | REGION_STOLEN_LMEM,
>>   	.platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(CCS0),
>>   	.require_force_probe = 1,
>> -	MTL_CACHELEVEL,
>> +	MTL_CACHE_MODES
>>   };
>>   
>>   #undef PLATFORM
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index 04bc1f4a1115..973175a64534 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -1870,7 +1870,7 @@ static int alloc_oa_buffer(struct i915_perf_stream *stream)
>>   		return PTR_ERR(bo);
>>   	}
>>   
>> -	i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(bo, I915_CACHE_CACHED);
>>   
>>   	/* PreHSW required 512K alignment, HSW requires 16M */
>>   	vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL);
>> diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
>> index dbfe6443457b..2ce13b7c48cb 100644
>> --- a/drivers/gpu/drm/i915/intel_device_info.h
>> +++ b/drivers/gpu/drm/i915/intel_device_info.h
>> @@ -27,6 +27,8 @@
>>   
>>   #include <uapi/drm/i915_drm.h>
>>   
>> +#include "i915_cache.h"
>> +
>>   #include "intel_step.h"
>>   
>>   #include "gt/intel_engine_types.h"
>> @@ -243,8 +245,8 @@ struct intel_device_info {
>>   	 */
>>   	const struct intel_runtime_info __runtime;
>>   
>> -	u32 cachelevel_to_pat[I915_MAX_CACHE_LEVEL];
>> -	u32 max_pat_index;
>> +	i915_cache_t cache_modes[8];
>> +	unsigned int max_pat_index;
>>   };
>>   
>>   struct intel_driver_caps {
>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>> index f910ec9b6d2b..ba821e48baa5 100644
>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>> @@ -267,7 +267,7 @@ static int igt_evict_for_cache_color(void *arg)
>>   		err = PTR_ERR(obj);
>>   		goto cleanup;
>>   	}
>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>   	quirk_add(obj, &objects);
>>   
>>   	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
>> @@ -283,7 +283,7 @@ static int igt_evict_for_cache_color(void *arg)
>>   		err = PTR_ERR(obj);
>>   		goto cleanup;
>>   	}
>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>   	quirk_add(obj, &objects);
>>   
>>   	/* Neighbouring; same colour - should fit */
>> diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
>> index 3c5e0952f1b8..4cfc5000d6ff 100644
>> --- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
>> +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
>> @@ -23,7 +23,7 @@ int igt_spinner_init(struct igt_spinner *spin, struct intel_gt *gt)
>>   		err = PTR_ERR(spin->hws);
>>   		goto err;
>>   	}
>> -	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_CACHED);
>>   
>>   	spin->obj = i915_gem_object_create_internal(gt->i915, PAGE_SIZE);
>>   	if (IS_ERR(spin->obj)) {
>> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>> index 1d1a457e2aee..8ae77bcf27fa 100644
>> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>> @@ -126,13 +126,13 @@ static const struct intel_device_info mock_info = {
>>   	.memory_regions = REGION_SMEM,
>>   	.platform_engine_mask = BIT(0),
>>   
>> -	/* simply use legacy cache level for mock device */
>> +	/* Simply use legacy cache modes for the mock device. */
>>   	.max_pat_index = 3,
>> -	.cachelevel_to_pat = {
>> -		[I915_CACHE_NONE]   = 0,
>> -		[I915_CACHE_LLC]    = 1,
>> -		[I915_CACHE_L3_LLC] = 2,
>> -		[I915_CACHE_WT]     = 3,
>> +	.cache_modes = {
>> +		[0] = I915_CACHE(UC),
>> +		[1] = I915_CACHE(WB, COH1W),
>> +		[2] = I915_CACHE(WB, COH1W, COH2W, L3),
>> +		[3] = I915_CACHE(WT),
>>   	},
>>   };
>>   
>> @@ -181,7 +181,7 @@ struct drm_i915_private *mock_gem_device(void)
>>   	/* Set up device info and initial runtime info. */
>>   	intel_device_info_driver_create(i915, pdev->device, &mock_info);
>>   
>> -	i915_cache_init(i915);
>> +	WARN_ON(i915_cache_init(i915));
>>   
>>   	dev_pm_domain_set(&pdev->dev, &pm_domain);
>>   	pm_runtime_enable(&pdev->dev);
>> -- 
>> 2.39.2
>>
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC 5/8] drm/i915: Improve the vm_fault_gtt user PAT index restriction
  2023-07-28  0:04     ` [Intel-gfx] " Matt Roper
@ 2023-07-28 12:28       ` Tvrtko Ursulin
  -1 siblings, 0 replies; 59+ messages in thread
From: Tvrtko Ursulin @ 2023-07-28 12:28 UTC (permalink / raw)
  To: Matt Roper; +Cc: Intel-gfx, Fei Yang, dri-devel, Tvrtko Ursulin


On 28/07/2023 01:04, Matt Roper wrote:
> On Thu, Jul 27, 2023 at 03:55:01PM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Now that i915 understands the caching modes behind PAT indices, we can
>> refine the check in vm_fault_gtt() to not reject the uncached PAT if it
>> was set by userspace on a snoopable platform.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Fei Yang <fei.yang@intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_mman.c | 14 +++-----------
>>   1 file changed, 3 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> index cd7f8ded0d6f..9aa6ecf68432 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> @@ -382,17 +382,9 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>>   		goto err_reset;
>>   	}
>>   
>> -	/*
>> -	 * For objects created by userspace through GEM_CREATE with pat_index
>> -	 * set by set_pat extension, coherency is managed by userspace, make
>> -	 * sure we don't fail handling the vm fault by calling
>> -	 * i915_gem_object_has_cache_level() which always return true for such
>> -	 * objects. Otherwise this helper function would fall back to checking
>> -	 * whether the object is un-cached.
>> -	 */
>> -	if (!((obj->pat_set_by_user ||
>> -	       i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)) ||
>> -	      HAS_LLC(i915))) {
>> +	/* Access to snoopable pages through the GTT is incoherent. */
> 
> This comment was removed in the previous patch, but now it came back
> here.  Should we have just left it be in the previous patch?

Oops yes, a fumble when splitting the single patch into this series.

> I'm not really clear on what it means either.  Are we using "GTT" as
> shorthand to refer to the aperture here?

It is about CPU mmap access, so I think so.
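
For reference, the route userspace takes to get here is the GTT mmap, 
roughly like this (illustrative snippet; error handling omitted, fd, 
handle and size assumed set up already):

	struct drm_i915_gem_mmap_gtt arg = { .handle = handle };
	void *ptr;

	ioctl(fd, DRM_IOCTL_I915_GEM_MMAP_GTT, &arg);
	ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
		   fd, arg.offset);
	/* First CPU access through ptr faults into vm_fault_gtt(). */

Such access goes through the aperture and does not snoop the CPU 
cache, hence the incoherence with cacheable (snoopable) pages on 
non-LLC parts.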

Original code was:

         /* Access to snoopable pages through the GTT is incoherent. */
         if (obj->cache_level != I915_CACHE_NONE && !HAS_LLC(i915)) {
                 ret = -EFAULT;
                 goto err_unpin;
         }

Which was disallowing anything not uncached on snoopable platforms. So I 
made it equivalent to that:

	/* Access to snoopable pages through the GTT is incoherent. */
	if (!i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC) &&
	    !HAS_LLC(i915)) {
		ret = -EFAULT;
		goto err_unpin;
	}

It should be like-for-like, assuming the PAT-to-cache-mode tables are all good.

On Meteorlake there is no change in behaviour either way, due to !HAS_LLC.
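
To spell out the equivalence, here is an illustrative sketch of what the 
new check reduces to (not part of the patch, just the series' helpers 
inlined by hand):

	/* Sketch only: i915_gem_object_has_cache_mode() inlined. */
	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
	bool is_uc = I915_CACHE_MODE(cache) == I915_CACHE_MODE_UC;

	if (!is_uc && !HAS_LLC(i915)) {
		ret = -EFAULT;
		goto err_unpin;
	}

On legacy platforms the tables map pat_index zero to UC, so !is_uc 
selects exactly the same objects as the old cache_level != 
I915_CACHE_NONE test did.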

Regards,

Tvrtko


> 
> Matt
> 
>> +	if (!i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC) &&
>> +	    !HAS_LLC(i915)) {
>>   		ret = -EFAULT;
>>   		goto err_unpin;
>>   	}
>> -- 
>> 2.39.2
>>
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Intel-gfx] [RFC 4/8] drm/i915: Refactor PAT/object cache handling
  2023-07-28  0:17       ` [Intel-gfx] " Matt Roper
@ 2023-07-28 12:35         ` Tvrtko Ursulin
  -1 siblings, 0 replies; 59+ messages in thread
From: Tvrtko Ursulin @ 2023-07-28 12:35 UTC (permalink / raw)
  To: Matt Roper; +Cc: Intel-gfx, dri-devel, Chris Wilson


On 28/07/2023 01:17, Matt Roper wrote:
> On Thu, Jul 27, 2023 at 04:57:53PM -0700, Matt Roper wrote:
>> On Thu, Jul 27, 2023 at 03:55:00PM +0100, Tvrtko Ursulin wrote:
>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>
>>> Commit 9275277d5324 ("drm/i915: use pat_index instead of cache_level") has
>>> introduced PAT indices to i915 internal APIs, partially replacing the
>>> usage of driver internal cache_level, but has also added a few sub-
>>> optimal design decisions which this patch tries to improve upon.
>>>
>>> Principal change here is to invert the per platform cache level to PAT
>>> index table which was added by the referenced commit, and by doing so
>>> enable i915 to understand the cache mode between PAT indices, changing
>>> them from opaque to transparent.
>>>
>>> Once we have the inverted table we are able to remove the hidden false
>>> "return true" from i915_gem_object_has_cache_level and make the involved
>>> code path clearer.
>>>
>>> To achieve this we replace the enum i915_cache_level with i915_cache_t,
>>> composed of a more detailed representation of each cache mode (base mode
>>> plus flags).
>>>
>>> In this way we are able to express the differences between different
>>> write-back mode coherency settings on Meteorlake, which in turn enables us
>>> to map the i915 "cached" mode to the correct Meteorlake PAT index.
>>>
>>> We can also replace the platform dependent cache mode to string code in
>>> debugfs and elsewhere by the single implementation based on i915_cache_t.
>>>
>>> v2:
>>>   * Fix PAT-to-cache-mode table for PVC. (Fei)
>>>   * Cache display caching mode too. (Fei)
>>>   * Improve and document criteria in i915_gem_object_can_bypass_llc() (Matt)
>>>
>>> v3:
>>>   * Checkpatch issues.
>>>   * Cache mode flags check fixed.
>>>
>>> v4:
>>>   * Fix intel_device_info->cache_modes array size. (Matt)
>>>   * Boolean cache mode and flags query. (Matt)
>>>   * Reduce number of cache macros with some macro magic.
>>>   * One more checkpatch fix.
>>>   * Tweak tables to show legacy and Gen12 WB is fully coherent.
>>>
>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> References: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>>> Cc: Fei Yang <fei.yang@intel.com>
>>> Cc: Andi Shyti <andi.shyti@linux.intel.com>
>>> Cc: Matt Roper <matthew.d.roper@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/gem/i915_gem_domain.c    |  60 +++++----
>>>   drivers/gpu/drm/i915/gem/i915_gem_domain.h    |   5 +-
>>>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   3 +-
>>>   drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
>>>   drivers/gpu/drm/i915/gem/i915_gem_mman.c      |   4 +-
>>>   drivers/gpu/drm/i915/gem/i915_gem_object.c    | 117 ++++++++++--------
>>>   drivers/gpu/drm/i915/gem/i915_gem_object.h    |  11 +-
>>>   .../gpu/drm/i915/gem/i915_gem_object_types.h  | 116 +----------------
>>>   drivers/gpu/drm/i915/gem/i915_gem_shmem.c     |   8 +-
>>>   drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |   2 +-
>>>   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  20 +--
>>>   drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   2 +-
>>>   .../drm/i915/gem/selftests/huge_gem_object.c  |   2 +-
>>>   .../gpu/drm/i915/gem/selftests/huge_pages.c   |   3 +-
>>>   drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  10 +-
>>>   drivers/gpu/drm/i915/gt/intel_engine_cs.c     |   2 +-
>>>   drivers/gpu/drm/i915/gt/intel_ggtt.c          |  25 ++--
>>>   drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c     |   4 +-
>>>   drivers/gpu/drm/i915/gt/intel_gtt.c           |   2 +-
>>>   drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +-
>>>   drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   6 +-
>>>   .../gpu/drm/i915/gt/intel_ring_submission.c   |   4 +-
>>>   drivers/gpu/drm/i915/gt/intel_timeline.c      |   2 +-
>>>   drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   2 +-
>>>   .../gpu/drm/i915/gt/selftest_workarounds.c    |   2 +-
>>>   drivers/gpu/drm/i915/i915_cache.c             |  89 +++++++++++--
>>>   drivers/gpu/drm/i915/i915_cache.h             |  70 ++++++++++-
>>>   drivers/gpu/drm/i915/i915_debugfs.c           |  53 ++------
>>>   drivers/gpu/drm/i915/i915_driver.c            |   4 +-
>>>   drivers/gpu/drm/i915/i915_gem.c               |  13 --
>>>   drivers/gpu/drm/i915/i915_pci.c               |  84 +++++++------
>>>   drivers/gpu/drm/i915/i915_perf.c              |   2 +-
>>>   drivers/gpu/drm/i915/intel_device_info.h      |   6 +-
>>>   .../gpu/drm/i915/selftests/i915_gem_evict.c   |   4 +-
>>>   drivers/gpu/drm/i915/selftests/igt_spinner.c  |   2 +-
>>>   .../gpu/drm/i915/selftests/mock_gem_device.c  |  14 +--
>>>   36 files changed, 391 insertions(+), 367 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>> index 57db9c581bf6..c15f83de33af 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>> @@ -8,6 +8,7 @@
>>>   #include "display/intel_frontbuffer.h"
>>>   #include "gt/intel_gt.h"
>>>   
>>> +#include "i915_cache.h"
>>>   #include "i915_drv.h"
>>>   #include "i915_gem_clflush.h"
>>>   #include "i915_gem_domain.h"
>>> @@ -41,14 +42,17 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>>>   		return false;
>>>   
>>>   	/*
>>> -	 * For objects created by userspace through GEM_CREATE with pat_index
>>> -	 * set by set_pat extension, i915_gem_object_has_cache_level() will
>>> -	 * always return true, because the coherency of such object is managed
>>> -	 * by userspace. Othereise the call here would fall back to checking
>>> -	 * whether the object is un-cached or write-through.
>>> +	 * Always flush cache for UMD objects with PAT index set.
>>>   	 */
>>> -	return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
>>> -		 i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
>>> +	if (obj->pat_set_by_user)
>>> +		return true;
>>> +
>>> +	/*
>>> +	 * Fully coherent cached access may end up with data in the CPU cache
>>> +	 * which hasn't hit memory yet.
>>> +	 */
>>> +	return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
>>> +	       i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W);
>>>   }
>>>   
>>>   bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>>> @@ -268,7 +272,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>>>   /**
>>>    * i915_gem_object_set_cache_level - Changes the cache-level of an object across all VMA.
>>>    * @obj: object to act on
>>> - * @cache_level: new cache level to set for the object
>>> + * @cache: new caching mode to set for the object
>>>    *
>>>    * After this function returns, the object will be in the new cache-level
>>>    * across all GTT and the contents of the backing storage will be coherent,
>>> @@ -281,18 +285,28 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>>>    * that all direct access to the scanout remains coherent.
>>>    */
>>>   int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>>> -				    enum i915_cache_level cache_level)
>>> +				    i915_cache_t cache)
>>>   {
>>> -	int ret;
>>> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>>> +	int pat, ret;
>>>   
>>> -	/*
>>> -	 * For objects created by userspace through GEM_CREATE with pat_index
>>> -	 * set by set_pat extension, simply return 0 here without touching
>>> -	 * the cache setting, because such objects should have an immutable
>>> -	 * cache setting by desgin and always managed by userspace.
>>> -	 */
>>> -	if (i915_gem_object_has_cache_level(obj, cache_level))
>>> +	pat = i915_cache_find_pat(i915, cache);
>>> +	if (pat < 0) {
>>> +		char buf[I915_CACHE_NAME_LEN];
>>> +
>>> +		i915_cache_print(buf, sizeof(buf), NULL, cache);
>>> +		drm_err_ratelimited(&i915->drm,
>>> +				    "Attempting to use unknown caching mode %s!\n",
>>> +				    buf);
>>> +
>>> +		return -EINVAL;
>>> +	} else if (pat == obj->pat_index) {
>>>   		return 0;
>>> +	} else if (obj->pat_set_by_user) {
>>> +		drm_notice_once(&i915->drm,
>>> +				"Attempting to change caching mode on an object with fixed PAT!\n");
>>> +		return -EINVAL;
>>> +	}
>>>   
>>>   	ret = i915_gem_object_wait(obj,
>>>   				   I915_WAIT_INTERRUPTIBLE |
>>> @@ -302,7 +316,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>>>   		return ret;
>>>   
>>>   	/* Always invalidate stale cachelines */
>>> -	i915_gem_object_set_cache_coherency(obj, cache_level);
>>> +	i915_gem_object_set_pat_index(obj, pat);
>>>   	obj->cache_dirty = true;
>>>   
>>>   	/* The cache-level will be applied when each vma is rebound. */
>>> @@ -337,10 +351,10 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
>>>   		goto out;
>>>   	}
>>>   
>>> -	if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||
>>> -	    i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
>>> +	if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
>>> +	    i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W))
>>>   		args->caching = I915_CACHING_CACHED;
>>> -	else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
>>> +	else if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT))
>>>   		args->caching = I915_CACHING_DISPLAY;
>>>   	else
>>>   		args->caching = I915_CACHING_NONE;
>>> @@ -355,7 +369,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
>>>   	struct drm_i915_private *i915 = to_i915(dev);
>>>   	struct drm_i915_gem_caching *args = data;
>>>   	struct drm_i915_gem_object *obj;
>>> -	enum i915_cache_level level;
>>> +	i915_cache_t level;
>>>   	int ret = 0;
>>>   
>>>   	if (IS_DGFX(i915))
>>> @@ -378,7 +392,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
>>>   		if (!HAS_LLC(i915) && !HAS_SNOOP(i915))
>>>   			return -ENODEV;
>>>   
>>> -		level = I915_CACHE_LLC;
>>> +		level = I915_CACHE_CACHED;
>>>   		break;
>>>   	case I915_CACHING_DISPLAY:
>>>   		level = HAS_WT(i915) ? I915_CACHE_WT : I915_CACHE_NONE;
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.h b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
>>> index 9622df962bfc..6da5c351f6fd 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.h
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
>>> @@ -6,10 +6,11 @@
>>>   #ifndef __I915_GEM_DOMAIN_H__
>>>   #define __I915_GEM_DOMAIN_H__
>>>   
>>> +#include "i915_cache.h"
>>> +
>>>   struct drm_i915_gem_object;
>>> -enum i915_cache_level;
>>>   
>>>   int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>>> -				    enum i915_cache_level cache_level);
>>> +				    i915_cache_t cache);
>>>   
>>>   #endif /* __I915_GEM_DOMAIN_H__ */
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>> index 0a1d40220020..9d6e49c8a4c6 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>> @@ -648,7 +648,8 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
>>>   	 */
>>>   	return (cache->has_llc ||
>>>   		obj->cache_dirty ||
>>> -		!i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
>>> +		!(obj->pat_set_by_user ||
>>> +		  i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)));
>>>   }
>>>   
>>>   static int eb_reserve_vma(struct i915_execbuffer *eb,
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
>>> index 6bc26b4b06b8..88c360c3d6a3 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
>>> @@ -170,7 +170,7 @@ __i915_gem_object_create_internal(struct drm_i915_private *i915,
>>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>>>   
>>> -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
>>> +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>>>   	i915_gem_object_set_cache_coherency(obj, cache_level);
>>>   
>>>   	return obj;
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> index aa4d842d4c5a..cd7f8ded0d6f 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> @@ -382,7 +382,6 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>>>   		goto err_reset;
>>>   	}
>>>   
>>> -	/* Access to snoopable pages through the GTT is incoherent. */
>>>   	/*
>>>   	 * For objects created by userspace through GEM_CREATE with pat_index
>>>   	 * set by set_pat extension, coherency is managed by userspace, make
>>> @@ -391,7 +390,8 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>>>   	 * objects. Otherwise this helper function would fall back to checking
>>>   	 * whether the object is un-cached.
>>>   	 */
>>> -	if (!(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
>>> +	if (!((obj->pat_set_by_user ||
>>> +	       i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)) ||
>>>   	      HAS_LLC(i915))) {
>>>   		ret = -EFAULT;
>>>   		goto err_unpin;
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> index 3dc4fbb67d2b..ec1f0be43d0d 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> @@ -45,33 +45,6 @@ static struct kmem_cache *slab_objects;
>>>   
>>>   static const struct drm_gem_object_funcs i915_gem_object_funcs;
>>>   
>>> -unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
>>> -				    enum i915_cache_level level)
>>> -{
>>> -	if (drm_WARN_ON(&i915->drm, level >= I915_MAX_CACHE_LEVEL))
>>> -		return 0;
>>> -
>>> -	return INTEL_INFO(i915)->cachelevel_to_pat[level];
>>> -}
>>> -
>>> -bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
>>> -				     enum i915_cache_level lvl)
>>> -{
>>> -	/*
>>> -	 * In case the pat_index is set by user space, this kernel mode
>>> -	 * driver should leave the coherency to be managed by user space,
>>> -	 * simply return true here.
>>> -	 */
>>> -	if (obj->pat_set_by_user)
>>> -		return true;
>>> -
>>> -	/*
>>> -	 * Otherwise the pat_index should have been converted from cache_level
>>> -	 * so that the following comparison is valid.
>>> -	 */
>>> -	return obj->pat_index == i915_gem_get_pat_index(obj_to_i915(obj), lvl);
>>> -}
>>> -
>>>   struct drm_i915_gem_object *i915_gem_object_alloc(void)
>>>   {
>>>   	struct drm_i915_gem_object *obj;
>>> @@ -144,30 +117,72 @@ void __i915_gem_object_fini(struct drm_i915_gem_object *obj)
>>>   	dma_resv_fini(&obj->base._resv);
>>>   }
>>>   
>>> +bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
>>> +				    enum i915_cache_mode mode)
>>> +{
>>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>>> +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
>>> +
>>> +	return I915_CACHE_MODE(cache) == mode;
>>> +}
>>> +
>>> +bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
>>> +				    unsigned int flag)
>>> +{
>>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>>> +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
>>> +
>>> +	return I915_CACHE_FLAGS(cache) & flag;
>>> +}
>>> +
>>> +static void __i915_gem_object_update_coherency(struct drm_i915_gem_object *obj)
>>> +{
>>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>>> +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
>>> +	const unsigned int flags = I915_CACHE_FLAGS(cache);
>>> +	const unsigned int mode = I915_CACHE_MODE(cache);
>>> +
>>> +	if (mode == I915_CACHE_MODE_WC ||
>>> +	    mode == I915_CACHE_MODE_WT ||
>>> +	    (mode == I915_CACHE_MODE_WB && (flags & I915_CACHE_FLAG_COH2W)))
> 
> Shouldn't we only need 1W coherency here?  With 1-way coherency GPU
> reads will snoop the CPU cache and GPU writes will invalidate the CPU
> cache.  2-way only matters for how CPU reads/writes interact with the
> GPU cache.

I thought so too at one point, but then I was not entirely sure. Our kerneldoc says:

	 * I915_BO_CACHE_COHERENT_FOR_WRITE:
	 *
	 * When writing through the CPU cache, the GPU is still coherent. Note
	 * that this also implies I915_BO_CACHE_COHERENT_FOR_READ.

The question is whether that only applies before handing the buffer over to the GPU, or also while it is in active use from both sides.

If it is just before handing over then 1-way is correct. But if it is the latter then 2-way is needed, because the MTL bspec says 1-way only snoops from the GPU until the first GPU access, after which point the GPU has its own copy of the cache line in its cache.

Bspec 59620 - although now I see that may not be MTL...
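
To put the two readings side by side, a sketch using this series' flags 
(illustrative only, not a proposal for either):

	/* Matt's reading: GPU-side snooping persists, so 1-way suffices
	 * for CPU writes to remain visible to the GPU. */
	coherent_for_write = mode == I915_CACHE_MODE_WB &&
			     (flags & I915_CACHE_FLAG_COH1W);

	/* The patch as posted: conservative, in case 1-way snooping stops
	 * once the GPU holds its own copy of the cache line. */
	coherent_for_write = mode == I915_CACHE_MODE_WB &&
			     (flags & I915_CACHE_FLAG_COH2W);

Whichever is correct decides whether I915_BO_CACHE_COHERENT_FOR_WRITE 
can be derived from COH1W alone.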

Regards,

Tvrtko

> 
> 
> Matt
> 
>>> +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ |
>>> +				      I915_BO_CACHE_COHERENT_FOR_WRITE;
>>> +	else if (HAS_LLC(i915))
>>> +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
>>> +	else
>>> +		obj->cache_coherent = 0;
>>> +
>>> +	obj->cache_dirty =
>>> +		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
>>> +		!IS_DGFX(i915);
>>> +}
>>> +
>>>   /**
>>>    * i915_gem_object_set_cache_coherency - Mark up the object's coherency levels
>>> - * for a given cache_level
>>> + * for a given caching mode
>>>    * @obj: #drm_i915_gem_object
>>> - * @cache_level: cache level
>>> + * @cache: cache mode
>>>    */
>>>   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>>> -					 unsigned int cache_level)
>>> +					 i915_cache_t cache)
>>>   {
>>> -	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>>> +	int found;
>>>   
>>> -	obj->pat_index = i915_gem_get_pat_index(i915, cache_level);
>>> +	found = i915_cache_find_pat(i915, cache);
>>> +	if (found < 0) {
>>> +		char buf[I915_CACHE_NAME_LEN];
>>>   
>>> -	if (cache_level != I915_CACHE_NONE)
>>> -		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
>>> -				       I915_BO_CACHE_COHERENT_FOR_WRITE);
>>> -	else if (HAS_LLC(i915))
>>> -		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
>>> -	else
>>> -		obj->cache_coherent = 0;
>>> +		i915_cache_print(buf, sizeof(buf), NULL, cache);
>>> +		drm_err_ratelimited(&i915->drm, "Unknown cache mode %s!\n",
>>> +				    buf);
>>>   
>>> -	obj->cache_dirty =
>>> -		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
>>> -		!IS_DGFX(i915);
>>> +		found = i915->pat_uc;
>>> +	}
>>> +
>>> +	obj->pat_index = found;
>>> +
>>> +	__i915_gem_object_update_coherency(obj);
>>>   }
>>>   
>>>   /**
>>> @@ -181,24 +196,18 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>>>   void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
>>>   				   unsigned int pat_index)
>>>   {
>>> -	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>>>   
>>>   	if (obj->pat_index == pat_index)
>>>   		return;
>>>   
>>> +	if (drm_WARN_ON_ONCE(&i915->drm,
>>> +			     pat_index > INTEL_INFO(i915)->max_pat_index))
>>> +		return;
>>> +
>>>   	obj->pat_index = pat_index;
>>>   
>>> -	if (pat_index != i915_gem_get_pat_index(i915, I915_CACHE_NONE))
>>> -		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
>>> -				       I915_BO_CACHE_COHERENT_FOR_WRITE);
>>> -	else if (HAS_LLC(i915))
>>> -		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
>>> -	else
>>> -		obj->cache_coherent = 0;
>>> -
>>> -	obj->cache_dirty =
>>> -		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
>>> -		!IS_DGFX(i915);
>>> +	__i915_gem_object_update_coherency(obj);
>>>   }
>>>   
>>>   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>> index 884a17275b3a..a5d4ee19d9be 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>> @@ -13,6 +13,7 @@
>>>   
>>>   #include "display/intel_frontbuffer.h"
>>>   #include "intel_memory_region.h"
>>> +#include "i915_cache.h"
>>>   #include "i915_gem_object_types.h"
>>>   #include "i915_gem_gtt.h"
>>>   #include "i915_gem_ww.h"
>>> @@ -32,10 +33,6 @@ static inline bool i915_gem_object_size_2big(u64 size)
>>>   	return false;
>>>   }
>>>   
>>> -unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
>>> -				    enum i915_cache_level level);
>>> -bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
>>> -				     enum i915_cache_level lvl);
>>>   void i915_gem_init__objects(struct drm_i915_private *i915);
>>>   
>>>   void i915_objects_module_exit(void);
>>> @@ -764,8 +761,12 @@ int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
>>>   				      bool intr);
>>>   bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj);
>>>   
>>> +bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
>>> +				    enum i915_cache_mode mode);
>>> +bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
>>> +				    unsigned int flag);
>>>   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>>> -					 unsigned int cache_level);
>>> +					 i915_cache_t cache);
>>>   void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
>>>   				   unsigned int pat_index);
>>>   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj);
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>> index 8de2b91b3edf..6790e13ad262 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>> @@ -14,6 +14,7 @@
>>>   #include <uapi/drm/i915_drm.h>
>>>   
>>>   #include "i915_active.h"
>>> +#include "i915_cache.h"
>>>   #include "i915_selftest.h"
>>>   #include "i915_vma_resource.h"
>>>   
>>> @@ -116,93 +117,6 @@ struct drm_i915_gem_object_ops {
>>>   	const char *name; /* friendly name for debug, e.g. lockdep classes */
>>>   };
>>>   
>>> -/**
>>> - * enum i915_cache_level - The supported GTT caching values for system memory
>>> - * pages.
>>> - *
>>> - * These translate to some special GTT PTE bits when binding pages into some
>>> - * address space. It also determines whether an object, or rather its pages are
>>> - * coherent with the GPU, when also reading or writing through the CPU cache
>>> - * with those pages.
>>> - *
>>> - * Userspace can also control this through struct drm_i915_gem_caching.
>>> - */
>>> -enum i915_cache_level {
>>> -	/**
>>> -	 * @I915_CACHE_NONE:
>>> -	 *
>>> -	 * GPU access is not coherent with the CPU cache. If the cache is dirty
>>> -	 * and we need the underlying pages to be coherent with some later GPU
>>> -	 * access then we need to manually flush the pages.
>>> -	 *
>>> -	 * On shared LLC platforms reads and writes through the CPU cache are
>>> -	 * still coherent even with this setting. See also
>>> -	 * &drm_i915_gem_object.cache_coherent for more details. Due to this we
>>> -	 * should only ever use uncached for scanout surfaces, otherwise we end
>>> -	 * up over-flushing in some places.
>>> -	 *
>>> -	 * This is the default on non-LLC platforms.
>>> -	 */
>>> -	I915_CACHE_NONE = 0,
>>> -	/**
>>> -	 * @I915_CACHE_LLC:
>>> -	 *
>>> -	 * GPU access is coherent with the CPU cache. If the cache is dirty,
>>> -	 * then the GPU will ensure that access remains coherent, when both
>>> -	 * reading and writing through the CPU cache. GPU writes can dirty the
>>> -	 * CPU cache.
>>> -	 *
>>> -	 * Not used for scanout surfaces.
>>> -	 *
>>> -	 * Applies to both platforms with shared LLC(HAS_LLC), and snooping
>>> -	 * based platforms(HAS_SNOOP).
>>> -	 *
>>> -	 * This is the default on shared LLC platforms.  The only exception is
>>> -	 * scanout objects, where the display engine is not coherent with the
>>> -	 * CPU cache. For such objects I915_CACHE_NONE or I915_CACHE_WT is
>>> -	 * automatically applied by the kernel in pin_for_display, if userspace
>>> -	 * has not done so already.
>>> -	 */
>>> -	I915_CACHE_LLC,
>>> -	/**
>>> -	 * @I915_CACHE_L3_LLC:
>>> -	 *
>>> -	 * Explicitly enable the Gfx L3 cache, with coherent LLC.
>>> -	 *
>>> -	 * The Gfx L3 sits between the domain specific caches, e.g
>>> -	 * sampler/render caches, and the larger LLC. LLC is coherent with the
>>> -	 * GPU, but L3 is only visible to the GPU, so likely needs to be flushed
>>> -	 * when the workload completes.
>>> -	 *
>>> -	 * Not used for scanout surfaces.
>>> -	 *
>>> -	 * Only exposed on some gen7 + GGTT. More recent hardware has dropped
>>> -	 * this explicit setting, where it should now be enabled by default.
>>> -	 */
>>> -	I915_CACHE_L3_LLC,
>>> -	/**
>>> -	 * @I915_CACHE_WT:
>>> -	 *
>>> -	 * Write-through. Used for scanout surfaces.
>>> -	 *
>>> -	 * The GPU can utilise the caches, while still having the display engine
>>> -	 * be coherent with GPU writes, as a result we don't need to flush the
>>> -	 * CPU caches when moving out of the render domain. This is the default
>>> -	 * setting chosen by the kernel, if supported by the HW, otherwise we
>>> -	 * fallback to I915_CACHE_NONE. On the CPU side writes through the CPU
>>> -	 * cache still need to be flushed, to remain coherent with the display
>>> -	 * engine.
>>> -	 */
>>> -	I915_CACHE_WT,
>>> -	/**
>>> -	 * @I915_MAX_CACHE_LEVEL:
>>> -	 *
>>> -	 * Mark the last entry in the enum. Used for defining cachelevel_to_pat
>>> -	 * array for cache_level to pat translation table.
>>> -	 */
>>> -	I915_MAX_CACHE_LEVEL,
>>> -};
>>> -
>>>   enum i915_map_type {
>>>   	I915_MAP_WB = 0,
>>>   	I915_MAP_WC,
>>> @@ -403,16 +317,6 @@ struct drm_i915_gem_object {
>>>   	/**
>>>   	 * @cache_coherent:
>>>   	 *
>>> -	 * Note: with the change above which replaced @cache_level with pat_index,
>>> -	 * the use of @cache_coherent is limited to the objects created by kernel
>>> -	 * or by userspace without pat index specified.
>>> -	 * Check for @pat_set_by_user to find out if an object has pat index set
>>> -	 * by userspace. The ioctl's to change cache settings have also been
>>> -	 * disabled for the objects with pat index set by userspace. Please don't
>>> -	 * assume @cache_coherent having the flags set as describe here. A helper
>>> -	 * function i915_gem_object_has_cache_level() provides one way to bypass
>>> -	 * the use of this field.
>>> -	 *
>>>   	 * Track whether the pages are coherent with the GPU if reading or
>>>   	 * writing through the CPU caches. The largely depends on the
>>>   	 * @cache_level setting.
>>> @@ -447,7 +351,7 @@ struct drm_i915_gem_object {
>>>   	 * flushing the surface just before doing the scanout.  This does mean
>>>   	 * we might unnecessarily flush non-scanout objects in some places, but
>>>   	 * the default assumption is that all normal objects should be using
>>> -	 * I915_CACHE_LLC, at least on platforms with the shared LLC.
>>> +	 * I915_CACHE_CACHED, at least on platforms with the shared LLC.
>>>   	 *
>>>   	 * Supported values:
>>>   	 *
>>> @@ -486,16 +390,6 @@ struct drm_i915_gem_object {
>>>   	/**
>>>   	 * @cache_dirty:
>>>   	 *
>>> -	 * Note: with the change above which replaced cache_level with pat_index,
>>> -	 * the use of @cache_dirty is limited to the objects created by kernel
>>> -	 * or by userspace without pat index specified.
>>> -	 * Check for @pat_set_by_user to find out if an object has pat index set
>>> -	 * by userspace. The ioctl's to change cache settings have also been
>>> -	 * disabled for the objects with pat_index set by userspace. Please don't
>>> -	 * assume @cache_dirty is set as describe here. Also see helper function
>>> -	 * i915_gem_object_has_cache_level() for possible ways to bypass the use
>>> -	 * of this field.
>>> -	 *
>>>   	 * Track if we are we dirty with writes through the CPU cache for this
>>>   	 * object. As a result reading directly from main memory might yield
>>>   	 * stale data.
>>> @@ -531,9 +425,9 @@ struct drm_i915_gem_object {
>>>   	 *
>>>   	 *   1. All userspace objects, by default, have @cache_level set as
>>>   	 *   I915_CACHE_NONE. The only exception is userptr objects, where we
>>> -	 *   instead force I915_CACHE_LLC, but we also don't allow userspace to
>>> -	 *   ever change the @cache_level for such objects. Another special case
>>> -	 *   is dma-buf, which doesn't rely on @cache_dirty,  but there we
>>> +	 *   instead force I915_CACHE_CACHED, but we also don't allow userspace
>>> +	 *   to ever change the @cache_level for such objects. Another special
>>> +	 *   case is dma-buf, which doesn't rely on @cache_dirty,  but there we
>>>   	 *   always do a forced flush when acquiring the pages, if there is a
>>>   	 *   chance that the pages can be read directly from main memory with
>>>   	 *   the GPU.
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>>> index 8f1633c3fb93..aba908f0349f 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>>> @@ -584,7 +584,7 @@ static int shmem_object_init(struct intel_memory_region *mem,
>>>   	static struct lock_class_key lock_class;
>>>   	struct drm_i915_private *i915 = mem->i915;
>>>   	struct address_space *mapping;
>>> -	unsigned int cache_level;
>>> +	i915_cache_t cache;
>>>   	gfp_t mask;
>>>   	int ret;
>>>   
>>> @@ -628,11 +628,11 @@ static int shmem_object_init(struct intel_memory_region *mem,
>>>   		 * However, we maintain the display planes as UC, and so
>>>   		 * need to rebind when first used as such.
>>>   		 */
>>> -		cache_level = I915_CACHE_LLC;
>>> +		cache = I915_CACHE_CACHED;
>>>   	else
>>> -		cache_level = I915_CACHE_NONE;
>>> +		cache = I915_CACHE_NONE;
>>>   
>>> -	i915_gem_object_set_cache_coherency(obj, cache_level);
>>> +	i915_gem_object_set_cache_coherency(obj, cache);
>>>   
>>>   	i915_gem_object_init_memory_region(obj, mem);
>>>   
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>>> index 1c8eb806b7d3..cc907a1f1c53 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>>> @@ -691,7 +691,7 @@ static int __i915_gem_object_create_stolen(struct intel_memory_region *mem,
>>>   
>>>   	obj->stolen = stolen;
>>>   	obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
>>> -	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
>>> +	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>>>   	i915_gem_object_set_cache_coherency(obj, cache_level);
>>>   
>>>   	if (WARN_ON(!i915_gem_object_trylock(obj, NULL)))
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>>> index 6bd6c239f4ac..107176d1757b 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>>> @@ -48,14 +48,14 @@ void i915_ttm_migrate_set_ban_memcpy(bool ban)
>>>   }
>>>   #endif
>>>   
>>> -static enum i915_cache_level
>>> -i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
>>> -		     struct ttm_tt *ttm)
>>> +static i915_cache_t
>>> +i915_ttm_cache(struct drm_i915_private *i915, struct ttm_resource *res,
>>> +	       struct ttm_tt *ttm)
>>>   {
>>>   	return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
>>>   		!i915_ttm_gtt_binds_lmem(res) &&
>>> -		ttm->caching == ttm_cached) ? I915_CACHE_LLC :
>>> -		I915_CACHE_NONE;
>>> +		ttm->caching == ttm_cached) ? I915_CACHE_CACHED :
>>> +					      I915_CACHE_NONE;
>>>   }
>>>   
>>>   static unsigned int
>>> @@ -112,8 +112,8 @@ void i915_ttm_adjust_domains_after_move(struct drm_i915_gem_object *obj)
>>>   void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>>>   {
>>>   	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
>>> -	unsigned int cache_level;
>>>   	unsigned int mem_flags;
>>> +	i915_cache_t cache;
>>>   	unsigned int i;
>>>   	int mem_type;
>>>   
>>> @@ -126,13 +126,13 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>>>   	if (!bo->resource) {
>>>   		mem_flags = I915_BO_FLAG_STRUCT_PAGE;
>>>   		mem_type = I915_PL_SYSTEM;
>>> -		cache_level = I915_CACHE_NONE;
>>> +		cache = I915_CACHE_NONE;
>>>   	} else {
>>>   		mem_flags = i915_ttm_cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
>>>   			I915_BO_FLAG_STRUCT_PAGE;
>>>   		mem_type = bo->resource->mem_type;
>>> -		cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
>>> -						   bo->ttm);
>>> +		cache = i915_ttm_cache(to_i915(bo->base.dev), bo->resource,
>>> +				       bo->ttm);
>>>   	}
>>>   
>>>   	/*
>>> @@ -157,7 +157,7 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>>>   	obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
>>>   	obj->mem_flags |= mem_flags;
>>>   
>>> -	i915_gem_object_set_cache_coherency(obj, cache_level);
>>> +	i915_gem_object_set_cache_coherency(obj, cache);
>>>   }
>>>   
>>>   /**
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>>> index 1d3ebdf4069b..5d2891981bd4 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>>> @@ -553,7 +553,7 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
>>>   	obj->mem_flags = I915_BO_FLAG_STRUCT_PAGE;
>>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>>   
>>>   	obj->userptr.ptr = args->user_ptr;
>>>   	obj->userptr.notifier_seq = ULONG_MAX;
>>> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
>>> index bac957755068..77d04be5e9d7 100644
>>> --- a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
>>> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
>>> @@ -123,7 +123,7 @@ huge_gem_object(struct drm_i915_private *i915,
>>>   
>>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>>> -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
>>> +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>>>   	i915_gem_object_set_cache_coherency(obj, cache_level);
>>>   	obj->scratch = phys_size;
>>>   
>>> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>>> index 6bddd733d796..6ca5b9dbc414 100644
>>> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>>> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>>> @@ -200,9 +200,10 @@ huge_pages_object(struct drm_i915_private *i915,
>>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>>>   
>>> -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
>>> +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>>>   	i915_gem_object_set_cache_coherency(obj, cache_level);
>>>   
>>> +
>>>   	obj->mm.page_mask = page_mask;
>>>   
>>>   	return obj;
>>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>> index 675f71f06e89..3c93a73cf6b1 100644
>>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>> @@ -16,11 +16,11 @@
>>>   #include "intel_gtt.h"
>>>   
>>>   static u64 gen8_pde_encode(const dma_addr_t addr,
>>> -			   const enum i915_cache_level level)
>>> +			   const enum i915_cache_mode cache_mode)
>>>   {
>>>   	u64 pde = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>>>   
>>> -	if (level != I915_CACHE_NONE)
>>> +	if (cache_mode != I915_CACHE_MODE_UC)
>>>   		pde |= PPAT_CACHED_PDE;
>>>   	else
>>>   		pde |= PPAT_UNCACHED;
>>> @@ -43,10 +43,10 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>>>   	 * See translation table defined by LEGACY_CACHELEVEL.
>>>   	 */
>>>   	switch (pat_index) {
>>> -	case I915_CACHE_NONE:
>>> +	case I915_CACHE_MODE_UC:
>>>   		pte |= PPAT_UNCACHED;
>>>   		break;
>>> -	case I915_CACHE_WT:
>>> +	case I915_CACHE_MODE_WT:
>>>   		pte |= PPAT_DISPLAY_ELLC;
>>>   		break;
>>>   	default:
>>> @@ -893,7 +893,7 @@ static int gen8_init_scratch(struct i915_address_space *vm)
>>>   		}
>>>   
>>>   		fill_px(obj, vm->scratch[i - 1]->encode);
>>> -		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_NONE);
>>> +		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_MODE_UC);
>>>   
>>>   		vm->scratch[i] = obj;
>>>   	}
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>> index ee15486fed0d..f1e59e512d14 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>> @@ -1103,7 +1103,7 @@ static int init_status_page(struct intel_engine_cs *engine)
>>>   		return PTR_ERR(obj);
>>>   	}
>>>   
>>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>>   
>>>   	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>>>   	if (IS_ERR(vma)) {
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>> index fca61ddca8ad..ab5f654e7557 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>> @@ -1011,11 +1011,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
>>>   	return ggtt_probe_common(ggtt, size);
>>>   }
>>>   
>>> -/*
>>> - * For pre-gen8 platforms pat_index is the same as enum i915_cache_level,
>>> - * so the switch-case statements in these PTE encode functions are still valid.
>>> - * See translation table LEGACY_CACHELEVEL.
>>> - */
>>>   static u64 snb_pte_encode(dma_addr_t addr,
>>>   			  unsigned int pat_index,
>>>   			  u32 flags)
>>> @@ -1023,11 +1018,11 @@ static u64 snb_pte_encode(dma_addr_t addr,
>>>   	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>>>   
>>>   	switch (pat_index) {
>>> -	case I915_CACHE_L3_LLC:
>>> -	case I915_CACHE_LLC:
>>> +	case I915_CACHE_MODE_WB:
>>> +	case __I915_CACHE_MODE_WB_L3:
>>>   		pte |= GEN6_PTE_CACHE_LLC;
>>>   		break;
>>> -	case I915_CACHE_NONE:
>>> +	case I915_CACHE_MODE_UC:
>>>   		pte |= GEN6_PTE_UNCACHED;
>>>   		break;
>>>   	default:
>>> @@ -1044,13 +1039,13 @@ static u64 ivb_pte_encode(dma_addr_t addr,
>>>   	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>>>   
>>>   	switch (pat_index) {
>>> -	case I915_CACHE_L3_LLC:
>>> +	case __I915_CACHE_MODE_WB_L3:
>>>   		pte |= GEN7_PTE_CACHE_L3_LLC;
>>>   		break;
>>> -	case I915_CACHE_LLC:
>>> +	case I915_CACHE_MODE_WB:
>>>   		pte |= GEN6_PTE_CACHE_LLC;
>>>   		break;
>>> -	case I915_CACHE_NONE:
>>> +	case I915_CACHE_MODE_UC:
>>>   		pte |= GEN6_PTE_UNCACHED;
>>>   		break;
>>>   	default:
>>> @@ -1069,7 +1064,7 @@ static u64 byt_pte_encode(dma_addr_t addr,
>>>   	if (!(flags & PTE_READ_ONLY))
>>>   		pte |= BYT_PTE_WRITEABLE;
>>>   
>>> -	if (pat_index != I915_CACHE_NONE)
>>> +	if (pat_index != I915_CACHE_MODE_UC)
>>>   		pte |= BYT_PTE_SNOOPED_BY_CPU_CACHES;
>>>   
>>>   	return pte;
>>> @@ -1081,7 +1076,7 @@ static u64 hsw_pte_encode(dma_addr_t addr,
>>>   {
>>>   	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>>>   
>>> -	if (pat_index != I915_CACHE_NONE)
>>> +	if (pat_index != I915_CACHE_MODE_UC)
>>>   		pte |= HSW_WB_LLC_AGE3;
>>>   
>>>   	return pte;
>>> @@ -1094,9 +1089,9 @@ static u64 iris_pte_encode(dma_addr_t addr,
>>>   	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>>>   
>>>   	switch (pat_index) {
>>> -	case I915_CACHE_NONE:
>>> +	case I915_CACHE_MODE_UC:
>>>   		break;
>>> -	case I915_CACHE_WT:
>>> +	case I915_CACHE_MODE_WT:
>>>   		pte |= HSW_WT_ELLC_LLC_AGE3;
>>>   		break;
>>>   	default:
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
>>> index 866c416afb73..803c41ac4ccb 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
>>> @@ -21,7 +21,7 @@ static void gmch_ggtt_insert_page(struct i915_address_space *vm,
>>>   				  unsigned int pat_index,
>>>   				  u32 unused)
>>>   {
>>> -	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
>>> +	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
>>>   		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
>>>   
>>>   	intel_gmch_gtt_insert_page(addr, offset >> PAGE_SHIFT, flags);
>>> @@ -32,7 +32,7 @@ static void gmch_ggtt_insert_entries(struct i915_address_space *vm,
>>>   				     unsigned int pat_index,
>>>   				     u32 unused)
>>>   {
>>> -	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
>>> +	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
>>>   		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
>>>   
>>>   	intel_gmch_gtt_insert_sg_entries(vma_res->bi.pages, vma_res->start >> PAGE_SHIFT,
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>> index 065099362a98..48055304537a 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>> @@ -676,7 +676,7 @@ __vm_create_scratch_for_read(struct i915_address_space *vm, unsigned long size)
>>>   	if (IS_ERR(obj))
>>>   		return ERR_CAST(obj);
>>>   
>>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>>   
>>>   	vma = i915_vma_instance(obj, vm, NULL);
>>>   	if (IS_ERR(vma)) {
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> index 7192a534a654..af4277c1d577 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> @@ -636,7 +636,8 @@ void
>>>   __set_pd_entry(struct i915_page_directory * const pd,
>>>   	       const unsigned short idx,
>>>   	       struct i915_page_table *pt,
>>> -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level));
>>> +	       u64 (*encode)(const dma_addr_t,
>>> +			     const enum i915_cache_mode cache_mode));
>>>   
>>>   #define set_pd_entry(pd, idx, to) \
>>>   	__set_pd_entry((pd), (idx), px_pt(to), gen8_pde_encode)
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>>> index 436756bfbb1a..3e461d4f3693 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>>> @@ -98,14 +98,16 @@ void
>>>   __set_pd_entry(struct i915_page_directory * const pd,
>>>   	       const unsigned short idx,
>>>   	       struct i915_page_table * const to,
>>> -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level))
>>> +	       u64 (*encode)(const dma_addr_t,
>>> +			     const enum i915_cache_mode cache_mode))
>>>   {
>>>   	/* Each thread pre-pins the pd, and we may have a thread per pde. */
>>>   	GEM_BUG_ON(atomic_read(px_used(pd)) > NALLOC * I915_PDES);
>>>   
>>>   	atomic_inc(px_used(pd));
>>>   	pd->entry[idx] = to;
>>> -	write_dma_entry(px_base(pd), idx, encode(px_dma(to), I915_CACHE_LLC));
>>> +	write_dma_entry(px_base(pd), idx,
>>> +			encode(px_dma(to), I915_CACHE_MODE_WB));
>>>   }
>>>   
>>>   void
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>> index 92085ffd23de..9131d228d285 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>> @@ -551,7 +551,9 @@ alloc_context_vma(struct intel_engine_cs *engine)
>>>   	 * later platforms don't have L3 control bits in the PTE.
>>>   	 */
>>>   	if (IS_IVYBRIDGE(i915))
>>> -		i915_gem_object_set_cache_coherency(obj, I915_CACHE_L3_LLC);
>>> +		i915_gem_object_set_cache_coherency(obj,
>>> +						    I915_CACHE_CACHED |
>>> +						    __I915_CACHE_FLAG(L3));
>>>   
>>>   	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>>>   	if (IS_ERR(vma)) {
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
>>> index b9640212d659..025ce54c886d 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
>>> @@ -26,7 +26,7 @@ static struct i915_vma *hwsp_alloc(struct intel_gt *gt)
>>>   	if (IS_ERR(obj))
>>>   		return ERR_CAST(obj);
>>>   
>>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>>   
>>>   	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>>>   	if (IS_ERR(vma))
>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>>> index 8b0d84f2aad2..fc278fa463b0 100644
>>> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>>> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>>> @@ -64,7 +64,7 @@ static int hang_init(struct hang *h, struct intel_gt *gt)
>>>   		goto err_hws;
>>>   	}
>>>   
>>> -	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_LLC);
>>> +	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_CACHED);
>>>   	vaddr = i915_gem_object_pin_map_unlocked(h->hws, I915_MAP_WB);
>>>   	if (IS_ERR(vaddr)) {
>>>   		err = PTR_ERR(vaddr);
>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>>> index 14a8b25b6204..d25990d33d44 100644
>>> --- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>>> +++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>>> @@ -111,7 +111,7 @@ read_nonprivs(struct intel_context *ce)
>>>   	if (IS_ERR(result))
>>>   		return result;
>>>   
>>> -	i915_gem_object_set_cache_coherency(result, I915_CACHE_LLC);
>>> +	i915_gem_object_set_cache_coherency(result, I915_CACHE_CACHED);
>>>   
>>>   	cs = i915_gem_object_pin_map_unlocked(result, I915_MAP_WB);
>>>   	if (IS_ERR(cs)) {
>>> diff --git a/drivers/gpu/drm/i915/i915_cache.c b/drivers/gpu/drm/i915/i915_cache.c
>>> index 06eb5933c719..f4ba1cb430d3 100644
>>> --- a/drivers/gpu/drm/i915/i915_cache.c
>>> +++ b/drivers/gpu/drm/i915/i915_cache.c
>>> @@ -6,13 +6,88 @@
>>>   #include "i915_cache.h"
>>>   #include "i915_drv.h"
>>>   
>>> -void i915_cache_init(struct drm_i915_private *i915)
>>> +int i915_cache_init(struct drm_i915_private *i915)
>>>   {
>>> -	i915->pat_uc = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
>>> -	drm_info(&i915->drm, "Using PAT index %u for uncached access\n",
>>> -		 i915->pat_uc);
>>> +	int ret;
>>>   
>>> -	i915->pat_wb = i915_gem_get_pat_index(i915, I915_CACHE_LLC);
>>> -	drm_info(&i915->drm, "Using PAT index %u for write-back access\n",
>>> -		 i915->pat_wb);
>>> +	ret = i915_cache_find_pat(i915, I915_CACHE_NONE);
>>> +	if (ret < 0) {
>>> +		drm_err(&i915->drm,
>>> +			"Failed to find PAT index for uncached access\n");
>>> +		return -ENODEV;
>>> +	}
>>> +	drm_info(&i915->drm, "Using PAT index %u for uncached access\n", ret);
>>> +	i915->pat_uc = ret;
>>> +
>>> +	ret = i915_cache_find_pat(i915, I915_CACHE_CACHED);
>>> +	if (ret < 0) {
>>> +		drm_err(&i915->drm,
>>> +			"Failed to find PAT index for write-back access\n");
>>> +		return -ENODEV;
>>> +	}
>>> +	drm_info(&i915->drm, "Using PAT index %u for write-back access\n", ret);
>>> +	i915->pat_wb = ret;
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache)
>>> +{
>>> +	const struct intel_device_info *info = INTEL_INFO(i915);
>>> +	int i;
>>> +
>>> +	for (i = 0; i < ARRAY_SIZE(info->cache_modes); i++) {
>>> +		if (info->cache_modes[i] == cache)
>>> +			return i;
>>> +	}
>>> +
>>> +	return -1;
>>> +}
>>> +
>>> +void i915_cache_print(char *buf, size_t buflen, const char *suffix,
>>> +		      i915_cache_t cache)
>>> +{
>>> +	const enum i915_cache_mode mode = I915_CACHE_MODE(cache);
>>> +	static const char * const mode_str[] = {
>>> +		[I915_CACHE_MODE_UC] = "UC",
>>> +		[I915_CACHE_MODE_WB] = "WB",
>>> +		[I915_CACHE_MODE_WT] = "WT",
>>> +		[I915_CACHE_MODE_WC] = "WC",
>>> +	};
>>> +	static const char * const flag_str[] = {
>>> +		[ilog2(I915_CACHE_FLAG_COH1W)] = "1-Way-Coherent",
>>> +		[ilog2(I915_CACHE_FLAG_COH2W)] = "2-Way-Coherent",
>>> +		[ilog2(I915_CACHE_FLAG_L3)] =    "L3",
>>> +		[ilog2(I915_CACHE_FLAG_CLOS1)] = "CLOS1",
>>> +		[ilog2(I915_CACHE_FLAG_CLOS2)] = "CLOS2",
>>> +	};
>>> +
>>> +	if (mode >= ARRAY_SIZE(mode_str)) {
>>> +		snprintf(buf, buflen, "0x%x%s", cache, suffix ?: "");
>>> +	} else {
>>> +		unsigned long flags = I915_CACHE_FLAGS(cache);
>>> +		unsigned long bit;
>>> +		int ret;
>>> +
>>> +		ret = snprintf(buf, buflen, "%s", mode_str[mode]);
>>> +		buf += ret;
>>> +		buflen -= ret;
>>> +
>>> +		/*
>>> +		 * Don't print "1-way-2-way", it would be confusing and 2-way
>>> +		 * implies 1-way anyway.
>>> +		 */
>>> +		if ((flags & (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W)) ==
>>> +		    (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W))
>>> +			flags &= ~I915_CACHE_FLAG_COH1W;
>>> +
>>> +		for_each_set_bit(bit, &flags, BITS_PER_TYPE(i915_cache_t)) {
>>> +			ret = snprintf(buf, buflen, "-%s", flag_str[bit]);
>>> +			buf += ret;
>>> +			buflen -= ret;
>>> +		}
>>> +
>>> +		if (suffix)
>>> +			snprintf(buf, buflen, "%s", suffix);
>>> +	}
>>>   }
>>> diff --git a/drivers/gpu/drm/i915/i915_cache.h b/drivers/gpu/drm/i915/i915_cache.h
>>> index cb68936fb8a2..d9e97318b942 100644
>>> --- a/drivers/gpu/drm/i915/i915_cache.h
>>> +++ b/drivers/gpu/drm/i915/i915_cache.h
>>> @@ -6,8 +6,76 @@
>>>   #ifndef __I915_CACHE_H__
>>>   #define __I915_CACHE_H__
>>>   
>>> +#include <linux/types.h>
>>> +
>>> +struct drm_printer;
>>> +
>>>   struct drm_i915_private;
>>>   
>>> -void i915_cache_init(struct drm_i915_private *i915);
>>> +typedef u16 i915_cache_t;
>>> +
>>> +/* Cache modes */
>>> +enum i915_cache_mode {
>>> +	I915_CACHE_MODE_UC = 0,
>>> +	I915_CACHE_MODE_WB,
>>> +	__I915_CACHE_MODE_WB_L3, /* Special do-not-use entry for legacy 1:1 mapping. */
>>> +	I915_CACHE_MODE_WT,
>>> +	I915_CACHE_MODE_WC,
>>> +	I915_NUM_CACHE_MODES
>>> +};
>>> +
>>> +/* Cache mode flag bits */
>>> +#define I915_CACHE_FLAG_COH1W	(0x1)
>>> +#define I915_CACHE_FLAG_COH2W	(0x2) /* 1-way needs to be set too. */
>>> +#define I915_CACHE_FLAG_L3	(0x4)
>>> +#define I915_CACHE_FLAG_CLOS1	(0x8)
>>> +#define I915_CACHE_FLAG_CLOS2	(0x10)
>>> +
>>> +/*
>>> + * Overloaded I915_CACHE() macro based on:
>>> + *  https://stackoverflow.com/questions/3046889/optional-parameters-with-c-macros
>>> + *
>>> + * It is possible to call I915_CACHE with mode and zero or more flags as
>>> + * separate arguments. Ie these all work:
>>> + *
>>> + *   I915_CACHE(WB)
>>> + *   I915_CACHE(WB, COH1W, COH2W)
>>> + *   I915_CACHE(WB, COH1W, COH2W, L3)
>>> + */
>>> +
>>> +#define __I915_CACHE_FLAG(f) (I915_CACHE_FLAG_##f << 8)
>>> +#define __I915_CACHE(m, f) ((i915_cache_t)(I915_CACHE_MODE_##m | (f)))
>>> +
>>> +#define I915_CACHE_4(m, f1, f2, f3)	__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2) | __I915_CACHE_FLAG(f3))
>>> +#define I915_CACHE_3(m, f1, f2)		__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2))
>>> +#define I915_CACHE_2(m, f1)		__I915_CACHE(m, __I915_CACHE_FLAG(f1))
>>> +#define I915_CACHE_1(m)			__I915_CACHE(m, 0)
>>> +#define I915_CACHE_0(m)			__I915_CACHE(WC, 0)
>>> +
>>> +#define FUNC_CHOOSER(_f1, _f2, _f3, _f4, _f5, ...) _f5
>>> +#define FUNC_RECOMPOSER(argsWithParentheses) FUNC_CHOOSER argsWithParentheses
>>> +#define CHOOSE_FROM_ARG_COUNT(...) FUNC_RECOMPOSER((__VA_ARGS__, I915_CACHE_4, I915_CACHE_3, I915_CACHE_2, I915_CACHE_1, ))
>>> +#define NO_ARG_EXPANDER() ,,,I915_CACHE_0
>>> +#define MACRO_CHOOSER(...) CHOOSE_FROM_ARG_COUNT(NO_ARG_EXPANDER __VA_ARGS__ ())
>>> +
>>> +#define I915_CACHE(...) MACRO_CHOOSER(__VA_ARGS__)(__VA_ARGS__)
>>> +
>>> +/* i915_cache_t mode and flags extraction helpers. */
>>> +#define I915_CACHE_MODE(cache) \
>>> +	((enum i915_cache_mode)(((i915_cache_t)(cache)) & 0xff))
>>> +#define I915_CACHE_FLAGS(cache) \
>>> +	((unsigned int)((((i915_cache_t)(cache) & 0xff00)) >> 8))
>>> +
>>> +/* Helpers for i915 caching modes. */
>>> +#define I915_CACHE_NONE		I915_CACHE(UC)
>>> +#define I915_CACHE_CACHED	I915_CACHE(WB, COH1W, COH2W)
>>> +#define I915_CACHE_WT		I915_CACHE(WT)
>>> +
>>> +int i915_cache_init(struct drm_i915_private *i915);
>>> +int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache);
>>> +void i915_cache_print(char *buf, size_t buflen, const char *suffix,
>>> +		      i915_cache_t cache);
>>> +
>>> +#define I915_CACHE_NAME_LEN (40)
>>>   
>>>   #endif /* __I915_CACHE_H__ */
>>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>>> index 4de44cf1026d..4ec292011546 100644
>>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>>> @@ -140,57 +140,18 @@ static const char *stringify_vma_type(const struct i915_vma *vma)
>>>   	return "ppgtt";
>>>   }
>>>   
>>> -static const char *i915_cache_level_str(struct drm_i915_gem_object *obj)
>>> -{
>>> -	struct drm_i915_private *i915 = obj_to_i915(obj);
>>> -
>>> -	if (IS_METEORLAKE(i915)) {
>>> -		switch (obj->pat_index) {
>>> -		case 0: return " WB";
>>> -		case 1: return " WT";
>>> -		case 2: return " UC";
>>> -		case 3: return " WB (1-Way Coh)";
>>> -		case 4: return " WB (2-Way Coh)";
>>> -		default: return " not defined";
>>> -		}
>>> -	} else if (IS_PONTEVECCHIO(i915)) {
>>> -		switch (obj->pat_index) {
>>> -		case 0: return " UC";
>>> -		case 1: return " WC";
>>> -		case 2: return " WT";
>>> -		case 3: return " WB";
>>> -		case 4: return " WT (CLOS1)";
>>> -		case 5: return " WB (CLOS1)";
>>> -		case 6: return " WT (CLOS2)";
>>> -		case 7: return " WT (CLOS2)";
>>> -		default: return " not defined";
>>> -		}
>>> -	} else if (GRAPHICS_VER(i915) >= 12) {
>>> -		switch (obj->pat_index) {
>>> -		case 0: return " WB";
>>> -		case 1: return " WC";
>>> -		case 2: return " WT";
>>> -		case 3: return " UC";
>>> -		default: return " not defined";
>>> -		}
>>> -	} else {
>>> -		switch (obj->pat_index) {
>>> -		case 0: return " UC";
>>> -		case 1: return HAS_LLC(i915) ?
>>> -			       " LLC" : " snooped";
>>> -		case 2: return " L3+LLC";
>>> -		case 3: return " WT";
>>> -		default: return " not defined";
>>> -		}
>>> -	}
>>> -}
>>> -
>>>   void
>>>   i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>>>   {
>>> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>>> +	char buf[I915_CACHE_NAME_LEN];
>>>   	struct i915_vma *vma;
>>>   	int pin_count = 0;
>>>   
>>> +	i915_cache_print(buf, sizeof(buf),
>>> +			 obj->pat_set_by_user ? "!" : NULL,
>>> +			 INTEL_INFO(i915)->cache_modes[obj->pat_index]);
>>> +
>>>   	seq_printf(m, "%pK: %c%c%c %8zdKiB %02x %02x %s%s%s",
>>>   		   &obj->base,
>>>   		   get_tiling_flag(obj),
>>> @@ -199,7 +160,7 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>>>   		   obj->base.size / 1024,
>>>   		   obj->read_domains,
>>>   		   obj->write_domain,
>>> -		   i915_cache_level_str(obj),
>>> +		   buf,
>>>   		   obj->mm.dirty ? " dirty" : "",
>>>   		   obj->mm.madv == I915_MADV_DONTNEED ? " purgeable" : "");
>>>   	if (obj->base.name)
>>> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
>>> index bb2223cc3470..8663388a524f 100644
>>> --- a/drivers/gpu/drm/i915/i915_driver.c
>>> +++ b/drivers/gpu/drm/i915/i915_driver.c
>>> @@ -241,7 +241,9 @@ static int i915_driver_early_probe(struct drm_i915_private *dev_priv)
>>>   	i915_memcpy_init_early(dev_priv);
>>>   	intel_runtime_pm_init_early(&dev_priv->runtime_pm);
>>>   
>>> -	i915_cache_init(dev_priv);
>>> +	ret = i915_cache_init(dev_priv);
>>> +	if (ret < 0)
>>> +		return ret;
>>>   
>>>   	ret = i915_workqueues_init(dev_priv);
>>>   	if (ret < 0)
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index 896aa48ed089..814705cfeb12 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -1144,19 +1144,6 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
>>>   	unsigned int i;
>>>   	int ret;
>>>   
>>> -	/*
>>> -	 * In the proccess of replacing cache_level with pat_index a tricky
>>> -	 * dependency is created on the definition of the enum i915_cache_level.
>>> -	 * in case this enum is changed, PTE encode would be broken.
>>> -	 * Add a WARNING here. And remove when we completely quit using this
>>> -	 * enum
>>> -	 */
>>> -	BUILD_BUG_ON(I915_CACHE_NONE != 0 ||
>>> -		     I915_CACHE_LLC != 1 ||
>>> -		     I915_CACHE_L3_LLC != 2 ||
>>> -		     I915_CACHE_WT != 3 ||
>>> -		     I915_MAX_CACHE_LEVEL != 4);
>>> -
>>>   	/* We need to fallback to 4K pages if host doesn't support huge gtt. */
>>>   	if (intel_vgpu_active(dev_priv) && !intel_vgpu_has_huge_gtt(dev_priv))
>>>   		RUNTIME_INFO(dev_priv)->page_sizes = I915_GTT_PAGE_SIZE_4K;
>>> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
>>> index fcacdc21643c..565a60a1645d 100644
>>> --- a/drivers/gpu/drm/i915/i915_pci.c
>>> +++ b/drivers/gpu/drm/i915/i915_pci.c
>>> @@ -32,6 +32,7 @@
>>>   #include "gt/intel_sa_media.h"
>>>   #include "gem/i915_gem_object_types.h"
>>>   
>>> +#include "i915_cache.h"
>>>   #include "i915_driver.h"
>>>   #include "i915_drv.h"
>>>   #include "i915_pci.h"
>>> @@ -43,36 +44,43 @@
>>>   	.__runtime.graphics.ip.ver = (x), \
>>>   	.__runtime.media.ip.ver = (x)
>>>   
>>> -#define LEGACY_CACHELEVEL \
>>> -	.cachelevel_to_pat = { \
>>> -		[I915_CACHE_NONE]   = 0, \
>>> -		[I915_CACHE_LLC]    = 1, \
>>> -		[I915_CACHE_L3_LLC] = 2, \
>>> -		[I915_CACHE_WT]     = 3, \
>>> +#define LEGACY_CACHE_MODES \
>>> +	.cache_modes = { \
>>> +		[I915_CACHE_MODE_UC] 	  = I915_CACHE(UC), \
>>> +		[I915_CACHE_MODE_WB] 	  = I915_CACHE(WB, COH1W, COH2W), \
>>
>> Reading bspec 2863 (bdw) indicates that the CPU being able to snoop the
>> GPU's L3 was a new feature in gen8.  So for HSW and earlier, any
>> coherency was only 1-way (GPU could be coherent with CPU's caches, but
>> not vice-versa).  Only starting with gen8 did we get 2-way coherency as
>> an option where the CPU would also be coherent with the GPU cache (and
>> with gen8 and beyond you could still select 1-way instead of 2-way
>> coherency with instruction-level granularity via MOCS).  There are also
>> some legacy platforms (e.g., EHL/JSL on bspec 13948) where the IA wasn't
>> coherent with GPU L3 so we were back to 1-way coherency.
>>
>> So should we split LEGACY_CACHE_MODES into two tables with different
>> coherency settings attached to I915_CACHE_MODE_WB?
>>
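
For illustration, were we to split it, a minimal sketch could look like
the below - the table names and exact flag choices are my assumptions,
not something this patch implements:

	#define GEN8_CACHE_MODES /* IA can snoop GPU L3 from gen8 on. */ \
		.cache_modes = { \
			[I915_CACHE_MODE_UC]	  = I915_CACHE(UC), \
			[I915_CACHE_MODE_WB]	  = I915_CACHE(WB, COH1W, COH2W), \
			[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, COH2W, L3), \
			[I915_CACHE_MODE_WT]	  = I915_CACHE(WT), \
		}

	#define LEGACY_CACHE_MODES /* HSW and earlier, EHL/JSL: 1-way only. */ \
		.cache_modes = { \
			[I915_CACHE_MODE_UC]	  = I915_CACHE(UC), \
			[I915_CACHE_MODE_WB]	  = I915_CACHE(WB, COH1W), \
			[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, L3), \
			[I915_CACHE_MODE_WT]	  = I915_CACHE(WT), \
		}
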
>>> +		[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, COH2W, L3), \
>>> +		[I915_CACHE_MODE_WT] 	  = I915_CACHE(WT), \
>>>   	}
>>>   
>>> -#define TGL_CACHELEVEL \
>>> -	.cachelevel_to_pat = { \
>>> -		[I915_CACHE_NONE]   = 3, \
>>> -		[I915_CACHE_LLC]    = 0, \
>>> -		[I915_CACHE_L3_LLC] = 0, \
>>> -		[I915_CACHE_WT]     = 2, \
>>> +#define GEN12_CACHE_MODES \
>>> +	.cache_modes = { \
>>> +		[0] = I915_CACHE(WB, COH1W, COH2W), \
>>> +		[1] = I915_CACHE(WC), \
>>> +		[2] = I915_CACHE(WT), \
>>> +		[3] = I915_CACHE(UC), \
>>>   	}
>>>   
>>> -#define PVC_CACHELEVEL \
>>> -	.cachelevel_to_pat = { \
>>> -		[I915_CACHE_NONE]   = 0, \
>>> -		[I915_CACHE_LLC]    = 3, \
>>> -		[I915_CACHE_L3_LLC] = 3, \
>>> -		[I915_CACHE_WT]     = 2, \
>>> +/* FIXME: is WB 1-way or 2-way coherent for PAT indices 3, 5 and 7? */
>>> +
>>> +#define PVC_CACHE_MODES \
>>> +	.cache_modes = { \
>>> +		[0] = I915_CACHE(UC), \
>>> +		[1] = I915_CACHE(WC), \
>>> +		[2] = I915_CACHE(WT), \
>>> +		[3] = I915_CACHE(WB, COH1W), \
>>> +		[4] = I915_CACHE(WT, CLOS1), \
>>> +		[5] = I915_CACHE(WB, COH1W, CLOS1), \
>>> +		[6] = I915_CACHE(WT, CLOS2), \
>>> +		[7] = I915_CACHE(WB, COH1W, CLOS2), \
>>>   	}
>>>   
>>> -#define MTL_CACHELEVEL \
>>> -	.cachelevel_to_pat = { \
>>> -		[I915_CACHE_NONE]   = 2, \
>>> -		[I915_CACHE_LLC]    = 3, \
>>> -		[I915_CACHE_L3_LLC] = 3, \
>>> -		[I915_CACHE_WT]     = 1, \
>>> +#define MTL_CACHE_MODES \
>>> +	.cache_modes = { \
>>> +		[0] = I915_CACHE(WB), \
>>> +		[1] = I915_CACHE(WT), \
>>> +		[2] = I915_CACHE(UC), \
>>> +		[3] = I915_CACHE(WB, COH1W), \
>>> +		[4] = I915_CACHE(WB, COH1W, COH2W), \
>>
>> We may want a comment on this one since the "2W" part is sort of a lie.
>> Bspec 63884 has a programming note for MTL that says
>>
>>          "...Except for system atomics, setting Coherency Mode to 10 or
>>          11 results in this same one-way coherent behavior..."
>>
>> So if we ask for 2W, we actually only get 1W behavior except in a very
>> narrow set of cases.
>>
>>
>> Matt
>>
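
One possible shape for the comment Matt asks for - the wording is mine
and only a suggestion:

		/*
		 * Bspec 63884: other than for system atomics, Coherency
		 * Mode 10/11 behaves the same as the one-way coherent
		 * mode, so COH2W here describes the PAT encoding rather
		 * than the behaviour observed in the general case.
		 */
		[4] = I915_CACHE(WB, COH1W, COH2W), \
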
>>>   	}
>>>   
>>>   /* Keep in gen based order, and chronological order within a gen */
>>> @@ -97,7 +105,7 @@
>>>   	.max_pat_index = 3, \
>>>   	GEN_DEFAULT_PAGE_SIZES, \
>>>   	GEN_DEFAULT_REGIONS, \
>>> -	LEGACY_CACHELEVEL
>>> +	LEGACY_CACHE_MODES
>>>   
>>>   #define I845_FEATURES \
>>>   	GEN(2), \
>>> @@ -112,7 +120,7 @@
>>>   	.max_pat_index = 3, \
>>>   	GEN_DEFAULT_PAGE_SIZES, \
>>>   	GEN_DEFAULT_REGIONS, \
>>> -	LEGACY_CACHELEVEL
>>> +	LEGACY_CACHE_MODES
>>>   
>>>   static const struct intel_device_info i830_info = {
>>>   	I830_FEATURES,
>>> @@ -145,7 +153,7 @@ static const struct intel_device_info i865g_info = {
>>>   	.max_pat_index = 3, \
>>>   	GEN_DEFAULT_PAGE_SIZES, \
>>>   	GEN_DEFAULT_REGIONS, \
>>> -	LEGACY_CACHELEVEL
>>> +	LEGACY_CACHE_MODES
>>>   
>>>   static const struct intel_device_info i915g_info = {
>>>   	GEN3_FEATURES,
>>> @@ -208,7 +216,7 @@ static const struct intel_device_info pnv_m_info = {
>>>   	.max_pat_index = 3, \
>>>   	GEN_DEFAULT_PAGE_SIZES, \
>>>   	GEN_DEFAULT_REGIONS, \
>>> -	LEGACY_CACHELEVEL
>>> +	LEGACY_CACHE_MODES
>>>   
>>>   static const struct intel_device_info i965g_info = {
>>>   	GEN4_FEATURES,
>>> @@ -252,7 +260,7 @@ static const struct intel_device_info gm45_info = {
>>>   	.max_pat_index = 3, \
>>>   	GEN_DEFAULT_PAGE_SIZES, \
>>>   	GEN_DEFAULT_REGIONS, \
>>> -	LEGACY_CACHELEVEL
>>> +	LEGACY_CACHE_MODES
>>>   
>>>   static const struct intel_device_info ilk_d_info = {
>>>   	GEN5_FEATURES,
>>> @@ -282,7 +290,7 @@ static const struct intel_device_info ilk_m_info = {
>>>   	.__runtime.ppgtt_size = 31, \
>>>   	GEN_DEFAULT_PAGE_SIZES, \
>>>   	GEN_DEFAULT_REGIONS, \
>>> -	LEGACY_CACHELEVEL
>>> +	LEGACY_CACHE_MODES
>>>   
>>>   #define SNB_D_PLATFORM \
>>>   	GEN6_FEATURES, \
>>> @@ -330,7 +338,7 @@ static const struct intel_device_info snb_m_gt2_info = {
>>>   	.__runtime.ppgtt_size = 31, \
>>>   	GEN_DEFAULT_PAGE_SIZES, \
>>>   	GEN_DEFAULT_REGIONS, \
>>> -	LEGACY_CACHELEVEL
>>> +	LEGACY_CACHE_MODES
>>>   
>>>   #define IVB_D_PLATFORM \
>>>   	GEN7_FEATURES, \
>>> @@ -387,7 +395,7 @@ static const struct intel_device_info vlv_info = {
>>>   	.platform_engine_mask = BIT(RCS0) | BIT(VCS0) | BIT(BCS0),
>>>   	GEN_DEFAULT_PAGE_SIZES,
>>>   	GEN_DEFAULT_REGIONS,
>>> -	LEGACY_CACHELEVEL,
>>> +	LEGACY_CACHE_MODES
>>>   };
>>>   
>>>   #define G75_FEATURES  \
>>> @@ -473,7 +481,7 @@ static const struct intel_device_info chv_info = {
>>>   	.has_coherent_ggtt = false,
>>>   	GEN_DEFAULT_PAGE_SIZES,
>>>   	GEN_DEFAULT_REGIONS,
>>> -	LEGACY_CACHELEVEL,
>>> +	LEGACY_CACHE_MODES
>>>   };
>>>   
>>>   #define GEN9_DEFAULT_PAGE_SIZES \
>>> @@ -536,7 +544,7 @@ static const struct intel_device_info skl_gt4_info = {
>>>   	.max_pat_index = 3, \
>>>   	GEN9_DEFAULT_PAGE_SIZES, \
>>>   	GEN_DEFAULT_REGIONS, \
>>> -	LEGACY_CACHELEVEL
>>> +	LEGACY_CACHE_MODES
>>>   
>>>   static const struct intel_device_info bxt_info = {
>>>   	GEN9_LP_FEATURES,
>>> @@ -640,7 +648,7 @@ static const struct intel_device_info jsl_info = {
>>>   #define GEN12_FEATURES \
>>>   	GEN11_FEATURES, \
>>>   	GEN(12), \
>>> -	TGL_CACHELEVEL, \
>>> +	GEN12_CACHE_MODES, \
>>>   	.has_global_mocs = 1, \
>>>   	.has_pxp = 1, \
>>>   	.max_pat_index = 3
>>> @@ -708,7 +716,7 @@ static const struct intel_device_info adl_p_info = {
>>>   	.__runtime.graphics.ip.ver = 12, \
>>>   	.__runtime.graphics.ip.rel = 50, \
>>>   	XE_HP_PAGE_SIZES, \
>>> -	TGL_CACHELEVEL, \
>>> +	GEN12_CACHE_MODES, \
>>>   	.dma_mask_size = 46, \
>>>   	.has_3d_pipeline = 1, \
>>>   	.has_64bit_reloc = 1, \
>>> @@ -803,7 +811,7 @@ static const struct intel_device_info pvc_info = {
>>>   		BIT(VCS0) |
>>>   		BIT(CCS0) | BIT(CCS1) | BIT(CCS2) | BIT(CCS3),
>>>   	.require_force_probe = 1,
>>> -	PVC_CACHELEVEL,
>>> +	PVC_CACHE_MODES
>>>   };
>>>   
>>>   static const struct intel_gt_definition xelpmp_extra_gt[] = {
>>> @@ -838,7 +846,7 @@ static const struct intel_device_info mtl_info = {
>>>   	.memory_regions = REGION_SMEM | REGION_STOLEN_LMEM,
>>>   	.platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(CCS0),
>>>   	.require_force_probe = 1,
>>> -	MTL_CACHELEVEL,
>>> +	MTL_CACHE_MODES
>>>   };
>>>   
>>>   #undef PLATFORM
>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>>> index 04bc1f4a1115..973175a64534 100644
>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>> @@ -1870,7 +1870,7 @@ static int alloc_oa_buffer(struct i915_perf_stream *stream)
>>>   		return PTR_ERR(bo);
>>>   	}
>>>   
>>> -	i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC);
>>> +	i915_gem_object_set_cache_coherency(bo, I915_CACHE_CACHED);
>>>   
>>>   	/* PreHSW required 512K alignment, HSW requires 16M */
>>>   	vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL);
>>> diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
>>> index dbfe6443457b..2ce13b7c48cb 100644
>>> --- a/drivers/gpu/drm/i915/intel_device_info.h
>>> +++ b/drivers/gpu/drm/i915/intel_device_info.h
>>> @@ -27,6 +27,8 @@
>>>   
>>>   #include <uapi/drm/i915_drm.h>
>>>   
>>> +#include "i915_cache.h"
>>> +
>>>   #include "intel_step.h"
>>>   
>>>   #include "gt/intel_engine_types.h"
>>> @@ -243,8 +245,8 @@ struct intel_device_info {
>>>   	 */
>>>   	const struct intel_runtime_info __runtime;
>>>   
>>> -	u32 cachelevel_to_pat[I915_MAX_CACHE_LEVEL];
>>> -	u32 max_pat_index;
>>> +	i915_cache_t cache_modes[8];
>>> +	unsigned int max_pat_index;
>>>   };
>>>   
>>>   struct intel_driver_caps {
>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>>> index f910ec9b6d2b..ba821e48baa5 100644
>>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>>> @@ -267,7 +267,7 @@ static int igt_evict_for_cache_color(void *arg)
>>>   		err = PTR_ERR(obj);
>>>   		goto cleanup;
>>>   	}
>>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>>   	quirk_add(obj, &objects);
>>>   
>>>   	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
>>> @@ -283,7 +283,7 @@ static int igt_evict_for_cache_color(void *arg)
>>>   		err = PTR_ERR(obj);
>>>   		goto cleanup;
>>>   	}
>>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>>   	quirk_add(obj, &objects);
>>>   
>>>   	/* Neighbouring; same colour - should fit */
>>> diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
>>> index 3c5e0952f1b8..4cfc5000d6ff 100644
>>> --- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
>>> +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
>>> @@ -23,7 +23,7 @@ int igt_spinner_init(struct igt_spinner *spin, struct intel_gt *gt)
>>>   		err = PTR_ERR(spin->hws);
>>>   		goto err;
>>>   	}
>>> -	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_LLC);
>>> +	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_CACHED);
>>>   
>>>   	spin->obj = i915_gem_object_create_internal(gt->i915, PAGE_SIZE);
>>>   	if (IS_ERR(spin->obj)) {
>>> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>>> index 1d1a457e2aee..8ae77bcf27fa 100644
>>> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>>> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>>> @@ -126,13 +126,13 @@ static const struct intel_device_info mock_info = {
>>>   	.memory_regions = REGION_SMEM,
>>>   	.platform_engine_mask = BIT(0),
>>>   
>>> -	/* simply use legacy cache level for mock device */
>>> +	/* Simply use legacy cache modes for the mock device. */
>>>   	.max_pat_index = 3,
>>> -	.cachelevel_to_pat = {
>>> -		[I915_CACHE_NONE]   = 0,
>>> -		[I915_CACHE_LLC]    = 1,
>>> -		[I915_CACHE_L3_LLC] = 2,
>>> -		[I915_CACHE_WT]     = 3,
>>> +	.cache_modes = {
>>> +		[0] = I915_CACHE(UC),
>>> +		[1] = I915_CACHE(WB, COH1W),
>>> +		[2] = I915_CACHE(WB, COH1W, COH2W, L3),
>>> +		[3] = I915_CACHE(WT),
>>>   	},
>>>   };
>>>   
>>> @@ -181,7 +181,7 @@ struct drm_i915_private *mock_gem_device(void)
>>>   	/* Set up device info and initial runtime info. */
>>>   	intel_device_info_driver_create(i915, pdev->device, &mock_info);
>>>   
>>> -	i915_cache_init(i915);
>>> +	WARN_ON(i915_cache_init(i915));
>>>   
>>>   	dev_pm_domain_set(&pdev->dev, &pm_domain);
>>>   	pm_runtime_enable(&pdev->dev);
>>> -- 
>>> 2.39.2
>>>
>>
>> -- 
>> Matt Roper
>> Graphics Software Engineer
>> Linux GPU Platform Enablement
>> Intel Corporation
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC 4/8] drm/i915: Refactor PAT/object cache handling
@ 2023-07-28 12:35         ` Tvrtko Ursulin
  0 siblings, 0 replies; 59+ messages in thread
From: Tvrtko Ursulin @ 2023-07-28 12:35 UTC (permalink / raw)
  To: Matt Roper
  Cc: Fei Yang, Tvrtko Ursulin, Intel-gfx, dri-devel, Andi Shyti, Chris Wilson


On 28/07/2023 01:17, Matt Roper wrote:
> On Thu, Jul 27, 2023 at 04:57:53PM -0700, Matt Roper wrote:
>> On Thu, Jul 27, 2023 at 03:55:00PM +0100, Tvrtko Ursulin wrote:
>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>
>>> Commit 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
>>> introduced PAT indices to i915 internal APIs, partially replacing the
>>> usage of the driver-internal cache_level, but also made a few sub-
>>> optimal design decisions which this patch tries to improve upon.
>>>
>>> The principal change here is to invert the per-platform cache level to
>>> PAT index table which was added by the referenced commit, and by doing
>>> so enable i915 to understand the cache mode behind each PAT index,
>>> changing the indices from opaque to transparent.
>>>
>>> Once we have the inverted table we are able to remove the hidden and
>>> misleading "return true" from i915_gem_object_has_cache_level and make
>>> the involved code paths clearer.
>>>
>>> To achieve this we replace the enum i915_cache_level with i915_cache_t,
>>> composed of a more detailed representation of each cache mode (base mode
>>> plus flags).
>>>
>>> In this way we are able to express the differences between the
>>> write-back coherency settings on Meteorlake, which in turn enables us
>>> to map the i915 "cached" mode to the correct Meteorlake PAT index.
>>>
>>> We can also replace the platform-dependent cache-mode-to-string code in
>>> debugfs and elsewhere with a single implementation based on i915_cache_t.
>>>
>>> v2:
>>>   * Fix PAT-to-cache-mode table for PVC. (Fei)
>>>   * Cache display caching mode too. (Fei)
>>>   * Improve and document criteria in i915_gem_object_can_bypass_llc() (Matt)
>>>
>>> v3:
>>>   * Checkpatch issues.
>>>   * Cache mode flags check fixed.
>>>
>>> v4:
>>>   * Fix intel_device_info->cache_modes array size. (Matt)
>>>   * Boolean cache mode and flags query. (Matt)
>>>   * Reduce number of cache macros with some macro magic.
>>>   * One more checkpatch fix.
>>>   * Tweak tables to show legacy and Gen12 WB is fully coherent.
>>>
>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> References: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>>> Cc: Fei Yang <fei.yang@intel.com>
>>> Cc: Andi Shyti <andi.shyti@linux.intel.com>
>>> Cc: Matt Roper <matthew.d.roper@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/gem/i915_gem_domain.c    |  60 +++++----
>>>   drivers/gpu/drm/i915/gem/i915_gem_domain.h    |   5 +-
>>>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   3 +-
>>>   drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
>>>   drivers/gpu/drm/i915/gem/i915_gem_mman.c      |   4 +-
>>>   drivers/gpu/drm/i915/gem/i915_gem_object.c    | 117 ++++++++++--------
>>>   drivers/gpu/drm/i915/gem/i915_gem_object.h    |  11 +-
>>>   .../gpu/drm/i915/gem/i915_gem_object_types.h  | 116 +----------------
>>>   drivers/gpu/drm/i915/gem/i915_gem_shmem.c     |   8 +-
>>>   drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |   2 +-
>>>   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  20 +--
>>>   drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   2 +-
>>>   .../drm/i915/gem/selftests/huge_gem_object.c  |   2 +-
>>>   .../gpu/drm/i915/gem/selftests/huge_pages.c   |   3 +-
>>>   drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  10 +-
>>>   drivers/gpu/drm/i915/gt/intel_engine_cs.c     |   2 +-
>>>   drivers/gpu/drm/i915/gt/intel_ggtt.c          |  25 ++--
>>>   drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c     |   4 +-
>>>   drivers/gpu/drm/i915/gt/intel_gtt.c           |   2 +-
>>>   drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +-
>>>   drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   6 +-
>>>   .../gpu/drm/i915/gt/intel_ring_submission.c   |   4 +-
>>>   drivers/gpu/drm/i915/gt/intel_timeline.c      |   2 +-
>>>   drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   2 +-
>>>   .../gpu/drm/i915/gt/selftest_workarounds.c    |   2 +-
>>>   drivers/gpu/drm/i915/i915_cache.c             |  89 +++++++++++--
>>>   drivers/gpu/drm/i915/i915_cache.h             |  70 ++++++++++-
>>>   drivers/gpu/drm/i915/i915_debugfs.c           |  53 ++------
>>>   drivers/gpu/drm/i915/i915_driver.c            |   4 +-
>>>   drivers/gpu/drm/i915/i915_gem.c               |  13 --
>>>   drivers/gpu/drm/i915/i915_pci.c               |  84 +++++++------
>>>   drivers/gpu/drm/i915/i915_perf.c              |   2 +-
>>>   drivers/gpu/drm/i915/intel_device_info.h      |   6 +-
>>>   .../gpu/drm/i915/selftests/i915_gem_evict.c   |   4 +-
>>>   drivers/gpu/drm/i915/selftests/igt_spinner.c  |   2 +-
>>>   .../gpu/drm/i915/selftests/mock_gem_device.c  |  14 +--
>>>   36 files changed, 391 insertions(+), 367 deletions(-)
>>>
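
For readers skimming the "table reversal" idea before diving into the
diff: the lookup described above boils down to a linear scan of the new
per-platform table. A sketch of the shape (the real
i915_cache_find_pat() lives in i915_cache.c and only its declaration
appears in the hunks below, so treat this body as an illustration):

	int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache)
	{
		const struct intel_device_info *info = INTEL_INFO(i915);
		int i;

		/* Reverse lookup - find the PAT index matching mode+flags. */
		for (i = 0; i < ARRAY_SIZE(info->cache_modes); i++)
			if (info->cache_modes[i] == cache)
				return i;

		return -1;
	}
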
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>> index 57db9c581bf6..c15f83de33af 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>> @@ -8,6 +8,7 @@
>>>   #include "display/intel_frontbuffer.h"
>>>   #include "gt/intel_gt.h"
>>>   
>>> +#include "i915_cache.h"
>>>   #include "i915_drv.h"
>>>   #include "i915_gem_clflush.h"
>>>   #include "i915_gem_domain.h"
>>> @@ -41,14 +42,17 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>>>   		return false;
>>>   
>>>   	/*
>>> -	 * For objects created by userspace through GEM_CREATE with pat_index
>>> -	 * set by set_pat extension, i915_gem_object_has_cache_level() will
>>> -	 * always return true, because the coherency of such object is managed
>>> -	 * by userspace. Othereise the call here would fall back to checking
>>> -	 * whether the object is un-cached or write-through.
>>> +	 * Always flush cache for UMD objects with PAT index set.
>>>   	 */
>>> -	return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
>>> -		 i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
>>> +	if (obj->pat_set_by_user)
>>> +		return true;
>>> +
>>> +	/*
>>> +	 * Fully coherent cached access may end up with data in the CPU cache
>>> +	 * which hasn't hit memory yet.
>>> +	 */
>>> +	return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
>>> +	       i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W);
>>>   }
>>>   
>>>   bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>>> @@ -268,7 +272,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>>>   /**
>>>    * i915_gem_object_set_cache_level - Changes the cache-level of an object across all VMA.
>>>    * @obj: object to act on
>>> - * @cache_level: new cache level to set for the object
>>> + * @cache: new caching mode to set for the object
>>>    *
>>>    * After this function returns, the object will be in the new cache-level
>>>    * across all GTT and the contents of the backing storage will be coherent,
>>> @@ -281,18 +285,28 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>>>    * that all direct access to the scanout remains coherent.
>>>    */
>>>   int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>>> -				    enum i915_cache_level cache_level)
>>> +				    i915_cache_t cache)
>>>   {
>>> -	int ret;
>>> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>>> +	int pat, ret;
>>>   
>>> -	/*
>>> -	 * For objects created by userspace through GEM_CREATE with pat_index
>>> -	 * set by set_pat extension, simply return 0 here without touching
>>> -	 * the cache setting, because such objects should have an immutable
>>> -	 * cache setting by desgin and always managed by userspace.
>>> -	 */
>>> -	if (i915_gem_object_has_cache_level(obj, cache_level))
>>> +	pat = i915_cache_find_pat(i915, cache);
>>> +	if (pat < 0) {
>>> +		char buf[I915_CACHE_NAME_LEN];
>>> +
>>> +		i915_cache_print(buf, sizeof(buf), NULL, cache);
>>> +		drm_err_ratelimited(&i915->drm,
>>> +				    "Attempting to use unknown caching mode %s!\n",
>>> +				    buf);
>>> +
>>> +		return -EINVAL;
>>> +	} else if (pat == obj->pat_index) {
>>>   		return 0;
>>> +	} else if (obj->pat_set_by_user) {
>>> +		drm_notice_once(&i915->drm,
>>> +				"Attempting to change caching mode on an object with fixed PAT!\n");
>>> +		return -EINVAL;
>>> +	}
>>>   
>>>   	ret = i915_gem_object_wait(obj,
>>>   				   I915_WAIT_INTERRUPTIBLE |
>>> @@ -302,7 +316,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>>>   		return ret;
>>>   
>>>   	/* Always invalidate stale cachelines */
>>> -	i915_gem_object_set_cache_coherency(obj, cache_level);
>>> +	i915_gem_object_set_pat_index(obj, pat);
>>>   	obj->cache_dirty = true;
>>>   
>>>   	/* The cache-level will be applied when each vma is rebound. */
>>> @@ -337,10 +351,10 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
>>>   		goto out;
>>>   	}
>>>   
>>> -	if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||
>>> -	    i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
>>> +	if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
>>> +	    i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W))
>>>   		args->caching = I915_CACHING_CACHED;
>>> -	else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
>>> +	else if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT))
>>>   		args->caching = I915_CACHING_DISPLAY;
>>>   	else
>>>   		args->caching = I915_CACHING_NONE;
>>> @@ -355,7 +369,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
>>>   	struct drm_i915_private *i915 = to_i915(dev);
>>>   	struct drm_i915_gem_caching *args = data;
>>>   	struct drm_i915_gem_object *obj;
>>> -	enum i915_cache_level level;
>>> +	i915_cache_t level;
>>>   	int ret = 0;
>>>   
>>>   	if (IS_DGFX(i915))
>>> @@ -378,7 +392,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
>>>   		if (!HAS_LLC(i915) && !HAS_SNOOP(i915))
>>>   			return -ENODEV;
>>>   
>>> -		level = I915_CACHE_LLC;
>>> +		level = I915_CACHE_CACHED;
>>>   		break;
>>>   	case I915_CACHING_DISPLAY:
>>>   		level = HAS_WT(i915) ? I915_CACHE_WT : I915_CACHE_NONE;
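
It may be worth spelling out the uAPI-visible effect of the new early
returns in i915_gem_object_set_cache_level() above: changing the caching
mode of an object whose PAT was fixed at creation is now rejected rather
than silently ignored. Roughly, from userspace (a sketch, assuming an
object created with the set_pat extension):

	struct drm_i915_gem_caching arg = {
		.handle = handle,	/* object with PAT set at creation */
		.caching = I915_CACHING_CACHED,
	};

	/* Previously a silent no-op ("return 0"), now -EINVAL: */
	if (ioctl(fd, DRM_IOCTL_I915_GEM_SET_CACHING, &arg))
		assert(errno == EINVAL);
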
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.h b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
>>> index 9622df962bfc..6da5c351f6fd 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.h
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
>>> @@ -6,10 +6,11 @@
>>>   #ifndef __I915_GEM_DOMAIN_H__
>>>   #define __I915_GEM_DOMAIN_H__
>>>   
>>> +#include "i915_cache.h"
>>> +
>>>   struct drm_i915_gem_object;
>>> -enum i915_cache_level;
>>>   
>>>   int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>>> -				    enum i915_cache_level cache_level);
>>> +				    i915_cache_t cache);
>>>   
>>>   #endif /* __I915_GEM_DOMAIN_H__ */
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>> index 0a1d40220020..9d6e49c8a4c6 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>> @@ -648,7 +648,8 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
>>>   	 */
>>>   	return (cache->has_llc ||
>>>   		obj->cache_dirty ||
>>> -		!i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
>>> +		!(obj->pat_set_by_user ||
>>> +		  i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)));
>>>   }
>>>   
>>>   static int eb_reserve_vma(struct i915_execbuffer *eb,
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
>>> index 6bc26b4b06b8..88c360c3d6a3 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
>>> @@ -170,7 +170,7 @@ __i915_gem_object_create_internal(struct drm_i915_private *i915,
>>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>>>   
>>> -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
>>> +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>>>   	i915_gem_object_set_cache_coherency(obj, cache_level);
>>>   
>>>   	return obj;
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> index aa4d842d4c5a..cd7f8ded0d6f 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> @@ -382,7 +382,6 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>>>   		goto err_reset;
>>>   	}
>>>   
>>> -	/* Access to snoopable pages through the GTT is incoherent. */
>>>   	/*
>>>   	 * For objects created by userspace through GEM_CREATE with pat_index
>>>   	 * set by set_pat extension, coherency is managed by userspace, make
>>> @@ -391,7 +390,8 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>>>   	 * objects. Otherwise this helper function would fall back to checking
>>>   	 * whether the object is un-cached.
>>>   	 */
>>> -	if (!(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
>>> +	if (!((obj->pat_set_by_user ||
>>> +	       i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)) ||
>>>   	      HAS_LLC(i915))) {
>>>   		ret = -EFAULT;
>>>   		goto err_unpin;
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> index 3dc4fbb67d2b..ec1f0be43d0d 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> @@ -45,33 +45,6 @@ static struct kmem_cache *slab_objects;
>>>   
>>>   static const struct drm_gem_object_funcs i915_gem_object_funcs;
>>>   
>>> -unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
>>> -				    enum i915_cache_level level)
>>> -{
>>> -	if (drm_WARN_ON(&i915->drm, level >= I915_MAX_CACHE_LEVEL))
>>> -		return 0;
>>> -
>>> -	return INTEL_INFO(i915)->cachelevel_to_pat[level];
>>> -}
>>> -
>>> -bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
>>> -				     enum i915_cache_level lvl)
>>> -{
>>> -	/*
>>> -	 * In case the pat_index is set by user space, this kernel mode
>>> -	 * driver should leave the coherency to be managed by user space,
>>> -	 * simply return true here.
>>> -	 */
>>> -	if (obj->pat_set_by_user)
>>> -		return true;
>>> -
>>> -	/*
>>> -	 * Otherwise the pat_index should have been converted from cache_level
>>> -	 * so that the following comparison is valid.
>>> -	 */
>>> -	return obj->pat_index == i915_gem_get_pat_index(obj_to_i915(obj), lvl);
>>> -}
>>> -
>>>   struct drm_i915_gem_object *i915_gem_object_alloc(void)
>>>   {
>>>   	struct drm_i915_gem_object *obj;
>>> @@ -144,30 +117,72 @@ void __i915_gem_object_fini(struct drm_i915_gem_object *obj)
>>>   	dma_resv_fini(&obj->base._resv);
>>>   }
>>>   
>>> +bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
>>> +				    enum i915_cache_mode mode)
>>> +{
>>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>>> +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
>>> +
>>> +	return I915_CACHE_MODE(cache) == mode;
>>> +}
>>> +
>>> +bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
>>> +				    unsigned int flag)
>>> +{
>>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>>> +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
>>> +
>>> +	return I915_CACHE_FLAGS(cache) & flag;
>>> +}
>>> +
>>> +static void __i915_gem_object_update_coherency(struct drm_i915_gem_object *obj)
>>> +{
>>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>>> +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
>>> +	const unsigned int flags = I915_CACHE_FLAGS(cache);
>>> +	const unsigned int mode = I915_CACHE_MODE(cache);
>>> +
>>> +	if (mode == I915_CACHE_MODE_WC ||
>>> +	    mode == I915_CACHE_MODE_WT ||
>>> +	    (mode == I915_CACHE_MODE_WB && (flags & I915_CACHE_FLAG_COH2W)))
> 
> Shouldn't we only need 1W coherency here?  With 1-way coherency GPU
> reads will snoop the CPU cache and GPU writes will invalidate the CPU
> cache.  2-way only matters for how CPU reads/writes interact with the
> GPU cache.

I thought so too at one point, but then I was not entirely sure. Our kerneldoc says:

	 * I915_BO_CACHE_COHERENT_FOR_WRITE:
	 *
	 * When writing through the CPU cache, the GPU is still coherent. Note
	 * that this also implies I915_BO_CACHE_COHERENT_FOR_READ.

The question is whether that only applies before handing the buffer over to the GPU, or also while it is in active use from both sides?

If it only applies before handing over, then 1-way is correct. But if it is the latter, then 2-way is needed, because the MTL bspec says 1-way only snoops from the GPU side until the first GPU access, after which point the GPU has its own copy of the cache line in its cache.

Bspec 59620 - although now I see that may not be MTL...
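
To make the two readings concrete, the difference boils down to which
flag gates the coherent-for-write case in
__i915_gem_object_update_coherency() - a sketch of the alternatives,
not a proposal:

	/* (a) 1-way suffices: the GPU snoops CPU writes whenever it reads. */
	coherent_for_write = mode == I915_CACHE_MODE_WB &&
			     (flags & I915_CACHE_FLAG_COH1W);

	/*
	 * (b) 2-way required: per the MTL note, 1-way stops snooping once
	 * the GPU holds the line in its own cache, so only full two-way
	 * coherency is safe for CPU writes during active GPU use.
	 */
	coherent_for_write = mode == I915_CACHE_MODE_WB &&
			     (flags & I915_CACHE_FLAG_COH2W);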

Regards,

Tvrtko

> 
> 
> Matt
> 
>>> +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ |
>>> +				      I915_BO_CACHE_COHERENT_FOR_WRITE;
>>> +	else if (HAS_LLC(i915))
>>> +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
>>> +	else
>>> +		obj->cache_coherent = 0;
>>> +
>>> +	obj->cache_dirty =
>>> +		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
>>> +		!IS_DGFX(i915);
>>> +}
>>> +
>>>   /**
>>>    * i915_gem_object_set_cache_coherency - Mark up the object's coherency levels
>>> - * for a given cache_level
>>> + * for a given caching mode
>>>    * @obj: #drm_i915_gem_object
>>> - * @cache_level: cache level
>>> + * @cache: cache mode
>>>    */
>>>   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>>> -					 unsigned int cache_level)
>>> +					 i915_cache_t cache)
>>>   {
>>> -	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>>> +	int found;
>>>   
>>> -	obj->pat_index = i915_gem_get_pat_index(i915, cache_level);
>>> +	found = i915_cache_find_pat(i915, cache);
>>> +	if (found < 0) {
>>> +		char buf[I915_CACHE_NAME_LEN];
>>>   
>>> -	if (cache_level != I915_CACHE_NONE)
>>> -		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
>>> -				       I915_BO_CACHE_COHERENT_FOR_WRITE);
>>> -	else if (HAS_LLC(i915))
>>> -		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
>>> -	else
>>> -		obj->cache_coherent = 0;
>>> +		i915_cache_print(buf, sizeof(buf), NULL, cache);
>>> +		drm_err_ratelimited(&i915->drm, "Unknown cache mode %s!\n",
>>> +				    buf);
>>>   
>>> -	obj->cache_dirty =
>>> -		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
>>> -		!IS_DGFX(i915);
>>> +		found = i915->pat_uc;
>>> +	}
>>> +
>>> +	obj->pat_index = found;
>>> +
>>> +	__i915_gem_object_update_coherency(obj);
>>>   }
>>>   
>>>   /**
>>> @@ -181,24 +196,18 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>>>   void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
>>>   				   unsigned int pat_index)
>>>   {
>>> -	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>>>   
>>>   	if (obj->pat_index == pat_index)
>>>   		return;
>>>   
>>> +	if (drm_WARN_ON_ONCE(&i915->drm,
>>> +			     pat_index > INTEL_INFO(i915)->max_pat_index))
>>> +		return;
>>> +
>>>   	obj->pat_index = pat_index;
>>>   
>>> -	if (pat_index != i915_gem_get_pat_index(i915, I915_CACHE_NONE))
>>> -		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
>>> -				       I915_BO_CACHE_COHERENT_FOR_WRITE);
>>> -	else if (HAS_LLC(i915))
>>> -		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
>>> -	else
>>> -		obj->cache_coherent = 0;
>>> -
>>> -	obj->cache_dirty =
>>> -		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
>>> -		!IS_DGFX(i915);
>>> +	__i915_gem_object_update_coherency(obj);
>>>   }
>>>   
>>>   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>> index 884a17275b3a..a5d4ee19d9be 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>>> @@ -13,6 +13,7 @@
>>>   
>>>   #include "display/intel_frontbuffer.h"
>>>   #include "intel_memory_region.h"
>>> +#include "i915_cache.h"
>>>   #include "i915_gem_object_types.h"
>>>   #include "i915_gem_gtt.h"
>>>   #include "i915_gem_ww.h"
>>> @@ -32,10 +33,6 @@ static inline bool i915_gem_object_size_2big(u64 size)
>>>   	return false;
>>>   }
>>>   
>>> -unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
>>> -				    enum i915_cache_level level);
>>> -bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
>>> -				     enum i915_cache_level lvl);
>>>   void i915_gem_init__objects(struct drm_i915_private *i915);
>>>   
>>>   void i915_objects_module_exit(void);
>>> @@ -764,8 +761,12 @@ int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
>>>   				      bool intr);
>>>   bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj);
>>>   
>>> +bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
>>> +				    enum i915_cache_mode mode);
>>> +bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
>>> +				    unsigned int flag);
>>>   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>>> -					 unsigned int cache_level);
>>> +					 i915_cache_t cache);
>>>   void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
>>>   				   unsigned int pat_index);
>>>   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj);
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>> index 8de2b91b3edf..6790e13ad262 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>>> @@ -14,6 +14,7 @@
>>>   #include <uapi/drm/i915_drm.h>
>>>   
>>>   #include "i915_active.h"
>>> +#include "i915_cache.h"
>>>   #include "i915_selftest.h"
>>>   #include "i915_vma_resource.h"
>>>   
>>> @@ -116,93 +117,6 @@ struct drm_i915_gem_object_ops {
>>>   	const char *name; /* friendly name for debug, e.g. lockdep classes */
>>>   };
>>>   
>>> -/**
>>> - * enum i915_cache_level - The supported GTT caching values for system memory
>>> - * pages.
>>> - *
>>> - * These translate to some special GTT PTE bits when binding pages into some
>>> - * address space. It also determines whether an object, or rather its pages are
>>> - * coherent with the GPU, when also reading or writing through the CPU cache
>>> - * with those pages.
>>> - *
>>> - * Userspace can also control this through struct drm_i915_gem_caching.
>>> - */
>>> -enum i915_cache_level {
>>> -	/**
>>> -	 * @I915_CACHE_NONE:
>>> -	 *
>>> -	 * GPU access is not coherent with the CPU cache. If the cache is dirty
>>> -	 * and we need the underlying pages to be coherent with some later GPU
>>> -	 * access then we need to manually flush the pages.
>>> -	 *
>>> -	 * On shared LLC platforms reads and writes through the CPU cache are
>>> -	 * still coherent even with this setting. See also
>>> -	 * &drm_i915_gem_object.cache_coherent for more details. Due to this we
>>> -	 * should only ever use uncached for scanout surfaces, otherwise we end
>>> -	 * up over-flushing in some places.
>>> -	 *
>>> -	 * This is the default on non-LLC platforms.
>>> -	 */
>>> -	I915_CACHE_NONE = 0,
>>> -	/**
>>> -	 * @I915_CACHE_LLC:
>>> -	 *
>>> -	 * GPU access is coherent with the CPU cache. If the cache is dirty,
>>> -	 * then the GPU will ensure that access remains coherent, when both
>>> -	 * reading and writing through the CPU cache. GPU writes can dirty the
>>> -	 * CPU cache.
>>> -	 *
>>> -	 * Not used for scanout surfaces.
>>> -	 *
>>> -	 * Applies to both platforms with shared LLC(HAS_LLC), and snooping
>>> -	 * based platforms(HAS_SNOOP).
>>> -	 *
>>> -	 * This is the default on shared LLC platforms.  The only exception is
>>> -	 * scanout objects, where the display engine is not coherent with the
>>> -	 * CPU cache. For such objects I915_CACHE_NONE or I915_CACHE_WT is
>>> -	 * automatically applied by the kernel in pin_for_display, if userspace
>>> -	 * has not done so already.
>>> -	 */
>>> -	I915_CACHE_LLC,
>>> -	/**
>>> -	 * @I915_CACHE_L3_LLC:
>>> -	 *
>>> -	 * Explicitly enable the Gfx L3 cache, with coherent LLC.
>>> -	 *
>>> -	 * The Gfx L3 sits between the domain specific caches, e.g
>>> -	 * sampler/render caches, and the larger LLC. LLC is coherent with the
>>> -	 * GPU, but L3 is only visible to the GPU, so likely needs to be flushed
>>> -	 * when the workload completes.
>>> -	 *
>>> -	 * Not used for scanout surfaces.
>>> -	 *
>>> -	 * Only exposed on some gen7 + GGTT. More recent hardware has dropped
>>> -	 * this explicit setting, where it should now be enabled by default.
>>> -	 */
>>> -	I915_CACHE_L3_LLC,
>>> -	/**
>>> -	 * @I915_CACHE_WT:
>>> -	 *
>>> -	 * Write-through. Used for scanout surfaces.
>>> -	 *
>>> -	 * The GPU can utilise the caches, while still having the display engine
>>> -	 * be coherent with GPU writes, as a result we don't need to flush the
>>> -	 * CPU caches when moving out of the render domain. This is the default
>>> -	 * setting chosen by the kernel, if supported by the HW, otherwise we
>>> -	 * fallback to I915_CACHE_NONE. On the CPU side writes through the CPU
>>> -	 * cache still need to be flushed, to remain coherent with the display
>>> -	 * engine.
>>> -	 */
>>> -	I915_CACHE_WT,
>>> -	/**
>>> -	 * @I915_MAX_CACHE_LEVEL:
>>> -	 *
>>> -	 * Mark the last entry in the enum. Used for defining cachelevel_to_pat
>>> -	 * array for cache_level to pat translation table.
>>> -	 */
>>> -	I915_MAX_CACHE_LEVEL,
>>> -};
>>> -
>>>   enum i915_map_type {
>>>   	I915_MAP_WB = 0,
>>>   	I915_MAP_WC,
>>> @@ -403,16 +317,6 @@ struct drm_i915_gem_object {
>>>   	/**
>>>   	 * @cache_coherent:
>>>   	 *
>>> -	 * Note: with the change above which replaced @cache_level with pat_index,
>>> -	 * the use of @cache_coherent is limited to the objects created by kernel
>>> -	 * or by userspace without pat index specified.
>>> -	 * Check for @pat_set_by_user to find out if an object has pat index set
>>> -	 * by userspace. The ioctl's to change cache settings have also been
>>> -	 * disabled for the objects with pat index set by userspace. Please don't
>>> -	 * assume @cache_coherent having the flags set as describe here. A helper
>>> -	 * function i915_gem_object_has_cache_level() provides one way to bypass
>>> -	 * the use of this field.
>>> -	 *
>>>   	 * Track whether the pages are coherent with the GPU if reading or
>>>   	 * writing through the CPU caches. The largely depends on the
>>>   	 * @cache_level setting.
>>> @@ -447,7 +351,7 @@ struct drm_i915_gem_object {
>>>   	 * flushing the surface just before doing the scanout.  This does mean
>>>   	 * we might unnecessarily flush non-scanout objects in some places, but
>>>   	 * the default assumption is that all normal objects should be using
>>> -	 * I915_CACHE_LLC, at least on platforms with the shared LLC.
>>> +	 * I915_CACHE_CACHED, at least on platforms with the shared LLC.
>>>   	 *
>>>   	 * Supported values:
>>>   	 *
>>> @@ -486,16 +390,6 @@ struct drm_i915_gem_object {
>>>   	/**
>>>   	 * @cache_dirty:
>>>   	 *
>>> -	 * Note: with the change above which replaced cache_level with pat_index,
>>> -	 * the use of @cache_dirty is limited to the objects created by kernel
>>> -	 * or by userspace without pat index specified.
>>> -	 * Check for @pat_set_by_user to find out if an object has pat index set
>>> -	 * by userspace. The ioctl's to change cache settings have also been
>>> -	 * disabled for the objects with pat_index set by userspace. Please don't
>>> -	 * assume @cache_dirty is set as describe here. Also see helper function
>>> -	 * i915_gem_object_has_cache_level() for possible ways to bypass the use
>>> -	 * of this field.
>>> -	 *
>>>   	 * Track if we are we dirty with writes through the CPU cache for this
>>>   	 * object. As a result reading directly from main memory might yield
>>>   	 * stale data.
>>> @@ -531,9 +425,9 @@ struct drm_i915_gem_object {
>>>   	 *
>>>   	 *   1. All userspace objects, by default, have @cache_level set as
>>>   	 *   I915_CACHE_NONE. The only exception is userptr objects, where we
>>> -	 *   instead force I915_CACHE_LLC, but we also don't allow userspace to
>>> -	 *   ever change the @cache_level for such objects. Another special case
>>> -	 *   is dma-buf, which doesn't rely on @cache_dirty,  but there we
>>> +	 *   instead force I915_CACHE_CACHED, but we also don't allow userspace
>>> +	 *   to ever change the @cache_level for such objects. Another special
>>> +	 *   case is dma-buf, which doesn't rely on @cache_dirty,  but there we
>>>   	 *   always do a forced flush when acquiring the pages, if there is a
>>>   	 *   chance that the pages can be read directly from main memory with
>>>   	 *   the GPU.
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>>> index 8f1633c3fb93..aba908f0349f 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>>> @@ -584,7 +584,7 @@ static int shmem_object_init(struct intel_memory_region *mem,
>>>   	static struct lock_class_key lock_class;
>>>   	struct drm_i915_private *i915 = mem->i915;
>>>   	struct address_space *mapping;
>>> -	unsigned int cache_level;
>>> +	i915_cache_t cache;
>>>   	gfp_t mask;
>>>   	int ret;
>>>   
>>> @@ -628,11 +628,11 @@ static int shmem_object_init(struct intel_memory_region *mem,
>>>   		 * However, we maintain the display planes as UC, and so
>>>   		 * need to rebind when first used as such.
>>>   		 */
>>> -		cache_level = I915_CACHE_LLC;
>>> +		cache = I915_CACHE_CACHED;
>>>   	else
>>> -		cache_level = I915_CACHE_NONE;
>>> +		cache = I915_CACHE_NONE;
>>>   
>>> -	i915_gem_object_set_cache_coherency(obj, cache_level);
>>> +	i915_gem_object_set_cache_coherency(obj, cache);
>>>   
>>>   	i915_gem_object_init_memory_region(obj, mem);
>>>   
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>>> index 1c8eb806b7d3..cc907a1f1c53 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>>> @@ -691,7 +691,7 @@ static int __i915_gem_object_create_stolen(struct intel_memory_region *mem,
>>>   
>>>   	obj->stolen = stolen;
>>>   	obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
>>> -	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
>>> +	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>>>   	i915_gem_object_set_cache_coherency(obj, cache_level);
>>>   
>>>   	if (WARN_ON(!i915_gem_object_trylock(obj, NULL)))
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>>> index 6bd6c239f4ac..107176d1757b 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>>> @@ -48,14 +48,14 @@ void i915_ttm_migrate_set_ban_memcpy(bool ban)
>>>   }
>>>   #endif
>>>   
>>> -static enum i915_cache_level
>>> -i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
>>> -		     struct ttm_tt *ttm)
>>> +static i915_cache_t
>>> +i915_ttm_cache(struct drm_i915_private *i915, struct ttm_resource *res,
>>> +	       struct ttm_tt *ttm)
>>>   {
>>>   	return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
>>>   		!i915_ttm_gtt_binds_lmem(res) &&
>>> -		ttm->caching == ttm_cached) ? I915_CACHE_LLC :
>>> -		I915_CACHE_NONE;
>>> +		ttm->caching == ttm_cached) ? I915_CACHE_CACHED :
>>> +					      I915_CACHE_NONE;
>>>   }
>>>   
>>>   static unsigned int
>>> @@ -112,8 +112,8 @@ void i915_ttm_adjust_domains_after_move(struct drm_i915_gem_object *obj)
>>>   void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>>>   {
>>>   	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
>>> -	unsigned int cache_level;
>>>   	unsigned int mem_flags;
>>> +	i915_cache_t cache;
>>>   	unsigned int i;
>>>   	int mem_type;
>>>   
>>> @@ -126,13 +126,13 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>>>   	if (!bo->resource) {
>>>   		mem_flags = I915_BO_FLAG_STRUCT_PAGE;
>>>   		mem_type = I915_PL_SYSTEM;
>>> -		cache_level = I915_CACHE_NONE;
>>> +		cache = I915_CACHE_NONE;
>>>   	} else {
>>>   		mem_flags = i915_ttm_cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
>>>   			I915_BO_FLAG_STRUCT_PAGE;
>>>   		mem_type = bo->resource->mem_type;
>>> -		cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
>>> -						   bo->ttm);
>>> +		cache = i915_ttm_cache(to_i915(bo->base.dev), bo->resource,
>>> +				       bo->ttm);
>>>   	}
>>>   
>>>   	/*
>>> @@ -157,7 +157,7 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>>>   	obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
>>>   	obj->mem_flags |= mem_flags;
>>>   
>>> -	i915_gem_object_set_cache_coherency(obj, cache_level);
>>> +	i915_gem_object_set_cache_coherency(obj, cache);
>>>   }
>>>   
>>>   /**
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>>> index 1d3ebdf4069b..5d2891981bd4 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>>> @@ -553,7 +553,7 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
>>>   	obj->mem_flags = I915_BO_FLAG_STRUCT_PAGE;
>>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>>   
>>>   	obj->userptr.ptr = args->user_ptr;
>>>   	obj->userptr.notifier_seq = ULONG_MAX;
>>> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
>>> index bac957755068..77d04be5e9d7 100644
>>> --- a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
>>> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
>>> @@ -123,7 +123,7 @@ huge_gem_object(struct drm_i915_private *i915,
>>>   
>>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>>> -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
>>> +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>>>   	i915_gem_object_set_cache_coherency(obj, cache_level);
>>>   	obj->scratch = phys_size;
>>>   
>>> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>>> index 6bddd733d796..6ca5b9dbc414 100644
>>> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>>> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>>> @@ -200,9 +200,10 @@ huge_pages_object(struct drm_i915_private *i915,
>>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>>>   
>>> -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
>>> +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>>>   	i915_gem_object_set_cache_coherency(obj, cache_level);
>>>   
>>> +
>>>   	obj->mm.page_mask = page_mask;
>>>   
>>>   	return obj;
>>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>> index 675f71f06e89..3c93a73cf6b1 100644
>>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>> @@ -16,11 +16,11 @@
>>>   #include "intel_gtt.h"
>>>   
>>>   static u64 gen8_pde_encode(const dma_addr_t addr,
>>> -			   const enum i915_cache_level level)
>>> +			   const enum i915_cache_mode cache_mode)
>>>   {
>>>   	u64 pde = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>>>   
>>> -	if (level != I915_CACHE_NONE)
>>> +	if (cache_mode != I915_CACHE_MODE_UC)
>>>   		pde |= PPAT_CACHED_PDE;
>>>   	else
>>>   		pde |= PPAT_UNCACHED;
>>> @@ -43,10 +43,10 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>>>   	 * See translation table defined by LEGACY_CACHELEVEL.
>>>   	 */
>>>   	switch (pat_index) {
>>> -	case I915_CACHE_NONE:
>>> +	case I915_CACHE_MODE_UC:
>>>   		pte |= PPAT_UNCACHED;
>>>   		break;
>>> -	case I915_CACHE_WT:
>>> +	case I915_CACHE_MODE_WT:
>>>   		pte |= PPAT_DISPLAY_ELLC;
>>>   		break;
>>>   	default:
>>> @@ -893,7 +893,7 @@ static int gen8_init_scratch(struct i915_address_space *vm)
>>>   		}
>>>   
>>>   		fill_px(obj, vm->scratch[i - 1]->encode);
>>> -		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_NONE);
>>> +		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_MODE_UC);
>>>   
>>>   		vm->scratch[i] = obj;
>>>   	}
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>> index ee15486fed0d..f1e59e512d14 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>> @@ -1103,7 +1103,7 @@ static int init_status_page(struct intel_engine_cs *engine)
>>>   		return PTR_ERR(obj);
>>>   	}
>>>   
>>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>>   
>>>   	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>>>   	if (IS_ERR(vma)) {
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>> index fca61ddca8ad..ab5f654e7557 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>> @@ -1011,11 +1011,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
>>>   	return ggtt_probe_common(ggtt, size);
>>>   }
>>>   
>>> -/*
>>> - * For pre-gen8 platforms pat_index is the same as enum i915_cache_level,
>>> - * so the switch-case statements in these PTE encode functions are still valid.
>>> - * See translation table LEGACY_CACHELEVEL.
>>> - */
>>>   static u64 snb_pte_encode(dma_addr_t addr,
>>>   			  unsigned int pat_index,
>>>   			  u32 flags)
>>> @@ -1023,11 +1018,11 @@ static u64 snb_pte_encode(dma_addr_t addr,
>>>   	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>>>   
>>>   	switch (pat_index) {
>>> -	case I915_CACHE_L3_LLC:
>>> -	case I915_CACHE_LLC:
>>> +	case I915_CACHE_MODE_WB:
>>> +	case __I915_CACHE_MODE_WB_L3:
>>>   		pte |= GEN6_PTE_CACHE_LLC;
>>>   		break;
>>> -	case I915_CACHE_NONE:
>>> +	case I915_CACHE_MODE_UC:
>>>   		pte |= GEN6_PTE_UNCACHED;
>>>   		break;
>>>   	default:
>>> @@ -1044,13 +1039,13 @@ static u64 ivb_pte_encode(dma_addr_t addr,
>>>   	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>>>   
>>>   	switch (pat_index) {
>>> -	case I915_CACHE_L3_LLC:
>>> +	case __I915_CACHE_MODE_WB_L3:
>>>   		pte |= GEN7_PTE_CACHE_L3_LLC;
>>>   		break;
>>> -	case I915_CACHE_LLC:
>>> +	case I915_CACHE_MODE_WB:
>>>   		pte |= GEN6_PTE_CACHE_LLC;
>>>   		break;
>>> -	case I915_CACHE_NONE:
>>> +	case I915_CACHE_MODE_UC:
>>>   		pte |= GEN6_PTE_UNCACHED;
>>>   		break;
>>>   	default:
>>> @@ -1069,7 +1064,7 @@ static u64 byt_pte_encode(dma_addr_t addr,
>>>   	if (!(flags & PTE_READ_ONLY))
>>>   		pte |= BYT_PTE_WRITEABLE;
>>>   
>>> -	if (pat_index != I915_CACHE_NONE)
>>> +	if (pat_index != I915_CACHE_MODE_UC)
>>>   		pte |= BYT_PTE_SNOOPED_BY_CPU_CACHES;
>>>   
>>>   	return pte;
>>> @@ -1081,7 +1076,7 @@ static u64 hsw_pte_encode(dma_addr_t addr,
>>>   {
>>>   	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>>>   
>>> -	if (pat_index != I915_CACHE_NONE)
>>> +	if (pat_index != I915_CACHE_MODE_UC)
>>>   		pte |= HSW_WB_LLC_AGE3;
>>>   
>>>   	return pte;
>>> @@ -1094,9 +1089,9 @@ static u64 iris_pte_encode(dma_addr_t addr,
>>>   	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>>>   
>>>   	switch (pat_index) {
>>> -	case I915_CACHE_NONE:
>>> +	case I915_CACHE_MODE_UC:
>>>   		break;
>>> -	case I915_CACHE_WT:
>>> +	case I915_CACHE_MODE_WT:
>>>   		pte |= HSW_WT_ELLC_LLC_AGE3;
>>>   		break;
>>>   	default:
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
>>> index 866c416afb73..803c41ac4ccb 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
>>> @@ -21,7 +21,7 @@ static void gmch_ggtt_insert_page(struct i915_address_space *vm,
>>>   				  unsigned int pat_index,
>>>   				  u32 unused)
>>>   {
>>> -	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
>>> +	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
>>>   		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
>>>   
>>>   	intel_gmch_gtt_insert_page(addr, offset >> PAGE_SHIFT, flags);
>>> @@ -32,7 +32,7 @@ static void gmch_ggtt_insert_entries(struct i915_address_space *vm,
>>>   				     unsigned int pat_index,
>>>   				     u32 unused)
>>>   {
>>> -	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
>>> +	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
>>>   		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
>>>   
>>>   	intel_gmch_gtt_insert_sg_entries(vma_res->bi.pages, vma_res->start >> PAGE_SHIFT,
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>> index 065099362a98..48055304537a 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>> @@ -676,7 +676,7 @@ __vm_create_scratch_for_read(struct i915_address_space *vm, unsigned long size)
>>>   	if (IS_ERR(obj))
>>>   		return ERR_CAST(obj);
>>>   
>>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>>   
>>>   	vma = i915_vma_instance(obj, vm, NULL);
>>>   	if (IS_ERR(vma)) {
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> index 7192a534a654..af4277c1d577 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> @@ -636,7 +636,8 @@ void
>>>   __set_pd_entry(struct i915_page_directory * const pd,
>>>   	       const unsigned short idx,
>>>   	       struct i915_page_table *pt,
>>> -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level));
>>> +	       u64 (*encode)(const dma_addr_t,
>>> +			     const enum i915_cache_mode cache_mode));
>>>   
>>>   #define set_pd_entry(pd, idx, to) \
>>>   	__set_pd_entry((pd), (idx), px_pt(to), gen8_pde_encode)
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>>> index 436756bfbb1a..3e461d4f3693 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>>> @@ -98,14 +98,16 @@ void
>>>   __set_pd_entry(struct i915_page_directory * const pd,
>>>   	       const unsigned short idx,
>>>   	       struct i915_page_table * const to,
>>> -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level))
>>> +	       u64 (*encode)(const dma_addr_t,
>>> +			     const enum i915_cache_mode cache_mode))
>>>   {
>>>   	/* Each thread pre-pins the pd, and we may have a thread per pde. */
>>>   	GEM_BUG_ON(atomic_read(px_used(pd)) > NALLOC * I915_PDES);
>>>   
>>>   	atomic_inc(px_used(pd));
>>>   	pd->entry[idx] = to;
>>> -	write_dma_entry(px_base(pd), idx, encode(px_dma(to), I915_CACHE_LLC));
>>> +	write_dma_entry(px_base(pd), idx,
>>> +			encode(px_dma(to), I915_CACHE_MODE_WB));
>>>   }
>>>   
>>>   void
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>> index 92085ffd23de..9131d228d285 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>> @@ -551,7 +551,9 @@ alloc_context_vma(struct intel_engine_cs *engine)
>>>   	 * later platforms don't have L3 control bits in the PTE.
>>>   	 */
>>>   	if (IS_IVYBRIDGE(i915))
>>> -		i915_gem_object_set_cache_coherency(obj, I915_CACHE_L3_LLC);
>>> +		i915_gem_object_set_cache_coherency(obj,
>>> +						    I915_CACHE_CACHED |
>>> +						    __I915_CACHE_FLAG(L3));
>>>   
>>>   	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>>>   	if (IS_ERR(vma)) {
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
>>> index b9640212d659..025ce54c886d 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
>>> @@ -26,7 +26,7 @@ static struct i915_vma *hwsp_alloc(struct intel_gt *gt)
>>>   	if (IS_ERR(obj))
>>>   		return ERR_CAST(obj);
>>>   
>>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>>   
>>>   	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>>>   	if (IS_ERR(vma))
>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>>> index 8b0d84f2aad2..fc278fa463b0 100644
>>> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>>> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>>> @@ -64,7 +64,7 @@ static int hang_init(struct hang *h, struct intel_gt *gt)
>>>   		goto err_hws;
>>>   	}
>>>   
>>> -	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_LLC);
>>> +	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_CACHED);
>>>   	vaddr = i915_gem_object_pin_map_unlocked(h->hws, I915_MAP_WB);
>>>   	if (IS_ERR(vaddr)) {
>>>   		err = PTR_ERR(vaddr);
>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>>> index 14a8b25b6204..d25990d33d44 100644
>>> --- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>>> +++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>>> @@ -111,7 +111,7 @@ read_nonprivs(struct intel_context *ce)
>>>   	if (IS_ERR(result))
>>>   		return result;
>>>   
>>> -	i915_gem_object_set_cache_coherency(result, I915_CACHE_LLC);
>>> +	i915_gem_object_set_cache_coherency(result, I915_CACHE_CACHED);
>>>   
>>>   	cs = i915_gem_object_pin_map_unlocked(result, I915_MAP_WB);
>>>   	if (IS_ERR(cs)) {
>>> diff --git a/drivers/gpu/drm/i915/i915_cache.c b/drivers/gpu/drm/i915/i915_cache.c
>>> index 06eb5933c719..f4ba1cb430d3 100644
>>> --- a/drivers/gpu/drm/i915/i915_cache.c
>>> +++ b/drivers/gpu/drm/i915/i915_cache.c
>>> @@ -6,13 +6,88 @@
>>>   #include "i915_cache.h"
>>>   #include "i915_drv.h"
>>>   
>>> -void i915_cache_init(struct drm_i915_private *i915)
>>> +int i915_cache_init(struct drm_i915_private *i915)
>>>   {
>>> -	i915->pat_uc = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
>>> -	drm_info(&i915->drm, "Using PAT index %u for uncached access\n",
>>> -		 i915->pat_uc);
>>> +	int ret;
>>>   
>>> -	i915->pat_wb = i915_gem_get_pat_index(i915, I915_CACHE_LLC);
>>> -	drm_info(&i915->drm, "Using PAT index %u for write-back access\n",
>>> -		 i915->pat_wb);
>>> +	ret = i915_cache_find_pat(i915, I915_CACHE_NONE);
>>> +	if (ret < 0) {
>>> +		drm_err(&i915->drm,
>>> +			"Failed to find PAT index for uncached access\n");
>>> +		return -ENODEV;
>>> +	}
>>> +	drm_info(&i915->drm, "Using PAT index %u for uncached access\n", ret);
>>> +	i915->pat_uc = ret;
>>> +
>>> +	ret = i915_cache_find_pat(i915, I915_CACHE_CACHED);
>>> +	if (ret < 0) {
>>> +		drm_err(&i915->drm,
>>> +			"Failed to find PAT index for write-back access\n");
>>> +		return -ENODEV;
>>> +	}
>>> +	drm_info(&i915->drm, "Using PAT index %u for write-back access\n", ret);
>>> +	i915->pat_wb = ret;
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache)
>>> +{
>>> +	const struct intel_device_info *info = INTEL_INFO(i915);
>>> +	int i;
>>> +
>>> +	for (i = 0; i < ARRAY_SIZE(info->cache_modes); i++) {
>>> +		if (info->cache_modes[i] == cache)
>>> +			return i;
>>> +	}
>>> +
>>> +	return -1;
>>> +}
>>> +
>>> +void i915_cache_print(char *buf, size_t buflen, const char *suffix,
>>> +		      i915_cache_t cache)
>>> +{
>>> +	const enum i915_cache_mode mode = I915_CACHE_MODE(cache);
>>> +	static const char * const mode_str[] = {
>>> +		[I915_CACHE_MODE_UC] = "UC",
>>> +		[I915_CACHE_MODE_WB] = "WB",
>>> +		[I915_CACHE_MODE_WT] = "WT",
>>> +		[I915_CACHE_MODE_WC] = "WC",
>>> +	};
>>> +	static const char * const flag_str[] = {
>>> +		[ilog2(I915_CACHE_FLAG_COH1W)] = "1-Way-Coherent",
>>> +		[ilog2(I915_CACHE_FLAG_COH2W)] = "2-Way-Coherent",
>>> +		[ilog2(I915_CACHE_FLAG_L3)] =    "L3",
>>> +		[ilog2(I915_CACHE_FLAG_CLOS1)] = "CLOS1",
>>> +		[ilog2(I915_CACHE_FLAG_CLOS2)] = "CLOS2",
>>> +	};
>>> +
>>> +	if (mode >= ARRAY_SIZE(mode_str)) {
>>> +		snprintf(buf, buflen, "0x%x%s", cache, suffix ?: "");
>>> +	} else {
>>> +		unsigned long flags = I915_CACHE_FLAGS(cache);
>>> +		unsigned long bit;
>>> +		int ret;
>>> +
>>> +		ret = snprintf(buf, buflen, "%s", mode_str[mode]);
>>> +		buf += ret;
>>> +		buflen -= ret;
>>> +
>>> +		/*
>>> +		 * Don't print "1-way-2-way", it would be confusing and 2-way
>>> +		 * implies 1-way anyway.
>>> +		 */
>>> +		if ((flags & (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W)) ==
>>> +		    (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W))
>>> +			flags &= ~I915_CACHE_FLAG_COH1W;
>>> +
>>> +		for_each_set_bit(bit, &flags, BITS_PER_TYPE(i915_cache_t)) {
>>> +			ret = snprintf(buf, buflen, "-%s", flag_str[bit]);
>>> +			buf += ret;
>>> +			buflen -= ret;
>>> +		}
>>> +
>>> +		if (suffix)
>>> +			snprintf(buf, buflen, "%s", suffix);
>>> +	}
>>>   }
>>> diff --git a/drivers/gpu/drm/i915/i915_cache.h b/drivers/gpu/drm/i915/i915_cache.h
>>> index cb68936fb8a2..d9e97318b942 100644
>>> --- a/drivers/gpu/drm/i915/i915_cache.h
>>> +++ b/drivers/gpu/drm/i915/i915_cache.h
>>> @@ -6,8 +6,76 @@
>>>   #ifndef __I915_CACHE_H__
>>>   #define __I915_CACHE_H__
>>>   
>>> +#include <linux/types.h>
>>> +
>>> +struct drm_printer;
>>> +
>>>   struct drm_i915_private;
>>>   
>>> -void i915_cache_init(struct drm_i915_private *i915);
>>> +typedef u16 i915_cache_t;
>>> +
>>> +/* Cache modes */
>>> +enum i915_cache_mode {
>>> +	I915_CACHE_MODE_UC = 0,
>>> +	I915_CACHE_MODE_WB,
>>> +	__I915_CACHE_MODE_WB_L3, /* Special do-not-use entry for legacy 1:1 mapping. */
>>> +	I915_CACHE_MODE_WT,
>>> +	I915_CACHE_MODE_WC,
>>> +	I915_NUM_CACHE_MODES
>>> +};
>>> +
>>> +/* Cache mode flag bits */
>>> +#define I915_CACHE_FLAG_COH1W	(0x1)
>>> +#define I915_CACHE_FLAG_COH2W	(0x2) /* 1-way needs to be set too. */
>>> +#define I915_CACHE_FLAG_L3	(0x4)
>>> +#define I915_CACHE_FLAG_CLOS1	(0x8)
>>> +#define I915_CACHE_FLAG_CLOS2	(0x10)
>>> +
>>> +/*
>>> + * Overloaded I915_CACHE() macro based on:
>>> + *  https://stackoverflow.com/questions/3046889/optional-parameters-with-c-macros
>>> + *
>>> + * It is possible to call I915_CACHE with mode and zero or more flags as
>>> + * separate arguments. Ie these all work:
>>> + *
>>> + *   I915_CACHE(WB)
>>> + *   I915_CACHE(WB, COH1W, COH2W)
>>> + *   I915_CACHE(WB, COH1W, COH2W, L3)
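>>> + *
>>> + * For example I915_CACHE(WB, COH1W) expands to
>>> + * (I915_CACHE_MODE_WB | (I915_CACHE_FLAG_COH1W << 8)), ie. the mode
>>> + * lives in the low byte and the flags in the high byte.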
>>> + */
>>> +
>>> +#define __I915_CACHE_FLAG(f) (I915_CACHE_FLAG_##f << 8)
>>> +#define __I915_CACHE(m, f) ((i915_cache_t)(I915_CACHE_MODE_##m | (f)))
>>> +
>>> +#define I915_CACHE_4(m, f1, f2, f3)	__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2) | __I915_CACHE_FLAG(f3))
>>> +#define I915_CACHE_3(m, f1, f2)		__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2))
>>> +#define I915_CACHE_2(m, f1)		__I915_CACHE(m, __I915_CACHE_FLAG(f1))
>>> +#define I915_CACHE_1(m)			__I915_CACHE(m, 0)
>>> +#define I915_CACHE_0(m)			__I915_CACHE(WC, 0)
>>> +
>>> +#define FUNC_CHOOSER(_f1, _f2, _f3, _f4, _f5, ...) _f5
>>> +#define FUNC_RECOMPOSER(argsWithParentheses) FUNC_CHOOSER argsWithParentheses
>>> +#define CHOOSE_FROM_ARG_COUNT(...) FUNC_RECOMPOSER((__VA_ARGS__, I915_CACHE_4, I915_CACHE_3, I915_CACHE_2, I915_CACHE_1, ))
>>> +#define NO_ARG_EXPANDER() ,,,I915_CACHE_0
>>> +#define MACRO_CHOOSER(...) CHOOSE_FROM_ARG_COUNT(NO_ARG_EXPANDER __VA_ARGS__ ())
>>> +
>>> +#define I915_CACHE(...) MACRO_CHOOSER(__VA_ARGS__)(__VA_ARGS__)
>>> +
>>> +/* i915_cache_t mode and flags extraction helpers. */
>>> +#define I915_CACHE_MODE(cache) \
>>> +	((enum i915_cache_mode)(((i915_cache_t)(cache)) & 0xff))
>>> +#define I915_CACHE_FLAGS(cache) \
>>> +	((unsigned int)((((i915_cache_t)(cache) & 0xff00)) >> 8))
>>> +
>>> +/* Helpers for i915 caching modes. */
>>> +#define I915_CACHE_NONE		I915_CACHE(UC)
>>> +#define I915_CACHE_CACHED	I915_CACHE(WB, COH1W, COH2W)
>>> +#define I915_CACHE_WT		I915_CACHE(WT)
>>> +
>>> +int i915_cache_init(struct drm_i915_private *i915);
>>> +int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache);
>>> +void i915_cache_print(char *buf, size_t buflen, const char *suffix,
>>> +		      i915_cache_t cache);
>>> +
>>> +#define I915_CACHE_NAME_LEN (40)
>>>   
>>>   #endif /* __I915_CACHE_H__ */
>>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>>> index 4de44cf1026d..4ec292011546 100644
>>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>>> @@ -140,57 +140,18 @@ static const char *stringify_vma_type(const struct i915_vma *vma)
>>>   	return "ppgtt";
>>>   }
>>>   
>>> -static const char *i915_cache_level_str(struct drm_i915_gem_object *obj)
>>> -{
>>> -	struct drm_i915_private *i915 = obj_to_i915(obj);
>>> -
>>> -	if (IS_METEORLAKE(i915)) {
>>> -		switch (obj->pat_index) {
>>> -		case 0: return " WB";
>>> -		case 1: return " WT";
>>> -		case 2: return " UC";
>>> -		case 3: return " WB (1-Way Coh)";
>>> -		case 4: return " WB (2-Way Coh)";
>>> -		default: return " not defined";
>>> -		}
>>> -	} else if (IS_PONTEVECCHIO(i915)) {
>>> -		switch (obj->pat_index) {
>>> -		case 0: return " UC";
>>> -		case 1: return " WC";
>>> -		case 2: return " WT";
>>> -		case 3: return " WB";
>>> -		case 4: return " WT (CLOS1)";
>>> -		case 5: return " WB (CLOS1)";
>>> -		case 6: return " WT (CLOS2)";
>>> -		case 7: return " WT (CLOS2)";
>>> -		default: return " not defined";
>>> -		}
>>> -	} else if (GRAPHICS_VER(i915) >= 12) {
>>> -		switch (obj->pat_index) {
>>> -		case 0: return " WB";
>>> -		case 1: return " WC";
>>> -		case 2: return " WT";
>>> -		case 3: return " UC";
>>> -		default: return " not defined";
>>> -		}
>>> -	} else {
>>> -		switch (obj->pat_index) {
>>> -		case 0: return " UC";
>>> -		case 1: return HAS_LLC(i915) ?
>>> -			       " LLC" : " snooped";
>>> -		case 2: return " L3+LLC";
>>> -		case 3: return " WT";
>>> -		default: return " not defined";
>>> -		}
>>> -	}
>>> -}
>>> -
>>>   void
>>>   i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>>>   {
>>> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>>> +	char buf[I915_CACHE_NAME_LEN];
>>>   	struct i915_vma *vma;
>>>   	int pin_count = 0;
>>>   
>>> +	i915_cache_print(buf, sizeof(buf),
>>> +			 obj->pat_set_by_user ? "!" : NULL,
>>> +			 INTEL_INFO(i915)->cache_modes[obj->pat_index]);
>>> +
>>>   	seq_printf(m, "%pK: %c%c%c %8zdKiB %02x %02x %s%s%s",
>>>   		   &obj->base,
>>>   		   get_tiling_flag(obj),
>>> @@ -199,7 +160,7 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>>>   		   obj->base.size / 1024,
>>>   		   obj->read_domains,
>>>   		   obj->write_domain,
>>> -		   i915_cache_level_str(obj),
>>> +		   buf,
>>>   		   obj->mm.dirty ? " dirty" : "",
>>>   		   obj->mm.madv == I915_MADV_DONTNEED ? " purgeable" : "");
>>>   	if (obj->base.name)
>>> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
>>> index bb2223cc3470..8663388a524f 100644
>>> --- a/drivers/gpu/drm/i915/i915_driver.c
>>> +++ b/drivers/gpu/drm/i915/i915_driver.c
>>> @@ -241,7 +241,9 @@ static int i915_driver_early_probe(struct drm_i915_private *dev_priv)
>>>   	i915_memcpy_init_early(dev_priv);
>>>   	intel_runtime_pm_init_early(&dev_priv->runtime_pm);
>>>   
>>> -	i915_cache_init(dev_priv);
>>> +	ret = i915_cache_init(dev_priv);
>>> +	if (ret < 0)
>>> +		return ret;
>>>   
>>>   	ret = i915_workqueues_init(dev_priv);
>>>   	if (ret < 0)
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index 896aa48ed089..814705cfeb12 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -1144,19 +1144,6 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
>>>   	unsigned int i;
>>>   	int ret;
>>>   
>>> -	/*
>>> -	 * In the proccess of replacing cache_level with pat_index a tricky
>>> -	 * dependency is created on the definition of the enum i915_cache_level.
>>> -	 * in case this enum is changed, PTE encode would be broken.
>>> -	 * Add a WARNING here. And remove when we completely quit using this
>>> -	 * enum
>>> -	 */
>>> -	BUILD_BUG_ON(I915_CACHE_NONE != 0 ||
>>> -		     I915_CACHE_LLC != 1 ||
>>> -		     I915_CACHE_L3_LLC != 2 ||
>>> -		     I915_CACHE_WT != 3 ||
>>> -		     I915_MAX_CACHE_LEVEL != 4);
>>> -
>>>   	/* We need to fallback to 4K pages if host doesn't support huge gtt. */
>>>   	if (intel_vgpu_active(dev_priv) && !intel_vgpu_has_huge_gtt(dev_priv))
>>>   		RUNTIME_INFO(dev_priv)->page_sizes = I915_GTT_PAGE_SIZE_4K;
>>> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
>>> index fcacdc21643c..565a60a1645d 100644
>>> --- a/drivers/gpu/drm/i915/i915_pci.c
>>> +++ b/drivers/gpu/drm/i915/i915_pci.c
>>> @@ -32,6 +32,7 @@
>>>   #include "gt/intel_sa_media.h"
>>>   #include "gem/i915_gem_object_types.h"
>>>   
>>> +#include "i915_cache.h"
>>>   #include "i915_driver.h"
>>>   #include "i915_drv.h"
>>>   #include "i915_pci.h"
>>> @@ -43,36 +44,43 @@
>>>   	.__runtime.graphics.ip.ver = (x), \
>>>   	.__runtime.media.ip.ver = (x)
>>>   
>>> -#define LEGACY_CACHELEVEL \
>>> -	.cachelevel_to_pat = { \
>>> -		[I915_CACHE_NONE]   = 0, \
>>> -		[I915_CACHE_LLC]    = 1, \
>>> -		[I915_CACHE_L3_LLC] = 2, \
>>> -		[I915_CACHE_WT]     = 3, \
>>> +#define LEGACY_CACHE_MODES \
>>> +	.cache_modes = { \
>>> +		[I915_CACHE_MODE_UC] 	  = I915_CACHE(UC), \
>>> +		[I915_CACHE_MODE_WB] 	  = I915_CACHE(WB, COH1W, COH2W), \
>>
>> Reading bspec 2863 (bdw) indicates that the CPU being able to snoop the
>> GPU's L3 was a new feature in gen8.  So for HSW and earlier, any
>> coherency was only 1-way (GPU could be coherent with CPU's caches, but
>> not vice-versa).  Only starting with gen8 did we get 2-way coherency as
>> an option where the CPU would also be coherent with the GPU cache (and
>> with gen8 and beyond you could still select 1-way instead of 2-way
>> coherency with instruction-level granularity via MOCS).  There are also
>> some legacy platforms (e.g., EHL/JSL on bspec 13948) where the IA wasn't
>> coherent with GPU L3 so we were back to 1-way coherency.
>>
>> So should we split LEGACY_CACHE_MODES into two tables with different
>> coherency settings attached to I915_CACHE_MODE_WB?
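>>
>> Totally untested sketch of what I mean (macro names made up, and the
>> per-platform coherency flags would need checking against bspec):
>>
>>   #define PRE_GEN8_CACHE_MODES \
>>   	.cache_modes = { \
>>   		[I915_CACHE_MODE_UC]	  = I915_CACHE(UC), \
>>   		[I915_CACHE_MODE_WB]	  = I915_CACHE(WB, COH1W), \
>>   		[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, L3), \
>>   		[I915_CACHE_MODE_WT]	  = I915_CACHE(WT), \
>>   	}
>>
>> with a gen8+ variant keeping COH1W | COH2W for the WB entries, as in
>> the current version.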
>>
>>> +		[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, COH2W, L3), \
>>> +		[I915_CACHE_MODE_WT] 	  = I915_CACHE(WT), \
>>>   	}
>>>   
>>> -#define TGL_CACHELEVEL \
>>> -	.cachelevel_to_pat = { \
>>> -		[I915_CACHE_NONE]   = 3, \
>>> -		[I915_CACHE_LLC]    = 0, \
>>> -		[I915_CACHE_L3_LLC] = 0, \
>>> -		[I915_CACHE_WT]     = 2, \
>>> +#define GEN12_CACHE_MODES \
>>> +	.cache_modes = { \
>>> +		[0] = I915_CACHE(WB, COH1W, COH2W), \
>>> +		[1] = I915_CACHE(WC), \
>>> +		[2] = I915_CACHE(WT), \
>>> +		[3] = I915_CACHE(UC), \
>>>   	}
>>>   
>>> -#define PVC_CACHELEVEL \
>>> -	.cachelevel_to_pat = { \
>>> -		[I915_CACHE_NONE]   = 0, \
>>> -		[I915_CACHE_LLC]    = 3, \
>>> -		[I915_CACHE_L3_LLC] = 3, \
>>> -		[I915_CACHE_WT]     = 2, \
>>> +/* FIXME: is it 1-way or 2-way coherent for PAT indices 3, 5 and 7? */
>>> +
>>> +#define PVC_CACHE_MODES \
>>> +	.cache_modes = { \
>>> +		[0] = I915_CACHE(UC), \
>>> +		[1] = I915_CACHE(WC), \
>>> +		[2] = I915_CACHE(WT), \
>>> +		[3] = I915_CACHE(WB, COH1W), \
>>> +		[4] = I915_CACHE(WT, CLOS1), \
>>> +		[5] = I915_CACHE(WB, COH1W, CLOS1), \
>>> +		[6] = I915_CACHE(WT, CLOS2), \
>>> +		[7] = I915_CACHE(WB, COH1W, CLOS2), \
>>>   	}
>>>   
>>> -#define MTL_CACHELEVEL \
>>> -	.cachelevel_to_pat = { \
>>> -		[I915_CACHE_NONE]   = 2, \
>>> -		[I915_CACHE_LLC]    = 3, \
>>> -		[I915_CACHE_L3_LLC] = 3, \
>>> -		[I915_CACHE_WT]     = 1, \
>>> +#define MTL_CACHE_MODES \
>>> +	.cache_modes = { \
>>> +		[0] = I915_CACHE(WB), \
>>> +		[1] = I915_CACHE(WT), \
>>> +		[2] = I915_CACHE(UC), \
>>> +		[3] = I915_CACHE(WB, COH1W), \
>>> +		[4] = I915_CACHE(WB, COH1W, COH2W), \
>>
>> We may want a comment on this one since the "2W" part is sort of a lie.
>> Bspec 63884 has a programming note for MTL that says
>>
>>          "...Except for system atomics, setting Coherency Mode to 10 or
>>          11 results in this same one-way coherent behavior..."
>>
>> So if we ask for 2W, we actually only get 1W behavior except in a very
>> narrow set of cases.
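>>
>> Maybe worth a note next to the table entry then, something along
>> these lines (wording just a suggestion):
>>
>>   /*
>>    * PAT index 4 advertises 2-way coherency, but per the programming
>>    * note on bspec 63884 it behaves as 1-way coherent except for
>>    * system atomics.
>>    */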
>>
>>
>> Matt
>>
>>>   	}
>>>   
>>>   /* Keep in gen based order, and chronological order within a gen */
>>> @@ -97,7 +105,7 @@
>>>   	.max_pat_index = 3, \
>>>   	GEN_DEFAULT_PAGE_SIZES, \
>>>   	GEN_DEFAULT_REGIONS, \
>>> -	LEGACY_CACHELEVEL
>>> +	LEGACY_CACHE_MODES
>>>   
>>>   #define I845_FEATURES \
>>>   	GEN(2), \
>>> @@ -112,7 +120,7 @@
>>>   	.max_pat_index = 3, \
>>>   	GEN_DEFAULT_PAGE_SIZES, \
>>>   	GEN_DEFAULT_REGIONS, \
>>> -	LEGACY_CACHELEVEL
>>> +	LEGACY_CACHE_MODES
>>>   
>>>   static const struct intel_device_info i830_info = {
>>>   	I830_FEATURES,
>>> @@ -145,7 +153,7 @@ static const struct intel_device_info i865g_info = {
>>>   	.max_pat_index = 3, \
>>>   	GEN_DEFAULT_PAGE_SIZES, \
>>>   	GEN_DEFAULT_REGIONS, \
>>> -	LEGACY_CACHELEVEL
>>> +	LEGACY_CACHE_MODES
>>>   
>>>   static const struct intel_device_info i915g_info = {
>>>   	GEN3_FEATURES,
>>> @@ -208,7 +216,7 @@ static const struct intel_device_info pnv_m_info = {
>>>   	.max_pat_index = 3, \
>>>   	GEN_DEFAULT_PAGE_SIZES, \
>>>   	GEN_DEFAULT_REGIONS, \
>>> -	LEGACY_CACHELEVEL
>>> +	LEGACY_CACHE_MODES
>>>   
>>>   static const struct intel_device_info i965g_info = {
>>>   	GEN4_FEATURES,
>>> @@ -252,7 +260,7 @@ static const struct intel_device_info gm45_info = {
>>>   	.max_pat_index = 3, \
>>>   	GEN_DEFAULT_PAGE_SIZES, \
>>>   	GEN_DEFAULT_REGIONS, \
>>> -	LEGACY_CACHELEVEL
>>> +	LEGACY_CACHE_MODES
>>>   
>>>   static const struct intel_device_info ilk_d_info = {
>>>   	GEN5_FEATURES,
>>> @@ -282,7 +290,7 @@ static const struct intel_device_info ilk_m_info = {
>>>   	.__runtime.ppgtt_size = 31, \
>>>   	GEN_DEFAULT_PAGE_SIZES, \
>>>   	GEN_DEFAULT_REGIONS, \
>>> -	LEGACY_CACHELEVEL
>>> +	LEGACY_CACHE_MODES
>>>   
>>>   #define SNB_D_PLATFORM \
>>>   	GEN6_FEATURES, \
>>> @@ -330,7 +338,7 @@ static const struct intel_device_info snb_m_gt2_info = {
>>>   	.__runtime.ppgtt_size = 31, \
>>>   	GEN_DEFAULT_PAGE_SIZES, \
>>>   	GEN_DEFAULT_REGIONS, \
>>> -	LEGACY_CACHELEVEL
>>> +	LEGACY_CACHE_MODES
>>>   
>>>   #define IVB_D_PLATFORM \
>>>   	GEN7_FEATURES, \
>>> @@ -387,7 +395,7 @@ static const struct intel_device_info vlv_info = {
>>>   	.platform_engine_mask = BIT(RCS0) | BIT(VCS0) | BIT(BCS0),
>>>   	GEN_DEFAULT_PAGE_SIZES,
>>>   	GEN_DEFAULT_REGIONS,
>>> -	LEGACY_CACHELEVEL,
>>> +	LEGACY_CACHE_MODES
>>>   };
>>>   
>>>   #define G75_FEATURES  \
>>> @@ -473,7 +481,7 @@ static const struct intel_device_info chv_info = {
>>>   	.has_coherent_ggtt = false,
>>>   	GEN_DEFAULT_PAGE_SIZES,
>>>   	GEN_DEFAULT_REGIONS,
>>> -	LEGACY_CACHELEVEL,
>>> +	LEGACY_CACHE_MODES
>>>   };
>>>   
>>>   #define GEN9_DEFAULT_PAGE_SIZES \
>>> @@ -536,7 +544,7 @@ static const struct intel_device_info skl_gt4_info = {
>>>   	.max_pat_index = 3, \
>>>   	GEN9_DEFAULT_PAGE_SIZES, \
>>>   	GEN_DEFAULT_REGIONS, \
>>> -	LEGACY_CACHELEVEL
>>> +	LEGACY_CACHE_MODES
>>>   
>>>   static const struct intel_device_info bxt_info = {
>>>   	GEN9_LP_FEATURES,
>>> @@ -640,7 +648,7 @@ static const struct intel_device_info jsl_info = {
>>>   #define GEN12_FEATURES \
>>>   	GEN11_FEATURES, \
>>>   	GEN(12), \
>>> -	TGL_CACHELEVEL, \
>>> +	GEN12_CACHE_MODES, \
>>>   	.has_global_mocs = 1, \
>>>   	.has_pxp = 1, \
>>>   	.max_pat_index = 3
>>> @@ -708,7 +716,7 @@ static const struct intel_device_info adl_p_info = {
>>>   	.__runtime.graphics.ip.ver = 12, \
>>>   	.__runtime.graphics.ip.rel = 50, \
>>>   	XE_HP_PAGE_SIZES, \
>>> -	TGL_CACHELEVEL, \
>>> +	GEN12_CACHE_MODES, \
>>>   	.dma_mask_size = 46, \
>>>   	.has_3d_pipeline = 1, \
>>>   	.has_64bit_reloc = 1, \
>>> @@ -803,7 +811,7 @@ static const struct intel_device_info pvc_info = {
>>>   		BIT(VCS0) |
>>>   		BIT(CCS0) | BIT(CCS1) | BIT(CCS2) | BIT(CCS3),
>>>   	.require_force_probe = 1,
>>> -	PVC_CACHELEVEL,
>>> +	PVC_CACHE_MODES
>>>   };
>>>   
>>>   static const struct intel_gt_definition xelpmp_extra_gt[] = {
>>> @@ -838,7 +846,7 @@ static const struct intel_device_info mtl_info = {
>>>   	.memory_regions = REGION_SMEM | REGION_STOLEN_LMEM,
>>>   	.platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(CCS0),
>>>   	.require_force_probe = 1,
>>> -	MTL_CACHELEVEL,
>>> +	MTL_CACHE_MODES
>>>   };
>>>   
>>>   #undef PLATFORM
>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>>> index 04bc1f4a1115..973175a64534 100644
>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>> @@ -1870,7 +1870,7 @@ static int alloc_oa_buffer(struct i915_perf_stream *stream)
>>>   		return PTR_ERR(bo);
>>>   	}
>>>   
>>> -	i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC);
>>> +	i915_gem_object_set_cache_coherency(bo, I915_CACHE_CACHED);
>>>   
>>>   	/* PreHSW required 512K alignment, HSW requires 16M */
>>>   	vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL);
>>> diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
>>> index dbfe6443457b..2ce13b7c48cb 100644
>>> --- a/drivers/gpu/drm/i915/intel_device_info.h
>>> +++ b/drivers/gpu/drm/i915/intel_device_info.h
>>> @@ -27,6 +27,8 @@
>>>   
>>>   #include <uapi/drm/i915_drm.h>
>>>   
>>> +#include "i915_cache.h"
>>> +
>>>   #include "intel_step.h"
>>>   
>>>   #include "gt/intel_engine_types.h"
>>> @@ -243,8 +245,8 @@ struct intel_device_info {
>>>   	 */
>>>   	const struct intel_runtime_info __runtime;
>>>   
>>> -	u32 cachelevel_to_pat[I915_MAX_CACHE_LEVEL];
>>> -	u32 max_pat_index;
>>> +	i915_cache_t cache_modes[8];
>>> +	unsigned int max_pat_index;
>>>   };
>>>   
>>>   struct intel_driver_caps {
>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>>> index f910ec9b6d2b..ba821e48baa5 100644
>>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>>> @@ -267,7 +267,7 @@ static int igt_evict_for_cache_color(void *arg)
>>>   		err = PTR_ERR(obj);
>>>   		goto cleanup;
>>>   	}
>>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>>   	quirk_add(obj, &objects);
>>>   
>>>   	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
>>> @@ -283,7 +283,7 @@ static int igt_evict_for_cache_color(void *arg)
>>>   		err = PTR_ERR(obj);
>>>   		goto cleanup;
>>>   	}
>>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>>   	quirk_add(obj, &objects);
>>>   
>>>   	/* Neighbouring; same colour - should fit */
>>> diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
>>> index 3c5e0952f1b8..4cfc5000d6ff 100644
>>> --- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
>>> +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
>>> @@ -23,7 +23,7 @@ int igt_spinner_init(struct igt_spinner *spin, struct intel_gt *gt)
>>>   		err = PTR_ERR(spin->hws);
>>>   		goto err;
>>>   	}
>>> -	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_LLC);
>>> +	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_CACHED);
>>>   
>>>   	spin->obj = i915_gem_object_create_internal(gt->i915, PAGE_SIZE);
>>>   	if (IS_ERR(spin->obj)) {
>>> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>>> index 1d1a457e2aee..8ae77bcf27fa 100644
>>> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>>> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>>> @@ -126,13 +126,13 @@ static const struct intel_device_info mock_info = {
>>>   	.memory_regions = REGION_SMEM,
>>>   	.platform_engine_mask = BIT(0),
>>>   
>>> -	/* simply use legacy cache level for mock device */
>>> +	/* Simply use legacy cache modes for the mock device. */
>>>   	.max_pat_index = 3,
>>> -	.cachelevel_to_pat = {
>>> -		[I915_CACHE_NONE]   = 0,
>>> -		[I915_CACHE_LLC]    = 1,
>>> -		[I915_CACHE_L3_LLC] = 2,
>>> -		[I915_CACHE_WT]     = 3,
>>> +	.cache_modes = {
>>> +		[0] = I915_CACHE(UC),
>>> +		[1] = I915_CACHE(WB, COH1W),
>>> +		[2] = I915_CACHE(WB, COH1W, COH2W, L3),
>>> +		[3] = I915_CACHE(WT),
>>>   	},
>>>   };
>>>   
>>> @@ -181,7 +181,7 @@ struct drm_i915_private *mock_gem_device(void)
>>>   	/* Set up device info and initial runtime info. */
>>>   	intel_device_info_driver_create(i915, pdev->device, &mock_info);
>>>   
>>> -	i915_cache_init(i915);
>>> +	WARN_ON(i915_cache_init(i915));
>>>   
>>>   	dev_pm_domain_set(&pdev->dev, &pm_domain);
>>>   	pm_runtime_enable(&pdev->dev);
>>> -- 
>>> 2.39.2
>>>
>>
>> -- 
>> Matt Roper
>> Graphics Software Engineer
>> Linux GPU Platform Enablement
>> Intel Corporation
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC 4/8] drm/i915: Refactor PAT/object cache handling
  2023-07-27 23:57     ` [Intel-gfx] " Matt Roper
@ 2023-07-28 12:39       ` Tvrtko Ursulin
  -1 siblings, 0 replies; 59+ messages in thread
From: Tvrtko Ursulin @ 2023-07-28 12:39 UTC (permalink / raw)
  To: Matt Roper
  Cc: Fei Yang, Tvrtko Ursulin, Intel-gfx, dri-devel, Andi Shyti, Chris Wilson


Forgot one part of your reply:

On 28/07/2023 00:57, Matt Roper wrote:
> On Thu, Jul 27, 2023 at 03:55:00PM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Commit 9275277d5324 ("drm/i915: use pat_index instead of cache_level") has
>> introduced PAT indices to i915 internal APIs, partially replacing the
>> usage of driver internal cache_level, but has also added a few sub-
>> optimal design decisions which this patch tries to improve upon.
>>
>> The principal change here is to invert the per-platform cache-level-
>> to-PAT-index table which was added by the referenced commit, and by
>> doing so enable i915 to understand the cache mode behind each PAT
>> index, changing the indices from opaque to transparent.
>>
>> Once we have the inverted table we are able to remove the hidden,
>> unconditional "return true" from i915_gem_object_has_cache_level and
>> make the affected code paths clearer.
>>
>> To achieve this we replace the enum i915_cache_level with i915_cache_t,
>> composed of a more detailed representation of each cache mode (base mode
>> plus flags).
>>
>> In this way we are able to express the differences between the
>> write-back coherency settings on Meteorlake, which in turn enables us
>> to map the i915 "cached" mode to the correct Meteorlake PAT index.
>>
>> We can also replace the platform-dependent cache-mode-to-string code in
>> debugfs and elsewhere with a single implementation based on i915_cache_t.
>>
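FWIW an illustration of what i915_cache_t encodes, as defined by the
macros in i915_cache.h (low byte is the mode, high byte the flags):

  I915_CACHE_CACHED == I915_CACHE(WB, COH1W, COH2W)
                    == I915_CACHE_MODE_WB |
                       ((I915_CACHE_FLAG_COH1W |
                         I915_CACHE_FLAG_COH2W) << 8)

which I915_CACHE_MODE() and I915_CACHE_FLAGS() then take apart again
at the query sites.
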
>> v2:
>>   * Fix PAT-to-cache-mode table for PVC. (Fei)
>>   * Cache display caching mode too. (Fei)
>>   * Improve and document criteria in i915_gem_object_can_bypass_llc() (Matt)
>>
>> v3:
>>   * Checkpatch issues.
>>   * Cache mode flags check fixed.
>>
>> v4:
>>   * Fix intel_device_info->cache_modes array size. (Matt)
>>   * Boolean cache mode and flags query. (Matt)
>>   * Reduce number of cache macros with some macro magic.
>>   * One more checkpatch fix.
>>   * Tweak tables to show legacy and Gen12 WB is fully coherent.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> References: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>> Cc: Fei Yang <fei.yang@intel.com>
>> Cc: Andi Shyti <andi.shyti@linux.intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_domain.c    |  60 +++++----
>>   drivers/gpu/drm/i915/gem/i915_gem_domain.h    |   5 +-
>>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   3 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_mman.c      |   4 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_object.c    | 117 ++++++++++--------
>>   drivers/gpu/drm/i915/gem/i915_gem_object.h    |  11 +-
>>   .../gpu/drm/i915/gem/i915_gem_object_types.h  | 116 +----------------
>>   drivers/gpu/drm/i915/gem/i915_gem_shmem.c     |   8 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |   2 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  20 +--
>>   drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   2 +-
>>   .../drm/i915/gem/selftests/huge_gem_object.c  |   2 +-
>>   .../gpu/drm/i915/gem/selftests/huge_pages.c   |   3 +-
>>   drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  10 +-
>>   drivers/gpu/drm/i915/gt/intel_engine_cs.c     |   2 +-
>>   drivers/gpu/drm/i915/gt/intel_ggtt.c          |  25 ++--
>>   drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c     |   4 +-
>>   drivers/gpu/drm/i915/gt/intel_gtt.c           |   2 +-
>>   drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +-
>>   drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   6 +-
>>   .../gpu/drm/i915/gt/intel_ring_submission.c   |   4 +-
>>   drivers/gpu/drm/i915/gt/intel_timeline.c      |   2 +-
>>   drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   2 +-
>>   .../gpu/drm/i915/gt/selftest_workarounds.c    |   2 +-
>>   drivers/gpu/drm/i915/i915_cache.c             |  89 +++++++++++--
>>   drivers/gpu/drm/i915/i915_cache.h             |  70 ++++++++++-
>>   drivers/gpu/drm/i915/i915_debugfs.c           |  53 ++------
>>   drivers/gpu/drm/i915/i915_driver.c            |   4 +-
>>   drivers/gpu/drm/i915/i915_gem.c               |  13 --
>>   drivers/gpu/drm/i915/i915_pci.c               |  84 +++++++------
>>   drivers/gpu/drm/i915/i915_perf.c              |   2 +-
>>   drivers/gpu/drm/i915/intel_device_info.h      |   6 +-
>>   .../gpu/drm/i915/selftests/i915_gem_evict.c   |   4 +-
>>   drivers/gpu/drm/i915/selftests/igt_spinner.c  |   2 +-
>>   .../gpu/drm/i915/selftests/mock_gem_device.c  |  14 +--
>>   36 files changed, 391 insertions(+), 367 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>> index 57db9c581bf6..c15f83de33af 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>> @@ -8,6 +8,7 @@
>>   #include "display/intel_frontbuffer.h"
>>   #include "gt/intel_gt.h"
>>   
>> +#include "i915_cache.h"
>>   #include "i915_drv.h"
>>   #include "i915_gem_clflush.h"
>>   #include "i915_gem_domain.h"
>> @@ -41,14 +42,17 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>>   		return false;
>>   
>>   	/*
>> -	 * For objects created by userspace through GEM_CREATE with pat_index
>> -	 * set by set_pat extension, i915_gem_object_has_cache_level() will
>> -	 * always return true, because the coherency of such object is managed
>> -	 * by userspace. Othereise the call here would fall back to checking
>> -	 * whether the object is un-cached or write-through.
>> +	 * Always flush cache for UMD objects with PAT index set.
>>   	 */
>> -	return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
>> -		 i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
>> +	if (obj->pat_set_by_user)
>> +		return true;
>> +
>> +	/*
>> +	 * Fully coherent cached access may end up with data in the CPU cache
>> +	 * which hasn't hit memory yet.
>> +	 */
>> +	return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
>> +	       i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W);
>>   }
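
To spell out the boolean conversion for this one (worth an extra pair
of eyes):

  old: !(UC || WT)  ->  true only for the (fully coherent) WB modes
  new: WB && COH2W

Should be equivalent for kernel-managed objects now that the legacy
and Gen12 tables mark WB as fully coherent, while objects with a user
set PAT deliberately move to the always-flush side.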
>>   
>>   bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>> @@ -268,7 +272,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>>   /**
>>    * i915_gem_object_set_cache_level - Changes the cache-level of an object across all VMA.
>>    * @obj: object to act on
>> - * @cache_level: new cache level to set for the object
>> + * @cache: new caching mode to set for the object
>>    *
>>    * After this function returns, the object will be in the new cache-level
>>    * across all GTT and the contents of the backing storage will be coherent,
>> @@ -281,18 +285,28 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>>    * that all direct access to the scanout remains coherent.
>>    */
>>   int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>> -				    enum i915_cache_level cache_level)
>> +				    i915_cache_t cache)
>>   {
>> -	int ret;
>> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>> +	int pat, ret;
>>   
>> -	/*
>> -	 * For objects created by userspace through GEM_CREATE with pat_index
>> -	 * set by set_pat extension, simply return 0 here without touching
>> -	 * the cache setting, because such objects should have an immutable
>> -	 * cache setting by desgin and always managed by userspace.
>> -	 */
>> -	if (i915_gem_object_has_cache_level(obj, cache_level))
>> +	pat = i915_cache_find_pat(i915, cache);
>> +	if (pat < 0) {
>> +		char buf[I915_CACHE_NAME_LEN];
>> +
>> +		i915_cache_print(buf, sizeof(buf), NULL, cache);
>> +		drm_err_ratelimited(&i915->drm,
>> +				    "Attempting to use unknown caching mode %s!\n",
>> +				    buf);
>> +
>> +		return -EINVAL;
>> +	} else if (pat == obj->pat_index) {
>>   		return 0;
>> +	} else if (obj->pat_set_by_user) {
>> +		drm_notice_once(&i915->drm,
>> +				"Attempting to change caching mode on an object with fixed PAT!\n");
>> +		return -EINVAL;
>> +	}
>>   
>>   	ret = i915_gem_object_wait(obj,
>>   				   I915_WAIT_INTERRUPTIBLE |
>> @@ -302,7 +316,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>>   		return ret;
>>   
>>   	/* Always invalidate stale cachelines */
>> -	i915_gem_object_set_cache_coherency(obj, cache_level);
>> +	i915_gem_object_set_pat_index(obj, pat);
>>   	obj->cache_dirty = true;
>>   
>>   	/* The cache-level will be applied when each vma is rebound. */
>> @@ -337,10 +351,10 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
>>   		goto out;
>>   	}
>>   
>> -	if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||
>> -	    i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
>> +	if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
>> +	    i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W))
>>   		args->caching = I915_CACHING_CACHED;
>> -	else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
>> +	else if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT))
>>   		args->caching = I915_CACHING_DISPLAY;
>>   	else
>>   		args->caching = I915_CACHING_NONE;
>> @@ -355,7 +369,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
>>   	struct drm_i915_private *i915 = to_i915(dev);
>>   	struct drm_i915_gem_caching *args = data;
>>   	struct drm_i915_gem_object *obj;
>> -	enum i915_cache_level level;
>> +	i915_cache_t level;
>>   	int ret = 0;
>>   
>>   	if (IS_DGFX(i915))
>> @@ -378,7 +392,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
>>   		if (!HAS_LLC(i915) && !HAS_SNOOP(i915))
>>   			return -ENODEV;
>>   
>> -		level = I915_CACHE_LLC;
>> +		level = I915_CACHE_CACHED;
>>   		break;
>>   	case I915_CACHING_DISPLAY:
>>   		level = HAS_WT(i915) ? I915_CACHE_WT : I915_CACHE_NONE;
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.h b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
>> index 9622df962bfc..6da5c351f6fd 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
>> @@ -6,10 +6,11 @@
>>   #ifndef __I915_GEM_DOMAIN_H__
>>   #define __I915_GEM_DOMAIN_H__
>>   
>> +#include "i915_cache.h"
>> +
>>   struct drm_i915_gem_object;
>> -enum i915_cache_level;
>>   
>>   int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>> -				    enum i915_cache_level cache_level);
>> +				    i915_cache_t cache);
>>   
>>   #endif /* __I915_GEM_DOMAIN_H__ */
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> index 0a1d40220020..9d6e49c8a4c6 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> @@ -648,7 +648,8 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
>>   	 */
>>   	return (cache->has_llc ||
>>   		obj->cache_dirty ||
>> -		!i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
>> +		!(obj->pat_set_by_user ||
>> +		  i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)));
>>   }
>>   
>>   static int eb_reserve_vma(struct i915_execbuffer *eb,
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
>> index 6bc26b4b06b8..88c360c3d6a3 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
>> @@ -170,7 +170,7 @@ __i915_gem_object_create_internal(struct drm_i915_private *i915,
>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>>   
>> -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
>> +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>>   	i915_gem_object_set_cache_coherency(obj, cache_level);
>>   
>>   	return obj;
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> index aa4d842d4c5a..cd7f8ded0d6f 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> @@ -382,7 +382,6 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>>   		goto err_reset;
>>   	}
>>   
>> -	/* Access to snoopable pages through the GTT is incoherent. */
>>   	/*
>>   	 * For objects created by userspace through GEM_CREATE with pat_index
>>   	 * set by set_pat extension, coherency is managed by userspace, make
>> @@ -391,7 +390,8 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>>   	 * objects. Otherwise this helper function would fall back to checking
>>   	 * whether the object is un-cached.
>>   	 */
>> -	if (!(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
>> +	if (!((obj->pat_set_by_user ||
>> +	       i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)) ||
>>   	      HAS_LLC(i915))) {
>>   		ret = -EFAULT;
>>   		goto err_unpin;
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> index 3dc4fbb67d2b..ec1f0be43d0d 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> @@ -45,33 +45,6 @@ static struct kmem_cache *slab_objects;
>>   
>>   static const struct drm_gem_object_funcs i915_gem_object_funcs;
>>   
>> -unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
>> -				    enum i915_cache_level level)
>> -{
>> -	if (drm_WARN_ON(&i915->drm, level >= I915_MAX_CACHE_LEVEL))
>> -		return 0;
>> -
>> -	return INTEL_INFO(i915)->cachelevel_to_pat[level];
>> -}
>> -
>> -bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
>> -				     enum i915_cache_level lvl)
>> -{
>> -	/*
>> -	 * In case the pat_index is set by user space, this kernel mode
>> -	 * driver should leave the coherency to be managed by user space,
>> -	 * simply return true here.
>> -	 */
>> -	if (obj->pat_set_by_user)
>> -		return true;
>> -
>> -	/*
>> -	 * Otherwise the pat_index should have been converted from cache_level
>> -	 * so that the following comparison is valid.
>> -	 */
>> -	return obj->pat_index == i915_gem_get_pat_index(obj_to_i915(obj), lvl);
>> -}
>> -
>>   struct drm_i915_gem_object *i915_gem_object_alloc(void)
>>   {
>>   	struct drm_i915_gem_object *obj;
>> @@ -144,30 +117,72 @@ void __i915_gem_object_fini(struct drm_i915_gem_object *obj)
>>   	dma_resv_fini(&obj->base._resv);
>>   }
>>   
>> +bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
>> +				    enum i915_cache_mode mode)
>> +{
>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>> +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
>> +
>> +	return I915_CACHE_MODE(cache) == mode;
>> +}
>> +
>> +bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
>> +				    unsigned int flag)
>> +{
>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>> +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
>> +
>> +	return I915_CACHE_FLAGS(cache) & flag;
>> +}
>> +
>> +static void __i915_gem_object_update_coherency(struct drm_i915_gem_object *obj)
>> +{
>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>> +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
>> +	const unsigned int flags = I915_CACHE_FLAGS(cache);
>> +	const unsigned int mode = I915_CACHE_MODE(cache);
>> +
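>> +	/*
>> +	 * WC and WT writes reach memory without needing a flush, while
>> +	 * WB is only safe for both CPU reads and writes when it is
>> +	 * fully (2-way) coherent.
>> +	 */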
>> +	if (mode == I915_CACHE_MODE_WC ||
>> +	    mode == I915_CACHE_MODE_WT ||
>> +	    (mode == I915_CACHE_MODE_WB && (flags & I915_CACHE_FLAG_COH2W)))
>> +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ |
>> +				      I915_BO_CACHE_COHERENT_FOR_WRITE;
>> +	else if (HAS_LLC(i915))
>> +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
>> +	else
>> +		obj->cache_coherent = 0;
>> +
>> +	obj->cache_dirty =
>> +		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
>> +		!IS_DGFX(i915);
>> +}
>> +
>>   /**
>>    * i915_gem_object_set_cache_coherency - Mark up the object's coherency levels
>> - * for a given cache_level
>> + * for a given caching mode
>>    * @obj: #drm_i915_gem_object
>> - * @cache_level: cache level
>> + * @cache: cache mode
>>    */
>>   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>> -					 unsigned int cache_level)
>> +					 i915_cache_t cache)
>>   {
>> -	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>> +	int found;
>>   
>> -	obj->pat_index = i915_gem_get_pat_index(i915, cache_level);
>> +	found = i915_cache_find_pat(i915, cache);
>> +	if (found < 0) {
>> +		char buf[I915_CACHE_NAME_LEN];
>>   
>> -	if (cache_level != I915_CACHE_NONE)
>> -		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
>> -				       I915_BO_CACHE_COHERENT_FOR_WRITE);
>> -	else if (HAS_LLC(i915))
>> -		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
>> -	else
>> -		obj->cache_coherent = 0;
>> +		i915_cache_print(buf, sizeof(buf), NULL, cache);
>> +		drm_err_ratelimited(&i915->drm, "Unknown cache mode %s!\n",
>> +				    buf);
>>   
>> -	obj->cache_dirty =
>> -		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
>> -		!IS_DGFX(i915);
>> +		found = i915->pat_uc;
>> +	}
>> +
>> +	obj->pat_index = found;
>> +
>> +	__i915_gem_object_update_coherency(obj);
>>   }
>>   
>>   /**
>> @@ -181,24 +196,18 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>>   void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
>>   				   unsigned int pat_index)
>>   {
>> -	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>>   
>>   	if (obj->pat_index == pat_index)
>>   		return;
>>   
>> +	if (drm_WARN_ON_ONCE(&i915->drm,
>> +			     pat_index > INTEL_INFO(i915)->max_pat_index))
>> +		return;
>> +
>>   	obj->pat_index = pat_index;
>>   
>> -	if (pat_index != i915_gem_get_pat_index(i915, I915_CACHE_NONE))
>> -		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
>> -				       I915_BO_CACHE_COHERENT_FOR_WRITE);
>> -	else if (HAS_LLC(i915))
>> -		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
>> -	else
>> -		obj->cache_coherent = 0;
>> -
>> -	obj->cache_dirty =
>> -		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
>> -		!IS_DGFX(i915);
>> +	__i915_gem_object_update_coherency(obj);
>>   }
>>   
>>   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> index 884a17275b3a..a5d4ee19d9be 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> @@ -13,6 +13,7 @@
>>   
>>   #include "display/intel_frontbuffer.h"
>>   #include "intel_memory_region.h"
>> +#include "i915_cache.h"
>>   #include "i915_gem_object_types.h"
>>   #include "i915_gem_gtt.h"
>>   #include "i915_gem_ww.h"
>> @@ -32,10 +33,6 @@ static inline bool i915_gem_object_size_2big(u64 size)
>>   	return false;
>>   }
>>   
>> -unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
>> -				    enum i915_cache_level level);
>> -bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
>> -				     enum i915_cache_level lvl);
>>   void i915_gem_init__objects(struct drm_i915_private *i915);
>>   
>>   void i915_objects_module_exit(void);
>> @@ -764,8 +761,12 @@ int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
>>   				      bool intr);
>>   bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj);
>>   
>> +bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
>> +				    enum i915_cache_mode mode);
>> +bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
>> +				    unsigned int flag);
>>   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>> -					 unsigned int cache_level);
>> +					 i915_cache_t cache);
>>   void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
>>   				   unsigned int pat_index);
>>   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj);
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>> index 8de2b91b3edf..6790e13ad262 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>> @@ -14,6 +14,7 @@
>>   #include <uapi/drm/i915_drm.h>
>>   
>>   #include "i915_active.h"
>> +#include "i915_cache.h"
>>   #include "i915_selftest.h"
>>   #include "i915_vma_resource.h"
>>   
>> @@ -116,93 +117,6 @@ struct drm_i915_gem_object_ops {
>>   	const char *name; /* friendly name for debug, e.g. lockdep classes */
>>   };
>>   
>> -/**
>> - * enum i915_cache_level - The supported GTT caching values for system memory
>> - * pages.
>> - *
>> - * These translate to some special GTT PTE bits when binding pages into some
>> - * address space. It also determines whether an object, or rather its pages are
>> - * coherent with the GPU, when also reading or writing through the CPU cache
>> - * with those pages.
>> - *
>> - * Userspace can also control this through struct drm_i915_gem_caching.
>> - */
>> -enum i915_cache_level {
>> -	/**
>> -	 * @I915_CACHE_NONE:
>> -	 *
>> -	 * GPU access is not coherent with the CPU cache. If the cache is dirty
>> -	 * and we need the underlying pages to be coherent with some later GPU
>> -	 * access then we need to manually flush the pages.
>> -	 *
>> -	 * On shared LLC platforms reads and writes through the CPU cache are
>> -	 * still coherent even with this setting. See also
>> -	 * &drm_i915_gem_object.cache_coherent for more details. Due to this we
>> -	 * should only ever use uncached for scanout surfaces, otherwise we end
>> -	 * up over-flushing in some places.
>> -	 *
>> -	 * This is the default on non-LLC platforms.
>> -	 */
>> -	I915_CACHE_NONE = 0,
>> -	/**
>> -	 * @I915_CACHE_LLC:
>> -	 *
>> -	 * GPU access is coherent with the CPU cache. If the cache is dirty,
>> -	 * then the GPU will ensure that access remains coherent, when both
>> -	 * reading and writing through the CPU cache. GPU writes can dirty the
>> -	 * CPU cache.
>> -	 *
>> -	 * Not used for scanout surfaces.
>> -	 *
>> -	 * Applies to both platforms with shared LLC(HAS_LLC), and snooping
>> -	 * based platforms(HAS_SNOOP).
>> -	 *
>> -	 * This is the default on shared LLC platforms.  The only exception is
>> -	 * scanout objects, where the display engine is not coherent with the
>> -	 * CPU cache. For such objects I915_CACHE_NONE or I915_CACHE_WT is
>> -	 * automatically applied by the kernel in pin_for_display, if userspace
>> -	 * has not done so already.
>> -	 */
>> -	I915_CACHE_LLC,
>> -	/**
>> -	 * @I915_CACHE_L3_LLC:
>> -	 *
>> -	 * Explicitly enable the Gfx L3 cache, with coherent LLC.
>> -	 *
>> -	 * The Gfx L3 sits between the domain specific caches, e.g
>> -	 * sampler/render caches, and the larger LLC. LLC is coherent with the
>> -	 * GPU, but L3 is only visible to the GPU, so likely needs to be flushed
>> -	 * when the workload completes.
>> -	 *
>> -	 * Not used for scanout surfaces.
>> -	 *
>> -	 * Only exposed on some gen7 + GGTT. More recent hardware has dropped
>> -	 * this explicit setting, where it should now be enabled by default.
>> -	 */
>> -	I915_CACHE_L3_LLC,
>> -	/**
>> -	 * @I915_CACHE_WT:
>> -	 *
>> -	 * Write-through. Used for scanout surfaces.
>> -	 *
>> -	 * The GPU can utilise the caches, while still having the display engine
>> -	 * be coherent with GPU writes, as a result we don't need to flush the
>> -	 * CPU caches when moving out of the render domain. This is the default
>> -	 * setting chosen by the kernel, if supported by the HW, otherwise we
>> -	 * fallback to I915_CACHE_NONE. On the CPU side writes through the CPU
>> -	 * cache still need to be flushed, to remain coherent with the display
>> -	 * engine.
>> -	 */
>> -	I915_CACHE_WT,
>> -	/**
>> -	 * @I915_MAX_CACHE_LEVEL:
>> -	 *
>> -	 * Mark the last entry in the enum. Used for defining cachelevel_to_pat
>> -	 * array for cache_level to pat translation table.
>> -	 */
>> -	I915_MAX_CACHE_LEVEL,
>> -};
>> -
>>   enum i915_map_type {
>>   	I915_MAP_WB = 0,
>>   	I915_MAP_WC,
>> @@ -403,16 +317,6 @@ struct drm_i915_gem_object {
>>   	/**
>>   	 * @cache_coherent:
>>   	 *
>> -	 * Note: with the change above which replaced @cache_level with pat_index,
>> -	 * the use of @cache_coherent is limited to the objects created by kernel
>> -	 * or by userspace without pat index specified.
>> -	 * Check for @pat_set_by_user to find out if an object has pat index set
>> -	 * by userspace. The ioctl's to change cache settings have also been
>> -	 * disabled for the objects with pat index set by userspace. Please don't
>> -	 * assume @cache_coherent having the flags set as describe here. A helper
>> -	 * function i915_gem_object_has_cache_level() provides one way to bypass
>> -	 * the use of this field.
>> -	 *
>>   	 * Track whether the pages are coherent with the GPU if reading or
>>   	 * writing through the CPU caches. The largely depends on the
>>   	 * @cache_level setting.
>> @@ -447,7 +351,7 @@ struct drm_i915_gem_object {
>>   	 * flushing the surface just before doing the scanout.  This does mean
>>   	 * we might unnecessarily flush non-scanout objects in some places, but
>>   	 * the default assumption is that all normal objects should be using
>> -	 * I915_CACHE_LLC, at least on platforms with the shared LLC.
>> +	 * I915_CACHE_CACHED, at least on platforms with the shared LLC.
>>   	 *
>>   	 * Supported values:
>>   	 *
>> @@ -486,16 +390,6 @@ struct drm_i915_gem_object {
>>   	/**
>>   	 * @cache_dirty:
>>   	 *
>> -	 * Note: with the change above which replaced cache_level with pat_index,
>> -	 * the use of @cache_dirty is limited to the objects created by kernel
>> -	 * or by userspace without pat index specified.
>> -	 * Check for @pat_set_by_user to find out if an object has pat index set
>> -	 * by userspace. The ioctl's to change cache settings have also been
>> -	 * disabled for the objects with pat_index set by userspace. Please don't
>> -	 * assume @cache_dirty is set as describe here. Also see helper function
>> -	 * i915_gem_object_has_cache_level() for possible ways to bypass the use
>> -	 * of this field.
>> -	 *
>>   	 * Track if we are we dirty with writes through the CPU cache for this
>>   	 * object. As a result reading directly from main memory might yield
>>   	 * stale data.
>> @@ -531,9 +425,9 @@ struct drm_i915_gem_object {
>>   	 *
>>   	 *   1. All userspace objects, by default, have @cache_level set as
>>   	 *   I915_CACHE_NONE. The only exception is userptr objects, where we
>> -	 *   instead force I915_CACHE_LLC, but we also don't allow userspace to
>> -	 *   ever change the @cache_level for such objects. Another special case
>> -	 *   is dma-buf, which doesn't rely on @cache_dirty,  but there we
>> +	 *   instead force I915_CACHE_CACHED, but we also don't allow userspace
>> +	 *   to ever change the @cache_level for such objects. Another special
>> +	 *   case is dma-buf, which doesn't rely on @cache_dirty,  but there we
>>   	 *   always do a forced flush when acquiring the pages, if there is a
>>   	 *   chance that the pages can be read directly from main memory with
>>   	 *   the GPU.
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>> index 8f1633c3fb93..aba908f0349f 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>> @@ -584,7 +584,7 @@ static int shmem_object_init(struct intel_memory_region *mem,
>>   	static struct lock_class_key lock_class;
>>   	struct drm_i915_private *i915 = mem->i915;
>>   	struct address_space *mapping;
>> -	unsigned int cache_level;
>> +	i915_cache_t cache;
>>   	gfp_t mask;
>>   	int ret;
>>   
>> @@ -628,11 +628,11 @@ static int shmem_object_init(struct intel_memory_region *mem,
>>   		 * However, we maintain the display planes as UC, and so
>>   		 * need to rebind when first used as such.
>>   		 */
>> -		cache_level = I915_CACHE_LLC;
>> +		cache = I915_CACHE_CACHED;
>>   	else
>> -		cache_level = I915_CACHE_NONE;
>> +		cache = I915_CACHE_NONE;
>>   
>> -	i915_gem_object_set_cache_coherency(obj, cache_level);
>> +	i915_gem_object_set_cache_coherency(obj, cache);
>>   
>>   	i915_gem_object_init_memory_region(obj, mem);
>>   
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>> index 1c8eb806b7d3..cc907a1f1c53 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>> @@ -691,7 +691,7 @@ static int __i915_gem_object_create_stolen(struct intel_memory_region *mem,
>>   
>>   	obj->stolen = stolen;
>>   	obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
>> -	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
>> +	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>>   	i915_gem_object_set_cache_coherency(obj, cache_level);
>>   
>>   	if (WARN_ON(!i915_gem_object_trylock(obj, NULL)))
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> index 6bd6c239f4ac..107176d1757b 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> @@ -48,14 +48,14 @@ void i915_ttm_migrate_set_ban_memcpy(bool ban)
>>   }
>>   #endif
>>   
>> -static enum i915_cache_level
>> -i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
>> -		     struct ttm_tt *ttm)
>> +static i915_cache_t
>> +i915_ttm_cache(struct drm_i915_private *i915, struct ttm_resource *res,
>> +	       struct ttm_tt *ttm)
>>   {
>>   	return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
>>   		!i915_ttm_gtt_binds_lmem(res) &&
>> -		ttm->caching == ttm_cached) ? I915_CACHE_LLC :
>> -		I915_CACHE_NONE;
>> +		ttm->caching == ttm_cached) ? I915_CACHE_CACHED :
>> +					      I915_CACHE_NONE;
>>   }
>>   
>>   static unsigned int
>> @@ -112,8 +112,8 @@ void i915_ttm_adjust_domains_after_move(struct drm_i915_gem_object *obj)
>>   void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>>   {
>>   	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
>> -	unsigned int cache_level;
>>   	unsigned int mem_flags;
>> +	i915_cache_t cache;
>>   	unsigned int i;
>>   	int mem_type;
>>   
>> @@ -126,13 +126,13 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>>   	if (!bo->resource) {
>>   		mem_flags = I915_BO_FLAG_STRUCT_PAGE;
>>   		mem_type = I915_PL_SYSTEM;
>> -		cache_level = I915_CACHE_NONE;
>> +		cache = I915_CACHE_NONE;
>>   	} else {
>>   		mem_flags = i915_ttm_cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
>>   			I915_BO_FLAG_STRUCT_PAGE;
>>   		mem_type = bo->resource->mem_type;
>> -		cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
>> -						   bo->ttm);
>> +		cache = i915_ttm_cache(to_i915(bo->base.dev), bo->resource,
>> +				       bo->ttm);
>>   	}
>>   
>>   	/*
>> @@ -157,7 +157,7 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>>   	obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
>>   	obj->mem_flags |= mem_flags;
>>   
>> -	i915_gem_object_set_cache_coherency(obj, cache_level);
>> +	i915_gem_object_set_cache_coherency(obj, cache);
>>   }
>>   
>>   /**
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>> index 1d3ebdf4069b..5d2891981bd4 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>> @@ -553,7 +553,7 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
>>   	obj->mem_flags = I915_BO_FLAG_STRUCT_PAGE;
>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>   
>>   	obj->userptr.ptr = args->user_ptr;
>>   	obj->userptr.notifier_seq = ULONG_MAX;
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
>> index bac957755068..77d04be5e9d7 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
>> @@ -123,7 +123,7 @@ huge_gem_object(struct drm_i915_private *i915,
>>   
>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>> -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
>> +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>>   	i915_gem_object_set_cache_coherency(obj, cache_level);
>>   	obj->scratch = phys_size;
>>   
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>> index 6bddd733d796..6ca5b9dbc414 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>> @@ -200,9 +200,10 @@ huge_pages_object(struct drm_i915_private *i915,
>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>>   
>> -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
>> +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>>   	i915_gem_object_set_cache_coherency(obj, cache_level);
>>   
>> +
>>   	obj->mm.page_mask = page_mask;
>>   
>>   	return obj;
>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> index 675f71f06e89..3c93a73cf6b1 100644
>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> @@ -16,11 +16,11 @@
>>   #include "intel_gtt.h"
>>   
>>   static u64 gen8_pde_encode(const dma_addr_t addr,
>> -			   const enum i915_cache_level level)
>> +			   const enum i915_cache_mode cache_mode)
>>   {
>>   	u64 pde = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>>   
>> -	if (level != I915_CACHE_NONE)
>> +	if (cache_mode != I915_CACHE_MODE_UC)
>>   		pde |= PPAT_CACHED_PDE;
>>   	else
>>   		pde |= PPAT_UNCACHED;
>> @@ -43,10 +43,10 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>>   	 * See translation table defined by LEGACY_CACHELEVEL.
>>   	 */
>>   	switch (pat_index) {
>> -	case I915_CACHE_NONE:
>> +	case I915_CACHE_MODE_UC:
>>   		pte |= PPAT_UNCACHED;
>>   		break;
>> -	case I915_CACHE_WT:
>> +	case I915_CACHE_MODE_WT:
>>   		pte |= PPAT_DISPLAY_ELLC;
>>   		break;
>>   	default:
>> @@ -893,7 +893,7 @@ static int gen8_init_scratch(struct i915_address_space *vm)
>>   		}
>>   
>>   		fill_px(obj, vm->scratch[i - 1]->encode);
>> -		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_NONE);
>> +		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_MODE_UC);
>>   
>>   		vm->scratch[i] = obj;
>>   	}
>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>> index ee15486fed0d..f1e59e512d14 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>> @@ -1103,7 +1103,7 @@ static int init_status_page(struct intel_engine_cs *engine)
>>   		return PTR_ERR(obj);
>>   	}
>>   
>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>   
>>   	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>>   	if (IS_ERR(vma)) {
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> index fca61ddca8ad..ab5f654e7557 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> @@ -1011,11 +1011,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
>>   	return ggtt_probe_common(ggtt, size);
>>   }
>>   
>> -/*
>> - * For pre-gen8 platforms pat_index is the same as enum i915_cache_level,
>> - * so the switch-case statements in these PTE encode functions are still valid.
>> - * See translation table LEGACY_CACHELEVEL.
>> - */
>>   static u64 snb_pte_encode(dma_addr_t addr,
>>   			  unsigned int pat_index,
>>   			  u32 flags)
>> @@ -1023,11 +1018,11 @@ static u64 snb_pte_encode(dma_addr_t addr,
>>   	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>>   
>>   	switch (pat_index) {
>> -	case I915_CACHE_L3_LLC:
>> -	case I915_CACHE_LLC:
>> +	case I915_CACHE_MODE_WB:
>> +	case __I915_CACHE_MODE_WB_L3:
>>   		pte |= GEN6_PTE_CACHE_LLC;
>>   		break;
>> -	case I915_CACHE_NONE:
>> +	case I915_CACHE_MODE_UC:
>>   		pte |= GEN6_PTE_UNCACHED;
>>   		break;
>>   	default:
>> @@ -1044,13 +1039,13 @@ static u64 ivb_pte_encode(dma_addr_t addr,
>>   	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>>   
>>   	switch (pat_index) {
>> -	case I915_CACHE_L3_LLC:
>> +	case __I915_CACHE_MODE_WB_L3:
>>   		pte |= GEN7_PTE_CACHE_L3_LLC;
>>   		break;
>> -	case I915_CACHE_LLC:
>> +	case I915_CACHE_MODE_WB:
>>   		pte |= GEN6_PTE_CACHE_LLC;
>>   		break;
>> -	case I915_CACHE_NONE:
>> +	case I915_CACHE_MODE_UC:
>>   		pte |= GEN6_PTE_UNCACHED;
>>   		break;
>>   	default:
>> @@ -1069,7 +1064,7 @@ static u64 byt_pte_encode(dma_addr_t addr,
>>   	if (!(flags & PTE_READ_ONLY))
>>   		pte |= BYT_PTE_WRITEABLE;
>>   
>> -	if (pat_index != I915_CACHE_NONE)
>> +	if (pat_index != I915_CACHE_MODE_UC)
>>   		pte |= BYT_PTE_SNOOPED_BY_CPU_CACHES;
>>   
>>   	return pte;
>> @@ -1081,7 +1076,7 @@ static u64 hsw_pte_encode(dma_addr_t addr,
>>   {
>>   	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>>   
>> -	if (pat_index != I915_CACHE_NONE)
>> +	if (pat_index != I915_CACHE_MODE_UC)
>>   		pte |= HSW_WB_LLC_AGE3;
>>   
>>   	return pte;
>> @@ -1094,9 +1089,9 @@ static u64 iris_pte_encode(dma_addr_t addr,
>>   	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>>   
>>   	switch (pat_index) {
>> -	case I915_CACHE_NONE:
>> +	case I915_CACHE_MODE_UC:
>>   		break;
>> -	case I915_CACHE_WT:
>> +	case I915_CACHE_MODE_WT:
>>   		pte |= HSW_WT_ELLC_LLC_AGE3;
>>   		break;
>>   	default:
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
>> index 866c416afb73..803c41ac4ccb 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
>> @@ -21,7 +21,7 @@ static void gmch_ggtt_insert_page(struct i915_address_space *vm,
>>   				  unsigned int pat_index,
>>   				  u32 unused)
>>   {
>> -	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
>> +	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
>>   		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
>>   
>>   	intel_gmch_gtt_insert_page(addr, offset >> PAGE_SHIFT, flags);
>> @@ -32,7 +32,7 @@ static void gmch_ggtt_insert_entries(struct i915_address_space *vm,
>>   				     unsigned int pat_index,
>>   				     u32 unused)
>>   {
>> -	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
>> +	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
>>   		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
>>   
>>   	intel_gmch_gtt_insert_sg_entries(vma_res->bi.pages, vma_res->start >> PAGE_SHIFT,
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> index 065099362a98..48055304537a 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> @@ -676,7 +676,7 @@ __vm_create_scratch_for_read(struct i915_address_space *vm, unsigned long size)
>>   	if (IS_ERR(obj))
>>   		return ERR_CAST(obj);
>>   
>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>   
>>   	vma = i915_vma_instance(obj, vm, NULL);
>>   	if (IS_ERR(vma)) {
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> index 7192a534a654..af4277c1d577 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> @@ -636,7 +636,8 @@ void
>>   __set_pd_entry(struct i915_page_directory * const pd,
>>   	       const unsigned short idx,
>>   	       struct i915_page_table *pt,
>> -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level));
>> +	       u64 (*encode)(const dma_addr_t,
>> +			     const enum i915_cache_mode cache_mode));
>>   
>>   #define set_pd_entry(pd, idx, to) \
>>   	__set_pd_entry((pd), (idx), px_pt(to), gen8_pde_encode)
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>> index 436756bfbb1a..3e461d4f3693 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>> @@ -98,14 +98,16 @@ void
>>   __set_pd_entry(struct i915_page_directory * const pd,
>>   	       const unsigned short idx,
>>   	       struct i915_page_table * const to,
>> -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level))
>> +	       u64 (*encode)(const dma_addr_t,
>> +			     const enum i915_cache_mode cache_mode))
>>   {
>>   	/* Each thread pre-pins the pd, and we may have a thread per pde. */
>>   	GEM_BUG_ON(atomic_read(px_used(pd)) > NALLOC * I915_PDES);
>>   
>>   	atomic_inc(px_used(pd));
>>   	pd->entry[idx] = to;
>> -	write_dma_entry(px_base(pd), idx, encode(px_dma(to), I915_CACHE_LLC));
>> +	write_dma_entry(px_base(pd), idx,
>> +			encode(px_dma(to), I915_CACHE_MODE_WB));
>>   }
>>   
>>   void
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>> index 92085ffd23de..9131d228d285 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>> @@ -551,7 +551,9 @@ alloc_context_vma(struct intel_engine_cs *engine)
>>   	 * later platforms don't have L3 control bits in the PTE.
>>   	 */
>>   	if (IS_IVYBRIDGE(i915))
>> -		i915_gem_object_set_cache_coherency(obj, I915_CACHE_L3_LLC);
>> +		i915_gem_object_set_cache_coherency(obj,
>> +						    I915_CACHE_CACHED |
>> +						    __I915_CACHE_FLAG(L3));
>>   
>>   	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>>   	if (IS_ERR(vma)) {
>> diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
>> index b9640212d659..025ce54c886d 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
>> @@ -26,7 +26,7 @@ static struct i915_vma *hwsp_alloc(struct intel_gt *gt)
>>   	if (IS_ERR(obj))
>>   		return ERR_CAST(obj);
>>   
>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>   
>>   	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>>   	if (IS_ERR(vma))
>> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>> index 8b0d84f2aad2..fc278fa463b0 100644
>> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>> @@ -64,7 +64,7 @@ static int hang_init(struct hang *h, struct intel_gt *gt)
>>   		goto err_hws;
>>   	}
>>   
>> -	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_CACHED);
>>   	vaddr = i915_gem_object_pin_map_unlocked(h->hws, I915_MAP_WB);
>>   	if (IS_ERR(vaddr)) {
>>   		err = PTR_ERR(vaddr);
>> diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>> index 14a8b25b6204..d25990d33d44 100644
>> --- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>> +++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>> @@ -111,7 +111,7 @@ read_nonprivs(struct intel_context *ce)
>>   	if (IS_ERR(result))
>>   		return result;
>>   
>> -	i915_gem_object_set_cache_coherency(result, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(result, I915_CACHE_CACHED);
>>   
>>   	cs = i915_gem_object_pin_map_unlocked(result, I915_MAP_WB);
>>   	if (IS_ERR(cs)) {
>> diff --git a/drivers/gpu/drm/i915/i915_cache.c b/drivers/gpu/drm/i915/i915_cache.c
>> index 06eb5933c719..f4ba1cb430d3 100644
>> --- a/drivers/gpu/drm/i915/i915_cache.c
>> +++ b/drivers/gpu/drm/i915/i915_cache.c
>> @@ -6,13 +6,88 @@
>>   #include "i915_cache.h"
>>   #include "i915_drv.h"
>>   
>> -void i915_cache_init(struct drm_i915_private *i915)
>> +int i915_cache_init(struct drm_i915_private *i915)
>>   {
>> -	i915->pat_uc = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
>> -	drm_info(&i915->drm, "Using PAT index %u for uncached access\n",
>> -		 i915->pat_uc);
>> +	int ret;
>>   
>> -	i915->pat_wb = i915_gem_get_pat_index(i915, I915_CACHE_LLC);
>> -	drm_info(&i915->drm, "Using PAT index %u for write-back access\n",
>> -		 i915->pat_wb);
>> +	ret = i915_cache_find_pat(i915, I915_CACHE_NONE);
>> +	if (ret < 0) {
>> +		drm_err(&i915->drm,
>> +			"Failed to find PAT index for uncached access\n");
>> +		return -ENODEV;
>> +	}
>> +	drm_info(&i915->drm, "Using PAT index %u for uncached access\n", ret);
>> +	i915->pat_uc = ret;
>> +
>> +	ret = i915_cache_find_pat(i915, I915_CACHE_CACHED);
>> +	if (ret < 0) {
>> +		drm_err(&i915->drm,
>> +			"Failed to find PAT index for write-back access\n");
>> +		return -ENODEV;
>> +	}
>> +	drm_info(&i915->drm, "Using PAT index %u for write-back access\n", ret);
>> +	i915->pat_wb = ret;
>> +
>> +	return 0;
>> +}
>> +
>> +int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache)
>> +{
>> +	const struct intel_device_info *info = INTEL_INFO(i915);
>> +	int i;
>> +
>> +	for (i = 0; i < ARRAY_SIZE(info->cache_modes); i++) {
>> +		if (info->cache_modes[i] == cache)
>> +			return i;
>> +	}
>> +
>> +	return -1;
>> +}
>> +
>> +void i915_cache_print(char *buf, size_t buflen, const char *suffix,
>> +		      i915_cache_t cache)
>> +{
>> +	const enum i915_cache_mode mode = I915_CACHE_MODE(cache);
>> +	static const char * const mode_str[] = {
>> +		[I915_CACHE_MODE_UC] = "UC",
>> +		[I915_CACHE_MODE_WB] = "WB",
>> +		[I915_CACHE_MODE_WT] = "WT",
>> +		[I915_CACHE_MODE_WC] = "WC",
>> +	};
>> +	static const char * const flag_str[] = {
>> +		[ilog2(I915_CACHE_FLAG_COH1W)] = "1-Way-Coherent",
>> +		[ilog2(I915_CACHE_FLAG_COH2W)] = "2-Way-Coherent",
>> +		[ilog2(I915_CACHE_FLAG_L3)] =    "L3",
>> +		[ilog2(I915_CACHE_FLAG_CLOS1)] = "CLOS1",
>> +		[ilog2(I915_CACHE_FLAG_CLOS2)] = "CLOS2",
>> +	};
>> +
>> +	if (mode >= ARRAY_SIZE(mode_str)) {
>> +		snprintf(buf, buflen, "0x%x%s", cache, suffix ?: "");
>> +	} else {
>> +		unsigned long flags = I915_CACHE_FLAGS(cache);
>> +		unsigned long bit;
>> +		int ret;
>> +
>> +		ret = snprintf(buf, buflen, "%s", mode_str[mode]);
>> +		buf += ret;
>> +		buflen -= ret;
>> +
>> +		/*
>> +		 * Don't print "1-way-2-way", it would be confusing and 2-way
>> +		 * implies 1-way anyway.
>> +		 */
>> +		if ((flags & (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W)) ==
>> +		    (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W))
>> +			flags &= ~I915_CACHE_FLAG_COH1W;
>> +
>> +		for_each_set_bit(bit, &flags, BITS_PER_TYPE(i915_cache_t)) {
>> +			ret = snprintf(buf, buflen, "-%s", flag_str[bit]);
>> +			buf += ret;
>> +			buflen -= ret;
>> +		}
>> +
>> +		if (suffix)
>> +			snprintf(buf, buflen, "%s", suffix);
>> +	}
>>   }
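
An aside for reviewers, not part of the patch - a quick sketch of what
the new helper produces, given the mode_str/flag_str tables above:

	char buf[I915_CACHE_NAME_LEN];

	i915_cache_print(buf, sizeof(buf), "!", I915_CACHE(WB, COH1W, COH2W));
	/* buf == "WB-2-Way-Coherent!" - 1-way is folded into 2-way */

	i915_cache_print(buf, sizeof(buf), NULL, I915_CACHE(UC));
	/* buf == "UC" */
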
>> diff --git a/drivers/gpu/drm/i915/i915_cache.h b/drivers/gpu/drm/i915/i915_cache.h
>> index cb68936fb8a2..d9e97318b942 100644
>> --- a/drivers/gpu/drm/i915/i915_cache.h
>> +++ b/drivers/gpu/drm/i915/i915_cache.h
>> @@ -6,8 +6,76 @@
>>   #ifndef __I915_CACHE_H__
>>   #define __I915_CACHE_H__
>>   
>> +#include <linux/types.h>
>> +
>> +struct drm_printer;
>> +
>>   struct drm_i915_private;
>>   
>> -void i915_cache_init(struct drm_i915_private *i915);
>> +typedef u16 i915_cache_t;
>> +
>> +/* Cache modes */
>> +enum i915_cache_mode {
>> +	I915_CACHE_MODE_UC = 0,
>> +	I915_CACHE_MODE_WB,
>> +	__I915_CACHE_MODE_WB_L3, /* Special do-not-use entry for legacy 1:1 mapping. */
>> +	I915_CACHE_MODE_WT,
>> +	I915_CACHE_MODE_WC,
>> +	I915_NUM_CACHE_MODES
>> +};
>> +
>> +/* Cache mode flag bits */
>> +#define I915_CACHE_FLAG_COH1W	(0x1)
>> +#define I915_CACHE_FLAG_COH2W	(0x2) /* 1-way needs to be set too. */
>> +#define I915_CACHE_FLAG_L3	(0x4)
>> +#define I915_CACHE_FLAG_CLOS1	(0x8)
>> +#define I915_CACHE_FLAG_CLOS2	(0x10)
>> +
>> +/*
>> + * Overloaded I915_CACHE() macro based on:
>> + *  https://stackoverflow.com/questions/3046889/optional-parameters-with-c-macros
>> + *
>> + * It is possible to call I915_CACHE with mode and zero or more flags as
>> + * separate arguments. Ie these all work:
>> + *
>> + *   I915_CACHE(WB)
>> + *   I915_CACHE(WB, COH1W, COH2W)
>> + *   I915_CACHE(WB, COH1W, COH2W, L3)
>> + */
>> +
>> +#define __I915_CACHE_FLAG(f) (I915_CACHE_FLAG_##f << 8)
>> +#define __I915_CACHE(m, f) ((i915_cache_t)(I915_CACHE_MODE_##m | (f)))
>> +
>> +#define I915_CACHE_4(m, f1, f2, f3)	__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2) | __I915_CACHE_FLAG(f3))
>> +#define I915_CACHE_3(m, f1, f2)		__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2))
>> +#define I915_CACHE_2(m, f1)		__I915_CACHE(m, __I915_CACHE_FLAG(f1))
>> +#define I915_CACHE_1(m)			__I915_CACHE(m, 0)
>> +#define I915_CACHE_0(m)			__I915_CACHE(WC, 0)
>> +
>> +#define FUNC_CHOOSER(_f1, _f2, _f3, _f4, _f5, ...) _f5
>> +#define FUNC_RECOMPOSER(argsWithParentheses) FUNC_CHOOSER argsWithParentheses
>> +#define CHOOSE_FROM_ARG_COUNT(...) FUNC_RECOMPOSER((__VA_ARGS__, I915_CACHE_4, I915_CACHE_3, I915_CACHE_2, I915_CACHE_1, ))
>> +#define NO_ARG_EXPANDER() ,,,I915_CACHE_0
>> +#define MACRO_CHOOSER(...) CHOOSE_FROM_ARG_COUNT(NO_ARG_EXPANDER __VA_ARGS__ ())
>> +
>> +#define I915_CACHE(...) MACRO_CHOOSER(__VA_ARGS__)(__VA_ARGS__)
>> +
>> +/* i915_cache_t mode and flags extraction helpers. */
>> +#define I915_CACHE_MODE(cache) \
>> +	((enum i915_cache_mode)(((i915_cache_t)(cache)) & 0xff))
>> +#define I915_CACHE_FLAGS(cache) \
>> +	((unsigned int)((((i915_cache_t)(cache) & 0xff00)) >> 8))
>> +
>> +/* Helpers for i915 caching modes. */
>> +#define I915_CACHE_NONE		I915_CACHE(UC)
>> +#define I915_CACHE_CACHED	I915_CACHE(WB, COH1W, COH2W)
>> +#define I915_CACHE_WT		I915_CACHE(WT)
>> +
>> +int i915_cache_init(struct drm_i915_private *i915);
>> +int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache);
>> +void i915_cache_print(char *buf, size_t buflen, const char *suffix,
>> +		      i915_cache_t cache);
>> +
>> +#define I915_CACHE_NAME_LEN (40)
>>   
>>   #endif /* __I915_CACHE_H__ */
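
To save reviewers expanding the macro magic by hand, the overload is
intended to resolve like so (illustration only, not part of the patch):

	I915_CACHE(WB)               -> I915_CACHE_1(WB)
	                             -> (i915_cache_t)(I915_CACHE_MODE_WB | 0)

	I915_CACHE(WB, COH1W, COH2W) -> I915_CACHE_3(WB, COH1W, COH2W)
	                             -> (i915_cache_t)(I915_CACHE_MODE_WB |
	                                  (I915_CACHE_FLAG_COH1W << 8) |
	                                  (I915_CACHE_FLAG_COH2W << 8))

So the mode lives in the low byte and the flags in the high byte, which
is what the I915_CACHE_MODE() and I915_CACHE_FLAGS() extractors rely on.
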
>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>> index 4de44cf1026d..4ec292011546 100644
>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>> @@ -140,57 +140,18 @@ static const char *stringify_vma_type(const struct i915_vma *vma)
>>   	return "ppgtt";
>>   }
>>   
>> -static const char *i915_cache_level_str(struct drm_i915_gem_object *obj)
>> -{
>> -	struct drm_i915_private *i915 = obj_to_i915(obj);
>> -
>> -	if (IS_METEORLAKE(i915)) {
>> -		switch (obj->pat_index) {
>> -		case 0: return " WB";
>> -		case 1: return " WT";
>> -		case 2: return " UC";
>> -		case 3: return " WB (1-Way Coh)";
>> -		case 4: return " WB (2-Way Coh)";
>> -		default: return " not defined";
>> -		}
>> -	} else if (IS_PONTEVECCHIO(i915)) {
>> -		switch (obj->pat_index) {
>> -		case 0: return " UC";
>> -		case 1: return " WC";
>> -		case 2: return " WT";
>> -		case 3: return " WB";
>> -		case 4: return " WT (CLOS1)";
>> -		case 5: return " WB (CLOS1)";
>> -		case 6: return " WT (CLOS2)";
>> -		case 7: return " WT (CLOS2)";
>> -		default: return " not defined";
>> -		}
>> -	} else if (GRAPHICS_VER(i915) >= 12) {
>> -		switch (obj->pat_index) {
>> -		case 0: return " WB";
>> -		case 1: return " WC";
>> -		case 2: return " WT";
>> -		case 3: return " UC";
>> -		default: return " not defined";
>> -		}
>> -	} else {
>> -		switch (obj->pat_index) {
>> -		case 0: return " UC";
>> -		case 1: return HAS_LLC(i915) ?
>> -			       " LLC" : " snooped";
>> -		case 2: return " L3+LLC";
>> -		case 3: return " WT";
>> -		default: return " not defined";
>> -		}
>> -	}
>> -}
>> -
>>   void
>>   i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>>   {
>> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>> +	char buf[I915_CACHE_NAME_LEN];
>>   	struct i915_vma *vma;
>>   	int pin_count = 0;
>>   
>> +	i915_cache_print(buf, sizeof(buf),
>> +			 obj->pat_set_by_user ? "!" : NULL,
>> +			 INTEL_INFO(i915)->cache_modes[obj->pat_index]);
>> +
>>   	seq_printf(m, "%pK: %c%c%c %8zdKiB %02x %02x %s%s%s",
>>   		   &obj->base,
>>   		   get_tiling_flag(obj),
>> @@ -199,7 +160,7 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>>   		   obj->base.size / 1024,
>>   		   obj->read_domains,
>>   		   obj->write_domain,
>> -		   i915_cache_level_str(obj),
>> +		   buf,
>>   		   obj->mm.dirty ? " dirty" : "",
>>   		   obj->mm.madv == I915_MADV_DONTNEED ? " purgeable" : "");
>>   	if (obj->base.name)
>> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
>> index bb2223cc3470..8663388a524f 100644
>> --- a/drivers/gpu/drm/i915/i915_driver.c
>> +++ b/drivers/gpu/drm/i915/i915_driver.c
>> @@ -241,7 +241,9 @@ static int i915_driver_early_probe(struct drm_i915_private *dev_priv)
>>   	i915_memcpy_init_early(dev_priv);
>>   	intel_runtime_pm_init_early(&dev_priv->runtime_pm);
>>   
>> -	i915_cache_init(dev_priv);
>> +	ret = i915_cache_init(dev_priv);
>> +	if (ret < 0)
>> +		return ret;
>>   
>>   	ret = i915_workqueues_init(dev_priv);
>>   	if (ret < 0)
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 896aa48ed089..814705cfeb12 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -1144,19 +1144,6 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
>>   	unsigned int i;
>>   	int ret;
>>   
>> -	/*
>> -	 * In the proccess of replacing cache_level with pat_index a tricky
>> -	 * dependency is created on the definition of the enum i915_cache_level.
>> -	 * in case this enum is changed, PTE encode would be broken.
>> -	 * Add a WARNING here. And remove when we completely quit using this
>> -	 * enum
>> -	 */
>> -	BUILD_BUG_ON(I915_CACHE_NONE != 0 ||
>> -		     I915_CACHE_LLC != 1 ||
>> -		     I915_CACHE_L3_LLC != 2 ||
>> -		     I915_CACHE_WT != 3 ||
>> -		     I915_MAX_CACHE_LEVEL != 4);
>> -
>>   	/* We need to fallback to 4K pages if host doesn't support huge gtt. */
>>   	if (intel_vgpu_active(dev_priv) && !intel_vgpu_has_huge_gtt(dev_priv))
>>   		RUNTIME_INFO(dev_priv)->page_sizes = I915_GTT_PAGE_SIZE_4K;
>> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
>> index fcacdc21643c..565a60a1645d 100644
>> --- a/drivers/gpu/drm/i915/i915_pci.c
>> +++ b/drivers/gpu/drm/i915/i915_pci.c
>> @@ -32,6 +32,7 @@
>>   #include "gt/intel_sa_media.h"
>>   #include "gem/i915_gem_object_types.h"
>>   
>> +#include "i915_cache.h"
>>   #include "i915_driver.h"
>>   #include "i915_drv.h"
>>   #include "i915_pci.h"
>> @@ -43,36 +44,43 @@
>>   	.__runtime.graphics.ip.ver = (x), \
>>   	.__runtime.media.ip.ver = (x)
>>   
>> -#define LEGACY_CACHELEVEL \
>> -	.cachelevel_to_pat = { \
>> -		[I915_CACHE_NONE]   = 0, \
>> -		[I915_CACHE_LLC]    = 1, \
>> -		[I915_CACHE_L3_LLC] = 2, \
>> -		[I915_CACHE_WT]     = 3, \
>> +#define LEGACY_CACHE_MODES \
>> +	.cache_modes = { \
>> +		[I915_CACHE_MODE_UC] 	  = I915_CACHE(UC), \
>> +		[I915_CACHE_MODE_WB] 	  = I915_CACHE(WB, COH1W, COH2W), \
> 
> Reading bspec 2863 (bdw) indicates that the CPU being able to snoop the
> GPU's L3 was a new feature in gen8.  So for HSW and earlier, any
> coherency was only 1-way (GPU could be coherent with CPU's caches, but
> not vice-versa).  Only starting with gen8 did we get 2-way coherency as
> an option where the CPU would also be coherent with the GPU cache (and
> with gen8 and beyond you could still select 1-way instead of 2-way
> coherency with instruction-level granularity via MOCS).  There are also
> some legacy platforms (e.g., EHL/JSL on bspec 13948) where the IA wasn't
> coherent with GPU L3 so we were back to 1-way coherency.
> 
> So should we split LEGACY_CACHE_MODES into two tables with different
> coherency settings attached to I915_CACHE_MODE_WB?
> 
>> +		[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, COH2W, L3), \
>> +		[I915_CACHE_MODE_WT] 	  = I915_CACHE(WT), \
>>   	}
>>   
>> -#define TGL_CACHELEVEL \
>> -	.cachelevel_to_pat = { \
>> -		[I915_CACHE_NONE]   = 3, \
>> -		[I915_CACHE_LLC]    = 0, \
>> -		[I915_CACHE_L3_LLC] = 0, \
>> -		[I915_CACHE_WT]     = 2, \
>> +#define GEN12_CACHE_MODES \
>> +	.cache_modes = { \
>> +		[0] = I915_CACHE(WB, COH1W, COH2W), \
>> +		[1] = I915_CACHE(WC), \
>> +		[2] = I915_CACHE(WT), \
>> +		[3] = I915_CACHE(UC), \
>>   	}
>>   
>> -#define PVC_CACHELEVEL \
>> -	.cachelevel_to_pat = { \
>> -		[I915_CACHE_NONE]   = 0, \
>> -		[I915_CACHE_LLC]    = 3, \
>> -		[I915_CACHE_L3_LLC] = 3, \
>> -		[I915_CACHE_WT]     = 2, \
>> +/* FIXME: is WB 1-way or 2-way coherent for PVC PAT indices 3, 5 and 7? */
>> +
>> +#define PVC_CACHE_MODES \
>> +	.cache_modes = { \
>> +		[0] = I915_CACHE(UC), \
>> +		[1] = I915_CACHE(WC), \
>> +		[2] = I915_CACHE(WT), \
>> +		[3] = I915_CACHE(WB, COH1W), \
>> +		[4] = I915_CACHE(WT, CLOS1), \
>> +		[5] = I915_CACHE(WB, COH1W, CLOS1), \
>> +		[6] = I915_CACHE(WT, CLOS2), \
>> +		[7] = I915_CACHE(WB, COH1W, CLOS2), \
>>   	}
>>   
>> -#define MTL_CACHELEVEL \
>> -	.cachelevel_to_pat = { \
>> -		[I915_CACHE_NONE]   = 2, \
>> -		[I915_CACHE_LLC]    = 3, \
>> -		[I915_CACHE_L3_LLC] = 3, \
>> -		[I915_CACHE_WT]     = 1, \
>> +#define MTL_CACHE_MODES \
>> +	.cache_modes = { \
>> +		[0] = I915_CACHE(WB), \
>> +		[1] = I915_CACHE(WT), \
>> +		[2] = I915_CACHE(UC), \
>> +		[3] = I915_CACHE(WB, COH1W), \
>> +		[4] = I915_CACHE(WB, COH1W, COH2W), \
> 
> We may want a comment on this one since the "2W" part is sort of a lie.
> Bspec 63884 has a programming note for MTL that says
> 
>          "...Except for system atomics, setting Coherency Mode to 10 or
>          11 results in this same one-way coherent behavior..."
> 
> So if we ask for 2W, we actually only get 1W behavior except in a very
> narrow set of cases.

Shall I just not mark it as 2-way then, because it sounds like for i915
purposes it is not 2-way?!

Could invent a new flag just to document that this is something weird?
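
For example (flag name completely made up, only to show the idea):

	/* Hypothetical - 2-way coherent for system atomics only. */
	#define I915_CACHE_FLAG_COH2W_ATOMIC	(0x20)

	[4] = I915_CACHE(WB, COH1W, COH2W_ATOMIC), \

Plus a matching flag_str[] entry in i915_cache_print(), so debugfs
would spell out the quirk instead of claiming full 2-way coherency.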

Regards,

Tvrtko

> 
> 
> Matt
> 
>>   	}
>>   
>>   /* Keep in gen based order, and chronological order within a gen */
>> @@ -97,7 +105,7 @@
>>   	.max_pat_index = 3, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   #define I845_FEATURES \
>>   	GEN(2), \
>> @@ -112,7 +120,7 @@
>>   	.max_pat_index = 3, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   static const struct intel_device_info i830_info = {
>>   	I830_FEATURES,
>> @@ -145,7 +153,7 @@ static const struct intel_device_info i865g_info = {
>>   	.max_pat_index = 3, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   static const struct intel_device_info i915g_info = {
>>   	GEN3_FEATURES,
>> @@ -208,7 +216,7 @@ static const struct intel_device_info pnv_m_info = {
>>   	.max_pat_index = 3, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   static const struct intel_device_info i965g_info = {
>>   	GEN4_FEATURES,
>> @@ -252,7 +260,7 @@ static const struct intel_device_info gm45_info = {
>>   	.max_pat_index = 3, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   static const struct intel_device_info ilk_d_info = {
>>   	GEN5_FEATURES,
>> @@ -282,7 +290,7 @@ static const struct intel_device_info ilk_m_info = {
>>   	.__runtime.ppgtt_size = 31, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   #define SNB_D_PLATFORM \
>>   	GEN6_FEATURES, \
>> @@ -330,7 +338,7 @@ static const struct intel_device_info snb_m_gt2_info = {
>>   	.__runtime.ppgtt_size = 31, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   #define IVB_D_PLATFORM \
>>   	GEN7_FEATURES, \
>> @@ -387,7 +395,7 @@ static const struct intel_device_info vlv_info = {
>>   	.platform_engine_mask = BIT(RCS0) | BIT(VCS0) | BIT(BCS0),
>>   	GEN_DEFAULT_PAGE_SIZES,
>>   	GEN_DEFAULT_REGIONS,
>> -	LEGACY_CACHELEVEL,
>> +	LEGACY_CACHE_MODES
>>   };
>>   
>>   #define G75_FEATURES  \
>> @@ -473,7 +481,7 @@ static const struct intel_device_info chv_info = {
>>   	.has_coherent_ggtt = false,
>>   	GEN_DEFAULT_PAGE_SIZES,
>>   	GEN_DEFAULT_REGIONS,
>> -	LEGACY_CACHELEVEL,
>> +	LEGACY_CACHE_MODES
>>   };
>>   
>>   #define GEN9_DEFAULT_PAGE_SIZES \
>> @@ -536,7 +544,7 @@ static const struct intel_device_info skl_gt4_info = {
>>   	.max_pat_index = 3, \
>>   	GEN9_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   static const struct intel_device_info bxt_info = {
>>   	GEN9_LP_FEATURES,
>> @@ -640,7 +648,7 @@ static const struct intel_device_info jsl_info = {
>>   #define GEN12_FEATURES \
>>   	GEN11_FEATURES, \
>>   	GEN(12), \
>> -	TGL_CACHELEVEL, \
>> +	GEN12_CACHE_MODES, \
>>   	.has_global_mocs = 1, \
>>   	.has_pxp = 1, \
>>   	.max_pat_index = 3
>> @@ -708,7 +716,7 @@ static const struct intel_device_info adl_p_info = {
>>   	.__runtime.graphics.ip.ver = 12, \
>>   	.__runtime.graphics.ip.rel = 50, \
>>   	XE_HP_PAGE_SIZES, \
>> -	TGL_CACHELEVEL, \
>> +	GEN12_CACHE_MODES, \
>>   	.dma_mask_size = 46, \
>>   	.has_3d_pipeline = 1, \
>>   	.has_64bit_reloc = 1, \
>> @@ -803,7 +811,7 @@ static const struct intel_device_info pvc_info = {
>>   		BIT(VCS0) |
>>   		BIT(CCS0) | BIT(CCS1) | BIT(CCS2) | BIT(CCS3),
>>   	.require_force_probe = 1,
>> -	PVC_CACHELEVEL,
>> +	PVC_CACHE_MODES
>>   };
>>   
>>   static const struct intel_gt_definition xelpmp_extra_gt[] = {
>> @@ -838,7 +846,7 @@ static const struct intel_device_info mtl_info = {
>>   	.memory_regions = REGION_SMEM | REGION_STOLEN_LMEM,
>>   	.platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(CCS0),
>>   	.require_force_probe = 1,
>> -	MTL_CACHELEVEL,
>> +	MTL_CACHE_MODES
>>   };
>>   
>>   #undef PLATFORM
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index 04bc1f4a1115..973175a64534 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -1870,7 +1870,7 @@ static int alloc_oa_buffer(struct i915_perf_stream *stream)
>>   		return PTR_ERR(bo);
>>   	}
>>   
>> -	i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(bo, I915_CACHE_CACHED);
>>   
>>   	/* PreHSW required 512K alignment, HSW requires 16M */
>>   	vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL);
>> diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
>> index dbfe6443457b..2ce13b7c48cb 100644
>> --- a/drivers/gpu/drm/i915/intel_device_info.h
>> +++ b/drivers/gpu/drm/i915/intel_device_info.h
>> @@ -27,6 +27,8 @@
>>   
>>   #include <uapi/drm/i915_drm.h>
>>   
>> +#include "i915_cache.h"
>> +
>>   #include "intel_step.h"
>>   
>>   #include "gt/intel_engine_types.h"
>> @@ -243,8 +245,8 @@ struct intel_device_info {
>>   	 */
>>   	const struct intel_runtime_info __runtime;
>>   
>> -	u32 cachelevel_to_pat[I915_MAX_CACHE_LEVEL];
>> -	u32 max_pat_index;
>> +	i915_cache_t cache_modes[8];
>> +	unsigned int max_pat_index;
>>   };
>>   
>>   struct intel_driver_caps {
>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>> index f910ec9b6d2b..ba821e48baa5 100644
>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>> @@ -267,7 +267,7 @@ static int igt_evict_for_cache_color(void *arg)
>>   		err = PTR_ERR(obj);
>>   		goto cleanup;
>>   	}
>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>   	quirk_add(obj, &objects);
>>   
>>   	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
>> @@ -283,7 +283,7 @@ static int igt_evict_for_cache_color(void *arg)
>>   		err = PTR_ERR(obj);
>>   		goto cleanup;
>>   	}
>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>   	quirk_add(obj, &objects);
>>   
>>   	/* Neighbouring; same colour - should fit */
>> diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
>> index 3c5e0952f1b8..4cfc5000d6ff 100644
>> --- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
>> +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
>> @@ -23,7 +23,7 @@ int igt_spinner_init(struct igt_spinner *spin, struct intel_gt *gt)
>>   		err = PTR_ERR(spin->hws);
>>   		goto err;
>>   	}
>> -	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_CACHED);
>>   
>>   	spin->obj = i915_gem_object_create_internal(gt->i915, PAGE_SIZE);
>>   	if (IS_ERR(spin->obj)) {
>> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>> index 1d1a457e2aee..8ae77bcf27fa 100644
>> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>> @@ -126,13 +126,13 @@ static const struct intel_device_info mock_info = {
>>   	.memory_regions = REGION_SMEM,
>>   	.platform_engine_mask = BIT(0),
>>   
>> -	/* simply use legacy cache level for mock device */
>> +	/* Simply use legacy cache modes for the mock device. */
>>   	.max_pat_index = 3,
>> -	.cachelevel_to_pat = {
>> -		[I915_CACHE_NONE]   = 0,
>> -		[I915_CACHE_LLC]    = 1,
>> -		[I915_CACHE_L3_LLC] = 2,
>> -		[I915_CACHE_WT]     = 3,
>> +	.cache_modes = {
>> +		[0] = I915_CACHE(UC),
>> +		[1] = I915_CACHE(WB, COH1W),
>> +		[2] = I915_CACHE(WB, COH1W, COH2W, L3),
>> +		[3] = I915_CACHE(WT),
>>   	},
>>   };
>>   
>> @@ -181,7 +181,7 @@ struct drm_i915_private *mock_gem_device(void)
>>   	/* Set up device info and initial runtime info. */
>>   	intel_device_info_driver_create(i915, pdev->device, &mock_info);
>>   
>> -	i915_cache_init(i915);
>> +	WARN_ON(i915_cache_init(i915));
>>   
>>   	dev_pm_domain_set(&pdev->dev, &pm_domain);
>>   	pm_runtime_enable(&pdev->dev);
>> -- 
>> 2.39.2
>>
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Intel-gfx] [RFC 4/8] drm/i915: Refactor PAT/object cache handling
@ 2023-07-28 12:39       ` Tvrtko Ursulin
  0 siblings, 0 replies; 59+ messages in thread
From: Tvrtko Ursulin @ 2023-07-28 12:39 UTC (permalink / raw)
  To: Matt Roper; +Cc: Intel-gfx, dri-devel, Chris Wilson


Forgot one part of your reply:

On 28/07/2023 00:57, Matt Roper wrote:
> On Thu, Jul 27, 2023 at 03:55:00PM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Commit 9275277d5324 ("drm/i915: use pat_index instead of cache_level") has
>> introduced PAT indices to i915 internal APIs, partially replacing the
>> usage of driver internal cache_level, but has also added a few sub-
>> optimal design decisions which this patch tries to improve upon.
>>
>> Principal change here is to invert the per platform cache level to PAT
>> index table which was added by the referenced commit, and by doing so
>> enable i915 to understand the cache mode between PAT indices, changing
>> them from opaque to transparent.
>>
>> Once we have the inverted table we are able to remove the hidden false
>> "return true" from i915_gem_object_has_cache_level and make the involved
>> code path clearer.
>>
>> To achieve this we replace the enum i915_cache_level with i915_cache_t,
>> composed of a more detailed representation of each cache mode (base mode
>> plus flags).
>>
>> In this way we are able to express the differences between different
>> write-back mode coherency settings on Meteorlake, which in turn enables us
>> to map the i915 "cached" mode to the correct Meteorlake PAT index.
>>
>> We can also replace the platform dependent cache mode to string code in
>> debugfs and elsewhere by the single implementation based on i915_cache_t.
>>
>> v2:
>>   * Fix PAT-to-cache-mode table for PVC. (Fei)
>>   * Cache display caching mode too. (Fei)
>>   * Improve and document criteria in i915_gem_object_can_bypass_llc() (Matt)
>>
>> v3:
>>   * Checkpatch issues.
>>   * Cache mode flags check fixed.
>>
>> v4:
>>   * Fix intel_device_info->cache_modes array size. (Matt)
>>   * Boolean cache mode and flags query. (Matt)
>>   * Reduce number of cache macros with some macro magic.
>>   * One more checkpatch fix.
>>   * Tweak tables to show legacy and Gen12 WB is fully coherent.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> References: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>> Cc: Fei Yang <fei.yang@intel.com>
>> Cc: Andi Shyti <andi.shyti@linux.intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_domain.c    |  60 +++++----
>>   drivers/gpu/drm/i915/gem/i915_gem_domain.h    |   5 +-
>>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   3 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_mman.c      |   4 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_object.c    | 117 ++++++++++--------
>>   drivers/gpu/drm/i915/gem/i915_gem_object.h    |  11 +-
>>   .../gpu/drm/i915/gem/i915_gem_object_types.h  | 116 +----------------
>>   drivers/gpu/drm/i915/gem/i915_gem_shmem.c     |   8 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |   2 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  20 +--
>>   drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   2 +-
>>   .../drm/i915/gem/selftests/huge_gem_object.c  |   2 +-
>>   .../gpu/drm/i915/gem/selftests/huge_pages.c   |   3 +-
>>   drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  10 +-
>>   drivers/gpu/drm/i915/gt/intel_engine_cs.c     |   2 +-
>>   drivers/gpu/drm/i915/gt/intel_ggtt.c          |  25 ++--
>>   drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c     |   4 +-
>>   drivers/gpu/drm/i915/gt/intel_gtt.c           |   2 +-
>>   drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +-
>>   drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   6 +-
>>   .../gpu/drm/i915/gt/intel_ring_submission.c   |   4 +-
>>   drivers/gpu/drm/i915/gt/intel_timeline.c      |   2 +-
>>   drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   2 +-
>>   .../gpu/drm/i915/gt/selftest_workarounds.c    |   2 +-
>>   drivers/gpu/drm/i915/i915_cache.c             |  89 +++++++++++--
>>   drivers/gpu/drm/i915/i915_cache.h             |  70 ++++++++++-
>>   drivers/gpu/drm/i915/i915_debugfs.c           |  53 ++------
>>   drivers/gpu/drm/i915/i915_driver.c            |   4 +-
>>   drivers/gpu/drm/i915/i915_gem.c               |  13 --
>>   drivers/gpu/drm/i915/i915_pci.c               |  84 +++++++------
>>   drivers/gpu/drm/i915/i915_perf.c              |   2 +-
>>   drivers/gpu/drm/i915/intel_device_info.h      |   6 +-
>>   .../gpu/drm/i915/selftests/i915_gem_evict.c   |   4 +-
>>   drivers/gpu/drm/i915/selftests/igt_spinner.c  |   2 +-
>>   .../gpu/drm/i915/selftests/mock_gem_device.c  |  14 +--
>>   36 files changed, 391 insertions(+), 367 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>> index 57db9c581bf6..c15f83de33af 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>> @@ -8,6 +8,7 @@
>>   #include "display/intel_frontbuffer.h"
>>   #include "gt/intel_gt.h"
>>   
>> +#include "i915_cache.h"
>>   #include "i915_drv.h"
>>   #include "i915_gem_clflush.h"
>>   #include "i915_gem_domain.h"
>> @@ -41,14 +42,17 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>>   		return false;
>>   
>>   	/*
>> -	 * For objects created by userspace through GEM_CREATE with pat_index
>> -	 * set by set_pat extension, i915_gem_object_has_cache_level() will
>> -	 * always return true, because the coherency of such object is managed
>> -	 * by userspace. Othereise the call here would fall back to checking
>> -	 * whether the object is un-cached or write-through.
>> +	 * Always flush cache for UMD objects with PAT index set.
>>   	 */
>> -	return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
>> -		 i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
>> +	if (obj->pat_set_by_user)
>> +		return true;
>> +
>> +	/*
>> +	 * Fully coherent cached access may end up with data in the CPU cache
>> +	 * which hasn't hit memory yet.
>> +	 */
>> +	return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
>> +	       i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W);
>>   }
>>   
>>   bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>> @@ -268,7 +272,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>>   /**
>>    * i915_gem_object_set_cache_level - Changes the cache-level of an object across all VMA.
>>    * @obj: object to act on
>> - * @cache_level: new cache level to set for the object
>> + * @cache: new caching mode to set for the object
>>    *
>>    * After this function returns, the object will be in the new cache-level
>>    * across all GTT and the contents of the backing storage will be coherent,
>> @@ -281,18 +285,28 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>>    * that all direct access to the scanout remains coherent.
>>    */
>>   int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>> -				    enum i915_cache_level cache_level)
>> +				    i915_cache_t cache)
>>   {
>> -	int ret;
>> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>> +	int pat, ret;
>>   
>> -	/*
>> -	 * For objects created by userspace through GEM_CREATE with pat_index
>> -	 * set by set_pat extension, simply return 0 here without touching
>> -	 * the cache setting, because such objects should have an immutable
>> -	 * cache setting by desgin and always managed by userspace.
>> -	 */
>> -	if (i915_gem_object_has_cache_level(obj, cache_level))
>> +	pat = i915_cache_find_pat(i915, cache);
>> +	if (pat < 0) {
>> +		char buf[I915_CACHE_NAME_LEN];
>> +
>> +		i915_cache_print(buf, sizeof(buf), NULL, cache);
>> +		drm_err_ratelimited(&i915->drm,
>> +				    "Attempting to use unknown caching mode %s!\n",
>> +				    buf);
>> +
>> +		return -EINVAL;
>> +	} else if (pat == obj->pat_index) {
>>   		return 0;
>> +	} else if (obj->pat_set_by_user) {
>> +		drm_notice_once(&i915->drm,
>> +				"Attempting to change caching mode on an object with fixed PAT!\n");
>> +		return -EINVAL;
>> +	}
>>   
>>   	ret = i915_gem_object_wait(obj,
>>   				   I915_WAIT_INTERRUPTIBLE |
>> @@ -302,7 +316,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>>   		return ret;
>>   
>>   	/* Always invalidate stale cachelines */
>> -	i915_gem_object_set_cache_coherency(obj, cache_level);
>> +	i915_gem_object_set_pat_index(obj, pat);
>>   	obj->cache_dirty = true;
>>   
>>   	/* The cache-level will be applied when each vma is rebound. */
>> @@ -337,10 +351,10 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
>>   		goto out;
>>   	}
>>   
>> -	if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||
>> -	    i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
>> +	if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
>> +	    i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W))
>>   		args->caching = I915_CACHING_CACHED;
>> -	else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
>> +	else if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT))
>>   		args->caching = I915_CACHING_DISPLAY;
>>   	else
>>   		args->caching = I915_CACHING_NONE;
>> @@ -355,7 +369,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
>>   	struct drm_i915_private *i915 = to_i915(dev);
>>   	struct drm_i915_gem_caching *args = data;
>>   	struct drm_i915_gem_object *obj;
>> -	enum i915_cache_level level;
>> +	i915_cache_t level;
>>   	int ret = 0;
>>   
>>   	if (IS_DGFX(i915))
>> @@ -378,7 +392,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
>>   		if (!HAS_LLC(i915) && !HAS_SNOOP(i915))
>>   			return -ENODEV;
>>   
>> -		level = I915_CACHE_LLC;
>> +		level = I915_CACHE_CACHED;
>>   		break;
>>   	case I915_CACHING_DISPLAY:
>>   		level = HAS_WT(i915) ? I915_CACHE_WT : I915_CACHE_NONE;
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.h b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
>> index 9622df962bfc..6da5c351f6fd 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
>> @@ -6,10 +6,11 @@
>>   #ifndef __I915_GEM_DOMAIN_H__
>>   #define __I915_GEM_DOMAIN_H__
>>   
>> +#include "i915_cache.h"
>> +
>>   struct drm_i915_gem_object;
>> -enum i915_cache_level;
>>   
>>   int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>> -				    enum i915_cache_level cache_level);
>> +				    i915_cache_t cache);
>>   
>>   #endif /* __I915_GEM_DOMAIN_H__ */
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> index 0a1d40220020..9d6e49c8a4c6 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> @@ -648,7 +648,8 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
>>   	 */
>>   	return (cache->has_llc ||
>>   		obj->cache_dirty ||
>> -		!i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
>> +		!(obj->pat_set_by_user ||
>> +		  i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)));
>>   }
>>   
>>   static int eb_reserve_vma(struct i915_execbuffer *eb,
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
>> index 6bc26b4b06b8..88c360c3d6a3 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
>> @@ -170,7 +170,7 @@ __i915_gem_object_create_internal(struct drm_i915_private *i915,
>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>>   
>> -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
>> +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>>   	i915_gem_object_set_cache_coherency(obj, cache_level);
>>   
>>   	return obj;
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> index aa4d842d4c5a..cd7f8ded0d6f 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> @@ -382,7 +382,6 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>>   		goto err_reset;
>>   	}
>>   
>> -	/* Access to snoopable pages through the GTT is incoherent. */
>>   	/*
>>   	 * For objects created by userspace through GEM_CREATE with pat_index
>>   	 * set by set_pat extension, coherency is managed by userspace, make
>> @@ -391,7 +390,8 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>>   	 * objects. Otherwise this helper function would fall back to checking
>>   	 * whether the object is un-cached.
>>   	 */
>> -	if (!(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
>> +	if (!((obj->pat_set_by_user ||
>> +	       i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)) ||
>>   	      HAS_LLC(i915))) {
>>   		ret = -EFAULT;
>>   		goto err_unpin;
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> index 3dc4fbb67d2b..ec1f0be43d0d 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> @@ -45,33 +45,6 @@ static struct kmem_cache *slab_objects;
>>   
>>   static const struct drm_gem_object_funcs i915_gem_object_funcs;
>>   
>> -unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
>> -				    enum i915_cache_level level)
>> -{
>> -	if (drm_WARN_ON(&i915->drm, level >= I915_MAX_CACHE_LEVEL))
>> -		return 0;
>> -
>> -	return INTEL_INFO(i915)->cachelevel_to_pat[level];
>> -}
>> -
>> -bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
>> -				     enum i915_cache_level lvl)
>> -{
>> -	/*
>> -	 * In case the pat_index is set by user space, this kernel mode
>> -	 * driver should leave the coherency to be managed by user space,
>> -	 * simply return true here.
>> -	 */
>> -	if (obj->pat_set_by_user)
>> -		return true;
>> -
>> -	/*
>> -	 * Otherwise the pat_index should have been converted from cache_level
>> -	 * so that the following comparison is valid.
>> -	 */
>> -	return obj->pat_index == i915_gem_get_pat_index(obj_to_i915(obj), lvl);
>> -}
>> -
>>   struct drm_i915_gem_object *i915_gem_object_alloc(void)
>>   {
>>   	struct drm_i915_gem_object *obj;
>> @@ -144,30 +117,72 @@ void __i915_gem_object_fini(struct drm_i915_gem_object *obj)
>>   	dma_resv_fini(&obj->base._resv);
>>   }
>>   
>> +bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
>> +				    enum i915_cache_mode mode)
>> +{
>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>> +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
>> +
>> +	return I915_CACHE_MODE(cache) == mode;
>> +}
>> +
>> +bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
>> +				    unsigned int flag)
>> +{
>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>> +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
>> +
>> +	return I915_CACHE_FLAGS(cache) & flag;
>> +}
>> +
>> +static void __i915_gem_object_update_coherency(struct drm_i915_gem_object *obj)
>> +{
>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>> +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
>> +	const unsigned int flags = I915_CACHE_FLAGS(cache);
>> +	const unsigned int mode = I915_CACHE_MODE(cache);
>> +
>> +	if (mode == I915_CACHE_MODE_WC ||
>> +	    mode == I915_CACHE_MODE_WT ||
>> +	    (mode == I915_CACHE_MODE_WB && (flags & I915_CACHE_FLAG_COH2W)))
>> +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ |
>> +				      I915_BO_CACHE_COHERENT_FOR_WRITE;
>> +	else if (HAS_LLC(i915))
>> +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
>> +	else
>> +		obj->cache_coherent = 0;
>> +
>> +	obj->cache_dirty =
>> +		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
>> +		!IS_DGFX(i915);
>> +}
>> +
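
As an aside, not in the patch, the mapping the above implements boils
down to:

	WC, WT, or WB + COH2W  -> COHERENT_FOR_READ | COHERENT_FOR_WRITE
	otherwise, HAS_LLC     -> COHERENT_FOR_READ
	otherwise              -> 0

With cache_dirty then set for anything not coherent for writes, except
on discrete.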
>>   /**
>>    * i915_gem_object_set_cache_coherency - Mark up the object's coherency levels
>> - * for a given cache_level
>> + * for a given caching mode
>>    * @obj: #drm_i915_gem_object
>> - * @cache_level: cache level
>> + * @cache: cache mode
>>    */
>>   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>> -					 unsigned int cache_level)
>> +					 i915_cache_t cache)
>>   {
>> -	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>> +	int found;
>>   
>> -	obj->pat_index = i915_gem_get_pat_index(i915, cache_level);
>> +	found = i915_cache_find_pat(i915, cache);
>> +	if (found < 0) {
>> +		char buf[I915_CACHE_NAME_LEN];
>>   
>> -	if (cache_level != I915_CACHE_NONE)
>> -		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
>> -				       I915_BO_CACHE_COHERENT_FOR_WRITE);
>> -	else if (HAS_LLC(i915))
>> -		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
>> -	else
>> -		obj->cache_coherent = 0;
>> +		i915_cache_print(buf, sizeof(buf), NULL, cache);
>> +		drm_err_ratelimited(&i915->drm, "Unknown cache mode %s!\n",
>> +				    buf);
>>   
>> -	obj->cache_dirty =
>> -		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
>> -		!IS_DGFX(i915);
>> +		found = i915->pat_uc;
>> +	}
>> +
>> +	obj->pat_index = found;
>> +
>> +	__i915_gem_object_update_coherency(obj);
>>   }
>>   
>>   /**
>> @@ -181,24 +196,18 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>>   void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
>>   				   unsigned int pat_index)
>>   {
>> -	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>> +	struct drm_i915_private *i915 = obj_to_i915(obj);
>>   
>>   	if (obj->pat_index == pat_index)
>>   		return;
>>   
>> +	if (drm_WARN_ON_ONCE(&i915->drm,
>> +			     pat_index > INTEL_INFO(i915)->max_pat_index))
>> +		return;
>> +
>>   	obj->pat_index = pat_index;
>>   
>> -	if (pat_index != i915_gem_get_pat_index(i915, I915_CACHE_NONE))
>> -		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
>> -				       I915_BO_CACHE_COHERENT_FOR_WRITE);
>> -	else if (HAS_LLC(i915))
>> -		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
>> -	else
>> -		obj->cache_coherent = 0;
>> -
>> -	obj->cache_dirty =
>> -		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
>> -		!IS_DGFX(i915);
>> +	__i915_gem_object_update_coherency(obj);
>>   }
>>   
>>   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> index 884a17275b3a..a5d4ee19d9be 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
>> @@ -13,6 +13,7 @@
>>   
>>   #include "display/intel_frontbuffer.h"
>>   #include "intel_memory_region.h"
>> +#include "i915_cache.h"
>>   #include "i915_gem_object_types.h"
>>   #include "i915_gem_gtt.h"
>>   #include "i915_gem_ww.h"
>> @@ -32,10 +33,6 @@ static inline bool i915_gem_object_size_2big(u64 size)
>>   	return false;
>>   }
>>   
>> -unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
>> -				    enum i915_cache_level level);
>> -bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
>> -				     enum i915_cache_level lvl);
>>   void i915_gem_init__objects(struct drm_i915_private *i915);
>>   
>>   void i915_objects_module_exit(void);
>> @@ -764,8 +761,12 @@ int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
>>   				      bool intr);
>>   bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj);
>>   
>> +bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
>> +				    enum i915_cache_mode mode);
>> +bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
>> +				    unsigned int flag);
>>   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
>> -					 unsigned int cache_level);
>> +					 i915_cache_t cache);
>>   void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
>>   				   unsigned int pat_index);
>>   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj);
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>> index 8de2b91b3edf..6790e13ad262 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
>> @@ -14,6 +14,7 @@
>>   #include <uapi/drm/i915_drm.h>
>>   
>>   #include "i915_active.h"
>> +#include "i915_cache.h"
>>   #include "i915_selftest.h"
>>   #include "i915_vma_resource.h"
>>   
>> @@ -116,93 +117,6 @@ struct drm_i915_gem_object_ops {
>>   	const char *name; /* friendly name for debug, e.g. lockdep classes */
>>   };
>>   
>> -/**
>> - * enum i915_cache_level - The supported GTT caching values for system memory
>> - * pages.
>> - *
>> - * These translate to some special GTT PTE bits when binding pages into some
>> - * address space. It also determines whether an object, or rather its pages are
>> - * coherent with the GPU, when also reading or writing through the CPU cache
>> - * with those pages.
>> - *
>> - * Userspace can also control this through struct drm_i915_gem_caching.
>> - */
>> -enum i915_cache_level {
>> -	/**
>> -	 * @I915_CACHE_NONE:
>> -	 *
>> -	 * GPU access is not coherent with the CPU cache. If the cache is dirty
>> -	 * and we need the underlying pages to be coherent with some later GPU
>> -	 * access then we need to manually flush the pages.
>> -	 *
>> -	 * On shared LLC platforms reads and writes through the CPU cache are
>> -	 * still coherent even with this setting. See also
>> -	 * &drm_i915_gem_object.cache_coherent for more details. Due to this we
>> -	 * should only ever use uncached for scanout surfaces, otherwise we end
>> -	 * up over-flushing in some places.
>> -	 *
>> -	 * This is the default on non-LLC platforms.
>> -	 */
>> -	I915_CACHE_NONE = 0,
>> -	/**
>> -	 * @I915_CACHE_LLC:
>> -	 *
>> -	 * GPU access is coherent with the CPU cache. If the cache is dirty,
>> -	 * then the GPU will ensure that access remains coherent, when both
>> -	 * reading and writing through the CPU cache. GPU writes can dirty the
>> -	 * CPU cache.
>> -	 *
>> -	 * Not used for scanout surfaces.
>> -	 *
>> -	 * Applies to both platforms with shared LLC(HAS_LLC), and snooping
>> -	 * based platforms(HAS_SNOOP).
>> -	 *
>> -	 * This is the default on shared LLC platforms.  The only exception is
>> -	 * scanout objects, where the display engine is not coherent with the
>> -	 * CPU cache. For such objects I915_CACHE_NONE or I915_CACHE_WT is
>> -	 * automatically applied by the kernel in pin_for_display, if userspace
>> -	 * has not done so already.
>> -	 */
>> -	I915_CACHE_LLC,
>> -	/**
>> -	 * @I915_CACHE_L3_LLC:
>> -	 *
>> -	 * Explicitly enable the Gfx L3 cache, with coherent LLC.
>> -	 *
>> -	 * The Gfx L3 sits between the domain specific caches, e.g
>> -	 * sampler/render caches, and the larger LLC. LLC is coherent with the
>> -	 * GPU, but L3 is only visible to the GPU, so likely needs to be flushed
>> -	 * when the workload completes.
>> -	 *
>> -	 * Not used for scanout surfaces.
>> -	 *
>> -	 * Only exposed on some gen7 + GGTT. More recent hardware has dropped
>> -	 * this explicit setting, where it should now be enabled by default.
>> -	 */
>> -	I915_CACHE_L3_LLC,
>> -	/**
>> -	 * @I915_CACHE_WT:
>> -	 *
>> -	 * Write-through. Used for scanout surfaces.
>> -	 *
>> -	 * The GPU can utilise the caches, while still having the display engine
>> -	 * be coherent with GPU writes, as a result we don't need to flush the
>> -	 * CPU caches when moving out of the render domain. This is the default
>> -	 * setting chosen by the kernel, if supported by the HW, otherwise we
>> -	 * fallback to I915_CACHE_NONE. On the CPU side writes through the CPU
>> -	 * cache still need to be flushed, to remain coherent with the display
>> -	 * engine.
>> -	 */
>> -	I915_CACHE_WT,
>> -	/**
>> -	 * @I915_MAX_CACHE_LEVEL:
>> -	 *
>> -	 * Mark the last entry in the enum. Used for defining cachelevel_to_pat
>> -	 * array for cache_level to pat translation table.
>> -	 */
>> -	I915_MAX_CACHE_LEVEL,
>> -};
>> -
>>   enum i915_map_type {
>>   	I915_MAP_WB = 0,
>>   	I915_MAP_WC,
>> @@ -403,16 +317,6 @@ struct drm_i915_gem_object {
>>   	/**
>>   	 * @cache_coherent:
>>   	 *
>> -	 * Note: with the change above which replaced @cache_level with pat_index,
>> -	 * the use of @cache_coherent is limited to the objects created by kernel
>> -	 * or by userspace without pat index specified.
>> -	 * Check for @pat_set_by_user to find out if an object has pat index set
>> -	 * by userspace. The ioctl's to change cache settings have also been
>> -	 * disabled for the objects with pat index set by userspace. Please don't
>> -	 * assume @cache_coherent having the flags set as describe here. A helper
>> -	 * function i915_gem_object_has_cache_level() provides one way to bypass
>> -	 * the use of this field.
>> -	 *
>>   	 * Track whether the pages are coherent with the GPU if reading or
>>   	 * writing through the CPU caches. This largely depends on the
>>   	 * @cache_level setting.
>> @@ -447,7 +351,7 @@ struct drm_i915_gem_object {
>>   	 * flushing the surface just before doing the scanout.  This does mean
>>   	 * we might unnecessarily flush non-scanout objects in some places, but
>>   	 * the default assumption is that all normal objects should be using
>> -	 * I915_CACHE_LLC, at least on platforms with the shared LLC.
>> +	 * I915_CACHE_CACHED, at least on platforms with the shared LLC.
>>   	 *
>>   	 * Supported values:
>>   	 *
>> @@ -486,16 +390,6 @@ struct drm_i915_gem_object {
>>   	/**
>>   	 * @cache_dirty:
>>   	 *
>> -	 * Note: with the change above which replaced cache_level with pat_index,
>> -	 * the use of @cache_dirty is limited to the objects created by kernel
>> -	 * or by userspace without pat index specified.
>> -	 * Check for @pat_set_by_user to find out if an object has pat index set
>> -	 * by userspace. The ioctl's to change cache settings have also been
>> -	 * disabled for the objects with pat_index set by userspace. Please don't
>> -	 * assume @cache_dirty is set as describe here. Also see helper function
>> -	 * i915_gem_object_has_cache_level() for possible ways to bypass the use
>> -	 * of this field.
>> -	 *
>>   	 * Track if we are dirty with writes through the CPU cache for this
>>   	 * object. As a result reading directly from main memory might yield
>>   	 * stale data.
>> @@ -531,9 +425,9 @@ struct drm_i915_gem_object {
>>   	 *
>>   	 *   1. All userspace objects, by default, have @cache_level set as
>>   	 *   I915_CACHE_NONE. The only exception is userptr objects, where we
>> -	 *   instead force I915_CACHE_LLC, but we also don't allow userspace to
>> -	 *   ever change the @cache_level for such objects. Another special case
>> -	 *   is dma-buf, which doesn't rely on @cache_dirty,  but there we
>> +	 *   instead force I915_CACHE_CACHED, but we also don't allow userspace
>> +	 *   to ever change the @cache_level for such objects. Another special
>> +	 *   case is dma-buf, which doesn't rely on @cache_dirty,  but there we
>>   	 *   always do a forced flush when acquiring the pages, if there is a
>>   	 *   chance that the pages can be read directly from main memory with
>>   	 *   the GPU.
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>> index 8f1633c3fb93..aba908f0349f 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>> @@ -584,7 +584,7 @@ static int shmem_object_init(struct intel_memory_region *mem,
>>   	static struct lock_class_key lock_class;
>>   	struct drm_i915_private *i915 = mem->i915;
>>   	struct address_space *mapping;
>> -	unsigned int cache_level;
>> +	i915_cache_t cache;
>>   	gfp_t mask;
>>   	int ret;
>>   
>> @@ -628,11 +628,11 @@ static int shmem_object_init(struct intel_memory_region *mem,
>>   		 * However, we maintain the display planes as UC, and so
>>   		 * need to rebind when first used as such.
>>   		 */
>> -		cache_level = I915_CACHE_LLC;
>> +		cache = I915_CACHE_CACHED;
>>   	else
>> -		cache_level = I915_CACHE_NONE;
>> +		cache = I915_CACHE_NONE;
>>   
>> -	i915_gem_object_set_cache_coherency(obj, cache_level);
>> +	i915_gem_object_set_cache_coherency(obj, cache);
>>   
>>   	i915_gem_object_init_memory_region(obj, mem);
>>   
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>> index 1c8eb806b7d3..cc907a1f1c53 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>> @@ -691,7 +691,7 @@ static int __i915_gem_object_create_stolen(struct intel_memory_region *mem,
>>   
>>   	obj->stolen = stolen;
>>   	obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
>> -	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
>> +	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>>   	i915_gem_object_set_cache_coherency(obj, cache_level);
>>   
>>   	if (WARN_ON(!i915_gem_object_trylock(obj, NULL)))
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> index 6bd6c239f4ac..107176d1757b 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> @@ -48,14 +48,14 @@ void i915_ttm_migrate_set_ban_memcpy(bool ban)
>>   }
>>   #endif
>>   
>> -static enum i915_cache_level
>> -i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
>> -		     struct ttm_tt *ttm)
>> +static i915_cache_t
>> +i915_ttm_cache(struct drm_i915_private *i915, struct ttm_resource *res,
>> +	       struct ttm_tt *ttm)
>>   {
>>   	return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
>>   		!i915_ttm_gtt_binds_lmem(res) &&
>> -		ttm->caching == ttm_cached) ? I915_CACHE_LLC :
>> -		I915_CACHE_NONE;
>> +		ttm->caching == ttm_cached) ? I915_CACHE_CACHED :
>> +					      I915_CACHE_NONE;
>>   }
>>   
>>   static unsigned int
>> @@ -112,8 +112,8 @@ void i915_ttm_adjust_domains_after_move(struct drm_i915_gem_object *obj)
>>   void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>>   {
>>   	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
>> -	unsigned int cache_level;
>>   	unsigned int mem_flags;
>> +	i915_cache_t cache;
>>   	unsigned int i;
>>   	int mem_type;
>>   
>> @@ -126,13 +126,13 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>>   	if (!bo->resource) {
>>   		mem_flags = I915_BO_FLAG_STRUCT_PAGE;
>>   		mem_type = I915_PL_SYSTEM;
>> -		cache_level = I915_CACHE_NONE;
>> +		cache = I915_CACHE_NONE;
>>   	} else {
>>   		mem_flags = i915_ttm_cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
>>   			I915_BO_FLAG_STRUCT_PAGE;
>>   		mem_type = bo->resource->mem_type;
>> -		cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
>> -						   bo->ttm);
>> +		cache = i915_ttm_cache(to_i915(bo->base.dev), bo->resource,
>> +				       bo->ttm);
>>   	}
>>   
>>   	/*
>> @@ -157,7 +157,7 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
>>   	obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
>>   	obj->mem_flags |= mem_flags;
>>   
>> -	i915_gem_object_set_cache_coherency(obj, cache_level);
>> +	i915_gem_object_set_cache_coherency(obj, cache);
>>   }
>>   
>>   /**
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>> index 1d3ebdf4069b..5d2891981bd4 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>> @@ -553,7 +553,7 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
>>   	obj->mem_flags = I915_BO_FLAG_STRUCT_PAGE;
>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>   
>>   	obj->userptr.ptr = args->user_ptr;
>>   	obj->userptr.notifier_seq = ULONG_MAX;
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
>> index bac957755068..77d04be5e9d7 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
>> @@ -123,7 +123,7 @@ huge_gem_object(struct drm_i915_private *i915,
>>   
>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>> -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
>> +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>>   	i915_gem_object_set_cache_coherency(obj, cache_level);
>>   	obj->scratch = phys_size;
>>   
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>> index 6bddd733d796..6ca5b9dbc414 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>> @@ -200,9 +200,10 @@ huge_pages_object(struct drm_i915_private *i915,
>>   	obj->write_domain = I915_GEM_DOMAIN_CPU;
>>   	obj->read_domains = I915_GEM_DOMAIN_CPU;
>>   
>> -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
>> +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
>>   	i915_gem_object_set_cache_coherency(obj, cache_level);
>>   
>> +
>>   	obj->mm.page_mask = page_mask;
>>   
>>   	return obj;
>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> index 675f71f06e89..3c93a73cf6b1 100644
>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> @@ -16,11 +16,11 @@
>>   #include "intel_gtt.h"
>>   
>>   static u64 gen8_pde_encode(const dma_addr_t addr,
>> -			   const enum i915_cache_level level)
>> +			   const enum i915_cache_mode cache_mode)
>>   {
>>   	u64 pde = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>>   
>> -	if (level != I915_CACHE_NONE)
>> +	if (cache_mode != I915_CACHE_MODE_UC)
>>   		pde |= PPAT_CACHED_PDE;
>>   	else
>>   		pde |= PPAT_UNCACHED;
>> @@ -43,10 +43,10 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>>   	 * See translation table defined by LEGACY_CACHELEVEL.
>>   	 */
>>   	switch (pat_index) {
>> -	case I915_CACHE_NONE:
>> +	case I915_CACHE_MODE_UC:
>>   		pte |= PPAT_UNCACHED;
>>   		break;
>> -	case I915_CACHE_WT:
>> +	case I915_CACHE_MODE_WT:
>>   		pte |= PPAT_DISPLAY_ELLC;
>>   		break;
>>   	default:
>> @@ -893,7 +893,7 @@ static int gen8_init_scratch(struct i915_address_space *vm)
>>   		}
>>   
>>   		fill_px(obj, vm->scratch[i - 1]->encode);
>> -		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_NONE);
>> +		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_MODE_UC);
>>   
>>   		vm->scratch[i] = obj;
>>   	}
>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>> index ee15486fed0d..f1e59e512d14 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>> @@ -1103,7 +1103,7 @@ static int init_status_page(struct intel_engine_cs *engine)
>>   		return PTR_ERR(obj);
>>   	}
>>   
>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>   
>>   	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>>   	if (IS_ERR(vma)) {
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> index fca61ddca8ad..ab5f654e7557 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> @@ -1011,11 +1011,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
>>   	return ggtt_probe_common(ggtt, size);
>>   }
>>   
>> -/*
>> - * For pre-gen8 platforms pat_index is the same as enum i915_cache_level,
>> - * so the switch-case statements in these PTE encode functions are still valid.
>> - * See translation table LEGACY_CACHELEVEL.
>> - */
>>   static u64 snb_pte_encode(dma_addr_t addr,
>>   			  unsigned int pat_index,
>>   			  u32 flags)
>> @@ -1023,11 +1018,11 @@ static u64 snb_pte_encode(dma_addr_t addr,
>>   	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>>   
>>   	switch (pat_index) {
>> -	case I915_CACHE_L3_LLC:
>> -	case I915_CACHE_LLC:
>> +	case I915_CACHE_MODE_WB:
>> +	case __I915_CACHE_MODE_WB_L3:
>>   		pte |= GEN6_PTE_CACHE_LLC;
>>   		break;
>> -	case I915_CACHE_NONE:
>> +	case I915_CACHE_MODE_UC:
>>   		pte |= GEN6_PTE_UNCACHED;
>>   		break;
>>   	default:
>> @@ -1044,13 +1039,13 @@ static u64 ivb_pte_encode(dma_addr_t addr,
>>   	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>>   
>>   	switch (pat_index) {
>> -	case I915_CACHE_L3_LLC:
>> +	case __I915_CACHE_MODE_WB_L3:
>>   		pte |= GEN7_PTE_CACHE_L3_LLC;
>>   		break;
>> -	case I915_CACHE_LLC:
>> +	case I915_CACHE_MODE_WB:
>>   		pte |= GEN6_PTE_CACHE_LLC;
>>   		break;
>> -	case I915_CACHE_NONE:
>> +	case I915_CACHE_MODE_UC:
>>   		pte |= GEN6_PTE_UNCACHED;
>>   		break;
>>   	default:
>> @@ -1069,7 +1064,7 @@ static u64 byt_pte_encode(dma_addr_t addr,
>>   	if (!(flags & PTE_READ_ONLY))
>>   		pte |= BYT_PTE_WRITEABLE;
>>   
>> -	if (pat_index != I915_CACHE_NONE)
>> +	if (pat_index != I915_CACHE_MODE_UC)
>>   		pte |= BYT_PTE_SNOOPED_BY_CPU_CACHES;
>>   
>>   	return pte;
>> @@ -1081,7 +1076,7 @@ static u64 hsw_pte_encode(dma_addr_t addr,
>>   {
>>   	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>>   
>> -	if (pat_index != I915_CACHE_NONE)
>> +	if (pat_index != I915_CACHE_MODE_UC)
>>   		pte |= HSW_WB_LLC_AGE3;
>>   
>>   	return pte;
>> @@ -1094,9 +1089,9 @@ static u64 iris_pte_encode(dma_addr_t addr,
>>   	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
>>   
>>   	switch (pat_index) {
>> -	case I915_CACHE_NONE:
>> +	case I915_CACHE_MODE_UC:
>>   		break;
>> -	case I915_CACHE_WT:
>> +	case I915_CACHE_MODE_WT:
>>   		pte |= HSW_WT_ELLC_LLC_AGE3;
>>   		break;
>>   	default:
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
>> index 866c416afb73..803c41ac4ccb 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
>> @@ -21,7 +21,7 @@ static void gmch_ggtt_insert_page(struct i915_address_space *vm,
>>   				  unsigned int pat_index,
>>   				  u32 unused)
>>   {
>> -	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
>> +	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
>>   		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
>>   
>>   	intel_gmch_gtt_insert_page(addr, offset >> PAGE_SHIFT, flags);
>> @@ -32,7 +32,7 @@ static void gmch_ggtt_insert_entries(struct i915_address_space *vm,
>>   				     unsigned int pat_index,
>>   				     u32 unused)
>>   {
>> -	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
>> +	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
>>   		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
>>   
>>   	intel_gmch_gtt_insert_sg_entries(vma_res->bi.pages, vma_res->start >> PAGE_SHIFT,
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> index 065099362a98..48055304537a 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> @@ -676,7 +676,7 @@ __vm_create_scratch_for_read(struct i915_address_space *vm, unsigned long size)
>>   	if (IS_ERR(obj))
>>   		return ERR_CAST(obj);
>>   
>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>   
>>   	vma = i915_vma_instance(obj, vm, NULL);
>>   	if (IS_ERR(vma)) {
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> index 7192a534a654..af4277c1d577 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> @@ -636,7 +636,8 @@ void
>>   __set_pd_entry(struct i915_page_directory * const pd,
>>   	       const unsigned short idx,
>>   	       struct i915_page_table *pt,
>> -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level));
>> +	       u64 (*encode)(const dma_addr_t,
>> +			     const enum i915_cache_mode cache_mode));
>>   
>>   #define set_pd_entry(pd, idx, to) \
>>   	__set_pd_entry((pd), (idx), px_pt(to), gen8_pde_encode)
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>> index 436756bfbb1a..3e461d4f3693 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>> @@ -98,14 +98,16 @@ void
>>   __set_pd_entry(struct i915_page_directory * const pd,
>>   	       const unsigned short idx,
>>   	       struct i915_page_table * const to,
>> -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level))
>> +	       u64 (*encode)(const dma_addr_t,
>> +			     const enum i915_cache_mode cache_mode))
>>   {
>>   	/* Each thread pre-pins the pd, and we may have a thread per pde. */
>>   	GEM_BUG_ON(atomic_read(px_used(pd)) > NALLOC * I915_PDES);
>>   
>>   	atomic_inc(px_used(pd));
>>   	pd->entry[idx] = to;
>> -	write_dma_entry(px_base(pd), idx, encode(px_dma(to), I915_CACHE_LLC));
>> +	write_dma_entry(px_base(pd), idx,
>> +			encode(px_dma(to), I915_CACHE_MODE_WB));
>>   }
>>   
>>   void
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>> index 92085ffd23de..9131d228d285 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>> @@ -551,7 +551,9 @@ alloc_context_vma(struct intel_engine_cs *engine)
>>   	 * later platforms don't have L3 control bits in the PTE.
>>   	 */
>>   	if (IS_IVYBRIDGE(i915))
>> -		i915_gem_object_set_cache_coherency(obj, I915_CACHE_L3_LLC);
>> +		i915_gem_object_set_cache_coherency(obj,
>> +						    I915_CACHE_CACHED |
>> +						    __I915_CACHE_FLAG(L3));
>>   
>>   	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
>>   	if (IS_ERR(vma)) {
>> diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
>> index b9640212d659..025ce54c886d 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
>> @@ -26,7 +26,7 @@ static struct i915_vma *hwsp_alloc(struct intel_gt *gt)
>>   	if (IS_ERR(obj))
>>   		return ERR_CAST(obj);
>>   
>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>   
>>   	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>>   	if (IS_ERR(vma))
>> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>> index 8b0d84f2aad2..fc278fa463b0 100644
>> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>> @@ -64,7 +64,7 @@ static int hang_init(struct hang *h, struct intel_gt *gt)
>>   		goto err_hws;
>>   	}
>>   
>> -	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_CACHED);
>>   	vaddr = i915_gem_object_pin_map_unlocked(h->hws, I915_MAP_WB);
>>   	if (IS_ERR(vaddr)) {
>>   		err = PTR_ERR(vaddr);
>> diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>> index 14a8b25b6204..d25990d33d44 100644
>> --- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>> +++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>> @@ -111,7 +111,7 @@ read_nonprivs(struct intel_context *ce)
>>   	if (IS_ERR(result))
>>   		return result;
>>   
>> -	i915_gem_object_set_cache_coherency(result, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(result, I915_CACHE_CACHED);
>>   
>>   	cs = i915_gem_object_pin_map_unlocked(result, I915_MAP_WB);
>>   	if (IS_ERR(cs)) {
>> diff --git a/drivers/gpu/drm/i915/i915_cache.c b/drivers/gpu/drm/i915/i915_cache.c
>> index 06eb5933c719..f4ba1cb430d3 100644
>> --- a/drivers/gpu/drm/i915/i915_cache.c
>> +++ b/drivers/gpu/drm/i915/i915_cache.c
>> @@ -6,13 +6,88 @@
>>   #include "i915_cache.h"
>>   #include "i915_drv.h"
>>   
>> -void i915_cache_init(struct drm_i915_private *i915)
>> +int i915_cache_init(struct drm_i915_private *i915)
>>   {
>> -	i915->pat_uc = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
>> -	drm_info(&i915->drm, "Using PAT index %u for uncached access\n",
>> -		 i915->pat_uc);
>> +	int ret;
>>   
>> -	i915->pat_wb = i915_gem_get_pat_index(i915, I915_CACHE_LLC);
>> -	drm_info(&i915->drm, "Using PAT index %u for write-back access\n",
>> -		 i915->pat_wb);
>> +	ret = i915_cache_find_pat(i915, I915_CACHE_NONE);
>> +	if (ret < 0) {
>> +		drm_err(&i915->drm,
>> +			"Failed to find PAT index for uncached access\n");
>> +		return -ENODEV;
>> +	}
>> +	drm_info(&i915->drm, "Using PAT index %u for uncached access\n", ret);
>> +	i915->pat_uc = ret;
>> +
>> +	ret = i915_cache_find_pat(i915, I915_CACHE_CACHED);
>> +	if (ret < 0) {
>> +		drm_err(&i915->drm,
>> +			"Failed to find PAT index for write-back access\n");
>> +		return -ENODEV;
>> +	}
>> +	drm_info(&i915->drm, "Using PAT index %u for write-back access\n", ret);
>> +	i915->pat_wb = ret;
>> +
>> +	return 0;
>> +}
>> +
>> +int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache)
>> +{
>> +	const struct intel_device_info *info = INTEL_INFO(i915);
>> +	int i;
>> +
>> +	for (i = 0; i < ARRAY_SIZE(info->cache_modes); i++) {
>> +		if (info->cache_modes[i] == cache)
>> +			return i;
>> +	}
>> +
>> +	return -1;
>> +}
>> +
>> +void i915_cache_print(char *buf, size_t buflen, const char *suffix,
>> +		      i915_cache_t cache)
>> +{
>> +	const enum i915_cache_mode mode = I915_CACHE_MODE(cache);
>> +	static const char * const mode_str[] = {
>> +		[I915_CACHE_MODE_UC] = "UC",
>> +		[I915_CACHE_MODE_WB] = "WB",
>> +		[I915_CACHE_MODE_WT] = "WT",
>> +		[I915_CACHE_MODE_WC] = "WC",
>> +	};
>> +	static const char * const flag_str[] = {
>> +		[ilog2(I915_CACHE_FLAG_COH1W)] = "1-Way-Coherent",
>> +		[ilog2(I915_CACHE_FLAG_COH2W)] = "2-Way-Coherent",
>> +		[ilog2(I915_CACHE_FLAG_L3)] =    "L3",
>> +		[ilog2(I915_CACHE_FLAG_CLOS1)] = "CLOS1",
>> +		[ilog2(I915_CACHE_FLAG_CLOS2)] = "CLOS2",
>> +	};
>> +
>> +	if (mode >= ARRAY_SIZE(mode_str)) {
>> +		snprintf(buf, buflen, "0x%x%s", cache, suffix ?: "");
>> +	} else {
>> +		unsigned long flags = I915_CACHE_FLAGS(cache);
>> +		unsigned long bit;
>> +		int ret;
>> +
>> +		ret = snprintf(buf, buflen, "%s", mode_str[mode]);
>> +		buf += ret;
>> +		buflen -= ret;
>> +
>> +		/*
>> +		 * Don't print "1-way-2-way", it would be confusing and 2-way
>> +		 * implies 1-way anyway.
>> +		 */
>> +		if ((flags & (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W)) ==
>> +		    (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W))
>> +			flags &= ~I915_CACHE_FLAG_COH1W;
>> +
>> +		for_each_set_bit(bit, &flags, BITS_PER_TYPE(i915_cache_t)) {
>> +			ret = snprintf(buf, buflen, "-%s", flag_str[bit]);
>> +			buf += ret;
>> +			buflen -= ret;
>> +		}
>> +
>> +		if (suffix)
>> +			snprintf(buf, buflen, "%s", suffix);
>> +	}
>>   }
>> diff --git a/drivers/gpu/drm/i915/i915_cache.h b/drivers/gpu/drm/i915/i915_cache.h
>> index cb68936fb8a2..d9e97318b942 100644
>> --- a/drivers/gpu/drm/i915/i915_cache.h
>> +++ b/drivers/gpu/drm/i915/i915_cache.h
>> @@ -6,8 +6,76 @@
>>   #ifndef __I915_CACHE_H__
>>   #define __I915_CACHE_H__
>>   
>> +#include <linux/types.h>
>> +
>> +struct drm_printer;
>> +
>>   struct drm_i915_private;
>>   
>> -void i915_cache_init(struct drm_i915_private *i915);
>> +typedef u16 i915_cache_t;
>> +
>> +/* Cache modes */
>> +enum i915_cache_mode {
>> +	I915_CACHE_MODE_UC = 0,
>> +	I915_CACHE_MODE_WB,
>> +	__I915_CACHE_MODE_WB_L3, /* Special do-not-use entry for legacy 1:1 mapping. */
>> +	I915_CACHE_MODE_WT,
>> +	I915_CACHE_MODE_WC,
>> +	I915_NUM_CACHE_MODES
>> +};
>> +
>> +/* Cache mode flag bits */
>> +#define I915_CACHE_FLAG_COH1W	(0x1)
>> +#define I915_CACHE_FLAG_COH2W	(0x2) /* 1-way needs to be set too. */
>> +#define I915_CACHE_FLAG_L3	(0x4)
>> +#define I915_CACHE_FLAG_CLOS1	(0x8)
>> +#define I915_CACHE_FLAG_CLOS2	(0x10)
>> +
>> +/*
>> + * Overloaded I915_CACHE() macro based on:
>> + *  https://stackoverflow.com/questions/3046889/optional-parameters-with-c-macros
>> + *
>> + * It is possible to call I915_CACHE with mode and zero or more flags as
>> + * separate arguments. Ie these all work:
>> + *
>> + *   I915_CACHE(WB)
>> + *   I915_CACHE(WB, COH1W, COH2W)
>> + *   I915_CACHE(WB, COH1W, COH2W, L3)
>> + */
>> +
>> +#define __I915_CACHE_FLAG(f) (I915_CACHE_FLAG_##f << 8)
>> +#define __I915_CACHE(m, f) ((i915_cache_t)(I915_CACHE_MODE_##m | (f)))
>> +
>> +#define I915_CACHE_4(m, f1, f2, f3)	__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2) | __I915_CACHE_FLAG(f3))
>> +#define I915_CACHE_3(m, f1, f2)		__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2))
>> +#define I915_CACHE_2(m, f1)		__I915_CACHE(m, __I915_CACHE_FLAG(f1))
>> +#define I915_CACHE_1(m)			__I915_CACHE(m, 0)
>> +#define I915_CACHE_0(m)			__I915_CACHE(WC, 0)
>> +
>> +#define FUNC_CHOOSER(_f1, _f2, _f3, _f4, _f5, ...) _f5
>> +#define FUNC_RECOMPOSER(argsWithParentheses) FUNC_CHOOSER argsWithParentheses
>> +#define CHOOSE_FROM_ARG_COUNT(...) FUNC_RECOMPOSER((__VA_ARGS__, I915_CACHE_4, I915_CACHE_3, I915_CACHE_2, I915_CACHE_1, ))
>> +#define NO_ARG_EXPANDER() ,,,I915_CACHE_0
>> +#define MACRO_CHOOSER(...) CHOOSE_FROM_ARG_COUNT(NO_ARG_EXPANDER __VA_ARGS__ ())
>> +
>> +#define I915_CACHE(...) MACRO_CHOOSER(__VA_ARGS__)(__VA_ARGS__)
>> +
>> +/* i915_cache_t mode and flags extraction helpers. */
>> +#define I915_CACHE_MODE(cache) \
>> +	((enum i915_cache_mode)(((i915_cache_t)(cache)) & 0xff))
>> +#define I915_CACHE_FLAGS(cache) \
>> +	((unsigned int)((((i915_cache_t)(cache) & 0xff00)) >> 8))
>> +
>> +/* Helpers for i915 caching modes. */
>> +#define I915_CACHE_NONE		I915_CACHE(UC)
>> +#define I915_CACHE_CACHED	I915_CACHE(WB, COH1W, COH2W)
>> +#define I915_CACHE_WT		I915_CACHE(WT)
>> +
>> +int i915_cache_init(struct drm_i915_private *i915);
>> +int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache);
>> +void i915_cache_print(char *buf, size_t buflen, const char *suffix,
>> +		      i915_cache_t cache);
>> +
>> +#define I915_CACHE_NAME_LEN (40)
>>   
>>   #endif /* __I915_CACHE_H__ */
>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>> index 4de44cf1026d..4ec292011546 100644
>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>> @@ -140,57 +140,18 @@ static const char *stringify_vma_type(const struct i915_vma *vma)
>>   	return "ppgtt";
>>   }
>>   
>> -static const char *i915_cache_level_str(struct drm_i915_gem_object *obj)
>> -{
>> -	struct drm_i915_private *i915 = obj_to_i915(obj);
>> -
>> -	if (IS_METEORLAKE(i915)) {
>> -		switch (obj->pat_index) {
>> -		case 0: return " WB";
>> -		case 1: return " WT";
>> -		case 2: return " UC";
>> -		case 3: return " WB (1-Way Coh)";
>> -		case 4: return " WB (2-Way Coh)";
>> -		default: return " not defined";
>> -		}
>> -	} else if (IS_PONTEVECCHIO(i915)) {
>> -		switch (obj->pat_index) {
>> -		case 0: return " UC";
>> -		case 1: return " WC";
>> -		case 2: return " WT";
>> -		case 3: return " WB";
>> -		case 4: return " WT (CLOS1)";
>> -		case 5: return " WB (CLOS1)";
>> -		case 6: return " WT (CLOS2)";
>> -		case 7: return " WT (CLOS2)";
>> -		default: return " not defined";
>> -		}
>> -	} else if (GRAPHICS_VER(i915) >= 12) {
>> -		switch (obj->pat_index) {
>> -		case 0: return " WB";
>> -		case 1: return " WC";
>> -		case 2: return " WT";
>> -		case 3: return " UC";
>> -		default: return " not defined";
>> -		}
>> -	} else {
>> -		switch (obj->pat_index) {
>> -		case 0: return " UC";
>> -		case 1: return HAS_LLC(i915) ?
>> -			       " LLC" : " snooped";
>> -		case 2: return " L3+LLC";
>> -		case 3: return " WT";
>> -		default: return " not defined";
>> -		}
>> -	}
>> -}
>> -
>>   void
>>   i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>>   {
>> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>> +	char buf[I915_CACHE_NAME_LEN];
>>   	struct i915_vma *vma;
>>   	int pin_count = 0;
>>   
>> +	i915_cache_print(buf, sizeof(buf),
>> +			 obj->pat_set_by_user ? "!" : NULL,
>> +			 INTEL_INFO(i915)->cache_modes[obj->pat_index]);
>> +
>>   	seq_printf(m, "%pK: %c%c%c %8zdKiB %02x %02x %s%s%s",
>>   		   &obj->base,
>>   		   get_tiling_flag(obj),
>> @@ -199,7 +160,7 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>>   		   obj->base.size / 1024,
>>   		   obj->read_domains,
>>   		   obj->write_domain,
>> -		   i915_cache_level_str(obj),
>> +		   buf,
>>   		   obj->mm.dirty ? " dirty" : "",
>>   		   obj->mm.madv == I915_MADV_DONTNEED ? " purgeable" : "");
>>   	if (obj->base.name)
>> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
>> index bb2223cc3470..8663388a524f 100644
>> --- a/drivers/gpu/drm/i915/i915_driver.c
>> +++ b/drivers/gpu/drm/i915/i915_driver.c
>> @@ -241,7 +241,9 @@ static int i915_driver_early_probe(struct drm_i915_private *dev_priv)
>>   	i915_memcpy_init_early(dev_priv);
>>   	intel_runtime_pm_init_early(&dev_priv->runtime_pm);
>>   
>> -	i915_cache_init(dev_priv);
>> +	ret = i915_cache_init(dev_priv);
>> +	if (ret < 0)
>> +		return ret;
>>   
>>   	ret = i915_workqueues_init(dev_priv);
>>   	if (ret < 0)
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 896aa48ed089..814705cfeb12 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -1144,19 +1144,6 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
>>   	unsigned int i;
>>   	int ret;
>>   
>> -	/*
>> -	 * In the proccess of replacing cache_level with pat_index a tricky
>> -	 * dependency is created on the definition of the enum i915_cache_level.
>> -	 * in case this enum is changed, PTE encode would be broken.
>> -	 * Add a WARNING here. And remove when we completely quit using this
>> -	 * enum
>> -	 */
>> -	BUILD_BUG_ON(I915_CACHE_NONE != 0 ||
>> -		     I915_CACHE_LLC != 1 ||
>> -		     I915_CACHE_L3_LLC != 2 ||
>> -		     I915_CACHE_WT != 3 ||
>> -		     I915_MAX_CACHE_LEVEL != 4);
>> -
>>   	/* We need to fallback to 4K pages if host doesn't support huge gtt. */
>>   	if (intel_vgpu_active(dev_priv) && !intel_vgpu_has_huge_gtt(dev_priv))
>>   		RUNTIME_INFO(dev_priv)->page_sizes = I915_GTT_PAGE_SIZE_4K;
>> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
>> index fcacdc21643c..565a60a1645d 100644
>> --- a/drivers/gpu/drm/i915/i915_pci.c
>> +++ b/drivers/gpu/drm/i915/i915_pci.c
>> @@ -32,6 +32,7 @@
>>   #include "gt/intel_sa_media.h"
>>   #include "gem/i915_gem_object_types.h"
>>   
>> +#include "i915_cache.h"
>>   #include "i915_driver.h"
>>   #include "i915_drv.h"
>>   #include "i915_pci.h"
>> @@ -43,36 +44,43 @@
>>   	.__runtime.graphics.ip.ver = (x), \
>>   	.__runtime.media.ip.ver = (x)
>>   
>> -#define LEGACY_CACHELEVEL \
>> -	.cachelevel_to_pat = { \
>> -		[I915_CACHE_NONE]   = 0, \
>> -		[I915_CACHE_LLC]    = 1, \
>> -		[I915_CACHE_L3_LLC] = 2, \
>> -		[I915_CACHE_WT]     = 3, \
>> +#define LEGACY_CACHE_MODES \
>> +	.cache_modes = { \
>> +		[I915_CACHE_MODE_UC] 	  = I915_CACHE(UC), \
>> +		[I915_CACHE_MODE_WB] 	  = I915_CACHE(WB, COH1W, COH2W), \
> 
> Reading bspec 2863 (bdw) indicates that the CPU being able to snoop the
> GPU's L3 was a new feature in gen8.  So for HSW and earlier, any
> coherency was only 1-way (GPU could be coherent with CPU's caches, but
> not vice-versa).  Only starting with gen8 did we get 2-way coherency as
> an option where the CPU would also be coherent with the GPU cache (and
> with gen8 and beyond you could still select 1-way instead of 2-way
> coherency with instruction-level granularity via MOCS).  There are also
> some legacy platforms (e.g., EHL/JSL on bspec 13948) where the IA wasn't
> coherent with GPU L3 so we were back to 1-way coherency.
> 
> So should we split LEGACY_CACHE_MODES into two tables with different
> coherency settings attached to I915_CACHE_MODE_WB?
> 
>> +		[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, COH2W, L3), \
>> +		[I915_CACHE_MODE_WT] 	  = I915_CACHE(WT), \
>>   	}
>>   
>> -#define TGL_CACHELEVEL \
>> -	.cachelevel_to_pat = { \
>> -		[I915_CACHE_NONE]   = 3, \
>> -		[I915_CACHE_LLC]    = 0, \
>> -		[I915_CACHE_L3_LLC] = 0, \
>> -		[I915_CACHE_WT]     = 2, \
>> +#define GEN12_CACHE_MODES \
>> +	.cache_modes = { \
>> +		[0] = I915_CACHE(WB, COH1W, COH2W), \
>> +		[1] = I915_CACHE(WC), \
>> +		[2] = I915_CACHE(WT), \
>> +		[3] = I915_CACHE(UC), \
>>   	}
>>   
>> -#define PVC_CACHELEVEL \
>> -	.cachelevel_to_pat = { \
>> -		[I915_CACHE_NONE]   = 0, \
>> -		[I915_CACHE_LLC]    = 3, \
>> -		[I915_CACHE_L3_LLC] = 3, \
>> -		[I915_CACHE_WT]     = 2, \
>> +/* FIXME: 1-way or 2-way coherent for PAT indices 3, 5, 7? */
>> +
>> +#define PVC_CACHE_MODES \
>> +	.cache_modes = { \
>> +		[0] = I915_CACHE(UC), \
>> +		[1] = I915_CACHE(WC), \
>> +		[2] = I915_CACHE(WT), \
>> +		[3] = I915_CACHE(WB, COH1W), \
>> +		[4] = I915_CACHE(WT, CLOS1), \
>> +		[5] = I915_CACHE(WB, COH1W, CLOS1), \
>> +		[6] = I915_CACHE(WT, CLOS2), \
>> +		[7] = I915_CACHE(WB, COH1W, CLOS2), \
>>   	}
>>   
>> -#define MTL_CACHELEVEL \
>> -	.cachelevel_to_pat = { \
>> -		[I915_CACHE_NONE]   = 2, \
>> -		[I915_CACHE_LLC]    = 3, \
>> -		[I915_CACHE_L3_LLC] = 3, \
>> -		[I915_CACHE_WT]     = 1, \
>> +#define MTL_CACHE_MODES \
>> +	.cache_modes = { \
>> +		[0] = I915_CACHE(WB), \
>> +		[1] = I915_CACHE(WT), \
>> +		[2] = I915_CACHE(UC), \
>> +		[3] = I915_CACHE(WB, COH1W), \
>> +		[4] = I915_CACHE(WB, COH1W, COH2W), \
> 
> We may want a comment on this one since the "2W" part is sort of a lie.
> Bspec 63884 has a programming note for MTL that says
> 
>          "...Except for system atomics, setting Coherency Mode to 10 or
>          11 results in this same one-way coherent behavior..."
> 
> So if we ask for 2W, we actually only get 1W behavior except in a very
> narrow set of cases.

Shall I just not mark it as 2-way then becuase it sounds that for i915 
purposes it is not 2-way?!

Could invent a new flag just to documet this is something weird?

Regards,

Tvrtko

> 
> 
> Matt
> 
>>   	}
>>   
>>   /* Keep in gen based order, and chronological order within a gen */
>> @@ -97,7 +105,7 @@
>>   	.max_pat_index = 3, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   #define I845_FEATURES \
>>   	GEN(2), \
>> @@ -112,7 +120,7 @@
>>   	.max_pat_index = 3, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   static const struct intel_device_info i830_info = {
>>   	I830_FEATURES,
>> @@ -145,7 +153,7 @@ static const struct intel_device_info i865g_info = {
>>   	.max_pat_index = 3, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   static const struct intel_device_info i915g_info = {
>>   	GEN3_FEATURES,
>> @@ -208,7 +216,7 @@ static const struct intel_device_info pnv_m_info = {
>>   	.max_pat_index = 3, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   static const struct intel_device_info i965g_info = {
>>   	GEN4_FEATURES,
>> @@ -252,7 +260,7 @@ static const struct intel_device_info gm45_info = {
>>   	.max_pat_index = 3, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   static const struct intel_device_info ilk_d_info = {
>>   	GEN5_FEATURES,
>> @@ -282,7 +290,7 @@ static const struct intel_device_info ilk_m_info = {
>>   	.__runtime.ppgtt_size = 31, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   #define SNB_D_PLATFORM \
>>   	GEN6_FEATURES, \
>> @@ -330,7 +338,7 @@ static const struct intel_device_info snb_m_gt2_info = {
>>   	.__runtime.ppgtt_size = 31, \
>>   	GEN_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   #define IVB_D_PLATFORM \
>>   	GEN7_FEATURES, \
>> @@ -387,7 +395,7 @@ static const struct intel_device_info vlv_info = {
>>   	.platform_engine_mask = BIT(RCS0) | BIT(VCS0) | BIT(BCS0),
>>   	GEN_DEFAULT_PAGE_SIZES,
>>   	GEN_DEFAULT_REGIONS,
>> -	LEGACY_CACHELEVEL,
>> +	LEGACY_CACHE_MODES
>>   };
>>   
>>   #define G75_FEATURES  \
>> @@ -473,7 +481,7 @@ static const struct intel_device_info chv_info = {
>>   	.has_coherent_ggtt = false,
>>   	GEN_DEFAULT_PAGE_SIZES,
>>   	GEN_DEFAULT_REGIONS,
>> -	LEGACY_CACHELEVEL,
>> +	LEGACY_CACHE_MODES
>>   };
>>   
>>   #define GEN9_DEFAULT_PAGE_SIZES \
>> @@ -536,7 +544,7 @@ static const struct intel_device_info skl_gt4_info = {
>>   	.max_pat_index = 3, \
>>   	GEN9_DEFAULT_PAGE_SIZES, \
>>   	GEN_DEFAULT_REGIONS, \
>> -	LEGACY_CACHELEVEL
>> +	LEGACY_CACHE_MODES
>>   
>>   static const struct intel_device_info bxt_info = {
>>   	GEN9_LP_FEATURES,
>> @@ -640,7 +648,7 @@ static const struct intel_device_info jsl_info = {
>>   #define GEN12_FEATURES \
>>   	GEN11_FEATURES, \
>>   	GEN(12), \
>> -	TGL_CACHELEVEL, \
>> +	GEN12_CACHE_MODES, \
>>   	.has_global_mocs = 1, \
>>   	.has_pxp = 1, \
>>   	.max_pat_index = 3
>> @@ -708,7 +716,7 @@ static const struct intel_device_info adl_p_info = {
>>   	.__runtime.graphics.ip.ver = 12, \
>>   	.__runtime.graphics.ip.rel = 50, \
>>   	XE_HP_PAGE_SIZES, \
>> -	TGL_CACHELEVEL, \
>> +	GEN12_CACHE_MODES, \
>>   	.dma_mask_size = 46, \
>>   	.has_3d_pipeline = 1, \
>>   	.has_64bit_reloc = 1, \
>> @@ -803,7 +811,7 @@ static const struct intel_device_info pvc_info = {
>>   		BIT(VCS0) |
>>   		BIT(CCS0) | BIT(CCS1) | BIT(CCS2) | BIT(CCS3),
>>   	.require_force_probe = 1,
>> -	PVC_CACHELEVEL,
>> +	PVC_CACHE_MODES
>>   };
>>   
>>   static const struct intel_gt_definition xelpmp_extra_gt[] = {
>> @@ -838,7 +846,7 @@ static const struct intel_device_info mtl_info = {
>>   	.memory_regions = REGION_SMEM | REGION_STOLEN_LMEM,
>>   	.platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(CCS0),
>>   	.require_force_probe = 1,
>> -	MTL_CACHELEVEL,
>> +	MTL_CACHE_MODES
>>   };
>>   
>>   #undef PLATFORM
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index 04bc1f4a1115..973175a64534 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -1870,7 +1870,7 @@ static int alloc_oa_buffer(struct i915_perf_stream *stream)
>>   		return PTR_ERR(bo);
>>   	}
>>   
>> -	i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(bo, I915_CACHE_CACHED);
>>   
>>   	/* PreHSW required 512K alignment, HSW requires 16M */
>>   	vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL);
>> diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
>> index dbfe6443457b..2ce13b7c48cb 100644
>> --- a/drivers/gpu/drm/i915/intel_device_info.h
>> +++ b/drivers/gpu/drm/i915/intel_device_info.h
>> @@ -27,6 +27,8 @@
>>   
>>   #include <uapi/drm/i915_drm.h>
>>   
>> +#include "i915_cache.h"
>> +
>>   #include "intel_step.h"
>>   
>>   #include "gt/intel_engine_types.h"
>> @@ -243,8 +245,8 @@ struct intel_device_info {
>>   	 */
>>   	const struct intel_runtime_info __runtime;
>>   
>> -	u32 cachelevel_to_pat[I915_MAX_CACHE_LEVEL];
>> -	u32 max_pat_index;
>> +	i915_cache_t cache_modes[8];
>> +	unsigned int max_pat_index;
>>   };
>>   
>>   struct intel_driver_caps {
>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>> index f910ec9b6d2b..ba821e48baa5 100644
>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>> @@ -267,7 +267,7 @@ static int igt_evict_for_cache_color(void *arg)
>>   		err = PTR_ERR(obj);
>>   		goto cleanup;
>>   	}
>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>   	quirk_add(obj, &objects);
>>   
>>   	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
>> @@ -283,7 +283,7 @@ static int igt_evict_for_cache_color(void *arg)
>>   		err = PTR_ERR(obj);
>>   		goto cleanup;
>>   	}
>> -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
>>   	quirk_add(obj, &objects);
>>   
>>   	/* Neighbouring; same colour - should fit */
>> diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
>> index 3c5e0952f1b8..4cfc5000d6ff 100644
>> --- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
>> +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
>> @@ -23,7 +23,7 @@ int igt_spinner_init(struct igt_spinner *spin, struct intel_gt *gt)
>>   		err = PTR_ERR(spin->hws);
>>   		goto err;
>>   	}
>> -	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_LLC);
>> +	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_CACHED);
>>   
>>   	spin->obj = i915_gem_object_create_internal(gt->i915, PAGE_SIZE);
>>   	if (IS_ERR(spin->obj)) {
>> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>> index 1d1a457e2aee..8ae77bcf27fa 100644
>> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>> @@ -126,13 +126,13 @@ static const struct intel_device_info mock_info = {
>>   	.memory_regions = REGION_SMEM,
>>   	.platform_engine_mask = BIT(0),
>>   
>> -	/* simply use legacy cache level for mock device */
>> +	/* Simply use legacy cache modes for the mock device. */
>>   	.max_pat_index = 3,
>> -	.cachelevel_to_pat = {
>> -		[I915_CACHE_NONE]   = 0,
>> -		[I915_CACHE_LLC]    = 1,
>> -		[I915_CACHE_L3_LLC] = 2,
>> -		[I915_CACHE_WT]     = 3,
>> +	.cache_modes = {
>> +		[0] = I915_CACHE(UC),
>> +		[1] = I915_CACHE(WB, COH1W),
>> +		[2] = I915_CACHE(WB, COH1W, COH2W, L3),
>> +		[3] = I915_CACHE(WT),
>>   	},
>>   };
>>   
>> @@ -181,7 +181,7 @@ struct drm_i915_private *mock_gem_device(void)
>>   	/* Set up device info and initial runtime info. */
>>   	intel_device_info_driver_create(i915, pdev->device, &mock_info);
>>   
>> -	i915_cache_init(i915);
>> +	WARN_ON(i915_cache_init(i915));
>>   
>>   	dev_pm_domain_set(&pdev->dev, &pm_domain);
>>   	pm_runtime_enable(&pdev->dev);
>> -- 
>> 2.39.2
>>
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC 7/8] drm/i915: Lift the user PAT restriction from use_cpu_reloc
  2023-07-28  0:09     ` [Intel-gfx] " Matt Roper
@ 2023-07-28 12:45       ` Tvrtko Ursulin
  -1 siblings, 0 replies; 59+ messages in thread
From: Tvrtko Ursulin @ 2023-07-28 12:45 UTC (permalink / raw)
  To: Matt Roper; +Cc: Intel-gfx, Fei Yang, dri-devel, Tvrtko Ursulin


On 28/07/2023 01:09, Matt Roper wrote:
> On Thu, Jul 27, 2023 at 03:55:03PM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Now that i915 understands the caching modes behind PAT indices, we can
>> refine the check in use_cpu_reloc() to not reject the uncached PAT if it
>> was set by userspace.
>>
>> Instead it can decide based on the presence of full coherency which
>> should be functionally equivalent on legacy platforms. We can ignore WT
>> since it is only used by the display, and we can ignore Meteorlake since
>> it will fail on the existing "has_llc" condition before the object cache
>> mode check.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Fei Yang <fei.yang@intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 9 +--------
>>   1 file changed, 1 insertion(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> index 9d6e49c8a4c6..f74b33670bad 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> @@ -640,16 +640,9 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
>>   	if (DBG_FORCE_RELOC == FORCE_GTT_RELOC)
>>   		return false;
>>   
>> -	/*
>> -	 * For objects created by userspace through GEM_CREATE with pat_index
>> -	 * set by set_pat extension, i915_gem_object_has_cache_level() always
>> -	 * return true, otherwise the call would fall back to checking whether
>> -	 * the object is un-cached.
>> -	 */
>>   	return (cache->has_llc ||
>>   		obj->cache_dirty ||
>> -		!(obj->pat_set_by_user ||
>> -		  i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)));
>> +		i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W));
> 
> My understanding of relocations is minimal, but does 2W actually matter
> here (CPU snooping GPU caches)?  I would have expected only 1W coherency
> to be necessary (GPU snooping CPU caches)?

I struggled with this one. Original code was:

         return (cache->has_llc ||
                 obj->cache_dirty ||
                 obj->cache_level != I915_CACHE_NONE);

And I struggled to figure out the intent. It is not "don't do CPU 
relocations for uncached", because it will do them anyway when there is 
an LLC or the object is dirty.

You could be right... can we interpret it as meaning that any mode apart 
from uncached was viewed as coherent, in the sense of CPU writes being 
visible to the GPU?

In which case should/could it be based on I915_BO_CACHE_COHERENT_FOR_WRITE?
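
Ie. as a rough sketch only (untested, and assuming the cache_coherent 
flags can be trusted at this point):

	return (cache->has_llc ||
		obj->cache_dirty ||
		obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE);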

Regards,

Tvrtko

> 
> 
> Matt
> 
>>   }
>>   
>>   static int eb_reserve_vma(struct i915_execbuffer *eb,
>> -- 
>> 2.39.2
>>
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC 4/8] drm/i915: Refactor PAT/object cache handling
  2023-07-28  7:14     ` [Intel-gfx] " Yang, Fei
@ 2023-07-28 12:55       ` Tvrtko Ursulin
  -1 siblings, 0 replies; 59+ messages in thread
From: Tvrtko Ursulin @ 2023-07-28 12:55 UTC (permalink / raw)
  To: Yang, Fei, Intel-gfx, dri-devel
  Cc: Roper, Matthew D, Chris Wilson, Andi Shyti, Ursulin, Tvrtko


On 28/07/2023 08:14, Yang, Fei wrote:
> [snip]
>> @@ -41,14 +42,17 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>>                return false;
>>
>>        /*
>> -      * For objects created by userspace through GEM_CREATE with pat_index
>> -      * set by set_pat extension, i915_gem_object_has_cache_level() will
>> -      * always return true, because the coherency of such object is managed
> 
> i915_gem_object_has_cache_level() always returning true means this function
> always returns false.
> 
>> -      * by userspace. Othereise the call here would fall back to checking
>> -      * whether the object is un-cached or write-through.
>> +      * Always flush cache for UMD objects with PAT index set.
> 
> (obj->pat_set_by_user == true) indicates the UMD knows how to handle the
> coherency, so forcing clflush in the KMD would be redundant.

For Meteorlake I made gpu_write_needs_clflush() always return false anyway.
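
(I.e. with an early bail-out at the top of the function, something along 
the lines of:

         if (IS_METEORLAKE(i915))
                 return false;

modulo the exact platform check used.)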

Could you please submit a patch with kerneldoc for i915_drm.h explaining 
what the set domain ioctl is expected to do when the set_pat extension is 
used? With a focus on the use cases - how userspace is managing coherency 
through it, or whether it is managing it at all.

>>         */
>> -     return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
>> -              i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
>> +     if (obj->pat_set_by_user)
>> +             return true;
> 
> return false;

Oops, thank you! I did warn in the cover letter I was getting confused 
by the boolean logic conversions, cross-referencing three versions, and 
extracting pat_set_by_user to the call sites. :)
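
In other words that hunk should read:

         if (obj->pat_set_by_user)
                 return false;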

>> +
>> +     /*
>> +      * Fully coherent cached access may end up with data in the CPU cache
>> +      * which hasn't hit memory yet.
>> +      */
>> +     return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
>> +            i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W);
> 
> Why check COH2W here? The logic was: if UC or WT, return false; otherwise
> return true. So as long as the cache_mode is WB, it's sufficient to return
> true here, right?

I was trying to get to the bottom of the reason behind the check.

Original code was:

        return !(obj->cache_level == I915_CACHE_NONE ||
                 obj->cache_level == I915_CACHE_WT);

Which is equivalent to "is it WB", right? (Since it matches on both old 
LLC flavours.)

Which I thought, in the context of this function, is supposed to answer 
the question of "can there be data in the shared cache written by the 
GPU but not committed to RAM yet".

And then I thought that can only ever happen with 2-way coherency. 
Otherwise GPU writes never end up in the CPU cache.
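
To spell out my mental model (which may be wrong):

         COH1W: the GPU snoops the CPU caches, so CPU writes are visible
                to the GPU.
         COH2W: additionally the CPU snoops the GPU/LLC side, so GPU
                writes can linger in the cache hierarchy before reaching
                memory.

Hence only WB plus COH2W should be able to leave GPU-written data 
pending in the CPU cache.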

Did I get that wrong? Maybe I have..

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Intel-gfx] [RFC 2/8] drm/i915: Split PTE encode between Gen12 and Meteorlake
  2023-07-28  8:18     ` Tvrtko Ursulin
@ 2023-07-28 14:41       ` Matt Roper
  0 siblings, 0 replies; 59+ messages in thread
From: Matt Roper @ 2023-07-28 14:41 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx, dri-devel

On Fri, Jul 28, 2023 at 09:18:36AM +0100, Tvrtko Ursulin wrote:
> 
> On 27/07/2023 23:25, Matt Roper wrote:
> > On Thu, Jul 27, 2023 at 03:54:58PM +0100, Tvrtko Ursulin wrote:
> > > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > 
> > > No need to run extra instructions which will never trigger on platforms
> > > before Meteorlake.
> > > 
> > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > ---
> > >   drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 26 ++++++++++++++++++++++++++
> > >   1 file changed, 26 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> > > index c8568e5d1147..862ac1d2de25 100644
> > > --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> > > +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> > > @@ -63,6 +63,30 @@ static u64 gen12_pte_encode(dma_addr_t addr,
> > >   {
> > >   	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
> > > +	if (unlikely(flags & PTE_READ_ONLY))
> > > +		pte &= ~GEN8_PAGE_RW;
> > > +
> > > +	if (flags & PTE_LM)
> > > +		pte |= GEN12_PPGTT_PTE_LM;
> > > +
> > > +	if (pat_index & BIT(0))
> > > +		pte |= GEN12_PPGTT_PTE_PAT0;
> > > +
> > > +	if (pat_index & BIT(1))
> > > +		pte |= GEN12_PPGTT_PTE_PAT1;
> > > +
> > > +	if (pat_index & BIT(2))
> > > +		pte |= GEN12_PPGTT_PTE_PAT2;
> > > +
> > > +	return pte;
> > > +}
> > > +
> > > +static u64 mtl_pte_encode(dma_addr_t addr,
> > > +			  unsigned int pat_index,
> > > +			  u32 flags)
> > > +{
> > > +	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
> > > +
> > 
> > Would it be more readable to start with
> > 
> >          gen8_pte_t pte = gen12_pte_encode(addr, pat_index, flags);
> > 
> > and then |-in only the MTL-specific bit(s) as appropriate?
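> > 
> > Something like this, roughly (untested, and assuming the MTL-specific
> > part is just the extra PAT[3] bit):
> > 
> >          static u64 mtl_pte_encode(dma_addr_t addr,
> >                                    unsigned int pat_index,
> >                                    u32 flags)
> >          {
> >                  gen8_pte_t pte = gen12_pte_encode(addr, pat_index, flags);
> > 
> >                  if (pat_index & BIT(3))
> >                          pte |= MTL_PPGTT_PTE_PAT3;
> > 
> >                  return pte;
> >          }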
> > 
> > >   	if (unlikely(flags & PTE_READ_ONLY))
> > >   		pte &= ~GEN8_PAGE_RW;
> > > @@ -995,6 +1019,8 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
> > >   	 */
> > >   	ppgtt->vm.alloc_scratch_dma = alloc_pt_dma;
> > > +	if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
> > > +		ppgtt->vm.pte_encode = mtl_pte_encode;
> > >   	if (GRAPHICS_VER(gt->i915) >= 12)
> > >   		ppgtt->vm.pte_encode = gen12_pte_encode;
> > 
> > I think you wanted 'else if' here.  Otherwise you clobber the MTL
> > function pointer.
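> > 
> > I.e.:
> > 
> >          if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
> >                  ppgtt->vm.pte_encode = mtl_pte_encode;
> >          else if (GRAPHICS_VER(gt->i915) >= 12)
> >                  ppgtt->vm.pte_encode = gen12_pte_encode;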
> 
> Doh, this was a proper fail... Yes and yes. I even had it like you
> suggest in that patch I mentioned to you earlier:
> https://patchwork.freedesktop.org/patch/546013/?series=120341&rev=2.
> 
> Do you have an opinion on that one perhaps?

Yeah, I overlooked that patch before, but it looks good to me.


Matt


> 
> Thanks,
> 
> Tvrtko

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC 4/8] drm/i915: Refactor PAT/object cache handling
  2023-07-28 12:39       ` [Intel-gfx] " Tvrtko Ursulin
@ 2023-07-28 14:53         ` Matt Roper
  -1 siblings, 0 replies; 59+ messages in thread
From: Matt Roper @ 2023-07-28 14:53 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Fei Yang, Tvrtko Ursulin, Intel-gfx, dri-devel, Andi Shyti, Chris Wilson

On Fri, Jul 28, 2023 at 01:39:06PM +0100, Tvrtko Ursulin wrote:
> 
> Forgot one part of your reply:
> 
> On 28/07/2023 00:57, Matt Roper wrote:
> > On Thu, Jul 27, 2023 at 03:55:00PM +0100, Tvrtko Ursulin wrote:
> > > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > 
> > > Commit 9275277d5324 ("drm/i915: use pat_index instead of cache_level") has
> > > introduced PAT indices to i915 internal APIs, partially replacing the
> > > usage of driver internal cache_level, but has also added a few sub-
> > > optimal design decisions which this patch tries to improve upon.
> > > 
> > > Principal change here is to invert the per platform cache level to PAT
> > > index table which was added by the referenced commit, and by doing so
> > > enable i915 to understand the cache mode between PAT indices, changing
> > > them from opaque to transparent.
> > > 
> > > Once we have the inverted table we are able to remove the hidden false
> > > "return true" from i915_gem_object_has_cache_level and make the involved
> > > code path clearer.
> > > 
> > > To achieve this we replace the enum i915_cache_level with i915_cache_t,
> > > composed of a more detailed representation of each cache mode (base mode
> > > plus flags).
> > > 
> > > In this way we are able to express the differences between different
> > > write-back mode coherency settings on Meteorlake, which in turn enables us
> > > to map the i915 "cached" mode to the correct Meteorlake PAT index.
> > > 
> > > We can also replace the platform dependent cache mode to string code in
> > > debugfs and elsewhere by the single implementation based on i915_cache_t.
> > > 
> > > v2:
> > >   * Fix PAT-to-cache-mode table for PVC. (Fei)
> > >   * Cache display caching mode too. (Fei)
> > >   * Improve and document criteria in i915_gem_object_can_bypass_llc() (Matt)
> > > 
> > > v3:
> > >   * Checkpath issues.
> > >   * Cache mode flags check fixed.
> > > 
> > > v4:
> > >   * Fix intel_device_info->cache_modes array size. (Matt)
> > >   * Boolean cache mode and flags query. (Matt)
> > >   * Reduce number of cache macros with some macro magic.
> > >   * One more checkpatch fix.
> > >   * Tweak tables to show legacy and Gen12 WB is fully coherent.
> > > 
> > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > References: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
> > > Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> > > Cc: Fei Yang <fei.yang@intel.com>
> > > Cc: Andi Shyti <andi.shyti@linux.intel.com>
> > > Cc: Matt Roper <matthew.d.roper@intel.com>
> > > ---
> > >   drivers/gpu/drm/i915/gem/i915_gem_domain.c    |  60 +++++----
> > >   drivers/gpu/drm/i915/gem/i915_gem_domain.h    |   5 +-
> > >   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   3 +-
> > >   drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
> > >   drivers/gpu/drm/i915/gem/i915_gem_mman.c      |   4 +-
> > >   drivers/gpu/drm/i915/gem/i915_gem_object.c    | 117 ++++++++++--------
> > >   drivers/gpu/drm/i915/gem/i915_gem_object.h    |  11 +-
> > >   .../gpu/drm/i915/gem/i915_gem_object_types.h  | 116 +----------------
> > >   drivers/gpu/drm/i915/gem/i915_gem_shmem.c     |   8 +-
> > >   drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |   2 +-
> > >   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  20 +--
> > >   drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   2 +-
> > >   .../drm/i915/gem/selftests/huge_gem_object.c  |   2 +-
> > >   .../gpu/drm/i915/gem/selftests/huge_pages.c   |   3 +-
> > >   drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  10 +-
> > >   drivers/gpu/drm/i915/gt/intel_engine_cs.c     |   2 +-
> > >   drivers/gpu/drm/i915/gt/intel_ggtt.c          |  25 ++--
> > >   drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c     |   4 +-
> > >   drivers/gpu/drm/i915/gt/intel_gtt.c           |   2 +-
> > >   drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +-
> > >   drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   6 +-
> > >   .../gpu/drm/i915/gt/intel_ring_submission.c   |   4 +-
> > >   drivers/gpu/drm/i915/gt/intel_timeline.c      |   2 +-
> > >   drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   2 +-
> > >   .../gpu/drm/i915/gt/selftest_workarounds.c    |   2 +-
> > >   drivers/gpu/drm/i915/i915_cache.c             |  89 +++++++++++--
> > >   drivers/gpu/drm/i915/i915_cache.h             |  70 ++++++++++-
> > >   drivers/gpu/drm/i915/i915_debugfs.c           |  53 ++------
> > >   drivers/gpu/drm/i915/i915_driver.c            |   4 +-
> > >   drivers/gpu/drm/i915/i915_gem.c               |  13 --
> > >   drivers/gpu/drm/i915/i915_pci.c               |  84 +++++++------
> > >   drivers/gpu/drm/i915/i915_perf.c              |   2 +-
> > >   drivers/gpu/drm/i915/intel_device_info.h      |   6 +-
> > >   .../gpu/drm/i915/selftests/i915_gem_evict.c   |   4 +-
> > >   drivers/gpu/drm/i915/selftests/igt_spinner.c  |   2 +-
> > >   .../gpu/drm/i915/selftests/mock_gem_device.c  |  14 +--
> > >   36 files changed, 391 insertions(+), 367 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> > > index 57db9c581bf6..c15f83de33af 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> > > @@ -8,6 +8,7 @@
> > >   #include "display/intel_frontbuffer.h"
> > >   #include "gt/intel_gt.h"
> > > +#include "i915_cache.h"
> > >   #include "i915_drv.h"
> > >   #include "i915_gem_clflush.h"
> > >   #include "i915_gem_domain.h"
> > > @@ -41,14 +42,17 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
> > >   		return false;
> > >   	/*
> > > -	 * For objects created by userspace through GEM_CREATE with pat_index
> > > -	 * set by set_pat extension, i915_gem_object_has_cache_level() will
> > > -	 * always return true, because the coherency of such object is managed
> > > -	 * by userspace. Othereise the call here would fall back to checking
> > > -	 * whether the object is un-cached or write-through.
> > > +	 * Always flush cache for UMD objects with PAT index set.
> > >   	 */
> > > -	return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
> > > -		 i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
> > > +	if (obj->pat_set_by_user)
> > > +		return true;
> > > +
> > > +	/*
> > > +	 * Fully coherent cached access may end up with data in the CPU cache
> > > +	 * which hasn't hit memory yet.
> > > +	 */
> > > +	return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
> > > +	       i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W);
> > >   }
> > >   bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
> > > @@ -268,7 +272,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
> > >   /**
> > >    * i915_gem_object_set_cache_level - Changes the cache-level of an object across all VMA.
> > >    * @obj: object to act on
> > > - * @cache_level: new cache level to set for the object
> > > + * @cache: new caching mode to set for the object
> > >    *
> > >    * After this function returns, the object will be in the new cache-level
> > >    * across all GTT and the contents of the backing storage will be coherent,
> > > @@ -281,18 +285,28 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
> > >    * that all direct access to the scanout remains coherent.
> > >    */
> > >   int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> > > -				    enum i915_cache_level cache_level)
> > > +				    i915_cache_t cache)
> > >   {
> > > -	int ret;
> > > +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > > +	int pat, ret;
> > > -	/*
> > > -	 * For objects created by userspace through GEM_CREATE with pat_index
> > > -	 * set by set_pat extension, simply return 0 here without touching
> > > -	 * the cache setting, because such objects should have an immutable
> > > -	 * cache setting by desgin and always managed by userspace.
> > > -	 */
> > > -	if (i915_gem_object_has_cache_level(obj, cache_level))
> > > +	pat = i915_cache_find_pat(i915, cache);
> > > +	if (pat < 0) {
> > > +		char buf[I915_CACHE_NAME_LEN];
> > > +
> > > +		i915_cache_print(buf, sizeof(buf), NULL, cache);
> > > +		drm_err_ratelimited(&i915->drm,
> > > +				    "Attempting to use unknown caching mode %s!\n",
> > > +				    buf);
> > > +
> > > +		return -EINVAL;
> > > +	} else if (pat == obj->pat_index) {
> > >   		return 0;
> > > +	} else if (obj->pat_set_by_user) {
> > > +		drm_notice_once(&i915->drm,
> > > +				"Attempting to change caching mode on an object with fixed PAT!\n");
> > > +		return -EINVAL;
> > > +	}
> > >   	ret = i915_gem_object_wait(obj,
> > >   				   I915_WAIT_INTERRUPTIBLE |
> > > @@ -302,7 +316,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> > >   		return ret;
> > >   	/* Always invalidate stale cachelines */
> > > -	i915_gem_object_set_cache_coherency(obj, cache_level);
> > > +	i915_gem_object_set_pat_index(obj, pat);
> > >   	obj->cache_dirty = true;
> > >   	/* The cache-level will be applied when each vma is rebound. */
> > > @@ -337,10 +351,10 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
> > >   		goto out;
> > >   	}
> > > -	if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||
> > > -	    i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
> > > +	if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
> > > +	    i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W))
> > >   		args->caching = I915_CACHING_CACHED;
> > > -	else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
> > > +	else if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT))
> > >   		args->caching = I915_CACHING_DISPLAY;
> > >   	else
> > >   		args->caching = I915_CACHING_NONE;
> > > @@ -355,7 +369,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
> > >   	struct drm_i915_private *i915 = to_i915(dev);
> > >   	struct drm_i915_gem_caching *args = data;
> > >   	struct drm_i915_gem_object *obj;
> > > -	enum i915_cache_level level;
> > > +	i915_cache_t level;
> > >   	int ret = 0;
> > >   	if (IS_DGFX(i915))
> > > @@ -378,7 +392,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
> > >   		if (!HAS_LLC(i915) && !HAS_SNOOP(i915))
> > >   			return -ENODEV;
> > > -		level = I915_CACHE_LLC;
> > > +		level = I915_CACHE_CACHED;
> > >   		break;
> > >   	case I915_CACHING_DISPLAY:
> > >   		level = HAS_WT(i915) ? I915_CACHE_WT : I915_CACHE_NONE;
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.h b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
> > > index 9622df962bfc..6da5c351f6fd 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.h
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
> > > @@ -6,10 +6,11 @@
> > >   #ifndef __I915_GEM_DOMAIN_H__
> > >   #define __I915_GEM_DOMAIN_H__
> > > +#include "i915_cache.h"
> > > +
> > >   struct drm_i915_gem_object;
> > > -enum i915_cache_level;
> > >   int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> > > -				    enum i915_cache_level cache_level);
> > > +				    i915_cache_t cache);
> > >   #endif /* __I915_GEM_DOMAIN_H__ */
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > index 0a1d40220020..9d6e49c8a4c6 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > @@ -648,7 +648,8 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
> > >   	 */
> > >   	return (cache->has_llc ||
> > >   		obj->cache_dirty ||
> > > -		!i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
> > > +		!(obj->pat_set_by_user ||
> > > +		  i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)));
> > >   }
> > >   static int eb_reserve_vma(struct i915_execbuffer *eb,
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> > > index 6bc26b4b06b8..88c360c3d6a3 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> > > @@ -170,7 +170,7 @@ __i915_gem_object_create_internal(struct drm_i915_private *i915,
> > >   	obj->read_domains = I915_GEM_DOMAIN_CPU;
> > >   	obj->write_domain = I915_GEM_DOMAIN_CPU;
> > > -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
> > > +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
> > >   	i915_gem_object_set_cache_coherency(obj, cache_level);
> > >   	return obj;
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > index aa4d842d4c5a..cd7f8ded0d6f 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > @@ -382,7 +382,6 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
> > >   		goto err_reset;
> > >   	}
> > > -	/* Access to snoopable pages through the GTT is incoherent. */
> > >   	/*
> > >   	 * For objects created by userspace through GEM_CREATE with pat_index
> > >   	 * set by set_pat extension, coherency is managed by userspace, make
> > > @@ -391,7 +390,8 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
> > >   	 * objects. Otherwise this helper function would fall back to checking
> > >   	 * whether the object is un-cached.
> > >   	 */
> > > -	if (!(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
> > > +	if (!((obj->pat_set_by_user ||
> > > +	       i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)) ||
> > >   	      HAS_LLC(i915))) {
> > >   		ret = -EFAULT;
> > >   		goto err_unpin;
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > > index 3dc4fbb67d2b..ec1f0be43d0d 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > > @@ -45,33 +45,6 @@ static struct kmem_cache *slab_objects;
> > >   static const struct drm_gem_object_funcs i915_gem_object_funcs;
> > > -unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
> > > -				    enum i915_cache_level level)
> > > -{
> > > -	if (drm_WARN_ON(&i915->drm, level >= I915_MAX_CACHE_LEVEL))
> > > -		return 0;
> > > -
> > > -	return INTEL_INFO(i915)->cachelevel_to_pat[level];
> > > -}
> > > -
> > > -bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
> > > -				     enum i915_cache_level lvl)
> > > -{
> > > -	/*
> > > -	 * In case the pat_index is set by user space, this kernel mode
> > > -	 * driver should leave the coherency to be managed by user space,
> > > -	 * simply return true here.
> > > -	 */
> > > -	if (obj->pat_set_by_user)
> > > -		return true;
> > > -
> > > -	/*
> > > -	 * Otherwise the pat_index should have been converted from cache_level
> > > -	 * so that the following comparison is valid.
> > > -	 */
> > > -	return obj->pat_index == i915_gem_get_pat_index(obj_to_i915(obj), lvl);
> > > -}
> > > -
> > >   struct drm_i915_gem_object *i915_gem_object_alloc(void)
> > >   {
> > >   	struct drm_i915_gem_object *obj;
> > > @@ -144,30 +117,72 @@ void __i915_gem_object_fini(struct drm_i915_gem_object *obj)
> > >   	dma_resv_fini(&obj->base._resv);
> > >   }
> > > +bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
> > > +				    enum i915_cache_mode mode)
> > > +{
> > > +	struct drm_i915_private *i915 = obj_to_i915(obj);
> > > +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
> > > +
> > > +	return I915_CACHE_MODE(cache) == mode;
> > > +}
> > > +
> > > +bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
> > > +				    unsigned int flag)
> > > +{
> > > +	struct drm_i915_private *i915 = obj_to_i915(obj);
> > > +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
> > > +
> > > +	return I915_CACHE_FLAGS(cache) & flag;
> > > +}
> > > +
> > > +static void __i915_gem_object_update_coherency(struct drm_i915_gem_object *obj)
> > > +{
> > > +	struct drm_i915_private *i915 = obj_to_i915(obj);
> > > +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
> > > +	const unsigned int flags = I915_CACHE_FLAGS(cache);
> > > +	const unsigned int mode = I915_CACHE_MODE(cache);
> > > +
> > > +	if (mode == I915_CACHE_MODE_WC ||
> > > +	    mode == I915_CACHE_MODE_WT ||
> > > +	    (mode == I915_CACHE_MODE_WB && (flags & I915_CACHE_FLAG_COH2W)))
> > > +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ |
> > > +				      I915_BO_CACHE_COHERENT_FOR_WRITE;
> > > +	else if (HAS_LLC(i915))
> > > +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
> > > +	else
> > > +		obj->cache_coherent = 0;
> > > +
> > > +	obj->cache_dirty =
> > > +		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
> > > +		!IS_DGFX(i915);
> > > +}
> > > +
> > >   /**
> > >    * i915_gem_object_set_cache_coherency - Mark up the object's coherency levels
> > > - * for a given cache_level
> > > + * for a given caching mode
> > >    * @obj: #drm_i915_gem_object
> > > - * @cache_level: cache level
> > > + * @cache: cache mode
> > >    */
> > >   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
> > > -					 unsigned int cache_level)
> > > +					 i915_cache_t cache)
> > >   {
> > > -	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > > +	struct drm_i915_private *i915 = obj_to_i915(obj);
> > > +	int found;
> > > -	obj->pat_index = i915_gem_get_pat_index(i915, cache_level);
> > > +	found = i915_cache_find_pat(i915, cache);
> > > +	if (found < 0) {
> > > +		char buf[I915_CACHE_NAME_LEN];
> > > -	if (cache_level != I915_CACHE_NONE)
> > > -		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
> > > -				       I915_BO_CACHE_COHERENT_FOR_WRITE);
> > > -	else if (HAS_LLC(i915))
> > > -		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
> > > -	else
> > > -		obj->cache_coherent = 0;
> > > +		i915_cache_print(buf, sizeof(buf), NULL, cache);
> > > +		drm_err_ratelimited(&i915->drm, "Unknown cache mode %s!\n",
> > > +				    buf);
> > > -	obj->cache_dirty =
> > > -		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
> > > -		!IS_DGFX(i915);
> > > +		found = i915->pat_uc;
> > > +	}
> > > +
> > > +	obj->pat_index = found;
> > > +
> > > +	__i915_gem_object_update_coherency(obj);
> > >   }
> > >   /**
> > > @@ -181,24 +196,18 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
> > >   void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
> > >   				   unsigned int pat_index)
> > >   {
> > > -	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > > +	struct drm_i915_private *i915 = obj_to_i915(obj);
> > >   	if (obj->pat_index == pat_index)
> > >   		return;
> > > +	if (drm_WARN_ON_ONCE(&i915->drm,
> > > +			     pat_index > INTEL_INFO(i915)->max_pat_index))
> > > +		return;
> > > +
> > >   	obj->pat_index = pat_index;
> > > -	if (pat_index != i915_gem_get_pat_index(i915, I915_CACHE_NONE))
> > > -		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
> > > -				       I915_BO_CACHE_COHERENT_FOR_WRITE);
> > > -	else if (HAS_LLC(i915))
> > > -		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
> > > -	else
> > > -		obj->cache_coherent = 0;
> > > -
> > > -	obj->cache_dirty =
> > > -		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
> > > -		!IS_DGFX(i915);
> > > +	__i915_gem_object_update_coherency(obj);
> > >   }
> > >   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > > index 884a17275b3a..a5d4ee19d9be 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > > @@ -13,6 +13,7 @@
> > >   #include "display/intel_frontbuffer.h"
> > >   #include "intel_memory_region.h"
> > > +#include "i915_cache.h"
> > >   #include "i915_gem_object_types.h"
> > >   #include "i915_gem_gtt.h"
> > >   #include "i915_gem_ww.h"
> > > @@ -32,10 +33,6 @@ static inline bool i915_gem_object_size_2big(u64 size)
> > >   	return false;
> > >   }
> > > -unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
> > > -				    enum i915_cache_level level);
> > > -bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
> > > -				     enum i915_cache_level lvl);
> > >   void i915_gem_init__objects(struct drm_i915_private *i915);
> > >   void i915_objects_module_exit(void);
> > > @@ -764,8 +761,12 @@ int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
> > >   				      bool intr);
> > >   bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj);
> > > +bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
> > > +				    enum i915_cache_mode mode);
> > > +bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
> > > +				    unsigned int flag);
> > >   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
> > > -					 unsigned int cache_level);
> > > +					 i915_cache_t cache);
> > >   void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
> > >   				   unsigned int pat_index);
> > >   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj);
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > > index 8de2b91b3edf..6790e13ad262 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > > @@ -14,6 +14,7 @@
> > >   #include <uapi/drm/i915_drm.h>
> > >   #include "i915_active.h"
> > > +#include "i915_cache.h"
> > >   #include "i915_selftest.h"
> > >   #include "i915_vma_resource.h"
> > > @@ -116,93 +117,6 @@ struct drm_i915_gem_object_ops {
> > >   	const char *name; /* friendly name for debug, e.g. lockdep classes */
> > >   };
> > > -/**
> > > - * enum i915_cache_level - The supported GTT caching values for system memory
> > > - * pages.
> > > - *
> > > - * These translate to some special GTT PTE bits when binding pages into some
> > > - * address space. It also determines whether an object, or rather its pages are
> > > - * coherent with the GPU, when also reading or writing through the CPU cache
> > > - * with those pages.
> > > - *
> > > - * Userspace can also control this through struct drm_i915_gem_caching.
> > > - */
> > > -enum i915_cache_level {
> > > -	/**
> > > -	 * @I915_CACHE_NONE:
> > > -	 *
> > > -	 * GPU access is not coherent with the CPU cache. If the cache is dirty
> > > -	 * and we need the underlying pages to be coherent with some later GPU
> > > -	 * access then we need to manually flush the pages.
> > > -	 *
> > > -	 * On shared LLC platforms reads and writes through the CPU cache are
> > > -	 * still coherent even with this setting. See also
> > > -	 * &drm_i915_gem_object.cache_coherent for more details. Due to this we
> > > -	 * should only ever use uncached for scanout surfaces, otherwise we end
> > > -	 * up over-flushing in some places.
> > > -	 *
> > > -	 * This is the default on non-LLC platforms.
> > > -	 */
> > > -	I915_CACHE_NONE = 0,
> > > -	/**
> > > -	 * @I915_CACHE_LLC:
> > > -	 *
> > > -	 * GPU access is coherent with the CPU cache. If the cache is dirty,
> > > -	 * then the GPU will ensure that access remains coherent, when both
> > > -	 * reading and writing through the CPU cache. GPU writes can dirty the
> > > -	 * CPU cache.
> > > -	 *
> > > -	 * Not used for scanout surfaces.
> > > -	 *
> > > -	 * Applies to both platforms with shared LLC(HAS_LLC), and snooping
> > > -	 * based platforms(HAS_SNOOP).
> > > -	 *
> > > -	 * This is the default on shared LLC platforms.  The only exception is
> > > -	 * scanout objects, where the display engine is not coherent with the
> > > -	 * CPU cache. For such objects I915_CACHE_NONE or I915_CACHE_WT is
> > > -	 * automatically applied by the kernel in pin_for_display, if userspace
> > > -	 * has not done so already.
> > > -	 */
> > > -	I915_CACHE_LLC,
> > > -	/**
> > > -	 * @I915_CACHE_L3_LLC:
> > > -	 *
> > > -	 * Explicitly enable the Gfx L3 cache, with coherent LLC.
> > > -	 *
> > > -	 * The Gfx L3 sits between the domain specific caches, e.g
> > > -	 * sampler/render caches, and the larger LLC. LLC is coherent with the
> > > -	 * GPU, but L3 is only visible to the GPU, so likely needs to be flushed
> > > -	 * when the workload completes.
> > > -	 *
> > > -	 * Not used for scanout surfaces.
> > > -	 *
> > > -	 * Only exposed on some gen7 + GGTT. More recent hardware has dropped
> > > -	 * this explicit setting, where it should now be enabled by default.
> > > -	 */
> > > -	I915_CACHE_L3_LLC,
> > > -	/**
> > > -	 * @I915_CACHE_WT:
> > > -	 *
> > > -	 * Write-through. Used for scanout surfaces.
> > > -	 *
> > > -	 * The GPU can utilise the caches, while still having the display engine
> > > -	 * be coherent with GPU writes, as a result we don't need to flush the
> > > -	 * CPU caches when moving out of the render domain. This is the default
> > > -	 * setting chosen by the kernel, if supported by the HW, otherwise we
> > > -	 * fallback to I915_CACHE_NONE. On the CPU side writes through the CPU
> > > -	 * cache still need to be flushed, to remain coherent with the display
> > > -	 * engine.
> > > -	 */
> > > -	I915_CACHE_WT,
> > > -	/**
> > > -	 * @I915_MAX_CACHE_LEVEL:
> > > -	 *
> > > -	 * Mark the last entry in the enum. Used for defining cachelevel_to_pat
> > > -	 * array for cache_level to pat translation table.
> > > -	 */
> > > -	I915_MAX_CACHE_LEVEL,
> > > -};
> > > -
> > >   enum i915_map_type {
> > >   	I915_MAP_WB = 0,
> > >   	I915_MAP_WC,
> > > @@ -403,16 +317,6 @@ struct drm_i915_gem_object {
> > >   	/**
> > >   	 * @cache_coherent:
> > >   	 *
> > > -	 * Note: with the change above which replaced @cache_level with pat_index,
> > > -	 * the use of @cache_coherent is limited to the objects created by kernel
> > > -	 * or by userspace without pat index specified.
> > > -	 * Check for @pat_set_by_user to find out if an object has pat index set
> > > -	 * by userspace. The ioctl's to change cache settings have also been
> > > -	 * disabled for the objects with pat index set by userspace. Please don't
> > > -	 * assume @cache_coherent having the flags set as describe here. A helper
> > > -	 * function i915_gem_object_has_cache_level() provides one way to bypass
> > > -	 * the use of this field.
> > > -	 *
> > >   	 * Track whether the pages are coherent with the GPU if reading or
> > >   	 * writing through the CPU caches. The largely depends on the
> > >   	 * @cache_level setting.
> > > @@ -447,7 +351,7 @@ struct drm_i915_gem_object {
> > >   	 * flushing the surface just before doing the scanout.  This does mean
> > >   	 * we might unnecessarily flush non-scanout objects in some places, but
> > >   	 * the default assumption is that all normal objects should be using
> > > -	 * I915_CACHE_LLC, at least on platforms with the shared LLC.
> > > +	 * I915_CACHE_CACHED, at least on platforms with the shared LLC.
> > >   	 *
> > >   	 * Supported values:
> > >   	 *
> > > @@ -486,16 +390,6 @@ struct drm_i915_gem_object {
> > >   	/**
> > >   	 * @cache_dirty:
> > >   	 *
> > > -	 * Note: with the change above which replaced cache_level with pat_index,
> > > -	 * the use of @cache_dirty is limited to the objects created by kernel
> > > -	 * or by userspace without pat index specified.
> > > -	 * Check for @pat_set_by_user to find out if an object has pat index set
> > > -	 * by userspace. The ioctl's to change cache settings have also been
> > > -	 * disabled for the objects with pat_index set by userspace. Please don't
> > > -	 * assume @cache_dirty is set as describe here. Also see helper function
> > > -	 * i915_gem_object_has_cache_level() for possible ways to bypass the use
> > > -	 * of this field.
> > > -	 *
> > >   	 * Track if we are we dirty with writes through the CPU cache for this
> > >   	 * object. As a result reading directly from main memory might yield
> > >   	 * stale data.
> > > @@ -531,9 +425,9 @@ struct drm_i915_gem_object {
> > >   	 *
> > >   	 *   1. All userspace objects, by default, have @cache_level set as
> > >   	 *   I915_CACHE_NONE. The only exception is userptr objects, where we
> > > -	 *   instead force I915_CACHE_LLC, but we also don't allow userspace to
> > > -	 *   ever change the @cache_level for such objects. Another special case
> > > -	 *   is dma-buf, which doesn't rely on @cache_dirty,  but there we
> > > +	 *   instead force I915_CACHE_CACHED, but we also don't allow userspace
> > > +	 *   to ever change the @cache_level for such objects. Another special
> > > +	 *   case is dma-buf, which doesn't rely on @cache_dirty,  but there we
> > >   	 *   always do a forced flush when acquiring the pages, if there is a
> > >   	 *   chance that the pages can be read directly from main memory with
> > >   	 *   the GPU.
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> > > index 8f1633c3fb93..aba908f0349f 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> > > @@ -584,7 +584,7 @@ static int shmem_object_init(struct intel_memory_region *mem,
> > >   	static struct lock_class_key lock_class;
> > >   	struct drm_i915_private *i915 = mem->i915;
> > >   	struct address_space *mapping;
> > > -	unsigned int cache_level;
> > > +	i915_cache_t cache;
> > >   	gfp_t mask;
> > >   	int ret;
> > > @@ -628,11 +628,11 @@ static int shmem_object_init(struct intel_memory_region *mem,
> > >   		 * However, we maintain the display planes as UC, and so
> > >   		 * need to rebind when first used as such.
> > >   		 */
> > > -		cache_level = I915_CACHE_LLC;
> > > +		cache = I915_CACHE_CACHED;
> > >   	else
> > > -		cache_level = I915_CACHE_NONE;
> > > +		cache = I915_CACHE_NONE;
> > > -	i915_gem_object_set_cache_coherency(obj, cache_level);
> > > +	i915_gem_object_set_cache_coherency(obj, cache);
> > >   	i915_gem_object_init_memory_region(obj, mem);
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> > > index 1c8eb806b7d3..cc907a1f1c53 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> > > @@ -691,7 +691,7 @@ static int __i915_gem_object_create_stolen(struct intel_memory_region *mem,
> > >   	obj->stolen = stolen;
> > >   	obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
> > > -	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
> > > +	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
> > >   	i915_gem_object_set_cache_coherency(obj, cache_level);
> > >   	if (WARN_ON(!i915_gem_object_trylock(obj, NULL)))
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> > > index 6bd6c239f4ac..107176d1757b 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> > > @@ -48,14 +48,14 @@ void i915_ttm_migrate_set_ban_memcpy(bool ban)
> > >   }
> > >   #endif
> > > -static enum i915_cache_level
> > > -i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
> > > -		     struct ttm_tt *ttm)
> > > +static i915_cache_t
> > > +i915_ttm_cache(struct drm_i915_private *i915, struct ttm_resource *res,
> > > +	       struct ttm_tt *ttm)
> > >   {
> > >   	return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
> > >   		!i915_ttm_gtt_binds_lmem(res) &&
> > > -		ttm->caching == ttm_cached) ? I915_CACHE_LLC :
> > > -		I915_CACHE_NONE;
> > > +		ttm->caching == ttm_cached) ? I915_CACHE_CACHED :
> > > +					      I915_CACHE_NONE;
> > >   }
> > >   static unsigned int
> > > @@ -112,8 +112,8 @@ void i915_ttm_adjust_domains_after_move(struct drm_i915_gem_object *obj)
> > >   void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
> > >   {
> > >   	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
> > > -	unsigned int cache_level;
> > >   	unsigned int mem_flags;
> > > +	i915_cache_t cache;
> > >   	unsigned int i;
> > >   	int mem_type;
> > > @@ -126,13 +126,13 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
> > >   	if (!bo->resource) {
> > >   		mem_flags = I915_BO_FLAG_STRUCT_PAGE;
> > >   		mem_type = I915_PL_SYSTEM;
> > > -		cache_level = I915_CACHE_NONE;
> > > +		cache = I915_CACHE_NONE;
> > >   	} else {
> > >   		mem_flags = i915_ttm_cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
> > >   			I915_BO_FLAG_STRUCT_PAGE;
> > >   		mem_type = bo->resource->mem_type;
> > > -		cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
> > > -						   bo->ttm);
> > > +		cache = i915_ttm_cache(to_i915(bo->base.dev), bo->resource,
> > > +				       bo->ttm);
> > >   	}
> > >   	/*
> > > @@ -157,7 +157,7 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
> > >   	obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
> > >   	obj->mem_flags |= mem_flags;
> > > -	i915_gem_object_set_cache_coherency(obj, cache_level);
> > > +	i915_gem_object_set_cache_coherency(obj, cache);
> > >   }
> > >   /**
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > > index 1d3ebdf4069b..5d2891981bd4 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > > @@ -553,7 +553,7 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> > >   	obj->mem_flags = I915_BO_FLAG_STRUCT_PAGE;
> > >   	obj->read_domains = I915_GEM_DOMAIN_CPU;
> > >   	obj->write_domain = I915_GEM_DOMAIN_CPU;
> > > -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> > > +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
> > >   	obj->userptr.ptr = args->user_ptr;
> > >   	obj->userptr.notifier_seq = ULONG_MAX;
> > > diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
> > > index bac957755068..77d04be5e9d7 100644
> > > --- a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
> > > +++ b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
> > > @@ -123,7 +123,7 @@ huge_gem_object(struct drm_i915_private *i915,
> > >   	obj->read_domains = I915_GEM_DOMAIN_CPU;
> > >   	obj->write_domain = I915_GEM_DOMAIN_CPU;
> > > -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
> > > +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
> > >   	i915_gem_object_set_cache_coherency(obj, cache_level);
> > >   	obj->scratch = phys_size;
> > > diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> > > index 6bddd733d796..6ca5b9dbc414 100644
> > > --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> > > +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> > > @@ -200,9 +200,10 @@ huge_pages_object(struct drm_i915_private *i915,
> > >   	obj->write_domain = I915_GEM_DOMAIN_CPU;
> > >   	obj->read_domains = I915_GEM_DOMAIN_CPU;
> > > -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
> > > +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
> > >   	i915_gem_object_set_cache_coherency(obj, cache_level);
> > > +
> > >   	obj->mm.page_mask = page_mask;
> > >   	return obj;
> > > diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> > > index 675f71f06e89..3c93a73cf6b1 100644
> > > --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> > > +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> > > @@ -16,11 +16,11 @@
> > >   #include "intel_gtt.h"
> > >   static u64 gen8_pde_encode(const dma_addr_t addr,
> > > -			   const enum i915_cache_level level)
> > > +			   const enum i915_cache_mode cache_mode)
> > >   {
> > >   	u64 pde = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
> > > -	if (level != I915_CACHE_NONE)
> > > +	if (cache_mode != I915_CACHE_MODE_UC)
> > >   		pde |= PPAT_CACHED_PDE;
> > >   	else
> > >   		pde |= PPAT_UNCACHED;
> > > @@ -43,10 +43,10 @@ static u64 gen8_pte_encode(dma_addr_t addr,
> > >   	 * See translation table defined by LEGACY_CACHELEVEL.
> > >   	 */
> > >   	switch (pat_index) {
> > > -	case I915_CACHE_NONE:
> > > +	case I915_CACHE_MODE_UC:
> > >   		pte |= PPAT_UNCACHED;
> > >   		break;
> > > -	case I915_CACHE_WT:
> > > +	case I915_CACHE_MODE_WT:
> > >   		pte |= PPAT_DISPLAY_ELLC;
> > >   		break;
> > >   	default:
> > > @@ -893,7 +893,7 @@ static int gen8_init_scratch(struct i915_address_space *vm)
> > >   		}
> > >   		fill_px(obj, vm->scratch[i - 1]->encode);
> > > -		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_NONE);
> > > +		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_MODE_UC);
> > >   		vm->scratch[i] = obj;
> > >   	}
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > > index ee15486fed0d..f1e59e512d14 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > > @@ -1103,7 +1103,7 @@ static int init_status_page(struct intel_engine_cs *engine)
> > >   		return PTR_ERR(obj);
> > >   	}
> > > -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> > > +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
> > >   	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
> > >   	if (IS_ERR(vma)) {
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > > index fca61ddca8ad..ab5f654e7557 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > > @@ -1011,11 +1011,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
> > >   	return ggtt_probe_common(ggtt, size);
> > >   }
> > > -/*
> > > - * For pre-gen8 platforms pat_index is the same as enum i915_cache_level,
> > > - * so the switch-case statements in these PTE encode functions are still valid.
> > > - * See translation table LEGACY_CACHELEVEL.
> > > - */
> > >   static u64 snb_pte_encode(dma_addr_t addr,
> > >   			  unsigned int pat_index,
> > >   			  u32 flags)
> > > @@ -1023,11 +1018,11 @@ static u64 snb_pte_encode(dma_addr_t addr,
> > >   	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
> > >   	switch (pat_index) {
> > > -	case I915_CACHE_L3_LLC:
> > > -	case I915_CACHE_LLC:
> > > +	case I915_CACHE_MODE_WB:
> > > +	case __I915_CACHE_MODE_WB_L3:
> > >   		pte |= GEN6_PTE_CACHE_LLC;
> > >   		break;
> > > -	case I915_CACHE_NONE:
> > > +	case I915_CACHE_MODE_UC:
> > >   		pte |= GEN6_PTE_UNCACHED;
> > >   		break;
> > >   	default:
> > > @@ -1044,13 +1039,13 @@ static u64 ivb_pte_encode(dma_addr_t addr,
> > >   	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
> > >   	switch (pat_index) {
> > > -	case I915_CACHE_L3_LLC:
> > > +	case __I915_CACHE_MODE_WB_L3:
> > >   		pte |= GEN7_PTE_CACHE_L3_LLC;
> > >   		break;
> > > -	case I915_CACHE_LLC:
> > > +	case I915_CACHE_MODE_WB:
> > >   		pte |= GEN6_PTE_CACHE_LLC;
> > >   		break;
> > > -	case I915_CACHE_NONE:
> > > +	case I915_CACHE_MODE_UC:
> > >   		pte |= GEN6_PTE_UNCACHED;
> > >   		break;
> > >   	default:
> > > @@ -1069,7 +1064,7 @@ static u64 byt_pte_encode(dma_addr_t addr,
> > >   	if (!(flags & PTE_READ_ONLY))
> > >   		pte |= BYT_PTE_WRITEABLE;
> > > -	if (pat_index != I915_CACHE_NONE)
> > > +	if (pat_index != I915_CACHE_MODE_UC)
> > >   		pte |= BYT_PTE_SNOOPED_BY_CPU_CACHES;
> > >   	return pte;
> > > @@ -1081,7 +1076,7 @@ static u64 hsw_pte_encode(dma_addr_t addr,
> > >   {
> > >   	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
> > > -	if (pat_index != I915_CACHE_NONE)
> > > +	if (pat_index != I915_CACHE_MODE_UC)
> > >   		pte |= HSW_WB_LLC_AGE3;
> > >   	return pte;
> > > @@ -1094,9 +1089,9 @@ static u64 iris_pte_encode(dma_addr_t addr,
> > >   	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
> > >   	switch (pat_index) {
> > > -	case I915_CACHE_NONE:
> > > +	case I915_CACHE_MODE_UC:
> > >   		break;
> > > -	case I915_CACHE_WT:
> > > +	case I915_CACHE_MODE_WT:
> > >   		pte |= HSW_WT_ELLC_LLC_AGE3;
> > >   		break;
> > >   	default:
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
> > > index 866c416afb73..803c41ac4ccb 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
> > > @@ -21,7 +21,7 @@ static void gmch_ggtt_insert_page(struct i915_address_space *vm,
> > >   				  unsigned int pat_index,
> > >   				  u32 unused)
> > >   {
> > > -	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
> > > +	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
> > >   		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
> > >   	intel_gmch_gtt_insert_page(addr, offset >> PAGE_SHIFT, flags);
> > > @@ -32,7 +32,7 @@ static void gmch_ggtt_insert_entries(struct i915_address_space *vm,
> > >   				     unsigned int pat_index,
> > >   				     u32 unused)
> > >   {
> > > -	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
> > > +	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
> > >   		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
> > >   	intel_gmch_gtt_insert_sg_entries(vma_res->bi.pages, vma_res->start >> PAGE_SHIFT,
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > index 065099362a98..48055304537a 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > @@ -676,7 +676,7 @@ __vm_create_scratch_for_read(struct i915_address_space *vm, unsigned long size)
> > >   	if (IS_ERR(obj))
> > >   		return ERR_CAST(obj);
> > > -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> > > +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
> > >   	vma = i915_vma_instance(obj, vm, NULL);
> > >   	if (IS_ERR(vma)) {
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > index 7192a534a654..af4277c1d577 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > @@ -636,7 +636,8 @@ void
> > >   __set_pd_entry(struct i915_page_directory * const pd,
> > >   	       const unsigned short idx,
> > >   	       struct i915_page_table *pt,
> > > -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level));
> > > +	       u64 (*encode)(const dma_addr_t,
> > > +			     const enum i915_cache_mode cache_mode));
> > >   #define set_pd_entry(pd, idx, to) \
> > >   	__set_pd_entry((pd), (idx), px_pt(to), gen8_pde_encode)
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> > > index 436756bfbb1a..3e461d4f3693 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> > > @@ -98,14 +98,16 @@ void
> > >   __set_pd_entry(struct i915_page_directory * const pd,
> > >   	       const unsigned short idx,
> > >   	       struct i915_page_table * const to,
> > > -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level))
> > > +	       u64 (*encode)(const dma_addr_t,
> > > +			     const enum i915_cache_mode cache_mode))
> > >   {
> > >   	/* Each thread pre-pins the pd, and we may have a thread per pde. */
> > >   	GEM_BUG_ON(atomic_read(px_used(pd)) > NALLOC * I915_PDES);
> > >   	atomic_inc(px_used(pd));
> > >   	pd->entry[idx] = to;
> > > -	write_dma_entry(px_base(pd), idx, encode(px_dma(to), I915_CACHE_LLC));
> > > +	write_dma_entry(px_base(pd), idx,
> > > +			encode(px_dma(to), I915_CACHE_MODE_WB));
> > >   }
> > >   void
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > > index 92085ffd23de..9131d228d285 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > > @@ -551,7 +551,9 @@ alloc_context_vma(struct intel_engine_cs *engine)
> > >   	 * later platforms don't have L3 control bits in the PTE.
> > >   	 */
> > >   	if (IS_IVYBRIDGE(i915))
> > > -		i915_gem_object_set_cache_coherency(obj, I915_CACHE_L3_LLC);
> > > +		i915_gem_object_set_cache_coherency(obj,
> > > +						    I915_CACHE_CACHED |
> > > +						    __I915_CACHE_FLAG(L3));
> > >   	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
> > >   	if (IS_ERR(vma)) {
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
> > > index b9640212d659..025ce54c886d 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
> > > @@ -26,7 +26,7 @@ static struct i915_vma *hwsp_alloc(struct intel_gt *gt)
> > >   	if (IS_ERR(obj))
> > >   		return ERR_CAST(obj);
> > > -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> > > +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
> > >   	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
> > >   	if (IS_ERR(vma))
> > > diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> > > index 8b0d84f2aad2..fc278fa463b0 100644
> > > --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> > > +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> > > @@ -64,7 +64,7 @@ static int hang_init(struct hang *h, struct intel_gt *gt)
> > >   		goto err_hws;
> > >   	}
> > > -	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_LLC);
> > > +	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_CACHED);
> > >   	vaddr = i915_gem_object_pin_map_unlocked(h->hws, I915_MAP_WB);
> > >   	if (IS_ERR(vaddr)) {
> > >   		err = PTR_ERR(vaddr);
> > > diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> > > index 14a8b25b6204..d25990d33d44 100644
> > > --- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> > > +++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> > > @@ -111,7 +111,7 @@ read_nonprivs(struct intel_context *ce)
> > >   	if (IS_ERR(result))
> > >   		return result;
> > > -	i915_gem_object_set_cache_coherency(result, I915_CACHE_LLC);
> > > +	i915_gem_object_set_cache_coherency(result, I915_CACHE_CACHED);
> > >   	cs = i915_gem_object_pin_map_unlocked(result, I915_MAP_WB);
> > >   	if (IS_ERR(cs)) {
> > > diff --git a/drivers/gpu/drm/i915/i915_cache.c b/drivers/gpu/drm/i915/i915_cache.c
> > > index 06eb5933c719..f4ba1cb430d3 100644
> > > --- a/drivers/gpu/drm/i915/i915_cache.c
> > > +++ b/drivers/gpu/drm/i915/i915_cache.c
> > > @@ -6,13 +6,88 @@
> > >   #include "i915_cache.h"
> > >   #include "i915_drv.h"
> > > -void i915_cache_init(struct drm_i915_private *i915)
> > > +int i915_cache_init(struct drm_i915_private *i915)
> > >   {
> > > -	i915->pat_uc = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
> > > -	drm_info(&i915->drm, "Using PAT index %u for uncached access\n",
> > > -		 i915->pat_uc);
> > > +	int ret;
> > > -	i915->pat_wb = i915_gem_get_pat_index(i915, I915_CACHE_LLC);
> > > -	drm_info(&i915->drm, "Using PAT index %u for write-back access\n",
> > > -		 i915->pat_wb);
> > > +	ret = i915_cache_find_pat(i915, I915_CACHE_NONE);
> > > +	if (ret < 0) {
> > > +		drm_err(&i915->drm,
> > > +			"Failed to find PAT index for uncached access\n");
> > > +		return -ENODEV;
> > > +	}
> > > +	drm_info(&i915->drm, "Using PAT index %u for uncached access\n", ret);
> > > +	i915->pat_uc = ret;
> > > +
> > > +	ret = i915_cache_find_pat(i915, I915_CACHE_CACHED);
> > > +	if (ret < 0) {
> > > +		drm_err(&i915->drm,
> > > +			"Failed to find PAT index for write-back access\n");
> > > +		return -ENODEV;
> > > +	}
> > > +	drm_info(&i915->drm, "Using PAT index %u for write-back access\n", ret);
> > > +	i915->pat_wb = ret;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache)
> > > +{
> > > +	const struct intel_device_info *info = INTEL_INFO(i915);
> > > +	int i;
> > > +
> > > +	for (i = 0; i < ARRAY_SIZE(info->cache_modes); i++) {
> > > +		if (info->cache_modes[i] == cache)
> > > +			return i;
> > > +	}
> > > +
> > > +	return -1;
> > > +}
> > > +
> > > +void i915_cache_print(char *buf, size_t buflen, const char *suffix,
> > > +		      i915_cache_t cache)
> > > +{
> > > +	const enum i915_cache_mode mode = I915_CACHE_MODE(cache);
> > > +	static const char * const mode_str[] = {
> > > +		[I915_CACHE_MODE_UC] = "UC",
> > > +		[I915_CACHE_MODE_WB] = "WB",
> > > +		[I915_CACHE_MODE_WT] = "WT",
> > > +		[I915_CACHE_MODE_WC] = "WC",
> > > +	};
> > > +	static const char * const flag_str[] = {
> > > +		[ilog2(I915_CACHE_FLAG_COH1W)] = "1-Way-Coherent",
> > > +		[ilog2(I915_CACHE_FLAG_COH2W)] = "2-Way-Coherent",
> > > +		[ilog2(I915_CACHE_FLAG_L3)] =    "L3",
> > > +		[ilog2(I915_CACHE_FLAG_CLOS1)] = "CLOS1",
> > > +		[ilog2(I915_CACHE_FLAG_CLOS2)] = "CLOS2",
> > > +	};
> > > +
> > > +	if (mode >= ARRAY_SIZE(mode_str) || !mode_str[mode]) {
> > > +		snprintf(buf, buflen, "0x%x%s", cache, suffix ?: "");
> > > +	} else {
> > > +		unsigned long flags = I915_CACHE_FLAGS(cache);
> > > +		unsigned long bit;
> > > +		int ret;
> > > +
> > > +		ret = scnprintf(buf, buflen, "%s", mode_str[mode]);
> > > +		buf += ret;
> > > +		buflen -= ret;
> > > +
> > > +		/*
> > > +		 * Don't print "1-way-2-way", it would be confusing and 2-way
> > > +		 * implies 1-way anyway.
> > > +		 */
> > > +		if ((flags & (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W)) ==
> > > +		    (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W))
> > > +			flags &= ~I915_CACHE_FLAG_COH1W;
> > > +
> > > +		for_each_set_bit(bit, &flags, BITS_PER_TYPE(i915_cache_t)) {
> > > +			ret = scnprintf(buf, buflen, "-%s", flag_str[bit]);
> > > +			buf += ret;
> > > +			buflen -= ret;
> > > +		}
> > > +
> > > +		if (suffix)
> > > +			snprintf(buf, buflen, "%s", suffix);
> > > +	}
> > >   }
> > > diff --git a/drivers/gpu/drm/i915/i915_cache.h b/drivers/gpu/drm/i915/i915_cache.h
> > > index cb68936fb8a2..d9e97318b942 100644
> > > --- a/drivers/gpu/drm/i915/i915_cache.h
> > > +++ b/drivers/gpu/drm/i915/i915_cache.h
> > > @@ -6,8 +6,76 @@
> > >   #ifndef __I915_CACHE_H__
> > >   #define __I915_CACHE_H__
> > > +#include <linux/types.h>
> > > +
> > > +struct drm_printer;
> > > +
> > >   struct drm_i915_private;
> > > -void i915_cache_init(struct drm_i915_private *i915);
> > > +typedef u16 i915_cache_t;
> > > +
> > > +/* Cache modes */
> > > +enum i915_cache_mode {
> > > +	I915_CACHE_MODE_UC = 0,
> > > +	I915_CACHE_MODE_WB,
> > > +	__I915_CACHE_MODE_WB_L3, /* Special do-not-use entry for legacy 1:1 mapping. */
> > > +	I915_CACHE_MODE_WT,
> > > +	I915_CACHE_MODE_WC,
> > > +	I915_NUM_CACHE_MODES
> > > +};
> > > +
> > > +/* Cache mode flag bits */
> > > +#define I915_CACHE_FLAG_COH1W	(0x1)
> > > +#define I915_CACHE_FLAG_COH2W	(0x2) /* 1-way needs to be set too. */
> > > +#define I915_CACHE_FLAG_L3	(0x4)
> > > +#define I915_CACHE_FLAG_CLOS1	(0x8)
> > > +#define I915_CACHE_FLAG_CLOS2	(0x10)
> > > +
> > > +/*
> > > + * Overloaded I915_CACHE() macro based on:
> > > + *  https://stackoverflow.com/questions/3046889/optional-parameters-with-c-macros
> > > + *
> > > + * It is possible to call I915_CACHE with mode and zero or more flags as
> > > + * separate arguments. Ie these all work:
> > > + *
> > > + *   I915_CACHE(WB)
> > > + *   I915_CACHE(WB, COH1W, COH2W)
> > > + *   I915_CACHE(WB, COH1W, COH2W, L3)
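> > > + *
> > > + * The argument-counting machinery below picks I915_CACHE_1..4 based on
> > > + * the number of arguments, so for example I915_CACHE(WB, COH1W)
> > > + * evaluates to I915_CACHE_MODE_WB | (I915_CACHE_FLAG_COH1W << 8).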
> > > + */
> > > +
> > > +#define __I915_CACHE_FLAG(f) (I915_CACHE_FLAG_##f << 8)
> > > +#define __I915_CACHE(m, f) ((i915_cache_t)(I915_CACHE_MODE_##m | (f)))
> > > +
> > > +#define I915_CACHE_4(m, f1, f2, f3)	__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2) | __I915_CACHE_FLAG(f3))
> > > +#define I915_CACHE_3(m, f1, f2)		__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2))
> > > +#define I915_CACHE_2(m, f1)		__I915_CACHE(m, __I915_CACHE_FLAG(f1))
> > > +#define I915_CACHE_1(m)			__I915_CACHE(m, 0)
> > > +#define I915_CACHE_0(m)			__I915_CACHE(WC, 0)
> > > +
> > > +#define FUNC_CHOOSER(_f1, _f2, _f3, _f4, _f5, ...) _f5
> > > +#define FUNC_RECOMPOSER(argsWithParentheses) FUNC_CHOOSER argsWithParentheses
> > > +#define CHOOSE_FROM_ARG_COUNT(...) FUNC_RECOMPOSER((__VA_ARGS__, I915_CACHE_4, I915_CACHE_3, I915_CACHE_2, I915_CACHE_1, ))
> > > +#define NO_ARG_EXPANDER() ,,,I915_CACHE_0
> > > +#define MACRO_CHOOSER(...) CHOOSE_FROM_ARG_COUNT(NO_ARG_EXPANDER __VA_ARGS__ ())
> > > +
> > > +#define I915_CACHE(...) MACRO_CHOOSER(__VA_ARGS__)(__VA_ARGS__)
> > > +
> > > +/* i915_cache_t mode and flags extraction helpers. */
> > > +#define I915_CACHE_MODE(cache) \
> > > +	((enum i915_cache_mode)(((i915_cache_t)(cache)) & 0xff))
> > > +#define I915_CACHE_FLAGS(cache) \
> > > +	((unsigned int)((((i915_cache_t)(cache) & 0xff00)) >> 8))
> > > +
> > > +/* Helpers for i915 caching modes. */
> > > +#define I915_CACHE_NONE		I915_CACHE(UC)
> > > +#define I915_CACHE_CACHED	I915_CACHE(WB, COH1W, COH2W)
> > > +#define I915_CACHE_WT		I915_CACHE(WT)
> > > +
> > > +int i915_cache_init(struct drm_i915_private *i915);
> > > +int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache);
> > > +void i915_cache_print(char *buf, size_t buflen, const char *suffix,
> > > +		      i915_cache_t cache);
> > > +
> > > +#define I915_CACHE_NAME_LEN (40)
> > >   #endif /* __I915_CACHE_H__ */
> > > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > > index 4de44cf1026d..4ec292011546 100644
> > > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > > @@ -140,57 +140,18 @@ static const char *stringify_vma_type(const struct i915_vma *vma)
> > >   	return "ppgtt";
> > >   }
> > > -static const char *i915_cache_level_str(struct drm_i915_gem_object *obj)
> > > -{
> > > -	struct drm_i915_private *i915 = obj_to_i915(obj);
> > > -
> > > -	if (IS_METEORLAKE(i915)) {
> > > -		switch (obj->pat_index) {
> > > -		case 0: return " WB";
> > > -		case 1: return " WT";
> > > -		case 2: return " UC";
> > > -		case 3: return " WB (1-Way Coh)";
> > > -		case 4: return " WB (2-Way Coh)";
> > > -		default: return " not defined";
> > > -		}
> > > -	} else if (IS_PONTEVECCHIO(i915)) {
> > > -		switch (obj->pat_index) {
> > > -		case 0: return " UC";
> > > -		case 1: return " WC";
> > > -		case 2: return " WT";
> > > -		case 3: return " WB";
> > > -		case 4: return " WT (CLOS1)";
> > > -		case 5: return " WB (CLOS1)";
> > > -		case 6: return " WT (CLOS2)";
> > > -		case 7: return " WT (CLOS2)";
> > > -		default: return " not defined";
> > > -		}
> > > -	} else if (GRAPHICS_VER(i915) >= 12) {
> > > -		switch (obj->pat_index) {
> > > -		case 0: return " WB";
> > > -		case 1: return " WC";
> > > -		case 2: return " WT";
> > > -		case 3: return " UC";
> > > -		default: return " not defined";
> > > -		}
> > > -	} else {
> > > -		switch (obj->pat_index) {
> > > -		case 0: return " UC";
> > > -		case 1: return HAS_LLC(i915) ?
> > > -			       " LLC" : " snooped";
> > > -		case 2: return " L3+LLC";
> > > -		case 3: return " WT";
> > > -		default: return " not defined";
> > > -		}
> > > -	}
> > > -}
> > > -
> > >   void
> > >   i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
> > >   {
> > > +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > > +	char buf[I915_CACHE_NAME_LEN];
> > >   	struct i915_vma *vma;
> > >   	int pin_count = 0;
> > > +	i915_cache_print(buf, sizeof(buf),
> > > +			 obj->pat_set_by_user ? "!" : NULL,
> > > +			 INTEL_INFO(i915)->cache_modes[obj->pat_index]);
> > > +
> > >   	seq_printf(m, "%pK: %c%c%c %8zdKiB %02x %02x %s%s%s",
> > >   		   &obj->base,
> > >   		   get_tiling_flag(obj),
> > > @@ -199,7 +160,7 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
> > >   		   obj->base.size / 1024,
> > >   		   obj->read_domains,
> > >   		   obj->write_domain,
> > > -		   i915_cache_level_str(obj),
> > > +		   buf,
> > >   		   obj->mm.dirty ? " dirty" : "",
> > >   		   obj->mm.madv == I915_MADV_DONTNEED ? " purgeable" : "");
> > >   	if (obj->base.name)
> > > diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> > > index bb2223cc3470..8663388a524f 100644
> > > --- a/drivers/gpu/drm/i915/i915_driver.c
> > > +++ b/drivers/gpu/drm/i915/i915_driver.c
> > > @@ -241,7 +241,9 @@ static int i915_driver_early_probe(struct drm_i915_private *dev_priv)
> > >   	i915_memcpy_init_early(dev_priv);
> > >   	intel_runtime_pm_init_early(&dev_priv->runtime_pm);
> > > -	i915_cache_init(dev_priv);
> > > +	ret = i915_cache_init(dev_priv);
> > > +	if (ret < 0)
> > > +		return ret;
> > >   	ret = i915_workqueues_init(dev_priv);
> > >   	if (ret < 0)
> > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > index 896aa48ed089..814705cfeb12 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > @@ -1144,19 +1144,6 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
> > >   	unsigned int i;
> > >   	int ret;
> > > -	/*
> > > -	 * In the proccess of replacing cache_level with pat_index a tricky
> > > -	 * dependency is created on the definition of the enum i915_cache_level.
> > > -	 * in case this enum is changed, PTE encode would be broken.
> > > -	 * Add a WARNING here. And remove when we completely quit using this
> > > -	 * enum
> > > -	 */
> > > -	BUILD_BUG_ON(I915_CACHE_NONE != 0 ||
> > > -		     I915_CACHE_LLC != 1 ||
> > > -		     I915_CACHE_L3_LLC != 2 ||
> > > -		     I915_CACHE_WT != 3 ||
> > > -		     I915_MAX_CACHE_LEVEL != 4);
> > > -
> > >   	/* We need to fallback to 4K pages if host doesn't support huge gtt. */
> > >   	if (intel_vgpu_active(dev_priv) && !intel_vgpu_has_huge_gtt(dev_priv))
> > >   		RUNTIME_INFO(dev_priv)->page_sizes = I915_GTT_PAGE_SIZE_4K;
> > > diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> > > index fcacdc21643c..565a60a1645d 100644
> > > --- a/drivers/gpu/drm/i915/i915_pci.c
> > > +++ b/drivers/gpu/drm/i915/i915_pci.c
> > > @@ -32,6 +32,7 @@
> > >   #include "gt/intel_sa_media.h"
> > >   #include "gem/i915_gem_object_types.h"
> > > +#include "i915_cache.h"
> > >   #include "i915_driver.h"
> > >   #include "i915_drv.h"
> > >   #include "i915_pci.h"
> > > @@ -43,36 +44,43 @@
> > >   	.__runtime.graphics.ip.ver = (x), \
> > >   	.__runtime.media.ip.ver = (x)
> > > -#define LEGACY_CACHELEVEL \
> > > -	.cachelevel_to_pat = { \
> > > -		[I915_CACHE_NONE]   = 0, \
> > > -		[I915_CACHE_LLC]    = 1, \
> > > -		[I915_CACHE_L3_LLC] = 2, \
> > > -		[I915_CACHE_WT]     = 3, \
> > > +#define LEGACY_CACHE_MODES \
> > > +	.cache_modes = { \
> > > +		[I915_CACHE_MODE_UC] 	  = I915_CACHE(UC), \
> > > +		[I915_CACHE_MODE_WB] 	  = I915_CACHE(WB, COH1W, COH2W), \
> > 
> > Reading bspec 2863 (bdw) indicates that the CPU being able to snoop the
> > GPU's L3 was a new feature in gen8.  So for HSW and earlier, any
> > coherency was only 1-way (GPU could be coherent with CPU's caches, but
> > not vice-versa).  Only starting with gen8 did we get 2-way coherency as
> > an option where the CPU would also be coherent with the GPU cache (and
> > with gen8 and beyond you could still select 1-way instead of 2-way
> > coherency with instruction-level granularity via MOCS).  There are also
> > some legacy platforms (e.g., EHL/JSL on bspec 13948) where the IA wasn't
> > coherent with GPU L3 so we were back to 1-way coherency.
> > 
> > So should we split LEGACY_CACHE_MODES into two tables with different
> > coherency settings attached to I915_CACHE_MODE_WB?
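> > 
> > Purely as an untested sketch of what I mean (macro names invented here,
> > coherency flags per the bspec reading above):
> > 
> > 	/* Up to HSW (and EHL/JSL): GPU snoops the CPU caches, not vice-versa. */
> > 	#define LEGACY_1W_CACHE_MODES \
> > 		.cache_modes = { \
> > 			[I915_CACHE_MODE_UC]	  = I915_CACHE(UC), \
> > 			[I915_CACHE_MODE_WB]	  = I915_CACHE(WB, COH1W), \
> > 			[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, L3), \
> > 			[I915_CACHE_MODE_WT]	  = I915_CACHE(WT), \
> > 		}
> > 
> > 	/* Gen8+: the CPU can also snoop the GPU L3, so WB is 2-way coherent. */
> > 	#define GEN8_CACHE_MODES \
> > 		.cache_modes = { \
> > 			[I915_CACHE_MODE_UC]	  = I915_CACHE(UC), \
> > 			[I915_CACHE_MODE_WB]	  = I915_CACHE(WB, COH1W, COH2W), \
> > 			[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, COH2W, L3), \
> > 			[I915_CACHE_MODE_WT]	  = I915_CACHE(WT), \
> > 		}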
> > 
> > > +		[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, COH2W, L3), \
> > > +		[I915_CACHE_MODE_WT] 	  = I915_CACHE(WT), \
> > >   	}
> > > -#define TGL_CACHELEVEL \
> > > -	.cachelevel_to_pat = { \
> > > -		[I915_CACHE_NONE]   = 3, \
> > > -		[I915_CACHE_LLC]    = 0, \
> > > -		[I915_CACHE_L3_LLC] = 0, \
> > > -		[I915_CACHE_WT]     = 2, \
> > > +#define GEN12_CACHE_MODES \
> > > +	.cache_modes = { \
> > > +		[0] = I915_CACHE(WB, COH1W, COH2W), \
> > > +		[1] = I915_CACHE(WC), \
> > > +		[2] = I915_CACHE(WT), \
> > > +		[3] = I915_CACHE(UC), \
> > >   	}
> > > -#define PVC_CACHELEVEL \
> > > -	.cachelevel_to_pat = { \
> > > -		[I915_CACHE_NONE]   = 0, \
> > > -		[I915_CACHE_LLC]    = 3, \
> > > -		[I915_CACHE_L3_LLC] = 3, \
> > > -		[I915_CACHE_WT]     = 2, \
> > > +/* FIXME: is WB 1-way or 2-way coherent for PAT indices 3, 5 and 7? */
> > > +
> > > +#define PVC_CACHE_MODES \
> > > +	.cache_modes = { \
> > > +		[0] = I915_CACHE(UC), \
> > > +		[1] = I915_CACHE(WC), \
> > > +		[2] = I915_CACHE(WT), \
> > > +		[3] = I915_CACHE(WB, COH1W), \
> > > +		[4] = I915_CACHE(WT, CLOS1), \
> > > +		[5] = I915_CACHE(WB, COH1W, CLOS1), \
> > > +		[6] = I915_CACHE(WT, CLOS2), \
> > > +		[7] = I915_CACHE(WB, COH1W, CLOS2), \
> > >   	}
> > > -#define MTL_CACHELEVEL \
> > > -	.cachelevel_to_pat = { \
> > > -		[I915_CACHE_NONE]   = 2, \
> > > -		[I915_CACHE_LLC]    = 3, \
> > > -		[I915_CACHE_L3_LLC] = 3, \
> > > -		[I915_CACHE_WT]     = 1, \
> > > +#define MTL_CACHE_MODES \
> > > +	.cache_modes = { \
> > > +		[0] = I915_CACHE(WB), \
> > > +		[1] = I915_CACHE(WT), \
> > > +		[2] = I915_CACHE(UC), \
> > > +		[3] = I915_CACHE(WB, COH1W), \
> > > +		[4] = I915_CACHE(WB, COH1W, COH2W), \
> > 
> > We may want a comment on this one since the "2W" part is sort of a lie.
> > Bspec 63884 has a programming note for MTL that says
> > 
> >          "...Except for system atomics, setting Coherency Mode to 10 or
> >          11 results in this same one-way coherent behavior..."
> > 
> > So if we ask for 2W, we actually only get 1W behavior except in a very
> > narrow set of cases.
> 
> Shall I just not mark it as 2-way then, because it sounds like for i915
> purposes it is not 2-way?!
> 
> Could we invent a new flag just to document that this is something weird?
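> 
> Something along these lines perhaps (flag name invented on the spot,
> strictly an illustration):
> 
> 	/*
> 	 * MTL: coherency with IA is 2-way only for system atomics,
> 	 * otherwise behaves as 1-way. See bspec 63884.
> 	 */
> 	#define I915_CACHE_FLAG_COH2W_ATOMICS	(0x20)
> 
> 	#define MTL_CACHE_MODES \
> 		.cache_modes = { \
> 			[0] = I915_CACHE(WB), \
> 			[1] = I915_CACHE(WT), \
> 			[2] = I915_CACHE(UC), \
> 			[3] = I915_CACHE(WB, COH1W), \
> 			[4] = I915_CACHE(WB, COH1W, COH2W_ATOMICS), \
> 		}
> 
> Plus a matching flag_str[] entry in i915_cache_print() so debugfs shows
> it as something distinct from plain 2-way.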
> 
> Regards,
> 
> Tvrtko

Yeah, it sounds like that might be best.


Matt

> 
> > 
> > 
> > Matt
> > 
> > >   	}
> > >   /* Keep in gen based order, and chronological order within a gen */
> > > @@ -97,7 +105,7 @@
> > >   	.max_pat_index = 3, \
> > >   	GEN_DEFAULT_PAGE_SIZES, \
> > >   	GEN_DEFAULT_REGIONS, \
> > > -	LEGACY_CACHELEVEL
> > > +	LEGACY_CACHE_MODES
> > >   #define I845_FEATURES \
> > >   	GEN(2), \
> > > @@ -112,7 +120,7 @@
> > >   	.max_pat_index = 3, \
> > >   	GEN_DEFAULT_PAGE_SIZES, \
> > >   	GEN_DEFAULT_REGIONS, \
> > > -	LEGACY_CACHELEVEL
> > > +	LEGACY_CACHE_MODES
> > >   static const struct intel_device_info i830_info = {
> > >   	I830_FEATURES,
> > > @@ -145,7 +153,7 @@ static const struct intel_device_info i865g_info = {
> > >   	.max_pat_index = 3, \
> > >   	GEN_DEFAULT_PAGE_SIZES, \
> > >   	GEN_DEFAULT_REGIONS, \
> > > -	LEGACY_CACHELEVEL
> > > +	LEGACY_CACHE_MODES
> > >   static const struct intel_device_info i915g_info = {
> > >   	GEN3_FEATURES,
> > > @@ -208,7 +216,7 @@ static const struct intel_device_info pnv_m_info = {
> > >   	.max_pat_index = 3, \
> > >   	GEN_DEFAULT_PAGE_SIZES, \
> > >   	GEN_DEFAULT_REGIONS, \
> > > -	LEGACY_CACHELEVEL
> > > +	LEGACY_CACHE_MODES
> > >   static const struct intel_device_info i965g_info = {
> > >   	GEN4_FEATURES,
> > > @@ -252,7 +260,7 @@ static const struct intel_device_info gm45_info = {
> > >   	.max_pat_index = 3, \
> > >   	GEN_DEFAULT_PAGE_SIZES, \
> > >   	GEN_DEFAULT_REGIONS, \
> > > -	LEGACY_CACHELEVEL
> > > +	LEGACY_CACHE_MODES
> > >   static const struct intel_device_info ilk_d_info = {
> > >   	GEN5_FEATURES,
> > > @@ -282,7 +290,7 @@ static const struct intel_device_info ilk_m_info = {
> > >   	.__runtime.ppgtt_size = 31, \
> > >   	GEN_DEFAULT_PAGE_SIZES, \
> > >   	GEN_DEFAULT_REGIONS, \
> > > -	LEGACY_CACHELEVEL
> > > +	LEGACY_CACHE_MODES
> > >   #define SNB_D_PLATFORM \
> > >   	GEN6_FEATURES, \
> > > @@ -330,7 +338,7 @@ static const struct intel_device_info snb_m_gt2_info = {
> > >   	.__runtime.ppgtt_size = 31, \
> > >   	GEN_DEFAULT_PAGE_SIZES, \
> > >   	GEN_DEFAULT_REGIONS, \
> > > -	LEGACY_CACHELEVEL
> > > +	LEGACY_CACHE_MODES
> > >   #define IVB_D_PLATFORM \
> > >   	GEN7_FEATURES, \
> > > @@ -387,7 +395,7 @@ static const struct intel_device_info vlv_info = {
> > >   	.platform_engine_mask = BIT(RCS0) | BIT(VCS0) | BIT(BCS0),
> > >   	GEN_DEFAULT_PAGE_SIZES,
> > >   	GEN_DEFAULT_REGIONS,
> > > -	LEGACY_CACHELEVEL,
> > > +	LEGACY_CACHE_MODES
> > >   };
> > >   #define G75_FEATURES  \
> > > @@ -473,7 +481,7 @@ static const struct intel_device_info chv_info = {
> > >   	.has_coherent_ggtt = false,
> > >   	GEN_DEFAULT_PAGE_SIZES,
> > >   	GEN_DEFAULT_REGIONS,
> > > -	LEGACY_CACHELEVEL,
> > > +	LEGACY_CACHE_MODES
> > >   };
> > >   #define GEN9_DEFAULT_PAGE_SIZES \
> > > @@ -536,7 +544,7 @@ static const struct intel_device_info skl_gt4_info = {
> > >   	.max_pat_index = 3, \
> > >   	GEN9_DEFAULT_PAGE_SIZES, \
> > >   	GEN_DEFAULT_REGIONS, \
> > > -	LEGACY_CACHELEVEL
> > > +	LEGACY_CACHE_MODES
> > >   static const struct intel_device_info bxt_info = {
> > >   	GEN9_LP_FEATURES,
> > > @@ -640,7 +648,7 @@ static const struct intel_device_info jsl_info = {
> > >   #define GEN12_FEATURES \
> > >   	GEN11_FEATURES, \
> > >   	GEN(12), \
> > > -	TGL_CACHELEVEL, \
> > > +	GEN12_CACHE_MODES, \
> > >   	.has_global_mocs = 1, \
> > >   	.has_pxp = 1, \
> > >   	.max_pat_index = 3
> > > @@ -708,7 +716,7 @@ static const struct intel_device_info adl_p_info = {
> > >   	.__runtime.graphics.ip.ver = 12, \
> > >   	.__runtime.graphics.ip.rel = 50, \
> > >   	XE_HP_PAGE_SIZES, \
> > > -	TGL_CACHELEVEL, \
> > > +	GEN12_CACHE_MODES, \
> > >   	.dma_mask_size = 46, \
> > >   	.has_3d_pipeline = 1, \
> > >   	.has_64bit_reloc = 1, \
> > > @@ -803,7 +811,7 @@ static const struct intel_device_info pvc_info = {
> > >   		BIT(VCS0) |
> > >   		BIT(CCS0) | BIT(CCS1) | BIT(CCS2) | BIT(CCS3),
> > >   	.require_force_probe = 1,
> > > -	PVC_CACHELEVEL,
> > > +	PVC_CACHE_MODES
> > >   };
> > >   static const struct intel_gt_definition xelpmp_extra_gt[] = {
> > > @@ -838,7 +846,7 @@ static const struct intel_device_info mtl_info = {
> > >   	.memory_regions = REGION_SMEM | REGION_STOLEN_LMEM,
> > >   	.platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(CCS0),
> > >   	.require_force_probe = 1,
> > > -	MTL_CACHELEVEL,
> > > +	MTL_CACHE_MODES
> > >   };
> > >   #undef PLATFORM
> > > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> > > index 04bc1f4a1115..973175a64534 100644
> > > --- a/drivers/gpu/drm/i915/i915_perf.c
> > > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > > @@ -1870,7 +1870,7 @@ static int alloc_oa_buffer(struct i915_perf_stream *stream)
> > >   		return PTR_ERR(bo);
> > >   	}
> > > -	i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC);
> > > +	i915_gem_object_set_cache_coherency(bo, I915_CACHE_CACHED);
> > >   	/* PreHSW required 512K alignment, HSW requires 16M */
> > >   	vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL);
> > > diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
> > > index dbfe6443457b..2ce13b7c48cb 100644
> > > --- a/drivers/gpu/drm/i915/intel_device_info.h
> > > +++ b/drivers/gpu/drm/i915/intel_device_info.h
> > > @@ -27,6 +27,8 @@
> > >   #include <uapi/drm/i915_drm.h>
> > > +#include "i915_cache.h"
> > > +
> > >   #include "intel_step.h"
> > >   #include "gt/intel_engine_types.h"
> > > @@ -243,8 +245,8 @@ struct intel_device_info {
> > >   	 */
> > >   	const struct intel_runtime_info __runtime;
> > > -	u32 cachelevel_to_pat[I915_MAX_CACHE_LEVEL];
> > > -	u32 max_pat_index;
> > > +	i915_cache_t cache_modes[8];
> > > +	unsigned int max_pat_index;
> > >   };
> > >   struct intel_driver_caps {
> > > diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> > > index f910ec9b6d2b..ba821e48baa5 100644
> > > --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> > > +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> > > @@ -267,7 +267,7 @@ static int igt_evict_for_cache_color(void *arg)
> > >   		err = PTR_ERR(obj);
> > >   		goto cleanup;
> > >   	}
> > > -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> > > +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
> > >   	quirk_add(obj, &objects);
> > >   	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
> > > @@ -283,7 +283,7 @@ static int igt_evict_for_cache_color(void *arg)
> > >   		err = PTR_ERR(obj);
> > >   		goto cleanup;
> > >   	}
> > > -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> > > +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
> > >   	quirk_add(obj, &objects);
> > >   	/* Neighbouring; same colour - should fit */
> > > diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
> > > index 3c5e0952f1b8..4cfc5000d6ff 100644
> > > --- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
> > > +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
> > > @@ -23,7 +23,7 @@ int igt_spinner_init(struct igt_spinner *spin, struct intel_gt *gt)
> > >   		err = PTR_ERR(spin->hws);
> > >   		goto err;
> > >   	}
> > > -	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_LLC);
> > > +	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_CACHED);
> > >   	spin->obj = i915_gem_object_create_internal(gt->i915, PAGE_SIZE);
> > >   	if (IS_ERR(spin->obj)) {
> > > diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > > index 1d1a457e2aee..8ae77bcf27fa 100644
> > > --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > > +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > > @@ -126,13 +126,13 @@ static const struct intel_device_info mock_info = {
> > >   	.memory_regions = REGION_SMEM,
> > >   	.platform_engine_mask = BIT(0),
> > > -	/* simply use legacy cache level for mock device */
> > > +	/* Simply use legacy cache modes for the mock device. */
> > >   	.max_pat_index = 3,
> > > -	.cachelevel_to_pat = {
> > > -		[I915_CACHE_NONE]   = 0,
> > > -		[I915_CACHE_LLC]    = 1,
> > > -		[I915_CACHE_L3_LLC] = 2,
> > > -		[I915_CACHE_WT]     = 3,
> > > +	.cache_modes = {
> > > +		[0] = I915_CACHE(UC),
> > > +		[1] = I915_CACHE(WB, COH1W),
> > > +		[2] = I915_CACHE(WB, COH1W, COH2W, L3),
> > > +		[3] = I915_CACHE(WT),
> > >   	},
> > >   };
> > > @@ -181,7 +181,7 @@ struct drm_i915_private *mock_gem_device(void)
> > >   	/* Set up device info and initial runtime info. */
> > >   	intel_device_info_driver_create(i915, pdev->device, &mock_info);
> > > -	i915_cache_init(i915);
> > > +	WARN_ON(i915_cache_init(i915));
> > >   	dev_pm_domain_set(&pdev->dev, &pm_domain);
> > >   	pm_runtime_enable(&pdev->dev);
> > > -- 
> > > 2.39.2
> > > 
> > 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Intel-gfx] [RFC 4/8] drm/i915: Refactor PAT/object cache handling
@ 2023-07-28 14:53         ` Matt Roper
  0 siblings, 0 replies; 59+ messages in thread
From: Matt Roper @ 2023-07-28 14:53 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx, dri-devel, Chris Wilson

On Fri, Jul 28, 2023 at 01:39:06PM +0100, Tvrtko Ursulin wrote:
> 
> Forgot one part of your reply:
> 
> On 28/07/2023 00:57, Matt Roper wrote:
> > On Thu, Jul 27, 2023 at 03:55:00PM +0100, Tvrtko Ursulin wrote:
> > > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > 
> > > Commit 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
> > > introduced PAT indices to i915 internal APIs, partially replacing the
> > > usage of the driver internal cache_level, but also added a few
> > > sub-optimal design decisions which this patch tries to improve upon.
> > > 
> > > Principal change here is to invert the per platform cache level to PAT
> > > index table which was added by the referenced commit, and by doing so
> > > enable i915 to understand the cache mode behind each PAT index, changing
> > > PAT indices from opaque to transparent.
> > > 
> > > Once we have the inverted table we are able to remove the hidden
> > > unconditional "return true" from i915_gem_object_has_cache_level and
> > > make the involved code paths clearer.
> > > 
> > > To achieve this we replace the enum i915_cache_level with i915_cache_t,
> > > which carries a more detailed representation of each cache mode (base
> > > mode plus flags).
> > > 
> > > In this way we are able to express the differences between the various
> > > write-back coherency settings on Meteorlake, which in turn enables us
> > > to map the i915 "cached" mode to the correct Meteorlake PAT index.
> > > 
> > > We can also replace the platform-dependent cache-mode-to-string code in
> > > debugfs and elsewhere with a single implementation based on
> > > i915_cache_t.
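> > > 
> > > To illustrate what the inverted table buys us (sketch only, using the
> > > helpers this patch adds; the pat < 0 "not found" handling is elided):
> > > 
> > >   /* Look up a PAT index for a wanted caching mode... */
> > >   int pat = i915_cache_find_pat(i915, I915_CACHE(WB, COH1W, COH2W));
> > > 
> > >   /* ...and, unlike before, decode a PAT index back into a mode: */
> > >   i915_cache_t cache = INTEL_INFO(i915)->cache_modes[pat];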
> > > 
> > > v2:
> > >   * Fix PAT-to-cache-mode table for PVC. (Fei)
> > >   * Cache display caching mode too. (Fei)
> > >   * Improve and document criteria in i915_gem_object_can_bypass_llc() (Matt)
> > > 
> > > v3:
> > >   * Checkpatch issues.
> > >   * Cache mode flags check fixed.
> > > 
> > > v4:
> > >   * Fix intel_device_info->cache_modes array size. (Matt)
> > >   * Boolean cache mode and flags query. (Matt)
> > >   * Reduce number of cache macros with some macro magic.
> > >   * One more checkpatch fix.
> > >   * Tweak tables to show legacy and Gen12 WB is fully coherent.
> > > 
> > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > References: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
> > > Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> > > Cc: Fei Yang <fei.yang@intel.com>
> > > Cc: Andi Shyti <andi.shyti@linux.intel.com>
> > > Cc: Matt Roper <matthew.d.roper@intel.com>
> > > ---
> > >   drivers/gpu/drm/i915/gem/i915_gem_domain.c    |  60 +++++----
> > >   drivers/gpu/drm/i915/gem/i915_gem_domain.h    |   5 +-
> > >   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   3 +-
> > >   drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
> > >   drivers/gpu/drm/i915/gem/i915_gem_mman.c      |   4 +-
> > >   drivers/gpu/drm/i915/gem/i915_gem_object.c    | 117 ++++++++++--------
> > >   drivers/gpu/drm/i915/gem/i915_gem_object.h    |  11 +-
> > >   .../gpu/drm/i915/gem/i915_gem_object_types.h  | 116 +----------------
> > >   drivers/gpu/drm/i915/gem/i915_gem_shmem.c     |   8 +-
> > >   drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |   2 +-
> > >   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  20 +--
> > >   drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   2 +-
> > >   .../drm/i915/gem/selftests/huge_gem_object.c  |   2 +-
> > >   .../gpu/drm/i915/gem/selftests/huge_pages.c   |   3 +-
> > >   drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  10 +-
> > >   drivers/gpu/drm/i915/gt/intel_engine_cs.c     |   2 +-
> > >   drivers/gpu/drm/i915/gt/intel_ggtt.c          |  25 ++--
> > >   drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c     |   4 +-
> > >   drivers/gpu/drm/i915/gt/intel_gtt.c           |   2 +-
> > >   drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +-
> > >   drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   6 +-
> > >   .../gpu/drm/i915/gt/intel_ring_submission.c   |   4 +-
> > >   drivers/gpu/drm/i915/gt/intel_timeline.c      |   2 +-
> > >   drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   2 +-
> > >   .../gpu/drm/i915/gt/selftest_workarounds.c    |   2 +-
> > >   drivers/gpu/drm/i915/i915_cache.c             |  89 +++++++++++--
> > >   drivers/gpu/drm/i915/i915_cache.h             |  70 ++++++++++-
> > >   drivers/gpu/drm/i915/i915_debugfs.c           |  53 ++------
> > >   drivers/gpu/drm/i915/i915_driver.c            |   4 +-
> > >   drivers/gpu/drm/i915/i915_gem.c               |  13 --
> > >   drivers/gpu/drm/i915/i915_pci.c               |  84 +++++++------
> > >   drivers/gpu/drm/i915/i915_perf.c              |   2 +-
> > >   drivers/gpu/drm/i915/intel_device_info.h      |   6 +-
> > >   .../gpu/drm/i915/selftests/i915_gem_evict.c   |   4 +-
> > >   drivers/gpu/drm/i915/selftests/igt_spinner.c  |   2 +-
> > >   .../gpu/drm/i915/selftests/mock_gem_device.c  |  14 +--
> > >   36 files changed, 391 insertions(+), 367 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> > > index 57db9c581bf6..c15f83de33af 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> > > @@ -8,6 +8,7 @@
> > >   #include "display/intel_frontbuffer.h"
> > >   #include "gt/intel_gt.h"
> > > +#include "i915_cache.h"
> > >   #include "i915_drv.h"
> > >   #include "i915_gem_clflush.h"
> > >   #include "i915_gem_domain.h"
> > > @@ -41,14 +42,17 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
> > >   		return false;
> > >   	/*
> > > -	 * For objects created by userspace through GEM_CREATE with pat_index
> > > -	 * set by set_pat extension, i915_gem_object_has_cache_level() will
> > > -	 * always return true, because the coherency of such object is managed
> > > -	 * by userspace. Othereise the call here would fall back to checking
> > > -	 * whether the object is un-cached or write-through.
> > > +	 * Always flush cache for UMD objects with PAT index set.
> > >   	 */
> > > -	return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
> > > -		 i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
> > > +	if (obj->pat_set_by_user)
> > > +		return true;
> > > +
> > > +	/*
> > > +	 * Fully coherent cached access may end up with data in the CPU cache
> > > +	 * which hasn't hit memory yet.
> > > +	 */
> > > +	return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
> > > +	       i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W);
> > >   }
> > >   bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
> > > @@ -268,7 +272,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
> > >   /**
> > >    * i915_gem_object_set_cache_level - Changes the cache-level of an object across all VMA.
> > >    * @obj: object to act on
> > > - * @cache_level: new cache level to set for the object
> > > + * @cache: new caching mode to set for the object
> > >    *
> > >    * After this function returns, the object will be in the new cache-level
> > >    * across all GTT and the contents of the backing storage will be coherent,
> > > @@ -281,18 +285,28 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
> > >    * that all direct access to the scanout remains coherent.
> > >    */
> > >   int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> > > -				    enum i915_cache_level cache_level)
> > > +				    i915_cache_t cache)
> > >   {
> > > -	int ret;
> > > +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > > +	int pat, ret;
> > > -	/*
> > > -	 * For objects created by userspace through GEM_CREATE with pat_index
> > > -	 * set by set_pat extension, simply return 0 here without touching
> > > -	 * the cache setting, because such objects should have an immutable
> > > -	 * cache setting by desgin and always managed by userspace.
> > > -	 */
> > > -	if (i915_gem_object_has_cache_level(obj, cache_level))
> > > +	pat = i915_cache_find_pat(i915, cache);
> > > +	if (pat < 0) {
> > > +		char buf[I915_CACHE_NAME_LEN];
> > > +
> > > +		i915_cache_print(buf, sizeof(buf), NULL, cache);
> > > +		drm_err_ratelimited(&i915->drm,
> > > +				    "Attempting to use unknown caching mode %s!\n",
> > > +				    buf);
> > > +
> > > +		return -EINVAL;
> > > +	} else if (pat == obj->pat_index) {
> > >   		return 0;
> > > +	} else if (obj->pat_set_by_user) {
> > > +		drm_notice_once(&i915->drm,
> > > +				"Attempting to change caching mode on an object with fixed PAT!\n");
> > > +		return -EINVAL;
> > > +	}
> > >   	ret = i915_gem_object_wait(obj,
> > >   				   I915_WAIT_INTERRUPTIBLE |
> > > @@ -302,7 +316,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> > >   		return ret;
> > >   	/* Always invalidate stale cachelines */
> > > -	i915_gem_object_set_cache_coherency(obj, cache_level);
> > > +	i915_gem_object_set_pat_index(obj, pat);
> > >   	obj->cache_dirty = true;
> > >   	/* The cache-level will be applied when each vma is rebound. */
> > > @@ -337,10 +351,10 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
> > >   		goto out;
> > >   	}
> > > -	if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||
> > > -	    i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
> > > +	if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
> > > +	    i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W))
> > >   		args->caching = I915_CACHING_CACHED;
> > > -	else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
> > > +	else if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT))
> > >   		args->caching = I915_CACHING_DISPLAY;
> > >   	else
> > >   		args->caching = I915_CACHING_NONE;
> > > @@ -355,7 +369,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
> > >   	struct drm_i915_private *i915 = to_i915(dev);
> > >   	struct drm_i915_gem_caching *args = data;
> > >   	struct drm_i915_gem_object *obj;
> > > -	enum i915_cache_level level;
> > > +	i915_cache_t level;
> > >   	int ret = 0;
> > >   	if (IS_DGFX(i915))
> > > @@ -378,7 +392,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
> > >   		if (!HAS_LLC(i915) && !HAS_SNOOP(i915))
> > >   			return -ENODEV;
> > > -		level = I915_CACHE_LLC;
> > > +		level = I915_CACHE_CACHED;
> > >   		break;
> > >   	case I915_CACHING_DISPLAY:
> > >   		level = HAS_WT(i915) ? I915_CACHE_WT : I915_CACHE_NONE;
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.h b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
> > > index 9622df962bfc..6da5c351f6fd 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.h
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.h
> > > @@ -6,10 +6,11 @@
> > >   #ifndef __I915_GEM_DOMAIN_H__
> > >   #define __I915_GEM_DOMAIN_H__
> > > +#include "i915_cache.h"
> > > +
> > >   struct drm_i915_gem_object;
> > > -enum i915_cache_level;
> > >   int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> > > -				    enum i915_cache_level cache_level);
> > > +				    i915_cache_t cache);
> > >   #endif /* __I915_GEM_DOMAIN_H__ */
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > index 0a1d40220020..9d6e49c8a4c6 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > @@ -648,7 +648,8 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
> > >   	 */
> > >   	return (cache->has_llc ||
> > >   		obj->cache_dirty ||
> > > -		!i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
> > > +		!(obj->pat_set_by_user ||
> > > +		  i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)));
> > >   }
> > >   static int eb_reserve_vma(struct i915_execbuffer *eb,
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> > > index 6bc26b4b06b8..88c360c3d6a3 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> > > @@ -170,7 +170,7 @@ __i915_gem_object_create_internal(struct drm_i915_private *i915,
> > >   	obj->read_domains = I915_GEM_DOMAIN_CPU;
> > >   	obj->write_domain = I915_GEM_DOMAIN_CPU;
> > > -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
> > > +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
> > >   	i915_gem_object_set_cache_coherency(obj, cache_level);
> > >   	return obj;
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > index aa4d842d4c5a..cd7f8ded0d6f 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > @@ -382,7 +382,6 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
> > >   		goto err_reset;
> > >   	}
> > > -	/* Access to snoopable pages through the GTT is incoherent. */
> > >   	/*
> > >   	 * For objects created by userspace through GEM_CREATE with pat_index
> > >   	 * set by set_pat extension, coherency is managed by userspace, make
> > > @@ -391,7 +390,8 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
> > >   	 * objects. Otherwise this helper function would fall back to checking
> > >   	 * whether the object is un-cached.
> > >   	 */
> > > -	if (!(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
> > > +	if (!((obj->pat_set_by_user ||
> > > +	       i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)) ||
> > >   	      HAS_LLC(i915))) {
> > >   		ret = -EFAULT;
> > >   		goto err_unpin;
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > > index 3dc4fbb67d2b..ec1f0be43d0d 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > > @@ -45,33 +45,6 @@ static struct kmem_cache *slab_objects;
> > >   static const struct drm_gem_object_funcs i915_gem_object_funcs;
> > > -unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
> > > -				    enum i915_cache_level level)
> > > -{
> > > -	if (drm_WARN_ON(&i915->drm, level >= I915_MAX_CACHE_LEVEL))
> > > -		return 0;
> > > -
> > > -	return INTEL_INFO(i915)->cachelevel_to_pat[level];
> > > -}
> > > -
> > > -bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
> > > -				     enum i915_cache_level lvl)
> > > -{
> > > -	/*
> > > -	 * In case the pat_index is set by user space, this kernel mode
> > > -	 * driver should leave the coherency to be managed by user space,
> > > -	 * simply return true here.
> > > -	 */
> > > -	if (obj->pat_set_by_user)
> > > -		return true;
> > > -
> > > -	/*
> > > -	 * Otherwise the pat_index should have been converted from cache_level
> > > -	 * so that the following comparison is valid.
> > > -	 */
> > > -	return obj->pat_index == i915_gem_get_pat_index(obj_to_i915(obj), lvl);
> > > -}
> > > -
> > >   struct drm_i915_gem_object *i915_gem_object_alloc(void)
> > >   {
> > >   	struct drm_i915_gem_object *obj;
> > > @@ -144,30 +117,72 @@ void __i915_gem_object_fini(struct drm_i915_gem_object *obj)
> > >   	dma_resv_fini(&obj->base._resv);
> > >   }
> > > +bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
> > > +				    enum i915_cache_mode mode)
> > > +{
> > > +	struct drm_i915_private *i915 = obj_to_i915(obj);
> > > +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
> > > +
> > > +	return I915_CACHE_MODE(cache) == mode;
> > > +}
> > > +
> > > +bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
> > > +				    unsigned int flag)
> > > +{
> > > +	struct drm_i915_private *i915 = obj_to_i915(obj);
> > > +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
> > > +
> > > +	return I915_CACHE_FLAGS(cache) & flag;
> > > +}
> > > +
> > > +static void __i915_gem_object_update_coherency(struct drm_i915_gem_object *obj)
> > > +{
> > > +	struct drm_i915_private *i915 = obj_to_i915(obj);
> > > +	i915_cache_t cache = INTEL_INFO(i915)->cache_modes[obj->pat_index];
> > > +	const unsigned int flags = I915_CACHE_FLAGS(cache);
> > > +	const unsigned int mode = I915_CACHE_MODE(cache);
> > > +
> > > +	if (mode == I915_CACHE_MODE_WC ||
> > > +	    mode == I915_CACHE_MODE_WT ||
> > > +	    (mode == I915_CACHE_MODE_WB && (flags & I915_CACHE_FLAG_COH2W)))
> > > +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ |
> > > +				      I915_BO_CACHE_COHERENT_FOR_WRITE;
> > > +	else if (HAS_LLC(i915))
> > > +		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
> > > +	else
> > > +		obj->cache_coherent = 0;
> > > +
> > > +	obj->cache_dirty =
> > > +		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
> > > +		!IS_DGFX(i915);
> > > +}
> > > +
> > >   /**
> > >    * i915_gem_object_set_cache_coherency - Mark up the object's coherency levels
> > > - * for a given cache_level
> > > + * for a given caching mode
> > >    * @obj: #drm_i915_gem_object
> > > - * @cache_level: cache level
> > > + * @cache: cache mode
> > >    */
> > >   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
> > > -					 unsigned int cache_level)
> > > +					 i915_cache_t cache)
> > >   {
> > > -	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > > +	struct drm_i915_private *i915 = obj_to_i915(obj);
> > > +	int found;
> > > -	obj->pat_index = i915_gem_get_pat_index(i915, cache_level);
> > > +	found = i915_cache_find_pat(i915, cache);
> > > +	if (found < 0) {
> > > +		char buf[I915_CACHE_NAME_LEN];
> > > -	if (cache_level != I915_CACHE_NONE)
> > > -		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
> > > -				       I915_BO_CACHE_COHERENT_FOR_WRITE);
> > > -	else if (HAS_LLC(i915))
> > > -		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
> > > -	else
> > > -		obj->cache_coherent = 0;
> > > +		i915_cache_print(buf, sizeof(buf), NULL, cache);
> > > +		drm_err_ratelimited(&i915->drm, "Unknown cache mode %s!\n",
> > > +				    buf);
> > > -	obj->cache_dirty =
> > > -		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
> > > -		!IS_DGFX(i915);
> > > +		found = i915->pat_uc;
> > > +	}
> > > +
> > > +	obj->pat_index = found;
> > > +
> > > +	__i915_gem_object_update_coherency(obj);
> > >   }
> > >   /**
> > > @@ -181,24 +196,18 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
> > >   void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
> > >   				   unsigned int pat_index)
> > >   {
> > > -	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > > +	struct drm_i915_private *i915 = obj_to_i915(obj);
> > >   	if (obj->pat_index == pat_index)
> > >   		return;
> > > +	if (drm_WARN_ON_ONCE(&i915->drm,
> > > +			     pat_index > INTEL_INFO(i915)->max_pat_index))
> > > +		return;
> > > +
> > >   	obj->pat_index = pat_index;
> > > -	if (pat_index != i915_gem_get_pat_index(i915, I915_CACHE_NONE))
> > > -		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
> > > -				       I915_BO_CACHE_COHERENT_FOR_WRITE);
> > > -	else if (HAS_LLC(i915))
> > > -		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
> > > -	else
> > > -		obj->cache_coherent = 0;
> > > -
> > > -	obj->cache_dirty =
> > > -		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE) &&
> > > -		!IS_DGFX(i915);
> > > +	__i915_gem_object_update_coherency(obj);
> > >   }
> > >   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > > index 884a17275b3a..a5d4ee19d9be 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > > @@ -13,6 +13,7 @@
> > >   #include "display/intel_frontbuffer.h"
> > >   #include "intel_memory_region.h"
> > > +#include "i915_cache.h"
> > >   #include "i915_gem_object_types.h"
> > >   #include "i915_gem_gtt.h"
> > >   #include "i915_gem_ww.h"
> > > @@ -32,10 +33,6 @@ static inline bool i915_gem_object_size_2big(u64 size)
> > >   	return false;
> > >   }
> > > -unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
> > > -				    enum i915_cache_level level);
> > > -bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
> > > -				     enum i915_cache_level lvl);
> > >   void i915_gem_init__objects(struct drm_i915_private *i915);
> > >   void i915_objects_module_exit(void);
> > > @@ -764,8 +761,12 @@ int i915_gem_object_wait_moving_fence(struct drm_i915_gem_object *obj,
> > >   				      bool intr);
> > >   bool i915_gem_object_has_unknown_state(struct drm_i915_gem_object *obj);
> > > +bool i915_gem_object_has_cache_mode(const struct drm_i915_gem_object *obj,
> > > +				    enum i915_cache_mode mode);
> > > +bool i915_gem_object_has_cache_flag(const struct drm_i915_gem_object *obj,
> > > +				    unsigned int flag);
> > >   void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
> > > -					 unsigned int cache_level);
> > > +					 i915_cache_t cache);
> > >   void i915_gem_object_set_pat_index(struct drm_i915_gem_object *obj,
> > >   				   unsigned int pat_index);
> > >   bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj);
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > > index 8de2b91b3edf..6790e13ad262 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > > @@ -14,6 +14,7 @@
> > >   #include <uapi/drm/i915_drm.h>
> > >   #include "i915_active.h"
> > > +#include "i915_cache.h"
> > >   #include "i915_selftest.h"
> > >   #include "i915_vma_resource.h"
> > > @@ -116,93 +117,6 @@ struct drm_i915_gem_object_ops {
> > >   	const char *name; /* friendly name for debug, e.g. lockdep classes */
> > >   };
> > > -/**
> > > - * enum i915_cache_level - The supported GTT caching values for system memory
> > > - * pages.
> > > - *
> > > - * These translate to some special GTT PTE bits when binding pages into some
> > > - * address space. It also determines whether an object, or rather its pages are
> > > - * coherent with the GPU, when also reading or writing through the CPU cache
> > > - * with those pages.
> > > - *
> > > - * Userspace can also control this through struct drm_i915_gem_caching.
> > > - */
> > > -enum i915_cache_level {
> > > -	/**
> > > -	 * @I915_CACHE_NONE:
> > > -	 *
> > > -	 * GPU access is not coherent with the CPU cache. If the cache is dirty
> > > -	 * and we need the underlying pages to be coherent with some later GPU
> > > -	 * access then we need to manually flush the pages.
> > > -	 *
> > > -	 * On shared LLC platforms reads and writes through the CPU cache are
> > > -	 * still coherent even with this setting. See also
> > > -	 * &drm_i915_gem_object.cache_coherent for more details. Due to this we
> > > -	 * should only ever use uncached for scanout surfaces, otherwise we end
> > > -	 * up over-flushing in some places.
> > > -	 *
> > > -	 * This is the default on non-LLC platforms.
> > > -	 */
> > > -	I915_CACHE_NONE = 0,
> > > -	/**
> > > -	 * @I915_CACHE_LLC:
> > > -	 *
> > > -	 * GPU access is coherent with the CPU cache. If the cache is dirty,
> > > -	 * then the GPU will ensure that access remains coherent, when both
> > > -	 * reading and writing through the CPU cache. GPU writes can dirty the
> > > -	 * CPU cache.
> > > -	 *
> > > -	 * Not used for scanout surfaces.
> > > -	 *
> > > -	 * Applies to both platforms with shared LLC(HAS_LLC), and snooping
> > > -	 * based platforms(HAS_SNOOP).
> > > -	 *
> > > -	 * This is the default on shared LLC platforms.  The only exception is
> > > -	 * scanout objects, where the display engine is not coherent with the
> > > -	 * CPU cache. For such objects I915_CACHE_NONE or I915_CACHE_WT is
> > > -	 * automatically applied by the kernel in pin_for_display, if userspace
> > > -	 * has not done so already.
> > > -	 */
> > > -	I915_CACHE_LLC,
> > > -	/**
> > > -	 * @I915_CACHE_L3_LLC:
> > > -	 *
> > > -	 * Explicitly enable the Gfx L3 cache, with coherent LLC.
> > > -	 *
> > > -	 * The Gfx L3 sits between the domain specific caches, e.g
> > > -	 * sampler/render caches, and the larger LLC. LLC is coherent with the
> > > -	 * GPU, but L3 is only visible to the GPU, so likely needs to be flushed
> > > -	 * when the workload completes.
> > > -	 *
> > > -	 * Not used for scanout surfaces.
> > > -	 *
> > > -	 * Only exposed on some gen7 + GGTT. More recent hardware has dropped
> > > -	 * this explicit setting, where it should now be enabled by default.
> > > -	 */
> > > -	I915_CACHE_L3_LLC,
> > > -	/**
> > > -	 * @I915_CACHE_WT:
> > > -	 *
> > > -	 * Write-through. Used for scanout surfaces.
> > > -	 *
> > > -	 * The GPU can utilise the caches, while still having the display engine
> > > -	 * be coherent with GPU writes, as a result we don't need to flush the
> > > -	 * CPU caches when moving out of the render domain. This is the default
> > > -	 * setting chosen by the kernel, if supported by the HW, otherwise we
> > > -	 * fallback to I915_CACHE_NONE. On the CPU side writes through the CPU
> > > -	 * cache still need to be flushed, to remain coherent with the display
> > > -	 * engine.
> > > -	 */
> > > -	I915_CACHE_WT,
> > > -	/**
> > > -	 * @I915_MAX_CACHE_LEVEL:
> > > -	 *
> > > -	 * Mark the last entry in the enum. Used for defining cachelevel_to_pat
> > > -	 * array for cache_level to pat translation table.
> > > -	 */
> > > -	I915_MAX_CACHE_LEVEL,
> > > -};
> > > -
> > >   enum i915_map_type {
> > >   	I915_MAP_WB = 0,
> > >   	I915_MAP_WC,
> > > @@ -403,16 +317,6 @@ struct drm_i915_gem_object {
> > >   	/**
> > >   	 * @cache_coherent:
> > >   	 *
> > > -	 * Note: with the change above which replaced @cache_level with pat_index,
> > > -	 * the use of @cache_coherent is limited to the objects created by kernel
> > > -	 * or by userspace without pat index specified.
> > > -	 * Check for @pat_set_by_user to find out if an object has pat index set
> > > -	 * by userspace. The ioctl's to change cache settings have also been
> > > -	 * disabled for the objects with pat index set by userspace. Please don't
> > > -	 * assume @cache_coherent having the flags set as describe here. A helper
> > > -	 * function i915_gem_object_has_cache_level() provides one way to bypass
> > > -	 * the use of this field.
> > > -	 *
> > >   	 * Track whether the pages are coherent with the GPU if reading or
> > >   	 * writing through the CPU caches. The largely depends on the
> > >   	 * @cache_level setting.
> > > @@ -447,7 +351,7 @@ struct drm_i915_gem_object {
> > >   	 * flushing the surface just before doing the scanout.  This does mean
> > >   	 * we might unnecessarily flush non-scanout objects in some places, but
> > >   	 * the default assumption is that all normal objects should be using
> > > -	 * I915_CACHE_LLC, at least on platforms with the shared LLC.
> > > +	 * I915_CACHE_CACHED, at least on platforms with the shared LLC.
> > >   	 *
> > >   	 * Supported values:
> > >   	 *
> > > @@ -486,16 +390,6 @@ struct drm_i915_gem_object {
> > >   	/**
> > >   	 * @cache_dirty:
> > >   	 *
> > > -	 * Note: with the change above which replaced cache_level with pat_index,
> > > -	 * the use of @cache_dirty is limited to the objects created by kernel
> > > -	 * or by userspace without pat index specified.
> > > -	 * Check for @pat_set_by_user to find out if an object has pat index set
> > > -	 * by userspace. The ioctl's to change cache settings have also been
> > > -	 * disabled for the objects with pat_index set by userspace. Please don't
> > > -	 * assume @cache_dirty is set as describe here. Also see helper function
> > > -	 * i915_gem_object_has_cache_level() for possible ways to bypass the use
> > > -	 * of this field.
> > > -	 *
> > >   	 * Track if we are we dirty with writes through the CPU cache for this
> > >   	 * object. As a result reading directly from main memory might yield
> > >   	 * stale data.
> > > @@ -531,9 +425,9 @@ struct drm_i915_gem_object {
> > >   	 *
> > >   	 *   1. All userspace objects, by default, have @cache_level set as
> > >   	 *   I915_CACHE_NONE. The only exception is userptr objects, where we
> > > -	 *   instead force I915_CACHE_LLC, but we also don't allow userspace to
> > > -	 *   ever change the @cache_level for such objects. Another special case
> > > -	 *   is dma-buf, which doesn't rely on @cache_dirty,  but there we
> > > +	 *   instead force I915_CACHE_CACHED, but we also don't allow userspace
> > > +	 *   to ever change the @cache_level for such objects. Another special
> > > +	 *   case is dma-buf, which doesn't rely on @cache_dirty,  but there we
> > >   	 *   always do a forced flush when acquiring the pages, if there is a
> > >   	 *   chance that the pages can be read directly from main memory with
> > >   	 *   the GPU.
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> > > index 8f1633c3fb93..aba908f0349f 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> > > @@ -584,7 +584,7 @@ static int shmem_object_init(struct intel_memory_region *mem,
> > >   	static struct lock_class_key lock_class;
> > >   	struct drm_i915_private *i915 = mem->i915;
> > >   	struct address_space *mapping;
> > > -	unsigned int cache_level;
> > > +	i915_cache_t cache;
> > >   	gfp_t mask;
> > >   	int ret;
> > > @@ -628,11 +628,11 @@ static int shmem_object_init(struct intel_memory_region *mem,
> > >   		 * However, we maintain the display planes as UC, and so
> > >   		 * need to rebind when first used as such.
> > >   		 */
> > > -		cache_level = I915_CACHE_LLC;
> > > +		cache = I915_CACHE_CACHED;
> > >   	else
> > > -		cache_level = I915_CACHE_NONE;
> > > +		cache = I915_CACHE_NONE;
> > > -	i915_gem_object_set_cache_coherency(obj, cache_level);
> > > +	i915_gem_object_set_cache_coherency(obj, cache);
> > >   	i915_gem_object_init_memory_region(obj, mem);
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> > > index 1c8eb806b7d3..cc907a1f1c53 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> > > @@ -691,7 +691,7 @@ static int __i915_gem_object_create_stolen(struct intel_memory_region *mem,
> > >   	obj->stolen = stolen;
> > >   	obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
> > > -	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
> > > +	cache_level = HAS_LLC(mem->i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
> > >   	i915_gem_object_set_cache_coherency(obj, cache_level);
> > >   	if (WARN_ON(!i915_gem_object_trylock(obj, NULL)))
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> > > index 6bd6c239f4ac..107176d1757b 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> > > @@ -48,14 +48,14 @@ void i915_ttm_migrate_set_ban_memcpy(bool ban)
> > >   }
> > >   #endif
> > > -static enum i915_cache_level
> > > -i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
> > > -		     struct ttm_tt *ttm)
> > > +static i915_cache_t
> > > +i915_ttm_cache(struct drm_i915_private *i915, struct ttm_resource *res,
> > > +	       struct ttm_tt *ttm)
> > >   {
> > >   	return ((HAS_LLC(i915) || HAS_SNOOP(i915)) &&
> > >   		!i915_ttm_gtt_binds_lmem(res) &&
> > > -		ttm->caching == ttm_cached) ? I915_CACHE_LLC :
> > > -		I915_CACHE_NONE;
> > > +		ttm->caching == ttm_cached) ? I915_CACHE_CACHED :
> > > +					      I915_CACHE_NONE;
> > >   }
> > >   static unsigned int
> > > @@ -112,8 +112,8 @@ void i915_ttm_adjust_domains_after_move(struct drm_i915_gem_object *obj)
> > >   void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
> > >   {
> > >   	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
> > > -	unsigned int cache_level;
> > >   	unsigned int mem_flags;
> > > +	i915_cache_t cache;
> > >   	unsigned int i;
> > >   	int mem_type;
> > > @@ -126,13 +126,13 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
> > >   	if (!bo->resource) {
> > >   		mem_flags = I915_BO_FLAG_STRUCT_PAGE;
> > >   		mem_type = I915_PL_SYSTEM;
> > > -		cache_level = I915_CACHE_NONE;
> > > +		cache = I915_CACHE_NONE;
> > >   	} else {
> > >   		mem_flags = i915_ttm_cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
> > >   			I915_BO_FLAG_STRUCT_PAGE;
> > >   		mem_type = bo->resource->mem_type;
> > > -		cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
> > > -						   bo->ttm);
> > > +		cache = i915_ttm_cache(to_i915(bo->base.dev), bo->resource,
> > > +				       bo->ttm);
> > >   	}
> > >   	/*
> > > @@ -157,7 +157,7 @@ void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
> > >   	obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
> > >   	obj->mem_flags |= mem_flags;
> > > -	i915_gem_object_set_cache_coherency(obj, cache_level);
> > > +	i915_gem_object_set_cache_coherency(obj, cache);
> > >   }
> > >   /**
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > > index 1d3ebdf4069b..5d2891981bd4 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > > @@ -553,7 +553,7 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> > >   	obj->mem_flags = I915_BO_FLAG_STRUCT_PAGE;
> > >   	obj->read_domains = I915_GEM_DOMAIN_CPU;
> > >   	obj->write_domain = I915_GEM_DOMAIN_CPU;
> > > -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> > > +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
> > >   	obj->userptr.ptr = args->user_ptr;
> > >   	obj->userptr.notifier_seq = ULONG_MAX;
> > > diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
> > > index bac957755068..77d04be5e9d7 100644
> > > --- a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
> > > +++ b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
> > > @@ -123,7 +123,7 @@ huge_gem_object(struct drm_i915_private *i915,
> > >   	obj->read_domains = I915_GEM_DOMAIN_CPU;
> > >   	obj->write_domain = I915_GEM_DOMAIN_CPU;
> > > -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
> > > +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
> > >   	i915_gem_object_set_cache_coherency(obj, cache_level);
> > >   	obj->scratch = phys_size;
> > > diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> > > index 6bddd733d796..6ca5b9dbc414 100644
> > > --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> > > +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> > > @@ -200,9 +200,10 @@ huge_pages_object(struct drm_i915_private *i915,
> > >   	obj->write_domain = I915_GEM_DOMAIN_CPU;
> > >   	obj->read_domains = I915_GEM_DOMAIN_CPU;
> > > -	cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
> > > +	cache_level = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
> > >   	i915_gem_object_set_cache_coherency(obj, cache_level);
> > > +
> > >   	obj->mm.page_mask = page_mask;
> > >   	return obj;
> > > diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> > > index 675f71f06e89..3c93a73cf6b1 100644
> > > --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> > > +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> > > @@ -16,11 +16,11 @@
> > >   #include "intel_gtt.h"
> > >   static u64 gen8_pde_encode(const dma_addr_t addr,
> > > -			   const enum i915_cache_level level)
> > > +			   const enum i915_cache_mode cache_mode)
> > >   {
> > >   	u64 pde = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
> > > -	if (level != I915_CACHE_NONE)
> > > +	if (cache_mode != I915_CACHE_MODE_UC)
> > >   		pde |= PPAT_CACHED_PDE;
> > >   	else
> > >   		pde |= PPAT_UNCACHED;
> > > @@ -43,10 +43,10 @@ static u64 gen8_pte_encode(dma_addr_t addr,
> > >   	 * See translation table defined by LEGACY_CACHELEVEL.
> > >   	 */
> > >   	switch (pat_index) {
> > > -	case I915_CACHE_NONE:
> > > +	case I915_CACHE_MODE_UC:
> > >   		pte |= PPAT_UNCACHED;
> > >   		break;
> > > -	case I915_CACHE_WT:
> > > +	case I915_CACHE_MODE_WT:
> > >   		pte |= PPAT_DISPLAY_ELLC;
> > >   		break;
> > >   	default:
> > > @@ -893,7 +893,7 @@ static int gen8_init_scratch(struct i915_address_space *vm)
> > >   		}
> > >   		fill_px(obj, vm->scratch[i - 1]->encode);
> > > -		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_NONE);
> > > +		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_MODE_UC);
> > >   		vm->scratch[i] = obj;
> > >   	}
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > > index ee15486fed0d..f1e59e512d14 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > > @@ -1103,7 +1103,7 @@ static int init_status_page(struct intel_engine_cs *engine)
> > >   		return PTR_ERR(obj);
> > >   	}
> > > -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> > > +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
> > >   	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
> > >   	if (IS_ERR(vma)) {
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > > index fca61ddca8ad..ab5f654e7557 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > > @@ -1011,11 +1011,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
> > >   	return ggtt_probe_common(ggtt, size);
> > >   }
> > > -/*
> > > - * For pre-gen8 platforms pat_index is the same as enum i915_cache_level,
> > > - * so the switch-case statements in these PTE encode functions are still valid.
> > > - * See translation table LEGACY_CACHELEVEL.
> > > - */
> > >   static u64 snb_pte_encode(dma_addr_t addr,
> > >   			  unsigned int pat_index,
> > >   			  u32 flags)
> > > @@ -1023,11 +1018,11 @@ static u64 snb_pte_encode(dma_addr_t addr,
> > >   	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
> > >   	switch (pat_index) {
> > > -	case I915_CACHE_L3_LLC:
> > > -	case I915_CACHE_LLC:
> > > +	case I915_CACHE_MODE_WB:
> > > +	case __I915_CACHE_MODE_WB_L3:
> > >   		pte |= GEN6_PTE_CACHE_LLC;
> > >   		break;
> > > -	case I915_CACHE_NONE:
> > > +	case I915_CACHE_MODE_UC:
> > >   		pte |= GEN6_PTE_UNCACHED;
> > >   		break;
> > >   	default:
> > > @@ -1044,13 +1039,13 @@ static u64 ivb_pte_encode(dma_addr_t addr,
> > >   	gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
> > >   	switch (pat_index) {
> > > -	case I915_CACHE_L3_LLC:
> > > +	case __I915_CACHE_MODE_WB_L3:
> > >   		pte |= GEN7_PTE_CACHE_L3_LLC;
> > >   		break;
> > > -	case I915_CACHE_LLC:
> > > +	case I915_CACHE_MODE_WB:
> > >   		pte |= GEN6_PTE_CACHE_LLC;
> > >   		break;
> > > -	case I915_CACHE_NONE:
> > > +	case I915_CACHE_MODE_UC:
> > >   		pte |= GEN6_PTE_UNCACHED;
> > >   		break;
> > >   	default:
> > > @@ -1069,7 +1064,7 @@ static u64 byt_pte_encode(dma_addr_t addr,
> > >   	if (!(flags & PTE_READ_ONLY))
> > >   		pte |= BYT_PTE_WRITEABLE;
> > > -	if (pat_index != I915_CACHE_NONE)
> > > +	if (pat_index != I915_CACHE_MODE_UC)
> > >   		pte |= BYT_PTE_SNOOPED_BY_CPU_CACHES;
> > >   	return pte;
> > > @@ -1081,7 +1076,7 @@ static u64 hsw_pte_encode(dma_addr_t addr,
> > >   {
> > >   	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
> > > -	if (pat_index != I915_CACHE_NONE)
> > > +	if (pat_index != I915_CACHE_MODE_UC)
> > >   		pte |= HSW_WB_LLC_AGE3;
> > >   	return pte;
> > > @@ -1094,9 +1089,9 @@ static u64 iris_pte_encode(dma_addr_t addr,
> > >   	gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
> > >   	switch (pat_index) {
> > > -	case I915_CACHE_NONE:
> > > +	case I915_CACHE_MODE_UC:
> > >   		break;
> > > -	case I915_CACHE_WT:
> > > +	case I915_CACHE_MODE_WT:
> > >   		pte |= HSW_WT_ELLC_LLC_AGE3;
> > >   		break;
> > >   	default:
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
> > > index 866c416afb73..803c41ac4ccb 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
> > > @@ -21,7 +21,7 @@ static void gmch_ggtt_insert_page(struct i915_address_space *vm,
> > >   				  unsigned int pat_index,
> > >   				  u32 unused)
> > >   {
> > > -	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
> > > +	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
> > >   		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
> > >   	intel_gmch_gtt_insert_page(addr, offset >> PAGE_SHIFT, flags);
> > > @@ -32,7 +32,7 @@ static void gmch_ggtt_insert_entries(struct i915_address_space *vm,
> > >   				     unsigned int pat_index,
> > >   				     u32 unused)
> > >   {
> > > -	unsigned int flags = (pat_index == I915_CACHE_NONE) ?
> > > +	unsigned int flags = (pat_index == I915_CACHE_MODE_UC) ?
> > >   		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
> > >   	intel_gmch_gtt_insert_sg_entries(vma_res->bi.pages, vma_res->start >> PAGE_SHIFT,
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > index 065099362a98..48055304537a 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > @@ -676,7 +676,7 @@ __vm_create_scratch_for_read(struct i915_address_space *vm, unsigned long size)
> > >   	if (IS_ERR(obj))
> > >   		return ERR_CAST(obj);
> > > -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> > > +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
> > >   	vma = i915_vma_instance(obj, vm, NULL);
> > >   	if (IS_ERR(vma)) {
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > index 7192a534a654..af4277c1d577 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > @@ -636,7 +636,8 @@ void
> > >   __set_pd_entry(struct i915_page_directory * const pd,
> > >   	       const unsigned short idx,
> > >   	       struct i915_page_table *pt,
> > > -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level));
> > > +	       u64 (*encode)(const dma_addr_t,
> > > +			     const enum i915_cache_mode cache_mode));
> > >   #define set_pd_entry(pd, idx, to) \
> > >   	__set_pd_entry((pd), (idx), px_pt(to), gen8_pde_encode)
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> > > index 436756bfbb1a..3e461d4f3693 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> > > @@ -98,14 +98,16 @@ void
> > >   __set_pd_entry(struct i915_page_directory * const pd,
> > >   	       const unsigned short idx,
> > >   	       struct i915_page_table * const to,
> > > -	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level))
> > > +	       u64 (*encode)(const dma_addr_t,
> > > +			     const enum i915_cache_mode cache_mode))
> > >   {
> > >   	/* Each thread pre-pins the pd, and we may have a thread per pde. */
> > >   	GEM_BUG_ON(atomic_read(px_used(pd)) > NALLOC * I915_PDES);
> > >   	atomic_inc(px_used(pd));
> > >   	pd->entry[idx] = to;
> > > -	write_dma_entry(px_base(pd), idx, encode(px_dma(to), I915_CACHE_LLC));
> > > +	write_dma_entry(px_base(pd), idx,
> > > +			encode(px_dma(to), I915_CACHE_MODE_WB));
> > >   }
> > >   void
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > > index 92085ffd23de..9131d228d285 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > > @@ -551,7 +551,9 @@ alloc_context_vma(struct intel_engine_cs *engine)
> > >   	 * later platforms don't have L3 control bits in the PTE.
> > >   	 */
> > >   	if (IS_IVYBRIDGE(i915))
> > > -		i915_gem_object_set_cache_coherency(obj, I915_CACHE_L3_LLC);
> > > +		i915_gem_object_set_cache_coherency(obj,
> > > +						    I915_CACHE_CACHED |
> > > +						    __I915_CACHE_FLAG(L3));
> > >   	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
> > >   	if (IS_ERR(vma)) {
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
> > > index b9640212d659..025ce54c886d 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
> > > @@ -26,7 +26,7 @@ static struct i915_vma *hwsp_alloc(struct intel_gt *gt)
> > >   	if (IS_ERR(obj))
> > >   		return ERR_CAST(obj);
> > > -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> > > +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
> > >   	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
> > >   	if (IS_ERR(vma))
> > > diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> > > index 8b0d84f2aad2..fc278fa463b0 100644
> > > --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> > > +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> > > @@ -64,7 +64,7 @@ static int hang_init(struct hang *h, struct intel_gt *gt)
> > >   		goto err_hws;
> > >   	}
> > > -	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_LLC);
> > > +	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_CACHED);
> > >   	vaddr = i915_gem_object_pin_map_unlocked(h->hws, I915_MAP_WB);
> > >   	if (IS_ERR(vaddr)) {
> > >   		err = PTR_ERR(vaddr);
> > > diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> > > index 14a8b25b6204..d25990d33d44 100644
> > > --- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> > > +++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> > > @@ -111,7 +111,7 @@ read_nonprivs(struct intel_context *ce)
> > >   	if (IS_ERR(result))
> > >   		return result;
> > > -	i915_gem_object_set_cache_coherency(result, I915_CACHE_LLC);
> > > +	i915_gem_object_set_cache_coherency(result, I915_CACHE_CACHED);
> > >   	cs = i915_gem_object_pin_map_unlocked(result, I915_MAP_WB);
> > >   	if (IS_ERR(cs)) {
> > > diff --git a/drivers/gpu/drm/i915/i915_cache.c b/drivers/gpu/drm/i915/i915_cache.c
> > > index 06eb5933c719..f4ba1cb430d3 100644
> > > --- a/drivers/gpu/drm/i915/i915_cache.c
> > > +++ b/drivers/gpu/drm/i915/i915_cache.c
> > > @@ -6,13 +6,88 @@
> > >   #include "i915_cache.h"
> > >   #include "i915_drv.h"
> > > -void i915_cache_init(struct drm_i915_private *i915)
> > > +int i915_cache_init(struct drm_i915_private *i915)
> > >   {
> > > -	i915->pat_uc = i915_gem_get_pat_index(i915, I915_CACHE_NONE);
> > > -	drm_info(&i915->drm, "Using PAT index %u for uncached access\n",
> > > -		 i915->pat_uc);
> > > +	int ret;
> > > -	i915->pat_wb = i915_gem_get_pat_index(i915, I915_CACHE_LLC);
> > > -	drm_info(&i915->drm, "Using PAT index %u for write-back access\n",
> > > -		 i915->pat_wb);
> > > +	ret = i915_cache_find_pat(i915, I915_CACHE_NONE);
> > > +	if (ret < 0) {
> > > +		drm_err(&i915->drm,
> > > +			"Failed to find PAT index for uncached access\n");
> > > +		return -ENODEV;
> > > +	}
> > > +	drm_info(&i915->drm, "Using PAT index %u for uncached access\n", ret);
> > > +	i915->pat_uc = ret;
> > > +
> > > +	ret = i915_cache_find_pat(i915, I915_CACHE_CACHED);
> > > +	if (ret < 0) {
> > > +		drm_err(&i915->drm,
> > > +			"Failed to find PAT index for write-back access\n");
> > > +		return -ENODEV;
> > > +	}
> > > +	drm_info(&i915->drm, "Using PAT index %u for write-back access\n", ret);
> > > +	i915->pat_wb = ret;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache)
> > > +{
> > > +	const struct intel_device_info *info = INTEL_INFO(i915);
> > > +	int i;
> > > +
> > > +	for (i = 0; i < ARRAY_SIZE(info->cache_modes); i++) {
> > > +		if (info->cache_modes[i] == cache)
> > > +			return i;
> > > +	}
> > > +
> > > +	return -1;
> > > +}
> > > +
> > > +void i915_cache_print(char *buf, size_t buflen, const char *suffix,
> > > +		      i915_cache_t cache)
> > > +{
> > > +	const enum i915_cache_mode mode = I915_CACHE_MODE(cache);
> > > +	static const char * const mode_str[] = {
> > > +		[I915_CACHE_MODE_UC] = "UC",
> > > +		[I915_CACHE_MODE_WB] = "WB",
> > > +		[I915_CACHE_MODE_WT] = "WT",
> > > +		[I915_CACHE_MODE_WC] = "WC",
> > > +	};
> > > +	static const char * const flag_str[] = {
> > > +		[ilog2(I915_CACHE_FLAG_COH1W)] = "1-Way-Coherent",
> > > +		[ilog2(I915_CACHE_FLAG_COH2W)] = "2-Way-Coherent",
> > > +		[ilog2(I915_CACHE_FLAG_L3)] =    "L3",
> > > +		[ilog2(I915_CACHE_FLAG_CLOS1)] = "CLOS1",
> > > +		[ilog2(I915_CACHE_FLAG_CLOS2)] = "CLOS2",
> > > +	};
> > > +
> > > +	if (mode >= ARRAY_SIZE(mode_str)) {
> > > +		snprintf(buf, buflen, "0x%x%s", cache, suffix ?: "");
> > > +	} else {
> > > +		unsigned long flags = I915_CACHE_FLAGS(cache);
> > > +		unsigned long bit;
> > > +		int ret;
> > > +
> > > +		ret = snprintf(buf, buflen, "%s", mode_str[mode]);
> > > +		buf += ret;
> > > +		buflen -= ret;
> > > +
> > > +		/*
> > > +		 * Don't print "1-way-2-way", it would be confusing and 2-way
> > > +		 * implies 1-way anyway.
> > > +		 */
> > > +		if ((flags & (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W)) ==
> > > +		    (I915_CACHE_FLAG_COH1W | I915_CACHE_FLAG_COH2W))
> > > +			flags &= ~I915_CACHE_FLAG_COH1W;
> > > +
> > > +		for_each_set_bit(bit, &flags, BITS_PER_TYPE(i915_cache_t)) {
> > > +			ret = snprintf(buf, buflen, "-%s", flag_str[bit]);
> > > +			buf += ret;
> > > +			buflen -= ret;
> > > +		}
> > > +
> > > +		if (suffix)
> > > +			snprintf(buf, buflen, "%s", suffix);
> > > +	}
> > >   }
> > > diff --git a/drivers/gpu/drm/i915/i915_cache.h b/drivers/gpu/drm/i915/i915_cache.h
> > > index cb68936fb8a2..d9e97318b942 100644
> > > --- a/drivers/gpu/drm/i915/i915_cache.h
> > > +++ b/drivers/gpu/drm/i915/i915_cache.h
> > > @@ -6,8 +6,76 @@
> > >   #ifndef __I915_CACHE_H__
> > >   #define __I915_CACHE_H__
> > > +#include <linux/types.h>
> > > +
> > > +struct drm_printer;
> > > +
> > >   struct drm_i915_private;
> > > -void i915_cache_init(struct drm_i915_private *i915);
> > > +typedef u16 i915_cache_t;
> > > +
> > > +/* Cache modes */
> > > +enum i915_cache_mode {
> > > +	I915_CACHE_MODE_UC = 0,
> > > +	I915_CACHE_MODE_WB,
> > > +	__I915_CACHE_MODE_WB_L3, /* Special do-not-use entry for legacy 1:1 mapping. */
> > > +	I915_CACHE_MODE_WT,
> > > +	I915_CACHE_MODE_WC,
> > > +	I915_NUM_CACHE_MODES
> > > +};
> > > +
> > > +/* Cache mode flag bits */
> > > +#define I915_CACHE_FLAG_COH1W	(0x1)
> > > +#define I915_CACHE_FLAG_COH2W	(0x2) /* 1-way needs to be set too. */
> > > +#define I915_CACHE_FLAG_L3	(0x4)
> > > +#define I915_CACHE_FLAG_CLOS1	(0x8)
> > > +#define I915_CACHE_FLAG_CLOS2	(0x10)
> > > +
> > > +/*
> > > + * Overloaded I915_CACHE() macro based on:
> > > + *  https://stackoverflow.com/questions/3046889/optional-parameters-with-c-macros
> > > + *
> > > + * It is possible to call I915_CACHE with mode and zero or more flags as
> > > + * separate arguments. Ie these all work:
> > > + *
> > > + *   I915_CACHE(WB)
> > > + *   I915_CACHE(WB, COH1W, COH2W)
> > > + *   I915_CACHE(WB, COH1W, COH2W, L3)
> > > + */
> > > +
> > > +#define __I915_CACHE_FLAG(f) (I915_CACHE_FLAG_##f << 8)
> > > +#define __I915_CACHE(m, f) ((i915_cache_t)(I915_CACHE_MODE_##m | (f)))
> > > +
> > > +#define I915_CACHE_4(m, f1, f2, f3)	__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2) | __I915_CACHE_FLAG(f3))
> > > +#define I915_CACHE_3(m, f1, f2)		__I915_CACHE(m, __I915_CACHE_FLAG(f1) | __I915_CACHE_FLAG(f2))
> > > +#define I915_CACHE_2(m, f1)		__I915_CACHE(m, __I915_CACHE_FLAG(f1))
> > > +#define I915_CACHE_1(m)			__I915_CACHE(m, 0)
> > > +#define I915_CACHE_0(m)			__I915_CACHE(WC, 0)
> > > +
> > > +#define FUNC_CHOOSER(_f1, _f2, _f3, _f4, _f5, ...) _f5
> > > +#define FUNC_RECOMPOSER(argsWithParentheses) FUNC_CHOOSER argsWithParentheses
> > > +#define CHOOSE_FROM_ARG_COUNT(...) FUNC_RECOMPOSER((__VA_ARGS__, I915_CACHE_4, I915_CACHE_3, I915_CACHE_2, I915_CACHE_1, ))
> > > +#define NO_ARG_EXPANDER() ,,,I915_CACHE_0
> > > +#define MACRO_CHOOSER(...) CHOOSE_FROM_ARG_COUNT(NO_ARG_EXPANDER __VA_ARGS__ ())
> > > +
> > > +#define I915_CACHE(...) MACRO_CHOOSER(__VA_ARGS__)(__VA_ARGS__)
> > > +
> > > +/* i915_cache_t mode and flags extraction helpers. */
> > > +#define I915_CACHE_MODE(cache) \
> > > +	((enum i915_cache_mode)(((i915_cache_t)(cache)) & 0xff))
> > > +#define I915_CACHE_FLAGS(cache) \
> > > +	((unsigned int)((((i915_cache_t)(cache) & 0xff00)) >> 8))
> > > +
> > > +/* Helpers for i915 caching modes. */
> > > +#define I915_CACHE_NONE		I915_CACHE(UC)
> > > +#define I915_CACHE_CACHED	I915_CACHE(WB, COH1W, COH2W)
> > > +#define I915_CACHE_WT		I915_CACHE(WT)
> > > +
> > > +int i915_cache_init(struct drm_i915_private *i915);
> > > +int i915_cache_find_pat(struct drm_i915_private *i915, i915_cache_t cache);
> > > +void i915_cache_print(char *buf, size_t buflen, const char *suffix,
> > > +		      i915_cache_t cache);
> > > +
> > > +#define I915_CACHE_NAME_LEN (40)
> > >   #endif /* __I915_CACHE_H__ */
> > > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > > index 4de44cf1026d..4ec292011546 100644
> > > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > > @@ -140,57 +140,18 @@ static const char *stringify_vma_type(const struct i915_vma *vma)
> > >   	return "ppgtt";
> > >   }
> > > -static const char *i915_cache_level_str(struct drm_i915_gem_object *obj)
> > > -{
> > > -	struct drm_i915_private *i915 = obj_to_i915(obj);
> > > -
> > > -	if (IS_METEORLAKE(i915)) {
> > > -		switch (obj->pat_index) {
> > > -		case 0: return " WB";
> > > -		case 1: return " WT";
> > > -		case 2: return " UC";
> > > -		case 3: return " WB (1-Way Coh)";
> > > -		case 4: return " WB (2-Way Coh)";
> > > -		default: return " not defined";
> > > -		}
> > > -	} else if (IS_PONTEVECCHIO(i915)) {
> > > -		switch (obj->pat_index) {
> > > -		case 0: return " UC";
> > > -		case 1: return " WC";
> > > -		case 2: return " WT";
> > > -		case 3: return " WB";
> > > -		case 4: return " WT (CLOS1)";
> > > -		case 5: return " WB (CLOS1)";
> > > -		case 6: return " WT (CLOS2)";
> > > -		case 7: return " WT (CLOS2)";
> > > -		default: return " not defined";
> > > -		}
> > > -	} else if (GRAPHICS_VER(i915) >= 12) {
> > > -		switch (obj->pat_index) {
> > > -		case 0: return " WB";
> > > -		case 1: return " WC";
> > > -		case 2: return " WT";
> > > -		case 3: return " UC";
> > > -		default: return " not defined";
> > > -		}
> > > -	} else {
> > > -		switch (obj->pat_index) {
> > > -		case 0: return " UC";
> > > -		case 1: return HAS_LLC(i915) ?
> > > -			       " LLC" : " snooped";
> > > -		case 2: return " L3+LLC";
> > > -		case 3: return " WT";
> > > -		default: return " not defined";
> > > -		}
> > > -	}
> > > -}
> > > -
> > >   void
> > >   i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
> > >   {
> > > +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > > +	char buf[I915_CACHE_NAME_LEN];
> > >   	struct i915_vma *vma;
> > >   	int pin_count = 0;
> > > +	i915_cache_print(buf, sizeof(buf),
> > > +			 obj->pat_set_by_user ? "!" : NULL,
> > > +			 INTEL_INFO(i915)->cache_modes[obj->pat_index]);
> > > +
> > >   	seq_printf(m, "%pK: %c%c%c %8zdKiB %02x %02x %s%s%s",
> > >   		   &obj->base,
> > >   		   get_tiling_flag(obj),
> > > @@ -199,7 +160,7 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
> > >   		   obj->base.size / 1024,
> > >   		   obj->read_domains,
> > >   		   obj->write_domain,
> > > -		   i915_cache_level_str(obj),
> > > +		   buf,
> > >   		   obj->mm.dirty ? " dirty" : "",
> > >   		   obj->mm.madv == I915_MADV_DONTNEED ? " purgeable" : "");
> > >   	if (obj->base.name)
> > > diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> > > index bb2223cc3470..8663388a524f 100644
> > > --- a/drivers/gpu/drm/i915/i915_driver.c
> > > +++ b/drivers/gpu/drm/i915/i915_driver.c
> > > @@ -241,7 +241,9 @@ static int i915_driver_early_probe(struct drm_i915_private *dev_priv)
> > >   	i915_memcpy_init_early(dev_priv);
> > >   	intel_runtime_pm_init_early(&dev_priv->runtime_pm);
> > > -	i915_cache_init(dev_priv);
> > > +	ret = i915_cache_init(dev_priv);
> > > +	if (ret < 0)
> > > +		return ret;
> > >   	ret = i915_workqueues_init(dev_priv);
> > >   	if (ret < 0)
> > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > index 896aa48ed089..814705cfeb12 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > @@ -1144,19 +1144,6 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
> > >   	unsigned int i;
> > >   	int ret;
> > > -	/*
> > > -	 * In the proccess of replacing cache_level with pat_index a tricky
> > > -	 * dependency is created on the definition of the enum i915_cache_level.
> > > -	 * in case this enum is changed, PTE encode would be broken.
> > > -	 * Add a WARNING here. And remove when we completely quit using this
> > > -	 * enum
> > > -	 */
> > > -	BUILD_BUG_ON(I915_CACHE_NONE != 0 ||
> > > -		     I915_CACHE_LLC != 1 ||
> > > -		     I915_CACHE_L3_LLC != 2 ||
> > > -		     I915_CACHE_WT != 3 ||
> > > -		     I915_MAX_CACHE_LEVEL != 4);
> > > -
> > >   	/* We need to fallback to 4K pages if host doesn't support huge gtt. */
> > >   	if (intel_vgpu_active(dev_priv) && !intel_vgpu_has_huge_gtt(dev_priv))
> > >   		RUNTIME_INFO(dev_priv)->page_sizes = I915_GTT_PAGE_SIZE_4K;
> > > diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> > > index fcacdc21643c..565a60a1645d 100644
> > > --- a/drivers/gpu/drm/i915/i915_pci.c
> > > +++ b/drivers/gpu/drm/i915/i915_pci.c
> > > @@ -32,6 +32,7 @@
> > >   #include "gt/intel_sa_media.h"
> > >   #include "gem/i915_gem_object_types.h"
> > > +#include "i915_cache.h"
> > >   #include "i915_driver.h"
> > >   #include "i915_drv.h"
> > >   #include "i915_pci.h"
> > > @@ -43,36 +44,43 @@
> > >   	.__runtime.graphics.ip.ver = (x), \
> > >   	.__runtime.media.ip.ver = (x)
> > > -#define LEGACY_CACHELEVEL \
> > > -	.cachelevel_to_pat = { \
> > > -		[I915_CACHE_NONE]   = 0, \
> > > -		[I915_CACHE_LLC]    = 1, \
> > > -		[I915_CACHE_L3_LLC] = 2, \
> > > -		[I915_CACHE_WT]     = 3, \
> > > +#define LEGACY_CACHE_MODES \
> > > +	.cache_modes = { \
> > > +		[I915_CACHE_MODE_UC] 	  = I915_CACHE(UC), \
> > > +		[I915_CACHE_MODE_WB] 	  = I915_CACHE(WB, COH1W, COH2W), \
> > 
> > Reading bspec 2863 (bdw) indicates that the CPU being able to snoop
> > the GPU's L3 was a new feature in gen8.  So for HSW and earlier, any
> > coherency was only 1-way (the GPU could be coherent with the CPU's
> > caches, but not vice versa).  Only starting with gen8 did we get 2-way
> > coherency as
> > an option where the CPU would also be coherent with the GPU cache (and
> > with gen8 and beyond you could still select 1-way instead of 2-way
> > coherency with instruction-level granularity via MOCS).  There are also
> > some legacy platforms (e.g., EHL/JSL on bspec 13948) where the IA wasn't
> > coherent with GPU L3 so we were back to 1-way coherency.
> > 
> > So should we split LEGACY_CACHE_MODES into two tables with different
> > coherency settings attached to I915_CACHE_MODE_WB?
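> > 
> > Perhaps something along these lines -- an untested sketch, with both
> > macro names invented here purely to illustrate the split:
> > 
> >   #define GEN8_LEGACY_CACHE_MODES \
> >   	.cache_modes = { \
> >   		[I915_CACHE_MODE_UC]      = I915_CACHE(UC), \
> >   		[I915_CACHE_MODE_WB]      = I915_CACHE(WB, COH1W, COH2W), \
> >   		[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, COH2W, L3), \
> >   		[I915_CACHE_MODE_WT]      = I915_CACHE(WT), \
> >   	}
> > 
> >   #define PRE_GEN8_LEGACY_CACHE_MODES \
> >   	.cache_modes = { \
> >   		[I915_CACHE_MODE_UC]      = I915_CACHE(UC), \
> >   		[I915_CACHE_MODE_WB]      = I915_CACHE(WB, COH1W), \
> >   		[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, L3), \
> >   		[I915_CACHE_MODE_WT]      = I915_CACHE(WT), \
> >   	}
> > 
> > with platforms like EHL/JSL presumably also picking the 1-way table.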
> > 
> > > +		[__I915_CACHE_MODE_WB_L3] = I915_CACHE(WB, COH1W, COH2W, L3), \
> > > +		[I915_CACHE_MODE_WT] 	  = I915_CACHE(WT), \
> > >   	}
> > > -#define TGL_CACHELEVEL \
> > > -	.cachelevel_to_pat = { \
> > > -		[I915_CACHE_NONE]   = 3, \
> > > -		[I915_CACHE_LLC]    = 0, \
> > > -		[I915_CACHE_L3_LLC] = 0, \
> > > -		[I915_CACHE_WT]     = 2, \
> > > +#define GEN12_CACHE_MODES \
> > > +	.cache_modes = { \
> > > +		[0] = I915_CACHE(WB, COH1W, COH2W), \
> > > +		[1] = I915_CACHE(WC), \
> > > +		[2] = I915_CACHE(WT), \
> > > +		[3] = I915_CACHE(UC), \
> > >   	}
> > > -#define PVC_CACHELEVEL \
> > > -	.cachelevel_to_pat = { \
> > > -		[I915_CACHE_NONE]   = 0, \
> > > -		[I915_CACHE_LLC]    = 3, \
> > > -		[I915_CACHE_L3_LLC] = 3, \
> > > -		[I915_CACHE_WT]     = 2, \
> > > +/* FIXME: is coherency 1-way or 2-way for PAT indices 3, 5 and 7? */
> > > +
> > > +#define PVC_CACHE_MODES \
> > > +	.cache_modes = { \
> > > +		[0] = I915_CACHE(UC), \
> > > +		[1] = I915_CACHE(WC), \
> > > +		[2] = I915_CACHE(WT), \
> > > +		[3] = I915_CACHE(WB, COH1W), \
> > > +		[4] = I915_CACHE(WT, CLOS1), \
> > > +		[5] = I915_CACHE(WB, COH1W, CLOS1), \
> > > +		[6] = I915_CACHE(WT, CLOS2), \
> > > +		[7] = I915_CACHE(WB, COH1W, CLOS2), \
> > >   	}
> > > -#define MTL_CACHELEVEL \
> > > -	.cachelevel_to_pat = { \
> > > -		[I915_CACHE_NONE]   = 2, \
> > > -		[I915_CACHE_LLC]    = 3, \
> > > -		[I915_CACHE_L3_LLC] = 3, \
> > > -		[I915_CACHE_WT]     = 1, \
> > > +#define MTL_CACHE_MODES \
> > > +	.cache_modes = { \
> > > +		[0] = I915_CACHE(WB), \
> > > +		[1] = I915_CACHE(WT), \
> > > +		[2] = I915_CACHE(UC), \
> > > +		[3] = I915_CACHE(WB, COH1W), \
> > > +		[4] = I915_CACHE(WB, COH1W, COH2W), \
> > 
> > We may want a comment on this one since the "2W" part is sort of a lie.
> > Bspec 63884 has a programming note for MTL that says
> > 
> >          "...Except for system atomics, setting Coherency Mode to 10 or
> >          11 results in this same one-way coherent behavior..."
> > 
> > So if we ask for 2W, we actually only get 1W behavior except in a very
> > narrow set of cases.
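> > 
> > Maybe something as simple as this next to the table entry (comment
> > wording below is just a first stab):
> > 
> >   /* 2-way only truly applies to system atomics, see Bspec 63884. */
> >   [4] = I915_CACHE(WB, COH1W, COH2W), \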
> 
> Shall I just not mark it as 2-way then, because it sounds like for
> i915 purposes it is not 2-way?!
> 
> Could we invent a new flag just to document that this is something
> weird?
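> 
> For instance -- just a sketch, with the flag name made up on the spot:
> 
>   #define I915_CACHE_FLAG_COH2W_WEAK	(0x20)
> 
> and then in MTL_CACHE_MODES:
> 
>   [4] = I915_CACHE(WB, COH1W, COH2W_WEAK), \
> 
> (Plus a matching flag_str[] entry in i915_cache_print().) That way any
> logic keyed off COH2W would not treat PAT index 4 as fully 2-way
> coherent, while the table still records what the hardware advertises.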
> 
> Regards,
> 
> Tvrtko

Yeah, it sounds like that might be best.


Matt

> 
> > 
> > 
> > Matt
> > 
> > >   	}
> > >   /* Keep in gen based order, and chronological order within a gen */
> > > @@ -97,7 +105,7 @@
> > >   	.max_pat_index = 3, \
> > >   	GEN_DEFAULT_PAGE_SIZES, \
> > >   	GEN_DEFAULT_REGIONS, \
> > > -	LEGACY_CACHELEVEL
> > > +	LEGACY_CACHE_MODES
> > >   #define I845_FEATURES \
> > >   	GEN(2), \
> > > @@ -112,7 +120,7 @@
> > >   	.max_pat_index = 3, \
> > >   	GEN_DEFAULT_PAGE_SIZES, \
> > >   	GEN_DEFAULT_REGIONS, \
> > > -	LEGACY_CACHELEVEL
> > > +	LEGACY_CACHE_MODES
> > >   static const struct intel_device_info i830_info = {
> > >   	I830_FEATURES,
> > > @@ -145,7 +153,7 @@ static const struct intel_device_info i865g_info = {
> > >   	.max_pat_index = 3, \
> > >   	GEN_DEFAULT_PAGE_SIZES, \
> > >   	GEN_DEFAULT_REGIONS, \
> > > -	LEGACY_CACHELEVEL
> > > +	LEGACY_CACHE_MODES
> > >   static const struct intel_device_info i915g_info = {
> > >   	GEN3_FEATURES,
> > > @@ -208,7 +216,7 @@ static const struct intel_device_info pnv_m_info = {
> > >   	.max_pat_index = 3, \
> > >   	GEN_DEFAULT_PAGE_SIZES, \
> > >   	GEN_DEFAULT_REGIONS, \
> > > -	LEGACY_CACHELEVEL
> > > +	LEGACY_CACHE_MODES
> > >   static const struct intel_device_info i965g_info = {
> > >   	GEN4_FEATURES,
> > > @@ -252,7 +260,7 @@ static const struct intel_device_info gm45_info = {
> > >   	.max_pat_index = 3, \
> > >   	GEN_DEFAULT_PAGE_SIZES, \
> > >   	GEN_DEFAULT_REGIONS, \
> > > -	LEGACY_CACHELEVEL
> > > +	LEGACY_CACHE_MODES
> > >   static const struct intel_device_info ilk_d_info = {
> > >   	GEN5_FEATURES,
> > > @@ -282,7 +290,7 @@ static const struct intel_device_info ilk_m_info = {
> > >   	.__runtime.ppgtt_size = 31, \
> > >   	GEN_DEFAULT_PAGE_SIZES, \
> > >   	GEN_DEFAULT_REGIONS, \
> > > -	LEGACY_CACHELEVEL
> > > +	LEGACY_CACHE_MODES
> > >   #define SNB_D_PLATFORM \
> > >   	GEN6_FEATURES, \
> > > @@ -330,7 +338,7 @@ static const struct intel_device_info snb_m_gt2_info = {
> > >   	.__runtime.ppgtt_size = 31, \
> > >   	GEN_DEFAULT_PAGE_SIZES, \
> > >   	GEN_DEFAULT_REGIONS, \
> > > -	LEGACY_CACHELEVEL
> > > +	LEGACY_CACHE_MODES
> > >   #define IVB_D_PLATFORM \
> > >   	GEN7_FEATURES, \
> > > @@ -387,7 +395,7 @@ static const struct intel_device_info vlv_info = {
> > >   	.platform_engine_mask = BIT(RCS0) | BIT(VCS0) | BIT(BCS0),
> > >   	GEN_DEFAULT_PAGE_SIZES,
> > >   	GEN_DEFAULT_REGIONS,
> > > -	LEGACY_CACHELEVEL,
> > > +	LEGACY_CACHE_MODES
> > >   };
> > >   #define G75_FEATURES  \
> > > @@ -473,7 +481,7 @@ static const struct intel_device_info chv_info = {
> > >   	.has_coherent_ggtt = false,
> > >   	GEN_DEFAULT_PAGE_SIZES,
> > >   	GEN_DEFAULT_REGIONS,
> > > -	LEGACY_CACHELEVEL,
> > > +	LEGACY_CACHE_MODES
> > >   };
> > >   #define GEN9_DEFAULT_PAGE_SIZES \
> > > @@ -536,7 +544,7 @@ static const struct intel_device_info skl_gt4_info = {
> > >   	.max_pat_index = 3, \
> > >   	GEN9_DEFAULT_PAGE_SIZES, \
> > >   	GEN_DEFAULT_REGIONS, \
> > > -	LEGACY_CACHELEVEL
> > > +	LEGACY_CACHE_MODES
> > >   static const struct intel_device_info bxt_info = {
> > >   	GEN9_LP_FEATURES,
> > > @@ -640,7 +648,7 @@ static const struct intel_device_info jsl_info = {
> > >   #define GEN12_FEATURES \
> > >   	GEN11_FEATURES, \
> > >   	GEN(12), \
> > > -	TGL_CACHELEVEL, \
> > > +	GEN12_CACHE_MODES, \
> > >   	.has_global_mocs = 1, \
> > >   	.has_pxp = 1, \
> > >   	.max_pat_index = 3
> > > @@ -708,7 +716,7 @@ static const struct intel_device_info adl_p_info = {
> > >   	.__runtime.graphics.ip.ver = 12, \
> > >   	.__runtime.graphics.ip.rel = 50, \
> > >   	XE_HP_PAGE_SIZES, \
> > > -	TGL_CACHELEVEL, \
> > > +	GEN12_CACHE_MODES, \
> > >   	.dma_mask_size = 46, \
> > >   	.has_3d_pipeline = 1, \
> > >   	.has_64bit_reloc = 1, \
> > > @@ -803,7 +811,7 @@ static const struct intel_device_info pvc_info = {
> > >   		BIT(VCS0) |
> > >   		BIT(CCS0) | BIT(CCS1) | BIT(CCS2) | BIT(CCS3),
> > >   	.require_force_probe = 1,
> > > -	PVC_CACHELEVEL,
> > > +	PVC_CACHE_MODES
> > >   };
> > >   static const struct intel_gt_definition xelpmp_extra_gt[] = {
> > > @@ -838,7 +846,7 @@ static const struct intel_device_info mtl_info = {
> > >   	.memory_regions = REGION_SMEM | REGION_STOLEN_LMEM,
> > >   	.platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(CCS0),
> > >   	.require_force_probe = 1,
> > > -	MTL_CACHELEVEL,
> > > +	MTL_CACHE_MODES
> > >   };
> > >   #undef PLATFORM
> > > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> > > index 04bc1f4a1115..973175a64534 100644
> > > --- a/drivers/gpu/drm/i915/i915_perf.c
> > > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > > @@ -1870,7 +1870,7 @@ static int alloc_oa_buffer(struct i915_perf_stream *stream)
> > >   		return PTR_ERR(bo);
> > >   	}
> > > -	i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC);
> > > +	i915_gem_object_set_cache_coherency(bo, I915_CACHE_CACHED);
> > >   	/* PreHSW required 512K alignment, HSW requires 16M */
> > >   	vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL);
> > > diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
> > > index dbfe6443457b..2ce13b7c48cb 100644
> > > --- a/drivers/gpu/drm/i915/intel_device_info.h
> > > +++ b/drivers/gpu/drm/i915/intel_device_info.h
> > > @@ -27,6 +27,8 @@
> > >   #include <uapi/drm/i915_drm.h>
> > > +#include "i915_cache.h"
> > > +
> > >   #include "intel_step.h"
> > >   #include "gt/intel_engine_types.h"
> > > @@ -243,8 +245,8 @@ struct intel_device_info {
> > >   	 */
> > >   	const struct intel_runtime_info __runtime;
> > > -	u32 cachelevel_to_pat[I915_MAX_CACHE_LEVEL];
> > > -	u32 max_pat_index;
> > > +	i915_cache_t cache_modes[8];
> > > +	unsigned int max_pat_index;
> > >   };
> > >   struct intel_driver_caps {
> > > diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> > > index f910ec9b6d2b..ba821e48baa5 100644
> > > --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> > > +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> > > @@ -267,7 +267,7 @@ static int igt_evict_for_cache_color(void *arg)
> > >   		err = PTR_ERR(obj);
> > >   		goto cleanup;
> > >   	}
> > > -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> > > +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
> > >   	quirk_add(obj, &objects);
> > >   	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
> > > @@ -283,7 +283,7 @@ static int igt_evict_for_cache_color(void *arg)
> > >   		err = PTR_ERR(obj);
> > >   		goto cleanup;
> > >   	}
> > > -	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> > > +	i915_gem_object_set_cache_coherency(obj, I915_CACHE_CACHED);
> > >   	quirk_add(obj, &objects);
> > >   	/* Neighbouring; same colour - should fit */
> > > diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
> > > index 3c5e0952f1b8..4cfc5000d6ff 100644
> > > --- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
> > > +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
> > > @@ -23,7 +23,7 @@ int igt_spinner_init(struct igt_spinner *spin, struct intel_gt *gt)
> > >   		err = PTR_ERR(spin->hws);
> > >   		goto err;
> > >   	}
> > > -	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_LLC);
> > > +	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_CACHED);
> > >   	spin->obj = i915_gem_object_create_internal(gt->i915, PAGE_SIZE);
> > >   	if (IS_ERR(spin->obj)) {
> > > diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > > index 1d1a457e2aee..8ae77bcf27fa 100644
> > > --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > > +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > > @@ -126,13 +126,13 @@ static const struct intel_device_info mock_info = {
> > >   	.memory_regions = REGION_SMEM,
> > >   	.platform_engine_mask = BIT(0),
> > > -	/* simply use legacy cache level for mock device */
> > > +	/* Simply use legacy cache modes for the mock device. */
> > >   	.max_pat_index = 3,
> > > -	.cachelevel_to_pat = {
> > > -		[I915_CACHE_NONE]   = 0,
> > > -		[I915_CACHE_LLC]    = 1,
> > > -		[I915_CACHE_L3_LLC] = 2,
> > > -		[I915_CACHE_WT]     = 3,
> > > +	.cache_modes = {
> > > +		[0] = I915_CACHE(UC),
> > > +		[1] = I915_CACHE(WB, COH1W),
> > > +		[2] = I915_CACHE(WB, COH1W, COH2W, L3),
> > > +		[3] = I915_CACHE(WT),
> > >   	},
> > >   };
> > > @@ -181,7 +181,7 @@ struct drm_i915_private *mock_gem_device(void)
> > >   	/* Set up device info and initial runtime info. */
> > >   	intel_device_info_driver_create(i915, pdev->device, &mock_info);
> > > -	i915_cache_init(i915);
> > > +	WARN_ON(i915_cache_init(i915));
> > >   	dev_pm_domain_set(&pdev->dev, &pm_domain);
> > >   	pm_runtime_enable(&pdev->dev);
> > > -- 
> > > 2.39.2
> > > 
> > 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation
