* [igt-dev] [PATCH i-g-t v4 00/15] PAT and cache coherency support
@ 2023-10-19 14:40 Matthew Auld
  2023-10-19 14:40 ` [igt-dev] [PATCH i-g-t v4 01/15] drm-uapi/xe_drm: sync to get pat and coherency bits Matthew Auld
                   ` (14 more replies)
  0 siblings, 15 replies; 28+ messages in thread
From: Matthew Auld @ 2023-10-19 14:40 UTC (permalink / raw)
  To: igt-dev

This series implements the IGT side of things needed to support the new Xe uapi here:
https://patchwork.freedesktop.org/series/123027/

Branch with the IGT changes:
https://gitlab.freedesktop.org/mwa/igt-gpu-tools/-/commits/xe-pat-index

Branch with the KMD changes:
https://gitlab.freedesktop.org/mwa/kernel/-/tree/xe-pat-index?ref_type=heads

v2:
  - Various tweaks and improvements.
  - Rebase on Xe2 additions.
  - Handle compressed wt on Xe2 + some other xe2 specific pat_index modes.
v3:
  - Various fixes and improvements.
v4:
  - Various improvements. Rebase.

-- 
2.41.0

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [igt-dev] [PATCH i-g-t v4 01/15] drm-uapi/xe_drm: sync to get pat and coherency bits
  2023-10-19 14:40 [igt-dev] [PATCH i-g-t v4 00/15] PAT and cache coherency support Matthew Auld
@ 2023-10-19 14:40 ` Matthew Auld
  2023-10-19 14:40 ` [igt-dev] [PATCH i-g-t v4 02/15] lib/igt_fb: mark buffers as SCANOUT Matthew Auld
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Matthew Auld @ 2023-10-19 14:40 UTC (permalink / raw)
  To: igt-dev

Grab the PAT & coherency uapi additions.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Reviewed-by: Pallavi Mishra <pallavi.mishra@intel.com>
---
 include/drm-uapi/xe_drm.h | 93 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 90 insertions(+), 3 deletions(-)

diff --git a/include/drm-uapi/xe_drm.h b/include/drm-uapi/xe_drm.h
index 6ff1106e4..d9ed75f53 100644
--- a/include/drm-uapi/xe_drm.h
+++ b/include/drm-uapi/xe_drm.h
@@ -548,8 +548,54 @@ struct drm_xe_gem_create {
 	 */
 	__u32 handle;
 
-	/** @pad: MBZ */
-	__u32 pad;
+	/**
+	 * @coh_mode: The coherency mode for this object. This will limit the
+	 * possible @cpu_caching values.
+	 *
+	 * Supported values:
+	 *
+	 * DRM_XE_GEM_COH_NONE: GPU access is assumed to be not coherent with
+	 * CPU. CPU caches are not snooped.
+	 *
+	 * DRM_XE_GEM_COH_AT_LEAST_1WAY:
+	 *
+	 * CPU-GPU coherency must be at least 1WAY.
+	 *
+	 * If 1WAY then GPU access is coherent with CPU (CPU caches are snooped)
+	 * until GPU acquires. The acquire by the GPU is not tracked by CPU
+	 * caches.
+	 *
+	 * If 2WAY then should be fully coherent between GPU and CPU.  Fully
+	 * tracked by CPU caches. Both CPU and GPU caches are snooped.
+	 *
+	 * Note: On dgpu the GPU device never caches system memory, so the
+	 * device should be thought of as always at least 1WAY coherent. At
+	 * least on current dgpu HW there is no way to turn off snooping, so
+	 * the different coherency modes of the pat_index likely make no
+	 * difference for system memory.
+	 */
+#define DRM_XE_GEM_COH_NONE		1
+#define DRM_XE_GEM_COH_AT_LEAST_1WAY	2
+	__u16 coh_mode;
+
+	/**
+	 * @cpu_caching: The CPU caching mode to select for this object. If
+	 * mmapping the object the mode selected here will also be used.
+	 *
+	 * Supported values:
+	 *
+	 * DRM_XE_GEM_CPU_CACHING_WB: Allocate the pages with write-back caching.
+	 * On iGPU this can't be used for scanout surfaces. The @coh_mode must
+	 * be DRM_XE_GEM_COH_AT_LEAST_1WAY. Currently not allowed for objects placed
+	 * in VRAM.
+	 *
+	 * DRM_XE_GEM_CPU_CACHING_WC: Allocate the pages as write-combined. This is
+	 * uncached. Any @coh_mode is permitted. Scanout surfaces should likely
+	 * use this. All objects that can be placed in VRAM must use this.
+	 */
+#define DRM_XE_GEM_CPU_CACHING_WB                      1
+#define DRM_XE_GEM_CPU_CACHING_WC                      2
+	__u16 cpu_caching;
 
 	/** @reserved: Reserved */
 	__u64 reserved[2];
@@ -626,8 +672,49 @@ struct drm_xe_vm_bind_op {
 	 */
 	__u32 obj;
 
+	/**
+	 * @pat_index: The platform defined @pat_index to use for this mapping.
+	 * The index basically maps to some predefined memory attributes,
+	 * including things like caching, coherency, compression etc.  The exact
+	 * meaning of the pat_index is platform specific and defined in the
+	 * Bspec and PRMs.  When the KMD sets up the binding the index here is
+	 * encoded into the ppGTT PTE.
+	 *
+	 * For coherency the @pat_index needs to be at least as coherent as
+	 * drm_xe_gem_create.coh_mode. i.e coh_mode(pat_index) >=
+	 * drm_xe_gem_create.coh_mode. The KMD will extract the coherency mode
+	 * from the @pat_index and reject if there is a mismatch (see note below
+	 * for pre-MTL platforms).
+	 *
+	 * Note: On pre-MTL platforms there is only a caching mode and no
+	 * explicit coherency mode, but on such hardware there is always a
+	 * shared-LLC (or it is a dgpu) so all GT memory accesses are coherent with
+	 * CPU caches even with the caching mode set as uncached.  It's only the
+	 * display engine that is incoherent (on dgpu it must be in VRAM which
+	 * is always mapped as WC on the CPU). However to keep the uapi somewhat
+	 * consistent with newer platforms the KMD groups the different cache
+	 * levels into the following coherency buckets on all pre-MTL platforms:
+	 *
+	 *	ppGTT UC -> DRM_XE_GEM_COH_NONE
+	 *	ppGTT WC -> DRM_XE_GEM_COH_NONE
+	 *	ppGTT WT -> DRM_XE_GEM_COH_NONE
+	 *	ppGTT WB -> DRM_XE_GEM_COH_AT_LEAST_1WAY
+	 *
+	 * In practice UC/WC/WT should only ever be used for scanout surfaces on
+	 * such platforms (or perhaps in general for dma-buf if shared with
+	 * another device) since it is only the display engine that is actually
+	 * incoherent.  Everything else should typically use WB given that we
+	 * have a shared-LLC.  On MTL+ this completely changes and the HW
+	 * defines the coherency mode as part of the @pat_index, where
+	 * incoherent GT access is possible.
+	 *
+	 * Note: For userptr and externally imported dma-buf the kernel expects
+	 * either 1WAY or 2WAY for the @pat_index.
+	 */
+	__u16 pat_index;
+
 	/** @pad: MBZ */
-	__u32 pad;
+	__u16 pad;
 
 	union {
 		/**
-- 
2.41.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread
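
For reference, a minimal userspace sketch of how the new fields might be
consumed (not part of the series; it assumes the xe_drm.h from this patch,
an already-open Xe fd, and omits drmIoctl-style restart handling):

	#include <errno.h>
	#include <stdint.h>
	#include <sys/ioctl.h>

	#include "xe_drm.h"

	/*
	 * Create a BO with write-back CPU caching and at-least-1way
	 * coherency. "placement" would typically come from the memory
	 * region query, e.g. system memory.
	 */
	static int create_wb_bo(int fd, uint64_t size, uint32_t placement,
				uint32_t *handle)
	{
		struct drm_xe_gem_create create = {
			.size = size,
			.flags = placement,
			.cpu_caching = DRM_XE_GEM_CPU_CACHING_WB,
			.coh_mode = DRM_XE_GEM_COH_AT_LEAST_1WAY,
		};

		if (ioctl(fd, DRM_IOCTL_XE_GEM_CREATE, &create))
			return -errno;

		*handle = create.handle;
		return 0;
	}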

* [igt-dev] [PATCH i-g-t v4 02/15] lib/igt_fb: mark buffers as SCANOUT
  2023-10-19 14:40 [igt-dev] [PATCH i-g-t v4 00/15] PAT and cache coherency support Matthew Auld
  2023-10-19 14:40 ` [igt-dev] [PATCH i-g-t v4 01/15] drm-uapi/xe_drm: sync to get pat and coherency bits Matthew Auld
@ 2023-10-19 14:40 ` Matthew Auld
  2023-10-19 14:40 ` [igt-dev] [PATCH i-g-t v4 03/15] lib/igt_draw: " Matthew Auld
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Matthew Auld @ 2023-10-19 14:40 UTC (permalink / raw)
  To: igt-dev

Display buffers will likely want WC instead of the default WB on the
CPU side, given that the display engine is incoherent with CPU caches.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Reviewed-by: Pallavi Mishra <pallavi.mishra@intel.com>
---
 lib/igt_fb.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/igt_fb.c b/lib/igt_fb.c
index e531a041e..ad0148339 100644
--- a/lib/igt_fb.c
+++ b/lib/igt_fb.c
@@ -1206,7 +1206,8 @@ static int create_bo_for_fb(struct igt_fb *fb, bool prefer_sysmem)
 			igt_assert(err == 0 || err == -EOPNOTSUPP);
 		} else if (is_xe_device(fd)) {
 			fb->gem_handle = xe_bo_create_flags(fd, 0, fb->size,
-							visible_vram_if_possible(fd, 0));
+							    visible_vram_if_possible(fd, 0) |
+							    XE_GEM_CREATE_FLAG_SCANOUT);
 		} else if (is_vc4_device(fd)) {
 			fb->gem_handle = igt_vc4_create_bo(fd, fb->size);
 
-- 
2.41.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [igt-dev] [PATCH i-g-t v4 03/15] lib/igt_draw: mark buffers as SCANOUT
  2023-10-19 14:40 [igt-dev] [PATCH i-g-t v4 00/15] PAT and cache coherency support Matthew Auld
  2023-10-19 14:40 ` [igt-dev] [PATCH i-g-t v4 01/15] drm-uapi/xe_drm: sync to get pat and coherency bits Matthew Auld
  2023-10-19 14:40 ` [igt-dev] [PATCH i-g-t v4 02/15] lib/igt_fb: mark buffers as SCANOUT Matthew Auld
@ 2023-10-19 14:40 ` Matthew Auld
  2023-10-19 14:40 ` [igt-dev] [PATCH i-g-t v4 04/15] lib/xe: support cpu_caching and coh_mode for gem_create Matthew Auld
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Matthew Auld @ 2023-10-19 14:40 UTC (permalink / raw)
  To: igt-dev

Display buffers will likely want WC instead of the default WB on the
CPU side, given that the display engine is incoherent with CPU caches.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Reviewed-by: Pallavi Mishra <pallavi.mishra@intel.com>
---
 lib/igt_draw.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/igt_draw.c b/lib/igt_draw.c
index 9a7664a37..1cf9d87c9 100644
--- a/lib/igt_draw.c
+++ b/lib/igt_draw.c
@@ -797,7 +797,8 @@ static void draw_rect_render(int fd, struct cmd_data *cmd_data,
 	else
 		tmp.handle = xe_bo_create_flags(fd, 0,
 						ALIGN(tmp.size, xe_get_default_alignment(fd)),
-						visible_vram_if_possible(fd, 0));
+						visible_vram_if_possible(fd, 0) |
+						XE_GEM_CREATE_FLAG_SCANOUT);
 
 	tmp.stride = rect->w * pixel_size;
 	tmp.bpp = buf->bpp;
-- 
2.41.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [igt-dev] [PATCH i-g-t v4 04/15] lib/xe: support cpu_caching and coh_mode for gem_create
  2023-10-19 14:40 [igt-dev] [PATCH i-g-t v4 00/15] PAT and cache coherency support Matthew Auld
                   ` (2 preceding siblings ...)
  2023-10-19 14:40 ` [igt-dev] [PATCH i-g-t v4 03/15] lib/igt_draw: " Matthew Auld
@ 2023-10-19 14:40 ` Matthew Auld
  2023-10-19 14:40 ` [igt-dev] [PATCH i-g-t v4 05/15] tests/xe/mmap: add some tests for cpu_caching and coh_mode Matthew Auld
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Matthew Auld @ 2023-10-19 14:40 UTC (permalink / raw)
  To: igt-dev

Most tests shouldn't care about such things, so it's likely just a case of
picking the most sane default. However, we also add some helpers for the
tests that do care.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Reviewed-by: Pallavi Mishra <pallavi.mishra@intel.com>
---
 lib/xe/xe_ioctl.c       | 65 ++++++++++++++++++++++++++++++++++-------
 lib/xe/xe_ioctl.h       |  8 +++++
 tests/intel/xe_create.c |  3 ++
 3 files changed, 65 insertions(+), 11 deletions(-)

diff --git a/lib/xe/xe_ioctl.c b/lib/xe/xe_ioctl.c
index 895e3bd4e..4cf44f1ee 100644
--- a/lib/xe/xe_ioctl.c
+++ b/lib/xe/xe_ioctl.c
@@ -226,13 +226,30 @@ void xe_vm_destroy(int fd, uint32_t vm)
 	igt_assert_eq(igt_ioctl(fd, DRM_IOCTL_XE_VM_DESTROY, &destroy), 0);
 }
 
-uint32_t __xe_bo_create_flags(int fd, uint32_t vm, uint64_t size, uint32_t flags,
-			      uint32_t *handle)
+void __xe_default_coh_caching_from_flags(int fd, uint32_t flags,
+					 uint16_t *cpu_caching,
+					 uint16_t *coh_mode)
+{
+	if ((flags & all_memory_regions(fd)) != system_memory(fd) ||
+	    flags & XE_GEM_CREATE_FLAG_SCANOUT) {
+		/* VRAM placements or scanout should always use WC */
+		*cpu_caching = DRM_XE_GEM_CPU_CACHING_WC;
+		*coh_mode = DRM_XE_GEM_COH_NONE;
+	} else {
+		*cpu_caching = DRM_XE_GEM_CPU_CACHING_WB;
+		*coh_mode = DRM_XE_GEM_COH_AT_LEAST_1WAY;
+	}
+}
+
+static uint32_t ___xe_bo_create_flags(int fd, uint32_t vm, uint64_t size, uint32_t flags,
+				      uint16_t cpu_caching, uint16_t coh_mode, uint32_t *handle)
 {
 	struct drm_xe_gem_create create = {
 		.vm_id = vm,
 		.size = size,
 		.flags = flags,
+		.cpu_caching = cpu_caching,
+		.coh_mode = coh_mode,
 	};
 	int err;
 
@@ -242,6 +259,18 @@ uint32_t __xe_bo_create_flags(int fd, uint32_t vm, uint64_t size, uint32_t flags
 
 	*handle = create.handle;
 	return 0;
+
+}
+
+uint32_t __xe_bo_create_flags(int fd, uint32_t vm, uint64_t size, uint32_t flags,
+			      uint32_t *handle)
+{
+	uint16_t cpu_caching, coh_mode;
+
+	__xe_default_coh_caching_from_flags(fd, flags, &cpu_caching, &coh_mode);
+
+	return ___xe_bo_create_flags(fd, vm, size, flags, cpu_caching, coh_mode,
+				     handle);
 }
 
 uint32_t xe_bo_create_flags(int fd, uint32_t vm, uint64_t size, uint32_t flags)
@@ -253,19 +282,33 @@ uint32_t xe_bo_create_flags(int fd, uint32_t vm, uint64_t size, uint32_t flags)
 	return handle;
 }
 
+uint32_t __xe_bo_create_caching(int fd, uint32_t vm, uint64_t size, uint32_t flags,
+				uint16_t cpu_caching, uint16_t coh_mode,
+				uint32_t *handle)
+{
+	return ___xe_bo_create_flags(fd, vm, size, flags, cpu_caching, coh_mode,
+				     handle);
+}
+
+uint32_t xe_bo_create_caching(int fd, uint32_t vm, uint64_t size, uint32_t flags,
+			      uint16_t cpu_caching, uint16_t coh_mode)
+{
+	uint32_t handle;
+
+	igt_assert_eq(__xe_bo_create_caching(fd, vm, size, flags,
+					     cpu_caching, coh_mode, &handle), 0);
+
+	return handle;
+}
+
 uint32_t xe_bo_create(int fd, int gt, uint32_t vm, uint64_t size)
 {
-	struct drm_xe_gem_create create = {
-		.vm_id = vm,
-		.size = size,
-		.flags = vram_if_possible(fd, gt),
-	};
-	int err;
+	uint32_t handle;
 
-	err = igt_ioctl(fd, DRM_IOCTL_XE_GEM_CREATE, &create);
-	igt_assert_eq(err, 0);
+	igt_assert_eq(__xe_bo_create_flags(fd, vm, size, vram_if_possible(fd, gt),
+					   &handle), 0);
 
-	return create.handle;
+	return handle;
 }
 
 uint32_t xe_bind_exec_queue_create(int fd, uint32_t vm, uint64_t ext, bool async)
diff --git a/lib/xe/xe_ioctl.h b/lib/xe/xe_ioctl.h
index a8dbcf376..e3f62a28a 100644
--- a/lib/xe/xe_ioctl.h
+++ b/lib/xe/xe_ioctl.h
@@ -67,6 +67,14 @@ void xe_vm_destroy(int fd, uint32_t vm);
 uint32_t __xe_bo_create_flags(int fd, uint32_t vm, uint64_t size, uint32_t flags,
 			      uint32_t *handle);
 uint32_t xe_bo_create_flags(int fd, uint32_t vm, uint64_t size, uint32_t flags);
+uint32_t __xe_bo_create_caching(int fd, uint32_t vm, uint64_t size, uint32_t flags,
+				uint16_t cpu_caching, uint16_t coh_mode,
+				uint32_t *handle);
+uint32_t xe_bo_create_caching(int fd, uint32_t vm, uint64_t size, uint32_t flags,
+			      uint16_t cpu_caching, uint16_t coh_mode);
+void __xe_default_coh_caching_from_flags(int fd, uint32_t flags,
+					 uint16_t *cpu_caching,
+					 uint16_t *coh_mode);
 uint32_t xe_bo_create(int fd, int gt, uint32_t vm, uint64_t size);
 uint32_t xe_exec_queue_create(int fd, uint32_t vm,
 			  struct drm_xe_engine_class_instance *instance,
diff --git a/tests/intel/xe_create.c b/tests/intel/xe_create.c
index d99bd51cf..ae8c501f6 100644
--- a/tests/intel/xe_create.c
+++ b/tests/intel/xe_create.c
@@ -30,6 +30,9 @@ static int __create_bo(int fd, uint32_t vm, uint64_t size, uint32_t flags,
 
 	igt_assert(handlep);
 
+	__xe_default_coh_caching_from_flags(fd, flags, &create.cpu_caching,
+					    &create.coh_mode);
+
 	if (igt_ioctl(fd, DRM_IOCTL_XE_GEM_CREATE, &create)) {
 		ret = -errno;
 		errno = 0;
-- 
2.41.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread
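
To illustrate the resulting library interface, a rough sketch of how a
test could use the new helpers (not part of the patch; fd, vm and size are
assumed to come from the surrounding test):

	uint16_t cpu_caching, coh_mode;
	uint32_t bo;

	/* Default path: the library picks WB + AT_LEAST_1WAY for plain
	 * system memory, and WC + COH_NONE for VRAM or scanout placements. */
	bo = xe_bo_create_flags(fd, vm, size, system_memory(fd));

	/* Explicit path: force WC on a system memory object. */
	bo = xe_bo_create_caching(fd, vm, size, system_memory(fd),
				  DRM_XE_GEM_CPU_CACHING_WC,
				  DRM_XE_GEM_COH_NONE);

	/* Query what the defaults would be for a given set of flags. */
	__xe_default_coh_caching_from_flags(fd, system_memory(fd),
					    &cpu_caching, &coh_mode);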

* [igt-dev] [PATCH i-g-t v4 05/15] tests/xe/mmap: add some tests for cpu_caching and coh_mode
  2023-10-19 14:40 [igt-dev] [PATCH i-g-t v4 00/15] PAT and cache coherency support Matthew Auld
                   ` (3 preceding siblings ...)
  2023-10-19 14:40 ` [igt-dev] [PATCH i-g-t v4 04/15] lib/xe: support cpu_caching and coh_mode for gem_create Matthew Auld
@ 2023-10-19 14:40 ` Matthew Auld
  2023-10-19 14:40 ` [igt-dev] [PATCH i-g-t v4 06/15] lib/intel_pat: add helpers for common pat_index modes Matthew Auld
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Matthew Auld @ 2023-10-19 14:40 UTC (permalink / raw)
  To: igt-dev

Ensure the various invalid combinations are rejected. Also ensure we can
mmap and fault anything that is valid.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Reviewed-by: Pallavi Mishra <pallavi.mishra@intel.com>
---
 tests/intel/xe_mmap.c | 77 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/tests/intel/xe_mmap.c b/tests/intel/xe_mmap.c
index 7e7e43c00..09e9c8aae 100644
--- a/tests/intel/xe_mmap.c
+++ b/tests/intel/xe_mmap.c
@@ -199,6 +199,80 @@ static void test_small_bar(int fd)
 	gem_close(fd, bo);
 }
 
+static void assert_caching(int fd, uint64_t flags, uint16_t cpu_caching,
+			   uint16_t coh_mode, bool fail)
+{
+	uint64_t size = xe_get_default_alignment(fd);
+	uint64_t mmo;
+	uint32_t handle;
+	uint32_t *map;
+	bool ret;
+
+	ret = __xe_bo_create_caching(fd, 0, size, flags, cpu_caching,
+				     coh_mode, &handle);
+	igt_assert(ret == fail);
+
+	if (fail)
+		return;
+
+	mmo = xe_bo_mmap_offset(fd, handle);
+	map = mmap(NULL, size, PROT_WRITE, MAP_SHARED, fd, mmo);
+	igt_assert(map != MAP_FAILED);
+	map[0] = 0xdeadbeaf;
+	gem_close(fd, handle);
+}
+
+/**
+ * SUBTEST: cpu-caching-coh
+ * Description: Test cpu_caching and coh, including mmap behaviour.
+ * Test category: functionality test
+ */
+static void test_cpu_caching(int fd)
+{
+	if (vram_memory(fd, 0)) {
+		assert_caching(fd, vram_memory(fd, 0),
+			       DRM_XE_GEM_CPU_CACHING_WC, DRM_XE_GEM_COH_NONE,
+			       false);
+		assert_caching(fd, vram_memory(fd, 0),
+			       DRM_XE_GEM_CPU_CACHING_WC, DRM_XE_GEM_COH_AT_LEAST_1WAY,
+			       false);
+		assert_caching(fd, vram_memory(fd, 0) | system_memory(fd),
+			       DRM_XE_GEM_CPU_CACHING_WC, DRM_XE_GEM_COH_NONE,
+			       false);
+
+		assert_caching(fd, vram_memory(fd, 0),
+			       DRM_XE_GEM_CPU_CACHING_WB, DRM_XE_GEM_COH_NONE,
+			       true);
+		assert_caching(fd, vram_memory(fd, 0),
+			       DRM_XE_GEM_CPU_CACHING_WB, DRM_XE_GEM_COH_AT_LEAST_1WAY,
+			       true);
+		assert_caching(fd, vram_memory(fd, 0) | system_memory(fd),
+			       DRM_XE_GEM_CPU_CACHING_WB, DRM_XE_GEM_COH_NONE,
+			       true);
+		assert_caching(fd, vram_memory(fd, 0) | system_memory(fd),
+			       DRM_XE_GEM_CPU_CACHING_WB, DRM_XE_GEM_COH_AT_LEAST_1WAY,
+			       true);
+	}
+
+	assert_caching(fd, system_memory(fd), DRM_XE_GEM_CPU_CACHING_WB,
+		       DRM_XE_GEM_COH_AT_LEAST_1WAY, false);
+	assert_caching(fd, system_memory(fd), DRM_XE_GEM_CPU_CACHING_WC,
+		       DRM_XE_GEM_COH_NONE, false);
+	assert_caching(fd, system_memory(fd), DRM_XE_GEM_CPU_CACHING_WC,
+		       DRM_XE_GEM_COH_AT_LEAST_1WAY, false);
+
+	assert_caching(fd, system_memory(fd), DRM_XE_GEM_CPU_CACHING_WB,
+		       DRM_XE_GEM_COH_NONE, true);
+	assert_caching(fd, system_memory(fd), -1, -1, true);
+	assert_caching(fd, system_memory(fd), 0, 0, true);
+	assert_caching(fd, system_memory(fd), 0, DRM_XE_GEM_COH_AT_LEAST_1WAY, true);
+	assert_caching(fd, system_memory(fd), DRM_XE_GEM_CPU_CACHING_WC, 0, true);
+	assert_caching(fd, system_memory(fd), DRM_XE_GEM_CPU_CACHING_WC + 1,
+		       DRM_XE_GEM_COH_AT_LEAST_1WAY, true);
+	assert_caching(fd, system_memory(fd), DRM_XE_GEM_CPU_CACHING_WC,
+		       DRM_XE_GEM_COH_AT_LEAST_1WAY + 1, true);
+}
+
 igt_main
 {
 	int fd;
@@ -230,6 +304,9 @@ igt_main
 		test_small_bar(fd);
 	}
 
+	igt_subtest("cpu-caching-coh")
+		test_cpu_caching(fd);
+
 	igt_fixture
 		drm_close_driver(fd);
 }
-- 
2.41.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [igt-dev] [PATCH i-g-t v4 06/15] lib/intel_pat: add helpers for common pat_index modes
  2023-10-19 14:40 [igt-dev] [PATCH i-g-t v4 00/15] PAT and cache coherency support Matthew Auld
                   ` (4 preceding siblings ...)
  2023-10-19 14:40 ` [igt-dev] [PATCH i-g-t v4 05/15] tests/xe/mmap: add some tests for cpu_caching and coh_mode Matthew Auld
@ 2023-10-19 14:40 ` Matthew Auld
  2023-10-19 14:40 ` [igt-dev] [PATCH i-g-t v4 07/15] lib/allocator: add get_offset_pat_index() helper Matthew Auld
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Matthew Auld @ 2023-10-19 14:40 UTC (permalink / raw)
  To: igt-dev

For now just add uc, wt and wb for every platform. The wb mode should
always be at least 1way coherent when used with system memory. Also make
unrecognised platforms throw an error rather than trying to inherit the
modes from previous platforms, since they will likely be different.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Reviewed-by: Pallavi Mishra <pallavi.mishra@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
 lib/intel_pat.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++++
 lib/intel_pat.h | 19 ++++++++++++
 lib/meson.build |  1 +
 3 files changed, 97 insertions(+)
 create mode 100644 lib/intel_pat.c
 create mode 100644 lib/intel_pat.h

diff --git a/lib/intel_pat.c b/lib/intel_pat.c
new file mode 100644
index 000000000..2b892ee52
--- /dev/null
+++ b/lib/intel_pat.c
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#include "intel_pat.h"
+
+#include "igt.h"
+
+struct intel_pat_cache {
+	uint8_t uc; /* UC + COH_NONE */
+	uint8_t wt; /* WT + COH_NONE */
+	uint8_t wb; /* WB + COH_AT_LEAST_1WAY */
+
+	uint8_t max_index;
+};
+
+static void intel_get_pat_idx(int fd, struct intel_pat_cache *pat)
+{
+	uint16_t dev_id = intel_get_drm_devid(fd);
+
+	if (intel_get_device_info(dev_id)->graphics_ver == 20) {
+		pat->uc = 3;
+		pat->wt = 15; /* Compressed + WB-transient */
+		pat->wb = 2;
+		pat->max_index = 31;
+	} else if (IS_METEORLAKE(dev_id)) {
+		pat->uc = 2;
+		pat->wt = 1;
+		pat->wb = 3;
+		pat->max_index = 3;
+	} else if (IS_PONTEVECCHIO(dev_id)) {
+		pat->uc = 0;
+		pat->wt = 2;
+		pat->wb = 3;
+		pat->max_index = 7;
+	} else if (intel_graphics_ver(dev_id) <= IP_VER(12, 60)) {
+		pat->uc = 3;
+		pat->wt = 2;
+		pat->wb = 0;
+		pat->max_index = 3;
+	} else {
+		igt_critical("Platform is missing PAT settings for uc/wt/wb\n");
+	}
+}
+
+uint8_t intel_get_max_pat_index(int fd)
+{
+	struct intel_pat_cache pat = {};
+
+	intel_get_pat_idx(fd, &pat);
+	return pat.max_index;
+}
+
+uint8_t intel_get_pat_idx_uc(int fd)
+{
+	struct intel_pat_cache pat = {};
+
+	intel_get_pat_idx(fd, &pat);
+	return pat.uc;
+}
+
+uint8_t intel_get_pat_idx_wt(int fd)
+{
+	struct intel_pat_cache pat = {};
+
+	intel_get_pat_idx(fd, &pat);
+	return pat.wt;
+}
+
+uint8_t intel_get_pat_idx_wb(int fd)
+{
+	struct intel_pat_cache pat = {};
+
+	intel_get_pat_idx(fd, &pat);
+	return pat.wb;
+}
diff --git a/lib/intel_pat.h b/lib/intel_pat.h
new file mode 100644
index 000000000..c24dbc275
--- /dev/null
+++ b/lib/intel_pat.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#ifndef INTEL_PAT_H
+#define INTEL_PAT_H
+
+#include <stdint.h>
+
+#define DEFAULT_PAT_INDEX ((uint8_t)-1) /* igt-core can pick 1way or better */
+
+uint8_t intel_get_max_pat_index(int fd);
+
+uint8_t intel_get_pat_idx_uc(int fd);
+uint8_t intel_get_pat_idx_wt(int fd);
+uint8_t intel_get_pat_idx_wb(int fd);
+
+#endif /* INTEL_PAT_H */
diff --git a/lib/meson.build b/lib/meson.build
index a7bccafc3..48466a2e9 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -64,6 +64,7 @@ lib_sources = [
 	'intel_device_info.c',
 	'intel_mmio.c',
 	'intel_mocs.c',
+	'intel_pat.c',
 	'ioctl_wrappers.c',
 	'media_spin.c',
 	'media_fill.c',
-- 
2.41.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread
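
A small usage sketch (not part of the patch; "scanout" here is a
hypothetical flag owned by the calling test):

	#include "intel_pat.h"

	/* Scanout mappings want an uncached pat_index since the display
	 * engine is incoherent with CPU caches; everything else can stay
	 * on the default wb mode. */
	uint8_t pat_index = scanout ? intel_get_pat_idx_uc(fd)
				    : intel_get_pat_idx_wb(fd);

	igt_assert(pat_index <= intel_get_max_pat_index(fd));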

* [igt-dev] [PATCH i-g-t v4 07/15] lib/allocator: add get_offset_pat_index() helper
  2023-10-19 14:40 [igt-dev] [PATCH i-g-t v4 00/15] PAT and cache coherency support Matthew Auld
                   ` (5 preceding siblings ...)
  2023-10-19 14:40 ` [igt-dev] [PATCH i-g-t v4 06/15] lib/intel_pat: add helpers for common pat_index modes Matthew Auld
@ 2023-10-19 14:40 ` Matthew Auld
  2023-10-19 14:40 ` [igt-dev] [PATCH i-g-t v4 08/15] lib/intel_blt: support pat_index Matthew Auld
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Matthew Auld @ 2023-10-19 14:40 UTC (permalink / raw)
  To: igt-dev

In some cases we are going to need to pass the pat_index for the
vm_bind op. Add a helper for this, so that we can allocate an address
and give the mapping a specific pat_index.

v2 (Zbigniew)
  - Plumb pat_index down into intel_allocator_record and protect against
    potential changes.
  - Add pat_index to bind_debug().
v3 (Zbigniew)
   - A few more improvements

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Reviewed-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
---
 lib/intel_allocator.c             | 58 ++++++++++++++++++++++---------
 lib/intel_allocator.h             |  7 ++--
 lib/intel_allocator_msgchannel.h  |  1 +
 lib/intel_allocator_reloc.c       |  5 ++-
 lib/intel_allocator_simple.c      |  5 ++-
 lib/xe/xe_util.c                  |  1 +
 lib/xe/xe_util.h                  |  1 +
 tests/intel/api_intel_allocator.c |  4 ++-
 8 files changed, 61 insertions(+), 21 deletions(-)

diff --git a/lib/intel_allocator.c b/lib/intel_allocator.c
index f0a9b7fb5..e5b9457b8 100644
--- a/lib/intel_allocator.c
+++ b/lib/intel_allocator.c
@@ -16,6 +16,7 @@
 #include "igt_map.h"
 #include "intel_allocator.h"
 #include "intel_allocator_msgchannel.h"
+#include "intel_pat.h"
 #include "xe/xe_query.h"
 #include "xe/xe_util.h"
 
@@ -92,6 +93,7 @@ struct allocator_object {
 	uint32_t handle;
 	uint64_t offset;
 	uint64_t size;
+	uint8_t pat_index;
 
 	enum allocator_bind_op bind_op;
 };
@@ -590,16 +592,17 @@ static int handle_request(struct alloc_req *req, struct alloc_resp *resp)
 							req->alloc.handle,
 							req->alloc.size,
 							req->alloc.alignment,
+							req->alloc.pat_index,
 							req->alloc.strategy);
 			alloc_info("<alloc> [tid: %ld] ahnd: %" PRIx64
 				   ", ctx: %u, vm: %u, handle: %u"
 				   ", size: 0x%" PRIx64 ", offset: 0x%" PRIx64
-				   ", alignment: 0x%" PRIx64 ", strategy: %u\n",
+				   ", alignment: 0x%" PRIx64 ", pat_index: %u, strategy: %u\n",
 				   (long) req->tid, req->allocator_handle,
 				   al->ctx, al->vm,
 				   req->alloc.handle, req->alloc.size,
 				   resp->alloc.offset, req->alloc.alignment,
-				   req->alloc.strategy);
+				   req->alloc.pat_index, req->alloc.strategy);
 			break;
 
 		case REQ_FREE:
@@ -1122,24 +1125,24 @@ void intel_allocator_get_address_range(uint64_t allocator_handle,
 
 static bool is_same(struct allocator_object *obj,
 		    uint32_t handle, uint64_t offset, uint64_t size,
-		    enum allocator_bind_op bind_op)
+		    uint8_t pat_index, enum allocator_bind_op bind_op)
 {
 	return obj->handle == handle &&	obj->offset == offset && obj->size == size &&
-	       (obj->bind_op == bind_op || obj->bind_op == BOUND);
+	       obj->pat_index == pat_index && (obj->bind_op == bind_op || obj->bind_op == BOUND);
 }
 
 static void track_object(uint64_t allocator_handle, uint32_t handle,
-			 uint64_t offset, uint64_t size,
+			 uint64_t offset, uint64_t size, uint8_t pat_index,
 			 enum allocator_bind_op bind_op)
 {
 	struct ahnd_info *ainfo;
 	struct allocator_object *obj;
 
-	bind_debug("[TRACK OBJECT]: [%s] pid: %d, tid: %d, ahnd: %llx, handle: %u, offset: %llx, size: %llx\n",
+	bind_debug("[TRACK OBJECT]: [%s] pid: %d, tid: %d, ahnd: %llx, handle: %u, offset: %llx, size: %llx, pat_index: %u\n",
 		   bind_op == TO_BIND ? "BIND" : "UNBIND",
 		   getpid(), gettid(),
 		   (long long)allocator_handle,
-		   handle, (long long)offset, (long long)size);
+		   handle, (long long)offset, (long long)size, pat_index);
 
 	if (offset == ALLOC_INVALID_ADDRESS) {
 		bind_debug("[TRACK OBJECT] => invalid address %llx, skipping tracking\n",
@@ -1156,6 +1159,9 @@ static void track_object(uint64_t allocator_handle, uint32_t handle,
 	if (ainfo->driver == INTEL_DRIVER_I915)
 		return; /* no-op for i915, at least for now */
 
+	if (pat_index == DEFAULT_PAT_INDEX)
+		pat_index = intel_get_pat_idx_wb(ainfo->fd);
+
 	pthread_mutex_lock(&ainfo->bind_map_mutex);
 	obj = igt_map_search(ainfo->bind_map, &handle);
 	if (obj) {
@@ -1165,7 +1171,7 @@ static void track_object(uint64_t allocator_handle, uint32_t handle,
 		 * bind_map.
 		 */
 		if (bind_op == TO_BIND) {
-			igt_assert_eq(is_same(obj, handle, offset, size, bind_op), true);
+			igt_assert_eq(is_same(obj, handle, offset, size, pat_index, bind_op), true);
 		} else if (bind_op == TO_UNBIND) {
 			if (obj->bind_op == TO_BIND)
 				igt_map_remove(ainfo->bind_map, &obj->handle, map_entry_free_func);
@@ -1181,6 +1187,7 @@ static void track_object(uint64_t allocator_handle, uint32_t handle,
 		obj->handle = handle;
 		obj->offset = offset;
 		obj->size = size;
+		obj->pat_index = pat_index;
 		obj->bind_op = bind_op;
 		igt_map_insert(ainfo->bind_map, &obj->handle, obj);
 	}
@@ -1194,6 +1201,8 @@ out:
  * @handle: handle to an object
  * @size: size of an object
  * @alignment: determines object alignment
+ * @pat_index: chosen pat_index for the binding
+ * @strategy: chosen allocator strategy
  *
  * Function finds and returns the most suitable offset with given @alignment
  * for an object with @size identified by the @handle.
@@ -1204,14 +1213,16 @@ out:
  */
 uint64_t __intel_allocator_alloc(uint64_t allocator_handle, uint32_t handle,
 				 uint64_t size, uint64_t alignment,
-				 enum allocator_strategy strategy)
+				 uint8_t pat_index, enum allocator_strategy strategy)
 {
 	struct alloc_req req = { .request_type = REQ_ALLOC,
 				 .allocator_handle = allocator_handle,
 				 .alloc.handle = handle,
 				 .alloc.size = size,
 				 .alloc.strategy = strategy,
-				 .alloc.alignment = alignment };
+				 .alloc.alignment = alignment,
+				 .alloc.pat_index = pat_index,
+	};
 	struct alloc_resp resp;
 
 	igt_assert((alignment & (alignment-1)) == 0);
@@ -1219,7 +1230,8 @@ uint64_t __intel_allocator_alloc(uint64_t allocator_handle, uint32_t handle,
 	igt_assert(handle_request(&req, &resp) == 0);
 	igt_assert(resp.response_type == RESP_ALLOC);
 
-	track_object(allocator_handle, handle, resp.alloc.offset, size, TO_BIND);
+	track_object(allocator_handle, handle, resp.alloc.offset, size, pat_index,
+		     TO_BIND);
 
 	return resp.alloc.offset;
 }
@@ -1241,7 +1253,7 @@ uint64_t intel_allocator_alloc(uint64_t allocator_handle, uint32_t handle,
 	uint64_t offset;
 
 	offset = __intel_allocator_alloc(allocator_handle, handle,
-					 size, alignment,
+					 size, alignment, DEFAULT_PAT_INDEX,
 					 ALLOC_STRATEGY_NONE);
 	igt_assert(offset != ALLOC_INVALID_ADDRESS);
 
@@ -1268,7 +1280,8 @@ uint64_t intel_allocator_alloc_with_strategy(uint64_t allocator_handle,
 	uint64_t offset;
 
 	offset = __intel_allocator_alloc(allocator_handle, handle,
-					 size, alignment, strategy);
+					 size, alignment, DEFAULT_PAT_INDEX,
+					 strategy);
 	igt_assert(offset != ALLOC_INVALID_ADDRESS);
 
 	return offset;
@@ -1298,7 +1311,7 @@ bool intel_allocator_free(uint64_t allocator_handle, uint32_t handle)
 	igt_assert(handle_request(&req, &resp) == 0);
 	igt_assert(resp.response_type == RESP_FREE);
 
-	track_object(allocator_handle, handle, 0, 0, TO_UNBIND);
+	track_object(allocator_handle, handle, 0, 0, 0, TO_UNBIND);
 
 	return resp.free.freed;
 }
@@ -1500,16 +1513,17 @@ static void __xe_op_bind(struct ahnd_info *ainfo, uint32_t sync_in, uint32_t syn
 		if (obj->bind_op == BOUND)
 			continue;
 
-		bind_info("= [vm: %u] %s => %u %lx %lx\n",
+		bind_info("= [vm: %u] %s => %u %lx %lx %u\n",
 			  ainfo->vm,
 			  obj->bind_op == TO_BIND ? "TO BIND" : "TO UNBIND",
 			  obj->handle, obj->offset,
-			  obj->size);
+			  obj->size, obj->pat_index);
 
 		entry = malloc(sizeof(*entry));
 		entry->handle = obj->handle;
 		entry->offset = obj->offset;
 		entry->size = obj->size;
+		entry->pat_index = obj->pat_index;
 		entry->bind_op = obj->bind_op == TO_BIND ? XE_OBJECT_BIND :
 							   XE_OBJECT_UNBIND;
 		igt_list_add(&entry->link, &obj_list);
@@ -1534,6 +1548,18 @@ static void __xe_op_bind(struct ahnd_info *ainfo, uint32_t sync_in, uint32_t syn
 	}
 }
 
+uint64_t get_offset_pat_index(uint64_t ahnd, uint32_t handle, uint64_t size,
+			      uint64_t alignment, uint8_t pat_index)
+{
+	uint64_t offset;
+
+	offset = __intel_allocator_alloc(ahnd, handle, size, alignment,
+					 pat_index, ALLOC_STRATEGY_NONE);
+	igt_assert(offset != ALLOC_INVALID_ADDRESS);
+
+	return offset;
+}
+
 /**
  * intel_allocator_bind:
  * @allocator_handle: handle to an allocator
diff --git a/lib/intel_allocator.h b/lib/intel_allocator.h
index f9ff7f1cc..4b6292f06 100644
--- a/lib/intel_allocator.h
+++ b/lib/intel_allocator.h
@@ -144,7 +144,7 @@ struct intel_allocator {
 	void (*get_address_range)(struct intel_allocator *ial,
 				  uint64_t *startp, uint64_t *endp);
 	uint64_t (*alloc)(struct intel_allocator *ial, uint32_t handle,
-			  uint64_t size, uint64_t alignment,
+			  uint64_t size, uint64_t alignment, uint8_t pat_index,
 			  enum allocator_strategy strategy);
 	bool (*is_allocated)(struct intel_allocator *ial, uint32_t handle,
 			     uint64_t size, uint64_t offset);
@@ -186,7 +186,7 @@ bool intel_allocator_close(uint64_t allocator_handle);
 void intel_allocator_get_address_range(uint64_t allocator_handle,
 				       uint64_t *startp, uint64_t *endp);
 uint64_t __intel_allocator_alloc(uint64_t allocator_handle, uint32_t handle,
-				 uint64_t size, uint64_t alignment,
+				 uint64_t size, uint64_t alignment, uint8_t pat_index,
 				 enum allocator_strategy strategy);
 uint64_t intel_allocator_alloc(uint64_t allocator_handle, uint32_t handle,
 			       uint64_t size, uint64_t alignment);
@@ -266,6 +266,9 @@ static inline bool put_ahnd(uint64_t ahnd)
 	return !ahnd || intel_allocator_close(ahnd);
 }
 
+uint64_t get_offset_pat_index(uint64_t ahnd, uint32_t handle, uint64_t size,
+			      uint64_t alignment, uint8_t pat_index);
+
 static inline uint64_t get_offset(uint64_t ahnd, uint32_t handle,
 				  uint64_t size, uint64_t alignment)
 {
diff --git a/lib/intel_allocator_msgchannel.h b/lib/intel_allocator_msgchannel.h
index ba38530fd..55e2e0ed6 100644
--- a/lib/intel_allocator_msgchannel.h
+++ b/lib/intel_allocator_msgchannel.h
@@ -60,6 +60,7 @@ struct alloc_req {
 			uint32_t handle;
 			uint64_t size;
 			uint64_t alignment;
+			uint8_t pat_index;
 			uint8_t strategy;
 		} alloc;
 
diff --git a/lib/intel_allocator_reloc.c b/lib/intel_allocator_reloc.c
index 3aa9ebe76..e7d5dce4a 100644
--- a/lib/intel_allocator_reloc.c
+++ b/lib/intel_allocator_reloc.c
@@ -29,6 +29,7 @@ struct intel_allocator_record {
 	uint32_t handle;
 	uint64_t offset;
 	uint64_t size;
+	uint8_t pat_index;
 };
 
 /* Keep the low 256k clear, for negative deltas */
@@ -54,7 +55,7 @@ static void intel_allocator_reloc_get_address_range(struct intel_allocator *ial,
 
 static uint64_t intel_allocator_reloc_alloc(struct intel_allocator *ial,
 					    uint32_t handle, uint64_t size,
-					    uint64_t alignment,
+					    uint64_t alignment, uint8_t pat_index,
 					    enum allocator_strategy strategy)
 {
 	struct intel_allocator_record *rec;
@@ -67,6 +68,7 @@ static uint64_t intel_allocator_reloc_alloc(struct intel_allocator *ial,
 	if (rec) {
 		offset = rec->offset;
 		igt_assert(rec->size == size);
+		igt_assert(rec->pat_index == pat_index);
 	} else {
 		aligned_offset = ALIGN(ialr->offset, alignment);
 
@@ -84,6 +86,7 @@ static uint64_t intel_allocator_reloc_alloc(struct intel_allocator *ial,
 		rec->handle = handle;
 		rec->offset = offset;
 		rec->size = size;
+		rec->pat_index = pat_index;
 
 		igt_map_insert(ialr->objects, &rec->handle, rec);
 
diff --git a/lib/intel_allocator_simple.c b/lib/intel_allocator_simple.c
index 3d5e45870..25b92db11 100644
--- a/lib/intel_allocator_simple.c
+++ b/lib/intel_allocator_simple.c
@@ -48,6 +48,7 @@ struct intel_allocator_record {
 	uint32_t handle;
 	uint64_t offset;
 	uint64_t size;
+	uint8_t pat_index;
 };
 
 #define simple_vma_foreach_hole(_hole, _heap) \
@@ -371,7 +372,7 @@ static bool simple_vma_heap_alloc_addr(struct intel_allocator_simple *ials,
 
 static uint64_t intel_allocator_simple_alloc(struct intel_allocator *ial,
 					     uint32_t handle, uint64_t size,
-					     uint64_t alignment,
+					     uint64_t alignment, uint8_t pat_index,
 					     enum allocator_strategy strategy)
 {
 	struct intel_allocator_record *rec;
@@ -387,6 +388,7 @@ static uint64_t intel_allocator_simple_alloc(struct intel_allocator *ial,
 	if (rec) {
 		offset = rec->offset;
 		igt_assert(rec->size == size);
+		igt_assert(rec->pat_index == pat_index);
 	} else {
 		if (!simple_vma_heap_alloc(&ials->heap, &offset,
 					   size, alignment, strategy))
@@ -396,6 +398,7 @@ static uint64_t intel_allocator_simple_alloc(struct intel_allocator *ial,
 		rec->handle = handle;
 		rec->offset = offset;
 		rec->size = size;
+		rec->pat_index = pat_index;
 
 		igt_map_insert(ials->objects, &rec->handle, rec);
 		ials->allocated_objects++;
diff --git a/lib/xe/xe_util.c b/lib/xe/xe_util.c
index 5fa4d4610..3610e440c 100644
--- a/lib/xe/xe_util.c
+++ b/lib/xe/xe_util.c
@@ -148,6 +148,7 @@ static struct drm_xe_vm_bind_op *xe_alloc_bind_ops(struct igt_list_head *obj_lis
 		ops->addr = obj->offset;
 		ops->range = obj->size;
 		ops->region = 0;
+		ops->pat_index = obj->pat_index;
 
 		bind_info("  [%d]: [%6s] handle: %u, offset: %llx, size: %llx\n",
 			  i, obj->bind_op == XE_OBJECT_BIND ? "BIND" : "UNBIND",
diff --git a/lib/xe/xe_util.h b/lib/xe/xe_util.h
index e97d236b8..e3bdf3d11 100644
--- a/lib/xe/xe_util.h
+++ b/lib/xe/xe_util.h
@@ -36,6 +36,7 @@ struct xe_object {
 	uint32_t handle;
 	uint64_t offset;
 	uint64_t size;
+	uint8_t pat_index;
 	enum xe_bind_op bind_op;
 	struct igt_list_head link;
 };
diff --git a/tests/intel/api_intel_allocator.c b/tests/intel/api_intel_allocator.c
index f3fcf8a34..d19be3ce9 100644
--- a/tests/intel/api_intel_allocator.c
+++ b/tests/intel/api_intel_allocator.c
@@ -9,6 +9,7 @@
 #include "igt.h"
 #include "igt_aux.h"
 #include "intel_allocator.h"
+#include "intel_pat.h"
 #include "xe/xe_ioctl.h"
 #include "xe/xe_query.h"
 
@@ -131,7 +132,8 @@ static void alloc_simple(int fd)
 
 	intel_allocator_get_address_range(ahnd, &start, &end);
 	offset0 = intel_allocator_alloc(ahnd, 1, end - start, 0);
-	offset1 = __intel_allocator_alloc(ahnd, 2, 4096, 0, ALLOC_STRATEGY_NONE);
+	offset1 = __intel_allocator_alloc(ahnd, 2, 4096, 0, DEFAULT_PAT_INDEX,
+					  ALLOC_STRATEGY_NONE);
 	igt_assert(offset1 == ALLOC_INVALID_ADDRESS);
 	intel_allocator_free(ahnd, 1);
 
-- 
2.41.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread
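
A short sketch of the new helper in use (not part of the patch; ahnd,
handle and size come from the surrounding test, and the alignment comes
from the existing xe helper):

	uint64_t alignment = xe_get_default_alignment(fd);
	uint64_t offset;

	/* Allocate an address and record a write-through pat_index for this
	 * mapping; the allocator remembers it and applies it to the vm_bind
	 * op when intel_allocator_bind() is called. */
	offset = get_offset_pat_index(ahnd, handle, size, alignment,
				      intel_get_pat_idx_wt(fd));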

* [igt-dev] [PATCH i-g-t v4 08/15] lib/intel_blt: support pat_index
  2023-10-19 14:40 [igt-dev] [PATCH i-g-t v4 00/15] PAT and cache coherency support Matthew Auld
                   ` (6 preceding siblings ...)
  2023-10-19 14:40 ` [igt-dev] [PATCH i-g-t v4 07/15] lib/allocator: add get_offset_pat_index() helper Matthew Auld
@ 2023-10-19 14:40 ` Matthew Auld
  2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 09/15] lib/intel_buf: " Matthew Auld
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Matthew Auld @ 2023-10-19 14:40 UTC (permalink / raw)
  To: igt-dev

For the most part we can just use the default wb; however, some users,
including display, might want to use something else.

v2 (Zbigniew):
  - Fix the formatting slightly.
v3:
  - Rebase on mem_object changes.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Reviewed-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
---
 lib/igt_fb.c                    |  2 +
 lib/intel_blt.c                 | 77 +++++++++++++++++++++------------
 lib/intel_blt.h                 | 12 +++--
 tests/intel/gem_ccs.c           | 16 ++++---
 tests/intel/gem_lmem_swapping.c |  4 +-
 tests/intel/xe_ccs.c            | 19 ++++----
 tests/intel/xe_copy_basic.c     |  9 ++--
 7 files changed, 88 insertions(+), 51 deletions(-)

diff --git a/lib/igt_fb.c b/lib/igt_fb.c
index ad0148339..e8f46534e 100644
--- a/lib/igt_fb.c
+++ b/lib/igt_fb.c
@@ -37,6 +37,7 @@
 #include "i915/gem_mman.h"
 #include "intel_blt.h"
 #include "intel_mocs.h"
+#include "intel_pat.h"
 #include "igt_aux.h"
 #include "igt_color_encoding.h"
 #include "igt_fb.h"
@@ -2768,6 +2769,7 @@ static struct blt_copy_object *blt_fb_init(const struct igt_fb *fb,
 
 	blt_set_object(blt, handle, fb->size, memregion,
 		       intel_get_uc_mocs_index(fb->fd),
+		       intel_get_pat_idx_uc(fb->fd),
 		       blt_tile,
 		       is_ccs_modifier(fb->modifier) ? COMPRESSION_ENABLED : COMPRESSION_DISABLED,
 		       is_gen12_mc_ccs_modifier(fb->modifier) ? COMPRESSION_TYPE_MEDIA : COMPRESSION_TYPE_3D);
diff --git a/lib/intel_blt.c b/lib/intel_blt.c
index 28dc9e96b..2e9074eaf 100644
--- a/lib/intel_blt.c
+++ b/lib/intel_blt.c
@@ -14,6 +14,7 @@
 #include "igt_syncobj.h"
 #include "intel_blt.h"
 #include "intel_mocs.h"
+#include "intel_pat.h"
 #include "xe/xe_ioctl.h"
 #include "xe/xe_query.h"
 #include "xe/xe_util.h"
@@ -849,10 +850,12 @@ uint64_t emit_blt_block_copy(int fd,
 	igt_assert_f(blt, "block-copy requires data to do blit\n");
 
 	alignment = get_default_alignment(fd, blt->driver);
-	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, alignment)
-		     + blt->src.plane_offset;
-	dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, alignment)
-		     + blt->dst.plane_offset;
+	src_offset = get_offset_pat_index(ahnd, blt->src.handle, blt->src.size,
+					  alignment, blt->src.pat_index);
+	src_offset += blt->src.plane_offset;
+	dst_offset = get_offset_pat_index(ahnd, blt->dst.handle, blt->dst.size,
+					  alignment, blt->dst.pat_index);
+	dst_offset += blt->dst.plane_offset;
 	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, alignment);
 
 	fill_data(&data, blt, src_offset, dst_offset, ext, ip_ver);
@@ -923,8 +926,10 @@ int blt_block_copy(int fd,
 	igt_assert_neq(blt->driver, 0);
 
 	alignment = get_default_alignment(fd, blt->driver);
-	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, alignment);
-	dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, alignment);
+	src_offset = get_offset_pat_index(ahnd, blt->src.handle, blt->src.size,
+					  alignment, blt->src.pat_index);
+	dst_offset = get_offset_pat_index(ahnd, blt->dst.handle, blt->dst.size,
+					  alignment, blt->dst.pat_index);
 	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, alignment);
 
 	emit_blt_block_copy(fd, ahnd, blt, ext, 0, true);
@@ -1128,8 +1133,10 @@ uint64_t emit_blt_ctrl_surf_copy(int fd,
 	igt_assert_f(surf, "ctrl-surf-copy requires data to do ctrl-surf-copy blit\n");
 
 	alignment = max_t(uint64_t, get_default_alignment(fd, surf->driver), 1ull << 16);
-	src_offset = get_offset(ahnd, surf->src.handle, surf->src.size, alignment);
-	dst_offset = get_offset(ahnd, surf->dst.handle, surf->dst.size, alignment);
+	src_offset = get_offset_pat_index(ahnd, surf->src.handle, surf->src.size,
+					  alignment, surf->src.pat_index);
+	dst_offset = get_offset_pat_index(ahnd, surf->dst.handle, surf->dst.size,
+					  alignment, surf->dst.pat_index);
 	bb_offset = get_offset(ahnd, surf->bb.handle, surf->bb.size, alignment);
 
 	if (ip_ver >= IP_VER(20, 0)) {
@@ -1230,8 +1237,10 @@ int blt_ctrl_surf_copy(int fd,
 	igt_assert_neq(surf->driver, 0);
 
 	alignment = max_t(uint64_t, get_default_alignment(fd, surf->driver), 1ull << 16);
-	src_offset = get_offset(ahnd, surf->src.handle, surf->src.size, alignment);
-	dst_offset = get_offset(ahnd, surf->dst.handle, surf->dst.size, alignment);
+	src_offset = get_offset_pat_index(ahnd, surf->src.handle, surf->src.size,
+					  alignment, surf->src.pat_index);
+	dst_offset = get_offset_pat_index(ahnd, surf->dst.handle, surf->dst.size,
+					  alignment, surf->dst.pat_index);
 	bb_offset = get_offset(ahnd, surf->bb.handle, surf->bb.size, alignment);
 
 	emit_blt_ctrl_surf_copy(fd, ahnd, surf, 0, true);
@@ -1470,10 +1479,12 @@ uint64_t emit_blt_fast_copy(int fd,
 	data.dw03.dst_x2 = blt->dst.x2;
 	data.dw03.dst_y2 = blt->dst.y2;
 
-	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, alignment)
-		     + blt->src.plane_offset;
-	dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, alignment)
-		     + blt->dst.plane_offset;
+	src_offset = get_offset_pat_index(ahnd, blt->src.handle, blt->src.size,
+					  alignment, blt->src.pat_index);
+	src_offset += blt->src.plane_offset;
+	dst_offset = get_offset_pat_index(ahnd, blt->dst.handle, blt->dst.size, alignment,
+					  blt->dst.pat_index);
+	dst_offset += blt->dst.plane_offset;
 	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, alignment);
 
 	data.dw04.dst_address_lo = dst_offset;
@@ -1547,8 +1558,10 @@ int blt_fast_copy(int fd,
 	igt_assert_neq(blt->driver, 0);
 
 	alignment = get_default_alignment(fd, blt->driver);
-	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, alignment);
-	dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, alignment);
+	src_offset = get_offset_pat_index(ahnd, blt->src.handle, blt->src.size,
+					  alignment, blt->src.pat_index);
+	dst_offset = get_offset_pat_index(ahnd, blt->dst.handle, blt->dst.size,
+					  alignment, blt->dst.pat_index);
 	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, alignment);
 
 	emit_blt_fast_copy(fd, ahnd, blt, 0, true);
@@ -1603,8 +1616,10 @@ static void emit_blt_mem_copy(int fd, uint64_t ahnd, const struct blt_mem_data *
 	uint32_t optype;
 
 	alignment = get_default_alignment(fd, mem->driver);
-	src_offset = get_offset(ahnd, mem->src.handle, mem->src.size, alignment);
-	dst_offset = get_offset(ahnd, mem->dst.handle, mem->dst.size, alignment);
+	src_offset = get_offset_pat_index(ahnd, mem->src.handle, mem->src.size,
+					  alignment, mem->src.pat_index);
+	dst_offset = get_offset_pat_index(ahnd, mem->dst.handle, mem->dst.size,
+					  alignment, mem->dst.pat_index);
 
 	batch = bo_map(fd, mem->bb.handle, mem->bb.size, mem->driver);
 	optype = mem->src.type == M_MATRIX ? 1 << 17 : 0;
@@ -1649,8 +1664,10 @@ int blt_mem_copy(int fd, const intel_ctx_t *ctx,
 	int ret;
 
 	alignment = get_default_alignment(fd, mem->driver);
-	src_offset = get_offset(ahnd, mem->src.handle, mem->src.size, alignment);
-	dst_offset = get_offset(ahnd, mem->dst.handle, mem->dst.size, alignment);
+	src_offset = get_offset_pat_index(ahnd, mem->src.handle, mem->src.size,
+					  alignment, mem->src.pat_index);
+	dst_offset = get_offset_pat_index(ahnd, mem->dst.handle, mem->dst.size,
+					  alignment, mem->dst.pat_index);
 	bb_offset = get_offset(ahnd, mem->bb.handle, mem->bb.size, alignment);
 
 	emit_blt_mem_copy(fd, ahnd, mem);
@@ -1690,7 +1707,8 @@ static void emit_blt_mem_set(int fd, uint64_t ahnd, const struct blt_mem_data *m
 	uint32_t value;
 
 	alignment = get_default_alignment(fd, mem->driver);
-	dst_offset = get_offset(ahnd, mem->dst.handle, mem->dst.size, alignment);
+	dst_offset = get_offset_pat_index(ahnd, mem->dst.handle, mem->dst.size,
+					  alignment, mem->dst.pat_index);
 
 	batch = bo_map(fd, mem->bb.handle, mem->bb.size, mem->driver);
 	value = (uint32_t)fill_data << 24;
@@ -1733,7 +1751,8 @@ int blt_mem_set(int fd, const intel_ctx_t *ctx,
 	int ret;
 
 	alignment = get_default_alignment(fd, mem->driver);
-	dst_offset = get_offset(ahnd, mem->dst.handle, mem->dst.size, alignment);
+	dst_offset = get_offset_pat_index(ahnd, mem->dst.handle, mem->dst.size,
+					  alignment, mem->dst.pat_index);
 	bb_offset = get_offset(ahnd, mem->bb.handle, mem->bb.size, alignment);
 
 	emit_blt_mem_set(fd, ahnd, mem, fill_data);
@@ -1808,7 +1827,7 @@ blt_create_object(const struct blt_copy_data *blt, uint32_t region,
 							  &size, region) == 0);
 	}
 
-	blt_set_object(obj, handle, size, region, mocs_index, tiling,
+	blt_set_object(obj, handle, size, region, mocs_index, DEFAULT_PAT_INDEX, tiling,
 		       compression, compression_type);
 	blt_set_geom(obj, stride, 0, 0, width, height, 0, 0);
 
@@ -1829,7 +1848,7 @@ void blt_destroy_object(int fd, struct blt_copy_object *obj)
 
 void blt_set_object(struct blt_copy_object *obj,
 		    uint32_t handle, uint64_t size, uint32_t region,
-		    uint8_t mocs_index, enum blt_tiling_type tiling,
+		    uint8_t mocs_index, uint8_t pat_index, enum blt_tiling_type tiling,
 		    enum blt_compression compression,
 		    enum blt_compression_type compression_type)
 {
@@ -1837,6 +1856,7 @@ void blt_set_object(struct blt_copy_object *obj,
 	obj->size = size;
 	obj->region = region;
 	obj->mocs_index = mocs_index;
+	obj->pat_index = pat_index;
 	obj->tiling = tiling;
 	obj->compression = compression;
 	obj->compression_type = compression_type;
@@ -1845,13 +1865,14 @@ void blt_set_object(struct blt_copy_object *obj,
 void blt_set_mem_object(struct blt_mem_object *obj,
 			uint32_t handle, uint64_t size, uint32_t pitch,
 			uint32_t width, uint32_t height, uint32_t region,
-			uint8_t mocs_index, enum blt_memop_type type,
-			enum blt_compression compression)
+			uint8_t mocs_index, uint8_t pat_index,
+			enum blt_memop_type type, enum blt_compression compression)
 {
 	obj->handle = handle;
 	obj->region = region;
 	obj->size = size;
 	obj->mocs_index = mocs_index;
+	obj->pat_index = pat_index;
 	obj->type = type;
 	obj->compression = compression;
 	obj->width = width;
@@ -1881,12 +1902,14 @@ void blt_set_copy_object(struct blt_copy_object *obj,
 
 void blt_set_ctrl_surf_object(struct blt_ctrl_surf_copy_object *obj,
 			      uint32_t handle, uint32_t region, uint64_t size,
-			      uint8_t mocs_index, enum blt_access_type access_type)
+			      uint8_t mocs_index, uint8_t pat_index,
+			      enum blt_access_type access_type)
 {
 	obj->handle = handle;
 	obj->region = region;
 	obj->size = size;
 	obj->mocs_index = mocs_index;
+	obj->pat_index = pat_index;
 	obj->access_type = access_type;
 }
 
diff --git a/lib/intel_blt.h b/lib/intel_blt.h
index 01a7e117a..34ee88ecb 100644
--- a/lib/intel_blt.h
+++ b/lib/intel_blt.h
@@ -79,6 +79,7 @@ struct blt_copy_object {
 	uint32_t region;
 	uint64_t size;
 	uint8_t mocs_index;
+	uint8_t pat_index;
 	enum blt_tiling_type tiling;
 	enum blt_compression compression;  /* BC only */
 	enum blt_compression_type compression_type; /* BC only */
@@ -98,6 +99,7 @@ struct blt_mem_object {
 	uint32_t region;
 	uint64_t size;
 	uint8_t mocs_index;
+	uint8_t pat_index;
 	enum blt_memop_type type;
 	enum blt_compression compression;
 	uint32_t width;
@@ -172,6 +174,7 @@ struct blt_ctrl_surf_copy_object {
 	uint32_t region;
 	uint64_t size;
 	uint8_t mocs_index;
+	uint8_t pat_index;
 	enum blt_access_type access_type;
 };
 
@@ -279,15 +282,15 @@ blt_create_object(const struct blt_copy_data *blt, uint32_t region,
 void blt_destroy_object(int fd, struct blt_copy_object *obj);
 void blt_set_object(struct blt_copy_object *obj,
 		    uint32_t handle, uint64_t size, uint32_t region,
-		    uint8_t mocs_index, enum blt_tiling_type tiling,
+		    uint8_t mocs_index, uint8_t pat_index, enum blt_tiling_type tiling,
 		    enum blt_compression compression,
 		    enum blt_compression_type compression_type);
 
 void blt_set_mem_object(struct blt_mem_object *obj,
 			uint32_t handle, uint64_t size, uint32_t pitch,
 			uint32_t width, uint32_t height, uint32_t region,
-			uint8_t mocs_index, enum blt_memop_type type,
-			enum blt_compression compression);
+			uint8_t mocs_index, uint8_t pat_index,
+			enum blt_memop_type type, enum blt_compression compression);
 
 void blt_set_object_ext(struct blt_block_copy_object_ext *obj,
 			uint8_t compression_format,
@@ -297,7 +300,8 @@ void blt_set_copy_object(struct blt_copy_object *obj,
 			 const struct blt_copy_object *orig);
 void blt_set_ctrl_surf_object(struct blt_ctrl_surf_copy_object *obj,
 			      uint32_t handle, uint32_t region, uint64_t size,
-			      uint8_t mocs_index, enum blt_access_type access_type);
+			      uint8_t mocs_index, uint8_t pat_index,
+			      enum blt_access_type access_type);
 
 void blt_surface_info(const char *info,
 		      const struct blt_copy_object *obj);
diff --git a/tests/intel/gem_ccs.c b/tests/intel/gem_ccs.c
index ed149ef9e..0a691778d 100644
--- a/tests/intel/gem_ccs.c
+++ b/tests/intel/gem_ccs.c
@@ -15,6 +15,7 @@
 #include "lib/intel_chipset.h"
 #include "intel_blt.h"
 #include "intel_mocs.h"
+#include "intel_pat.h"
 /**
  * TEST: gem ccs
  * Description: Exercise gen12 blitter with and without flatccs compression
@@ -111,9 +112,9 @@ static void surf_copy(int i915,
 	blt_ctrl_surf_copy_init(i915, &surf);
 	surf.print_bb = param.print_bb;
 	blt_set_ctrl_surf_object(&surf.src, mid->handle, mid->region, mid->size,
-				 uc_mocs, BLT_INDIRECT_ACCESS);
+				 uc_mocs, DEFAULT_PAT_INDEX, BLT_INDIRECT_ACCESS);
 	blt_set_ctrl_surf_object(&surf.dst, ccs, REGION_SMEM, ccssize,
-				 uc_mocs, DIRECT_ACCESS);
+				 uc_mocs, DEFAULT_PAT_INDEX, DIRECT_ACCESS);
 	bb_size = 4096;
 	igt_assert_eq(__gem_create(i915, &bb_size, &bb1), 0);
 	blt_set_batch(&surf.bb, bb1, bb_size, REGION_SMEM);
@@ -133,7 +134,7 @@ static void surf_copy(int i915,
 		igt_system_suspend_autoresume(SUSPEND_STATE_FREEZE, SUSPEND_TEST_NONE);
 
 		blt_set_ctrl_surf_object(&surf.dst, ccs2, REGION_SMEM, ccssize,
-					 0, DIRECT_ACCESS);
+					 0, DEFAULT_PAT_INDEX, DIRECT_ACCESS);
 		blt_ctrl_surf_copy(i915, ctx, e, ahnd, &surf);
 		gem_sync(i915, surf.dst.handle);
 
@@ -155,9 +156,9 @@ static void surf_copy(int i915,
 	for (int i = 0; i < surf.dst.size / sizeof(uint32_t); i++)
 		ccsmap[i] = i;
 	blt_set_ctrl_surf_object(&surf.src, ccs, REGION_SMEM, ccssize,
-				 uc_mocs, DIRECT_ACCESS);
+				 uc_mocs, DEFAULT_PAT_INDEX, DIRECT_ACCESS);
 	blt_set_ctrl_surf_object(&surf.dst, mid->handle, mid->region, mid->size,
-				 uc_mocs, INDIRECT_ACCESS);
+				 uc_mocs, DEFAULT_PAT_INDEX, INDIRECT_ACCESS);
 	blt_ctrl_surf_copy(i915, ctx, e, ahnd, &surf);
 
 	blt_copy_init(i915, &blt);
@@ -399,7 +400,8 @@ static void block_copy(int i915,
 	blt_set_object_ext(&ext.dst, 0, width, height, SURFACE_TYPE_2D);
 	if (config->inplace) {
 		blt_set_object(&blt.dst, mid->handle, dst->size, mid->region, 0,
-			       T_LINEAR, COMPRESSION_DISABLED, comp_type);
+			       DEFAULT_PAT_INDEX, T_LINEAR, COMPRESSION_DISABLED,
+			       comp_type);
 		blt.dst.ptr = mid->ptr;
 	}
 
@@ -475,7 +477,7 @@ static void block_multicopy(int i915,
 
 	if (config->inplace) {
 		blt_set_object(&blt3.dst, mid->handle, dst->size, mid->region,
-			       mid->mocs_index, mid_tiling, COMPRESSION_DISABLED,
+			       mid->mocs_index, DEFAULT_PAT_INDEX, mid_tiling, COMPRESSION_DISABLED,
 			       comp_type);
 		blt3.dst.ptr = mid->ptr;
 	}
diff --git a/tests/intel/gem_lmem_swapping.c b/tests/intel/gem_lmem_swapping.c
index 2e0ba0793..fa3ec6d99 100644
--- a/tests/intel/gem_lmem_swapping.c
+++ b/tests/intel/gem_lmem_swapping.c
@@ -486,7 +486,7 @@ static void __do_evict(int i915,
 				   INTEL_MEMORY_REGION_ID(I915_SYSTEM_MEMORY, 0));
 		blt_set_object(tmp, tmp->handle, params->size.max,
 			       INTEL_MEMORY_REGION_ID(I915_SYSTEM_MEMORY, 0),
-			       intel_get_uc_mocs_index(i915), T_LINEAR,
+			       intel_get_uc_mocs_index(i915), 0, T_LINEAR,
 			       COMPRESSION_DISABLED, COMPRESSION_TYPE_3D);
 		blt_set_geom(tmp, stride, 0, 0, width, height, 0, 0);
 	}
@@ -516,7 +516,7 @@ static void __do_evict(int i915,
 			obj->blt_obj = calloc(1, sizeof(*obj->blt_obj));
 			igt_assert(obj->blt_obj);
 			blt_set_object(obj->blt_obj, obj->handle, obj->size, region_id,
-				       intel_get_uc_mocs_index(i915), T_LINEAR,
+				       intel_get_uc_mocs_index(i915), 0, T_LINEAR,
 				       COMPRESSION_ENABLED, COMPRESSION_TYPE_3D);
 			blt_set_geom(obj->blt_obj, stride, 0, 0, width, height, 0, 0);
 			init_object_ccs(i915, obj, tmp, rand(), blt_ctx,
diff --git a/tests/intel/xe_ccs.c b/tests/intel/xe_ccs.c
index 876c239e4..647a6bd2e 100644
--- a/tests/intel/xe_ccs.c
+++ b/tests/intel/xe_ccs.c
@@ -13,6 +13,7 @@
 #include "igt_syncobj.h"
 #include "intel_blt.h"
 #include "intel_mocs.h"
+#include "intel_pat.h"
 #include "xe/xe_ioctl.h"
 #include "xe/xe_query.h"
 #include "xe/xe_util.h"
@@ -108,8 +109,9 @@ static void surf_copy(int xe,
 	blt_ctrl_surf_copy_init(xe, &surf);
 	surf.print_bb = param.print_bb;
 	blt_set_ctrl_surf_object(&surf.src, mid->handle, mid->region, mid->size,
-				 uc_mocs, BLT_INDIRECT_ACCESS);
-	blt_set_ctrl_surf_object(&surf.dst, ccs, sysmem, ccssize, uc_mocs, DIRECT_ACCESS);
+				 uc_mocs, DEFAULT_PAT_INDEX, BLT_INDIRECT_ACCESS);
+	blt_set_ctrl_surf_object(&surf.dst, ccs, sysmem, ccssize, uc_mocs,
+				 DEFAULT_PAT_INDEX, DIRECT_ACCESS);
 	bb_size = xe_get_default_alignment(xe);
 	bb1 = xe_bo_create_flags(xe, 0, bb_size, sysmem);
 	blt_set_batch(&surf.bb, bb1, bb_size, sysmem);
@@ -130,7 +132,7 @@ static void surf_copy(int xe,
 		igt_system_suspend_autoresume(SUSPEND_STATE_FREEZE, SUSPEND_TEST_NONE);
 
 		blt_set_ctrl_surf_object(&surf.dst, ccs2, system_memory(xe), ccssize,
-					 0, DIRECT_ACCESS);
+					 0, DEFAULT_PAT_INDEX, DIRECT_ACCESS);
 		blt_ctrl_surf_copy(xe, ctx, NULL, ahnd, &surf);
 		intel_ctx_xe_sync(ctx, true);
 
@@ -153,9 +155,9 @@ static void surf_copy(int xe,
 	for (int i = 0; i < surf.dst.size / sizeof(uint32_t); i++)
 		ccsmap[i] = i;
 	blt_set_ctrl_surf_object(&surf.src, ccs, sysmem, ccssize,
-				 uc_mocs, DIRECT_ACCESS);
+				 uc_mocs, DEFAULT_PAT_INDEX, DIRECT_ACCESS);
 	blt_set_ctrl_surf_object(&surf.dst, mid->handle, mid->region, mid->size,
-				 uc_mocs, INDIRECT_ACCESS);
+				 uc_mocs, DEFAULT_PAT_INDEX, INDIRECT_ACCESS);
 	blt_ctrl_surf_copy(xe, ctx, NULL, ahnd, &surf);
 	intel_ctx_xe_sync(ctx, true);
 
@@ -369,7 +371,8 @@ static void block_copy(int xe,
 	blt_set_object_ext(&ext.dst, 0, width, height, SURFACE_TYPE_2D);
 	if (config->inplace) {
 		blt_set_object(&blt.dst, mid->handle, dst->size, mid->region, 0,
-			       T_LINEAR, COMPRESSION_DISABLED, comp_type);
+			       DEFAULT_PAT_INDEX, T_LINEAR, COMPRESSION_DISABLED,
+			       comp_type);
 		blt.dst.ptr = mid->ptr;
 	}
 
@@ -450,8 +453,8 @@ static void block_multicopy(int xe,
 
 	if (config->inplace) {
 		blt_set_object(&blt3.dst, mid->handle, dst->size, mid->region,
-			       mid->mocs_index, mid_tiling, COMPRESSION_DISABLED,
-			       comp_type);
+			       mid->mocs_index, DEFAULT_PAT_INDEX, mid_tiling,
+			       COMPRESSION_DISABLED, comp_type);
 		blt3.dst.ptr = mid->ptr;
 	}
 
diff --git a/tests/intel/xe_copy_basic.c b/tests/intel/xe_copy_basic.c
index 059c54488..516d96052 100644
--- a/tests/intel/xe_copy_basic.c
+++ b/tests/intel/xe_copy_basic.c
@@ -11,6 +11,7 @@
 #include "intel_blt.h"
 #include "lib/intel_cmds_info.h"
 #include "lib/intel_mocs.h"
+#include "lib/intel_pat.h"
 #include "lib/intel_reg.h"
 #include "xe/xe_ioctl.h"
 #include "xe/xe_query.h"
@@ -56,9 +57,11 @@ mem_copy(int fd, uint32_t src_handle, uint32_t dst_handle, const intel_ctx_t *ct
 
 	blt_mem_init(fd, &mem);
 	blt_set_mem_object(&mem.src, src_handle, size, 0, width, height,
-			   region, src_mocs, M_LINEAR, COMPRESSION_DISABLED);
+			   region, src_mocs, DEFAULT_PAT_INDEX, M_LINEAR,
+			   COMPRESSION_DISABLED);
 	blt_set_mem_object(&mem.dst, dst_handle, size, 0, width, height,
-			   region, dst_mocs, M_LINEAR, COMPRESSION_DISABLED);
+			   region, dst_mocs, DEFAULT_PAT_INDEX, M_LINEAR,
+			   COMPRESSION_DISABLED);
 	mem.src.ptr = xe_bo_map(fd, src_handle, size);
 	mem.dst.ptr = xe_bo_map(fd, dst_handle, size);
 
@@ -105,7 +108,7 @@ mem_set(int fd, uint32_t dst_handle, const intel_ctx_t *ctx, uint32_t size,
 	bb = xe_bo_create_flags(fd, 0, bb_size, region);
 	blt_mem_init(fd, &mem);
 	blt_set_mem_object(&mem.dst, dst_handle, size, 0, width, height, region,
-			   dst_mocs, M_LINEAR, COMPRESSION_DISABLED);
+			   dst_mocs, DEFAULT_PAT_INDEX, M_LINEAR, COMPRESSION_DISABLED);
 	mem.dst.ptr = xe_bo_map(fd, dst_handle, size);
 	blt_set_batch(&mem.bb, bb, bb_size, region);
 	blt_mem_set(fd, ctx, NULL, ahnd, &mem, fill_data);
-- 
2.41.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [igt-dev] [PATCH i-g-t v4 09/15] lib/intel_buf: support pat_index
  2023-10-19 14:40 [igt-dev] [PATCH i-g-t v4 00/15] PAT and cache coherency support Matthew Auld
                   ` (7 preceding siblings ...)
  2023-10-19 14:40 ` [igt-dev] [PATCH i-g-t v4 08/15] lib/intel_blt: support pat_index Matthew Auld
@ 2023-10-19 14:41 ` Matthew Auld
  2023-10-20  5:17   ` Niranjana Vishwanathapura
  2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 10/15] lib/xe_ioctl: update vm_bind to account for pat_index Matthew Auld
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 28+ messages in thread
From: Matthew Auld @ 2023-10-19 14:41 UTC (permalink / raw)
  To: igt-dev

Some users need to be able to select their own pat_index. Some display
tests use igt_draw, which in turn uses intel_batchbuffer and intel_buf.
We also have a couple more display tests using these interfaces
directly. The idea is to select wt/uc for anything display related, but
also to allow any test to select a pat_index for a given intel_buf.
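
For illustration, a display-oriented caller can now do something like
the following (a minimal sketch based on the kms_big_fb.c hunk below;
the handle, size and region values are placeholders):

	buf = intel_buf_create_full(bops, handle, width, height, bpp, 0,
				    tiling, 0, size, 0, region,
				    intel_get_pat_idx_uc(fd));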

v2: (Zbigniew)
  - Add some macro helpers for decoding pat_index and range in rsvd1.
  - Use uc rather than wt. On xe2+ wt uses compression, so CPU access
    might not work as expected; for now just use uc.
v3:
  - Drop pat_index from reserve_if_not_allocated.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Acked-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
---
 lib/igt_draw.c            |  7 ++++-
 lib/igt_fb.c              |  3 ++-
 lib/intel_batchbuffer.c   | 54 ++++++++++++++++++++++++++++++---------
 lib/intel_bufops.c        | 29 ++++++++++++++-------
 lib/intel_bufops.h        |  9 +++++--
 tests/intel/kms_big_fb.c  |  4 ++-
 tests/intel/kms_dirtyfb.c |  7 +++--
 tests/intel/kms_psr.c     |  4 ++-
 tests/intel/xe_intel_bb.c |  3 ++-
 9 files changed, 90 insertions(+), 30 deletions(-)

diff --git a/lib/igt_draw.c b/lib/igt_draw.c
index 1cf9d87c9..efd3a2436 100644
--- a/lib/igt_draw.c
+++ b/lib/igt_draw.c
@@ -31,6 +31,7 @@
 #include "intel_batchbuffer.h"
 #include "intel_chipset.h"
 #include "intel_mocs.h"
+#include "intel_pat.h"
 #include "igt_core.h"
 #include "igt_fb.h"
 #include "ioctl_wrappers.h"
@@ -75,6 +76,7 @@ struct buf_data {
 	uint32_t size;
 	uint32_t stride;
 	int bpp;
+	uint8_t pat_index;
 };
 
 struct rect {
@@ -658,7 +660,8 @@ static struct intel_buf *create_buf(int fd, struct buf_ops *bops,
 				    width, height, from->bpp, 0,
 				    tiling, 0,
 				    size, 0,
-				    region);
+				    region,
+				    from->pat_index);
 
 	/* Make sure we close handle on destroy path */
 	intel_buf_set_ownership(buf, true);
@@ -791,6 +794,7 @@ static void draw_rect_render(int fd, struct cmd_data *cmd_data,
 	igt_skip_on(!rendercopy);
 
 	/* We create a temporary buffer and copy from it using rendercopy. */
+	tmp.pat_index = buf->pat_index;
 	tmp.size = rect->w * rect->h * pixel_size;
 	if (is_i915_device(fd))
 		tmp.handle = gem_create(fd, tmp.size);
@@ -858,6 +862,7 @@ void igt_draw_rect(int fd, struct buf_ops *bops, uint32_t ctx,
 		.size = buf_size,
 		.stride = buf_stride,
 		.bpp = bpp,
+		.pat_index = intel_get_pat_idx_uc(fd),
 	};
 	struct rect rect = {
 		.x = rect_x,
diff --git a/lib/igt_fb.c b/lib/igt_fb.c
index e8f46534e..531496e7b 100644
--- a/lib/igt_fb.c
+++ b/lib/igt_fb.c
@@ -2637,7 +2637,8 @@ igt_fb_create_intel_buf(int fd, struct buf_ops *bops,
 				    igt_fb_mod_to_tiling(fb->modifier),
 				    compression, fb->size,
 				    fb->strides[0],
-				    region);
+				    region,
+				    intel_get_pat_idx_uc(fd));
 	intel_buf_set_name(buf, name);
 
 	/* Make sure we close handle on destroy path */
diff --git a/lib/intel_batchbuffer.c b/lib/intel_batchbuffer.c
index df82ef5f5..ef3e3154a 100644
--- a/lib/intel_batchbuffer.c
+++ b/lib/intel_batchbuffer.c
@@ -38,6 +38,7 @@
 #include "intel_batchbuffer.h"
 #include "intel_bufops.h"
 #include "intel_chipset.h"
+#include "intel_pat.h"
 #include "media_fill.h"
 #include "media_spin.h"
 #include "sw_sync.h"
@@ -825,15 +826,18 @@ static void __reallocate_objects(struct intel_bb *ibb)
 static inline uint64_t __intel_bb_get_offset(struct intel_bb *ibb,
 					     uint32_t handle,
 					     uint64_t size,
-					     uint32_t alignment)
+					     uint32_t alignment,
+					     uint8_t pat_index)
 {
 	uint64_t offset;
 
 	if (ibb->enforce_relocs)
 		return 0;
 
-	offset = intel_allocator_alloc(ibb->allocator_handle,
-				       handle, size, alignment);
+	offset = __intel_allocator_alloc(ibb->allocator_handle, handle,
+					 size, alignment, pat_index,
+					 ALLOC_STRATEGY_NONE);
+	igt_assert(offset != ALLOC_INVALID_ADDRESS);
 
 	return offset;
 }
@@ -1280,6 +1284,10 @@ void intel_bb_destroy(struct intel_bb *ibb)
 	free(ibb);
 }
 
+#define SZ_4K	0x1000
+#define XE_OBJ_SIZE(rsvd1) ((rsvd1) & ~(SZ_4K-1))
+#define XE_OBJ_PAT_IDX(rsvd1) ((rsvd1) & (SZ_4K-1))
+
 static struct drm_xe_vm_bind_op *xe_alloc_bind_ops(struct intel_bb *ibb,
 						   uint32_t op, uint32_t flags,
 						   uint32_t region)
@@ -1302,11 +1310,14 @@ static struct drm_xe_vm_bind_op *xe_alloc_bind_ops(struct intel_bb *ibb,
 		ops->flags = flags;
 		ops->obj_offset = 0;
 		ops->addr = objects[i]->offset;
-		ops->range = objects[i]->rsvd1;
+		ops->range = XE_OBJ_SIZE(objects[i]->rsvd1);
 		ops->region = region;
+		if (set_obj)
+			ops->pat_index = XE_OBJ_PAT_IDX(objects[i]->rsvd1);
 
-		igt_debug("  [%d]: handle: %u, offset: %llx, size: %llx\n",
-			  i, ops->obj, (long long)ops->addr, (long long)ops->range);
+		igt_debug("  [%d]: handle: %u, offset: %llx, size: %llx pat_index: %u\n",
+			  i, ops->obj, (long long)ops->addr, (long long)ops->range,
+			  ops->pat_index);
 	}
 
 	return bind_ops;
@@ -1412,7 +1423,8 @@ void intel_bb_reset(struct intel_bb *ibb, bool purge_objects_cache)
 		ibb->batch_offset = __intel_bb_get_offset(ibb,
 							  ibb->handle,
 							  ibb->size,
-							  ibb->alignment);
+							  ibb->alignment,
+							  DEFAULT_PAT_INDEX);
 
 	intel_bb_add_object(ibb, ibb->handle, ibb->size,
 			    ibb->batch_offset,
@@ -1648,7 +1660,8 @@ static void __remove_from_objects(struct intel_bb *ibb,
  */
 static struct drm_i915_gem_exec_object2 *
 __intel_bb_add_object(struct intel_bb *ibb, uint32_t handle, uint64_t size,
-		      uint64_t offset, uint64_t alignment, bool write)
+		      uint64_t offset, uint64_t alignment, uint8_t pat_index,
+		      bool write)
 {
 	struct drm_i915_gem_exec_object2 *object;
 
@@ -1664,6 +1677,9 @@ __intel_bb_add_object(struct intel_bb *ibb, uint32_t handle, uint64_t size,
 	object = __add_to_cache(ibb, handle);
 	__add_to_objects(ibb, object);
 
+	if (pat_index == DEFAULT_PAT_INDEX)
+		pat_index = intel_get_pat_idx_wb(ibb->fd);
+
 	/*
 	 * If object->offset == INVALID_ADDRESS we added freshly object to the
 	 * cache. In that case we have two choices:
@@ -1673,7 +1689,7 @@ __intel_bb_add_object(struct intel_bb *ibb, uint32_t handle, uint64_t size,
 	if (INVALID_ADDR(object->offset)) {
 		if (INVALID_ADDR(offset)) {
 			offset = __intel_bb_get_offset(ibb, handle, size,
-						       alignment);
+						       alignment, pat_index);
 		} else {
 			offset = offset & (ibb->gtt_size - 1);
 
@@ -1724,6 +1740,18 @@ __intel_bb_add_object(struct intel_bb *ibb, uint32_t handle, uint64_t size,
 	if (ibb->driver == INTEL_DRIVER_XE) {
 		object->alignment = alignment;
 		object->rsvd1 = size;
+		igt_assert(!XE_OBJ_PAT_IDX(object->rsvd1));
+
+		if (pat_index == DEFAULT_PAT_INDEX)
+			pat_index = intel_get_pat_idx_wb(ibb->fd);
+
+		/*
+		 * XXX: For now encode the pat_index in the first few bits of
+		 * rsvd1. intel_batchbuffer should really stop using the i915
+		 * drm_i915_gem_exec_object2 to encode VMA placement
+		 * information on xe...
+		 */
+		object->rsvd1 |= pat_index;
 	}
 
 	return object;
@@ -1736,7 +1764,7 @@ intel_bb_add_object(struct intel_bb *ibb, uint32_t handle, uint64_t size,
 	struct drm_i915_gem_exec_object2 *obj = NULL;
 
 	obj = __intel_bb_add_object(ibb, handle, size, offset,
-				    alignment, write);
+				    alignment, DEFAULT_PAT_INDEX, write);
 	igt_assert(obj);
 
 	return obj;
@@ -1798,8 +1826,10 @@ __intel_bb_add_intel_buf(struct intel_bb *ibb, struct intel_buf *buf,
 		}
 	}
 
-	obj = intel_bb_add_object(ibb, buf->handle, intel_buf_bo_size(buf),
-				  buf->addr.offset, alignment, write);
+	obj = __intel_bb_add_object(ibb, buf->handle, intel_buf_bo_size(buf),
+				    buf->addr.offset, alignment, buf->pat_index,
+				    write);
+	igt_assert(obj);
 	buf->addr.offset = obj->offset;
 
 	if (igt_list_empty(&buf->link)) {
diff --git a/lib/intel_bufops.c b/lib/intel_bufops.c
index 2c91adb88..fbee4748e 100644
--- a/lib/intel_bufops.c
+++ b/lib/intel_bufops.c
@@ -29,6 +29,7 @@
 #include "igt.h"
 #include "igt_x86.h"
 #include "intel_bufops.h"
+#include "intel_pat.h"
 #include "xe/xe_ioctl.h"
 #include "xe/xe_query.h"
 
@@ -818,7 +819,7 @@ static void __intel_buf_init(struct buf_ops *bops,
 			     int width, int height, int bpp, int alignment,
 			     uint32_t req_tiling, uint32_t compression,
 			     uint64_t bo_size, int bo_stride,
-			     uint64_t region)
+			     uint64_t region, uint8_t pat_index)
 {
 	uint32_t tiling = req_tiling;
 	uint64_t size;
@@ -839,6 +840,10 @@ static void __intel_buf_init(struct buf_ops *bops,
 	IGT_INIT_LIST_HEAD(&buf->link);
 	buf->mocs = INTEL_BUF_MOCS_DEFAULT;
 
+	if (pat_index == DEFAULT_PAT_INDEX)
+		pat_index = intel_get_pat_idx_wb(bops->fd);
+	buf->pat_index = pat_index;
+
 	if (compression) {
 		igt_require(bops->intel_gen >= 9);
 		igt_assert(req_tiling == I915_TILING_Y ||
@@ -957,7 +962,7 @@ void intel_buf_init(struct buf_ops *bops,
 	region = bops->driver == INTEL_DRIVER_I915 ? I915_SYSTEM_MEMORY :
 						     system_memory(bops->fd);
 	__intel_buf_init(bops, 0, buf, width, height, bpp, alignment,
-			 tiling, compression, 0, 0, region);
+			 tiling, compression, 0, 0, region, DEFAULT_PAT_INDEX);
 
 	intel_buf_set_ownership(buf, true);
 }
@@ -974,7 +979,7 @@ void intel_buf_init_in_region(struct buf_ops *bops,
 			      uint64_t region)
 {
 	__intel_buf_init(bops, 0, buf, width, height, bpp, alignment,
-			 tiling, compression, 0, 0, region);
+			 tiling, compression, 0, 0, region, DEFAULT_PAT_INDEX);
 
 	intel_buf_set_ownership(buf, true);
 }
@@ -1033,7 +1038,7 @@ void intel_buf_init_using_handle(struct buf_ops *bops,
 				 uint32_t req_tiling, uint32_t compression)
 {
 	__intel_buf_init(bops, handle, buf, width, height, bpp, alignment,
-			 req_tiling, compression, 0, 0, -1);
+			 req_tiling, compression, 0, 0, -1, DEFAULT_PAT_INDEX);
 }
 
 /**
@@ -1050,6 +1055,7 @@ void intel_buf_init_using_handle(struct buf_ops *bops,
  * @size: real bo size
  * @stride: bo stride
  * @region: region
+ * @pat_index: pat_index to use for the binding (only used on xe)
  *
  * Function configures BO handle within intel_buf structure passed by the caller
  * (with all its metadata - width, height, ...). Useful if BO was created
@@ -1067,10 +1073,12 @@ void intel_buf_init_full(struct buf_ops *bops,
 			 uint32_t compression,
 			 uint64_t size,
 			 int stride,
-			 uint64_t region)
+			 uint64_t region,
+			 uint8_t pat_index)
 {
 	__intel_buf_init(bops, handle, buf, width, height, bpp, alignment,
-			 req_tiling, compression, size, stride, region);
+			 req_tiling, compression, size, stride, region,
+			 pat_index);
 }
 
 /**
@@ -1149,7 +1157,8 @@ struct intel_buf *intel_buf_create_using_handle_and_size(struct buf_ops *bops,
 							 int stride)
 {
 	return intel_buf_create_full(bops, handle, width, height, bpp, alignment,
-				     req_tiling, compression, size, stride, -1);
+				     req_tiling, compression, size, stride, -1,
+				     DEFAULT_PAT_INDEX);
 }
 
 struct intel_buf *intel_buf_create_full(struct buf_ops *bops,
@@ -1160,7 +1169,8 @@ struct intel_buf *intel_buf_create_full(struct buf_ops *bops,
 					uint32_t compression,
 					uint64_t size,
 					int stride,
-					uint64_t region)
+					uint64_t region,
+					uint8_t pat_index)
 {
 	struct intel_buf *buf;
 
@@ -1170,7 +1180,8 @@ struct intel_buf *intel_buf_create_full(struct buf_ops *bops,
 	igt_assert(buf);
 
 	__intel_buf_init(bops, handle, buf, width, height, bpp, alignment,
-			 req_tiling, compression, size, stride, region);
+			 req_tiling, compression, size, stride, region,
+			 pat_index);
 
 	return buf;
 }
diff --git a/lib/intel_bufops.h b/lib/intel_bufops.h
index 4dfe4681c..b6048402b 100644
--- a/lib/intel_bufops.h
+++ b/lib/intel_bufops.h
@@ -63,6 +63,9 @@ struct intel_buf {
 	/* Content Protection*/
 	bool is_protected;
 
+	/* pat_index to use for mapping this buf. Only used in Xe. */
+	uint8_t pat_index;
+
 	/* For debugging purposes */
 	char name[INTEL_BUF_NAME_MAXSIZE + 1];
 };
@@ -161,7 +164,8 @@ void intel_buf_init_full(struct buf_ops *bops,
 			 uint32_t compression,
 			 uint64_t size,
 			 int stride,
-			 uint64_t region);
+			 uint64_t region,
+			 uint8_t pat_index);
 
 struct intel_buf *intel_buf_create(struct buf_ops *bops,
 				   int width, int height,
@@ -192,7 +196,8 @@ struct intel_buf *intel_buf_create_full(struct buf_ops *bops,
 					uint32_t compression,
 					uint64_t size,
 					int stride,
-					uint64_t region);
+					uint64_t region,
+					uint8_t pat_index);
 void intel_buf_destroy(struct intel_buf *buf);
 
 static inline void intel_buf_set_pxp(struct intel_buf *buf, bool new_pxp_state)
diff --git a/tests/intel/kms_big_fb.c b/tests/intel/kms_big_fb.c
index 2c7b24fca..64a67e34a 100644
--- a/tests/intel/kms_big_fb.c
+++ b/tests/intel/kms_big_fb.c
@@ -34,6 +34,7 @@
 #include <string.h>
 
 #include "i915/gem_create.h"
+#include "intel_pat.h"
 #include "xe/xe_ioctl.h"
 #include "xe/xe_query.h"
 
@@ -88,7 +89,8 @@ static struct intel_buf *init_buf(data_t *data,
 	handle = gem_open(data->drm_fd, name);
 	buf = intel_buf_create_full(data->bops, handle, width, height,
 				    bpp, 0, tiling, 0, size, 0,
-				    region);
+				    region,
+				    intel_get_pat_idx_uc(data->drm_fd));
 
 	intel_buf_set_name(buf, buf_name);
 	intel_buf_set_ownership(buf, true);
diff --git a/tests/intel/kms_dirtyfb.c b/tests/intel/kms_dirtyfb.c
index cc9529178..bf9f91505 100644
--- a/tests/intel/kms_dirtyfb.c
+++ b/tests/intel/kms_dirtyfb.c
@@ -10,6 +10,7 @@
 
 #include "i915/intel_drrs.h"
 #include "i915/intel_fbc.h"
+#include "intel_pat.h"
 
 #include "xe/xe_query.h"
 
@@ -246,14 +247,16 @@ static void run_test(data_t *data)
 				    0,
 				    igt_fb_mod_to_tiling(data->fbs[1].modifier),
 				    0, 0, 0, is_xe_device(data->drm_fd) ?
-				    system_memory(data->drm_fd) : 0);
+				    system_memory(data->drm_fd) : 0,
+				    intel_get_pat_idx_uc(data->drm_fd));
 	dst = intel_buf_create_full(data->bops, data->fbs[2].gem_handle,
 				    data->fbs[2].width,
 				    data->fbs[2].height,
 				    igt_drm_format_to_bpp(data->fbs[2].drm_format),
 				    0, igt_fb_mod_to_tiling(data->fbs[2].modifier),
 				    0, 0, 0, is_xe_device(data->drm_fd) ?
-				    system_memory(data->drm_fd) : 0);
+				    system_memory(data->drm_fd) : 0,
+				    intel_get_pat_idx_uc(data->drm_fd));
 	ibb = intel_bb_create(data->drm_fd, PAGE_SIZE);
 
 	spin = igt_spin_new(data->drm_fd, .ahnd = ibb->allocator_handle);
diff --git a/tests/intel/kms_psr.c b/tests/intel/kms_psr.c
index ffecc5222..4cc41e479 100644
--- a/tests/intel/kms_psr.c
+++ b/tests/intel/kms_psr.c
@@ -31,6 +31,7 @@
 #include "igt.h"
 #include "igt_sysfs.h"
 #include "igt_psr.h"
+#include "intel_pat.h"
 #include <errno.h>
 #include <stdbool.h>
 #include <stdio.h>
@@ -356,7 +357,8 @@ static struct intel_buf *create_buf_from_fb(data_t *data,
 	name = gem_flink(data->drm_fd, fb->gem_handle);
 	handle = gem_open(data->drm_fd, name);
 	buf = intel_buf_create_full(data->bops, handle, width, height,
-				    bpp, 0, tiling, 0, size, stride, region);
+				    bpp, 0, tiling, 0, size, stride, region,
+				    intel_get_pat_idx_uc(data->drm_fd));
 	intel_buf_set_ownership(buf, true);
 
 	return buf;
diff --git a/tests/intel/xe_intel_bb.c b/tests/intel/xe_intel_bb.c
index 26e4dcc85..e2accb743 100644
--- a/tests/intel/xe_intel_bb.c
+++ b/tests/intel/xe_intel_bb.c
@@ -19,6 +19,7 @@
 #include "igt.h"
 #include "igt_crc.h"
 #include "intel_bufops.h"
+#include "intel_pat.h"
 #include "xe/xe_ioctl.h"
 #include "xe/xe_query.h"
 
@@ -400,7 +401,7 @@ static void create_in_region(struct buf_ops *bops, uint64_t region)
 	intel_buf_init_full(bops, handle, &buf,
 			    width/4, height, 32, 0,
 			    I915_TILING_NONE, 0,
-			    size, 0, region);
+			    size, 0, region, DEFAULT_PAT_INDEX);
 	intel_buf_set_ownership(&buf, true);
 
 	intel_bb_add_intel_buf(ibb, &buf, false);
-- 
2.41.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [igt-dev] [PATCH i-g-t v4 10/15] lib/xe_ioctl: update vm_bind to account for pat_index
  2023-10-19 14:40 [igt-dev] [PATCH i-g-t v4 00/15] PAT and cache coherency support Matthew Auld
                   ` (8 preceding siblings ...)
  2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 09/15] lib/intel_buf: " Matthew Auld
@ 2023-10-19 14:41 ` Matthew Auld
  2023-10-19 17:37   ` Niranjana Vishwanathapura
  2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 11/15] lib/intel_allocator: treat default_alignment as the minimum Matthew Auld
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 28+ messages in thread
From: Matthew Auld @ 2023-10-19 14:41 UTC (permalink / raw)
  To: igt-dev

Keep things minimal and select the 1way+ coherent mode by default on
all platforms. Users that need something else can use intel_buf,
get_offset_pat_index() etc., or call __xe_vm_bind() directly. Display
tests don't use this interface directly.
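
As a sketch (the vm, bo, addr and size values are placeholders; the
calls mirror the xe_vm.c hunk below and the new xe_pat.c tests), callers
that care pass an explicit pat_index, everyone else keeps
DEFAULT_PAT_INDEX:

	/* default: library picks the wb pat_index */
	__xe_vm_bind(fd, vm, 0, bo, 0, addr, size, XE_VM_BIND_OP_MAP,
		     0, NULL, 0, 0, DEFAULT_PAT_INDEX, 0);

	/* explicit: e.g. an uncached mapping */
	__xe_vm_bind(fd, vm, 0, bo, 0, addr, size, XE_VM_BIND_OP_MAP,
		     0, NULL, 0, 0, intel_get_pat_idx_uc(fd), 0);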

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
---
 lib/xe/xe_ioctl.c   | 8 ++++++--
 lib/xe/xe_ioctl.h   | 2 +-
 tests/intel/xe_vm.c | 5 ++++-
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/lib/xe/xe_ioctl.c b/lib/xe/xe_ioctl.c
index 4cf44f1ee..f51f931ee 100644
--- a/lib/xe/xe_ioctl.c
+++ b/lib/xe/xe_ioctl.c
@@ -41,6 +41,7 @@
 #include "config.h"
 #include "drmtest.h"
 #include "igt_syncobj.h"
+#include "intel_pat.h"
 #include "ioctl_wrappers.h"
 #include "xe_ioctl.h"
 #include "xe_query.h"
@@ -92,7 +93,7 @@ void xe_vm_bind_array(int fd, uint32_t vm, uint32_t exec_queue,
 int  __xe_vm_bind(int fd, uint32_t vm, uint32_t exec_queue, uint32_t bo,
 		  uint64_t offset, uint64_t addr, uint64_t size, uint32_t op,
 		  uint32_t flags, struct drm_xe_sync *sync, uint32_t num_syncs,
-		  uint32_t region, uint64_t ext)
+		  uint32_t region, uint8_t pat_index, uint64_t ext)
 {
 	struct drm_xe_vm_bind bind = {
 		.extensions = ext,
@@ -108,6 +109,8 @@ int  __xe_vm_bind(int fd, uint32_t vm, uint32_t exec_queue, uint32_t bo,
 		.num_syncs = num_syncs,
 		.syncs = (uintptr_t)sync,
 		.exec_queue_id = exec_queue,
+		.bind.pat_index = (pat_index == DEFAULT_PAT_INDEX) ?
+			intel_get_pat_idx_wb(fd) : pat_index,
 	};
 
 	if (igt_ioctl(fd, DRM_IOCTL_XE_VM_BIND, &bind))
@@ -122,7 +125,8 @@ void  __xe_vm_bind_assert(int fd, uint32_t vm, uint32_t exec_queue, uint32_t bo,
 			  uint32_t num_syncs, uint32_t region, uint64_t ext)
 {
 	igt_assert_eq(__xe_vm_bind(fd, vm, exec_queue, bo, offset, addr, size,
-				   op, flags, sync, num_syncs, region, ext), 0);
+				   op, flags, sync, num_syncs, region, DEFAULT_PAT_INDEX,
+				   ext), 0);
 }
 
 void xe_vm_bind(int fd, uint32_t vm, uint32_t bo, uint64_t offset,
diff --git a/lib/xe/xe_ioctl.h b/lib/xe/xe_ioctl.h
index e3f62a28a..a28375d3e 100644
--- a/lib/xe/xe_ioctl.h
+++ b/lib/xe/xe_ioctl.h
@@ -20,7 +20,7 @@ uint32_t xe_vm_create(int fd, uint32_t flags, uint64_t ext);
 int  __xe_vm_bind(int fd, uint32_t vm, uint32_t exec_queue, uint32_t bo,
 		  uint64_t offset, uint64_t addr, uint64_t size, uint32_t op,
 		  uint32_t flags, struct drm_xe_sync *sync, uint32_t num_syncs,
-		  uint32_t region, uint64_t ext);
+		  uint32_t region, uint8_t pat_index, uint64_t ext);
 void  __xe_vm_bind_assert(int fd, uint32_t vm, uint32_t exec_queue, uint32_t bo,
 			  uint64_t offset, uint64_t addr, uint64_t size,
 			  uint32_t op, uint32_t flags, struct drm_xe_sync *sync,
diff --git a/tests/intel/xe_vm.c b/tests/intel/xe_vm.c
index dd3302337..a01e1ba47 100644
--- a/tests/intel/xe_vm.c
+++ b/tests/intel/xe_vm.c
@@ -10,6 +10,7 @@
  */
 
 #include "igt.h"
+#include "intel_pat.h"
 #include "lib/igt_syncobj.h"
 #include "lib/intel_reg.h"
 #include "xe_drm.h"
@@ -316,7 +317,8 @@ static void userptr_invalid(int fd)
 	vm = xe_vm_create(fd, 0, 0);
 	munmap(data, size);
 	ret = __xe_vm_bind(fd, vm, 0, 0, to_user_pointer(data), 0x40000,
-			   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0, 0);
+			   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0,
+			   DEFAULT_PAT_INDEX, 0);
 	igt_assert(ret == -EFAULT);
 
 	xe_vm_destroy(fd, vm);
@@ -755,6 +757,7 @@ test_bind_array(int fd, struct drm_xe_engine_class_instance *eci, int n_execs,
 		bind_ops[i].op = XE_VM_BIND_OP_MAP;
 		bind_ops[i].flags = XE_VM_BIND_FLAG_ASYNC;
 		bind_ops[i].region = 0;
+		bind_ops[i].pat_index = intel_get_pat_idx_wb(fd);
 		bind_ops[i].reserved[0] = 0;
 		bind_ops[i].reserved[1] = 0;
 
-- 
2.41.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [igt-dev] [PATCH i-g-t v4 11/15] lib/intel_allocator: treat default_alignment as the minimum
  2023-10-19 14:40 [igt-dev] [PATCH i-g-t v4 00/15] PAT and cache coherency support Matthew Auld
                   ` (9 preceding siblings ...)
  2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 10/15] lib/xe_ioctl: update vm_bind to account for pat_index Matthew Auld
@ 2023-10-19 14:41 ` Matthew Auld
  2023-10-19 17:34   ` Niranjana Vishwanathapura
  2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 12/15] lib/intel_blt: tidy up alignment usage Matthew Auld
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 28+ messages in thread
From: Matthew Auld @ 2023-10-19 14:41 UTC (permalink / raw)
  To: igt-dev

If something overrides the default alignment, only apply the override
when it is larger than the default_alignment; otherwise keep using the
default.
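
In other words (a sketch with made-up numbers, assuming a 64K
default_alignment):

	/* default_alignment == 0x10000 (64K) */
	alignment = 0;        /* no override      -> 0x10000  */
	alignment = 0x1000;   /* smaller override -> 0x10000  */
	alignment = 0x200000; /* larger override  -> 0x200000 */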

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
---
 lib/intel_allocator.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/intel_allocator.c b/lib/intel_allocator.c
index e5b9457b8..d94043016 100644
--- a/lib/intel_allocator.c
+++ b/lib/intel_allocator.c
@@ -586,6 +586,9 @@ static int handle_request(struct alloc_req *req, struct alloc_resp *resp)
 		case REQ_ALLOC:
 			if (!req->alloc.alignment)
 				req->alloc.alignment = ial->default_alignment;
+			else
+				req->alloc.alignment = max(ial->default_alignment,
+							   req->alloc.alignment);
 
 			resp->response_type = RESP_ALLOC;
 			resp->alloc.offset = ial->alloc(ial,
-- 
2.41.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [igt-dev] [PATCH i-g-t v4 12/15] lib/intel_blt: tidy up alignment usage
  2023-10-19 14:40 [igt-dev] [PATCH i-g-t v4 00/15] PAT and cache coherency support Matthew Auld
                   ` (10 preceding siblings ...)
  2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 11/15] lib/intel_allocator: treat default_alignment as the minimum Matthew Auld
@ 2023-10-19 14:41 ` Matthew Auld
  2023-10-19 20:46   ` Niranjana Vishwanathapura
  2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 13/15] lib/intel_batchbuffer: extend to include optional alignment Matthew Auld
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 28+ messages in thread
From: Matthew Auld @ 2023-10-19 14:41 UTC (permalink / raw)
  To: igt-dev

No need to call get_default_alignment() all over the place; the
allocator already knows the required default alignment. If we need
something specific, like in the case of the ctrl surf, we can now just
set it, safe in the knowledge that the allocator already treats the
default alignment as the minimum.
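
For example (a sketch mirroring the hunks below), most call sites can
now pass 0 and rely on the allocator default, while the ctrl-surf path
keeps its 64K requirement:

	/* common case: 0 means "use the allocator's default alignment" */
	src_offset = get_offset_pat_index(ahnd, blt->src.handle,
					  blt->src.size, 0,
					  blt->src.pat_index);

	/* ctrl-surf: needs 64K; the default is still honoured as the min */
	alignment = 1ull << 16;
	src_offset = get_offset_pat_index(ahnd, surf->src.handle,
					  surf->src.size, alignment,
					  surf->src.pat_index);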

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
---
 lib/intel_blt.c | 78 +++++++++++++++++++------------------------------
 1 file changed, 30 insertions(+), 48 deletions(-)

diff --git a/lib/intel_blt.c b/lib/intel_blt.c
index 2e9074eaf..a25f0a814 100644
--- a/lib/intel_blt.c
+++ b/lib/intel_blt.c
@@ -783,14 +783,6 @@ static void dump_bb_ext(struct gen12_block_copy_data_ext *data)
 		 data->dw21.src_array_index);
 }
 
-static uint64_t get_default_alignment(int fd, enum intel_driver driver)
-{
-	if (driver == INTEL_DRIVER_XE)
-		return xe_get_default_alignment(fd);
-
-	return gem_detect_safe_alignment(fd);
-}
-
 static void *bo_map(int fd, uint32_t handle, uint64_t size,
 		    enum intel_driver driver)
 {
@@ -842,21 +834,20 @@ uint64_t emit_blt_block_copy(int fd,
 	unsigned int ip_ver = intel_graphics_ver(intel_get_drm_devid(fd));
 	struct gen12_block_copy_data data = {};
 	struct gen12_block_copy_data_ext dext = {};
-	uint64_t dst_offset, src_offset, bb_offset, alignment;
+	uint64_t dst_offset, src_offset, bb_offset;
 	uint32_t bbe = MI_BATCH_BUFFER_END;
 	uint8_t *bb;
 
 	igt_assert_f(ahnd, "block-copy supports softpin only\n");
 	igt_assert_f(blt, "block-copy requires data to do blit\n");
 
-	alignment = get_default_alignment(fd, blt->driver);
 	src_offset = get_offset_pat_index(ahnd, blt->src.handle, blt->src.size,
-					  alignment, blt->src.pat_index);
+					  0, blt->src.pat_index);
 	src_offset += blt->src.plane_offset;
 	dst_offset = get_offset_pat_index(ahnd, blt->dst.handle, blt->dst.size,
-					  alignment, blt->dst.pat_index);
+					  0, blt->dst.pat_index);
 	dst_offset += blt->dst.plane_offset;
-	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, alignment);
+	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, 0);
 
 	fill_data(&data, blt, src_offset, dst_offset, ext, ip_ver);
 
@@ -918,19 +909,18 @@ int blt_block_copy(int fd,
 {
 	struct drm_i915_gem_execbuffer2 execbuf = {};
 	struct drm_i915_gem_exec_object2 obj[3] = {};
-	uint64_t dst_offset, src_offset, bb_offset, alignment;
+	uint64_t dst_offset, src_offset, bb_offset;
 	int ret;
 
 	igt_assert_f(ahnd, "block-copy supports softpin only\n");
 	igt_assert_f(blt, "block-copy requires data to do blit\n");
 	igt_assert_neq(blt->driver, 0);
 
-	alignment = get_default_alignment(fd, blt->driver);
 	src_offset = get_offset_pat_index(ahnd, blt->src.handle, blt->src.size,
-					  alignment, blt->src.pat_index);
+					  0, blt->src.pat_index);
 	dst_offset = get_offset_pat_index(ahnd, blt->dst.handle, blt->dst.size,
-					  alignment, blt->dst.pat_index);
-	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, alignment);
+					  0, blt->dst.pat_index);
+	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, 0);
 
 	emit_blt_block_copy(fd, ahnd, blt, ext, 0, true);
 
@@ -1132,7 +1122,7 @@ uint64_t emit_blt_ctrl_surf_copy(int fd,
 	igt_assert_f(ahnd, "ctrl-surf-copy supports softpin only\n");
 	igt_assert_f(surf, "ctrl-surf-copy requires data to do ctrl-surf-copy blit\n");
 
-	alignment = max_t(uint64_t, get_default_alignment(fd, surf->driver), 1ull << 16);
+	alignment = 1ull << 16;
 	src_offset = get_offset_pat_index(ahnd, surf->src.handle, surf->src.size,
 					  alignment, surf->src.pat_index);
 	dst_offset = get_offset_pat_index(ahnd, surf->dst.handle, surf->dst.size,
@@ -1236,7 +1226,7 @@ int blt_ctrl_surf_copy(int fd,
 	igt_assert_f(surf, "ctrl-surf-copy requires data to do ctrl-surf-copy blit\n");
 	igt_assert_neq(surf->driver, 0);
 
-	alignment = max_t(uint64_t, get_default_alignment(fd, surf->driver), 1ull << 16);
+	alignment = 1ull << 16;
 	src_offset = get_offset_pat_index(ahnd, surf->src.handle, surf->src.size,
 					  alignment, surf->src.pat_index);
 	dst_offset = get_offset_pat_index(ahnd, surf->dst.handle, surf->dst.size,
@@ -1443,13 +1433,10 @@ uint64_t emit_blt_fast_copy(int fd,
 {
 	unsigned int ip_ver = intel_graphics_ver(intel_get_drm_devid(fd));
 	struct gen12_fast_copy_data data = {};
-	uint64_t dst_offset, src_offset, bb_offset, alignment;
+	uint64_t dst_offset, src_offset, bb_offset;
 	uint32_t bbe = MI_BATCH_BUFFER_END;
 	uint32_t *bb;
 
-
-	alignment = get_default_alignment(fd, blt->driver);
-
 	data.dw00.client = 0x2;
 	data.dw00.opcode = 0x42;
 	data.dw00.dst_tiling = __fast_tiling(blt->dst.tiling);
@@ -1480,12 +1467,12 @@ uint64_t emit_blt_fast_copy(int fd,
 	data.dw03.dst_y2 = blt->dst.y2;
 
 	src_offset = get_offset_pat_index(ahnd, blt->src.handle, blt->src.size,
-					  alignment, blt->src.pat_index);
+					  0, blt->src.pat_index);
 	src_offset += blt->src.plane_offset;
-	dst_offset = get_offset_pat_index(ahnd, blt->dst.handle, blt->dst.size, alignment,
+	dst_offset = get_offset_pat_index(ahnd, blt->dst.handle, blt->dst.size, 0,
 					  blt->dst.pat_index);
 	dst_offset += blt->dst.plane_offset;
-	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, alignment);
+	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, 0);
 
 	data.dw04.dst_address_lo = dst_offset;
 	data.dw05.dst_address_hi = dst_offset >> 32;
@@ -1550,19 +1537,18 @@ int blt_fast_copy(int fd,
 {
 	struct drm_i915_gem_execbuffer2 execbuf = {};
 	struct drm_i915_gem_exec_object2 obj[3] = {};
-	uint64_t dst_offset, src_offset, bb_offset, alignment;
+	uint64_t dst_offset, src_offset, bb_offset;
 	int ret;
 
 	igt_assert_f(ahnd, "fast-copy supports softpin only\n");
 	igt_assert_f(blt, "fast-copy requires data to do fast-copy blit\n");
 	igt_assert_neq(blt->driver, 0);
 
-	alignment = get_default_alignment(fd, blt->driver);
 	src_offset = get_offset_pat_index(ahnd, blt->src.handle, blt->src.size,
-					  alignment, blt->src.pat_index);
+					  0, blt->src.pat_index);
 	dst_offset = get_offset_pat_index(ahnd, blt->dst.handle, blt->dst.size,
-					  alignment, blt->dst.pat_index);
-	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, alignment);
+					  0, blt->dst.pat_index);
+	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, 0);
 
 	emit_blt_fast_copy(fd, ahnd, blt, 0, true);
 
@@ -1610,16 +1596,15 @@ void blt_mem_init(int fd, struct blt_mem_data *mem)
 
 static void emit_blt_mem_copy(int fd, uint64_t ahnd, const struct blt_mem_data *mem)
 {
-	uint64_t dst_offset, src_offset, alignment;
+	uint64_t dst_offset, src_offset;
 	int i;
 	uint32_t *batch;
 	uint32_t optype;
 
-	alignment = get_default_alignment(fd, mem->driver);
 	src_offset = get_offset_pat_index(ahnd, mem->src.handle, mem->src.size,
-					  alignment, mem->src.pat_index);
+					  0, mem->src.pat_index);
 	dst_offset = get_offset_pat_index(ahnd, mem->dst.handle, mem->dst.size,
-					  alignment, mem->dst.pat_index);
+					  0, mem->dst.pat_index);
 
 	batch = bo_map(fd, mem->bb.handle, mem->bb.size, mem->driver);
 	optype = mem->src.type == M_MATRIX ? 1 << 17 : 0;
@@ -1660,15 +1645,14 @@ int blt_mem_copy(int fd, const intel_ctx_t *ctx,
 {
 	struct drm_i915_gem_execbuffer2 execbuf = {};
 	struct drm_i915_gem_exec_object2 obj[3] = {};
-	uint64_t dst_offset, src_offset, bb_offset, alignment;
+	uint64_t dst_offset, src_offset, bb_offset;
 	int ret;
 
-	alignment = get_default_alignment(fd, mem->driver);
 	src_offset = get_offset_pat_index(ahnd, mem->src.handle, mem->src.size,
-					  alignment, mem->src.pat_index);
+					  0, mem->src.pat_index);
 	dst_offset = get_offset_pat_index(ahnd, mem->dst.handle, mem->dst.size,
-					  alignment, mem->dst.pat_index);
-	bb_offset = get_offset(ahnd, mem->bb.handle, mem->bb.size, alignment);
+					  0, mem->dst.pat_index);
+	bb_offset = get_offset(ahnd, mem->bb.handle, mem->bb.size, 0);
 
 	emit_blt_mem_copy(fd, ahnd, mem);
 
@@ -1701,14 +1685,13 @@ int blt_mem_copy(int fd, const intel_ctx_t *ctx,
 static void emit_blt_mem_set(int fd, uint64_t ahnd, const struct blt_mem_data *mem,
 			     uint8_t fill_data)
 {
-	uint64_t dst_offset, alignment;
+	uint64_t dst_offset;
 	int b;
 	uint32_t *batch;
 	uint32_t value;
 
-	alignment = get_default_alignment(fd, mem->driver);
 	dst_offset = get_offset_pat_index(ahnd, mem->dst.handle, mem->dst.size,
-					  alignment, mem->dst.pat_index);
+					  0, mem->dst.pat_index);
 
 	batch = bo_map(fd, mem->bb.handle, mem->bb.size, mem->driver);
 	value = (uint32_t)fill_data << 24;
@@ -1747,13 +1730,12 @@ int blt_mem_set(int fd, const intel_ctx_t *ctx,
 {
 	struct drm_i915_gem_execbuffer2 execbuf = {};
 	struct drm_i915_gem_exec_object2 obj[2] = {};
-	uint64_t dst_offset, bb_offset, alignment;
+	uint64_t dst_offset, bb_offset;
 	int ret;
 
-	alignment = get_default_alignment(fd, mem->driver);
 	dst_offset = get_offset_pat_index(ahnd, mem->dst.handle, mem->dst.size,
-					  alignment, mem->dst.pat_index);
-	bb_offset = get_offset(ahnd, mem->bb.handle, mem->bb.size, alignment);
+					  0, mem->dst.pat_index);
+	bb_offset = get_offset(ahnd, mem->bb.handle, mem->bb.size, 0);
 
 	emit_blt_mem_set(fd, ahnd, mem, fill_data);
 
-- 
2.41.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [igt-dev] [PATCH i-g-t v4 13/15] lib/intel_batchbuffer: extend to include optional alignment
  2023-10-19 14:40 [igt-dev] [PATCH i-g-t v4 00/15] PAT and cache coherency support Matthew Auld
                   ` (11 preceding siblings ...)
  2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 12/15] lib/intel_blt: tidy up alignment usage Matthew Auld
@ 2023-10-19 14:41 ` Matthew Auld
  2023-10-19 20:36   ` Niranjana Vishwanathapura
  2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 14/15] tests/xe: add some vm_bind pat_index tests Matthew Auld
  2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 15/15] tests/intel-ci/xe: add pat and caching related tests Matthew Auld
  14 siblings, 1 reply; 28+ messages in thread
From: Matthew Auld @ 2023-10-19 14:41 UTC (permalink / raw)
  To: igt-dev

Extend intel_bb_create_full() to support specifying the alignment for
the allocator. This will be useful in an upcoming test.
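
For example (a sketch; the alignment value is whatever the caller
needs, and the new xe_pat.c test below uses the interface like this),
passing zero keeps the old default behaviour:

	ibb = intel_bb_create_full(fd, 0, 0, NULL, size,
				   0, 0, alignment,
				   INTEL_ALLOCATOR_SIMPLE,
				   ALLOC_STRATEGY_HIGH_TO_LOW);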

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
---
 lib/intel_batchbuffer.c | 30 +++++++++++++++++++-----------
 lib/intel_batchbuffer.h |  2 +-
 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/lib/intel_batchbuffer.c b/lib/intel_batchbuffer.c
index ef3e3154a..0fcf62452 100644
--- a/lib/intel_batchbuffer.c
+++ b/lib/intel_batchbuffer.c
@@ -894,7 +894,7 @@ static inline uint64_t __intel_bb_get_offset(struct intel_bb *ibb,
 static struct intel_bb *
 __intel_bb_create(int fd, uint32_t ctx, uint32_t vm, const intel_ctx_cfg_t *cfg,
 		  uint32_t size, bool do_relocs,
-		  uint64_t start, uint64_t end,
+		  uint64_t start, uint64_t end, uint64_t alignment,
 		  uint8_t allocator_type, enum allocator_strategy strategy)
 {
 	struct drm_i915_gem_exec_object2 *object;
@@ -918,7 +918,11 @@ __intel_bb_create(int fd, uint32_t ctx, uint32_t vm, const intel_ctx_cfg_t *cfg,
 	 */
 	if (ibb->driver == INTEL_DRIVER_I915) {
 		ibb->uses_full_ppgtt = gem_uses_full_ppgtt(fd);
-		ibb->alignment = gem_detect_safe_alignment(fd);
+
+		if (!alignment)
+			alignment = gem_detect_safe_alignment(fd);
+
+		ibb->alignment = alignment;
 		ibb->gtt_size = gem_aperture_size(fd);
 		ibb->handle = gem_create(fd, size);
 
@@ -947,7 +951,10 @@ __intel_bb_create(int fd, uint32_t ctx, uint32_t vm, const intel_ctx_cfg_t *cfg,
 	} else {
 		igt_assert(!do_relocs);
 
-		ibb->alignment = xe_get_default_alignment(fd);
+		if (!alignment)
+			alignment = xe_get_default_alignment(fd);
+
+		ibb->alignment = alignment;
 		size = ALIGN(size, ibb->alignment);
 		ibb->handle = xe_bo_create_flags(fd, 0, size, visible_vram_if_possible(fd, 0));
 
@@ -1018,6 +1025,7 @@ __intel_bb_create(int fd, uint32_t ctx, uint32_t vm, const intel_ctx_cfg_t *cfg,
  * @size: size of the batchbuffer
  * @start: allocator vm start address
  * @end: allocator vm start address
+ * @alignment: alignment to use for allocator, zero for default
  * @allocator_type: allocator type, SIMPLE, RELOC, ...
  * @strategy: allocation strategy
  *
@@ -1034,11 +1042,11 @@ __intel_bb_create(int fd, uint32_t ctx, uint32_t vm, const intel_ctx_cfg_t *cfg,
 struct intel_bb *intel_bb_create_full(int fd, uint32_t ctx, uint32_t vm,
 				      const intel_ctx_cfg_t *cfg, uint32_t size,
 				      uint64_t start, uint64_t end,
-				      uint8_t allocator_type,
+				      uint64_t alignment, uint8_t allocator_type,
 				      enum allocator_strategy strategy)
 {
 	return __intel_bb_create(fd, ctx, vm, cfg, size, false, start, end,
-				 allocator_type, strategy);
+				 alignment, allocator_type, strategy);
 }
 
 /**
@@ -1063,7 +1071,7 @@ struct intel_bb *intel_bb_create_with_allocator(int fd, uint32_t ctx, uint32_t v
 						uint32_t size,
 						uint8_t allocator_type)
 {
-	return __intel_bb_create(fd, ctx, vm, cfg, size, false, 0, 0,
+	return __intel_bb_create(fd, ctx, vm, cfg, size, false, 0, 0, 0,
 				 allocator_type, ALLOC_STRATEGY_HIGH_TO_LOW);
 }
 
@@ -1102,7 +1110,7 @@ struct intel_bb *intel_bb_create(int fd, uint32_t size)
 	bool relocs = is_i915_device(fd) && gem_has_relocations(fd);
 
 	return __intel_bb_create(fd, 0, 0, NULL, size,
-				 relocs && !aux_needs_softpin(fd), 0, 0,
+				 relocs && !aux_needs_softpin(fd), 0, 0, 0,
 				 INTEL_ALLOCATOR_SIMPLE,
 				 ALLOC_STRATEGY_HIGH_TO_LOW);
 }
@@ -1129,7 +1137,7 @@ intel_bb_create_with_context(int fd, uint32_t ctx, uint32_t vm,
 	bool relocs = is_i915_device(fd) && gem_has_relocations(fd);
 
 	return __intel_bb_create(fd, ctx, vm, cfg, size,
-				 relocs && !aux_needs_softpin(fd), 0, 0,
+				 relocs && !aux_needs_softpin(fd), 0, 0, 0,
 				 INTEL_ALLOCATOR_SIMPLE,
 				 ALLOC_STRATEGY_HIGH_TO_LOW);
 }
@@ -1150,7 +1158,7 @@ struct intel_bb *intel_bb_create_with_relocs(int fd, uint32_t size)
 {
 	igt_require(is_i915_device(fd) && gem_has_relocations(fd));
 
-	return __intel_bb_create(fd, 0, 0, NULL, size, true, 0, 0,
+	return __intel_bb_create(fd, 0, 0, NULL, size, true, 0, 0, 0,
 				 INTEL_ALLOCATOR_NONE, ALLOC_STRATEGY_NONE);
 }
 
@@ -1175,7 +1183,7 @@ intel_bb_create_with_relocs_and_context(int fd, uint32_t ctx,
 {
 	igt_require(is_i915_device(fd) && gem_has_relocations(fd));
 
-	return __intel_bb_create(fd, ctx, 0, cfg, size, true, 0, 0,
+	return __intel_bb_create(fd, ctx, 0, cfg, size, true, 0, 0, 0,
 				 INTEL_ALLOCATOR_NONE, ALLOC_STRATEGY_NONE);
 }
 
@@ -1195,7 +1203,7 @@ struct intel_bb *intel_bb_create_no_relocs(int fd, uint32_t size)
 {
 	igt_require(gem_uses_full_ppgtt(fd));
 
-	return __intel_bb_create(fd, 0, 0, NULL, size, false, 0, 0,
+	return __intel_bb_create(fd, 0, 0, NULL, size, false, 0, 0, 0,
 				 INTEL_ALLOCATOR_SIMPLE,
 				 ALLOC_STRATEGY_HIGH_TO_LOW);
 }
diff --git a/lib/intel_batchbuffer.h b/lib/intel_batchbuffer.h
index bdb3b6a67..8738cb5c4 100644
--- a/lib/intel_batchbuffer.h
+++ b/lib/intel_batchbuffer.h
@@ -307,7 +307,7 @@ struct intel_bb {
 struct intel_bb *
 intel_bb_create_full(int fd, uint32_t ctx, uint32_t vm,
 		     const intel_ctx_cfg_t *cfg, uint32_t size, uint64_t start,
-		     uint64_t end, uint8_t allocator_type,
+		     uint64_t end, uint64_t alignment, uint8_t allocator_type,
 		     enum allocator_strategy strategy);
 struct intel_bb *
 intel_bb_create_with_allocator(int fd, uint32_t ctx, uint32_t vm,
-- 
2.41.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [igt-dev] [PATCH i-g-t v4 14/15] tests/xe: add some vm_bind pat_index tests
  2023-10-19 14:40 [igt-dev] [PATCH i-g-t v4 00/15] PAT and cache coherency support Matthew Auld
                   ` (12 preceding siblings ...)
  2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 13/15] lib/intel_batchbuffer: extend to include optional alignment Matthew Auld
@ 2023-10-19 14:41 ` Matthew Auld
  2023-10-20  5:27   ` Niranjana Vishwanathapura
  2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 15/15] tests/intel-ci/xe: add pat and caching related tests Matthew Auld
  14 siblings, 1 reply; 28+ messages in thread
From: Matthew Auld @ 2023-10-19 14:41 UTC (permalink / raw)
  To: igt-dev; +Cc: Nitish Kumar

Add some basic tests for pat_index and vm_bind.

v2: Make sure to actually use srand() with the chosen seed
  - Make it work on xe2; the wt mode now has compression.
  - Also test some xe2+ specific pat_index modes.
v3: Fix decompress step.
v4: (Niranjana)
  - Various improvements, including testing more pat_index modes, like
    wc where possible.
  - Document the idea behind "common" modes.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Cc: Nitish Kumar <nitish.kumar@intel.com>
---
 tests/intel/xe_pat.c | 754 +++++++++++++++++++++++++++++++++++++++++++
 tests/meson.build    |   1 +
 2 files changed, 755 insertions(+)
 create mode 100644 tests/intel/xe_pat.c

diff --git a/tests/intel/xe_pat.c b/tests/intel/xe_pat.c
new file mode 100644
index 000000000..1e74014b8
--- /dev/null
+++ b/tests/intel/xe_pat.c
@@ -0,0 +1,754 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+/**
+ * TEST: Test for selecting per-VMA pat_index
+ * Category: Software building block
+ * Sub-category: VMA
+ * Functionality: pat_index
+ */
+
+#include "igt.h"
+#include "intel_blt.h"
+#include "intel_mocs.h"
+#include "intel_pat.h"
+
+#include "xe/xe_ioctl.h"
+#include "xe/xe_query.h"
+#include "xe/xe_util.h"
+
+#define PAGE_SIZE 4096
+
+static bool do_slow_check;
+
+/**
+ * SUBTEST: userptr-coh-none
+ * Test category: functionality test
+ * Description: Test non-coherent pat_index on userptr
+ */
+static void userptr_coh_none(int fd)
+{
+	size_t size = xe_get_default_alignment(fd);
+	uint32_t vm;
+	void *data;
+
+	data = mmap(0, size, PROT_READ |
+		    PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+	igt_assert(data != MAP_FAILED);
+
+	vm = xe_vm_create(fd, 0, 0);
+
+	/*
+	 * Try some valid combinations first just to make sure we're not being
+	 * swindled.
+	 */
+	igt_assert_eq(__xe_vm_bind(fd, vm, 0, 0, to_user_pointer(data), 0x40000,
+				   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0,
+				   DEFAULT_PAT_INDEX, 0),
+		      0);
+	xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
+	igt_assert_eq(__xe_vm_bind(fd, vm, 0, 0, to_user_pointer(data), 0x40000,
+				   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0,
+				   intel_get_pat_idx_wb(fd), 0),
+		      0);
+	xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
+
+	/* And then some known COH_NONE pat_index combos which should fail. */
+	igt_assert_eq(__xe_vm_bind(fd, vm, 0, 0, to_user_pointer(data), 0x40000,
+				   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0,
+				   intel_get_pat_idx_uc(fd), 0),
+		      -EINVAL);
+	igt_assert_eq(__xe_vm_bind(fd, vm, 0, 0, to_user_pointer(data), 0x40000,
+				   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0,
+				   intel_get_pat_idx_wt(fd), 0),
+		      -EINVAL);
+
+	munmap(data, size);
+	xe_vm_destroy(fd, vm);
+}
+
+/**
+ * SUBTEST: pat-index-all
+ * Test category: functionality test
+ * Description: Test every pat_index
+ */
+static void pat_index_all(int fd)
+{
+	uint16_t dev_id = intel_get_drm_devid(fd);
+	size_t size = xe_get_default_alignment(fd);
+	uint32_t vm, bo;
+	uint8_t pat_index;
+
+	vm = xe_vm_create(fd, 0, 0);
+
+	bo = xe_bo_create_caching(fd, 0, size, all_memory_regions(fd),
+				  DRM_XE_GEM_CPU_CACHING_WC,
+				  DRM_XE_GEM_COH_NONE);
+
+	igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
+				   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
+				   intel_get_pat_idx_uc(fd), 0),
+		      0);
+	xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
+
+	igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
+				   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
+				   intel_get_pat_idx_wt(fd), 0),
+		      0);
+	xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
+
+	igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
+				   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
+				   intel_get_pat_idx_wb(fd), 0),
+		      0);
+	xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
+
+	igt_assert(intel_get_max_pat_index(fd));
+
+	for (pat_index = 0; pat_index <= intel_get_max_pat_index(fd);
+	     pat_index++) {
+		if (intel_get_device_info(dev_id)->graphics_ver == 20 &&
+		    pat_index >= 16 && pat_index <= 19) { /* hw reserved */
+			igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
+						   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
+						   pat_index, 0),
+				      -EINVAL);
+		} else {
+			igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
+						   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
+						   pat_index, 0),
+				      0);
+			xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
+		}
+	}
+
+	igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
+				   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
+				   pat_index, 0),
+		      -EINVAL);
+
+	gem_close(fd, bo);
+
+	/* Must be at least as coherent as the gem_create coh_mode. */
+	bo = xe_bo_create_caching(fd, 0, size, system_memory(fd),
+				  DRM_XE_GEM_CPU_CACHING_WB,
+				  DRM_XE_GEM_COH_AT_LEAST_1WAY);
+
+	igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
+				   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
+				   intel_get_pat_idx_uc(fd), 0),
+		      -EINVAL);
+
+	igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
+				   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
+				   intel_get_pat_idx_wt(fd), 0),
+		      -EINVAL);
+
+	igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
+				   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
+				   intel_get_pat_idx_wb(fd), 0),
+		      0);
+	xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
+
+	gem_close(fd, bo);
+
+	xe_vm_destroy(fd, vm);
+}
+
+#define CLEAR_1 0xFFFFFFFF /* something compressible */
+
+static void xe2_blt_decompress_dst(int fd,
+				   intel_ctx_t *ctx,
+				   uint64_t ahnd,
+				   struct blt_copy_data *blt,
+				   uint32_t alias_handle,
+				   uint32_t size)
+{
+	struct blt_copy_object tmp = {};
+
+	/*
+	 * Xe2 in-place decompression using an alias to the same physical
+	 * memory, but with the dst mapped using some uncompressed pat_index.
+	 * This should allow checking the object pages via mmap.
+	 */
+
+	memcpy(&tmp, &blt->src, sizeof(blt->dst));
+	memcpy(&blt->src, &blt->dst, sizeof(blt->dst));
+	blt_set_object(&blt->dst, alias_handle, size, 0,
+		       intel_get_uc_mocs_index(fd),
+		       intel_get_pat_idx_uc(fd), /* compression disabled */
+		       T_LINEAR, 0, 0);
+	blt_fast_copy(fd, ctx, NULL, ahnd, blt);
+	memcpy(&blt->dst, &blt->src, sizeof(blt->dst));
+	memcpy(&blt->src, &tmp, sizeof(blt->dst));
+}
+
+struct xe_pat_size_mode {
+	uint16_t width;
+	uint16_t height;
+	uint32_t alignment;
+	const char *name;
+};
+
+struct xe_pat_param {
+	int fd;
+
+	const struct xe_pat_size_mode *size;
+
+	uint32_t r1;
+	uint8_t  r1_pat_index;
+	uint16_t r1_coh_mode;
+	bool     r1_force_cpu_wc;
+
+	uint32_t r2;
+	uint8_t  r2_pat_index;
+	uint16_t r2_coh_mode;
+	bool     r2_force_cpu_wc;
+	bool     r2_compressed; /* xe2+ compression */
+
+};
+
+static void pat_index_blt(struct xe_pat_param *p)
+{
+	struct drm_xe_engine_class_instance inst = {
+		.engine_class = DRM_XE_ENGINE_CLASS_COPY,
+	};
+	struct blt_copy_data blt = {};
+	struct blt_copy_object src = {};
+	struct blt_copy_object dst = {};
+	uint32_t vm, exec_queue, src_bo, dst_bo, bb;
+	uint32_t *src_map, *dst_map;
+	uint16_t r1_cpu_caching, r2_cpu_caching;
+	uint32_t r1_flags, r2_flags;
+	intel_ctx_t *ctx;
+	uint64_t ahnd;
+	int width = p->size->width, height = p->size->height;
+	int size, stride, bb_size;
+	int bpp = 32;
+	uint32_t alias, name;
+	int fd = p->fd;
+	int i;
+
+	igt_require(blt_has_fast_copy(fd));
+
+	vm = xe_vm_create(fd, DRM_XE_VM_CREATE_ASYNC_DEFAULT, 0);
+	exec_queue = xe_exec_queue_create(fd, vm, &inst, 0);
+	ctx = intel_ctx_xe(fd, vm, exec_queue, 0, 0, 0);
+	ahnd = intel_allocator_open_full(fd, ctx->vm, 0, 0,
+					 INTEL_ALLOCATOR_SIMPLE,
+					 ALLOC_STRATEGY_LOW_TO_HIGH,
+					 p->size->alignment);
+
+	bb_size = xe_get_default_alignment(fd);
+	bb = xe_bo_create_flags(fd, 0, bb_size, system_memory(fd));
+
+	size = width * height * bpp / 8;
+	stride = width * 4;
+
+	r1_flags = 0;
+	if (p->r1 != system_memory(fd))
+		r1_flags |= XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM;
+
+	if (p->r1_coh_mode == DRM_XE_GEM_COH_AT_LEAST_1WAY
+	    && p->r1 == system_memory(fd) && !p->r1_force_cpu_wc)
+		r1_cpu_caching = DRM_XE_GEM_CPU_CACHING_WB;
+	else
+		r1_cpu_caching = DRM_XE_GEM_CPU_CACHING_WC;
+
+	r2_flags = 0;
+	if (p->r2 != system_memory(fd))
+		r2_flags |= XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM;
+
+	if (p->r2_coh_mode == DRM_XE_GEM_COH_AT_LEAST_1WAY &&
+	    p->r2 == system_memory(fd) && !p->r2_force_cpu_wc)
+		r2_cpu_caching = DRM_XE_GEM_CPU_CACHING_WB;
+	else
+		r2_cpu_caching = DRM_XE_GEM_CPU_CACHING_WC;
+
+
+	src_bo = xe_bo_create_caching(fd, 0, size, p->r1 | r1_flags, r1_cpu_caching,
+				      p->r1_coh_mode);
+	dst_bo = xe_bo_create_caching(fd, 0, size, p->r2 | r2_flags, r2_cpu_caching,
+				      p->r2_coh_mode);
+	if (p->r2_compressed) {
+		name = gem_flink(fd, dst_bo);
+		alias = gem_open(fd, name);
+	}
+
+	blt_copy_init(fd, &blt);
+	blt.color_depth = CD_32bit;
+
+	blt_set_object(&src, src_bo, size, p->r1, intel_get_uc_mocs_index(fd),
+		       p->r1_pat_index, T_LINEAR,
+		       COMPRESSION_DISABLED, COMPRESSION_TYPE_3D);
+	blt_set_geom(&src, stride, 0, 0, width, height, 0, 0);
+
+	blt_set_object(&dst, dst_bo, size, p->r2, intel_get_uc_mocs_index(fd),
+		       p->r2_pat_index, T_LINEAR,
+		       COMPRESSION_DISABLED, COMPRESSION_TYPE_3D);
+	blt_set_geom(&dst, stride, 0, 0, width, height, 0, 0);
+
+	blt_set_copy_object(&blt.src, &src);
+	blt_set_copy_object(&blt.dst, &dst);
+	blt_set_batch(&blt.bb, bb, bb_size, system_memory(fd));
+
+	src_map = xe_bo_map(fd, src_bo, size);
+	dst_map = xe_bo_map(fd, dst_bo, size);
+
+	/* Ensure we always see zeroes for the initial KMD zeroing */
+	blt_fast_copy(fd, ctx, NULL, ahnd, &blt);
+	if (p->r2_compressed)
+		xe2_blt_decompress_dst(fd, ctx, ahnd, &blt, alias, size);
+
+	/*
+	 * Only sample random dword in every page if we are doing slow uncached
+	 * reads from VRAM.
+	 */
+	if (!do_slow_check && p->r2 != system_memory(fd)) {
+		int dwords_page = PAGE_SIZE / sizeof(uint32_t);
+		int dword = rand() % dwords_page;
+
+		igt_debug("random dword: %d\n", dword);
+
+		for (i = dword; i < size / sizeof(uint32_t); i += dwords_page)
+			igt_assert_eq(dst_map[i], 0);
+
+	} else {
+		for (i = 0; i < size / sizeof(uint32_t); i++)
+			igt_assert_eq(dst_map[i], 0);
+	}
+
+	/* Write some values from the CPU, potentially dirtying the CPU cache */
+	for (i = 0; i < size / sizeof(uint32_t); i++) {
+		if (p->r2_compressed)
+			src_map[i] = CLEAR_1;
+		else
+			src_map[i] = i;
+	}
+
+	/* And finally ensure we always see the CPU written values */
+	blt_fast_copy(fd, ctx, NULL, ahnd, &blt);
+	if (p->r2_compressed)
+		xe2_blt_decompress_dst(fd, ctx, ahnd, &blt, alias, size);
+
+	if (!do_slow_check && p->r2 != system_memory(fd)) {
+		int dwords_page = PAGE_SIZE / sizeof(uint32_t);
+		int dword = rand() % dwords_page;
+
+		igt_debug("random dword: %d\n", dword);
+
+		for (i = dword; i < size / sizeof(uint32_t); i += dwords_page) {
+			if (p->r2_compressed)
+				igt_assert_eq(dst_map[i], CLEAR_1);
+			else
+				igt_assert_eq(dst_map[i], i);
+		}
+
+	} else {
+		for (i = 0; i < size / sizeof(uint32_t); i++) {
+			if (p->r2_compressed)
+				igt_assert_eq(dst_map[i], CLEAR_1);
+			else
+				igt_assert_eq(dst_map[i], i);
+		}
+	}
+
+	munmap(src_map, size);
+	munmap(dst_map, size);
+
+	gem_close(fd, src_bo);
+	gem_close(fd, dst_bo);
+	gem_close(fd, bb);
+
+	xe_exec_queue_destroy(fd, exec_queue);
+	xe_vm_destroy(fd, vm);
+
+	put_ahnd(ahnd);
+	intel_ctx_destroy(fd, ctx);
+}
+
+static void pat_index_render(struct xe_pat_param *p)
+{
+	int fd = p->fd;
+	uint32_t devid = intel_get_drm_devid(fd);
+	igt_render_copyfunc_t render_copy = NULL;
+	int size, stride, width = p->size->width, height = p->size->height;
+	struct intel_buf src, dst;
+	struct intel_bb *ibb;
+	struct buf_ops *bops;
+	uint16_t r1_cpu_caching, r2_cpu_caching;
+	uint32_t r1_flags, r2_flags;
+	uint32_t src_bo, dst_bo;
+	uint32_t *src_map, *dst_map;
+	int bpp = 32;
+	int i;
+
+	bops = buf_ops_create(fd);
+
+	render_copy = igt_get_render_copyfunc(devid);
+	igt_require(render_copy);
+	igt_require(!p->r2_compressed); /* XXX */
+	igt_require(xe_has_engine_class(fd, DRM_XE_ENGINE_CLASS_RENDER));
+
+	ibb = intel_bb_create_full(fd, 0, 0, NULL, xe_get_default_alignment(fd),
+				   0, 0, p->size->alignment,
+				   INTEL_ALLOCATOR_SIMPLE,
+				   ALLOC_STRATEGY_HIGH_TO_LOW);
+
+	if (p->r1_coh_mode == DRM_XE_GEM_COH_AT_LEAST_1WAY
+	    && p->r1 == system_memory(fd) && !p->r1_force_cpu_wc)
+		r1_cpu_caching = DRM_XE_GEM_CPU_CACHING_WB;
+	else
+		r1_cpu_caching = DRM_XE_GEM_CPU_CACHING_WC;
+
+	if (p->r2_coh_mode == DRM_XE_GEM_COH_AT_LEAST_1WAY &&
+	    p->r2 == system_memory(fd) && !p->r2_force_cpu_wc)
+		r2_cpu_caching = DRM_XE_GEM_CPU_CACHING_WB;
+	else
+		r2_cpu_caching = DRM_XE_GEM_CPU_CACHING_WC;
+
+	size = width * height * bpp / 8;
+	stride = width * 4;
+
+	r1_flags = 0;
+	if (p->r1 != system_memory(fd))
+		r1_flags |= XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM;
+
+	src_bo = xe_bo_create_caching(fd, 0, size, p->r1 | r1_flags, r1_cpu_caching,
+				      p->r1_coh_mode);
+	intel_buf_init_full(bops, src_bo, &src, width, height, bpp, 0,
+			    I915_TILING_NONE, I915_COMPRESSION_NONE, size,
+			    stride, p->r1, p->r1_pat_index);
+
+	r2_flags = 0;
+	if (p->r2 != system_memory(fd))
+		r2_flags |= XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM;
+
+	dst_bo = xe_bo_create_caching(fd, 0, size, p->r2 | r2_flags, r2_cpu_caching,
+				      p->r2_coh_mode);
+	intel_buf_init_full(bops, dst_bo, &dst, width, height, bpp, 0,
+			    I915_TILING_NONE, I915_COMPRESSION_NONE, size,
+			    stride, p->r2, p->r2_pat_index);
+
+	src_map = xe_bo_map(fd, src_bo, size);
+	dst_map = xe_bo_map(fd, dst_bo, size);
+
+	/* Ensure we always see zeroes for the initial KMD zeroing */
+	render_copy(ibb,
+		    &src,
+		    0, 0, width, height,
+		    &dst,
+		    0, 0);
+	intel_bb_sync(ibb);
+
+	if (!do_slow_check && p->r2 != system_memory(fd)) {
+		int dwords_page = PAGE_SIZE / sizeof(uint32_t);
+		int dword = rand() % dwords_page;
+
+		igt_debug("random dword: %d\n", dword);
+
+		for (i = dword; i < size / sizeof(uint32_t); i += dwords_page)
+			igt_assert_eq(dst_map[i], 0);
+	} else {
+		for (i = 0; i < size / sizeof(uint32_t); i++)
+			igt_assert_eq(dst_map[i], 0);
+	}
+
+	/* Write some values from the CPU, potentially dirtying the CPU cache */
+	for (i = 0; i < size / sizeof(uint32_t); i++)
+		src_map[i] = i;
+
+	/* And finally ensure we always see the CPU written values */
+	render_copy(ibb,
+		    &src,
+		    0, 0, width, height,
+		    &dst,
+		    0, 0);
+	intel_bb_sync(ibb);
+
+	if (!do_slow_check && p->r2 != system_memory(fd)) {
+		int dwords_page = PAGE_SIZE / sizeof(uint32_t);
+		int dword = rand() % dwords_page;
+
+		igt_debug("random dword: %d\n", dword);
+
+		for (i = dword; i < size / sizeof(uint32_t); i += dwords_page)
+			igt_assert_eq(dst_map[i], i);
+	} else {
+		for (i = 0; i < size / sizeof(uint32_t); i++)
+			igt_assert_eq(dst_map[i], i);
+	}
+
+	munmap(src_map, size);
+	munmap(dst_map, size);
+
+	intel_bb_destroy(ibb);
+
+	gem_close(fd, src_bo);
+	gem_close(fd, dst_bo);
+}
+
+static uint8_t get_pat_idx_uc(int fd, bool *compressed)
+{
+	if (compressed)
+		*compressed = false;
+
+	return intel_get_pat_idx_uc(fd);
+}
+
+static uint8_t get_pat_idx_wt(int fd, bool *compressed)
+{
+	uint16_t dev_id = intel_get_drm_devid(fd);
+
+	if (compressed)
+		*compressed = intel_get_device_info(dev_id)->graphics_ver == 20;
+
+	return intel_get_pat_idx_wt(fd);
+}
+
+static uint8_t get_pat_idx_wb(int fd, bool *compressed)
+{
+	if (compressed)
+		*compressed = false;
+
+	return intel_get_pat_idx_wb(fd);
+}
+
+struct pat_index_entry {
+	uint8_t (*get_pat_index)(int fd, bool *compressed);
+
+	uint8_t pat_index;
+	bool compressed;
+
+	const char *name;
+	uint16_t coh_mode;
+	bool force_cpu_wc;
+};
+
+/*
+ * The common modes are available on all platforms supported by Xe and so should
+ * be commonly supported. There are many more possible pat_index modes, however
+ * most IGTs shouldn't really care about them so likely no need to add them to
+ * lib/intel_pat.c. We do try to test some of the non-common modes here.
+ */
+const struct pat_index_entry common_pat_index_modes[] = {
+	{ get_pat_idx_uc, 0, 0, "uc",        DRM_XE_GEM_COH_NONE                },
+	{ get_pat_idx_wt, 0, 0, "wt",        DRM_XE_GEM_COH_NONE                },
+	{ get_pat_idx_wb, 0, 0, "wb",        DRM_XE_GEM_COH_AT_LEAST_1WAY       },
+	{ get_pat_idx_wb, 0, 0, "wb-cpu-wc", DRM_XE_GEM_COH_AT_LEAST_1WAY, true },
+};
+
+const struct pat_index_entry xelp_pat_index_modes[] = {
+	{ NULL, 1, false, "wc", DRM_XE_GEM_COH_NONE },
+};
+
+const struct pat_index_entry xehpc_pat_index_modes[] = {
+	{ NULL, 1, false, "wc",    DRM_XE_GEM_COH_NONE          },
+	{ NULL, 4, false, "c1-wt", DRM_XE_GEM_COH_NONE          },
+	{ NULL, 5, false, "c1-wb", DRM_XE_GEM_COH_AT_LEAST_1WAY },
+	{ NULL, 6, false, "c2-wt", DRM_XE_GEM_COH_NONE          },
+	{ NULL, 7, false, "c2-wb", DRM_XE_GEM_COH_AT_LEAST_1WAY },
+};
+
+/* Too many, just pick some interesting ones */
+const struct pat_index_entry xe2_pat_index_modes[] = {
+	{ NULL, 1, false, "1way",        DRM_XE_GEM_COH_AT_LEAST_1WAY       },
+	{ NULL, 2, false, "2way",        DRM_XE_GEM_COH_AT_LEAST_1WAY       },
+	{ NULL, 2, false, "2way-cpu-wc", DRM_XE_GEM_COH_AT_LEAST_1WAY, true },
+	{ NULL, 3, true,  "uc-comp",     DRM_XE_GEM_COH_NONE                },
+	{ NULL, 5, false, "uc-1way",     DRM_XE_GEM_COH_AT_LEAST_1WAY       },
+};
+
+/*
+ * Depending on 2M/1G GTT pages we might trigger different PTE layouts for the
+ * PAT bits, so make sure we test with and without huge-pages. Also ensure we
+ * have a mix of different pat_index modes for each PDE.
+ */
+const struct xe_pat_size_mode size_modes[] =  {
+	{ 256,  256,  0,        "mixed-pde"  },
+	{ 1024, 1024, 1u << 21, "single-pde" },
+};
+
+typedef void (*copy_fn)(struct xe_pat_param *p);
+
+const struct xe_pat_copy_mode {
+	copy_fn fn;
+	const char *name;
+} copy_modes[] =  {
+	{  pat_index_blt,    "blt"    },
+	{  pat_index_render, "render" },
+};
+
+/**
+ * SUBTEST: pat-index-common
+ * Test category: functionality test
+ * Description: Check the common pat_index modes.
+ */
+
+/**
+ * SUBTEST: pat-index-xelp
+ * Test category: functionality test
+ * Description: Check some of the xelp pat_index modes.
+ */
+
+/**
+ * SUBTEST: pat-index-xehpc
+ * Test category: functionality test
+ * Description: Check some of the xehpc pat_index modes.
+ */
+
+/**
+ * SUBTEST: pat-index-xe2
+ * Test category: functionality test
+ * Description: Check some of the xe2 pat_index modes.
+ */
+
+static void subtest_pat_index_modes_with_regions(int fd,
+						 const struct pat_index_entry *modes_arr,
+						 int n_modes)
+{
+	struct igt_collection *copy_set;
+	struct igt_collection *pat_index_set;
+	struct igt_collection *regions_set;
+	struct igt_collection *sizes_set;
+	struct igt_collection *copies;
+	struct xe_pat_param p = {};
+
+	p.fd = fd;
+
+	copy_set = igt_collection_create(ARRAY_SIZE(copy_modes));
+
+	pat_index_set = igt_collection_create(n_modes);
+
+	regions_set = xe_get_memory_region_set(fd,
+					       XE_MEM_REGION_CLASS_SYSMEM,
+					       XE_MEM_REGION_CLASS_VRAM);
+
+	sizes_set = igt_collection_create(ARRAY_SIZE(size_modes));
+
+	for_each_variation_r(copies, 1, copy_set) {
+		struct igt_collection *regions;
+		struct xe_pat_copy_mode copy_mode;
+
+		copy_mode = copy_modes[igt_collection_get_value(copies, 0)];
+
+		for_each_variation_r(regions, 2, regions_set) {
+			struct igt_collection *pat_modes;
+			uint32_t r1, r2;
+			char *reg_str;
+
+			r1 = igt_collection_get_value(regions, 0);
+			r2 = igt_collection_get_value(regions, 1);
+
+			reg_str = xe_memregion_dynamic_subtest_name(fd, regions);
+
+			for_each_variation_r(pat_modes, 2, pat_index_set) {
+				struct igt_collection *sizes;
+				struct pat_index_entry r1_entry, r2_entry;
+				int r1_idx, r2_idx;
+
+				r1_idx = igt_collection_get_value(pat_modes, 0);
+				r2_idx = igt_collection_get_value(pat_modes, 1);
+
+				r1_entry = modes_arr[r1_idx];
+				r2_entry = modes_arr[r2_idx];
+
+				if (r1_entry.get_pat_index)
+					p.r1_pat_index = r1_entry.get_pat_index(fd, NULL);
+				else
+					p.r1_pat_index = r1_entry.pat_index;
+
+				if (r2_entry.get_pat_index)
+					p.r2_pat_index = r2_entry.get_pat_index(fd, &p.r2_compressed);
+				else {
+					p.r2_pat_index = r2_entry.pat_index;
+					p.r2_compressed = r2_entry.compressed;
+				}
+
+				p.r1_coh_mode = r1_entry.coh_mode;
+				p.r2_coh_mode = r2_entry.coh_mode;
+
+				p.r1_force_cpu_wc = r1_entry.force_cpu_wc;
+				p.r2_force_cpu_wc = r2_entry.force_cpu_wc;
+
+				p.r1 = r1;
+				p.r2 = r2;
+
+				for_each_variation_r(sizes, 1, sizes_set) {
+					int size_mode_idx = igt_collection_get_value(sizes, 0);
+
+					p.size = &size_modes[size_mode_idx];
+
+					igt_debug("[r1]: r: %u, idx: %u, coh: %u, wc: %d\n",
+						  p.r1, p.r1_pat_index, p.r1_coh_mode, p.r1_force_cpu_wc);
+					igt_debug("[r2]: r: %u, idx: %u, coh: %u, wc: %d, comp: %d, w: %u, h: %u, a: %u\n",
+						  p.r2, p.r2_pat_index, p.r2_coh_mode,
+						  p.r2_force_cpu_wc, p.r2_compressed,
+						  p.size->width, p.size->height,
+						  p.size->alignment);
+
+					igt_dynamic_f("%s-%s-%s-%s-%s",
+						      copy_mode.name,
+						      reg_str, r1_entry.name,
+						      r2_entry.name, p.size->name)
+						copy_mode.fn(&p);
+				}
+			}
+
+			free(reg_str);
+		}
+	}
+}
+
+igt_main
+{
+	uint16_t dev_id;
+	int fd;
+
+	igt_fixture {
+		uint32_t seed;
+
+		fd = drm_open_driver(DRIVER_XE);
+		dev_id = intel_get_drm_devid(fd);
+
+		seed = time(NULL);
+		srand(seed);
+		igt_debug("seed: %d\n", seed);
+
+		xe_device_get(fd);
+	}
+
+	igt_subtest("pat-index-all")
+		pat_index_all(fd);
+
+	igt_subtest("userptr-coh-none")
+		userptr_coh_none(fd);
+
+	igt_subtest_with_dynamic("pat-index-common") {
+		subtest_pat_index_modes_with_regions(fd, common_pat_index_modes,
+						     ARRAY_SIZE(common_pat_index_modes));
+	}
+
+	igt_subtest_with_dynamic("pat-index-xelp") {
+		igt_require(intel_graphics_ver(dev_id) <= IP_VER(12, 55));
+		subtest_pat_index_modes_with_regions(fd, xelp_pat_index_modes,
+						     ARRAY_SIZE(xelp_pat_index_modes));
+	}
+
+	igt_subtest_with_dynamic("pat-index-xehpc") {
+		igt_require(IS_PONTEVECCHIO(dev_id));
+		subtest_pat_index_modes_with_regions(fd, xehpc_pat_index_modes,
+						     ARRAY_SIZE(xehpc_pat_index_modes));
+	}
+
+	igt_subtest_with_dynamic("pat-index-xe2") {
+		igt_require(intel_get_device_info(dev_id)->graphics_ver >= 20);
+		subtest_pat_index_modes_with_regions(fd, xe2_pat_index_modes,
+						     ARRAY_SIZE(xe2_pat_index_modes));
+	}
+
+	igt_fixture
+		drm_close_driver(fd);
+}
diff --git a/tests/meson.build b/tests/meson.build
index 5afcd8cbb..3aecfbee0 100644
--- a/tests/meson.build
+++ b/tests/meson.build
@@ -297,6 +297,7 @@ intel_xe_progs = [
 	'xe_mmap',
 	'xe_module_load',
 	'xe_noexec_ping_pong',
+	'xe_pat',
 	'xe_pm',
 	'xe_pm_residency',
 	'xe_prime_self_import',
-- 
2.41.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [igt-dev] [PATCH i-g-t v4 15/15] tests/intel-ci/xe: add pat and caching related tests
  2023-10-19 14:40 [igt-dev] [PATCH i-g-t v4 00/15] PAT and cache coherency support Matthew Auld
                   ` (13 preceding siblings ...)
  2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 14/15] tests/xe: add some vm_bind pat_index tests Matthew Auld
@ 2023-10-19 14:41 ` Matthew Auld
  14 siblings, 0 replies; 28+ messages in thread
From: Matthew Auld @ 2023-10-19 14:41 UTC (permalink / raw)
  To: igt-dev

Add the various pat_index, coh_mode and cpu_caching related tests to
BAT.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
 tests/intel-ci/xe-fast-feedback.testlist | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/tests/intel-ci/xe-fast-feedback.testlist b/tests/intel-ci/xe-fast-feedback.testlist
index 0cf28baf9..a01b38918 100644
--- a/tests/intel-ci/xe-fast-feedback.testlist
+++ b/tests/intel-ci/xe-fast-feedback.testlist
@@ -138,6 +138,7 @@ igt@xe_intel_bb@simple-bb-ctx
 igt@xe_mmap@bad-extensions
 igt@xe_mmap@bad-flags
 igt@xe_mmap@bad-object
+igt@xe_mmap@cpu-caching-coh
 igt@xe_mmap@system
 igt@xe_mmap@vram
 igt@xe_mmap@vram-system
@@ -178,6 +179,12 @@ igt@xe_vm@munmap-style-unbind-userptr-end
 igt@xe_vm@munmap-style-unbind-userptr-front
 igt@xe_vm@munmap-style-unbind-userptr-inval-end
 igt@xe_vm@munmap-style-unbind-userptr-inval-front
+igt@xe_pat@userptr-coh-none
+igt@xe_pat@pat-index-all
+igt@xe_pat@pat-index-common
+igt@xe_pat@pat-index-xelp
+igt@xe_pat@pat-index-xehpc
+igt@xe_pat@pat-index-xe2
 igt@xe_waitfence@abstime
 igt@xe_waitfence@reltime
 igt@kms_addfb_basic@addfb25-4-tiled
-- 
2.41.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [igt-dev] [PATCH i-g-t v4 11/15] lib/intel_allocator: treat default_alignment as the minimum
  2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 11/15] lib/intel_allocator: treat default_alignment as the minimum Matthew Auld
@ 2023-10-19 17:34   ` Niranjana Vishwanathapura
  2023-10-20  7:55     ` Matthew Auld
  0 siblings, 1 reply; 28+ messages in thread
From: Niranjana Vishwanathapura @ 2023-10-19 17:34 UTC (permalink / raw)
  To: Matthew Auld; +Cc: igt-dev

On Thu, Oct 19, 2023 at 03:41:02PM +0100, Matthew Auld wrote:
>If something overrides the default alignment, we should only apply the
>alignment if it is larger than the default_alignment.
>
>Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
>Cc: José Roberto de Souza <jose.souza@intel.com>
>Cc: Pallavi Mishra <pallavi.mishra@intel.com>
>---
> lib/intel_allocator.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/lib/intel_allocator.c b/lib/intel_allocator.c
>index e5b9457b8..d94043016 100644
>--- a/lib/intel_allocator.c
>+++ b/lib/intel_allocator.c
>@@ -586,6 +586,9 @@ static int handle_request(struct alloc_req *req, struct alloc_resp *resp)
> 		case REQ_ALLOC:
> 			if (!req->alloc.alignment)
> 				req->alloc.alignment = ial->default_alignment;
>+			else
>+				req->alloc.alignment = max(ial->default_alignment,
>+							   req->alloc.alignment);

Looks like we don't need the if/else clause here:
req->alloc.alignment = max(ial->default_alignment, req->alloc.alignment);
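For illustration, a minimal sketch of how the REQ_ALLOC case could then look with the
suggested simplification applied (names taken from the hunk above; this is only a sketch,
not the final patch):

	case REQ_ALLOC:
		/*
		 * No if/else needed: with a zero requested alignment,
		 * max() simply picks the default.
		 */
		req->alloc.alignment = max(ial->default_alignment,
					   req->alloc.alignment);

		resp->response_type = RESP_ALLOC;
		/* ... rest of the case unchanged ... */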

Other than that, the change looks good to me.
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

Niranjana

>
> 			resp->response_type = RESP_ALLOC;
> 			resp->alloc.offset = ial->alloc(ial,
>-- 
>2.41.0
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [igt-dev] [PATCH i-g-t v4 10/15] lib/xe_ioctl: update vm_bind to account for pat_index
  2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 10/15] lib/xe_ioctl: update vm_bind to account for pat_index Matthew Auld
@ 2023-10-19 17:37   ` Niranjana Vishwanathapura
  2023-10-20  5:19     ` Niranjana Vishwanathapura
  0 siblings, 1 reply; 28+ messages in thread
From: Niranjana Vishwanathapura @ 2023-10-19 17:37 UTC (permalink / raw)
  To: Matthew Auld; +Cc: igt-dev

On Thu, Oct 19, 2023 at 03:41:01PM +0100, Matthew Auld wrote:
>Keep things minimal and select the 1way+ by default on all platforms.
>Other users can use intel_buf, get_offset_pat_index etc or use
>__xe_vm_bind() directly.  Display tests don't directly use this
>interface.
>
>Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>Cc: José Roberto de Souza <jose.souza@intel.com>
>Cc: Pallavi Mishra <pallavi.mishra@intel.com>
>---
> lib/xe/xe_ioctl.c   | 8 ++++++--
> lib/xe/xe_ioctl.h   | 2 +-
> tests/intel/xe_vm.c | 5 ++++-
> 3 files changed, 11 insertions(+), 4 deletions(-)
>
>diff --git a/lib/xe/xe_ioctl.c b/lib/xe/xe_ioctl.c
>index 4cf44f1ee..f51f931ee 100644
>--- a/lib/xe/xe_ioctl.c
>+++ b/lib/xe/xe_ioctl.c
>@@ -41,6 +41,7 @@
> #include "config.h"
> #include "drmtest.h"
> #include "igt_syncobj.h"
>+#include "intel_pat.h"
> #include "ioctl_wrappers.h"
> #include "xe_ioctl.h"
> #include "xe_query.h"
>@@ -92,7 +93,7 @@ void xe_vm_bind_array(int fd, uint32_t vm, uint32_t exec_queue,
> int  __xe_vm_bind(int fd, uint32_t vm, uint32_t exec_queue, uint32_t bo,
> 		  uint64_t offset, uint64_t addr, uint64_t size, uint32_t op,
> 		  uint32_t flags, struct drm_xe_sync *sync, uint32_t num_syncs,
>-		  uint32_t region, uint64_t ext)
>+		  uint32_t region, uint8_t pat_index, uint64_t ext)
> {
> 	struct drm_xe_vm_bind bind = {
> 		.extensions = ext,
>@@ -108,6 +109,8 @@ int  __xe_vm_bind(int fd, uint32_t vm, uint32_t exec_queue, uint32_t bo,
> 		.num_syncs = num_syncs,
> 		.syncs = (uintptr_t)sync,
> 		.exec_queue_id = exec_queue,
>+		.bind.pat_index = (pat_index == DEFAULT_PAT_INDEX) ?
>+			intel_get_pat_idx_wb(fd) : pat_index,
> 	};
>
> 	if (igt_ioctl(fd, DRM_IOCTL_XE_VM_BIND, &bind))
>@@ -122,7 +125,8 @@ void  __xe_vm_bind_assert(int fd, uint32_t vm, uint32_t exec_queue, uint32_t bo,
> 			  uint32_t num_syncs, uint32_t region, uint64_t ext)
> {
> 	igt_assert_eq(__xe_vm_bind(fd, vm, exec_queue, bo, offset, addr, size,
>-				   op, flags, sync, num_syncs, region, ext), 0);
>+				   op, flags, sync, num_syncs, region, DEFAULT_PAT_INDEX,
>+				   ext), 0);
> }
>
> void xe_vm_bind(int fd, uint32_t vm, uint32_t bo, uint64_t offset,
>diff --git a/lib/xe/xe_ioctl.h b/lib/xe/xe_ioctl.h
>index e3f62a28a..a28375d3e 100644
>--- a/lib/xe/xe_ioctl.h
>+++ b/lib/xe/xe_ioctl.h
>@@ -20,7 +20,7 @@ uint32_t xe_vm_create(int fd, uint32_t flags, uint64_t ext);
> int  __xe_vm_bind(int fd, uint32_t vm, uint32_t exec_queue, uint32_t bo,
> 		  uint64_t offset, uint64_t addr, uint64_t size, uint32_t op,
> 		  uint32_t flags, struct drm_xe_sync *sync, uint32_t num_syncs,
>-		  uint32_t region, uint64_t ext);
>+		  uint32_t region, uint8_t pat_index, uint64_t ext);
> void  __xe_vm_bind_assert(int fd, uint32_t vm, uint32_t exec_queue, uint32_t bo,
> 			  uint64_t offset, uint64_t addr, uint64_t size,
> 			  uint32_t op, uint32_t flags, struct drm_xe_sync *sync,
>diff --git a/tests/intel/xe_vm.c b/tests/intel/xe_vm.c
>index dd3302337..a01e1ba47 100644
>--- a/tests/intel/xe_vm.c
>+++ b/tests/intel/xe_vm.c
>@@ -10,6 +10,7 @@
>  */
>
> #include "igt.h"
>+#include "intel_pat.h"
> #include "lib/igt_syncobj.h"
> #include "lib/intel_reg.h"
> #include "xe_drm.h"
>@@ -316,7 +317,8 @@ static void userptr_invalid(int fd)
> 	vm = xe_vm_create(fd, 0, 0);
> 	munmap(data, size);
> 	ret = __xe_vm_bind(fd, vm, 0, 0, to_user_pointer(data), 0x40000,
>-			   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0, 0);
>+			   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0,
>+			   DEFAULT_PAT_INDEX, 0);
> 	igt_assert(ret == -EFAULT);
>
> 	xe_vm_destroy(fd, vm);
>@@ -755,6 +757,7 @@ test_bind_array(int fd, struct drm_xe_engine_class_instance *eci, int n_execs,
> 		bind_ops[i].op = XE_VM_BIND_OP_MAP;
> 		bind_ops[i].flags = XE_VM_BIND_FLAG_ASYNC;
> 		bind_ops[i].region = 0;
>+		bind_ops[i].pat_index = intel_get_pat_idx_wb(fd);
> 		bind_ops[i].reserved[0] = 0;
> 		bind_ops[i].reserved[1] = 0;

I am seeing a few other usages of vm_bind_array() (below):
lib/xe/xe_util.c
lib/intel_batchbuffer.c

I think they need to be updated too.
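
For reference, those callers would presumably need the same one-line addition as the
xe_vm.c hunk above when filling each struct drm_xe_vm_bind_op -- a rough sketch (the
exact surrounding code in lib/xe/xe_util.c and lib/intel_batchbuffer.c is assumed here,
not checked):

	bind_ops[i].region = 0;
	/* default to wb, matching what xe_vm_bind() now selects */
	bind_ops[i].pat_index = intel_get_pat_idx_wb(fd);
	bind_ops[i].reserved[0] = 0;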

Niranjana

>
>-- 
>2.41.0
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [igt-dev] [PATCH i-g-t v4 13/15] lib/intel_batchbuffer: extend to include optional alignment
  2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 13/15] lib/intel_batchbuffer: extend to include optional alignment Matthew Auld
@ 2023-10-19 20:36   ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 28+ messages in thread
From: Niranjana Vishwanathapura @ 2023-10-19 20:36 UTC (permalink / raw)
  To: Matthew Auld; +Cc: igt-dev

On Thu, Oct 19, 2023 at 03:41:04PM +0100, Matthew Auld wrote:
>Extend intel_bb_create_full() to support specifying the alignment for
>the allocator. This will be useful in an upcoming test.
>
>Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
>Cc: José Roberto de Souza <jose.souza@intel.com>
>Cc: Pallavi Mishra <pallavi.mishra@intel.com>

LGTM.
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
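
As a usage sketch, the new alignment argument sits between the allocator range and the
allocator type; this mirrors how tests/intel/xe_pat.c later in the series calls it (the
2M value below is just an example, not something the patch mandates):

	struct intel_bb *ibb;

	/* alignment = 0 keeps the old default behaviour; here we ask for 2M */
	ibb = intel_bb_create_full(fd, 0, 0, NULL, xe_get_default_alignment(fd),
				   0, 0, 1ull << 21,
				   INTEL_ALLOCATOR_SIMPLE,
				   ALLOC_STRATEGY_HIGH_TO_LOW);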

>---
> lib/intel_batchbuffer.c | 30 +++++++++++++++++++-----------
> lib/intel_batchbuffer.h |  2 +-
> 2 files changed, 20 insertions(+), 12 deletions(-)
>
>diff --git a/lib/intel_batchbuffer.c b/lib/intel_batchbuffer.c
>index ef3e3154a..0fcf62452 100644
>--- a/lib/intel_batchbuffer.c
>+++ b/lib/intel_batchbuffer.c
>@@ -894,7 +894,7 @@ static inline uint64_t __intel_bb_get_offset(struct intel_bb *ibb,
> static struct intel_bb *
> __intel_bb_create(int fd, uint32_t ctx, uint32_t vm, const intel_ctx_cfg_t *cfg,
> 		  uint32_t size, bool do_relocs,
>-		  uint64_t start, uint64_t end,
>+		  uint64_t start, uint64_t end, uint64_t alignment,
> 		  uint8_t allocator_type, enum allocator_strategy strategy)
> {
> 	struct drm_i915_gem_exec_object2 *object;
>@@ -918,7 +918,11 @@ __intel_bb_create(int fd, uint32_t ctx, uint32_t vm, const intel_ctx_cfg_t *cfg,
> 	 */
> 	if (ibb->driver == INTEL_DRIVER_I915) {
> 		ibb->uses_full_ppgtt = gem_uses_full_ppgtt(fd);
>-		ibb->alignment = gem_detect_safe_alignment(fd);
>+
>+		if (!alignment)
>+			alignment = gem_detect_safe_alignment(fd);
>+
>+		ibb->alignment = alignment;
> 		ibb->gtt_size = gem_aperture_size(fd);
> 		ibb->handle = gem_create(fd, size);
>
>@@ -947,7 +951,10 @@ __intel_bb_create(int fd, uint32_t ctx, uint32_t vm, const intel_ctx_cfg_t *cfg,
> 	} else {
> 		igt_assert(!do_relocs);
>
>-		ibb->alignment = xe_get_default_alignment(fd);
>+		if (!alignment)
>+			alignment = xe_get_default_alignment(fd);
>+
>+		ibb->alignment = alignment;
> 		size = ALIGN(size, ibb->alignment);
> 		ibb->handle = xe_bo_create_flags(fd, 0, size, visible_vram_if_possible(fd, 0));
>
>@@ -1018,6 +1025,7 @@ __intel_bb_create(int fd, uint32_t ctx, uint32_t vm, const intel_ctx_cfg_t *cfg,
>  * @size: size of the batchbuffer
>  * @start: allocator vm start address
>  * @end: allocator vm start address
>+ * @alignment: alignment to use for allocator, zero for default
>  * @allocator_type: allocator type, SIMPLE, RELOC, ...
>  * @strategy: allocation strategy
>  *
>@@ -1034,11 +1042,11 @@ __intel_bb_create(int fd, uint32_t ctx, uint32_t vm, const intel_ctx_cfg_t *cfg,
> struct intel_bb *intel_bb_create_full(int fd, uint32_t ctx, uint32_t vm,
> 				      const intel_ctx_cfg_t *cfg, uint32_t size,
> 				      uint64_t start, uint64_t end,
>-				      uint8_t allocator_type,
>+				      uint64_t alignment, uint8_t allocator_type,
> 				      enum allocator_strategy strategy)
> {
> 	return __intel_bb_create(fd, ctx, vm, cfg, size, false, start, end,
>-				 allocator_type, strategy);
>+				 alignment, allocator_type, strategy);
> }
>
> /**
>@@ -1063,7 +1071,7 @@ struct intel_bb *intel_bb_create_with_allocator(int fd, uint32_t ctx, uint32_t v
> 						uint32_t size,
> 						uint8_t allocator_type)
> {
>-	return __intel_bb_create(fd, ctx, vm, cfg, size, false, 0, 0,
>+	return __intel_bb_create(fd, ctx, vm, cfg, size, false, 0, 0, 0,
> 				 allocator_type, ALLOC_STRATEGY_HIGH_TO_LOW);
> }
>
>@@ -1102,7 +1110,7 @@ struct intel_bb *intel_bb_create(int fd, uint32_t size)
> 	bool relocs = is_i915_device(fd) && gem_has_relocations(fd);
>
> 	return __intel_bb_create(fd, 0, 0, NULL, size,
>-				 relocs && !aux_needs_softpin(fd), 0, 0,
>+				 relocs && !aux_needs_softpin(fd), 0, 0, 0,
> 				 INTEL_ALLOCATOR_SIMPLE,
> 				 ALLOC_STRATEGY_HIGH_TO_LOW);
> }
>@@ -1129,7 +1137,7 @@ intel_bb_create_with_context(int fd, uint32_t ctx, uint32_t vm,
> 	bool relocs = is_i915_device(fd) && gem_has_relocations(fd);
>
> 	return __intel_bb_create(fd, ctx, vm, cfg, size,
>-				 relocs && !aux_needs_softpin(fd), 0, 0,
>+				 relocs && !aux_needs_softpin(fd), 0, 0, 0,
> 				 INTEL_ALLOCATOR_SIMPLE,
> 				 ALLOC_STRATEGY_HIGH_TO_LOW);
> }
>@@ -1150,7 +1158,7 @@ struct intel_bb *intel_bb_create_with_relocs(int fd, uint32_t size)
> {
> 	igt_require(is_i915_device(fd) && gem_has_relocations(fd));
>
>-	return __intel_bb_create(fd, 0, 0, NULL, size, true, 0, 0,
>+	return __intel_bb_create(fd, 0, 0, NULL, size, true, 0, 0, 0,
> 				 INTEL_ALLOCATOR_NONE, ALLOC_STRATEGY_NONE);
> }
>
>@@ -1175,7 +1183,7 @@ intel_bb_create_with_relocs_and_context(int fd, uint32_t ctx,
> {
> 	igt_require(is_i915_device(fd) && gem_has_relocations(fd));
>
>-	return __intel_bb_create(fd, ctx, 0, cfg, size, true, 0, 0,
>+	return __intel_bb_create(fd, ctx, 0, cfg, size, true, 0, 0, 0,
> 				 INTEL_ALLOCATOR_NONE, ALLOC_STRATEGY_NONE);
> }
>
>@@ -1195,7 +1203,7 @@ struct intel_bb *intel_bb_create_no_relocs(int fd, uint32_t size)
> {
> 	igt_require(gem_uses_full_ppgtt(fd));
>
>-	return __intel_bb_create(fd, 0, 0, NULL, size, false, 0, 0,
>+	return __intel_bb_create(fd, 0, 0, NULL, size, false, 0, 0, 0,
> 				 INTEL_ALLOCATOR_SIMPLE,
> 				 ALLOC_STRATEGY_HIGH_TO_LOW);
> }
>diff --git a/lib/intel_batchbuffer.h b/lib/intel_batchbuffer.h
>index bdb3b6a67..8738cb5c4 100644
>--- a/lib/intel_batchbuffer.h
>+++ b/lib/intel_batchbuffer.h
>@@ -307,7 +307,7 @@ struct intel_bb {
> struct intel_bb *
> intel_bb_create_full(int fd, uint32_t ctx, uint32_t vm,
> 		     const intel_ctx_cfg_t *cfg, uint32_t size, uint64_t start,
>-		     uint64_t end, uint8_t allocator_type,
>+		     uint64_t end, uint64_t alignment, uint8_t allocator_type,
> 		     enum allocator_strategy strategy);
> struct intel_bb *
> intel_bb_create_with_allocator(int fd, uint32_t ctx, uint32_t vm,
>-- 
>2.41.0
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [igt-dev] [PATCH i-g-t v4 12/15] lib/intel_blt: tidy up alignment usage
  2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 12/15] lib/intel_blt: tidy up alignment usage Matthew Auld
@ 2023-10-19 20:46   ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 28+ messages in thread
From: Niranjana Vishwanathapura @ 2023-10-19 20:46 UTC (permalink / raw)
  To: Matthew Auld; +Cc: igt-dev

On Thu, Oct 19, 2023 at 03:41:03PM +0100, Matthew Auld wrote:
>No need to select get_default_alignment() all over the place; the
>allocator should know the required default alignment. If we need
>something specific, like in the case of the ctrl surf, we can now set it
>explicitly, safe in the knowledge that the allocator will already consider
>the default alignment as the minimum.
>
>Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
>Cc: José Roberto de Souza <jose.souza@intel.com>
>Cc: Pallavi Mishra <pallavi.mishra@intel.com>

LGTM.
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
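
In practice the call sites below now pass 0 and rely on the allocator minimum added in
the previous patch, while anything stricter is still requested explicitly -- a short
sketch pulled from the hunks that follow:

	/* 0: let the allocator apply its default minimum alignment */
	src_offset = get_offset_pat_index(ahnd, blt->src.handle, blt->src.size,
					  0, blt->src.pat_index);

	/* ctrl-surf still wants 64K, so that is asked for explicitly */
	alignment = 1ull << 16;
	dst_offset = get_offset_pat_index(ahnd, surf->dst.handle, surf->dst.size,
					  alignment, surf->dst.pat_index);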

>---
> lib/intel_blt.c | 78 +++++++++++++++++++------------------------------
> 1 file changed, 30 insertions(+), 48 deletions(-)
>
>diff --git a/lib/intel_blt.c b/lib/intel_blt.c
>index 2e9074eaf..a25f0a814 100644
>--- a/lib/intel_blt.c
>+++ b/lib/intel_blt.c
>@@ -783,14 +783,6 @@ static void dump_bb_ext(struct gen12_block_copy_data_ext *data)
> 		 data->dw21.src_array_index);
> }
>
>-static uint64_t get_default_alignment(int fd, enum intel_driver driver)
>-{
>-	if (driver == INTEL_DRIVER_XE)
>-		return xe_get_default_alignment(fd);
>-
>-	return gem_detect_safe_alignment(fd);
>-}
>-
> static void *bo_map(int fd, uint32_t handle, uint64_t size,
> 		    enum intel_driver driver)
> {
>@@ -842,21 +834,20 @@ uint64_t emit_blt_block_copy(int fd,
> 	unsigned int ip_ver = intel_graphics_ver(intel_get_drm_devid(fd));
> 	struct gen12_block_copy_data data = {};
> 	struct gen12_block_copy_data_ext dext = {};
>-	uint64_t dst_offset, src_offset, bb_offset, alignment;
>+	uint64_t dst_offset, src_offset, bb_offset;
> 	uint32_t bbe = MI_BATCH_BUFFER_END;
> 	uint8_t *bb;
>
> 	igt_assert_f(ahnd, "block-copy supports softpin only\n");
> 	igt_assert_f(blt, "block-copy requires data to do blit\n");
>
>-	alignment = get_default_alignment(fd, blt->driver);
> 	src_offset = get_offset_pat_index(ahnd, blt->src.handle, blt->src.size,
>-					  alignment, blt->src.pat_index);
>+					  0, blt->src.pat_index);
> 	src_offset += blt->src.plane_offset;
> 	dst_offset = get_offset_pat_index(ahnd, blt->dst.handle, blt->dst.size,
>-					  alignment, blt->dst.pat_index);
>+					  0, blt->dst.pat_index);
> 	dst_offset += blt->dst.plane_offset;
>-	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, alignment);
>+	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, 0);
>
> 	fill_data(&data, blt, src_offset, dst_offset, ext, ip_ver);
>
>@@ -918,19 +909,18 @@ int blt_block_copy(int fd,
> {
> 	struct drm_i915_gem_execbuffer2 execbuf = {};
> 	struct drm_i915_gem_exec_object2 obj[3] = {};
>-	uint64_t dst_offset, src_offset, bb_offset, alignment;
>+	uint64_t dst_offset, src_offset, bb_offset;
> 	int ret;
>
> 	igt_assert_f(ahnd, "block-copy supports softpin only\n");
> 	igt_assert_f(blt, "block-copy requires data to do blit\n");
> 	igt_assert_neq(blt->driver, 0);
>
>-	alignment = get_default_alignment(fd, blt->driver);
> 	src_offset = get_offset_pat_index(ahnd, blt->src.handle, blt->src.size,
>-					  alignment, blt->src.pat_index);
>+					  0, blt->src.pat_index);
> 	dst_offset = get_offset_pat_index(ahnd, blt->dst.handle, blt->dst.size,
>-					  alignment, blt->dst.pat_index);
>-	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, alignment);
>+					  0, blt->dst.pat_index);
>+	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, 0);
>
> 	emit_blt_block_copy(fd, ahnd, blt, ext, 0, true);
>
>@@ -1132,7 +1122,7 @@ uint64_t emit_blt_ctrl_surf_copy(int fd,
> 	igt_assert_f(ahnd, "ctrl-surf-copy supports softpin only\n");
> 	igt_assert_f(surf, "ctrl-surf-copy requires data to do ctrl-surf-copy blit\n");
>
>-	alignment = max_t(uint64_t, get_default_alignment(fd, surf->driver), 1ull << 16);
>+	alignment = 1ull << 16;
> 	src_offset = get_offset_pat_index(ahnd, surf->src.handle, surf->src.size,
> 					  alignment, surf->src.pat_index);
> 	dst_offset = get_offset_pat_index(ahnd, surf->dst.handle, surf->dst.size,
>@@ -1236,7 +1226,7 @@ int blt_ctrl_surf_copy(int fd,
> 	igt_assert_f(surf, "ctrl-surf-copy requires data to do ctrl-surf-copy blit\n");
> 	igt_assert_neq(surf->driver, 0);
>
>-	alignment = max_t(uint64_t, get_default_alignment(fd, surf->driver), 1ull << 16);
>+	alignment = 1ull << 16;
> 	src_offset = get_offset_pat_index(ahnd, surf->src.handle, surf->src.size,
> 					  alignment, surf->src.pat_index);
> 	dst_offset = get_offset_pat_index(ahnd, surf->dst.handle, surf->dst.size,
>@@ -1443,13 +1433,10 @@ uint64_t emit_blt_fast_copy(int fd,
> {
> 	unsigned int ip_ver = intel_graphics_ver(intel_get_drm_devid(fd));
> 	struct gen12_fast_copy_data data = {};
>-	uint64_t dst_offset, src_offset, bb_offset, alignment;
>+	uint64_t dst_offset, src_offset, bb_offset;
> 	uint32_t bbe = MI_BATCH_BUFFER_END;
> 	uint32_t *bb;
>
>-
>-	alignment = get_default_alignment(fd, blt->driver);
>-
> 	data.dw00.client = 0x2;
> 	data.dw00.opcode = 0x42;
> 	data.dw00.dst_tiling = __fast_tiling(blt->dst.tiling);
>@@ -1480,12 +1467,12 @@ uint64_t emit_blt_fast_copy(int fd,
> 	data.dw03.dst_y2 = blt->dst.y2;
>
> 	src_offset = get_offset_pat_index(ahnd, blt->src.handle, blt->src.size,
>-					  alignment, blt->src.pat_index);
>+					  0, blt->src.pat_index);
> 	src_offset += blt->src.plane_offset;
>-	dst_offset = get_offset_pat_index(ahnd, blt->dst.handle, blt->dst.size, alignment,
>+	dst_offset = get_offset_pat_index(ahnd, blt->dst.handle, blt->dst.size, 0,
> 					  blt->dst.pat_index);
> 	dst_offset += blt->dst.plane_offset;
>-	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, alignment);
>+	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, 0);
>
> 	data.dw04.dst_address_lo = dst_offset;
> 	data.dw05.dst_address_hi = dst_offset >> 32;
>@@ -1550,19 +1537,18 @@ int blt_fast_copy(int fd,
> {
> 	struct drm_i915_gem_execbuffer2 execbuf = {};
> 	struct drm_i915_gem_exec_object2 obj[3] = {};
>-	uint64_t dst_offset, src_offset, bb_offset, alignment;
>+	uint64_t dst_offset, src_offset, bb_offset;
> 	int ret;
>
> 	igt_assert_f(ahnd, "fast-copy supports softpin only\n");
> 	igt_assert_f(blt, "fast-copy requires data to do fast-copy blit\n");
> 	igt_assert_neq(blt->driver, 0);
>
>-	alignment = get_default_alignment(fd, blt->driver);
> 	src_offset = get_offset_pat_index(ahnd, blt->src.handle, blt->src.size,
>-					  alignment, blt->src.pat_index);
>+					  0, blt->src.pat_index);
> 	dst_offset = get_offset_pat_index(ahnd, blt->dst.handle, blt->dst.size,
>-					  alignment, blt->dst.pat_index);
>-	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, alignment);
>+					  0, blt->dst.pat_index);
>+	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, 0);
>
> 	emit_blt_fast_copy(fd, ahnd, blt, 0, true);
>
>@@ -1610,16 +1596,15 @@ void blt_mem_init(int fd, struct blt_mem_data *mem)
>
> static void emit_blt_mem_copy(int fd, uint64_t ahnd, const struct blt_mem_data *mem)
> {
>-	uint64_t dst_offset, src_offset, alignment;
>+	uint64_t dst_offset, src_offset;
> 	int i;
> 	uint32_t *batch;
> 	uint32_t optype;
>
>-	alignment = get_default_alignment(fd, mem->driver);
> 	src_offset = get_offset_pat_index(ahnd, mem->src.handle, mem->src.size,
>-					  alignment, mem->src.pat_index);
>+					  0, mem->src.pat_index);
> 	dst_offset = get_offset_pat_index(ahnd, mem->dst.handle, mem->dst.size,
>-					  alignment, mem->dst.pat_index);
>+					  0, mem->dst.pat_index);
>
> 	batch = bo_map(fd, mem->bb.handle, mem->bb.size, mem->driver);
> 	optype = mem->src.type == M_MATRIX ? 1 << 17 : 0;
>@@ -1660,15 +1645,14 @@ int blt_mem_copy(int fd, const intel_ctx_t *ctx,
> {
> 	struct drm_i915_gem_execbuffer2 execbuf = {};
> 	struct drm_i915_gem_exec_object2 obj[3] = {};
>-	uint64_t dst_offset, src_offset, bb_offset, alignment;
>+	uint64_t dst_offset, src_offset, bb_offset;
> 	int ret;
>
>-	alignment = get_default_alignment(fd, mem->driver);
> 	src_offset = get_offset_pat_index(ahnd, mem->src.handle, mem->src.size,
>-					  alignment, mem->src.pat_index);
>+					  0, mem->src.pat_index);
> 	dst_offset = get_offset_pat_index(ahnd, mem->dst.handle, mem->dst.size,
>-					  alignment, mem->dst.pat_index);
>-	bb_offset = get_offset(ahnd, mem->bb.handle, mem->bb.size, alignment);
>+					  0, mem->dst.pat_index);
>+	bb_offset = get_offset(ahnd, mem->bb.handle, mem->bb.size, 0);
>
> 	emit_blt_mem_copy(fd, ahnd, mem);
>
>@@ -1701,14 +1685,13 @@ int blt_mem_copy(int fd, const intel_ctx_t *ctx,
> static void emit_blt_mem_set(int fd, uint64_t ahnd, const struct blt_mem_data *mem,
> 			     uint8_t fill_data)
> {
>-	uint64_t dst_offset, alignment;
>+	uint64_t dst_offset;
> 	int b;
> 	uint32_t *batch;
> 	uint32_t value;
>
>-	alignment = get_default_alignment(fd, mem->driver);
> 	dst_offset = get_offset_pat_index(ahnd, mem->dst.handle, mem->dst.size,
>-					  alignment, mem->dst.pat_index);
>+					  0, mem->dst.pat_index);
>
> 	batch = bo_map(fd, mem->bb.handle, mem->bb.size, mem->driver);
> 	value = (uint32_t)fill_data << 24;
>@@ -1747,13 +1730,12 @@ int blt_mem_set(int fd, const intel_ctx_t *ctx,
> {
> 	struct drm_i915_gem_execbuffer2 execbuf = {};
> 	struct drm_i915_gem_exec_object2 obj[2] = {};
>-	uint64_t dst_offset, bb_offset, alignment;
>+	uint64_t dst_offset, bb_offset;
> 	int ret;
>
>-	alignment = get_default_alignment(fd, mem->driver);
> 	dst_offset = get_offset_pat_index(ahnd, mem->dst.handle, mem->dst.size,
>-					  alignment, mem->dst.pat_index);
>-	bb_offset = get_offset(ahnd, mem->bb.handle, mem->bb.size, alignment);
>+					  0, mem->dst.pat_index);
>+	bb_offset = get_offset(ahnd, mem->bb.handle, mem->bb.size, 0);
>
> 	emit_blt_mem_set(fd, ahnd, mem, fill_data);
>
>-- 
>2.41.0
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [igt-dev] [PATCH i-g-t v4 09/15] lib/intel_buf: support pat_index
  2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 09/15] lib/intel_buf: " Matthew Auld
@ 2023-10-20  5:17   ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 28+ messages in thread
From: Niranjana Vishwanathapura @ 2023-10-20  5:17 UTC (permalink / raw)
  To: Matthew Auld; +Cc: igt-dev

On Thu, Oct 19, 2023 at 03:41:00PM +0100, Matthew Auld wrote:
>Some users need to be able to select their own pat_index. Some display
>tests use igt_draw, which in turn uses intel_batchbuffer and intel_buf.
>We also have a couple more display tests using these interfaces
>directly. The idea is to select wt/uc for anything display related, but
>also allow any test to select a pat_index for a given intel_buf.
>
>v2: (Zbigniew):
>  - Add some macro helpers for decoding pat_index and range in rsvd1 (Zbigniew):
>  - Rather use uc than wt. On xe2+ wt uses compression so CPU access
>    might not work as expected, so for now just use uc.
>v3:
>  - Drop pat_index from reserve_if_not_allocated.
>
>Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>Cc: José Roberto de Souza <jose.souza@intel.com>
>Cc: Pallavi Mishra <pallavi.mishra@intel.com>
>Acked-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
>---
> lib/igt_draw.c            |  7 ++++-
> lib/igt_fb.c              |  3 ++-
> lib/intel_batchbuffer.c   | 54 ++++++++++++++++++++++++++++++---------
> lib/intel_bufops.c        | 29 ++++++++++++++-------
> lib/intel_bufops.h        |  9 +++++--
> tests/intel/kms_big_fb.c  |  4 ++-
> tests/intel/kms_dirtyfb.c |  7 +++--
> tests/intel/kms_psr.c     |  4 ++-
> tests/intel/xe_intel_bb.c |  3 ++-
> 9 files changed, 90 insertions(+), 30 deletions(-)
>
>diff --git a/lib/igt_draw.c b/lib/igt_draw.c
>index 1cf9d87c9..efd3a2436 100644
>--- a/lib/igt_draw.c
>+++ b/lib/igt_draw.c
>@@ -31,6 +31,7 @@
> #include "intel_batchbuffer.h"
> #include "intel_chipset.h"
> #include "intel_mocs.h"
>+#include "intel_pat.h"
> #include "igt_core.h"
> #include "igt_fb.h"
> #include "ioctl_wrappers.h"
>@@ -75,6 +76,7 @@ struct buf_data {
> 	uint32_t size;
> 	uint32_t stride;
> 	int bpp;
>+	uint8_t pat_index;
> };
>
> struct rect {
>@@ -658,7 +660,8 @@ static struct intel_buf *create_buf(int fd, struct buf_ops *bops,
> 				    width, height, from->bpp, 0,
> 				    tiling, 0,
> 				    size, 0,
>-				    region);
>+				    region,
>+				    from->pat_index);
>
> 	/* Make sure we close handle on destroy path */
> 	intel_buf_set_ownership(buf, true);
>@@ -791,6 +794,7 @@ static void draw_rect_render(int fd, struct cmd_data *cmd_data,
> 	igt_skip_on(!rendercopy);
>
> 	/* We create a temporary buffer and copy from it using rendercopy. */
>+	tmp.pat_index = buf->pat_index;
> 	tmp.size = rect->w * rect->h * pixel_size;
> 	if (is_i915_device(fd))
> 		tmp.handle = gem_create(fd, tmp.size);
>@@ -858,6 +862,7 @@ void igt_draw_rect(int fd, struct buf_ops *bops, uint32_t ctx,
> 		.size = buf_size,
> 		.stride = buf_stride,
> 		.bpp = bpp,
>+		.pat_index = intel_get_pat_idx_uc(fd),
> 	};
> 	struct rect rect = {
> 		.x = rect_x,
>diff --git a/lib/igt_fb.c b/lib/igt_fb.c
>index e8f46534e..531496e7b 100644
>--- a/lib/igt_fb.c
>+++ b/lib/igt_fb.c
>@@ -2637,7 +2637,8 @@ igt_fb_create_intel_buf(int fd, struct buf_ops *bops,
> 				    igt_fb_mod_to_tiling(fb->modifier),
> 				    compression, fb->size,
> 				    fb->strides[0],
>-				    region);
>+				    region,
>+				    intel_get_pat_idx_uc(fd));
> 	intel_buf_set_name(buf, name);
>
> 	/* Make sure we close handle on destroy path */
>diff --git a/lib/intel_batchbuffer.c b/lib/intel_batchbuffer.c
>index df82ef5f5..ef3e3154a 100644
>--- a/lib/intel_batchbuffer.c
>+++ b/lib/intel_batchbuffer.c
>@@ -38,6 +38,7 @@
> #include "intel_batchbuffer.h"
> #include "intel_bufops.h"
> #include "intel_chipset.h"
>+#include "intel_pat.h"
> #include "media_fill.h"
> #include "media_spin.h"
> #include "sw_sync.h"
>@@ -825,15 +826,18 @@ static void __reallocate_objects(struct intel_bb *ibb)
> static inline uint64_t __intel_bb_get_offset(struct intel_bb *ibb,
> 					     uint32_t handle,
> 					     uint64_t size,
>-					     uint32_t alignment)
>+					     uint32_t alignment,
>+					     uint8_t pat_index)
> {
> 	uint64_t offset;
>
> 	if (ibb->enforce_relocs)
> 		return 0;
>
>-	offset = intel_allocator_alloc(ibb->allocator_handle,
>-				       handle, size, alignment);
>+	offset = __intel_allocator_alloc(ibb->allocator_handle, handle,
>+					 size, alignment, pat_index,
>+					 ALLOC_STRATEGY_NONE);
>+	igt_assert(offset != ALLOC_INVALID_ADDRESS);
>
> 	return offset;
> }
>@@ -1280,6 +1284,10 @@ void intel_bb_destroy(struct intel_bb *ibb)
> 	free(ibb);
> }
>
>+#define SZ_4K	0x1000
>+#define XE_OBJ_SIZE(rsvd1) ((rsvd1) & ~(SZ_4K-1))
>+#define XE_OBJ_PAT_IDX(rsvd1) ((rsvd1) & (SZ_4K-1))
>+
> static struct drm_xe_vm_bind_op *xe_alloc_bind_ops(struct intel_bb *ibb,
> 						   uint32_t op, uint32_t flags,
> 						   uint32_t region)
>@@ -1302,11 +1310,14 @@ static struct drm_xe_vm_bind_op *xe_alloc_bind_ops(struct intel_bb *ibb,
> 		ops->flags = flags;
> 		ops->obj_offset = 0;
> 		ops->addr = objects[i]->offset;
>-		ops->range = objects[i]->rsvd1;
>+		ops->range = XE_OBJ_SIZE(objects[i]->rsvd1);
> 		ops->region = region;
>+		if (set_obj)
>+			ops->pat_index = XE_OBJ_PAT_IDX(objects[i]->rsvd1);
>
>-		igt_debug("  [%d]: handle: %u, offset: %llx, size: %llx\n",
>-			  i, ops->obj, (long long)ops->addr, (long long)ops->range);
>+		igt_debug("  [%d]: handle: %u, offset: %llx, size: %llx pat_index: %u\n",
>+			  i, ops->obj, (long long)ops->addr, (long long)ops->range,
>+			  ops->pat_index);
> 	}
>
> 	return bind_ops;
>@@ -1412,7 +1423,8 @@ void intel_bb_reset(struct intel_bb *ibb, bool purge_objects_cache)
> 		ibb->batch_offset = __intel_bb_get_offset(ibb,
> 							  ibb->handle,
> 							  ibb->size,
>-							  ibb->alignment);
>+							  ibb->alignment,
>+							  DEFAULT_PAT_INDEX);
>
> 	intel_bb_add_object(ibb, ibb->handle, ibb->size,
> 			    ibb->batch_offset,
>@@ -1648,7 +1660,8 @@ static void __remove_from_objects(struct intel_bb *ibb,
>  */
> static struct drm_i915_gem_exec_object2 *
> __intel_bb_add_object(struct intel_bb *ibb, uint32_t handle, uint64_t size,
>-		      uint64_t offset, uint64_t alignment, bool write)
>+		      uint64_t offset, uint64_t alignment, uint8_t pat_index,
>+		      bool write)
> {
> 	struct drm_i915_gem_exec_object2 *object;
>
>@@ -1664,6 +1677,9 @@ __intel_bb_add_object(struct intel_bb *ibb, uint32_t handle, uint64_t size,
> 	object = __add_to_cache(ibb, handle);
> 	__add_to_objects(ibb, object);
>
>+	if (pat_index == DEFAULT_PAT_INDEX)
>+		pat_index = intel_get_pat_idx_wb(ibb->fd);
>+
> 	/*
> 	 * If object->offset == INVALID_ADDRESS we added freshly object to the
> 	 * cache. In that case we have two choices:
>@@ -1673,7 +1689,7 @@ __intel_bb_add_object(struct intel_bb *ibb, uint32_t handle, uint64_t size,
> 	if (INVALID_ADDR(object->offset)) {
> 		if (INVALID_ADDR(offset)) {
> 			offset = __intel_bb_get_offset(ibb, handle, size,
>-						       alignment);
>+						       alignment, pat_index);
> 		} else {
> 			offset = offset & (ibb->gtt_size - 1);
>
>@@ -1724,6 +1740,18 @@ __intel_bb_add_object(struct intel_bb *ibb, uint32_t handle, uint64_t size,
> 	if (ibb->driver == INTEL_DRIVER_XE) {
> 		object->alignment = alignment;
> 		object->rsvd1 = size;
>+		igt_assert(!XE_OBJ_PAT_IDX(object->rsvd1));
>+
>+		if (pat_index == DEFAULT_PAT_INDEX)
>+			pat_index = intel_get_pat_idx_wb(ibb->fd);
>+
>+		/*
>+		 * XXX: For now encode the pat_index in the first few bits of
>+		 * rsvd1. intel_batchbuffer should really stop using the i915
>+		 * drm_i915_gem_exec_object2 to encode VMA placement
>+		 * information on xe...
>+		 */

I agree. At some point, we hope to move away from using drm_i915_gem_exec_object2
for Xe in IGT libraries.
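
For readers skimming the thread, the encoding the patch relies on is roughly the
following (sketch only; it works because object sizes are page aligned, so the low
12 bits of rsvd1 are otherwise unused):

	#define SZ_4K	0x1000
	#define XE_OBJ_SIZE(rsvd1)	((rsvd1) & ~(SZ_4K-1))
	#define XE_OBJ_PAT_IDX(rsvd1)	((rsvd1) & (SZ_4K-1))

	/* encode: size in the upper bits, pat_index in the low 12 bits */
	object->rsvd1 = size | pat_index;

	/* decode again when building the vm_bind ops */
	ops->range = XE_OBJ_SIZE(objects[i]->rsvd1);
	ops->pat_index = XE_OBJ_PAT_IDX(objects[i]->rsvd1);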

Overall the patch seems logical to me. If IGT mistakenly sends DEFAULT_PAT_INDEX to the KMD,
we have a range check there to catch it. So,
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

Niranjana

>+		object->rsvd1 |= pat_index;
> 	}
>
> 	return object;
>@@ -1736,7 +1764,7 @@ intel_bb_add_object(struct intel_bb *ibb, uint32_t handle, uint64_t size,
> 	struct drm_i915_gem_exec_object2 *obj = NULL;
>
> 	obj = __intel_bb_add_object(ibb, handle, size, offset,
>-				    alignment, write);
>+				    alignment, DEFAULT_PAT_INDEX, write);
> 	igt_assert(obj);
>
> 	return obj;
>@@ -1798,8 +1826,10 @@ __intel_bb_add_intel_buf(struct intel_bb *ibb, struct intel_buf *buf,
> 		}
> 	}
>
>-	obj = intel_bb_add_object(ibb, buf->handle, intel_buf_bo_size(buf),
>-				  buf->addr.offset, alignment, write);
>+	obj = __intel_bb_add_object(ibb, buf->handle, intel_buf_bo_size(buf),
>+				    buf->addr.offset, alignment, buf->pat_index,
>+				    write);
>+	igt_assert(obj);
> 	buf->addr.offset = obj->offset;
>
> 	if (igt_list_empty(&buf->link)) {
>diff --git a/lib/intel_bufops.c b/lib/intel_bufops.c
>index 2c91adb88..fbee4748e 100644
>--- a/lib/intel_bufops.c
>+++ b/lib/intel_bufops.c
>@@ -29,6 +29,7 @@
> #include "igt.h"
> #include "igt_x86.h"
> #include "intel_bufops.h"
>+#include "intel_pat.h"
> #include "xe/xe_ioctl.h"
> #include "xe/xe_query.h"
>
>@@ -818,7 +819,7 @@ static void __intel_buf_init(struct buf_ops *bops,
> 			     int width, int height, int bpp, int alignment,
> 			     uint32_t req_tiling, uint32_t compression,
> 			     uint64_t bo_size, int bo_stride,
>-			     uint64_t region)
>+			     uint64_t region, uint8_t pat_index)
> {
> 	uint32_t tiling = req_tiling;
> 	uint64_t size;
>@@ -839,6 +840,10 @@ static void __intel_buf_init(struct buf_ops *bops,
> 	IGT_INIT_LIST_HEAD(&buf->link);
> 	buf->mocs = INTEL_BUF_MOCS_DEFAULT;
>
>+	if (pat_index == DEFAULT_PAT_INDEX)
>+		pat_index = intel_get_pat_idx_wb(bops->fd);
>+	buf->pat_index = pat_index;
>+
> 	if (compression) {
> 		igt_require(bops->intel_gen >= 9);
> 		igt_assert(req_tiling == I915_TILING_Y ||
>@@ -957,7 +962,7 @@ void intel_buf_init(struct buf_ops *bops,
> 	region = bops->driver == INTEL_DRIVER_I915 ? I915_SYSTEM_MEMORY :
> 						     system_memory(bops->fd);
> 	__intel_buf_init(bops, 0, buf, width, height, bpp, alignment,
>-			 tiling, compression, 0, 0, region);
>+			 tiling, compression, 0, 0, region, DEFAULT_PAT_INDEX);
>
> 	intel_buf_set_ownership(buf, true);
> }
>@@ -974,7 +979,7 @@ void intel_buf_init_in_region(struct buf_ops *bops,
> 			      uint64_t region)
> {
> 	__intel_buf_init(bops, 0, buf, width, height, bpp, alignment,
>-			 tiling, compression, 0, 0, region);
>+			 tiling, compression, 0, 0, region, DEFAULT_PAT_INDEX);
>
> 	intel_buf_set_ownership(buf, true);
> }
>@@ -1033,7 +1038,7 @@ void intel_buf_init_using_handle(struct buf_ops *bops,
> 				 uint32_t req_tiling, uint32_t compression)
> {
> 	__intel_buf_init(bops, handle, buf, width, height, bpp, alignment,
>-			 req_tiling, compression, 0, 0, -1);
>+			 req_tiling, compression, 0, 0, -1, DEFAULT_PAT_INDEX);
> }
>
> /**
>@@ -1050,6 +1055,7 @@ void intel_buf_init_using_handle(struct buf_ops *bops,
>  * @size: real bo size
>  * @stride: bo stride
>  * @region: region
>+ * @pat_index: pat_index to use for the binding (only used on xe)
>  *
>  * Function configures BO handle within intel_buf structure passed by the caller
>  * (with all its metadata - width, height, ...). Useful if BO was created
>@@ -1067,10 +1073,12 @@ void intel_buf_init_full(struct buf_ops *bops,
> 			 uint32_t compression,
> 			 uint64_t size,
> 			 int stride,
>-			 uint64_t region)
>+			 uint64_t region,
>+			 uint8_t pat_index)
> {
> 	__intel_buf_init(bops, handle, buf, width, height, bpp, alignment,
>-			 req_tiling, compression, size, stride, region);
>+			 req_tiling, compression, size, stride, region,
>+			 pat_index);
> }
>
> /**
>@@ -1149,7 +1157,8 @@ struct intel_buf *intel_buf_create_using_handle_and_size(struct buf_ops *bops,
> 							 int stride)
> {
> 	return intel_buf_create_full(bops, handle, width, height, bpp, alignment,
>-				     req_tiling, compression, size, stride, -1);
>+				     req_tiling, compression, size, stride, -1,
>+				     DEFAULT_PAT_INDEX);
> }
>
> struct intel_buf *intel_buf_create_full(struct buf_ops *bops,
>@@ -1160,7 +1169,8 @@ struct intel_buf *intel_buf_create_full(struct buf_ops *bops,
> 					uint32_t compression,
> 					uint64_t size,
> 					int stride,
>-					uint64_t region)
>+					uint64_t region,
>+					uint8_t pat_index)
> {
> 	struct intel_buf *buf;
>
>@@ -1170,7 +1180,8 @@ struct intel_buf *intel_buf_create_full(struct buf_ops *bops,
> 	igt_assert(buf);
>
> 	__intel_buf_init(bops, handle, buf, width, height, bpp, alignment,
>-			 req_tiling, compression, size, stride, region);
>+			 req_tiling, compression, size, stride, region,
>+			 pat_index);
>
> 	return buf;
> }
>diff --git a/lib/intel_bufops.h b/lib/intel_bufops.h
>index 4dfe4681c..b6048402b 100644
>--- a/lib/intel_bufops.h
>+++ b/lib/intel_bufops.h
>@@ -63,6 +63,9 @@ struct intel_buf {
> 	/* Content Protection*/
> 	bool is_protected;
>
>+	/* pat_index to use for mapping this buf. Only used in Xe. */
>+	uint8_t pat_index;
>+
> 	/* For debugging purposes */
> 	char name[INTEL_BUF_NAME_MAXSIZE + 1];
> };
>@@ -161,7 +164,8 @@ void intel_buf_init_full(struct buf_ops *bops,
> 			 uint32_t compression,
> 			 uint64_t size,
> 			 int stride,
>-			 uint64_t region);
>+			 uint64_t region,
>+			 uint8_t pat_index);
>
> struct intel_buf *intel_buf_create(struct buf_ops *bops,
> 				   int width, int height,
>@@ -192,7 +196,8 @@ struct intel_buf *intel_buf_create_full(struct buf_ops *bops,
> 					uint32_t compression,
> 					uint64_t size,
> 					int stride,
>-					uint64_t region);
>+					uint64_t region,
>+					uint8_t pat_index);
> void intel_buf_destroy(struct intel_buf *buf);
>
> static inline void intel_buf_set_pxp(struct intel_buf *buf, bool new_pxp_state)
>diff --git a/tests/intel/kms_big_fb.c b/tests/intel/kms_big_fb.c
>index 2c7b24fca..64a67e34a 100644
>--- a/tests/intel/kms_big_fb.c
>+++ b/tests/intel/kms_big_fb.c
>@@ -34,6 +34,7 @@
> #include <string.h>
>
> #include "i915/gem_create.h"
>+#include "intel_pat.h"
> #include "xe/xe_ioctl.h"
> #include "xe/xe_query.h"
>
>@@ -88,7 +89,8 @@ static struct intel_buf *init_buf(data_t *data,
> 	handle = gem_open(data->drm_fd, name);
> 	buf = intel_buf_create_full(data->bops, handle, width, height,
> 				    bpp, 0, tiling, 0, size, 0,
>-				    region);
>+				    region,
>+				    intel_get_pat_idx_uc(data->drm_fd));
>
> 	intel_buf_set_name(buf, buf_name);
> 	intel_buf_set_ownership(buf, true);
>diff --git a/tests/intel/kms_dirtyfb.c b/tests/intel/kms_dirtyfb.c
>index cc9529178..bf9f91505 100644
>--- a/tests/intel/kms_dirtyfb.c
>+++ b/tests/intel/kms_dirtyfb.c
>@@ -10,6 +10,7 @@
>
> #include "i915/intel_drrs.h"
> #include "i915/intel_fbc.h"
>+#include "intel_pat.h"
>
> #include "xe/xe_query.h"
>
>@@ -246,14 +247,16 @@ static void run_test(data_t *data)
> 				    0,
> 				    igt_fb_mod_to_tiling(data->fbs[1].modifier),
> 				    0, 0, 0, is_xe_device(data->drm_fd) ?
>-				    system_memory(data->drm_fd) : 0);
>+				    system_memory(data->drm_fd) : 0,
>+				    intel_get_pat_idx_uc(data->drm_fd));
> 	dst = intel_buf_create_full(data->bops, data->fbs[2].gem_handle,
> 				    data->fbs[2].width,
> 				    data->fbs[2].height,
> 				    igt_drm_format_to_bpp(data->fbs[2].drm_format),
> 				    0, igt_fb_mod_to_tiling(data->fbs[2].modifier),
> 				    0, 0, 0, is_xe_device(data->drm_fd) ?
>-				    system_memory(data->drm_fd) : 0);
>+				    system_memory(data->drm_fd) : 0,
>+				    intel_get_pat_idx_uc(data->drm_fd));
> 	ibb = intel_bb_create(data->drm_fd, PAGE_SIZE);
>
> 	spin = igt_spin_new(data->drm_fd, .ahnd = ibb->allocator_handle);
>diff --git a/tests/intel/kms_psr.c b/tests/intel/kms_psr.c
>index ffecc5222..4cc41e479 100644
>--- a/tests/intel/kms_psr.c
>+++ b/tests/intel/kms_psr.c
>@@ -31,6 +31,7 @@
> #include "igt.h"
> #include "igt_sysfs.h"
> #include "igt_psr.h"
>+#include "intel_pat.h"
> #include <errno.h>
> #include <stdbool.h>
> #include <stdio.h>
>@@ -356,7 +357,8 @@ static struct intel_buf *create_buf_from_fb(data_t *data,
> 	name = gem_flink(data->drm_fd, fb->gem_handle);
> 	handle = gem_open(data->drm_fd, name);
> 	buf = intel_buf_create_full(data->bops, handle, width, height,
>-				    bpp, 0, tiling, 0, size, stride, region);
>+				    bpp, 0, tiling, 0, size, stride, region,
>+				    intel_get_pat_idx_uc(data->drm_fd));
> 	intel_buf_set_ownership(buf, true);
>
> 	return buf;
>diff --git a/tests/intel/xe_intel_bb.c b/tests/intel/xe_intel_bb.c
>index 26e4dcc85..e2accb743 100644
>--- a/tests/intel/xe_intel_bb.c
>+++ b/tests/intel/xe_intel_bb.c
>@@ -19,6 +19,7 @@
> #include "igt.h"
> #include "igt_crc.h"
> #include "intel_bufops.h"
>+#include "intel_pat.h"
> #include "xe/xe_ioctl.h"
> #include "xe/xe_query.h"
>
>@@ -400,7 +401,7 @@ static void create_in_region(struct buf_ops *bops, uint64_t region)
> 	intel_buf_init_full(bops, handle, &buf,
> 			    width/4, height, 32, 0,
> 			    I915_TILING_NONE, 0,
>-			    size, 0, region);
>+			    size, 0, region, DEFAULT_PAT_INDEX);
> 	intel_buf_set_ownership(&buf, true);
>
> 	intel_bb_add_intel_buf(ibb, &buf, false);
>-- 
>2.41.0
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [igt-dev] [PATCH i-g-t v4 10/15] lib/xe_ioctl: update vm_bind to account for pat_index
  2023-10-19 17:37   ` Niranjana Vishwanathapura
@ 2023-10-20  5:19     ` Niranjana Vishwanathapura
  2023-10-20  8:13       ` Matthew Auld
  0 siblings, 1 reply; 28+ messages in thread
From: Niranjana Vishwanathapura @ 2023-10-20  5:19 UTC (permalink / raw)
  To: Matthew Auld; +Cc: igt-dev

On Thu, Oct 19, 2023 at 10:37:57AM -0700, Niranjana Vishwanathapura wrote:
>On Thu, Oct 19, 2023 at 03:41:01PM +0100, Matthew Auld wrote:
>>Keep things minimal and select the 1way+ by default on all platforms.
>>Other users can use intel_buf, get_offset_pat_index etc or use
>>__xe_vm_bind() directly.  Display tests don't directly use this
>>interface.
>>
>>Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>Cc: José Roberto de Souza <jose.souza@intel.com>
>>Cc: Pallavi Mishra <pallavi.mishra@intel.com>
>>---
>>lib/xe/xe_ioctl.c   | 8 ++++++--
>>lib/xe/xe_ioctl.h   | 2 +-
>>tests/intel/xe_vm.c | 5 ++++-
>>3 files changed, 11 insertions(+), 4 deletions(-)
>>
>>diff --git a/lib/xe/xe_ioctl.c b/lib/xe/xe_ioctl.c
>>index 4cf44f1ee..f51f931ee 100644
>>--- a/lib/xe/xe_ioctl.c
>>+++ b/lib/xe/xe_ioctl.c
>>@@ -41,6 +41,7 @@
>>#include "config.h"
>>#include "drmtest.h"
>>#include "igt_syncobj.h"
>>+#include "intel_pat.h"
>>#include "ioctl_wrappers.h"
>>#include "xe_ioctl.h"
>>#include "xe_query.h"
>>@@ -92,7 +93,7 @@ void xe_vm_bind_array(int fd, uint32_t vm, uint32_t exec_queue,
>>int  __xe_vm_bind(int fd, uint32_t vm, uint32_t exec_queue, uint32_t bo,
>>		  uint64_t offset, uint64_t addr, uint64_t size, uint32_t op,
>>		  uint32_t flags, struct drm_xe_sync *sync, uint32_t num_syncs,
>>-		  uint32_t region, uint64_t ext)
>>+		  uint32_t region, uint8_t pat_index, uint64_t ext)
>>{
>>	struct drm_xe_vm_bind bind = {
>>		.extensions = ext,
>>@@ -108,6 +109,8 @@ int  __xe_vm_bind(int fd, uint32_t vm, uint32_t exec_queue, uint32_t bo,
>>		.num_syncs = num_syncs,
>>		.syncs = (uintptr_t)sync,
>>		.exec_queue_id = exec_queue,
>>+		.bind.pat_index = (pat_index == DEFAULT_PAT_INDEX) ?
>>+			intel_get_pat_idx_wb(fd) : pat_index,
>>	};
>>
>>	if (igt_ioctl(fd, DRM_IOCTL_XE_VM_BIND, &bind))
>>@@ -122,7 +125,8 @@ void  __xe_vm_bind_assert(int fd, uint32_t vm, uint32_t exec_queue, uint32_t bo,
>>			  uint32_t num_syncs, uint32_t region, uint64_t ext)
>>{
>>	igt_assert_eq(__xe_vm_bind(fd, vm, exec_queue, bo, offset, addr, size,
>>-				   op, flags, sync, num_syncs, region, ext), 0);
>>+				   op, flags, sync, num_syncs, region, DEFAULT_PAT_INDEX,
>>+				   ext), 0);
>>}
>>
>>void xe_vm_bind(int fd, uint32_t vm, uint32_t bo, uint64_t offset,
>>diff --git a/lib/xe/xe_ioctl.h b/lib/xe/xe_ioctl.h
>>index e3f62a28a..a28375d3e 100644
>>--- a/lib/xe/xe_ioctl.h
>>+++ b/lib/xe/xe_ioctl.h
>>@@ -20,7 +20,7 @@ uint32_t xe_vm_create(int fd, uint32_t flags, uint64_t ext);
>>int  __xe_vm_bind(int fd, uint32_t vm, uint32_t exec_queue, uint32_t bo,
>>		  uint64_t offset, uint64_t addr, uint64_t size, uint32_t op,
>>		  uint32_t flags, struct drm_xe_sync *sync, uint32_t num_syncs,
>>-		  uint32_t region, uint64_t ext);
>>+		  uint32_t region, uint8_t pat_index, uint64_t ext);
>>void  __xe_vm_bind_assert(int fd, uint32_t vm, uint32_t exec_queue, uint32_t bo,
>>			  uint64_t offset, uint64_t addr, uint64_t size,
>>			  uint32_t op, uint32_t flags, struct drm_xe_sync *sync,
>>diff --git a/tests/intel/xe_vm.c b/tests/intel/xe_vm.c
>>index dd3302337..a01e1ba47 100644
>>--- a/tests/intel/xe_vm.c
>>+++ b/tests/intel/xe_vm.c
>>@@ -10,6 +10,7 @@
>> */
>>
>>#include "igt.h"
>>+#include "intel_pat.h"
>>#include "lib/igt_syncobj.h"
>>#include "lib/intel_reg.h"
>>#include "xe_drm.h"
>>@@ -316,7 +317,8 @@ static void userptr_invalid(int fd)
>>	vm = xe_vm_create(fd, 0, 0);
>>	munmap(data, size);
>>	ret = __xe_vm_bind(fd, vm, 0, 0, to_user_pointer(data), 0x40000,
>>-			   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0, 0);
>>+			   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0,
>>+			   DEFAULT_PAT_INDEX, 0);
>>	igt_assert(ret == -EFAULT);
>>
>>	xe_vm_destroy(fd, vm);
>>@@ -755,6 +757,7 @@ test_bind_array(int fd, struct drm_xe_engine_class_instance *eci, int n_execs,
>>		bind_ops[i].op = XE_VM_BIND_OP_MAP;
>>		bind_ops[i].flags = XE_VM_BIND_FLAG_ASYNC;
>>		bind_ops[i].region = 0;
>>+		bind_ops[i].pat_index = intel_get_pat_idx_wb(fd);
>>		bind_ops[i].reserved[0] = 0;
>>		bind_ops[i].reserved[1] = 0;
>
>I am seeing a few other usages of vm_bind_array() (below):
>lib/xe/xe_util.c
>lib/intel_batchbuffer.c
>
>I think they need to be updated too.
>

I see the lib/intel_batchbuffer.c case is handled in the previous patch.
That leaves only the lib/xe/xe_util.c case.

Niranjana

>Niranjana
>
>>
>>-- 
>>2.41.0
>>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [igt-dev] [PATCH i-g-t v4 14/15] tests/xe: add some vm_bind pat_index tests
  2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 14/15] tests/xe: add some vm_bind pat_index tests Matthew Auld
@ 2023-10-20  5:27   ` Niranjana Vishwanathapura
  2023-10-20  8:21     ` Matthew Auld
  0 siblings, 1 reply; 28+ messages in thread
From: Niranjana Vishwanathapura @ 2023-10-20  5:27 UTC (permalink / raw)
  To: Matthew Auld; +Cc: igt-dev, Nitish Kumar

On Thu, Oct 19, 2023 at 03:41:05PM +0100, Matthew Auld wrote:
>Add some basic tests for pat_index and vm_bind.
>
>v2: Make sure to actually use srand() with the chosen seed
>  - Make it work on xe2; the wt mode now has compression.
>  - Also test some xe2+ specific pat_index modes.
>v3: Fix decompress step.
>v4: (Niranjana)
>  - Various improvements, including testing more pat_index modes, like
>    wc where possible.
>  - Document the idea behind "common" modes.
>
>Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>Cc: José Roberto de Souza <jose.souza@intel.com>
>Cc: Pallavi Mishra <pallavi.mishra@intel.com>
>Cc: Nitish Kumar <nitish.kumar@intel.com>
>---
> tests/intel/xe_pat.c | 754 +++++++++++++++++++++++++++++++++++++++++++
> tests/meson.build    |   1 +
> 2 files changed, 755 insertions(+)
> create mode 100644 tests/intel/xe_pat.c
>
>diff --git a/tests/intel/xe_pat.c b/tests/intel/xe_pat.c
>new file mode 100644
>index 000000000..1e74014b8
>--- /dev/null
>+++ b/tests/intel/xe_pat.c
>@@ -0,0 +1,754 @@
>+// SPDX-License-Identifier: MIT
>+/*
>+ * Copyright © 2023 Intel Corporation
>+ */
>+
>+/**
>+ * TEST: Test for selecting per-VMA pat_index
>+ * Category: Software building block
>+ * Sub-category: VMA
>+ * Functionality: pat_index
>+ */
>+
>+#include "igt.h"
>+#include "intel_blt.h"
>+#include "intel_mocs.h"
>+#include "intel_pat.h"
>+
>+#include "xe/xe_ioctl.h"
>+#include "xe/xe_query.h"
>+#include "xe/xe_util.h"
>+
>+#define PAGE_SIZE 4096
>+
>+static bool do_slow_check;
>+
>+/**
>+ * SUBTEST: userptr-coh-none
>+ * Test category: functionality test
>+ * Description: Test non-coherent pat_index on userptr
>+ */
>+static void userptr_coh_none(int fd)
>+{
>+	size_t size = xe_get_default_alignment(fd);
>+	uint32_t vm;
>+	void *data;
>+
>+	data = mmap(0, size, PROT_READ |
>+		    PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
>+	igt_assert(data != MAP_FAILED);
>+
>+	vm = xe_vm_create(fd, 0, 0);
>+
>+	/*
>+	 * Try some valid combinations first just to make sure we're not being
>+	 * swindled.
>+	 */
>+	igt_assert_eq(__xe_vm_bind(fd, vm, 0, 0, to_user_pointer(data), 0x40000,
>+				   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0,
>+				   DEFAULT_PAT_INDEX, 0),
>+		      0);
>+	xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>+	igt_assert_eq(__xe_vm_bind(fd, vm, 0, 0, to_user_pointer(data), 0x40000,
>+				   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0,
>+				   intel_get_pat_idx_wb(fd), 0),
>+		      0);
>+	xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>+
>+	/* And then some known COH_NONE pat_index combos which should fail. */
>+	igt_assert_eq(__xe_vm_bind(fd, vm, 0, 0, to_user_pointer(data), 0x40000,
>+				   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0,
>+				   intel_get_pat_idx_uc(fd), 0),
>+		      -EINVAL);
>+	igt_assert_eq(__xe_vm_bind(fd, vm, 0, 0, to_user_pointer(data), 0x40000,
>+				   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0,
>+				   intel_get_pat_idx_wt(fd), 0),
>+		      -EINVAL);
>+
>+	munmap(data, size);
>+	xe_vm_destroy(fd, vm);
>+}
>+
>+/**
>+ * SUBTEST: pat-index-all
>+ * Test category: functionality test
>+ * Description: Test every pat_index
>+ */
>+static void pat_index_all(int fd)
>+{
>+	uint16_t dev_id = intel_get_drm_devid(fd);
>+	size_t size = xe_get_default_alignment(fd);
>+	uint32_t vm, bo;
>+	uint8_t pat_index;
>+
>+	vm = xe_vm_create(fd, 0, 0);
>+
>+	bo = xe_bo_create_caching(fd, 0, size, all_memory_regions(fd),
>+				  DRM_XE_GEM_CPU_CACHING_WC,
>+				  DRM_XE_GEM_COH_NONE);
>+
>+	igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>+				   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>+				   intel_get_pat_idx_uc(fd), 0),
>+		      0);
>+	xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>+
>+	igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>+				   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>+				   intel_get_pat_idx_wt(fd), 0),
>+		      0);
>+	xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>+
>+	igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>+				   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>+				   intel_get_pat_idx_wb(fd), 0),
>+		      0);
>+	xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>+
>+	igt_assert(intel_get_max_pat_index(fd));
>+
>+	for (pat_index = 0; pat_index <= intel_get_max_pat_index(fd);
>+	     pat_index++) {
>+		if (intel_get_device_info(dev_id)->graphics_ver == 20 &&
>+		    pat_index >= 16 && pat_index <= 19) { /* hw reserved */
>+			igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>+						   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>+						   pat_index, 0),
>+				      -EINVAL);
>+		} else {
>+			igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>+						   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>+						   pat_index, 0),
>+				      0);
>+			xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>+		}
>+	}
>+
>+	igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>+				   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>+				   pat_index, 0),
>+		      -EINVAL);
>+
>+	gem_close(fd, bo);
>+
>+	/* Must be at least as coherent as the gem_create coh_mode. */
>+	bo = xe_bo_create_caching(fd, 0, size, system_memory(fd),
>+				  DRM_XE_GEM_CPU_CACHING_WB,
>+				  DRM_XE_GEM_COH_AT_LEAST_1WAY);
>+
>+	igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>+				   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>+				   intel_get_pat_idx_uc(fd), 0),
>+		      -EINVAL);
>+
>+	igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>+				   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>+				   intel_get_pat_idx_wt(fd), 0),
>+		      -EINVAL);
>+
>+	igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>+				   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>+				   intel_get_pat_idx_wb(fd), 0),
>+		      0);
>+	xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>+
>+	gem_close(fd, bo);
>+
>+	xe_vm_destroy(fd, vm);
>+}
>+
>+#define CLEAR_1 0xFFFFFFFF /* something compressible */
>+
>+static void xe2_blt_decompress_dst(int fd,
>+				   intel_ctx_t *ctx,
>+				   uint64_t ahnd,
>+				   struct blt_copy_data *blt,
>+				   uint32_t alias_handle,
>+				   uint32_t size)
>+{
>+	struct blt_copy_object tmp = {};
>+
>+	/*
>+	 * Xe2 in-place decompression using an alias to the same physical
>+	 * memory, but with the dst mapped using some uncompressed pat_index.
>+	 * This should allow checking the object pages via mmap.
>+	 */
>+
>+	memcpy(&tmp, &blt->src, sizeof(blt->dst));
>+	memcpy(&blt->src, &blt->dst, sizeof(blt->dst));
>+	blt_set_object(&blt->dst, alias_handle, size, 0,
>+		       intel_get_uc_mocs_index(fd),
>+		       intel_get_pat_idx_uc(fd), /* compression disabled */
>+		       T_LINEAR, 0, 0);
>+	blt_fast_copy(fd, ctx, NULL, ahnd, blt);
>+	memcpy(&blt->dst, &blt->src, sizeof(blt->dst));
>+	memcpy(&blt->src, &tmp, sizeof(blt->dst));
>+}
>+
>+struct xe_pat_size_mode {
>+	uint16_t width;
>+	uint16_t height;
>+	uint32_t alignment;
>+	const char *name;
>+};
>+
>+struct xe_pat_param {
>+	int fd;
>+
>+	const struct xe_pat_size_mode *size;
>+
>+	uint32_t r1;
>+	uint8_t  r1_pat_index;
>+	uint16_t r1_coh_mode;
>+	bool     r1_force_cpu_wc;
>+
>+	uint32_t r2;
>+	uint8_t  r2_pat_index;
>+	uint16_t r2_coh_mode;
>+	bool     r2_force_cpu_wc;
>+	bool     r2_compressed; /* xe2+ compression */
>+
>+};
>+
>+static void pat_index_blt(struct xe_pat_param *p)
>+{
>+	struct drm_xe_engine_class_instance inst = {
>+		.engine_class = DRM_XE_ENGINE_CLASS_COPY,
>+	};
>+	struct blt_copy_data blt = {};
>+	struct blt_copy_object src = {};
>+	struct blt_copy_object dst = {};
>+	uint32_t vm, exec_queue, src_bo, dst_bo, bb;
>+	uint32_t *src_map, *dst_map;
>+	uint16_t r1_cpu_caching, r2_cpu_caching;
>+	uint32_t r1_flags, r2_flags;
>+	intel_ctx_t *ctx;
>+	uint64_t ahnd;
>+	int width = p->size->width, height = p->size->height;
>+	int size, stride, bb_size;
>+	int bpp = 32;
>+	uint32_t alias, name;
>+	int fd = p->fd;
>+	int i;
>+
>+	igt_require(blt_has_fast_copy(fd));
>+
>+	vm = xe_vm_create(fd, DRM_XE_VM_CREATE_ASYNC_DEFAULT, 0);
>+	exec_queue = xe_exec_queue_create(fd, vm, &inst, 0);
>+	ctx = intel_ctx_xe(fd, vm, exec_queue, 0, 0, 0);
>+	ahnd = intel_allocator_open_full(fd, ctx->vm, 0, 0,
>+					 INTEL_ALLOCATOR_SIMPLE,
>+					 ALLOC_STRATEGY_LOW_TO_HIGH,
>+					 p->size->alignment);
>+
>+	bb_size = xe_get_default_alignment(fd);
>+	bb = xe_bo_create_flags(fd, 0, bb_size, system_memory(fd));
>+
>+	size = width * height * bpp / 8;
>+	stride = width * 4;
>+
>+	r1_flags = 0;
>+	if (p->r1 != system_memory(fd))
>+		r1_flags |= XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM;
>+
>+	if (p->r1_coh_mode == DRM_XE_GEM_COH_AT_LEAST_1WAY
>+	    && p->r1 == system_memory(fd) && !p->r1_force_cpu_wc)
>+		r1_cpu_caching = DRM_XE_GEM_CPU_CACHING_WB;
>+	else
>+		r1_cpu_caching = DRM_XE_GEM_CPU_CACHING_WC;
>+
>+	r2_flags = 0;
>+	if (p->r2 != system_memory(fd))
>+		r2_flags |= XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM;
>+
>+	if (p->r2_coh_mode == DRM_XE_GEM_COH_AT_LEAST_1WAY &&
>+	    p->r2 == system_memory(fd) && !p->r2_force_cpu_wc)
>+		r2_cpu_caching = DRM_XE_GEM_CPU_CACHING_WB;
>+	else
>+		r2_cpu_caching = DRM_XE_GEM_CPU_CACHING_WC;
>+
>+
>+	src_bo = xe_bo_create_caching(fd, 0, size, p->r1 | r1_flags, r1_cpu_caching,
>+				      p->r1_coh_mode);
>+	dst_bo = xe_bo_create_caching(fd, 0, size, p->r2 | r2_flags, r2_cpu_caching,
>+				      p->r2_coh_mode);
>+	if (p->r2_compressed) {
>+		name = gem_flink(fd, dst_bo);
>+		alias = gem_open(fd, name);
>+	}
>+
>+	blt_copy_init(fd, &blt);
>+	blt.color_depth = CD_32bit;
>+
>+	blt_set_object(&src, src_bo, size, p->r1, intel_get_uc_mocs_index(fd),
>+		       p->r1_pat_index, T_LINEAR,
>+		       COMPRESSION_DISABLED, COMPRESSION_TYPE_3D);
>+	blt_set_geom(&src, stride, 0, 0, width, height, 0, 0);
>+
>+	blt_set_object(&dst, dst_bo, size, p->r2, intel_get_uc_mocs_index(fd),
>+		       p->r2_pat_index, T_LINEAR,
>+		       COMPRESSION_DISABLED, COMPRESSION_TYPE_3D);
>+	blt_set_geom(&dst, stride, 0, 0, width, height, 0, 0);
>+
>+	blt_set_copy_object(&blt.src, &src);
>+	blt_set_copy_object(&blt.dst, &dst);
>+	blt_set_batch(&blt.bb, bb, bb_size, system_memory(fd));
>+
>+	src_map = xe_bo_map(fd, src_bo, size);
>+	dst_map = xe_bo_map(fd, dst_bo, size);
>+
>+	/* Ensure we always see zeroes for the initial KMD zeroing */
>+	blt_fast_copy(fd, ctx, NULL, ahnd, &blt);
>+	if (p->r2_compressed)
>+		xe2_blt_decompress_dst(fd, ctx, ahnd, &blt, alias, size);
>+
>+	/*
>+	 * Only sample random dword in every page if we are doing slow uncached
>+	 * reads from VRAM.
>+	 */
>+	if (!do_slow_check && p->r2 != system_memory(fd)) {
>+		int dwords_page = PAGE_SIZE / sizeof(uint32_t);
>+		int dword = rand() % dwords_page;
>+
>+		igt_debug("random dword: %d\n", dword);
>+
>+		for (i = dword; i < size / sizeof(uint32_t); i += dwords_page)
>+			igt_assert_eq(dst_map[i], 0);
>+
>+	} else {
>+		for (i = 0; i < size / sizeof(uint32_t); i++)
>+			igt_assert_eq(dst_map[i], 0);
>+	}
>+
>+	/* Write some values from the CPU, potentially dirtying the CPU cache */
>+	for (i = 0; i < size / sizeof(uint32_t); i++) {
>+		if (p->r2_compressed)
>+			src_map[i] = CLEAR_1;
>+		else
>+			src_map[i] = i;
>+	}
>+
>+	/* And finally ensure we always see the CPU written values */
>+	blt_fast_copy(fd, ctx, NULL, ahnd, &blt);
>+	if (p->r2_compressed)
>+		xe2_blt_decompress_dst(fd, ctx, ahnd, &blt, alias, size);
>+
>+	if (!do_slow_check && p->r2 != system_memory(fd)) {
>+		int dwords_page = PAGE_SIZE / sizeof(uint32_t);
>+		int dword = rand() % dwords_page;
>+
>+		igt_debug("random dword: %d\n", dword);
>+
>+		for (i = dword; i < size / sizeof(uint32_t); i += dwords_page) {
>+			if (p->r2_compressed)
>+				igt_assert_eq(dst_map[i], CLEAR_1);
>+			else
>+				igt_assert_eq(dst_map[i], i);
>+		}
>+
>+	} else {
>+		for (i = 0; i < size / sizeof(uint32_t); i++) {
>+			if (p->r2_compressed)
>+				igt_assert_eq(dst_map[i], CLEAR_1);
>+			else
>+				igt_assert_eq(dst_map[i], i);
>+		}
>+	}
>+
>+	munmap(src_map, size);
>+	munmap(dst_map, size);
>+
>+	gem_close(fd, src_bo);
>+	gem_close(fd, dst_bo);
>+	gem_close(fd, bb);
>+
>+	xe_exec_queue_destroy(fd, exec_queue);
>+	xe_vm_destroy(fd, vm);
>+
>+	put_ahnd(ahnd);
>+	intel_ctx_destroy(fd, ctx);
>+}
>+
>+static void pat_index_render(struct xe_pat_param *p)
>+{
>+	int fd = p->fd;
>+	uint32_t devid = intel_get_drm_devid(fd);
>+	igt_render_copyfunc_t render_copy = NULL;
>+	int size, stride, width = p->size->width, height = p->size->height;
>+	struct intel_buf src, dst;
>+	struct intel_bb *ibb;
>+	struct buf_ops *bops;
>+	uint16_t r1_cpu_caching, r2_cpu_caching;
>+	uint32_t r1_flags, r2_flags;
>+	uint32_t src_bo, dst_bo;
>+	uint32_t *src_map, *dst_map;
>+	int bpp = 32;
>+	int i;
>+
>+	bops = buf_ops_create(fd);
>+
>+	render_copy = igt_get_render_copyfunc(devid);
>+	igt_require(render_copy);
>+	igt_require(!p->r2_compressed); /* XXX */
>+	igt_require(xe_has_engine_class(fd, DRM_XE_ENGINE_CLASS_RENDER));
>+
>+	ibb = intel_bb_create_full(fd, 0, 0, NULL, xe_get_default_alignment(fd),
>+				   0, 0, p->size->alignment,
>+				   INTEL_ALLOCATOR_SIMPLE,
>+				   ALLOC_STRATEGY_HIGH_TO_LOW);
>+
>+	if (p->r1_coh_mode == DRM_XE_GEM_COH_AT_LEAST_1WAY
>+	    && p->r1 == system_memory(fd) && !p->r1_force_cpu_wc)
>+		r1_cpu_caching = DRM_XE_GEM_CPU_CACHING_WB;
>+	else
>+		r1_cpu_caching = DRM_XE_GEM_CPU_CACHING_WC;
>+
>+	if (p->r2_coh_mode == DRM_XE_GEM_COH_AT_LEAST_1WAY &&
>+	    p->r2 == system_memory(fd) && !p->r2_force_cpu_wc)
>+		r2_cpu_caching = DRM_XE_GEM_CPU_CACHING_WB;
>+	else
>+		r2_cpu_caching = DRM_XE_GEM_CPU_CACHING_WC;
>+
>+	size = width * height * bpp / 8;
>+	stride = width * 4;
>+
>+	r1_flags = 0;
>+	if (p->r1 != system_memory(fd))
>+		r1_flags |= XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM;
>+
>+	src_bo = xe_bo_create_caching(fd, 0, size, p->r1 | r1_flags, r1_cpu_caching,
>+				      p->r1_coh_mode);
>+	intel_buf_init_full(bops, src_bo, &src, width, height, bpp, 0,
>+			    I915_TILING_NONE, I915_COMPRESSION_NONE, size,
>+			    stride, p->r1, p->r1_pat_index);
>+
>+	r2_flags = 0;
>+	if (p->r2 != system_memory(fd))
>+		r2_flags |= XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM;
>+
>+	dst_bo = xe_bo_create_caching(fd, 0, size, p->r2 | r2_flags, r2_cpu_caching,
>+				      p->r2_coh_mode);
>+	intel_buf_init_full(bops, dst_bo, &dst, width, height, bpp, 0,
>+			    I915_TILING_NONE, I915_COMPRESSION_NONE, size,
>+			    stride, p->r2, p->r2_pat_index);
>+
>+	src_map = xe_bo_map(fd, src_bo, size);
>+	dst_map = xe_bo_map(fd, dst_bo, size);
>+
>+	/* Ensure we always see zeroes for the initial KMD zeroing */
>+	render_copy(ibb,
>+		    &src,
>+		    0, 0, width, height,
>+		    &dst,
>+		    0, 0);
>+	intel_bb_sync(ibb);
>+
>+	if (!do_slow_check && p->r2 != system_memory(fd)) {
>+		int dwords_page = PAGE_SIZE / sizeof(uint32_t);
>+		int dword = rand() % dwords_page;
>+
>+		igt_debug("random dword: %d\n", dword);
>+
>+		for (i = dword; i < size / sizeof(uint32_t); i += dwords_page)
>+			igt_assert_eq(dst_map[i], 0);
>+	} else {
>+		for (i = 0; i < size / sizeof(uint32_t); i++)
>+			igt_assert_eq(dst_map[i], 0);
>+	}
>+
>+	/* Write some values from the CPU, potentially dirtying the CPU cache */
>+	for (i = 0; i < size / sizeof(uint32_t); i++)
>+		src_map[i] = i;
>+
>+	/* And finally ensure we always see the CPU written values */
>+	render_copy(ibb,
>+		    &src,
>+		    0, 0, width, height,
>+		    &dst,
>+		    0, 0);
>+	intel_bb_sync(ibb);
>+
>+	if (!do_slow_check && p->r2 != system_memory(fd)) {
>+		int dwords_page = PAGE_SIZE / sizeof(uint32_t);
>+		int dword = rand() % dwords_page;
>+
>+		igt_debug("random dword: %d\n", dword);
>+
>+		for (i = dword; i < size / sizeof(uint32_t); i += dwords_page)
>+			igt_assert_eq(dst_map[i], i);
>+	} else {
>+		for (i = 0; i < size / sizeof(uint32_t); i++)
>+			igt_assert_eq(dst_map[i], i);
>+	}
>+
>+	munmap(src_map, size);
>+	munmap(dst_map, size);
>+
>+	intel_bb_destroy(ibb);
>+
>+	gem_close(fd, src_bo);
>+	gem_close(fd, dst_bo);
>+}
>+
>+static uint8_t get_pat_idx_uc(int fd, bool *compressed)
>+{
>+	if (compressed)
>+		*compressed = false;
>+
>+	return intel_get_pat_idx_uc(fd);
>+}
>+
>+static uint8_t get_pat_idx_wt(int fd, bool *compressed)
>+{
>+	uint16_t dev_id = intel_get_drm_devid(fd);
>+
>+	if (compressed)
>+		*compressed = intel_get_device_info(dev_id)->graphics_ver == 20;
>+
>+	return intel_get_pat_idx_wt(fd);
>+}
>+
>+static uint8_t get_pat_idx_wb(int fd, bool *compressed)
>+{
>+	if (compressed)
>+		*compressed = false;
>+
>+	return intel_get_pat_idx_wb(fd);
>+}
>+
>+struct pat_index_entry {
>+	uint8_t (*get_pat_index)(int fd, bool *compressed);
>+
>+	uint8_t pat_index;
>+	bool compressed;
>+
>+	const char *name;
>+	uint16_t coh_mode;
>+	bool force_cpu_wc;
>+};
>+
>+/*
>+ * The common modes are available on all platforms supported by Xe and so should
>+ * be commonly supported. There are many more possible pat_index modes, however
>+ * most IGTs shouldn't really care about them so likely no need to add them to
>+ * lib/intel_pat.c. We do try to test some on the non-common modes here.
>+ */
>+const struct pat_index_entry common_pat_index_modes[] = {
>+	{ get_pat_idx_uc, 0, 0, "uc",        DRM_XE_GEM_COH_NONE                },
>+	{ get_pat_idx_wt, 0, 0, "wt",        DRM_XE_GEM_COH_NONE                },
>+	{ get_pat_idx_wb, 0, 0, "wb",        DRM_XE_GEM_COH_AT_LEAST_1WAY       },
>+	{ get_pat_idx_wb, 0, 0, "wb-cpu-wc", DRM_XE_GEM_COH_AT_LEAST_1WAY, true },
>+};
>+
>+const struct pat_index_entry xelp_pat_index_modes[] = {
>+	{ NULL, 1, false, "wc", DRM_XE_GEM_COH_NONE },
>+};
>+
>+const struct pat_index_entry xehpc_pat_index_modes[] = {
>+	{ NULL, 1, false, "wc",    DRM_XE_GEM_COH_NONE          },
>+	{ NULL, 4, false, "c1-wt", DRM_XE_GEM_COH_NONE          },
>+	{ NULL, 5, false, "c1-wb", DRM_XE_GEM_COH_AT_LEAST_1WAY },
>+	{ NULL, 6, false, "c2-wt", DRM_XE_GEM_COH_NONE          },
>+	{ NULL, 7, false, "c2-wb", DRM_XE_GEM_COH_AT_LEAST_1WAY },
>+};
>+
>+/* Too many, just pick some interesting ones */
>+const struct pat_index_entry xe2_pat_index_modes[] = {
>+	{ NULL, 1, false, "1way",        DRM_XE_GEM_COH_AT_LEAST_1WAY       },
>+	{ NULL, 2, false, "2way",        DRM_XE_GEM_COH_AT_LEAST_1WAY       },
>+	{ NULL, 2, false, "2way-cpu-wc", DRM_XE_GEM_COH_AT_LEAST_1WAY, true },
>+	{ NULL, 3, true,  "uc-comp",     DRM_XE_GEM_COH_NONE                },
>+	{ NULL, 5, false, "uc-1way",     DRM_XE_GEM_COH_AT_LEAST_1WAY       },
>+};
>+
>+/*
>+ * Depending on 2M/1G GTT pages we might trigger different PTE layouts for the
>+ * PAT bits, so make sure we test with and without huge-pages. Also ensure we
>+ * have a mix of different pat_index modes for each PDE.
>+ */
>+const struct xe_pat_size_mode size_modes[] =  {
>+	{ 256,  256,  0,        "mixed-pde"  },
>+	{ 1024, 1024, 1u << 21, "single-pde" },
>+};

I am a bit confused by the naming here (mixed-pde/single-pde).
The first case creates BOs of size 256*256*(32/8) = 256K, which means it only
needs to update a few PTEs, all of which could sit under a single PDE. This
tests the pat_index setting of PTEs.
The second case creates BOs of size 1024*1024*(32/8) = 4MB, which with the 2MB
alignment will occupy 2 PDEs. This tests the pat_index setting of leaf PDEs.
Right?

Other than that, the patch looks fine to me.
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

>+
>+typedef void (*copy_fn)(struct xe_pat_param *p);
>+
>+const struct xe_pat_copy_mode {
>+	copy_fn fn;
>+	const char *name;
>+} copy_modes[] =  {
>+	{  pat_index_blt,    "blt"    },
>+	{  pat_index_render, "render" },
>+};
>+
>+/**
>+ * SUBTEST: pat-index-common
>+ * Test category: functionality test
>+ * Description: Check the common pat_index modes.
>+ */
>+
>+/**
>+ * SUBTEST: pat-index-xelp
>+ * Test category: functionality test
>+ * Description: Check some of the xelp pat_index modes.
>+ */
>+
>+/**
>+ * SUBTEST: pat-index-xehpc
>+ * Test category: functionality test
>+ * Description: Check some of the xehpc pat_index modes.
>+ */
>+
>+/**
>+ * SUBTEST: pat-index-xe2
>+ * Test category: functionality test
>+ * Description: Check some of the xe2 pat_index modes.
>+ */
>+
>+static void subtest_pat_index_modes_with_regions(int fd,
>+						 const struct pat_index_entry *modes_arr,
>+						 int n_modes)
>+{
>+	struct igt_collection *copy_set;
>+	struct igt_collection *pat_index_set;
>+	struct igt_collection *regions_set;
>+	struct igt_collection *sizes_set;
>+	struct igt_collection *copies;
>+	struct xe_pat_param p = {};
>+
>+	p.fd = fd;
>+
>+	copy_set = igt_collection_create(ARRAY_SIZE(copy_modes));
>+
>+	pat_index_set = igt_collection_create(n_modes);
>+
>+	regions_set = xe_get_memory_region_set(fd,
>+					       XE_MEM_REGION_CLASS_SYSMEM,
>+					       XE_MEM_REGION_CLASS_VRAM);
>+
>+	sizes_set = igt_collection_create(ARRAY_SIZE(size_modes));
>+
>+	for_each_variation_r(copies, 1, copy_set) {
>+		struct igt_collection *regions;
>+		struct xe_pat_copy_mode copy_mode;
>+
>+		copy_mode = copy_modes[igt_collection_get_value(copies, 0)];
>+
>+		for_each_variation_r(regions, 2, regions_set) {
>+			struct igt_collection *pat_modes;
>+			uint32_t r1, r2;
>+			char *reg_str;
>+
>+			r1 = igt_collection_get_value(regions, 0);
>+			r2 = igt_collection_get_value(regions, 1);
>+
>+			reg_str = xe_memregion_dynamic_subtest_name(fd, regions);
>+
>+			for_each_variation_r(pat_modes, 2, pat_index_set) {
>+				struct igt_collection *sizes;
>+				struct pat_index_entry r1_entry, r2_entry;
>+				int r1_idx, r2_idx;
>+
>+				r1_idx = igt_collection_get_value(pat_modes, 0);
>+				r2_idx = igt_collection_get_value(pat_modes, 1);
>+
>+				r1_entry = modes_arr[r1_idx];
>+				r2_entry = modes_arr[r2_idx];
>+
>+				if (r1_entry.get_pat_index)
>+					p.r1_pat_index = r1_entry.get_pat_index(fd, NULL);
>+				else
>+					p.r1_pat_index = r1_entry.pat_index;
>+
>+				if (r2_entry.get_pat_index)
>+					p.r2_pat_index = r2_entry.get_pat_index(fd, &p.r2_compressed);
>+				else {
>+					p.r2_pat_index = r2_entry.pat_index;
>+					p.r2_compressed = r2_entry.compressed;
>+				}
>+
>+				p.r1_coh_mode = r1_entry.coh_mode;
>+				p.r2_coh_mode = r2_entry.coh_mode;
>+
>+				p.r1_force_cpu_wc = r1_entry.force_cpu_wc;
>+				p.r2_force_cpu_wc = r2_entry.force_cpu_wc;
>+
>+				p.r1 = r1;
>+				p.r2 = r2;
>+
>+				for_each_variation_r(sizes, 1, sizes_set) {
>+					int size_mode_idx = igt_collection_get_value(sizes, 0);
>+
>+					p.size = &size_modes[size_mode_idx];
>+
>+					igt_debug("[r1]: r: %u, idx: %u, coh: %u, wc: %d\n",
>+						  p.r1, p.r1_pat_index, p.r1_coh_mode, p.r1_force_cpu_wc);
>+					igt_debug("[r2]: r: %u, idx: %u, coh: %u, wc: %d, comp: %d, w: %u, h: %u, a: %u\n",
>+						  p.r2, p.r2_pat_index, p.r2_coh_mode,
>+						  p.r2_force_cpu_wc, p.r2_compressed,
>+						  p.size->width, p.size->height,
>+						  p.size->alignment);
>+
>+					igt_dynamic_f("%s-%s-%s-%s-%s",
>+						      copy_mode.name,
>+						      reg_str, r1_entry.name,
>+						      r2_entry.name, p.size->name)
>+						copy_mode.fn(&p);
>+				}
>+			}
>+
>+			free(reg_str);
>+		}
>+	}
>+}
>+
>+igt_main
>+{
>+	uint16_t dev_id;
>+	int fd;
>+
>+	igt_fixture {
>+		uint32_t seed;
>+
>+		fd = drm_open_driver(DRIVER_XE);
>+		dev_id = intel_get_drm_devid(fd);
>+
>+		seed = time(NULL);
>+		srand(seed);
>+		igt_debug("seed: %d\n", seed);
>+
>+		xe_device_get(fd);
>+	}
>+
>+	igt_subtest("pat-index-all")
>+		pat_index_all(fd);
>+
>+	igt_subtest("userptr-coh-none")
>+		userptr_coh_none(fd);
>+
>+	igt_subtest_with_dynamic("pat-index-common") {
>+		subtest_pat_index_modes_with_regions(fd, common_pat_index_modes,
>+						     ARRAY_SIZE(common_pat_index_modes));
>+	}
>+
>+	igt_subtest_with_dynamic("pat-index-xelp") {
>+		igt_require(intel_graphics_ver(dev_id) <= IP_VER(12, 55));
>+		subtest_pat_index_modes_with_regions(fd, xelp_pat_index_modes,
>+						     ARRAY_SIZE(xelp_pat_index_modes));
>+	}
>+
>+	igt_subtest_with_dynamic("pat-index-xehpc") {
>+		igt_require(IS_PONTEVECCHIO(dev_id));
>+		subtest_pat_index_modes_with_regions(fd, xehpc_pat_index_modes,
>+						     ARRAY_SIZE(xehpc_pat_index_modes));
>+	}
>+
>+	igt_subtest_with_dynamic("pat-index-xe2") {
>+		igt_require(intel_get_device_info(dev_id)->graphics_ver >= 20);
>+		subtest_pat_index_modes_with_regions(fd, xe2_pat_index_modes,
>+						     ARRAY_SIZE(xe2_pat_index_modes));
>+	}
>+
>+	igt_fixture
>+		drm_close_driver(fd);
>+}
>diff --git a/tests/meson.build b/tests/meson.build
>index 5afcd8cbb..3aecfbee0 100644
>--- a/tests/meson.build
>+++ b/tests/meson.build
>@@ -297,6 +297,7 @@ intel_xe_progs = [
> 	'xe_mmap',
> 	'xe_module_load',
> 	'xe_noexec_ping_pong',
>+	'xe_pat',
> 	'xe_pm',
> 	'xe_pm_residency',
> 	'xe_prime_self_import',
>-- 
>2.41.0
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [igt-dev] [PATCH i-g-t v4 11/15] lib/intel_allocator: treat default_alignment as the minimum
  2023-10-19 17:34   ` Niranjana Vishwanathapura
@ 2023-10-20  7:55     ` Matthew Auld
  0 siblings, 0 replies; 28+ messages in thread
From: Matthew Auld @ 2023-10-20  7:55 UTC (permalink / raw)
  To: Niranjana Vishwanathapura; +Cc: igt-dev

On 19/10/2023 18:34, Niranjana Vishwanathapura wrote:
> On Thu, Oct 19, 2023 at 03:41:02PM +0100, Matthew Auld wrote:
>> If something overrides the default alignment, we should only apply the
>> alignment if it is larger than the default_alignment.
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
>> Cc: José Roberto de Souza <jose.souza@intel.com>
>> Cc: Pallavi Mishra <pallavi.mishra@intel.com>
>> ---
>> lib/intel_allocator.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/lib/intel_allocator.c b/lib/intel_allocator.c
>> index e5b9457b8..d94043016 100644
>> --- a/lib/intel_allocator.c
>> +++ b/lib/intel_allocator.c
>> @@ -586,6 +586,9 @@ static int handle_request(struct alloc_req *req, 
>> struct alloc_resp *resp)
>>         case REQ_ALLOC:
>>             if (!req->alloc.alignment)
>>                 req->alloc.alignment = ial->default_alignment;
>> +            else
>> +                req->alloc.alignment = max(ial->default_alignment,
>> +                               req->alloc.alignment);
> 
> Looks like we don't need the if/else clause here.
> req->alloc.alignment = max(ial->default_alignment, req->alloc.alignment);

Will fix. Thanks.
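
For reference, the fixed-up hunk would presumably collapse to a single
statement, something along these lines (a sketch based on the suggestion
above, not the final patch):

	case REQ_ALLOC:
		/* default_alignment now acts as the minimum alignment */
		req->alloc.alignment = max(ial->default_alignment,
					   req->alloc.alignment);

		resp->response_type = RESP_ALLOC;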

> 
> Other than that, change looks good to me.
> Reviewed-by: Niranjana Vishwanathapura 
> <niranjana.vishwanathapura@intel.com>
> 
> Niranjana
> 
>>
>>             resp->response_type = RESP_ALLOC;
>>             resp->alloc.offset = ial->alloc(ial,
>> -- 
>> 2.41.0
>>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [igt-dev] [PATCH i-g-t v4 10/15] lib/xe_ioctl: update vm_bind to account for pat_index
  2023-10-20  5:19     ` Niranjana Vishwanathapura
@ 2023-10-20  8:13       ` Matthew Auld
  0 siblings, 0 replies; 28+ messages in thread
From: Matthew Auld @ 2023-10-20  8:13 UTC (permalink / raw)
  To: Niranjana Vishwanathapura; +Cc: igt-dev

On 20/10/2023 06:19, Niranjana Vishwanathapura wrote:
> On Thu, Oct 19, 2023 at 10:37:57AM -0700, Niranjana Vishwanathapura wrote:
>> On Thu, Oct 19, 2023 at 03:41:01PM +0100, Matthew Auld wrote:
>>> Keep things minimal and select the 1way+ by default on all platforms.
>>> Other users can use intel_buf, get_offset_pat_index etc or use
>>> __xe_vm_bind() directly.  Display tests don't directly use this
>>> interface.
>>>
>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>> Cc: José Roberto de Souza <jose.souza@intel.com>
>>> Cc: Pallavi Mishra <pallavi.mishra@intel.com>
>>> ---
>>> lib/xe/xe_ioctl.c   | 8 ++++++--
>>> lib/xe/xe_ioctl.h   | 2 +-
>>> tests/intel/xe_vm.c | 5 ++++-
>>> 3 files changed, 11 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/lib/xe/xe_ioctl.c b/lib/xe/xe_ioctl.c
>>> index 4cf44f1ee..f51f931ee 100644
>>> --- a/lib/xe/xe_ioctl.c
>>> +++ b/lib/xe/xe_ioctl.c
>>> @@ -41,6 +41,7 @@
>>> #include "config.h"
>>> #include "drmtest.h"
>>> #include "igt_syncobj.h"
>>> +#include "intel_pat.h"
>>> #include "ioctl_wrappers.h"
>>> #include "xe_ioctl.h"
>>> #include "xe_query.h"
>>> @@ -92,7 +93,7 @@ void xe_vm_bind_array(int fd, uint32_t vm, uint32_t 
>>> exec_queue,
>>> int  __xe_vm_bind(int fd, uint32_t vm, uint32_t exec_queue, uint32_t bo,
>>>           uint64_t offset, uint64_t addr, uint64_t size, uint32_t op,
>>>           uint32_t flags, struct drm_xe_sync *sync, uint32_t num_syncs,
>>> -          uint32_t region, uint64_t ext)
>>> +          uint32_t region, uint8_t pat_index, uint64_t ext)
>>> {
>>>     struct drm_xe_vm_bind bind = {
>>>         .extensions = ext,
>>> @@ -108,6 +109,8 @@ int  __xe_vm_bind(int fd, uint32_t vm, uint32_t 
>>> exec_queue, uint32_t bo,
>>>         .num_syncs = num_syncs,
>>>         .syncs = (uintptr_t)sync,
>>>         .exec_queue_id = exec_queue,
>>> +        .bind.pat_index = (pat_index == DEFAULT_PAT_INDEX) ?
>>> +            intel_get_pat_idx_wb(fd) : pat_index,
>>>     };
>>>
>>>     if (igt_ioctl(fd, DRM_IOCTL_XE_VM_BIND, &bind))
>>> @@ -122,7 +125,8 @@ void  __xe_vm_bind_assert(int fd, uint32_t vm, 
>>> uint32_t exec_queue, uint32_t bo,
>>>               uint32_t num_syncs, uint32_t region, uint64_t ext)
>>> {
>>>     igt_assert_eq(__xe_vm_bind(fd, vm, exec_queue, bo, offset, addr, 
>>> size,
>>> -                   op, flags, sync, num_syncs, region, ext), 0);
>>> +                   op, flags, sync, num_syncs, region, 
>>> DEFAULT_PAT_INDEX,
>>> +                   ext), 0);
>>> }
>>>
>>> void xe_vm_bind(int fd, uint32_t vm, uint32_t bo, uint64_t offset,
>>> diff --git a/lib/xe/xe_ioctl.h b/lib/xe/xe_ioctl.h
>>> index e3f62a28a..a28375d3e 100644
>>> --- a/lib/xe/xe_ioctl.h
>>> +++ b/lib/xe/xe_ioctl.h
>>> @@ -20,7 +20,7 @@ uint32_t xe_vm_create(int fd, uint32_t flags, 
>>> uint64_t ext);
>>> int  __xe_vm_bind(int fd, uint32_t vm, uint32_t exec_queue, uint32_t bo,
>>>           uint64_t offset, uint64_t addr, uint64_t size, uint32_t op,
>>>           uint32_t flags, struct drm_xe_sync *sync, uint32_t num_syncs,
>>> -          uint32_t region, uint64_t ext);
>>> +          uint32_t region, uint8_t pat_index, uint64_t ext);
>>> void  __xe_vm_bind_assert(int fd, uint32_t vm, uint32_t exec_queue, 
>>> uint32_t bo,
>>>               uint64_t offset, uint64_t addr, uint64_t size,
>>>               uint32_t op, uint32_t flags, struct drm_xe_sync *sync,
>>> diff --git a/tests/intel/xe_vm.c b/tests/intel/xe_vm.c
>>> index dd3302337..a01e1ba47 100644
>>> --- a/tests/intel/xe_vm.c
>>> +++ b/tests/intel/xe_vm.c
>>> @@ -10,6 +10,7 @@
>>> */
>>>
>>> #include "igt.h"
>>> +#include "intel_pat.h"
>>> #include "lib/igt_syncobj.h"
>>> #include "lib/intel_reg.h"
>>> #include "xe_drm.h"
>>> @@ -316,7 +317,8 @@ static void userptr_invalid(int fd)
>>>     vm = xe_vm_create(fd, 0, 0);
>>>     munmap(data, size);
>>>     ret = __xe_vm_bind(fd, vm, 0, 0, to_user_pointer(data), 0x40000,
>>> -               size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0, 0);
>>> +               size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0,
>>> +               DEFAULT_PAT_INDEX, 0);
>>>     igt_assert(ret == -EFAULT);
>>>
>>>     xe_vm_destroy(fd, vm);
>>> @@ -755,6 +757,7 @@ test_bind_array(int fd, struct 
>>> drm_xe_engine_class_instance *eci, int n_execs,
>>>         bind_ops[i].op = XE_VM_BIND_OP_MAP;
>>>         bind_ops[i].flags = XE_VM_BIND_FLAG_ASYNC;
>>>         bind_ops[i].region = 0;
>>> +        bind_ops[i].pat_index = intel_get_pat_idx_wb(fd);
>>>         bind_ops[i].reserved[0] = 0;
>>>         bind_ops[i].reserved[1] = 0;
>>
>> I am seeing a few other (below) usages of vm_bind_array() calls.
>> lib/xe/xe_util.c
>> lib/intel_batchbuffer.c
>>
>> I think they need to be updated too.
>>
> 
> I see the lib/intel_batchbuffer.c case is handled in a previous patch.
> That leaves only the lib/xe/xe_util.c case.

xe_util was converted in: "lib/allocator: add get_offset_pat_index()
helper". The get_offset() stuff uses the xe_util method of binding, so it
was converted in that patch.

> 
> Niranjana
> 
>> Niranjana
>>
>>>
>>> -- 
>>> 2.41.0
>>>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [igt-dev] [PATCH i-g-t v4 14/15] tests/xe: add some vm_bind pat_index tests
  2023-10-20  5:27   ` Niranjana Vishwanathapura
@ 2023-10-20  8:21     ` Matthew Auld
  2023-10-20  8:42       ` Matthew Auld
  2023-10-20 17:24       ` Niranjana Vishwanathapura
  0 siblings, 2 replies; 28+ messages in thread
From: Matthew Auld @ 2023-10-20  8:21 UTC (permalink / raw)
  To: Niranjana Vishwanathapura; +Cc: igt-dev, Nitish Kumar

On 20/10/2023 06:27, Niranjana Vishwanathapura wrote:
> On Thu, Oct 19, 2023 at 03:41:05PM +0100, Matthew Auld wrote:
>> Add some basic tests for pat_index and vm_bind.
>>
>> v2: Make sure to actually use srand() with the chosen seed
>>  - Make it work on xe2; the wt mode now has compression.
>>  - Also test some xe2+ specific pat_index modes.
>> v3: Fix decompress step.
>> v4: (Niranjana)
>>  - Various improvements, including testing more pat_index modes, like
>>    wc where possible.
>>  - Document the idea behind "common" modes.
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> Cc: José Roberto de Souza <jose.souza@intel.com>
>> Cc: Pallavi Mishra <pallavi.mishra@intel.com>
>> Cc: Nitish Kumar <nitish.kumar@intel.com>
>> ---
>> tests/intel/xe_pat.c | 754 +++++++++++++++++++++++++++++++++++++++++++
>> tests/meson.build    |   1 +
>> 2 files changed, 755 insertions(+)
>> create mode 100644 tests/intel/xe_pat.c
>>
>> diff --git a/tests/intel/xe_pat.c b/tests/intel/xe_pat.c
>> new file mode 100644
>> index 000000000..1e74014b8
>> --- /dev/null
>> +++ b/tests/intel/xe_pat.c
>> @@ -0,0 +1,754 @@
>> +// SPDX-License-Identifier: MIT
>> +/*
>> + * Copyright © 2023 Intel Corporation
>> + */
>> +
>> +/**
>> + * TEST: Test for selecting per-VMA pat_index
>> + * Category: Software building block
>> + * Sub-category: VMA
>> + * Functionality: pat_index
>> + */
>> +
>> +#include "igt.h"
>> +#include "intel_blt.h"
>> +#include "intel_mocs.h"
>> +#include "intel_pat.h"
>> +
>> +#include "xe/xe_ioctl.h"
>> +#include "xe/xe_query.h"
>> +#include "xe/xe_util.h"
>> +
>> +#define PAGE_SIZE 4096
>> +
>> +static bool do_slow_check;
>> +
>> +/**
>> + * SUBTEST: userptr-coh-none
>> + * Test category: functionality test
>> + * Description: Test non-coherent pat_index on userptr
>> + */
>> +static void userptr_coh_none(int fd)
>> +{
>> +    size_t size = xe_get_default_alignment(fd);
>> +    uint32_t vm;
>> +    void *data;
>> +
>> +    data = mmap(0, size, PROT_READ |
>> +            PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
>> +    igt_assert(data != MAP_FAILED);
>> +
>> +    vm = xe_vm_create(fd, 0, 0);
>> +
>> +    /*
>> +     * Try some valid combinations first just to make sure we're not 
>> being
>> +     * swindled.
>> +     */
>> +    igt_assert_eq(__xe_vm_bind(fd, vm, 0, 0, to_user_pointer(data), 
>> 0x40000,
>> +                   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0,
>> +                   DEFAULT_PAT_INDEX, 0),
>> +              0);
>> +    xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>> +    igt_assert_eq(__xe_vm_bind(fd, vm, 0, 0, to_user_pointer(data), 
>> 0x40000,
>> +                   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0,
>> +                   intel_get_pat_idx_wb(fd), 0),
>> +              0);
>> +    xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>> +
>> +    /* And then some known COH_NONE pat_index combos which should 
>> fail. */
>> +    igt_assert_eq(__xe_vm_bind(fd, vm, 0, 0, to_user_pointer(data), 
>> 0x40000,
>> +                   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0,
>> +                   intel_get_pat_idx_uc(fd), 0),
>> +              -EINVAL);
>> +    igt_assert_eq(__xe_vm_bind(fd, vm, 0, 0, to_user_pointer(data), 
>> 0x40000,
>> +                   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0,
>> +                   intel_get_pat_idx_wt(fd), 0),
>> +              -EINVAL);
>> +
>> +    munmap(data, size);
>> +    xe_vm_destroy(fd, vm);
>> +}
>> +
>> +/**
>> + * SUBTEST: pat-index-all
>> + * Test category: functionality test
>> + * Description: Test every pat_index
>> + */
>> +static void pat_index_all(int fd)
>> +{
>> +    uint16_t dev_id = intel_get_drm_devid(fd);
>> +    size_t size = xe_get_default_alignment(fd);
>> +    uint32_t vm, bo;
>> +    uint8_t pat_index;
>> +
>> +    vm = xe_vm_create(fd, 0, 0);
>> +
>> +    bo = xe_bo_create_caching(fd, 0, size, all_memory_regions(fd),
>> +                  DRM_XE_GEM_CPU_CACHING_WC,
>> +                  DRM_XE_GEM_COH_NONE);
>> +
>> +    igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>> +                   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>> +                   intel_get_pat_idx_uc(fd), 0),
>> +              0);
>> +    xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>> +
>> +    igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>> +                   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>> +                   intel_get_pat_idx_wt(fd), 0),
>> +              0);
>> +    xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>> +
>> +    igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>> +                   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>> +                   intel_get_pat_idx_wb(fd), 0),
>> +              0);
>> +    xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>> +
>> +    igt_assert(intel_get_max_pat_index(fd));
>> +
>> +    for (pat_index = 0; pat_index <= intel_get_max_pat_index(fd);
>> +         pat_index++) {
>> +        if (intel_get_device_info(dev_id)->graphics_ver == 20 &&
>> +            pat_index >= 16 && pat_index <= 19) { /* hw reserved */
>> +            igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>> +                           size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>> +                           pat_index, 0),
>> +                      -EINVAL);
>> +        } else {
>> +            igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>> +                           size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>> +                           pat_index, 0),
>> +                      0);
>> +            xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>> +        }
>> +    }
>> +
>> +    igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>> +                   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>> +                   pat_index, 0),
>> +              -EINVAL);
>> +
>> +    gem_close(fd, bo);
>> +
>> +    /* Must be at least as coherent as the gem_create coh_mode. */
>> +    bo = xe_bo_create_caching(fd, 0, size, system_memory(fd),
>> +                  DRM_XE_GEM_CPU_CACHING_WB,
>> +                  DRM_XE_GEM_COH_AT_LEAST_1WAY);
>> +
>> +    igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>> +                   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>> +                   intel_get_pat_idx_uc(fd), 0),
>> +              -EINVAL);
>> +
>> +    igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>> +                   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>> +                   intel_get_pat_idx_wt(fd), 0),
>> +              -EINVAL);
>> +
>> +    igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>> +                   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>> +                   intel_get_pat_idx_wb(fd), 0),
>> +              0);
>> +    xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>> +
>> +    gem_close(fd, bo);
>> +
>> +    xe_vm_destroy(fd, vm);
>> +}
>> +
>> +#define CLEAR_1 0xFFFFFFFF /* something compressible */
>> +
>> +static void xe2_blt_decompress_dst(int fd,
>> +                   intel_ctx_t *ctx,
>> +                   uint64_t ahnd,
>> +                   struct blt_copy_data *blt,
>> +                   uint32_t alias_handle,
>> +                   uint32_t size)
>> +{
>> +    struct blt_copy_object tmp = {};
>> +
>> +    /*
>> +     * Xe2 in-place decompression using an alias to the same physical
>> +     * memory, but with the dst mapped using some uncompressed 
>> pat_index.
>> +     * This should allow checking the object pages via mmap.
>> +     */
>> +
>> +    memcpy(&tmp, &blt->src, sizeof(blt->dst));
>> +    memcpy(&blt->src, &blt->dst, sizeof(blt->dst));
>> +    blt_set_object(&blt->dst, alias_handle, size, 0,
>> +               intel_get_uc_mocs_index(fd),
>> +               intel_get_pat_idx_uc(fd), /* compression disabled */
>> +               T_LINEAR, 0, 0);
>> +    blt_fast_copy(fd, ctx, NULL, ahnd, blt);
>> +    memcpy(&blt->dst, &blt->src, sizeof(blt->dst));
>> +    memcpy(&blt->src, &tmp, sizeof(blt->dst));
>> +}
>> +
>> +struct xe_pat_size_mode {
>> +    uint16_t width;
>> +    uint16_t height;
>> +    uint32_t alignment;
>> +    const char *name;
>> +};
>> +
>> +struct xe_pat_param {
>> +    int fd;
>> +
>> +    const struct xe_pat_size_mode *size;
>> +
>> +    uint32_t r1;
>> +    uint8_t  r1_pat_index;
>> +    uint16_t r1_coh_mode;
>> +    bool     r1_force_cpu_wc;
>> +
>> +    uint32_t r2;
>> +    uint8_t  r2_pat_index;
>> +    uint16_t r2_coh_mode;
>> +    bool     r2_force_cpu_wc;
>> +    bool     r2_compressed; /* xe2+ compression */
>> +
>> +};
>> +
>> +static void pat_index_blt(struct xe_pat_param *p)
>> +{
>> +    struct drm_xe_engine_class_instance inst = {
>> +        .engine_class = DRM_XE_ENGINE_CLASS_COPY,
>> +    };
>> +    struct blt_copy_data blt = {};
>> +    struct blt_copy_object src = {};
>> +    struct blt_copy_object dst = {};
>> +    uint32_t vm, exec_queue, src_bo, dst_bo, bb;
>> +    uint32_t *src_map, *dst_map;
>> +    uint16_t r1_cpu_caching, r2_cpu_caching;
>> +    uint32_t r1_flags, r2_flags;
>> +    intel_ctx_t *ctx;
>> +    uint64_t ahnd;
>> +    int width = p->size->width, height = p->size->height;
>> +    int size, stride, bb_size;
>> +    int bpp = 32;
>> +    uint32_t alias, name;
>> +    int fd = p->fd;
>> +    int i;
>> +
>> +    igt_require(blt_has_fast_copy(fd));
>> +
>> +    vm = xe_vm_create(fd, DRM_XE_VM_CREATE_ASYNC_DEFAULT, 0);
>> +    exec_queue = xe_exec_queue_create(fd, vm, &inst, 0);
>> +    ctx = intel_ctx_xe(fd, vm, exec_queue, 0, 0, 0);
>> +    ahnd = intel_allocator_open_full(fd, ctx->vm, 0, 0,
>> +                     INTEL_ALLOCATOR_SIMPLE,
>> +                     ALLOC_STRATEGY_LOW_TO_HIGH,
>> +                     p->size->alignment);
>> +
>> +    bb_size = xe_get_default_alignment(fd);
>> +    bb = xe_bo_create_flags(fd, 0, bb_size, system_memory(fd));
>> +
>> +    size = width * height * bpp / 8;
>> +    stride = width * 4;
>> +
>> +    r1_flags = 0;
>> +    if (p->r1 != system_memory(fd))
>> +        r1_flags |= XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM;
>> +
>> +    if (p->r1_coh_mode == DRM_XE_GEM_COH_AT_LEAST_1WAY
>> +        && p->r1 == system_memory(fd) && !p->r1_force_cpu_wc)
>> +        r1_cpu_caching = DRM_XE_GEM_CPU_CACHING_WB;
>> +    else
>> +        r1_cpu_caching = DRM_XE_GEM_CPU_CACHING_WC;
>> +
>> +    r2_flags = 0;
>> +    if (p->r2 != system_memory(fd))
>> +        r2_flags |= XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM;
>> +
>> +    if (p->r2_coh_mode == DRM_XE_GEM_COH_AT_LEAST_1WAY &&
>> +        p->r2 == system_memory(fd) && !p->r2_force_cpu_wc)
>> +        r2_cpu_caching = DRM_XE_GEM_CPU_CACHING_WB;
>> +    else
>> +        r2_cpu_caching = DRM_XE_GEM_CPU_CACHING_WC;
>> +
>> +
>> +    src_bo = xe_bo_create_caching(fd, 0, size, p->r1 | r1_flags, 
>> r1_cpu_caching,
>> +                      p->r1_coh_mode);
>> +    dst_bo = xe_bo_create_caching(fd, 0, size, p->r2 | r2_flags, 
>> r2_cpu_caching,
>> +                      p->r2_coh_mode);
>> +    if (p->r2_compressed) {
>> +        name = gem_flink(fd, dst_bo);
>> +        alias = gem_open(fd, name);
>> +    }
>> +
>> +    blt_copy_init(fd, &blt);
>> +    blt.color_depth = CD_32bit;
>> +
>> +    blt_set_object(&src, src_bo, size, p->r1, 
>> intel_get_uc_mocs_index(fd),
>> +               p->r1_pat_index, T_LINEAR,
>> +               COMPRESSION_DISABLED, COMPRESSION_TYPE_3D);
>> +    blt_set_geom(&src, stride, 0, 0, width, height, 0, 0);
>> +
>> +    blt_set_object(&dst, dst_bo, size, p->r2, 
>> intel_get_uc_mocs_index(fd),
>> +               p->r2_pat_index, T_LINEAR,
>> +               COMPRESSION_DISABLED, COMPRESSION_TYPE_3D);
>> +    blt_set_geom(&dst, stride, 0, 0, width, height, 0, 0);
>> +
>> +    blt_set_copy_object(&blt.src, &src);
>> +    blt_set_copy_object(&blt.dst, &dst);
>> +    blt_set_batch(&blt.bb, bb, bb_size, system_memory(fd));
>> +
>> +    src_map = xe_bo_map(fd, src_bo, size);
>> +    dst_map = xe_bo_map(fd, dst_bo, size);
>> +
>> +    /* Ensure we always see zeroes for the initial KMD zeroing */
>> +    blt_fast_copy(fd, ctx, NULL, ahnd, &blt);
>> +    if (p->r2_compressed)
>> +        xe2_blt_decompress_dst(fd, ctx, ahnd, &blt, alias, size);
>> +
>> +    /*
>> +     * Only sample random dword in every page if we are doing slow 
>> uncached
>> +     * reads from VRAM.
>> +     */
>> +    if (!do_slow_check && p->r2 != system_memory(fd)) {
>> +        int dwords_page = PAGE_SIZE / sizeof(uint32_t);
>> +        int dword = rand() % dwords_page;
>> +
>> +        igt_debug("random dword: %d\n", dword);
>> +
>> +        for (i = dword; i < size / sizeof(uint32_t); i += dwords_page)
>> +            igt_assert_eq(dst_map[i], 0);
>> +
>> +    } else {
>> +        for (i = 0; i < size / sizeof(uint32_t); i++)
>> +            igt_assert_eq(dst_map[i], 0);
>> +    }
>> +
>> +    /* Write some values from the CPU, potentially dirtying the CPU 
>> cache */
>> +    for (i = 0; i < size / sizeof(uint32_t); i++) {
>> +        if (p->r2_compressed)
>> +            src_map[i] = CLEAR_1;
>> +        else
>> +            src_map[i] = i;
>> +    }
>> +
>> +    /* And finally ensure we always see the CPU written values */
>> +    blt_fast_copy(fd, ctx, NULL, ahnd, &blt);
>> +    if (p->r2_compressed)
>> +        xe2_blt_decompress_dst(fd, ctx, ahnd, &blt, alias, size);
>> +
>> +    if (!do_slow_check && p->r2 != system_memory(fd)) {
>> +        int dwords_page = PAGE_SIZE / sizeof(uint32_t);
>> +        int dword = rand() % dwords_page;
>> +
>> +        igt_debug("random dword: %d\n", dword);
>> +
>> +        for (i = dword; i < size / sizeof(uint32_t); i += dwords_page) {
>> +            if (p->r2_compressed)
>> +                igt_assert_eq(dst_map[i], CLEAR_1);
>> +            else
>> +                igt_assert_eq(dst_map[i], i);
>> +        }
>> +
>> +    } else {
>> +        for (i = 0; i < size / sizeof(uint32_t); i++) {
>> +            if (p->r2_compressed)
>> +                igt_assert_eq(dst_map[i], CLEAR_1);
>> +            else
>> +                igt_assert_eq(dst_map[i], i);
>> +        }
>> +    }
>> +
>> +    munmap(src_map, size);
>> +    munmap(dst_map, size);
>> +
>> +    gem_close(fd, src_bo);
>> +    gem_close(fd, dst_bo);
>> +    gem_close(fd, bb);
>> +
>> +    xe_exec_queue_destroy(fd, exec_queue);
>> +    xe_vm_destroy(fd, vm);
>> +
>> +    put_ahnd(ahnd);
>> +    intel_ctx_destroy(fd, ctx);
>> +}
>> +
>> +static void pat_index_render(struct xe_pat_param *p)
>> +{
>> +    int fd = p->fd;
>> +    uint32_t devid = intel_get_drm_devid(fd);
>> +    igt_render_copyfunc_t render_copy = NULL;
>> +    int size, stride, width = p->size->width, height = p->size->height;
>> +    struct intel_buf src, dst;
>> +    struct intel_bb *ibb;
>> +    struct buf_ops *bops;
>> +    uint16_t r1_cpu_caching, r2_cpu_caching;
>> +    uint32_t r1_flags, r2_flags;
>> +    uint32_t src_bo, dst_bo;
>> +    uint32_t *src_map, *dst_map;
>> +    int bpp = 32;
>> +    int i;
>> +
>> +    bops = buf_ops_create(fd);
>> +
>> +    render_copy = igt_get_render_copyfunc(devid);
>> +    igt_require(render_copy);
>> +    igt_require(!p->r2_compressed); /* XXX */
>> +    igt_require(xe_has_engine_class(fd, DRM_XE_ENGINE_CLASS_RENDER));
>> +
>> +    ibb = intel_bb_create_full(fd, 0, 0, NULL, 
>> xe_get_default_alignment(fd),
>> +                   0, 0, p->size->alignment,
>> +                   INTEL_ALLOCATOR_SIMPLE,
>> +                   ALLOC_STRATEGY_HIGH_TO_LOW);
>> +
>> +    if (p->r1_coh_mode == DRM_XE_GEM_COH_AT_LEAST_1WAY
>> +        && p->r1 == system_memory(fd) && !p->r1_force_cpu_wc)
>> +        r1_cpu_caching = DRM_XE_GEM_CPU_CACHING_WB;
>> +    else
>> +        r1_cpu_caching = DRM_XE_GEM_CPU_CACHING_WC;
>> +
>> +    if (p->r2_coh_mode == DRM_XE_GEM_COH_AT_LEAST_1WAY &&
>> +        p->r2 == system_memory(fd) && !p->r2_force_cpu_wc)
>> +        r2_cpu_caching = DRM_XE_GEM_CPU_CACHING_WB;
>> +    else
>> +        r2_cpu_caching = DRM_XE_GEM_CPU_CACHING_WC;
>> +
>> +    size = width * height * bpp / 8;
>> +    stride = width * 4;
>> +
>> +    r1_flags = 0;
>> +    if (p->r1 != system_memory(fd))
>> +        r1_flags |= XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM;
>> +
>> +    src_bo = xe_bo_create_caching(fd, 0, size, p->r1 | r1_flags, 
>> r1_cpu_caching,
>> +                      p->r1_coh_mode);
>> +    intel_buf_init_full(bops, src_bo, &src, width, height, bpp, 0,
>> +                I915_TILING_NONE, I915_COMPRESSION_NONE, size,
>> +                stride, p->r1, p->r1_pat_index);
>> +
>> +    r2_flags = 0;
>> +    if (p->r2 != system_memory(fd))
>> +        r2_flags |= XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM;
>> +
>> +    dst_bo = xe_bo_create_caching(fd, 0, size, p->r2 | r2_flags, 
>> r2_cpu_caching,
>> +                      p->r2_coh_mode);
>> +    intel_buf_init_full(bops, dst_bo, &dst, width, height, bpp, 0,
>> +                I915_TILING_NONE, I915_COMPRESSION_NONE, size,
>> +                stride, p->r2, p->r2_pat_index);
>> +
>> +    src_map = xe_bo_map(fd, src_bo, size);
>> +    dst_map = xe_bo_map(fd, dst_bo, size);
>> +
>> +    /* Ensure we always see zeroes for the initial KMD zeroing */
>> +    render_copy(ibb,
>> +            &src,
>> +            0, 0, width, height,
>> +            &dst,
>> +            0, 0);
>> +    intel_bb_sync(ibb);
>> +
>> +    if (!do_slow_check && p->r2 != system_memory(fd)) {
>> +        int dwords_page = PAGE_SIZE / sizeof(uint32_t);
>> +        int dword = rand() % dwords_page;
>> +
>> +        igt_debug("random dword: %d\n", dword);
>> +
>> +        for (i = dword; i < size / sizeof(uint32_t); i += dwords_page)
>> +            igt_assert_eq(dst_map[i], 0);
>> +    } else {
>> +        for (i = 0; i < size / sizeof(uint32_t); i++)
>> +            igt_assert_eq(dst_map[i], 0);
>> +    }
>> +
>> +    /* Write some values from the CPU, potentially dirtying the CPU 
>> cache */
>> +    for (i = 0; i < size / sizeof(uint32_t); i++)
>> +        src_map[i] = i;
>> +
>> +    /* And finally ensure we always see the CPU written values */
>> +    render_copy(ibb,
>> +            &src,
>> +            0, 0, width, height,
>> +            &dst,
>> +            0, 0);
>> +    intel_bb_sync(ibb);
>> +
>> +    if (!do_slow_check && p->r2 != system_memory(fd)) {
>> +        int dwords_page = PAGE_SIZE / sizeof(uint32_t);
>> +        int dword = rand() % dwords_page;
>> +
>> +        igt_debug("random dword: %d\n", dword);
>> +
>> +        for (i = dword; i < size / sizeof(uint32_t); i += dwords_page)
>> +            igt_assert_eq(dst_map[i], i);
>> +    } else {
>> +        for (i = 0; i < size / sizeof(uint32_t); i++)
>> +            igt_assert_eq(dst_map[i], i);
>> +    }
>> +
>> +    munmap(src_map, size);
>> +    munmap(dst_map, size);
>> +
>> +    intel_bb_destroy(ibb);
>> +
>> +    gem_close(fd, src_bo);
>> +    gem_close(fd, dst_bo);
>> +}
>> +
>> +static uint8_t get_pat_idx_uc(int fd, bool *compressed)
>> +{
>> +    if (compressed)
>> +        *compressed = false;
>> +
>> +    return intel_get_pat_idx_uc(fd);
>> +}
>> +
>> +static uint8_t get_pat_idx_wt(int fd, bool *compressed)
>> +{
>> +    uint16_t dev_id = intel_get_drm_devid(fd);
>> +
>> +    if (compressed)
>> +        *compressed = intel_get_device_info(dev_id)->graphics_ver == 20;
>> +
>> +    return intel_get_pat_idx_wt(fd);
>> +}
>> +
>> +static uint8_t get_pat_idx_wb(int fd, bool *compressed)
>> +{
>> +    if (compressed)
>> +        *compressed = false;
>> +
>> +    return intel_get_pat_idx_wb(fd);
>> +}
>> +
>> +struct pat_index_entry {
>> +    uint8_t (*get_pat_index)(int fd, bool *compressed);
>> +
>> +    uint8_t pat_index;
>> +    bool compressed;
>> +
>> +    const char *name;
>> +    uint16_t coh_mode;
>> +    bool force_cpu_wc;
>> +};
>> +
>> +/*
>> + * The common modes are available on all platforms supported by Xe 
>> and so should
>> + * be commonly supported. There are many more possible pat_index 
>> modes, however
>> + * most IGTs shouldn't really care about them so likely no need to 
>> add them to
>> + * lib/intel_pat.c. We do try to test some of the non-common modes here.
>> + */
>> +const struct pat_index_entry common_pat_index_modes[] = {
>> +    { get_pat_idx_uc, 0, 0, "uc",        
>> DRM_XE_GEM_COH_NONE                },
>> +    { get_pat_idx_wt, 0, 0, "wt",        
>> DRM_XE_GEM_COH_NONE                },
>> +    { get_pat_idx_wb, 0, 0, "wb",        
>> DRM_XE_GEM_COH_AT_LEAST_1WAY       },
>> +    { get_pat_idx_wb, 0, 0, "wb-cpu-wc", 
>> DRM_XE_GEM_COH_AT_LEAST_1WAY, true },
>> +};
>> +
>> +const struct pat_index_entry xelp_pat_index_modes[] = {
>> +    { NULL, 1, false, "wc", DRM_XE_GEM_COH_NONE },
>> +};
>> +
>> +const struct pat_index_entry xehpc_pat_index_modes[] = {
>> +    { NULL, 1, false, "wc",    DRM_XE_GEM_COH_NONE          },
>> +    { NULL, 4, false, "c1-wt", DRM_XE_GEM_COH_NONE          },
>> +    { NULL, 5, false, "c1-wb", DRM_XE_GEM_COH_AT_LEAST_1WAY },
>> +    { NULL, 6, false, "c2-wt", DRM_XE_GEM_COH_NONE          },
>> +    { NULL, 7, false, "c2-wb", DRM_XE_GEM_COH_AT_LEAST_1WAY },
>> +};
>> +
>> +/* Too many, just pick some interesting ones */
>> +const struct pat_index_entry xe2_pat_index_modes[] = {
>> +    { NULL, 1, false, "1way",        
>> DRM_XE_GEM_COH_AT_LEAST_1WAY       },
>> +    { NULL, 2, false, "2way",        
>> DRM_XE_GEM_COH_AT_LEAST_1WAY       },
>> +    { NULL, 2, false, "2way-cpu-wc", DRM_XE_GEM_COH_AT_LEAST_1WAY, 
>> true },
>> +    { NULL, 3, true,  "uc-comp",     
>> DRM_XE_GEM_COH_NONE                },
>> +    { NULL, 5, false, "uc-1way",     
>> DRM_XE_GEM_COH_AT_LEAST_1WAY       },
>> +};
>> +
>> +/*
>> + * Depending on 2M/1G GTT pages we might trigger different PTE 
>> layouts for the
>> + * PAT bits, so make sure we test with and without huge-pages. Also 
>> ensure we
>> + * have a mix of different pat_index modes for each PDE.
>> + */
>> +const struct xe_pat_size_mode size_modes[] =  {
>> +    { 256,  256,  0,        "mixed-pde"  },
>> +    { 1024, 1024, 1u << 21, "single-pde" },
>> +};
> 
> I am a bit confused by the naming here (mixed-pde/single-pde).
> The first case creates BOs of size 256*256*(32/8) = 256K, which means it
> only needs to update a few PTEs, all of which could sit under a single
> PDE. This tests the pat_index setting of PTEs.
> The second case creates BOs of size 1024*1024*(32/8) = 4MB, which with
> the 2MB alignment will occupy 2 PDEs. This tests the pat_index setting
> of leaf PDEs.
> Right?

Yup, the "mixed-pde" just means that the pde contains multiple different 
mappings using different pat_index. The "single-pde" means that the 
mapping will entirely consume each pde, hopefully with 2M GTT pages 
given the alignment. And yes this is mostly to test bit7/bit12 with pat[2].

I will change this to rather use 2M size, which is maybe less consufing.
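
For anyone following the arithmetic, here is a small stand-alone sketch of
the size math (assuming bpp = 32, 4K PTEs and 2M leaf PDEs as above; the
width/height values come from the quoted size_modes[] table, nothing here
is IGT code, just illustration):

#include <stdio.h>

int main(void)
{
	/* width/height pairs from the quoted size_modes[] table */
	struct { int width, height; const char *name; } modes[] = {
		{ 256,  256,  "mixed-pde"  },
		{ 1024, 1024, "single-pde" },
	};
	const unsigned long pde_bytes = 2ul << 20; /* one 2M leaf PDE */
	int i;

	for (i = 0; i < 2; i++) {
		/* bpp = 32 in the test, so 4 bytes per pixel */
		unsigned long size = (unsigned long)modes[i].width *
				     modes[i].height * 32 / 8;

		printf("%s: %lu KiB, %lu fully covered 2M PDE(s)\n",
		       modes[i].name, size >> 10, size / pde_bytes);
	}

	return 0;
}

So the 256x256 case stays well inside one PDE and exercises the per-PTE
pat bits, while the 1024x1024 case with the 2M alignment fully covers two
leaf PDEs.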

> 
> Other than that, the patch looks fine to me.
> Reviewed-by: Niranjana Vishwanathapura 
> <niranjana.vishwanathapura@intel.com>

Thanks.

> 
>> +
>> +typedef void (*copy_fn)(struct xe_pat_param *p);
>> +
>> +const struct xe_pat_copy_mode {
>> +    copy_fn fn;
>> +    const char *name;
>> +} copy_modes[] =  {
>> +    {  pat_index_blt,    "blt"    },
>> +    {  pat_index_render, "render" },
>> +};
>> +
>> +/**
>> + * SUBTEST: pat-index-common
>> + * Test category: functionality test
>> + * Description: Check the common pat_index modes.
>> + */
>> +
>> +/**
>> + * SUBTEST: pat-index-xelp
>> + * Test category: functionality test
>> + * Description: Check some of the xelp pat_index modes.
>> + */
>> +
>> +/**
>> + * SUBTEST: pat-index-xehpc
>> + * Test category: functionality test
>> + * Description: Check some of the xehpc pat_index modes.
>> + */
>> +
>> +/**
>> + * SUBTEST: pat-index-xe2
>> + * Test category: functionality test
>> + * Description: Check some of the xe2 pat_index modes.
>> + */
>> +
>> +static void subtest_pat_index_modes_with_regions(int fd,
>> +                         const struct pat_index_entry *modes_arr,
>> +                         int n_modes)
>> +{
>> +    struct igt_collection *copy_set;
>> +    struct igt_collection *pat_index_set;
>> +    struct igt_collection *regions_set;
>> +    struct igt_collection *sizes_set;
>> +    struct igt_collection *copies;
>> +    struct xe_pat_param p = {};
>> +
>> +    p.fd = fd;
>> +
>> +    copy_set = igt_collection_create(ARRAY_SIZE(copy_modes));
>> +
>> +    pat_index_set = igt_collection_create(n_modes);
>> +
>> +    regions_set = xe_get_memory_region_set(fd,
>> +                           XE_MEM_REGION_CLASS_SYSMEM,
>> +                           XE_MEM_REGION_CLASS_VRAM);
>> +
>> +    sizes_set = igt_collection_create(ARRAY_SIZE(size_modes));
>> +
>> +    for_each_variation_r(copies, 1, copy_set) {
>> +        struct igt_collection *regions;
>> +        struct xe_pat_copy_mode copy_mode;
>> +
>> +        copy_mode = copy_modes[igt_collection_get_value(copies, 0)];
>> +
>> +        for_each_variation_r(regions, 2, regions_set) {
>> +            struct igt_collection *pat_modes;
>> +            uint32_t r1, r2;
>> +            char *reg_str;
>> +
>> +            r1 = igt_collection_get_value(regions, 0);
>> +            r2 = igt_collection_get_value(regions, 1);
>> +
>> +            reg_str = xe_memregion_dynamic_subtest_name(fd, regions);
>> +
>> +            for_each_variation_r(pat_modes, 2, pat_index_set) {
>> +                struct igt_collection *sizes;
>> +                struct pat_index_entry r1_entry, r2_entry;
>> +                int r1_idx, r2_idx;
>> +
>> +                r1_idx = igt_collection_get_value(pat_modes, 0);
>> +                r2_idx = igt_collection_get_value(pat_modes, 1);
>> +
>> +                r1_entry = modes_arr[r1_idx];
>> +                r2_entry = modes_arr[r2_idx];
>> +
>> +                if (r1_entry.get_pat_index)
>> +                    p.r1_pat_index = r1_entry.get_pat_index(fd, NULL);
>> +                else
>> +                    p.r1_pat_index = r1_entry.pat_index;
>> +
>> +                if (r2_entry.get_pat_index)
>> +                    p.r2_pat_index = r2_entry.get_pat_index(fd, 
>> &p.r2_compressed);
>> +                else {
>> +                    p.r2_pat_index = r2_entry.pat_index;
>> +                    p.r2_compressed = r2_entry.compressed;
>> +                }
>> +
>> +                p.r1_coh_mode = r1_entry.coh_mode;
>> +                p.r2_coh_mode = r2_entry.coh_mode;
>> +
>> +                p.r1_force_cpu_wc = r1_entry.force_cpu_wc;
>> +                p.r2_force_cpu_wc = r2_entry.force_cpu_wc;
>> +
>> +                p.r1 = r1;
>> +                p.r2 = r2;
>> +
>> +                for_each_variation_r(sizes, 1, sizes_set) {
>> +                    int size_mode_idx = 
>> igt_collection_get_value(sizes, 0);
>> +
>> +                    p.size = &size_modes[size_mode_idx];
>> +
>> +                    igt_debug("[r1]: r: %u, idx: %u, coh: %u, wc: %d\n",
>> +                          p.r1, p.r1_pat_index, p.r1_coh_mode, 
>> p.r1_force_cpu_wc);
>> +                    igt_debug("[r2]: r: %u, idx: %u, coh: %u, wc: %d, 
>> comp: %d, w: %u, h: %u, a: %u\n",
>> +                          p.r2, p.r2_pat_index, p.r2_coh_mode,
>> +                          p.r2_force_cpu_wc, p.r2_compressed,
>> +                          p.size->width, p.size->height,
>> +                          p.size->alignment);
>> +
>> +                    igt_dynamic_f("%s-%s-%s-%s-%s",
>> +                              copy_mode.name,
>> +                              reg_str, r1_entry.name,
>> +                              r2_entry.name, p.size->name)
>> +                        copy_mode.fn(&p);
>> +                }
>> +            }
>> +
>> +            free(reg_str);
>> +        }
>> +    }
>> +}
>> +
>> +igt_main
>> +{
>> +    uint16_t dev_id;
>> +    int fd;
>> +
>> +    igt_fixture {
>> +        uint32_t seed;
>> +
>> +        fd = drm_open_driver(DRIVER_XE);
>> +        dev_id = intel_get_drm_devid(fd);
>> +
>> +        seed = time(NULL);
>> +        srand(seed);
>> +        igt_debug("seed: %d\n", seed);
>> +
>> +        xe_device_get(fd);
>> +    }
>> +
>> +    igt_subtest("pat-index-all")
>> +        pat_index_all(fd);
>> +
>> +    igt_subtest("userptr-coh-none")
>> +        userptr_coh_none(fd);
>> +
>> +    igt_subtest_with_dynamic("pat-index-common") {
>> +        subtest_pat_index_modes_with_regions(fd, common_pat_index_modes,
>> +                             ARRAY_SIZE(common_pat_index_modes));
>> +    }
>> +
>> +    igt_subtest_with_dynamic("pat-index-xelp") {
>> +        igt_require(intel_graphics_ver(dev_id) <= IP_VER(12, 55));
>> +        subtest_pat_index_modes_with_regions(fd, xelp_pat_index_modes,
>> +                             ARRAY_SIZE(xelp_pat_index_modes));
>> +    }
>> +
>> +    igt_subtest_with_dynamic("pat-index-xehpc") {
>> +        igt_require(IS_PONTEVECCHIO(dev_id));
>> +        subtest_pat_index_modes_with_regions(fd, xehpc_pat_index_modes,
>> +                             ARRAY_SIZE(xehpc_pat_index_modes));
>> +    }
>> +
>> +    igt_subtest_with_dynamic("pat-index-xe2") {
>> +        igt_require(intel_get_device_info(dev_id)->graphics_ver >= 20);
>> +        subtest_pat_index_modes_with_regions(fd, xe2_pat_index_modes,
>> +                             ARRAY_SIZE(xe2_pat_index_modes));
>> +    }
>> +
>> +    igt_fixture
>> +        drm_close_driver(fd);
>> +}
>> diff --git a/tests/meson.build b/tests/meson.build
>> index 5afcd8cbb..3aecfbee0 100644
>> --- a/tests/meson.build
>> +++ b/tests/meson.build
>> @@ -297,6 +297,7 @@ intel_xe_progs = [
>>     'xe_mmap',
>>     'xe_module_load',
>>     'xe_noexec_ping_pong',
>> +    'xe_pat',
>>     'xe_pm',
>>     'xe_pm_residency',
>>     'xe_prime_self_import',
>> -- 
>> 2.41.0
>>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [igt-dev] [PATCH i-g-t v4 14/15] tests/xe: add some vm_bind pat_index tests
  2023-10-20  8:21     ` Matthew Auld
@ 2023-10-20  8:42       ` Matthew Auld
  2023-10-20 17:24       ` Niranjana Vishwanathapura
  1 sibling, 0 replies; 28+ messages in thread
From: Matthew Auld @ 2023-10-20  8:42 UTC (permalink / raw)
  To: Niranjana Vishwanathapura; +Cc: igt-dev, Nitish Kumar

On 20/10/2023 09:21, Matthew Auld wrote:
> On 20/10/2023 06:27, Niranjana Vishwanathapura wrote:
>> On Thu, Oct 19, 2023 at 03:41:05PM +0100, Matthew Auld wrote:
>>> Add some basic tests for pat_index and vm_bind.
>>>
>>> v2: Make sure to actually use srand() with the chosen seed
>>>  - Make it work on xe2; the wt mode now has compression.
>>>  - Also test some xe2+ specific pat_index modes.
>>> v3: Fix decompress step.
>>> v4: (Niranjana)
>>>  - Various improvements, including testing more pat_index modes, like
>>>    wc where possible.
>>>  - Document the idea behind "common" modes.
>>>
>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>> Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>> Cc: José Roberto de Souza <jose.souza@intel.com>
>>> Cc: Pallavi Mishra <pallavi.mishra@intel.com>
>>> Cc: Nitish Kumar <nitish.kumar@intel.com>
>>> ---
>>> tests/intel/xe_pat.c | 754 +++++++++++++++++++++++++++++++++++++++++++
>>> tests/meson.build    |   1 +
>>> 2 files changed, 755 insertions(+)
>>> create mode 100644 tests/intel/xe_pat.c
>>>
>>> diff --git a/tests/intel/xe_pat.c b/tests/intel/xe_pat.c
>>> new file mode 100644
>>> index 000000000..1e74014b8
>>> --- /dev/null
>>> +++ b/tests/intel/xe_pat.c
>>> @@ -0,0 +1,754 @@
>>> +// SPDX-License-Identifier: MIT
>>> +/*
>>> + * Copyright © 2023 Intel Corporation
>>> + */
>>> +
>>> +/**
>>> + * TEST: Test for selecting per-VMA pat_index
>>> + * Category: Software building block
>>> + * Sub-category: VMA
>>> + * Functionality: pat_index
>>> + */
>>> +
>>> +#include "igt.h"
>>> +#include "intel_blt.h"
>>> +#include "intel_mocs.h"
>>> +#include "intel_pat.h"
>>> +
>>> +#include "xe/xe_ioctl.h"
>>> +#include "xe/xe_query.h"
>>> +#include "xe/xe_util.h"
>>> +
>>> +#define PAGE_SIZE 4096
>>> +
>>> +static bool do_slow_check;
>>> +
>>> +/**
>>> + * SUBTEST: userptr-coh-none
>>> + * Test category: functionality test
>>> + * Description: Test non-coherent pat_index on userptr
>>> + */
>>> +static void userptr_coh_none(int fd)
>>> +{
>>> +    size_t size = xe_get_default_alignment(fd);
>>> +    uint32_t vm;
>>> +    void *data;
>>> +
>>> +    data = mmap(0, size, PROT_READ |
>>> +            PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
>>> +    igt_assert(data != MAP_FAILED);
>>> +
>>> +    vm = xe_vm_create(fd, 0, 0);
>>> +
>>> +    /*
>>> +     * Try some valid combinations first just to make sure we're not 
>>> being
>>> +     * swindled.
>>> +     */
>>> +    igt_assert_eq(__xe_vm_bind(fd, vm, 0, 0, to_user_pointer(data), 
>>> 0x40000,
>>> +                   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0,
>>> +                   DEFAULT_PAT_INDEX, 0),
>>> +              0);
>>> +    xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>>> +    igt_assert_eq(__xe_vm_bind(fd, vm, 0, 0, to_user_pointer(data), 
>>> 0x40000,
>>> +                   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0,
>>> +                   intel_get_pat_idx_wb(fd), 0),
>>> +              0);
>>> +    xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>>> +
>>> +    /* And then some known COH_NONE pat_index combos which should 
>>> fail. */
>>> +    igt_assert_eq(__xe_vm_bind(fd, vm, 0, 0, to_user_pointer(data), 
>>> 0x40000,
>>> +                   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0,
>>> +                   intel_get_pat_idx_uc(fd), 0),
>>> +              -EINVAL);
>>> +    igt_assert_eq(__xe_vm_bind(fd, vm, 0, 0, to_user_pointer(data), 
>>> 0x40000,
>>> +                   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0,
>>> +                   intel_get_pat_idx_wt(fd), 0),
>>> +              -EINVAL);
>>> +
>>> +    munmap(data, size);
>>> +    xe_vm_destroy(fd, vm);
>>> +}
>>> +
>>> +/**
>>> + * SUBTEST: pat-index-all
>>> + * Test category: functionality test
>>> + * Description: Test every pat_index
>>> + */
>>> +static void pat_index_all(int fd)
>>> +{
>>> +    uint16_t dev_id = intel_get_drm_devid(fd);
>>> +    size_t size = xe_get_default_alignment(fd);
>>> +    uint32_t vm, bo;
>>> +    uint8_t pat_index;
>>> +
>>> +    vm = xe_vm_create(fd, 0, 0);
>>> +
>>> +    bo = xe_bo_create_caching(fd, 0, size, all_memory_regions(fd),
>>> +                  DRM_XE_GEM_CPU_CACHING_WC,
>>> +                  DRM_XE_GEM_COH_NONE);
>>> +
>>> +    igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>>> +                   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>>> +                   intel_get_pat_idx_uc(fd), 0),
>>> +              0);
>>> +    xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>>> +
>>> +    igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>>> +                   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>>> +                   intel_get_pat_idx_wt(fd), 0),
>>> +              0);
>>> +    xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>>> +
>>> +    igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>>> +                   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>>> +                   intel_get_pat_idx_wb(fd), 0),
>>> +              0);
>>> +    xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>>> +
>>> +    igt_assert(intel_get_max_pat_index(fd));
>>> +
>>> +    for (pat_index = 0; pat_index <= intel_get_max_pat_index(fd);
>>> +         pat_index++) {
>>> +        if (intel_get_device_info(dev_id)->graphics_ver == 20 &&
>>> +            pat_index >= 16 && pat_index <= 19) { /* hw reserved */
>>> +            igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>>> +                           size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>>> +                           pat_index, 0),
>>> +                      -EINVAL);
>>> +        } else {
>>> +            igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>>> +                           size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>>> +                           pat_index, 0),
>>> +                      0);
>>> +            xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>>> +        }
>>> +    }
>>> +
>>> +    igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>>> +                   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>>> +                   pat_index, 0),
>>> +              -EINVAL);
>>> +
>>> +    gem_close(fd, bo);
>>> +
>>> +    /* Must be at least as coherent as the gem_create coh_mode. */
>>> +    bo = xe_bo_create_caching(fd, 0, size, system_memory(fd),
>>> +                  DRM_XE_GEM_CPU_CACHING_WB,
>>> +                  DRM_XE_GEM_COH_AT_LEAST_1WAY);
>>> +
>>> +    igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>>> +                   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>>> +                   intel_get_pat_idx_uc(fd), 0),
>>> +              -EINVAL);
>>> +
>>> +    igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>>> +                   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>>> +                   intel_get_pat_idx_wt(fd), 0),
>>> +              -EINVAL);
>>> +
>>> +    igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>>> +                   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>>> +                   intel_get_pat_idx_wb(fd), 0),
>>> +              0);
>>> +    xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>>> +
>>> +    gem_close(fd, bo);
>>> +
>>> +    xe_vm_destroy(fd, vm);
>>> +}
>>> +
>>> +#define CLEAR_1 0xFFFFFFFF /* something compressible */
>>> +
>>> +static void xe2_blt_decompress_dst(int fd,
>>> +                   intel_ctx_t *ctx,
>>> +                   uint64_t ahnd,
>>> +                   struct blt_copy_data *blt,
>>> +                   uint32_t alias_handle,
>>> +                   uint32_t size)
>>> +{
>>> +    struct blt_copy_object tmp = {};
>>> +
>>> +    /*
>>> +     * Xe2 in-place decompression using an alias to the same physical
>>> +     * memory, but with the dst mapped using some uncompressed 
>>> pat_index.
>>> +     * This should allow checking the object pages via mmap.
>>> +     */
>>> +
>>> +    memcpy(&tmp, &blt->src, sizeof(blt->dst));
>>> +    memcpy(&blt->src, &blt->dst, sizeof(blt->dst));
>>> +    blt_set_object(&blt->dst, alias_handle, size, 0,
>>> +               intel_get_uc_mocs_index(fd),
>>> +               intel_get_pat_idx_uc(fd), /* compression disabled */
>>> +               T_LINEAR, 0, 0);
>>> +    blt_fast_copy(fd, ctx, NULL, ahnd, blt);
>>> +    memcpy(&blt->dst, &blt->src, sizeof(blt->dst));
>>> +    memcpy(&blt->src, &tmp, sizeof(blt->dst));
>>> +}
>>> +
>>> +struct xe_pat_size_mode {
>>> +    uint16_t width;
>>> +    uint16_t height;
>>> +    uint32_t alignment;
>>> +    const char *name;
>>> +};
>>> +
>>> +struct xe_pat_param {
>>> +    int fd;
>>> +
>>> +    const struct xe_pat_size_mode *size;
>>> +
>>> +    uint32_t r1;
>>> +    uint8_t  r1_pat_index;
>>> +    uint16_t r1_coh_mode;
>>> +    bool     r1_force_cpu_wc;
>>> +
>>> +    uint32_t r2;
>>> +    uint8_t  r2_pat_index;
>>> +    uint16_t r2_coh_mode;
>>> +    bool     r2_force_cpu_wc;
>>> +    bool     r2_compressed; /* xe2+ compression */
>>> +
>>> +};
>>> +
>>> +static void pat_index_blt(struct xe_pat_param *p)
>>> +{
>>> +    struct drm_xe_engine_class_instance inst = {
>>> +        .engine_class = DRM_XE_ENGINE_CLASS_COPY,
>>> +    };
>>> +    struct blt_copy_data blt = {};
>>> +    struct blt_copy_object src = {};
>>> +    struct blt_copy_object dst = {};
>>> +    uint32_t vm, exec_queue, src_bo, dst_bo, bb;
>>> +    uint32_t *src_map, *dst_map;
>>> +    uint16_t r1_cpu_caching, r2_cpu_caching;
>>> +    uint32_t r1_flags, r2_flags;
>>> +    intel_ctx_t *ctx;
>>> +    uint64_t ahnd;
>>> +    int width = p->size->width, height = p->size->height;
>>> +    int size, stride, bb_size;
>>> +    int bpp = 32;
>>> +    uint32_t alias, name;
>>> +    int fd = p->fd;
>>> +    int i;
>>> +
>>> +    igt_require(blt_has_fast_copy(fd));
>>> +
>>> +    vm = xe_vm_create(fd, DRM_XE_VM_CREATE_ASYNC_DEFAULT, 0);
>>> +    exec_queue = xe_exec_queue_create(fd, vm, &inst, 0);
>>> +    ctx = intel_ctx_xe(fd, vm, exec_queue, 0, 0, 0);
>>> +    ahnd = intel_allocator_open_full(fd, ctx->vm, 0, 0,
>>> +                     INTEL_ALLOCATOR_SIMPLE,
>>> +                     ALLOC_STRATEGY_LOW_TO_HIGH,
>>> +                     p->size->alignment);
>>> +
>>> +    bb_size = xe_get_default_alignment(fd);
>>> +    bb = xe_bo_create_flags(fd, 0, bb_size, system_memory(fd));
>>> +
>>> +    size = width * height * bpp / 8;
>>> +    stride = width * 4;
>>> +
>>> +    r1_flags = 0;
>>> +    if (p->r1 != system_memory(fd))
>>> +        r1_flags |= XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM;
>>> +
>>> +    if (p->r1_coh_mode == DRM_XE_GEM_COH_AT_LEAST_1WAY
>>> +        && p->r1 == system_memory(fd) && !p->r1_force_cpu_wc)
>>> +        r1_cpu_caching = DRM_XE_GEM_CPU_CACHING_WB;
>>> +    else
>>> +        r1_cpu_caching = DRM_XE_GEM_CPU_CACHING_WC;
>>> +
>>> +    r2_flags = 0;
>>> +    if (p->r2 != system_memory(fd))
>>> +        r2_flags |= XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM;
>>> +
>>> +    if (p->r2_coh_mode == DRM_XE_GEM_COH_AT_LEAST_1WAY &&
>>> +        p->r2 == system_memory(fd) && !p->r2_force_cpu_wc)
>>> +        r2_cpu_caching = DRM_XE_GEM_CPU_CACHING_WB;
>>> +    else
>>> +        r2_cpu_caching = DRM_XE_GEM_CPU_CACHING_WC;
>>> +
>>> +
>>> +    src_bo = xe_bo_create_caching(fd, 0, size, p->r1 | r1_flags, 
>>> r1_cpu_caching,
>>> +                      p->r1_coh_mode);
>>> +    dst_bo = xe_bo_create_caching(fd, 0, size, p->r2 | r2_flags, 
>>> r2_cpu_caching,
>>> +                      p->r2_coh_mode);
>>> +    if (p->r2_compressed) {
>>> +        name = gem_flink(fd, dst_bo);
>>> +        alias = gem_open(fd, name);
>>> +    }
>>> +
>>> +    blt_copy_init(fd, &blt);
>>> +    blt.color_depth = CD_32bit;
>>> +
>>> +    blt_set_object(&src, src_bo, size, p->r1, 
>>> intel_get_uc_mocs_index(fd),
>>> +               p->r1_pat_index, T_LINEAR,
>>> +               COMPRESSION_DISABLED, COMPRESSION_TYPE_3D);
>>> +    blt_set_geom(&src, stride, 0, 0, width, height, 0, 0);
>>> +
>>> +    blt_set_object(&dst, dst_bo, size, p->r2, 
>>> intel_get_uc_mocs_index(fd),
>>> +               p->r2_pat_index, T_LINEAR,
>>> +               COMPRESSION_DISABLED, COMPRESSION_TYPE_3D);
>>> +    blt_set_geom(&dst, stride, 0, 0, width, height, 0, 0);
>>> +
>>> +    blt_set_copy_object(&blt.src, &src);
>>> +    blt_set_copy_object(&blt.dst, &dst);
>>> +    blt_set_batch(&blt.bb, bb, bb_size, system_memory(fd));
>>> +
>>> +    src_map = xe_bo_map(fd, src_bo, size);
>>> +    dst_map = xe_bo_map(fd, dst_bo, size);
>>> +
>>> +    /* Ensure we always see zeroes for the initial KMD zeroing */
>>> +    blt_fast_copy(fd, ctx, NULL, ahnd, &blt);
>>> +    if (p->r2_compressed)
>>> +        xe2_blt_decompress_dst(fd, ctx, ahnd, &blt, alias, size);
>>> +
>>> +    /*
>>> +     * Only sample random dword in every page if we are doing slow 
>>> uncached
>>> +     * reads from VRAM.
>>> +     */
>>> +    if (!do_slow_check && p->r2 != system_memory(fd)) {
>>> +        int dwords_page = PAGE_SIZE / sizeof(uint32_t);
>>> +        int dword = rand() % dwords_page;
>>> +
>>> +        igt_debug("random dword: %d\n", dword);
>>> +
>>> +        for (i = dword; i < size / sizeof(uint32_t); i += dwords_page)
>>> +            igt_assert_eq(dst_map[i], 0);
>>> +
>>> +    } else {
>>> +        for (i = 0; i < size / sizeof(uint32_t); i++)
>>> +            igt_assert_eq(dst_map[i], 0);
>>> +    }
>>> +
>>> +    /* Write some values from the CPU, potentially dirtying the CPU 
>>> cache */
>>> +    for (i = 0; i < size / sizeof(uint32_t); i++) {
>>> +        if (p->r2_compressed)
>>> +            src_map[i] = CLEAR_1;
>>> +        else
>>> +            src_map[i] = i;
>>> +    }
>>> +
>>> +    /* And finally ensure we always see the CPU written values */
>>> +    blt_fast_copy(fd, ctx, NULL, ahnd, &blt);
>>> +    if (p->r2_compressed)
>>> +        xe2_blt_decompress_dst(fd, ctx, ahnd, &blt, alias, size);
>>> +
>>> +    if (!do_slow_check && p->r2 != system_memory(fd)) {
>>> +        int dwords_page = PAGE_SIZE / sizeof(uint32_t);
>>> +        int dword = rand() % dwords_page;
>>> +
>>> +        igt_debug("random dword: %d\n", dword);
>>> +
>>> +        for (i = dword; i < size / sizeof(uint32_t); i += 
>>> dwords_page) {
>>> +            if (p->r2_compressed)
>>> +                igt_assert_eq(dst_map[i], CLEAR_1);
>>> +            else
>>> +                igt_assert_eq(dst_map[i], i);
>>> +        }
>>> +
>>> +    } else {
>>> +        for (i = 0; i < size / sizeof(uint32_t); i++) {
>>> +            if (p->r2_compressed)
>>> +                igt_assert_eq(dst_map[i], CLEAR_1);
>>> +            else
>>> +                igt_assert_eq(dst_map[i], i);
>>> +        }
>>> +    }
>>> +
>>> +    munmap(src_map, size);
>>> +    munmap(dst_map, size);
>>> +
>>> +    gem_close(fd, src_bo);
>>> +    gem_close(fd, dst_bo);
>>> +    gem_close(fd, bb);
>>> +
>>> +    xe_exec_queue_destroy(fd, exec_queue);
>>> +    xe_vm_destroy(fd, vm);
>>> +
>>> +    put_ahnd(ahnd);
>>> +    intel_ctx_destroy(fd, ctx);
>>> +}
>>> +
>>> +static void pat_index_render(struct xe_pat_param *p)
>>> +{
>>> +    int fd = p->fd;
>>> +    uint32_t devid = intel_get_drm_devid(fd);
>>> +    igt_render_copyfunc_t render_copy = NULL;
>>> +    int size, stride, width = p->size->width, height = p->size->height;
>>> +    struct intel_buf src, dst;
>>> +    struct intel_bb *ibb;
>>> +    struct buf_ops *bops;
>>> +    uint16_t r1_cpu_caching, r2_cpu_caching;
>>> +    uint32_t r1_flags, r2_flags;
>>> +    uint32_t src_bo, dst_bo;
>>> +    uint32_t *src_map, *dst_map;
>>> +    int bpp = 32;
>>> +    int i;
>>> +
>>> +    bops = buf_ops_create(fd);
>>> +
>>> +    render_copy = igt_get_render_copyfunc(devid);
>>> +    igt_require(render_copy);
>>> +    igt_require(!p->r2_compressed); /* XXX */
>>> +    igt_require(xe_has_engine_class(fd, DRM_XE_ENGINE_CLASS_RENDER));
>>> +
>>> +    ibb = intel_bb_create_full(fd, 0, 0, NULL, 
>>> xe_get_default_alignment(fd),
>>> +                   0, 0, p->size->alignment,
>>> +                   INTEL_ALLOCATOR_SIMPLE,
>>> +                   ALLOC_STRATEGY_HIGH_TO_LOW);
>>> +
>>> +    if (p->r1_coh_mode == DRM_XE_GEM_COH_AT_LEAST_1WAY
>>> +        && p->r1 == system_memory(fd) && !p->r1_force_cpu_wc)
>>> +        r1_cpu_caching = DRM_XE_GEM_CPU_CACHING_WB;
>>> +    else
>>> +        r1_cpu_caching = DRM_XE_GEM_CPU_CACHING_WC;
>>> +
>>> +    if (p->r2_coh_mode == DRM_XE_GEM_COH_AT_LEAST_1WAY &&
>>> +        p->r2 == system_memory(fd) && !p->r2_force_cpu_wc)
>>> +        r2_cpu_caching = DRM_XE_GEM_CPU_CACHING_WB;
>>> +    else
>>> +        r2_cpu_caching = DRM_XE_GEM_CPU_CACHING_WC;
>>> +
>>> +    size = width * height * bpp / 8;
>>> +    stride = width * 4;
>>> +
>>> +    r1_flags = 0;
>>> +    if (p->r1 != system_memory(fd))
>>> +        r1_flags |= XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM;
>>> +
>>> +    src_bo = xe_bo_create_caching(fd, 0, size, p->r1 | r1_flags, 
>>> r1_cpu_caching,
>>> +                      p->r1_coh_mode);
>>> +    intel_buf_init_full(bops, src_bo, &src, width, height, bpp, 0,
>>> +                I915_TILING_NONE, I915_COMPRESSION_NONE, size,
>>> +                stride, p->r1, p->r1_pat_index);
>>> +
>>> +    r2_flags = 0;
>>> +    if (p->r2 != system_memory(fd))
>>> +        r2_flags |= XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM;
>>> +
>>> +    dst_bo = xe_bo_create_caching(fd, 0, size, p->r2 | r2_flags, 
>>> r2_cpu_caching,
>>> +                      p->r2_coh_mode);
>>> +    intel_buf_init_full(bops, dst_bo, &dst, width, height, bpp, 0,
>>> +                I915_TILING_NONE, I915_COMPRESSION_NONE, size,
>>> +                stride, p->r2, p->r2_pat_index);
>>> +
>>> +    src_map = xe_bo_map(fd, src_bo, size);
>>> +    dst_map = xe_bo_map(fd, dst_bo, size);
>>> +
>>> +    /* Ensure we always see zeroes for the initial KMD zeroing */
>>> +    render_copy(ibb,
>>> +            &src,
>>> +            0, 0, width, height,
>>> +            &dst,
>>> +            0, 0);
>>> +    intel_bb_sync(ibb);
>>> +
>>> +    if (!do_slow_check && p->r2 != system_memory(fd)) {
>>> +        int dwords_page = PAGE_SIZE / sizeof(uint32_t);
>>> +        int dword = rand() % dwords_page;
>>> +
>>> +        igt_debug("random dword: %d\n", dword);
>>> +
>>> +        for (i = dword; i < size / sizeof(uint32_t); i += dwords_page)
>>> +            igt_assert_eq(dst_map[i], 0);
>>> +    } else {
>>> +        for (i = 0; i < size / sizeof(uint32_t); i++)
>>> +            igt_assert_eq(dst_map[i], 0);
>>> +    }
>>> +
>>> +    /* Write some values from the CPU, potentially dirtying the CPU 
>>> cache */
>>> +    for (i = 0; i < size / sizeof(uint32_t); i++)
>>> +        src_map[i] = i;
>>> +
>>> +    /* And finally ensure we always see the CPU written values */
>>> +    render_copy(ibb,
>>> +            &src,
>>> +            0, 0, width, height,
>>> +            &dst,
>>> +            0, 0);
>>> +    intel_bb_sync(ibb);
>>> +
>>> +    if (!do_slow_check && p->r2 != system_memory(fd)) {
>>> +        int dwords_page = PAGE_SIZE / sizeof(uint32_t);
>>> +        int dword = rand() % dwords_page;
>>> +
>>> +        igt_debug("random dword: %d\n", dword);
>>> +
>>> +        for (i = dword; i < size / sizeof(uint32_t); i += dwords_page)
>>> +            igt_assert_eq(dst_map[i], i);
>>> +    } else {
>>> +        for (i = 0; i < size / sizeof(uint32_t); i++)
>>> +            igt_assert_eq(dst_map[i], i);
>>> +    }
>>> +
>>> +    munmap(src_map, size);
>>> +    munmap(dst_map, size);
>>> +
>>> +    intel_bb_destroy(ibb);
>>> +
>>> +    gem_close(fd, src_bo);
>>> +    gem_close(fd, dst_bo);
>>> +}
>>> +
>>> +static uint8_t get_pat_idx_uc(int fd, bool *compressed)
>>> +{
>>> +    if (compressed)
>>> +        *compressed = false;
>>> +
>>> +    return intel_get_pat_idx_uc(fd);
>>> +}
>>> +
>>> +static uint8_t get_pat_idx_wt(int fd, bool *compressed)
>>> +{
>>> +    uint16_t dev_id = intel_get_drm_devid(fd);
>>> +
>>> +    if (compressed)
>>> +        *compressed = intel_get_device_info(dev_id)->graphics_ver == 
>>> 20;
>>> +
>>> +    return intel_get_pat_idx_wt(fd);
>>> +}
>>> +
>>> +static uint8_t get_pat_idx_wb(int fd, bool *compressed)
>>> +{
>>> +    if (compressed)
>>> +        *compressed = false;
>>> +
>>> +    return intel_get_pat_idx_wb(fd);
>>> +}
>>> +
>>> +struct pat_index_entry {
>>> +    uint8_t (*get_pat_index)(int fd, bool *compressed);
>>> +
>>> +    uint8_t pat_index;
>>> +    bool compressed;
>>> +
>>> +    const char *name;
>>> +    uint16_t coh_mode;
>>> +    bool force_cpu_wc;
>>> +};
>>> +
>>> +/*
>>> + * The common modes are available on all platforms supported by Xe and so should
>>> + * be commonly supported. There are many more possible pat_index modes, however
>>> + * most IGTs shouldn't really care about them so likely no need to add them to
>>> + * lib/intel_pat.c. We do try to test some of the non-common modes here.
>>> + */
>>> +const struct pat_index_entry common_pat_index_modes[] = {
>>> +    { get_pat_idx_uc, 0, 0, "uc", DRM_XE_GEM_COH_NONE                },
>>> +    { get_pat_idx_wt, 0, 0, "wt", DRM_XE_GEM_COH_NONE                },
>>> +    { get_pat_idx_wb, 0, 0, "wb", DRM_XE_GEM_COH_AT_LEAST_1WAY       },
>>> +    { get_pat_idx_wb, 0, 0, "wb-cpu-wc", 
>>> DRM_XE_GEM_COH_AT_LEAST_1WAY, true },
>>> +};
>>> +
>>> +const struct pat_index_entry xelp_pat_index_modes[] = {
>>> +    { NULL, 1, false, "wc", DRM_XE_GEM_COH_NONE },
>>> +};
>>> +
>>> +const struct pat_index_entry xehpc_pat_index_modes[] = {
>>> +    { NULL, 1, false, "wc",    DRM_XE_GEM_COH_NONE          },
>>> +    { NULL, 4, false, "c1-wt", DRM_XE_GEM_COH_NONE          },
>>> +    { NULL, 5, false, "c1-wb", DRM_XE_GEM_COH_AT_LEAST_1WAY },
>>> +    { NULL, 6, false, "c2-wt", DRM_XE_GEM_COH_NONE          },
>>> +    { NULL, 7, false, "c2-wb", DRM_XE_GEM_COH_AT_LEAST_1WAY },
>>> +};
>>> +
>>> +/* Too many, just pick some interesting ones */
>>> +const struct pat_index_entry xe2_pat_index_modes[] = {
>>> +    { NULL, 1, false, "1way", DRM_XE_GEM_COH_AT_LEAST_1WAY       },
>>> +    { NULL, 2, false, "2way", DRM_XE_GEM_COH_AT_LEAST_1WAY       },
>>> +    { NULL, 2, false, "2way-cpu-wc", DRM_XE_GEM_COH_AT_LEAST_1WAY, 
>>> true },
>>> +    { NULL, 3, true,  "uc-comp", DRM_XE_GEM_COH_NONE                },
>>> +    { NULL, 5, false, "uc-1way", DRM_XE_GEM_COH_AT_LEAST_1WAY       },
>>> +};
>>> +
>>> +/*
>>> + * Depending on 2M/1G GTT pages we might trigger different PTE layouts for the
>>> + * PAT bits, so make sure we test with and without huge-pages. Also ensure we
>>> + * have a mix of different pat_index modes for each PDE.
>>> + */
>>> +const struct xe_pat_size_mode size_modes[] =  {
>>> +    { 256,  256,  0,        "mixed-pde"  },
>>> +    { 1024, 1024, 1u << 21, "single-pde" },
>>> +};
>>
>> I am a bit confused with the naming here (mixed-pde/single-pde).
>> The first case creates BOs of size 256*256*4 = 256K, which means it will need
>> to update a few PTEs, possibly all under a single PDE. This tests the
>> pat_index setting of PTEs.
>> The second case creates BOs of size 1024*1024*4 = 4MB, which at 2MB alignment
>> will occupy 2 PDEs. This tests the pat_index setting of leaf PDEs.
>> Right?
> 
> Yup, the "mixed-pde" just means that the pde contains multiple different 
> mappings using different pat_index. The "single-pde" means that the 
> mapping will entirely consume each pde, hopefully with 2M GTT pages 
> given the alignment. And yes this is mostly to test bit7/bit12 with pat[2].
> 
> I will change this to rather use a 2M size, which is maybe less confusing.

Also just realised I forgot to include the xelpg tables. Will fix that also.
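
Roughly something along these lines (a sketch only; the pat_index values below
are placeholders, not the real xelpg ones):

	const struct pat_index_entry xelpg_pat_index_modes[] = {
		/* NOTE: placeholder indices, not the actual xelpg PAT table */
		{ NULL, 1, false, "wc", DRM_XE_GEM_COH_NONE          },
		{ NULL, 2, false, "wt", DRM_XE_GEM_COH_NONE          },
	};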

> 
>>
>> Other than that, the patch looks fine to me.
>> Reviewed-by: Niranjana Vishwanathapura 
>> <niranjana.vishwanathapura@intel.com>
> 
> Thanks.
> 
>>
>>> +
>>> +typedef void (*copy_fn)(struct xe_pat_param *p);
>>> +
>>> +const struct xe_pat_copy_mode {
>>> +    copy_fn fn;
>>> +    const char *name;
>>> +} copy_modes[] =  {
>>> +    {  pat_index_blt,    "blt"    },
>>> +    {  pat_index_render, "render" },
>>> +};
>>> +
>>> +/**
>>> + * SUBTEST: pat-index-common
>>> + * Test category: functionality test
>>> + * Description: Check the common pat_index modes.
>>> + */
>>> +
>>> +/**
>>> + * SUBTEST: pat-index-xelp
>>> + * Test category: functionality test
>>> + * Description: Check some of the xelp pat_index modes.
>>> + */
>>> +
>>> +/**
>>> + * SUBTEST: pat-index-xehpc
>>> + * Test category: functionality test
>>> + * Description: Check some of the xehpc pat_index modes.
>>> + */
>>> +
>>> +/**
>>> + * SUBTEST: pat-index-xe2
>>> + * Test category: functionality test
>>> + * Description: Check some of the xe2 pat_index modes.
>>> + */
>>> +
>>> +static void subtest_pat_index_modes_with_regions(int fd,
>>> +                         const struct pat_index_entry *modes_arr,
>>> +                         int n_modes)
>>> +{
>>> +    struct igt_collection *copy_set;
>>> +    struct igt_collection *pat_index_set;
>>> +    struct igt_collection *regions_set;
>>> +    struct igt_collection *sizes_set;
>>> +    struct igt_collection *copies;
>>> +    struct xe_pat_param p = {};
>>> +
>>> +    p.fd = fd;
>>> +
>>> +    copy_set = igt_collection_create(ARRAY_SIZE(copy_modes));
>>> +
>>> +    pat_index_set = igt_collection_create(n_modes);
>>> +
>>> +    regions_set = xe_get_memory_region_set(fd,
>>> +                           XE_MEM_REGION_CLASS_SYSMEM,
>>> +                           XE_MEM_REGION_CLASS_VRAM);
>>> +
>>> +    sizes_set = igt_collection_create(ARRAY_SIZE(size_modes));
>>> +
>>> +    for_each_variation_r(copies, 1, copy_set) {
>>> +        struct igt_collection *regions;
>>> +        struct xe_pat_copy_mode copy_mode;
>>> +
>>> +        copy_mode = copy_modes[igt_collection_get_value(copies, 0)];
>>> +
>>> +        for_each_variation_r(regions, 2, regions_set) {
>>> +            struct igt_collection *pat_modes;
>>> +            uint32_t r1, r2;
>>> +            char *reg_str;
>>> +
>>> +            r1 = igt_collection_get_value(regions, 0);
>>> +            r2 = igt_collection_get_value(regions, 1);
>>> +
>>> +            reg_str = xe_memregion_dynamic_subtest_name(fd, regions);
>>> +
>>> +            for_each_variation_r(pat_modes, 2, pat_index_set) {
>>> +                struct igt_collection *sizes;
>>> +                struct pat_index_entry r1_entry, r2_entry;
>>> +                int r1_idx, r2_idx;
>>> +
>>> +                r1_idx = igt_collection_get_value(pat_modes, 0);
>>> +                r2_idx = igt_collection_get_value(pat_modes, 1);
>>> +
>>> +                r1_entry = modes_arr[r1_idx];
>>> +                r2_entry = modes_arr[r2_idx];
>>> +
>>> +                if (r1_entry.get_pat_index)
>>> +                    p.r1_pat_index = r1_entry.get_pat_index(fd, NULL);
>>> +                else
>>> +                    p.r1_pat_index = r1_entry.pat_index;
>>> +
>>> +                if (r2_entry.get_pat_index)
>>> +                    p.r2_pat_index = r2_entry.get_pat_index(fd, 
>>> &p.r2_compressed);
>>> +                else {
>>> +                    p.r2_pat_index = r2_entry.pat_index;
>>> +                    p.r2_compressed = r2_entry.compressed;
>>> +                }
>>> +
>>> +                p.r1_coh_mode = r1_entry.coh_mode;
>>> +                p.r2_coh_mode = r2_entry.coh_mode;
>>> +
>>> +                p.r1_force_cpu_wc = r1_entry.force_cpu_wc;
>>> +                p.r2_force_cpu_wc = r2_entry.force_cpu_wc;
>>> +
>>> +                p.r1 = r1;
>>> +                p.r2 = r2;
>>> +
>>> +                for_each_variation_r(sizes, 1, sizes_set) {
>>> +                    int size_mode_idx = 
>>> igt_collection_get_value(sizes, 0);
>>> +
>>> +                    p.size = &size_modes[size_mode_idx];
>>> +
>>> +                    igt_debug("[r1]: r: %u, idx: %u, coh: %u, wc: 
>>> %d\n",
>>> +                          p.r1, p.r1_pat_index, p.r1_coh_mode, 
>>> p.r1_force_cpu_wc);
>>> +                    igt_debug("[r2]: r: %u, idx: %u, coh: %u, wc: 
>>> %d, comp: %d, w: %u, h: %u, a: %u\n",
>>> +                          p.r2, p.r2_pat_index, p.r2_coh_mode,
>>> +                          p.r2_force_cpu_wc, p.r2_compressed,
>>> +                          p.size->width, p.size->height,
>>> +                          p.size->alignment);
>>> +
>>> +                    igt_dynamic_f("%s-%s-%s-%s-%s",
>>> +                              copy_mode.name,
>>> +                              reg_str, r1_entry.name,
>>> +                              r2_entry.name, p.size->name)
>>> +                        copy_mode.fn(&p);
>>> +                }
>>> +            }
>>> +
>>> +            free(reg_str);
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +igt_main
>>> +{
>>> +    uint16_t dev_id;
>>> +    int fd;
>>> +
>>> +    igt_fixture {
>>> +        uint32_t seed;
>>> +
>>> +        fd = drm_open_driver(DRIVER_XE);
>>> +        dev_id = intel_get_drm_devid(fd);
>>> +
>>> +        seed = time(NULL);
>>> +        srand(seed);
>>> +        igt_debug("seed: %d\n", seed);
>>> +
>>> +        xe_device_get(fd);
>>> +    }
>>> +
>>> +    igt_subtest("pat-index-all")
>>> +        pat_index_all(fd);
>>> +
>>> +    igt_subtest("userptr-coh-none")
>>> +        userptr_coh_none(fd);
>>> +
>>> +    igt_subtest_with_dynamic("pat-index-common") {
>>> +        subtest_pat_index_modes_with_regions(fd, 
>>> common_pat_index_modes,
>>> +                             ARRAY_SIZE(common_pat_index_modes));
>>> +    }
>>> +
>>> +    igt_subtest_with_dynamic("pat-index-xelp") {
>>> +        igt_require(intel_graphics_ver(dev_id) <= IP_VER(12, 55));
>>> +        subtest_pat_index_modes_with_regions(fd, xelp_pat_index_modes,
>>> +                             ARRAY_SIZE(xelp_pat_index_modes));
>>> +    }
>>> +
>>> +    igt_subtest_with_dynamic("pat-index-xehpc") {
>>> +        igt_require(IS_PONTEVECCHIO(dev_id));
>>> +        subtest_pat_index_modes_with_regions(fd, xehpc_pat_index_modes,
>>> +                             ARRAY_SIZE(xehpc_pat_index_modes));
>>> +    }
>>> +
>>> +    igt_subtest_with_dynamic("pat-index-xe2") {
>>> +        igt_require(intel_get_device_info(dev_id)->graphics_ver >= 20);
>>> +        subtest_pat_index_modes_with_regions(fd, xe2_pat_index_modes,
>>> +                             ARRAY_SIZE(xe2_pat_index_modes));
>>> +    }
>>> +
>>> +    igt_fixture
>>> +        drm_close_driver(fd);
>>> +}
>>> diff --git a/tests/meson.build b/tests/meson.build
>>> index 5afcd8cbb..3aecfbee0 100644
>>> --- a/tests/meson.build
>>> +++ b/tests/meson.build
>>> @@ -297,6 +297,7 @@ intel_xe_progs = [
>>>     'xe_mmap',
>>>     'xe_module_load',
>>>     'xe_noexec_ping_pong',
>>> +    'xe_pat',
>>>     'xe_pm',
>>>     'xe_pm_residency',
>>>     'xe_prime_self_import',
>>> -- 
>>> 2.41.0
>>>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [igt-dev] [PATCH i-g-t v4 14/15] tests/xe: add some vm_bind pat_index tests
  2023-10-20  8:21     ` Matthew Auld
  2023-10-20  8:42       ` Matthew Auld
@ 2023-10-20 17:24       ` Niranjana Vishwanathapura
  1 sibling, 0 replies; 28+ messages in thread
From: Niranjana Vishwanathapura @ 2023-10-20 17:24 UTC (permalink / raw)
  To: Matthew Auld; +Cc: igt-dev, Nitish Kumar

On Fri, Oct 20, 2023 at 09:21:13AM +0100, Matthew Auld wrote:
>On 20/10/2023 06:27, Niranjana Vishwanathapura wrote:
>>On Thu, Oct 19, 2023 at 03:41:05PM +0100, Matthew Auld wrote:
>>>Add some basic tests for pat_index and vm_bind.
>>>
>>>v2: Make sure to actually use srand() with the chosen seed
>>> - Make it work on xe2; the wt mode now has compression.
>>> - Also test some xe2+ specific pat_index modes.
>>>v3: Fix decompress step.
>>>v4: (Niranjana)
>>> - Various improvements, including testing more pat_index modes, like
>>>   wc where possible.
>>> - Document the idea behind "common" modes.
>>>
>>>Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>>Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>>Cc: José Roberto de Souza <jose.souza@intel.com>
>>>Cc: Pallavi Mishra <pallavi.mishra@intel.com>
>>>Cc: Nitish Kumar <nitish.kumar@intel.com>
>>>---
>>>tests/intel/xe_pat.c | 754 +++++++++++++++++++++++++++++++++++++++++++
>>>tests/meson.build    |   1 +
>>>2 files changed, 755 insertions(+)
>>>create mode 100644 tests/intel/xe_pat.c
>>>
>>>diff --git a/tests/intel/xe_pat.c b/tests/intel/xe_pat.c
>>>new file mode 100644
>>>index 000000000..1e74014b8
>>>--- /dev/null
>>>+++ b/tests/intel/xe_pat.c
>>>@@ -0,0 +1,754 @@
>>>+// SPDX-License-Identifier: MIT
>>>+/*
>>>+ * Copyright © 2023 Intel Corporation
>>>+ */
>>>+
>>>+/**
>>>+ * TEST: Test for selecting per-VMA pat_index
>>>+ * Category: Software building block
>>>+ * Sub-category: VMA
>>>+ * Functionality: pat_index
>>>+ */
>>>+
>>>+#include "igt.h"
>>>+#include "intel_blt.h"
>>>+#include "intel_mocs.h"
>>>+#include "intel_pat.h"
>>>+
>>>+#include "xe/xe_ioctl.h"
>>>+#include "xe/xe_query.h"
>>>+#include "xe/xe_util.h"
>>>+
>>>+#define PAGE_SIZE 4096
>>>+
>>>+static bool do_slow_check;
>>>+
>>>+/**
>>>+ * SUBTEST: userptr-coh-none
>>>+ * Test category: functionality test
>>>+ * Description: Test non-coherent pat_index on userptr
>>>+ */
>>>+static void userptr_coh_none(int fd)
>>>+{
>>>+    size_t size = xe_get_default_alignment(fd);
>>>+    uint32_t vm;
>>>+    void *data;
>>>+
>>>+    data = mmap(0, size, PROT_READ |
>>>+            PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
>>>+    igt_assert(data != MAP_FAILED);
>>>+
>>>+    vm = xe_vm_create(fd, 0, 0);
>>>+
>>>+    /*
>>>+     * Try some valid combinations first just to make sure we're 
>>>not being
>>>+     * swindled.
>>>+     */
>>>+    igt_assert_eq(__xe_vm_bind(fd, vm, 0, 0, 
>>>to_user_pointer(data), 0x40000,
>>>+                   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0,
>>>+                   DEFAULT_PAT_INDEX, 0),
>>>+              0);
>>>+    xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>>>+    igt_assert_eq(__xe_vm_bind(fd, vm, 0, 0, 
>>>to_user_pointer(data), 0x40000,
>>>+                   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0,
>>>+                   intel_get_pat_idx_wb(fd), 0),
>>>+              0);
>>>+    xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>>>+
>>>+    /* And then some known COH_NONE pat_index combos which should 
>>>fail. */
>>>+    igt_assert_eq(__xe_vm_bind(fd, vm, 0, 0, 
>>>to_user_pointer(data), 0x40000,
>>>+                   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0,
>>>+                   intel_get_pat_idx_uc(fd), 0),
>>>+              -EINVAL);
>>>+    igt_assert_eq(__xe_vm_bind(fd, vm, 0, 0, 
>>>to_user_pointer(data), 0x40000,
>>>+                   size, XE_VM_BIND_OP_MAP_USERPTR, 0, NULL, 0, 0,
>>>+                   intel_get_pat_idx_wt(fd), 0),
>>>+              -EINVAL);
>>>+
>>>+    munmap(data, size);
>>>+    xe_vm_destroy(fd, vm);
>>>+}
>>>+
>>>+/**
>>>+ * SUBTEST: pat-index-all
>>>+ * Test category: functionality test
>>>+ * Description: Test every pat_index
>>>+ */
>>>+static void pat_index_all(int fd)
>>>+{
>>>+    uint16_t dev_id = intel_get_drm_devid(fd);
>>>+    size_t size = xe_get_default_alignment(fd);
>>>+    uint32_t vm, bo;
>>>+    uint8_t pat_index;
>>>+
>>>+    vm = xe_vm_create(fd, 0, 0);
>>>+
>>>+    bo = xe_bo_create_caching(fd, 0, size, all_memory_regions(fd),
>>>+                  DRM_XE_GEM_CPU_CACHING_WC,
>>>+                  DRM_XE_GEM_COH_NONE);
>>>+
>>>+    igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>>>+                   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>>>+                   intel_get_pat_idx_uc(fd), 0),
>>>+              0);
>>>+    xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>>>+
>>>+    igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>>>+                   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>>>+                   intel_get_pat_idx_wt(fd), 0),
>>>+              0);
>>>+    xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>>>+
>>>+    igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>>>+                   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>>>+                   intel_get_pat_idx_wb(fd), 0),
>>>+              0);
>>>+    xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>>>+
>>>+    igt_assert(intel_get_max_pat_index(fd));
>>>+
>>>+    for (pat_index = 0; pat_index <= intel_get_max_pat_index(fd);
>>>+         pat_index++) {
>>>+        if (intel_get_device_info(dev_id)->graphics_ver == 20 &&
>>>+            pat_index >= 16 && pat_index <= 19) { /* hw reserved */
>>>+            igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>>>+                           size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>>>+                           pat_index, 0),
>>>+                      -EINVAL);
>>>+        } else {
>>>+            igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>>>+                           size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>>>+                           pat_index, 0),
>>>+                      0);
>>>+            xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>>>+        }
>>>+    }
>>>+
>>>+    igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>>>+                   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>>>+                   pat_index, 0),
>>>+              -EINVAL);
>>>+
>>>+    gem_close(fd, bo);
>>>+
>>>+    /* Must be at least as coherent as the gem_create coh_mode. */
>>>+    bo = xe_bo_create_caching(fd, 0, size, system_memory(fd),
>>>+                  DRM_XE_GEM_CPU_CACHING_WB,
>>>+                  DRM_XE_GEM_COH_AT_LEAST_1WAY);
>>>+
>>>+    igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>>>+                   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>>>+                   intel_get_pat_idx_uc(fd), 0),
>>>+              -EINVAL);
>>>+
>>>+    igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>>>+                   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>>>+                   intel_get_pat_idx_wt(fd), 0),
>>>+              -EINVAL);
>>>+
>>>+    igt_assert_eq(__xe_vm_bind(fd, vm, 0, bo, 0, 0x40000,
>>>+                   size, XE_VM_BIND_OP_MAP, 0, NULL, 0, 0,
>>>+                   intel_get_pat_idx_wb(fd), 0),
>>>+              0);
>>>+    xe_vm_unbind_sync(fd, vm, 0, 0x40000, size);
>>>+
>>>+    gem_close(fd, bo);
>>>+
>>>+    xe_vm_destroy(fd, vm);
>>>+}
>>>+
>>>+#define CLEAR_1 0xFFFFFFFF /* something compressible */
>>>+
>>>+static void xe2_blt_decompress_dst(int fd,
>>>+                   intel_ctx_t *ctx,
>>>+                   uint64_t ahnd,
>>>+                   struct blt_copy_data *blt,
>>>+                   uint32_t alias_handle,
>>>+                   uint32_t size)
>>>+{
>>>+    struct blt_copy_object tmp = {};
>>>+
>>>+    /*
>>>+     * Xe2 in-place decompression using an alias to the same physical
>>>+     * memory, but with the dst mapped using some uncompressed 
>>>pat_index.
>>>+     * This should allow checking the object pages via mmap.
>>>+     */
>>>+
>>>+    memcpy(&tmp, &blt->src, sizeof(blt->dst));
>>>+    memcpy(&blt->src, &blt->dst, sizeof(blt->dst));
>>>+    blt_set_object(&blt->dst, alias_handle, size, 0,
>>>+               intel_get_uc_mocs_index(fd),
>>>+               intel_get_pat_idx_uc(fd), /* compression disabled */
>>>+               T_LINEAR, 0, 0);
>>>+    blt_fast_copy(fd, ctx, NULL, ahnd, blt);
>>>+    memcpy(&blt->dst, &blt->src, sizeof(blt->dst));
>>>+    memcpy(&blt->src, &tmp, sizeof(blt->dst));
>>>+}
>>>+
>>>+struct xe_pat_size_mode {
>>>+    uint16_t width;
>>>+    uint16_t height;
>>>+    uint32_t alignment;
>>>+    const char *name;
>>>+};
>>>+
>>>+struct xe_pat_param {
>>>+    int fd;
>>>+
>>>+    const struct xe_pat_size_mode *size;
>>>+
>>>+    uint32_t r1;
>>>+    uint8_t  r1_pat_index;
>>>+    uint16_t r1_coh_mode;
>>>+    bool     r1_force_cpu_wc;
>>>+
>>>+    uint32_t r2;
>>>+    uint8_t  r2_pat_index;
>>>+    uint16_t r2_coh_mode;
>>>+    bool     r2_force_cpu_wc;
>>>+    bool     r2_compressed; /* xe2+ compression */
>>>+
>>>+};
>>>+
>>>+static void pat_index_blt(struct xe_pat_param *p)
>>>+{
>>>+    struct drm_xe_engine_class_instance inst = {
>>>+        .engine_class = DRM_XE_ENGINE_CLASS_COPY,
>>>+    };
>>>+    struct blt_copy_data blt = {};
>>>+    struct blt_copy_object src = {};
>>>+    struct blt_copy_object dst = {};
>>>+    uint32_t vm, exec_queue, src_bo, dst_bo, bb;
>>>+    uint32_t *src_map, *dst_map;
>>>+    uint16_t r1_cpu_caching, r2_cpu_caching;
>>>+    uint32_t r1_flags, r2_flags;
>>>+    intel_ctx_t *ctx;
>>>+    uint64_t ahnd;
>>>+    int width = p->size->width, height = p->size->height;
>>>+    int size, stride, bb_size;
>>>+    int bpp = 32;
>>>+    uint32_t alias, name;
>>>+    int fd = p->fd;
>>>+    int i;
>>>+
>>>+    igt_require(blt_has_fast_copy(fd));
>>>+
>>>+    vm = xe_vm_create(fd, DRM_XE_VM_CREATE_ASYNC_DEFAULT, 0);
>>>+    exec_queue = xe_exec_queue_create(fd, vm, &inst, 0);
>>>+    ctx = intel_ctx_xe(fd, vm, exec_queue, 0, 0, 0);
>>>+    ahnd = intel_allocator_open_full(fd, ctx->vm, 0, 0,
>>>+                     INTEL_ALLOCATOR_SIMPLE,
>>>+                     ALLOC_STRATEGY_LOW_TO_HIGH,
>>>+                     p->size->alignment);
>>>+
>>>+    bb_size = xe_get_default_alignment(fd);
>>>+    bb = xe_bo_create_flags(fd, 0, bb_size, system_memory(fd));
>>>+
>>>+    size = width * height * bpp / 8;
>>>+    stride = width * 4;
>>>+
>>>+    r1_flags = 0;
>>>+    if (p->r1 != system_memory(fd))
>>>+        r1_flags |= XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM;
>>>+
>>>+    if (p->r1_coh_mode == DRM_XE_GEM_COH_AT_LEAST_1WAY
>>>+        && p->r1 == system_memory(fd) && !p->r1_force_cpu_wc)
>>>+        r1_cpu_caching = DRM_XE_GEM_CPU_CACHING_WB;
>>>+    else
>>>+        r1_cpu_caching = DRM_XE_GEM_CPU_CACHING_WC;
>>>+
>>>+    r2_flags = 0;
>>>+    if (p->r2 != system_memory(fd))
>>>+        r2_flags |= XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM;
>>>+
>>>+    if (p->r2_coh_mode == DRM_XE_GEM_COH_AT_LEAST_1WAY &&
>>>+        p->r2 == system_memory(fd) && !p->r2_force_cpu_wc)
>>>+        r2_cpu_caching = DRM_XE_GEM_CPU_CACHING_WB;
>>>+    else
>>>+        r2_cpu_caching = DRM_XE_GEM_CPU_CACHING_WC;
>>>+
>>>+
>>>+    src_bo = xe_bo_create_caching(fd, 0, size, p->r1 | r1_flags, 
>>>r1_cpu_caching,
>>>+                      p->r1_coh_mode);
>>>+    dst_bo = xe_bo_create_caching(fd, 0, size, p->r2 | r2_flags, 
>>>r2_cpu_caching,
>>>+                      p->r2_coh_mode);
>>>+    if (p->r2_compressed) {
>>>+        name = gem_flink(fd, dst_bo);
>>>+        alias = gem_open(fd, name);
>>>+    }
>>>+
>>>+    blt_copy_init(fd, &blt);
>>>+    blt.color_depth = CD_32bit;
>>>+
>>>+    blt_set_object(&src, src_bo, size, p->r1, 
>>>intel_get_uc_mocs_index(fd),
>>>+               p->r1_pat_index, T_LINEAR,
>>>+               COMPRESSION_DISABLED, COMPRESSION_TYPE_3D);
>>>+    blt_set_geom(&src, stride, 0, 0, width, height, 0, 0);
>>>+
>>>+    blt_set_object(&dst, dst_bo, size, p->r2, 
>>>intel_get_uc_mocs_index(fd),
>>>+               p->r2_pat_index, T_LINEAR,
>>>+               COMPRESSION_DISABLED, COMPRESSION_TYPE_3D);
>>>+    blt_set_geom(&dst, stride, 0, 0, width, height, 0, 0);
>>>+
>>>+    blt_set_copy_object(&blt.src, &src);
>>>+    blt_set_copy_object(&blt.dst, &dst);
>>>+    blt_set_batch(&blt.bb, bb, bb_size, system_memory(fd));
>>>+
>>>+    src_map = xe_bo_map(fd, src_bo, size);
>>>+    dst_map = xe_bo_map(fd, dst_bo, size);
>>>+
>>>+    /* Ensure we always see zeroes for the initial KMD zeroing */
>>>+    blt_fast_copy(fd, ctx, NULL, ahnd, &blt);
>>>+    if (p->r2_compressed)
>>>+        xe2_blt_decompress_dst(fd, ctx, ahnd, &blt, alias, size);
>>>+
>>>+    /*
>>>+     * Only sample random dword in every page if we are doing 
>>>slow uncached
>>>+     * reads from VRAM.
>>>+     */
>>>+    if (!do_slow_check && p->r2 != system_memory(fd)) {
>>>+        int dwords_page = PAGE_SIZE / sizeof(uint32_t);
>>>+        int dword = rand() % dwords_page;
>>>+
>>>+        igt_debug("random dword: %d\n", dword);
>>>+
>>>+        for (i = dword; i < size / sizeof(uint32_t); i += dwords_page)
>>>+            igt_assert_eq(dst_map[i], 0);
>>>+
>>>+    } else {
>>>+        for (i = 0; i < size / sizeof(uint32_t); i++)
>>>+            igt_assert_eq(dst_map[i], 0);
>>>+    }
>>>+
>>>+    /* Write some values from the CPU, potentially dirtying the 
>>>CPU cache */
>>>+    for (i = 0; i < size / sizeof(uint32_t); i++) {
>>>+        if (p->r2_compressed)
>>>+            src_map[i] = CLEAR_1;
>>>+        else
>>>+            src_map[i] = i;
>>>+    }
>>>+
>>>+    /* And finally ensure we always see the CPU written values */
>>>+    blt_fast_copy(fd, ctx, NULL, ahnd, &blt);
>>>+    if (p->r2_compressed)
>>>+        xe2_blt_decompress_dst(fd, ctx, ahnd, &blt, alias, size);
>>>+
>>>+    if (!do_slow_check && p->r2 != system_memory(fd)) {
>>>+        int dwords_page = PAGE_SIZE / sizeof(uint32_t);
>>>+        int dword = rand() % dwords_page;
>>>+
>>>+        igt_debug("random dword: %d\n", dword);
>>>+
>>>+        for (i = dword; i < size / sizeof(uint32_t); i += dwords_page) {
>>>+            if (p->r2_compressed)
>>>+                igt_assert_eq(dst_map[i], CLEAR_1);
>>>+            else
>>>+                igt_assert_eq(dst_map[i], i);
>>>+        }
>>>+
>>>+    } else {
>>>+        for (i = 0; i < size / sizeof(uint32_t); i++) {
>>>+            if (p->r2_compressed)
>>>+                igt_assert_eq(dst_map[i], CLEAR_1);
>>>+            else
>>>+                igt_assert_eq(dst_map[i], i);
>>>+        }
>>>+    }
>>>+
>>>+    munmap(src_map, size);
>>>+    munmap(dst_map, size);
>>>+
>>>+    gem_close(fd, src_bo);
>>>+    gem_close(fd, dst_bo);
>>>+    gem_close(fd, bb);
>>>+
>>>+    xe_exec_queue_destroy(fd, exec_queue);
>>>+    xe_vm_destroy(fd, vm);
>>>+
>>>+    put_ahnd(ahnd);
>>>+    intel_ctx_destroy(fd, ctx);
>>>+}
>>>+
>>>+static void pat_index_render(struct xe_pat_param *p)
>>>+{
>>>+    int fd = p->fd;
>>>+    uint32_t devid = intel_get_drm_devid(fd);
>>>+    igt_render_copyfunc_t render_copy = NULL;
>>>+    int size, stride, width = p->size->width, height = p->size->height;
>>>+    struct intel_buf src, dst;
>>>+    struct intel_bb *ibb;
>>>+    struct buf_ops *bops;
>>>+    uint16_t r1_cpu_caching, r2_cpu_caching;
>>>+    uint32_t r1_flags, r2_flags;
>>>+    uint32_t src_bo, dst_bo;
>>>+    uint32_t *src_map, *dst_map;
>>>+    int bpp = 32;
>>>+    int i;
>>>+
>>>+    bops = buf_ops_create(fd);
>>>+
>>>+    render_copy = igt_get_render_copyfunc(devid);
>>>+    igt_require(render_copy);
>>>+    igt_require(!p->r2_compressed); /* XXX */
>>>+    igt_require(xe_has_engine_class(fd, DRM_XE_ENGINE_CLASS_RENDER));
>>>+
>>>+    ibb = intel_bb_create_full(fd, 0, 0, NULL, 
>>>xe_get_default_alignment(fd),
>>>+                   0, 0, p->size->alignment,
>>>+                   INTEL_ALLOCATOR_SIMPLE,
>>>+                   ALLOC_STRATEGY_HIGH_TO_LOW);
>>>+
>>>+    if (p->r1_coh_mode == DRM_XE_GEM_COH_AT_LEAST_1WAY
>>>+        && p->r1 == system_memory(fd) && !p->r1_force_cpu_wc)
>>>+        r1_cpu_caching = DRM_XE_GEM_CPU_CACHING_WB;
>>>+    else
>>>+        r1_cpu_caching = DRM_XE_GEM_CPU_CACHING_WC;
>>>+
>>>+    if (p->r2_coh_mode == DRM_XE_GEM_COH_AT_LEAST_1WAY &&
>>>+        p->r2 == system_memory(fd) && !p->r2_force_cpu_wc)
>>>+        r2_cpu_caching = DRM_XE_GEM_CPU_CACHING_WB;
>>>+    else
>>>+        r2_cpu_caching = DRM_XE_GEM_CPU_CACHING_WC;
>>>+
>>>+    size = width * height * bpp / 8;
>>>+    stride = width * 4;
>>>+
>>>+    r1_flags = 0;
>>>+    if (p->r1 != system_memory(fd))
>>>+        r1_flags |= XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM;
>>>+
>>>+    src_bo = xe_bo_create_caching(fd, 0, size, p->r1 | r1_flags, 
>>>r1_cpu_caching,
>>>+                      p->r1_coh_mode);
>>>+    intel_buf_init_full(bops, src_bo, &src, width, height, bpp, 0,
>>>+                I915_TILING_NONE, I915_COMPRESSION_NONE, size,
>>>+                stride, p->r1, p->r1_pat_index);
>>>+
>>>+    r2_flags = 0;
>>>+    if (p->r2 != system_memory(fd))
>>>+        r2_flags |= XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM;
>>>+
>>>+    dst_bo = xe_bo_create_caching(fd, 0, size, p->r2 | r2_flags, 
>>>r2_cpu_caching,
>>>+                      p->r2_coh_mode);
>>>+    intel_buf_init_full(bops, dst_bo, &dst, width, height, bpp, 0,
>>>+                I915_TILING_NONE, I915_COMPRESSION_NONE, size,
>>>+                stride, p->r2, p->r2_pat_index);
>>>+
>>>+    src_map = xe_bo_map(fd, src_bo, size);
>>>+    dst_map = xe_bo_map(fd, dst_bo, size);
>>>+
>>>+    /* Ensure we always see zeroes for the initial KMD zeroing */
>>>+    render_copy(ibb,
>>>+            &src,
>>>+            0, 0, width, height,
>>>+            &dst,
>>>+            0, 0);
>>>+    intel_bb_sync(ibb);
>>>+
>>>+    if (!do_slow_check && p->r2 != system_memory(fd)) {
>>>+        int dwords_page = PAGE_SIZE / sizeof(uint32_t);
>>>+        int dword = rand() % dwords_page;
>>>+
>>>+        igt_debug("random dword: %d\n", dword);
>>>+
>>>+        for (i = dword; i < size / sizeof(uint32_t); i += dwords_page)
>>>+            igt_assert_eq(dst_map[i], 0);
>>>+    } else {
>>>+        for (i = 0; i < size / sizeof(uint32_t); i++)
>>>+            igt_assert_eq(dst_map[i], 0);
>>>+    }
>>>+
>>>+    /* Write some values from the CPU, potentially dirtying the 
>>>CPU cache */
>>>+    for (i = 0; i < size / sizeof(uint32_t); i++)
>>>+        src_map[i] = i;
>>>+
>>>+    /* And finally ensure we always see the CPU written values */
>>>+    render_copy(ibb,
>>>+            &src,
>>>+            0, 0, width, height,
>>>+            &dst,
>>>+            0, 0);
>>>+    intel_bb_sync(ibb);
>>>+
>>>+    if (!do_slow_check && p->r2 != system_memory(fd)) {
>>>+        int dwords_page = PAGE_SIZE / sizeof(uint32_t);
>>>+        int dword = rand() % dwords_page;
>>>+
>>>+        igt_debug("random dword: %d\n", dword);
>>>+
>>>+        for (i = dword; i < size / sizeof(uint32_t); i += dwords_page)
>>>+            igt_assert_eq(dst_map[i], i);
>>>+    } else {
>>>+        for (i = 0; i < size / sizeof(uint32_t); i++)
>>>+            igt_assert_eq(dst_map[i], i);
>>>+    }
>>>+
>>>+    munmap(src_map, size);
>>>+    munmap(dst_map, size);
>>>+
>>>+    intel_bb_destroy(ibb);
>>>+
>>>+    gem_close(fd, src_bo);
>>>+    gem_close(fd, dst_bo);
>>>+}
>>>+
>>>+static uint8_t get_pat_idx_uc(int fd, bool *compressed)
>>>+{
>>>+    if (compressed)
>>>+        *compressed = false;
>>>+
>>>+    return intel_get_pat_idx_uc(fd);
>>>+}
>>>+
>>>+static uint8_t get_pat_idx_wt(int fd, bool *compressed)
>>>+{
>>>+    uint16_t dev_id = intel_get_drm_devid(fd);
>>>+
>>>+    if (compressed)
>>>+        *compressed = intel_get_device_info(dev_id)->graphics_ver == 20;
>>>+
>>>+    return intel_get_pat_idx_wt(fd);
>>>+}
>>>+
>>>+static uint8_t get_pat_idx_wb(int fd, bool *compressed)
>>>+{
>>>+    if (compressed)
>>>+        *compressed = false;
>>>+
>>>+    return intel_get_pat_idx_wb(fd);
>>>+}
>>>+
>>>+struct pat_index_entry {
>>>+    uint8_t (*get_pat_index)(int fd, bool *compressed);
>>>+
>>>+    uint8_t pat_index;
>>>+    bool compressed;
>>>+
>>>+    const char *name;
>>>+    uint16_t coh_mode;
>>>+    bool force_cpu_wc;
>>>+};
>>>+
>>>+/*
>>>+ * The common modes are available on all platforms supported by Xe and so should
>>>+ * be commonly supported. There are many more possible pat_index modes, however
>>>+ * most IGTs shouldn't really care about them so likely no need to add them to
>>>+ * lib/intel_pat.c. We do try to test some of the non-common modes here.
>>>+ */
>>>+const struct pat_index_entry common_pat_index_modes[] = {
>>>+    { get_pat_idx_uc, 0, 0, "uc",        DRM_XE_GEM_COH_NONE                },
>>>+    { get_pat_idx_wt, 0, 0, "wt",        DRM_XE_GEM_COH_NONE                },
>>>+    { get_pat_idx_wb, 0, 0, "wb",        DRM_XE_GEM_COH_AT_LEAST_1WAY       },
>>>+    { get_pat_idx_wb, 0, 0, "wb-cpu-wc", DRM_XE_GEM_COH_AT_LEAST_1WAY, true },
>>>+};
>>>+
>>>+const struct pat_index_entry xelp_pat_index_modes[] = {
>>>+    { NULL, 1, false, "wc", DRM_XE_GEM_COH_NONE },
>>>+};
>>>+
>>>+const struct pat_index_entry xehpc_pat_index_modes[] = {
>>>+    { NULL, 1, false, "wc",    DRM_XE_GEM_COH_NONE          },
>>>+    { NULL, 4, false, "c1-wt", DRM_XE_GEM_COH_NONE          },
>>>+    { NULL, 5, false, "c1-wb", DRM_XE_GEM_COH_AT_LEAST_1WAY },
>>>+    { NULL, 6, false, "c2-wt", DRM_XE_GEM_COH_NONE          },
>>>+    { NULL, 7, false, "c2-wb", DRM_XE_GEM_COH_AT_LEAST_1WAY },
>>>+};
>>>+
>>>+/* Too many, just pick some interesting ones */
>>>+const struct pat_index_entry xe2_pat_index_modes[] = {
>>>+    { NULL, 1, false, "1way",        DRM_XE_GEM_COH_AT_LEAST_1WAY       },
>>>+    { NULL, 2, false, "2way",        DRM_XE_GEM_COH_AT_LEAST_1WAY       },
>>>+    { NULL, 2, false, "2way-cpu-wc", DRM_XE_GEM_COH_AT_LEAST_1WAY, true },
>>>+    { NULL, 3, true,  "uc-comp",     DRM_XE_GEM_COH_NONE                },
>>>+    { NULL, 5, false, "uc-1way",     DRM_XE_GEM_COH_AT_LEAST_1WAY       },
>>>+};
>>>+
>>>+/*
>>>+ * Depending on 2M/1G GTT pages we might trigger different PTE layouts for the
>>>+ * PAT bits, so make sure we test with and without huge-pages. Also ensure we
>>>+ * have a mix of different pat_index modes for each PDE.
>>>+ */
>>>+const struct xe_pat_size_mode size_modes[] =  {
>>>+    { 256,  256,  0,        "mixed-pde"  },
>>>+    { 1024, 1024, 1u << 21, "single-pde" },
>>>+};
>>
>>I am a bit confused with the naming here (mixed-pde/single-pde).
>>The first case here creates BOs of size 256*256*8/2 = 256K, which means it
>>only needs to update a few PTEs, possibly all under a single PDE. This tests
>>the pat_index setting of PTEs.
>>The second case here creates BOs of size 1024*1024*8/2 = 4MB, which at 2MB
>>offset will occupy 2 PDEs. This tests the pat_index setting of leaf PDEs.
>>Right?
>
>Yup, the "mixed-pde" just means that the pde contains multiple 
>different mappings using different pat_index. The "single-pde" means 
>that the mapping will entirely consume each pde, hopefully with 2M GTT 
>pages given the alignment. And yes this is mostly to test bit7/bit12 
>with pat[2].
>

But the "mixed-pde" case will have multiple PTE entries but all with the same
pat_index, right? i.e., either r1_pat_index or r2_pat_index. I didn't get the
"mixed" meaning here. My understanding is that the tests are "multi-pte" and
"single-pde".

Niranjana
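
As an aside, here is a rough standalone sketch of the size arithmetic being
discussed, to make the PTE/PDE coverage explicit. Assumptions not taken from
the patch itself: 32bpp buffers (matching the 256K/4M figures above), 4K
pages per PTE and 2M covered per leaf GTT PDE; the little program is purely
illustrative:

#include <stdio.h>

int main(void)
{
	const unsigned long long pte_sz = 4ull << 10;	/* assumed: one PTE maps a 4K page */
	const unsigned long long pde_sz = 2ull << 20;	/* assumed: one leaf PDE covers 2M */
	const unsigned long long bpp = 32;		/* assumed pixel size */
	const struct { unsigned long long w, h; const char *name; } modes[] = {
		{ 256,  256,  "mixed-pde (256x256)"    },
		{ 1024, 1024, "single-pde (1024x1024)" },
	};

	for (int i = 0; i < 2; i++) {
		unsigned long long size = modes[i].w * modes[i].h * bpp / 8;

		/* 256x256 -> 256 KiB = 64 PTEs, 0 full PDEs; 1024x1024 -> 4 MiB = 1024 PTEs, 2 full PDEs */
		printf("%s: %llu KiB -> %llu PTEs, %llu full 2M PDE(s)\n",
		       modes[i].name, size >> 10, size / pte_sz, size / pde_sz);
	}

	return 0;
}

So the 256x256 case only touches a handful of PTEs, potentially all under a
single PDE, while the 1024x1024 case with the 2M alignment can consume whole
leaf PDEs.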

>I will change this to rather use 2M size, which is maybe less confusing.
>
>>
>>Other than that, the patch looks fine to me.
>>Reviewed-by: Niranjana Vishwanathapura 
>><niranjana.vishwanathapura@intel.com>
>
>Thanks.
>
>>
>>>+
>>>+typedef void (*copy_fn)(struct xe_pat_param *p);
>>>+
>>>+const struct xe_pat_copy_mode {
>>>+    copy_fn fn;
>>>+    const char *name;
>>>+} copy_modes[] =  {
>>>+    {  pat_index_blt,    "blt"    },
>>>+    {  pat_index_render, "render" },
>>>+};
>>>+
>>>+/**
>>>+ * SUBTEST: pat-index-common
>>>+ * Test category: functionality test
>>>+ * Description: Check the common pat_index modes.
>>>+ */
>>>+
>>>+/**
>>>+ * SUBTEST: pat-index-xelp
>>>+ * Test category: functionality test
>>>+ * Description: Check some of the xelp pat_index modes.
>>>+ */
>>>+
>>>+/**
>>>+ * SUBTEST: pat-index-xehpc
>>>+ * Test category: functionality test
>>>+ * Description: Check some of the xehpc pat_index modes.
>>>+ */
>>>+
>>>+/**
>>>+ * SUBTEST: pat-index-xe2
>>>+ * Test category: functionality test
>>>+ * Description: Check some of the xe2 pat_index modes.
>>>+ */
>>>+
>>>+static void subtest_pat_index_modes_with_regions(int fd,
>>>+                         const struct pat_index_entry *modes_arr,
>>>+                         int n_modes)
>>>+{
>>>+    struct igt_collection *copy_set;
>>>+    struct igt_collection *pat_index_set;
>>>+    struct igt_collection *regions_set;
>>>+    struct igt_collection *sizes_set;
>>>+    struct igt_collection *copies;
>>>+    struct xe_pat_param p = {};
>>>+
>>>+    p.fd = fd;
>>>+
>>>+    copy_set = igt_collection_create(ARRAY_SIZE(copy_modes));
>>>+
>>>+    pat_index_set = igt_collection_create(n_modes);
>>>+
>>>+    regions_set = xe_get_memory_region_set(fd,
>>>+                           XE_MEM_REGION_CLASS_SYSMEM,
>>>+                           XE_MEM_REGION_CLASS_VRAM);
>>>+
>>>+    sizes_set = igt_collection_create(ARRAY_SIZE(size_modes));
>>>+
>>>+    for_each_variation_r(copies, 1, copy_set) {
>>>+        struct igt_collection *regions;
>>>+        struct xe_pat_copy_mode copy_mode;
>>>+
>>>+        copy_mode = copy_modes[igt_collection_get_value(copies, 0)];
>>>+
>>>+        for_each_variation_r(regions, 2, regions_set) {
>>>+            struct igt_collection *pat_modes;
>>>+            uint32_t r1, r2;
>>>+            char *reg_str;
>>>+
>>>+            r1 = igt_collection_get_value(regions, 0);
>>>+            r2 = igt_collection_get_value(regions, 1);
>>>+
>>>+            reg_str = xe_memregion_dynamic_subtest_name(fd, regions);
>>>+
>>>+            for_each_variation_r(pat_modes, 2, pat_index_set) {
>>>+                struct igt_collection *sizes;
>>>+                struct pat_index_entry r1_entry, r2_entry;
>>>+                int r1_idx, r2_idx;
>>>+
>>>+                r1_idx = igt_collection_get_value(pat_modes, 0);
>>>+                r2_idx = igt_collection_get_value(pat_modes, 1);
>>>+
>>>+                r1_entry = modes_arr[r1_idx];
>>>+                r2_entry = modes_arr[r2_idx];
>>>+
>>>+                if (r1_entry.get_pat_index)
>>>+                    p.r1_pat_index = r1_entry.get_pat_index(fd, NULL);
>>>+                else
>>>+                    p.r1_pat_index = r1_entry.pat_index;
>>>+
>>>+                if (r2_entry.get_pat_index)
>>>+                    p.r2_pat_index = r2_entry.get_pat_index(fd, &p.r2_compressed);
>>>+                else {
>>>+                    p.r2_pat_index = r2_entry.pat_index;
>>>+                    p.r2_compressed = r2_entry.compressed;
>>>+                }
>>>+
>>>+                p.r1_coh_mode = r1_entry.coh_mode;
>>>+                p.r2_coh_mode = r2_entry.coh_mode;
>>>+
>>>+                p.r1_force_cpu_wc = r1_entry.force_cpu_wc;
>>>+                p.r2_force_cpu_wc = r2_entry.force_cpu_wc;
>>>+
>>>+                p.r1 = r1;
>>>+                p.r2 = r2;
>>>+
>>>+                for_each_variation_r(sizes, 1, sizes_set) {
>>>+                    int size_mode_idx = igt_collection_get_value(sizes, 0);
>>>+
>>>+                    p.size = &size_modes[size_mode_idx];
>>>+
>>>+                    igt_debug("[r1]: r: %u, idx: %u, coh: %u, wc: %d\n",
>>>+                          p.r1, p.r1_pat_index, p.r1_coh_mode, p.r1_force_cpu_wc);
>>>+                    igt_debug("[r2]: r: %u, idx: %u, coh: %u, wc: %d, comp: %d, w: %u, h: %u, a: %u\n",
>>>+                          p.r2, p.r2_pat_index, p.r2_coh_mode,
>>>+                          p.r2_force_cpu_wc, p.r2_compressed,
>>>+                          p.size->width, p.size->height,
>>>+                          p.size->alignment);
>>>+
>>>+                    igt_dynamic_f("%s-%s-%s-%s-%s",
>>>+                              copy_mode.name,
>>>+                              reg_str, r1_entry.name,
>>>+                              r2_entry.name, p.size->name)
>>>+                        copy_mode.fn(&p);
>>>+                }
>>>+            }
>>>+
>>>+            free(reg_str);
>>>+        }
>>>+    }
>>>+}
>>>+
>>>+igt_main
>>>+{
>>>+    uint16_t dev_id;
>>>+    int fd;
>>>+
>>>+    igt_fixture {
>>>+        uint32_t seed;
>>>+
>>>+        fd = drm_open_driver(DRIVER_XE);
>>>+        dev_id = intel_get_drm_devid(fd);
>>>+
>>>+        seed = time(NULL);
>>>+        srand(seed);
>>>+        igt_debug("seed: %u\n", seed);
>>>+
>>>+        xe_device_get(fd);
>>>+    }
>>>+
>>>+    igt_subtest("pat-index-all")
>>>+        pat_index_all(fd);
>>>+
>>>+    igt_subtest("userptr-coh-none")
>>>+        userptr_coh_none(fd);
>>>+
>>>+    igt_subtest_with_dynamic("pat-index-common") {
>>>+        subtest_pat_index_modes_with_regions(fd, common_pat_index_modes,
>>>+                             ARRAY_SIZE(common_pat_index_modes));
>>>+    }
>>>+
>>>+    igt_subtest_with_dynamic("pat-index-xelp") {
>>>+        igt_require(intel_graphics_ver(dev_id) <= IP_VER(12, 55));
>>>+        subtest_pat_index_modes_with_regions(fd, xelp_pat_index_modes,
>>>+                             ARRAY_SIZE(xelp_pat_index_modes));
>>>+    }
>>>+
>>>+    igt_subtest_with_dynamic("pat-index-xehpc") {
>>>+        igt_require(IS_PONTEVECCHIO(dev_id));
>>>+        subtest_pat_index_modes_with_regions(fd, xehpc_pat_index_modes,
>>>+                             ARRAY_SIZE(xehpc_pat_index_modes));
>>>+    }
>>>+
>>>+    igt_subtest_with_dynamic("pat-index-xe2") {
>>>+        igt_require(intel_get_device_info(dev_id)->graphics_ver >= 20);
>>>+        subtest_pat_index_modes_with_regions(fd, xe2_pat_index_modes,
>>>+                             ARRAY_SIZE(xe2_pat_index_modes));
>>>+    }
>>>+
>>>+    igt_fixture
>>>+        drm_close_driver(fd);
>>>+}
>>>diff --git a/tests/meson.build b/tests/meson.build
>>>index 5afcd8cbb..3aecfbee0 100644
>>>--- a/tests/meson.build
>>>+++ b/tests/meson.build
>>>@@ -297,6 +297,7 @@ intel_xe_progs = [
>>>    'xe_mmap',
>>>    'xe_module_load',
>>>    'xe_noexec_ping_pong',
>>>+    'xe_pat',
>>>    'xe_pm',
>>>    'xe_pm_residency',
>>>    'xe_prime_self_import',
>>>-- 
>>>2.41.0
>>>

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2023-10-20 17:25 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
2023-10-19 14:40 [igt-dev] [PATCH i-g-t v4 00/15] PAT and cache coherency support Matthew Auld
2023-10-19 14:40 ` [igt-dev] [PATCH i-g-t v4 01/15] drm-uapi/xe_drm: sync to get pat and coherency bits Matthew Auld
2023-10-19 14:40 ` [igt-dev] [PATCH i-g-t v4 02/15] lib/igt_fb: mark buffers as SCANOUT Matthew Auld
2023-10-19 14:40 ` [igt-dev] [PATCH i-g-t v4 03/15] lib/igt_draw: " Matthew Auld
2023-10-19 14:40 ` [igt-dev] [PATCH i-g-t v4 04/15] lib/xe: support cpu_caching and coh_mod for gem_create Matthew Auld
2023-10-19 14:40 ` [igt-dev] [PATCH i-g-t v4 05/15] tests/xe/mmap: add some tests for cpu_caching and coh_mode Matthew Auld
2023-10-19 14:40 ` [igt-dev] [PATCH i-g-t v4 06/15] lib/intel_pat: add helpers for common pat_index modes Matthew Auld
2023-10-19 14:40 ` [igt-dev] [PATCH i-g-t v4 07/15] lib/allocator: add get_offset_pat_index() helper Matthew Auld
2023-10-19 14:40 ` [igt-dev] [PATCH i-g-t v4 08/15] lib/intel_blt: support pat_index Matthew Auld
2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 09/15] lib/intel_buf: " Matthew Auld
2023-10-20  5:17   ` Niranjana Vishwanathapura
2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 10/15] lib/xe_ioctl: update vm_bind to account for pat_index Matthew Auld
2023-10-19 17:37   ` Niranjana Vishwanathapura
2023-10-20  5:19     ` Niranjana Vishwanathapura
2023-10-20  8:13       ` Matthew Auld
2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 11/15] lib/intel_allocator: treat default_alignment as the minimum Matthew Auld
2023-10-19 17:34   ` Niranjana Vishwanathapura
2023-10-20  7:55     ` Matthew Auld
2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 12/15] lib/intel_blt: tidy up alignment usage Matthew Auld
2023-10-19 20:46   ` Niranjana Vishwanathapura
2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 13/15] lib/intel_batchbuffer: extend to include optional alignment Matthew Auld
2023-10-19 20:36   ` Niranjana Vishwanathapura
2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 14/15] tests/xe: add some vm_bind pat_index tests Matthew Auld
2023-10-20  5:27   ` Niranjana Vishwanathapura
2023-10-20  8:21     ` Matthew Auld
2023-10-20  8:42       ` Matthew Auld
2023-10-20 17:24       ` Niranjana Vishwanathapura
2023-10-19 14:41 ` [igt-dev] [PATCH i-g-t v4 15/15] tests/intel-ci/xe: add pat and caching related tests Matthew Auld
