* [igt-dev] [PATCH i-g-t,v4 0/5] Add testing for CCS
@ 2021-12-10 13:05 apoorva1.singh
  2021-12-10 13:05 ` [igt-dev] [PATCH i-g-t, v4 1/5] lib/i915: Introduce library intel_mocs apoorva1.singh
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: apoorva1.singh @ 2021-12-10 13:05 UTC (permalink / raw)
  To: apoorva1.singh, igt-dev, ramalingam.c, zbigniew.kempczynski,
	arjun.melkaveri

From: Apoorva Singh <apoorva1.singh@intel.com>

- Add a new library, intel_mocs, for MOCS settings.

- Add a new library, i915_blt, for various blt commands.

- Add a new platform flag, has_flat_ccs, for platforms
  supporting Flat CCS.

- Only use the main copy engines for XY_BLOCK_COPY.
  The XY_BLOCK_COPY blt command is used to transfer the CCS data,
  so the tests can only run on engines that support the
  "block_copy" capability.

- Add gem_ccs test for CCS testing.
  Commands are constructed with XY_BLOCK_COPY_BLT
  and XY_CTRL_SURF_COPY_BLT instructions.

v4: Used macro to avoid code duplication in gem_ccs test.

Apoorva Singh (3):
  lib/i915: Introduce library intel_mocs
  lib/i915: Introduce library i915_blt
  lib/intel_chipset.h: Add has_flat_ccs flag

CQ Tang (1):
  i915/gem_ccs: Add testing for CCS

Chris Wilson (1):
  i915/gem_engine_topology: Only use the main copy engines for
    XY_BLOCK_COPY

 lib/i915/gem_engine_topology.c |  38 +++
 lib/i915/gem_engine_topology.h |   5 +
 lib/i915/i915_blt.c            | 469 +++++++++++++++++++++++++++++++++
 lib/i915/i915_blt.h            |  82 ++++++
 lib/i915/intel_mocs.c          |  51 ++++
 lib/i915/intel_mocs.h          |  19 ++
 lib/intel_chipset.h            |   3 +
 lib/meson.build                |   2 +
 tests/i915/gem_ccs.c           | 406 ++++++++++++++++++++++++++++
 tests/meson.build              |   1 +
 10 files changed, 1076 insertions(+)
 create mode 100644 lib/i915/i915_blt.c
 create mode 100644 lib/i915/i915_blt.h
 create mode 100644 lib/i915/intel_mocs.c
 create mode 100644 lib/i915/intel_mocs.h
 create mode 100644 tests/i915/gem_ccs.c

-- 
2.25.1


* [igt-dev] [PATCH i-g-t, v4 1/5] lib/i915: Introduce library intel_mocs
  2021-12-10 13:05 [igt-dev] [PATCH i-g-t,v4 0/5] Add testing for CCS apoorva1.singh
@ 2021-12-10 13:05 ` apoorva1.singh
  2021-12-13 15:58   ` Zbigniew Kempczyński
  2021-12-10 13:05 ` [igt-dev] [PATCH i-g-t, v4 2/5] lib/i915: Introduce library i915_blt apoorva1.singh
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: apoorva1.singh @ 2021-12-10 13:05 UTC (permalink / raw)
  To: apoorva1.singh, igt-dev, ramalingam.c, zbigniew.kempczynski,
	arjun.melkaveri

From: Apoorva Singh <apoorva1.singh@intel.com>

Add a new library, intel_mocs, for MOCS settings.

Signed-off-by: Apoorva Singh <apoorva1.singh@intel.com>
Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Cc: Melkaveri, Arjun <arjun.melkaveri@intel.com>
---
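The helpers in this patch return the MOCS table index shifted left by one, because bits [6:1] of the field carry the index and bit 0 is the encryption bit. A minimal standalone sketch of that packing is given here for illustration only. The names mocs_field() and place_mocs() are hypothetical and not part of the library; the shift value 21 is copied from XY_BLOCK_COPY_BLT_MOCS_SHIFT in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Shift copied from this patch (XY_BLOCK_COPY_BLT_MOCS_SHIFT). */
#define EX_BLOCK_COPY_MOCS_SHIFT 21

/*
 * Hypothetical helper: build the 7-bit MOCS field from a table index.
 * Bits [6:1] hold the index; bit 0 (encryption) is left clear.
 */
static inline uint8_t mocs_field(uint8_t index)
{
	assert(index < 64); /* the index must fit in six bits */
	return (uint8_t)(index << 1);
}

/* Hypothetical helper: OR a MOCS field into a command dword. */
static inline uint32_t place_mocs(uint32_t dword, uint8_t field,
				  unsigned int shift)
{
	return dword | ((uint32_t)field << shift);
}
```

With the Gen12 UC index of 3 used in get_mocs_index(), mocs_field(3) yields 6, which place_mocs() then moves to bits [27:21] of the dword.
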
 lib/i915/intel_mocs.c | 51 +++++++++++++++++++++++++++++++++++++++++++
 lib/i915/intel_mocs.h | 19 ++++++++++++++++
 lib/meson.build       |  1 +
 3 files changed, 71 insertions(+)
 create mode 100644 lib/i915/intel_mocs.c
 create mode 100644 lib/i915/intel_mocs.h

diff --git a/lib/i915/intel_mocs.c b/lib/i915/intel_mocs.c
new file mode 100644
index 00000000..e8c19309
--- /dev/null
+++ b/lib/i915/intel_mocs.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#include "igt.h"
+#include "i915/gem.h"
+#include "intel_mocs.h"
+
+static void get_mocs_index(int fd, struct drm_i915_mocs_index *mocs)
+{
+	uint16_t devid = intel_get_drm_devid(fd);
+
+	/*
+	 * Gen12 and later platforms have no PTE setting, so using
+	 * I915_MOCS_PTE as the MOCS index may lead to undefined
+	 * MOCS behavior.
+	 *
+	 * This helper provides the current UC and WB MOCS
+	 * indices for the platform.
+	 */
+	if (IS_DG1(devid)) {
+		mocs->uc_index = 1;
+		mocs->wb_index = 5;
+	} else if (IS_GEN12(devid)) {
+		mocs->uc_index = 3;
+		mocs->wb_index = 2;
+	} else {
+		mocs->uc_index = I915_MOCS_PTE;
+		mocs->wb_index = I915_MOCS_CACHED;
+	}
+}
+
+/* BitField [6:1] represents index to MOCS Tables
+ * BitField [0] represents Encryption/Decryption */
+
+uint8_t intel_get_wb_mocs(int fd)
+{
+	struct drm_i915_mocs_index mocs;
+
+	get_mocs_index(fd, &mocs);
+	return mocs.wb_index << 1;
+}
+
+uint8_t intel_get_uc_mocs(int fd)
+{
+	struct drm_i915_mocs_index mocs;
+
+	get_mocs_index(fd, &mocs);
+	return mocs.uc_index << 1;
+}
diff --git a/lib/i915/intel_mocs.h b/lib/i915/intel_mocs.h
new file mode 100644
index 00000000..efd8607f
--- /dev/null
+++ b/lib/i915/intel_mocs.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2019 Intel Corporation
+ */
+
+#ifndef _INTEL_MOCS_H
+#define _INTEL_MOCS_H
+
+#define XY_BLOCK_COPY_BLT_MOCS_SHIFT		21
+#define XY_CTRL_SURF_COPY_BLT_MOCS_SHIFT	25
+
+struct drm_i915_mocs_index {
+	uint8_t uc_index;
+	uint8_t wb_index;
+};
+
+uint8_t intel_get_wb_mocs(int fd);
+uint8_t intel_get_uc_mocs(int fd);
+#endif /* _INTEL_MOCS_H */
diff --git a/lib/meson.build b/lib/meson.build
index b9568a71..f500f0f1 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -11,6 +11,7 @@ lib_sources = [
 	'i915/gem_mman.c',
 	'i915/gem_vm.c',
 	'i915/intel_memory_region.c',
+	'i915/intel_mocs.c',
 	'igt_collection.c',
 	'igt_color_encoding.c',
 	'igt_debugfs.c',
-- 
2.25.1


* [igt-dev] [PATCH i-g-t, v4 2/5] lib/i915: Introduce library i915_blt
  2021-12-10 13:05 [igt-dev] [PATCH i-g-t,v4 0/5] Add testing for CCS apoorva1.singh
  2021-12-10 13:05 ` [igt-dev] [PATCH i-g-t, v4 1/5] lib/i915: Introduce library intel_mocs apoorva1.singh
@ 2021-12-10 13:05 ` apoorva1.singh
  2021-12-15  9:40   ` Zbigniew Kempczyński
                     ` (3 more replies)
  2021-12-10 13:05 ` [igt-dev] [PATCH i-g-t, v4 3/5] lib/intel_chipset.h: Add has_flat_ccs flag apoorva1.singh
                   ` (4 subsequent siblings)
  6 siblings, 4 replies; 14+ messages in thread
From: apoorva1.singh @ 2021-12-10 13:05 UTC (permalink / raw)
  To: apoorva1.singh, igt-dev, ramalingam.c, zbigniew.kempczynski,
	arjun.melkaveri

From: Apoorva Singh <apoorva1.singh@intel.com>

Add new library 'i915_blt' for various blt commands.

Signed-off-by: Apoorva Singh <apoorva1.singh@intel.com>
Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Cc: Melkaveri, Arjun <arjun.melkaveri@intel.com>
---
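make_block_copy_batch() describes the linear buffer as a surface that is TILE_4_WIDTH (128) bytes wide, so dword 3 carries X2 = 128 and Y2 = length / 128 in its upper half. The standalone sketch below shows only that arithmetic. block_copy_x2y2() is a hypothetical name, and the two asserts are assumptions added for the example (length is a multiple of the row width, and the row count fits in the 16 bits above the shift); the patch itself does not check either.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from i915_blt.h in this patch. */
#define EX_TILE_4_WIDTH	128
#define EX_DEST_Y2_SHIFT	16

/*
 * Hypothetical helper: pack X2 (row width in bytes) and Y2 (row count)
 * the same way dword 3 of the block copy command is filled in.
 */
static inline uint32_t block_copy_x2y2(uint32_t length)
{
	uint32_t rows = length / EX_TILE_4_WIDTH; /* same as length >> 7 */

	assert(length % EX_TILE_4_WIDTH == 0);	/* assumption of this sketch */
	assert(rows < (1u << 16));		/* Y2 has at most 16 bits */

	return EX_TILE_4_WIDTH | (rows << EX_DEST_Y2_SHIFT);
}
```

For a 4096-byte BO this gives 32 rows, so the dword is 128 | (32 << 16). BOSIZE_MAX (4 MiB) gives 32768 rows, which still fits in the upper half of the dword.
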
 lib/i915/i915_blt.c | 469 ++++++++++++++++++++++++++++++++++++++++++++
 lib/i915/i915_blt.h |  82 ++++++++
 lib/meson.build     |   1 +
 3 files changed, 552 insertions(+)
 create mode 100644 lib/i915/i915_blt.c
 create mode 100644 lib/i915/i915_blt.h
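
The size field of XY_CTRL_SURF_COPY_BLT is derived from the transfer length in make_ctrl_surf_batch(): the length is divided by CCS_RATIO, rounded up to at least one block, rejected above NUM_CCS_BLKS_PER_XFER, and stored as (blocks - 1) at CCS_SIZE_SHIFT. A standalone sketch of that computation follows. The function names are hypothetical, and the sketch returns -1 for an oversized request where the patch returns a batch length of 0.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from i915_blt.h in this patch. */
#define EX_CCS_RATIO		256
#define EX_NUM_CCS_BLKS_PER_XFER	1024
#define EX_CCS_SIZE_SHIFT	8

/* Hypothetical helper: number of CCS blocks for a transfer, or -1. */
static inline int ccs_blocks(uint32_t length)
{
	uint32_t n = length / EX_CCS_RATIO;

	if (n < 1)
		n = 1;	/* always transfer at least one block */
	if (n > EX_NUM_CCS_BLKS_PER_XFER)
		return -1;	/* too large for a single command */

	return (int)n;
}

/* Hypothetical helper: encode the block count into the size field. */
static inline uint32_t ccs_size_field(int blocks)
{
	assert(blocks >= 1 && blocks <= EX_NUM_CCS_BLKS_PER_XFER);

	/* The field stores blocks - 1, masked to ten bits. */
	return ((uint32_t)(blocks - 1) & 1023u) << EX_CCS_SIZE_SHIFT;
}
```

A length of 4096 bytes maps to 16 blocks and a field value of 15 << 8; the largest length accepted by one command is 1024 * 256 = 262144 bytes.
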

diff --git a/lib/i915/i915_blt.c b/lib/i915/i915_blt.c
new file mode 100644
index 00000000..abfe7739
--- /dev/null
+++ b/lib/i915/i915_blt.c
@@ -0,0 +1,469 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#include <errno.h>
+#include <sys/ioctl.h>
+#include <sys/time.h>
+#include <malloc.h>
+#include "drm.h"
+#include "igt.h"
+#include "i915_blt.h"
+#include "i915/intel_mocs.h"
+
+/*
+ * make_block_copy_batch:
+ * @fd: open i915 drm file descriptor
+ * @batch_buf: the batch buffer to populate with the command
+ * @src: handle of the source BO
+ * @dst: handle of the destination BO
+ * @length: size of the src and dest BOs
+ * @reloc: pointer to the relocation entry for this command
+ * @offset_src: source address offset
+ * @offset_dst: destination address offset
+ * @src_mem_type: source memory type (denotes direct or indirect
+ *			addressing)
+ * @dst_mem_type: destination memory type (denotes direct or indirect
+ *			addressing)
+ * @src_compression: flag to enable uncompressed read of compressed data
+ *			at the source
+ * @dst_compression: flag to enable compressed write at the destination
+ * @resolve: flag to enable resolve of compressed data
+ */
+static int make_block_copy_batch(int fd, uint32_t *batch_buf,
+				 uint32_t src, uint32_t dst, uint32_t length,
+				 struct drm_i915_gem_relocation_entry *reloc,
+				 uint64_t offset_src, uint64_t offset_dst,
+				 int src_mem_type, int dst_mem_type,
+				 int src_compression, int dst_compression,
+				 int resolve)
+{
+	uint32_t *b = batch_buf;
+	uint32_t devid;
+	uint8_t src_mocs = intel_get_uc_mocs(fd);
+	uint8_t dst_mocs = src_mocs;
+
+	devid = intel_get_drm_devid(fd);
+
+	igt_assert(AT_LEAST_GEN(devid, 12) && IS_TIGERLAKE(devid) && !(src_compression || dst_compression));
+
+	/* BG 0 */
+	b[0] = BLOCK_COPY_BLT_CMD | resolve;
+
+	/* BG 1
+	 *
+	 * Using Tile 4 dimensions.  Height = 32 rows
+	 * Width = 128 bytes
+	 */
+	b[1] = dst_compression | TILE_4_FORMAT | TILE_4_WIDTH_DWORD |
+		dst_mocs << XY_BLOCK_COPY_BLT_MOCS_SHIFT;
+
+	/* BG 3
+	 *
+	 * X2 = TILE_4_WIDTH
+	 * Y2 = (length / TILE_4_WIDTH) << 16
+	 */
+	b[3] = TILE_4_WIDTH | (length >> 7) << DEST_Y2_COORDINATE_SHIFT;
+
+	b[4] = offset_dst;
+	b[5] = offset_dst >> 32;
+
+	/* relocate address in b[4] and b[5] */
+	reloc->offset = 4 * (sizeof(uint32_t));
+	reloc->delta = 0;
+	reloc->target_handle = dst;
+	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
+	reloc->write_domain = I915_GEM_DOMAIN_RENDER;
+	reloc->presumed_offset = 0;
+	reloc++;
+
+	/* BG 6 */
+	b[6] = dst_mem_type << DEST_MEM_TYPE_SHIFT;
+
+	/* BG 8 */
+	b[8] = src_compression | TILE_4_WIDTH_DWORD | TILE_4_FORMAT |
+		src_mocs << XY_BLOCK_COPY_BLT_MOCS_SHIFT;
+
+	b[9] = offset_src;
+	b[10] = offset_src >> 32;
+
+	/* relocate address in b[9] and b[10] */
+	reloc->offset = 9 * sizeof(uint32_t);
+	reloc->delta = 0;
+	reloc->target_handle = src;
+	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
+	reloc->write_domain = 0;
+	reloc->presumed_offset = 0;
+	reloc++;
+
+	/* BG 11 */
+	b[11] = src_mem_type << SRC_MEM_TYPE_SHIFT;
+
+	/* BG 16  */
+	b[16] = SURFACE_TYPE_2D |
+		((TILE_4_WIDTH - 1) << DEST_SURF_WIDTH_SHIFT) |
+		(TILE_4_HEIGHT - 1);
+
+	/* BG 19 */
+	b[19] = SURFACE_TYPE_2D |
+		((TILE_4_WIDTH - 1) << SRC_SURF_WIDTH_SHIFT) |
+		(TILE_4_HEIGHT - 1);
+
+	b += XY_BLOCK_COPY_BLT_LEN_DWORD;
+
+	b[0] = MI_FLUSH_DW | MI_FLUSH_LLC | MI_INVALIDATE_TLB;
+	reloc->offset = 23 * sizeof(uint32_t);
+	reloc->delta = 0;
+	reloc->target_handle = dst_compression > 0 ? dst : src;
+	reloc->read_domains = 0;
+	reloc->write_domain = 0;
+	reloc->presumed_offset = 0;
+	reloc++;
+	b[3] = 0;
+
+	b[4] = MI_FLUSH_DW | MI_FLUSH_CCS;
+	reloc->offset = 27 * sizeof(uint32_t);
+	reloc->delta = 0;
+	reloc->target_handle = dst_compression > 0 ? dst : src;
+	reloc->read_domains = 0;
+	reloc->write_domain = 0;
+	reloc->presumed_offset = 0;
+	reloc++;
+	b[7] = 0;
+
+	b[8] = MI_BATCH_BUFFER_END;
+	b[9] = 0;
+
+	b += 10;
+
+	return (b - batch_buf) * sizeof(uint32_t);
+}
+
+static void __xy_block_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
+				uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
+				uint32_t length, enum copy_mode mode, bool enable_compression,
+				uint32_t ctx, struct intel_execution_engine2 *e)
+{
+	struct drm_i915_gem_relocation_entry reloc[4];
+	struct drm_i915_gem_exec_object2 exec[3];
+	struct drm_i915_gem_execbuffer2 execbuf;
+	int len;
+	int src_mem_type, dst_mem_type;
+	int dst_compression, src_compression;
+	int resolve = 0; /* only the in-place mode requests a resolve */
+	uint32_t cmd, batch_buf[BATCH_SIZE/sizeof(uint32_t)] = {};
+	uint64_t offset_src, offset_dst, offset_bb, bb_size, ret;
+
+	bb_size = BATCH_SIZE;
+	ret = __gem_create_in_memory_regions(fd, &cmd, &bb_size, bb_region);
+	igt_assert_eq(ret, 0);
+
+	switch (mode) {
+		case SYS_TO_SYS: /* copy from smem to smem */
+			src_mem_type = MEM_TYPE_SYS;
+			dst_mem_type = MEM_TYPE_SYS;
+			src_compression = 0;
+			dst_compression = 0;
+			break;
+		case SYS_TO_LOCAL: /* copy from smem to lmem */
+			src_mem_type = MEM_TYPE_SYS;
+			dst_mem_type = MEM_TYPE_LOCAL;
+			src_compression = 0;
+			dst_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
+			break;
+		case LOCAL_TO_SYS: /* copy from lmem to smem */
+			src_mem_type = MEM_TYPE_LOCAL;
+			dst_mem_type = MEM_TYPE_SYS;
+			src_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
+			dst_compression = 0;
+			break;
+		case LOCAL_TO_LOCAL: /* copy from lmem to lmem */
+			src_mem_type = MEM_TYPE_LOCAL;
+			dst_mem_type = MEM_TYPE_LOCAL;
+			src_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
+			dst_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
+			break;
+		case LOCAL_TO_LOCAL_INPLACE: /* in-place decompress */
+			src_mem_type = MEM_TYPE_LOCAL;
+			dst_mem_type = MEM_TYPE_LOCAL;
+			src_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
+			dst_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
+			resolve = FULL_RESOLVE;
+	}
+
+	offset_src = get_offset(ahnd, src, src_size, 0);
+	offset_dst = get_offset(ahnd, dst, dst_size, 0);
+	offset_bb = get_offset(ahnd, cmd, bb_size, 0);
+
+	/* construct the batch buffer */
+	memset(reloc, 0, sizeof(reloc));
+	len = make_block_copy_batch(fd, batch_buf,
+				    src, dst, length, reloc,
+				    offset_src, offset_dst,
+				    src_mem_type, dst_mem_type,
+				    src_compression, dst_compression,
+				    resolve);
+	igt_assert(len > 0);
+
+	/* write batch buffer to 'cmd' BO */
+	gem_write(fd, cmd, 0, batch_buf, len);
+
+	/* Execute the batch buffer */
+	memset(exec, 0, sizeof(exec));
+	if (mode == LOCAL_TO_LOCAL_INPLACE) {
+		exec[0].handle = dst;
+		exec[1].handle = cmd;
+		exec[1].relocation_count = !ahnd ? 4 : 0;
+		exec[1].relocs_ptr = to_user_pointer(reloc);
+		if (ahnd) {
+			exec[0].offset = offset_dst;
+			exec[0].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
+			exec[1].offset = offset_bb;
+			exec[1].flags |= EXEC_OBJECT_PINNED;
+		}
+	} else {
+		exec[0].handle = src;
+		exec[1].handle = dst;
+		exec[2].handle = cmd;
+		exec[2].relocation_count = !ahnd ? 4 : 0;
+		exec[2].relocs_ptr = to_user_pointer(reloc);
+		if (ahnd) {
+			exec[0].offset = offset_src;
+			exec[0].flags |= EXEC_OBJECT_PINNED;
+			exec[1].offset = offset_dst;
+			exec[1].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
+			exec[2].offset = offset_bb;
+			exec[2].flags |= EXEC_OBJECT_PINNED;
+		}
+	}
+
+	memset(&execbuf, 0, sizeof(execbuf));
+	execbuf.buffers_ptr = to_user_pointer(exec);
+
+	if (mode == LOCAL_TO_LOCAL_INPLACE)
+		execbuf.buffer_count = 2;
+	else
+		execbuf.buffer_count = 3;
+	execbuf.batch_len = len;
+
+	if (ctx)
+		execbuf.rsvd1 = ctx;
+
+	execbuf.flags = I915_EXEC_BLT;
+	if (e)
+		execbuf.flags = e->flags;
+
+	gem_execbuf(fd, &execbuf);
+	gem_close(fd, cmd);
+	put_offset(ahnd, src);
+	put_offset(ahnd, dst);
+	put_offset(ahnd, cmd);
+}
+
+void xy_block_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
+		       uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
+		       uint32_t length, enum copy_mode mode, bool enable_compression,
+		       struct intel_execution_engine2 *e)
+{
+	__xy_block_copy_blt(fd, bb_region, src, dst, src_size, dst_size, ahnd,
+			    length, mode, enable_compression, 0, e);
+}
+
+void xy_block_copy_blt_ctx(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
+			   uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
+			   uint32_t length, enum copy_mode mode, bool enable_compression,
+			   uint32_t ctx, struct intel_execution_engine2 *e)
+{
+	__xy_block_copy_blt(fd, bb_region, src, dst, src_size, dst_size, ahnd,
+			    length, mode, enable_compression, ctx, e);
+}
+
+/*
+ * make_ctrl_surf_batch:
+ * @fd: open i915 drm file descriptor
+ * @batch_buf: the batch buffer to populate with the command
+ * @src: handle of the source BO
+ * @dst: handle of the destination BO
+ * @length: size of the ctrl surf in bytes
+ * @reloc: pointer to the relocation entry for this command
+ * @offset_src: source address offset
+ * @offset_dst: destination address offset
+ * @src_mem_access: source memory type (denotes direct or indirect
+ *			addressing)
+ * @dst_mem_access: destination memory type (denotes direct or indirect
+ *			addressing)
+ */
+static int make_ctrl_surf_batch(int fd, uint32_t *batch_buf,
+				uint32_t src, uint32_t dst, uint32_t length,
+				struct drm_i915_gem_relocation_entry *reloc,
+				uint64_t offset_src, uint64_t offset_dst,
+				int src_mem_access, int dst_mem_access)
+{
+	int num_ccs_blocks;
+	uint32_t *b = batch_buf;
+	uint8_t src_mocs = intel_get_uc_mocs(fd);
+	uint8_t dst_mocs = src_mocs;
+
+	num_ccs_blocks = length/CCS_RATIO;
+	if (num_ccs_blocks < 1)
+		num_ccs_blocks = 1;
+	if (num_ccs_blocks > NUM_CCS_BLKS_PER_XFER)
+		return 0;
+
+	/*
+	 * Mask with 1023 (bitwise AND) because the size field
+	 * only holds values in the range 0 - 1023
+	 */
+	b[0] = ((XY_CTRL_SURF_COPY_BLT) |
+		(src_mem_access << SRC_ACCESS_TYPE_SHIFT) |
+		(dst_mem_access << DST_ACCESS_TYPE_SHIFT) |
+		(((num_ccs_blocks - 1) & 1023) << CCS_SIZE_SHIFT));
+
+	b[1] = offset_src;
+	b[2] = offset_src >> 32 | src_mocs << XY_CTRL_SURF_COPY_BLT_MOCS_SHIFT;
+
+	/* relocate address in b[1] and b[2] */
+	reloc->offset = 1 * sizeof(uint32_t);
+	reloc->delta = 0;
+	reloc->target_handle = src;
+	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
+	reloc->write_domain = 0;
+	reloc->presumed_offset = 0;
+	reloc++;
+
+	b[3] = offset_dst;
+	b[4] = offset_dst >> 32 | dst_mocs << XY_CTRL_SURF_COPY_BLT_MOCS_SHIFT;
+
+	/* relocate address in b[3] and b[4] */
+	reloc->offset = 3 * (sizeof(uint32_t));
+	reloc->delta = 0;
+	reloc->target_handle = dst;
+	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
+	reloc->write_domain = I915_GEM_DOMAIN_RENDER;
+	reloc->presumed_offset = 0;
+	reloc++;
+
+	b[5] = 0;
+
+	b[6] = MI_FLUSH_DW | MI_FLUSH_LLC | MI_INVALIDATE_TLB;
+
+	reloc->offset = 7 * sizeof(uint32_t);
+	reloc->delta = 0;
+	reloc->target_handle =
+		dst_mem_access == INDIRECT_ACCESS ? dst : src;
+	reloc->read_domains = 0;
+	reloc->write_domain = 0;
+	reloc->presumed_offset = 0;
+	reloc++;
+	b[9] = 0;
+
+	b[10] = MI_FLUSH_DW | MI_FLUSH_CCS;
+	reloc->offset = 11 * sizeof(uint32_t);
+	reloc->delta = 0;
+	reloc->target_handle =
+		dst_mem_access == INDIRECT_ACCESS ? dst : src;
+	reloc->read_domains = 0;
+	reloc->write_domain = 0;
+	reloc->presumed_offset = 0;
+	reloc++;
+	b[13] = 0;
+
+	b[14] = MI_BATCH_BUFFER_END;
+	b[15] = 0;
+
+	b += 16;
+
+	return (b - batch_buf) * sizeof(uint32_t);
+}
+
+static void __xy_ctrl_surf_copy_blt(int fd, uint32_t bb_region, uint32_t src,
+				    uint32_t dst, uint64_t src_size, uint64_t dst_size,
+				    uint64_t ahnd, uint32_t length, bool writetodev,
+				    uint32_t ctx, struct intel_execution_engine2 *e)
+{
+	struct drm_i915_gem_relocation_entry reloc[4];
+	struct drm_i915_gem_exec_object2 exec[3];
+	struct drm_i915_gem_execbuffer2 execbuf;
+	int len, src_mem_access, dst_mem_access;
+	uint32_t cmd, batch_buf[BATCH_SIZE/sizeof(uint32_t)] = {};
+	uint64_t offset_src, offset_dst, offset_bb, bb_size, ret;
+
+	bb_size = BATCH_SIZE;
+	ret = __gem_create_in_memory_regions(fd, &cmd, &bb_size, bb_region);
+	igt_assert_eq(ret, 0);
+
+	if (writetodev) {
+		src_mem_access = DIRECT_ACCESS;
+		dst_mem_access = INDIRECT_ACCESS;
+	} else {
+		src_mem_access = INDIRECT_ACCESS;
+		dst_mem_access = DIRECT_ACCESS;
+	}
+
+	offset_src = get_offset(ahnd, src, src_size, 0);
+	offset_dst = get_offset(ahnd, dst, dst_size, 0);
+	offset_bb = get_offset(ahnd, cmd, bb_size, 0);
+
+	/* construct batch command buffer */
+	memset(reloc, 0, sizeof(reloc));
+	len = make_ctrl_surf_batch(fd, batch_buf,
+				   src, dst, length, reloc,
+				   offset_src, offset_dst,
+				   src_mem_access, dst_mem_access);
+	igt_assert(len > 0);
+
+	/* write batch buffer to 'cmd' BO */
+	gem_write(fd, cmd, 0, batch_buf, len);
+
+	/* Execute the batch buffer */
+	memset(exec, 0, sizeof(exec));
+	exec[0].handle = src;
+	exec[1].handle = dst;
+	exec[2].handle = cmd;
+	exec[2].relocation_count = !ahnd ? 4 : 0;
+	exec[2].relocs_ptr = to_user_pointer(reloc);
+	if (ahnd) {
+		exec[0].offset = offset_src;
+		exec[0].flags |= EXEC_OBJECT_PINNED;
+		exec[1].offset = offset_dst;
+		exec[1].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
+		exec[2].offset = offset_bb;
+		exec[2].flags |= EXEC_OBJECT_PINNED;
+	}
+
+	memset(&execbuf, 0, sizeof(execbuf));
+	execbuf.buffers_ptr = to_user_pointer(exec);
+	execbuf.buffer_count = 3;
+	execbuf.batch_len = len;
+	execbuf.flags = I915_EXEC_BLT;
+	if (ctx)
+		execbuf.rsvd1 = ctx;
+	if (e)
+		execbuf.flags = e->flags;
+
+	gem_execbuf(fd, &execbuf);
+	gem_close(fd, cmd);
+	put_offset(ahnd, src);
+	put_offset(ahnd, dst);
+	put_offset(ahnd, cmd);
+}
+
+void xy_ctrl_surf_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
+			   uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
+			   uint32_t length, bool writetodev,
+			   struct intel_execution_engine2 *e)
+{
+	__xy_ctrl_surf_copy_blt(fd, bb_region, src, dst, src_size, dst_size,
+				ahnd, length, writetodev, 0, e);
+}
+
+void xy_ctrl_surf_copy_blt_ctx(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
+			       uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
+			       uint32_t length, bool writetodev, uint32_t ctx,
+			       struct intel_execution_engine2 *e)
+{
+	__xy_ctrl_surf_copy_blt(fd, bb_region, src, dst, src_size, dst_size,
+				ahnd, length, writetodev, ctx, e);
+}
+
diff --git a/lib/i915/i915_blt.h b/lib/i915/i915_blt.h
new file mode 100644
index 00000000..71653880
--- /dev/null
+++ b/lib/i915/i915_blt.h
@@ -0,0 +1,82 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#include <errno.h>
+#include <sys/ioctl.h>
+#include <sys/time.h>
+#include <malloc.h>
+#include "drm.h"
+#include "igt.h"
+
+#define MI_FLUSH_DW_LEN_DWORD	4
+#define MI_FLUSH_DW		(0x26 << 23 | 1)
+#define MI_FLUSH_CCS		(1 << 16)
+#define MI_FLUSH_LLC		(1 << 9)
+#define MI_INVALIDATE_TLB	(1 << 18)
+
+/* XY_BLOCK_COPY_BLT instruction has 22 bit groups of 1 DWORD each */
+#define XY_BLOCK_COPY_BLT_LEN_DWORD	22
+#define BLOCK_COPY_BLT_CMD		(2 << 29 | 0x41 << 22 | 0x14)
+#define COMPRESSION_ENABLE		(1 << 29)
+#define AUX_CCS_E			(5 << 18)
+#define FULL_RESOLVE			(1 << 12)
+#define PARTIAL_RESOLVE			(2 << 12)
+#define TILE_4_FORMAT			(2 << 30)
+#define TILE_4_WIDTH			(128)
+#define TILE_4_WIDTH_DWORD		((128 >> 2) - 1)
+#define TILE_4_HEIGHT			(32)
+#define SURFACE_TYPE_2D			(1 << 29)
+
+#define DEST_Y2_COORDINATE_SHIFT	(16)
+#define DEST_MEM_TYPE_SHIFT		(31)
+#define SRC_MEM_TYPE_SHIFT		(31)
+#define DEST_SURF_WIDTH_SHIFT		(14)
+#define SRC_SURF_WIDTH_SHIFT		(14)
+
+#define XY_CTRL_SURF_COPY_BLT		(2<<29 | 0x48<<22 | 3)
+#define SRC_ACCESS_TYPE_SHIFT		21
+#define DST_ACCESS_TYPE_SHIFT		20
+#define CCS_SIZE_SHIFT			8
+#define MI_INSTR(opcode, flags) (((opcode) << 23) | (flags))
+#define MI_ARB_CHECK			MI_INSTR(0x05, 0)
+#define NUM_CCS_BLKS_PER_XFER		1024
+#define INDIRECT_ACCESS                 0
+#define DIRECT_ACCESS                   1
+
+#define BATCH_SIZE			4096
+#define BOSIZE_MIN			(4*1024)
+#define BOSIZE_MAX			(4*1024*1024)
+#define CCS_RATIO			256
+
+#define MEM_TYPE_SYS			1
+#define MEM_TYPE_LOCAL			0
+
+enum copy_mode {
+	SYS_TO_SYS = 0,
+	SYS_TO_LOCAL,
+	LOCAL_TO_SYS,
+	LOCAL_TO_LOCAL,
+	LOCAL_TO_LOCAL_INPLACE,
+};
+
+void xy_block_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
+		       uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
+		       uint32_t length, enum copy_mode mode, bool enable_compression,
+		       struct intel_execution_engine2 *e);
+
+void xy_ctrl_surf_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
+			   uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
+			   uint32_t length, bool writetodev,
+			   struct intel_execution_engine2 *e);
+
+void xy_block_copy_blt_ctx(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
+			   uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
+			   uint32_t length, enum copy_mode mode, bool enable_compression,
+			   uint32_t ctx, struct intel_execution_engine2 *e);
+
+void xy_ctrl_surf_copy_blt_ctx(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
+			       uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
+			       uint32_t length, bool writetodev, uint32_t ctx,
+			       struct intel_execution_engine2 *e);
diff --git a/lib/meson.build b/lib/meson.build
index f500f0f1..f2924541 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -12,6 +12,7 @@ lib_sources = [
 	'i915/gem_vm.c',
 	'i915/intel_memory_region.c',
 	'i915/intel_mocs.c',
+	'i915/i915_blt.c',
 	'igt_collection.c',
 	'igt_color_encoding.c',
 	'igt_debugfs.c',
-- 
2.25.1


* [igt-dev] [PATCH i-g-t, v4 3/5] lib/intel_chipset.h: Add has_flat_ccs flag
  2021-12-10 13:05 [igt-dev] [PATCH i-g-t,v4 0/5] Add testing for CCS apoorva1.singh
  2021-12-10 13:05 ` [igt-dev] [PATCH i-g-t, v4 1/5] lib/i915: Introduce library intel_mocs apoorva1.singh
  2021-12-10 13:05 ` [igt-dev] [PATCH i-g-t, v4 2/5] lib/i915: Introduce library i915_blt apoorva1.singh
@ 2021-12-10 13:05 ` apoorva1.singh
  2021-12-10 13:05 ` [igt-dev] [PATCH i-g-t, v4 4/5] i915/gem_engine_topology: Only use the main copy engines for XY_BLOCK_COPY apoorva1.singh
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: apoorva1.singh @ 2021-12-10 13:05 UTC (permalink / raw)
  To: apoorva1.singh, igt-dev, ramalingam.c, zbigniew.kempczynski,
	arjun.melkaveri

From: Apoorva Singh <apoorva1.singh@intel.com>

Add a new platform flag, has_flat_ccs, for platforms
supporting Flat CCS.

Signed-off-by: Apoorva Singh <apoorva1.singh@intel.com>
Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Cc: Melkaveri, Arjun <arjun.melkaveri@intel.com>
Acked-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
---
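HAS_FLAT_CCS() is a per-device lookup: the device id selects an intel_device_info entry and the macro reads its one-bit has_flat_ccs field. The standalone sketch below models that pattern with a made-up table. All names carry an ex_ prefix because they are hypothetical, the device ids are invented, and the all-false fallback for unknown ids is an assumption of the sketch rather than a statement about intel_get_device_info().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct intel_device_info. */
struct ex_device_info {
	uint16_t devid;
	bool has_flat_ccs : 1;
};

/* Invented device ids, for illustration only. */
static const struct ex_device_info ex_table[] = {
	{ .devid = 0x0001, .has_flat_ccs = false },
	{ .devid = 0x0002, .has_flat_ccs = true },
};

/* Assumed fallback: unknown devices report no optional features. */
static const struct ex_device_info ex_unknown = { .devid = 0 };

static inline const struct ex_device_info *ex_get_device_info(uint16_t devid)
{
	for (size_t i = 0; i < sizeof(ex_table) / sizeof(ex_table[0]); i++)
		if (ex_table[i].devid == devid)
			return &ex_table[i];

	return &ex_unknown;
}

/* Same shape as the HAS_FLAT_CCS() macro added by this patch. */
#define EX_HAS_FLAT_CCS(devid) (ex_get_device_info(devid)->has_flat_ccs)
```

A test would typically gate on the macro before exercising CCS paths. Since the diffstat of this series does not include the device info table, the flag reads false until a platform entry sets it.
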
 lib/intel_chipset.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/intel_chipset.h b/lib/intel_chipset.h
index 2b795527..d2719932 100644
--- a/lib/intel_chipset.h
+++ b/lib/intel_chipset.h
@@ -40,6 +40,7 @@ struct intel_device_info {
 	unsigned graphics_ver;
 	unsigned display_ver;
 	unsigned gt; /* 0 if unknown */
+	bool has_flat_ccs : 1;
 	bool is_mobile : 1;
 	bool is_whitney : 1;
 	bool is_almador : 1;
@@ -209,4 +210,6 @@ void intel_check_pch(void);
 				   IS_CHERRYVIEW(devid) || \
 				   IS_BROXTON(devid)))
 
+#define HAS_FLAT_CCS(devid)	(intel_get_device_info(devid)->has_flat_ccs)
+
 #endif /* _INTEL_CHIPSET_H */
-- 
2.25.1


* [igt-dev] [PATCH i-g-t, v4 4/5] i915/gem_engine_topology: Only use the main copy engines for XY_BLOCK_COPY
  2021-12-10 13:05 [igt-dev] [PATCH i-g-t,v4 0/5] Add testing for CCS apoorva1.singh
                   ` (2 preceding siblings ...)
  2021-12-10 13:05 ` [igt-dev] [PATCH i-g-t, v4 3/5] lib/intel_chipset.h: Add has_flat_ccs flag apoorva1.singh
@ 2021-12-10 13:05 ` apoorva1.singh
  2021-12-10 13:05 ` [igt-dev] [PATCH i-g-t,v4 5/5] i915/gem_ccs: Add testing for CCS apoorva1.singh
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: apoorva1.singh @ 2021-12-10 13:05 UTC (permalink / raw)
  To: apoorva1.singh, igt-dev, ramalingam.c, zbigniew.kempczynski,
	arjun.melkaveri

From: Chris Wilson <chris.p.wilson@intel.com>

The XY_BLOCK_COPY blt command is used to transfer the CCS data,
so the tests can only run on engines that support the
"block_copy" capability.

Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Signed-off-by: Apoorva Singh <apoorva1.singh@intel.com>
Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Cc: Melkaveri, Arjun <arjun.melkaveri@intel.com>
---
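The decision made by gem_engine_can_block_copy() has three steps: reject non-copy engines, fall back to a generation check when the kernel does not list "block_copy" among its known capabilities, and otherwise trust the engine's own capability list. The standalone sketch below models only that decision. ex_can_block_copy() is a hypothetical name, and the two strings stand in for the contents of the sysfs "known_capabilities" and "capabilities" attributes that the patch reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical model of the decision in gem_engine_can_block_copy().
 * known_caps and caps stand in for the sysfs attribute contents;
 * an unreadable attribute is modelled as an empty string.
 */
static inline bool ex_can_block_copy(bool is_copy_class,
				     const char *known_caps,
				     const char *caps, int gen)
{
	assert(known_caps && caps);

	if (!is_copy_class)
		return false;

	/* Kernel does not know the capability: decide by generation. */
	if (!strstr(known_caps, "block_copy"))
		return gen >= 12;

	/* Kernel knows it: the engine must actually advertise it. */
	return strstr(caps, "block_copy") != NULL;
}
```

Because the match is a plain substring search, any capability name that merely contains "block_copy" would also match; the sketch keeps that behavior to mirror the strstr() call in the patch.
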
 lib/i915/gem_engine_topology.c | 38 ++++++++++++++++++++++++++++++++++
 lib/i915/gem_engine_topology.h |  5 +++++
 2 files changed, 43 insertions(+)

diff --git a/lib/i915/gem_engine_topology.c b/lib/i915/gem_engine_topology.c
index 729f42b0..37b5875e 100644
--- a/lib/i915/gem_engine_topology.c
+++ b/lib/i915/gem_engine_topology.c
@@ -488,6 +488,44 @@ int gem_engine_property_printf(int i915, const char *engine, const char *attr,
 	return ret;
 }
 
+static bool
+__gem_engine_has_capability(int i915, const char *engine,
+			    const char *attr, const char *cap)
+{
+	char buf[4096] = {};
+	FILE *file;
+
+	file = __open_attr(igt_sysfs_open(i915), "r",
+			   "engine", engine, attr, NULL);
+	if (file) {
+		fread(buf, 1, sizeof(buf) - 1, file);
+		fclose(file);
+	}
+
+	return strstr(buf, cap);
+}
+
+bool gem_engine_has_capability(int i915, const char *engine, const char *cap)
+{
+	return __gem_engine_has_capability(i915, engine, "capabilities", cap);
+}
+
+bool gem_engine_has_known_capability(int i915, const char *engine, const char *cap)
+{
+	return __gem_engine_has_capability(i915, engine, "known_capabilities", cap);
+}
+
+bool gem_engine_can_block_copy(int i915, const struct intel_execution_engine2 *engine)
+{
+	if (engine->class != I915_ENGINE_CLASS_COPY)
+		return false;
+
+	if (!gem_engine_has_known_capability(i915, engine->name, "block_copy"))
+		return intel_gen(intel_get_drm_devid(i915)) >= 12;
+
+	return gem_engine_has_capability(i915, engine->name, "block_copy");
+}
+
 uint32_t gem_engine_mmio_base(int i915, const char *engine)
 {
 	unsigned int mmio = 0;
diff --git a/lib/i915/gem_engine_topology.h b/lib/i915/gem_engine_topology.h
index 4cfab560..d24bc9e8 100644
--- a/lib/i915/gem_engine_topology.h
+++ b/lib/i915/gem_engine_topology.h
@@ -124,6 +124,11 @@ int gem_engine_property_printf(int i915, const char *engine, const char *attr,
 
 uint32_t gem_engine_mmio_base(int i915, const char *engine);
 
+bool gem_engine_has_capability(int i915, const char *engine, const char *cap);
+bool gem_engine_has_known_capability(int i915, const char *engine, const char *cap);
+
+bool gem_engine_can_block_copy(int i915, const struct intel_execution_engine2 *engine);
+
 void dyn_sysfs_engines(int i915, int engines, const char *file,
 		       void (*test)(int i915, int engine));
 
-- 
2.25.1


* [igt-dev] [PATCH i-g-t,v4 5/5] i915/gem_ccs: Add testing for CCS
  2021-12-10 13:05 [igt-dev] [PATCH i-g-t,v4 0/5] Add testing for CCS apoorva1.singh
                   ` (3 preceding siblings ...)
  2021-12-10 13:05 ` [igt-dev] [PATCH i-g-t, v4 4/5] i915/gem_engine_topology: Only use the main copy engines for XY_BLOCK_COPY apoorva1.singh
@ 2021-12-10 13:05 ` apoorva1.singh
  2021-12-10 13:45 ` [igt-dev] ✓ Fi.CI.BAT: success for Add testing for CCS (rev4) Patchwork
  2021-12-11  8:42 ` [igt-dev] ✗ Fi.CI.IGT: failure " Patchwork
  6 siblings, 0 replies; 14+ messages in thread
From: apoorva1.singh @ 2021-12-10 13:05 UTC (permalink / raw)
  To: apoorva1.singh, igt-dev, ramalingam.c, zbigniew.kempczynski,
	arjun.melkaveri

From: CQ Tang <cq.tang@intel.com>

Add gem_ccs test for CCS testing.
Commands are constructed with XY_BLOCK_COPY_BLT
and XY_CTRL_SURF_COPY_BLT instructions.

Signed-off-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: Apoorva Singh <apoorva1.singh@intel.com>
Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Cc: Melkaveri, Arjun <arjun.melkaveri@intel.com>
---
 tests/i915/gem_ccs.c | 406 +++++++++++++++++++++++++++++++++++++++++++
 tests/meson.build    |   1 +
 2 files changed, 407 insertions(+)
 create mode 100644 tests/i915/gem_ccs.c

diff --git a/tests/i915/gem_ccs.c b/tests/i915/gem_ccs.c
new file mode 100644
index 00000000..7d4aee1c
--- /dev/null
+++ b/tests/i915/gem_ccs.c
@@ -0,0 +1,406 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#include <errno.h>
+#include <sys/ioctl.h>
+#include <sys/time.h>
+#include <malloc.h>
+#include "drm.h"
+#include "igt.h"
+#include "i915/gem.h"
+#include "i915/gem_create.h"
+#include "lib/intel_chipset.h"
+#include "i915/i915_blt.h"
+
+IGT_TEST_DESCRIPTION("Exercise the memory bandwidth compression and "
+		     "decompression when copying data between "
+		     "system and local memory");
+
+static void igt_wr_xy_ctrl_surf_copy_blt(int fd, uint32_t region, uint32_t src, uint32_t dst,
+					 uint32_t out, uint64_t src_size, uint64_t dst_size,
+					 uint64_t out_size, uint64_t ahnd, int ccssize,
+					 uint8_t *pattern_buf, uint8_t *read_buf, int bosize,
+					 struct intel_execution_engine2 *e)
+{
+	gem_write(fd, src, 0, pattern_buf, bosize);
+
+	/*
+	 * 'dst' is lmem BO with ccs, directly
+	 * copy content in 'src' BO to 'dst' BO's ccs
+	 */
+	xy_ctrl_surf_copy_blt(fd, region, src, dst, src_size, dst_size,
+			      ahnd, ccssize, true, e);
+
+	memset(read_buf, 0, ccssize);
+	gem_write(fd, out, 0, read_buf, ccssize);
+
+	/* copy 'dst' BO's ccs into 'out' BO */
+	xy_ctrl_surf_copy_blt(fd, region, dst, out, dst_size, out_size,
+			      ahnd, ccssize, false, e);
+
+	gem_read(fd, out, 0, read_buf, ccssize);
+
+	igt_assert_eq(memcmp(read_buf, pattern_buf, ccssize), 0);
+}
+
+static void igt_overwritten_xy_block_copy_blt(int fd, uint32_t region, uint32_t src, uint32_t dst,
+					      uint32_t out, uint64_t src_size, uint64_t dst_size,
+					      uint64_t out_size, uint64_t ahnd, int ccssize,
+					      uint8_t *pattern_buf, uint8_t *read_buf, int bosize,
+					      struct intel_execution_engine2 *e)
+{
+	bool enable_compression = true;
+
+	igt_info("copy random pattern to lmem BO with compression\n");
+
+	gem_write(fd, src, 0, pattern_buf, bosize);
+	gem_write(fd, dst, 0, pattern_buf, bosize);
+
+	/*
+	 * 'dst' is lmem BO with ccs,
+	 * copy content in 'src' BO to 'dst' BO
+	 */
+	xy_block_copy_blt(fd, region, src, dst, src_size, dst_size, ahnd,
+			  bosize, SYS_TO_LOCAL, enable_compression, e);
+
+	memset(read_buf, 0, ccssize);
+	gem_write(fd, out, 0, read_buf, ccssize);
+
+	/* copy 'dst' BO's ccs into 'out' BO */
+	xy_ctrl_surf_copy_blt(fd, region, dst, out, dst_size, out_size,
+			      ahnd, ccssize, false, e);
+
+	gem_read(fd, out, 0, read_buf, ccssize);
+
+	igt_assert_neq(memcmp(read_buf, pattern_buf, ccssize), 0);
+
+	memset(read_buf, 0, bosize);
+	gem_read(fd, dst, 0, read_buf, bosize);
+
+	igt_assert_neq(memcmp(read_buf, pattern_buf, bosize), 0);
+
+	memset(read_buf, 0, bosize);
+	gem_write(fd, out, 0, read_buf, bosize);
+
+	/* copy 'dst' BO into 'out' BO */
+	xy_block_copy_blt(fd, region, dst, out, dst_size, out_size, ahnd,
+			  bosize, LOCAL_TO_SYS, enable_compression, e);
+
+	gem_read(fd, out, 0, read_buf, bosize);
+
+	igt_assert_eq(memcmp(read_buf, pattern_buf, bosize), 0);
+
+	/* decompress 'dst' in place */
+	xy_block_copy_blt(fd, region, dst, dst, dst_size, dst_size, ahnd,
+			  bosize, LOCAL_TO_LOCAL_INPLACE, enable_compression, e);
+
+	memset(read_buf, 0, bosize);
+	gem_read(fd, dst, 0, read_buf, bosize);
+
+	igt_assert_eq(memcmp(read_buf, pattern_buf, bosize), 0);
+
+	memset(read_buf, 0, bosize);
+	gem_write(fd, out, 0, read_buf, bosize);
+
+	/* copy decompressed 'dst' to 'out' */
+	xy_block_copy_blt(fd, region, dst, out, dst_size, out_size, ahnd,
+			  bosize, LOCAL_TO_SYS, enable_compression, e);
+
+	gem_read(fd, out, 0, read_buf, bosize);
+
+	igt_assert_eq(memcmp(read_buf, pattern_buf, bosize), 0);
+}
+
+static void igt_corrupted_xy_ctrl_surf_copy_blt(int fd, uint32_t region, uint32_t src, uint32_t dst,
+						uint32_t out, uint64_t src_size, uint64_t dst_size,
+						uint64_t out_size, uint64_t ahnd, int ccssize,
+						uint8_t *pattern_buf, uint8_t *read_buf, int bosize,
+						struct intel_execution_engine2 *e)
+{
+	bool enable_compression = true;
+
+	gem_write(fd, src, 0, pattern_buf, bosize);
+
+	igt_info("corrupt CCS via XY_CTRL_SURF_COPY_BLT\n");
+
+	/* corrupt 'dst' BO's ccs by writing directly */
+	xy_ctrl_surf_copy_blt(fd, region, src, dst, src_size, dst_size,
+			      ahnd, ccssize, true, e);
+
+	memset(read_buf, 0, bosize);
+	gem_write(fd, out, 0, read_buf, bosize);
+
+	/* copy 'dst' BO into 'out' BO */
+	xy_block_copy_blt(fd, region, dst, out, dst_size, out_size, ahnd,
+			  bosize, LOCAL_TO_SYS, enable_compression, e);
+
+	gem_read(fd, out, 0, read_buf, bosize);
+
+	igt_assert_neq(memcmp(read_buf, pattern_buf, bosize), 0);
+
+	memset(read_buf, 0, ccssize);
+	gem_write(fd, out, 0, read_buf, ccssize);
+
+	/* copy 'dst' BO's ccs into 'out' BO */
+	xy_ctrl_surf_copy_blt(fd, region, dst, out, dst_size, out_size,
+			      ahnd, ccssize, false, e);
+
+	gem_read(fd, out, 0, read_buf, ccssize);
+
+	igt_assert_eq(memcmp(read_buf, pattern_buf, ccssize), 0);
+}
+
+static void igt_copy_zero_pattern_xy_block_copy_blt(int fd, uint32_t region, uint32_t src,
+						    uint32_t dst, uint32_t out, uint64_t src_size,
+						    uint64_t dst_size, uint64_t out_size,
+						    uint64_t ahnd, int ccssize, uint8_t *pattern_buf,
+						    uint8_t *read_buf, uint8_t *input_buf,
+						    int bosize, struct intel_execution_engine2 *e)
+{
+	bool enable_compression = true;
+
+	igt_info("copy zeros pattern to lmem BO with compression\n");
+
+	gem_write(fd, src, 0, pattern_buf, bosize);
+	gem_write(fd, dst, 0, pattern_buf, bosize);
+	/* set ccs to random pattern */
+	xy_ctrl_surf_copy_blt(fd, region, src, dst, src_size, dst_size,
+			      ahnd, ccssize, true, e);
+
+	memset(input_buf, 0, bosize);
+	gem_write(fd, src, 0, input_buf, bosize);
+
+	/* copy 'src' to 'dst' with compression */
+	xy_block_copy_blt(fd, region, src, dst, src_size, dst_size,
+			  ahnd, bosize, SYS_TO_LOCAL, enable_compression, e);
+
+	gem_write(fd, out, 0, pattern_buf, bosize);
+
+	/* copy 'dst' BO back into 'out' BO */
+	xy_block_copy_blt(fd, region, dst, out, dst_size, out_size, ahnd,
+			  bosize, LOCAL_TO_SYS, enable_compression, e);
+
+	gem_read(fd, out, 0, read_buf, bosize);
+
+	igt_assert_eq(memcmp(read_buf, input_buf, bosize), 0);
+
+	memset(read_buf, 0, bosize);
+	gem_read(fd, dst, 0, read_buf, bosize);
+
+	igt_assert_eq(memcmp(read_buf, pattern_buf, bosize), 0);
+
+	memset(read_buf, 0, ccssize);
+	gem_write(fd, out, 0, read_buf, ccssize);
+
+	/* copy 'dst' BO's ccs into 'out' BO */
+	xy_ctrl_surf_copy_blt(fd, region, dst, out, dst_size, out_size,
+			      ahnd, ccssize, false, e);
+
+	gem_read(fd, out, 0, read_buf, ccssize);
+
+	igt_assert_neq(memcmp(read_buf, pattern_buf, ccssize), 0);
+}
+
+static void igt_copy_repeat_pattern_xy_block_copy_blt(int fd, uint32_t region, uint32_t src,
+						      uint32_t dst, uint32_t out, uint64_t src_size,
+						      uint64_t dst_size, uint64_t out_size,
+						      uint64_t ahnd, int ccssize,
+						      uint8_t *pattern_buf, uint8_t *read_buf,
+						      uint8_t *input_buf, int bosize,
+						      struct intel_execution_engine2 *e)
+{
+	int i;
+	bool enable_compression = true;
+
+	igt_info("copy repeat pattern to lmem BO with compression\n");
+
+	gem_write(fd, src, 0, pattern_buf, bosize);
+	gem_write(fd, src, 0, pattern_buf, ccssize);
+
+	xy_ctrl_surf_copy_blt(fd, region, src, dst, src_size, dst_size,
+			      ahnd, ccssize, true, e);
+	gem_write(fd, dst, 0, pattern_buf, bosize);
+
+	/* generate repeating pattern */
+	input_buf[0] = (uint8_t)rand();
+	input_buf[1] = (uint8_t)rand();
+	input_buf[2] = (uint8_t)rand();
+	input_buf[3] = (uint8_t)rand();
+	for (i = 4; i < bosize; i++)
+		input_buf[i] = input_buf[i - 4];
+	gem_write(fd, src, 0, input_buf, bosize);
+
+	xy_block_copy_blt(fd, region, src, dst, src_size, dst_size, ahnd,
+			  bosize, SYS_TO_LOCAL, enable_compression, e);
+
+	memset(read_buf, 0, bosize);
+	gem_write(fd, out, 0, read_buf, bosize);
+
+	/* copy 'dst' BO back into 'out' BO */
+	xy_block_copy_blt(fd, region, dst, out, dst_size, out_size, ahnd,
+			  bosize, LOCAL_TO_SYS, enable_compression, e);
+
+	gem_read(fd, out, 0, read_buf, bosize);
+
+	igt_assert_eq(memcmp(read_buf, input_buf, bosize), 0);
+
+	memset(read_buf, 0, bosize);
+	gem_read(fd, dst, 0, read_buf, bosize);
+
+	igt_assert_neq(memcmp(read_buf, pattern_buf, bosize), 0);
+
+	memset(read_buf, 0, ccssize);
+	gem_write(fd, out, 0, read_buf, ccssize);
+
+	/* copy 'dst' BO's ccs into 'out' BO */
+	xy_ctrl_surf_copy_blt(fd, region, dst, out, dst_size, out_size,
+			      ahnd, ccssize, false, e);
+
+	gem_read(fd, out, 0, read_buf, ccssize);
+
+	igt_assert_neq(memcmp(read_buf, pattern_buf, ccssize), 0);
+}
+
+/*
+ * Allocate a BO in SMEM.
+ * Fill it with a pattern.
+ * Use XY_BLOCK_COPY_BLT to copy it to LMEM with compression enabled.
+ * Clear the BO in SMEM, and the pattern_buf in which the pattern was
+ * stored.
+ * Use XY_BLOCK_COPY_BLT to copy it back to the BO in SMEM with
+ * resolve.
+ * Check that the contents of the BO in SMEM match the pattern.
+ */
+static void test_ccs(int fd, int size, uint32_t region, char *sub_name,
+		     struct intel_execution_engine2 *e)
+{
+	int bosize, ccssize, ret, i;
+	int start, end;
+	uint32_t src, dst, out;
+	uint8_t *pattern_buf, *input_buf, *read_buf;
+	uint64_t ahnd, src_size, dst_size, out_size;
+	struct timeval tv;
+
+	ahnd = get_reloc_ahnd(fd, 0);
+
+	if (size > 0) {
+		start = size;
+		end = start + 1;
+	} else {
+		start = BOSIZE_MIN;
+		end = BOSIZE_MAX;
+	}
+
+	for (bosize = start; bosize < end; bosize *= 2) {
+		/* allocate working buffers */
+		pattern_buf = malloc(bosize);
+		igt_assert(pattern_buf);
+		input_buf = malloc(bosize);
+		igt_assert(input_buf);
+		read_buf = malloc(bosize);
+		igt_assert(read_buf);
+
+		ccssize = bosize / CCS_RATIO;
+		src_size = dst_size = out_size = bosize;
+
+		/* allocate working BOs in the right location */
+		ret = __gem_create_in_memory_regions(fd, &src, &src_size,
+						   INTEL_MEMORY_REGION_ID(I915_SYSTEM_MEMORY, 0));
+		igt_assert_eq(ret, 0);
+
+		ret = __gem_create_in_memory_regions(fd, &dst, &dst_size, region);
+		igt_assert_eq(ret, 0);
+
+		ret = __gem_create_in_memory_regions(fd, &out, &out_size,
+						   INTEL_MEMORY_REGION_ID(I915_SYSTEM_MEMORY, 0));
+		igt_assert_eq(ret, 0);
+
+		/* fill in random pattern */
+		ret = gettimeofday(&tv, NULL);
+		igt_assert(!ret);
+		srand((unsigned int)tv.tv_usec);
+
+		for (i = 0; i < bosize; i++)
+			pattern_buf[i] = (uint8_t)rand();
+
+		igt_info("progress: bosize %d, ccssize %d\n", bosize, ccssize);
+
+		igt_dynamic_f("write_read_in_ccs_surface-%s-%s", sub_name, e->name)
+			igt_wr_xy_ctrl_surf_copy_blt(fd, region, src, dst, out, src_size, dst_size,
+						     out_size, ahnd, ccssize, pattern_buf, read_buf,
+						     bosize, e);
+		igt_dynamic_f("verify_compression_of_random_data-%s-%s", sub_name, e->name)
+			igt_overwritten_xy_block_copy_blt(fd, region, src, dst, out, src_size,
+							  dst_size, out_size, ahnd, ccssize,
+							  pattern_buf, read_buf, bosize, e);
+		igt_dynamic_f("verify_corrupted_pattern_in_ccs_surface-%s-%s", sub_name, e->name)
+			igt_corrupted_xy_ctrl_surf_copy_blt(fd, region, src, dst, out, src_size,
+							    dst_size, out_size, ahnd, ccssize,
+							    pattern_buf, read_buf, bosize, e);
+		igt_dynamic_f("copy_zero_pattern_with_compression-%s-%s", sub_name, e->name)
+			igt_copy_zero_pattern_xy_block_copy_blt(fd, region, src, dst, out, src_size,
+								dst_size, out_size, ahnd, ccssize,
+								pattern_buf, read_buf, input_buf,
+								bosize, e);
+		igt_dynamic_f("copy_repeat_pattern_with_compression-%s-%s", sub_name, e->name)
+			igt_copy_repeat_pattern_xy_block_copy_blt(fd, region, src, dst, out,
+								  src_size, dst_size, out_size,
+								  ahnd, ccssize, pattern_buf,
+								  read_buf, input_buf, bosize, e);
+
+		gem_close(fd, out);
+		gem_close(fd, dst);
+		gem_close(fd, src);
+		free(read_buf);
+		free(input_buf);
+		free(pattern_buf);
+	}
+
+	put_ahnd(ahnd);
+}
+
+#define test_each_physical_engine(T, size) \
+	igt_subtest_with_dynamic(T) \
+		for_each_physical_engine(drm_fd, e) { \
+			if (!gem_engine_can_block_copy(drm_fd, e)) \
+				continue; \
+			for_each_combination(regions, 1, set) { \
+				sub_name = memregion_dynamic_subtest_name(regions); \
+				region = igt_collection_get_value(regions, 0); \
+				test_ccs(drm_fd, size, region, sub_name, e); \
+				free(sub_name); \
+			} \
+		}
+
+igt_main
+{
+	struct drm_i915_query_memory_regions *query_info;
+	struct intel_execution_engine2 *e;
+	struct igt_collection *regions, *set;
+	char *sub_name;
+	uint32_t region;
+	int drm_fd;
+
+	igt_fixture {
+		drm_fd = drm_open_driver(DRIVER_INTEL);
+		igt_require_gem(drm_fd);
+		igt_require(AT_LEAST_GEN(intel_get_drm_devid(drm_fd), 12));
+		igt_require(HAS_FLAT_CCS(intel_get_drm_devid(drm_fd)));
+
+		query_info = gem_get_query_memory_regions(drm_fd);
+		igt_require(query_info);
+
+		set = get_memory_region_set(query_info, I915_DEVICE_MEMORY);
+	}
+
+	test_each_physical_engine("basic-gem-ccs-4K", 4 * 1024);
+	test_each_physical_engine("basic-gem-ccs-64K", 64 * 1024);
+	test_each_physical_engine("basic-gem-ccs-1M", 1024 * 1024);
+	test_each_physical_engine("basic-gem-ccs-all", 0);
+
+	igt_fixture {
+		close(drm_fd);
+	}
+}
diff --git a/tests/meson.build b/tests/meson.build
index c14acf99..e0b48d66 100644
--- a/tests/meson.build
+++ b/tests/meson.build
@@ -107,6 +107,7 @@ i915_progs = [
 	'gem_blits',
 	'gem_busy',
 	'gem_caching',
+	'gem_ccs',
 	'gem_close',
 	'gem_close_race',
 	'gem_concurrent_blit',
-- 
2.25.1

* [igt-dev] ✓ Fi.CI.BAT: success for Add testing for CCS (rev4)
  2021-12-10 13:05 [igt-dev] [PATCH i-g-t,v4 0/5] Add testing for CCS apoorva1.singh
                   ` (4 preceding siblings ...)
  2021-12-10 13:05 ` [igt-dev] [PATCH i-g-t,v4 5/5] i915/gem_ccs: Add testing for CCS apoorva1.singh
@ 2021-12-10 13:45 ` Patchwork
  2021-12-11  8:42 ` [igt-dev] ✗ Fi.CI.IGT: failure " Patchwork
  6 siblings, 0 replies; 14+ messages in thread
From: Patchwork @ 2021-12-10 13:45 UTC (permalink / raw)
  To: apoorva1.singh; +Cc: igt-dev

== Series Details ==

Series: Add testing for CCS (rev4)
URL   : https://patchwork.freedesktop.org/series/96648/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_10988 -> IGTPW_6487
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/index.html

Participating hosts (45 -> 35)
------------------------------

  Missing    (10): fi-ilk-m540 bat-dg1-6 bat-dg1-5 fi-hsw-4200u fi-bsw-cyan bat-adlp-6 bat-adlp-4 fi-ctg-p8600 bat-jsl-2 fi-bdw-samus 

Known issues
------------

  Here are the changes found in IGTPW_6487 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_flink_basic@bad-flink:
    - fi-skl-6600u:       [PASS][1] -> [FAIL][2] ([i915#4547])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10988/fi-skl-6600u/igt@gem_flink_basic@bad-flink.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/fi-skl-6600u/igt@gem_flink_basic@bad-flink.html

  * igt@kms_frontbuffer_tracking@basic:
    - fi-cml-u2:          [PASS][3] -> [DMESG-WARN][4] ([i915#4269])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10988/fi-cml-u2/igt@kms_frontbuffer_tracking@basic.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/fi-cml-u2/igt@kms_frontbuffer_tracking@basic.html

  
#### Possible fixes ####

  * igt@core_hotunplug@unbind-rebind:
    - fi-tgl-u2:          [INCOMPLETE][5] ([i915#4006]) -> [PASS][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10988/fi-tgl-u2/igt@core_hotunplug@unbind-rebind.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/fi-tgl-u2/igt@core_hotunplug@unbind-rebind.html

  
  [i915#4006]: https://gitlab.freedesktop.org/drm/intel/issues/4006
  [i915#4269]: https://gitlab.freedesktop.org/drm/intel/issues/4269
  [i915#4547]: https://gitlab.freedesktop.org/drm/intel/issues/4547


Build changes
-------------

  * CI: CI-20190529 -> None
  * IGT: IGT_6305 -> IGTPW_6487

  CI-20190529: 20190529
  CI_DRM_10988: 24a4093e85c578905d39ebe14225dbeb5b6f07d5 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_6487: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/index.html
  IGT_6305: 136258e86a093fdb50a7a341de1c09ac9a076fea @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git



== Testlist changes ==

+igt@gem_ccs@basic-gem-ccs-1m
+igt@gem_ccs@basic-gem-ccs-4k
+igt@gem_ccs@basic-gem-ccs-64k
+igt@gem_ccs@basic-gem-ccs-all

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/index.html

* [igt-dev] ✗ Fi.CI.IGT: failure for Add testing for CCS (rev4)
  2021-12-10 13:05 [igt-dev] [PATCH i-g-t,v4 0/5] Add testing for CCS apoorva1.singh
                   ` (5 preceding siblings ...)
  2021-12-10 13:45 ` [igt-dev] ✓ Fi.CI.BAT: success for Add testing for CCS (rev4) Patchwork
@ 2021-12-11  8:42 ` Patchwork
  6 siblings, 0 replies; 14+ messages in thread
From: Patchwork @ 2021-12-11  8:42 UTC (permalink / raw)
  To: apoorva1.singh; +Cc: igt-dev

== Series Details ==

Series: Add testing for CCS (rev4)
URL   : https://patchwork.freedesktop.org/series/96648/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_10988_full -> IGTPW_6487_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with IGTPW_6487_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in IGTPW_6487_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/index.html

Participating hosts (10 -> 7)
------------------------------

  Missing    (3): pig-skl-6260u pig-glk-j5005 shard-rkl 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in IGTPW_6487_full:

### IGT changes ###

#### Possible regressions ####

  * {igt@gem_ccs@basic-gem-ccs-4k} (NEW):
    - shard-iclb:         NOTRUN -> [SKIP][1] +3 similar issues
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb3/igt@gem_ccs@basic-gem-ccs-4k.html

  * {igt@gem_ccs@basic-gem-ccs-all} (NEW):
    - shard-tglb:         NOTRUN -> [SKIP][2] +3 similar issues
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb2/igt@gem_ccs@basic-gem-ccs-all.html

  * igt@kms_ccs@pipe-a-crc-primary-basic-yf_tiled_ccs:
    - shard-glk:          [PASS][3] -> [FAIL][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10988/shard-glk6/igt@kms_ccs@pipe-a-crc-primary-basic-yf_tiled_ccs.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-glk5/igt@kms_ccs@pipe-a-crc-primary-basic-yf_tiled_ccs.html

  
New tests
---------

  New tests have been introduced between CI_DRM_10988_full and IGTPW_6487_full:

### New IGT tests (4) ###

  * igt@gem_ccs@basic-gem-ccs-1m:
    - Statuses : 6 skip(s)
    - Exec time: [0.0] s

  * igt@gem_ccs@basic-gem-ccs-4k:
    - Statuses : 6 skip(s)
    - Exec time: [0.0] s

  * igt@gem_ccs@basic-gem-ccs-64k:
    - Statuses : 6 skip(s)
    - Exec time: [0.0] s

  * igt@gem_ccs@basic-gem-ccs-all:
    - Statuses : 6 skip(s)
    - Exec time: [0.0] s

  

Known issues
------------

  Here are the changes found in IGTPW_6487_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_ctx_param@set-priority-not-supported:
    - shard-tglb:         NOTRUN -> [SKIP][5] ([fdo#109314])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb2/igt@gem_ctx_param@set-priority-not-supported.html
    - shard-iclb:         NOTRUN -> [SKIP][6] ([fdo#109314])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb1/igt@gem_ctx_param@set-priority-not-supported.html

  * igt@gem_ctx_persistence@engines-hostile-preempt:
    - shard-snb:          NOTRUN -> [SKIP][7] ([fdo#109271] / [i915#1099]) +2 similar issues
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-snb7/igt@gem_ctx_persistence@engines-hostile-preempt.html

  * igt@gem_ctx_sseu@engines:
    - shard-tglb:         NOTRUN -> [SKIP][8] ([i915#280])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb8/igt@gem_ctx_sseu@engines.html

  * igt@gem_eio@unwedge-stress:
    - shard-iclb:         [PASS][9] -> [TIMEOUT][10] ([i915#2481] / [i915#3070])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10988/shard-iclb8/igt@gem_eio@unwedge-stress.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb5/igt@gem_eio@unwedge-stress.html

  * igt@gem_exec_fair@basic-deadline:
    - shard-kbl:          NOTRUN -> [FAIL][11] ([i915#2846])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-kbl2/igt@gem_exec_fair@basic-deadline.html
    - shard-glk:          NOTRUN -> [FAIL][12] ([i915#2846])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-glk2/igt@gem_exec_fair@basic-deadline.html
    - shard-apl:          NOTRUN -> [FAIL][13] ([i915#2846])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-apl1/igt@gem_exec_fair@basic-deadline.html

  * igt@gem_exec_fair@basic-none-share@rcs0:
    - shard-kbl:          NOTRUN -> [FAIL][14] ([i915#2842])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-kbl3/igt@gem_exec_fair@basic-none-share@rcs0.html

  * igt@gem_exec_fair@basic-none-vip@rcs0:
    - shard-kbl:          [PASS][15] -> [FAIL][16] ([i915#2842])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10988/shard-kbl6/igt@gem_exec_fair@basic-none-vip@rcs0.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-kbl6/igt@gem_exec_fair@basic-none-vip@rcs0.html

  * igt@gem_exec_fair@basic-none@vcs1:
    - shard-iclb:         NOTRUN -> [FAIL][17] ([i915#2842])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb1/igt@gem_exec_fair@basic-none@vcs1.html

  * igt@gem_exec_fair@basic-pace@bcs0:
    - shard-iclb:         [PASS][18] -> [FAIL][19] ([i915#2842])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10988/shard-iclb7/igt@gem_exec_fair@basic-pace@bcs0.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb6/igt@gem_exec_fair@basic-pace@bcs0.html

  * igt@gem_huc_copy@huc-copy:
    - shard-tglb:         [PASS][20] -> [SKIP][21] ([i915#2190])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10988/shard-tglb8/igt@gem_huc_copy@huc-copy.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb6/igt@gem_huc_copy@huc-copy.html

  * igt@gem_lmem_swapping@heavy-random:
    - shard-iclb:         NOTRUN -> [SKIP][22] ([i915#4613])
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb6/igt@gem_lmem_swapping@heavy-random.html
    - shard-tglb:         NOTRUN -> [SKIP][23] ([i915#4613])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb2/igt@gem_lmem_swapping@heavy-random.html
    - shard-apl:          NOTRUN -> [SKIP][24] ([fdo#109271] / [i915#4613])
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-apl8/igt@gem_lmem_swapping@heavy-random.html
    - shard-glk:          NOTRUN -> [SKIP][25] ([fdo#109271] / [i915#4613])
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-glk4/igt@gem_lmem_swapping@heavy-random.html

  * igt@gem_lmem_swapping@parallel-random-verify:
    - shard-kbl:          NOTRUN -> [SKIP][26] ([fdo#109271] / [i915#4613]) +2 similar issues
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-kbl1/igt@gem_lmem_swapping@parallel-random-verify.html

  * igt@gem_ppgtt@flink-and-close-vma-leak:
    - shard-glk:          [PASS][27] -> [FAIL][28] ([i915#644])
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10988/shard-glk8/igt@gem_ppgtt@flink-and-close-vma-leak.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-glk7/igt@gem_ppgtt@flink-and-close-vma-leak.html

  * igt@gem_pxp@verify-pxp-key-change-after-suspend-resume:
    - shard-tglb:         NOTRUN -> [SKIP][29] ([i915#4270])
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb2/igt@gem_pxp@verify-pxp-key-change-after-suspend-resume.html
    - shard-iclb:         NOTRUN -> [SKIP][30] ([i915#4270])
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb7/igt@gem_pxp@verify-pxp-key-change-after-suspend-resume.html

  * igt@gem_render_copy@y-tiled-mc-ccs-to-vebox-y-tiled:
    - shard-iclb:         NOTRUN -> [SKIP][31] ([i915#768])
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb1/igt@gem_render_copy@y-tiled-mc-ccs-to-vebox-y-tiled.html

  * igt@gen7_exec_parse@basic-offset:
    - shard-tglb:         NOTRUN -> [SKIP][32] ([fdo#109289])
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb7/igt@gen7_exec_parse@basic-offset.html
    - shard-iclb:         NOTRUN -> [SKIP][33] ([fdo#109289])
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb6/igt@gen7_exec_parse@basic-offset.html

  * igt@gen9_exec_parse@unaligned-access:
    - shard-iclb:         NOTRUN -> [SKIP][34] ([i915#2856])
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb4/igt@gen9_exec_parse@unaligned-access.html
    - shard-tglb:         NOTRUN -> [SKIP][35] ([i915#2856])
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb5/igt@gen9_exec_parse@unaligned-access.html

  * igt@i915_pm_dc@dc9-dpms:
    - shard-apl:          [PASS][36] -> [SKIP][37] ([fdo#109271])
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10988/shard-apl6/igt@i915_pm_dc@dc9-dpms.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-apl7/igt@i915_pm_dc@dc9-dpms.html

  * igt@i915_pm_lpsp@kms-lpsp@kms-lpsp-dp:
    - shard-kbl:          NOTRUN -> [SKIP][38] ([fdo#109271] / [i915#1937])
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-kbl3/igt@i915_pm_lpsp@kms-lpsp@kms-lpsp-dp.html

  * igt@i915_pm_rpm@modeset-pc8-residency-stress:
    - shard-tglb:         NOTRUN -> [SKIP][39] ([fdo#109506] / [i915#2411])
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb6/igt@i915_pm_rpm@modeset-pc8-residency-stress.html
    - shard-iclb:         NOTRUN -> [SKIP][40] ([fdo#109293] / [fdo#109506])
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb2/igt@i915_pm_rpm@modeset-pc8-residency-stress.html

  * igt@i915_suspend@debugfs-reader:
    - shard-kbl:          [PASS][41] -> [INCOMPLETE][42] ([i915#3614])
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10988/shard-kbl2/igt@i915_suspend@debugfs-reader.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-kbl2/igt@i915_suspend@debugfs-reader.html

  * igt@i915_suspend@sysfs-reader:
    - shard-kbl:          NOTRUN -> [DMESG-WARN][43] ([i915#180]) +1 similar issue
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-kbl7/igt@i915_suspend@sysfs-reader.html

  * igt@kms_addfb_basic@invalid-smem-bo-on-discrete:
    - shard-tglb:         NOTRUN -> [SKIP][44] ([i915#3826])
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb2/igt@kms_addfb_basic@invalid-smem-bo-on-discrete.html
    - shard-iclb:         NOTRUN -> [SKIP][45] ([i915#3826])
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb1/igt@kms_addfb_basic@invalid-smem-bo-on-discrete.html

  * igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-180-hflip:
    - shard-kbl:          NOTRUN -> [SKIP][46] ([fdo#109271] / [i915#3777]) +4 similar issues
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-kbl2/igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-180-hflip.html

  * igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-hflip:
    - shard-apl:          NOTRUN -> [SKIP][47] ([fdo#109271] / [i915#3777])
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-apl8/igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-hflip.html

  * igt@kms_big_fb@yf-tiled-16bpp-rotate-0:
    - shard-glk:          [PASS][48] -> [DMESG-WARN][49] ([i915#118])
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10988/shard-glk8/igt@kms_big_fb@yf-tiled-16bpp-rotate-0.html
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-glk5/igt@kms_big_fb@yf-tiled-16bpp-rotate-0.html

  * igt@kms_big_fb@yf-tiled-64bpp-rotate-270:
    - shard-tglb:         NOTRUN -> [SKIP][50] ([fdo#111615]) +4 similar issues
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb3/igt@kms_big_fb@yf-tiled-64bpp-rotate-270.html

  * igt@kms_big_fb@yf-tiled-64bpp-rotate-90:
    - shard-iclb:         NOTRUN -> [SKIP][51] ([fdo#110723]) +2 similar issues
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb3/igt@kms_big_fb@yf-tiled-64bpp-rotate-90.html

  * igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-0-hflip-async-flip:
    - shard-kbl:          NOTRUN -> [SKIP][52] ([fdo#109271]) +271 similar issues
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-kbl4/igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-0-hflip-async-flip.html

  * igt@kms_ccs@pipe-a-bad-pixel-format-y_tiled_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][53] ([i915#3689]) +3 similar issues
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb8/igt@kms_ccs@pipe-a-bad-pixel-format-y_tiled_ccs.html

  * igt@kms_ccs@pipe-a-crc-sprite-planes-basic-y_tiled_gen12_rc_ccs_cc:
    - shard-kbl:          NOTRUN -> [SKIP][54] ([fdo#109271] / [i915#3886]) +10 similar issues
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-kbl1/igt@kms_ccs@pipe-a-crc-sprite-planes-basic-y_tiled_gen12_rc_ccs_cc.html

  * igt@kms_ccs@pipe-a-random-ccs-data-y_tiled_gen12_rc_ccs:
    - shard-glk:          NOTRUN -> [SKIP][55] ([fdo#109271]) +71 similar issues
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-glk4/igt@kms_ccs@pipe-a-random-ccs-data-y_tiled_gen12_rc_ccs.html

  * igt@kms_ccs@pipe-b-random-ccs-data-yf_tiled_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][56] ([fdo#111615] / [i915#3689]) +3 similar issues
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb2/igt@kms_ccs@pipe-b-random-ccs-data-yf_tiled_ccs.html

  * igt@kms_ccs@pipe-c-bad-pixel-format-y_tiled_gen12_mc_ccs:
    - shard-iclb:         NOTRUN -> [SKIP][57] ([fdo#109278] / [i915#3886]) +3 similar issues
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb8/igt@kms_ccs@pipe-c-bad-pixel-format-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-c-bad-rotation-90-y_tiled_gen12_mc_ccs:
    - shard-apl:          NOTRUN -> [SKIP][58] ([fdo#109271] / [i915#3886]) +8 similar issues
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-apl1/igt@kms_ccs@pipe-c-bad-rotation-90-y_tiled_gen12_mc_ccs.html
    - shard-glk:          NOTRUN -> [SKIP][59] ([fdo#109271] / [i915#3886]) +3 similar issues
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-glk2/igt@kms_ccs@pipe-c-bad-rotation-90-y_tiled_gen12_mc_ccs.html
    - shard-tglb:         NOTRUN -> [SKIP][60] ([i915#3689] / [i915#3886]) +2 similar issues
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb5/igt@kms_ccs@pipe-c-bad-rotation-90-y_tiled_gen12_mc_ccs.html

  * igt@kms_cdclk@mode-transition:
    - shard-apl:          NOTRUN -> [SKIP][61] ([fdo#109271]) +186 similar issues
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-apl2/igt@kms_cdclk@mode-transition.html

  * igt@kms_chamelium@hdmi-edid-read:
    - shard-tglb:         NOTRUN -> [SKIP][62] ([fdo#109284] / [fdo#111827]) +5 similar issues
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb5/igt@kms_chamelium@hdmi-edid-read.html

  * igt@kms_chamelium@hdmi-hpd-storm:
    - shard-kbl:          NOTRUN -> [SKIP][63] ([fdo#109271] / [fdo#111827]) +20 similar issues
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-kbl3/igt@kms_chamelium@hdmi-hpd-storm.html

  * igt@kms_color_chamelium@pipe-b-ctm-red-to-blue:
    - shard-iclb:         NOTRUN -> [SKIP][64] ([fdo#109284] / [fdo#111827]) +3 similar issues
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb4/igt@kms_color_chamelium@pipe-b-ctm-red-to-blue.html
    - shard-apl:          NOTRUN -> [SKIP][65] ([fdo#109271] / [fdo#111827]) +13 similar issues
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-apl7/igt@kms_color_chamelium@pipe-b-ctm-red-to-blue.html

  * igt@kms_color_chamelium@pipe-c-ctm-green-to-red:
    - shard-snb:          NOTRUN -> [SKIP][66] ([fdo#109271] / [fdo#111827]) +7 similar issues
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-snb2/igt@kms_color_chamelium@pipe-c-ctm-green-to-red.html

  * igt@kms_color_chamelium@pipe-d-ctm-0-75:
    - shard-glk:          NOTRUN -> [SKIP][67] ([fdo#109271] / [fdo#111827]) +4 similar issues
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-glk2/igt@kms_color_chamelium@pipe-d-ctm-0-75.html
    - shard-iclb:         NOTRUN -> [SKIP][68] ([fdo#109278] / [fdo#109284] / [fdo#111827])
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb5/igt@kms_color_chamelium@pipe-d-ctm-0-75.html

  * igt@kms_content_protection@legacy:
    - shard-kbl:          NOTRUN -> [TIMEOUT][69] ([i915#1319]) +1 similar issue
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-kbl4/igt@kms_content_protection@legacy.html

  * igt@kms_content_protection@lic:
    - shard-apl:          NOTRUN -> [TIMEOUT][70] ([i915#1319])
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-apl2/igt@kms_content_protection@lic.html
    - shard-iclb:         NOTRUN -> [SKIP][71] ([fdo#109300] / [fdo#111066])
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb2/igt@kms_content_protection@lic.html
    - shard-tglb:         NOTRUN -> [SKIP][72] ([fdo#111828])
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb6/igt@kms_content_protection@lic.html

  * igt@kms_content_protection@uevent:
    - shard-kbl:          NOTRUN -> [FAIL][73] ([i915#2105])
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-kbl2/igt@kms_content_protection@uevent.html

  * igt@kms_cursor_crc@pipe-b-cursor-32x32-rapid-movement:
    - shard-tglb:         NOTRUN -> [SKIP][74] ([i915#3319]) +1 similar issue
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb2/igt@kms_cursor_crc@pipe-b-cursor-32x32-rapid-movement.html

  * igt@kms_cursor_crc@pipe-b-cursor-512x170-offscreen:
    - shard-iclb:         NOTRUN -> [SKIP][75] ([fdo#109278] / [fdo#109279]) +1 similar issue
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb3/igt@kms_cursor_crc@pipe-b-cursor-512x170-offscreen.html

  * igt@kms_cursor_crc@pipe-c-cursor-512x170-offscreen:
    - shard-tglb:         NOTRUN -> [SKIP][76] ([fdo#109279] / [i915#3359]) +3 similar issues
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb5/igt@kms_cursor_crc@pipe-c-cursor-512x170-offscreen.html

  * igt@kms_cursor_crc@pipe-c-cursor-max-size-rapid-movement:
    - shard-tglb:         NOTRUN -> [SKIP][77] ([i915#3359]) +3 similar issues
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb8/igt@kms_cursor_crc@pipe-c-cursor-max-size-rapid-movement.html

  * igt@kms_cursor_crc@pipe-d-cursor-64x64-random:
    - shard-iclb:         NOTRUN -> [SKIP][78] ([fdo#109278]) +18 similar issues
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb7/igt@kms_cursor_crc@pipe-d-cursor-64x64-random.html

  * igt@kms_cursor_edge_walk@pipe-d-128x128-right-edge:
    - shard-snb:          NOTRUN -> [SKIP][79] ([fdo#109271]) +172 similar issues
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-snb5/igt@kms_cursor_edge_walk@pipe-d-128x128-right-edge.html

  * igt@kms_cursor_legacy@cursorb-vs-flipa-atomic-transitions-varying-size:
    - shard-iclb:         NOTRUN -> [SKIP][80] ([fdo#109274] / [fdo#109278])
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb8/igt@kms_cursor_legacy@cursorb-vs-flipa-atomic-transitions-varying-size.html

  * igt@kms_cursor_legacy@pipe-d-single-bo:
    - shard-kbl:          NOTRUN -> [SKIP][81] ([fdo#109271] / [i915#533]) +1 similar issue
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-kbl6/igt@kms_cursor_legacy@pipe-d-single-bo.html

  * igt@kms_dp_tiled_display@basic-test-pattern:
    - shard-iclb:         NOTRUN -> [SKIP][82] ([i915#426])
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb4/igt@kms_dp_tiled_display@basic-test-pattern.html
    - shard-tglb:         NOTRUN -> [SKIP][83] ([i915#426])
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb5/igt@kms_dp_tiled_display@basic-test-pattern.html

  * igt@kms_flip@2x-blocking-wf_vblank:
    - shard-iclb:         NOTRUN -> [SKIP][84] ([fdo#109274]) +2 similar issues
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb4/igt@kms_flip@2x-blocking-wf_vblank.html

  * igt@kms_flip_tiling@flip-change-tiling@dp-1-pipe-a-y-to-yf-ccs:
    - shard-apl:          NOTRUN -> [DMESG-WARN][85] ([i915#1226])
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-apl2/igt@kms_flip_tiling@flip-change-tiling@dp-1-pipe-a-y-to-yf-ccs.html

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-cur-indfb-onoff:
    - shard-tglb:         NOTRUN -> [SKIP][86] ([fdo#111825]) +23 similar issues
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb1/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-cur-indfb-onoff.html

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-spr-indfb-draw-pwrite:
    - shard-iclb:         NOTRUN -> [SKIP][87] ([fdo#109280]) +17 similar issues
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb7/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-spr-indfb-draw-pwrite.html

  * igt@kms_hdr@static-toggle-suspend:
    - shard-tglb:         NOTRUN -> [SKIP][88] ([i915#1187])
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb3/igt@kms_hdr@static-toggle-suspend.html

  * igt@kms_pipe_crc_basic@disable-crc-after-crtc-pipe-d:
    - shard-glk:          NOTRUN -> [SKIP][89] ([fdo#109271] / [i915#533])
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-glk3/igt@kms_pipe_crc_basic@disable-crc-after-crtc-pipe-d.html
    - shard-apl:          NOTRUN -> [SKIP][90] ([fdo#109271] / [i915#533]) +1 similar issue
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-apl3/igt@kms_pipe_crc_basic@disable-crc-after-crtc-pipe-d.html

  * igt@kms_plane_alpha_blend@pipe-a-alpha-transparent-fb:
    - shard-apl:          NOTRUN -> [FAIL][91] ([i915#265])
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-apl7/igt@kms_plane_alpha_blend@pipe-a-alpha-transparent-fb.html

  * igt@kms_plane_alpha_blend@pipe-b-alpha-basic:
    - shard-glk:          NOTRUN -> [FAIL][92] ([fdo#108145] / [i915#265]) +1 similar issue
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-glk8/igt@kms_plane_alpha_blend@pipe-b-alpha-basic.html
    - shard-apl:          NOTRUN -> [FAIL][93] ([fdo#108145] / [i915#265]) +1 similar issue
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-apl3/igt@kms_plane_alpha_blend@pipe-b-alpha-basic.html

  * igt@kms_plane_alpha_blend@pipe-b-alpha-transparent-fb:
    - shard-kbl:          NOTRUN -> [FAIL][94] ([i915#265])
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-kbl4/igt@kms_plane_alpha_blend@pipe-b-alpha-transparent-fb.html

  * igt@kms_plane_alpha_blend@pipe-c-alpha-opaque-fb:
    - shard-kbl:          NOTRUN -> [FAIL][95] ([fdo#108145] / [i915#265]) +4 similar issues
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-kbl1/igt@kms_plane_alpha_blend@pipe-c-alpha-opaque-fb.html

  * igt@kms_plane_lowres@pipe-b-tiling-none:
    - shard-tglb:         NOTRUN -> [SKIP][96] ([i915#3536])
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb6/igt@kms_plane_lowres@pipe-b-tiling-none.html
    - shard-iclb:         NOTRUN -> [SKIP][97] ([i915#3536])
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb8/igt@kms_plane_lowres@pipe-b-tiling-none.html

  * igt@kms_plane_multiple@atomic-pipe-d-tiling-yf:
    - shard-tglb:         NOTRUN -> [SKIP][98] ([fdo#111615] / [fdo#112054])
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb2/igt@kms_plane_multiple@atomic-pipe-d-tiling-yf.html

  * igt@kms_plane_scaling@scaler-with-clipping-clamping@pipe-c-scaler-with-clipping-clamping:
    - shard-apl:          NOTRUN -> [SKIP][99] ([fdo#109271] / [i915#2733])
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-apl4/igt@kms_plane_scaling@scaler-with-clipping-clamping@pipe-c-scaler-with-clipping-clamping.html
    - shard-kbl:          NOTRUN -> [SKIP][100] ([fdo#109271] / [i915#2733])
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-kbl3/igt@kms_plane_scaling@scaler-with-clipping-clamping@pipe-c-scaler-with-clipping-clamping.html
    - shard-glk:          NOTRUN -> [SKIP][101] ([fdo#109271] / [i915#2733])
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-glk6/igt@kms_plane_scaling@scaler-with-clipping-clamping@pipe-c-scaler-with-clipping-clamping.html

  * igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area-3:
    - shard-kbl:          NOTRUN -> [SKIP][102] ([fdo#109271] / [i915#658]) +4 similar issues
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-kbl7/igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area-3.html

  * igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area-4:
    - shard-apl:          NOTRUN -> [SKIP][103] ([fdo#109271] / [i915#658]) +1 similar issue
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-apl8/igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area-4.html

  * igt@kms_psr@psr2_no_drrs:
    - shard-iclb:         NOTRUN -> [SKIP][104] ([fdo#109441]) +1 similar issue
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb5/igt@kms_psr@psr2_no_drrs.html

  * igt@kms_psr@psr2_primary_render:
    - shard-tglb:         NOTRUN -> [FAIL][105] ([i915#132] / [i915#3467]) +1 similar issue
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb8/igt@kms_psr@psr2_primary_render.html

  * igt@kms_tv_load_detect@load-detect:
    - shard-tglb:         NOTRUN -> [SKIP][106] ([fdo#109309])
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb5/igt@kms_tv_load_detect@load-detect.html
    - shard-iclb:         NOTRUN -> [SKIP][107] ([fdo#109309])
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb1/igt@kms_tv_load_detect@load-detect.html

  * igt@kms_vblank@pipe-b-ts-continuation-suspend:
    - shard-kbl:          [PASS][108] -> [DMESG-WARN][109] ([i915#180]) +1 similar issue
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10988/shard-kbl6/igt@kms_vblank@pipe-b-ts-continuation-suspend.html
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-kbl6/igt@kms_vblank@pipe-b-ts-continuation-suspend.html

  * igt@kms_writeback@writeback-check-output:
    - shard-kbl:          NOTRUN -> [SKIP][110] ([fdo#109271] / [i915#2437])
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-kbl2/igt@kms_writeback@writeback-check-output.html

  * igt@nouveau_crc@pipe-c-source-outp-complete:
    - shard-tglb:         NOTRUN -> [SKIP][111] ([i915#2530]) +3 similar issues
   [111]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb3/igt@nouveau_crc@pipe-c-source-outp-complete.html

  * igt@nouveau_crc@pipe-c-source-rg:
    - shard-iclb:         NOTRUN -> [SKIP][112] ([i915#2530]) +2 similar issues
   [112]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb5/igt@nouveau_crc@pipe-c-source-rg.html

  * igt@perf@polling-parameterized:
    - shard-glk:          [PASS][113] -> [FAIL][114] ([i915#1542])
   [113]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10988/shard-glk7/igt@perf@polling-parameterized.html
   [114]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-glk3/igt@perf@polling-parameterized.html
    - shard-iclb:         [PASS][115] -> [FAIL][116] ([i915#1542])
   [115]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10988/shard-iclb2/igt@perf@polling-parameterized.html
   [116]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb8/igt@perf@polling-parameterized.html

  * igt@prime_nv_api@i915_self_import:
    - shard-tglb:         NOTRUN -> [SKIP][117] ([fdo#109291]) +3 similar issues
   [117]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb1/igt@prime_nv_api@i915_self_import.html

  * igt@prime_nv_api@nv_i915_import_twice_check_flink_name:
    - shard-iclb:         NOTRUN -> [SKIP][118] ([fdo#109291]) +2 similar issues
   [118]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb3/igt@prime_nv_api@nv_i915_import_twice_check_flink_name.html

  * igt@sysfs_clients@fair-7:
    - shard-apl:          NOTRUN -> [SKIP][119] ([fdo#109271] / [i915#2994]) +1 similar issue
   [119]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-apl7/igt@sysfs_clients@fair-7.html
    - shard-iclb:         NOTRUN -> [SKIP][120] ([i915#2994]) +1 similar issue
   [120]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-iclb4/igt@sysfs_clients@fair-7.html
    - shard-tglb:         NOTRUN -> [SKIP][121] ([i915#2994]) +2 similar issues
   [121]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb3/igt@sysfs_clients@fair-7.html

  * igt@sysfs_clients@split-10:
    - shard-glk:          NOTRUN -> [SKIP][122] ([fdo#109271] / [i915#2994]) +1 similar issue
   [122]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-glk8/igt@sysfs_clients@split-10.html

  * igt@sysfs_clients@split-50:
    - shard-kbl:          NOTRUN -> [SKIP][123] ([fdo#109271] / [i915#2994]) +3 similar issues
   [123]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-kbl4/igt@sysfs_clients@split-50.html

  
#### Possible fixes ####

  * igt@gem_exec_fair@basic-none-solo@rcs0:
    - shard-kbl:          [FAIL][124] ([i915#2842]) -> [PASS][125]
   [124]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10988/shard-kbl7/igt@gem_exec_fair@basic-none-solo@rcs0.html
   [125]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-kbl1/igt@gem_exec_fair@basic-none-solo@rcs0.html

  * igt@gem_exec_fair@basic-none-vip@rcs0:
    - shard-tglb:         [FAIL][126] ([i915#2842]) -> [PASS][127] +4 similar issues
   [126]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10988/shard-tglb3/igt@gem_exec_fair@basic-none-vip@rcs0.html
   [127]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/shard-tglb6/igt@gem_exec_fair@basic-none-vip@rcs0.html

  * igt@gem_exec_fair@basic-pace@vcs0:
    - shard-glk:          [FAIL][128] ([i915#2842]) -> [PASS][129] +1 similar issue
   [128]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10988/shard-glk5/igt@gem_exec_fair@basic-pace@vcs0.html
   [129]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_648

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6487/index.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [igt-dev] [PATCH i-g-t, v4 1/5] lib/i915: Introduce library intel_mocs
  2021-12-10 13:05 ` [igt-dev] [PATCH i-g-t, v4 1/5] lib/i915: Introduce library intel_mocs apoorva1.singh
@ 2021-12-13 15:58   ` Zbigniew Kempczyński
  0 siblings, 0 replies; 14+ messages in thread
From: Zbigniew Kempczyński @ 2021-12-13 15:58 UTC (permalink / raw)
  To: apoorva1.singh; +Cc: igt-dev

On Fri, Dec 10, 2021 at 06:35:29PM +0530, apoorva1.singh@intel.com wrote:
> From: Apoorva Singh <apoorva1.singh@intel.com>
> 
> Add new library intel_mocs for mocs settings.
> 
> Signed-off-by: Apoorva Singh <apoorva1.singh@intel.com>
> Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
> Cc: Melkaveri, Arjun <arjun.melkaveri@intel.com>
> ---
>  lib/i915/intel_mocs.c | 51 +++++++++++++++++++++++++++++++++++++++++++
>  lib/i915/intel_mocs.h | 19 ++++++++++++++++
>  lib/meson.build       |  1 +
>  3 files changed, 71 insertions(+)
>  create mode 100644 lib/i915/intel_mocs.c
>  create mode 100644 lib/i915/intel_mocs.h
> 
> diff --git a/lib/i915/intel_mocs.c b/lib/i915/intel_mocs.c
> new file mode 100644
> index 00000000..e8c19309
> --- /dev/null
> +++ b/lib/i915/intel_mocs.c
> @@ -0,0 +1,51 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2021 Intel Corporation
> + */
> +
> +#include "igt.h"
> +#include "i915/gem.h"
> +#include "intel_mocs.h"
> +
> +static void get_mocs_index(int fd, struct drm_i915_mocs_index *mocs)
> +{
> +	uint16_t devid = intel_get_drm_devid(fd);
> +
> +	/*
> +	 * Gen >= 12 onwards don't have a setting for PTE,
> +	 * so using I915_MOCS_PTE as mocs index may leads to
> +	 * some undefined MOCS behavior.
> +	 * This helper function is providing current UC as well
> +	 * as WB MOCS index based on platform.
> +

Stray empty line inside the comment, please drop it.

> +	 */
> +	if (IS_DG1(devid)) {
> +		mocs->uc_index = 1;
> +		mocs->wb_index = 5;

The previous uAPI defined enum i915_mocs_table_index; now I see we're
hardcoding the MOCS entry indices here. I see they change between gens,
but I would like to ask: will we fill in the enum or add macros, so the
code is clearer and no magic numbers are entered directly?
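
For illustration, the named-constant approach asked about above might
look like the sketch below. The SKETCH_* macros, enum mocs_platform and
the helper names are hypothetical, not existing IGT or uAPI identifiers;
only the index values are taken from the quoted hunk, and the pre-gen12
fallback is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the values are the ones hardcoded in the patch. */
#define SKETCH_DG1_MOCS_UC_IDX		1
#define SKETCH_DG1_MOCS_WB_IDX		5
#define SKETCH_GEN12_MOCS_UC_IDX	3
#define SKETCH_GEN12_MOCS_WB_IDX	2

/* Bit 0 is encryption/decryption, bits [6:1] hold the table index. */
#define SKETCH_MOCS_INDEX_SHIFT		1

/* Stand-in for the IS_DG1()/IS_GEN12() devid checks. */
enum mocs_platform { MOCS_PLATFORM_DG1, MOCS_PLATFORM_GEN12 };

struct mocs_index {
	uint8_t uc_index;
	uint8_t wb_index;
};

static struct mocs_index mocs_index_for(enum mocs_platform platform)
{
	struct mocs_index mocs;

	assert(platform == MOCS_PLATFORM_DG1 ||
	       platform == MOCS_PLATFORM_GEN12);

	if (platform == MOCS_PLATFORM_DG1) {
		mocs.uc_index = SKETCH_DG1_MOCS_UC_IDX;
		mocs.wb_index = SKETCH_DG1_MOCS_WB_IDX;
	} else {
		mocs.uc_index = SKETCH_GEN12_MOCS_UC_IDX;
		mocs.wb_index = SKETCH_GEN12_MOCS_WB_IDX;
	}

	return mocs;
}

/* Place a table index into the MOCS field layout described above. */
static uint8_t mocs_field(uint8_t index)
{
	return (uint8_t)(index << SKETCH_MOCS_INDEX_SHIFT);
}
```

With names like these, the "<< 1" in the getters reads as "put the index
into bits [6:1]" rather than as another magic number.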

--
Zbigniew

> +	} else if (IS_GEN12(devid)) {
> +		mocs->uc_index = 3;
> +		mocs->wb_index = 2;
> +	} else {
> +		mocs->uc_index = I915_MOCS_PTE;
> +		mocs->wb_index = I915_MOCS_CACHED;
> +	}
> +}
> +
> +/* BitField [6:1] represents index to MOCS Tables
> + * BitField [0] represents Encryption/Decryption */
> +
> +uint8_t intel_get_wb_mocs(int fd)
> +{
> +	struct drm_i915_mocs_index mocs;
> +
> +	get_mocs_index(fd, &mocs);
> +	return mocs.wb_index << 1;
> +}
> +
> +uint8_t intel_get_uc_mocs(int fd)
> +{
> +	struct drm_i915_mocs_index mocs;
> +
> +	get_mocs_index(fd, &mocs);
> +	return mocs.uc_index << 1;
> +}
> diff --git a/lib/i915/intel_mocs.h b/lib/i915/intel_mocs.h
> new file mode 100644
> index 00000000..efd8607f
> --- /dev/null
> +++ b/lib/i915/intel_mocs.h
> @@ -0,0 +1,19 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2019 Intel Corporation
> + */
> +
> +#ifndef _INTEL_MOCS_H
> +#define _INTEL_MOCS_H
> +
> +#define XY_BLOCK_COPY_BLT_MOCS_SHIFT		21
> +#define XY_CTRL_SURF_COPY_BLT_MOCS_SHIFT	25
> +
> +struct drm_i915_mocs_index {
> +	uint8_t uc_index;
> +	uint8_t wb_index;
> +};
> +
> +uint8_t intel_get_wb_mocs(int fd);
> +uint8_t intel_get_uc_mocs(int fd);
> +#endif /* _INTEL_MOCS_H */
> diff --git a/lib/meson.build b/lib/meson.build
> index b9568a71..f500f0f1 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -11,6 +11,7 @@ lib_sources = [
>  	'i915/gem_mman.c',
>  	'i915/gem_vm.c',
>  	'i915/intel_memory_region.c',
> +	'i915/intel_mocs.c',
>  	'igt_collection.c',
>  	'igt_color_encoding.c',
>  	'igt_debugfs.c',
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [igt-dev] [PATCH i-g-t, v4 2/5] lib/i915: Introduce library i915_blt
  2021-12-10 13:05 ` [igt-dev] [PATCH i-g-t, v4 2/5] lib/i915: Introduce library i915_blt apoorva1.singh
@ 2021-12-15  9:40   ` Zbigniew Kempczyński
  2021-12-16 13:11   ` Zbigniew Kempczyński
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Zbigniew Kempczyński @ 2021-12-15  9:40 UTC (permalink / raw)
  To: apoorva1.singh; +Cc: igt-dev

On Fri, Dec 10, 2021 at 06:35:30PM +0530, apoorva1.singh@intel.com wrote:
> From: Apoorva Singh <apoorva1.singh@intel.com>
> 
> Add new library 'i915_blt' for various blt commands.
> 
> Signed-off-by: Apoorva Singh <apoorva1.singh@intel.com>
> Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
> Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
> Cc: Melkaveri, Arjun <arjun.melkaveri@intel.com>
> ---
>  lib/i915/i915_blt.c | 469 ++++++++++++++++++++++++++++++++++++++++++++
>  lib/i915/i915_blt.h |  82 ++++++++
>  lib/meson.build     |   1 +
>  3 files changed, 552 insertions(+)
>  create mode 100644 lib/i915/i915_blt.c
>  create mode 100644 lib/i915/i915_blt.h
> 
> diff --git a/lib/i915/i915_blt.c b/lib/i915/i915_blt.c
> new file mode 100644
> index 00000000..abfe7739
> --- /dev/null
> +++ b/lib/i915/i915_blt.c
> @@ -0,0 +1,469 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2021 Intel Corporation
> + */
> +
> +#include <errno.h>
> +#include <sys/ioctl.h>
> +#include <sys/time.h>
> +#include <malloc.h>
> +#include "drm.h"
> +#include "igt.h"
> +#include "i915_blt.h"
> +#include "i915/intel_mocs.h"
> +
> +/*
> + * make_block_copy_batch:
> + * @fd: open i915 drm file descriptor
> + * @batch_buf: the batch buffer to populate with the command
> + * @src: fd of the source BO
> + * @dst: fd of the destination BO

fd? src and dst are BO handles (uint32_t), not file descriptors.

> + * @length: size of the src and dest BOs
> + * @reloc: pointer to the relocation entyr for this command
> + * @offset_src: source address offset
> + * @offset_dst: destination address offset
> + * @src_mem_type: source memory type (denotes direct or indirect
> + *			addressing)
> + * @dst_mem_type: destination memory type (denotes direct or indirect
> + *			addressing)
> + * @src_compression: flag to enable uncompressed read of compressed data
> + *			at the source
> + * @dst_compression: flag to enable compressed write at the destination
> + * @resolve: flag to enable resolve of compressed data

Also document what the function returns.

> + */
> +static int make_block_copy_batch(int fd, uint32_t *batch_buf,
> +				 uint32_t src, uint32_t dst, uint32_t length,
> +				 struct drm_i915_gem_relocation_entry *reloc,
> +				 uint64_t offset_src, uint64_t offset_dst,
> +				 int src_mem_type, int dst_mem_type,
> +				 int src_compression, int dst_compression,
> +				 int resolve)
> +{
> +	uint32_t *b = batch_buf;
> +	uint32_t devid;
> +	uint8_t src_mocs = intel_get_uc_mocs(fd);
> +	uint8_t dst_mocs = src_mocs;
> +
> +	devid = intel_get_drm_devid(fd);
> +
> +	igt_assert(AT_LEAST_GEN(devid, 12) && IS_TIGERLAKE(devid) && !(src_compression || dst_compression));

Petri has some doubts about this. We don't have code which exercises
this on TGL and DG1. We've already discussed it, and a new test for
this functionality will be added. So, given his comment and our lack
of testing, drop the TGL and compression checks and just leave:

igt_assert(AT_LEAST_GEN(devid, 12));

If we encounter problems on TGL/DG1 we can narrow this later.

> +
> +	/* BG 0 */
> +	b[0] = BLOCK_COPY_BLT_CMD | resolve;
> +
> +	/* BG 1
> +	 *
> +	 * Using Tile 4 dimensions.  Height = 32 rows
> +	 * Width = 128 bytes
> +	 */
> +	b[1] = dst_compression | TILE_4_FORMAT | TILE_4_WIDTH_DWORD |
> +		dst_mocs << XY_BLOCK_COPY_BLT_MOCS_SHIFT;;
> +
> +	/* BG 3
> +	 *
> +	 * X2 = TILE_4_WIDTH
> +	 * Y2 = (length / TILE_4_WIDTH) << 16:
> +	 */
> +	b[3] = TILE_4_WIDTH | (length >> 7) << DEST_Y2_COORDINATE_SHIFT;
> +
> +	b[4] = offset_dst;
> +	b[5] = offset_dst >> 32;
> +
> +	/* relocate address in b[4] and b[5] */
> +	reloc->offset = 4 * (sizeof(uint32_t));
> +	reloc->delta = 0;
> +	reloc->target_handle = dst;
> +	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
> +	reloc->write_domain = I915_GEM_DOMAIN_RENDER;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +
> +	/* BG 6 */
> +	b[6] = dst_mem_type << DEST_MEM_TYPE_SHIFT;
> +
> +	/* BG 8 */
> +	b[8] = src_compression | TILE_4_WIDTH_DWORD | TILE_4_FORMAT |
> +		src_mocs << XY_BLOCK_COPY_BLT_MOCS_SHIFT;
> +
> +	b[9] = offset_src;
> +	b[10] = offset_src >> 32;
> +
> +	/* relocate address in b[9] and b[10] */
> +	reloc->offset = 9 * sizeof(uint32_t);
> +	reloc->delta = 0;
> +	reloc->target_handle = src;
> +	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
> +	reloc->write_domain = 0;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +
> +	/* BG 11 */
> +	b[11] = src_mem_type << SRC_MEM_TYPE_SHIFT;
> +
> +	/* BG 16  */
> +	b[16] = SURFACE_TYPE_2D |
> +		((TILE_4_WIDTH - 1) << DEST_SURF_WIDTH_SHIFT) |
> +		(TILE_4_HEIGHT - 1);
> +
> +	/* BG 19 */
> +	b[19] = SURFACE_TYPE_2D |
> +		((TILE_4_WIDTH - 1) << SRC_SURF_WIDTH_SHIFT) |
> +		(TILE_4_HEIGHT - 1);
> +
> +	b += XY_BLOCK_COPY_BLT_LEN_DWORD;
> +
> +	b[0] = MI_FLUSH_DW | MI_FLUSH_LLC | MI_INVALIDATE_TLB;
> +	reloc->offset = 23 * sizeof(uint32_t);
> +	reloc->delta = 0;
> +	reloc->target_handle = dst_compression > 0 ? dst : src;
> +	reloc->read_domains = 0;
> +	reloc->write_domain = 0;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +	b[3] = 0;
> +
> +	b[4] = MI_FLUSH_DW | MI_FLUSH_CCS;
> +	reloc->offset = 27 * sizeof(uint32_t);
> +	reloc->delta = 0;
> +	reloc->target_handle = dst_compression > 0 ? dst : src;
> +	reloc->read_domains = 0;
> +	reloc->write_domain = 0;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +	b[7] = 0;
> +
> +	b[8] = MI_BATCH_BUFFER_END;
> +	b[9] = 0;
> +
> +	b += 10;
> +
> +	return (b - batch_buf) * sizeof(uint32_t);
> +}
> +
> +static void __xy_block_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +				uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +				uint32_t length, enum copy_mode mode, bool enable_compression,
> +				uint32_t ctx, struct intel_execution_engine2 *e)
> +{
> +	struct drm_i915_gem_relocation_entry reloc[4];
> +	struct drm_i915_gem_exec_object2 exec[3];
> +	struct drm_i915_gem_execbuffer2 execbuf;
> +	int len;
> +	int src_mem_type, dst_mem_type;
> +	int dst_compression, src_compression;
> +	int resolve;
> +	uint32_t cmd, batch_buf[BATCH_SIZE/sizeof(uint32_t)] = {};
> +	uint64_t offset_src, offset_dst, offset_bb, bb_size, ret;
> +
> +	bb_size = BATCH_SIZE;
> +	ret = __gem_create_in_memory_regions(fd, &cmd, &bb_size, bb_region);
> +	igt_assert_eq(ret, 0);
> +
> +	switch(mode) {
> +		case SYS_TO_SYS: /* copy from smem to smem */
> +			src_mem_type = MEM_TYPE_SYS;
> +			dst_mem_type = MEM_TYPE_SYS;
> +			src_compression = 0;
> +			dst_compression = 0;
> +			resolve = 0;
> +		case SYS_TO_LOCAL: /* copy from smem to lmem */
> +			src_mem_type = MEM_TYPE_SYS;
> +			dst_mem_type = MEM_TYPE_LOCAL;
> +			src_compression = 0;
> +			dst_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> +			resolve = 0;
> +		case LOCAL_TO_SYS: /* copy from lmem to smem */
> +			src_mem_type = MEM_TYPE_LOCAL;
> +			dst_mem_type = MEM_TYPE_SYS;
> +			src_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> +			dst_compression = 0;
> +			resolve = 0;
> +		case LOCAL_TO_LOCAL: /* copy from lmem to lmem */
> +			src_mem_type = MEM_TYPE_LOCAL;
> +			dst_mem_type = MEM_TYPE_LOCAL;
> +			src_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> +			dst_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> +			resolve = 0;
> +		case LOCAL_TO_LOCAL_INPLACE: /* in-place decompress */

I just realized we don't need the _INPLACE suffix: src == dst already
implies a full resolve, so please remove this value from the enum.
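
A rough sketch of how the mode selection could look without the extra
enum value. struct copy_cfg and copy_cfg_for() are hypothetical names
used only here, and booleans stand in for the MEM_TYPE_*,
COMPRESSION_ENABLE and FULL_RESOLVE bits so no register values are
assumed. Note that each case ends with break; the quoted switch appears
to have none, so as written every mode would fall through to the last
case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum copy_mode { SYS_TO_SYS, SYS_TO_LOCAL, LOCAL_TO_SYS, LOCAL_TO_LOCAL };

/* Hypothetical container for the per-mode settings. */
struct copy_cfg {
	bool src_local;
	bool dst_local;
	bool src_compressed;
	bool dst_compressed;
	bool full_resolve;
};

static struct copy_cfg copy_cfg_for(enum copy_mode mode,
				    bool enable_compression,
				    uint32_t src, uint32_t dst)
{
	struct copy_cfg cfg = { false, false, false, false, false };

	switch (mode) {
	case SYS_TO_SYS:
		break;
	case SYS_TO_LOCAL:
		cfg.dst_local = true;
		cfg.dst_compressed = enable_compression;
		break;
	case LOCAL_TO_SYS:
		cfg.src_local = true;
		cfg.src_compressed = enable_compression;
		break;
	case LOCAL_TO_LOCAL:
		cfg.src_local = true;
		cfg.dst_local = true;
		cfg.src_compressed = enable_compression;
		cfg.dst_compressed = enable_compression;
		/* Same handle on both sides means in-place decompress. */
		cfg.full_resolve = (src == dst);
		break;
	}

	return cfg;
}
```

The caller then maps the booleans onto the real command bits in one
place, and the in-place case no longer needs its own mode.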

> +			src_mem_type = MEM_TYPE_LOCAL;
> +			dst_mem_type = MEM_TYPE_LOCAL;
> +			src_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> +			dst_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> +			resolve = FULL_RESOLVE;
> +	}
> +
> +	offset_src = get_offset(ahnd, src, src_size, 0);
> +	offset_dst = get_offset(ahnd, dst, dst_size, 0);
> +	offset_bb = get_offset(ahnd, cmd, bb_size, 0);
> +
> +	/* construct the batch buffer */
> +	memset(reloc, 0, sizeof(reloc));
> +	len = make_block_copy_batch(fd, batch_buf,
> +				    src, dst, length, reloc,
> +				    offset_src, offset_dst,
> +				    src_mem_type, dst_mem_type,
> +				    src_compression, dst_compression,
> +				    resolve);
> +	igt_assert(len > 0);

The function can't return 0 if we look at its implementation: the only
return is at the end of the function, and it just does arithmetic whose
result is always > 0. So this assert is unnecessary.
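
As a worked check of that claim, the returned length is a fixed dword
count times sizeof(uint32_t). The sketch below assumes
XY_BLOCK_COPY_BLT_LEN_DWORD is 22, which is only inferred from the first
MI_FLUSH_DW relocation sitting at dword 23 in the quoted code; the
SKETCH_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: inferred from reloc->offset = 23 * sizeof(uint32_t). */
#define SKETCH_BLOCK_COPY_LEN_DWORD	22
/* Two 4-dword MI_FLUSH_DW commands, MI_BATCH_BUFFER_END and one pad dword. */
#define SKETCH_TRAILER_LEN_DWORD	10

/* Models (b - batch_buf) * sizeof(uint32_t) at the end of the function. */
static size_t block_copy_batch_bytes(void)
{
	return (SKETCH_BLOCK_COPY_LEN_DWORD + SKETCH_TRAILER_LEN_DWORD) *
	       sizeof(uint32_t);
}
```

Under that assumption the function always returns 128, independent of
its arguments.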

> +
> +	/* write batch buffer to 'cmd' BO */
> +	gem_write(fd, cmd, 0, batch_buf, len);
> +
> +	/* Execute the batch buffer */
> +	memset(exec, 0, sizeof(exec));
> +	if (mode == LOCAL_TO_LOCAL_INPLACE) {

We can check if (mode == LOCAL_TO_LOCAL && src == dst), so we don't
need to introduce an unnecessary enum value. Sorry for suggesting it
earlier.

> +		exec[0].handle = dst;
> +		exec[1].handle = cmd;
> +		exec[1].relocation_count = !ahnd ? 4 : 0;
> +		exec[1].relocs_ptr = to_user_pointer(reloc);
> +		if (ahnd) {
> +			exec[0].offset = offset_src;
> +			exec[0].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
> +			exec[1].offset = offset_dst;
> +			exec[1].flags |= EXEC_OBJECT_PINNED;
> +		}
> +	} else {
> +		exec[0].handle = src;
> +		exec[1].handle = dst;
> +		exec[2].handle = cmd;
> +		exec[2].relocation_count = !ahnd ? 4 : 0;
> +		exec[2].relocs_ptr = to_user_pointer(reloc);
> +		if (ahnd) {
> +			exec[0].offset = offset_src;
> +			exec[0].flags |= EXEC_OBJECT_PINNED;
> +			exec[1].offset = offset_dst;
> +			exec[1].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
> +			exec[2].offset = offset_bb;
> +			exec[2].flags |= EXEC_OBJECT_PINNED;
> +		}
> +	}
> +
> +	memset(&execbuf, 0, sizeof(execbuf));
> +	execbuf.buffers_ptr = to_user_pointer(exec);
> +
> +	if (mode == LOCAL_TO_LOCAL_INPLACE)
> +		execbuf.buffer_count = 2;
> +	else
> +		execbuf.buffer_count = 3;

This can be part of the if / else above.
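
Something along these lines, as a sketch only: struct exec_list is a
hypothetical stand-in for the drm_i915_gem_exec_object2 array plus
execbuf.buffer_count, so the example stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for exec[] and execbuf.buffer_count. */
struct exec_list {
	uint32_t handles[3];
	unsigned int count;
};

static struct exec_list build_exec_list(bool inplace, uint32_t src,
					uint32_t dst, uint32_t cmd)
{
	struct exec_list list = { { 0, 0, 0 }, 0 };

	if (inplace) {
		/* One BO is both source and destination. */
		list.handles[0] = dst;
		list.handles[1] = cmd;
		list.count = 2;
	} else {
		list.handles[0] = src;
		list.handles[1] = dst;
		list.handles[2] = cmd;
		list.count = 3;
	}

	return list;
}
```

The count is set next to the handles it describes, so the two can't
drift apart when the branch is changed later.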

> +	execbuf.batch_len = len;
> +
> +	if (ctx)
> +		execbuf.rsvd1 = ctx;
> +
> +	execbuf.flags = I915_EXEC_BLT;
> +	if (e)
> +		execbuf.flags = e->flags;

Some developers prefer initialization using if / else:

	if (e)
		execbuf.flags = e->flags;
	else
		execbuf.flags = I915_EXEC_BLT;

So you can use this instead of overwriting. I don't have a strong
preference here. Personally I would rather use:

	execbuf.flags = e ? e->flags : I915_EXEC_BLT;

Pick one :)

> +
> +	gem_execbuf(fd, &execbuf);
> +	gem_close(fd, cmd);
> +	put_offset(ahnd, src);
> +	put_offset(ahnd, dst);
> +	put_offset(ahnd, cmd);
> +}
> +
> +void xy_block_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +		       uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +		       uint32_t length, enum copy_mode mode, bool enable_compression,
> +		       struct intel_execution_engine2 *e)
> +{
> +	__xy_block_copy_blt(fd, bb_region, src, dst, src_size, dst_size, ahnd,
> +			    length, mode, enable_compression, 0, e);
> +}
> +
> +void xy_block_copy_blt_ctx(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +			   uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +			   uint32_t length, enum copy_mode mode, bool enable_compression,
> +			   uint32_t ctx, struct intel_execution_engine2 *e)
> +{
> +	__xy_block_copy_blt(fd, bb_region, src, dst, src_size, dst_size, ahnd,
> +			    length, mode, enable_compression, ctx, e);
> +}
> +
> +/*
> + * make_ctrl_surf_batch:
> + * @fd: open i915 drm file descriptor
> + * @batch_buf: the batch buffer to populate with the command
> + * @src: fd of the source BO
> + * @dst: fd of the destination BO

Same comment regarding fd as above.

> + * @length: size of the ctrl surf in bytes
> + * @reloc: pointer to the relocation entyr for this command
> + * @offset_src: source address offset
> + * @offset_dst: destination address offset
> + * @src_mem_access: source memory type (denotes direct or indirect
> + *			addressing)
> + * @dst_mem_acdcess: destination memory type (denotes direct or indirect
> + *			addressing)

Also describe what the function returns.
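
The return description here has two cases to cover: in the code quoted
below the function returns 0 when the control surface needs more CCS
blocks than one transfer can take, and the batch size in bytes
otherwise. A minimal model of that contract, assuming CCS_RATIO is 256
and NUM_CCS_BLKS_PER_XFER is 1024 (neither value is visible in this
hunk, so both are placeholders under SKETCH_* names):

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CCS_RATIO		256	/* assumption */
#define SKETCH_NUM_CCS_BLKS_PER_XFER	1024	/* assumption */
#define SKETCH_CTRL_SURF_BATCH_DWORDS	16	/* b += 16 in the quoted code */

/*
 * Returns the batch size in bytes, or 0 if the copy does not fit
 * in a single transfer.
 */
static int ctrl_surf_batch_len(uint32_t length)
{
	uint32_t num_ccs_blocks = length / SKETCH_CCS_RATIO;

	if (num_ccs_blocks < 1)
		num_ccs_blocks = 1;
	if (num_ccs_blocks > SKETCH_NUM_CCS_BLKS_PER_XFER)
		return 0;

	return SKETCH_CTRL_SURF_BATCH_DWORDS * (int)sizeof(uint32_t);
}
```

Unlike make_block_copy_batch(), a caller of this one does have a reason
to check for 0.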

> + */
> +static int make_ctrl_surf_batch(int fd, uint32_t *batch_buf,
> +				uint32_t src, uint32_t dst, uint32_t length,
> +				struct drm_i915_gem_relocation_entry *reloc,
> +				uint64_t offset_src, uint64_t offset_dst,
> +				int src_mem_access, int dst_mem_access)
> +{
> +	int num_ccs_blocks;
> +	uint32_t *b = batch_buf;
> +	uint8_t src_mocs = intel_get_uc_mocs(fd);
> +	uint8_t dst_mocs = src_mocs;
> +
> +	num_ccs_blocks = length/CCS_RATIO;
> +	if (num_ccs_blocks < 1)
> +		num_ccs_blocks = 1;
> +	if (num_ccs_blocks > NUM_CCS_BLKS_PER_XFER)
> +		return 0;
> +
> +	/*
> +	 * We use logical AND with 1023 since the size field
> +	 * takes values which is in the range of 0 - 1023
> +	 */
> +	b[0] = ((XY_CTRL_SURF_COPY_BLT) |
> +		(src_mem_access << SRC_ACCESS_TYPE_SHIFT) |
> +		(dst_mem_access << DST_ACCESS_TYPE_SHIFT) |
> +		(((num_ccs_blocks - 1) & 1023) << CCS_SIZE_SHIFT));
> +
> +	b[1] = offset_src;
> +	b[2] = offset_src >> 32 | src_mocs << XY_CTRL_SURF_COPY_BLT_MOCS_SHIFT;
> +
> +	/* relocate address in b[1] and b[2] */
> +	reloc->offset = 1 * sizeof(uint32_t);
> +	reloc->delta = 0;
> +	reloc->target_handle = src;
> +	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
> +	reloc->write_domain = 0;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +
> +	b[3] = offset_dst;
> +	b[4] = offset_dst >> 32 | dst_mocs << XY_CTRL_SURF_COPY_BLT_MOCS_SHIFT;
> +
> +	/* relocate address in b[3] and b[4] */
> +	reloc->offset = 3 * (sizeof(uint32_t));
> +	reloc->delta = 0;
> +	reloc->target_handle = dst;
> +	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
> +	reloc->write_domain = I915_GEM_DOMAIN_RENDER;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +
> +	b[5] = 0;
> +
> +	b[6] = MI_FLUSH_DW | MI_FLUSH_LLC | MI_INVALIDATE_TLB;
> +
> +	reloc->offset = 7 * sizeof(uint32_t);
> +	reloc->delta = 0;
> +	reloc->target_handle =
> +	dst_mem_access == INDIRECT_ACCESS ? dst : src;
> +	reloc->read_domains = 0;
> +	reloc->write_domain = 0;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +	b[9] = 0;
> +
> +	b[10] = MI_FLUSH_DW | MI_FLUSH_CCS;
> +	reloc->offset = 11 * sizeof(uint32_t);
> +	reloc->delta = 0;
> +	reloc->target_handle =
> +	dst_mem_access == INDIRECT_ACCESS ? dst : src;
> +	reloc->read_domains = 0;
> +	reloc->write_domain = 0;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +	b[13] = 0;
> +
> +	b[14] = MI_BATCH_BUFFER_END;
> +	b[15] = 0;
> +
> +	b += 16;
> +
> +	return (b - batch_buf) * sizeof(uint32_t);
> +}
> +
> +static void __xy_ctrl_surf_copy_blt(int fd, uint32_t bb_region, uint32_t src,
> +				    uint32_t dst, uint64_t src_size, uint64_t dst_size,
> +				    uint64_t ahnd, uint32_t length, bool writetodev,
> +				    uint32_t ctx, struct intel_execution_engine2 *e)
> +{
> +	struct drm_i915_gem_relocation_entry reloc[4];
> +	struct drm_i915_gem_exec_object2 exec[3];
> +	struct drm_i915_gem_execbuffer2 execbuf;
> +	int len, src_mem_access, dst_mem_access;
> +	uint32_t cmd, batch_buf[BATCH_SIZE/sizeof(uint32_t)] = {};
> +	uint64_t offset_src, offset_dst, offset_bb, bb_size, ret;
> +
> +	bb_size = BATCH_SIZE;
> +	ret = __gem_create_in_memory_regions(fd, &cmd, &bb_size, bb_region);
> +	igt_assert_eq(ret, 0);
> +
> +	if (writetodev) {
> +		src_mem_access = DIRECT_ACCESS;
> +		dst_mem_access = INDIRECT_ACCESS;
> +	} else {
> +		src_mem_access = INDIRECT_ACCESS;
> +		dst_mem_access = DIRECT_ACCESS;
> +	}
> +
> +	offset_src = get_offset(ahnd, src, src_size, 0);
> +	offset_dst = get_offset(ahnd, dst, dst_size, 0);
> +	offset_bb = get_offset(ahnd, cmd, bb_size, 0);
> +
> +	/* construct batch command buffer */
> +	memset(reloc, 0, sizeof(reloc));
> +	len = make_ctrl_surf_batch(fd, batch_buf,
> +				   src, dst, length, reloc,
> +				   offset_src, offset_dst,
> +				   src_mem_access, dst_mem_access);
> +	igt_assert(len > 0);

The assert is not necessary; the function has no alternate path
that could return 0.

> +
> +	/* Copy the batch buff to BO cmd */
> +	gem_write(fd, cmd, 0, batch_buf, len);
> +
> +	/* Execute the batch buffer */
> +	memset(exec, 0, sizeof(exec));
> +	exec[0].handle = src;
> +	exec[1].handle = dst;
> +	exec[2].handle = cmd;
> +	exec[2].relocation_count = !ahnd ? 4 : 0;
> +	exec[2].relocs_ptr = to_user_pointer(reloc);
> +	if (ahnd) {
> +		exec[0].offset = offset_src;
> +		exec[0].flags |= EXEC_OBJECT_PINNED;
> +		exec[1].offset = offset_dst;
> +		exec[1].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
> +		exec[2].offset = offset_bb;
> +		exec[2].flags |= EXEC_OBJECT_PINNED;
> +	}
> +
> +	memset(&execbuf, 0, sizeof(execbuf));
> +	execbuf.buffers_ptr = to_user_pointer(exec);
> +	execbuf.buffer_count = 3;
> +	execbuf.batch_len = len;
> +	execbuf.flags = I915_EXEC_BLT;
> +	if (ctx)
> +		execbuf.rsvd1 = ctx;
> +	if (e)
> +		execbuf.flags = e->flags;

Same comment about execbuf.flags initialization as above.

> +
> +	gem_execbuf(fd, &execbuf);
> +	gem_close(fd, cmd);
> +	put_offset(ahnd, src);
> +	put_offset(ahnd, dst);
> +	put_offset(ahnd, cmd);
> +}
> +
> +void xy_ctrl_surf_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +			   uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +			   uint32_t length, bool writetodev,
> +			   struct intel_execution_engine2 *e)
> +{
> +	__xy_ctrl_surf_copy_blt(fd, bb_region, src, dst, src_size, dst_size,
> +				ahnd, length, writetodev, 0, e);
> +}
> +
> +void xy_ctrl_surf_copy_blt_ctx(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +			       uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +			       uint32_t length, bool writetodev, uint32_t ctx,
> +			       struct intel_execution_engine2 *e)
> +{
> +	__xy_ctrl_surf_copy_blt(fd, bb_region, src, dst, src_size, dst_size,
> +				ahnd, length, writetodev, ctx, e);
> +}
> +
> diff --git a/lib/i915/i915_blt.h b/lib/i915/i915_blt.h
> new file mode 100644
> index 00000000..71653880
> --- /dev/null
> +++ b/lib/i915/i915_blt.h
> @@ -0,0 +1,82 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2021 Intel Corporation
> + */
> +
> +#include <errno.h>
> +#include <sys/ioctl.h>
> +#include <sys/time.h>
> +#include <malloc.h>
> +#include "drm.h"
> +#include "igt.h"
> +
> +#define MI_FLUSH_DW_LEN_DWORD	4
> +#define MI_FLUSH_DW		(0x26 << 23 | 1)
> +#define MI_FLUSH_CCS		(1 << 16)
> +#define MI_FLUSH_LLC		(1 << 9)
> +#define MI_INVALIDATE_TLB	(1 << 18)
> +
> +/* XY_BLOCK_COPY_BLT instruction has 22 bit groups 1 DWORD each */
> +#define XY_BLOCK_COPY_BLT_LEN_DWORD	22
> +#define BLOCK_COPY_BLT_CMD		(2 << 29 | 0x41 << 22 | 0x14)
> +#define COMPRESSION_ENABLE		(1 << 29)
> +#define AUX_CCS_E			(5 << 18)
> +#define FULL_RESOLVE			(1 << 12)
> +#define PARTIAL_RESOLVE			(2 << 12)
> +#define TILE_4_FORMAT			(2 << 30)
> +#define TILE_4_WIDTH			(128)
> +#define TILE_4_WIDTH_DWORD		((128 >> 2) - 1)
> +#define TILE_4_HEIGHT			(32)
> +#define SURFACE_TYPE_2D			(1 << 29)
> +
> +#define DEST_Y2_COORDINATE_SHIFT	(16)
> +#define DEST_MEM_TYPE_SHIFT		(31)
> +#define SRC_MEM_TYPE_SHIFT		(31)
> +#define DEST_SURF_WIDTH_SHIFT		(14)
> +#define SRC_SURF_WIDTH_SHIFT		(14)
> +
> +#define XY_CTRL_SURF_COPY_BLT		(2<<29 | 0x48<<22 | 3)
> +#define SRC_ACCESS_TYPE_SHIFT		21
> +#define DST_ACCESS_TYPE_SHIFT		20
> +#define CCS_SIZE_SHIFT			8
> +#define MI_INSTR(opcode, flags) (((opcode) << 23) | (flags))
> +#define MI_ARB_CHECK			MI_INSTR(0x05, 0)
> +#define NUM_CCS_BLKS_PER_XFER		1024
> +#define INDIRECT_ACCESS                 0
> +#define DIRECT_ACCESS                   1
> +
> +#define BATCH_SIZE			4096
> +#define BOSIZE_MIN			(4*1024)
> +#define BOSIZE_MAX			(4*1024*1024)
> +#define CCS_RATIO			256
> +
> +#define MEM_TYPE_SYS			1
> +#define MEM_TYPE_LOCAL			0
> +
> +enum copy_mode {
> +	SYS_TO_SYS = 0,
> +	SYS_TO_LOCAL,
> +	LOCAL_TO_SYS,
> +	LOCAL_TO_LOCAL,
> +	LOCAL_TO_LOCAL_INPLACE,
> +};
> +
> +void xy_block_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +		       uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +		       uint32_t length, enum copy_mode mode, bool enable_compression,
> +		       struct intel_execution_engine2 *e);

I just realized that passing e != NULL here will likely be wrong unless you
pass ctx too. So I would remove the _ctx() functions and add ctx here.
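
A rough sketch of the idea, not the final API: one entry point always takes
the context together with the engine, so the two cannot get out of sync.
The stub structs and the helper name blt_select_engine() are stand-ins made
up for illustration so the snippet compiles on its own; the real types are
drm_i915_gem_execbuffer2 and intel_execution_engine2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the uAPI define (i915_drm.h has I915_EXEC_BLT as 3 << 0). */
#define I915_EXEC_BLT	3

/* Stand-ins for the real structs, reduced to the fields used here. */
struct engine_stub { uint64_t flags; };
struct execbuf_stub { uint64_t flags; uint64_t rsvd1; };

/*
 * Hypothetical helper: fill the context and engine selector in one place.
 * With e == NULL it falls back to the legacy blitter ring.
 */
static void blt_select_engine(struct execbuf_stub *eb, uint32_t ctx,
			      const struct engine_stub *e)
{
	assert(eb != NULL);

	eb->rsvd1 = ctx;	/* context id, 0 means the default context */
	eb->flags = e ? e->flags : I915_EXEC_BLT;
}
```

With a single signature like this, callers that iterate engines of a given
context pass that same ctx, and the separate _ctx() variants are not needed.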

For now this is a dry review; I'm not able to run this code on dg2 yet.

--
Zbigniew
 
> +
> +void xy_ctrl_surf_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +			   uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +			   uint32_t length, bool writetodev,
> +			   struct intel_execution_engine2 *e);
> +
> +void xy_block_copy_blt_ctx(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +			   uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +			   uint32_t length, enum copy_mode mode, bool enable_compression,
> +			   uint32_t ctx, struct intel_execution_engine2 *e);
> +
> +void xy_ctrl_surf_copy_blt_ctx(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +			       uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +			       uint32_t length, bool writetodev, uint32_t ctx,
> +			       struct intel_execution_engine2 *e);
> diff --git a/lib/meson.build b/lib/meson.build
> index f500f0f1..f2924541 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -12,6 +12,7 @@ lib_sources = [
>  	'i915/gem_vm.c',
>  	'i915/intel_memory_region.c',
>  	'i915/intel_mocs.c',
> +	'i915/i915_blt.c',
>  	'igt_collection.c',
>  	'igt_color_encoding.c',
>  	'igt_debugfs.c',
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [igt-dev] [PATCH i-g-t, v4 2/5] lib/i915: Introduce library i915_blt
  2021-12-10 13:05 ` [igt-dev] [PATCH i-g-t, v4 2/5] lib/i915: Introduce library i915_blt apoorva1.singh
  2021-12-15  9:40   ` Zbigniew Kempczyński
@ 2021-12-16 13:11   ` Zbigniew Kempczyński
  2021-12-16 13:15   ` Zbigniew Kempczyński
  2021-12-16 13:19   ` Zbigniew Kempczyński
  3 siblings, 0 replies; 14+ messages in thread
From: Zbigniew Kempczyński @ 2021-12-16 13:11 UTC (permalink / raw)
  To: apoorva1.singh; +Cc: igt-dev

On Fri, Dec 10, 2021 at 06:35:30PM +0530, apoorva1.singh@intel.com wrote:
> From: Apoorva Singh <apoorva1.singh@intel.com>
> 
> Add new library 'i915_blt' for various blt commands.
> 
> Signed-off-by: Apoorva Singh <apoorva1.singh@intel.com>
> Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
> Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
> Cc: Melkaveri, Arjun <arjun.melkaveri@intel.com>
> ---
>  lib/i915/i915_blt.c | 469 ++++++++++++++++++++++++++++++++++++++++++++
>  lib/i915/i915_blt.h |  82 ++++++++
>  lib/meson.build     |   1 +
>  3 files changed, 552 insertions(+)
>  create mode 100644 lib/i915/i915_blt.c
>  create mode 100644 lib/i915/i915_blt.h
> 
> diff --git a/lib/i915/i915_blt.c b/lib/i915/i915_blt.c
> new file mode 100644
> index 00000000..abfe7739
> --- /dev/null
> +++ b/lib/i915/i915_blt.c
> @@ -0,0 +1,469 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2021 Intel Corporation
> + */
> +
> +#include <errno.h>
> +#include <sys/ioctl.h>
> +#include <sys/time.h>
> +#include <malloc.h>
> +#include "drm.h"
> +#include "igt.h"
> +#include "i915_blt.h"
> +#include "i915/intel_mocs.h"
> +
> +/*
> + * make_block_copy_batch:
> + * @fd: open i915 drm file descriptor
> + * @batch_buf: the batch buffer to populate with the command
> + * @src: fd of the source BO
> + * @dst: fd of the destination BO
> + * @length: size of the src and dest BOs
> + * @reloc: pointer to the relocation entyr for this command
> + * @offset_src: source address offset
> + * @offset_dst: destination address offset
> + * @src_mem_type: source memory type (denotes direct or indirect
> + *			addressing)
> + * @dst_mem_type: destination memory type (denotes direct or indirect
> + *			addressing)
> + * @src_compression: flag to enable uncompressed read of compressed data
> + *			at the source
> + * @dst_compression: flag to enable compressed write at the destination
> + * @resolve: flag to enable resolve of compressed data
> + */
> +static int make_block_copy_batch(int fd, uint32_t *batch_buf,
> +				 uint32_t src, uint32_t dst, uint32_t length,
> +				 struct drm_i915_gem_relocation_entry *reloc,
> +				 uint64_t offset_src, uint64_t offset_dst,
> +				 int src_mem_type, int dst_mem_type,
> +				 int src_compression, int dst_compression,
> +				 int resolve)
> +{
> +	uint32_t *b = batch_buf;
> +	uint32_t devid;
> +	uint8_t src_mocs = intel_get_uc_mocs(fd);
> +	uint8_t dst_mocs = src_mocs;
> +
> +	devid = intel_get_drm_devid(fd);
> +
> +	igt_assert(AT_LEAST_GEN(devid, 12) && IS_TIGERLAKE(devid) && !(src_compression || dst_compression));
> +
> +	/* BG 0 */
> +	b[0] = BLOCK_COPY_BLT_CMD | resolve;

I don't like how resolve is passed here. The caller must know which bits to
set, which is wrong imo.
We should have definitions: RESOLVE_NONE = 0, RESOLVE_FULL = 1, RESOLVE_PARTIAL = 2,
and the shift to the valid bit position should be done here.
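
Something along these lines, as a sketch only. The shift of 12 is taken from
the patch's FULL_RESOLVE (1 << 12) and PARTIAL_RESOLVE (2 << 12); the names
enum blt_resolve, BLT_RESOLVE_SHIFT and block_copy_dw0() are made up for
illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the patch's i915_blt.h, written with unsigned constants. */
#define BLOCK_COPY_BLT_CMD	(2u << 29 | 0x41u << 22 | 0x14u)

/* Hypothetical names: plain values, no bit position baked in. */
enum blt_resolve {
	RESOLVE_NONE = 0,
	RESOLVE_FULL = 1,
	RESOLVE_PARTIAL = 2,
};

/* Bit position implied by FULL_RESOLVE / PARTIAL_RESOLVE in the patch. */
#define BLT_RESOLVE_SHIFT	12

/* Build DW0 of the block copy; the shift happens here, not in the caller. */
static uint32_t block_copy_dw0(enum blt_resolve resolve)
{
	assert(resolve <= RESOLVE_PARTIAL);

	return BLOCK_COPY_BLT_CMD | ((uint32_t)resolve << BLT_RESOLVE_SHIFT);
}
```

The caller then passes RESOLVE_FULL for the in-place case and RESOLVE_NONE
otherwise, without knowing where the field sits in the dword.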

--
Zbigniew

> +
> +	/* BG 1
> +	 *
> +	 * Using Tile 4 dimensions.  Height = 32 rows
> +	 * Width = 128 bytes
> +	 */
> +	b[1] = dst_compression | TILE_4_FORMAT | TILE_4_WIDTH_DWORD |
> +		dst_mocs << XY_BLOCK_COPY_BLT_MOCS_SHIFT;;
> +
> +	/* BG 3
> +	 *
> +	 * X2 = TILE_4_WIDTH
> +	 * Y2 = (length / TILE_4_WIDTH) << 16:
> +	 */
> +	b[3] = TILE_4_WIDTH | (length >> 7) << DEST_Y2_COORDINATE_SHIFT;
> +
> +	b[4] = offset_dst;
> +	b[5] = offset_dst >> 32;
> +
> +	/* relocate address in b[4] and b[5] */
> +	reloc->offset = 4 * (sizeof(uint32_t));
> +	reloc->delta = 0;
> +	reloc->target_handle = dst;
> +	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
> +	reloc->write_domain = I915_GEM_DOMAIN_RENDER;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +
> +	/* BG 6 */
> +	b[6] = dst_mem_type << DEST_MEM_TYPE_SHIFT;
> +
> +	/* BG 8 */
> +	b[8] = src_compression | TILE_4_WIDTH_DWORD | TILE_4_FORMAT |
> +		src_mocs << XY_BLOCK_COPY_BLT_MOCS_SHIFT;
> +
> +	b[9] = offset_src;
> +	b[10] = offset_src >> 32;
> +
> +	/* relocate address in b[9] and b[10] */
> +	reloc->offset = 9 * sizeof(uint32_t);
> +	reloc->delta = 0;
> +	reloc->target_handle = src;
> +	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
> +	reloc->write_domain = 0;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +
> +	/* BG 11 */
> +	b[11] = src_mem_type << SRC_MEM_TYPE_SHIFT;
> +
> +	/* BG 16  */
> +	b[16] = SURFACE_TYPE_2D |
> +		((TILE_4_WIDTH - 1) << DEST_SURF_WIDTH_SHIFT) |
> +		(TILE_4_HEIGHT - 1);
> +
> +	/* BG 19 */
> +	b[19] = SURFACE_TYPE_2D |
> +		((TILE_4_WIDTH - 1) << SRC_SURF_WIDTH_SHIFT) |
> +		(TILE_4_HEIGHT - 1);
> +
> +	b += XY_BLOCK_COPY_BLT_LEN_DWORD;
> +
> +	b[0] = MI_FLUSH_DW | MI_FLUSH_LLC | MI_INVALIDATE_TLB;
> +	reloc->offset = 23 * sizeof(uint32_t);
> +	reloc->delta = 0;
> +	reloc->target_handle = dst_compression > 0 ? dst : src;
> +	reloc->read_domains = 0;
> +	reloc->write_domain = 0;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +	b[3] = 0;
> +
> +	b[4] = MI_FLUSH_DW | MI_FLUSH_CCS;
> +	reloc->offset = 27 * sizeof(uint32_t);
> +	reloc->delta = 0;
> +	reloc->target_handle = dst_compression > 0 ? dst : src;
> +	reloc->read_domains = 0;
> +	reloc->write_domain = 0;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +	b[7] = 0;
> +
> +	b[8] = MI_BATCH_BUFFER_END;
> +	b[9] = 0;
> +
> +	b += 10;
> +
> +	return (b - batch_buf) * sizeof(uint32_t);
> +}
> +
> +static void __xy_block_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +				uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +				uint32_t length, enum copy_mode mode, bool enable_compression,
> +				uint32_t ctx, struct intel_execution_engine2 *e)
> +{
> +	struct drm_i915_gem_relocation_entry reloc[4];
> +	struct drm_i915_gem_exec_object2 exec[3];
> +	struct drm_i915_gem_execbuffer2 execbuf;
> +	int len;
> +	int src_mem_type, dst_mem_type;
> +	int dst_compression, src_compression;
> +	int resolve;
> +	uint32_t cmd, batch_buf[BATCH_SIZE/sizeof(uint32_t)] = {};
> +	uint64_t offset_src, offset_dst, offset_bb, bb_size, ret;
> +
> +	bb_size = BATCH_SIZE;
> +	ret = __gem_create_in_memory_regions(fd, &cmd, &bb_size, bb_region);
> +	igt_assert_eq(ret, 0);
> +
> +	switch(mode) {
> +		case SYS_TO_SYS: /* copy from smem to smem */
> +			src_mem_type = MEM_TYPE_SYS;
> +			dst_mem_type = MEM_TYPE_SYS;
> +			src_compression = 0;
> +			dst_compression = 0;
> +			resolve = 0;
> +		case SYS_TO_LOCAL: /* copy from smem to lmem */
> +			src_mem_type = MEM_TYPE_SYS;
> +			dst_mem_type = MEM_TYPE_LOCAL;
> +			src_compression = 0;
> +			dst_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> +			resolve = 0;
> +		case LOCAL_TO_SYS: /* copy from lmem to smem */
> +			src_mem_type = MEM_TYPE_LOCAL;
> +			dst_mem_type = MEM_TYPE_SYS;
> +			src_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> +			dst_compression = 0;
> +			resolve = 0;
> +		case LOCAL_TO_LOCAL: /* copy from lmem to lmem */
> +			src_mem_type = MEM_TYPE_LOCAL;
> +			dst_mem_type = MEM_TYPE_LOCAL;
> +			src_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> +			dst_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> +			resolve = 0;
> +		case LOCAL_TO_LOCAL_INPLACE: /* in-place decompress */
> +			src_mem_type = MEM_TYPE_LOCAL;
> +			dst_mem_type = MEM_TYPE_LOCAL;
> +			src_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> +			dst_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> +			resolve = FULL_RESOLVE;
> +	}
> +
> +	offset_src = get_offset(ahnd, src, src_size, 0);
> +	offset_dst = get_offset(ahnd, dst, dst_size, 0);
> +	offset_bb = get_offset(ahnd, cmd, bb_size, 0);
> +
> +	/* construct the batch buffer */
> +	memset(reloc, 0, sizeof(reloc));
> +	len = make_block_copy_batch(fd, batch_buf,
> +				    src, dst, length, reloc,
> +				    offset_src, offset_dst,
> +				    src_mem_type, dst_mem_type,
> +				    src_compression, dst_compression,
> +				    resolve);
> +	igt_assert(len > 0);
> +
> +	/* write batch buffer to 'cmd' BO */
> +	gem_write(fd, cmd, 0, batch_buf, len);
> +
> +	/* Execute the batch buffer */
> +	memset(exec, 0, sizeof(exec));
> +	if (mode == LOCAL_TO_LOCAL_INPLACE) {
> +		exec[0].handle = dst;
> +		exec[1].handle = cmd;
> +		exec[1].relocation_count = !ahnd ? 4 : 0;
> +		exec[1].relocs_ptr = to_user_pointer(reloc);
> +		if (ahnd) {
> +			exec[0].offset = offset_src;
> +			exec[0].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
> +			exec[1].offset = offset_dst;
> +			exec[1].flags |= EXEC_OBJECT_PINNED;
> +		}
> +	} else {
> +		exec[0].handle = src;
> +		exec[1].handle = dst;
> +		exec[2].handle = cmd;
> +		exec[2].relocation_count = !ahnd ? 4 : 0;
> +		exec[2].relocs_ptr = to_user_pointer(reloc);
> +		if (ahnd) {
> +			exec[0].offset = offset_src;
> +			exec[0].flags |= EXEC_OBJECT_PINNED;
> +			exec[1].offset = offset_dst;
> +			exec[1].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
> +			exec[2].offset = offset_bb;
> +			exec[2].flags |= EXEC_OBJECT_PINNED;
> +		}
> +	}
> +
> +	memset(&execbuf, 0, sizeof(execbuf));
> +	execbuf.buffers_ptr = to_user_pointer(exec);
> +
> +	if (mode == LOCAL_TO_LOCAL_INPLACE)
> +		execbuf.buffer_count = 2;
> +	else
> +		execbuf.buffer_count = 3;
> +	execbuf.batch_len = len;
> +
> +	if (ctx)
> +		execbuf.rsvd1 = ctx;
> +
> +	execbuf.flags = I915_EXEC_BLT;
> +	if (e)
> +		execbuf.flags = e->flags;
> +
> +	gem_execbuf(fd, &execbuf);
> +	gem_close(fd, cmd);
> +	put_offset(ahnd, src);
> +	put_offset(ahnd, dst);
> +	put_offset(ahnd, cmd);
> +}
> +
> +void xy_block_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +		       uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +		       uint32_t length, enum copy_mode mode, bool enable_compression,
> +		       struct intel_execution_engine2 *e)
> +{
> +	__xy_block_copy_blt(fd, bb_region, src, dst, src_size, dst_size, ahnd,
> +			    length, mode, enable_compression, 0, e);
> +}
> +
> +void xy_block_copy_blt_ctx(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +			   uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +			   uint32_t length, enum copy_mode mode, bool enable_compression,
> +			   uint32_t ctx, struct intel_execution_engine2 *e)
> +{
> +	__xy_block_copy_blt(fd, bb_region, src, dst, src_size, dst_size, ahnd,
> +			    length, mode, enable_compression, ctx, e);
> +}
> +
> +/*
> + * make_ctrl_surf_batch:
> + * @fd: open i915 drm file descriptor
> + * @batch_buf: the batch buffer to populate with the command
> + * @src: fd of the source BO
> + * @dst: fd of the destination BO
> + * @length: size of the ctrl surf in bytes
> + * @reloc: pointer to the relocation entyr for this command
> + * @offset_src: source address offset
> + * @offset_dst: destination address offset
> + * @src_mem_access: source memory type (denotes direct or indirect
> + *			addressing)
> + * @dst_mem_acdcess: destination memory type (denotes direct or indirect
> + *			addressing)
> + */
> +static int make_ctrl_surf_batch(int fd, uint32_t *batch_buf,
> +				uint32_t src, uint32_t dst, uint32_t length,
> +				struct drm_i915_gem_relocation_entry *reloc,
> +				uint64_t offset_src, uint64_t offset_dst,
> +				int src_mem_access, int dst_mem_access)
> +{
> +	int num_ccs_blocks;
> +	uint32_t *b = batch_buf;
> +	uint8_t src_mocs = intel_get_uc_mocs(fd);
> +	uint8_t dst_mocs = src_mocs;
> +
> +	num_ccs_blocks = length/CCS_RATIO;
> +	if (num_ccs_blocks < 1)
> +		num_ccs_blocks = 1;
> +	if (num_ccs_blocks > NUM_CCS_BLKS_PER_XFER)
> +		return 0;
> +
> +	/*
> +	 * We use logical AND with 1023 since the size field
> +	 * takes values which is in the range of 0 - 1023
> +	 */
> +	b[0] = ((XY_CTRL_SURF_COPY_BLT) |
> +		(src_mem_access << SRC_ACCESS_TYPE_SHIFT) |
> +		(dst_mem_access << DST_ACCESS_TYPE_SHIFT) |
> +		(((num_ccs_blocks - 1) & 1023) << CCS_SIZE_SHIFT));
> +
> +	b[1] = offset_src;
> +	b[2] = offset_src >> 32 | src_mocs << XY_CTRL_SURF_COPY_BLT_MOCS_SHIFT;
> +
> +	/* relocate address in b[1] and b[2] */
> +	reloc->offset = 1 * sizeof(uint32_t);
> +	reloc->delta = 0;
> +	reloc->target_handle = src;
> +	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
> +	reloc->write_domain = 0;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +
> +	b[3] = offset_dst;
> +	b[4] = offset_dst >> 32 | dst_mocs << XY_CTRL_SURF_COPY_BLT_MOCS_SHIFT;
> +
> +	/* relocate address in b[3] and b[4] */
> +	reloc->offset = 3 * (sizeof(uint32_t));
> +	reloc->delta = 0;
> +	reloc->target_handle = dst;
> +	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
> +	reloc->write_domain = I915_GEM_DOMAIN_RENDER;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +
> +	b[5] = 0;
> +
> +	b[6] = MI_FLUSH_DW | MI_FLUSH_LLC | MI_INVALIDATE_TLB;
> +
> +	reloc->offset = 7 * sizeof(uint32_t);
> +	reloc->delta = 0;
> +	reloc->target_handle =
> +	dst_mem_access == INDIRECT_ACCESS ? dst : src;
> +	reloc->read_domains = 0;
> +	reloc->write_domain = 0;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +	b[9] = 0;
> +
> +	b[10] = MI_FLUSH_DW | MI_FLUSH_CCS;
> +	reloc->offset = 11 * sizeof(uint32_t);
> +	reloc->delta = 0;
> +	reloc->target_handle =
> +	dst_mem_access == INDIRECT_ACCESS ? dst : src;
> +	reloc->read_domains = 0;
> +	reloc->write_domain = 0;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +	b[13] = 0;
> +
> +	b[14] = MI_BATCH_BUFFER_END;
> +	b[15] = 0;
> +
> +	b += 16;
> +
> +	return (b - batch_buf) * sizeof(uint32_t);
> +}
> +
> +static void __xy_ctrl_surf_copy_blt(int fd, uint32_t bb_region, uint32_t src,
> +				    uint32_t dst, uint64_t src_size, uint64_t dst_size,
> +				    uint64_t ahnd, uint32_t length, bool writetodev,
> +				    uint32_t ctx, struct intel_execution_engine2 *e)
> +{
> +	struct drm_i915_gem_relocation_entry reloc[4];
> +	struct drm_i915_gem_exec_object2 exec[3];
> +	struct drm_i915_gem_execbuffer2 execbuf;
> +	int len, src_mem_access, dst_mem_access;
> +	uint32_t cmd, batch_buf[BATCH_SIZE/sizeof(uint32_t)] = {};
> +	uint64_t offset_src, offset_dst, offset_bb, bb_size, ret;
> +
> +	bb_size = BATCH_SIZE;
> +	ret = __gem_create_in_memory_regions(fd, &cmd, &bb_size, bb_region);
> +	igt_assert_eq(ret, 0);
> +
> +	if (writetodev) {
> +		src_mem_access = DIRECT_ACCESS;
> +		dst_mem_access = INDIRECT_ACCESS;
> +	} else {
> +		src_mem_access = INDIRECT_ACCESS;
> +		dst_mem_access = DIRECT_ACCESS;
> +	}
> +
> +	offset_src = get_offset(ahnd, src, src_size, 0);
> +	offset_dst = get_offset(ahnd, dst, dst_size, 0);
> +	offset_bb = get_offset(ahnd, cmd, bb_size, 0);
> +
> +	/* construct batch command buffer */
> +	memset(reloc, 0, sizeof(reloc));
> +	len = make_ctrl_surf_batch(fd, batch_buf,
> +				   src, dst, length, reloc,
> +				   offset_src, offset_dst,
> +				   src_mem_access, dst_mem_access);
> +	igt_assert(len > 0);
> +
> +	/* Copy the batch buff to BO cmd */
> +	gem_write(fd, cmd, 0, batch_buf, len);
> +
> +	/* Execute the batch buffer */
> +	memset(exec, 0, sizeof(exec));
> +	exec[0].handle = src;
> +	exec[1].handle = dst;
> +	exec[2].handle = cmd;
> +	exec[2].relocation_count = !ahnd ? 4 : 0;
> +	exec[2].relocs_ptr = to_user_pointer(reloc);
> +	if (ahnd) {
> +		exec[0].offset = offset_src;
> +		exec[0].flags |= EXEC_OBJECT_PINNED;
> +		exec[1].offset = offset_dst;
> +		exec[1].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
> +		exec[2].offset = offset_bb;
> +		exec[2].flags |= EXEC_OBJECT_PINNED;
> +	}
> +
> +	memset(&execbuf, 0, sizeof(execbuf));
> +	execbuf.buffers_ptr = to_user_pointer(exec);
> +	execbuf.buffer_count = 3;
> +	execbuf.batch_len = len;
> +	execbuf.flags = I915_EXEC_BLT;
> +	if (ctx)
> +		execbuf.rsvd1 = ctx;
> +	if (e)
> +		execbuf.flags = e->flags;
> +
> +	gem_execbuf(fd, &execbuf);
> +	gem_close(fd, cmd);
> +	put_offset(ahnd, src);
> +	put_offset(ahnd, dst);
> +	put_offset(ahnd, cmd);
> +}
> +
> +void xy_ctrl_surf_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +			   uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +			   uint32_t length, bool writetodev,
> +			   struct intel_execution_engine2 *e)
> +{
> +	__xy_ctrl_surf_copy_blt(fd, bb_region, src, dst, src_size, dst_size,
> +				ahnd, length, writetodev, 0, e);
> +}
> +
> +void xy_ctrl_surf_copy_blt_ctx(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +			       uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +			       uint32_t length, bool writetodev, uint32_t ctx,
> +			       struct intel_execution_engine2 *e)
> +{
> +	__xy_ctrl_surf_copy_blt(fd, bb_region, src, dst, src_size, dst_size,
> +				ahnd, length, writetodev, ctx, e);
> +}
> +
> diff --git a/lib/i915/i915_blt.h b/lib/i915/i915_blt.h
> new file mode 100644
> index 00000000..71653880
> --- /dev/null
> +++ b/lib/i915/i915_blt.h
> @@ -0,0 +1,82 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2021 Intel Corporation
> + */
> +
> +#include <errno.h>
> +#include <sys/ioctl.h>
> +#include <sys/time.h>
> +#include <malloc.h>
> +#include "drm.h"
> +#include "igt.h"
> +
> +#define MI_FLUSH_DW_LEN_DWORD	4
> +#define MI_FLUSH_DW		(0x26 << 23 | 1)
> +#define MI_FLUSH_CCS		(1 << 16)
> +#define MI_FLUSH_LLC		(1 << 9)
> +#define MI_INVALIDATE_TLB	(1 << 18)
> +
> +/* XY_BLOCK_COPY_BLT instruction has 22 bit groups 1 DWORD each */
> +#define XY_BLOCK_COPY_BLT_LEN_DWORD	22
> +#define BLOCK_COPY_BLT_CMD		(2 << 29 | 0x41 << 22 | 0x14)
> +#define COMPRESSION_ENABLE		(1 << 29)
> +#define AUX_CCS_E			(5 << 18)
> +#define FULL_RESOLVE			(1 << 12)
> +#define PARTIAL_RESOLVE			(2 << 12)
> +#define TILE_4_FORMAT			(2 << 30)
> +#define TILE_4_WIDTH			(128)
> +#define TILE_4_WIDTH_DWORD		((128 >> 2) - 1)
> +#define TILE_4_HEIGHT			(32)
> +#define SURFACE_TYPE_2D			(1 << 29)
> +
> +#define DEST_Y2_COORDINATE_SHIFT	(16)
> +#define DEST_MEM_TYPE_SHIFT		(31)
> +#define SRC_MEM_TYPE_SHIFT		(31)
> +#define DEST_SURF_WIDTH_SHIFT		(14)
> +#define SRC_SURF_WIDTH_SHIFT		(14)
> +
> +#define XY_CTRL_SURF_COPY_BLT		(2<<29 | 0x48<<22 | 3)
> +#define SRC_ACCESS_TYPE_SHIFT		21
> +#define DST_ACCESS_TYPE_SHIFT		20
> +#define CCS_SIZE_SHIFT			8
> +#define MI_INSTR(opcode, flags) (((opcode) << 23) | (flags))
> +#define MI_ARB_CHECK			MI_INSTR(0x05, 0)
> +#define NUM_CCS_BLKS_PER_XFER		1024
> +#define INDIRECT_ACCESS                 0
> +#define DIRECT_ACCESS                   1
> +
> +#define BATCH_SIZE			4096
> +#define BOSIZE_MIN			(4*1024)
> +#define BOSIZE_MAX			(4*1024*1024)
> +#define CCS_RATIO			256
> +
> +#define MEM_TYPE_SYS			1
> +#define MEM_TYPE_LOCAL			0
> +
> +enum copy_mode {
> +	SYS_TO_SYS = 0,
> +	SYS_TO_LOCAL,
> +	LOCAL_TO_SYS,
> +	LOCAL_TO_LOCAL,
> +	LOCAL_TO_LOCAL_INPLACE,
> +};
> +
> +void xy_block_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +		       uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +		       uint32_t length, enum copy_mode mode, bool enable_compression,
> +		       struct intel_execution_engine2 *e);
> +
> +void xy_ctrl_surf_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +			   uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +			   uint32_t length, bool writetodev,
> +			   struct intel_execution_engine2 *e);
> +
> +void xy_block_copy_blt_ctx(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +			   uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +			   uint32_t length, enum copy_mode mode, bool enable_compression,
> +			   uint32_t ctx, struct intel_execution_engine2 *e);
> +
> +void xy_ctrl_surf_copy_blt_ctx(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +			       uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +			       uint32_t length, bool writetodev, uint32_t ctx,
> +			       struct intel_execution_engine2 *e);
> diff --git a/lib/meson.build b/lib/meson.build
> index f500f0f1..f2924541 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -12,6 +12,7 @@ lib_sources = [
>  	'i915/gem_vm.c',
>  	'i915/intel_memory_region.c',
>  	'i915/intel_mocs.c',
> +	'i915/i915_blt.c',
>  	'igt_collection.c',
>  	'igt_color_encoding.c',
>  	'igt_debugfs.c',
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [igt-dev] [PATCH i-g-t, v4 2/5] lib/i915: Introduce library i915_blt
  2021-12-10 13:05 ` [igt-dev] [PATCH i-g-t, v4 2/5] lib/i915: Introduce library i915_blt apoorva1.singh
  2021-12-15  9:40   ` Zbigniew Kempczyński
  2021-12-16 13:11   ` Zbigniew Kempczyński
@ 2021-12-16 13:15   ` Zbigniew Kempczyński
  2021-12-16 13:19   ` Zbigniew Kempczyński
  3 siblings, 0 replies; 14+ messages in thread
From: Zbigniew Kempczyński @ 2021-12-16 13:15 UTC (permalink / raw)
  To: apoorva1.singh; +Cc: igt-dev

On Fri, Dec 10, 2021 at 06:35:30PM +0530, apoorva1.singh@intel.com wrote:
> From: Apoorva Singh <apoorva1.singh@intel.com>
> 
> Add new library 'i915_blt' for various blt commands.
> 
> Signed-off-by: Apoorva Singh <apoorva1.singh@intel.com>
> Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
> Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
> Cc: Melkaveri, Arjun <arjun.melkaveri@intel.com>
> ---
>  lib/i915/i915_blt.c | 469 ++++++++++++++++++++++++++++++++++++++++++++
>  lib/i915/i915_blt.h |  82 ++++++++
>  lib/meson.build     |   1 +
>  3 files changed, 552 insertions(+)
>  create mode 100644 lib/i915/i915_blt.c
>  create mode 100644 lib/i915/i915_blt.h
> 
> diff --git a/lib/i915/i915_blt.c b/lib/i915/i915_blt.c
> new file mode 100644
> index 00000000..abfe7739
> --- /dev/null
> +++ b/lib/i915/i915_blt.c
> @@ -0,0 +1,469 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2021 Intel Corporation
> + */
> +
> +#include <errno.h>
> +#include <sys/ioctl.h>
> +#include <sys/time.h>
> +#include <malloc.h>
> +#include "drm.h"
> +#include "igt.h"
> +#include "i915_blt.h"
> +#include "i915/intel_mocs.h"
> +
> +/*
> + * make_block_copy_batch:
> + * @fd: open i915 drm file descriptor
> + * @batch_buf: the batch buffer to populate with the command
> + * @src: fd of the source BO
> + * @dst: fd of the destination BO
> + * @length: size of the src and dest BOs
> + * @reloc: pointer to the relocation entyr for this command
> + * @offset_src: source address offset
> + * @offset_dst: destination address offset
> + * @src_mem_type: source memory type (denotes direct or indirect
> + *			addressing)
> + * @dst_mem_type: destination memory type (denotes direct or indirect
> + *			addressing)
> + * @src_compression: flag to enable uncompressed read of compressed data
> + *			at the source
> + * @dst_compression: flag to enable compressed write at the destination
> + * @resolve: flag to enable resolve of compressed data
> + */
> +static int make_block_copy_batch(int fd, uint32_t *batch_buf,
> +				 uint32_t src, uint32_t dst, uint32_t length,
> +				 struct drm_i915_gem_relocation_entry *reloc,
> +				 uint64_t offset_src, uint64_t offset_dst,
> +				 int src_mem_type, int dst_mem_type,
> +				 int src_compression, int dst_compression,
> +				 int resolve)
> +{
> +	uint32_t *b = batch_buf;
> +	uint32_t devid;
> +	uint8_t src_mocs = intel_get_uc_mocs(fd);
> +	uint8_t dst_mocs = src_mocs;
> +
> +	devid = intel_get_drm_devid(fd);
> +
> +	igt_assert(AT_LEAST_GEN(devid, 12) && IS_TIGERLAKE(devid) && !(src_compression || dst_compression));
> +
> +	/* BG 0 */
> +	b[0] = BLOCK_COPY_BLT_CMD | resolve;
> +
> +	/* BG 1
> +	 *
> +	 * Using Tile 4 dimensions.  Height = 32 rows
> +	 * Width = 128 bytes
> +	 */
> +	b[1] = dst_compression | TILE_4_FORMAT | TILE_4_WIDTH_DWORD |
> +		dst_mocs << XY_BLOCK_COPY_BLT_MOCS_SHIFT;
> +
> +	/* BG 3
> +	 *
> +	 * X2 = TILE_4_WIDTH
> +	 * Y2 = (length / TILE_4_WIDTH) << 16:
> +	 */
> +	b[3] = TILE_4_WIDTH | (length >> 7) << DEST_Y2_COORDINATE_SHIFT;
> +
> +	b[4] = offset_dst;
> +	b[5] = offset_dst >> 32;
> +
> +	/* relocate address in b[4] and b[5] */
> +	reloc->offset = 4 * (sizeof(uint32_t));
> +	reloc->delta = 0;
> +	reloc->target_handle = dst;
> +	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
> +	reloc->write_domain = I915_GEM_DOMAIN_RENDER;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +
> +	/* BG 6 */
> +	b[6] = dst_mem_type << DEST_MEM_TYPE_SHIFT;
> +
> +	/* BG 8 */
> +	b[8] = src_compression | TILE_4_WIDTH_DWORD | TILE_4_FORMAT |
> +		src_mocs << XY_BLOCK_COPY_BLT_MOCS_SHIFT;
> +
> +	b[9] = offset_src;
> +	b[10] = offset_src >> 32;
> +
> +	/* relocate address in b[9] and b[10] */
> +	reloc->offset = 9 * sizeof(uint32_t);
> +	reloc->delta = 0;
> +	reloc->target_handle = src;
> +	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
> +	reloc->write_domain = 0;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +
> +	/* BG 11 */
> +	b[11] = src_mem_type << SRC_MEM_TYPE_SHIFT;
> +
> +	/* BG 16  */
> +	b[16] = SURFACE_TYPE_2D |
> +		((TILE_4_WIDTH - 1) << DEST_SURF_WIDTH_SHIFT) |
> +		(TILE_4_HEIGHT - 1);
> +
> +	/* BG 19 */
> +	b[19] = SURFACE_TYPE_2D |
> +		((TILE_4_WIDTH - 1) << SRC_SURF_WIDTH_SHIFT) |
> +		(TILE_4_HEIGHT - 1);
> +
> +	b += XY_BLOCK_COPY_BLT_LEN_DWORD;
> +
> +	b[0] = MI_FLUSH_DW | MI_FLUSH_LLC | MI_INVALIDATE_TLB;
> +	reloc->offset = 23 * sizeof(uint32_t);
> +	reloc->delta = 0;
> +	reloc->target_handle = dst_compression > 0 ? dst : src;
> +	reloc->read_domains = 0;
> +	reloc->write_domain = 0;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +	b[3] = 0;
> +
> +	b[4] = MI_FLUSH_DW | MI_FLUSH_CCS;
> +	reloc->offset = 27 * sizeof(uint32_t);
> +	reloc->delta = 0;
> +	reloc->target_handle = dst_compression > 0 ? dst : src;
> +	reloc->read_domains = 0;
> +	reloc->write_domain = 0;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +	b[7] = 0;
> +
> +	b[8] = MI_BATCH_BUFFER_END;
> +	b[9] = 0;
> +
> +	b += 10;
> +
> +	return (b - batch_buf) * sizeof(uint32_t);
> +}
> +
> +static void __xy_block_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +				uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +				uint32_t length, enum copy_mode mode, bool enable_compression,
> +				uint32_t ctx, struct intel_execution_engine2 *e)
> +{
> +	struct drm_i915_gem_relocation_entry reloc[4];
> +	struct drm_i915_gem_exec_object2 exec[3];
> +	struct drm_i915_gem_execbuffer2 execbuf;
> +	int len;
> +	int src_mem_type, dst_mem_type;
> +	int dst_compression, src_compression;
> +	int resolve;
> +	uint32_t cmd, batch_buf[BATCH_SIZE/sizeof(uint32_t)] = {};
> +	uint64_t offset_src, offset_dst, offset_bb, bb_size, ret;
> +
> +	bb_size = BATCH_SIZE;
> +	ret = __gem_create_in_memory_regions(fd, &cmd, &bb_size, bb_region);
> +	igt_assert_eq(ret, 0);
> +
> +	switch(mode) {
> +		case SYS_TO_SYS: /* copy from smem to smem */
> +			src_mem_type = MEM_TYPE_SYS;
> +			dst_mem_type = MEM_TYPE_SYS;
> +			src_compression = 0;
> +			dst_compression = 0;
> +			resolve = 0;
> +		case SYS_TO_LOCAL: /* copy from smem to lmem */
> +			src_mem_type = MEM_TYPE_SYS;
> +			dst_mem_type = MEM_TYPE_LOCAL;
> +			src_compression = 0;
> +			dst_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> +			resolve = 0;
> +		case LOCAL_TO_SYS: /* copy from lmem to smem */
> +			src_mem_type = MEM_TYPE_LOCAL;
> +			dst_mem_type = MEM_TYPE_SYS;
> +			src_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> +			dst_compression = 0;
> +			resolve = 0;
> +		case LOCAL_TO_LOCAL: /* copy from lmem to lmem */
> +			src_mem_type = MEM_TYPE_LOCAL;
> +			dst_mem_type = MEM_TYPE_LOCAL;
> +			src_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> +			dst_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> +			resolve = 0;
> +		case LOCAL_TO_LOCAL_INPLACE: /* in-place decompress */
> +			src_mem_type = MEM_TYPE_LOCAL;
> +			dst_mem_type = MEM_TYPE_LOCAL;
> +			src_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> +			dst_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;

Same here: the bit settings are spread across two functions,
__xy_block_copy_blt() and make_block_copy_batch(), which I don't like.
I think we should pass all required parameters to make_block_copy_batch()
in raw form, so the caller never hands over already-shifted values, and
keep make_block_copy_batch() as the single place that does the bitshifts.
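
To illustrate the idea, here is a minimal sketch, not a proposed final
API. The macro values are copied from the i915_blt.h hunk in this patch;
the enum, the struct, the helper names and BLT_RESOLVE_SHIFT are
hypothetical names made up for this example. MOCS is left out because its
shift is defined in intel_mocs.h, which is not quoted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Encodings copied from the i915_blt.h hunk of this patch (made unsigned). */
#define BLOCK_COPY_BLT_CMD	(2u << 29 | 0x41u << 22 | 0x14u)
#define COMPRESSION_ENABLE	(1u << 29)
#define AUX_CCS_E		(5u << 18)
#define TILE_4_FORMAT		(2u << 30)
#define TILE_4_WIDTH_DWORD	((128u >> 2) - 1)
#define DEST_MEM_TYPE_SHIFT	31

/* Hypothetical name: the patch only has FULL_RESOLVE (1 << 12) and
 * PARTIAL_RESOLVE (2 << 12), i.e. a field starting at bit 12. */
#define BLT_RESOLVE_SHIFT	12

/* Hypothetical: raw, unshifted values handed in by the caller. */
enum blt_resolve {
	BLT_RESOLVE_NONE = 0,
	BLT_RESOLVE_FULL = 1,
	BLT_RESOLVE_PARTIAL = 2,
};

/* Hypothetical: per-surface parameters, still unshifted. */
struct blt_surface {
	bool compressed;	/* enable compression + AUX_CCS_E */
	uint32_t mem_type;	/* 0 = local, 1 = system, as MEM_TYPE_* */
};

/* All shifting happens in the helpers below, nowhere else. */
static uint32_t encode_cmd_dw(enum blt_resolve resolve)
{
	assert(resolve <= BLT_RESOLVE_PARTIAL);
	return BLOCK_COPY_BLT_CMD | ((uint32_t)resolve << BLT_RESOLVE_SHIFT);
}

static uint32_t encode_surface_dw(const struct blt_surface *s)
{
	uint32_t dw = TILE_4_FORMAT | TILE_4_WIDTH_DWORD;

	if (s->compressed)
		dw |= COMPRESSION_ENABLE | AUX_CCS_E;
	return dw;
}

static uint32_t encode_mem_type_dw(const struct blt_surface *s)
{
	return s->mem_type << DEST_MEM_TYPE_SHIFT;
}
```

With something along these lines, __xy_block_copy_blt() would only pick
booleans and enum values per copy mode, and the batch builder would own
every bit position.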

--
Zbigniew
 
> +			resolve = FULL_RESOLVE;
> +	}
> +
> +	offset_src = get_offset(ahnd, src, src_size, 0);
> +	offset_dst = get_offset(ahnd, dst, dst_size, 0);
> +	offset_bb = get_offset(ahnd, cmd, bb_size, 0);
> +
> +	/* construct the batch buffer */
> +	memset(reloc, 0, sizeof(reloc));
> +	len = make_block_copy_batch(fd, batch_buf,
> +				    src, dst, length, reloc,
> +				    offset_src, offset_dst,
> +				    src_mem_type, dst_mem_type,
> +				    src_compression, dst_compression,
> +				    resolve);
> +	igt_assert(len > 0);
> +
> +	/* write batch buffer to 'cmd' BO */
> +	gem_write(fd, cmd, 0, batch_buf, len);
> +
> +	/* Execute the batch buffer */
> +	memset(exec, 0, sizeof(exec));
> +	if (mode == LOCAL_TO_LOCAL_INPLACE) {
> +		exec[0].handle = dst;
> +		exec[1].handle = cmd;
> +		exec[1].relocation_count = !ahnd ? 4 : 0;
> +		exec[1].relocs_ptr = to_user_pointer(reloc);
> +		if (ahnd) {
> +			exec[0].offset = offset_src;
> +			exec[0].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
> +			exec[1].offset = offset_dst;
> +			exec[1].flags |= EXEC_OBJECT_PINNED;
> +		}
> +	} else {
> +		exec[0].handle = src;
> +		exec[1].handle = dst;
> +		exec[2].handle = cmd;
> +		exec[2].relocation_count = !ahnd ? 4 : 0;
> +		exec[2].relocs_ptr = to_user_pointer(reloc);
> +		if (ahnd) {
> +			exec[0].offset = offset_src;
> +			exec[0].flags |= EXEC_OBJECT_PINNED;
> +			exec[1].offset = offset_dst;
> +			exec[1].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
> +			exec[2].offset = offset_bb;
> +			exec[2].flags |= EXEC_OBJECT_PINNED;
> +		}
> +	}
> +
> +	memset(&execbuf, 0, sizeof(execbuf));
> +	execbuf.buffers_ptr = to_user_pointer(exec);
> +
> +	if (mode == LOCAL_TO_LOCAL_INPLACE)
> +		execbuf.buffer_count = 2;
> +	else
> +		execbuf.buffer_count = 3;
> +	execbuf.batch_len = len;
> +
> +	if (ctx)
> +		execbuf.rsvd1 = ctx;
> +
> +	execbuf.flags = I915_EXEC_BLT;
> +	if (e)
> +		execbuf.flags = e->flags;
> +
> +	gem_execbuf(fd, &execbuf);
> +	gem_close(fd, cmd);
> +	put_offset(ahnd, src);
> +	put_offset(ahnd, dst);
> +	put_offset(ahnd, cmd);
> +}
> +
> +void xy_block_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +		       uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +		       uint32_t length, enum copy_mode mode, bool enable_compression,
> +		       struct intel_execution_engine2 *e)
> +{
> +	__xy_block_copy_blt(fd, bb_region, src, dst, src_size, dst_size, ahnd,
> +			    length, mode, enable_compression, 0, e);
> +}
> +
> +void xy_block_copy_blt_ctx(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +			   uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +			   uint32_t length, enum copy_mode mode, bool enable_compression,
> +			   uint32_t ctx, struct intel_execution_engine2 *e)
> +{
> +	__xy_block_copy_blt(fd, bb_region, src, dst, src_size, dst_size, ahnd,
> +			    length, mode, enable_compression, ctx, e);
> +}
> +
> +/*
> + * make_ctrl_surf_batch:
> + * @fd: open i915 drm file descriptor
> + * @batch_buf: the batch buffer to populate with the command
> + * @src: fd of the source BO
> + * @dst: fd of the destination BO
> + * @length: size of the ctrl surf in bytes
> + * @reloc: pointer to the relocation entry for this command
> + * @offset_src: source address offset
> + * @offset_dst: destination address offset
> + * @src_mem_access: source memory type (denotes direct or indirect
> + *			addressing)
> + * @dst_mem_access: destination memory type (denotes direct or indirect
> + *			addressing)
> + */
> +static int make_ctrl_surf_batch(int fd, uint32_t *batch_buf,
> +				uint32_t src, uint32_t dst, uint32_t length,
> +				struct drm_i915_gem_relocation_entry *reloc,
> +				uint64_t offset_src, uint64_t offset_dst,
> +				int src_mem_access, int dst_mem_access)
> +{
> +	int num_ccs_blocks;
> +	uint32_t *b = batch_buf;
> +	uint8_t src_mocs = intel_get_uc_mocs(fd);
> +	uint8_t dst_mocs = src_mocs;
> +
> +	num_ccs_blocks = length/CCS_RATIO;
> +	if (num_ccs_blocks < 1)
> +		num_ccs_blocks = 1;
> +	if (num_ccs_blocks > NUM_CCS_BLKS_PER_XFER)
> +		return 0;
> +
> +	/*
> +	 * We use logical AND with 1023 since the size field
> +	 * takes values which is in the range of 0 - 1023
> +	 */
> +	b[0] = ((XY_CTRL_SURF_COPY_BLT) |
> +		(src_mem_access << SRC_ACCESS_TYPE_SHIFT) |
> +		(dst_mem_access << DST_ACCESS_TYPE_SHIFT) |
> +		(((num_ccs_blocks - 1) & 1023) << CCS_SIZE_SHIFT));
> +
> +	b[1] = offset_src;
> +	b[2] = offset_src >> 32 | src_mocs << XY_CTRL_SURF_COPY_BLT_MOCS_SHIFT;
> +
> +	/* relocate address in b[1] and b[2] */
> +	reloc->offset = 1 * sizeof(uint32_t);
> +	reloc->delta = 0;
> +	reloc->target_handle = src;
> +	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
> +	reloc->write_domain = 0;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +
> +	b[3] = offset_dst;
> +	b[4] = offset_dst >> 32 | dst_mocs << XY_CTRL_SURF_COPY_BLT_MOCS_SHIFT;
> +
> +	/* relocate address in b[3] and b[4] */
> +	reloc->offset = 3 * (sizeof(uint32_t));
> +	reloc->delta = 0;
> +	reloc->target_handle = dst;
> +	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
> +	reloc->write_domain = I915_GEM_DOMAIN_RENDER;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +
> +	b[5] = 0;
> +
> +	b[6] = MI_FLUSH_DW | MI_FLUSH_LLC | MI_INVALIDATE_TLB;
> +
> +	reloc->offset = 7 * sizeof(uint32_t);
> +	reloc->delta = 0;
> +	reloc->target_handle =
> +	dst_mem_access == INDIRECT_ACCESS ? dst : src;
> +	reloc->read_domains = 0;
> +	reloc->write_domain = 0;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +	b[9] = 0;
> +
> +	b[10] = MI_FLUSH_DW | MI_FLUSH_CCS;
> +	reloc->offset = 11 * sizeof(uint32_t);
> +	reloc->delta = 0;
> +	reloc->target_handle =
> +	dst_mem_access == INDIRECT_ACCESS ? dst : src;
> +	reloc->read_domains = 0;
> +	reloc->write_domain = 0;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +	b[13] = 0;
> +
> +	b[14] = MI_BATCH_BUFFER_END;
> +	b[15] = 0;
> +
> +	b += 16;
> +
> +	return (b - batch_buf) * sizeof(uint32_t);
> +}
> +
> +static void __xy_ctrl_surf_copy_blt(int fd, uint32_t bb_region, uint32_t src,
> +				    uint32_t dst, uint64_t src_size, uint64_t dst_size,
> +				    uint64_t ahnd, uint32_t length, bool writetodev,
> +				    uint32_t ctx, struct intel_execution_engine2 *e)
> +{
> +	struct drm_i915_gem_relocation_entry reloc[4];
> +	struct drm_i915_gem_exec_object2 exec[3];
> +	struct drm_i915_gem_execbuffer2 execbuf;
> +	int len, src_mem_access, dst_mem_access;
> +	uint32_t cmd, batch_buf[BATCH_SIZE/sizeof(uint32_t)] = {};
> +	uint64_t offset_src, offset_dst, offset_bb, bb_size, ret;
> +
> +	bb_size = BATCH_SIZE;
> +	ret = __gem_create_in_memory_regions(fd, &cmd, &bb_size, bb_region);
> +	igt_assert_eq(ret, 0);
> +
> +	if (writetodev) {
> +		src_mem_access = DIRECT_ACCESS;
> +		dst_mem_access = INDIRECT_ACCESS;
> +	} else {
> +		src_mem_access = INDIRECT_ACCESS;
> +		dst_mem_access = DIRECT_ACCESS;
> +	}
> +
> +	offset_src = get_offset(ahnd, src, src_size, 0);
> +	offset_dst = get_offset(ahnd, dst, dst_size, 0);
> +	offset_bb = get_offset(ahnd, cmd, bb_size, 0);
> +
> +	/* construct batch command buffer */
> +	memset(reloc, 0, sizeof(reloc));
> +	len = make_ctrl_surf_batch(fd, batch_buf,
> +				   src, dst, length, reloc,
> +				   offset_src, offset_dst,
> +				   src_mem_access, dst_mem_access);
> +	igt_assert(len > 0);
> +
> +	/* Copy the batch buff to BO cmd */
> +	gem_write(fd, cmd, 0, batch_buf, len);
> +
> +	/* Execute the batch buffer */
> +	memset(exec, 0, sizeof(exec));
> +	exec[0].handle = src;
> +	exec[1].handle = dst;
> +	exec[2].handle = cmd;
> +	exec[2].relocation_count = !ahnd ? 4 : 0;
> +	exec[2].relocs_ptr = to_user_pointer(reloc);
> +	if (ahnd) {
> +		exec[0].offset = offset_src;
> +		exec[0].flags |= EXEC_OBJECT_PINNED;
> +		exec[1].offset = offset_dst;
> +		exec[1].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
> +		exec[2].offset = offset_bb;
> +		exec[2].flags |= EXEC_OBJECT_PINNED;
> +	}
> +
> +	memset(&execbuf, 0, sizeof(execbuf));
> +	execbuf.buffers_ptr = to_user_pointer(exec);
> +	execbuf.buffer_count = 3;
> +	execbuf.batch_len = len;
> +	execbuf.flags = I915_EXEC_BLT;
> +	if (ctx)
> +		execbuf.rsvd1 = ctx;
> +	if (e)
> +		execbuf.flags = e->flags;
> +
> +	gem_execbuf(fd, &execbuf);
> +	gem_close(fd, cmd);
> +	put_offset(ahnd, src);
> +	put_offset(ahnd, dst);
> +	put_offset(ahnd, cmd);
> +}
> +
> +void xy_ctrl_surf_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +			   uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +			   uint32_t length, bool writetodev,
> +			   struct intel_execution_engine2 *e)
> +{
> +	__xy_ctrl_surf_copy_blt(fd, bb_region, src, dst, src_size, dst_size,
> +				ahnd, length, writetodev, 0, e);
> +}
> +
> +void xy_ctrl_surf_copy_blt_ctx(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +			       uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +			       uint32_t length, bool writetodev, uint32_t ctx,
> +			       struct intel_execution_engine2 *e)
> +{
> +	__xy_ctrl_surf_copy_blt(fd, bb_region, src, dst, src_size, dst_size,
> +				ahnd, length, writetodev, ctx, e);
> +}
> +
> diff --git a/lib/i915/i915_blt.h b/lib/i915/i915_blt.h
> new file mode 100644
> index 00000000..71653880
> --- /dev/null
> +++ b/lib/i915/i915_blt.h
> @@ -0,0 +1,82 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2021 Intel Corporation
> + */
> +
> +#include <errno.h>
> +#include <sys/ioctl.h>
> +#include <sys/time.h>
> +#include <malloc.h>
> +#include "drm.h"
> +#include "igt.h"
> +
> +#define MI_FLUSH_DW_LEN_DWORD	4
> +#define MI_FLUSH_DW		(0x26 << 23 | 1)
> +#define MI_FLUSH_CCS		(1 << 16)
> +#define MI_FLUSH_LLC		(1 << 9)
> +#define MI_INVALIDATE_TLB	(1 << 18)
> +
> +/* XY_BLOCK_COPY_BLT instruction has 22 bit groups 1 DWORD each */
> +#define XY_BLOCK_COPY_BLT_LEN_DWORD	22
> +#define BLOCK_COPY_BLT_CMD		(2 << 29 | 0x41 << 22 | 0x14)
> +#define COMPRESSION_ENABLE		(1 << 29)
> +#define AUX_CCS_E			(5 << 18)
> +#define FULL_RESOLVE			(1 << 12)
> +#define PARTIAL_RESOLVE			(2 << 12)
> +#define TILE_4_FORMAT			(2 << 30)
> +#define TILE_4_WIDTH			(128)
> +#define TILE_4_WIDTH_DWORD		((128 >> 2) - 1)
> +#define TILE_4_HEIGHT			(32)
> +#define SURFACE_TYPE_2D			(1 << 29)
> +
> +#define DEST_Y2_COORDINATE_SHIFT	(16)
> +#define DEST_MEM_TYPE_SHIFT		(31)
> +#define SRC_MEM_TYPE_SHIFT		(31)
> +#define DEST_SURF_WIDTH_SHIFT		(14)
> +#define SRC_SURF_WIDTH_SHIFT		(14)
> +
> +#define XY_CTRL_SURF_COPY_BLT		(2<<29 | 0x48<<22 | 3)
> +#define SRC_ACCESS_TYPE_SHIFT		21
> +#define DST_ACCESS_TYPE_SHIFT		20
> +#define CCS_SIZE_SHIFT			8
> +#define MI_INSTR(opcode, flags) (((opcode) << 23) | (flags))
> +#define MI_ARB_CHECK			MI_INSTR(0x05, 0)
> +#define NUM_CCS_BLKS_PER_XFER		1024
> +#define INDIRECT_ACCESS                 0
> +#define DIRECT_ACCESS                   1
> +
> +#define BATCH_SIZE			4096
> +#define BOSIZE_MIN			(4*1024)
> +#define BOSIZE_MAX			(4*1024*1024)
> +#define CCS_RATIO			256
> +
> +#define MEM_TYPE_SYS			1
> +#define MEM_TYPE_LOCAL			0
> +
> +enum copy_mode {
> +	SYS_TO_SYS = 0,
> +	SYS_TO_LOCAL,
> +	LOCAL_TO_SYS,
> +	LOCAL_TO_LOCAL,
> +	LOCAL_TO_LOCAL_INPLACE,
> +};
> +
> +void xy_block_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +		       uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +		       uint32_t length, enum copy_mode mode, bool enable_compression,
> +		       struct intel_execution_engine2 *e);
> +
> +void xy_ctrl_surf_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +			   uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +			   uint32_t length, bool writetodev,
> +			   struct intel_execution_engine2 *e);
> +
> +void xy_block_copy_blt_ctx(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +			   uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +			   uint32_t length, enum copy_mode mode, bool enable_compression,
> +			   uint32_t ctx, struct intel_execution_engine2 *e);
> +
> +void xy_ctrl_surf_copy_blt_ctx(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +			       uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +			       uint32_t length, bool writetodev, uint32_t ctx,
> +			       struct intel_execution_engine2 *e);
> diff --git a/lib/meson.build b/lib/meson.build
> index f500f0f1..f2924541 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -12,6 +12,7 @@ lib_sources = [
>  	'i915/gem_vm.c',
>  	'i915/intel_memory_region.c',
>  	'i915/intel_mocs.c',
> +	'i915/i915_blt.c',
>  	'igt_collection.c',
>  	'igt_color_encoding.c',
>  	'igt_debugfs.c',
> -- 
> 2.25.1
> 


* Re: [igt-dev] [PATCH i-g-t, v4 2/5] lib/i915: Introduce library i915_blt
  2021-12-10 13:05 ` [igt-dev] [PATCH i-g-t, v4 2/5] lib/i915: Introduce library i915_blt apoorva1.singh
                     ` (2 preceding siblings ...)
  2021-12-16 13:15   ` Zbigniew Kempczyński
@ 2021-12-16 13:19   ` Zbigniew Kempczyński
  2021-12-16 14:18     ` Singh, Apoorva1
  3 siblings, 1 reply; 14+ messages in thread
From: Zbigniew Kempczyński @ 2021-12-16 13:19 UTC (permalink / raw)
  To: apoorva1.singh; +Cc: igt-dev

On Fri, Dec 10, 2021 at 06:35:30PM +0530, apoorva1.singh@intel.com wrote:
> From: Apoorva Singh <apoorva1.singh@intel.com>
> 
> Add new library 'i915_blt' for various blt commands.
> 
> Signed-off-by: Apoorva Singh <apoorva1.singh@intel.com>
> Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
> Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
> Cc: Melkaveri, Arjun <arjun.melkaveri@intel.com>
> ---
>  lib/i915/i915_blt.c | 469 ++++++++++++++++++++++++++++++++++++++++++++
>  lib/i915/i915_blt.h |  82 ++++++++
>  lib/meson.build     |   1 +
>  3 files changed, 552 insertions(+)
>  create mode 100644 lib/i915/i915_blt.c
>  create mode 100644 lib/i915/i915_blt.h
> 
> diff --git a/lib/i915/i915_blt.c b/lib/i915/i915_blt.c
> new file mode 100644
> index 00000000..abfe7739
> --- /dev/null
> +++ b/lib/i915/i915_blt.c
> @@ -0,0 +1,469 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2021 Intel Corporation
> + */
> +
> +#include <errno.h>
> +#include <sys/ioctl.h>
> +#include <sys/time.h>
> +#include <malloc.h>
> +#include "drm.h"
> +#include "igt.h"
> +#include "i915_blt.h"
> +#include "i915/intel_mocs.h"
> +
> +/*
> + * make_block_copy_batch:
> + * @fd: open i915 drm file descriptor
> + * @batch_buf: the batch buffer to populate with the command
> + * @src: fd of the source BO
> + * @dst: fd of the destination BO
> + * @length: size of the src and dest BOs
> + * @reloc: pointer to the relocation entry for this command
> + * @offset_src: source address offset
> + * @offset_dst: destination address offset
> + * @src_mem_type: source memory type (denotes direct or indirect
> + *			addressing)
> + * @dst_mem_type: destination memory type (denotes direct or indirect
> + *			addressing)
> + * @src_compression: flag to enable uncompressed read of compressed data
> + *			at the source
> + * @dst_compression: flag to enable compressed write at the destination
> + * @resolve: flag to enable resolve of compressed data
> + */
> +static int make_block_copy_batch(int fd, uint32_t *batch_buf,
> +				 uint32_t src, uint32_t dst, uint32_t length,
> +				 struct drm_i915_gem_relocation_entry *reloc,
> +				 uint64_t offset_src, uint64_t offset_dst,
> +				 int src_mem_type, int dst_mem_type,
> +				 int src_compression, int dst_compression,
> +				 int resolve)
> +{
> +	uint32_t *b = batch_buf;
> +	uint32_t devid;
> +	uint8_t src_mocs = intel_get_uc_mocs(fd);
> +	uint8_t dst_mocs = src_mocs;
> +
> +	devid = intel_get_drm_devid(fd);
> +
> +	igt_assert(AT_LEAST_GEN(devid, 12) && IS_TIGERLAKE(devid) && !(src_compression || dst_compression));
> +
> +	/* BG 0 */
> +	b[0] = BLOCK_COPY_BLT_CMD | resolve;
> +
> +	/* BG 1
> +	 *
> +	 * Using Tile 4 dimensions.  Height = 32 rows
> +	 * Width = 128 bytes
> +	 */
> +	b[1] = dst_compression | TILE_4_FORMAT | TILE_4_WIDTH_DWORD |
> +		dst_mocs << XY_BLOCK_COPY_BLT_MOCS_SHIFT;
> +
> +	/* BG 3
> +	 *
> +	 * X2 = TILE_4_WIDTH
> +	 * Y2 = (length / TILE_4_WIDTH) << 16:
> +	 */
> +	b[3] = TILE_4_WIDTH | (length >> 7) << DEST_Y2_COORDINATE_SHIFT;
> +
> +	b[4] = offset_dst;
> +	b[5] = offset_dst >> 32;
> +
> +	/* relocate address in b[4] and b[5] */
> +	reloc->offset = 4 * (sizeof(uint32_t));
> +	reloc->delta = 0;
> +	reloc->target_handle = dst;
> +	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
> +	reloc->write_domain = I915_GEM_DOMAIN_RENDER;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +
> +	/* BG 6 */
> +	b[6] = dst_mem_type << DEST_MEM_TYPE_SHIFT;
> +
> +	/* BG 8 */
> +	b[8] = src_compression | TILE_4_WIDTH_DWORD | TILE_4_FORMAT |
> +		src_mocs << XY_BLOCK_COPY_BLT_MOCS_SHIFT;
> +
> +	b[9] = offset_src;
> +	b[10] = offset_src >> 32;
> +
> +	/* relocate address in b[9] and b[10] */
> +	reloc->offset = 9 * sizeof(uint32_t);
> +	reloc->delta = 0;
> +	reloc->target_handle = src;
> +	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
> +	reloc->write_domain = 0;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +
> +	/* BG 11 */
> +	b[11] = src_mem_type << SRC_MEM_TYPE_SHIFT;
> +
> +	/* BG 16  */
> +	b[16] = SURFACE_TYPE_2D |
> +		((TILE_4_WIDTH - 1) << DEST_SURF_WIDTH_SHIFT) |
> +		(TILE_4_HEIGHT - 1);
> +
> +	/* BG 19 */
> +	b[19] = SURFACE_TYPE_2D |
> +		((TILE_4_WIDTH - 1) << SRC_SURF_WIDTH_SHIFT) |
> +		(TILE_4_HEIGHT - 1);
> +
> +	b += XY_BLOCK_COPY_BLT_LEN_DWORD;
> +
> +	b[0] = MI_FLUSH_DW | MI_FLUSH_LLC | MI_INVALIDATE_TLB;
> +	reloc->offset = 23 * sizeof(uint32_t);
> +	reloc->delta = 0;
> +	reloc->target_handle = dst_compression > 0 ? dst : src;
> +	reloc->read_domains = 0;
> +	reloc->write_domain = 0;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +	b[3] = 0;
> +
> +	b[4] = MI_FLUSH_DW | MI_FLUSH_CCS;
> +	reloc->offset = 27 * sizeof(uint32_t);
> +	reloc->delta = 0;
> +	reloc->target_handle = dst_compression > 0 ? dst : src;
> +	reloc->read_domains = 0;
> +	reloc->write_domain = 0;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +	b[7] = 0;
> +
> +	b[8] = MI_BATCH_BUFFER_END;
> +	b[9] = 0;
> +
> +	b += 10;
> +
> +	return (b - batch_buf) * sizeof(uint32_t);
> +}
> +
> +static void __xy_block_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +				uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +				uint32_t length, enum copy_mode mode, bool enable_compression,
> +				uint32_t ctx, struct intel_execution_engine2 *e)
> +{
> +	struct drm_i915_gem_relocation_entry reloc[4];
> +	struct drm_i915_gem_exec_object2 exec[3];
> +	struct drm_i915_gem_execbuffer2 execbuf;
> +	int len;
> +	int src_mem_type, dst_mem_type;
> +	int dst_compression, src_compression;
> +	int resolve;
> +	uint32_t cmd, batch_buf[BATCH_SIZE/sizeof(uint32_t)] = {};
> +	uint64_t offset_src, offset_dst, offset_bb, bb_size, ret;
> +
> +	bb_size = BATCH_SIZE;
> +	ret = __gem_create_in_memory_regions(fd, &cmd, &bb_size, bb_region);
> +	igt_assert_eq(ret, 0);
> +
> +	switch(mode) {
> +		case SYS_TO_SYS: /* copy from smem to smem */
> +			src_mem_type = MEM_TYPE_SYS;
> +			dst_mem_type = MEM_TYPE_SYS;
> +			src_compression = 0;
> +			dst_compression = 0;
> +			resolve = 0;
> +		case SYS_TO_LOCAL: /* copy from smem to lmem */
> +			src_mem_type = MEM_TYPE_SYS;
> +			dst_mem_type = MEM_TYPE_LOCAL;
> +			src_compression = 0;
> +			dst_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> +			resolve = 0;
> +		case LOCAL_TO_SYS: /* copy from lmem to smem */
> +			src_mem_type = MEM_TYPE_LOCAL;
> +			dst_mem_type = MEM_TYPE_SYS;
> +			src_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> +			dst_compression = 0;
> +			resolve = 0;
> +		case LOCAL_TO_LOCAL: /* copy from lmem to lmem */
> +			src_mem_type = MEM_TYPE_LOCAL;
> +			dst_mem_type = MEM_TYPE_LOCAL;
> +			src_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> +			dst_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> +			resolve = 0;
> +		case LOCAL_TO_LOCAL_INPLACE: /* in-place decompress */
> +			src_mem_type = MEM_TYPE_LOCAL;
> +			dst_mem_type = MEM_TYPE_LOCAL;
> +			src_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> +			dst_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> +			resolve = FULL_RESOLVE;
> +	}

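Wow, I missed this before: every 'case' in this switch is missing its
'break', so whichever mode we enter with, control falls through and we
always end up with the settings of the last case (LOCAL_TO_LOCAL_INPLACE).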
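
For reference, a minimal sketch of the same mode selection with the
breaks in place. The enum and the MEM_TYPE_* values are copied from this
patch; struct copy_setup and copy_setup_for_mode() are hypothetical names
used only for this example, and the compression/resolve settings are kept
as plain booleans rather than shifted bits.

```c
#include <stdbool.h>

/* Copied from the i915_blt.h hunk of this patch. */
#define MEM_TYPE_SYS	1
#define MEM_TYPE_LOCAL	0

enum copy_mode {
	SYS_TO_SYS = 0,
	SYS_TO_LOCAL,
	LOCAL_TO_SYS,
	LOCAL_TO_LOCAL,
	LOCAL_TO_LOCAL_INPLACE,
};

/* Hypothetical container for the per-mode settings. */
struct copy_setup {
	int src_mem_type;
	int dst_mem_type;
	bool src_compressed;
	bool dst_compressed;
	bool full_resolve;
};

/* Each case ends in 'break', so only the selected mode's settings apply. */
static struct copy_setup copy_setup_for_mode(enum copy_mode mode,
					     bool enable_compression)
{
	struct copy_setup s = { 0 };

	switch (mode) {
	case SYS_TO_SYS:
		s.src_mem_type = MEM_TYPE_SYS;
		s.dst_mem_type = MEM_TYPE_SYS;
		break;
	case SYS_TO_LOCAL:
		s.src_mem_type = MEM_TYPE_SYS;
		s.dst_mem_type = MEM_TYPE_LOCAL;
		s.dst_compressed = enable_compression;
		break;
	case LOCAL_TO_SYS:
		s.src_mem_type = MEM_TYPE_LOCAL;
		s.dst_mem_type = MEM_TYPE_SYS;
		s.src_compressed = enable_compression;
		break;
	case LOCAL_TO_LOCAL:
		s.src_mem_type = MEM_TYPE_LOCAL;
		s.dst_mem_type = MEM_TYPE_LOCAL;
		s.src_compressed = enable_compression;
		s.dst_compressed = enable_compression;
		break;
	case LOCAL_TO_LOCAL_INPLACE:
		s.src_mem_type = MEM_TYPE_LOCAL;
		s.dst_mem_type = MEM_TYPE_LOCAL;
		s.src_compressed = enable_compression;
		s.dst_compressed = enable_compression;
		s.full_resolve = true;
		break;
	}

	return s;
}
```

Without the breaks, SYS_TO_SYS would come out with both memory types set
to local and a full resolve requested, which is what the code in this
patch currently does.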

--
Zbigniew

> +
> +	offset_src = get_offset(ahnd, src, src_size, 0);
> +	offset_dst = get_offset(ahnd, dst, dst_size, 0);
> +	offset_bb = get_offset(ahnd, cmd, bb_size, 0);
> +
> +	/* construct the batch buffer */
> +	memset(reloc, 0, sizeof(reloc));
> +	len = make_block_copy_batch(fd, batch_buf,
> +				    src, dst, length, reloc,
> +				    offset_src, offset_dst,
> +				    src_mem_type, dst_mem_type,
> +				    src_compression, dst_compression,
> +				    resolve);
> +	igt_assert(len > 0);
> +
> +	/* write batch buffer to 'cmd' BO */
> +	gem_write(fd, cmd, 0, batch_buf, len);
> +
> +	/* Execute the batch buffer */
> +	memset(exec, 0, sizeof(exec));
> +	if (mode == LOCAL_TO_LOCAL_INPLACE) {
> +		exec[0].handle = dst;
> +		exec[1].handle = cmd;
> +		exec[1].relocation_count = !ahnd ? 4 : 0;
> +		exec[1].relocs_ptr = to_user_pointer(reloc);
> +		if (ahnd) {
> +			exec[0].offset = offset_src;
> +			exec[0].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
> +			exec[1].offset = offset_dst;
> +			exec[1].flags |= EXEC_OBJECT_PINNED;
> +		}
> +	} else {
> +		exec[0].handle = src;
> +		exec[1].handle = dst;
> +		exec[2].handle = cmd;
> +		exec[2].relocation_count = !ahnd ? 4 : 0;
> +		exec[2].relocs_ptr = to_user_pointer(reloc);
> +		if (ahnd) {
> +			exec[0].offset = offset_src;
> +			exec[0].flags |= EXEC_OBJECT_PINNED;
> +			exec[1].offset = offset_dst;
> +			exec[1].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
> +			exec[2].offset = offset_bb;
> +			exec[2].flags |= EXEC_OBJECT_PINNED;
> +		}
> +	}
> +
> +	memset(&execbuf, 0, sizeof(execbuf));
> +	execbuf.buffers_ptr = to_user_pointer(exec);
> +
> +	if (mode == LOCAL_TO_LOCAL_INPLACE)
> +		execbuf.buffer_count = 2;
> +	else
> +		execbuf.buffer_count = 3;
> +	execbuf.batch_len = len;
> +
> +	if (ctx)
> +		execbuf.rsvd1 = ctx;
> +
> +	execbuf.flags = I915_EXEC_BLT;
> +	if (e)
> +		execbuf.flags = e->flags;
> +
> +	gem_execbuf(fd, &execbuf);
> +	gem_close(fd, cmd);
> +	put_offset(ahnd, src);
> +	put_offset(ahnd, dst);
> +	put_offset(ahnd, cmd);
> +}
> +
> +void xy_block_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +		       uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +		       uint32_t length, enum copy_mode mode, bool enable_compression,
> +		       struct intel_execution_engine2 *e)
> +{
> +	__xy_block_copy_blt(fd, bb_region, src, dst, src_size, dst_size, ahnd,
> +			    length, mode, enable_compression, 0, e);
> +}
> +
> +void xy_block_copy_blt_ctx(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +			   uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +			   uint32_t length, enum copy_mode mode, bool enable_compression,
> +			   uint32_t ctx, struct intel_execution_engine2 *e)
> +{
> +	__xy_block_copy_blt(fd, bb_region, src, dst, src_size, dst_size, ahnd,
> +			    length, mode, enable_compression, ctx, e);
> +}
> +
> +/*
> + * make_ctrl_surf_batch:
> + * @fd: open i915 drm file descriptor
> + * @batch_buf: the batch buffer to populate with the command
> + * @src: fd of the source BO
> + * @dst: fd of the destination BO
> + * @length: size of the ctrl surf in bytes
> + * @reloc: pointer to the relocation entry for this command
> + * @offset_src: source address offset
> + * @offset_dst: destination address offset
> + * @src_mem_access: source memory type (denotes direct or indirect
> + *			addressing)
> + * @dst_mem_access: destination memory type (denotes direct or indirect
> + *			addressing)
> + */
> +static int make_ctrl_surf_batch(int fd, uint32_t *batch_buf,
> +				uint32_t src, uint32_t dst, uint32_t length,
> +				struct drm_i915_gem_relocation_entry *reloc,
> +				uint64_t offset_src, uint64_t offset_dst,
> +				int src_mem_access, int dst_mem_access)
> +{
> +	int num_ccs_blocks;
> +	uint32_t *b = batch_buf;
> +	uint8_t src_mocs = intel_get_uc_mocs(fd);
> +	uint8_t dst_mocs = src_mocs;
> +
> +	num_ccs_blocks = length/CCS_RATIO;
> +	if (num_ccs_blocks < 1)
> +		num_ccs_blocks = 1;
> +	if (num_ccs_blocks > NUM_CCS_BLKS_PER_XFER)
> +		return 0;
> +
> +	/*
> +	 * We use logical AND with 1023 since the size field
> +	 * takes values which is in the range of 0 - 1023
> +	 */
> +	b[0] = ((XY_CTRL_SURF_COPY_BLT) |
> +		(src_mem_access << SRC_ACCESS_TYPE_SHIFT) |
> +		(dst_mem_access << DST_ACCESS_TYPE_SHIFT) |
> +		(((num_ccs_blocks - 1) & 1023) << CCS_SIZE_SHIFT));
> +
> +	b[1] = offset_src;
> +	b[2] = offset_src >> 32 | src_mocs << XY_CTRL_SURF_COPY_BLT_MOCS_SHIFT;
> +
> +	/* relocate address in b[1] and b[2] */
> +	reloc->offset = 1 * sizeof(uint32_t);
> +	reloc->delta = 0;
> +	reloc->target_handle = src;
> +	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
> +	reloc->write_domain = 0;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +
> +	b[3] = offset_dst;
> +	b[4] = offset_dst >> 32 | dst_mocs << XY_CTRL_SURF_COPY_BLT_MOCS_SHIFT;
> +
> +	/* relocate address in b[3] and b[4] */
> +	reloc->offset = 3 * (sizeof(uint32_t));
> +	reloc->delta = 0;
> +	reloc->target_handle = dst;
> +	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
> +	reloc->write_domain = I915_GEM_DOMAIN_RENDER;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +
> +	b[5] = 0;
> +
> +	b[6] = MI_FLUSH_DW | MI_FLUSH_LLC | MI_INVALIDATE_TLB;
> +
> +	reloc->offset = 7 * sizeof(uint32_t);
> +	reloc->delta = 0;
> +	reloc->target_handle =
> +	dst_mem_access == INDIRECT_ACCESS ? dst : src;
> +	reloc->read_domains = 0;
> +	reloc->write_domain = 0;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +	b[9] = 0;
> +
> +	b[10] = MI_FLUSH_DW | MI_FLUSH_CCS;
> +	reloc->offset = 11 * sizeof(uint32_t);
> +	reloc->delta = 0;
> +	reloc->target_handle =
> +	dst_mem_access == INDIRECT_ACCESS ? dst : src;
> +	reloc->read_domains = 0;
> +	reloc->write_domain = 0;
> +	reloc->presumed_offset = 0;
> +	reloc++;
> +	b[13] = 0;
> +
> +	b[14] = MI_BATCH_BUFFER_END;
> +	b[15] = 0;
> +
> +	b += 16;
> +
> +	return (b - batch_buf) * sizeof(uint32_t);
> +}
> +
> +static void __xy_ctrl_surf_copy_blt(int fd, uint32_t bb_region, uint32_t src,
> +				    uint32_t dst, uint64_t src_size, uint64_t dst_size,
> +				    uint64_t ahnd, uint32_t length, bool writetodev,
> +				    uint32_t ctx, struct intel_execution_engine2 *e)
> +{
> +	struct drm_i915_gem_relocation_entry reloc[4];
> +	struct drm_i915_gem_exec_object2 exec[3];
> +	struct drm_i915_gem_execbuffer2 execbuf;
> +	int len, src_mem_access, dst_mem_access;
> +	uint32_t cmd, batch_buf[BATCH_SIZE/sizeof(uint32_t)] = {};
> +	uint64_t offset_src, offset_dst, offset_bb, bb_size, ret;
> +
> +	bb_size = BATCH_SIZE;
> +	ret = __gem_create_in_memory_regions(fd, &cmd, &bb_size, bb_region);
> +	igt_assert_eq(ret, 0);
> +
> +	if (writetodev) {
> +		src_mem_access = DIRECT_ACCESS;
> +		dst_mem_access = INDIRECT_ACCESS;
> +	} else {
> +		src_mem_access = INDIRECT_ACCESS;
> +		dst_mem_access = DIRECT_ACCESS;
> +	}
> +
> +	offset_src = get_offset(ahnd, src, src_size, 0);
> +	offset_dst = get_offset(ahnd, dst, dst_size, 0);
> +	offset_bb = get_offset(ahnd, cmd, bb_size, 0);
> +
> +	/* construct batch command buffer */
> +	memset(reloc, 0, sizeof(reloc));
> +	len = make_ctrl_surf_batch(fd, batch_buf,
> +				   src, dst, length, reloc,
> +				   offset_src, offset_dst,
> +				   src_mem_access, dst_mem_access);
> +	igt_assert(len > 0);
> +
> +	/* Copy the batch buff to BO cmd */
> +	gem_write(fd, cmd, 0, batch_buf, len);
> +
> +	/* Execute the batch buffer */
> +	memset(exec, 0, sizeof(exec));
> +	exec[0].handle = src;
> +	exec[1].handle = dst;
> +	exec[2].handle = cmd;
> +	exec[2].relocation_count = !ahnd ? 4 : 0;
> +	exec[2].relocs_ptr = to_user_pointer(reloc);
> +	if (ahnd) {
> +		exec[0].offset = offset_src;
> +		exec[0].flags |= EXEC_OBJECT_PINNED;
> +		exec[1].offset = offset_dst;
> +		exec[1].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
> +		exec[2].offset = offset_bb;
> +		exec[2].flags |= EXEC_OBJECT_PINNED;
> +	}
> +
> +	memset(&execbuf, 0, sizeof(execbuf));
> +	execbuf.buffers_ptr = to_user_pointer(exec);
> +	execbuf.buffer_count = 3;
> +	execbuf.batch_len = len;
> +	execbuf.flags = I915_EXEC_BLT;
> +	if (ctx)
> +		execbuf.rsvd1 = ctx;
> +	if (e)
> +		execbuf.flags = e->flags;
> +
> +	gem_execbuf(fd, &execbuf);
> +	gem_close(fd, cmd);
> +	put_offset(ahnd, src);
> +	put_offset(ahnd, dst);
> +	put_offset(ahnd, cmd);
> +}
> +
> +void xy_ctrl_surf_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +			   uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +			   uint32_t length, bool writetodev,
> +			   struct intel_execution_engine2 *e)
> +{
> +	__xy_ctrl_surf_copy_blt(fd, bb_region, src, dst, src_size, dst_size,
> +				ahnd, length, writetodev, 0, e);
> +}
> +
> +void xy_ctrl_surf_copy_blt_ctx(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +			       uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +			       uint32_t length, bool writetodev, uint32_t ctx,
> +			       struct intel_execution_engine2 *e)
> +{
> +	__xy_ctrl_surf_copy_blt(fd, bb_region, src, dst, src_size, dst_size,
> +				ahnd, length, writetodev, ctx, e);
> +}
> +
> diff --git a/lib/i915/i915_blt.h b/lib/i915/i915_blt.h
> new file mode 100644
> index 00000000..71653880
> --- /dev/null
> +++ b/lib/i915/i915_blt.h
> @@ -0,0 +1,82 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2021 Intel Corporation
> + */
> +
> +#include <errno.h>
> +#include <sys/ioctl.h>
> +#include <sys/time.h>
> +#include <malloc.h>
> +#include "drm.h"
> +#include "igt.h"
> +
> +#define MI_FLUSH_DW_LEN_DWORD	4
> +#define MI_FLUSH_DW		(0x26 << 23 | 1)
> +#define MI_FLUSH_CCS		(1 << 16)
> +#define MI_FLUSH_LLC		(1 << 9)
> +#define MI_INVALIDATE_TLB	(1 << 18)
> +
> +/* XY_BLOCK_COPY_BLT instruction has 22 bit groups 1 DWORD each */
> +#define XY_BLOCK_COPY_BLT_LEN_DWORD	22
> +#define BLOCK_COPY_BLT_CMD		(2 << 29 | 0x41 << 22 | 0x14)
> +#define COMPRESSION_ENABLE		(1 << 29)
> +#define AUX_CCS_E			(5 << 18)
> +#define FULL_RESOLVE			(1 << 12)
> +#define PARTIAL_RESOLVE			(2 << 12)
> +#define TILE_4_FORMAT			(2 << 30)
> +#define TILE_4_WIDTH			(128)
> +#define TILE_4_WIDTH_DWORD		((128 >> 2) - 1)
> +#define TILE_4_HEIGHT			(32)
> +#define SURFACE_TYPE_2D			(1 << 29)
> +
> +#define DEST_Y2_COORDINATE_SHIFT	(16)
> +#define DEST_MEM_TYPE_SHIFT		(31)
> +#define SRC_MEM_TYPE_SHIFT		(31)
> +#define DEST_SURF_WIDTH_SHIFT		(14)
> +#define SRC_SURF_WIDTH_SHIFT		(14)
> +
> +#define XY_CTRL_SURF_COPY_BLT		(2<<29 | 0x48<<22 | 3)
> +#define SRC_ACCESS_TYPE_SHIFT		21
> +#define DST_ACCESS_TYPE_SHIFT		20
> +#define CCS_SIZE_SHIFT			8
> +#define MI_INSTR(opcode, flags) (((opcode) << 23) | (flags))
> +#define MI_ARB_CHECK			MI_INSTR(0x05, 0)
> +#define NUM_CCS_BLKS_PER_XFER		1024
> +#define INDIRECT_ACCESS                 0
> +#define DIRECT_ACCESS                   1
> +
> +#define BATCH_SIZE			4096
> +#define BOSIZE_MIN			(4*1024)
> +#define BOSIZE_MAX			(4*1024*1024)
> +#define CCS_RATIO			256
> +
> +#define MEM_TYPE_SYS			1
> +#define MEM_TYPE_LOCAL			0
> +
> +enum copy_mode {
> +	SYS_TO_SYS = 0,
> +	SYS_TO_LOCAL,
> +	LOCAL_TO_SYS,
> +	LOCAL_TO_LOCAL,
> +	LOCAL_TO_LOCAL_INPLACE,
> +};
> +
> +void xy_block_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +		       uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +		       uint32_t length, enum copy_mode mode, bool enable_compression,
> +		       struct intel_execution_engine2 *e);
> +
> +void xy_ctrl_surf_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +			   uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +			   uint32_t length, bool writetodev,
> +			   struct intel_execution_engine2 *e);
> +
> +void xy_block_copy_blt_ctx(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +			   uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +			   uint32_t length, enum copy_mode mode, bool enable_compression,
> +			   uint32_t ctx, struct intel_execution_engine2 *e);
> +
> +void xy_ctrl_surf_copy_blt_ctx(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> +			       uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> +			       uint32_t length, bool writetodev, uint32_t ctx,
> +			       struct intel_execution_engine2 *e);
> diff --git a/lib/meson.build b/lib/meson.build
> index f500f0f1..f2924541 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -12,6 +12,7 @@ lib_sources = [
>  	'i915/gem_vm.c',
>  	'i915/intel_memory_region.c',
>  	'i915/intel_mocs.c',
> +	'i915/i915_blt.c',
>  	'igt_collection.c',
>  	'igt_color_encoding.c',
>  	'igt_debugfs.c',
> -- 
> 2.25.1
> 


* Re: [igt-dev] [PATCH i-g-t, v4 2/5] lib/i915: Introduce library i915_blt
  2021-12-16 13:19   ` Zbigniew Kempczyński
@ 2021-12-16 14:18     ` Singh, Apoorva1
  0 siblings, 0 replies; 14+ messages in thread
From: Singh, Apoorva1 @ 2021-12-16 14:18 UTC (permalink / raw)
  To: Kempczynski, Zbigniew; +Cc: igt-dev



> -----Original Message-----
> From: Kempczynski, Zbigniew <zbigniew.kempczynski@intel.com>
> Sent: Thursday, December 16, 2021 6:49 PM
> To: Singh, Apoorva1 <apoorva1.singh@intel.com>
> Cc: igt-dev@lists.freedesktop.org; C, Ramalingam <ramalingam.c@intel.com>;
> Melkaveri, Arjun <arjun.melkaveri@intel.com>
> Subject: Re: [PATCH i-g-t,v4 2/5] lib/i915: Introduce library i915_blt
> 
> On Fri, Dec 10, 2021 at 06:35:30PM +0530, apoorva1.singh@intel.com wrote:
> > From: Apoorva Singh <apoorva1.singh@intel.com>
> >
> > Add new library 'i915_blt' for various blt commands.
> >
> > Signed-off-by: Apoorva Singh <apoorva1.singh@intel.com>
> > Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
> > Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
> > Cc: Melkaveri, Arjun <arjun.melkaveri@intel.com>
> > ---
> >  lib/i915/i915_blt.c | 469 ++++++++++++++++++++++++++++++++++++++++++++
> >  lib/i915/i915_blt.h |  82 ++++++++
> >  lib/meson.build     |   1 +
> >  3 files changed, 552 insertions(+)
> >  create mode 100644 lib/i915/i915_blt.c
> >  create mode 100644 lib/i915/i915_blt.h
> >
> > diff --git a/lib/i915/i915_blt.c b/lib/i915/i915_blt.c
> > new file mode 100644
> > index 00000000..abfe7739
> > --- /dev/null
> > +++ b/lib/i915/i915_blt.c
> > @@ -0,0 +1,469 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright © 2021 Intel Corporation
> > + */
> > +
> > +#include <errno.h>
> > +#include <sys/ioctl.h>
> > +#include <sys/time.h>
> > +#include <malloc.h>
> > +#include "drm.h"
> > +#include "igt.h"
> > +#include "i915_blt.h"
> > +#include "i915/intel_mocs.h"
> > +
> > +/*
> > + * make_block_copy_batch:
> > + * @fd: open i915 drm file descriptor
> > + * @batch_buf: the batch buffer to populate with the command
> > + * @src: fd of the source BO
> > + * @dst: fd of the destination BO
> > + * @length: size of the src and dest BOs
> > + * @reloc: pointer to the relocation entry for this command
> > + * @offset_src: source address offset
> > + * @offset_dst: destination address offset
> > + * @src_mem_type: source memory type (denotes direct or indirect
> > + *			addressing)
> > + * @dst_mem_type: destination memory type (denotes direct or indirect
> > + *			addressing)
> > + * @src_compression: flag to enable uncompressed read of compressed data
> > + *			at the source
> > + * @dst_compression: flag to enable compressed write at the destination
> > + * @resolve: flag to enable resolve of compressed data
> > + */
> > +static int make_block_copy_batch(int fd, uint32_t *batch_buf,
> > +				 uint32_t src, uint32_t dst, uint32_t length,
> > +				 struct drm_i915_gem_relocation_entry *reloc,
> > +				 uint64_t offset_src, uint64_t offset_dst,
> > +				 int src_mem_type, int dst_mem_type,
> > +				 int src_compression, int dst_compression,
> > +				 int resolve)
> > +{
> > +	uint32_t *b = batch_buf;
> > +	uint32_t devid;
> > +	uint8_t src_mocs = intel_get_uc_mocs(fd);
> > +	uint8_t dst_mocs = src_mocs;
> > +
> > +	devid = intel_get_drm_devid(fd);
> > +
> > +	igt_assert(AT_LEAST_GEN(devid, 12) && IS_TIGERLAKE(devid) &&
> > +		   !(src_compression || dst_compression));
> > +
> > +	/* BG 0 */
> > +	b[0] = BLOCK_COPY_BLT_CMD | resolve;
> > +
> > +	/* BG 1
> > +	 *
> > +	 * Using Tile 4 dimensions.  Height = 32 rows
> > +	 * Width = 128 bytes
> > +	 */
> > +	b[1] = dst_compression | TILE_4_FORMAT | TILE_4_WIDTH_DWORD |
> > +		dst_mocs << XY_BLOCK_COPY_BLT_MOCS_SHIFT;;
> > +
> > +	/* BG 3
> > +	 *
> > +	 * X2 = TILE_4_WIDTH
> > +	 * Y2 = (length / TILE_4_WIDTH) << 16:
> > +	 */
> > +	b[3] = TILE_4_WIDTH | (length >> 7) << DEST_Y2_COORDINATE_SHIFT;
> > +
> > +	b[4] = offset_dst;
> > +	b[5] = offset_dst >> 32;
> > +
> > +	/* relocate address in b[4] and b[5] */
> > +	reloc->offset = 4 * (sizeof(uint32_t));
> > +	reloc->delta = 0;
> > +	reloc->target_handle = dst;
> > +	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
> > +	reloc->write_domain = I915_GEM_DOMAIN_RENDER;
> > +	reloc->presumed_offset = 0;
> > +	reloc++;
> > +
> > +	/* BG 6 */
> > +	b[6] = dst_mem_type << DEST_MEM_TYPE_SHIFT;
> > +
> > +	/* BG 8 */
> > +	b[8] = src_compression | TILE_4_WIDTH_DWORD | TILE_4_FORMAT |
> > +		src_mocs << XY_BLOCK_COPY_BLT_MOCS_SHIFT;
> > +
> > +	b[9] = offset_src;
> > +	b[10] = offset_src >> 32;
> > +
> > +	/* relocate address in b[9] and b[10] */
> > +	reloc->offset = 9 * sizeof(uint32_t);
> > +	reloc->delta = 0;
> > +	reloc->target_handle = src;
> > +	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
> > +	reloc->write_domain = 0;
> > +	reloc->presumed_offset = 0;
> > +	reloc++;
> > +
> > +	/* BG 11 */
> > +	b[11] = src_mem_type << SRC_MEM_TYPE_SHIFT;
> > +
> > +	/* BG 16  */
> > +	b[16] = SURFACE_TYPE_2D |
> > +		((TILE_4_WIDTH - 1) << DEST_SURF_WIDTH_SHIFT) |
> > +		(TILE_4_HEIGHT - 1);
> > +
> > +	/* BG 19 */
> > +	b[19] = SURFACE_TYPE_2D |
> > +		((TILE_4_WIDTH - 1) << SRC_SURF_WIDTH_SHIFT) |
> > +		(TILE_4_HEIGHT - 1);
> > +
> > +	b += XY_BLOCK_COPY_BLT_LEN_DWORD;
> > +
> > +	b[0] = MI_FLUSH_DW | MI_FLUSH_LLC | MI_INVALIDATE_TLB;
> > +	reloc->offset = 23 * sizeof(uint32_t);
> > +	reloc->delta = 0;
> > +	reloc->target_handle = dst_compression > 0 ? dst : src;
> > +	reloc->read_domains = 0;
> > +	reloc->write_domain = 0;
> > +	reloc->presumed_offset = 0;
> > +	reloc++;
> > +	b[3] = 0;
> > +
> > +	b[4] = MI_FLUSH_DW | MI_FLUSH_CCS;
> > +	reloc->offset = 27 * sizeof(uint32_t);
> > +	reloc->delta = 0;
> > +	reloc->target_handle = dst_compression > 0 ? dst : src;
> > +	reloc->read_domains = 0;
> > +	reloc->write_domain = 0;
> > +	reloc->presumed_offset = 0;
> > +	reloc++;
> > +	b[7] = 0;
> > +
> > +	b[8] = MI_BATCH_BUFFER_END;
> > +	b[9] = 0;
> > +
> > +	b += 10;
> > +
> > +	return (b - batch_buf) * sizeof(uint32_t);
> > +}
> > +
> > +static void __xy_block_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> > +				uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> > +				uint32_t length, enum copy_mode mode, bool enable_compression,
> > +				uint32_t ctx, struct intel_execution_engine2 *e)
> > +{
> > +	struct drm_i915_gem_relocation_entry reloc[4];
> > +	struct drm_i915_gem_exec_object2 exec[3];
> > +	struct drm_i915_gem_execbuffer2 execbuf;
> > +	int len;
> > +	int src_mem_type, dst_mem_type;
> > +	int dst_compression, src_compression;
> > +	int resolve;
> > +	uint32_t cmd, batch_buf[BATCH_SIZE/sizeof(uint32_t)] = {};
> > +	uint64_t offset_src, offset_dst, offset_bb, bb_size, ret;
> > +
> > +	bb_size = BATCH_SIZE;
> > +	ret = __gem_create_in_memory_regions(fd, &cmd, &bb_size, bb_region);
> > +	igt_assert_eq(ret, 0);
> > +
> > +	switch(mode) {
> > +		case SYS_TO_SYS: /* copy from smem to smem */
> > +			src_mem_type = MEM_TYPE_SYS;
> > +			dst_mem_type = MEM_TYPE_SYS;
> > +			src_compression = 0;
> > +			dst_compression = 0;
> > +			resolve = 0;
> > +		case SYS_TO_LOCAL: /* copy from smem to lmem */
> > +			src_mem_type = MEM_TYPE_SYS;
> > +			dst_mem_type = MEM_TYPE_LOCAL;
> > +			src_compression = 0;
> > +			dst_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> > +			resolve = 0;
> > +		case LOCAL_TO_SYS: /* copy from lmem to smem */
> > +			src_mem_type = MEM_TYPE_LOCAL;
> > +			dst_mem_type = MEM_TYPE_SYS;
> > +			src_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> > +			dst_compression = 0;
> > +			resolve = 0;
> > +		case LOCAL_TO_LOCAL: /* copy from lmem to lmem */
> > +			src_mem_type = MEM_TYPE_LOCAL;
> > +			dst_mem_type = MEM_TYPE_LOCAL;
> > +			src_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> > +			dst_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> > +			resolve = 0;
> > +		case LOCAL_TO_LOCAL_INPLACE: /* in-place decompress */
> > +			src_mem_type = MEM_TYPE_LOCAL;
> > +			dst_mem_type = MEM_TYPE_LOCAL;
> > +			src_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> > +			dst_compression = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;
> > +			resolve = FULL_RESOLVE;
> > +	}
> 
> Wow, I was blind before: every 'case' is missing its break, so whichever
> case we hit falls through and we always end up with the last one's values.
> 
> --
> Zbigniew
> 

Oops, sorry. That is a surprising miss on my side. Thanks a lot for pointing it out.
I will fix it in the next series.
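
For reference, a minimal sketch of what the selection could look like once
every case ends in a break. The struct copy_params container and the
pick_copy_params() helper are hypothetical names used only for illustration
(the patch keeps these values in local variables), and the default branch is
an added assumption; the constants are copied from the i915_blt.h hunk quoted
in this mail:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the i915_blt.h hunk in this patch. */
#define COMPRESSION_ENABLE	(1u << 29)
#define AUX_CCS_E		(5u << 18)
#define FULL_RESOLVE		(1u << 12)
#define MEM_TYPE_SYS		1
#define MEM_TYPE_LOCAL		0

enum copy_mode {
	SYS_TO_SYS = 0,
	SYS_TO_LOCAL,
	LOCAL_TO_SYS,
	LOCAL_TO_LOCAL,
	LOCAL_TO_LOCAL_INPLACE,
};

/* Hypothetical container, not part of the patch. */
struct copy_params {
	int src_mem_type, dst_mem_type;
	uint32_t src_compression, dst_compression;
	uint32_t resolve;
};

/* Hypothetical helper: fills *p for @mode, returns false for an unknown mode. */
static bool pick_copy_params(enum copy_mode mode, bool enable_compression,
			     struct copy_params *p)
{
	uint32_t ccs = enable_compression ? (COMPRESSION_ENABLE | AUX_CCS_E) : 0;

	assert(p);
	*p = (struct copy_params){ 0 };

	switch (mode) {
	case SYS_TO_SYS: /* copy from smem to smem */
		p->src_mem_type = MEM_TYPE_SYS;
		p->dst_mem_type = MEM_TYPE_SYS;
		break;
	case SYS_TO_LOCAL: /* copy from smem to lmem */
		p->src_mem_type = MEM_TYPE_SYS;
		p->dst_mem_type = MEM_TYPE_LOCAL;
		p->dst_compression = ccs;
		break;
	case LOCAL_TO_SYS: /* copy from lmem to smem */
		p->src_mem_type = MEM_TYPE_LOCAL;
		p->dst_mem_type = MEM_TYPE_SYS;
		p->src_compression = ccs;
		break;
	case LOCAL_TO_LOCAL: /* copy from lmem to lmem */
		p->src_mem_type = MEM_TYPE_LOCAL;
		p->dst_mem_type = MEM_TYPE_LOCAL;
		p->src_compression = ccs;
		p->dst_compression = ccs;
		break;
	case LOCAL_TO_LOCAL_INPLACE: /* in-place decompress */
		p->src_mem_type = MEM_TYPE_LOCAL;
		p->dst_mem_type = MEM_TYPE_LOCAL;
		p->src_compression = ccs;
		p->dst_compression = ccs;
		p->resolve = FULL_RESOLVE;
		break;
	default: /* unknown mode: leave *p zeroed and report failure */
		return false;
	}

	return true;
}
```

With the breaks in place, SYS_TO_SYS no longer inherits FULL_RESOLVE and the
lmem memory types from the last case.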

Thanks,
Apoorva

> > +
> > +	offset_src = get_offset(ahnd, src, src_size, 0);
> > +	offset_dst = get_offset(ahnd, dst, dst_size, 0);
> > +	offset_bb = get_offset(ahnd, cmd, bb_size, 0);
> > +
> > +	/* construct the batch buffer */
> > +	memset(reloc, 0, sizeof(reloc));
> > +	len = make_block_copy_batch(fd, batch_buf,
> > +				    src, dst, length, reloc,
> > +				    offset_src, offset_dst,
> > +				    src_mem_type, dst_mem_type,
> > +				    src_compression, dst_compression,
> > +				    resolve);
> > +	igt_assert(len > 0);
> > +
> > +	/* write batch buffer to 'cmd' BO */
> > +	gem_write(fd, cmd, 0, batch_buf, len);
> > +
> > +	/* Execute the batch buffer */
> > +	memset(exec, 0, sizeof(exec));
> > +	if (mode == LOCAL_TO_LOCAL_INPLACE) {
> > +		exec[0].handle = dst;
> > +		exec[1].handle = cmd;
> > +		exec[1].relocation_count = !ahnd ? 4 : 0;
> > +		exec[1].relocs_ptr = to_user_pointer(reloc);
> > +		if (ahnd) {
> > +			exec[0].offset = offset_src;
> > +			exec[0].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
> > +			exec[1].offset = offset_dst;
> > +			exec[1].flags |= EXEC_OBJECT_PINNED;
> > +		}
> > +	} else {
> > +		exec[0].handle = src;
> > +		exec[1].handle = dst;
> > +		exec[2].handle = cmd;
> > +		exec[2].relocation_count = !ahnd ? 4 : 0;
> > +		exec[2].relocs_ptr = to_user_pointer(reloc);
> > +		if (ahnd) {
> > +			exec[0].offset = offset_src;
> > +			exec[0].flags |= EXEC_OBJECT_PINNED;
> > +			exec[1].offset = offset_dst;
> > +			exec[1].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
> > +			exec[2].offset = offset_bb;
> > +			exec[2].flags |= EXEC_OBJECT_PINNED;
> > +		}
> > +	}
> > +
> > +	memset(&execbuf, 0, sizeof(execbuf));
> > +	execbuf.buffers_ptr = to_user_pointer(exec);
> > +
> > +	if (mode == LOCAL_TO_LOCAL_INPLACE)
> > +		execbuf.buffer_count = 2;
> > +	else
> > +		execbuf.buffer_count = 3;
> > +	execbuf.batch_len = len;
> > +
> > +	if (ctx)
> > +		execbuf.rsvd1 = ctx;
> > +
> > +	execbuf.flags = I915_EXEC_BLT;
> > +	if (e)
> > +		execbuf.flags = e->flags;
> > +
> > +	gem_execbuf(fd, &execbuf);
> > +	gem_close(fd, cmd);
> > +	put_offset(ahnd, src);
> > +	put_offset(ahnd, dst);
> > +	put_offset(ahnd, cmd);
> > +}
> > +
> > +void xy_block_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> > +		       uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> > +		       uint32_t length, enum copy_mode mode, bool enable_compression,
> > +		       struct intel_execution_engine2 *e)
> > +{
> > +	__xy_block_copy_blt(fd, bb_region, src, dst, src_size, dst_size, ahnd,
> > +			    length, mode, enable_compression, 0, e);
> > +}
> > +
> > +void xy_block_copy_blt_ctx(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> > +			   uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> > +			   uint32_t length, enum copy_mode mode, bool enable_compression,
> > +			   uint32_t ctx, struct intel_execution_engine2 *e)
> > +{
> > +	__xy_block_copy_blt(fd, bb_region, src, dst, src_size, dst_size, ahnd,
> > +			    length, mode, enable_compression, ctx, e);
> > +}
> > +
> > +/*
> > + * make_ctrl_surf_batch:
> > + * @fd: open i915 drm file descriptor
> > + * @batch_buf: the batch buffer to populate with the command
> > + * @src: fd of the source BO
> > + * @dst: fd of the destination BO
> > + * @length: size of the ctrl surf in bytes
> > + * @reloc: pointer to the relocation entry for this command
> > + * @offset_src: source address offset
> > + * @offset_dst: destination address offset
> > + * @src_mem_access: source memory type (denotes direct or indirect
> > + *			addressing)
> > + * @dst_mem_access: destination memory type (denotes direct or indirect
> > + *			addressing)
> > + */
> > +static int make_ctrl_surf_batch(int fd, uint32_t *batch_buf,
> > +				uint32_t src, uint32_t dst, uint32_t length,
> > +				struct drm_i915_gem_relocation_entry *reloc,
> > +				uint64_t offset_src, uint64_t offset_dst,
> > +				int src_mem_access, int dst_mem_access)
> > +{
> > +	int num_ccs_blocks;
> > +	uint32_t *b = batch_buf;
> > +	uint8_t src_mocs = intel_get_uc_mocs(fd);
> > +	uint8_t dst_mocs = src_mocs;
> > +
> > +	num_ccs_blocks = length/CCS_RATIO;
> > +	if (num_ccs_blocks < 1)
> > +		num_ccs_blocks = 1;
> > +	if (num_ccs_blocks > NUM_CCS_BLKS_PER_XFER)
> > +		return 0;
> > +
> > +	/*
> > +	 * We use logical AND with 1023 since the size field
> > +	 * takes values which is in the range of 0 - 1023
> > +	 */
> > +	b[0] = ((XY_CTRL_SURF_COPY_BLT) |
> > +		(src_mem_access << SRC_ACCESS_TYPE_SHIFT) |
> > +		(dst_mem_access << DST_ACCESS_TYPE_SHIFT) |
> > +		(((num_ccs_blocks - 1) & 1023) << CCS_SIZE_SHIFT));
> > +
> > +	b[1] = offset_src;
> > +	b[2] = offset_src >> 32 | src_mocs << XY_CTRL_SURF_COPY_BLT_MOCS_SHIFT;
> > +
> > +	/* relocate address in b[1] and b[2] */
> > +	reloc->offset = 1 * sizeof(uint32_t);
> > +	reloc->delta = 0;
> > +	reloc->target_handle = src;
> > +	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
> > +	reloc->write_domain = 0;
> > +	reloc->presumed_offset = 0;
> > +	reloc++;
> > +
> > +	b[3] = offset_dst;
> > +	b[4] = offset_dst >> 32 | dst_mocs << XY_CTRL_SURF_COPY_BLT_MOCS_SHIFT;
> > +
> > +	/* relocate address in b[3] and b[4] */
> > +	reloc->offset = 3 * (sizeof(uint32_t));
> > +	reloc->delta = 0;
> > +	reloc->target_handle = dst;
> > +	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
> > +	reloc->write_domain = I915_GEM_DOMAIN_RENDER;
> > +	reloc->presumed_offset = 0;
> > +	reloc++;
> > +
> > +	b[5] = 0;
> > +
> > +	b[6] = MI_FLUSH_DW | MI_FLUSH_LLC | MI_INVALIDATE_TLB;
> > +
> > +	reloc->offset = 7 * sizeof(uint32_t);
> > +	reloc->delta = 0;
> > +	reloc->target_handle =
> > +	dst_mem_access == INDIRECT_ACCESS ? dst : src;
> > +	reloc->read_domains = 0;
> > +	reloc->write_domain = 0;
> > +	reloc->presumed_offset = 0;
> > +	reloc++;
> > +	b[9] = 0;
> > +
> > +	b[10] = MI_FLUSH_DW | MI_FLUSH_CCS;
> > +	reloc->offset = 11 * sizeof(uint32_t);
> > +	reloc->delta = 0;
> > +	reloc->target_handle =
> > +	dst_mem_access == INDIRECT_ACCESS ? dst : src;
> > +	reloc->read_domains = 0;
> > +	reloc->write_domain = 0;
> > +	reloc->presumed_offset = 0;
> > +	reloc++;
> > +	b[13] = 0;
> > +
> > +	b[14] = MI_BATCH_BUFFER_END;
> > +	b[15] = 0;
> > +
> > +	b += 16;
> > +
> > +	return (b - batch_buf) * sizeof(uint32_t);
> > +}
> > +
> > +static void __xy_ctrl_surf_copy_blt(int fd, uint32_t bb_region, uint32_t src,
> > +				    uint32_t dst, uint64_t src_size, uint64_t dst_size,
> > +				    uint64_t ahnd, uint32_t length, bool writetodev,
> > +				    uint32_t ctx, struct intel_execution_engine2 *e)
> > +{
> > +	struct drm_i915_gem_relocation_entry reloc[4];
> > +	struct drm_i915_gem_exec_object2 exec[3];
> > +	struct drm_i915_gem_execbuffer2 execbuf;
> > +	int len, src_mem_access, dst_mem_access;
> > +	uint32_t cmd, batch_buf[BATCH_SIZE/sizeof(uint32_t)] = {};
> > +	uint64_t offset_src, offset_dst, offset_bb, bb_size, ret;
> > +
> > +	bb_size = BATCH_SIZE;
> > +	ret = __gem_create_in_memory_regions(fd, &cmd, &bb_size, bb_region);
> > +	igt_assert_eq(ret, 0);
> > +
> > +	if (writetodev) {
> > +		src_mem_access = DIRECT_ACCESS;
> > +		dst_mem_access = INDIRECT_ACCESS;
> > +	} else {
> > +		src_mem_access = INDIRECT_ACCESS;
> > +		dst_mem_access = DIRECT_ACCESS;
> > +	}
> > +
> > +	offset_src = get_offset(ahnd, src, src_size, 0);
> > +	offset_dst = get_offset(ahnd, dst, dst_size, 0);
> > +	offset_bb = get_offset(ahnd, cmd, bb_size, 0);
> > +
> > +	/* construct batch command buffer */
> > +	memset(reloc, 0, sizeof(reloc));
> > +	len = make_ctrl_surf_batch(fd, batch_buf,
> > +				   src, dst, length, reloc,
> > +				   offset_src, offset_dst,
> > +				   src_mem_access, dst_mem_access);
> > +	igt_assert(len > 0);
> > +
> > +	/* Copy the batch buff to BO cmd */
> > +	gem_write(fd, cmd, 0, batch_buf, len);
> > +
> > +	/* Execute the batch buffer */
> > +	memset(exec, 0, sizeof(exec));
> > +	exec[0].handle = src;
> > +	exec[1].handle = dst;
> > +	exec[2].handle = cmd;
> > +	exec[2].relocation_count = !ahnd ? 4 : 0;
> > +	exec[2].relocs_ptr = to_user_pointer(reloc);
> > +	if (ahnd) {
> > +		exec[0].offset = offset_src;
> > +		exec[0].flags |= EXEC_OBJECT_PINNED;
> > +		exec[1].offset = offset_dst;
> > +		exec[1].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
> > +		exec[2].offset = offset_bb;
> > +		exec[2].flags |= EXEC_OBJECT_PINNED;
> > +	}
> > +
> > +	memset(&execbuf, 0, sizeof(execbuf));
> > +	execbuf.buffers_ptr = to_user_pointer(exec);
> > +	execbuf.buffer_count = 3;
> > +	execbuf.batch_len = len;
> > +	execbuf.flags = I915_EXEC_BLT;
> > +	if (ctx)
> > +		execbuf.rsvd1 = ctx;
> > +	if (e)
> > +		execbuf.flags = e->flags;
> > +
> > +	gem_execbuf(fd, &execbuf);
> > +	gem_close(fd, cmd);
> > +	put_offset(ahnd, src);
> > +	put_offset(ahnd, dst);
> > +	put_offset(ahnd, cmd);
> > +}
> > +
> > +void xy_ctrl_surf_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> > +			   uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> > +			   uint32_t length, bool writetodev,
> > +			   struct intel_execution_engine2 *e)
> > +{
> > +	__xy_ctrl_surf_copy_blt(fd, bb_region, src, dst, src_size, dst_size,
> > +				ahnd, length, writetodev, 0, e);
> > +}
> > +
> > +void xy_ctrl_surf_copy_blt_ctx(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> > +			       uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> > +			       uint32_t length, bool writetodev, uint32_t ctx,
> > +			       struct intel_execution_engine2 *e)
> > +{
> > +	__xy_ctrl_surf_copy_blt(fd, bb_region, src, dst, src_size, dst_size,
> > +				ahnd, length, writetodev, ctx, e);
> > +}
> > +
> > diff --git a/lib/i915/i915_blt.h b/lib/i915/i915_blt.h
> > new file mode 100644
> > index 00000000..71653880
> > --- /dev/null
> > +++ b/lib/i915/i915_blt.h
> > @@ -0,0 +1,82 @@
> > +/* SPDX-License-Identifier: MIT */
> > +/*
> > + * Copyright © 2021 Intel Corporation
> > + */
> > +
> > +#include <errno.h>
> > +#include <sys/ioctl.h>
> > +#include <sys/time.h>
> > +#include <malloc.h>
> > +#include "drm.h"
> > +#include "igt.h"
> > +
> > +#define MI_FLUSH_DW_LEN_DWORD	4
> > +#define MI_FLUSH_DW		(0x26 << 23 | 1)
> > +#define MI_FLUSH_CCS		(1 << 16)
> > +#define MI_FLUSH_LLC		(1 << 9)
> > +#define MI_INVALIDATE_TLB	(1 << 18)
> > +
> > +/* XY_BLOCK_COPY_BLT instruction has 22 bit groups 1 DWORD each */
> > +#define XY_BLOCK_COPY_BLT_LEN_DWORD	22
> > +#define BLOCK_COPY_BLT_CMD		(2 << 29 | 0x41 << 22 | 0x14)
> > +#define COMPRESSION_ENABLE		(1 << 29)
> > +#define AUX_CCS_E			(5 << 18)
> > +#define FULL_RESOLVE			(1 << 12)
> > +#define PARTIAL_RESOLVE			(2 << 12)
> > +#define TILE_4_FORMAT			(2 << 30)
> > +#define TILE_4_WIDTH			(128)
> > +#define TILE_4_WIDTH_DWORD		((128 >> 2) - 1)
> > +#define TILE_4_HEIGHT			(32)
> > +#define SURFACE_TYPE_2D			(1 << 29)
> > +
> > +#define DEST_Y2_COORDINATE_SHIFT	(16)
> > +#define DEST_MEM_TYPE_SHIFT		(31)
> > +#define SRC_MEM_TYPE_SHIFT		(31)
> > +#define DEST_SURF_WIDTH_SHIFT		(14)
> > +#define SRC_SURF_WIDTH_SHIFT		(14)
> > +
> > +#define XY_CTRL_SURF_COPY_BLT		(2<<29 | 0x48<<22 | 3)
> > +#define SRC_ACCESS_TYPE_SHIFT		21
> > +#define DST_ACCESS_TYPE_SHIFT		20
> > +#define CCS_SIZE_SHIFT			8
> > +#define MI_INSTR(opcode, flags) (((opcode) << 23) | (flags))
> > +#define MI_ARB_CHECK			MI_INSTR(0x05, 0)
> > +#define NUM_CCS_BLKS_PER_XFER		1024
> > +#define INDIRECT_ACCESS                 0
> > +#define DIRECT_ACCESS                   1
> > +
> > +#define BATCH_SIZE			4096
> > +#define BOSIZE_MIN			(4*1024)
> > +#define BOSIZE_MAX			(4*1024*1024)
> > +#define CCS_RATIO			256
> > +
> > +#define MEM_TYPE_SYS			1
> > +#define MEM_TYPE_LOCAL			0
> > +
> > +enum copy_mode {
> > +	SYS_TO_SYS = 0,
> > +	SYS_TO_LOCAL,
> > +	LOCAL_TO_SYS,
> > +	LOCAL_TO_LOCAL,
> > +	LOCAL_TO_LOCAL_INPLACE,
> > +};
> > +
> > +void xy_block_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> > +		       uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> > +		       uint32_t length, enum copy_mode mode, bool enable_compression,
> > +		       struct intel_execution_engine2 *e);
> > +
> > +void xy_ctrl_surf_copy_blt(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> > +			   uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> > +			   uint32_t length, bool writetodev,
> > +			   struct intel_execution_engine2 *e);
> > +
> > +void xy_block_copy_blt_ctx(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> > +			   uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> > +			   uint32_t length, enum copy_mode mode, bool enable_compression,
> > +			   uint32_t ctx, struct intel_execution_engine2 *e);
> > +
> > +void xy_ctrl_surf_copy_blt_ctx(int fd, uint32_t bb_region, uint32_t src, uint32_t dst,
> > +			       uint64_t src_size, uint64_t dst_size, uint64_t ahnd,
> > +			       uint32_t length, bool writetodev, uint32_t ctx,
> > +			       struct intel_execution_engine2 *e);
> > diff --git a/lib/meson.build b/lib/meson.build
> > index f500f0f1..f2924541 100644
> > --- a/lib/meson.build
> > +++ b/lib/meson.build
> > @@ -12,6 +12,7 @@ lib_sources = [
> >  	'i915/gem_vm.c',
> >  	'i915/intel_memory_region.c',
> >  	'i915/intel_mocs.c',
> > +	'i915/i915_blt.c',
> >  	'igt_collection.c',
> >  	'igt_color_encoding.c',
> >  	'igt_debugfs.c',
> > --
> > 2.25.1
> >


end of thread, other threads:[~2021-12-16 14:18 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
2021-12-10 13:05 [igt-dev] [PATCH i-g-t,v4 0/5] Add testing for CCS apoorva1.singh
2021-12-10 13:05 ` [igt-dev] [PATCH i-g-t, v4 1/5] lib/i915: Introduce library intel_mocs apoorva1.singh
2021-12-13 15:58   ` Zbigniew Kempczyński
2021-12-10 13:05 ` [igt-dev] [PATCH i-g-t, v4 2/5] lib/i915: Introduce library i915_blt apoorva1.singh
2021-12-15  9:40   ` Zbigniew Kempczyński
2021-12-16 13:11   ` Zbigniew Kempczyński
2021-12-16 13:15   ` Zbigniew Kempczyński
2021-12-16 13:19   ` Zbigniew Kempczyński
2021-12-16 14:18     ` Singh, Apoorva1
2021-12-10 13:05 ` [igt-dev] [PATCH i-g-t, v4 3/5] lib/intel_chipset.h: Add has_flat_ccs flag apoorva1.singh
2021-12-10 13:05 ` [igt-dev] [PATCH i-g-t, v4 4/5] i915/gem_engine_topology: Only use the main copy engines for XY_BLOCK_COPY apoorva1.singh
2021-12-10 13:05 ` [igt-dev] [PATCH i-g-t,v4 5/5] i915/gem_ccs: Add testing for CCS apoorva1.singh
2021-12-10 13:45 ` [igt-dev] ✓ Fi.CI.BAT: success for Add testing for CCS (rev4) Patchwork
2021-12-11  8:42 ` [igt-dev] ✗ Fi.CI.IGT: failure " Patchwork
