* [igt-dev] [PATCH i-g-t 0/3] Add block-multicopy-* subtests
@ 2022-12-12 12:50 Zbigniew Kempczyński
  2022-12-12 12:50 ` [igt-dev] [PATCH i-g-t 1/3] lib/i915_blt: Remove src == dst pitch restriction Zbigniew Kempczyński
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Zbigniew Kempczyński @ 2022-12-12 12:50 UTC
  To: igt-dev

Following debugging with the hardware team, extend the test to exercise
multiple blits in a single batch, plus an additional step that does
in-place decompression to the same tiling format.

Cc: Karolina Stolarek <karolina.stolarek@intel.com>

Zbigniew Kempczyński (3):
  lib/i915_blt: Remove src == dst pitch restriction
  lib/i915_blt: Extract blit emit functions
  tests/gem_ccs: Add block-multicopy subtest

 lib/i915/i915_blt.c  | 271 +++++++++++++++++++++++++++++++------------
 lib/i915/i915_blt.h  |  19 +++
 tests/i915/gem_ccs.c | 262 ++++++++++++++++++++++++++++++++++++++---
 3 files changed, 462 insertions(+), 90 deletions(-)

-- 
2.34.1


* [igt-dev] [PATCH i-g-t 1/3] lib/i915_blt: Remove src == dst pitch restriction
  2022-12-12 12:50 [igt-dev] [PATCH i-g-t 0/3] Add block-multicopy-* subtests Zbigniew Kempczyński
@ 2022-12-12 12:50 ` Zbigniew Kempczyński
  2022-12-13 15:37   ` Karolina Stolarek
  2022-12-12 12:50 ` [igt-dev] [PATCH i-g-t 2/3] lib/i915_blt: Extract blit emit functions Zbigniew Kempczyński
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Zbigniew Kempczyński @ 2022-12-12 12:50 UTC
  To: igt-dev

During the debugging phase we established that it is not necessary to
enforce the same pitch for the destination surface in FULL_RESOLVE mode.
Relax this condition, allowing the caller to pass the required surface
configuration.

Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
---
 lib/i915/i915_blt.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/lib/i915/i915_blt.c b/lib/i915/i915_blt.c
index 3776c56c60..42c28623f9 100644
--- a/lib/i915/i915_blt.c
+++ b/lib/i915/i915_blt.c
@@ -317,13 +317,11 @@ static void fill_data(struct gen12_block_copy_data *data,
 	data->dw00.special_mode = __special_mode(blt);
 	data->dw00.length = extended_command ? 20 : 10;
 
-	if (__special_mode(blt) == SM_FULL_RESOLVE && blt->src.tiling == T_TILE64) {
-		data->dw01.dst_pitch = blt->src.pitch - 1;
+	if (__special_mode(blt) == SM_FULL_RESOLVE)
 		data->dw01.dst_aux_mode = __aux_mode(&blt->src);
-	} else {
-		data->dw01.dst_pitch = blt->dst.pitch - 1;
+	else
 		data->dw01.dst_aux_mode = __aux_mode(&blt->dst);
-	}
+	data->dw01.dst_pitch = blt->dst.pitch - 1;
 
 	data->dw01.dst_mocs = blt->dst.mocs;
 	data->dw01.dst_compression = blt->dst.compression;
-- 
2.34.1
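
As a rough illustration of the relaxed rule, the selection logic after this
patch can be modelled outside IGT as below. The names used here (struct surf,
struct dw01, fill_dst) are hypothetical stand-ins and not the library's real
structures; only the branch structure is meant to mirror the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the blitter surface and the DW01 fields. */
struct surf {
	uint32_t pitch;    /* bytes per row */
	uint32_t aux_mode; /* opaque aux-mode value */
};

struct dw01 {
	uint32_t dst_pitch;
	uint32_t dst_aux_mode;
};

/*
 * Mirrors the post-patch branch: the aux mode still comes from the source
 * in full-resolve mode, but the pitch is always taken from the destination.
 */
static struct dw01 fill_dst(const struct surf *src, const struct surf *dst,
			    bool full_resolve)
{
	struct dw01 dw = {0};

	assert(src && dst && dst->pitch > 0);

	dw.dst_aux_mode = full_resolve ? src->aux_mode : dst->aux_mode;
	dw.dst_pitch = dst->pitch - 1; /* field stores pitch minus one, as in the hunk */

	return dw;
}
```

With a source pitch of 2048 and a destination pitch of 1024, this sketch
programs 1023 in both modes; before the patch the TILE64 full-resolve case
would have used the source pitch instead.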


* [igt-dev] [PATCH i-g-t 2/3] lib/i915_blt: Extract blit emit functions
  2022-12-12 12:50 [igt-dev] [PATCH i-g-t 0/3] Add block-multicopy-* subtests Zbigniew Kempczyński
  2022-12-12 12:50 ` [igt-dev] [PATCH i-g-t 1/3] lib/i915_blt: Remove src == dst pitch restriction Zbigniew Kempczyński
@ 2022-12-12 12:50 ` Zbigniew Kempczyński
  2022-12-13 15:39   ` Karolina Stolarek
  2022-12-12 12:50 ` [igt-dev] [PATCH i-g-t 3/3] tests/gem_ccs: Add block-multicopy subtest Zbigniew Kempczyński
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Zbigniew Kempczyński @ 2022-12-12 12:50 UTC
  To: igt-dev

Add some flexibility in building user pipelines by extracting the blitter
emission code into dedicated functions. The previous blitter functions,
which do one blit-and-execute, are rewritten to use those functions.
This requires a stateful allocator (an offset might be acquired more
than once, so it must not change).

Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
---
 lib/i915/i915_blt.c | 263 ++++++++++++++++++++++++++++++++------------
 lib/i915/i915_blt.h |  19 ++++
 2 files changed, 213 insertions(+), 69 deletions(-)

diff --git a/lib/i915/i915_blt.c b/lib/i915/i915_blt.c
index 42c28623f9..32ad608775 100644
--- a/lib/i915/i915_blt.c
+++ b/lib/i915/i915_blt.c
@@ -503,58 +503,61 @@ static void dump_bb_ext(struct gen12_block_copy_data_ext *data)
 }
 
 /**
- * blt_block_copy:
+ * emit_blt_block_copy:
  * @i915: drm fd
- * @ctx: intel_ctx_t context
- * @e: blitter engine for @ctx
  * @ahnd: allocator handle
  * @blt: basic blitter data (for TGL/DG1 which doesn't support ext version)
  * @ext: extended blitter data (for DG2+, supports flatccs compression)
+ * @bb_pos: position in the batch at which to insert block-copy commands
+ * @emit_bbe: emit MI_BATCH_BUFFER_END after block-copy or not
  *
- * Function does blit between @src and @dst described in @blt object.
+ * Function inserts block-copy blit into batch at @bb_pos. Allows concatenating
+ * with other commands to achieve pipelining.
  *
  * Returns:
- * execbuffer status.
+ * Next write position in batch.
  */
-int blt_block_copy(int i915,
-		   const intel_ctx_t *ctx,
-		   const struct intel_execution_engine2 *e,
-		   uint64_t ahnd,
-		   const struct blt_copy_data *blt,
-		   const struct blt_block_copy_data_ext *ext)
+uint64_t emit_blt_block_copy(int i915,
+			     uint64_t ahnd,
+			     const struct blt_copy_data *blt,
+			     const struct blt_block_copy_data_ext *ext,
+			     uint64_t bb_pos,
+			     bool emit_bbe)
 {
-	struct drm_i915_gem_execbuffer2 execbuf = {};
-	struct drm_i915_gem_exec_object2 obj[3] = {};
 	struct gen12_block_copy_data data = {};
 	struct gen12_block_copy_data_ext dext = {};
-	uint64_t dst_offset, src_offset, bb_offset, alignment;
-	uint32_t *bb;
-	int i, ret;
+	uint64_t dst_offset, src_offset, bb_offset;
+	uint32_t bbe = MI_BATCH_BUFFER_END;
+	uint8_t *bb;
 
 	igt_assert_f(ahnd, "block-copy supports softpin only\n");
 	igt_assert_f(blt, "block-copy requires data to do blit\n");
 
-	alignment = gem_detect_safe_alignment(i915);
-	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, alignment);
-	if (__special_mode(blt) == SM_FULL_RESOLVE)
-		dst_offset = src_offset;
-	else
-		dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, alignment);
-	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, alignment);
+	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, 0);
+	dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, 0);
+	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, 0);
 
 	fill_data(&data, blt, src_offset, dst_offset, ext);
 
-	i = sizeof(data) / sizeof(uint32_t);
 	bb = gem_mmap__device_coherent(i915, blt->bb.handle, 0, blt->bb.size,
 				       PROT_READ | PROT_WRITE);
-	memcpy(bb, &data, sizeof(data));
+
+	igt_assert(bb_pos + sizeof(data) < blt->bb.size);
+	memcpy(bb + bb_pos, &data, sizeof(data));
+	bb_pos += sizeof(data);
 
 	if (ext) {
 		fill_data_ext(&dext, ext);
-		memcpy(bb + i, &dext, sizeof(dext));
-		i += sizeof(dext) / sizeof(uint32_t);
+		igt_assert(bb_pos + sizeof(dext) < blt->bb.size);
+		memcpy(bb + bb_pos, &dext, sizeof(dext));
+		bb_pos += sizeof(dext);
+	}
+
+	if (emit_bbe) {
+		igt_assert(bb_pos + sizeof(uint32_t) < blt->bb.size);
+		memcpy(bb + bb_pos, &bbe, sizeof(bbe));
+		bb_pos += sizeof(uint32_t);
 	}
-	bb[i++] = MI_BATCH_BUFFER_END;
 
 	if (blt->print_bb) {
 		igt_info("[BLOCK COPY]\n");
@@ -569,6 +572,44 @@ int blt_block_copy(int i915,
 
 	munmap(bb, blt->bb.size);
 
+	return bb_pos;
+}
+
+/**
+ * blt_block_copy:
+ * @i915: drm fd
+ * @ctx: intel_ctx_t context
+ * @e: blitter engine for @ctx
+ * @ahnd: allocator handle
+ * @blt: basic blitter data (for TGL/DG1 which doesn't support ext version)
+ * @ext: extended blitter data (for DG2+, supports flatccs compression)
+ *
+ * Function does blit between @src and @dst described in @blt object.
+ *
+ * Returns:
+ * execbuffer status.
+ */
+int blt_block_copy(int i915,
+		   const intel_ctx_t *ctx,
+		   const struct intel_execution_engine2 *e,
+		   uint64_t ahnd,
+		   const struct blt_copy_data *blt,
+		   const struct blt_block_copy_data_ext *ext)
+{
+	struct drm_i915_gem_execbuffer2 execbuf = {};
+	struct drm_i915_gem_exec_object2 obj[3] = {};
+	uint64_t dst_offset, src_offset, bb_offset;
+	int ret;
+
+	igt_assert_f(ahnd, "block-copy supports softpin only\n");
+	igt_assert_f(blt, "block-copy requires data to do blit\n");
+
+	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, 0);
+	dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, 0);
+	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, 0);
+
+	emit_blt_block_copy(i915, ahnd, blt, ext, 0, true);
+
 	obj[0].offset = CANONICAL(dst_offset);
 	obj[1].offset = CANONICAL(src_offset);
 	obj[2].offset = CANONICAL(bb_offset);
@@ -655,31 +696,30 @@ static void dump_bb_surf_ctrl_cmd(const struct gen12_ctrl_surf_copy_data *data)
 }
 
 /**
- * blt_ctrl_surf_copy:
+ * emit_blt_ctrl_surf_copy:
  * @i915: drm fd
- * @ctx: intel_ctx_t context
- * @e: blitter engine for @ctx
  * @ahnd: allocator handle
  * @surf: blitter data for ctrl-surf-copy
+ * @bb_pos: position in the batch at which to insert ctrl-surf-copy commands
+ * @emit_bbe: emit MI_BATCH_BUFFER_END after ctrl-surf-copy or not
  *
- * Function does ctrl-surf-copy blit between @src and @dst described in
- * @blt object.
+ * Function emits ctrl-surf-copy blit between @src and @dst described in
+ * @surf object at @bb_pos. Allows concatenating with other commands to
+ * achieve pipelining.
  *
  * Returns:
- * execbuffer status.
+ * Next write position in batch.
  */
-int blt_ctrl_surf_copy(int i915,
-		       const intel_ctx_t *ctx,
-		       const struct intel_execution_engine2 *e,
-		       uint64_t ahnd,
-		       const struct blt_ctrl_surf_copy_data *surf)
+uint64_t emit_blt_ctrl_surf_copy(int i915,
+				 uint64_t ahnd,
+				 const struct blt_ctrl_surf_copy_data *surf,
+				 uint64_t bb_pos,
+				 bool emit_bbe)
 {
-	struct drm_i915_gem_execbuffer2 execbuf = {};
-	struct drm_i915_gem_exec_object2 obj[3] = {};
 	struct gen12_ctrl_surf_copy_data data = {};
 	uint64_t dst_offset, src_offset, bb_offset, alignment;
+	uint32_t bbe = MI_BATCH_BUFFER_END;
 	uint32_t *bb;
-	int i;
 
 	igt_assert_f(ahnd, "ctrl-surf-copy supports softpin only\n");
 	igt_assert_f(surf, "ctrl-surf-copy requires data to do ctrl-surf-copy blit\n");
@@ -695,12 +735,9 @@ int blt_ctrl_surf_copy(int i915,
 	data.dw00.size_of_ctrl_copy = __ccs_size(surf) / CCS_RATIO - 1;
 	data.dw00.length = 0x3;
 
-	src_offset = get_offset(ahnd, surf->src.handle, surf->src.size,
-				alignment);
-	dst_offset = get_offset(ahnd, surf->dst.handle, surf->dst.size,
-				alignment);
-	bb_offset = get_offset(ahnd, surf->bb.handle, surf->bb.size,
-			       alignment);
+	src_offset = get_offset(ahnd, surf->src.handle, surf->src.size, alignment);
+	dst_offset = get_offset(ahnd, surf->dst.handle, surf->dst.size, alignment);
+	bb_offset = get_offset(ahnd, surf->bb.handle, surf->bb.size, alignment);
 
 	data.dw01.src_address_lo = src_offset;
 	data.dw02.src_address_hi = src_offset >> 32;
@@ -710,11 +747,18 @@ int blt_ctrl_surf_copy(int i915,
 	data.dw04.dst_address_hi = dst_offset >> 32;
 	data.dw04.dst_mocs = surf->dst.mocs;
 
-	i = sizeof(data) / sizeof(uint32_t);
 	bb = gem_mmap__device_coherent(i915, surf->bb.handle, 0, surf->bb.size,
 				       PROT_READ | PROT_WRITE);
-	memcpy(bb, &data, sizeof(data));
-	bb[i++] = MI_BATCH_BUFFER_END;
+
+	igt_assert(bb_pos + sizeof(data) < surf->bb.size);
+	memcpy((uint8_t *)bb + bb_pos, &data, sizeof(data));
+	bb_pos += sizeof(data);
+
+	if (emit_bbe) {
+		igt_assert(bb_pos + sizeof(uint32_t) < surf->bb.size);
+		memcpy((uint8_t *)bb + bb_pos, &bbe, sizeof(bbe));
+		bb_pos += sizeof(uint32_t);
+	}
 
 	if (surf->print_bb) {
 		igt_info("BB [CTRL SURF]:\n");
@@ -724,8 +768,46 @@ int blt_ctrl_surf_copy(int i915,
 
 		dump_bb_surf_ctrl_cmd(&data);
 	}
+
 	munmap(bb, surf->bb.size);
 
+	return bb_pos;
+}
+
+/**
+ * blt_ctrl_surf_copy:
+ * @i915: drm fd
+ * @ctx: intel_ctx_t context
+ * @e: blitter engine for @ctx
+ * @ahnd: allocator handle
+ * @surf: blitter data for ctrl-surf-copy
+ *
+ * Function does ctrl-surf-copy blit between @src and @dst described in
+ * @blt object.
+ *
+ * Returns:
+ * execbuffer status.
+ */
+int blt_ctrl_surf_copy(int i915,
+		       const intel_ctx_t *ctx,
+		       const struct intel_execution_engine2 *e,
+		       uint64_t ahnd,
+		       const struct blt_ctrl_surf_copy_data *surf)
+{
+	struct drm_i915_gem_execbuffer2 execbuf = {};
+	struct drm_i915_gem_exec_object2 obj[3] = {};
+	uint64_t dst_offset, src_offset, bb_offset, alignment;
+
+	igt_assert_f(ahnd, "ctrl-surf-copy supports softpin only\n");
+	igt_assert_f(surf, "ctrl-surf-copy requires data to do ctrl-surf-copy blit\n");
+
+	alignment = max_t(uint64_t, gem_detect_safe_alignment(i915), 1ull << 16);
+	src_offset = get_offset(ahnd, surf->src.handle, surf->src.size, alignment);
+	dst_offset = get_offset(ahnd, surf->dst.handle, surf->dst.size, alignment);
+	bb_offset = get_offset(ahnd, surf->bb.handle, surf->bb.size, alignment);
+
+	emit_blt_ctrl_surf_copy(i915, ahnd, surf, 0, true);
+
 	obj[0].offset = CANONICAL(dst_offset);
 	obj[1].offset = CANONICAL(src_offset);
 	obj[2].offset = CANONICAL(bb_offset);
@@ -869,31 +951,31 @@ static void dump_bb_fast_cmd(struct gen12_fast_copy_data *data)
 }
 
 /**
- * blt_fast_copy:
+ * emit_blt_fast_copy:
  * @i915: drm fd
- * @ctx: intel_ctx_t context
- * @e: blitter engine for @ctx
  * @ahnd: allocator handle
  * @blt: blitter data for fast-copy (same as for block-copy but doesn't use
  * compression fields).
+ * @bb_pos: position in the batch at which to insert fast-copy commands
+ * @emit_bbe: emit MI_BATCH_BUFFER_END after fast-copy or not
  *
- * Function does fast blit between @src and @dst described in @blt object.
+ * Function emits fast-copy blit between @src and @dst described in @blt object
+ * at @bb_pos. Allows concatenating with other commands to
+ * achieve pipelining.
  *
  * Returns:
- * execbuffer status.
+ * Next write position in batch.
  */
-int blt_fast_copy(int i915,
-		  const intel_ctx_t *ctx,
-		  const struct intel_execution_engine2 *e,
-		  uint64_t ahnd,
-		  const struct blt_copy_data *blt)
+uint64_t emit_blt_fast_copy(int i915,
+			    uint64_t ahnd,
+			    const struct blt_copy_data *blt,
+			    uint64_t bb_pos,
+			    bool emit_bbe)
 {
-	struct drm_i915_gem_execbuffer2 execbuf = {};
-	struct drm_i915_gem_exec_object2 obj[3] = {};
 	struct gen12_fast_copy_data data = {};
 	uint64_t dst_offset, src_offset, bb_offset, alignment;
+	uint32_t bbe = MI_BATCH_BUFFER_END;
 	uint32_t *bb;
-	int i, ret;
 
 	alignment = gem_detect_safe_alignment(i915);
 
@@ -931,22 +1013,65 @@ int blt_fast_copy(int i915,
 	data.dw08.src_address_lo = src_offset;
 	data.dw09.src_address_hi = src_offset >> 32;
 
-	i = sizeof(data) / sizeof(uint32_t);
 	bb = gem_mmap__device_coherent(i915, blt->bb.handle, 0, blt->bb.size,
 				       PROT_READ | PROT_WRITE);
 
-	memcpy(bb, &data, sizeof(data));
-	bb[i++] = MI_BATCH_BUFFER_END;
+	igt_assert(bb_pos + sizeof(data) < blt->bb.size);
+	memcpy((uint8_t *)bb + bb_pos, &data, sizeof(data));
+	bb_pos += sizeof(data);
+
+	if (emit_bbe) {
+		igt_assert(bb_pos + sizeof(uint32_t) < blt->bb.size);
+		memcpy((uint8_t *)bb + bb_pos, &bbe, sizeof(bbe));
+		bb_pos += sizeof(uint32_t);
+	}
 
 	if (blt->print_bb) {
 		igt_info("BB [FAST COPY]\n");
-		igt_info("blit [src offset: %llx, dst offset: %llx\n",
-			 (long long) src_offset, (long long) dst_offset);
+		igt_info("src offset: %llx, dst offset: %llx, bb offset: %llx\n",
+			 (long long) src_offset, (long long) dst_offset,
+			 (long long) bb_offset);
 		dump_bb_fast_cmd(&data);
 	}
 
 	munmap(bb, blt->bb.size);
 
+	return bb_pos;
+}
+
+/**
+ * blt_fast_copy:
+ * @i915: drm fd
+ * @ctx: intel_ctx_t context
+ * @e: blitter engine for @ctx
+ * @ahnd: allocator handle
+ * @blt: blitter data for fast-copy (same as for block-copy but doesn't use
+ * compression fields).
+ *
+ * Function does fast blit between @src and @dst described in @blt object.
+ *
+ * Returns:
+ * execbuffer status.
+ */
+int blt_fast_copy(int i915,
+		  const intel_ctx_t *ctx,
+		  const struct intel_execution_engine2 *e,
+		  uint64_t ahnd,
+		  const struct blt_copy_data *blt)
+{
+	struct drm_i915_gem_execbuffer2 execbuf = {};
+	struct drm_i915_gem_exec_object2 obj[3] = {};
+	uint64_t dst_offset, src_offset, bb_offset, alignment;
+	int ret;
+
+	alignment = gem_detect_safe_alignment(i915);
+
+	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, alignment);
+	dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, alignment);
+	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, alignment);
+
+	emit_blt_fast_copy(i915, ahnd, blt, 0, true);
+
 	obj[0].offset = CANONICAL(dst_offset);
 	obj[1].offset = CANONICAL(src_offset);
 	obj[2].offset = CANONICAL(bb_offset);
diff --git a/lib/i915/i915_blt.h b/lib/i915/i915_blt.h
index e0e8b52bc2..34db9bb962 100644
--- a/lib/i915/i915_blt.h
+++ b/lib/i915/i915_blt.h
@@ -168,6 +168,13 @@ bool blt_supports_compression(int i915);
 bool blt_supports_tiling(int i915, enum blt_tiling tiling);
 const char *blt_tiling_name(enum blt_tiling tiling);
 
+uint64_t emit_blt_block_copy(int i915,
+			     uint64_t ahnd,
+			     const struct blt_copy_data *blt,
+			     const struct blt_block_copy_data_ext *ext,
+			     uint64_t bb_pos,
+			     bool emit_bbe);
+
 int blt_block_copy(int i915,
 		   const intel_ctx_t *ctx,
 		   const struct intel_execution_engine2 *e,
@@ -175,12 +182,24 @@ int blt_block_copy(int i915,
 		   const struct blt_copy_data *blt,
 		   const struct blt_block_copy_data_ext *ext);
 
+uint64_t emit_blt_ctrl_surf_copy(int i915,
+				 uint64_t ahnd,
+				 const struct blt_ctrl_surf_copy_data *surf,
+				 uint64_t bb_pos,
+				 bool emit_bbe);
+
 int blt_ctrl_surf_copy(int i915,
 		       const intel_ctx_t *ctx,
 		       const struct intel_execution_engine2 *e,
 		       uint64_t ahnd,
 		       const struct blt_ctrl_surf_copy_data *surf);
 
+uint64_t emit_blt_fast_copy(int i915,
+			    uint64_t ahnd,
+			    const struct blt_copy_data *blt,
+			    uint64_t bb_pos,
+			    bool emit_bbe);
+
 int blt_fast_copy(int i915,
 		  const intel_ctx_t *ctx,
 		  const struct intel_execution_engine2 *e,
-- 
2.34.1
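
The emit-and-return-position pattern introduced by this patch can be sketched
in isolation as follows. This is a minimal model and not IGT code: sketch_emit
and SKETCH_MI_BATCH_BUFFER_END are hypothetical names, the batch is a plain
byte array instead of a mapped GEM object, and the end-of-batch encoding
(opcode 0x0a shifted left by 23) is an assumption stated in the code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed encoding of MI_BATCH_BUFFER_END (opcode 0x0a << 23). */
#define SKETCH_MI_BATCH_BUFFER_END (0x0au << 23)

/*
 * Append @len bytes of @cmd at byte offset @bb_pos of @bb, optionally
 * followed by a batch-buffer-end dword. Returns the next write position.
 * The buffer is addressed as bytes so that @bb_pos is never scaled by
 * an element size.
 */
static uint64_t sketch_emit(uint8_t *bb, uint64_t bb_size,
			    const void *cmd, size_t len,
			    uint64_t bb_pos, bool emit_bbe)
{
	const uint32_t bbe = SKETCH_MI_BATCH_BUFFER_END;

	assert(bb_pos + len < bb_size);
	memcpy(bb + bb_pos, cmd, len);
	bb_pos += len;

	if (emit_bbe) {
		assert(bb_pos + sizeof(bbe) < bb_size);
		memcpy(bb + bb_pos, &bbe, sizeof(bbe));
		bb_pos += sizeof(bbe);
	}

	return bb_pos;
}
```

A caller chains commands by feeding each returned position into the next call
and passing emit_bbe = true only for the last command, which is how the
one-shot wrappers in the patch use position 0 with emit_bbe = true.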


* [igt-dev] [PATCH i-g-t 3/3] tests/gem_ccs: Add block-multicopy subtest
  2022-12-12 12:50 [igt-dev] [PATCH i-g-t 0/3] Add block-multicopy-* subtests Zbigniew Kempczyński
  2022-12-12 12:50 ` [igt-dev] [PATCH i-g-t 1/3] lib/i915_blt: Remove src == dst pitch restriction Zbigniew Kempczyński
  2022-12-12 12:50 ` [igt-dev] [PATCH i-g-t 2/3] lib/i915_blt: Extract blit emit functions Zbigniew Kempczyński
@ 2022-12-12 12:50 ` Zbigniew Kempczyński
  2022-12-13 15:40   ` Karolina Stolarek
  2022-12-12 13:23 ` [igt-dev] ✓ Fi.CI.BAT: success for Add block-multicopy-* subtests Patchwork
  2022-12-12 21:05 ` [igt-dev] ✓ Fi.CI.IGT: " Patchwork
  4 siblings, 1 reply; 20+ messages in thread
From: Zbigniew Kempczyński @ 2022-12-12 12:50 UTC
  To: igt-dev

Exercise a sequence of blits packed in a single batch. It may reveal
flushing/decompressing/detiling problems during execution.
The multicopy-inplace version differs from the copy-inplace version by an
additional blit to the same tiling format during decompression, which
helps separate problems that show up when decompressing and detiling
happen in one step.

Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
---
 tests/i915/gem_ccs.c | 262 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 246 insertions(+), 16 deletions(-)

diff --git a/tests/i915/gem_ccs.c b/tests/i915/gem_ccs.c
index 4ecb3e36ac..6b5f199ec7 100644
--- a/tests/i915/gem_ccs.c
+++ b/tests/i915/gem_ccs.c
@@ -262,6 +262,119 @@ static void surf_copy(int i915,
 	gem_close(i915, ccs);
 }
 
+struct blt_copy3_data {
+	int i915;
+	struct blt_copy_object src;
+	struct blt_copy_object mid;
+	struct blt_copy_object dst;
+	struct blt_copy_object final;
+	struct blt_copy_batch bb;
+	enum blt_color_depth color_depth;
+
+	/* debug stuff */
+	bool print_bb;
+};
+
+struct blt_block_copy3_data_ext {
+	struct blt_block_copy_object_ext src;
+	struct blt_block_copy_object_ext mid;
+	struct blt_block_copy_object_ext dst;
+	struct blt_block_copy_object_ext final;
+};
+
+#define FILL_OBJ(_idx, _handle, _offset, _flags) do { \
+	obj[(_idx)].handle = (_handle); \
+	obj[(_idx)].offset = (_offset); \
+	obj[(_idx)++].flags = EXEC_OBJECT_PINNED | \
+			      EXEC_OBJECT_SUPPORTS_48B_ADDRESS | (_flags) ; \
+} while (0)
+
+static int blt_block_copy3(int i915,
+			   const intel_ctx_t *ctx,
+			   const struct intel_execution_engine2 *e,
+			   uint64_t ahnd,
+			   const struct blt_copy3_data *blt3,
+			   const struct blt_block_copy3_data_ext *ext3)
+{
+	struct drm_i915_gem_execbuffer2 execbuf = {};
+	struct drm_i915_gem_exec_object2 obj[5] = {};
+	struct blt_copy_data blt0;
+	struct blt_block_copy_data_ext ext0;
+	uint64_t src_offset, mid_offset, dst_offset, final_offset, bb_offset, alignment;
+	uint64_t bb_pos = 0;
+	uint32_t *bb;
+	int i, ret;
+
+	igt_assert_f(ahnd, "block-copy3 supports softpin only\n");
+	igt_assert_f(blt3, "block-copy3 requires data to do blit\n");
+
+	alignment = gem_detect_safe_alignment(i915);
+	src_offset = get_offset(ahnd, blt3->src.handle, blt3->src.size, alignment);
+	mid_offset = get_offset(ahnd, blt3->mid.handle, blt3->mid.size, alignment);
+	dst_offset = get_offset(ahnd, blt3->dst.handle, blt3->dst.size, alignment);
+	final_offset = get_offset(ahnd, blt3->final.handle, blt3->final.size, alignment);
+	bb_offset = get_offset(ahnd, blt3->bb.handle, blt3->bb.size, alignment);
+
+	igt_debug("src: %lx, mid: %lx, dst: %lx, final: %lx, bb: %lx, align: %lx\n",
+		  src_offset, mid_offset, dst_offset, final_offset, bb_offset, alignment);
+
+	/* First blit src -> mid */
+	memset(&blt0, 0, sizeof(blt0));
+	blt0.src = blt3->src;
+	blt0.dst = blt3->mid;
+	blt0.bb = blt3->bb;
+	blt0.color_depth = blt3->color_depth;
+	blt0.print_bb = blt3->print_bb;
+	ext0.src = ext3->src;
+	ext0.dst = ext3->mid;
+	bb_pos = emit_blt_block_copy(i915, ahnd, &blt0, &ext0, bb_pos, false);
+
+	/* Second blit mid -> dst */
+	memset(&blt0, 0, sizeof(blt0));
+	blt0.src = blt3->mid;
+	blt0.dst = blt3->dst;
+	blt0.bb = blt3->bb;
+	blt0.color_depth = blt3->color_depth;
+	blt0.print_bb = blt3->print_bb;
+	ext0.src = ext3->mid;
+	ext0.dst = ext3->dst;
+	bb_pos = emit_blt_block_copy(i915, ahnd, &blt0, &ext0, bb_pos, false);
+
+	/* Third blit dst -> final */
+	memset(&blt0, 0, sizeof(blt0));
+	blt0.src = blt3->dst;
+	blt0.dst = blt3->final;
+	blt0.bb = blt3->bb;
+	blt0.color_depth = blt3->color_depth;
+	blt0.print_bb = blt3->print_bb;
+	ext0.src = ext3->dst;
+	ext0.dst = ext3->final;
+	bb_pos = emit_blt_block_copy(i915, ahnd, &blt0, &ext0, bb_pos, true);
+
+	i = 0;
+	FILL_OBJ(i, blt3->src.handle, CANONICAL(src_offset), 0);
+	FILL_OBJ(i, blt3->mid.handle, CANONICAL(mid_offset), EXEC_OBJECT_WRITE);
+	if (mid_offset != dst_offset)
+		FILL_OBJ(i, blt3->dst.handle, CANONICAL(dst_offset), EXEC_OBJECT_WRITE);
+	FILL_OBJ(i, blt3->final.handle, CANONICAL(final_offset), 0);
+	FILL_OBJ(i, blt3->bb.handle, CANONICAL(bb_offset), 0);
+
+	execbuf.buffer_count = i;
+
+	for (int __i = 0; __i < i; __i++)
+		igt_debug("obj[%d].offset: %llx, handle: %u\n",
+			  __i, (long long) obj[__i].offset, obj[__i].handle);
+	execbuf.buffers_ptr = to_user_pointer(obj);
+	execbuf.rsvd1 = ctx ? ctx->id : 0;
+	execbuf.flags = e ? e->flags : I915_EXEC_BLT;
+	ret = __gem_execbuf(i915, &execbuf);
+
+	gem_sync(i915, blt3->bb.handle);
+	/* emit_blt_block_copy() maps and unmaps the batch itself */
+
+	return ret;
+}
+
 static void block_copy(int i915,
 		       const intel_ctx_t *ctx,
 		       const struct intel_execution_engine2 *e,
@@ -380,10 +493,100 @@ static void block_copy(int i915,
 	igt_assert_f(!result, "source and destination surfaces differs!\n");
 }
 
+static void block_multicopy(int i915,
+			    const intel_ctx_t *ctx,
+			    const struct intel_execution_engine2 *e,
+			    uint32_t region1, uint32_t region2,
+			    enum blt_tiling mid_tiling,
+			    const struct test_config *config)
+{
+	struct blt_copy3_data blt3 = {};
+	struct blt_block_copy3_data_ext ext3 = {}, *pext3 = &ext3;
+	struct blt_copy_object *src, *mid, *dst, *final;
+	const uint32_t bpp = 32;
+	uint64_t bb_size = 4096;
+	uint64_t ahnd = get_reloc_ahnd(i915, ctx->id);
+	uint32_t run_id = mid_tiling;
+	uint32_t mid_region = region2, bb;
+	uint32_t width = param.width, height = param.height;
+	enum blt_compression mid_compression = config->compression;
+	int mid_compression_format = param.compression_format;
+	enum blt_compression_type comp_type = COMPRESSION_TYPE_3D;
+	uint8_t uc_mocs = intel_get_uc_mocs(i915);
+	int result;
+
+	igt_assert(__gem_create_in_memory_regions(i915, &bb, &bb_size, region1) == 0);
+
+	if (!blt_supports_compression(i915))
+		pext3 = NULL;
+
+	src = create_object(i915, region1, width, height, bpp, uc_mocs,
+			    T_LINEAR, COMPRESSION_DISABLED, comp_type, true);
+	mid = create_object(i915, mid_region, width, height, bpp, uc_mocs,
+			    mid_tiling, mid_compression, comp_type, true);
+	dst = create_object(i915, region1, width, height, bpp, uc_mocs,
+			    mid_tiling, COMPRESSION_DISABLED, comp_type, true);
+	final = create_object(i915, region1, width, height, bpp, uc_mocs,
+			    T_LINEAR, COMPRESSION_DISABLED, comp_type, true);
+	igt_assert(src->size == dst->size);
+	PRINT_SURFACE_INFO("src", src);
+	PRINT_SURFACE_INFO("mid", mid);
+	PRINT_SURFACE_INFO("dst", dst);
+	PRINT_SURFACE_INFO("final", final);
+
+	blt_surface_fill_rect(i915, src, width, height);
+
+	memset(&blt3, 0, sizeof(blt3));
+	blt3.color_depth = CD_32bit;
+	blt3.print_bb = param.print_bb;
+	set_blt_object(&blt3.src, src);
+	set_blt_object(&blt3.mid, mid);
+	set_blt_object(&blt3.dst, dst);
+	set_blt_object(&blt3.final, final);
+
+	if (config->inplace) {
+		set_object(&blt3.dst, mid->handle, dst->size, mid->region, mid->mocs,
+			   mid_tiling, COMPRESSION_DISABLED, comp_type);
+		blt3.dst.ptr = mid->ptr;
+	}
+
+	set_object_ext(&ext3.src, 0, width, height, SURFACE_TYPE_2D);
+	set_object_ext(&ext3.mid, mid_compression_format, width, height, SURFACE_TYPE_2D);
+	set_object_ext(&ext3.dst, 0, width, height, SURFACE_TYPE_2D);
+	set_object_ext(&ext3.final, 0, width, height, SURFACE_TYPE_2D);
+	set_batch(&blt3.bb, bb, bb_size, region1);
+
+	blt_block_copy3(i915, ctx, e, ahnd, &blt3, pext3);
+	gem_sync(i915, blt3.final.handle);
+
+	WRITE_PNG(i915, run_id, "src", &blt3.src, width, height);
+	if (!config->inplace)
+		WRITE_PNG(i915, run_id, "mid", &blt3.mid, width, height);
+	WRITE_PNG(i915, run_id, "dst", &blt3.dst, width, height);
+	WRITE_PNG(i915, run_id, "final", &blt3.final, width, height);
+
+	result = memcmp(src->ptr, blt3.final.ptr, src->size);
+
+	destroy_object(i915, src);
+	destroy_object(i915, mid);
+	destroy_object(i915, dst);
+	destroy_object(i915, final);
+	gem_close(i915, bb);
+	put_ahnd(ahnd);
+
+	igt_assert_f(!result, "source and destination surfaces differs!\n");
+}
+
+enum copy_func {
+	BLOCK_COPY,
+	BLOCK_MULTICOPY,
+};
+
 static void block_copy_test(int i915,
 			    const struct test_config *config,
 			    const intel_ctx_t *ctx,
-			    struct igt_collection *set)
+			    struct igt_collection *set,
+			    enum copy_func copy_function)
 {
 	struct igt_collection *regions;
 	const struct intel_execution_engine2 *e;
@@ -415,15 +618,27 @@ static void block_copy_test(int i915,
 					continue;
 
 				regtxt = memregion_dynamic_subtest_name(regions);
-				igt_dynamic_f("%s-%s-compfmt%d-%s",
-					      blt_tiling_name(tiling),
-					      config->compression ?
-						      "compressed" : "uncompressed",
-					      param.compression_format, regtxt) {
-					block_copy(i915, ctx, e,
-						   region1, region2,
-						   tiling, config);
-				}
+
+				if (copy_function == BLOCK_COPY)
+					igt_dynamic_f("%s-%s-compfmt%d-%s",
+						      blt_tiling_name(tiling),
+						      config->compression ?
+							      "compressed" : "uncompressed",
+						      param.compression_format, regtxt) {
+						block_copy(i915, ctx, e,
+							   region1, region2,
+							   tiling, config);
+					}
+				else if (copy_function == BLOCK_MULTICOPY)
+					igt_dynamic_f("%s-%s-compfmt%d-%s-multicopy",
+						      blt_tiling_name(tiling),
+						      config->compression ?
+							      "compressed" : "uncompressed",
+						      param.compression_format, regtxt) {
+						block_multicopy(i915, ctx, e,
+								region1, region2,
+								tiling, config);
+					}
 				free(regtxt);
 			}
 		}
@@ -506,14 +721,21 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
 	igt_subtest_with_dynamic("block-copy-uncompressed") {
 		struct test_config config = {};
 
-		block_copy_test(i915, &config, ctx, set);
+		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
 	}
 
 	igt_describe("Check block-copy flatccs compressed blit");
 	igt_subtest_with_dynamic("block-copy-compressed") {
 		struct test_config config = { .compression = true };
 
-		block_copy_test(i915, &config, ctx, set);
+		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
+	}
+
+	igt_describe("Check block-multicopy flatccs compressed blit");
+	igt_subtest_with_dynamic("block-multicopy-compressed") {
+		struct test_config config = { .compression = true };
+
+		block_copy_test(i915, &config, ctx, set, BLOCK_MULTICOPY);
 	}
 
 	igt_describe("Check block-copy flatccs inplace decompression blit");
@@ -521,7 +743,15 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
 		struct test_config config = { .compression = true,
 					      .inplace = true };
 
-		block_copy_test(i915, &config, ctx, set);
+		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
+	}
+
+	igt_describe("Check block-multicopy flatccs inplace decompression blit");
+	igt_subtest_with_dynamic("block-multicopy-inplace") {
+		struct test_config config = { .compression = true,
+					      .inplace = true };
+
+		block_copy_test(i915, &config, ctx, set, BLOCK_MULTICOPY);
 	}
 
 	igt_describe("Check flatccs data can be copied from/to surface");
@@ -529,7 +759,7 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
 		struct test_config config = { .compression = true,
 					      .surfcopy = true };
 
-		block_copy_test(i915, &config, ctx, set);
+		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
 	}
 
 	igt_describe("Check flatccs data are physically tagged and visible"
@@ -539,7 +769,7 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
 					      .surfcopy = true,
 					      .new_ctx = true };
 
-		block_copy_test(i915, &config, ctx, set);
+		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
 	}
 
 	igt_describe("Check flatccs data persists after suspend / resume (S0)");
@@ -548,7 +778,7 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
 					      .surfcopy = true,
 					      .suspend_resume = true };
 
-		block_copy_test(i915, &config, ctx, set);
+		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
 	}
 
 	igt_fixture {
-- 
2.34.1
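
The way blt_block_copy3() builds its exec object list, including the inplace
case where dst aliases mid, can be sketched as below. Everything here is a
simplified model: struct sketch_obj and sketch_fill_objs are hypothetical
names, flags are omitted, and the aliasing check relies on the assumption
that the allocator hands out the same offset for the same handle.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified exec object: just a handle and a pinned offset. */
struct sketch_obj {
	uint32_t handle;
	uint64_t offset;
};

/*
 * Build the object list for a src -> mid -> dst -> final chain. When dst
 * aliases mid (the inplace case, detected here by equal offsets as in the
 * patch), the buffer is listed only once. Returns the number of entries.
 */
static int sketch_fill_objs(struct sketch_obj obj[5],
			    struct sketch_obj src, struct sketch_obj mid,
			    struct sketch_obj dst, struct sketch_obj final,
			    struct sketch_obj bb)
{
	int i = 0;

	assert(obj);

	obj[i++] = src;
	obj[i++] = mid;
	if (mid.offset != dst.offset)
		obj[i++] = dst;
	obj[i++] = final;
	obj[i++] = bb; /* batch buffer goes last */

	return i;
}
```

With four distinct surfaces the list has five entries; in the inplace
configuration it shrinks to four, and the returned count is what would be
used as the buffer count.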


* [igt-dev] ✓ Fi.CI.BAT: success for Add block-multicopy-* subtests
  2022-12-12 12:50 [igt-dev] [PATCH i-g-t 0/3] Add block-multicopy-* subtests Zbigniew Kempczyński
                   ` (2 preceding siblings ...)
  2022-12-12 12:50 ` [igt-dev] [PATCH i-g-t 3/3] tests/gem_ccs: Add block-multicopy subtest Zbigniew Kempczyński
@ 2022-12-12 13:23 ` Patchwork
  2022-12-12 21:05 ` [igt-dev] ✓ Fi.CI.IGT: " Patchwork
  4 siblings, 0 replies; 20+ messages in thread
From: Patchwork @ 2022-12-12 13:23 UTC
  To: Zbigniew Kempczyński; +Cc: igt-dev

[-- Attachment #1: Type: text/plain, Size: 3314 bytes --]

== Series Details ==

Series: Add block-multicopy-* subtests
URL   : https://patchwork.freedesktop.org/series/111852/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12493 -> IGTPW_8221
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/index.html

Participating hosts (40 -> 24)
------------------------------

  Missing    (16): fi-kbl-soraka bat-kbl-2 bat-adls-5 bat-adlp-9 bat-dg1-5 fi-bsw-n3050 bat-dg2-8 bat-adlm-1 bat-dg2-9 bat-adlp-6 bat-adlp-4 bat-adln-1 bat-jsl-3 bat-dg2-11 fi-bsw-nick fi-skl-6600u 

Known issues
------------

  Here are the changes found in IGTPW_8221 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_suspend@basic-s3-without-i915:
    - fi-rkl-11600:       [PASS][1] -> [INCOMPLETE][2] ([i915#4817])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/fi-rkl-11600/igt@i915_suspend@basic-s3-without-i915.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/fi-rkl-11600/igt@i915_suspend@basic-s3-without-i915.html

  * igt@kms_chamelium@common-hpd-after-suspend:
    - fi-hsw-4770:        NOTRUN -> [SKIP][3] ([fdo#109271] / [fdo#111827])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/fi-hsw-4770/igt@kms_chamelium@common-hpd-after-suspend.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@gt_pm:
    - {bat-rpls-2}:       [DMESG-FAIL][4] ([i915#4258]) -> [PASS][5]
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/bat-rpls-2/igt@i915_selftest@live@gt_pm.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/bat-rpls-2/igt@i915_selftest@live@gt_pm.html

  * igt@i915_selftest@live@hangcheck:
    - fi-hsw-4770:        [INCOMPLETE][6] ([i915#4785]) -> [PASS][7]
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/fi-hsw-4770/igt@i915_selftest@live@hangcheck.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/fi-hsw-4770/igt@i915_selftest@live@hangcheck.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#4258]: https://gitlab.freedesktop.org/drm/intel/issues/4258
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4785]: https://gitlab.freedesktop.org/drm/intel/issues/4785
  [i915#4817]: https://gitlab.freedesktop.org/drm/intel/issues/4817
  [i915#4983]: https://gitlab.freedesktop.org/drm/intel/issues/4983


Build changes
-------------

  * CI: CI-20190529 -> None
  * IGT: IGT_7090 -> IGTPW_8221

  CI-20190529: 20190529
  CI_DRM_12493: a6dc4d045339e2817103e99539e3efaa554c941f @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_8221: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/index.html
  IGT_7090: 5aafcf060b6dfbb2fa7aace76c8074d98ac7da8f @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git


Testlist changes
----------------

+igt@gem_ccs@block-multicopy-compressed
+igt@gem_ccs@block-multicopy-inplace

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/index.html

* [igt-dev] ✓ Fi.CI.IGT: success for Add block-multicopy-* subtests
  2022-12-12 12:50 [igt-dev] [PATCH i-g-t 0/3] Add block-multicopy-* subtests Zbigniew Kempczyński
                   ` (3 preceding siblings ...)
  2022-12-12 13:23 ` [igt-dev] ✓ Fi.CI.BAT: success for Add block-multicopy-* subtests Patchwork
@ 2022-12-12 21:05 ` Patchwork
  4 siblings, 0 replies; 20+ messages in thread
From: Patchwork @ 2022-12-12 21:05 UTC (permalink / raw)
  To: Zbigniew Kempczyński; +Cc: igt-dev

== Series Details ==

Series: Add block-multicopy-* subtests
URL   : https://patchwork.freedesktop.org/series/111852/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12493_full -> IGTPW_8221_full
==============================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/index.html

Participating hosts (14 -> 11)
------------------------------

  Missing    (3): pig-skl-6260u pig-kbl-iris pig-glk-j5005 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in IGTPW_8221_full:

### IGT changes ###

#### Possible regressions ####

  * {igt@gem_ccs@block-multicopy-compressed} (NEW):
    - shard-iclb:         NOTRUN -> [SKIP][1] +1 similar issue
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb6/igt@gem_ccs@block-multicopy-compressed.html
    - {shard-rkl}:        NOTRUN -> [SKIP][2] +1 similar issue
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-rkl-4/igt@gem_ccs@block-multicopy-compressed.html
    - {shard-dg1}:        NOTRUN -> [SKIP][3] +1 similar issue
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-dg1-15/igt@gem_ccs@block-multicopy-compressed.html
    - shard-tglb:         NOTRUN -> [SKIP][4] +1 similar issue
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb1/igt@gem_ccs@block-multicopy-compressed.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@gem_exec_balancer@full-pulse:
    - {shard-rkl}:        [PASS][5] -> [DMESG-WARN][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-rkl-6/igt@gem_exec_balancer@full-pulse.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-rkl-4/igt@gem_exec_balancer@full-pulse.html

  * igt@kms_atomic_transition@plane-all-modeset-transition-fencing-internal-panels:
    - {shard-tglu-9}:     NOTRUN -> [SKIP][7] +1 similar issue
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglu-9/igt@kms_atomic_transition@plane-all-modeset-transition-fencing-internal-panels.html

  
New tests
---------

  New tests have been introduced between CI_DRM_12493_full and IGTPW_8221_full:

### New IGT tests (2) ###

  * igt@gem_ccs@block-multicopy-compressed:
    - Statuses : 7 skip(s)
    - Exec time: [0.0] s

  * igt@gem_ccs@block-multicopy-inplace:
    - Statuses : 7 skip(s)
    - Exec time: [0.0] s

  

Known issues
------------

  Here are the changes found in IGTPW_8221_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@api_intel_bb@object-noreloc-keep-cache-simple:
    - shard-snb:          NOTRUN -> [SKIP][8] ([fdo#109271]) +72 similar issues
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-snb5/igt@api_intel_bb@object-noreloc-keep-cache-simple.html

  * igt@feature_discovery@psr2:
    - shard-iclb:         [PASS][9] -> [SKIP][10] ([i915#658])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-iclb2/igt@feature_discovery@psr2.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb7/igt@feature_discovery@psr2.html

  * igt@gem_ctx_persistence@engines-persistence:
    - shard-snb:          NOTRUN -> [SKIP][11] ([fdo#109271] / [i915#1099])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-snb7/igt@gem_ctx_persistence@engines-persistence.html

  * igt@gem_exec_balancer@parallel-bb-first:
    - shard-iclb:         [PASS][12] -> [SKIP][13] ([i915#4525]) +1 similar issue
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-iclb2/igt@gem_exec_balancer@parallel-bb-first.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb5/igt@gem_exec_balancer@parallel-bb-first.html

  * igt@gem_exec_fair@basic-none-share@rcs0:
    - shard-tglb:         [PASS][14] -> [FAIL][15] ([i915#2842])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-tglb1/igt@gem_exec_fair@basic-none-share@rcs0.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb1/igt@gem_exec_fair@basic-none-share@rcs0.html

  * igt@gem_exec_fair@basic-none@rcs0:
    - shard-glk:          [PASS][16] -> [FAIL][17] ([i915#2842]) +1 similar issue
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-glk5/igt@gem_exec_fair@basic-none@rcs0.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-glk3/igt@gem_exec_fair@basic-none@rcs0.html

  * igt@gem_huc_copy@huc-copy:
    - shard-tglb:         [PASS][18] -> [SKIP][19] ([i915#2190])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-tglb1/igt@gem_huc_copy@huc-copy.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb6/igt@gem_huc_copy@huc-copy.html

  * igt@gem_lmem_swapping@heavy-multi:
    - shard-glk:          NOTRUN -> [SKIP][20] ([fdo#109271] / [i915#4613])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-glk4/igt@gem_lmem_swapping@heavy-multi.html

  * igt@gem_lmem_swapping@parallel-random-engines:
    - shard-iclb:         NOTRUN -> [SKIP][21] ([i915#4613])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb5/igt@gem_lmem_swapping@parallel-random-engines.html
    - shard-apl:          NOTRUN -> [SKIP][22] ([fdo#109271] / [i915#4613])
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-apl6/igt@gem_lmem_swapping@parallel-random-engines.html
    - shard-tglb:         NOTRUN -> [SKIP][23] ([i915#4613])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb6/igt@gem_lmem_swapping@parallel-random-engines.html

  * igt@gem_softpin@evict-single-offset:
    - shard-apl:          NOTRUN -> [FAIL][24] ([i915#4171])
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-apl3/igt@gem_softpin@evict-single-offset.html

  * igt@gem_userptr_blits@access-control:
    - shard-iclb:         NOTRUN -> [SKIP][25] ([i915#3297])
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb8/igt@gem_userptr_blits@access-control.html

  * igt@gem_workarounds@suspend-resume-context:
    - shard-apl:          [PASS][26] -> [DMESG-WARN][27] ([i915#180]) +1 similar issue
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-apl8/igt@gem_workarounds@suspend-resume-context.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-apl8/igt@gem_workarounds@suspend-resume-context.html

  * igt@gen9_exec_parse@bb-oversize:
    - shard-tglb:         NOTRUN -> [SKIP][28] ([i915#2527] / [i915#2856]) +1 similar issue
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb1/igt@gen9_exec_parse@bb-oversize.html

  * igt@gen9_exec_parse@bb-start-param:
    - shard-iclb:         NOTRUN -> [SKIP][29] ([i915#2856]) +1 similar issue
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb8/igt@gen9_exec_parse@bb-start-param.html

  * igt@kms_big_fb@4-tiled-64bpp-rotate-180:
    - shard-iclb:         NOTRUN -> [SKIP][30] ([i915#5286])
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb1/igt@kms_big_fb@4-tiled-64bpp-rotate-180.html
    - shard-tglb:         NOTRUN -> [SKIP][31] ([i915#5286])
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb7/igt@kms_big_fb@4-tiled-64bpp-rotate-180.html

  * igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-180-hflip:
    - shard-tglb:         NOTRUN -> [SKIP][32] ([fdo#111615])
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb3/igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-180-hflip.html

  * igt@kms_big_joiner@basic:
    - shard-iclb:         NOTRUN -> [SKIP][33] ([i915#2705])
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb8/igt@kms_big_joiner@basic.html
    - shard-tglb:         NOTRUN -> [SKIP][34] ([i915#2705])
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb7/igt@kms_big_joiner@basic.html

  * igt@kms_ccs@pipe-a-bad-rotation-90-4_tiled_dg2_rc_ccs_cc:
    - shard-glk:          NOTRUN -> [SKIP][35] ([fdo#109271]) +40 similar issues
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-glk2/igt@kms_ccs@pipe-a-bad-rotation-90-4_tiled_dg2_rc_ccs_cc.html

  * igt@kms_ccs@pipe-a-crc-primary-rotation-180-4_tiled_dg2_rc_ccs_cc:
    - shard-iclb:         NOTRUN -> [SKIP][36] ([fdo#109278]) +3 similar issues
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb8/igt@kms_ccs@pipe-a-crc-primary-rotation-180-4_tiled_dg2_rc_ccs_cc.html

  * igt@kms_ccs@pipe-a-random-ccs-data-4_tiled_dg2_mc_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][37] ([i915#6095])
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb1/igt@kms_ccs@pipe-a-random-ccs-data-4_tiled_dg2_mc_ccs.html

  * igt@kms_ccs@pipe-b-bad-pixel-format-y_tiled_gen12_mc_ccs:
    - shard-glk:          NOTRUN -> [SKIP][38] ([fdo#109271] / [i915#3886]) +1 similar issue
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-glk7/igt@kms_ccs@pipe-b-bad-pixel-format-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-b-bad-rotation-90-y_tiled_gen12_rc_ccs_cc:
    - shard-iclb:         NOTRUN -> [SKIP][39] ([fdo#109278] / [i915#3886])
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb5/igt@kms_ccs@pipe-b-bad-rotation-90-y_tiled_gen12_rc_ccs_cc.html

  * igt@kms_ccs@pipe-c-bad-pixel-format-y_tiled_gen12_mc_ccs:
    - shard-apl:          NOTRUN -> [SKIP][40] ([fdo#109271] / [i915#3886]) +3 similar issues
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-apl2/igt@kms_ccs@pipe-c-bad-pixel-format-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-c-crc-primary-rotation-180-4_tiled_dg2_rc_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][41] ([i915#3689] / [i915#6095]) +2 similar issues
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb6/igt@kms_ccs@pipe-c-crc-primary-rotation-180-4_tiled_dg2_rc_ccs.html

  * igt@kms_ccs@pipe-c-random-ccs-data-yf_tiled_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][42] ([fdo#111615] / [i915#3689]) +2 similar issues
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb6/igt@kms_ccs@pipe-c-random-ccs-data-yf_tiled_ccs.html

  * igt@kms_ccs@pipe-d-bad-pixel-format-y_tiled_gen12_mc_ccs:
    - shard-apl:          NOTRUN -> [SKIP][43] ([fdo#109271]) +77 similar issues
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-apl6/igt@kms_ccs@pipe-d-bad-pixel-format-y_tiled_gen12_mc_ccs.html
    - shard-tglb:         NOTRUN -> [SKIP][44] ([i915#3689]) +1 similar issue
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb3/igt@kms_ccs@pipe-d-bad-pixel-format-y_tiled_gen12_mc_ccs.html

  * igt@kms_chamelium@dp-hpd-for-each-pipe:
    - shard-glk:          NOTRUN -> [SKIP][45] ([fdo#109271] / [fdo#111827]) +2 similar issues
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-glk6/igt@kms_chamelium@dp-hpd-for-each-pipe.html

  * igt@kms_chamelium@vga-hpd-without-ddc:
    - shard-apl:          NOTRUN -> [SKIP][46] ([fdo#109271] / [fdo#111827]) +1 similar issue
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-apl6/igt@kms_chamelium@vga-hpd-without-ddc.html

  * igt@kms_color_chamelium@ctm-0-75:
    - shard-iclb:         NOTRUN -> [SKIP][47] ([fdo#109284] / [fdo#111827])
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb6/igt@kms_color_chamelium@ctm-0-75.html
    - shard-snb:          NOTRUN -> [SKIP][48] ([fdo#109271] / [fdo#111827])
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-snb4/igt@kms_color_chamelium@ctm-0-75.html
    - shard-tglb:         NOTRUN -> [SKIP][49] ([fdo#109284] / [fdo#111827])
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb1/igt@kms_color_chamelium@ctm-0-75.html

  * igt@kms_content_protection@atomic:
    - shard-tglb:         NOTRUN -> [SKIP][50] ([i915#7118])
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb2/igt@kms_content_protection@atomic.html

  * igt@kms_content_protection@uevent:
    - shard-iclb:         NOTRUN -> [SKIP][51] ([i915#7118])
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb6/igt@kms_content_protection@uevent.html
    - shard-tglb:         NOTRUN -> [SKIP][52] ([i915#6944] / [i915#7118])
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb1/igt@kms_content_protection@uevent.html

  * igt@kms_content_protection@uevent@pipe-a-dp-1:
    - shard-apl:          NOTRUN -> [FAIL][53] ([i915#1339])
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-apl8/igt@kms_content_protection@uevent@pipe-a-dp-1.html

  * igt@kms_dsc@dsc-with-bpc:
    - shard-iclb:         NOTRUN -> [SKIP][54] ([i915#3840])
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb6/igt@kms_dsc@dsc-with-bpc.html

  * igt@kms_fbcon_fbt@fbc:
    - shard-apl:          NOTRUN -> [FAIL][55] ([i915#4767])
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-apl6/igt@kms_fbcon_fbt@fbc.html
    - shard-tglb:         NOTRUN -> [FAIL][56] ([i915#4767])
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb3/igt@kms_fbcon_fbt@fbc.html
    - shard-iclb:         NOTRUN -> [FAIL][57] ([i915#4767])
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb1/igt@kms_fbcon_fbt@fbc.html

  * igt@kms_flip@2x-flip-vs-suspend:
    - shard-iclb:         NOTRUN -> [SKIP][58] ([fdo#109274]) +3 similar issues
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb8/igt@kms_flip@2x-flip-vs-suspend.html

  * igt@kms_flip@2x-plain-flip:
    - shard-tglb:         NOTRUN -> [SKIP][59] ([fdo#109274] / [fdo#111825] / [i915#3637]) +3 similar issues
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb3/igt@kms_flip@2x-plain-flip.html

  * igt@kms_flip_scaled_crc@flip-32bpp-4tile-to-64bpp-4tile-downscaling@pipe-a-valid-mode:
    - shard-iclb:         NOTRUN -> [SKIP][60] ([i915#2587] / [i915#2672]) +4 similar issues
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb7/igt@kms_flip_scaled_crc@flip-32bpp-4tile-to-64bpp-4tile-downscaling@pipe-a-valid-mode.html

  * igt@kms_flip_scaled_crc@flip-64bpp-4tile-to-16bpp-4tile-upscaling@pipe-a-default-mode:
    - shard-iclb:         NOTRUN -> [SKIP][61] ([i915#2672]) +3 similar issues
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb2/igt@kms_flip_scaled_crc@flip-64bpp-4tile-to-16bpp-4tile-upscaling@pipe-a-default-mode.html

  * igt@kms_flip_scaled_crc@flip-64bpp-linear-to-16bpp-linear-downscaling@pipe-a-default-mode:
    - shard-iclb:         [PASS][62] -> [SKIP][63] ([i915#3555])
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-iclb3/igt@kms_flip_scaled_crc@flip-64bpp-linear-to-16bpp-linear-downscaling@pipe-a-default-mode.html
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb2/igt@kms_flip_scaled_crc@flip-64bpp-linear-to-16bpp-linear-downscaling@pipe-a-default-mode.html

  * igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilegen12rcccs-upscaling@pipe-a-valid-mode:
    - shard-iclb:         NOTRUN -> [SKIP][64] ([i915#2672] / [i915#3555])
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb6/igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilegen12rcccs-upscaling@pipe-a-valid-mode.html

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-pri-indfb-draw-mmap-gtt:
    - shard-tglb:         NOTRUN -> [SKIP][65] ([fdo#109280] / [fdo#111825]) +13 similar issues
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb7/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-pri-indfb-draw-mmap-gtt.html

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-indfb-plflip-blt:
    - shard-tglb:         NOTRUN -> [SKIP][66] ([i915#6497]) +4 similar issues
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb6/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-indfb-plflip-blt.html

  * igt@kms_frontbuffer_tracking@psr-2p-scndscrn-indfb-pgflip-blt:
    - shard-iclb:         NOTRUN -> [SKIP][67] ([fdo#109280]) +10 similar issues
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb6/igt@kms_frontbuffer_tracking@psr-2p-scndscrn-indfb-pgflip-blt.html

  * igt@kms_plane_alpha_blend@alpha-basic@pipe-a-dp-1:
    - shard-apl:          NOTRUN -> [FAIL][68] ([i915#4573])
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-apl3/igt@kms_plane_alpha_blend@alpha-basic@pipe-a-dp-1.html

  * igt@kms_plane_alpha_blend@alpha-basic@pipe-c-dp-1:
    - shard-apl:          NOTRUN -> [DMESG-FAIL][69] ([IGT#6]) +1 similar issue
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-apl3/igt@kms_plane_alpha_blend@alpha-basic@pipe-c-dp-1.html

  * igt@kms_plane_scaling@planes-upscale-20x20-downscale-factor-0-5@pipe-b-edp-1:
    - shard-iclb:         [PASS][70] -> [SKIP][71] ([i915#5235]) +2 similar issues
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-iclb3/igt@kms_plane_scaling@planes-upscale-20x20-downscale-factor-0-5@pipe-b-edp-1.html
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb2/igt@kms_plane_scaling@planes-upscale-20x20-downscale-factor-0-5@pipe-b-edp-1.html

  * igt@kms_psr2_sf@cursor-plane-move-continuous-exceed-sf:
    - shard-tglb:         NOTRUN -> [SKIP][72] ([i915#2920])
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb3/igt@kms_psr2_sf@cursor-plane-move-continuous-exceed-sf.html
    - shard-iclb:         NOTRUN -> [SKIP][73] ([i915#658])
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb7/igt@kms_psr2_sf@cursor-plane-move-continuous-exceed-sf.html

  * igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area:
    - shard-apl:          NOTRUN -> [SKIP][74] ([fdo#109271] / [i915#658]) +1 similar issue
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-apl3/igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area.html

  * igt@kms_psr2_su@page_flip-p010@pipe-b-edp-1:
    - shard-iclb:         NOTRUN -> [FAIL][75] ([i915#5939]) +2 similar issues
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb2/igt@kms_psr2_su@page_flip-p010@pipe-b-edp-1.html

  * igt@kms_psr@psr2_cursor_mmap_gtt:
    - shard-iclb:         [PASS][76] -> [SKIP][77] ([fdo#109441]) +1 similar issue
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-iclb2/igt@kms_psr@psr2_cursor_mmap_gtt.html
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb5/igt@kms_psr@psr2_cursor_mmap_gtt.html

  * igt@kms_psr@psr2_sprite_blt:
    - shard-iclb:         NOTRUN -> [SKIP][78] ([fdo#109441]) +1 similar issue
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb5/igt@kms_psr@psr2_sprite_blt.html
    - shard-tglb:         NOTRUN -> [FAIL][79] ([i915#132] / [i915#3467])
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb6/igt@kms_psr@psr2_sprite_blt.html

  * igt@kms_psr_stress_test@flip-primary-invalidate-overlay:
    - shard-tglb:         NOTRUN -> [SKIP][80] ([i915#5519])
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb5/igt@kms_psr_stress_test@flip-primary-invalidate-overlay.html

  * igt@kms_psr_stress_test@invalidate-primary-flip-overlay:
    - shard-iclb:         [PASS][81] -> [SKIP][82] ([i915#5519])
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-iclb3/igt@kms_psr_stress_test@invalidate-primary-flip-overlay.html
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb6/igt@kms_psr_stress_test@invalidate-primary-flip-overlay.html

  * igt@sysfs_clients@create:
    - shard-iclb:         NOTRUN -> [SKIP][83] ([i915#2994])
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb2/igt@sysfs_clients@create.html
    - shard-tglb:         NOTRUN -> [SKIP][84] ([i915#2994])
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb5/igt@sysfs_clients@create.html

  * igt@sysfs_clients@recycle:
    - shard-apl:          NOTRUN -> [SKIP][85] ([fdo#109271] / [i915#2994]) +1 similar issue
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-apl3/igt@sysfs_clients@recycle.html

  
#### Possible fixes ####

  * igt@api_intel_bb@object-reloc-keep-cache:
    - {shard-rkl}:        [SKIP][86] ([i915#3281]) -> [PASS][87] +1 similar issue
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-rkl-1/igt@api_intel_bb@object-reloc-keep-cache.html
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-rkl-5/igt@api_intel_bb@object-reloc-keep-cache.html

  * igt@device_reset@unbind-reset-rebind:
    - {shard-rkl}:        [DMESG-WARN][88] ([i915#5507]) -> [PASS][89]
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-rkl-5/igt@device_reset@unbind-reset-rebind.html
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-rkl-4/igt@device_reset@unbind-reset-rebind.html
    - shard-glk:          [DMESG-WARN][90] ([i915#5507]) -> [PASS][91]
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-glk5/igt@device_reset@unbind-reset-rebind.html
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-glk1/igt@device_reset@unbind-reset-rebind.html

  * igt@drm_read@empty-nonblock:
    - {shard-rkl}:        [SKIP][92] ([i915#1845] / [i915#4098]) -> [PASS][93] +30 similar issues
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-rkl-4/igt@drm_read@empty-nonblock.html
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-rkl-6/igt@drm_read@empty-nonblock.html

  * igt@fbdev@unaligned-read:
    - {shard-rkl}:        [SKIP][94] ([i915#2582]) -> [PASS][95] +1 similar issue
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-rkl-1/igt@fbdev@unaligned-read.html
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-rkl-6/igt@fbdev@unaligned-read.html

  * igt@feature_discovery@psr1:
    - {shard-rkl}:        [SKIP][96] ([i915#658]) -> [PASS][97]
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-rkl-5/igt@feature_discovery@psr1.html
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-rkl-6/igt@feature_discovery@psr1.html

  * igt@gem_eio@reset-stress:
    - {shard-dg1}:        [FAIL][98] ([i915#5784]) -> [PASS][99] +1 similar issue
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-dg1-19/igt@gem_eio@reset-stress.html
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-dg1-13/igt@gem_eio@reset-stress.html

  * igt@gem_exec_balancer@parallel-keep-in-fence:
    - shard-iclb:         [SKIP][100] ([i915#4525]) -> [PASS][101] +1 similar issue
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-iclb7/igt@gem_exec_balancer@parallel-keep-in-fence.html
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb2/igt@gem_exec_balancer@parallel-keep-in-fence.html

  * igt@gem_exec_fair@basic-flow@rcs0:
    - shard-tglb:         [FAIL][102] ([i915#2842]) -> [PASS][103] +1 similar issue
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-tglb7/igt@gem_exec_fair@basic-flow@rcs0.html
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb7/igt@gem_exec_fair@basic-flow@rcs0.html

  * igt@gem_exec_fair@basic-none-share@rcs0:
    - shard-glk:          [FAIL][104] ([i915#2842]) -> [PASS][105]
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-glk8/igt@gem_exec_fair@basic-none-share@rcs0.html
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-glk9/igt@gem_exec_fair@basic-none-share@rcs0.html

  * igt@gem_exec_fair@basic-pace-share@rcs0:
    - {shard-rkl}:        [FAIL][106] ([i915#2842]) -> [PASS][107]
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-rkl-2/igt@gem_exec_fair@basic-pace-share@rcs0.html
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-rkl-6/igt@gem_exec_fair@basic-pace-share@rcs0.html

  * igt@gem_partial_pwrite_pread@write-uncached:
    - {shard-rkl}:        [SKIP][108] ([i915#3282]) -> [PASS][109] +1 similar issue
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-rkl-4/igt@gem_partial_pwrite_pread@write-uncached.html
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-rkl-5/igt@gem_partial_pwrite_pread@write-uncached.html

  * igt@gen9_exec_parse@allowed-single:
    - shard-apl:          [DMESG-WARN][110] ([i915#5566] / [i915#716]) -> [PASS][111]
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-apl1/igt@gen9_exec_parse@allowed-single.html
   [111]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-apl1/igt@gen9_exec_parse@allowed-single.html
    - shard-glk:          [DMESG-WARN][112] ([i915#5566] / [i915#716]) -> [PASS][113]
   [112]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-glk6/igt@gen9_exec_parse@allowed-single.html
   [113]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-glk3/igt@gen9_exec_parse@allowed-single.html

  * igt@i915_pm_dc@dc9-dpms:
    - {shard-rkl}:        [SKIP][114] ([i915#3361]) -> [PASS][115]
   [114]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-rkl-5/igt@i915_pm_dc@dc9-dpms.html
   [115]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-rkl-2/igt@i915_pm_dc@dc9-dpms.html

  * igt@i915_pm_rc6_residency@rc6-idle@vecs0:
    - {shard-dg1}:        [FAIL][116] ([i915#3591]) -> [PASS][117]
   [116]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-dg1-18/igt@i915_pm_rc6_residency@rc6-idle@vecs0.html
   [117]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-dg1-13/igt@i915_pm_rc6_residency@rc6-idle@vecs0.html

  * igt@i915_pm_rpm@cursor-dpms:
    - {shard-rkl}:        [SKIP][118] ([i915#1849]) -> [PASS][119]
   [118]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-rkl-4/igt@i915_pm_rpm@cursor-dpms.html
   [119]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-rkl-6/igt@i915_pm_rpm@cursor-dpms.html

  * igt@i915_suspend@forcewake:
    - shard-apl:          [DMESG-WARN][120] ([i915#180]) -> [PASS][121]
   [120]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-apl1/igt@i915_suspend@forcewake.html
   [121]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-apl2/igt@i915_suspend@forcewake.html

  * igt@kms_flip@flip-vs-suspend-interruptible@a-edp1:
    - shard-tglb:         [INCOMPLETE][122] ([i915#2411]) -> [PASS][123]
   [122]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-tglb3/igt@kms_flip@flip-vs-suspend-interruptible@a-edp1.html
   [123]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb7/igt@kms_flip@flip-vs-suspend-interruptible@a-edp1.html

  * igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-shrfb-draw-pwrite:
    - {shard-rkl}:        [SKIP][124] ([i915#1849] / [i915#4098]) -> [PASS][125] +18 similar issues
   [124]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-rkl-2/igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-shrfb-draw-pwrite.html
   [125]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-rkl-6/igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-shrfb-draw-pwrite.html

  * igt@kms_psr@cursor_mmap_cpu:
    - {shard-rkl}:        [SKIP][126] ([i915#1072]) -> [PASS][127] +3 similar issues
   [126]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-rkl-4/igt@kms_psr@cursor_mmap_cpu.html
   [127]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-rkl-6/igt@kms_psr@cursor_mmap_cpu.html

  * igt@kms_psr@psr2_dpms:
    - shard-iclb:         [SKIP][128] ([fdo#109441]) -> [PASS][129]
   [128]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-iclb7/igt@kms_psr@psr2_dpms.html
   [129]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb2/igt@kms_psr@psr2_dpms.html

  * igt@kms_psr_stress_test@invalidate-primary-flip-overlay:
    - shard-tglb:         [SKIP][130] ([i915#5519]) -> [PASS][131]
   [130]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-tglb7/igt@kms_psr_stress_test@invalidate-primary-flip-overlay.html
   [131]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-tglb1/igt@kms_psr_stress_test@invalidate-primary-flip-overlay.html

  * igt@perf_pmu@module-unload:
    - {shard-dg1}:        [DMESG-WARN][132] -> [PASS][133]
   [132]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-dg1-13/igt@perf_pmu@module-unload.html
   [133]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-dg1-13/igt@perf_pmu@module-unload.html

  * igt@prime_vgem@coherency-gtt:
    - {shard-rkl}:        [SKIP][134] ([fdo#109295] / [fdo#111656] / [i915#3708]) -> [PASS][135]
   [134]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-rkl-1/igt@prime_vgem@coherency-gtt.html
   [135]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-rkl-5/igt@prime_vgem@coherency-gtt.html

  * igt@sysfs_timeslice_duration@timeout@rcs0:
    - {shard-dg1}:        [FAIL][136] ([i915#1755]) -> [PASS][137] +1 similar issue
   [136]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-dg1-13/igt@sysfs_timeslice_duration@timeout@rcs0.html
   [137]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-dg1-17/igt@sysfs_timeslice_duration@timeout@rcs0.html

  * igt@testdisplay:
    - {shard-rkl}:        [SKIP][138] ([i915#4098]) -> [PASS][139] +2 similar issues
   [138]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-rkl-4/igt@testdisplay.html
   [139]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-rkl-6/igt@testdisplay.html

  
#### Warnings ####

  * igt@kms_psr2_sf@overlay-plane-move-continuous-exceed-fully-sf:
    - shard-iclb:         [SKIP][140] ([i915#658]) -> [SKIP][141] ([i915#2920])
   [140]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-iclb6/igt@kms_psr2_sf@overlay-plane-move-continuous-exceed-fully-sf.html
   [141]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb2/igt@kms_psr2_sf@overlay-plane-move-continuous-exceed-fully-sf.html

  * igt@kms_psr2_sf@overlay-plane-move-continuous-exceed-sf:
    - shard-iclb:         [SKIP][142] ([i915#2920]) -> [SKIP][143] ([i915#658])
   [142]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-iclb2/igt@kms_psr2_sf@overlay-plane-move-continuous-exceed-sf.html
   [143]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb3/igt@kms_psr2_sf@overlay-plane-move-continuous-exceed-sf.html

  * igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area:
    - shard-iclb:         [SKIP][144] ([i915#2920]) -> [SKIP][145] ([fdo#111068] / [i915#658]) +1 similar issue
   [144]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12493/shard-iclb2/igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area.html
   [145]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/shard-iclb1/igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [IGT#6]: https://gitlab.freedesktop.org/drm/igt-gpu-tools/issues/6
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109274]: https://bugs.freedesktop.org/show_bug.cgi?id=109274
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109279]: https://bugs.freedesktop.org/show_bug.cgi?id=109279
  [fdo#109280]: https://bugs.freedesktop.org/show_bug.cgi?id=109280
  [fdo#109283]: https://bugs.freedesktop.org/show_bug.cgi?id=109283
  [fdo#109284]: https://bugs.freedesktop.org/show_bug.cgi?id=109284
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109289]: https://bugs.freedesktop.org/show_bug.cgi?id=109289
  [fdo#109295]: https://bugs.freedesktop.org/show_bug.cgi?id=109295
  [fdo#109308]: https://bugs.freedesktop.org/show_bug.cgi?id=109308
  [fdo#109312]: https://bugs.freedesktop.org/show_bug.cgi?id=109312
  [fdo#109313]: https://bugs.freedesktop.org/show_bug.cgi?id=109313
  [fdo#109315]: https://bugs.freedesktop.org/show_bug.cgi?id=109315
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#109506]: https://bugs.freedesktop.org/show_bug.cgi?id=109506
  [fdo#109642]: https://bugs.freedesktop.org/show_bug.cgi?id=109642
  [fdo#110189]: https://bugs.freedesktop.org/show_bug.cgi?id=110189
  [fdo#110542]: https://bugs.freedesktop.org/show_bug.cgi?id=110542
  [fdo#110723]: https://bugs.freedesktop.org/show_bug.cgi?id=110723
  [fdo#111068]: https://bugs.freedesktop.org/show_bug.cgi?id=111068
  [fdo#111614]: https://bugs.freedesktop.org/show_bug.cgi?id=111614
  [fdo#111615]: https://bugs.freedesktop.org/show_bug.cgi?id=111615
  [fdo#111644]: https://bugs.freedesktop.org/show_bug.cgi?id=111644
  [fdo#111656]: https://bugs.freedesktop.org/show_bug.cgi?id=111656
  [fdo#111825]: https://bugs.freedesktop.org/show_bug.cgi?id=111825
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [fdo#112054]: https://bugs.freedesktop.org/show_bug.cgi?id=112054
  [fdo#112283]: https://bugs.freedesktop.org/show_bug.cgi?id=112283
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#1099]: https://gitlab.freedesktop.org/drm/intel/issues/1099
  [i915#132]: https://gitlab.freedesktop.org/drm/intel/issues/132
  [i915#1339]: https://gitlab.freedesktop.org/drm/intel/issues/1339
  [i915#1397]: https://gitlab.freedesktop.org/drm/intel/issues/1397
  [i915#1722]: https://gitlab.freedesktop.org/drm/intel/issues/1722
  [i915#1755]: https://gitlab.freedesktop.org/drm/intel/issues/1755
  [i915#1769]: https://gitlab.freedesktop.org/drm/intel/issues/1769
  [i915#180]: https://gitlab.freedesktop.org/drm/intel/issues/180
  [i915#1825]: https://gitlab.freedesktop.org/drm/intel/issues/1825
  [i915#1839]: https://gitlab.freedesktop.org/drm/intel/issues/1839
  [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845
  [i915#1849]: https://gitlab.freedesktop.org/drm/intel/issues/1849
  [i915#1902]: https://gitlab.freedesktop.org/drm/intel/issues/1902
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2232]: https://gitlab.freedesktop.org/drm/intel/issues/2232
  [i915#2411]: https://gitlab.freedesktop.org/drm/intel/issues/2411
  [i915#2437]: https://gitlab.freedesktop.org/drm/intel/issues/2437
  [i915#2527]: https://gitlab.freedesktop.org/drm/intel/issues/2527
  [i915#2575]: https://gitlab.freedesktop.org/drm/intel/issues/2575
  [i915#2582]: https://gitlab.freedesktop.org/drm/intel/issues/2582
  [i915#2587]: https://gitlab.freedesktop.org/drm/intel/issues/2587
  [i915#2658]: https://gitlab.freedesktop.org/drm/intel/issues/2658
  [i915#2672]: https://gitlab.freedesktop.org/drm/intel/issues/2672
  [i915#2681]: https://gitlab.freedesktop.org/drm/intel/issues/2681
  [i915#2705]: https://gitlab.freedesktop.org/drm/intel/issues/2705
  [i915#280]: https://gitlab.freedesktop.org/drm/intel/issues/280
  [i915#284]: https://gitlab.freedesktop.org/drm/intel/issues/284
  [i915#2842]: https://gitlab.freedesktop.org/drm/intel/issues/2842
  [i915#2856]: https://gitlab.freedesktop.org/drm/intel/issues/2856
  [i915#2920]: https://gitlab.freedesktop.org/drm/intel/issues/2920
  [i915#2994]: https://gitlab.freedesktop.org/drm/intel/issues/2994
  [i915#3002]: https://gitlab.freedesktop.org/drm/intel/issues/3002
  [i915#3116]: https://gitlab.freedesktop.org/drm/intel/issues/3116
  [i915#315]: https://gitlab.freedesktop.org/drm/intel/issues/315
  [i915#3281]: https://gitlab.freedesktop.org/drm/intel/issues/3281
  [i915#3282]: https://gitlab.freedesktop.org/drm/intel/issues/3282
  [i915#3297]: https://gitlab.freedesktop.org/drm/intel/issues/3297
  [i915#3301]: https://gitlab.freedesktop.org/drm/intel/issues/3301
  [i915#3318]: https://gitlab.freedesktop.org/drm/intel/issues/3318
  [i915#3323]: https://gitlab.freedesktop.org/drm/intel/issues/3323
  [i915#3359]: https://gitlab.freedesktop.org/drm/intel/issues/3359
  [i915#3361]: https://gitlab.freedesktop.org/drm/intel/issues/3361
  [i915#3458]: https://gitlab.freedesktop.org/drm/intel/issues/3458
  [i915#3467]: https://gitlab.freedesktop.org/drm/intel/issues/3467
  [i915#3469]: https://gitlab.freedesktop.org/drm/intel/issues/3469
  [i915#3546]: https://gitlab.freedesktop.org/drm/intel/issues/3546
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3558]: https://gitlab.freedesktop.org/drm/intel/issues/3558
  [i915#3591]: https://gitlab.freedesktop.org/drm/intel/issues/3591
  [i915#3637]: https://gitlab.freedesktop.org/drm/intel/issues/3637
  [i915#3638]: https://gitlab.freedesktop.org/drm/intel/issues/3638
  [i915#3639]: https://gitlab.freedesktop.org/drm/intel/issues/3639
  [i915#3689]: https://gitlab.freedesktop.org/drm/intel/issues/3689
  [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
  [i915#3734]: https://gitlab.freedesktop.org/drm/intel/issues/3734
  [i915#3742]: https://gitlab.freedesktop.org/drm/intel/issues/3742
  [i915#3810]: https://gitlab.freedesktop.org/drm/intel/issues/3810
  [i915#3840]: https://gitlab.freedesktop.org/drm/intel/issues/3840
  [i915#3886]: https://gitlab.freedesktop.org/drm/intel/issues/3886
  [i915#3955]: https://gitlab.freedesktop.org/drm/intel/issues/3955
  [i915#3989]: https://gitlab.freedesktop.org/drm/intel/issues/3989
  [i915#404]: https://gitlab.freedesktop.org/drm/intel/issues/404
  [i915#4070]: https://gitlab.freedesktop.org/drm/intel/issues/4070
  [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077
  [i915#4078]: https://gitlab.freedesktop.org/drm/intel/issues/4078
  [i915#4079]: https://gitlab.freedesktop.org/drm/intel/issues/4079
  [i915#4098]: https://gitlab.freedesktop.org/drm/intel/issues/4098
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4171]: https://gitlab.freedesktop.org/drm/intel/issues/4171
  [i915#426]: https://gitlab.freedesktop.org/drm/intel/issues/426
  [i915#4270]: https://gitlab.freedesktop.org/drm/intel/issues/4270
  [i915#4281]: https://gitlab.freedesktop.org/drm/intel/issues/4281
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4525]: https://gitlab.freedesktop.org/drm/intel/issues/4525
  [i915#4538]: https://gitlab.freedesktop.org/drm/intel/issues/4538
  [i915#454]: https://gitlab.freedesktop.org/drm/intel/issues/454
  [i915#4573]: https://gitlab.freedesktop.org/drm/intel/issues/4573
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4767]: https://gitlab.freedesktop.org/drm/intel/issues/4767
  [i915#4833]: https://gitlab.freedesktop.org/drm/intel/issues/4833
  [i915#4860]: https://gitlab.freedesktop.org/drm/intel/issues/4860
  [i915#4877]: https://gitlab.freedesktop.org/drm/intel/issues/4877
  [i915#5176]: https://gitlab.freedesktop.org/drm/intel/issues/5176
  [i915#5235]: https://gitlab.freedesktop.org/drm/intel/issues/5235
  [i915#5286]: https://gitlab.freedesktop.org/drm/intel/issues/5286
  [i915#5288]: https://gitlab.freedesktop.org/drm/intel/issues/5288
  [i915#5289]: https://gitlab.freedesktop.org/drm/intel/issues/5289
  [i915#5325]: https://gitlab.freedesktop.org/drm/intel/issues/5325
  [i915#5327]: https://gitlab.freedesktop.org/drm/intel/issues/5327
  [i915#533]: https://gitlab.freedesktop.org/drm/intel/issues/533
  [i915#5439]: https://gitlab.freedesktop.org/drm/intel/issues/5439
  [i915#5461]: https://gitlab.freedesktop.org/drm/intel/issues/5461
  [i915#5507]: https://gitlab.freedesktop.org/drm/intel/issues/5507
  [i915#5519]: https://gitlab.freedesktop.org/drm/intel/issues/5519
  [i915#5563]: https://gitlab.freedesktop.org/drm/intel/issues/5563
  [i915#5566]: https://gitlab.freedesktop.org/drm/intel/issues/5566
  [i915#5723]: https://gitlab.freedesktop.org/drm/intel/issues/5723
  [i915#5784]: https://gitlab.freedesktop.org/drm/intel/issues/5784
  [i915#5939]: https://gitlab.freedesktop.org/drm/intel/issues/5939
  [i915#6095]: https://gitlab.freedesktop.org/drm/intel/issues/6095
  [i915#6227]: https://gitlab.freedesktop.org/drm/intel/issues/6227
  [i915#6245]: https://gitlab.freedesktop.org/drm/intel/issues/6245
  [i915#6247]: https://gitlab.freedesktop.org/drm/intel/issues/6247
  [i915#6248]: https://gitlab.freedesktop.org/drm/intel/issues/6248
  [i915#6268]: https://gitlab.freedesktop.org/drm/intel/issues/6268
  [i915#6335]: https://gitlab.freedesktop.org/drm/intel/issues/6335
  [i915#6497]: https://gitlab.freedesktop.org/drm/intel/issues/6497
  [i915#6524]: https://gitlab.freedesktop.org/drm/intel/issues/6524
  [i915#658]: https://gitlab.freedesktop.org/drm/intel/issues/658
  [i915#6768]: https://gitlab.freedesktop.org/drm/intel/issues/6768
  [i915#6944]: https://gitlab.freedesktop.org/drm/intel/issues/6944
  [i915#6946]: https://gitlab.freedesktop.org/drm/intel/issues/6946
  [i915#7116]: https://gitlab.freedesktop.org/drm/intel/issues/7116
  [i915#7118]: https://gitlab.freedesktop.org/drm/intel/issues/7118
  [i915#716]: https://gitlab.freedesktop.org/drm/intel/issues/716
  [i915#7456]: https://gitlab.freedesktop.org/drm/intel/issues/7456
  [i915#7561]: https://gitlab.freedesktop.org/drm/intel/issues/7561
  [i915#7651]: https://gitlab.freedesktop.org/drm/intel/issues/7651
  [i915#7672]: https://gitlab.freedesktop.org/drm/intel/issues/7672
  [i915#7673]: https://gitlab.freedesktop.org/drm/intel/issues/7673
  [i915#7688]: https://gitlab.freedesktop.org/drm/intel/issues/7688


Build changes
-------------

  * CI: CI-20190529 -> None
  * IGT: IGT_7090 -> IGTPW_8221
  * Piglit: piglit_4509 -> None

  CI-20190529: 20190529
  CI_DRM_12493: a6dc4d045339e2817103e99539e3efaa554c941f @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_8221: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/index.html
  IGT_7090: 5aafcf060b6dfbb2fa7aace76c8074d98ac7da8f @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8221/index.html

* Re: [igt-dev] [PATCH i-g-t 1/3] lib/i915_blt: Remove src == dst pitch restriction
  2022-12-12 12:50 ` [igt-dev] [PATCH i-g-t 1/3] lib/i915_blt: Remove src == dst pitch restriction Zbigniew Kempczyński
@ 2022-12-13 15:37   ` Karolina Stolarek
  2022-12-14 18:57     ` Zbigniew Kempczyński
  0 siblings, 1 reply; 20+ messages in thread
From: Karolina Stolarek @ 2022-12-13 15:37 UTC (permalink / raw)
  To: Zbigniew Kempczyński; +Cc: igt-dev

On 12.12.2022 13:50, Zbigniew Kempczyński wrote:
> During debugging phase we established there's not necessary to enforce
> same pitch for destination surface in FULL_RESOLVE mode. Relax this
> condition allowing caller to pass requirement surface configuration.
> 
> Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
> ---
>   lib/i915/i915_blt.c | 8 +++-----
>   1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/i915/i915_blt.c b/lib/i915/i915_blt.c
> index 3776c56c60..42c28623f9 100644
> --- a/lib/i915/i915_blt.c
> +++ b/lib/i915/i915_blt.c
> @@ -317,13 +317,11 @@ static void fill_data(struct gen12_block_copy_data *data,
>   	data->dw00.special_mode = __special_mode(blt);
>   	data->dw00.length = extended_command ? 20 : 10;
>   
> -	if (__special_mode(blt) == SM_FULL_RESOLVE && blt->src.tiling == T_TILE64) {
> -		data->dw01.dst_pitch = blt->src.pitch - 1;
> +	if (__special_mode(blt) == SM_FULL_RESOLVE)

"blt->src.tiling == T_TILE64" part was introduced in commit cdc0258a4672 
("lib/i915_blt: Narrow setting same pitch and aux-ccs to Tile64 only").
I know it's a bit nit-picky, but could we revert that change (as it 
looks like it's not necessary), and have a separate commit that removes 
data->dw01.dst_pitch = blt->src.pitch - 1?

The question remains -- what actually caused the regression for Tile4 
and xmajor? Was it because of the pitch settings? I ran the tests a 
couple of times in a row after applying this patch and everything 
worked as expected.

All the best,
Karolina

>   		data->dw01.dst_aux_mode = __aux_mode(&blt->src);
> -	} else {
> -		data->dw01.dst_pitch = blt->dst.pitch - 1;
> +	else
>   		data->dw01.dst_aux_mode = __aux_mode(&blt->dst);
> -	}
> +	data->dw01.dst_pitch = blt->dst.pitch - 1;
>   
>   	data->dw01.dst_mocs = blt->dst.mocs;
>   	data->dw01.dst_compression = blt->dst.compression;

* Re: [igt-dev] [PATCH i-g-t 2/3] lib/i915_blt: Extract blit emit functions
  2022-12-12 12:50 ` [igt-dev] [PATCH i-g-t 2/3] lib/i915_blt: Extract blit emit functions Zbigniew Kempczyński
@ 2022-12-13 15:39   ` Karolina Stolarek
  2022-12-14 19:40     ` Zbigniew Kempczyński
  0 siblings, 1 reply; 20+ messages in thread
From: Karolina Stolarek @ 2022-12-13 15:39 UTC (permalink / raw)
  To: Zbigniew Kempczyński; +Cc: igt-dev

On 12.12.2022 13:50, Zbigniew Kempczyński wrote:
> Add some flexibility in building user pipelines extracting blitter
> emission code to dedicated functions. Previous blitter functions which
> do one blit-and-execute are rewritten to use those functions.
> Requires usage with stateful allocator (offset might be acquired more
> than one, so it must not change).
> 
> Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
> ---
>   lib/i915/i915_blt.c | 263 ++++++++++++++++++++++++++++++++------------
>   lib/i915/i915_blt.h |  19 ++++
>   2 files changed, 213 insertions(+), 69 deletions(-)
> 
> diff --git a/lib/i915/i915_blt.c b/lib/i915/i915_blt.c
> index 42c28623f9..32ad608775 100644
> --- a/lib/i915/i915_blt.c
> +++ b/lib/i915/i915_blt.c
> @@ -503,58 +503,61 @@ static void dump_bb_ext(struct gen12_block_copy_data_ext *data)
>   }
>   
>   /**
> - * blt_block_copy:
> + * emit_blt_block_copy:
>    * @i915: drm fd
> - * @ctx: intel_ctx_t context
> - * @e: blitter engine for @ctx
>    * @ahnd: allocator handle
>    * @blt: basic blitter data (for TGL/DG1 which doesn't support ext version)
>    * @ext: extended blitter data (for DG2+, supports flatccs compression)
> + * @bb_pos: position at which insert block copy commands
> + * @emit_bbe: emit MI_BATCH_BUFFER_END after block-copy or not
>    *
> - * Function does blit between @src and @dst described in @blt object.
> + * Function inserts block-copy blit into batch at @bb_pos. Allows concatenating
> + * with other commands to achieve pipelining.
>    *
>    * Returns:
> - * execbuffer status.
> + * Next write position in batch.
>    */
> -int blt_block_copy(int i915,
> -		   const intel_ctx_t *ctx,
> -		   const struct intel_execution_engine2 *e,
> -		   uint64_t ahnd,
> -		   const struct blt_copy_data *blt,
> -		   const struct blt_block_copy_data_ext *ext)
> +uint64_t emit_blt_block_copy(int i915,
> +			     uint64_t ahnd,
> +			     const struct blt_copy_data *blt,
> +			     const struct blt_block_copy_data_ext *ext,
> +			     uint64_t bb_pos,
> +			     bool emit_bbe)
>   {
> -	struct drm_i915_gem_execbuffer2 execbuf = {};
> -	struct drm_i915_gem_exec_object2 obj[3] = {};
>   	struct gen12_block_copy_data data = {};
>   	struct gen12_block_copy_data_ext dext = {};
> -	uint64_t dst_offset, src_offset, bb_offset, alignment;
> -	uint32_t *bb;
> -	int i, ret;
> +	uint64_t dst_offset, src_offset, bb_offset;
> +	uint32_t bbe = MI_BATCH_BUFFER_END;
> +	uint8_t *bb;
>   
>   	igt_assert_f(ahnd, "block-copy supports softpin only\n");
>   	igt_assert_f(blt, "block-copy requires data to do blit\n");
>   
> -	alignment = gem_detect_safe_alignment(i915);
> -	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, alignment);
> -	if (__special_mode(blt) == SM_FULL_RESOLVE)
> -		dst_offset = src_offset;
> -	else
> -		dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, alignment);
> -	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, alignment);
> +	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, 0);
> +	dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, 0);
> +	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, 0);

Why do we pass 0 as object alignment here? In surf-copy and fast-copy we 
pass "alignment" in.

>   
>   	fill_data(&data, blt, src_offset, dst_offset, ext);
>   
> -	i = sizeof(data) / sizeof(uint32_t);
>   	bb = gem_mmap__device_coherent(i915, blt->bb.handle, 0, blt->bb.size,
>   				       PROT_READ | PROT_WRITE);
> -	memcpy(bb, &data, sizeof(data));
> +
> +	igt_assert(bb_pos + sizeof(data) < blt->bb.size);

I'd say we need extra space for a potential MI_BATCH_BUFFER_END here.
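
Roughly what I have in mind, as a minimal sketch only: bb_has_room() is a
made-up helper (nothing like it exists in lib/i915_blt), and treating an
exact fit as valid (<= instead of the patch's <) is my assumption about the
intended boundary. It checks the command and the optional terminating dword
together, so the later BBE write can never run past the batch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of lib/i915_blt: returns true when a
 * command of cmd_size bytes, plus an optional trailing
 * MI_BATCH_BUFFER_END dword, fits into a batch of bb_size bytes when
 * written at bb_pos.
 */
static bool bb_has_room(uint64_t bb_pos, size_t cmd_size,
			uint64_t bb_size, bool emit_bbe)
{
	uint64_t needed = cmd_size;

	if (emit_bbe)
		needed += sizeof(uint32_t);

	/* Compare against the remaining space so the sum cannot wrap. */
	return bb_pos <= bb_size && needed <= bb_size - bb_pos;
}
```

With a check like this in front of the first memcpy(), the separate assert
in the emit_bbe branch would become redundant.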

> +	memcpy(bb + bb_pos, &data, sizeof(data));
> +	bb_pos += sizeof(data);
>   
>   	if (ext) {
>   		fill_data_ext(&dext, ext);
> -		memcpy(bb + i, &dext, sizeof(dext));
> -		i += sizeof(dext) / sizeof(uint32_t);
> +		igt_assert(bb_pos + sizeof(dext) < blt->bb.size);
> +		memcpy(bb + bb_pos, &dext, sizeof(dext));
> +		bb_pos += sizeof(dext);
> +	}
> +
> +	if (emit_bbe) {
> +		igt_assert(bb_pos + sizeof(uint32_t) < blt->bb.size);
> +		memcpy(bb + bb_pos, &bbe, sizeof(bbe));
> +		bb_pos += sizeof(uint32_t);
>   	}
> -	bb[i++] = MI_BATCH_BUFFER_END;
>   
>   	if (blt->print_bb) {
>   		igt_info("[BLOCK COPY]\n");
> @@ -569,6 +572,44 @@ int blt_block_copy(int i915,
>   
>   	munmap(bb, blt->bb.size);
>   
> +	return bb_pos;
> +}
> +
> +/**
> + * blt_block_copy:
> + * @i915: drm fd
> + * @ctx: intel_ctx_t context
> + * @e: blitter engine for @ctx
> + * @ahnd: allocator handle
> + * @blt: basic blitter data (for TGL/DG1 which doesn't support ext version)
> + * @ext: extended blitter data (for DG2+, supports flatccs compression)
> + *
> + * Function does blit between @src and @dst described in @blt object.
> + *
> + * Returns:
> + * execbuffer status.
> + */
> +int blt_block_copy(int i915,
> +		   const intel_ctx_t *ctx,
> +		   const struct intel_execution_engine2 *e,
> +		   uint64_t ahnd,
> +		   const struct blt_copy_data *blt,
> +		   const struct blt_block_copy_data_ext *ext)
> +{
> +	struct drm_i915_gem_execbuffer2 execbuf = {};
> +	struct drm_i915_gem_exec_object2 obj[3] = {};
> +	uint64_t dst_offset, src_offset, bb_offset;
> +	int ret;
> +
> +	igt_assert_f(ahnd, "block-copy supports softpin only\n");
> +	igt_assert_f(blt, "block-copy requires data to do blit\n");
> +
> +	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, 0);
> +	dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, 0);
> +	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, 0);
> +
> +	emit_blt_block_copy(i915, ahnd, blt, ext, 0, true);
> +
>   	obj[0].offset = CANONICAL(dst_offset);
>   	obj[1].offset = CANONICAL(src_offset);
>   	obj[2].offset = CANONICAL(bb_offset);
> @@ -655,31 +696,30 @@ static void dump_bb_surf_ctrl_cmd(const struct gen12_ctrl_surf_copy_data *data)
>   }
>   
>   /**
> - * blt_ctrl_surf_copy:
> + * emit_blt_ctrl_surf_copy:
>    * @i915: drm fd
> - * @ctx: intel_ctx_t context
> - * @e: blitter engine for @ctx
>    * @ahnd: allocator handle
>    * @surf: blitter data for ctrl-surf-copy
> + * @bb_pos: position at which insert block copy commands
> + * @emit_bbe: emit MI_BATCH_BUFFER_END after ctrl-surf-copy or not
>    *
> - * Function does ctrl-surf-copy blit between @src and @dst described in
> - * @blt object.
> + * Function emits ctrl-surf-copy blit between @src and @dst described in
> + * @blt object at @bb_pos. Allows concatenating with other commands to
> + * achieve pipelining.
>    *
>    * Returns:
> - * execbuffer status.
> + * Next write position in batch.
>    */
> -int blt_ctrl_surf_copy(int i915,
> -		       const intel_ctx_t *ctx,
> -		       const struct intel_execution_engine2 *e,
> -		       uint64_t ahnd,
> -		       const struct blt_ctrl_surf_copy_data *surf)
> +uint64_t emit_blt_ctrl_surf_copy(int i915,
> +				 uint64_t ahnd,
> +				 const struct blt_ctrl_surf_copy_data *surf,
> +				 uint64_t bb_pos,
> +				 bool emit_bbe)
>   {
> -	struct drm_i915_gem_execbuffer2 execbuf = {};
> -	struct drm_i915_gem_exec_object2 obj[3] = {};
>   	struct gen12_ctrl_surf_copy_data data = {};
>   	uint64_t dst_offset, src_offset, bb_offset, alignment;
> +	uint32_t bbe = MI_BATCH_BUFFER_END;
>   	uint32_t *bb;
> -	int i;
>   
>   	igt_assert_f(ahnd, "ctrl-surf-copy supports softpin only\n");
>   	igt_assert_f(surf, "ctrl-surf-copy requires data to do ctrl-surf-copy blit\n");
> @@ -695,12 +735,9 @@ int blt_ctrl_surf_copy(int i915,
>   	data.dw00.size_of_ctrl_copy = __ccs_size(surf) / CCS_RATIO - 1;
>   	data.dw00.length = 0x3;
>   
> -	src_offset = get_offset(ahnd, surf->src.handle, surf->src.size,
> -				alignment);
> -	dst_offset = get_offset(ahnd, surf->dst.handle, surf->dst.size,
> -				alignment);
> -	bb_offset = get_offset(ahnd, surf->bb.handle, surf->bb.size,
> -			       alignment);
> +	src_offset = get_offset(ahnd, surf->src.handle, surf->src.size, alignment);
> +	dst_offset = get_offset(ahnd, surf->dst.handle, surf->dst.size, alignment);
> +	bb_offset = get_offset(ahnd, surf->bb.handle, surf->bb.size, alignment);
>   
>   	data.dw01.src_address_lo = src_offset;
>   	data.dw02.src_address_hi = src_offset >> 32;
> @@ -710,11 +747,18 @@ int blt_ctrl_surf_copy(int i915,
>   	data.dw04.dst_address_hi = dst_offset >> 32;
>   	data.dw04.dst_mocs = surf->dst.mocs;
>   
> -	i = sizeof(data) / sizeof(uint32_t);
>   	bb = gem_mmap__device_coherent(i915, surf->bb.handle, 0, surf->bb.size,
>   				       PROT_READ | PROT_WRITE);
> -	memcpy(bb, &data, sizeof(data));
> -	bb[i++] = MI_BATCH_BUFFER_END;
> +
> +	igt_assert(bb_pos + sizeof(data) < surf->bb.size);
> +	memcpy(bb + bb_pos, &data, sizeof(data));
> +	bb_pos += sizeof(data);
> +
> +	if (emit_bbe) {
> +		igt_assert(bb_pos + sizeof(uint32_t) < surf->bb.size);
> +		memcpy(bb + bb_pos, &bbe, sizeof(bbe));
> +		bb_pos += sizeof(uint32_t);
> +	}
>   
>   	if (surf->print_bb) {
>   		igt_info("BB [CTRL SURF]:\n");
> @@ -724,8 +768,46 @@ int blt_ctrl_surf_copy(int i915,
>   
>   		dump_bb_surf_ctrl_cmd(&data);
>   	}
> +
>   	munmap(bb, surf->bb.size);
>   
> +	return bb_pos;
> +}
> +
> +/**
> + * blt_ctrl_surf_copy:
> + * @i915: drm fd
> + * @ctx: intel_ctx_t context
> + * @e: blitter engine for @ctx
> + * @ahnd: allocator handle
> + * @surf: blitter data for ctrl-surf-copy
> + *
> + * Function does ctrl-surf-copy blit between @src and @dst described in
> + * @blt object.
> + *
> + * Returns:
> + * execbuffer status.
> + */
> +int blt_ctrl_surf_copy(int i915,
> +		       const intel_ctx_t *ctx,
> +		       const struct intel_execution_engine2 *e,
> +		       uint64_t ahnd,
> +		       const struct blt_ctrl_surf_copy_data *surf)
> +{
> +	struct drm_i915_gem_execbuffer2 execbuf = {};
> +	struct drm_i915_gem_exec_object2 obj[3] = {};
> +	uint64_t dst_offset, src_offset, bb_offset, alignment;
> +
> +	igt_assert_f(ahnd, "ctrl-surf-copy supports softpin only\n");
> +	igt_assert_f(surf, "ctrl-surf-copy requires data to do ctrl-surf-copy blit\n");
> +
> +	alignment = max_t(uint64_t, gem_detect_safe_alignment(i915), 1ull << 16);
> +	src_offset = get_offset(ahnd, surf->src.handle, surf->src.size, alignment);
> +	dst_offset = get_offset(ahnd, surf->dst.handle, surf->dst.size, alignment);
> +	bb_offset = get_offset(ahnd, surf->bb.handle, surf->bb.size, alignment);
> +
> +	emit_blt_ctrl_surf_copy(i915, ahnd, surf, 0, true);
> +
>   	obj[0].offset = CANONICAL(dst_offset);
>   	obj[1].offset = CANONICAL(src_offset);
>   	obj[2].offset = CANONICAL(bb_offset);
> @@ -869,31 +951,31 @@ static void dump_bb_fast_cmd(struct gen12_fast_copy_data *data)
>   }
>   
>   /**
> - * blt_fast_copy:
> + * emit_blt_fast_copy:
>    * @i915: drm fd
> - * @ctx: intel_ctx_t context
> - * @e: blitter engine for @ctx
>    * @ahnd: allocator handle
>    * @blt: blitter data for fast-copy (same as for block-copy but doesn't use
>    * compression fields).
> + * @bb_pos: position at which insert block copy commands
> + * @emit_bbe: emit MI_BATCH_BUFFER_END after fast-copy or not
>    *
> - * Function does fast blit between @src and @dst described in @blt object.
> + * Function emits fast-copy blit between @src and @dst described in @blt object
> + * at @bb_pos. Allows concatenating with other commands to
> + * achieve pipelining.
>    *
>    * Returns:
> - * execbuffer status.
> + * Next write position in batch.
>    */
> -int blt_fast_copy(int i915,
> -		  const intel_ctx_t *ctx,
> -		  const struct intel_execution_engine2 *e,
> -		  uint64_t ahnd,
> -		  const struct blt_copy_data *blt)
> +uint64_t emit_blt_fast_copy(int i915,
> +			    uint64_t ahnd,
> +			    const struct blt_copy_data *blt,
> +			    uint64_t bb_pos,
> +			    bool emit_bbe)
>   {
> -	struct drm_i915_gem_execbuffer2 execbuf = {};
> -	struct drm_i915_gem_exec_object2 obj[3] = {};
>   	struct gen12_fast_copy_data data = {};
>   	uint64_t dst_offset, src_offset, bb_offset, alignment;
> +	uint32_t bbe = MI_BATCH_BUFFER_END;
>   	uint32_t *bb;
> -	int i, ret;
>   
>   	alignment = gem_detect_safe_alignment(i915);
>   
> @@ -931,22 +1013,65 @@ int blt_fast_copy(int i915,
>   	data.dw08.src_address_lo = src_offset;
>   	data.dw09.src_address_hi = src_offset >> 32;
>   
> -	i = sizeof(data) / sizeof(uint32_t);
>   	bb = gem_mmap__device_coherent(i915, blt->bb.handle, 0, blt->bb.size,
>   				       PROT_READ | PROT_WRITE);
>   
> -	memcpy(bb, &data, sizeof(data));
> -	bb[i++] = MI_BATCH_BUFFER_END;
> +	igt_assert(bb_pos + sizeof(data) < blt->bb.size);
> +	memcpy(bb + bb_pos, &data, sizeof(data));
> +	bb_pos += sizeof(data);
> +
> +	if (emit_bbe) {
> +		igt_assert(bb_pos + sizeof(uint32_t) < blt->bb.size);
> +		memcpy(bb + bb_pos, &bbe, sizeof(bbe));
> +		bb_pos += sizeof(uint32_t);
> +	}
>   
>   	if (blt->print_bb) {
>   		igt_info("BB [FAST COPY]\n");
> -		igt_info("blit [src offset: %llx, dst offset: %llx\n",
> -			 (long long) src_offset, (long long) dst_offset);
> +		igt_info("src offset: %llx, dst offset: %llx, bb offset: %llx\n",
> +			 (long long) src_offset, (long long) dst_offset,
> +			 (long long) bb_offset);

Nitpick: as you're touching these lines anyway, could you delete the 
space after each cast? It's not needed.

In general, I'm fine with the changes, but would like to clarify a 
couple of things above before giving r-b.
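
For reference, this is how I read the intended usage of the new emit
functions: each call returns the next write position, and only the last
call in a sequence terminates the batch. The sketch below is a toy model
under that assumption; toy_emit(), toy_build_batch() and TOY_BBE are made-up
names standing in for the emit_blt_*() helpers and MI_BATCH_BUFFER_END, and
the command payloads are arbitrary dwords rather than real blits:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for MI_BATCH_BUFFER_END; value assumed from the i915 headers. */
#define TOY_BBE (0xAu << 23)

/*
 * Hypothetical stand-in for the emit_blt_*() helpers: copies cmd into bb
 * at bb_pos, optionally appends the terminator, and returns the next
 * write position.
 */
static uint64_t toy_emit(uint8_t *bb, uint64_t bb_size,
			 const void *cmd, size_t cmd_size,
			 uint64_t bb_pos, bool emit_bbe)
{
	const uint32_t bbe = TOY_BBE;

	assert(bb_pos <= bb_size && cmd_size <= bb_size - bb_pos);
	memcpy(bb + bb_pos, cmd, cmd_size);
	bb_pos += cmd_size;

	if (emit_bbe) {
		assert(sizeof(bbe) <= bb_size - bb_pos);
		memcpy(bb + bb_pos, &bbe, sizeof(bbe));
		bb_pos += sizeof(bbe);
	}

	return bb_pos;
}

/* Packs two toy commands into one batch; only the second terminates it. */
static uint64_t toy_build_batch(uint8_t *bb, uint64_t bb_size)
{
	const uint32_t first[2] = { 0x11111111, 0x22222222 };
	const uint32_t second[3] = { 0x33333333, 0x44444444, 0x55555555 };
	uint64_t bb_pos = 0;

	bb_pos = toy_emit(bb, bb_size, first, sizeof(first), bb_pos, false);
	bb_pos = toy_emit(bb, bb_size, second, sizeof(second), bb_pos, true);

	return bb_pos;
}
```

The caller threads bb_pos through every call, which is why the offsets
handed out by the allocator have to stay stable between calls.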

All the best,
Karolina

>   		dump_bb_fast_cmd(&data);
>   	}
>   
>   	munmap(bb, blt->bb.size);
>   
> +	return bb_pos;
> +}
> +
> +/**
> + * blt_fast_copy:
> + * @i915: drm fd
> + * @ctx: intel_ctx_t context
> + * @e: blitter engine for @ctx
> + * @ahnd: allocator handle
> + * @blt: blitter data for fast-copy (same as for block-copy but doesn't use
> + * compression fields).
> + *
> + * Function does fast blit between @src and @dst described in @blt object.
> + *
> + * Returns:
> + * execbuffer status.
> + */
> +int blt_fast_copy(int i915,
> +		  const intel_ctx_t *ctx,
> +		  const struct intel_execution_engine2 *e,
> +		  uint64_t ahnd,
> +		  const struct blt_copy_data *blt)
> +{
> +	struct drm_i915_gem_execbuffer2 execbuf = {};
> +	struct drm_i915_gem_exec_object2 obj[3] = {};
> +	uint64_t dst_offset, src_offset, bb_offset, alignment;
> +	int ret;
> +
> +	alignment = gem_detect_safe_alignment(i915);
> +
> +	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, alignment);
> +	dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, alignment);
> +	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, alignment);
> +
> +	emit_blt_fast_copy(i915, ahnd, blt, 0, true);
> +
>   	obj[0].offset = CANONICAL(dst_offset);
>   	obj[1].offset = CANONICAL(src_offset);
>   	obj[2].offset = CANONICAL(bb_offset);
> diff --git a/lib/i915/i915_blt.h b/lib/i915/i915_blt.h
> index e0e8b52bc2..34db9bb962 100644
> --- a/lib/i915/i915_blt.h
> +++ b/lib/i915/i915_blt.h
> @@ -168,6 +168,13 @@ bool blt_supports_compression(int i915);
>   bool blt_supports_tiling(int i915, enum blt_tiling tiling);
>   const char *blt_tiling_name(enum blt_tiling tiling);
>   
> +uint64_t emit_blt_block_copy(int i915,
> +			     uint64_t ahnd,
> +			     const struct blt_copy_data *blt,
> +			     const struct blt_block_copy_data_ext *ext,
> +			     uint64_t bb_pos,
> +			     bool emit_bbe);
> +
>   int blt_block_copy(int i915,
>   		   const intel_ctx_t *ctx,
>   		   const struct intel_execution_engine2 *e,
> @@ -175,12 +182,24 @@ int blt_block_copy(int i915,
>   		   const struct blt_copy_data *blt,
>   		   const struct blt_block_copy_data_ext *ext);
>   
> +uint64_t emit_blt_ctrl_surf_copy(int i915,
> +				 uint64_t ahnd,
> +				 const struct blt_ctrl_surf_copy_data *surf,
> +				 uint64_t bb_pos,
> +				 bool emit_bbe);
> +
>   int blt_ctrl_surf_copy(int i915,
>   		       const intel_ctx_t *ctx,
>   		       const struct intel_execution_engine2 *e,
>   		       uint64_t ahnd,
>   		       const struct blt_ctrl_surf_copy_data *surf);
>   
> +uint64_t emit_blt_fast_copy(int i915,
> +			    uint64_t ahnd,
> +			    const struct blt_copy_data *blt,
> +			    uint64_t bb_pos,
> +			    bool emit_bbe);
> +
>   int blt_fast_copy(int i915,
>   		  const intel_ctx_t *ctx,
>   		  const struct intel_execution_engine2 *e,


* Re: [igt-dev] [PATCH i-g-t 3/3] tests/gem_ccs: Add block-multicopy subtest
  2022-12-12 12:50 ` [igt-dev] [PATCH i-g-t 3/3] tests/gem_ccs: Add block-multicopy subtest Zbigniew Kempczyński
@ 2022-12-13 15:40   ` Karolina Stolarek
  2022-12-14 19:57     ` Zbigniew Kempczyński
  0 siblings, 1 reply; 20+ messages in thread
From: Karolina Stolarek @ 2022-12-13 15:40 UTC (permalink / raw)
  To: Zbigniew Kempczyński; +Cc: igt-dev

On 12.12.2022 13:50, Zbigniew Kempczyński wrote:
> Exercise sequence of blits packed in single batch. It may reveal
> flushing/decompressing/detiling problems during execution.
> multicopy-inplace version differs from copy-inplace version with
> additional blit to same tiling format during decompression to separate
> problems visible in decompressing/detiling in one step.
> 
> Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
> ---
>   tests/i915/gem_ccs.c | 262 ++++++++++++++++++++++++++++++++++++++++---
>   1 file changed, 246 insertions(+), 16 deletions(-)
> 
> diff --git a/tests/i915/gem_ccs.c b/tests/i915/gem_ccs.c
> index 4ecb3e36ac..6b5f199ec7 100644
> --- a/tests/i915/gem_ccs.c
> +++ b/tests/i915/gem_ccs.c
> @@ -262,6 +262,119 @@ static void surf_copy(int i915,
>   	gem_close(i915, ccs);
>   }
>   
> +struct blt_copy3_data {
> +	int i915;
> +	struct blt_copy_object src;
> +	struct blt_copy_object mid;
> +	struct blt_copy_object dst;
> +	struct blt_copy_object final;
> +	struct blt_copy_batch bb;
> +	enum blt_color_depth color_depth;
> +
> +	/* debug stuff */
> +	bool print_bb;
> +};
> +
> +struct blt_block_copy3_data_ext {
> +	struct blt_block_copy_object_ext src;
> +	struct blt_block_copy_object_ext mid;
> +	struct blt_block_copy_object_ext dst;
> +	struct blt_block_copy_object_ext final;
> +};
> +
> +#define FILL_OBJ(_idx, _handle, _offset, _flags) do { \
> +	obj[(_idx)].handle = (_handle); \
> +	obj[(_idx)].offset = (_offset); \
> +	obj[(_idx)++].flags = EXEC_OBJECT_PINNED | \
> +			      EXEC_OBJECT_SUPPORTS_48B_ADDRESS | (_flags) ; \
> +} while (0)
> +

I'm not sure if we want to have a macro with a hidden side effect. "i++" 
after each statement isn't pretty either, so I'm on the fence with this one.
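
One possible middle ground, sketched below, is a small helper that returns the next free index, so the increment stays visible at the call site. This is only an illustration: `struct exec_obj`, `fill_obj()` and the two flag values are hypothetical stand-ins, not the real `struct drm_i915_gem_exec_object2` or the uAPI constants.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct drm_i915_gem_exec_object2. */
struct exec_obj {
	uint32_t handle;
	uint64_t offset;
	uint64_t flags;
};

/* Placeholder bit values for illustration only; not the real uAPI flags. */
#define OBJ_PINNED		(1ull << 0)
#define OBJ_SUPPORTS_48B	(1ull << 1)

/*
 * Fill obj[idx] and return the next free index, so the increment
 * is explicit at the call site instead of hidden inside a macro.
 */
static inline int fill_obj(struct exec_obj *obj, int idx, uint32_t handle,
			   uint64_t offset, uint64_t flags)
{
	obj[idx].handle = handle;
	obj[idx].offset = offset;
	obj[idx].flags = OBJ_PINNED | OBJ_SUPPORTS_48B | flags;

	return idx + 1;
}
```

A call site would then read `i = fill_obj(obj, i, handle, offset, 0);`, which keeps one line per object while making the index update obvious.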

> +static int blt_block_copy3(int i915,
> +			   const intel_ctx_t *ctx,
> +			   const struct intel_execution_engine2 *e,
> +			   uint64_t ahnd,
> +			   const struct blt_copy3_data *blt3,
> +			   const struct blt_block_copy3_data_ext *ext3)
> +{
> +	struct drm_i915_gem_execbuffer2 execbuf = {};
> +	struct drm_i915_gem_exec_object2 obj[5] = {};
> +	struct blt_copy_data blt0;
> +	struct blt_block_copy_data_ext ext0;
> +	uint64_t src_offset, mid_offset, dst_offset, final_offset, bb_offset, alignment;
> +	uint64_t bb_pos = 0;
> +	uint32_t *bb;
> +	int i, ret;
> +
> +	igt_assert_f(ahnd, "block-copy3 supports softpin only\n");
> +	igt_assert_f(blt3, "block-copy3 requires data to do blit\n");
> +
> +	alignment = gem_detect_safe_alignment(i915);
> +	src_offset = get_offset(ahnd, blt3->src.handle, blt3->src.size, alignment);
> +	mid_offset = get_offset(ahnd, blt3->mid.handle, blt3->mid.size, alignment);
> +	dst_offset = get_offset(ahnd, blt3->dst.handle, blt3->dst.size, alignment);
> +	final_offset = get_offset(ahnd, blt3->final.handle, blt3->final.size, alignment);
> +	bb_offset = get_offset(ahnd, blt3->bb.handle, blt3->bb.size, alignment);
> +
> +	igt_debug("src: %lx, mid: %lx, dst: %lx, final: %lx, bb: %lx, align: %lx\n",
> +		  src_offset, mid_offset, dst_offset, final_offset, bb_offset, alignment);
> +
> +	/* First blit src -> mid */
> +	memset(&blt0, 0, sizeof(blt0));
> +	blt0.src = blt3->src;
> +	blt0.dst = blt3->mid;
> +	blt0.bb = blt3->bb;
> +	blt0.color_depth = blt3->color_depth;
> +	blt0.print_bb = blt3->print_bb;
> +	ext0.src = ext3->src;
> +	ext0.dst = ext3->mid;
> +	bb_pos = emit_blt_block_copy(i915, ahnd, &blt0, &ext0, bb_pos, false);
> +
> +	/* Second blit mid -> dst */
> +	memset(&blt0, 0, sizeof(blt0));
> +	blt0.src = blt3->mid;
> +	blt0.dst = blt3->dst;
> +	blt0.bb = blt3->bb;
> +	blt0.color_depth = blt3->color_depth;
> +	blt0.print_bb = blt3->print_bb;
> +	ext0.src = ext3->mid;
> +	ext0.dst = ext3->dst;
> +	bb_pos = emit_blt_block_copy(i915, ahnd, &blt0, &ext0, bb_pos, false);
> +
> +	/* Third blit dst -> final */
> +	memset(&blt0, 0, sizeof(blt0));
> +	blt0.src = blt3->dst;
> +	blt0.dst = blt3->final;
> +	blt0.bb = blt3->bb;
> +	blt0.color_depth = blt3->color_depth;
> +	blt0.print_bb = blt3->print_bb;
> +	ext0.src = ext3->dst;
> +	ext0.dst = ext3->final;
> +	bb_pos = emit_blt_block_copy(i915, ahnd, &blt0, &ext0, bb_pos, true);
> +
> +	i = 0;
> +	FILL_OBJ(i, blt3->src.handle, CANONICAL(src_offset), 0);
> +	FILL_OBJ(i, blt3->mid.handle, CANONICAL(mid_offset), EXEC_OBJECT_WRITE);
> +	if (mid_offset != dst_offset)
> +		FILL_OBJ(i, blt3->dst.handle, CANONICAL(dst_offset), EXEC_OBJECT_WRITE);
> +	FILL_OBJ(i, blt3->final.handle, CANONICAL(final_offset), 0);
> +	FILL_OBJ(i, blt3->bb.handle, CANONICAL(bb_offset), 0);
> +
> +	execbuf.buffer_count = i;
> +
> +	for (int __i = 0; __i < i; __i++)
> +		igt_debug("obj[%d].offset: %llx, handle: %u\n",
> +			  __i, (long long) obj[__i].offset, obj[__i].handle);
> +	execbuf.buffers_ptr = to_user_pointer(obj);
> +	execbuf.rsvd1 = ctx ? ctx->id : 0;
> +	execbuf.flags = e ? e->flags : I915_EXEC_BLT;
> +	ret = __gem_execbuf(i915, &execbuf);
> +
> +	gem_sync(i915, blt3->bb.handle);
> +	munmap(bb, blt3->bb.size);
> +
> +	return ret;
> +}
> +
>   static void block_copy(int i915,
>   		       const intel_ctx_t *ctx,
>   		       const struct intel_execution_engine2 *e,
> @@ -380,10 +493,100 @@ static void block_copy(int i915,
>   	igt_assert_f(!result, "source and destination surfaces differs!\n");
>   }
>   
> +static void block_multicopy(int i915,
> +			    const intel_ctx_t *ctx,
> +			    const struct intel_execution_engine2 *e,
> +			    uint32_t region1, uint32_t region2,
> +			    enum blt_tiling mid_tiling,
> +			    const struct test_config *config)
> +{
> +	struct blt_copy3_data blt3 = {};
> +	struct blt_block_copy3_data_ext ext3 = {}, *pext3 = &ext3;
> +	struct blt_copy_object *src, *mid, *dst, *final;
> +	const uint32_t bpp = 32;
> +	uint64_t bb_size = 4096;
> +	uint64_t ahnd = get_reloc_ahnd(i915, ctx->id);
> +	uint32_t run_id = mid_tiling;
> +	uint32_t mid_region = region2, bb;
> +	uint32_t width = param.width, height = param.height;
> +	enum blt_compression mid_compression = config->compression;
> +	int mid_compression_format = param.compression_format;
> +	enum blt_compression_type comp_type = COMPRESSION_TYPE_3D;
> +	uint8_t uc_mocs = intel_get_uc_mocs(i915);
> +	int result;
> +
> +	igt_assert(__gem_create_in_memory_regions(i915, &bb, &bb_size, region1) == 0);
> +
> +	if (!blt_supports_compression(i915))
> +		pext3 = NULL;
> +
> +	src = create_object(i915, region1, width, height, bpp, uc_mocs,
> +			    T_LINEAR, COMPRESSION_DISABLED, comp_type, true);
> +	mid = create_object(i915, mid_region, width, height, bpp, uc_mocs,
> +			    mid_tiling, mid_compression, comp_type, true);
> +	dst = create_object(i915, region1, width, height, bpp, uc_mocs,
> +			    mid_tiling, COMPRESSION_DISABLED, comp_type, true);
> +	final = create_object(i915, region1, width, height, bpp, uc_mocs,
> +			    T_LINEAR, COMPRESSION_DISABLED, comp_type, true);
> +	igt_assert(src->size == dst->size);
> +	PRINT_SURFACE_INFO("src", src);
> +	PRINT_SURFACE_INFO("mid", mid);
> +	PRINT_SURFACE_INFO("dst", dst);
> +	PRINT_SURFACE_INFO("final", final);
> +
> +	blt_surface_fill_rect(i915, src, width, height);
> +
> +	memset(&blt3, 0, sizeof(blt3));
> +	blt3.color_depth = CD_32bit;
> +	blt3.print_bb = param.print_bb;
> +	set_blt_object(&blt3.src, src);
> +	set_blt_object(&blt3.mid, mid);
> +	set_blt_object(&blt3.dst, dst);
> +	set_blt_object(&blt3.final, final);
> +
> +	if (config->inplace) {
> +		set_object(&blt3.dst, mid->handle, dst->size, mid->region, mid->mocs,
> +			   mid_tiling, COMPRESSION_DISABLED, comp_type);
> +		blt3.dst.ptr = mid->ptr;
> +	}
> +
> +	set_object_ext(&ext3.src, 0, width, height, SURFACE_TYPE_2D);
> +	set_object_ext(&ext3.mid, mid_compression_format, width, height, SURFACE_TYPE_2D);
> +	set_object_ext(&ext3.dst, 0, width, height, SURFACE_TYPE_2D);
> +	set_object_ext(&ext3.final, 0, width, height, SURFACE_TYPE_2D);
> +	set_batch(&blt3.bb, bb, bb_size, region1);
> +
> +	blt_block_copy3(i915, ctx, e, ahnd, &blt3, pext3);
> +	gem_sync(i915, blt3.final.handle);
> +
> +	WRITE_PNG(i915, run_id, "src", &blt3.src, width, height);
> +	if (!config->inplace)
> +		WRITE_PNG(i915, run_id, "mid", &blt3.mid, width, height);
> +	WRITE_PNG(i915, run_id, "dst", &blt3.dst, width, height);
> +	WRITE_PNG(i915, run_id, "final", &blt3.final, width, height);
> +
> +	result = memcmp(src->ptr, blt3.final.ptr, src->size);
> +
> +	destroy_object(i915, src);
> +	destroy_object(i915, mid);
> +	destroy_object(i915, dst);
> +	destroy_object(i915, final);
> +	gem_close(i915, bb);
> +	put_ahnd(ahnd);
> +
> +	igt_assert_f(!result, "source and destination surfaces differs!\n");
> +}
> +
> +enum copy_func {
> +	BLOCK_COPY,
> +	BLOCK_MULTICOPY,
> +};
> +

I was wondering whether we need a simple case here (i.e. one emit, src->dst)

>   static void block_copy_test(int i915,
>   			    const struct test_config *config,
>   			    const intel_ctx_t *ctx,
> -			    struct igt_collection *set)
> +			    struct igt_collection *set,
> +			    enum copy_func copy_function)
>   {
>   	struct igt_collection *regions;
>   	const struct intel_execution_engine2 *e;
> @@ -415,15 +618,27 @@ static void block_copy_test(int i915,
>   					continue;
>   
>   				regtxt = memregion_dynamic_subtest_name(regions);
> -				igt_dynamic_f("%s-%s-compfmt%d-%s",
> -					      blt_tiling_name(tiling),
> -					      config->compression ?
> -						      "compressed" : "uncompressed",
> -					      param.compression_format, regtxt) {
> -					block_copy(i915, ctx, e,
> -						   region1, region2,
> -						   tiling, config);
> -				}
> +
> +				if (copy_function == BLOCK_COPY)
> +					igt_dynamic_f("%s-%s-compfmt%d-%s",
> +						      blt_tiling_name(tiling),
> +						      config->compression ?
> +							      "compressed" : "uncompressed",
> +						      param.compression_format, regtxt) {
> +						block_copy(i915, ctx, e,
> +							   region1, region2,
> +							   tiling, config);
> +					}
> +				else if (copy_function == BLOCK_MULTICOPY)
> +					igt_dynamic_f("%s-%s-compfmt%d-%s-multicopy",
> +						      blt_tiling_name(tiling),
> +						      config->compression ?
> +							      "compressed" : "uncompressed",
> +						      param.compression_format, regtxt) {
> +						block_multicopy(i915, ctx, e,
> +								region1, region2,
> +								tiling, config);
> +					}

We could avoid repetition to some extent if we had introduced a macro 
that accepts a function pointer and/or some config data. But I don't 
feel strongly about it, just putting it out there.
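
As a rough sketch of that idea, using a lookup table instead of a macro: the copy routine and the subtest-name suffix can be selected from the `enum copy_func` value, so the dynamic subtest body is written once. The `copy_fn` signature, the stub functions and `get_copy_variant()` are hypothetical simplifications; the real helpers take more arguments.

```c
#include <stddef.h>

enum copy_func {
	BLOCK_COPY,
	BLOCK_MULTICOPY,
};

/* Simplified, hypothetical signature; the real helpers take more arguments. */
typedef int (*copy_fn)(int tiling);

struct copy_variant {
	copy_fn fn;
	const char *suffix;	/* appended to the dynamic subtest name */
};

/* Stubs standing in for block_copy() and block_multicopy(). */
static int block_copy_stub(int tiling) { return tiling; }
static int block_multicopy_stub(int tiling) { return tiling + 100; }

static const struct copy_variant copy_variants[] = {
	[BLOCK_COPY]      = { block_copy_stub, "" },
	[BLOCK_MULTICOPY] = { block_multicopy_stub, "-multicopy" },
};

/* Return the variant for @f, or NULL if @f is out of range. */
static const struct copy_variant *get_copy_variant(enum copy_func f)
{
	if ((size_t)f >= sizeof(copy_variants) / sizeof(copy_variants[0]))
		return NULL;

	return &copy_variants[f];
}
```

With such a table, the two nearly identical `igt_dynamic_f()` branches could collapse into one that formats the name with the variant's suffix and calls the variant's function.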

Many thanks,
Karolina

>   				free(regtxt);
>   			}
>   		}
> @@ -506,14 +721,21 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
>   	igt_subtest_with_dynamic("block-copy-uncompressed") {
>   		struct test_config config = {};
>   
> -		block_copy_test(i915, &config, ctx, set);
> +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
>   	}
>   
>   	igt_describe("Check block-copy flatccs compressed blit");
>   	igt_subtest_with_dynamic("block-copy-compressed") {
>   		struct test_config config = { .compression = true };
>   
> -		block_copy_test(i915, &config, ctx, set);
> +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
> +	}
> +
> +	igt_describe("Check block-multicopy flatccs compressed blit");
> +	igt_subtest_with_dynamic("block-multicopy-compressed") {
> +		struct test_config config = { .compression = true };
> +
> +		block_copy_test(i915, &config, ctx, set, BLOCK_MULTICOPY);
>   	}
>   
>   	igt_describe("Check block-copy flatccs inplace decompression blit");
> @@ -521,7 +743,15 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
>   		struct test_config config = { .compression = true,
>   					      .inplace = true };
>   
> -		block_copy_test(i915, &config, ctx, set);
> +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
> +	}
> +
> +	igt_describe("Check block-multicopy flatccs inplace decompression blit");
> +	igt_subtest_with_dynamic("block-multicopy-inplace") {
> +		struct test_config config = { .compression = true,
> +					      .inplace = true };
> +
> +		block_copy_test(i915, &config, ctx, set, BLOCK_MULTICOPY);
>   	}
>   
>   	igt_describe("Check flatccs data can be copied from/to surface");
> @@ -529,7 +759,7 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
>   		struct test_config config = { .compression = true,
>   					      .surfcopy = true };
>   
> -		block_copy_test(i915, &config, ctx, set);
> +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
>   	}
>   
>   	igt_describe("Check flatccs data are physically tagged and visible"
> @@ -539,7 +769,7 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
>   					      .surfcopy = true,
>   					      .new_ctx = true };
>   
> -		block_copy_test(i915, &config, ctx, set);
> +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
>   	}
>   
>   	igt_describe("Check flatccs data persists after suspend / resume (S0)");
> @@ -548,7 +778,7 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
>   					      .surfcopy = true,
>   					      .suspend_resume = true };
>   
> -		block_copy_test(i915, &config, ctx, set);
> +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
>   	}
>   
>   	igt_fixture {


* Re: [igt-dev] [PATCH i-g-t 1/3] lib/i915_blt: Remove src == dst pitch restriction
  2022-12-13 15:37   ` Karolina Stolarek
@ 2022-12-14 18:57     ` Zbigniew Kempczyński
  2022-12-15  8:41       ` Karolina Stolarek
  0 siblings, 1 reply; 20+ messages in thread
From: Zbigniew Kempczyński @ 2022-12-14 18:57 UTC (permalink / raw)
  To: Karolina Stolarek; +Cc: igt-dev

On Tue, Dec 13, 2022 at 04:37:26PM +0100, Karolina Stolarek wrote:
> On 12.12.2022 13:50, Zbigniew Kempczyński wrote:
> > During debugging phase we established there's not necessary to enforce
> > same pitch for destination surface in FULL_RESOLVE mode. Relax this
> > condition allowing caller to pass requirement surface configuration.
> > 
> > Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
> > ---
> >   lib/i915/i915_blt.c | 8 +++-----
> >   1 file changed, 3 insertions(+), 5 deletions(-)
> > 
> > diff --git a/lib/i915/i915_blt.c b/lib/i915/i915_blt.c
> > index 3776c56c60..42c28623f9 100644
> > --- a/lib/i915/i915_blt.c
> > +++ b/lib/i915/i915_blt.c
> > @@ -317,13 +317,11 @@ static void fill_data(struct gen12_block_copy_data *data,
> >   	data->dw00.special_mode = __special_mode(blt);
> >   	data->dw00.length = extended_command ? 20 : 10;
> > -	if (__special_mode(blt) == SM_FULL_RESOLVE && blt->src.tiling == T_TILE64) {
> > -		data->dw01.dst_pitch = blt->src.pitch - 1;
> > +	if (__special_mode(blt) == SM_FULL_RESOLVE)
> 
> "blt->src.tiling == T_TILE64" part was introduced in commit cdc0258a4672
> ("lib/i915_blt: Narrow setting same pitch and aux-ccs to Tile64 only").
> I know it's a bit nit-picky, but could we revert that change (as it looks
> like it's not necessary), and have a separate commit that removes
> data->dw01.dst_pitch = blt->src.pitch - 1?

That would need two reverts, cdc0258a467 and 229bb0accbb7, so a single commit
instead of three looks better to me. Tiling inplace testing is still in progress.
If you think it's better to revert the above and add another commit, I'll do it.

> 
> The question remains -- what actually caused the regression for Tile4 and
> xmajor? Was it because of the pitch settings? I run the tests a couple of
> times in the row after applying this patch and everything worked as
> expected.

Yes, the dst pitch influences the destination surface state. For a cascaded
src->mid->dst copy, mid.pitch may differ from dst.pitch (tileX -> linear blit).
For inplace there's some side effect which we're debugging now.
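
To illustrate the cascaded case, here is a minimal sketch in which each blit programs its destination pitch from its own destination surface. `struct surf`, `struct blit_dst` and `fill_dst_pitch()` are hypothetical reductions of the real structures; the `pitch - 1` encoding follows the patch under review.

```c
#include <stdint.h>

/* Hypothetical, reduced surface description; only the pitch matters here. */
struct surf {
	uint32_t pitch;	/* bytes per row */
};

/* Reduced stand-in for the destination dword of the blit command. */
struct blit_dst {
	uint32_t dst_pitch;
};

/*
 * Program the destination pitch from the destination surface itself.
 * The field is written as pitch - 1, as in the patch under review.
 */
static void fill_dst_pitch(struct blit_dst *cmd, const struct surf *dst)
{
	cmd->dst_pitch = dst->pitch - 1;
}
```

For a src->mid blit followed by a mid->dst blit, the helper would be called once with mid and once with dst, so two surfaces with different pitches produce two different field values.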

--
Zbigniew 

> 
> All the best,
> Karolina
> 
> >   		data->dw01.dst_aux_mode = __aux_mode(&blt->src);
> > -	} else {
> > -		data->dw01.dst_pitch = blt->dst.pitch - 1;
> > +	else
> >   		data->dw01.dst_aux_mode = __aux_mode(&blt->dst);
> > -	}
> > +	data->dw01.dst_pitch = blt->dst.pitch - 1;
> >   	data->dw01.dst_mocs = blt->dst.mocs;
> >   	data->dw01.dst_compression = blt->dst.compression;


* Re: [igt-dev] [PATCH i-g-t 2/3] lib/i915_blt: Extract blit emit functions
  2022-12-13 15:39   ` Karolina Stolarek
@ 2022-12-14 19:40     ` Zbigniew Kempczyński
  2022-12-15  8:42       ` Karolina Stolarek
  0 siblings, 1 reply; 20+ messages in thread
From: Zbigniew Kempczyński @ 2022-12-14 19:40 UTC (permalink / raw)
  To: Karolina Stolarek; +Cc: igt-dev

On Tue, Dec 13, 2022 at 04:39:14PM +0100, Karolina Stolarek wrote:
> On 12.12.2022 13:50, Zbigniew Kempczyński wrote:
> > Add some flexibility in building user pipelines extracting blitter
> > emission code to dedicated functions. Previous blitter functions which
> > do one blit-and-execute are rewritten to use those functions.
> > Requires usage with stateful allocator (offset might be acquired more
> > than one, so it must not change).
> > 
> > Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
> > ---
> >   lib/i915/i915_blt.c | 263 ++++++++++++++++++++++++++++++++------------
> >   lib/i915/i915_blt.h |  19 ++++
> >   2 files changed, 213 insertions(+), 69 deletions(-)
> > 
> > diff --git a/lib/i915/i915_blt.c b/lib/i915/i915_blt.c
> > index 42c28623f9..32ad608775 100644
> > --- a/lib/i915/i915_blt.c
> > +++ b/lib/i915/i915_blt.c
> > @@ -503,58 +503,61 @@ static void dump_bb_ext(struct gen12_block_copy_data_ext *data)
> >   }
> >   /**
> > - * blt_block_copy:
> > + * emit_blt_block_copy:
> >    * @i915: drm fd
> > - * @ctx: intel_ctx_t context
> > - * @e: blitter engine for @ctx
> >    * @ahnd: allocator handle
> >    * @blt: basic blitter data (for TGL/DG1 which doesn't support ext version)
> >    * @ext: extended blitter data (for DG2+, supports flatccs compression)
> > + * @bb_pos: position at which insert block copy commands
> > + * @emit_bbe: emit MI_BATCH_BUFFER_END after block-copy or not
> >    *
> > - * Function does blit between @src and @dst described in @blt object.
> > + * Function inserts block-copy blit into batch at @bb_pos. Allows concatenating
> > + * with other commands to achieve pipelining.
> >    *
> >    * Returns:
> > - * execbuffer status.
> > + * Next write position in batch.
> >    */
> > -int blt_block_copy(int i915,
> > -		   const intel_ctx_t *ctx,
> > -		   const struct intel_execution_engine2 *e,
> > -		   uint64_t ahnd,
> > -		   const struct blt_copy_data *blt,
> > -		   const struct blt_block_copy_data_ext *ext)
> > +uint64_t emit_blt_block_copy(int i915,
> > +			     uint64_t ahnd,
> > +			     const struct blt_copy_data *blt,
> > +			     const struct blt_block_copy_data_ext *ext,
> > +			     uint64_t bb_pos,
> > +			     bool emit_bbe)
> >   {
> > -	struct drm_i915_gem_execbuffer2 execbuf = {};
> > -	struct drm_i915_gem_exec_object2 obj[3] = {};
> >   	struct gen12_block_copy_data data = {};
> >   	struct gen12_block_copy_data_ext dext = {};
> > -	uint64_t dst_offset, src_offset, bb_offset, alignment;
> > -	uint32_t *bb;
> > -	int i, ret;
> > +	uint64_t dst_offset, src_offset, bb_offset;
> > +	uint32_t bbe = MI_BATCH_BUFFER_END;
> > +	uint8_t *bb;
> >   	igt_assert_f(ahnd, "block-copy supports softpin only\n");
> >   	igt_assert_f(blt, "block-copy requires data to do blit\n");
> > -	alignment = gem_detect_safe_alignment(i915);
> > -	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, alignment);
> > -	if (__special_mode(blt) == SM_FULL_RESOLVE)
> > -		dst_offset = src_offset;
> > -	else
> > -		dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, alignment);
> > -	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, alignment);
> > +	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, 0);
> > +	dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, 0);
> > +	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, 0);
> 
> Why do we pass 0 as object alignment here? In surf-copy and fast-copy we
> pass "alignment" in.

After rethinking, you're right: the caller may open the allocator
with an unacceptable alignment (like 4K where we use smem and lmem regions),
so enforcing safe alignment will protect us from ENOSPC.

I'll send this in v2 once you've commented on my reply to the previous patch.
Anyway, thanks for pointing this out.
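
The alignment point can be illustrated with plain arithmetic. The sketch below assumes power-of-two alignments; `align_up()` and `effective_alignment()` are hypothetical helpers, not part of the IGT allocator API.

```c
#include <stdint.h>

/* Round @offset up to the next multiple of a power-of-two @alignment. */
static uint64_t align_up(uint64_t offset, uint64_t alignment)
{
	return (offset + alignment - 1) & ~(alignment - 1);
}

/*
 * Pick the stricter of the allocator's default alignment and the
 * alignment considered safe for the device (both powers of two).
 */
static uint64_t effective_alignment(uint64_t allocator_default,
				    uint64_t safe_alignment)
{
	return allocator_default > safe_alignment ?
	       allocator_default : safe_alignment;
}
```

For example, with a 4K allocator default and an illustrative 64K safe alignment, an object that would otherwise land at 0x11000 gets placed at 0x20000 instead.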

> 
> >   	fill_data(&data, blt, src_offset, dst_offset, ext);
> > -	i = sizeof(data) / sizeof(uint32_t);
> >   	bb = gem_mmap__device_coherent(i915, blt->bb.handle, 0, blt->bb.size,
> >   				       PROT_READ | PROT_WRITE);
> > -	memcpy(bb, &data, sizeof(data));
> > +
> > +	igt_assert(bb_pos + sizeof(data) < blt->bb.size);
> 
> I'd say we need extra space for potential MI_BATCH_BUFFER_END here

I don't assume how many steps (memcpy's to bb) this blit will take,
so I check incrementally whether there's enough space in bb. See all the
igt_assert()s below: one for dext and a second for bbe (note that
bbe might not be emitted, but it is the caller's responsibility to handle
the case where the emitted instructions consumed all bb dwords).
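
The incremental check can be sketched as a single helper that every emit step goes through. `bb_emit()` is a hypothetical name, and plain `assert()` stands in for `igt_assert()`; the batch is addressed in bytes, as in emit_blt_block_copy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy one chunk into the batch at byte offset @pos, asserting that it
 * fits, and return the next write position.
 */
static uint64_t bb_emit(uint8_t *bb, uint64_t bb_size, uint64_t pos,
			const void *chunk, size_t len)
{
	/* Strict '<' mirrors the checks in the patch under review. */
	assert(pos + len < bb_size);
	memcpy(bb + pos, chunk, len);

	return pos + len;
}
```

Each step (command data, optional extended data, optional batch-buffer end) would call the helper with the position returned by the previous step, so the space check never needs to know how many steps follow.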

> 
> > +	memcpy(bb + bb_pos, &data, sizeof(data));
> > +	bb_pos += sizeof(data);
> >   	if (ext) {
> >   		fill_data_ext(&dext, ext);
> > -		memcpy(bb + i, &dext, sizeof(dext));
> > -		i += sizeof(dext) / sizeof(uint32_t);
> > +		igt_assert(bb_pos + sizeof(dext) < blt->bb.size);
> > +		memcpy(bb + bb_pos, &dext, sizeof(dext));
> > +		bb_pos += sizeof(dext);
> > +	}
> > +
> > +	if (emit_bbe) {
> > +		igt_assert(bb_pos + sizeof(uint32_t) < blt->bb.size);
> > +		memcpy(bb + bb_pos, &bbe, sizeof(bbe));
> > +		bb_pos += sizeof(uint32_t);
> >   	}
> > -	bb[i++] = MI_BATCH_BUFFER_END;
> >   	if (blt->print_bb) {
> >   		igt_info("[BLOCK COPY]\n");
> > @@ -569,6 +572,44 @@ int blt_block_copy(int i915,
> >   	munmap(bb, blt->bb.size);
> > +	return bb_pos;
> > +}
> > +
> > +/**
> > + * blt_block_copy:
> > + * @i915: drm fd
> > + * @ctx: intel_ctx_t context
> > + * @e: blitter engine for @ctx
> > + * @ahnd: allocator handle
> > + * @blt: basic blitter data (for TGL/DG1 which doesn't support ext version)
> > + * @ext: extended blitter data (for DG2+, supports flatccs compression)
> > + *
> > + * Function does blit between @src and @dst described in @blt object.
> > + *
> > + * Returns:
> > + * execbuffer status.
> > + */
> > +int blt_block_copy(int i915,
> > +		   const intel_ctx_t *ctx,
> > +		   const struct intel_execution_engine2 *e,
> > +		   uint64_t ahnd,
> > +		   const struct blt_copy_data *blt,
> > +		   const struct blt_block_copy_data_ext *ext)
> > +{
> > +	struct drm_i915_gem_execbuffer2 execbuf = {};
> > +	struct drm_i915_gem_exec_object2 obj[3] = {};
> > +	uint64_t dst_offset, src_offset, bb_offset;
> > +	int ret;
> > +
> > +	igt_assert_f(ahnd, "block-copy supports softpin only\n");
> > +	igt_assert_f(blt, "block-copy requires data to do blit\n");
> > +
> > +	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, 0);
> > +	dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, 0);
> > +	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, 0);
> > +
> > +	emit_blt_block_copy(i915, ahnd, blt, ext, 0, true);
> > +
> >   	obj[0].offset = CANONICAL(dst_offset);
> >   	obj[1].offset = CANONICAL(src_offset);
> >   	obj[2].offset = CANONICAL(bb_offset);
> > @@ -655,31 +696,30 @@ static void dump_bb_surf_ctrl_cmd(const struct gen12_ctrl_surf_copy_data *data)
> >   }
> >   /**
> > - * blt_ctrl_surf_copy:
> > + * emit_blt_ctrl_surf_copy:
> >    * @i915: drm fd
> > - * @ctx: intel_ctx_t context
> > - * @e: blitter engine for @ctx
> >    * @ahnd: allocator handle
> >    * @surf: blitter data for ctrl-surf-copy
> > + * @bb_pos: position at which insert block copy commands
> > + * @emit_bbe: emit MI_BATCH_BUFFER_END after ctrl-surf-copy or not
> >    *
> > - * Function does ctrl-surf-copy blit between @src and @dst described in
> > - * @blt object.
> > + * Function emits ctrl-surf-copy blit between @src and @dst described in
> > + * @blt object at @bb_pos. Allows concatenating with other commands to
> > + * achieve pipelining.
> >    *
> >    * Returns:
> > - * execbuffer status.
> > + * Next write position in batch.
> >    */
> > -int blt_ctrl_surf_copy(int i915,
> > -		       const intel_ctx_t *ctx,
> > -		       const struct intel_execution_engine2 *e,
> > -		       uint64_t ahnd,
> > -		       const struct blt_ctrl_surf_copy_data *surf)
> > +uint64_t emit_blt_ctrl_surf_copy(int i915,
> > +				 uint64_t ahnd,
> > +				 const struct blt_ctrl_surf_copy_data *surf,
> > +				 uint64_t bb_pos,
> > +				 bool emit_bbe)
> >   {
> > -	struct drm_i915_gem_execbuffer2 execbuf = {};
> > -	struct drm_i915_gem_exec_object2 obj[3] = {};
> >   	struct gen12_ctrl_surf_copy_data data = {};
> >   	uint64_t dst_offset, src_offset, bb_offset, alignment;
> > +	uint32_t bbe = MI_BATCH_BUFFER_END;
> >   	uint32_t *bb;
> > -	int i;
> >   	igt_assert_f(ahnd, "ctrl-surf-copy supports softpin only\n");
> >   	igt_assert_f(surf, "ctrl-surf-copy requires data to do ctrl-surf-copy blit\n");
> > @@ -695,12 +735,9 @@ int blt_ctrl_surf_copy(int i915,
> >   	data.dw00.size_of_ctrl_copy = __ccs_size(surf) / CCS_RATIO - 1;
> >   	data.dw00.length = 0x3;
> > -	src_offset = get_offset(ahnd, surf->src.handle, surf->src.size,
> > -				alignment);
> > -	dst_offset = get_offset(ahnd, surf->dst.handle, surf->dst.size,
> > -				alignment);
> > -	bb_offset = get_offset(ahnd, surf->bb.handle, surf->bb.size,
> > -			       alignment);
> > +	src_offset = get_offset(ahnd, surf->src.handle, surf->src.size, alignment);
> > +	dst_offset = get_offset(ahnd, surf->dst.handle, surf->dst.size, alignment);
> > +	bb_offset = get_offset(ahnd, surf->bb.handle, surf->bb.size, alignment);
> >   	data.dw01.src_address_lo = src_offset;
> >   	data.dw02.src_address_hi = src_offset >> 32;
> > @@ -710,11 +747,18 @@ int blt_ctrl_surf_copy(int i915,
> >   	data.dw04.dst_address_hi = dst_offset >> 32;
> >   	data.dw04.dst_mocs = surf->dst.mocs;
> > -	i = sizeof(data) / sizeof(uint32_t);
> >   	bb = gem_mmap__device_coherent(i915, surf->bb.handle, 0, surf->bb.size,
> >   				       PROT_READ | PROT_WRITE);
> > -	memcpy(bb, &data, sizeof(data));
> > -	bb[i++] = MI_BATCH_BUFFER_END;
> > +
> > +	igt_assert(bb_pos + sizeof(data) < surf->bb.size);
> > +	memcpy(bb + bb_pos, &data, sizeof(data));
> > +	bb_pos += sizeof(data);
> > +
> > +	if (emit_bbe) {
> > +		igt_assert(bb_pos + sizeof(uint32_t) < surf->bb.size);
> > +		memcpy(bb + bb_pos, &bbe, sizeof(bbe));
> > +		bb_pos += sizeof(uint32_t);
> > +	}
> >   	if (surf->print_bb) {
> >   		igt_info("BB [CTRL SURF]:\n");
> > @@ -724,8 +768,46 @@ int blt_ctrl_surf_copy(int i915,
> >   		dump_bb_surf_ctrl_cmd(&data);
> >   	}
> > +
> >   	munmap(bb, surf->bb.size);
> > +	return bb_pos;
> > +}
> > +
> > +/**
> > + * blt_ctrl_surf_copy:
> > + * @i915: drm fd
> > + * @ctx: intel_ctx_t context
> > + * @e: blitter engine for @ctx
> > + * @ahnd: allocator handle
> > + * @surf: blitter data for ctrl-surf-copy
> > + *
> > + * Function does ctrl-surf-copy blit between @src and @dst described in
> > + * @blt object.
> > + *
> > + * Returns:
> > + * execbuffer status.
> > + */
> > +int blt_ctrl_surf_copy(int i915,
> > +		       const intel_ctx_t *ctx,
> > +		       const struct intel_execution_engine2 *e,
> > +		       uint64_t ahnd,
> > +		       const struct blt_ctrl_surf_copy_data *surf)
> > +{
> > +	struct drm_i915_gem_execbuffer2 execbuf = {};
> > +	struct drm_i915_gem_exec_object2 obj[3] = {};
> > +	uint64_t dst_offset, src_offset, bb_offset, alignment;
> > +
> > +	igt_assert_f(ahnd, "ctrl-surf-copy supports softpin only\n");
> > +	igt_assert_f(surf, "ctrl-surf-copy requires data to do ctrl-surf-copy blit\n");
> > +
> > +	alignment = max_t(uint64_t, gem_detect_safe_alignment(i915), 1ull << 16);
> > +	src_offset = get_offset(ahnd, surf->src.handle, surf->src.size, alignment);
> > +	dst_offset = get_offset(ahnd, surf->dst.handle, surf->dst.size, alignment);
> > +	bb_offset = get_offset(ahnd, surf->bb.handle, surf->bb.size, alignment);
> > +
> > +	emit_blt_ctrl_surf_copy(i915, ahnd, surf, 0, true);
> > +
> >   	obj[0].offset = CANONICAL(dst_offset);
> >   	obj[1].offset = CANONICAL(src_offset);
> >   	obj[2].offset = CANONICAL(bb_offset);
> > @@ -869,31 +951,31 @@ static void dump_bb_fast_cmd(struct gen12_fast_copy_data *data)
> >   }
> >   /**
> > - * blt_fast_copy:
> > + * emit_blt_fast_copy:
> >    * @i915: drm fd
> > - * @ctx: intel_ctx_t context
> > - * @e: blitter engine for @ctx
> >    * @ahnd: allocator handle
> >    * @blt: blitter data for fast-copy (same as for block-copy but doesn't use
> >    * compression fields).
> > + * @bb_pos: position at which insert block copy commands
> > + * @emit_bbe: emit MI_BATCH_BUFFER_END after fast-copy or not
> >    *
> > - * Function does fast blit between @src and @dst described in @blt object.
> > + * Function emits fast-copy blit between @src and @dst described in @blt object
> > + * at @bb_pos. Allows concatenating with other commands to
> > + * achieve pipelining.
> >    *
> >    * Returns:
> > - * execbuffer status.
> > + * Next write position in batch.
> >    */
> > -int blt_fast_copy(int i915,
> > -		  const intel_ctx_t *ctx,
> > -		  const struct intel_execution_engine2 *e,
> > -		  uint64_t ahnd,
> > -		  const struct blt_copy_data *blt)
> > +uint64_t emit_blt_fast_copy(int i915,
> > +			    uint64_t ahnd,
> > +			    const struct blt_copy_data *blt,
> > +			    uint64_t bb_pos,
> > +			    bool emit_bbe)
> >   {
> > -	struct drm_i915_gem_execbuffer2 execbuf = {};
> > -	struct drm_i915_gem_exec_object2 obj[3] = {};
> >   	struct gen12_fast_copy_data data = {};
> >   	uint64_t dst_offset, src_offset, bb_offset, alignment;
> > +	uint32_t bbe = MI_BATCH_BUFFER_END;
> >   	uint32_t *bb;
> > -	int i, ret;
> >   	alignment = gem_detect_safe_alignment(i915);
> > @@ -931,22 +1013,65 @@ int blt_fast_copy(int i915,
> >   	data.dw08.src_address_lo = src_offset;
> >   	data.dw09.src_address_hi = src_offset >> 32;
> > -	i = sizeof(data) / sizeof(uint32_t);
> >   	bb = gem_mmap__device_coherent(i915, blt->bb.handle, 0, blt->bb.size,
> >   				       PROT_READ | PROT_WRITE);
> > -	memcpy(bb, &data, sizeof(data));
> > -	bb[i++] = MI_BATCH_BUFFER_END;
> > +	igt_assert(bb_pos + sizeof(data) < blt->bb.size);
> > +	memcpy(bb + bb_pos, &data, sizeof(data));
> > +	bb_pos += sizeof(data);
> > +
> > +	if (emit_bbe) {
> > +		igt_assert(bb_pos + sizeof(uint32_t) < blt->bb.size);
> > +		memcpy(bb + bb_pos, &bbe, sizeof(bbe));
> > +		bb_pos += sizeof(uint32_t);
> > +	}
> >   	if (blt->print_bb) {
> >   		igt_info("BB [FAST COPY]\n");
> > -		igt_info("blit [src offset: %llx, dst offset: %llx\n",
> > -			 (long long) src_offset, (long long) dst_offset);
> > +		igt_info("src offset: %llx, dst offset: %llx, bb offset: %llx\n",
> > +			 (long long) src_offset, (long long) dst_offset,
> > +			 (long long) bb_offset);
> 
> Nitpick: as you're touching these lines anyway, could you delete space after
> cast? They're not needed.

I see the preferred code style is to join (type)var. Strange; imo (type)
var looks more readable, as var is immediately visible (the joined form
disturbs my perception). Anyway, ok, I will change this.

--
Zbigniew


> 
> In general, I'm fine with the changes, but would like to clarify a couple of
> things above before giving r-b.
> 
> All the best,
> Karolina
> 
> >   		dump_bb_fast_cmd(&data);
> >   	}
> >   	munmap(bb, blt->bb.size);
> > +	return bb_pos;
> > +}
> > +
> > +/**
> > + * blt_fast_copy:
> > + * @i915: drm fd
> > + * @ctx: intel_ctx_t context
> > + * @e: blitter engine for @ctx
> > + * @ahnd: allocator handle
> > + * @blt: blitter data for fast-copy (same as for block-copy but doesn't use
> > + * compression fields).
> > + *
> > + * Function does fast blit between @src and @dst described in @blt object.
> > + *
> > + * Returns:
> > + * execbuffer status.
> > + */
> > +int blt_fast_copy(int i915,
> > +		  const intel_ctx_t *ctx,
> > +		  const struct intel_execution_engine2 *e,
> > +		  uint64_t ahnd,
> > +		  const struct blt_copy_data *blt)
> > +{
> > +	struct drm_i915_gem_execbuffer2 execbuf = {};
> > +	struct drm_i915_gem_exec_object2 obj[3] = {};
> > +	uint64_t dst_offset, src_offset, bb_offset, alignment;
> > +	int ret;
> > +
> > +	alignment = gem_detect_safe_alignment(i915);
> > +
> > +	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, alignment);
> > +	dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, alignment);
> > +	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, alignment);
> > +
> > +	emit_blt_fast_copy(i915, ahnd, blt, 0, true);
> > +
> >   	obj[0].offset = CANONICAL(dst_offset);
> >   	obj[1].offset = CANONICAL(src_offset);
> >   	obj[2].offset = CANONICAL(bb_offset);
> > diff --git a/lib/i915/i915_blt.h b/lib/i915/i915_blt.h
> > index e0e8b52bc2..34db9bb962 100644
> > --- a/lib/i915/i915_blt.h
> > +++ b/lib/i915/i915_blt.h
> > @@ -168,6 +168,13 @@ bool blt_supports_compression(int i915);
> >   bool blt_supports_tiling(int i915, enum blt_tiling tiling);
> >   const char *blt_tiling_name(enum blt_tiling tiling);
> > +uint64_t emit_blt_block_copy(int i915,
> > +			     uint64_t ahnd,
> > +			     const struct blt_copy_data *blt,
> > +			     const struct blt_block_copy_data_ext *ext,
> > +			     uint64_t bb_pos,
> > +			     bool emit_bbe);
> > +
> >   int blt_block_copy(int i915,
> >   		   const intel_ctx_t *ctx,
> >   		   const struct intel_execution_engine2 *e,
> > @@ -175,12 +182,24 @@ int blt_block_copy(int i915,
> >   		   const struct blt_copy_data *blt,
> >   		   const struct blt_block_copy_data_ext *ext);
> > +uint64_t emit_blt_ctrl_surf_copy(int i915,
> > +				 uint64_t ahnd,
> > +				 const struct blt_ctrl_surf_copy_data *surf,
> > +				 uint64_t bb_pos,
> > +				 bool emit_bbe);
> > +
> >   int blt_ctrl_surf_copy(int i915,
> >   		       const intel_ctx_t *ctx,
> >   		       const struct intel_execution_engine2 *e,
> >   		       uint64_t ahnd,
> >   		       const struct blt_ctrl_surf_copy_data *surf);
> > +uint64_t emit_blt_fast_copy(int i915,
> > +			    uint64_t ahnd,
> > +			    const struct blt_copy_data *blt,
> > +			    uint64_t bb_pos,
> > +			    bool emit_bbe);
> > +
> >   int blt_fast_copy(int i915,
> >   		  const intel_ctx_t *ctx,
> >   		  const struct intel_execution_engine2 *e,


* Re: [igt-dev] [PATCH i-g-t 3/3] tests/gem_ccs: Add block-multicopy subtest
  2022-12-13 15:40   ` Karolina Stolarek
@ 2022-12-14 19:57     ` Zbigniew Kempczyński
  2022-12-15  8:42       ` Karolina Stolarek
  0 siblings, 1 reply; 20+ messages in thread
From: Zbigniew Kempczyński @ 2022-12-14 19:57 UTC (permalink / raw)
  To: Karolina Stolarek; +Cc: igt-dev

On Tue, Dec 13, 2022 at 04:40:38PM +0100, Karolina Stolarek wrote:
> On 12.12.2022 13:50, Zbigniew Kempczyński wrote:
> > Exercise sequence of blits packed in single batch. It may reveal
> > flushing/decompressing/detiling problems during execution.
> > multicopy-inplace version differs from copy-inplace version with
> > additional blit to same tiling format during decompression to separate
> > problems visible in decompressing/detiling in one step.
> > 
> > Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
> > ---
> >   tests/i915/gem_ccs.c | 262 ++++++++++++++++++++++++++++++++++++++++---
> >   1 file changed, 246 insertions(+), 16 deletions(-)
> > 
> > diff --git a/tests/i915/gem_ccs.c b/tests/i915/gem_ccs.c
> > index 4ecb3e36ac..6b5f199ec7 100644
> > --- a/tests/i915/gem_ccs.c
> > +++ b/tests/i915/gem_ccs.c
> > @@ -262,6 +262,119 @@ static void surf_copy(int i915,
> >   	gem_close(i915, ccs);
> >   }
> > +struct blt_copy3_data {
> > +	int i915;
> > +	struct blt_copy_object src;
> > +	struct blt_copy_object mid;
> > +	struct blt_copy_object dst;
> > +	struct blt_copy_object final;
> > +	struct blt_copy_batch bb;
> > +	enum blt_color_depth color_depth;
> > +
> > +	/* debug stuff */
> > +	bool print_bb;
> > +};
> > +
> > +struct blt_block_copy3_data_ext {
> > +	struct blt_block_copy_object_ext src;
> > +	struct blt_block_copy_object_ext mid;
> > +	struct blt_block_copy_object_ext dst;
> > +	struct blt_block_copy_object_ext final;
> > +};
> > +
> > +#define FILL_OBJ(_idx, _handle, _offset, _flags) do { \
> > +	obj[(_idx)].handle = (_handle); \
> > +	obj[(_idx)].offset = (_offset); \
> > +	obj[(_idx)++].flags = EXEC_OBJECT_PINNED | \
> > +			      EXEC_OBJECT_SUPPORTS_48B_ADDRESS | (_flags) ; \
> > +} while (0)
> > +
> 
> I'm not sure if we want to have a macro with a hidden side effect. "i++"
> after each statement isn't pretty either, so I'm on the fence with this one.

I'm aware this has a side effect, but it makes the main code clearer, as
it is not interlaced with var++ between the lines. I think a reader who
knows how execbuf objs work will notice there's an implicit increment in
the macro.
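
A minimal standalone sketch of the two options discussed here, for
comparison. The names struct fake_obj, FILL_OBJ_IMPLICIT, fill_obj and
demo_fill are hypothetical and not part of IGT; the struct is a reduced
stand-in for drm_i915_gem_exec_object2. The first variant increments the
index inside the macro, as the patch does; the second takes a pointer to
the index, so the mutation shows up at the call site as &i.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for drm_i915_gem_exec_object2 (only fields used here). */
struct fake_obj {
	uint32_t handle;
	uint64_t offset;
	uint64_t flags;
};

/* Variant 1: the macro increments _idx as a side effect, as in the patch. */
#define FILL_OBJ_IMPLICIT(_obj, _idx, _handle, _offset, _flags) do { \
	(_obj)[(_idx)].handle = (_handle); \
	(_obj)[(_idx)].offset = (_offset); \
	(_obj)[(_idx)++].flags = (_flags); \
} while (0)

/* Variant 2: the caller passes &idx, so the increment is visible to the reader. */
static inline void fill_obj(struct fake_obj *obj, int *idx,
			    uint32_t handle, uint64_t offset, uint64_t flags)
{
	obj[*idx].handle = handle;
	obj[*idx].offset = offset;
	obj[*idx].flags = flags;
	(*idx)++;
}

/* Fills one entry with each variant and returns the final index. */
static int demo_fill(struct fake_obj *obj)
{
	int i = 0;

	FILL_OBJ_IMPLICIT(obj, i, 1, 0x10000, 0);
	fill_obj(obj, &i, 2, 0x20000, 1);

	return i;
}
```

Both variants leave the index pointing at the next free slot, so either
can feed execbuf.buffer_count directly; the difference is only in how
obvious the increment is when reading the caller.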

> 
> > +static int blt_block_copy3(int i915,
> > +			   const intel_ctx_t *ctx,
> > +			   const struct intel_execution_engine2 *e,
> > +			   uint64_t ahnd,
> > +			   const struct blt_copy3_data *blt3,
> > +			   const struct blt_block_copy3_data_ext *ext3)
> > +{
> > +	struct drm_i915_gem_execbuffer2 execbuf = {};
> > +	struct drm_i915_gem_exec_object2 obj[5] = {};
> > +	struct blt_copy_data blt0;
> > +	struct blt_block_copy_data_ext ext0;
> > +	uint64_t src_offset, mid_offset, dst_offset, final_offset, bb_offset, alignment;
> > +	uint64_t bb_pos = 0;
> > +	uint32_t *bb;
> > +	int i, ret;
> > +
> > +	igt_assert_f(ahnd, "block-copy3 supports softpin only\n");
> > +	igt_assert_f(blt3, "block-copy3 requires data to do blit\n");
> > +
> > +	alignment = gem_detect_safe_alignment(i915);
> > +	src_offset = get_offset(ahnd, blt3->src.handle, blt3->src.size, alignment);
> > +	mid_offset = get_offset(ahnd, blt3->mid.handle, blt3->mid.size, alignment);
> > +	dst_offset = get_offset(ahnd, blt3->dst.handle, blt3->dst.size, alignment);
> > +	final_offset = get_offset(ahnd, blt3->final.handle, blt3->final.size, alignment);
> > +	bb_offset = get_offset(ahnd, blt3->bb.handle, blt3->bb.size, alignment);
> > +
> > +	igt_debug("src: %lx, mid: %lx, dst: %lx, final: %lx, bb: %lx, align: %lx\n",
> > +		  src_offset, mid_offset, dst_offset, final_offset, bb_offset, alignment);
> > +
> > +	/* First blit src -> mid */
> > +	memset(&blt0, 0, sizeof(blt0));
> > +	blt0.src = blt3->src;
> > +	blt0.dst = blt3->mid;
> > +	blt0.bb = blt3->bb;
> > +	blt0.color_depth = blt3->color_depth;
> > +	blt0.print_bb = blt3->print_bb;
> > +	ext0.src = ext3->src;
> > +	ext0.dst = ext3->mid;
> > +	bb_pos = emit_blt_block_copy(i915, ahnd, &blt0, &ext0, bb_pos, false);
> > +
> > +	/* Second blit mid -> dst */
> > +	memset(&blt0, 0, sizeof(blt0));
> > +	blt0.src = blt3->mid;
> > +	blt0.dst = blt3->dst;
> > +	blt0.bb = blt3->bb;
> > +	blt0.color_depth = blt3->color_depth;
> > +	blt0.print_bb = blt3->print_bb;
> > +	ext0.src = ext3->mid;
> > +	ext0.dst = ext3->dst;
> > +	bb_pos = emit_blt_block_copy(i915, ahnd, &blt0, &ext0, bb_pos, false);
> > +
> > +	/* Third blit dst -> final */
> > +	memset(&blt0, 0, sizeof(blt0));
> > +	blt0.src = blt3->dst;
> > +	blt0.dst = blt3->final;
> > +	blt0.bb = blt3->bb;
> > +	blt0.color_depth = blt3->color_depth;
> > +	blt0.print_bb = blt3->print_bb;
> > +	ext0.src = ext3->dst;
> > +	ext0.dst = ext3->final;
> > +	bb_pos = emit_blt_block_copy(i915, ahnd, &blt0, &ext0, bb_pos, true);
> > +
> > +	i = 0;
> > +	FILL_OBJ(i, blt3->src.handle, CANONICAL(src_offset), 0);
> > +	FILL_OBJ(i, blt3->mid.handle, CANONICAL(mid_offset), EXEC_OBJECT_WRITE);
> > +	if (mid_offset != dst_offset)
> > +		FILL_OBJ(i, blt3->dst.handle, CANONICAL(dst_offset), EXEC_OBJECT_WRITE);
> > +	FILL_OBJ(i, blt3->final.handle, CANONICAL(final_offset), 0);
> > +	FILL_OBJ(i, blt3->bb.handle, CANONICAL(bb_offset), 0);
> > +
> > +	execbuf.buffer_count = i;
> > +
> > +	for (int __i = 0; __i < i; __i++)
> > +		igt_debug("obj[%d].offset: %llx, handle: %u\n",
> > +			  __i, (long long) obj[__i].offset, obj[__i].handle);
> > +	execbuf.buffers_ptr = to_user_pointer(obj);
> > +	execbuf.rsvd1 = ctx ? ctx->id : 0;
> > +	execbuf.flags = e ? e->flags : I915_EXEC_BLT;
> > +	ret = __gem_execbuf(i915, &execbuf);
> > +
> > +	gem_sync(i915, blt3->bb.handle);
> > +	munmap(bb, blt3->bb.size);
> > +
> > +	return ret;
> > +}
> > +
> >   static void block_copy(int i915,
> >   		       const intel_ctx_t *ctx,
> >   		       const struct intel_execution_engine2 *e,
> > @@ -380,10 +493,100 @@ static void block_copy(int i915,
> >   	igt_assert_f(!result, "source and destination surfaces differs!\n");
> >   }
> > +static void block_multicopy(int i915,
> > +			    const intel_ctx_t *ctx,
> > +			    const struct intel_execution_engine2 *e,
> > +			    uint32_t region1, uint32_t region2,
> > +			    enum blt_tiling mid_tiling,
> > +			    const struct test_config *config)
> > +{
> > +	struct blt_copy3_data blt3 = {};
> > +	struct blt_block_copy3_data_ext ext3 = {}, *pext3 = &ext3;
> > +	struct blt_copy_object *src, *mid, *dst, *final;
> > +	const uint32_t bpp = 32;
> > +	uint64_t bb_size = 4096;
> > +	uint64_t ahnd = get_reloc_ahnd(i915, ctx->id);
> > +	uint32_t run_id = mid_tiling;
> > +	uint32_t mid_region = region2, bb;
> > +	uint32_t width = param.width, height = param.height;
> > +	enum blt_compression mid_compression = config->compression;
> > +	int mid_compression_format = param.compression_format;
> > +	enum blt_compression_type comp_type = COMPRESSION_TYPE_3D;
> > +	uint8_t uc_mocs = intel_get_uc_mocs(i915);
> > +	int result;
> > +
> > +	igt_assert(__gem_create_in_memory_regions(i915, &bb, &bb_size, region1) == 0);
> > +
> > +	if (!blt_supports_compression(i915))
> > +		pext3 = NULL;
> > +
> > +	src = create_object(i915, region1, width, height, bpp, uc_mocs,
> > +			    T_LINEAR, COMPRESSION_DISABLED, comp_type, true);
> > +	mid = create_object(i915, mid_region, width, height, bpp, uc_mocs,
> > +			    mid_tiling, mid_compression, comp_type, true);
> > +	dst = create_object(i915, region1, width, height, bpp, uc_mocs,
> > +			    mid_tiling, COMPRESSION_DISABLED, comp_type, true);
> > +	final = create_object(i915, region1, width, height, bpp, uc_mocs,
> > +			    T_LINEAR, COMPRESSION_DISABLED, comp_type, true);
> > +	igt_assert(src->size == dst->size);
> > +	PRINT_SURFACE_INFO("src", src);
> > +	PRINT_SURFACE_INFO("mid", mid);
> > +	PRINT_SURFACE_INFO("dst", dst);
> > +	PRINT_SURFACE_INFO("final", final);
> > +
> > +	blt_surface_fill_rect(i915, src, width, height);
> > +
> > +	memset(&blt3, 0, sizeof(blt3));
> > +	blt3.color_depth = CD_32bit;
> > +	blt3.print_bb = param.print_bb;
> > +	set_blt_object(&blt3.src, src);
> > +	set_blt_object(&blt3.mid, mid);
> > +	set_blt_object(&blt3.dst, dst);
> > +	set_blt_object(&blt3.final, final);
> > +
> > +	if (config->inplace) {
> > +		set_object(&blt3.dst, mid->handle, dst->size, mid->region, mid->mocs,
> > +			   mid_tiling, COMPRESSION_DISABLED, comp_type);
> > +		blt3.dst.ptr = mid->ptr;
> > +	}
> > +
> > +	set_object_ext(&ext3.src, 0, width, height, SURFACE_TYPE_2D);
> > +	set_object_ext(&ext3.mid, mid_compression_format, width, height, SURFACE_TYPE_2D);
> > +	set_object_ext(&ext3.dst, 0, width, height, SURFACE_TYPE_2D);
> > +	set_object_ext(&ext3.final, 0, width, height, SURFACE_TYPE_2D);
> > +	set_batch(&blt3.bb, bb, bb_size, region1);
> > +
> > +	blt_block_copy3(i915, ctx, e, ahnd, &blt3, pext3);
> > +	gem_sync(i915, blt3.final.handle);
> > +
> > +	WRITE_PNG(i915, run_id, "src", &blt3.src, width, height);
> > +	if (!config->inplace)
> > +		WRITE_PNG(i915, run_id, "mid", &blt3.mid, width, height);
> > +	WRITE_PNG(i915, run_id, "dst", &blt3.dst, width, height);
> > +	WRITE_PNG(i915, run_id, "final", &blt3.final, width, height);
> > +
> > +	result = memcmp(src->ptr, blt3.final.ptr, src->size);
> > +
> > +	destroy_object(i915, src);
> > +	destroy_object(i915, mid);
> > +	destroy_object(i915, dst);
> > +	destroy_object(i915, final);
> > +	gem_close(i915, bb);
> > +	put_ahnd(ahnd);
> > +
> > +	igt_assert_f(!result, "source and destination surfaces differs!\n");
> > +}
> > +
> > +enum copy_func {
> > +	BLOCK_COPY,
> > +	BLOCK_MULTICOPY,
> > +};
> > +
> 
> I was thinking if we need a simple case here (i.e. one emit, src->dst)

Which tiling do you want to blit? linear -> linear? linear -> tileN? How
would we verify it works? fmt1 -> fmt2 -> fmt1 is equivalent to
fmt1 -> fmt2 (detile fmt2 -> linear on cpu).

At the moment we don't verify the middle surface; we only assume that
when src == dst the middle blit was fine.
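
The round-trip argument can be sketched on the CPU as below. This is an
illustration only: the 4x4 tile layout and the names tiled_index,
tile_surface, detile_surface and roundtrip_check are made up for the
example and do not correspond to any hardware tiling format or IGT
helper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define W 8
#define H 8
#define TILE 4	/* toy 4x4 tile, not a real hardware layout */

/* Map linear (x, y) to an element index in the toy tiled layout. */
static unsigned int tiled_index(unsigned int x, unsigned int y)
{
	unsigned int tiles_per_row = W / TILE;
	unsigned int tile = (y / TILE) * tiles_per_row + (x / TILE);

	return tile * TILE * TILE + (y % TILE) * TILE + (x % TILE);
}

/* linear -> tiled (plays the role of src -> mid) */
static void tile_surface(const uint32_t *lin, uint32_t *tiled)
{
	for (unsigned int y = 0; y < H; y++)
		for (unsigned int x = 0; x < W; x++)
			tiled[tiled_index(x, y)] = lin[y * W + x];
}

/* tiled -> linear (plays the role of mid -> final) */
static void detile_surface(const uint32_t *tiled, uint32_t *lin)
{
	for (unsigned int y = 0; y < H; y++)
		for (unsigned int x = 0; x < W; x++)
			lin[y * W + x] = tiled[tiled_index(x, y)];
}

/* Returns 0 when src survives linear -> tiled -> linear unchanged. */
static int roundtrip_check(void)
{
	uint32_t src[W * H], mid[W * H], final[W * H];

	for (unsigned int i = 0; i < W * H; i++)
		src[i] = 0xdead0000u + i;

	tile_surface(src, mid);
	detile_surface(mid, final);

	return memcmp(src, final, sizeof(src));
}
```

As in the test, only the endpoints are compared. A matching result shows
the two steps are inverses of each other, not that the middle surface
has the intended layout.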

--
Zbigniew

> 
> >   static void block_copy_test(int i915,
> >   			    const struct test_config *config,
> >   			    const intel_ctx_t *ctx,
> > -			    struct igt_collection *set)
> > +			    struct igt_collection *set,
> > +			    enum copy_func copy_function)
> >   {
> >   	struct igt_collection *regions;
> >   	const struct intel_execution_engine2 *e;
> > @@ -415,15 +618,27 @@ static void block_copy_test(int i915,
> >   					continue;
> >   				regtxt = memregion_dynamic_subtest_name(regions);
> > -				igt_dynamic_f("%s-%s-compfmt%d-%s",
> > -					      blt_tiling_name(tiling),
> > -					      config->compression ?
> > -						      "compressed" : "uncompressed",
> > -					      param.compression_format, regtxt) {
> > -					block_copy(i915, ctx, e,
> > -						   region1, region2,
> > -						   tiling, config);
> > -				}
> > +
> > +				if (copy_function == BLOCK_COPY)
> > +					igt_dynamic_f("%s-%s-compfmt%d-%s",
> > +						      blt_tiling_name(tiling),
> > +						      config->compression ?
> > +							      "compressed" : "uncompressed",
> > +						      param.compression_format, regtxt) {
> > +						block_copy(i915, ctx, e,
> > +							   region1, region2,
> > +							   tiling, config);
> > +					}
> > +				else if (copy_function == BLOCK_MULTICOPY)
> > +					igt_dynamic_f("%s-%s-compfmt%d-%s-multicopy",
> > +						      blt_tiling_name(tiling),
> > +						      config->compression ?
> > +							      "compressed" : "uncompressed",
> > +						      param.compression_format, regtxt) {
> > +						block_multicopy(i915, ctx, e,
> > +								region1, region2,
> > +								tiling, config);
> > +					}
> 
> We could avoid repetition to some extent if we had introduced a macro that
> accepts a function pointer and/or some config data. But I don't feel
> strongly about it, just putting it out there.
> 
> Many thanks,
> Karolina
> 
> >   				free(regtxt);
> >   			}
> >   		}
> > @@ -506,14 +721,21 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
> >   	igt_subtest_with_dynamic("block-copy-uncompressed") {
> >   		struct test_config config = {};
> > -		block_copy_test(i915, &config, ctx, set);
> > +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
> >   	}
> >   	igt_describe("Check block-copy flatccs compressed blit");
> >   	igt_subtest_with_dynamic("block-copy-compressed") {
> >   		struct test_config config = { .compression = true };
> > -		block_copy_test(i915, &config, ctx, set);
> > +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
> > +	}
> > +
> > +	igt_describe("Check block-multicopy flatccs compressed blit");
> > +	igt_subtest_with_dynamic("block-multicopy-compressed") {
> > +		struct test_config config = { .compression = true };
> > +
> > +		block_copy_test(i915, &config, ctx, set, BLOCK_MULTICOPY);
> >   	}
> >   	igt_describe("Check block-copy flatccs inplace decompression blit");
> > @@ -521,7 +743,15 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
> >   		struct test_config config = { .compression = true,
> >   					      .inplace = true };
> > -		block_copy_test(i915, &config, ctx, set);
> > +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
> > +	}
> > +
> > +	igt_describe("Check block-multicopy flatccs inplace decompression blit");
> > +	igt_subtest_with_dynamic("block-multicopy-inplace") {
> > +		struct test_config config = { .compression = true,
> > +					      .inplace = true };
> > +
> > +		block_copy_test(i915, &config, ctx, set, BLOCK_MULTICOPY);
> >   	}
> >   	igt_describe("Check flatccs data can be copied from/to surface");
> > @@ -529,7 +759,7 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
> >   		struct test_config config = { .compression = true,
> >   					      .surfcopy = true };
> > -		block_copy_test(i915, &config, ctx, set);
> > +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
> >   	}
> >   	igt_describe("Check flatccs data are physically tagged and visible"
> > @@ -539,7 +769,7 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
> >   					      .surfcopy = true,
> >   					      .new_ctx = true };
> > -		block_copy_test(i915, &config, ctx, set);
> > +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
> >   	}
> >   	igt_describe("Check flatccs data persists after suspend / resume (S0)");
> > @@ -548,7 +778,7 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
> >   					      .surfcopy = true,
> >   					      .suspend_resume = true };
> > -		block_copy_test(i915, &config, ctx, set);
> > +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
> >   	}
> >   	igt_fixture {


* Re: [igt-dev] [PATCH i-g-t 1/3] lib/i915_blt: Remove src == dst pitch restriction
  2022-12-14 18:57     ` Zbigniew Kempczyński
@ 2022-12-15  8:41       ` Karolina Stolarek
  2022-12-15 10:43         ` Karolina Stolarek
  0 siblings, 1 reply; 20+ messages in thread
From: Karolina Stolarek @ 2022-12-15  8:41 UTC (permalink / raw)
  To: Zbigniew Kempczyński; +Cc: igt-dev

On 14.12.2022 19:57, Zbigniew Kempczyński wrote:
> On Tue, Dec 13, 2022 at 04:37:26PM +0100, Karolina Stolarek wrote:
>> On 12.12.2022 13:50, Zbigniew Kempczyński wrote:
>>> During debugging phase we established there's not necessary to enforce
>>> same pitch for destination surface in FULL_RESOLVE mode. Relax this
>>> condition allowing caller to pass requirement surface configuration.
>>>
>>> Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
>>> ---
>>>    lib/i915/i915_blt.c | 8 +++-----
>>>    1 file changed, 3 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/lib/i915/i915_blt.c b/lib/i915/i915_blt.c
>>> index 3776c56c60..42c28623f9 100644
>>> --- a/lib/i915/i915_blt.c
>>> +++ b/lib/i915/i915_blt.c
>>> @@ -317,13 +317,11 @@ static void fill_data(struct gen12_block_copy_data *data,
>>>    	data->dw00.special_mode = __special_mode(blt);
>>>    	data->dw00.length = extended_command ? 20 : 10;
>>> -	if (__special_mode(blt) == SM_FULL_RESOLVE && blt->src.tiling == T_TILE64) {
>>> -		data->dw01.dst_pitch = blt->src.pitch - 1;
>>> +	if (__special_mode(blt) == SM_FULL_RESOLVE)
>>
>> "blt->src.tiling == T_TILE64" part was introduced in commit cdc0258a4672
>> ("lib/i915_blt: Narrow setting same pitch and aux-ccs to Tile64 only").
>> I know it's a bit nit-picky, but could we revert that change (as it looks
>> like it's not necessary), and have a separate commit that removes
>> data->dw01.dst_pitch = blt->src.pitch - 1?
> 
> Two reverts would be necessary: cdc0258a467 and 229bb0accbb7 so single commit
> against three looks better for me. Tiling inplace testing is still in progress.
> If you think better is to revert above and add another commit I'll do it.

To me, some of the 229bb0accbb7 is still here (setting aux_mode), so I'd 
say the revert of it is not necessary. But, yeah, I think it would be 
better to revert cdc0258a467.

> 
>>
>> The question remains -- what actually caused the regression for Tile4 and
>> xmajor? Was it because of the pitch settings? I run the tests a couple of
>> times in the row after applying this patch and everything worked as
>> expected.
> 
> Yes, dst pitch influences on destination surface state. For cascaded
> src->mid->dst mid.pitch may differ from dst.pitch (tileX -> linear blit).
> For inplace there's some side effect which we're debugging now.

OK, so that seems like a root cause. As for inplace, I think it can be 
slightly different, but we can update the function once debugging is 
finished.
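
To make the pitch change concrete, here is a reduced model of the hunk
quoted in this message. It assumes nothing beyond what the diff shows:
the field stores pitch minus one, the old code took the pitch from the
source surface for FULL_RESOLVE on Tile64, and the new code always takes
it from the destination. struct surf and both function names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduced surface description; the real struct has more fields. */
struct surf {
	uint32_t pitch;
};

/* Before the patch: FULL_RESOLVE on Tile64 reused the source pitch. */
static uint32_t dst_pitch_before(const struct surf *src, const struct surf *dst,
				 bool full_resolve_tile64)
{
	return (full_resolve_tile64 ? src->pitch : dst->pitch) - 1;
}

/* After the patch: the destination pitch always comes from dst. */
static uint32_t dst_pitch_after(const struct surf *dst)
{
	assert(dst->pitch > 0);
	return dst->pitch - 1;
}
```

With a source pitch of 512 and a destination pitch of 2048 (arbitrary
example values), the old path programs 511 in the resolve case while the
new path programs 2047, which is the case that matters once the two
surfaces in a cascaded blit no longer share a pitch.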

All the best,
Karolina

> 
> --
> Zbigniew
> 
>>
>> All the best,
>> Karolina
>>
>>>    		data->dw01.dst_aux_mode = __aux_mode(&blt->src);
>>> -	} else {
>>> -		data->dw01.dst_pitch = blt->dst.pitch - 1;
>>> +	else
>>>    		data->dw01.dst_aux_mode = __aux_mode(&blt->dst);
>>> -	}
>>> +	data->dw01.dst_pitch = blt->dst.pitch - 1;
>>>    	data->dw01.dst_mocs = blt->dst.mocs;
>>>    	data->dw01.dst_compression = blt->dst.compression;


* Re: [igt-dev] [PATCH i-g-t 2/3] lib/i915_blt: Extract blit emit functions
  2022-12-14 19:40     ` Zbigniew Kempczyński
@ 2022-12-15  8:42       ` Karolina Stolarek
  2022-12-15 10:41         ` Zbigniew Kempczyński
  0 siblings, 1 reply; 20+ messages in thread
From: Karolina Stolarek @ 2022-12-15  8:42 UTC (permalink / raw)
  To: Zbigniew Kempczyński; +Cc: igt-dev

On 14.12.2022 20:40, Zbigniew Kempczyński wrote:
> On Tue, Dec 13, 2022 at 04:39:14PM +0100, Karolina Stolarek wrote:
>> On 12.12.2022 13:50, Zbigniew Kempczyński wrote:
>>> Add some flexibility in building user pipelines extracting blitter
>>> emission code to dedicated functions. Previous blitter functions which
>>> do one blit-and-execute are rewritten to use those functions.
>>> Requires usage with stateful allocator (offset might be acquired more
>>> than one, so it must not change).
>>>
>>> Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
>>> ---
>>>    lib/i915/i915_blt.c | 263 ++++++++++++++++++++++++++++++++------------
>>>    lib/i915/i915_blt.h |  19 ++++
>>>    2 files changed, 213 insertions(+), 69 deletions(-)
>>>
>>> diff --git a/lib/i915/i915_blt.c b/lib/i915/i915_blt.c
>>> index 42c28623f9..32ad608775 100644
>>> --- a/lib/i915/i915_blt.c
>>> +++ b/lib/i915/i915_blt.c
>>> @@ -503,58 +503,61 @@ static void dump_bb_ext(struct gen12_block_copy_data_ext *data)
>>>    }
>>>    /**
>>> - * blt_block_copy:
>>> + * emit_blt_block_copy:
>>>     * @i915: drm fd
>>> - * @ctx: intel_ctx_t context
>>> - * @e: blitter engine for @ctx
>>>     * @ahnd: allocator handle
>>>     * @blt: basic blitter data (for TGL/DG1 which doesn't support ext version)
>>>     * @ext: extended blitter data (for DG2+, supports flatccs compression)
>>> + * @bb_pos: position at which insert block copy commands
>>> + * @emit_bbe: emit MI_BATCH_BUFFER_END after block-copy or not
>>>     *
>>> - * Function does blit between @src and @dst described in @blt object.
>>> + * Function inserts block-copy blit into batch at @bb_pos. Allows concatenating
>>> + * with other commands to achieve pipelining.
>>>     *
>>>     * Returns:
>>> - * execbuffer status.
>>> + * Next write position in batch.
>>>     */
>>> -int blt_block_copy(int i915,
>>> -		   const intel_ctx_t *ctx,
>>> -		   const struct intel_execution_engine2 *e,
>>> -		   uint64_t ahnd,
>>> -		   const struct blt_copy_data *blt,
>>> -		   const struct blt_block_copy_data_ext *ext)
>>> +uint64_t emit_blt_block_copy(int i915,
>>> +			     uint64_t ahnd,
>>> +			     const struct blt_copy_data *blt,
>>> +			     const struct blt_block_copy_data_ext *ext,
>>> +			     uint64_t bb_pos,
>>> +			     bool emit_bbe)
>>>    {
>>> -	struct drm_i915_gem_execbuffer2 execbuf = {};
>>> -	struct drm_i915_gem_exec_object2 obj[3] = {};
>>>    	struct gen12_block_copy_data data = {};
>>>    	struct gen12_block_copy_data_ext dext = {};
>>> -	uint64_t dst_offset, src_offset, bb_offset, alignment;
>>> -	uint32_t *bb;
>>> -	int i, ret;
>>> +	uint64_t dst_offset, src_offset, bb_offset;
>>> +	uint32_t bbe = MI_BATCH_BUFFER_END;
>>> +	uint8_t *bb;
>>>    	igt_assert_f(ahnd, "block-copy supports softpin only\n");
>>>    	igt_assert_f(blt, "block-copy requires data to do blit\n");
>>> -	alignment = gem_detect_safe_alignment(i915);
>>> -	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, alignment);
>>> -	if (__special_mode(blt) == SM_FULL_RESOLVE)
>>> -		dst_offset = src_offset;
>>> -	else
>>> -		dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, alignment);
>>> -	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, alignment);
>>> +	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, 0);
>>> +	dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, 0);
>>> +	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, 0);
>>
>> Why do we pass 0 as object alignment here? In surf-copy and fast-copy we
>> pass "alignment" in.
> 
> After rethinking you're right, caller may instantiate allocator open
> with unacceptable alignment (like 4K where we use smem and lmem regions)
> so enforcing safe alignment will protect us from ENOSPC.
> 
> Will be sent in v2 after you'll comment reply to previous patch.
> Anyway thanks for pointing this.

Right, thanks for taking care of it!
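
A small sketch of the alignment point, under the assumption that
alignments are powers of two. align_up and pick_alignment are
hypothetical helpers, not IGT functions, and the 64 KiB value used in
the note below is only an example of what a safe alignment might be.

```c
#include <assert.h>
#include <stdint.h>

/* Round offset up to a power-of-two alignment. */
static uint64_t align_up(uint64_t offset, uint64_t alignment)
{
	assert(alignment && (alignment & (alignment - 1)) == 0);
	return (offset + alignment - 1) & ~(alignment - 1);
}

/* Never go below the safe alignment, whatever the caller requested. */
static uint64_t pick_alignment(uint64_t requested, uint64_t safe)
{
	return requested > safe ? requested : safe;
}
```

For example, with a safe alignment of 0x10000 a caller asking for 0x1000
(or for 0, as in the first version of the patch) still ends up with
0x10000, and an offset of 0x1000 is rounded up to 0x10000.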

> 
>>
>>>    	fill_data(&data, blt, src_offset, dst_offset, ext);
>>> -	i = sizeof(data) / sizeof(uint32_t);
>>>    	bb = gem_mmap__device_coherent(i915, blt->bb.handle, 0, blt->bb.size,
>>>    				       PROT_READ | PROT_WRITE);
>>> -	memcpy(bb, &data, sizeof(data));
>>> +
>>> +	igt_assert(bb_pos + sizeof(data) < blt->bb.size);
>>
>> I'd say we need extra space for potential MI_BATCH_BUFFER_END here
> 
> I don't assume how many steps (memcpy's to bb) will be in this blit
> so I incrementally check if there's enough space in bb. See all
> igt_asserts() below - one for dext and second for bbe (notice that
> bbe might be not emitted - but this is caller responsibility to handle
> if emitted instructions consumed all bb dwords).

I meant the scenario where we pass emit_bbe=false and the caller wants
to add more data. Now that I think about it, it seems like quite an
abstract scenario from this function's perspective, so I think we're
good here.
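
The incremental bounds checking described above can be sketched as
follows. emit_cmd and FAKE_BBE are hypothetical names for illustration;
FAKE_BBE merely stands in for MI_BATCH_BUFFER_END. The sketch keeps the
write position in bytes and therefore uses a uint8_t pointer for the
batch: with a uint32_t pointer, bb + pos would advance by pos * 4 bytes.
It also uses <= in the checks, which accepts a command ending exactly at
the end of the buffer, whereas the patch uses the stricter <.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FAKE_BBE 0x05000000u	/* placeholder for MI_BATCH_BUFFER_END */

/*
 * Copy cmd into the batch at byte offset pos, optionally followed by an
 * end marker. Each write is checked on its own, so the function does not
 * need to know in advance how many pieces will be emitted.
 * Returns the next write position in bytes.
 */
static uint64_t emit_cmd(uint8_t *bb, uint64_t bb_size, uint64_t pos,
			 const void *cmd, size_t cmd_size, bool emit_bbe)
{
	uint32_t bbe = FAKE_BBE;

	assert(pos + cmd_size <= bb_size);
	memcpy(bb + pos, cmd, cmd_size);	/* uint8_t *, so pos is in bytes */
	pos += cmd_size;

	if (emit_bbe) {
		assert(pos + sizeof(bbe) <= bb_size);
		memcpy(bb + pos, &bbe, sizeof(bbe));
		pos += sizeof(bbe);
	}

	return pos;
}
```

Chaining two 16-byte commands, the first with emit_bbe=false and the
second with emit_bbe=true, returns 16 and then 36, with the end marker
at byte offset 32. A caller that skips the marker remains responsible
for leaving room for whatever it appends next.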

> 
>>
>>> +	memcpy(bb + bb_pos, &data, sizeof(data));
>>> +	bb_pos += sizeof(data);
>>>    	if (ext) {
>>>    		fill_data_ext(&dext, ext);
>>> -		memcpy(bb + i, &dext, sizeof(dext));
>>> -		i += sizeof(dext) / sizeof(uint32_t);
>>> +		igt_assert(bb_pos + sizeof(dext) < blt->bb.size);
>>> +		memcpy(bb + bb_pos, &dext, sizeof(dext));
>>> +		bb_pos += sizeof(dext);
>>> +	}
>>> +
>>> +	if (emit_bbe) {
>>> +		igt_assert(bb_pos + sizeof(uint32_t) < blt->bb.size);
>>> +		memcpy(bb + bb_pos, &bbe, sizeof(bbe));
>>> +		bb_pos += sizeof(uint32_t);
>>>    	}
>>> -	bb[i++] = MI_BATCH_BUFFER_END;
>>>    	if (blt->print_bb) {
>>>    		igt_info("[BLOCK COPY]\n");
>>> @@ -569,6 +572,44 @@ int blt_block_copy(int i915,
>>>    	munmap(bb, blt->bb.size);
>>> +	return bb_pos;
>>> +}
>>> +
>>> +/**
>>> + * blt_block_copy:
>>> + * @i915: drm fd
>>> + * @ctx: intel_ctx_t context
>>> + * @e: blitter engine for @ctx
>>> + * @ahnd: allocator handle
>>> + * @blt: basic blitter data (for TGL/DG1 which doesn't support ext version)
>>> + * @ext: extended blitter data (for DG2+, supports flatccs compression)
>>> + *
>>> + * Function does blit between @src and @dst described in @blt object.
>>> + *
>>> + * Returns:
>>> + * execbuffer status.
>>> + */
>>> +int blt_block_copy(int i915,
>>> +		   const intel_ctx_t *ctx,
>>> +		   const struct intel_execution_engine2 *e,
>>> +		   uint64_t ahnd,
>>> +		   const struct blt_copy_data *blt,
>>> +		   const struct blt_block_copy_data_ext *ext)
>>> +{
>>> +	struct drm_i915_gem_execbuffer2 execbuf = {};
>>> +	struct drm_i915_gem_exec_object2 obj[3] = {};
>>> +	uint64_t dst_offset, src_offset, bb_offset;
>>> +	int ret;
>>> +
>>> +	igt_assert_f(ahnd, "block-copy supports softpin only\n");
>>> +	igt_assert_f(blt, "block-copy requires data to do blit\n");
>>> +
>>> +	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, 0);
>>> +	dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, 0);
>>> +	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, 0);
>>> +
>>> +	emit_blt_block_copy(i915, ahnd, blt, ext, 0, true);
>>> +
>>>    	obj[0].offset = CANONICAL(dst_offset);
>>>    	obj[1].offset = CANONICAL(src_offset);
>>>    	obj[2].offset = CANONICAL(bb_offset);
>>> @@ -655,31 +696,30 @@ static void dump_bb_surf_ctrl_cmd(const struct gen12_ctrl_surf_copy_data *data)
>>>    }
>>>    /**
>>> - * blt_ctrl_surf_copy:
>>> + * emit_blt_ctrl_surf_copy:
>>>     * @i915: drm fd
>>> - * @ctx: intel_ctx_t context
>>> - * @e: blitter engine for @ctx
>>>     * @ahnd: allocator handle
>>>     * @surf: blitter data for ctrl-surf-copy
>>> + * @bb_pos: position at which insert block copy commands
>>> + * @emit_bbe: emit MI_BATCH_BUFFER_END after ctrl-surf-copy or not
>>>     *
>>> - * Function does ctrl-surf-copy blit between @src and @dst described in
>>> - * @blt object.
>>> + * Function emits ctrl-surf-copy blit between @src and @dst described in
>>> + * @blt object at @bb_pos. Allows concatenating with other commands to
>>> + * achieve pipelining.
>>>     *
>>>     * Returns:
>>> - * execbuffer status.
>>> + * Next write position in batch.
>>>     */
>>> -int blt_ctrl_surf_copy(int i915,
>>> -		       const intel_ctx_t *ctx,
>>> -		       const struct intel_execution_engine2 *e,
>>> -		       uint64_t ahnd,
>>> -		       const struct blt_ctrl_surf_copy_data *surf)
>>> +uint64_t emit_blt_ctrl_surf_copy(int i915,
>>> +				 uint64_t ahnd,
>>> +				 const struct blt_ctrl_surf_copy_data *surf,
>>> +				 uint64_t bb_pos,
>>> +				 bool emit_bbe)
>>>    {
>>> -	struct drm_i915_gem_execbuffer2 execbuf = {};
>>> -	struct drm_i915_gem_exec_object2 obj[3] = {};
>>>    	struct gen12_ctrl_surf_copy_data data = {};
>>>    	uint64_t dst_offset, src_offset, bb_offset, alignment;
>>> +	uint32_t bbe = MI_BATCH_BUFFER_END;
>>>    	uint32_t *bb;
>>> -	int i;
>>>    	igt_assert_f(ahnd, "ctrl-surf-copy supports softpin only\n");
>>>    	igt_assert_f(surf, "ctrl-surf-copy requires data to do ctrl-surf-copy blit\n");
>>> @@ -695,12 +735,9 @@ int blt_ctrl_surf_copy(int i915,
>>>    	data.dw00.size_of_ctrl_copy = __ccs_size(surf) / CCS_RATIO - 1;
>>>    	data.dw00.length = 0x3;
>>> -	src_offset = get_offset(ahnd, surf->src.handle, surf->src.size,
>>> -				alignment);
>>> -	dst_offset = get_offset(ahnd, surf->dst.handle, surf->dst.size,
>>> -				alignment);
>>> -	bb_offset = get_offset(ahnd, surf->bb.handle, surf->bb.size,
>>> -			       alignment);
>>> +	src_offset = get_offset(ahnd, surf->src.handle, surf->src.size, alignment);
>>> +	dst_offset = get_offset(ahnd, surf->dst.handle, surf->dst.size, alignment);
>>> +	bb_offset = get_offset(ahnd, surf->bb.handle, surf->bb.size, alignment);
>>>    	data.dw01.src_address_lo = src_offset;
>>>    	data.dw02.src_address_hi = src_offset >> 32;
>>> @@ -710,11 +747,18 @@ int blt_ctrl_surf_copy(int i915,
>>>    	data.dw04.dst_address_hi = dst_offset >> 32;
>>>    	data.dw04.dst_mocs = surf->dst.mocs;
>>> -	i = sizeof(data) / sizeof(uint32_t);
>>>    	bb = gem_mmap__device_coherent(i915, surf->bb.handle, 0, surf->bb.size,
>>>    				       PROT_READ | PROT_WRITE);
>>> -	memcpy(bb, &data, sizeof(data));
>>> -	bb[i++] = MI_BATCH_BUFFER_END;
>>> +
>>> +	igt_assert(bb_pos + sizeof(data) < surf->bb.size);
>>> +	memcpy(bb + bb_pos, &data, sizeof(data));
>>> +	bb_pos += sizeof(data);
>>> +
>>> +	if (emit_bbe) {
>>> +		igt_assert(bb_pos + sizeof(uint32_t) < surf->bb.size);
>>> +		memcpy(bb + bb_pos, &bbe, sizeof(bbe));
>>> +		bb_pos += sizeof(uint32_t);
>>> +	}
>>>    	if (surf->print_bb) {
>>>    		igt_info("BB [CTRL SURF]:\n");
>>> @@ -724,8 +768,46 @@ int blt_ctrl_surf_copy(int i915,
>>>    		dump_bb_surf_ctrl_cmd(&data);
>>>    	}
>>> +
>>>    	munmap(bb, surf->bb.size);
>>> +	return bb_pos;
>>> +}
>>> +
>>> +/**
>>> + * blt_ctrl_surf_copy:
>>> + * @i915: drm fd
>>> + * @ctx: intel_ctx_t context
>>> + * @e: blitter engine for @ctx
>>> + * @ahnd: allocator handle
>>> + * @surf: blitter data for ctrl-surf-copy
>>> + *
>>> + * Function does ctrl-surf-copy blit between @src and @dst described in
>>> + * @blt object.
>>> + *
>>> + * Returns:
>>> + * execbuffer status.
>>> + */
>>> +int blt_ctrl_surf_copy(int i915,
>>> +		       const intel_ctx_t *ctx,
>>> +		       const struct intel_execution_engine2 *e,
>>> +		       uint64_t ahnd,
>>> +		       const struct blt_ctrl_surf_copy_data *surf)
>>> +{
>>> +	struct drm_i915_gem_execbuffer2 execbuf = {};
>>> +	struct drm_i915_gem_exec_object2 obj[3] = {};
>>> +	uint64_t dst_offset, src_offset, bb_offset, alignment;
>>> +
>>> +	igt_assert_f(ahnd, "ctrl-surf-copy supports softpin only\n");
>>> +	igt_assert_f(surf, "ctrl-surf-copy requires data to do ctrl-surf-copy blit\n");
>>> +
>>> +	alignment = max_t(uint64_t, gem_detect_safe_alignment(i915), 1ull << 16);
>>> +	src_offset = get_offset(ahnd, surf->src.handle, surf->src.size, alignment);
>>> +	dst_offset = get_offset(ahnd, surf->dst.handle, surf->dst.size, alignment);
>>> +	bb_offset = get_offset(ahnd, surf->bb.handle, surf->bb.size, alignment);
>>> +
>>> +	emit_blt_ctrl_surf_copy(i915, ahnd, surf, 0, true);
>>> +
>>>    	obj[0].offset = CANONICAL(dst_offset);
>>>    	obj[1].offset = CANONICAL(src_offset);
>>>    	obj[2].offset = CANONICAL(bb_offset);
>>> @@ -869,31 +951,31 @@ static void dump_bb_fast_cmd(struct gen12_fast_copy_data *data)
>>>    }
>>>    /**
>>> - * blt_fast_copy:
>>> + * emit_blt_fast_copy:
>>>     * @i915: drm fd
>>> - * @ctx: intel_ctx_t context
>>> - * @e: blitter engine for @ctx
>>>     * @ahnd: allocator handle
>>>     * @blt: blitter data for fast-copy (same as for block-copy but doesn't use
>>>     * compression fields).
>>> + * @bb_pos: position at which insert block copy commands
>>> + * @emit_bbe: emit MI_BATCH_BUFFER_END after fast-copy or not
>>>     *
>>> - * Function does fast blit between @src and @dst described in @blt object.
>>> + * Function emits fast-copy blit between @src and @dst described in @blt object
>>> + * at @bb_pos. Allows concatenating with other commands to
>>> + * achieve pipelining.
>>>     *
>>>     * Returns:
>>> - * execbuffer status.
>>> + * Next write position in batch.
>>>     */
>>> -int blt_fast_copy(int i915,
>>> -		  const intel_ctx_t *ctx,
>>> -		  const struct intel_execution_engine2 *e,
>>> -		  uint64_t ahnd,
>>> -		  const struct blt_copy_data *blt)
>>> +uint64_t emit_blt_fast_copy(int i915,
>>> +			    uint64_t ahnd,
>>> +			    const struct blt_copy_data *blt,
>>> +			    uint64_t bb_pos,
>>> +			    bool emit_bbe)
>>>    {
>>> -	struct drm_i915_gem_execbuffer2 execbuf = {};
>>> -	struct drm_i915_gem_exec_object2 obj[3] = {};
>>>    	struct gen12_fast_copy_data data = {};
>>>    	uint64_t dst_offset, src_offset, bb_offset, alignment;
>>> +	uint32_t bbe = MI_BATCH_BUFFER_END;
>>>    	uint32_t *bb;
>>> -	int i, ret;
>>>    	alignment = gem_detect_safe_alignment(i915);
>>> @@ -931,22 +1013,65 @@ int blt_fast_copy(int i915,
>>>    	data.dw08.src_address_lo = src_offset;
>>>    	data.dw09.src_address_hi = src_offset >> 32;
>>> -	i = sizeof(data) / sizeof(uint32_t);
>>>    	bb = gem_mmap__device_coherent(i915, blt->bb.handle, 0, blt->bb.size,
>>>    				       PROT_READ | PROT_WRITE);
>>> -	memcpy(bb, &data, sizeof(data));
>>> -	bb[i++] = MI_BATCH_BUFFER_END;
>>> +	igt_assert(bb_pos + sizeof(data) < blt->bb.size);
>>> +	memcpy(bb + bb_pos, &data, sizeof(data));
>>> +	bb_pos += sizeof(data);
>>> +
>>> +	if (emit_bbe) {
>>> +		igt_assert(bb_pos + sizeof(uint32_t) < blt->bb.size);
>>> +		memcpy(bb + bb_pos, &bbe, sizeof(bbe));
>>> +		bb_pos += sizeof(uint32_t);
>>> +	}
>>>    	if (blt->print_bb) {
>>>    		igt_info("BB [FAST COPY]\n");
>>> -		igt_info("blit [src offset: %llx, dst offset: %llx\n",
>>> -			 (long long) src_offset, (long long) dst_offset);
>>> +		igt_info("src offset: %llx, dst offset: %llx, bb offset: %llx\n",
>>> +			 (long long) src_offset, (long long) dst_offset,
>>> +			 (long long) bb_offset);
>>
>> Nitpick: as you're touching these lines anyway, could you delete the space
>> after the cast? It's not needed.
> 
> I see the preferred code style is to join (type)var. Strange; imo (type)
> var looks more readable, as var is immediately visible (the joined form
> disturbs my perception). Anyway, ok, I will change this.

I ran checkpatch.pl and it complained a bit:
 > CHECK: No space is necessary after a cast

Feel free to add my r-b to v2 of this patch:
Reviewed-by: Karolina Stolarek <karolina.stolarek@intel.com>

Thanks,
Karolina

> 
> --
> Zbigniew
> 
> 
>>
>> In general, I'm fine with the changes, but would like to clarify a couple of
>> things above before giving r-b.
>>
>> All the best,
>> Karolina
>>
>>>    		dump_bb_fast_cmd(&data);
>>>    	}
>>>    	munmap(bb, blt->bb.size);
>>> +	return bb_pos;
>>> +}
>>> +
>>> +/**
>>> + * blt_fast_copy:
>>> + * @i915: drm fd
>>> + * @ctx: intel_ctx_t context
>>> + * @e: blitter engine for @ctx
>>> + * @ahnd: allocator handle
>>> + * @blt: blitter data for fast-copy (same as for block-copy but doesn't use
>>> + * compression fields).
>>> + *
>>> + * Function does fast blit between @src and @dst described in @blt object.
>>> + *
>>> + * Returns:
>>> + * execbuffer status.
>>> + */
>>> +int blt_fast_copy(int i915,
>>> +		  const intel_ctx_t *ctx,
>>> +		  const struct intel_execution_engine2 *e,
>>> +		  uint64_t ahnd,
>>> +		  const struct blt_copy_data *blt)
>>> +{
>>> +	struct drm_i915_gem_execbuffer2 execbuf = {};
>>> +	struct drm_i915_gem_exec_object2 obj[3] = {};
>>> +	uint64_t dst_offset, src_offset, bb_offset, alignment;
>>> +	int ret;
>>> +
>>> +	alignment = gem_detect_safe_alignment(i915);
>>> +
>>> +	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, alignment);
>>> +	dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, alignment);
>>> +	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, alignment);
>>> +
>>> +	emit_blt_fast_copy(i915, ahnd, blt, 0, true);
>>> +
>>>    	obj[0].offset = CANONICAL(dst_offset);
>>>    	obj[1].offset = CANONICAL(src_offset);
>>>    	obj[2].offset = CANONICAL(bb_offset);
>>> diff --git a/lib/i915/i915_blt.h b/lib/i915/i915_blt.h
>>> index e0e8b52bc2..34db9bb962 100644
>>> --- a/lib/i915/i915_blt.h
>>> +++ b/lib/i915/i915_blt.h
>>> @@ -168,6 +168,13 @@ bool blt_supports_compression(int i915);
>>>    bool blt_supports_tiling(int i915, enum blt_tiling tiling);
>>>    const char *blt_tiling_name(enum blt_tiling tiling);
>>> +uint64_t emit_blt_block_copy(int i915,
>>> +			     uint64_t ahnd,
>>> +			     const struct blt_copy_data *blt,
>>> +			     const struct blt_block_copy_data_ext *ext,
>>> +			     uint64_t bb_pos,
>>> +			     bool emit_bbe);
>>> +
>>>    int blt_block_copy(int i915,
>>>    		   const intel_ctx_t *ctx,
>>>    		   const struct intel_execution_engine2 *e,
>>> @@ -175,12 +182,24 @@ int blt_block_copy(int i915,
>>>    		   const struct blt_copy_data *blt,
>>>    		   const struct blt_block_copy_data_ext *ext);
>>> +uint64_t emit_blt_ctrl_surf_copy(int i915,
>>> +				 uint64_t ahnd,
>>> +				 const struct blt_ctrl_surf_copy_data *surf,
>>> +				 uint64_t bb_pos,
>>> +				 bool emit_bbe);
>>> +
>>>    int blt_ctrl_surf_copy(int i915,
>>>    		       const intel_ctx_t *ctx,
>>>    		       const struct intel_execution_engine2 *e,
>>>    		       uint64_t ahnd,
>>>    		       const struct blt_ctrl_surf_copy_data *surf);
>>> +uint64_t emit_blt_fast_copy(int i915,
>>> +			    uint64_t ahnd,
>>> +			    const struct blt_copy_data *blt,
>>> +			    uint64_t bb_pos,
>>> +			    bool emit_bbe);
>>> +
>>>    int blt_fast_copy(int i915,
>>>    		  const intel_ctx_t *ctx,
>>>    		  const struct intel_execution_engine2 *e,


* Re: [igt-dev] [PATCH i-g-t 3/3] tests/gem_ccs: Add block-multicopy subtest
  2022-12-14 19:57     ` Zbigniew Kempczyński
@ 2022-12-15  8:42       ` Karolina Stolarek
  2022-12-15 11:00         ` Zbigniew Kempczyński
  0 siblings, 1 reply; 20+ messages in thread
From: Karolina Stolarek @ 2022-12-15  8:42 UTC (permalink / raw)
  To: Zbigniew Kempczyński; +Cc: igt-dev

On 14.12.2022 20:57, Zbigniew Kempczyński wrote:
> On Tue, Dec 13, 2022 at 04:40:38PM +0100, Karolina Stolarek wrote:
>> On 12.12.2022 13:50, Zbigniew Kempczyński wrote:
>>> Exercise sequence of blits packed in single batch. It may reveal
>>> flushing/decompressing/detiling problems during execution.
>>> multicopy-inplace version differs from copy-inplace version with
>>> additional blit to same tiling format during decompression to separate
>>> problems visible in decompressing/detiling in one step.
>>>
>>> Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
>>> ---
>>>    tests/i915/gem_ccs.c | 262 ++++++++++++++++++++++++++++++++++++++++---
>>>    1 file changed, 246 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/tests/i915/gem_ccs.c b/tests/i915/gem_ccs.c
>>> index 4ecb3e36ac..6b5f199ec7 100644
>>> --- a/tests/i915/gem_ccs.c
>>> +++ b/tests/i915/gem_ccs.c
>>> @@ -262,6 +262,119 @@ static void surf_copy(int i915,
>>>    	gem_close(i915, ccs);
>>>    }
>>> +struct blt_copy3_data {
>>> +	int i915;
>>> +	struct blt_copy_object src;
>>> +	struct blt_copy_object mid;
>>> +	struct blt_copy_object dst;
>>> +	struct blt_copy_object final;
>>> +	struct blt_copy_batch bb;
>>> +	enum blt_color_depth color_depth;
>>> +
>>> +	/* debug stuff */
>>> +	bool print_bb;
>>> +};
>>> +
>>> +struct blt_block_copy3_data_ext {
>>> +	struct blt_block_copy_object_ext src;
>>> +	struct blt_block_copy_object_ext mid;
>>> +	struct blt_block_copy_object_ext dst;
>>> +	struct blt_block_copy_object_ext final;
>>> +};
>>> +
>>> +#define FILL_OBJ(_idx, _handle, _offset, _flags) do { \
>>> +	obj[(_idx)].handle = (_handle); \
>>> +	obj[(_idx)].offset = (_offset); \
>>> +	obj[(_idx)++].flags = EXEC_OBJECT_PINNED | \
>>> +			      EXEC_OBJECT_SUPPORTS_48B_ADDRESS | (_flags) ; \
>>> +} while (0)
>>> +
>>
>> I'm not sure if we want to have a macro with a hidden side effect. "i++"
>> after each statement isn't pretty either, so I'm on the fence with this one.
> 
> I'm aware this has a side effect, but it makes the main code clearer, as it
> is not interlaced with var++ between the lines. I think a reader who knows
> how execbuf objs work will notice there's an implicit increment in the
> macro.

When I first read the code, I thought "why do we keep using the same i 
for different objects?", and then looked at this definition. Or we could 
call FILL_OBJ(++i, ...), but it's not pretty either.
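
For illustration only, one way to keep the call sites compact while making the increment visible is a small helper that takes the counter by pointer. This is a minimal sketch, not code from the patch: fill_obj(), struct exec_obj and the OBJ_* flag values are hypothetical stand-ins for FILL_OBJ, struct drm_i915_gem_exec_object2 and the EXEC_OBJECT_* flags.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for struct drm_i915_gem_exec_object2; only the fields used here. */
struct exec_obj {
	uint32_t handle;
	uint64_t offset;
	uint64_t flags;
};

/* Placeholder flag values for this sketch, not the real uAPI definitions. */
#define OBJ_PINNED	(1ull << 0)
#define OBJ_48B		(1ull << 1)
#define OBJ_WRITE	(1ull << 2)

/*
 * Hypothetical helper: fills obj[*count] and advances *count.
 * Passing the counter by pointer shows at the call site that it changes;
 * cap guards against writing past the end of the array.
 */
static inline void fill_obj(struct exec_obj *obj, int cap, int *count,
			    uint32_t handle, uint64_t offset, uint64_t flags)
{
	assert(*count < cap);
	obj[*count].handle = handle;
	obj[*count].offset = offset;
	obj[*count].flags = OBJ_PINNED | OBJ_48B | flags;
	(*count)++;
}
```

A call would then read fill_obj(obj, 5, &i, handle, offset, flags), so the "&i" hints at the update without an explicit i++ after each line.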

> 
>>
>>> +static int blt_block_copy3(int i915,
>>> +			   const intel_ctx_t *ctx,
>>> +			   const struct intel_execution_engine2 *e,
>>> +			   uint64_t ahnd,
>>> +			   const struct blt_copy3_data *blt3,
>>> +			   const struct blt_block_copy3_data_ext *ext3)
>>> +{
>>> +	struct drm_i915_gem_execbuffer2 execbuf = {};
>>> +	struct drm_i915_gem_exec_object2 obj[5] = {};
>>> +	struct blt_copy_data blt0;
>>> +	struct blt_block_copy_data_ext ext0;
>>> +	uint64_t src_offset, mid_offset, dst_offset, final_offset, bb_offset, alignment;
>>> +	uint64_t bb_pos = 0;
>>> +	uint32_t *bb;
>>> +	int i, ret;
>>> +
>>> +	igt_assert_f(ahnd, "block-copy3 supports softpin only\n");
>>> +	igt_assert_f(blt3, "block-copy3 requires data to do blit\n");
>>> +
>>> +	alignment = gem_detect_safe_alignment(i915);
>>> +	src_offset = get_offset(ahnd, blt3->src.handle, blt3->src.size, alignment);
>>> +	mid_offset = get_offset(ahnd, blt3->mid.handle, blt3->mid.size, alignment);
>>> +	dst_offset = get_offset(ahnd, blt3->dst.handle, blt3->dst.size, alignment);
>>> +	final_offset = get_offset(ahnd, blt3->final.handle, blt3->final.size, alignment);
>>> +	bb_offset = get_offset(ahnd, blt3->bb.handle, blt3->bb.size, alignment);
>>> +
>>> +	igt_debug("src: %lx, mid: %lx, dst: %lx, final: %lx, bb: %lx, align: %lx\n",
>>> +		  src_offset, mid_offset, dst_offset, final_offset, bb_offset, alignment);
>>> +
>>> +	/* First blit src -> mid */
>>> +	memset(&blt0, 0, sizeof(blt0));
>>> +	blt0.src = blt3->src;
>>> +	blt0.dst = blt3->mid;
>>> +	blt0.bb = blt3->bb;
>>> +	blt0.color_depth = blt3->color_depth;
>>> +	blt0.print_bb = blt3->print_bb;
>>> +	ext0.src = ext3->src;
>>> +	ext0.dst = ext3->mid;
>>> +	bb_pos = emit_blt_block_copy(i915, ahnd, &blt0, &ext0, bb_pos, false);
>>> +
>>> +	/* Second blit mid -> dst */
>>> +	memset(&blt0, 0, sizeof(blt0));
>>> +	blt0.src = blt3->mid;
>>> +	blt0.dst = blt3->dst;
>>> +	blt0.bb = blt3->bb;
>>> +	blt0.color_depth = blt3->color_depth;
>>> +	blt0.print_bb = blt3->print_bb;
>>> +	ext0.src = ext3->mid;
>>> +	ext0.dst = ext3->dst;
>>> +	bb_pos = emit_blt_block_copy(i915, ahnd, &blt0, &ext0, bb_pos, false);
>>> +
>>> +	/* Third blit dst -> final */
>>> +	memset(&blt0, 0, sizeof(blt0));
>>> +	blt0.src = blt3->dst;
>>> +	blt0.dst = blt3->final;
>>> +	blt0.bb = blt3->bb;
>>> +	blt0.color_depth = blt3->color_depth;
>>> +	blt0.print_bb = blt3->print_bb;
>>> +	ext0.src = ext3->dst;
>>> +	ext0.dst = ext3->final;
>>> +	bb_pos = emit_blt_block_copy(i915, ahnd, &blt0, &ext0, bb_pos, true);
>>> +
>>> +	i = 0;
>>> +	FILL_OBJ(i, blt3->src.handle, CANONICAL(src_offset), 0);
>>> +	FILL_OBJ(i, blt3->mid.handle, CANONICAL(mid_offset), EXEC_OBJECT_WRITE);
>>> +	if (mid_offset != dst_offset)
>>> +		FILL_OBJ(i, blt3->dst.handle, CANONICAL(dst_offset), EXEC_OBJECT_WRITE);
>>> +	FILL_OBJ(i, blt3->final.handle, CANONICAL(final_offset), 0);
>>> +	FILL_OBJ(i, blt3->bb.handle, CANONICAL(bb_offset), 0);
>>> +
>>> +	execbuf.buffer_count = i;
>>> +
>>> +	for (int __i = 0; __i < i; __i++)
>>> +		igt_debug("obj[%d].offset: %llx, handle: %u\n",
>>> +			  __i, (long long) obj[__i].offset, obj[__i].handle);
>>> +	execbuf.buffers_ptr = to_user_pointer(obj);
>>> +	execbuf.rsvd1 = ctx ? ctx->id : 0;
>>> +	execbuf.flags = e ? e->flags : I915_EXEC_BLT;
>>> +	ret = __gem_execbuf(i915, &execbuf);
>>> +
>>> +	gem_sync(i915, blt3->bb.handle);
>>> +	munmap(bb, blt3->bb.size);
>>> +
>>> +	return ret;
>>> +}
>>> +
>>>    static void block_copy(int i915,
>>>    		       const intel_ctx_t *ctx,
>>>    		       const struct intel_execution_engine2 *e,
>>> @@ -380,10 +493,100 @@ static void block_copy(int i915,
>>>    	igt_assert_f(!result, "source and destination surfaces differs!\n");
>>>    }
>>> +static void block_multicopy(int i915,
>>> +			    const intel_ctx_t *ctx,
>>> +			    const struct intel_execution_engine2 *e,
>>> +			    uint32_t region1, uint32_t region2,
>>> +			    enum blt_tiling mid_tiling,
>>> +			    const struct test_config *config)
>>> +{
>>> +	struct blt_copy3_data blt3 = {};
>>> +	struct blt_block_copy3_data_ext ext3 = {}, *pext3 = &ext3;
>>> +	struct blt_copy_object *src, *mid, *dst, *final;
>>> +	const uint32_t bpp = 32;
>>> +	uint64_t bb_size = 4096;
>>> +	uint64_t ahnd = get_reloc_ahnd(i915, ctx->id);
>>> +	uint32_t run_id = mid_tiling;
>>> +	uint32_t mid_region = region2, bb;
>>> +	uint32_t width = param.width, height = param.height;
>>> +	enum blt_compression mid_compression = config->compression;
>>> +	int mid_compression_format = param.compression_format;
>>> +	enum blt_compression_type comp_type = COMPRESSION_TYPE_3D;
>>> +	uint8_t uc_mocs = intel_get_uc_mocs(i915);
>>> +	int result;
>>> +
>>> +	igt_assert(__gem_create_in_memory_regions(i915, &bb, &bb_size, region1) == 0);
>>> +
>>> +	if (!blt_supports_compression(i915))
>>> +		pext3 = NULL;
>>> +
>>> +	src = create_object(i915, region1, width, height, bpp, uc_mocs,
>>> +			    T_LINEAR, COMPRESSION_DISABLED, comp_type, true);
>>> +	mid = create_object(i915, mid_region, width, height, bpp, uc_mocs,
>>> +			    mid_tiling, mid_compression, comp_type, true);
>>> +	dst = create_object(i915, region1, width, height, bpp, uc_mocs,
>>> +			    mid_tiling, COMPRESSION_DISABLED, comp_type, true);
>>> +	final = create_object(i915, region1, width, height, bpp, uc_mocs,
>>> +			    T_LINEAR, COMPRESSION_DISABLED, comp_type, true);
>>> +	igt_assert(src->size == dst->size);
>>> +	PRINT_SURFACE_INFO("src", src);
>>> +	PRINT_SURFACE_INFO("mid", mid);
>>> +	PRINT_SURFACE_INFO("dst", dst);
>>> +	PRINT_SURFACE_INFO("final", final);
>>> +
>>> +	blt_surface_fill_rect(i915, src, width, height);
>>> +
>>> +	memset(&blt3, 0, sizeof(blt3));
>>> +	blt3.color_depth = CD_32bit;
>>> +	blt3.print_bb = param.print_bb;
>>> +	set_blt_object(&blt3.src, src);
>>> +	set_blt_object(&blt3.mid, mid);
>>> +	set_blt_object(&blt3.dst, dst);
>>> +	set_blt_object(&blt3.final, final);
>>> +
>>> +	if (config->inplace) {
>>> +		set_object(&blt3.dst, mid->handle, dst->size, mid->region, mid->mocs,
>>> +			   mid_tiling, COMPRESSION_DISABLED, comp_type);
>>> +		blt3.dst.ptr = mid->ptr;
>>> +	}
>>> +
>>> +	set_object_ext(&ext3.src, 0, width, height, SURFACE_TYPE_2D);
>>> +	set_object_ext(&ext3.mid, mid_compression_format, width, height, SURFACE_TYPE_2D);
>>> +	set_object_ext(&ext3.dst, 0, width, height, SURFACE_TYPE_2D);
>>> +	set_object_ext(&ext3.final, 0, width, height, SURFACE_TYPE_2D);
>>> +	set_batch(&blt3.bb, bb, bb_size, region1);
>>> +
>>> +	blt_block_copy3(i915, ctx, e, ahnd, &blt3, pext3);
>>> +	gem_sync(i915, blt3.final.handle);
>>> +
>>> +	WRITE_PNG(i915, run_id, "src", &blt3.src, width, height);
>>> +	if (!config->inplace)
>>> +		WRITE_PNG(i915, run_id, "mid", &blt3.mid, width, height);
>>> +	WRITE_PNG(i915, run_id, "dst", &blt3.dst, width, height);
>>> +	WRITE_PNG(i915, run_id, "final", &blt3.final, width, height);
>>> +
>>> +	result = memcmp(src->ptr, blt3.final.ptr, src->size);
>>> +
>>> +	destroy_object(i915, src);
>>> +	destroy_object(i915, mid);
>>> +	destroy_object(i915, dst);
>>> +	destroy_object(i915, final);
>>> +	gem_close(i915, bb);
>>> +	put_ahnd(ahnd);
>>> +
>>> +	igt_assert_f(!result, "source and destination surfaces differs!\n");
>>> +}
>>> +
>>> +enum copy_func {
>>> +	BLOCK_COPY,
>>> +	BLOCK_MULTICOPY,
>>> +};
>>> +
>>
>> I was thinking if we need a simple case here (i.e. one emit, src->dst)
> 
> Which tiling do you want to blit? linear -> linear? linear -> tileN? How
> would we verify it works? fmt1 -> fmt2 -> fmt1 is equal to fmt1 -> fmt2
> (detile fmt2 -> linear on cpu).
> 
> At the moment we don't verify the middle surface; we only assume that
> src == dst means the middle blit was fine.

I thought about linear->linear in the beginning, but maybe it would make 
more sense to wait for predicates that can check the middle layer and 
work with different tilings. It can stay like this for now.

Many thanks,
Karolina

> 
> --
> Zbigniew
> 
>>
>>>    static void block_copy_test(int i915,
>>>    			    const struct test_config *config,
>>>    			    const intel_ctx_t *ctx,
>>> -			    struct igt_collection *set)
>>> +			    struct igt_collection *set,
>>> +			    enum copy_func copy_function)
>>>    {
>>>    	struct igt_collection *regions;
>>>    	const struct intel_execution_engine2 *e;
>>> @@ -415,15 +618,27 @@ static void block_copy_test(int i915,
>>>    					continue;
>>>    				regtxt = memregion_dynamic_subtest_name(regions);
>>> -				igt_dynamic_f("%s-%s-compfmt%d-%s",
>>> -					      blt_tiling_name(tiling),
>>> -					      config->compression ?
>>> -						      "compressed" : "uncompressed",
>>> -					      param.compression_format, regtxt) {
>>> -					block_copy(i915, ctx, e,
>>> -						   region1, region2,
>>> -						   tiling, config);
>>> -				}
>>> +
>>> +				if (copy_function == BLOCK_COPY)
>>> +					igt_dynamic_f("%s-%s-compfmt%d-%s",
>>> +						      blt_tiling_name(tiling),
>>> +						      config->compression ?
>>> +							      "compressed" : "uncompressed",
>>> +						      param.compression_format, regtxt) {
>>> +						block_copy(i915, ctx, e,
>>> +							   region1, region2,
>>> +							   tiling, config);
>>> +					}
>>> +				else if (copy_function == BLOCK_MULTICOPY)
>>> +					igt_dynamic_f("%s-%s-compfmt%d-%s-multicopy",
>>> +						      blt_tiling_name(tiling),
>>> +						      config->compression ?
>>> +							      "compressed" : "uncompressed",
>>> +						      param.compression_format, regtxt) {
>>> +						block_multicopy(i915, ctx, e,
>>> +								region1, region2,
>>> +								tiling, config);
>>> +					}
>>
>> We could avoid repetition to some extent if we had introduced a macro that
>> accepts a function pointer and/or some config data. But I don't feel
>> strongly about it, just putting it out there.
>>
>> Many thanks,
>> Karolina
>>
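
As an aside, the repetition mentioned above could also be reduced with a lookup table keyed by enum copy_func instead of a macro. The sketch below is only an assumption about how that might look: the stub functions, the simplified copy_fn signature and get_variant() are hypothetical, and the real block_copy()/block_multicopy() take more arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum copy_func { BLOCK_COPY, BLOCK_MULTICOPY };

/* Simplified signature for the sketch; the real functions take more arguments. */
typedef int (*copy_fn)(int tiling);

/* Stubs standing in for block_copy() and block_multicopy(). */
static int block_copy_stub(int tiling) { return tiling; }
static int block_multicopy_stub(int tiling) { return tiling + 100; }

struct copy_variant {
	const char *suffix;	/* appended to the dynamic subtest name */
	copy_fn fn;		/* function run inside the dynamic subtest */
};

static const struct copy_variant variants[] = {
	[BLOCK_COPY]      = { "",           block_copy_stub },
	[BLOCK_MULTICOPY] = { "-multicopy", block_multicopy_stub },
};

/* Returns the table entry for f, asserting that f is a known variant. */
static inline const struct copy_variant *get_variant(enum copy_func f)
{
	assert((size_t)f < sizeof(variants) / sizeof(variants[0]));
	return &variants[f];
}
```

With such a table, a single dynamic-subtest block could format its name with the variant's suffix and call the variant's fn, rather than duplicating the block per enum value.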
>>>    				free(regtxt);
>>>    			}
>>>    		}
>>> @@ -506,14 +721,21 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
>>>    	igt_subtest_with_dynamic("block-copy-uncompressed") {
>>>    		struct test_config config = {};
>>> -		block_copy_test(i915, &config, ctx, set);
>>> +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
>>>    	}
>>>    	igt_describe("Check block-copy flatccs compressed blit");
>>>    	igt_subtest_with_dynamic("block-copy-compressed") {
>>>    		struct test_config config = { .compression = true };
>>> -		block_copy_test(i915, &config, ctx, set);
>>> +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
>>> +	}
>>> +
>>> +	igt_describe("Check block-multicopy flatccs compressed blit");
>>> +	igt_subtest_with_dynamic("block-multicopy-compressed") {
>>> +		struct test_config config = { .compression = true };
>>> +
>>> +		block_copy_test(i915, &config, ctx, set, BLOCK_MULTICOPY);
>>>    	}
>>>    	igt_describe("Check block-copy flatccs inplace decompression blit");
>>> @@ -521,7 +743,15 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
>>>    		struct test_config config = { .compression = true,
>>>    					      .inplace = true };
>>> -		block_copy_test(i915, &config, ctx, set);
>>> +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
>>> +	}
>>> +
>>> +	igt_describe("Check block-multicopy flatccs inplace decompression blit");
>>> +	igt_subtest_with_dynamic("block-multicopy-inplace") {
>>> +		struct test_config config = { .compression = true,
>>> +					      .inplace = true };
>>> +
>>> +		block_copy_test(i915, &config, ctx, set, BLOCK_MULTICOPY);
>>>    	}
>>>    	igt_describe("Check flatccs data can be copied from/to surface");
>>> @@ -529,7 +759,7 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
>>>    		struct test_config config = { .compression = true,
>>>    					      .surfcopy = true };
>>> -		block_copy_test(i915, &config, ctx, set);
>>> +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
>>>    	}
>>>    	igt_describe("Check flatccs data are physically tagged and visible"
>>> @@ -539,7 +769,7 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
>>>    					      .surfcopy = true,
>>>    					      .new_ctx = true };
>>> -		block_copy_test(i915, &config, ctx, set);
>>> +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
>>>    	}
>>>    	igt_describe("Check flatccs data persists after suspend / resume (S0)");
>>> @@ -548,7 +778,7 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
>>>    					      .surfcopy = true,
>>>    					      .suspend_resume = true };
>>> -		block_copy_test(i915, &config, ctx, set);
>>> +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
>>>    	}
>>>    	igt_fixture {


* Re: [igt-dev] [PATCH i-g-t 2/3] lib/i915_blt: Extract blit emit functions
  2022-12-15  8:42       ` Karolina Stolarek
@ 2022-12-15 10:41         ` Zbigniew Kempczyński
  2022-12-15 10:47           ` Karolina Stolarek
  0 siblings, 1 reply; 20+ messages in thread
From: Zbigniew Kempczyński @ 2022-12-15 10:41 UTC (permalink / raw)
  To: Karolina Stolarek; +Cc: igt-dev

On Thu, Dec 15, 2022 at 09:42:17AM +0100, Karolina Stolarek wrote:
> On 14.12.2022 20:40, Zbigniew Kempczyński wrote:
> > On Tue, Dec 13, 2022 at 04:39:14PM +0100, Karolina Stolarek wrote:
> > > On 12.12.2022 13:50, Zbigniew Kempczyński wrote:
> > > > > Add some flexibility in building user pipelines by extracting blitter
> > > > > emission code to dedicated functions. Previous blitter functions which
> > > > > do one blit-and-execute are rewritten to use those functions.
> > > > > Requires usage with a stateful allocator (an offset might be acquired
> > > > > more than once, so it must not change).
> > > > 
> > > > Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
> > > > ---
> > > >    lib/i915/i915_blt.c | 263 ++++++++++++++++++++++++++++++++------------
> > > >    lib/i915/i915_blt.h |  19 ++++
> > > >    2 files changed, 213 insertions(+), 69 deletions(-)
> > > > 
> > > > diff --git a/lib/i915/i915_blt.c b/lib/i915/i915_blt.c
> > > > index 42c28623f9..32ad608775 100644
> > > > --- a/lib/i915/i915_blt.c
> > > > +++ b/lib/i915/i915_blt.c
> > > > @@ -503,58 +503,61 @@ static void dump_bb_ext(struct gen12_block_copy_data_ext *data)
> > > >    }
> > > >    /**
> > > > - * blt_block_copy:
> > > > + * emit_blt_block_copy:
> > > >     * @i915: drm fd
> > > > - * @ctx: intel_ctx_t context
> > > > - * @e: blitter engine for @ctx
> > > >     * @ahnd: allocator handle
> > > >     * @blt: basic blitter data (for TGL/DG1 which doesn't support ext version)
> > > >     * @ext: extended blitter data (for DG2+, supports flatccs compression)
> > > > + * @bb_pos: position at which insert block copy commands
> > > > + * @emit_bbe: emit MI_BATCH_BUFFER_END after block-copy or not
> > > >     *
> > > > - * Function does blit between @src and @dst described in @blt object.
> > > > + * Function inserts block-copy blit into batch at @bb_pos. Allows concatenating
> > > > + * with other commands to achieve pipelining.
> > > >     *
> > > >     * Returns:
> > > > - * execbuffer status.
> > > > + * Next write position in batch.
> > > >     */
> > > > -int blt_block_copy(int i915,
> > > > -		   const intel_ctx_t *ctx,
> > > > -		   const struct intel_execution_engine2 *e,
> > > > -		   uint64_t ahnd,
> > > > -		   const struct blt_copy_data *blt,
> > > > -		   const struct blt_block_copy_data_ext *ext)
> > > > +uint64_t emit_blt_block_copy(int i915,
> > > > +			     uint64_t ahnd,
> > > > +			     const struct blt_copy_data *blt,
> > > > +			     const struct blt_block_copy_data_ext *ext,
> > > > +			     uint64_t bb_pos,
> > > > +			     bool emit_bbe)
> > > >    {
> > > > -	struct drm_i915_gem_execbuffer2 execbuf = {};
> > > > -	struct drm_i915_gem_exec_object2 obj[3] = {};
> > > >    	struct gen12_block_copy_data data = {};
> > > >    	struct gen12_block_copy_data_ext dext = {};
> > > > -	uint64_t dst_offset, src_offset, bb_offset, alignment;
> > > > -	uint32_t *bb;
> > > > -	int i, ret;
> > > > +	uint64_t dst_offset, src_offset, bb_offset;
> > > > +	uint32_t bbe = MI_BATCH_BUFFER_END;
> > > > +	uint8_t *bb;
> > > >    	igt_assert_f(ahnd, "block-copy supports softpin only\n");
> > > >    	igt_assert_f(blt, "block-copy requires data to do blit\n");
> > > > -	alignment = gem_detect_safe_alignment(i915);
> > > > -	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, alignment);
> > > > -	if (__special_mode(blt) == SM_FULL_RESOLVE)
> > > > -		dst_offset = src_offset;
> > > > -	else
> > > > -		dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, alignment);
> > > > -	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, alignment);
> > > > +	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, 0);
> > > > +	dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, 0);
> > > > +	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, 0);
> > > 
> > > Why do we pass 0 as object alignment here? In surf-copy and fast-copy we
> > > pass "alignment" in.
> > 
> > > After rethinking, you're right: the caller may open the allocator
> > > with an unacceptable alignment (like 4K where we use smem and lmem regions),
> > > so enforcing safe alignment will protect us from ENOSPC.
> > > 
> > > This will be sent in v2 after you comment on my reply to the previous patch.
> > > Anyway, thanks for pointing this out.
> 
> Right, thanks for taking care of it!
> 
> > 
> > > 
> > > >    	fill_data(&data, blt, src_offset, dst_offset, ext);
> > > > -	i = sizeof(data) / sizeof(uint32_t);
> > > >    	bb = gem_mmap__device_coherent(i915, blt->bb.handle, 0, blt->bb.size,
> > > >    				       PROT_READ | PROT_WRITE);
> > > > -	memcpy(bb, &data, sizeof(data));
> > > > +
> > > > +	igt_assert(bb_pos + sizeof(data) < blt->bb.size);
> > > 
> > > I'd say we need extra space for potential MI_BATCH_BUFFER_END here
> > 
> > > I don't assume how many steps (memcpy's to bb) there will be in this blit,
> > > so I incrementally check whether there's enough space in bb. See all the
> > > igt_assert()s below: one for dext and a second for bbe (notice that
> > > bbe might not be emitted, but it is the caller's responsibility to handle
> > > the case where emitted instructions consumed all bb dwords).
> 
> > I meant the scenario where we pass emit_bbe=false and the caller wants to
> > add more data. Now that I think about it, it seems like quite an
> > abstract scenario from this function's perspective, so I think we're good
> > here.

The caller receives the current position in the batch, and it is the caller's
responsibility to check whether there's enough space left in the buffer before
adding anything there.
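
To illustrate that contract, here is a minimal sketch of a byte-offset emit helper plus the caller-side space check. It is not the lib/i915_blt code: emit_cmd() and batch_has_room() are hypothetical names, and the inclusive bounds check is this sketch's own choice. Note that the base pointer is uint8_t *, as in the block-copy hunk; with a uint32_t * base, "bb + pos" would advance four bytes per unit of pos.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copies one command of len bytes into the batch at
 * byte offset pos and returns the next write position.
 */
static inline size_t emit_cmd(uint8_t *bb, size_t bb_size, size_t pos,
			      const void *cmd, size_t len)
{
	/* Written to avoid overflow in pos + len. */
	assert(pos <= bb_size && len <= bb_size - pos);
	memcpy(bb + pos, cmd, len);	/* uint8_t * keeps pos in bytes */
	return pos + len;
}

/* Caller-side check before appending anything else to the batch. */
static inline int batch_has_room(size_t bb_size, size_t pos, size_t len)
{
	return pos <= bb_size && len <= bb_size - pos;
}
```

A caller chaining several emits would feed each returned position into the next call and consult batch_has_room() before appending its own commands or the terminating batch-buffer-end dword.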

--
Zbigniew

> 
> > 
> > > 
> > > > +	memcpy(bb + bb_pos, &data, sizeof(data));
> > > > +	bb_pos += sizeof(data);
> > > >    	if (ext) {
> > > >    		fill_data_ext(&dext, ext);
> > > > -		memcpy(bb + i, &dext, sizeof(dext));
> > > > -		i += sizeof(dext) / sizeof(uint32_t);
> > > > +		igt_assert(bb_pos + sizeof(dext) < blt->bb.size);
> > > > +		memcpy(bb + bb_pos, &dext, sizeof(dext));
> > > > +		bb_pos += sizeof(dext);
> > > > +	}
> > > > +
> > > > +	if (emit_bbe) {
> > > > +		igt_assert(bb_pos + sizeof(uint32_t) < blt->bb.size);
> > > > +		memcpy(bb + bb_pos, &bbe, sizeof(bbe));
> > > > +		bb_pos += sizeof(uint32_t);
> > > >    	}
> > > > -	bb[i++] = MI_BATCH_BUFFER_END;
> > > >    	if (blt->print_bb) {
> > > >    		igt_info("[BLOCK COPY]\n");
> > > > @@ -569,6 +572,44 @@ int blt_block_copy(int i915,
> > > >    	munmap(bb, blt->bb.size);
> > > > +	return bb_pos;
> > > > +}
> > > > +
> > > > +/**
> > > > + * blt_block_copy:
> > > > + * @i915: drm fd
> > > > + * @ctx: intel_ctx_t context
> > > > + * @e: blitter engine for @ctx
> > > > + * @ahnd: allocator handle
> > > > + * @blt: basic blitter data (for TGL/DG1 which doesn't support ext version)
> > > > + * @ext: extended blitter data (for DG2+, supports flatccs compression)
> > > > + *
> > > > + * Function does blit between @src and @dst described in @blt object.
> > > > + *
> > > > + * Returns:
> > > > + * execbuffer status.
> > > > + */
> > > > +int blt_block_copy(int i915,
> > > > +		   const intel_ctx_t *ctx,
> > > > +		   const struct intel_execution_engine2 *e,
> > > > +		   uint64_t ahnd,
> > > > +		   const struct blt_copy_data *blt,
> > > > +		   const struct blt_block_copy_data_ext *ext)
> > > > +{
> > > > +	struct drm_i915_gem_execbuffer2 execbuf = {};
> > > > +	struct drm_i915_gem_exec_object2 obj[3] = {};
> > > > +	uint64_t dst_offset, src_offset, bb_offset;
> > > > +	int ret;
> > > > +
> > > > +	igt_assert_f(ahnd, "block-copy supports softpin only\n");
> > > > +	igt_assert_f(blt, "block-copy requires data to do blit\n");
> > > > +
> > > > +	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, 0);
> > > > +	dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, 0);
> > > > +	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, 0);
> > > > +
> > > > +	emit_blt_block_copy(i915, ahnd, blt, ext, 0, true);
> > > > +
> > > >    	obj[0].offset = CANONICAL(dst_offset);
> > > >    	obj[1].offset = CANONICAL(src_offset);
> > > >    	obj[2].offset = CANONICAL(bb_offset);
> > > > @@ -655,31 +696,30 @@ static void dump_bb_surf_ctrl_cmd(const struct gen12_ctrl_surf_copy_data *data)
> > > >    }
> > > >    /**
> > > > - * blt_ctrl_surf_copy:
> > > > + * emit_blt_ctrl_surf_copy:
> > > >     * @i915: drm fd
> > > > - * @ctx: intel_ctx_t context
> > > > - * @e: blitter engine for @ctx
> > > >     * @ahnd: allocator handle
> > > >     * @surf: blitter data for ctrl-surf-copy
> > > > + * @bb_pos: position at which insert block copy commands
> > > > + * @emit_bbe: emit MI_BATCH_BUFFER_END after ctrl-surf-copy or not
> > > >     *
> > > > - * Function does ctrl-surf-copy blit between @src and @dst described in
> > > > - * @blt object.
> > > > + * Function emits ctrl-surf-copy blit between @src and @dst described in
> > > > + * @blt object at @bb_pos. Allows concatenating with other commands to
> > > > + * achieve pipelining.
> > > >     *
> > > >     * Returns:
> > > > - * execbuffer status.
> > > > + * Next write position in batch.
> > > >     */
> > > > -int blt_ctrl_surf_copy(int i915,
> > > > -		       const intel_ctx_t *ctx,
> > > > -		       const struct intel_execution_engine2 *e,
> > > > -		       uint64_t ahnd,
> > > > -		       const struct blt_ctrl_surf_copy_data *surf)
> > > > +uint64_t emit_blt_ctrl_surf_copy(int i915,
> > > > +				 uint64_t ahnd,
> > > > +				 const struct blt_ctrl_surf_copy_data *surf,
> > > > +				 uint64_t bb_pos,
> > > > +				 bool emit_bbe)
> > > >    {
> > > > -	struct drm_i915_gem_execbuffer2 execbuf = {};
> > > > -	struct drm_i915_gem_exec_object2 obj[3] = {};
> > > >    	struct gen12_ctrl_surf_copy_data data = {};
> > > >    	uint64_t dst_offset, src_offset, bb_offset, alignment;
> > > > +	uint32_t bbe = MI_BATCH_BUFFER_END;
> > > >    	uint32_t *bb;
> > > > -	int i;
> > > >    	igt_assert_f(ahnd, "ctrl-surf-copy supports softpin only\n");
> > > >    	igt_assert_f(surf, "ctrl-surf-copy requires data to do ctrl-surf-copy blit\n");
> > > > @@ -695,12 +735,9 @@ int blt_ctrl_surf_copy(int i915,
> > > >    	data.dw00.size_of_ctrl_copy = __ccs_size(surf) / CCS_RATIO - 1;
> > > >    	data.dw00.length = 0x3;
> > > > -	src_offset = get_offset(ahnd, surf->src.handle, surf->src.size,
> > > > -				alignment);
> > > > -	dst_offset = get_offset(ahnd, surf->dst.handle, surf->dst.size,
> > > > -				alignment);
> > > > -	bb_offset = get_offset(ahnd, surf->bb.handle, surf->bb.size,
> > > > -			       alignment);
> > > > +	src_offset = get_offset(ahnd, surf->src.handle, surf->src.size, alignment);
> > > > +	dst_offset = get_offset(ahnd, surf->dst.handle, surf->dst.size, alignment);
> > > > +	bb_offset = get_offset(ahnd, surf->bb.handle, surf->bb.size, alignment);
> > > >    	data.dw01.src_address_lo = src_offset;
> > > >    	data.dw02.src_address_hi = src_offset >> 32;
> > > > @@ -710,11 +747,18 @@ int blt_ctrl_surf_copy(int i915,
> > > >    	data.dw04.dst_address_hi = dst_offset >> 32;
> > > >    	data.dw04.dst_mocs = surf->dst.mocs;
> > > > -	i = sizeof(data) / sizeof(uint32_t);
> > > >    	bb = gem_mmap__device_coherent(i915, surf->bb.handle, 0, surf->bb.size,
> > > >    				       PROT_READ | PROT_WRITE);
> > > > -	memcpy(bb, &data, sizeof(data));
> > > > -	bb[i++] = MI_BATCH_BUFFER_END;
> > > > +
> > > > +	igt_assert(bb_pos + sizeof(data) < surf->bb.size);
> > > > +	memcpy(bb + bb_pos, &data, sizeof(data));
> > > > +	bb_pos += sizeof(data);
> > > > +
> > > > +	if (emit_bbe) {
> > > > +		igt_assert(bb_pos + sizeof(uint32_t) < surf->bb.size);
> > > > +		memcpy(bb + bb_pos, &bbe, sizeof(bbe));
> > > > +		bb_pos += sizeof(uint32_t);
> > > > +	}
> > > >    	if (surf->print_bb) {
> > > >    		igt_info("BB [CTRL SURF]:\n");
> > > > @@ -724,8 +768,46 @@ int blt_ctrl_surf_copy(int i915,
> > > >    		dump_bb_surf_ctrl_cmd(&data);
> > > >    	}
> > > > +
> > > >    	munmap(bb, surf->bb.size);
> > > > +	return bb_pos;
> > > > +}
> > > > +
> > > > +/**
> > > > + * blt_ctrl_surf_copy:
> > > > + * @i915: drm fd
> > > > + * @ctx: intel_ctx_t context
> > > > + * @e: blitter engine for @ctx
> > > > + * @ahnd: allocator handle
> > > > + * @surf: blitter data for ctrl-surf-copy
> > > > + *
> > > > + * Function does ctrl-surf-copy blit between @src and @dst described in
> > > > + * @blt object.
> > > > + *
> > > > + * Returns:
> > > > + * execbuffer status.
> > > > + */
> > > > +int blt_ctrl_surf_copy(int i915,
> > > > +		       const intel_ctx_t *ctx,
> > > > +		       const struct intel_execution_engine2 *e,
> > > > +		       uint64_t ahnd,
> > > > +		       const struct blt_ctrl_surf_copy_data *surf)
> > > > +{
> > > > +	struct drm_i915_gem_execbuffer2 execbuf = {};
> > > > +	struct drm_i915_gem_exec_object2 obj[3] = {};
> > > > +	uint64_t dst_offset, src_offset, bb_offset, alignment;
> > > > +
> > > > +	igt_assert_f(ahnd, "ctrl-surf-copy supports softpin only\n");
> > > > +	igt_assert_f(surf, "ctrl-surf-copy requires data to do ctrl-surf-copy blit\n");
> > > > +
> > > > +	alignment = max_t(uint64_t, gem_detect_safe_alignment(i915), 1ull << 16);
> > > > +	src_offset = get_offset(ahnd, surf->src.handle, surf->src.size, alignment);
> > > > +	dst_offset = get_offset(ahnd, surf->dst.handle, surf->dst.size, alignment);
> > > > +	bb_offset = get_offset(ahnd, surf->bb.handle, surf->bb.size, alignment);
> > > > +
> > > > +	emit_blt_ctrl_surf_copy(i915, ahnd, surf, 0, true);
> > > > +
> > > >    	obj[0].offset = CANONICAL(dst_offset);
> > > >    	obj[1].offset = CANONICAL(src_offset);
> > > >    	obj[2].offset = CANONICAL(bb_offset);
> > > > @@ -869,31 +951,31 @@ static void dump_bb_fast_cmd(struct gen12_fast_copy_data *data)
> > > >    }
> > > >    /**
> > > > - * blt_fast_copy:
> > > > + * emit_blt_fast_copy:
> > > >     * @i915: drm fd
> > > > - * @ctx: intel_ctx_t context
> > > > - * @e: blitter engine for @ctx
> > > >     * @ahnd: allocator handle
> > > >     * @blt: blitter data for fast-copy (same as for block-copy but doesn't use
> > > >     * compression fields).
> > > > + * @bb_pos: position at which insert block copy commands
> > > > + * @emit_bbe: emit MI_BATCH_BUFFER_END after fast-copy or not
> > > >     *
> > > > - * Function does fast blit between @src and @dst described in @blt object.
> > > > + * Function emits fast-copy blit between @src and @dst described in @blt object
> > > > + * at @bb_pos. Allows concatenating with other commands to
> > > > + * achieve pipelining.
> > > >     *
> > > >     * Returns:
> > > > - * execbuffer status.
> > > > + * Next write position in batch.
> > > >     */
> > > > -int blt_fast_copy(int i915,
> > > > -		  const intel_ctx_t *ctx,
> > > > -		  const struct intel_execution_engine2 *e,
> > > > -		  uint64_t ahnd,
> > > > -		  const struct blt_copy_data *blt)
> > > > +uint64_t emit_blt_fast_copy(int i915,
> > > > +			    uint64_t ahnd,
> > > > +			    const struct blt_copy_data *blt,
> > > > +			    uint64_t bb_pos,
> > > > +			    bool emit_bbe)
> > > >    {
> > > > -	struct drm_i915_gem_execbuffer2 execbuf = {};
> > > > -	struct drm_i915_gem_exec_object2 obj[3] = {};
> > > >    	struct gen12_fast_copy_data data = {};
> > > >    	uint64_t dst_offset, src_offset, bb_offset, alignment;
> > > > +	uint32_t bbe = MI_BATCH_BUFFER_END;
> > > >    	uint32_t *bb;
> > > > -	int i, ret;
> > > >    	alignment = gem_detect_safe_alignment(i915);
> > > > @@ -931,22 +1013,65 @@ int blt_fast_copy(int i915,
> > > >    	data.dw08.src_address_lo = src_offset;
> > > >    	data.dw09.src_address_hi = src_offset >> 32;
> > > > -	i = sizeof(data) / sizeof(uint32_t);
> > > >    	bb = gem_mmap__device_coherent(i915, blt->bb.handle, 0, blt->bb.size,
> > > >    				       PROT_READ | PROT_WRITE);
> > > > -	memcpy(bb, &data, sizeof(data));
> > > > -	bb[i++] = MI_BATCH_BUFFER_END;
> > > > +	igt_assert(bb_pos + sizeof(data) < blt->bb.size);
> > > > +	memcpy(bb + bb_pos, &data, sizeof(data));
> > > > +	bb_pos += sizeof(data);
> > > > +
> > > > +	if (emit_bbe) {
> > > > +		igt_assert(bb_pos + sizeof(uint32_t) < blt->bb.size);
> > > > +		memcpy(bb + bb_pos, &bbe, sizeof(bbe));
> > > > +		bb_pos += sizeof(uint32_t);
> > > > +	}
> > > >    	if (blt->print_bb) {
> > > >    		igt_info("BB [FAST COPY]\n");
> > > > -		igt_info("blit [src offset: %llx, dst offset: %llx\n",
> > > > -			 (long long) src_offset, (long long) dst_offset);
> > > > +		igt_info("src offset: %llx, dst offset: %llx, bb offset: %llx\n",
> > > > +			 (long long) src_offset, (long long) dst_offset,
> > > > +			 (long long) bb_offset);
> > > 
> > > Nitpick: as you're touching these lines anyway, could you delete the space
> > > after each cast? It's not needed.
> > 
> > I see the preferred code style is to join (type)var. Strange, imo (type)
> > var looks more readable as var is immediately visible (the joined form
> > disturbs my perception). Anyway ok, I will change this.
> 
> I ran checkpatch.pl and it complained a bit:
> > CHECK: No space is necessary after a cast
> 
> Feel free to add my r-b to v2 of this patch:
> Reviewed-by: Karolina Stolarek <karolina.stolarek@intel.com>
> 
> Thanks,
> Karolina
> 
> > 
> > --
> > Zbigniew
> > 
> > 
> > > 
> > > In general, I'm fine with the changes, but would like to clarify a couple of
> > > things above before giving r-b.
> > > 
> > > All the best,
> > > Karolina
> > > 
> > > >    		dump_bb_fast_cmd(&data);
> > > >    	}
> > > >    	munmap(bb, blt->bb.size);
> > > > +	return bb_pos;
> > > > +}
> > > > +
> > > > +/**
> > > > + * blt_fast_copy:
> > > > + * @i915: drm fd
> > > > + * @ctx: intel_ctx_t context
> > > > + * @e: blitter engine for @ctx
> > > > + * @ahnd: allocator handle
> > > > + * @blt: blitter data for fast-copy (same as for block-copy but doesn't use
> > > > + * compression fields).
> > > > + *
> > > > + * Function does fast blit between @src and @dst described in @blt object.
> > > > + *
> > > > + * Returns:
> > > > + * execbuffer status.
> > > > + */
> > > > +int blt_fast_copy(int i915,
> > > > +		  const intel_ctx_t *ctx,
> > > > +		  const struct intel_execution_engine2 *e,
> > > > +		  uint64_t ahnd,
> > > > +		  const struct blt_copy_data *blt)
> > > > +{
> > > > +	struct drm_i915_gem_execbuffer2 execbuf = {};
> > > > +	struct drm_i915_gem_exec_object2 obj[3] = {};
> > > > +	uint64_t dst_offset, src_offset, bb_offset, alignment;
> > > > +	int ret;
> > > > +
> > > > +	alignment = gem_detect_safe_alignment(i915);
> > > > +
> > > > +	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, alignment);
> > > > +	dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, alignment);
> > > > +	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, alignment);
> > > > +
> > > > +	emit_blt_fast_copy(i915, ahnd, blt, 0, true);
> > > > +
> > > >    	obj[0].offset = CANONICAL(dst_offset);
> > > >    	obj[1].offset = CANONICAL(src_offset);
> > > >    	obj[2].offset = CANONICAL(bb_offset);
> > > > diff --git a/lib/i915/i915_blt.h b/lib/i915/i915_blt.h
> > > > index e0e8b52bc2..34db9bb962 100644
> > > > --- a/lib/i915/i915_blt.h
> > > > +++ b/lib/i915/i915_blt.h
> > > > @@ -168,6 +168,13 @@ bool blt_supports_compression(int i915);
> > > >    bool blt_supports_tiling(int i915, enum blt_tiling tiling);
> > > >    const char *blt_tiling_name(enum blt_tiling tiling);
> > > > +uint64_t emit_blt_block_copy(int i915,
> > > > +			     uint64_t ahnd,
> > > > +			     const struct blt_copy_data *blt,
> > > > +			     const struct blt_block_copy_data_ext *ext,
> > > > +			     uint64_t bb_pos,
> > > > +			     bool emit_bbe);
> > > > +
> > > >    int blt_block_copy(int i915,
> > > >    		   const intel_ctx_t *ctx,
> > > >    		   const struct intel_execution_engine2 *e,
> > > > @@ -175,12 +182,24 @@ int blt_block_copy(int i915,
> > > >    		   const struct blt_copy_data *blt,
> > > >    		   const struct blt_block_copy_data_ext *ext);
> > > > +uint64_t emit_blt_ctrl_surf_copy(int i915,
> > > > +				 uint64_t ahnd,
> > > > +				 const struct blt_ctrl_surf_copy_data *surf,
> > > > +				 uint64_t bb_pos,
> > > > +				 bool emit_bbe);
> > > > +
> > > >    int blt_ctrl_surf_copy(int i915,
> > > >    		       const intel_ctx_t *ctx,
> > > >    		       const struct intel_execution_engine2 *e,
> > > >    		       uint64_t ahnd,
> > > >    		       const struct blt_ctrl_surf_copy_data *surf);
> > > > +uint64_t emit_blt_fast_copy(int i915,
> > > > +			    uint64_t ahnd,
> > > > +			    const struct blt_copy_data *blt,
> > > > +			    uint64_t bb_pos,
> > > > +			    bool emit_bbe);
> > > > +
> > > >    int blt_fast_copy(int i915,
> > > >    		  const intel_ctx_t *ctx,
> > > >    		  const struct intel_execution_engine2 *e,

* Re: [igt-dev] [PATCH i-g-t 1/3] lib/i915_blt: Remove src == dst pitch restriction
  2022-12-15  8:41       ` Karolina Stolarek
@ 2022-12-15 10:43         ` Karolina Stolarek
  0 siblings, 0 replies; 20+ messages in thread
From: Karolina Stolarek @ 2022-12-15 10:43 UTC (permalink / raw)
  To: Zbigniew Kempczyński; +Cc: igt-dev

On 15.12.2022 09:41, Karolina Stolarek wrote:
> On 14.12.2022 19:57, Zbigniew Kempczyński wrote:
>> On Tue, Dec 13, 2022 at 04:37:26PM +0100, Karolina Stolarek wrote:
>>> On 12.12.2022 13:50, Zbigniew Kempczyński wrote:
>>>> During the debugging phase we established that it's not necessary to enforce
>>>> the same pitch for the destination surface in FULL_RESOLVE mode. Relax this
>>>> condition, allowing the caller to pass the required surface configuration.
>>>>
>>>> Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
>>>> ---
>>>>    lib/i915/i915_blt.c | 8 +++-----
>>>>    1 file changed, 3 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/lib/i915/i915_blt.c b/lib/i915/i915_blt.c
>>>> index 3776c56c60..42c28623f9 100644
>>>> --- a/lib/i915/i915_blt.c
>>>> +++ b/lib/i915/i915_blt.c
>>>> @@ -317,13 +317,11 @@ static void fill_data(struct gen12_block_copy_data *data,
>>>>        data->dw00.special_mode = __special_mode(blt);
>>>>        data->dw00.length = extended_command ? 20 : 10;
>>>> -    if (__special_mode(blt) == SM_FULL_RESOLVE && blt->src.tiling == T_TILE64) {
>>>> -        data->dw01.dst_pitch = blt->src.pitch - 1;
>>>> +    if (__special_mode(blt) == SM_FULL_RESOLVE)
>>>
>>> The "blt->src.tiling == T_TILE64" part was introduced in commit cdc0258a4672
>>> ("lib/i915_blt: Narrow setting same pitch and aux-ccs to Tile64 only").
>>> I know it's a bit nit-picky, but could we revert that change (as it
>>> looks like it's not necessary), and have a separate commit that removes
>>> data->dw01.dst_pitch = blt->src.pitch - 1?
>>
>> Two reverts would be necessary: cdc0258a467 and 229bb0accbb7, so a single
>> commit instead of three looks better to me. Testing of in-place tiling is
>> still in progress. If you think it is better to revert the above and add
>> another commit, I'll do it.
> 
> To me, some of 229bb0accbb7 is still here (setting aux_mode), so I'd
> say reverting it is not necessary. But, yeah, I think it would be
> better to revert cdc0258a467.

After further discussions with Zbigniew, we decided not to revert
cdc0258a467. The original implementation isn't correct, and reverting
two patches with a change on top of them only muddies the waters.
Let's keep things simple, and go with this patch.
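
For anyone reading the archive later, the net effect of this patch on the
destination fields can be sketched roughly as below. This is only an
illustration of the logic in the diff: every sketch_* name is hypothetical
and stands in for the real gen12_block_copy_data / blt_copy_data types,
and aux_mode is a plain field here rather than the result of __aux_mode().

```c
#include <stdint.h>

/* Hypothetical stand-ins for the IGT types; not the library's definitions. */
enum sketch_special_mode { SKETCH_SM_NONE, SKETCH_SM_FULL_RESOLVE };

struct sketch_surface {
	uint32_t pitch;		/* bytes per row, as configured by the caller */
	uint32_t aux_mode;	/* stand-in for the value __aux_mode() would return */
};

struct sketch_dw01 {
	uint32_t dst_pitch;
	uint32_t dst_aux_mode;
};

static struct sketch_dw01 sketch_fill_dst(enum sketch_special_mode sm,
					  const struct sketch_surface *src,
					  const struct sketch_surface *dst)
{
	struct sketch_dw01 dw = {0};

	/* Full resolve still takes the aux mode from the source surface. */
	if (sm == SKETCH_SM_FULL_RESOLVE)
		dw.dst_aux_mode = src->aux_mode;
	else
		dw.dst_aux_mode = dst->aux_mode;

	/* The pitch is no longer forced to match src; it always comes from dst. */
	dw.dst_pitch = dst->pitch - 1;

	return dw;
}
```

With a 512-byte src pitch and a 256-byte dst pitch, the sketch programs
dst_pitch as 255 in both modes; only the aux mode depends on full resolve.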

All the best,
Karolina

>>
>>>
>>> The question remains -- what actually caused the regression for Tile4
>>> and xmajor? Was it because of the pitch settings? I ran the tests a
>>> couple of times in a row after applying this patch and everything
>>> worked as expected.
>>
>> Yes, the dst pitch influences the destination surface state. For a cascaded
>> src->mid->dst blit, mid.pitch may differ from dst.pitch (tileX -> linear blit).
>> For in-place there's some side effect which we're debugging now.
> 
> OK, so that seems like the root cause. As for in-place, I think it can be
> slightly different, but we can update the function once debugging is
> finished.
> 
> All the best,
> Karolina
> 
>>
>> -- 
>> Zbigniew
>>
>>>
>>> All the best,
>>> Karolina
>>>
>>>>            data->dw01.dst_aux_mode = __aux_mode(&blt->src);
>>>> -    } else {
>>>> -        data->dw01.dst_pitch = blt->dst.pitch - 1;
>>>> +    else
>>>>            data->dw01.dst_aux_mode = __aux_mode(&blt->dst);
>>>> -    }
>>>> +    data->dw01.dst_pitch = blt->dst.pitch - 1;
>>>>        data->dw01.dst_mocs = blt->dst.mocs;
>>>>        data->dw01.dst_compression = blt->dst.compression;

* Re: [igt-dev] [PATCH i-g-t 2/3] lib/i915_blt: Extract blit emit functions
  2022-12-15 10:41         ` Zbigniew Kempczyński
@ 2022-12-15 10:47           ` Karolina Stolarek
  0 siblings, 0 replies; 20+ messages in thread
From: Karolina Stolarek @ 2022-12-15 10:47 UTC (permalink / raw)
  To: Zbigniew Kempczyński; +Cc: igt-dev

On 15.12.2022 11:41, Zbigniew Kempczyński wrote:
> On Thu, Dec 15, 2022 at 09:42:17AM +0100, Karolina Stolarek wrote:
>> On 14.12.2022 20:40, Zbigniew Kempczyński wrote:
>>> On Tue, Dec 13, 2022 at 04:39:14PM +0100, Karolina Stolarek wrote:
>>>> On 12.12.2022 13:50, Zbigniew Kempczyński wrote:
>>>>> Add some flexibility in building user pipelines by extracting the blitter
>>>>> emission code into dedicated functions. The previous blitter functions, which
>>>>> do one blit-and-execute, are rewritten to use those functions.
>>>>> Requires usage with a stateful allocator (an offset might be acquired more
>>>>> than once, so it must not change).
>>>>>
>>>>> Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
>>>>> ---
>>>>>     lib/i915/i915_blt.c | 263 ++++++++++++++++++++++++++++++++------------
>>>>>     lib/i915/i915_blt.h |  19 ++++
>>>>>     2 files changed, 213 insertions(+), 69 deletions(-)
>>>>>
>>>>> diff --git a/lib/i915/i915_blt.c b/lib/i915/i915_blt.c
>>>>> index 42c28623f9..32ad608775 100644
>>>>> --- a/lib/i915/i915_blt.c
>>>>> +++ b/lib/i915/i915_blt.c
>>>>> @@ -503,58 +503,61 @@ static void dump_bb_ext(struct gen12_block_copy_data_ext *data)
>>>>>     }
>>>>>     /**
>>>>> - * blt_block_copy:
>>>>> + * emit_blt_block_copy:
>>>>>      * @i915: drm fd
>>>>> - * @ctx: intel_ctx_t context
>>>>> - * @e: blitter engine for @ctx
>>>>>      * @ahnd: allocator handle
>>>>>      * @blt: basic blitter data (for TGL/DG1 which doesn't support ext version)
>>>>>      * @ext: extended blitter data (for DG2+, supports flatccs compression)
>>>>> + * @bb_pos: position at which insert block copy commands
>>>>> + * @emit_bbe: emit MI_BATCH_BUFFER_END after block-copy or not
>>>>>      *
>>>>> - * Function does blit between @src and @dst described in @blt object.
>>>>> + * Function inserts block-copy blit into batch at @bb_pos. Allows concatenating
>>>>> + * with other commands to achieve pipelining.
>>>>>      *
>>>>>      * Returns:
>>>>> - * execbuffer status.
>>>>> + * Next write position in batch.
>>>>>      */
>>>>> -int blt_block_copy(int i915,
>>>>> -		   const intel_ctx_t *ctx,
>>>>> -		   const struct intel_execution_engine2 *e,
>>>>> -		   uint64_t ahnd,
>>>>> -		   const struct blt_copy_data *blt,
>>>>> -		   const struct blt_block_copy_data_ext *ext)
>>>>> +uint64_t emit_blt_block_copy(int i915,
>>>>> +			     uint64_t ahnd,
>>>>> +			     const struct blt_copy_data *blt,
>>>>> +			     const struct blt_block_copy_data_ext *ext,
>>>>> +			     uint64_t bb_pos,
>>>>> +			     bool emit_bbe)
>>>>>     {
>>>>> -	struct drm_i915_gem_execbuffer2 execbuf = {};
>>>>> -	struct drm_i915_gem_exec_object2 obj[3] = {};
>>>>>     	struct gen12_block_copy_data data = {};
>>>>>     	struct gen12_block_copy_data_ext dext = {};
>>>>> -	uint64_t dst_offset, src_offset, bb_offset, alignment;
>>>>> -	uint32_t *bb;
>>>>> -	int i, ret;
>>>>> +	uint64_t dst_offset, src_offset, bb_offset;
>>>>> +	uint32_t bbe = MI_BATCH_BUFFER_END;
>>>>> +	uint8_t *bb;
>>>>>     	igt_assert_f(ahnd, "block-copy supports softpin only\n");
>>>>>     	igt_assert_f(blt, "block-copy requires data to do blit\n");
>>>>> -	alignment = gem_detect_safe_alignment(i915);
>>>>> -	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, alignment);
>>>>> -	if (__special_mode(blt) == SM_FULL_RESOLVE)
>>>>> -		dst_offset = src_offset;
>>>>> -	else
>>>>> -		dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, alignment);
>>>>> -	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, alignment);
>>>>> +	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, 0);
>>>>> +	dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, 0);
>>>>> +	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, 0);
>>>>
>>>> Why do we pass 0 as object alignment here? In surf-copy and fast-copy we
>>>> pass "alignment" in.
>>>
>>> After rethinking, you're right: the caller may open the allocator
>>> with an unacceptable alignment (like 4K where we use smem and lmem regions),
>>> so enforcing safe alignment will protect us from ENOSPC.
>>>
>>> This will be sent in v2 after you comment on my reply to the previous patch.
>>> Anyway, thanks for pointing this out.
>>
>> Right, thanks for taking care of it!
>>
>>>
>>>>
>>>>>     	fill_data(&data, blt, src_offset, dst_offset, ext);
>>>>> -	i = sizeof(data) / sizeof(uint32_t);
>>>>>     	bb = gem_mmap__device_coherent(i915, blt->bb.handle, 0, blt->bb.size,
>>>>>     				       PROT_READ | PROT_WRITE);
>>>>> -	memcpy(bb, &data, sizeof(data));
>>>>> +
>>>>> +	igt_assert(bb_pos + sizeof(data) < blt->bb.size);
>>>>
>>>> I'd say we need extra space for potential MI_BATCH_BUFFER_END here
>>>
>>> I don't assume how many steps (memcpy's to bb) there will be in this blit,
>>> so I incrementally check whether there's enough space in bb. See all the
>>> igt_assert() calls below: one for dext and a second for bbe (notice that
>>> bbe might not be emitted, but it is the caller's responsibility to handle
>>> the case where the emitted instructions consumed all bb dwords).
>>
>> I meant the scenario where we pass emit_bbe=false and the caller wants to
>> add more data. Now that I think about it, it seems like quite an
>> abstract scenario from this function's perspective, so I think we're good
>> here.
> 
> The caller receives the current position in the batch, and it is the caller's
> responsibility to check whether there's enough space left in the buffer to
> add anything there.

Makes sense
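
To make the contract concrete, here is a minimal sketch of the pattern,
assuming a plain byte array in place of the mmapped batch. The sketch_*
names and SKETCH_BBE are hypothetical and are not part of lib/i915_blt;
the "<" bounds checks mirror the ones in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder terminator; the real MI_BATCH_BUFFER_END comes from IGT headers. */
#define SKETCH_BBE 0x05000000u

/*
 * Copy len bytes of cmd into bb at byte offset pos, optionally followed by
 * a terminator, and return the next write position (also in bytes).
 * bb is a byte pointer so that "bb + pos" advances by bytes, not dwords.
 */
static uint64_t sketch_emit(uint8_t *bb, uint64_t bb_size, uint64_t pos,
			    const void *cmd, size_t len, bool emit_bbe)
{
	const uint32_t bbe = SKETCH_BBE;

	assert(pos + len < bb_size);
	memcpy(bb + pos, cmd, len);
	pos += len;

	if (emit_bbe) {
		assert(pos + sizeof(bbe) < bb_size);
		memcpy(bb + pos, &bbe, sizeof(bbe));
		pos += sizeof(bbe);
	}

	return pos;
}

/* Caller-side check: room for len more bytes plus a final terminator? */
static bool sketch_has_room(uint64_t bb_size, uint64_t pos, size_t len)
{
	return pos + len + sizeof(uint32_t) < bb_size;
}
```

A caller chaining two commands would pass emit_bbe=false for the first,
test sketch_has_room() with the returned position, and pass emit_bbe=true
only for the last command in the batch.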

Thanks,
Karolina

> --
> Zbigniew
> 
>>
>>>
>>>>
>>>>> +	memcpy(bb + bb_pos, &data, sizeof(data));
>>>>> +	bb_pos += sizeof(data);
>>>>>     	if (ext) {
>>>>>     		fill_data_ext(&dext, ext);
>>>>> -		memcpy(bb + i, &dext, sizeof(dext));
>>>>> -		i += sizeof(dext) / sizeof(uint32_t);
>>>>> +		igt_assert(bb_pos + sizeof(dext) < blt->bb.size);
>>>>> +		memcpy(bb + bb_pos, &dext, sizeof(dext));
>>>>> +		bb_pos += sizeof(dext);
>>>>> +	}
>>>>> +
>>>>> +	if (emit_bbe) {
>>>>> +		igt_assert(bb_pos + sizeof(uint32_t) < blt->bb.size);
>>>>> +		memcpy(bb + bb_pos, &bbe, sizeof(bbe));
>>>>> +		bb_pos += sizeof(uint32_t);
>>>>>     	}
>>>>> -	bb[i++] = MI_BATCH_BUFFER_END;
>>>>>     	if (blt->print_bb) {
>>>>>     		igt_info("[BLOCK COPY]\n");
>>>>> @@ -569,6 +572,44 @@ int blt_block_copy(int i915,
>>>>>     	munmap(bb, blt->bb.size);
>>>>> +	return bb_pos;
>>>>> +}
>>>>> +
>>>>> +/**
>>>>> + * blt_block_copy:
>>>>> + * @i915: drm fd
>>>>> + * @ctx: intel_ctx_t context
>>>>> + * @e: blitter engine for @ctx
>>>>> + * @ahnd: allocator handle
>>>>> + * @blt: basic blitter data (for TGL/DG1 which doesn't support ext version)
>>>>> + * @ext: extended blitter data (for DG2+, supports flatccs compression)
>>>>> + *
>>>>> + * Function does blit between @src and @dst described in @blt object.
>>>>> + *
>>>>> + * Returns:
>>>>> + * execbuffer status.
>>>>> + */
>>>>> +int blt_block_copy(int i915,
>>>>> +		   const intel_ctx_t *ctx,
>>>>> +		   const struct intel_execution_engine2 *e,
>>>>> +		   uint64_t ahnd,
>>>>> +		   const struct blt_copy_data *blt,
>>>>> +		   const struct blt_block_copy_data_ext *ext)
>>>>> +{
>>>>> +	struct drm_i915_gem_execbuffer2 execbuf = {};
>>>>> +	struct drm_i915_gem_exec_object2 obj[3] = {};
>>>>> +	uint64_t dst_offset, src_offset, bb_offset;
>>>>> +	int ret;
>>>>> +
>>>>> +	igt_assert_f(ahnd, "block-copy supports softpin only\n");
>>>>> +	igt_assert_f(blt, "block-copy requires data to do blit\n");
>>>>> +
>>>>> +	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, 0);
>>>>> +	dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, 0);
>>>>> +	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, 0);
>>>>> +
>>>>> +	emit_blt_block_copy(i915, ahnd, blt, ext, 0, true);
>>>>> +
>>>>>     	obj[0].offset = CANONICAL(dst_offset);
>>>>>     	obj[1].offset = CANONICAL(src_offset);
>>>>>     	obj[2].offset = CANONICAL(bb_offset);
>>>>> @@ -655,31 +696,30 @@ static void dump_bb_surf_ctrl_cmd(const struct gen12_ctrl_surf_copy_data *data)
>>>>>     }
>>>>>     /**
>>>>> - * blt_ctrl_surf_copy:
>>>>> + * emit_blt_ctrl_surf_copy:
>>>>>      * @i915: drm fd
>>>>> - * @ctx: intel_ctx_t context
>>>>> - * @e: blitter engine for @ctx
>>>>>      * @ahnd: allocator handle
>>>>>      * @surf: blitter data for ctrl-surf-copy
>>>>> + * @bb_pos: position at which insert block copy commands
>>>>> + * @emit_bbe: emit MI_BATCH_BUFFER_END after ctrl-surf-copy or not
>>>>>      *
>>>>> - * Function does ctrl-surf-copy blit between @src and @dst described in
>>>>> - * @blt object.
>>>>> + * Function emits ctrl-surf-copy blit between @src and @dst described in
>>>>> + * @blt object at @bb_pos. Allows concatenating with other commands to
>>>>> + * achieve pipelining.
>>>>>      *
>>>>>      * Returns:
>>>>> - * execbuffer status.
>>>>> + * Next write position in batch.
>>>>>      */
>>>>> -int blt_ctrl_surf_copy(int i915,
>>>>> -		       const intel_ctx_t *ctx,
>>>>> -		       const struct intel_execution_engine2 *e,
>>>>> -		       uint64_t ahnd,
>>>>> -		       const struct blt_ctrl_surf_copy_data *surf)
>>>>> +uint64_t emit_blt_ctrl_surf_copy(int i915,
>>>>> +				 uint64_t ahnd,
>>>>> +				 const struct blt_ctrl_surf_copy_data *surf,
>>>>> +				 uint64_t bb_pos,
>>>>> +				 bool emit_bbe)
>>>>>     {
>>>>> -	struct drm_i915_gem_execbuffer2 execbuf = {};
>>>>> -	struct drm_i915_gem_exec_object2 obj[3] = {};
>>>>>     	struct gen12_ctrl_surf_copy_data data = {};
>>>>>     	uint64_t dst_offset, src_offset, bb_offset, alignment;
>>>>> +	uint32_t bbe = MI_BATCH_BUFFER_END;
>>>>>     	uint32_t *bb;
>>>>> -	int i;
>>>>>     	igt_assert_f(ahnd, "ctrl-surf-copy supports softpin only\n");
>>>>>     	igt_assert_f(surf, "ctrl-surf-copy requires data to do ctrl-surf-copy blit\n");
>>>>> @@ -695,12 +735,9 @@ int blt_ctrl_surf_copy(int i915,
>>>>>     	data.dw00.size_of_ctrl_copy = __ccs_size(surf) / CCS_RATIO - 1;
>>>>>     	data.dw00.length = 0x3;
>>>>> -	src_offset = get_offset(ahnd, surf->src.handle, surf->src.size,
>>>>> -				alignment);
>>>>> -	dst_offset = get_offset(ahnd, surf->dst.handle, surf->dst.size,
>>>>> -				alignment);
>>>>> -	bb_offset = get_offset(ahnd, surf->bb.handle, surf->bb.size,
>>>>> -			       alignment);
>>>>> +	src_offset = get_offset(ahnd, surf->src.handle, surf->src.size, alignment);
>>>>> +	dst_offset = get_offset(ahnd, surf->dst.handle, surf->dst.size, alignment);
>>>>> +	bb_offset = get_offset(ahnd, surf->bb.handle, surf->bb.size, alignment);
>>>>>     	data.dw01.src_address_lo = src_offset;
>>>>>     	data.dw02.src_address_hi = src_offset >> 32;
>>>>> @@ -710,11 +747,18 @@ int blt_ctrl_surf_copy(int i915,
>>>>>     	data.dw04.dst_address_hi = dst_offset >> 32;
>>>>>     	data.dw04.dst_mocs = surf->dst.mocs;
>>>>> -	i = sizeof(data) / sizeof(uint32_t);
>>>>>     	bb = gem_mmap__device_coherent(i915, surf->bb.handle, 0, surf->bb.size,
>>>>>     				       PROT_READ | PROT_WRITE);
>>>>> -	memcpy(bb, &data, sizeof(data));
>>>>> -	bb[i++] = MI_BATCH_BUFFER_END;
>>>>> +
>>>>> +	igt_assert(bb_pos + sizeof(data) < surf->bb.size);
>>>>> +	memcpy(bb + bb_pos, &data, sizeof(data));
>>>>> +	bb_pos += sizeof(data);
>>>>> +
>>>>> +	if (emit_bbe) {
>>>>> +		igt_assert(bb_pos + sizeof(uint32_t) < surf->bb.size);
>>>>> +		memcpy(bb + bb_pos, &bbe, sizeof(bbe));
>>>>> +		bb_pos += sizeof(uint32_t);
>>>>> +	}
>>>>>     	if (surf->print_bb) {
>>>>>     		igt_info("BB [CTRL SURF]:\n");
>>>>> @@ -724,8 +768,46 @@ int blt_ctrl_surf_copy(int i915,
>>>>>     		dump_bb_surf_ctrl_cmd(&data);
>>>>>     	}
>>>>> +
>>>>>     	munmap(bb, surf->bb.size);
>>>>> +	return bb_pos;
>>>>> +}
>>>>> +
>>>>> +/**
>>>>> + * blt_ctrl_surf_copy:
>>>>> + * @i915: drm fd
>>>>> + * @ctx: intel_ctx_t context
>>>>> + * @e: blitter engine for @ctx
>>>>> + * @ahnd: allocator handle
>>>>> + * @surf: blitter data for ctrl-surf-copy
>>>>> + *
>>>>> + * Function does ctrl-surf-copy blit between @src and @dst described in
>>>>> + * @blt object.
>>>>> + *
>>>>> + * Returns:
>>>>> + * execbuffer status.
>>>>> + */
>>>>> +int blt_ctrl_surf_copy(int i915,
>>>>> +		       const intel_ctx_t *ctx,
>>>>> +		       const struct intel_execution_engine2 *e,
>>>>> +		       uint64_t ahnd,
>>>>> +		       const struct blt_ctrl_surf_copy_data *surf)
>>>>> +{
>>>>> +	struct drm_i915_gem_execbuffer2 execbuf = {};
>>>>> +	struct drm_i915_gem_exec_object2 obj[3] = {};
>>>>> +	uint64_t dst_offset, src_offset, bb_offset, alignment;
>>>>> +
>>>>> +	igt_assert_f(ahnd, "ctrl-surf-copy supports softpin only\n");
>>>>> +	igt_assert_f(surf, "ctrl-surf-copy requires data to do ctrl-surf-copy blit\n");
>>>>> +
>>>>> +	alignment = max_t(uint64_t, gem_detect_safe_alignment(i915), 1ull << 16);
>>>>> +	src_offset = get_offset(ahnd, surf->src.handle, surf->src.size, alignment);
>>>>> +	dst_offset = get_offset(ahnd, surf->dst.handle, surf->dst.size, alignment);
>>>>> +	bb_offset = get_offset(ahnd, surf->bb.handle, surf->bb.size, alignment);
>>>>> +
>>>>> +	emit_blt_ctrl_surf_copy(i915, ahnd, surf, 0, true);
>>>>> +
>>>>>     	obj[0].offset = CANONICAL(dst_offset);
>>>>>     	obj[1].offset = CANONICAL(src_offset);
>>>>>     	obj[2].offset = CANONICAL(bb_offset);
>>>>> @@ -869,31 +951,31 @@ static void dump_bb_fast_cmd(struct gen12_fast_copy_data *data)
>>>>>     }
>>>>>     /**
>>>>> - * blt_fast_copy:
>>>>> + * emit_blt_fast_copy:
>>>>>      * @i915: drm fd
>>>>> - * @ctx: intel_ctx_t context
>>>>> - * @e: blitter engine for @ctx
>>>>>      * @ahnd: allocator handle
>>>>>      * @blt: blitter data for fast-copy (same as for block-copy but doesn't use
>>>>>      * compression fields).
>>>>> + * @bb_pos: position at which insert block copy commands
>>>>> + * @emit_bbe: emit MI_BATCH_BUFFER_END after fast-copy or not
>>>>>      *
>>>>> - * Function does fast blit between @src and @dst described in @blt object.
>>>>> + * Function emits fast-copy blit between @src and @dst described in @blt object
>>>>> + * at @bb_pos. Allows concatenating with other commands to
>>>>> + * achieve pipelining.
>>>>>      *
>>>>>      * Returns:
>>>>> - * execbuffer status.
>>>>> + * Next write position in batch.
>>>>>      */
>>>>> -int blt_fast_copy(int i915,
>>>>> -		  const intel_ctx_t *ctx,
>>>>> -		  const struct intel_execution_engine2 *e,
>>>>> -		  uint64_t ahnd,
>>>>> -		  const struct blt_copy_data *blt)
>>>>> +uint64_t emit_blt_fast_copy(int i915,
>>>>> +			    uint64_t ahnd,
>>>>> +			    const struct blt_copy_data *blt,
>>>>> +			    uint64_t bb_pos,
>>>>> +			    bool emit_bbe)
>>>>>     {
>>>>> -	struct drm_i915_gem_execbuffer2 execbuf = {};
>>>>> -	struct drm_i915_gem_exec_object2 obj[3] = {};
>>>>>     	struct gen12_fast_copy_data data = {};
>>>>>     	uint64_t dst_offset, src_offset, bb_offset, alignment;
>>>>> +	uint32_t bbe = MI_BATCH_BUFFER_END;
>>>>>     	uint32_t *bb;
>>>>> -	int i, ret;
>>>>>     	alignment = gem_detect_safe_alignment(i915);
>>>>> @@ -931,22 +1013,65 @@ int blt_fast_copy(int i915,
>>>>>     	data.dw08.src_address_lo = src_offset;
>>>>>     	data.dw09.src_address_hi = src_offset >> 32;
>>>>> -	i = sizeof(data) / sizeof(uint32_t);
>>>>>     	bb = gem_mmap__device_coherent(i915, blt->bb.handle, 0, blt->bb.size,
>>>>>     				       PROT_READ | PROT_WRITE);
>>>>> -	memcpy(bb, &data, sizeof(data));
>>>>> -	bb[i++] = MI_BATCH_BUFFER_END;
>>>>> +	igt_assert(bb_pos + sizeof(data) < blt->bb.size);
>>>>> +	memcpy(bb + bb_pos, &data, sizeof(data));
>>>>> +	bb_pos += sizeof(data);
>>>>> +
>>>>> +	if (emit_bbe) {
>>>>> +		igt_assert(bb_pos + sizeof(uint32_t) < blt->bb.size);
>>>>> +		memcpy(bb + bb_pos, &bbe, sizeof(bbe));
>>>>> +		bb_pos += sizeof(uint32_t);
>>>>> +	}
>>>>>     	if (blt->print_bb) {
>>>>>     		igt_info("BB [FAST COPY]\n");
>>>>> -		igt_info("blit [src offset: %llx, dst offset: %llx\n",
>>>>> -			 (long long) src_offset, (long long) dst_offset);
>>>>> +		igt_info("src offset: %llx, dst offset: %llx, bb offset: %llx\n",
>>>>> +			 (long long) src_offset, (long long) dst_offset,
>>>>> +			 (long long) bb_offset);
>>>>
>>>> Nitpick: as you're touching these lines anyway, could you delete the space
>>>> after the cast? It's not needed.
>>>
>>> I see the preferred code style is to join (type)var. Strange; imo (type)
>>> var looks more readable, as var is immediately visible (the joined form
>>> disturbs my perception). Anyway, ok, I will change this.
>>
>> I ran checkpatch.pl and it complained a bit:
>>> CHECK: No space is necessary after a cast
>>
>> Feel free to add my r-b to v2 of this patch:
>> Reviewed-by: Karolina Stolarek <karolina.stolarek@intel.com>
>>
>> Thanks,
>> Karolina
>>
>>>
>>> --
>>> Zbigniew
>>>
>>>
>>>>
>>>> In general, I'm fine with the changes, but would like to clarify a couple of
>>>> things above before giving r-b.
>>>>
>>>> All the best,
>>>> Karolina
>>>>
>>>>>     		dump_bb_fast_cmd(&data);
>>>>>     	}
>>>>>     	munmap(bb, blt->bb.size);
>>>>> +	return bb_pos;
>>>>> +}
>>>>> +
>>>>> +/**
>>>>> + * blt_fast_copy:
>>>>> + * @i915: drm fd
>>>>> + * @ctx: intel_ctx_t context
>>>>> + * @e: blitter engine for @ctx
>>>>> + * @ahnd: allocator handle
>>>>> + * @blt: blitter data for fast-copy (same as for block-copy but doesn't use
>>>>> + * compression fields).
>>>>> + *
>>>>> + * Function does fast blit between @src and @dst described in @blt object.
>>>>> + *
>>>>> + * Returns:
>>>>> + * execbuffer status.
>>>>> + */
>>>>> +int blt_fast_copy(int i915,
>>>>> +		  const intel_ctx_t *ctx,
>>>>> +		  const struct intel_execution_engine2 *e,
>>>>> +		  uint64_t ahnd,
>>>>> +		  const struct blt_copy_data *blt)
>>>>> +{
>>>>> +	struct drm_i915_gem_execbuffer2 execbuf = {};
>>>>> +	struct drm_i915_gem_exec_object2 obj[3] = {};
>>>>> +	uint64_t dst_offset, src_offset, bb_offset, alignment;
>>>>> +	int ret;
>>>>> +
>>>>> +	alignment = gem_detect_safe_alignment(i915);
>>>>> +
>>>>> +	src_offset = get_offset(ahnd, blt->src.handle, blt->src.size, alignment);
>>>>> +	dst_offset = get_offset(ahnd, blt->dst.handle, blt->dst.size, alignment);
>>>>> +	bb_offset = get_offset(ahnd, blt->bb.handle, blt->bb.size, alignment);
>>>>> +
>>>>> +	emit_blt_fast_copy(i915, ahnd, blt, 0, true);
>>>>> +
>>>>>     	obj[0].offset = CANONICAL(dst_offset);
>>>>>     	obj[1].offset = CANONICAL(src_offset);
>>>>>     	obj[2].offset = CANONICAL(bb_offset);
>>>>> diff --git a/lib/i915/i915_blt.h b/lib/i915/i915_blt.h
>>>>> index e0e8b52bc2..34db9bb962 100644
>>>>> --- a/lib/i915/i915_blt.h
>>>>> +++ b/lib/i915/i915_blt.h
>>>>> @@ -168,6 +168,13 @@ bool blt_supports_compression(int i915);
>>>>>     bool blt_supports_tiling(int i915, enum blt_tiling tiling);
>>>>>     const char *blt_tiling_name(enum blt_tiling tiling);
>>>>> +uint64_t emit_blt_block_copy(int i915,
>>>>> +			     uint64_t ahnd,
>>>>> +			     const struct blt_copy_data *blt,
>>>>> +			     const struct blt_block_copy_data_ext *ext,
>>>>> +			     uint64_t bb_pos,
>>>>> +			     bool emit_bbe);
>>>>> +
>>>>>     int blt_block_copy(int i915,
>>>>>     		   const intel_ctx_t *ctx,
>>>>>     		   const struct intel_execution_engine2 *e,
>>>>> @@ -175,12 +182,24 @@ int blt_block_copy(int i915,
>>>>>     		   const struct blt_copy_data *blt,
>>>>>     		   const struct blt_block_copy_data_ext *ext);
>>>>> +uint64_t emit_blt_ctrl_surf_copy(int i915,
>>>>> +				 uint64_t ahnd,
>>>>> +				 const struct blt_ctrl_surf_copy_data *surf,
>>>>> +				 uint64_t bb_pos,
>>>>> +				 bool emit_bbe);
>>>>> +
>>>>>     int blt_ctrl_surf_copy(int i915,
>>>>>     		       const intel_ctx_t *ctx,
>>>>>     		       const struct intel_execution_engine2 *e,
>>>>>     		       uint64_t ahnd,
>>>>>     		       const struct blt_ctrl_surf_copy_data *surf);
>>>>> +uint64_t emit_blt_fast_copy(int i915,
>>>>> +			    uint64_t ahnd,
>>>>> +			    const struct blt_copy_data *blt,
>>>>> +			    uint64_t bb_pos,
>>>>> +			    bool emit_bbe);
>>>>> +
>>>>>     int blt_fast_copy(int i915,
>>>>>     		  const intel_ctx_t *ctx,
>>>>>     		  const struct intel_execution_engine2 *e,
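
The emit_* helpers declared above return the next write position so that
several commands can be packed into one batch. A minimal, self-contained sketch
of that chaining pattern follows. It is not code from the patch: emit_cmd(),
emit_three() and SKETCH_BBE are hypothetical names, and the command payload is
dummy data. Positions are byte offsets here, so the buffer is addressed through
a byte pointer (adding a byte offset to a uint32_t pointer would be scaled by
four in C).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder for the batch terminator; the real value comes from the
 * MI_BATCH_BUFFER_END definition in the driver headers. */
#define SKETCH_BBE 0x05000000u

/*
 * Copy one command of len bytes into the batch at byte offset pos and
 * optionally append the terminator. Returns the next write position,
 * or UINT64_MAX if the command would not fit.
 */
static uint64_t emit_cmd(void *bb, uint64_t bb_size,
			 const void *cmd, size_t len,
			 uint64_t pos, bool emit_bbe)
{
	uint8_t *p = bb;		/* byte pointer: offsets are unscaled */
	uint32_t bbe = SKETCH_BBE;
	uint64_t need = len + (emit_bbe ? sizeof(bbe) : 0);

	if (pos > bb_size || need > bb_size - pos)
		return UINT64_MAX;

	memcpy(p + pos, cmd, len);
	pos += len;

	if (emit_bbe) {
		memcpy(p + pos, &bbe, sizeof(bbe));
		pos += sizeof(bbe);
	}

	return pos;
}

/* Chain three dummy commands; only the last call terminates the batch. */
static uint64_t emit_three(void *bb, uint64_t bb_size)
{
	const uint32_t cmd[4] = { 1, 2, 3, 4 };
	uint64_t pos = 0;

	pos = emit_cmd(bb, bb_size, cmd, sizeof(cmd), pos, false);
	pos = emit_cmd(bb, bb_size, cmd, sizeof(cmd), pos, false);
	pos = emit_cmd(bb, bb_size, cmd, sizeof(cmd), pos, true);

	return pos;
}
```

Returning the position instead of an execbuf status leaves submission to the
caller, which is what lets one wrapper submit a single command and another
submit a sequence.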


* Re: [igt-dev] [PATCH i-g-t 3/3] tests/gem_ccs: Add block-multicopy subtest
  2022-12-15  8:42       ` Karolina Stolarek
@ 2022-12-15 11:00         ` Zbigniew Kempczyński
  2022-12-15 11:47           ` Karolina Stolarek
  0 siblings, 1 reply; 20+ messages in thread
From: Zbigniew Kempczyński @ 2022-12-15 11:00 UTC (permalink / raw)
  To: Karolina Stolarek; +Cc: igt-dev

On Thu, Dec 15, 2022 at 09:42:56AM +0100, Karolina Stolarek wrote:
> On 14.12.2022 20:57, Zbigniew Kempczyński wrote:
> > On Tue, Dec 13, 2022 at 04:40:38PM +0100, Karolina Stolarek wrote:
> > > On 12.12.2022 13:50, Zbigniew Kempczyński wrote:
> > > > Exercise sequence of blits packed in single batch. It may reveal
> > > > flushing/decompressing/detiling problems during execution.
> > > > multicopy-inplace version differs from copy-inplace version with
> > > > additional blit to same tiling format during decompression to separate
> > > > problems visible in decompressing/detiling in one step.
> > > > 
> > > > Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
> > > > ---
> > > >    tests/i915/gem_ccs.c | 262 ++++++++++++++++++++++++++++++++++++++++---
> > > >    1 file changed, 246 insertions(+), 16 deletions(-)
> > > > 
> > > > diff --git a/tests/i915/gem_ccs.c b/tests/i915/gem_ccs.c
> > > > index 4ecb3e36ac..6b5f199ec7 100644
> > > > --- a/tests/i915/gem_ccs.c
> > > > +++ b/tests/i915/gem_ccs.c
> > > > @@ -262,6 +262,119 @@ static void surf_copy(int i915,
> > > >    	gem_close(i915, ccs);
> > > >    }
> > > > +struct blt_copy3_data {
> > > > +	int i915;
> > > > +	struct blt_copy_object src;
> > > > +	struct blt_copy_object mid;
> > > > +	struct blt_copy_object dst;
> > > > +	struct blt_copy_object final;
> > > > +	struct blt_copy_batch bb;
> > > > +	enum blt_color_depth color_depth;
> > > > +
> > > > +	/* debug stuff */
> > > > +	bool print_bb;
> > > > +};
> > > > +
> > > > +struct blt_block_copy3_data_ext {
> > > > +	struct blt_block_copy_object_ext src;
> > > > +	struct blt_block_copy_object_ext mid;
> > > > +	struct blt_block_copy_object_ext dst;
> > > > +	struct blt_block_copy_object_ext final;
> > > > +};
> > > > +
> > > > +#define FILL_OBJ(_idx, _handle, _offset, _flags) do { \
> > > > +	obj[(_idx)].handle = (_handle); \
> > > > +	obj[(_idx)].offset = (_offset); \
> > > > +	obj[(_idx)++].flags = EXEC_OBJECT_PINNED | \
> > > > +			      EXEC_OBJECT_SUPPORTS_48B_ADDRESS | (_flags) ; \
> > > > +} while (0)
> > > > +
> > > 
> > > I'm not sure if we want to have a macro with a hidden side effect. "i++"
> > > after each statement isn't pretty either, so I'm on the fence with this one.
> > 
> > I'm aware this has a side effect, but it makes the main code clearer, as it
> > is not interlaced with var++ between the lines. I think a reader who knows
> > how execbuf objs work will notice there's an implicit increment in the
> > macro.
> 
> When I first read the code, I thought "why do we keep using the same i for
> different objects?", and then looked at this definition. Or we could call
> FILL_OBJ(++i, ...), but it's not pretty either.

This would execute the increment three times here.
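
To illustrate the point, here is a minimal, self-contained sketch (not code
from the patch). struct exec_obj, FILL_OBJ_NOINC and fill_obj() are
hypothetical stand-ins. The macro drops the trailing ++ so that a ++i argument
compiles at all, and it then evaluates that argument once per statement:

```c
#include <stdint.h>

/* Hypothetical stand-in for the execbuf object entry. */
struct exec_obj {
	uint32_t handle;
	uint64_t offset;
	uint64_t flags;
};

/* Hypothetical variant of FILL_OBJ without the trailing increment.
 * _idx appears three times, so its side effects happen three times. */
#define FILL_OBJ_NOINC(_obj, _idx, _handle, _offset, _flags) do { \
	(_obj)[(_idx)].handle = (_handle); \
	(_obj)[(_idx)].offset = (_offset); \
	(_obj)[(_idx)].flags = (_flags); \
} while (0)

/* One macro call with ++i as the index argument. */
static int index_after_macro_call(void)
{
	struct exec_obj obj[4] = { 0 };
	int i = 0;

	FILL_OBJ_NOINC(obj, ++i, 1, 0x1000, 0);

	/* i is now 3: handle, offset and flags landed in obj[1..3]. */
	return i;
}

/* A function evaluates its index argument exactly once. */
static inline int fill_obj(struct exec_obj *obj, int idx,
			   uint32_t handle, uint64_t offset, uint64_t flags)
{
	obj[idx].handle = handle;
	obj[idx].offset = offset;
	obj[idx].flags = flags;

	return idx + 1;		/* next free slot */
}

static int index_after_function_call(void)
{
	struct exec_obj obj[4] = { 0 };
	int i = 0;

	i = fill_obj(obj, i, 1, 0x1000, 0);

	return i;		/* one slot filled */
}
```

An inline helper like this avoids the repeated evaluation, at the cost of
writing "i = fill_obj(...)" at every call site.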

--
Zbigniew

> 
> > 
> > > 
> > > > +static int blt_block_copy3(int i915,
> > > > +			   const intel_ctx_t *ctx,
> > > > +			   const struct intel_execution_engine2 *e,
> > > > +			   uint64_t ahnd,
> > > > +			   const struct blt_copy3_data *blt3,
> > > > +			   const struct blt_block_copy3_data_ext *ext3)
> > > > +{
> > > > +	struct drm_i915_gem_execbuffer2 execbuf = {};
> > > > +	struct drm_i915_gem_exec_object2 obj[5] = {};
> > > > +	struct blt_copy_data blt0;
> > > > +	struct blt_block_copy_data_ext ext0;
> > > > +	uint64_t src_offset, mid_offset, dst_offset, final_offset, bb_offset, alignment;
> > > > +	uint64_t bb_pos = 0;
> > > > +	uint32_t *bb;
> > > > +	int i, ret;
> > > > +
> > > > +	igt_assert_f(ahnd, "block-copy3 supports softpin only\n");
> > > > +	igt_assert_f(blt3, "block-copy3 requires data to do blit\n");
> > > > +
> > > > +	alignment = gem_detect_safe_alignment(i915);
> > > > +	src_offset = get_offset(ahnd, blt3->src.handle, blt3->src.size, alignment);
> > > > +	mid_offset = get_offset(ahnd, blt3->mid.handle, blt3->mid.size, alignment);
> > > > +	dst_offset = get_offset(ahnd, blt3->dst.handle, blt3->dst.size, alignment);
> > > > +	final_offset = get_offset(ahnd, blt3->final.handle, blt3->final.size, alignment);
> > > > +	bb_offset = get_offset(ahnd, blt3->bb.handle, blt3->bb.size, alignment);
> > > > +
> > > > +	igt_debug("src: %lx, mid: %lx, dst: %lx, final: %lx, bb: %lx, align: %lx\n",
> > > > +		  src_offset, mid_offset, dst_offset, final_offset, bb_offset, alignment);
> > > > +
> > > > +	/* First blit src -> mid */
> > > > +	memset(&blt0, 0, sizeof(blt0));
> > > > +	blt0.src = blt3->src;
> > > > +	blt0.dst = blt3->mid;
> > > > +	blt0.bb = blt3->bb;
> > > > +	blt0.color_depth = blt3->color_depth;
> > > > +	blt0.print_bb = blt3->print_bb;
> > > > +	ext0.src = ext3->src;
> > > > +	ext0.dst = ext3->mid;
> > > > +	bb_pos = emit_blt_block_copy(i915, ahnd, &blt0, &ext0, bb_pos, false);
> > > > +
> > > > +	/* Second blit mid -> dst */
> > > > +	memset(&blt0, 0, sizeof(blt0));
> > > > +	blt0.src = blt3->mid;
> > > > +	blt0.dst = blt3->dst;
> > > > +	blt0.bb = blt3->bb;
> > > > +	blt0.color_depth = blt3->color_depth;
> > > > +	blt0.print_bb = blt3->print_bb;
> > > > +	ext0.src = ext3->mid;
> > > > +	ext0.dst = ext3->dst;
> > > > +	bb_pos = emit_blt_block_copy(i915, ahnd, &blt0, &ext0, bb_pos, false);
> > > > +
> > > > +	/* Third blit dst -> final */
> > > > +	memset(&blt0, 0, sizeof(blt0));
> > > > +	blt0.src = blt3->dst;
> > > > +	blt0.dst = blt3->final;
> > > > +	blt0.bb = blt3->bb;
> > > > +	blt0.color_depth = blt3->color_depth;
> > > > +	blt0.print_bb = blt3->print_bb;
> > > > +	ext0.src = ext3->dst;
> > > > +	ext0.dst = ext3->final;
> > > > +	bb_pos = emit_blt_block_copy(i915, ahnd, &blt0, &ext0, bb_pos, true);
> > > > +
> > > > +	i = 0;
> > > > +	FILL_OBJ(i, blt3->src.handle, CANONICAL(src_offset), 0);
> > > > +	FILL_OBJ(i, blt3->mid.handle, CANONICAL(mid_offset), EXEC_OBJECT_WRITE);
> > > > +	if (mid_offset != dst_offset)
> > > > +		FILL_OBJ(i, blt3->dst.handle, CANONICAL(dst_offset), EXEC_OBJECT_WRITE);
> > > > +	FILL_OBJ(i, blt3->final.handle, CANONICAL(final_offset), 0);
> > > > +	FILL_OBJ(i, blt3->bb.handle, CANONICAL(bb_offset), 0);
> > > > +
> > > > +	execbuf.buffer_count = i;
> > > > +
> > > > +	for (int __i = 0; __i < i; __i++)
> > > > +		igt_debug("obj[%d].offset: %llx, handle: %u\n",
> > > > +			  __i, (long long) obj[__i].offset, obj[__i].handle);
> > > > +	execbuf.buffers_ptr = to_user_pointer(obj);
> > > > +	execbuf.rsvd1 = ctx ? ctx->id : 0;
> > > > +	execbuf.flags = e ? e->flags : I915_EXEC_BLT;
> > > > +	ret = __gem_execbuf(i915, &execbuf);
> > > > +
> > > > +	gem_sync(i915, blt3->bb.handle);
> > > > +	munmap(bb, blt3->bb.size);
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +
> > > >    static void block_copy(int i915,
> > > >    		       const intel_ctx_t *ctx,
> > > >    		       const struct intel_execution_engine2 *e,
> > > > @@ -380,10 +493,100 @@ static void block_copy(int i915,
> > > >    	igt_assert_f(!result, "source and destination surfaces differs!\n");
> > > >    }
> > > > +static void block_multicopy(int i915,
> > > > +			    const intel_ctx_t *ctx,
> > > > +			    const struct intel_execution_engine2 *e,
> > > > +			    uint32_t region1, uint32_t region2,
> > > > +			    enum blt_tiling mid_tiling,
> > > > +			    const struct test_config *config)
> > > > +{
> > > > +	struct blt_copy3_data blt3 = {};
> > > > +	struct blt_block_copy3_data_ext ext3 = {}, *pext3 = &ext3;
> > > > +	struct blt_copy_object *src, *mid, *dst, *final;
> > > > +	const uint32_t bpp = 32;
> > > > +	uint64_t bb_size = 4096;
> > > > +	uint64_t ahnd = get_reloc_ahnd(i915, ctx->id);
> > > > +	uint32_t run_id = mid_tiling;
> > > > +	uint32_t mid_region = region2, bb;
> > > > +	uint32_t width = param.width, height = param.height;
> > > > +	enum blt_compression mid_compression = config->compression;
> > > > +	int mid_compression_format = param.compression_format;
> > > > +	enum blt_compression_type comp_type = COMPRESSION_TYPE_3D;
> > > > +	uint8_t uc_mocs = intel_get_uc_mocs(i915);
> > > > +	int result;
> > > > +
> > > > +	igt_assert(__gem_create_in_memory_regions(i915, &bb, &bb_size, region1) == 0);
> > > > +
> > > > +	if (!blt_supports_compression(i915))
> > > > +		pext3 = NULL;
> > > > +
> > > > +	src = create_object(i915, region1, width, height, bpp, uc_mocs,
> > > > +			    T_LINEAR, COMPRESSION_DISABLED, comp_type, true);
> > > > +	mid = create_object(i915, mid_region, width, height, bpp, uc_mocs,
> > > > +			    mid_tiling, mid_compression, comp_type, true);
> > > > +	dst = create_object(i915, region1, width, height, bpp, uc_mocs,
> > > > +			    mid_tiling, COMPRESSION_DISABLED, comp_type, true);
> > > > +	final = create_object(i915, region1, width, height, bpp, uc_mocs,
> > > > +			    T_LINEAR, COMPRESSION_DISABLED, comp_type, true);
> > > > +	igt_assert(src->size == dst->size);
> > > > +	PRINT_SURFACE_INFO("src", src);
> > > > +	PRINT_SURFACE_INFO("mid", mid);
> > > > +	PRINT_SURFACE_INFO("dst", dst);
> > > > +	PRINT_SURFACE_INFO("final", final);
> > > > +
> > > > +	blt_surface_fill_rect(i915, src, width, height);
> > > > +
> > > > +	memset(&blt3, 0, sizeof(blt3));
> > > > +	blt3.color_depth = CD_32bit;
> > > > +	blt3.print_bb = param.print_bb;
> > > > +	set_blt_object(&blt3.src, src);
> > > > +	set_blt_object(&blt3.mid, mid);
> > > > +	set_blt_object(&blt3.dst, dst);
> > > > +	set_blt_object(&blt3.final, final);
> > > > +
> > > > +	if (config->inplace) {
> > > > +		set_object(&blt3.dst, mid->handle, dst->size, mid->region, mid->mocs,
> > > > +			   mid_tiling, COMPRESSION_DISABLED, comp_type);
> > > > +		blt3.dst.ptr = mid->ptr;
> > > > +	}
> > > > +
> > > > +	set_object_ext(&ext3.src, 0, width, height, SURFACE_TYPE_2D);
> > > > +	set_object_ext(&ext3.mid, mid_compression_format, width, height, SURFACE_TYPE_2D);
> > > > +	set_object_ext(&ext3.dst, 0, width, height, SURFACE_TYPE_2D);
> > > > +	set_object_ext(&ext3.final, 0, width, height, SURFACE_TYPE_2D);
> > > > +	set_batch(&blt3.bb, bb, bb_size, region1);
> > > > +
> > > > +	blt_block_copy3(i915, ctx, e, ahnd, &blt3, pext3);
> > > > +	gem_sync(i915, blt3.final.handle);
> > > > +
> > > > +	WRITE_PNG(i915, run_id, "src", &blt3.src, width, height);
> > > > +	if (!config->inplace)
> > > > +		WRITE_PNG(i915, run_id, "mid", &blt3.mid, width, height);
> > > > +	WRITE_PNG(i915, run_id, "dst", &blt3.dst, width, height);
> > > > +	WRITE_PNG(i915, run_id, "final", &blt3.final, width, height);
> > > > +
> > > > +	result = memcmp(src->ptr, blt3.final.ptr, src->size);
> > > > +
> > > > +	destroy_object(i915, src);
> > > > +	destroy_object(i915, mid);
> > > > +	destroy_object(i915, dst);
> > > > +	destroy_object(i915, final);
> > > > +	gem_close(i915, bb);
> > > > +	put_ahnd(ahnd);
> > > > +
> > > > +	igt_assert_f(!result, "source and destination surfaces differs!\n");
> > > > +}
> > > > +
> > > > +enum copy_func {
> > > > +	BLOCK_COPY,
> > > > +	BLOCK_MULTICOPY,
> > > > +};
> > > > +
> > > 
> > > I was wondering whether we need a simple case here (i.e. one emit, src->dst)
> > 
> > Which tiling do you want to blit? linear -> linear? linear -> tileN? How
> > would we verify it works? fmt1 -> fmt2 -> fmt1 is equivalent to fmt1 -> fmt2
> > (with the fmt2 -> linear detiling done on the cpu).
> > 
> > At the moment we don't verify the middle surface; we only assume that a
> > matching src and dst means the middle blit went fine.
> 
> I thought about linear->linear in the beginning, but maybe it would make
> more sense to wait for predicates that can check the middle layer and work
> with different tilings. It can stay like this for now.
> 
> Many thanks,
> Karolina
> 
> > 
> > --
> > Zbigniew
> > 
> > > 
> > > >    static void block_copy_test(int i915,
> > > >    			    const struct test_config *config,
> > > >    			    const intel_ctx_t *ctx,
> > > > -			    struct igt_collection *set)
> > > > +			    struct igt_collection *set,
> > > > +			    enum copy_func copy_function)
> > > >    {
> > > >    	struct igt_collection *regions;
> > > >    	const struct intel_execution_engine2 *e;
> > > > @@ -415,15 +618,27 @@ static void block_copy_test(int i915,
> > > >    					continue;
> > > >    				regtxt = memregion_dynamic_subtest_name(regions);
> > > > -				igt_dynamic_f("%s-%s-compfmt%d-%s",
> > > > -					      blt_tiling_name(tiling),
> > > > -					      config->compression ?
> > > > -						      "compressed" : "uncompressed",
> > > > -					      param.compression_format, regtxt) {
> > > > -					block_copy(i915, ctx, e,
> > > > -						   region1, region2,
> > > > -						   tiling, config);
> > > > -				}
> > > > +
> > > > +				if (copy_function == BLOCK_COPY)
> > > > +					igt_dynamic_f("%s-%s-compfmt%d-%s",
> > > > +						      blt_tiling_name(tiling),
> > > > +						      config->compression ?
> > > > +							      "compressed" : "uncompressed",
> > > > +						      param.compression_format, regtxt) {
> > > > +						block_copy(i915, ctx, e,
> > > > +							   region1, region2,
> > > > +							   tiling, config);
> > > > +					}
> > > > +				else if (copy_function == BLOCK_MULTICOPY)
> > > > +					igt_dynamic_f("%s-%s-compfmt%d-%s-multicopy",
> > > > +						      blt_tiling_name(tiling),
> > > > +						      config->compression ?
> > > > +							      "compressed" : "uncompressed",
> > > > +						      param.compression_format, regtxt) {
> > > > +						block_multicopy(i915, ctx, e,
> > > > +								region1, region2,
> > > > +								tiling, config);
> > > > +					}
> > > 
> > > We could avoid repetition to some extent if we had introduced a macro that
> > > accepts a function pointer and/or some config data. But I don't feel
> > > strongly about it, just putting it out there.
> > > 
> > > Many thanks,
> > > Karolina
> > > 
> > > >    				free(regtxt);
> > > >    			}
> > > >    		}
> > > > @@ -506,14 +721,21 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
> > > >    	igt_subtest_with_dynamic("block-copy-uncompressed") {
> > > >    		struct test_config config = {};
> > > > -		block_copy_test(i915, &config, ctx, set);
> > > > +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
> > > >    	}
> > > >    	igt_describe("Check block-copy flatccs compressed blit");
> > > >    	igt_subtest_with_dynamic("block-copy-compressed") {
> > > >    		struct test_config config = { .compression = true };
> > > > -		block_copy_test(i915, &config, ctx, set);
> > > > +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
> > > > +	}
> > > > +
> > > > +	igt_describe("Check block-multicopy flatccs compressed blit");
> > > > +	igt_subtest_with_dynamic("block-multicopy-compressed") {
> > > > +		struct test_config config = { .compression = true };
> > > > +
> > > > +		block_copy_test(i915, &config, ctx, set, BLOCK_MULTICOPY);
> > > >    	}
> > > >    	igt_describe("Check block-copy flatccs inplace decompression blit");
> > > > @@ -521,7 +743,15 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
> > > >    		struct test_config config = { .compression = true,
> > > >    					      .inplace = true };
> > > > -		block_copy_test(i915, &config, ctx, set);
> > > > +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
> > > > +	}
> > > > +
> > > > +	igt_describe("Check block-multicopy flatccs inplace decompression blit");
> > > > +	igt_subtest_with_dynamic("block-multicopy-inplace") {
> > > > +		struct test_config config = { .compression = true,
> > > > +					      .inplace = true };
> > > > +
> > > > +		block_copy_test(i915, &config, ctx, set, BLOCK_MULTICOPY);
> > > >    	}
> > > >    	igt_describe("Check flatccs data can be copied from/to surface");
> > > > @@ -529,7 +759,7 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
> > > >    		struct test_config config = { .compression = true,
> > > >    					      .surfcopy = true };
> > > > -		block_copy_test(i915, &config, ctx, set);
> > > > +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
> > > >    	}
> > > >    	igt_describe("Check flatccs data are physically tagged and visible"
> > > > @@ -539,7 +769,7 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
> > > >    					      .surfcopy = true,
> > > >    					      .new_ctx = true };
> > > > -		block_copy_test(i915, &config, ctx, set);
> > > > +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
> > > >    	}
> > > >    	igt_describe("Check flatccs data persists after suspend / resume (S0)");
> > > > @@ -548,7 +778,7 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
> > > >    					      .surfcopy = true,
> > > >    					      .suspend_resume = true };
> > > > -		block_copy_test(i915, &config, ctx, set);
> > > > +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
> > > >    	}
> > > >    	igt_fixture {


* Re: [igt-dev] [PATCH i-g-t 3/3] tests/gem_ccs: Add block-multicopy subtest
  2022-12-15 11:00         ` Zbigniew Kempczyński
@ 2022-12-15 11:47           ` Karolina Stolarek
  0 siblings, 0 replies; 20+ messages in thread
From: Karolina Stolarek @ 2022-12-15 11:47 UTC (permalink / raw)
  To: Zbigniew Kempczyński; +Cc: igt-dev



On 15.12.2022 12:00, Zbigniew Kempczyński wrote:
> On Thu, Dec 15, 2022 at 09:42:56AM +0100, Karolina Stolarek wrote:
>> On 14.12.2022 20:57, Zbigniew Kempczyński wrote:
>>> On Tue, Dec 13, 2022 at 04:40:38PM +0100, Karolina Stolarek wrote:
>>>> On 12.12.2022 13:50, Zbigniew Kempczyński wrote:
>>>>> Exercise sequence of blits packed in single batch. It may reveal
>>>>> flushing/decompressing/detiling problems during execution.
>>>>> multicopy-inplace version differs from copy-inplace version with
>>>>> additional blit to same tiling format during decompression to separate
>>>>> problems visible in decompressing/detiling in one step.
>>>>>
>>>>> Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
>>>>> ---
>>>>>     tests/i915/gem_ccs.c | 262 ++++++++++++++++++++++++++++++++++++++++---
>>>>>     1 file changed, 246 insertions(+), 16 deletions(-)
>>>>>
>>>>> diff --git a/tests/i915/gem_ccs.c b/tests/i915/gem_ccs.c
>>>>> index 4ecb3e36ac..6b5f199ec7 100644
>>>>> --- a/tests/i915/gem_ccs.c
>>>>> +++ b/tests/i915/gem_ccs.c
>>>>> @@ -262,6 +262,119 @@ static void surf_copy(int i915,
>>>>>     	gem_close(i915, ccs);
>>>>>     }
>>>>> +struct blt_copy3_data {
>>>>> +	int i915;
>>>>> +	struct blt_copy_object src;
>>>>> +	struct blt_copy_object mid;
>>>>> +	struct blt_copy_object dst;
>>>>> +	struct blt_copy_object final;
>>>>> +	struct blt_copy_batch bb;
>>>>> +	enum blt_color_depth color_depth;
>>>>> +
>>>>> +	/* debug stuff */
>>>>> +	bool print_bb;
>>>>> +};
>>>>> +
>>>>> +struct blt_block_copy3_data_ext {
>>>>> +	struct blt_block_copy_object_ext src;
>>>>> +	struct blt_block_copy_object_ext mid;
>>>>> +	struct blt_block_copy_object_ext dst;
>>>>> +	struct blt_block_copy_object_ext final;
>>>>> +};
>>>>> +
>>>>> +#define FILL_OBJ(_idx, _handle, _offset, _flags) do { \
>>>>> +	obj[(_idx)].handle = (_handle); \
>>>>> +	obj[(_idx)].offset = (_offset); \
>>>>> +	obj[(_idx)++].flags = EXEC_OBJECT_PINNED | \
>>>>> +			      EXEC_OBJECT_SUPPORTS_48B_ADDRESS | (_flags) ; \
>>>>> +} while (0)
>>>>> +
>>>>
>>>> I'm not sure if we want to have a macro with a hidden side effect. "i++"
>>>> after each statement isn't pretty either, so I'm on the fence with this one.
>>>
>>> I'm aware this has a side effect, but it makes the main code clearer, as it
>>> is not interlaced with var++ between the lines. I think a reader who knows
>>> how execbuf objs work will notice there's an implicit increment in the
>>> macro.
>>
>> When I first read the code, I thought "why do we keep using the same i for
>> different objects?", and then looked at this definition. Or we could call
>> FILL_OBJ(++i, ...), but it's not pretty either.
> 
> This would execute the increment three times here.

Argh, you're right, silly me


Karolina

> --
> Zbigniew
> 
>>
>>>
>>>>
>>>>> +static int blt_block_copy3(int i915,
>>>>> +			   const intel_ctx_t *ctx,
>>>>> +			   const struct intel_execution_engine2 *e,
>>>>> +			   uint64_t ahnd,
>>>>> +			   const struct blt_copy3_data *blt3,
>>>>> +			   const struct blt_block_copy3_data_ext *ext3)
>>>>> +{
>>>>> +	struct drm_i915_gem_execbuffer2 execbuf = {};
>>>>> +	struct drm_i915_gem_exec_object2 obj[5] = {};
>>>>> +	struct blt_copy_data blt0;
>>>>> +	struct blt_block_copy_data_ext ext0;
>>>>> +	uint64_t src_offset, mid_offset, dst_offset, final_offset, bb_offset, alignment;
>>>>> +	uint64_t bb_pos = 0;
>>>>> +	uint32_t *bb;
>>>>> +	int i, ret;
>>>>> +
>>>>> +	igt_assert_f(ahnd, "block-copy3 supports softpin only\n");
>>>>> +	igt_assert_f(blt3, "block-copy3 requires data to do blit\n");
>>>>> +
>>>>> +	alignment = gem_detect_safe_alignment(i915);
>>>>> +	src_offset = get_offset(ahnd, blt3->src.handle, blt3->src.size, alignment);
>>>>> +	mid_offset = get_offset(ahnd, blt3->mid.handle, blt3->mid.size, alignment);
>>>>> +	dst_offset = get_offset(ahnd, blt3->dst.handle, blt3->dst.size, alignment);
>>>>> +	final_offset = get_offset(ahnd, blt3->final.handle, blt3->final.size, alignment);
>>>>> +	bb_offset = get_offset(ahnd, blt3->bb.handle, blt3->bb.size, alignment);
>>>>> +
>>>>> +	igt_debug("src: %lx, mid: %lx, dst: %lx, final: %lx, bb: %lx, align: %lx\n",
>>>>> +		  src_offset, mid_offset, dst_offset, final_offset, bb_offset, alignment);
>>>>> +
>>>>> +	/* First blit src -> mid */
>>>>> +	memset(&blt0, 0, sizeof(blt0));
>>>>> +	blt0.src = blt3->src;
>>>>> +	blt0.dst = blt3->mid;
>>>>> +	blt0.bb = blt3->bb;
>>>>> +	blt0.color_depth = blt3->color_depth;
>>>>> +	blt0.print_bb = blt3->print_bb;
>>>>> +	ext0.src = ext3->src;
>>>>> +	ext0.dst = ext3->mid;
>>>>> +	bb_pos = emit_blt_block_copy(i915, ahnd, &blt0, &ext0, bb_pos, false);
>>>>> +
>>>>> +	/* Second blit mid -> dst */
>>>>> +	memset(&blt0, 0, sizeof(blt0));
>>>>> +	blt0.src = blt3->mid;
>>>>> +	blt0.dst = blt3->dst;
>>>>> +	blt0.bb = blt3->bb;
>>>>> +	blt0.color_depth = blt3->color_depth;
>>>>> +	blt0.print_bb = blt3->print_bb;
>>>>> +	ext0.src = ext3->mid;
>>>>> +	ext0.dst = ext3->dst;
>>>>> +	bb_pos = emit_blt_block_copy(i915, ahnd, &blt0, &ext0, bb_pos, false);
>>>>> +
>>>>> +	/* Third blit dst -> final */
>>>>> +	memset(&blt0, 0, sizeof(blt0));
>>>>> +	blt0.src = blt3->dst;
>>>>> +	blt0.dst = blt3->final;
>>>>> +	blt0.bb = blt3->bb;
>>>>> +	blt0.color_depth = blt3->color_depth;
>>>>> +	blt0.print_bb = blt3->print_bb;
>>>>> +	ext0.src = ext3->dst;
>>>>> +	ext0.dst = ext3->final;
>>>>> +	bb_pos = emit_blt_block_copy(i915, ahnd, &blt0, &ext0, bb_pos, true);
>>>>> +
>>>>> +	i = 0;
>>>>> +	FILL_OBJ(i, blt3->src.handle, CANONICAL(src_offset), 0);
>>>>> +	FILL_OBJ(i, blt3->mid.handle, CANONICAL(mid_offset), EXEC_OBJECT_WRITE);
>>>>> +	if (mid_offset != dst_offset)
>>>>> +		FILL_OBJ(i, blt3->dst.handle, CANONICAL(dst_offset), EXEC_OBJECT_WRITE);
>>>>> +	FILL_OBJ(i, blt3->final.handle, CANONICAL(final_offset), 0);
>>>>> +	FILL_OBJ(i, blt3->bb.handle, CANONICAL(bb_offset), 0);
>>>>> +
>>>>> +	execbuf.buffer_count = i;
>>>>> +
>>>>> +	for (int __i = 0; __i < i; __i++)
>>>>> +		igt_debug("obj[%d].offset: %llx, handle: %u\n",
>>>>> +			  __i, (long long) obj[__i].offset, obj[__i].handle);
>>>>> +	execbuf.buffers_ptr = to_user_pointer(obj);
>>>>> +	execbuf.rsvd1 = ctx ? ctx->id : 0;
>>>>> +	execbuf.flags = e ? e->flags : I915_EXEC_BLT;
>>>>> +	ret = __gem_execbuf(i915, &execbuf);
>>>>> +
>>>>> +	gem_sync(i915, blt3->bb.handle);
>>>>> +	munmap(bb, blt3->bb.size);
>>>>> +
>>>>> +	return ret;
>>>>> +}
>>>>> +
>>>>>     static void block_copy(int i915,
>>>>>     		       const intel_ctx_t *ctx,
>>>>>     		       const struct intel_execution_engine2 *e,
>>>>> @@ -380,10 +493,100 @@ static void block_copy(int i915,
>>>>>     	igt_assert_f(!result, "source and destination surfaces differs!\n");
>>>>>     }
>>>>> +static void block_multicopy(int i915,
>>>>> +			    const intel_ctx_t *ctx,
>>>>> +			    const struct intel_execution_engine2 *e,
>>>>> +			    uint32_t region1, uint32_t region2,
>>>>> +			    enum blt_tiling mid_tiling,
>>>>> +			    const struct test_config *config)
>>>>> +{
>>>>> +	struct blt_copy3_data blt3 = {};
>>>>> +	struct blt_block_copy3_data_ext ext3 = {}, *pext3 = &ext3;
>>>>> +	struct blt_copy_object *src, *mid, *dst, *final;
>>>>> +	const uint32_t bpp = 32;
>>>>> +	uint64_t bb_size = 4096;
>>>>> +	uint64_t ahnd = get_reloc_ahnd(i915, ctx->id);
>>>>> +	uint32_t run_id = mid_tiling;
>>>>> +	uint32_t mid_region = region2, bb;
>>>>> +	uint32_t width = param.width, height = param.height;
>>>>> +	enum blt_compression mid_compression = config->compression;
>>>>> +	int mid_compression_format = param.compression_format;
>>>>> +	enum blt_compression_type comp_type = COMPRESSION_TYPE_3D;
>>>>> +	uint8_t uc_mocs = intel_get_uc_mocs(i915);
>>>>> +	int result;
>>>>> +
>>>>> +	igt_assert(__gem_create_in_memory_regions(i915, &bb, &bb_size, region1) == 0);
>>>>> +
>>>>> +	if (!blt_supports_compression(i915))
>>>>> +		pext3 = NULL;
>>>>> +
>>>>> +	src = create_object(i915, region1, width, height, bpp, uc_mocs,
>>>>> +			    T_LINEAR, COMPRESSION_DISABLED, comp_type, true);
>>>>> +	mid = create_object(i915, mid_region, width, height, bpp, uc_mocs,
>>>>> +			    mid_tiling, mid_compression, comp_type, true);
>>>>> +	dst = create_object(i915, region1, width, height, bpp, uc_mocs,
>>>>> +			    mid_tiling, COMPRESSION_DISABLED, comp_type, true);
>>>>> +	final = create_object(i915, region1, width, height, bpp, uc_mocs,
>>>>> +			    T_LINEAR, COMPRESSION_DISABLED, comp_type, true);
>>>>> +	igt_assert(src->size == dst->size);
>>>>> +	PRINT_SURFACE_INFO("src", src);
>>>>> +	PRINT_SURFACE_INFO("mid", mid);
>>>>> +	PRINT_SURFACE_INFO("dst", dst);
>>>>> +	PRINT_SURFACE_INFO("final", final);
>>>>> +
>>>>> +	blt_surface_fill_rect(i915, src, width, height);
>>>>> +
>>>>> +	memset(&blt3, 0, sizeof(blt3));
>>>>> +	blt3.color_depth = CD_32bit;
>>>>> +	blt3.print_bb = param.print_bb;
>>>>> +	set_blt_object(&blt3.src, src);
>>>>> +	set_blt_object(&blt3.mid, mid);
>>>>> +	set_blt_object(&blt3.dst, dst);
>>>>> +	set_blt_object(&blt3.final, final);
>>>>> +
>>>>> +	if (config->inplace) {
>>>>> +		set_object(&blt3.dst, mid->handle, dst->size, mid->region, mid->mocs,
>>>>> +			   mid_tiling, COMPRESSION_DISABLED, comp_type);
>>>>> +		blt3.dst.ptr = mid->ptr;
>>>>> +	}
>>>>> +
>>>>> +	set_object_ext(&ext3.src, 0, width, height, SURFACE_TYPE_2D);
>>>>> +	set_object_ext(&ext3.mid, mid_compression_format, width, height, SURFACE_TYPE_2D);
>>>>> +	set_object_ext(&ext3.dst, 0, width, height, SURFACE_TYPE_2D);
>>>>> +	set_object_ext(&ext3.final, 0, width, height, SURFACE_TYPE_2D);
>>>>> +	set_batch(&blt3.bb, bb, bb_size, region1);
>>>>> +
>>>>> +	blt_block_copy3(i915, ctx, e, ahnd, &blt3, pext3);
>>>>> +	gem_sync(i915, blt3.final.handle);
>>>>> +
>>>>> +	WRITE_PNG(i915, run_id, "src", &blt3.src, width, height);
>>>>> +	if (!config->inplace)
>>>>> +		WRITE_PNG(i915, run_id, "mid", &blt3.mid, width, height);
>>>>> +	WRITE_PNG(i915, run_id, "dst", &blt3.dst, width, height);
>>>>> +	WRITE_PNG(i915, run_id, "final", &blt3.final, width, height);
>>>>> +
>>>>> +	result = memcmp(src->ptr, blt3.final.ptr, src->size);
>>>>> +
>>>>> +	destroy_object(i915, src);
>>>>> +	destroy_object(i915, mid);
>>>>> +	destroy_object(i915, dst);
>>>>> +	destroy_object(i915, final);
>>>>> +	gem_close(i915, bb);
>>>>> +	put_ahnd(ahnd);
>>>>> +
>>>>> +	igt_assert_f(!result, "source and destination surfaces differ!\n");
>>>>> +}
>>>>> +
>>>>> +enum copy_func {
>>>>> +	BLOCK_COPY,
>>>>> +	BLOCK_MULTICOPY,
>>>>> +};
>>>>> +
>>>>
>>>> I was wondering whether we need a simple case here (i.e. one emit, src->dst).
>>>
>>> Which tiling do you want to blit? linear -> linear? linear -> tileN? How
>>> would we verify that it works? fmt1 -> fmt2 -> fmt1 is equivalent to
>>> fmt1 -> fmt2 (detile fmt2 -> linear on cpu).
>>>
>>> At the moment we don't verify the middle surface; we only assume that
>>> src == dst means the middle blit was fine.
>>
>> I thought about linear->linear in the beginning, but maybe it would make
>> more sense to wait for predicates that can check the middle layer and work
>> with different tilings. It can stay like this for now.
>>
>> Many thanks,
>> Karolina
>>
>>>
>>> --
>>> Zbigniew
>>>
>>>>
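The verification argument in the exchange above (a src -> mid -> final round trip compared with memcmp, with the middle surface left unchecked) can be illustrated with a small CPU-only sketch. Everything here is an assumption made for illustration: toy_tile() and toy_detile() are hypothetical helpers that use a column-major layout as a stand-in for a real tiling format, and none of it comes from lib/i915_blt or gem_ccs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define W 4
#define H 4

/* Toy "tiling": store the surface column-major. Illustrative only,
 * this is not a real GPU tiling layout. */
static void toy_tile(const uint32_t *lin, uint32_t *tiled)
{
	for (int y = 0; y < H; y++)
		for (int x = 0; x < W; x++)
			tiled[x * H + y] = lin[y * W + x];
}

/* Inverse of toy_tile(): plays the role of "detile on cpu". */
static void toy_detile(const uint32_t *tiled, uint32_t *lin)
{
	for (int y = 0; y < H; y++)
		for (int x = 0; x < W; x++)
			lin[y * W + x] = tiled[x * H + y];
}

/* Returns 0 when src survives the src -> mid -> final round trip.
 * Note that mid itself is never inspected, mirroring the thread's
 * remark that the middle surface is not verified. */
static int roundtrip_matches(const uint32_t *src)
{
	uint32_t mid[W * H], final[W * H];

	assert(src);
	toy_tile(src, mid);
	toy_detile(mid, final);

	return memcmp(src, final, sizeof(final));
}
```

A zero return only shows that the two steps are inverses of each other. It says nothing about whether mid holds the intended layout, which is why the reply defers a simpler case until predicates that can check the middle layer exist.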
>>>>>     static void block_copy_test(int i915,
>>>>>     			    const struct test_config *config,
>>>>>     			    const intel_ctx_t *ctx,
>>>>> -			    struct igt_collection *set)
>>>>> +			    struct igt_collection *set,
>>>>> +			    enum copy_func copy_function)
>>>>>     {
>>>>>     	struct igt_collection *regions;
>>>>>     	const struct intel_execution_engine2 *e;
>>>>> @@ -415,15 +618,27 @@ static void block_copy_test(int i915,
>>>>>     					continue;
>>>>>     				regtxt = memregion_dynamic_subtest_name(regions);
>>>>> -				igt_dynamic_f("%s-%s-compfmt%d-%s",
>>>>> -					      blt_tiling_name(tiling),
>>>>> -					      config->compression ?
>>>>> -						      "compressed" : "uncompressed",
>>>>> -					      param.compression_format, regtxt) {
>>>>> -					block_copy(i915, ctx, e,
>>>>> -						   region1, region2,
>>>>> -						   tiling, config);
>>>>> -				}
>>>>> +
>>>>> +				if (copy_function == BLOCK_COPY)
>>>>> +					igt_dynamic_f("%s-%s-compfmt%d-%s",
>>>>> +						      blt_tiling_name(tiling),
>>>>> +						      config->compression ?
>>>>> +							      "compressed" : "uncompressed",
>>>>> +						      param.compression_format, regtxt) {
>>>>> +						block_copy(i915, ctx, e,
>>>>> +							   region1, region2,
>>>>> +							   tiling, config);
>>>>> +					}
>>>>> +				else if (copy_function == BLOCK_MULTICOPY)
>>>>> +					igt_dynamic_f("%s-%s-compfmt%d-%s-multicopy",
>>>>> +						      blt_tiling_name(tiling),
>>>>> +						      config->compression ?
>>>>> +							      "compressed" : "uncompressed",
>>>>> +						      param.compression_format, regtxt) {
>>>>> +						block_multicopy(i915, ctx, e,
>>>>> +								region1, region2,
>>>>> +								tiling, config);
>>>>> +					}
>>>>
>>>> We could avoid some of the repetition if we introduced a macro that
>>>> accepts a function pointer and/or some config data. But I don't feel
>>>> strongly about it, just putting it out there.
>>>>
>>>> Many thanks,
>>>> Karolina
>>>>
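One possible shape for the suggestion above is a small table indexed by enum copy_func that carries a function pointer and a name suffix. This is only a sketch under stated assumptions: fake_block_copy(), fake_block_multicopy(), struct copy_variant and run_variant() are hypothetical names invented here, the subtest name format is simplified, and the real block_copy()/block_multicopy() take more arguments than shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

enum copy_func {
	BLOCK_COPY,
	BLOCK_MULTICOPY,
};

/* Hypothetical stand-ins for the real test bodies; they only return
 * a value that identifies which path was taken. */
typedef int (*copy_fn)(int tiling);

static int fake_block_copy(int tiling) { return 100 + tiling; }
static int fake_block_multicopy(int tiling) { return 200 + tiling; }

/* One entry per copy function: the callback and the name suffix. */
struct copy_variant {
	copy_fn fn;
	const char *suffix;
};

static const struct copy_variant variants[] = {
	[BLOCK_COPY]      = { fake_block_copy, "" },
	[BLOCK_MULTICOPY] = { fake_block_multicopy, "-multicopy" },
};

/* Builds the (simplified) dynamic subtest name and runs the variant,
 * so the caller needs no if/else chain per copy function. */
static int run_variant(enum copy_func which, const char *tiling_name,
		       int tiling, char *name, size_t len)
{
	const struct copy_variant *v;

	assert((size_t)which < ARRAY_SIZE(variants));
	v = &variants[which];
	snprintf(name, len, "%s-compressed%s", tiling_name, v->suffix);

	return v->fn(tiling);
}
```

With a table like this, adding a third copy function would mean adding one enum value and one table entry rather than another igt_dynamic_f() branch. Whether that is worth the indirection for two variants is the judgment call the reviewer leaves open.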
>>>>>     				free(regtxt);
>>>>>     			}
>>>>>     		}
>>>>> @@ -506,14 +721,21 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
>>>>>     	igt_subtest_with_dynamic("block-copy-uncompressed") {
>>>>>     		struct test_config config = {};
>>>>> -		block_copy_test(i915, &config, ctx, set);
>>>>> +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
>>>>>     	}
>>>>>     	igt_describe("Check block-copy flatccs compressed blit");
>>>>>     	igt_subtest_with_dynamic("block-copy-compressed") {
>>>>>     		struct test_config config = { .compression = true };
>>>>> -		block_copy_test(i915, &config, ctx, set);
>>>>> +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
>>>>> +	}
>>>>> +
>>>>> +	igt_describe("Check block-multicopy flatccs compressed blit");
>>>>> +	igt_subtest_with_dynamic("block-multicopy-compressed") {
>>>>> +		struct test_config config = { .compression = true };
>>>>> +
>>>>> +		block_copy_test(i915, &config, ctx, set, BLOCK_MULTICOPY);
>>>>>     	}
>>>>>     	igt_describe("Check block-copy flatccs inplace decompression blit");
>>>>> @@ -521,7 +743,15 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
>>>>>     		struct test_config config = { .compression = true,
>>>>>     					      .inplace = true };
>>>>> -		block_copy_test(i915, &config, ctx, set);
>>>>> +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
>>>>> +	}
>>>>> +
>>>>> +	igt_describe("Check block-multicopy flatccs inplace decompression blit");
>>>>> +	igt_subtest_with_dynamic("block-multicopy-inplace") {
>>>>> +		struct test_config config = { .compression = true,
>>>>> +					      .inplace = true };
>>>>> +
>>>>> +		block_copy_test(i915, &config, ctx, set, BLOCK_MULTICOPY);
>>>>>     	}
>>>>>     	igt_describe("Check flatccs data can be copied from/to surface");
>>>>> @@ -529,7 +759,7 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
>>>>>     		struct test_config config = { .compression = true,
>>>>>     					      .surfcopy = true };
>>>>> -		block_copy_test(i915, &config, ctx, set);
>>>>> +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
>>>>>     	}
>>>>>     	igt_describe("Check flatccs data are physically tagged and visible"
>>>>> @@ -539,7 +769,7 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
>>>>>     					      .surfcopy = true,
>>>>>     					      .new_ctx = true };
>>>>> -		block_copy_test(i915, &config, ctx, set);
>>>>> +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
>>>>>     	}
>>>>>     	igt_describe("Check flatccs data persists after suspend / resume (S0)");
>>>>> @@ -548,7 +778,7 @@ igt_main_args("bf:pst:W:H:", NULL, help_str, opt_handler, NULL)
>>>>>     					      .surfcopy = true,
>>>>>     					      .suspend_resume = true };
>>>>> -		block_copy_test(i915, &config, ctx, set);
>>>>> +		block_copy_test(i915, &config, ctx, set, BLOCK_COPY);
>>>>>     	}
>>>>>     	igt_fixture {


end of thread, other threads:[~2022-12-15 11:47 UTC | newest]

Thread overview: 20+ messages
2022-12-12 12:50 [igt-dev] [PATCH i-g-t 0/3] Add block-multicopy-* subtests Zbigniew Kempczyński
2022-12-12 12:50 ` [igt-dev] [PATCH i-g-t 1/3] lib/i915_blt: Remove src == dst pitch restriction Zbigniew Kempczyński
2022-12-13 15:37   ` Karolina Stolarek
2022-12-14 18:57     ` Zbigniew Kempczyński
2022-12-15  8:41       ` Karolina Stolarek
2022-12-15 10:43         ` Karolina Stolarek
2022-12-12 12:50 ` [igt-dev] [PATCH i-g-t 2/3] lib/i915_blt: Extract blit emit functions Zbigniew Kempczyński
2022-12-13 15:39   ` Karolina Stolarek
2022-12-14 19:40     ` Zbigniew Kempczyński
2022-12-15  8:42       ` Karolina Stolarek
2022-12-15 10:41         ` Zbigniew Kempczyński
2022-12-15 10:47           ` Karolina Stolarek
2022-12-12 12:50 ` [igt-dev] [PATCH i-g-t 3/3] tests/gem_ccs: Add block-multicopy subtest Zbigniew Kempczyński
2022-12-13 15:40   ` Karolina Stolarek
2022-12-14 19:57     ` Zbigniew Kempczyński
2022-12-15  8:42       ` Karolina Stolarek
2022-12-15 11:00         ` Zbigniew Kempczyński
2022-12-15 11:47           ` Karolina Stolarek
2022-12-12 13:23 ` [igt-dev] ✓ Fi.CI.BAT: success for Add block-multicopy-* subtests Patchwork
2022-12-12 21:05 ` [igt-dev] ✓ Fi.CI.IGT: " Patchwork
