intel-gfx.lists.freedesktop.org archive mirror
* [Intel-gfx] [RFC PATCH i-g-t 0/2] tests/prime_vgem: Examine blitter access path
@ 2020-01-09 14:01 Janusz Krzysztofik
  2020-01-09 14:01 ` [Intel-gfx] [PATCH i-g-t 1/2] lib/intel_batchbuffer: Add blitter copy using XY_SRC_COPY_BLT Janusz Krzysztofik
  2020-01-09 14:01 ` [Intel-gfx] [PATCH i-g-t 2/2] tests/prime_vgem: Examine blitter access path Janusz Krzysztofik
  0 siblings, 2 replies; 10+ messages in thread
From: Janusz Krzysztofik @ 2020-01-09 14:01 UTC (permalink / raw)
  To: Chris Wilson; +Cc: igt-dev, Daniel Vetter, intel-gfx

On future hardware with a missing GGTT BAR we won't be able to exercise
dma-buf access via that path.  An alternative to the basic-gtt subtest
for testing dma-buf access is required, as well as alternatives to the
basic-fence-mmap and coherency-gtt subtests for testing WC coherency.

Access to the dma sg list feature exposed by dma-buf can be tested
through the blitter.  Unfortunately, we don't have any equivalently
simple tests that use the blitter.  Provide them.

The blitter XY_SRC_COPY method, implemented by the
igt_blitter_src_copy__raw() IGT library function, has been chosen.

v2: As fast copy is not supported on platforms older than Gen 9,
    use XY_SRC_COPY instead (Chris),
  - bundle with a patch providing XY_SRC_COPY implementation,
  - add subtest descriptions.

Janusz Krzysztofik (1):
  tests/prime_vgem: Examine blitter access path

Vanshidhar Konda (1):
  lib/intel_batchbuffer: Add blitter copy using XY_SRC_COPY_BLT

 lib/intel_batchbuffer.c | 185 ++++++++++++++++++++++++++++++++++++++
 lib/intel_batchbuffer.h |  21 +++++
 tests/prime_vgem.c      | 192 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 398 insertions(+)

-- 
2.21.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [Intel-gfx] [PATCH i-g-t 1/2] lib/intel_batchbuffer: Add blitter copy using XY_SRC_COPY_BLT
  2020-01-09 14:01 [Intel-gfx] [RFC PATCH i-g-t 0/2] tests/prime_vgem: Examine blitter access path Janusz Krzysztofik
@ 2020-01-09 14:01 ` Janusz Krzysztofik
  2020-01-09 14:43   ` Chris Wilson
  2020-01-09 14:01 ` [Intel-gfx] [PATCH i-g-t 2/2] tests/prime_vgem: Examine blitter access path Janusz Krzysztofik
  1 sibling, 1 reply; 10+ messages in thread
From: Janusz Krzysztofik @ 2020-01-09 14:01 UTC (permalink / raw)
  To: Chris Wilson; +Cc: igt-dev, Daniel Vetter, intel-gfx

From: Vanshidhar Konda <vanshidhar.r.konda@intel.com>

Add a method that uses the XY_SRC_COPY_BLT instruction for copying
buffers using the blitter engine.

v2: Use uint32_t for parameters; fix stride for Gen2/3
v3: Taken over from Vanshi by Janusz,
  - skip upper bits of src and dst addresses on Gen < 8

Signed-off-by: Vanshidhar Konda <vanshidhar.r.konda@intel.com>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 lib/intel_batchbuffer.c | 185 ++++++++++++++++++++++++++++++++++++++++
 lib/intel_batchbuffer.h |  21 +++++
 2 files changed, 206 insertions(+)

diff --git a/lib/intel_batchbuffer.c b/lib/intel_batchbuffer.c
index 07de5cbb..a56cd5a7 100644
--- a/lib/intel_batchbuffer.c
+++ b/lib/intel_batchbuffer.c
@@ -45,6 +45,12 @@
 
 #include <i915_drm.h>
 
+#define MI_FLUSH_DW (0x26 << 23)
+
+#define BCS_SWCTRL 0x22200
+#define BCS_SRC_Y (1 << 0)
+#define BCS_DST_Y (1 << 1)
+
 /**
  * SECTION:intel_batchbuffer
  * @short_description: Batchbuffer and blitter support
@@ -660,6 +666,185 @@ static void exec_blit(int fd,
 	gem_execbuf(fd, &exec);
 }
 
+static uint32_t src_copy_dword0(uint32_t src_tiling, uint32_t dst_tiling,
+				uint32_t bpp, uint32_t device_gen)
+{
+	uint32_t dword0 = 0;
+
+	dword0 |= XY_SRC_COPY_BLT_CMD;
+	if (bpp == 32)
+		dword0 |= XY_SRC_COPY_BLT_WRITE_RGB |
+			XY_SRC_COPY_BLT_WRITE_ALPHA;
+
+	if (device_gen >= 4 && src_tiling)
+		dword0 |= XY_SRC_COPY_BLT_SRC_TILED;
+	if (device_gen >= 4 && dst_tiling)
+		dword0 |= XY_SRC_COPY_BLT_DST_TILED;
+
+	return dword0;
+}
+
+static uint32_t src_copy_dword1(uint32_t dst_pitch, uint32_t bpp)
+{
+	uint32_t dword1 = 0;
+
+	switch (bpp) {
+	case 8:
+		break;
+	case 16:
+		dword1 |= (1 << 24); /* Only support 565 color */
+		break;
+	case 32:
+		dword1 |= (3 << 24);
+		break;
+	default:
+		igt_assert(0);
+	}
+
+	dword1 |= 0xcc << 16;
+	dword1 |= dst_pitch;
+
+	return dword1;
+}
+/**
+ * igt_blitter_src_copy__raw:
+ * @fd: file descriptor of the i915 driver
+ * @src_handle: GEM handle of the source buffer
+ * @src_delta: offset into the source GEM bo, in bytes
+ * @src_stride: Stride (in bytes) of the source buffer
+ * @src_tiling: Tiling mode of the source buffer
+ * @src_x: X coordinate of the source region to copy
+ * @src_y: Y coordinate of the source region to copy
+ * @width: Width of the region to copy
+ * @height: Height of the region to copy
+ * @bpp: source and destination bits per pixel
+ * @dst_handle: GEM handle of the destination buffer
+ * @dst_delta: offset into the destination GEM bo, in bytes
+ * @dst_stride: Stride (in bytes) of the destination buffer
+ * @dst_tiling: Tiling mode of the destination buffer
+ * @dst_x: X coordinate of destination
+ * @dst_y: Y coordinate of destination
+ *
+ */
+void igt_blitter_src_copy__raw(int fd,
+				/* src */
+				uint32_t src_handle,
+				uint32_t src_delta,
+				uint32_t src_stride,
+				uint32_t src_tiling,
+				uint32_t src_x, uint32_t src_y,
+
+				/* size */
+				uint32_t width, uint32_t height,
+
+				/* bpp */
+				uint32_t bpp,
+
+				/* dst */
+				uint32_t dst_handle,
+				uint32_t dst_delta,
+				uint32_t dst_stride,
+				uint32_t dst_tiling,
+				uint32_t dst_x, uint32_t dst_y)
+{
+	uint32_t batch[32];
+	struct drm_i915_gem_exec_object2 objs[3];
+	struct drm_i915_gem_relocation_entry relocs[2];
+	uint32_t batch_handle;
+	uint32_t src_pitch, dst_pitch;
+	uint32_t dst_reloc_offset, src_reloc_offset;
+	int i = 0;
+	uint32_t gen = intel_gen(intel_get_drm_devid(fd));
+	const bool has_64b_reloc = gen >= 8;
+
+	memset(batch, 0, sizeof(batch));
+
+	igt_assert((src_tiling == I915_TILING_NONE) ||
+		   (src_tiling == I915_TILING_X) ||
+		   (src_tiling == I915_TILING_Y));
+	igt_assert((dst_tiling == I915_TILING_NONE) ||
+		   (dst_tiling == I915_TILING_X) ||
+		   (dst_tiling == I915_TILING_Y));
+
+	src_pitch = (gen >= 4 && src_tiling) ? src_stride / 4 : src_stride;
+	dst_pitch = (gen >= 4 && dst_tiling) ? dst_stride / 4 : dst_stride;
+
+	CHECK_RANGE(src_x); CHECK_RANGE(src_y);
+	CHECK_RANGE(dst_x); CHECK_RANGE(dst_y);
+	CHECK_RANGE(width); CHECK_RANGE(height);
+	CHECK_RANGE(src_x + width); CHECK_RANGE(src_y + height);
+	CHECK_RANGE(dst_x + width); CHECK_RANGE(dst_y + height);
+	CHECK_RANGE(src_pitch); CHECK_RANGE(dst_pitch);
+
+	if ((src_tiling | dst_tiling) >= I915_TILING_Y) {
+		unsigned int mask;
+
+		batch[i++] = MI_LOAD_REGISTER_IMM;
+		batch[i++] = BCS_SWCTRL;
+
+		mask = (BCS_SRC_Y | BCS_DST_Y) << 16;
+		if (src_tiling == I915_TILING_Y)
+			mask |= BCS_SRC_Y;
+		if (dst_tiling == I915_TILING_Y)
+			mask |= BCS_DST_Y;
+		batch[i++] = mask;
+	}
+
+	batch[i] = src_copy_dword0(src_tiling, dst_tiling, bpp, gen);
+	batch[i++] |= 6 + 2 * has_64b_reloc;
+	batch[i++] = src_copy_dword1(dst_pitch, bpp);
+	batch[i++] = (dst_y << 16) | dst_x; /* dst x1,y1 */
+	batch[i++] = ((dst_y + height) << 16) | (dst_x + width); /* dst x2,y2 */
+	dst_reloc_offset = i;
+	batch[i++] = dst_delta; /* dst address lower bits */
+	if (has_64b_reloc)
+		batch[i++] = 0;	/* dst address upper bits */
+	batch[i++] = (src_y << 16) | src_x; /* src x1,y1 */
+	batch[i++] = src_pitch;
+	src_reloc_offset = i;
+	batch[i++] = src_delta; /* src address lower bits */
+	if (has_64b_reloc)
+		batch[i++] = 0;	/* src address upper bits */
+
+	if ((src_tiling | dst_tiling) >= I915_TILING_Y) {
+		igt_assert(gen >= 6);
+		batch[i++] = MI_FLUSH_DW | 2;
+		batch[i++] = 0;
+		batch[i++] = 0;
+		batch[i++] = 0;
+
+		batch[i++] = MI_LOAD_REGISTER_IMM;
+		batch[i++] = BCS_SWCTRL;
+		batch[i++] = (BCS_SRC_Y | BCS_DST_Y) << 16;
+	}
+
+	batch[i++] = MI_BATCH_BUFFER_END;
+	batch[i++] = MI_NOOP;
+
+	igt_assert(i <= ARRAY_SIZE(batch));
+
+	batch_handle = gem_create(fd, 4096);
+	gem_write(fd, batch_handle, 0, batch, sizeof(batch));
+
+	fill_relocation(&relocs[0], dst_handle, dst_delta, dst_reloc_offset,
+			I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER);
+	fill_relocation(&relocs[1], src_handle, src_delta, src_reloc_offset,
+			I915_GEM_DOMAIN_RENDER, 0);
+
+	fill_object(&objs[0], dst_handle, NULL, 0);
+	fill_object(&objs[1], src_handle, NULL, 0);
+	fill_object(&objs[2], batch_handle, relocs, 2);
+
+	if (dst_tiling)
+		objs[0].flags |= EXEC_OBJECT_NEEDS_FENCE;
+	if (src_tiling)
+		objs[1].flags |= EXEC_OBJECT_NEEDS_FENCE;
+
+	exec_blit(fd, objs, 3, ARRAY_SIZE(batch));
+
+	gem_close(fd, batch_handle);
+}
+
 /**
  * igt_blitter_fast_copy__raw:
  * @fd: file descriptor of the i915 driver
diff --git a/lib/intel_batchbuffer.h b/lib/intel_batchbuffer.h
index e5f6e6d0..13a4d2fa 100644
--- a/lib/intel_batchbuffer.h
+++ b/lib/intel_batchbuffer.h
@@ -241,6 +241,27 @@ struct igt_buf {
 unsigned igt_buf_width(const struct igt_buf *buf);
 unsigned igt_buf_height(const struct igt_buf *buf);
 
+void igt_blitter_src_copy__raw(int fd,
+				/* src */
+				uint32_t src_handle,
+				uint32_t src_delta,
+				uint32_t src_stride,
+				uint32_t src_tiling,
+				uint32_t src_x, uint32_t src_y,
+
+				/* size */
+				uint32_t width, uint32_t height,
+
+				/* bpp */
+				uint32_t bpp,
+
+				/* dst */
+				uint32_t dst_handle,
+				uint32_t dst_delta,
+				uint32_t dst_stride,
+				uint32_t dst_tiling,
+				uint32_t dst_x, uint32_t dst_y);
+
 void igt_blitter_fast_copy(struct intel_batchbuffer *batch,
 			   const struct igt_buf *src, unsigned src_delta,
 			   unsigned src_x, unsigned src_y,
-- 
2.21.0



* [Intel-gfx] [PATCH i-g-t 2/2] tests/prime_vgem: Examine blitter access path
  2020-01-09 14:01 [Intel-gfx] [RFC PATCH i-g-t 0/2] tests/prime_vgem: Examine blitter access path Janusz Krzysztofik
  2020-01-09 14:01 ` [Intel-gfx] [PATCH i-g-t 1/2] lib/intel_batchbuffer: Add blitter copy using XY_SRC_COPY_BLT Janusz Krzysztofik
@ 2020-01-09 14:01 ` Janusz Krzysztofik
  2020-01-09 15:03   ` Chris Wilson
  1 sibling, 1 reply; 10+ messages in thread
From: Janusz Krzysztofik @ 2020-01-09 14:01 UTC (permalink / raw)
  To: Chris Wilson; +Cc: igt-dev, Daniel Vetter, intel-gfx

On future hardware with a missing GGTT BAR we won't be able to exercise
dma-buf access via that path.  An alternative to the basic-gtt subtest
for testing dma-buf access is required, as well as alternatives to the
basic-fence-mmap and coherency-gtt subtests for testing WC coherency.

Access to the dma sg list feature exposed by dma-buf can be tested
through the blitter.  Unfortunately, we don't have any equivalently
simple tests that use the blitter.  Provide them.

The blitter XY_SRC_COPY method, implemented by the
igt_blitter_src_copy__raw() IGT library function, has been chosen.

v2: As fast copy is not supported on platforms older than Gen 9,
    use XY_SRC_COPY instead (Chris),
  - add subtest descriptions.

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 tests/prime_vgem.c | 192 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 192 insertions(+)

diff --git a/tests/prime_vgem.c b/tests/prime_vgem.c
index 69ae8c9b..9ceee47a 100644
--- a/tests/prime_vgem.c
+++ b/tests/prime_vgem.c
@@ -23,6 +23,7 @@
 
 #include "igt.h"
 #include "igt_vgem.h"
+#include "intel_batchbuffer.h"
 
 #include <sys/ioctl.h>
 #include <sys/poll.h>
@@ -171,6 +172,77 @@ static void test_fence_mmap(int i915, int vgem)
 	close(slave[1]);
 }
 
+static void test_fence_blt(int i915, int vgem)
+{
+	struct vgem_bo scratch;
+	uint32_t prime;
+	uint32_t *ptr;
+	uint32_t fence;
+	int dmabuf, i;
+	int master[2], slave[2];
+
+	igt_assert(pipe(master) == 0);
+	igt_assert(pipe(slave) == 0);
+
+	scratch.width = 1024;
+	scratch.height = 1024;
+	scratch.bpp = 32;
+	vgem_create(vgem, &scratch);
+
+	dmabuf = prime_handle_to_fd(vgem, scratch.handle);
+	prime = prime_fd_to_handle(i915, dmabuf);
+	close(dmabuf);
+
+	igt_fork(child, 1) {
+		uint32_t native;
+
+		close(master[0]);
+		close(slave[1]);
+
+		native = gem_create(i915, scratch.size);
+
+		ptr = gem_mmap__wc(i915, native, 0, scratch.size, PROT_READ);
+		for (i = 0; i < 1024; i++)
+			igt_assert_eq(ptr[1024 * i], 0);
+
+		write(master[1], &child, sizeof(child));
+		read(slave[0], &child, sizeof(child));
+
+		igt_blitter_src_copy__raw(i915, prime, 0,
+					  scratch.width * scratch.bpp / 8,
+					  I915_TILING_NONE, 0, 0,
+					  scratch.width, scratch.height,
+					  scratch.bpp, native, 0,
+					  scratch.width * scratch.bpp / 8,
+					  I915_TILING_NONE, 0, 0);
+		gem_sync(i915, native);
+
+		for (i = 0; i < 1024; i++)
+			igt_assert_eq(ptr[1024 * i], i);
+
+		munmap(ptr, scratch.size);
+		gem_close(i915, native);
+		gem_close(i915, prime);
+	}
+
+	close(master[1]);
+	close(slave[0]);
+	read(master[0], &i, sizeof(i));
+	fence = vgem_fence_attach(vgem, &scratch, VGEM_FENCE_WRITE);
+	write(slave[1], &i, sizeof(i));
+
+	ptr = vgem_mmap(vgem, &scratch, PROT_WRITE);
+	for (i = 0; i < 1024; i++)
+		ptr[1024 * i] = i;
+	munmap(ptr, scratch.size);
+	vgem_fence_signal(vgem, fence);
+	gem_close(vgem, scratch.handle);
+
+	igt_waitchildren();
+	close(master[0]);
+	close(slave[1]);
+}
+
 static void test_write(int vgem, int i915)
 {
 	struct vgem_bo scratch;
@@ -236,6 +308,62 @@ static void test_gtt(int vgem, int i915)
 	gem_close(vgem, scratch.handle);
 }
 
+static void test_blt(int vgem, int i915)
+{
+	struct vgem_bo scratch;
+	uint32_t prime, native;
+	uint32_t *ptr;
+	int dmabuf, i;
+
+	scratch.width = 1024;
+	scratch.height = 1024;
+	scratch.bpp = 32;
+	vgem_create(vgem, &scratch);
+
+	dmabuf = prime_handle_to_fd(vgem, scratch.handle);
+	prime = prime_fd_to_handle(i915, dmabuf);
+	close(dmabuf);
+
+	native = gem_create(i915, scratch.size);
+
+	ptr = gem_mmap__wc(i915, native, 0, scratch.size, PROT_WRITE);
+	for (i = 0; i < 1024; i++)
+		ptr[1024 * i] = i;
+	munmap(ptr, scratch.size);
+
+	igt_blitter_src_copy__raw(i915,
+				  native, 0, scratch.width * scratch.bpp / 8,
+				  I915_TILING_NONE, 0, 0,
+				  scratch.width, scratch.height, scratch.bpp,
+				  prime, 0, scratch.width * scratch.bpp / 8,
+				  I915_TILING_NONE, 0, 0);
+	gem_sync(i915, prime);
+
+	ptr = vgem_mmap(vgem, &scratch, PROT_READ | PROT_WRITE);
+	for (i = 0; i < 1024; i++) {
+		igt_assert_eq(ptr[1024 * i], i);
+		ptr[1024 * i] = ~i;
+	}
+	munmap(ptr, scratch.size);
+
+	igt_blitter_src_copy__raw(i915,
+				  prime, 0, scratch.width * scratch.bpp / 8,
+				  I915_TILING_NONE, 0, 0,
+				  scratch.width, scratch.height, scratch.bpp,
+				  native, 0, scratch.width * scratch.bpp / 8,
+				  I915_TILING_NONE, 0, 0);
+	gem_sync(i915, native);
+
+	ptr = gem_mmap__wc(i915, native, 0, scratch.size, PROT_READ);
+	for (i = 0; i < 1024; i++)
+		igt_assert_eq(ptr[1024 * i], ~i);
+	munmap(ptr, scratch.size);
+
+	gem_close(i915, native);
+	gem_close(i915, prime);
+	gem_close(vgem, scratch.handle);
+}
+
 static void test_shrink(int vgem, int i915)
 {
 	struct vgem_bo scratch = {
@@ -319,6 +447,59 @@ static void test_gtt_interleaved(int vgem, int i915)
 	gem_close(vgem, scratch.handle);
 }
 
+static void test_blt_interleaved(int vgem, int i915)
+{
+	struct vgem_bo scratch;
+	uint32_t prime, native;
+	uint32_t *foreign, *local;
+	int dmabuf, i;
+
+	scratch.width = 1024;
+	scratch.height = 1024;
+	scratch.bpp = 32;
+	vgem_create(vgem, &scratch);
+
+	dmabuf = prime_handle_to_fd(vgem, scratch.handle);
+	prime = prime_fd_to_handle(i915, dmabuf);
+	close(dmabuf);
+
+	native = gem_create(i915, scratch.size);
+
+	foreign = vgem_mmap(vgem, &scratch, PROT_WRITE);
+	local = gem_mmap__wc(i915, native, 0, scratch.size, PROT_WRITE);
+
+	for (i = 0; i < 1024; i++) {
+		local[1024 * i] = i;
+		igt_blitter_src_copy__raw(i915, native, 0,
+					  scratch.width * scratch.bpp / 8,
+					  I915_TILING_NONE, 0, i,
+					  scratch.width, 1, scratch.bpp,
+					  prime, 0,
+					  scratch.width * scratch.bpp / 8,
+					  I915_TILING_NONE, 0, i);
+		gem_sync(i915, prime);
+		igt_assert_eq(foreign[1024 * i], i);
+
+		foreign[1024 * i] = ~i;
+		igt_blitter_src_copy__raw(i915, prime, 0,
+					  scratch.width * scratch.bpp / 8,
+					  I915_TILING_NONE, 0, i,
+					  scratch.width, 1, scratch.bpp,
+					  native, 0,
+					  scratch.width * scratch.bpp / 8,
+					  I915_TILING_NONE, 0, i);
+		gem_sync(i915, native);
+		igt_assert_eq(local[1024 * i], ~i);
+	}
+
+	munmap(local, scratch.size);
+	munmap(foreign, scratch.size);
+
+	gem_close(i915, native);
+	gem_close(i915, prime);
+	gem_close(vgem, scratch.handle);
+}
+
 static bool prime_busy(int fd, bool excl)
 {
 	struct pollfd pfd = { .fd = fd, .events = excl ? POLLOUT : POLLIN };
@@ -834,12 +1015,20 @@ igt_main
 	igt_subtest("basic-gtt")
 		test_gtt(vgem, i915);
 
+	igt_describe("Examine blitter access path");
+	igt_subtest("basic-blt")
+		test_blt(vgem, i915);
+
 	igt_subtest("shrink")
 		test_shrink(vgem, i915);
 
 	igt_subtest("coherency-gtt")
 		test_gtt_interleaved(vgem, i915);
 
+	igt_describe("Examine blitter access path WC coherency");
+	igt_subtest("coherency-blt")
+		test_blt_interleaved(vgem, i915);
+
 	for (e = intel_execution_engines; e->name; e++) {
 		igt_subtest_f("%ssync-%s",
 			      e->exec_id == 0 ? "basic-" : "",
@@ -886,6 +1075,9 @@ igt_main
 			test_fence_read(i915, vgem);
 		igt_subtest("basic-fence-mmap")
 			test_fence_mmap(i915, vgem);
+		igt_describe("Examine blitter access path fencing");
+		igt_subtest("basic-fence-blt")
+			test_fence_blt(i915, vgem);
 
 		for (e = intel_execution_engines; e->name; e++) {
 			igt_subtest_f("%sfence-wait-%s",
-- 
2.21.0



* Re: [Intel-gfx] [PATCH i-g-t 1/2] lib/intel_batchbuffer: Add blitter copy using XY_SRC_COPY_BLT
  2020-01-09 14:01 ` [Intel-gfx] [PATCH i-g-t 1/2] lib/intel_batchbuffer: Add blitter copy using XY_SRC_COPY_BLT Janusz Krzysztofik
@ 2020-01-09 14:43   ` Chris Wilson
  2020-01-15 11:10     ` Janusz Krzysztofik
  0 siblings, 1 reply; 10+ messages in thread
From: Chris Wilson @ 2020-01-09 14:43 UTC (permalink / raw)
  To: Janusz Krzysztofik; +Cc: igt-dev, Daniel Vetter, intel-gfx

Quoting Janusz Krzysztofik (2020-01-09 14:01:24)
> From: Vanshidhar Konda <vanshidhar.r.konda@intel.com>
> 
> Add a method that uses the XY_SRC_COPY_BLT instruction for copying
> buffers using the blitter engine.
> 
> v2: Use uint32_t for parameters; fix stride for Gen2/3
> v3: Taken over from Vanshi by Janusz,
>   - skip upper bits of src and dst addresses on Gen < 8
> 
> Signed-off-by: Vanshidhar Konda <vanshidhar.r.konda@intel.com>
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> ---
>  lib/intel_batchbuffer.c | 185 ++++++++++++++++++++++++++++++++++++++++
>  lib/intel_batchbuffer.h |  21 +++++
>  2 files changed, 206 insertions(+)
> 
> diff --git a/lib/intel_batchbuffer.c b/lib/intel_batchbuffer.c
> index 07de5cbb..a56cd5a7 100644
> --- a/lib/intel_batchbuffer.c
> +++ b/lib/intel_batchbuffer.c
> @@ -45,6 +45,12 @@
>  
>  #include <i915_drm.h>
>  
> +#define MI_FLUSH_DW (0x26 << 23)
> +
> +#define BCS_SWCTRL 0x22200
> +#define BCS_SRC_Y (1 << 0)
> +#define BCS_DST_Y (1 << 1)
> +
>  /**
>   * SECTION:intel_batchbuffer
>   * @short_description: Batchbuffer and blitter support
> @@ -660,6 +666,185 @@ static void exec_blit(int fd,
>         gem_execbuf(fd, &exec);
>  }
>  
> +static uint32_t src_copy_dword0(uint32_t src_tiling, uint32_t dst_tiling,
> +                               uint32_t bpp, uint32_t device_gen)
> +{
> +       uint32_t dword0 = 0;
> +
> +       dword0 |= XY_SRC_COPY_BLT_CMD;
> +       if (bpp == 32)
> +               dword0 |= XY_SRC_COPY_BLT_WRITE_RGB |
> +                       XY_SRC_COPY_BLT_WRITE_ALPHA;
> +
> +       if (device_gen >= 4 && src_tiling)
> +               dword0 |= XY_SRC_COPY_BLT_SRC_TILED;
> +       if (device_gen >= 4 && dst_tiling)
> +               dword0 |= XY_SRC_COPY_BLT_DST_TILED;
> +
> +       return dword0;
> +}
> +
> +static uint32_t src_copy_dword1(uint32_t dst_pitch, uint32_t bpp)
> +{
> +       uint32_t dword1 = 0;
> +
> +       switch (bpp) {
> +       case 8:
> +               break;
> +       case 16:
> +               dword1 |= (1 << 24); /* Only support 565 color */
> +               break;
> +       case 32:
> +               dword1 |= (3 << 24);
> +               break;
> +       default:
> +               igt_assert(0);
> +       }
> +
> +       dword1 |= 0xcc << 16;
> +       dword1 |= dst_pitch;
> +
> +       return dword1;
> +}
> +/**
> + * igt_blitter_src_copy__raw:
> + * @fd: file descriptor of the i915 driver
> + * @src_handle: GEM handle of the source buffer
> + * @src_delta: offset into the source GEM bo, in bytes
> + * @src_stride: Stride (in bytes) of the source buffer
> + * @src_tiling: Tiling mode of the source buffer
> + * @src_x: X coordinate of the source region to copy
> + * @src_y: Y coordinate of the source region to copy
> + * @width: Width of the region to copy
> + * @height: Height of the region to copy
> + * @bpp: source and destination bits per pixel
> + * @dst_handle: GEM handle of the destination buffer
> + * @dst_delta: offset into the destination GEM bo, in bytes
> + * @dst_stride: Stride (in bytes) of the destination buffer
> + * @dst_tiling: Tiling mode of the destination buffer
> + * @dst_x: X coordinate of destination
> + * @dst_y: Y coordinate of destination
> + *
> + */
> +void igt_blitter_src_copy__raw(int fd,

Just call it igt_blitter_copy(). It should be the shorter, more often
useful variant. _fast is the special case version.

These functions should not be in intel_batchbuffer.c. Either you intend
them to be vendor neutral, and they could indeed become so, or they
should not be called igt_.

> +                               /* src */
> +                               uint32_t src_handle,
> +                               uint32_t src_delta,
> +                               uint32_t src_stride,
> +                               uint32_t src_tiling,
> +                               uint32_t src_x, uint32_t src_y,
> +
> +                               /* size */
> +                               uint32_t width, uint32_t height,
> +
> +                               /* bpp */
> +                               uint32_t bpp,
> +
> +                               /* dst */
> +                               uint32_t dst_handle,
> +                               uint32_t dst_delta,
> +                               uint32_t dst_stride,
> +                               uint32_t dst_tiling,
> +                               uint32_t dst_x, uint32_t dst_y)
> +{
> +       uint32_t batch[32];
> +       struct drm_i915_gem_exec_object2 objs[3];
> +       struct drm_i915_gem_relocation_entry relocs[2];
> +       uint32_t batch_handle;
> +       uint32_t src_pitch, dst_pitch;
> +       uint32_t dst_reloc_offset, src_reloc_offset;
> +       int i = 0;
> +       uint32_t gen = intel_gen(intel_get_drm_devid(fd));
> +       const bool has_64b_reloc = gen >= 8;
> +
> +       memset(batch, 0, sizeof(batch));
> +
> +       igt_assert((src_tiling == I915_TILING_NONE) ||
> +                  (src_tiling == I915_TILING_X) ||
> +                  (src_tiling == I915_TILING_Y));
> +       igt_assert((dst_tiling == I915_TILING_NONE) ||
> +                  (dst_tiling == I915_TILING_X) ||
> +                  (dst_tiling == I915_TILING_Y));
> +
> +       src_pitch = (gen >= 4 && src_tiling) ? src_stride / 4 : src_stride;
> +       dst_pitch = (gen >= 4 && dst_tiling) ? dst_stride / 4 : dst_stride;
> +
> +       CHECK_RANGE(src_x); CHECK_RANGE(src_y);
> +       CHECK_RANGE(dst_x); CHECK_RANGE(dst_y);
> +       CHECK_RANGE(width); CHECK_RANGE(height);
> +       CHECK_RANGE(src_x + width); CHECK_RANGE(src_y + height);
> +       CHECK_RANGE(dst_x + width); CHECK_RANGE(dst_y + height);
> +       CHECK_RANGE(src_pitch); CHECK_RANGE(dst_pitch);
> +
> +       if ((src_tiling | dst_tiling) >= I915_TILING_Y) {
> +               unsigned int mask;
> +
> +               batch[i++] = MI_LOAD_REGISTER_IMM;
> +               batch[i++] = BCS_SWCTRL;
> +
> +               mask = (BCS_SRC_Y | BCS_DST_Y) << 16;
> +               if (src_tiling == I915_TILING_Y)
> +                       mask |= BCS_SRC_Y;
> +               if (dst_tiling == I915_TILING_Y)
> +                       mask |= BCS_DST_Y;
> +               batch[i++] = mask;
> +       }
> +
> +       batch[i] = src_copy_dword0(src_tiling, dst_tiling, bpp, gen);
> +       batch[i++] |= 6 + 2 * has_64b_reloc;
> +       batch[i++] = src_copy_dword1(dst_pitch, bpp);
> +       batch[i++] = (dst_y << 16) | dst_x; /* dst x1,y1 */
> +       batch[i++] = ((dst_y + height) << 16) | (dst_x + width); /* dst x2,y2 */
> +       dst_reloc_offset = i;
> +       batch[i++] = dst_delta; /* dst address lower bits */
> +       if (has_64b_reloc)
> +               batch[i++] = 0; /* dst address upper bits */

It is better to do the relocations inline and use the computed address
from the relocation routine, so that the value in the batch is known to
be equal to the value passed to the kernel.

> +       batch[i++] = (src_y << 16) | src_x; /* src x1,y1 */
> +       batch[i++] = src_pitch;
> +       src_reloc_offset = i;
> +       batch[i++] = src_delta; /* src address lower bits */
> +       if (has_64b_reloc)
> +               batch[i++] = 0; /* src address upper bits */
> +
> +       if ((src_tiling | dst_tiling) >= I915_TILING_Y) {
> +               igt_assert(gen >= 6);
> +               batch[i++] = MI_FLUSH_DW | 2;
> +               batch[i++] = 0;
> +               batch[i++] = 0;
> +               batch[i++] = 0;
> +
> +               batch[i++] = MI_LOAD_REGISTER_IMM;
> +               batch[i++] = BCS_SWCTRL;
> +               batch[i++] = (BCS_SRC_Y | BCS_DST_Y) << 16;
> +       }
> +
> +       batch[i++] = MI_BATCH_BUFFER_END;
> +       batch[i++] = MI_NOOP;
> +
> +       igt_assert(i <= ARRAY_SIZE(batch));
> +
> +       batch_handle = gem_create(fd, 4096);
> +       gem_write(fd, batch_handle, 0, batch, sizeof(batch));
> +
> +       fill_relocation(&relocs[0], dst_handle, dst_delta, dst_reloc_offset,
> +                       I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER);
> +       fill_relocation(&relocs[1], src_handle, src_delta, src_reloc_offset,
> +                       I915_GEM_DOMAIN_RENDER, 0);
> +
> +       fill_object(&objs[0], dst_handle, NULL, 0);
> +       fill_object(&objs[1], src_handle, NULL, 0);
> +       fill_object(&objs[2], batch_handle, relocs, 2);
> +
> +       if (dst_tiling)
> +               objs[0].flags |= EXEC_OBJECT_NEEDS_FENCE;

obj[0].flags |= EXEC_OBJECT_WRITE;

and then you would be eligible to use execbuf.flags |= I915_EXEC_NORELOC;
But given you have no history, for it to work you could preseed the
obj.offset to sensible values rather than have all three in the same
location. And you could fix the code to use I915_EXEC_LUT_HANDLE.

> +       if (src_tiling)
> +               objs[1].flags |= EXEC_OBJECT_NEEDS_FENCE;
> +
> +       exec_blit(fd, objs, 3, ARRAY_SIZE(batch));

exec_blit() is broken as it does not apply to <gen6.

> +
> +       gem_close(fd, batch_handle);
> +}


* Re: [Intel-gfx] [PATCH i-g-t 2/2] tests/prime_vgem: Examine blitter access path
  2020-01-09 14:01 ` [Intel-gfx] [PATCH i-g-t 2/2] tests/prime_vgem: Examine blitter access path Janusz Krzysztofik
@ 2020-01-09 15:03   ` Chris Wilson
  2020-01-15 11:04     ` Janusz Krzysztofik
  2020-01-22 15:24     ` Janusz Krzysztofik
  0 siblings, 2 replies; 10+ messages in thread
From: Chris Wilson @ 2020-01-09 15:03 UTC (permalink / raw)
  To: Janusz Krzysztofik; +Cc: igt-dev, Daniel Vetter, intel-gfx

Quoting Janusz Krzysztofik (2020-01-09 14:01:25)
> On future hardware with missing GGTT BAR we won't be able to exercise
> dma-buf access via that path.  An alternative to basic-gtt subtest for
> testing dma-buf access is required, as well as basic-fence-mmap and
> coherency-gtt subtest alternatives for testing WC coherency.
> 
> Access to the dma sg list feature exposed by dma-buf can be tested
> through blitter.  Unfortunately we don't have any equivalently simple
> tests that use blitter.  Provide them.
> 
> Blitter XY_SRC_COPY method implemented by igt_blitter_src_copy__raw()
> IGT library function has been chosen.
> 
> v2: As fast copy is not supported on platforms older than Gen 9,
>     use XY_SRC_COPY instead (Chris),
>   - add subtest descriptions.
> 
> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
> Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> ---
>  tests/prime_vgem.c | 192 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 192 insertions(+)
> 
> diff --git a/tests/prime_vgem.c b/tests/prime_vgem.c
> index 69ae8c9b..9ceee47a 100644
> --- a/tests/prime_vgem.c
> +++ b/tests/prime_vgem.c
> @@ -23,6 +23,7 @@
>  
>  #include "igt.h"
>  #include "igt_vgem.h"
> +#include "intel_batchbuffer.h"
>  
>  #include <sys/ioctl.h>
>  #include <sys/poll.h>
> @@ -171,6 +172,77 @@ static void test_fence_mmap(int i915, int vgem)
>         close(slave[1]);
>  }
>  
> +static void test_fence_blt(int i915, int vgem)
> +{
> +       struct vgem_bo scratch;
> +       uint32_t prime;
> +       uint32_t *ptr;
> +       uint32_t fence;
> +       int dmabuf, i;
> +       int master[2], slave[2];
> +
> +       igt_assert(pipe(master) == 0);
> +       igt_assert(pipe(slave) == 0);
> +
> +       scratch.width = 1024;
> +       scratch.height = 1024;
> +       scratch.bpp = 32;
> +       vgem_create(vgem, &scratch);

Hmm. The pitch is returned by the dumb create; did we not report it in
vgem_bo? scratch.pitch;
> +
> +       dmabuf = prime_handle_to_fd(vgem, scratch.handle);
> +       prime = prime_fd_to_handle(i915, dmabuf);
> +       close(dmabuf);
> +
> +       igt_fork(child, 1) {
> +               uint32_t native;
> +
> +               close(master[0]);
> +               close(slave[1]);
> +
> +               native = gem_create(i915, scratch.size);
> +
> +               ptr = gem_mmap__wc(i915, native, 0, scratch.size, PROT_READ);
> +               for (i = 0; i < 1024; i++)
for (i = 0; i < scratch.height; i++)

> +                       igt_assert_eq(ptr[1024 * i], 0);

igt_assert_eq_u32(ptr[scratch.pitch * i / sizeof(*ptr)], 0);

The "%08x" will be easier to read when you have ~i later on.

> +
> +               write(master[1], &child, sizeof(child));
> +               read(slave[0], &child, sizeof(child));
> +
> +               igt_blitter_src_copy__raw(i915, prime, 0,
> +                                         scratch.width * scratch.bpp / 8,

scratch.pitch,

> +                                         I915_TILING_NONE, 0, 0,
> +                                         scratch.width, scratch.height,
> +                                         scratch.bpp, native, 0,
> +                                         scratch.width * scratch.bpp / 8,
> +                                         I915_TILING_NONE, 0, 0);
> +               gem_sync(i915, native);
> +
> +               for (i = 0; i < 1024; i++)
> +                       igt_assert_eq(ptr[1024 * i], i);
> +
> +               munmap(ptr, scratch.size);
> +               gem_close(i915, native);
> +               gem_close(i915, prime);
> +       }
> +
> +       close(master[1]);
> +       close(slave[0]);
> +       read(master[0], &i, sizeof(i));
> +       fence = vgem_fence_attach(vgem, &scratch, VGEM_FENCE_WRITE);
> +       write(slave[1], &i, sizeof(i));

Throw in a usleep(50*1000); to emphasize that the only thing stopping
the blitter is the fence.

> +
> +       ptr = vgem_mmap(vgem, &scratch, PROT_WRITE);
> +       for (i = 0; i < 1024; i++)
> +               ptr[1024 * i] = i;
> +       munmap(ptr, scratch.size);
> +       vgem_fence_signal(vgem, fence);
> +       gem_close(vgem, scratch.handle);
> +
> +       igt_waitchildren();
> +       close(master[0]);
> +       close(slave[1]);
> +}
> +
>  static void test_write(int vgem, int i915)
>  {
>         struct vgem_bo scratch;
> @@ -236,6 +308,62 @@ static void test_gtt(int vgem, int i915)
>         gem_close(vgem, scratch.handle);
>  }
>  
> +static void test_blt(int vgem, int i915)
> +{
> +       struct vgem_bo scratch;
> +       uint32_t prime, native;
> +       uint32_t *ptr;
> +       int dmabuf, i;
> +
> +       scratch.width = 1024;
> +       scratch.height = 1024;
> +       scratch.bpp = 32;
> +       vgem_create(vgem, &scratch);
> +
> +       dmabuf = prime_handle_to_fd(vgem, scratch.handle);
> +       prime = prime_fd_to_handle(i915, dmabuf);
> +       close(dmabuf);
> +
> +       native = gem_create(i915, scratch.size);
> +
> +       ptr = gem_mmap__wc(i915, native, 0, scratch.size, PROT_WRITE);
> +       for (i = 0; i < 1024; i++)
> +               ptr[1024 * i] = i;
> +       munmap(ptr, scratch.size);
> +
> +       igt_blitter_src_copy__raw(i915,
> +                                 native, 0, scratch.width * scratch.bpp / 8,
> +                                 I915_TILING_NONE, 0, 0,
> +                                 scratch.width, scratch.height, scratch.bpp,
> +                                 prime, 0, scratch.width * scratch.bpp / 8,
> +                                 I915_TILING_NONE, 0, 0);
> +       gem_sync(i915, prime);

Would this be better with prime_sync_start(dmabuf, true); ?

gem_sync(prime) is nice in its own way (checking sync on imported handle
provides correct serialisation for the vgem_mmap). But I think the
recommended practice would be to use dmabuf for the inter-device sync?

> +       ptr = vgem_mmap(vgem, &scratch, PROT_READ | PROT_WRITE);
> +       for (i = 0; i < 1024; i++) {
> +               igt_assert_eq(ptr[1024 * i], i);
> +               ptr[1024 * i] = ~i;
> +       }
> +       munmap(ptr, scratch.size);
> +
> +       igt_blitter_src_copy__raw(i915,
> +                                 prime, 0, scratch.width * scratch.bpp / 8,
> +                                 I915_TILING_NONE, 0, 0,
> +                                 scratch.width, scratch.height, scratch.bpp,
> +                                 native, 0, scratch.width * scratch.bpp / 8,
> +                                 I915_TILING_NONE, 0, 0);
> +       gem_sync(i915, native);
> +
> +       ptr = gem_mmap__wc(i915, native, 0, scratch.size, PROT_READ);
> +       for (i = 0; i < 1024; i++)
> +               igt_assert_eq(ptr[1024 * i], ~i);
> +       munmap(ptr, scratch.size);
> +
> +       gem_close(i915, native);
> +       gem_close(i915, prime);
> +       gem_close(vgem, scratch.handle);
> +}
> +
>  static void test_shrink(int vgem, int i915)
>  {
>         struct vgem_bo scratch = {
> @@ -319,6 +447,59 @@ static void test_gtt_interleaved(int vgem, int i915)
>         gem_close(vgem, scratch.handle);
>  }
>  
> +static void test_blt_interleaved(int vgem, int i915)
> +{
> +       struct vgem_bo scratch;
> +       uint32_t prime, native;
> +       uint32_t *foreign, *local;
> +       int dmabuf, i;
> +
> +       scratch.width = 1024;
> +       scratch.height = 1024;
> +       scratch.bpp = 32;
> +       vgem_create(vgem, &scratch);
> +
> +       dmabuf = prime_handle_to_fd(vgem, scratch.handle);
> +       prime = prime_fd_to_handle(i915, dmabuf);
> +       close(dmabuf);
> +
> +       native = gem_create(i915, scratch.size);
> +
> +       foreign = vgem_mmap(vgem, &scratch, PROT_WRITE);
> +       local = gem_mmap__wc(i915, native, 0, scratch.size, PROT_WRITE);
> +
> +       for (i = 0; i < 1024; i++) {
> +               local[1024 * i] = i;
> +               igt_blitter_src_copy__raw(i915, native, 0,
> +                                         scratch.width * scratch.bpp / 8,
> +                                         I915_TILING_NONE, 0, i,
> +                                         scratch.width, 1, scratch.bpp,
> +                                         prime, 0,
> +                                         scratch.width * scratch.bpp / 8,
> +                                         I915_TILING_NONE, 0, i);
> +               gem_sync(i915, prime);
> +               igt_assert_eq(foreign[1024 * i], i);
> +
> +               foreign[1024 * i] = ~i;
> +               igt_blitter_src_copy__raw(i915, prime, 0,
> +                                         scratch.width * scratch.bpp / 8,
> +                                         I915_TILING_NONE, 0, i,
> +                                         scratch.width, 1, scratch.bpp,
> +                                         native, 0,
> +                                         scratch.width * scratch.bpp / 8,
> +                                         I915_TILING_NONE, 0, i);
> +               gem_sync(i915, native);
> +               igt_assert_eq(local[1024 * i], ~i);
> +       }
> +
> +       munmap(local, scratch.size);
> +       munmap(foreign, scratch.size);
> +
> +       gem_close(i915, native);
> +       gem_close(i915, prime);
> +       gem_close(vgem, scratch.handle);
> +}

So other than using scratch.constants where possible, the tests make
sense and look interesting. Asynchronous fence scheduling,
back-and-forth coherency and serialisation.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Intel-gfx] [PATCH i-g-t 2/2] tests/prime_vgem: Examine blitter access path
  2020-01-09 15:03   ` Chris Wilson
@ 2020-01-15 11:04     ` Janusz Krzysztofik
  2020-01-16 12:13       ` Janusz Krzysztofik
  2020-01-22 15:24     ` Janusz Krzysztofik
  1 sibling, 1 reply; 10+ messages in thread
From: Janusz Krzysztofik @ 2020-01-15 11:04 UTC (permalink / raw)
  To: Chris Wilson; +Cc: igt-dev, Daniel Vetter, intel-gfx

Hi Chris,

Thanks for your review.

On Thursday, January 9, 2020 4:03:30 PM CET Chris Wilson wrote:
> Quoting Janusz Krzysztofik (2020-01-09 14:01:25)
> > On future hardware with missing GGTT BAR we won't be able to exercise
> > dma-buf access via that path.  An alternative to basic-gtt subtest for
> > testing dma-buf access is required, as well as basic-fence-mmap and
> > coherency-gtt subtest alternatives for testing WC coherency.
> > 
> > Access to the dma sg list feature exposed by dma-buf can be tested
> > through blitter.  Unfortunately we don't have any equivalently simple
> > tests that use blitter.  Provide them.
> > 
> > Blitter XY_SRC_COPY method implemented by igt_blitter_src_copy__raw()
> > IGT library function has been chosen.
> > 
> > v2: As fast copy is not supported on platforms older than Gen 9,
> >     use XY_SRC_COPY instead (Chris),
> >   - add subtest descriptions.
> > 
> > Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> > ---
> >  tests/prime_vgem.c | 192 +++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 192 insertions(+)
> > 
> > diff --git a/tests/prime_vgem.c b/tests/prime_vgem.c
> > index 69ae8c9b..9ceee47a 100644
> > --- a/tests/prime_vgem.c
> > +++ b/tests/prime_vgem.c
> > @@ -23,6 +23,7 @@
> >  
> >  #include "igt.h"
> >  #include "igt_vgem.h"
> > +#include "intel_batchbuffer.h"
> >  
> >  #include <sys/ioctl.h>
> >  #include <sys/poll.h>
> > @@ -171,6 +172,77 @@ static void test_fence_mmap(int i915, int vgem)
> >         close(slave[1]);
> >  }
> >  
> > +static void test_fence_blt(int i915, int vgem)
> > +{
> > +       struct vgem_bo scratch;
> > +       uint32_t prime;
> > +       uint32_t *ptr;
> > +       uint32_t fence;
> > +       int dmabuf, i;
> > +       int master[2], slave[2];
> > +
> > +       igt_assert(pipe(master) == 0);
> > +       igt_assert(pipe(slave) == 0);
> > +
> > +       scratch.width = 1024;
> > +       scratch.height = 1024;
> > +       scratch.bpp = 32;
> > +       vgem_create(vgem, &scratch);
> 
> Hmm. The pitch is returned by the dumb create, did we not report it in
> vgem_bo? scratch.pitch;

You're right, the pitch is returned in scratch.pitch.
 
> > +
> > +       dmabuf = prime_handle_to_fd(vgem, scratch.handle);
> > +       prime = prime_fd_to_handle(i915, dmabuf);
> > +       close(dmabuf);
> > +
> > +       igt_fork(child, 1) {
> > +               uint32_t native;
> > +
> > +               close(master[0]);
> > +               close(slave[1]);
> > +
> > +               native = gem_create(i915, scratch.size);
> > +
> > +               ptr = gem_mmap__wc(i915, native, 0, scratch.size, PROT_READ);
> > +               for (i = 0; i < 1024; i++)
> for (i = 0; i < scratch.height; i++)

Makes sense.

> > +                       igt_assert_eq(ptr[1024 * i], 0);
> 
> igt_assert_eq_u32(ptr[scratch.pitch * i / sizeof(*ptr)], 0);
> 
> The "%08x" will be easier to read when you have ~i later on.

Indeed.

> > +
> > +               write(master[1], &child, sizeof(child));
> > +               read(slave[0], &child, sizeof(child));
> > +
> > +               igt_blitter_src_copy__raw(i915, prime, 0,
> > +                                         scratch.width * scratch.bpp / 8,
> 
> scratch.pitch,

Sure.

> > +                                         I915_TILING_NONE, 0, 0,
> > +                                         scratch.width, scratch.height,
> > +                                         scratch.bpp, native, 0,
> > +                                         scratch.width * scratch.bpp / 8,
> > +                                         I915_TILING_NONE, 0, 0);
> > +               gem_sync(i915, native);

While exercising next version of the patch in my test environment, I've found 
out that the only place where synchronization is actually required for new 
subtests to succeed is this subtest - "basic-fence-blt".  Moreover, either 
following the blitter copy operation with gem_sync(i915, native) or enclosing 
it with prime_sync_start/end(dmabuf, false) does the job for me.  Assuming 
this behavior is reproducible, which method would you prefer here?

> > +
> > +               for (i = 0; i < 1024; i++)
> > +                       igt_assert_eq(ptr[1024 * i], i);
> > +
> > +               munmap(ptr, scratch.size);
> > +               gem_close(i915, native);
> > +               gem_close(i915, prime);
> > +       }
> > +
> > +       close(master[1]);
> > +       close(slave[0]);
> > +       read(master[0], &i, sizeof(i));
> > +       fence = vgem_fence_attach(vgem, &scratch, VGEM_FENCE_WRITE);
> > +       write(slave[1], &i, sizeof(i));
> 
> Throw in a usleep(50*1000); to emphasize that the only thing stopping
> the blitter is the fence.

OK, and with your comment placed above it, I think.

> > +
> > +       ptr = vgem_mmap(vgem, &scratch, PROT_WRITE);
> > +       for (i = 0; i < 1024; i++)
> > +               ptr[1024 * i] = i;
> > +       munmap(ptr, scratch.size);
> > +       vgem_fence_signal(vgem, fence);
> > +       gem_close(vgem, scratch.handle);
> > +
> > +       igt_waitchildren();
> > +       close(master[0]);
> > +       close(slave[1]);
> > +}
> > +
> >  static void test_write(int vgem, int i915)
> >  {
> >         struct vgem_bo scratch;
> > @@ -236,6 +308,62 @@ static void test_gtt(int vgem, int i915)
> >         gem_close(vgem, scratch.handle);
> >  }
> >  
> > +static void test_blt(int vgem, int i915)
> > +{
> > +       struct vgem_bo scratch;
> > +       uint32_t prime, native;
> > +       uint32_t *ptr;
> > +       int dmabuf, i;
> > +
> > +       scratch.width = 1024;
> > +       scratch.height = 1024;
> > +       scratch.bpp = 32;
> > +       vgem_create(vgem, &scratch);
> > +
> > +       dmabuf = prime_handle_to_fd(vgem, scratch.handle);
> > +       prime = prime_fd_to_handle(i915, dmabuf);
> > +       close(dmabuf);
> > +
> > +       native = gem_create(i915, scratch.size);
> > +
> > +       ptr = gem_mmap__wc(i915, native, 0, scratch.size, PROT_WRITE);
> > +       for (i = 0; i < 1024; i++)
> > +               ptr[1024 * i] = i;
> > +       munmap(ptr, scratch.size);
> > +
> > +       igt_blitter_src_copy__raw(i915,
> > +                                 native, 0, scratch.width * scratch.bpp / 8,
> > +                                 I915_TILING_NONE, 0, 0,
> > +                                 scratch.width, scratch.height, scratch.bpp,
> > +                                 prime, 0, scratch.width * scratch.bpp / 8,
> > +                                 I915_TILING_NONE, 0, 0);
> > +       gem_sync(i915, prime);
> 
> Would this be better with prime_sync_start(dmabuf, true); ?

Did you mean surrounding this native->prime blitter copy operation with 
prime_sync_start/end(dmabuf, true); ?

> gem_sync(prime) is nice in its own way (checking sync on imported handle
> provides correct serialisation for the vgem_mmap). But I think the
> recommended practice would be to use dmabuf for the inter-device sync?

My intention was to make sure the copy has been completed by GPU.

As I've found both test_blt() and test_blt_interleaved() succeed without any 
syncing in my test environment, I'm going to give different options a try on 
Trybot first before my next submission, unless you are able to provide me with 
your recommendations on all my gem_sync() use cases found in this patch.

Thanks,
Janusz


> > +       ptr = vgem_mmap(vgem, &scratch, PROT_READ | PROT_WRITE);
> > +       for (i = 0; i < 1024; i++) {
> > +               igt_assert_eq(ptr[1024 * i], i);
> > +               ptr[1024 * i] = ~i;
> > +       }
> > +       munmap(ptr, scratch.size);
> > +
> > +       igt_blitter_src_copy__raw(i915,
> > +                                 prime, 0, scratch.width * scratch.bpp / 8,
> > +                                 I915_TILING_NONE, 0, 0,
> > +                                 scratch.width, scratch.height, scratch.bpp,
> > +                                 native, 0, scratch.width * scratch.bpp / 8,
> > +                                 I915_TILING_NONE, 0, 0);
> > +       gem_sync(i915, native);
> > +
> > +       ptr = gem_mmap__wc(i915, native, 0, scratch.size, PROT_READ);
> > +       for (i = 0; i < 1024; i++)
> > +               igt_assert_eq(ptr[1024 * i], ~i);
> > +       munmap(ptr, scratch.size);
> > +
> > +       gem_close(i915, native);
> > +       gem_close(i915, prime);
> > +       gem_close(vgem, scratch.handle);
> > +}
> > +
> >  static void test_shrink(int vgem, int i915)
> >  {
> >         struct vgem_bo scratch = {
> > @@ -319,6 +447,59 @@ static void test_gtt_interleaved(int vgem, int i915)
> >         gem_close(vgem, scratch.handle);
> >  }
> >  
> > +static void test_blt_interleaved(int vgem, int i915)
> > +{
> > +       struct vgem_bo scratch;
> > +       uint32_t prime, native;
> > +       uint32_t *foreign, *local;
> > +       int dmabuf, i;
> > +
> > +       scratch.width = 1024;
> > +       scratch.height = 1024;
> > +       scratch.bpp = 32;
> > +       vgem_create(vgem, &scratch);
> > +
> > +       dmabuf = prime_handle_to_fd(vgem, scratch.handle);
> > +       prime = prime_fd_to_handle(i915, dmabuf);
> > +       close(dmabuf);
> > +
> > +       native = gem_create(i915, scratch.size);
> > +
> > +       foreign = vgem_mmap(vgem, &scratch, PROT_WRITE);
> > +       local = gem_mmap__wc(i915, native, 0, scratch.size, PROT_WRITE);
> > +
> > +       for (i = 0; i < 1024; i++) {
> > +               local[1024 * i] = i;
> > +               igt_blitter_src_copy__raw(i915, native, 0,
> > +                                         scratch.width * scratch.bpp / 8,
> > +                                         I915_TILING_NONE, 0, i,
> > +                                         scratch.width, 1, scratch.bpp,
> > +                                         prime, 0,
> > +                                         scratch.width * scratch.bpp / 8,
> > +                                         I915_TILING_NONE, 0, i);
> > +               gem_sync(i915, prime);
> > +               igt_assert_eq(foreign[1024 * i], i);
> > +
> > +               foreign[1024 * i] = ~i;
> > +               igt_blitter_src_copy__raw(i915, prime, 0,
> > +                                         scratch.width * scratch.bpp / 8,
> > +                                         I915_TILING_NONE, 0, i,
> > +                                         scratch.width, 1, scratch.bpp,
> > +                                         native, 0,
> > +                                         scratch.width * scratch.bpp / 8,
> > +                                         I915_TILING_NONE, 0, i);
> > +               gem_sync(i915, native);
> > +               igt_assert_eq(local[1024 * i], ~i);
> > +       }
> > +
> > +       munmap(local, scratch.size);
> > +       munmap(foreign, scratch.size);
> > +
> > +       gem_close(i915, native);
> > +       gem_close(i915, prime);
> > +       gem_close(vgem, scratch.handle);
> > +}
> 
> So other than using scratch.constants where possible, the tests make
> sense and look interesting. Asynchronous fence scheduling,
> back-and-forth coherency and serialisation.
> 
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> -Chris
> 






* Re: [Intel-gfx] [PATCH i-g-t 1/2] lib/intel_batchbuffer: Add blitter copy using XY_SRC_COPY_BLT
  2020-01-09 14:43   ` Chris Wilson
@ 2020-01-15 11:10     ` Janusz Krzysztofik
  0 siblings, 0 replies; 10+ messages in thread
From: Janusz Krzysztofik @ 2020-01-15 11:10 UTC (permalink / raw)
  To: Chris Wilson; +Cc: igt-dev, Daniel Vetter, intel-gfx

Hi Chris,

On Thursday, January 9, 2020 3:43:55 PM CET Chris Wilson wrote:
> Quoting Janusz Krzysztofik (2020-01-09 14:01:24)
> > From: Vanshidhar Konda <vanshidhar.r.konda@intel.com>
> > 
> > Add a method that uses the XY_SRC_COPY_BLT instruction for copying
> > buffers using the blitter engine.
> > 
> > v2: Use uint32_t for parameters; fix stride for Gen2/3
> > v3: Taken over from Vanshi by Janusz,
> >   - skip upper bits of src and dst addresses on Gen < 8
> > 
> > Signed-off-by: Vanshidhar Konda <vanshidhar.r.konda@intel.com>
> > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> > ---
> >  lib/intel_batchbuffer.c | 185 ++++++++++++++++++++++++++++++++++++++++
> >  lib/intel_batchbuffer.h |  21 +++++
> >  2 files changed, 206 insertions(+)
> > 
> > diff --git a/lib/intel_batchbuffer.c b/lib/intel_batchbuffer.c
> > index 07de5cbb..a56cd5a7 100644
> > --- a/lib/intel_batchbuffer.c
> > +++ b/lib/intel_batchbuffer.c
> > @@ -45,6 +45,12 @@
> >  
> >  #include <i915_drm.h>
> >  
> > +#define MI_FLUSH_DW (0x26 << 23)
> > +
> > +#define BCS_SWCTRL 0x22200
> > +#define BCS_SRC_Y (1 << 0)
> > +#define BCS_DST_Y (1 << 1)
> > +
> >  /**
> >   * SECTION:intel_batchbuffer
> >   * @short_description: Batchbuffer and blitter support
> > @@ -660,6 +666,185 @@ static void exec_blit(int fd,
> >         gem_execbuf(fd, &exec);
> >  }
> >  
> > +static uint32_t src_copy_dword0(uint32_t src_tiling, uint32_t dst_tiling,
> > +                               uint32_t bpp, uint32_t device_gen)
> > +{
> > +       uint32_t dword0 = 0;
> > +
> > +       dword0 |= XY_SRC_COPY_BLT_CMD;
> > +       if (bpp == 32)
> > +               dword0 |= XY_SRC_COPY_BLT_WRITE_RGB |
> > +                       XY_SRC_COPY_BLT_WRITE_ALPHA;
> > +
> > +       if (device_gen >= 4 && src_tiling)
> > +               dword0 |= XY_SRC_COPY_BLT_SRC_TILED;
> > +       if (device_gen >= 4 && dst_tiling)
> > +               dword0 |= XY_SRC_COPY_BLT_DST_TILED;
> > +
> > +       return dword0;
> > +}
> > +
> > +static uint32_t src_copy_dword1(uint32_t dst_pitch, uint32_t bpp)
> > +{
> > +       uint32_t dword1 = 0;
> > +
> > +       switch (bpp) {
> > +       case 8:
> > +               break;
> > +       case 16:
> > +               dword1 |= (1 << 24); /* Only support 565 color */
> > +               break;
> > +       case 32:
> > +               dword1 |= (3 << 24);
> > +               break;
> > +       default:
> > +               igt_assert(0);
> > +       }
> > +
> > +       dword1 |= 0xcc << 16;
> > +       dword1 |= dst_pitch;
> > +
> > +       return dword1;
> > +}
> > +/**
> > + * igt_blitter_src_copy__raw:
> > + * @fd: file descriptor of the i915 driver
> > + * @src_handle: GEM handle of the source buffer
> > + * @src_delta: offset into the source GEM bo, in bytes
> > + * @src_stride: Stride (in bytes) of the source buffer
> > + * @src_tiling: Tiling mode of the source buffer
> > + * @src_x: X coordinate of the source region to copy
> > + * @src_y: Y coordinate of the source region to copy
> > + * @width: Width of the region to copy
> > + * @height: Height of the region to copy
> > + * @bpp: source and destination bits per pixel
> > + * @dst_handle: GEM handle of the destination buffer
> > + * @dst_delta: offset into the destination GEM bo, in bytes
> > + * @dst_stride: Stride (in bytes) of the destination buffer
> > + * @dst_tiling: Tiling mode of the destination buffer
> > + * @dst_x: X coordinate of destination
> > + * @dst_y: Y coordinate of destination
> > + *
> > + */
> > +void igt_blitter_src_copy__raw(int fd,
> 
> Just call it igt_blitter_copy(). It should be the shorter, more often
> useful variant. _fast is the special case version.
> 
> These functions should not be intel_batchbuffer.c. Either you do intend
> them to be vendor neutral, and they could indeed become so, or they
> should not be called igt_.
> 
> > +                               /* src */
> > +                               uint32_t src_handle,
> > +                               uint32_t src_delta,
> > +                               uint32_t src_stride,
> > +                               uint32_t src_tiling,
> > +                               uint32_t src_x, uint32_t src_y,
> > +
> > +                               /* size */
> > +                               uint32_t width, uint32_t height,
> > +
> > +                               /* bpp */
> > +                               uint32_t bpp,
> > +
> > +                               /* dst */
> > +                               uint32_t dst_handle,
> > +                               uint32_t dst_delta,
> > +                               uint32_t dst_stride,
> > +                               uint32_t dst_tiling,
> > +                               uint32_t dst_x, uint32_t dst_y)
> > +{
> > +       uint32_t batch[32];
> > +       struct drm_i915_gem_exec_object2 objs[3];
> > +       struct drm_i915_gem_relocation_entry relocs[2];
> > +       uint32_t batch_handle;
> > +       uint32_t src_pitch, dst_pitch;
> > +       uint32_t dst_reloc_offset, src_reloc_offset;
> > +       int i = 0;
> > +       uint32_t gen = intel_gen(intel_get_drm_devid(fd));
> > +       const bool has_64b_reloc = gen >= 8;
> > +
> > +       memset(batch, 0, sizeof(batch));
> > +
> > +       igt_assert((src_tiling == I915_TILING_NONE) ||
> > +                  (src_tiling == I915_TILING_X) ||
> > +                  (src_tiling == I915_TILING_Y));
> > +       igt_assert((dst_tiling == I915_TILING_NONE) ||
> > +                  (dst_tiling == I915_TILING_X) ||
> > +                  (dst_tiling == I915_TILING_Y));
> > +
> > +       src_pitch = (gen >= 4 && src_tiling) ? src_stride / 4 : src_stride;
> > +       dst_pitch = (gen >= 4 && dst_tiling) ? dst_stride / 4 : dst_stride;
> > +
> > +       CHECK_RANGE(src_x); CHECK_RANGE(src_y);
> > +       CHECK_RANGE(dst_x); CHECK_RANGE(dst_y);
> > +       CHECK_RANGE(width); CHECK_RANGE(height);
> > +       CHECK_RANGE(src_x + width); CHECK_RANGE(src_y + height);
> > +       CHECK_RANGE(dst_x + width); CHECK_RANGE(dst_y + height);
> > +       CHECK_RANGE(src_pitch); CHECK_RANGE(dst_pitch);
> > +
> > +       if ((src_tiling | dst_tiling) >= I915_TILING_Y) {
> > +               unsigned int mask;
> > +
> > +               batch[i++] = MI_LOAD_REGISTER_IMM;
> > +               batch[i++] = BCS_SWCTRL;
> > +
> > +               mask = (BCS_SRC_Y | BCS_DST_Y) << 16;
> > +               if (src_tiling == I915_TILING_Y)
> > +                       mask |= BCS_SRC_Y;
> > +               if (dst_tiling == I915_TILING_Y)
> > +                       mask |= BCS_DST_Y;
> > +               batch[i++] = mask;
> > +       }
> > +
> > +       batch[i] = src_copy_dword0(src_tiling, dst_tiling, bpp, gen);
> > +       batch[i++] |= 6 + 2 * has_64b_reloc;
> > +       batch[i++] = src_copy_dword1(dst_pitch, bpp);
> > +       batch[i++] = (dst_y << 16) | dst_x; /* dst x1,y1 */
> > +       batch[i++] = ((dst_y + height) << 16) | (dst_x + width); /* dst x2,y2 */
> > +       dst_reloc_offset = i;
> > +       batch[i++] = dst_delta; /* dst address lower bits */
> > +       if (has_64b_reloc)
> > +               batch[i++] = 0; /* dst address upper bits */
> 
> It is better do the relocations inline and use the computed address from
> the relocation routine so that the value in the batch is known to be
> equal to value passed to the kernel.
> 
> > +       batch[i++] = (src_y << 16) | src_x; /* src x1,y1 */
> > +       batch[i++] = src_pitch;
> > +       src_reloc_offset = i;
> > +       batch[i++] = src_delta; /* src address lower bits */
> > +       if (has_64b_reloc)
> > +               batch[i++] = 0; /* src address upper bits */
> > +
> > +       if ((src_tiling | dst_tiling) >= I915_TILING_Y) {
> > +               igt_assert(gen >= 6);
> > +               batch[i++] = MI_FLUSH_DW | 2;
> > +               batch[i++] = 0;
> > +               batch[i++] = 0;
> > +               batch[i++] = 0;
> > +
> > +               batch[i++] = MI_LOAD_REGISTER_IMM;
> > +               batch[i++] = BCS_SWCTRL;
> > +               batch[i++] = (BCS_SRC_Y | BCS_DST_Y) << 16;
> > +       }
> > +
> > +       batch[i++] = MI_BATCH_BUFFER_END;
> > +       batch[i++] = MI_NOOP;
> > +
> > +       igt_assert(i <= ARRAY_SIZE(batch));
> > +
> > +       batch_handle = gem_create(fd, 4096);
> > +       gem_write(fd, batch_handle, 0, batch, sizeof(batch));
> > +
> > +       fill_relocation(&relocs[0], dst_handle, dst_delta, dst_reloc_offset,
> > +                       I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER);
> > +       fill_relocation(&relocs[1], src_handle, src_delta, src_reloc_offset,
> > +                       I915_GEM_DOMAIN_RENDER, 0);
> > +
> > +       fill_object(&objs[0], dst_handle, NULL, 0);
> > +       fill_object(&objs[1], src_handle, NULL, 0);
> > +       fill_object(&objs[2], batch_handle, relocs, 2);
> > +
> > +       if (dst_tiling)
> > +               objs[0].flags |= EXEC_OBJECT_NEEDS_FENCE;
> 
> obj[0].flags |= EXEC_OBJECT_WRITE;
> 
> and then you would be eligible to use execbuf.flags |= I915_EXEC_NO_RELOC;
> But given you have no history, for it to work you could preseed the
> obj.offset to sensible values rather than have all three in the same
> location. And you could fix the code to use I915_EXEC_HANDLE_LUT.
> 
> > +       if (src_tiling)
> > +               objs[1].flags |= EXEC_OBJECT_NEEDS_FENCE;
> > +
> > +       exec_blit(fd, objs, 3, ARRAY_SIZE(batch));
> 
> exec_blit() is broken as it does not apply to <gen6.

Having this patch just taken over from its original author, I don't feel I'm 
in a position to address all your comments in a reasonably short time.  Let me 
propose a simplified implementation, local to prime_vgem, in my next iteration 
as an option.  I'm going to use the same API as that of 
igt_blitter_src_copy__raw() and hopefully the future igt_blitter_copy() 
library helper so switching to it as soon as it is ready should be trivial.

Thanks,
Janusz


> 
> > +
> > +       gem_close(fd, batch_handle);
> > +}
> 






* Re: [Intel-gfx] [PATCH i-g-t 2/2] tests/prime_vgem: Examine blitter access path
  2020-01-15 11:04     ` Janusz Krzysztofik
@ 2020-01-16 12:13       ` Janusz Krzysztofik
  0 siblings, 0 replies; 10+ messages in thread
From: Janusz Krzysztofik @ 2020-01-16 12:13 UTC (permalink / raw)
  To: Chris Wilson; +Cc: igt-dev, Daniel Vetter, intel-gfx

Hi,

Answering to myself, sorry.

On Wednesday, January 15, 2020 12:04:51 PM CET Janusz Krzysztofik wrote:
> Hi Chris,
> 
> Thanks for your review.
> 
> On Thursday, January 9, 2020 4:03:30 PM CET Chris Wilson wrote:
> > Quoting Janusz Krzysztofik (2020-01-09 14:01:25)
> > > On future hardware with missing GGTT BAR we won't be able to exercise
> > > dma-buf access via that path.  An alternative to basic-gtt subtest for
> > > testing dma-buf access is required, as well as basic-fence-mmap and
> > > coherency-gtt subtest alternatives for testing WC coherency.
> > > 
> > > Access to the dma sg list feature exposed by dma-buf can be tested
> > > through blitter.  Unfortunately we don't have any equivalently simple
> > > tests that use blitter.  Provide them.
> > > 
> > > Blitter XY_SRC_COPY method implemented by igt_blitter_src_copy__raw()
> > > IGT library function has been chosen.
> > > 
> > > v2: As fast copy is not supported on platforms older than Gen 9,
> > >     use XY_SRC_COPY instead (Chris),
> > >   - add subtest descriptions.
> > > 
> > > Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> > > ---
> > >  tests/prime_vgem.c | 192 +++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 192 insertions(+)
> > > 
> > > diff --git a/tests/prime_vgem.c b/tests/prime_vgem.c
> > > index 69ae8c9b..9ceee47a 100644
> > > --- a/tests/prime_vgem.c
> > > +++ b/tests/prime_vgem.c
> > > @@ -23,6 +23,7 @@
> > >  
> > >  #include "igt.h"
> > >  #include "igt_vgem.h"
> > > +#include "intel_batchbuffer.h"
> > >  
> > >  #include <sys/ioctl.h>
> > >  #include <sys/poll.h>
> > > @@ -171,6 +172,77 @@ static void test_fence_mmap(int i915, int vgem)
> > >         close(slave[1]);
> > >  }
> > >  
> > > +static void test_fence_blt(int i915, int vgem)
> > > +{
> > > +       struct vgem_bo scratch;
> > > +       uint32_t prime;
> > > +       uint32_t *ptr;
> > > +       uint32_t fence;
> > > +       int dmabuf, i;
> > > +       int master[2], slave[2];
> > > +
> > > +       igt_assert(pipe(master) == 0);
> > > +       igt_assert(pipe(slave) == 0);
> > > +
> > > +       scratch.width = 1024;
> > > +       scratch.height = 1024;
> > > +       scratch.bpp = 32;
> > > +       vgem_create(vgem, &scratch);
> > 
> > Hmm. The pitch is returned by the dumb create, did we not report it in
> > vgem_bo? scratch.pitch;
> 
> You're right, the pitch is returned in scratch.pitch.
>  
> > > +
> > > +       dmabuf = prime_handle_to_fd(vgem, scratch.handle);
> > > +       prime = prime_fd_to_handle(i915, dmabuf);
> > > +       close(dmabuf);
> > > +
> > > +       igt_fork(child, 1) {
> > > +               uint32_t native;
> > > +
> > > +               close(master[0]);
> > > +               close(slave[1]);
> > > +
> > > +               native = gem_create(i915, scratch.size);
> > > +
> > > +               ptr = gem_mmap__wc(i915, native, 0, scratch.size, PROT_READ);
> > > +               for (i = 0; i < 1024; i++)
> > for (i = 0; i < scratch.height; i++)
> 
> Makes sense.
> 
> > > +                       igt_assert_eq(ptr[1024 * i], 0);
> > 
> > igt_assert_eq_u32(ptr[scratch.pitch * i / sizeof(*ptr)], 0);
> > 
> > The "%08x" will be easier to read when you have ~i later on.
> 
> Indeed.
> 
> > > +
> > > +               write(master[1], &child, sizeof(child));
> > > +               read(slave[0], &child, sizeof(child));
> > > +
> > > +               igt_blitter_src_copy__raw(i915, prime, 0,
> > > +                                         scratch.width * scratch.bpp / 8,
> > 
> > scratch.pitch,
> 
> Sure.
> 
> > > +                                         I915_TILING_NONE, 0, 0,
> > > +                                         scratch.width, scratch.height,
> > > +                                         scratch.bpp, native, 0,
> > > +                                         scratch.width * scratch.bpp / 8,
> > > +                                         I915_TILING_NONE, 0, 0);
> > > +               gem_sync(i915, native);
> 
> While exercising the next version of the patch in my test environment, I've 
> found out that the only place where synchronization is actually required for 
> the new subtests to succeed is this one - "basic-fence-blt".  Moreover, 
> either following the blitter copy operation with gem_sync(i915, native) or 
> enclosing it with prime_sync_start/end(dmabuf, false) does the job for me.  
> Assuming this behavior is reproducible, which method would you prefer here?

Putting prime_sync_start() before the blitter copy operation would result in 
the fenced vgem object being waited on already inside prime_sync_start() 
itself, not inside our blitter copy operation as intended.

Please forgive my ignorance.

Thanks,
Janusz


> > > +
> > > +               for (i = 0; i < 1024; i++)
> > > +                       igt_assert_eq(ptr[1024 * i], i);
> > > +
> > > +               munmap(ptr, scratch.size);
> > > +               gem_close(i915, native);
> > > +               gem_close(i915, prime);
> > > +       }
> > > +
> > > +       close(master[1]);
> > > +       close(slave[0]);
> > > +       read(master[0], &i, sizeof(i));
> > > +       fence = vgem_fence_attach(vgem, &scratch, VGEM_FENCE_WRITE);
> > > +       write(slave[1], &i, sizeof(i));
> > 
> > Throw in a usleep(50*1000); to emphasize that the only thing stopping
> > the blitter is the fence.
> 
> OK, and with your comment placed above it, I think.
> 
> > > +
> > > +       ptr = vgem_mmap(vgem, &scratch, PROT_WRITE);
> > > +       for (i = 0; i < 1024; i++)
> > > +               ptr[1024 * i] = i;
> > > +       munmap(ptr, scratch.size);
> > > +       vgem_fence_signal(vgem, fence);
> > > +       gem_close(vgem, scratch.handle);
> > > +
> > > +       igt_waitchildren();
> > > +       close(master[0]);
> > > +       close(slave[1]);
> > > +}
> > > +
> > >  static void test_write(int vgem, int i915)
> > >  {
> > >         struct vgem_bo scratch;
> > > @@ -236,6 +308,62 @@ static void test_gtt(int vgem, int i915)
> > >         gem_close(vgem, scratch.handle);
> > >  }
> > >  
> > > +static void test_blt(int vgem, int i915)
> > > +{
> > > +       struct vgem_bo scratch;
> > > +       uint32_t prime, native;
> > > +       uint32_t *ptr;
> > > +       int dmabuf, i;
> > > +
> > > +       scratch.width = 1024;
> > > +       scratch.height = 1024;
> > > +       scratch.bpp = 32;
> > > +       vgem_create(vgem, &scratch);
> > > +
> > > +       dmabuf = prime_handle_to_fd(vgem, scratch.handle);
> > > +       prime = prime_fd_to_handle(i915, dmabuf);
> > > +       close(dmabuf);
> > > +
> > > +       native = gem_create(i915, scratch.size);
> > > +
> > > +       ptr = gem_mmap__wc(i915, native, 0, scratch.size, PROT_WRITE);
> > > +       for (i = 0; i < 1024; i++)
> > > +               ptr[1024 * i] = i;
> > > +       munmap(ptr, scratch.size);
> > > +
> > > +       igt_blitter_src_copy__raw(i915,
> > > +                                 native, 0, scratch.width * scratch.bpp / 8,
> > > +                                 I915_TILING_NONE, 0, 0,
> > > +                                 scratch.width, scratch.height, scratch.bpp,
> > > +                                 prime, 0, scratch.width * scratch.bpp / 8,
> > > +                                 I915_TILING_NONE, 0, 0);
> > > +       gem_sync(i915, prime);
> > 
> > Would this be better with prime_sync_start(dmabuf, true); ?
> 
> Did you mean surrounding the this native->prime blitter copy operation with 
> prime_sync_start/end(dmabuf, true); ?
> 
> > gem_sync(prime) is nice in its own way (checking sync on imported handle
> > provides correct serialisation for the vgem_mmap). But I think the
> > recommended practice would be to use dmabuf for the inter-device sync?
> 
> My intention was to make sure the copy has been completed by GPU.
> 
> As I've found both test_blt() and test_blt_interleaved() succeed without any 
> syncing in my test environment, I'm going to give different options a try on 
> Trybot first before my next submission, unless you are able to provide me with 
> your recommendations on all my gem_sync() use cases found in this patch.
> 
> Thanks,
> Janusz
> 
> 
> > > +       ptr = vgem_mmap(vgem, &scratch, PROT_READ | PROT_WRITE);
> > > +       for (i = 0; i < 1024; i++) {
> > > +               igt_assert_eq(ptr[1024 * i], i);
> > > +               ptr[1024 * i] = ~i;
> > > +       }
> > > +       munmap(ptr, scratch.size);
> > > +
> > > +       igt_blitter_src_copy__raw(i915,
> > > +                                 prime, 0, scratch.width * scratch.bpp / 8,
> > > +                                 I915_TILING_NONE, 0, 0,
> > > +                                 scratch.width, scratch.height, scratch.bpp,
> > > +                                 native, 0, scratch.width * scratch.bpp / 8,
> > > +                                 I915_TILING_NONE, 0, 0);
> > > +       gem_sync(i915, native);
> > > +
> > > +       ptr = gem_mmap__wc(i915, native, 0, scratch.size, PROT_READ);
> > > +       for (i = 0; i < 1024; i++)
> > > +               igt_assert_eq(ptr[1024 * i], ~i);
> > > +       munmap(ptr, scratch.size);
> > > +
> > > +       gem_close(i915, native);
> > > +       gem_close(i915, prime);
> > > +       gem_close(vgem, scratch.handle);
> > > +}
> > > +
> > >  static void test_shrink(int vgem, int i915)
> > >  {
> > >         struct vgem_bo scratch = {
> > > @@ -319,6 +447,59 @@ static void test_gtt_interleaved(int vgem, int i915)
> > >         gem_close(vgem, scratch.handle);
> > >  }
> > >  
> > > +static void test_blt_interleaved(int vgem, int i915)
> > > +{
> > > +       struct vgem_bo scratch;
> > > +       uint32_t prime, native;
> > > +       uint32_t *foreign, *local;
> > > +       int dmabuf, i;
> > > +
> > > +       scratch.width = 1024;
> > > +       scratch.height = 1024;
> > > +       scratch.bpp = 32;
> > > +       vgem_create(vgem, &scratch);
> > > +
> > > +       dmabuf = prime_handle_to_fd(vgem, scratch.handle);
> > > +       prime = prime_fd_to_handle(i915, dmabuf);
> > > +       close(dmabuf);
> > > +
> > > +       native = gem_create(i915, scratch.size);
> > > +
> > > +       foreign = vgem_mmap(vgem, &scratch, PROT_WRITE);
> > > +       local = gem_mmap__wc(i915, native, 0, scratch.size, PROT_WRITE);
> > > +
> > > +       for (i = 0; i < 1024; i++) {
> > > +               local[1024 * i] = i;
> > > +               igt_blitter_src_copy__raw(i915, native, 0,
> > > +                                         scratch.width * scratch.bpp / 8,
> > > +                                         I915_TILING_NONE, 0, i,
> > > +                                         scratch.width, 1, scratch.bpp,
> > > +                                         prime, 0,
> > > +                                         scratch.width * scratch.bpp / 8,
> > > +                                         I915_TILING_NONE, 0, i);
> > > +               gem_sync(i915, prime);
> > > +               igt_assert_eq(foreign[1024 * i], i);
> > > +
> > > +               foreign[1024 * i] = ~i;
> > > +               igt_blitter_src_copy__raw(i915, prime, 0,
> > > +                                         scratch.width * scratch.bpp / 8,
> > > +                                         I915_TILING_NONE, 0, i,
> > > +                                         scratch.width, 1, scratch.bpp,
> > > +                                         native, 0,
> > > +                                         scratch.width * scratch.bpp / 8,
> > > +                                         I915_TILING_NONE, 0, i);
> > > +               gem_sync(i915, native);
> > > +               igt_assert_eq(local[1024 * i], ~i);
> > > +       }
> > > +
> > > +       munmap(local, scratch.size);
> > > +       munmap(foreign, scratch.size);
> > > +
> > > +       gem_close(i915, native);
> > > +       gem_close(i915, prime);
> > > +       gem_close(vgem, scratch.handle);
> > > +}
> > 
> > So other than using scratch.constants where possible, the tests make
> > sense and look interesting. Asynchronous fence scheduling,
> > back-and-forth coherency and serialisation.
> > 
> > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> > -Chris
> > 
> 
> 





* Re: [Intel-gfx] [PATCH i-g-t 2/2] tests/prime_vgem: Examine blitter access path
  2020-01-09 15:03   ` Chris Wilson
  2020-01-15 11:04     ` Janusz Krzysztofik
@ 2020-01-22 15:24     ` Janusz Krzysztofik
  2020-01-23 11:01       ` Janusz Krzysztofik
  1 sibling, 1 reply; 10+ messages in thread
From: Janusz Krzysztofik @ 2020-01-22 15:24 UTC (permalink / raw)
  To: Chris Wilson; +Cc: igt-dev, Daniel Vetter, intel-gfx

Hi Chris,

On Thursday, January 9, 2020 4:03:30 PM CET Chris Wilson wrote:
> Quoting Janusz Krzysztofik (2020-01-09 14:01:25)
> > On future hardware with missing GGTT BAR we won't be able to exercise
> > dma-buf access via that path.  An alternative to basic-gtt subtest for
> > testing dma-buf access is required, as well as basic-fence-mmap and
> > coherency-gtt subtest alternatives for testing WC coherency.
> > 
> > Access to the dma sg list feature exposed by dma-buf can be tested
> > through blitter.  Unfortunately we don't have any equivalently simple
> > tests that use blitter.  Provide them.
> > 
> > Blitter XY_SRC_COPY method implemented by igt_blitter_src_copy__raw()
> > IGT library function has been chosen.
> > 
> > v2: As fast copy is not supported on platforms older than Gen 9,
> >     use XY_SRC_COPY instead (Chris),
> >   - add subtest descriptions.
> > 
> > Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> > ---
> >  tests/prime_vgem.c | 192 +++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 192 insertions(+)
> > 
> > diff --git a/tests/prime_vgem.c b/tests/prime_vgem.c
> > index 69ae8c9b..9ceee47a 100644
> > --- a/tests/prime_vgem.c
> > +++ b/tests/prime_vgem.c
<SNIP>
> >  
> > +static void test_blt(int vgem, int i915)
> > +{
> > +       struct vgem_bo scratch;
> > +       uint32_t prime, native;
> > +       uint32_t *ptr;
> > +       int dmabuf, i;
> > +
> > +       scratch.width = 1024;
> > +       scratch.height = 1024;
> > +       scratch.bpp = 32;
> > +       vgem_create(vgem, &scratch);
> > +
> > +       dmabuf = prime_handle_to_fd(vgem, scratch.handle);
> > +       prime = prime_fd_to_handle(i915, dmabuf);
> > +       close(dmabuf);
> > +
> > +       native = gem_create(i915, scratch.size);
> > +
> > +       ptr = gem_mmap__wc(i915, native, 0, scratch.size, PROT_WRITE);
> > +       for (i = 0; i < 1024; i++)
> > +               ptr[1024 * i] = i;
> > +       munmap(ptr, scratch.size);
> > +
> > +       igt_blitter_src_copy__raw(i915,
> > +                                 native, 0, scratch.width * scratch.bpp / 8,
> > +                                 I915_TILING_NONE, 0, 0,
> > +                                 scratch.width, scratch.height, scratch.bpp,
> > +                                 prime, 0, scratch.width * scratch.bpp / 8,
> > +                                 I915_TILING_NONE, 0, 0);
> > +       gem_sync(i915, prime);
> 
> Would this be better with prime_sync_start(dmabuf, true); ?

I'm not sure why you suggest true for write access rather than false, which 
would just confirm the prime object is ready for reading.  Could you please 
explain?

Thanks,
Janusz

> gem_sync(prime) is nice in its own way (checking sync on imported handle
> provides correct serialisation for the vgem_mmap). But I think the
> recommended practice would be to use dmabuf for the inter-device sync?





* Re: [Intel-gfx] [PATCH i-g-t 2/2] tests/prime_vgem: Examine blitter access path
  2020-01-22 15:24     ` Janusz Krzysztofik
@ 2020-01-23 11:01       ` Janusz Krzysztofik
  0 siblings, 0 replies; 10+ messages in thread
From: Janusz Krzysztofik @ 2020-01-23 11:01 UTC (permalink / raw)
  To: Chris Wilson; +Cc: igt-dev, Daniel Vetter, intel-gfx

Sorry for replying to myself again.

On Wednesday, January 22, 2020 4:24:49 PM CET Janusz Krzysztofik wrote:
> Hi Chris,
> 
> On Thursday, January 9, 2020 4:03:30 PM CET Chris Wilson wrote:
> > Quoting Janusz Krzysztofik (2020-01-09 14:01:25)
> > > On future hardware with missing GGTT BAR we won't be able to exercise
> > > dma-buf access via that path.  An alternative to basic-gtt subtest for
> > > testing dma-buf access is required, as well as basic-fence-mmap and
> > > coherency-gtt subtest alternatives for testing WC coherency.
> > > 
> > > Access to the dma sg list feature exposed by dma-buf can be tested
> > > through blitter.  Unfortunately we don't have any equivalently simple
> > > tests that use blitter.  Provide them.
> > > 
> > > Blitter XY_SRC_COPY method implemented by igt_blitter_src_copy__raw()
> > > IGT library function has been chosen.
> > > 
> > > v2: As fast copy is not supported on platforms older than Gen 9,
> > >     use XY_SRC_COPY instead (Chris),
> > >   - add subtest descriptions.
> > > 
> > > Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> > > ---
> > >  tests/prime_vgem.c | 192 +++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 192 insertions(+)
> > > 
> > > diff --git a/tests/prime_vgem.c b/tests/prime_vgem.c
> > > index 69ae8c9b..9ceee47a 100644
> > > --- a/tests/prime_vgem.c
> > > +++ b/tests/prime_vgem.c
> <SNIP>
> > >  
> > > +static void test_blt(int vgem, int i915)
> > > +{
> > > +       struct vgem_bo scratch;
> > > +       uint32_t prime, native;
> > > +       uint32_t *ptr;
> > > +       int dmabuf, i;
> > > +
> > > +       scratch.width = 1024;
> > > +       scratch.height = 1024;
> > > +       scratch.bpp = 32;
> > > +       vgem_create(vgem, &scratch);
> > > +
> > > +       dmabuf = prime_handle_to_fd(vgem, scratch.handle);
> > > +       prime = prime_fd_to_handle(i915, dmabuf);
> > > +       close(dmabuf);
> > > +
> > > +       native = gem_create(i915, scratch.size);
> > > +
> > > +       ptr = gem_mmap__wc(i915, native, 0, scratch.size, PROT_WRITE);
> > > +       for (i = 0; i < 1024; i++)
> > > +               ptr[1024 * i] = i;
> > > +       munmap(ptr, scratch.size);
> > > +
> > > +       igt_blitter_src_copy__raw(i915,
> > > +                                 native, 0, scratch.width * scratch.bpp / 8,
> > > +                                 I915_TILING_NONE, 0, 0,
> > > +                                 scratch.width, scratch.height, scratch.bpp,
> > > +                                 prime, 0, scratch.width * scratch.bpp / 8,
> > > +                                 I915_TILING_NONE, 0, 0);
> > > +       gem_sync(i915, prime);
> > 
> > Would this be better with prime_sync_start(dmabuf, true); ?
> 
> I'm not sure why you suggest true for write access, not false for just having 
> the prime object confirmed ready for reading.  Could you please explain?

Now I think that by requesting a write sync, as suggested by Chris, we make 
sure that all former write operations have completed.  That might not be true 
if we requested a read sync, as we might get a snapshot taken while the write 
we wanted to serialize against was still pending.  I hope my understanding of 
this is now correct.

Thanks,
Janusz

> 
> Thanks,
> Janusz
> 
> > gem_sync(prime) is nice in its own way (checking sync on imported handle
> > provides correct serialisation for the vgem_mmap). But I think the
> > recommended practice would be to use dmabuf for the inter-device sync?
> 
> 





end of thread, other threads:[~2020-01-23 11:02 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
2020-01-09 14:01 [Intel-gfx] [RFC PATCH i-g-t 0/2] tests/prime_vgem: Examine blitter access path Janusz Krzysztofik
2020-01-09 14:01 ` [Intel-gfx] [PATCH i-g-t 1/2] lib/intel_batchbuffer: Add blitter copy using XY_SRC_COPY_BLT Janusz Krzysztofik
2020-01-09 14:43   ` Chris Wilson
2020-01-15 11:10     ` Janusz Krzysztofik
2020-01-09 14:01 ` [Intel-gfx] [PATCH i-g-t 2/2] tests/prime_vgem: Examine blitter access path Janusz Krzysztofik
2020-01-09 15:03   ` Chris Wilson
2020-01-15 11:04     ` Janusz Krzysztofik
2020-01-16 12:13       ` Janusz Krzysztofik
2020-01-22 15:24     ` Janusz Krzysztofik
2020-01-23 11:01       ` Janusz Krzysztofik
