* [PATCH i-g-t 1/6] overlay: Remove the miscalculation of window position
From: Chris Wilson @ 2018-05-14  8:02 UTC (permalink / raw)
  To: intel-gfx; +Cc: igt-dev

We already call x11_position() to calculate the position of the
overlay, so we do not need to manually recompute it inside
x11_overlay_create(). This has the advantage that x11_position()
understands the multi-monitor layout instructions.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 overlay/x11/x11-overlay.c | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/overlay/x11/x11-overlay.c b/overlay/x11/x11-overlay.c
index d08ebfc9c..ae6494295 100644
--- a/overlay/x11/x11-overlay.c
+++ b/overlay/x11/x11-overlay.c
@@ -271,22 +271,6 @@ x11_overlay_create(struct config *config, int *width, int *height)
 
 	priv->x = x;
 	priv->y = y;
-	if (position != POS_UNSET) {
-		switch (position & 7) {
-		default:
-		case 0: priv->x = 0; break;
-		case 1: priv->x = (scr->width - image->width)/2; break;
-		case 2: priv->x = scr->width - image->width; break;
-		}
-
-		switch ((position >> 4) & 7) {
-		default:
-		case 0: priv->y = 0; break;
-		case 1: priv->y = (scr->height - image->height)/2; break;
-		case 2: priv->y = scr->height - image->height; break;
-		}
-	}
-
 
 	priv->image = image;
 	priv->image->data = (void *)&priv->name;
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [PATCH i-g-t 2/6] lib/audio: Replace sqrt(a*a + b*b) with hypot(a, b)
From: Chris Wilson @ 2018-05-14  8:02 UTC (permalink / raw)
  To: intel-gfx; +Cc: igt-dev

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 lib/igt_audio.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/lib/igt_audio.c b/lib/igt_audio.c
index 2321d1c6e..229ce6c69 100644
--- a/lib/igt_audio.c
+++ b/lib/igt_audio.c
@@ -266,9 +266,7 @@ bool audio_signal_detect(struct audio_signal *signal, int channels,
 		max = 0;
 
 		for (i = 0; i < frames / 2; i++) {
-			amplitude[i] = sqrt(data[i] * data[i] +
-					    data[frames - i] *
-					    data[frames - i]);
+			amplitude[i] = hypot(data[i], data[frames - i]);
 			if (amplitude[i] > max)
 				max = amplitude[i];
 		}
-- 
2.17.0

* [PATCH i-g-t 3/6] igt/gem_ctx_thrash: Order writes between contexts
From: Chris Wilson @ 2018-05-14  8:02 UTC (permalink / raw)
  To: intel-gfx; +Cc: igt-dev

The test wrote to the same dwords from multiple contexts, assuming that
the writes would be ordered to match submission order. However, as it
was using multiple contexts without a write hazard, those timelines are
not coupled and the requests may be emitted to the hw in any order. So
emit a write hazard for each individual dword in the scratch (avoiding
the write hazard for the scratch as a whole) to ensure the writes do
occur in the expected order.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 tests/gem_ctx_thrash.c | 92 ++++++++++++++++++++++++------------------
 1 file changed, 53 insertions(+), 39 deletions(-)

diff --git a/tests/gem_ctx_thrash.c b/tests/gem_ctx_thrash.c
index 2cd9cfebf..b25f95f13 100644
--- a/tests/gem_ctx_thrash.c
+++ b/tests/gem_ctx_thrash.c
@@ -90,17 +90,13 @@ static void single(const char *name, bool all_engines)
 {
 	struct drm_i915_gem_exec_object2 *obj;
 	struct drm_i915_gem_relocation_entry *reloc;
-	unsigned engines[16];
-	uint64_t size;
-	uint32_t *ctx, *map, scratch;
-	unsigned num_ctx;
-	int fd, gen, num_engines;
+	unsigned int engines[16], num_engines, num_ctx;
+	uint32_t *ctx, *map, scratch, size;
+	int fd, gen;
 #define MAX_LOOP 16
 
-	fd = drm_open_driver_master(DRIVER_INTEL);
+	fd = drm_open_driver(DRIVER_INTEL);
 	igt_require_gem(fd);
-	igt_require(gem_can_store_dword(fd, 0));
-
 	gem_require_contexts(fd);
 
 	gen = intel_gen(intel_get_drm_devid(fd));
@@ -108,54 +104,77 @@ static void single(const char *name, bool all_engines)
 	num_engines = 0;
 	if (all_engines) {
 		unsigned engine;
+
 		for_each_physical_engine(fd, engine) {
+			if (!gem_can_store_dword(fd, engine))
+				continue;
+
 			engines[num_engines++] = engine;
 			if (num_engines == ARRAY_SIZE(engines))
 				break;
 		}
-	} else
+	} else {
+		igt_require(gem_can_store_dword(fd, 0));
 		engines[num_engines++] = 0;
+	}
+	igt_require(num_engines);
 
 	num_ctx = get_num_contexts(fd, num_engines);
 
 	size = ALIGN(num_ctx * sizeof(uint32_t), 4096);
-	scratch = gem_create(fd, ALIGN(num_ctx * sizeof(uint32_t), 4096));
+	scratch = gem_create(fd, size);
 	gem_set_caching(fd, scratch, I915_CACHING_CACHED);
-	obj = calloc(num_ctx, 2 * sizeof(*obj));
-	reloc = calloc(num_ctx, sizeof(*reloc));
+	obj = calloc(num_ctx, 3 * sizeof(*obj));
+	reloc = calloc(num_ctx, 2 * sizeof(*reloc));
 
 	ctx = malloc(num_ctx * sizeof(uint32_t));
 	igt_assert(ctx);
 	for (unsigned n = 0; n < num_ctx; n++) {
 		ctx[n] = gem_context_create(fd);
-		obj[2*n + 0].handle = scratch;
-
-		reloc[n].target_handle = scratch;
-		reloc[n].presumed_offset = 0;
-		reloc[n].offset = sizeof(uint32_t);
-		reloc[n].delta = n * sizeof(uint32_t);
-		reloc[n].read_domains = I915_GEM_DOMAIN_INSTRUCTION;
-		reloc[n].write_domain = 0; /* lies! */
+
+		obj[3*n + 0].handle = gem_create(fd, 4096);
+		reloc[2*n + 0].target_handle = obj[3*n + 0].handle;
+		reloc[2*n + 0].presumed_offset = 0;
+		reloc[2*n + 0].offset = 4000;
+		reloc[2*n + 0].delta = 0;
+		reloc[2*n + 0].read_domains = I915_GEM_DOMAIN_RENDER;
+		reloc[2*n + 0].write_domain = I915_GEM_DOMAIN_RENDER;
+
+		obj[3*n + 1].handle = scratch;
+		reloc[2*n + 1].target_handle = scratch;
+		reloc[2*n + 1].presumed_offset = 0;
+		reloc[2*n + 1].offset = sizeof(uint32_t);
+		reloc[2*n + 1].delta = n * sizeof(uint32_t);
+		reloc[2*n + 1].read_domains = I915_GEM_DOMAIN_RENDER;
+		reloc[2*n + 1].write_domain = 0; /* lies! */
 		if (gen >= 4 && gen < 8)
-			reloc[n].offset += sizeof(uint32_t);
+			reloc[2*n + 1].offset += sizeof(uint32_t);
 
-		obj[2*n + 1].relocs_ptr = to_user_pointer(&reloc[n]);
-		obj[2*n + 1].relocation_count = 1;
+		obj[3*n + 2].relocs_ptr = to_user_pointer(&reloc[2*n]);
+		obj[3*n + 2].relocation_count = 2;
 	}
 
 	map = gem_mmap__cpu(fd, scratch, 0, size, PROT_WRITE);
-	for (unsigned loop = 1; loop <= MAX_LOOP; loop <<= 1) {
-		unsigned count = loop * num_ctx;
+	for (unsigned int loop = 1; loop <= MAX_LOOP; loop <<= 1) {
+		const unsigned int count = loop * num_ctx;
 		uint32_t *all;
 
 		all = malloc(count * sizeof(uint32_t));
-		for (unsigned n = 0; n < count; n++)
+		for (unsigned int n = 0; n < count; n++)
 			all[n] = ctx[n % num_ctx];
 		igt_permute_array(all, count, xchg_int);
-		for (unsigned n = 0; n < count; n++) {
-			struct drm_i915_gem_execbuffer2 execbuf;
-			unsigned r = n % num_ctx;
-			uint64_t offset = reloc[r].presumed_offset + reloc[r].delta;
+
+		for (unsigned int n = 0; n < count; n++) {
+			const unsigned int r = n % num_ctx;
+			struct drm_i915_gem_execbuffer2 execbuf = {
+				.buffers_ptr = to_user_pointer(&obj[3*r]),
+				.buffer_count = 3,
+				.flags = engines[n % num_engines],
+				.rsvd1 = all[n],
+			};
+			uint64_t offset =
+				reloc[2*r + 1].presumed_offset +
+				reloc[2*r + 1].delta;
 			uint32_t handle = gem_create(fd, 4096);
 			uint32_t buf[16];
 			int i;
@@ -176,27 +195,22 @@ static void single(const char *name, bool all_engines)
 			buf[++i] = all[n];
 			buf[++i] = MI_BATCH_BUFFER_END;
 			gem_write(fd, handle, 0, buf, sizeof(buf));
-			obj[2*r + 1].handle = handle;
+			obj[3*r + 2].handle = handle;
 
-			memset(&execbuf, 0, sizeof(execbuf));
-			execbuf.buffers_ptr = to_user_pointer(&obj[2*r]);
-			execbuf.buffer_count = 2;
-			execbuf.flags = engines[n % num_engines];
-			execbuf.rsvd1 = all[n];
 			gem_execbuf(fd, &execbuf);
 			gem_close(fd, handle);
 		}
 
-		/* Note we lied about the write-domain when writing from the
+		/*
+		 * Note we lied about the write-domain when writing from the
 		 * GPU (in order to avoid inter-ring synchronisation), so now
 		 * we have to force the synchronisation here.
 		 */
 		gem_set_domain(fd, scratch,
 			       I915_GEM_DOMAIN_CPU, I915_GEM_DOMAIN_CPU);
-		for (unsigned n = count - num_ctx; n < count; n++)
+		for (unsigned int n = count - num_ctx; n < count; n++)
 			igt_assert_eq(map[n % num_ctx], all[n]);
 		free(all);
-		igt_progress(name, loop, MAX_LOOP);
 	}
 	munmap(map, size);
 
-- 
2.17.0

* [PATCH i-g-t 4/6] igt/gem_eio: Exercise banning
From: Chris Wilson @ 2018-05-14  8:02 UTC (permalink / raw)
  To: intel-gfx; +Cc: igt-dev

If we trigger "too many" resets, the context, and eventually the whole
file, will be banned and subsequent execbufs should fail with -EIO.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 tests/gem_eio.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/tests/gem_eio.c b/tests/gem_eio.c
index 6455c6acd..4720b47b5 100644
--- a/tests/gem_eio.c
+++ b/tests/gem_eio.c
@@ -250,6 +250,52 @@ static int __check_wait(int fd, uint32_t bo, unsigned int wait)
 	return ret;
 }
 
+static void __test_banned(int fd)
+{
+	struct drm_i915_gem_exec_object2 obj = {
+		.handle = gem_create(fd, 4096),
+	};
+	struct drm_i915_gem_execbuffer2 execbuf = {
+		.buffers_ptr = to_user_pointer(&obj),
+		.buffer_count = 1,
+	};
+	const uint32_t bbe = MI_BATCH_BUFFER_END;
+	unsigned long count = 0;
+
+	gem_write(fd, obj.handle, 0, &bbe, sizeof(bbe));
+
+	gem_quiescent_gpu(fd);
+	igt_require(i915_reset_control(true));
+
+	igt_until_timeout(5) {
+		igt_spin_t *hang;
+
+		if (__gem_execbuf(fd, &execbuf) == -EIO) {
+			igt_info("Banned after causing %lu hangs\n", count);
+			igt_assert(count > 1);
+			return;
+		}
+
+		/* Trigger a reset, making sure we are detected as guilty */
+		hang = spin_sync(fd, 0, 0);
+		trigger_reset(fd);
+		igt_spin_batch_free(fd, hang);
+
+		count++;
+	}
+
+	igt_assert_f(false,
+		     "Ran for 5s, %lu hangs without being banned\n",
+		     count);
+}
+
+static void test_banned(int fd)
+{
+	fd = gem_reopen_driver(fd);
+	__test_banned(fd);
+	close(fd);
+}
+
 #define TEST_WEDGE (1)
 
 static void test_wait(int fd, unsigned int flags, unsigned int wait)
@@ -693,6 +739,9 @@ igt_main
 	igt_subtest("execbuf")
 		test_execbuf(fd);
 
+	igt_subtest("banned")
+		test_banned(fd);
+
 	igt_subtest("suspend")
 		test_suspend(fd, SUSPEND_STATE_MEM);
 
-- 
2.17.0

* [PATCH i-g-t 5/6] tests/gem_eio: Only wait-for-idle inside trigger_reset()
From: Chris Wilson @ 2018-05-14  8:02 UTC (permalink / raw)
  To: intel-gfx; +Cc: igt-dev

trigger_reset() imposes a tight time constraint (2s) so that we verify
that the reset itself completes quickly. In the middle of this check, we
call gem_quiescent_gpu() which may invoke an rcu_barrier() or two to
clear out the freed memory (DROP_FREED). Those barriers may have
unbounded latency pushing beyond the 2s timeout, so restrict the
operation to only wait-for-idle (DROP_ACTIVE).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105957
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 tests/gem_eio.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/gem_eio.c b/tests/gem_eio.c
index 4720b47b5..e1aff639d 100644
--- a/tests/gem_eio.c
+++ b/tests/gem_eio.c
@@ -74,8 +74,7 @@ static void trigger_reset(int fd)
 	/* And just check the gpu is indeed running again */
 	igt_debug("Checking that the GPU recovered\n");
 	gem_test_engine(fd, ALL_ENGINES);
-
-	gem_quiescent_gpu(fd);
+	igt_drop_caches_set(fd, DROP_ACTIVE);
 
 	/* We expect forced reset and health check to be quick. */
 	igt_assert(igt_seconds_elapsed(&ts) < 2);
-- 
2.17.0

* [PATCH i-g-t 6/6] tests/gem_exec_latency: New subtests for checking submission from RT tasks
From: Chris Wilson @ 2018-05-14  8:02 UTC (permalink / raw)
  To: intel-gfx; +Cc: igt-dev

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We want to make sure that RT tasks which use a lot of CPU time can
submit batch buffers with roughly the same latency (and certainly not
worse) compared to normal tasks.

v2: Add tests to run across all engines simultaneously to encourage
ksoftirqd to kick in even more often.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> #v1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 tests/gem_exec_latency.c | 207 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 207 insertions(+)

diff --git a/tests/gem_exec_latency.c b/tests/gem_exec_latency.c
index 9498c0921..c5816427b 100644
--- a/tests/gem_exec_latency.c
+++ b/tests/gem_exec_latency.c
@@ -36,11 +36,15 @@
 #include <sys/time.h>
 #include <sys/signal.h>
 #include <time.h>
+#include <sched.h>
 
 #include "drm.h"
 
 #include "igt_sysfs.h"
 #include "igt_vgem.h"
+#include "igt_dummyload.h"
+#include "igt_stats.h"
+
 #include "i915/gem_ring.h"
 
 #define LOCAL_I915_EXEC_NO_RELOC (1<<11)
@@ -351,6 +355,189 @@ static void latency_from_ring(int fd,
 	}
 }
 
+static void __rearm_spin_batch(igt_spin_t *spin)
+{
+	const uint32_t mi_arb_chk = 0x5 << 23;
+
+	*spin->batch = mi_arb_chk;
+	*spin->running = 0;
+	__sync_synchronize();
+}
+
+static void
+__submit_spin_batch(int fd, igt_spin_t *spin, unsigned int flags)
+{
+	struct drm_i915_gem_execbuffer2 eb = spin->execbuf;
+
+	eb.flags &= ~(0x3f | I915_EXEC_BSD_MASK);
+	eb.flags |= flags | I915_EXEC_NO_RELOC;
+
+	gem_execbuf(fd, &eb);
+}
+
+struct rt_pkt {
+	struct igt_mean mean;
+	double min, max;
+};
+
+static bool __spin_wait(int fd, igt_spin_t *spin)
+{
+	while (!READ_ONCE(*spin->running)) {
+		if (!gem_bo_busy(fd, spin->handle))
+			return false;
+	}
+
+	return true;
+}
+
+/*
+ * Test whether RT thread which hogs the CPU a lot can submit work with
+ * reasonable latency.
+ */
+static void
+rthog_latency_on_ring(int fd, unsigned int engine, const char *name, unsigned int flags)
+#define RTIDLE 0x1
+{
+	const char *passname[] = { "warmup", "normal", "rt" };
+	struct rt_pkt *results;
+	unsigned int engines[16];
+	const char *names[16];
+	unsigned int nengine;
+	int ret;
+
+	results = mmap(NULL, 4096, PROT_WRITE, MAP_SHARED | MAP_ANON, -1, 0);
+	igt_assert(results != MAP_FAILED);
+
+	nengine = 0;
+	if (engine == ALL_ENGINES) {
+		for_each_physical_engine(fd, engine) {
+			if (!gem_can_store_dword(fd, engine))
+				continue;
+
+			engines[nengine] = engine;
+			names[nengine] = e__->name;
+			nengine++;
+		}
+		igt_require(nengine > 1);
+	} else {
+		igt_require(gem_can_store_dword(fd, engine));
+		engines[nengine] = engine;
+		names[nengine] = name;
+		nengine++;
+	}
+
+	igt_fork(child, nengine) {
+		unsigned int pass = 0; /* Three passes: warmup, normal, rt. */
+
+		engine = engines[child];
+		do {
+			struct igt_mean mean;
+			double min = HUGE_VAL;
+			double max = -HUGE_VAL;
+			igt_spin_t *spin;
+
+			igt_mean_init(&mean);
+
+			if (pass == 2) {
+				struct sched_param rt =
+				{ .sched_priority = 99 };
+
+				ret = sched_setscheduler(0,
+							 SCHED_FIFO | SCHED_RESET_ON_FORK,
+							 &rt);
+				if (ret) {
+					igt_warn("Failed to set scheduling policy!\n");
+					break;
+				}
+			}
+
+			spin = __igt_spin_batch_new_poll(fd, 0, engine);
+			if (!spin) {
+				igt_warn("Failed to create spinner! (%s)\n",
+					 passname[pass]);
+				break;
+			}
+			igt_spin_busywait_until_running(spin);
+
+			igt_until_timeout(pass > 0 ? 5 : 2) {
+				struct timespec ts = { };
+				double t;
+
+				igt_spin_batch_end(spin);
+				gem_sync(fd, spin->handle);
+				if (flags & RTIDLE)
+					usleep(250);
+
+				/*
+				 * If we are oversubscribed (more RT hogs than
+				 * cpus) give the others a chance to run;
+				 * otherwise, they will interrupt us in the
+				 * middle of the measurement.
+				 */
+				if (nengine > 1)
+					usleep(10*nengine);
+
+				__rearm_spin_batch(spin);
+
+				igt_nsec_elapsed(&ts);
+				__submit_spin_batch(fd, spin, engine);
+				if (!__spin_wait(fd, spin)) {
+					igt_warn("Wait timeout! (%s)\n",
+						 passname[pass]);
+					break;
+				}
+
+				t = igt_nsec_elapsed(&ts) * 1e-9;
+				if (t > max)
+					max = t;
+				if (t < min)
+					min = t;
+
+				igt_mean_add(&mean, t);
+			}
+
+			igt_spin_batch_free(fd, spin);
+
+			igt_info("%8s %10s: mean=%.2fus stddev=%.3fus [%.2fus, %.2fus] (n=%lu)\n",
+				 names[child],
+				 passname[pass],
+				 igt_mean_get(&mean) * 1e6,
+				 sqrt(igt_mean_get_variance(&mean)) * 1e6,
+				 min * 1e6, max * 1e6,
+				 mean.count);
+
+			results[3 * child + pass].mean = mean;
+			results[3 * child + pass].min = min;
+			results[3 * child + pass].max = max;
+		} while (++pass < 3);
+	}
+
+	igt_waitchildren();
+
+	for (unsigned int child = 0; child < nengine; child++) {
+		struct rt_pkt normal = results[3 * child + 1];
+		struct rt_pkt rt = results[3 * child + 2];
+
+		igt_assert(rt.max);
+
+		igt_info("%8s: normal latency=%.2f±%.3fus, rt latency=%.2f±%.3fus\n",
+			 names[child],
+			 igt_mean_get(&normal.mean) * 1e6,
+			 sqrt(igt_mean_get_variance(&normal.mean)) * 1e6,
+			 igt_mean_get(&rt.mean) * 1e6,
+			 sqrt(igt_mean_get_variance(&rt.mean)) * 1e6);
+
+		igt_assert(igt_mean_get(&rt.mean) <
+			   igt_mean_get(&normal.mean) * 2);
+
+		/* The system is noisy; be conservative when declaring fail. */
+		igt_assert(igt_mean_get_variance(&rt.mean) <
+			   igt_mean_get_variance(&normal.mean) * 25);
+	}
+
+	munmap(results, 4096);
+}
+
 igt_main
 {
 	const struct intel_execution_engine *e;
@@ -373,6 +560,12 @@ igt_main
 		intel_register_access_init(intel_get_pci_device(), false, device);
 	}
 
+	igt_subtest("all-rtidle-submit")
+		rthog_latency_on_ring(device, ALL_ENGINES, "all", RTIDLE);
+
+	igt_subtest("all-rthog-submit")
+		rthog_latency_on_ring(device, ALL_ENGINES, "all", 0);
+
 	igt_subtest_group {
 		igt_fixture
 			igt_require(intel_gen(intel_get_drm_devid(device)) >= 7);
@@ -391,6 +584,20 @@ igt_main
 							e->exec_id | e->flags,
 							e->name, 0);
 
+				igt_subtest_f("%s-rtidle-submit", e->name)
+					rthog_latency_on_ring(device,
+							      e->exec_id |
+							      e->flags,
+							      e->name,
+							      RTIDLE);
+
+				igt_subtest_f("%s-rthog-submit", e->name)
+					rthog_latency_on_ring(device,
+							      e->exec_id |
+							      e->flags,
+							      e->name,
+							      0);
+
 				igt_subtest_f("%s-dispatch-queued", e->name)
 					latency_on_ring(device,
 							e->exec_id | e->flags,
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [igt-dev] [PATCH i-g-t 6/6] tests/gem_exec_latency: New subtests for checking submission from RT tasks
@ 2018-05-14  8:02   ` Chris Wilson
  0 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-05-14  8:02 UTC (permalink / raw)
  To: intel-gfx; +Cc: igt-dev, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We want to make sure that RT tasks which use a lot of CPU time can
submit batch buffers with roughly the same latency as (and certainly no
worse than) normal tasks.

v2: Add tests to run across all engines simultaneously to encourage
ksoftirqd to kick in even more often.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> #v1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 tests/gem_exec_latency.c | 207 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 207 insertions(+)

diff --git a/tests/gem_exec_latency.c b/tests/gem_exec_latency.c
index 9498c0921..c5816427b 100644
--- a/tests/gem_exec_latency.c
+++ b/tests/gem_exec_latency.c
@@ -36,11 +36,15 @@
 #include <sys/time.h>
 #include <sys/signal.h>
 #include <time.h>
+#include <sched.h>
 
 #include "drm.h"
 
 #include "igt_sysfs.h"
 #include "igt_vgem.h"
+#include "igt_dummyload.h"
+#include "igt_stats.h"
+
 #include "i915/gem_ring.h"
 
 #define LOCAL_I915_EXEC_NO_RELOC (1<<11)
@@ -351,6 +355,189 @@ static void latency_from_ring(int fd,
 	}
 }
 
+static void __rearm_spin_batch(igt_spin_t *spin)
+{
+	const uint32_t mi_arb_chk = 0x5 << 23;
+
+	*spin->batch = mi_arb_chk;
+	*spin->running = 0;
+	__sync_synchronize();
+}
+
+static void
+__submit_spin_batch(int fd, igt_spin_t *spin, unsigned int flags)
+{
+	struct drm_i915_gem_execbuffer2 eb = spin->execbuf;
+
+	eb.flags &= ~(0x3f | I915_EXEC_BSD_MASK);
+	eb.flags |= flags | I915_EXEC_NO_RELOC;
+
+	gem_execbuf(fd, &eb);
+}
+
+struct rt_pkt {
+	struct igt_mean mean;
+	double min, max;
+};
+
+static bool __spin_wait(int fd, igt_spin_t *spin)
+{
+	while (!READ_ONCE(*spin->running)) {
+		if (!gem_bo_busy(fd, spin->handle))
+			return false;
+	}
+
+	return true;
+}
+
+/*
+ * Test whether an RT thread which hogs the CPU a lot can submit work with
+ * reasonable latency.
+ */
+static void
+rthog_latency_on_ring(int fd, unsigned int engine, const char *name, unsigned int flags)
+#define RTIDLE 0x1
+{
+	const char *passname[] = { "warmup", "normal", "rt" };
+	struct rt_pkt *results;
+	unsigned int engines[16];
+	const char *names[16];
+	unsigned int nengine;
+	int ret;
+
+	results = mmap(NULL, 4096, PROT_WRITE, MAP_SHARED | MAP_ANON, -1, 0);
+	igt_assert(results != MAP_FAILED);
+
+	nengine = 0;
+	if (engine == ALL_ENGINES) {
+		for_each_physical_engine(fd, engine) {
+			if (!gem_can_store_dword(fd, engine))
+				continue;
+
+			engines[nengine] = engine;
+			names[nengine] = e__->name;
+			nengine++;
+		}
+		igt_require(nengine > 1);
+	} else {
+		igt_require(gem_can_store_dword(fd, engine));
+		engines[nengine] = engine;
+		names[nengine] = name;
+		nengine++;
+	}
+
+	igt_fork(child, nengine) {
+		unsigned int pass = 0; /* Three passes: warmup, normal, rt. */
+
+		engine = engines[child];
+		do {
+			struct igt_mean mean;
+			double min = HUGE_VAL;
+			double max = -HUGE_VAL;
+			igt_spin_t *spin;
+
+			igt_mean_init(&mean);
+
+			if (pass == 2) {
+				struct sched_param rt =
+				{ .sched_priority = 99 };
+
+				ret = sched_setscheduler(0,
+							 SCHED_FIFO | SCHED_RESET_ON_FORK,
+							 &rt);
+				if (ret) {
+					igt_warn("Failed to set scheduling policy!\n");
+					break;
+				}
+			}
+
+			spin = __igt_spin_batch_new_poll(fd, 0, engine);
+			if (!spin) {
+				igt_warn("Failed to create spinner! (%s)\n",
+					 passname[pass]);
+				break;
+			}
+			igt_spin_busywait_until_running(spin);
+
+			igt_until_timeout(pass > 0 ? 5 : 2) {
+				struct timespec ts = { };
+				double t;
+
+				igt_spin_batch_end(spin);
+				gem_sync(fd, spin->handle);
+				if (flags & RTIDLE)
+					usleep(250);
+
+				/*
+				 * If we are oversubscribed (more RT hogs than
+				 * CPUs), give the others a chance to run;
+				 * otherwise, they will interrupt us in the
+				 * middle of the measurement.
+				 */
+				if (nengine > 1)
+					usleep(10*nengine);
+
+				__rearm_spin_batch(spin);
+
+				igt_nsec_elapsed(&ts);
+				__submit_spin_batch(fd, spin, engine);
+				if (!__spin_wait(fd, spin)) {
+					igt_warn("Wait timeout! (%s)\n",
+						 passname[pass]);
+					break;
+				}
+
+				t = igt_nsec_elapsed(&ts) * 1e-9;
+				if (t > max)
+					max = t;
+				if (t < min)
+					min = t;
+
+				igt_mean_add(&mean, t);
+			}
+
+			igt_spin_batch_free(fd, spin);
+
+			igt_info("%8s %10s: mean=%.2fus stddev=%.3fus [%.2fus, %.2fus] (n=%lu)\n",
+				 names[child],
+				 passname[pass],
+				 igt_mean_get(&mean) * 1e6,
+				 sqrt(igt_mean_get_variance(&mean)) * 1e6,
+				 min * 1e6, max * 1e6,
+				 mean.count);
+
+			results[3 * child + pass].mean = mean;
+			results[3 * child + pass].min = min;
+			results[3 * child + pass].max = max;
+		} while (++pass < 3);
+	}
+
+	igt_waitchildren();
+
+	for (unsigned int child = 0; child < nengine; child++) {
+		struct rt_pkt normal = results[3 * child + 1];
+		struct rt_pkt rt = results[3 * child + 2];
+
+		igt_assert(rt.max);
+
+		igt_info("%8s: normal latency=%.2f±%.3fus, rt latency=%.2f±%.3fus\n",
+			 names[child],
+			 igt_mean_get(&normal.mean) * 1e6,
+			 sqrt(igt_mean_get_variance(&normal.mean)) * 1e6,
+			 igt_mean_get(&rt.mean) * 1e6,
+			 sqrt(igt_mean_get_variance(&rt.mean)) * 1e6);
+
+		igt_assert(igt_mean_get(&rt.mean) <
+			   igt_mean_get(&normal.mean) * 2);
+
+		/* The system is noisy; be conservative when declaring fail. */
+		igt_assert(igt_mean_get_variance(&rt.mean) <
+			   igt_mean_get_variance(&normal.mean) * 25);
+	}
+
+	munmap(results, 4096);
+}
+
 igt_main
 {
 	const struct intel_execution_engine *e;
@@ -373,6 +560,12 @@ igt_main
 		intel_register_access_init(intel_get_pci_device(), false, device);
 	}
 
+	igt_subtest("all-rtidle-submit")
+		rthog_latency_on_ring(device, ALL_ENGINES, "all", RTIDLE);
+
+	igt_subtest("all-rthog-submit")
+		rthog_latency_on_ring(device, ALL_ENGINES, "all", 0);
+
 	igt_subtest_group {
 		igt_fixture
 			igt_require(intel_gen(intel_get_drm_devid(device)) >= 7);
@@ -391,6 +584,20 @@ igt_main
 							e->exec_id | e->flags,
 							e->name, 0);
 
+				igt_subtest_f("%s-rtidle-submit", e->name)
+					rthog_latency_on_ring(device,
+							      e->exec_id |
+							      e->flags,
+							      e->name,
+							      RTIDLE);
+
+				igt_subtest_f("%s-rthog-submit", e->name)
+					rthog_latency_on_ring(device,
+							      e->exec_id |
+							      e->flags,
+							      e->name,
+							      0);
+
 				igt_subtest_f("%s-dispatch-queued", e->name)
 					latency_on_ring(device,
 							e->exec_id | e->flags,
-- 
2.17.0

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [igt-dev] ✓ Fi.CI.BAT: success for series starting with [i-g-t,1/6] overlay: Remove the miscalculation of window position
  2018-05-14  8:02 ` [igt-dev] " Chris Wilson
                   ` (5 preceding siblings ...)
  (?)
@ 2018-05-14  8:52 ` Patchwork
  -1 siblings, 0 replies; 38+ messages in thread
From: Patchwork @ 2018-05-14  8:52 UTC (permalink / raw)
  To: Chris Wilson; +Cc: igt-dev

== Series Details ==

Series: series starting with [i-g-t,1/6] overlay: Remove the miscalculation of window position
URL   : https://patchwork.freedesktop.org/series/43119/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4173 -> IGTPW_1351 =

== Summary - SUCCESS ==

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/43119/revisions/1/mbox/

== Known issues ==

  Here are the changes found in IGTPW_1351 that come from known issues:

  === IGT changes ===

    ==== Possible fixes ====

    igt@kms_flip@basic-flip-vs-wf_vblank:
      fi-cfl-u:           FAIL (fdo#100368) -> PASS

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a:
      fi-cnl-y3:          DMESG-WARN (fdo#104951) -> PASS

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
      fi-snb-2520m:       INCOMPLETE (fdo#103713) -> PASS

    
  fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
  fdo#103713 https://bugs.freedesktop.org/show_bug.cgi?id=103713
  fdo#104951 https://bugs.freedesktop.org/show_bug.cgi?id=104951


== Participating hosts (41 -> 37) ==

  Missing    (4): fi-ctg-p8600 fi-ilk-m540 fi-byt-squawks fi-skl-6700hq 


== Build changes ==

    * IGT: IGT_4476 -> IGTPW_1351

  CI_DRM_4173: 141169650b471cf9c33ad3e0a938a78e4018aacd @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_1351: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_1351/
  IGT_4476: 03a62cf055481f66b4f58e6228bc45f8ca454216 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  piglit_4476: 3ba0657bff4216d1ec7179935590261855f1651e @ git://anongit.freedesktop.org/piglit



== Testlist changes ==

+igt@gem_eio@banned
+igt@gem_exec_latency@all-rthog-submit
+igt@gem_exec_latency@all-rtidle-submit
+igt@gem_exec_latency@blt-rthog-submit
+igt@gem_exec_latency@blt-rtidle-submit
+igt@gem_exec_latency@bsd1-rthog-submit
+igt@gem_exec_latency@bsd1-rtidle-submit
+igt@gem_exec_latency@bsd2-rthog-submit
+igt@gem_exec_latency@bsd2-rtidle-submit
+igt@gem_exec_latency@bsd-rthog-submit
+igt@gem_exec_latency@bsd-rtidle-submit
+igt@gem_exec_latency@render-rthog-submit
+igt@gem_exec_latency@render-rtidle-submit
+igt@gem_exec_latency@vebox-rthog-submit
+igt@gem_exec_latency@vebox-rtidle-submit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_1351/issues.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH i-g-t 1/6] overlay: Remove the miscalculation of window position
  2018-05-14  8:02 ` [igt-dev] " Chris Wilson
@ 2018-05-14 10:18   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-14 10:18 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: igt-dev


On 14/05/2018 09:02, Chris Wilson wrote:
> We already call x11_position() to calculate the position of the
> overlay, so do not need to manually recompute them inside
> x11_overlay_create(). This has the advantage that x11_position()
> understands the multi-monitor layout instructions.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   overlay/x11/x11-overlay.c | 16 ----------------
>   1 file changed, 16 deletions(-)
> 
> diff --git a/overlay/x11/x11-overlay.c b/overlay/x11/x11-overlay.c
> index d08ebfc9c..ae6494295 100644
> --- a/overlay/x11/x11-overlay.c
> +++ b/overlay/x11/x11-overlay.c
> @@ -271,22 +271,6 @@ x11_overlay_create(struct config *config, int *width, int *height)
>   
>   	priv->x = x;
>   	priv->y = y;
> -	if (position != POS_UNSET) {
> -		switch (position & 7) {
> -		default:
> -		case 0: priv->x = 0; break;
> -		case 1: priv->x = (scr->width - image->width)/2; break;
> -		case 2: priv->x = scr->width - image->width; break;
> -		}
> -
> -		switch ((position >> 4) & 7) {
> -		default:
> -		case 0: priv->y = 0; break;
> -		case 1: priv->y = (scr->height - image->height)/2; break;
> -		case 2: priv->y = scr->height - image->height; break;
> -		}
> -	}
> -
>   
>   	priv->image = image;
>   	priv->image->data = (void *)&priv->name;
> 

Looks OK even without a full understanding of X11 programming.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [igt-dev] [Intel-gfx] [PATCH i-g-t 1/6] overlay: Remove the miscalculation of window position
@ 2018-05-14 10:18   ` Tvrtko Ursulin
  0 siblings, 0 replies; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-14 10:18 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: igt-dev


On 14/05/2018 09:02, Chris Wilson wrote:
> We already call x11_position() to calculate the position of the
> overlay, so do not need to manually recompute them inside
> x11_overlay_create(). This has the advantage that x11_position()
> understands the multi-monitor layout instructions.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   overlay/x11/x11-overlay.c | 16 ----------------
>   1 file changed, 16 deletions(-)
> 
> diff --git a/overlay/x11/x11-overlay.c b/overlay/x11/x11-overlay.c
> index d08ebfc9c..ae6494295 100644
> --- a/overlay/x11/x11-overlay.c
> +++ b/overlay/x11/x11-overlay.c
> @@ -271,22 +271,6 @@ x11_overlay_create(struct config *config, int *width, int *height)
>   
>   	priv->x = x;
>   	priv->y = y;
> -	if (position != POS_UNSET) {
> -		switch (position & 7) {
> -		default:
> -		case 0: priv->x = 0; break;
> -		case 1: priv->x = (scr->width - image->width)/2; break;
> -		case 2: priv->x = scr->width - image->width; break;
> -		}
> -
> -		switch ((position >> 4) & 7) {
> -		default:
> -		case 0: priv->y = 0; break;
> -		case 1: priv->y = (scr->height - image->height)/2; break;
> -		case 2: priv->y = scr->height - image->height; break;
> -		}
> -	}
> -
>   
>   	priv->image = image;
>   	priv->image->data = (void *)&priv->name;
> 

Looks OK even without a full understanding of X11 programming.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH i-g-t 2/6] lib/audio: Replace sqrt(a*a + b*b) with hypot(a, b)
  2018-05-14  8:02   ` [igt-dev] " Chris Wilson
@ 2018-05-14 10:20     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-14 10:20 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: igt-dev


On 14/05/2018 09:02, Chris Wilson wrote:
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   lib/igt_audio.c | 4 +---
>   1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/lib/igt_audio.c b/lib/igt_audio.c
> index 2321d1c6e..229ce6c69 100644
> --- a/lib/igt_audio.c
> +++ b/lib/igt_audio.c
> @@ -266,9 +266,7 @@ bool audio_signal_detect(struct audio_signal *signal, int channels,
>   		max = 0;
>   
>   		for (i = 0; i < frames / 2; i++) {
> -			amplitude[i] = sqrt(data[i] * data[i] +
> -					    data[frames - i] *
> -					    data[frames - i]);
> +			amplitude[i] = hypot(data[i], data[frames - i]);
>   			if (amplitude[i] > max)
>   				max = amplitude[i];
>   		}
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [igt-dev] [Intel-gfx] [PATCH i-g-t 2/6] lib/audio: Replace sqrt(a*a + b*b) with hypot(a, b)
@ 2018-05-14 10:20     ` Tvrtko Ursulin
  0 siblings, 0 replies; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-14 10:20 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: igt-dev


On 14/05/2018 09:02, Chris Wilson wrote:
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   lib/igt_audio.c | 4 +---
>   1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/lib/igt_audio.c b/lib/igt_audio.c
> index 2321d1c6e..229ce6c69 100644
> --- a/lib/igt_audio.c
> +++ b/lib/igt_audio.c
> @@ -266,9 +266,7 @@ bool audio_signal_detect(struct audio_signal *signal, int channels,
>   		max = 0;
>   
>   		for (i = 0; i < frames / 2; i++) {
> -			amplitude[i] = sqrt(data[i] * data[i] +
> -					    data[frames - i] *
> -					    data[frames - i]);
> +			amplitude[i] = hypot(data[i], data[frames - i]);
>   			if (amplitude[i] > max)
>   				max = amplitude[i];
>   		}
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [igt-dev] ✗ Fi.CI.IGT: failure for series starting with [i-g-t,1/6] overlay: Remove the miscalculation of window position
  2018-05-14  8:02 ` [igt-dev] " Chris Wilson
                   ` (7 preceding siblings ...)
  (?)
@ 2018-05-14 10:31 ` Patchwork
  -1 siblings, 0 replies; 38+ messages in thread
From: Patchwork @ 2018-05-14 10:31 UTC (permalink / raw)
  To: Chris Wilson; +Cc: igt-dev

== Series Details ==

Series: series starting with [i-g-t,1/6] overlay: Remove the miscalculation of window position
URL   : https://patchwork.freedesktop.org/series/43119/
State : failure

== Summary ==

= CI Bug Log - changes from IGT_4476_full -> IGTPW_1351_full =

== Summary - FAILURE ==

  Serious unknown changes coming with IGTPW_1351_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in IGTPW_1351_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/43119/revisions/1/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in IGTPW_1351_full:

  === IGT changes ===

    ==== Possible regressions ====

    {igt@gem_eio@banned}:
      shard-hsw:          NOTRUN -> DMESG-WARN
      shard-kbl:          NOTRUN -> DMESG-WARN
      shard-snb:          NOTRUN -> DMESG-WARN
      shard-apl:          NOTRUN -> DMESG-WARN

    igt@gem_exec_schedule@preempt-queue-contexts-bsd1:
      shard-kbl:          PASS -> DMESG-WARN

    igt@kms_cursor_legacy@2x-nonblocking-modeset-vs-cursor-atomic:
      shard-glk:          SKIP -> FAIL

    
    ==== Warnings ====

    igt@gem_exec_schedule@deep-bsd1:
      shard-kbl:          SKIP -> PASS +2

    igt@gem_exec_schedule@deep-bsd2:
      shard-kbl:          PASS -> SKIP +2

    igt@kms_ccs@pipe-a-bad-aux-stride:
      shard-apl:          SKIP -> PASS +1

    igt@kms_cursor_legacy@cursorb-vs-flipb-atomic-transitions:
      shard-glk:          SKIP -> PASS +142

    igt@kms_hdmi_inject@inject-audio:
      shard-glk:          PASS -> SKIP +2

    
== Known issues ==

  Here are the changes found in IGTPW_1351_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@gem_ppgtt@blt-vs-render-ctxn:
      shard-kbl:          PASS -> INCOMPLETE (fdo#106023, fdo#103665)

    igt@kms_ccs@pipe-a-crc-sprite-planes-basic:
      shard-apl:          PASS -> FAIL (fdo#106510) +1

    igt@kms_flip@2x-absolute-wf_vblank-interruptible:
      shard-glk:          SKIP -> FAIL (fdo#106087)

    igt@kms_flip@2x-flip-vs-absolute-wf_vblank-interruptible:
      shard-glk:          SKIP -> FAIL (fdo#100368)

    igt@kms_flip@2x-flip-vs-panning-vs-hang-interruptible:
      shard-glk:          SKIP -> INCOMPLETE (fdo#103359, k.org#198133)

    igt@kms_flip@flip-vs-absolute-wf_vblank-interruptible:
      shard-glk:          PASS -> FAIL (fdo#100368)

    igt@kms_flip_tiling@flip-to-x-tiled:
      shard-glk:          PASS -> FAIL (fdo#103822, fdo#104724)

    igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-pri-indfb-draw-blt:
      shard-glk:          SKIP -> FAIL (fdo#103167, fdo#104724) +1

    igt@kms_setmode@basic:
      shard-kbl:          PASS -> FAIL (fdo#99912)

    igt@kms_sysfs_edid_timing:
      shard-apl:          PASS -> FAIL (fdo#100047)

    igt@perf@polling:
      shard-hsw:          PASS -> FAIL (fdo#102252)

    igt@testdisplay:
      shard-glk:          PASS -> INCOMPLETE (fdo#103359, k.org#198133)

    
    ==== Possible fixes ====

    igt@kms_flip@wf_vblank-ts-check:
      shard-hsw:          FAIL (fdo#103928) -> PASS

    igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes:
      shard-apl:          FAIL (fdo#103166, fdo#104724) -> PASS

    igt@kms_plane_lowres@pipe-b-tiling-y:
      shard-kbl:          DMESG-WARN (fdo#103313, fdo#103558, fdo#105602) -> PASS +8

    igt@pm_rpm@dpms-mode-unset-non-lpsp:
      shard-kbl:          DMESG-WARN (fdo#103558, fdo#105602) -> PASS +2

    
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  fdo#100047 https://bugs.freedesktop.org/show_bug.cgi?id=100047
  fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
  fdo#102252 https://bugs.freedesktop.org/show_bug.cgi?id=102252
  fdo#103166 https://bugs.freedesktop.org/show_bug.cgi?id=103166
  fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
  fdo#103313 https://bugs.freedesktop.org/show_bug.cgi?id=103313
  fdo#103359 https://bugs.freedesktop.org/show_bug.cgi?id=103359
  fdo#103558 https://bugs.freedesktop.org/show_bug.cgi?id=103558
  fdo#103665 https://bugs.freedesktop.org/show_bug.cgi?id=103665
  fdo#103822 https://bugs.freedesktop.org/show_bug.cgi?id=103822
  fdo#103928 https://bugs.freedesktop.org/show_bug.cgi?id=103928
  fdo#104724 https://bugs.freedesktop.org/show_bug.cgi?id=104724
  fdo#105602 https://bugs.freedesktop.org/show_bug.cgi?id=105602
  fdo#106023 https://bugs.freedesktop.org/show_bug.cgi?id=106023
  fdo#106087 https://bugs.freedesktop.org/show_bug.cgi?id=106087
  fdo#106510 https://bugs.freedesktop.org/show_bug.cgi?id=106510
  fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912
  k.org#198133 https://bugzilla.kernel.org/show_bug.cgi?id=198133


== Participating hosts (5 -> 5) ==

  No changes in participating hosts


== Build changes ==

    * IGT: IGT_4476 -> IGTPW_1351
    * Linux: CI_DRM_4172 -> CI_DRM_4173

  CI_DRM_4172: b4c14c0809e8f6dba9a4fdf1a659cea5a98750d3 @ git://anongit.freedesktop.org/gfx-ci/linux
  CI_DRM_4173: 141169650b471cf9c33ad3e0a938a78e4018aacd @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_1351: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_1351/
  IGT_4476: 03a62cf055481f66b4f58e6228bc45f8ca454216 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  piglit_4476: 3ba0657bff4216d1ec7179935590261855f1651e @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_1351/shards.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH i-g-t 3/6] igt/gem_ctx_thrash: Order writes between contexts
  2018-05-14  8:02   ` [igt-dev] " Chris Wilson
@ 2018-05-14 10:59     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-14 10:59 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: igt-dev


On 14/05/2018 09:02, Chris Wilson wrote:
> The test wrote to the same dwords from multiple contexts, assuming that
> the writes would be ordered by its submission. However, as it was using
> multiple contexts without a write hazard, those timelines are not
> coupled and the requests may be emitted to hw in any order. So emit a
> write hazard for each individual dword in the scratch (avoiding the
> write hazard for the scratch as a whole) to ensure the writes do occur
> in the expected order.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   tests/gem_ctx_thrash.c | 92 ++++++++++++++++++++++++------------------
>   1 file changed, 53 insertions(+), 39 deletions(-)
> 
> diff --git a/tests/gem_ctx_thrash.c b/tests/gem_ctx_thrash.c
> index 2cd9cfebf..b25f95f13 100644
> --- a/tests/gem_ctx_thrash.c
> +++ b/tests/gem_ctx_thrash.c
> @@ -90,17 +90,13 @@ static void single(const char *name, bool all_engines)
>   {
>   	struct drm_i915_gem_exec_object2 *obj;
>   	struct drm_i915_gem_relocation_entry *reloc;
> -	unsigned engines[16];
> -	uint64_t size;
> -	uint32_t *ctx, *map, scratch;
> -	unsigned num_ctx;
> -	int fd, gen, num_engines;
> +	unsigned int engines[16], num_engines, num_ctx;
> +	uint32_t *ctx, *map, scratch, size;
> +	int fd, gen;
>   #define MAX_LOOP 16
>   
> -	fd = drm_open_driver_master(DRIVER_INTEL);
> +	fd = drm_open_driver(DRIVER_INTEL);
>   	igt_require_gem(fd);
> -	igt_require(gem_can_store_dword(fd, 0));
> -
>   	gem_require_contexts(fd);
>   
>   	gen = intel_gen(intel_get_drm_devid(fd));
> @@ -108,54 +104,77 @@ static void single(const char *name, bool all_engines)
>   	num_engines = 0;
>   	if (all_engines) {
>   		unsigned engine;
> +
>   		for_each_physical_engine(fd, engine) {
> +			if (!gem_can_store_dword(fd, engine))
> +				continue;
> +
>   			engines[num_engines++] = engine;
>   			if (num_engines == ARRAY_SIZE(engines))
>   				break;
>   		}
> -	} else
> +	} else {
> +		igt_require(gem_can_store_dword(fd, 0));
>   		engines[num_engines++] = 0;
> +	}
> +	igt_require(num_engines);
>   
>   	num_ctx = get_num_contexts(fd, num_engines);
>   
>   	size = ALIGN(num_ctx * sizeof(uint32_t), 4096);
> -	scratch = gem_create(fd, ALIGN(num_ctx * sizeof(uint32_t), 4096));
> +	scratch = gem_create(fd, size);
>   	gem_set_caching(fd, scratch, I915_CACHING_CACHED);
> -	obj = calloc(num_ctx, 2 * sizeof(*obj));
> -	reloc = calloc(num_ctx, sizeof(*reloc));
> +	obj = calloc(num_ctx, 3 * sizeof(*obj));
> +	reloc = calloc(num_ctx, 2 * sizeof(*reloc));
>   
>   	ctx = malloc(num_ctx * sizeof(uint32_t));
>   	igt_assert(ctx);
>   	for (unsigned n = 0; n < num_ctx; n++) {
>   		ctx[n] = gem_context_create(fd);
> -		obj[2*n + 0].handle = scratch;
> -
> -		reloc[n].target_handle = scratch;
> -		reloc[n].presumed_offset = 0;
> -		reloc[n].offset = sizeof(uint32_t);
> -		reloc[n].delta = n * sizeof(uint32_t);
> -		reloc[n].read_domains = I915_GEM_DOMAIN_INSTRUCTION;
> -		reloc[n].write_domain = 0; /* lies! */
> +
> +		obj[3*n + 0].handle = gem_create(fd, 4096);
> +		reloc[2*n + 0].target_handle = obj[3*n + 0].handle;
> +		reloc[2*n + 0].presumed_offset = 0;
> +		reloc[2*n + 0].offset = 4000;
> +		reloc[2*n + 0].delta = 0;
> +		reloc[2*n + 0].read_domains = I915_GEM_DOMAIN_RENDER;
> +		reloc[2*n + 0].write_domain = I915_GEM_DOMAIN_RENDER;
> +
> +		obj[3*n + 1].handle = scratch;
> +		reloc[2*n + 1].target_handle = scratch;
> +		reloc[2*n + 1].presumed_offset = 0;
> +		reloc[2*n + 1].offset = sizeof(uint32_t);
> +		reloc[2*n + 1].delta = n * sizeof(uint32_t);
> +		reloc[2*n + 1].read_domains = I915_GEM_DOMAIN_RENDER;
> +		reloc[2*n + 1].write_domain = 0; /* lies! */
>   		if (gen >= 4 && gen < 8)
> -			reloc[n].offset += sizeof(uint32_t);
> +			reloc[2*n + 1].offset += sizeof(uint32_t);
>   
> -		obj[2*n + 1].relocs_ptr = to_user_pointer(&reloc[n]);
> -		obj[2*n + 1].relocation_count = 1;
> +		obj[3*n + 2].relocs_ptr = to_user_pointer(&reloc[2*n]);
> +		obj[3*n + 2].relocation_count = 2;
>   	}
>   
>   	map = gem_mmap__cpu(fd, scratch, 0, size, PROT_WRITE);
> -	for (unsigned loop = 1; loop <= MAX_LOOP; loop <<= 1) {
> -		unsigned count = loop * num_ctx;
> +	for (unsigned int loop = 1; loop <= MAX_LOOP; loop <<= 1) {
> +		const unsigned int count = loop * num_ctx;
>   		uint32_t *all;
>   
>   		all = malloc(count * sizeof(uint32_t));
> -		for (unsigned n = 0; n < count; n++)
> +		for (unsigned int n = 0; n < count; n++)
>   			all[n] = ctx[n % num_ctx];
>   		igt_permute_array(all, count, xchg_int);
> -		for (unsigned n = 0; n < count; n++) {
> -			struct drm_i915_gem_execbuffer2 execbuf;
> -			unsigned r = n % num_ctx;
> -			uint64_t offset = reloc[r].presumed_offset + reloc[r].delta;
> +
> +		for (unsigned int n = 0; n < count; n++) {
> +			const unsigned int r = n % num_ctx;
> +			struct drm_i915_gem_execbuffer2 execbuf = {
> +				.buffers_ptr = to_user_pointer(&obj[3*r]),
> +				.buffer_count = 3,
> +				.flags = engines[n % num_engines],
> +				.rsvd1 = all[n],
> +			};
> +			uint64_t offset =
> +				reloc[2*r + 1].presumed_offset +
> +				reloc[2*r + 1].delta;
>   			uint32_t handle = gem_create(fd, 4096);
>   			uint32_t buf[16];
>   			int i;
> @@ -176,27 +195,22 @@ static void single(const char *name, bool all_engines)
>   			buf[++i] = all[n];
>   			buf[++i] = MI_BATCH_BUFFER_END;
>   			gem_write(fd, handle, 0, buf, sizeof(buf));
> -			obj[2*r + 1].handle = handle;
> +			obj[3*r + 2].handle = handle;
>   
> -			memset(&execbuf, 0, sizeof(execbuf));
> -			execbuf.buffers_ptr = to_user_pointer(&obj[2*r]);
> -			execbuf.buffer_count = 2;
> -			execbuf.flags = engines[n % num_engines];
> -			execbuf.rsvd1 = all[n];
>   			gem_execbuf(fd, &execbuf);
>   			gem_close(fd, handle);
>   		}
>   
> -		/* Note we lied about the write-domain when writing from the
> +		/*
> +		 * Note we lied about the write-domain when writing from the
>   		 * GPU (in order to avoid inter-ring synchronisation), so now
>   		 * we have to force the synchronisation here.
>   		 */
>   		gem_set_domain(fd, scratch,
>   			       I915_GEM_DOMAIN_CPU, I915_GEM_DOMAIN_CPU);
> -		for (unsigned n = count - num_ctx; n < count; n++)
> +		for (unsigned int n = count - num_ctx; n < count; n++)
>   			igt_assert_eq(map[n % num_ctx], all[n]);
>   		free(all);
> -		igt_progress(name, loop, MAX_LOOP);
>   	}
>   	munmap(map, size);
>   
> 

Just one thing I failed to figure out - what would be the difference 
from simply putting a write hazard on the scratch page write? Wouldn't 
both guarantee execution in order of execbuf?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [igt-dev] [Intel-gfx] [PATCH i-g-t 3/6] igt/gem_ctx_thrash: Order writes between contexts
@ 2018-05-14 10:59     ` Tvrtko Ursulin
  0 siblings, 0 replies; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-14 10:59 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: igt-dev


On 14/05/2018 09:02, Chris Wilson wrote:
> The test wrote to the same dwords from multiple contexts, assuming that
> the writes would be ordered by its submission. However, as it was using
> multiple contexts without a write hazard, those timelines are not
> coupled and the requests may be emitted to hw in any order. So emit a
> write hazard for each individual dword in the scratch (avoiding the
> write hazard for the scratch as a whole) to ensure the writes do occur
> in the expected order.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   tests/gem_ctx_thrash.c | 92 ++++++++++++++++++++++++------------------
>   1 file changed, 53 insertions(+), 39 deletions(-)
> 
> diff --git a/tests/gem_ctx_thrash.c b/tests/gem_ctx_thrash.c
> index 2cd9cfebf..b25f95f13 100644
> --- a/tests/gem_ctx_thrash.c
> +++ b/tests/gem_ctx_thrash.c
> @@ -90,17 +90,13 @@ static void single(const char *name, bool all_engines)
>   {
>   	struct drm_i915_gem_exec_object2 *obj;
>   	struct drm_i915_gem_relocation_entry *reloc;
> -	unsigned engines[16];
> -	uint64_t size;
> -	uint32_t *ctx, *map, scratch;
> -	unsigned num_ctx;
> -	int fd, gen, num_engines;
> +	unsigned int engines[16], num_engines, num_ctx;
> +	uint32_t *ctx, *map, scratch, size;
> +	int fd, gen;
>   #define MAX_LOOP 16
>   
> -	fd = drm_open_driver_master(DRIVER_INTEL);
> +	fd = drm_open_driver(DRIVER_INTEL);
>   	igt_require_gem(fd);
> -	igt_require(gem_can_store_dword(fd, 0));
> -
>   	gem_require_contexts(fd);
>   
>   	gen = intel_gen(intel_get_drm_devid(fd));
> @@ -108,54 +104,77 @@ static void single(const char *name, bool all_engines)
>   	num_engines = 0;
>   	if (all_engines) {
>   		unsigned engine;
> +
>   		for_each_physical_engine(fd, engine) {
> +			if (!gem_can_store_dword(fd, engine))
> +				continue;
> +
>   			engines[num_engines++] = engine;
>   			if (num_engines == ARRAY_SIZE(engines))
>   				break;
>   		}
> -	} else
> +	} else {
> +		igt_require(gem_can_store_dword(fd, 0));
>   		engines[num_engines++] = 0;
> +	}
> +	igt_require(num_engines);
>   
>   	num_ctx = get_num_contexts(fd, num_engines);
>   
>   	size = ALIGN(num_ctx * sizeof(uint32_t), 4096);
> -	scratch = gem_create(fd, ALIGN(num_ctx * sizeof(uint32_t), 4096));
> +	scratch = gem_create(fd, size);
>   	gem_set_caching(fd, scratch, I915_CACHING_CACHED);
> -	obj = calloc(num_ctx, 2 * sizeof(*obj));
> -	reloc = calloc(num_ctx, sizeof(*reloc));
> +	obj = calloc(num_ctx, 3 * sizeof(*obj));
> +	reloc = calloc(num_ctx, 2 * sizeof(*reloc));
>   
>   	ctx = malloc(num_ctx * sizeof(uint32_t));
>   	igt_assert(ctx);
>   	for (unsigned n = 0; n < num_ctx; n++) {
>   		ctx[n] = gem_context_create(fd);
> -		obj[2*n + 0].handle = scratch;
> -
> -		reloc[n].target_handle = scratch;
> -		reloc[n].presumed_offset = 0;
> -		reloc[n].offset = sizeof(uint32_t);
> -		reloc[n].delta = n * sizeof(uint32_t);
> -		reloc[n].read_domains = I915_GEM_DOMAIN_INSTRUCTION;
> -		reloc[n].write_domain = 0; /* lies! */
> +
> +		obj[3*n + 0].handle = gem_create(fd, 4096);
> +		reloc[2*n + 0].target_handle = obj[3*n + 0].handle;
> +		reloc[2*n + 0].presumed_offset = 0;
> +		reloc[2*n + 0].offset = 4000;
> +		reloc[2*n + 0].delta = 0;
> +		reloc[2*n + 0].read_domains = I915_GEM_DOMAIN_RENDER;
> +		reloc[2*n + 0].write_domain = I915_GEM_DOMAIN_RENDER;
> +
> +		obj[3*n + 1].handle = scratch;
> +		reloc[2*n + 1].target_handle = scratch;
> +		reloc[2*n + 1].presumed_offset = 0;
> +		reloc[2*n + 1].offset = sizeof(uint32_t);
> +		reloc[2*n + 1].delta = n * sizeof(uint32_t);
> +		reloc[2*n + 1].read_domains = I915_GEM_DOMAIN_RENDER;
> +		reloc[2*n + 1].write_domain = 0; /* lies! */
>   		if (gen >= 4 && gen < 8)
> -			reloc[n].offset += sizeof(uint32_t);
> +			reloc[2*n + 1].offset += sizeof(uint32_t);
>   
> -		obj[2*n + 1].relocs_ptr = to_user_pointer(&reloc[n]);
> -		obj[2*n + 1].relocation_count = 1;
> +		obj[3*n + 2].relocs_ptr = to_user_pointer(&reloc[2*n]);
> +		obj[3*n + 2].relocation_count = 2;
>   	}
>   
>   	map = gem_mmap__cpu(fd, scratch, 0, size, PROT_WRITE);
> -	for (unsigned loop = 1; loop <= MAX_LOOP; loop <<= 1) {
> -		unsigned count = loop * num_ctx;
> +	for (unsigned int loop = 1; loop <= MAX_LOOP; loop <<= 1) {
> +		const unsigned int count = loop * num_ctx;
>   		uint32_t *all;
>   
>   		all = malloc(count * sizeof(uint32_t));
> -		for (unsigned n = 0; n < count; n++)
> +		for (unsigned int n = 0; n < count; n++)
>   			all[n] = ctx[n % num_ctx];
>   		igt_permute_array(all, count, xchg_int);
> -		for (unsigned n = 0; n < count; n++) {
> -			struct drm_i915_gem_execbuffer2 execbuf;
> -			unsigned r = n % num_ctx;
> -			uint64_t offset = reloc[r].presumed_offset + reloc[r].delta;
> +
> +		for (unsigned int n = 0; n < count; n++) {
> +			const unsigned int r = n % num_ctx;
> +			struct drm_i915_gem_execbuffer2 execbuf = {
> +				.buffers_ptr = to_user_pointer(&obj[3*r]),
> +				.buffer_count = 3,
> +				.flags = engines[n % num_engines],
> +				.rsvd1 = all[n],
> +			};
> +			uint64_t offset =
> +				reloc[2*r + 1].presumed_offset +
> +				reloc[2*r + 1].delta;
>   			uint32_t handle = gem_create(fd, 4096);
>   			uint32_t buf[16];
>   			int i;
> @@ -176,27 +195,22 @@ static void single(const char *name, bool all_engines)
>   			buf[++i] = all[n];
>   			buf[++i] = MI_BATCH_BUFFER_END;
>   			gem_write(fd, handle, 0, buf, sizeof(buf));
> -			obj[2*r + 1].handle = handle;
> +			obj[3*r + 2].handle = handle;
>   
> -			memset(&execbuf, 0, sizeof(execbuf));
> -			execbuf.buffers_ptr = to_user_pointer(&obj[2*r]);
> -			execbuf.buffer_count = 2;
> -			execbuf.flags = engines[n % num_engines];
> -			execbuf.rsvd1 = all[n];
>   			gem_execbuf(fd, &execbuf);
>   			gem_close(fd, handle);
>   		}
>   
> -		/* Note we lied about the write-domain when writing from the
> +		/*
> +		 * Note we lied about the write-domain when writing from the
>   		 * GPU (in order to avoid inter-ring synchronisation), so now
>   		 * we have to force the synchronisation here.
>   		 */
>   		gem_set_domain(fd, scratch,
>   			       I915_GEM_DOMAIN_CPU, I915_GEM_DOMAIN_CPU);
> -		for (unsigned n = count - num_ctx; n < count; n++)
> +		for (unsigned int n = count - num_ctx; n < count; n++)
>   			igt_assert_eq(map[n % num_ctx], all[n]);
>   		free(all);
> -		igt_progress(name, loop, MAX_LOOP);
>   	}
>   	munmap(map, size);
>   
> 

Just one thing I failed to figure out - what would be the difference
compared to simply putting a write hazard on the scratch page write?
Wouldn't both guarantee execution in execbuf order?

Regards,

Tvrtko
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev


* Re: [PATCH i-g-t 4/6] igt/gem_eio: Exercise banning
  2018-05-14  8:02   ` [igt-dev] " Chris Wilson
@ 2018-05-14 11:01     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-14 11:01 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: igt-dev


On 14/05/2018 09:02, Chris Wilson wrote:
> If we trigger "too many" resets, the context and even the file, will be
> banend and subsequent execbufs should fail with -EIO.

banned

> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   tests/gem_eio.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 49 insertions(+)
> 
> diff --git a/tests/gem_eio.c b/tests/gem_eio.c
> index 6455c6acd..4720b47b5 100644
> --- a/tests/gem_eio.c
> +++ b/tests/gem_eio.c
> @@ -250,6 +250,52 @@ static int __check_wait(int fd, uint32_t bo, unsigned int wait)
>   	return ret;
>   }
>   
> +static void __test_banned(int fd)
> +{
> +	struct drm_i915_gem_exec_object2 obj = {
> +		.handle = gem_create(fd, 4096),
> +	};
> +	struct drm_i915_gem_execbuffer2 execbuf = {
> +		.buffers_ptr = to_user_pointer(&obj),
> +		.buffer_count = 1,
> +	};
> +	const uint32_t bbe = MI_BATCH_BUFFER_END;
> +	unsigned long count = 0;
> +
> +	gem_write(fd, obj.handle, 0, &bbe, sizeof(bbe));
> +
> +	gem_quiescent_gpu(fd);
> +	igt_require(i915_reset_control(true));
> +
> +	igt_until_timeout(5) {
> +		igt_spin_t *hang;
> +
> +		if (__gem_execbuf(fd, &execbuf) == -EIO) {
> +			igt_info("Banned after causing %lu hangs\n", count);
> +			igt_assert(count > 1);
> +			return;
> +		}
> +
> +		/* Trigger a reset, making sure we are detected as guilty */
> +		hang = spin_sync(fd, 0, 0);
> +		trigger_reset(fd);
> +		igt_spin_batch_free(fd, hang);
> +
> +		count++;
> +	}
> +
> +	igt_assert_f(false,
> +		     "Ran for 5s, %lu hangs without being banned\n",
> +		     count);
> +}
> +
> +static void test_banned(int fd)
> +{
> +	fd = gem_reopen_driver(fd);
> +	__test_banned(fd);
> +	close(fd);
> +}
> +
>   #define TEST_WEDGE (1)
>   
>   static void test_wait(int fd, unsigned int flags, unsigned int wait)
> @@ -693,6 +739,9 @@ igt_main
>   	igt_subtest("execbuf")
>   		test_execbuf(fd);
>   
> +	igt_subtest("banned")
> +		test_banned(fd);
> +
>   	igt_subtest("suspend")
>   		test_suspend(fd, SUSPEND_STATE_MEM);
>   
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko


* Re: [PATCH i-g-t 5/6] tests/gem_eio: Only wait-for-idle inside trigger_reset()
  2018-05-14  8:02   ` [igt-dev] " Chris Wilson
@ 2018-05-14 11:03     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-14 11:03 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: igt-dev


On 14/05/2018 09:02, Chris Wilson wrote:
> trigger_reset() imposes a tight time constraint (2s) so that we verify
> that the reset itself completes quickly. In the middle of this check, we
> call gem_quiescent_gpu() which may invoke an rcu_barrier() or two to
> clear out the freed memory (DROP_FREED). Those barriers may have
> unbounded latency pushing beyond the 2s timeout, so restrict the
> operation to only wait-for-idle (DROP_ACTIVE).
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105957
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   tests/gem_eio.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/tests/gem_eio.c b/tests/gem_eio.c
> index 4720b47b5..e1aff639d 100644
> --- a/tests/gem_eio.c
> +++ b/tests/gem_eio.c
> @@ -74,8 +74,7 @@ static void trigger_reset(int fd)
>   	/* And just check the gpu is indeed running again */
>   	igt_debug("Checking that the GPU recovered\n");
>   	gem_test_engine(fd, ALL_ENGINES);
> -
> -	gem_quiescent_gpu(fd);
> +	igt_drop_caches_set(fd, DROP_ACTIVE);
>   
>   	/* We expect forced reset and health check to be quick. */
>   	igt_assert(igt_seconds_elapsed(&ts) < 2);
> 

Sounds fine to only wait for idle:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

However, I am a bit surprised that RCU latency can be so high in a
plain IGT environment.

Regards,

Tvrtko


* Re: [PATCH i-g-t 5/6] tests/gem_eio: Only wait-for-idle inside trigger_reset()
  2018-05-14 11:03     ` [igt-dev] [Intel-gfx] " Tvrtko Ursulin
@ 2018-05-14 11:07       ` Chris Wilson
  -1 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-05-14 11:07 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx; +Cc: igt-dev

Quoting Tvrtko Ursulin (2018-05-14 12:03:58)
> 
> On 14/05/2018 09:02, Chris Wilson wrote:
> > trigger_reset() imposes a tight time constraint (2s) so that we verify
> > that the reset itself completes quickly. In the middle of this check, we
> > call gem_quiescent_gpu() which may invoke an rcu_barrier() or two to
> > clear out the freed memory (DROP_FREED). Those barriers may have
> > unbounded latency pushing beyond the 2s timeout, so restrict the
> > operation to only wait-for-idle (DROP_ACTIVE).
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105957
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   tests/gem_eio.c | 3 +--
> >   1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/tests/gem_eio.c b/tests/gem_eio.c
> > index 4720b47b5..e1aff639d 100644
> > --- a/tests/gem_eio.c
> > +++ b/tests/gem_eio.c
> > @@ -74,8 +74,7 @@ static void trigger_reset(int fd)
> >       /* And just check the gpu is indeed running again */
> >       igt_debug("Checking that the GPU recovered\n");
> >       gem_test_engine(fd, ALL_ENGINES);
> > -
> > -     gem_quiescent_gpu(fd);
> > +     igt_drop_caches_set(fd, DROP_ACTIVE);
> >   
> >       /* We expect forced reset and health check to be quick. */
> >       igt_assert(igt_seconds_elapsed(&ts) < 2);
> > 
> 
> Sounds fine to only wait for idle:
> 
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> However I am a bit surprised that under plain IGT environment RCU 
> latency can be so high.

I suspect it's more of a latency issue with recent kernels or CI systems.
There are a few mysterious multi-second sleeps that cause sporadic fails
around the place. rcu_barrier() can be slow (like 30s slow) but only
under load, afaik.
-Chris


* Re: [PATCH i-g-t 6/6] tests/gem_exec_latency: New subtests for checking submission from RT tasks
  2018-05-14  8:02   ` [igt-dev] " Chris Wilson
@ 2018-05-14 11:13     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-14 11:13 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: igt-dev


On 14/05/2018 09:02, Chris Wilson wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> We want to make sure RT tasks which use a lot of CPU time can submit
> batch buffers with roughly the same latency (and certainly not worse)
> compared to normal tasks.
> 
> v2: Add tests to run across all engines simultaneously to encourage
> ksoftirqd to kick in even more often.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> #v1
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   tests/gem_exec_latency.c | 207 +++++++++++++++++++++++++++++++++++++++
>   1 file changed, 207 insertions(+)
> 
> diff --git a/tests/gem_exec_latency.c b/tests/gem_exec_latency.c
> index 9498c0921..c5816427b 100644
> --- a/tests/gem_exec_latency.c
> +++ b/tests/gem_exec_latency.c
> @@ -36,11 +36,15 @@
>   #include <sys/time.h>
>   #include <sys/signal.h>
>   #include <time.h>
> +#include <sched.h>
>   
>   #include "drm.h"
>   
>   #include "igt_sysfs.h"
>   #include "igt_vgem.h"
> +#include "igt_dummyload.h"
> +#include "igt_stats.h"
> +
>   #include "i915/gem_ring.h"
>   
>   #define LOCAL_I915_EXEC_NO_RELOC (1<<11)
> @@ -351,6 +355,189 @@ static void latency_from_ring(int fd,
>   	}
>   }
>   
> +static void __rearm_spin_batch(igt_spin_t *spin)
> +{
> +	const uint32_t mi_arb_chk = 0x5 << 23;
> +
> +       *spin->batch = mi_arb_chk;
> +       *spin->running = 0;
> +       __sync_synchronize();
> +}
> +
> +static void
> +__submit_spin_batch(int fd, igt_spin_t *spin, unsigned int flags)
> +{
> +	struct drm_i915_gem_execbuffer2 eb = spin->execbuf;
> +
> +	eb.flags &= ~(0x3f | I915_EXEC_BSD_MASK);
> +	eb.flags |= flags | I915_EXEC_NO_RELOC;
> +
> +	gem_execbuf(fd, &eb);
> +}
> +
> +struct rt_pkt {
> +	struct igt_mean mean;
> +	double min, max;
> +};
> +
> +static bool __spin_wait(int fd, igt_spin_t *spin)
> +{
> +	while (!READ_ONCE(*spin->running)) {
> +		if (!gem_bo_busy(fd, spin->handle))
> +			return false;
> +	}
> +
> +	return true;
> +}
> +
> +/*
> + * Test whether RT thread which hogs the CPU a lot can submit work with
> + * reasonable latency.
> + */
> +static void
> +rthog_latency_on_ring(int fd, unsigned int engine, const char *name, unsigned int flags)
> +#define RTIDLE 0x1
> +{
> +	const char *passname[] = { "warmup", "normal", "rt" };
> +	struct rt_pkt *results;
> +	unsigned int engines[16];
> +	const char *names[16];
> +	unsigned int nengine;
> +	int ret;
> +
> +	results = mmap(NULL, 4096, PROT_WRITE, MAP_SHARED | MAP_ANON, -1, 0);
> +	igt_assert(results != MAP_FAILED);
> +
> +	nengine = 0;
> +	if (engine == ALL_ENGINES) {
> +		for_each_physical_engine(fd, engine) {
> +			if (!gem_can_store_dword(fd, engine))
> +				continue;
> +
> +			engines[nengine] = engine;
> +			names[nengine] = e__->name;
> +			nengine++;
> +		}
> +		igt_require(nengine > 1);
> +	} else {
> +		igt_require(gem_can_store_dword(fd, engine));
> +		engines[nengine] = engine;
> +		names[nengine] = name;
> +		nengine++;
> +	}
> +
> +	igt_fork(child, nengine) {
> +		unsigned int pass = 0; /* Three passes: warmup, normal, rt. */
> +
> +		engine = engines[child];
> +		do {
> +			struct igt_mean mean;
> +			double min = HUGE_VAL;
> +			double max = -HUGE_VAL;
> +			igt_spin_t *spin;
> +
> +			igt_mean_init(&mean);
> +
> +			if (pass == 2) {
> +				struct sched_param rt =
> +				{ .sched_priority = 99 };
> +
> +				ret = sched_setscheduler(0,
> +							 SCHED_FIFO | SCHED_RESET_ON_FORK,
> +							 &rt);
> +				if (ret) {
> +					igt_warn("Failed to set scheduling policy!\n");
> +					break;
> +				}
> +			}
> +
> +			spin = __igt_spin_batch_new_poll(fd, 0, engine);
> +			if (!spin) {
> +				igt_warn("Failed to create spinner! (%s)\n",
> +					 passname[pass]);
> +				break;
> +			}
> +			igt_spin_busywait_until_running(spin);
> +
> +			igt_until_timeout(pass > 0 ? 5 : 2) {
> +				struct timespec ts = { };
> +				double t;
> +
> +				igt_spin_batch_end(spin);
> +				gem_sync(fd, spin->handle);
> +				if (flags & RTIDLE)
> +					usleep(250);

250us is how long you expect context complete to be received in?

s/RTIDLE/SUBMIT_IDLE/ ?

> +
> +				/*
> +				 * If we are oversubscribed (more RT hogs than
> +				 * cpus) give the others a chance to run;
> +				 * otherwise, they will interrupt us in the
> +				 * middle of the measurement.
> +				 */
> +				if (nengine > 1)
> +					usleep(10*nengine);

Could check for actual number of cpus here.

You want it on top of the RTIDLE sleep?

> +
> +				__rearm_spin_batch(spin);
> +
> +				igt_nsec_elapsed(&ts);
> +				__submit_spin_batch(fd, spin, engine);
> +				if (!__spin_wait(fd, spin)) {
> +					igt_warn("Wait timeout! (%s)\n",
> +						 passname[pass]);

It's not really a timeout, given how __spin_wait is implemented.

> +					break;
> +				}
> +
> +				t = igt_nsec_elapsed(&ts) * 1e-9;
> +				if (t > max)
> +					max = t;
> +				if (t < min)
> +					min = t;
> +
> +				igt_mean_add(&mean, t);
> +			}
> +
> +			igt_spin_batch_free(fd, spin);
> +
> +			igt_info("%8s %10s: mean=%.2fus stddev=%.3fus [%.2fus, %.2fus] (n=%lu)\n",
> +				 names[child],
> +				 passname[pass],
> +				 igt_mean_get(&mean) * 1e6,
> +				 sqrt(igt_mean_get_variance(&mean)) * 1e6,
> +				 min * 1e6, max * 1e6,
> +				 mean.count);
> +
> +			results[3 * child + pass].mean = mean;
> +			results[3 * child + pass].min = min;
> +			results[3 * child + pass].max = max;
> +		} while (++pass < 3);
> +	}
> +
> +	igt_waitchildren();
> +
> +	for (unsigned int child = 0; child < nengine; child++) {
> +		struct rt_pkt normal = results[3 * child + 1];
> +		struct rt_pkt rt = results[3 * child + 2];
> +
> +		igt_assert(rt.max);
> +
> +		igt_info("%8s: normal latency=%.2f±%.3fus, rt latency=%.2f±%.3fus\n",
> +			 names[child],
> +			 igt_mean_get(&normal.mean) * 1e6,
> +			 sqrt(igt_mean_get_variance(&normal.mean)) * 1e6,
> +			 igt_mean_get(&rt.mean) * 1e6,
> +			 sqrt(igt_mean_get_variance(&rt.mean)) * 1e6);
> +
> +		igt_assert(igt_mean_get(&rt.mean) <
> +			   igt_mean_get(&normal.mean) * 2);
> +
> +		/* The system is noisy; be conservative when declaring fail. */
> +		igt_assert(igt_mean_get_variance(&rt.mean) <
> +			   igt_mean_get_variance(&normal.mean) * 25);

I don't know what "* 25" means in statistical terms. At some point you 
said variance doesn't work and you had to go with stdev?

> +	}
> +
> +	munmap(results, 4096);
> +}
> +
>   igt_main
>   {
>   	const struct intel_execution_engine *e;
> @@ -373,6 +560,12 @@ igt_main
>   		intel_register_access_init(intel_get_pci_device(), false, device);
>   	}
>   
> +	igt_subtest("all-rtidle-submit")
> +		rthog_latency_on_ring(device, ALL_ENGINES, "all", RTIDLE);
> +
> +	igt_subtest("all-rthog-submit")
> +		rthog_latency_on_ring(device, ALL_ENGINES, "all", 0);
> +
>   	igt_subtest_group {
>   		igt_fixture
>   			igt_require(intel_gen(intel_get_drm_devid(device)) >= 7);
> @@ -391,6 +584,20 @@ igt_main
>   							e->exec_id | e->flags,
>   							e->name, 0);
>   
> +				igt_subtest_f("%s-rtidle-submit", e->name)
> +					rthog_latency_on_ring(device,
> +							      e->exec_id |
> +							      e->flags,
> +							      e->name,
> +							      RTIDLE);
> +
> +				igt_subtest_f("%s-rthog-submit", e->name)
> +					rthog_latency_on_ring(device,
> +							      e->exec_id |
> +							      e->flags,
> +							      e->name,
> +							      0);
> +
>   				igt_subtest_f("%s-dispatch-queued", e->name)
>   					latency_on_ring(device,
>   							e->exec_id | e->flags,
> 

Regards,

Tvrtko



* Re: [Intel-gfx] [PATCH i-g-t 6/6] tests/gem_exec_latency: New subtests for checking submission from RT tasks
@ 2018-05-14 11:13     ` Tvrtko Ursulin
  0 siblings, 0 replies; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-14 11:13 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: igt-dev


On 14/05/2018 09:02, Chris Wilson wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> We want to make sure RT tasks which use a lot of CPU times can submit
> batch buffers with roughly the same latency (and certainly not worse)
> compared to normal tasks.
> 
> v2: Add tests to run across all engines simultaneously to encourage
> ksoftirqd to kick in even more often.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> #v1
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   tests/gem_exec_latency.c | 207 +++++++++++++++++++++++++++++++++++++++
>   1 file changed, 207 insertions(+)
> 
> diff --git a/tests/gem_exec_latency.c b/tests/gem_exec_latency.c
> index 9498c0921..c5816427b 100644
> --- a/tests/gem_exec_latency.c
> +++ b/tests/gem_exec_latency.c
> @@ -36,11 +36,15 @@
>   #include <sys/time.h>
>   #include <sys/signal.h>
>   #include <time.h>
> +#include <sched.h>
>   
>   #include "drm.h"
>   
>   #include "igt_sysfs.h"
>   #include "igt_vgem.h"
> +#include "igt_dummyload.h"
> +#include "igt_stats.h"
> +
>   #include "i915/gem_ring.h"
>   
>   #define LOCAL_I915_EXEC_NO_RELOC (1<<11)
> @@ -351,6 +355,189 @@ static void latency_from_ring(int fd,
>   	}
>   }
>   
> +static void __rearm_spin_batch(igt_spin_t *spin)
> +{
> +	const uint32_t mi_arb_chk = 0x5 << 23;
> +
> +       *spin->batch = mi_arb_chk;
> +       *spin->running = 0;
> +       __sync_synchronize();
> +}
> +
> +static void
> +__submit_spin_batch(int fd, igt_spin_t *spin, unsigned int flags)
> +{
> +	struct drm_i915_gem_execbuffer2 eb = spin->execbuf;
> +
> +	eb.flags &= ~(0x3f | I915_EXEC_BSD_MASK);
> +	eb.flags |= flags | I915_EXEC_NO_RELOC;
> +
> +	gem_execbuf(fd, &eb);
> +}
> +
> +struct rt_pkt {
> +	struct igt_mean mean;
> +	double min, max;
> +};
> +
> +static bool __spin_wait(int fd, igt_spin_t *spin)
> +{
> +	while (!READ_ONCE(*spin->running)) {
> +		if (!gem_bo_busy(fd, spin->handle))
> +			return false;
> +	}
> +
> +	return true;
> +}
> +
> +/*
> + * Test whether RT thread which hogs the CPU a lot can submit work with
> + * reasonable latency.
> + */
> +static void
> +rthog_latency_on_ring(int fd, unsigned int engine, const char *name, unsigned int flags)
> +#define RTIDLE 0x1
> +{
> +	const char *passname[] = { "warmup", "normal", "rt" };
> +	struct rt_pkt *results;
> +	unsigned int engines[16];
> +	const char *names[16];
> +	unsigned int nengine;
> +	int ret;
> +
> +	results = mmap(NULL, 4096, PROT_WRITE, MAP_SHARED | MAP_ANON, -1, 0);
> +	igt_assert(results != MAP_FAILED);
> +
> +	nengine = 0;
> +	if (engine == ALL_ENGINES) {
> +		for_each_physical_engine(fd, engine) {
> +			if (!gem_can_store_dword(fd, engine))
> +				continue;
> +
> +			engines[nengine] = engine;
> +			names[nengine] = e__->name;
> +			nengine++;
> +		}
> +		igt_require(nengine > 1);
> +	} else {
> +		igt_require(gem_can_store_dword(fd, engine));
> +		engines[nengine] = engine;
> +		names[nengine] = name;
> +		nengine++;
> +	}
> +
> +	igt_fork(child, nengine) {
> +		unsigned int pass = 0; /* Three passes: warmup, normal, rt. */
> +
> +		engine = engines[child];
> +		do {
> +			struct igt_mean mean;
> +			double min = HUGE_VAL;
> +			double max = -HUGE_VAL;
> +			igt_spin_t *spin;
> +
> +			igt_mean_init(&mean);
> +
> +			if (pass == 2) {
> +				struct sched_param rt =
> +				{ .sched_priority = 99 };
> +
> +				ret = sched_setscheduler(0,
> +							 SCHED_FIFO | SCHED_RESET_ON_FORK,
> +							 &rt);
> +				if (ret) {
> +					igt_warn("Failed to set scheduling policy!\n");
> +					break;
> +				}
> +			}
> +
> +			spin = __igt_spin_batch_new_poll(fd, 0, engine);
> +			if (!spin) {
> +				igt_warn("Failed to create spinner! (%s)\n",
> +					 passname[pass]);
> +				break;
> +			}
> +			igt_spin_busywait_until_running(spin);
> +
> +			igt_until_timeout(pass > 0 ? 5 : 2) {
> +				struct timespec ts = { };
> +				double t;
> +
> +				igt_spin_batch_end(spin);
> +				gem_sync(fd, spin->handle);
> +				if (flags & RTIDLE)
> +					usleep(250);

250us is how long you expect context complete to be received in?

s/RTIDLE/SUBMIT_IDLE/ ?

> +
> +				/*
> +				 * If we are oversubscribed (more RT hogs than
> +				 * cpus) give the others a chance to run;
> +				 * otherwise, they will interrupt us in the
> +				 * middle of the measurement.
> +				 */
> +				if (nengine > 1)
> +					usleep(10*nengine);

Could check for actual number of cpus here.

You want it on top of the RTIDLE sleep?

> +
> +				__rearm_spin_batch(spin);
> +
> +				igt_nsec_elapsed(&ts);
> +				__submit_spin_batch(fd, spin, engine);
> +				if (!__spin_wait(fd, spin)) {
> +					igt_warn("Wait timeout! (%s)\n",
> +						 passname[pass]);

It's not really a timeout how __spin_wait is implemented.

> +					break;
> +				}
> +
> +				t = igt_nsec_elapsed(&ts) * 1e-9;
> +				if (t > max)
> +					max = t;
> +				if (t < min)
> +					min = t;
> +
> +				igt_mean_add(&mean, t);
> +			}
> +
> +			igt_spin_batch_free(fd, spin);
> +
> +			igt_info("%8s %10s: mean=%.2fus stddev=%.3fus [%.2fus, %.2fus] (n=%lu)\n",
> +				 names[child],
> +				 passname[pass],
> +				 igt_mean_get(&mean) * 1e6,
> +				 sqrt(igt_mean_get_variance(&mean)) * 1e6,
> +				 min * 1e6, max * 1e6,
> +				 mean.count);
> +
> +			results[3 * child + pass].mean = mean;
> +			results[3 * child + pass].min = min;
> +			results[3 * child + pass].max = max;
> +		} while (++pass < 3);
> +	}
> +
> +	igt_waitchildren();
> +
> +	for (unsigned int child = 0; child < nengine; child++) {
> +		struct rt_pkt normal = results[3 * child + 1];
> +		struct rt_pkt rt = results[3 * child + 2];
> +
> +		igt_assert(rt.max);
> +
> +		igt_info("%8s: normal latency=%.2f±%.3fus, rt latency=%.2f±%.3fus\n",
> +			 names[child],
> +			 igt_mean_get(&normal.mean) * 1e6,
> +			 sqrt(igt_mean_get_variance(&normal.mean)) * 1e6,
> +			 igt_mean_get(&rt.mean) * 1e6,
> +			 sqrt(igt_mean_get_variance(&rt.mean)) * 1e6);
> +
> +		igt_assert(igt_mean_get(&rt.mean) <
> +			   igt_mean_get(&normal.mean) * 2);
> +
> +		/* The system is noisy; be conservative when declaring fail. */
> +		igt_assert(igt_mean_get_variance(&rt.mean) <
> +			   igt_mean_get_variance(&normal.mean) * 25);

I don't know what "* 25" means in statistical terms. At some point you 
said variance doesn't work and you had to go with stdev?

> +	}
> +
> +	munmap(results, 4096);
> +}
> +
>   igt_main
>   {
>   	const struct intel_execution_engine *e;
> @@ -373,6 +560,12 @@ igt_main
>   		intel_register_access_init(intel_get_pci_device(), false, device);
>   	}
>   
> +	igt_subtest("all-rtidle-submit")
> +		rthog_latency_on_ring(device, ALL_ENGINES, "all", RTIDLE);
> +
> +	igt_subtest("all-rthog-submit")
> +		rthog_latency_on_ring(device, ALL_ENGINES, "all", 0);
> +
>   	igt_subtest_group {
>   		igt_fixture
>   			igt_require(intel_gen(intel_get_drm_devid(device)) >= 7);
> @@ -391,6 +584,20 @@ igt_main
>   							e->exec_id | e->flags,
>   							e->name, 0);
>   
> +				igt_subtest_f("%s-rtidle-submit", e->name)
> +					rthog_latency_on_ring(device,
> +							      e->exec_id |
> +							      e->flags,
> +							      e->name,
> +							      RTIDLE);
> +
> +				igt_subtest_f("%s-rthog-submit", e->name)
> +					rthog_latency_on_ring(device,
> +							      e->exec_id |
> +							      e->flags,
> +							      e->name,
> +							      0);
> +
>   				igt_subtest_f("%s-dispatch-queued", e->name)
>   					latency_on_ring(device,
>   							e->exec_id | e->flags,
> 

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH i-g-t 6/6] tests/gem_exec_latency: New subtests for checking submission from RT tasks
  2018-05-14 11:13     ` [Intel-gfx] " Tvrtko Ursulin
@ 2018-05-14 11:25       ` Chris Wilson
  -1 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-05-14 11:25 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx; +Cc: igt-dev

Quoting Tvrtko Ursulin (2018-05-14 12:13:10)
> 
> On 14/05/2018 09:02, Chris Wilson wrote:
> > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > 
> > We want to make sure RT tasks which use a lot of CPU times can submit
> > batch buffers with roughly the same latency (and certainly not worse)
> > compared to normal tasks.
> > 
> > v2: Add tests to run across all engines simultaneously to encourage
> > ksoftirqd to kick in even more often.
> > 
> > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> #v1
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   tests/gem_exec_latency.c | 207 +++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 207 insertions(+)
> > 
> > diff --git a/tests/gem_exec_latency.c b/tests/gem_exec_latency.c
> > index 9498c0921..c5816427b 100644
> > --- a/tests/gem_exec_latency.c
> > +++ b/tests/gem_exec_latency.c
> > @@ -36,11 +36,15 @@
> >   #include <sys/time.h>
> >   #include <sys/signal.h>
> >   #include <time.h>
> > +#include <sched.h>
> >   
> >   #include "drm.h"
> >   
> >   #include "igt_sysfs.h"
> >   #include "igt_vgem.h"
> > +#include "igt_dummyload.h"
> > +#include "igt_stats.h"
> > +
> >   #include "i915/gem_ring.h"
> >   
> >   #define LOCAL_I915_EXEC_NO_RELOC (1<<11)
> > @@ -351,6 +355,189 @@ static void latency_from_ring(int fd,
> >       }
> >   }
> >   
> > +static void __rearm_spin_batch(igt_spin_t *spin)
> > +{
> > +     const uint32_t mi_arb_chk = 0x5 << 23;
> > +
> > +       *spin->batch = mi_arb_chk;
> > +       *spin->running = 0;
> > +       __sync_synchronize();
> > +}
> > +
> > +static void
> > +__submit_spin_batch(int fd, igt_spin_t *spin, unsigned int flags)
> > +{
> > +     struct drm_i915_gem_execbuffer2 eb = spin->execbuf;
> > +
> > +     eb.flags &= ~(0x3f | I915_EXEC_BSD_MASK);
> > +     eb.flags |= flags | I915_EXEC_NO_RELOC;
> > +
> > +     gem_execbuf(fd, &eb);
> > +}
> > +
> > +struct rt_pkt {
> > +     struct igt_mean mean;
> > +     double min, max;
> > +};
> > +
> > +static bool __spin_wait(int fd, igt_spin_t *spin)
> > +{
> > +     while (!READ_ONCE(*spin->running)) {
> > +             if (!gem_bo_busy(fd, spin->handle))
> > +                     return false;
> > +     }
> > +
> > +     return true;
> > +}
> > +
> > +/*
> > + * Test whether RT thread which hogs the CPU a lot can submit work with
> > + * reasonable latency.
> > + */
> > +static void
> > +rthog_latency_on_ring(int fd, unsigned int engine, const char *name, unsigned int flags)
> > +#define RTIDLE 0x1
> > +{
> > +     const char *passname[] = { "warmup", "normal", "rt" };
> > +     struct rt_pkt *results;
> > +     unsigned int engines[16];
> > +     const char *names[16];
> > +     unsigned int nengine;
> > +     int ret;
> > +
> > +     results = mmap(NULL, 4096, PROT_WRITE, MAP_SHARED | MAP_ANON, -1, 0);
> > +     igt_assert(results != MAP_FAILED);
> > +
> > +     nengine = 0;
> > +     if (engine == ALL_ENGINES) {
> > +             for_each_physical_engine(fd, engine) {
> > +                     if (!gem_can_store_dword(fd, engine))
> > +                             continue;
> > +
> > +                     engines[nengine] = engine;
> > +                     names[nengine] = e__->name;
> > +                     nengine++;
> > +             }
> > +             igt_require(nengine > 1);
> > +     } else {
> > +             igt_require(gem_can_store_dword(fd, engine));
> > +             engines[nengine] = engine;
> > +             names[nengine] = name;
> > +             nengine++;
> > +     }
> > +
> > +     igt_fork(child, nengine) {
> > +             unsigned int pass = 0; /* Three passes: warmup, normal, rt. */
> > +
> > +             engine = engines[child];
> > +             do {
> > +                     struct igt_mean mean;
> > +                     double min = HUGE_VAL;
> > +                     double max = -HUGE_VAL;
> > +                     igt_spin_t *spin;
> > +
> > +                     igt_mean_init(&mean);
> > +
> > +                     if (pass == 2) {
> > +                             struct sched_param rt =
> > +                             { .sched_priority = 99 };
> > +
> > +                             ret = sched_setscheduler(0,
> > +                                                      SCHED_FIFO | SCHED_RESET_ON_FORK,
> > +                                                      &rt);
> > +                             if (ret) {
> > +                                     igt_warn("Failed to set scheduling policy!\n");
> > +                                     break;
> > +                             }
> > +                     }
> > +
> > +                     spin = __igt_spin_batch_new_poll(fd, 0, engine);
> > +                     if (!spin) {
> > +                             igt_warn("Failed to create spinner! (%s)\n",
> > +                                      passname[pass]);
> > +                             break;
> > +                     }
> > +                     igt_spin_busywait_until_running(spin);
> > +
> > +                     igt_until_timeout(pass > 0 ? 5 : 2) {
> > +                             struct timespec ts = { };
> > +                             double t;
> > +
> > +                             igt_spin_batch_end(spin);
> > +                             gem_sync(fd, spin->handle);
> > +                             if (flags & RTIDLE)
> > +                                     usleep(250);
> 
> 250us is how long you expect context complete to be received in?
> 
> s/RTIDLE/SUBMIT_IDLE/ ?

And the others. I guess DROP_IDLE (i.e. do a wait-for-idle might be
fun).

> > +
> > +                             /*
> > +                              * If we are oversubscribed (more RT hogs than
> > +                              * cpus) give the others a chance to run;
> > +                              * otherwise, they will interrupt us in the
> > +                              * middle of the measurement.
> > +                              */
> > +                             if (nengine > 1)
> > +                                     usleep(10*nengine);
> 
> Could check for actual number of cpus here.

Could be that fancy. There's still the issue that if we consume
more than 70% (or whatever the rt limit is) we can be throttled. So we
do need some sort of scheduler relinquishment.
 
> You want it on top of the RTIDLE sleep?

It's peanuts on top, so I don't mind.
> 
> > +
> > +                             __rearm_spin_batch(spin);
> > +
> > +                             igt_nsec_elapsed(&ts);
> > +                             __submit_spin_batch(fd, spin, engine);
> > +                             if (!__spin_wait(fd, spin)) {
> > +                                     igt_warn("Wait timeout! (%s)\n",
> > +                                              passname[pass]);
> 
> It's not really a timeout how __spin_wait is implemented.

Can't blame me for this one ;)

> > +                                     break;
> > +                             }
> > +
> > +                             t = igt_nsec_elapsed(&ts) * 1e-9;
> > +                             if (t > max)
> > +                                     max = t;
> > +                             if (t < min)
> > +                                     min = t;
> > +
> > +                             igt_mean_add(&mean, t);
> > +                     }
> > +
> > +                     igt_spin_batch_free(fd, spin);
> > +
> > +                     igt_info("%8s %10s: mean=%.2fus stddev=%.3fus [%.2fus, %.2fus] (n=%lu)\n",
> > +                              names[child],
> > +                              passname[pass],
> > +                              igt_mean_get(&mean) * 1e6,
> > +                              sqrt(igt_mean_get_variance(&mean)) * 1e6,
> > +                              min * 1e6, max * 1e6,
> > +                              mean.count);
> > +
> > +                     results[3 * child + pass].mean = mean;
> > +                     results[3 * child + pass].min = min;
> > +                     results[3 * child + pass].max = max;
> > +             } while (++pass < 3);
> > +     }
> > +
> > +     igt_waitchildren();
> > +
> > +     for (unsigned int child = 0; child < nengine; child++) {
> > +             struct rt_pkt normal = results[3 * child + 1];
> > +             struct rt_pkt rt = results[3 * child + 2];
> > +
> > +             igt_assert(rt.max);
> > +
> > +             igt_info("%8s: normal latency=%.2f±%.3fus, rt latency=%.2f±%.3fus\n",
> > +                      names[child],
> > +                      igt_mean_get(&normal.mean) * 1e6,
> > +                      sqrt(igt_mean_get_variance(&normal.mean)) * 1e6,
> > +                      igt_mean_get(&rt.mean) * 1e6,
> > +                      sqrt(igt_mean_get_variance(&rt.mean)) * 1e6);
> > +
> > +             igt_assert(igt_mean_get(&rt.mean) <
> > +                        igt_mean_get(&normal.mean) * 2);
> > +
> > +             /* The system is noisy; be conservative when declaring fail. */
> > +             igt_assert(igt_mean_get_variance(&rt.mean) <
> > +                        igt_mean_get_variance(&normal.mean) * 25);
> 
> I don't know what "* 25" means in statistical terms. At some point you 
> said variance doesn't work and you had to go with stdev?

5 sigma. I was complaining that reporting variance as 0.000us wasn't
particularly useful (wrong units for starters ;) and that we definitely
could meet the 3*variance requirement :)

At this moment, I'm plucking numbers to try and safely fail when it's
bad (we hit ksoftirqd and it's an order of magnitude worse), but with enough
slack for noise. The mean so far looks more reliable than the
variance/stddev, but testing the variability makes sense. Oh well, if
the system is that unstable, we could just use more passes to improve the
statistics.
-Chris

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH i-g-t 3/6] igt/gem_ctx_thrash: Order writes between contexts
  2018-05-14 10:59     ` [igt-dev] [Intel-gfx] " Tvrtko Ursulin
@ 2018-05-14 15:10       ` Chris Wilson
  -1 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-05-14 15:10 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx; +Cc: igt-dev

Quoting Tvrtko Ursulin (2018-05-14 11:59:09)
> 
> On 14/05/2018 09:02, Chris Wilson wrote:
> > The test wrote to the same dwords from multiple contexts, assuming that
> > the writes would be ordered by its submission. However, as it was using
> > multiple contexts without a write hazard, those timelines are not
> > coupled and the requests may be emitted to hw in any order. So emit a
> > write hazard for each individual dword in the scratch (avoiding the
> > write hazard for the scratch as a whole) to ensure the writes do occur
> > in the expected order.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   tests/gem_ctx_thrash.c | 92 ++++++++++++++++++++++++------------------
> >   1 file changed, 53 insertions(+), 39 deletions(-)
> > 
> > diff --git a/tests/gem_ctx_thrash.c b/tests/gem_ctx_thrash.c
> > index 2cd9cfebf..b25f95f13 100644
> > --- a/tests/gem_ctx_thrash.c
> > +++ b/tests/gem_ctx_thrash.c
> > @@ -90,17 +90,13 @@ static void single(const char *name, bool all_engines)
> >   {
> >       struct drm_i915_gem_exec_object2 *obj;
> >       struct drm_i915_gem_relocation_entry *reloc;
> > -     unsigned engines[16];
> > -     uint64_t size;
> > -     uint32_t *ctx, *map, scratch;
> > -     unsigned num_ctx;
> > -     int fd, gen, num_engines;
> > +     unsigned int engines[16], num_engines, num_ctx;
> > +     uint32_t *ctx, *map, scratch, size;
> > +     int fd, gen;
> >   #define MAX_LOOP 16
> >   
> > -     fd = drm_open_driver_master(DRIVER_INTEL);
> > +     fd = drm_open_driver(DRIVER_INTEL);
> >       igt_require_gem(fd);
> > -     igt_require(gem_can_store_dword(fd, 0));
> > -
> >       gem_require_contexts(fd);
> >   
> >       gen = intel_gen(intel_get_drm_devid(fd));
> > @@ -108,54 +104,77 @@ static void single(const char *name, bool all_engines)
> >       num_engines = 0;
> >       if (all_engines) {
> >               unsigned engine;
> > +
> >               for_each_physical_engine(fd, engine) {
> > +                     if (!gem_can_store_dword(fd, engine))
> > +                             continue;
> > +
> >                       engines[num_engines++] = engine;
> >                       if (num_engines == ARRAY_SIZE(engines))
> >                               break;
> >               }
> > -     } else
> > +     } else {
> > +             igt_require(gem_can_store_dword(fd, 0));
> >               engines[num_engines++] = 0;
> > +     }
> > +     igt_require(num_engines);
> >   
> >       num_ctx = get_num_contexts(fd, num_engines);
> >   
> >       size = ALIGN(num_ctx * sizeof(uint32_t), 4096);
> > -     scratch = gem_create(fd, ALIGN(num_ctx * sizeof(uint32_t), 4096));
> > +     scratch = gem_create(fd, size);
> >       gem_set_caching(fd, scratch, I915_CACHING_CACHED);
> > -     obj = calloc(num_ctx, 2 * sizeof(*obj));
> > -     reloc = calloc(num_ctx, sizeof(*reloc));
> > +     obj = calloc(num_ctx, 3 * sizeof(*obj));
> > +     reloc = calloc(num_ctx, 2 * sizeof(*reloc));
> >   
> >       ctx = malloc(num_ctx * sizeof(uint32_t));
> >       igt_assert(ctx);
> >       for (unsigned n = 0; n < num_ctx; n++) {
> >               ctx[n] = gem_context_create(fd);
> > -             obj[2*n + 0].handle = scratch;
> > -
> > -             reloc[n].target_handle = scratch;
> > -             reloc[n].presumed_offset = 0;
> > -             reloc[n].offset = sizeof(uint32_t);
> > -             reloc[n].delta = n * sizeof(uint32_t);
> > -             reloc[n].read_domains = I915_GEM_DOMAIN_INSTRUCTION;
> > -             reloc[n].write_domain = 0; /* lies! */
> > +
> > +             obj[3*n + 0].handle = gem_create(fd, 4096);
> > +             reloc[2*n + 0].target_handle = obj[3*n + 0].handle;
> > +             reloc[2*n + 0].presumed_offset = 0;
> > +             reloc[2*n + 0].offset = 4000;
> > +             reloc[2*n + 0].delta = 0;
> > +             reloc[2*n + 0].read_domains = I915_GEM_DOMAIN_RENDER;
> > +             reloc[2*n + 0].write_domain = I915_GEM_DOMAIN_RENDER;
> > +
> > +             obj[3*n + 1].handle = scratch;
> > +             reloc[2*n + 1].target_handle = scratch;
> > +             reloc[2*n + 1].presumed_offset = 0;
> > +             reloc[2*n + 1].offset = sizeof(uint32_t);
> > +             reloc[2*n + 1].delta = n * sizeof(uint32_t);
> > +             reloc[2*n + 1].read_domains = I915_GEM_DOMAIN_RENDER;
> > +             reloc[2*n + 1].write_domain = 0; /* lies! */
> >               if (gen >= 4 && gen < 8)
> > -                     reloc[n].offset += sizeof(uint32_t);
> > +                     reloc[2*n + 1].offset += sizeof(uint32_t);
> >   
* Re: [igt-dev] [Intel-gfx] [PATCH i-g-t 3/6] igt/gem_ctx_thrash: Order writes between contexts
@ 2018-05-14 15:10       ` Chris Wilson
  0 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-05-14 15:10 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx; +Cc: igt-dev

Quoting Tvrtko Ursulin (2018-05-14 11:59:09)
> 
> On 14/05/2018 09:02, Chris Wilson wrote:
> > The test wrote to the same dwords from multiple contexts, assuming that
> > the writes would be ordered by its submission. However, as it was using
> > multiple contexts without a write hazard, those timelines are not
> > coupled and the requests may be emitted to hw in any order. So emit a
> > write hazard for each individual dword in the scratch (avoiding the
> > write hazard for the scratch as a whole) to ensure the writes do occur
> > in the expected order.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   tests/gem_ctx_thrash.c | 92 ++++++++++++++++++++++++------------------
> >   1 file changed, 53 insertions(+), 39 deletions(-)
> > 
> > diff --git a/tests/gem_ctx_thrash.c b/tests/gem_ctx_thrash.c
> > index 2cd9cfebf..b25f95f13 100644
> > --- a/tests/gem_ctx_thrash.c
> > +++ b/tests/gem_ctx_thrash.c
> > @@ -90,17 +90,13 @@ static void single(const char *name, bool all_engines)
> >   {
> >       struct drm_i915_gem_exec_object2 *obj;
> >       struct drm_i915_gem_relocation_entry *reloc;
> > -     unsigned engines[16];
> > -     uint64_t size;
> > -     uint32_t *ctx, *map, scratch;
> > -     unsigned num_ctx;
> > -     int fd, gen, num_engines;
> > +     unsigned int engines[16], num_engines, num_ctx;
> > +     uint32_t *ctx, *map, scratch, size;
> > +     int fd, gen;
> >   #define MAX_LOOP 16
> >   
> > -     fd = drm_open_driver_master(DRIVER_INTEL);
> > +     fd = drm_open_driver(DRIVER_INTEL);
> >       igt_require_gem(fd);
> > -     igt_require(gem_can_store_dword(fd, 0));
> > -
> >       gem_require_contexts(fd);
> >   
> >       gen = intel_gen(intel_get_drm_devid(fd));
> > @@ -108,54 +104,77 @@ static void single(const char *name, bool all_engines)
> >       num_engines = 0;
> >       if (all_engines) {
> >               unsigned engine;
> > +
> >               for_each_physical_engine(fd, engine) {
> > +                     if (!gem_can_store_dword(fd, engine))
> > +                             continue;
> > +
> >                       engines[num_engines++] = engine;
> >                       if (num_engines == ARRAY_SIZE(engines))
> >                               break;
> >               }
> > -     } else
> > +     } else {
> > +             igt_require(gem_can_store_dword(fd, 0));
> >               engines[num_engines++] = 0;
> > +     }
> > +     igt_require(num_engines);
> >   
> >       num_ctx = get_num_contexts(fd, num_engines);
> >   
> >       size = ALIGN(num_ctx * sizeof(uint32_t), 4096);
> > -     scratch = gem_create(fd, ALIGN(num_ctx * sizeof(uint32_t), 4096));
> > +     scratch = gem_create(fd, size);
> >       gem_set_caching(fd, scratch, I915_CACHING_CACHED);
> > -     obj = calloc(num_ctx, 2 * sizeof(*obj));
> > -     reloc = calloc(num_ctx, sizeof(*reloc));
> > +     obj = calloc(num_ctx, 3 * sizeof(*obj));
> > +     reloc = calloc(num_ctx, 2 * sizeof(*reloc));
> >   
> >       ctx = malloc(num_ctx * sizeof(uint32_t));
> >       igt_assert(ctx);
> >       for (unsigned n = 0; n < num_ctx; n++) {
> >               ctx[n] = gem_context_create(fd);
> > -             obj[2*n + 0].handle = scratch;
> > -
> > -             reloc[n].target_handle = scratch;
> > -             reloc[n].presumed_offset = 0;
> > -             reloc[n].offset = sizeof(uint32_t);
> > -             reloc[n].delta = n * sizeof(uint32_t);
> > -             reloc[n].read_domains = I915_GEM_DOMAIN_INSTRUCTION;
> > -             reloc[n].write_domain = 0; /* lies! */
> > +
> > +             obj[3*n + 0].handle = gem_create(fd, 4096);
> > +             reloc[2*n + 0].target_handle = obj[3*n + 0].handle;
> > +             reloc[2*n + 0].presumed_offset = 0;
> > +             reloc[2*n + 0].offset = 4000;
> > +             reloc[2*n + 0].delta = 0;
> > +             reloc[2*n + 0].read_domains = I915_GEM_DOMAIN_RENDER;
> > +             reloc[2*n + 0].write_domain = I915_GEM_DOMAIN_RENDER;
> > +
> > +             obj[3*n + 1].handle = scratch;
> > +             reloc[2*n + 1].target_handle = scratch;
> > +             reloc[2*n + 1].presumed_offset = 0;
> > +             reloc[2*n + 1].offset = sizeof(uint32_t);
> > +             reloc[2*n + 1].delta = n * sizeof(uint32_t);
> > +             reloc[2*n + 1].read_domains = I915_GEM_DOMAIN_RENDER;
> > +             reloc[2*n + 1].write_domain = 0; /* lies! */
> >               if (gen >= 4 && gen < 8)
> > -                     reloc[n].offset += sizeof(uint32_t);
> > +                     reloc[2*n + 1].offset += sizeof(uint32_t);
> >   
> > -             obj[2*n + 1].relocs_ptr = to_user_pointer(&reloc[n]);
> > -             obj[2*n + 1].relocation_count = 1;
> > +             obj[3*n + 2].relocs_ptr = to_user_pointer(&reloc[2*n]);
> > +             obj[3*n + 2].relocation_count = 2;
> >       }
> >   
> >       map = gem_mmap__cpu(fd, scratch, 0, size, PROT_WRITE);
> > -     for (unsigned loop = 1; loop <= MAX_LOOP; loop <<= 1) {
> > -             unsigned count = loop * num_ctx;
> > +     for (unsigned int loop = 1; loop <= MAX_LOOP; loop <<= 1) {
> > +             const unsigned int count = loop * num_ctx;
> >               uint32_t *all;
> >   
> >               all = malloc(count * sizeof(uint32_t));
> > -             for (unsigned n = 0; n < count; n++)
> > +             for (unsigned int n = 0; n < count; n++)
> >                       all[n] = ctx[n % num_ctx];
> >               igt_permute_array(all, count, xchg_int);
> > -             for (unsigned n = 0; n < count; n++) {
> > -                     struct drm_i915_gem_execbuffer2 execbuf;
> > -                     unsigned r = n % num_ctx;
> > -                     uint64_t offset = reloc[r].presumed_offset + reloc[r].delta;
> > +
> > +             for (unsigned int n = 0; n < count; n++) {
> > +                     const unsigned int r = n % num_ctx;
> > +                     struct drm_i915_gem_execbuffer2 execbuf = {
> > +                             .buffers_ptr = to_user_pointer(&obj[3*r]),
> > +                             .buffer_count = 3,
> > +                             .flags = engines[n % num_engines],
> > +                             .rsvd1 = all[n],
> > +                     };
> > +                     uint64_t offset =
> > +                             reloc[2*r + 1].presumed_offset +
> > +                             reloc[2*r + 1].delta;
> >                       uint32_t handle = gem_create(fd, 4096);
> >                       uint32_t buf[16];
> >                       int i;
> > @@ -176,27 +195,22 @@ static void single(const char *name, bool all_engines)
> >                       buf[++i] = all[n];
> >                       buf[++i] = MI_BATCH_BUFFER_END;
> >                       gem_write(fd, handle, 0, buf, sizeof(buf));
> > -                     obj[2*r + 1].handle = handle;
> > +                     obj[3*r + 2].handle = handle;
> >   
> > -                     memset(&execbuf, 0, sizeof(execbuf));
> > -                     execbuf.buffers_ptr = to_user_pointer(&obj[2*r]);
> > -                     execbuf.buffer_count = 2;
> > -                     execbuf.flags = engines[n % num_engines];
> > -                     execbuf.rsvd1 = all[n];
> >                       gem_execbuf(fd, &execbuf);
> >                       gem_close(fd, handle);
> >               }
> >   
> > -             /* Note we lied about the write-domain when writing from the
> > +             /*
> > +              * Note we lied about the write-domain when writing from the
> >                * GPU (in order to avoid inter-ring synchronisation), so now
> >                * we have to force the synchronisation here.
> >                */
> >               gem_set_domain(fd, scratch,
> >                              I915_GEM_DOMAIN_CPU, I915_GEM_DOMAIN_CPU);
> > -             for (unsigned n = count - num_ctx; n < count; n++)
> > +             for (unsigned int n = count - num_ctx; n < count; n++)
> >                       igt_assert_eq(map[n % num_ctx], all[n]);
> >               free(all);
> > -             igt_progress(name, loop, MAX_LOOP);
> >       }
> >       munmap(map, size);
> >   
> > 
> 
> Just one thing I failed to figure out - what would be the difference 
> from simply putting a write hazard on the scratch page write? Wouldn't 
> both guarantee execution in order of execbuf?

That would cause every request to be executed in order and prevent
concurrent execution across engines. The intent of the test is to check
what happens when we run out of GTT space for the context/ring objects,
and part of that is to load up the GPU as much and as wide as possible.

Instead of forcing any ordering, we could just make the scratch big
enough for all writes; but it was such a quirky bug that I wanted to
share the fix. :)
-Chris
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH i-g-t 3/6] igt/gem_ctx_thrash: Order writes between contexts
  2018-05-14 15:10       ` [igt-dev] [Intel-gfx] " Chris Wilson
@ 2018-05-15  8:20         ` Tvrtko Ursulin
  -1 siblings, 0 replies; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-15  8:20 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: igt-dev


On 14/05/2018 16:10, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-05-14 11:59:09)
>>
>> On 14/05/2018 09:02, Chris Wilson wrote:
>>> The test wrote to the same dwords from multiple contexts, assuming that
>>> the writes would be ordered by its submission. However, as it was using
>>> multiple contexts without a write hazard, those timelines are not
>>> coupled and the requests may be emitted to hw in any order. So emit a
>>> write hazard for each individual dword in the scratch (avoiding the
>>> write hazard for the scratch as a whole) to ensure the writes do occur
>>> in the expected order.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> ---
>>>    tests/gem_ctx_thrash.c | 92 ++++++++++++++++++++++++------------------
>>>    1 file changed, 53 insertions(+), 39 deletions(-)
>>>
>>> diff --git a/tests/gem_ctx_thrash.c b/tests/gem_ctx_thrash.c
>>> index 2cd9cfebf..b25f95f13 100644
>>> --- a/tests/gem_ctx_thrash.c
>>> +++ b/tests/gem_ctx_thrash.c
>>> @@ -90,17 +90,13 @@ static void single(const char *name, bool all_engines)
>>>    {
>>>        struct drm_i915_gem_exec_object2 *obj;
>>>        struct drm_i915_gem_relocation_entry *reloc;
>>> -     unsigned engines[16];
>>> -     uint64_t size;
>>> -     uint32_t *ctx, *map, scratch;
>>> -     unsigned num_ctx;
>>> -     int fd, gen, num_engines;
>>> +     unsigned int engines[16], num_engines, num_ctx;
>>> +     uint32_t *ctx, *map, scratch, size;
>>> +     int fd, gen;
>>>    #define MAX_LOOP 16
>>>    
>>> -     fd = drm_open_driver_master(DRIVER_INTEL);
>>> +     fd = drm_open_driver(DRIVER_INTEL);
>>>        igt_require_gem(fd);
>>> -     igt_require(gem_can_store_dword(fd, 0));
>>> -
>>>        gem_require_contexts(fd);
>>>    
>>>        gen = intel_gen(intel_get_drm_devid(fd));
>>> @@ -108,54 +104,77 @@ static void single(const char *name, bool all_engines)
>>>        num_engines = 0;
>>>        if (all_engines) {
>>>                unsigned engine;
>>> +
>>>                for_each_physical_engine(fd, engine) {
>>> +                     if (!gem_can_store_dword(fd, engine))
>>> +                             continue;
>>> +
>>>                        engines[num_engines++] = engine;
>>>                        if (num_engines == ARRAY_SIZE(engines))
>>>                                break;
>>>                }
>>> -     } else
>>> +     } else {
>>> +             igt_require(gem_can_store_dword(fd, 0));
>>>                engines[num_engines++] = 0;
>>> +     }
>>> +     igt_require(num_engines);
>>>    
>>>        num_ctx = get_num_contexts(fd, num_engines);
>>>    
>>>        size = ALIGN(num_ctx * sizeof(uint32_t), 4096);
>>> -     scratch = gem_create(fd, ALIGN(num_ctx * sizeof(uint32_t), 4096));
>>> +     scratch = gem_create(fd, size);
>>>        gem_set_caching(fd, scratch, I915_CACHING_CACHED);
>>> -     obj = calloc(num_ctx, 2 * sizeof(*obj));
>>> -     reloc = calloc(num_ctx, sizeof(*reloc));
>>> +     obj = calloc(num_ctx, 3 * sizeof(*obj));
>>> +     reloc = calloc(num_ctx, 2 * sizeof(*reloc));
>>>    
>>>        ctx = malloc(num_ctx * sizeof(uint32_t));
>>>        igt_assert(ctx);
>>>        for (unsigned n = 0; n < num_ctx; n++) {
>>>                ctx[n] = gem_context_create(fd);
>>> -             obj[2*n + 0].handle = scratch;
>>> -
>>> -             reloc[n].target_handle = scratch;
>>> -             reloc[n].presumed_offset = 0;
>>> -             reloc[n].offset = sizeof(uint32_t);
>>> -             reloc[n].delta = n * sizeof(uint32_t);
>>> -             reloc[n].read_domains = I915_GEM_DOMAIN_INSTRUCTION;
>>> -             reloc[n].write_domain = 0; /* lies! */
>>> +
>>> +             obj[3*n + 0].handle = gem_create(fd, 4096);
>>> +             reloc[2*n + 0].target_handle = obj[3*n + 0].handle;
>>> +             reloc[2*n + 0].presumed_offset = 0;
>>> +             reloc[2*n + 0].offset = 4000;
>>> +             reloc[2*n + 0].delta = 0;
>>> +             reloc[2*n + 0].read_domains = I915_GEM_DOMAIN_RENDER;
>>> +             reloc[2*n + 0].write_domain = I915_GEM_DOMAIN_RENDER;
>>> +
>>> +             obj[3*n + 1].handle = scratch;
>>> +             reloc[2*n + 1].target_handle = scratch;
>>> +             reloc[2*n + 1].presumed_offset = 0;
>>> +             reloc[2*n + 1].offset = sizeof(uint32_t);
>>> +             reloc[2*n + 1].delta = n * sizeof(uint32_t);
>>> +             reloc[2*n + 1].read_domains = I915_GEM_DOMAIN_RENDER;
>>> +             reloc[2*n + 1].write_domain = 0; /* lies! */
>>>                if (gen >= 4 && gen < 8)
>>> -                     reloc[n].offset += sizeof(uint32_t);
>>> +                     reloc[2*n + 1].offset += sizeof(uint32_t);
>>>    
>>> -             obj[2*n + 1].relocs_ptr = to_user_pointer(&reloc[n]);
>>> -             obj[2*n + 1].relocation_count = 1;
>>> +             obj[3*n + 2].relocs_ptr = to_user_pointer(&reloc[2*n]);
>>> +             obj[3*n + 2].relocation_count = 2;
>>>        }
>>>    
>>>        map = gem_mmap__cpu(fd, scratch, 0, size, PROT_WRITE);
>>> -     for (unsigned loop = 1; loop <= MAX_LOOP; loop <<= 1) {
>>> -             unsigned count = loop * num_ctx;
>>> +     for (unsigned int loop = 1; loop <= MAX_LOOP; loop <<= 1) {
>>> +             const unsigned int count = loop * num_ctx;
>>>                uint32_t *all;
>>>    
>>>                all = malloc(count * sizeof(uint32_t));
>>> -             for (unsigned n = 0; n < count; n++)
>>> +             for (unsigned int n = 0; n < count; n++)
>>>                        all[n] = ctx[n % num_ctx];
>>>                igt_permute_array(all, count, xchg_int);
>>> -             for (unsigned n = 0; n < count; n++) {
>>> -                     struct drm_i915_gem_execbuffer2 execbuf;
>>> -                     unsigned r = n % num_ctx;
>>> -                     uint64_t offset = reloc[r].presumed_offset + reloc[r].delta;
>>> +
>>> +             for (unsigned int n = 0; n < count; n++) {
>>> +                     const unsigned int r = n % num_ctx;
>>> +                     struct drm_i915_gem_execbuffer2 execbuf = {
>>> +                             .buffers_ptr = to_user_pointer(&obj[3*r]),
>>> +                             .buffer_count = 3,
>>> +                             .flags = engines[n % num_engines],
>>> +                             .rsvd1 = all[n],
>>> +                     };
>>> +                     uint64_t offset =
>>> +                             reloc[2*r + 1].presumed_offset +
>>> +                             reloc[2*r + 1].delta;
>>>                        uint32_t handle = gem_create(fd, 4096);
>>>                        uint32_t buf[16];
>>>                        int i;
>>> @@ -176,27 +195,22 @@ static void single(const char *name, bool all_engines)
>>>                        buf[++i] = all[n];
>>>                        buf[++i] = MI_BATCH_BUFFER_END;
>>>                        gem_write(fd, handle, 0, buf, sizeof(buf));
>>> -                     obj[2*r + 1].handle = handle;
>>> +                     obj[3*r + 2].handle = handle;
>>>    
>>> -                     memset(&execbuf, 0, sizeof(execbuf));
>>> -                     execbuf.buffers_ptr = to_user_pointer(&obj[2*r]);
>>> -                     execbuf.buffer_count = 2;
>>> -                     execbuf.flags = engines[n % num_engines];
>>> -                     execbuf.rsvd1 = all[n];
>>>                        gem_execbuf(fd, &execbuf);
>>>                        gem_close(fd, handle);
>>>                }
>>>    
>>> -             /* Note we lied about the write-domain when writing from the
>>> +             /*
>>> +              * Note we lied about the write-domain when writing from the
>>>                 * GPU (in order to avoid inter-ring synchronisation), so now
>>>                 * we have to force the synchronisation here.
>>>                 */
>>>                gem_set_domain(fd, scratch,
>>>                               I915_GEM_DOMAIN_CPU, I915_GEM_DOMAIN_CPU);
>>> -             for (unsigned n = count - num_ctx; n < count; n++)
>>> +             for (unsigned int n = count - num_ctx; n < count; n++)
>>>                        igt_assert_eq(map[n % num_ctx], all[n]);
>>>                free(all);
>>> -             igt_progress(name, loop, MAX_LOOP);
>>>        }
>>>        munmap(map, size);
>>>    
>>>
>>
>> Just one thing I failed to figure out - what would be the difference
>> from simply putting a write hazard on the scratch page write? Wouldn't
>> both guarantee execution in order of execbuf?
> 
> That would cause every request to be executed in order and prevent
> concurrent execution across engines. The intent of the test is to check

True, I forgot the context of the patch: each context gets its own 
"ordering" bo, while the execbuf loop is separate.

> what happens when we run out of GTT space for the context/ring objects,
> and part of that is to load up the GPU as much and as wide as possible.
> 
> Instead of forcing any ordering, we could just make the scratch big
> enough for all writes; but it was such a quirky bug that I wanted to
> share the fix. :)

However, if the "ordering" bo is one per context, I still don't quite 
understand what the patch orders that wouldn't otherwise be ordered.

Also, the verification loop runs after all the execbufs have completed, 
so why does the order of execution even matter?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread


* Re: [PATCH i-g-t 3/6] igt/gem_ctx_thrash: Order writes between contexts
  2018-05-15  8:20         ` [igt-dev] [Intel-gfx] " Tvrtko Ursulin
@ 2018-05-15  8:29           ` Chris Wilson
  -1 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-05-15  8:29 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx; +Cc: igt-dev

Quoting Tvrtko Ursulin (2018-05-15 09:20:13)
> 
> On 14/05/2018 16:10, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-05-14 11:59:09)
> >>
> >> On 14/05/2018 09:02, Chris Wilson wrote:
> >>> The test wrote to the same dwords from multiple contexts, assuming that
> >>> the writes would be ordered by its submission. However, as it was using
> >>> multiple contexts without a write hazard, those timelines are not
> >>> coupled and the requests may be emitted to hw in any order. So emit a
> >>> write hazard for each individual dword in the scratch (avoiding the
> >>> write hazard for the scratch as a whole) to ensure the writes do occur
> >>> in the expected order.
> >>>
> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>> ---
> >>>    tests/gem_ctx_thrash.c | 92 ++++++++++++++++++++++++------------------
> >>>    1 file changed, 53 insertions(+), 39 deletions(-)
> >>>
> >>> diff --git a/tests/gem_ctx_thrash.c b/tests/gem_ctx_thrash.c
> >>> index 2cd9cfebf..b25f95f13 100644
> >>> --- a/tests/gem_ctx_thrash.c
> >>> +++ b/tests/gem_ctx_thrash.c
> >>> @@ -90,17 +90,13 @@ static void single(const char *name, bool all_engines)
> >>>    {
> >>>        struct drm_i915_gem_exec_object2 *obj;
> >>>        struct drm_i915_gem_relocation_entry *reloc;
> >>> -     unsigned engines[16];
> >>> -     uint64_t size;
> >>> -     uint32_t *ctx, *map, scratch;
> >>> -     unsigned num_ctx;
> >>> -     int fd, gen, num_engines;
> >>> +     unsigned int engines[16], num_engines, num_ctx;
> >>> +     uint32_t *ctx, *map, scratch, size;
> >>> +     int fd, gen;
> >>>    #define MAX_LOOP 16
> >>>    
> >>> -     fd = drm_open_driver_master(DRIVER_INTEL);
> >>> +     fd = drm_open_driver(DRIVER_INTEL);
> >>>        igt_require_gem(fd);
> >>> -     igt_require(gem_can_store_dword(fd, 0));
> >>> -
> >>>        gem_require_contexts(fd);
> >>>    
> >>>        gen = intel_gen(intel_get_drm_devid(fd));
> >>> @@ -108,54 +104,77 @@ static void single(const char *name, bool all_engines)
> >>>        num_engines = 0;
> >>>        if (all_engines) {
> >>>                unsigned engine;
> >>> +
> >>>                for_each_physical_engine(fd, engine) {
> >>> +                     if (!gem_can_store_dword(fd, engine))
> >>> +                             continue;
> >>> +
> >>>                        engines[num_engines++] = engine;
> >>>                        if (num_engines == ARRAY_SIZE(engines))
> >>>                                break;
> >>>                }
> >>> -     } else
> >>> +     } else {
> >>> +             igt_require(gem_can_store_dword(fd, 0));
> >>>                engines[num_engines++] = 0;
> >>> +     }
> >>> +     igt_require(num_engines);
> >>>    
> >>>        num_ctx = get_num_contexts(fd, num_engines);
> >>>    
> >>>        size = ALIGN(num_ctx * sizeof(uint32_t), 4096);
> >>> -     scratch = gem_create(fd, ALIGN(num_ctx * sizeof(uint32_t), 4096));
> >>> +     scratch = gem_create(fd, size);
> >>>        gem_set_caching(fd, scratch, I915_CACHING_CACHED);
> >>> -     obj = calloc(num_ctx, 2 * sizeof(*obj));
> >>> -     reloc = calloc(num_ctx, sizeof(*reloc));
> >>> +     obj = calloc(num_ctx, 3 * sizeof(*obj));
> >>> +     reloc = calloc(num_ctx, 2 * sizeof(*reloc));
> >>>    
> >>>        ctx = malloc(num_ctx * sizeof(uint32_t));
> >>>        igt_assert(ctx);
> >>>        for (unsigned n = 0; n < num_ctx; n++) {
> >>>                ctx[n] = gem_context_create(fd);
> >>> -             obj[2*n + 0].handle = scratch;
> >>> -
> >>> -             reloc[n].target_handle = scratch;
> >>> -             reloc[n].presumed_offset = 0;
> >>> -             reloc[n].offset = sizeof(uint32_t);
> >>> -             reloc[n].delta = n * sizeof(uint32_t);
> >>> -             reloc[n].read_domains = I915_GEM_DOMAIN_INSTRUCTION;
> >>> -             reloc[n].write_domain = 0; /* lies! */
> >>> +
> >>> +             obj[3*n + 0].handle = gem_create(fd, 4096);
> >>> +             reloc[2*n + 0].target_handle = obj[3*n + 0].handle;
> >>> +             reloc[2*n + 0].presumed_offset = 0;
> >>> +             reloc[2*n + 0].offset = 4000;
> >>> +             reloc[2*n + 0].delta = 0;
> >>> +             reloc[2*n + 0].read_domains = I915_GEM_DOMAIN_RENDER;
> >>> +             reloc[2*n + 0].write_domain = I915_GEM_DOMAIN_RENDER;
> >>> +
> >>> +             obj[3*n + 1].handle = scratch;
> >>> +             reloc[2*n + 1].target_handle = scratch;
> >>> +             reloc[2*n + 1].presumed_offset = 0;
> >>> +             reloc[2*n + 1].offset = sizeof(uint32_t);
> >>> +             reloc[2*n + 1].delta = n * sizeof(uint32_t);
> >>> +             reloc[2*n + 1].read_domains = I915_GEM_DOMAIN_RENDER;
> >>> +             reloc[2*n + 1].write_domain = 0; /* lies! */
> >>>                if (gen >= 4 && gen < 8)
> >>> -                     reloc[n].offset += sizeof(uint32_t);
> >>> +                     reloc[2*n + 1].offset += sizeof(uint32_t);
> >>>    
> >>> -             obj[2*n + 1].relocs_ptr = to_user_pointer(&reloc[n]);
> >>> -             obj[2*n + 1].relocation_count = 1;
> >>> +             obj[3*n + 2].relocs_ptr = to_user_pointer(&reloc[2*n]);
> >>> +             obj[3*n + 2].relocation_count = 2;
> >>>        }
> >>>    
> >>>        map = gem_mmap__cpu(fd, scratch, 0, size, PROT_WRITE);
> >>> -     for (unsigned loop = 1; loop <= MAX_LOOP; loop <<= 1) {
> >>> -             unsigned count = loop * num_ctx;
> >>> +     for (unsigned int loop = 1; loop <= MAX_LOOP; loop <<= 1) {
> >>> +             const unsigned int count = loop * num_ctx;
> >>>                uint32_t *all;
> >>>    
> >>>                all = malloc(count * sizeof(uint32_t));
> >>> -             for (unsigned n = 0; n < count; n++)
> >>> +             for (unsigned int n = 0; n < count; n++)
> >>>                        all[n] = ctx[n % num_ctx];
> >>>                igt_permute_array(all, count, xchg_int);
> >>> -             for (unsigned n = 0; n < count; n++) {
> >>> -                     struct drm_i915_gem_execbuffer2 execbuf;
> >>> -                     unsigned r = n % num_ctx;
> >>> -                     uint64_t offset = reloc[r].presumed_offset + reloc[r].delta;
> >>> +
> >>> +             for (unsigned int n = 0; n < count; n++) {
> >>> +                     const unsigned int r = n % num_ctx;
> >>> +                     struct drm_i915_gem_execbuffer2 execbuf = {
> >>> +                             .buffers_ptr = to_user_pointer(&obj[3*r]),
> >>> +                             .buffer_count = 3,
> >>> +                             .flags = engines[n % num_engines],
> >>> +                             .rsvd1 = all[n],
> >>> +                     };
> >>> +                     uint64_t offset =
> >>> +                             reloc[2*r + 1].presumed_offset +
> >>> +                             reloc[2*r + 1].delta;
> >>>                        uint32_t handle = gem_create(fd, 4096);
> >>>                        uint32_t buf[16];
> >>>                        int i;
> >>> @@ -176,27 +195,22 @@ static void single(const char *name, bool all_engines)
> >>>                        buf[++i] = all[n];
> >>>                        buf[++i] = MI_BATCH_BUFFER_END;
> >>>                        gem_write(fd, handle, 0, buf, sizeof(buf));
> >>> -                     obj[2*r + 1].handle = handle;
> >>> +                     obj[3*r + 2].handle = handle;
> >>>    
> >>> -                     memset(&execbuf, 0, sizeof(execbuf));
> >>> -                     execbuf.buffers_ptr = to_user_pointer(&obj[2*r]);
> >>> -                     execbuf.buffer_count = 2;
> >>> -                     execbuf.flags = engines[n % num_engines];
> >>> -                     execbuf.rsvd1 = all[n];
> >>>                        gem_execbuf(fd, &execbuf);
> >>>                        gem_close(fd, handle);
> >>>                }
> >>>    
> >>> -             /* Note we lied about the write-domain when writing from the
> >>> +             /*
> >>> +              * Note we lied about the write-domain when writing from the
> >>>                 * GPU (in order to avoid inter-ring synchronisation), so now
> >>>                 * we have to force the synchronisation here.
> >>>                 */
> >>>                gem_set_domain(fd, scratch,
> >>>                               I915_GEM_DOMAIN_CPU, I915_GEM_DOMAIN_CPU);
> >>> -             for (unsigned n = count - num_ctx; n < count; n++)
> >>> +             for (unsigned int n = count - num_ctx; n < count; n++)
> >>>                        igt_assert_eq(map[n % num_ctx], all[n]);
> >>>                free(all);
> >>> -             igt_progress(name, loop, MAX_LOOP);
> >>>        }
> >>>        munmap(map, size);
> >>>    
> >>>
> >>
> >> Just one thing I failed to figure out - what would be the difference
> >> from simply putting a write hazard on the scratch page write? Wouldn't
> >> both guarantee execution in order of execbuf?
> > 
> > That would cause every request to be executed in order and prevent
> > concurrent execution across engines. The intent of the test is to check
> 
> True, I forgot the context of the patch: each context gets its own
> "ordering" bo, while the execbuf loop is separate.
> 
> > what happens when we run out of GTT space for the context/ring objects,
> > and part of that is to load up the GPU as much and as wide as possible.
> > 
> > Instead of forcing any ordering, we could just make the scratch big
> > enough for all writes; but it was such a quirky bug that I wanted to
> > share the fix. :)
> 
> However, if the "ordering" bo is one for each context, I still don't 
> quite understand what is ordered with the patch which wouldn't normally be.

Not one for each context, one for each dword in the scratch.
 
> And also, the verification loop is after all execbufs have completed - 
> so why is even the order of execution important?

Huh? Many contexts write into a single dword; we only check the final
value of that dword and presume it tells us which context wrote last.
We never told the kernel what order the contexts had to be executed in,
so the last value is unspecified.
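(A minimal host-side sketch of that point -- not IGT code, just assumed
semantics: the value that survives in a contended dword is whichever
write retires last, so it only equals the last-submitted id when
same-slot writes are ordered.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct slot_write {
	unsigned int slot;	/* dword index, n % num_ctx in the test */
	uint32_t id;		/* context id written by that batch */
};

/* Apply the writes in the order given; the survivor in each slot is
 * simply the last write in that order.  Without ordering between
 * contexts the hw may pick any interleaving, so the survivor is
 * unspecified. */
static void apply_writes(uint32_t *map, const struct slot_write *w, size_t n)
{
	for (size_t i = 0; i < n; i++)
		map[w[i].slot] = w[i].id;
}
```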
-Chris

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH i-g-t 3/6] igt/gem_ctx_thrash: Order writes between contexts
  2018-05-15  8:29           ` [igt-dev] [Intel-gfx] " Chris Wilson
@ 2018-05-15  8:45             ` Tvrtko Ursulin
  -1 siblings, 0 replies; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-15  8:45 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: igt-dev


On 15/05/2018 09:29, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-05-15 09:20:13)
>>
>> On 14/05/2018 16:10, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-05-14 11:59:09)
>>>>
>>>> On 14/05/2018 09:02, Chris Wilson wrote:
>>>>> The test wrote to the same dwords from multiple contexts, assuming that
>>>>> the writes would be ordered by its submission. However, as it was using
>>>>> multiple contexts without a write hazard, those timelines are not
>>>>> coupled and the requests may be emitted to hw in any order. So emit a
>>>>> write hazard for each individual dword in the scratch (avoiding the
>>>>> write hazard for the scratch as a whole) to ensure the writes do occur
>>>>> in the expected order.
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> ---
>>>>>     tests/gem_ctx_thrash.c | 92 ++++++++++++++++++++++++------------------
>>>>>     1 file changed, 53 insertions(+), 39 deletions(-)
>>>>>
>>>>> diff --git a/tests/gem_ctx_thrash.c b/tests/gem_ctx_thrash.c
>>>>> index 2cd9cfebf..b25f95f13 100644
>>>>> --- a/tests/gem_ctx_thrash.c
>>>>> +++ b/tests/gem_ctx_thrash.c
>>>>> @@ -90,17 +90,13 @@ static void single(const char *name, bool all_engines)
>>>>>     {
>>>>>         struct drm_i915_gem_exec_object2 *obj;
>>>>>         struct drm_i915_gem_relocation_entry *reloc;
>>>>> -     unsigned engines[16];
>>>>> -     uint64_t size;
>>>>> -     uint32_t *ctx, *map, scratch;
>>>>> -     unsigned num_ctx;
>>>>> -     int fd, gen, num_engines;
>>>>> +     unsigned int engines[16], num_engines, num_ctx;
>>>>> +     uint32_t *ctx, *map, scratch, size;
>>>>> +     int fd, gen;
>>>>>     #define MAX_LOOP 16
>>>>>     
>>>>> -     fd = drm_open_driver_master(DRIVER_INTEL);
>>>>> +     fd = drm_open_driver(DRIVER_INTEL);
>>>>>         igt_require_gem(fd);
>>>>> -     igt_require(gem_can_store_dword(fd, 0));
>>>>> -
>>>>>         gem_require_contexts(fd);
>>>>>     
>>>>>         gen = intel_gen(intel_get_drm_devid(fd));
>>>>> @@ -108,54 +104,77 @@ static void single(const char *name, bool all_engines)
>>>>>         num_engines = 0;
>>>>>         if (all_engines) {
>>>>>                 unsigned engine;
>>>>> +
>>>>>                 for_each_physical_engine(fd, engine) {
>>>>> +                     if (!gem_can_store_dword(fd, engine))
>>>>> +                             continue;
>>>>> +
>>>>>                         engines[num_engines++] = engine;
>>>>>                         if (num_engines == ARRAY_SIZE(engines))
>>>>>                                 break;
>>>>>                 }
>>>>> -     } else
>>>>> +     } else {
>>>>> +             igt_require(gem_can_store_dword(fd, 0));
>>>>>                 engines[num_engines++] = 0;
>>>>> +     }
>>>>> +     igt_require(num_engines);
>>>>>     
>>>>>         num_ctx = get_num_contexts(fd, num_engines);
>>>>>     
>>>>>         size = ALIGN(num_ctx * sizeof(uint32_t), 4096);
>>>>> -     scratch = gem_create(fd, ALIGN(num_ctx * sizeof(uint32_t), 4096));
>>>>> +     scratch = gem_create(fd, size);
>>>>>         gem_set_caching(fd, scratch, I915_CACHING_CACHED);
>>>>> -     obj = calloc(num_ctx, 2 * sizeof(*obj));
>>>>> -     reloc = calloc(num_ctx, sizeof(*reloc));
>>>>> +     obj = calloc(num_ctx, 3 * sizeof(*obj));
>>>>> +     reloc = calloc(num_ctx, 2 * sizeof(*reloc));
>>>>>     
>>>>>         ctx = malloc(num_ctx * sizeof(uint32_t));
>>>>>         igt_assert(ctx);
>>>>>         for (unsigned n = 0; n < num_ctx; n++) {
>>>>>                 ctx[n] = gem_context_create(fd);
>>>>> -             obj[2*n + 0].handle = scratch;
>>>>> -
>>>>> -             reloc[n].target_handle = scratch;
>>>>> -             reloc[n].presumed_offset = 0;
>>>>> -             reloc[n].offset = sizeof(uint32_t);
>>>>> -             reloc[n].delta = n * sizeof(uint32_t);
>>>>> -             reloc[n].read_domains = I915_GEM_DOMAIN_INSTRUCTION;
>>>>> -             reloc[n].write_domain = 0; /* lies! */
>>>>> +
>>>>> +             obj[3*n + 0].handle = gem_create(fd, 4096);
>>>>> +             reloc[2*n + 0].target_handle = obj[3*n + 0].handle;
>>>>> +             reloc[2*n + 0].presumed_offset = 0;
>>>>> +             reloc[2*n + 0].offset = 4000;
>>>>> +             reloc[2*n + 0].delta = 0;
>>>>> +             reloc[2*n + 0].read_domains = I915_GEM_DOMAIN_RENDER;
>>>>> +             reloc[2*n + 0].write_domain = I915_GEM_DOMAIN_RENDER;
>>>>> +
>>>>> +             obj[3*n + 1].handle = scratch;
>>>>> +             reloc[2*n + 1].target_handle = scratch;
>>>>> +             reloc[2*n + 1].presumed_offset = 0;
>>>>> +             reloc[2*n + 1].offset = sizeof(uint32_t);
>>>>> +             reloc[2*n + 1].delta = n * sizeof(uint32_t);
>>>>> +             reloc[2*n + 1].read_domains = I915_GEM_DOMAIN_RENDER;
>>>>> +             reloc[2*n + 1].write_domain = 0; /* lies! */
>>>>>                 if (gen >= 4 && gen < 8)
>>>>> -                     reloc[n].offset += sizeof(uint32_t);
>>>>> +                     reloc[2*n + 1].offset += sizeof(uint32_t);
>>>>>     
>>>>> -             obj[2*n + 1].relocs_ptr = to_user_pointer(&reloc[n]);
>>>>> -             obj[2*n + 1].relocation_count = 1;
>>>>> +             obj[3*n + 2].relocs_ptr = to_user_pointer(&reloc[2*n]);
>>>>> +             obj[3*n + 2].relocation_count = 2;
>>>>>         }
>>>>>     
>>>>>         map = gem_mmap__cpu(fd, scratch, 0, size, PROT_WRITE);
>>>>> -     for (unsigned loop = 1; loop <= MAX_LOOP; loop <<= 1) {
>>>>> -             unsigned count = loop * num_ctx;
>>>>> +     for (unsigned int loop = 1; loop <= MAX_LOOP; loop <<= 1) {
>>>>> +             const unsigned int count = loop * num_ctx;
>>>>>                 uint32_t *all;
>>>>>     
>>>>>                 all = malloc(count * sizeof(uint32_t));
>>>>> -             for (unsigned n = 0; n < count; n++)
>>>>> +             for (unsigned int n = 0; n < count; n++)
>>>>>                         all[n] = ctx[n % num_ctx];
>>>>>                 igt_permute_array(all, count, xchg_int);
>>>>> -             for (unsigned n = 0; n < count; n++) {
>>>>> -                     struct drm_i915_gem_execbuffer2 execbuf;
>>>>> -                     unsigned r = n % num_ctx;
>>>>> -                     uint64_t offset = reloc[r].presumed_offset + reloc[r].delta;
>>>>> +
>>>>> +             for (unsigned int n = 0; n < count; n++) {
>>>>> +                     const unsigned int r = n % num_ctx;
>>>>> +                     struct drm_i915_gem_execbuffer2 execbuf = {
>>>>> +                             .buffers_ptr = to_user_pointer(&obj[3*r]),
>>>>> +                             .buffer_count = 3,
>>>>> +                             .flags = engines[n % num_engines],
>>>>> +                             .rsvd1 = all[n],
>>>>> +                     };
>>>>> +                     uint64_t offset =
>>>>> +                             reloc[2*r + 1].presumed_offset +
>>>>> +                             reloc[2*r + 1].delta;
>>>>>                         uint32_t handle = gem_create(fd, 4096);
>>>>>                         uint32_t buf[16];
>>>>>                         int i;
>>>>> @@ -176,27 +195,22 @@ static void single(const char *name, bool all_engines)
>>>>>                         buf[++i] = all[n];
>>>>>                         buf[++i] = MI_BATCH_BUFFER_END;
>>>>>                         gem_write(fd, handle, 0, buf, sizeof(buf));
>>>>> -                     obj[2*r + 1].handle = handle;
>>>>> +                     obj[3*r + 2].handle = handle;
>>>>>     
>>>>> -                     memset(&execbuf, 0, sizeof(execbuf));
>>>>> -                     execbuf.buffers_ptr = to_user_pointer(&obj[2*r]);
>>>>> -                     execbuf.buffer_count = 2;
>>>>> -                     execbuf.flags = engines[n % num_engines];
>>>>> -                     execbuf.rsvd1 = all[n];
>>>>>                         gem_execbuf(fd, &execbuf);
>>>>>                         gem_close(fd, handle);
>>>>>                 }
>>>>>     
>>>>> -             /* Note we lied about the write-domain when writing from the
>>>>> +             /*
>>>>> +              * Note we lied about the write-domain when writing from the
>>>>>                  * GPU (in order to avoid inter-ring synchronisation), so now
>>>>>                  * we have to force the synchronisation here.
>>>>>                  */
>>>>>                 gem_set_domain(fd, scratch,
>>>>>                                I915_GEM_DOMAIN_CPU, I915_GEM_DOMAIN_CPU);
>>>>> -             for (unsigned n = count - num_ctx; n < count; n++)
>>>>> +             for (unsigned int n = count - num_ctx; n < count; n++)
>>>>>                         igt_assert_eq(map[n % num_ctx], all[n]);
>>>>>                 free(all);
>>>>> -             igt_progress(name, loop, MAX_LOOP);
>>>>>         }
>>>>>         munmap(map, size);
>>>>>     
>>>>>
>>>>
>>>> Just one thing I failed to figure out - what would be the difference
>>>> from simply putting a write hazard on the scratch page write? Wouldn't
>>>> both guarantee execution in order of execbuf?
>>>
>>> That would cause every request to be executed in order and prevent
>>> concurrent execution across engines. The intent of the test is to check
>>
>> True, I forgot the context of the patch: each context gets its own
>> "ordering" bo, while the execbuf loop is separate.
>>
>>> what happens when we run out of GTT space for the context/ring objects,
>>> and part of that is to load up the GPU as much and as wide as possible.
>>>
>>> Instead of forcing any ordering, we could just make the scratch big
>>> enough for all writes; but it was such a quirky bug that I wanted to
>>> share the fix. :)
>>
>> However, if the "ordering" bo is one for each context, I still don't
>> quite understand what is ordered with the patch which wouldn't normally be.
> 
> Not one for each context, one for each dword in the scratch.

I see gem_context_create and gem_create for the new "ordering" bo in the
same loop, "for n = 0; n < num_ctx", so I conclude that the number of
"ordering" bos equals the number of contexts.

>> And also, the verification loop is after all execbufs have completed -
>> so why is even the order of execution important?
> 
> Huh? Many contexts write into a single dword, we only check the last
> value of the dword and presume which context wrote its id. We never told
> the kernel what order the contexts had to be executed in, so the last
> value is unspecified.

Then in the execbuf loop I saw that for each execbuf the context is
picked with "n % num_ctx", and the write destination is "n % num_ctx"
as well.

But I missed that igt_permute_array is applied only to the context list,
which decouples the two.
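(A tiny model of that decoupling -- assumed to mirror the test's
indexing, not actual IGT code: the slot written on iteration n is fixed
at n % num_ctx, while the permuted all[] only decides which context id
lands there, so with ordered per-slot writes the verification loop can
predict map[n % num_ctx] == all[n] for the final round.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Replay the writes in submission order: slot n % num_ctx receives
 * all[n], so after 'count' iterations each slot holds the id from one
 * of the last num_ctx submissions -- what the verification loop checks. */
static void replay(uint32_t *map, const uint32_t *all,
		   size_t count, size_t num_ctx)
{
	for (size_t n = 0; n < count; n++)
		map[n % num_ctx] = all[n];
}
```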

I guess this is a case in point of why complex test patterns without any
high-level comments on what the test intends to do, and how, suck: now
both of us have lost way too much time on this.

So I guess it's fine, but a pain to review, since I don't have a compiler
and a virtual machine in my head.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [igt-dev] [Intel-gfx] [PATCH i-g-t 3/6] igt/gem_ctx_thrash: Order writes between contexts
@ 2018-05-15  8:45             ` Tvrtko Ursulin
  0 siblings, 0 replies; 38+ messages in thread
From: Tvrtko Ursulin @ 2018-05-15  8:45 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: igt-dev


On 15/05/2018 09:29, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-05-15 09:20:13)
>>
>> On 14/05/2018 16:10, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-05-14 11:59:09)
>>>>
>>>> On 14/05/2018 09:02, Chris Wilson wrote:
>>>>> The test wrote to the same dwords from multiple contexts, assuming that
>>>>> the writes would be ordered by its submission. However, as it was using
>>>>> multiple contexts without a write hazard, those timelines are not
>>>>> coupled and the requests may be emitted to hw in any order. So emit a
>>>>> write hazard for each individual dword in the scratch (avoiding the
>>>>> write hazard for the scratch as a whole) to ensure the writes do occur
>>>>> in the expected order.
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> ---
>>>>>     tests/gem_ctx_thrash.c | 92 ++++++++++++++++++++++++------------------
>>>>>     1 file changed, 53 insertions(+), 39 deletions(-)
>>>>>
>>>>> diff --git a/tests/gem_ctx_thrash.c b/tests/gem_ctx_thrash.c
>>>>> index 2cd9cfebf..b25f95f13 100644
>>>>> --- a/tests/gem_ctx_thrash.c
>>>>> +++ b/tests/gem_ctx_thrash.c
>>>>> @@ -90,17 +90,13 @@ static void single(const char *name, bool all_engines)
>>>>>     {
>>>>>         struct drm_i915_gem_exec_object2 *obj;
>>>>>         struct drm_i915_gem_relocation_entry *reloc;
>>>>> -     unsigned engines[16];
>>>>> -     uint64_t size;
>>>>> -     uint32_t *ctx, *map, scratch;
>>>>> -     unsigned num_ctx;
>>>>> -     int fd, gen, num_engines;
>>>>> +     unsigned int engines[16], num_engines, num_ctx;
>>>>> +     uint32_t *ctx, *map, scratch, size;
>>>>> +     int fd, gen;
>>>>>     #define MAX_LOOP 16
>>>>>     
>>>>> -     fd = drm_open_driver_master(DRIVER_INTEL);
>>>>> +     fd = drm_open_driver(DRIVER_INTEL);
>>>>>         igt_require_gem(fd);
>>>>> -     igt_require(gem_can_store_dword(fd, 0));
>>>>> -
>>>>>         gem_require_contexts(fd);
>>>>>     
>>>>>         gen = intel_gen(intel_get_drm_devid(fd));
>>>>> @@ -108,54 +104,77 @@ static void single(const char *name, bool all_engines)
>>>>>         num_engines = 0;
>>>>>         if (all_engines) {
>>>>>                 unsigned engine;
>>>>> +
>>>>>                 for_each_physical_engine(fd, engine) {
>>>>> +                     if (!gem_can_store_dword(fd, engine))
>>>>> +                             continue;
>>>>> +
>>>>>                         engines[num_engines++] = engine;
>>>>>                         if (num_engines == ARRAY_SIZE(engines))
>>>>>                                 break;
>>>>>                 }
>>>>> -     } else
>>>>> +     } else {
>>>>> +             igt_require(gem_can_store_dword(fd, 0));
>>>>>                 engines[num_engines++] = 0;
>>>>> +     }
>>>>> +     igt_require(num_engines);
>>>>>     
>>>>>         num_ctx = get_num_contexts(fd, num_engines);
>>>>>     
>>>>>         size = ALIGN(num_ctx * sizeof(uint32_t), 4096);
>>>>> -     scratch = gem_create(fd, ALIGN(num_ctx * sizeof(uint32_t), 4096));
>>>>> +     scratch = gem_create(fd, size);
>>>>>         gem_set_caching(fd, scratch, I915_CACHING_CACHED);
>>>>> -     obj = calloc(num_ctx, 2 * sizeof(*obj));
>>>>> -     reloc = calloc(num_ctx, sizeof(*reloc));
>>>>> +     obj = calloc(num_ctx, 3 * sizeof(*obj));
>>>>> +     reloc = calloc(num_ctx, 2 * sizeof(*reloc));
>>>>>     
>>>>>         ctx = malloc(num_ctx * sizeof(uint32_t));
>>>>>         igt_assert(ctx);
>>>>>         for (unsigned n = 0; n < num_ctx; n++) {
>>>>>                 ctx[n] = gem_context_create(fd);
>>>>> -             obj[2*n + 0].handle = scratch;
>>>>> -
>>>>> -             reloc[n].target_handle = scratch;
>>>>> -             reloc[n].presumed_offset = 0;
>>>>> -             reloc[n].offset = sizeof(uint32_t);
>>>>> -             reloc[n].delta = n * sizeof(uint32_t);
>>>>> -             reloc[n].read_domains = I915_GEM_DOMAIN_INSTRUCTION;
>>>>> -             reloc[n].write_domain = 0; /* lies! */
>>>>> +
>>>>> +             obj[3*n + 0].handle = gem_create(fd, 4096);
>>>>> +             reloc[2*n + 0].target_handle = obj[3*n + 0].handle;
>>>>> +             reloc[2*n + 0].presumed_offset = 0;
>>>>> +             reloc[2*n + 0].offset = 4000;
>>>>> +             reloc[2*n + 0].delta = 0;
>>>>> +             reloc[2*n + 0].read_domains = I915_GEM_DOMAIN_RENDER;
>>>>> +             reloc[2*n + 0].write_domain = I915_GEM_DOMAIN_RENDER;
>>>>> +
>>>>> +             obj[3*n + 1].handle = scratch;
>>>>> +             reloc[2*n + 1].target_handle = scratch;
>>>>> +             reloc[2*n + 1].presumed_offset = 0;
>>>>> +             reloc[2*n + 1].offset = sizeof(uint32_t);
>>>>> +             reloc[2*n + 1].delta = n * sizeof(uint32_t);
>>>>> +             reloc[2*n + 1].read_domains = I915_GEM_DOMAIN_RENDER;
>>>>> +             reloc[2*n + 1].write_domain = 0; /* lies! */
>>>>>                 if (gen >= 4 && gen < 8)
>>>>> -                     reloc[n].offset += sizeof(uint32_t);
>>>>> +                     reloc[2*n + 1].offset += sizeof(uint32_t);
>>>>>     
>>>>> -             obj[2*n + 1].relocs_ptr = to_user_pointer(&reloc[n]);
>>>>> -             obj[2*n + 1].relocation_count = 1;
>>>>> +             obj[3*n + 2].relocs_ptr = to_user_pointer(&reloc[2*n]);
>>>>> +             obj[3*n + 2].relocation_count = 2;
>>>>>         }
>>>>>     
>>>>>         map = gem_mmap__cpu(fd, scratch, 0, size, PROT_WRITE);
>>>>> -     for (unsigned loop = 1; loop <= MAX_LOOP; loop <<= 1) {
>>>>> -             unsigned count = loop * num_ctx;
>>>>> +     for (unsigned int loop = 1; loop <= MAX_LOOP; loop <<= 1) {
>>>>> +             const unsigned int count = loop * num_ctx;
>>>>>                 uint32_t *all;
>>>>>     
>>>>>                 all = malloc(count * sizeof(uint32_t));
>>>>> -             for (unsigned n = 0; n < count; n++)
>>>>> +             for (unsigned int n = 0; n < count; n++)
>>>>>                         all[n] = ctx[n % num_ctx];
>>>>>                 igt_permute_array(all, count, xchg_int);
>>>>> -             for (unsigned n = 0; n < count; n++) {
>>>>> -                     struct drm_i915_gem_execbuffer2 execbuf;
>>>>> -                     unsigned r = n % num_ctx;
>>>>> -                     uint64_t offset = reloc[r].presumed_offset + reloc[r].delta;
>>>>> +
>>>>> +             for (unsigned int n = 0; n < count; n++) {
>>>>> +                     const unsigned int r = n % num_ctx;
>>>>> +                     struct drm_i915_gem_execbuffer2 execbuf = {
>>>>> +                             .buffers_ptr = to_user_pointer(&obj[3*r]),
>>>>> +                             .buffer_count = 3,
>>>>> +                             .flags = engines[n % num_engines],
>>>>> +                             .rsvd1 = all[n],
>>>>> +                     };
>>>>> +                     uint64_t offset =
>>>>> +                             reloc[2*r + 1].presumed_offset +
>>>>> +                             reloc[2*r + 1].delta;
>>>>>                         uint32_t handle = gem_create(fd, 4096);
>>>>>                         uint32_t buf[16];
>>>>>                         int i;
>>>>> @@ -176,27 +195,22 @@ static void single(const char *name, bool all_engines)
>>>>>                         buf[++i] = all[n];
>>>>>                         buf[++i] = MI_BATCH_BUFFER_END;
>>>>>                         gem_write(fd, handle, 0, buf, sizeof(buf));
>>>>> -                     obj[2*r + 1].handle = handle;
>>>>> +                     obj[3*r + 2].handle = handle;
>>>>>     
>>>>> -                     memset(&execbuf, 0, sizeof(execbuf));
>>>>> -                     execbuf.buffers_ptr = to_user_pointer(&obj[2*r]);
>>>>> -                     execbuf.buffer_count = 2;
>>>>> -                     execbuf.flags = engines[n % num_engines];
>>>>> -                     execbuf.rsvd1 = all[n];
>>>>>                         gem_execbuf(fd, &execbuf);
>>>>>                         gem_close(fd, handle);
>>>>>                 }
>>>>>     
>>>>> -             /* Note we lied about the write-domain when writing from the
>>>>> +             /*
>>>>> +              * Note we lied about the write-domain when writing from the
>>>>>                  * GPU (in order to avoid inter-ring synchronisation), so now
>>>>>                  * we have to force the synchronisation here.
>>>>>                  */
>>>>>                 gem_set_domain(fd, scratch,
>>>>>                                I915_GEM_DOMAIN_CPU, I915_GEM_DOMAIN_CPU);
>>>>> -             for (unsigned n = count - num_ctx; n < count; n++)
>>>>> +             for (unsigned int n = count - num_ctx; n < count; n++)
>>>>>                         igt_assert_eq(map[n % num_ctx], all[n]);
>>>>>                 free(all);
>>>>> -             igt_progress(name, loop, MAX_LOOP);
>>>>>         }
>>>>>         munmap(map, size);
>>>>>     
>>>>>
>>>>
>>>> Just one thing I failed to figure out - what would be the difference
>>>> from simply putting a write hazard on the scratch page write? Wouldn't
>>>> both guarantee execution in order of execbuf?
>>>
>>> That would cause every request to be executed in order and prevent
>>> concurrent execution across engines. The intent of the test is to check
>>
>> True, I forgot the context of the patch, that each context gets its own
>> "ordering" bo, while the execbuf loop is separate.
>>
>>> what happens when we run out of GTT space for the context/ring objects,
>>> and part of that is to load up the GPU as much and as wide as possible.
>>>
>>> Instead of forcing any ordering, we could just make the scratch big
>>> enough for all writes; but it was such a quirky bug that I wanted to
>>> share the fix. :)
>>
>> However, if there is one "ordering" bo per context, I still don't quite
>> understand what the patch orders that wouldn't normally be ordered.
> 
> Not one for each context, one for each dword in the scratch.

I see gem_context_create and gem_create for the new "ordering" bo in the 
same loop "for n = 0; n < num_ctx", so I conclude the number of 
"ordering" bos is equal to the number of contexts.

>> And also, the verification loop is after all execbufs have completed -
>> so why is even the order of execution important?
> 
> Huh? Many contexts write into a single dword; we only check the last
> value of the dword and presume which context wrote its id. We never told
> the kernel what order the contexts had to be executed in, so the last
> value is unspecified.

Then in the execbuf loop I saw, for each execbuf, the context being 
picked with "n % num_ctx" and the write destination as "n % num_ctx" as 
well.

But I missed that igt_permute_array() is applied only to the context 
list, which decouples the two.

I guess this is a case in point of why complex test patterns without any 
high-level comments on what the test does and how it intends to work are 
a problem, since now both of us have lost way too much time on this.

So I guess it's fine, but a pain to review since I don't have a compiler 
and a virtual machine in my head.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2018-05-15  8:45 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-14  8:02 [PATCH i-g-t 1/6] overlay: Remove the miscalculation of window position Chris Wilson
2018-05-14  8:02 ` [igt-dev] " Chris Wilson
2018-05-14  8:02 ` [PATCH i-g-t 2/6] lib/audio: Replace sqrt(a*a + b*b) with hypot(a, b) Chris Wilson
2018-05-14  8:02   ` [igt-dev] " Chris Wilson
2018-05-14 10:20   ` Tvrtko Ursulin
2018-05-14 10:20     ` [igt-dev] [Intel-gfx] " Tvrtko Ursulin
2018-05-14  8:02 ` [PATCH i-g-t 3/6] igt/gem_ctx_thrash: Order writes between contexts Chris Wilson
2018-05-14  8:02   ` [igt-dev] " Chris Wilson
2018-05-14 10:59   ` Tvrtko Ursulin
2018-05-14 10:59     ` [igt-dev] [Intel-gfx] " Tvrtko Ursulin
2018-05-14 15:10     ` Chris Wilson
2018-05-14 15:10       ` [igt-dev] [Intel-gfx] " Chris Wilson
2018-05-15  8:20       ` Tvrtko Ursulin
2018-05-15  8:20         ` [igt-dev] [Intel-gfx] " Tvrtko Ursulin
2018-05-15  8:29         ` Chris Wilson
2018-05-15  8:29           ` [igt-dev] [Intel-gfx] " Chris Wilson
2018-05-15  8:45           ` Tvrtko Ursulin
2018-05-15  8:45             ` [igt-dev] [Intel-gfx] " Tvrtko Ursulin
2018-05-14  8:02 ` [PATCH i-g-t 4/6] igt/gem_eio: Exercise banning Chris Wilson
2018-05-14  8:02   ` [igt-dev] " Chris Wilson
2018-05-14 11:01   ` Tvrtko Ursulin
2018-05-14 11:01     ` [Intel-gfx] " Tvrtko Ursulin
2018-05-14  8:02 ` [PATCH i-g-t 5/6] tests/gem_eio: Only wait-for-idle inside trigger_reset() Chris Wilson
2018-05-14  8:02   ` [igt-dev] " Chris Wilson
2018-05-14 11:03   ` Tvrtko Ursulin
2018-05-14 11:03     ` [igt-dev] [Intel-gfx] " Tvrtko Ursulin
2018-05-14 11:07     ` Chris Wilson
2018-05-14 11:07       ` [Intel-gfx] " Chris Wilson
2018-05-14  8:02 ` [PATCH i-g-t 6/6] tests/gem_exec_latency: New subtests for checking submission from RT tasks Chris Wilson
2018-05-14  8:02   ` [igt-dev] " Chris Wilson
2018-05-14 11:13   ` Tvrtko Ursulin
2018-05-14 11:13     ` [Intel-gfx] " Tvrtko Ursulin
2018-05-14 11:25     ` Chris Wilson
2018-05-14 11:25       ` [Intel-gfx] " Chris Wilson
2018-05-14  8:52 ` [igt-dev] ✓ Fi.CI.BAT: success for series starting with [i-g-t,1/6] overlay: Remove the miscalculation of window position Patchwork
2018-05-14 10:18 ` [PATCH i-g-t 1/6] " Tvrtko Ursulin
2018-05-14 10:18   ` [igt-dev] [Intel-gfx] " Tvrtko Ursulin
2018-05-14 10:31 ` [igt-dev] ✗ Fi.CI.IGT: failure for series starting with [i-g-t,1/6] " Patchwork
