* [Intel-gfx] [PATCH i-g-t 0/8] Fixes for gem_exec_capture
@ 2021-10-21 23:40 John.C.Harrison
  2021-10-21 23:40   ` [igt-dev] " John.C.Harrison
                   ` (9 more replies)
  0 siblings, 10 replies; 51+ messages in thread
From: John.C.Harrison @ 2021-10-21 23:40 UTC (permalink / raw)
  To: IGT-Dev; +Cc: Intel-GFX, John Harrison

From: John Harrison <John.C.Harrison@Intel.com>

Fix a bunch of issues with gem_exec_capture with the ultimate aim of
making it pass on GuC enabled platforms.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>


John Harrison (8):
  tests/i915/gem_exec_capture: Remove pointless assert
  tests/i915/gem_exec_capture: Cope with larger page sizes
  tests/i915/gem_exec_capture: Make the error decode a common helper
  tests/i915/gem_exec_capture: Use contexts and engines properly
  tests/i915/gem_exec_capture: Check for memory allocation failure
  lib/igt_sysfs: Support large files
  lib/igt_gt: Allow per engine reset testing
  tests/i915/gem_exec_capture: Update to support GuC based resets

 lib/igt_gt.c                  |  44 ++--
 lib/igt_gt.h                  |   1 +
 lib/igt_sysfs.c               |  17 +-
 tests/i915/gem_exec_capture.c | 472 ++++++++++++++++++++--------------
 4 files changed, 317 insertions(+), 217 deletions(-)

-- 
2.25.1



* [Intel-gfx] [PATCH i-g-t 1/8] tests/i915/gem_exec_capture: Remove pointless assert
  2021-10-21 23:40 [Intel-gfx] [PATCH i-g-t 0/8] Fixes for gem_exec_capture John.C.Harrison
@ 2021-10-21 23:40   ` John.C.Harrison
  2021-10-21 23:40   ` [igt-dev] " John.C.Harrison
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: John.C.Harrison @ 2021-10-21 23:40 UTC (permalink / raw)
  To: IGT-Dev; +Cc: Intel-GFX, John Harrison

From: John Harrison <John.C.Harrison@Intel.com>

The 'many' test ended with an 'assert(count)', presumably meant to
ensure that some objects were actually captured. However, 'count' is
the number of objects created, not the number captured. Plus, there
is already a 'require(count > 1)' at the start and count is invariant,
so the final assert can never fail and is pointless.

The general consensus appears to be that the test should not fail
regardless of how many blobs are captured, as low memory situations
can cut the capture short. So just remove the pointless assert
completely.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 tests/i915/gem_exec_capture.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
index 7e0a8b8ad..53649cdb2 100644
--- a/tests/i915/gem_exec_capture.c
+++ b/tests/i915/gem_exec_capture.c
@@ -524,7 +524,6 @@ static void many(int fd, int dir, uint64_t size, unsigned int flags)
 	}
 	igt_info("Captured %lu %"PRId64"-blobs out of a total of %lu\n",
 		 blobs, size >> 12, count);
-	igt_assert(count);
 
 	free(error);
 	free(offsets);
-- 
2.25.1



* [Intel-gfx] [PATCH i-g-t 2/8] tests/i915/gem_exec_capture: Cope with larger page sizes
  2021-10-21 23:40 [Intel-gfx] [PATCH i-g-t 0/8] Fixes for gem_exec_capture John.C.Harrison
@ 2021-10-21 23:40   ` John.C.Harrison
  2021-10-21 23:40   ` [igt-dev] " John.C.Harrison
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: John.C.Harrison @ 2021-10-21 23:40 UTC (permalink / raw)
  To: IGT-Dev; +Cc: Intel-GFX, John Harrison

From: John Harrison <John.C.Harrison@Intel.com>

At some point, support for page sizes larger than 4KB was added to the
i915 driver. This included adding an informational 'gtt_page_sizes'
line to the buffer entries in error capture logs. However, the error
capture test was not updated to skip this line, so it would silently
abort processing.
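For illustration only (not code from this series): the patch skips the
line by assuming exactly eight hex digits ('str += 19 + 8'). A slightly
more tolerant variant, sketched under the assumption that the line
always has the form "gtt_page_sizes = 0x<hex>" followed by a newline,
parses the value with strtoul() instead. skip_gtt_page_sizes() is a
hypothetical helper name.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Skip an optional "gtt_page_sizes = 0x<hex>\n" line in an error-capture
 * buffer entry. Returns the position just past that line if it is present
 * and well formed, the unchanged position if the line is absent, or NULL
 * if the line is present but malformed.
 */
static const char *skip_gtt_page_sizes(const char *str)
{
	static const char prefix[] = "gtt_page_sizes = 0x";
	const size_t len = sizeof(prefix) - 1;
	char *end;

	if (strncmp(str, prefix, len) != 0)
		return str; /* optional line not present */

	/* Consume however many hex digits are there instead of assuming 8. */
	strtoul(str + len, &end, 16);
	if (end == str + len || *end != '\n')
		return NULL; /* no digits, or not terminated by a newline */

	return end + 1;
}
```

The caller would then check the returned position for ':' or '~' exactly
as the existing code does.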

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 tests/i915/gem_exec_capture.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
index 53649cdb2..47ca64dd6 100644
--- a/tests/i915/gem_exec_capture.c
+++ b/tests/i915/gem_exec_capture.c
@@ -484,6 +484,12 @@ static void many(int fd, int dir, uint64_t size, unsigned int flags)
 		addr |= strtoul(str + 1, &str, 16);
 		igt_assert(*str++ == '\n');
 
+		/* gtt_page_sizes = 0x00010000 */
+		if (strncmp(str, "gtt_page_sizes = 0x", 19) == 0) {
+			str += 19 + 8;
+			igt_assert(*str++ == '\n');
+		}
+
 		if (!(*str == ':' || *str == '~'))
 			continue;
 
-- 
2.25.1



* [Intel-gfx] [PATCH i-g-t 3/8] tests/i915/gem_exec_capture: Make the error decode a common helper
  2021-10-21 23:40 [Intel-gfx] [PATCH i-g-t 0/8] Fixes for gem_exec_capture John.C.Harrison
  2021-10-21 23:40   ` [igt-dev] " John.C.Harrison
  2021-10-21 23:40   ` [igt-dev] " John.C.Harrison
@ 2021-10-21 23:40 ` John.C.Harrison
  2021-10-29  2:34     ` [igt-dev] " Matthew Brost
  2021-10-21 23:40   ` [igt-dev] " John.C.Harrison
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 51+ messages in thread
From: John.C.Harrison @ 2021-10-21 23:40 UTC (permalink / raw)
  To: IGT-Dev; +Cc: Intel-GFX, John Harrison

From: John Harrison <John.C.Harrison@Intel.com>

The decode of the error capture contents was happening in two
different sub-tests with two very different pieces of code. One was
much more extensive than the other: it actually decodes and verifies
the contents of the captured buffers rather than just checking the
address. So, move the more complete code into a common helper
function and use it in both places.
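A minimal sketch of the address lookup the common helper performs (not
code from the series): each captured "--- user = " address is mapped
back to its object with a binary search over an offsets array that is
assumed to be sorted by address. find_offset() is a hypothetical name;
unlike the loop in the patch, it returns -1 rather than leaving the
index unset when the address is missing or the array is empty.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the patch's struct offset: GPU address, object index, found flag. */
struct offset {
	uint64_t addr;
	unsigned long idx;
	bool found;
};

/*
 * Binary search for addr in offsets[], which must be sorted by ascending
 * addr. Returns the array position of the match, or -1 if addr is absent
 * (including when count is 0).
 */
static long find_offset(const struct offset *offsets, unsigned long count,
			uint64_t addr)
{
	unsigned long start = 0, end = count;

	while (end > start) {
		unsigned long mid = start + (end - start) / 2;

		if (offsets[mid].addr < addr)
			start = mid + 1;	/* match, if any, is above mid */
		else if (offsets[mid].addr > addr)
			end = mid;		/* match, if any, is below mid */
		else
			return (long)mid;
	}

	return -1;
}
```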

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 tests/i915/gem_exec_capture.c | 344 +++++++++++++++++-----------------
 1 file changed, 170 insertions(+), 174 deletions(-)

diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
index 47ca64dd6..c85c198f7 100644
--- a/tests/i915/gem_exec_capture.c
+++ b/tests/i915/gem_exec_capture.c
@@ -33,32 +33,175 @@
 
 IGT_TEST_DESCRIPTION("Check that we capture the user specified objects on a hang");
 
-static void check_error_state(int dir, struct drm_i915_gem_exec_object2 *obj)
+struct offset {
+	uint64_t addr;
+	unsigned long idx;
+	bool found;
+};
+
+static unsigned long zlib_inflate(uint32_t **ptr, unsigned long len)
+{
+	struct z_stream_s zstream;
+	void *out;
+
+	memset(&zstream, 0, sizeof(zstream));
+
+	zstream.next_in = (unsigned char *)*ptr;
+	zstream.avail_in = 4*len;
+
+	if (inflateInit(&zstream) != Z_OK)
+		return 0;
+
+	out = malloc(128*4096); /* approximate obj size */
+	zstream.next_out = out;
+	zstream.avail_out = 128*4096;
+
+	do {
+		switch (inflate(&zstream, Z_SYNC_FLUSH)) {
+		case Z_STREAM_END:
+			goto end;
+		case Z_OK:
+			break;
+		default:
+			inflateEnd(&zstream);
+			return 0;
+		}
+
+		if (zstream.avail_out)
+			break;
+
+		out = realloc(out, 2*zstream.total_out);
+		if (out == NULL) {
+			inflateEnd(&zstream);
+			return 0;
+		}
+
+		zstream.next_out = (unsigned char *)out + zstream.total_out;
+		zstream.avail_out = zstream.total_out;
+	} while (1);
+end:
+	inflateEnd(&zstream);
+	free(*ptr);
+	*ptr = out;
+	return zstream.total_out / 4;
+}
+
+static unsigned long
+ascii85_decode(char *in, uint32_t **out, bool inflate, char **end)
+{
+	unsigned long len = 0, size = 1024;
+
+	*out = realloc(*out, sizeof(uint32_t)*size);
+	if (*out == NULL)
+		return 0;
+
+	while (*in >= '!' && *in <= 'z') {
+		uint32_t v = 0;
+
+		if (len == size) {
+			size *= 2;
+			*out = realloc(*out, sizeof(uint32_t)*size);
+			if (*out == NULL)
+				return 0;
+		}
+
+		if (*in == 'z') {
+			in++;
+		} else {
+			v += in[0] - 33; v *= 85;
+			v += in[1] - 33; v *= 85;
+			v += in[2] - 33; v *= 85;
+			v += in[3] - 33; v *= 85;
+			v += in[4] - 33;
+			in += 5;
+		}
+		(*out)[len++] = v;
+	}
+	*end = in;
+
+	if (!inflate)
+		return len;
+
+	return zlib_inflate(out, len);
+}
+
+static int check_error_state(int dir, struct offset *obj_offsets, int obj_count,
+			     uint64_t obj_size, bool incremental)
 {
 	char *error, *str;
-	bool found = false;
+	int blobs = 0;
 
 	error = igt_sysfs_get(dir, "error");
 	igt_sysfs_set(dir, "error", "Begone!");
-
 	igt_assert(error);
 	igt_debug("%s\n", error);
 
 	/* render ring --- user = 0x00000000 ffffd000 */
-	for (str = error; (str = strstr(str, "--- user = ")); str++) {
+	for (str = error; (str = strstr(str, "--- user = ")); ) {
+		uint32_t *data = NULL;
 		uint64_t addr;
-		uint32_t hi, lo;
+		unsigned long i, sz;
+		unsigned long start;
+		unsigned long end;
 
-		igt_assert(sscanf(str, "--- user = 0x%x %x", &hi, &lo) == 2);
-		addr = hi;
+		if (strncmp(str, "--- user = 0x", 13))
+			break;
+		str += 13;
+		addr = strtoul(str, &str, 16);
 		addr <<= 32;
-		addr |= lo;
-		igt_assert_eq_u64(addr, obj->offset);
-		found = true;
+		addr |= strtoul(str + 1, &str, 16);
+		igt_assert(*str++ == '\n');
+
+		start = 0;
+		end = obj_count;
+		while (end > start) {
+			i = (end - start) / 2 + start;
+			if (obj_offsets[i].addr < addr)
+				start = i + 1;
+			else if (obj_offsets[i].addr > addr)
+				end = i;
+			else
+				break;
+		}
+		igt_assert(obj_offsets[i].addr == addr);
+		igt_assert(!obj_offsets[i].found);
+		obj_offsets[i].found = true;
+		igt_debug("offset:%"PRIx64", index:%ld\n",
+			  addr, obj_offsets[i].idx);
+
+		/* gtt_page_sizes = 0x00010000 */
+		if (strncmp(str, "gtt_page_sizes = 0x", 19) == 0) {
+			str += 19 + 8;
+			igt_assert(*str++ == '\n');
+		}
+
+		if (!(*str == ':' || *str == '~'))
+			continue;
+
+		igt_debug("blob:%.64s\n", str);
+		sz = ascii85_decode(str + 1, &data, *str == ':', &str);
+
+		igt_assert_eq(4 * sz, obj_size);
+		igt_assert(*str++ == '\n');
+		str = strchr(str, '-');
+
+		if (incremental) {
+			uint32_t expect;
+
+			expect = obj_offsets[i].idx * obj_size;
+			for (i = 0; i < sz; i++)
+				igt_assert_eq(data[i], expect++);
+		} else {
+			for (i = 0; i < sz; i++)
+				igt_assert_eq(data[i], 0);
+		}
+
+		blobs++;
+		free(data);
 	}
 
 	free(error);
-	igt_assert(found);
+	return blobs;
 }
 
 static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
@@ -73,6 +216,7 @@ static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
 	struct drm_i915_gem_relocation_entry reloc[2];
 	struct drm_i915_gem_execbuffer2 execbuf;
 	uint32_t *batch, *seqno;
+	struct offset offset;
 	int i;
 
 	memset(obj, 0, sizeof(obj));
@@ -168,7 +312,10 @@ static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
 
 	/* Check that only the buffer we marked is reported in the error */
 	igt_force_gpu_reset(fd);
-	check_error_state(dir, &obj[CAPTURE]);
+	memset(&offset, 0, sizeof(offset));
+	offset.addr = obj[CAPTURE].offset;
+	igt_assert_eq(check_error_state(dir, &offset, 1, target_size, false), 1);
+	igt_assert(offset.found);
 
 	gem_sync(fd, obj[BATCH].handle);
 
@@ -183,11 +330,12 @@ static void capture(int fd, int dir, const intel_ctx_t *ctx, unsigned ring)
 {
 	uint32_t handle;
 	uint64_t ahnd;
+	int obj_size = 4096;
 
-	handle = gem_create(fd, 4096);
+	handle = gem_create(fd, obj_size);
 	ahnd = get_reloc_ahnd(fd, ctx->id);
 
-	__capture1(fd, dir, ahnd, ctx, ring, handle, 4096);
+	__capture1(fd, dir, ahnd, ctx, ring, handle, obj_size);
 
 	gem_close(fd, handle);
 	put_ahnd(ahnd);
@@ -206,10 +354,8 @@ static int cmp(const void *A, const void *B)
 	return 0;
 }
 
-static struct offset {
-	uint64_t addr;
-	unsigned long idx;
-} *__captureN(int fd, int dir, uint64_t ahnd, unsigned ring,
+static struct offset *
+__captureN(int fd, int dir, uint64_t ahnd, unsigned ring,
 	      unsigned int size, int count,
 	      unsigned int flags)
 #define INCREMENTAL 0x1
@@ -357,98 +503,11 @@ static struct offset {
 	return offsets;
 }
 
-static unsigned long zlib_inflate(uint32_t **ptr, unsigned long len)
-{
-	struct z_stream_s zstream;
-	void *out;
-
-	memset(&zstream, 0, sizeof(zstream));
-
-	zstream.next_in = (unsigned char *)*ptr;
-	zstream.avail_in = 4*len;
-
-	if (inflateInit(&zstream) != Z_OK)
-		return 0;
-
-	out = malloc(128*4096); /* approximate obj size */
-	zstream.next_out = out;
-	zstream.avail_out = 128*4096;
-
-	do {
-		switch (inflate(&zstream, Z_SYNC_FLUSH)) {
-		case Z_STREAM_END:
-			goto end;
-		case Z_OK:
-			break;
-		default:
-			inflateEnd(&zstream);
-			return 0;
-		}
-
-		if (zstream.avail_out)
-			break;
-
-		out = realloc(out, 2*zstream.total_out);
-		if (out == NULL) {
-			inflateEnd(&zstream);
-			return 0;
-		}
-
-		zstream.next_out = (unsigned char *)out + zstream.total_out;
-		zstream.avail_out = zstream.total_out;
-	} while (1);
-end:
-	inflateEnd(&zstream);
-	free(*ptr);
-	*ptr = out;
-	return zstream.total_out / 4;
-}
-
-static unsigned long
-ascii85_decode(char *in, uint32_t **out, bool inflate, char **end)
-{
-	unsigned long len = 0, size = 1024;
-
-	*out = realloc(*out, sizeof(uint32_t)*size);
-	if (*out == NULL)
-		return 0;
-
-	while (*in >= '!' && *in <= 'z') {
-		uint32_t v = 0;
-
-		if (len == size) {
-			size *= 2;
-			*out = realloc(*out, sizeof(uint32_t)*size);
-			if (*out == NULL)
-				return 0;
-		}
-
-		if (*in == 'z') {
-			in++;
-		} else {
-			v += in[0] - 33; v *= 85;
-			v += in[1] - 33; v *= 85;
-			v += in[2] - 33; v *= 85;
-			v += in[3] - 33; v *= 85;
-			v += in[4] - 33;
-			in += 5;
-		}
-		(*out)[len++] = v;
-	}
-	*end = in;
-
-	if (!inflate)
-		return len;
-
-	return zlib_inflate(out, len);
-}
-
 static void many(int fd, int dir, uint64_t size, unsigned int flags)
 {
 	uint64_t ram, gtt, ahnd;
 	unsigned long count, blobs;
 	struct offset *offsets;
-	char *error, *str;
 
 	gtt = gem_aperture_size(fd) / size;
 	ram = (intel_get_avail_ram_mb() << 20) / size;
@@ -463,75 +522,10 @@ static void many(int fd, int dir, uint64_t size, unsigned int flags)
 
 	offsets = __captureN(fd, dir, ahnd, 0, size, count, flags);
 
-	error = igt_sysfs_get(dir, "error");
-	igt_sysfs_set(dir, "error", "Begone!");
-	igt_assert(error);
-
-	blobs = 0;
-	/* render ring --- user = 0x00000000 ffffd000 */
-	str = strstr(error, "--- user = ");
-	while (str) {
-		uint32_t *data = NULL;
-		unsigned long i, sz;
-		uint64_t addr;
-
-		if (strncmp(str, "--- user = 0x", 13))
-			break;
-
-		str += 13;
-		addr = strtoul(str, &str, 16);
-		addr <<= 32;
-		addr |= strtoul(str + 1, &str, 16);
-		igt_assert(*str++ == '\n');
-
-		/* gtt_page_sizes = 0x00010000 */
-		if (strncmp(str, "gtt_page_sizes = 0x", 19) == 0) {
-			str += 19 + 8;
-			igt_assert(*str++ == '\n');
-		}
-
-		if (!(*str == ':' || *str == '~'))
-			continue;
-
-		igt_debug("blob:%.64s\n", str);
-		sz = ascii85_decode(str + 1, &data, *str == ':', &str);
-		igt_assert_eq(4 * sz, size);
-		igt_assert(*str++ == '\n');
-		str = strchr(str, '-');
-
-		if (flags & INCREMENTAL) {
-			unsigned long start = 0;
-			unsigned long end = count;
-			uint32_t expect;
-
-			while (end > start) {
-				i = (end - start) / 2 + start;
-				if (offsets[i].addr < addr)
-					start = i + 1;
-				else if (offsets[i].addr > addr)
-					end = i;
-				else
-					break;
-			}
-			igt_assert(offsets[i].addr == addr);
-			igt_debug("offset:%"PRIx64", index:%ld\n",
-				  addr, offsets[i].idx);
-
-			expect = offsets[i].idx * size;
-			for (i = 0; i < sz; i++)
-				igt_assert_eq(data[i], expect++);
-		} else {
-			for (i = 0; i < sz; i++)
-				igt_assert_eq(data[i], 0);
-		}
-
-		blobs++;
-		free(data);
-	}
+	blobs = check_error_state(dir, offsets, count, size, !!(flags & INCREMENTAL));
 	igt_info("Captured %lu %"PRId64"-blobs out of a total of %lu\n",
 		 blobs, size >> 12, count);
 
-	free(error);
 	free(offsets);
 	put_ahnd(ahnd);
 }
@@ -625,12 +619,14 @@ static void userptr(int fd, int dir)
 	uint32_t handle;
 	uint64_t ahnd;
 	void *ptr;
+	int obj_size = 4096;
 
-	igt_assert(posix_memalign(&ptr, 4096, 4096) == 0);
-	igt_require(__gem_userptr(fd, ptr, 4096, 0, 0, &handle) == 0);
+	igt_assert(posix_memalign(&ptr, obj_size, obj_size) == 0);
+	memset(ptr, 0, obj_size);
+	igt_require(__gem_userptr(fd, ptr, obj_size, 0, 0, &handle) == 0);
 	ahnd = get_reloc_ahnd(fd, ctx->id);
 
-	__capture1(fd, dir, ahnd, intel_ctx_0(fd), 0, handle, 4096);
+	__capture1(fd, dir, ahnd, intel_ctx_0(fd), 0, handle, obj_size);
 
 	gem_close(fd, handle);
 	put_ahnd(ahnd);
-- 
2.25.1
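For readers unfamiliar with the capture encoding, this sketch (not from
the series) isolates the per-group step of ascii85_decode() above: 'z'
stands for an all-zero word, otherwise five characters, each offset from
'!', form a base-85 number with the most significant digit first.
ascii85_group() is a hypothetical helper; the zlib inflate step applied
to ':'-prefixed blobs is not shown.

```c
#include <stdint.h>

/*
 * Decode one ascii85 group into a 32-bit word. Returns the number of
 * input characters consumed: 1 for the 'z' zero shorthand, else 5.
 */
static int ascii85_group(const char *in, uint32_t *out)
{
	uint32_t v = 0;

	if (in[0] == 'z') {
		*out = 0;
		return 1;
	}

	/* Horner's rule: same result as the v += d; v *= 85 chain above. */
	for (int i = 0; i < 5; i++)
		v = v * 85 + (uint32_t)(in[i] - '!');

	*out = v;
	return 5;
}
```

For example, "s8W-!" decodes to 0xffffffff and "!!!!!" to 0.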



* [Intel-gfx] [PATCH i-g-t 4/8] tests/i915/gem_exec_capture: Use contexts and engines properly
  2021-10-21 23:40 [Intel-gfx] [PATCH i-g-t 0/8] Fixes for gem_exec_capture John.C.Harrison
@ 2021-10-21 23:40   ` John.C.Harrison
  2021-10-21 23:40   ` [igt-dev] " John.C.Harrison
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: John.C.Harrison @ 2021-10-21 23:40 UTC (permalink / raw)
  To: IGT-Dev; +Cc: Intel-GFX, John Harrison

From: John Harrison <John.C.Harrison@Intel.com>

Some of the capture tests were using explicit contexts, some were not.
Some were poking the per-engine pre-emption timeout, some were not.
This would lead to sporadic failures due to random timeouts, contexts
being banned depending upon how many subtests were run and/or how many
engines a given platform has, and other such failures.

So, update all tests to be consistent.
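A rough sketch of what the fast hang-detection half of configure_hangs()
amounts to, assuming gem_engine_property_printf() writes the value to a
same-named attribute file in the engine's sysfs directory. The path
layout and both helpers here are hypothetical, and igt_allow_hang()
(engine resets and ban handling) is not modeled.

```c
#include <stdio.h>

/*
 * Write an integer to "<engine_dir>/<attr>". Returns 0 on success, -1 on
 * error. engine_dir is assumed to be the engine's sysfs directory.
 */
static int set_engine_property(const char *engine_dir, const char *attr,
			       int value)
{
	char path[512];
	FILE *f;
	int ok;

	if (snprintf(path, sizeof(path), "%s/%s", engine_dir, attr) >=
	    (int)sizeof(path))
		return -1; /* path truncated */

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%d", value) > 0;
	/* sysfs may only report a rejected value on flush, so check fclose. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}

/* Hypothetical: the same timeouts configure_hangs() sets, in ms. */
static int configure_fast_hangs(const char *engine_dir)
{
	if (set_engine_property(engine_dir, "preempt_timeout_ms", 250))
		return -1;
	return set_engine_property(engine_dir, "heartbeat_interval_ms", 500);
}
```

On real hardware engine_dir would be something like
/sys/class/drm/card0/engine/rcs0, and writing it requires root.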

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 tests/i915/gem_exec_capture.c | 80 +++++++++++++++++++++++++----------
 1 file changed, 58 insertions(+), 22 deletions(-)

diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
index c85c198f7..e373d24ed 100644
--- a/tests/i915/gem_exec_capture.c
+++ b/tests/i915/gem_exec_capture.c
@@ -204,8 +204,19 @@ static int check_error_state(int dir, struct offset *obj_offsets, int obj_count,
 	return blobs;
 }
 
+static void configure_hangs(int fd, const struct intel_execution_engine2 *e, int ctxt_id)
+{
+	/* Ensure fast hang detection */
+	gem_engine_property_printf(fd, e->name, "preempt_timeout_ms", "%d", 250);
+	gem_engine_property_printf(fd, e->name, "heartbeat_interval_ms", "%d", 500);
+
+	/* Allow engine based resets and disable banning */
+	igt_allow_hang(fd, ctxt_id, HANG_ALLOW_CAPTURE);
+}
+
 static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
-		       unsigned ring, uint32_t target, uint64_t target_size)
+		       const struct intel_execution_engine2 *e,
+		       uint32_t target, uint64_t target_size)
 {
 	const unsigned int gen = intel_gen(intel_get_drm_devid(fd));
 	struct drm_i915_gem_exec_object2 obj[4];
@@ -219,6 +230,8 @@ static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
 	struct offset offset;
 	int i;
 
+	configure_hangs(fd, e, ctx->id);
+
 	memset(obj, 0, sizeof(obj));
 	obj[SCRATCH].handle = gem_create(fd, 4096);
 	obj[SCRATCH].flags = EXEC_OBJECT_WRITE;
@@ -297,7 +310,7 @@ static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
 	memset(&execbuf, 0, sizeof(execbuf));
 	execbuf.buffers_ptr = (uintptr_t)obj;
 	execbuf.buffer_count = ARRAY_SIZE(obj);
-	execbuf.flags = ring;
+	execbuf.flags = e->flags;
 	if (gen > 3 && gen < 6)
 		execbuf.flags |= I915_EXEC_SECURE;
 	execbuf.rsvd1 = ctx->id;
@@ -326,7 +339,8 @@ static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
 	gem_close(fd, obj[SCRATCH].handle);
 }
 
-static void capture(int fd, int dir, const intel_ctx_t *ctx, unsigned ring)
+static void capture(int fd, int dir, const intel_ctx_t *ctx,
+		    const struct intel_execution_engine2 *e)
 {
 	uint32_t handle;
 	uint64_t ahnd;
@@ -335,7 +349,7 @@ static void capture(int fd, int dir, const intel_ctx_t *ctx, unsigned ring)
 	handle = gem_create(fd, obj_size);
 	ahnd = get_reloc_ahnd(fd, ctx->id);
 
-	__capture1(fd, dir, ahnd, ctx, ring, handle, obj_size);
+	__capture1(fd, dir, ahnd, ctx, e, handle, obj_size);
 
 	gem_close(fd, handle);
 	put_ahnd(ahnd);
@@ -355,9 +369,9 @@ static int cmp(const void *A, const void *B)
 }
 
 static struct offset *
-__captureN(int fd, int dir, uint64_t ahnd, unsigned ring,
-	      unsigned int size, int count,
-	      unsigned int flags)
+__captureN(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
+	   const struct intel_execution_engine2 *e,
+	   unsigned int size, int count, unsigned int flags)
 #define INCREMENTAL 0x1
 #define ASYNC 0x2
 {
@@ -369,6 +383,8 @@ __captureN(int fd, int dir, uint64_t ahnd, unsigned ring,
 	struct offset *offsets;
 	int i;
 
+	configure_hangs(fd, e, ctx->id);
+
 	offsets = calloc(count, sizeof(*offsets));
 	igt_assert(offsets);
 
@@ -470,9 +486,10 @@ __captureN(int fd, int dir, uint64_t ahnd, unsigned ring,
 	memset(&execbuf, 0, sizeof(execbuf));
 	execbuf.buffers_ptr = (uintptr_t)obj;
 	execbuf.buffer_count = count + 2;
-	execbuf.flags = ring;
+	execbuf.flags = e->flags;
 	if (gen > 3 && gen < 6)
 		execbuf.flags |= I915_EXEC_SECURE;
+	execbuf.rsvd1 = ctx->id;
 
 	igt_assert(!READ_ONCE(*seqno));
 	gem_execbuf(fd, &execbuf);
@@ -505,10 +522,20 @@ __captureN(int fd, int dir, uint64_t ahnd, unsigned ring,
 
 static void many(int fd, int dir, uint64_t size, unsigned int flags)
 {
+	const struct intel_execution_engine2 *e;
+	const intel_ctx_t *ctx;
 	uint64_t ram, gtt, ahnd;
 	unsigned long count, blobs;
 	struct offset *offsets;
 
+	/* Find the first available engine: */
+	ctx = intel_ctx_create_all_physical(fd);
+	igt_assert(ctx);
+	for_each_ctx_engine(fd, ctx, e)
+		for_each_if(gem_class_can_store_dword(fd, e->class))
+			break;
+	igt_assert(e);
+
 	gtt = gem_aperture_size(fd) / size;
 	ram = (intel_get_avail_ram_mb() << 20) / size;
 	igt_debug("Available objects in GTT:%"PRIu64", RAM:%"PRIu64"\n",
@@ -518,9 +545,9 @@ static void many(int fd, int dir, uint64_t size, unsigned int flags)
 	igt_require(count > 1);
 
 	intel_require_memory(count, size, CHECK_RAM);
-	ahnd = get_reloc_ahnd(fd, 0);
+	ahnd = get_reloc_ahnd(fd, ctx->id);
 
-	offsets = __captureN(fd, dir, ahnd, 0, size, count, flags);
+	offsets = __captureN(fd, dir, ahnd, ctx, e, size, count, flags);
 
 	blobs = check_error_state(dir, offsets, count, size, !!(flags & INCREMENTAL));
 	igt_info("Captured %lu %"PRId64"-blobs out of a total of %lu\n",
@@ -531,7 +558,7 @@ static void many(int fd, int dir, uint64_t size, unsigned int flags)
 }
 
 static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
-		    unsigned ring, const char *name)
+		    const struct intel_execution_engine2 *e)
 {
 	const uint32_t bbe = MI_BATCH_BUFFER_END;
 	struct drm_i915_gem_exec_object2 obj = {
@@ -540,7 +567,7 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
 	struct drm_i915_gem_execbuffer2 execbuf = {
 		.buffers_ptr = to_user_pointer(&obj),
 		.buffer_count = 1,
-		.flags = ring,
+		.flags = e->flags,
 		.rsvd1 = ctx->id,
 	};
 	int64_t timeout = NSEC_PER_SEC; /* 1s, feeling generous, blame debug */
@@ -555,10 +582,6 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
 	igt_require(igt_params_set(fd, "reset", "%u", -1)); /* engine resets! */
 	igt_require(gem_gpu_reset_type(fd) > 1);
 
-	/* Needs to be fast enough for the hangcheck to return within 1s */
-	igt_require(gem_engine_property_printf(fd, name, "preempt_timeout_ms", "%d", 0) > 0);
-	gem_engine_property_printf(fd, name, "preempt_timeout_ms", "%d", 500);
-
 	gtt = gem_aperture_size(fd) / size;
 	ram = (intel_get_avail_ram_mb() << 20) / size;
 	igt_debug("Available objects in GTT:%"PRIu64", RAM:%"PRIu64"\n",
@@ -576,15 +599,19 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
 
 	igt_assert(pipe(link) == 0);
 	igt_fork(child, 1) {
+		const intel_ctx_t *ctx2;
 		fd = gem_reopen_driver(fd);
 		igt_debug("Submitting large capture [%ld x %dMiB objects]\n",
 			  count, (int)(size >> 20));
 
+		ctx2 = intel_ctx_create_all_physical(fd);
+		igt_assert(ctx2);
+
 		intel_allocator_init();
 		/* Reopen the allocator in the new process. */
-		ahnd = get_reloc_ahnd(fd, 0);
+		ahnd = get_reloc_ahnd(fd, ctx2->id);
 
-		free(__captureN(fd, dir, ahnd, ring, size, count, ASYNC));
+		free(__captureN(fd, dir, ahnd, ctx2, e, size, count, ASYNC));
 		put_ahnd(ahnd);
 
 		write(link[1], &fd, sizeof(fd)); /* wake the parent up */
@@ -615,18 +642,27 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
 
 static void userptr(int fd, int dir)
 {
-	const intel_ctx_t *ctx = intel_ctx_0(fd);
+	const struct intel_execution_engine2 *e;
+	const intel_ctx_t *ctx;
 	uint32_t handle;
 	uint64_t ahnd;
 	void *ptr;
 	int obj_size = 4096;
 
+	/* Find the first available engine: */
+	ctx = intel_ctx_create_all_physical(fd);
+	igt_assert(ctx);
+	for_each_ctx_engine(fd, ctx, e)
+		for_each_if(gem_class_can_store_dword(fd, e->class))
+			break;
+	igt_assert(e);
+
 	igt_assert(posix_memalign(&ptr, obj_size, obj_size) == 0);
 	memset(ptr, 0, obj_size);
 	igt_require(__gem_userptr(fd, ptr, obj_size, 0, 0, &handle) == 0);
 	ahnd = get_reloc_ahnd(fd, ctx->id);
 
-	__capture1(fd, dir, ahnd, intel_ctx_0(fd), 0, handle, obj_size);
+	__capture1(fd, dir, ahnd, ctx, e, handle, obj_size);
 
 	gem_close(fd, handle);
 	put_ahnd(ahnd);
@@ -684,7 +720,7 @@ igt_main
 	}
 
 	test_each_engine("capture", fd, ctx, e)
-		capture(fd, dir, ctx, e->flags);
+		capture(fd, dir, ctx, e);
 
 	igt_subtest_f("many-4K-zero") {
 		igt_require(gem_can_store_dword(fd, 0));
@@ -719,7 +755,7 @@ igt_main
 	}
 
 	test_each_engine("pi", fd, ctx, e)
-		prioinv(fd, dir, ctx, e->flags, e->name);
+		prioinv(fd, dir, ctx, e);
 
 	igt_fixture {
 		close(dir);
-- 
2.25.1



-		    unsigned ring, const char *name)
+		    const struct intel_execution_engine2 *e)
 {
 	const uint32_t bbe = MI_BATCH_BUFFER_END;
 	struct drm_i915_gem_exec_object2 obj = {
@@ -540,7 +567,7 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
 	struct drm_i915_gem_execbuffer2 execbuf = {
 		.buffers_ptr = to_user_pointer(&obj),
 		.buffer_count = 1,
-		.flags = ring,
+		.flags = e->flags,
 		.rsvd1 = ctx->id,
 	};
 	int64_t timeout = NSEC_PER_SEC; /* 1s, feeling generous, blame debug */
@@ -555,10 +582,6 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
 	igt_require(igt_params_set(fd, "reset", "%u", -1)); /* engine resets! */
 	igt_require(gem_gpu_reset_type(fd) > 1);
 
-	/* Needs to be fast enough for the hangcheck to return within 1s */
-	igt_require(gem_engine_property_printf(fd, name, "preempt_timeout_ms", "%d", 0) > 0);
-	gem_engine_property_printf(fd, name, "preempt_timeout_ms", "%d", 500);
-
 	gtt = gem_aperture_size(fd) / size;
 	ram = (intel_get_avail_ram_mb() << 20) / size;
 	igt_debug("Available objects in GTT:%"PRIu64", RAM:%"PRIu64"\n",
@@ -576,15 +599,19 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
 
 	igt_assert(pipe(link) == 0);
 	igt_fork(child, 1) {
+		const intel_ctx_t *ctx2;
 		fd = gem_reopen_driver(fd);
 		igt_debug("Submitting large capture [%ld x %dMiB objects]\n",
 			  count, (int)(size >> 20));
 
+		ctx2 = intel_ctx_create_all_physical(fd);
+		igt_assert(ctx2);
+
 		intel_allocator_init();
 		/* Reopen the allocator in the new process. */
-		ahnd = get_reloc_ahnd(fd, 0);
+		ahnd = get_reloc_ahnd(fd, ctx2->id);
 
-		free(__captureN(fd, dir, ahnd, ring, size, count, ASYNC));
+		free(__captureN(fd, dir, ahnd, ctx2, e, size, count, ASYNC));
 		put_ahnd(ahnd);
 
 		write(link[1], &fd, sizeof(fd)); /* wake the parent up */
@@ -615,18 +642,27 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
 
 static void userptr(int fd, int dir)
 {
-	const intel_ctx_t *ctx = intel_ctx_0(fd);
+	const struct intel_execution_engine2 *e;
+	const intel_ctx_t *ctx;
 	uint32_t handle;
 	uint64_t ahnd;
 	void *ptr;
 	int obj_size = 4096;
 
+	/* Find the first available engine: */
+	ctx = intel_ctx_create_all_physical(fd);
+	igt_assert(ctx);
+	for_each_ctx_engine(fd, ctx, e)
+		for_each_if(gem_class_can_store_dword(fd, e->class))
+			break;
+	igt_assert(e);
+
 	igt_assert(posix_memalign(&ptr, obj_size, obj_size) == 0);
 	memset(ptr, 0, obj_size);
 	igt_require(__gem_userptr(fd, ptr, obj_size, 0, 0, &handle) == 0);
 	ahnd = get_reloc_ahnd(fd, ctx->id);
 
-	__capture1(fd, dir, ahnd, intel_ctx_0(fd), 0, handle, obj_size);
+	__capture1(fd, dir, ahnd, ctx, e, handle, obj_size);
 
 	gem_close(fd, handle);
 	put_ahnd(ahnd);
@@ -684,7 +720,7 @@ igt_main
 	}
 
 	test_each_engine("capture", fd, ctx, e)
-		capture(fd, dir, ctx, e->flags);
+		capture(fd, dir, ctx, e);
 
 	igt_subtest_f("many-4K-zero") {
 		igt_require(gem_can_store_dword(fd, 0));
@@ -719,7 +755,7 @@ igt_main
 	}
 
 	test_each_engine("pi", fd, ctx, e)
-		prioinv(fd, dir, ctx, e->flags, e->name);
+		prioinv(fd, dir, ctx, e);
 
 	igt_fixture {
 		close(dir);
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Intel-gfx] [PATCH i-g-t 5/8] tests/i915/gem_exec_capture: Check for memory allocation failure
  2021-10-21 23:40 [Intel-gfx] [PATCH i-g-t 0/8] Fixes for gem_exec_capture John.C.Harrison
                   ` (3 preceding siblings ...)
  2021-10-21 23:40   ` [igt-dev] " John.C.Harrison
@ 2021-10-21 23:40 ` John.C.Harrison
  2021-10-29  2:20   ` Matthew Brost
  2021-11-03 14:00     ` [igt-dev] " Tvrtko Ursulin
  2021-10-21 23:40   ` [igt-dev] " John.C.Harrison
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 51+ messages in thread
From: John.C.Harrison @ 2021-10-21 23:40 UTC (permalink / raw)
  To: IGT-Dev; +Cc: Intel-GFX, John Harrison

From: John Harrison <John.C.Harrison@Intel.com>

The sysfs file read helper does not actually report any errors if a
realloc fails. It just silently returns a 'valid' but truncated
buffer. This then leads to the decode of the buffer failing in random
ways. So, add a check for ENOMEM being generated during the read.
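The sketch below is a minimal, self-contained illustration of why the errno check works. It assumes a growing-buffer read loop shaped like the one in igt_sysfs_get(); read_truncating() and read_checked() are hypothetical names, not IGT API. When realloc() fails, the loop simply stops and returns a truncated string. POSIX realloc() still leaves errno set to ENOMEM, so a caller can detect the failure. Clearing errno first matters because no library call ever sets errno back to zero. Without that, a stale ENOMEM from some earlier, unrelated failure would be misreported.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for igt_sysfs_get(): doubles its buffer with
 * realloc() and, like the original, stops early (returning a truncated
 * but non-NULL string) if realloc() fails. POSIX realloc() sets errno
 * to ENOMEM on failure, so the caller can still detect it.
 */
static char *read_truncating(int fd)
{
	size_t len = 64, offset = 0;
	char *buf = malloc(len);

	if (!buf)
		return NULL;

	for (;;) {
		ssize_t ret = read(fd, buf + offset, len - offset - 1);

		if (ret <= 0)
			break;                  /* EOF or read error */
		offset += (size_t)ret;

		if (offset == len - 1) {        /* buffer full: grow it */
			char *newbuf = realloc(buf, 2 * len);

			if (!newbuf)
				break;          /* truncated; errno == ENOMEM */
			buf = newbuf;
			len *= 2;
		}
	}
	buf[offset] = '\0';
	return buf;
}

/* Caller pattern from the patch: clear errno first, then check it. */
static char *read_checked(int fd, int *oom)
{
	char *s;

	errno = 0;                      /* never reset to 0 by library calls */
	s = read_truncating(fd);
	*oom = (errno == ENOMEM);
	return s;
}
```

Any pipe() or regular file descriptor can be passed to read_checked() to try it out.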

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 tests/i915/gem_exec_capture.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
index e373d24ed..8997125ee 100644
--- a/tests/i915/gem_exec_capture.c
+++ b/tests/i915/gem_exec_capture.c
@@ -131,9 +131,11 @@ static int check_error_state(int dir, struct offset *obj_offsets, int obj_count,
 	char *error, *str;
 	int blobs = 0;
 
+	errno = 0;
 	error = igt_sysfs_get(dir, "error");
 	igt_sysfs_set(dir, "error", "Begone!");
 	igt_assert(error);
+	igt_assert(errno != ENOMEM);
 	igt_debug("%s\n", error);
 
 	/* render ring --- user = 0x00000000 ffffd000 */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Intel-gfx] [PATCH i-g-t 6/8] lib/igt_sysfs: Support large files
  2021-10-21 23:40 [Intel-gfx] [PATCH i-g-t 0/8] Fixes for gem_exec_capture John.C.Harrison
@ 2021-10-21 23:40   ` John.C.Harrison
  2021-10-21 23:40   ` [igt-dev] " John.C.Harrison
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: John.C.Harrison @ 2021-10-21 23:40 UTC (permalink / raw)
  To: IGT-Dev; +Cc: Intel-GFX, John Harrison

From: John Harrison <John.C.Harrison@Intel.com>

The sysfs helper functions were all using basic 'int' data types for
sizes, offsets, etc. when reading from sysfs. This works fine for
small files, but not for large error capture logs (which can be
gigabytes in size).
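For scale, INT_MAX is 2147483647. Any error state of 2 GiB or more therefore cannot be described by an int length or offset. Below is a minimal sketch of a readN()-style loop that uses the wider types from this patch: size_t for lengths and offsets, ssize_t for the read() result. read_full() and fits_in_int() are hypothetical names, not IGT API. Like readN(), the loop retries on EINTR/EAGAIN. It returns the byte count if anything was read, otherwise the final read() result (0 at EOF, -1 on error, rather than readN()'s -errno).

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * readN()-style loop with size_t lengths/offsets and an ssize_t result:
 * reads until len bytes have arrived, EOF, or a non-retryable error.
 */
static ssize_t read_full(int fd, char *buf, size_t len)
{
	size_t total = 0;
	ssize_t ret = 0;

	while (total < len) {
		ret = read(fd, buf + total, len - total);
		if (ret < 0) {
			if (errno == EINTR || errno == EAGAIN)
				continue;       /* retry, as readN() does */
			break;
		}
		if (ret == 0)
			break;                  /* EOF */
		total += (size_t)ret;
	}
	/* Bytes read if any, otherwise the EOF/error result. */
	return total ? (ssize_t)total : ret;
}

/* Can a size of this many bytes be held in an int without overflow? */
static int fits_in_int(uint64_t bytes)
{
	return bytes <= (uint64_t)INT_MAX;
}
```

With these types, a 3 GiB capture (3221225472 bytes) is handled by the loop even though fits_in_int() reports that it would overflow an int.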

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 lib/igt_sysfs.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/lib/igt_sysfs.c b/lib/igt_sysfs.c
index 6919ac361..ee75e3ef1 100644
--- a/lib/igt_sysfs.c
+++ b/lib/igt_sysfs.c
@@ -53,9 +53,11 @@
  * provides basic support for like igt_sysfs_open().
  */
 
-static int readN(int fd, char *buf, int len)
+static ssize_t readN(int fd, char *buf, size_t len)
 {
-	int ret, total = 0;
+	ssize_t ret;
+	size_t total = 0;
+
 	do {
 		ret = read(fd, buf + total, len - total);
 		if (ret < 0)
@@ -69,9 +71,11 @@ static int readN(int fd, char *buf, int len)
 	return total ?: ret;
 }
 
-static int writeN(int fd, const char *buf, int len)
+static ssize_t writeN(int fd, const char *buf, size_t len)
 {
-	int ret, total = 0;
+	ssize_t ret;
+	size_t total = 0;
+
 	do {
 		ret = write(fd, buf + total, len - total);
 		if (ret < 0)
@@ -218,8 +222,9 @@ bool igt_sysfs_set(int dir, const char *attr, const char *value)
 char *igt_sysfs_get(int dir, const char *attr)
 {
 	char *buf;
-	int len, offset, rem;
-	int ret, fd;
+	size_t len, offset, rem;
+	ssize_t ret;
+	int fd;
 
 	fd = openat(dir, attr, O_RDONLY);
 	if (igt_debug_on(fd < 0))
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Intel-gfx] [PATCH i-g-t 7/8] lib/igt_gt: Allow per engine reset testing
  2021-10-21 23:40 [Intel-gfx] [PATCH i-g-t 0/8] Fixes for gem_exec_capture John.C.Harrison
@ 2021-10-21 23:40   ` John.C.Harrison
  2021-10-21 23:40   ` [igt-dev] " John.C.Harrison
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: John.C.Harrison @ 2021-10-21 23:40 UTC (permalink / raw)
  To: IGT-Dev; +Cc: Intel-GFX, John Harrison

From: John Harrison <John.C.Harrison@Intel.com>

With GuC submission, engine resets are handled entirely within GuC
rather than within i915. Traditionally, IGT has disallowed engine-based
resets because they don't send the uevent which IGT uses to
check for unexpected resets. However, it is important to be able to
test all reset mechanisms that can be used, so allow engine-based
resets to be enabled.
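The caching scheme in has_gpu_reset()/has_engine_reset() can be modelled in isolation. This sketch assumes the reset level follows the i915 'reset' module parameter convention (0 = disabled, 1 = full GPU reset, 2 = engine reset), with -1 meaning the GETPARAM query is unsupported. query_fixed() is a hypothetical stand-in for gem_gpu_reset_type(), and struct reset_cache replaces the file-scope reset_query_once variable. The invalidation step explains the new 'reset_query_once = -1' after igt_params_set(). Without it, a level cached before the parameter was written would be reported stale.

```c
#include <stdbool.h>

/*
 * Assumed level meanings: 0 = no reset, 1 = full GT reset only,
 * 2 = per-engine reset also available, -1 = query unsupported.
 */
typedef int (*reset_query_fn)(void *cookie);

struct reset_cache {
	int level;              /* -1 until queried */
	bool gen_ge_5;          /* fallback for kernels without the query */
};

/* Hypothetical query that just returns the int behind cookie. */
static int query_fixed(void *cookie)
{
	return *(const int *)cookie;
}

static int reset_level(struct reset_cache *c, reset_query_fn q, void *cookie)
{
	if (c->level < 0) {
		c->level = q(cookie);
		if (c->level == -1)     /* very old kernel: guess by gen */
			c->level = c->gen_ge_5 ? 1 : 0;
	}
	return c->level;
}

static bool cache_has_gpu_reset(struct reset_cache *c, reset_query_fn q,
				void *cookie)
{
	return reset_level(c, q, cookie) > 0;
}

static bool cache_has_engine_reset(struct reset_cache *c, reset_query_fn q,
				   void *cookie)
{
	return reset_level(c, q, cookie) > 1;
}

/* Call after writing the "reset" parameter so the next query is fresh. */
static void reset_cache_invalidate(struct reset_cache *c)
{
	c->level = -1;
}
```

Note that after the level changes, the cached answer stays stale until reset_cache_invalidate() is called. That is exactly the situation the patch guards against.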

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 lib/igt_gt.c | 44 +++++++++++++++++++++++++++++---------------
 lib/igt_gt.h |  1 +
 2 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/lib/igt_gt.c b/lib/igt_gt.c
index a0ba04cc1..7c7df95ee 100644
--- a/lib/igt_gt.c
+++ b/lib/igt_gt.c
@@ -56,23 +56,28 @@
  * engines.
  */
 
+static int reset_query_once = -1;
+
 static bool has_gpu_reset(int fd)
 {
-	static int once = -1;
-	if (once < 0) {
-		struct drm_i915_getparam gp;
-		int val = 0;
-
-		memset(&gp, 0, sizeof(gp));
-		gp.param = 35; /* HAS_GPU_RESET */
-		gp.value = &val;
-
-		if (ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
-			once = intel_gen(intel_get_drm_devid(fd)) >= 5;
-		else
-			once = val > 0;
+	if (reset_query_once < 0) {
+		reset_query_once = gem_gpu_reset_type(fd);
+
+		/* Very old kernels did not support the query */
+		if (reset_query_once == -1)
+			reset_query_once =
+			      (intel_gen(intel_get_drm_devid(fd)) >= 5) ? 1 : 0;
 	}
-	return once;
+
+	return reset_query_once > 0;
+}
+
+static bool has_engine_reset(int fd)
+{
+	if (reset_query_once < 0)
+		has_gpu_reset(fd);
+
+	return reset_query_once > 1;
 }
 
 static void eat_error_state(int dev)
@@ -176,7 +181,11 @@ igt_hang_t igt_allow_hang(int fd, unsigned ctx, unsigned flags)
 		igt_skip("hang injection disabled by user [IGT_HANG=0]\n");
 	gem_context_require_bannable(fd);
 
-	allow_reset = 1;
+	if (flags & HANG_WANT_ENGINE_RESET)
+		allow_reset = 2;
+	else
+		allow_reset = 1;
+
 	if ((flags & HANG_ALLOW_CAPTURE) == 0) {
 		param.param = I915_CONTEXT_PARAM_NO_ERROR_CAPTURE;
 		param.value = 1;
@@ -187,11 +196,16 @@ igt_hang_t igt_allow_hang(int fd, unsigned ctx, unsigned flags)
 		__gem_context_set_param(fd, &param);
 		allow_reset = INT_MAX; /* any reset method */
 	}
+
 	igt_require(igt_params_set(fd, "reset", "%d", allow_reset));
+	reset_query_once = -1;  /* Re-query after changing param */
 
 	if (!igt_check_boolean_env_var("IGT_HANG_WITHOUT_RESET", false))
 		igt_require(has_gpu_reset(fd));
 
+	if (flags & HANG_WANT_ENGINE_RESET)
+		igt_require(has_engine_reset(fd));
+
 	ban = context_get_ban(fd, ctx);
 	if ((flags & HANG_ALLOW_BAN) == 0)
 		context_set_ban(fd, ctx, 0);
diff --git a/lib/igt_gt.h b/lib/igt_gt.h
index ceb044b86..c5059817b 100644
--- a/lib/igt_gt.h
+++ b/lib/igt_gt.h
@@ -51,6 +51,7 @@ igt_hang_t igt_hang_ctx_with_ahnd(int fd, uint64_t ahnd, uint32_t ctx, int ring,
 
 #define HANG_ALLOW_BAN 1
 #define HANG_ALLOW_CAPTURE 2
+#define HANG_WANT_ENGINE_RESET 4
 
 igt_hang_t igt_hang_ring(int fd, int ring);
 igt_hang_t igt_hang_ring_with_ahnd(int fd, int ring, uint64_t ahnd);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Intel-gfx] [PATCH i-g-t 8/8] tests/i915/gem_exec_capture: Update to support GuC based resets
  2021-10-21 23:40 [Intel-gfx] [PATCH i-g-t 0/8] Fixes for gem_exec_capture John.C.Harrison
                   ` (6 preceding siblings ...)
  2021-10-21 23:40   ` [igt-dev] " John.C.Harrison
@ 2021-10-21 23:40 ` John.C.Harrison
  2021-10-29  2:54   ` Matthew Brost
  2021-10-22  0:27 ` [igt-dev] ✓ Fi.CI.BAT: success for Fixes for gem_exec_capture Patchwork
  2021-10-22  3:38 ` [igt-dev] ✓ Fi.CI.IGT: " Patchwork
  9 siblings, 1 reply; 51+ messages in thread
From: John.C.Harrison @ 2021-10-21 23:40 UTC (permalink / raw)
  To: IGT-Dev; +Cc: Intel-GFX, John Harrison

From: John Harrison <John.C.Harrison@Intel.com>

When GuC submission is enabled, GuC itself manages hang detection and
recovery. Therefore, any test that relies on being able to trigger an
engine reset in the driver will fail. Full GT resets can still be
triggered by the driver. However, in that situation detecting the
specific context that caused a hang is not possible as the driver has
no information about what is actually running on the hardware at any
given time. Plus of course, there was no context that caused the hang
because the hang was triggered manually, so it's basically a bogus
mechanism in the first place!

Update the capture test to cause a reset via the hangcheck mechanism
by submitting a hanging batch and waiting. That way it is guaranteed to
be testing the correct reset code paths for the current platform,
whether or not GuC is enabled.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 tests/i915/gem_exec_capture.c | 65 ++++++++++++++++++++++++++++-------
 1 file changed, 53 insertions(+), 12 deletions(-)

diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
index 8997125ee..dda6e6a8f 100644
--- a/tests/i915/gem_exec_capture.c
+++ b/tests/i915/gem_exec_capture.c
@@ -23,6 +23,7 @@
 
 #include <sys/poll.h>
 #include <zlib.h>
+#include <sched.h>
 
 #include "i915/gem.h"
 #include "i915/gem_create.h"
@@ -31,6 +32,8 @@
 #include "igt_rand.h"
 #include "igt_sysfs.h"
 
+#define MAX_RESET_TIME	600
+
 IGT_TEST_DESCRIPTION("Check that we capture the user specified objects on a hang");
 
 struct offset {
@@ -213,7 +216,29 @@ static void configure_hangs(int fd, const struct intel_execution_engine2 *e, int
 	gem_engine_property_printf(fd, e->name, "heartbeat_interval_ms", "%d", 500);
 
 	/* Allow engine based resets and disable banning */
-	igt_allow_hang(fd, ctxt_id, HANG_ALLOW_CAPTURE);
+	igt_allow_hang(fd, ctxt_id, HANG_ALLOW_CAPTURE | HANG_WANT_ENGINE_RESET);
+}
+
+static bool fence_busy(int fence)
+{
+	return poll(&(struct pollfd){fence, POLLIN}, 1, 0) == 0;
+}
+
+static void wait_to_die(int fence_out)
+{
+	struct timeval before, after, delta;
+
+	/* Wait for a reset to occur */
+	gettimeofday(&before, NULL);
+	while (fence_busy(fence_out)) {
+		gettimeofday(&after, NULL);
+		timersub(&after, &before, &delta);
+		igt_assert(delta.tv_sec < MAX_RESET_TIME);
+		sched_yield();
+	}
+	gettimeofday(&after, NULL);
+	timersub(&after, &before, &delta);
+	igt_info("Target died after %ld.%06lds\n", delta.tv_sec, delta.tv_usec);
 }
 
 static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
@@ -230,7 +255,7 @@ static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
 	struct drm_i915_gem_execbuffer2 execbuf;
 	uint32_t *batch, *seqno;
 	struct offset offset;
-	int i;
+	int i, fence_out;
 
 	configure_hangs(fd, e, ctx->id);
 
@@ -315,18 +340,25 @@ static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
 	execbuf.flags = e->flags;
 	if (gen > 3 && gen < 6)
 		execbuf.flags |= I915_EXEC_SECURE;
+	execbuf.flags |= I915_EXEC_FENCE_OUT;
 	execbuf.rsvd1 = ctx->id;
+	execbuf.rsvd2 = ~0UL;
 
 	igt_assert(!READ_ONCE(*seqno));
-	gem_execbuf(fd, &execbuf);
+	gem_execbuf_wr(fd, &execbuf);
+
+	fence_out = execbuf.rsvd2 >> 32;
+	igt_assert(fence_out >= 0);
 
 	/* Wait for the request to start */
 	while (READ_ONCE(*seqno) != 0xc0ffee)
 		igt_assert(gem_bo_busy(fd, obj[SCRATCH].handle));
 	munmap(seqno, 4096);
 
+	/* Wait for a reset to occur */
+	wait_to_die(fence_out);
+
 	/* Check that only the buffer we marked is reported in the error */
-	igt_force_gpu_reset(fd);
 	memset(&offset, 0, sizeof(offset));
 	offset.addr = obj[CAPTURE].offset;
 	igt_assert_eq(check_error_state(dir, &offset, 1, target_size, false), 1);
@@ -373,7 +405,8 @@ static int cmp(const void *A, const void *B)
 static struct offset *
 __captureN(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
 	   const struct intel_execution_engine2 *e,
-	   unsigned int size, int count, unsigned int flags)
+	   unsigned int size, int count,
+	   unsigned int flags, int *_fence_out)
 #define INCREMENTAL 0x1
 #define ASYNC 0x2
 {
@@ -383,7 +416,7 @@ __captureN(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
 	struct drm_i915_gem_execbuffer2 execbuf;
 	uint32_t *batch, *seqno;
 	struct offset *offsets;
-	int i;
+	int i, fence_out;
 
 	configure_hangs(fd, e, ctx->id);
 
@@ -491,10 +524,17 @@ __captureN(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
 	execbuf.flags = e->flags;
 	if (gen > 3 && gen < 6)
 		execbuf.flags |= I915_EXEC_SECURE;
+	execbuf.flags |= I915_EXEC_FENCE_OUT;
 	execbuf.rsvd1 = ctx->id;
+	execbuf.rsvd2 = ~0UL;
 
 	igt_assert(!READ_ONCE(*seqno));
-	gem_execbuf(fd, &execbuf);
+	gem_execbuf_wr(fd, &execbuf);
+
+	fence_out = execbuf.rsvd2 >> 32;
+	igt_assert(fence_out >= 0);
+	if (_fence_out)
+		*_fence_out = fence_out;
 
 	/* Wait for the request to start */
 	while (READ_ONCE(*seqno) != 0xc0ffee)
@@ -502,7 +542,7 @@ __captureN(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
 	munmap(seqno, 4096);
 
 	if (!(flags & ASYNC)) {
-		igt_force_gpu_reset(fd);
+		wait_to_die(fence_out);
 		gem_sync(fd, obj[count + 1].handle);
 	}
 
@@ -549,7 +589,7 @@ static void many(int fd, int dir, uint64_t size, unsigned int flags)
 	intel_require_memory(count, size, CHECK_RAM);
 	ahnd = get_reloc_ahnd(fd, ctx->id);
 
-	offsets = __captureN(fd, dir, ahnd, ctx, e, size, count, flags);
+	offsets = __captureN(fd, dir, ahnd, ctx, e, size, count, flags, NULL);
 
 	blobs = check_error_state(dir, offsets, count, size, !!(flags & INCREMENTAL));
 	igt_info("Captured %lu %"PRId64"-blobs out of a total of %lu\n",
@@ -602,6 +642,7 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
 	igt_assert(pipe(link) == 0);
 	igt_fork(child, 1) {
 		const intel_ctx_t *ctx2;
+		int fence_out;
 		fd = gem_reopen_driver(fd);
 		igt_debug("Submitting large capture [%ld x %dMiB objects]\n",
 			  count, (int)(size >> 20));
@@ -613,11 +654,11 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
 		/* Reopen the allocator in the new process. */
 		ahnd = get_reloc_ahnd(fd, ctx2->id);
 
-		free(__captureN(fd, dir, ahnd, ctx2, e, size, count, ASYNC));
+		free(__captureN(fd, dir, ahnd, ctx2, e, size, count, ASYNC, &fence_out));
 		put_ahnd(ahnd);
 
 		write(link[1], &fd, sizeof(fd)); /* wake the parent up */
-		igt_force_gpu_reset(fd);
+		wait_to_die(fence_out);
 		write(link[1], &fd, sizeof(fd)); /* wake the parent up */
 	}
 	read(link[0], &dummy, sizeof(dummy));
@@ -714,7 +755,7 @@ igt_main
 		gem_require_mmap_wc(fd);
 		igt_require(has_capture(fd));
 		ctx = intel_ctx_create_all_physical(fd);
-		igt_allow_hang(fd, ctx->id, HANG_ALLOW_CAPTURE);
+		igt_allow_hang(fd, ctx->id, HANG_ALLOW_CAPTURE | HANG_WANT_ENGINE_RESET);
 
 		dir = igt_sysfs_open(fd);
 		igt_require(igt_sysfs_set(dir, "error", "Begone!"));
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [igt-dev] ✓ Fi.CI.BAT: success for Fixes for gem_exec_capture
  2021-10-21 23:40 [Intel-gfx] [PATCH i-g-t 0/8] Fixes for gem_exec_capture John.C.Harrison
                   ` (7 preceding siblings ...)
  2021-10-21 23:40 ` [Intel-gfx] [PATCH i-g-t 8/8] tests/i915/gem_exec_capture: Update to support GuC based resets John.C.Harrison
@ 2021-10-22  0:27 ` Patchwork
  2021-10-22  3:38 ` [igt-dev] ✓ Fi.CI.IGT: " Patchwork
  9 siblings, 0 replies; 51+ messages in thread
From: Patchwork @ 2021-10-22  0:27 UTC (permalink / raw)
  To: john.c.harrison; +Cc: igt-dev

[-- Attachment #1: Type: text/plain, Size: 3842 bytes --]

== Series Details ==

Series: Fixes for gem_exec_capture
URL   : https://patchwork.freedesktop.org/series/96160/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_10773 -> IGTPW_6346
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/index.html

Known issues
------------

  Here are the changes found in IGTPW_6346 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_suspend@basic-s3:
    - fi-skl-6600u:       [PASS][1] -> [INCOMPLETE][2] ([i915#198])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/fi-skl-6600u/igt@gem_exec_suspend@basic-s3.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/fi-skl-6600u/igt@gem_exec_suspend@basic-s3.html

  
#### Possible fixes ####

  * igt@i915_pm_rpm@module-reload:
    - fi-kbl-guc:         [FAIL][3] ([i915#579]) -> [PASS][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/fi-kbl-guc/igt@i915_pm_rpm@module-reload.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/fi-kbl-guc/igt@i915_pm_rpm@module-reload.html

  * igt@i915_selftest@live@gt_heartbeat:
    - fi-bdw-samus:       [DMESG-FAIL][5] ([i915#541]) -> [PASS][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/fi-bdw-samus/igt@i915_selftest@live@gt_heartbeat.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/fi-bdw-samus/igt@i915_selftest@live@gt_heartbeat.html

  * igt@i915_selftest@live@hangcheck:
    - {fi-hsw-gt1}:       [DMESG-WARN][7] ([i915#3303]) -> [PASS][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/fi-hsw-gt1/igt@i915_selftest@live@hangcheck.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/fi-hsw-gt1/igt@i915_selftest@live@hangcheck.html

  * igt@kms_flip@basic-flip-vs-modeset@c-dp2:
    - fi-cfl-8109u:       [DMESG-WARN][9] ([i915#165]) -> [PASS][10] +2 similar issues
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/fi-cfl-8109u/igt@kms_flip@basic-flip-vs-modeset@c-dp2.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/fi-cfl-8109u/igt@kms_flip@basic-flip-vs-modeset@c-dp2.html

  * igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-b:
    - fi-cfl-8109u:       [DMESG-WARN][11] ([i915#165] / [i915#295]) -> [PASS][12] +20 similar issues
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/fi-cfl-8109u/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-b.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/fi-cfl-8109u/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-b.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [i915#165]: https://gitlab.freedesktop.org/drm/intel/issues/165
  [i915#198]: https://gitlab.freedesktop.org/drm/intel/issues/198
  [i915#295]: https://gitlab.freedesktop.org/drm/intel/issues/295
  [i915#3303]: https://gitlab.freedesktop.org/drm/intel/issues/3303
  [i915#541]: https://gitlab.freedesktop.org/drm/intel/issues/541
  [i915#579]: https://gitlab.freedesktop.org/drm/intel/issues/579


Participating hosts (39 -> 36)
------------------------------

  Missing    (3): fi-ctg-p8600 fi-bsw-cyan fi-hsw-4200u 


Build changes
-------------

  * CI: CI-20190529 -> None
  * IGT: IGT_6258 -> IGTPW_6346

  CI-20190529: 20190529
  CI_DRM_10773: fa267509357bd9eb021c3d474fe0980cde18de62 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_6346: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/index.html
  IGT_6258: 4c80c71d7dec29b6376846ae96bd04dc0b6e34d9 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/index.html

[-- Attachment #2: Type: text/html, Size: 4650 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [igt-dev] ✓ Fi.CI.IGT: success for Fixes for gem_exec_capture
  2021-10-21 23:40 [Intel-gfx] [PATCH i-g-t 0/8] Fixes for gem_exec_capture John.C.Harrison
                   ` (8 preceding siblings ...)
  2021-10-22  0:27 ` [igt-dev] ✓ Fi.CI.BAT: success for Fixes for gem_exec_capture Patchwork
@ 2021-10-22  3:38 ` Patchwork
  9 siblings, 0 replies; 51+ messages in thread
From: Patchwork @ 2021-10-22  3:38 UTC (permalink / raw)
  To: john.c.harrison; +Cc: igt-dev

[-- Attachment #1: Type: text/plain, Size: 30243 bytes --]

== Series Details ==

Series: Fixes for gem_exec_capture
URL   : https://patchwork.freedesktop.org/series/96160/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_10773_full -> IGTPW_6346_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/index.html

Known issues
------------

  Here are the changes found in IGTPW_6346_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@feature_discovery@display-2x:
    - shard-tglb:         NOTRUN -> [SKIP][1] ([i915#1839])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb7/igt@feature_discovery@display-2x.html

  * igt@gem_create@create-massive:
    - shard-snb:          NOTRUN -> [DMESG-WARN][2] ([i915#3002])
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-snb7/igt@gem_create@create-massive.html

  * igt@gem_ctx_isolation@preservation-s3@rcs0:
    - shard-tglb:         [PASS][3] -> [INCOMPLETE][4] ([i915#1373])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-tglb8/igt@gem_ctx_isolation@preservation-s3@rcs0.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb7/igt@gem_ctx_isolation@preservation-s3@rcs0.html

  * igt@gem_ctx_persistence@engines-queued:
    - shard-snb:          NOTRUN -> [SKIP][5] ([fdo#109271] / [i915#1099]) +5 similar issues
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-snb5/igt@gem_ctx_persistence@engines-queued.html

  * igt@gem_exec_capture@pi@vcs0:
    - shard-iclb:         [PASS][6] -> [INCOMPLETE][7] ([i915#2369])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-iclb3/igt@gem_exec_capture@pi@vcs0.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-iclb6/igt@gem_exec_capture@pi@vcs0.html

  * igt@gem_exec_fair@basic-deadline:
    - shard-glk:          [PASS][8] -> [FAIL][9] ([i915#2846])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-glk4/igt@gem_exec_fair@basic-deadline.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-glk1/igt@gem_exec_fair@basic-deadline.html

  * igt@gem_exec_fair@basic-none-vip@rcs0:
    - shard-glk:          [PASS][10] -> [FAIL][11] ([i915#2842])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-glk5/igt@gem_exec_fair@basic-none-vip@rcs0.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-glk9/igt@gem_exec_fair@basic-none-vip@rcs0.html

  * igt@gem_exec_fair@basic-none@vcs1:
    - shard-iclb:         NOTRUN -> [FAIL][12] ([i915#2842])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-iclb4/igt@gem_exec_fair@basic-none@vcs1.html

  * igt@gem_exec_fair@basic-pace@bcs0:
    - shard-tglb:         [PASS][13] -> [FAIL][14] ([i915#2842]) +2 similar issues
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-tglb8/igt@gem_exec_fair@basic-pace@bcs0.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb2/igt@gem_exec_fair@basic-pace@bcs0.html

  * igt@gem_exec_fair@basic-throttle@rcs0:
    - shard-tglb:         NOTRUN -> [FAIL][15] ([i915#2842])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb8/igt@gem_exec_fair@basic-throttle@rcs0.html

  * igt@gem_exec_flush@basic-batch-kernel-default-cmd:
    - shard-iclb:         NOTRUN -> [SKIP][16] ([fdo#109313])
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-iclb1/igt@gem_exec_flush@basic-batch-kernel-default-cmd.html
    - shard-tglb:         NOTRUN -> [SKIP][17] ([fdo#109313])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb1/igt@gem_exec_flush@basic-batch-kernel-default-cmd.html

  * igt@gem_exec_params@no-bsd:
    - shard-tglb:         NOTRUN -> [SKIP][18] ([fdo#109283])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb3/igt@gem_exec_params@no-bsd.html
    - shard-iclb:         NOTRUN -> [SKIP][19] ([fdo#109283])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-iclb2/igt@gem_exec_params@no-bsd.html

  * igt@gem_exec_params@secure-non-root:
    - shard-tglb:         NOTRUN -> [SKIP][20] ([fdo#112283])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb2/igt@gem_exec_params@secure-non-root.html

  * igt@gem_pxp@create-protected-buffer:
    - shard-iclb:         NOTRUN -> [SKIP][21] ([i915#4270])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-iclb2/igt@gem_pxp@create-protected-buffer.html
    - shard-tglb:         NOTRUN -> [SKIP][22] ([i915#4270])
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb6/igt@gem_pxp@create-protected-buffer.html

  * igt@gem_userptr_blits@readonly-unsync:
    - shard-tglb:         NOTRUN -> [SKIP][23] ([i915#3297])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb8/igt@gem_userptr_blits@readonly-unsync.html

  * igt@gem_userptr_blits@vma-merge:
    - shard-snb:          NOTRUN -> [FAIL][24] ([i915#2724])
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-snb6/igt@gem_userptr_blits@vma-merge.html
    - shard-kbl:          NOTRUN -> [FAIL][25] ([i915#3318])
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-kbl4/igt@gem_userptr_blits@vma-merge.html

  * igt@gen7_exec_parse@bitmasks:
    - shard-tglb:         NOTRUN -> [SKIP][26] ([fdo#109289])
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb8/igt@gen7_exec_parse@bitmasks.html

  * igt@gen9_exec_parse@shadow-peek:
    - shard-iclb:         NOTRUN -> [SKIP][27] ([i915#2856])
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-iclb6/igt@gen9_exec_parse@shadow-peek.html

  * igt@gen9_exec_parse@valid-registers:
    - shard-tglb:         NOTRUN -> [SKIP][28] ([i915#2856]) +1 similar issue
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb5/igt@gen9_exec_parse@valid-registers.html

  * igt@i915_pm_dc@dc9-dpms:
    - shard-iclb:         [PASS][29] -> [FAIL][30] ([i915#4275])
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-iclb4/igt@i915_pm_dc@dc9-dpms.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-iclb5/igt@i915_pm_dc@dc9-dpms.html

  * igt@i915_pm_rpm@dpms-non-lpsp:
    - shard-tglb:         NOTRUN -> [SKIP][31] ([fdo#111644] / [i915#1397] / [i915#2411])
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb7/igt@i915_pm_rpm@dpms-non-lpsp.html

  * igt@i915_pm_sseu@full-enable:
    - shard-tglb:         NOTRUN -> [SKIP][32] ([fdo#109288])
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb8/igt@i915_pm_sseu@full-enable.html

  * igt@i915_suspend@sysfs-reader:
    - shard-apl:          [PASS][33] -> [DMESG-WARN][34] ([i915#180]) +6 similar issues
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-apl7/igt@i915_suspend@sysfs-reader.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-apl8/igt@i915_suspend@sysfs-reader.html

  * igt@kms_big_fb@linear-8bpp-rotate-270:
    - shard-tglb:         NOTRUN -> [SKIP][35] ([fdo#111614]) +3 similar issues
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb6/igt@kms_big_fb@linear-8bpp-rotate-270.html

  * igt@kms_big_fb@x-tiled-16bpp-rotate-270:
    - shard-iclb:         NOTRUN -> [SKIP][36] ([fdo#110725] / [fdo#111614])
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-iclb5/igt@kms_big_fb@x-tiled-16bpp-rotate-270.html

  * igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-0-hflip:
    - shard-apl:          NOTRUN -> [SKIP][37] ([fdo#109271] / [i915#3777]) +1 similar issue
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-apl1/igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-0-hflip.html

  * igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-0-hflip:
    - shard-tglb:         NOTRUN -> [SKIP][38] ([fdo#111615])
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb3/igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-0-hflip.html

  * igt@kms_big_joiner@invalid-modeset:
    - shard-tglb:         NOTRUN -> [SKIP][39] ([i915#2705])
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb2/igt@kms_big_joiner@invalid-modeset.html

  * igt@kms_ccs@pipe-a-crc-sprite-planes-basic-y_tiled_gen12_mc_ccs:
    - shard-glk:          NOTRUN -> [SKIP][40] ([fdo#109271] / [i915#3886]) +4 similar issues
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-glk4/igt@kms_ccs@pipe-a-crc-sprite-planes-basic-y_tiled_gen12_mc_ccs.html
    - shard-iclb:         NOTRUN -> [SKIP][41] ([fdo#109278] / [i915#3886]) +1 similar issue
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-iclb8/igt@kms_ccs@pipe-a-crc-sprite-planes-basic-y_tiled_gen12_mc_ccs.html
    - shard-tglb:         NOTRUN -> [SKIP][42] ([i915#3689] / [i915#3886])
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb1/igt@kms_ccs@pipe-a-crc-sprite-planes-basic-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-a-missing-ccs-buffer-y_tiled_gen12_rc_ccs_cc:
    - shard-apl:          NOTRUN -> [SKIP][43] ([fdo#109271] / [i915#3886]) +12 similar issues
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-apl8/igt@kms_ccs@pipe-a-missing-ccs-buffer-y_tiled_gen12_rc_ccs_cc.html

  * igt@kms_ccs@pipe-b-ccs-on-another-bo-y_tiled_gen12_rc_ccs:
    - shard-iclb:         NOTRUN -> [SKIP][44] ([fdo#109278]) +2 similar issues
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-iclb3/igt@kms_ccs@pipe-b-ccs-on-another-bo-y_tiled_gen12_rc_ccs.html

  * igt@kms_ccs@pipe-b-missing-ccs-buffer-y_tiled_gen12_rc_ccs_cc:
    - shard-kbl:          NOTRUN -> [SKIP][45] ([fdo#109271] / [i915#3886]) +14 similar issues
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-kbl6/igt@kms_ccs@pipe-b-missing-ccs-buffer-y_tiled_gen12_rc_ccs_cc.html

  * igt@kms_ccs@pipe-d-missing-ccs-buffer-yf_tiled_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][46] ([i915#3689]) +8 similar issues
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb6/igt@kms_ccs@pipe-d-missing-ccs-buffer-yf_tiled_ccs.html

  * igt@kms_chamelium@hdmi-hpd-enable-disable-mode:
    - shard-snb:          NOTRUN -> [SKIP][47] ([fdo#109271] / [fdo#111827]) +21 similar issues
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-snb7/igt@kms_chamelium@hdmi-hpd-enable-disable-mode.html

  * igt@kms_chamelium@vga-hpd-after-suspend:
    - shard-glk:          NOTRUN -> [SKIP][48] ([fdo#109271] / [fdo#111827]) +1 similar issue
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-glk4/igt@kms_chamelium@vga-hpd-after-suspend.html

  * igt@kms_color_chamelium@pipe-a-ctm-0-5:
    - shard-apl:          NOTRUN -> [SKIP][49] ([fdo#109271] / [fdo#111827]) +16 similar issues
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-apl6/igt@kms_color_chamelium@pipe-a-ctm-0-5.html

  * igt@kms_color_chamelium@pipe-b-ctm-0-75:
    - shard-tglb:         NOTRUN -> [SKIP][50] ([fdo#109284] / [fdo#111827]) +5 similar issues
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb7/igt@kms_color_chamelium@pipe-b-ctm-0-75.html

  * igt@kms_color_chamelium@pipe-c-ctm-0-25:
    - shard-kbl:          NOTRUN -> [SKIP][51] ([fdo#109271] / [fdo#111827]) +18 similar issues
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-kbl2/igt@kms_color_chamelium@pipe-c-ctm-0-25.html

  * igt@kms_content_protection@atomic:
    - shard-kbl:          NOTRUN -> [TIMEOUT][52] ([i915#1319])
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-kbl7/igt@kms_content_protection@atomic.html

  * igt@kms_content_protection@uevent:
    - shard-apl:          NOTRUN -> [FAIL][53] ([i915#2105])
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-apl2/igt@kms_content_protection@uevent.html

  * igt@kms_cursor_crc@pipe-c-cursor-512x170-offscreen:
    - shard-tglb:         NOTRUN -> [SKIP][54] ([fdo#109279] / [i915#3359]) +3 similar issues
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb6/igt@kms_cursor_crc@pipe-c-cursor-512x170-offscreen.html

  * igt@kms_cursor_crc@pipe-c-cursor-max-size-onscreen:
    - shard-tglb:         NOTRUN -> [SKIP][55] ([i915#3359]) +4 similar issues
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb3/igt@kms_cursor_crc@pipe-c-cursor-max-size-onscreen.html

  * igt@kms_cursor_crc@pipe-d-cursor-32x32-random:
    - shard-tglb:         NOTRUN -> [SKIP][56] ([i915#3319])
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb2/igt@kms_cursor_crc@pipe-d-cursor-32x32-random.html

  * igt@kms_cursor_edge_walk@pipe-d-128x128-right-edge:
    - shard-snb:          NOTRUN -> [SKIP][57] ([fdo#109271]) +420 similar issues
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-snb7/igt@kms_cursor_edge_walk@pipe-d-128x128-right-edge.html

  * igt@kms_cursor_legacy@cursorb-vs-flipa-atomic:
    - shard-iclb:         NOTRUN -> [SKIP][58] ([fdo#109274] / [fdo#109278])
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-iclb3/igt@kms_cursor_legacy@cursorb-vs-flipa-atomic.html

  * igt@kms_cursor_legacy@pipe-d-torture-bo:
    - shard-kbl:          NOTRUN -> [SKIP][59] ([fdo#109271] / [i915#533]) +2 similar issues
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-kbl2/igt@kms_cursor_legacy@pipe-d-torture-bo.html

  * igt@kms_dp_tiled_display@basic-test-pattern-with-chamelium:
    - shard-tglb:         NOTRUN -> [SKIP][60] ([i915#3528])
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb6/igt@kms_dp_tiled_display@basic-test-pattern-with-chamelium.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible@a-hdmi-a2:
    - shard-glk:          [PASS][61] -> [FAIL][62] ([i915#79])
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-glk1/igt@kms_flip@flip-vs-expired-vblank-interruptible@a-hdmi-a2.html
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-glk9/igt@kms_flip@flip-vs-expired-vblank-interruptible@a-hdmi-a2.html

  * igt@kms_flip@flip-vs-suspend-interruptible@a-dp1:
    - shard-kbl:          [PASS][63] -> [DMESG-WARN][64] ([i915#180]) +2 similar issues
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-kbl2/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-kbl7/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html

  * igt@kms_flip@flip-vs-suspend-interruptible@a-edp1:
    - shard-tglb:         [PASS][65] -> [INCOMPLETE][66] ([i915#2411] / [i915#456])
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-tglb3/igt@kms_flip@flip-vs-suspend-interruptible@a-edp1.html
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb7/igt@kms_flip@flip-vs-suspend-interruptible@a-edp1.html

  * igt@kms_flip@flip-vs-suspend-interruptible@c-dp1:
    - shard-kbl:          NOTRUN -> [DMESG-WARN][67] ([i915#180])
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-kbl7/igt@kms_flip@flip-vs-suspend-interruptible@c-dp1.html

  * igt@kms_flip@flip-vs-suspend@a-edp1:
    - shard-tglb:         [PASS][68] -> [INCOMPLETE][69] ([i915#456])
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-tglb6/igt@kms_flip@flip-vs-suspend@a-edp1.html
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb7/igt@kms_flip@flip-vs-suspend@a-edp1.html

  * igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilercccs:
    - shard-apl:          NOTRUN -> [SKIP][70] ([fdo#109271] / [i915#2672])
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-apl6/igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilercccs.html

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-shrfb-draw-render:
    - shard-glk:          [PASS][71] -> [FAIL][72] ([i915#1888] / [i915#2546])
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-glk6/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-shrfb-draw-render.html
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-glk4/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-shrfb-draw-render.html

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-shrfb-msflip-blt:
    - shard-tglb:         NOTRUN -> [SKIP][73] ([fdo#111825]) +21 similar issues
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb1/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-shrfb-msflip-blt.html
    - shard-iclb:         NOTRUN -> [SKIP][74] ([fdo#109280]) +7 similar issues
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-iclb4/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-shrfb-msflip-blt.html

  * igt@kms_frontbuffer_tracking@psr-2p-primscrn-pri-indfb-draw-mmap-cpu:
    - shard-glk:          NOTRUN -> [SKIP][75] ([fdo#109271]) +30 similar issues
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-glk1/igt@kms_frontbuffer_tracking@psr-2p-primscrn-pri-indfb-draw-mmap-cpu.html

  * igt@kms_hdr@static-toggle-dpms:
    - shard-tglb:         NOTRUN -> [SKIP][76] ([i915#1187])
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb2/igt@kms_hdr@static-toggle-dpms.html

  * igt@kms_pipe_b_c_ivb@disable-pipe-b-enable-pipe-c:
    - shard-apl:          NOTRUN -> [SKIP][77] ([fdo#109271]) +207 similar issues
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-apl7/igt@kms_pipe_b_c_ivb@disable-pipe-b-enable-pipe-c.html

  * igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d:
    - shard-apl:          NOTRUN -> [SKIP][78] ([fdo#109271] / [i915#533]) +2 similar issues
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-apl3/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d.html

  * igt@kms_plane_alpha_blend@pipe-a-alpha-7efc:
    - shard-kbl:          NOTRUN -> [FAIL][79] ([fdo#108145] / [i915#265]) +1 similar issue
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-kbl6/igt@kms_plane_alpha_blend@pipe-a-alpha-7efc.html

  * igt@kms_plane_alpha_blend@pipe-b-alpha-transparent-fb:
    - shard-kbl:          NOTRUN -> [FAIL][80] ([i915#265]) +1 similar issue
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-kbl6/igt@kms_plane_alpha_blend@pipe-b-alpha-transparent-fb.html

  * igt@kms_plane_alpha_blend@pipe-c-alpha-transparent-fb:
    - shard-apl:          NOTRUN -> [FAIL][81] ([i915#265])
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-apl3/igt@kms_plane_alpha_blend@pipe-c-alpha-transparent-fb.html

  * igt@kms_plane_lowres@pipe-a-tiling-y:
    - shard-iclb:         NOTRUN -> [SKIP][82] ([i915#3536])
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-iclb6/igt@kms_plane_lowres@pipe-a-tiling-y.html
    - shard-tglb:         NOTRUN -> [SKIP][83] ([i915#3536])
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb7/igt@kms_plane_lowres@pipe-a-tiling-y.html

  * igt@kms_plane_scaling@scaler-with-clipping-clamping@pipe-c-scaler-with-clipping-clamping:
    - shard-kbl:          NOTRUN -> [SKIP][84] ([fdo#109271] / [i915#2733])
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-kbl1/igt@kms_plane_scaling@scaler-with-clipping-clamping@pipe-c-scaler-with-clipping-clamping.html

  * igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area-2:
    - shard-apl:          NOTRUN -> [SKIP][85] ([fdo#109271] / [i915#658]) +4 similar issues
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-apl1/igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area-2.html

  * igt@kms_psr2_sf@primary-plane-update-sf-dmg-area-1:
    - shard-kbl:          NOTRUN -> [SKIP][86] ([fdo#109271] / [i915#658]) +6 similar issues
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-kbl4/igt@kms_psr2_sf@primary-plane-update-sf-dmg-area-1.html

  * igt@kms_psr2_sf@primary-plane-update-sf-dmg-area-5:
    - shard-tglb:         NOTRUN -> [SKIP][87] ([i915#2920]) +1 similar issue
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb1/igt@kms_psr2_sf@primary-plane-update-sf-dmg-area-5.html

  * igt@kms_psr2_su@page_flip:
    - shard-tglb:         NOTRUN -> [SKIP][88] ([i915#1911])
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb6/igt@kms_psr2_su@page_flip.html

  * igt@kms_psr@psr2_cursor_plane_onoff:
    - shard-tglb:         NOTRUN -> [FAIL][89] ([i915#132] / [i915#3467])
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb6/igt@kms_psr@psr2_cursor_plane_onoff.html

  * igt@kms_psr@psr2_no_drrs:
    - shard-iclb:         [PASS][90] -> [SKIP][91] ([fdo#109441]) +1 similar issue
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-iclb2/igt@kms_psr@psr2_no_drrs.html
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-iclb8/igt@kms_psr@psr2_no_drrs.html

  * igt@kms_universal_plane@disable-primary-vs-flip-pipe-d:
    - shard-kbl:          NOTRUN -> [SKIP][92] ([fdo#109271]) +261 similar issues
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-kbl3/igt@kms_universal_plane@disable-primary-vs-flip-pipe-d.html

  * igt@kms_writeback@writeback-check-output:
    - shard-apl:          NOTRUN -> [SKIP][93] ([fdo#109271] / [i915#2437])
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-apl2/igt@kms_writeback@writeback-check-output.html

  * igt@nouveau_crc@pipe-d-ctx-flip-skip-current-frame:
    - shard-tglb:         NOTRUN -> [SKIP][94] ([i915#2530])
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb7/igt@nouveau_crc@pipe-d-ctx-flip-skip-current-frame.html

  * igt@prime_nv_test@i915_import_cpu_mmap:
    - shard-iclb:         NOTRUN -> [SKIP][95] ([fdo#109291])
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-iclb4/igt@prime_nv_test@i915_import_cpu_mmap.html

  * igt@prime_nv_test@i915_import_gtt_mmap:
    - shard-tglb:         NOTRUN -> [SKIP][96] ([fdo#109291]) +1 similar issue
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb6/igt@prime_nv_test@i915_import_gtt_mmap.html

  * igt@prime_vgem@basic-userptr:
    - shard-tglb:         NOTRUN -> [SKIP][97] ([i915#3301])
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb8/igt@prime_vgem@basic-userptr.html

  * igt@prime_vgem@fence-write-hang:
    - shard-tglb:         NOTRUN -> [SKIP][98] ([fdo#109295])
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb5/igt@prime_vgem@fence-write-hang.html

  * igt@sysfs_clients@recycle-many:
    - shard-apl:          NOTRUN -> [SKIP][99] ([fdo#109271] / [i915#2994]) +2 similar issues
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-apl1/igt@sysfs_clients@recycle-many.html

  * igt@sysfs_clients@sema-10:
    - shard-tglb:         NOTRUN -> [SKIP][100] ([i915#2994]) +1 similar issue
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb1/igt@sysfs_clients@sema-10.html
    - shard-kbl:          NOTRUN -> [SKIP][101] ([fdo#109271] / [i915#2994]) +1 similar issue
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-kbl4/igt@sysfs_clients@sema-10.html

  
#### Possible fixes ####

  * igt@feature_discovery@psr2:
    - shard-iclb:         [SKIP][102] ([i915#658]) -> [PASS][103]
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-iclb8/igt@feature_discovery@psr2.html
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-iclb2/igt@feature_discovery@psr2.html

  * igt@gem_exec_fair@basic-deadline:
    - shard-kbl:          [FAIL][104] ([i915#2846]) -> [PASS][105]
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-kbl4/igt@gem_exec_fair@basic-deadline.html
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-kbl3/igt@gem_exec_fair@basic-deadline.html

  * igt@gem_exec_fair@basic-pace@rcs0:
    - shard-kbl:          [FAIL][106] ([i915#2842]) -> [PASS][107]
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-kbl6/igt@gem_exec_fair@basic-pace@rcs0.html
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-kbl7/igt@gem_exec_fair@basic-pace@rcs0.html

  * igt@gem_exec_fair@basic-pace@vecs0:
    - shard-kbl:          [SKIP][108] ([fdo#109271]) -> [PASS][109]
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-kbl6/igt@gem_exec_fair@basic-pace@vecs0.html
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-kbl7/igt@gem_exec_fair@basic-pace@vecs0.html
    - shard-iclb:         [FAIL][110] ([i915#2842]) -> [PASS][111]
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-iclb8/igt@gem_exec_fair@basic-pace@vecs0.html
   [111]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-iclb3/igt@gem_exec_fair@basic-pace@vecs0.html

  * igt@gem_exec_fair@basic-throttle@rcs0:
    - shard-glk:          [FAIL][112] ([i915#2842]) -> [PASS][113] +1 similar issue
   [112]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-glk9/igt@gem_exec_fair@basic-throttle@rcs0.html
   [113]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-glk4/igt@gem_exec_fair@basic-throttle@rcs0.html

  * igt@i915_pm_dc@dc6-psr:
    - shard-iclb:         [FAIL][114] ([i915#454]) -> [PASS][115]
   [114]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-iclb6/igt@i915_pm_dc@dc6-psr.html
   [115]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-iclb5/igt@i915_pm_dc@dc6-psr.html

  * igt@i915_suspend@forcewake:
    - shard-tglb:         [INCOMPLETE][116] ([i915#2411] / [i915#456]) -> [PASS][117] +1 similar issue
   [116]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-tglb7/igt@i915_suspend@forcewake.html
   [117]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb1/igt@i915_suspend@forcewake.html

  * igt@kms_big_fb@linear-32bpp-rotate-180:
    - shard-glk:          [DMESG-WARN][118] ([i915#118]) -> [PASS][119] +1 similar issue
   [118]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-glk6/igt@kms_big_fb@linear-32bpp-rotate-180.html
   [119]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-glk6/igt@kms_big_fb@linear-32bpp-rotate-180.html

  * igt@kms_cursor_crc@pipe-a-cursor-suspend:
    - shard-kbl:          [DMESG-WARN][120] ([i915#180]) -> [PASS][121] +6 similar issues
   [120]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-kbl4/igt@kms_cursor_crc@pipe-a-cursor-suspend.html
   [121]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-kbl6/igt@kms_cursor_crc@pipe-a-cursor-suspend.html

  * igt@kms_fbcon_fbt@psr-suspend:
    - shard-tglb:         [INCOMPLETE][122] ([i915#456]) -> [PASS][123]
   [122]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-tglb7/igt@kms_fbcon_fbt@psr-suspend.html
   [123]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb8/igt@kms_fbcon_fbt@psr-suspend.html

  * igt@kms_frontbuffer_tracking@fbc-suspend:
    - shard-tglb:         [INCOMPLETE][124] ([i915#2828] / [i915#456]) -> [PASS][125]
   [124]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-tglb7/igt@kms_frontbuffer_tracking@fbc-suspend.html
   [125]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-tglb2/igt@kms_frontbuffer_tracking@fbc-suspend.html

  * igt@kms_psr@psr2_suspend:
    - shard-iclb:         [SKIP][126] ([fdo#109441]) -> [PASS][127] +2 similar issues
   [126]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-iclb3/igt@kms_psr@psr2_suspend.html
   [127]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-iclb2/igt@kms_psr@psr2_suspend.html

  * igt@kms_vblank@pipe-a-ts-continuation-suspend:
    - shard-kbl:          [DMESG-WARN][128] ([i915#180] / [i915#295]) -> [PASS][129]
   [128]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-kbl7/igt@kms_vblank@pipe-a-ts-continuation-suspend.html
   [129]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-kbl7/igt@kms_vblank@pipe-a-ts-continuation-suspend.html

  
#### Warnings ####

  * igt@i915_pm_rc6_residency@rc6-fence:
    - shard-iclb:         [WARN][130] ([i915#1804] / [i915#2684]) -> [WARN][131] ([i915#2684])
   [130]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-iclb6/igt@i915_pm_rc6_residency@rc6-fence.html
   [131]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-iclb5/igt@i915_pm_rc6_residency@rc6-fence.html

  * igt@kms_flip@flip-vs-suspend-interruptible@b-dp1:
    - shard-kbl:          [INCOMPLETE][132] -> [DMESG-WARN][133] ([i915#180])
   [132]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-kbl2/igt@kms_flip@flip-vs-suspend-interruptible@b-dp1.html
   [133]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-kbl7/igt@kms_flip@flip-vs-suspend-interruptible@b-dp1.html

  * igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area-4:
    - shard-iclb:         [SKIP][134] ([i915#2920]) -> [SKIP][135] ([i915#658]) +1 similar issue
   [134]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-iclb2/igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area-4.html
   [135]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-iclb8/igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area-4.html

  * igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area-2:
    - shard-iclb:         [SKIP][136] ([i915#658]) -> [SKIP][137] ([i915#2920])
   [136]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10773/shard-iclb4/igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area-2.html
   [137]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/shard-iclb2/igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area-2.html

  * igt@runner@aborted:
    - shard-kbl:          ([FAIL][138], [FAIL][139], [FAIL][140], [FAIL][141], [FAIL][142], [FAIL][143], [FAIL][144], [FAIL][145], [FAIL][146], [FAIL][147], [FAIL][148], [FAIL][149], [FAIL][150], [FAIL][151]) ([fdo#109271] / [i915#1436] / [i915#180] / [i915#1814] / [i915#3002] / [i915#3363] / [i915#4312] / [i915#602]) -> ([FAIL][152], [FAIL][153], [FAIL][154], [FAIL][155], [FAIL][156], [FAIL][157], [FAIL][158], [FAIL][159], [FAIL][160]) ([fdo#109271]

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6346/index.html

[-- Attachment #2: Type: text/html, Size: 33745 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 1/8] tests/i915/gem_exec_capture: Remove pointless assert
  2021-10-21 23:40   ` [igt-dev] " John.C.Harrison
@ 2021-10-29  2:14     ` Matthew Brost
  -1 siblings, 0 replies; 51+ messages in thread
From: Matthew Brost @ 2021-10-29  2:14 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: IGT-Dev, Intel-GFX

On Thu, Oct 21, 2021 at 04:40:37PM -0700, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The 'many' test ended with an 'assert(count)', presumably meaning to
> ensure that some objects were actually captured. However, 'count' is
> the number of objects created, not how many were captured. Plus, there
> is already a 'require(count > 1)' at the start and count is invariant,
> so the final assert is basically pointless.
> 
> General consensus appears to be that the test should not fail
> irrespective of how many blobs are captured, as low memory situations
> could cause the capture to be abbreviated. So just remove the
> pointless assert completely.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  tests/i915/gem_exec_capture.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
> index 7e0a8b8ad..53649cdb2 100644
> --- a/tests/i915/gem_exec_capture.c
> +++ b/tests/i915/gem_exec_capture.c
> @@ -524,7 +524,6 @@ static void many(int fd, int dir, uint64_t size, unsigned int flags)
>  	}
>  	igt_info("Captured %lu %"PRId64"-blobs out of a total of %lu\n",
>  		 blobs, size >> 12, count);
> -	igt_assert(count);
>  
>  	free(error);
>  	free(offsets);
> -- 
> 2.25.1
> 

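The following simplified C model of the end of the 'many' subtest makes the reasoning above concrete. It is a sketch, not the IGT code. many_model() and struct many_result are hypothetical names. igt_require() is modelled as an early return and igt_assert() as plain assert(). 'captured' stands in for whatever the error-state decode finds. The model shows that the removed assert(count) passes even when no blobs are captured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical summary of one run of the modelled subtest. */
struct many_result {
	unsigned long count; /* objects created; fixed up front */
	unsigned long blobs; /* objects found in the decoded error state */
	bool skipped;        /* models igt_require(count > 1) not being met */
};

/*
 * Simplified model of the tail of the 'many' subtest. 'captured' stands
 * in for however many objects the error-state decode happens to find,
 * which may be fewer than were created (e.g. under memory pressure).
 */
static struct many_result many_model(unsigned long count,
				     unsigned long captured)
{
	struct many_result r = { .count = count, .blobs = 0, .skipped = false };

	/* Models igt_require(count > 1): skip the test, don't fail it. */
	if (!(count > 1)) {
		r.skipped = true;
		return r;
	}

	/* The capture can contain at most the objects that were created. */
	r.blobs = captured < count ? captured : count;
	printf("Captured %lu blobs out of a total of %lu\n", r.blobs, r.count);

	/*
	 * The removed check. count is never modified after the require
	 * above, so this cannot fire, whatever value blobs ended up with.
	 */
	assert(r.count);
	return r;
}
```

For example, many_model(64, 0) reports zero blobs without tripping the assert. many_model(1, 0) is skipped before the assert is reached.
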
^ permalink raw reply	[flat|nested] 51+ messages in thread
* Re: [Intel-gfx] [PATCH i-g-t 5/8] tests/i915/gem_exec_capture: Check for memory allocation failure
  2021-10-21 23:40 ` [Intel-gfx] [PATCH i-g-t 5/8] tests/i915/gem_exec_capture: Check for memory allocation failure John.C.Harrison
@ 2021-10-29  2:20   ` Matthew Brost
  2021-11-03 14:00     ` [igt-dev] " Tvrtko Ursulin
  1 sibling, 0 replies; 51+ messages in thread
From: Matthew Brost @ 2021-10-29  2:20 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: IGT-Dev, Intel-GFX

On Thu, Oct 21, 2021 at 04:40:41PM -0700, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The sysfs file read helper does not actually report any errors if a
> realloc fails. It just silently returns a 'valid' but truncated
> buffer. This then leads to the decode of the buffer failing in random
> ways. So, add a check for ENOMEM being generated during the read.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  tests/i915/gem_exec_capture.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
> index e373d24ed..8997125ee 100644
> --- a/tests/i915/gem_exec_capture.c
> +++ b/tests/i915/gem_exec_capture.c
> @@ -131,9 +131,11 @@ static int check_error_state(int dir, struct offset *obj_offsets, int obj_count,
>  	char *error, *str;
>  	int blobs = 0;
>  
> +	errno = 0;
>  	error = igt_sysfs_get(dir, "error");
>  	igt_sysfs_set(dir, "error", "Begone!");
>  	igt_assert(error);
> +	igt_assert(errno != ENOMEM);
>  	igt_debug("%s\n", error);
>  
>  	/* render ring --- user = 0x00000000 ffffd000 */
> -- 
> 2.25.1
> 

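The check above depends on POSIX behaviour: a failed realloc() sets errno to ENOMEM, and zeroing errno before the read means a stale value cannot trip the assert. Below is a minimal sketch of that pattern. read_all() is a hypothetical stand-in for igt_sysfs_get(), and its doubling growth policy and EINTR retry are assumptions, not the IGT implementation. Note that in the hunk igt_sysfs_set() runs between the read and the errno check, so a failing call inside it could overwrite errno; saving errno immediately after the read would be slightly more robust.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: read everything from fd into a NUL-terminated
 * heap buffer that grows by doubling. If realloc() fails, the data read
 * so far is returned and errno is left as ENOMEM, so a caller that zeroed
 * errno beforehand can tell a truncated result from a complete one.
 */
static char *read_all(int fd)
{
	size_t size = 64, len = 0;
	char *buf;

	assert(fd >= 0);
	buf = malloc(size);
	if (!buf)
		return NULL;

	for (;;) {
		ssize_t ret;

		if (len + 1 == size) {            /* keep room for the NUL */
			char *tmp = realloc(buf, size * 2);

			if (!tmp) {
				errno = ENOMEM;   /* POSIX realloc sets this too */
				break;            /* return the truncated data */
			}
			buf = tmp;
			size *= 2;
		}

		ret = read(fd, buf + len, size - len - 1);
		if (ret < 0 && errno == EINTR)
			continue;                 /* interrupted: retry */
		if (ret <= 0)
			break;                    /* EOF or hard error */
		len += (size_t)ret;
	}
	buf[len] = '\0';
	return buf;
}
```

A caller mirrors the patch: set errno to 0, call read_all(), and treat errno == ENOMEM as a truncated capture.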

* Re: [Intel-gfx] [PATCH i-g-t 3/8] tests/i915/gem_exec_capture: Make the error decode a common helper
  2021-10-21 23:40 ` [Intel-gfx] [PATCH i-g-t 3/8] tests/i915/gem_exec_capture: Make the error decode a common helper John.C.Harrison
@ 2021-10-29  2:34     ` Matthew Brost
  0 siblings, 0 replies; 51+ messages in thread
From: Matthew Brost @ 2021-10-29  2:34 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: IGT-Dev, Intel-GFX

On Thu, Oct 21, 2021 at 04:40:39PM -0700, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The decode of the error capture contents was happening in two
> different sub-tests with two very different pieces of code, one
> much more extensive than the other (it actually decodes and verifies
> the contents of the captured buffers rather than just the address).
> So, move the code into a common helper function and use that in both
> places.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

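The new helper identifies each captured object by parsing the "--- user = 0xHI LO" header line and binary-searching the offsets array, which must be sorted by address for the search to work. A minimal standalone sketch of those two steps follows. parse_user_addr() and find_offset() are hypothetical names; returning -1 when nothing matches is a design choice that also avoids indexing obj_offsets[] with an unset i if obj_count were ever 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same layout as the struct offset introduced by the patch. */
struct offset {
	uint64_t addr;
	unsigned long idx;
	bool found;
};

/*
 * Parse "--- user = 0xHHHHHHHH LLLLLLLL\n" into a 64-bit address.
 * On success returns 0 and advances *str past the newline; else -1.
 */
static int parse_user_addr(const char **str, uint64_t *addr)
{
	static const char prefix[] = "--- user = 0x";
	const char *s;
	char *end;
	uint64_t hi, lo;

	assert(str && *str && addr);
	s = *str;
	if (strncmp(s, prefix, sizeof(prefix) - 1) != 0)
		return -1;
	s += sizeof(prefix) - 1;

	hi = strtoull(s, &end, 16);          /* upper 32 bits */
	if (*end != ' ')
		return -1;
	lo = strtoull(end + 1, &end, 16);    /* lower 32 bits */
	if (*end != '\n')
		return -1;

	*addr = hi << 32 | lo;
	*str = end + 1;
	return 0;
}

/* Binary search over offs[] sorted by ascending addr; -1 if absent. */
static long find_offset(const struct offset *offs, unsigned long count,
			uint64_t addr)
{
	unsigned long start = 0, end = count;

	while (end > start) {
		unsigned long i = start + (end - start) / 2;

		if (offs[i].addr < addr)
			start = i + 1;
		else if (offs[i].addr > addr)
			end = i;
		else
			return (long)i;
	}
	return -1;
}
```
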
> ---
>  tests/i915/gem_exec_capture.c | 344 +++++++++++++++++-----------------
>  1 file changed, 170 insertions(+), 174 deletions(-)
> 
> diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
> index 47ca64dd6..c85c198f7 100644
> --- a/tests/i915/gem_exec_capture.c
> +++ b/tests/i915/gem_exec_capture.c
> @@ -33,32 +33,175 @@
>  
>  IGT_TEST_DESCRIPTION("Check that we capture the user specified objects on a hang");
>  
> -static void check_error_state(int dir, struct drm_i915_gem_exec_object2 *obj)
> +struct offset {
> +	uint64_t addr;
> +	unsigned long idx;
> +	bool found;
> +};
> +
> +static unsigned long zlib_inflate(uint32_t **ptr, unsigned long len)
> +{
> +	struct z_stream_s zstream;
> +	void *out;
> +
> +	memset(&zstream, 0, sizeof(zstream));
> +
> +	zstream.next_in = (unsigned char *)*ptr;
> +	zstream.avail_in = 4*len;
> +
> +	if (inflateInit(&zstream) != Z_OK)
> +		return 0;
> +
> +	out = malloc(128*4096); /* approximate obj size */
> +	zstream.next_out = out;
> +	zstream.avail_out = 128*4096;
> +
> +	do {
> +		switch (inflate(&zstream, Z_SYNC_FLUSH)) {
> +		case Z_STREAM_END:
> +			goto end;
> +		case Z_OK:
> +			break;
> +		default:
> +			inflateEnd(&zstream);
> +			return 0;
> +		}
> +
> +		if (zstream.avail_out)
> +			break;
> +
> +		out = realloc(out, 2*zstream.total_out);
> +		if (out == NULL) {
> +			inflateEnd(&zstream);
> +			return 0;
> +		}
> +
> +		zstream.next_out = (unsigned char *)out + zstream.total_out;
> +		zstream.avail_out = zstream.total_out;
> +	} while (1);
> +end:
> +	inflateEnd(&zstream);
> +	free(*ptr);
> +	*ptr = out;
> +	return zstream.total_out / 4;
> +}
> +
> +static unsigned long
> +ascii85_decode(char *in, uint32_t **out, bool inflate, char **end)
> +{
> +	unsigned long len = 0, size = 1024;
> +
> +	*out = realloc(*out, sizeof(uint32_t)*size);
> +	if (*out == NULL)
> +		return 0;
> +
> +	while (*in >= '!' && *in <= 'z') {
> +		uint32_t v = 0;
> +
> +		if (len == size) {
> +			size *= 2;
> +			*out = realloc(*out, sizeof(uint32_t)*size);
> +			if (*out == NULL)
> +				return 0;
> +		}
> +
> +		if (*in == 'z') {
> +			in++;
> +		} else {
> +			v += in[0] - 33; v *= 85;
> +			v += in[1] - 33; v *= 85;
> +			v += in[2] - 33; v *= 85;
> +			v += in[3] - 33; v *= 85;
> +			v += in[4] - 33;
> +			in += 5;
> +		}
> +		(*out)[len++] = v;
> +	}
> +	*end = in;
> +
> +	if (!inflate)
> +		return len;
> +
> +	return zlib_inflate(out, len);
> +}
> +
> +static int check_error_state(int dir, struct offset *obj_offsets, int obj_count,
> +			     uint64_t obj_size, bool incremental)
>  {
>  	char *error, *str;
> -	bool found = false;
> +	int blobs = 0;
>  
>  	error = igt_sysfs_get(dir, "error");
>  	igt_sysfs_set(dir, "error", "Begone!");
> -
>  	igt_assert(error);
>  	igt_debug("%s\n", error);
>  
>  	/* render ring --- user = 0x00000000 ffffd000 */
> -	for (str = error; (str = strstr(str, "--- user = ")); str++) {
> +	for (str = error; (str = strstr(str, "--- user = ")); ) {
> +		uint32_t *data = NULL;
>  		uint64_t addr;
> -		uint32_t hi, lo;
> +		unsigned long i, sz;
> +		unsigned long start;
> +		unsigned long end;
>  
> -		igt_assert(sscanf(str, "--- user = 0x%x %x", &hi, &lo) == 2);
> -		addr = hi;
> +		if (strncmp(str, "--- user = 0x", 13))
> +			break;
> +		str += 13;
> +		addr = strtoul(str, &str, 16);
>  		addr <<= 32;
> -		addr |= lo;
> -		igt_assert_eq_u64(addr, obj->offset);
> -		found = true;
> +		addr |= strtoul(str + 1, &str, 16);
> +		igt_assert(*str++ == '\n');
> +
> +		start = 0;
> +		end = obj_count;
> +		while (end > start) {
> +			i = (end - start) / 2 + start;
> +			if (obj_offsets[i].addr < addr)
> +				start = i + 1;
> +			else if (obj_offsets[i].addr > addr)
> +				end = i;
> +			else
> +				break;
> +		}
> +		igt_assert(obj_offsets[i].addr == addr);
> +		igt_assert(!obj_offsets[i].found);
> +		obj_offsets[i].found = true;
> +		igt_debug("offset:%"PRIx64", index:%ld\n",
> +			  addr, obj_offsets[i].idx);
> +
> +		/* gtt_page_sizes = 0x00010000 */
> +		if (strncmp(str, "gtt_page_sizes = 0x", 19) == 0) {
> +			str += 19 + 8;
> +			igt_assert(*str++ == '\n');
> +		}
> +
> +		if (!(*str == ':' || *str == '~'))
> +			continue;
> +
> +		igt_debug("blob:%.64s\n", str);
> +		sz = ascii85_decode(str + 1, &data, *str == ':', &str);
> +
> +		igt_assert_eq(4 * sz, obj_size);
> +		igt_assert(*str++ == '\n');
> +		str = strchr(str, '-');
> +
> +		if (incremental) {
> +			uint32_t expect;
> +
> +			expect = obj_offsets[i].idx * obj_size;
> +			for (i = 0; i < sz; i++)
> +				igt_assert_eq(data[i], expect++);
> +		} else {
> +			for (i = 0; i < sz; i++)
> +				igt_assert_eq(data[i], 0);
> +		}
> +
> +		blobs++;
> +		free(data);
>  	}
>  
>  	free(error);
> -	igt_assert(found);
> +	return blobs;
>  }
>  
>  static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
> @@ -73,6 +216,7 @@ static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
>  	struct drm_i915_gem_relocation_entry reloc[2];
>  	struct drm_i915_gem_execbuffer2 execbuf;
>  	uint32_t *batch, *seqno;
> +	struct offset offset;
>  	int i;
>  
>  	memset(obj, 0, sizeof(obj));
> @@ -168,7 +312,10 @@ static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
>  
>  	/* Check that only the buffer we marked is reported in the error */
>  	igt_force_gpu_reset(fd);
> -	check_error_state(dir, &obj[CAPTURE]);
> +	memset(&offset, 0, sizeof(offset));
> +	offset.addr = obj[CAPTURE].offset;
> +	igt_assert_eq(check_error_state(dir, &offset, 1, target_size, false), 1);
> +	igt_assert(offset.found);
>  
>  	gem_sync(fd, obj[BATCH].handle);
>  
> @@ -183,11 +330,12 @@ static void capture(int fd, int dir, const intel_ctx_t *ctx, unsigned ring)
>  {
>  	uint32_t handle;
>  	uint64_t ahnd;
> +	int obj_size = 4096;
>  
> -	handle = gem_create(fd, 4096);
> +	handle = gem_create(fd, obj_size);
>  	ahnd = get_reloc_ahnd(fd, ctx->id);
>  
> -	__capture1(fd, dir, ahnd, ctx, ring, handle, 4096);
> +	__capture1(fd, dir, ahnd, ctx, ring, handle, obj_size);
>  
>  	gem_close(fd, handle);
>  	put_ahnd(ahnd);
> @@ -206,10 +354,8 @@ static int cmp(const void *A, const void *B)
>  	return 0;
>  }
>  
> -static struct offset {
> -	uint64_t addr;
> -	unsigned long idx;
> -} *__captureN(int fd, int dir, uint64_t ahnd, unsigned ring,
> +static struct offset *
> +__captureN(int fd, int dir, uint64_t ahnd, unsigned ring,
>  	      unsigned int size, int count,
>  	      unsigned int flags)
>  #define INCREMENTAL 0x1
> @@ -357,98 +503,11 @@ static struct offset {
>  	return offsets;
>  }
>  
> -static unsigned long zlib_inflate(uint32_t **ptr, unsigned long len)
> -{
> -	struct z_stream_s zstream;
> -	void *out;
> -
> -	memset(&zstream, 0, sizeof(zstream));
> -
> -	zstream.next_in = (unsigned char *)*ptr;
> -	zstream.avail_in = 4*len;
> -
> -	if (inflateInit(&zstream) != Z_OK)
> -		return 0;
> -
> -	out = malloc(128*4096); /* approximate obj size */
> -	zstream.next_out = out;
> -	zstream.avail_out = 128*4096;
> -
> -	do {
> -		switch (inflate(&zstream, Z_SYNC_FLUSH)) {
> -		case Z_STREAM_END:
> -			goto end;
> -		case Z_OK:
> -			break;
> -		default:
> -			inflateEnd(&zstream);
> -			return 0;
> -		}
> -
> -		if (zstream.avail_out)
> -			break;
> -
> -		out = realloc(out, 2*zstream.total_out);
> -		if (out == NULL) {
> -			inflateEnd(&zstream);
> -			return 0;
> -		}
> -
> -		zstream.next_out = (unsigned char *)out + zstream.total_out;
> -		zstream.avail_out = zstream.total_out;
> -	} while (1);
> -end:
> -	inflateEnd(&zstream);
> -	free(*ptr);
> -	*ptr = out;
> -	return zstream.total_out / 4;
> -}
> -
> -static unsigned long
> -ascii85_decode(char *in, uint32_t **out, bool inflate, char **end)
> -{
> -	unsigned long len = 0, size = 1024;
> -
> -	*out = realloc(*out, sizeof(uint32_t)*size);
> -	if (*out == NULL)
> -		return 0;
> -
> -	while (*in >= '!' && *in <= 'z') {
> -		uint32_t v = 0;
> -
> -		if (len == size) {
> -			size *= 2;
> -			*out = realloc(*out, sizeof(uint32_t)*size);
> -			if (*out == NULL)
> -				return 0;
> -		}
> -
> -		if (*in == 'z') {
> -			in++;
> -		} else {
> -			v += in[0] - 33; v *= 85;
> -			v += in[1] - 33; v *= 85;
> -			v += in[2] - 33; v *= 85;
> -			v += in[3] - 33; v *= 85;
> -			v += in[4] - 33;
> -			in += 5;
> -		}
> -		(*out)[len++] = v;
> -	}
> -	*end = in;
> -
> -	if (!inflate)
> -		return len;
> -
> -	return zlib_inflate(out, len);
> -}
> -
>  static void many(int fd, int dir, uint64_t size, unsigned int flags)
>  {
>  	uint64_t ram, gtt, ahnd;
>  	unsigned long count, blobs;
>  	struct offset *offsets;
> -	char *error, *str;
>  
>  	gtt = gem_aperture_size(fd) / size;
>  	ram = (intel_get_avail_ram_mb() << 20) / size;
> @@ -463,75 +522,10 @@ static void many(int fd, int dir, uint64_t size, unsigned int flags)
>  
>  	offsets = __captureN(fd, dir, ahnd, 0, size, count, flags);
>  
> -	error = igt_sysfs_get(dir, "error");
> -	igt_sysfs_set(dir, "error", "Begone!");
> -	igt_assert(error);
> -
> -	blobs = 0;
> -	/* render ring --- user = 0x00000000 ffffd000 */
> -	str = strstr(error, "--- user = ");
> -	while (str) {
> -		uint32_t *data = NULL;
> -		unsigned long i, sz;
> -		uint64_t addr;
> -
> -		if (strncmp(str, "--- user = 0x", 13))
> -			break;
> -
> -		str += 13;
> -		addr = strtoul(str, &str, 16);
> -		addr <<= 32;
> -		addr |= strtoul(str + 1, &str, 16);
> -		igt_assert(*str++ == '\n');
> -
> -		/* gtt_page_sizes = 0x00010000 */
> -		if (strncmp(str, "gtt_page_sizes = 0x", 19) == 0) {
> -			str += 19 + 8;
> -			igt_assert(*str++ == '\n');
> -		}
> -
> -		if (!(*str == ':' || *str == '~'))
> -			continue;
> -
> -		igt_debug("blob:%.64s\n", str);
> -		sz = ascii85_decode(str + 1, &data, *str == ':', &str);
> -		igt_assert_eq(4 * sz, size);
> -		igt_assert(*str++ == '\n');
> -		str = strchr(str, '-');
> -
> -		if (flags & INCREMENTAL) {
> -			unsigned long start = 0;
> -			unsigned long end = count;
> -			uint32_t expect;
> -
> -			while (end > start) {
> -				i = (end - start) / 2 + start;
> -				if (offsets[i].addr < addr)
> -					start = i + 1;
> -				else if (offsets[i].addr > addr)
> -					end = i;
> -				else
> -					break;
> -			}
> -			igt_assert(offsets[i].addr == addr);
> -			igt_debug("offset:%"PRIx64", index:%ld\n",
> -				  addr, offsets[i].idx);
> -
> -			expect = offsets[i].idx * size;
> -			for (i = 0; i < sz; i++)
> -				igt_assert_eq(data[i], expect++);
> -		} else {
> -			for (i = 0; i < sz; i++)
> -				igt_assert_eq(data[i], 0);
> -		}
> -
> -		blobs++;
> -		free(data);
> -	}
> +	blobs = check_error_state(dir, offsets, count, size, !!(flags & INCREMENTAL));
>  	igt_info("Captured %lu %"PRId64"-blobs out of a total of %lu\n",
>  		 blobs, size >> 12, count);
>  
> -	free(error);
>  	free(offsets);
>  	put_ahnd(ahnd);
>  }
> @@ -625,12 +619,14 @@ static void userptr(int fd, int dir)
>  	uint32_t handle;
>  	uint64_t ahnd;
>  	void *ptr;
> +	int obj_size = 4096;
>  
> -	igt_assert(posix_memalign(&ptr, 4096, 4096) == 0);
> -	igt_require(__gem_userptr(fd, ptr, 4096, 0, 0, &handle) == 0);
> +	igt_assert(posix_memalign(&ptr, obj_size, obj_size) == 0);
> +	memset(ptr, 0, obj_size);
> +	igt_require(__gem_userptr(fd, ptr, obj_size, 0, 0, &handle) == 0);
>  	ahnd = get_reloc_ahnd(fd, ctx->id);
>  
> -	__capture1(fd, dir, ahnd, intel_ctx_0(fd), 0, handle, 4096);
> +	__capture1(fd, dir, ahnd, intel_ctx_0(fd), 0, handle, obj_size);
>  
>  	gem_close(fd, handle);
>  	put_ahnd(ahnd);
> -- 
> 2.25.1
> 

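Blob contents in the error state are ascii85 text, zlib-compressed when the line starts with ':' and raw when it starts with '~', which is why the helper passes *str == ':' as the inflate flag to ascii85_decode(). Each group of five characters c0..c4 encodes one 32-bit word as the sum of (c_k - 33) * 85^(4-k), and a lone 'z' stands for a zero word. The following minimal sketch covers only the ascii85 step, decoding into a caller-supplied array. ascii85_words() is a hypothetical name, and stopping at a truncated group (where the patch's decoder could read past the end) is an added safety check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode ascii85 words from in until the first character outside
 * '!'..'z' (e.g. the trailing newline), a truncated group, or max_words.
 * *end is left pointing at the first unconsumed character.
 * Returns the number of words written to out.
 */
static size_t ascii85_words(const char *in, uint32_t *out, size_t max_words,
			    const char **end)
{
	size_t len = 0;

	assert(in && out && end);
	while (*in >= '!' && *in <= 'z' && len < max_words) {
		uint32_t v = 0;

		if (*in == 'z') {
			in++;                     /* shorthand for 0 */
		} else {
			for (int k = 0; k < 5; k++) {
				if (in[k] < '!' || in[k] > 'u')
					goto done;    /* truncated/invalid group */
				v = v * 85 + (uint32_t)(in[k] - '!');
			}
			in += 5;
		}
		out[len++] = v;
	}
done:
	*end = in;
	return len;
}
```

For example, "z!!!!\"s8W-!" decodes to the words 0, 1 and 0xFFFFFFFF.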

* Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 6/8] lib/igt_sysfs: Support large files
  2021-10-21 23:40   ` [igt-dev] " John.C.Harrison
@ 2021-10-29  2:46     ` Matthew Brost
  -1 siblings, 0 replies; 51+ messages in thread
From: Matthew Brost @ 2021-10-29  2:46 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: IGT-Dev, Intel-GFX

On Thu, Oct 21, 2021 at 04:40:42PM -0700, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The sysfs helper functions were all using basic 'int' data types for
> sizes, offsets, etc. when reading from sysfs. This works fine for
> little files, but not for large error capture logs (which can be
> gigabytes in size).
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  lib/igt_sysfs.c | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/igt_sysfs.c b/lib/igt_sysfs.c
> index 6919ac361..ee75e3ef1 100644
> --- a/lib/igt_sysfs.c
> +++ b/lib/igt_sysfs.c
> @@ -53,9 +53,11 @@
>   * provides basic support for like igt_sysfs_open().
>   */
>  
> -static int readN(int fd, char *buf, int len)
> +static ssize_t readN(int fd, char *buf, size_t len)
>  {
> -	int ret, total = 0;
> +	ssize_t ret;
> +	size_t total = 0;
> +
>  	do {
>  		ret = read(fd, buf + total, len - total);
>  		if (ret < 0)
> @@ -69,9 +71,11 @@ static int readN(int fd, char *buf, int len)
>  	return total ?: ret;
>  }
>  
> -static int writeN(int fd, const char *buf, int len)
> +static ssize_t writeN(int fd, const char *buf, size_t len)
>  {
> -	int ret, total = 0;
> +	ssize_t ret;
> +	size_t total = 0;
> +
>  	do {
>  		ret = write(fd, buf + total, len - total);
>  		if (ret < 0)
> @@ -218,8 +222,9 @@ bool igt_sysfs_set(int dir, const char *attr, const char *value)
>  char *igt_sysfs_get(int dir, const char *attr)
>  {
>  	char *buf;
> -	int len, offset, rem;
> -	int ret, fd;
> +	size_t len, offset, rem;
> +	ssize_t ret;
> +	int fd;
>  
>  	fd = openat(dir, attr, O_RDONLY);
>  	if (igt_debug_on(fd < 0))
> -- 
> 2.25.1
> 

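With int, any length or offset beyond INT_MAX (just under 2 GiB) overflows, so a multi-gigabyte error log cannot be read in full. The hunk shows only the edges of readN(); below is a minimal sketch of a complete loop using the wider types. read_full() is a hypothetical name, and retrying on EINTR and stopping at EOF are assumptions about the elided body. The explicit casts avoid relying on the GNU "?:" extension and the implicit size_t/ssize_t mix in "return total ?: ret;".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to len bytes, looping over short reads. size_t/ssize_t keep
 * offsets valid beyond INT_MAX. Returns the number of bytes read, or -1
 * with errno set if an error occurred before anything was read.
 */
static ssize_t read_full(int fd, char *buf, size_t len)
{
	size_t total = 0;

	assert(len <= (size_t)SSIZE_MAX);   /* result must fit ssize_t */
	while (total < len) {
		ssize_t ret = read(fd, buf + total, len - total);

		if (ret < 0) {
			if (errno == EINTR)
				continue;           /* interrupted: retry */
			return total ? (ssize_t)total : -1;
		}
		if (ret == 0)
			break;                      /* EOF */
		total += (size_t)ret;
	}
	return (ssize_t)total;
}
```
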

* Re: [Intel-gfx] [PATCH i-g-t 8/8] tests/i915/gem_exec_capture: Update to support GuC based resets
  2021-10-21 23:40 ` [Intel-gfx] [PATCH i-g-t 8/8] tests/i915/gem_exec_capture: Update to support GuC based resets John.C.Harrison
@ 2021-10-29  2:54   ` Matthew Brost
  0 siblings, 0 replies; 51+ messages in thread
From: Matthew Brost @ 2021-10-29  2:54 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: IGT-Dev, Intel-GFX

On Thu, Oct 21, 2021 at 04:40:44PM -0700, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> When GuC submission is enabled, GuC itself manages hang detection and
> recovery. Therefore, any test that relies on being able to trigger an
> engine reset in the driver will fail. Full GT resets can still be
> triggered by the driver. However, in that situation detecting the
> specific context that caused a hang is not possible as the driver has
> no information about what is actually running on the hardware at any
> given time. Plus of course, there was no context that caused the hang
> because the hang was triggered manually, so it's basically a bogus
> mechanism in the first place!
> 
> Update the capture test to cause a reset via the hangcheck mechanism
> by submitting a hanging batch and waiting. That way it is guaranteed to
> be testing the correct reset code paths for the current platform,
> whether that is GuC enabled or not.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  tests/i915/gem_exec_capture.c | 65 ++++++++++++++++++++++++++++-------
>  1 file changed, 53 insertions(+), 12 deletions(-)
> 
> diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
> index 8997125ee..dda6e6a8f 100644
> --- a/tests/i915/gem_exec_capture.c
> +++ b/tests/i915/gem_exec_capture.c
> @@ -23,6 +23,7 @@
>  
>  #include <sys/poll.h>
>  #include <zlib.h>
> +#include <sched.h>
>  
>  #include "i915/gem.h"
>  #include "i915/gem_create.h"
> @@ -31,6 +32,8 @@
>  #include "igt_rand.h"
>  #include "igt_sysfs.h"
>  
> +#define MAX_RESET_TIME	600
> +
>  IGT_TEST_DESCRIPTION("Check that we capture the user specified objects on a hang");
>  
>  struct offset {
> @@ -213,7 +216,29 @@ static void configure_hangs(int fd, const struct intel_execution_engine2 *e, int
>  	gem_engine_property_printf(fd, e->name, "heartbeat_interval_ms", "%d", 500);
>  
>  	/* Allow engine based resets and disable banning */
> -	igt_allow_hang(fd, ctxt_id, HANG_ALLOW_CAPTURE);
> +	igt_allow_hang(fd, ctxt_id, HANG_ALLOW_CAPTURE | HANG_WANT_ENGINE_RESET);
> +}
> +
> +static bool fence_busy(int fence)
> +{
> +	return poll(&(struct pollfd){fence, POLLIN}, 1, 0) == 0;
> +}
> +
> +static void wait_to_die(int fence_out)
> +{
> +	struct timeval before, after, delta;
> +
> +	/* Wait for a reset to occur */
> +	gettimeofday(&before, NULL);
> +	while (fence_busy(fence_out)) {
> +		gettimeofday(&after, NULL);
> +		timersub(&after, &before, &delta);
> +		igt_assert(delta.tv_sec < MAX_RESET_TIME);
> +		sched_yield();
> +	}
> +	gettimeofday(&after, NULL);
> +	timersub(&after, &before, &delta);
> +	igt_info("Target died after %ld.%06lds\n", delta.tv_sec, delta.tv_usec);
>  }
>  
>  static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
> @@ -230,7 +255,7 @@ static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
>  	struct drm_i915_gem_execbuffer2 execbuf;
>  	uint32_t *batch, *seqno;
>  	struct offset offset;
> -	int i;
> +	int i, fence_out;
>  
>  	configure_hangs(fd, e, ctx->id);
>  
> @@ -315,18 +340,25 @@ static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
>  	execbuf.flags = e->flags;
>  	if (gen > 3 && gen < 6)
>  		execbuf.flags |= I915_EXEC_SECURE;
> +	execbuf.flags |= I915_EXEC_FENCE_OUT;
>  	execbuf.rsvd1 = ctx->id;
> +	execbuf.rsvd2 = ~0UL;
>  
>  	igt_assert(!READ_ONCE(*seqno));
> -	gem_execbuf(fd, &execbuf);
> +	gem_execbuf_wr(fd, &execbuf);
> +
> +	fence_out = execbuf.rsvd2 >> 32;
> +	igt_assert(fence_out >= 0);
>  
>  	/* Wait for the request to start */
>  	while (READ_ONCE(*seqno) != 0xc0ffee)
>  		igt_assert(gem_bo_busy(fd, obj[SCRATCH].handle));
>  	munmap(seqno, 4096);
>  
> +	/* Wait for a reset to occur */
> +	wait_to_die(fence_out);
> +
>  	/* Check that only the buffer we marked is reported in the error */
> -	igt_force_gpu_reset(fd);
>  	memset(&offset, 0, sizeof(offset));
>  	offset.addr = obj[CAPTURE].offset;
>  	igt_assert_eq(check_error_state(dir, &offset, 1, target_size, false), 1);
> @@ -373,7 +405,8 @@ static int cmp(const void *A, const void *B)
>  static struct offset *
>  __captureN(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
>  	   const struct intel_execution_engine2 *e,
> -	   unsigned int size, int count, unsigned int flags)
> +	   unsigned int size, int count,
> +	   unsigned int flags, int *_fence_out)
>  #define INCREMENTAL 0x1
>  #define ASYNC 0x2
>  {
> @@ -383,7 +416,7 @@ __captureN(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
>  	struct drm_i915_gem_execbuffer2 execbuf;
>  	uint32_t *batch, *seqno;
>  	struct offset *offsets;
> -	int i;
> +	int i, fence_out;
>  
>  	configure_hangs(fd, e, ctx->id);
>  
> @@ -491,10 +524,17 @@ __captureN(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
>  	execbuf.flags = e->flags;
>  	if (gen > 3 && gen < 6)
>  		execbuf.flags |= I915_EXEC_SECURE;
> +	execbuf.flags |= I915_EXEC_FENCE_OUT;
>  	execbuf.rsvd1 = ctx->id;
> +	execbuf.rsvd2 = ~0UL;
>  
>  	igt_assert(!READ_ONCE(*seqno));
> -	gem_execbuf(fd, &execbuf);
> +	gem_execbuf_wr(fd, &execbuf);
> +
> +	fence_out = execbuf.rsvd2 >> 32;
> +	igt_assert(fence_out >= 0);
> +	if (_fence_out)
> +		*_fence_out = fence_out;
>  
>  	/* Wait for the request to start */
>  	while (READ_ONCE(*seqno) != 0xc0ffee)
> @@ -502,7 +542,7 @@ __captureN(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
>  	munmap(seqno, 4096);
>  
>  	if (!(flags & ASYNC)) {
> -		igt_force_gpu_reset(fd);
> +		wait_to_die(fence_out);
>  		gem_sync(fd, obj[count + 1].handle);
>  	}
>  
> @@ -549,7 +589,7 @@ static void many(int fd, int dir, uint64_t size, unsigned int flags)
>  	intel_require_memory(count, size, CHECK_RAM);
>  	ahnd = get_reloc_ahnd(fd, ctx->id);
>  
> -	offsets = __captureN(fd, dir, ahnd, ctx, e, size, count, flags);
> +	offsets = __captureN(fd, dir, ahnd, ctx, e, size, count, flags, NULL);
>  
>  	blobs = check_error_state(dir, offsets, count, size, !!(flags & INCREMENTAL));
>  	igt_info("Captured %lu %"PRId64"-blobs out of a total of %lu\n",
> @@ -602,6 +642,7 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
>  	igt_assert(pipe(link) == 0);
>  	igt_fork(child, 1) {
>  		const intel_ctx_t *ctx2;
> +		int fence_out;
>  		fd = gem_reopen_driver(fd);
>  		igt_debug("Submitting large capture [%ld x %dMiB objects]\n",
>  			  count, (int)(size >> 20));
> @@ -613,11 +654,11 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
>  		/* Reopen the allocator in the new process. */
>  		ahnd = get_reloc_ahnd(fd, ctx2->id);
>  
> -		free(__captureN(fd, dir, ahnd, ctx2, e, size, count, ASYNC));
> +		free(__captureN(fd, dir, ahnd, ctx2, e, size, count, ASYNC, &fence_out));
>  		put_ahnd(ahnd);
>  
>  		write(link[1], &fd, sizeof(fd)); /* wake the parent up */
> -		igt_force_gpu_reset(fd);
> +		wait_to_die(fence_out);
>  		write(link[1], &fd, sizeof(fd)); /* wake the parent up */
>  	}
>  	read(link[0], &dummy, sizeof(dummy));
> @@ -714,7 +755,7 @@ igt_main
>  		gem_require_mmap_wc(fd);
>  		igt_require(has_capture(fd));
>  		ctx = intel_ctx_create_all_physical(fd);
> -		igt_allow_hang(fd, ctx->id, HANG_ALLOW_CAPTURE);
> +		igt_allow_hang(fd, ctx->id, HANG_ALLOW_CAPTURE | HANG_WANT_ENGINE_RESET);
>  
>  		dir = igt_sysfs_open(fd);
>  		igt_require(igt_sysfs_set(dir, "error", "Begone!"));
> -- 
> 2.25.1
> 

* Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 2/8] tests/i915/gem_exec_capture: Cope with larger page sizes
  2021-10-21 23:40   ` [igt-dev] " John.C.Harrison
@ 2021-10-29 17:39     ` Matthew Brost
  -1 siblings, 0 replies; 51+ messages in thread
From: Matthew Brost @ 2021-10-29 17:39 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: IGT-Dev, Intel-GFX

On Thu, Oct 21, 2021 at 04:40:38PM -0700, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> At some point, larger than 4KB page sizes were added to the i915
> driver. This included adding an informational line to the buffer
> entries in error capture logs. However, the error capture test was not
> updated to skip this string, thus it would silently abort processing.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>  tests/i915/gem_exec_capture.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
> index 53649cdb2..47ca64dd6 100644
> --- a/tests/i915/gem_exec_capture.c
> +++ b/tests/i915/gem_exec_capture.c
> @@ -484,6 +484,12 @@ static void many(int fd, int dir, uint64_t size, unsigned int flags)
>  		addr |= strtoul(str + 1, &str, 16);
>  		igt_assert(*str++ == '\n');
>  
> +		/* gtt_page_sizes = 0x00010000 */
> +		if (strncmp(str, "gtt_page_sizes = 0x", 19) == 0) {
> +			str += 19 + 8;
> +			igt_assert(*str++ == '\n');
> +		}

Can you explain this logic to me? For the life of me I can't figure out
what this is doing. That probably warrants a more detailed comment too.

Matt 

> +
>  		if (!(*str == ':' || *str == '~'))
>  			continue;
>  
> -- 
> 2.25.1
> 

* Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 2/8] tests/i915/gem_exec_capture: Cope with larger page sizes
  2021-10-29 17:39     ` Matthew Brost
@ 2021-10-30  0:32       ` John Harrison
  -1 siblings, 0 replies; 51+ messages in thread
From: John Harrison @ 2021-10-30  0:32 UTC (permalink / raw)
  To: Matthew Brost; +Cc: IGT-Dev, Intel-GFX

On 10/29/2021 10:39, Matthew Brost wrote:
> On Thu, Oct 21, 2021 at 04:40:38PM -0700, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> At some point, larger than 4KB page sizes were added to the i915
>> driver. This included adding an informational line to the buffer
>> entries in error capture logs. However, the error capture test was not
>> updated to skip this string, thus it would silently abort processing.
>>
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> ---
>>   tests/i915/gem_exec_capture.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
>> index 53649cdb2..47ca64dd6 100644
>> --- a/tests/i915/gem_exec_capture.c
>> +++ b/tests/i915/gem_exec_capture.c
>> @@ -484,6 +484,12 @@ static void many(int fd, int dir, uint64_t size, unsigned int flags)
>>   		addr |= strtoul(str + 1, &str, 16);
>>   		igt_assert(*str++ == '\n');
>>   
>> +		/* gtt_page_sizes = 0x00010000 */
>> +		if (strncmp(str, "gtt_page_sizes = 0x", 19) == 0) {
>> +			str += 19 + 8;
>> +			igt_assert(*str++ == '\n');
>> +		}
> Can you explain this logic to me? For the life of me I can't figure out
> what this is doing. That probably warrants a more detailed comment too.
It's no different to the rest of the processing that this code was 
already doing.

if( start_of_current_line == "gtt_page_sizes = 0x") {
     current_line += strlen(above_string) + strlen(8-digit hex string);
     assert( next_character_of_current_line == end_of_line);
}

I.e. skip over any line that just contains the page size message.

John.

>
> Matt
>
>> +
>>   		if (!(*str == ':' || *str == '~'))
>>   			continue;
>>   
>> -- 
>> 2.25.1
>>
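
Following John's explanation, the same skip can be written without the magic numbers 19 and 8. The sketch derives the prefix length from the string itself and lets strtoul() consume the hex digits, as the surrounding parser already does for addresses. It is only a sketch: skip_page_sizes_line() is a hypothetical helper, and it assumes the line format "gtt_page_sizes = 0x<hex>\n" shown in the patch comment.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper. If str starts with a "gtt_page_sizes = 0x<hex>\n"
 * line, store the value in *sizes and return a pointer just past the
 * newline. If str starts with anything else, return str unchanged.
 * Return NULL if the prefix matches but the rest of the line is malformed.
 */
static char *skip_page_sizes_line(char *str, unsigned long *sizes)
{
	static const char prefix[] = "gtt_page_sizes = 0x";
	const size_t prefix_len = sizeof(prefix) - 1;	/* 19, excluding NUL */
	char *end;

	if (strncmp(str, prefix, prefix_len) != 0)
		return str;		/* not a page-size line */

	str += prefix_len;
	*sizes = strtoul(str, &end, 16);
	if (end == str || *end != '\n')
		return NULL;		/* no hex digits, or trailing garbage */

	return end + 1;
}
```

A caller in the error-state parser would igt_assert() on a NULL return, which preserves the existing assert-on-malformed-input behaviour. Unlike the fixed "+ 8", this version also does not depend on the field width the kernel uses when printing the value.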


* Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 2/8] tests/i915/gem_exec_capture: Cope with larger page sizes
  2021-10-30  0:32       ` John Harrison
@ 2021-11-02 23:18         ` Matthew Brost
  -1 siblings, 0 replies; 51+ messages in thread
From: Matthew Brost @ 2021-11-02 23:18 UTC (permalink / raw)
  To: John Harrison; +Cc: IGT-Dev, Intel-GFX

On Fri, Oct 29, 2021 at 05:32:40PM -0700, John Harrison wrote:
> On 10/29/2021 10:39, Matthew Brost wrote:
> > On Thu, Oct 21, 2021 at 04:40:38PM -0700, John.C.Harrison@Intel.com wrote:
> > > From: John Harrison <John.C.Harrison@Intel.com>
> > > 
> > > At some point, larger than 4KB page sizes were added to the i915
> > > driver. This included adding an informational line to the buffer
> > > entries in error capture logs. However, the error capture test was not
> > > updated to skip this string, thus it would silently abort processing.
> > > 
> > > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > > ---
> > >   tests/i915/gem_exec_capture.c | 6 ++++++
> > >   1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
> > > index 53649cdb2..47ca64dd6 100644
> > > --- a/tests/i915/gem_exec_capture.c
> > > +++ b/tests/i915/gem_exec_capture.c
> > > @@ -484,6 +484,12 @@ static void many(int fd, int dir, uint64_t size, unsigned int flags)
> > >   		addr |= strtoul(str + 1, &str, 16);
> > >   		igt_assert(*str++ == '\n');
> > > +		/* gtt_page_sizes = 0x00010000 */
> > > +		if (strncmp(str, "gtt_page_sizes = 0x", 19) == 0) {
> > > +			str += 19 + 8;
> > > +			igt_assert(*str++ == '\n');
> > > +		}
> > Can you explain this logic to me? For the life of me I can't figure out
> > what this is doing. That probably warrants a more detailed comment too.
> It's no different to the rest of the processing that this code was already
> doing.
> 
> if( start_of_current_line == "gtt_page_sizes = 0x") {
>     current_line += strlen(above_string) + strlen(8-digit hex string);
>     assert( next_character_of_current_line == end_of_line);
> }
> 
> I.e. skip over any line that just contains the page size message.
> 

Ok, got it. Not sure how I missed that. The magic numbers 19 and 8 were
confusing me, but I understand this now.

With that:
Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> John.
> 
> > 
> > Matt
> > 
> > > +
> > >   		if (!(*str == ':' || *str == '~'))
> > >   			continue;
> > > -- 
> > > 2.25.1
> > > 
> 

* Re: [Intel-gfx] [PATCH i-g-t 4/8] tests/i915/gem_exec_capture: Use contexts and engines properly
  2021-10-21 23:40   ` [igt-dev] " John.C.Harrison
@ 2021-11-02 23:34     ` Matthew Brost
  -1 siblings, 0 replies; 51+ messages in thread
From: Matthew Brost @ 2021-11-02 23:34 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: IGT-Dev, Intel-GFX

On Thu, Oct 21, 2021 at 04:40:40PM -0700, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Some of the capture tests were using explicit contexts, some not. Some
> were poking the per engine pre-emption timeout, some not. This would
> lead to sporadic failures due to random timeouts, contexts being
> banned depending upon how many subtests were run and/or how many
> engines a given platform has, and other such failures.
> 
> So, update all tests to be consistent.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>  tests/i915/gem_exec_capture.c | 80 +++++++++++++++++++++++++----------
>  1 file changed, 58 insertions(+), 22 deletions(-)
> 
> diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
> index c85c198f7..e373d24ed 100644
> --- a/tests/i915/gem_exec_capture.c
> +++ b/tests/i915/gem_exec_capture.c
> @@ -204,8 +204,19 @@ static int check_error_state(int dir, struct offset *obj_offsets, int obj_count,
>  	return blobs;
>  }
>  
> +static void configure_hangs(int fd, const struct intel_execution_engine2 *e, int ctxt_id)
> +{
> +	/* Ensure fast hang detection */
> +	gem_engine_property_printf(fd, e->name, "preempt_timeout_ms", "%d", 250);
> +	gem_engine_property_printf(fd, e->name, "heartbeat_interval_ms", "%d", 500);

#define for 250, 500?

> +
> +	/* Allow engine based resets and disable banning */
> +	igt_allow_hang(fd, ctxt_id, HANG_ALLOW_CAPTURE);
> +}
> +
>  static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
> -		       unsigned ring, uint32_t target, uint64_t target_size)
> +		       const struct intel_execution_engine2 *e,
> +		       uint32_t target, uint64_t target_size)
>  {
>  	const unsigned int gen = intel_gen(intel_get_drm_devid(fd));
>  	struct drm_i915_gem_exec_object2 obj[4];
> @@ -219,6 +230,8 @@ static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
>  	struct offset offset;
>  	int i;
>  
> +	configure_hangs(fd, e, ctx->id);
> +
>  	memset(obj, 0, sizeof(obj));
>  	obj[SCRATCH].handle = gem_create(fd, 4096);
>  	obj[SCRATCH].flags = EXEC_OBJECT_WRITE;
> @@ -297,7 +310,7 @@ static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
>  	memset(&execbuf, 0, sizeof(execbuf));
>  	execbuf.buffers_ptr = (uintptr_t)obj;
>  	execbuf.buffer_count = ARRAY_SIZE(obj);
> -	execbuf.flags = ring;
> +	execbuf.flags = e->flags;
>  	if (gen > 3 && gen < 6)
>  		execbuf.flags |= I915_EXEC_SECURE;
>  	execbuf.rsvd1 = ctx->id;
> @@ -326,7 +339,8 @@ static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
>  	gem_close(fd, obj[SCRATCH].handle);
>  }
>  
> -static void capture(int fd, int dir, const intel_ctx_t *ctx, unsigned ring)
> +static void capture(int fd, int dir, const intel_ctx_t *ctx,
> +		    const struct intel_execution_engine2 *e)
>  {
>  	uint32_t handle;
>  	uint64_t ahnd;
> @@ -335,7 +349,7 @@ static void capture(int fd, int dir, const intel_ctx_t *ctx, unsigned ring)
>  	handle = gem_create(fd, obj_size);
>  	ahnd = get_reloc_ahnd(fd, ctx->id);
>  
> -	__capture1(fd, dir, ahnd, ctx, ring, handle, obj_size);
> +	__capture1(fd, dir, ahnd, ctx, e, handle, obj_size);
>  
>  	gem_close(fd, handle);
>  	put_ahnd(ahnd);
> @@ -355,9 +369,9 @@ static int cmp(const void *A, const void *B)
>  }
>  
>  static struct offset *
> -__captureN(int fd, int dir, uint64_t ahnd, unsigned ring,
> -	      unsigned int size, int count,
> -	      unsigned int flags)
> +__captureN(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
> +	   const struct intel_execution_engine2 *e,
> +	   unsigned int size, int count, unsigned int flags)
>  #define INCREMENTAL 0x1
>  #define ASYNC 0x2
>  {
> @@ -369,6 +383,8 @@ __captureN(int fd, int dir, uint64_t ahnd, unsigned ring,
>  	struct offset *offsets;
>  	int i;
>  
> +	configure_hangs(fd, e, ctx->id);
> +
>  	offsets = calloc(count, sizeof(*offsets));
>  	igt_assert(offsets);
>  
> @@ -470,9 +486,10 @@ __captureN(int fd, int dir, uint64_t ahnd, unsigned ring,
>  	memset(&execbuf, 0, sizeof(execbuf));
>  	execbuf.buffers_ptr = (uintptr_t)obj;
>  	execbuf.buffer_count = count + 2;
> -	execbuf.flags = ring;
> +	execbuf.flags = e->flags;
>  	if (gen > 3 && gen < 6)
>  		execbuf.flags |= I915_EXEC_SECURE;
> +	execbuf.rsvd1 = ctx->id;
>  
>  	igt_assert(!READ_ONCE(*seqno));
>  	gem_execbuf(fd, &execbuf);
> @@ -505,10 +522,20 @@ __captureN(int fd, int dir, uint64_t ahnd, unsigned ring,
>  
>  static void many(int fd, int dir, uint64_t size, unsigned int flags)
>  {
> +	const struct intel_execution_engine2 *e;
> +	const intel_ctx_t *ctx;
>  	uint64_t ram, gtt, ahnd;
>  	unsigned long count, blobs;
>  	struct offset *offsets;
>  
> +	/* Find the first available engine: */
> +	ctx = intel_ctx_create_all_physical(fd);
> +	igt_assert(ctx);
> +	for_each_ctx_engine(fd, ctx, e)
> +		for_each_if(gem_class_can_store_dword(fd, e->class))
> +			break;
> +	igt_assert(e);

Duplicated below. Helper for this?

Matt

> +
>  	gtt = gem_aperture_size(fd) / size;
>  	ram = (intel_get_avail_ram_mb() << 20) / size;
>  	igt_debug("Available objects in GTT:%"PRIu64", RAM:%"PRIu64"\n",
> @@ -518,9 +545,9 @@ static void many(int fd, int dir, uint64_t size, unsigned int flags)
>  	igt_require(count > 1);
>  
>  	intel_require_memory(count, size, CHECK_RAM);
> -	ahnd = get_reloc_ahnd(fd, 0);
> +	ahnd = get_reloc_ahnd(fd, ctx->id);
>  
> -	offsets = __captureN(fd, dir, ahnd, 0, size, count, flags);
> +	offsets = __captureN(fd, dir, ahnd, ctx, e, size, count, flags);
>  
>  	blobs = check_error_state(dir, offsets, count, size, !!(flags & INCREMENTAL));
>  	igt_info("Captured %lu %"PRId64"-blobs out of a total of %lu\n",
> @@ -531,7 +558,7 @@ static void many(int fd, int dir, uint64_t size, unsigned int flags)
>  }
>  
>  static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
> -		    unsigned ring, const char *name)
> +		    const struct intel_execution_engine2 *e)
>  {
>  	const uint32_t bbe = MI_BATCH_BUFFER_END;
>  	struct drm_i915_gem_exec_object2 obj = {
> @@ -540,7 +567,7 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
>  	struct drm_i915_gem_execbuffer2 execbuf = {
>  		.buffers_ptr = to_user_pointer(&obj),
>  		.buffer_count = 1,
> -		.flags = ring,
> +		.flags = e->flags,
>  		.rsvd1 = ctx->id,
>  	};
>  	int64_t timeout = NSEC_PER_SEC; /* 1s, feeling generous, blame debug */
> @@ -555,10 +582,6 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
>  	igt_require(igt_params_set(fd, "reset", "%u", -1)); /* engine resets! */
>  	igt_require(gem_gpu_reset_type(fd) > 1);
>  
> -	/* Needs to be fast enough for the hangcheck to return within 1s */
> -	igt_require(gem_engine_property_printf(fd, name, "preempt_timeout_ms", "%d", 0) > 0);
> -	gem_engine_property_printf(fd, name, "preempt_timeout_ms", "%d", 500);
> -
>  	gtt = gem_aperture_size(fd) / size;
>  	ram = (intel_get_avail_ram_mb() << 20) / size;
>  	igt_debug("Available objects in GTT:%"PRIu64", RAM:%"PRIu64"\n",
> @@ -576,15 +599,19 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
>  
>  	igt_assert(pipe(link) == 0);
>  	igt_fork(child, 1) {
> +		const intel_ctx_t *ctx2;
>  		fd = gem_reopen_driver(fd);
>  		igt_debug("Submitting large capture [%ld x %dMiB objects]\n",
>  			  count, (int)(size >> 20));
>  
> +		ctx2 = intel_ctx_create_all_physical(fd);
> +		igt_assert(ctx2);
> +
>  		intel_allocator_init();
>  		/* Reopen the allocator in the new process. */
> -		ahnd = get_reloc_ahnd(fd, 0);
> +		ahnd = get_reloc_ahnd(fd, ctx2->id);
>  
> -		free(__captureN(fd, dir, ahnd, ring, size, count, ASYNC));
> +		free(__captureN(fd, dir, ahnd, ctx2, e, size, count, ASYNC));
>  		put_ahnd(ahnd);
>  
>  		write(link[1], &fd, sizeof(fd)); /* wake the parent up */
> @@ -615,18 +642,27 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
>  
>  static void userptr(int fd, int dir)
>  {
> -	const intel_ctx_t *ctx = intel_ctx_0(fd);
> +	const struct intel_execution_engine2 *e;
> +	const intel_ctx_t *ctx;
>  	uint32_t handle;
>  	uint64_t ahnd;
>  	void *ptr;
>  	int obj_size = 4096;
>  
> +	/* Find the first available engine: */
> +	ctx = intel_ctx_create_all_physical(fd);
> +	igt_assert(ctx);
> +	for_each_ctx_engine(fd, ctx, e)
> +		for_each_if(gem_class_can_store_dword(fd, e->class))
> +			break;
> +	igt_assert(e);
> +
>  	igt_assert(posix_memalign(&ptr, obj_size, obj_size) == 0);
>  	memset(ptr, 0, obj_size);
>  	igt_require(__gem_userptr(fd, ptr, obj_size, 0, 0, &handle) == 0);
>  	ahnd = get_reloc_ahnd(fd, ctx->id);
>  
> -	__capture1(fd, dir, ahnd, intel_ctx_0(fd), 0, handle, obj_size);
> +	__capture1(fd, dir, ahnd, ctx, e, handle, obj_size);
>  
>  	gem_close(fd, handle);
>  	put_ahnd(ahnd);
> @@ -684,7 +720,7 @@ igt_main
>  	}
>  
>  	test_each_engine("capture", fd, ctx, e)
> -		capture(fd, dir, ctx, e->flags);
> +		capture(fd, dir, ctx, e);
>  
>  	igt_subtest_f("many-4K-zero") {
>  		igt_require(gem_can_store_dword(fd, 0));
> @@ -719,7 +755,7 @@ igt_main
>  	}
>  
>  	test_each_engine("pi", fd, ctx, e)
> -		prioinv(fd, dir, ctx, e->flags, e->name);
> +		prioinv(fd, dir, ctx, e);
>  
>  	igt_fixture {
>  		close(dir);
> -- 
> 2.25.1
> 
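
Matt's request for named constants in configure_hangs() could look like the sketch below. It is standalone code, not IGT code: instead of gem_engine_property_printf(), it writes each attribute with openat()/dprintf() relative to an engine directory fd. The constant and helper names are hypothetical. The attribute names preempt_timeout_ms and heartbeat_interval_ms are the ones used in the patch. On real hardware the directory would be the engine's sysfs node (for example under /sys/class/drm/card0/engine/), but the test points the sketch at a temporary directory.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical names for the values hard-coded in configure_hangs() */
#define HANG_PREEMPT_TIMEOUT_MS		250
#define HANG_HEARTBEAT_INTERVAL_MS	500

/* Write "<value>\n" to an existing attribute file inside engine_dir. */
static int set_engine_property(int engine_dir, const char *attr,
			       unsigned int value)
{
	int fd = openat(engine_dir, attr, O_WRONLY);
	int ret;

	if (fd < 0)
		return -1;
	ret = dprintf(fd, "%u\n", value) < 0 ? -1 : 0;
	close(fd);
	return ret;
}

/* Ensure fast hang detection; returns 0 only if both writes succeed. */
static int configure_fast_hang_detection(int engine_dir)
{
	if (set_engine_property(engine_dir, "preempt_timeout_ms",
				HANG_PREEMPT_TIMEOUT_MS))
		return -1;
	return set_engine_property(engine_dir, "heartbeat_interval_ms",
				   HANG_HEARTBEAT_INTERVAL_MS);
}
```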

>  
>  static void many(int fd, int dir, uint64_t size, unsigned int flags)
>  {
> +	const struct intel_execution_engine2 *e;
> +	const intel_ctx_t *ctx;
>  	uint64_t ram, gtt, ahnd;
>  	unsigned long count, blobs;
>  	struct offset *offsets;
>  
> +	/* Find the first available engine: */
> +	ctx = intel_ctx_create_all_physical(fd);
> +	igt_assert(ctx);
> +	for_each_ctx_engine(fd, ctx, e)
> +		for_each_if(gem_class_can_store_dword(fd, e->class))
> +			break;
> +	igt_assert(e);

Duplicated below. Helper for this?

Matt
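For illustration, one possible shape for such a helper is sketched below. It is a hypothetical, self-contained model: struct engine stands in for struct intel_execution_engine2, and find_first_engine() and engine_has_class() are invented names, not existing IGT API. Because it returns NULL when nothing matches, both many() and userptr() could keep a single igt_assert() on the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for struct intel_execution_engine2. */
struct engine {
	const char *name;
	unsigned int class;
	unsigned long long flags;
};

/* Predicate deciding whether an engine is usable; data is caller-owned. */
typedef bool (*engine_pred_t)(const struct engine *e, const void *data);

/*
 * Return the first engine in list[0..count) accepted by pred,
 * or NULL if no engine matches.
 */
static const struct engine *
find_first_engine(const struct engine *list, size_t count,
		  engine_pred_t pred, const void *data)
{
	assert(list || count == 0);
	for (size_t i = 0; i < count; i++)
		if (pred(&list[i], data))
			return &list[i];
	return NULL;
}

/* Example predicate: accept engines of the class pointed to by data. */
static bool engine_has_class(const struct engine *e, const void *data)
{
	return e->class == *(const unsigned int *)data;
}
```

In the test itself, the predicate would presumably wrap gem_class_can_store_dword(fd, e->class), with fd passed through data.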

> +
>  	gtt = gem_aperture_size(fd) / size;
>  	ram = (intel_get_avail_ram_mb() << 20) / size;
>  	igt_debug("Available objects in GTT:%"PRIu64", RAM:%"PRIu64"\n",
> @@ -518,9 +545,9 @@ static void many(int fd, int dir, uint64_t size, unsigned int flags)
>  	igt_require(count > 1);
>  
>  	intel_require_memory(count, size, CHECK_RAM);
> -	ahnd = get_reloc_ahnd(fd, 0);
> +	ahnd = get_reloc_ahnd(fd, ctx->id);
>  
> -	offsets = __captureN(fd, dir, ahnd, 0, size, count, flags);
> +	offsets = __captureN(fd, dir, ahnd, ctx, e, size, count, flags);
>  
>  	blobs = check_error_state(dir, offsets, count, size, !!(flags & INCREMENTAL));
>  	igt_info("Captured %lu %"PRId64"-blobs out of a total of %lu\n",
> @@ -531,7 +558,7 @@ static void many(int fd, int dir, uint64_t size, unsigned int flags)
>  }
>  
>  static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
> -		    unsigned ring, const char *name)
> +		    const struct intel_execution_engine2 *e)
>  {
>  	const uint32_t bbe = MI_BATCH_BUFFER_END;
>  	struct drm_i915_gem_exec_object2 obj = {
> @@ -540,7 +567,7 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
>  	struct drm_i915_gem_execbuffer2 execbuf = {
>  		.buffers_ptr = to_user_pointer(&obj),
>  		.buffer_count = 1,
> -		.flags = ring,
> +		.flags = e->flags,
>  		.rsvd1 = ctx->id,
>  	};
>  	int64_t timeout = NSEC_PER_SEC; /* 1s, feeling generous, blame debug */
> @@ -555,10 +582,6 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
>  	igt_require(igt_params_set(fd, "reset", "%u", -1)); /* engine resets! */
>  	igt_require(gem_gpu_reset_type(fd) > 1);
>  
> -	/* Needs to be fast enough for the hangcheck to return within 1s */
> -	igt_require(gem_engine_property_printf(fd, name, "preempt_timeout_ms", "%d", 0) > 0);
> -	gem_engine_property_printf(fd, name, "preempt_timeout_ms", "%d", 500);
> -
>  	gtt = gem_aperture_size(fd) / size;
>  	ram = (intel_get_avail_ram_mb() << 20) / size;
>  	igt_debug("Available objects in GTT:%"PRIu64", RAM:%"PRIu64"\n",
> @@ -576,15 +599,19 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
>  
>  	igt_assert(pipe(link) == 0);
>  	igt_fork(child, 1) {
> +		const intel_ctx_t *ctx2;
>  		fd = gem_reopen_driver(fd);
>  		igt_debug("Submitting large capture [%ld x %dMiB objects]\n",
>  			  count, (int)(size >> 20));
>  
> +		ctx2 = intel_ctx_create_all_physical(fd);
> +		igt_assert(ctx2);
> +
>  		intel_allocator_init();
>  		/* Reopen the allocator in the new process. */
> -		ahnd = get_reloc_ahnd(fd, 0);
> +		ahnd = get_reloc_ahnd(fd, ctx2->id);
>  
> -		free(__captureN(fd, dir, ahnd, ring, size, count, ASYNC));
> +		free(__captureN(fd, dir, ahnd, ctx2, e, size, count, ASYNC));
>  		put_ahnd(ahnd);
>  
>  		write(link[1], &fd, sizeof(fd)); /* wake the parent up */
> @@ -615,18 +642,27 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
>  
>  static void userptr(int fd, int dir)
>  {
> -	const intel_ctx_t *ctx = intel_ctx_0(fd);
> +	const struct intel_execution_engine2 *e;
> +	const intel_ctx_t *ctx;
>  	uint32_t handle;
>  	uint64_t ahnd;
>  	void *ptr;
>  	int obj_size = 4096;
>  
> +	/* Find the first available engine: */
> +	ctx = intel_ctx_create_all_physical(fd);
> +	igt_assert(ctx);
> +	for_each_ctx_engine(fd, ctx, e)
> +		for_each_if(gem_class_can_store_dword(fd, e->class))
> +			break;
> +	igt_assert(e);
> +
>  	igt_assert(posix_memalign(&ptr, obj_size, obj_size) == 0);
>  	memset(ptr, 0, obj_size);
>  	igt_require(__gem_userptr(fd, ptr, obj_size, 0, 0, &handle) == 0);
>  	ahnd = get_reloc_ahnd(fd, ctx->id);
>  
> -	__capture1(fd, dir, ahnd, intel_ctx_0(fd), 0, handle, obj_size);
> +	__capture1(fd, dir, ahnd, ctx, e, handle, obj_size);
>  
>  	gem_close(fd, handle);
>  	put_ahnd(ahnd);
> @@ -684,7 +720,7 @@ igt_main
>  	}
>  
>  	test_each_engine("capture", fd, ctx, e)
> -		capture(fd, dir, ctx, e->flags);
> +		capture(fd, dir, ctx, e);
>  
>  	igt_subtest_f("many-4K-zero") {
>  		igt_require(gem_can_store_dword(fd, 0));
> @@ -719,7 +755,7 @@ igt_main
>  	}
>  
>  	test_each_engine("pi", fd, ctx, e)
> -		prioinv(fd, dir, ctx, e->flags, e->name);
> +		prioinv(fd, dir, ctx, e);
>  
>  	igt_fixture {
>  		close(dir);
> -- 
> 2.25.1
> 

* Re: [Intel-gfx] [PATCH i-g-t 7/8] lib/igt_gt: Allow per engine reset testing
  2021-10-21 23:40   ` [igt-dev] " John.C.Harrison
@ 2021-11-03  0:47     ` Matthew Brost
  -1 siblings, 0 replies; 51+ messages in thread
From: Matthew Brost @ 2021-11-03  0:47 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: IGT-Dev, Intel-GFX

On Thu, Oct 21, 2021 at 04:40:43PM -0700, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> With GuC submission, engine resets are handled entirely within GuC
> rather than within i915. Traditionally, IGT has disallowed engine
> based resets because they don't send the uevent which IGT uses to
> check for unexpected resets. However, it is important to be able to
> test all reset mechanisms that can be used, so allow engine based
> resets to be enabled.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
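The caching scheme in has_gpu_reset()/has_engine_reset() below can be modelled outside IGT as follows. This is a hedged sketch: query_reset_type() is a hypothetical stub standing in for gem_gpu_reset_type(), assumed (as the patch does) to return -1 when the kernel cannot answer the query and otherwise 0, 1 or 2, where 2 means per-engine reset is available. The important property is that the cached value must be dropped whenever the "reset" module parameter changes, which is what the new reset_query_once = -1 in igt_allow_hang() does.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stub for gem_gpu_reset_type(): -1 if the kernel cannot
 * answer the query, otherwise 0 (no reset), 1 (full GPU reset only)
 * or 2 (per-engine reset also available).
 */
static int fake_reset_type = 2;
static int query_count;

static int query_reset_type(void)
{
	query_count++;
	return fake_reset_type;
}

/* Cached answer; -1 means "not queried yet". */
static int reset_cache = -1;

static bool has_gpu_reset(int gen)
{
	if (reset_cache < 0) {
		reset_cache = query_reset_type();
		/* Query unsupported (old kernel): assume full reset from gen5 on. */
		if (reset_cache < 0)
			reset_cache = gen >= 5 ? 1 : 0;
	}
	return reset_cache > 0;
}

static bool has_engine_reset(int gen)
{
	has_gpu_reset(gen);	/* populates the cache on first use */
	assert(reset_cache >= 0);
	return reset_cache > 1;
}

/* Must be called after every change to the "reset" module parameter. */
static void invalidate_reset_cache(void)
{
	reset_cache = -1;
}
```

The stub makes the stale-cache case visible: after the reported reset type changes, has_engine_reset() keeps returning the old answer until invalidate_reset_cache() is called.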

> ---
>  lib/igt_gt.c | 44 +++++++++++++++++++++++++++++---------------
>  lib/igt_gt.h |  1 +
>  2 files changed, 30 insertions(+), 15 deletions(-)
> 
> diff --git a/lib/igt_gt.c b/lib/igt_gt.c
> index a0ba04cc1..7c7df95ee 100644
> --- a/lib/igt_gt.c
> +++ b/lib/igt_gt.c
> @@ -56,23 +56,28 @@
>   * engines.
>   */
>  
> +static int reset_query_once = -1;
> +
>  static bool has_gpu_reset(int fd)
>  {
> -	static int once = -1;
> -	if (once < 0) {
> -		struct drm_i915_getparam gp;
> -		int val = 0;
> -
> -		memset(&gp, 0, sizeof(gp));
> -		gp.param = 35; /* HAS_GPU_RESET */
> -		gp.value = &val;
> -
> -		if (ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
> -			once = intel_gen(intel_get_drm_devid(fd)) >= 5;
> -		else
> -			once = val > 0;
> +	if (reset_query_once < 0) {
> +		reset_query_once = gem_gpu_reset_type(fd);
> +
> +		/* Very old kernels did not support the query */
> +		if (reset_query_once == -1)
> +			reset_query_once =
> +			      (intel_gen(intel_get_drm_devid(fd)) >= 5) ? 1 : 0;
>  	}
> -	return once;
> +
> +	return reset_query_once > 0;
> +}
> +
> +static bool has_engine_reset(int fd)
> +{
> +	if (reset_query_once < 0)
> +		has_gpu_reset(fd);
> +
> +	return reset_query_once > 1;
>  }
>  
>  static void eat_error_state(int dev)
> @@ -176,7 +181,11 @@ igt_hang_t igt_allow_hang(int fd, unsigned ctx, unsigned flags)
>  		igt_skip("hang injection disabled by user [IGT_HANG=0]\n");
>  	gem_context_require_bannable(fd);
>  
> -	allow_reset = 1;
> +	if (flags & HANG_WANT_ENGINE_RESET)
> +		allow_reset = 2;
> +	else
> +		allow_reset = 1;
> +
>  	if ((flags & HANG_ALLOW_CAPTURE) == 0) {
>  		param.param = I915_CONTEXT_PARAM_NO_ERROR_CAPTURE;
>  		param.value = 1;
> @@ -187,11 +196,16 @@ igt_hang_t igt_allow_hang(int fd, unsigned ctx, unsigned flags)
>  		__gem_context_set_param(fd, &param);
>  		allow_reset = INT_MAX; /* any reset method */
>  	}
> +
>  	igt_require(igt_params_set(fd, "reset", "%d", allow_reset));
> +	reset_query_once = -1;  /* Re-query after changing param */
>  
>  	if (!igt_check_boolean_env_var("IGT_HANG_WITHOUT_RESET", false))
>  		igt_require(has_gpu_reset(fd));
>  
> +	if (flags & HANG_WANT_ENGINE_RESET)
> +		igt_require(has_engine_reset(fd));
> +
>  	ban = context_get_ban(fd, ctx);
>  	if ((flags & HANG_ALLOW_BAN) == 0)
>  		context_set_ban(fd, ctx, 0);
> diff --git a/lib/igt_gt.h b/lib/igt_gt.h
> index ceb044b86..c5059817b 100644
> --- a/lib/igt_gt.h
> +++ b/lib/igt_gt.h
> @@ -51,6 +51,7 @@ igt_hang_t igt_hang_ctx_with_ahnd(int fd, uint64_t ahnd, uint32_t ctx, int ring,
>  
>  #define HANG_ALLOW_BAN 1
>  #define HANG_ALLOW_CAPTURE 2
> +#define HANG_WANT_ENGINE_RESET 4
>  
>  igt_hang_t igt_hang_ring(int fd, int ring);
>  igt_hang_t igt_hang_ring_with_ahnd(int fd, int ring, uint64_t ahnd);
> -- 
> 2.25.1
> 
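The flag handling in igt_allow_hang() reduces to a small pure function. In this sketch reset_param_for_flags() is a hypothetical name; the HANG_* values are the ones from igt_gt.h above, and the INT_MAX case assumes, from the indentation in the diff, that the assignment sits inside the !HANG_ALLOW_CAPTURE branch (part of that block is elided from the hunk).

```c
#include <assert.h>
#include <limits.h>

/* Flag values as defined in lib/igt_gt.h by this patch. */
#define HANG_ALLOW_BAN		1
#define HANG_ALLOW_CAPTURE	2
#define HANG_WANT_ENGINE_RESET	4

/*
 * Value written to the i915 "reset" module parameter for a given set
 * of igt_allow_hang() flags: 1 selects full GPU reset, 2 also permits
 * per-engine reset, INT_MAX means "any reset method".
 */
static int reset_param_for_flags(unsigned int flags)
{
	int allow_reset = (flags & HANG_WANT_ENGINE_RESET) ? 2 : 1;

	/* Assumed to mirror the !HANG_ALLOW_CAPTURE branch in the diff. */
	if ((flags & HANG_ALLOW_CAPTURE) == 0)
		allow_reset = INT_MAX;

	assert(allow_reset > 0);
	return allow_reset;
}
```

If this reading is right, a caller passing HANG_ALLOW_CAPTURE alone, as configure_hangs() in patch 4/8 does, ends up requesting reset=1, which the i915 module parameter describes as full GPU reset; HANG_WANT_ENGINE_RESET has to be added as well to request 2.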

* Re: [Intel-gfx] [PATCH i-g-t 4/8] tests/i915/gem_exec_capture: Use contexts and engines properly
  2021-11-02 23:34     ` [igt-dev] " Matthew Brost
  (?)
@ 2021-11-03  1:45     ` John Harrison
  2021-11-03  9:36         ` [igt-dev] " Petri Latvala
  -1 siblings, 1 reply; 51+ messages in thread
From: John Harrison @ 2021-11-03  1:45 UTC (permalink / raw)
  To: Matthew Brost; +Cc: IGT-Dev, Intel-GFX

On 11/2/2021 16:34, Matthew Brost wrote:
> On Thu, Oct 21, 2021 at 04:40:40PM -0700, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> Some of the capture tests were using explicit contexts, some not. Some
>> were poking the per engine pre-emption timeout, some not. This would
>> lead to sporadic failures due to random timeouts, contexts being
>> banned depending upon how many subtests were run and/or how many
>> engines a given platform has, and other such failures.
>>
>> So, update all tests to be consistent.
>>
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> ---
>>   tests/i915/gem_exec_capture.c | 80 +++++++++++++++++++++++++----------
>>   1 file changed, 58 insertions(+), 22 deletions(-)
>>
>> diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
>> index c85c198f7..e373d24ed 100644
>> --- a/tests/i915/gem_exec_capture.c
>> +++ b/tests/i915/gem_exec_capture.c
>> @@ -204,8 +204,19 @@ static int check_error_state(int dir, struct offset *obj_offsets, int obj_count,
>>   	return blobs;
>>   }
>>   
>> +static void configure_hangs(int fd, const struct intel_execution_engine2 *e, int ctxt_id)
>> +{
>> +	/* Ensure fast hang detection */
>> +	gem_engine_property_printf(fd, e->name, "preempt_timeout_ms", "%d", 250);
>> +	gem_engine_property_printf(fd, e->name, "heartbeat_interval_ms", "%d", 500);
> #define for 250, 500?
Is there any point? There is no special reason for the values other than
being small enough to be fast but not so small as to be unusable. So
there isn't really any particular name to give them beyond
'SHORT_PREEMPT_TIMEOUT' or some such. And the whole point of the helper 
function is that the values are programmed in one place only and not 
used anywhere else. So there is no worry about repetition of magic numbers.


>
>> +
>> +	/* Allow engine based resets and disable banning */
>> +	igt_allow_hang(fd, ctxt_id, HANG_ALLOW_CAPTURE);
>> +}
>> +
>>   static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
>> -		       unsigned ring, uint32_t target, uint64_t target_size)
>> +		       const struct intel_execution_engine2 *e,
>> +		       uint32_t target, uint64_t target_size)
>>   {
>>   	const unsigned int gen = intel_gen(intel_get_drm_devid(fd));
>>   	struct drm_i915_gem_exec_object2 obj[4];
>> @@ -219,6 +230,8 @@ static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
>>   	struct offset offset;
>>   	int i;
>>   
>> +	configure_hangs(fd, e, ctx->id);
>> +
>>   	memset(obj, 0, sizeof(obj));
>>   	obj[SCRATCH].handle = gem_create(fd, 4096);
>>   	obj[SCRATCH].flags = EXEC_OBJECT_WRITE;
>> @@ -297,7 +310,7 @@ static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
>>   	memset(&execbuf, 0, sizeof(execbuf));
>>   	execbuf.buffers_ptr = (uintptr_t)obj;
>>   	execbuf.buffer_count = ARRAY_SIZE(obj);
>> -	execbuf.flags = ring;
>> +	execbuf.flags = e->flags;
>>   	if (gen > 3 && gen < 6)
>>   		execbuf.flags |= I915_EXEC_SECURE;
>>   	execbuf.rsvd1 = ctx->id;
>> @@ -326,7 +339,8 @@ static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
>>   	gem_close(fd, obj[SCRATCH].handle);
>>   }
>>   
>> -static void capture(int fd, int dir, const intel_ctx_t *ctx, unsigned ring)
>> +static void capture(int fd, int dir, const intel_ctx_t *ctx,
>> +		    const struct intel_execution_engine2 *e)
>>   {
>>   	uint32_t handle;
>>   	uint64_t ahnd;
>> @@ -335,7 +349,7 @@ static void capture(int fd, int dir, const intel_ctx_t *ctx, unsigned ring)
>>   	handle = gem_create(fd, obj_size);
>>   	ahnd = get_reloc_ahnd(fd, ctx->id);
>>   
>> -	__capture1(fd, dir, ahnd, ctx, ring, handle, obj_size);
>> +	__capture1(fd, dir, ahnd, ctx, e, handle, obj_size);
>>   
>>   	gem_close(fd, handle);
>>   	put_ahnd(ahnd);
>> @@ -355,9 +369,9 @@ static int cmp(const void *A, const void *B)
>>   }
>>   
>>   static struct offset *
>> -__captureN(int fd, int dir, uint64_t ahnd, unsigned ring,
>> -	      unsigned int size, int count,
>> -	      unsigned int flags)
>> +__captureN(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
>> +	   const struct intel_execution_engine2 *e,
>> +	   unsigned int size, int count, unsigned int flags)
>>   #define INCREMENTAL 0x1
>>   #define ASYNC 0x2
>>   {
>> @@ -369,6 +383,8 @@ __captureN(int fd, int dir, uint64_t ahnd, unsigned ring,
>>   	struct offset *offsets;
>>   	int i;
>>   
>> +	configure_hangs(fd, e, ctx->id);
>> +
>>   	offsets = calloc(count, sizeof(*offsets));
>>   	igt_assert(offsets);
>>   
>> @@ -470,9 +486,10 @@ __captureN(int fd, int dir, uint64_t ahnd, unsigned ring,
>>   	memset(&execbuf, 0, sizeof(execbuf));
>>   	execbuf.buffers_ptr = (uintptr_t)obj;
>>   	execbuf.buffer_count = count + 2;
>> -	execbuf.flags = ring;
>> +	execbuf.flags = e->flags;
>>   	if (gen > 3 && gen < 6)
>>   		execbuf.flags |= I915_EXEC_SECURE;
>> +	execbuf.rsvd1 = ctx->id;
>>   
>>   	igt_assert(!READ_ONCE(*seqno));
>>   	gem_execbuf(fd, &execbuf);
>> @@ -505,10 +522,20 @@ __captureN(int fd, int dir, uint64_t ahnd, unsigned ring,
>>   
>>   static void many(int fd, int dir, uint64_t size, unsigned int flags)
>>   {
>> +	const struct intel_execution_engine2 *e;
>> +	const intel_ctx_t *ctx;
>>   	uint64_t ram, gtt, ahnd;
>>   	unsigned long count, blobs;
>>   	struct offset *offsets;
>>   
>> +	/* Find the first available engine: */
>> +	ctx = intel_ctx_create_all_physical(fd);
>> +	igt_assert(ctx);
>> +	for_each_ctx_engine(fd, ctx, e)
>> +		for_each_if(gem_class_can_store_dword(fd, e->class))
>> +			break;
>> +	igt_assert(e);
> Duplicated below. Helper for this?
>
> Matt
Sure.

John.

>> +
>>   	gtt = gem_aperture_size(fd) / size;
>>   	ram = (intel_get_avail_ram_mb() << 20) / size;
>>   	igt_debug("Available objects in GTT:%"PRIu64", RAM:%"PRIu64"\n",
>> @@ -518,9 +545,9 @@ static void many(int fd, int dir, uint64_t size, unsigned int flags)
>>   	igt_require(count > 1);
>>   
>>   	intel_require_memory(count, size, CHECK_RAM);
>> -	ahnd = get_reloc_ahnd(fd, 0);
>> +	ahnd = get_reloc_ahnd(fd, ctx->id);
>>   
>> -	offsets = __captureN(fd, dir, ahnd, 0, size, count, flags);
>> +	offsets = __captureN(fd, dir, ahnd, ctx, e, size, count, flags);
>>   
>>   	blobs = check_error_state(dir, offsets, count, size, !!(flags & INCREMENTAL));
>>   	igt_info("Captured %lu %"PRId64"-blobs out of a total of %lu\n",
>> @@ -531,7 +558,7 @@ static void many(int fd, int dir, uint64_t size, unsigned int flags)
>>   }
>>   
>>   static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
>> -		    unsigned ring, const char *name)
>> +		    const struct intel_execution_engine2 *e)
>>   {
>>   	const uint32_t bbe = MI_BATCH_BUFFER_END;
>>   	struct drm_i915_gem_exec_object2 obj = {
>> @@ -540,7 +567,7 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
>>   	struct drm_i915_gem_execbuffer2 execbuf = {
>>   		.buffers_ptr = to_user_pointer(&obj),
>>   		.buffer_count = 1,
>> -		.flags = ring,
>> +		.flags = e->flags,
>>   		.rsvd1 = ctx->id,
>>   	};
>>   	int64_t timeout = NSEC_PER_SEC; /* 1s, feeling generous, blame debug */
>> @@ -555,10 +582,6 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
>>   	igt_require(igt_params_set(fd, "reset", "%u", -1)); /* engine resets! */
>>   	igt_require(gem_gpu_reset_type(fd) > 1);
>>   
>> -	/* Needs to be fast enough for the hangcheck to return within 1s */
>> -	igt_require(gem_engine_property_printf(fd, name, "preempt_timeout_ms", "%d", 0) > 0);
>> -	gem_engine_property_printf(fd, name, "preempt_timeout_ms", "%d", 500);
>> -
>>   	gtt = gem_aperture_size(fd) / size;
>>   	ram = (intel_get_avail_ram_mb() << 20) / size;
>>   	igt_debug("Available objects in GTT:%"PRIu64", RAM:%"PRIu64"\n",
>> @@ -576,15 +599,19 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
>>   
>>   	igt_assert(pipe(link) == 0);
>>   	igt_fork(child, 1) {
>> +		const intel_ctx_t *ctx2;
>>   		fd = gem_reopen_driver(fd);
>>   		igt_debug("Submitting large capture [%ld x %dMiB objects]\n",
>>   			  count, (int)(size >> 20));
>>   
>> +		ctx2 = intel_ctx_create_all_physical(fd);
>> +		igt_assert(ctx2);
>> +
>>   		intel_allocator_init();
>>   		/* Reopen the allocator in the new process. */
>> -		ahnd = get_reloc_ahnd(fd, 0);
>> +		ahnd = get_reloc_ahnd(fd, ctx2->id);
>>   
>> -		free(__captureN(fd, dir, ahnd, ring, size, count, ASYNC));
>> +		free(__captureN(fd, dir, ahnd, ctx2, e, size, count, ASYNC));
>>   		put_ahnd(ahnd);
>>   
>>   		write(link[1], &fd, sizeof(fd)); /* wake the parent up */
>> @@ -615,18 +642,27 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
>>   
>>   static void userptr(int fd, int dir)
>>   {
>> -	const intel_ctx_t *ctx = intel_ctx_0(fd);
>> +	const struct intel_execution_engine2 *e;
>> +	const intel_ctx_t *ctx;
>>   	uint32_t handle;
>>   	uint64_t ahnd;
>>   	void *ptr;
>>   	int obj_size = 4096;
>>   
>> +	/* Find the first available engine: */
>> +	ctx = intel_ctx_create_all_physical(fd);
>> +	igt_assert(ctx);
>> +	for_each_ctx_engine(fd, ctx, e)
>> +		for_each_if(gem_class_can_store_dword(fd, e->class))
>> +			break;
>> +	igt_assert(e);
>> +
>>   	igt_assert(posix_memalign(&ptr, obj_size, obj_size) == 0);
>>   	memset(ptr, 0, obj_size);
>>   	igt_require(__gem_userptr(fd, ptr, obj_size, 0, 0, &handle) == 0);
>>   	ahnd = get_reloc_ahnd(fd, ctx->id);
>>   
>> -	__capture1(fd, dir, ahnd, intel_ctx_0(fd), 0, handle, obj_size);
>> +	__capture1(fd, dir, ahnd, ctx, e, handle, obj_size);
>>   
>>   	gem_close(fd, handle);
>>   	put_ahnd(ahnd);
>> @@ -684,7 +720,7 @@ igt_main
>>   	}
>>   
>>   	test_each_engine("capture", fd, ctx, e)
>> -		capture(fd, dir, ctx, e->flags);
>> +		capture(fd, dir, ctx, e);
>>   
>>   	igt_subtest_f("many-4K-zero") {
>>   		igt_require(gem_can_store_dword(fd, 0));
>> @@ -719,7 +755,7 @@ igt_main
>>   	}
>>   
>>   	test_each_engine("pi", fd, ctx, e)
>> -		prioinv(fd, dir, ctx, e->flags, e->name);
>> +		prioinv(fd, dir, ctx, e);
>>   
>>   	igt_fixture {
>>   		close(dir);
>> -- 
>> 2.25.1
>>


* Re: [Intel-gfx] [PATCH i-g-t 4/8] tests/i915/gem_exec_capture: Use contexts and engines properly
  2021-11-03  1:45     ` John Harrison
@ 2021-11-03  9:36         ` Petri Latvala
  0 siblings, 0 replies; 51+ messages in thread
From: Petri Latvala @ 2021-11-03  9:36 UTC (permalink / raw)
  To: John Harrison; +Cc: IGT-Dev, Intel-GFX

On Tue, Nov 02, 2021 at 06:45:38PM -0700, John Harrison wrote:
> On 11/2/2021 16:34, Matthew Brost wrote:
> > On Thu, Oct 21, 2021 at 04:40:40PM -0700, John.C.Harrison@Intel.com wrote:
> > > From: John Harrison <John.C.Harrison@Intel.com>
> > > 
> > > Some of the capture tests were using explicit contexts, some not. Some
> > > were poking the per engine pre-emption timeout, some not. This would
> > > lead to sporadic failures due to random timeouts, contexts being
> > > banned depending upon how many subtests were run and/or how many
> > > engines a given platform has, and other such failures.
> > > 
> > > So, update all tests to be consistent.
> > > 
> > > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > > ---
> > >   tests/i915/gem_exec_capture.c | 80 +++++++++++++++++++++++++----------
> > >   1 file changed, 58 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
> > > index c85c198f7..e373d24ed 100644
> > > --- a/tests/i915/gem_exec_capture.c
> > > +++ b/tests/i915/gem_exec_capture.c
> > > @@ -204,8 +204,19 @@ static int check_error_state(int dir, struct offset *obj_offsets, int obj_count,
> > >   	return blobs;
> > >   }
> > > +static void configure_hangs(int fd, const struct intel_execution_engine2 *e, int ctxt_id)
> > > +{
> > > +	/* Ensure fast hang detection */
> > > +	gem_engine_property_printf(fd, e->name, "preempt_timeout_ms", "%d", 250);
> > > +	gem_engine_property_printf(fd, e->name, "heartbeat_interval_ms", "%d", 500);
> > #define for 250, 500?
> Is there any point? There is no special reason for the values other than
> being small enough to be fast but not so small as to be unusable. So
> there isn't really any particular name to give them beyond
> 'SHORT_PREEMPT_TIMEOUT' or some such. And the whole point of the helper
> function is that the values are programmed in one place only and not used
> anywhere else. So there is no worry about repetition of magic numbers.

In a year's time everyone will have forgotten this explanation and will
wonder whether the values are tied to some in-kernel behaviour or were
chosen for some other reason.

So please at least add a comment explaining why these values were chosen.
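A minimal sketch of what that could look like, assuming i915 engine properties are plain sysfs files (e.g. under /sys/class/drm/card0/engine/<engine>/). The macro names and write_engine_property() are hypothetical, standing in for gem_engine_property_printf(); the comment records the rationale John gives above so that it survives in the code.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical names. Neither value mirrors a kernel limit: each only
 * needs to be short enough that a hung batch is detected and reset
 * quickly, yet long enough that ordinary, non-hanging work is not
 * preempted or declared hung by mistake.
 */
#define CAPTURE_PREEMPT_TIMEOUT_MS	250
#define CAPTURE_HEARTBEAT_INTERVAL_MS	500

/*
 * Write an integer to <engine_dir>/<prop>, e.g. an engine's sysfs
 * directory. Returns 0 on success, -1 on any failure.
 */
static int write_engine_property(const char *engine_dir, const char *prop,
				 int value)
{
	char path[4096];
	FILE *f;
	int n, ok;

	assert(engine_dir && prop);
	n = snprintf(path, sizeof(path), "%s/%s", engine_dir, prop);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%d", value) > 0;
	/* stdio buffers the write, so a kernel error only surfaces at fclose() */
	ok &= fclose(f) == 0;
	return ok ? 0 : -1;
}
```

In the test itself only the literals would change, e.g. gem_engine_property_printf(fd, e->name, "preempt_timeout_ms", "%d", CAPTURE_PREEMPT_TIMEOUT_MS).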


-- 
Petri Latvala


> 
> 
> > 
> > > +
> > > +	/* Allow engine based resets and disable banning */
> > > +	igt_allow_hang(fd, ctxt_id, HANG_ALLOW_CAPTURE);
> > > +}
> > > +
> > >   static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
> > > -		       unsigned ring, uint32_t target, uint64_t target_size)
> > > +		       const struct intel_execution_engine2 *e,
> > > +		       uint32_t target, uint64_t target_size)
> > >   {
> > >   	const unsigned int gen = intel_gen(intel_get_drm_devid(fd));
> > >   	struct drm_i915_gem_exec_object2 obj[4];
> > > @@ -219,6 +230,8 @@ static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
> > >   	struct offset offset;
> > >   	int i;
> > > +	configure_hangs(fd, e, ctx->id);
> > > +
> > >   	memset(obj, 0, sizeof(obj));
> > >   	obj[SCRATCH].handle = gem_create(fd, 4096);
> > >   	obj[SCRATCH].flags = EXEC_OBJECT_WRITE;
> > > @@ -297,7 +310,7 @@ static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
> > >   	memset(&execbuf, 0, sizeof(execbuf));
> > >   	execbuf.buffers_ptr = (uintptr_t)obj;
> > >   	execbuf.buffer_count = ARRAY_SIZE(obj);
> > > -	execbuf.flags = ring;
> > > +	execbuf.flags = e->flags;
> > >   	if (gen > 3 && gen < 6)
> > >   		execbuf.flags |= I915_EXEC_SECURE;
> > >   	execbuf.rsvd1 = ctx->id;
> > > @@ -326,7 +339,8 @@ static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
> > >   	gem_close(fd, obj[SCRATCH].handle);
> > >   }
> > > -static void capture(int fd, int dir, const intel_ctx_t *ctx, unsigned ring)
> > > +static void capture(int fd, int dir, const intel_ctx_t *ctx,
> > > +		    const struct intel_execution_engine2 *e)
> > >   {
> > >   	uint32_t handle;
> > >   	uint64_t ahnd;
> > > @@ -335,7 +349,7 @@ static void capture(int fd, int dir, const intel_ctx_t *ctx, unsigned ring)
> > >   	handle = gem_create(fd, obj_size);
> > >   	ahnd = get_reloc_ahnd(fd, ctx->id);
> > > -	__capture1(fd, dir, ahnd, ctx, ring, handle, obj_size);
> > > +	__capture1(fd, dir, ahnd, ctx, e, handle, obj_size);
> > >   	gem_close(fd, handle);
> > >   	put_ahnd(ahnd);
> > > @@ -355,9 +369,9 @@ static int cmp(const void *A, const void *B)
> > >   }
> > >   static struct offset *
> > > -__captureN(int fd, int dir, uint64_t ahnd, unsigned ring,
> > > -	      unsigned int size, int count,
> > > -	      unsigned int flags)
> > > +__captureN(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
> > > +	   const struct intel_execution_engine2 *e,
> > > +	   unsigned int size, int count, unsigned int flags)
> > >   #define INCREMENTAL 0x1
> > >   #define ASYNC 0x2
> > >   {
> > > @@ -369,6 +383,8 @@ __captureN(int fd, int dir, uint64_t ahnd, unsigned ring,
> > >   	struct offset *offsets;
> > >   	int i;
> > > +	configure_hangs(fd, e, ctx->id);
> > > +
> > >   	offsets = calloc(count, sizeof(*offsets));
> > >   	igt_assert(offsets);
> > > @@ -470,9 +486,10 @@ __captureN(int fd, int dir, uint64_t ahnd, unsigned ring,
> > >   	memset(&execbuf, 0, sizeof(execbuf));
> > >   	execbuf.buffers_ptr = (uintptr_t)obj;
> > >   	execbuf.buffer_count = count + 2;
> > > -	execbuf.flags = ring;
> > > +	execbuf.flags = e->flags;
> > >   	if (gen > 3 && gen < 6)
> > >   		execbuf.flags |= I915_EXEC_SECURE;
> > > +	execbuf.rsvd1 = ctx->id;
> > >   	igt_assert(!READ_ONCE(*seqno));
> > >   	gem_execbuf(fd, &execbuf);
> > > @@ -505,10 +522,20 @@ __captureN(int fd, int dir, uint64_t ahnd, unsigned ring,
> > >   static void many(int fd, int dir, uint64_t size, unsigned int flags)
> > >   {
> > > +	const struct intel_execution_engine2 *e;
> > > +	const intel_ctx_t *ctx;
> > >   	uint64_t ram, gtt, ahnd;
> > >   	unsigned long count, blobs;
> > >   	struct offset *offsets;
> > > +	/* Find the first available engine: */
> > > +	ctx = intel_ctx_create_all_physical(fd);
> > > +	igt_assert(ctx);
> > > +	for_each_ctx_engine(fd, ctx, e)
> > > +		for_each_if(gem_class_can_store_dword(fd, e->class))
> > > +			break;
> > > +	igt_assert(e);
> > Duplicated below. Helper for this?
> > 
> > Matt
> Sure.
> 
> John.
> 
> > > +
> > >   	gtt = gem_aperture_size(fd) / size;
> > >   	ram = (intel_get_avail_ram_mb() << 20) / size;
> > >   	igt_debug("Available objects in GTT:%"PRIu64", RAM:%"PRIu64"\n",
> > > @@ -518,9 +545,9 @@ static void many(int fd, int dir, uint64_t size, unsigned int flags)
> > >   	igt_require(count > 1);
> > >   	intel_require_memory(count, size, CHECK_RAM);
> > > -	ahnd = get_reloc_ahnd(fd, 0);
> > > +	ahnd = get_reloc_ahnd(fd, ctx->id);
> > > -	offsets = __captureN(fd, dir, ahnd, 0, size, count, flags);
> > > +	offsets = __captureN(fd, dir, ahnd, ctx, e, size, count, flags);
> > >   	blobs = check_error_state(dir, offsets, count, size, !!(flags & INCREMENTAL));
> > >   	igt_info("Captured %lu %"PRId64"-blobs out of a total of %lu\n",
> > > @@ -531,7 +558,7 @@ static void many(int fd, int dir, uint64_t size, unsigned int flags)
> > >   }
> > >   static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
> > > -		    unsigned ring, const char *name)
> > > +		    const struct intel_execution_engine2 *e)
> > >   {
> > >   	const uint32_t bbe = MI_BATCH_BUFFER_END;
> > >   	struct drm_i915_gem_exec_object2 obj = {
> > > @@ -540,7 +567,7 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
> > >   	struct drm_i915_gem_execbuffer2 execbuf = {
> > >   		.buffers_ptr = to_user_pointer(&obj),
> > >   		.buffer_count = 1,
> > > -		.flags = ring,
> > > +		.flags = e->flags,
> > >   		.rsvd1 = ctx->id,
> > >   	};
> > >   	int64_t timeout = NSEC_PER_SEC; /* 1s, feeling generous, blame debug */
> > > @@ -555,10 +582,6 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
> > >   	igt_require(igt_params_set(fd, "reset", "%u", -1)); /* engine resets! */
> > >   	igt_require(gem_gpu_reset_type(fd) > 1);
> > > -	/* Needs to be fast enough for the hangcheck to return within 1s */
> > > -	igt_require(gem_engine_property_printf(fd, name, "preempt_timeout_ms", "%d", 0) > 0);
> > > -	gem_engine_property_printf(fd, name, "preempt_timeout_ms", "%d", 500);
> > > -
> > >   	gtt = gem_aperture_size(fd) / size;
> > >   	ram = (intel_get_avail_ram_mb() << 20) / size;
> > >   	igt_debug("Available objects in GTT:%"PRIu64", RAM:%"PRIu64"\n",
> > > @@ -576,15 +599,19 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
> > >   	igt_assert(pipe(link) == 0);
> > >   	igt_fork(child, 1) {
> > > +		const intel_ctx_t *ctx2;
> > >   		fd = gem_reopen_driver(fd);
> > >   		igt_debug("Submitting large capture [%ld x %dMiB objects]\n",
> > >   			  count, (int)(size >> 20));
> > > +		ctx2 = intel_ctx_create_all_physical(fd);
> > > +		igt_assert(ctx2);
> > > +
> > >   		intel_allocator_init();
> > >   		/* Reopen the allocator in the new process. */
> > > -		ahnd = get_reloc_ahnd(fd, 0);
> > > +		ahnd = get_reloc_ahnd(fd, ctx2->id);
> > > -		free(__captureN(fd, dir, ahnd, ring, size, count, ASYNC));
> > > +		free(__captureN(fd, dir, ahnd, ctx2, e, size, count, ASYNC));
> > >   		put_ahnd(ahnd);
> > >   		write(link[1], &fd, sizeof(fd)); /* wake the parent up */
> > > @@ -615,18 +642,27 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
> > >   static void userptr(int fd, int dir)
> > >   {
> > > -	const intel_ctx_t *ctx = intel_ctx_0(fd);
> > > +	const struct intel_execution_engine2 *e;
> > > +	const intel_ctx_t *ctx;
> > >   	uint32_t handle;
> > >   	uint64_t ahnd;
> > >   	void *ptr;
> > >   	int obj_size = 4096;
> > > +	/* Find the first available engine: */
> > > +	ctx = intel_ctx_create_all_physical(fd);
> > > +	igt_assert(ctx);
> > > +	for_each_ctx_engine(fd, ctx, e)
> > > +		for_each_if(gem_class_can_store_dword(fd, e->class))
> > > +			break;
> > > +	igt_assert(e);
> > > +
> > >   	igt_assert(posix_memalign(&ptr, obj_size, obj_size) == 0);
> > >   	memset(ptr, 0, obj_size);
> > >   	igt_require(__gem_userptr(fd, ptr, obj_size, 0, 0, &handle) == 0);
> > >   	ahnd = get_reloc_ahnd(fd, ctx->id);
> > > -	__capture1(fd, dir, ahnd, intel_ctx_0(fd), 0, handle, obj_size);
> > > +	__capture1(fd, dir, ahnd, ctx, e, handle, obj_size);
> > >   	gem_close(fd, handle);
> > >   	put_ahnd(ahnd);
> > > @@ -684,7 +720,7 @@ igt_main
> > >   	}
> > >   	test_each_engine("capture", fd, ctx, e)
> > > -		capture(fd, dir, ctx, e->flags);
> > > +		capture(fd, dir, ctx, e);
> > >   	igt_subtest_f("many-4K-zero") {
> > >   		igt_require(gem_can_store_dword(fd, 0));
> > > @@ -719,7 +755,7 @@ igt_main
> > >   	}
> > >   	test_each_engine("pi", fd, ctx, e)
> > > -		prioinv(fd, dir, ctx, e->flags, e->name);
> > > +		prioinv(fd, dir, ctx, e);
> > >   	igt_fixture {
> > >   		close(dir);
> > > -- 
> > > 2.25.1
> > > 
> 
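Matt's request for a helper could be met by moving the first-engine search that many() and userptr() share into one function. The sketch below is a hypothetical standalone version: `struct engine`, `find_first_engine()` and the class bitmask are illustrative stand-ins, not IGT API. In gem_exec_capture.c the loop would instead walk for_each_ctx_engine() and test gem_class_can_store_dword(), as the quoted patch does, and the caller would keep its igt_assert(e).

```c
#include <stddef.h>

/* Hypothetical stand-in for IGT's struct intel_execution_engine2. */
struct engine {
	const char *name;
	unsigned int class;
};

/*
 * Return the first engine whose class bit is set in class_mask, or NULL
 * if no engine qualifies. The bitmask stands in for a per-class
 * capability check such as gem_class_can_store_dword().
 */
static const struct engine *
find_first_engine(const struct engine *engines, size_t count,
		  unsigned int class_mask)
{
	for (size_t i = 0; i < count; i++)
		if (class_mask & (1u << engines[i].class))
			return &engines[i];

	return NULL;
}
```

Both callers then reduce to one call plus the existing NULL check, so the search logic lives in a single place.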

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [igt-dev] [Intel-gfx] [PATCH i-g-t 4/8] tests/i915/gem_exec_capture: Use contexts and engines properly
@ 2021-11-03  9:36         ` Petri Latvala
  0 siblings, 0 replies; 51+ messages in thread
From: Petri Latvala @ 2021-11-03  9:36 UTC (permalink / raw)
  To: John Harrison; +Cc: IGT-Dev, Intel-GFX

On Tue, Nov 02, 2021 at 06:45:38PM -0700, John Harrison wrote:
> On 11/2/2021 16:34, Matthew Brost wrote:
> > On Thu, Oct 21, 2021 at 04:40:40PM -0700, John.C.Harrison@Intel.com wrote:
> > > From: John Harrison <John.C.Harrison@Intel.com>
> > > 
> > > Some of the capture tests were using explicit contexts, some not. Some
> > > were poking the per engine pre-emption timeout, some not. This would
> > > lead to sporadic failures due to random timeouts, contexts being
> > > banned depending upon how many subtests were run and/or how many
> > > engines a given platform has, and other such failures.
> > > 
> > > So, update all tests to be consistent.
> > > 
> > > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > > ---
> > >   tests/i915/gem_exec_capture.c | 80 +++++++++++++++++++++++++----------
> > >   1 file changed, 58 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
> > > index c85c198f7..e373d24ed 100644
> > > --- a/tests/i915/gem_exec_capture.c
> > > +++ b/tests/i915/gem_exec_capture.c
> > > @@ -204,8 +204,19 @@ static int check_error_state(int dir, struct offset *obj_offsets, int obj_count,
> > >   	return blobs;
> > >   }
> > > +static void configure_hangs(int fd, const struct intel_execution_engine2 *e, int ctxt_id)
> > > +{
> > > +	/* Ensure fast hang detection */
> > > +	gem_engine_property_printf(fd, e->name, "preempt_timeout_ms", "%d", 250);
> > > +	gem_engine_property_printf(fd, e->name, "heartbeat_interval_ms", "%d", 500);
> > #define for 250, 500?
> Is there any point? There is no special reason for the values other than
> being small enough to be fast and long enough to be usable. So
> there isn't really any particular name to give them beyond
> 'SHORT_PREEMPT_TIMEOUT' or some such. And the whole point of the helper
> function is that the values are programmed in one place only and not used
> anywhere else. So there is no worry about repetition of magic numbers.

In about one year everyone has forgotten this explanation and will
wonder if it's related to some in-kernel behaviour or if there's some
other reason these values have been chosen.

So at least a comment why the values are these, please.


-- 
Petri Latvala


> 
> 
> > 
> > > +
> > > +	/* Allow engine based resets and disable banning */
> > > +	igt_allow_hang(fd, ctxt_id, HANG_ALLOW_CAPTURE);
> > > +}
> > > +
> > >   static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
> > > -		       unsigned ring, uint32_t target, uint64_t target_size)
> > > +		       const struct intel_execution_engine2 *e,
> > > +		       uint32_t target, uint64_t target_size)
> > >   {
> > >   	const unsigned int gen = intel_gen(intel_get_drm_devid(fd));
> > >   	struct drm_i915_gem_exec_object2 obj[4];
> > > @@ -219,6 +230,8 @@ static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
> > >   	struct offset offset;
> > >   	int i;
> > > +	configure_hangs(fd, e, ctx->id);
> > > +
> > >   	memset(obj, 0, sizeof(obj));
> > >   	obj[SCRATCH].handle = gem_create(fd, 4096);
> > >   	obj[SCRATCH].flags = EXEC_OBJECT_WRITE;
> > > @@ -297,7 +310,7 @@ static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
> > >   	memset(&execbuf, 0, sizeof(execbuf));
> > >   	execbuf.buffers_ptr = (uintptr_t)obj;
> > >   	execbuf.buffer_count = ARRAY_SIZE(obj);
> > > -	execbuf.flags = ring;
> > > +	execbuf.flags = e->flags;
> > >   	if (gen > 3 && gen < 6)
> > >   		execbuf.flags |= I915_EXEC_SECURE;
> > >   	execbuf.rsvd1 = ctx->id;
> > > @@ -326,7 +339,8 @@ static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
> > >   	gem_close(fd, obj[SCRATCH].handle);
> > >   }
> > > -static void capture(int fd, int dir, const intel_ctx_t *ctx, unsigned ring)
> > > +static void capture(int fd, int dir, const intel_ctx_t *ctx,
> > > +		    const struct intel_execution_engine2 *e)
> > >   {
> > >   	uint32_t handle;
> > >   	uint64_t ahnd;
> > > @@ -335,7 +349,7 @@ static void capture(int fd, int dir, const intel_ctx_t *ctx, unsigned ring)
> > >   	handle = gem_create(fd, obj_size);
> > >   	ahnd = get_reloc_ahnd(fd, ctx->id);
> > > -	__capture1(fd, dir, ahnd, ctx, ring, handle, obj_size);
> > > +	__capture1(fd, dir, ahnd, ctx, e, handle, obj_size);
> > >   	gem_close(fd, handle);
> > >   	put_ahnd(ahnd);
> > > @@ -355,9 +369,9 @@ static int cmp(const void *A, const void *B)
> > >   }
> > >   static struct offset *
> > > -__captureN(int fd, int dir, uint64_t ahnd, unsigned ring,
> > > -	      unsigned int size, int count,
> > > -	      unsigned int flags)
> > > +__captureN(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
> > > +	   const struct intel_execution_engine2 *e,
> > > +	   unsigned int size, int count, unsigned int flags)
> > >   #define INCREMENTAL 0x1
> > >   #define ASYNC 0x2
> > >   {
> > > @@ -369,6 +383,8 @@ __captureN(int fd, int dir, uint64_t ahnd, unsigned ring,
> > >   	struct offset *offsets;
> > >   	int i;
> > > +	configure_hangs(fd, e, ctx->id);
> > > +
> > >   	offsets = calloc(count, sizeof(*offsets));
> > >   	igt_assert(offsets);
> > > @@ -470,9 +486,10 @@ __captureN(int fd, int dir, uint64_t ahnd, unsigned ring,
> > >   	memset(&execbuf, 0, sizeof(execbuf));
> > >   	execbuf.buffers_ptr = (uintptr_t)obj;
> > >   	execbuf.buffer_count = count + 2;
> > > -	execbuf.flags = ring;
> > > +	execbuf.flags = e->flags;
> > >   	if (gen > 3 && gen < 6)
> > >   		execbuf.flags |= I915_EXEC_SECURE;
> > > +	execbuf.rsvd1 = ctx->id;
> > >   	igt_assert(!READ_ONCE(*seqno));
> > >   	gem_execbuf(fd, &execbuf);
> > > @@ -505,10 +522,20 @@ __captureN(int fd, int dir, uint64_t ahnd, unsigned ring,
> > >   static void many(int fd, int dir, uint64_t size, unsigned int flags)
> > >   {
> > > +	const struct intel_execution_engine2 *e;
> > > +	const intel_ctx_t *ctx;
> > >   	uint64_t ram, gtt, ahnd;
> > >   	unsigned long count, blobs;
> > >   	struct offset *offsets;
> > > +	/* Find the first available engine: */
> > > +	ctx = intel_ctx_create_all_physical(fd);
> > > +	igt_assert(ctx);
> > > +	for_each_ctx_engine(fd, ctx, e)
> > > +		for_each_if(gem_class_can_store_dword(fd, e->class))
> > > +			break;
> > > +	igt_assert(e);
> > Duplicated below. Helper for this?
> > 
> > Matt
> Sure.
> 
> John.
> 
> > > +
> > >   	gtt = gem_aperture_size(fd) / size;
> > >   	ram = (intel_get_avail_ram_mb() << 20) / size;
> > >   	igt_debug("Available objects in GTT:%"PRIu64", RAM:%"PRIu64"\n",
> > > @@ -518,9 +545,9 @@ static void many(int fd, int dir, uint64_t size, unsigned int flags)
> > >   	igt_require(count > 1);
> > >   	intel_require_memory(count, size, CHECK_RAM);
> > > -	ahnd = get_reloc_ahnd(fd, 0);
> > > +	ahnd = get_reloc_ahnd(fd, ctx->id);
> > > -	offsets = __captureN(fd, dir, ahnd, 0, size, count, flags);
> > > +	offsets = __captureN(fd, dir, ahnd, ctx, e, size, count, flags);
> > >   	blobs = check_error_state(dir, offsets, count, size, !!(flags & INCREMENTAL));
> > >   	igt_info("Captured %lu %"PRId64"-blobs out of a total of %lu\n",
> > > @@ -531,7 +558,7 @@ static void many(int fd, int dir, uint64_t size, unsigned int flags)
> > >   }
> > >   static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
> > > -		    unsigned ring, const char *name)
> > > +		    const struct intel_execution_engine2 *e)
> > >   {
> > >   	const uint32_t bbe = MI_BATCH_BUFFER_END;
> > >   	struct drm_i915_gem_exec_object2 obj = {
> > > @@ -540,7 +567,7 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
> > >   	struct drm_i915_gem_execbuffer2 execbuf = {
> > >   		.buffers_ptr = to_user_pointer(&obj),
> > >   		.buffer_count = 1,
> > > -		.flags = ring,
> > > +		.flags = e->flags,
> > >   		.rsvd1 = ctx->id,
> > >   	};
> > >   	int64_t timeout = NSEC_PER_SEC; /* 1s, feeling generous, blame debug */
> > > @@ -555,10 +582,6 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
> > >   	igt_require(igt_params_set(fd, "reset", "%u", -1)); /* engine resets! */
> > >   	igt_require(gem_gpu_reset_type(fd) > 1);
> > > -	/* Needs to be fast enough for the hangcheck to return within 1s */
> > > -	igt_require(gem_engine_property_printf(fd, name, "preempt_timeout_ms", "%d", 0) > 0);
> > > -	gem_engine_property_printf(fd, name, "preempt_timeout_ms", "%d", 500);
> > > -
> > >   	gtt = gem_aperture_size(fd) / size;
> > >   	ram = (intel_get_avail_ram_mb() << 20) / size;
> > >   	igt_debug("Available objects in GTT:%"PRIu64", RAM:%"PRIu64"\n",
> > > @@ -576,15 +599,19 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
> > >   	igt_assert(pipe(link) == 0);
> > >   	igt_fork(child, 1) {
> > > +		const intel_ctx_t *ctx2;
> > >   		fd = gem_reopen_driver(fd);
> > >   		igt_debug("Submitting large capture [%ld x %dMiB objects]\n",
> > >   			  count, (int)(size >> 20));
> > > +		ctx2 = intel_ctx_create_all_physical(fd);
> > > +		igt_assert(ctx2);
> > > +
> > >   		intel_allocator_init();
> > >   		/* Reopen the allocator in the new process. */
> > > -		ahnd = get_reloc_ahnd(fd, 0);
> > > +		ahnd = get_reloc_ahnd(fd, ctx2->id);
> > > -		free(__captureN(fd, dir, ahnd, ring, size, count, ASYNC));
> > > +		free(__captureN(fd, dir, ahnd, ctx2, e, size, count, ASYNC));
> > >   		put_ahnd(ahnd);
> > >   		write(link[1], &fd, sizeof(fd)); /* wake the parent up */
> > > @@ -615,18 +642,27 @@ static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
> > >   static void userptr(int fd, int dir)
> > >   {
> > > -	const intel_ctx_t *ctx = intel_ctx_0(fd);
> > > +	const struct intel_execution_engine2 *e;
> > > +	const intel_ctx_t *ctx;
> > >   	uint32_t handle;
> > >   	uint64_t ahnd;
> > >   	void *ptr;
> > >   	int obj_size = 4096;
> > > +	/* Find the first available engine: */
> > > +	ctx = intel_ctx_create_all_physical(fd);
> > > +	igt_assert(ctx);
> > > +	for_each_ctx_engine(fd, ctx, e)
> > > +		for_each_if(gem_class_can_store_dword(fd, e->class))
> > > +			break;
> > > +	igt_assert(e);
> > > +
> > >   	igt_assert(posix_memalign(&ptr, obj_size, obj_size) == 0);
> > >   	memset(ptr, 0, obj_size);
> > >   	igt_require(__gem_userptr(fd, ptr, obj_size, 0, 0, &handle) == 0);
> > >   	ahnd = get_reloc_ahnd(fd, ctx->id);
> > > -	__capture1(fd, dir, ahnd, intel_ctx_0(fd), 0, handle, obj_size);
> > > +	__capture1(fd, dir, ahnd, ctx, e, handle, obj_size);
> > >   	gem_close(fd, handle);
> > >   	put_ahnd(ahnd);
> > > @@ -684,7 +720,7 @@ igt_main
> > >   	}
> > >   	test_each_engine("capture", fd, ctx, e)
> > > -		capture(fd, dir, ctx, e->flags);
> > > +		capture(fd, dir, ctx, e);
> > >   	igt_subtest_f("many-4K-zero") {
> > >   		igt_require(gem_can_store_dword(fd, 0));
> > > @@ -719,7 +755,7 @@ igt_main
> > >   	}
> > >   	test_each_engine("pi", fd, ctx, e)
> > > -		prioinv(fd, dir, ctx, e->flags, e->name);
> > > +		prioinv(fd, dir, ctx, e);
> > >   	igt_fixture {
> > >   		close(dir);
> > > -- 
> > > 2.25.1
> > > 
> 
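One way to address Petri's request is to name the two values and record, next to them, the rationale John gives above. The macro names below are hypothetical; configure_hangs() would pass them to gem_engine_property_printf() in place of the literal 250 and 500.

```c
/*
 * Hang detection settings for the capture tests (hypothetical names).
 * The exact values are not derived from any kernel behaviour: they only
 * need to be short enough that a hanging batch is detected quickly, yet
 * long enough to remain usable.
 */
#define CAPTURE_PREEMPT_TIMEOUT_MS	250
#define CAPTURE_HEARTBEAT_INTERVAL_MS	500
```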

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Intel-gfx] [PATCH i-g-t 1/8] tests/i915/gem_exec_capture: Remove pointless assert
  2021-10-21 23:40   ` [igt-dev] " John.C.Harrison
@ 2021-11-03 13:50     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 51+ messages in thread
From: Tvrtko Ursulin @ 2021-11-03 13:50 UTC (permalink / raw)
  To: John.C.Harrison, IGT-Dev; +Cc: Intel-GFX


On 22/10/2021 00:40, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The 'many' test ended with an 'assert(count)', presumably meaning to
> ensure that some objects were actually captured. However, 'count' is
> the number of objects created, not how many were captured. Plus, there
> is already a 'require(count > 1)' at the start and count is invariant
> so the final assert is basically pointless.
> 
> General consensus appears to be that the test should not fail
> irrespective of how many blobs are captured as low memory situations
> could cause the capture to be abbreviated. So just remove the
> pointless assert completely.

Hm, the test appears to use intel_get_avail_ram_mb() to size the 
working set, which suggests low-memory situations should not be a 
problem unless there are bugs. In which case, would a better fix be to 
improve the sizing logic and change the assert to igt_assert(blobs)?

Regards,

Tvrtko

> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>   tests/i915/gem_exec_capture.c | 1 -
>   1 file changed, 1 deletion(-)
> 
> diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
> index 7e0a8b8ad..53649cdb2 100644
> --- a/tests/i915/gem_exec_capture.c
> +++ b/tests/i915/gem_exec_capture.c
> @@ -524,7 +524,6 @@ static void many(int fd, int dir, uint64_t size, unsigned int flags)
>   	}
>   	igt_info("Captured %lu %"PRId64"-blobs out of a total of %lu\n",
>   		 blobs, size >> 12, count);
> -	igt_assert(count);
>   
>   	free(error);
>   	free(offsets);
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Intel-gfx] [PATCH i-g-t 5/8] tests/i915/gem_exec_capture: Check for memory allocation failure
  2021-10-21 23:40 ` [Intel-gfx] [PATCH i-g-t 5/8] tests/i915/gem_exec_capture: Check for memory allocation failure John.C.Harrison
@ 2021-11-03 14:00     ` Tvrtko Ursulin
  2021-11-03 14:00     ` [igt-dev] " Tvrtko Ursulin
  1 sibling, 0 replies; 51+ messages in thread
From: Tvrtko Ursulin @ 2021-11-03 14:00 UTC (permalink / raw)
  To: John.C.Harrison, IGT-Dev; +Cc: Intel-GFX


On 22/10/2021 00:40, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The sysfs file read helper does not actually report any errors if a
> realloc fails. It just silently returns a 'valid' but truncated
> buffer. This then leads to the decode of the buffer failing in random
> ways. So, add a check for ENOMEM being generated during the read.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>   tests/i915/gem_exec_capture.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
> index e373d24ed..8997125ee 100644
> --- a/tests/i915/gem_exec_capture.c
> +++ b/tests/i915/gem_exec_capture.c
> @@ -131,9 +131,11 @@ static int check_error_state(int dir, struct offset *obj_offsets, int obj_count,
>   	char *error, *str;
>   	int blobs = 0;
>   
> +	errno = 0;
>   	error = igt_sysfs_get(dir, "error");
>   	igt_sysfs_set(dir, "error", "Begone!");
>   	igt_assert(error);
> +	igt_assert(errno != ENOMEM);

igt_sysfs_get:

	len = 64;
...
                 newbuf = realloc(buf, 2*len);

Maybe the problem is that the doubling gets out of hand. How big are 
your buffers? Perhaps you could improve the library function instead to 
grow less aggressively.

And at the same time perhaps the bug is this:

                 if (igt_debug_on(!newbuf))
                         break;
...
         return buf;

So failures to grow the buffer are ignored, while failure to allocate 
the initial one is not. Perhaps both should return NULL and then 
callers would not be surprised.

Or you think someone relies on this current odd behaviour?

Regards,

Tvrtko

>   	igt_debug("%s\n", error);
>   
>   	/* render ring --- user = 0x00000000 ffffd000 */
> 
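Tvrtko's second suggestion, reporting a failed grow as NULL instead of returning a truncated buffer, could look roughly like the sketch below. It is a hypothetical standalone helper (`read_all_or_null()` is not an IGT function) that reads from an already open file descriptor. Like the quoted igt_sysfs_get() excerpt it starts at 64 bytes and doubles on demand, but if realloc() fails it frees the partial buffer and returns NULL with errno set to ENOMEM.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read all of fd into a NUL-terminated heap buffer. */
static char *read_all_or_null(int fd)
{
	size_t len = 64, offset = 0;
	char *buf = malloc(len);

	if (!buf)
		return NULL; /* malloc() reports ENOMEM on POSIX systems */

	for (;;) {
		ssize_t ret = read(fd, buf + offset, len - offset - 1);

		if (ret < 0) {
			int err = errno;

			if (err == EINTR)
				continue; /* interrupted by a signal, retry */
			free(buf);
			errno = err; /* report the read() failure */
			return NULL;
		}
		if (ret == 0)
			break; /* end of file */

		offset += (size_t)ret;
		if (offset == len - 1) {
			/* Full (one byte kept for the NUL): double the size. */
			char *newbuf = realloc(buf, 2 * len);

			if (!newbuf) {
				/* Never hand back truncated data. */
				free(buf);
				errno = ENOMEM;
				return NULL;
			}
			buf = newbuf;
			len *= 2;
		}
	}

	buf[offset] = '\0';
	return buf;
}
```

With this contract a caller only needs the NULL check it already has; it no longer has to clear errno beforehand and test for ENOMEM afterwards.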

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Intel-gfx] [PATCH i-g-t 5/8] tests/i915/gem_exec_capture: Check for memory allocation failure
  2021-11-03 14:00     ` [igt-dev] " Tvrtko Ursulin
@ 2021-11-03 18:36       ` John Harrison
  -1 siblings, 0 replies; 51+ messages in thread
From: John Harrison @ 2021-11-03 18:36 UTC (permalink / raw)
  To: Tvrtko Ursulin, IGT-Dev; +Cc: Intel-GFX

On 11/3/2021 07:00, Tvrtko Ursulin wrote:
> On 22/10/2021 00:40, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The sysfs file read helper does not actually report any errors if a
>> realloc fails. It just silently returns a 'valid' but truncated
>> buffer. This then leads to the decode of the buffer failing in random
>> ways. So, add a check for ENOMEM being generated during the read.
>>
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> ---
>>   tests/i915/gem_exec_capture.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
>> index e373d24ed..8997125ee 100644
>> --- a/tests/i915/gem_exec_capture.c
>> +++ b/tests/i915/gem_exec_capture.c
>> @@ -131,9 +131,11 @@ static int check_error_state(int dir, struct offset *obj_offsets, int obj_count,
>>       char *error, *str;
>>       int blobs = 0;
>>
>> +    errno = 0;
>>       error = igt_sysfs_get(dir, "error");
>>       igt_sysfs_set(dir, "error", "Begone!");
>>       igt_assert(error);
>> +    igt_assert(errno != ENOMEM);
>
> igt_sysfs_get:
>
>     len = 64;
> ...
>                 newbuf = realloc(buf, 2*len);
>
> Maybe the problem is that the doubling gets out of hand. How big are 
> your buffers? Perhaps you could improve the library function instead 
> to grow less aggressively.
The buffers are generally ending at 2GB in size with the capture being 
about 1.8GB (on the particular system I happen to be testing on).

I considered various options such as doubling until a given size and 
then just incrementing by fixed amounts. But where do you draw the line? 
1MB, 128MB, 1GB, 128GB? If the final result needs to be 128GB (which you 
cannot know until you have finished reading and resizing) and you are 
allocating in 1MB chunks then it is going to take a very long time to 
get there. I ended up leaving it as a straight double on the grounds 
that it is the best compromise between overallocation and taking 
ridiculous numbers of steps.



>
> And at the same time perhaps the bug is this:
>
>                 if (igt_debug_on(!newbuf))
>                         break;
> ...
>         return buf;
>
> So failures to grow the buffer are ignored, while failure to allocate 
> the initial one is not. Perhaps both should return NULL and then 
> callers would not be surprised.
>
> Or you think someone relies on this current odd behaviour?
>
As per the commit description, this is exactly the problem. However, I 
do not know for certain this is not intentional behaviour and something 
somewhere is relying on it. And I really do not have the time to audit 
this. The vast majority of uses are reading teeny tiny files and don't 
care, but who knows what some particular test/config/platform/etc. 
might rely on. The fact that it is explicitly saying 
'igt_debug_on' means that someone must have made a conscious decision to 
not assert. It's not like they just forgot to check for null being 
returned. Which implies it is intentional and required.

John.


> Regards,
>
> Tvrtko
>
>>       igt_debug("%s\n", error);
>>
>>       /* render ring --- user = 0x00000000 ffffd000 */
>>
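John's trade-off can be made concrete by counting how many realloc() calls each growth policy needs. The sketch below is illustrative arithmetic only: the function names are hypothetical, and the 64-byte starting size is taken from the igt_sysfs_get() excerpt quoted above.

```c
#include <stdint.h>

/* realloc() calls needed to grow from start to >= target by doubling. */
static unsigned long steps_doubling(uint64_t start, uint64_t target)
{
	unsigned long steps = 0;
	uint64_t len = start;

	while (len < target) {
		len *= 2;
		steps++;
	}
	return steps;
}

/*
 * Same, but doubling only while below threshold and then growing by a
 * fixed threshold-sized chunk per step.
 */
static unsigned long steps_hybrid(uint64_t start, uint64_t target,
				  uint64_t threshold)
{
	unsigned long steps = 0;
	uint64_t len = start;

	while (len < target) {
		len = len < threshold ? len * 2 : len + threshold;
		steps++;
	}
	return steps;
}
```

Starting from 64 bytes, pure doubling reaches the 2 GiB buffer needed for a ~1.8 GB capture in 25 steps and 128 GiB in 31, whereas doubling to 1 MiB and then growing in 1 MiB chunks needs 131,085 steps to reach 128 GiB. The price of pure doubling is that the final buffer can be up to roughly twice the size of the data.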


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Intel-gfx] [PATCH i-g-t 1/8] tests/i915/gem_exec_capture: Remove pointless assert
  2021-11-03 13:50     ` [igt-dev] " Tvrtko Ursulin
@ 2021-11-03 18:44       ` John Harrison
  -1 siblings, 0 replies; 51+ messages in thread
From: John Harrison @ 2021-11-03 18:44 UTC (permalink / raw)
  To: Tvrtko Ursulin, IGT-Dev; +Cc: Intel-GFX

On 11/3/2021 06:50, Tvrtko Ursulin wrote:
> On 22/10/2021 00:40, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The 'many' test ended with an 'assert(count)', presumably meaning to
>> ensure that some objects were actually captured. However, 'count' is
>> the number of objects created, not how many were captured. Plus, there
>> is already a 'require(count > 1)' at the start and count is invariant
>> so the final assert is basically pointless.
>>
>> General consensus appears to be that the test should not fail
>> irrespective of how many blobs are captured as low memory situations
>> could cause the capture to be abbreviated. So just remove the
>> pointless assert completely.
>
> Hm, the test appears to use intel_get_avail_ram_mb() to size the 
> working set, which suggests low-memory situations should not be a 
> problem unless there are bugs. In which case, would a better fix be to 
> improve the sizing logic and change the assert to igt_assert(blobs)?
After fixing the sysfs read code to cope with large files, I never see
abbreviated captures any more. However, other reviewers objected to
asserting anything at all about the final count (whether full size, zero
or whatever) on the grounds that low memory issues *might* still occur,
some in quite blunt language, as I recall. If you think differently,
feel free to start your own patch set.

John.

>
> Regards,
>
> Tvrtko
>
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> ---
>>   tests/i915/gem_exec_capture.c | 1 -
>>   1 file changed, 1 deletion(-)
>>
>> diff --git a/tests/i915/gem_exec_capture.c 
>> b/tests/i915/gem_exec_capture.c
>> index 7e0a8b8ad..53649cdb2 100644
>> --- a/tests/i915/gem_exec_capture.c
>> +++ b/tests/i915/gem_exec_capture.c
>> @@ -524,7 +524,6 @@ static void many(int fd, int dir, uint64_t size, 
>> unsigned int flags)
>>       }
>>       igt_info("Captured %lu %"PRId64"-blobs out of a total of %lu\n",
>>            blobs, size >> 12, count);
>> -    igt_assert(count);
>>         free(error);
>>       free(offsets);
>>


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Intel-gfx] [PATCH i-g-t 4/8] tests/i915/gem_exec_capture: Use contexts and engines properly
  2021-11-03  9:36         ` [igt-dev] " Petri Latvala
@ 2021-11-03 18:49           ` John Harrison
  -1 siblings, 0 replies; 51+ messages in thread
From: John Harrison @ 2021-11-03 18:49 UTC (permalink / raw)
  To: Petri Latvala; +Cc: IGT-Dev, Intel-GFX

On 11/3/2021 02:36, Petri Latvala wrote:
> On Tue, Nov 02, 2021 at 06:45:38PM -0700, John Harrison wrote:
>> On 11/2/2021 16:34, Matthew Brost wrote:
>>> On Thu, Oct 21, 2021 at 04:40:40PM -0700, John.C.Harrison@Intel.com wrote:
>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>
>>>> Some of the capture tests were using explicit contexts, some not. Some
>>>> were poking the per engine pre-emption timeout, some not. This would
>>>> lead to sporadic failures due to random timeouts, contexts being
>>>> banned depending upon how many subtests were run and/or how many
>>>> engines a given platform has, and other such failures.
>>>>
>>>> So, update all tests to be consistent.
>>>>
>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>> ---
>>>>    tests/i915/gem_exec_capture.c | 80 +++++++++++++++++++++++++----------
>>>>    1 file changed, 58 insertions(+), 22 deletions(-)
>>>>
>>>> diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
>>>> index c85c198f7..e373d24ed 100644
>>>> --- a/tests/i915/gem_exec_capture.c
>>>> +++ b/tests/i915/gem_exec_capture.c
>>>> @@ -204,8 +204,19 @@ static int check_error_state(int dir, struct offset *obj_offsets, int obj_count,
>>>>    	return blobs;
>>>>    }
>>>> +static void configure_hangs(int fd, const struct intel_execution_engine2 *e, int ctxt_id)
>>>> +{
>>>> +	/* Ensure fast hang detection */
>>>> +	gem_engine_property_printf(fd, e->name, "preempt_timeout_ms", "%d", 250);
>>>> +	gem_engine_property_printf(fd, e->name, "heartbeat_interval_ms", "%d", 500);
>>> #define for 250, 500?
>> Is there any point? There is no special reason for the values other than
>> short enough to be fast but long enough to be usable. So
>> there isn't really any particular name to give them beyond
>> 'SHORT_PREEMPT_TIMEOUT' or some such. And the whole point of the helper
>> function is that the values are programmed in one place only and not used
>> anywhere else. So there is no worry about repetition of magic numbers.
> In about one year everyone has forgotten this explanation and will
> wonder if it's related to some in-kernel behaviour or if there's some
> other reason these values have been chosen.
>
> So at least a comment why the values are these, please.
There is a comment already. Not sure what more can be added that is 
meaningful other than changing it to "Ensure fast hang detection by 
picking some random numbers out of the air that seem to be vaguely 
plausible".
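One possible way to address the review comment, sketched under assumptions: the macro names, the `engine_prop_writer` callback and `configure_fast_hangs()` are hypothetical, and the callback stands in for the `gem_engine_property_printf()` calls in the quoted patch so the sketch builds without IGT. The values stay in one place, and the comment records that they are not derived from any kernel limit.

```c
#include <stddef.h>

/*
 * Hypothetical named values. They are not tied to any in-kernel limit:
 * short enough that a deliberate hang is detected and reset quickly,
 * long enough that ordinary work is not mistaken for a hang.
 */
#define CAPTURE_PREEMPT_TIMEOUT_MS	250
#define CAPTURE_HEARTBEAT_INTERVAL_MS	500

struct engine_prop {
	const char *name;
	int value;
};

static const struct engine_prop fast_hang_props[] = {
	{ "preempt_timeout_ms",    CAPTURE_PREEMPT_TIMEOUT_MS },
	{ "heartbeat_interval_ms", CAPTURE_HEARTBEAT_INTERVAL_MS },
};

/* Stand-in for gem_engine_property_printf(); returns < 0 on failure. */
typedef int (*engine_prop_writer)(const char *engine, const char *prop,
				  int value, void *ctx);

/* Apply every fast-hang property to one engine, stopping at the first error. */
static int configure_fast_hangs(const char *engine, engine_prop_writer write_prop,
				void *ctx)
{
	for (size_t i = 0; i < sizeof(fast_hang_props) / sizeof(fast_hang_props[0]); i++) {
		int ret = write_prop(engine, fast_hang_props[i].name,
				     fast_hang_props[i].value, ctx);
		if (ret < 0)
			return ret;
	}
	return 0;
}
```

In the real test the writer would wrap `gem_engine_property_printf(fd, engine, prop, "%d", value)` as in the quoted hunk; the table form just makes adding or retuning a property a one-line change.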

John.

>
>


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Intel-gfx] [PATCH i-g-t 4/8] tests/i915/gem_exec_capture: Use contexts and engines properly
  2021-11-03 18:49           ` [igt-dev] " John Harrison
@ 2021-11-04  6:40             ` Petri Latvala
  -1 siblings, 0 replies; 51+ messages in thread
From: Petri Latvala @ 2021-11-04  6:40 UTC (permalink / raw)
  To: John Harrison; +Cc: IGT-Dev, Intel-GFX

On Wed, Nov 03, 2021 at 11:49:47AM -0700, John Harrison wrote:
> On 11/3/2021 02:36, Petri Latvala wrote:
> > On Tue, Nov 02, 2021 at 06:45:38PM -0700, John Harrison wrote:
> > > On 11/2/2021 16:34, Matthew Brost wrote:
> > > > On Thu, Oct 21, 2021 at 04:40:40PM -0700, John.C.Harrison@Intel.com wrote:
> > > > > From: John Harrison <John.C.Harrison@Intel.com>
> > > > > 
> > > > > Some of the capture tests were using explicit contexts, some not. Some
> > > > > were poking the per engine pre-emption timeout, some not. This would
> > > > > lead to sporadic failures due to random timeouts, contexts being
> > > > > banned depending upon how many subtests were run and/or how many
> > > > > engines a given platform has, and other such failures.
> > > > > 
> > > > > So, update all tests to be consistent.
> > > > > 
> > > > > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > > > > ---
> > > > >    tests/i915/gem_exec_capture.c | 80 +++++++++++++++++++++++++----------
> > > > >    1 file changed, 58 insertions(+), 22 deletions(-)
> > > > > 
> > > > > diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
> > > > > index c85c198f7..e373d24ed 100644
> > > > > --- a/tests/i915/gem_exec_capture.c
> > > > > +++ b/tests/i915/gem_exec_capture.c
> > > > > @@ -204,8 +204,19 @@ static int check_error_state(int dir, struct offset *obj_offsets, int obj_count,
> > > > >    	return blobs;
> > > > >    }
> > > > > +static void configure_hangs(int fd, const struct intel_execution_engine2 *e, int ctxt_id)
> > > > > +{
> > > > > +	/* Ensure fast hang detection */
> > > > > +	gem_engine_property_printf(fd, e->name, "preempt_timeout_ms", "%d", 250);
> > > > > +	gem_engine_property_printf(fd, e->name, "heartbeat_interval_ms", "%d", 500);
> > > > #define for 250, 500?
> > > Is there any point? There is no special reason for the values other than
> > > short enough to be fast but long enough to be usable. So
> > > there isn't really any particular name to give them beyond
> > > 'SHORT_PREEMPT_TIMEOUT' or some such. And the whole point of the helper
> > > function is that the values are programmed in one place only and not used
> > > anywhere else. So there is no worry about repetition of magic numbers.
> > In about one year everyone has forgotten this explanation and will
> > wonder if it's related to some in-kernel behaviour or if there's some
> > other reason these values have been chosen.
> > 
> > So at least a comment why the values are these, please.
> There is a comment already. Not sure what more can be added that is
> meaningful other than changing it to "Ensure fast hang detection by picking
> some random numbers out of the air that seem to be vaguely plausible".

Fair enough.


-- 
Petri Latvala

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Intel-gfx] [PATCH i-g-t 1/8] tests/i915/gem_exec_capture: Remove pointless assert
  2021-11-03 18:44       ` [igt-dev] " John Harrison
@ 2021-11-04  9:14         ` Tvrtko Ursulin
  -1 siblings, 0 replies; 51+ messages in thread
From: Tvrtko Ursulin @ 2021-11-04  9:14 UTC (permalink / raw)
  To: John Harrison, IGT-Dev; +Cc: Intel-GFX


On 03/11/2021 18:44, John Harrison wrote:
> On 11/3/2021 06:50, Tvrtko Ursulin wrote:
>> On 22/10/2021 00:40, John.C.Harrison@Intel.com wrote:
>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> The 'many' test ended with an 'assert(count)', presumably meaning to
>>> ensure that some objects were actually captured. However, 'count' is
>>> the number of objects created, not how many were captured. Plus, there
>>> is already a 'require(count > 1)' at the start and count is invariant,
>>> so the final assert is basically pointless.
>>>
>>> General consensus appears to be that the test should not fail
>>> irrespective of how many blobs are captured as low memory situations
>>> could cause the capture to be abbreviated. So just remove the
>>> pointless assert completely.
>>
>> Hm, the test appears to be using intel_get_avail_ram_mb() to size the
>> working set, suggesting that problems with low memory situations should
>> not apply unless there are bugs. In which case, would a better fix be
>> improving the sizing logic and changing the assert to igt_assert(blobs)?
> After fixing the sysfs read code to cope with large files, I never see
> abbreviated captures any more. However, other reviewers objected to
> asserting anything at all about the final count (whether full size, zero
> or whatever) on the grounds that low memory issues *might* still occur,
> some in quite blunt language, as I recall. If you think differently,
> feel free to start your own patch set.

Do you have a link so I can understand the discussion? Off the top of
my head I can't imagine what the objections were. I mean, what is the
point of keeping the test if it does not at least assert at the end that
something was captured?
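For comparison, the alternative suggested earlier in this thread (asserting on `blobs` instead of `count`) amounts to the check below. This is an illustrative sketch only: `capture_nonempty()` is a hypothetical name, and the patch as posted removes the assert entirely instead.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical check contrasting the two counters in the 'many' subtest:
 *   count: objects the test created (fixed up front, already required > 1)
 *   blobs: objects actually found in the decoded error capture
 * Asserting on count can never fail at this point; asserting on blobs
 * catches a capture that came back empty.
 */
static bool capture_nonempty(unsigned long blobs, unsigned long count)
{
	printf("Captured %lu blobs out of a total of %lu\n", blobs, count);
	return blobs > 0; /* what igt_assert(blobs) would test */
}
```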

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2021-11-04  9:15 UTC | newest]

Thread overview: 51+ messages
2021-10-21 23:40 [Intel-gfx] [PATCH i-g-t 0/8] Fixes for gem_exec_capture John.C.Harrison
2021-10-21 23:40 ` [Intel-gfx] [PATCH i-g-t 1/8] tests/i915/gem_exec_capture: Remove pointless assert John.C.Harrison
2021-10-21 23:40   ` [igt-dev] " John.C.Harrison
2021-10-29  2:14   ` [Intel-gfx] " Matthew Brost
2021-10-29  2:14     ` Matthew Brost
2021-11-03 13:50   ` [Intel-gfx] " Tvrtko Ursulin
2021-11-03 13:50     ` [igt-dev] " Tvrtko Ursulin
2021-11-03 18:44     ` John Harrison
2021-11-03 18:44       ` [igt-dev] " John Harrison
2021-11-04  9:14       ` Tvrtko Ursulin
2021-11-04  9:14         ` [igt-dev] " Tvrtko Ursulin
2021-10-21 23:40 ` [Intel-gfx] [PATCH i-g-t 2/8] tests/i915/gem_exec_capture: Cope with larger page sizes John.C.Harrison
2021-10-21 23:40   ` [igt-dev] " John.C.Harrison
2021-10-29 17:39   ` [Intel-gfx] " Matthew Brost
2021-10-29 17:39     ` Matthew Brost
2021-10-30  0:32     ` [Intel-gfx] " John Harrison
2021-10-30  0:32       ` John Harrison
2021-11-02 23:18       ` [Intel-gfx] " Matthew Brost
2021-11-02 23:18         ` Matthew Brost
2021-10-21 23:40 ` [Intel-gfx] [PATCH i-g-t 3/8] tests/i915/gem_exec_capture: Make the error decode a common helper John.C.Harrison
2021-10-29  2:34   ` Matthew Brost
2021-10-29  2:34     ` [igt-dev] " Matthew Brost
2021-10-21 23:40 ` [Intel-gfx] [PATCH i-g-t 4/8] tests/i915/gem_exec_capture: Use contexts and engines properly John.C.Harrison
2021-10-21 23:40   ` [igt-dev] " John.C.Harrison
2021-11-02 23:34   ` [Intel-gfx] " Matthew Brost
2021-11-02 23:34     ` [igt-dev] " Matthew Brost
2021-11-03  1:45     ` John Harrison
2021-11-03  9:36       ` Petri Latvala
2021-11-03  9:36         ` [igt-dev] " Petri Latvala
2021-11-03 18:49         ` John Harrison
2021-11-03 18:49           ` [igt-dev] " John Harrison
2021-11-04  6:40           ` Petri Latvala
2021-11-04  6:40             ` [igt-dev] " Petri Latvala
2021-10-21 23:40 ` [Intel-gfx] [PATCH i-g-t 5/8] tests/i915/gem_exec_capture: Check for memory allocation failure John.C.Harrison
2021-10-29  2:20   ` Matthew Brost
2021-11-03 14:00   ` Tvrtko Ursulin
2021-11-03 14:00     ` [igt-dev] " Tvrtko Ursulin
2021-11-03 18:36     ` John Harrison
2021-11-03 18:36       ` [igt-dev] " John Harrison
2021-10-21 23:40 ` [Intel-gfx] [PATCH i-g-t 6/8] lib/igt_sysfs: Support large files John.C.Harrison
2021-10-21 23:40   ` [igt-dev] " John.C.Harrison
2021-10-29  2:46   ` [Intel-gfx] " Matthew Brost
2021-10-29  2:46     ` Matthew Brost
2021-10-21 23:40 ` [Intel-gfx] [PATCH i-g-t 7/8] lib/igt_gt: Allow per engine reset testing John.C.Harrison
2021-10-21 23:40   ` [igt-dev] " John.C.Harrison
2021-11-03  0:47   ` [Intel-gfx] " Matthew Brost
2021-11-03  0:47     ` [igt-dev] " Matthew Brost
2021-10-21 23:40 ` [Intel-gfx] [PATCH i-g-t 8/8] tests/i915/gem_exec_capture: Update to support GuC based resets John.C.Harrison
2021-10-29  2:54   ` Matthew Brost
2021-10-22  0:27 ` [igt-dev] ✓ Fi.CI.BAT: success for Fixes for gem_exec_capture Patchwork
2021-10-22  3:38 ` [igt-dev] ✓ Fi.CI.IGT: " Patchwork
