[PATCH i-g-t v6 00/12] Improve robustness of the i915 perf tests

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH i-g-t v6 00/12] Improve robustness of the i915 perf tests
@ 2017-10-04 11:19 Lionel Landwerlin
  2017-10-04 11:19 ` [PATCH i-g-t v6 01/12] tests/perf: make stream_fd a global variable Lionel Landwerlin
                   ` (14 more replies)
  0 siblings, 15 replies; 24+ messages in thread
From: Lionel Landwerlin @ 2017-10-04 11:19 UTC (permalink / raw)
  To: intel-gfx

Hi,

Hopefully it's the last round for this series. It's just missing Rb on
patches 2 & 11. Alsp added 12 which fixes a problem reported by Ville.

Cheers,

Lionel Landwerlin (11):
  tests/perf: make stream_fd a global variable
  tests/perf: update max buffer size for reading reports
  tests/perf: rc6: try to guess when rc6 is disabled
  tests/perf: remove frequency related changes
  tests/perf: rework oa-exponent test
  tests/perf: make enable-disable more reliable
  tests/perf: make buffer-fill more reliable
  tests/perf: estimate number of blocking/polling based on time spent
  tests/perf: prevent power management to kick in when necessary
  tests/perf: add support for Coffeelake
  tests/perf: split array of formats descriptions

Robert Bragg (1):
  tests/perf: add per context filtering test for gen8+

 tests/perf.c | 2190 ++++++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 1741 insertions(+), 449 deletions(-)

--
2.14.2
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH i-g-t v6 01/12] tests/perf: make stream_fd a global variable
  2017-10-04 11:19 [PATCH i-g-t v6 00/12] Improve robustness of the i915 perf tests Lionel Landwerlin
@ 2017-10-04 11:19 ` Lionel Landwerlin
  2017-10-04 12:08   ` Chris Wilson
  2017-10-04 11:19 ` [PATCH i-g-t v6 02/12] tests/perf: add per context filtering test for gen8+ Lionel Landwerlin
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 24+ messages in thread
From: Lionel Landwerlin @ 2017-10-04 11:19 UTC (permalink / raw)
  To: intel-gfx

When debugging unstable tests on new platforms we currently we don't
cleanup everything well in between different tests. Since only a
single OA stream fd can be opened at a time, having the stream_fd as a
global variable helps us cleanup the state between tests.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 tests/perf.c | 121 ++++++++++++++++++++++++++++++++---------------------------
 1 file changed, 65 insertions(+), 56 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index a82a3da3..f89a235e 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -285,6 +285,7 @@ static bool hsw_undefined_a_counters[45] = {
 static bool gen8_undefined_a_counters[45];
 
 static int drm_fd = -1;
+static int stream_fd = -1;
 static uint32_t devid;
 static int card = -1;
 static int n_eus;
@@ -306,10 +307,22 @@ static uint32_t (*read_report_ticks)(uint32_t *report,
 static void (*sanity_check_reports)(uint32_t *oa_report0, uint32_t *oa_report1,
 				    enum drm_i915_oa_format format);
 
+static void
+__perf_close(int fd)
+{
+	close(fd);
+	stream_fd = -1;
+}
+
 static int
 __perf_open(int fd, struct drm_i915_perf_open_param *param)
 {
-	int ret = igt_ioctl(fd, DRM_IOCTL_I915_PERF_OPEN, param);
+	int ret;
+
+	if (stream_fd >= 0)
+		__perf_close(stream_fd);
+
+	ret = igt_ioctl(fd, DRM_IOCTL_I915_PERF_OPEN, param);
 
 	igt_assert(ret >= 0);
 	errno = 0;
@@ -978,14 +991,12 @@ test_system_wide_paranoid(void)
 			.num_properties = sizeof(properties) / 16,
 			.properties_ptr = to_user_pointer(properties),
 		};
-		int stream_fd;
-
 		write_u64_file("/proc/sys/dev/i915/perf_stream_paranoid", 0);
 
 		igt_drop_root();
 
 		stream_fd = __perf_open(drm_fd, &param);
-		close(stream_fd);
+		__perf_close(stream_fd);
 	}
 
 	igt_waitchildren();
@@ -1033,7 +1044,6 @@ test_invalid_oa_metric_set_id(void)
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	int stream_fd;
 
 	do_ioctl_err(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param, EINVAL);
 
@@ -1043,7 +1053,7 @@ test_invalid_oa_metric_set_id(void)
 	/* Check that we aren't just seeing false positives... */
 	properties[ARRAY_SIZE(properties) - 1] = test_metric_set_id;
 	stream_fd = __perf_open(drm_fd, &param);
-	close(stream_fd);
+	__perf_close(stream_fd);
 
 	/* There's no valid default OA metric set ID... */
 	param.num_properties--;
@@ -1068,7 +1078,6 @@ test_invalid_oa_format_id(void)
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	int stream_fd;
 
 	do_ioctl_err(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param, EINVAL);
 
@@ -1078,7 +1087,7 @@ test_invalid_oa_format_id(void)
 	/* Check that we aren't just seeing false positives... */
 	properties[ARRAY_SIZE(properties) - 1] = test_oa_format;
 	stream_fd = __perf_open(drm_fd, &param);
-	close(stream_fd);
+	__perf_close(stream_fd);
 
 	/* There's no valid default OA format... */
 	param.num_properties--;
@@ -1106,8 +1115,7 @@ test_missing_sample_flags(void)
 }
 
 static void
-read_2_oa_reports(int stream_fd,
-		  int format_id,
+read_2_oa_reports(int format_id,
 		  int exponent,
 		  uint32_t *oa_report0,
 		  uint32_t *oa_report1,
@@ -1241,12 +1249,13 @@ open_and_read_2_oa_reports(int format_id,
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	int stream_fd = __perf_open(drm_fd, &param);
 
-	read_2_oa_reports(stream_fd, format_id, exponent,
+	stream_fd = __perf_open(drm_fd, &param);
+
+	read_2_oa_reports(format_id, exponent,
 			  oa_report0, oa_report1, timer_only);
 
-	close(stream_fd);
+	__perf_close(stream_fd);
 }
 
 static void
@@ -1546,9 +1555,10 @@ test_invalid_oa_exponent(void)
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	int stream_fd = __perf_open(drm_fd, &param);
 
-	close(stream_fd);
+	stream_fd = __perf_open(drm_fd, &param);
+
+	__perf_close(stream_fd);
 
 	for (int i = 32; i < 65; i++) {
 		properties[7] = i;
@@ -1598,12 +1608,10 @@ test_low_oa_exponent_permissions(void)
 	properties[7] = ok_exponent;
 
 	igt_fork(child, 1) {
-		int stream_fd;
-
 		igt_drop_root();
 
 		stream_fd = __perf_open(drm_fd, &param);
-		close(stream_fd);
+		__perf_close(stream_fd);
 	}
 
 	igt_waitchildren();
@@ -1652,7 +1660,6 @@ test_per_context_mode_unprivileged(void)
 	igt_fork(child, 1) {
 		drm_intel_context *context;
 		drm_intel_bufmgr *bufmgr;
-		int stream_fd;
 		uint32_t ctx_id = 0xffffffff; /* invalid id */
 		int ret;
 
@@ -1670,7 +1677,7 @@ test_per_context_mode_unprivileged(void)
 		properties[1] = ctx_id;
 
 		stream_fd = __perf_open(drm_fd, &param);
-		close(stream_fd);
+		__perf_close(stream_fd);
 
 		drm_intel_gem_context_destroy(context);
 		drm_intel_bufmgr_destroy(bufmgr);
@@ -1733,7 +1740,6 @@ test_blocking(void)
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	int stream_fd = __perf_open(drm_fd, &param);
 	uint8_t buf[1024 * 1024];
 	struct tms start_times;
 	struct tms end_times;
@@ -1758,6 +1764,8 @@ test_blocking(void)
 	int64_t start;
 	int n = 0;
 
+	stream_fd = __perf_open(drm_fd, &param);
+
 	times(&start_times);
 
 	igt_debug("tick length = %dns, test duration = %"PRIu64"ns, min iter. = %d, max iter. = %d\n",
@@ -1855,7 +1863,7 @@ test_blocking(void)
 
 	igt_assert(kernel_ns <= (test_duration_ns / 100ull));
 
-	close(stream_fd);
+	__perf_close(stream_fd);
 }
 
 static void
@@ -1884,7 +1892,6 @@ test_polling(void)
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	int stream_fd = __perf_open(drm_fd, &param);
 	uint8_t buf[1024 * 1024];
 	struct tms start_times;
 	struct tms end_times;
@@ -1908,6 +1915,8 @@ test_polling(void)
 	int64_t start;
 	int n = 0;
 
+	stream_fd = __perf_open(drm_fd, &param);
+
 	times(&start_times);
 
 	igt_debug("tick length = %dns, test duration = %"PRIu64"ns, min iter. = %d, max iter. = %d\n",
@@ -2036,7 +2045,7 @@ test_polling(void)
 
 	igt_assert(kernel_ns <= (test_duration_ns / 100ull));
 
-	close(stream_fd);
+	__perf_close(stream_fd);
 }
 
 static void
@@ -2059,7 +2068,6 @@ test_buffer_fill(void)
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	int stream_fd = __perf_open(drm_fd, &param);
 	int buf_size = 65536 * (256 + sizeof(struct drm_i915_perf_record_header));
 	uint8_t *buf = malloc(buf_size);
 	size_t oa_buf_size = 16 * 1024 * 1024;
@@ -2069,6 +2077,8 @@ test_buffer_fill(void)
 
 	igt_assert(fill_duration < 1000000000);
 
+	stream_fd = __perf_open(drm_fd, &param);
+
 	for (int i = 0; i < 5; i++) {
 		struct drm_i915_perf_record_header *header;
 		bool overflow_seen;
@@ -2119,7 +2129,7 @@ test_buffer_fill(void)
 
 	free(buf);
 
-	close(stream_fd);
+	__perf_close(stream_fd);
 }
 
 static void
@@ -2143,7 +2153,6 @@ test_enable_disable(void)
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	int stream_fd = __perf_open(drm_fd, &param);
 	int buf_size = 65536 * (256 + sizeof(struct drm_i915_perf_record_header));
 	uint8_t *buf = malloc(buf_size);
 	size_t oa_buf_size = 16 * 1024 * 1024;
@@ -2151,6 +2160,7 @@ test_enable_disable(void)
 	int n_full_oa_reports = oa_buf_size / report_size;
 	uint64_t fill_duration = n_full_oa_reports * oa_period;
 
+	stream_fd = __perf_open(drm_fd, &param);
 
 	for (int i = 0; i < 5; i++) {
 		int len;
@@ -2196,7 +2206,7 @@ test_enable_disable(void)
 
 	free(buf);
 
-	close(stream_fd);
+	__perf_close(stream_fd);
 }
 
 static void
@@ -2223,7 +2233,6 @@ test_short_reads(void)
 	uint8_t *pages = mmap(NULL, page_size * 2,
 			      PROT_READ|PROT_WRITE, MAP_PRIVATE, zero_fd, 0);
 	struct drm_i915_perf_record_header *header;
-	int stream_fd;
 	int ret;
 
 	igt_assert_neq(zero_fd, -1);
@@ -2280,7 +2289,7 @@ test_short_reads(void)
 	igt_assert_eq(ret, -1);
 	igt_assert_eq(errno, ENOSPC);
 
-	close(stream_fd);
+	__perf_close(stream_fd);
 
 	munmap(pages, page_size * 2);
 }
@@ -2305,14 +2314,16 @@ test_non_sampling_read_error(void)
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	int stream_fd = __perf_open(drm_fd, &param);
+	int ret;
 	uint8_t buf[1024];
 
-	int ret = read(stream_fd, buf, sizeof(buf));
+	stream_fd = __perf_open(drm_fd, &param);
+
+	ret = read(stream_fd, buf, sizeof(buf));
 	igt_assert_eq(ret, -1);
 	igt_assert_eq(errno, EIO);
 
-	close(stream_fd);
+	__perf_close(stream_fd);
 }
 
 /* Check that attempts to read from a stream while it is disable will return
@@ -2339,25 +2350,24 @@ test_disabled_read_error(void)
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	int stream_fd = __perf_open(drm_fd, &param);
 	uint32_t oa_report0[64];
 	uint32_t oa_report1[64];
 	uint32_t buf[128] = { 0 };
 	int ret;
 
+	stream_fd = __perf_open(drm_fd, &param);
 
 	ret = read(stream_fd, buf, sizeof(buf));
 	igt_assert_eq(ret, -1);
 	igt_assert_eq(errno, EIO);
 
-	close(stream_fd);
+	__perf_close(stream_fd);
 
 
 	param.flags &= ~I915_PERF_FLAG_DISABLED;
 	stream_fd = __perf_open(drm_fd, &param);
 
-	read_2_oa_reports(stream_fd,
-			  test_oa_format,
+	read_2_oa_reports(test_oa_format,
 			  oa_exponent,
 			  oa_report0,
 			  oa_report1,
@@ -2371,14 +2381,13 @@ test_disabled_read_error(void)
 
 	do_ioctl(stream_fd, I915_PERF_IOCTL_ENABLE, 0);
 
-	read_2_oa_reports(stream_fd,
-			  test_oa_format,
+	read_2_oa_reports(test_oa_format,
 			  oa_exponent,
 			  oa_report0,
 			  oa_report1,
 			  false); /* not just timer reports */
 
-	close(stream_fd);
+	__perf_close(stream_fd);
 }
 
 static void
@@ -2427,7 +2436,6 @@ test_mi_rpc(void)
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	int stream_fd = __perf_open(drm_fd, &param);
 	drm_intel_bufmgr *bufmgr = drm_intel_bufmgr_gem_init(drm_fd, 4096);
 	drm_intel_context *context;
 	struct intel_batchbuffer *batch;
@@ -2435,6 +2443,8 @@ test_mi_rpc(void)
 	uint32_t *report32;
 	int ret;
 
+	stream_fd = __perf_open(drm_fd, &param);
+
 	drm_intel_bufmgr_gem_enable_reuse(bufmgr);
 
 	context = drm_intel_gem_context_create(bufmgr);
@@ -2472,7 +2482,7 @@ test_mi_rpc(void)
 	intel_batchbuffer_free(batch);
 	drm_intel_gem_context_destroy(context);
 	drm_intel_bufmgr_destroy(bufmgr);
-	close(stream_fd);
+	__perf_close(stream_fd);
 }
 
 static void
@@ -2563,7 +2573,6 @@ hsw_test_single_ctx_counters(void)
 	igt_fork(child, 1) {
 		drm_intel_bufmgr *bufmgr;
 		drm_intel_context *context0, *context1;
-		int stream_fd;
 		struct intel_batchbuffer *batch;
 		struct igt_buf src, dst;
 		drm_intel_bo *bo;
@@ -2742,7 +2751,7 @@ hsw_test_single_ctx_counters(void)
 		drm_intel_gem_context_destroy(context0);
 		drm_intel_gem_context_destroy(context1);
 		drm_intel_bufmgr_destroy(bufmgr);
-		close(stream_fd);
+		__perf_close(stream_fd);
 	}
 
 	igt_waitchildren();
@@ -2765,10 +2774,12 @@ test_rc6_disable(void)
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	int stream_fd = __perf_open(drm_fd, &param);
-	uint64_t n_events_start = read_debugfs_u64_record(drm_fd, "i915_drpc_info",
-							  "RC6 residency since boot");
-	uint64_t n_events_end;
+	uint64_t n_events_start, n_events_end;
+
+	stream_fd = __perf_open(drm_fd, &param);
+
+	n_events_start = read_debugfs_u64_record(drm_fd, "i915_drpc_info",
+						 "RC6 residency since boot");
 
 	nanosleep(&(struct timespec){ .tv_sec = 0, .tv_nsec = 500000000 }, NULL);
 
@@ -2777,12 +2788,12 @@ test_rc6_disable(void)
 
 	igt_assert_eq(n_events_end - n_events_start, 0);
 
-	close(stream_fd);
+	__perf_close(stream_fd);
 
 	n_events_start = read_debugfs_u64_record(drm_fd, "i915_drpc_info",
 						 "RC6 residency since boot");
 
-	nanosleep(&(struct timespec){ .tv_sec = 0, .tv_nsec = 500000000 }, NULL);
+	nanosleep(&(struct timespec){ .tv_sec = 1, .tv_nsec = 0 }, NULL);
 
 	n_events_end = read_debugfs_u64_record(drm_fd, "i915_drpc_info",
 					       "RC6 residency since boot");
@@ -2920,7 +2931,7 @@ test_create_destroy_userspace_config(void)
 	const char *uuid = "01234567-0123-0123-0123-0123456789ab";
 	uint32_t mux_regs[] = { 0x9888 /* NOA_WRITE */, 0x0 };
 	uint32_t flex_regs[100];
-	int i, stream_fd;
+	int i;
 	uint64_t config_id;
 	uint64_t properties[] = {
 		DRM_I915_PERF_PROP_OA_METRICS_SET, 0, /* Filled later */
@@ -2988,7 +2999,7 @@ test_create_destroy_userspace_config(void)
 	/* Read the config to verify shouldn't raise any issue. */
 	config_id = i915_perf_add_config(drm_fd, &config);
 
-	close(stream_fd);
+	__perf_close(stream_fd);
 
 	i915_perf_remove_config(drm_fd, config_id);
 }
@@ -3160,7 +3171,6 @@ test_i915_ref_count(void)
 		.properties_ptr = to_user_pointer(properties),
 	};
 	unsigned baseline, ref_count0, ref_count1;
-	int stream_fd;
 	uint32_t oa_report0[64];
 	uint32_t oa_report1[64];
 
@@ -3200,14 +3210,13 @@ test_i915_ref_count(void)
 
 	igt_assert(ref_count0 > baseline);
 
-	read_2_oa_reports(stream_fd,
-			  test_oa_format,
+	read_2_oa_reports(test_oa_format,
 			  oa_exp_1_millisec,
 			  oa_report0,
 			  oa_report1,
 			  false); /* not just timer reports */
 
-	close(stream_fd);
+	__perf_close(stream_fd);
 	ref_count0 = read_i915_module_ref();
 	igt_debug("ref count after closing i915 perf stream fd = %u\n", ref_count0);
 	igt_assert_eq(ref_count0, baseline);
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH i-g-t v6 02/12] tests/perf: add per context filtering test for gen8+
  2017-10-04 11:19 [PATCH i-g-t v6 00/12] Improve robustness of the i915 perf tests Lionel Landwerlin
  2017-10-04 11:19 ` [PATCH i-g-t v6 01/12] tests/perf: make stream_fd a global variable Lionel Landwerlin
@ 2017-10-04 11:19 ` Lionel Landwerlin
  2017-10-04 11:46   ` Matthew Auld
  2017-10-04 11:19 ` [PATCH i-g-t v6 03/12] tests/perf: update max buffer size for reading reports Lionel Landwerlin
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 24+ messages in thread
From: Lionel Landwerlin @ 2017-10-04 11:19 UTC (permalink / raw)
  To: intel-gfx

From: Robert Bragg <robert@sixbynine.org>

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 777 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 745 insertions(+), 32 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index f89a235e..8644e252 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -48,7 +48,9 @@ IGT_TEST_DESCRIPTION("Test the i915 perf metrics streaming interface");
 #define OAREPORT_REASON_MASK           0x3f
 #define OAREPORT_REASON_SHIFT          19
 #define OAREPORT_REASON_TIMER          (1<<0)
+#define OAREPORT_REASON_INTERNAL       (3<<1)
 #define OAREPORT_REASON_CTX_SWITCH     (1<<3)
+#define OAREPORT_REASON_GO             (1<<4)
 #define OAREPORT_REASON_CLK_RATIO      (1<<5)
 
 #define GFX_OP_PIPE_CONTROL     ((3 << 29) | (3 << 27) | (2 << 24))
@@ -574,6 +576,22 @@ oa_exponent_to_ns(int exponent)
        return 1000000000ULL * (2ULL << exponent) / timestamp_frequency;
 }
 
+static bool
+oa_report_ctx_is_valid(uint32_t *report)
+{
+	if (IS_HASWELL(devid)) {
+		return false; /* TODO */
+	} else if (IS_GEN8(devid)) {
+		return report[0] & (1ul << 25);
+	} else if (IS_GEN9(devid)) {
+		return report[0] & (1ul << 16);
+	}
+
+	/* Need to update this function for newer Gen. */
+	igt_assert(!"reached");
+}
+
+
 static void
 hsw_sanity_check_render_basic_reports(uint32_t *oa_report0, uint32_t *oa_report1,
 				      enum drm_i915_oa_format fmt)
@@ -678,6 +696,100 @@ gen8_40bit_a_delta(uint64_t value0, uint64_t value1)
 		return value1 - value0;
 }
 
+static void
+accumulate_uint32(size_t offset,
+		  uint32_t *report0,
+                  uint32_t *report1,
+                  uint64_t *delta)
+{
+	uint32_t value0 = *(uint32_t *)(((uint8_t *)report0) + offset);
+	uint32_t value1 = *(uint32_t *)(((uint8_t *)report1) + offset);
+
+	*delta += (uint32_t)(value1 - value0);
+}
+
+static void
+accumulate_uint40(int a_index,
+                  uint32_t *report0,
+                  uint32_t *report1,
+		  enum drm_i915_oa_format format,
+                  uint64_t *delta)
+{
+	uint64_t value0 = gen8_read_40bit_a_counter(report0, format, a_index),
+		 value1 = gen8_read_40bit_a_counter(report1, format, a_index);
+
+	*delta += gen8_40bit_a_delta(value0, value1);
+}
+
+static void
+accumulate_reports(struct accumulator *accumulator,
+		   uint32_t *start,
+		   uint32_t *end)
+{
+	enum drm_i915_oa_format format = accumulator->format;
+	uint64_t *deltas = accumulator->deltas;
+	int idx = 0;
+
+	if (intel_gen(devid) >= 8) {
+		/* timestamp */
+		accumulate_uint32(4, start, end, deltas + idx++);
+
+		/* clock cycles */
+		accumulate_uint32(12, start, end, deltas + idx++);
+	} else {
+		/* timestamp */
+		accumulate_uint32(4, start, end, deltas + idx++);
+	}
+
+	for (int i = 0; i < oa_formats[format].n_a40; i++)
+		accumulate_uint40(i, start, end, format, deltas + idx++);
+
+	for (int i = 0; i < oa_formats[format].n_a; i++) {
+		accumulate_uint32(oa_formats[format].a_off + 4 * i,
+				  start, end, deltas + idx++);
+	}
+
+	for (int i = 0; i < oa_formats[format].n_b; i++) {
+		accumulate_uint32(oa_formats[format].b_off + 4 * i,
+				  start, end, deltas + idx++);
+	}
+
+	for (int i = 0; i < oa_formats[format].n_c; i++) {
+		accumulate_uint32(oa_formats[format].c_off + 4 * i,
+				  start, end, deltas + idx++);
+	}
+}
+
+static void
+accumulator_print(struct accumulator *accumulator, const char *title)
+{
+	enum drm_i915_oa_format format = accumulator->format;
+	uint64_t *deltas = accumulator->deltas;
+	int idx = 0;
+
+	igt_debug("%s:\n", title);
+	if (intel_gen(devid) >= 8) {
+		igt_debug("\ttime delta = %lu\n", deltas[idx++]);
+		igt_debug("\tclock cycle delta = %lu\n", deltas[idx++]);
+
+		for (int i = 0; i < oa_formats[format].n_a40; i++)
+			igt_debug("\tA%u = %lu\n", i, deltas[idx++]);
+	} else {
+		igt_debug("\ttime delta = %lu\n", deltas[idx++]);
+	}
+
+	for (int i = 0; i < oa_formats[format].n_a; i++) {
+		int a_id = oa_formats[format].first_a + i;
+		igt_debug("\tA%u = %lu\n", a_id, deltas[idx++]);
+	}
+
+	for (int i = 0; i < oa_formats[format].n_a; i++)
+		igt_debug("\tB%u = %lu\n", i, deltas[idx++]);
+
+	for (int i = 0; i < oa_formats[format].n_c; i++)
+		igt_debug("\tC%u = %lu\n", i, deltas[idx++]);
+}
+
 /* The TestOa metric set is designed so */
 static void
 gen8_sanity_check_test_oa_reports(uint32_t *oa_report0, uint32_t *oa_report1,
@@ -944,6 +1056,62 @@ gt_frequency_range_restore(void)
 	gt_max_freq_mhz = gt_max_freq_mhz_saved;
 }
 
+static int
+i915_read_reports_until_timestamp(enum drm_i915_oa_format oa_format,
+				  uint8_t *buf,
+				  uint32_t max_size,
+				  uint32_t start_timestamp,
+				  uint32_t end_timestamp)
+{
+	size_t format_size = oa_formats[oa_format].size;
+	uint32_t last_seen_timestamp = start_timestamp;
+	int total_len = 0;
+
+	while (last_seen_timestamp < end_timestamp) {
+		int offset, len;
+
+		/* Running out of space. */
+		if ((max_size - total_len) < format_size) {
+			igt_warn("run out of space before reaching "
+				 "end timestamp (%u/%u)\n",
+				 last_seen_timestamp, end_timestamp);
+			return -1;
+		}
+
+		while ((len = read(stream_fd, &buf[total_len],
+				   max_size - total_len)) < 0 &&
+		       errno == EINTR)
+			;
+
+		/* Intentionally return an error. */
+		if (len <= 0) {
+			if (errno == EAGAIN)
+				return total_len;
+			else {
+				igt_warn("error read OA stream : %i\n", errno);
+				return -1;
+			}
+		}
+
+		offset = total_len;
+		total_len += len;
+
+		while (offset < total_len) {
+		  const struct drm_i915_perf_record_header *header =
+		    (const struct drm_i915_perf_record_header *) &buf[offset];
+		  uint32_t *report = (uint32_t *) (header + 1);
+
+		  if (header->type == DRM_I915_PERF_RECORD_SAMPLE)
+		    last_seen_timestamp = report[1];
+
+		  offset += header->size;
+		}
+	}
+
+	return total_len;
+}
+
+
 /* CAP_SYS_ADMIN is required to open system wide metrics, unless the system
  * control parameter dev.i915.perf_stream_paranoid == 0 */
 static void
@@ -1362,6 +1530,70 @@ print_reports(uint32_t *oa_report0, uint32_t *oa_report1, int fmt)
 	}
 }
 
+/* Debug function, only useful when reports don't make sense. */
+#if 0
+static void
+print_report(uint32_t *report, int fmt)
+{
+	igt_debug("TIMESTAMP: %"PRIu32"\n", report[1]);
+
+	if (IS_HASWELL(devid) && oa_formats[fmt].n_c == 0) {
+		igt_debug("CLOCK = N/A\n");
+	} else {
+		uint32_t clock = read_report_ticks(report, fmt);
+
+		igt_debug("CLOCK: %"PRIu32"\n", clock);
+	}
+
+	if (intel_gen(devid) >= 8) {
+		uint32_t slice_freq, unslice_freq;
+		const char *reason = gen8_read_report_reason(report);
+
+		gen8_read_report_clock_ratios(report, &slice_freq, &unslice_freq);
+
+		igt_debug("SLICE CLK: %umhz\n", slice_freq);
+		igt_debug("UNSLICE CLK: %umhz\n", unslice_freq);
+		igt_debug("REASON: \"%s\"\n", reason);
+		igt_debug("CTX ID: %"PRIu32"/%"PRIx32"\n", report[2], report[2]);
+	}
+
+	/* Gen8+ has some 40bit A counters... */
+	for (int j = 0; j < oa_formats[fmt].n_a40; j++) {
+		uint64_t value = gen8_read_40bit_a_counter(report, fmt, j);
+
+		if (undefined_a_counters[j])
+			continue;
+
+		igt_debug("A%d: %"PRIu64"\n", j, value);
+	}
+
+	for (int j = 0; j < oa_formats[fmt].n_a; j++) {
+		uint32_t *a = (uint32_t *)(((uint8_t *)report) +
+					   oa_formats[fmt].a_off);
+		int a_id = oa_formats[fmt].first_a + j;
+
+		if (undefined_a_counters[a_id])
+			continue;
+
+		igt_debug("A%d: %"PRIu32"\n", a_id, a[j]);
+	}
+
+	for (int j = 0; j < oa_formats[fmt].n_b; j++) {
+		uint32_t *b = (uint32_t *)(((uint8_t *)report) +
+					   oa_formats[fmt].b_off);
+
+		igt_debug("B%d: %"PRIu32"\n", j, b[j]);
+	}
+
+	for (int j = 0; j < oa_formats[fmt].n_c; j++) {
+		uint32_t *c = (uint32_t *)(((uint8_t *)report) +
+					   oa_formats[fmt].c_off);
+
+		igt_debug("C%d: %"PRIu32"\n", j, c[j]);
+	}
+}
+#endif
+
 static void
 test_oa_formats(void)
 {
@@ -2486,14 +2718,8 @@ test_mi_rpc(void)
 }
 
 static void
-scratch_buf_init(drm_intel_bufmgr *bufmgr,
-		 struct igt_buf *buf,
-		 int width, int height,
-		 uint32_t color)
+scratch_buf_memset(drm_intel_bo *bo, int width, int height, uint32_t color)
 {
-	size_t stride = width * 4;
-	size_t size = stride * height;
-	drm_intel_bo *bo = drm_intel_bo_alloc(bufmgr, "", size, 4096);
 	int ret;
 
 	ret = drm_intel_bo_map(bo, true /* writable */);
@@ -2503,6 +2729,19 @@ scratch_buf_init(drm_intel_bufmgr *bufmgr,
 		((uint32_t *)bo->virtual)[i] = color;
 
 	drm_intel_bo_unmap(bo);
+}
+
+static void
+scratch_buf_init(drm_intel_bufmgr *bufmgr,
+		 struct igt_buf *buf,
+		 int width, int height,
+		 uint32_t color)
+{
+	size_t stride = width * 4;
+	size_t size = stride * height;
+	drm_intel_bo *bo = drm_intel_bo_alloc(bufmgr, "", size, 4096);
+
+	scratch_buf_memset(bo, width, height, color);
 
 	buf->bo = bo;
 	buf->stride = stride;
@@ -2521,14 +2760,25 @@ emit_stall_timestamp_and_rpc(struct intel_batchbuffer *batch,
 				   PIPE_CONTROL_RENDER_TARGET_FLUSH |
 				   PIPE_CONTROL_WRITE_TIMESTAMP);
 
-	BEGIN_BATCH(5, 1);
-	OUT_BATCH(GFX_OP_PIPE_CONTROL | (5 - 2));
-	OUT_BATCH(pipe_ctl_flags);
-	OUT_RELOC(dst, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
-		  timestamp_offset);
-	OUT_BATCH(0); /* imm lower */
-	OUT_BATCH(0); /* imm upper */
-	ADVANCE_BATCH();
+	if (intel_gen(devid) >= 8) {
+		BEGIN_BATCH(5, 1);
+		OUT_BATCH(GFX_OP_PIPE_CONTROL | (6 - 2));
+		OUT_BATCH(pipe_ctl_flags);
+		OUT_RELOC(dst, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
+			  timestamp_offset);
+		OUT_BATCH(0); /* imm lower */
+		OUT_BATCH(0); /* imm upper */
+		ADVANCE_BATCH();
+	} else {
+		BEGIN_BATCH(5, 1);
+		OUT_BATCH(GFX_OP_PIPE_CONTROL | (5 - 2));
+		OUT_BATCH(pipe_ctl_flags);
+		OUT_RELOC(dst, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
+			  timestamp_offset);
+		OUT_BATCH(0); /* imm lower */
+		OUT_BATCH(0); /* imm upper */
+		ADVANCE_BATCH();
+	}
 
 	emit_report_perf_count(batch, dst, report_dst_offset, report_id);
 }
@@ -2574,7 +2824,7 @@ hsw_test_single_ctx_counters(void)
 		drm_intel_bufmgr *bufmgr;
 		drm_intel_context *context0, *context1;
 		struct intel_batchbuffer *batch;
-		struct igt_buf src, dst;
+		struct igt_buf src[3], dst[3];
 		drm_intel_bo *bo;
 		uint32_t *report0_32, *report1_32;
 		uint64_t timestamp0_64, timestamp1_64;
@@ -2592,8 +2842,10 @@ hsw_test_single_ctx_counters(void)
 		bufmgr = drm_intel_bufmgr_gem_init(drm_fd, 4096);
 		drm_intel_bufmgr_gem_enable_reuse(bufmgr);
 
-		scratch_buf_init(bufmgr, &src, width, height, 0xff0000ff);
-		scratch_buf_init(bufmgr, &dst, width, height, 0x00ff00ff);
+		for (int i = 0; i < ARRAY_SIZE(src); i++) {
+			scratch_buf_init(bufmgr, &src[i], width, height, 0xff0000ff);
+			scratch_buf_init(bufmgr, &dst[i], width, height, 0x00ff00ff);
+		}
 
 		batch = intel_batchbuffer_alloc(bufmgr, devid);
 
@@ -2627,14 +2879,19 @@ hsw_test_single_ctx_counters(void)
 		 */
 		render_copy(batch,
 			    context0,
-			    &src, 0, 0, width, height,
-			    &dst, 0, 0);
+			    &src[0], 0, 0, width, height,
+			    &dst[0], 0, 0);
 
 		ret = drm_intel_gem_context_get_id(context0, &ctx_id);
 		igt_assert_eq(ret, 0);
 		igt_assert_neq(ctx_id, 0xffffffff);
 		properties[1] = ctx_id;
 
+		intel_batchbuffer_flush_with_context(batch, context0);
+
+		scratch_buf_memset(src[0].bo, width, height, 0xff0000ff);
+		scratch_buf_memset(dst[0].bo, width, height, 0x00ff00ff);
+
 		igt_debug("opening i915-perf stream\n");
 		stream_fd = __perf_open(drm_fd, &param);
 
@@ -2661,8 +2918,8 @@ hsw_test_single_ctx_counters(void)
 
 		render_copy(batch,
 			    context0,
-			    &src, 0, 0, width, height,
-			    &dst, 0, 0);
+			    &src[0], 0, 0, width, height,
+			    &dst[0], 0, 0);
 
 		/* Another redundant flush to clarify batch bo is free to reuse */
 		intel_batchbuffer_flush_with_context(batch, context0);
@@ -2673,13 +2930,13 @@ hsw_test_single_ctx_counters(void)
 		 */
 		render_copy(batch,
 			    context1,
-			    &src, 0, 0, width, height,
-			    &dst, 0, 0);
+			    &src[1], 0, 0, width, height,
+			    &dst[1], 0, 0);
 
 		render_copy(batch,
 			    context1,
-			    &src, 0, 0, width, height,
-			    &dst, 0, 0);
+			    &src[2], 0, 0, width, height,
+			    &dst[2], 0, 0);
 
 		/* And another */
 		intel_batchbuffer_flush_with_context(batch, context1);
@@ -2708,6 +2965,7 @@ hsw_test_single_ctx_counters(void)
 
 		/* A40 == N samples written to all render targets */
 		n_samples_written = report1_32[43] - report0_32[43];
+
 		igt_debug("n samples written = %d\n", n_samples_written);
 		igt_assert_eq(n_samples_written, width * height);
 
@@ -2742,8 +3000,10 @@ hsw_test_single_ctx_counters(void)
 			(delta_oa32_ns - delta_ts64_ns);
 		igt_assert(delta_delta <= 320);
 
-		drm_intel_bo_unreference(src.bo);
-		drm_intel_bo_unreference(dst.bo);
+		for (int i = 0; i < ARRAY_SIZE(src); i++) {
+			drm_intel_bo_unreference(src[i].bo);
+			drm_intel_bo_unreference(dst[i].bo);
+		}
 
 		drm_intel_bo_unmap(bo);
 		drm_intel_bo_unreference(bo);
@@ -2757,6 +3017,452 @@ hsw_test_single_ctx_counters(void)
 	igt_waitchildren();
 }
 
+/* Tests the INTEL_performance_query use case where an unprivileged process
+ * should be able to configure the OA unit for per-context metrics (for a
+ * context associated with that process' drm file descriptor) and the counters
+ * should only relate to that specific context.
+ *
+ * For Gen8+ although reports read via i915 perf can be filtered for a single
+ * context the counters themselves always progress as global/system-wide
+ * counters affected by all contexts. To support the INTEL_performance_query
+ * use case on Gen8+ it's necessary to combine OABUFFER and
+ * MI_REPORT_PERF_COUNT reports so that counter normalisation can take into
+ * account context-switch reports and factor out any counter progression not
+ * associated with the current context.
+ */
+static void
+gen8_test_single_ctx_render_target_writes_a_counter(void)
+{
+	int oa_exponent = max_oa_exponent_for_period_lte(1000000);
+	uint64_t properties[] = {
+		DRM_I915_PERF_PROP_CTX_HANDLE, UINT64_MAX, /* updated below */
+
+		/* Note: we have to specify at least one sample property even
+		 * though we aren't interested in samples in this case
+		 */
+		DRM_I915_PERF_PROP_SAMPLE_OA, true,
+
+		/* OA unit configuration */
+		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
+		DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
+		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exponent,
+
+		/* Note: no OA exponent specified in this case */
+	};
+	struct drm_i915_perf_open_param param = {
+		.flags = I915_PERF_FLAG_FD_CLOEXEC,
+		.num_properties = ARRAY_SIZE(properties) / 2,
+		.properties_ptr = to_user_pointer(properties),
+	};
+	size_t format_size = oa_formats[test_oa_format].size;
+	size_t sample_size = (sizeof(struct drm_i915_perf_record_header) +
+			      format_size);
+	int max_reports = (16 * 1024 * 1024) / format_size;
+	int buf_size = sample_size * max_reports * 1.5;
+	int child_ret;
+	uint8_t *buf = malloc(buf_size);
+	ssize_t len;
+	struct igt_helper_process child = {};
+
+	/* should be default, but just to be sure... */
+	write_u64_file("/proc/sys/dev/i915/perf_stream_paranoid", 1);
+
+	do {
+
+		igt_fork_helper(&child) {
+			struct drm_i915_perf_record_header *header;
+			drm_intel_bufmgr *bufmgr;
+			drm_intel_context *context0, *context1;
+			struct intel_batchbuffer *batch;
+			struct igt_buf src[3], dst[3];
+			drm_intel_bo *bo;
+			uint32_t *report0_32, *report1_32;
+			uint32_t *prev, *lprev = NULL;
+			uint64_t timestamp0_64, timestamp1_64;
+			uint32_t delta_ts64, delta_oa32;
+			uint64_t delta_ts64_ns, delta_oa32_ns;
+			uint32_t delta_delta;
+			int width = 800;
+			int height = 600;
+			uint32_t ctx_id = 0xffffffff; /* invalid handle */
+			uint32_t ctx1_id = 0xffffffff;  /* invalid handle */
+			uint32_t current_ctx_id = 0xffffffff;
+			uint32_t n_invalid_ctx = 0;
+			int ret;
+			struct accumulator accumulator = {
+				.format = test_oa_format
+			};
+
+			bufmgr = drm_intel_bufmgr_gem_init(drm_fd, 4096);
+			drm_intel_bufmgr_gem_enable_reuse(bufmgr);
+
+			for (int i = 0; i < ARRAY_SIZE(src); i++) {
+				scratch_buf_init(bufmgr, &src[i], width, height, 0xff0000ff);
+				scratch_buf_init(bufmgr, &dst[i], width, height, 0x00ff00ff);
+			}
+
+			batch = intel_batchbuffer_alloc(bufmgr, devid);
+
+			context0 = drm_intel_gem_context_create(bufmgr);
+			igt_assert(context0);
+
+			context1 = drm_intel_gem_context_create(bufmgr);
+			igt_assert(context1);
+
+			igt_debug("submitting warm up render_copy\n");
+
+			/* Submit some early, unmeasured, work to the context we want
+			 * to measure to try and catch issues with i915-perf
+			 * initializing the HW context ID for filtering.
+			 *
+			 * We do this because i915-perf single context filtering had
+			 * previously only relied on a hook into context pinning to
+			 * initialize the HW context ID, instead of also trying to
+			 * determine the HW ID while opening the stream, in case it
+			 * has already been pinned.
+			 *
+			 * This wasn't noticed by the previous unit test because we
+			 * were opening the stream while the context hadn't been
+			 * touched or pinned yet and so it worked out correctly to wait
+			 * for the pinning hook.
+			 *
+			 * Now a buggy version of i915-perf will fail to measure
+			 * anything for context0 once this initial render_copy() ends
+			 * up pinning the context since there won't ever be a pinning
+			 * hook callback.
+			 */
+			render_copy(batch,
+				    context0,
+				    &src[0], 0, 0, width, height,
+				    &dst[0], 0, 0);
+
+			ret = drm_intel_gem_context_get_id(context0, &ctx_id);
+			igt_assert_eq(ret, 0);
+			igt_assert_neq(ctx_id, 0xffffffff);
+			properties[1] = ctx_id;
+
+			scratch_buf_memset(src[0].bo, width, height, 0xff0000ff);
+			scratch_buf_memset(dst[0].bo, width, height, 0x00ff00ff);
+
+			igt_debug("opening i915-perf stream\n");
+			stream_fd = __perf_open(drm_fd, &param);
+
+			bo = drm_intel_bo_alloc(bufmgr, "mi_rpc dest bo", 4096, 64);
+
+			ret = drm_intel_bo_map(bo, true /* write enable */);
+			igt_assert_eq(ret, 0);
+
+			memset(bo->virtual, 0x80, 4096);
+			drm_intel_bo_unmap(bo);
+
+			emit_stall_timestamp_and_rpc(batch,
+						     bo,
+						     512 /* timestamp offset */,
+						     0, /* report dst offset */
+						     0xdeadbeef); /* report id */
+
+			/* Explicitly flush here (even though the render_copy() call
+			 * will itself flush before/after the copy) to clarify that
+			 * that the PIPE_CONTROL + MI_RPC commands will be in a
+			 * separate batch from the copy.
+			 */
+			intel_batchbuffer_flush_with_context(batch, context0);
+
+			render_copy(batch,
+				    context0,
+				    &src[0], 0, 0, width, height,
+				    &dst[0], 0, 0);
+
+			/* Another redundant flush to clarify batch bo is free to reuse */
+			intel_batchbuffer_flush_with_context(batch, context0);
+
+			/* submit two copies on the other context to avoid a false
+			 * positive in case the driver somehow ended up filtering for
+			 * context1
+			 */
+			render_copy(batch,
+				    context1,
+				    &src[1], 0, 0, width, height,
+				    &dst[1], 0, 0);
+
+			ret = drm_intel_gem_context_get_id(context1, &ctx1_id);
+			igt_assert_eq(ret, 0);
+			igt_assert_neq(ctx1_id, 0xffffffff);
+
+			render_copy(batch,
+				    context1,
+				    &src[2], 0, 0, width, height,
+				    &dst[2], 0, 0);
+
+			/* And another */
+			intel_batchbuffer_flush_with_context(batch, context1);
+
+			emit_stall_timestamp_and_rpc(batch,
+						     bo,
+						     520 /* timestamp offset */,
+						     256, /* report dst offset */
+						     0xbeefbeef); /* report id */
+
+			intel_batchbuffer_flush_with_context(batch, context1);
+
+			ret = drm_intel_bo_map(bo, false /* write enable */);
+			igt_assert_eq(ret, 0);
+
+			report0_32 = bo->virtual;
+			igt_assert_eq(report0_32[0], 0xdeadbeef); /* report ID */
+			igt_assert_neq(report0_32[1], 0); /* timestamp */
+			//report0_32[2] = 0xffffffff;
+			prev = report0_32;
+			ctx_id = prev[2];
+			igt_debug("MI_RPC(start) CTX ID: %u\n", ctx_id);
+
+			report1_32 = report0_32 + 64; /* 64 uint32_t = 256bytes offset */
+			igt_assert_eq(report1_32[0], 0xbeefbeef); /* report ID */
+			igt_assert_neq(report1_32[1], 0); /* timestamp */
+			//report1_32[2] = 0xffffffff;
+			ctx1_id = report1_32[2];
+
+			memset(accumulator.deltas, 0, sizeof(accumulator.deltas));
+			accumulate_reports(&accumulator, report0_32, report1_32);
+			igt_debug("total: A0 = %lu, A21 = %lu, A26 = %lu\n",
+				  accumulator.deltas[2 + 0], /* skip timestamp + clock cycles */
+				  accumulator.deltas[2 + 21],
+				  accumulator.deltas[2 + 26]);
+
+			igt_debug("oa_timestamp32 0 = %u\n", report0_32[1]);
+			igt_debug("oa_timestamp32 1 = %u\n", report1_32[1]);
+			igt_debug("ctx_id 0 = %u\n", report0_32[2]);
+			igt_debug("ctx_id 1 = %u\n", report1_32[2]);
+
+			timestamp0_64 = *(uint64_t *)(((uint8_t *)bo->virtual) + 512);
+			timestamp1_64 = *(uint64_t *)(((uint8_t *)bo->virtual) + 520);
+
+			igt_debug("ts_timestamp64 0 = %"PRIu64"\n", timestamp0_64);
+			igt_debug("ts_timestamp64 1 = %"PRIu64"\n", timestamp1_64);
+
+			delta_ts64 = timestamp1_64 - timestamp0_64;
+			delta_oa32 = report1_32[1] - report0_32[1];
+
+			/* sanity check that we can pass the delta to timebase_scale */
+			igt_assert(delta_ts64 < UINT32_MAX);
+			delta_oa32_ns = timebase_scale(delta_oa32);
+			delta_ts64_ns = timebase_scale(delta_ts64);
+
+			igt_debug("oa32 delta = %u, = %uns\n",
+				  delta_oa32, (unsigned)delta_oa32_ns);
+			igt_debug("ts64 delta = %u, = %uns\n",
+				  delta_ts64, (unsigned)delta_ts64_ns);
+
+			/* The delta as calculated via the PIPE_CONTROL timestamp or
+			 * the OA report timestamps should be almost identical but
+			 * allow a 500 nanoseconds margin.
+			 */
+			delta_delta = delta_ts64_ns > delta_oa32_ns ?
+				(delta_ts64_ns - delta_oa32_ns) :
+				(delta_oa32_ns - delta_ts64_ns);
+			if (delta_delta > 500) {
+				igt_debug("skipping\n");
+				exit(EAGAIN);
+			}
+
+			len = i915_read_reports_until_timestamp(test_oa_format,
+								buf, buf_size,
+								report0_32[1],
+								report1_32[1]);
+
+			igt_assert(len > 0);
+			igt_debug("read %d bytes\n", (int)len);
+
+			memset(accumulator.deltas, 0, sizeof(accumulator.deltas));
+
+			for (size_t offset = 0; offset < len; offset += header->size) {
+				uint32_t *report;
+				uint32_t reason;
+				const char *skip_reason = NULL, *report_reason = NULL;
+				struct accumulator laccumulator = {
+					.format = test_oa_format
+				};
+
+
+				header = (void *)(buf + offset);
+
+				igt_assert_eq(header->pad, 0); /* Reserved */
+
+				/* Currently the only test that should ever expect to
+				 * see a _BUFFER_LOST error is the buffer_fill test,
+				 * otherwise something bad has probably happened...
+				 */
+				igt_assert_neq(header->type, DRM_I915_PERF_RECORD_OA_BUFFER_LOST);
+
+				/* At high sampling frequencies the OA HW might not be
+				 * able to cope with all write requests and will notify
+				 * us that a report was lost.
+				 *
+				 * XXX: we should maybe restart the test in this case?
+				 */
+				if (header->type == DRM_I915_PERF_RECORD_OA_REPORT_LOST) {
+					igt_debug("OA trigger collision / report lost\n");
+					exit(EAGAIN);
+				}
+
+				/* Currently the only other record type expected is a
+				 * _SAMPLE. Notably this test will need updating if
+				 * i915-perf is extended in the future with additional
+				 * record types.
+				 */
+				igt_assert_eq(header->type, DRM_I915_PERF_RECORD_SAMPLE);
+
+				igt_assert_eq(header->size, sample_size);
+
+				report = (void *)(header + 1);
+
+				/* Don't expect zero for timestamps */
+				igt_assert_neq(report[1], 0);
+
+				igt_debug("report %p:\n", report);
+
+				/* Discard reports not contained in between the
+				 * timestamps we're looking at. */
+				{
+					uint32_t time_delta = report[1] - report0_32[1];
+
+					if (timebase_scale(time_delta) > 1000000000) {
+						skip_reason = "prior first mi-rpc";
+					}
+				}
+
+				{
+					uint32_t time_delta = report[1] - report1_32[1];
+
+					if (timebase_scale(time_delta) <= 1000000000) {
+						igt_debug("    comes after last MI_RPC (%u)\n",
+							  report1_32[1]);
+						report = report1_32;
+					}
+				}
+
+				/* Print out deltas for a few significant
+				 * counters for each report. */
+				if (lprev) {
+					memset(laccumulator.deltas, 0, sizeof(laccumulator.deltas));
+					accumulate_reports(&laccumulator, lprev, report);
+					igt_debug("    deltas: A0=%lu A21=%lu, A26=%lu\n",
+						  laccumulator.deltas[2 + 0], /* skip timestamp + clock cycles */
+						  laccumulator.deltas[2 + 21],
+						  laccumulator.deltas[2 + 26]);
+				}
+				lprev = report;
+
+				/* Print out reason for the report. */
+				reason = ((report[0] >> OAREPORT_REASON_SHIFT) &
+					  OAREPORT_REASON_MASK);
+
+				if (reason & OAREPORT_REASON_CTX_SWITCH) {
+					report_reason = "ctx-load";
+				} else if (reason & OAREPORT_REASON_TIMER) {
+					report_reason = "timer";
+				} else if (reason & OAREPORT_REASON_INTERNAL ||
+					   reason & OAREPORT_REASON_GO ||
+					   reason & OAREPORT_REASON_CLK_RATIO) {
+					report_reason = "internal/go/clk-ratio";
+				} else {
+					report_reason = "end-mi-rpc";
+				}
+				igt_debug("    ctx_id=%u/%x reason=%s oa_timestamp32=%u\n",
+					  report[2], report[2], report_reason, report[1]);
+
+				/* Should we skip this report?
+				 *
+				 *   Only if the current context id of
+				 *   the stream is not the one we want
+				 *   to measure.
+				 */
+				if (current_ctx_id != ctx_id) {
+					skip_reason = "not our context";
+				}
+
+				if (n_invalid_ctx > 1) {
+					skip_reason = "too many invalid context events";
+				}
+
+				if (!skip_reason) {
+					accumulate_reports(&accumulator, prev, report);
+					igt_debug(" -> Accumulated deltas A0=%lu A21=%lu, A26=%lu\n",
+						  accumulator.deltas[2 + 0], /* skip timestamp + clock cycles */
+						  accumulator.deltas[2 + 21],
+						  accumulator.deltas[2 + 26]);
+				} else {
+					igt_debug(" -> Skipping: %s\n", skip_reason);
+				}
+
+
+				/* Finally update current-ctx_id, only possible
+				 * with a valid context id. */
+				if (oa_report_ctx_is_valid(report)) {
+					current_ctx_id = report[2];
+					n_invalid_ctx = 0;
+				} else {
+					n_invalid_ctx++;
+				}
+
+				prev = report;
+
+				if (report == report1_32) {
+					igt_debug("Breaking on end of report\n");
+					print_reports(report0_32, report1_32,
+						      lookup_format(test_oa_format));
+					break;
+				}
+			}
+
+			igt_debug("n samples written = %ld/%lu (%ix%i)\n",
+				  accumulator.deltas[2 + 21],/* skip timestamp + clock cycles */
+				  accumulator.deltas[2 + 26],
+				  width, height);
+			accumulator_print(&accumulator, "filtered");
+
+			ret = drm_intel_bo_map(src[0].bo, false /* write enable */);
+			igt_assert_eq(ret, 0);
+			ret = drm_intel_bo_map(dst[0].bo, false /* write enable */);
+			igt_assert_eq(ret, 0);
+
+			ret = memcmp(src[0].bo->virtual, dst[0].bo->virtual, 4 * width * height);
+			if (ret != 0) {
+				accumulator_print(&accumulator, "total");
+				/* This needs to be investigated... From time
+				 * to time, the work we kick off doesn't seem
+				 * to happen. WTH?? */
+				exit(EAGAIN);
+			}
+			//igt_assert_eq(ret, 0);
+
+			drm_intel_bo_unmap(src[0].bo);
+			drm_intel_bo_unmap(dst[0].bo);
+
+			igt_assert_eq(accumulator.deltas[2 + 26], width * height);
+
+			for (int i = 0; i < ARRAY_SIZE(src); i++) {
+				drm_intel_bo_unreference(src[i].bo);
+				drm_intel_bo_unreference(dst[i].bo);
+			}
+
+			drm_intel_bo_unmap(bo);
+			drm_intel_bo_unreference(bo);
+			intel_batchbuffer_free(batch);
+			drm_intel_gem_context_destroy(context0);
+			drm_intel_gem_context_destroy(context1);
+			drm_intel_bufmgr_destroy(bufmgr);
+			__perf_close(stream_fd);
+		}
+
+		child_ret = igt_wait_helper(&child);
+
+		igt_assert(WEXITSTATUS(child_ret) == EAGAIN ||
+			   WEXITSTATUS(child_ret) == 0);
+
+	} while (WEXITSTATUS(child_ret) == EAGAIN);
+}
+
 static void
 test_rc6_disable(void)
 {
@@ -3298,8 +4004,10 @@ igt_main
 		test_oa_exponents(550);
 	}
 
-	igt_subtest("per-context-mode-unprivileged")
+	igt_subtest("per-context-mode-unprivileged") {
+		igt_require(IS_HASWELL(devid));
 		test_per_context_mode_unprivileged();
+	}
 
 	igt_subtest("buffer-fill")
 		test_buffer_fill();
@@ -3324,15 +4032,20 @@ igt_main
 	igt_subtest("mi-rpc")
 		test_mi_rpc();
 
-	igt_subtest("unprivileged-singled-ctx-counters") {
+	igt_subtest("unprivileged-single-ctx-counters") {
+		igt_require(IS_HASWELL(devid));
+		hsw_test_single_ctx_counters();
+	}
+
+	igt_subtest("gen8-unprivileged-single-ctx-counters") {
 		/* For Gen8+ the OA unit can no longer be made to clock gate
 		 * for a specific context. Additionally the partial-replacement
 		 * functionality to HW filter timer reports for a specific
 		 * context (SKL+) can't stop multiple applications viewing
 		 * system-wide data via MI_REPORT_PERF_COUNT commands.
 		 */
-		igt_require(IS_HASWELL(devid));
-		hsw_test_single_ctx_counters();
+		igt_require(intel_gen(devid) >= 8);
+		gen8_test_single_ctx_render_target_writes_a_counter();
 	}
 
 	igt_subtest("rc6-disable")
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH i-g-t v6 03/12] tests/perf: update max buffer size for reading reports
  2017-10-04 11:19 [PATCH i-g-t v6 00/12] Improve robustness of the i915 perf tests Lionel Landwerlin
  2017-10-04 11:19 ` [PATCH i-g-t v6 01/12] tests/perf: make stream_fd a global variable Lionel Landwerlin
  2017-10-04 11:19 ` [PATCH i-g-t v6 02/12] tests/perf: add per context filtering test for gen8+ Lionel Landwerlin
@ 2017-10-04 11:19 ` Lionel Landwerlin
  2017-10-04 11:19 ` [PATCH i-g-t v6 04/12] tests/perf: rc6: try to guess when rc6 is disabled Lionel Landwerlin
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Lionel Landwerlin @ 2017-10-04 11:19 UTC (permalink / raw)
  To: intel-gfx

Make it clear that we're using a 16Mb buffer.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 tests/perf.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index 8644e252..bd139bde 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -1298,9 +1298,7 @@ read_2_oa_reports(int format_id,
 	/* Note: we allocate a large buffer so that each read() iteration
 	 * should scrape *all* pending records.
 	 *
-	 * The largest buffer the OA unit supports is 16MB and the smallest
-	 * OA report format is 64bytes allowing up to 262144 reports to
-	 * be buffered.
+	 * The largest buffer the OA unit supports is 16MB.
 	 *
 	 * Being sure we are fetching all buffered reports allows us to
 	 * potentially throw away / skip all reports whenever we see
@@ -1313,7 +1311,8 @@ read_2_oa_reports(int format_id,
 	 * to indicate that the OA unit may be over taxed if lots of reports
 	 * are being lost.
 	 */
-	int buf_size = 262144 * (64 + sizeof(struct drm_i915_perf_record_header));
+	int max_reports = (16 * 1024 * 1024) / format_size;
+	int buf_size = sample_size * max_reports * 1.5;
 	uint8_t *buf = malloc(buf_size);
 	int n = 0;
 
@@ -1325,6 +1324,7 @@ read_2_oa_reports(int format_id,
 			;
 
 		igt_assert(len > 0);
+		igt_debug("read %d bytes\n", (int)len);
 
 		for (size_t offset = 0; offset < len; offset += header->size) {
 			const uint32_t *report;
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH i-g-t v6 04/12] tests/perf: rc6: try to guess when rc6 is disabled
  2017-10-04 11:19 [PATCH i-g-t v6 00/12] Improve robustness of the i915 perf tests Lionel Landwerlin
                   ` (2 preceding siblings ...)
  2017-10-04 11:19 ` [PATCH i-g-t v6 03/12] tests/perf: update max buffer size for reading reports Lionel Landwerlin
@ 2017-10-04 11:19 ` Lionel Landwerlin
  2017-10-04 11:19 ` [PATCH i-g-t v6 05/12] tests/perf: remove frequency related changes Lionel Landwerlin
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Lionel Landwerlin @ 2017-10-04 11:19 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 tests/perf.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/tests/perf.c b/tests/perf.c
index bd139bde..5fe0a332 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -3463,6 +3463,17 @@ gen8_test_single_ctx_render_target_writes_a_counter(void)
 	} while (WEXITSTATUS(child_ret) == EAGAIN);
 }
 
+static bool
+rc6_enabled(void)
+{
+	char *rc6_status = read_debugfs_record(drm_fd, "i915_drpc_info",
+					       "RC6 Enabled");
+	bool enabled = strcmp(rc6_status, "yes") == 0;
+
+	free(rc6_status);
+	return enabled;
+}
+
 static void
 test_rc6_disable(void)
 {
@@ -3482,6 +3493,8 @@ test_rc6_disable(void)
 	};
 	uint64_t n_events_start, n_events_end;
 
+	igt_skip_on(!rc6_enabled());
+
 	stream_fd = __perf_open(drm_fd, &param);
 
 	n_events_start = read_debugfs_u64_record(drm_fd, "i915_drpc_info",
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH i-g-t v6 05/12] tests/perf: remove frequency related changes
  2017-10-04 11:19 [PATCH i-g-t v6 00/12] Improve robustness of the i915 perf tests Lionel Landwerlin
                   ` (3 preceding siblings ...)
  2017-10-04 11:19 ` [PATCH i-g-t v6 04/12] tests/perf: rc6: try to guess when rc6 is disabled Lionel Landwerlin
@ 2017-10-04 11:19 ` Lionel Landwerlin
  2017-10-04 11:19 ` [PATCH i-g-t v6 06/12] tests/perf: rework oa-exponent test Lionel Landwerlin
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Lionel Landwerlin @ 2017-10-04 11:19 UTC (permalink / raw)
  To: intel-gfx

Experience shows that most of the issues we face with periodicity of
the reports produced by the OA unit are related to power management,
not frequency.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 tests/perf.c | 141 ++++-------------------------------------------------------
 1 file changed, 9 insertions(+), 132 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index 5fe0a332..f256bac3 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -293,12 +293,9 @@ static int card = -1;
 static int n_eus;
 
 static uint64_t test_metric_set_id = UINT64_MAX;
-static uint64_t gt_min_freq_mhz_saved = 0;
-static uint64_t gt_max_freq_mhz_saved = 0;
-static uint64_t gt_min_freq_mhz = 0;
-static uint64_t gt_max_freq_mhz = 0;
 
 static uint64_t timestamp_frequency = 12500000;
+static uint64_t gt_max_freq_mhz = 0;
 static enum drm_i915_oa_format test_oa_format;
 static bool *undefined_a_counters;
 static uint64_t oa_exp_1_millisec;
@@ -402,16 +399,6 @@ sysfs_read(const char *file)
 	return read_u64_file(buf);
 }
 
-static void
-sysfs_write(const char *file, uint64_t val)
-{
-	char buf[512];
-
-	snprintf(buf, sizeof(buf), "/sys/class/drm/card%d/%s", card, file);
-
-	write_u64_file(buf, val);
-}
-
 static char *
 read_debugfs_record(int device, const char *file, const char *key)
 {
@@ -1008,54 +995,6 @@ init_sys_info(void)
 	return try_read_u64_file(buf, &test_metric_set_id);
 }
 
-static void
-gt_frequency_range_save(void)
-{
-	gt_min_freq_mhz_saved = sysfs_read("gt_min_freq_mhz");
-	gt_max_freq_mhz_saved = sysfs_read("gt_max_freq_mhz");
-
-	gt_min_freq_mhz = gt_min_freq_mhz_saved;
-	gt_max_freq_mhz = gt_max_freq_mhz_saved;
-}
-
-static void
-gt_frequency_pin(int gt_freq_mhz)
-{
-	igt_debug("requesting pinned GT freq = %dmhz\n", gt_freq_mhz);
-
-	if (gt_freq_mhz > gt_max_freq_mhz) {
-		sysfs_write("gt_max_freq_mhz", gt_freq_mhz);
-		sysfs_write("gt_min_freq_mhz", gt_freq_mhz);
-	} else {
-		sysfs_write("gt_min_freq_mhz", gt_freq_mhz);
-		sysfs_write("gt_max_freq_mhz", gt_freq_mhz);
-	}
-	gt_min_freq_mhz = gt_freq_mhz;
-	gt_max_freq_mhz = gt_freq_mhz;
-}
-
-static void
-gt_frequency_range_restore(void)
-{
-	igt_debug("restoring GT frequency range: min = %dmhz, max =%dmhz, current: min=%dmhz, max=%dmhz\n",
-		  (int)gt_min_freq_mhz_saved,
-		  (int)gt_max_freq_mhz_saved,
-		  (int)gt_min_freq_mhz,
-		  (int)gt_max_freq_mhz);
-
-	/* Assume current min/max are the same */
-	if (gt_min_freq_mhz_saved > gt_max_freq_mhz) {
-		sysfs_write("gt_max_freq_mhz", gt_max_freq_mhz_saved);
-		sysfs_write("gt_min_freq_mhz", gt_min_freq_mhz_saved);
-	} else {
-		sysfs_write("gt_min_freq_mhz", gt_min_freq_mhz_saved);
-		sysfs_write("gt_max_freq_mhz", gt_max_freq_mhz_saved);
-	}
-
-	gt_min_freq_mhz = gt_min_freq_mhz_saved;
-	gt_max_freq_mhz = gt_max_freq_mhz_saved;
-}
-
 static int
 i915_read_reports_until_timestamp(enum drm_i915_oa_format oa_format,
 				  uint8_t *buf,
@@ -1632,33 +1571,9 @@ test_oa_formats(void)
 }
 
 static void
-test_oa_exponents(int gt_freq_mhz)
+test_oa_exponents(void)
 {
-	uint32_t freq_margin;
-
-	/* This test tries to use the sysfs interface for pinning the GT
-	 * frequency so we have another point of reference for comparing with
-	 * the clock frequency as derived from OA reports.
-	 *
-	 * This test has been finicky to stabilise while the
-	 * gt_min/max_freq_mhz files in sysfs don't seem to be a reliable
-	 * mechanism for fixing the gpu frequency.
-	 *
-	 * Since these unit tests are focused on the OA unit not the ability to
-	 * pin the frequency via sysfs we make the test account for pinning not
-	 * being reliable and read back the current frequency for each
-	 * iteration of this test to take this into account.
-	 */
-	gt_frequency_pin(gt_freq_mhz);
-
-	igt_debug("Testing OA timer exponents with requested GT frequency = %dmhz\n",
-		  gt_freq_mhz);
-
-	/* allow a +- 10% error margin when checking that the frequency
-	 * calculated from the OA reports matches the frequency according to
-	 * sysfs.
-	 */
-	freq_margin = gt_freq_mhz * 0.1;
+	igt_debug("Testing OA timer exponents\n");
 
 	/* It's asking a lot to sample with a 160 nanosecond period and the
 	 * test can fail due to buffer overflows if it wasn't possible to
@@ -1673,7 +1588,6 @@ test_oa_exponents(int gt_freq_mhz)
 		uint32_t clock_delta;
 		uint32_t freq;
 		int n_tested = 0;
-		int n_freq_matches = 0;
 
 		/* The exponent is effectively selecting a bit in the timestamp
 		 * to trigger reports on and so in practice we expect the raw
@@ -1683,15 +1597,10 @@ test_oa_exponents(int gt_freq_mhz)
 		expected_timestamp_delta = 2 << i;
 
 		for (int j = 0; n_tested < 10 && j < 100; j++) {
-			int gt_freq_mhz_0, gt_freq_mhz_1;
 			uint32_t ticks0, ticks1;
 
-			gt_freq_mhz_0 = sysfs_read("gt_act_freq_mhz");
-
-			igt_debug("ITER %d: testing OA exponent %d (period = %"PRIu64"ns) with sysfs GT freq = %dmhz +- %u\n",
-				  j, i,
-				  oa_exponent_to_ns(i),
-				  gt_freq_mhz_0, freq_margin);
+			igt_debug("ITER %d: testing OA exponent %d (period = %"PRIu64"ns)\n",
+				  j, i, oa_exponent_to_ns(i));
 
 			open_and_read_2_oa_reports(test_oa_format,
 						   i, /* exponent */
@@ -1700,16 +1609,6 @@ test_oa_exponents(int gt_freq_mhz)
 						   true); /* timer triggered
 							     reports only */
 
-			gt_freq_mhz_1 = sysfs_read("gt_act_freq_mhz");
-
-			/* If it looks like the frequency has changed according
-			 * to sysfs then skip looking at this pair of reports
-			 */
-			if (gt_freq_mhz_0 != gt_freq_mhz_1) {
-				igt_debug("skipping OA reports pair due to GT frequency change according to sysfs\n");
-				continue;
-			}
-
 			timestamp_delta = oa_report1[1] - oa_report0[1];
 			igt_assert_neq(timestamp_delta, 0);
 
@@ -1732,31 +1631,13 @@ test_oa_exponents(int gt_freq_mhz)
 			igt_debug("ITER %d: time delta = %"PRIu32"(ns) clock delta = %"PRIu32" freq = %"PRIu32"(mhz)\n",
 				  j, time_delta, clock_delta, freq);
 
-                        if (freq < (gt_freq_mhz_1 + freq_margin) &&
-                            freq > (gt_freq_mhz_1 - freq_margin))
-				n_freq_matches++;
-
 			n_tested++;
 		}
 
 		if (n_tested < 10)
 			igt_debug("sysfs frequency pinning too unstable for cross-referencing with OA derived frequency");
 		igt_assert_eq(n_tested, 10);
-
-		igt_debug("number of iterations with expected clock frequency = %d\n",
-			  n_freq_matches);
-
-		/* Don't assert the calculated frequency for extremely short
-		 * durations.
-		 *
-		 * Allow some mismatches since can't be can't be sure about
-		 * frequency changes between sysfs reads.
-		 */
-		if (i > 3)
-			igt_assert(n_freq_matches >= 7);
 	}
-
-	gt_frequency_range_restore();
 }
 
 /* The OA exponent selects a timestamp counter bit to trigger reports on.
@@ -3981,11 +3862,11 @@ igt_main
 
 		igt_require(init_sys_info());
 
-		gt_frequency_range_save();
-
 		write_u64_file("/proc/sys/dev/i915/perf_stream_paranoid", 1);
 		write_u64_file("/proc/sys/dev/i915/oa_max_sample_rate", 100000);
 
+		gt_max_freq_mhz = sysfs_read("gt_boost_freq_mhz");
+
 		render_copy = igt_get_render_copyfunc(devid);
 		igt_require_f(render_copy, "no render-copy function\n");
 	}
@@ -4012,10 +3893,8 @@ igt_main
 		test_invalid_oa_exponent();
 	igt_subtest("low-oa-exponent-permissions")
 		test_low_oa_exponent_permissions();
-	igt_subtest("oa-exponents") {
-		test_oa_exponents(450);
-		test_oa_exponents(550);
-	}
+	igt_subtest("oa-exponents")
+		test_oa_exponents();
 
 	igt_subtest("per-context-mode-unprivileged") {
 		igt_require(IS_HASWELL(devid));
@@ -4081,8 +3960,6 @@ igt_main
 		write_u64_file("/proc/sys/dev/i915/oa_max_sample_rate", 100000);
 		write_u64_file("/proc/sys/dev/i915/perf_stream_paranoid", 1);
 
-		gt_frequency_range_restore();
-
 		close(drm_fd);
 	}
 }
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH i-g-t v6 06/12] tests/perf: rework oa-exponent test
  2017-10-04 11:19 [PATCH i-g-t v6 00/12] Improve robustness of the i915 perf tests Lionel Landwerlin
                   ` (4 preceding siblings ...)
  2017-10-04 11:19 ` [PATCH i-g-t v6 05/12] tests/perf: remove frequency related changes Lionel Landwerlin
@ 2017-10-04 11:19 ` Lionel Landwerlin
  2017-10-04 11:19 ` [PATCH i-g-t v6 07/12] tests/perf: make enable-disable more reliable Lionel Landwerlin
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Lionel Landwerlin @ 2017-10-04 11:19 UTC (permalink / raw)
  To: intel-gfx

New issues that were discovered while making the tests work on Gen8+ :

 - we need to measure timings between periodic reports and discard all
   other kind of reports

 - it seems periodicity of the reports can be affected outside of RC6
   (frequency change), we can detect this by looking at the amount of
   clock cycles per timestamp deltas

v2: Drop some unused variables (Matthew)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 tests/perf.c | 733 ++++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 599 insertions(+), 134 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index f256bac3..15a43cf8 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -28,6 +28,7 @@
 #include <fcntl.h>
 #include <inttypes.h>
 #include <errno.h>
+#include <signal.h>
 #include <sys/stat.h>
 #include <sys/time.h>
 #include <sys/times.h>
@@ -306,6 +307,25 @@ static uint32_t (*read_report_ticks)(uint32_t *report,
 static void (*sanity_check_reports)(uint32_t *oa_report0, uint32_t *oa_report1,
 				    enum drm_i915_oa_format format);
 
+static bool
+timestamp_delta_within(uint32_t delta,
+		       uint32_t expected_delta,
+		       uint32_t margin)
+{
+	return delta >= (expected_delta - margin) &&
+	       delta <= (expected_delta + margin);
+}
+
+static bool
+double_value_within(double value,
+		    double expected,
+		    double percent_margin)
+{
+	return value >= (expected - expected * percent_margin / 100.0) &&
+	       value <= (expected + expected * percent_margin / 100.0);
+
+}
+
 static void
 __perf_close(int fd)
 {
@@ -472,6 +492,20 @@ gen8_read_report_ticks(uint32_t *report, enum drm_i915_oa_format format)
 	return report[3];
 }
 
+static void
+gen8_read_report_clock_ratios(uint32_t *report,
+			      uint32_t *slice_freq_mhz,
+			      uint32_t *unslice_freq_mhz)
+{
+	uint32_t unslice_freq = report[0] & 0x1ff;
+	uint32_t slice_freq_low = (report[0] >> 25) & 0x7f;
+	uint32_t slice_freq_high = (report[0] >> 9) & 0x3;
+	uint32_t slice_freq = slice_freq_low | (slice_freq_high << 7);
+
+	*slice_freq_mhz = (slice_freq * 16666) / 1000;
+	*unslice_freq_mhz = (unslice_freq * 16666) / 1000;
+}
+
 static const char *
 gen8_read_report_reason(const uint32_t *report)
 {
@@ -494,29 +528,6 @@ gen8_read_report_reason(const uint32_t *report)
 		return "unknown";
 }
 
-static bool
-oa_report_is_periodic(uint32_t oa_exponent, const uint32_t *report)
-{
-	if (IS_HASWELL(devid)) {
-		/* For Haswell we don't have a documented report reason field
-		 * (though empirically report[0] bit 10 does seem to correlate
-		 * with a timer trigger reason) so we instead infer which
-		 * reports are timer triggered by checking if the least
-		 * significant bits are zero and the exponent bit is set.
-		 */
-		uint32_t oa_exponent_mask = (1 << (oa_exponent + 1)) - 1;
-
-		if ((report[1] & oa_exponent_mask) != (1 << oa_exponent))
-			return true;
-	} else {
-		if ((report[0] >> OAREPORT_REASON_SHIFT) &
-		    OAREPORT_REASON_TIMER)
-			return true;
-	}
-
-	return false;
-}
-
 static uint64_t
 timebase_scale(uint32_t u32_delta)
 {
@@ -563,6 +574,29 @@ oa_exponent_to_ns(int exponent)
        return 1000000000ULL * (2ULL << exponent) / timestamp_frequency;
 }
 
+static bool
+oa_report_is_periodic(uint32_t oa_exponent, const uint32_t *report)
+{
+	if (IS_HASWELL(devid)) {
+		/* For Haswell we don't have a documented report reason field
+		 * (though empirically report[0] bit 10 does seem to correlate
+		 * with a timer trigger reason) so we instead infer which
+		 * reports are timer triggered by checking if the least
+		 * significant bits are zero and the exponent bit is set.
+		 */
+		uint32_t oa_exponent_mask = (1 << (oa_exponent + 1)) - 1;
+
+		if ((report[1] & oa_exponent_mask) == (1 << oa_exponent))
+			return true;
+	} else {
+		if ((report[0] >> OAREPORT_REASON_SHIFT) &
+		    OAREPORT_REASON_TIMER)
+			return true;
+	}
+
+	return false;
+}
+
 static bool
 oa_report_ctx_is_valid(uint32_t *report)
 {
@@ -578,6 +612,128 @@ oa_report_ctx_is_valid(uint32_t *report)
 	igt_assert(!"reached");
 }
 
+static uint32_t
+oa_report_get_ctx_id(uint32_t *report)
+{
+	if (!oa_report_ctx_is_valid(report))
+		return 0xffffffff;
+	return report[2];
+}
+
+static double
+oa_reports_tick_per_period(uint32_t *report0, uint32_t *report1)
+{
+	if (intel_gen(devid) < 8)
+		return 0.0;
+
+	/* Measure the number GPU tick delta to timestamp delta. */
+	return (double) (report1[3] - report0[3]) /
+	       (double) (report1[1] - report0[1]);
+}
+
+static void
+scratch_buf_memset(drm_intel_bo *bo, int width, int height, uint32_t color)
+{
+	int ret;
+
+	ret = drm_intel_bo_map(bo, true /* writable */);
+	igt_assert_eq(ret, 0);
+
+	for (int i = 0; i < width * height; i++)
+		((uint32_t *)bo->virtual)[i] = color;
+
+	drm_intel_bo_unmap(bo);
+}
+
+static void
+scratch_buf_init(drm_intel_bufmgr *bufmgr,
+		 struct igt_buf *buf,
+		 int width, int height,
+		 uint32_t color)
+{
+	size_t stride = width * 4;
+	size_t size = stride * height;
+	drm_intel_bo *bo = drm_intel_bo_alloc(bufmgr, "", size, 4096);
+
+	scratch_buf_memset(bo, width, height, color);
+
+	buf->bo = bo;
+	buf->stride = stride;
+	buf->tiling = I915_TILING_NONE;
+	buf->size = size;
+}
+
+static void
+emit_report_perf_count(struct intel_batchbuffer *batch,
+		       drm_intel_bo *dst_bo,
+		       int dst_offset,
+		       uint32_t report_id)
+{
+	if (IS_HASWELL(devid)) {
+		BEGIN_BATCH(3, 1);
+		OUT_BATCH(GEN6_MI_REPORT_PERF_COUNT);
+		OUT_RELOC(dst_bo, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
+			  dst_offset);
+		OUT_BATCH(report_id);
+		ADVANCE_BATCH();
+	} else {
+		/* XXX: NB: n dwords arg is actually magic since it internally
+		 * automatically accounts for larger addresses on gen >= 8...
+		 */
+		BEGIN_BATCH(3, 1);
+		OUT_BATCH(GEN8_MI_REPORT_PERF_COUNT);
+		OUT_RELOC(dst_bo, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
+			  dst_offset);
+		OUT_BATCH(report_id);
+		ADVANCE_BATCH();
+	}
+}
+
+static uint32_t
+i915_get_one_gpu_timestamp(uint32_t *context_id)
+{
+	drm_intel_bufmgr *bufmgr = drm_intel_bufmgr_gem_init(drm_fd, 4096);
+	drm_intel_context *mi_rpc_ctx = drm_intel_gem_context_create(bufmgr);
+	drm_intel_bo *mi_rpc_bo = drm_intel_bo_alloc(bufmgr, "mi_rpc dest bo", 4096, 64);
+	struct intel_batchbuffer *mi_rpc_batch = intel_batchbuffer_alloc(bufmgr, devid);
+	int ret;
+	uint32_t timestamp;
+
+	drm_intel_bufmgr_gem_enable_reuse(bufmgr);
+
+	if (context_id) {
+		ret = drm_intel_gem_context_get_id(mi_rpc_ctx, context_id);
+		igt_assert_eq(ret, 0);
+	}
+
+	igt_assert(mi_rpc_ctx);
+	igt_assert(mi_rpc_bo);
+	igt_assert(mi_rpc_batch);
+
+	ret = drm_intel_bo_map(mi_rpc_bo, true);
+	igt_assert_eq(ret, 0);
+	memset(mi_rpc_bo->virtual, 0x80, 4096);
+	drm_intel_bo_unmap(mi_rpc_bo);
+
+	emit_report_perf_count(mi_rpc_batch,
+			       mi_rpc_bo, /* dst */
+			       0, /* dst offset in bytes */
+			       0xdeadbeef); /* report ID */
+
+	intel_batchbuffer_flush_with_context(mi_rpc_batch, mi_rpc_ctx);
+
+	ret = drm_intel_bo_map(mi_rpc_bo, false /* write enable */);
+	igt_assert_eq(ret, 0);
+	timestamp = ((uint32_t *)mi_rpc_bo->virtual)[1];
+	drm_intel_bo_unmap(mi_rpc_bo);
+
+	drm_intel_bo_unreference(mi_rpc_bo);
+	intel_batchbuffer_free(mi_rpc_batch);
+	drm_intel_gem_context_destroy(mi_rpc_ctx);
+	drm_intel_bufmgr_destroy(bufmgr);
+
+	return timestamp;
+}
 
 static void
 hsw_sanity_check_render_basic_reports(uint32_t *oa_report0, uint32_t *oa_report1,
@@ -1050,7 +1206,6 @@ i915_read_reports_until_timestamp(enum drm_i915_oa_format oa_format,
 	return total_len;
 }
 
-
 /* CAP_SYS_ADMIN is required to open system wide metrics, unless the system
  * control parameter dev.i915.perf_stream_paranoid == 0 */
 static void
@@ -1365,20 +1520,6 @@ open_and_read_2_oa_reports(int format_id,
 	__perf_close(stream_fd);
 }
 
-static void
-gen8_read_report_clock_ratios(uint32_t *report,
-			      uint32_t *slice_freq_mhz,
-			      uint32_t *unslice_freq_mhz)
-{
-	uint32_t unslice_freq = report[0] & 0x1ff;
-	uint32_t slice_freq_low = (report[0] >> 25) & 0x7f;
-	uint32_t slice_freq_high = (report[0] >> 9) & 0x3;
-	uint32_t slice_freq = slice_freq_low | (slice_freq_high << 7);
-
-	*slice_freq_mhz = (slice_freq * 16666) / 1000;
-	*unslice_freq_mhz = (unslice_freq * 16666) / 1000;
-}
-
 static void
 print_reports(uint32_t *oa_report0, uint32_t *oa_report1, int fmt)
 {
@@ -1570,74 +1711,456 @@ test_oa_formats(void)
 	}
 }
 
+
+enum load {
+	LOW,
+	HIGH
+};
+
+#define LOAD_HELPER_PAUSE_USEC 500
+
+static struct load_helper {
+	int devid;
+	int has_ppgtt;
+	drm_intel_bufmgr *bufmgr;
+	drm_intel_context *context;
+	uint32_t context_id;
+	struct intel_batchbuffer *batch;
+	enum load load;
+	bool exit;
+	struct igt_helper_process igt_proc;
+	struct igt_buf src, dst;
+} lh = { 0, };
+
+static void load_helper_signal_handler(int sig)
+{
+	if (sig == SIGUSR2)
+		lh.load = lh.load == LOW ? HIGH : LOW;
+	else
+		lh.exit = true;
+}
+
+static void load_helper_set_load(enum load load)
+{
+	igt_assert(lh.igt_proc.running);
+
+	if (lh.load == load)
+		return;
+
+	lh.load = load;
+	kill(lh.igt_proc.pid, SIGUSR2);
+}
+
+static void load_helper_run(enum load load)
+{
+	/*
+	 * FIXME fork helpers won't get cleaned up when started from within a
+	 * subtest, so handle the case where it sticks around a bit too long.
+	 */
+	if (lh.igt_proc.running) {
+		load_helper_set_load(load);
+		return;
+	}
+
+	lh.load = load;
+
+	igt_fork_helper(&lh.igt_proc) {
+		signal(SIGUSR1, load_helper_signal_handler);
+		signal(SIGUSR2, load_helper_signal_handler);
+
+		while (!lh.exit) {
+			int ret;
+
+			render_copy(lh.batch,
+				    lh.context,
+				    &lh.src, 0, 0, 1920, 1080,
+				    &lh.dst, 0, 0);
+
+			intel_batchbuffer_flush_with_context(lh.batch,
+							     lh.context);
+
+			ret = drm_intel_gem_context_get_id(lh.context,
+							   &lh.context_id);
+			igt_assert_eq(ret, 0);
+
+			drm_intel_bo_wait_rendering(lh.dst.bo);
+
+			/* Lower the load by pausing after every submitted
+			 * write. */
+			if (lh.load == LOW)
+				usleep(LOAD_HELPER_PAUSE_USEC);
+		}
+	}
+}
+
+static void load_helper_stop(void)
+{
+	kill(lh.igt_proc.pid, SIGUSR1);
+	igt_assert(igt_wait_helper(&lh.igt_proc) == 0);
+}
+
+static void load_helper_init(void)
+{
+	int ret;
+
+	lh.devid = intel_get_drm_devid(drm_fd);
+	lh.has_ppgtt = gem_uses_ppgtt(drm_fd);
+
+	/* MI_STORE_DATA can only use GTT address on gen4+/g33 and needs
+	 * snoopable mem on pre-gen6. Hence load-helper only works on gen6+, but
+	 * that's also all we care about for the rps testcase*/
+	igt_assert(intel_gen(lh.devid) >= 6);
+	lh.bufmgr = drm_intel_bufmgr_gem_init(drm_fd, 4096);
+	igt_assert(lh.bufmgr);
+
+	drm_intel_bufmgr_gem_enable_reuse(lh.bufmgr);
+
+	lh.context = drm_intel_gem_context_create(lh.bufmgr);
+	igt_assert(lh.context);
+
+	lh.context_id = 0xffffffff;
+	ret = drm_intel_gem_context_get_id(lh.context, &lh.context_id);
+	igt_assert_eq(ret, 0);
+	igt_assert_neq(lh.context_id, 0xffffffff);
+
+	lh.batch = intel_batchbuffer_alloc(lh.bufmgr, lh.devid);
+	igt_assert(lh.batch);
+
+	scratch_buf_init(lh.bufmgr, &lh.dst, 1920, 1080, 0);
+	scratch_buf_init(lh.bufmgr, &lh.src, 1920, 1080, 0);
+}
+
+static void load_helper_fini(void)
+{
+	if (lh.igt_proc.running)
+		load_helper_stop();
+
+	if (lh.src.bo)
+		drm_intel_bo_unreference(lh.src.bo);
+	if (lh.dst.bo)
+		drm_intel_bo_unreference(lh.dst.bo);
+
+	if (lh.batch)
+		intel_batchbuffer_free(lh.batch);
+
+	if (lh.context)
+		drm_intel_gem_context_destroy(lh.context);
+
+	if (lh.bufmgr)
+		drm_intel_bufmgr_destroy(lh.bufmgr);
+}
+
 static void
 test_oa_exponents(void)
 {
-	igt_debug("Testing OA timer exponents\n");
+	load_helper_init();
+	load_helper_run(HIGH);
 
 	/* It's asking a lot to sample with a 160 nanosecond period and the
 	 * test can fail due to buffer overflows if it wasn't possible to
 	 * keep up, so we don't start from an exponent of zero...
 	 */
-	for (int i = 5; i < 20; i++) {
-		uint32_t expected_timestamp_delta;
-		uint32_t timestamp_delta;
-		uint32_t oa_report0[64];
-		uint32_t oa_report1[64];
+	for (int exponent = 5; exponent < 18; exponent++) {
+		uint64_t expected_timestamp_delta;
 		uint32_t time_delta;
-		uint32_t clock_delta;
-		uint32_t freq;
 		int n_tested = 0;
+		int n_time_delta_matches = 0;
 
 		/* The exponent is effectively selecting a bit in the timestamp
 		 * to trigger reports on and so in practice we expect the raw
 		 * timestamp deltas for periodic reports to exactly match the
 		 * value of next bit.
 		 */
-		expected_timestamp_delta = 2 << i;
+		expected_timestamp_delta = 2UL << exponent;
 
 		for (int j = 0; n_tested < 10 && j < 100; j++) {
-			uint32_t ticks0, ticks1;
-
-			igt_debug("ITER %d: testing OA exponent %d (period = %"PRIu64"ns)\n",
-				  j, i, oa_exponent_to_ns(i));
-
-			open_and_read_2_oa_reports(test_oa_format,
-						   i, /* exponent */
-						   oa_report0,
-						   oa_report1,
-						   true); /* timer triggered
-							     reports only */
-
-			timestamp_delta = oa_report1[1] - oa_report0[1];
-			igt_assert_neq(timestamp_delta, 0);
-
-			if (timestamp_delta != expected_timestamp_delta) {
-				igt_debug("timestamp0 = %u/0x%x\n",
-					  oa_report0[1], oa_report0[1]);
-				igt_debug("timestamp1 = %u/0x%x\n",
-					  oa_report1[1], oa_report1[1]);
+			uint64_t properties[] = {
+				/* Include OA reports in samples */
+				DRM_I915_PERF_PROP_SAMPLE_OA, true,
+
+				/* OA unit configuration */
+				DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
+				DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
+				DRM_I915_PERF_PROP_OA_EXPONENT, exponent,
+			};
+			struct drm_i915_perf_open_param param = {
+				.flags = I915_PERF_FLAG_FD_CLOEXEC,
+				.num_properties = ARRAY_SIZE(properties) / 2,
+				.properties_ptr = to_user_pointer(properties),
+			};
+			int ret;
+			uint64_t average_timestamp_delta;
+			uint32_t n_reports = 0;
+			uint32_t n_idle_reports = 0;
+			uint32_t n_reads = 0;
+			uint32_t context_id;
+			uint64_t first_timestamp = 0;
+			bool check_first_timestamp = true;
+			struct drm_i915_perf_record_header *header;
+			uint64_t delta_delta;
+			struct {
+				uint32_t report[64];
+			} reports[30];
+			struct {
+				uint8_t *buf;
+				size_t len;
+			} reads[1000];
+			double error;
+			double tick_per_period;
+
+			igt_debug("ITER %d: testing OA exponent %d,"
+				  " expected ts delta = %"PRIu64" (%"PRIu64"ns/%.2fus/%.2fms)\n",
+				  j, exponent,
+				  expected_timestamp_delta,
+				  oa_exponent_to_ns(exponent),
+				  oa_exponent_to_ns(exponent) / 1000.0,
+				  oa_exponent_to_ns(exponent) / (1000.0 * 1000.0));
+
+			stream_fd = __perf_open(drm_fd, &param);
+
+			/* Right after opening the OA stream, read a
+			 * first timestamp as way to filter previously
+			 * scheduled work that would have configured
+			 * the OA unit at a different period. */
+			first_timestamp = i915_get_one_gpu_timestamp(&context_id);
+
+			while (n_reads < ARRAY_SIZE(reads) &&
+			       n_reports < ARRAY_SIZE(reports)) {
+				const size_t buf_size = 1024 * 1024;
+				uint8_t *buf = reads[n_reads++].buf = calloc(1, buf_size);
+
+				while ((ret = read(stream_fd, buf, buf_size)) < 0 &&
+				       errno == EINTR)
+					;
+
+				/* We should never have no data. */
+				igt_assert(ret > 0);
+				reads[n_reads - 1].len = ret;
+
+				igt_debug(" > read %i bytes\n", ret);
+
+				for (int offset = 0;
+				     offset < ret && n_reports < ARRAY_SIZE(reports);
+				     offset += header->size) {
+					uint32_t *report;
+					double previous_tick_per_period;
+
+					header = (void *)(buf + offset);
+
+					if (header->type == DRM_I915_PERF_RECORD_OA_BUFFER_LOST) {
+						igt_assert(!"reached");
+						break;
+					}
+
+					if (header->type == DRM_I915_PERF_RECORD_OA_REPORT_LOST) {
+						n_reports = 0;
+						n_idle_reports = 0;
+						for (int r = 0; r < n_reads; r++)
+							free(reads[r].buf);
+						n_reads = 0;
+						break;
+					}
+
+					if (header->type != DRM_I915_PERF_RECORD_SAMPLE)
+						continue;
+
+					report = (void *)(header + 1);
+
+					/* Skip anything before the first
+					 * timestamp, it might not be at the
+					 * right periodic exponent. */
+					if (check_first_timestamp &&
+					    report[1] < first_timestamp) {
+						igt_debug(" > Dropping ts=%u (prior %"PRIu64")\n",
+							  report[1], first_timestamp);
+						continue;
+					}
+
+					/* Once we've passed the first
+					 * timestamp, no need to check. */
+					check_first_timestamp = false;
+
+					if (!oa_report_ctx_is_valid(report))
+						n_idle_reports++;
+
+					/* We only measure timestamps between
+					 * periodic reports. */
+					if (!oa_report_is_periodic(exponent, report))
+						continue;
+
+					igt_debug(" > write %i timestamp=%u\n", n_reports, report[1]);
+					memcpy(reports[n_reports].report, report,
+					       sizeof(reports[n_reports].report));
+					n_reports++;
+
+					previous_tick_per_period = tick_per_period;
+
+					if (n_reports > 1) {
+						tick_per_period =
+							oa_reports_tick_per_period(reports[n_reports - 2].report,
+										   reports[n_reports - 1].report);
+
+						/* Dismiss the series of report
+						 * if we notice clock frequency
+						 * changes. */
+						if (!double_value_within(tick_per_period,
+									 previous_tick_per_period, 5)) {
+								igt_debug("Noticed clock frequency change at ts=%u (%f / %f), "
+									  "dropping reports and trying again\n",
+									  report[1], previous_tick_per_period, tick_per_period);
+								n_reports = 0;
+								n_idle_reports = 0;
+								for (int r = 0; r < n_reads; r++)
+									free(reads[r].buf);
+								n_reads = 0;
+								break;
+						}
+					}
+				}
 			}
 
-			igt_assert_eq(timestamp_delta, expected_timestamp_delta);
+			__perf_close(stream_fd);
+			igt_debug("closed stream\n");
 
-			ticks0 = read_report_ticks(oa_report0, test_oa_format);
-			ticks1 = read_report_ticks(oa_report1, test_oa_format);
-			clock_delta = ticks1 - ticks0;
+			igt_assert_eq(n_reports, ARRAY_SIZE(reports));
 
-			time_delta = timebase_scale(timestamp_delta);
+			average_timestamp_delta = 0;
+			for (int i = 0; i < (n_reports - 1); i++) {
+				/* XXX: calculating with u32 arithmetic to account for overflow */
+				uint32_t u32_delta = reports[i + 1].report[1] - reports[i].report[1];
 
-			freq = ((uint64_t)clock_delta * 1000) / time_delta;
-			igt_debug("ITER %d: time delta = %"PRIu32"(ns) clock delta = %"PRIu32" freq = %"PRIu32"(mhz)\n",
-				  j, time_delta, clock_delta, freq);
+				average_timestamp_delta += u32_delta;
+			}
+			average_timestamp_delta /= (n_reports - 1);
+
+			if (average_timestamp_delta > expected_timestamp_delta)
+				delta_delta  = average_timestamp_delta - expected_timestamp_delta;
+			else
+				delta_delta = expected_timestamp_delta - average_timestamp_delta;
+			error = (delta_delta / (double)expected_timestamp_delta) * 100.0;
+
+			time_delta = timebase_scale(average_timestamp_delta);
+
+			igt_debug(" > Avg. time delta = %"PRIu32"(ns),"
+				  " n idle reports = %u, n reads = %u, error=%f\n",
+				  time_delta, n_idle_reports, n_reads, error);
+			if (error > 5) {
+				uint32_t *rpt = NULL, *last = NULL, *last_periodic = NULL;
+
+				igt_debug(" > More than 5%% error: avg_ts_delta = %"PRIu64", delta_delta = %"PRIu64", "
+					  "expected_delta = %"PRIu64", first_timestamp = %"PRIu64" ctx_id=%"PRIu32"\n",
+					  average_timestamp_delta, delta_delta, expected_timestamp_delta, first_timestamp, context_id);
+				for (int i = 0; i < (n_reports - 1); i++) {
+					/* XXX: calculating with u32 arithmetic to account for overflow */
+					uint32_t u32_delta =
+						reports[i + 1].report[1] - reports[i].report[1];
+
+					if (u32_delta > expected_timestamp_delta)
+						delta_delta  = u32_delta - expected_timestamp_delta;
+					else
+						delta_delta = expected_timestamp_delta - u32_delta;
+					error = (delta_delta / (double)expected_timestamp_delta) * 100.0;
+
+					igt_debug(" > ts=%u-%u timestamp delta from %2d to %2d = %-8u (error = %u%%, ctx_id = %x)\n",
+						  reports[i + 1].report[1], reports[i].report[1],
+						  i, i + 1, u32_delta, (unsigned)error,
+						  oa_report_get_ctx_id(reports[i + 1].report));
+				}
+				for (int r = 0; r < n_reads; r++) {
+					igt_debug(" > read\n");
+					for (int offset = 0;
+					     offset < reads[r].len;
+					     offset += header->size) {
+						/* Just a random counter,
+						 * helpful to notice
+						 * inconsistency in debug.
+						 */
+						int counter_print = 13;
+						uint64_t a0 = 0, aN = 0;
+						double local_period = 0;
+
+						header = (void *) &reads[r].buf[offset];
+
+						if (header->type != DRM_I915_PERF_RECORD_SAMPLE) {
+							igt_debug(" > loss\n");
+							continue;
+						}
+
+						rpt = (void *)(header + 1);
+
+						if (last) {
+							a0 = gen8_read_40bit_a_counter(rpt, test_oa_format, 0) -
+								gen8_read_40bit_a_counter(last, test_oa_format, 0);
+							aN = gen8_read_40bit_a_counter(rpt, test_oa_format, counter_print) -
+								gen8_read_40bit_a_counter(last, test_oa_format, counter_print);
+						}
+
+						if (last_periodic &&
+						    intel_gen(devid) >= 8 &&
+						    oa_report_is_periodic(exponent, rpt)) {
+							local_period =
+								((uint64_t) rpt[3] - last_periodic[3])  /
+								((uint64_t) rpt[1] - last_periodic[1]);
+						}
+
+						igt_debug(" > report ts=%u"
+							  " ts_delta_last=%8u ts_delta_last_periodic=%8u is_timer=%i ctx_id=%8x gpu_ticks=%u period=%.2f A0=%lu A%i=%lu\n",
+							  rpt[1],
+							  (last != NULL) ? (rpt[1] - last[1]) : 0,
+							  (last_periodic != NULL) ? (rpt[1] - last_periodic[1]) : 0,
+							  oa_report_is_periodic(exponent, rpt),
+							  oa_report_get_ctx_id(rpt),
+							  (last != NULL) ? (rpt[3] - last[3]) : 0,
+							  local_period,
+							  a0, counter_print, aN);
+
+						last = rpt;
+						if (oa_report_is_periodic(exponent, rpt))
+							last_periodic = rpt;
+					}
+				}
+
+				igt_assert(!"reached");
+			}
+
+			if (timestamp_delta_within(average_timestamp_delta,
+						   expected_timestamp_delta,
+						   expected_timestamp_delta * 0.05)) {
+				igt_debug(" > timestamp delta matching %"PRIu64"ns ~= expected %"PRIu64"! :)\n",
+					  timebase_scale(average_timestamp_delta),
+					  timebase_scale(expected_timestamp_delta));
+				n_time_delta_matches++;
+			} else {
+				igt_debug(" > timestamp delta mismatch: %"PRIu64"ns != expected %"PRIu64"ns\n",
+					  timebase_scale(average_timestamp_delta),
+					  timebase_scale(expected_timestamp_delta));
+				igt_assert(average_timestamp_delta <
+					   (expected_timestamp_delta * 2));
+			}
 
 			n_tested++;
+
+			for (int r = 0; r < n_reads; r++)
+				free(reads[r].buf);
 		}
 
 		if (n_tested < 10)
-			igt_debug("sysfs frequency pinning too unstable for cross-referencing with OA derived frequency");
+			igt_debug("Too many test iterations had to be skipped\n");
 		igt_assert_eq(n_tested, 10);
+
+		igt_debug("Number of iterations with expected timestamp delta = %d\n",
+			  n_time_delta_matches);
+
+		/* The HW doesn't give us any strict guarantee that the
+		 * timestamps are exactly aligned with the exponent mask but
+		 * in practice it seems very rare for that not to be the case
+		 * so it a useful sanity check to assert quite strictly...
+		 */
+		igt_assert(n_time_delta_matches >= 9);
 	}
+
+	load_helper_stop();
+	load_helper_fini();
 }
 
 /* The OA exponent selects a timestamp counter bit to trigger reports on.
@@ -2503,32 +3026,6 @@ test_disabled_read_error(void)
 	__perf_close(stream_fd);
 }
 
-static void
-emit_report_perf_count(struct intel_batchbuffer *batch,
-		       drm_intel_bo *dst_bo,
-		       int dst_offset,
-		       uint32_t report_id)
-{
-	if (IS_HASWELL(devid)) {
-		BEGIN_BATCH(3, 1);
-		OUT_BATCH(GEN6_MI_REPORT_PERF_COUNT);
-		OUT_RELOC(dst_bo, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
-			  dst_offset);
-		OUT_BATCH(report_id);
-		ADVANCE_BATCH();
-	} else {
-		/* XXX: NB: n dwords arg is actually magic since it internally
-		 * automatically accounts for larger addresses on gen >= 8...
-		 */
-		BEGIN_BATCH(3, 1);
-		OUT_BATCH(GEN8_MI_REPORT_PERF_COUNT);
-		OUT_RELOC(dst_bo, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
-			  dst_offset);
-		OUT_BATCH(report_id);
-		ADVANCE_BATCH();
-	}
-}
-
 static void
 test_mi_rpc(void)
 {
@@ -2598,38 +3095,6 @@ test_mi_rpc(void)
 	__perf_close(stream_fd);
 }
 
-static void
-scratch_buf_memset(drm_intel_bo *bo, int width, int height, uint32_t color)
-{
-	int ret;
-
-	ret = drm_intel_bo_map(bo, true /* writable */);
-	igt_assert_eq(ret, 0);
-
-	for (int i = 0; i < width * height; i++)
-		((uint32_t *)bo->virtual)[i] = color;
-
-	drm_intel_bo_unmap(bo);
-}
-
-static void
-scratch_buf_init(drm_intel_bufmgr *bufmgr,
-		 struct igt_buf *buf,
-		 int width, int height,
-		 uint32_t color)
-{
-	size_t stride = width * 4;
-	size_t size = stride * height;
-	drm_intel_bo *bo = drm_intel_bo_alloc(bufmgr, "", size, 4096);
-
-	scratch_buf_memset(bo, width, height, color);
-
-	buf->bo = bo;
-	buf->stride = stride;
-	buf->tiling = I915_TILING_NONE;
-	buf->size = size;
-}
-
 static void
 emit_stall_timestamp_and_rpc(struct intel_batchbuffer *batch,
 			     drm_intel_bo *dst,
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH i-g-t v6 07/12] tests/perf: make enable-disable more reliable
  2017-10-04 11:19 [PATCH i-g-t v6 00/12] Improve robustness of the i915 perf tests Lionel Landwerlin
                   ` (5 preceding siblings ...)
  2017-10-04 11:19 ` [PATCH i-g-t v6 06/12] tests/perf: rework oa-exponent test Lionel Landwerlin
@ 2017-10-04 11:19 ` Lionel Landwerlin
  2017-10-04 11:19 ` [PATCH i-g-t v6 08/12] tests/perf: make buffer-fill " Lionel Landwerlin
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Lionel Landwerlin @ 2017-10-04 11:19 UTC (permalink / raw)
  To: intel-gfx

Estimation of the amount of reports can only refer to periodic ones,
as context switch reports completely depend on what happens on the
system. Also generate some load to prevent clock frequency changes to
impact our measurement.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 tests/perf.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 89 insertions(+), 6 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index 15a43cf8..61febf25 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -2796,10 +2796,18 @@ test_enable_disable(void)
 	int n_full_oa_reports = oa_buf_size / report_size;
 	uint64_t fill_duration = n_full_oa_reports * oa_period;
 
+	load_helper_init();
+	load_helper_run(HIGH);
+
 	stream_fd = __perf_open(drm_fd, &param);
 
 	for (int i = 0; i < 5; i++) {
 		int len;
+		uint32_t n_periodic_reports;
+		struct drm_i915_perf_record_header *header;
+		uint32_t first_timestamp = 0, last_timestamp = 0;
+		uint32_t last_periodic_report[64];
+		double tick_per_period;
 
 		/* Giving enough time for an overflow might help catch whether
 		 * the OA unit has been enabled even if the driver might at
@@ -2819,18 +2827,90 @@ test_enable_disable(void)
 
 		nanosleep(&(struct timespec){ .tv_sec = 0,
 					      .tv_nsec = fill_duration / 2 },
-			  NULL);
+			NULL);
 
-		while ((len = read(stream_fd, buf, buf_size)) == -1 && errno == EINTR)
-			;
+		n_periodic_reports = 0;
 
-		igt_assert_neq(len, -1);
+		/* Because of the race condition between notification of new
+		 * reports and reports landing in memory, we need to rely on
+		 * timestamps to figure whether we've read enough of them.
+		 */
+		while (((last_timestamp - first_timestamp) * oa_period) < (fill_duration / 2)) {
 
-		igt_assert(len > report_size * n_full_oa_reports * 0.45);
-		igt_assert(len < report_size * n_full_oa_reports * 0.55);
+			while ((len = read(stream_fd, buf, buf_size)) == -1 && errno == EINTR)
+				;
+
+			igt_assert_neq(len, -1);
+
+			for (int offset = 0; offset < len; offset += header->size) {
+				uint32_t *report;
+				double previous_tick_per_period;
+
+				header = (void *) (buf + offset);
+				report = (void *) (header + 1);
+
+				switch (header->type) {
+				case DRM_I915_PERF_RECORD_OA_REPORT_LOST:
+					break;
+				case DRM_I915_PERF_RECORD_SAMPLE:
+					if (first_timestamp == 0)
+						first_timestamp = report[1];
+					last_timestamp = report[1];
+
+					previous_tick_per_period = tick_per_period;
+
+					if (n_periodic_reports > 0 &&
+					    oa_report_is_periodic(oa_exponent, report)) {
+						tick_per_period =
+							oa_reports_tick_per_period(last_periodic_report,
+										   report);
+
+						if (!double_value_within(tick_per_period,
+									 previous_tick_per_period, 5))
+							igt_debug("clock change!\n");
+
+						igt_debug(" > report ts=%u"
+							  " ts_delta_last_periodic=%8u is_timer=%i ctx_id=%8x gpu_ticks=%u nb_periodic=%u\n",
+							  report[1],
+							  report[1] - last_periodic_report[1],
+							  oa_report_is_periodic(oa_exponent, report),
+							  oa_report_get_ctx_id(report),
+							  report[3] - last_periodic_report[3],
+							  n_periodic_reports);
+
+						memcpy(last_periodic_report, report,
+						       sizeof(last_periodic_report));
+					}
+
+					/* We want to measure only the periodic
+					 * reports, ctx-switch might inflate the
+					 * content of the buffer and skew or
+					 * measurement.
+					 */
+					n_periodic_reports +=
+						oa_report_is_periodic(oa_exponent, report);
+					break;
+				case DRM_I915_PERF_RECORD_OA_BUFFER_LOST:
+					igt_assert(!"unexpected overflow");
+					break;
+				}
+			}
+
+		}
 
 		do_ioctl(stream_fd, I915_PERF_IOCTL_DISABLE, 0);
 
+		igt_debug("%f < %lu < %f\n",
+			  report_size * n_full_oa_reports * 0.45,
+			  n_periodic_reports * report_size,
+			  report_size * n_full_oa_reports * 0.55);
+
+		igt_assert((n_periodic_reports * report_size) >
+			   (report_size * n_full_oa_reports * 0.45));
+		igt_assert((n_periodic_reports * report_size) <
+			   report_size * n_full_oa_reports * 0.55);
+
+
 		/* It's considered an error to read a stream while it's disabled
 		 * since it would block indefinitely...
 		 */
@@ -2843,6 +2923,9 @@ test_enable_disable(void)
 	free(buf);
 
 	__perf_close(stream_fd);
+
+	load_helper_stop();
+	load_helper_fini();
 }
 
 static void
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH i-g-t v6 08/12] tests/perf: make buffer-fill more reliable
  2017-10-04 11:19 [PATCH i-g-t v6 00/12] Improve robustness of the i915 perf tests Lionel Landwerlin
                   ` (6 preceding siblings ...)
  2017-10-04 11:19 ` [PATCH i-g-t v6 07/12] tests/perf: make enable-disable more reliable Lionel Landwerlin
@ 2017-10-04 11:19 ` Lionel Landwerlin
  2017-10-04 11:19 ` [PATCH i-g-t v6 09/12] tests/perf: estimate number of blocking/polling based on time spent Lionel Landwerlin
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Lionel Landwerlin @ 2017-10-04 11:19 UTC (permalink / raw)
  To: intel-gfx

Filling rate of the buffer must discard context switch reports as they
do not depend upon the periodicity, instead they're a factor on the
amount of different applications concurrently running on the system.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 tests/perf.c | 120 ++++++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 103 insertions(+), 17 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index 61febf25..959a7347 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -2704,22 +2704,30 @@ test_buffer_fill(void)
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
+	struct drm_i915_perf_record_header *header;
 	int buf_size = 65536 * (256 + sizeof(struct drm_i915_perf_record_header));
 	uint8_t *buf = malloc(buf_size);
+	int len;
 	size_t oa_buf_size = 16 * 1024 * 1024;
 	size_t report_size = oa_formats[test_oa_format].size;
 	int n_full_oa_reports = oa_buf_size / report_size;
 	uint64_t fill_duration = n_full_oa_reports * oa_period;
 
+	load_helper_init();
+	load_helper_run(HIGH);
+
 	igt_assert(fill_duration < 1000000000);
 
 	stream_fd = __perf_open(drm_fd, &param);
 
 	for (int i = 0; i < 5; i++) {
-		struct drm_i915_perf_record_header *header;
 		bool overflow_seen;
-		int offset = 0;
-		int len;
+		uint32_t n_periodic_reports;
+		uint32_t first_timestamp = 0, last_timestamp = 0;
+		uint32_t last_periodic_report[64];
+		double tick_per_period;
+
+		do_ioctl(stream_fd, I915_PERF_IOCTL_ENABLE, 0);
 
 		nanosleep(&(struct timespec){ .tv_sec = 0,
 					      .tv_nsec = fill_duration * 1.25 },
@@ -2731,7 +2739,7 @@ test_buffer_fill(void)
 		igt_assert_neq(len, -1);
 
 		overflow_seen = false;
-		for (offset = 0; offset < len; offset += header->size) {
+		for (int offset = 0; offset < len; offset += header->size) {
 			header = (void *)(buf + offset);
 
 			if (header->type == DRM_I915_PERF_RECORD_OA_BUFFER_LOST)
@@ -2740,32 +2748,110 @@ test_buffer_fill(void)
 
 		igt_assert_eq(overflow_seen, true);
 
+		do_ioctl(stream_fd, I915_PERF_IOCTL_DISABLE, 0);
+
+		igt_debug("fill_duration = %luns, oa_exponent = %u\n",
+			  fill_duration, oa_exponent);
+
+		do_ioctl(stream_fd, I915_PERF_IOCTL_ENABLE, 0);
+
 		nanosleep(&(struct timespec){ .tv_sec = 0,
-					      .tv_nsec = fill_duration / 2 },
-			  NULL);
+					.tv_nsec = fill_duration / 2 },
+			NULL);
 
-		while ((len = read(stream_fd, buf, buf_size)) == -1 && errno == EINTR)
-			;
+		n_periodic_reports = 0;
 
-		igt_assert_neq(len, -1);
+		/* Because of the race condition between notification of new
+		 * reports and reports landing in memory, we need to rely on
+		 * timestamps to figure whether we've read enough of them.
+		 */
+		while (((last_timestamp - first_timestamp) * oa_period) < (fill_duration / 2)) {
 
-		igt_assert(len > report_size * n_full_oa_reports * 0.45);
-		igt_assert(len < report_size * n_full_oa_reports * 0.55);
+			igt_debug("dts=%u elapsed=%lu duration=%lu\n",
+				  last_timestamp - first_timestamp,
+				  (last_timestamp - first_timestamp) * oa_period,
+				  fill_duration / 2);
 
-		overflow_seen = false;
-		for (offset = 0; offset < len; offset += header->size) {
-			header = (void *)(buf + offset);
+			while ((len = read(stream_fd, buf, buf_size)) == -1 && errno == EINTR)
+				;
 
-			if (header->type == DRM_I915_PERF_RECORD_OA_BUFFER_LOST)
-				overflow_seen = true;
+			igt_assert_neq(len, -1);
+
+			for (int offset = 0; offset < len; offset += header->size) {
+				uint32_t *report;
+				double previous_tick_per_period;
+
+				header = (void *) (buf + offset);
+				report = (void *) (header + 1);
+
+				switch (header->type) {
+				case DRM_I915_PERF_RECORD_OA_REPORT_LOST:
+					igt_debug("report loss, trying again\n");
+					break;
+				case DRM_I915_PERF_RECORD_SAMPLE:
+					igt_debug(" > report ts=%u"
+						  " ts_delta_last_periodic=%8u is_timer=%i ctx_id=%8x gpu_ticks=%u nb_periodic=%u\n",
+						  report[1],
+						  n_periodic_reports > 0 ? report[1] - last_periodic_report[1] : 0,
+						  oa_report_is_periodic(oa_exponent, report),
+						  oa_report_get_ctx_id(report),
+						  n_periodic_reports > 0 ? report[3] - last_periodic_report[3] : 0,
+						  n_periodic_reports);
+
+					if (first_timestamp == 0)
+						first_timestamp = report[1];
+					last_timestamp = report[1];
+
+					previous_tick_per_period = tick_per_period;
+
+					if (n_periodic_reports > 1 &&
+					    oa_report_is_periodic(oa_exponent, report)) {
+						tick_per_period =
+							oa_reports_tick_per_period(last_periodic_report,
+										   report);
+
+						if (!double_value_within(previous_tick_per_period,
+									 tick_per_period, 5))
+							igt_debug("clock change!\n");
+
+						memcpy(last_periodic_report, report,
+						       sizeof(last_periodic_report));
+					}
+
+					/* We want to measure only the periodic
+					 * reports, ctx-switch might inflate the
+					 * content of the buffer and skew or
+					 * measurement.
+					 */
+					n_periodic_reports +=
+						oa_report_is_periodic(oa_exponent, report);
+					break;
+				case DRM_I915_PERF_RECORD_OA_BUFFER_LOST:
+					igt_assert(!"unexpected overflow");
+					break;
+				}
+			}
 		}
 
-		igt_assert_eq(overflow_seen, false);
+		do_ioctl(stream_fd, I915_PERF_IOCTL_DISABLE, 0);
+
+		igt_debug("%f < %lu < %f\n",
+			  report_size * n_full_oa_reports * 0.45,
+			  n_periodic_reports * report_size,
+			  report_size * n_full_oa_reports * 0.55);
+
+		igt_assert(n_periodic_reports * report_size >
+			   report_size * n_full_oa_reports * 0.45);
+		igt_assert(n_periodic_reports * report_size <
+			   report_size * n_full_oa_reports * 0.55);
 	}
 
 	free(buf);
 
 	__perf_close(stream_fd);
+
+	load_helper_stop();
+	load_helper_fini();
 }
 
 static void
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH i-g-t v6 09/12] tests/perf: estimate number of blocking/polling based on time spent
  2017-10-04 11:19 [PATCH i-g-t v6 00/12] Improve robustness of the i915 perf tests Lionel Landwerlin
                   ` (7 preceding siblings ...)
  2017-10-04 11:19 ` [PATCH i-g-t v6 08/12] tests/perf: make buffer-fill " Lionel Landwerlin
@ 2017-10-04 11:19 ` Lionel Landwerlin
  2017-10-04 11:19 ` [PATCH i-g-t v6 10/12] tests/perf: prevent power management to kick in when necessary Lionel Landwerlin
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Lionel Landwerlin @ 2017-10-04 11:19 UTC (permalink / raw)
  To: intel-gfx

Blocking & polling tests define an amount of time to spend in the test
and then estimate the number of syscalls that should successfully
return. The problem is that while running the test we might spend
slightly more time than initiallly planned. This change estimates the
number of syscalls based on time spent after the fact.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 tests/perf.c | 42 +++++++++++++++++++++++++++++++++---------
 1 file changed, 33 insertions(+), 9 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index 959a7347..73ac62e0 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -2372,7 +2372,8 @@ test_blocking(void)
 		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exponent,
 	};
 	struct drm_i915_perf_open_param param = {
-		.flags = I915_PERF_FLAG_FD_CLOEXEC,
+		.flags = I915_PERF_FLAG_FD_CLOEXEC |
+			I915_PERF_FLAG_DISABLED,
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
@@ -2397,16 +2398,17 @@ test_blocking(void)
 	 */
 	int min_iterations = (test_duration_ns / (oa_period + 6000000ull));
 
-	int64_t start;
+	int64_t start, end;
 	int n = 0;
 
 	stream_fd = __perf_open(drm_fd, &param);
 
 	times(&start_times);
 
-	igt_debug("tick length = %dns, test duration = %"PRIu64"ns, min iter. = %d, max iter. = %d\n",
+	igt_debug("tick length = %dns, test duration = %"PRIu64"ns, min iter. = %d,"
+		  " estimated max iter. = %d, oa_period = %"PRIu64"ns\n",
 		  (int)tick_ns, test_duration_ns,
-		  min_iterations, max_iterations);
+		  min_iterations, max_iterations, oa_period);
 
 	/* In the loop we perform blocking polls while the HW is sampling at
 	 * ~25Hz, with the expectation that we spend most of our time blocked
@@ -2425,8 +2427,13 @@ test_blocking(void)
 	 * floor(real_stime)).
 	 *
 	 * We Loop for 1000 x tick_ns so one tick corresponds to 0.1%
+	 *
+	 * Also enable the stream just before poll/read to minimize
+	 * the error delta.
 	 */
-	for (start = get_time(); (get_time() - start) < test_duration_ns; /* nop */) {
+	start = get_time();
+	do_ioctl(stream_fd, I915_PERF_IOCTL_ENABLE, 0);
+	for (/* nop */; ((end = get_time()) - start) < test_duration_ns; /* nop */) {
 		struct drm_i915_perf_record_header *header;
 		bool timer_report_read = false;
 		bool non_timer_report_read = false;
@@ -2468,6 +2475,12 @@ test_blocking(void)
 		n++;
 	}
 
+	/* Updated the maximum of iterations based on the time spent
+	 * in the loop.
+	 */
+	max_iterations = (end - start) / oa_period + 1;
+	igt_debug("adjusted max iter. = %d\n", max_iterations);
+
 	times(&end_times);
 
 	/* Using nanosecond units is fairly silly here, given the tick in-
@@ -2524,6 +2537,7 @@ test_polling(void)
 	};
 	struct drm_i915_perf_open_param param = {
 		.flags = I915_PERF_FLAG_FD_CLOEXEC |
+			I915_PERF_FLAG_DISABLED |
 			I915_PERF_FLAG_FD_NONBLOCK,
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
@@ -2548,7 +2562,7 @@ test_polling(void)
 	 * to check for data and giving some time to read().
 	 */
 	int min_iterations = (test_duration_ns / (oa_period + 6000000ull));
-	int64_t start;
+	int64_t start, end;
 	int n = 0;
 
 	stream_fd = __perf_open(drm_fd, &param);
@@ -2576,8 +2590,13 @@ test_polling(void)
 	 * floor(real_stime)).
 	 *
 	 * We Loop for 1000 x tick_ns so one tick corresponds to 0.1%
+	 *
+	 * Also enable the stream just before poll/read to minimize
+	 * the error delta.
 	 */
-	for (start = get_time(); (get_time() - start) < test_duration_ns; /* nop */) {
+	start = get_time();
+	do_ioctl(stream_fd, I915_PERF_IOCTL_ENABLE, 0);
+	for (/* nop */; ((end = get_time()) - start) < test_duration_ns; /* nop */) {
 		struct pollfd pollfd = { .fd = stream_fd, .events = POLLIN };
 		struct drm_i915_perf_record_header *header;
 		bool timer_report_read = false;
@@ -2625,8 +2644,7 @@ test_polling(void)
 				if (header->type == DRM_I915_PERF_RECORD_SAMPLE) {
 					uint32_t *report = (void *)(header + 1);
 
-					if (oa_report_is_periodic(oa_exponent,
-								  report))
+					if (oa_report_is_periodic(oa_exponent, report))
 						timer_report_read = true;
 					else
 						non_timer_report_read = true;
@@ -2650,6 +2668,12 @@ test_polling(void)
 		n++;
 	}
 
+	/* Updated the maximum of iterations based on the time spent
+	 * in the loop.
+	 */
+	max_iterations = (end - start) / oa_period + 1;
+	igt_debug("adjusted max iter. = %d\n", max_iterations);
+
 	times(&end_times);
 
 	/* Using nanosecond units is fairly silly here, given the tick in-
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH i-g-t v6 10/12] tests/perf: prevent power management to kick in when necessary
  2017-10-04 11:19 [PATCH i-g-t v6 00/12] Improve robustness of the i915 perf tests Lionel Landwerlin
                   ` (8 preceding siblings ...)
  2017-10-04 11:19 ` [PATCH i-g-t v6 09/12] tests/perf: estimate number of blocking/polling based on time spent Lionel Landwerlin
@ 2017-10-04 11:19 ` Lionel Landwerlin
  2017-10-04 11:19 ` [PATCH i-g-t v6 11/12] tests/perf: add support for Coffeelake Lionel Landwerlin
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Lionel Landwerlin @ 2017-10-04 11:19 UTC (permalink / raw)
  To: intel-gfx

Some of our tests measure that the OA unit produces reports at
expected time intervals (as configured through the PERF_OPEN
ioctl). It turns out the power management plays a role in the decision
of the OA unit to write reports to memory. Under normal circumstances
we don't really mind if the unit misses one report here or there, but
for our tests it makes pretty difficult to verify whether we've made a
mistake in the configuration.

To work around this, let's prevent power management to kick in by
holding /dev/cpu_dma_latency opened for the following tests :

  - enable-disable
  - blocking
  - polling
  - buffer-fill
  - oa-exponents

Many thanks to Chris Wilson for suggesting this!

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 tests/perf.c | 64 ++++++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 41 insertions(+), 23 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index 73ac62e0..063182bc 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -288,6 +288,7 @@ static bool hsw_undefined_a_counters[45] = {
 static bool gen8_undefined_a_counters[45];
 
 static int drm_fd = -1;
+static int pm_fd = -1;
 static int stream_fd = -1;
 static uint32_t devid;
 static int card = -1;
@@ -331,21 +332,38 @@ __perf_close(int fd)
 {
 	close(fd);
 	stream_fd = -1;
+
+	if (pm_fd >= 0) {
+		close(pm_fd);
+		pm_fd = -1;
+	}
 }
 
 static int
-__perf_open(int fd, struct drm_i915_perf_open_param *param)
+__perf_open(int fd, struct drm_i915_perf_open_param *param, bool prevent_pm)
 {
 	int ret;
+	int32_t pm_value = 0;
 
 	if (stream_fd >= 0)
 		__perf_close(stream_fd);
+	if (pm_fd >= 0) {
+		close(pm_fd);
+		pm_fd = -1;
+	}
 
 	ret = igt_ioctl(fd, DRM_IOCTL_I915_PERF_OPEN, param);
 
 	igt_assert(ret >= 0);
 	errno = 0;
 
+	if (prevent_pm) {
+		pm_fd = open("/dev/cpu_dma_latency", O_RDWR);
+		igt_assert(pm_fd >= 0);
+
+		igt_assert_eq(write(pm_fd, &pm_value, sizeof(pm_value)), sizeof(pm_value));
+	}
+
 	return ret;
 }
 
@@ -1257,7 +1275,7 @@ test_system_wide_paranoid(void)
 
 		igt_drop_root();
 
-		stream_fd = __perf_open(drm_fd, &param);
+		stream_fd = __perf_open(drm_fd, &param, false);
 		__perf_close(stream_fd);
 	}
 
@@ -1314,7 +1332,7 @@ test_invalid_oa_metric_set_id(void)
 
 	/* Check that we aren't just seeing false positives... */
 	properties[ARRAY_SIZE(properties) - 1] = test_metric_set_id;
-	stream_fd = __perf_open(drm_fd, &param);
+	stream_fd = __perf_open(drm_fd, &param, false);
 	__perf_close(stream_fd);
 
 	/* There's no valid default OA metric set ID... */
@@ -1348,7 +1366,7 @@ test_invalid_oa_format_id(void)
 
 	/* Check that we aren't just seeing false positives... */
 	properties[ARRAY_SIZE(properties) - 1] = test_oa_format;
-	stream_fd = __perf_open(drm_fd, &param);
+	stream_fd = __perf_open(drm_fd, &param, false);
 	__perf_close(stream_fd);
 
 	/* There's no valid default OA format... */
@@ -1512,7 +1530,7 @@ open_and_read_2_oa_reports(int format_id,
 		.properties_ptr = to_user_pointer(properties),
 	};
 
-	stream_fd = __perf_open(drm_fd, &param);
+	stream_fd = __perf_open(drm_fd, &param, false);
 
 	read_2_oa_reports(format_id, exponent,
 			  oa_report0, oa_report1, timer_only);
@@ -1916,7 +1934,7 @@ test_oa_exponents(void)
 				  oa_exponent_to_ns(exponent) / 1000.0,
 				  oa_exponent_to_ns(exponent) / (1000.0 * 1000.0));
 
-			stream_fd = __perf_open(drm_fd, &param);
+			stream_fd = __perf_open(drm_fd, &param, true /* prevent_pm */);
 
 			/* Right after opening the OA stream, read a
 			 * first timestamp as way to filter previously
@@ -2192,7 +2210,7 @@ test_invalid_oa_exponent(void)
 		.properties_ptr = to_user_pointer(properties),
 	};
 
-	stream_fd = __perf_open(drm_fd, &param);
+	stream_fd = __perf_open(drm_fd, &param, false);
 
 	__perf_close(stream_fd);
 
@@ -2246,7 +2264,7 @@ test_low_oa_exponent_permissions(void)
 	igt_fork(child, 1) {
 		igt_drop_root();
 
-		stream_fd = __perf_open(drm_fd, &param);
+		stream_fd = __perf_open(drm_fd, &param, false);
 		__perf_close(stream_fd);
 	}
 
@@ -2312,7 +2330,7 @@ test_per_context_mode_unprivileged(void)
 
 		properties[1] = ctx_id;
 
-		stream_fd = __perf_open(drm_fd, &param);
+		stream_fd = __perf_open(drm_fd, &param, false);
 		__perf_close(stream_fd);
 
 		drm_intel_gem_context_destroy(context);
@@ -2401,7 +2419,7 @@ test_blocking(void)
 	int64_t start, end;
 	int n = 0;
 
-	stream_fd = __perf_open(drm_fd, &param);
+	stream_fd = __perf_open(drm_fd, &param, true /* prevent_pm */);
 
 	times(&start_times);
 
@@ -2565,7 +2583,7 @@ test_polling(void)
 	int64_t start, end;
 	int n = 0;
 
-	stream_fd = __perf_open(drm_fd, &param);
+	stream_fd = __perf_open(drm_fd, &param, true /* prevent_pm */);
 
 	times(&start_times);
 
@@ -2742,7 +2760,7 @@ test_buffer_fill(void)
 
 	igt_assert(fill_duration < 1000000000);
 
-	stream_fd = __perf_open(drm_fd, &param);
+	stream_fd = __perf_open(drm_fd, &param, true /* prevent_pm */);
 
 	for (int i = 0; i < 5; i++) {
 		bool overflow_seen;
@@ -2909,7 +2927,7 @@ test_enable_disable(void)
 	load_helper_init();
 	load_helper_run(HIGH);
 
-	stream_fd = __perf_open(drm_fd, &param);
+	stream_fd = __perf_open(drm_fd, &param, true /* prevent_pm */);
 
 	for (int i = 0; i < 5; i++) {
 		int len;
@@ -3073,7 +3091,7 @@ test_short_reads(void)
 	ret = mprotect(pages + page_size, page_size, PROT_NONE);
 	igt_assert_eq(ret, 0);
 
-	stream_fd = __perf_open(drm_fd, &param);
+	stream_fd = __perf_open(drm_fd, &param, false);
 
 	nanosleep(&(struct timespec){ .tv_sec = 0, .tv_nsec = 5000000 }, NULL);
 
@@ -3146,7 +3164,7 @@ test_non_sampling_read_error(void)
 	int ret;
 	uint8_t buf[1024];
 
-	stream_fd = __perf_open(drm_fd, &param);
+	stream_fd = __perf_open(drm_fd, &param, false);
 
 	ret = read(stream_fd, buf, sizeof(buf));
 	igt_assert_eq(ret, -1);
@@ -3184,7 +3202,7 @@ test_disabled_read_error(void)
 	uint32_t buf[128] = { 0 };
 	int ret;
 
-	stream_fd = __perf_open(drm_fd, &param);
+	stream_fd = __perf_open(drm_fd, &param, false);
 
 	ret = read(stream_fd, buf, sizeof(buf));
 	igt_assert_eq(ret, -1);
@@ -3194,7 +3212,7 @@ test_disabled_read_error(void)
 
 
 	param.flags &= ~I915_PERF_FLAG_DISABLED;
-	stream_fd = __perf_open(drm_fd, &param);
+	stream_fd = __perf_open(drm_fd, &param, false);
 
 	read_2_oa_reports(test_oa_format,
 			  oa_exponent,
@@ -3246,7 +3264,7 @@ test_mi_rpc(void)
 	uint32_t *report32;
 	int ret;
 
-	stream_fd = __perf_open(drm_fd, &param);
+	stream_fd = __perf_open(drm_fd, &param, false);
 
 	drm_intel_bufmgr_gem_enable_reuse(bufmgr);
 
@@ -3432,7 +3450,7 @@ hsw_test_single_ctx_counters(void)
 		scratch_buf_memset(dst[0].bo, width, height, 0x00ff00ff);
 
 		igt_debug("opening i915-perf stream\n");
-		stream_fd = __perf_open(drm_fd, &param);
+		stream_fd = __perf_open(drm_fd, &param, false);
 
 		bo = drm_intel_bo_alloc(bufmgr, "mi_rpc dest bo", 4096, 64);
 
@@ -3684,7 +3702,7 @@ gen8_test_single_ctx_render_target_writes_a_counter(void)
 			scratch_buf_memset(dst[0].bo, width, height, 0x00ff00ff);
 
 			igt_debug("opening i915-perf stream\n");
-			stream_fd = __perf_open(drm_fd, &param);
+			stream_fd = __perf_open(drm_fd, &param, false);
 
 			bo = drm_intel_bo_alloc(bufmgr, "mi_rpc dest bo", 4096, 64);
 
@@ -4034,7 +4052,7 @@ test_rc6_disable(void)
 
 	igt_skip_on(!rc6_enabled());
 
-	stream_fd = __perf_open(drm_fd, &param);
+	stream_fd = __perf_open(drm_fd, &param, false);
 
 	n_events_start = read_debugfs_u64_record(drm_fd, "i915_drpc_info",
 						 "RC6 residency since boot");
@@ -4249,7 +4267,7 @@ test_create_destroy_userspace_config(void)
 
 	/* Try to use the new config */
 	properties[1] = config_id;
-	stream_fd = __perf_open(drm_fd, &param);
+	stream_fd = __perf_open(drm_fd, &param, false);
 
 	/* Verify that destroying the config doesn't yield any error. */
 	i915_perf_remove_config(drm_fd, config_id);
@@ -4456,7 +4474,7 @@ test_i915_ref_count(void)
 	igt_debug("initial ref count with drm_fd open = %u\n", ref_count0);
 	igt_assert(ref_count0 > baseline);
 
-	stream_fd = __perf_open(drm_fd, &param);
+	stream_fd = __perf_open(drm_fd, &param, false);
 	ref_count1 = read_i915_module_ref();
 	igt_debug("ref count after opening i915 perf stream = %u\n", ref_count1);
 	igt_assert(ref_count1 > ref_count0);
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH i-g-t v6 11/12] tests/perf: add support for Coffeelake
  2017-10-04 11:19 [PATCH i-g-t v6 00/12] Improve robustness of the i915 perf tests Lionel Landwerlin
                   ` (9 preceding siblings ...)
  2017-10-04 11:19 ` [PATCH i-g-t v6 10/12] tests/perf: prevent power management to kick in when necessary Lionel Landwerlin
@ 2017-10-04 11:19 ` Lionel Landwerlin
  2017-10-04 11:51   ` Matthew Auld
  2017-10-04 11:19 ` [PATCH i-g-t v6 12/12] tests/perf: split array of formats descriptions Lionel Landwerlin
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 24+ messages in thread
From: Lionel Landwerlin @ 2017-10-04 11:19 UTC (permalink / raw)
  To: intel-gfx

Using the same timestamp frequency as Skylake/Kabylake.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tests/perf.c b/tests/perf.c
index 063182bc..4febdf99 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -1145,6 +1145,16 @@ init_sys_info(void)
 		} else if (IS_GEMINILAKE(devid)) {
 			test_set_uuid = "dd3fd789-e783-4204-8cd0-b671bbccb0cf";
 			timestamp_frequency = 19200000;
+		} else if (IS_COFFEELAKE(devid)) {
+			switch (intel_gt(devid)) {
+			case 1:
+				test_set_uuid = "74fb4902-d3d3-4237-9e90-cbdc68d0a446";
+				break;
+			default:
+				igt_debug("unsupported Cannonlake GT size\n");
+				return false;
+			}
+			timestamp_frequency = 12000000;
 		} else {
 			igt_debug("unsupported GT\n");
 			return false;
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH i-g-t v6 12/12] tests/perf: split array of formats descriptions
  2017-10-04 11:19 [PATCH i-g-t v6 00/12] Improve robustness of the i915 perf tests Lionel Landwerlin
                   ` (10 preceding siblings ...)
  2017-10-04 11:19 ` [PATCH i-g-t v6 11/12] tests/perf: add support for Coffeelake Lionel Landwerlin
@ 2017-10-04 11:19 ` Lionel Landwerlin
  2017-10-04 12:10   ` Matthew Auld
  2017-10-04 11:49 ` ✗ Fi.CI.BAT: failure for Improve robustness of the i915 perf tests (rev4) Patchwork
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 24+ messages in thread
From: Lionel Landwerlin @ 2017-10-04 11:19 UTC (permalink / raw)
  To: intel-gfx

The I915_OA_FORMAT_C4_B8 format has different offset on Haswell &
Gen8. Let's split the format lists so we don't mix them.

Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 218 +++++++++++++++++++++++++++++------------------------------
 1 file changed, 106 insertions(+), 112 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index 4febdf99..21601ed4 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -188,7 +188,7 @@ struct accumulator {
 	uint64_t deltas[MAX_RAW_OA_COUNTERS];
 };
 
-static struct {
+struct oa_format {
 	const char *name;
 	size_t size;
 	int a40_high_off; /* bytes */
@@ -201,69 +201,59 @@ static struct {
 	int n_b;
 	int c_off;
 	int n_c;
-	int min_gen;
-	int max_gen;
-} oa_formats[local_I915_OA_FORMAT_MAX] = {
+};
+
+static struct oa_format hsw_oa_formats[local_I915_OA_FORMAT_MAX] = {
 	[I915_OA_FORMAT_A13] = { /* HSW only */
 		"A13", .size = 64,
-		.a_off = 12, .n_a = 13,
-		.max_gen = 7 },
+		.a_off = 12, .n_a = 13, },
 	[I915_OA_FORMAT_A29] = { /* HSW only */
 		"A29", .size = 128,
-		.a_off = 12, .n_a = 29,
-		.max_gen = 7 },
+		.a_off = 12, .n_a = 29, },
 	[I915_OA_FORMAT_A13_B8_C8] = { /* HSW only */
 		"A13_B8_C8", .size = 128,
 		.a_off = 12, .n_a = 13,
 		.b_off = 64, .n_b = 8,
-		.c_off = 96, .n_c = 8,
-		.max_gen = 7 },
+		.c_off = 96, .n_c = 8, },
 	[I915_OA_FORMAT_A45_B8_C8] = { /* HSW only */
 		"A45_B8_C8", .size = 256,
 		.a_off = 12,  .n_a = 45,
 		.b_off = 192, .n_b = 8,
-		.c_off = 224, .n_c = 8,
-		.max_gen = 7 },
+		.c_off = 224, .n_c = 8, },
 	[I915_OA_FORMAT_B4_C8] = { /* HSW only */
 		"B4_C8", .size = 64,
 		.b_off = 16, .n_b = 4,
-		.c_off = 32, .n_c = 8,
-		.max_gen = 7 },
+		.c_off = 32, .n_c = 8, },
 	[I915_OA_FORMAT_B4_C8_A16] = { /* HSW only */
 		"B4_C8_A16", .size = 128,
 		.b_off = 16, .n_b = 4,
 		.c_off = 32, .n_c = 8,
-		.a_off = 60, .n_a = 16, .first_a = 29,
-		.max_gen = 7 },
+		.a_off = 60, .n_a = 16, .first_a = 29, },
 	[I915_OA_FORMAT_C4_B8] = { /* HSW+ (header differs from HSW-Gen8+) */
 		"C4_B8", .size = 64,
 		.c_off = 16, .n_c = 4,
 		.b_off = 28, .n_b = 8 },
+};
 
-	/* Gen8+ */
-
+static struct oa_format gen8_oa_formats[local_I915_OA_FORMAT_MAX] = {
 	[local_I915_OA_FORMAT_A12] = {
 		"A12", .size = 64,
-		.a_off = 12, .n_a = 12, .first_a = 7,
-		.min_gen = 8 },
+		.a_off = 12, .n_a = 12, .first_a = 7, },
 	[local_I915_OA_FORMAT_A12_B8_C8] = {
 		"A12_B8_C8", .size = 128,
 		.a_off = 12, .n_a = 12,
 		.b_off = 64, .n_b = 8,
-		.c_off = 96, .n_c = 8, .first_a = 7,
-		.min_gen = 8 },
+		.c_off = 96, .n_c = 8, .first_a = 7, },
 	[local_I915_OA_FORMAT_A32u40_A4u32_B8_C8] = {
 		"A32u40_A4u32_B8_C8", .size = 256,
 		.a40_high_off = 160, .a40_low_off = 16, .n_a40 = 32,
 		.a_off = 144, .n_a = 4, .first_a = 32,
 		.b_off = 192, .n_b = 8,
-		.c_off = 224, .n_c = 8,
-		.min_gen = 8 },
+		.c_off = 224, .n_c = 8, },
 	[I915_OA_FORMAT_C4_B8] = {
 		"C4_B8", .size = 64,
 		.c_off = 16, .n_c = 4,
-		.b_off = 32, .n_b = 8,
-		.min_gen = 8 },
+		.b_off = 32, .n_b = 8, },
 };
 
 static bool hsw_undefined_a_counters[45] = {
@@ -308,6 +298,14 @@ static uint32_t (*read_report_ticks)(uint32_t *report,
 static void (*sanity_check_reports)(uint32_t *oa_report0, uint32_t *oa_report1,
 				    enum drm_i915_oa_format format);
 
+static struct oa_format
+get_oa_format(enum drm_i915_oa_format format)
+{
+	if (IS_HASWELL(devid))
+		return hsw_oa_formats[format];
+	return gen8_oa_formats[format];
+}
+
 static bool
 timestamp_delta_within(uint32_t delta,
 		       uint32_t expected_delta,
@@ -371,7 +369,7 @@ static int
 lookup_format(int i915_perf_fmt_id)
 {
 	igt_assert(i915_perf_fmt_id < local_I915_OA_FORMAT_MAX);
-	igt_assert(oa_formats[i915_perf_fmt_id].name);
+	igt_assert(get_oa_format(i915_perf_fmt_id).name);
 
 	return i915_perf_fmt_id;
 }
@@ -497,9 +495,9 @@ read_debugfs_u64_record(int fd, const char *file, const char *key)
 static uint32_t
 hsw_read_report_ticks(uint32_t *report, enum drm_i915_oa_format format)
 {
-	uint32_t *c = (uint32_t *)(((uint8_t *)report) + oa_formats[format].c_off);
+	uint32_t *c = (uint32_t *)(((uint8_t *)report) + get_oa_format(format).c_off);
 
-	igt_assert_neq(oa_formats[format].n_c, 0);
+	igt_assert_neq(get_oa_format(format).n_c, 0);
 
 	return c[2];
 }
@@ -760,6 +758,7 @@ hsw_sanity_check_render_basic_reports(uint32_t *oa_report0, uint32_t *oa_report1
 	uint32_t time_delta = timebase_scale(oa_report1[1] - oa_report0[1]);
 	uint32_t clock_delta;
 	uint32_t max_delta;
+	struct oa_format format = get_oa_format(fmt);
 
 	igt_assert_neq(time_delta, 0);
 
@@ -767,7 +766,7 @@ hsw_sanity_check_render_basic_reports(uint32_t *oa_report0, uint32_t *oa_report1
 	 * can't explicitly derive a clock delta for all OA report
 	 * formats...
 	 */
-	if (oa_formats[fmt].n_c == 0) {
+	if (format.n_c == 0) {
 		/* Assume running at max freq for sake of
 		 * below sanity check on counters... */
 		clock_delta = (gt_max_freq_mhz *
@@ -797,14 +796,14 @@ hsw_sanity_check_render_basic_reports(uint32_t *oa_report0, uint32_t *oa_report1
 	max_delta = clock_delta * n_eus;
 
 	/* 40bit A counters were only introduced for Gen8+ */
-	igt_assert_eq(oa_formats[fmt].n_a40, 0);
+	igt_assert_eq(format.n_a40, 0);
 
-	for (int j = 0; j < oa_formats[fmt].n_a; j++) {
+	for (int j = 0; j < format.n_a; j++) {
 		uint32_t *a0 = (uint32_t *)(((uint8_t *)oa_report0) +
-					    oa_formats[fmt].a_off);
+					    format.a_off);
 		uint32_t *a1 = (uint32_t *)(((uint8_t *)oa_report1) +
-					    oa_formats[fmt].a_off);
-		int a_id = oa_formats[fmt].first_a + j;
+					    format.a_off);
+		int a_id = format.first_a + j;
 		uint32_t delta = a1[j] - a0[j];
 
 		if (undefined_a_counters[a_id])
@@ -814,22 +813,22 @@ hsw_sanity_check_render_basic_reports(uint32_t *oa_report0, uint32_t *oa_report1
 		igt_assert(delta <= max_delta);
 	}
 
-	for (int j = 0; j < oa_formats[fmt].n_b; j++) {
+	for (int j = 0; j < format.n_b; j++) {
 		uint32_t *b0 = (uint32_t *)(((uint8_t *)oa_report0) +
-					    oa_formats[fmt].b_off);
+					    format.b_off);
 		uint32_t *b1 = (uint32_t *)(((uint8_t *)oa_report1) +
-					    oa_formats[fmt].b_off);
+					    format.b_off);
 		uint32_t delta = b1[j] - b0[j];
 
 		igt_debug("B%d: delta = %"PRIu32"\n", j, delta);
 		igt_assert(delta <= max_delta);
 	}
 
-	for (int j = 0; j < oa_formats[fmt].n_c; j++) {
+	for (int j = 0; j < format.n_c; j++) {
 		uint32_t *c0 = (uint32_t *)(((uint8_t *)oa_report0) +
-					    oa_formats[fmt].c_off);
+					    format.c_off);
 		uint32_t *c1 = (uint32_t *)(((uint8_t *)oa_report1) +
-					    oa_formats[fmt].c_off);
+					    format.c_off);
 		uint32_t delta = c1[j] - c0[j];
 
 		igt_debug("C%d: delta = %"PRIu32"\n", j, delta);
@@ -840,9 +839,10 @@ hsw_sanity_check_render_basic_reports(uint32_t *oa_report0, uint32_t *oa_report1
 static uint64_t
 gen8_read_40bit_a_counter(uint32_t *report, enum drm_i915_oa_format fmt, int a_id)
 {
-	uint8_t *a40_high = (((uint8_t *)report) + oa_formats[fmt].a40_high_off);
+	struct oa_format format = get_oa_format(fmt);
+	uint8_t *a40_high = (((uint8_t *)report) + format.a40_high_off);
 	uint32_t *a40_low = (uint32_t *)(((uint8_t *)report) +
-					 oa_formats[fmt].a40_low_off);
+					 format.a40_low_off);
 	uint64_t high = (uint64_t)(a40_high[a_id]) << 32;
 
 	return a40_low[a_id] | high;
@@ -887,7 +887,7 @@ accumulate_reports(struct accumulator *accumulator,
 		   uint32_t *start,
 		   uint32_t *end)
 {
-	enum drm_i915_oa_format format = accumulator->format;
+	struct oa_format format = get_oa_format(accumulator->format);
 	uint64_t *deltas = accumulator->deltas;
 	int idx = 0;
 
@@ -902,21 +902,23 @@ accumulate_reports(struct accumulator *accumulator,
 		accumulate_uint32(4, start, end, deltas + idx++);
 	}
 
-	for (int i = 0; i < oa_formats[format].n_a40; i++)
-		accumulate_uint40(i, start, end, format, deltas + idx++);
+	for (int i = 0; i < format.n_a40; i++) {
+		accumulate_uint40(i, start, end, accumulator->format,
+				  deltas + idx++);
+	}
 
-	for (int i = 0; i < oa_formats[format].n_a; i++) {
-		accumulate_uint32(oa_formats[format].a_off + 4 * i,
+	for (int i = 0; i < format.n_a; i++) {
+		accumulate_uint32(format.a_off + 4 * i,
 				  start, end, deltas + idx++);
 	}
 
-	for (int i = 0; i < oa_formats[format].n_b; i++) {
-		accumulate_uint32(oa_formats[format].b_off + 4 * i,
+	for (int i = 0; i < format.n_b; i++) {
+		accumulate_uint32(format.b_off + 4 * i,
 				  start, end, deltas + idx++);
 	}
 
-	for (int i = 0; i < oa_formats[format].n_c; i++) {
-		accumulate_uint32(oa_formats[format].c_off + 4 * i,
+	for (int i = 0; i < format.n_c; i++) {
+		accumulate_uint32(format.c_off + 4 * i,
 				  start, end, deltas + idx++);
 	}
 }
@@ -924,7 +926,7 @@ accumulate_reports(struct accumulator *accumulator,
 static void
 accumulator_print(struct accumulator *accumulator, const char *title)
 {
-	enum drm_i915_oa_format format = accumulator->format;
+	struct oa_format format = get_oa_format(accumulator->format);
 	uint64_t *deltas = accumulator->deltas;
 	int idx = 0;
 
@@ -933,21 +935,21 @@ accumulator_print(struct accumulator *accumulator, const char *title)
 		igt_debug("\ttime delta = %lu\n", deltas[idx++]);
 		igt_debug("\tclock cycle delta = %lu\n", deltas[idx++]);
 
-		for (int i = 0; i < oa_formats[format].n_a40; i++)
+		for (int i = 0; i < format.n_a40; i++)
 			igt_debug("\tA%u = %lu\n", i, deltas[idx++]);
 	} else {
 		igt_debug("\ttime delta = %lu\n", deltas[idx++]);
 	}
 
-	for (int i = 0; i < oa_formats[format].n_a; i++) {
-		int a_id = oa_formats[format].first_a + i;
+	for (int i = 0; i < format.n_a; i++) {
+		int a_id = format.first_a + i;
 		igt_debug("\tA%u = %lu\n", a_id, deltas[idx++]);
 	}
 
-	for (int i = 0; i < oa_formats[format].n_a; i++)
+	for (int i = 0; i < format.n_a; i++)
 		igt_debug("\tB%u = %lu\n", i, deltas[idx++]);
 
-	for (int i = 0; i < oa_formats[format].n_c; i++)
+	for (int i = 0; i < format.n_c; i++)
 		igt_debug("\tC%u = %lu\n", i, deltas[idx++]);
 }
 
@@ -956,6 +958,7 @@ static void
 gen8_sanity_check_test_oa_reports(uint32_t *oa_report0, uint32_t *oa_report1,
 				  enum drm_i915_oa_format fmt)
 {
+	struct oa_format format = get_oa_format(fmt);
 	uint32_t time_delta = timebase_scale(oa_report1[1] - oa_report0[1]);
 	uint32_t ticks0 = read_report_ticks(oa_report0, fmt);
 	uint32_t ticks1 = read_report_ticks(oa_report1, fmt);
@@ -963,9 +966,9 @@ gen8_sanity_check_test_oa_reports(uint32_t *oa_report0, uint32_t *oa_report1,
 	uint32_t max_delta;
 	uint64_t freq;
 	uint32_t *rpt0_b = (uint32_t *)(((uint8_t *)oa_report0) +
-					oa_formats[fmt].b_off);
+					format.b_off);
 	uint32_t *rpt1_b = (uint32_t *)(((uint8_t *)oa_report1) +
-					oa_formats[fmt].b_off);
+					format.b_off);
 	uint32_t b;
 	uint32_t ref;
 
@@ -983,7 +986,7 @@ gen8_sanity_check_test_oa_reports(uint32_t *oa_report0, uint32_t *oa_report1,
 	max_delta = clock_delta * n_eus;
 
 	/* Gen8+ has some 40bit A counters... */
-	for (int j = 0; j < oa_formats[fmt].n_a40; j++) {
+	for (int j = 0; j < format.n_a40; j++) {
 		uint64_t value0 = gen8_read_40bit_a_counter(oa_report0, fmt, j);
 		uint64_t value1 = gen8_read_40bit_a_counter(oa_report1, fmt, j);
 		uint64_t delta = gen8_40bit_a_delta(value0, value1);
@@ -995,12 +998,12 @@ gen8_sanity_check_test_oa_reports(uint32_t *oa_report0, uint32_t *oa_report1,
 		igt_assert(delta <= max_delta);
 	}
 
-	for (int j = 0; j < oa_formats[fmt].n_a; j++) {
+	for (int j = 0; j < format.n_a; j++) {
 		uint32_t *a0 = (uint32_t *)(((uint8_t *)oa_report0) +
-					    oa_formats[fmt].a_off);
+					    format.a_off);
 		uint32_t *a1 = (uint32_t *)(((uint8_t *)oa_report1) +
-					    oa_formats[fmt].a_off);
-		int a_id = oa_formats[fmt].first_a + j;
+					    format.a_off);
+		int a_id = format.first_a + j;
 		uint32_t delta = a1[j] - a0[j];
 
 		if (undefined_a_counters[a_id])
@@ -1013,7 +1016,7 @@ gen8_sanity_check_test_oa_reports(uint32_t *oa_report0, uint32_t *oa_report1,
 	/* The TestOa metric set defines all B counters to be a
 	 * multiple of the gpu clock
 	 */
-	if (oa_formats[fmt].n_b) {
+	if (format.n_b) {
 		b = rpt1_b[0] - rpt0_b[0];
 		igt_debug("B0: delta = %"PRIu32"\n", b);
 		igt_assert_eq(b, 0);
@@ -1052,11 +1055,11 @@ gen8_sanity_check_test_oa_reports(uint32_t *oa_report0, uint32_t *oa_report1,
 		igt_assert(b >= ref - 1 && b <= ref + 1);
 	}
 
-	for (int j = 0; j < oa_formats[fmt].n_c; j++) {
+	for (int j = 0; j < format.n_c; j++) {
 		uint32_t *c0 = (uint32_t *)(((uint8_t *)oa_report0) +
-					    oa_formats[fmt].c_off);
+					    format.c_off);
 		uint32_t *c1 = (uint32_t *)(((uint8_t *)oa_report1) +
-					    oa_formats[fmt].c_off);
+					    format.c_off);
 		uint32_t delta = c1[j] - c0[j];
 
 		igt_debug("C%d: delta = %"PRIu32"\n", j, delta);
@@ -1186,7 +1189,7 @@ i915_read_reports_until_timestamp(enum drm_i915_oa_format oa_format,
 				  uint32_t start_timestamp,
 				  uint32_t end_timestamp)
 {
-	size_t format_size = oa_formats[oa_format].size;
+	size_t format_size = get_oa_format(oa_format).size;
 	uint32_t last_seen_timestamp = start_timestamp;
 	int total_len = 0;
 
@@ -1411,7 +1414,7 @@ read_2_oa_reports(int format_id,
 		  uint32_t *oa_report1,
 		  bool timer_only)
 {
-	size_t format_size = oa_formats[format_id].size;
+	size_t format_size = get_oa_format(format_id).size;
 	size_t sample_size = (sizeof(struct drm_i915_perf_record_header) +
 			      format_size);
 	const struct drm_i915_perf_record_header *header;
@@ -1551,10 +1554,12 @@ open_and_read_2_oa_reports(int format_id,
 static void
 print_reports(uint32_t *oa_report0, uint32_t *oa_report1, int fmt)
 {
+	struct oa_format format = get_oa_format(fmt);
+
 	igt_debug("TIMESTAMP: 1st = %"PRIu32", 2nd = %"PRIu32", delta = %"PRIu32"\n",
 		  oa_report0[1], oa_report1[1], oa_report1[1] - oa_report0[1]);
 
-	if (IS_HASWELL(devid) && oa_formats[fmt].n_c == 0) {
+	if (IS_HASWELL(devid) && format.n_c == 0) {
 		igt_debug("CLOCK = N/A\n");
 	} else {
 		uint32_t clock0 = read_report_ticks(oa_report0, fmt);
@@ -1588,7 +1593,7 @@ print_reports(uint32_t *oa_report0, uint32_t *oa_report1, int fmt)
 	}
 
 	/* Gen8+ has some 40bit A counters... */
-	for (int j = 0; j < oa_formats[fmt].n_a40; j++) {
+	for (int j = 0; j < format.n_a40; j++) {
 		uint64_t value0 = gen8_read_40bit_a_counter(oa_report0, fmt, j);
 		uint64_t value1 = gen8_read_40bit_a_counter(oa_report1, fmt, j);
 		uint64_t delta = gen8_40bit_a_delta(value0, value1);
@@ -1600,12 +1605,12 @@ print_reports(uint32_t *oa_report0, uint32_t *oa_report1, int fmt)
 			  j, value0, value1, delta);
 	}
 
-	for (int j = 0; j < oa_formats[fmt].n_a; j++) {
+	for (int j = 0; j < format.n_a; j++) {
 		uint32_t *a0 = (uint32_t *)(((uint8_t *)oa_report0) +
-					    oa_formats[fmt].a_off);
+					    format.a_off);
 		uint32_t *a1 = (uint32_t *)(((uint8_t *)oa_report1) +
-					    oa_formats[fmt].a_off);
-		int a_id = oa_formats[fmt].first_a + j;
+					    format.a_off);
+		int a_id = format.first_a + j;
 		uint32_t delta = a1[j] - a0[j];
 
 		if (undefined_a_counters[a_id])
@@ -1615,22 +1620,22 @@ print_reports(uint32_t *oa_report0, uint32_t *oa_report1, int fmt)
 			  a_id, a0[j], a1[j], delta);
 	}
 
-	for (int j = 0; j < oa_formats[fmt].n_b; j++) {
+	for (int j = 0; j < format.n_b; j++) {
 		uint32_t *b0 = (uint32_t *)(((uint8_t *)oa_report0) +
-					    oa_formats[fmt].b_off);
+					    format.b_off);
 		uint32_t *b1 = (uint32_t *)(((uint8_t *)oa_report1) +
-					    oa_formats[fmt].b_off);
+					    format.b_off);
 		uint32_t delta = b1[j] - b0[j];
 
 		igt_debug("B%d: 1st = %"PRIu32", 2nd = %"PRIu32", delta = %"PRIu32"\n",
 			  j, b0[j], b1[j], delta);
 	}
 
-	for (int j = 0; j < oa_formats[fmt].n_c; j++) {
+	for (int j = 0; j < format.n_c; j++) {
 		uint32_t *c0 = (uint32_t *)(((uint8_t *)oa_report0) +
-					    oa_formats[fmt].c_off);
+					    format.c_off);
 		uint32_t *c1 = (uint32_t *)(((uint8_t *)oa_report1) +
-					    oa_formats[fmt].c_off);
+					    format.c_off);
 		uint32_t delta = c1[j] - c0[j];
 
 		igt_debug("C%d: 1st = %"PRIu32", 2nd = %"PRIu32", delta = %"PRIu32"\n",
@@ -1643,9 +1648,11 @@ print_reports(uint32_t *oa_report0, uint32_t *oa_report1, int fmt)
 static void
 print_report(uint32_t *report, int fmt)
 {
+	struct oa_format format = get_oa_format(fmt);
+
 	igt_debug("TIMESTAMP: %"PRIu32"\n", report[1]);
 
-	if (IS_HASWELL(devid) && oa_formats[fmt].n_c == 0) {
+	if (IS_HASWELL(devid) && format.n_c == 0) {
 		igt_debug("CLOCK = N/A\n");
 	} else {
 		uint32_t clock = read_report_ticks(report, fmt);
@@ -1666,7 +1673,7 @@ print_report(uint32_t *report, int fmt)
 	}
 
 	/* Gen8+ has some 40bit A counters... */
-	for (int j = 0; j < oa_formats[fmt].n_a40; j++) {
+	for (int j = 0; j < format.n_a40; j++) {
 		uint64_t value = gen8_read_40bit_a_counter(report, fmt, j);
 
 		if (undefined_a_counters[j])
@@ -1675,10 +1682,10 @@ print_report(uint32_t *report, int fmt)
 		igt_debug("A%d: %"PRIu64"\n", j, value);
 	}
 
-	for (int j = 0; j < oa_formats[fmt].n_a; j++) {
+	for (int j = 0; j < format.n_a; j++) {
 		uint32_t *a = (uint32_t *)(((uint8_t *)report) +
-					   oa_formats[fmt].a_off);
-		int a_id = oa_formats[fmt].first_a + j;
+					   format.a_off);
+		int a_id = format.first_a + j;
 
 		if (undefined_a_counters[a_id])
 			continue;
@@ -1686,16 +1693,16 @@ print_report(uint32_t *report, int fmt)
 		igt_debug("A%d: %"PRIu32"\n", a_id, a[j]);
 	}
 
-	for (int j = 0; j < oa_formats[fmt].n_b; j++) {
+	for (int j = 0; j < format.n_b; j++) {
 		uint32_t *b = (uint32_t *)(((uint8_t *)report) +
-					   oa_formats[fmt].b_off);
+					   format.b_off);
 
 		igt_debug("B%d: %"PRIu32"\n", j, b[j]);
 	}
 
-	for (int j = 0; j < oa_formats[fmt].n_c; j++) {
+	for (int j = 0; j < format.n_c; j++) {
 		uint32_t *c = (uint32_t *)(((uint8_t *)report) +
-					   oa_formats[fmt].c_off);
+					   format.c_off);
 
 		igt_debug("C%d: %"PRIu32"\n", j, c[j]);
 	}
@@ -1705,28 +1712,15 @@ print_report(uint32_t *report, int fmt)
 static void
 test_oa_formats(void)
 {
-	for (int i = 0; i < ARRAY_SIZE(oa_formats); i++) {
+	for (int i = 0; i < I915_OA_FORMAT_MAX; i++) {
+		struct oa_format format = get_oa_format(i);
 		uint32_t oa_report0[64];
 		uint32_t oa_report1[64];
 
-		if (!oa_formats[i].name) /* sparse, indexed by ID */
-			continue;
-
-		if (oa_formats[i].min_gen &&
-		    intel_gen(devid) < oa_formats[i].min_gen) {
-			igt_debug("skipping unsupported OA format %s\n",
-				  oa_formats[i].name);
+		if (!format.name) /* sparse, indexed by ID */
 			continue;
-		}
-
-		if (oa_formats[i].max_gen &&
-		    intel_gen(devid) > oa_formats[i].max_gen) {
-			igt_debug("skipping unsupported OA format %s\n",
-				  oa_formats[i].name);
-			continue;
-		}
 
-		igt_debug("Checking OA format %s\n", oa_formats[i].name);
+		igt_debug("Checking OA format %s\n", format.name);
 
 		open_and_read_2_oa_reports(i,
 					   oa_exp_1_millisec,
@@ -2761,7 +2755,7 @@ test_buffer_fill(void)
 	uint8_t *buf = malloc(buf_size);
 	int len;
 	size_t oa_buf_size = 16 * 1024 * 1024;
-	size_t report_size = oa_formats[test_oa_format].size;
+	size_t report_size = get_oa_format(test_oa_format).size;
 	int n_full_oa_reports = oa_buf_size / report_size;
 	uint64_t fill_duration = n_full_oa_reports * oa_period;
 
@@ -2930,7 +2924,7 @@ test_enable_disable(void)
 	int buf_size = 65536 * (256 + sizeof(struct drm_i915_perf_record_header));
 	uint8_t *buf = malloc(buf_size);
 	size_t oa_buf_size = 16 * 1024 * 1024;
-	size_t report_size = oa_formats[test_oa_format].size;
+	size_t report_size = get_oa_format(test_oa_format).size;
 	int n_full_oa_reports = oa_buf_size / report_size;
 	uint64_t fill_duration = n_full_oa_reports * oa_period;
 
@@ -3621,7 +3615,7 @@ gen8_test_single_ctx_render_target_writes_a_counter(void)
 		.num_properties = ARRAY_SIZE(properties) / 2,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	size_t format_size = oa_formats[test_oa_format].size;
+	size_t format_size = get_oa_format(test_oa_format).size;
 	size_t sample_size = (sizeof(struct drm_i915_perf_record_header) +
 			      format_size);
 	int max_reports = (16 * 1024 * 1024) / format_size;
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH i-g-t v6 02/12] tests/perf: add per context filtering test for gen8+
  2017-10-04 11:19 ` [PATCH i-g-t v6 02/12] tests/perf: add per context filtering test for gen8+ Lionel Landwerlin
@ 2017-10-04 11:46   ` Matthew Auld
  2017-10-04 12:43     ` Lionel Landwerlin
  0 siblings, 1 reply; 24+ messages in thread
From: Matthew Auld @ 2017-10-04 11:46 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: Intel Graphics Development

On 4 October 2017 at 12:19, Lionel Landwerlin
<lionel.g.landwerlin@intel.com> wrote:
> From: Robert Bragg <robert@sixbynine.org>
>
> Signed-off-by: Robert Bragg <robert@sixbynine.org>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> ---
>  tests/perf.c | 777 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 745 insertions(+), 32 deletions(-)
>
> diff --git a/tests/perf.c b/tests/perf.c
> index f89a235e..8644e252 100644
> --- a/tests/perf.c
> +++ b/tests/perf.c
> @@ -48,7 +48,9 @@ IGT_TEST_DESCRIPTION("Test the i915 perf metrics streaming interface");
>  #define OAREPORT_REASON_MASK           0x3f
>  #define OAREPORT_REASON_SHIFT          19
>  #define OAREPORT_REASON_TIMER          (1<<0)
> +#define OAREPORT_REASON_INTERNAL       (3<<1)
>  #define OAREPORT_REASON_CTX_SWITCH     (1<<3)
> +#define OAREPORT_REASON_GO             (1<<4)
>  #define OAREPORT_REASON_CLK_RATIO      (1<<5)
>
>  #define GFX_OP_PIPE_CONTROL     ((3 << 29) | (3 << 27) | (2 << 24))
> @@ -574,6 +576,22 @@ oa_exponent_to_ns(int exponent)
>         return 1000000000ULL * (2ULL << exponent) / timestamp_frequency;
>  }
>
> +static bool
> +oa_report_ctx_is_valid(uint32_t *report)
> +{
> +       if (IS_HASWELL(devid)) {
> +               return false; /* TODO */
> +       } else if (IS_GEN8(devid)) {
> +               return report[0] & (1ul << 25);
> +       } else if (IS_GEN9(devid)) {
> +               return report[0] & (1ul << 16);
> +       }
> +
> +       /* Need to update this function for newer Gen. */
> +       igt_assert(!"reached");
> +}
> +
> +
>  static void
>  hsw_sanity_check_render_basic_reports(uint32_t *oa_report0, uint32_t *oa_report1,
>                                       enum drm_i915_oa_format fmt)
> @@ -678,6 +696,100 @@ gen8_40bit_a_delta(uint64_t value0, uint64_t value1)
>                 return value1 - value0;
>  }
>
> +static void
> +accumulate_uint32(size_t offset,
> +                 uint32_t *report0,
> +                  uint32_t *report1,
> +                  uint64_t *delta)
> +{
> +       uint32_t value0 = *(uint32_t *)(((uint8_t *)report0) + offset);
> +       uint32_t value1 = *(uint32_t *)(((uint8_t *)report1) + offset);
> +
> +       *delta += (uint32_t)(value1 - value0);
> +}
> +
> +static void
> +accumulate_uint40(int a_index,
> +                  uint32_t *report0,
> +                  uint32_t *report1,
> +                 enum drm_i915_oa_format format,
> +                  uint64_t *delta)
> +{
> +       uint64_t value0 = gen8_read_40bit_a_counter(report0, format, a_index),
> +                value1 = gen8_read_40bit_a_counter(report1, format, a_index);
> +
> +       *delta += gen8_40bit_a_delta(value0, value1);
> +}
> +
> +static void
> +accumulate_reports(struct accumulator *accumulator,
> +                  uint32_t *start,
> +                  uint32_t *end)
> +{
> +       enum drm_i915_oa_format format = accumulator->format;
> +       uint64_t *deltas = accumulator->deltas;
> +       int idx = 0;
> +
> +       if (intel_gen(devid) >= 8) {
> +               /* timestamp */
> +               accumulate_uint32(4, start, end, deltas + idx++);
> +
> +               /* clock cycles */
> +               accumulate_uint32(12, start, end, deltas + idx++);
> +       } else {
> +               /* timestamp */
> +               accumulate_uint32(4, start, end, deltas + idx++);
> +       }
> +
> +       for (int i = 0; i < oa_formats[format].n_a40; i++)
> +               accumulate_uint40(i, start, end, format, deltas + idx++);
> +
> +       for (int i = 0; i < oa_formats[format].n_a; i++) {
> +               accumulate_uint32(oa_formats[format].a_off + 4 * i,
> +                                 start, end, deltas + idx++);
> +       }
> +
> +       for (int i = 0; i < oa_formats[format].n_b; i++) {
> +               accumulate_uint32(oa_formats[format].b_off + 4 * i,
> +                                 start, end, deltas + idx++);
> +       }
> +
> +       for (int i = 0; i < oa_formats[format].n_c; i++) {
> +               accumulate_uint32(oa_formats[format].c_off + 4 * i,
> +                                 start, end, deltas + idx++);
> +       }
> +}
> +
> +static void
> +accumulator_print(struct accumulator *accumulator, const char *title)
> +{
> +       enum drm_i915_oa_format format = accumulator->format;
> +       uint64_t *deltas = accumulator->deltas;
> +       int idx = 0;
> +
> +       igt_debug("%s:\n", title);
> +       if (intel_gen(devid) >= 8) {
> +               igt_debug("\ttime delta = %lu\n", deltas[idx++]);
> +               igt_debug("\tclock cycle delta = %lu\n", deltas[idx++]);
> +
> +               for (int i = 0; i < oa_formats[format].n_a40; i++)
> +                       igt_debug("\tA%u = %lu\n", i, deltas[idx++]);
> +       } else {
> +               igt_debug("\ttime delta = %lu\n", deltas[idx++]);
> +       }
> +
> +       for (int i = 0; i < oa_formats[format].n_a; i++) {
> +               int a_id = oa_formats[format].first_a + i;
> +               igt_debug("\tA%u = %lu\n", a_id, deltas[idx++]);
> +       }
> +
> +       for (int i = 0; i < oa_formats[format].n_a; i++)
> +               igt_debug("\tB%u = %lu\n", i, deltas[idx++]);
> +
> +       for (int i = 0; i < oa_formats[format].n_c; i++)
> +               igt_debug("\tC%u = %lu\n", i, deltas[idx++]);
> +}
> +
>  /* The TestOa metric set is designed so */
>  static void
>  gen8_sanity_check_test_oa_reports(uint32_t *oa_report0, uint32_t *oa_report1,
> @@ -944,6 +1056,62 @@ gt_frequency_range_restore(void)
>         gt_max_freq_mhz = gt_max_freq_mhz_saved;
>  }
>
> +static int
> +i915_read_reports_until_timestamp(enum drm_i915_oa_format oa_format,
> +                                 uint8_t *buf,
> +                                 uint32_t max_size,
> +                                 uint32_t start_timestamp,
> +                                 uint32_t end_timestamp)
> +{
> +       size_t format_size = oa_formats[oa_format].size;
> +       uint32_t last_seen_timestamp = start_timestamp;
> +       int total_len = 0;
> +
> +       while (last_seen_timestamp < end_timestamp) {
> +               int offset, len;
> +
> +               /* Running out of space. */
> +               if ((max_size - total_len) < format_size) {
> +                       igt_warn("run out of space before reaching "
> +                                "end timestamp (%u/%u)\n",
> +                                last_seen_timestamp, end_timestamp);
> +                       return -1;
> +               }
> +
> +               while ((len = read(stream_fd, &buf[total_len],
> +                                  max_size - total_len)) < 0 &&
> +                      errno == EINTR)
> +                       ;
> +
> +               /* Intentionally return an error. */
> +               if (len <= 0) {
> +                       if (errno == EAGAIN)
> +                               return total_len;
> +                       else {
> +                               igt_warn("error read OA stream : %i\n", errno);
> +                               return -1;
> +                       }
> +               }
> +
> +               offset = total_len;
> +               total_len += len;
> +
> +               while (offset < total_len) {
> +                 const struct drm_i915_perf_record_header *header =
> +                   (const struct drm_i915_perf_record_header *) &buf[offset];
> +                 uint32_t *report = (uint32_t *) (header + 1);
> +
> +                 if (header->type == DRM_I915_PERF_RECORD_SAMPLE)
> +                   last_seen_timestamp = report[1];
> +
> +                 offset += header->size;
> +               }
Formatting looks off.

> +       }
> +
> +       return total_len;
> +}
> +
> +
>  /* CAP_SYS_ADMIN is required to open system wide metrics, unless the system
>   * control parameter dev.i915.perf_stream_paranoid == 0 */
>  static void
> @@ -1362,6 +1530,70 @@ print_reports(uint32_t *oa_report0, uint32_t *oa_report1, int fmt)
>         }
>  }
>
> +/* Debug function, only useful when reports don't make sense. */
> +#if 0
> +static void
> +print_report(uint32_t *report, int fmt)
> +{
> +       igt_debug("TIMESTAMP: %"PRIu32"\n", report[1]);
> +
> +       if (IS_HASWELL(devid) && oa_formats[fmt].n_c == 0) {
> +               igt_debug("CLOCK = N/A\n");
> +       } else {
> +               uint32_t clock = read_report_ticks(report, fmt);
> +
> +               igt_debug("CLOCK: %"PRIu32"\n", clock);
> +       }
> +
> +       if (intel_gen(devid) >= 8) {
> +               uint32_t slice_freq, unslice_freq;
> +               const char *reason = gen8_read_report_reason(report);
> +
> +               gen8_read_report_clock_ratios(report, &slice_freq, &unslice_freq);
> +
> +               igt_debug("SLICE CLK: %umhz\n", slice_freq);
> +               igt_debug("UNSLICE CLK: %umhz\n", unslice_freq);
> +               igt_debug("REASON: \"%s\"\n", reason);
> +               igt_debug("CTX ID: %"PRIu32"/%"PRIx32"\n", report[2], report[2]);
> +       }
> +
> +       /* Gen8+ has some 40bit A counters... */
> +       for (int j = 0; j < oa_formats[fmt].n_a40; j++) {
> +               uint64_t value = gen8_read_40bit_a_counter(report, fmt, j);
> +
> +               if (undefined_a_counters[j])
> +                       continue;
> +
> +               igt_debug("A%d: %"PRIu64"\n", j, value);
> +       }
> +
> +       for (int j = 0; j < oa_formats[fmt].n_a; j++) {
> +               uint32_t *a = (uint32_t *)(((uint8_t *)report) +
> +                                          oa_formats[fmt].a_off);
> +               int a_id = oa_formats[fmt].first_a + j;
> +
> +               if (undefined_a_counters[a_id])
> +                       continue;
> +
> +               igt_debug("A%d: %"PRIu32"\n", a_id, a[j]);
> +       }
> +
> +       for (int j = 0; j < oa_formats[fmt].n_b; j++) {
> +               uint32_t *b = (uint32_t *)(((uint8_t *)report) +
> +                                          oa_formats[fmt].b_off);
> +
> +               igt_debug("B%d: %"PRIu32"\n", j, b[j]);
> +       }
> +
> +       for (int j = 0; j < oa_formats[fmt].n_c; j++) {
> +               uint32_t *c = (uint32_t *)(((uint8_t *)report) +
> +                                          oa_formats[fmt].c_off);
> +
> +               igt_debug("C%d: %"PRIu32"\n", j, c[j]);
> +       }
> +}
> +#endif
> +
>  static void
>  test_oa_formats(void)
>  {
> @@ -2486,14 +2718,8 @@ test_mi_rpc(void)
>  }
>
>  static void
> -scratch_buf_init(drm_intel_bufmgr *bufmgr,
> -                struct igt_buf *buf,
> -                int width, int height,
> -                uint32_t color)
> +scratch_buf_memset(drm_intel_bo *bo, int width, int height, uint32_t color)
>  {
> -       size_t stride = width * 4;
> -       size_t size = stride * height;
> -       drm_intel_bo *bo = drm_intel_bo_alloc(bufmgr, "", size, 4096);
>         int ret;
>
>         ret = drm_intel_bo_map(bo, true /* writable */);
> @@ -2503,6 +2729,19 @@ scratch_buf_init(drm_intel_bufmgr *bufmgr,
>                 ((uint32_t *)bo->virtual)[i] = color;
>
>         drm_intel_bo_unmap(bo);
> +}
> +
> +static void
> +scratch_buf_init(drm_intel_bufmgr *bufmgr,
> +                struct igt_buf *buf,
> +                int width, int height,
> +                uint32_t color)
> +{
> +       size_t stride = width * 4;
> +       size_t size = stride * height;
> +       drm_intel_bo *bo = drm_intel_bo_alloc(bufmgr, "", size, 4096);
> +
> +       scratch_buf_memset(bo, width, height, color);
>
>         buf->bo = bo;
>         buf->stride = stride;
> @@ -2521,14 +2760,25 @@ emit_stall_timestamp_and_rpc(struct intel_batchbuffer *batch,
>                                    PIPE_CONTROL_RENDER_TARGET_FLUSH |
>                                    PIPE_CONTROL_WRITE_TIMESTAMP);
>
> -       BEGIN_BATCH(5, 1);
> -       OUT_BATCH(GFX_OP_PIPE_CONTROL | (5 - 2));
> -       OUT_BATCH(pipe_ctl_flags);
> -       OUT_RELOC(dst, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
> -                 timestamp_offset);
> -       OUT_BATCH(0); /* imm lower */
> -       OUT_BATCH(0); /* imm upper */
> -       ADVANCE_BATCH();
> +       if (intel_gen(devid) >= 8) {
> +               BEGIN_BATCH(5, 1);
> +               OUT_BATCH(GFX_OP_PIPE_CONTROL | (6 - 2));
> +               OUT_BATCH(pipe_ctl_flags);
> +               OUT_RELOC(dst, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
> +                         timestamp_offset);
> +               OUT_BATCH(0); /* imm lower */
> +               OUT_BATCH(0); /* imm upper */
> +               ADVANCE_BATCH();
> +       } else {
> +               BEGIN_BATCH(5, 1);
> +               OUT_BATCH(GFX_OP_PIPE_CONTROL | (5 - 2));
> +               OUT_BATCH(pipe_ctl_flags);
> +               OUT_RELOC(dst, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
> +                         timestamp_offset);
> +               OUT_BATCH(0); /* imm lower */
> +               OUT_BATCH(0); /* imm upper */
> +               ADVANCE_BATCH();
> +       }
>
>         emit_report_perf_count(batch, dst, report_dst_offset, report_id);
>  }
> @@ -2574,7 +2824,7 @@ hsw_test_single_ctx_counters(void)
>                 drm_intel_bufmgr *bufmgr;
>                 drm_intel_context *context0, *context1;
>                 struct intel_batchbuffer *batch;
> -               struct igt_buf src, dst;
> +               struct igt_buf src[3], dst[3];
>                 drm_intel_bo *bo;
>                 uint32_t *report0_32, *report1_32;
>                 uint64_t timestamp0_64, timestamp1_64;
> @@ -2592,8 +2842,10 @@ hsw_test_single_ctx_counters(void)
>                 bufmgr = drm_intel_bufmgr_gem_init(drm_fd, 4096);
>                 drm_intel_bufmgr_gem_enable_reuse(bufmgr);
>
> -               scratch_buf_init(bufmgr, &src, width, height, 0xff0000ff);
> -               scratch_buf_init(bufmgr, &dst, width, height, 0x00ff00ff);
> +               for (int i = 0; i < ARRAY_SIZE(src); i++) {
> +                       scratch_buf_init(bufmgr, &src[i], width, height, 0xff0000ff);
> +                       scratch_buf_init(bufmgr, &dst[i], width, height, 0x00ff00ff);
> +               }
>
>                 batch = intel_batchbuffer_alloc(bufmgr, devid);
>
> @@ -2627,14 +2879,19 @@ hsw_test_single_ctx_counters(void)
>                  */
>                 render_copy(batch,
>                             context0,
> -                           &src, 0, 0, width, height,
> -                           &dst, 0, 0);
> +                           &src[0], 0, 0, width, height,
> +                           &dst[0], 0, 0);
>
>                 ret = drm_intel_gem_context_get_id(context0, &ctx_id);
>                 igt_assert_eq(ret, 0);
>                 igt_assert_neq(ctx_id, 0xffffffff);
>                 properties[1] = ctx_id;
>
> +               intel_batchbuffer_flush_with_context(batch, context0);
> +
> +               scratch_buf_memset(src[0].bo, width, height, 0xff0000ff);
> +               scratch_buf_memset(dst[0].bo, width, height, 0x00ff00ff);
> +
>                 igt_debug("opening i915-perf stream\n");
>                 stream_fd = __perf_open(drm_fd, &param);
>
> @@ -2661,8 +2918,8 @@ hsw_test_single_ctx_counters(void)
>
>                 render_copy(batch,
>                             context0,
> -                           &src, 0, 0, width, height,
> -                           &dst, 0, 0);
> +                           &src[0], 0, 0, width, height,
> +                           &dst[0], 0, 0);
>
>                 /* Another redundant flush to clarify batch bo is free to reuse */
>                 intel_batchbuffer_flush_with_context(batch, context0);
> @@ -2673,13 +2930,13 @@ hsw_test_single_ctx_counters(void)
>                  */
>                 render_copy(batch,
>                             context1,
> -                           &src, 0, 0, width, height,
> -                           &dst, 0, 0);
> +                           &src[1], 0, 0, width, height,
> +                           &dst[1], 0, 0);
>
>                 render_copy(batch,
>                             context1,
> -                           &src, 0, 0, width, height,
> -                           &dst, 0, 0);
> +                           &src[2], 0, 0, width, height,
> +                           &dst[2], 0, 0);
>
>                 /* And another */
>                 intel_batchbuffer_flush_with_context(batch, context1);
> @@ -2708,6 +2965,7 @@ hsw_test_single_ctx_counters(void)
>
>                 /* A40 == N samples written to all render targets */
>                 n_samples_written = report1_32[43] - report0_32[43];
> +
>                 igt_debug("n samples written = %d\n", n_samples_written);
>                 igt_assert_eq(n_samples_written, width * height);
>
> @@ -2742,8 +3000,10 @@ hsw_test_single_ctx_counters(void)
>                         (delta_oa32_ns - delta_ts64_ns);
>                 igt_assert(delta_delta <= 320);
>
> -               drm_intel_bo_unreference(src.bo);
> -               drm_intel_bo_unreference(dst.bo);
> +               for (int i = 0; i < ARRAY_SIZE(src); i++) {
> +                       drm_intel_bo_unreference(src[i].bo);
> +                       drm_intel_bo_unreference(dst[i].bo);
> +               }
>
>                 drm_intel_bo_unmap(bo);
>                 drm_intel_bo_unreference(bo);
> @@ -2757,6 +3017,452 @@ hsw_test_single_ctx_counters(void)
>         igt_waitchildren();
>  }
>
> +/* Tests the INTEL_performance_query use case where an unprivileged process
> + * should be able to configure the OA unit for per-context metrics (for a
> + * context associated with that process' drm file descriptor) and the counters
> + * should only relate to that specific context.
> + *
> + * For Gen8+ although reports read via i915 perf can be filtered for a single
> + * context the counters themselves always progress as global/system-wide
> + * counters affected by all contexts. To support the INTEL_performance_query
> + * use case on Gen8+ it's necessary to combine OABUFFER and
> + * MI_REPORT_PERF_COUNT reports so that counter normalisation can take into
> + * account context-switch reports and factor out any counter progression not
> + * associated with the current context.
> + */
> +static void
> +gen8_test_single_ctx_render_target_writes_a_counter(void)
> +{
> +       int oa_exponent = max_oa_exponent_for_period_lte(1000000);
> +       uint64_t properties[] = {
> +               DRM_I915_PERF_PROP_CTX_HANDLE, UINT64_MAX, /* updated below */
> +
> +               /* Note: we have to specify at least one sample property even
> +                * though we aren't interested in samples in this case
> +                */
> +               DRM_I915_PERF_PROP_SAMPLE_OA, true,
> +
> +               /* OA unit configuration */
> +               DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
> +               DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
> +               DRM_I915_PERF_PROP_OA_EXPONENT, oa_exponent,
> +
> +               /* Note: no OA exponent specified in this case */
> +       };
> +       struct drm_i915_perf_open_param param = {
> +               .flags = I915_PERF_FLAG_FD_CLOEXEC,
> +               .num_properties = ARRAY_SIZE(properties) / 2,
> +               .properties_ptr = to_user_pointer(properties),
> +       };
> +       size_t format_size = oa_formats[test_oa_format].size;
> +       size_t sample_size = (sizeof(struct drm_i915_perf_record_header) +
> +                             format_size);
> +       int max_reports = (16 * 1024 * 1024) / format_size;
> +       int buf_size = sample_size * max_reports * 1.5;
> +       int child_ret;
> +       uint8_t *buf = malloc(buf_size);
> +       ssize_t len;
> +       struct igt_helper_process child = {};
> +
> +       /* should be default, but just to be sure... */
> +       write_u64_file("/proc/sys/dev/i915/perf_stream_paranoid", 1);
> +
> +       do {
> +
> +               igt_fork_helper(&child) {
> +                       struct drm_i915_perf_record_header *header;
> +                       drm_intel_bufmgr *bufmgr;
> +                       drm_intel_context *context0, *context1;
> +                       struct intel_batchbuffer *batch;
> +                       struct igt_buf src[3], dst[3];
> +                       drm_intel_bo *bo;
> +                       uint32_t *report0_32, *report1_32;
> +                       uint32_t *prev, *lprev = NULL;
> +                       uint64_t timestamp0_64, timestamp1_64;
> +                       uint32_t delta_ts64, delta_oa32;
> +                       uint64_t delta_ts64_ns, delta_oa32_ns;
> +                       uint32_t delta_delta;
> +                       int width = 800;
> +                       int height = 600;
> +                       uint32_t ctx_id = 0xffffffff; /* invalid handle */
> +                       uint32_t ctx1_id = 0xffffffff;  /* invalid handle */
> +                       uint32_t current_ctx_id = 0xffffffff;
> +                       uint32_t n_invalid_ctx = 0;
> +                       int ret;
> +                       struct accumulator accumulator = {
> +                               .format = test_oa_format
> +                       };
> +
> +                       bufmgr = drm_intel_bufmgr_gem_init(drm_fd, 4096);
> +                       drm_intel_bufmgr_gem_enable_reuse(bufmgr);
> +
> +                       for (int i = 0; i < ARRAY_SIZE(src); i++) {
> +                               scratch_buf_init(bufmgr, &src[i], width, height, 0xff0000ff);
> +                               scratch_buf_init(bufmgr, &dst[i], width, height, 0x00ff00ff);
> +                       }
> +
> +                       batch = intel_batchbuffer_alloc(bufmgr, devid);
> +
> +                       context0 = drm_intel_gem_context_create(bufmgr);
> +                       igt_assert(context0);
> +
> +                       context1 = drm_intel_gem_context_create(bufmgr);
> +                       igt_assert(context1);
> +
> +                       igt_debug("submitting warm up render_copy\n");
> +
> +                       /* Submit some early, unmeasured, work to the context we want
> +                        * to measure to try and catch issues with i915-perf
> +                        * initializing the HW context ID for filtering.
> +                        *
> +                        * We do this because i915-perf single context filtering had
> +                        * previously only relied on a hook into context pinning to
> +                        * initialize the HW context ID, instead of also trying to
> +                        * determine the HW ID while opening the stream, in case it
> +                        * has already been pinned.
> +                        *
> +                        * This wasn't noticed by the previous unit test because we
> +                        * were opening the stream while the context hadn't been
> +                        * touched or pinned yet and so it worked out correctly to wait
> +                        * for the pinning hook.
> +                        *
> +                        * Now a buggy version of i915-perf will fail to measure
> +                        * anything for context0 once this initial render_copy() ends
> +                        * up pinning the context since there won't ever be a pinning
> +                        * hook callback.
> +                        */
> +                       render_copy(batch,
> +                                   context0,
> +                                   &src[0], 0, 0, width, height,
> +                                   &dst[0], 0, 0);
> +
> +                       ret = drm_intel_gem_context_get_id(context0, &ctx_id);
> +                       igt_assert_eq(ret, 0);
> +                       igt_assert_neq(ctx_id, 0xffffffff);
> +                       properties[1] = ctx_id;
> +
> +                       scratch_buf_memset(src[0].bo, width, height, 0xff0000ff);
> +                       scratch_buf_memset(dst[0].bo, width, height, 0x00ff00ff);
> +
> +                       igt_debug("opening i915-perf stream\n");
> +                       stream_fd = __perf_open(drm_fd, &param);
> +
> +                       bo = drm_intel_bo_alloc(bufmgr, "mi_rpc dest bo", 4096, 64);
alignment=64 ?

> +
> +                       ret = drm_intel_bo_map(bo, true /* write enable */);
> +                       igt_assert_eq(ret, 0);
> +
> +                       memset(bo->virtual, 0x80, 4096);
> +                       drm_intel_bo_unmap(bo);
> +
> +                       emit_stall_timestamp_and_rpc(batch,
> +                                                    bo,
> +                                                    512 /* timestamp offset */,
> +                                                    0, /* report dst offset */
> +                                                    0xdeadbeef); /* report id */
> +
> +                       /* Explicitly flush here (even though the render_copy() call
> +                        * will itself flush before/after the copy) to clarify that
> +                        * that the PIPE_CONTROL + MI_RPC commands will be in a
> +                        * separate batch from the copy.
> +                        */
> +                       intel_batchbuffer_flush_with_context(batch, context0);
> +
> +                       render_copy(batch,
> +                                   context0,
> +                                   &src[0], 0, 0, width, height,
> +                                   &dst[0], 0, 0);
> +
> +                       /* Another redundant flush to clarify batch bo is free to reuse */
> +                       intel_batchbuffer_flush_with_context(batch, context0);
> +
> +                       /* submit two copies on the other context to avoid a false
> +                        * positive in case the driver somehow ended up filtering for
> +                        * context1
> +                        */
> +                       render_copy(batch,
> +                                   context1,
> +                                   &src[1], 0, 0, width, height,
> +                                   &dst[1], 0, 0);
> +
> +                       ret = drm_intel_gem_context_get_id(context1, &ctx1_id);
> +                       igt_assert_eq(ret, 0);
> +                       igt_assert_neq(ctx1_id, 0xffffffff);
> +
> +                       render_copy(batch,
> +                                   context1,
> +                                   &src[2], 0, 0, width, height,
> +                                   &dst[2], 0, 0);
> +
> +                       /* And another */
> +                       intel_batchbuffer_flush_with_context(batch, context1);
> +
> +                       emit_stall_timestamp_and_rpc(batch,
> +                                                    bo,
> +                                                    520 /* timestamp offset */,
> +                                                    256, /* report dst offset */
> +                                                    0xbeefbeef); /* report id */
> +
> +                       intel_batchbuffer_flush_with_context(batch, context1);
> +
> +                       ret = drm_intel_bo_map(bo, false /* write enable */);
> +                       igt_assert_eq(ret, 0);
> +
> +                       report0_32 = bo->virtual;
> +                       igt_assert_eq(report0_32[0], 0xdeadbeef); /* report ID */
> +                       igt_assert_neq(report0_32[1], 0); /* timestamp */
> +                       //report0_32[2] = 0xffffffff;
Commented out code, elsewhere also, I take it that this is not just
sentimental and is for debug purposes :)

Otherwise looks okay:
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* ✗ Fi.CI.BAT: failure for Improve robustness of the i915 perf tests (rev4)
  2017-10-04 11:19 [PATCH i-g-t v6 00/12] Improve robustness of the i915 perf tests Lionel Landwerlin
                   ` (11 preceding siblings ...)
  2017-10-04 11:19 ` [PATCH i-g-t v6 12/12] tests/perf: split array of formats descriptions Lionel Landwerlin
@ 2017-10-04 11:49 ` Patchwork
  2017-10-04 12:31 ` ✓ Fi.CI.BAT: success " Patchwork
  2017-10-04 13:27 ` ✗ Fi.CI.IGT: warning " Patchwork
  14 siblings, 0 replies; 24+ messages in thread
From: Patchwork @ 2017-10-04 11:49 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: Improve robustness of the i915 perf tests (rev4)
URL   : https://patchwork.freedesktop.org/series/28373/
State : failure

== Summary ==

IGT patchset tested on top of latest successful build
7c9dccb596b6467899c993a7df5d32574c1b89b9 Fix compilation on some distros

with latest DRM-Tip kernel build CI_DRM_3171
d8f7188551f3 drm-tip: 2017y-10m-04d-08h-46m-00s UTC integration manifest

Testlist changes:
+igt@perf@gen8-unprivileged-single-ctx-counters
+igt@perf@unprivileged-single-ctx-counters
-igt@perf@unprivileged-singled-ctx-counters

Test gem_exec_suspend:
        Subgroup basic-s4-devices:
                pass       -> DMESG-WARN (fi-kbl-7500u)
Test kms_busy:
        Subgroup basic-flip-b:
                dmesg-warn -> PASS       (fi-skl-6700hq) fdo#101518
        Subgroup basic-flip-c:
                dmesg-warn -> PASS       (fi-skl-6700k) fdo#103097
Test kms_pipe_crc_basic:
        Subgroup hang-read-crc-pipe-b:
                pass       -> INCOMPLETE (fi-cfl-s)

fdo#101518 https://bugs.freedesktop.org/show_bug.cgi?id=101518
fdo#103097 https://bugs.freedesktop.org/show_bug.cgi?id=103097

fi-bdw-5557u     total:289  pass:268  dwarn:0   dfail:0   fail:0   skip:21  time:458s
fi-bdw-gvtdvm    total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:481s
fi-blb-e6850     total:289  pass:224  dwarn:1   dfail:0   fail:0   skip:64  time:391s
fi-bsw-n3050     total:289  pass:243  dwarn:0   dfail:0   fail:0   skip:46  time:578s
fi-bwr-2160      total:289  pass:184  dwarn:0   dfail:0   fail:0   skip:105 time:291s
fi-bxt-dsi       total:289  pass:259  dwarn:0   dfail:0   fail:0   skip:30  time:535s
fi-bxt-j4205     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:541s
fi-byt-j1900     total:289  pass:254  dwarn:1   dfail:0   fail:0   skip:34  time:548s
fi-byt-n2820     total:289  pass:250  dwarn:1   dfail:0   fail:0   skip:38  time:534s
fi-cfl-s         total:231  pass:204  dwarn:1   dfail:0   fail:0   skip:25 
fi-cnl-y         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:640s
fi-elk-e7500     total:289  pass:230  dwarn:0   dfail:0   fail:0   skip:59  time:439s
fi-glk-1         total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:600s
fi-hsw-4770      total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:443s
fi-hsw-4770r     total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:426s
fi-ilk-650       total:289  pass:229  dwarn:0   dfail:0   fail:0   skip:60  time:469s
fi-ivb-3520m     total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:508s
fi-ivb-3770      total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:479s
fi-kbl-7500u     total:289  pass:263  dwarn:2   dfail:0   fail:0   skip:24  time:507s
fi-kbl-7560u     total:289  pass:270  dwarn:0   dfail:0   fail:0   skip:19  time:576s
fi-kbl-7567u     total:289  pass:265  dwarn:4   dfail:0   fail:0   skip:20  time:492s
fi-kbl-r         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:592s
fi-pnv-d510      total:289  pass:223  dwarn:1   dfail:0   fail:0   skip:65  time:662s
fi-skl-6260u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:475s
fi-skl-6700hq    total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:619s
fi-skl-6700k     total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:534s
fi-skl-6770hq    total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:526s
fi-skl-gvtdvm    total:289  pass:266  dwarn:0   dfail:0   fail:0   skip:23  time:486s
fi-snb-2520m     total:289  pass:251  dwarn:0   dfail:0   fail:0   skip:38  time:587s
fi-snb-2600      total:289  pass:250  dwarn:0   dfail:0   fail:0   skip:39  time:441s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_295/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH i-g-t v6 11/12] tests/perf: add support for Coffeelake
  2017-10-04 11:19 ` [PATCH i-g-t v6 11/12] tests/perf: add support for Coffeelake Lionel Landwerlin
@ 2017-10-04 11:51   ` Matthew Auld
  0 siblings, 0 replies; 24+ messages in thread
From: Matthew Auld @ 2017-10-04 11:51 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: Intel Graphics Development

On 4 October 2017 at 12:19, Lionel Landwerlin
<lionel.g.landwerlin@intel.com> wrote:
> Using the same timestamp frequency as Skylake/Kabylake.
>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH i-g-t v6 01/12] tests/perf: make stream_fd a global variable
  2017-10-04 11:19 ` [PATCH i-g-t v6 01/12] tests/perf: make stream_fd a global variable Lionel Landwerlin
@ 2017-10-04 12:08   ` Chris Wilson
  0 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2017-10-04 12:08 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2017-10-04 12:19:36)
> When debugging unstable tests on new platforms we currently we don't
> cleanup everything well in between different tests. Since only a
> single OA stream fd can be opened at a time, having the stream_fd as a
> global variable helps us cleanup the state between tests.
> 
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Reviewed-by: Matthew Auld <matthew.auld@intel.com>

Food for thought: launch each subtest in a process, fd will then be
reaped on process termination. That should be a good way to cleanup
local test resources?
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH i-g-t v6 12/12] tests/perf: split array of formats descriptions
  2017-10-04 11:19 ` [PATCH i-g-t v6 12/12] tests/perf: split array of formats descriptions Lionel Landwerlin
@ 2017-10-04 12:10   ` Matthew Auld
  0 siblings, 0 replies; 24+ messages in thread
From: Matthew Auld @ 2017-10-04 12:10 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: Intel Graphics Development

On 4 October 2017 at 12:19, Lionel Landwerlin
<lionel.g.landwerlin@intel.com> wrote:
> The I915_OA_FORMAT_C4_B8 format has different offset on Haswell &
> Gen8. Let's split the format lists so we don't mix them.
>
> Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* ✓ Fi.CI.BAT: success for Improve robustness of the i915 perf tests (rev4)
  2017-10-04 11:19 [PATCH i-g-t v6 00/12] Improve robustness of the i915 perf tests Lionel Landwerlin
                   ` (12 preceding siblings ...)
  2017-10-04 11:49 ` ✗ Fi.CI.BAT: failure for Improve robustness of the i915 perf tests (rev4) Patchwork
@ 2017-10-04 12:31 ` Patchwork
  2017-10-04 13:27 ` ✗ Fi.CI.IGT: warning " Patchwork
  14 siblings, 0 replies; 24+ messages in thread
From: Patchwork @ 2017-10-04 12:31 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: Improve robustness of the i915 perf tests (rev4)
URL   : https://patchwork.freedesktop.org/series/28373/
State : success

== Summary ==

IGT patchset tested on top of latest successful build
26d0da428f6d6029ad803821ddb6bbc6ff5765a5 lib/igt_kms: Disable crtc in legacy path when output is unset

with latest DRM-Tip kernel build CI_DRM_3171
d8f7188551f3 drm-tip: 2017y-10m-04d-08h-46m-00s UTC integration manifest

Testlist changes:
+igt@perf@gen8-unprivileged-single-ctx-counters
+igt@perf@unprivileged-single-ctx-counters
-igt@perf@unprivileged-singled-ctx-counters

Test kms_busy:
        Subgroup basic-flip-b:
                dmesg-warn -> PASS       (fi-skl-6700hq) fdo#101518
        Subgroup basic-flip-c:
                dmesg-warn -> PASS       (fi-skl-6700k) fdo#103097
Test drv_module_reload:
        Subgroup basic-no-display:
                dmesg-warn -> PASS       (fi-cfl-s) k.org#196765 +1

fdo#101518 https://bugs.freedesktop.org/show_bug.cgi?id=101518
fdo#103097 https://bugs.freedesktop.org/show_bug.cgi?id=103097
k.org#196765 https://bugzilla.kernel.org/show_bug.cgi?id=196765

fi-bdw-5557u     total:289  pass:268  dwarn:0   dfail:0   fail:0   skip:21  time:462s
fi-bdw-gvtdvm    total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:473s
fi-blb-e6850     total:289  pass:224  dwarn:1   dfail:0   fail:0   skip:64  time:398s
fi-bsw-n3050     total:289  pass:243  dwarn:0   dfail:0   fail:0   skip:46  time:574s
fi-bwr-2160      total:289  pass:184  dwarn:0   dfail:0   fail:0   skip:105 time:291s
fi-bxt-dsi       total:289  pass:259  dwarn:0   dfail:0   fail:0   skip:30  time:530s
fi-bxt-j4205     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:535s
fi-byt-j1900     total:289  pass:254  dwarn:1   dfail:0   fail:0   skip:34  time:549s
fi-byt-n2820     total:289  pass:250  dwarn:1   dfail:0   fail:0   skip:38  time:534s
fi-cfl-s         total:289  pass:256  dwarn:1   dfail:0   fail:0   skip:32  time:558s
fi-cnl-y         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:630s
fi-elk-e7500     total:289  pass:230  dwarn:0   dfail:0   fail:0   skip:59  time:443s
fi-glk-1         total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:598s
fi-hsw-4770      total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:453s
fi-hsw-4770r     total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:419s
fi-ilk-650       total:289  pass:229  dwarn:0   dfail:0   fail:0   skip:60  time:463s
fi-ivb-3520m     total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:511s
fi-ivb-3770      total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:480s
fi-kbl-7500u     total:289  pass:264  dwarn:1   dfail:0   fail:0   skip:24  time:503s
fi-kbl-7560u     total:289  pass:270  dwarn:0   dfail:0   fail:0   skip:19  time:579s
fi-kbl-7567u     total:289  pass:265  dwarn:4   dfail:0   fail:0   skip:20  time:493s
fi-kbl-r         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:590s
fi-pnv-d510      total:289  pass:223  dwarn:1   dfail:0   fail:0   skip:65  time:665s
fi-skl-6260u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:482s
fi-skl-6700hq    total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:623s
fi-skl-6700k     total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:535s
fi-skl-6770hq    total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:516s
fi-skl-gvtdvm    total:289  pass:266  dwarn:0   dfail:0   fail:0   skip:23  time:480s
fi-snb-2520m     total:289  pass:251  dwarn:0   dfail:0   fail:0   skip:38  time:600s
fi-snb-2600      total:289  pass:250  dwarn:0   dfail:0   fail:0   skip:39  time:439s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_296/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH i-g-t v6 02/12] tests/perf: add per context filtering test for gen8+
  2017-10-04 11:46   ` Matthew Auld
@ 2017-10-04 12:43     ` Lionel Landwerlin
  2017-10-04 13:05       ` Matthew Auld
  0 siblings, 1 reply; 24+ messages in thread
From: Lionel Landwerlin @ 2017-10-04 12:43 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Intel Graphics Development

On 04/10/17 12:46, Matthew Auld wrote:
> On 4 October 2017 at 12:19, Lionel Landwerlin
> <lionel.g.landwerlin@intel.com> wrote:
>> From: Robert Bragg <robert@sixbynine.org>
>>
>> Signed-off-by: Robert Bragg <robert@sixbynine.org>
>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> ---
>>   tests/perf.c | 777 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>   1 file changed, 745 insertions(+), 32 deletions(-)
>>
>> diff --git a/tests/perf.c b/tests/perf.c
>> index f89a235e..8644e252 100644
>> --- a/tests/perf.c
>> +++ b/tests/perf.c
>> @@ -48,7 +48,9 @@ IGT_TEST_DESCRIPTION("Test the i915 perf metrics streaming interface");
>>   #define OAREPORT_REASON_MASK           0x3f
>>   #define OAREPORT_REASON_SHIFT          19
>>   #define OAREPORT_REASON_TIMER          (1<<0)
>> +#define OAREPORT_REASON_INTERNAL       (3<<1)
>>   #define OAREPORT_REASON_CTX_SWITCH     (1<<3)
>> +#define OAREPORT_REASON_GO             (1<<4)
>>   #define OAREPORT_REASON_CLK_RATIO      (1<<5)
>>
>>   #define GFX_OP_PIPE_CONTROL     ((3 << 29) | (3 << 27) | (2 << 24))
>> @@ -574,6 +576,22 @@ oa_exponent_to_ns(int exponent)
>>          return 1000000000ULL * (2ULL << exponent) / timestamp_frequency;
>>   }
>>
>> +static bool
>> +oa_report_ctx_is_valid(uint32_t *report)
>> +{
>> +       if (IS_HASWELL(devid)) {
>> +               return false; /* TODO */
>> +       } else if (IS_GEN8(devid)) {
>> +               return report[0] & (1ul << 25);
>> +       } else if (IS_GEN9(devid)) {
>> +               return report[0] & (1ul << 16);
>> +       }
>> +
>> +       /* Need to update this function for newer Gen. */
>> +       igt_assert(!"reached");
>> +}
>> +
>> +
>>   static void
>>   hsw_sanity_check_render_basic_reports(uint32_t *oa_report0, uint32_t *oa_report1,
>>                                        enum drm_i915_oa_format fmt)
>> @@ -678,6 +696,100 @@ gen8_40bit_a_delta(uint64_t value0, uint64_t value1)
>>                  return value1 - value0;
>>   }
>>
>> +static void
>> +accumulate_uint32(size_t offset,
>> +                 uint32_t *report0,
>> +                  uint32_t *report1,
>> +                  uint64_t *delta)
>> +{
>> +       uint32_t value0 = *(uint32_t *)(((uint8_t *)report0) + offset);
>> +       uint32_t value1 = *(uint32_t *)(((uint8_t *)report1) + offset);
>> +
>> +       *delta += (uint32_t)(value1 - value0);
>> +}
>> +
>> +static void
>> +accumulate_uint40(int a_index,
>> +                  uint32_t *report0,
>> +                  uint32_t *report1,
>> +                 enum drm_i915_oa_format format,
>> +                  uint64_t *delta)
>> +{
>> +       uint64_t value0 = gen8_read_40bit_a_counter(report0, format, a_index),
>> +                value1 = gen8_read_40bit_a_counter(report1, format, a_index);
>> +
>> +       *delta += gen8_40bit_a_delta(value0, value1);
>> +}
>> +
>> +static void
>> +accumulate_reports(struct accumulator *accumulator,
>> +                  uint32_t *start,
>> +                  uint32_t *end)
>> +{
>> +       enum drm_i915_oa_format format = accumulator->format;
>> +       uint64_t *deltas = accumulator->deltas;
>> +       int idx = 0;
>> +
>> +       if (intel_gen(devid) >= 8) {
>> +               /* timestamp */
>> +               accumulate_uint32(4, start, end, deltas + idx++);
>> +
>> +               /* clock cycles */
>> +               accumulate_uint32(12, start, end, deltas + idx++);
>> +       } else {
>> +               /* timestamp */
>> +               accumulate_uint32(4, start, end, deltas + idx++);
>> +       }
>> +
>> +       for (int i = 0; i < oa_formats[format].n_a40; i++)
>> +               accumulate_uint40(i, start, end, format, deltas + idx++);
>> +
>> +       for (int i = 0; i < oa_formats[format].n_a; i++) {
>> +               accumulate_uint32(oa_formats[format].a_off + 4 * i,
>> +                                 start, end, deltas + idx++);
>> +       }
>> +
>> +       for (int i = 0; i < oa_formats[format].n_b; i++) {
>> +               accumulate_uint32(oa_formats[format].b_off + 4 * i,
>> +                                 start, end, deltas + idx++);
>> +       }
>> +
>> +       for (int i = 0; i < oa_formats[format].n_c; i++) {
>> +               accumulate_uint32(oa_formats[format].c_off + 4 * i,
>> +                                 start, end, deltas + idx++);
>> +       }
>> +}
>> +
>> +static void
>> +accumulator_print(struct accumulator *accumulator, const char *title)
>> +{
>> +       enum drm_i915_oa_format format = accumulator->format;
>> +       uint64_t *deltas = accumulator->deltas;
>> +       int idx = 0;
>> +
>> +       igt_debug("%s:\n", title);
>> +       if (intel_gen(devid) >= 8) {
>> +               igt_debug("\ttime delta = %lu\n", deltas[idx++]);
>> +               igt_debug("\tclock cycle delta = %lu\n", deltas[idx++]);
>> +
>> +               for (int i = 0; i < oa_formats[format].n_a40; i++)
>> +                       igt_debug("\tA%u = %lu\n", i, deltas[idx++]);
>> +       } else {
>> +               igt_debug("\ttime delta = %lu\n", deltas[idx++]);
>> +       }
>> +
>> +       for (int i = 0; i < oa_formats[format].n_a; i++) {
>> +               int a_id = oa_formats[format].first_a + i;
>> +               igt_debug("\tA%u = %lu\n", a_id, deltas[idx++]);
>> +       }
>> +
>> +       for (int i = 0; i < oa_formats[format].n_a; i++)
>> +               igt_debug("\tB%u = %lu\n", i, deltas[idx++]);
>> +
>> +       for (int i = 0; i < oa_formats[format].n_c; i++)
>> +               igt_debug("\tC%u = %lu\n", i, deltas[idx++]);
>> +}
>> +
>>   /* The TestOa metric set is designed so */
>>   static void
>>   gen8_sanity_check_test_oa_reports(uint32_t *oa_report0, uint32_t *oa_report1,
>> @@ -944,6 +1056,62 @@ gt_frequency_range_restore(void)
>>          gt_max_freq_mhz = gt_max_freq_mhz_saved;
>>   }
>>
>> +static int
>> +i915_read_reports_until_timestamp(enum drm_i915_oa_format oa_format,
>> +                                 uint8_t *buf,
>> +                                 uint32_t max_size,
>> +                                 uint32_t start_timestamp,
>> +                                 uint32_t end_timestamp)
>> +{
>> +       size_t format_size = oa_formats[oa_format].size;
>> +       uint32_t last_seen_timestamp = start_timestamp;
>> +       int total_len = 0;
>> +
>> +       while (last_seen_timestamp < end_timestamp) {
>> +               int offset, len;
>> +
>> +               /* Running out of space. */
>> +               if ((max_size - total_len) < format_size) {
>> +                       igt_warn("run out of space before reaching "
>> +                                "end timestamp (%u/%u)\n",
>> +                                last_seen_timestamp, end_timestamp);
>> +                       return -1;
>> +               }
>> +
>> +               while ((len = read(stream_fd, &buf[total_len],
>> +                                  max_size - total_len)) < 0 &&
>> +                      errno == EINTR)
>> +                       ;
>> +
>> +               /* Intentionally return an error. */
>> +               if (len <= 0) {
>> +                       if (errno == EAGAIN)
>> +                               return total_len;
>> +                       else {
>> +                               igt_warn("error read OA stream : %i\n", errno);
>> +                               return -1;
>> +                       }
>> +               }
>> +
>> +               offset = total_len;
>> +               total_len += len;
>> +
>> +               while (offset < total_len) {
>> +                 const struct drm_i915_perf_record_header *header =
>> +                   (const struct drm_i915_perf_record_header *) &buf[offset];
>> +                 uint32_t *report = (uint32_t *) (header + 1);
>> +
>> +                 if (header->type == DRM_I915_PERF_RECORD_SAMPLE)
>> +                   last_seen_timestamp = report[1];
>> +
>> +                 offset += header->size;
>> +               }
> Formatting looks off.

Fixing.

>
>> +       }
>> +
>> +       return total_len;
>> +}
>> +
>> +
>>   /* CAP_SYS_ADMIN is required to open system wide metrics, unless the system
>>    * control parameter dev.i915.perf_stream_paranoid == 0 */
>>   static void
>> @@ -1362,6 +1530,70 @@ print_reports(uint32_t *oa_report0, uint32_t *oa_report1, int fmt)
>>          }
>>   }
>>
>> +/* Debug function, only useful when reports don't make sense. */
>> +#if 0
>> +static void
>> +print_report(uint32_t *report, int fmt)
>> +{
>> +       igt_debug("TIMESTAMP: %"PRIu32"\n", report[1]);
>> +
>> +       if (IS_HASWELL(devid) && oa_formats[fmt].n_c == 0) {
>> +               igt_debug("CLOCK = N/A\n");
>> +       } else {
>> +               uint32_t clock = read_report_ticks(report, fmt);
>> +
>> +               igt_debug("CLOCK: %"PRIu32"\n", clock);
>> +       }
>> +
>> +       if (intel_gen(devid) >= 8) {
>> +               uint32_t slice_freq, unslice_freq;
>> +               const char *reason = gen8_read_report_reason(report);
>> +
>> +               gen8_read_report_clock_ratios(report, &slice_freq, &unslice_freq);
>> +
>> +               igt_debug("SLICE CLK: %umhz\n", slice_freq);
>> +               igt_debug("UNSLICE CLK: %umhz\n", unslice_freq);
>> +               igt_debug("REASON: \"%s\"\n", reason);
>> +               igt_debug("CTX ID: %"PRIu32"/%"PRIx32"\n", report[2], report[2]);
>> +       }
>> +
>> +       /* Gen8+ has some 40bit A counters... */
>> +       for (int j = 0; j < oa_formats[fmt].n_a40; j++) {
>> +               uint64_t value = gen8_read_40bit_a_counter(report, fmt, j);
>> +
>> +               if (undefined_a_counters[j])
>> +                       continue;
>> +
>> +               igt_debug("A%d: %"PRIu64"\n", j, value);
>> +       }
>> +
>> +       for (int j = 0; j < oa_formats[fmt].n_a; j++) {
>> +               uint32_t *a = (uint32_t *)(((uint8_t *)report) +
>> +                                          oa_formats[fmt].a_off);
>> +               int a_id = oa_formats[fmt].first_a + j;
>> +
>> +               if (undefined_a_counters[a_id])
>> +                       continue;
>> +
>> +               igt_debug("A%d: %"PRIu32"\n", a_id, a[j]);
>> +       }
>> +
>> +       for (int j = 0; j < oa_formats[fmt].n_b; j++) {
>> +               uint32_t *b = (uint32_t *)(((uint8_t *)report) +
>> +                                          oa_formats[fmt].b_off);
>> +
>> +               igt_debug("B%d: %"PRIu32"\n", j, b[j]);
>> +       }
>> +
>> +       for (int j = 0; j < oa_formats[fmt].n_c; j++) {
>> +               uint32_t *c = (uint32_t *)(((uint8_t *)report) +
>> +                                          oa_formats[fmt].c_off);
>> +
>> +               igt_debug("C%d: %"PRIu32"\n", j, c[j]);
>> +       }
>> +}
>> +#endif
>> +
>>   static void
>>   test_oa_formats(void)
>>   {
>> @@ -2486,14 +2718,8 @@ test_mi_rpc(void)
>>   }
>>
>>   static void
>> -scratch_buf_init(drm_intel_bufmgr *bufmgr,
>> -                struct igt_buf *buf,
>> -                int width, int height,
>> -                uint32_t color)
>> +scratch_buf_memset(drm_intel_bo *bo, int width, int height, uint32_t color)
>>   {
>> -       size_t stride = width * 4;
>> -       size_t size = stride * height;
>> -       drm_intel_bo *bo = drm_intel_bo_alloc(bufmgr, "", size, 4096);
>>          int ret;
>>
>>          ret = drm_intel_bo_map(bo, true /* writable */);
>> @@ -2503,6 +2729,19 @@ scratch_buf_init(drm_intel_bufmgr *bufmgr,
>>                  ((uint32_t *)bo->virtual)[i] = color;
>>
>>          drm_intel_bo_unmap(bo);
>> +}
>> +
>> +static void
>> +scratch_buf_init(drm_intel_bufmgr *bufmgr,
>> +                struct igt_buf *buf,
>> +                int width, int height,
>> +                uint32_t color)
>> +{
>> +       size_t stride = width * 4;
>> +       size_t size = stride * height;
>> +       drm_intel_bo *bo = drm_intel_bo_alloc(bufmgr, "", size, 4096);
>> +
>> +       scratch_buf_memset(bo, width, height, color);
>>
>>          buf->bo = bo;
>>          buf->stride = stride;
>> @@ -2521,14 +2760,25 @@ emit_stall_timestamp_and_rpc(struct intel_batchbuffer *batch,
>>                                     PIPE_CONTROL_RENDER_TARGET_FLUSH |
>>                                     PIPE_CONTROL_WRITE_TIMESTAMP);
>>
>> -       BEGIN_BATCH(5, 1);
>> -       OUT_BATCH(GFX_OP_PIPE_CONTROL | (5 - 2));
>> -       OUT_BATCH(pipe_ctl_flags);
>> -       OUT_RELOC(dst, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
>> -                 timestamp_offset);
>> -       OUT_BATCH(0); /* imm lower */
>> -       OUT_BATCH(0); /* imm upper */
>> -       ADVANCE_BATCH();
>> +       if (intel_gen(devid) >= 8) {
>> +               BEGIN_BATCH(5, 1);
>> +               OUT_BATCH(GFX_OP_PIPE_CONTROL | (6 - 2));
>> +               OUT_BATCH(pipe_ctl_flags);
>> +               OUT_RELOC(dst, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
>> +                         timestamp_offset);
>> +               OUT_BATCH(0); /* imm lower */
>> +               OUT_BATCH(0); /* imm upper */
>> +               ADVANCE_BATCH();
>> +       } else {
>> +               BEGIN_BATCH(5, 1);
>> +               OUT_BATCH(GFX_OP_PIPE_CONTROL | (5 - 2));
>> +               OUT_BATCH(pipe_ctl_flags);
>> +               OUT_RELOC(dst, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
>> +                         timestamp_offset);
>> +               OUT_BATCH(0); /* imm lower */
>> +               OUT_BATCH(0); /* imm upper */
>> +               ADVANCE_BATCH();
>> +       }
>>
>>          emit_report_perf_count(batch, dst, report_dst_offset, report_id);
>>   }
>> @@ -2574,7 +2824,7 @@ hsw_test_single_ctx_counters(void)
>>                  drm_intel_bufmgr *bufmgr;
>>                  drm_intel_context *context0, *context1;
>>                  struct intel_batchbuffer *batch;
>> -               struct igt_buf src, dst;
>> +               struct igt_buf src[3], dst[3];
>>                  drm_intel_bo *bo;
>>                  uint32_t *report0_32, *report1_32;
>>                  uint64_t timestamp0_64, timestamp1_64;
>> @@ -2592,8 +2842,10 @@ hsw_test_single_ctx_counters(void)
>>                  bufmgr = drm_intel_bufmgr_gem_init(drm_fd, 4096);
>>                  drm_intel_bufmgr_gem_enable_reuse(bufmgr);
>>
>> -               scratch_buf_init(bufmgr, &src, width, height, 0xff0000ff);
>> -               scratch_buf_init(bufmgr, &dst, width, height, 0x00ff00ff);
>> +               for (int i = 0; i < ARRAY_SIZE(src); i++) {
>> +                       scratch_buf_init(bufmgr, &src[i], width, height, 0xff0000ff);
>> +                       scratch_buf_init(bufmgr, &dst[i], width, height, 0x00ff00ff);
>> +               }
>>
>>                  batch = intel_batchbuffer_alloc(bufmgr, devid);
>>
>> @@ -2627,14 +2879,19 @@ hsw_test_single_ctx_counters(void)
>>                   */
>>                  render_copy(batch,
>>                              context0,
>> -                           &src, 0, 0, width, height,
>> -                           &dst, 0, 0);
>> +                           &src[0], 0, 0, width, height,
>> +                           &dst[0], 0, 0);
>>
>>                  ret = drm_intel_gem_context_get_id(context0, &ctx_id);
>>                  igt_assert_eq(ret, 0);
>>                  igt_assert_neq(ctx_id, 0xffffffff);
>>                  properties[1] = ctx_id;
>>
>> +               intel_batchbuffer_flush_with_context(batch, context0);
>> +
>> +               scratch_buf_memset(src[0].bo, width, height, 0xff0000ff);
>> +               scratch_buf_memset(dst[0].bo, width, height, 0x00ff00ff);
>> +
>>                  igt_debug("opening i915-perf stream\n");
>>                  stream_fd = __perf_open(drm_fd, &param);
>>
>> @@ -2661,8 +2918,8 @@ hsw_test_single_ctx_counters(void)
>>
>>                  render_copy(batch,
>>                              context0,
>> -                           &src, 0, 0, width, height,
>> -                           &dst, 0, 0);
>> +                           &src[0], 0, 0, width, height,
>> +                           &dst[0], 0, 0);
>>
>>                  /* Another redundant flush to clarify batch bo is free to reuse */
>>                  intel_batchbuffer_flush_with_context(batch, context0);
>> @@ -2673,13 +2930,13 @@ hsw_test_single_ctx_counters(void)
>>                   */
>>                  render_copy(batch,
>>                              context1,
>> -                           &src, 0, 0, width, height,
>> -                           &dst, 0, 0);
>> +                           &src[1], 0, 0, width, height,
>> +                           &dst[1], 0, 0);
>>
>>                  render_copy(batch,
>>                              context1,
>> -                           &src, 0, 0, width, height,
>> -                           &dst, 0, 0);
>> +                           &src[2], 0, 0, width, height,
>> +                           &dst[2], 0, 0);
>>
>>                  /* And another */
>>                  intel_batchbuffer_flush_with_context(batch, context1);
>> @@ -2708,6 +2965,7 @@ hsw_test_single_ctx_counters(void)
>>
>>                  /* A40 == N samples written to all render targets */
>>                  n_samples_written = report1_32[43] - report0_32[43];
>> +
>>                  igt_debug("n samples written = %d\n", n_samples_written);
>>                  igt_assert_eq(n_samples_written, width * height);
>>
>> @@ -2742,8 +3000,10 @@ hsw_test_single_ctx_counters(void)
>>                          (delta_oa32_ns - delta_ts64_ns);
>>                  igt_assert(delta_delta <= 320);
>>
>> -               drm_intel_bo_unreference(src.bo);
>> -               drm_intel_bo_unreference(dst.bo);
>> +               for (int i = 0; i < ARRAY_SIZE(src); i++) {
>> +                       drm_intel_bo_unreference(src[i].bo);
>> +                       drm_intel_bo_unreference(dst[i].bo);
>> +               }
>>
>>                  drm_intel_bo_unmap(bo);
>>                  drm_intel_bo_unreference(bo);
>> @@ -2757,6 +3017,452 @@ hsw_test_single_ctx_counters(void)
>>          igt_waitchildren();
>>   }
>>
>> +/* Tests the INTEL_performance_query use case where an unprivileged process
>> + * should be able to configure the OA unit for per-context metrics (for a
>> + * context associated with that process' drm file descriptor) and the counters
>> + * should only relate to that specific context.
>> + *
>> + * For Gen8+ although reports read via i915 perf can be filtered for a single
>> + * context the counters themselves always progress as global/system-wide
>> + * counters affected by all contexts. To support the INTEL_performance_query
>> + * use case on Gen8+ it's necessary to combine OABUFFER and
>> + * MI_REPORT_PERF_COUNT reports so that counter normalisation can take into
>> + * account context-switch reports and factor out any counter progression not
>> + * associated with the current context.
>> + */
>> +static void
>> +gen8_test_single_ctx_render_target_writes_a_counter(void)
>> +{
>> +       int oa_exponent = max_oa_exponent_for_period_lte(1000000);
>> +       uint64_t properties[] = {
>> +               DRM_I915_PERF_PROP_CTX_HANDLE, UINT64_MAX, /* updated below */
>> +
>> +               /* Note: we have to specify at least one sample property even
>> +                * though we aren't interested in samples in this case
>> +                */
>> +               DRM_I915_PERF_PROP_SAMPLE_OA, true,
>> +
>> +               /* OA unit configuration */
>> +               DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
>> +               DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
>> +               DRM_I915_PERF_PROP_OA_EXPONENT, oa_exponent,
>> +
>> +               /* Note: no OA exponent specified in this case */
>> +       };
>> +       struct drm_i915_perf_open_param param = {
>> +               .flags = I915_PERF_FLAG_FD_CLOEXEC,
>> +               .num_properties = ARRAY_SIZE(properties) / 2,
>> +               .properties_ptr = to_user_pointer(properties),
>> +       };
>> +       size_t format_size = oa_formats[test_oa_format].size;
>> +       size_t sample_size = (sizeof(struct drm_i915_perf_record_header) +
>> +                             format_size);
>> +       int max_reports = (16 * 1024 * 1024) / format_size;
>> +       int buf_size = sample_size * max_reports * 1.5;
>> +       int child_ret;
>> +       uint8_t *buf = malloc(buf_size);
>> +       ssize_t len;
>> +       struct igt_helper_process child = {};
>> +
>> +       /* should be default, but just to be sure... */
>> +       write_u64_file("/proc/sys/dev/i915/perf_stream_paranoid", 1);
>> +
>> +       do {
>> +
>> +               igt_fork_helper(&child) {
>> +                       struct drm_i915_perf_record_header *header;
>> +                       drm_intel_bufmgr *bufmgr;
>> +                       drm_intel_context *context0, *context1;
>> +                       struct intel_batchbuffer *batch;
>> +                       struct igt_buf src[3], dst[3];
>> +                       drm_intel_bo *bo;
>> +                       uint32_t *report0_32, *report1_32;
>> +                       uint32_t *prev, *lprev = NULL;
>> +                       uint64_t timestamp0_64, timestamp1_64;
>> +                       uint32_t delta_ts64, delta_oa32;
>> +                       uint64_t delta_ts64_ns, delta_oa32_ns;
>> +                       uint32_t delta_delta;
>> +                       int width = 800;
>> +                       int height = 600;
>> +                       uint32_t ctx_id = 0xffffffff; /* invalid handle */
>> +                       uint32_t ctx1_id = 0xffffffff;  /* invalid handle */
>> +                       uint32_t current_ctx_id = 0xffffffff;
>> +                       uint32_t n_invalid_ctx = 0;
>> +                       int ret;
>> +                       struct accumulator accumulator = {
>> +                               .format = test_oa_format
>> +                       };
>> +
>> +                       bufmgr = drm_intel_bufmgr_gem_init(drm_fd, 4096);
>> +                       drm_intel_bufmgr_gem_enable_reuse(bufmgr);
>> +
>> +                       for (int i = 0; i < ARRAY_SIZE(src); i++) {
>> +                               scratch_buf_init(bufmgr, &src[i], width, height, 0xff0000ff);
>> +                               scratch_buf_init(bufmgr, &dst[i], width, height, 0x00ff00ff);
>> +                       }
>> +
>> +                       batch = intel_batchbuffer_alloc(bufmgr, devid);
>> +
>> +                       context0 = drm_intel_gem_context_create(bufmgr);
>> +                       igt_assert(context0);
>> +
>> +                       context1 = drm_intel_gem_context_create(bufmgr);
>> +                       igt_assert(context1);
>> +
>> +                       igt_debug("submitting warm up render_copy\n");
>> +
>> +                       /* Submit some early, unmeasured, work to the context we want
>> +                        * to measure to try and catch issues with i915-perf
>> +                        * initializing the HW context ID for filtering.
>> +                        *
>> +                        * We do this because i915-perf single context filtering had
>> +                        * previously only relied on a hook into context pinning to
>> +                        * initialize the HW context ID, instead of also trying to
>> +                        * determine the HW ID while opening the stream, in case it
>> +                        * has already been pinned.
>> +                        *
>> +                        * This wasn't noticed by the previous unit test because we
>> +                        * were opening the stream while the context hadn't been
>> +                        * touched or pinned yet and so it worked out correctly to wait
>> +                        * for the pinning hook.
>> +                        *
>> +                        * Now a buggy version of i915-perf will fail to measure
>> +                        * anything for context0 once this initial render_copy() ends
>> +                        * up pinning the context since there won't ever be a pinning
>> +                        * hook callback.
>> +                        */
>> +                       render_copy(batch,
>> +                                   context0,
>> +                                   &src[0], 0, 0, width, height,
>> +                                   &dst[0], 0, 0);
>> +
>> +                       ret = drm_intel_gem_context_get_id(context0, &ctx_id);
>> +                       igt_assert_eq(ret, 0);
>> +                       igt_assert_neq(ctx_id, 0xffffffff);
>> +                       properties[1] = ctx_id;
>> +
>> +                       scratch_buf_memset(src[0].bo, width, height, 0xff0000ff);
>> +                       scratch_buf_memset(dst[0].bo, width, height, 0x00ff00ff);
>> +
>> +                       igt_debug("opening i915-perf stream\n");
>> +                       stream_fd = __perf_open(drm_fd, &param);
>> +
>> +                       bo = drm_intel_bo_alloc(bufmgr, "mi_rpc dest bo", 4096, 64);
> alignment=64 ?

Alignment requirement for MI_RPC are lower than with surfaces and indeed 
64bytes.

>
>> +
>> +                       ret = drm_intel_bo_map(bo, true /* write enable */);
>> +                       igt_assert_eq(ret, 0);
>> +
>> +                       memset(bo->virtual, 0x80, 4096);
>> +                       drm_intel_bo_unmap(bo);
>> +
>> +                       emit_stall_timestamp_and_rpc(batch,
>> +                                                    bo,
>> +                                                    512 /* timestamp offset */,
>> +                                                    0, /* report dst offset */
>> +                                                    0xdeadbeef); /* report id */
>> +
>> +                       /* Explicitly flush here (even though the render_copy() call
>> +                        * will itself flush before/after the copy) to clarify that
>> +                        * that the PIPE_CONTROL + MI_RPC commands will be in a
>> +                        * separate batch from the copy.
>> +                        */
>> +                       intel_batchbuffer_flush_with_context(batch, context0);
>> +
>> +                       render_copy(batch,
>> +                                   context0,
>> +                                   &src[0], 0, 0, width, height,
>> +                                   &dst[0], 0, 0);
>> +
>> +                       /* Another redundant flush to clarify batch bo is free to reuse */
>> +                       intel_batchbuffer_flush_with_context(batch, context0);
>> +
>> +                       /* submit two copies on the other context to avoid a false
>> +                        * positive in case the driver somehow ended up filtering for
>> +                        * context1
>> +                        */
>> +                       render_copy(batch,
>> +                                   context1,
>> +                                   &src[1], 0, 0, width, height,
>> +                                   &dst[1], 0, 0);
>> +
>> +                       ret = drm_intel_gem_context_get_id(context1, &ctx1_id);
>> +                       igt_assert_eq(ret, 0);
>> +                       igt_assert_neq(ctx1_id, 0xffffffff);
>> +
>> +                       render_copy(batch,
>> +                                   context1,
>> +                                   &src[2], 0, 0, width, height,
>> +                                   &dst[2], 0, 0);
>> +
>> +                       /* And another */
>> +                       intel_batchbuffer_flush_with_context(batch, context1);
>> +
>> +                       emit_stall_timestamp_and_rpc(batch,
>> +                                                    bo,
>> +                                                    520 /* timestamp offset */,
>> +                                                    256, /* report dst offset */
>> +                                                    0xbeefbeef); /* report id */
>> +
>> +                       intel_batchbuffer_flush_with_context(batch, context1);
>> +
>> +                       ret = drm_intel_bo_map(bo, false /* write enable */);
>> +                       igt_assert_eq(ret, 0);
>> +
>> +                       report0_32 = bo->virtual;
>> +                       igt_assert_eq(report0_32[0], 0xdeadbeef); /* report ID */
>> +                       igt_assert_neq(report0_32[1], 0); /* timestamp */
>> +                       //report0_32[2] = 0xffffffff;
> Commented out code, elsewhere also, I take it that this is not just
> sentimental and is for debug purposes :)

Nah, sorry, cleaning this up.

> Otherwise looks okay: Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Thank you so much!

-
Lionel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH i-g-t v6 02/12] tests/perf: add per context filtering test for gen8+
  2017-10-04 12:43     ` Lionel Landwerlin
@ 2017-10-04 13:05       ` Matthew Auld
  2017-10-04 13:51         ` Lionel Landwerlin
  0 siblings, 1 reply; 24+ messages in thread
From: Matthew Auld @ 2017-10-04 13:05 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: Intel Graphics Development

On 4 October 2017 at 13:43, Lionel Landwerlin
<lionel.g.landwerlin@intel.com> wrote:
> On 04/10/17 12:46, Matthew Auld wrote:
>>
>> On 4 October 2017 at 12:19, Lionel Landwerlin
>> <lionel.g.landwerlin@intel.com> wrote:
>>>
>>> From: Robert Bragg <robert@sixbynine.org>
>>>
>>> Signed-off-by: Robert Bragg <robert@sixbynine.org>
>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

<SNIP>

>>> +                       bo = drm_intel_bo_alloc(bufmgr, "mi_rpc dest bo",
>>> 4096, 64);
>>
>> alignment=64 ?
>
>
> Alignment requirement for MI_RPC are lower than with surfaces and indeed
> 64bytes.

The minimum gtt alignment is 4K, so specifying 64bytes doesn't make sense.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* ✗ Fi.CI.IGT: warning for Improve robustness of the i915 perf tests (rev4)
  2017-10-04 11:19 [PATCH i-g-t v6 00/12] Improve robustness of the i915 perf tests Lionel Landwerlin
                   ` (13 preceding siblings ...)
  2017-10-04 12:31 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2017-10-04 13:27 ` Patchwork
  14 siblings, 0 replies; 24+ messages in thread
From: Patchwork @ 2017-10-04 13:27 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: Improve robustness of the i915 perf tests (rev4)
URL   : https://patchwork.freedesktop.org/series/28373/
State : warning

== Summary ==

Test kms_draw_crc:
        Subgroup draw-method-xrgb8888-mmap-cpu-untiled:
                skip       -> PASS       (shard-hsw)
Test gem_eio:
        Subgroup in-flight:
                dmesg-warn -> PASS       (shard-hsw) fdo#102886
Test kms_frontbuffer_tracking:
        Subgroup fbc-1p-primscrn-spr-indfb-draw-blt:
                skip       -> PASS       (shard-hsw)
Test kms_cursor_legacy:
        Subgroup pipe-F-forked-move:
                incomplete -> SKIP       (shard-hsw) fdo#102332 +1
        Subgroup cursorA-vs-flipA-atomic:
                skip       -> PASS       (shard-hsw)
Test perf:
        Subgroup blocking:
                fail       -> PASS       (shard-hsw) fdo#102252
        Subgroup oa-exponents:
                fail       -> PASS       (shard-hsw) fdo#102254
Test prime_mmap:
        Subgroup test_userptr:
                pass       -> DMESG-WARN (shard-hsw)

fdo#102886 https://bugs.freedesktop.org/show_bug.cgi?id=102886
fdo#102332 https://bugs.freedesktop.org/show_bug.cgi?id=102332
fdo#102252 https://bugs.freedesktop.org/show_bug.cgi?id=102252
fdo#102254 https://bugs.freedesktop.org/show_bug.cgi?id=102254

shard-hsw        total:2430 pass:1334 dwarn:6   dfail:0   fail:6   skip:1084 time:10067s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_296/shards.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH i-g-t v6 02/12] tests/perf: add per context filtering test for gen8+
  2017-10-04 13:05       ` Matthew Auld
@ 2017-10-04 13:51         ` Lionel Landwerlin
  2017-10-04 13:57           ` Chris Wilson
  0 siblings, 1 reply; 24+ messages in thread
From: Lionel Landwerlin @ 2017-10-04 13:51 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Intel Graphics Development

On 04/10/17 14:05, Matthew Auld wrote:
> On 4 October 2017 at 13:43, Lionel Landwerlin
> <lionel.g.landwerlin@intel.com> wrote:
>> On 04/10/17 12:46, Matthew Auld wrote:
>>> On 4 October 2017 at 12:19, Lionel Landwerlin
>>> <lionel.g.landwerlin@intel.com> wrote:
>>>> From: Robert Bragg <robert@sixbynine.org>
>>>>
>>>> Signed-off-by: Robert Bragg <robert@sixbynine.org>
>>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> <SNIP>
>
>>>> +                       bo = drm_intel_bo_alloc(bufmgr, "mi_rpc dest bo",
>>>> 4096, 64);
>>> alignment=64 ?
>>
>> Alignment requirement for MI_RPC are lower than with surfaces and indeed
>> 64bytes.
> The minimum gtt alignment is 4K, so specifying 64bytes doesn't make sense.
>
Same can be said about an allocation of 4096 with alignement of 4096.

I can send a fix to set all of those to 0.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH i-g-t v6 02/12] tests/perf: add per context filtering test for gen8+
  2017-10-04 13:51         ` Lionel Landwerlin
@ 2017-10-04 13:57           ` Chris Wilson
  0 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2017-10-04 13:57 UTC (permalink / raw)
  To: Lionel Landwerlin, Matthew Auld; +Cc: Intel Graphics Development

Quoting Lionel Landwerlin (2017-10-04 14:51:18)
> On 04/10/17 14:05, Matthew Auld wrote:
> > On 4 October 2017 at 13:43, Lionel Landwerlin
> > <lionel.g.landwerlin@intel.com> wrote:
> >> On 04/10/17 12:46, Matthew Auld wrote:
> >>> On 4 October 2017 at 12:19, Lionel Landwerlin
> >>> <lionel.g.landwerlin@intel.com> wrote:
> >>>> From: Robert Bragg <robert@sixbynine.org>
> >>>>
> >>>> Signed-off-by: Robert Bragg <robert@sixbynine.org>
> >>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> > <SNIP>
> >
> >>>> +                       bo = drm_intel_bo_alloc(bufmgr, "mi_rpc dest bo",
> >>>> 4096, 64);
> >>> alignment=64 ?
> >>
> >> Alignment requirement for MI_RPC are lower than with surfaces and indeed
> >> 64bytes.
> > The minimum gtt alignment is 4K, so specifying 64bytes doesn't make sense.
> >
> Same can be said about an allocation of 4096 with alignement of 4096.
> 
> I can send a fix to set all of those to 0.

It still has some merit as pure documentation, even if it will be
converted to an alignment of 0 by the kernel. That will be useful if
instead of allocating a whole bo, you wish to do suballocations.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2017-10-04 13:57 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-04 11:19 [PATCH i-g-t v6 00/12] Improve robustness of the i915 perf tests Lionel Landwerlin
2017-10-04 11:19 ` [PATCH i-g-t v6 01/12] tests/perf: make stream_fd a global variable Lionel Landwerlin
2017-10-04 12:08   ` Chris Wilson
2017-10-04 11:19 ` [PATCH i-g-t v6 02/12] tests/perf: add per context filtering test for gen8+ Lionel Landwerlin
2017-10-04 11:46   ` Matthew Auld
2017-10-04 12:43     ` Lionel Landwerlin
2017-10-04 13:05       ` Matthew Auld
2017-10-04 13:51         ` Lionel Landwerlin
2017-10-04 13:57           ` Chris Wilson
2017-10-04 11:19 ` [PATCH i-g-t v6 03/12] tests/perf: update max buffer size for reading reports Lionel Landwerlin
2017-10-04 11:19 ` [PATCH i-g-t v6 04/12] tests/perf: rc6: try to guess when rc6 is disabled Lionel Landwerlin
2017-10-04 11:19 ` [PATCH i-g-t v6 05/12] tests/perf: remove frequency related changes Lionel Landwerlin
2017-10-04 11:19 ` [PATCH i-g-t v6 06/12] tests/perf: rework oa-exponent test Lionel Landwerlin
2017-10-04 11:19 ` [PATCH i-g-t v6 07/12] tests/perf: make enable-disable more reliable Lionel Landwerlin
2017-10-04 11:19 ` [PATCH i-g-t v6 08/12] tests/perf: make buffer-fill " Lionel Landwerlin
2017-10-04 11:19 ` [PATCH i-g-t v6 09/12] tests/perf: estimate number of blocking/polling based on time spent Lionel Landwerlin
2017-10-04 11:19 ` [PATCH i-g-t v6 10/12] tests/perf: prevent power management to kick in when necessary Lionel Landwerlin
2017-10-04 11:19 ` [PATCH i-g-t v6 11/12] tests/perf: add support for Coffeelake Lionel Landwerlin
2017-10-04 11:51   ` Matthew Auld
2017-10-04 11:19 ` [PATCH i-g-t v6 12/12] tests/perf: split array of formats descriptions Lionel Landwerlin
2017-10-04 12:10   ` Matthew Auld
2017-10-04 11:49 ` ✗ Fi.CI.BAT: failure for Improve robustness of the i915 perf tests (rev4) Patchwork
2017-10-04 12:31 ` ✓ Fi.CI.BAT: success " Patchwork
2017-10-04 13:27 ` ✗ Fi.CI.IGT: warning " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.