All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+
@ 2017-04-25 22:32 Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 01/29] igt/perf: generalize lookup for test metric set Lionel Landwerlin
                   ` (28 more replies)
  0 siblings, 29 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

Hi,

Apologies for this unfriendly series, in the end squashing everything
into a single commit didn't look too good.

We went through quite a few iterations to figure all the instability
issues on Gen8+. This has finally reached a state where you can run
the tests on a machine with other applications using the GPU in
parallel. This gives us confidence that we now have a pretty good
understanding of how the filtering of reports should be done as well
as things that might affect periodic report timings.

Most of Rob's patches are already reviewed or co-authored. I'm not
really expecting a someone to thoroughly review all of these changes
as it takes a fair amount of time to get into all of the fiddly
details, but if someone could look at the end result and quickly read
through to check there isn't something terribly wrong, that would be
helpful.

Thanks a lot,

Lionel Landwerlin (8):
  igt/perf: add utility function for checking periodic reports
  igt/perf: make stream_fd a global variable
  igt/perf: update max buffer size for reading reports
  igt/perf: rework oa-exponent test
  igt/perf: make enable-disable more reliable
  igt/perf: make buffer-fill more reliable
  igt/perf: load gt_boost_freq_mhz as max gt frequency
  igt/perf: remove unused frequency functions

Robert Bragg (21):
  igt/perf: generalize lookup for test metric set
  igt/perf: improve robustness of polling/blocking tests
  igt/perf: init timestamp freq and oa format per devid
  igt/perf: update init_sys_info for skl with per-gt configs
  igt/perf: add gen8 formats
  igt/perf: fix a counter indexing
  igt/perf: generalize checks for undefined A counters
  igt/perf: generalize reading gpu ticks from reports
  igt/perf: move timebase + oa exponent utilities up
  igt/perf: wrap emission of MI_REPORT_PERF_COUNT
  igt/perf: handling printing gen8 formats
  igt/perf: avoid assumptions about oa exponent <-> freq mappings
  igt/perf: allow 10% margin matching oa/sysfs freq in test_oa_exponents
  igt/perf: s/test_perf_ctx_mi_rpc/hsw_test_single_ctx_counters/
  igt/perf: don't assume constant of 40 EUs
  igt/perf: consider ctx-switch reports while polling/blocking
  igt/perf: factor out oa report sanity checking
  igt/perf: print [un]slice freq and report reasons in debug
  igt/perf: update print_reports to print context ID
  igt/perf: add per context filtering test for gen8+
  igt/perf: fix rc6 test

 tests/perf.c | 2896 +++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 2339 insertions(+), 557 deletions(-)

--
2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 01/29] igt/perf: generalize lookup for test metric set
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 02/29] igt/perf: improve robustness of polling/blocking tests Lionel Landwerlin
                   ` (27 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

From: Robert Bragg <robert@sixbynine.org>

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 85 ++++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 57 insertions(+), 28 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index 2a66bb63..0422e517 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -195,7 +195,7 @@ static int drm_fd = -1;
 static uint32_t devid;
 static int card = -1;
 
-static uint64_t hsw_render_basic_id = UINT64_MAX;
+static uint64_t test_metric_set_id = UINT64_MAX;
 static uint64_t gt_min_freq_mhz_saved = 0;
 static uint64_t gt_max_freq_mhz_saved = 0;
 static uint64_t gt_min_freq_mhz = 0;
@@ -348,17 +348,46 @@ read_debugfs_u64_record(int fd, const char *file, const char *key)
 }
 
 static bool
-lookup_hsw_render_basic_id(void)
+lookup_test_metric_set_id(void)
 {
+	const char *test_set_name = NULL;
+	const char *test_set_uuid = NULL;
 	char buf[256];
 
 	igt_assert_neq(card, -1);
+	igt_assert_neq(devid, 0);
+
+	if (IS_HASWELL(devid)) {
+		/* We don't have a TestOa metric set for Haswell so use
+		 * RenderBasic
+		 */
+		test_set_name = "RenderBasic";
+		test_set_uuid = "403d8832-1a27-4aa6-a64e-f5389ce7b212";
+	} else if (IS_BROADWELL(devid)) {
+		test_set_name = "TestOa";
+		test_set_uuid = "d6de6f55-e526-4f79-a6a6-d7315c09044e";
+	} else if (IS_CHERRYVIEW(devid)) {
+		test_set_name = "TestOa";
+		test_set_uuid = "4a534b07-cba3-414d-8d60-874830e883aa";
+	} else if (IS_SKYLAKE(devid)) {
+		test_set_name = "TestOa";
+		test_set_uuid = "544a0c1f-5863-4682-bc59-778b7eab8303";
+	} else if (IS_BROXTON(devid)) {
+		test_set_name = "TestOa";
+		test_set_uuid = "5ee72f5c-092f-421e-8b70-225f7c3e9612";
+	} else
+		return false;
+
+	igt_debug("%s metric set UUID = %s\n",
+		  test_set_name,
+		  test_set_uuid);
 
 	snprintf(buf, sizeof(buf),
-		 "/sys/class/drm/card%d/metrics/403d8832-1a27-4aa6-a64e-f5389ce7b212/id",
-		 card);
+		 "/sys/class/drm/card%d/metrics/%s/id",
+		 card,
+		 test_set_uuid);
 
-	return try_read_u64_file(buf, &hsw_render_basic_id);
+	return try_read_u64_file(buf, &test_metric_set_id);
 }
 
 static void
@@ -426,7 +455,7 @@ test_system_wide_paranoid(void)
 			DRM_I915_PERF_PROP_SAMPLE_OA, true,
 
 			/* OA unit configuration */
-			DRM_I915_PERF_PROP_OA_METRICS_SET, hsw_render_basic_id,
+			DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
 			DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
 			DRM_I915_PERF_PROP_OA_EXPONENT, 13, /* 1 millisecond */
 		};
@@ -452,7 +481,7 @@ test_system_wide_paranoid(void)
 			DRM_I915_PERF_PROP_SAMPLE_OA, true,
 
 			/* OA unit configuration */
-			DRM_I915_PERF_PROP_OA_METRICS_SET, hsw_render_basic_id,
+			DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
 			DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
 			DRM_I915_PERF_PROP_OA_EXPONENT, 13, /* 1 millisecond */
 		};
@@ -486,7 +515,7 @@ test_invalid_open_flags(void)
 		DRM_I915_PERF_PROP_SAMPLE_OA, true,
 
 		/* OA unit configuration */
-		DRM_I915_PERF_PROP_OA_METRICS_SET, hsw_render_basic_id,
+		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
 		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
 		DRM_I915_PERF_PROP_OA_EXPONENT, 13, /* 1 millisecond */
 	};
@@ -525,7 +554,7 @@ test_invalid_oa_metric_set_id(void)
 	do_ioctl_err(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param, EINVAL);
 
 	/* Check that we aren't just seeing false positives... */
-	properties[ARRAY_SIZE(properties) - 1] = hsw_render_basic_id;
+	properties[ARRAY_SIZE(properties) - 1] = test_metric_set_id;
 	stream_fd = __perf_open(drm_fd, &param);
 	close(stream_fd);
 
@@ -542,7 +571,7 @@ test_invalid_oa_format_id(void)
 		DRM_I915_PERF_PROP_SAMPLE_OA, true,
 
 		/* OA unit configuration */
-		DRM_I915_PERF_PROP_OA_METRICS_SET, hsw_render_basic_id,
+		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
 		DRM_I915_PERF_PROP_OA_EXPONENT, 13, /* 1 millisecond */
 		DRM_I915_PERF_PROP_OA_FORMAT, UINT64_MAX,
 	};
@@ -576,7 +605,7 @@ test_missing_sample_flags(void)
 		/* No _PROP_SAMPLE_xyz flags */
 
 		/* OA unit configuration */
-		DRM_I915_PERF_PROP_OA_METRICS_SET, hsw_render_basic_id,
+		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
 		DRM_I915_PERF_PROP_OA_EXPONENT, 13, /* 1 millisecond */
 		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
 	};
@@ -729,7 +758,7 @@ open_and_read_2_oa_reports(int format_id,
 		DRM_I915_PERF_PROP_SAMPLE_OA, true,
 
 		/* OA unit configuration */
-		DRM_I915_PERF_PROP_OA_METRICS_SET, hsw_render_basic_id,
+		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
 		DRM_I915_PERF_PROP_OA_FORMAT, format_id,
 		DRM_I915_PERF_PROP_OA_EXPONENT, exponent,
 
@@ -1041,7 +1070,7 @@ test_invalid_oa_exponent(void)
 		DRM_I915_PERF_PROP_SAMPLE_OA, true,
 
 		/* OA unit configuration */
-		DRM_I915_PERF_PROP_OA_METRICS_SET, hsw_render_basic_id,
+		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
 		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
 		DRM_I915_PERF_PROP_OA_EXPONENT, 31, /* maximum exponent expected
 						       to be accepted */
@@ -1098,7 +1127,7 @@ test_low_oa_exponent_permissions(void)
 		DRM_I915_PERF_PROP_SAMPLE_OA, true,
 
 		/* OA unit configuration */
-		DRM_I915_PERF_PROP_OA_METRICS_SET, hsw_render_basic_id,
+		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
 		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
 		DRM_I915_PERF_PROP_OA_EXPONENT, bad_exponent,
 	};
@@ -1163,7 +1192,7 @@ test_per_context_mode_unprivileged(void)
 		DRM_I915_PERF_PROP_SAMPLE_OA, true,
 
 		/* OA unit configuration */
-		DRM_I915_PERF_PROP_OA_METRICS_SET, hsw_render_basic_id,
+		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
 		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
 		DRM_I915_PERF_PROP_OA_EXPONENT, 13, /* 1 millisecond */
 	};
@@ -1250,7 +1279,7 @@ test_blocking(void)
 		DRM_I915_PERF_PROP_SAMPLE_OA, true,
 
 		/* OA unit configuration */
-		DRM_I915_PERF_PROP_OA_METRICS_SET, hsw_render_basic_id,
+		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
 		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
 		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exponent,
 	};
@@ -1342,7 +1371,7 @@ test_polling(void)
 		DRM_I915_PERF_PROP_SAMPLE_OA, true,
 
 		/* OA unit configuration */
-		DRM_I915_PERF_PROP_OA_METRICS_SET, hsw_render_basic_id,
+		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
 		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
 		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exponent,
 	};
@@ -1460,7 +1489,7 @@ test_buffer_fill(void)
 		DRM_I915_PERF_PROP_SAMPLE_OA, true,
 
 		/* OA unit configuration */
-		DRM_I915_PERF_PROP_OA_METRICS_SET, hsw_render_basic_id,
+		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
 		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
 		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exponent,
 	};
@@ -1534,7 +1563,7 @@ test_enable_disable(void)
 		DRM_I915_PERF_PROP_SAMPLE_OA, true,
 
 		/* OA unit configuration */
-		DRM_I915_PERF_PROP_OA_METRICS_SET, hsw_render_basic_id,
+		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
 		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
 		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exponent,
 	};
@@ -1605,7 +1634,7 @@ test_short_reads(void)
 		DRM_I915_PERF_PROP_SAMPLE_OA, true,
 
 		/* OA unit configuration */
-		DRM_I915_PERF_PROP_OA_METRICS_SET, hsw_render_basic_id,
+		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
 		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
 		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exponent,
 	};
@@ -1692,7 +1721,7 @@ test_non_sampling_read_error(void)
 		DRM_I915_PERF_PROP_SAMPLE_OA, true,
 
 		/* OA unit configuration */
-		DRM_I915_PERF_PROP_OA_METRICS_SET, hsw_render_basic_id,
+		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
 		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
 
 		/* XXX: no sampling exponent */
@@ -1726,7 +1755,7 @@ test_disabled_read_error(void)
 		DRM_I915_PERF_PROP_SAMPLE_OA, true,
 
 		/* OA unit configuration */
-		DRM_I915_PERF_PROP_OA_METRICS_SET, hsw_render_basic_id,
+		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
 		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
 		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exponent,
 	};
@@ -1788,7 +1817,7 @@ test_mi_rpc(void)
 		DRM_I915_PERF_PROP_SAMPLE_OA, true,
 
 		/* OA unit configuration */
-		DRM_I915_PERF_PROP_OA_METRICS_SET, hsw_render_basic_id,
+		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
 		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
 
 		/* Note: no OA exponent specified in this case */
@@ -1918,7 +1947,7 @@ test_per_ctx_mi_rpc(void)
 		DRM_I915_PERF_PROP_SAMPLE_OA, true,
 
 		/* OA unit configuration */
-		DRM_I915_PERF_PROP_OA_METRICS_SET, hsw_render_basic_id,
+		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
 		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
 
 		/* Note: no OA exponent specified in this case */
@@ -2129,7 +2158,7 @@ test_rc6_disable(void)
 		DRM_I915_PERF_PROP_SAMPLE_OA, true,
 
 		/* OA unit configuration */
-		DRM_I915_PERF_PROP_OA_METRICS_SET, hsw_render_basic_id,
+		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
 		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
 		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exponent,
 	};
@@ -2230,8 +2259,8 @@ test_i915_ref_count(void)
 	card = drm_get_card();
 
 	igt_require(IS_HASWELL(devid));
-	igt_require(lookup_hsw_render_basic_id());
-	properties[3] = hsw_render_basic_id;
+	igt_require(lookup_test_metric_set_id());
+	properties[3] = test_metric_set_id;
 
 	ref_count0 = read_i915_module_ref();
 	igt_debug("initial ref count with drm_fd open = %u\n", ref_count0);
@@ -2301,7 +2330,7 @@ igt_main
 		card = drm_get_card();
 
 		igt_require(IS_HASWELL(devid));
-		igt_require(lookup_hsw_render_basic_id());
+		igt_require(lookup_test_metric_set_id());
 
 		gt_frequency_range_save();
 
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 02/29] igt/perf: improve robustness of polling/blocking tests
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 01/29] igt/perf: generalize lookup for test metric set Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-06-16 13:48   ` Matthew Auld
  2017-04-25 22:32 ` [PATCH i-g-t 03/29] igt/perf: init timestamp freq and oa format per devid Lionel Landwerlin
                   ` (26 subsequent siblings)
  28 siblings, 1 reply; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

From: Robert Bragg <robert@sixbynine.org>

There were a couple of problems with both of these tests that could lead
to false negatives addressed by this patch.

1) The upper limit for the number of iterations missed a +1 to consider
   that there might be a sample immediately available at the start of the
   loop.

2) The tests didn't consider that a duration measured in terms of
   (end-start) ticks could be +- 1 tick since we don't know the
   fractional part of the tick counts. Our threshold for stime being <
   one tick could have a false negative for any real stime between 1 to
   10 milliseconds depending on luck.

The tests now both run for a lot longer (1000 x tick duration, or
typically 10 seconds each) so that a single tick represents a much
smaller proportion of the total duration (0.1%) and the stime thresholds
are now set at 1% of the total duration.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
---
 tests/perf.c | 139 +++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 93 insertions(+), 46 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index 0422e517..df0120b2 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -1294,18 +1294,50 @@ test_blocking(void)
 	struct tms end_times;
 	int64_t user_ns, kernel_ns;
 	int64_t tick_ns = 1000000000 / sysconf(_SC_CLK_TCK);
+	int64_t test_duration_ns = tick_ns * 1000;
+
+	/* Based on the 40ms OA sampling period set above: max OA samples: */
+	int max_iterations = (test_duration_ns / 40000000ull) + 1;
+
+	/* It's a bit tricky to put a lower limit here, but we expect a
+	 * relatively low latency for seeing reports, while we don't currently
+	 * give any control over this in the api.
+	 *
+	 * We assume a maximum latency of 6 millisecond to deliver a POLLIN and
+	 * read() after a new sample is written (46ms per iteration) considering
+	 * the knowledge that that the driver uses a 200Hz hrtimer (5ms period)
+	 * to check for data and giving some time to read().
+	 */
+	int min_iterations = (test_duration_ns / 46000000ull);
+
 	int64_t start;
 	int n = 0;
 
 	times(&start_times);
 
-	/* Loop for 600ms performing blocking reads while the HW is sampling at
+	igt_debug("tick length = %dns, test duration = %"PRIu64"ns, min iter. = %d, max iter. = %d\n",
+		  (int)tick_ns, test_duration_ns,
+		  min_iterations, max_iterations);
+
+	/* In the loop we perform blocking polls while the HW is sampling at
 	 * ~25Hz, with the expectation that we spend most of our time blocked
 	 * in the kernel, and shouldn't be burning cpu cycles in the kernel in
 	 * association with this process (verified by looking at stime before
 	 * and after loop).
+	 *
+	 * We're looking to assert that less than 1% of the test duration is
+	 * spent in the kernel dealing with polling and read()ing.
+	 *
+	 * The test runs for a relatively long time considering the very low
+	 * resolution of stime in ticks of typically 10 milliseconds. Since we
+	 * don't know the fractional part of tick values we read from userspace
+	 * so our minimum threshold needs to be >= one tick since any
+	 * measurement might really be +- tick_ns (assuming we effectively get
+	 * floor(real_stime)).
+	 *
+	 * We Loop for 1000 x tick_ns so one tick corresponds to 0.1%
 	 */
-	for (start = get_time(); (get_time() - start) < 600000000; /* nop */) {
+	for (start = get_time(); (get_time() - start) < test_duration_ns; /* nop */) {
 		int ret;
 
 		while ((ret = read(stream_fd, buf, sizeof(buf))) < 0 &&
@@ -1325,33 +1357,25 @@ test_blocking(void)
 	user_ns = (end_times.tms_utime - start_times.tms_utime) * tick_ns;
 	kernel_ns = (end_times.tms_stime - start_times.tms_stime) * tick_ns;
 
-	igt_debug("%d blocking reads in 500 milliseconds, with 1KHz OA sampling\n", n);
-	igt_debug("time in userspace = %"PRIu64"ns (start utime = %d, end = %d, ns ticks per sec = %d)\n",
-		  user_ns, (int)start_times.tms_utime, (int)end_times.tms_utime, (int)tick_ns);
-	igt_debug("time in kernelspace = %"PRIu64"ns (start stime = %d, end = %d, ns ticks per sec = %d)\n",
-		  kernel_ns, (int)start_times.tms_stime, (int)end_times.tms_stime, (int)tick_ns);
+	igt_debug("%d blocking reads during test with 25Hz OA sampling\n", n);
+	igt_debug("time in userspace = %"PRIu64"ns (+-%dns) (start utime = %d, end = %d)\n",
+		  user_ns, (int)tick_ns,
+		  (int)start_times.tms_utime, (int)end_times.tms_utime);
+	igt_debug("time in kernelspace = %"PRIu64"ns (+-%dns) (start stime = %d, end = %d)\n",
+		  kernel_ns, (int)tick_ns,
+		  (int)start_times.tms_stime, (int)end_times.tms_stime);
 
 	/* With completely broken blocking (but also not returning an error) we
-	 * could end up with an open loop, hopefully recognisable with > 15
-	 * (600/40)iterations.
+	 * could end up with an open loop,
 	 */
-	igt_assert(n <= 15);
+	igt_assert(n <= max_iterations);
 
-	/* It's a bit tricky to put a lower limit here, but we expect a
-	 * relatively low latency for seeing reports, while we don't currently
-	 * give any control over this in the api.
-	 *
-	 * Limited to a 5 millisecond latency and 45ms (worst case)
-	 * per-iteration that could give 13.3 iterations. Rounding gives a tiny
-	 * bit more latency slack (6ms)...
+	/* Make sure the driver is reporting new samples with a reasonably
+	 * low latency...
 	 */
-	igt_assert(n > 13);
+	igt_assert(n > min_iterations);
 
-	/* A bit tricky to put a number on this, but we don't expect the kernel
-	 * to use any significant cpu while waiting and given the in precision
-	 * of stime (multiple of CLK_TCK) we expect this to round to zero.
-	 */
-	igt_assert_eq(kernel_ns, 0);
+	igt_assert(kernel_ns <= (test_duration_ns / 100ull));
 
 	close(stream_fd);
 }
@@ -1387,18 +1411,49 @@ test_polling(void)
 	struct tms end_times;
 	int64_t user_ns, kernel_ns;
 	int64_t tick_ns = 1000000000 / sysconf(_SC_CLK_TCK);
+	int64_t test_duration_ns = tick_ns * 1000;
+
+	/* Based on the 40ms OA sampling period set above: max OA samples: */
+	int max_iterations = (test_duration_ns / 40000000ull) + 1;
+
+	/* It's a bit tricky to put a lower limit here, but we expect a
+	 * relatively low latency for seeing reports, while we don't currently
+	 * give any control over this in the api.
+	 *
+	 * We assume a maximum latency of 6 millisecond to deliver a POLLIN and
+	 * read() after a new sample is written (46ms per iteration) considering
+	 * the knowledge that that the driver uses a 200Hz hrtimer (5ms period)
+	 * to check for data and giving some time to read().
+	 */
+	int min_iterations = (test_duration_ns / 46000000ull);
 	int64_t start;
 	int n = 0;
 
 	times(&start_times);
 
-	/* Loop for 600ms performing blocking polls while the HW is sampling at
+	igt_debug("tick length = %dns, test duration = %"PRIu64"ns, min iter. = %d, max iter. = %d\n",
+		  (int)tick_ns, test_duration_ns,
+		  min_iterations, max_iterations);
+
+	/* In the loop we perform blocking polls while the HW is sampling at
 	 * ~25Hz, with the expectation that we spend most of our time blocked
 	 * in the kernel, and shouldn't be burning cpu cycles in the kernel in
 	 * association with this process (verified by looking at stime before
 	 * and after loop).
+	 *
+	 * We're looking to assert that less than 1% of the test duration is
+	 * spent in the kernel dealing with polling and read()ing.
+	 *
+	 * The test runs for a relatively long time considering the very low
+	 * resolution of stime in ticks of typically 10 milliseconds. Since we
+	 * don't know the fractional part of tick values we read from userspace
+	 * so our minimum threshold needs to be >= one tick since any
+	 * measurement might really be +- tick_ns (assuming we effectively get
+	 * floor(real_stime)).
+	 *
+	 * We Loop for 1000 x tick_ns so one tick corresponds to 0.1%
 	 */
-	for (start = get_time(); (get_time() - start) < 600000000; /* nop */) {
+	for (start = get_time(); (get_time() - start) < test_duration_ns; /* nop */) {
 		struct pollfd pollfd = { .fd = stream_fd, .events = POLLIN };
 		int ret;
 
@@ -1449,33 +1504,25 @@ test_polling(void)
 	user_ns = (end_times.tms_utime - start_times.tms_utime) * tick_ns;
 	kernel_ns = (end_times.tms_stime - start_times.tms_stime) * tick_ns;
 
-	igt_debug("%d blocking poll()s in 600 milliseconds, with 25Hz OA sampling\n", n);
-	igt_debug("time in userspace = %"PRIu64"ns (start utime = %d, end = %d, ns ticks per sec = %d)\n",
-		  user_ns, (int)start_times.tms_utime, (int)end_times.tms_utime, (int)tick_ns);
-	igt_debug("time in kernelspace = %"PRIu64"ns (start stime = %d, end = %d, ns ticks per sec = %d)\n",
-		  kernel_ns, (int)start_times.tms_stime, (int)end_times.tms_stime, (int)tick_ns);
+	igt_debug("%d blocking poll()s during test with 25Hz OA sampling\n", n);
+	igt_debug("time in userspace = %"PRIu64"ns (+-%dns) (start utime = %d, end = %d)\n",
+		  user_ns, (int)tick_ns,
+		  (int)start_times.tms_utime, (int)end_times.tms_utime);
+	igt_debug("time in kernelspace = %"PRIu64"ns (+-%dns) (start stime = %d, end = %d)\n",
+		  kernel_ns, (int)tick_ns,
+		  (int)start_times.tms_stime, (int)end_times.tms_stime);
 
 	/* With completely broken blocking while polling (but still somehow
-	 * reporting a POLLIN event) we could end up with an open loop,
-	 * hopefully recognisable with > 15 (600/40)iterations.
+	 * reporting a POLLIN event) we could end up with an open loop.
 	 */
-	igt_assert(n <= 15);
+	igt_assert(n <= max_iterations);
 
-	/* It's a bit tricky to put a lower limit here, but we expect a
-	 * relatively low latency for seeing reports, while we don't currently
-	 * give any control over this in the api.
-	 *
-	 * Limited to a 5 millisecond latency and 45ms (worst case)
-	 * per-iteration that could give 13.3 iterations. Rounding gives a tiny
-	 * bit more latency slack (6ms)...
+	/* Make sure the driver is reporting new samples with a reasonably
+	 * low latency...
 	 */
-	igt_assert(n > 13);
+	igt_assert(n > min_iterations);
 
-	/* A bit tricky to put a number on this, but we don't expect the kernel
-	 * to use any significant cpu while waiting and given the in precision
-	 * of stime (multiple of CLK_TCK) we expect this to round to zero.
-	 */
-	igt_assert_eq(kernel_ns, 0);
+	igt_assert(kernel_ns <= (test_duration_ns / 100ull));
 
 	close(stream_fd);
 }
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 03/29] igt/perf: init timestamp freq and oa format per devid
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 01/29] igt/perf: generalize lookup for test metric set Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 02/29] igt/perf: improve robustness of polling/blocking tests Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-06-16 14:37   ` Matthew Auld
  2017-04-25 22:32 ` [PATCH i-g-t 04/29] igt/perf: update init_sys_info for skl with per-gt configs Lionel Landwerlin
                   ` (25 subsequent siblings)
  28 siblings, 1 reply; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

From: Robert Bragg <robert@sixbynine.org>

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 120 +++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 67 insertions(+), 53 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index df0120b2..f518bcc1 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -82,15 +82,20 @@ IGT_TEST_DESCRIPTION("Test the i915 perf metrics streaming interface");
 #define DRM_IOCTL_I915_PERF_OPEN	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_OPEN, struct drm_i915_perf_open_param)
 
 enum drm_i915_oa_format {
-       I915_OA_FORMAT_A13 = 1,
-       I915_OA_FORMAT_A29,
-       I915_OA_FORMAT_A13_B8_C8,
-       I915_OA_FORMAT_B4_C8,
-       I915_OA_FORMAT_A45_B8_C8,
-       I915_OA_FORMAT_B4_C8_A16,
-       I915_OA_FORMAT_C4_B8,
-
-       I915_OA_FORMAT_MAX /* non-ABI */
+	I915_OA_FORMAT_A13 = 1,     /* HSW only */
+	I915_OA_FORMAT_A29,         /* HSW only */
+	I915_OA_FORMAT_A13_B8_C8,   /* HSW only */
+	I915_OA_FORMAT_B4_C8,       /* HSW only */
+	I915_OA_FORMAT_A45_B8_C8,   /* HSW only */
+	I915_OA_FORMAT_B4_C8_A16,   /* HSW only */
+	I915_OA_FORMAT_C4_B8,       /* HSW+ */
+
+	/* Gen8+ */
+	I915_OA_FORMAT_A12,
+	I915_OA_FORMAT_A12_B8_C8,
+	I915_OA_FORMAT_A32u40_A4u32_B8_C8,
+
+	I915_OA_FORMAT_MAX /* non-ABI */
 };
 
 enum drm_i915_perf_property_id {
@@ -202,6 +207,7 @@ static uint64_t gt_min_freq_mhz = 0;
 static uint64_t gt_max_freq_mhz = 0;
 
 static uint64_t timestamp_frequency = 12500000;
+static enum drm_i915_oa_format test_oa_format;
 
 static igt_render_copyfunc_t render_copy = NULL;
 
@@ -348,7 +354,7 @@ read_debugfs_u64_record(int fd, const char *file, const char *key)
 }
 
 static bool
-lookup_test_metric_set_id(void)
+init_sys_info(void)
 {
 	const char *test_set_name = NULL;
 	const char *test_set_uuid = NULL;
@@ -357,26 +363,32 @@ lookup_test_metric_set_id(void)
 	igt_assert_neq(card, -1);
 	igt_assert_neq(devid, 0);
 
+	timestamp_frequency = 12500000;
+
 	if (IS_HASWELL(devid)) {
 		/* We don't have a TestOa metric set for Haswell so use
 		 * RenderBasic
 		 */
 		test_set_name = "RenderBasic";
 		test_set_uuid = "403d8832-1a27-4aa6-a64e-f5389ce7b212";
-	} else if (IS_BROADWELL(devid)) {
-		test_set_name = "TestOa";
-		test_set_uuid = "d6de6f55-e526-4f79-a6a6-d7315c09044e";
-	} else if (IS_CHERRYVIEW(devid)) {
-		test_set_name = "TestOa";
-		test_set_uuid = "4a534b07-cba3-414d-8d60-874830e883aa";
-	} else if (IS_SKYLAKE(devid)) {
-		test_set_name = "TestOa";
-		test_set_uuid = "544a0c1f-5863-4682-bc59-778b7eab8303";
-	} else if (IS_BROXTON(devid)) {
+		test_oa_format = I915_OA_FORMAT_A45_B8_C8;
+	} else {
 		test_set_name = "TestOa";
-		test_set_uuid = "5ee72f5c-092f-421e-8b70-225f7c3e9612";
-	} else
-		return false;
+		test_oa_format = I915_OA_FORMAT_A32u40_A4u32_B8_C8;
+
+		if (IS_BROADWELL(devid)) {
+			test_set_uuid = "d6de6f55-e526-4f79-a6a6-d7315c09044e";
+		} else if (IS_CHERRYVIEW(devid)) {
+			test_set_uuid = "4a534b07-cba3-414d-8d60-874830e883aa";
+		} else if (IS_SKYLAKE(devid)) {
+			test_set_uuid = "544a0c1f-5863-4682-bc59-778b7eab8303";
+			timestamp_frequency = 12000000;
+		} else if (IS_BROXTON(devid)) {
+			test_set_uuid = "5ee72f5c-092f-421e-8b70-225f7c3e9612";
+			timestamp_frequency = 19200000;
+		} else
+			return false;
+	}
 
 	igt_debug("%s metric set UUID = %s\n",
 		  test_set_name,
@@ -456,7 +468,7 @@ test_system_wide_paranoid(void)
 
 			/* OA unit configuration */
 			DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
-			DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
+			DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
 			DRM_I915_PERF_PROP_OA_EXPONENT, 13, /* 1 millisecond */
 		};
 		struct drm_i915_perf_open_param param = {
@@ -482,7 +494,7 @@ test_system_wide_paranoid(void)
 
 			/* OA unit configuration */
 			DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
-			DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
+			DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
 			DRM_I915_PERF_PROP_OA_EXPONENT, 13, /* 1 millisecond */
 		};
 		struct drm_i915_perf_open_param param = {
@@ -516,7 +528,7 @@ test_invalid_open_flags(void)
 
 		/* OA unit configuration */
 		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
-		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
+		DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
 		DRM_I915_PERF_PROP_OA_EXPONENT, 13, /* 1 millisecond */
 	};
 	struct drm_i915_perf_open_param param = {
@@ -536,7 +548,7 @@ test_invalid_oa_metric_set_id(void)
 		DRM_I915_PERF_PROP_SAMPLE_OA, true,
 
 		/* OA unit configuration */
-		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
+		DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
 		DRM_I915_PERF_PROP_OA_EXPONENT, 13, /* 1 millisecond */
 		DRM_I915_PERF_PROP_OA_METRICS_SET, UINT64_MAX,
 	};
@@ -589,7 +601,7 @@ test_invalid_oa_format_id(void)
 	do_ioctl_err(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param, EINVAL);
 
 	/* Check that we aren't just seeing false positives... */
-	properties[ARRAY_SIZE(properties) - 1] = I915_OA_FORMAT_A45_B8_C8;
+	properties[ARRAY_SIZE(properties) - 1] = test_oa_format;
 	stream_fd = __perf_open(drm_fd, &param);
 	close(stream_fd);
 
@@ -607,7 +619,7 @@ test_missing_sample_flags(void)
 		/* OA unit configuration */
 		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
 		DRM_I915_PERF_PROP_OA_EXPONENT, 13, /* 1 millisecond */
-		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
+		DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
 	};
 	struct drm_i915_perf_open_param param = {
 		.flags = I915_PERF_FLAG_FD_CLOEXEC,
@@ -981,7 +993,7 @@ test_oa_exponents(int gt_freq_mhz)
 			igt_debug("ITER %d: testing OA exponent %d with sysfs GT freq = %dmhz\n",
 				  j, i, gt_freq_mhz_0);
 
-			open_and_read_2_oa_reports(I915_OA_FORMAT_A45_B8_C8,
+			open_and_read_2_oa_reports(test_oa_format,
 						   i, /* exponent */
 						   oa_report0,
 						   oa_report1,
@@ -1014,7 +1026,7 @@ test_oa_exponents(int gt_freq_mhz)
 			 * open_and_read_2_oa_reports(), the C2 counter is
 			 * configured as the gpu clock counter...
 			 */
-			c_off = oa_formats[I915_OA_FORMAT_A45_B8_C8].c_off;
+			c_off = oa_formats[test_oa_format].c_off;
 			igt_assert(c_off);
 			c0 = (uint32_t *)(((uint8_t *)oa_report0) + c_off);
 			c1 = (uint32_t *)(((uint8_t *)oa_report1) + c_off);
@@ -1071,7 +1083,7 @@ test_invalid_oa_exponent(void)
 
 		/* OA unit configuration */
 		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
-		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
+		DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
 		DRM_I915_PERF_PROP_OA_EXPONENT, 31, /* maximum exponent expected
 						       to be accepted */
 	};
@@ -1128,7 +1140,7 @@ test_low_oa_exponent_permissions(void)
 
 		/* OA unit configuration */
 		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
-		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
+		DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
 		DRM_I915_PERF_PROP_OA_EXPONENT, bad_exponent,
 	};
 	struct drm_i915_perf_open_param param = {
@@ -1193,7 +1205,7 @@ test_per_context_mode_unprivileged(void)
 
 		/* OA unit configuration */
 		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
-		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
+		DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
 		DRM_I915_PERF_PROP_OA_EXPONENT, 13, /* 1 millisecond */
 	};
 	struct drm_i915_perf_open_param param = {
@@ -1280,7 +1292,7 @@ test_blocking(void)
 
 		/* OA unit configuration */
 		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
-		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
+		DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
 		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exponent,
 	};
 	struct drm_i915_perf_open_param param = {
@@ -1396,7 +1408,7 @@ test_polling(void)
 
 		/* OA unit configuration */
 		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
-		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
+		DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
 		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exponent,
 	};
 	struct drm_i915_perf_open_param param = {
@@ -1537,7 +1549,7 @@ test_buffer_fill(void)
 
 		/* OA unit configuration */
 		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
-		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
+		DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
 		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exponent,
 	};
 	struct drm_i915_perf_open_param param = {
@@ -1611,7 +1623,7 @@ test_enable_disable(void)
 
 		/* OA unit configuration */
 		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
-		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
+		DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
 		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exponent,
 	};
 	struct drm_i915_perf_open_param param = {
@@ -1682,7 +1694,7 @@ test_short_reads(void)
 
 		/* OA unit configuration */
 		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
-		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
+		DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
 		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exponent,
 	};
 	struct drm_i915_perf_open_param param = {
@@ -1769,7 +1781,7 @@ test_non_sampling_read_error(void)
 
 		/* OA unit configuration */
 		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
-		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
+		DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
 
 		/* XXX: no sampling exponent */
 	};
@@ -1803,7 +1815,7 @@ test_disabled_read_error(void)
 
 		/* OA unit configuration */
 		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
-		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
+		DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
 		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exponent,
 	};
 	struct drm_i915_perf_open_param param = {
@@ -1830,7 +1842,7 @@ test_disabled_read_error(void)
 	stream_fd = __perf_open(drm_fd, &param);
 
 	read_2_oa_reports(stream_fd,
-			  I915_OA_FORMAT_A45_B8_C8,
+			  test_oa_format,
 			  oa_exponent,
 			  oa_report0,
 			  oa_report1,
@@ -1845,7 +1857,7 @@ test_disabled_read_error(void)
 	do_ioctl(stream_fd, I915_PERF_IOCTL_ENABLE, 0);
 
 	read_2_oa_reports(stream_fd,
-			  I915_OA_FORMAT_A45_B8_C8,
+			  test_oa_format,
 			  oa_exponent,
 			  oa_report0,
 			  oa_report1,
@@ -1865,7 +1877,7 @@ test_mi_rpc(void)
 
 		/* OA unit configuration */
 		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
-		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
+		DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
 
 		/* Note: no OA exponent specified in this case */
 	};
@@ -1995,7 +2007,7 @@ test_per_ctx_mi_rpc(void)
 
 		/* OA unit configuration */
 		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
-		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
+		DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
 
 		/* Note: no OA exponent specified in this case */
 	};
@@ -2143,7 +2155,7 @@ test_per_ctx_mi_rpc(void)
 		igt_assert_neq(report1_32[1], 0); /* timestamp */
 
 		print_reports(report0_32, report1_32,
-			      lookup_format(I915_OA_FORMAT_A45_B8_C8));
+			      lookup_format(test_oa_format));
 
 		/* A40 == N samples written to all render targets */
 		n_samples_written = report1_32[43] - report0_32[43];
@@ -2206,7 +2218,7 @@ test_rc6_disable(void)
 
 		/* OA unit configuration */
 		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
-		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
+		DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
 		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exponent,
 	};
 	struct drm_i915_perf_open_param param = {
@@ -2280,7 +2292,7 @@ test_i915_ref_count(void)
 
 		/* OA unit configuration */
 		DRM_I915_PERF_PROP_OA_METRICS_SET, 0 /* updated below */,
-		DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A45_B8_C8,
+		DRM_I915_PERF_PROP_OA_FORMAT, 0, /* update below */
 		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exponent,
 	};
 	struct drm_i915_perf_open_param param = {
@@ -2305,9 +2317,12 @@ test_i915_ref_count(void)
 	devid = intel_get_drm_devid(drm_fd);
 	card = drm_get_card();
 
-	igt_require(IS_HASWELL(devid));
-	igt_require(lookup_test_metric_set_id());
+	/* Note: these global variables are only initialized after calling
+	 * init_sys_info()...
+	 */
+	igt_require(init_sys_info());
 	properties[3] = test_metric_set_id;
+	properties[5] = test_oa_format;
 
 	ref_count0 = read_i915_module_ref();
 	igt_debug("initial ref count with drm_fd open = %u\n", ref_count0);
@@ -2326,7 +2341,7 @@ test_i915_ref_count(void)
 	igt_assert(ref_count0 > baseline);
 
 	read_2_oa_reports(stream_fd,
-			  I915_OA_FORMAT_A45_B8_C8,
+			  test_oa_format,
 			  oa_exponent,
 			  oa_report0,
 			  oa_report1,
@@ -2376,8 +2391,7 @@ igt_main
 		devid = intel_get_drm_devid(drm_fd);
 		card = drm_get_card();
 
-		igt_require(IS_HASWELL(devid));
-		igt_require(lookup_test_metric_set_id());
+		igt_require(init_sys_info());
 
 		gt_frequency_range_save();
 
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 04/29] igt/perf: update init_sys_info for skl with per-gt configs
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (2 preceding siblings ...)
  2017-04-25 22:32 ` [PATCH i-g-t 03/29] igt/perf: init timestamp freq and oa format per devid Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 05/29] igt/perf: add gen8 formats Lionel Landwerlin
                   ` (24 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

From: Robert Bragg <robert@sixbynine.org>

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/tests/perf.c b/tests/perf.c
index f518bcc1..29487cdf 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -381,7 +381,20 @@ init_sys_info(void)
 		} else if (IS_CHERRYVIEW(devid)) {
 			test_set_uuid = "4a534b07-cba3-414d-8d60-874830e883aa";
 		} else if (IS_SKYLAKE(devid)) {
-			test_set_uuid = "544a0c1f-5863-4682-bc59-778b7eab8303";
+			switch (intel_gt(devid)) {
+			case 1:
+				test_set_uuid = "1651949f-0ac0-4cb1-a06f-dafd74a407d1";
+				break;
+			case 2:
+				test_set_uuid = "2b985803-d3c9-4629-8a4f-634bfecba0e8";
+				break;
+			case 3:
+				test_set_uuid = "882fa433-1f4a-4a67-a962-c741888fe5f5";
+				break;
+			default:
+				igt_debug("unsupport Skylake GT size\n");
+				return false;
+			}
 			timestamp_frequency = 12000000;
 		} else if (IS_BROXTON(devid)) {
 			test_set_uuid = "5ee72f5c-092f-421e-8b70-225f7c3e9612";
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 05/29] igt/perf: add gen8 formats
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (3 preceding siblings ...)
  2017-04-25 22:32 ` [PATCH i-g-t 04/29] igt/perf: update init_sys_info for skl with per-gt configs Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 06/29] igt/perf: fix a counter indexing Lionel Landwerlin
                   ` (23 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

From: Robert Bragg <robert@sixbynine.org>

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 64 insertions(+), 14 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index 29487cdf..3eef82d2 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -139,43 +139,79 @@ enum drm_i915_perf_record_type {
 static struct {
 	const char *name;
 	size_t size;
-	int a_off; /* bytes */
+	int a40_high_off; /* bytes */
+	int a40_low_off;
+	int n_a40;
+	int a_off;
 	int n_a;
 	int first_a;
 	int b_off;
 	int n_b;
 	int c_off;
 	int n_c;
+	int min_gen;
+	int max_gen;
 } oa_formats[I915_OA_FORMAT_MAX] = {
-	[I915_OA_FORMAT_A13] = {
+	[I915_OA_FORMAT_A13] = { /* HSW only */
 		"A13", .size = 64,
-		.a_off = 12, .n_a = 13 },
-	[I915_OA_FORMAT_A29] = {
+		.a_off = 12, .n_a = 13,
+		.max_gen = 7 },
+	[I915_OA_FORMAT_A29] = { /* HSW only */
 		"A29", .size = 128,
-		.a_off = 12, .n_a = 29 },
-	[I915_OA_FORMAT_A13_B8_C8] = {
+		.a_off = 12, .n_a = 29,
+		.max_gen = 7 },
+	[I915_OA_FORMAT_A13_B8_C8] = { /* HSW only */
 		"A13_B8_C8", .size = 128,
 		.a_off = 12, .n_a = 13,
 		.b_off = 64, .n_b = 8,
-		.c_off = 96, .n_c = 8 },
-	[I915_OA_FORMAT_A45_B8_C8] = {
+		.c_off = 96, .n_c = 8,
+		.max_gen = 7 },
+	[I915_OA_FORMAT_A45_B8_C8] = { /* HSW only */
 		"A45_B8_C8", .size = 256,
 		.a_off = 12,  .n_a = 45,
 		.b_off = 192, .n_b = 8,
-		.c_off = 224, .n_c = 8 },
-	[I915_OA_FORMAT_B4_C8] = {
+		.c_off = 224, .n_c = 8,
+		.max_gen = 7 },
+	[I915_OA_FORMAT_B4_C8] = { /* HSW only */
 		"B4_C8", .size = 64,
 		.b_off = 16, .n_b = 4,
-		.c_off = 32, .n_c = 8 },
-	[I915_OA_FORMAT_B4_C8_A16] = {
+		.c_off = 32, .n_c = 8,
+		.max_gen = 7 },
+	[I915_OA_FORMAT_B4_C8_A16] = { /* HSW only */
 		"B4_C8_A16", .size = 128,
 		.b_off = 16, .n_b = 4,
 		.c_off = 32, .n_c = 8,
-		.a_off = 60, .n_a = 16, .first_a = 29 },
-	[I915_OA_FORMAT_C4_B8] = {
+		.a_off = 60, .n_a = 16, .first_a = 29,
+		.max_gen = 7 },
+	[I915_OA_FORMAT_C4_B8] = { /* HSW+ (header differs from HSW-Gen8+) */
 		"C4_B8", .size = 64,
 		.c_off = 16, .n_c = 4,
 		.b_off = 28, .n_b = 8 },
+
+	/* Gen8+ */
+
+	[I915_OA_FORMAT_A12] = {
+		"A12", .size = 64,
+		.a_off = 12, .n_a = 12, .first_a = 7,
+		.min_gen = 8 },
+	[I915_OA_FORMAT_A12_B8_C8] = {
+		"A12_B8_C8", .size = 128,
+		.a_off = 12, .n_a = 12,
+		.b_off = 64, .n_b = 8,
+		.c_off = 96, .n_c = 8, .first_a = 7,
+		.min_gen = 8 },
+	[I915_OA_FORMAT_A32u40_A4u32_B8_C8] = {
+		"A32u40_A4u32_B8_C8", .size = 256,
+		.a40_high_off = 160, .a40_low_off = 16, .n_a40 = 32,
+		.a_off = 144, .n_a = 4, .first_a = 32,
+		.b_off = 192, .n_b = 8,
+		.c_off = 224, .n_c = 8,
+		.min_gen = 8 },
+	[I915_OA_FORMAT_C4_B8] = {
+		"C4_B8", .size = 64,
+		.c_off = 16, .n_c = 4,
+		.b_off = 32, .n_b = 8,
+		.min_gen = 8 },
 };
 
 static bool hsw_undefined_a_counters[45] = {
@@ -870,6 +906,20 @@ test_oa_formats(void)
 		if (!oa_formats[i].name) /* sparse, indexed by ID */
 			continue;
 
+		if (oa_formats[i].min_gen &&
+		    intel_gen(devid) < oa_formats[i].min_gen) {
+			igt_debug("skipping unsupported OA format %s\n",
+				  oa_formats[i].name);
+			continue;
+		}
+
+		if (oa_formats[i].max_gen &&
+		    intel_gen(devid) > oa_formats[i].max_gen) {
+			igt_debug("skipping unsupported OA format %s\n",
+				  oa_formats[i].name);
+			continue;
+		}
+
 		igt_debug("Checking OA format %s\n", oa_formats[i].name);
 
 		open_and_read_2_oa_reports(i,
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 06/29] igt/perf: fix a counter indexing
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (4 preceding siblings ...)
  2017-04-25 22:32 ` [PATCH i-g-t 05/29] igt/perf: add gen8 formats Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 07/29] igt/perf: generalize checks for undefined A counters Lionel Landwerlin
                   ` (22 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

From: Robert Bragg <robert@sixbynine.org>

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index 3eef82d2..5a6bd05a 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -863,17 +863,15 @@ print_reports(uint32_t *oa_report0, uint32_t *oa_report1, int fmt)
 	} else
 		igt_debug("CLOCK = N/A\n");
 
-	for (int j = oa_formats[fmt].first_a;
-	     j < oa_formats[fmt].n_a;
-	     j++)
-	{
+	for (int j = 0; j < oa_formats[fmt].n_a; j++) {
+		int a_id = oa_formats[fmt].first_a + j;
 		uint32_t delta = a1[j] - a0[j];
 
-		if (hsw_undefined_a_counters[j])
+		if (hsw_undefined_a_counters[a_id])
 			continue;
 
 		igt_debug("A%d: 1st = %"PRIu32", 2nd = %"PRIu32", delta = %"PRIu32"\n",
-			  j, a0[j], a1[j], delta);
+			  a_id, a0[j], a1[j], delta);
 	}
 
 	for (int j = 0; j < oa_formats[fmt].n_b; j++) {
@@ -976,16 +974,14 @@ test_oa_formats(void)
 		 */
 		max_delta = clock_delta * 40;
 
-		for (int j = oa_formats[i].first_a;
-		     j < oa_formats[i].n_a;
-		     j++)
-		{
+		for (int j = 0; j < oa_formats[i].n_a; j++) {
+			int a_id = oa_formats[i].first_a + j;
 			uint32_t delta = a1[j] - a0[j];
 
-			if (hsw_undefined_a_counters[j])
+			if (hsw_undefined_a_counters[a_id])
 				continue;
 
-			igt_debug("A%d: delta = %"PRIu32"\n", j, delta);
+			igt_debug("A%d: delta = %"PRIu32"\n", a_id, delta);
 			igt_assert(delta <= max_delta);
 		}
 
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 07/29] igt/perf: generalize checks for undefined A counters
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (5 preceding siblings ...)
  2017-04-25 22:32 ` [PATCH i-g-t 06/29] igt/perf: fix a counter indexing Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 08/29] igt/perf: generalize reading gpu ticks from reports Lionel Landwerlin
                   ` (21 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

From: Robert Bragg <robert@sixbynine.org>

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index 5a6bd05a..fe39f4dd 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -232,6 +232,9 @@ static bool hsw_undefined_a_counters[45] = {
 	[44] = true,
 };
 
+/* No A counters currently reserved/undefined for gen8+ so far */
+static bool gen8_undefined_a_counters[45];
+
 static int drm_fd = -1;
 static uint32_t devid;
 static int card = -1;
@@ -244,6 +247,7 @@ static uint64_t gt_max_freq_mhz = 0;
 
 static uint64_t timestamp_frequency = 12500000;
 static enum drm_i915_oa_format test_oa_format;
+static bool *undefined_a_counters;
 
 static igt_render_copyfunc_t render_copy = NULL;
 
@@ -408,9 +412,11 @@ init_sys_info(void)
 		test_set_name = "RenderBasic";
 		test_set_uuid = "403d8832-1a27-4aa6-a64e-f5389ce7b212";
 		test_oa_format = I915_OA_FORMAT_A45_B8_C8;
+		undefined_a_counters = hsw_undefined_a_counters;
 	} else {
 		test_set_name = "TestOa";
 		test_oa_format = I915_OA_FORMAT_A32u40_A4u32_B8_C8;
+		undefined_a_counters = gen8_undefined_a_counters;
 
 		if (IS_BROADWELL(devid)) {
 			test_set_uuid = "d6de6f55-e526-4f79-a6a6-d7315c09044e";
@@ -867,7 +873,7 @@ print_reports(uint32_t *oa_report0, uint32_t *oa_report1, int fmt)
 		int a_id = oa_formats[fmt].first_a + j;
 		uint32_t delta = a1[j] - a0[j];
 
-		if (hsw_undefined_a_counters[a_id])
+		if (undefined_a_counters[a_id])
 			continue;
 
 		igt_debug("A%d: 1st = %"PRIu32", 2nd = %"PRIu32", delta = %"PRIu32"\n",
@@ -978,7 +984,7 @@ test_oa_formats(void)
 			int a_id = oa_formats[i].first_a + j;
 			uint32_t delta = a1[j] - a0[j];
 
-			if (hsw_undefined_a_counters[a_id])
+			if (undefined_a_counters[a_id])
 				continue;
 
 			igt_debug("A%d: delta = %"PRIu32"\n", a_id, delta);
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 08/29] igt/perf: generalize reading gpu ticks from reports
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (6 preceding siblings ...)
  2017-04-25 22:32 ` [PATCH i-g-t 07/29] igt/perf: generalize checks for undefined A counters Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 09/29] igt/perf: move timebase + oa exponent utilities up Lionel Landwerlin
                   ` (20 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

From: Robert Bragg <robert@sixbynine.org>

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 67 +++++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 42 insertions(+), 25 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index fe39f4dd..48e8750f 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -250,6 +250,8 @@ static enum drm_i915_oa_format test_oa_format;
 static bool *undefined_a_counters;
 
 static igt_render_copyfunc_t render_copy = NULL;
+static uint32_t (*read_report_ticks)(uint32_t *report,
+				     enum drm_i915_oa_format format);
 
 static int
 __perf_open(int fd, struct drm_i915_perf_open_param *param)
@@ -393,6 +395,28 @@ read_debugfs_u64_record(int fd, const char *file, const char *key)
 	return val;
 }
 
+/* XXX: For Haswell this utility is only applicable to the render basic
+ * metric set.
+ *
+ * C2 corresponds to a clock counter for the Haswell render basic metric set
+ * but it's not included in all of the formats.
+ */
+static uint32_t
+hsw_read_report_ticks(uint32_t *report, enum drm_i915_oa_format format)
+{
+	uint32_t *c = (uint32_t *)(((uint8_t *)report) + oa_formats[format].c_off);
+
+	igt_assert_neq(oa_formats[format].n_c, 0);
+
+	return c[2];
+}
+
+static uint32_t
+gen8_read_report_ticks(uint32_t *report, enum drm_i915_oa_format format)
+{
+	return report[3];
+}
+
 static bool
 init_sys_info(void)
 {
@@ -413,10 +437,12 @@ init_sys_info(void)
 		test_set_uuid = "403d8832-1a27-4aa6-a64e-f5389ce7b212";
 		test_oa_format = I915_OA_FORMAT_A45_B8_C8;
 		undefined_a_counters = hsw_undefined_a_counters;
+		read_report_ticks = hsw_read_report_ticks;
 	} else {
 		test_set_name = "TestOa";
 		test_oa_format = I915_OA_FORMAT_A32u40_A4u32_B8_C8;
 		undefined_a_counters = gen8_undefined_a_counters;
+		read_report_ticks = gen8_read_report_ticks;
 
 		if (IS_BROADWELL(devid)) {
 			test_set_uuid = "d6de6f55-e526-4f79-a6a6-d7315c09044e";
@@ -945,30 +971,28 @@ test_oa_formats(void)
 		time_delta = timebase_scale(oa_report1[1] - oa_report0[1]);
 		igt_assert_neq(time_delta, 0);
 
-		/* C2 corresponds to a clock counter for the Haswell render
-		 * basic metric set but it's not included in all of the
-		 * formats.
+		/* As a special case we have to consider that on Haswell we
+		 * can't explicitly derive a clock delta for all OA report
+		 * formats...
 		 */
-		if (oa_formats[i].n_c) {
+		if (IS_HASWELL(devid) && oa_formats[i].n_c == 0) {
+			/* Assume running at max freq for sake of
+			 * below sanity check on counters... */
+			clock_delta = (gt_max_freq_mhz *
+				       (uint64_t)time_delta) / 1000;
+		} else {
+			uint32_t ticks0 = read_report_ticks(oa_report0, i);
+			uint32_t ticks1 = read_report_ticks(oa_report1, i);
 			uint64_t freq;
 
-			/* The first report might have a clock count of zero
-			 * but we wouldn't expect that in the second report...
-			 */
-			igt_assert_neq(c1[2], 0);
+			clock_delta = ticks1 - ticks0;
 
-			clock_delta = c1[2] - c0[2];
 			igt_assert_neq(clock_delta, 0);
 
 			freq = ((uint64_t)clock_delta * 1000) / time_delta;
 			igt_debug("freq = %"PRIu64"\n", freq);
 
 			igt_assert(freq <= gt_max_freq_mhz);
-		} else {
-			/* Assume running at max freq for sake of
-			 * below sanity check on counters... */
-			clock_delta = (gt_max_freq_mhz *
-				       (uint64_t)time_delta) / 1000;
 		}
 
 		igt_debug("clock delta = %"PRIu32"\n", clock_delta);
@@ -1035,7 +1059,6 @@ test_oa_exponents(int gt_freq_mhz)
 		uint32_t timestamp_delta;
 		uint32_t oa_report0[64];
 		uint32_t oa_report1[64];
-		uint32_t *c0, *c1;
 		uint32_t time_delta;
 		uint32_t clock_delta;
 		uint32_t freq;
@@ -1051,7 +1074,7 @@ test_oa_exponents(int gt_freq_mhz)
 
 		for (int j = 0; n_tested < 10 && j < 100; j++) {
 			int gt_freq_mhz_0, gt_freq_mhz_1;
-			int c_off;
+			uint32_t ticks0, ticks1;
 
 			gt_freq_mhz_0 = sysfs_read("gt_act_freq_mhz");
 
@@ -1087,15 +1110,9 @@ test_oa_exponents(int gt_freq_mhz)
 
 			igt_assert_eq(timestamp_delta, expected_timestamp_delta);
 
-			/* NB: for the render basic metric set opened above by
-			 * open_and_read_2_oa_reports(), the C2 counter is
-			 * configured as the gpu clock counter...
-			 */
-			c_off = oa_formats[test_oa_format].c_off;
-			igt_assert(c_off);
-			c0 = (uint32_t *)(((uint8_t *)oa_report0) + c_off);
-			c1 = (uint32_t *)(((uint8_t *)oa_report1) + c_off);
-			clock_delta = c1[2] - c0[2];
+			ticks0 = read_report_ticks(oa_report0, test_oa_format);
+			ticks1 = read_report_ticks(oa_report1, test_oa_format);
+			clock_delta = ticks1 - ticks0;
 
 			time_delta = timebase_scale(timestamp_delta);
 
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 09/29] igt/perf: move timebase + oa exponent utilities up
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (7 preceding siblings ...)
  2017-04-25 22:32 ` [PATCH i-g-t 08/29] igt/perf: generalize reading gpu ticks from reports Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 10/29] igt/perf: wrap emission of MI_REPORT_PERF_COUNT Lionel Landwerlin
                   ` (19 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

From: Robert Bragg <robert@sixbynine.org>

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 56 ++++++++++++++++++++++++++++----------------------------
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index 48e8750f..600fa7d9 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -417,6 +417,34 @@ gen8_read_report_ticks(uint32_t *report, enum drm_i915_oa_format format)
 	return report[3];
 }
 
+static uint64_t
+timebase_scale(uint32_t u32_delta)
+{
+	return ((uint64_t)u32_delta * NSEC_PER_SEC) / timestamp_frequency;
+}
+
+/* Return the largest OA exponent that will still result in a sampling
+ * frequency higher than the given frequency.
+ */
+static int
+max_oa_exponent_for_higher_freq(uint64_t freq)
+{
+	/* NB: timebase_scale() takes a uint32_t and an exponent of 30
+	 * would already represent a period of ~3 minutes so there's
+	 * really no need to consider higher exponents.
+	 */
+	for (int i = 0; i < 30; i++) {
+		uint64_t oa_period = timebase_scale(2 << i);
+		uint32_t oa_freq = NSEC_PER_SEC / oa_period;
+
+		if (oa_freq <= freq)
+			return max(0, i - 1);
+	}
+
+	igt_assert(!"reached");
+	return -1;
+}
+
 static bool
 init_sys_info(void)
 {
@@ -531,12 +559,6 @@ gt_frequency_range_restore(void)
 	gt_max_freq_mhz = gt_max_freq_mhz_saved;
 }
 
-static uint64_t
-timebase_scale(uint32_t u32_delta)
-{
-	return ((uint64_t)u32_delta * NSEC_PER_SEC) / timestamp_frequency;
-}
-
 /* CAP_SYS_ADMIN is required to open system wide metrics, unless the system
  * control parameter dev.i915.perf_stream_paranoid == 0 */
 static void
@@ -1184,28 +1206,6 @@ test_invalid_oa_exponent(void)
 	}
 }
 
-/* Return the largest OA exponent that will still result in a sampling
- * frequency higher than the given frequency.
- */
-static int
-max_oa_exponent_for_higher_freq(uint64_t freq)
-{
-	/* NB: timebase_scale() takes a uint32_t and an exponent of 30
-	 * would already represent a period of ~3 minutes so there's
-	 * really no need to consider higher exponents.
-	 */
-	for (int i = 0; i < 30; i++) {
-		uint64_t oa_period = timebase_scale(2 << i);
-		uint32_t oa_freq = NSEC_PER_SEC / oa_period;
-
-		if (oa_freq <= freq)
-			return max(0, i - 1);
-	}
-
-	igt_assert(!"reached");
-	return -1;
-}
-
 /* The lowest periodic sampling exponent equates to a period of 160 nanoseconds
  * or a frequency of 6.25MHz which is only possible to request as root by
  * default. By default the maximum OA sampling rate is 100KHz
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 10/29] igt/perf: wrap emission of MI_REPORT_PERF_COUNT
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (8 preceding siblings ...)
  2017-04-25 22:32 ` [PATCH i-g-t 09/29] igt/perf: move timebase + oa exponent utilities up Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 11/29] igt/perf: handling printing gen8 formats Lionel Landwerlin
                   ` (18 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

From: Robert Bragg <robert@sixbynine.org>

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 44 ++++++++++++++++++++++++++++++++------------
 1 file changed, 32 insertions(+), 12 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index 600fa7d9..864c465c 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -43,6 +43,7 @@
 IGT_TEST_DESCRIPTION("Test the i915 perf metrics streaming interface");
 
 #define GEN6_MI_REPORT_PERF_COUNT ((0x28 << 23) | (3 - 2))
+#define GEN8_MI_REPORT_PERF_COUNT ((0x28 << 23) | (4 - 2))
 
 #define GFX_OP_PIPE_CONTROL     ((3 << 29) | (3 << 27) | (2 << 24))
 #define PIPE_CONTROL_CS_STALL	   (1 << 20)
@@ -1949,6 +1950,32 @@ test_disabled_read_error(void)
 }
 
 static void
+emit_report_perf_count(struct intel_batchbuffer *batch,
+		       drm_intel_bo *dst_bo,
+		       int dst_offset,
+		       uint32_t report_id)
+{
+	if (IS_HASWELL(devid)) {
+		BEGIN_BATCH(3, 1);
+		OUT_BATCH(GEN6_MI_REPORT_PERF_COUNT);
+		OUT_RELOC(dst_bo, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
+			  dst_offset);
+		OUT_BATCH(report_id);
+		ADVANCE_BATCH();
+	} else {
+		/* XXX: NB: n dwords arg is actually magic since it internally
+		 * automatically accounts for larger addresses on gen >= 8...
+		 */
+		BEGIN_BATCH(3, 1);
+		OUT_BATCH(GEN8_MI_REPORT_PERF_COUNT);
+		OUT_RELOC(dst_bo, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
+			  dst_offset);
+		OUT_BATCH(report_id);
+		ADVANCE_BATCH();
+	}
+}
+
+static void
 test_mi_rpc(void)
 {
 	uint64_t properties[] = {
@@ -1991,12 +2018,10 @@ test_mi_rpc(void)
 	memset(bo->virtual, 0x80, 4096);
 	drm_intel_bo_unmap(bo);
 
-	BEGIN_BATCH(3, 1);
-	OUT_BATCH(GEN6_MI_REPORT_PERF_COUNT);
-	OUT_RELOC(bo, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
-		  0); /* offset in bytes */
-	OUT_BATCH(0xdeadbeef); /* report ID */
-	ADVANCE_BATCH();
+	emit_report_perf_count(batch,
+			       bo, /* dst */
+			       0, /* dst offset in bytes */
+			       0xdeadbeef); /* report ID */
 
 	intel_batchbuffer_flush_with_context(batch, context);
 
@@ -2063,12 +2088,7 @@ emit_stall_timestamp_and_rpc(struct intel_batchbuffer *batch,
 	OUT_BATCH(0); /* imm upper */
 	ADVANCE_BATCH();
 
-	BEGIN_BATCH(3, 1);
-	OUT_BATCH(GEN6_MI_REPORT_PERF_COUNT);
-	OUT_RELOC(dst, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
-		  report_dst_offset);
-	OUT_BATCH(report_id);
-	ADVANCE_BATCH();
+	emit_report_perf_count(batch, dst, report_dst_offset, report_id);
 }
 
 /* Tests the INTEL_performance_query use case where an unprivileged process
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 11/29] igt/perf: handling printing gen8 formats
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (9 preceding siblings ...)
  2017-04-25 22:32 ` [PATCH i-g-t 10/29] igt/perf: wrap emission of MI_REPORT_PERF_COUNT Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 12/29] igt/perf: avoid assumptions about oa exponent <-> freq mappings Lionel Landwerlin
                   ` (17 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

From: Robert Bragg <robert@sixbynine.org>

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 73 +++++++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 55 insertions(+), 18 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index 864c465c..15f41246 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -446,6 +446,26 @@ max_oa_exponent_for_higher_freq(uint64_t freq)
 	return -1;
 }
 
+static uint64_t
+gen8_read_40bit_a_counter(uint32_t *report, enum drm_i915_oa_format fmt, int a_id)
+{
+	uint8_t *a40_high = (((uint8_t *)report) + oa_formats[fmt].a40_high_off);
+	uint32_t *a40_low = (uint32_t *)(((uint8_t *)report) +
+					 oa_formats[fmt].a40_low_off);
+	uint64_t high = (uint64_t)(a40_high[a_id]) << 32;
+
+	return a40_low[a_id] | high;
+}
+
+static uint64_t
+gen8_40bit_a_delta(uint64_t value0, uint64_t value1)
+{
+	if (value0 > value1)
+		return (1ULL << 40) + value1 - value0;
+	else
+		return value1 - value0;
+}
+
 static bool
 init_sys_info(void)
 {
@@ -895,30 +915,37 @@ open_and_read_2_oa_reports(int format_id,
 static void
 print_reports(uint32_t *oa_report0, uint32_t *oa_report1, int fmt)
 {
-	uint32_t *a0, *b0, *c0;
-	uint32_t *a1, *b1, *c1;
-
-	/* Not ideal naming here with a0 or a1
-	 * differentiating report0 or 1 not A counter 0 or 1....
-	 */
-	a0 = (uint32_t *)(((uint8_t *)oa_report0) + oa_formats[fmt].a_off);
-	b0 = (uint32_t *)(((uint8_t *)oa_report0) + oa_formats[fmt].b_off);
-	c0 = (uint32_t *)(((uint8_t *)oa_report0) + oa_formats[fmt].c_off);
-
-	a1 = (uint32_t *)(((uint8_t *)oa_report1) + oa_formats[fmt].a_off);
-	b1 = (uint32_t *)(((uint8_t *)oa_report1) + oa_formats[fmt].b_off);
-	c1 = (uint32_t *)(((uint8_t *)oa_report1) + oa_formats[fmt].c_off);
-
 	igt_debug("TIMESTAMP: 1st = %"PRIu32", 2nd = %"PRIu32", delta = %"PRIu32"\n",
 		  oa_report0[1], oa_report1[1], oa_report1[1] - oa_report0[1]);
 
-	if (oa_formats[fmt].n_c) {
-		igt_debug("CLOCK: 1st = %"PRIu32", 2nd = %"PRIu32", delta = %"PRIu32"\n",
-			  c0[2], c1[2], c1[2] - c0[2]);
-	} else
+	if (IS_HASWELL(devid) && oa_formats[fmt].n_c == 0) {
 		igt_debug("CLOCK = N/A\n");
+	} else {
+		uint32_t clock0 = read_report_ticks(oa_report0, fmt);
+		uint32_t clock1 = read_report_ticks(oa_report1, fmt);
+
+		igt_debug("CLOCK: 1st = %"PRIu32", 2nd = %"PRIu32", delta = %"PRIu32"\n",
+			  clock0, clock1, clock1 - clock0);
+	}
+
+	/* Gen8+ has some 40bit A counters... */
+	for (int j = 0; j < oa_formats[fmt].n_a40; j++) {
+		uint64_t value0 = gen8_read_40bit_a_counter(oa_report0, fmt, j);
+		uint64_t value1 = gen8_read_40bit_a_counter(oa_report1, fmt, j);
+		uint64_t delta = gen8_40bit_a_delta(value0, value1);
+
+		if (undefined_a_counters[j])
+			continue;
+
+		igt_debug("A%d: 1st = %"PRIu64", 2nd = %"PRIu64", delta = %"PRIu64"\n",
+			  j, value0, value1, delta);
+	}
 
 	for (int j = 0; j < oa_formats[fmt].n_a; j++) {
+		uint32_t *a0 = (uint32_t *)(((uint8_t *)oa_report0) +
+					    oa_formats[fmt].a_off);
+		uint32_t *a1 = (uint32_t *)(((uint8_t *)oa_report1) +
+					    oa_formats[fmt].a_off);
 		int a_id = oa_formats[fmt].first_a + j;
 		uint32_t delta = a1[j] - a0[j];
 
@@ -930,13 +957,23 @@ print_reports(uint32_t *oa_report0, uint32_t *oa_report1, int fmt)
 	}
 
 	for (int j = 0; j < oa_formats[fmt].n_b; j++) {
+		uint32_t *b0 = (uint32_t *)(((uint8_t *)oa_report0) +
+					    oa_formats[fmt].b_off);
+		uint32_t *b1 = (uint32_t *)(((uint8_t *)oa_report1) +
+					    oa_formats[fmt].b_off);
 		uint32_t delta = b1[j] - b0[j];
+
 		igt_debug("B%d: 1st = %"PRIu32", 2nd = %"PRIu32", delta = %"PRIu32"\n",
 			  j, b0[j], b1[j], delta);
 	}
 
 	for (int j = 0; j < oa_formats[fmt].n_c; j++) {
+		uint32_t *c0 = (uint32_t *)(((uint8_t *)oa_report0) +
+					    oa_formats[fmt].c_off);
+		uint32_t *c1 = (uint32_t *)(((uint8_t *)oa_report1) +
+					    oa_formats[fmt].c_off);
 		uint32_t delta = c1[j] - c0[j];
+
 		igt_debug("C%d: 1st = %"PRIu32", 2nd = %"PRIu32", delta = %"PRIu32"\n",
 			  j, c0[j], c1[j], delta);
 	}
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 12/29] igt/perf: avoid assumptions about oa exponent <-> freq mappings
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (10 preceding siblings ...)
  2017-04-25 22:32 ` [PATCH i-g-t 11/29] igt/perf: handling printing gen8 formats Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 13/29] igt/perf: allow 10% margin matching oa/sysfs freq in test_oa_exponents Lionel Landwerlin
                   ` (16 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

From: Robert Bragg <robert@sixbynine.org>

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 135 +++++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 84 insertions(+), 51 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index 15f41246..d47e45c8 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -249,6 +249,7 @@ static uint64_t gt_max_freq_mhz = 0;
 static uint64_t timestamp_frequency = 12500000;
 static enum drm_i915_oa_format test_oa_format;
 static bool *undefined_a_counters;
+static uint64_t oa_exp_1_millisec;
 
 static igt_render_copyfunc_t render_copy = NULL;
 static uint32_t (*read_report_ticks)(uint32_t *report,
@@ -424,11 +425,11 @@ timebase_scale(uint32_t u32_delta)
 	return ((uint64_t)u32_delta * NSEC_PER_SEC) / timestamp_frequency;
 }
 
-/* Return the largest OA exponent that will still result in a sampling
- * frequency higher than the given frequency.
+/* Returns: the largest OA exponent that will still result in a sampling period
+ * less than or equal to the given @period.
  */
 static int
-max_oa_exponent_for_higher_freq(uint64_t freq)
+max_oa_exponent_for_period_lte(uint64_t period)
 {
 	/* NB: timebase_scale() takes a uint32_t and an exponent of 30
 	 * would already represent a period of ~3 minutes so there's
@@ -436,9 +437,8 @@ max_oa_exponent_for_higher_freq(uint64_t freq)
 	 */
 	for (int i = 0; i < 30; i++) {
 		uint64_t oa_period = timebase_scale(2 << i);
-		uint32_t oa_freq = NSEC_PER_SEC / oa_period;
 
-		if (oa_freq <= freq)
+		if (oa_period > period)
 			return max(0, i - 1);
 	}
 
@@ -446,6 +446,25 @@ max_oa_exponent_for_higher_freq(uint64_t freq)
 	return -1;
 }
 
+/* Return: the largest OA exponent that will still result in a sampling
+ * frequency greater than the given @frequency.
+ */
+static int
+max_oa_exponent_for_freq_gt(uint64_t frequency)
+{
+	uint64_t period = NSEC_PER_SEC / frequency;
+
+	igt_assert_neq(period, 0);
+
+	return max_oa_exponent_for_period_lte(period - 1);
+}
+
+static uint64_t
+oa_exponent_to_ns(int exponent)
+{
+       return 1000000000ULL * (2ULL << exponent) / timestamp_frequency;
+}
+
 static uint64_t
 gen8_read_40bit_a_counter(uint32_t *report, enum drm_i915_oa_format fmt, int a_id)
 {
@@ -524,6 +543,8 @@ init_sys_info(void)
 		  test_set_name,
 		  test_set_uuid);
 
+	oa_exp_1_millisec = max_oa_exponent_for_period_lte(1000000);
+
 	snprintf(buf, sizeof(buf),
 		 "/sys/class/drm/card%d/metrics/%s/id",
 		 card,
@@ -593,7 +614,7 @@ test_system_wide_paranoid(void)
 			/* OA unit configuration */
 			DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
 			DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
-			DRM_I915_PERF_PROP_OA_EXPONENT, 13, /* 1 millisecond */
+			DRM_I915_PERF_PROP_OA_EXPONENT, oa_exp_1_millisec,
 		};
 		struct drm_i915_perf_open_param param = {
 			.flags = I915_PERF_FLAG_FD_CLOEXEC |
@@ -619,7 +640,7 @@ test_system_wide_paranoid(void)
 			/* OA unit configuration */
 			DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
 			DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
-			DRM_I915_PERF_PROP_OA_EXPONENT, 13, /* 1 millisecond */
+			DRM_I915_PERF_PROP_OA_EXPONENT, oa_exp_1_millisec,
 		};
 		struct drm_i915_perf_open_param param = {
 			.flags = I915_PERF_FLAG_FD_CLOEXEC |
@@ -653,7 +674,7 @@ test_invalid_open_flags(void)
 		/* OA unit configuration */
 		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
 		DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
-		DRM_I915_PERF_PROP_OA_EXPONENT, 13, /* 1 millisecond */
+		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exp_1_millisec,
 	};
 	struct drm_i915_perf_open_param param = {
 		.flags = ~0, /* Undefined flag bits set! */
@@ -673,7 +694,7 @@ test_invalid_oa_metric_set_id(void)
 
 		/* OA unit configuration */
 		DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
-		DRM_I915_PERF_PROP_OA_EXPONENT, 13, /* 1 millisecond */
+		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exp_1_millisec,
 		DRM_I915_PERF_PROP_OA_METRICS_SET, UINT64_MAX,
 	};
 	struct drm_i915_perf_open_param param = {
@@ -708,7 +729,7 @@ test_invalid_oa_format_id(void)
 
 		/* OA unit configuration */
 		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
-		DRM_I915_PERF_PROP_OA_EXPONENT, 13, /* 1 millisecond */
+		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exp_1_millisec,
 		DRM_I915_PERF_PROP_OA_FORMAT, UINT64_MAX,
 	};
 	struct drm_i915_perf_open_param param = {
@@ -742,7 +763,7 @@ test_missing_sample_flags(void)
 
 		/* OA unit configuration */
 		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
-		DRM_I915_PERF_PROP_OA_EXPONENT, 13, /* 1 millisecond */
+		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exp_1_millisec,
 		DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
 	};
 	struct drm_i915_perf_open_param param = {
@@ -982,8 +1003,6 @@ print_reports(uint32_t *oa_report0, uint32_t *oa_report1, int fmt)
 static void
 test_oa_formats(void)
 {
-	int oa_exponent = 13;
-
 	for (int i = 0; i < ARRAY_SIZE(oa_formats); i++) {
 		uint32_t oa_report0[64];
 		uint32_t oa_report1[64];
@@ -1013,7 +1032,7 @@ test_oa_formats(void)
 		igt_debug("Checking OA format %s\n", oa_formats[i].name);
 
 		open_and_read_2_oa_reports(i,
-					   oa_exponent,
+					   oa_exp_1_millisec,
 					   oa_report0,
 					   oa_report1,
 					   false); /* timer reports only */
@@ -1252,7 +1271,7 @@ static void
 test_low_oa_exponent_permissions(void)
 {
 	int max_freq = read_u64_file("/proc/sys/dev/i915/oa_max_sample_rate");
-	int bad_exponent = max_oa_exponent_for_higher_freq(max_freq);
+	int bad_exponent = max_oa_exponent_for_freq_gt(max_freq);
 	int ok_exponent = bad_exponent + 1;
 	uint64_t properties[] = {
 		/* Include OA reports in samples */
@@ -1326,7 +1345,7 @@ test_per_context_mode_unprivileged(void)
 		/* OA unit configuration */
 		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
 		DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
-		DRM_I915_PERF_PROP_OA_EXPONENT, 13, /* 1 millisecond */
+		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exp_1_millisec,
 	};
 	struct drm_i915_perf_open_param param = {
 		.flags = I915_PERF_FLAG_FD_CLOEXEC,
@@ -1399,13 +1418,14 @@ get_time(void)
 static void
 test_blocking(void)
 {
-	/* 40 milliseconds
+	/* ~40 milliseconds
 	 *
 	 * Having a period somewhat > sysconf(_SC_CLK_TCK) helps to stop
 	 * scheduling (liable to kick in when we make blocking poll()s/reads)
 	 * from interfering with the test.
 	 */
-	int oa_exponent = 18;
+	int oa_exponent = max_oa_exponent_for_period_lte(40000000);
+	uint64_t oa_period = oa_exponent_to_ns(oa_exponent);
 	uint64_t properties[] = {
 		/* Include OA reports in samples */
 		DRM_I915_PERF_PROP_SAMPLE_OA, true,
@@ -1428,8 +1448,7 @@ test_blocking(void)
 	int64_t tick_ns = 1000000000 / sysconf(_SC_CLK_TCK);
 	int64_t test_duration_ns = tick_ns * 1000;
 
-	/* Based on the 40ms OA sampling period set above: max OA samples: */
-	int max_iterations = (test_duration_ns / 40000000ull) + 1;
+	int max_iterations = (test_duration_ns / oa_period) + 1;
 
 	/* It's a bit tricky to put a lower limit here, but we expect a
 	 * relatively low latency for seeing reports, while we don't currently
@@ -1440,7 +1459,7 @@ test_blocking(void)
 	 * the knowledge that that the driver uses a 200Hz hrtimer (5ms period)
 	 * to check for data and giving some time to read().
 	 */
-	int min_iterations = (test_duration_ns / 46000000ull);
+	int min_iterations = (test_duration_ns / (oa_period + 6000000ull));
 
 	int64_t start;
 	int n = 0;
@@ -1489,7 +1508,7 @@ test_blocking(void)
 	user_ns = (end_times.tms_utime - start_times.tms_utime) * tick_ns;
 	kernel_ns = (end_times.tms_stime - start_times.tms_stime) * tick_ns;
 
-	igt_debug("%d blocking reads during test with 25Hz OA sampling\n", n);
+	igt_debug("%d blocking reads during test with ~25Hz OA sampling\n", n);
 	igt_debug("time in userspace = %"PRIu64"ns (+-%dns) (start utime = %d, end = %d)\n",
 		  user_ns, (int)tick_ns,
 		  (int)start_times.tms_utime, (int)end_times.tms_utime);
@@ -1515,13 +1534,14 @@ test_blocking(void)
 static void
 test_polling(void)
 {
-	/* 40 milliseconds
+	/* ~40 milliseconds
 	 *
 	 * Having a period somewhat > sysconf(_SC_CLK_TCK) helps to stop
 	 * scheduling (liable to kick in when we make blocking poll()s/reads)
 	 * from interfering with the test.
 	 */
-	int oa_exponent = 18;
+	int oa_exponent = max_oa_exponent_for_period_lte(40000000);
+	uint64_t oa_period = oa_exponent_to_ns(oa_exponent);
 	uint64_t properties[] = {
 		/* Include OA reports in samples */
 		DRM_I915_PERF_PROP_SAMPLE_OA, true,
@@ -1545,8 +1565,7 @@ test_polling(void)
 	int64_t tick_ns = 1000000000 / sysconf(_SC_CLK_TCK);
 	int64_t test_duration_ns = tick_ns * 1000;
 
-	/* Based on the 40ms OA sampling period set above: max OA samples: */
-	int max_iterations = (test_duration_ns / 40000000ull) + 1;
+	int max_iterations = (test_duration_ns / oa_period) + 1;
 
 	/* It's a bit tricky to put a lower limit here, but we expect a
 	 * relatively low latency for seeing reports, while we don't currently
@@ -1557,7 +1576,7 @@ test_polling(void)
 	 * the knowledge that that the driver uses a 200Hz hrtimer (5ms period)
 	 * to check for data and giving some time to read().
 	 */
-	int min_iterations = (test_duration_ns / 46000000ull);
+	int min_iterations = (test_duration_ns / (oa_period + 6000000ull));
 	int64_t start;
 	int n = 0;
 
@@ -1636,7 +1655,7 @@ test_polling(void)
 	user_ns = (end_times.tms_utime - start_times.tms_utime) * tick_ns;
 	kernel_ns = (end_times.tms_stime - start_times.tms_stime) * tick_ns;
 
-	igt_debug("%d blocking poll()s during test with 25Hz OA sampling\n", n);
+	igt_debug("%d blocking poll()s during test with ~25Hz OA sampling\n", n);
 	igt_debug("time in userspace = %"PRIu64"ns (+-%dns) (start utime = %d, end = %d)\n",
 		  user_ns, (int)tick_ns,
 		  (int)start_times.tms_utime, (int)end_times.tms_utime);
@@ -1662,7 +1681,9 @@ test_polling(void)
 static void
 test_buffer_fill(void)
 {
-	int oa_exponent = 5; /* 5 micro seconds */
+	/* ~5 micro second period */
+	int oa_exponent = max_oa_exponent_for_period_lte(5000);
+	uint64_t oa_period = oa_exponent_to_ns(oa_exponent);
 	uint64_t properties[] = {
 		/* Include OA reports in samples */
 		DRM_I915_PERF_PROP_SAMPLE_OA, true,
@@ -1680,7 +1701,12 @@ test_buffer_fill(void)
 	int stream_fd = __perf_open(drm_fd, &param);
 	int buf_size = 65536 * (256 + sizeof(struct drm_i915_perf_record_header));
 	uint8_t *buf = malloc(buf_size);
+	size_t oa_buf_size = 16 * 1024 * 1024;
+	size_t report_size = oa_formats[test_oa_format].size;
+	int n_full_oa_reports = oa_buf_size / report_size;
+	uint64_t fill_duration = n_full_oa_reports * oa_period;
 
+	igt_assert(fill_duration < 1000000000);
 
 	for (int i = 0; i < 5; i++) {
 		struct drm_i915_perf_record_header *header;
@@ -1688,9 +1714,9 @@ test_buffer_fill(void)
 		int offset = 0;
 		int len;
 
-		/* It should take ~330 milliseconds to fill a 16MB OA buffer with a
-		 * 5 microsecond sampling period and 256 byte reports. */
-		nanosleep(&(struct timespec){ .tv_sec = 0, .tv_nsec = 500000000 }, NULL);
+		nanosleep(&(struct timespec){ .tv_sec = 0,
+					      .tv_nsec = fill_duration * 1.25 },
+			  NULL);
 
 		while ((len = read(stream_fd, buf, buf_size)) == -1 && errno == EINTR)
 			;
@@ -1707,15 +1733,17 @@ test_buffer_fill(void)
 
 		igt_assert_eq(overflow_seen, true);
 
-		nanosleep(&(struct timespec){ .tv_sec = 0, .tv_nsec = 1000000 }, NULL);
+		nanosleep(&(struct timespec){ .tv_sec = 0,
+					      .tv_nsec = fill_duration / 2 },
+			  NULL);
 
 		while ((len = read(stream_fd, buf, buf_size)) == -1 && errno == EINTR)
 			;
 
 		igt_assert_neq(len, -1);
 
-		/* expect ~ 200 records in 1 millisecond */
-		igt_assert(len > 256 * 150);
+		igt_assert(len > report_size * n_full_oa_reports * 0.45);
+		igt_assert(len < report_size * n_full_oa_reports * 0.55);
 
 		overflow_seen = false;
 		for (offset = 0; offset < len; offset += header->size) {
@@ -1736,7 +1764,9 @@ test_buffer_fill(void)
 static void
 test_enable_disable(void)
 {
-	int oa_exponent = 5; /* 5 micro seconds */
+	/* ~5 micro second period */
+	int oa_exponent = max_oa_exponent_for_period_lte(5000);
+	uint64_t oa_period = oa_exponent_to_ns(oa_exponent);
 	uint64_t properties[] = {
 		/* Include OA reports in samples */
 		DRM_I915_PERF_PROP_SAMPLE_OA, true,
@@ -1755,20 +1785,22 @@ test_enable_disable(void)
 	int stream_fd = __perf_open(drm_fd, &param);
 	int buf_size = 65536 * (256 + sizeof(struct drm_i915_perf_record_header));
 	uint8_t *buf = malloc(buf_size);
+	size_t oa_buf_size = 16 * 1024 * 1024;
+	size_t report_size = oa_formats[test_oa_format].size;
+	int n_full_oa_reports = oa_buf_size / report_size;
+	uint64_t fill_duration = n_full_oa_reports * oa_period;
 
 
 	for (int i = 0; i < 5; i++) {
 		int len;
 
-		/* If the stream were enabled then it would take ~330
-		 * milliseconds to fill a 16MB OA buffer with a 5 microsecond
-		 * sampling period and 256 byte reports.
-		 *
-		 * Giving enough time for an overflow might help catch whether
+		/* Giving enough time for an overflow might help catch whether
 		 * the OA unit has been enabled even if the driver might at
 		 * least avoid copying reports while disabled.
 		 */
-		nanosleep(&(struct timespec){ .tv_sec = 0, .tv_nsec = 500000000 }, NULL);
+		nanosleep(&(struct timespec){ .tv_sec = 0,
+					      .tv_nsec = fill_duration * 1.25 },
+			  NULL);
 
 		while ((len = read(stream_fd, buf, buf_size)) == -1 && errno == EINTR)
 			;
@@ -1778,15 +1810,17 @@ test_enable_disable(void)
 
 		do_ioctl(stream_fd, I915_PERF_IOCTL_ENABLE, 0);
 
-		nanosleep(&(struct timespec){ .tv_sec = 0, .tv_nsec = 1000000 }, NULL);
+		nanosleep(&(struct timespec){ .tv_sec = 0,
+					      .tv_nsec = fill_duration / 2 },
+			  NULL);
 
 		while ((len = read(stream_fd, buf, buf_size)) == -1 && errno == EINTR)
 			;
 
 		igt_assert_neq(len, -1);
 
-		/* expect ~ 200 records in 1 millisecond */
-		igt_assert(len > 256 * 150 && len < 256 * 2000);
+		igt_assert(len > report_size * n_full_oa_reports * 0.45);
+		igt_assert(len < report_size * n_full_oa_reports * 0.55);
 
 		do_ioctl(stream_fd, I915_PERF_IOCTL_DISABLE, 0);
 
@@ -1807,7 +1841,7 @@ test_enable_disable(void)
 static void
 test_short_reads(void)
 {
-	int oa_exponent = 5; /* 5 micro seconds */
+	int oa_exponent = max_oa_exponent_for_period_lte(5000);
 	uint64_t properties[] = {
 		/* Include OA reports in samples */
 		DRM_I915_PERF_PROP_SAMPLE_OA, true,
@@ -2350,7 +2384,6 @@ test_per_ctx_mi_rpc(void)
 static void
 test_rc6_disable(void)
 {
-	int oa_exponent = 13; /* 1 millisecond */
 	uint64_t properties[] = {
 		/* Include OA reports in samples */
 		DRM_I915_PERF_PROP_SAMPLE_OA, true,
@@ -2358,7 +2391,7 @@ test_rc6_disable(void)
 		/* OA unit configuration */
 		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
 		DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
-		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exponent,
+		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exp_1_millisec,
 	};
 	struct drm_i915_perf_open_param param = {
 		.flags = I915_PERF_FLAG_FD_CLOEXEC,
@@ -2424,7 +2457,6 @@ done:
 static void
 test_i915_ref_count(void)
 {
-	int oa_exponent = 13; /* 1 millisecond */
 	uint64_t properties[] = {
 		/* Include OA reports in samples */
 		DRM_I915_PERF_PROP_SAMPLE_OA, true,
@@ -2432,7 +2464,7 @@ test_i915_ref_count(void)
 		/* OA unit configuration */
 		DRM_I915_PERF_PROP_OA_METRICS_SET, 0 /* updated below */,
 		DRM_I915_PERF_PROP_OA_FORMAT, 0, /* update below */
-		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exponent,
+		DRM_I915_PERF_PROP_OA_EXPONENT, 0, /* update below */
 	};
 	struct drm_i915_perf_open_param param = {
 		.flags = I915_PERF_FLAG_FD_CLOEXEC,
@@ -2462,6 +2494,7 @@ test_i915_ref_count(void)
 	igt_require(init_sys_info());
 	properties[3] = test_metric_set_id;
 	properties[5] = test_oa_format;
+	properties[7] = oa_exp_1_millisec;
 
 	ref_count0 = read_i915_module_ref();
 	igt_debug("initial ref count with drm_fd open = %u\n", ref_count0);
@@ -2481,7 +2514,7 @@ test_i915_ref_count(void)
 
 	read_2_oa_reports(stream_fd,
 			  test_oa_format,
-			  oa_exponent,
+			  oa_exp_1_millisec,
 			  oa_report0,
 			  oa_report1,
 			  false); /* not just timer reports */
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 13/29] igt/perf: allow 10% margin matching oa/sysfs freq in test_oa_exponents
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (11 preceding siblings ...)
  2017-04-25 22:32 ` [PATCH i-g-t 12/29] igt/perf: avoid assumptions about oa exponent <-> freq mappings Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 14/29] igt/perf: s/test_perf_ctx_mi_rpc/hsw_test_single_ctx_counters/ Lionel Landwerlin
                   ` (15 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

From: Robert Bragg <robert@sixbynine.org>

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index d47e45c8..c8092eaa 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -1111,6 +1111,8 @@ test_oa_formats(void)
 static void
 test_oa_exponents(int gt_freq_mhz)
 {
+	uint32_t freq_margin;
+
 	/* This test tries to use the sysfs interface for pinning the GT
 	 * frequency so we have another point of reference for comparing with
 	 * the clock frequency as derived from OA reports.
@@ -1129,11 +1131,17 @@ test_oa_exponents(int gt_freq_mhz)
 	igt_debug("Testing OA timer exponents with requested GT frequency = %dmhz\n",
 		  gt_freq_mhz);
 
+	/* allow a +- 10% error margin when checking that the frequency
+	 * calculated from the OA reports matches the frequency according to
+	 * sysfs.
+	 */
+	freq_margin = gt_freq_mhz * 0.1;
+
 	/* It's asking a lot to sample with a 160 nanosecond period and the
 	 * test can fail due to buffer overflows if it wasn't possible to
 	 * keep up, so we don't start from an exponent of zero...
 	 */
-	for (int i = 2; i < 20; i++) {
+	for (int i = 5; i < 20; i++) {
 		uint32_t expected_timestamp_delta;
 		uint32_t timestamp_delta;
 		uint32_t oa_report0[64];
@@ -1157,8 +1165,10 @@ test_oa_exponents(int gt_freq_mhz)
 
 			gt_freq_mhz_0 = sysfs_read("gt_act_freq_mhz");
 
-			igt_debug("ITER %d: testing OA exponent %d with sysfs GT freq = %dmhz\n",
-				  j, i, gt_freq_mhz_0);
+			igt_debug("ITER %d: testing OA exponent %d (period = %"PRIu64"ns) with sysfs GT freq = %dmhz +- %u\n",
+				  j, i,
+				  oa_exponent_to_ns(i),
+				  gt_freq_mhz_0, freq_margin);
 
 			open_and_read_2_oa_reports(test_oa_format,
 						   i, /* exponent */
@@ -1199,7 +1209,8 @@ test_oa_exponents(int gt_freq_mhz)
 			igt_debug("ITER %d: time delta = %"PRIu32"(ns) clock delta = %"PRIu32" freq = %"PRIu32"(mhz)\n",
 				  j, time_delta, clock_delta, freq);
 
-			if (freq == gt_freq_mhz_1)
+                        if (freq < (gt_freq_mhz_1 + freq_margin) &&
+                            freq > (gt_freq_mhz_1 - freq_margin))
 				n_freq_matches++;
 
 			n_tested++;
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 14/29] igt/perf: s/test_perf_ctx_mi_rpc/hsw_test_single_ctx_counters/
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (12 preceding siblings ...)
  2017-04-25 22:32 ` [PATCH i-g-t 13/29] igt/perf: allow 10% margin matching oa/sysfs freq in test_oa_exponents Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 15/29] igt/perf: don't assume constant of 40 EUs Lionel Landwerlin
                   ` (14 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

From: Robert Bragg <robert@sixbynine.org>

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index c8092eaa..62bfd80f 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -2177,9 +2177,15 @@ emit_stall_timestamp_and_rpc(struct intel_batchbuffer *batch,
  * should be able to configure the OA unit for per-context metrics (for a
  * context associated with that process' drm file descriptor) and the counters
  * should only relate to that specific context.
+ *
+ * Unfortunately only Haswell limits the progression of OA counters for a
+ * single context and so this unit test is Haswell specific. For Gen8+ although
+ * reports read via i915 perf can be filtered for a single context the counters
+ * themselves always progress as global/system-wide counters affected by all
+ * contexts.
  */
 static void
-test_per_ctx_mi_rpc(void)
+hsw_test_single_ctx_counters(void)
 {
 	uint64_t properties[] = {
 		DRM_I915_PERF_PROP_CTX_HANDLE, UINT64_MAX, /* updated below */
@@ -2638,8 +2644,16 @@ igt_main
 	igt_subtest("mi-rpc")
 		test_mi_rpc();
 
-	igt_subtest("mi-rpc-per-ctx")
-		test_per_ctx_mi_rpc();
+	igt_subtest("unprivileged-singled-ctx-counters") {
+		/* For Gen8+ the OA unit can no longer be made to clock gate
+		 * for a specific context. Additionally the partial-replacement
+		 * functionality to HW filter timer reports for a specific
+		 * context (SKL+) can't stop multiple applications viewing
+		 * system-wide data via MI_REPORT_PERF_COUNT commands.
+		 */
+		igt_require(IS_HASWELL(devid));
+		hsw_test_single_ctx_counters();
+	}
 
 	igt_subtest("rc6-disable")
 		test_rc6_disable();
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 15/29] igt/perf: don't assume constant of 40 EUs
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (13 preceding siblings ...)
  2017-04-25 22:32 ` [PATCH i-g-t 14/29] igt/perf: s/test_perf_ctx_mi_rpc/hsw_test_single_ctx_counters/ Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 16/29] igt/perf: consider ctx-switch reports while polling/blocking Lionel Landwerlin
                   ` (13 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

From: Robert Bragg <robert@sixbynine.org>

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index 62bfd80f..9a8c54fc 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -239,6 +239,7 @@ static bool gen8_undefined_a_counters[45];
 static int drm_fd = -1;
 static uint32_t devid;
 static int card = -1;
+static int n_eus;
 
 static uint64_t test_metric_set_id = UINT64_MAX;
 static uint64_t gt_min_freq_mhz_saved = 0;
@@ -506,7 +507,20 @@ init_sys_info(void)
 		test_oa_format = I915_OA_FORMAT_A45_B8_C8;
 		undefined_a_counters = hsw_undefined_a_counters;
 		read_report_ticks = hsw_read_report_ticks;
+
+		if (intel_gt(devid) == 0)
+			n_eus = 10;
+		else if (intel_gt(devid) == 1)
+			n_eus = 20;
+		else if (intel_gt(devid) == 2)
+			n_eus = 40;
+		else {
+			igt_assert(!"reached");
+			return false;
+		}
 	} else {
+		drm_i915_getparam_t gp;
+
 		test_set_name = "TestOa";
 		test_oa_format = I915_OA_FORMAT_A32u40_A4u32_B8_C8;
 		undefined_a_counters = gen8_undefined_a_counters;
@@ -537,6 +551,10 @@ init_sys_info(void)
 			timestamp_frequency = 19200000;
 		} else
 			return false;
+
+		gp.param = I915_PARAM_EU_TOTAL;
+		gp.value = &n_eus;
+		do_ioctl(drm_fd, DRM_IOCTL_I915_GETPARAM, &gp);
 	}
 
 	igt_debug("%s metric set UUID = %s\n",
@@ -1077,11 +1095,11 @@ test_oa_formats(void)
 		igt_debug("clock delta = %"PRIu32"\n", clock_delta);
 
 		/* The maximum rate for any HSW counter =
-		 *   clock_delta * 40 EUs
+		 *   clock_delta * N EUs
 		 *
 		 * Sanity check that no counters exceed this delta.
 		 */
-		max_delta = clock_delta * 40;
+		max_delta = clock_delta * n_eus;
 
 		for (int j = 0; j < oa_formats[i].n_a; j++) {
 			int a_id = oa_formats[i].first_a + j;
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 16/29] igt/perf: consider ctx-switch reports while polling/blocking
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (14 preceding siblings ...)
  2017-04-25 22:32 ` [PATCH i-g-t 15/29] igt/perf: don't assume constant of 40 EUs Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 17/29] igt/perf: factor out oa report sanity checking Lionel Landwerlin
                   ` (12 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

From: Robert Bragg <robert@sixbynine.org>

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 92 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 86 insertions(+), 6 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index 9a8c54fc..fe5ff0fc 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -45,6 +45,12 @@ IGT_TEST_DESCRIPTION("Test the i915 perf metrics streaming interface");
 #define GEN6_MI_REPORT_PERF_COUNT ((0x28 << 23) | (3 - 2))
 #define GEN8_MI_REPORT_PERF_COUNT ((0x28 << 23) | (4 - 2))
 
+#define OAREPORT_REASON_MASK           0x3f
+#define OAREPORT_REASON_SHIFT          19
+#define OAREPORT_REASON_TIMER          (1<<0)
+#define OAREPORT_REASON_CTX_SWITCH     (1<<3)
+#define OAREPORT_REASON_CLK_RATIO      (1<<5)
+
 #define GFX_OP_PIPE_CONTROL     ((3 << 29) | (3 << 27) | (2 << 24))
 #define PIPE_CONTROL_CS_STALL	   (1 << 20)
 #define PIPE_CONTROL_GLOBAL_SNAPSHOT_COUNT_RESET	(1 << 19)
@@ -1478,6 +1484,7 @@ test_blocking(void)
 	int64_t test_duration_ns = tick_ns * 1000;
 
 	int max_iterations = (test_duration_ns / oa_period) + 1;
+	int n_extra_iterations = 0;
 
 	/* It's a bit tricky to put a lower limit here, but we expect a
 	 * relatively low latency for seeing reports, while we don't currently
@@ -1518,6 +1525,9 @@ test_blocking(void)
 	 * We Loop for 1000 x tick_ns so one tick corresponds to 0.1%
 	 */
 	for (start = get_time(); (get_time() - start) < test_duration_ns; /* nop */) {
+		struct drm_i915_perf_record_header *header;
+		bool timer_report_read = false;
+		bool non_timer_report_read = false;
 		int ret;
 
 		while ((ret = read(stream_fd, buf, sizeof(buf))) < 0 &&
@@ -1526,6 +1536,36 @@ test_blocking(void)
 
 		igt_assert(ret > 0);
 
+		/* For Haswell reports don't contain a well defined reason
+		 * field we so assume all reports to be 'periodic'. For gen8+
+		 * we want to to consider that the HW automatically writes some
+		 * non periodic reports (e.g. on context switch) which might
+		 * lead to more successful read()s than expected due to
+		 * periodic sampling and we don't want these extra reads to
+		 * cause the test to fail...
+		 */
+		if (intel_gen(devid) >= 8) {
+			for (int offset = 0; offset < ret; offset += header->size) {
+				header = (void *)(buf + offset);
+
+				if (header->type == DRM_I915_PERF_RECORD_SAMPLE) {
+					uint32_t *report = (void *)(header + 1);
+
+					uint32_t reason = ((report[0] >>
+							    OAREPORT_REASON_SHIFT) &
+							   OAREPORT_REASON_MASK);
+
+					if (reason & OAREPORT_REASON_TIMER)
+						timer_report_read = true;
+					else
+						non_timer_report_read = true;
+				}
+			}
+		}
+
+		if (non_timer_report_read && !timer_report_read)
+			n_extra_iterations++;
+
 		n++;
 	}
 
@@ -1537,7 +1577,10 @@ test_blocking(void)
 	user_ns = (end_times.tms_utime - start_times.tms_utime) * tick_ns;
 	kernel_ns = (end_times.tms_stime - start_times.tms_stime) * tick_ns;
 
-	igt_debug("%d blocking reads during test with ~25Hz OA sampling\n", n);
+	igt_debug("%d blocking reads during test with ~25Hz OA sampling (expect no more than %d)\n",
+		  n, max_iterations);
+	igt_debug("%d extra iterations seen, not related to periodic sampling (e.g. context switches)\n",
+		  n_extra_iterations);
 	igt_debug("time in userspace = %"PRIu64"ns (+-%dns) (start utime = %d, end = %d)\n",
 		  user_ns, (int)tick_ns,
 		  (int)start_times.tms_utime, (int)end_times.tms_utime);
@@ -1548,12 +1591,12 @@ test_blocking(void)
 	/* With completely broken blocking (but also not returning an error) we
 	 * could end up with an open loop,
 	 */
-	igt_assert(n <= max_iterations);
+	igt_assert(n <= (max_iterations + n_extra_iterations));
 
 	/* Make sure the driver is reporting new samples with a reasonably
 	 * low latency...
 	 */
-	igt_assert(n > min_iterations);
+	igt_assert(n > (min_iterations + n_extra_iterations));
 
 	igt_assert(kernel_ns <= (test_duration_ns / 100ull));
 
@@ -1595,6 +1638,7 @@ test_polling(void)
 	int64_t test_duration_ns = tick_ns * 1000;
 
 	int max_iterations = (test_duration_ns / oa_period) + 1;
+	int n_extra_iterations = 0;
 
 	/* It's a bit tricky to put a lower limit here, but we expect a
 	 * relatively low latency for seeing reports, while we don't currently
@@ -1635,6 +1679,9 @@ test_polling(void)
 	 */
 	for (start = get_time(); (get_time() - start) < test_duration_ns; /* nop */) {
 		struct pollfd pollfd = { .fd = stream_fd, .events = POLLIN };
+		struct drm_i915_perf_record_header *header;
+		bool timer_report_read = false;
+		bool non_timer_report_read = false;
 		int ret;
 
 		while ((ret = poll(&pollfd, 1, -1)) < 0 &&
@@ -1663,6 +1710,36 @@ test_polling(void)
 			igt_debug("Unexpected error when reading after poll = %d\n", errno);
 		igt_assert_neq(ret, -1);
 
+		/* For Haswell reports don't contain a well defined reason
+		 * field we so assume all reports to be 'periodic'. For gen8+
+		 * we want to to consider that the HW automatically writes some
+		 * non periodic reports (e.g. on context switch) which might
+		 * lead to more successful read()s than expected due to
+		 * periodic sampling and we don't want these extra reads to
+		 * cause the test to fail...
+		 */
+		if (intel_gen(devid) >= 8) {
+			for (int offset = 0; offset < ret; offset += header->size) {
+				header = (void *)(buf + offset);
+
+				if (header->type == DRM_I915_PERF_RECORD_SAMPLE) {
+					uint32_t *report = (void *)(header + 1);
+
+					uint32_t reason = ((report[0] >>
+							    OAREPORT_REASON_SHIFT) &
+							   OAREPORT_REASON_MASK);
+
+					if (reason & OAREPORT_REASON_TIMER)
+						timer_report_read = true;
+					else
+						non_timer_report_read = true;
+				}
+			}
+		}
+
+		if (non_timer_report_read && !timer_report_read)
+			n_extra_iterations++;
+
 		/* At this point, after consuming pending reports (and hoping
 		 * the scheduler hasn't stopped us for too long we now
 		 * expect EAGAIN on read.
@@ -1684,7 +1761,10 @@ test_polling(void)
 	user_ns = (end_times.tms_utime - start_times.tms_utime) * tick_ns;
 	kernel_ns = (end_times.tms_stime - start_times.tms_stime) * tick_ns;
 
-	igt_debug("%d blocking poll()s during test with ~25Hz OA sampling\n", n);
+	igt_debug("%d blocking reads during test with ~25Hz OA sampling (expect no more than %d)\n",
+		  n, max_iterations);
+	igt_debug("%d extra iterations seen, not related to periodic sampling (e.g. context switches)\n",
+		  n_extra_iterations);
 	igt_debug("time in userspace = %"PRIu64"ns (+-%dns) (start utime = %d, end = %d)\n",
 		  user_ns, (int)tick_ns,
 		  (int)start_times.tms_utime, (int)end_times.tms_utime);
@@ -1695,12 +1775,12 @@ test_polling(void)
 	/* With completely broken blocking while polling (but still somehow
 	 * reporting a POLLIN event) we could end up with an open loop.
 	 */
-	igt_assert(n <= max_iterations);
+	igt_assert(n <= (max_iterations + n_extra_iterations));
 
 	/* Make sure the driver is reporting new samples with a reasonably
 	 * low latency...
 	 */
-	igt_assert(n > min_iterations);
+	igt_assert(n > (min_iterations + n_extra_iterations));
 
 	igt_assert(kernel_ns <= (test_duration_ns / 100ull));
 
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 17/29] igt/perf: factor out oa report sanity checking
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (15 preceding siblings ...)
  2017-04-25 22:32 ` [PATCH i-g-t 16/29] igt/perf: consider ctx-switch reports while polling/blocking Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 18/29] igt/perf: print [un]slice freq and report reasons in debug Lionel Landwerlin
                   ` (11 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

From: Robert Bragg <robert@sixbynine.org>

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 274 +++++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 202 insertions(+), 72 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index fe5ff0fc..08ee8665 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -261,6 +261,8 @@ static uint64_t oa_exp_1_millisec;
 static igt_render_copyfunc_t render_copy = NULL;
 static uint32_t (*read_report_ticks)(uint32_t *report,
 				     enum drm_i915_oa_format format);
+static void (*sanity_check_reports)(uint32_t *oa_report0, uint32_t *oa_report1,
+				    enum drm_i915_oa_format format);
 
 static int
 __perf_open(int fd, struct drm_i915_perf_open_param *param)
@@ -472,6 +474,90 @@ oa_exponent_to_ns(int exponent)
        return 1000000000ULL * (2ULL << exponent) / timestamp_frequency;
 }
 
+static void
+hsw_sanity_check_render_basic_reports(uint32_t *oa_report0, uint32_t *oa_report1,
+				      enum drm_i915_oa_format fmt)
+{
+	uint32_t time_delta = timebase_scale(oa_report1[1] - oa_report0[1]);
+	uint32_t clock_delta;
+	uint32_t max_delta;
+
+	igt_assert_neq(time_delta, 0);
+
+	/* As a special case we have to consider that on Haswell we
+	 * can't explicitly derive a clock delta for all OA report
+	 * formats...
+	 */
+	if (oa_formats[fmt].n_c == 0) {
+		/* Assume running at max freq for sake of
+		 * below sanity check on counters... */
+		clock_delta = (gt_max_freq_mhz *
+			       (uint64_t)time_delta) / 1000;
+	} else {
+		uint32_t ticks0 = read_report_ticks(oa_report0, fmt);
+		uint32_t ticks1 = read_report_ticks(oa_report1, fmt);
+		uint64_t freq;
+
+		clock_delta = ticks1 - ticks0;
+
+		igt_assert_neq(clock_delta, 0);
+
+		freq = ((uint64_t)clock_delta * 1000) / time_delta;
+		igt_debug("freq = %"PRIu64"\n", freq);
+
+		igt_assert(freq <= gt_max_freq_mhz);
+	}
+
+	igt_debug("clock delta = %"PRIu32"\n", clock_delta);
+
+	/* The maximum rate for any HSW counter =
+	 *   clock_delta * N EUs
+	 *
+	 * Sanity check that no counters exceed this delta.
+	 */
+	max_delta = clock_delta * n_eus;
+
+	/* 40bit A counters were only introduced for Gen8+ */
+	igt_assert_eq(oa_formats[fmt].n_a40, 0);
+
+	for (int j = 0; j < oa_formats[fmt].n_a; j++) {
+		uint32_t *a0 = (uint32_t *)(((uint8_t *)oa_report0) +
+					    oa_formats[fmt].a_off);
+		uint32_t *a1 = (uint32_t *)(((uint8_t *)oa_report1) +
+					    oa_formats[fmt].a_off);
+		int a_id = oa_formats[fmt].first_a + j;
+		uint32_t delta = a1[j] - a0[j];
+
+		if (undefined_a_counters[a_id])
+			continue;
+
+		igt_debug("A%d: delta = %"PRIu32"\n", a_id, delta);
+		igt_assert(delta <= max_delta);
+	}
+
+	for (int j = 0; j < oa_formats[fmt].n_b; j++) {
+		uint32_t *b0 = (uint32_t *)(((uint8_t *)oa_report0) +
+					    oa_formats[fmt].b_off);
+		uint32_t *b1 = (uint32_t *)(((uint8_t *)oa_report1) +
+					    oa_formats[fmt].b_off);
+		uint32_t delta = b1[j] - b0[j];
+
+		igt_debug("B%d: delta = %"PRIu32"\n", j, delta);
+		igt_assert(delta <= max_delta);
+	}
+
+	for (int j = 0; j < oa_formats[fmt].n_c; j++) {
+		uint32_t *c0 = (uint32_t *)(((uint8_t *)oa_report0) +
+					    oa_formats[fmt].c_off);
+		uint32_t *c1 = (uint32_t *)(((uint8_t *)oa_report1) +
+					    oa_formats[fmt].c_off);
+		uint32_t delta = c1[j] - c0[j];
+
+		igt_debug("C%d: delta = %"PRIu32"\n", j, delta);
+		igt_assert(delta <= max_delta);
+	}
+}
+
 static uint64_t
 gen8_read_40bit_a_counter(uint32_t *report, enum drm_i915_oa_format fmt, int a_id)
 {
@@ -492,6 +578,119 @@ gen8_40bit_a_delta(uint64_t value0, uint64_t value1)
 		return value1 - value0;
 }
 
+/* The TestOa metric set is designed so */
+static void
+gen8_sanity_check_test_oa_reports(uint32_t *oa_report0, uint32_t *oa_report1,
+				  enum drm_i915_oa_format fmt)
+{
+	uint32_t time_delta = timebase_scale(oa_report1[1] - oa_report0[1]);
+	uint32_t ticks0 = read_report_ticks(oa_report0, fmt);
+	uint32_t ticks1 = read_report_ticks(oa_report1, fmt);
+	uint32_t clock_delta = ticks1 - ticks0;
+	uint32_t max_delta;
+	uint64_t freq;
+	uint32_t *rpt0_b = (uint32_t *)(((uint8_t *)oa_report0) +
+					oa_formats[fmt].b_off);
+	uint32_t *rpt1_b = (uint32_t *)(((uint8_t *)oa_report1) +
+					oa_formats[fmt].b_off);
+	uint32_t b;
+	uint32_t ref;
+
+
+	igt_assert_neq(time_delta, 0);
+	igt_assert_neq(clock_delta, 0);
+
+	freq = ((uint64_t)clock_delta * 1000) / time_delta;
+	igt_debug("freq = %"PRIu64"\n", freq);
+
+	igt_assert(freq <= gt_max_freq_mhz);
+
+	igt_debug("clock delta = %"PRIu32"\n", clock_delta);
+
+	max_delta = clock_delta * n_eus;
+
+	/* Gen8+ has some 40bit A counters... */
+	for (int j = 0; j < oa_formats[fmt].n_a40; j++) {
+		uint64_t value0 = gen8_read_40bit_a_counter(oa_report0, fmt, j);
+		uint64_t value1 = gen8_read_40bit_a_counter(oa_report1, fmt, j);
+		uint64_t delta = gen8_40bit_a_delta(value0, value1);
+
+		if (undefined_a_counters[j])
+			continue;
+
+		igt_debug("A%d: delta = %"PRIu64"\n", j, delta);
+		igt_assert(delta <= max_delta);
+	}
+
+	for (int j = 0; j < oa_formats[fmt].n_a; j++) {
+		uint32_t *a0 = (uint32_t *)(((uint8_t *)oa_report0) +
+					    oa_formats[fmt].a_off);
+		uint32_t *a1 = (uint32_t *)(((uint8_t *)oa_report1) +
+					    oa_formats[fmt].a_off);
+		int a_id = oa_formats[fmt].first_a + j;
+		uint32_t delta = a1[j] - a0[j];
+
+		if (undefined_a_counters[a_id])
+			continue;
+
+		igt_debug("A%d: delta = %"PRIu32"\n", a_id, delta);
+		igt_assert(delta <= max_delta);
+	}
+
+	/* The TestOa metric set defines all B counters to be a
+	 * multiple of the gpu clock
+	 */
+	if (oa_formats[fmt].n_b) {
+		b = rpt1_b[0] - rpt0_b[0];
+		igt_debug("B0: delta = %"PRIu32"\n", b);
+		igt_assert_eq(b, 0);
+
+		b = rpt1_b[1] - rpt0_b[1];
+		igt_debug("B1: delta = %"PRIu32"\n", b);
+		igt_assert_eq(b, clock_delta);
+
+		b = rpt1_b[2] - rpt0_b[2];
+		igt_debug("B2: delta = %"PRIu32"\n", b);
+		igt_assert_eq(b, clock_delta);
+
+		b = rpt1_b[3] - rpt0_b[3];
+		ref = clock_delta / 2;
+		igt_debug("B3: delta = %"PRIu32"\n", b);
+		igt_assert(b >= ref - 1 && b <= ref + 1);
+
+		b = rpt1_b[4] - rpt0_b[4];
+		ref = clock_delta / 3;
+		igt_debug("B4: delta = %"PRIu32"\n", b);
+		igt_assert(b >= ref - 1 && b <= ref + 1);
+
+		b = rpt1_b[5] - rpt0_b[5];
+		ref = clock_delta / 3;
+		igt_debug("B5: delta = %"PRIu32"\n", b);
+		igt_assert(b >= ref - 1 && b <= ref + 1);
+
+		b = rpt1_b[6] - rpt0_b[6];
+		ref = clock_delta / 6;
+		igt_debug("B6: delta = %"PRIu32"\n", b);
+		igt_assert(b >= ref - 1 && b <= ref + 1);
+
+		b = rpt1_b[7] - rpt0_b[7];
+		ref = clock_delta * 2 / 3;
+		igt_debug("B7: delta = %"PRIu32"\n", b);
+		igt_assert(b >= ref - 1 && b <= ref + 1);
+	}
+
+	for (int j = 0; j < oa_formats[fmt].n_c; j++) {
+		uint32_t *c0 = (uint32_t *)(((uint8_t *)oa_report0) +
+					    oa_formats[fmt].c_off);
+		uint32_t *c1 = (uint32_t *)(((uint8_t *)oa_report1) +
+					    oa_formats[fmt].c_off);
+		uint32_t delta = c1[j] - c0[j];
+
+		igt_debug("C%d: delta = %"PRIu32"\n", j, delta);
+		igt_assert(delta <= max_delta);
+	}
+}
+
 static bool
 init_sys_info(void)
 {
@@ -513,6 +712,7 @@ init_sys_info(void)
 		test_oa_format = I915_OA_FORMAT_A45_B8_C8;
 		undefined_a_counters = hsw_undefined_a_counters;
 		read_report_ticks = hsw_read_report_ticks;
+		sanity_check_reports = hsw_sanity_check_render_basic_reports;
 
 		if (intel_gt(devid) == 0)
 			n_eus = 10;
@@ -531,6 +731,7 @@ init_sys_info(void)
 		test_oa_format = I915_OA_FORMAT_A32u40_A4u32_B8_C8;
 		undefined_a_counters = gen8_undefined_a_counters;
 		read_report_ticks = gen8_read_report_ticks;
+		sanity_check_reports = gen8_sanity_check_test_oa_reports;
 
 		if (IS_BROADWELL(devid)) {
 			test_set_uuid = "d6de6f55-e526-4f79-a6a6-d7315c09044e";
@@ -1030,11 +1231,6 @@ test_oa_formats(void)
 	for (int i = 0; i < ARRAY_SIZE(oa_formats); i++) {
 		uint32_t oa_report0[64];
 		uint32_t oa_report1[64];
-		uint32_t *a0, *b0, *c0;
-		uint32_t *a1, *b1, *c1;
-		uint32_t time_delta;
-		uint32_t clock_delta;
-		uint32_t max_delta;
 
 		if (!oa_formats[i].name) /* sparse, indexed by ID */
 			continue;
@@ -1062,73 +1258,7 @@ test_oa_formats(void)
 					   false); /* timer reports only */
 
 		print_reports(oa_report0, oa_report1, i);
-
-		a0 = (uint32_t *)(((uint8_t *)oa_report0) + oa_formats[i].a_off);
-		b0 = (uint32_t *)(((uint8_t *)oa_report0) + oa_formats[i].b_off);
-		c0 = (uint32_t *)(((uint8_t *)oa_report0) + oa_formats[i].c_off);
-
-		a1 = (uint32_t *)(((uint8_t *)oa_report1) + oa_formats[i].a_off);
-		b1 = (uint32_t *)(((uint8_t *)oa_report1) + oa_formats[i].b_off);
-		c1 = (uint32_t *)(((uint8_t *)oa_report1) + oa_formats[i].c_off);
-
-		time_delta = timebase_scale(oa_report1[1] - oa_report0[1]);
-		igt_assert_neq(time_delta, 0);
-
-		/* As a special case we have to consider that on Haswell we
-		 * can't explicitly derive a clock delta for all OA report
-		 * formats...
-		 */
-		if (IS_HASWELL(devid) && oa_formats[i].n_c == 0) {
-			/* Assume running at max freq for sake of
-			 * below sanity check on counters... */
-			clock_delta = (gt_max_freq_mhz *
-				       (uint64_t)time_delta) / 1000;
-		} else {
-			uint32_t ticks0 = read_report_ticks(oa_report0, i);
-			uint32_t ticks1 = read_report_ticks(oa_report1, i);
-			uint64_t freq;
-
-			clock_delta = ticks1 - ticks0;
-
-			igt_assert_neq(clock_delta, 0);
-
-			freq = ((uint64_t)clock_delta * 1000) / time_delta;
-			igt_debug("freq = %"PRIu64"\n", freq);
-
-			igt_assert(freq <= gt_max_freq_mhz);
-		}
-
-		igt_debug("clock delta = %"PRIu32"\n", clock_delta);
-
-		/* The maximum rate for any HSW counter =
-		 *   clock_delta * N EUs
-		 *
-		 * Sanity check that no counters exceed this delta.
-		 */
-		max_delta = clock_delta * n_eus;
-
-		for (int j = 0; j < oa_formats[i].n_a; j++) {
-			int a_id = oa_formats[i].first_a + j;
-			uint32_t delta = a1[j] - a0[j];
-
-			if (undefined_a_counters[a_id])
-				continue;
-
-			igt_debug("A%d: delta = %"PRIu32"\n", a_id, delta);
-			igt_assert(delta <= max_delta);
-		}
-
-		for (int j = 0; j < oa_formats[i].n_b; j++) {
-			uint32_t delta = b1[j] - b0[j];
-			igt_debug("B%d: delta = %"PRIu32"\n", j, delta);
-			igt_assert(delta <= max_delta);
-		}
-
-		for (int j = 0; j < oa_formats[i].n_c; j++) {
-			uint32_t delta = c1[j] - c0[j];
-			igt_debug("C%d: delta = %"PRIu32"\n", j, delta);
-			igt_assert(delta <= max_delta);
-		}
+		sanity_check_reports(oa_report0, oa_report1, i);
 	}
 }
 
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 18/29] igt/perf: print [un]slice freq and report reasons in debug
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (16 preceding siblings ...)
  2017-04-25 22:32 ` [PATCH i-g-t 17/29] igt/perf: factor out oa report sanity checking Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 19/29] igt/perf: update print_reports to print context ID Lionel Landwerlin
                   ` (10 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

From: Robert Bragg <robert@sixbynine.org>

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 57 insertions(+), 1 deletion(-)

diff --git a/tests/perf.c b/tests/perf.c
index 08ee8665..ab8db296 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -428,6 +428,28 @@ gen8_read_report_ticks(uint32_t *report, enum drm_i915_oa_format format)
 	return report[3];
 }
 
+static const char *
+gen8_read_report_reason(const uint32_t *report)
+{
+	uint32_t reason = ((report[0] >> OAREPORT_REASON_SHIFT) &
+			   OAREPORT_REASON_MASK);
+
+	if (reason & (1<<0))
+		return "timer";
+	else if (reason & (1<<1))
+	      return "internal trigger 1";
+	else if (reason & (1<<2))
+	      return "internal trigger 2";
+	else if (reason & (1<<3))
+	      return "context switch";
+	else if (reason & (1<<4))
+	      return "GO 1->0 transition (enter RC6)";
+	else if (reason & (1<<5))
+		return "[un]slice clock ratio change";
+	else
+		return "unknown";
+}
+
 static uint64_t
 timebase_scale(uint32_t u32_delta)
 {
@@ -749,7 +771,7 @@ init_sys_info(void)
 				test_set_uuid = "882fa433-1f4a-4a67-a962-c741888fe5f5";
 				break;
 			default:
-				igt_debug("unsupport Skylake GT size\n");
+				igt_debug("unsupported Skylake GT size\n");
 				return false;
 			}
 			timestamp_frequency = 12000000;
@@ -1159,6 +1181,20 @@ open_and_read_2_oa_reports(int format_id,
 }
 
 static void
+gen8_read_report_clock_ratios(uint32_t *report,
+			      uint32_t *slice_freq_mhz,
+			      uint32_t *unslice_freq_mhz)
+{
+	uint32_t unslice_freq = report[0] & 0x1ff;
+	uint32_t slice_freq_low = (report[0] >> 25) & 0x7f;
+	uint32_t slice_freq_high = (report[0] >> 9) & 0x3;
+	uint32_t slice_freq = slice_freq_low | (slice_freq_high << 7);
+
+	*slice_freq_mhz = (slice_freq * 16666) / 1000;
+	*unslice_freq_mhz = (unslice_freq * 16666) / 1000;
+}
+
+static void
 print_reports(uint32_t *oa_report0, uint32_t *oa_report1, int fmt)
 {
 	igt_debug("TIMESTAMP: 1st = %"PRIu32", 2nd = %"PRIu32", delta = %"PRIu32"\n",
@@ -1174,6 +1210,26 @@ print_reports(uint32_t *oa_report0, uint32_t *oa_report1, int fmt)
 			  clock0, clock1, clock1 - clock0);
 	}
 
+	if (intel_gen(devid) >= 8) {
+		uint32_t slice_freq0, slice_freq1, unslice_freq0, unslice_freq1;
+		const char *reason0 = gen8_read_report_reason(oa_report0);
+		const char *reason1 = gen8_read_report_reason(oa_report1);
+
+		gen8_read_report_clock_ratios(oa_report0,
+					      &slice_freq0, &unslice_freq0);
+		gen8_read_report_clock_ratios(oa_report1,
+					      &slice_freq1, &unslice_freq1);
+
+		igt_debug("SLICE CLK: 1st = %umhz, 2nd = %umhz, delta = %d\n",
+			  slice_freq0, slice_freq1,
+			  ((int)slice_freq1 - (int)slice_freq0));
+		igt_debug("UNSLICE CLK: 1st = %umhz, 2nd = %umhz, delta = %d\n",
+			  unslice_freq0, unslice_freq1,
+			  ((int)unslice_freq1 - (int)unslice_freq0));
+
+		igt_debug("REASONS: 1st = \"%s\", 2nd = \"%s\"\n", reason0, reason1);
+	}
+
 	/* Gen8+ has some 40bit A counters... */
 	for (int j = 0; j < oa_formats[fmt].n_a40; j++) {
 		uint64_t value0 = gen8_read_40bit_a_counter(oa_report0, fmt, j);
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 19/29] igt/perf: update print_reports to print context ID
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (17 preceding siblings ...)
  2017-04-25 22:32 ` [PATCH i-g-t 18/29] igt/perf: print [un]slice freq and report reasons in debug Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 20/29] igt/perf: add utility function for checking periodic reports Lionel Landwerlin
                   ` (9 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

From: Robert Bragg <robert@sixbynine.org>

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tests/perf.c b/tests/perf.c
index ab8db296..d057d943 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -1215,6 +1215,9 @@ print_reports(uint32_t *oa_report0, uint32_t *oa_report1, int fmt)
 		const char *reason0 = gen8_read_report_reason(oa_report0);
 		const char *reason1 = gen8_read_report_reason(oa_report1);
 
+		igt_debug("CTX ID: 1st = %"PRIu32", 2nd = %"PRIu32"\n",
+			  oa_report0[2], oa_report1[2]);
+
 		gen8_read_report_clock_ratios(oa_report0,
 					      &slice_freq0, &unslice_freq0);
 		gen8_read_report_clock_ratios(oa_report1,
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 20/29] igt/perf: add utility function for checking periodic reports
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (18 preceding siblings ...)
  2017-04-25 22:32 ` [PATCH i-g-t 19/29] igt/perf: update print_reports to print context ID Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-06-16 14:41   ` Matthew Auld
  2017-04-25 22:32 ` [PATCH i-g-t 21/29] igt/perf: make stream_fd a global variable Lionel Landwerlin
                   ` (8 subsequent siblings)
  28 siblings, 1 reply; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 55 +++++++++++++++++++++++++++++--------------------------
 1 file changed, 29 insertions(+), 26 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index d057d943..f8ac06c3 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -450,6 +450,29 @@ gen8_read_report_reason(const uint32_t *report)
 		return "unknown";
 }
 
+static bool
+oa_report_is_periodic(uint32_t oa_exponent, const uint32_t *report)
+{
+
+	if (IS_HASWELL(devid)) {
+		/* For Haswell we don't have a documented report reason field
+		 * (though empirically report[0] bit 10 does seem to correlate
+		 * with a timer trigger reason) so we instead infer which
+		 * reports are timer triggered by checking if the least
+		 * significant bits are zero and the exponent bit is set.
+		 */
+		uint32_t oa_exponent_mask = (1 << (oa_exponent + 1)) - 1;
+		if ((report[1] & oa_exponent_mask) != (1 << oa_exponent))
+			return true;
+	} else {
+		if ((report[0] >> OAREPORT_REASON_SHIFT) &
+		    OAREPORT_REASON_TIMER)
+			return true;
+	}
+
+	return false;
+}
+
 static uint64_t
 timebase_scale(uint32_t u32_delta)
 {
@@ -1115,22 +1138,8 @@ read_2_oa_reports(int stream_fd,
 			igt_assert_neq(report[1], 0);
 
 			if (timer_only) {
-				/* For Haswell we don't have a documented
-				 * report reason field (though empirically
-				 * report[0] bit 10 does seem to correlate with
-				 * a timer trigger reason) so we instead infer
-				 * which reports are timer triggered by
-				 * checking if the least significant bits are
-				 * zero and the exponent bit is set.
-				 */
-				if ((report[1] & exponent_mask) != (1 << exponent)) {
-					igt_debug("skipping non timer report reason=%x\n",
-						  report[0]);
-
-					/* Also assert our hypothesis about the
-					 * reason bit...
-					 */
-					igt_assert_eq(report[0] & (1 << 10), 0);
+				if (!oa_report_is_periodic(exponent, report)) {
+					igt_debug("skipping non timer report\n");
 					continue;
 				}
 			}
@@ -1740,11 +1749,8 @@ test_blocking(void)
 				if (header->type == DRM_I915_PERF_RECORD_SAMPLE) {
 					uint32_t *report = (void *)(header + 1);
 
-					uint32_t reason = ((report[0] >>
-							    OAREPORT_REASON_SHIFT) &
-							   OAREPORT_REASON_MASK);
-
-					if (reason & OAREPORT_REASON_TIMER)
+					if (oa_report_is_periodic(oa_exponent,
+								  report))
 						timer_report_read = true;
 					else
 						non_timer_report_read = true;
@@ -1914,11 +1920,8 @@ test_polling(void)
 				if (header->type == DRM_I915_PERF_RECORD_SAMPLE) {
 					uint32_t *report = (void *)(header + 1);
 
-					uint32_t reason = ((report[0] >>
-							    OAREPORT_REASON_SHIFT) &
-							   OAREPORT_REASON_MASK);
-
-					if (reason & OAREPORT_REASON_TIMER)
+					if (oa_report_is_periodic(oa_exponent,
+								  report))
 						timer_report_read = true;
 					else
 						non_timer_report_read = true;
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 21/29] igt/perf: make stream_fd a global variable
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (19 preceding siblings ...)
  2017-04-25 22:32 ` [PATCH i-g-t 20/29] igt/perf: add utility function for checking periodic reports Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-06-16 14:43   ` Matthew Auld
  2017-06-21 13:47   ` Matthew Auld
  2017-04-25 22:32 ` [PATCH i-g-t 22/29] igt/perf: add per context filtering test for gen8+ Lionel Landwerlin
                   ` (7 subsequent siblings)
  28 siblings, 2 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

When debugging unstable tests on new platforms we currently we don't
cleanup everything well in between different tests. Since only a
single OA stream fd can be opened at a time, having the stream_fd as a
global variable helps us cleanup the state between tests.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 108 ++++++++++++++++++++++++++++++++---------------------------
 1 file changed, 58 insertions(+), 50 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index f8ac06c3..b7af1c3b 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -243,6 +243,7 @@ static bool hsw_undefined_a_counters[45] = {
 static bool gen8_undefined_a_counters[45];
 
 static int drm_fd = -1;
+static int stream_fd = -1;
 static uint32_t devid;
 static int card = -1;
 static int n_eus;
@@ -264,10 +265,22 @@ static uint32_t (*read_report_ticks)(uint32_t *report,
 static void (*sanity_check_reports)(uint32_t *oa_report0, uint32_t *oa_report1,
 				    enum drm_i915_oa_format format);
 
+static void
+__perf_close(int fd)
+{
+	close(fd);
+	stream_fd = -1;
+}
+
 static int
 __perf_open(int fd, struct drm_i915_perf_open_param *param)
 {
-	int ret = igt_ioctl(fd, DRM_IOCTL_I915_PERF_OPEN, param);
+	int ret;
+
+	if (stream_fd >= 0)
+		__perf_close(stream_fd);
+
+	ret = igt_ioctl(fd, DRM_IOCTL_I915_PERF_OPEN, param);
 
 	igt_assert(ret >= 0);
 	errno = 0;
@@ -918,14 +931,12 @@ test_system_wide_paranoid(void)
 			.num_properties = sizeof(properties) / 16,
 			.properties_ptr = to_user_pointer(properties),
 		};
-		int stream_fd;
-
 		write_u64_file("/proc/sys/dev/i915/perf_stream_paranoid", 0);
 
 		igt_drop_root();
 
 		stream_fd = __perf_open(drm_fd, &param);
-		close(stream_fd);
+		__perf_close(stream_fd);
 	}
 
 	igt_waitchildren();
@@ -973,7 +984,6 @@ test_invalid_oa_metric_set_id(void)
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	int stream_fd;
 
 	do_ioctl_err(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param, EINVAL);
 
@@ -983,7 +993,7 @@ test_invalid_oa_metric_set_id(void)
 	/* Check that we aren't just seeing false positives... */
 	properties[ARRAY_SIZE(properties) - 1] = test_metric_set_id;
 	stream_fd = __perf_open(drm_fd, &param);
-	close(stream_fd);
+	__perf_close(stream_fd);
 
 	/* There's no valid default OA metric set ID... */
 	param.num_properties--;
@@ -1008,7 +1018,6 @@ test_invalid_oa_format_id(void)
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	int stream_fd;
 
 	do_ioctl_err(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param, EINVAL);
 
@@ -1018,7 +1027,7 @@ test_invalid_oa_format_id(void)
 	/* Check that we aren't just seeing false positives... */
 	properties[ARRAY_SIZE(properties) - 1] = test_oa_format;
 	stream_fd = __perf_open(drm_fd, &param);
-	close(stream_fd);
+	__perf_close(stream_fd);
 
 	/* There's no valid default OA format... */
 	param.num_properties--;
@@ -1046,8 +1055,7 @@ test_missing_sample_flags(void)
 }
 
 static void
-read_2_oa_reports(int stream_fd,
-		  int format_id,
+read_2_oa_reports(int format_id,
 		  int exponent,
 		  uint32_t *oa_report0,
 		  uint32_t *oa_report1,
@@ -1181,12 +1189,13 @@ open_and_read_2_oa_reports(int format_id,
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	int stream_fd = __perf_open(drm_fd, &param);
 
-	read_2_oa_reports(stream_fd, format_id, exponent,
+	stream_fd = __perf_open(drm_fd, &param);
+
+	read_2_oa_reports(format_id, exponent,
 			  oa_report0, oa_report1, timer_only);
 
-	close(stream_fd);
+	__perf_close(stream_fd);
 }
 
 static void
@@ -1486,9 +1495,10 @@ test_invalid_oa_exponent(void)
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	int stream_fd = __perf_open(drm_fd, &param);
 
-	close(stream_fd);
+	stream_fd = __perf_open(drm_fd, &param);
+
+	__perf_close(stream_fd);
 
 	for (int i = 32; i < 65; i++) {
 		properties[7] = i;
@@ -1538,12 +1548,10 @@ test_low_oa_exponent_permissions(void)
 	properties[7] = ok_exponent;
 
 	igt_fork(child, 1) {
-		int stream_fd;
-
 		igt_drop_root();
 
 		stream_fd = __perf_open(drm_fd, &param);
-		close(stream_fd);
+		__perf_close(stream_fd);
 	}
 
 	igt_waitchildren();
@@ -1592,7 +1600,6 @@ test_per_context_mode_unprivileged(void)
 	igt_fork(child, 1) {
 		drm_intel_context *context;
 		drm_intel_bufmgr *bufmgr;
-		int stream_fd;
 		uint32_t ctx_id = 0xffffffff; /* invalid id */
 		int ret;
 
@@ -1610,7 +1617,7 @@ test_per_context_mode_unprivileged(void)
 		properties[1] = ctx_id;
 
 		stream_fd = __perf_open(drm_fd, &param);
-		close(stream_fd);
+		__perf_close(stream_fd);
 
 		drm_intel_gem_context_destroy(context);
 		drm_intel_bufmgr_destroy(bufmgr);
@@ -1673,7 +1680,6 @@ test_blocking(void)
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	int stream_fd = __perf_open(drm_fd, &param);
 	uint8_t buf[1024 * 1024];
 	struct tms start_times;
 	struct tms end_times;
@@ -1698,6 +1704,8 @@ test_blocking(void)
 	int64_t start;
 	int n = 0;
 
+	stream_fd = __perf_open(drm_fd, &param);
+
 	times(&start_times);
 
 	igt_debug("tick length = %dns, test duration = %"PRIu64"ns, min iter. = %d, max iter. = %d\n",
@@ -1795,7 +1803,7 @@ test_blocking(void)
 
 	igt_assert(kernel_ns <= (test_duration_ns / 100ull));
 
-	close(stream_fd);
+	__perf_close(stream_fd);
 }
 
 static void
@@ -1824,7 +1832,6 @@ test_polling(void)
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	int stream_fd = __perf_open(drm_fd, &param);
 	uint8_t buf[1024 * 1024];
 	struct tms start_times;
 	struct tms end_times;
@@ -1848,6 +1855,8 @@ test_polling(void)
 	int64_t start;
 	int n = 0;
 
+	stream_fd = __perf_open(drm_fd, &param);
+
 	times(&start_times);
 
 	igt_debug("tick length = %dns, test duration = %"PRIu64"ns, min iter. = %d, max iter. = %d\n",
@@ -1976,7 +1985,7 @@ test_polling(void)
 
 	igt_assert(kernel_ns <= (test_duration_ns / 100ull));
 
-	close(stream_fd);
+	__perf_close(stream_fd);
 }
 
 static void
@@ -1999,7 +2008,6 @@ test_buffer_fill(void)
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	int stream_fd = __perf_open(drm_fd, &param);
 	int buf_size = 65536 * (256 + sizeof(struct drm_i915_perf_record_header));
 	uint8_t *buf = malloc(buf_size);
 	size_t oa_buf_size = 16 * 1024 * 1024;
@@ -2009,6 +2017,8 @@ test_buffer_fill(void)
 
 	igt_assert(fill_duration < 1000000000);
 
+	stream_fd = __perf_open(drm_fd, &param);
+
 	for (int i = 0; i < 5; i++) {
 		struct drm_i915_perf_record_header *header;
 		bool overflow_seen;
@@ -2059,7 +2069,7 @@ test_buffer_fill(void)
 
 	free(buf);
 
-	close(stream_fd);
+	__perf_close(stream_fd);
 }
 
 static void
@@ -2083,7 +2093,6 @@ test_enable_disable(void)
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	int stream_fd = __perf_open(drm_fd, &param);
 	int buf_size = 65536 * (256 + sizeof(struct drm_i915_perf_record_header));
 	uint8_t *buf = malloc(buf_size);
 	size_t oa_buf_size = 16 * 1024 * 1024;
@@ -2091,6 +2100,7 @@ test_enable_disable(void)
 	int n_full_oa_reports = oa_buf_size / report_size;
 	uint64_t fill_duration = n_full_oa_reports * oa_period;
 
+	stream_fd = __perf_open(drm_fd, &param);
 
 	for (int i = 0; i < 5; i++) {
 		int len;
@@ -2136,7 +2146,7 @@ test_enable_disable(void)
 
 	free(buf);
 
-	close(stream_fd);
+	__perf_close(stream_fd);
 }
 
 static void
@@ -2163,7 +2173,6 @@ test_short_reads(void)
 	uint8_t *pages = mmap(NULL, page_size * 2,
 			      PROT_READ|PROT_WRITE, MAP_PRIVATE, zero_fd, 0);
 	struct drm_i915_perf_record_header *header;
-	int stream_fd;
 	int ret;
 
 	igt_assert_neq(zero_fd, -1);
@@ -2220,7 +2229,7 @@ test_short_reads(void)
 	igt_assert_eq(ret, -1);
 	igt_assert_eq(errno, ENOSPC);
 
-	close(stream_fd);
+	__perf_close(stream_fd);
 
 	munmap(pages, page_size * 2);
 }
@@ -2245,14 +2254,16 @@ test_non_sampling_read_error(void)
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	int stream_fd = __perf_open(drm_fd, &param);
+	int ret;
 	uint8_t buf[1024];
 
-	int ret = read(stream_fd, buf, sizeof(buf));
+	stream_fd = __perf_open(drm_fd, &param);
+
+	ret = read(stream_fd, buf, sizeof(buf));
 	igt_assert_eq(ret, -1);
 	igt_assert_eq(errno, EIO);
 
-	close(stream_fd);
+	__perf_close(stream_fd);
 }
 
 /* Check that attempts to read from a stream while it is disable will return
@@ -2279,25 +2290,24 @@ test_disabled_read_error(void)
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	int stream_fd = __perf_open(drm_fd, &param);
 	uint32_t oa_report0[64];
 	uint32_t oa_report1[64];
 	uint32_t buf[128] = { 0 };
 	int ret;
 
+	stream_fd = __perf_open(drm_fd, &param);
 
 	ret = read(stream_fd, buf, sizeof(buf));
 	igt_assert_eq(ret, -1);
 	igt_assert_eq(errno, EIO);
 
-	close(stream_fd);
+	__perf_close(stream_fd);
 
 
 	param.flags &= ~I915_PERF_FLAG_DISABLED;
 	stream_fd = __perf_open(drm_fd, &param);
 
-	read_2_oa_reports(stream_fd,
-			  test_oa_format,
+	read_2_oa_reports(test_oa_format,
 			  oa_exponent,
 			  oa_report0,
 			  oa_report1,
@@ -2311,14 +2321,13 @@ test_disabled_read_error(void)
 
 	do_ioctl(stream_fd, I915_PERF_IOCTL_ENABLE, 0);
 
-	read_2_oa_reports(stream_fd,
-			  test_oa_format,
+	read_2_oa_reports(test_oa_format,
 			  oa_exponent,
 			  oa_report0,
 			  oa_report1,
 			  false); /* not just timer reports */
 
-	close(stream_fd);
+	__perf_close(stream_fd);
 }
 
 static void
@@ -2367,7 +2376,6 @@ test_mi_rpc(void)
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	int stream_fd = __perf_open(drm_fd, &param);
 	drm_intel_bufmgr *bufmgr = drm_intel_bufmgr_gem_init(drm_fd, 4096);
 	drm_intel_context *context;
 	struct intel_batchbuffer *batch;
@@ -2375,6 +2383,8 @@ test_mi_rpc(void)
 	uint32_t *report32;
 	int ret;
 
+	stream_fd = __perf_open(drm_fd, &param);
+
 	drm_intel_bufmgr_gem_enable_reuse(bufmgr);
 
 	context = drm_intel_gem_context_create(bufmgr);
@@ -2412,7 +2422,7 @@ test_mi_rpc(void)
 	intel_batchbuffer_free(batch);
 	drm_intel_gem_context_destroy(context);
 	drm_intel_bufmgr_destroy(bufmgr);
-	close(stream_fd);
+	__perf_close(stream_fd);
 }
 
 static void
@@ -2503,7 +2513,6 @@ hsw_test_single_ctx_counters(void)
 	igt_fork(child, 1) {
 		drm_intel_bufmgr *bufmgr;
 		drm_intel_context *context0, *context1;
-		int stream_fd;
 		struct intel_batchbuffer *batch;
 		struct igt_buf src, dst;
 		drm_intel_bo *bo;
@@ -2682,7 +2691,7 @@ hsw_test_single_ctx_counters(void)
 		drm_intel_gem_context_destroy(context0);
 		drm_intel_gem_context_destroy(context1);
 		drm_intel_bufmgr_destroy(bufmgr);
-		close(stream_fd);
+		__perf_close(stream_fd);
 	}
 
 	igt_waitchildren();
@@ -2705,11 +2714,12 @@ test_rc6_disable(void)
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	int stream_fd = __perf_open(drm_fd, &param);
 	uint64_t n_events_start = read_debugfs_u64_record(drm_fd, "i915_drpc_info",
 							  "RC6 residency since boot");
 	uint64_t n_events_end;
 
+	stream_fd = __perf_open(drm_fd, &param);
+
 	nanosleep(&(struct timespec){ .tv_sec = 0, .tv_nsec = 500000000 }, NULL);
 
 	n_events_end = read_debugfs_u64_record(drm_fd, "i915_drpc_info",
@@ -2717,7 +2727,7 @@ test_rc6_disable(void)
 
 	igt_assert_eq(n_events_end - n_events_start, 0);
 
-	close(stream_fd);
+	__perf_close(stream_fd);
 
 	n_events_start = read_debugfs_u64_record(drm_fd, "i915_drpc_info",
 						 "RC6 residency since boot");
@@ -2779,7 +2789,6 @@ test_i915_ref_count(void)
 		.properties_ptr = to_user_pointer(properties),
 	};
 	unsigned baseline, ref_count0, ref_count1;
-	int stream_fd;
 	uint32_t oa_report0[64];
 	uint32_t oa_report1[64];
 
@@ -2819,14 +2828,13 @@ test_i915_ref_count(void)
 
 	igt_assert(ref_count0 > baseline);
 
-	read_2_oa_reports(stream_fd,
-			  test_oa_format,
+	read_2_oa_reports(test_oa_format,
 			  oa_exp_1_millisec,
 			  oa_report0,
 			  oa_report1,
 			  false); /* not just timer reports */
 
-	close(stream_fd);
+	__perf_close(stream_fd);
 	ref_count0 = read_i915_module_ref();
 	igt_debug("ref count after closing i915 perf stream fd = %u\n", ref_count0);
 	igt_assert_eq(ref_count0, baseline);
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 22/29] igt/perf: add per context filtering test for gen8+
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (20 preceding siblings ...)
  2017-04-25 22:32 ` [PATCH i-g-t 21/29] igt/perf: make stream_fd a global variable Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 23/29] igt/perf: update max buffer size for reading reports Lionel Landwerlin
                   ` (6 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

From: Robert Bragg <robert@sixbynine.org>

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 813 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 775 insertions(+), 38 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index b7af1c3b..98f80bfd 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -48,7 +48,9 @@ IGT_TEST_DESCRIPTION("Test the i915 perf metrics streaming interface");
 #define OAREPORT_REASON_MASK           0x3f
 #define OAREPORT_REASON_SHIFT          19
 #define OAREPORT_REASON_TIMER          (1<<0)
+#define OAREPORT_REASON_INTERNAL       (3<<1)
 #define OAREPORT_REASON_CTX_SWITCH     (1<<3)
+#define OAREPORT_REASON_GO             (1<<4)
 #define OAREPORT_REASON_CLK_RATIO      (1<<5)
 
 #define GFX_OP_PIPE_CONTROL     ((3 << 29) | (3 << 27) | (2 << 24))
@@ -143,6 +145,13 @@ enum drm_i915_perf_record_type {
 };
 #endif /* !DRM_I915_PERF_OPEN */
 
+struct accumulator {
+#define MAX_RAW_OA_COUNTERS 62
+	enum drm_i915_oa_format format;
+
+	uint64_t deltas[MAX_RAW_OA_COUNTERS];
+};
+
 static struct {
 	const char *name;
 	size_t size;
@@ -532,6 +541,22 @@ oa_exponent_to_ns(int exponent)
        return 1000000000ULL * (2ULL << exponent) / timestamp_frequency;
 }
 
+static bool
+oa_report_ctx_is_valid(uint32_t *report)
+{
+	if (IS_HASWELL(devid)) {
+		return false; /* TODO */
+	} else if (IS_GEN8(devid)) {
+		return report[0] & (1ul << 25);
+	} else if (IS_GEN9(devid)) {
+		return report[0] & (1ul << 16);
+	}
+
+	/* Need to update this function for newer Gen. */
+	igt_assert(!"reached");
+}
+
+
 static void
 hsw_sanity_check_render_basic_reports(uint32_t *oa_report0, uint32_t *oa_report1,
 				      enum drm_i915_oa_format fmt)
@@ -636,6 +661,100 @@ gen8_40bit_a_delta(uint64_t value0, uint64_t value1)
 		return value1 - value0;
 }
 
+static void
+accumulate_uint32(size_t offset,
+		  uint32_t *report0,
+                  uint32_t *report1,
+                  uint64_t *delta)
+{
+	uint32_t value0 = *(uint32_t *)(((uint8_t *)report0) + offset);
+	uint32_t value1 = *(uint32_t *)(((uint8_t *)report1) + offset);
+
+	*delta += (uint32_t)(value1 - value0);
+}
+
+static void
+accumulate_uint40(int a_index,
+                  uint32_t *report0,
+                  uint32_t *report1,
+		  enum drm_i915_oa_format format,
+                  uint64_t *delta)
+{
+	uint64_t value0 = gen8_read_40bit_a_counter(report0, format, a_index),
+		 value1 = gen8_read_40bit_a_counter(report1, format, a_index);
+
+	*delta += gen8_40bit_a_delta(value0, value1);
+}
+
+static void
+accumulate_reports(struct accumulator *accumulator,
+		   uint32_t *start,
+		   uint32_t *end)
+{
+	enum drm_i915_oa_format format = accumulator->format;
+	uint64_t *deltas = accumulator->deltas;
+	int idx = 0;
+
+	if (intel_gen(devid) >= 8) {
+		/* timestamp */
+		accumulate_uint32(4, start, end, deltas + idx++);
+
+		/* clock cycles */
+		accumulate_uint32(12, start, end, deltas + idx++);
+	} else {
+		/* timestamp */
+		accumulate_uint32(4, start, end, deltas + idx++);
+	}
+
+	for (int i = 0; i < oa_formats[format].n_a40; i++)
+		accumulate_uint40(i, start, end, format, deltas + idx++);
+
+	for (int i = 0; i < oa_formats[format].n_a; i++) {
+		accumulate_uint32(oa_formats[format].a_off + 4 * i,
+				  start, end, deltas + idx++);
+	}
+
+	for (int i = 0; i < oa_formats[format].n_b; i++) {
+		accumulate_uint32(oa_formats[format].b_off + 4 * i,
+				  start, end, deltas + idx++);
+	}
+
+	for (int i = 0; i < oa_formats[format].n_c; i++) {
+		accumulate_uint32(oa_formats[format].c_off + 4 * i,
+				  start, end, deltas + idx++);
+	}
+}
+
+static void
+accumulator_print(struct accumulator *accumulator, const char *title)
+{
+	enum drm_i915_oa_format format = accumulator->format;
+	uint64_t *deltas = accumulator->deltas;
+	int idx = 0;
+
+	igt_debug("%s:\n", title);
+	if (intel_gen(devid) >= 8) {
+		igt_debug("\ttime delta = %lu\n", deltas[idx++]);
+		igt_debug("\tclock cycle delta = %lu\n", deltas[idx++]);
+
+		for (int i = 0; i < oa_formats[format].n_a40; i++)
+			igt_debug("\tA%u = %lu\n", i, deltas[idx++]);
+	} else {
+		igt_debug("\ttime delta = %lu\n", deltas[idx++]);
+	}
+
+	for (int i = 0; i < oa_formats[format].n_a; i++) {
+		int a_id = oa_formats[format].first_a + i;
+		igt_debug("\tA%u = %lu\n", a_id, deltas[idx++]);
+	}
+
+	for (int i = 0; i < oa_formats[format].n_a; i++)
+		igt_debug("\tB%u = %lu\n", i, deltas[idx++]);
+
+	for (int i = 0; i < oa_formats[format].n_c; i++)
+		igt_debug("\tC%u = %lu\n", i, deltas[idx++]);
+}
+
 /* The TestOa metric set is designed so */
 static void
 gen8_sanity_check_test_oa_reports(uint32_t *oa_report0, uint32_t *oa_report1,
@@ -884,6 +1003,62 @@ gt_frequency_range_restore(void)
 	gt_max_freq_mhz = gt_max_freq_mhz_saved;
 }
 
+static int
+i915_read_reports_until_timestamp(enum drm_i915_oa_format oa_format,
+				  uint8_t *buf,
+				  uint32_t max_size,
+				  uint32_t start_timestamp,
+				  uint32_t end_timestamp)
+{
+	size_t format_size = oa_formats[oa_format].size;
+	uint32_t last_seen_timestamp = start_timestamp;
+	int total_len = 0;
+
+	while (last_seen_timestamp < end_timestamp) {
+		int offset, len;
+
+		/* Running out of space. */
+		if ((max_size - total_len) < format_size) {
+			igt_warn("run out of space before reaching "
+				 "end timestamp (%u/%u)\n",
+				 last_seen_timestamp, end_timestamp);
+			return -1;
+		}
+
+		while ((len = read(stream_fd, &buf[total_len],
+				   max_size - total_len)) < 0 &&
+		       errno == EINTR)
+			;
+
+		/* Intentionally return an error. */
+		if (len <= 0) {
+			if (errno == EAGAIN)
+				return total_len;
+			else {
+				igt_warn("error read OA stream : %i\n", errno);
+				return -1;
+			}
+		}
+
+		offset = total_len;
+		total_len += len;
+
+		while (offset < total_len) {
+		  const struct drm_i915_perf_record_header *header =
+		    (const struct drm_i915_perf_record_header *) &buf[offset];
+		  uint32_t *report = (uint32_t *) (header + 1);
+
+		  if (header->type == DRM_I915_PERF_RECORD_SAMPLE)
+		    last_seen_timestamp = report[1];
+
+		  offset += header->size;
+		}
+	}
+
+	return total_len;
+}
+
+
 /* CAP_SYS_ADMIN is required to open system wide metrics, unless the system
  * control parameter dev.i915.perf_stream_paranoid == 0 */
 static void
@@ -1303,6 +1478,66 @@ print_reports(uint32_t *oa_report0, uint32_t *oa_report1, int fmt)
 }
 
 static void
+print_report(uint32_t *report, int fmt)
+{
+	igt_debug("TIMESTAMP: %"PRIu32"\n", report[1]);
+
+	if (IS_HASWELL(devid) && oa_formats[fmt].n_c == 0) {
+		igt_debug("CLOCK = N/A\n");
+	} else {
+		uint32_t clock = read_report_ticks(report, fmt);
+
+		igt_debug("CLOCK: %"PRIu32"\n", clock);
+	}
+
+	if (intel_gen(devid) >= 8) {
+		uint32_t slice_freq, unslice_freq;
+		const char *reason = gen8_read_report_reason(report);
+
+		gen8_read_report_clock_ratios(report, &slice_freq, &unslice_freq);
+
+		igt_debug("SLICE CLK: %umhz\n", slice_freq);
+		igt_debug("UNSLICE CLK: %umhz\n", unslice_freq);
+		igt_debug("REASON: \"%s\"\n", reason);
+		igt_debug("CTX ID: %"PRIu32"/%"PRIx32"\n", report[2], report[2]);
+	}
+
+	/* Gen8+ has some 40bit A counters... */
+	for (int j = 0; j < oa_formats[fmt].n_a40; j++) {
+		uint64_t value = gen8_read_40bit_a_counter(report, fmt, j);
+
+		if (undefined_a_counters[j])
+			continue;
+
+		igt_debug("A%d: %"PRIu64"\n", j, value);
+	}
+
+	for (int j = 0; j < oa_formats[fmt].n_a; j++) {
+		uint32_t *a = (uint32_t *)(((uint8_t *)report) +
+					   oa_formats[fmt].a_off);
+		int a_id = oa_formats[fmt].first_a + j;
+
+		if (undefined_a_counters[a_id])
+			continue;
+
+		igt_debug("A%d: %"PRIu32"\n", a_id, a[j]);
+	}
+
+	for (int j = 0; j < oa_formats[fmt].n_b; j++) {
+		uint32_t *b = (uint32_t *)(((uint8_t *)report) +
+					   oa_formats[fmt].b_off);
+
+		igt_debug("B%d: %"PRIu32"\n", j, b[j]);
+	}
+
+	for (int j = 0; j < oa_formats[fmt].n_c; j++) {
+		uint32_t *c = (uint32_t *)(((uint8_t *)report) +
+					   oa_formats[fmt].c_off);
+
+		igt_debug("C%d: %"PRIu32"\n", j, c[j]);
+	}
+}
+static void
 test_oa_formats(void)
 {
 	for (int i = 0; i < ARRAY_SIZE(oa_formats); i++) {
@@ -1382,6 +1617,11 @@ test_oa_exponents(int gt_freq_mhz)
 		uint32_t freq;
 		int n_tested = 0;
 		int n_freq_matches = 0;
+		int n_time_delta_matches = 0;
+
+#warning "XXX: it seems pretty odd that the time delta assertion failures centre around these exponents"
+		if (i == 6 || i == 7 || i == 8)
+			continue;
 
 		/* The exponent is effectively selecting a bit in the timestamp
 		 * to trigger reports on and so in practice we expect the raw
@@ -1421,15 +1661,19 @@ test_oa_exponents(int gt_freq_mhz)
 			timestamp_delta = oa_report1[1] - oa_report0[1];
 			igt_assert_neq(timestamp_delta, 0);
 
-			if (timestamp_delta != expected_timestamp_delta) {
-				igt_debug("timestamp0 = %u/0x%x\n",
-					  oa_report0[1], oa_report0[1]);
-				igt_debug("timestamp1 = %u/0x%x\n",
+			if (timestamp_delta == expected_timestamp_delta)
+				n_time_delta_matches++;
+			else {
+				igt_debug("timestamp delta mismatch: %"PRIu64"ns != expected %"PRIu64"ns, ts0 = %u/0x%x, ts1 = %u/0x%x\n",
+					  timebase_scale(timestamp_delta),
+					  timebase_scale(expected_timestamp_delta),
+					  oa_report0[1], oa_report0[1],
 					  oa_report1[1], oa_report1[1]);
+				print_reports(oa_report0, oa_report1, test_oa_format);
+				igt_assert(timestamp_delta <
+					   (expected_timestamp_delta * 2));
 			}
 
-			igt_assert_eq(timestamp_delta, expected_timestamp_delta);
-
 			ticks0 = read_report_ticks(oa_report0, test_oa_format);
 			ticks1 = read_report_ticks(oa_report1, test_oa_format);
 			clock_delta = ticks1 - ticks0;
@@ -1451,6 +1695,16 @@ test_oa_exponents(int gt_freq_mhz)
 			igt_debug("sysfs frequency pinning too unstable for cross-referencing with OA derived frequency");
 		igt_assert_eq(n_tested, 10);
 
+		igt_debug("number of iterations with expected timestamp delta = %d\n",
+			  n_time_delta_matches);
+
+		/* The HW doesn't give us any strict guarantee that the
+		 * timestamps are exactly aligned with the exponent mask but
+		 * in practice it seems very rare for that not to be the case
+		 * so it a useful sanity check to assert quite strictly...
+		 */
+		igt_assert(n_time_delta_matches >= 9);
+
 		igt_debug("number of iterations with expected clock frequency = %d\n",
 			  n_freq_matches);
 
@@ -2426,14 +2680,8 @@ test_mi_rpc(void)
 }
 
 static void
-scratch_buf_init(drm_intel_bufmgr *bufmgr,
-		 struct igt_buf *buf,
-		 int width, int height,
-		 uint32_t color)
+scratch_buf_memset(drm_intel_bo *bo, int width, int height, uint32_t color)
 {
-	size_t stride = width * 4;
-	size_t size = stride * height;
-	drm_intel_bo *bo = drm_intel_bo_alloc(bufmgr, "", size, 4096);
 	int ret;
 
 	ret = drm_intel_bo_map(bo, true /* writable */);
@@ -2443,6 +2691,19 @@ scratch_buf_init(drm_intel_bufmgr *bufmgr,
 		((uint32_t *)bo->virtual)[i] = color;
 
 	drm_intel_bo_unmap(bo);
+}
+
+static void
+scratch_buf_init(drm_intel_bufmgr *bufmgr,
+		 struct igt_buf *buf,
+		 int width, int height,
+		 uint32_t color)
+{
+	size_t stride = width * 4;
+	size_t size = stride * height;
+	drm_intel_bo *bo = drm_intel_bo_alloc(bufmgr, "", size, 4096);
+
+	scratch_buf_memset(bo, width, height, color);
 
 	buf->bo = bo;
 	buf->stride = stride;
@@ -2461,14 +2722,25 @@ emit_stall_timestamp_and_rpc(struct intel_batchbuffer *batch,
 				   PIPE_CONTROL_RENDER_TARGET_FLUSH |
 				   PIPE_CONTROL_WRITE_TIMESTAMP);
 
-	BEGIN_BATCH(5, 1);
-	OUT_BATCH(GFX_OP_PIPE_CONTROL | (5 - 2));
-	OUT_BATCH(pipe_ctl_flags);
-	OUT_RELOC(dst, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
-		  timestamp_offset);
-	OUT_BATCH(0); /* imm lower */
-	OUT_BATCH(0); /* imm upper */
-	ADVANCE_BATCH();
+	if (intel_gen(devid) >= 8) {
+		BEGIN_BATCH(5, 1);
+		OUT_BATCH(GFX_OP_PIPE_CONTROL | (6 - 2));
+		OUT_BATCH(pipe_ctl_flags);
+		OUT_RELOC(dst, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
+			  timestamp_offset);
+		OUT_BATCH(0); /* imm lower */
+		OUT_BATCH(0); /* imm upper */
+		ADVANCE_BATCH();
+	} else {
+		BEGIN_BATCH(5, 1);
+		OUT_BATCH(GFX_OP_PIPE_CONTROL | (5 - 2));
+		OUT_BATCH(pipe_ctl_flags);
+		OUT_RELOC(dst, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
+			  timestamp_offset);
+		OUT_BATCH(0); /* imm lower */
+		OUT_BATCH(0); /* imm upper */
+		ADVANCE_BATCH();
+	}
 
 	emit_report_perf_count(batch, dst, report_dst_offset, report_id);
 }
@@ -2514,7 +2786,7 @@ hsw_test_single_ctx_counters(void)
 		drm_intel_bufmgr *bufmgr;
 		drm_intel_context *context0, *context1;
 		struct intel_batchbuffer *batch;
-		struct igt_buf src, dst;
+		struct igt_buf src[3], dst[3];
 		drm_intel_bo *bo;
 		uint32_t *report0_32, *report1_32;
 		uint64_t timestamp0_64, timestamp1_64;
@@ -2532,8 +2804,10 @@ hsw_test_single_ctx_counters(void)
 		bufmgr = drm_intel_bufmgr_gem_init(drm_fd, 4096);
 		drm_intel_bufmgr_gem_enable_reuse(bufmgr);
 
-		scratch_buf_init(bufmgr, &src, width, height, 0xff0000ff);
-		scratch_buf_init(bufmgr, &dst, width, height, 0x00ff00ff);
+		for (int i = 0; i < ARRAY_SIZE(src); i++) {
+			scratch_buf_init(bufmgr, &src[i], width, height, 0xff0000ff);
+			scratch_buf_init(bufmgr, &dst[i], width, height, 0x00ff00ff);
+		}
 
 		batch = intel_batchbuffer_alloc(bufmgr, devid);
 
@@ -2567,14 +2841,19 @@ hsw_test_single_ctx_counters(void)
 		 */
 		render_copy(batch,
 			    context0,
-			    &src, 0, 0, width, height,
-			    &dst, 0, 0);
+			    &src[0], 0, 0, width, height,
+			    &dst[0], 0, 0);
 
 		ret = drm_intel_gem_context_get_id(context0, &ctx_id);
 		igt_assert_eq(ret, 0);
 		igt_assert_neq(ctx_id, 0xffffffff);
 		properties[1] = ctx_id;
 
+		intel_batchbuffer_flush_with_context(batch, context0);
+
+		scratch_buf_memset(src[0].bo, width, height, 0xff0000ff);
+		scratch_buf_memset(dst[0].bo, width, height, 0x00ff00ff);
+
 		igt_debug("opening i915-perf stream\n");
 		stream_fd = __perf_open(drm_fd, &param);
 
@@ -2601,8 +2880,8 @@ hsw_test_single_ctx_counters(void)
 
 		render_copy(batch,
 			    context0,
-			    &src, 0, 0, width, height,
-			    &dst, 0, 0);
+			    &src[0], 0, 0, width, height,
+			    &dst[0], 0, 0);
 
 		/* Another redundant flush to clarify batch bo is free to reuse */
 		intel_batchbuffer_flush_with_context(batch, context0);
@@ -2613,13 +2892,13 @@ hsw_test_single_ctx_counters(void)
 		 */
 		render_copy(batch,
 			    context1,
-			    &src, 0, 0, width, height,
-			    &dst, 0, 0);
+			    &src[1], 0, 0, width, height,
+			    &dst[1], 0, 0);
 
 		render_copy(batch,
 			    context1,
-			    &src, 0, 0, width, height,
-			    &dst, 0, 0);
+			    &src[2], 0, 0, width, height,
+			    &dst[2], 0, 0);
 
 		/* And another */
 		intel_batchbuffer_flush_with_context(batch, context1);
@@ -2648,6 +2927,7 @@ hsw_test_single_ctx_counters(void)
 
 		/* A40 == N samples written to all render targets */
 		n_samples_written = report1_32[43] - report0_32[43];
+
 		igt_debug("n samples written = %d\n", n_samples_written);
 		igt_assert_eq(n_samples_written, width * height);
 
@@ -2682,8 +2962,10 @@ hsw_test_single_ctx_counters(void)
 			(delta_oa32_ns - delta_ts64_ns);
 		igt_assert(delta_delta <= 320);
 
-		drm_intel_bo_unreference(src.bo);
-		drm_intel_bo_unreference(dst.bo);
+		for (int i = 0; i < ARRAY_SIZE(src); i++) {
+			drm_intel_bo_unreference(src[i].bo);
+			drm_intel_bo_unreference(dst[i].bo);
+		}
 
 		drm_intel_bo_unmap(bo);
 		drm_intel_bo_unreference(bo);
@@ -2697,6 +2979,454 @@ hsw_test_single_ctx_counters(void)
 	igt_waitchildren();
 }
 
+/* Tests the INTEL_performance_query use case where an unprivileged process
+ * should be able to configure the OA unit for per-context metrics (for a
+ * context associated with that process' drm file descriptor) and the counters
+ * should only relate to that specific context.
+ *
+ * For Gen8+ although reports read via i915 perf can be filtered for a single
+ * context the counters themselves always progress as global/system-wide
+ * counters affected by all contexts. To support the INTEL_performance_query
+ * use case on Gen8+ it's necessary to combine OABUFFER and
+ * MI_REPORT_PERF_COUNT reports so that counter normalisation can take into
+ * account context-switch reports and factor out any counter progression not
+ * associated with the current context.
+ */
+static void
+gen8_test_single_ctx_render_target_writes_a_counter(void)
+{
+	int oa_exponent = max_oa_exponent_for_period_lte(1000000);
+	uint64_t properties[] = {
+		DRM_I915_PERF_PROP_CTX_HANDLE, UINT64_MAX, /* updated below */
+
+		/* Note: we have to specify at least one sample property even
+		 * though we aren't interested in samples in this case
+		 */
+		DRM_I915_PERF_PROP_SAMPLE_OA, true,
+
+		/* OA unit configuration */
+		DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
+		DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
+		DRM_I915_PERF_PROP_OA_EXPONENT, oa_exponent,
+
+		/* Note: no OA exponent specified in this case */
+	};
+	struct drm_i915_perf_open_param param = {
+		.flags = I915_PERF_FLAG_FD_CLOEXEC,
+		.num_properties = ARRAY_SIZE(properties) / 2,
+		.properties_ptr = to_user_pointer(properties),
+	};
+	size_t format_size = oa_formats[test_oa_format].size;
+	size_t sample_size = (sizeof(struct drm_i915_perf_record_header) +
+			      format_size);
+	int max_reports = (16 * 1024 * 1024) / format_size;
+	int buf_size = sample_size * max_reports * 1.5;
+	int child_ret;
+	uint8_t *buf = malloc(buf_size);
+	ssize_t len;
+	struct igt_helper_process child = {};
+
+	/* should be default, but just to be sure... */
+	write_u64_file("/proc/sys/dev/i915/perf_stream_paranoid", 1);
+
+	do {
+
+		igt_fork_helper(&child) {
+			struct drm_i915_perf_record_header *header;
+			drm_intel_bufmgr *bufmgr;
+			drm_intel_context *context0, *context1;
+			struct intel_batchbuffer *batch;
+			struct igt_buf src[3], dst[3];
+			drm_intel_bo *bo;
+			uint32_t *report0_32, *report1_32;
+			uint32_t *prev, *lprev = NULL;
+			uint64_t timestamp0_64, timestamp1_64;
+			uint32_t delta_ts64, delta_oa32;
+			uint64_t delta_ts64_ns, delta_oa32_ns;
+			uint32_t delta_delta;
+			int width = 800;
+			int height = 600;
+			uint32_t ctx_id = 0xffffffff; /* invalid handle */
+			uint32_t ctx1_id = 0xffffffff;  /* invalid handle */
+			uint32_t current_ctx_id = 0xffffffff;
+			uint32_t n_invalid_ctx = 0;
+			int ret;
+			struct accumulator accumulator = {
+				.format = test_oa_format
+			};
+
+			//igt_drop_root();
+
+			bufmgr = drm_intel_bufmgr_gem_init(drm_fd, 4096);
+			drm_intel_bufmgr_gem_enable_reuse(bufmgr);
+
+			for (int i = 0; i < ARRAY_SIZE(src); i++) {
+				scratch_buf_init(bufmgr, &src[i], width, height, 0xff0000ff);
+				scratch_buf_init(bufmgr, &dst[i], width, height, 0x00ff00ff);
+			}
+
+			batch = intel_batchbuffer_alloc(bufmgr, devid);
+
+			context0 = drm_intel_gem_context_create(bufmgr);
+			igt_assert(context0);
+
+			context1 = drm_intel_gem_context_create(bufmgr);
+			igt_assert(context1);
+
+			igt_debug("submitting warm up render_copy\n");
+
+			/* Submit some early, unmeasured, work to the context we want
+			 * to measure to try and catch issues with i915-perf
+			 * initializing the HW context ID for filtering.
+			 *
+			 * We do this because i915-perf single context filtering had
+			 * previously only relied on a hook into context pinning to
+			 * initialize the HW context ID, instead of also trying to
+			 * determine the HW ID while opening the stream, in case it
+			 * has already been pinned.
+			 *
+			 * This wasn't noticed by the previous unit test because we
+			 * were opening the stream while the context hadn't been
+			 * touched or pinned yet and so it worked out correctly to wait
+			 * for the pinning hook.
+			 *
+			 * Now a buggy version of i915-perf will fail to measure
+			 * anything for context0 once this initial render_copy() ends
+			 * up pinning the context since there won't ever be a pinning
+			 * hook callback.
+			 */
+			render_copy(batch,
+				    context0,
+				    &src[0], 0, 0, width, height,
+				    &dst[0], 0, 0);
+
+			ret = drm_intel_gem_context_get_id(context0, &ctx_id);
+			igt_assert_eq(ret, 0);
+			igt_assert_neq(ctx_id, 0xffffffff);
+			properties[1] = ctx_id;
+
+			scratch_buf_memset(src[0].bo, width, height, 0xff0000ff);
+			scratch_buf_memset(dst[0].bo, width, height, 0x00ff00ff);
+
+			igt_debug("opening i915-perf stream\n");
+			stream_fd = __perf_open(drm_fd, &param);
+
+			bo = drm_intel_bo_alloc(bufmgr, "mi_rpc dest bo", 4096, 64);
+
+			ret = drm_intel_bo_map(bo, true /* write enable */);
+			igt_assert_eq(ret, 0);
+
+			memset(bo->virtual, 0x80, 4096);
+			drm_intel_bo_unmap(bo);
+
+			emit_stall_timestamp_and_rpc(batch,
+						     bo,
+						     512 /* timestamp offset */,
+						     0, /* report dst offset */
+						     0xdeadbeef); /* report id */
+
+			/* Explicitly flush here (even though the render_copy() call
+			 * will itself flush before/after the copy) to clarify that
+			 * that the PIPE_CONTROL + MI_RPC commands will be in a
+			 * separate batch from the copy.
+			 */
+			intel_batchbuffer_flush_with_context(batch, context0);
+
+			render_copy(batch,
+				    context0,
+				    &src[0], 0, 0, width, height,
+				    &dst[0], 0, 0);
+
+			/* Another redundant flush to clarify batch bo is free to reuse */
+			intel_batchbuffer_flush_with_context(batch, context0);
+
+			/* submit two copies on the other context to avoid a false
+			 * positive in case the driver somehow ended up filtering for
+			 * context1
+			 */
+			render_copy(batch,
+				    context1,
+				    &src[1], 0, 0, width, height,
+				    &dst[1], 0, 0);
+
+			ret = drm_intel_gem_context_get_id(context1, &ctx1_id);
+			igt_assert_eq(ret, 0);
+			igt_assert_neq(ctx1_id, 0xffffffff);
+
+			render_copy(batch,
+				    context1,
+				    &src[2], 0, 0, width, height,
+				    &dst[2], 0, 0);
+
+			/* And another */
+			intel_batchbuffer_flush_with_context(batch, context1);
+
+			emit_stall_timestamp_and_rpc(batch,
+						     bo,
+						     520 /* timestamp offset */,
+						     256, /* report dst offset */
+						     0xbeefbeef); /* report id */
+
+			intel_batchbuffer_flush_with_context(batch, context1);
+
+			ret = drm_intel_bo_map(bo, false /* write enable */);
+			igt_assert_eq(ret, 0);
+
+			report0_32 = bo->virtual;
+			igt_assert_eq(report0_32[0], 0xdeadbeef); /* report ID */
+			igt_assert_neq(report0_32[1], 0); /* timestamp */
+			//report0_32[2] = 0xffffffff;
+			prev = report0_32;
+			ctx_id = prev[2];
+			igt_debug("MI_RPC(start) CTX ID: %u\n", ctx_id);
+
+			report1_32 = report0_32 + 64; /* 64 uint32_t = 256bytes offset */
+			igt_assert_eq(report1_32[0], 0xbeefbeef); /* report ID */
+			igt_assert_neq(report1_32[1], 0); /* timestamp */
+			//report1_32[2] = 0xffffffff;
+			ctx1_id = report1_32[2];
+
+			memset(accumulator.deltas, 0, sizeof(accumulator.deltas));
+			accumulate_reports(&accumulator, report0_32, report1_32);
+			igt_debug("total: A0 = %lu, A21 = %lu, A26 = %lu\n",
+				  accumulator.deltas[2 + 0], /* skip timestamp + clock cycles */
+				  accumulator.deltas[2 + 21],
+				  accumulator.deltas[2 + 26]);
+
+			igt_debug("oa_timestamp32 0 = %u\n", report0_32[1]);
+			igt_debug("oa_timestamp32 1 = %u\n", report1_32[1]);
+			igt_debug("ctx_id 0 = %u\n", report0_32[2]);
+			igt_debug("ctx_id 1 = %u\n", report1_32[2]);
+
+			timestamp0_64 = *(uint64_t *)(((uint8_t *)bo->virtual) + 512);
+			timestamp1_64 = *(uint64_t *)(((uint8_t *)bo->virtual) + 520);
+
+			igt_debug("ts_timestamp64 0 = %"PRIu64"\n", timestamp0_64);
+			igt_debug("ts_timestamp64 1 = %"PRIu64"\n", timestamp1_64);
+
+			delta_ts64 = timestamp1_64 - timestamp0_64;
+			delta_oa32 = report1_32[1] - report0_32[1];
+
+			/* sanity check that we can pass the delta to timebase_scale */
+			igt_assert(delta_ts64 < UINT32_MAX);
+			delta_oa32_ns = timebase_scale(delta_oa32);
+			delta_ts64_ns = timebase_scale(delta_ts64);
+
+			igt_debug("oa32 delta = %u, = %uns\n",
+				  delta_oa32, (unsigned)delta_oa32_ns);
+			igt_debug("ts64 delta = %u, = %uns\n",
+				  delta_ts64, (unsigned)delta_ts64_ns);
+
+			/* The delta as calculated via the PIPE_CONTROL timestamp or
+			 * the OA report timestamps should be almost identical but
+			 * allow a 320 nanoseconds margin.
+			 */
+			delta_delta = delta_ts64_ns > delta_oa32_ns ?
+				(delta_ts64_ns - delta_oa32_ns) :
+				(delta_oa32_ns - delta_ts64_ns);
+			if (delta_delta > 500) {
+				igt_debug("skipping\n");
+				exit(EAGAIN);
+			}
+
+			len = i915_read_reports_until_timestamp(test_oa_format,
+								buf, buf_size,
+								report0_32[1],
+								report1_32[1]);
+
+			igt_assert(len > 0);
+			igt_debug("read %d bytes\n", (int)len);
+
+			memset(accumulator.deltas, 0, sizeof(accumulator.deltas));
+
+			for (size_t offset = 0; offset < len; offset += header->size) {
+				uint32_t *report;
+				uint32_t reason;
+				const char *skip_reason = NULL, *report_reason = NULL;
+				struct accumulator laccumulator = {
+					.format = test_oa_format
+				};
+
+
+				header = (void *)(buf + offset);
+
+				igt_assert_eq(header->pad, 0); /* Reserved */
+
+				/* Currently the only test that should ever expect to
+				 * see a _BUFFER_LOST error is the buffer_fill test,
+				 * otherwise something bad has probably happened...
+				 */
+				igt_assert_neq(header->type, DRM_I915_PERF_RECORD_OA_BUFFER_LOST);
+
+				/* At high sampling frequencies the OA HW might not be
+				 * able to cope with all write requests and will notify
+				 * us that a report was lost.
+				 *
+				 * XXX: we should maybe restart the test in this case?
+				 */
+				if (header->type == DRM_I915_PERF_RECORD_OA_REPORT_LOST) {
+					igt_debug("OA trigger collision / report lost\n");
+					exit(EAGAIN);
+				}
+
+				/* Currently the only other record type expected is a
+				 * _SAMPLE. Notably this test will need updating if
+				 * i915-perf is extended in the future with additional
+				 * record types.
+				 */
+				igt_assert_eq(header->type, DRM_I915_PERF_RECORD_SAMPLE);
+
+				igt_assert_eq(header->size, sample_size);
+
+				report = (void *)(header + 1);
+
+				/* Don't expect zero for timestamps */
+				igt_assert_neq(report[1], 0);
+
+				igt_debug("report %p:\n", report);
+
+				/* Discard reports not contained in between the
+				 * timestamps we're looking at. */
+				{
+					uint32_t time_delta = report[1] - report0_32[1];
+
+					if (timebase_scale(time_delta) > 1000000000) {
+						skip_reason = "prior first mi-rpc";
+					}
+				}
+
+				{
+					uint32_t time_delta = report[1] - report1_32[1];
+
+					if (timebase_scale(time_delta) <= 1000000000) {
+						igt_debug("    comes after last MI_RPC (%u)\n",
+							  report1_32[1]);
+						report = report1_32;
+					}
+				}
+
+				/* Print out deltas for a few significant
+				 * counters for each report. */
+				if (lprev) {
+					memset(laccumulator.deltas, 0, sizeof(laccumulator.deltas));
+					accumulate_reports(&laccumulator, lprev, report);
+					igt_debug("    deltas: A0=%lu A21=%lu, A26=%lu\n",
+						  laccumulator.deltas[2 + 0], /* skip timestamp + clock cycles */
+						  laccumulator.deltas[2 + 21],
+						  laccumulator.deltas[2 + 26]);
+				}
+				lprev = report;
+
+				/* Print out reason for the report. */
+				reason = ((report[0] >> OAREPORT_REASON_SHIFT) &
+					  OAREPORT_REASON_MASK);
+
+				if (reason & OAREPORT_REASON_CTX_SWITCH) {
+					report_reason = "ctx-load";
+				} else if (reason & OAREPORT_REASON_TIMER) {
+					report_reason = "timer";
+				} else if (reason & OAREPORT_REASON_INTERNAL ||
+					   reason & OAREPORT_REASON_GO ||
+					   reason & OAREPORT_REASON_CLK_RATIO) {
+					report_reason = "internal/go/clk-ratio";
+				} else {
+					report_reason = "end-mi-rpc";
+				}
+				igt_debug("    ctx_id=%u/%x reason=%s oa_timestamp32=%u\n",
+					  report[2], report[2], report_reason, report[1]);
+
+				/* Should we skip this report?
+				 *
+				 *   Only if the current context id of
+				 *   the stream is not the one we want
+				 *   to measure.
+				 */
+				if (current_ctx_id != ctx_id) {
+					skip_reason = "not our context";
+				}
+
+				if (n_invalid_ctx > 1) {
+					skip_reason = "too many invalid context events";
+				}
+
+				if (!skip_reason) {
+					accumulate_reports(&accumulator, prev, report);
+					igt_debug(" -> Accumulated deltas A0=%lu A21=%lu, A26=%lu\n",
+						  accumulator.deltas[2 + 0], /* skip timestamp + clock cycles */
+						  accumulator.deltas[2 + 21],
+						  accumulator.deltas[2 + 26]);
+				} else {
+					igt_debug(" -> Skipping: %s\n", skip_reason);
+				}
+
+
+				/* Finally update current-ctx_id, only possible
+				 * with a valid contextOB id. */
+				if (oa_report_ctx_is_valid(report)) {
+					current_ctx_id = report[2];
+					n_invalid_ctx = 0;
+				} else {
+					n_invalid_ctx++;
+				}
+
+				prev = report;
+
+				if (report == report1_32) {
+					igt_debug("Breaking on end of report\n");
+					print_reports(report0_32, report1_32,
+						      lookup_format(test_oa_format));
+					break;
+				}
+			}
+
+			igt_debug("n samples written = %ld/%lu (%ix%i)\n",
+				  accumulator.deltas[2 + 21],/* skip timestamp + clock cycles */
+				  accumulator.deltas[2 + 26],
+				  width, height);
+			accumulator_print(&accumulator, "filtered");
+
+			ret = drm_intel_bo_map(src[0].bo, false /* write enable */);
+			igt_assert_eq(ret, 0);
+			ret = drm_intel_bo_map(dst[0].bo, false /* write enable */);
+			igt_assert_eq(ret, 0);
+
+			ret = memcmp(src[0].bo->virtual, dst[0].bo->virtual, 4 * width * height);
+			if (ret != 0) {
+				accumulator_print(&accumulator, "total");
+				/* This needs to be investigated... From time
+				 * to time, the work we kick off doesn't seem
+				 * to happen. WTH?? */
+				exit(EAGAIN);
+			}
+			//igt_assert_eq(ret, 0);
+
+			drm_intel_bo_unmap(src[0].bo);
+			drm_intel_bo_unmap(dst[0].bo);
+
+			igt_assert_eq(accumulator.deltas[2 + 26], width * height);
+
+			for (int i = 0; i < ARRAY_SIZE(src); i++) {
+				drm_intel_bo_unreference(src[i].bo);
+				drm_intel_bo_unreference(dst[i].bo);
+			}
+
+			drm_intel_bo_unmap(bo);
+			drm_intel_bo_unreference(bo);
+			intel_batchbuffer_free(batch);
+			drm_intel_gem_context_destroy(context0);
+			drm_intel_gem_context_destroy(context1);
+			drm_intel_bufmgr_destroy(bufmgr);
+			__perf_close(stream_fd);
+		}
+
+		child_ret = igt_wait_helper(&child);
+
+		igt_assert(WEXITSTATUS(child_ret) == EAGAIN ||
+			   WEXITSTATUS(child_ret) == 0);
+
+	} while (WEXITSTATUS(child_ret) == EAGAIN);
+}
+
 static void
 test_rc6_disable(void)
 {
@@ -2916,8 +3646,10 @@ igt_main
 		test_oa_exponents(550);
 	}
 
-	igt_subtest("per-context-mode-unprivileged")
+	igt_subtest("per-context-mode-unprivileged") {
+		igt_require(IS_HASWELL(devid));
 		test_per_context_mode_unprivileged();
+	}
 
 	igt_subtest("buffer-fill")
 		test_buffer_fill();
@@ -2942,15 +3674,20 @@ igt_main
 	igt_subtest("mi-rpc")
 		test_mi_rpc();
 
-	igt_subtest("unprivileged-singled-ctx-counters") {
+	igt_subtest("unprivileged-single-ctx-counters") {
+		igt_require(IS_HASWELL(devid));
+		hsw_test_single_ctx_counters();
+	}
+
+	igt_subtest("gen8-unprivileged-single-ctx-counters") {
 		/* For Gen8+ the OA unit can no longer be made to clock gate
 		 * for a specific context. Additionally the partial-replacement
 		 * functionality to HW filter timer reports for a specific
 		 * context (SKL+) can't stop multiple applications viewing
 		 * system-wide data via MI_REPORT_PERF_COUNT commands.
 		 */
-		igt_require(IS_HASWELL(devid));
-		hsw_test_single_ctx_counters();
+		igt_require(intel_gen(devid) >= 8);
+		gen8_test_single_ctx_render_target_writes_a_counter();
 	}
 
 	igt_subtest("rc6-disable")
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 23/29] igt/perf: update max buffer size for reading reports
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (21 preceding siblings ...)
  2017-04-25 22:32 ` [PATCH i-g-t 22/29] igt/perf: add per context filtering test for gen8+ Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 24/29] igt/perf: fix rc6 test Lionel Landwerlin
                   ` (5 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index 98f80bfd..887836e2 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -1245,9 +1245,7 @@ read_2_oa_reports(int format_id,
 	/* Note: we allocate a large buffer so that each read() iteration
 	 * should scrape *all* pending records.
 	 *
-	 * The largest buffer the OA unit supports is 16MB and the smallest
-	 * OA report format is 64bytes allowing up to 262144 reports to
-	 * be buffered.
+	 * The largest buffer the OA unit supports is 16MB.
 	 *
 	 * Being sure we are fetching all buffered reports allows us to
 	 * potentially throw away / skip all reports whenever we see
@@ -1260,7 +1258,8 @@ read_2_oa_reports(int format_id,
 	 * to indicate that the OA unit may be over taxed if lots of reports
 	 * are being lost.
 	 */
-	int buf_size = 262144 * (64 + sizeof(struct drm_i915_perf_record_header));
+	int max_reports = (16 * 1024 * 1024) / format_size;
+	int buf_size = sample_size * max_reports * 1.5;
 	uint8_t *buf = malloc(buf_size);
 	int n = 0;
 
@@ -1272,6 +1271,7 @@ read_2_oa_reports(int format_id,
 			;
 
 		igt_assert(len > 0);
+		igt_debug("read %d bytes\n", (int)len);
 
 		for (size_t offset = 0; offset < len; offset += header->size) {
 			const uint32_t *report;
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 24/29] igt/perf: fix rc6 test
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (22 preceding siblings ...)
  2017-04-25 22:32 ` [PATCH i-g-t 23/29] igt/perf: update max buffer size for reading reports Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 25/29] igt/perf: rework oa-exponent test Lionel Landwerlin
                   ` (4 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

From: Robert Bragg <robert@sixbynine.org>

When measuring that rc6 doesn't happen, we need to do so after opening
the OA stream.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index 887836e2..9fd40ff0 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -3444,12 +3444,13 @@ test_rc6_disable(void)
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
-	uint64_t n_events_start = read_debugfs_u64_record(drm_fd, "i915_drpc_info",
-							  "RC6 residency since boot");
-	uint64_t n_events_end;
+	uint64_t n_events_start, n_events_end;
 
 	stream_fd = __perf_open(drm_fd, &param);
 
+	n_events_start = read_debugfs_u64_record(drm_fd, "i915_drpc_info",
+						 "RC6 residency since boot");
+
 	nanosleep(&(struct timespec){ .tv_sec = 0, .tv_nsec = 500000000 }, NULL);
 
 	n_events_end = read_debugfs_u64_record(drm_fd, "i915_drpc_info",
@@ -3462,7 +3463,7 @@ test_rc6_disable(void)
 	n_events_start = read_debugfs_u64_record(drm_fd, "i915_drpc_info",
 						 "RC6 residency since boot");
 
-	nanosleep(&(struct timespec){ .tv_sec = 0, .tv_nsec = 500000000 }, NULL);
+	nanosleep(&(struct timespec){ .tv_sec = 1, .tv_nsec = 0 }, NULL);
 
 	n_events_end = read_debugfs_u64_record(drm_fd, "i915_drpc_info",
 					       "RC6 residency since boot");
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 25/29] igt/perf: rework oa-exponent test
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (23 preceding siblings ...)
  2017-04-25 22:32 ` [PATCH i-g-t 24/29] igt/perf: fix rc6 test Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 26/29] igt/perf: make enable-disable more reliable Lionel Landwerlin
                   ` (3 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

New issues that were discovered while making the tests work on Gen8+ :

 - we need to measure timings between periodic reports and discard all
   other kind of reports

 - it seems periodicity of the reports can be affected outside of RC6
   (frequency change), we can detect this by looking at the amount of
   clock cycles per timestamp deltas

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 765 ++++++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 573 insertions(+), 192 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index 9fd40ff0..922c692d 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -28,6 +28,7 @@
 #include <fcntl.h>
 #include <inttypes.h>
 #include <errno.h>
+#include <signal.h>
 #include <sys/stat.h>
 #include <sys/time.h>
 #include <sys/times.h>
@@ -274,6 +275,15 @@ static uint32_t (*read_report_ticks)(uint32_t *report,
 static void (*sanity_check_reports)(uint32_t *oa_report0, uint32_t *oa_report1,
 				    enum drm_i915_oa_format format);
 
+static bool
+timestamp_delta_within(uint32_t delta,
+		       uint32_t expected_delta,
+		       uint32_t margin)
+{
+	return delta > (expected_delta - margin) &&
+	       delta < (expected_delta + margin);
+}
+
 static void
 __perf_close(int fd)
 {
@@ -450,6 +460,20 @@ gen8_read_report_ticks(uint32_t *report, enum drm_i915_oa_format format)
 	return report[3];
 }
 
+static void
+gen8_read_report_clock_ratios(uint32_t *report,
+			      uint32_t *slice_freq_mhz,
+			      uint32_t *unslice_freq_mhz)
+{
+	uint32_t unslice_freq = report[0] & 0x1ff;
+	uint32_t slice_freq_low = (report[0] >> 25) & 0x7f;
+	uint32_t slice_freq_high = (report[0] >> 9) & 0x3;
+	uint32_t slice_freq = slice_freq_low | (slice_freq_high << 7);
+
+	*slice_freq_mhz = (slice_freq * 16666) / 1000;
+	*unslice_freq_mhz = (unslice_freq * 16666) / 1000;
+}
+
 static const char *
 gen8_read_report_reason(const uint32_t *report)
 {
@@ -472,29 +496,6 @@ gen8_read_report_reason(const uint32_t *report)
 		return "unknown";
 }
 
-static bool
-oa_report_is_periodic(uint32_t oa_exponent, const uint32_t *report)
-{
-
-	if (IS_HASWELL(devid)) {
-		/* For Haswell we don't have a documented report reason field
-		 * (though empirically report[0] bit 10 does seem to correlate
-		 * with a timer trigger reason) so we instead infer which
-		 * reports are timer triggered by checking if the least
-		 * significant bits are zero and the exponent bit is set.
-		 */
-		uint32_t oa_exponent_mask = (1 << (oa_exponent + 1)) - 1;
-		if ((report[1] & oa_exponent_mask) != (1 << oa_exponent))
-			return true;
-	} else {
-		if ((report[0] >> OAREPORT_REASON_SHIFT) &
-		    OAREPORT_REASON_TIMER)
-			return true;
-	}
-
-	return false;
-}
-
 static uint64_t
 timebase_scale(uint32_t u32_delta)
 {
@@ -542,6 +543,29 @@ oa_exponent_to_ns(int exponent)
 }
 
 static bool
+oa_report_is_periodic(uint32_t oa_exponent, const uint32_t *report)
+{
+
+	if (IS_HASWELL(devid)) {
+		/* For Haswell we don't have a documented report reason field
+		 * (though empirically report[0] bit 10 does seem to correlate
+		 * with a timer trigger reason) so we instead infer which
+		 * reports are timer triggered by checking if the least
+		 * significant bits are zero and the exponent bit is set.
+		 */
+		uint32_t oa_exponent_mask = (1 << (oa_exponent + 1)) - 1;
+		if ((report[1] & oa_exponent_mask) != (1 << oa_exponent))
+			return true;
+	} else {
+		if ((report[0] >> OAREPORT_REASON_SHIFT) &
+		    OAREPORT_REASON_TIMER)
+			return true;
+	}
+
+	return false;
+}
+
+static bool
 oa_report_ctx_is_valid(uint32_t *report)
 {
 	if (IS_HASWELL(devid)) {
@@ -556,6 +580,130 @@ oa_report_ctx_is_valid(uint32_t *report)
 	igt_assert(!"reached");
 }
 
+static uint32_t
+oa_report_get_ctx_id(uint32_t *report)
+{
+	if (!oa_report_ctx_is_valid(report))
+		return 0xffffffff;
+	return report[2];
+}
+
+static bool
+oa_reports_have_clock_change(uint32_t *report0, uint32_t *report1)
+{
+	double tick_per_period;
+
+	if (intel_gen(devid) < 8)
+		return false;
+
+	/* Measure the number GPU tick delta to timestamp delta. */
+	tick_per_period =
+		(report1[3] - report0[3]) / (report1[1] - report0[1]);
+
+	if ((tick_per_period / 80.0) >= 0.97)
+		return false;
+
+	return true;
+}
+
+static void
+scratch_buf_memset(drm_intel_bo *bo, int width, int height, uint32_t color)
+{
+	int ret;
+
+	ret = drm_intel_bo_map(bo, true /* writable */);
+	igt_assert_eq(ret, 0);
+
+	for (int i = 0; i < width * height; i++)
+		((uint32_t *)bo->virtual)[i] = color;
+
+	drm_intel_bo_unmap(bo);
+}
+
+static void
+scratch_buf_init(drm_intel_bufmgr *bufmgr,
+		 struct igt_buf *buf,
+		 int width, int height,
+		 uint32_t color)
+{
+	size_t stride = width * 4;
+	size_t size = stride * height;
+	drm_intel_bo *bo = drm_intel_bo_alloc(bufmgr, "", size, 4096);
+
+	scratch_buf_memset(bo, width, height, color);
+
+	buf->bo = bo;
+	buf->stride = stride;
+	buf->tiling = I915_TILING_NONE;
+	buf->size = size;
+}
+
+static void
+emit_report_perf_count(struct intel_batchbuffer *batch,
+		       drm_intel_bo *dst_bo,
+		       int dst_offset,
+		       uint32_t report_id)
+{
+	if (IS_HASWELL(devid)) {
+		BEGIN_BATCH(3, 1);
+		OUT_BATCH(GEN6_MI_REPORT_PERF_COUNT);
+		OUT_RELOC(dst_bo, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
+			  dst_offset);
+		OUT_BATCH(report_id);
+		ADVANCE_BATCH();
+	} else {
+		/* XXX: NB: n dwords arg is actually magic since it internally
+		 * automatically accounts for larger addresses on gen >= 8...
+		 */
+		BEGIN_BATCH(3, 1);
+		OUT_BATCH(GEN8_MI_REPORT_PERF_COUNT);
+		OUT_RELOC(dst_bo, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
+			  dst_offset);
+		OUT_BATCH(report_id);
+		ADVANCE_BATCH();
+	}
+}
+
+static uint32_t
+i915_get_one_gpu_timestamp(void)
+{
+	drm_intel_bufmgr *bufmgr = drm_intel_bufmgr_gem_init(drm_fd, 4096);
+	drm_intel_context *mi_rpc_ctx = drm_intel_gem_context_create(bufmgr);
+	drm_intel_bo *mi_rpc_bo = drm_intel_bo_alloc(bufmgr, "mi_rpc dest bo", 4096, 64);
+	struct intel_batchbuffer *mi_rpc_batch = intel_batchbuffer_alloc(bufmgr, devid);
+	int ret;
+	uint32_t timestamp;
+
+	drm_intel_bufmgr_gem_enable_reuse(bufmgr);
+
+	igt_assert(mi_rpc_ctx);
+	igt_assert(mi_rpc_bo);
+	igt_assert(mi_rpc_batch);
+
+	ret = drm_intel_bo_map(mi_rpc_bo, true);
+	igt_assert_eq(ret, 0);
+	memset(mi_rpc_bo->virtual, 0x80, 4096);
+	drm_intel_bo_unmap(mi_rpc_bo);
+
+	emit_report_perf_count(mi_rpc_batch,
+			       mi_rpc_bo, /* dst */
+			       0, /* dst offset in bytes */
+			       0xdeadbeef); /* report ID */
+
+	intel_batchbuffer_flush_with_context(mi_rpc_batch, mi_rpc_ctx);
+
+	ret = drm_intel_bo_map(mi_rpc_bo, false /* write enable */);
+	igt_assert_eq(ret, 0);
+	timestamp = ((uint32_t *)mi_rpc_bo->virtual)[1];
+	drm_intel_bo_unmap(mi_rpc_bo);
+
+	drm_intel_bo_unreference(mi_rpc_bo);
+	intel_batchbuffer_free(mi_rpc_batch);
+	drm_intel_gem_context_destroy(mi_rpc_ctx);
+	drm_intel_bufmgr_destroy(bufmgr);
+
+	return timestamp;
+}
 
 static void
 hsw_sanity_check_render_basic_reports(uint32_t *oa_report0, uint32_t *oa_report1,
@@ -965,6 +1113,16 @@ gt_frequency_range_save(void)
 	gt_max_freq_mhz = gt_max_freq_mhz_saved;
 }
 
+static void wait_freq_settle(void)
+{
+	struct timespec ts;
+
+	/* FIXME: Lazy sleep without check. */
+	ts.tv_sec = 0;
+	ts.tv_nsec = 20000;
+	clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, NULL);
+}
+
 static void
 gt_frequency_pin(int gt_freq_mhz)
 {
@@ -979,6 +1137,8 @@ gt_frequency_pin(int gt_freq_mhz)
 	}
 	gt_min_freq_mhz = gt_freq_mhz;
 	gt_max_freq_mhz = gt_freq_mhz;
+
+	wait_freq_settle();
 }
 
 static void
@@ -1058,7 +1218,6 @@ i915_read_reports_until_timestamp(enum drm_i915_oa_format oa_format,
 	return total_len;
 }
 
-
 /* CAP_SYS_ADMIN is required to open system wide metrics, unless the system
  * control parameter dev.i915.perf_stream_paranoid == 0 */
 static void
@@ -1374,20 +1533,6 @@ open_and_read_2_oa_reports(int format_id,
 }
 
 static void
-gen8_read_report_clock_ratios(uint32_t *report,
-			      uint32_t *slice_freq_mhz,
-			      uint32_t *unslice_freq_mhz)
-{
-	uint32_t unslice_freq = report[0] & 0x1ff;
-	uint32_t slice_freq_low = (report[0] >> 25) & 0x7f;
-	uint32_t slice_freq_high = (report[0] >> 9) & 0x3;
-	uint32_t slice_freq = slice_freq_low | (slice_freq_high << 7);
-
-	*slice_freq_mhz = (slice_freq * 16666) / 1000;
-	*unslice_freq_mhz = (unslice_freq * 16666) / 1000;
-}
-
-static void
 print_reports(uint32_t *oa_report0, uint32_t *oa_report1, int fmt)
 {
 	igt_debug("TIMESTAMP: 1st = %"PRIu32", 2nd = %"PRIu32", delta = %"PRIu32"\n",
@@ -1574,128 +1719,432 @@ test_oa_formats(void)
 	}
 }
 
-static void
-test_oa_exponents(int gt_freq_mhz)
+
+enum load {
+	LOW,
+	HIGH
+};
+
+static struct load_helper {
+	int devid;
+	int has_ppgtt;
+	drm_intel_bufmgr *bufmgr;
+	drm_intel_context *context;
+	uint32_t context_id;
+	struct intel_batchbuffer *batch;
+	drm_intel_bo *target_buffer;
+	enum load load;
+	bool exit;
+	struct igt_helper_process igt_proc;
+	struct igt_buf src, dst;
+} lh = { 0, };
+
+static void load_helper_signal_handler(int sig)
 {
-	uint32_t freq_margin;
+	if (sig == SIGUSR2)
+		lh.load = lh.load == LOW ? HIGH : LOW;
+	else
+		lh.exit = true;
+}
 
-	/* This test tries to use the sysfs interface for pinning the GT
-	 * frequency so we have another point of reference for comparing with
-	 * the clock frequency as derived from OA reports.
-	 *
-	 * This test has been finicky to stabilise while the
-	 * gt_min/max_freq_mhz files in sysfs don't seem to be a reliable
-	 * mechanism for fixing the gpu frequency.
-	 *
-	 * Since these unit tests are focused on the OA unit not the ability to
-	 * pin the frequency via sysfs we make the test account for pinning not
-	 * being reliable and read back the current frequency for each
-	 * iteration of this test to take this into account.
-	 */
-	gt_frequency_pin(gt_freq_mhz);
+#define LOAD_HELPER_PAUSE_USEC 500
+#define LOAD_HELPER_BO_SIZE (16*1024*1024)
+static void load_helper_set_load(enum load load)
+{
+	igt_assert(lh.igt_proc.running);
+
+	if (lh.load == load)
+		return;
 
-	igt_debug("Testing OA timer exponents with requested GT frequency = %dmhz\n",
-		  gt_freq_mhz);
+	lh.load = load;
+	kill(lh.igt_proc.pid, SIGUSR2);
+}
 
-	/* allow a +- 10% error margin when checking that the frequency
-	 * calculated from the OA reports matches the frequency according to
-	 * sysfs.
+static void load_helper_run(enum load load)
+{
+	/*
+	 * FIXME fork helpers won't get cleaned up when started from within a
+	 * subtest, so handle the case where it sticks around a bit too long.
 	 */
-	freq_margin = gt_freq_mhz * 0.1;
+	if (lh.igt_proc.running) {
+		load_helper_set_load(load);
+		return;
+	}
+
+	lh.load = load;
+
+	igt_fork_helper(&lh.igt_proc) {
+		signal(SIGUSR1, load_helper_signal_handler);
+		signal(SIGUSR2, load_helper_signal_handler);
+
+		while (!lh.exit) {
+			int ret;
+
+			render_copy(lh.batch,
+				    lh.context,
+				    &lh.src, 0, 0, 1920, 1080,
+				    &lh.dst, 0, 0);
+
+			intel_batchbuffer_flush_with_context(lh.batch,
+							     lh.context);
+
+			ret = drm_intel_gem_context_get_id(lh.context,
+							   &lh.context_id);
+			igt_assert_eq(ret, 0);
+
+			drm_intel_bo_wait_rendering(lh.dst.bo);
+
+			/* Lower the load by pausing after every submitted
+			 * write. */
+			if (lh.load == LOW)
+				usleep(LOAD_HELPER_PAUSE_USEC);
+		}
+	}
+}
+
+static void load_helper_stop(void)
+{
+	kill(lh.igt_proc.pid, SIGUSR1);
+	igt_assert(igt_wait_helper(&lh.igt_proc) == 0);
+}
+
+static void load_helper_init(void)
+{
+	int ret;
+
+	lh.devid = intel_get_drm_devid(drm_fd);
+	lh.has_ppgtt = gem_uses_ppgtt(drm_fd);
+
+	/* MI_STORE_DATA can only use GTT address on gen4+/g33 and needs
+	 * snoopable mem on pre-gen6. Hence load-helper only works on gen6+, but
+	 * that's also all we care about for the rps testcase*/
+	igt_assert(intel_gen(lh.devid) >= 6);
+	lh.bufmgr = drm_intel_bufmgr_gem_init(drm_fd, 4096);
+	igt_assert(lh.bufmgr);
+
+	drm_intel_bufmgr_gem_enable_reuse(lh.bufmgr);
+
+	lh.context = drm_intel_gem_context_create(lh.bufmgr);
+	igt_assert(lh.context);
+
+	lh.context_id = 0xffffffff;
+	ret = drm_intel_gem_context_get_id(lh.context, &lh.context_id);
+	igt_assert_eq(ret, 0);
+	igt_assert_neq(lh.context_id, 0xffffffff);
+
+	lh.batch = intel_batchbuffer_alloc(lh.bufmgr, lh.devid);
+	igt_assert(lh.batch);
+
+	scratch_buf_init(lh.bufmgr, &lh.dst, 1920, 1080, 0);
+	scratch_buf_init(lh.bufmgr, &lh.src, 1920, 1080, 0);
+}
+
+static void load_helper_deinit(void)
+{
+	if (lh.igt_proc.running)
+		load_helper_stop();
+
+	if (lh.src.bo)
+		drm_intel_bo_unreference(lh.src.bo);
+	if (lh.dst.bo)
+		drm_intel_bo_unreference(lh.dst.bo);
+
+	if (lh.batch)
+		intel_batchbuffer_free(lh.batch);
+
+	if (lh.context)
+		drm_intel_gem_context_destroy(lh.context);
+
+	if (lh.bufmgr)
+		drm_intel_bufmgr_destroy(lh.bufmgr);
+}
+
+static void
+test_oa_exponents(void)
+{
+	load_helper_init();
+	load_helper_run(HIGH);
 
 	/* It's asking a lot to sample with a 160 nanosecond period and the
 	 * test can fail due to buffer overflows if it wasn't possible to
 	 * keep up, so we don't start from an exponent of zero...
 	 */
-	for (int i = 5; i < 20; i++) {
-		uint32_t expected_timestamp_delta;
-		uint32_t timestamp_delta;
-		uint32_t oa_report0[64];
-		uint32_t oa_report1[64];
+	for (int exponent = 5; exponent < 18; exponent++) {
+		uint64_t expected_timestamp_delta;
 		uint32_t time_delta;
-		uint32_t clock_delta;
-		uint32_t freq;
 		int n_tested = 0;
-		int n_freq_matches = 0;
 		int n_time_delta_matches = 0;
 
-#warning "XXX: it seems pretty odd that the time delta assertion failures centre around these exponents"
-		if (i == 6 || i == 7 || i == 8)
-			continue;
-
 		/* The exponent is effectively selecting a bit in the timestamp
 		 * to trigger reports on and so in practice we expect the raw
 		 * timestamp deltas for periodic reports to exactly match the
 		 * value of next bit.
 		 */
-		expected_timestamp_delta = 2 << i;
+		expected_timestamp_delta = 2UL << exponent;
 
 		for (int j = 0; n_tested < 10 && j < 100; j++) {
-			int gt_freq_mhz_0, gt_freq_mhz_1;
-			uint32_t ticks0, ticks1;
+			uint64_t properties[] = {
+				/* Include OA reports in samples */
+				DRM_I915_PERF_PROP_SAMPLE_OA, true,
+
+				/* OA unit configuration */
+				DRM_I915_PERF_PROP_OA_METRICS_SET, test_metric_set_id,
+				DRM_I915_PERF_PROP_OA_FORMAT, test_oa_format,
+				DRM_I915_PERF_PROP_OA_EXPONENT, exponent,
+			};
+			struct drm_i915_perf_open_param param = {
+				.flags = I915_PERF_FLAG_FD_CLOEXEC,
+				.num_properties = ARRAY_SIZE(properties) / 2,
+				.properties_ptr = to_user_pointer(properties),
+			};
+			int ret;
+			uint64_t average_timestamp_delta;
+			uint32_t n_reports = 0;
+			uint32_t n_report_lost = 0;
+			uint32_t n_idle_reports = 0;
+			uint32_t n_reads = 0;
+			uint64_t first_timestamp = 0;
+			bool check_first_timestamp = true;
+			struct drm_i915_perf_record_header *header;
+			uint64_t delta_delta;
+			struct {
+				uint32_t report[64];
+			} reports[30];
+			struct {
+				uint8_t *buf;
+				size_t len;
+			} reads[1000];
+			double error;
+
+			igt_debug("ITER %d: testing OA exponent %d,"
+				  " expected ts delta = %"PRIu64" (%"PRIu64"ns/%.2fus/%.2fms)\n",
+				  j, exponent,
+				  expected_timestamp_delta,
+				  oa_exponent_to_ns(exponent),
+				  oa_exponent_to_ns(exponent) / 1000.0,
+				  oa_exponent_to_ns(exponent) / (1000.0 * 1000.0));
 
-			gt_freq_mhz_0 = sysfs_read("gt_act_freq_mhz");
+			stream_fd = __perf_open(drm_fd, &param);
 
-			igt_debug("ITER %d: testing OA exponent %d (period = %"PRIu64"ns) with sysfs GT freq = %dmhz +- %u\n",
-				  j, i,
-				  oa_exponent_to_ns(i),
-				  gt_freq_mhz_0, freq_margin);
+			/* Right after opening the OA stream, read a
+			 * first timestamp as way to filter previously
+			 * scheduled work that would have configured
+			 * the OA unit at a different period. */
+			first_timestamp = i915_get_one_gpu_timestamp();
 
-			open_and_read_2_oa_reports(test_oa_format,
-						   i, /* exponent */
-						   oa_report0,
-						   oa_report1,
-						   true); /* timer triggered
-							     reports only */
+			while (n_reads < ARRAY_SIZE(reads) &&
+			       n_reports < ARRAY_SIZE(reports)) {
+				const size_t buf_size = 1024 * 1024;
+				uint8_t *buf = reads[n_reads++].buf = calloc(1, buf_size);
 
-			gt_freq_mhz_1 = sysfs_read("gt_act_freq_mhz");
+				while ((ret = read(stream_fd, buf, buf_size)) < 0 &&
+				       errno == EINTR)
+					;
 
-			/* If it looks like the frequency has changed according
-			 * to sysfs then skip looking at this pair of reports
-			 */
-			if (gt_freq_mhz_0 != gt_freq_mhz_1) {
-				igt_debug("skipping OA reports pair due to GT frequency change according to sysfs\n");
-				continue;
-			}
+				/* We should never have no data. */
+				igt_assert(ret > 0);
+				reads[n_reads - 1].len = ret;
 
-			timestamp_delta = oa_report1[1] - oa_report0[1];
-			igt_assert_neq(timestamp_delta, 0);
+				igt_debug(" > read %i bytes\n", ret);
 
-			if (timestamp_delta == expected_timestamp_delta)
-				n_time_delta_matches++;
-			else {
-				igt_debug("timestamp delta mismatch: %"PRIu64"ns != expected %"PRIu64"ns, ts0 = %u/0x%x, ts1 = %u/0x%x\n",
-					  timebase_scale(timestamp_delta),
-					  timebase_scale(expected_timestamp_delta),
-					  oa_report0[1], oa_report0[1],
-					  oa_report1[1], oa_report1[1]);
-				print_reports(oa_report0, oa_report1, test_oa_format);
-				igt_assert(timestamp_delta <
-					   (expected_timestamp_delta * 2));
+				for (int offset = 0;
+				     offset < ret && n_reports < ARRAY_SIZE(reports);
+				     offset += header->size) {
+					uint32_t *report;
+
+					header = (void *)(buf + offset);
+
+					if (header->type == DRM_I915_PERF_RECORD_OA_BUFFER_LOST) {
+						igt_assert(!"reached");
+						break;
+					}
+
+					if (header->type == DRM_I915_PERF_RECORD_OA_REPORT_LOST) {
+						n_report_lost++;
+						n_reports = 0;
+						n_report_lost = 0;
+						n_idle_reports = 0;
+						for (int r = 0; r < n_reads; r++)
+							free(reads[r].buf);
+						n_reads = 0;
+						break;
+					}
+
+					if (header->type != DRM_I915_PERF_RECORD_SAMPLE)
+						continue;
+
+					report = (void *)(header + 1);
+
+					/* Skip anything before the first
+					 * timestamp, it might not be at the
+					 * right periodic exponent. */
+					if (check_first_timestamp &&
+					    report[1] < first_timestamp) {
+						igt_debug(" > Dropping ts=%u (prior %"PRIu64")\n",
+							  report[1], first_timestamp);
+						continue;
+					}
+
+					/* Once we've passed the first
+					 * timestamp, no need to check. */
+					check_first_timestamp = false;
+
+					if (!oa_report_ctx_is_valid(report))
+						n_idle_reports++;
+
+					/* We only measure timestamps between
+					 * periodic reports. */
+					if (!oa_report_is_periodic(exponent, report))
+						continue;
+
+					igt_debug(" > write %i timestamp=%u\n", n_reports, report[1]);
+					memcpy(reports[n_reports].report, report,
+					       sizeof(reports[n_reports].report));
+					n_reports++;
+
+					/* Dismiss the series of report if we
+					 * notice clock frequency changes. */
+					if (n_reports > 1 &&
+					    oa_reports_have_clock_change(reports[n_reports - 2].report,
+									 reports[n_reports - 1].report)) {
+						igt_debug("Noticed clock frequency change at ts=%u, dropping reports and trying again\n",
+							  report[1]);
+						n_reports = 0;
+						n_report_lost = 0;
+						n_idle_reports = 0;
+						for (int r = 0; r < n_reads; r++)
+							free(reads[r].buf);
+						n_reads = 0;
+						break;
+					}
+				}
 			}
 
-			ticks0 = read_report_ticks(oa_report0, test_oa_format);
-			ticks1 = read_report_ticks(oa_report1, test_oa_format);
-			clock_delta = ticks1 - ticks0;
+			close(stream_fd);
+			igt_debug("closed stream\n");
 
-			time_delta = timebase_scale(timestamp_delta);
+			igt_assert_eq(n_reports, ARRAY_SIZE(reports));
 
-			freq = ((uint64_t)clock_delta * 1000) / time_delta;
-			igt_debug("ITER %d: time delta = %"PRIu32"(ns) clock delta = %"PRIu32" freq = %"PRIu32"(mhz)\n",
-				  j, time_delta, clock_delta, freq);
+			average_timestamp_delta = 0;
+			for (int i = 0; i < (n_reports - 1); i++) {
+				/* XXX: calculating with u32 arithmetic to account for overflow */
+				uint32_t u32_delta = reports[i + 1].report[1] - reports[i].report[1];
 
-                        if (freq < (gt_freq_mhz_1 + freq_margin) &&
-                            freq > (gt_freq_mhz_1 - freq_margin))
-				n_freq_matches++;
+				average_timestamp_delta += u32_delta;
+			}
+			average_timestamp_delta /= (n_reports - 1);
+
+			if (average_timestamp_delta > expected_timestamp_delta)
+				delta_delta  = average_timestamp_delta - expected_timestamp_delta;
+			else
+				delta_delta = expected_timestamp_delta - average_timestamp_delta;
+			error = (delta_delta / (double)expected_timestamp_delta) * 100.0;
+
+			time_delta = timebase_scale(average_timestamp_delta);
+
+			igt_debug(" > Avg. time delta = %"PRIu32"(ns),"
+				  " lost reports = %u, n idle reports = %u,"
+				  " n reads = %u, error=%f\n",
+				  time_delta, n_report_lost, n_idle_reports, n_reads, error);
+			if (error > 5) {
+				uint32_t *rpt = NULL, *last = NULL, *last_periodic = NULL;
+
+				igt_debug(" > More than 5%% error: avg_ts_delta = %"PRIu64", delta_delta = %"PRIu64", expected_delta = %"PRIu64"\n",
+					  average_timestamp_delta, delta_delta, expected_timestamp_delta);
+				for (int i = 0; i < (n_reports - 1); i++) {
+					/* XXX: calculating with u32 arithmetic to account for overflow */
+					uint32_t u32_delta =
+						reports[i + 1].report[1] - reports[i].report[1];
+
+					if (u32_delta > expected_timestamp_delta)
+						delta_delta  = u32_delta - expected_timestamp_delta;
+					else
+						delta_delta = expected_timestamp_delta - u32_delta;
+					error = (delta_delta / (double)expected_timestamp_delta) * 100.0;
+
+					igt_debug(" > ts=%u-%u timestamp delta from %2d to %2d = %-8u (error = %u%%, ctx_id = %x)\n",
+						  reports[i + 1].report[1], reports[i].report[1],
+						  i, i + 1, u32_delta, (unsigned)error,
+						  oa_report_get_ctx_id(reports[i + 1].report));
+				}
+				for (int r = 0; r < n_reads; r++) {
+					igt_debug(" > read\n");
+					for (int offset = 0;
+					     offset < reads[r].len;
+					     offset += header->size) {
+						int counter_print = 1;
+						uint64_t a0 = 0, aN = 0;
+						double local_period = 0;
+
+						header = (void *) &reads[r].buf[offset];
+
+						if (header->type != DRM_I915_PERF_RECORD_SAMPLE) {
+							igt_debug(" > loss\n");
+							continue;
+						}
+
+						rpt = (void *)(header + 1);
+
+						if (last) {
+							a0 = gen8_read_40bit_a_counter(rpt, test_oa_format, 0) -
+								gen8_read_40bit_a_counter(last, test_oa_format, 0);
+							aN = gen8_read_40bit_a_counter(rpt, test_oa_format, 13) -
+								gen8_read_40bit_a_counter(last, test_oa_format, 13);
+						}
+
+						if (last_periodic &&
+						    oa_report_is_periodic(exponent, rpt)) {
+							local_period =
+								((uint64_t) rpt[3] - last_periodic[3])  /
+								((uint64_t) rpt[1] - last_periodic[1]);
+						}
+
+						igt_debug(" > report ts=%u"
+							  " ts_delta_last=%8u ts_delta_last_periodic=%8u is_timer=%i ctx_id=%8x gpu_ticks=%u period=%.2f A0=%lu A%i=%lu\n",
+							  rpt[1],
+							  (last != NULL) ? (rpt[1] - last[1]) : 0,
+							  (last_periodic != NULL) ? (rpt[1] - last_periodic[1]) : 0,
+							  oa_report_is_periodic(exponent, rpt),
+							  oa_report_get_ctx_id(rpt),
+							  (last != NULL) ? (rpt[3] - last[3]) : 0,
+							  local_period,
+							  a0, counter_print, aN);
+
+						last = rpt;
+						if (oa_report_is_periodic(exponent, rpt))
+							last_periodic = rpt;
+					}
+				}
+
+				igt_assert(!"reached");
+			}
+
+			if (timestamp_delta_within(average_timestamp_delta,
+						   expected_timestamp_delta,
+						   expected_timestamp_delta * 0.05)) {
+				igt_debug(" > timestamp delta matching %"PRIu64"ns ~= expected %"PRIu64"! :)\n",
+					  timebase_scale(average_timestamp_delta),
+					  timebase_scale(expected_timestamp_delta));
+				n_time_delta_matches++;
+			} else {
+				igt_debug(" > timestamp delta mismatch: %"PRIu64"ns != expected %"PRIu64"ns\n",
+					  timebase_scale(average_timestamp_delta),
+					  timebase_scale(expected_timestamp_delta));
+				igt_assert(average_timestamp_delta <
+					   (expected_timestamp_delta * 2));
+			}
 
 			n_tested++;
+
+			for (int r = 0; r < n_reads; r++)
+				free(reads[r].buf);
 		}
 
 		if (n_tested < 10)
-			igt_debug("sysfs frequency pinning too unstable for cross-referencing with OA derived frequency");
+			igt_debug("Too many test iterations had to be skipped\n");
 		igt_assert_eq(n_tested, 10);
 
-		igt_debug("number of iterations with expected timestamp delta = %d\n",
+		igt_debug("Number of iterations with expected timestamp delta = %d\n",
 			  n_time_delta_matches);
 
 		/* The HW doesn't give us any strict guarantee that the
@@ -1704,21 +2153,12 @@ test_oa_exponents(int gt_freq_mhz)
 		 * so it a useful sanity check to assert quite strictly...
 		 */
 		igt_assert(n_time_delta_matches >= 9);
-
-		igt_debug("number of iterations with expected clock frequency = %d\n",
-			  n_freq_matches);
-
-		/* Don't assert the calculated frequency for extremely short
-		 * durations.
-		 *
-		 * Allow some mismatches since can't be can't be sure about
-		 * frequency changes between sysfs reads.
-		 */
-		if (i > 3)
-			igt_assert(n_freq_matches >= 7);
 	}
 
 	gt_frequency_range_restore();
+
+	load_helper_stop();
+	load_helper_deinit();
 }
 
 /* The OA exponent selects a timestamp counter bit to trigger reports on.
@@ -2585,32 +3025,6 @@ test_disabled_read_error(void)
 }
 
 static void
-emit_report_perf_count(struct intel_batchbuffer *batch,
-		       drm_intel_bo *dst_bo,
-		       int dst_offset,
-		       uint32_t report_id)
-{
-	if (IS_HASWELL(devid)) {
-		BEGIN_BATCH(3, 1);
-		OUT_BATCH(GEN6_MI_REPORT_PERF_COUNT);
-		OUT_RELOC(dst_bo, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
-			  dst_offset);
-		OUT_BATCH(report_id);
-		ADVANCE_BATCH();
-	} else {
-		/* XXX: NB: n dwords arg is actually magic since it internally
-		 * automatically accounts for larger addresses on gen >= 8...
-		 */
-		BEGIN_BATCH(3, 1);
-		OUT_BATCH(GEN8_MI_REPORT_PERF_COUNT);
-		OUT_RELOC(dst_bo, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
-			  dst_offset);
-		OUT_BATCH(report_id);
-		ADVANCE_BATCH();
-	}
-}
-
-static void
 test_mi_rpc(void)
 {
 	uint64_t properties[] = {
@@ -2680,38 +3094,6 @@ test_mi_rpc(void)
 }
 
 static void
-scratch_buf_memset(drm_intel_bo *bo, int width, int height, uint32_t color)
-{
-	int ret;
-
-	ret = drm_intel_bo_map(bo, true /* writable */);
-	igt_assert_eq(ret, 0);
-
-	for (int i = 0; i < width * height; i++)
-		((uint32_t *)bo->virtual)[i] = color;
-
-	drm_intel_bo_unmap(bo);
-}
-
-static void
-scratch_buf_init(drm_intel_bufmgr *bufmgr,
-		 struct igt_buf *buf,
-		 int width, int height,
-		 uint32_t color)
-{
-	size_t stride = width * 4;
-	size_t size = stride * height;
-	drm_intel_bo *bo = drm_intel_bo_alloc(bufmgr, "", size, 4096);
-
-	scratch_buf_memset(bo, width, height, color);
-
-	buf->bo = bo;
-	buf->stride = stride;
-	buf->tiling = I915_TILING_NONE;
-	buf->size = size;
-}
-
-static void
 emit_stall_timestamp_and_rpc(struct intel_batchbuffer *batch,
 			     drm_intel_bo *dst,
 			     int timestamp_offset,
@@ -3643,8 +4025,7 @@ igt_main
 	igt_subtest("low-oa-exponent-permissions")
 		test_low_oa_exponent_permissions();
 	igt_subtest("oa-exponents") {
-		test_oa_exponents(450);
-		test_oa_exponents(550);
+		test_oa_exponents();
 	}
 
 	igt_subtest("per-context-mode-unprivileged") {
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 26/29] igt/perf: make enable-disable more reliable
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (24 preceding siblings ...)
  2017-04-25 22:32 ` [PATCH i-g-t 25/29] igt/perf: rework oa-exponent test Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-04-25 22:32 ` [PATCH i-g-t 27/29] igt/perf: make buffer-fill " Lionel Landwerlin
                   ` (2 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

Estimation of the amount of reports can only refer to periodic ones,
as context switch reports completely depend on what happens on the
system. Also generate some load to prevent clock frequency changes to
impact our measurement.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 82 insertions(+), 6 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index 922c692d..8639a5a2 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -2794,10 +2794,17 @@ test_enable_disable(void)
 	int n_full_oa_reports = oa_buf_size / report_size;
 	uint64_t fill_duration = n_full_oa_reports * oa_period;
 
+	load_helper_init();
+	load_helper_run(HIGH);
+
 	stream_fd = __perf_open(drm_fd, &param);
 
 	for (int i = 0; i < 5; i++) {
 		int len;
+		uint32_t n_periodic_reports;
+		struct drm_i915_perf_record_header *header;
+		uint32_t first_timestamp = 0, last_timestamp = 0;
+		uint32_t last_periodic_report[64];
 
 		/* Giving enough time for an overflow might help catch whether
 		 * the OA unit has been enabled even if the driver might at
@@ -2817,18 +2824,84 @@ test_enable_disable(void)
 
 		nanosleep(&(struct timespec){ .tv_sec = 0,
 					      .tv_nsec = fill_duration / 2 },
-			  NULL);
+			NULL);
 
-		while ((len = read(stream_fd, buf, buf_size)) == -1 && errno == EINTR)
-			;
+		n_periodic_reports = 0;
 
-		igt_assert_neq(len, -1);
+		/* Because of the race condition between notification of new
+		 * reports and reports landing in memory, we need to rely on
+		 * timestamps to figure whether we've read enough of them.
+		 */
+		while (((last_timestamp - first_timestamp) * oa_exponent_to_ns(oa_exponent)) <
+		       (fill_duration / 2)) {
 
-		igt_assert(len > report_size * n_full_oa_reports * 0.45);
-		igt_assert(len < report_size * n_full_oa_reports * 0.55);
+			while ((len = read(stream_fd, buf, buf_size)) == -1 && errno == EINTR)
+				;
+
+			igt_assert_neq(len, -1);
+
+			for (int offset = 0; offset < len; offset += header->size) {
+				uint32_t *report;
+
+				header = (void *) (buf + offset);
+				report = (void *) (header + 1);
+
+				switch (header->type) {
+				case DRM_I915_PERF_RECORD_OA_REPORT_LOST:
+					break;
+				case DRM_I915_PERF_RECORD_SAMPLE:
+					if (first_timestamp == 0)
+						first_timestamp = report[1];
+					last_timestamp = report[1];
+
+					if (n_periodic_reports > 0 &&
+					    oa_report_is_periodic(oa_exponent, report)) {
+						if (oa_reports_have_clock_change(last_periodic_report,
+										 report))
+							igt_debug("clock change!\n");
+
+						igt_debug(" > report ts=%u"
+							  " ts_delta_last_periodic=%8u is_timer=%i ctx_id=%8x gpu_ticks=%u nb_periodic=%u\n",
+							  report[1],
+							  report[1] - last_periodic_report[1],
+							  oa_report_is_periodic(oa_exponent, report),
+							  oa_report_get_ctx_id(report),
+							  report[3] - last_periodic_report[3],
+							  n_periodic_reports);
+
+						memcpy(last_periodic_report, report,
+						       sizeof(last_periodic_report));
+					}
+
+					/* We want to measure only the periodic
+					 * reports, ctx-switch might inflate the
+					 * content of the buffer and skew or
+					 * measurement.
+					 */
+					n_periodic_reports +=
+						oa_report_is_periodic(oa_exponent, report);
+					break;
+				case DRM_I915_PERF_RECORD_OA_BUFFER_LOST:
+					igt_assert(!"unexpected overflow");
+					break;
+				}
+			}
+
+		}
 
 		do_ioctl(stream_fd, I915_PERF_IOCTL_DISABLE, 0);
 
+		igt_debug("%f < %lu < %f\n",
+			  report_size * n_full_oa_reports * 0.45,
+			  n_periodic_reports * report_size,
+			  report_size * n_full_oa_reports * 0.55);
+
+		igt_assert((n_periodic_reports * report_size) >
+			   (report_size * n_full_oa_reports * 0.45));
+		igt_assert((n_periodic_reports * report_size) <
+			   report_size * n_full_oa_reports * 0.55);
+
+
 		/* It's considered an error to read a stream while it's disabled
 		 * since it would block indefinitely...
 		 */
@@ -2841,6 +2914,9 @@ test_enable_disable(void)
 	free(buf);
 
 	__perf_close(stream_fd);
+
+	load_helper_stop();
+	load_helper_deinit();
 }
 
 static void
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 27/29] igt/perf: make buffer-fill more reliable
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (25 preceding siblings ...)
  2017-04-25 22:32 ` [PATCH i-g-t 26/29] igt/perf: make enable-disable more reliable Lionel Landwerlin
@ 2017-04-25 22:32 ` Lionel Landwerlin
  2017-04-25 22:33 ` [PATCH i-g-t 28/29] igt/perf: load gt_boost_freq_mhz as max gt frequency Lionel Landwerlin
  2017-04-25 22:33 ` [PATCH i-g-t 29/29] igt/perf: remove unused frequency functions Lionel Landwerlin
  28 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:32 UTC (permalink / raw)
  To: intel-gfx

Filling rate of the buffer must discard context switch reports as they
do not depend upon the periodicity, instead they're a factor on the
amount of different applications concurrently running on the system.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 113 ++++++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 96 insertions(+), 17 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index 8639a5a2..6026811b 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -2702,22 +2702,29 @@ test_buffer_fill(void)
 		.num_properties = sizeof(properties) / 16,
 		.properties_ptr = to_user_pointer(properties),
 	};
+	struct drm_i915_perf_record_header *header;
 	int buf_size = 65536 * (256 + sizeof(struct drm_i915_perf_record_header));
 	uint8_t *buf = malloc(buf_size);
+	int len;
 	size_t oa_buf_size = 16 * 1024 * 1024;
 	size_t report_size = oa_formats[test_oa_format].size;
 	int n_full_oa_reports = oa_buf_size / report_size;
 	uint64_t fill_duration = n_full_oa_reports * oa_period;
 
+	load_helper_init();
+	load_helper_run(HIGH);
+
 	igt_assert(fill_duration < 1000000000);
 
 	stream_fd = __perf_open(drm_fd, &param);
 
 	for (int i = 0; i < 5; i++) {
-		struct drm_i915_perf_record_header *header;
 		bool overflow_seen;
-		int offset = 0;
-		int len;
+		uint32_t n_periodic_reports;
+		uint32_t first_timestamp = 0, last_timestamp = 0;
+		uint32_t last_periodic_report[64];
+
+		do_ioctl(stream_fd, I915_PERF_IOCTL_ENABLE, 0);
 
 		nanosleep(&(struct timespec){ .tv_sec = 0,
 					      .tv_nsec = fill_duration * 1.25 },
@@ -2729,7 +2736,7 @@ test_buffer_fill(void)
 		igt_assert_neq(len, -1);
 
 		overflow_seen = false;
-		for (offset = 0; offset < len; offset += header->size) {
+		for (int offset = 0; offset < len; offset += header->size) {
 			header = (void *)(buf + offset);
 
 			if (header->type == DRM_I915_PERF_RECORD_OA_BUFFER_LOST)
@@ -2738,32 +2745,104 @@ test_buffer_fill(void)
 
 		igt_assert_eq(overflow_seen, true);
 
+		do_ioctl(stream_fd, I915_PERF_IOCTL_DISABLE, 0);
+
+		igt_debug("fill_duration = %luns, oa_exponent = %u\n",
+			  fill_duration, oa_exponent);
+
+		do_ioctl(stream_fd, I915_PERF_IOCTL_ENABLE, 0);
+
 		nanosleep(&(struct timespec){ .tv_sec = 0,
-					      .tv_nsec = fill_duration / 2 },
-			  NULL);
+					.tv_nsec = fill_duration / 2 },
+			NULL);
 
-		while ((len = read(stream_fd, buf, buf_size)) == -1 && errno == EINTR)
-			;
+		n_periodic_reports = 0;
 
-		igt_assert_neq(len, -1);
+		/* Because of the race condition between notification of new
+		 * reports and reports landing in memory, we need to rely on
+		 * timestamps to figure whether we've read enough of them.
+		 */
+		while (((last_timestamp - first_timestamp) * oa_exponent_to_ns(oa_exponent)) <
+		       (fill_duration / 2)) {
 
-		igt_assert(len > report_size * n_full_oa_reports * 0.45);
-		igt_assert(len < report_size * n_full_oa_reports * 0.55);
+			igt_debug("dts=%u elapsed=%lu duration=%lu\n",
+				  last_timestamp - first_timestamp,
+				  (last_timestamp - first_timestamp) * oa_exponent_to_ns(oa_exponent),
+				  fill_duration / 2);
 
-		overflow_seen = false;
-		for (offset = 0; offset < len; offset += header->size) {
-			header = (void *)(buf + offset);
+			while ((len = read(stream_fd, buf, buf_size)) == -1 && errno == EINTR)
+				;
 
-			if (header->type == DRM_I915_PERF_RECORD_OA_BUFFER_LOST)
-				overflow_seen = true;
+			igt_assert_neq(len, -1);
+
+			for (int offset = 0; offset < len; offset += header->size) {
+				uint32_t *report;
+
+				header = (void *) (buf + offset);
+				report = (void *) (header + 1);
+
+				switch (header->type) {
+				case DRM_I915_PERF_RECORD_OA_REPORT_LOST:
+					igt_debug("report loss, trying again\n");
+					break;
+				case DRM_I915_PERF_RECORD_SAMPLE:
+					igt_debug(" > report ts=%u"
+						  " ts_delta_last_periodic=%8u is_timer=%i ctx_id=%8x gpu_ticks=%u nb_periodic=%u\n",
+						  report[1],
+						  n_periodic_reports > 0 ? report[1] - last_periodic_report[1] : 0,
+						  oa_report_is_periodic(oa_exponent, report),
+						  oa_report_get_ctx_id(report),
+						  n_periodic_reports > 0 ? report[3] - last_periodic_report[3] : 0,
+						  n_periodic_reports);
+
+					if (first_timestamp == 0)
+						first_timestamp = report[1];
+					last_timestamp = report[1];
+
+					if (n_periodic_reports > 0 &&
+					    oa_report_is_periodic(oa_exponent, report)) {
+						if (oa_reports_have_clock_change(last_periodic_report,
+										 report))
+							igt_debug("clock change!\n");
+
+						memcpy(last_periodic_report, report,
+						       sizeof(last_periodic_report));
+					}
+
+					/* We want to measure only the periodic
+					 * reports, ctx-switch might inflate the
+					 * content of the buffer and skew or
+					 * measurement.
+					 */
+					n_periodic_reports +=
+						oa_report_is_periodic(oa_exponent, report);
+					break;
+				case DRM_I915_PERF_RECORD_OA_BUFFER_LOST:
+					igt_assert(!"unexpected overflow");
+					break;
+				}
+			}
 		}
 
-		igt_assert_eq(overflow_seen, false);
+		do_ioctl(stream_fd, I915_PERF_IOCTL_DISABLE, 0);
+
+		igt_debug("%f < %lu < %f\n",
+			  report_size * n_full_oa_reports * 0.45,
+			  n_periodic_reports * report_size,
+			  report_size * n_full_oa_reports * 0.55);
+
+		igt_assert(n_periodic_reports * report_size >
+			   report_size * n_full_oa_reports * 0.45);
+		igt_assert(n_periodic_reports * report_size <
+			   report_size * n_full_oa_reports * 0.55);
 	}
 
 	free(buf);
 
 	__perf_close(stream_fd);
+
+	load_helper_stop();
+	load_helper_deinit();
 }
 
 static void
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 28/29] igt/perf: load gt_boost_freq_mhz as max gt frequency
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (26 preceding siblings ...)
  2017-04-25 22:32 ` [PATCH i-g-t 27/29] igt/perf: make buffer-fill " Lionel Landwerlin
@ 2017-04-25 22:33 ` Lionel Landwerlin
  2017-04-25 22:33 ` [PATCH i-g-t 29/29] igt/perf: remove unused frequency functions Lionel Landwerlin
  28 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:33 UTC (permalink / raw)
  To: intel-gfx

We want the absolute max the hardware can do, not the max value
set by a previous application/user.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/perf.c b/tests/perf.c
index 6026811b..3d033b3a 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -1107,7 +1107,7 @@ static void
 gt_frequency_range_save(void)
 {
 	gt_min_freq_mhz_saved = sysfs_read("gt_min_freq_mhz");
-	gt_max_freq_mhz_saved = sysfs_read("gt_max_freq_mhz");
+	gt_max_freq_mhz_saved = sysfs_read("gt_boost_freq_mhz");
 
 	gt_min_freq_mhz = gt_min_freq_mhz_saved;
 	gt_max_freq_mhz = gt_max_freq_mhz_saved;
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH i-g-t 29/29] igt/perf: remove unused frequency functions
  2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
                   ` (27 preceding siblings ...)
  2017-04-25 22:33 ` [PATCH i-g-t 28/29] igt/perf: load gt_boost_freq_mhz as max gt frequency Lionel Landwerlin
@ 2017-04-25 22:33 ` Lionel Landwerlin
  28 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-04-25 22:33 UTC (permalink / raw)
  To: intel-gfx

Now that we've found that frequency changes happen mostly outside of
our control and don't seem to be following our requests through sysfs,
let's drop a bunch code/variables.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 tests/perf.c | 83 +++---------------------------------------------------------
 1 file changed, 3 insertions(+), 80 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index 3d033b3a..7088d723 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -259,12 +259,9 @@ static int card = -1;
 static int n_eus;
 
 static uint64_t test_metric_set_id = UINT64_MAX;
-static uint64_t gt_min_freq_mhz_saved = 0;
-static uint64_t gt_max_freq_mhz_saved = 0;
-static uint64_t gt_min_freq_mhz = 0;
-static uint64_t gt_max_freq_mhz = 0;
 
 static uint64_t timestamp_frequency = 12500000;
+static uint64_t gt_max_freq_mhz = 0;
 static enum drm_i915_oa_format test_oa_format;
 static bool *undefined_a_counters;
 static uint64_t oa_exp_1_millisec;
@@ -377,16 +374,6 @@ sysfs_read(const char *file)
 	return read_u64_file(buf);
 }
 
-static void
-sysfs_write(const char *file, uint64_t val)
-{
-	char buf[512];
-
-	snprintf(buf, sizeof(buf), "/sys/class/drm/card%d/%s", card, file);
-
-	write_u64_file(buf, val);
-}
-
 static char *
 read_debugfs_record(int device, const char *file, const char *key)
 {
@@ -1103,66 +1090,6 @@ init_sys_info(void)
 	return try_read_u64_file(buf, &test_metric_set_id);
 }
 
-static void
-gt_frequency_range_save(void)
-{
-	gt_min_freq_mhz_saved = sysfs_read("gt_min_freq_mhz");
-	gt_max_freq_mhz_saved = sysfs_read("gt_boost_freq_mhz");
-
-	gt_min_freq_mhz = gt_min_freq_mhz_saved;
-	gt_max_freq_mhz = gt_max_freq_mhz_saved;
-}
-
-static void wait_freq_settle(void)
-{
-	struct timespec ts;
-
-	/* FIXME: Lazy sleep without check. */
-	ts.tv_sec = 0;
-	ts.tv_nsec = 20000;
-	clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, NULL);
-}
-
-static void
-gt_frequency_pin(int gt_freq_mhz)
-{
-	igt_debug("requesting pinned GT freq = %dmhz\n", gt_freq_mhz);
-
-	if (gt_freq_mhz > gt_max_freq_mhz) {
-		sysfs_write("gt_max_freq_mhz", gt_freq_mhz);
-		sysfs_write("gt_min_freq_mhz", gt_freq_mhz);
-	} else {
-		sysfs_write("gt_min_freq_mhz", gt_freq_mhz);
-		sysfs_write("gt_max_freq_mhz", gt_freq_mhz);
-	}
-	gt_min_freq_mhz = gt_freq_mhz;
-	gt_max_freq_mhz = gt_freq_mhz;
-
-	wait_freq_settle();
-}
-
-static void
-gt_frequency_range_restore(void)
-{
-	igt_debug("restoring GT frequency range: min = %dmhz, max =%dmhz, current: min=%dmhz, max=%dmhz\n",
-		  (int)gt_min_freq_mhz_saved,
-		  (int)gt_max_freq_mhz_saved,
-		  (int)gt_min_freq_mhz,
-		  (int)gt_max_freq_mhz);
-
-	/* Assume current min/max are the same */
-	if (gt_min_freq_mhz_saved > gt_max_freq_mhz) {
-		sysfs_write("gt_max_freq_mhz", gt_max_freq_mhz_saved);
-		sysfs_write("gt_min_freq_mhz", gt_min_freq_mhz_saved);
-	} else {
-		sysfs_write("gt_min_freq_mhz", gt_min_freq_mhz_saved);
-		sysfs_write("gt_max_freq_mhz", gt_max_freq_mhz_saved);
-	}
-
-	gt_min_freq_mhz = gt_min_freq_mhz_saved;
-	gt_max_freq_mhz = gt_max_freq_mhz_saved;
-}
-
 static int
 i915_read_reports_until_timestamp(enum drm_i915_oa_format oa_format,
 				  uint8_t *buf,
@@ -2155,8 +2082,6 @@ test_oa_exponents(void)
 		igt_assert(n_time_delta_matches >= 9);
 	}
 
-	gt_frequency_range_restore();
-
 	load_helper_stop();
 	load_helper_deinit();
 }
@@ -4148,11 +4073,11 @@ igt_main
 
 		igt_require(init_sys_info());
 
-		gt_frequency_range_save();
-
 		write_u64_file("/proc/sys/dev/i915/perf_stream_paranoid", 1);
 		write_u64_file("/proc/sys/dev/i915/oa_max_sample_rate", 100000);
 
+		gt_max_freq_mhz = sysfs_read("gt_boost_freq_mhz");
+
 		render_copy = igt_get_render_copyfunc(devid);
 		igt_require_f(render_copy, "no render-copy function\n");
 	}
@@ -4235,8 +4160,6 @@ igt_main
 		write_u64_file("/proc/sys/dev/i915/oa_max_sample_rate", 100000);
 		write_u64_file("/proc/sys/dev/i915/perf_stream_paranoid", 1);
 
-		gt_frequency_range_restore();
-
 		close(drm_fd);
 	}
 }
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH i-g-t 02/29] igt/perf: improve robustness of polling/blocking tests
  2017-04-25 22:32 ` [PATCH i-g-t 02/29] igt/perf: improve robustness of polling/blocking tests Lionel Landwerlin
@ 2017-06-16 13:48   ` Matthew Auld
  0 siblings, 0 replies; 36+ messages in thread
From: Matthew Auld @ 2017-06-16 13:48 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: Intel Graphics Development

On 25 April 2017 at 23:32, Lionel Landwerlin
<lionel.g.landwerlin@intel.com> wrote:
> From: Robert Bragg <robert@sixbynine.org>
>
> There were a couple of problems with both of these tests that could lead
> to false negatives addressed by this patch.
>
> 1) The upper limit for the number of iterations missed a +1 to consider
>    that there might be a sample immediately available at the start of the
>    loop.
>
> 2) The tests didn't consider that a duration measured in terms of
>    (end-start) ticks could be +- 1 tick since we don't know the
>    fractional part of the tick counts. Our threshold for stime being <
>    one tick could have a false negative for any real stime between 1 to
>    10 milliseconds depending on luck.
>
> The tests now both run for a lot longer (1000 x tick duration, or
> typically 10 seconds each) so that a single tick represents a much
> smaller proportion of the total duration (0.1%) and the stime thresholds
> are now set at 1% of the total duration.
>
> Signed-off-by: Robert Bragg <robert@sixbynine.org>
I did r-b this in the past, so:

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH i-g-t 03/29] igt/perf: init timestamp freq and oa format per devid
  2017-04-25 22:32 ` [PATCH i-g-t 03/29] igt/perf: init timestamp freq and oa format per devid Lionel Landwerlin
@ 2017-06-16 14:37   ` Matthew Auld
  0 siblings, 0 replies; 36+ messages in thread
From: Matthew Auld @ 2017-06-16 14:37 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: Intel Graphics Development

On 25 April 2017 at 23:32, Lionel Landwerlin
<lionel.g.landwerlin@intel.com> wrote:
> From: Robert Bragg <robert@sixbynine.org>
>
> Signed-off-by: Robert Bragg <robert@sixbynine.org>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH i-g-t 20/29] igt/perf: add utility function for checking periodic reports
  2017-04-25 22:32 ` [PATCH i-g-t 20/29] igt/perf: add utility function for checking periodic reports Lionel Landwerlin
@ 2017-06-16 14:41   ` Matthew Auld
  0 siblings, 0 replies; 36+ messages in thread
From: Matthew Auld @ 2017-06-16 14:41 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: Intel Graphics Development

On 25 April 2017 at 23:32, Lionel Landwerlin
<lionel.g.landwerlin@intel.com> wrote:
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> ---
>  tests/perf.c | 55 +++++++++++++++++++++++++++++--------------------------
>  1 file changed, 29 insertions(+), 26 deletions(-)
>
> diff --git a/tests/perf.c b/tests/perf.c
> index d057d943..f8ac06c3 100644
> --- a/tests/perf.c
> +++ b/tests/perf.c
> @@ -450,6 +450,29 @@ gen8_read_report_reason(const uint32_t *report)
>                 return "unknown";
>  }
>
> +static bool
> +oa_report_is_periodic(uint32_t oa_exponent, const uint32_t *report)
> +{
> +
spurious newline.

> +       if (IS_HASWELL(devid)) {
> +               /* For Haswell we don't have a documented report reason field
> +                * (though empirically report[0] bit 10 does seem to correlate
> +                * with a timer trigger reason) so we instead infer which
> +                * reports are timer triggered by checking if the least
> +                * significant bits are zero and the exponent bit is set.
> +                */
> +               uint32_t oa_exponent_mask = (1 << (oa_exponent + 1)) - 1;
newline.

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH i-g-t 21/29] igt/perf: make stream_fd a global variable
  2017-04-25 22:32 ` [PATCH i-g-t 21/29] igt/perf: make stream_fd a global variable Lionel Landwerlin
@ 2017-06-16 14:43   ` Matthew Auld
  2017-06-16 14:46     ` Lionel Landwerlin
  2017-06-21 13:47   ` Matthew Auld
  1 sibling, 1 reply; 36+ messages in thread
From: Matthew Auld @ 2017-06-16 14:43 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: Intel Graphics Development

On 25 April 2017 at 23:32, Lionel Landwerlin
<lionel.g.landwerlin@intel.com> wrote:
> When debugging unstable tests on new platforms we currently we don't
we don't currently

> cleanup everything well in between different tests. Since only a
> single OA stream fd can be opened at a time, having the stream_fd as a
> global variable helps us cleanup the state between tests.
>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
So just improve the tests such that they do the necessary cleanup?

But if you feel this is better, then:

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH i-g-t 21/29] igt/perf: make stream_fd a global variable
  2017-06-16 14:43   ` Matthew Auld
@ 2017-06-16 14:46     ` Lionel Landwerlin
  0 siblings, 0 replies; 36+ messages in thread
From: Lionel Landwerlin @ 2017-06-16 14:46 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Intel Graphics Development

On 16/06/17 15:43, Matthew Auld wrote:
> On 25 April 2017 at 23:32, Lionel Landwerlin
> <lionel.g.landwerlin@intel.com> wrote:
>> When debugging unstable tests on new platforms we currently we don't
> we don't currently
>
>> cleanup everything well in between different tests. Since only a
>> single OA stream fd can be opened at a time, having the stream_fd as a
>> global variable helps us cleanup the state between tests.
>>
>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> So just improve the tests such that they do the necessary cleanup?

That prevents every single test to implement clean logic. That logic is 
abstracted in the open() close instead which is used by most tests.

>
> But if you feel this is better, then:
>
> Reviewed-by: Matthew Auld <matthew.auld@intel.com>
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH i-g-t 21/29] igt/perf: make stream_fd a global variable
  2017-04-25 22:32 ` [PATCH i-g-t 21/29] igt/perf: make stream_fd a global variable Lionel Landwerlin
  2017-06-16 14:43   ` Matthew Auld
@ 2017-06-21 13:47   ` Matthew Auld
  1 sibling, 0 replies; 36+ messages in thread
From: Matthew Auld @ 2017-06-21 13:47 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: Intel Graphics Development

On 25 April 2017 at 23:32, Lionel Landwerlin
<lionel.g.landwerlin@intel.com> wrote:
> When debugging unstable tests on new platforms we currently we don't
> cleanup everything well in between different tests. Since only a
> single OA stream fd can be opened at a time, having the stream_fd as a
> global variable helps us cleanup the state between tests.
>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> ---
>  tests/perf.c | 108 ++++++++++++++++++++++++++++++++---------------------------
>  1 file changed, 58 insertions(+), 50 deletions(-)
>
> diff --git a/tests/perf.c b/tests/perf.c
> index f8ac06c3..b7af1c3b 100644
> --- a/tests/perf.c
> +++ b/tests/perf.c
> @@ -243,6 +243,7 @@ static bool hsw_undefined_a_counters[45] = {
>  static bool gen8_undefined_a_counters[45];
>
>  static int drm_fd = -1;
> +static int stream_fd = -1;
>  static uint32_t devid;
>  static int card = -1;
>  static int n_eus;
> @@ -264,10 +265,22 @@ static uint32_t (*read_report_ticks)(uint32_t *report,
>  static void (*sanity_check_reports)(uint32_t *oa_report0, uint32_t *oa_report1,
>                                     enum drm_i915_oa_format format);
>
> +static void
> +__perf_close(int fd)
> +{
> +       close(fd);
> +       stream_fd = -1;
> +}
> +
>  static int
>  __perf_open(int fd, struct drm_i915_perf_open_param *param)
>  {
> -       int ret = igt_ioctl(fd, DRM_IOCTL_I915_PERF_OPEN, param);
> +       int ret;
> +
> +       if (stream_fd >= 0)
> +               __perf_close(stream_fd);
> +
> +       ret = igt_ioctl(fd, DRM_IOCTL_I915_PERF_OPEN, param);
>
>         igt_assert(ret >= 0);
>         errno = 0;
> @@ -918,14 +931,12 @@ test_system_wide_paranoid(void)
>                         .num_properties = sizeof(properties) / 16,
>                         .properties_ptr = to_user_pointer(properties),
>                 };
> -               int stream_fd;
> -
>                 write_u64_file("/proc/sys/dev/i915/perf_stream_paranoid", 0);
>
>                 igt_drop_root();
>
>                 stream_fd = __perf_open(drm_fd, &param);
> -               close(stream_fd);
> +               __perf_close(stream_fd);
>         }
>
>         igt_waitchildren();
> @@ -973,7 +984,6 @@ test_invalid_oa_metric_set_id(void)
>                 .num_properties = sizeof(properties) / 16,
>                 .properties_ptr = to_user_pointer(properties),
>         };
> -       int stream_fd;
>
>         do_ioctl_err(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param, EINVAL);
>
> @@ -983,7 +993,7 @@ test_invalid_oa_metric_set_id(void)
>         /* Check that we aren't just seeing false positives... */
>         properties[ARRAY_SIZE(properties) - 1] = test_metric_set_id;
>         stream_fd = __perf_open(drm_fd, &param);
> -       close(stream_fd);
> +       __perf_close(stream_fd);
>
>         /* There's no valid default OA metric set ID... */
>         param.num_properties--;
> @@ -1008,7 +1018,6 @@ test_invalid_oa_format_id(void)
>                 .num_properties = sizeof(properties) / 16,
>                 .properties_ptr = to_user_pointer(properties),
>         };
> -       int stream_fd;
>
>         do_ioctl_err(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param, EINVAL);
>
> @@ -1018,7 +1027,7 @@ test_invalid_oa_format_id(void)
>         /* Check that we aren't just seeing false positives... */
>         properties[ARRAY_SIZE(properties) - 1] = test_oa_format;
>         stream_fd = __perf_open(drm_fd, &param);
> -       close(stream_fd);
> +       __perf_close(stream_fd);
>
>         /* There's no valid default OA format... */
>         param.num_properties--;
> @@ -1046,8 +1055,7 @@ test_missing_sample_flags(void)
>  }
>
>  static void
> -read_2_oa_reports(int stream_fd,
> -                 int format_id,
> +read_2_oa_reports(int format_id,
>                   int exponent,
>                   uint32_t *oa_report0,
>                   uint32_t *oa_report1,
> @@ -1181,12 +1189,13 @@ open_and_read_2_oa_reports(int format_id,
>                 .num_properties = sizeof(properties) / 16,
>                 .properties_ptr = to_user_pointer(properties),
>         };
> -       int stream_fd = __perf_open(drm_fd, &param);
>
> -       read_2_oa_reports(stream_fd, format_id, exponent,
> +       stream_fd = __perf_open(drm_fd, &param);
> +
> +       read_2_oa_reports(format_id, exponent,
>                           oa_report0, oa_report1, timer_only);
>
> -       close(stream_fd);
> +       __perf_close(stream_fd);
>  }
>
>  static void
> @@ -1486,9 +1495,10 @@ test_invalid_oa_exponent(void)
>                 .num_properties = sizeof(properties) / 16,
>                 .properties_ptr = to_user_pointer(properties),
>         };
> -       int stream_fd = __perf_open(drm_fd, &param);
>
> -       close(stream_fd);
> +       stream_fd = __perf_open(drm_fd, &param);
> +
> +       __perf_close(stream_fd);
>
>         for (int i = 32; i < 65; i++) {
>                 properties[7] = i;
> @@ -1538,12 +1548,10 @@ test_low_oa_exponent_permissions(void)
>         properties[7] = ok_exponent;
>
>         igt_fork(child, 1) {
> -               int stream_fd;
> -
>                 igt_drop_root();
>
>                 stream_fd = __perf_open(drm_fd, &param);
> -               close(stream_fd);
> +               __perf_close(stream_fd);
>         }
>
>         igt_waitchildren();
> @@ -1592,7 +1600,6 @@ test_per_context_mode_unprivileged(void)
>         igt_fork(child, 1) {
>                 drm_intel_context *context;
>                 drm_intel_bufmgr *bufmgr;
> -               int stream_fd;
>                 uint32_t ctx_id = 0xffffffff; /* invalid id */
>                 int ret;
>
> @@ -1610,7 +1617,7 @@ test_per_context_mode_unprivileged(void)
>                 properties[1] = ctx_id;
>
>                 stream_fd = __perf_open(drm_fd, &param);
> -               close(stream_fd);
> +               __perf_close(stream_fd);
>
>                 drm_intel_gem_context_destroy(context);
>                 drm_intel_bufmgr_destroy(bufmgr);
> @@ -1673,7 +1680,6 @@ test_blocking(void)
>                 .num_properties = sizeof(properties) / 16,
>                 .properties_ptr = to_user_pointer(properties),
>         };
> -       int stream_fd = __perf_open(drm_fd, &param);
>         uint8_t buf[1024 * 1024];
>         struct tms start_times;
>         struct tms end_times;
> @@ -1698,6 +1704,8 @@ test_blocking(void)
>         int64_t start;
>         int n = 0;
>
> +       stream_fd = __perf_open(drm_fd, &param);
> +
>         times(&start_times);
>
>         igt_debug("tick length = %dns, test duration = %"PRIu64"ns, min iter. = %d, max iter. = %d\n",
> @@ -1795,7 +1803,7 @@ test_blocking(void)
>
>         igt_assert(kernel_ns <= (test_duration_ns / 100ull));
>
> -       close(stream_fd);
> +       __perf_close(stream_fd);
>  }
>
>  static void
> @@ -1824,7 +1832,6 @@ test_polling(void)
>                 .num_properties = sizeof(properties) / 16,
>                 .properties_ptr = to_user_pointer(properties),
>         };
> -       int stream_fd = __perf_open(drm_fd, &param);
>         uint8_t buf[1024 * 1024];
>         struct tms start_times;
>         struct tms end_times;
> @@ -1848,6 +1855,8 @@ test_polling(void)
>         int64_t start;
>         int n = 0;
>
> +       stream_fd = __perf_open(drm_fd, &param);
> +
>         times(&start_times);
>
>         igt_debug("tick length = %dns, test duration = %"PRIu64"ns, min iter. = %d, max iter. = %d\n",
> @@ -1976,7 +1985,7 @@ test_polling(void)
>
>         igt_assert(kernel_ns <= (test_duration_ns / 100ull));
>
> -       close(stream_fd);
> +       __perf_close(stream_fd);
>  }
>
>  static void
> @@ -1999,7 +2008,6 @@ test_buffer_fill(void)
>                 .num_properties = sizeof(properties) / 16,
>                 .properties_ptr = to_user_pointer(properties),
>         };
> -       int stream_fd = __perf_open(drm_fd, &param);
>         int buf_size = 65536 * (256 + sizeof(struct drm_i915_perf_record_header));
>         uint8_t *buf = malloc(buf_size);
>         size_t oa_buf_size = 16 * 1024 * 1024;
> @@ -2009,6 +2017,8 @@ test_buffer_fill(void)
>
>         igt_assert(fill_duration < 1000000000);
>
> +       stream_fd = __perf_open(drm_fd, &param);
> +
>         for (int i = 0; i < 5; i++) {
>                 struct drm_i915_perf_record_header *header;
>                 bool overflow_seen;
> @@ -2059,7 +2069,7 @@ test_buffer_fill(void)
>
>         free(buf);
>
> -       close(stream_fd);
> +       __perf_close(stream_fd);
>  }
>
>  static void
> @@ -2083,7 +2093,6 @@ test_enable_disable(void)
>                 .num_properties = sizeof(properties) / 16,
>                 .properties_ptr = to_user_pointer(properties),
>         };
> -       int stream_fd = __perf_open(drm_fd, &param);
>         int buf_size = 65536 * (256 + sizeof(struct drm_i915_perf_record_header));
>         uint8_t *buf = malloc(buf_size);
>         size_t oa_buf_size = 16 * 1024 * 1024;
> @@ -2091,6 +2100,7 @@ test_enable_disable(void)
>         int n_full_oa_reports = oa_buf_size / report_size;
>         uint64_t fill_duration = n_full_oa_reports * oa_period;
>
> +       stream_fd = __perf_open(drm_fd, &param);
>
>         for (int i = 0; i < 5; i++) {
>                 int len;
> @@ -2136,7 +2146,7 @@ test_enable_disable(void)
>
>         free(buf);
>
> -       close(stream_fd);
> +       __perf_close(stream_fd);
>  }
>
>  static void
> @@ -2163,7 +2173,6 @@ test_short_reads(void)
>         uint8_t *pages = mmap(NULL, page_size * 2,
>                               PROT_READ|PROT_WRITE, MAP_PRIVATE, zero_fd, 0);
>         struct drm_i915_perf_record_header *header;
> -       int stream_fd;
>         int ret;
>
>         igt_assert_neq(zero_fd, -1);
> @@ -2220,7 +2229,7 @@ test_short_reads(void)
>         igt_assert_eq(ret, -1);
>         igt_assert_eq(errno, ENOSPC);
>
> -       close(stream_fd);
> +       __perf_close(stream_fd);
>
>         munmap(pages, page_size * 2);
>  }
> @@ -2245,14 +2254,16 @@ test_non_sampling_read_error(void)
>                 .num_properties = sizeof(properties) / 16,
>                 .properties_ptr = to_user_pointer(properties),
>         };
> -       int stream_fd = __perf_open(drm_fd, &param);
> +       int ret;
>         uint8_t buf[1024];
>
> -       int ret = read(stream_fd, buf, sizeof(buf));
> +       stream_fd = __perf_open(drm_fd, &param);
> +
> +       ret = read(stream_fd, buf, sizeof(buf));
>         igt_assert_eq(ret, -1);
>         igt_assert_eq(errno, EIO);
>
> -       close(stream_fd);
> +       __perf_close(stream_fd);
>  }
>
>  /* Check that attempts to read from a stream while it is disable will return
> @@ -2279,25 +2290,24 @@ test_disabled_read_error(void)
>                 .num_properties = sizeof(properties) / 16,
>                 .properties_ptr = to_user_pointer(properties),
>         };
> -       int stream_fd = __perf_open(drm_fd, &param);
>         uint32_t oa_report0[64];
>         uint32_t oa_report1[64];
>         uint32_t buf[128] = { 0 };
>         int ret;
>
> +       stream_fd = __perf_open(drm_fd, &param);
>
>         ret = read(stream_fd, buf, sizeof(buf));
>         igt_assert_eq(ret, -1);
>         igt_assert_eq(errno, EIO);
>
> -       close(stream_fd);
> +       __perf_close(stream_fd);
>
>
>         param.flags &= ~I915_PERF_FLAG_DISABLED;
>         stream_fd = __perf_open(drm_fd, &param);
>
> -       read_2_oa_reports(stream_fd,
> -                         test_oa_format,
> +       read_2_oa_reports(test_oa_format,
>                           oa_exponent,
>                           oa_report0,
>                           oa_report1,
> @@ -2311,14 +2321,13 @@ test_disabled_read_error(void)
>
>         do_ioctl(stream_fd, I915_PERF_IOCTL_ENABLE, 0);
>
> -       read_2_oa_reports(stream_fd,
> -                         test_oa_format,
> +       read_2_oa_reports(test_oa_format,
>                           oa_exponent,
>                           oa_report0,
>                           oa_report1,
>                           false); /* not just timer reports */
>
> -       close(stream_fd);
> +       __perf_close(stream_fd);
>  }
>
>  static void
> @@ -2367,7 +2376,6 @@ test_mi_rpc(void)
>                 .num_properties = sizeof(properties) / 16,
>                 .properties_ptr = to_user_pointer(properties),
>         };
> -       int stream_fd = __perf_open(drm_fd, &param);
>         drm_intel_bufmgr *bufmgr = drm_intel_bufmgr_gem_init(drm_fd, 4096);
>         drm_intel_context *context;
>         struct intel_batchbuffer *batch;
> @@ -2375,6 +2383,8 @@ test_mi_rpc(void)
>         uint32_t *report32;
>         int ret;
>
> +       stream_fd = __perf_open(drm_fd, &param);
> +
>         drm_intel_bufmgr_gem_enable_reuse(bufmgr);
>
>         context = drm_intel_gem_context_create(bufmgr);
> @@ -2412,7 +2422,7 @@ test_mi_rpc(void)
>         intel_batchbuffer_free(batch);
>         drm_intel_gem_context_destroy(context);
>         drm_intel_bufmgr_destroy(bufmgr);
> -       close(stream_fd);
> +       __perf_close(stream_fd);
>  }
>
>  static void
> @@ -2503,7 +2513,6 @@ hsw_test_single_ctx_counters(void)
>         igt_fork(child, 1) {
>                 drm_intel_bufmgr *bufmgr;
>                 drm_intel_context *context0, *context1;
> -               int stream_fd;
>                 struct intel_batchbuffer *batch;
>                 struct igt_buf src, dst;
>                 drm_intel_bo *bo;
> @@ -2682,7 +2691,7 @@ hsw_test_single_ctx_counters(void)
>                 drm_intel_gem_context_destroy(context0);
>                 drm_intel_gem_context_destroy(context1);
>                 drm_intel_bufmgr_destroy(bufmgr);
> -               close(stream_fd);
> +               __perf_close(stream_fd);
>         }
>
>         igt_waitchildren();
> @@ -2705,11 +2714,12 @@ test_rc6_disable(void)
>                 .num_properties = sizeof(properties) / 16,
>                 .properties_ptr = to_user_pointer(properties),
>         };
> -       int stream_fd = __perf_open(drm_fd, &param);
>         uint64_t n_events_start = read_debugfs_u64_record(drm_fd, "i915_drpc_info",
>                                                           "RC6 residency since boot");
>         uint64_t n_events_end;
>
> +       stream_fd = __perf_open(drm_fd, &param);
Totally missed that, so we are actually breaking the test here, and
then fix it later in "igt/perf: fix rc6 test" :|

> +
>         nanosleep(&(struct timespec){ .tv_sec = 0, .tv_nsec = 500000000 }, NULL);
>
>         n_events_end = read_debugfs_u64_record(drm_fd, "i915_drpc_info",
> @@ -2717,7 +2727,7 @@ test_rc6_disable(void)
>
>         igt_assert_eq(n_events_end - n_events_start, 0);
>
> -       close(stream_fd);
> +       __perf_close(stream_fd);
>
>         n_events_start = read_debugfs_u64_record(drm_fd, "i915_drpc_info",
>                                                  "RC6 residency since boot");
> @@ -2779,7 +2789,6 @@ test_i915_ref_count(void)
>                 .properties_ptr = to_user_pointer(properties),
>         };
>         unsigned baseline, ref_count0, ref_count1;
> -       int stream_fd;
>         uint32_t oa_report0[64];
>         uint32_t oa_report1[64];
>
> @@ -2819,14 +2828,13 @@ test_i915_ref_count(void)
>
>         igt_assert(ref_count0 > baseline);
>
> -       read_2_oa_reports(stream_fd,
> -                         test_oa_format,
> +       read_2_oa_reports(test_oa_format,
>                           oa_exp_1_millisec,
>                           oa_report0,
>                           oa_report1,
>                           false); /* not just timer reports */
>
> -       close(stream_fd);
> +       __perf_close(stream_fd);
>         ref_count0 = read_i915_module_ref();
>         igt_debug("ref count after closing i915 perf stream fd = %u\n", ref_count0);
>         igt_assert_eq(ref_count0, baseline);
> --
> 2.11.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2017-06-21 13:48 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-25 22:32 [PATCH i-g-t 00/29] Update i915 perf tests for Gen8+ Lionel Landwerlin
2017-04-25 22:32 ` [PATCH i-g-t 01/29] igt/perf: generalize lookup for test metric set Lionel Landwerlin
2017-04-25 22:32 ` [PATCH i-g-t 02/29] igt/perf: improve robustness of polling/blocking tests Lionel Landwerlin
2017-06-16 13:48   ` Matthew Auld
2017-04-25 22:32 ` [PATCH i-g-t 03/29] igt/perf: init timestamp freq and oa format per devid Lionel Landwerlin
2017-06-16 14:37   ` Matthew Auld
2017-04-25 22:32 ` [PATCH i-g-t 04/29] igt/perf: update init_sys_info for skl with per-gt configs Lionel Landwerlin
2017-04-25 22:32 ` [PATCH i-g-t 05/29] igt/perf: add gen8 formats Lionel Landwerlin
2017-04-25 22:32 ` [PATCH i-g-t 06/29] igt/perf: fix a counter indexing Lionel Landwerlin
2017-04-25 22:32 ` [PATCH i-g-t 07/29] igt/perf: generalize checks for undefined A counters Lionel Landwerlin
2017-04-25 22:32 ` [PATCH i-g-t 08/29] igt/perf: generalize reading gpu ticks from reports Lionel Landwerlin
2017-04-25 22:32 ` [PATCH i-g-t 09/29] igt/perf: move timebase + oa exponent utilities up Lionel Landwerlin
2017-04-25 22:32 ` [PATCH i-g-t 10/29] igt/perf: wrap emission of MI_REPORT_PERF_COUNT Lionel Landwerlin
2017-04-25 22:32 ` [PATCH i-g-t 11/29] igt/perf: handling printing gen8 formats Lionel Landwerlin
2017-04-25 22:32 ` [PATCH i-g-t 12/29] igt/perf: avoid assumptions about oa exponent <-> freq mappings Lionel Landwerlin
2017-04-25 22:32 ` [PATCH i-g-t 13/29] igt/perf: allow 10% margin matching oa/sysfs freq in test_oa_exponents Lionel Landwerlin
2017-04-25 22:32 ` [PATCH i-g-t 14/29] igt/perf: s/test_perf_ctx_mi_rpc/hsw_test_single_ctx_counters/ Lionel Landwerlin
2017-04-25 22:32 ` [PATCH i-g-t 15/29] igt/perf: don't assume constant of 40 EUs Lionel Landwerlin
2017-04-25 22:32 ` [PATCH i-g-t 16/29] igt/perf: consider ctx-switch reports while polling/blocking Lionel Landwerlin
2017-04-25 22:32 ` [PATCH i-g-t 17/29] igt/perf: factor out oa report sanity checking Lionel Landwerlin
2017-04-25 22:32 ` [PATCH i-g-t 18/29] igt/perf: print [un]slice freq and report reasons in debug Lionel Landwerlin
2017-04-25 22:32 ` [PATCH i-g-t 19/29] igt/perf: update print_reports to print context ID Lionel Landwerlin
2017-04-25 22:32 ` [PATCH i-g-t 20/29] igt/perf: add utility function for checking periodic reports Lionel Landwerlin
2017-06-16 14:41   ` Matthew Auld
2017-04-25 22:32 ` [PATCH i-g-t 21/29] igt/perf: make stream_fd a global variable Lionel Landwerlin
2017-06-16 14:43   ` Matthew Auld
2017-06-16 14:46     ` Lionel Landwerlin
2017-06-21 13:47   ` Matthew Auld
2017-04-25 22:32 ` [PATCH i-g-t 22/29] igt/perf: add per context filtering test for gen8+ Lionel Landwerlin
2017-04-25 22:32 ` [PATCH i-g-t 23/29] igt/perf: update max buffer size for reading reports Lionel Landwerlin
2017-04-25 22:32 ` [PATCH i-g-t 24/29] igt/perf: fix rc6 test Lionel Landwerlin
2017-04-25 22:32 ` [PATCH i-g-t 25/29] igt/perf: rework oa-exponent test Lionel Landwerlin
2017-04-25 22:32 ` [PATCH i-g-t 26/29] igt/perf: make enable-disable more reliable Lionel Landwerlin
2017-04-25 22:32 ` [PATCH i-g-t 27/29] igt/perf: make buffer-fill " Lionel Landwerlin
2017-04-25 22:33 ` [PATCH i-g-t 28/29] igt/perf: load gt_boost_freq_mhz as max gt frequency Lionel Landwerlin
2017-04-25 22:33 ` [PATCH i-g-t 29/29] igt/perf: remove unused frequency functions Lionel Landwerlin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.