* igt/perf_pmu roundup
@ 2017-11-23  8:22 Chris Wilson
  2017-11-23  8:22 ` [PATCH igt 1/3] igt/perf_pmu: Reduce arbitrary delays Chris Wilson
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Chris Wilson @ 2017-11-23  8:22 UTC (permalink / raw)
  To: intel-gfx

The stragglers for easy consumption and discussion.
-Chris


* [PATCH igt 1/3] igt/perf_pmu: Reduce arbitrary delays
  2017-11-23  8:22 igt/perf_pmu roundup Chris Wilson
@ 2017-11-23  8:22 ` Chris Wilson
  2017-11-23 13:35   ` Tvrtko Ursulin
  2017-11-23  8:22 ` [PATCH igt 2/3] igt/perf_pmu: Stop peeking at intel_mmio registers Chris Wilson
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Chris Wilson @ 2017-11-23  8:22 UTC (permalink / raw)
  To: intel-gfx

gem_quiescent_gpu() is supposed to ensure that the HW is idle, and in
the process kick the GPU into rc6, so we should not need a long delay
afterwards to ensure that we are indeed in rc6. We do, however, need a
small delay to be sure that the rc6 cycle counter has started and
stopped.

v2: Apply to rc6p as well.
v3: The longest rc6 timeout (before the HW kicks in and enables rc6 on
an idle GPU) is 50ms, so make sure that at least that time has passed
since we were busy.
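
For reference, a minimal sketch of the measurement pattern being tuned here,
reusing open_pmu()/pmu_read_single() from perf_pmu.c and igt's
gem_quiescent_gpu(); the 500ms window and the 90% bound are illustrative,
not the test's real tolerances:

static void rc6_idle_sketch(int gem_fd)
{
	uint64_t prev, idle;
	int fd = open_pmu(I915_PMU_RC6_RESIDENCY);

	gem_quiescent_gpu(gem_fd);	/* flush all work and park the GPU */
	usleep(100e3);			/* cover the 50ms rc6 promotion timeout */

	prev = pmu_read_single(fd);	/* the rc6 counter should now be ticking */
	usleep(500e3);
	idle = pmu_read_single(fd) - prev;

	/* While idle, nearly the whole interval should be reported as rc6. */
	igt_assert(idle > 450e6 /* ns */);

	close(fd);
}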

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 tests/perf_pmu.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
index ef5de658..5118c023 100644
--- a/tests/perf_pmu.c
+++ b/tests/perf_pmu.c
@@ -1003,7 +1003,7 @@ test_rc6(int gem_fd)
 	fd = open_pmu(I915_PMU_RC6_RESIDENCY);
 
 	gem_quiescent_gpu(gem_fd);
-	usleep(1e6);
+	usleep(100e3); /* wait for the rc6 cycle counter to kick in */
 
 	/* Go idle and check full RC6. */
 	prev = pmu_read_single(fd);
@@ -1015,6 +1015,7 @@ test_rc6(int gem_fd)
 	/* Wake up device and check no RC6. */
 	fw = igt_open_forcewake_handle(gem_fd);
 	igt_assert(fw >= 0);
+	usleep(1e3); /* wait for the rc6 cycle counter to stop ticking */
 
 	prev = pmu_read_single(fd);
 	usleep(duration_ns / 1000);
@@ -1047,7 +1048,7 @@ test_rc6p(int gem_fd)
 	igt_require(num_pmu == 3);
 
 	gem_quiescent_gpu(gem_fd);
-	usleep(1e6);
+	usleep(100e3); /* wait for the rc6 cycle counter to kick in */
 
 	/* Go idle and check full RC6. */
 	pmu_read_multi(fd, num_pmu, prev);
@@ -1060,6 +1061,7 @@ test_rc6p(int gem_fd)
 	/* Wake up device and check no RC6. */
 	fw = igt_open_forcewake_handle(gem_fd);
 	igt_assert(fw >= 0);
+	usleep(1e3); /* wait for the rc6 cycle counter to stop ticking */
 
 	pmu_read_multi(fd, num_pmu, prev);
 	usleep(duration_ns / 1000);
-- 
2.15.0


* [PATCH igt 2/3] igt/perf_pmu: Stop peeking at intel_mmio registers
  2017-11-23  8:22 igt/perf_pmu roundup Chris Wilson
  2017-11-23  8:22 ` [PATCH igt 1/3] igt/perf_pmu: Reduce arbitrary delays Chris Wilson
@ 2017-11-23  8:22 ` Chris Wilson
  2017-11-24  8:55   ` Tvrtko Ursulin
  2017-11-23  8:22 ` [PATCH igt 3/3] igt/perf_pmu: Idle the GPU before starting to measure busyness Chris Wilson
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Chris Wilson @ 2017-11-23  8:22 UTC (permalink / raw)
  To: intel-gfx

Program the MI_WAIT_FOR_EVENT without reference to DERRMR by knowing its
state is ~0u when not in use, and is only in use when userspace requires
it. By not touching intel_register_access we completely eliminate the
risk that we leak the forcewake ref, which can cause later rc6 to fail.

At the same time, note that vlv/chv use a different mechanism (read
none) for coupling between the render engine and display.
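
For clarity, an annotated sketch of the batch assembled in the diff below
(register offsets and values exactly as in the patch; the DERRMR mask and the
MI_WAIT_FOR_EVENT vblank flag are patched in per pipe before each execbuf):

const uint32_t DERRMR = 0x44050;
const uint32_t FORCEWAKE_MT = 0xa188;
uint32_t batch[16], *b = batch;

*b++ = MI_LOAD_REGISTER_IMM;	/* take forcewake from within the batch */
*b++ = FORCEWAKE_MT;
*b++ = 2 << 16 | 2;		/* set bit 1 (write-mask in the high word) */
*b++ = MI_LOAD_REGISTER_IMM;	/* unmask the wanted vblank event */
*b++ = DERRMR;
*b++ = ~0u;			/* overwritten with ~(pipe vblank bit) */
*b++ = MI_WAIT_FOR_EVENT;	/* |= MI_WAIT_FOR_PIPE_?_VBLANK */
*b++ = MI_LOAD_REGISTER_IMM;	/* restore DERRMR to its idle ~0u */
*b++ = DERRMR;
*b++ = ~0u;
*b++ = MI_LOAD_REGISTER_IMM;	/* and release the forcewake again */
*b++ = FORCEWAKE_MT;
*b++ = 2 << 16;			/* clear bit 1 */
*b++ = MI_BATCH_BUFFER_END;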

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 tests/perf_pmu.c | 63 ++++++++++++++++++++++++++++----------------------------
 1 file changed, 32 insertions(+), 31 deletions(-)

diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
index 5118c023..7fd1506e 100644
--- a/tests/perf_pmu.c
+++ b/tests/perf_pmu.c
@@ -498,20 +498,22 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
 {
 	struct drm_i915_gem_exec_object2 obj = { };
 	struct drm_i915_gem_execbuffer2 eb = { };
-	data_t data;
-	igt_display_t *display = &data.display;
 	const uint32_t DERRMR = 0x44050;
+	const uint32_t FORCEWAKE_MT = 0xa188;
 	unsigned int valid_tests = 0;
-	uint32_t batch[8], *b;
+	uint32_t batch[16], *b;
+	uint16_t devid;
 	igt_output_t *output;
-	uint32_t bb_handle;
-	uint32_t reg;
+	data_t data;
 	enum pipe p;
 	int fd;
 
-	igt_require(intel_gen(intel_get_drm_devid(gem_fd)) >= 6);
-	igt_require(intel_register_access_init(intel_get_pci_device(),
-					       false, gem_fd) == 0);
+	devid = intel_get_drm_devid(gem_fd);
+	igt_require(intel_gen(devid) >= 7);
+	igt_skip_on(IS_VALLEYVIEW(devid) || IS_CHERRYVIEW(devid));
+
+	kmstest_set_vt_graphics_mode();
+	igt_display_init(&data.display, gem_fd);
 
 	/**
 	 * We will use the display to render event forwarind so need to
@@ -522,51 +524,52 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
 	 * listen to the recorded time spent in engine wait state as reported
 	 * by the PMU.
 	 */
-	reg = intel_register_read(DERRMR);
-
-	kmstest_set_vt_graphics_mode();
-	igt_display_init(&data.display, gem_fd);
-
-	bb_handle = gem_create(gem_fd, 4096);
+	obj.handle = gem_create(gem_fd, 4096);
 
 	b = batch;
 	*b++ = MI_LOAD_REGISTER_IMM;
+	*b++ = FORCEWAKE_MT;
+	*b++ = 2 << 16 | 2;
+	*b++ = MI_LOAD_REGISTER_IMM;
 	*b++ = DERRMR;
-	*b++ = reg & ~((1 << 3) | (1 << 11) | (1 << 21));
-	*b++ = MI_WAIT_FOR_EVENT | MI_WAIT_FOR_PIPE_A_VBLANK;
+	*b++ = ~0u;
+	*b++ = MI_WAIT_FOR_EVENT;
 	*b++ = MI_LOAD_REGISTER_IMM;
 	*b++ = DERRMR;
-	*b++ = reg;
+	*b++ = ~0u;
+	*b++ = MI_LOAD_REGISTER_IMM;
+	*b++ = FORCEWAKE_MT;
+	*b++ = 2 << 16;
 	*b++ = MI_BATCH_BUFFER_END;
 
-	obj.handle = bb_handle;
-
 	eb.buffer_count = 1;
 	eb.buffers_ptr = to_user_pointer(&obj);
 	eb.flags = e2ring(gem_fd, e) | I915_EXEC_SECURE;
 
-	for_each_pipe_with_valid_output(display, p, output) {
+	for_each_pipe_with_valid_output(&data.display, p, output) {
 		struct igt_helper_process waiter = { };
 		const unsigned int frames = 3;
-		unsigned int frame;
 		uint64_t val[2];
 
-		batch[3] = MI_WAIT_FOR_EVENT;
+		batch[6] = MI_WAIT_FOR_EVENT;
 		switch (p) {
 		case PIPE_A:
-			batch[3] |= MI_WAIT_FOR_PIPE_A_VBLANK;
+			batch[6] |= MI_WAIT_FOR_PIPE_A_VBLANK;
+			batch[5] = ~(1 << 3);
 			break;
 		case PIPE_B:
-			batch[3] |= MI_WAIT_FOR_PIPE_B_VBLANK;
+			batch[6] |= MI_WAIT_FOR_PIPE_B_VBLANK;
+			batch[5] = ~(1 << 11);
 			break;
 		case PIPE_C:
-			batch[3] |= MI_WAIT_FOR_PIPE_C_VBLANK;
+			batch[6] |= MI_WAIT_FOR_PIPE_C_VBLANK;
+			batch[5] = ~(1 << 21);
 			break;
 		default:
 			continue;
 		}
 
-		gem_write(gem_fd, bb_handle, 0, batch, sizeof(batch));
+		gem_write(gem_fd, obj.handle, 0, batch, sizeof(batch));
 
 		data.pipe = p;
 		prepare_crtc(&data, gem_fd, output);
@@ -589,9 +592,9 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
 			}
 		}
 
-		for (frame = 0; frame < frames; frame++) {
+		for (unsigned int frame = 0; frame < frames; frame++) {
 			gem_execbuf(gem_fd, &eb);
-			gem_sync(gem_fd, bb_handle);
+			gem_sync(gem_fd, obj.handle);
 		}
 
 		igt_stop_helper(&waiter);
@@ -606,9 +609,7 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
 		igt_assert(val[1] - val[0] > 0);
 	}
 
-	gem_close(gem_fd, bb_handle);
-
-	intel_register_access_fini();
+	gem_close(gem_fd, obj.handle);
 
 	igt_require_f(valid_tests,
 		      "no valid crtc/connector combinations found\n");
-- 
2.15.0


* [PATCH igt 3/3] igt/perf_pmu: Idle the GPU before starting to measure busyness
  2017-11-23  8:22 igt/perf_pmu roundup Chris Wilson
  2017-11-23  8:22 ` [PATCH igt 1/3] igt/perf_pmu: Reduce arbitrary delays Chris Wilson
  2017-11-23  8:22 ` [PATCH igt 2/3] igt/perf_pmu: Stop peeking at intel_mmio registers Chris Wilson
@ 2017-11-23  8:22 ` Chris Wilson
  2017-11-23 13:37   ` Tvrtko Ursulin
  2017-11-23 18:53 ` ✓ Fi.CI.BAT: success for series starting with [1/3] igt/perf_pmu: Reduce arbitrary delays Patchwork
  2017-11-23 22:14 ` ✓ Fi.CI.IGT: " Patchwork
  4 siblings, 1 reply; 15+ messages in thread
From: Chris Wilson @ 2017-11-23  8:22 UTC (permalink / raw)
  To: intel-gfx

Make sure the HW is idle before we start sampling the GPU for busyness.
If we do not rest for long enough between tests, we may carry the
sampling over.
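
To make the intent concrete, a minimal sketch of the ordering this enforces in
the busyness subtests, reusing open_pmu(), pmu_read_single() and e2ring() from
perf_pmu.c and igt's spin-batch helpers; the 500ms window and the 95% bound are
illustrative, not the test's real tolerances:

static void busy_sketch(int gem_fd, const struct intel_execution_engine2 *e)
{
	const uint64_t duration_ns = 500e6;	/* illustrative sampling window */
	igt_spin_t *spin;
	uint64_t start, busy;
	int fd;

	/* Ensure the HW is idle before beginning BUSY measurements */
	gem_quiescent_gpu(gem_fd);

	fd = open_pmu(I915_PMU_ENGINE_BUSY(e->class, e->instance));
	spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);

	start = pmu_read_single(fd);	/* reference taken on a known-idle engine */
	usleep(duration_ns / 1000);
	busy = pmu_read_single(fd) - start;

	igt_spin_batch_end(spin);
	igt_spin_batch_free(gem_fd, spin);
	close(fd);

	/* The spinner kept the engine busy for (almost) the whole interval. */
	igt_assert(busy > 95 * duration_ns / 100);
}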

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 tests/perf_pmu.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
index 7fd1506e..9057a1d7 100644
--- a/tests/perf_pmu.c
+++ b/tests/perf_pmu.c
@@ -137,6 +137,9 @@ single(int gem_fd, const struct intel_execution_engine2 *e, bool busy)
 	uint64_t val;
 	int fd;
 
+	/* Ensure the HW is idle before beginning BUSY measurements */
+	gem_quiescent_gpu(gem_fd);
+
 	fd = open_pmu(I915_PMU_ENGINE_BUSY(e->class, e->instance));
 
 	if (busy) {
@@ -187,6 +190,9 @@ busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
 	igt_spin_t *spin;
 	unsigned int busy_idx, i;
 
+	/* Ensure the HW is idle before beginning BUSY measurements */
+	gem_quiescent_gpu(gem_fd);
+
 	i = 0;
 	fd[0] = -1;
 	for_each_engine_class_instance(fd, e_) {
@@ -232,6 +238,9 @@ most_busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
 	igt_spin_t *spin[num_engines];
 	unsigned int idle_idx, i;
 
+	/* Ensure the HW is idle before beginning BUSY measurements */
+	gem_quiescent_gpu(gem_fd);
+
 	gem_require_engine(gem_fd, e->class, e->instance);
 
 	i = 0;
@@ -287,6 +296,9 @@ all_busy_check_all(int gem_fd, const unsigned int num_engines)
 	igt_spin_t *spin[num_engines];
 	unsigned int i;
 
+	/* Ensure the HW is idle before beginning BUSY measurements */
+	gem_quiescent_gpu(gem_fd);
+
 	i = 0;
 	fd[0] = -1;
 	for_each_engine_class_instance(fd, e) {
@@ -624,6 +636,9 @@ multi_client(int gem_fd, const struct intel_execution_engine2 *e)
 	uint64_t val[2];
 	int fd[2];
 
+	/* Ensure the HW is idle before beginning BUSY measurements */
+	gem_quiescent_gpu(gem_fd);
+
 	fd[0] = open_pmu(config);
 
 	/*
@@ -739,6 +754,9 @@ static void cpu_hotplug(int gem_fd)
 	uint64_t val, ref;
 	int fd;
 
+	/* Ensure the HW is idle before beginning BUSY measurements */
+	gem_quiescent_gpu(gem_fd);
+
 	igt_require(cpu0_hotplug_support());
 
 	fd = perf_i915_open(I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_RENDER, 0));
-- 
2.15.0


* Re: [PATCH igt 1/3] igt/perf_pmu: Reduce arbitrary delays
  2017-11-23  8:22 ` [PATCH igt 1/3] igt/perf_pmu: Reduce arbitrary delays Chris Wilson
@ 2017-11-23 13:35   ` Tvrtko Ursulin
  0 siblings, 0 replies; 15+ messages in thread
From: Tvrtko Ursulin @ 2017-11-23 13:35 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 23/11/2017 08:22, Chris Wilson wrote:
> gem_quiescent_gpu() is supposed to ensure that the HW is idle, and in
> the process kick the GPU into rc6, so we should not need a long delay
> afterwards to ensure that we are indeed in rc6. We do, however, need a
> small delay to be sure that the rc6 cycle counter has started and
> stopped.
> 
> v2: Apply to rc6p as well.
> v3: The longest rc6 timeout (before the HW kicks in and enables rc6 on
> an idle GPU) is 50ms, so make sure that at least that time has passed
> since we were busy.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   tests/perf_pmu.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
> index ef5de658..5118c023 100644
> --- a/tests/perf_pmu.c
> +++ b/tests/perf_pmu.c
> @@ -1003,7 +1003,7 @@ test_rc6(int gem_fd)
>   	fd = open_pmu(I915_PMU_RC6_RESIDENCY);
>   
>   	gem_quiescent_gpu(gem_fd);
> -	usleep(1e6);
> +	usleep(100e3); /* wait for the rc6 cycle counter to kick in */
>   
>   	/* Go idle and check full RC6. */
>   	prev = pmu_read_single(fd);
> @@ -1015,6 +1015,7 @@ test_rc6(int gem_fd)
>   	/* Wake up device and check no RC6. */
>   	fw = igt_open_forcewake_handle(gem_fd);
>   	igt_assert(fw >= 0);
> +	usleep(1e3); /* wait for the rc6 cycle counter to stop ticking */
>   
>   	prev = pmu_read_single(fd);
>   	usleep(duration_ns / 1000);
> @@ -1047,7 +1048,7 @@ test_rc6p(int gem_fd)
>   	igt_require(num_pmu == 3);
>   
>   	gem_quiescent_gpu(gem_fd);
> -	usleep(1e6);
> +	usleep(100e3); /* wait for the rc6 cycle counter to kick in */
>   
>   	/* Go idle and check full RC6. */
>   	pmu_read_multi(fd, num_pmu, prev);
> @@ -1060,6 +1061,7 @@ test_rc6p(int gem_fd)
>   	/* Wake up device and check no RC6. */
>   	fw = igt_open_forcewake_handle(gem_fd);
>   	igt_assert(fw >= 0);
> +	usleep(1e3); /* wait for the rc6 cycle counter to stop ticking */
>   
>   	pmu_read_multi(fd, num_pmu, prev);
>   	usleep(duration_ns / 1000);
> 

As we discussed yesterday, I added the sleep thinking I was letting the GPU
go to RC6, but I missed the fact that gem_quiescent_gpu() already does that.
So, as you explain here, it is actually about hardware behaviour.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* Re: [PATCH igt 3/3] igt/perf_pmu: Idle the GPU before starting to measure busyness
  2017-11-23  8:22 ` [PATCH igt 3/3] igt/perf_pmu: Idle the GPU before starting to measure busyness Chris Wilson
@ 2017-11-23 13:37   ` Tvrtko Ursulin
  2017-11-23 16:02     ` Chris Wilson
  0 siblings, 1 reply; 15+ messages in thread
From: Tvrtko Ursulin @ 2017-11-23 13:37 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 23/11/2017 08:22, Chris Wilson wrote:
> Make sure the HW is idle before we start sampling the GPU for busyness.
> If we do not rest for long enough between tests, we may carry the
> sampling over.

I'd rather not have this since as I said yesterday each opened PMU event 
is supposed to record the initial counter value as reference. If that is 
failing or not good enough on some tests/platforms I would rather first 
understand why and how.

Regards,

Tvrtko

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   tests/perf_pmu.c | 18 ++++++++++++++++++
>   1 file changed, 18 insertions(+)
> 
> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
> index 7fd1506e..9057a1d7 100644
> --- a/tests/perf_pmu.c
> +++ b/tests/perf_pmu.c
> @@ -137,6 +137,9 @@ single(int gem_fd, const struct intel_execution_engine2 *e, bool busy)
>   	uint64_t val;
>   	int fd;
>   
> +	/* Ensure the HW is idle before beginning BUSY measurements */
> +	gem_quiescent_gpu(gem_fd);
> +
>   	fd = open_pmu(I915_PMU_ENGINE_BUSY(e->class, e->instance));
>   
>   	if (busy) {
> @@ -187,6 +190,9 @@ busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
>   	igt_spin_t *spin;
>   	unsigned int busy_idx, i;
>   
> +	/* Ensure the HW is idle before beginning BUSY measurements */
> +	gem_quiescent_gpu(gem_fd);
> +
>   	i = 0;
>   	fd[0] = -1;
>   	for_each_engine_class_instance(fd, e_) {
> @@ -232,6 +238,9 @@ most_busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
>   	igt_spin_t *spin[num_engines];
>   	unsigned int idle_idx, i;
>   
> +	/* Ensure the HW is idle before beginning BUSY measurements */
> +	gem_quiescent_gpu(gem_fd);
> +
>   	gem_require_engine(gem_fd, e->class, e->instance);
>   
>   	i = 0;
> @@ -287,6 +296,9 @@ all_busy_check_all(int gem_fd, const unsigned int num_engines)
>   	igt_spin_t *spin[num_engines];
>   	unsigned int i;
>   
> +	/* Ensure the HW is idle before beginning BUSY measurements */
> +	gem_quiescent_gpu(gem_fd);
> +
>   	i = 0;
>   	fd[0] = -1;
>   	for_each_engine_class_instance(fd, e) {
> @@ -624,6 +636,9 @@ multi_client(int gem_fd, const struct intel_execution_engine2 *e)
>   	uint64_t val[2];
>   	int fd[2];
>   
> +	/* Ensure the HW is idle before beginning BUSY measurements */
> +	gem_quiescent_gpu(gem_fd);
> +
>   	fd[0] = open_pmu(config);
>   
>   	/*
> @@ -739,6 +754,9 @@ static void cpu_hotplug(int gem_fd)
>   	uint64_t val, ref;
>   	int fd;
>   
> +	/* Ensure the HW is idle before beginning BUSY measurements */
> +	gem_quiescent_gpu(gem_fd);
> +
>   	igt_require(cpu0_hotplug_support());
>   
>   	fd = perf_i915_open(I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_RENDER, 0));
> 

* Re: [PATCH igt 3/3] igt/perf_pmu: Idle the GPU before starting to measure busyness
  2017-11-23 13:37   ` Tvrtko Ursulin
@ 2017-11-23 16:02     ` Chris Wilson
  2017-11-23 16:07       ` Chris Wilson
  0 siblings, 1 reply; 15+ messages in thread
From: Chris Wilson @ 2017-11-23 16:02 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2017-11-23 13:37:30)
> 
> On 23/11/2017 08:22, Chris Wilson wrote:
> > Make sure the HW is idle before we start sampling the GPU for busyness.
> > If we do not rest for long enough between tests, we may carry the
> > sampling over.
> 
> I'd rather not have this since as I said yesterday each opened PMU event 
> is supposed to record the initial counter value as reference. If that is 
> failing or not good enough on some tests/platforms I would rather first 
> understand why and how.

All legacy sampling fails... :| I'm putting it back!
-Chris

* Re: [PATCH igt 3/3] igt/perf_pmu: Idle the GPU before starting to measure busyness
  2017-11-23 16:02     ` Chris Wilson
@ 2017-11-23 16:07       ` Chris Wilson
  2017-11-24  7:32         ` Tvrtko Ursulin
  0 siblings, 1 reply; 15+ messages in thread
From: Chris Wilson @ 2017-11-23 16:07 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Chris Wilson (2017-11-23 16:02:20)
> Quoting Tvrtko Ursulin (2017-11-23 13:37:30)
> > 
> > On 23/11/2017 08:22, Chris Wilson wrote:
> > > Make sure the HW is idle before we start sampling the GPU for busyness.
> > > If we do not rest for long enough between tests, we may carry the
> > > sampling over.
> > 
> > I'd rather not have this since as I said yesterday each opened PMU event 
> > is supposed to record the initial counter value as reference. If that is 
> > failing or not good enough on some tests/platforms I would rather first 
> > understand why and how.
> 
> All legacy sampling fails... :| I'm putting it back!

So presumably it is coupling with the parking.
-Chris

* ✓ Fi.CI.BAT: success for series starting with [1/3] igt/perf_pmu: Reduce arbitrary delays
  2017-11-23  8:22 igt/perf_pmu roundup Chris Wilson
                   ` (2 preceding siblings ...)
  2017-11-23  8:22 ` [PATCH igt 3/3] igt/perf_pmu: Idle the GPU before starting to measure busyness Chris Wilson
@ 2017-11-23 18:53 ` Patchwork
  2017-11-23 22:14 ` ✓ Fi.CI.IGT: " Patchwork
  4 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2017-11-23 18:53 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/3] igt/perf_pmu: Reduce arbitrary delays
URL   : https://patchwork.freedesktop.org/series/34271/
State : success

== Summary ==

IGT patchset tested on top of latest successful build
a1e444f4c8178acb590d41c21e921c6447668be4 tests/perf_pmu: Bump measuring duration for semaphores as well

with latest DRM-Tip kernel build CI_DRM_3378
b407e5f38397 drm-tip: 2017y-11m-23d-16h-14m-59s UTC integration manifest

No testlist changes.

Test kms_pipe_crc_basic:
        Subgroup suspend-read-crc-pipe-c:
                incomplete -> PASS       (fi-cfl-s2)

fi-bdw-5557u     total:289  pass:268  dwarn:0   dfail:0   fail:0   skip:21  time:447s
fi-bdw-gvtdvm    total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:457s
fi-blb-e6850     total:289  pass:223  dwarn:1   dfail:0   fail:0   skip:65  time:388s
fi-bsw-n3050     total:289  pass:243  dwarn:0   dfail:0   fail:0   skip:46  time:549s
fi-bwr-2160      total:289  pass:183  dwarn:0   dfail:0   fail:0   skip:106 time:279s
fi-bxt-dsi       total:289  pass:259  dwarn:0   dfail:0   fail:0   skip:30  time:514s
fi-bxt-j4205     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:521s
fi-byt-j1900     total:289  pass:254  dwarn:0   dfail:0   fail:0   skip:35  time:513s
fi-byt-n2820     total:289  pass:250  dwarn:0   dfail:0   fail:0   skip:39  time:496s
fi-cfl-s2        total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:606s
fi-elk-e7500     total:289  pass:229  dwarn:0   dfail:0   fail:0   skip:60  time:429s
fi-gdg-551       total:289  pass:178  dwarn:1   dfail:0   fail:1   skip:109 time:266s
fi-glk-1         total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:544s
fi-hsw-4770      total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:432s
fi-hsw-4770r     total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:441s
fi-ilk-650       total:289  pass:228  dwarn:0   dfail:0   fail:0   skip:61  time:434s
fi-ivb-3520m     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:484s
fi-ivb-3770      total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:465s
fi-pnv-d510      total:289  pass:222  dwarn:1   dfail:0   fail:0   skip:66  time:579s
fi-skl-6260u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:454s
fi-skl-6600u     total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:542s
fi-skl-6700hq    total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:565s
fi-skl-6700k     total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:524s
fi-skl-6770hq    total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:504s
fi-skl-gvtdvm    total:289  pass:266  dwarn:0   dfail:0   fail:0   skip:23  time:464s
fi-snb-2520m     total:289  pass:250  dwarn:0   dfail:0   fail:0   skip:39  time:565s
fi-snb-2600      total:289  pass:249  dwarn:0   dfail:0   fail:0   skip:40  time:438s
Blacklisted hosts:
fi-cnl-y         total:237  pass:212  dwarn:0   dfail:0   fail:0   skip:24 
fi-glk-dsi       total:289  pass:258  dwarn:0   dfail:0   fail:1   skip:30  time:515s
fi-kbl-7500u     total:289  pass:264  dwarn:1   dfail:0   fail:0   skip:24  time:483s
fi-kbl-7560u     total:289  pass:270  dwarn:0   dfail:0   fail:0   skip:19  time:531s
fi-kbl-7567u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:476s
fi-kbl-r         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:536s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_541/

* ✓ Fi.CI.IGT: success for series starting with [1/3] igt/perf_pmu: Reduce arbitrary delays
  2017-11-23  8:22 igt/perf_pmu roundup Chris Wilson
                   ` (3 preceding siblings ...)
  2017-11-23 18:53 ` ✓ Fi.CI.BAT: success for series starting with [1/3] igt/perf_pmu: Reduce arbitrary delays Patchwork
@ 2017-11-23 22:14 ` Patchwork
  4 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2017-11-23 22:14 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/3] igt/perf_pmu: Reduce arbitrary delays
URL   : https://patchwork.freedesktop.org/series/34271/
State : success

== Summary ==

Test kms_cursor_crc:
        Subgroup cursor-256x256-suspend:
                pass       -> SKIP       (shard-snb) fdo#102365

fdo#102365 https://bugs.freedesktop.org/show_bug.cgi?id=102365

shard-hsw        total:2667 pass:1534 dwarn:1   dfail:0   fail:11  skip:1121 time:9593s
shard-snb        total:2667 pass:1311 dwarn:1   dfail:0   fail:12  skip:1343 time:8101s
Blacklisted hosts:
shard-apl        total:2645 pass:1665 dwarn:1   dfail:0   fail:24  skip:954 time:13254s
shard-kbl        total:2595 pass:1734 dwarn:19  dfail:2   fail:24  skip:815 time:10756s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_541/shards.html

* Re: [PATCH igt 3/3] igt/perf_pmu: Idle the GPU before starting to measure busyness
  2017-11-23 16:07       ` Chris Wilson
@ 2017-11-24  7:32         ` Tvrtko Ursulin
  2017-11-24  9:15           ` Chris Wilson
  0 siblings, 1 reply; 15+ messages in thread
From: Tvrtko Ursulin @ 2017-11-24  7:32 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, intel-gfx


On 23/11/17 16:07, Chris Wilson wrote:
> Quoting Chris Wilson (2017-11-23 16:02:20)
>> Quoting Tvrtko Ursulin (2017-11-23 13:37:30)
>>>
>>> On 23/11/2017 08:22, Chris Wilson wrote:
>>>> Make sure the HW is idle before we start sampling the GPU for busyness.
>>>> If we do not rest for long enough between tests, we may carry the
>>>> sampling over.
>>>
>>> I'd rather not have this since as I said yesterday each opened PMU event
>>> is supposed to record the initial counter value as reference. If that is
>>> failing or not good enough on some tests/platforms I would rather first
>>> understand why and how.
>>
>> All legacy sampling fails... :| I'm putting it back!
> 
> So presumably it is coupling with the parking.

Or, more precisely, it is averaging with the previous sample. Whenever one
PMU client signals the timer to self-disarm but a new PMU client enables
the timer again within this window, the previous sample value does not get
cleared.

I tried this for instance:

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 650089b30d46..a5ca27eeef40 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -147,9 +147,26 @@ void i915_pmu_gt_parked(struct drm_i915_private *i915)
        spin_unlock_irq(&i915->pmu.lock);
 }
 
+static void pmu_init_previous_samples(struct drm_i915_private *i915)
+{
+       struct intel_engine_cs *engine;
+       enum intel_engine_id id;
+       unsigned int i;
+
+       for_each_engine(engine, i915, id) {
+               for (i = 0; i < ARRAY_SIZE(engine->pmu.sample); i++)
+                       engine->pmu.sample[i].prev = 0;
+       }
+
+       for (i = 0; i < ARRAY_SIZE(i915->pmu.sample); i++)
+               i915->pmu.sample[i].prev = i915->gt_pm.rps.idle_freq;
+}
+
 static void __i915_pmu_maybe_start_timer(struct drm_i915_private *i915)
 {
        if (!i915->pmu.timer_enabled && pmu_needs_timer(i915, true)) {
+               hrtimer_cancel(&i915->pmu.timer);
+               pmu_init_previous_samples(i915);
                i915->pmu.timer_enabled = true;
                hrtimer_start_range_ns(&i915->pmu.timer,
                                       ns_to_ktime(PERIOD), 0,
@@ -262,31 +279,13 @@ static void frequency_sample(struct drm_i915_private *dev_priv)
        }
 }
 
-static void pmu_init_previous_samples(struct drm_i915_private *i915)
-{
-       struct intel_engine_cs *engine;
-       enum intel_engine_id id;
-       unsigned int i;
-
-       for_each_engine(engine, i915, id) {
-               for (i = 0; i < ARRAY_SIZE(engine->pmu.sample); i++)
-                       engine->pmu.sample[i].prev = 0;
-       }
-
-       for (i = 0; i < ARRAY_SIZE(i915->pmu.sample); i++)
-               i915->pmu.sample[i].prev = i915->gt_pm.rps.idle_freq;
-}
-
 static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
 {
        struct drm_i915_private *i915 =
                container_of(hrtimer, struct drm_i915_private, pmu.timer);
 
-       if (!READ_ONCE(i915->pmu.timer_enabled)) {
-               pmu_init_previous_samples(i915);
-
+       if (!READ_ONCE(i915->pmu.timer_enabled))
                return HRTIMER_NORESTART;
-       }
 
        engines_sample(i915);
        frequency_sample(i915);
@@ -847,8 +846,6 @@ void i915_pmu_register(struct drm_i915_private *i915)
        hrtimer_init(&i915->pmu.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
        i915->pmu.timer.function = i915_sample;
 
-       pmu_init_previous_samples(i915);
-
        for_each_engine(engine, i915, id)
                INIT_DELAYED_WORK(&engine->pmu.disable_busy_stats,
                                  __disable_busy_stats);
---

But it is too evil to wait for the sampling timer, in the MMIO case, inside
the irq-disabled section, even though it would happen only on disable-enable
transitions quicker than the sampling period. It is not a real use case, but still...

Or we could drop sample averaging?

Regards,

Tvrtko



* Re: [PATCH igt 2/3] igt/perf_pmu: Stop peeking at intel_mmio registers
  2017-11-23  8:22 ` [PATCH igt 2/3] igt/perf_pmu: Stop peeking at intel_mmio registers Chris Wilson
@ 2017-11-24  8:55   ` Tvrtko Ursulin
  2017-11-24  9:20     ` Chris Wilson
  2017-11-24 10:47     ` Chris Wilson
  0 siblings, 2 replies; 15+ messages in thread
From: Tvrtko Ursulin @ 2017-11-24  8:55 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 23/11/2017 08:22, Chris Wilson wrote:
> Program the MI_WAIT_FOR_EVENT without reference to DERRMR by knowing its
> state is ~0u when not in use, and is only in use when userspace requires
> it. By not touching intel_register_access we completely eliminate the
> risk that we leak the forcewake ref, which can cause later rc6 to fail.
> 
> At the same time, note that vlv/chv use a different mechanism (read
> none) for coupling between the render engine and display.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   tests/perf_pmu.c | 63 ++++++++++++++++++++++++++++----------------------------
>   1 file changed, 32 insertions(+), 31 deletions(-)
> 
> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
> index 5118c023..7fd1506e 100644
> --- a/tests/perf_pmu.c
> +++ b/tests/perf_pmu.c
> @@ -498,20 +498,22 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
>   {
>   	struct drm_i915_gem_exec_object2 obj = { };
>   	struct drm_i915_gem_execbuffer2 eb = { };
> -	data_t data;
> -	igt_display_t *display = &data.display;
>   	const uint32_t DERRMR = 0x44050;
> +	const uint32_t FORCEWAKE_MT = 0xa188;
>   	unsigned int valid_tests = 0;
> -	uint32_t batch[8], *b;
> +	uint32_t batch[16], *b;
> +	uint16_t devid;
>   	igt_output_t *output;
> -	uint32_t bb_handle;
> -	uint32_t reg;
> +	data_t data;
>   	enum pipe p;
>   	int fd;
>   
> -	igt_require(intel_gen(intel_get_drm_devid(gem_fd)) >= 6);
> -	igt_require(intel_register_access_init(intel_get_pci_device(),
> -					       false, gem_fd) == 0);
> +	devid = intel_get_drm_devid(gem_fd);
> +	igt_require(intel_gen(devid) >= 7);
> +	igt_skip_on(IS_VALLEYVIEW(devid) || IS_CHERRYVIEW(devid));
> +
> +	kmstest_set_vt_graphics_mode();
> +	igt_display_init(&data.display, gem_fd);
>   
>   	/**
>   	 * We will use the display to render event forwarind so need to
> @@ -522,51 +524,52 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
>   	 * listen to the recorded time spent in engine wait state as reported
>   	 * by the PMU.
>   	 */
> -	reg = intel_register_read(DERRMR);
> -
> -	kmstest_set_vt_graphics_mode();
> -	igt_display_init(&data.display, gem_fd);
> -
> -	bb_handle = gem_create(gem_fd, 4096);
> +	obj.handle = gem_create(gem_fd, 4096);
>   
>   	b = batch;
>   	*b++ = MI_LOAD_REGISTER_IMM;
> +	*b++ = FORCEWAKE_MT;
> +	*b++ = 2 << 16 | 2;
> +	*b++ = MI_LOAD_REGISTER_IMM;
>   	*b++ = DERRMR;
> -	*b++ = reg & ~((1 << 3) | (1 << 11) | (1 << 21));
> -	*b++ = MI_WAIT_FOR_EVENT | MI_WAIT_FOR_PIPE_A_VBLANK;
> +	*b++ = ~0u;
> +	*b++ = MI_WAIT_FOR_EVENT;
>   	*b++ = MI_LOAD_REGISTER_IMM;
>   	*b++ = DERRMR;
> -	*b++ = reg;
> +	*b++ = ~0u;
> +	*b++ = MI_LOAD_REGISTER_IMM;
> +	*b++ = FORCEWAKE_MT;
> +	*b++ = 2 << 16;
>   	*b++ = MI_BATCH_BUFFER_END;
>   
> -	obj.handle = bb_handle;
> -
>   	eb.buffer_count = 1;
>   	eb.buffers_ptr = to_user_pointer(&obj);
>   	eb.flags = e2ring(gem_fd, e) | I915_EXEC_SECURE;
>   
> -	for_each_pipe_with_valid_output(display, p, output) {
> +	for_each_pipe_with_valid_output(&data.display, p, output) {
>   		struct igt_helper_process waiter = { };
>   		const unsigned int frames = 3;
> -		unsigned int frame;
>   		uint64_t val[2];
>   
> -		batch[3] = MI_WAIT_FOR_EVENT;
> +		batch[6] = MI_WAIT_FOR_EVENT;
>   		switch (p) {
>   		case PIPE_A:
> -			batch[3] |= MI_WAIT_FOR_PIPE_A_VBLANK;
> +			batch[6] |= MI_WAIT_FOR_PIPE_A_VBLANK;
> +			batch[5] = ~(1 << 3);
>   			break;
>   		case PIPE_B:
> -			batch[3] |= MI_WAIT_FOR_PIPE_B_VBLANK;
> +			batch[6] |= MI_WAIT_FOR_PIPE_B_VBLANK;
> +			batch[5] = ~(1 << 11);
>   			break;
>   		case PIPE_C:
> -			batch[3] |= MI_WAIT_FOR_PIPE_C_VBLANK;
> +			batch[6] |= MI_WAIT_FOR_PIPE_C_VBLANK;
> +			batch[5] = ~(1 << 21);
>   			break;
>   		default:
>   			continue;
>   		}
>   
> -		gem_write(gem_fd, bb_handle, 0, batch, sizeof(batch));
> +		gem_write(gem_fd, obj.handle, 0, batch, sizeof(batch));
>   
>   		data.pipe = p;
>   		prepare_crtc(&data, gem_fd, output);
> @@ -589,9 +592,9 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
>   			}
>   		}
>   
> -		for (frame = 0; frame < frames; frame++) {
> +		for (unsigned int frame = 0; frame < frames; frame++) {
>   			gem_execbuf(gem_fd, &eb);
> -			gem_sync(gem_fd, bb_handle);
> +			gem_sync(gem_fd, obj.handle);
>   		}
>   
>   		igt_stop_helper(&waiter);
> @@ -606,9 +609,7 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
>   		igt_assert(val[1] - val[0] > 0);
>   	}
>   
> -	gem_close(gem_fd, bb_handle);
> -
> -	intel_register_access_fini();
> +	gem_close(gem_fd, obj.handle);
>   
>   	igt_require_f(valid_tests,
>   		      "no valid crtc/connector combinations found\n");
> 

It looks fine apart from the assumption that DERRMR always has to
remain at its default of ~0. The worst I can imagine is that in the future we
have to make this test force-disconnect all displays, should some of the
reserved bits be used for something which we will be turning on by default.

But in that world, sampling the DERRMR value at test start is also only
marginally better.

Anyway, we can worry about that if it happens.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko




* Re: [PATCH igt 3/3] igt/perf_pmu: Idle the GPU before starting to measure busyness
  2017-11-24  7:32         ` Tvrtko Ursulin
@ 2017-11-24  9:15           ` Chris Wilson
  0 siblings, 0 replies; 15+ messages in thread
From: Chris Wilson @ 2017-11-24  9:15 UTC (permalink / raw)
  To: Tvrtko Ursulin, Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2017-11-24 07:32:30)
> 
> On 23/11/17 16:07, Chris Wilson wrote:
> > Quoting Chris Wilson (2017-11-23 16:02:20)
> >> Quoting Tvrtko Ursulin (2017-11-23 13:37:30)
> >>>
> >>> On 23/11/2017 08:22, Chris Wilson wrote:
> >>>> Make sure the HW is idle before we start sampling the GPU for busyness.
> >>>> If we do not rest for long enough between tests, we may carry the
> >>>> sampling over.
> >>>
> >>> I'd rather not have this since as I said yesterday each opened PMU event
> >>> is supposed to record the initial counter value as reference. If that is
> >>> failing or not good enough on some tests/platforms I would rather first
> >>> understand why and how.
> >>
> >> All legacy sampling fails... :| I'm putting it back!
> > 
> > So presumably it is coupling with the parking.
> 
> Or, more precisely, it is averaging with the previous sample. Whenever one
> PMU client signals the timer to self-disarm but a new PMU client enables
> the timer again within this window, the previous sample value does not get
> cleared.

[snip]
 
> But it is too evil to wait for the sampling timer, in the MMIO case, inside
> the irq-disabled section, even though it would happen only on disable-enable
> transitions quicker than the sampling period. It is not a real use case, but still...
> 
> Or we could drop sample averaging?

Yeah, we are accumulating too much cruft for a small improvement (that
only affects the endpoints).
-Chris

* Re: [PATCH igt 2/3] igt/perf_pmu: Stop peeking at intel_mmio registers
  2017-11-24  8:55   ` Tvrtko Ursulin
@ 2017-11-24  9:20     ` Chris Wilson
  2017-11-24 10:47     ` Chris Wilson
  1 sibling, 0 replies; 15+ messages in thread
From: Chris Wilson @ 2017-11-24  9:20 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2017-11-24 08:55:13)
> 
> On 23/11/2017 08:22, Chris Wilson wrote:
> > @@ -522,51 +524,52 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
> >        * listen to the recorded time spent in engine wait state as reported
> >        * by the PMU.
> >        */
> > -     reg = intel_register_read(DERRMR);
> > -
> > -     kmstest_set_vt_graphics_mode();
> > -     igt_display_init(&data.display, gem_fd);
> > -
> > -     bb_handle = gem_create(gem_fd, 4096);
> > +     obj.handle = gem_create(gem_fd, 4096);
> >   
> >       b = batch;
> >       *b++ = MI_LOAD_REGISTER_IMM;
> > +     *b++ = FORCEWAKE_MT;
> > +     *b++ = 2 << 16 | 2;
> > +     *b++ = MI_LOAD_REGISTER_IMM;
> >       *b++ = DERRMR;
> > -     *b++ = reg & ~((1 << 3) | (1 << 11) | (1 << 21));
> > -     *b++ = MI_WAIT_FOR_EVENT | MI_WAIT_FOR_PIPE_A_VBLANK;
> > +     *b++ = ~0u;
> > +     *b++ = MI_WAIT_FOR_EVENT;
> >       *b++ = MI_LOAD_REGISTER_IMM;
> >       *b++ = DERRMR;
> > -     *b++ = reg;
> > +     *b++ = ~0u;
> > +     *b++ = MI_LOAD_REGISTER_IMM;
> > +     *b++ = FORCEWAKE_MT;
> > +     *b++ = 2 << 16;
> >       *b++ = MI_BATCH_BUFFER_END;
> >   
> It looks fine apart from the assumption that DERRMR always has to
> remain at its default of ~0. The worst I can imagine is that in the future we
> have to make this test force-disconnect all displays, should some of the
> reserved bits be used for something which we will be turning on by default.

I think more highly of the HW designers than to expect they will require
routing Display Engine messages to the Render Ring at all times! What are
we going to do with the messages we are not using in our batches? ;)

I believe the point of disabling it by default was that it was causing a
coupling between different powerwells across the other side of the chip,
and so we had to explicitly enable messages we wanted in the batch. I
don't imagine that they will stuff a different meaning into it, rather
they will just scrap it and start again (like every other gen).
-Chris

* Re: [PATCH igt 2/3] igt/perf_pmu: Stop peeking at intel_mmio registers
  2017-11-24  8:55   ` Tvrtko Ursulin
  2017-11-24  9:20     ` Chris Wilson
@ 2017-11-24 10:47     ` Chris Wilson
  1 sibling, 0 replies; 15+ messages in thread
From: Chris Wilson @ 2017-11-24 10:47 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2017-11-24 08:55:13)
> 
> On 23/11/2017 08:22, Chris Wilson wrote:
> > Program the MI_WAIT_FOR_EVENT without reference to DERRMR by knowing its
> > state is ~0u when not in use, and is only in use when userspace requires
> > it. By not touching intel_register_access we completely eliminate the
> > risk that we leak the forcewake ref, which can cause later rc6 to fail.
> > 
> > At the same time, note that vlv/chv use a different mechanism (read
> > none) for coupling between the render engine and display.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   tests/perf_pmu.c | 63 ++++++++++++++++++++++++++++----------------------------
> >   1 file changed, 32 insertions(+), 31 deletions(-)
> > 
> > diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
> > index 5118c023..7fd1506e 100644
> > --- a/tests/perf_pmu.c
> > +++ b/tests/perf_pmu.c
> > @@ -498,20 +498,22 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
> >   {
> >       struct drm_i915_gem_exec_object2 obj = { };
> >       struct drm_i915_gem_execbuffer2 eb = { };
> > -     data_t data;
> > -     igt_display_t *display = &data.display;
> >       const uint32_t DERRMR = 0x44050;
> > +     const uint32_t FORCEWAKE_MT = 0xa188;
> >       unsigned int valid_tests = 0;
> > -     uint32_t batch[8], *b;
> > +     uint32_t batch[16], *b;
> > +     uint16_t devid;
> >       igt_output_t *output;
> > -     uint32_t bb_handle;
> > -     uint32_t reg;
> > +     data_t data;
> >       enum pipe p;
> >       int fd;
> >   
> > -     igt_require(intel_gen(intel_get_drm_devid(gem_fd)) >= 6);
> > -     igt_require(intel_register_access_init(intel_get_pci_device(),
> > -                                            false, gem_fd) == 0);
> > +     devid = intel_get_drm_devid(gem_fd);
> > +     igt_require(intel_gen(devid) >= 7);
> > +     igt_skip_on(IS_VALLEYVIEW(devid) || IS_CHERRYVIEW(devid));
> > +
> > +     kmstest_set_vt_graphics_mode();
> > +     igt_display_init(&data.display, gem_fd);
> >   
> >       /**
> >        * We will use the display to render event forwarind so need to
> > @@ -522,51 +524,52 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
> >        * listen to the recorded time spent in engine wait state as reported
> >        * by the PMU.
> >        */
> > -     reg = intel_register_read(DERRMR);
> > -
> > -     kmstest_set_vt_graphics_mode();
> > -     igt_display_init(&data.display, gem_fd);
> > -
> > -     bb_handle = gem_create(gem_fd, 4096);
> > +     obj.handle = gem_create(gem_fd, 4096);
> >   
> >       b = batch;
> >       *b++ = MI_LOAD_REGISTER_IMM;
> > +     *b++ = FORCEWAKE_MT;
> > +     *b++ = 2 << 16 | 2;
> > +     *b++ = MI_LOAD_REGISTER_IMM;
> >       *b++ = DERRMR;
> > -     *b++ = reg & ~((1 << 3) | (1 << 11) | (1 << 21));
> > -     *b++ = MI_WAIT_FOR_EVENT | MI_WAIT_FOR_PIPE_A_VBLANK;
> > +     *b++ = ~0u;
> > +     *b++ = MI_WAIT_FOR_EVENT;
> >       *b++ = MI_LOAD_REGISTER_IMM;
> >       *b++ = DERRMR;
> > -     *b++ = reg;
> > +     *b++ = ~0u;
> > +     *b++ = MI_LOAD_REGISTER_IMM;
> > +     *b++ = FORCEWAKE_MT;
> > +     *b++ = 2 << 16;
> >       *b++ = MI_BATCH_BUFFER_END;
> >   
> > -     obj.handle = bb_handle;
> > -
> >       eb.buffer_count = 1;
> >       eb.buffers_ptr = to_user_pointer(&obj);
> >       eb.flags = e2ring(gem_fd, e) | I915_EXEC_SECURE;
> >   
> > -     for_each_pipe_with_valid_output(display, p, output) {
> > +     for_each_pipe_with_valid_output(&data.display, p, output) {
> >               struct igt_helper_process waiter = { };
> >               const unsigned int frames = 3;
> > -             unsigned int frame;
> >               uint64_t val[2];
> >   
> > -             batch[3] = MI_WAIT_FOR_EVENT;
> > +             batch[6] = MI_WAIT_FOR_EVENT;
> >               switch (p) {
> >               case PIPE_A:
> > -                     batch[3] |= MI_WAIT_FOR_PIPE_A_VBLANK;
> > +                     batch[6] |= MI_WAIT_FOR_PIPE_A_VBLANK;
> > +                     batch[5] = ~(1 << 3);
> >                       break;
> >               case PIPE_B:
> > -                     batch[3] |= MI_WAIT_FOR_PIPE_B_VBLANK;
> > +                     batch[6] |= MI_WAIT_FOR_PIPE_B_VBLANK;
> > +                     batch[5] = ~(1 << 11);
> >                       break;
> >               case PIPE_C:
> > -                     batch[3] |= MI_WAIT_FOR_PIPE_C_VBLANK;
> > +                     batch[6] |= MI_WAIT_FOR_PIPE_C_VBLANK;
> > +                     batch[5] = ~(1 << 21);
> >                       break;
> >               default:
> >                       continue;
> >               }
> >   
> > -             gem_write(gem_fd, bb_handle, 0, batch, sizeof(batch));
> > +             gem_write(gem_fd, obj.handle, 0, batch, sizeof(batch));
> >   
> >               data.pipe = p;
> >               prepare_crtc(&data, gem_fd, output);
> > @@ -589,9 +592,9 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
> >                       }
> >               }
> >   
> > -             for (frame = 0; frame < frames; frame++) {
> > +             for (unsigned int frame = 0; frame < frames; frame++) {
> >                       gem_execbuf(gem_fd, &eb);
> > -                     gem_sync(gem_fd, bb_handle);
> > +                     gem_sync(gem_fd, obj.handle);
> >               }
> >   
> >               igt_stop_helper(&waiter);
> > @@ -606,9 +609,7 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
> >               igt_assert(val[1] - val[0] > 0);
> >       }
> >   
> > -     gem_close(gem_fd, bb_handle);
> > -
> > -     intel_register_access_fini();
> > +     gem_close(gem_fd, obj.handle);
> >   
> >       igt_require_f(valid_tests,
> >                     "no valid crtc/connector combinations found\n");
> > 
> 
> It looks fine apart from the assumption that DERRMR always has to
> remain at its default of ~0. The worst I can imagine is that in the future we
> have to make this test force-disconnect all displays, should some of the
> reserved bits be used for something which we will be turning on by default.
> 
> But in that world, sampling the DERRMR value at test start is also only
> marginally better.

Added a comment that I am lazy, and pushed. Thanks,
-Chris
