* igt/perf_pmu roundup
@ 2017-11-23 8:22 Chris Wilson
2017-11-23 8:22 ` [PATCH igt 1/3] igt/perf_pmu: Reduce arbitrary delays Chris Wilson
` (4 more replies)
0 siblings, 5 replies; 15+ messages in thread
From: Chris Wilson @ 2017-11-23 8:22 UTC (permalink / raw)
To: intel-gfx
The stragglers for easy consumption and discussion.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH igt 1/3] igt/perf_pmu: Reduce arbitrary delays
2017-11-23 8:22 igt/perf_pmu roundup Chris Wilson
@ 2017-11-23 8:22 ` Chris Wilson
2017-11-23 13:35 ` Tvrtko Ursulin
2017-11-23 8:22 ` [PATCH igt 2/3] igt/perf_pmu: Stop peeking at intel_mmio registers Chris Wilson
` (3 subsequent siblings)
4 siblings, 1 reply; 15+ messages in thread
From: Chris Wilson @ 2017-11-23 8:22 UTC (permalink / raw)
To: intel-gfx
gem_quiescent_gpu() is supposed to ensure that the HW is idle, and in
the process kick the GPU into rc6, so we should not need a long delay
afterwards to ensure that we are indeed in rc6. We do however need a
small delay in order to be sure that the rc6 cycle counter has started and
stopped.
v2: Apply to rc6p as well.
v3: The longest rc6 timeout (before the HW kicks in and enables rc6 on
an idle GPU) is 50ms, so make sure that at least that time has passed
since we were busy.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
tests/perf_pmu.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
index ef5de658..5118c023 100644
--- a/tests/perf_pmu.c
+++ b/tests/perf_pmu.c
@@ -1003,7 +1003,7 @@ test_rc6(int gem_fd)
fd = open_pmu(I915_PMU_RC6_RESIDENCY);
gem_quiescent_gpu(gem_fd);
- usleep(1e6);
+ usleep(100e3); /* wait for the rc6 cycle counter to kick in */
/* Go idle and check full RC6. */
prev = pmu_read_single(fd);
@@ -1015,6 +1015,7 @@ test_rc6(int gem_fd)
/* Wake up device and check no RC6. */
fw = igt_open_forcewake_handle(gem_fd);
igt_assert(fw >= 0);
+ usleep(1e3); /* wait for the rc6 cycle counter to stop ticking */
prev = pmu_read_single(fd);
usleep(duration_ns / 1000);
@@ -1047,7 +1048,7 @@ test_rc6p(int gem_fd)
igt_require(num_pmu == 3);
gem_quiescent_gpu(gem_fd);
- usleep(1e6);
+ usleep(100e3); /* wait for the rc6 cycle counter to kick in */
/* Go idle and check full RC6. */
pmu_read_multi(fd, num_pmu, prev);
@@ -1060,6 +1061,7 @@ test_rc6p(int gem_fd)
/* Wake up device and check no RC6. */
fw = igt_open_forcewake_handle(gem_fd);
igt_assert(fw >= 0);
+ usleep(1e3); /* wait for the rc6 cycle counter to stop ticking */
pmu_read_multi(fd, num_pmu, prev);
usleep(duration_ns / 1000);
--
2.15.0
* [PATCH igt 2/3] igt/perf_pmu: Stop peeking at intel_mmio registers
2017-11-23 8:22 igt/perf_pmu roundup Chris Wilson
2017-11-23 8:22 ` [PATCH igt 1/3] igt/perf_pmu: Reduce arbitrary delays Chris Wilson
@ 2017-11-23 8:22 ` Chris Wilson
2017-11-24 8:55 ` Tvrtko Ursulin
2017-11-23 8:22 ` [PATCH igt 3/3] igt/perf_pmu: Idle the GPU before starting to measure busyness Chris Wilson
` (2 subsequent siblings)
4 siblings, 1 reply; 15+ messages in thread
From: Chris Wilson @ 2017-11-23 8:22 UTC (permalink / raw)
To: intel-gfx
Program the MI_WAIT_FOR_EVENT without reference to DERRMR by knowing its
state is ~0u when not in use, and is only in use when userspace requires
it. By not touching intel_register_access we completely eliminate the
risk that we leak the forcewake ref, which can cause later rc6 to fail.
At the same time, note that vlv/chv use a different mechanism (read:
none) for coupling between the render engine and display.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
tests/perf_pmu.c | 63 ++++++++++++++++++++++++++++----------------------------
1 file changed, 32 insertions(+), 31 deletions(-)
diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
index 5118c023..7fd1506e 100644
--- a/tests/perf_pmu.c
+++ b/tests/perf_pmu.c
@@ -498,20 +498,22 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
{
struct drm_i915_gem_exec_object2 obj = { };
struct drm_i915_gem_execbuffer2 eb = { };
- data_t data;
- igt_display_t *display = &data.display;
const uint32_t DERRMR = 0x44050;
+ const uint32_t FORCEWAKE_MT = 0xa188;
unsigned int valid_tests = 0;
- uint32_t batch[8], *b;
+ uint32_t batch[16], *b;
+ uint16_t devid;
igt_output_t *output;
- uint32_t bb_handle;
- uint32_t reg;
+ data_t data;
enum pipe p;
int fd;
- igt_require(intel_gen(intel_get_drm_devid(gem_fd)) >= 6);
- igt_require(intel_register_access_init(intel_get_pci_device(),
- false, gem_fd) == 0);
+ devid = intel_get_drm_devid(gem_fd);
+ igt_require(intel_gen(devid) >= 7);
+ igt_skip_on(IS_VALLEYVIEW(devid) || IS_CHERRYVIEW(devid));
+
+ kmstest_set_vt_graphics_mode();
+ igt_display_init(&data.display, gem_fd);
/**
* We will use the display to render event forwarding so need to
@@ -522,51 +524,52 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
* listen to the recorded time spent in engine wait state as reported
* by the PMU.
*/
- reg = intel_register_read(DERRMR);
-
- kmstest_set_vt_graphics_mode();
- igt_display_init(&data.display, gem_fd);
-
- bb_handle = gem_create(gem_fd, 4096);
+ obj.handle = gem_create(gem_fd, 4096);
b = batch;
*b++ = MI_LOAD_REGISTER_IMM;
+ *b++ = FORCEWAKE_MT;
+ *b++ = 2 << 16 | 2;
+ *b++ = MI_LOAD_REGISTER_IMM;
*b++ = DERRMR;
- *b++ = reg & ~((1 << 3) | (1 << 11) | (1 << 21));
- *b++ = MI_WAIT_FOR_EVENT | MI_WAIT_FOR_PIPE_A_VBLANK;
+ *b++ = ~0u;
+ *b++ = MI_WAIT_FOR_EVENT;
*b++ = MI_LOAD_REGISTER_IMM;
*b++ = DERRMR;
- *b++ = reg;
+ *b++ = ~0u;
+ *b++ = MI_LOAD_REGISTER_IMM;
+ *b++ = FORCEWAKE_MT;
+ *b++ = 2 << 16;
*b++ = MI_BATCH_BUFFER_END;
- obj.handle = bb_handle;
-
eb.buffer_count = 1;
eb.buffers_ptr = to_user_pointer(&obj);
eb.flags = e2ring(gem_fd, e) | I915_EXEC_SECURE;
- for_each_pipe_with_valid_output(display, p, output) {
+ for_each_pipe_with_valid_output(&data.display, p, output) {
struct igt_helper_process waiter = { };
const unsigned int frames = 3;
- unsigned int frame;
uint64_t val[2];
- batch[3] = MI_WAIT_FOR_EVENT;
+ batch[6] = MI_WAIT_FOR_EVENT;
switch (p) {
case PIPE_A:
- batch[3] |= MI_WAIT_FOR_PIPE_A_VBLANK;
+ batch[6] |= MI_WAIT_FOR_PIPE_A_VBLANK;
+ batch[5] = ~(1 << 3);
break;
case PIPE_B:
- batch[3] |= MI_WAIT_FOR_PIPE_B_VBLANK;
+ batch[6] |= MI_WAIT_FOR_PIPE_B_VBLANK;
+ batch[5] = ~(1 << 11);
break;
case PIPE_C:
- batch[3] |= MI_WAIT_FOR_PIPE_C_VBLANK;
+ batch[6] |= MI_WAIT_FOR_PIPE_C_VBLANK;
+ batch[5] = ~(1 << 21);
break;
default:
continue;
}
- gem_write(gem_fd, bb_handle, 0, batch, sizeof(batch));
+ gem_write(gem_fd, obj.handle, 0, batch, sizeof(batch));
data.pipe = p;
prepare_crtc(&data, gem_fd, output);
@@ -589,9 +592,9 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
}
}
- for (frame = 0; frame < frames; frame++) {
+ for (unsigned int frame = 0; frame < frames; frame++) {
gem_execbuf(gem_fd, &eb);
- gem_sync(gem_fd, bb_handle);
+ gem_sync(gem_fd, obj.handle);
}
igt_stop_helper(&waiter);
@@ -606,9 +609,7 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
igt_assert(val[1] - val[0] > 0);
}
- gem_close(gem_fd, bb_handle);
-
- intel_register_access_fini();
+ gem_close(gem_fd, obj.handle);
igt_require_f(valid_tests,
"no valid crtc/connector combinations found\n");
--
2.15.0
* [PATCH igt 3/3] igt/perf_pmu: Idle the GPU before starting to measure busyness
2017-11-23 8:22 igt/perf_pmu roundup Chris Wilson
2017-11-23 8:22 ` [PATCH igt 1/3] igt/perf_pmu: Reduce arbitrary delays Chris Wilson
2017-11-23 8:22 ` [PATCH igt 2/3] igt/perf_pmu: Stop peeking at intel_mmio registers Chris Wilson
@ 2017-11-23 8:22 ` Chris Wilson
2017-11-23 13:37 ` Tvrtko Ursulin
2017-11-23 18:53 ` ✓ Fi.CI.BAT: success for series starting with [1/3] igt/perf_pmu: Reduce arbitrary delays Patchwork
2017-11-23 22:14 ` ✓ Fi.CI.IGT: " Patchwork
4 siblings, 1 reply; 15+ messages in thread
From: Chris Wilson @ 2017-11-23 8:22 UTC (permalink / raw)
To: intel-gfx
Make sure the HW is idle before we start sampling the GPU for busyness.
If we do not rest for long enough between tests, we may carry the
sampling over.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
tests/perf_pmu.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
index 7fd1506e..9057a1d7 100644
--- a/tests/perf_pmu.c
+++ b/tests/perf_pmu.c
@@ -137,6 +137,9 @@ single(int gem_fd, const struct intel_execution_engine2 *e, bool busy)
uint64_t val;
int fd;
+ /* Ensure the HW is idle before beginning BUSY measurements */
+ gem_quiescent_gpu(gem_fd);
+
fd = open_pmu(I915_PMU_ENGINE_BUSY(e->class, e->instance));
if (busy) {
@@ -187,6 +190,9 @@ busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
igt_spin_t *spin;
unsigned int busy_idx, i;
+ /* Ensure the HW is idle before beginning BUSY measurements */
+ gem_quiescent_gpu(gem_fd);
+
i = 0;
fd[0] = -1;
for_each_engine_class_instance(fd, e_) {
@@ -232,6 +238,9 @@ most_busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
igt_spin_t *spin[num_engines];
unsigned int idle_idx, i;
+ /* Ensure the HW is idle before beginning BUSY measurements */
+ gem_quiescent_gpu(gem_fd);
+
gem_require_engine(gem_fd, e->class, e->instance);
i = 0;
@@ -287,6 +296,9 @@ all_busy_check_all(int gem_fd, const unsigned int num_engines)
igt_spin_t *spin[num_engines];
unsigned int i;
+ /* Ensure the HW is idle before beginning BUSY measurements */
+ gem_quiescent_gpu(gem_fd);
+
i = 0;
fd[0] = -1;
for_each_engine_class_instance(fd, e) {
@@ -624,6 +636,9 @@ multi_client(int gem_fd, const struct intel_execution_engine2 *e)
uint64_t val[2];
int fd[2];
+ /* Ensure the HW is idle before beginning BUSY measurements */
+ gem_quiescent_gpu(gem_fd);
+
fd[0] = open_pmu(config);
/*
@@ -739,6 +754,9 @@ static void cpu_hotplug(int gem_fd)
uint64_t val, ref;
int fd;
+ /* Ensure the HW is idle before beginning BUSY measurements */
+ gem_quiescent_gpu(gem_fd);
+
igt_require(cpu0_hotplug_support());
fd = perf_i915_open(I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_RENDER, 0));
--
2.15.0
* Re: [PATCH igt 1/3] igt/perf_pmu: Reduce arbitrary delays
2017-11-23 8:22 ` [PATCH igt 1/3] igt/perf_pmu: Reduce arbitrary delays Chris Wilson
@ 2017-11-23 13:35 ` Tvrtko Ursulin
0 siblings, 0 replies; 15+ messages in thread
From: Tvrtko Ursulin @ 2017-11-23 13:35 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
On 23/11/2017 08:22, Chris Wilson wrote:
> gem_quiescent_gpu() is supposed to ensure that the HW is idle, and in
> the process kick the GPU into rc6, so we should not need a long delay
> afterwards to ensure that we are indeed in rc6. We do however need a
> small delay in order to be sure that the rc6 cycle counter has started and
> stopped.
>
> v2: Apply to rc6p as well.
> v3: The longest rc6 timeout (before the HW kicks in and enables rc6 on
> an idle GPU) is 50ms, so make sure that at least that time has passed
> since we were busy.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
> tests/perf_pmu.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
> index ef5de658..5118c023 100644
> --- a/tests/perf_pmu.c
> +++ b/tests/perf_pmu.c
> @@ -1003,7 +1003,7 @@ test_rc6(int gem_fd)
> fd = open_pmu(I915_PMU_RC6_RESIDENCY);
>
> gem_quiescent_gpu(gem_fd);
> - usleep(1e6);
> + usleep(100e3); /* wait for the rc6 cycle counter to kick in */
>
> /* Go idle and check full RC6. */
> prev = pmu_read_single(fd);
> @@ -1015,6 +1015,7 @@ test_rc6(int gem_fd)
> /* Wake up device and check no RC6. */
> fw = igt_open_forcewake_handle(gem_fd);
> igt_assert(fw >= 0);
> + usleep(1e3); /* wait for the rc6 cycle counter to stop ticking */
>
> prev = pmu_read_single(fd);
> usleep(duration_ns / 1000);
> @@ -1047,7 +1048,7 @@ test_rc6p(int gem_fd)
> igt_require(num_pmu == 3);
>
> gem_quiescent_gpu(gem_fd);
> - usleep(1e6);
> + usleep(100e3); /* wait for the rc6 cycle counter to kick in */
>
> /* Go idle and check full RC6. */
> pmu_read_multi(fd, num_pmu, prev);
> @@ -1060,6 +1061,7 @@ test_rc6p(int gem_fd)
> /* Wake up device and check no RC6. */
> fw = igt_open_forcewake_handle(gem_fd);
> igt_assert(fw >= 0);
> + usleep(1e3); /* wait for the rc6 cycle counter to stop ticking */
>
> pmu_read_multi(fd, num_pmu, prev);
> usleep(duration_ns / 1000);
>
As we talked yesterday, I added the sleep thinking I was letting the GPU
go to RC6, but I missed the fact that gem_quiescent_gpu already does that. So,
as you explain here, it is actually about hardware behaviour.
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Regards,
Tvrtko
* Re: [PATCH igt 3/3] igt/perf_pmu: Idle the GPU before starting to measure busyness
2017-11-23 8:22 ` [PATCH igt 3/3] igt/perf_pmu: Idle the GPU before starting to measure busyness Chris Wilson
@ 2017-11-23 13:37 ` Tvrtko Ursulin
2017-11-23 16:02 ` Chris Wilson
0 siblings, 1 reply; 15+ messages in thread
From: Tvrtko Ursulin @ 2017-11-23 13:37 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
On 23/11/2017 08:22, Chris Wilson wrote:
> Make sure the HW is idle before we start sampling the GPU for busyness.
> If we do not rest for long enough between tests, we may carry the
> sampling over.
I'd rather not have this since as I said yesterday each opened PMU event
is supposed to record the initial counter value as reference. If that is
failing or not good enough on some tests/platforms I would rather first
understand why and how.
Regards,
Tvrtko
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
> tests/perf_pmu.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
> index 7fd1506e..9057a1d7 100644
> --- a/tests/perf_pmu.c
> +++ b/tests/perf_pmu.c
> @@ -137,6 +137,9 @@ single(int gem_fd, const struct intel_execution_engine2 *e, bool busy)
> uint64_t val;
> int fd;
>
> + /* Ensure the HW is idle before beginning BUSY measurements */
> + gem_quiescent_gpu(gem_fd);
> +
> fd = open_pmu(I915_PMU_ENGINE_BUSY(e->class, e->instance));
>
> if (busy) {
> @@ -187,6 +190,9 @@ busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
> igt_spin_t *spin;
> unsigned int busy_idx, i;
>
> + /* Ensure the HW is idle before beginning BUSY measurements */
> + gem_quiescent_gpu(gem_fd);
> +
> i = 0;
> fd[0] = -1;
> for_each_engine_class_instance(fd, e_) {
> @@ -232,6 +238,9 @@ most_busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
> igt_spin_t *spin[num_engines];
> unsigned int idle_idx, i;
>
> + /* Ensure the HW is idle before beginning BUSY measurements */
> + gem_quiescent_gpu(gem_fd);
> +
> gem_require_engine(gem_fd, e->class, e->instance);
>
> i = 0;
> @@ -287,6 +296,9 @@ all_busy_check_all(int gem_fd, const unsigned int num_engines)
> igt_spin_t *spin[num_engines];
> unsigned int i;
>
> + /* Ensure the HW is idle before beginning BUSY measurements */
> + gem_quiescent_gpu(gem_fd);
> +
> i = 0;
> fd[0] = -1;
> for_each_engine_class_instance(fd, e) {
> @@ -624,6 +636,9 @@ multi_client(int gem_fd, const struct intel_execution_engine2 *e)
> uint64_t val[2];
> int fd[2];
>
> + /* Ensure the HW is idle before beginning BUSY measurements */
> + gem_quiescent_gpu(gem_fd);
> +
> fd[0] = open_pmu(config);
>
> /*
> @@ -739,6 +754,9 @@ static void cpu_hotplug(int gem_fd)
> uint64_t val, ref;
> int fd;
>
> + /* Ensure the HW is idle before beginning BUSY measurements */
> + gem_quiescent_gpu(gem_fd);
> +
> igt_require(cpu0_hotplug_support());
>
> fd = perf_i915_open(I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_RENDER, 0));
>
* Re: [PATCH igt 3/3] igt/perf_pmu: Idle the GPU before starting to measure busyness
2017-11-23 13:37 ` Tvrtko Ursulin
@ 2017-11-23 16:02 ` Chris Wilson
2017-11-23 16:07 ` Chris Wilson
0 siblings, 1 reply; 15+ messages in thread
From: Chris Wilson @ 2017-11-23 16:02 UTC (permalink / raw)
To: Tvrtko Ursulin, intel-gfx
Quoting Tvrtko Ursulin (2017-11-23 13:37:30)
>
> On 23/11/2017 08:22, Chris Wilson wrote:
> > Make sure the HW is idle before we start sampling the GPU for busyness.
> > If we do not rest for long enough between tests, we may carry the
> > sampling over.
>
> I'd rather not have this since as I said yesterday each opened PMU event
> is supposed to record the initial counter value as reference. If that is
> failing or not good enough on some tests/platforms I would rather first
> understand why and how.
All legacy sampling fails... :| I'm putting it back!
-Chris
* Re: [PATCH igt 3/3] igt/perf_pmu: Idle the GPU before starting to measure busyness
2017-11-23 16:02 ` Chris Wilson
@ 2017-11-23 16:07 ` Chris Wilson
2017-11-24 7:32 ` Tvrtko Ursulin
0 siblings, 1 reply; 15+ messages in thread
From: Chris Wilson @ 2017-11-23 16:07 UTC (permalink / raw)
To: Tvrtko Ursulin, intel-gfx
Quoting Chris Wilson (2017-11-23 16:02:20)
> Quoting Tvrtko Ursulin (2017-11-23 13:37:30)
> >
> > On 23/11/2017 08:22, Chris Wilson wrote:
> > > Make sure the HW is idle before we start sampling the GPU for busyness.
> > > If we do not rest for long enough between tests, we may carry the
> > > sampling over.
> >
> > I'd rather not have this since as I said yesterday each opened PMU event
> > is supposed to record the initial counter value as reference. If that is
> > failing or not good enough on some tests/platforms I would rather first
> > understand why and how.
>
> All legacy sampling fails... :| I'm putting it back!
So presumably it is coupling with the parking.
-Chris
* ✓ Fi.CI.BAT: success for series starting with [1/3] igt/perf_pmu: Reduce arbitrary delays
2017-11-23 8:22 igt/perf_pmu roundup Chris Wilson
` (2 preceding siblings ...)
2017-11-23 8:22 ` [PATCH igt 3/3] igt/perf_pmu: Idle the GPU before starting to measure busyness Chris Wilson
@ 2017-11-23 18:53 ` Patchwork
2017-11-23 22:14 ` ✓ Fi.CI.IGT: " Patchwork
4 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2017-11-23 18:53 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: series starting with [1/3] igt/perf_pmu: Reduce arbitrary delays
URL : https://patchwork.freedesktop.org/series/34271/
State : success
== Summary ==
IGT patchset tested on top of latest successful build
a1e444f4c8178acb590d41c21e921c6447668be4 tests/perf_pmu: Bump measuring duration for semaphores as well
with latest DRM-Tip kernel build CI_DRM_3378
b407e5f38397 drm-tip: 2017y-11m-23d-16h-14m-59s UTC integration manifest
No testlist changes.
Test kms_pipe_crc_basic:
Subgroup suspend-read-crc-pipe-c:
incomplete -> PASS (fi-cfl-s2)
fi-bdw-5557u total:289 pass:268 dwarn:0 dfail:0 fail:0 skip:21 time:447s
fi-bdw-gvtdvm total:289 pass:265 dwarn:0 dfail:0 fail:0 skip:24 time:457s
fi-blb-e6850 total:289 pass:223 dwarn:1 dfail:0 fail:0 skip:65 time:388s
fi-bsw-n3050 total:289 pass:243 dwarn:0 dfail:0 fail:0 skip:46 time:549s
fi-bwr-2160 total:289 pass:183 dwarn:0 dfail:0 fail:0 skip:106 time:279s
fi-bxt-dsi total:289 pass:259 dwarn:0 dfail:0 fail:0 skip:30 time:514s
fi-bxt-j4205 total:289 pass:260 dwarn:0 dfail:0 fail:0 skip:29 time:521s
fi-byt-j1900 total:289 pass:254 dwarn:0 dfail:0 fail:0 skip:35 time:513s
fi-byt-n2820 total:289 pass:250 dwarn:0 dfail:0 fail:0 skip:39 time:496s
fi-cfl-s2 total:289 pass:263 dwarn:0 dfail:0 fail:0 skip:26 time:606s
fi-elk-e7500 total:289 pass:229 dwarn:0 dfail:0 fail:0 skip:60 time:429s
fi-gdg-551 total:289 pass:178 dwarn:1 dfail:0 fail:1 skip:109 time:266s
fi-glk-1 total:289 pass:261 dwarn:0 dfail:0 fail:0 skip:28 time:544s
fi-hsw-4770 total:289 pass:262 dwarn:0 dfail:0 fail:0 skip:27 time:432s
fi-hsw-4770r total:289 pass:262 dwarn:0 dfail:0 fail:0 skip:27 time:441s
fi-ilk-650 total:289 pass:228 dwarn:0 dfail:0 fail:0 skip:61 time:434s
fi-ivb-3520m total:289 pass:260 dwarn:0 dfail:0 fail:0 skip:29 time:484s
fi-ivb-3770 total:289 pass:260 dwarn:0 dfail:0 fail:0 skip:29 time:465s
fi-pnv-d510 total:289 pass:222 dwarn:1 dfail:0 fail:0 skip:66 time:579s
fi-skl-6260u total:289 pass:269 dwarn:0 dfail:0 fail:0 skip:20 time:454s
fi-skl-6600u total:289 pass:262 dwarn:0 dfail:0 fail:0 skip:27 time:542s
fi-skl-6700hq total:289 pass:263 dwarn:0 dfail:0 fail:0 skip:26 time:565s
fi-skl-6700k total:289 pass:265 dwarn:0 dfail:0 fail:0 skip:24 time:524s
fi-skl-6770hq total:289 pass:269 dwarn:0 dfail:0 fail:0 skip:20 time:504s
fi-skl-gvtdvm total:289 pass:266 dwarn:0 dfail:0 fail:0 skip:23 time:464s
fi-snb-2520m total:289 pass:250 dwarn:0 dfail:0 fail:0 skip:39 time:565s
fi-snb-2600 total:289 pass:249 dwarn:0 dfail:0 fail:0 skip:40 time:438s
Blacklisted hosts:
fi-cnl-y total:237 pass:212 dwarn:0 dfail:0 fail:0 skip:24
fi-glk-dsi total:289 pass:258 dwarn:0 dfail:0 fail:1 skip:30 time:515s
fi-kbl-7500u total:289 pass:264 dwarn:1 dfail:0 fail:0 skip:24 time:483s
fi-kbl-7560u total:289 pass:270 dwarn:0 dfail:0 fail:0 skip:19 time:531s
fi-kbl-7567u total:289 pass:269 dwarn:0 dfail:0 fail:0 skip:20 time:476s
fi-kbl-r total:289 pass:262 dwarn:0 dfail:0 fail:0 skip:27 time:536s
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_541/
* ✓ Fi.CI.IGT: success for series starting with [1/3] igt/perf_pmu: Reduce arbitrary delays
2017-11-23 8:22 igt/perf_pmu roundup Chris Wilson
` (3 preceding siblings ...)
2017-11-23 18:53 ` ✓ Fi.CI.BAT: success for series starting with [1/3] igt/perf_pmu: Reduce arbitrary delays Patchwork
@ 2017-11-23 22:14 ` Patchwork
4 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2017-11-23 22:14 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: series starting with [1/3] igt/perf_pmu: Reduce arbitrary delays
URL : https://patchwork.freedesktop.org/series/34271/
State : success
== Summary ==
Test kms_cursor_crc:
Subgroup cursor-256x256-suspend:
pass -> SKIP (shard-snb) fdo#102365
fdo#102365 https://bugs.freedesktop.org/show_bug.cgi?id=102365
shard-hsw total:2667 pass:1534 dwarn:1 dfail:0 fail:11 skip:1121 time:9593s
shard-snb total:2667 pass:1311 dwarn:1 dfail:0 fail:12 skip:1343 time:8101s
Blacklisted hosts:
shard-apl total:2645 pass:1665 dwarn:1 dfail:0 fail:24 skip:954 time:13254s
shard-kbl total:2595 pass:1734 dwarn:19 dfail:2 fail:24 skip:815 time:10756s
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_541/shards.html
* Re: [PATCH igt 3/3] igt/perf_pmu: Idle the GPU before starting to measure busyness
2017-11-23 16:07 ` Chris Wilson
@ 2017-11-24 7:32 ` Tvrtko Ursulin
2017-11-24 9:15 ` Chris Wilson
0 siblings, 1 reply; 15+ messages in thread
From: Tvrtko Ursulin @ 2017-11-24 7:32 UTC (permalink / raw)
To: Chris Wilson, Tvrtko Ursulin, intel-gfx
On 23/11/17 16:07, Chris Wilson wrote:
> Quoting Chris Wilson (2017-11-23 16:02:20)
>> Quoting Tvrtko Ursulin (2017-11-23 13:37:30)
>>>
>>> On 23/11/2017 08:22, Chris Wilson wrote:
>>>> Make sure the HW is idle before we start sampling the GPU for busyness.
>>>> If we do not rest for long enough between tests, we may carry the
>>>> sampling over.
>>>
>>> I'd rather not have this since as I said yesterday each opened PMU event
>>> is supposed to record the initial counter value as reference. If that is
>>> failing or not good enough on some tests/platforms I would rather first
>>> understand why and how.
>>
>> All legacy sampling fails... :| I'm putting it back!
>
> So presumably it is coupling with the parking.
Or more precisely, averaging with the previous sample. Every time one PMU
client signals the timer to self-disarm and a new PMU client enables
the timer again within this window, the previous sample value does not get
cleared.
I tried this for instance:
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 650089b30d46..a5ca27eeef40 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -147,9 +147,26 @@ void i915_pmu_gt_parked(struct drm_i915_private *i915)
spin_unlock_irq(&i915->pmu.lock);
}
+static void pmu_init_previous_samples(struct drm_i915_private *i915)
+{
+ struct intel_engine_cs *engine;
+ enum intel_engine_id id;
+ unsigned int i;
+
+ for_each_engine(engine, i915, id) {
+ for (i = 0; i < ARRAY_SIZE(engine->pmu.sample); i++)
+ engine->pmu.sample[i].prev = 0;
+ }
+
+ for (i = 0; i < ARRAY_SIZE(i915->pmu.sample); i++)
+ i915->pmu.sample[i].prev = i915->gt_pm.rps.idle_freq;
+}
+
static void __i915_pmu_maybe_start_timer(struct drm_i915_private *i915)
{
if (!i915->pmu.timer_enabled && pmu_needs_timer(i915, true)) {
+ hrtimer_cancel(&i915->pmu.timer);
+ pmu_init_previous_samples(i915);
i915->pmu.timer_enabled = true;
hrtimer_start_range_ns(&i915->pmu.timer,
ns_to_ktime(PERIOD), 0,
@@ -262,31 +279,13 @@ static void frequency_sample(struct drm_i915_private *dev_priv)
}
}
-static void pmu_init_previous_samples(struct drm_i915_private *i915)
-{
- struct intel_engine_cs *engine;
- enum intel_engine_id id;
- unsigned int i;
-
- for_each_engine(engine, i915, id) {
- for (i = 0; i < ARRAY_SIZE(engine->pmu.sample); i++)
- engine->pmu.sample[i].prev = 0;
- }
-
- for (i = 0; i < ARRAY_SIZE(i915->pmu.sample); i++)
- i915->pmu.sample[i].prev = i915->gt_pm.rps.idle_freq;
-}
-
static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
{
struct drm_i915_private *i915 =
container_of(hrtimer, struct drm_i915_private, pmu.timer);
- if (!READ_ONCE(i915->pmu.timer_enabled)) {
- pmu_init_previous_samples(i915);
-
+ if (!READ_ONCE(i915->pmu.timer_enabled))
return HRTIMER_NORESTART;
- }
engines_sample(i915);
frequency_sample(i915);
@@ -847,8 +846,6 @@ void i915_pmu_register(struct drm_i915_private *i915)
hrtimer_init(&i915->pmu.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
i915->pmu.timer.function = i915_sample;
- pmu_init_previous_samples(i915);
-
for_each_engine(engine, i915, id)
INIT_DELAYED_WORK(&engine->pmu.disable_busy_stats,
__disable_busy_stats);
---
But it is too evil to wait for the sampling timer in case of MMIO in the
irq-disabled section. Even though it would happen only on disable-enable
transitions quicker than the sampling period. It is not a use case, but still..
Or we could drop sample averaging.. ?
Regards,
Tvrtko
* Re: [PATCH igt 2/3] igt/perf_pmu: Stop peeking at intel_mmio registers
2017-11-23 8:22 ` [PATCH igt 2/3] igt/perf_pmu: Stop peeking at intel_mmio registers Chris Wilson
@ 2017-11-24 8:55 ` Tvrtko Ursulin
2017-11-24 9:20 ` Chris Wilson
2017-11-24 10:47 ` Chris Wilson
0 siblings, 2 replies; 15+ messages in thread
From: Tvrtko Ursulin @ 2017-11-24 8:55 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
On 23/11/2017 08:22, Chris Wilson wrote:
> Program the MI_WAIT_FOR_EVENT without reference to DERRMR by knowing its
> state is ~0u when not in use, and is only in use when userspace requires
> it. By not touching intel_register_access we completely eliminate the
> risk that we leak the forcewake ref, which can cause later rc6 to fail.
>
> At the same time, note that vlv/chv use a different mechanism (read:
> none) for coupling between the render engine and display.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
> tests/perf_pmu.c | 63 ++++++++++++++++++++++++++++----------------------------
> 1 file changed, 32 insertions(+), 31 deletions(-)
>
> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
> index 5118c023..7fd1506e 100644
> --- a/tests/perf_pmu.c
> +++ b/tests/perf_pmu.c
> @@ -498,20 +498,22 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
> {
> struct drm_i915_gem_exec_object2 obj = { };
> struct drm_i915_gem_execbuffer2 eb = { };
> - data_t data;
> - igt_display_t *display = &data.display;
> const uint32_t DERRMR = 0x44050;
> + const uint32_t FORCEWAKE_MT = 0xa188;
> unsigned int valid_tests = 0;
> - uint32_t batch[8], *b;
> + uint32_t batch[16], *b;
> + uint16_t devid;
> igt_output_t *output;
> - uint32_t bb_handle;
> - uint32_t reg;
> + data_t data;
> enum pipe p;
> int fd;
>
> - igt_require(intel_gen(intel_get_drm_devid(gem_fd)) >= 6);
> - igt_require(intel_register_access_init(intel_get_pci_device(),
> - false, gem_fd) == 0);
> + devid = intel_get_drm_devid(gem_fd);
> + igt_require(intel_gen(devid) >= 7);
> + igt_skip_on(IS_VALLEYVIEW(devid) || IS_CHERRYVIEW(devid));
> +
> + kmstest_set_vt_graphics_mode();
> + igt_display_init(&data.display, gem_fd);
>
> /**
> * We will use the display to render event forwarding so need to
> @@ -522,51 +524,52 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
> * listen to the recorded time spent in engine wait state as reported
> * by the PMU.
> */
> - reg = intel_register_read(DERRMR);
> -
> - kmstest_set_vt_graphics_mode();
> - igt_display_init(&data.display, gem_fd);
> -
> - bb_handle = gem_create(gem_fd, 4096);
> + obj.handle = gem_create(gem_fd, 4096);
>
> b = batch;
> *b++ = MI_LOAD_REGISTER_IMM;
> + *b++ = FORCEWAKE_MT;
> + *b++ = 2 << 16 | 2;
> + *b++ = MI_LOAD_REGISTER_IMM;
> *b++ = DERRMR;
> - *b++ = reg & ~((1 << 3) | (1 << 11) | (1 << 21));
> - *b++ = MI_WAIT_FOR_EVENT | MI_WAIT_FOR_PIPE_A_VBLANK;
> + *b++ = ~0u;
> + *b++ = MI_WAIT_FOR_EVENT;
> *b++ = MI_LOAD_REGISTER_IMM;
> *b++ = DERRMR;
> - *b++ = reg;
> + *b++ = ~0u;
> + *b++ = MI_LOAD_REGISTER_IMM;
> + *b++ = FORCEWAKE_MT;
> + *b++ = 2 << 16;
> *b++ = MI_BATCH_BUFFER_END;
>
> - obj.handle = bb_handle;
> -
> eb.buffer_count = 1;
> eb.buffers_ptr = to_user_pointer(&obj);
> eb.flags = e2ring(gem_fd, e) | I915_EXEC_SECURE;
>
> - for_each_pipe_with_valid_output(display, p, output) {
> + for_each_pipe_with_valid_output(&data.display, p, output) {
> struct igt_helper_process waiter = { };
> const unsigned int frames = 3;
> - unsigned int frame;
> uint64_t val[2];
>
> - batch[3] = MI_WAIT_FOR_EVENT;
> + batch[6] = MI_WAIT_FOR_EVENT;
> switch (p) {
> case PIPE_A:
> - batch[3] |= MI_WAIT_FOR_PIPE_A_VBLANK;
> + batch[6] |= MI_WAIT_FOR_PIPE_A_VBLANK;
> + batch[5] = ~(1 << 3);
> break;
> case PIPE_B:
> - batch[3] |= MI_WAIT_FOR_PIPE_B_VBLANK;
> + batch[6] |= MI_WAIT_FOR_PIPE_B_VBLANK;
> + batch[5] = ~(1 << 11);
> break;
> case PIPE_C:
> - batch[3] |= MI_WAIT_FOR_PIPE_C_VBLANK;
> + batch[6] |= MI_WAIT_FOR_PIPE_C_VBLANK;
> + batch[5] = ~(1 << 21);
> break;
> default:
> continue;
> }
>
> - gem_write(gem_fd, bb_handle, 0, batch, sizeof(batch));
> + gem_write(gem_fd, obj.handle, 0, batch, sizeof(batch));
>
> data.pipe = p;
> prepare_crtc(&data, gem_fd, output);
> @@ -589,9 +592,9 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
> }
> }
>
> - for (frame = 0; frame < frames; frame++) {
> + for (unsigned int frame = 0; frame < frames; frame++) {
> gem_execbuf(gem_fd, &eb);
> - gem_sync(gem_fd, bb_handle);
> + gem_sync(gem_fd, obj.handle);
> }
>
> igt_stop_helper(&waiter);
> @@ -606,9 +609,7 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
> igt_assert(val[1] - val[0] > 0);
> }
>
> - gem_close(gem_fd, bb_handle);
> -
> - intel_register_access_fini();
> + gem_close(gem_fd, obj.handle);
>
> igt_require_f(valid_tests,
> "no valid crtc/connector combinations found\n");
>
It looks fine apart from the assumption that DERRMR always has to
remain at the default ~0. The worst I can imagine is that in the future
we have to make this test force-disconnect all displays, should some of
the reserved bits be used for something which we will be turning on by
default.
But in that world, sampling the DERRMR value at test start is also only
marginally better.
Anyway, we can worry about that if it happens.
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH igt 3/3] igt/perf_pmu: Idle the GPU before starting to measure busyness
2017-11-24 7:32 ` Tvrtko Ursulin
@ 2017-11-24 9:15 ` Chris Wilson
0 siblings, 0 replies; 15+ messages in thread
From: Chris Wilson @ 2017-11-24 9:15 UTC (permalink / raw)
To: Tvrtko Ursulin, Tvrtko Ursulin, intel-gfx
Quoting Tvrtko Ursulin (2017-11-24 07:32:30)
>
> On 23/11/17 16:07, Chris Wilson wrote:
> > Quoting Chris Wilson (2017-11-23 16:02:20)
> >> Quoting Tvrtko Ursulin (2017-11-23 13:37:30)
> >>>
> >>> On 23/11/2017 08:22, Chris Wilson wrote:
> >>>> Make sure the HW is idle before we start sampling the GPU for busyness.
> >>>> If we do not rest for long enough between tests, we may carry the
> >>>> sampling over.
> >>>
> >>> I'd rather not have this since as I said yesterday each opened PMU event
> >>> is supposed to record the initial counter value as reference. If that is
> >>> failing or not good enough on some tests/platforms I would rather first
> >>> understand why and how.
> >>
> >> All legacy sampling fails... :| I'm putting it back!
> >
> > So presumably it is coupling with the parking.
>
> Or more precisely, averaging with the previous sample. Whenever one
> PMU client signals the timer to self-disarm but a new PMU client
> enables the timer again within this window, the previous sample value
> does not get cleared.
[snip]
> But it is too evil to wait for the sampling timer in the case of MMIO
> in the irq-disabled section, even though it would happen only on
> disable-enable transitions quicker than the sampling period. It is
> not a use case, but still..
>
> Or we could drop sample averaging.. ?
Yeah, we are accumulating too much cruft for a small improvement (that
only affects the endpoints).
-Chris
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH igt 2/3] igt/perf_pmu: Stop peeking at intel_mmio registers
2017-11-24 8:55 ` Tvrtko Ursulin
@ 2017-11-24 9:20 ` Chris Wilson
2017-11-24 10:47 ` Chris Wilson
1 sibling, 0 replies; 15+ messages in thread
From: Chris Wilson @ 2017-11-24 9:20 UTC (permalink / raw)
To: Tvrtko Ursulin, intel-gfx
Quoting Tvrtko Ursulin (2017-11-24 08:55:13)
>
> On 23/11/2017 08:22, Chris Wilson wrote:
> > @@ -522,51 +524,52 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
> > * listen to the recorded time spent in engine wait state as reported
> > * by the PMU.
> > */
> > - reg = intel_register_read(DERRMR);
> > -
> > - kmstest_set_vt_graphics_mode();
> > - igt_display_init(&data.display, gem_fd);
> > -
> > - bb_handle = gem_create(gem_fd, 4096);
> > + obj.handle = gem_create(gem_fd, 4096);
> >
> > b = batch;
> > *b++ = MI_LOAD_REGISTER_IMM;
> > + *b++ = FORCEWAKE_MT;
> > + *b++ = 2 << 16 | 2;
> > + *b++ = MI_LOAD_REGISTER_IMM;
> > *b++ = DERRMR;
> > - *b++ = reg & ~((1 << 3) | (1 << 11) | (1 << 21));
> > - *b++ = MI_WAIT_FOR_EVENT | MI_WAIT_FOR_PIPE_A_VBLANK;
> > + *b++ = ~0u;
> > + *b++ = MI_WAIT_FOR_EVENT;
> > *b++ = MI_LOAD_REGISTER_IMM;
> > *b++ = DERRMR;
> > - *b++ = reg;
> > + *b++ = ~0u;
> > + *b++ = MI_LOAD_REGISTER_IMM;
> > + *b++ = FORCEWAKE_MT;
> > + *b++ = 2 << 16;
> > *b++ = MI_BATCH_BUFFER_END;
> >
> It looks fine apart from the assumption that DERRMR always has to
> remain at the default ~0. The worst I can imagine is that in the
> future we have to make this test force-disconnect all displays, should
> some of the reserved bits be used for something which we will be
> turning on by default.
I think more of the HW designers than that they will require routing
Display Engine messages to the Render Ring at all times! What are we
going to do with the messages we are not using in our batches? ;)
I believe the point of disabling it by default was that it was causing a
coupling between different powerwells across the other side of the chip,
and so we had to explicitly enable messages we wanted in the batch. I
don't imagine that they will stuff a different meaning into it, rather
they will just scrap it and start again (like every other gen).
-Chris
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH igt 2/3] igt/perf_pmu: Stop peeking at intel_mmio registers
2017-11-24 8:55 ` Tvrtko Ursulin
2017-11-24 9:20 ` Chris Wilson
@ 2017-11-24 10:47 ` Chris Wilson
1 sibling, 0 replies; 15+ messages in thread
From: Chris Wilson @ 2017-11-24 10:47 UTC (permalink / raw)
To: Tvrtko Ursulin, intel-gfx
Quoting Tvrtko Ursulin (2017-11-24 08:55:13)
>
> On 23/11/2017 08:22, Chris Wilson wrote:
> > Program the MI_WAIT_FOR_EVENT without reference to DERRMR by knowing its
> > state is ~0u when not in use, and is only in use when userspace requires
> > it. By not touching intel_register_access we completely eliminate the
> > risk that we leak the forcewake ref, which can cause later rc6 to fail.
> >
> > At the same time, note that vlv/chv use a different mechanism (read
> > none) for coupling between the render engine and display.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> > tests/perf_pmu.c | 63 ++++++++++++++++++++++++++++----------------------------
> > 1 file changed, 32 insertions(+), 31 deletions(-)
> >
> > diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
> > index 5118c023..7fd1506e 100644
> > --- a/tests/perf_pmu.c
> > +++ b/tests/perf_pmu.c
> > @@ -498,20 +498,22 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
> > {
> > struct drm_i915_gem_exec_object2 obj = { };
> > struct drm_i915_gem_execbuffer2 eb = { };
> > - data_t data;
> > - igt_display_t *display = &data.display;
> > const uint32_t DERRMR = 0x44050;
> > + const uint32_t FORCEWAKE_MT = 0xa188;
> > unsigned int valid_tests = 0;
> > - uint32_t batch[8], *b;
> > + uint32_t batch[16], *b;
> > + uint16_t devid;
> > igt_output_t *output;
> > - uint32_t bb_handle;
> > - uint32_t reg;
> > + data_t data;
> > enum pipe p;
> > int fd;
> >
> > - igt_require(intel_gen(intel_get_drm_devid(gem_fd)) >= 6);
> > - igt_require(intel_register_access_init(intel_get_pci_device(),
> > - false, gem_fd) == 0);
> > + devid = intel_get_drm_devid(gem_fd);
> > + igt_require(intel_gen(devid) >= 7);
> > + igt_skip_on(IS_VALLEYVIEW(devid) || IS_CHERRYVIEW(devid));
> > +
> > + kmstest_set_vt_graphics_mode();
> > + igt_display_init(&data.display, gem_fd);
> >
> > /**
> > * We will use the display to render event forwarind so need to
> > @@ -522,51 +524,52 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
> > * listen to the recorded time spent in engine wait state as reported
> > * by the PMU.
> > */
> > - reg = intel_register_read(DERRMR);
> > -
> > - kmstest_set_vt_graphics_mode();
> > - igt_display_init(&data.display, gem_fd);
> > -
> > - bb_handle = gem_create(gem_fd, 4096);
> > + obj.handle = gem_create(gem_fd, 4096);
> >
> > b = batch;
> > *b++ = MI_LOAD_REGISTER_IMM;
> > + *b++ = FORCEWAKE_MT;
> > + *b++ = 2 << 16 | 2;
> > + *b++ = MI_LOAD_REGISTER_IMM;
> > *b++ = DERRMR;
> > - *b++ = reg & ~((1 << 3) | (1 << 11) | (1 << 21));
> > - *b++ = MI_WAIT_FOR_EVENT | MI_WAIT_FOR_PIPE_A_VBLANK;
> > + *b++ = ~0u;
> > + *b++ = MI_WAIT_FOR_EVENT;
> > *b++ = MI_LOAD_REGISTER_IMM;
> > *b++ = DERRMR;
> > - *b++ = reg;
> > + *b++ = ~0u;
> > + *b++ = MI_LOAD_REGISTER_IMM;
> > + *b++ = FORCEWAKE_MT;
> > + *b++ = 2 << 16;
> > *b++ = MI_BATCH_BUFFER_END;
> >
> > - obj.handle = bb_handle;
> > -
> > eb.buffer_count = 1;
> > eb.buffers_ptr = to_user_pointer(&obj);
> > eb.flags = e2ring(gem_fd, e) | I915_EXEC_SECURE;
> >
> > - for_each_pipe_with_valid_output(display, p, output) {
> > + for_each_pipe_with_valid_output(&data.display, p, output) {
> > struct igt_helper_process waiter = { };
> > const unsigned int frames = 3;
> > - unsigned int frame;
> > uint64_t val[2];
> >
> > - batch[3] = MI_WAIT_FOR_EVENT;
> > + batch[6] = MI_WAIT_FOR_EVENT;
> > switch (p) {
> > case PIPE_A:
> > - batch[3] |= MI_WAIT_FOR_PIPE_A_VBLANK;
> > + batch[6] |= MI_WAIT_FOR_PIPE_A_VBLANK;
> > + batch[5] = ~(1 << 3);
> > break;
> > case PIPE_B:
> > - batch[3] |= MI_WAIT_FOR_PIPE_B_VBLANK;
> > + batch[6] |= MI_WAIT_FOR_PIPE_B_VBLANK;
> > + batch[5] = ~(1 << 11);
> > break;
> > case PIPE_C:
> > - batch[3] |= MI_WAIT_FOR_PIPE_C_VBLANK;
> > + batch[6] |= MI_WAIT_FOR_PIPE_C_VBLANK;
> > + batch[5] = ~(1 << 21);
> > break;
> > default:
> > continue;
> > }
> >
> > - gem_write(gem_fd, bb_handle, 0, batch, sizeof(batch));
> > + gem_write(gem_fd, obj.handle, 0, batch, sizeof(batch));
> >
> > data.pipe = p;
> > prepare_crtc(&data, gem_fd, output);
> > @@ -589,9 +592,9 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
> > }
> > }
> >
> > - for (frame = 0; frame < frames; frame++) {
> > + for (unsigned int frame = 0; frame < frames; frame++) {
> > gem_execbuf(gem_fd, &eb);
> > - gem_sync(gem_fd, bb_handle);
> > + gem_sync(gem_fd, obj.handle);
> > }
> >
> > igt_stop_helper(&waiter);
> > @@ -606,9 +609,7 @@ event_wait(int gem_fd, const struct intel_execution_engine2 *e)
> > igt_assert(val[1] - val[0] > 0);
> > }
> >
> > - gem_close(gem_fd, bb_handle);
> > -
> > - intel_register_access_fini();
> > + gem_close(gem_fd, obj.handle);
> >
> > igt_require_f(valid_tests,
> > "no valid crtc/connector combinations found\n");
> >
>
> It looks fine apart from the assumption that DERRMR always has to
> remain at the default ~0. The worst I can imagine is that in the
> future we have to make this test force-disconnect all displays, should
> some of the reserved bits be used for something which we will be
> turning on by default.
>
> But in that world, sampling the DERRMR value at test start is also
> only marginally better.
Added a comment that I am lazy, and pushed. Thanks,
-Chris
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2017-11-24 10:47 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-23 8:22 igt/perf_pmu roundup Chris Wilson
2017-11-23 8:22 ` [PATCH igt 1/3] igt/perf_pmu: Reduce arbitrary delays Chris Wilson
2017-11-23 13:35 ` Tvrtko Ursulin
2017-11-23 8:22 ` [PATCH igt 2/3] igt/perf_pmu: Stop peeking at intel_mmio registers Chris Wilson
2017-11-24 8:55 ` Tvrtko Ursulin
2017-11-24 9:20 ` Chris Wilson
2017-11-24 10:47 ` Chris Wilson
2017-11-23 8:22 ` [PATCH igt 3/3] igt/perf_pmu: Idle the GPU before starting to measure busyness Chris Wilson
2017-11-23 13:37 ` Tvrtko Ursulin
2017-11-23 16:02 ` Chris Wilson
2017-11-23 16:07 ` Chris Wilson
2017-11-24 7:32 ` Tvrtko Ursulin
2017-11-24 9:15 ` Chris Wilson
2017-11-23 18:53 ` ✓ Fi.CI.BAT: success for series starting with [1/3] igt/perf_pmu: Reduce arbitrary delays Patchwork
2017-11-23 22:14 ` ✓ Fi.CI.IGT: " Patchwork