All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH i-g-t 0/5] Queued/runnable/running engine stats
@ 2018-03-19 18:22 ` Tvrtko Ursulin
  0 siblings, 0 replies; 26+ messages in thread
From: Tvrtko Ursulin @ 2018-03-19 18:22 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

IGT patches for the identicaly named i915 series, including:

 * Engine queue depths for intel-gpu-overlay (including load average).
 * Tests for new PMU counters.
 * Tests for the query API.

Tests have been tested (!) only on Skylake so YMMV. Also they depend on one yet
unmerged IGT patch so won't compile in this form. Sending to show (most) of the
feature is close to ready.

Tvrtko Ursulin (5):
  include: i915 uAPI headers
  intel-gpu-overlay: Add engine queue stats
  intel-gpu-overlay: Show 1s, 30s and 15m GPU load
  tests/perf_pmu: Add tests for engine queued/runnable/running stats
  tests/i915_query: Engine queues tests

 include/drm-uapi/i915_drm.h |  19 ++-
 overlay/gpu-top.c           |  81 +++++++++-
 overlay/gpu-top.h           |  22 ++-
 overlay/overlay.c           |  35 +++-
 tests/i915_query.c          | 381 ++++++++++++++++++++++++++++++++++++++++++++
 tests/perf_pmu.c            | 224 ++++++++++++++++++++++++++
 6 files changed, 753 insertions(+), 9 deletions(-)

-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Intel-gfx] [PATCH i-g-t 0/5] Queued/runnable/running engine stats
@ 2018-03-19 18:22 ` Tvrtko Ursulin
  0 siblings, 0 replies; 26+ messages in thread
From: Tvrtko Ursulin @ 2018-03-19 18:22 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

IGT patches for the identicaly named i915 series, including:

 * Engine queue depths for intel-gpu-overlay (including load average).
 * Tests for new PMU counters.
 * Tests for the query API.

Tests have been tested (!) only on Skylake so YMMV. Also they depend on one yet
unmerged IGT patch so won't compile in this form. Sending to show (most) of the
feature is close to ready.

Tvrtko Ursulin (5):
  include: i915 uAPI headers
  intel-gpu-overlay: Add engine queue stats
  intel-gpu-overlay: Show 1s, 30s and 15m GPU load
  tests/perf_pmu: Add tests for engine queued/runnable/running stats
  tests/i915_query: Engine queues tests

 include/drm-uapi/i915_drm.h |  19 ++-
 overlay/gpu-top.c           |  81 +++++++++-
 overlay/gpu-top.h           |  22 ++-
 overlay/overlay.c           |  35 +++-
 tests/i915_query.c          | 381 ++++++++++++++++++++++++++++++++++++++++++++
 tests/perf_pmu.c            | 224 ++++++++++++++++++++++++++
 6 files changed, 753 insertions(+), 9 deletions(-)

-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH i-g-t 1/5] include: i915 uAPI headers
  2018-03-19 18:22 ` [Intel-gfx] " Tvrtko Ursulin
@ 2018-03-19 18:22   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 26+ messages in thread
From: Tvrtko Ursulin @ 2018-03-19 18:22 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Temporary up to date uAPI headers.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 include/drm-uapi/i915_drm.h | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/drm-uapi/i915_drm.h b/include/drm-uapi/i915_drm.h
index 16e452aa12d4..14c7e790f6ed 100644
--- a/include/drm-uapi/i915_drm.h
+++ b/include/drm-uapi/i915_drm.h
@@ -110,9 +110,17 @@ enum drm_i915_gem_engine_class {
 enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_BUSY = 0,
 	I915_SAMPLE_WAIT = 1,
-	I915_SAMPLE_SEMA = 2
+	I915_SAMPLE_SEMA = 2,
+	I915_SAMPLE_QUEUED = 3,
+	I915_SAMPLE_RUNNABLE = 4,
+	I915_SAMPLE_RUNNING = 5,
 };
 
+ /* Divide counter value by divisor to get the real value. */
+#define I915_SAMPLE_QUEUED_DIVISOR (1024)
+#define I915_SAMPLE_RUNNABLE_DIVISOR (1024)
+#define I915_SAMPLE_RUNNING_DIVISOR (1024)
+
 #define I915_PMU_SAMPLE_BITS (4)
 #define I915_PMU_SAMPLE_MASK (0xf)
 #define I915_PMU_SAMPLE_INSTANCE_BITS (8)
@@ -133,6 +141,15 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_SEMA(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
 
+#define I915_PMU_ENGINE_QUEUED(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED)
+
+#define I915_PMU_ENGINE_RUNNABLE(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNABLE)
+
+#define I915_PMU_ENGINE_RUNNING(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNING)
+
 #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [igt-dev] [PATCH i-g-t 1/5] include: i915 uAPI headers
@ 2018-03-19 18:22   ` Tvrtko Ursulin
  0 siblings, 0 replies; 26+ messages in thread
From: Tvrtko Ursulin @ 2018-03-19 18:22 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Temporary up to date uAPI headers.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 include/drm-uapi/i915_drm.h | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/drm-uapi/i915_drm.h b/include/drm-uapi/i915_drm.h
index 16e452aa12d4..14c7e790f6ed 100644
--- a/include/drm-uapi/i915_drm.h
+++ b/include/drm-uapi/i915_drm.h
@@ -110,9 +110,17 @@ enum drm_i915_gem_engine_class {
 enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_BUSY = 0,
 	I915_SAMPLE_WAIT = 1,
-	I915_SAMPLE_SEMA = 2
+	I915_SAMPLE_SEMA = 2,
+	I915_SAMPLE_QUEUED = 3,
+	I915_SAMPLE_RUNNABLE = 4,
+	I915_SAMPLE_RUNNING = 5,
 };
 
+ /* Divide counter value by divisor to get the real value. */
+#define I915_SAMPLE_QUEUED_DIVISOR (1024)
+#define I915_SAMPLE_RUNNABLE_DIVISOR (1024)
+#define I915_SAMPLE_RUNNING_DIVISOR (1024)
+
 #define I915_PMU_SAMPLE_BITS (4)
 #define I915_PMU_SAMPLE_MASK (0xf)
 #define I915_PMU_SAMPLE_INSTANCE_BITS (8)
@@ -133,6 +141,15 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_SEMA(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
 
+#define I915_PMU_ENGINE_QUEUED(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED)
+
+#define I915_PMU_ENGINE_RUNNABLE(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNABLE)
+
+#define I915_PMU_ENGINE_RUNNING(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNING)
+
 #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
-- 
2.14.1

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH i-g-t 2/5] intel-gpu-overlay: Add engine queue stats
  2018-03-19 18:22 ` [Intel-gfx] " Tvrtko Ursulin
@ 2018-03-19 18:22   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 26+ messages in thread
From: Tvrtko Ursulin @ 2018-03-19 18:22 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Use new PMU engine queue stats (queued, runnable and running) and display
them per engine.

v2:
 * Compact per engine stats. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 overlay/gpu-top.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 overlay/gpu-top.h | 11 +++++++++++
 overlay/overlay.c |  7 +++++++
 3 files changed, 60 insertions(+)

diff --git a/overlay/gpu-top.c b/overlay/gpu-top.c
index 61b8f62fd78c..22e9badb22c1 100644
--- a/overlay/gpu-top.c
+++ b/overlay/gpu-top.c
@@ -72,6 +72,18 @@ static int perf_init(struct gpu_top *gt)
 				 gt->fd) >= 0)
 		gt->have_sema = 1;
 
+	if (perf_i915_open_group(I915_PMU_ENGINE_QUEUED(d->class, d->inst),
+				 gt->fd) >= 0)
+		gt->have_queued = 1;
+
+	if (perf_i915_open_group(I915_PMU_ENGINE_RUNNABLE(d->class, d->inst),
+				 gt->fd) >= 0)
+		gt->have_runnable = 1;
+
+	if (perf_i915_open_group(I915_PMU_ENGINE_RUNNING(d->class, d->inst),
+				 gt->fd) >= 0)
+		gt->have_running = 1;
+
 	gt->ring[0].name = d->name;
 	gt->num_rings = 1;
 
@@ -93,6 +105,24 @@ static int perf_init(struct gpu_top *gt)
 				   gt->fd) < 0)
 			return -1;
 
+		if (gt->have_queued &&
+		    perf_i915_open_group(I915_PMU_ENGINE_QUEUED(d->class,
+								d->inst),
+				   gt->fd) < 0)
+			return -1;
+
+		if (gt->have_runnable &&
+		    perf_i915_open_group(I915_PMU_ENGINE_RUNNABLE(d->class,
+								  d->inst),
+				   gt->fd) < 0)
+			return -1;
+
+		if (gt->have_running &&
+		    perf_i915_open_group(I915_PMU_ENGINE_RUNNING(d->class,
+								 d->inst),
+				   gt->fd) < 0)
+			return -1;
+
 		gt->ring[gt->num_rings++].name = d->name;
 	}
 
@@ -298,6 +328,12 @@ int gpu_top_update(struct gpu_top *gt)
 				s->wait[n] = sample[m++];
 			if (gt->have_sema)
 				s->sema[n] = sample[m++];
+			if (gt->have_queued)
+				s->queued[n] = sample[m++];
+			if (gt->have_runnable)
+				s->runnable[n] = sample[m++];
+			if (gt->have_running)
+				s->running[n] = sample[m++];
 		}
 
 		if (gt->count == 1)
@@ -310,6 +346,12 @@ int gpu_top_update(struct gpu_top *gt)
 				gt->ring[n].u.u.wait = (100 * (s->wait[n] - d->wait[n]) + d_time/2) / d_time;
 			if (gt->have_sema)
 				gt->ring[n].u.u.sema = (100 * (s->sema[n] - d->sema[n]) + d_time/2) / d_time;
+			if (gt->have_queued)
+				gt->ring[n].queued = (double)((s->queued[n] - d->queued[n])) / I915_SAMPLE_QUEUED_DIVISOR * 1e9 / d_time;
+			if (gt->have_runnable)
+				gt->ring[n].runnable = (double)((s->runnable[n] - d->runnable[n])) / I915_SAMPLE_RUNNABLE_DIVISOR  * 1e9 / d_time;
+			if (gt->have_running)
+				gt->ring[n].running = (double)((s->running[n] - d->running[n])) / I915_SAMPLE_RUNNING_DIVISOR * 1e9 / d_time;
 
 			/* in case of rounding + sampling errors, fudge */
 			if (gt->ring[n].u.u.busy > 100)
diff --git a/overlay/gpu-top.h b/overlay/gpu-top.h
index d3cdd779760f..cb4310c82a94 100644
--- a/overlay/gpu-top.h
+++ b/overlay/gpu-top.h
@@ -36,6 +36,9 @@ struct gpu_top {
 	int num_rings;
 	int have_wait;
 	int have_sema;
+	int have_queued;
+	int have_runnable;
+	int have_running;
 
 	struct gpu_top_ring {
 		const char *name;
@@ -47,6 +50,10 @@ struct gpu_top {
 			} u;
 			uint32_t payload;
 		} u;
+
+		double queued;
+		double runnable;
+		double running;
 	} ring[MAX_RINGS];
 
 	struct gpu_top_stat {
@@ -54,7 +61,11 @@ struct gpu_top {
 		uint64_t busy[MAX_RINGS];
 		uint64_t wait[MAX_RINGS];
 		uint64_t sema[MAX_RINGS];
+		uint64_t queued[MAX_RINGS];
+		uint64_t runnable[MAX_RINGS];
+		uint64_t running[MAX_RINGS];
 	} stat[2];
+
 	int count;
 };
 
diff --git a/overlay/overlay.c b/overlay/overlay.c
index 545af7bcb2f5..d3755397061b 100644
--- a/overlay/overlay.c
+++ b/overlay/overlay.c
@@ -255,6 +255,13 @@ static void show_gpu_top(struct overlay_context *ctx, struct overlay_gpu_top *gt
 		len = sprintf(txt, "%s: %3d%% busy",
 			      gt->gpu_top.ring[n].name,
 			      gt->gpu_top.ring[n].u.u.busy);
+		if (gt->gpu_top.have_queued &&
+		    gt->gpu_top.have_runnable &&
+		    gt->gpu_top.have_running)
+			len += sprintf(txt + len, " (%.2f / %.2f / %.2f)",
+				       gt->gpu_top.ring[n].queued,
+				       gt->gpu_top.ring[n].runnable,
+				       gt->gpu_top.ring[n].running);
 		if (gt->gpu_top.ring[n].u.u.wait)
 			len += sprintf(txt + len, ", %d%% wait",
 				       gt->gpu_top.ring[n].u.u.wait);
-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [igt-dev] [PATCH i-g-t 2/5] intel-gpu-overlay: Add engine queue stats
@ 2018-03-19 18:22   ` Tvrtko Ursulin
  0 siblings, 0 replies; 26+ messages in thread
From: Tvrtko Ursulin @ 2018-03-19 18:22 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Use new PMU engine queue stats (queued, runnable and running) and display
them per engine.

v2:
 * Compact per engine stats. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 overlay/gpu-top.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 overlay/gpu-top.h | 11 +++++++++++
 overlay/overlay.c |  7 +++++++
 3 files changed, 60 insertions(+)

diff --git a/overlay/gpu-top.c b/overlay/gpu-top.c
index 61b8f62fd78c..22e9badb22c1 100644
--- a/overlay/gpu-top.c
+++ b/overlay/gpu-top.c
@@ -72,6 +72,18 @@ static int perf_init(struct gpu_top *gt)
 				 gt->fd) >= 0)
 		gt->have_sema = 1;
 
+	if (perf_i915_open_group(I915_PMU_ENGINE_QUEUED(d->class, d->inst),
+				 gt->fd) >= 0)
+		gt->have_queued = 1;
+
+	if (perf_i915_open_group(I915_PMU_ENGINE_RUNNABLE(d->class, d->inst),
+				 gt->fd) >= 0)
+		gt->have_runnable = 1;
+
+	if (perf_i915_open_group(I915_PMU_ENGINE_RUNNING(d->class, d->inst),
+				 gt->fd) >= 0)
+		gt->have_running = 1;
+
 	gt->ring[0].name = d->name;
 	gt->num_rings = 1;
 
@@ -93,6 +105,24 @@ static int perf_init(struct gpu_top *gt)
 				   gt->fd) < 0)
 			return -1;
 
+		if (gt->have_queued &&
+		    perf_i915_open_group(I915_PMU_ENGINE_QUEUED(d->class,
+								d->inst),
+				   gt->fd) < 0)
+			return -1;
+
+		if (gt->have_runnable &&
+		    perf_i915_open_group(I915_PMU_ENGINE_RUNNABLE(d->class,
+								  d->inst),
+				   gt->fd) < 0)
+			return -1;
+
+		if (gt->have_running &&
+		    perf_i915_open_group(I915_PMU_ENGINE_RUNNING(d->class,
+								 d->inst),
+				   gt->fd) < 0)
+			return -1;
+
 		gt->ring[gt->num_rings++].name = d->name;
 	}
 
@@ -298,6 +328,12 @@ int gpu_top_update(struct gpu_top *gt)
 				s->wait[n] = sample[m++];
 			if (gt->have_sema)
 				s->sema[n] = sample[m++];
+			if (gt->have_queued)
+				s->queued[n] = sample[m++];
+			if (gt->have_runnable)
+				s->runnable[n] = sample[m++];
+			if (gt->have_running)
+				s->running[n] = sample[m++];
 		}
 
 		if (gt->count == 1)
@@ -310,6 +346,12 @@ int gpu_top_update(struct gpu_top *gt)
 				gt->ring[n].u.u.wait = (100 * (s->wait[n] - d->wait[n]) + d_time/2) / d_time;
 			if (gt->have_sema)
 				gt->ring[n].u.u.sema = (100 * (s->sema[n] - d->sema[n]) + d_time/2) / d_time;
+			if (gt->have_queued)
+				gt->ring[n].queued = (double)((s->queued[n] - d->queued[n])) / I915_SAMPLE_QUEUED_DIVISOR * 1e9 / d_time;
+			if (gt->have_runnable)
+				gt->ring[n].runnable = (double)((s->runnable[n] - d->runnable[n])) / I915_SAMPLE_RUNNABLE_DIVISOR  * 1e9 / d_time;
+			if (gt->have_running)
+				gt->ring[n].running = (double)((s->running[n] - d->running[n])) / I915_SAMPLE_RUNNING_DIVISOR * 1e9 / d_time;
 
 			/* in case of rounding + sampling errors, fudge */
 			if (gt->ring[n].u.u.busy > 100)
diff --git a/overlay/gpu-top.h b/overlay/gpu-top.h
index d3cdd779760f..cb4310c82a94 100644
--- a/overlay/gpu-top.h
+++ b/overlay/gpu-top.h
@@ -36,6 +36,9 @@ struct gpu_top {
 	int num_rings;
 	int have_wait;
 	int have_sema;
+	int have_queued;
+	int have_runnable;
+	int have_running;
 
 	struct gpu_top_ring {
 		const char *name;
@@ -47,6 +50,10 @@ struct gpu_top {
 			} u;
 			uint32_t payload;
 		} u;
+
+		double queued;
+		double runnable;
+		double running;
 	} ring[MAX_RINGS];
 
 	struct gpu_top_stat {
@@ -54,7 +61,11 @@ struct gpu_top {
 		uint64_t busy[MAX_RINGS];
 		uint64_t wait[MAX_RINGS];
 		uint64_t sema[MAX_RINGS];
+		uint64_t queued[MAX_RINGS];
+		uint64_t runnable[MAX_RINGS];
+		uint64_t running[MAX_RINGS];
 	} stat[2];
+
 	int count;
 };
 
diff --git a/overlay/overlay.c b/overlay/overlay.c
index 545af7bcb2f5..d3755397061b 100644
--- a/overlay/overlay.c
+++ b/overlay/overlay.c
@@ -255,6 +255,13 @@ static void show_gpu_top(struct overlay_context *ctx, struct overlay_gpu_top *gt
 		len = sprintf(txt, "%s: %3d%% busy",
 			      gt->gpu_top.ring[n].name,
 			      gt->gpu_top.ring[n].u.u.busy);
+		if (gt->gpu_top.have_queued &&
+		    gt->gpu_top.have_runnable &&
+		    gt->gpu_top.have_running)
+			len += sprintf(txt + len, " (%.2f / %.2f / %.2f)",
+				       gt->gpu_top.ring[n].queued,
+				       gt->gpu_top.ring[n].runnable,
+				       gt->gpu_top.ring[n].running);
 		if (gt->gpu_top.ring[n].u.u.wait)
 			len += sprintf(txt + len, ", %d%% wait",
 				       gt->gpu_top.ring[n].u.u.wait);
-- 
2.14.1

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH i-g-t 3/5] intel-gpu-overlay: Show 1s, 30s and 15m GPU load
  2018-03-19 18:22 ` [Intel-gfx] " Tvrtko Ursulin
@ 2018-03-19 18:22   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 26+ messages in thread
From: Tvrtko Ursulin @ 2018-03-19 18:22 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Show total GPU loads in the window banner.

Engine load is defined as total of runnable and running requests on an
engine.

Total, non-normalized, load is display. In other words if N engines are
busy with exactly one request, the load will be shown as N.

v2:
 * Different flavour of load avg. (Chris Wilson)
 * Simplify code. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 overlay/gpu-top.c | 39 ++++++++++++++++++++++++++++++++++++++-
 overlay/gpu-top.h | 11 ++++++++++-
 overlay/overlay.c | 28 ++++++++++++++++++++++------
 3 files changed, 70 insertions(+), 8 deletions(-)

diff --git a/overlay/gpu-top.c b/overlay/gpu-top.c
index 22e9badb22c1..501429b86379 100644
--- a/overlay/gpu-top.c
+++ b/overlay/gpu-top.c
@@ -28,6 +28,7 @@
 #include <string.h>
 #include <unistd.h>
 #include <fcntl.h>
+#include <math.h>
 #include <errno.h>
 #include <assert.h>
 
@@ -126,6 +127,10 @@ static int perf_init(struct gpu_top *gt)
 		gt->ring[gt->num_rings++].name = d->name;
 	}
 
+	gt->have_load_avg = gt->have_queued &&
+			    gt->have_runnable &&
+			    gt->have_running;
+
 	return 0;
 }
 
@@ -290,17 +295,32 @@ static void mmio_init(struct gpu_top *gt)
 	}
 }
 
-void gpu_top_init(struct gpu_top *gt)
+void gpu_top_init(struct gpu_top *gt, unsigned int period_us)
 {
+	const double period = (double)period_us / 1e6;
+	const double load_period[NUM_LOADS] = { 1.0, 30.0, 900.0 };
+	const char *load_names[NUM_LOADS] = { "1s", "30s", "15m" };
+	unsigned int i;
+
 	memset(gt, 0, sizeof(*gt));
 	gt->fd = -1;
 
+	for (i = 0; i < NUM_LOADS; i++) {
+		gt->load_name[i] = load_names[i];
+		gt->exp[i] = exp(-period / load_period[i]);
+	}
+
 	if (perf_init(gt) == 0)
 		return;
 
 	mmio_init(gt);
 }
 
+static double update_load(double load, double exp, double val)
+{
+	return val + exp * (load - val);
+}
+
 int gpu_top_update(struct gpu_top *gt)
 {
 	uint32_t data[1024];
@@ -313,6 +333,8 @@ int gpu_top_update(struct gpu_top *gt)
 		struct gpu_top_stat *s = &gt->stat[gt->count++&1];
 		struct gpu_top_stat *d = &gt->stat[gt->count&1];
 		uint64_t *sample, d_time;
+		double gpu_qd = 0.0;
+		unsigned int i;
 		int n, m;
 
 		len = read(gt->fd, data, sizeof(data));
@@ -341,6 +363,8 @@ int gpu_top_update(struct gpu_top *gt)
 
 		d_time = s->time - d->time;
 		for (n = 0; n < gt->num_rings; n++) {
+			double qd = 0.0;
+
 			gt->ring[n].u.u.busy = (100 * (s->busy[n] - d->busy[n]) + d_time/2) / d_time;
 			if (gt->have_wait)
 				gt->ring[n].u.u.wait = (100 * (s->wait[n] - d->wait[n]) + d_time/2) / d_time;
@@ -353,6 +377,14 @@ int gpu_top_update(struct gpu_top *gt)
 			if (gt->have_running)
 				gt->ring[n].running = (double)((s->running[n] - d->running[n])) / I915_SAMPLE_RUNNING_DIVISOR * 1e9 / d_time;
 
+			qd = gt->ring[n].runnable + gt->ring[n].running;
+			gpu_qd += qd;
+
+			for (i = 0; i < NUM_LOADS; i++)
+				gt->ring[n].load[i] =
+					update_load(gt->ring[n].load[i],
+						    gt->exp[i], qd);
+
 			/* in case of rounding + sampling errors, fudge */
 			if (gt->ring[n].u.u.busy > 100)
 				gt->ring[n].u.u.busy = 100;
@@ -362,6 +394,11 @@ int gpu_top_update(struct gpu_top *gt)
 				gt->ring[n].u.u.sema = 100;
 		}
 
+		for (i = 0; i < NUM_LOADS; i++) {
+			gt->load[i] = update_load(gt->load[i], gt->exp[i],
+						  gpu_qd);
+			gt->norm_load[i] = gt->load[i] / gt->num_rings;
+		}
 		update = 1;
 	} else {
 		while ((len = read(gt->fd, data, sizeof(data))) > 0) {
diff --git a/overlay/gpu-top.h b/overlay/gpu-top.h
index cb4310c82a94..115ce8c482c1 100644
--- a/overlay/gpu-top.h
+++ b/overlay/gpu-top.h
@@ -26,6 +26,7 @@
 #define GPU_TOP_H
 
 #define MAX_RINGS 16
+#define NUM_LOADS 3
 
 #include <stdint.h>
 
@@ -39,6 +40,12 @@ struct gpu_top {
 	int have_queued;
 	int have_runnable;
 	int have_running;
+	int have_load_avg;
+
+	double exp[NUM_LOADS];
+	double load[NUM_LOADS];
+	double norm_load[NUM_LOADS];
+	const char *load_name[NUM_LOADS];
 
 	struct gpu_top_ring {
 		const char *name;
@@ -54,6 +61,8 @@ struct gpu_top {
 		double queued;
 		double runnable;
 		double running;
+
+		double load[NUM_LOADS];
 	} ring[MAX_RINGS];
 
 	struct gpu_top_stat {
@@ -69,7 +78,7 @@ struct gpu_top {
 	int count;
 };
 
-void gpu_top_init(struct gpu_top *gt);
+void gpu_top_init(struct gpu_top *gt, unsigned int period_us);
 int gpu_top_update(struct gpu_top *gt);
 
 #endif /* GPU_TOP_H */
diff --git a/overlay/overlay.c b/overlay/overlay.c
index d3755397061b..63512059d8ff 100644
--- a/overlay/overlay.c
+++ b/overlay/overlay.c
@@ -141,7 +141,8 @@ struct overlay_context {
 };
 
 static void init_gpu_top(struct overlay_context *ctx,
-			 struct overlay_gpu_top *gt)
+			 struct overlay_gpu_top *gt,
+			 unsigned int period_us)
 {
 	const double rgba[][4] = {
 		{ 1, 0.25, 0.25, 1 },
@@ -152,7 +153,7 @@ static void init_gpu_top(struct overlay_context *ctx,
 	int n;
 
 	cpu_top_init(&gt->cpu_top);
-	gpu_top_init(&gt->gpu_top);
+	gpu_top_init(&gt->gpu_top, period_us);
 
 	chart_init(&gt->cpu, "CPU", 120);
 	chart_set_position(&gt->cpu, PAD, PAD);
@@ -927,13 +928,13 @@ int main(int argc, char **argv)
 
 	debugfs_init();
 
-	init_gpu_top(&ctx, &ctx.gpu_top);
+	sample_period = get_sample_period(&config);
+
+	init_gpu_top(&ctx, &ctx.gpu_top, sample_period);
 	init_gpu_perf(&ctx, &ctx.gpu_perf);
 	init_gpu_freq(&ctx, &ctx.gpu_freq);
 	init_gem_objects(&ctx, &ctx.gem_objects);
 
-	sample_period = get_sample_period(&config);
-
 	i = 0;
 	while (1) {
 		ctx.time = time(NULL);
@@ -949,9 +950,24 @@ int main(int argc, char **argv)
 		show_gem_objects(&ctx, &ctx.gem_objects);
 
 		{
-			char buf[80];
+			struct gpu_top *gt = &ctx.gpu_top.gpu_top;
 			cairo_text_extents_t extents;
+			char buf[256];
+
 			gethostname(buf, sizeof(buf));
+
+			if (gt->have_load_avg) {
+				int len = strlen(buf);
+
+				snprintf(buf + len, sizeof(buf) - len,
+					 "%s; %u engines; load %s %.2f, %s %.2f, %s %.2f",
+					 buf,
+					 gt->num_rings,
+					 gt->load_name[0], gt->load[0],
+					 gt->load_name[1], gt->load[1],
+					 gt->load_name[2], gt->load[2]);
+			}
+
 			cairo_set_source_rgb(ctx.cr, .5, .5, .5);
 			cairo_set_font_size(ctx.cr, PAD-2);
 			cairo_text_extents(ctx.cr, buf, &extents);
-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Intel-gfx] [PATCH i-g-t 3/5] intel-gpu-overlay: Show 1s, 30s and 15m GPU load
@ 2018-03-19 18:22   ` Tvrtko Ursulin
  0 siblings, 0 replies; 26+ messages in thread
From: Tvrtko Ursulin @ 2018-03-19 18:22 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Show total GPU loads in the window banner.

Engine load is defined as total of runnable and running requests on an
engine.

Total, non-normalized, load is display. In other words if N engines are
busy with exactly one request, the load will be shown as N.

v2:
 * Different flavour of load avg. (Chris Wilson)
 * Simplify code. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 overlay/gpu-top.c | 39 ++++++++++++++++++++++++++++++++++++++-
 overlay/gpu-top.h | 11 ++++++++++-
 overlay/overlay.c | 28 ++++++++++++++++++++++------
 3 files changed, 70 insertions(+), 8 deletions(-)

diff --git a/overlay/gpu-top.c b/overlay/gpu-top.c
index 22e9badb22c1..501429b86379 100644
--- a/overlay/gpu-top.c
+++ b/overlay/gpu-top.c
@@ -28,6 +28,7 @@
 #include <string.h>
 #include <unistd.h>
 #include <fcntl.h>
+#include <math.h>
 #include <errno.h>
 #include <assert.h>
 
@@ -126,6 +127,10 @@ static int perf_init(struct gpu_top *gt)
 		gt->ring[gt->num_rings++].name = d->name;
 	}
 
+	gt->have_load_avg = gt->have_queued &&
+			    gt->have_runnable &&
+			    gt->have_running;
+
 	return 0;
 }
 
@@ -290,17 +295,32 @@ static void mmio_init(struct gpu_top *gt)
 	}
 }
 
-void gpu_top_init(struct gpu_top *gt)
+void gpu_top_init(struct gpu_top *gt, unsigned int period_us)
 {
+	const double period = (double)period_us / 1e6;
+	const double load_period[NUM_LOADS] = { 1.0, 30.0, 900.0 };
+	const char *load_names[NUM_LOADS] = { "1s", "30s", "15m" };
+	unsigned int i;
+
 	memset(gt, 0, sizeof(*gt));
 	gt->fd = -1;
 
+	for (i = 0; i < NUM_LOADS; i++) {
+		gt->load_name[i] = load_names[i];
+		gt->exp[i] = exp(-period / load_period[i]);
+	}
+
 	if (perf_init(gt) == 0)
 		return;
 
 	mmio_init(gt);
 }
 
+static double update_load(double load, double exp, double val)
+{
+	return val + exp * (load - val);
+}
+
 int gpu_top_update(struct gpu_top *gt)
 {
 	uint32_t data[1024];
@@ -313,6 +333,8 @@ int gpu_top_update(struct gpu_top *gt)
 		struct gpu_top_stat *s = &gt->stat[gt->count++&1];
 		struct gpu_top_stat *d = &gt->stat[gt->count&1];
 		uint64_t *sample, d_time;
+		double gpu_qd = 0.0;
+		unsigned int i;
 		int n, m;
 
 		len = read(gt->fd, data, sizeof(data));
@@ -341,6 +363,8 @@ int gpu_top_update(struct gpu_top *gt)
 
 		d_time = s->time - d->time;
 		for (n = 0; n < gt->num_rings; n++) {
+			double qd = 0.0;
+
 			gt->ring[n].u.u.busy = (100 * (s->busy[n] - d->busy[n]) + d_time/2) / d_time;
 			if (gt->have_wait)
 				gt->ring[n].u.u.wait = (100 * (s->wait[n] - d->wait[n]) + d_time/2) / d_time;
@@ -353,6 +377,14 @@ int gpu_top_update(struct gpu_top *gt)
 			if (gt->have_running)
 				gt->ring[n].running = (double)((s->running[n] - d->running[n])) / I915_SAMPLE_RUNNING_DIVISOR * 1e9 / d_time;
 
+			qd = gt->ring[n].runnable + gt->ring[n].running;
+			gpu_qd += qd;
+
+			for (i = 0; i < NUM_LOADS; i++)
+				gt->ring[n].load[i] =
+					update_load(gt->ring[n].load[i],
+						    gt->exp[i], qd);
+
 			/* in case of rounding + sampling errors, fudge */
 			if (gt->ring[n].u.u.busy > 100)
 				gt->ring[n].u.u.busy = 100;
@@ -362,6 +394,11 @@ int gpu_top_update(struct gpu_top *gt)
 				gt->ring[n].u.u.sema = 100;
 		}
 
+		for (i = 0; i < NUM_LOADS; i++) {
+			gt->load[i] = update_load(gt->load[i], gt->exp[i],
+						  gpu_qd);
+			gt->norm_load[i] = gt->load[i] / gt->num_rings;
+		}
 		update = 1;
 	} else {
 		while ((len = read(gt->fd, data, sizeof(data))) > 0) {
diff --git a/overlay/gpu-top.h b/overlay/gpu-top.h
index cb4310c82a94..115ce8c482c1 100644
--- a/overlay/gpu-top.h
+++ b/overlay/gpu-top.h
@@ -26,6 +26,7 @@
 #define GPU_TOP_H
 
 #define MAX_RINGS 16
+#define NUM_LOADS 3
 
 #include <stdint.h>
 
@@ -39,6 +40,12 @@ struct gpu_top {
 	int have_queued;
 	int have_runnable;
 	int have_running;
+	int have_load_avg;
+
+	double exp[NUM_LOADS];
+	double load[NUM_LOADS];
+	double norm_load[NUM_LOADS];
+	const char *load_name[NUM_LOADS];
 
 	struct gpu_top_ring {
 		const char *name;
@@ -54,6 +61,8 @@ struct gpu_top {
 		double queued;
 		double runnable;
 		double running;
+
+		double load[NUM_LOADS];
 	} ring[MAX_RINGS];
 
 	struct gpu_top_stat {
@@ -69,7 +78,7 @@ struct gpu_top {
 	int count;
 };
 
-void gpu_top_init(struct gpu_top *gt);
+void gpu_top_init(struct gpu_top *gt, unsigned int period_us);
 int gpu_top_update(struct gpu_top *gt);
 
 #endif /* GPU_TOP_H */
diff --git a/overlay/overlay.c b/overlay/overlay.c
index d3755397061b..63512059d8ff 100644
--- a/overlay/overlay.c
+++ b/overlay/overlay.c
@@ -141,7 +141,8 @@ struct overlay_context {
 };
 
 static void init_gpu_top(struct overlay_context *ctx,
-			 struct overlay_gpu_top *gt)
+			 struct overlay_gpu_top *gt,
+			 unsigned int period_us)
 {
 	const double rgba[][4] = {
 		{ 1, 0.25, 0.25, 1 },
@@ -152,7 +153,7 @@ static void init_gpu_top(struct overlay_context *ctx,
 	int n;
 
 	cpu_top_init(&gt->cpu_top);
-	gpu_top_init(&gt->gpu_top);
+	gpu_top_init(&gt->gpu_top, period_us);
 
 	chart_init(&gt->cpu, "CPU", 120);
 	chart_set_position(&gt->cpu, PAD, PAD);
@@ -927,13 +928,13 @@ int main(int argc, char **argv)
 
 	debugfs_init();
 
-	init_gpu_top(&ctx, &ctx.gpu_top);
+	sample_period = get_sample_period(&config);
+
+	init_gpu_top(&ctx, &ctx.gpu_top, sample_period);
 	init_gpu_perf(&ctx, &ctx.gpu_perf);
 	init_gpu_freq(&ctx, &ctx.gpu_freq);
 	init_gem_objects(&ctx, &ctx.gem_objects);
 
-	sample_period = get_sample_period(&config);
-
 	i = 0;
 	while (1) {
 		ctx.time = time(NULL);
@@ -949,9 +950,24 @@ int main(int argc, char **argv)
 		show_gem_objects(&ctx, &ctx.gem_objects);
 
 		{
-			char buf[80];
+			struct gpu_top *gt = &ctx.gpu_top.gpu_top;
 			cairo_text_extents_t extents;
+			char buf[256];
+
 			gethostname(buf, sizeof(buf));
+
+			if (gt->have_load_avg) {
+				int len = strlen(buf);
+
+				snprintf(buf + len, sizeof(buf) - len,
+					 "%s; %u engines; load %s %.2f, %s %.2f, %s %.2f",
+					 buf,
+					 gt->num_rings,
+					 gt->load_name[0], gt->load[0],
+					 gt->load_name[1], gt->load[1],
+					 gt->load_name[2], gt->load[2]);
+			}
+
 			cairo_set_source_rgb(ctx.cr, .5, .5, .5);
 			cairo_set_font_size(ctx.cr, PAD-2);
 			cairo_text_extents(ctx.cr, buf, &extents);
-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH i-g-t 4/5] tests/perf_pmu: Add tests for engine queued/runnable/running stats
  2018-03-19 18:22 ` [Intel-gfx] " Tvrtko Ursulin
@ 2018-03-19 18:22   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 26+ messages in thread
From: Tvrtko Ursulin @ 2018-03-19 18:22 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Simple tests to check reported queue depths are correct.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 tests/perf_pmu.c | 224 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 224 insertions(+)

diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
index 469b9becdbac..206c18960b7b 100644
--- a/tests/perf_pmu.c
+++ b/tests/perf_pmu.c
@@ -966,6 +966,196 @@ multi_client(int gem_fd, const struct intel_execution_engine2 *e)
 	assert_within_epsilon(val[1], perf_slept[1], tolerance);
 }
 
+static double calc_queued(uint64_t d_val, uint64_t d_ns)
+{
+	return (double)d_val * 1e9 / I915_SAMPLE_QUEUED_DIVISOR / d_ns;
+}
+
+static void
+queued(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	const unsigned long engine = e2ring(gem_fd, e);
+	const unsigned int max_rq = 10;
+	double queued[max_rq + 1];
+	uint32_t bo[max_rq + 1];
+	unsigned int n, i;
+	uint64_t val[2];
+	uint64_t ts[2];
+	int fd;
+
+	memset(queued, 0, sizeof(queued));
+	memset(bo, 0, sizeof(bo));
+
+	fd = open_pmu(I915_PMU_ENGINE_QUEUED(e->class, e->instance));
+
+	for (n = 0; n <= max_rq; n++) {
+		int fence = -1;
+		struct igt_cork cork = { .fd = fence, .type = CORK_SYNC_FD };
+
+		gem_quiescent_gpu(gem_fd);
+
+		if (n)
+			fence = igt_cork_plug(&cork, -1);
+
+		for (i = 0; i < n; i++) {
+			struct drm_i915_gem_exec_object2 obj = { };
+			struct drm_i915_gem_execbuffer2 eb = { };
+
+			if (!bo[i]) {
+				const uint32_t bbe = MI_BATCH_BUFFER_END;
+
+				bo[i] = gem_create(gem_fd, 4096);
+				gem_write(gem_fd, bo[i], 4092, &bbe,
+					  sizeof(bbe));
+			}
+
+			obj.handle = bo[i];
+
+			eb.buffer_count = 1;
+			eb.buffers_ptr = to_user_pointer(&obj);
+
+			eb.flags = engine | I915_EXEC_FENCE_IN;
+			eb.rsvd2 = fence;
+
+			gem_execbuf(gem_fd, &eb);
+		}
+
+		val[0] = __pmu_read_single(fd, &ts[0]);
+		usleep(batch_duration_ns / 1000);
+		val[1] = __pmu_read_single(fd, &ts[1]);
+
+		queued[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
+		igt_info("n=%u queued=%.2f\n", n, queued[n]);
+
+		if (fence >= 0)
+			igt_cork_unplug(&cork);
+
+		for (i = 0; i < n; i++)
+			gem_sync(gem_fd, bo[i]);
+	}
+
+	close(fd);
+
+	for (i = 0; i < max_rq; i++) {
+		if (bo[i])
+			gem_close(gem_fd, bo[i]);
+	}
+
+	for (i = 0; i <= max_rq; i++)
+		assert_within_epsilon(queued[i], i, tolerance);
+}
+
+static void
+runnable(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	const unsigned long engine = e2ring(gem_fd, e);
+	const unsigned int max_rq = 10;
+	igt_spin_t *spin[max_rq + 1];
+	double runnable[max_rq + 1];
+	uint32_t ctx[max_rq];
+	unsigned int n, i;
+	uint64_t val[2];
+	uint64_t ts[2];
+	int fd;
+
+	memset(runnable, 0, sizeof(runnable));
+	memset(ctx, 0, sizeof(ctx));
+
+	fd = open_pmu(I915_PMU_ENGINE_RUNNABLE(e->class, e->instance));
+
+	for (n = 0; n <= max_rq; n++) {
+		gem_quiescent_gpu(gem_fd);
+
+		for (i = 0; i < n; i++) {
+			if (!ctx[i])
+				ctx[i] = gem_context_create(gem_fd);
+
+			if (i == 0)
+				spin[i] = __spin_poll(gem_fd, ctx[i], engine);
+			else
+				spin[i] = __igt_spin_batch_new(gem_fd, ctx[i],
+							       engine, 0);
+		}
+
+		if (n)
+			__spin_wait(gem_fd, spin[0]);
+
+		val[0] = __pmu_read_single(fd, &ts[0]);
+		usleep(batch_duration_ns / 1000);
+		val[1] = __pmu_read_single(fd, &ts[1]);
+
+		runnable[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
+		igt_info("n=%u runnable=%.2f\n", n, runnable[n]);
+
+		for (i = 0; i < n; i++) {
+			end_spin(gem_fd, spin[i], FLAG_SYNC);
+			igt_spin_batch_free(gem_fd, spin[i]);
+		}
+	}
+
+	for (i = 0; i < max_rq; i++) {
+		if (ctx[i])
+			gem_context_destroy(gem_fd, ctx[i]);
+	}
+
+	close(fd);
+
+	assert_within_epsilon(runnable[0], 0, tolerance);
+	igt_assert(runnable[max_rq] > 0.0);
+	assert_within_epsilon(runnable[max_rq] - runnable[max_rq - 1], 1,
+			      tolerance);
+}
+
+static void
+running(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	const unsigned long engine = e2ring(gem_fd, e);
+	const unsigned int max_rq = 10;
+	igt_spin_t *spin[max_rq + 1];
+	double running[max_rq + 1];
+	unsigned int n, i;
+	uint64_t val[2];
+	uint64_t ts[2];
+	int fd;
+
+	memset(running, 0, sizeof(running));
+	memset(spin, 0, sizeof(spin));
+
+	fd = open_pmu(I915_PMU_ENGINE_RUNNING(e->class, e->instance));
+
+	for (n = 0; n <= max_rq; n++) {
+		gem_quiescent_gpu(gem_fd);
+
+		for (i = 0; i < n; i++) {
+			if (i == 0)
+				spin[i] = __spin_poll(gem_fd, 0, engine);
+			else
+				spin[i] = __igt_spin_batch_new(gem_fd, 0,
+							       engine, 0);
+		}
+
+		if (n)
+			__spin_wait(gem_fd, spin[0]);
+
+		val[0] = __pmu_read_single(fd, &ts[0]);
+		usleep(batch_duration_ns / 1000);
+		val[1] = __pmu_read_single(fd, &ts[1]);
+
+		running[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
+		igt_info("n=%u running=%.2f\n", n, running[n]);
+
+		for (i = 0; i < n; i++) {
+			end_spin(gem_fd, spin[i], FLAG_SYNC);
+			igt_spin_batch_free(gem_fd, spin[i]);
+		}
+	}
+
+	close(fd);
+
+	for (i = 0; i <= max_rq; i++)
+		assert_within_epsilon(running[i], i, tolerance);
+}
+
 /**
  * Tests that i915 PMU corectly errors out in invalid initialization.
  * i915 PMU is uncore PMU, thus:
@@ -1718,6 +1908,15 @@ igt_main
 		igt_subtest_f("init-sema-%s", e->name)
 			init(fd, e, I915_SAMPLE_SEMA);
 
+		igt_subtest_f("init-queued-%s", e->name)
+			init(fd, e, I915_SAMPLE_QUEUED);
+
+		igt_subtest_f("init-runnable-%s", e->name)
+			init(fd, e, I915_SAMPLE_RUNNABLE);
+
+		igt_subtest_f("init-running-%s", e->name)
+			init(fd, e, I915_SAMPLE_RUNNING);
+
 		igt_subtest_group {
 			igt_fixture {
 				gem_require_engine(fd, e->class, e->instance);
@@ -1823,6 +2022,24 @@ igt_main
 
 			igt_subtest_f("busy-hang-%s", e->name)
 				single(fd, e, TEST_BUSY | FLAG_HANG);
+
+			/**
+			 * Test that queued metric works.
+			 */
+			igt_subtest_f("queued-%s", e->name)
+				queued(fd, e);
+
+			/**
+			 * Test that runnable metric works.
+			 */
+			igt_subtest_f("runnable-%s", e->name)
+				runnable(fd, e);
+
+			/**
+			 * Test that running metric works.
+			 */
+			igt_subtest_f("running-%s", e->name)
+				running(fd, e);
 		}
 
 		/**
@@ -1915,6 +2132,13 @@ igt_main
 					      e->name)
 					single(fd, e,
 					       TEST_BUSY | TEST_TRAILING_IDLE);
+				igt_subtest_f("render-node-queued-%s", e->name)
+					queued(fd, e);
+				igt_subtest_f("render-node-runnable-%s",
+					      e->name)
+					runnable(fd, e);
+				igt_subtest_f("render-node-running-%s", e->name)
+					running(fd, e);
 			}
 		}
 
-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Intel-gfx] [PATCH i-g-t 4/5] tests/perf_pmu: Add tests for engine queued/runnable/running stats
@ 2018-03-19 18:22   ` Tvrtko Ursulin
  0 siblings, 0 replies; 26+ messages in thread
From: Tvrtko Ursulin @ 2018-03-19 18:22 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Simple tests to check reported queue depths are correct.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 tests/perf_pmu.c | 224 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 224 insertions(+)

diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
index 469b9becdbac..206c18960b7b 100644
--- a/tests/perf_pmu.c
+++ b/tests/perf_pmu.c
@@ -966,6 +966,196 @@ multi_client(int gem_fd, const struct intel_execution_engine2 *e)
 	assert_within_epsilon(val[1], perf_slept[1], tolerance);
 }
 
+static double calc_queued(uint64_t d_val, uint64_t d_ns)
+{
+	return (double)d_val * 1e9 / I915_SAMPLE_QUEUED_DIVISOR / d_ns;
+}
+
+static void
+queued(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	const unsigned long engine = e2ring(gem_fd, e);
+	const unsigned int max_rq = 10;
+	double queued[max_rq + 1];
+	uint32_t bo[max_rq + 1];
+	unsigned int n, i;
+	uint64_t val[2];
+	uint64_t ts[2];
+	int fd;
+
+	memset(queued, 0, sizeof(queued));
+	memset(bo, 0, sizeof(bo));
+
+	fd = open_pmu(I915_PMU_ENGINE_QUEUED(e->class, e->instance));
+
+	for (n = 0; n <= max_rq; n++) {
+		int fence = -1;
+		struct igt_cork cork = { .fd = fence, .type = CORK_SYNC_FD };
+
+		gem_quiescent_gpu(gem_fd);
+
+		if (n)
+			fence = igt_cork_plug(&cork, -1);
+
+		for (i = 0; i < n; i++) {
+			struct drm_i915_gem_exec_object2 obj = { };
+			struct drm_i915_gem_execbuffer2 eb = { };
+
+			if (!bo[i]) {
+				const uint32_t bbe = MI_BATCH_BUFFER_END;
+
+				bo[i] = gem_create(gem_fd, 4096);
+				gem_write(gem_fd, bo[i], 4092, &bbe,
+					  sizeof(bbe));
+			}
+
+			obj.handle = bo[i];
+
+			eb.buffer_count = 1;
+			eb.buffers_ptr = to_user_pointer(&obj);
+
+			eb.flags = engine | I915_EXEC_FENCE_IN;
+			eb.rsvd2 = fence;
+
+			gem_execbuf(gem_fd, &eb);
+		}
+
+		val[0] = __pmu_read_single(fd, &ts[0]);
+		usleep(batch_duration_ns / 1000);
+		val[1] = __pmu_read_single(fd, &ts[1]);
+
+		queued[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
+		igt_info("n=%u queued=%.2f\n", n, queued[n]);
+
+		if (fence >= 0)
+			igt_cork_unplug(&cork);
+
+		for (i = 0; i < n; i++)
+			gem_sync(gem_fd, bo[i]);
+	}
+
+	close(fd);
+
+	for (i = 0; i < max_rq; i++) {
+		if (bo[i])
+			gem_close(gem_fd, bo[i]);
+	}
+
+	for (i = 0; i <= max_rq; i++)
+		assert_within_epsilon(queued[i], i, tolerance);
+}
+
+static void
+runnable(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	const unsigned long engine = e2ring(gem_fd, e);
+	const unsigned int max_rq = 10;
+	igt_spin_t *spin[max_rq + 1];
+	double runnable[max_rq + 1];
+	uint32_t ctx[max_rq];
+	unsigned int n, i;
+	uint64_t val[2];
+	uint64_t ts[2];
+	int fd;
+
+	memset(runnable, 0, sizeof(runnable));
+	memset(ctx, 0, sizeof(ctx));
+
+	fd = open_pmu(I915_PMU_ENGINE_RUNNABLE(e->class, e->instance));
+
+	for (n = 0; n <= max_rq; n++) {
+		gem_quiescent_gpu(gem_fd);
+
+		for (i = 0; i < n; i++) {
+			if (!ctx[i])
+				ctx[i] = gem_context_create(gem_fd);
+
+			if (i == 0)
+				spin[i] = __spin_poll(gem_fd, ctx[i], engine);
+			else
+				spin[i] = __igt_spin_batch_new(gem_fd, ctx[i],
+							       engine, 0);
+		}
+
+		if (n)
+			__spin_wait(gem_fd, spin[0]);
+
+		val[0] = __pmu_read_single(fd, &ts[0]);
+		usleep(batch_duration_ns / 1000);
+		val[1] = __pmu_read_single(fd, &ts[1]);
+
+		runnable[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
+		igt_info("n=%u runnable=%.2f\n", n, runnable[n]);
+
+		for (i = 0; i < n; i++) {
+			end_spin(gem_fd, spin[i], FLAG_SYNC);
+			igt_spin_batch_free(gem_fd, spin[i]);
+		}
+	}
+
+	for (i = 0; i < max_rq; i++) {
+		if (ctx[i])
+			gem_context_destroy(gem_fd, ctx[i]);
+	}
+
+	close(fd);
+
+	assert_within_epsilon(runnable[0], 0, tolerance);
+	igt_assert(runnable[max_rq] > 0.0);
+	assert_within_epsilon(runnable[max_rq] - runnable[max_rq - 1], 1,
+			      tolerance);
+}
+
+static void
+running(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	const unsigned long engine = e2ring(gem_fd, e);
+	const unsigned int max_rq = 10;
+	igt_spin_t *spin[max_rq + 1];
+	double running[max_rq + 1];
+	unsigned int n, i;
+	uint64_t val[2];
+	uint64_t ts[2];
+	int fd;
+
+	memset(running, 0, sizeof(running));
+	memset(spin, 0, sizeof(spin));
+
+	fd = open_pmu(I915_PMU_ENGINE_RUNNING(e->class, e->instance));
+
+	for (n = 0; n <= max_rq; n++) {
+		gem_quiescent_gpu(gem_fd);
+
+		for (i = 0; i < n; i++) {
+			if (i == 0)
+				spin[i] = __spin_poll(gem_fd, 0, engine);
+			else
+				spin[i] = __igt_spin_batch_new(gem_fd, 0,
+							       engine, 0);
+		}
+
+		if (n)
+			__spin_wait(gem_fd, spin[0]);
+
+		val[0] = __pmu_read_single(fd, &ts[0]);
+		usleep(batch_duration_ns / 1000);
+		val[1] = __pmu_read_single(fd, &ts[1]);
+
+		running[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
+		igt_info("n=%u running=%.2f\n", n, running[n]);
+
+		for (i = 0; i < n; i++) {
+			end_spin(gem_fd, spin[i], FLAG_SYNC);
+			igt_spin_batch_free(gem_fd, spin[i]);
+		}
+	}
+
+	close(fd);
+
+	for (i = 0; i <= max_rq; i++)
+		assert_within_epsilon(running[i], i, tolerance);
+}
+
 /**
  * Tests that i915 PMU corectly errors out in invalid initialization.
  * i915 PMU is uncore PMU, thus:
@@ -1718,6 +1908,15 @@ igt_main
 		igt_subtest_f("init-sema-%s", e->name)
 			init(fd, e, I915_SAMPLE_SEMA);
 
+		igt_subtest_f("init-queued-%s", e->name)
+			init(fd, e, I915_SAMPLE_QUEUED);
+
+		igt_subtest_f("init-runnable-%s", e->name)
+			init(fd, e, I915_SAMPLE_RUNNABLE);
+
+		igt_subtest_f("init-running-%s", e->name)
+			init(fd, e, I915_SAMPLE_RUNNING);
+
 		igt_subtest_group {
 			igt_fixture {
 				gem_require_engine(fd, e->class, e->instance);
@@ -1823,6 +2022,24 @@ igt_main
 
 			igt_subtest_f("busy-hang-%s", e->name)
 				single(fd, e, TEST_BUSY | FLAG_HANG);
+
+			/**
+			 * Test that queued metric works.
+			 */
+			igt_subtest_f("queued-%s", e->name)
+				queued(fd, e);
+
+			/**
+			 * Test that runnable metric works.
+			 */
+			igt_subtest_f("runnable-%s", e->name)
+				runnable(fd, e);
+
+			/**
+			 * Test that running metric works.
+			 */
+			igt_subtest_f("running-%s", e->name)
+				running(fd, e);
 		}
 
 		/**
@@ -1915,6 +2132,13 @@ igt_main
 					      e->name)
 					single(fd, e,
 					       TEST_BUSY | TEST_TRAILING_IDLE);
+				igt_subtest_f("render-node-queued-%s", e->name)
+					queued(fd, e);
+				igt_subtest_f("render-node-runnable-%s",
+					      e->name)
+					runnable(fd, e);
+				igt_subtest_f("render-node-running-%s", e->name)
+					running(fd, e);
 			}
 		}
 
-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH i-g-t 5/5] tests/i915_query: Engine queues tests
  2018-03-19 18:22 ` [Intel-gfx] " Tvrtko Ursulin
@ 2018-03-19 18:22   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 26+ messages in thread
From: Tvrtko Ursulin @ 2018-03-19 18:22 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

...

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 tests/i915_query.c | 381 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 381 insertions(+)

diff --git a/tests/i915_query.c b/tests/i915_query.c
index c7de8cbd8371..94e7a3297ebd 100644
--- a/tests/i915_query.c
+++ b/tests/i915_query.c
@@ -477,8 +477,358 @@ test_query_topology_known_pci_ids(int fd, int devid)
 	free(topo_info);
 }
 
+#define DRM_I915_QUERY_ENGINE_QUEUES	2
+
+struct drm_i915_query_engine_queues {
+	/** Engine class as in enum drm_i915_gem_engine_class. */
+	__u16 class;
+
+	/** Engine instance number. */
+	__u16 instance;
+
+	/** Number of requests with unresolved fences and dependencies. */
+	__u32 queued;
+
+	/** Number of ready requests waiting on a slot on GPU. */
+	__u32 runnable;
+
+	/** Number of requests executing on the GPU. */
+	__u32 running;
+
+	__u32 rsvd[5];
+};
+
+static bool query_engine_queues_supported(int fd)
+{
+	struct drm_i915_query_item item = {
+		.query_id = DRM_I915_QUERY_ENGINE_QUEUES,
+	};
+
+	return __i915_query_items(fd, &item, 1) == 0 && item.length > 0;
+}
+
+static void engine_queues_invalid(int fd)
+{
+	struct drm_i915_query_engine_queues queues;
+	struct drm_i915_query_item item;
+	unsigned int len;
+	unsigned int i;
+
+	/* Flags is MBZ. */
+	memset(&item, 0, sizeof(item));
+	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
+	item.flags = 1;
+	i915_query_items(fd, &item, 1);
+	igt_assert_eq(item.length, -EINVAL);
+
+	/* Length not zero and not greater or equal required size. */
+	memset(&item, 0, sizeof(item));
+	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
+	item.length = 1;
+	i915_query_items(fd, &item, 1);
+	igt_assert_eq(item.length, -ENOSPC);
+
+	/* Query correct length. */
+	memset(&item, 0, sizeof(item));
+	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
+	i915_query_items(fd, &item, 1);
+	igt_assert(item.length >= 0);
+	len = item.length;
+
+	/* Ivalid pointer. */
+	memset(&item, 0, sizeof(item));
+	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
+	item.length = len;
+	i915_query_items(fd, &item, 1);
+	igt_assert_eq(item.length, -EFAULT);
+
+	/* Reserved fields are MBZ. */
+
+	for (i = 0; i < ARRAY_SIZE(queues.rsvd); i++) {
+		memset(&queues, 0, sizeof(queues));
+		queues.rsvd[i] = 1;
+		memset(&item, 0, sizeof(item));
+		item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
+		item.length = len;
+		item.data_ptr = to_user_pointer(&queues);
+		i915_query_items(fd, &item, 1);
+		igt_assert_eq(item.length, -EINVAL);
+	}
+
+	memset(&queues, 0, sizeof(queues));
+	queues.class = -1;
+	memset(&item, 0, sizeof(item));
+	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
+	item.length = len;
+	item.data_ptr = to_user_pointer(&queues);
+	i915_query_items(fd, &item, 1);
+	igt_assert_eq(item.length, -ENOENT);
+
+	memset(&queues, 0, sizeof(queues));
+	queues.instance = -1;
+	memset(&item, 0, sizeof(item));
+	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
+	item.length = len;
+		item.data_ptr = to_user_pointer(&queues);
+	i915_query_items(fd, &item, 1);
+	igt_assert_eq(item.length, -ENOENT);
+}
+
+static void engine_queues(int fd, const struct intel_execution_engine2 *e)
+{
+	struct drm_i915_query_engine_queues queues;
+	struct drm_i915_query_item item;
+	unsigned int len;
+
+	/* Query required buffer length. */
+	memset(&queues, 0, sizeof(queues));
+	memset(&item, 0, sizeof(item));
+	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
+	item.data_ptr = to_user_pointer(&queues);
+	i915_query_items(fd, &item, 1);
+	igt_assert(item.length >= 0);
+	igt_assert(item.length <= sizeof(queues));
+	len = item.length;
+
+	/* Check length larger than required works and reports same length. */
+	memset(&queues, 0, sizeof(queues));
+	memset(&item, 0, sizeof(item));
+	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
+	item.data_ptr = to_user_pointer(&queues);
+	item.length = len + 1;
+	i915_query_items(fd, &item, 1);
+	igt_assert_eq(item.length, len);
+
+	/* Actual query. */
+	memset(&queues, 0, sizeof(queues));
+	queues.class = e->class;
+	queues.instance = e->instance;
+	memset(&item, 0, sizeof(item));
+	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
+	item.data_ptr = to_user_pointer(&queues);
+	item.length = len;
+	i915_query_items(fd, &item, 1);
+	igt_assert_eq(item.length, len);
+}
+
+static unsigned int e2ring(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	return gem_class_instance_to_eb_flags(gem_fd, e->class, e->instance);
+}
+
+static void
+__query_queues(int fd, const struct intel_execution_engine2 *e,
+	       struct drm_i915_query_engine_queues *queues)
+{
+	struct drm_i915_query_item item;
+
+	memset(queues, 0, sizeof(*queues));
+	queues->class = e->class;
+	queues->instance = e->instance;
+	memset(&item, 0, sizeof(item));
+	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
+	item.data_ptr = to_user_pointer(queues);
+	item.length = sizeof(*queues);
+	i915_query_items(fd, &item, 1);
+	igt_assert_eq(item.length, sizeof(*queues));
+}
+
+static void
+engine_queued(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	const unsigned long engine = e2ring(gem_fd, e);
+	struct drm_i915_query_engine_queues queues;
+	const unsigned int max_rq = 10;
+	uint32_t queued[max_rq + 1];
+	uint32_t bo[max_rq + 1];
+	unsigned int n, i;
+
+	memset(queued, 0, sizeof(queued));
+	memset(bo, 0, sizeof(bo));
+
+	for (n = 0; n <= max_rq; n++) {
+		int fence = -1;
+		struct igt_cork cork = { .fd = fence, .type = CORK_SYNC_FD };
+
+		gem_quiescent_gpu(gem_fd);
+
+		if (n)
+			fence = igt_cork_plug(&cork, -1);
+
+		for (i = 0; i < n; i++) {
+			struct drm_i915_gem_exec_object2 obj = { };
+			struct drm_i915_gem_execbuffer2 eb = { };
+
+			if (!bo[i]) {
+				const uint32_t bbe = MI_BATCH_BUFFER_END;
+
+				bo[i] = gem_create(gem_fd, 4096);
+				gem_write(gem_fd, bo[i], 4092, &bbe,
+					  sizeof(bbe));
+			}
+
+			obj.handle = bo[i];
+
+			eb.buffer_count = 1;
+			eb.buffers_ptr = to_user_pointer(&obj);
+
+			eb.flags = engine | I915_EXEC_FENCE_IN;
+			eb.rsvd2 = fence;
+
+			gem_execbuf(gem_fd, &eb);
+		}
+
+		__query_queues(gem_fd, e, &queues);
+		queued[n] = queues.queued;
+		igt_info("n=%u queued=%u\n", n, queued[n]);
+
+		if (fence >= 0)
+			igt_cork_unplug(&cork);
+
+		for (i = 0; i < n; i++)
+			gem_sync(gem_fd, bo[i]);
+	}
+
+	for (i = 0; i < max_rq; i++) {
+		if (bo[i])
+			gem_close(gem_fd, bo[i]);
+	}
+
+	for (i = 0; i <= max_rq; i++)
+		igt_assert_eq(queued[i], i);
+}
+
+static igt_spin_t * __spin_poll(int fd, uint32_t ctx, unsigned long flags)
+{
+	if (gem_can_store_dword(fd, flags))
+		return __igt_spin_batch_new_poll(fd, ctx, flags);
+	else
+		return __igt_spin_batch_new(fd, ctx, flags, 0);
+}
+
+static unsigned long __spin_wait(int fd, igt_spin_t *spin)
+{
+	struct timespec start = { };
+
+	igt_nsec_elapsed(&start);
+
+	if (gem_can_store_dword(fd, spin->execbuf.flags)) {
+		unsigned long timeout = 0;
+
+		while (!spin->running) {
+			unsigned long t = igt_nsec_elapsed(&start);
+
+			if ((t - timeout) > 250e6) {
+				timeout = t;
+				igt_warn("Spinner not running after %.2fms\n",
+					 (double)t / 1e6);
+			}
+		};
+	} else {
+		igt_debug("__spin_wait - usleep mode\n");
+		usleep(500e3); /* Better than nothing! */
+	}
+
+	return igt_nsec_elapsed(&start);
+}
+
+static void
+engine_runnable(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	const unsigned long engine = e2ring(gem_fd, e);
+	struct drm_i915_query_engine_queues queues;
+	const unsigned int max_rq = 10;
+	igt_spin_t *spin[max_rq + 1];
+	uint32_t runnable[max_rq + 1];
+	uint32_t ctx[max_rq];
+	unsigned int n, i;
+
+	memset(runnable, 0, sizeof(runnable));
+	memset(ctx, 0, sizeof(ctx));
+
+	for (n = 0; n <= max_rq; n++) {
+		gem_quiescent_gpu(gem_fd);
+
+		for (i = 0; i < n; i++) {
+			if (!ctx[i])
+				ctx[i] = gem_context_create(gem_fd);
+
+			if (i == 0)
+				spin[i] = __spin_poll(gem_fd, ctx[i], engine);
+			else
+				spin[i] = __igt_spin_batch_new(gem_fd, ctx[i],
+							       engine, 0);
+		}
+
+		if (n)
+			__spin_wait(gem_fd, spin[0]);
+
+		__query_queues(gem_fd, e, &queues);
+		runnable[n] = queues.runnable;
+		igt_info("n=%u runnable=%u\n", n, runnable[n]);
+
+		for (i = 0; i < n; i++) {
+			igt_spin_batch_end(spin[i]);
+			gem_sync(gem_fd, spin[i]->handle);
+			igt_spin_batch_free(gem_fd, spin[i]);
+		}
+	}
+
+	for (i = 0; i < max_rq; i++) {
+		if (ctx[i])
+			gem_context_destroy(gem_fd, ctx[i]);
+	}
+
+	igt_assert_eq(runnable[0], 0);
+	igt_assert(runnable[max_rq] > 0);
+	igt_assert_eq(runnable[max_rq] - runnable[max_rq - 1], 1);
+}
+
+static void
+engine_running(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	const unsigned long engine = e2ring(gem_fd, e);
+	struct drm_i915_query_engine_queues queues;
+	const unsigned int max_rq = 10;
+	igt_spin_t *spin[max_rq + 1];
+	uint32_t running[max_rq + 1];
+	unsigned int n, i;
+
+	memset(running, 0, sizeof(running));
+	memset(spin, 0, sizeof(spin));
+
+	for (n = 0; n <= max_rq; n++) {
+		gem_quiescent_gpu(gem_fd);
+
+		for (i = 0; i < n; i++) {
+			if (i == 0)
+				spin[i] = __spin_poll(gem_fd, 0, engine);
+			else
+				spin[i] = __igt_spin_batch_new(gem_fd, 0,
+							       engine, 0);
+		}
+
+		if (n)
+			__spin_wait(gem_fd, spin[0]);
+
+		__query_queues(gem_fd, e, &queues);
+		running[n] = queues.running;
+		igt_info("n=%u running=%u\n", n, running[n]);
+
+		for (i = 0; i < n; i++) {
+			igt_spin_batch_end(spin[i]);
+			gem_sync(gem_fd, spin[i]->handle);
+			igt_spin_batch_free(gem_fd, spin[i]);
+		}
+	}
+
+	for (i = 0; i <= max_rq; i++)
+		igt_assert_eq(running[i], i);
+}
+
 igt_main
 {
+	const struct intel_execution_engine2 *e;
 	int fd = -1;
 	int devid;
 
@@ -524,6 +874,37 @@ igt_main
 		test_query_topology_known_pci_ids(fd, devid);
 	}
 
+	igt_subtest_group {
+		igt_fixture {
+			igt_require(query_engine_queues_supported(fd));
+		}
+
+		igt_subtest("engine-queues-invalid")
+			engine_queues_invalid(fd);
+
+		for_each_engine_class_instance(fd, e) {
+			igt_subtest_group {
+				igt_fixture {
+					gem_require_engine(fd,
+							   e->class,
+							   e->instance);
+				}
+
+				igt_subtest_f("engine-queues-%s", e->name)
+					engine_queues(fd, e);
+
+				igt_subtest_f("engine-queued-%s", e->name)
+					engine_queued(fd, e);
+
+				igt_subtest_f("engine-runnable-%s", e->name)
+					engine_runnable(fd, e);
+
+				igt_subtest_f("engine-running-%s", e->name)
+					engine_running(fd, e);
+			}
+		}
+	}
+
 	igt_fixture {
 		close(fd);
 	}
-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [igt-dev] [PATCH i-g-t 5/5] tests/i915_query: Engine queues tests
@ 2018-03-19 18:22   ` Tvrtko Ursulin
  0 siblings, 0 replies; 26+ messages in thread
From: Tvrtko Ursulin @ 2018-03-19 18:22 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

...

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 tests/i915_query.c | 381 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 381 insertions(+)

diff --git a/tests/i915_query.c b/tests/i915_query.c
index c7de8cbd8371..94e7a3297ebd 100644
--- a/tests/i915_query.c
+++ b/tests/i915_query.c
@@ -477,8 +477,358 @@ test_query_topology_known_pci_ids(int fd, int devid)
 	free(topo_info);
 }
 
+#define DRM_I915_QUERY_ENGINE_QUEUES	2
+
+struct drm_i915_query_engine_queues {
+	/** Engine class as in enum drm_i915_gem_engine_class. */
+	__u16 class;
+
+	/** Engine instance number. */
+	__u16 instance;
+
+	/** Number of requests with unresolved fences and dependencies. */
+	__u32 queued;
+
+	/** Number of ready requests waiting on a slot on GPU. */
+	__u32 runnable;
+
+	/** Number of requests executing on the GPU. */
+	__u32 running;
+
+	__u32 rsvd[5];
+};
+
+static bool query_engine_queues_supported(int fd)
+{
+	struct drm_i915_query_item item = {
+		.query_id = DRM_I915_QUERY_ENGINE_QUEUES,
+	};
+
+	return __i915_query_items(fd, &item, 1) == 0 && item.length > 0;
+}
+
+static void engine_queues_invalid(int fd)
+{
+	struct drm_i915_query_engine_queues queues;
+	struct drm_i915_query_item item;
+	unsigned int len;
+	unsigned int i;
+
+	/* Flags is MBZ. */
+	memset(&item, 0, sizeof(item));
+	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
+	item.flags = 1;
+	i915_query_items(fd, &item, 1);
+	igt_assert_eq(item.length, -EINVAL);
+
+	/* Length not zero and not greater or equal required size. */
+	memset(&item, 0, sizeof(item));
+	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
+	item.length = 1;
+	i915_query_items(fd, &item, 1);
+	igt_assert_eq(item.length, -ENOSPC);
+
+	/* Query correct length. */
+	memset(&item, 0, sizeof(item));
+	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
+	i915_query_items(fd, &item, 1);
+	igt_assert(item.length >= 0);
+	len = item.length;
+
+	/* Ivalid pointer. */
+	memset(&item, 0, sizeof(item));
+	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
+	item.length = len;
+	i915_query_items(fd, &item, 1);
+	igt_assert_eq(item.length, -EFAULT);
+
+	/* Reserved fields are MBZ. */
+
+	for (i = 0; i < ARRAY_SIZE(queues.rsvd); i++) {
+		memset(&queues, 0, sizeof(queues));
+		queues.rsvd[i] = 1;
+		memset(&item, 0, sizeof(item));
+		item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
+		item.length = len;
+		item.data_ptr = to_user_pointer(&queues);
+		i915_query_items(fd, &item, 1);
+		igt_assert_eq(item.length, -EINVAL);
+	}
+
+	memset(&queues, 0, sizeof(queues));
+	queues.class = -1;
+	memset(&item, 0, sizeof(item));
+	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
+	item.length = len;
+	item.data_ptr = to_user_pointer(&queues);
+	i915_query_items(fd, &item, 1);
+	igt_assert_eq(item.length, -ENOENT);
+
+	memset(&queues, 0, sizeof(queues));
+	queues.instance = -1;
+	memset(&item, 0, sizeof(item));
+	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
+	item.length = len;
+		item.data_ptr = to_user_pointer(&queues);
+	i915_query_items(fd, &item, 1);
+	igt_assert_eq(item.length, -ENOENT);
+}
+
+static void engine_queues(int fd, const struct intel_execution_engine2 *e)
+{
+	struct drm_i915_query_engine_queues queues;
+	struct drm_i915_query_item item;
+	unsigned int len;
+
+	/* Query required buffer length. */
+	memset(&queues, 0, sizeof(queues));
+	memset(&item, 0, sizeof(item));
+	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
+	item.data_ptr = to_user_pointer(&queues);
+	i915_query_items(fd, &item, 1);
+	igt_assert(item.length >= 0);
+	igt_assert(item.length <= sizeof(queues));
+	len = item.length;
+
+	/* Check length larger than required works and reports same length. */
+	memset(&queues, 0, sizeof(queues));
+	memset(&item, 0, sizeof(item));
+	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
+	item.data_ptr = to_user_pointer(&queues);
+	item.length = len + 1;
+	i915_query_items(fd, &item, 1);
+	igt_assert_eq(item.length, len);
+
+	/* Actual query. */
+	memset(&queues, 0, sizeof(queues));
+	queues.class = e->class;
+	queues.instance = e->instance;
+	memset(&item, 0, sizeof(item));
+	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
+	item.data_ptr = to_user_pointer(&queues);
+	item.length = len;
+	i915_query_items(fd, &item, 1);
+	igt_assert_eq(item.length, len);
+}
+
+static unsigned int e2ring(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	return gem_class_instance_to_eb_flags(gem_fd, e->class, e->instance);
+}
+
+static void
+__query_queues(int fd, const struct intel_execution_engine2 *e,
+	       struct drm_i915_query_engine_queues *queues)
+{
+	struct drm_i915_query_item item;
+
+	memset(queues, 0, sizeof(*queues));
+	queues->class = e->class;
+	queues->instance = e->instance;
+	memset(&item, 0, sizeof(item));
+	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
+	item.data_ptr = to_user_pointer(queues);
+	item.length = sizeof(*queues);
+	i915_query_items(fd, &item, 1);
+	igt_assert_eq(item.length, sizeof(*queues));
+}
+
+static void
+engine_queued(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	const unsigned long engine = e2ring(gem_fd, e);
+	struct drm_i915_query_engine_queues queues;
+	const unsigned int max_rq = 10;
+	uint32_t queued[max_rq + 1];
+	uint32_t bo[max_rq + 1];
+	unsigned int n, i;
+
+	memset(queued, 0, sizeof(queued));
+	memset(bo, 0, sizeof(bo));
+
+	for (n = 0; n <= max_rq; n++) {
+		int fence = -1;
+		struct igt_cork cork = { .fd = fence, .type = CORK_SYNC_FD };
+
+		gem_quiescent_gpu(gem_fd);
+
+		if (n)
+			fence = igt_cork_plug(&cork, -1);
+
+		for (i = 0; i < n; i++) {
+			struct drm_i915_gem_exec_object2 obj = { };
+			struct drm_i915_gem_execbuffer2 eb = { };
+
+			if (!bo[i]) {
+				const uint32_t bbe = MI_BATCH_BUFFER_END;
+
+				bo[i] = gem_create(gem_fd, 4096);
+				gem_write(gem_fd, bo[i], 4092, &bbe,
+					  sizeof(bbe));
+			}
+
+			obj.handle = bo[i];
+
+			eb.buffer_count = 1;
+			eb.buffers_ptr = to_user_pointer(&obj);
+
+			eb.flags = engine | I915_EXEC_FENCE_IN;
+			eb.rsvd2 = fence;
+
+			gem_execbuf(gem_fd, &eb);
+		}
+
+		__query_queues(gem_fd, e, &queues);
+		queued[n] = queues.queued;
+		igt_info("n=%u queued=%u\n", n, queued[n]);
+
+		if (fence >= 0)
+			igt_cork_unplug(&cork);
+
+		for (i = 0; i < n; i++)
+			gem_sync(gem_fd, bo[i]);
+	}
+
+	for (i = 0; i < max_rq; i++) {
+		if (bo[i])
+			gem_close(gem_fd, bo[i]);
+	}
+
+	for (i = 0; i <= max_rq; i++)
+		igt_assert_eq(queued[i], i);
+}
+
+static igt_spin_t * __spin_poll(int fd, uint32_t ctx, unsigned long flags)
+{
+	if (gem_can_store_dword(fd, flags))
+		return __igt_spin_batch_new_poll(fd, ctx, flags);
+	else
+		return __igt_spin_batch_new(fd, ctx, flags, 0);
+}
+
+static unsigned long __spin_wait(int fd, igt_spin_t *spin)
+{
+	struct timespec start = { };
+
+	igt_nsec_elapsed(&start);
+
+	if (gem_can_store_dword(fd, spin->execbuf.flags)) {
+		unsigned long timeout = 0;
+
+		while (!spin->running) {
+			unsigned long t = igt_nsec_elapsed(&start);
+
+			if ((t - timeout) > 250e6) {
+				timeout = t;
+				igt_warn("Spinner not running after %.2fms\n",
+					 (double)t / 1e6);
+			}
+		};
+	} else {
+		igt_debug("__spin_wait - usleep mode\n");
+		usleep(500e3); /* Better than nothing! */
+	}
+
+	return igt_nsec_elapsed(&start);
+}
+
+static void
+engine_runnable(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	const unsigned long engine = e2ring(gem_fd, e);
+	struct drm_i915_query_engine_queues queues;
+	const unsigned int max_rq = 10;
+	igt_spin_t *spin[max_rq + 1];
+	uint32_t runnable[max_rq + 1];
+	uint32_t ctx[max_rq];
+	unsigned int n, i;
+
+	memset(runnable, 0, sizeof(runnable));
+	memset(ctx, 0, sizeof(ctx));
+
+	for (n = 0; n <= max_rq; n++) {
+		gem_quiescent_gpu(gem_fd);
+
+		for (i = 0; i < n; i++) {
+			if (!ctx[i])
+				ctx[i] = gem_context_create(gem_fd);
+
+			if (i == 0)
+				spin[i] = __spin_poll(gem_fd, ctx[i], engine);
+			else
+				spin[i] = __igt_spin_batch_new(gem_fd, ctx[i],
+							       engine, 0);
+		}
+
+		if (n)
+			__spin_wait(gem_fd, spin[0]);
+
+		__query_queues(gem_fd, e, &queues);
+		runnable[n] = queues.runnable;
+		igt_info("n=%u runnable=%u\n", n, runnable[n]);
+
+		for (i = 0; i < n; i++) {
+			igt_spin_batch_end(spin[i]);
+			gem_sync(gem_fd, spin[i]->handle);
+			igt_spin_batch_free(gem_fd, spin[i]);
+		}
+	}
+
+	for (i = 0; i < max_rq; i++) {
+		if (ctx[i])
+			gem_context_destroy(gem_fd, ctx[i]);
+	}
+
+	igt_assert_eq(runnable[0], 0);
+	igt_assert(runnable[max_rq] > 0);
+	igt_assert_eq(runnable[max_rq] - runnable[max_rq - 1], 1);
+}
+
+static void
+engine_running(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	const unsigned long engine = e2ring(gem_fd, e);
+	struct drm_i915_query_engine_queues queues;
+	const unsigned int max_rq = 10;
+	igt_spin_t *spin[max_rq + 1];
+	uint32_t running[max_rq + 1];
+	unsigned int n, i;
+
+	memset(running, 0, sizeof(running));
+	memset(spin, 0, sizeof(spin));
+
+	for (n = 0; n <= max_rq; n++) {
+		gem_quiescent_gpu(gem_fd);
+
+		for (i = 0; i < n; i++) {
+			if (i == 0)
+				spin[i] = __spin_poll(gem_fd, 0, engine);
+			else
+				spin[i] = __igt_spin_batch_new(gem_fd, 0,
+							       engine, 0);
+		}
+
+		if (n)
+			__spin_wait(gem_fd, spin[0]);
+
+		__query_queues(gem_fd, e, &queues);
+		running[n] = queues.running;
+		igt_info("n=%u running=%u\n", n, running[n]);
+
+		for (i = 0; i < n; i++) {
+			igt_spin_batch_end(spin[i]);
+			gem_sync(gem_fd, spin[i]->handle);
+			igt_spin_batch_free(gem_fd, spin[i]);
+		}
+	}
+
+	for (i = 0; i <= max_rq; i++)
+		igt_assert_eq(running[i], i);
+}
+
 igt_main
 {
+	const struct intel_execution_engine2 *e;
 	int fd = -1;
 	int devid;
 
@@ -524,6 +874,37 @@ igt_main
 		test_query_topology_known_pci_ids(fd, devid);
 	}
 
+	igt_subtest_group {
+		igt_fixture {
+			igt_require(query_engine_queues_supported(fd));
+		}
+
+		igt_subtest("engine-queues-invalid")
+			engine_queues_invalid(fd);
+
+		for_each_engine_class_instance(fd, e) {
+			igt_subtest_group {
+				igt_fixture {
+					gem_require_engine(fd,
+							   e->class,
+							   e->instance);
+				}
+
+				igt_subtest_f("engine-queues-%s", e->name)
+					engine_queues(fd, e);
+
+				igt_subtest_f("engine-queued-%s", e->name)
+					engine_queued(fd, e);
+
+				igt_subtest_f("engine-runnable-%s", e->name)
+					engine_runnable(fd, e);
+
+				igt_subtest_f("engine-running-%s", e->name)
+					engine_running(fd, e);
+			}
+		}
+	}
+
 	igt_fixture {
 		close(fd);
 	}
-- 
2.14.1

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH i-g-t 4/5] tests/perf_pmu: Add tests for engine queued/runnable/running stats
  2018-03-19 18:22   ` [Intel-gfx] " Tvrtko Ursulin
@ 2018-03-19 20:58     ` Chris Wilson
  -1 siblings, 0 replies; 26+ messages in thread
From: Chris Wilson @ 2018-03-19 20:58 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev; +Cc: Intel-gfx

Quoting Tvrtko Ursulin (2018-03-19 18:22:04)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Simple tests to check reported queue depths are correct.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  tests/perf_pmu.c | 224 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 224 insertions(+)
> 
> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
> index 469b9becdbac..206c18960b7b 100644
> --- a/tests/perf_pmu.c
> +++ b/tests/perf_pmu.c
> @@ -966,6 +966,196 @@ multi_client(int gem_fd, const struct intel_execution_engine2 *e)
>         assert_within_epsilon(val[1], perf_slept[1], tolerance);
>  }
>  
> +static double calc_queued(uint64_t d_val, uint64_t d_ns)
> +{
> +       return (double)d_val * 1e9 / I915_SAMPLE_QUEUED_DIVISOR / d_ns;
> +}
> +
> +static void
> +queued(int gem_fd, const struct intel_execution_engine2 *e)
> +{
> +       const unsigned long engine = e2ring(gem_fd, e);
> +       const unsigned int max_rq = 10;
> +       double queued[max_rq + 1];
> +       uint32_t bo[max_rq + 1];
> +       unsigned int n, i;
> +       uint64_t val[2];
> +       uint64_t ts[2];
> +       int fd;

igt_require_sw_sync();

I guess we should do igt_require_cork(CORK_SYNC_FD) or something like
that.

> +
> +       memset(queued, 0, sizeof(queued));
> +       memset(bo, 0, sizeof(bo));
> +
> +       fd = open_pmu(I915_PMU_ENGINE_QUEUED(e->class, e->instance));
> +
> +       for (n = 0; n <= max_rq; n++) {
> +               int fence = -1;
> +               struct igt_cork cork = { .fd = fence, .type = CORK_SYNC_FD };

IGT_CORK_FENCE(cork); if you prefer

> +
> +               gem_quiescent_gpu(gem_fd);
> +
> +               if (n)
> +                       fence = igt_cork_plug(&cork, -1);
> +
> +               for (i = 0; i < n; i++) {
> +                       struct drm_i915_gem_exec_object2 obj = { };
> +                       struct drm_i915_gem_execbuffer2 eb = { };
> +
> +                       if (!bo[i]) {
> +                               const uint32_t bbe = MI_BATCH_BUFFER_END;
> +
> +                               bo[i] = gem_create(gem_fd, 4096);
> +                               gem_write(gem_fd, bo[i], 4092, &bbe,
> +                                         sizeof(bbe));
> +                       }
> +
> +                       obj.handle = bo[i];

Looks like you can use just the one handle multiple times?

> +
> +                       eb.buffer_count = 1;
> +                       eb.buffers_ptr = to_user_pointer(&obj);
> +
> +                       eb.flags = engine | I915_EXEC_FENCE_IN;
> +                       eb.rsvd2 = fence;

You do however also want to check with one context per execbuf.

if (flags & CONTEXTS)
	eb.rsvd1 = gem_context_create(fd);
> +
> +                       gem_execbuf(gem_fd, &eb);

if (flags & CONTEXTS)
	gem_context_destroy(fd, eb.rsvd1);
	eb.rsvd1 = gem_context_create();

> +               }
> +
> +               val[0] = __pmu_read_single(fd, &ts[0]);
> +               usleep(batch_duration_ns / 1000);
> +               val[1] = __pmu_read_single(fd, &ts[1]);
> +
> +               queued[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
> +               igt_info("n=%u queued=%.2f\n", n, queued[n]);
> +
> +               if (fence >= 0)
> +                       igt_cork_unplug(&cork);

Maybe we should just make this a no-op when used on an unplugged cork.

> +
> +               for (i = 0; i < n; i++)
> +                       gem_sync(gem_fd, bo[i]);
> +       }
> +
> +       close(fd);
> +
> +       for (i = 0; i < max_rq; i++) {
> +               if (bo[i])
> +                       gem_close(gem_fd, bo[i]);
> +       }
> +
> +       for (i = 0; i <= max_rq; i++)
> +               assert_within_epsilon(queued[i], i, tolerance);
> +}
> +
> +static void
> +runnable(int gem_fd, const struct intel_execution_engine2 *e)
> +{
> +       const unsigned long engine = e2ring(gem_fd, e);
> +       const unsigned int max_rq = 10;
> +       igt_spin_t *spin[max_rq + 1];
> +       double runnable[max_rq + 1];
> +       uint32_t ctx[max_rq];
> +       unsigned int n, i;
> +       uint64_t val[2];
> +       uint64_t ts[2];
> +       int fd;
> +
> +       memset(runnable, 0, sizeof(runnable));
> +       memset(ctx, 0, sizeof(ctx));
> +
> +       fd = open_pmu(I915_PMU_ENGINE_RUNNABLE(e->class, e->instance));
> +
> +       for (n = 0; n <= max_rq; n++) {
> +               gem_quiescent_gpu(gem_fd);
> +
> +               for (i = 0; i < n; i++) {
> +                       if (!ctx[i])
> +                               ctx[i] = gem_context_create(gem_fd);
> +
> +                       if (i == 0)
> +                               spin[i] = __spin_poll(gem_fd, ctx[i], engine);
> +                       else
> +                               spin[i] = __igt_spin_batch_new(gem_fd, ctx[i],
> +                                                              engine, 0);
> +               }
> +
> +               if (n)
> +                       __spin_wait(gem_fd, spin[0]);
> +
> +               val[0] = __pmu_read_single(fd, &ts[0]);
> +               usleep(batch_duration_ns / 1000);
> +               val[1] = __pmu_read_single(fd, &ts[1]);
> +
> +               runnable[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
> +               igt_info("n=%u runnable=%.2f\n", n, runnable[n]);
> +
> +               for (i = 0; i < n; i++) {
> +                       end_spin(gem_fd, spin[i], FLAG_SYNC);
> +                       igt_spin_batch_free(gem_fd, spin[i]);
> +               }
> +       }
> +
> +       for (i = 0; i < max_rq; i++) {
> +               if (ctx[i])
> +                       gem_context_destroy(gem_fd, ctx[i]);

I would just create the contexts unconditionally.

> +       }
> +
> +       close(fd);
> +
> +       assert_within_epsilon(runnable[0], 0, tolerance);
> +       igt_assert(runnable[max_rq] > 0.0);
> +       assert_within_epsilon(runnable[max_rq] - runnable[max_rq - 1], 1,
> +                             tolerance);
> +}
> +
> +static void
> +running(int gem_fd, const struct intel_execution_engine2 *e)
> +{
> +       const unsigned long engine = e2ring(gem_fd, e);
> +       const unsigned int max_rq = 10;
> +       igt_spin_t *spin[max_rq + 1];
> +       double running[max_rq + 1];
> +       unsigned int n, i;
> +       uint64_t val[2];
> +       uint64_t ts[2];
> +       int fd;
> +
> +       memset(running, 0, sizeof(running));
> +       memset(spin, 0, sizeof(spin));
> +
> +       fd = open_pmu(I915_PMU_ENGINE_RUNNING(e->class, e->instance));
> +
> +       for (n = 0; n <= max_rq; n++) {
> +               gem_quiescent_gpu(gem_fd);
> +
> +               for (i = 0; i < n; i++) {
> +                       if (i == 0)
> +                               spin[i] = __spin_poll(gem_fd, 0, engine);
> +                       else
> +                               spin[i] = __igt_spin_batch_new(gem_fd, 0,
> +                                                              engine, 0);
> +               }
> +
> +               if (n)
> +                       __spin_wait(gem_fd, spin[0]);

So create N requests on the same context so that running == N due to
lite-restore every time. I have some caveats that this relies on the
precise implementation, e.g. I don't think it will work for guc (using
execlists emulation with no lite-restore) for N > 2 or 8, or if we get
creative with execlists.

> +
> +               val[0] = __pmu_read_single(fd, &ts[0]);
> +               usleep(batch_duration_ns / 1000);
> +               val[1] = __pmu_read_single(fd, &ts[1]);
> +
> +               running[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
> +               igt_info("n=%u running=%.2f\n", n, running[n]);
> +
> +               for (i = 0; i < n; i++) {
> +                       end_spin(gem_fd, spin[i], FLAG_SYNC);
> +                       igt_spin_batch_free(gem_fd, spin[i]);
> +               }
> +       }
> +
> +       close(fd);
> +
> +       for (i = 0; i <= max_rq; i++)
> +               assert_within_epsilon(running[i], i, tolerance);
> +}

Ok, the tests look like they should be covering the counters.

Do we need to do an all-engines pass to check concurrent usage?
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [igt-dev] [Intel-gfx] [PATCH i-g-t 4/5] tests/perf_pmu: Add tests for engine queued/runnable/running stats
@ 2018-03-19 20:58     ` Chris Wilson
  0 siblings, 0 replies; 26+ messages in thread
From: Chris Wilson @ 2018-03-19 20:58 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev; +Cc: Intel-gfx

Quoting Tvrtko Ursulin (2018-03-19 18:22:04)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Simple tests to check reported queue depths are correct.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  tests/perf_pmu.c | 224 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 224 insertions(+)
> 
> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
> index 469b9becdbac..206c18960b7b 100644
> --- a/tests/perf_pmu.c
> +++ b/tests/perf_pmu.c
> @@ -966,6 +966,196 @@ multi_client(int gem_fd, const struct intel_execution_engine2 *e)
>         assert_within_epsilon(val[1], perf_slept[1], tolerance);
>  }
>  
> +static double calc_queued(uint64_t d_val, uint64_t d_ns)
> +{
> +       return (double)d_val * 1e9 / I915_SAMPLE_QUEUED_DIVISOR / d_ns;
> +}
> +
> +static void
> +queued(int gem_fd, const struct intel_execution_engine2 *e)
> +{
> +       const unsigned long engine = e2ring(gem_fd, e);
> +       const unsigned int max_rq = 10;
> +       double queued[max_rq + 1];
> +       uint32_t bo[max_rq + 1];
> +       unsigned int n, i;
> +       uint64_t val[2];
> +       uint64_t ts[2];
> +       int fd;

igt_require_sw_sync();

I guess we should do igt_require_cork(CORK_SYNC_FD) or something like
that.

> +
> +       memset(queued, 0, sizeof(queued));
> +       memset(bo, 0, sizeof(bo));
> +
> +       fd = open_pmu(I915_PMU_ENGINE_QUEUED(e->class, e->instance));
> +
> +       for (n = 0; n <= max_rq; n++) {
> +               int fence = -1;
> +               struct igt_cork cork = { .fd = fence, .type = CORK_SYNC_FD };

IGT_CORK_FENCE(cork); if you prefer

> +
> +               gem_quiescent_gpu(gem_fd);
> +
> +               if (n)
> +                       fence = igt_cork_plug(&cork, -1);
> +
> +               for (i = 0; i < n; i++) {
> +                       struct drm_i915_gem_exec_object2 obj = { };
> +                       struct drm_i915_gem_execbuffer2 eb = { };
> +
> +                       if (!bo[i]) {
> +                               const uint32_t bbe = MI_BATCH_BUFFER_END;
> +
> +                               bo[i] = gem_create(gem_fd, 4096);
> +                               gem_write(gem_fd, bo[i], 4092, &bbe,
> +                                         sizeof(bbe));
> +                       }
> +
> +                       obj.handle = bo[i];

Looks like you can use just the one handle multiple times?

> +
> +                       eb.buffer_count = 1;
> +                       eb.buffers_ptr = to_user_pointer(&obj);
> +
> +                       eb.flags = engine | I915_EXEC_FENCE_IN;
> +                       eb.rsvd2 = fence;

You do however also want to check with one context per execbuf.

if (flags & CONTEXTS)
	eb.rsvd1 = gem_context_create(fd);
> +
> +                       gem_execbuf(gem_fd, &eb);

if (flags & CONTEXTS)
	gem_context_destroy(fd, eb.rsvd1);
	eb.rsvd1 = gem_context_create();

> +               }
> +
> +               val[0] = __pmu_read_single(fd, &ts[0]);
> +               usleep(batch_duration_ns / 1000);
> +               val[1] = __pmu_read_single(fd, &ts[1]);
> +
> +               queued[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
> +               igt_info("n=%u queued=%.2f\n", n, queued[n]);
> +
> +               if (fence >= 0)
> +                       igt_cork_unplug(&cork);

Maybe we should just make this a no-op when used on an unplugged cork.

> +
> +               for (i = 0; i < n; i++)
> +                       gem_sync(gem_fd, bo[i]);
> +       }
> +
> +       close(fd);
> +
> +       for (i = 0; i < max_rq; i++) {
> +               if (bo[i])
> +                       gem_close(gem_fd, bo[i]);
> +       }
> +
> +       for (i = 0; i <= max_rq; i++)
> +               assert_within_epsilon(queued[i], i, tolerance);
> +}
> +
> +static void
> +runnable(int gem_fd, const struct intel_execution_engine2 *e)
> +{
> +       const unsigned long engine = e2ring(gem_fd, e);
> +       const unsigned int max_rq = 10;
> +       igt_spin_t *spin[max_rq + 1];
> +       double runnable[max_rq + 1];
> +       uint32_t ctx[max_rq];
> +       unsigned int n, i;
> +       uint64_t val[2];
> +       uint64_t ts[2];
> +       int fd;
> +
> +       memset(runnable, 0, sizeof(runnable));
> +       memset(ctx, 0, sizeof(ctx));
> +
> +       fd = open_pmu(I915_PMU_ENGINE_RUNNABLE(e->class, e->instance));
> +
> +       for (n = 0; n <= max_rq; n++) {
> +               gem_quiescent_gpu(gem_fd);
> +
> +               for (i = 0; i < n; i++) {
> +                       if (!ctx[i])
> +                               ctx[i] = gem_context_create(gem_fd);
> +
> +                       if (i == 0)
> +                               spin[i] = __spin_poll(gem_fd, ctx[i], engine);
> +                       else
> +                               spin[i] = __igt_spin_batch_new(gem_fd, ctx[i],
> +                                                              engine, 0);
> +               }
> +
> +               if (n)
> +                       __spin_wait(gem_fd, spin[0]);
> +
> +               val[0] = __pmu_read_single(fd, &ts[0]);
> +               usleep(batch_duration_ns / 1000);
> +               val[1] = __pmu_read_single(fd, &ts[1]);
> +
> +               runnable[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
> +               igt_info("n=%u runnable=%.2f\n", n, runnable[n]);
> +
> +               for (i = 0; i < n; i++) {
> +                       end_spin(gem_fd, spin[i], FLAG_SYNC);
> +                       igt_spin_batch_free(gem_fd, spin[i]);
> +               }
> +       }
> +
> +       for (i = 0; i < max_rq; i++) {
> +               if (ctx[i])
> +                       gem_context_destroy(gem_fd, ctx[i]);

I would just create the contexts unconditionally.

> +       }
> +
> +       close(fd);
> +
> +       assert_within_epsilon(runnable[0], 0, tolerance);
> +       igt_assert(runnable[max_rq] > 0.0);
> +       assert_within_epsilon(runnable[max_rq] - runnable[max_rq - 1], 1,
> +                             tolerance);
> +}
> +
> +static void
> +running(int gem_fd, const struct intel_execution_engine2 *e)
> +{
> +       const unsigned long engine = e2ring(gem_fd, e);
> +       const unsigned int max_rq = 10;
> +       igt_spin_t *spin[max_rq + 1];
> +       double running[max_rq + 1];
> +       unsigned int n, i;
> +       uint64_t val[2];
> +       uint64_t ts[2];
> +       int fd;
> +
> +       memset(running, 0, sizeof(running));
> +       memset(spin, 0, sizeof(spin));
> +
> +       fd = open_pmu(I915_PMU_ENGINE_RUNNING(e->class, e->instance));
> +
> +       for (n = 0; n <= max_rq; n++) {
> +               gem_quiescent_gpu(gem_fd);
> +
> +               for (i = 0; i < n; i++) {
> +                       if (i == 0)
> +                               spin[i] = __spin_poll(gem_fd, 0, engine);
> +                       else
> +                               spin[i] = __igt_spin_batch_new(gem_fd, 0,
> +                                                              engine, 0);
> +               }
> +
> +               if (n)
> +                       __spin_wait(gem_fd, spin[0]);

So create N requests on the same context so that running == N due to
lite-restore every time. I have some caveats that this relies on the
precise implementation, e.g. I don't think it will work for guc (using
execlists emulation with no lite-restore) for N > 2 or 8, or if we get
creative with execlists.

> +
> +               val[0] = __pmu_read_single(fd, &ts[0]);
> +               usleep(batch_duration_ns / 1000);
> +               val[1] = __pmu_read_single(fd, &ts[1]);
> +
> +               running[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
> +               igt_info("n=%u running=%.2f\n", n, running[n]);
> +
> +               for (i = 0; i < n; i++) {
> +                       end_spin(gem_fd, spin[i], FLAG_SYNC);
> +                       igt_spin_batch_free(gem_fd, spin[i]);
> +               }
> +       }
> +
> +       close(fd);
> +
> +       for (i = 0; i <= max_rq; i++)
> +               assert_within_epsilon(running[i], i, tolerance);
> +}

Ok, the tests look like they should be covering the counters.

Do we need to do an all-engines pass to check concurrent usage?
-Chris
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [igt-dev] ✗ Fi.CI.BAT: failure for Queued/runnable/running engine stats
  2018-03-19 18:22 ` [Intel-gfx] " Tvrtko Ursulin
                   ` (5 preceding siblings ...)
  (?)
@ 2018-03-19 21:54 ` Patchwork
  -1 siblings, 0 replies; 26+ messages in thread
From: Patchwork @ 2018-03-19 21:54 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: igt-dev

== Series Details ==

Series: Queued/runnable/running engine stats
URL   : https://patchwork.freedesktop.org/series/40217/
State : failure

== Summary ==

IGT patchset build failed on latest successful build
b09e979a67817a9b068f841bda81940b9d208850 tools/aubdump: For gen10+ support addresses up to 4GB

[137/484] Linking target tests/gem_ringfill
[138/484] Linking target tests/gem_set_tiling_vs_gtt
[139/484] Linking target tests/gem_shrink
[140/484] Linking target tests/gem_set_tiling_vs_pwrite
[141/484] Linking target tests/gem_spin_batch
[142/484] Linking target tests/gem_stolen
[143/484] Linking target tests/gem_storedw_batches_loop
[144/484] Linking target tests/gem_softpin
[145/484] Linking target tests/gem_storedw_loop
[146/484] Linking target tests/gem_threaded_access_tiled
[147/484] Linking target tests/gem_tiled_blits
[148/484] Linking target tests/gem_streaming_writes
[149/484] Linking target tests/gem_sync
[150/484] Linking target tests/gem_tiled_pread_basic
[151/484] Linking target tests/gem_tiled_fence_blits
[152/484] Linking target tests/gem_tiled_partial_pwrite_pread
[153/484] Linking target tests/gem_tiled_wc
[154/484] Linking target tests/gem_tiled_pread_pwrite
[155/484] Linking target tests/gem_tiled_wb
[156/484] Linking target tests/gem_tiled_swapping
[157/484] Linking target tests/gem_tiling_max_stride
[158/484] Linking target tests/gem_unfence_active_buffers
[159/484] Linking target tests/gem_unref_active_buffers
[160/484] Linking target tests/gem_wait
[161/484] Linking target tests/gem_write_read_ring_switch
[162/484] Linking target tests/gem_userptr_blits
[163/484] Linking target tests/gen3_mixed_blits
[164/484] Linking target tests/gem_workarounds
[165/484] Linking target tests/gen3_render_linear_blits
[166/484] Linking target tests/kms_3d
[167/484] Linking target tests/gen3_render_tiledx_blits
[168/484] Linking target tests/gen3_render_mixed_blits
[169/484] Compiling c object 'tests/i915_query@exe/i915_query.c.o'
FAILED: tests/i915_query@exe/i915_query.c.o 
ccache cc  '-Itests/i915_query@exe' '-Itests' '-I../tests' '-I.' '-I../' '-Ilib' '-I../lib' '-I../include/drm-uapi' '-I/usr/include/cairo' '-I/usr/include/glib-2.0' '-I/usr/lib/x86_64-linux-gnu/glib-2.0/include' '-I/usr/include/pixman-1' '-I/usr/include/freetype2' '-I/usr/include/libpng12' '-I/opt/igt/include' '-I/opt/igt/include/libdrm' '-I/usr/include' '-I/home/cidrm/kernel_headers/include' '-fdiagnostics-color=always' '-pipe' '-D_FILE_OFFSET_BITS=64' '-Wall' '-Winvalid-pch' '-Wextra' '-std=gnu99' '-O0' '-g' '-D_GNU_SOURCE' '-include' 'config.h' '-Wno-unused-parameter' '-Wno-sign-compare' '-Wno-missing-field-initializers' '-Wno-clobbered' '-Wno-type-limits' '-pthread' '-MMD' '-MQ' 'tests/i915_query@exe/i915_query.c.o' '-MF' 'tests/i915_query@exe/i915_query.c.o.d' -o 'tests/i915_query@exe/i915_query.c.o' -c ../tests/i915_query.c
../tests/i915_query.c: In function ‘__spin_poll’:
../tests/i915_query.c:704:10: warning: implicit declaration of function ‘__igt_spin_batch_new_poll’ [-Wimplicit-function-declaration]
   return __igt_spin_batch_new_poll(fd, ctx, flags);
          ^~~~~~~~~~~~~~~~~~~~~~~~~
../tests/i915_query.c:704:10: warning: return makes pointer from integer without a cast [-Wint-conversion]
   return __igt_spin_batch_new_poll(fd, ctx, flags);
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../tests/i915_query.c: In function ‘__spin_wait’:
../tests/i915_query.c:715:34: error: ‘igt_spin_t {aka struct igt_spin}’ has no member named ‘execbuf’
  if (gem_can_store_dword(fd, spin->execbuf.flags)) {
                                  ^~
../tests/i915_query.c:718:15: error: ‘igt_spin_t {aka struct igt_spin}’ has no member named ‘running’
   while (!spin->running) {
               ^~
ninja: build stopped: subcommand failed.

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 5/5] tests/i915_query: Engine queues tests
  2018-03-19 18:22   ` [igt-dev] " Tvrtko Ursulin
@ 2018-03-22 21:22     ` Lionel Landwerlin
  -1 siblings, 0 replies; 26+ messages in thread
From: Lionel Landwerlin @ 2018-03-22 21:22 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev; +Cc: Intel-gfx

On 19/03/18 18:22, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> ...
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   tests/i915_query.c | 381 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 381 insertions(+)
>
> diff --git a/tests/i915_query.c b/tests/i915_query.c
> index c7de8cbd8371..94e7a3297ebd 100644
> --- a/tests/i915_query.c
> +++ b/tests/i915_query.c
> @@ -477,8 +477,358 @@ test_query_topology_known_pci_ids(int fd, int devid)
>   	free(topo_info);
>   }
>   
> +#define DRM_I915_QUERY_ENGINE_QUEUES	2
> +
> +struct drm_i915_query_engine_queues {
> +	/** Engine class as in enum drm_i915_gem_engine_class. */
> +	__u16 class;
> +
> +	/** Engine instance number. */
> +	__u16 instance;
> +
> +	/** Number of requests with unresolved fences and dependencies. */
> +	__u32 queued;
> +
> +	/** Number of ready requests waiting on a slot on GPU. */
> +	__u32 runnable;
> +
> +	/** Number of requests executing on the GPU. */
> +	__u32 running;
> +
> +	__u32 rsvd[5];
> +};
> +
> +static bool query_engine_queues_supported(int fd)
> +{
> +	struct drm_i915_query_item item = {
> +		.query_id = DRM_I915_QUERY_ENGINE_QUEUES,
> +	};
> +
> +	return __i915_query_items(fd, &item, 1) == 0 && item.length > 0;
> +}
> +
> +static void engine_queues_invalid(int fd)
> +{
> +	struct drm_i915_query_engine_queues queues;
> +	struct drm_i915_query_item item;
> +	unsigned int len;
> +	unsigned int i;
> +
> +	/* Flags is MBZ. */
> +	memset(&item, 0, sizeof(item));
> +	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
> +	item.flags = 1;
> +	i915_query_items(fd, &item, 1);
> +	igt_assert_eq(item.length, -EINVAL);
> +
> +	/* Length not zero and not greater or equal required size. */
> +	memset(&item, 0, sizeof(item));
> +	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
> +	item.length = 1;
> +	i915_query_items(fd, &item, 1);
> +	igt_assert_eq(item.length, -ENOSPC);
> +
> +	/* Query correct length. */
> +	memset(&item, 0, sizeof(item));
> +	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
> +	i915_query_items(fd, &item, 1);
> +	igt_assert(item.length >= 0);
> +	len = item.length;
> +
> +	/* Ivalid pointer. */
> +	memset(&item, 0, sizeof(item));
> +	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
> +	item.length = len;
> +	i915_query_items(fd, &item, 1);
> +	igt_assert_eq(item.length, -EFAULT);
> +
> +	/* Reserved fields are MBZ. */
> +
> +	for (i = 0; i < ARRAY_SIZE(queues.rsvd); i++) {
> +		memset(&queues, 0, sizeof(queues));
> +		queues.rsvd[i] = 1;
> +		memset(&item, 0, sizeof(item));
> +		item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
> +		item.length = len;
> +		item.data_ptr = to_user_pointer(&queues);
> +		i915_query_items(fd, &item, 1);
> +		igt_assert_eq(item.length, -EINVAL);
> +	}
> +
> +	memset(&queues, 0, sizeof(queues));
> +	queues.class = -1;
> +	memset(&item, 0, sizeof(item));
> +	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
> +	item.length = len;
> +	item.data_ptr = to_user_pointer(&queues);
> +	i915_query_items(fd, &item, 1);
> +	igt_assert_eq(item.length, -ENOENT);
> +

Looks like you've copied the few lines above after.
It seems to be the same test.

> +	memset(&queues, 0, sizeof(queues));
> +	queues.instance = -1;
> +	memset(&item, 0, sizeof(item));
> +	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
> +	item.length = len;
> +		item.data_ptr = to_user_pointer(&queues);
> +	i915_query_items(fd, &item, 1);
> +	igt_assert_eq(item.length, -ENOENT);
> +}
> +
> +static void engine_queues(int fd, const struct intel_execution_engine2 *e)
> +{
> +	struct drm_i915_query_engine_queues queues;
> +	struct drm_i915_query_item item;
> +	unsigned int len;
> +
> +	/* Query required buffer length. */
> +	memset(&queues, 0, sizeof(queues));
> +	memset(&item, 0, sizeof(item));
> +	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
> +	item.data_ptr = to_user_pointer(&queues);
> +	i915_query_items(fd, &item, 1);
> +	igt_assert(item.length >= 0);
> +	igt_assert(item.length <= sizeof(queues));
> +	len = item.length;
> +
> +	/* Check length larger than required works and reports same length. */
> +	memset(&queues, 0, sizeof(queues));
> +	memset(&item, 0, sizeof(item));
> +	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
> +	item.data_ptr = to_user_pointer(&queues);
> +	item.length = len + 1;
> +	i915_query_items(fd, &item, 1);
> +	igt_assert_eq(item.length, len);
> +
> +	/* Actual query. */
> +	memset(&queues, 0, sizeof(queues));
> +	queues.class = e->class;
> +	queues.instance = e->instance;
> +	memset(&item, 0, sizeof(item));
> +	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
> +	item.data_ptr = to_user_pointer(&queues);
> +	item.length = len;
> +	i915_query_items(fd, &item, 1);
> +	igt_assert_eq(item.length, len);
> +}
> +
> +static unsigned int e2ring(int gem_fd, const struct intel_execution_engine2 *e)
> +{
> +	return gem_class_instance_to_eb_flags(gem_fd, e->class, e->instance);
> +}
> +
> +static void
> +__query_queues(int fd, const struct intel_execution_engine2 *e,
> +	       struct drm_i915_query_engine_queues *queues)
> +{
> +	struct drm_i915_query_item item;
> +
> +	memset(queues, 0, sizeof(*queues));
> +	queues->class = e->class;
> +	queues->instance = e->instance;
> +	memset(&item, 0, sizeof(item));
> +	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
> +	item.data_ptr = to_user_pointer(queues);
> +	item.length = sizeof(*queues);
> +	i915_query_items(fd, &item, 1);
> +	igt_assert_eq(item.length, sizeof(*queues));
> +}
> +
> +static void
> +engine_queued(int gem_fd, const struct intel_execution_engine2 *e)
> +{
> +	const unsigned long engine = e2ring(gem_fd, e);
> +	struct drm_i915_query_engine_queues queues;
> +	const unsigned int max_rq = 10;
> +	uint32_t queued[max_rq + 1];
> +	uint32_t bo[max_rq + 1];
> +	unsigned int n, i;
> +
> +	memset(queued, 0, sizeof(queued));
> +	memset(bo, 0, sizeof(bo));
> +
> +	for (n = 0; n <= max_rq; n++) {
> +		int fence = -1;
> +		struct igt_cork cork = { .fd = fence, .type = CORK_SYNC_FD };
> +
> +		gem_quiescent_gpu(gem_fd);
> +
> +		if (n)
> +			fence = igt_cork_plug(&cork, -1);
> +
> +		for (i = 0; i < n; i++) {
> +			struct drm_i915_gem_exec_object2 obj = { };
> +			struct drm_i915_gem_execbuffer2 eb = { };
> +
> +			if (!bo[i]) {
> +				const uint32_t bbe = MI_BATCH_BUFFER_END;
> +
> +				bo[i] = gem_create(gem_fd, 4096);
> +				gem_write(gem_fd, bo[i], 4092, &bbe,
> +					  sizeof(bbe));
> +			}
> +
> +			obj.handle = bo[i];
> +
> +			eb.buffer_count = 1;
> +			eb.buffers_ptr = to_user_pointer(&obj);
> +
> +			eb.flags = engine | I915_EXEC_FENCE_IN;
> +			eb.rsvd2 = fence;
> +
> +			gem_execbuf(gem_fd, &eb);
> +		}
> +
> +		__query_queues(gem_fd, e, &queues);
> +		queued[n] = queues.queued;
> +		igt_info("n=%u queued=%u\n", n, queued[n]);
> +
> +		if (fence >= 0)
> +			igt_cork_unplug(&cork);
> +
> +		for (i = 0; i < n; i++)
> +			gem_sync(gem_fd, bo[i]);
> +	}
> +
> +	for (i = 0; i < max_rq; i++) {
> +		if (bo[i])
> +			gem_close(gem_fd, bo[i]);
> +	}
> +
> +	for (i = 0; i <= max_rq; i++)
> +		igt_assert_eq(queued[i], i);
> +}
> +

I'm not sure to understand what the 2 function below are meant to do.
Could you put a comment?

__igt_spin_batch_new_poll also appear to be missing.

> +static igt_spin_t * __spin_poll(int fd, uint32_t ctx, unsigned long flags)
> +{
> +	if (gem_can_store_dword(fd, flags))
> +		return __igt_spin_batch_new_poll(fd, ctx, flags);
> +	else
> +		return __igt_spin_batch_new(fd, ctx, flags, 0);
> +}
> +
> +static unsigned long __spin_wait(int fd, igt_spin_t *spin)
> +{
> +	struct timespec start = { };
> +
> +	igt_nsec_elapsed(&start);
> +
> +	if (gem_can_store_dword(fd, spin->execbuf.flags)) {
> +		unsigned long timeout = 0;
> +
> +		while (!spin->running) {
> +			unsigned long t = igt_nsec_elapsed(&start);
> +
> +			if ((t - timeout) > 250e6) {
> +				timeout = t;
> +				igt_warn("Spinner not running after %.2fms\n",
> +					 (double)t / 1e6);
> +			}
> +		};
> +	} else {
> +		igt_debug("__spin_wait - usleep mode\n");
> +		usleep(500e3); /* Better than nothing! */
> +	}
> +
> +	return igt_nsec_elapsed(&start);
> +}
> +
> +static void
> +engine_runnable(int gem_fd, const struct intel_execution_engine2 *e)
> +{
> +	const unsigned long engine = e2ring(gem_fd, e);
> +	struct drm_i915_query_engine_queues queues;
> +	const unsigned int max_rq = 10;
> +	igt_spin_t *spin[max_rq + 1];
> +	uint32_t runnable[max_rq + 1];
> +	uint32_t ctx[max_rq];
> +	unsigned int n, i;
> +
> +	memset(runnable, 0, sizeof(runnable));
> +	memset(ctx, 0, sizeof(ctx));
> +
> +	for (n = 0; n <= max_rq; n++) {
> +		gem_quiescent_gpu(gem_fd);
> +
> +		for (i = 0; i < n; i++) {
> +			if (!ctx[i])
> +				ctx[i] = gem_context_create(gem_fd);
> +
> +			if (i == 0)
> +				spin[i] = __spin_poll(gem_fd, ctx[i], engine);
> +			else
> +				spin[i] = __igt_spin_batch_new(gem_fd, ctx[i],
> +							       engine, 0);
> +		}
> +
> +		if (n)
> +			__spin_wait(gem_fd, spin[0]);
> +
> +		__query_queues(gem_fd, e, &queues);
> +		runnable[n] = queues.runnable;
> +		igt_info("n=%u runnable=%u\n", n, runnable[n]);
> +
> +		for (i = 0; i < n; i++) {
> +			igt_spin_batch_end(spin[i]);
> +			gem_sync(gem_fd, spin[i]->handle);
> +			igt_spin_batch_free(gem_fd, spin[i]);
> +		}
> +	}
> +
> +	for (i = 0; i < max_rq; i++) {
> +		if (ctx[i])
> +			gem_context_destroy(gem_fd, ctx[i]);
> +	}
> +
> +	igt_assert_eq(runnable[0], 0);

Why only checking the first & last items?
It seems that the results should be consistent? no?

> +	igt_assert(runnable[max_rq] > 0);
> +	igt_assert_eq(runnable[max_rq] - runnable[max_rq - 1], 1);
> +}
> +
> +static void
> +engine_running(int gem_fd, const struct intel_execution_engine2 *e)
> +{
> +	const unsigned long engine = e2ring(gem_fd, e);
> +	struct drm_i915_query_engine_queues queues;
> +	const unsigned int max_rq = 10;
> +	igt_spin_t *spin[max_rq + 1];
> +	uint32_t running[max_rq + 1];
> +	unsigned int n, i;
> +
> +	memset(running, 0, sizeof(running));
> +	memset(spin, 0, sizeof(spin));
> +
> +	for (n = 0; n <= max_rq; n++) {
> +		gem_quiescent_gpu(gem_fd);
> +
> +		for (i = 0; i < n; i++) {
> +			if (i == 0)
> +				spin[i] = __spin_poll(gem_fd, 0, engine);
> +			else
> +				spin[i] = __igt_spin_batch_new(gem_fd, 0,
> +							       engine, 0);
> +		}
> +
> +		if (n)
> +			__spin_wait(gem_fd, spin[0]);
> +
> +		__query_queues(gem_fd, e, &queues);
> +		running[n] = queues.running;
> +		igt_info("n=%u running=%u\n", n, running[n]);
> +
> +		for (i = 0; i < n; i++) {
> +			igt_spin_batch_end(spin[i]);
> +			gem_sync(gem_fd, spin[i]->handle);
> +			igt_spin_batch_free(gem_fd, spin[i]);
> +		}
> +	}
> +
> +	for (i = 0; i <= max_rq; i++)
> +		igt_assert_eq(running[i], i);
> +}
> +
>   igt_main
>   {
> +	const struct intel_execution_engine2 *e;
>   	int fd = -1;
>   	int devid;
>   
> @@ -524,6 +874,37 @@ igt_main
>   		test_query_topology_known_pci_ids(fd, devid);
>   	}

I guess we could add a group for the topology too.

>   
> +	igt_subtest_group {
> +		igt_fixture {
> +			igt_require(query_engine_queues_supported(fd));
> +		}
> +
> +		igt_subtest("engine-queues-invalid")
> +			engine_queues_invalid(fd);
> +
> +		for_each_engine_class_instance(fd, e) {
> +			igt_subtest_group {
> +				igt_fixture {
> +					gem_require_engine(fd,
> +							   e->class,
> +							   e->instance);
> +				}
> +
> +				igt_subtest_f("engine-queues-%s", e->name)
> +					engine_queues(fd, e);
> +
> +				igt_subtest_f("engine-queued-%s", e->name)
> +					engine_queued(fd, e);
> +
> +				igt_subtest_f("engine-runnable-%s", e->name)
> +					engine_runnable(fd, e);
> +
> +				igt_subtest_f("engine-running-%s", e->name)
> +					engine_running(fd, e);
> +			}
> +		}
> +	}
> +
>   	igt_fixture {
>   		close(fd);
>   	}


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 5/5] tests/i915_query: Engine queues tests
@ 2018-03-22 21:22     ` Lionel Landwerlin
  0 siblings, 0 replies; 26+ messages in thread
From: Lionel Landwerlin @ 2018-03-22 21:22 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev; +Cc: Intel-gfx, Tvrtko Ursulin

On 19/03/18 18:22, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> ...
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   tests/i915_query.c | 381 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 381 insertions(+)
>
> diff --git a/tests/i915_query.c b/tests/i915_query.c
> index c7de8cbd8371..94e7a3297ebd 100644
> --- a/tests/i915_query.c
> +++ b/tests/i915_query.c
> @@ -477,8 +477,358 @@ test_query_topology_known_pci_ids(int fd, int devid)
>   	free(topo_info);
>   }
>   
> +#define DRM_I915_QUERY_ENGINE_QUEUES	2
> +
> +struct drm_i915_query_engine_queues {
> +	/** Engine class as in enum drm_i915_gem_engine_class. */
> +	__u16 class;
> +
> +	/** Engine instance number. */
> +	__u16 instance;
> +
> +	/** Number of requests with unresolved fences and dependencies. */
> +	__u32 queued;
> +
> +	/** Number of ready requests waiting on a slot on GPU. */
> +	__u32 runnable;
> +
> +	/** Number of requests executing on the GPU. */
> +	__u32 running;
> +
> +	__u32 rsvd[5];
> +};
> +
> +static bool query_engine_queues_supported(int fd)
> +{
> +	struct drm_i915_query_item item = {
> +		.query_id = DRM_I915_QUERY_ENGINE_QUEUES,
> +	};
> +
> +	return __i915_query_items(fd, &item, 1) == 0 && item.length > 0;
> +}
> +
> +static void engine_queues_invalid(int fd)
> +{
> +	struct drm_i915_query_engine_queues queues;
> +	struct drm_i915_query_item item;
> +	unsigned int len;
> +	unsigned int i;
> +
> +	/* Flags is MBZ. */
> +	memset(&item, 0, sizeof(item));
> +	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
> +	item.flags = 1;
> +	i915_query_items(fd, &item, 1);
> +	igt_assert_eq(item.length, -EINVAL);
> +
> +	/* Length not zero and not greater or equal required size. */
> +	memset(&item, 0, sizeof(item));
> +	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
> +	item.length = 1;
> +	i915_query_items(fd, &item, 1);
> +	igt_assert_eq(item.length, -ENOSPC);
> +
> +	/* Query correct length. */
> +	memset(&item, 0, sizeof(item));
> +	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
> +	i915_query_items(fd, &item, 1);
> +	igt_assert(item.length >= 0);
> +	len = item.length;
> +
> +	/* Ivalid pointer. */
> +	memset(&item, 0, sizeof(item));
> +	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
> +	item.length = len;
> +	i915_query_items(fd, &item, 1);
> +	igt_assert_eq(item.length, -EFAULT);
> +
> +	/* Reserved fields are MBZ. */
> +
> +	for (i = 0; i < ARRAY_SIZE(queues.rsvd); i++) {
> +		memset(&queues, 0, sizeof(queues));
> +		queues.rsvd[i] = 1;
> +		memset(&item, 0, sizeof(item));
> +		item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
> +		item.length = len;
> +		item.data_ptr = to_user_pointer(&queues);
> +		i915_query_items(fd, &item, 1);
> +		igt_assert_eq(item.length, -EINVAL);
> +	}
> +
> +	memset(&queues, 0, sizeof(queues));
> +	queues.class = -1;
> +	memset(&item, 0, sizeof(item));
> +	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
> +	item.length = len;
> +	item.data_ptr = to_user_pointer(&queues);
> +	i915_query_items(fd, &item, 1);
> +	igt_assert_eq(item.length, -ENOENT);
> +

Looks like you've copied the few lines above after.
It seems to be the same test.

> +	memset(&queues, 0, sizeof(queues));
> +	queues.instance = -1;
> +	memset(&item, 0, sizeof(item));
> +	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
> +	item.length = len;
> +		item.data_ptr = to_user_pointer(&queues);
> +	i915_query_items(fd, &item, 1);
> +	igt_assert_eq(item.length, -ENOENT);
> +}
> +
> +static void engine_queues(int fd, const struct intel_execution_engine2 *e)
> +{
> +	struct drm_i915_query_engine_queues queues;
> +	struct drm_i915_query_item item;
> +	unsigned int len;
> +
> +	/* Query required buffer length. */
> +	memset(&queues, 0, sizeof(queues));
> +	memset(&item, 0, sizeof(item));
> +	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
> +	item.data_ptr = to_user_pointer(&queues);
> +	i915_query_items(fd, &item, 1);
> +	igt_assert(item.length >= 0);
> +	igt_assert(item.length <= sizeof(queues));
> +	len = item.length;
> +
> +	/* Check length larger than required works and reports same length. */
> +	memset(&queues, 0, sizeof(queues));
> +	memset(&item, 0, sizeof(item));
> +	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
> +	item.data_ptr = to_user_pointer(&queues);
> +	item.length = len + 1;
> +	i915_query_items(fd, &item, 1);
> +	igt_assert_eq(item.length, len);
> +
> +	/* Actual query. */
> +	memset(&queues, 0, sizeof(queues));
> +	queues.class = e->class;
> +	queues.instance = e->instance;
> +	memset(&item, 0, sizeof(item));
> +	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
> +	item.data_ptr = to_user_pointer(&queues);
> +	item.length = len;
> +	i915_query_items(fd, &item, 1);
> +	igt_assert_eq(item.length, len);
> +}
> +
> +static unsigned int e2ring(int gem_fd, const struct intel_execution_engine2 *e)
> +{
> +	return gem_class_instance_to_eb_flags(gem_fd, e->class, e->instance);
> +}
> +
> +static void
> +__query_queues(int fd, const struct intel_execution_engine2 *e,
> +	       struct drm_i915_query_engine_queues *queues)
> +{
> +	struct drm_i915_query_item item;
> +
> +	memset(queues, 0, sizeof(*queues));
> +	queues->class = e->class;
> +	queues->instance = e->instance;
> +	memset(&item, 0, sizeof(item));
> +	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
> +	item.data_ptr = to_user_pointer(queues);
> +	item.length = sizeof(*queues);
> +	i915_query_items(fd, &item, 1);
> +	igt_assert_eq(item.length, sizeof(*queues));
> +}
> +
> +static void
> +engine_queued(int gem_fd, const struct intel_execution_engine2 *e)
> +{
> +	const unsigned long engine = e2ring(gem_fd, e);
> +	struct drm_i915_query_engine_queues queues;
> +	const unsigned int max_rq = 10;
> +	uint32_t queued[max_rq + 1];
> +	uint32_t bo[max_rq + 1];
> +	unsigned int n, i;
> +
> +	memset(queued, 0, sizeof(queued));
> +	memset(bo, 0, sizeof(bo));
> +
> +	for (n = 0; n <= max_rq; n++) {
> +		int fence = -1;
> +		struct igt_cork cork = { .fd = fence, .type = CORK_SYNC_FD };
> +
> +		gem_quiescent_gpu(gem_fd);
> +
> +		if (n)
> +			fence = igt_cork_plug(&cork, -1);
> +
> +		for (i = 0; i < n; i++) {
> +			struct drm_i915_gem_exec_object2 obj = { };
> +			struct drm_i915_gem_execbuffer2 eb = { };
> +
> +			if (!bo[i]) {
> +				const uint32_t bbe = MI_BATCH_BUFFER_END;
> +
> +				bo[i] = gem_create(gem_fd, 4096);
> +				gem_write(gem_fd, bo[i], 4092, &bbe,
> +					  sizeof(bbe));
> +			}
> +
> +			obj.handle = bo[i];
> +
> +			eb.buffer_count = 1;
> +			eb.buffers_ptr = to_user_pointer(&obj);
> +
> +			eb.flags = engine | I915_EXEC_FENCE_IN;
> +			eb.rsvd2 = fence;
> +
> +			gem_execbuf(gem_fd, &eb);
> +		}
> +
> +		__query_queues(gem_fd, e, &queues);
> +		queued[n] = queues.queued;
> +		igt_info("n=%u queued=%u\n", n, queued[n]);
> +
> +		if (fence >= 0)
> +			igt_cork_unplug(&cork);
> +
> +		for (i = 0; i < n; i++)
> +			gem_sync(gem_fd, bo[i]);
> +	}
> +
> +	for (i = 0; i < max_rq; i++) {
> +		if (bo[i])
> +			gem_close(gem_fd, bo[i]);
> +	}
> +
> +	for (i = 0; i <= max_rq; i++)
> +		igt_assert_eq(queued[i], i);
> +}
> +

I'm not sure to understand what the 2 function below are meant to do.
Could you put a comment?

__igt_spin_batch_new_poll also appear to be missing.

> +static igt_spin_t * __spin_poll(int fd, uint32_t ctx, unsigned long flags)
> +{
> +	if (gem_can_store_dword(fd, flags))
> +		return __igt_spin_batch_new_poll(fd, ctx, flags);
> +	else
> +		return __igt_spin_batch_new(fd, ctx, flags, 0);
> +}
> +
> +static unsigned long __spin_wait(int fd, igt_spin_t *spin)
> +{
> +	struct timespec start = { };
> +
> +	igt_nsec_elapsed(&start);
> +
> +	if (gem_can_store_dword(fd, spin->execbuf.flags)) {
> +		unsigned long timeout = 0;
> +
> +		while (!spin->running) {
> +			unsigned long t = igt_nsec_elapsed(&start);
> +
> +			if ((t - timeout) > 250e6) {
> +				timeout = t;
> +				igt_warn("Spinner not running after %.2fms\n",
> +					 (double)t / 1e6);
> +			}
> +		};
> +	} else {
> +		igt_debug("__spin_wait - usleep mode\n");
> +		usleep(500e3); /* Better than nothing! */
> +	}
> +
> +	return igt_nsec_elapsed(&start);
> +}
> +
> +static void
> +engine_runnable(int gem_fd, const struct intel_execution_engine2 *e)
> +{
> +	const unsigned long engine = e2ring(gem_fd, e);
> +	struct drm_i915_query_engine_queues queues;
> +	const unsigned int max_rq = 10;
> +	igt_spin_t *spin[max_rq + 1];
> +	uint32_t runnable[max_rq + 1];
> +	uint32_t ctx[max_rq];
> +	unsigned int n, i;
> +
> +	memset(runnable, 0, sizeof(runnable));
> +	memset(ctx, 0, sizeof(ctx));
> +
> +	for (n = 0; n <= max_rq; n++) {
> +		gem_quiescent_gpu(gem_fd);
> +
> +		for (i = 0; i < n; i++) {
> +			if (!ctx[i])
> +				ctx[i] = gem_context_create(gem_fd);
> +
> +			if (i == 0)
> +				spin[i] = __spin_poll(gem_fd, ctx[i], engine);
> +			else
> +				spin[i] = __igt_spin_batch_new(gem_fd, ctx[i],
> +							       engine, 0);
> +		}
> +
> +		if (n)
> +			__spin_wait(gem_fd, spin[0]);
> +
> +		__query_queues(gem_fd, e, &queues);
> +		runnable[n] = queues.runnable;
> +		igt_info("n=%u runnable=%u\n", n, runnable[n]);
> +
> +		for (i = 0; i < n; i++) {
> +			igt_spin_batch_end(spin[i]);
> +			gem_sync(gem_fd, spin[i]->handle);
> +			igt_spin_batch_free(gem_fd, spin[i]);
> +		}
> +	}
> +
> +	for (i = 0; i < max_rq; i++) {
> +		if (ctx[i])
> +			gem_context_destroy(gem_fd, ctx[i]);
> +	}
> +
> +	igt_assert_eq(runnable[0], 0);

Why only checking the first & last items?
It seems that the results should be consistent? no?

> +	igt_assert(runnable[max_rq] > 0);
> +	igt_assert_eq(runnable[max_rq] - runnable[max_rq - 1], 1);
> +}
> +
> +static void
> +engine_running(int gem_fd, const struct intel_execution_engine2 *e)
> +{
> +	const unsigned long engine = e2ring(gem_fd, e);
> +	struct drm_i915_query_engine_queues queues;
> +	const unsigned int max_rq = 10;
> +	igt_spin_t *spin[max_rq + 1];
> +	uint32_t running[max_rq + 1];
> +	unsigned int n, i;
> +
> +	memset(running, 0, sizeof(running));
> +	memset(spin, 0, sizeof(spin));
> +
> +	for (n = 0; n <= max_rq; n++) {
> +		gem_quiescent_gpu(gem_fd);
> +
> +		for (i = 0; i < n; i++) {
> +			if (i == 0)
> +				spin[i] = __spin_poll(gem_fd, 0, engine);
> +			else
> +				spin[i] = __igt_spin_batch_new(gem_fd, 0,
> +							       engine, 0);
> +		}
> +
> +		if (n)
> +			__spin_wait(gem_fd, spin[0]);
> +
> +		__query_queues(gem_fd, e, &queues);
> +		running[n] = queues.running;
> +		igt_info("n=%u running=%u\n", n, running[n]);
> +
> +		for (i = 0; i < n; i++) {
> +			igt_spin_batch_end(spin[i]);
> +			gem_sync(gem_fd, spin[i]->handle);
> +			igt_spin_batch_free(gem_fd, spin[i]);
> +		}
> +	}
> +
> +	for (i = 0; i <= max_rq; i++)
> +		igt_assert_eq(running[i], i);
> +}
> +
>   igt_main
>   {
> +	const struct intel_execution_engine2 *e;
>   	int fd = -1;
>   	int devid;
>   
> @@ -524,6 +874,37 @@ igt_main
>   		test_query_topology_known_pci_ids(fd, devid);
>   	}

I guess we could add a group for the topology too.

>   
> +	igt_subtest_group {
> +		igt_fixture {
> +			igt_require(query_engine_queues_supported(fd));
> +		}
> +
> +		igt_subtest("engine-queues-invalid")
> +			engine_queues_invalid(fd);
> +
> +		for_each_engine_class_instance(fd, e) {
> +			igt_subtest_group {
> +				igt_fixture {
> +					gem_require_engine(fd,
> +							   e->class,
> +							   e->instance);
> +				}
> +
> +				igt_subtest_f("engine-queues-%s", e->name)
> +					engine_queues(fd, e);
> +
> +				igt_subtest_f("engine-queued-%s", e->name)
> +					engine_queued(fd, e);
> +
> +				igt_subtest_f("engine-runnable-%s", e->name)
> +					engine_runnable(fd, e);
> +
> +				igt_subtest_f("engine-running-%s", e->name)
> +					engine_running(fd, e);
> +			}
> +		}
> +	}
> +
>   	igt_fixture {
>   		close(fd);
>   	}


_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 4/5] tests/perf_pmu: Add tests for engine queued/runnable/running stats
  2018-03-19 20:58     ` [igt-dev] [Intel-gfx] " Chris Wilson
@ 2018-03-23 10:08       ` Tvrtko Ursulin
  -1 siblings, 0 replies; 26+ messages in thread
From: Tvrtko Ursulin @ 2018-03-23 10:08 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, igt-dev; +Cc: Intel-gfx


On 19/03/2018 20:58, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-03-19 18:22:04)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Simple tests to check reported queue depths are correct.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> ---
>>   tests/perf_pmu.c | 224 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 224 insertions(+)
>>
>> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
>> index 469b9becdbac..206c18960b7b 100644
>> --- a/tests/perf_pmu.c
>> +++ b/tests/perf_pmu.c
>> @@ -966,6 +966,196 @@ multi_client(int gem_fd, const struct intel_execution_engine2 *e)
>>          assert_within_epsilon(val[1], perf_slept[1], tolerance);
>>   }
>>   
>> +static double calc_queued(uint64_t d_val, uint64_t d_ns)
>> +{
>> +       return (double)d_val * 1e9 / I915_SAMPLE_QUEUED_DIVISOR / d_ns;
>> +}
>> +
>> +static void
>> +queued(int gem_fd, const struct intel_execution_engine2 *e)
>> +{
>> +       const unsigned long engine = e2ring(gem_fd, e);
>> +       const unsigned int max_rq = 10;
>> +       double queued[max_rq + 1];
>> +       uint32_t bo[max_rq + 1];
>> +       unsigned int n, i;
>> +       uint64_t val[2];
>> +       uint64_t ts[2];
>> +       int fd;
> 
> igt_require_sw_sync();
> 
> I guess we should do igt_require_cork(CORK_SYNC_FD) or something like
> that.
> 
>> +
>> +       memset(queued, 0, sizeof(queued));
>> +       memset(bo, 0, sizeof(bo));
>> +
>> +       fd = open_pmu(I915_PMU_ENGINE_QUEUED(e->class, e->instance));
>> +
>> +       for (n = 0; n <= max_rq; n++) {
>> +               int fence = -1;
>> +               struct igt_cork cork = { .fd = fence, .type = CORK_SYNC_FD };
> 
> IGT_CORK_FENCE(cork); if you prefer

Missed it.

> 
>> +
>> +               gem_quiescent_gpu(gem_fd);
>> +
>> +               if (n)
>> +                       fence = igt_cork_plug(&cork, -1);
>> +
>> +               for (i = 0; i < n; i++) {
>> +                       struct drm_i915_gem_exec_object2 obj = { };
>> +                       struct drm_i915_gem_execbuffer2 eb = { };
>> +
>> +                       if (!bo[i]) {
>> +                               const uint32_t bbe = MI_BATCH_BUFFER_END;
>> +
>> +                               bo[i] = gem_create(gem_fd, 4096);
>> +                               gem_write(gem_fd, bo[i], 4092, &bbe,
>> +                                         sizeof(bbe));
>> +                       }
>> +
>> +                       obj.handle = bo[i];
> 
> Looks like you can use just the one handle multiple times?

Hm yeah.

> 
>> +
>> +                       eb.buffer_count = 1;
>> +                       eb.buffers_ptr = to_user_pointer(&obj);
>> +
>> +                       eb.flags = engine | I915_EXEC_FENCE_IN;
>> +                       eb.rsvd2 = fence;
> 
> You do however also want to check with one context per execbuf.

Ok.

> 
> if (flags & CONTEXTS)
> 	eb.rsvd1 = gem_context_create(fd);
>> +
>> +                       gem_execbuf(gem_fd, &eb);
> 
> if (flags & CONTEXTS)
> 	gem_context_destroy(fd, eb.rsvd1);
> 	eb.rsvd1 = gem_context_create();
> 
>> +               }
>> +
>> +               val[0] = __pmu_read_single(fd, &ts[0]);
>> +               usleep(batch_duration_ns / 1000);
>> +               val[1] = __pmu_read_single(fd, &ts[1]);
>> +
>> +               queued[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
>> +               igt_info("n=%u queued=%.2f\n", n, queued[n]);
>> +
>> +               if (fence >= 0)
>> +                       igt_cork_unplug(&cork);
> 
> Maybe we should just make this a no-op when used on an unplugged cork.

Don't know really, there's an assert in there and I didn't feel like 
evaluating all callers.

> 
>> +
>> +               for (i = 0; i < n; i++)
>> +                       gem_sync(gem_fd, bo[i]);
>> +       }
>> +
>> +       close(fd);
>> +
>> +       for (i = 0; i < max_rq; i++) {
>> +               if (bo[i])
>> +                       gem_close(gem_fd, bo[i]);
>> +       }
>> +
>> +       for (i = 0; i <= max_rq; i++)
>> +               assert_within_epsilon(queued[i], i, tolerance);
>> +}
>> +
>> +static void
>> +runnable(int gem_fd, const struct intel_execution_engine2 *e)
>> +{
>> +       const unsigned long engine = e2ring(gem_fd, e);
>> +       const unsigned int max_rq = 10;
>> +       igt_spin_t *spin[max_rq + 1];
>> +       double runnable[max_rq + 1];
>> +       uint32_t ctx[max_rq];
>> +       unsigned int n, i;
>> +       uint64_t val[2];
>> +       uint64_t ts[2];
>> +       int fd;
>> +
>> +       memset(runnable, 0, sizeof(runnable));
>> +       memset(ctx, 0, sizeof(ctx));
>> +
>> +       fd = open_pmu(I915_PMU_ENGINE_RUNNABLE(e->class, e->instance));
>> +
>> +       for (n = 0; n <= max_rq; n++) {
>> +               gem_quiescent_gpu(gem_fd);
>> +
>> +               for (i = 0; i < n; i++) {
>> +                       if (!ctx[i])
>> +                               ctx[i] = gem_context_create(gem_fd);
>> +
>> +                       if (i == 0)
>> +                               spin[i] = __spin_poll(gem_fd, ctx[i], engine);
>> +                       else
>> +                               spin[i] = __igt_spin_batch_new(gem_fd, ctx[i],
>> +                                                              engine, 0);
>> +               }
>> +
>> +               if (n)
>> +                       __spin_wait(gem_fd, spin[0]);
>> +
>> +               val[0] = __pmu_read_single(fd, &ts[0]);
>> +               usleep(batch_duration_ns / 1000);
>> +               val[1] = __pmu_read_single(fd, &ts[1]);
>> +
>> +               runnable[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
>> +               igt_info("n=%u runnable=%.2f\n", n, runnable[n]);
>> +
>> +               for (i = 0; i < n; i++) {
>> +                       end_spin(gem_fd, spin[i], FLAG_SYNC);
>> +                       igt_spin_batch_free(gem_fd, spin[i]);
>> +               }
>> +       }
>> +
>> +       for (i = 0; i < max_rq; i++) {
>> +               if (ctx[i])
>> +                       gem_context_destroy(gem_fd, ctx[i]);
> 
> I would just create the contexts unconditionally.

Can do.

> 
>> +       }
>> +
>> +       close(fd);
>> +
>> +       assert_within_epsilon(runnable[0], 0, tolerance);
>> +       igt_assert(runnable[max_rq] > 0.0);
>> +       assert_within_epsilon(runnable[max_rq] - runnable[max_rq - 1], 1,
>> +                             tolerance);
>> +}
>> +
>> +static void
>> +running(int gem_fd, const struct intel_execution_engine2 *e)
>> +{
>> +       const unsigned long engine = e2ring(gem_fd, e);
>> +       const unsigned int max_rq = 10;
>> +       igt_spin_t *spin[max_rq + 1];
>> +       double running[max_rq + 1];
>> +       unsigned int n, i;
>> +       uint64_t val[2];
>> +       uint64_t ts[2];
>> +       int fd;
>> +
>> +       memset(running, 0, sizeof(running));
>> +       memset(spin, 0, sizeof(spin));
>> +
>> +       fd = open_pmu(I915_PMU_ENGINE_RUNNING(e->class, e->instance));
>> +
>> +       for (n = 0; n <= max_rq; n++) {
>> +               gem_quiescent_gpu(gem_fd);
>> +
>> +               for (i = 0; i < n; i++) {
>> +                       if (i == 0)
>> +                               spin[i] = __spin_poll(gem_fd, 0, engine);
>> +                       else
>> +                               spin[i] = __igt_spin_batch_new(gem_fd, 0,
>> +                                                              engine, 0);
>> +               }
>> +
>> +               if (n)
>> +                       __spin_wait(gem_fd, spin[0]);
> 
> So create N requests on the same context so that running == N due to
> lite-restore every time. I have some caveats that this relies on the
> precise implementation, e.g. I don't think it will work for guc (using
> execlists emulation with no lite-restore) for N > 2 or 8, or if we get
> creative with execlists.

Yep, I think it doesn't work with GuC. Well, the assert would need to be 
different as minimum.

For N > 2 and execlists I think it works since it only needs port 0.

Runnable subtest on the other hand tries to be flexible towards 
different possible Ns already. I guess for running I could downgrade the 
asserts to just check running[N + 1] >= running[N] and running[1] > 0.

> 
>> +
>> +               val[0] = __pmu_read_single(fd, &ts[0]);
>> +               usleep(batch_duration_ns / 1000);
>> +               val[1] = __pmu_read_single(fd, &ts[1]);
>> +
>> +               running[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
>> +               igt_info("n=%u running=%.2f\n", n, running[n]);
>> +
>> +               for (i = 0; i < n; i++) {
>> +                       end_spin(gem_fd, spin[i], FLAG_SYNC);
>> +                       igt_spin_batch_free(gem_fd, spin[i]);
>> +               }
>> +       }
>> +
>> +       close(fd);
>> +
>> +       for (i = 0; i <= max_rq; i++)
>> +               assert_within_epsilon(running[i], i, tolerance);
>> +}
> 
> Ok, the tests look like they should be covering the counters.
> 
> Do we need to do an all-engines pass to check concurrent usage?

Depends how grey is your approach between white box and black box testing.

But what is definitely needed is some tests involving hangs, resets and 
preemption, since I am pretty sure a bug sneaked in somewhere in those 
areas.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [igt-dev] [Intel-gfx] [PATCH i-g-t 4/5] tests/perf_pmu: Add tests for engine queued/runnable/running stats
@ 2018-03-23 10:08       ` Tvrtko Ursulin
  0 siblings, 0 replies; 26+ messages in thread
From: Tvrtko Ursulin @ 2018-03-23 10:08 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, igt-dev; +Cc: Intel-gfx


On 19/03/2018 20:58, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-03-19 18:22:04)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Simple tests to check reported queue depths are correct.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> ---
>>   tests/perf_pmu.c | 224 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 224 insertions(+)
>>
>> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
>> index 469b9becdbac..206c18960b7b 100644
>> --- a/tests/perf_pmu.c
>> +++ b/tests/perf_pmu.c
>> @@ -966,6 +966,196 @@ multi_client(int gem_fd, const struct intel_execution_engine2 *e)
>>          assert_within_epsilon(val[1], perf_slept[1], tolerance);
>>   }
>>   
>> +static double calc_queued(uint64_t d_val, uint64_t d_ns)
>> +{
>> +       return (double)d_val * 1e9 / I915_SAMPLE_QUEUED_DIVISOR / d_ns;
>> +}
>> +
>> +static void
>> +queued(int gem_fd, const struct intel_execution_engine2 *e)
>> +{
>> +       const unsigned long engine = e2ring(gem_fd, e);
>> +       const unsigned int max_rq = 10;
>> +       double queued[max_rq + 1];
>> +       uint32_t bo[max_rq + 1];
>> +       unsigned int n, i;
>> +       uint64_t val[2];
>> +       uint64_t ts[2];
>> +       int fd;
> 
> igt_require_sw_sync();
> 
> I guess we should do igt_require_cork(CORK_SYNC_FD) or something like
> that.
> 
>> +
>> +       memset(queued, 0, sizeof(queued));
>> +       memset(bo, 0, sizeof(bo));
>> +
>> +       fd = open_pmu(I915_PMU_ENGINE_QUEUED(e->class, e->instance));
>> +
>> +       for (n = 0; n <= max_rq; n++) {
>> +               int fence = -1;
>> +               struct igt_cork cork = { .fd = fence, .type = CORK_SYNC_FD };
> 
> IGT_CORK_FENCE(cork); if you prefer

Missed it.

> 
>> +
>> +               gem_quiescent_gpu(gem_fd);
>> +
>> +               if (n)
>> +                       fence = igt_cork_plug(&cork, -1);
>> +
>> +               for (i = 0; i < n; i++) {
>> +                       struct drm_i915_gem_exec_object2 obj = { };
>> +                       struct drm_i915_gem_execbuffer2 eb = { };
>> +
>> +                       if (!bo[i]) {
>> +                               const uint32_t bbe = MI_BATCH_BUFFER_END;
>> +
>> +                               bo[i] = gem_create(gem_fd, 4096);
>> +                               gem_write(gem_fd, bo[i], 4092, &bbe,
>> +                                         sizeof(bbe));
>> +                       }
>> +
>> +                       obj.handle = bo[i];
> 
> Looks like you can use just the one handle multiple times?

Hm yeah.

> 
>> +
>> +                       eb.buffer_count = 1;
>> +                       eb.buffers_ptr = to_user_pointer(&obj);
>> +
>> +                       eb.flags = engine | I915_EXEC_FENCE_IN;
>> +                       eb.rsvd2 = fence;
> 
> You do however also want to check with one context per execbuf.

Ok.

> 
> if (flags & CONTEXTS)
> 	eb.rsvd1 = gem_context_create(fd);
>> +
>> +                       gem_execbuf(gem_fd, &eb);
> 
> if (flags & CONTEXTS)
> 	gem_context_destroy(fd, eb.rsvd1);
> 	eb.rsvd1 = gem_context_create();
> 
>> +               }
>> +
>> +               val[0] = __pmu_read_single(fd, &ts[0]);
>> +               usleep(batch_duration_ns / 1000);
>> +               val[1] = __pmu_read_single(fd, &ts[1]);
>> +
>> +               queued[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
>> +               igt_info("n=%u queued=%.2f\n", n, queued[n]);
>> +
>> +               if (fence >= 0)
>> +                       igt_cork_unplug(&cork);
> 
> Maybe we should just make this a no-op when used on an unplugged cork.

Don't know really, there's an assert in there and I didn't feel like 
evaluating all callers.

> 
>> +
>> +               for (i = 0; i < n; i++)
>> +                       gem_sync(gem_fd, bo[i]);
>> +       }
>> +
>> +       close(fd);
>> +
>> +       for (i = 0; i < max_rq; i++) {
>> +               if (bo[i])
>> +                       gem_close(gem_fd, bo[i]);
>> +       }
>> +
>> +       for (i = 0; i <= max_rq; i++)
>> +               assert_within_epsilon(queued[i], i, tolerance);
>> +}
>> +
>> +static void
>> +runnable(int gem_fd, const struct intel_execution_engine2 *e)
>> +{
>> +       const unsigned long engine = e2ring(gem_fd, e);
>> +       const unsigned int max_rq = 10;
>> +       igt_spin_t *spin[max_rq + 1];
>> +       double runnable[max_rq + 1];
>> +       uint32_t ctx[max_rq];
>> +       unsigned int n, i;
>> +       uint64_t val[2];
>> +       uint64_t ts[2];
>> +       int fd;
>> +
>> +       memset(runnable, 0, sizeof(runnable));
>> +       memset(ctx, 0, sizeof(ctx));
>> +
>> +       fd = open_pmu(I915_PMU_ENGINE_RUNNABLE(e->class, e->instance));
>> +
>> +       for (n = 0; n <= max_rq; n++) {
>> +               gem_quiescent_gpu(gem_fd);
>> +
>> +               for (i = 0; i < n; i++) {
>> +                       if (!ctx[i])
>> +                               ctx[i] = gem_context_create(gem_fd);
>> +
>> +                       if (i == 0)
>> +                               spin[i] = __spin_poll(gem_fd, ctx[i], engine);
>> +                       else
>> +                               spin[i] = __igt_spin_batch_new(gem_fd, ctx[i],
>> +                                                              engine, 0);
>> +               }
>> +
>> +               if (n)
>> +                       __spin_wait(gem_fd, spin[0]);
>> +
>> +               val[0] = __pmu_read_single(fd, &ts[0]);
>> +               usleep(batch_duration_ns / 1000);
>> +               val[1] = __pmu_read_single(fd, &ts[1]);
>> +
>> +               runnable[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
>> +               igt_info("n=%u runnable=%.2f\n", n, runnable[n]);
>> +
>> +               for (i = 0; i < n; i++) {
>> +                       end_spin(gem_fd, spin[i], FLAG_SYNC);
>> +                       igt_spin_batch_free(gem_fd, spin[i]);
>> +               }
>> +       }
>> +
>> +       for (i = 0; i < max_rq; i++) {
>> +               if (ctx[i])
>> +                       gem_context_destroy(gem_fd, ctx[i]);
> 
> I would just create the contexts unconditionally.

Can do.

> 
>> +       }
>> +
>> +       close(fd);
>> +
>> +       assert_within_epsilon(runnable[0], 0, tolerance);
>> +       igt_assert(runnable[max_rq] > 0.0);
>> +       assert_within_epsilon(runnable[max_rq] - runnable[max_rq - 1], 1,
>> +                             tolerance);
>> +}
>> +
>> +static void
>> +running(int gem_fd, const struct intel_execution_engine2 *e)
>> +{
>> +       const unsigned long engine = e2ring(gem_fd, e);
>> +       const unsigned int max_rq = 10;
>> +       igt_spin_t *spin[max_rq + 1];
>> +       double running[max_rq + 1];
>> +       unsigned int n, i;
>> +       uint64_t val[2];
>> +       uint64_t ts[2];
>> +       int fd;
>> +
>> +       memset(running, 0, sizeof(running));
>> +       memset(spin, 0, sizeof(spin));
>> +
>> +       fd = open_pmu(I915_PMU_ENGINE_RUNNING(e->class, e->instance));
>> +
>> +       for (n = 0; n <= max_rq; n++) {
>> +               gem_quiescent_gpu(gem_fd);
>> +
>> +               for (i = 0; i < n; i++) {
>> +                       if (i == 0)
>> +                               spin[i] = __spin_poll(gem_fd, 0, engine);
>> +                       else
>> +                               spin[i] = __igt_spin_batch_new(gem_fd, 0,
>> +                                                              engine, 0);
>> +               }
>> +
>> +               if (n)
>> +                       __spin_wait(gem_fd, spin[0]);
> 
> So create N requests on the same context so that running == N due to
> lite-restore every time. I have some caveats that this relies on the
> precise implementation, e.g. I don't think it will work for guc (using
> execlists emulation with no lite-restore) for N > 2 or 8, or if we get
> creative with execlists.

Yep, I think it doesn't work with GuC. Well, the assert would need to be 
different as minimum.

For N > 2 and execlists I think it works since it only needs port 0.

Runnable subtest on the other hand tries to be flexible towards 
different possible Ns already. I guess for running I could downgrade the 
asserts to just check running[N + 1] >= running[N] and running[1] > 0.

> 
>> +
>> +               val[0] = __pmu_read_single(fd, &ts[0]);
>> +               usleep(batch_duration_ns / 1000);
>> +               val[1] = __pmu_read_single(fd, &ts[1]);
>> +
>> +               running[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
>> +               igt_info("n=%u running=%.2f\n", n, running[n]);
>> +
>> +               for (i = 0; i < n; i++) {
>> +                       end_spin(gem_fd, spin[i], FLAG_SYNC);
>> +                       igt_spin_batch_free(gem_fd, spin[i]);
>> +               }
>> +       }
>> +
>> +       close(fd);
>> +
>> +       for (i = 0; i <= max_rq; i++)
>> +               assert_within_epsilon(running[i], i, tolerance);
>> +}
> 
> Ok, the tests look like they should be covering the counters.
> 
> Do we need to do an all-engines pass to check concurrent usage?

Depends how grey is your approach between white box and black box testing.

But what is definitely needed is some tests involving hangs, resets and 
preemption, since I am pretty sure a bug sneaked in somewhere in those 
areas.

Regards,

Tvrtko
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 5/5] tests/i915_query: Engine queues tests
  2018-03-22 21:22     ` Lionel Landwerlin
@ 2018-03-23 10:18       ` Tvrtko Ursulin
  -1 siblings, 0 replies; 26+ messages in thread
From: Tvrtko Ursulin @ 2018-03-23 10:18 UTC (permalink / raw)
  To: Lionel Landwerlin, Tvrtko Ursulin, igt-dev; +Cc: Intel-gfx


On 22/03/2018 21:22, Lionel Landwerlin wrote:
> On 19/03/18 18:22, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> ...
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> ---
>>   tests/i915_query.c | 381 
>> +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 381 insertions(+)
>>
>> diff --git a/tests/i915_query.c b/tests/i915_query.c
>> index c7de8cbd8371..94e7a3297ebd 100644
>> --- a/tests/i915_query.c
>> +++ b/tests/i915_query.c
>> @@ -477,8 +477,358 @@ test_query_topology_known_pci_ids(int fd, int 
>> devid)
>>       free(topo_info);
>>   }
>> +#define DRM_I915_QUERY_ENGINE_QUEUES    2
>> +
>> +struct drm_i915_query_engine_queues {
>> +    /** Engine class as in enum drm_i915_gem_engine_class. */
>> +    __u16 class;
>> +
>> +    /** Engine instance number. */
>> +    __u16 instance;
>> +
>> +    /** Number of requests with unresolved fences and dependencies. */
>> +    __u32 queued;
>> +
>> +    /** Number of ready requests waiting on a slot on GPU. */
>> +    __u32 runnable;
>> +
>> +    /** Number of requests executing on the GPU. */
>> +    __u32 running;
>> +
>> +    __u32 rsvd[5];
>> +};
>> +
>> +static bool query_engine_queues_supported(int fd)
>> +{
>> +    struct drm_i915_query_item item = {
>> +        .query_id = DRM_I915_QUERY_ENGINE_QUEUES,
>> +    };
>> +
>> +    return __i915_query_items(fd, &item, 1) == 0 && item.length > 0;
>> +}
>> +
>> +static void engine_queues_invalid(int fd)
>> +{
>> +    struct drm_i915_query_engine_queues queues;
>> +    struct drm_i915_query_item item;
>> +    unsigned int len;
>> +    unsigned int i;
>> +
>> +    /* Flags is MBZ. */
>> +    memset(&item, 0, sizeof(item));
>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>> +    item.flags = 1;
>> +    i915_query_items(fd, &item, 1);
>> +    igt_assert_eq(item.length, -EINVAL);
>> +
>> +    /* Length not zero and not greater or equal required size. */
>> +    memset(&item, 0, sizeof(item));
>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>> +    item.length = 1;
>> +    i915_query_items(fd, &item, 1);
>> +    igt_assert_eq(item.length, -ENOSPC);
>> +
>> +    /* Query correct length. */
>> +    memset(&item, 0, sizeof(item));
>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>> +    i915_query_items(fd, &item, 1);
>> +    igt_assert(item.length >= 0);
>> +    len = item.length;
>> +
>> +    /* Ivalid pointer. */
>> +    memset(&item, 0, sizeof(item));
>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>> +    item.length = len;
>> +    i915_query_items(fd, &item, 1);
>> +    igt_assert_eq(item.length, -EFAULT);
>> +
>> +    /* Reserved fields are MBZ. */
>> +
>> +    for (i = 0; i < ARRAY_SIZE(queues.rsvd); i++) {
>> +        memset(&queues, 0, sizeof(queues));
>> +        queues.rsvd[i] = 1;
>> +        memset(&item, 0, sizeof(item));
>> +        item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>> +        item.length = len;
>> +        item.data_ptr = to_user_pointer(&queues);
>> +        i915_query_items(fd, &item, 1);
>> +        igt_assert_eq(item.length, -EINVAL);
>> +    }
>> +
>> +    memset(&queues, 0, sizeof(queues));
>> +    queues.class = -1;
>> +    memset(&item, 0, sizeof(item));
>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>> +    item.length = len;
>> +    item.data_ptr = to_user_pointer(&queues);
>> +    i915_query_items(fd, &item, 1);
>> +    igt_assert_eq(item.length, -ENOENT);
>> +
> 
> Looks like you've copied the few lines above after.
> It seems to be the same test.

Above is checking invalid class is rejected, below is checking invalid 
instance - if I understood correctly what you are pointing at?

>> +    memset(&queues, 0, sizeof(queues));
>> +    queues.instance = -1;
>> +    memset(&item, 0, sizeof(item));
>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>> +    item.length = len;
>> +        item.data_ptr = to_user_pointer(&queues);
>> +    i915_query_items(fd, &item, 1);
>> +    igt_assert_eq(item.length, -ENOENT);
>> +}
>> +
>> +static void engine_queues(int fd, const struct 
>> intel_execution_engine2 *e)
>> +{
>> +    struct drm_i915_query_engine_queues queues;
>> +    struct drm_i915_query_item item;
>> +    unsigned int len;
>> +
>> +    /* Query required buffer length. */
>> +    memset(&queues, 0, sizeof(queues));
>> +    memset(&item, 0, sizeof(item));
>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>> +    item.data_ptr = to_user_pointer(&queues);
>> +    i915_query_items(fd, &item, 1);
>> +    igt_assert(item.length >= 0);
>> +    igt_assert(item.length <= sizeof(queues));
>> +    len = item.length;
>> +
>> +    /* Check length larger than required works and reports same 
>> length. */
>> +    memset(&queues, 0, sizeof(queues));
>> +    memset(&item, 0, sizeof(item));
>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>> +    item.data_ptr = to_user_pointer(&queues);
>> +    item.length = len + 1;
>> +    i915_query_items(fd, &item, 1);
>> +    igt_assert_eq(item.length, len);
>> +
>> +    /* Actual query. */
>> +    memset(&queues, 0, sizeof(queues));
>> +    queues.class = e->class;
>> +    queues.instance = e->instance;
>> +    memset(&item, 0, sizeof(item));
>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>> +    item.data_ptr = to_user_pointer(&queues);
>> +    item.length = len;
>> +    i915_query_items(fd, &item, 1);
>> +    igt_assert_eq(item.length, len);
>> +}
>> +
>> +static unsigned int e2ring(int gem_fd, const struct 
>> intel_execution_engine2 *e)
>> +{
>> +    return gem_class_instance_to_eb_flags(gem_fd, e->class, 
>> e->instance);
>> +}
>> +
>> +static void
>> +__query_queues(int fd, const struct intel_execution_engine2 *e,
>> +           struct drm_i915_query_engine_queues *queues)
>> +{
>> +    struct drm_i915_query_item item;
>> +
>> +    memset(queues, 0, sizeof(*queues));
>> +    queues->class = e->class;
>> +    queues->instance = e->instance;
>> +    memset(&item, 0, sizeof(item));
>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>> +    item.data_ptr = to_user_pointer(queues);
>> +    item.length = sizeof(*queues);
>> +    i915_query_items(fd, &item, 1);
>> +    igt_assert_eq(item.length, sizeof(*queues));
>> +}
>> +
>> +static void
>> +engine_queued(int gem_fd, const struct intel_execution_engine2 *e)
>> +{
>> +    const unsigned long engine = e2ring(gem_fd, e);
>> +    struct drm_i915_query_engine_queues queues;
>> +    const unsigned int max_rq = 10;
>> +    uint32_t queued[max_rq + 1];
>> +    uint32_t bo[max_rq + 1];
>> +    unsigned int n, i;
>> +
>> +    memset(queued, 0, sizeof(queued));
>> +    memset(bo, 0, sizeof(bo));
>> +
>> +    for (n = 0; n <= max_rq; n++) {
>> +        int fence = -1;
>> +        struct igt_cork cork = { .fd = fence, .type = CORK_SYNC_FD };
>> +
>> +        gem_quiescent_gpu(gem_fd);
>> +
>> +        if (n)
>> +            fence = igt_cork_plug(&cork, -1);
>> +
>> +        for (i = 0; i < n; i++) {
>> +            struct drm_i915_gem_exec_object2 obj = { };
>> +            struct drm_i915_gem_execbuffer2 eb = { };
>> +
>> +            if (!bo[i]) {
>> +                const uint32_t bbe = MI_BATCH_BUFFER_END;
>> +
>> +                bo[i] = gem_create(gem_fd, 4096);
>> +                gem_write(gem_fd, bo[i], 4092, &bbe,
>> +                      sizeof(bbe));
>> +            }
>> +
>> +            obj.handle = bo[i];
>> +
>> +            eb.buffer_count = 1;
>> +            eb.buffers_ptr = to_user_pointer(&obj);
>> +
>> +            eb.flags = engine | I915_EXEC_FENCE_IN;
>> +            eb.rsvd2 = fence;
>> +
>> +            gem_execbuf(gem_fd, &eb);
>> +        }
>> +
>> +        __query_queues(gem_fd, e, &queues);
>> +        queued[n] = queues.queued;
>> +        igt_info("n=%u queued=%u\n", n, queued[n]);
>> +
>> +        if (fence >= 0)
>> +            igt_cork_unplug(&cork);
>> +
>> +        for (i = 0; i < n; i++)
>> +            gem_sync(gem_fd, bo[i]);
>> +    }
>> +
>> +    for (i = 0; i < max_rq; i++) {
>> +        if (bo[i])
>> +            gem_close(gem_fd, bo[i]);
>> +    }
>> +
>> +    for (i = 0; i <= max_rq; i++)
>> +        igt_assert_eq(queued[i], i);
>> +}
>> +
> 
> I'm not sure to understand what the 2 function below are meant to do.
> Could you put a comment?

Yeah, will need more comments.

> __igt_spin_batch_new_poll also appear to be missing.

I think I mentioned in the commit it depends on another yet unmerged 
patch so I was sending it only for reference for now.

This particular API has little chance of being merged any time soon due 
lack of userspace. This is mostly so the product group interested in it 
can slurp the two (i915 + IGT) from the mailing list for their use.

>> +static igt_spin_t * __spin_poll(int fd, uint32_t ctx, unsigned long 
>> flags)
>> +{
>> +    if (gem_can_store_dword(fd, flags))
>> +        return __igt_spin_batch_new_poll(fd, ctx, flags);
>> +    else
>> +        return __igt_spin_batch_new(fd, ctx, flags, 0);
>> +}
>> +
>> +static unsigned long __spin_wait(int fd, igt_spin_t *spin)
>> +{
>> +    struct timespec start = { };
>> +
>> +    igt_nsec_elapsed(&start);
>> +
>> +    if (gem_can_store_dword(fd, spin->execbuf.flags)) {
>> +        unsigned long timeout = 0;
>> +
>> +        while (!spin->running) {
>> +            unsigned long t = igt_nsec_elapsed(&start);
>> +
>> +            if ((t - timeout) > 250e6) {
>> +                timeout = t;
>> +                igt_warn("Spinner not running after %.2fms\n",
>> +                     (double)t / 1e6);
>> +            }
>> +        };
>> +    } else {
>> +        igt_debug("__spin_wait - usleep mode\n");
>> +        usleep(500e3); /* Better than nothing! */
>> +    }
>> +
>> +    return igt_nsec_elapsed(&start);
>> +}
>> +
>> +static void
>> +engine_runnable(int gem_fd, const struct intel_execution_engine2 *e)
>> +{
>> +    const unsigned long engine = e2ring(gem_fd, e);
>> +    struct drm_i915_query_engine_queues queues;
>> +    const unsigned int max_rq = 10;
>> +    igt_spin_t *spin[max_rq + 1];
>> +    uint32_t runnable[max_rq + 1];
>> +    uint32_t ctx[max_rq];
>> +    unsigned int n, i;
>> +
>> +    memset(runnable, 0, sizeof(runnable));
>> +    memset(ctx, 0, sizeof(ctx));
>> +
>> +    for (n = 0; n <= max_rq; n++) {
>> +        gem_quiescent_gpu(gem_fd);
>> +
>> +        for (i = 0; i < n; i++) {
>> +            if (!ctx[i])
>> +                ctx[i] = gem_context_create(gem_fd);
>> +
>> +            if (i == 0)
>> +                spin[i] = __spin_poll(gem_fd, ctx[i], engine);
>> +            else
>> +                spin[i] = __igt_spin_batch_new(gem_fd, ctx[i],
>> +                                   engine, 0);
>> +        }
>> +
>> +        if (n)
>> +            __spin_wait(gem_fd, spin[0]);
>> +
>> +        __query_queues(gem_fd, e, &queues);
>> +        runnable[n] = queues.runnable;
>> +        igt_info("n=%u runnable=%u\n", n, runnable[n]);
>> +
>> +        for (i = 0; i < n; i++) {
>> +            igt_spin_batch_end(spin[i]);
>> +            gem_sync(gem_fd, spin[i]->handle);
>> +            igt_spin_batch_free(gem_fd, spin[i]);
>> +        }
>> +    }
>> +
>> +    for (i = 0; i < max_rq; i++) {
>> +        if (ctx[i])
>> +            gem_context_destroy(gem_fd, ctx[i]);
>> +    }
>> +
>> +    igt_assert_eq(runnable[0], 0);
> 
> Why only checking the first & last items?
> It seems that the results should be consistent? no?

Depending on the submission backend (ringbuffer/execlists/guc), and 
number of submission ports, the split between runnable and running 
counter for a given submission pattern will vary.

This series of three asserts is supposed to make it work with any 
backend and any number of ports (as long as less than ten).

I think I cannot make the assert stronger without embedding this 
knowledge in the test, but I'll have another think about it.

> 
>> +    igt_assert(runnable[max_rq] > 0);
>> +    igt_assert_eq(runnable[max_rq] - runnable[max_rq - 1], 1);
>> +}
>> +
>> +static void
>> +engine_running(int gem_fd, const struct intel_execution_engine2 *e)
>> +{
>> +    const unsigned long engine = e2ring(gem_fd, e);
>> +    struct drm_i915_query_engine_queues queues;
>> +    const unsigned int max_rq = 10;
>> +    igt_spin_t *spin[max_rq + 1];
>> +    uint32_t running[max_rq + 1];
>> +    unsigned int n, i;
>> +
>> +    memset(running, 0, sizeof(running));
>> +    memset(spin, 0, sizeof(spin));
>> +
>> +    for (n = 0; n <= max_rq; n++) {
>> +        gem_quiescent_gpu(gem_fd);
>> +
>> +        for (i = 0; i < n; i++) {
>> +            if (i == 0)
>> +                spin[i] = __spin_poll(gem_fd, 0, engine);
>> +            else
>> +                spin[i] = __igt_spin_batch_new(gem_fd, 0,
>> +                                   engine, 0);
>> +        }
>> +
>> +        if (n)
>> +            __spin_wait(gem_fd, spin[0]);
>> +
>> +        __query_queues(gem_fd, e, &queues);
>> +        running[n] = queues.running;
>> +        igt_info("n=%u running=%u\n", n, running[n]);
>> +
>> +        for (i = 0; i < n; i++) {
>> +            igt_spin_batch_end(spin[i]);
>> +            gem_sync(gem_fd, spin[i]->handle);
>> +            igt_spin_batch_free(gem_fd, spin[i]);
>> +        }
>> +    }
>> +
>> +    for (i = 0; i <= max_rq; i++)
>> +        igt_assert_eq(running[i], i);
>> +}
>> +
>>   igt_main
>>   {
>> +    const struct intel_execution_engine2 *e;
>>       int fd = -1;
>>       int devid;
>> @@ -524,6 +874,37 @@ igt_main
>>           test_query_topology_known_pci_ids(fd, devid);
>>       }
> 
> I guess we could add a group for the topology too.

My dilemma was whether to stuff tests for any query into i915_query.c, 
or split out to separate binaries.

I can imagine, if/when the number of queries grows, i915_query.c would 
become unmanageable with the former approach. What do you think?

Regards,

Tvrtko

> 
>> +    igt_subtest_group {
>> +        igt_fixture {
>> +            igt_require(query_engine_queues_supported(fd));
>> +        }
>> +
>> +        igt_subtest("engine-queues-invalid")
>> +            engine_queues_invalid(fd);
>> +
>> +        for_each_engine_class_instance(fd, e) {
>> +            igt_subtest_group {
>> +                igt_fixture {
>> +                    gem_require_engine(fd,
>> +                               e->class,
>> +                               e->instance);
>> +                }
>> +
>> +                igt_subtest_f("engine-queues-%s", e->name)
>> +                    engine_queues(fd, e);
>> +
>> +                igt_subtest_f("engine-queued-%s", e->name)
>> +                    engine_queued(fd, e);
>> +
>> +                igt_subtest_f("engine-runnable-%s", e->name)
>> +                    engine_runnable(fd, e);
>> +
>> +                igt_subtest_f("engine-running-%s", e->name)
>> +                    engine_running(fd, e);
>> +            }
>> +        }
>> +    }
>> +
>>       igt_fixture {
>>           close(fd);
>>       }
> 
> 
> _______________________________________________
> igt-dev mailing list
> igt-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/igt-dev
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 5/5] tests/i915_query: Engine queues tests
@ 2018-03-23 10:18       ` Tvrtko Ursulin
  0 siblings, 0 replies; 26+ messages in thread
From: Tvrtko Ursulin @ 2018-03-23 10:18 UTC (permalink / raw)
  To: Lionel Landwerlin, Tvrtko Ursulin, igt-dev; +Cc: Intel-gfx, Tvrtko Ursulin


On 22/03/2018 21:22, Lionel Landwerlin wrote:
> On 19/03/18 18:22, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> ...
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> ---
>>   tests/i915_query.c | 381 
>> +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 381 insertions(+)
>>
>> diff --git a/tests/i915_query.c b/tests/i915_query.c
>> index c7de8cbd8371..94e7a3297ebd 100644
>> --- a/tests/i915_query.c
>> +++ b/tests/i915_query.c
>> @@ -477,8 +477,358 @@ test_query_topology_known_pci_ids(int fd, int 
>> devid)
>>       free(topo_info);
>>   }
>> +#define DRM_I915_QUERY_ENGINE_QUEUES    2
>> +
>> +struct drm_i915_query_engine_queues {
>> +    /** Engine class as in enum drm_i915_gem_engine_class. */
>> +    __u16 class;
>> +
>> +    /** Engine instance number. */
>> +    __u16 instance;
>> +
>> +    /** Number of requests with unresolved fences and dependencies. */
>> +    __u32 queued;
>> +
>> +    /** Number of ready requests waiting on a slot on GPU. */
>> +    __u32 runnable;
>> +
>> +    /** Number of requests executing on the GPU. */
>> +    __u32 running;
>> +
>> +    __u32 rsvd[5];
>> +};
>> +
>> +static bool query_engine_queues_supported(int fd)
>> +{
>> +    struct drm_i915_query_item item = {
>> +        .query_id = DRM_I915_QUERY_ENGINE_QUEUES,
>> +    };
>> +
>> +    return __i915_query_items(fd, &item, 1) == 0 && item.length > 0;
>> +}
>> +
>> +static void engine_queues_invalid(int fd)
>> +{
>> +    struct drm_i915_query_engine_queues queues;
>> +    struct drm_i915_query_item item;
>> +    unsigned int len;
>> +    unsigned int i;
>> +
>> +    /* Flags is MBZ. */
>> +    memset(&item, 0, sizeof(item));
>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>> +    item.flags = 1;
>> +    i915_query_items(fd, &item, 1);
>> +    igt_assert_eq(item.length, -EINVAL);
>> +
>> +    /* Length not zero and not greater or equal required size. */
>> +    memset(&item, 0, sizeof(item));
>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>> +    item.length = 1;
>> +    i915_query_items(fd, &item, 1);
>> +    igt_assert_eq(item.length, -ENOSPC);
>> +
>> +    /* Query correct length. */
>> +    memset(&item, 0, sizeof(item));
>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>> +    i915_query_items(fd, &item, 1);
>> +    igt_assert(item.length >= 0);
>> +    len = item.length;
>> +
>> +    /* Ivalid pointer. */
>> +    memset(&item, 0, sizeof(item));
>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>> +    item.length = len;
>> +    i915_query_items(fd, &item, 1);
>> +    igt_assert_eq(item.length, -EFAULT);
>> +
>> +    /* Reserved fields are MBZ. */
>> +
>> +    for (i = 0; i < ARRAY_SIZE(queues.rsvd); i++) {
>> +        memset(&queues, 0, sizeof(queues));
>> +        queues.rsvd[i] = 1;
>> +        memset(&item, 0, sizeof(item));
>> +        item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>> +        item.length = len;
>> +        item.data_ptr = to_user_pointer(&queues);
>> +        i915_query_items(fd, &item, 1);
>> +        igt_assert_eq(item.length, -EINVAL);
>> +    }
>> +
>> +    memset(&queues, 0, sizeof(queues));
>> +    queues.class = -1;
>> +    memset(&item, 0, sizeof(item));
>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>> +    item.length = len;
>> +    item.data_ptr = to_user_pointer(&queues);
>> +    i915_query_items(fd, &item, 1);
>> +    igt_assert_eq(item.length, -ENOENT);
>> +
> 
> Looks like you've copied the few lines above after.
> It seems to be the same test.

Above is checking invalid class is rejected, below is checking invalid 
instance - if I understood correctly what you are pointing at?

>> +    memset(&queues, 0, sizeof(queues));
>> +    queues.instance = -1;
>> +    memset(&item, 0, sizeof(item));
>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>> +    item.length = len;
>> +        item.data_ptr = to_user_pointer(&queues);
>> +    i915_query_items(fd, &item, 1);
>> +    igt_assert_eq(item.length, -ENOENT);
>> +}
>> +
>> +static void engine_queues(int fd, const struct 
>> intel_execution_engine2 *e)
>> +{
>> +    struct drm_i915_query_engine_queues queues;
>> +    struct drm_i915_query_item item;
>> +    unsigned int len;
>> +
>> +    /* Query required buffer length. */
>> +    memset(&queues, 0, sizeof(queues));
>> +    memset(&item, 0, sizeof(item));
>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>> +    item.data_ptr = to_user_pointer(&queues);
>> +    i915_query_items(fd, &item, 1);
>> +    igt_assert(item.length >= 0);
>> +    igt_assert(item.length <= sizeof(queues));
>> +    len = item.length;
>> +
>> +    /* Check length larger than required works and reports same 
>> length. */
>> +    memset(&queues, 0, sizeof(queues));
>> +    memset(&item, 0, sizeof(item));
>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>> +    item.data_ptr = to_user_pointer(&queues);
>> +    item.length = len + 1;
>> +    i915_query_items(fd, &item, 1);
>> +    igt_assert_eq(item.length, len);
>> +
>> +    /* Actual query. */
>> +    memset(&queues, 0, sizeof(queues));
>> +    queues.class = e->class;
>> +    queues.instance = e->instance;
>> +    memset(&item, 0, sizeof(item));
>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>> +    item.data_ptr = to_user_pointer(&queues);
>> +    item.length = len;
>> +    i915_query_items(fd, &item, 1);
>> +    igt_assert_eq(item.length, len);
>> +}
>> +
>> +static unsigned int e2ring(int gem_fd, const struct 
>> intel_execution_engine2 *e)
>> +{
>> +    return gem_class_instance_to_eb_flags(gem_fd, e->class, 
>> e->instance);
>> +}
>> +
>> +static void
>> +__query_queues(int fd, const struct intel_execution_engine2 *e,
>> +           struct drm_i915_query_engine_queues *queues)
>> +{
>> +    struct drm_i915_query_item item;
>> +
>> +    memset(queues, 0, sizeof(*queues));
>> +    queues->class = e->class;
>> +    queues->instance = e->instance;
>> +    memset(&item, 0, sizeof(item));
>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>> +    item.data_ptr = to_user_pointer(queues);
>> +    item.length = sizeof(*queues);
>> +    i915_query_items(fd, &item, 1);
>> +    igt_assert_eq(item.length, sizeof(*queues));
>> +}
>> +
>> +static void
>> +engine_queued(int gem_fd, const struct intel_execution_engine2 *e)
>> +{
>> +    const unsigned long engine = e2ring(gem_fd, e);
>> +    struct drm_i915_query_engine_queues queues;
>> +    const unsigned int max_rq = 10;
>> +    uint32_t queued[max_rq + 1];
>> +    uint32_t bo[max_rq + 1];
>> +    unsigned int n, i;
>> +
>> +    memset(queued, 0, sizeof(queued));
>> +    memset(bo, 0, sizeof(bo));
>> +
>> +    for (n = 0; n <= max_rq; n++) {
>> +        int fence = -1;
>> +        struct igt_cork cork = { .fd = fence, .type = CORK_SYNC_FD };
>> +
>> +        gem_quiescent_gpu(gem_fd);
>> +
>> +        if (n)
>> +            fence = igt_cork_plug(&cork, -1);
>> +
>> +        for (i = 0; i < n; i++) {
>> +            struct drm_i915_gem_exec_object2 obj = { };
>> +            struct drm_i915_gem_execbuffer2 eb = { };
>> +
>> +            if (!bo[i]) {
>> +                const uint32_t bbe = MI_BATCH_BUFFER_END;
>> +
>> +                bo[i] = gem_create(gem_fd, 4096);
>> +                gem_write(gem_fd, bo[i], 4092, &bbe,
>> +                      sizeof(bbe));
>> +            }
>> +
>> +            obj.handle = bo[i];
>> +
>> +            eb.buffer_count = 1;
>> +            eb.buffers_ptr = to_user_pointer(&obj);
>> +
>> +            eb.flags = engine | I915_EXEC_FENCE_IN;
>> +            eb.rsvd2 = fence;
>> +
>> +            gem_execbuf(gem_fd, &eb);
>> +        }
>> +
>> +        __query_queues(gem_fd, e, &queues);
>> +        queued[n] = queues.queued;
>> +        igt_info("n=%u queued=%u\n", n, queued[n]);
>> +
>> +        if (fence >= 0)
>> +            igt_cork_unplug(&cork);
>> +
>> +        for (i = 0; i < n; i++)
>> +            gem_sync(gem_fd, bo[i]);
>> +    }
>> +
>> +    for (i = 0; i < max_rq; i++) {
>> +        if (bo[i])
>> +            gem_close(gem_fd, bo[i]);
>> +    }
>> +
>> +    for (i = 0; i <= max_rq; i++)
>> +        igt_assert_eq(queued[i], i);
>> +}
>> +
> 
> I'm not sure to understand what the 2 function below are meant to do.
> Could you put a comment?

Yeah, will need more comments.

> __igt_spin_batch_new_poll also appear to be missing.

I think I mentioned in the commit it depends on another yet unmerged 
patch so I was sending it only for reference for now.

This particular API has little chance of being merged any time soon due 
lack of userspace. This is mostly so the product group interested in it 
can slurp the two (i915 + IGT) from the mailing list for their use.

>> +static igt_spin_t * __spin_poll(int fd, uint32_t ctx, unsigned long 
>> flags)
>> +{
>> +    if (gem_can_store_dword(fd, flags))
>> +        return __igt_spin_batch_new_poll(fd, ctx, flags);
>> +    else
>> +        return __igt_spin_batch_new(fd, ctx, flags, 0);
>> +}
>> +
>> +static unsigned long __spin_wait(int fd, igt_spin_t *spin)
>> +{
>> +    struct timespec start = { };
>> +
>> +    igt_nsec_elapsed(&start);
>> +
>> +    if (gem_can_store_dword(fd, spin->execbuf.flags)) {
>> +        unsigned long timeout = 0;
>> +
>> +        while (!spin->running) {
>> +            unsigned long t = igt_nsec_elapsed(&start);
>> +
>> +            if ((t - timeout) > 250e6) {
>> +                timeout = t;
>> +                igt_warn("Spinner not running after %.2fms\n",
>> +                     (double)t / 1e6);
>> +            }
>> +        };
>> +    } else {
>> +        igt_debug("__spin_wait - usleep mode\n");
>> +        usleep(500e3); /* Better than nothing! */
>> +    }
>> +
>> +    return igt_nsec_elapsed(&start);
>> +}
>> +
>> +static void
>> +engine_runnable(int gem_fd, const struct intel_execution_engine2 *e)
>> +{
>> +    const unsigned long engine = e2ring(gem_fd, e);
>> +    struct drm_i915_query_engine_queues queues;
>> +    const unsigned int max_rq = 10;
>> +    igt_spin_t *spin[max_rq + 1];
>> +    uint32_t runnable[max_rq + 1];
>> +    uint32_t ctx[max_rq];
>> +    unsigned int n, i;
>> +
>> +    memset(runnable, 0, sizeof(runnable));
>> +    memset(ctx, 0, sizeof(ctx));
>> +
>> +    for (n = 0; n <= max_rq; n++) {
>> +        gem_quiescent_gpu(gem_fd);
>> +
>> +        for (i = 0; i < n; i++) {
>> +            if (!ctx[i])
>> +                ctx[i] = gem_context_create(gem_fd);
>> +
>> +            if (i == 0)
>> +                spin[i] = __spin_poll(gem_fd, ctx[i], engine);
>> +            else
>> +                spin[i] = __igt_spin_batch_new(gem_fd, ctx[i],
>> +                                   engine, 0);
>> +        }
>> +
>> +        if (n)
>> +            __spin_wait(gem_fd, spin[0]);
>> +
>> +        __query_queues(gem_fd, e, &queues);
>> +        runnable[n] = queues.runnable;
>> +        igt_info("n=%u runnable=%u\n", n, runnable[n]);
>> +
>> +        for (i = 0; i < n; i++) {
>> +            igt_spin_batch_end(spin[i]);
>> +            gem_sync(gem_fd, spin[i]->handle);
>> +            igt_spin_batch_free(gem_fd, spin[i]);
>> +        }
>> +    }
>> +
>> +    for (i = 0; i < max_rq; i++) {
>> +        if (ctx[i])
>> +            gem_context_destroy(gem_fd, ctx[i]);
>> +    }
>> +
>> +    igt_assert_eq(runnable[0], 0);
> 
> Why only checking the first & last items?
> It seems that the results should be consistent? no?

Depending on the submission backend (ringbuffer/execlists/guc), and 
number of submission ports, the split between runnable and running 
counter for a given submission pattern will vary.

This series of three asserts is supposed to make it work with any 
backend and any number of ports (as long as less than ten).

I think I cannot make the assert stronger without embedding this 
knowledge in the test, but I'll have another think about it.

> 
>> +    igt_assert(runnable[max_rq] > 0);
>> +    igt_assert_eq(runnable[max_rq] - runnable[max_rq - 1], 1);
>> +}
>> +
>> +static void
>> +engine_running(int gem_fd, const struct intel_execution_engine2 *e)
>> +{
>> +    const unsigned long engine = e2ring(gem_fd, e);
>> +    struct drm_i915_query_engine_queues queues;
>> +    const unsigned int max_rq = 10;
>> +    igt_spin_t *spin[max_rq + 1];
>> +    uint32_t running[max_rq + 1];
>> +    unsigned int n, i;
>> +
>> +    memset(running, 0, sizeof(running));
>> +    memset(spin, 0, sizeof(spin));
>> +
>> +    for (n = 0; n <= max_rq; n++) {
>> +        gem_quiescent_gpu(gem_fd);
>> +
>> +        for (i = 0; i < n; i++) {
>> +            if (i == 0)
>> +                spin[i] = __spin_poll(gem_fd, 0, engine);
>> +            else
>> +                spin[i] = __igt_spin_batch_new(gem_fd, 0,
>> +                                   engine, 0);
>> +        }
>> +
>> +        if (n)
>> +            __spin_wait(gem_fd, spin[0]);
>> +
>> +        __query_queues(gem_fd, e, &queues);
>> +        running[n] = queues.running;
>> +        igt_info("n=%u running=%u\n", n, running[n]);
>> +
>> +        for (i = 0; i < n; i++) {
>> +            igt_spin_batch_end(spin[i]);
>> +            gem_sync(gem_fd, spin[i]->handle);
>> +            igt_spin_batch_free(gem_fd, spin[i]);
>> +        }
>> +    }
>> +
>> +    for (i = 0; i <= max_rq; i++)
>> +        igt_assert_eq(running[i], i);
>> +}
>> +
>>   igt_main
>>   {
>> +    const struct intel_execution_engine2 *e;
>>       int fd = -1;
>>       int devid;
>> @@ -524,6 +874,37 @@ igt_main
>>           test_query_topology_known_pci_ids(fd, devid);
>>       }
> 
> I guess we could add a group for the topology too.

My dilemma was whether to stuff tests for any query into i915_query.c, 
or split out to separate binaries.

I can imagine, if/when the number of queries grows, i915_query.c would 
become unmanageable with the former approach. What do you think?

Regards,

Tvrtko

> 
>> +    igt_subtest_group {
>> +        igt_fixture {
>> +            igt_require(query_engine_queues_supported(fd));
>> +        }
>> +
>> +        igt_subtest("engine-queues-invalid")
>> +            engine_queues_invalid(fd);
>> +
>> +        for_each_engine_class_instance(fd, e) {
>> +            igt_subtest_group {
>> +                igt_fixture {
>> +                    gem_require_engine(fd,
>> +                               e->class,
>> +                               e->instance);
>> +                }
>> +
>> +                igt_subtest_f("engine-queues-%s", e->name)
>> +                    engine_queues(fd, e);
>> +
>> +                igt_subtest_f("engine-queued-%s", e->name)
>> +                    engine_queued(fd, e);
>> +
>> +                igt_subtest_f("engine-runnable-%s", e->name)
>> +                    engine_runnable(fd, e);
>> +
>> +                igt_subtest_f("engine-running-%s", e->name)
>> +                    engine_running(fd, e);
>> +            }
>> +        }
>> +    }
>> +
>>       igt_fixture {
>>           close(fd);
>>       }
> 
> 
> _______________________________________________
> igt-dev mailing list
> igt-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/igt-dev
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 5/5] tests/i915_query: Engine queues tests
  2018-03-23 10:18       ` Tvrtko Ursulin
@ 2018-03-23 10:47         ` Lionel Landwerlin
  -1 siblings, 0 replies; 26+ messages in thread
From: Lionel Landwerlin @ 2018-03-23 10:47 UTC (permalink / raw)
  To: Tvrtko Ursulin, Tvrtko Ursulin, igt-dev; +Cc: Intel-gfx

On 23/03/18 10:18, Tvrtko Ursulin wrote:
>
> On 22/03/2018 21:22, Lionel Landwerlin wrote:
>> On 19/03/18 18:22, Tvrtko Ursulin wrote:
>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>
>>> ...
>>>
>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>>   tests/i915_query.c | 381 
>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 381 insertions(+)
>>>
>>> diff --git a/tests/i915_query.c b/tests/i915_query.c
>>> index c7de8cbd8371..94e7a3297ebd 100644
>>> --- a/tests/i915_query.c
>>> +++ b/tests/i915_query.c
>>> @@ -477,8 +477,358 @@ test_query_topology_known_pci_ids(int fd, int 
>>> devid)
>>>       free(topo_info);
>>>   }
>>> +#define DRM_I915_QUERY_ENGINE_QUEUES    2
>>> +
>>> +struct drm_i915_query_engine_queues {
>>> +    /** Engine class as in enum drm_i915_gem_engine_class. */
>>> +    __u16 class;
>>> +
>>> +    /** Engine instance number. */
>>> +    __u16 instance;
>>> +
>>> +    /** Number of requests with unresolved fences and dependencies. */
>>> +    __u32 queued;
>>> +
>>> +    /** Number of ready requests waiting on a slot on GPU. */
>>> +    __u32 runnable;
>>> +
>>> +    /** Number of requests executing on the GPU. */
>>> +    __u32 running;
>>> +
>>> +    __u32 rsvd[5];
>>> +};
>>> +
>>> +static bool query_engine_queues_supported(int fd)
>>> +{
>>> +    struct drm_i915_query_item item = {
>>> +        .query_id = DRM_I915_QUERY_ENGINE_QUEUES,
>>> +    };
>>> +
>>> +    return __i915_query_items(fd, &item, 1) == 0 && item.length > 0;
>>> +}
>>> +
>>> +static void engine_queues_invalid(int fd)
>>> +{
>>> +    struct drm_i915_query_engine_queues queues;
>>> +    struct drm_i915_query_item item;
>>> +    unsigned int len;
>>> +    unsigned int i;
>>> +
>>> +    /* Flags is MBZ. */
>>> +    memset(&item, 0, sizeof(item));
>>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>>> +    item.flags = 1;
>>> +    i915_query_items(fd, &item, 1);
>>> +    igt_assert_eq(item.length, -EINVAL);
>>> +
>>> +    /* Length not zero and not greater or equal required size. */
>>> +    memset(&item, 0, sizeof(item));
>>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>>> +    item.length = 1;
>>> +    i915_query_items(fd, &item, 1);
>>> +    igt_assert_eq(item.length, -ENOSPC);
>>> +
>>> +    /* Query correct length. */
>>> +    memset(&item, 0, sizeof(item));
>>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>>> +    i915_query_items(fd, &item, 1);
>>> +    igt_assert(item.length >= 0);
>>> +    len = item.length;
>>> +
>>> +    /* Ivalid pointer. */
>>> +    memset(&item, 0, sizeof(item));
>>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>>> +    item.length = len;
>>> +    i915_query_items(fd, &item, 1);
>>> +    igt_assert_eq(item.length, -EFAULT);
>>> +
>>> +    /* Reserved fields are MBZ. */
>>> +
>>> +    for (i = 0; i < ARRAY_SIZE(queues.rsvd); i++) {
>>> +        memset(&queues, 0, sizeof(queues));
>>> +        queues.rsvd[i] = 1;
>>> +        memset(&item, 0, sizeof(item));
>>> +        item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>>> +        item.length = len;
>>> +        item.data_ptr = to_user_pointer(&queues);
>>> +        i915_query_items(fd, &item, 1);
>>> +        igt_assert_eq(item.length, -EINVAL);
>>> +    }
>>> +
>>> +    memset(&queues, 0, sizeof(queues));
>>> +    queues.class = -1;
>>> +    memset(&item, 0, sizeof(item));
>>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>>> +    item.length = len;
>>> +    item.data_ptr = to_user_pointer(&queues);
>>> +    i915_query_items(fd, &item, 1);
>>> +    igt_assert_eq(item.length, -ENOENT);
>>> +
>>
>> Looks like you've copied the few lines above after.
>> It seems to be the same test.
>
> Above is checking invalid class is rejected, below is checking invalid 
> instance - if I understood correctly what you are pointing at?

Oops, missed that.

>
>>> +    memset(&queues, 0, sizeof(queues));
>>> +    queues.instance = -1;
>>> +    memset(&item, 0, sizeof(item));
>>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>>> +    item.length = len;
>>> +        item.data_ptr = to_user_pointer(&queues);
>>> +    i915_query_items(fd, &item, 1);
>>> +    igt_assert_eq(item.length, -ENOENT);
>>> +}
>>> +
>>> +static void engine_queues(int fd, const struct 
>>> intel_execution_engine2 *e)
>>> +{
>>> +    struct drm_i915_query_engine_queues queues;
>>> +    struct drm_i915_query_item item;
>>> +    unsigned int len;
>>> +
>>> +    /* Query required buffer length. */
>>> +    memset(&queues, 0, sizeof(queues));
>>> +    memset(&item, 0, sizeof(item));
>>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>>> +    item.data_ptr = to_user_pointer(&queues);
>>> +    i915_query_items(fd, &item, 1);
>>> +    igt_assert(item.length >= 0);
>>> +    igt_assert(item.length <= sizeof(queues));
>>> +    len = item.length;
>>> +
>>> +    /* Check length larger than required works and reports same 
>>> length. */
>>> +    memset(&queues, 0, sizeof(queues));
>>> +    memset(&item, 0, sizeof(item));
>>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>>> +    item.data_ptr = to_user_pointer(&queues);
>>> +    item.length = len + 1;
>>> +    i915_query_items(fd, &item, 1);
>>> +    igt_assert_eq(item.length, len);
>>> +
>>> +    /* Actual query. */
>>> +    memset(&queues, 0, sizeof(queues));
>>> +    queues.class = e->class;
>>> +    queues.instance = e->instance;
>>> +    memset(&item, 0, sizeof(item));
>>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>>> +    item.data_ptr = to_user_pointer(&queues);
>>> +    item.length = len;
>>> +    i915_query_items(fd, &item, 1);
>>> +    igt_assert_eq(item.length, len);
>>> +}
>>> +
>>> +static unsigned int e2ring(int gem_fd, const struct 
>>> intel_execution_engine2 *e)
>>> +{
>>> +    return gem_class_instance_to_eb_flags(gem_fd, e->class, 
>>> e->instance);
>>> +}
>>> +
>>> +static void
>>> +__query_queues(int fd, const struct intel_execution_engine2 *e,
>>> +           struct drm_i915_query_engine_queues *queues)
>>> +{
>>> +    struct drm_i915_query_item item;
>>> +
>>> +    memset(queues, 0, sizeof(*queues));
>>> +    queues->class = e->class;
>>> +    queues->instance = e->instance;
>>> +    memset(&item, 0, sizeof(item));
>>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>>> +    item.data_ptr = to_user_pointer(queues);
>>> +    item.length = sizeof(*queues);
>>> +    i915_query_items(fd, &item, 1);
>>> +    igt_assert_eq(item.length, sizeof(*queues));
>>> +}
>>> +
>>> +static void
>>> +engine_queued(int gem_fd, const struct intel_execution_engine2 *e)
>>> +{
>>> +    const unsigned long engine = e2ring(gem_fd, e);
>>> +    struct drm_i915_query_engine_queues queues;
>>> +    const unsigned int max_rq = 10;
>>> +    uint32_t queued[max_rq + 1];
>>> +    uint32_t bo[max_rq + 1];
>>> +    unsigned int n, i;
>>> +
>>> +    memset(queued, 0, sizeof(queued));
>>> +    memset(bo, 0, sizeof(bo));
>>> +
>>> +    for (n = 0; n <= max_rq; n++) {
>>> +        int fence = -1;
>>> +        struct igt_cork cork = { .fd = fence, .type = CORK_SYNC_FD };
>>> +
>>> +        gem_quiescent_gpu(gem_fd);
>>> +
>>> +        if (n)
>>> +            fence = igt_cork_plug(&cork, -1);
>>> +
>>> +        for (i = 0; i < n; i++) {
>>> +            struct drm_i915_gem_exec_object2 obj = { };
>>> +            struct drm_i915_gem_execbuffer2 eb = { };
>>> +
>>> +            if (!bo[i]) {
>>> +                const uint32_t bbe = MI_BATCH_BUFFER_END;
>>> +
>>> +                bo[i] = gem_create(gem_fd, 4096);
>>> +                gem_write(gem_fd, bo[i], 4092, &bbe,
>>> +                      sizeof(bbe));
>>> +            }
>>> +
>>> +            obj.handle = bo[i];
>>> +
>>> +            eb.buffer_count = 1;
>>> +            eb.buffers_ptr = to_user_pointer(&obj);
>>> +
>>> +            eb.flags = engine | I915_EXEC_FENCE_IN;
>>> +            eb.rsvd2 = fence;
>>> +
>>> +            gem_execbuf(gem_fd, &eb);
>>> +        }
>>> +
>>> +        __query_queues(gem_fd, e, &queues);
>>> +        queued[n] = queues.queued;
>>> +        igt_info("n=%u queued=%u\n", n, queued[n]);
>>> +
>>> +        if (fence >= 0)
>>> +            igt_cork_unplug(&cork);
>>> +
>>> +        for (i = 0; i < n; i++)
>>> +            gem_sync(gem_fd, bo[i]);
>>> +    }
>>> +
>>> +    for (i = 0; i < max_rq; i++) {
>>> +        if (bo[i])
>>> +            gem_close(gem_fd, bo[i]);
>>> +    }
>>> +
>>> +    for (i = 0; i <= max_rq; i++)
>>> +        igt_assert_eq(queued[i], i);
>>> +}
>>> +
>>
>> I'm not sure to understand what the 2 function below are meant to do.
>> Could you put a comment?
>
> Yeah, will need more comments.
>
>> __igt_spin_batch_new_poll also appear to be missing.
>
> I think I mentioned in the commit it depends on another yet unmerged 
> patch so I was sending it only for reference for now.
>
> This particular API has little chance of being merged any time soon 
> due lack of userspace. This is mostly so the product group interested 
> in it can slurp the two (i915 + IGT) from the mailing list for their use.
>
>>> +static igt_spin_t * __spin_poll(int fd, uint32_t ctx, unsigned long 
>>> flags)
>>> +{
>>> +    if (gem_can_store_dword(fd, flags))
>>> +        return __igt_spin_batch_new_poll(fd, ctx, flags);
>>> +    else
>>> +        return __igt_spin_batch_new(fd, ctx, flags, 0);
>>> +}
>>> +
>>> +static unsigned long __spin_wait(int fd, igt_spin_t *spin)
>>> +{
>>> +    struct timespec start = { };
>>> +
>>> +    igt_nsec_elapsed(&start);
>>> +
>>> +    if (gem_can_store_dword(fd, spin->execbuf.flags)) {
>>> +        unsigned long timeout = 0;
>>> +
>>> +        while (!spin->running) {
>>> +            unsigned long t = igt_nsec_elapsed(&start);
>>> +
>>> +            if ((t - timeout) > 250e6) {
>>> +                timeout = t;
>>> +                igt_warn("Spinner not running after %.2fms\n",
>>> +                     (double)t / 1e6);
>>> +            }
>>> +        };
>>> +    } else {
>>> +        igt_debug("__spin_wait - usleep mode\n");
>>> +        usleep(500e3); /* Better than nothing! */
>>> +    }
>>> +
>>> +    return igt_nsec_elapsed(&start);
>>> +}
>>> +
>>> +static void
>>> +engine_runnable(int gem_fd, const struct intel_execution_engine2 *e)
>>> +{
>>> +    const unsigned long engine = e2ring(gem_fd, e);
>>> +    struct drm_i915_query_engine_queues queues;
>>> +    const unsigned int max_rq = 10;
>>> +    igt_spin_t *spin[max_rq + 1];
>>> +    uint32_t runnable[max_rq + 1];
>>> +    uint32_t ctx[max_rq];
>>> +    unsigned int n, i;
>>> +
>>> +    memset(runnable, 0, sizeof(runnable));
>>> +    memset(ctx, 0, sizeof(ctx));
>>> +
>>> +    for (n = 0; n <= max_rq; n++) {
>>> +        gem_quiescent_gpu(gem_fd);
>>> +
>>> +        for (i = 0; i < n; i++) {
>>> +            if (!ctx[i])
>>> +                ctx[i] = gem_context_create(gem_fd);
>>> +
>>> +            if (i == 0)
>>> +                spin[i] = __spin_poll(gem_fd, ctx[i], engine);
>>> +            else
>>> +                spin[i] = __igt_spin_batch_new(gem_fd, ctx[i],
>>> +                                   engine, 0);
>>> +        }
>>> +
>>> +        if (n)
>>> +            __spin_wait(gem_fd, spin[0]);
>>> +
>>> +        __query_queues(gem_fd, e, &queues);
>>> +        runnable[n] = queues.runnable;
>>> +        igt_info("n=%u runnable=%u\n", n, runnable[n]);
>>> +
>>> +        for (i = 0; i < n; i++) {
>>> +            igt_spin_batch_end(spin[i]);
>>> +            gem_sync(gem_fd, spin[i]->handle);
>>> +            igt_spin_batch_free(gem_fd, spin[i]);
>>> +        }
>>> +    }
>>> +
>>> +    for (i = 0; i < max_rq; i++) {
>>> +        if (ctx[i])
>>> +            gem_context_destroy(gem_fd, ctx[i]);
>>> +    }
>>> +
>>> +    igt_assert_eq(runnable[0], 0);
>>
>> Why only checking the first & last items?
>> It seems that the results should be consistent? no?
>
> Depending on the submission backend (ringbuffer/execlists/guc), and 
> number of submission ports, the split between runnable and running 
> counter for a given submission pattern will vary.
>
> This series of three asserts is supposed to make it work with any 
> backend and any number of ports (as long as less than ten).
>
> I think I cannot make the assert stronger without embedding this 
> knowledge in the test, but I'll have another think about it.

Thanks, maybe some comments here would be a good idea.

Is there any verification you can make on runnable + running?

>
>>
>>> +    igt_assert(runnable[max_rq] > 0);
>>> +    igt_assert_eq(runnable[max_rq] - runnable[max_rq - 1], 1);
>>> +}
>>> +
>>> +static void
>>> +engine_running(int gem_fd, const struct intel_execution_engine2 *e)
>>> +{
>>> +    const unsigned long engine = e2ring(gem_fd, e);
>>> +    struct drm_i915_query_engine_queues queues;
>>> +    const unsigned int max_rq = 10;
>>> +    igt_spin_t *spin[max_rq + 1];
>>> +    uint32_t running[max_rq + 1];
>>> +    unsigned int n, i;
>>> +
>>> +    memset(running, 0, sizeof(running));
>>> +    memset(spin, 0, sizeof(spin));
>>> +
>>> +    for (n = 0; n <= max_rq; n++) {
>>> +        gem_quiescent_gpu(gem_fd);
>>> +
>>> +        for (i = 0; i < n; i++) {
>>> +            if (i == 0)
>>> +                spin[i] = __spin_poll(gem_fd, 0, engine);
>>> +            else
>>> +                spin[i] = __igt_spin_batch_new(gem_fd, 0,
>>> +                                   engine, 0);
>>> +        }
>>> +
>>> +        if (n)
>>> +            __spin_wait(gem_fd, spin[0]);
>>> +
>>> +        __query_queues(gem_fd, e, &queues);
>>> +        running[n] = queues.running;
>>> +        igt_info("n=%u running=%u\n", n, running[n]);
>>> +
>>> +        for (i = 0; i < n; i++) {
>>> +            igt_spin_batch_end(spin[i]);
>>> +            gem_sync(gem_fd, spin[i]->handle);
>>> +            igt_spin_batch_free(gem_fd, spin[i]);
>>> +        }
>>> +    }
>>> +
>>> +    for (i = 0; i <= max_rq; i++)
>>> +        igt_assert_eq(running[i], i);
>>> +}
>>> +
>>>   igt_main
>>>   {
>>> +    const struct intel_execution_engine2 *e;
>>>       int fd = -1;
>>>       int devid;
>>> @@ -524,6 +874,37 @@ igt_main
>>>           test_query_topology_known_pci_ids(fd, devid);
>>>       }
>>
>> I guess we could add a group for the topology too.
>
> My dilemma was whether to stuff tests for any query into i915_query.c, 
> or split out to separate binaries.
>
> I can imagine, if/when the number of queries grows, i915_query.c would 
> become unmanageable with the former approach. What do you think?

It was just a suggestion, no strong feeling either way.

>
> Regards,
>
> Tvrtko
>
>>
>>> +    igt_subtest_group {
>>> +        igt_fixture {
>>> +            igt_require(query_engine_queues_supported(fd));
>>> +        }
>>> +
>>> +        igt_subtest("engine-queues-invalid")
>>> +            engine_queues_invalid(fd);
>>> +
>>> +        for_each_engine_class_instance(fd, e) {
>>> +            igt_subtest_group {
>>> +                igt_fixture {
>>> +                    gem_require_engine(fd,
>>> +                               e->class,
>>> +                               e->instance);
>>> +                }
>>> +
>>> +                igt_subtest_f("engine-queues-%s", e->name)
>>> +                    engine_queues(fd, e);
>>> +
>>> +                igt_subtest_f("engine-queued-%s", e->name)
>>> +                    engine_queued(fd, e);
>>> +
>>> +                igt_subtest_f("engine-runnable-%s", e->name)
>>> +                    engine_runnable(fd, e);
>>> +
>>> +                igt_subtest_f("engine-running-%s", e->name)
>>> +                    engine_running(fd, e);
>>> +            }
>>> +        }
>>> +    }
>>> +
>>>       igt_fixture {
>>>           close(fd);
>>>       }
>>
>>
>> _______________________________________________
>> igt-dev mailing list
>> igt-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/igt-dev
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 5/5] tests/i915_query: Engine queues tests
@ 2018-03-23 10:47         ` Lionel Landwerlin
  0 siblings, 0 replies; 26+ messages in thread
From: Lionel Landwerlin @ 2018-03-23 10:47 UTC (permalink / raw)
  To: Tvrtko Ursulin, Tvrtko Ursulin, igt-dev; +Cc: Intel-gfx

On 23/03/18 10:18, Tvrtko Ursulin wrote:
>
> On 22/03/2018 21:22, Lionel Landwerlin wrote:
>> On 19/03/18 18:22, Tvrtko Ursulin wrote:
>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>
>>> ...
>>>
>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>>   tests/i915_query.c | 381 
>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 381 insertions(+)
>>>
>>> diff --git a/tests/i915_query.c b/tests/i915_query.c
>>> index c7de8cbd8371..94e7a3297ebd 100644
>>> --- a/tests/i915_query.c
>>> +++ b/tests/i915_query.c
>>> @@ -477,8 +477,358 @@ test_query_topology_known_pci_ids(int fd, int 
>>> devid)
>>>       free(topo_info);
>>>   }
>>> +#define DRM_I915_QUERY_ENGINE_QUEUES    2
>>> +
>>> +struct drm_i915_query_engine_queues {
>>> +    /** Engine class as in enum drm_i915_gem_engine_class. */
>>> +    __u16 class;
>>> +
>>> +    /** Engine instance number. */
>>> +    __u16 instance;
>>> +
>>> +    /** Number of requests with unresolved fences and dependencies. */
>>> +    __u32 queued;
>>> +
>>> +    /** Number of ready requests waiting on a slot on GPU. */
>>> +    __u32 runnable;
>>> +
>>> +    /** Number of requests executing on the GPU. */
>>> +    __u32 running;
>>> +
>>> +    __u32 rsvd[5];
>>> +};
>>> +
>>> +static bool query_engine_queues_supported(int fd)
>>> +{
>>> +    struct drm_i915_query_item item = {
>>> +        .query_id = DRM_I915_QUERY_ENGINE_QUEUES,
>>> +    };
>>> +
>>> +    return __i915_query_items(fd, &item, 1) == 0 && item.length > 0;
>>> +}
>>> +
>>> +static void engine_queues_invalid(int fd)
>>> +{
>>> +    struct drm_i915_query_engine_queues queues;
>>> +    struct drm_i915_query_item item;
>>> +    unsigned int len;
>>> +    unsigned int i;
>>> +
>>> +    /* Flags is MBZ. */
>>> +    memset(&item, 0, sizeof(item));
>>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>>> +    item.flags = 1;
>>> +    i915_query_items(fd, &item, 1);
>>> +    igt_assert_eq(item.length, -EINVAL);
>>> +
>>> +    /* Length not zero and not greater or equal required size. */
>>> +    memset(&item, 0, sizeof(item));
>>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>>> +    item.length = 1;
>>> +    i915_query_items(fd, &item, 1);
>>> +    igt_assert_eq(item.length, -ENOSPC);
>>> +
>>> +    /* Query correct length. */
>>> +    memset(&item, 0, sizeof(item));
>>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>>> +    i915_query_items(fd, &item, 1);
>>> +    igt_assert(item.length >= 0);
>>> +    len = item.length;
>>> +
>>> +    /* Ivalid pointer. */
>>> +    memset(&item, 0, sizeof(item));
>>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>>> +    item.length = len;
>>> +    i915_query_items(fd, &item, 1);
>>> +    igt_assert_eq(item.length, -EFAULT);
>>> +
>>> +    /* Reserved fields are MBZ. */
>>> +
>>> +    for (i = 0; i < ARRAY_SIZE(queues.rsvd); i++) {
>>> +        memset(&queues, 0, sizeof(queues));
>>> +        queues.rsvd[i] = 1;
>>> +        memset(&item, 0, sizeof(item));
>>> +        item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>>> +        item.length = len;
>>> +        item.data_ptr = to_user_pointer(&queues);
>>> +        i915_query_items(fd, &item, 1);
>>> +        igt_assert_eq(item.length, -EINVAL);
>>> +    }
>>> +
>>> +    memset(&queues, 0, sizeof(queues));
>>> +    queues.class = -1;
>>> +    memset(&item, 0, sizeof(item));
>>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>>> +    item.length = len;
>>> +    item.data_ptr = to_user_pointer(&queues);
>>> +    i915_query_items(fd, &item, 1);
>>> +    igt_assert_eq(item.length, -ENOENT);
>>> +
>>
>> Looks like you've copied the few lines above after.
>> It seems to be the same test.
>
> Above is checking invalid class is rejected, below is checking invalid 
> instance - if I understood correctly what you are pointing at?

Oops, missed that.

>
>>> +    memset(&queues, 0, sizeof(queues));
>>> +    queues.instance = -1;
>>> +    memset(&item, 0, sizeof(item));
>>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>>> +    item.length = len;
>>> +        item.data_ptr = to_user_pointer(&queues);
>>> +    i915_query_items(fd, &item, 1);
>>> +    igt_assert_eq(item.length, -ENOENT);
>>> +}
>>> +
>>> +static void engine_queues(int fd, const struct 
>>> intel_execution_engine2 *e)
>>> +{
>>> +    struct drm_i915_query_engine_queues queues;
>>> +    struct drm_i915_query_item item;
>>> +    unsigned int len;
>>> +
>>> +    /* Query required buffer length. */
>>> +    memset(&queues, 0, sizeof(queues));
>>> +    memset(&item, 0, sizeof(item));
>>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>>> +    item.data_ptr = to_user_pointer(&queues);
>>> +    i915_query_items(fd, &item, 1);
>>> +    igt_assert(item.length >= 0);
>>> +    igt_assert(item.length <= sizeof(queues));
>>> +    len = item.length;
>>> +
>>> +    /* Check length larger than required works and reports same 
>>> length. */
>>> +    memset(&queues, 0, sizeof(queues));
>>> +    memset(&item, 0, sizeof(item));
>>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>>> +    item.data_ptr = to_user_pointer(&queues);
>>> +    item.length = len + 1;
>>> +    i915_query_items(fd, &item, 1);
>>> +    igt_assert_eq(item.length, len);
>>> +
>>> +    /* Actual query. */
>>> +    memset(&queues, 0, sizeof(queues));
>>> +    queues.class = e->class;
>>> +    queues.instance = e->instance;
>>> +    memset(&item, 0, sizeof(item));
>>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>>> +    item.data_ptr = to_user_pointer(&queues);
>>> +    item.length = len;
>>> +    i915_query_items(fd, &item, 1);
>>> +    igt_assert_eq(item.length, len);
>>> +}
>>> +
>>> +static unsigned int e2ring(int gem_fd, const struct 
>>> intel_execution_engine2 *e)
>>> +{
>>> +    return gem_class_instance_to_eb_flags(gem_fd, e->class, 
>>> e->instance);
>>> +}
>>> +
>>> +static void
>>> +__query_queues(int fd, const struct intel_execution_engine2 *e,
>>> +           struct drm_i915_query_engine_queues *queues)
>>> +{
>>> +    struct drm_i915_query_item item;
>>> +
>>> +    memset(queues, 0, sizeof(*queues));
>>> +    queues->class = e->class;
>>> +    queues->instance = e->instance;
>>> +    memset(&item, 0, sizeof(item));
>>> +    item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
>>> +    item.data_ptr = to_user_pointer(queues);
>>> +    item.length = sizeof(*queues);
>>> +    i915_query_items(fd, &item, 1);
>>> +    igt_assert_eq(item.length, sizeof(*queues));
>>> +}
>>> +
>>> +static void
>>> +engine_queued(int gem_fd, const struct intel_execution_engine2 *e)
>>> +{
>>> +    const unsigned long engine = e2ring(gem_fd, e);
>>> +    struct drm_i915_query_engine_queues queues;
>>> +    const unsigned int max_rq = 10;
>>> +    uint32_t queued[max_rq + 1];
>>> +    uint32_t bo[max_rq + 1];
>>> +    unsigned int n, i;
>>> +
>>> +    memset(queued, 0, sizeof(queued));
>>> +    memset(bo, 0, sizeof(bo));
>>> +
>>> +    for (n = 0; n <= max_rq; n++) {
>>> +        int fence = -1;
>>> +        struct igt_cork cork = { .fd = fence, .type = CORK_SYNC_FD };
>>> +
>>> +        gem_quiescent_gpu(gem_fd);
>>> +
>>> +        if (n)
>>> +            fence = igt_cork_plug(&cork, -1);
>>> +
>>> +        for (i = 0; i < n; i++) {
>>> +            struct drm_i915_gem_exec_object2 obj = { };
>>> +            struct drm_i915_gem_execbuffer2 eb = { };
>>> +
>>> +            if (!bo[i]) {
>>> +                const uint32_t bbe = MI_BATCH_BUFFER_END;
>>> +
>>> +                bo[i] = gem_create(gem_fd, 4096);
>>> +                gem_write(gem_fd, bo[i], 4092, &bbe,
>>> +                      sizeof(bbe));
>>> +            }
>>> +
>>> +            obj.handle = bo[i];
>>> +
>>> +            eb.buffer_count = 1;
>>> +            eb.buffers_ptr = to_user_pointer(&obj);
>>> +
>>> +            eb.flags = engine | I915_EXEC_FENCE_IN;
>>> +            eb.rsvd2 = fence;
>>> +
>>> +            gem_execbuf(gem_fd, &eb);
>>> +        }
>>> +
>>> +        __query_queues(gem_fd, e, &queues);
>>> +        queued[n] = queues.queued;
>>> +        igt_info("n=%u queued=%u\n", n, queued[n]);
>>> +
>>> +        if (fence >= 0)
>>> +            igt_cork_unplug(&cork);
>>> +
>>> +        for (i = 0; i < n; i++)
>>> +            gem_sync(gem_fd, bo[i]);
>>> +    }
>>> +
>>> +    for (i = 0; i < max_rq; i++) {
>>> +        if (bo[i])
>>> +            gem_close(gem_fd, bo[i]);
>>> +    }
>>> +
>>> +    for (i = 0; i <= max_rq; i++)
>>> +        igt_assert_eq(queued[i], i);
>>> +}
>>> +
>>
>> I'm not sure to understand what the 2 function below are meant to do.
>> Could you put a comment?
>
> Yeah, will need more comments.
>
>> __igt_spin_batch_new_poll also appear to be missing.
>
> I think I mentioned in the commit it depends on another yet unmerged 
> patch so I was sending it only for reference for now.
>
> This particular API has little chance of being merged any time soon 
> due lack of userspace. This is mostly so the product group interested 
> in it can slurp the two (i915 + IGT) from the mailing list for their use.
>
>>> +static igt_spin_t * __spin_poll(int fd, uint32_t ctx, unsigned long 
>>> flags)
>>> +{
>>> +    if (gem_can_store_dword(fd, flags))
>>> +        return __igt_spin_batch_new_poll(fd, ctx, flags);
>>> +    else
>>> +        return __igt_spin_batch_new(fd, ctx, flags, 0);
>>> +}
>>> +
>>> +static unsigned long __spin_wait(int fd, igt_spin_t *spin)
>>> +{
>>> +    struct timespec start = { };
>>> +
>>> +    igt_nsec_elapsed(&start);
>>> +
>>> +    if (gem_can_store_dword(fd, spin->execbuf.flags)) {
>>> +        unsigned long timeout = 0;
>>> +
>>> +        while (!spin->running) {
>>> +            unsigned long t = igt_nsec_elapsed(&start);
>>> +
>>> +            if ((t - timeout) > 250e6) {
>>> +                timeout = t;
>>> +                igt_warn("Spinner not running after %.2fms\n",
>>> +                     (double)t / 1e6);
>>> +            }
>>> +        };
>>> +    } else {
>>> +        igt_debug("__spin_wait - usleep mode\n");
>>> +        usleep(500e3); /* Better than nothing! */
>>> +    }
>>> +
>>> +    return igt_nsec_elapsed(&start);
>>> +}
>>> +
>>> +static void
>>> +engine_runnable(int gem_fd, const struct intel_execution_engine2 *e)
>>> +{
>>> +    const unsigned long engine = e2ring(gem_fd, e);
>>> +    struct drm_i915_query_engine_queues queues;
>>> +    const unsigned int max_rq = 10;
>>> +    igt_spin_t *spin[max_rq + 1];
>>> +    uint32_t runnable[max_rq + 1];
>>> +    uint32_t ctx[max_rq];
>>> +    unsigned int n, i;
>>> +
>>> +    memset(runnable, 0, sizeof(runnable));
>>> +    memset(ctx, 0, sizeof(ctx));
>>> +
>>> +    for (n = 0; n <= max_rq; n++) {
>>> +        gem_quiescent_gpu(gem_fd);
>>> +
>>> +        for (i = 0; i < n; i++) {
>>> +            if (!ctx[i])
>>> +                ctx[i] = gem_context_create(gem_fd);
>>> +
>>> +            if (i == 0)
>>> +                spin[i] = __spin_poll(gem_fd, ctx[i], engine);
>>> +            else
>>> +                spin[i] = __igt_spin_batch_new(gem_fd, ctx[i],
>>> +                                   engine, 0);
>>> +        }
>>> +
>>> +        if (n)
>>> +            __spin_wait(gem_fd, spin[0]);
>>> +
>>> +        __query_queues(gem_fd, e, &queues);
>>> +        runnable[n] = queues.runnable;
>>> +        igt_info("n=%u runnable=%u\n", n, runnable[n]);
>>> +
>>> +        for (i = 0; i < n; i++) {
>>> +            igt_spin_batch_end(spin[i]);
>>> +            gem_sync(gem_fd, spin[i]->handle);
>>> +            igt_spin_batch_free(gem_fd, spin[i]);
>>> +        }
>>> +    }
>>> +
>>> +    for (i = 0; i < max_rq; i++) {
>>> +        if (ctx[i])
>>> +            gem_context_destroy(gem_fd, ctx[i]);
>>> +    }
>>> +
>>> +    igt_assert_eq(runnable[0], 0);
>>
>> Why only checking the first & last items?
>> It seems that the results should be consistent? no?
>
> Depending on the submission backend (ringbuffer/execlists/guc), and 
> number of submission ports, the split between runnable and running 
> counter for a given submission pattern will vary.
>
> This series of three asserts is supposed to make it work with any 
> backend and any number of ports (as long as less than ten).
>
> I think I cannot make the assert stronger without embedding this 
> knowledge in the test, but I'll have another think about it.

Thanks, maybe some comments here would be a good idea.

Is there any verification you can make on runnable + running?

>
>>
>>> +    igt_assert(runnable[max_rq] > 0);
>>> +    igt_assert_eq(runnable[max_rq] - runnable[max_rq - 1], 1);
>>> +}
>>> +
>>> +static void
>>> +engine_running(int gem_fd, const struct intel_execution_engine2 *e)
>>> +{
>>> +    const unsigned long engine = e2ring(gem_fd, e);
>>> +    struct drm_i915_query_engine_queues queues;
>>> +    const unsigned int max_rq = 10;
>>> +    igt_spin_t *spin[max_rq + 1];
>>> +    uint32_t running[max_rq + 1];
>>> +    unsigned int n, i;
>>> +
>>> +    memset(running, 0, sizeof(running));
>>> +    memset(spin, 0, sizeof(spin));
>>> +
>>> +    for (n = 0; n <= max_rq; n++) {
>>> +        gem_quiescent_gpu(gem_fd);
>>> +
>>> +        for (i = 0; i < n; i++) {
>>> +            if (i == 0)
>>> +                spin[i] = __spin_poll(gem_fd, 0, engine);
>>> +            else
>>> +                spin[i] = __igt_spin_batch_new(gem_fd, 0,
>>> +                                   engine, 0);
>>> +        }
>>> +
>>> +        if (n)
>>> +            __spin_wait(gem_fd, spin[0]);
>>> +
>>> +        __query_queues(gem_fd, e, &queues);
>>> +        running[n] = queues.running;
>>> +        igt_info("n=%u running=%u\n", n, running[n]);
>>> +
>>> +        for (i = 0; i < n; i++) {
>>> +            igt_spin_batch_end(spin[i]);
>>> +            gem_sync(gem_fd, spin[i]->handle);
>>> +            igt_spin_batch_free(gem_fd, spin[i]);
>>> +        }
>>> +    }
>>> +
>>> +    for (i = 0; i <= max_rq; i++)
>>> +        igt_assert_eq(running[i], i);
>>> +}
>>> +
>>>   igt_main
>>>   {
>>> +    const struct intel_execution_engine2 *e;
>>>       int fd = -1;
>>>       int devid;
>>> @@ -524,6 +874,37 @@ igt_main
>>>           test_query_topology_known_pci_ids(fd, devid);
>>>       }
>>
>> I guess we could add a group for the topology too.
>
> My dilemma was whether to stuff tests for any query into i915_query.c, 
> or split out to separate binaries.
>
> I can imagine, if/when the number of queries grows, i915_query.c would 
> become unmanageable with the former approach. What do you think?

It was just a suggestion, no strong feeling either way.

>
> Regards,
>
> Tvrtko
>
>>
>>> +    igt_subtest_group {
>>> +        igt_fixture {
>>> +            igt_require(query_engine_queues_supported(fd));
>>> +        }
>>> +
>>> +        igt_subtest("engine-queues-invalid")
>>> +            engine_queues_invalid(fd);
>>> +
>>> +        for_each_engine_class_instance(fd, e) {
>>> +            igt_subtest_group {
>>> +                igt_fixture {
>>> +                    gem_require_engine(fd,
>>> +                               e->class,
>>> +                               e->instance);
>>> +                }
>>> +
>>> +                igt_subtest_f("engine-queues-%s", e->name)
>>> +                    engine_queues(fd, e);
>>> +
>>> +                igt_subtest_f("engine-queued-%s", e->name)
>>> +                    engine_queued(fd, e);
>>> +
>>> +                igt_subtest_f("engine-runnable-%s", e->name)
>>> +                    engine_runnable(fd, e);
>>> +
>>> +                igt_subtest_f("engine-running-%s", e->name)
>>> +                    engine_running(fd, e);
>>> +            }
>>> +        }
>>> +    }
>>> +
>>>       igt_fixture {
>>>           close(fd);
>>>       }
>>
>>
>> _______________________________________________
>> igt-dev mailing list
>> igt-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/igt-dev
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 5/5] tests/i915_query: Engine queues tests
  2018-03-23 10:18       ` Tvrtko Ursulin
@ 2018-03-23 10:53         ` Chris Wilson
  -1 siblings, 0 replies; 26+ messages in thread
From: Chris Wilson @ 2018-03-23 10:53 UTC (permalink / raw)
  To: Tvrtko Ursulin, Lionel Landwerlin, Tvrtko Ursulin, igt-dev; +Cc: Intel-gfx

Quoting Tvrtko Ursulin (2018-03-23 10:18:37)
> 
> On 22/03/2018 21:22, Lionel Landwerlin wrote:
> > I guess we could add a group for the topology too.
> 
> My dilemma was whether to stuff tests for any query into i915_query.c, 
> or split out to separate binaries.
> 
> I can imagine, if/when the number of queries grows, i915_query.c would 
> become unmanageable with the former approach. What do you think?

#tags ;)
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 5/5] tests/i915_query: Engine queues tests
@ 2018-03-23 10:53         ` Chris Wilson
  0 siblings, 0 replies; 26+ messages in thread
From: Chris Wilson @ 2018-03-23 10:53 UTC (permalink / raw)
  To: Tvrtko Ursulin, Lionel Landwerlin, Tvrtko Ursulin, igt-dev; +Cc: Intel-gfx

Quoting Tvrtko Ursulin (2018-03-23 10:18:37)
> 
> On 22/03/2018 21:22, Lionel Landwerlin wrote:
> > I guess we could add a group for the topology too.
> 
> My dilemma was whether to stuff tests for any query into i915_query.c, 
> or split out to separate binaries.
> 
> I can imagine, if/when the number of queries grows, i915_query.c would 
> become unmanageable with the former approach. What do you think?

#tags ;)
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH i-g-t 4/5] tests/perf_pmu: Add tests for engine queued/runnable/running stats
  2018-04-05 12:40 [PATCH i-g-t v2 0/5] " Tvrtko Ursulin
@ 2018-04-05 12:40 ` Tvrtko Ursulin
  0 siblings, 0 replies; 26+ messages in thread
From: Tvrtko Ursulin @ 2018-04-05 12:40 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Simple tests to check reported queue depths are correct.

v2:
 * Improvements similar to ones from i915_query.c.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 tests/perf_pmu.c | 258 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 258 insertions(+)

diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
index 590e6526b069..7fccb437d048 100644
--- a/tests/perf_pmu.c
+++ b/tests/perf_pmu.c
@@ -169,6 +169,7 @@ static unsigned int e2ring(int gem_fd, const struct intel_execution_engine2 *e)
 #define TEST_RUNTIME_PM (8)
 #define FLAG_LONG (16)
 #define FLAG_HANG (32)
+#define TEST_CONTEXTS (64)
 
 static igt_spin_t * __spin_poll(int fd, uint32_t ctx, unsigned long flags)
 {
@@ -959,6 +960,223 @@ multi_client(int gem_fd, const struct intel_execution_engine2 *e)
 	assert_within_epsilon(val[1], perf_slept[1], tolerance);
 }
 
+static double calc_queued(uint64_t d_val, uint64_t d_ns)
+{
+	return (double)d_val * 1e9 / I915_SAMPLE_QUEUED_DIVISOR / d_ns;
+}
+
+static void
+queued(int gem_fd, const struct intel_execution_engine2 *e, unsigned int flags)
+{
+	const unsigned long engine = e2ring(gem_fd, e);
+	const uint32_t bbe = MI_BATCH_BUFFER_END;
+	const unsigned int max_rq = 10;
+	double queued[max_rq + 1];
+	unsigned int n, i;
+	uint64_t val[2];
+	uint64_t ts[2];
+	uint32_t bo;
+	int fd;
+
+	igt_require_sw_sync();
+	if (flags & TEST_CONTEXTS)
+		gem_require_contexts(gem_fd);
+
+	memset(queued, 0, sizeof(queued));
+
+	bo = gem_create(gem_fd, 4096);
+	gem_write(gem_fd, bo, 4092, &bbe, sizeof(bbe));
+
+	fd = open_pmu(I915_PMU_ENGINE_QUEUED(e->class, e->instance));
+
+	for (n = 0; n <= max_rq; n++) {
+		IGT_CORK_FENCE(cork);
+		int fence = -1;
+
+		gem_quiescent_gpu(gem_fd);
+
+		if (n)
+			fence = igt_cork_plug(&cork, -1);
+
+		for (i = 0; i < n; i++) {
+			struct drm_i915_gem_exec_object2 obj = { };
+			struct drm_i915_gem_execbuffer2 eb = { };
+
+			obj.handle = bo;
+
+			eb.buffer_count = 1;
+			eb.buffers_ptr = to_user_pointer(&obj);
+
+			eb.flags = engine | I915_EXEC_FENCE_IN;
+			if (flags & TEST_CONTEXTS)
+				eb.rsvd1 = gem_context_create(gem_fd);
+			eb.rsvd2 = fence;
+
+			gem_execbuf(gem_fd, &eb);
+
+			if (flags & TEST_CONTEXTS)
+				gem_context_destroy(gem_fd, eb.rsvd1);
+		}
+
+		val[0] = __pmu_read_single(fd, &ts[0]);
+		usleep(batch_duration_ns / 1000);
+		val[1] = __pmu_read_single(fd, &ts[1]);
+
+		queued[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
+		igt_info("n=%u queued=%.2f\n", n, queued[n]);
+
+		if (fence >= 0)
+			igt_cork_unplug(&cork);
+
+		for (i = 0; i < n; i++)
+			gem_sync(gem_fd, bo);
+	}
+
+	close(fd);
+
+	gem_close(gem_fd, bo);
+
+	for (i = 0; i <= max_rq; i++)
+		assert_within_epsilon(queued[i], i, tolerance);
+}
+
+static unsigned long __query_wait(igt_spin_t *spin, unsigned int n)
+{
+	struct timespec ts = { };
+	unsigned long t;
+
+	igt_nsec_elapsed(&ts);
+
+	if (spin->running) {
+		igt_spin_busywait_until_running(spin);
+	} else {
+		igt_debug("__spin_wait - usleep mode\n");
+		usleep(500e3); /* Better than nothing! */
+	}
+
+	t = igt_nsec_elapsed(&ts);
+
+	return spin->running ? t : 500e6 / n;
+}
+
+static void
+runnable(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	const unsigned long engine = e2ring(gem_fd, e);
+	bool contexts = gem_has_contexts(gem_fd);
+	const unsigned int max_rq = 10;
+	igt_spin_t *spin[max_rq + 1];
+	double runnable[max_rq + 1];
+	uint32_t ctx[max_rq];
+	unsigned int n, i;
+	uint64_t val[2];
+	uint64_t ts[2];
+	int fd;
+
+	memset(runnable, 0, sizeof(runnable));
+
+	if (contexts) {
+		for (i = 0; i < max_rq; i++)
+			ctx[i] = gem_context_create(gem_fd);
+	}
+
+	fd = open_pmu(I915_PMU_ENGINE_RUNNABLE(e->class, e->instance));
+
+	for (n = 0; n <= max_rq; n++) {
+		gem_quiescent_gpu(gem_fd);
+
+		for (i = 0; i < n; i++) {
+			uint32_t ctx_ = contexts ? ctx[i] : 0;
+
+			if (i == 0)
+				spin[i] = __spin_poll(gem_fd, ctx_, engine);
+			else
+				spin[i] = __igt_spin_batch_new(gem_fd, ctx_,
+							       engine, 0);
+		}
+
+		if (n)
+			usleep(__query_wait(spin[0], n) * n);
+
+		val[0] = __pmu_read_single(fd, &ts[0]);
+		usleep(batch_duration_ns / 1000);
+		val[1] = __pmu_read_single(fd, &ts[1]);
+
+		runnable[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
+		igt_info("n=%u runnable=%.2f\n", n, runnable[n]);
+
+		for (i = 0; i < n; i++) {
+			end_spin(gem_fd, spin[i], FLAG_SYNC);
+			igt_spin_batch_free(gem_fd, spin[i]);
+		}
+	}
+
+	if (contexts) {
+		for (i = 0; i < max_rq; i++)
+			gem_context_destroy(gem_fd, ctx[i]);
+	}
+
+	close(fd);
+
+	assert_within_epsilon(runnable[0], 0, tolerance);
+	igt_assert(runnable[max_rq] > 0.0);
+
+	if (contexts)
+		assert_within_epsilon(runnable[max_rq] - runnable[max_rq - 1],
+				      1, tolerance);
+}
+
+static void
+running(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	const unsigned long engine = e2ring(gem_fd, e);
+	const unsigned int max_rq = 10;
+	igt_spin_t *spin[max_rq + 1];
+	double running[max_rq + 1];
+	unsigned int n, i;
+	uint64_t val[2];
+	uint64_t ts[2];
+	int fd;
+
+	memset(running, 0, sizeof(running));
+	memset(spin, 0, sizeof(spin));
+
+	fd = open_pmu(I915_PMU_ENGINE_RUNNING(e->class, e->instance));
+
+	for (n = 0; n <= max_rq; n++) {
+		gem_quiescent_gpu(gem_fd);
+
+		for (i = 0; i < n; i++) {
+			if (i == 0)
+				spin[i] = __spin_poll(gem_fd, 0, engine);
+			else
+				spin[i] = __igt_spin_batch_new(gem_fd, 0,
+							       engine, 0);
+		}
+
+		if (n)
+			usleep(__query_wait(spin[0], n) * n);
+
+		val[0] = __pmu_read_single(fd, &ts[0]);
+		usleep(batch_duration_ns / 1000);
+		val[1] = __pmu_read_single(fd, &ts[1]);
+
+		running[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
+		igt_info("n=%u running=%.2f\n", n, running[n]);
+
+		for (i = 0; i < n; i++) {
+			end_spin(gem_fd, spin[i], FLAG_SYNC);
+			igt_spin_batch_free(gem_fd, spin[i]);
+		}
+	}
+
+	close(fd);
+
+	assert_within_epsilon(running[0], 0, tolerance);
+	for (i = 1; i <= max_rq; i++)
+		igt_assert(running[i] > 0);
+}
+
 /**
  * Tests that i915 PMU corectly errors out in invalid initialization.
  * i915 PMU is uncore PMU, thus:
@@ -1692,6 +1910,15 @@ igt_main
 		igt_subtest_f("init-sema-%s", e->name)
 			init(fd, e, I915_SAMPLE_SEMA);
 
+		igt_subtest_f("init-queued-%s", e->name)
+			init(fd, e, I915_SAMPLE_QUEUED);
+
+		igt_subtest_f("init-runnable-%s", e->name)
+			init(fd, e, I915_SAMPLE_RUNNABLE);
+
+		igt_subtest_f("init-running-%s", e->name)
+			init(fd, e, I915_SAMPLE_RUNNING);
+
 		igt_subtest_group {
 			igt_fixture {
 				gem_require_engine(fd, e->class, e->instance);
@@ -1797,6 +2024,27 @@ igt_main
 
 			igt_subtest_f("busy-hang-%s", e->name)
 				single(fd, e, TEST_BUSY | FLAG_HANG);
+
+			/**
+			 * Test that queued metric works.
+			 */
+			igt_subtest_f("queued-%s", e->name)
+				queued(fd, e, 0);
+
+			igt_subtest_f("queued-contexts-%s", e->name)
+				queued(fd, e, TEST_CONTEXTS);
+
+			/**
+			 * Test that runnable metric works.
+			 */
+			igt_subtest_f("runnable-%s", e->name)
+				runnable(fd, e);
+
+			/**
+			 * Test that running metric works.
+			 */
+			igt_subtest_f("running-%s", e->name)
+				running(fd, e);
 		}
 
 		/**
@@ -1889,6 +2137,16 @@ igt_main
 					      e->name)
 					single(render_fd, e,
 					       TEST_BUSY | TEST_TRAILING_IDLE);
+				igt_subtest_f("render-node-queued-%s", e->name)
+					queued(render_fd, e, 0);
+				igt_subtest_f("render-node-queued-contexts-%s",
+					      e->name)
+					queued(render_fd, e, TEST_CONTEXTS);
+				igt_subtest_f("render-node-runnable-%s",
+					      e->name)
+					runnable(render_fd, e);
+				igt_subtest_f("render-node-running-%s", e->name)
+					running(render_fd, e);
 			}
 		}
 
-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2018-04-05 12:40 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-19 18:22 [PATCH i-g-t 0/5] Queued/runnable/running engine stats Tvrtko Ursulin
2018-03-19 18:22 ` [Intel-gfx] " Tvrtko Ursulin
2018-03-19 18:22 ` [PATCH i-g-t 1/5] include: i915 uAPI headers Tvrtko Ursulin
2018-03-19 18:22   ` [igt-dev] " Tvrtko Ursulin
2018-03-19 18:22 ` [PATCH i-g-t 2/5] intel-gpu-overlay: Add engine queue stats Tvrtko Ursulin
2018-03-19 18:22   ` [igt-dev] " Tvrtko Ursulin
2018-03-19 18:22 ` [PATCH i-g-t 3/5] intel-gpu-overlay: Show 1s, 30s and 15m GPU load Tvrtko Ursulin
2018-03-19 18:22   ` [Intel-gfx] " Tvrtko Ursulin
2018-03-19 18:22 ` [PATCH i-g-t 4/5] tests/perf_pmu: Add tests for engine queued/runnable/running stats Tvrtko Ursulin
2018-03-19 18:22   ` [Intel-gfx] " Tvrtko Ursulin
2018-03-19 20:58   ` Chris Wilson
2018-03-19 20:58     ` [igt-dev] [Intel-gfx] " Chris Wilson
2018-03-23 10:08     ` [igt-dev] " Tvrtko Ursulin
2018-03-23 10:08       ` [igt-dev] [Intel-gfx] " Tvrtko Ursulin
2018-03-19 18:22 ` [PATCH i-g-t 5/5] tests/i915_query: Engine queues tests Tvrtko Ursulin
2018-03-19 18:22   ` [igt-dev] " Tvrtko Ursulin
2018-03-22 21:22   ` Lionel Landwerlin
2018-03-22 21:22     ` Lionel Landwerlin
2018-03-23 10:18     ` Tvrtko Ursulin
2018-03-23 10:18       ` Tvrtko Ursulin
2018-03-23 10:47       ` Lionel Landwerlin
2018-03-23 10:47         ` [Intel-gfx] " Lionel Landwerlin
2018-03-23 10:53       ` Chris Wilson
2018-03-23 10:53         ` [Intel-gfx] " Chris Wilson
2018-03-19 21:54 ` [igt-dev] ✗ Fi.CI.BAT: failure for Queued/runnable/running engine stats Patchwork
2018-04-05 12:40 [PATCH i-g-t v2 0/5] " Tvrtko Ursulin
2018-04-05 12:40 ` [PATCH i-g-t 4/5] tests/perf_pmu: Add tests for engine queued/runnable/running stats Tvrtko Ursulin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.