All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/7] Queued/runnable/running engine stats
@ 2018-03-19 18:16 Tvrtko Ursulin
  2018-03-19 18:16 ` [PATCH 1/7] drm/i915/pmu: Fix enable count array size and bounds checking Tvrtko Ursulin
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: Tvrtko Ursulin @ 2018-03-19 18:16 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Per-engine queue depths are an interesting metric for analyzing the system load
and also for users who wish to use it to load balance their submissions based
on it.

In this version I have split the metrics into three separate counters:

1. QUEUED - From execbuf time to request being runnable - runnable meaning until
            dependencies have been resolved and fences signaled.
2. RUNNABLE - From runnable to running on the GPU.
3. RUNNING - Running on the GPU.

When inspected with perf stat the output looks roughly like this:

#           time             counts unit events
   201.160490145               0.01      i915/rcs0-queued/
   201.160490145              19.13      i915/rcs0-runnable/
   201.160490145               2.39      i915/rcs0-running/

The reported numbers are average queue depths for the last query period.

v2:
 * Review feedback (see patch changelogs).
 * Renamed the counters and re-ordered some patches.

v3:
 * Review feedback and rebase.

v4:
 * Addition of last patch in the series, which supports a customer requirement
   to expose instantaneous queue values via the i915 query API.

Tvrtko Ursulin (7):
  drm/i915/pmu: Fix enable count array size and bounds checking
  drm/i915: Keep a count of requests waiting for a slot on GPU
  drm/i915: Keep a count of requests submitted from userspace
  drm/i915/pmu: Add queued counter
  drm/i915/pmu: Add runnable counter
  drm/i915/pmu: Add running counter
  drm/i915: Engine queues query

 drivers/gpu/drm/i915/i915_pmu.c         | 81 +++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_query.c       | 43 +++++++++++++++++
 drivers/gpu/drm/i915/i915_request.c     | 10 ++++
 drivers/gpu/drm/i915/intel_engine_cs.c  |  6 ++-
 drivers/gpu/drm/i915/intel_lrc.c        |  1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h | 21 ++++++++-
 include/uapi/drm/i915_drm.h             | 45 +++++++++++++++++-
 7 files changed, 194 insertions(+), 13 deletions(-)

-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/7] drm/i915/pmu: Fix enable count array size and bounds checking
  2018-03-19 18:16 [PATCH v4 0/7] Queued/runnable/running engine stats Tvrtko Ursulin
@ 2018-03-19 18:16 ` Tvrtko Ursulin
  2018-03-19 18:16 ` [PATCH 2/7] drm/i915: Keep a count of requests waiting for a slot on GPU Tvrtko Ursulin
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Tvrtko Ursulin @ 2018-03-19 18:16 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Enable count array is supposed to have one counter for each possible
engine sampler. As such array sizing and bounds checking is not
correct when more engine samplers are added.

At the same time tidy the assert for readability and robustness.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: b46a33e271ed ("drm/i915/pmu: Expose a PMU interface for perf queries")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_pmu.c         | 13 +++++++++----
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 11fb76bd3860..eb60943671b3 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -549,7 +549,8 @@ static void i915_pmu_enable(struct perf_event *event)
 	 * Update the bitmask of enabled events and increment
 	 * the event reference counter.
 	 */
-	GEM_BUG_ON(bit >= I915_PMU_MASK_BITS);
+	BUILD_BUG_ON(ARRAY_SIZE(i915->pmu.enable_count) != I915_PMU_MASK_BITS);
+	GEM_BUG_ON(bit >= ARRAY_SIZE(i915->pmu.enable_count));
 	GEM_BUG_ON(i915->pmu.enable_count[bit] == ~0);
 	i915->pmu.enable |= BIT_ULL(bit);
 	i915->pmu.enable_count[bit]++;
@@ -573,7 +574,10 @@ static void i915_pmu_enable(struct perf_event *event)
 		GEM_BUG_ON(!engine);
 		engine->pmu.enable |= BIT(sample);
 
-		GEM_BUG_ON(sample >= I915_PMU_SAMPLE_BITS);
+		BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.enable_count) !=
+			     (1 << I915_PMU_SAMPLE_BITS));
+		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.enable_count));
+		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
 		GEM_BUG_ON(engine->pmu.enable_count[sample] == ~0);
 		engine->pmu.enable_count[sample]++;
 	}
@@ -605,7 +609,8 @@ static void i915_pmu_disable(struct perf_event *event)
 						  engine_event_class(event),
 						  engine_event_instance(event));
 		GEM_BUG_ON(!engine);
-		GEM_BUG_ON(sample >= I915_PMU_SAMPLE_BITS);
+		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.enable_count));
+		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
 		GEM_BUG_ON(engine->pmu.enable_count[sample] == 0);
 		/*
 		 * Decrement the reference count and clear the enabled
@@ -615,7 +620,7 @@ static void i915_pmu_disable(struct perf_event *event)
 			engine->pmu.enable &= ~BIT(sample);
 	}
 
-	GEM_BUG_ON(bit >= I915_PMU_MASK_BITS);
+	GEM_BUG_ON(bit >= ARRAY_SIZE(i915->pmu.enable_count));
 	GEM_BUG_ON(i915->pmu.enable_count[bit] == 0);
 	/*
 	 * Decrement the reference count and clear the enabled
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 1f50727a5ddb..828a1f924405 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -391,7 +391,7 @@ struct intel_engine_cs {
 		 *
 		 * Index number corresponds to the bit number from @enable.
 		 */
-		unsigned int enable_count[I915_PMU_SAMPLE_BITS];
+		unsigned int enable_count[1 << I915_PMU_SAMPLE_BITS];
 		/**
 		 * @sample: Counter values for sampling events.
 		 *
-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/7] drm/i915: Keep a count of requests waiting for a slot on GPU
  2018-03-19 18:16 [PATCH v4 0/7] Queued/runnable/running engine stats Tvrtko Ursulin
  2018-03-19 18:16 ` [PATCH 1/7] drm/i915/pmu: Fix enable count array size and bounds checking Tvrtko Ursulin
@ 2018-03-19 18:16 ` Tvrtko Ursulin
  2018-03-19 18:16 ` [PATCH 3/7] drm/i915: Keep a count of requests submitted from userspace Tvrtko Ursulin
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Tvrtko Ursulin @ 2018-03-19 18:16 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Keep a per-engine number of runnable (waiting for GPU time) requests.

v2:
 * Move queued increment from insert_request to execlist_submit_request to
   avoid bumping when re-ordering for priority.
 * Support the counter on the ringbuffer submission path as well, albeit
   just notionally. (Chris Wilson)

v3:
 * Rebase.

v4:
 * Rename and move the stats into a container structure. (Chris Wilson)

v5:
 * Re-order fields in struct intel_engine_cs. (Chris Wilson)

v6-v8:
 * Rebases.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c     | 7 +++++++
 drivers/gpu/drm/i915/intel_engine_cs.c  | 5 +++--
 drivers/gpu/drm/i915/intel_lrc.c        | 1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h | 9 +++++++++
 4 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 43c7134a9b93..2052028c1d68 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -525,6 +525,9 @@ void __i915_request_submit(struct i915_request *request)
 	engine->emit_breadcrumb(request,
 				request->ring->vaddr + request->postfix);
 
+	GEM_BUG_ON(engine->request_stats.runnable == 0);
+	engine->request_stats.runnable--;
+
 	spin_lock(&request->timeline->lock);
 	list_move_tail(&request->link, &timeline->requests);
 	spin_unlock(&request->timeline->lock);
@@ -542,6 +545,8 @@ void i915_request_submit(struct i915_request *request)
 	/* Will be called from irq-context when using foreign fences. */
 	spin_lock_irqsave(&engine->timeline->lock, flags);
 
+	engine->request_stats.runnable++;
+
 	__i915_request_submit(request);
 
 	spin_unlock_irqrestore(&engine->timeline->lock, flags);
@@ -581,6 +586,8 @@ void __i915_request_unsubmit(struct i915_request *request)
 	timeline = request->timeline;
 	GEM_BUG_ON(timeline == engine->timeline);
 
+	engine->request_stats.runnable++;
+
 	spin_lock(&timeline->lock);
 	list_move(&request->link, &timeline->requests);
 	spin_unlock(&timeline->lock);
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 337dfa56a738..3726544cfb00 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1917,12 +1917,13 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	if (i915_terminally_wedged(&engine->i915->gpu_error))
 		drm_printf(m, "*** WEDGED ***\n");
 
-	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], inflight %d\n",
+	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], inflight %d, runnable %u\n",
 		   intel_engine_get_seqno(engine),
 		   intel_engine_last_submit(engine),
 		   engine->hangcheck.seqno,
 		   jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp),
-		   engine->timeline->inflight_seqnos);
+		   engine->timeline->inflight_seqnos,
+		   engine->request_stats.runnable);
 	drm_printf(m, "\tReset count: %d (global %d)\n",
 		   i915_reset_engine_count(error, engine),
 		   i915_reset_count(error));
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 53f1c009ed7b..46bacd98a360 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1033,6 +1033,7 @@ static void execlists_submit_request(struct i915_request *request)
 
 	queue_request(engine, &request->priotree, rq_prio(request));
 	submit_queue(engine, rq_prio(request));
+	engine->request_stats.runnable++;
 
 	GEM_BUG_ON(!engine->execlists.first);
 	GEM_BUG_ON(list_empty(&request->priotree.link));
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 828a1f924405..e98e007cb96b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -338,6 +338,15 @@ struct intel_engine_cs {
 
 	struct drm_i915_gem_object *default_state;
 
+	struct {
+		/**
+		 * @runnable: Number of runnable requests sent to the backend.
+		 *
+		 * Count of requests waiting for the GPU to execute them.
+		 */
+		unsigned int runnable;
+	} request_stats;
+
 	atomic_t irq_count;
 	unsigned long irq_posted;
 #define ENGINE_IRQ_BREADCRUMB 0
-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/7] drm/i915: Keep a count of requests submitted from userspace
  2018-03-19 18:16 [PATCH v4 0/7] Queued/runnable/running engine stats Tvrtko Ursulin
  2018-03-19 18:16 ` [PATCH 1/7] drm/i915/pmu: Fix enable count array size and bounds checking Tvrtko Ursulin
  2018-03-19 18:16 ` [PATCH 2/7] drm/i915: Keep a count of requests waiting for a slot on GPU Tvrtko Ursulin
@ 2018-03-19 18:16 ` Tvrtko Ursulin
  2018-03-19 18:16 ` [PATCH 4/7] drm/i915/pmu: Add queued counter Tvrtko Ursulin
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Tvrtko Ursulin @ 2018-03-19 18:16 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Keep a count of requests submitted from userspace and not yet runnable due
unresolved dependencies.

v2: Rename and move under the container struct. (Chris Wilson)
v3: Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c     | 3 +++
 drivers/gpu/drm/i915/intel_engine_cs.c  | 3 ++-
 drivers/gpu/drm/i915/intel_ringbuffer.h | 8 ++++++++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 2052028c1d68..79bb26928263 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -634,6 +634,7 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 		rcu_read_lock();
 		request->engine->submit_request(request);
 		rcu_read_unlock();
+		atomic_dec(&request->engine->request_stats.queued);
 		break;
 
 	case FENCE_FREE:
@@ -1112,6 +1113,8 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
 		engine->schedule(request, request->ctx->priority);
 	rcu_read_unlock();
 
+	atomic_inc(&engine->request_stats.queued);
+
 	local_bh_disable();
 	i915_sw_fence_commit(&request->submit);
 	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 3726544cfb00..fa9aceb076b4 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1917,12 +1917,13 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	if (i915_terminally_wedged(&engine->i915->gpu_error))
 		drm_printf(m, "*** WEDGED ***\n");
 
-	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], inflight %d, runnable %u\n",
+	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], inflight %d, queued %u, runnable %u\n",
 		   intel_engine_get_seqno(engine),
 		   intel_engine_last_submit(engine),
 		   engine->hangcheck.seqno,
 		   jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp),
 		   engine->timeline->inflight_seqnos,
+		   atomic_read(&engine->request_stats.queued),
 		   engine->request_stats.runnable);
 	drm_printf(m, "\tReset count: %d (global %d)\n",
 		   i915_reset_engine_count(error, engine),
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index e98e007cb96b..810ef79959f2 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -339,6 +339,14 @@ struct intel_engine_cs {
 	struct drm_i915_gem_object *default_state;
 
 	struct {
+		/**
+		 * @queued: Number of submitted requests with dependencies.
+		 *
+		 * Count of requests waiting for dependencies before they can be
+		 * submitted to the backend.
+		 */
+		atomic_t queued;
+
 		/**
 		 * @runnable: Number of runnable requests sent to the backend.
 		 *
-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/7] drm/i915/pmu: Add queued counter
  2018-03-19 18:16 [PATCH v4 0/7] Queued/runnable/running engine stats Tvrtko Ursulin
                   ` (2 preceding siblings ...)
  2018-03-19 18:16 ` [PATCH 3/7] drm/i915: Keep a count of requests submitted from userspace Tvrtko Ursulin
@ 2018-03-19 18:16 ` Tvrtko Ursulin
  2018-03-19 18:16 ` [PATCH 5/7] drm/i915/pmu: Add runnable counter Tvrtko Ursulin
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Tvrtko Ursulin @ 2018-03-19 18:16 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We add a PMU counter to expose the number of requests which have been
submitted from userspace but are not yet runnable due dependencies and
unsignaled fences.

This is useful to analyze the overall load of the system.

v2:
 * Rebase for name change and re-order.
 * Drop floating point constant. (Chris Wilson)

v3:
 * Change scale to 1024 for faster arithmetics. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c         | 40 +++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 include/uapi/drm/i915_drm.h             |  9 +++++++-
 3 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index eb60943671b3..07f5cac97b56 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -15,7 +15,8 @@
 #define ENGINE_SAMPLE_MASK \
 	(BIT(I915_SAMPLE_BUSY) | \
 	 BIT(I915_SAMPLE_WAIT) | \
-	 BIT(I915_SAMPLE_SEMA))
+	 BIT(I915_SAMPLE_SEMA) | \
+	 BIT(I915_SAMPLE_QUEUED))
 
 #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
 
@@ -199,6 +200,11 @@ static void engines_sample(struct drm_i915_private *dev_priv)
 
 		update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
 			      PERIOD, !!(val & RING_WAIT_SEMAPHORE));
+
+		if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED))
+			update_sample(&engine->pmu.sample[I915_SAMPLE_QUEUED],
+				      I915_SAMPLE_QUEUED_DIVISOR,
+				      atomic_read(&engine->request_stats.queued));
 	}
 
 	if (fw)
@@ -296,6 +302,7 @@ engine_event_status(struct intel_engine_cs *engine,
 	switch (sample) {
 	case I915_SAMPLE_BUSY:
 	case I915_SAMPLE_WAIT:
+	case I915_SAMPLE_QUEUED:
 		break;
 	case I915_SAMPLE_SEMA:
 		if (INTEL_GEN(engine->i915) < 6)
@@ -497,6 +504,9 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 		} else {
 			val = engine->pmu.sample[sample].cur;
 		}
+
+		if (sample == I915_SAMPLE_QUEUED)
+			val = div_u64(val, FREQUENCY);
 	} else {
 		switch (event->attr.config) {
 		case I915_PMU_ACTUAL_FREQUENCY:
@@ -752,6 +762,16 @@ static const struct attribute_group *i915_pmu_attr_groups[] = {
 { \
 	.sample = (__sample), \
 	.name = (__name), \
+	.suffix = "unit", \
+	.value = "ns", \
+}
+
+#define __engine_event_scale(__sample, __name, __scale) \
+{ \
+	.sample = (__sample), \
+	.name = (__name), \
+	.suffix = "scale", \
+	.value = (__scale), \
 }
 
 static struct i915_ext_attribute *
@@ -779,6 +799,9 @@ add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name,
 	return ++attr;
 }
 
+/* No brackets or quotes below please. */
+#define I915_SAMPLE_QUEUED_SCALE 0.0009765625
+
 static struct attribute **
 create_event_attributes(struct drm_i915_private *i915)
 {
@@ -795,10 +818,14 @@ create_event_attributes(struct drm_i915_private *i915)
 	static const struct {
 		enum drm_i915_pmu_engine_sample sample;
 		char *name;
+		char *suffix;
+		char *value;
 	} engine_events[] = {
 		__engine_event(I915_SAMPLE_BUSY, "busy"),
 		__engine_event(I915_SAMPLE_SEMA, "sema"),
 		__engine_event(I915_SAMPLE_WAIT, "wait"),
+		__engine_event_scale(I915_SAMPLE_QUEUED, "queued",
+				     __stringify(I915_SAMPLE_QUEUED_SCALE)),
 	};
 	unsigned int count = 0;
 	struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
@@ -808,6 +835,9 @@ create_event_attributes(struct drm_i915_private *i915)
 	enum intel_engine_id id;
 	unsigned int i;
 
+	BUILD_BUG_ON(I915_SAMPLE_QUEUED_DIVISOR !=
+		     (1 / I915_SAMPLE_QUEUED_SCALE));
+
 	/* Count how many counters we will be exposing. */
 	for (i = 0; i < ARRAY_SIZE(events); i++) {
 		if (!config_status(i915, events[i].config))
@@ -885,13 +915,15 @@ create_event_attributes(struct drm_i915_private *i915)
 								engine->instance,
 								engine_events[i].sample));
 
-			str = kasprintf(GFP_KERNEL, "%s-%s.unit",
-					engine->name, engine_events[i].name);
+			str = kasprintf(GFP_KERNEL, "%s-%s.%s",
+					engine->name, engine_events[i].name,
+					engine_events[i].suffix);
 			if (!str)
 				goto err;
 
 			*attr_iter++ = &pmu_iter->attr.attr;
-			pmu_iter = add_pmu_attr(pmu_iter, str, "ns");
+			pmu_iter = add_pmu_attr(pmu_iter, str,
+						engine_events[i].value);
 		}
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 810ef79959f2..62a198f5ba80 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -414,7 +414,7 @@ struct intel_engine_cs {
 		 *
 		 * Our internal timer stores the current counters in this field.
 		 */
-#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_SEMA + 1)
+#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_QUEUED + 1)
 		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX];
 	} pmu;
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 7f5634ce8e88..6094cc9ca6d9 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -110,9 +110,13 @@ enum drm_i915_gem_engine_class {
 enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_BUSY = 0,
 	I915_SAMPLE_WAIT = 1,
-	I915_SAMPLE_SEMA = 2
+	I915_SAMPLE_SEMA = 2,
+	I915_SAMPLE_QUEUED = 3
 };
 
+ /* Divide counter value by divisor to get the real value. */
+#define I915_SAMPLE_QUEUED_DIVISOR (1024)
+
 #define I915_PMU_SAMPLE_BITS (4)
 #define I915_PMU_SAMPLE_MASK (0xf)
 #define I915_PMU_SAMPLE_INSTANCE_BITS (8)
@@ -133,6 +137,9 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_SEMA(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
 
+#define I915_PMU_ENGINE_QUEUED(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED)
+
 #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5/7] drm/i915/pmu: Add runnable counter
  2018-03-19 18:16 [PATCH v4 0/7] Queued/runnable/running engine stats Tvrtko Ursulin
                   ` (3 preceding siblings ...)
  2018-03-19 18:16 ` [PATCH 4/7] drm/i915/pmu: Add queued counter Tvrtko Ursulin
@ 2018-03-19 18:16 ` Tvrtko Ursulin
  2018-03-19 18:16 ` [PATCH 6/7] drm/i915/pmu: Add running counter Tvrtko Ursulin
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Tvrtko Ursulin @ 2018-03-19 18:16 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We add a PMU counter to expose the number of requests with resolved
dependencies waiting for a slot on the GPU to run.

This is useful to analyze the overall load of the system.

v2: Don't limit to gen8+.

v3:
 * Rebase for dynamic sysfs.
 * Drop currently executing requests.

v4:
 * Sync with internal renaming.
 * Drop floating point constant. (Chris Wilson)

v5:
 * Change scale to 1024 for faster arithmetics. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c         | 18 ++++++++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 include/uapi/drm/i915_drm.h             |  7 ++++++-
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 07f5cac97b56..afc561e1aa92 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -16,7 +16,8 @@
 	(BIT(I915_SAMPLE_BUSY) | \
 	 BIT(I915_SAMPLE_WAIT) | \
 	 BIT(I915_SAMPLE_SEMA) | \
-	 BIT(I915_SAMPLE_QUEUED))
+	 BIT(I915_SAMPLE_QUEUED) | \
+	 BIT(I915_SAMPLE_RUNNABLE))
 
 #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
 
@@ -205,6 +206,11 @@ static void engines_sample(struct drm_i915_private *dev_priv)
 			update_sample(&engine->pmu.sample[I915_SAMPLE_QUEUED],
 				      I915_SAMPLE_QUEUED_DIVISOR,
 				      atomic_read(&engine->request_stats.queued));
+
+		if (engine->pmu.enable & BIT(I915_SAMPLE_RUNNABLE))
+			update_sample(&engine->pmu.sample[I915_SAMPLE_RUNNABLE],
+				      I915_SAMPLE_RUNNABLE_DIVISOR,
+				      engine->request_stats.runnable);
 	}
 
 	if (fw)
@@ -303,6 +309,7 @@ engine_event_status(struct intel_engine_cs *engine,
 	case I915_SAMPLE_BUSY:
 	case I915_SAMPLE_WAIT:
 	case I915_SAMPLE_QUEUED:
+	case I915_SAMPLE_RUNNABLE:
 		break;
 	case I915_SAMPLE_SEMA:
 		if (INTEL_GEN(engine->i915) < 6)
@@ -505,7 +512,8 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 			val = engine->pmu.sample[sample].cur;
 		}
 
-		if (sample == I915_SAMPLE_QUEUED)
+		if (sample == I915_SAMPLE_QUEUED ||
+		    sample == I915_SAMPLE_RUNNABLE)
 			val = div_u64(val, FREQUENCY);
 	} else {
 		switch (event->attr.config) {
@@ -801,6 +809,7 @@ add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name,
 
 /* No brackets or quotes below please. */
 #define I915_SAMPLE_QUEUED_SCALE 0.0009765625
+#define I915_SAMPLE_RUNNABLE_SCALE 0.0009765625
 
 static struct attribute **
 create_event_attributes(struct drm_i915_private *i915)
@@ -826,6 +835,8 @@ create_event_attributes(struct drm_i915_private *i915)
 		__engine_event(I915_SAMPLE_WAIT, "wait"),
 		__engine_event_scale(I915_SAMPLE_QUEUED, "queued",
 				     __stringify(I915_SAMPLE_QUEUED_SCALE)),
+		__engine_event_scale(I915_SAMPLE_RUNNABLE, "runnable",
+				     __stringify(I915_SAMPLE_RUNNABLE_SCALE)),
 	};
 	unsigned int count = 0;
 	struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
@@ -838,6 +849,9 @@ create_event_attributes(struct drm_i915_private *i915)
 	BUILD_BUG_ON(I915_SAMPLE_QUEUED_DIVISOR !=
 		     (1 / I915_SAMPLE_QUEUED_SCALE));
 
+	BUILD_BUG_ON(I915_SAMPLE_RUNNABLE_DIVISOR !=
+		     (1 / I915_SAMPLE_RUNNABLE_SCALE));
+
 	/* Count how many counters we will be exposing. */
 	for (i = 0; i < ARRAY_SIZE(events); i++) {
 		if (!config_status(i915, events[i].config))
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 62a198f5ba80..5d7532b185fe 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -414,7 +414,7 @@ struct intel_engine_cs {
 		 *
 		 * Our internal timer stores the current counters in this field.
 		 */
-#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_QUEUED + 1)
+#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_RUNNABLE + 1)
 		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX];
 	} pmu;
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 6094cc9ca6d9..cf0265b20e37 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -111,11 +111,13 @@ enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_BUSY = 0,
 	I915_SAMPLE_WAIT = 1,
 	I915_SAMPLE_SEMA = 2,
-	I915_SAMPLE_QUEUED = 3
+	I915_SAMPLE_QUEUED = 3,
+	I915_SAMPLE_RUNNABLE = 4,
 };
 
  /* Divide counter value by divisor to get the real value. */
 #define I915_SAMPLE_QUEUED_DIVISOR (1024)
+#define I915_SAMPLE_RUNNABLE_DIVISOR (1024)
 
 #define I915_PMU_SAMPLE_BITS (4)
 #define I915_PMU_SAMPLE_MASK (0xf)
@@ -140,6 +142,9 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_QUEUED(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED)
 
+#define I915_PMU_ENGINE_RUNNABLE(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNABLE)
+
 #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 6/7] drm/i915/pmu: Add running counter
  2018-03-19 18:16 [PATCH v4 0/7] Queued/runnable/running engine stats Tvrtko Ursulin
                   ` (4 preceding siblings ...)
  2018-03-19 18:16 ` [PATCH 5/7] drm/i915/pmu: Add runnable counter Tvrtko Ursulin
@ 2018-03-19 18:16 ` Tvrtko Ursulin
  2018-03-19 18:16 ` [PATCH 7/7] drm/i915: Engine queues query Tvrtko Ursulin
  2018-03-19 19:57 ` ✗ Fi.CI.BAT: failure for Queued/runnable/running engine stats (rev3) Patchwork
  7 siblings, 0 replies; 11+ messages in thread
From: Tvrtko Ursulin @ 2018-03-19 18:16 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We add a PMU counter to expose the number of requests currently executing
on the GPU.

This is useful to analyze the overall load of the system.

v2:
 * Rebase.
 * Drop floating point constant. (Chris Wilson)

v3:
 * Change scale to 1024 for faster arithmetics. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c         | 18 ++++++++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 include/uapi/drm/i915_drm.h             |  5 +++++
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index afc561e1aa92..bd7e695fc663 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -17,7 +17,8 @@
 	 BIT(I915_SAMPLE_WAIT) | \
 	 BIT(I915_SAMPLE_SEMA) | \
 	 BIT(I915_SAMPLE_QUEUED) | \
-	 BIT(I915_SAMPLE_RUNNABLE))
+	 BIT(I915_SAMPLE_RUNNABLE) | \
+	 BIT(I915_SAMPLE_RUNNING))
 
 #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
 
@@ -211,6 +212,11 @@ static void engines_sample(struct drm_i915_private *dev_priv)
 			update_sample(&engine->pmu.sample[I915_SAMPLE_RUNNABLE],
 				      I915_SAMPLE_RUNNABLE_DIVISOR,
 				      engine->request_stats.runnable);
+
+		if (engine->pmu.enable & BIT(I915_SAMPLE_RUNNING))
+			update_sample(&engine->pmu.sample[I915_SAMPLE_RUNNING],
+				      I915_SAMPLE_RUNNING_DIVISOR,
+				      last_seqno - current_seqno);
 	}
 
 	if (fw)
@@ -310,6 +316,7 @@ engine_event_status(struct intel_engine_cs *engine,
 	case I915_SAMPLE_WAIT:
 	case I915_SAMPLE_QUEUED:
 	case I915_SAMPLE_RUNNABLE:
+	case I915_SAMPLE_RUNNING:
 		break;
 	case I915_SAMPLE_SEMA:
 		if (INTEL_GEN(engine->i915) < 6)
@@ -513,7 +520,8 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 		}
 
 		if (sample == I915_SAMPLE_QUEUED ||
-		    sample == I915_SAMPLE_RUNNABLE)
+		    sample == I915_SAMPLE_RUNNABLE ||
+		    sample == I915_SAMPLE_RUNNING)
 			val = div_u64(val, FREQUENCY);
 	} else {
 		switch (event->attr.config) {
@@ -810,6 +818,7 @@ add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name,
 /* No brackets or quotes below please. */
 #define I915_SAMPLE_QUEUED_SCALE 0.0009765625
 #define I915_SAMPLE_RUNNABLE_SCALE 0.0009765625
+#define I915_SAMPLE_RUNNING_SCALE 0.0009765625
 
 static struct attribute **
 create_event_attributes(struct drm_i915_private *i915)
@@ -837,6 +846,8 @@ create_event_attributes(struct drm_i915_private *i915)
 				     __stringify(I915_SAMPLE_QUEUED_SCALE)),
 		__engine_event_scale(I915_SAMPLE_RUNNABLE, "runnable",
 				     __stringify(I915_SAMPLE_RUNNABLE_SCALE)),
+		__engine_event_scale(I915_SAMPLE_RUNNING, "running",
+				     __stringify(I915_SAMPLE_RUNNING_SCALE)),
 	};
 	unsigned int count = 0;
 	struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
@@ -852,6 +863,9 @@ create_event_attributes(struct drm_i915_private *i915)
 	BUILD_BUG_ON(I915_SAMPLE_RUNNABLE_DIVISOR !=
 		     (1 / I915_SAMPLE_RUNNABLE_SCALE));
 
+	BUILD_BUG_ON(I915_SAMPLE_RUNNING_DIVISOR !=
+		     (1 / I915_SAMPLE_RUNNING_SCALE));
+
 	/* Count how many counters we will be exposing. */
 	for (i = 0; i < ARRAY_SIZE(events); i++) {
 		if (!config_status(i915, events[i].config))
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 5d7532b185fe..fe1b7d0a94e9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -414,7 +414,7 @@ struct intel_engine_cs {
 		 *
 		 * Our internal timer stores the current counters in this field.
 		 */
-#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_RUNNABLE + 1)
+#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_RUNNING + 1)
 		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX];
 	} pmu;
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index cf0265b20e37..9a00c30e4071 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -113,11 +113,13 @@ enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_SEMA = 2,
 	I915_SAMPLE_QUEUED = 3,
 	I915_SAMPLE_RUNNABLE = 4,
+	I915_SAMPLE_RUNNING = 5,
 };
 
  /* Divide counter value by divisor to get the real value. */
 #define I915_SAMPLE_QUEUED_DIVISOR (1024)
 #define I915_SAMPLE_RUNNABLE_DIVISOR (1024)
+#define I915_SAMPLE_RUNNING_DIVISOR (1024)
 
 #define I915_PMU_SAMPLE_BITS (4)
 #define I915_PMU_SAMPLE_MASK (0xf)
@@ -145,6 +147,9 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_RUNNABLE(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNABLE)
 
+#define I915_PMU_ENGINE_RUNNING(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNING)
+
 #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 7/7] drm/i915: Engine queues query
  2018-03-19 18:16 [PATCH v4 0/7] Queued/runnable/running engine stats Tvrtko Ursulin
                   ` (5 preceding siblings ...)
  2018-03-19 18:16 ` [PATCH 6/7] drm/i915/pmu: Add running counter Tvrtko Ursulin
@ 2018-03-19 18:16 ` Tvrtko Ursulin
  2018-03-31  1:04   ` Lionel Landwerlin
  2018-03-19 19:57 ` ✗ Fi.CI.BAT: failure for Queued/runnable/running engine stats (rev3) Patchwork
  7 siblings, 1 reply; 11+ messages in thread
From: Tvrtko Ursulin @ 2018-03-19 18:16 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

As well as exposing active requests on engines via PMU, we can also export
the current raw values (as tracked by i915 command submission) via a
dedicated query.

This is to satisfy customers who have userspace load balancing solutions
implemented on top of their custom kernel patches.

Userspace is now able to include DRM_I915_QUERY_ENGINE_QUEUES in their
query list, pointing to initialized struct drm_i915_query_engine_queues
entry. Fields describing engine class and instance userspace would like to
know about need to be filled in, and i915 will fill in the rest.

Multiple engines can be queried in one go by having multiple queries in
the query list.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
---
 drivers/gpu/drm/i915/i915_query.c | 43 +++++++++++++++++++++++++++++++++++++++
 include/uapi/drm/i915_drm.h       | 26 +++++++++++++++++++++++
 2 files changed, 69 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
index 3ace929dd90f..b3bc69e8deb7 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -82,9 +82,52 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
 	return total_length;
 }
 
+static int
+query_engine_queues(struct drm_i915_private *i915,
+		    struct drm_i915_query_item *query_item)
+{
+	struct drm_i915_query_engine_queues __user *query_ptr =
+				u64_to_user_ptr(query_item->data_ptr);
+	struct drm_i915_query_engine_queues query;
+	struct intel_engine_cs *engine;
+	const int len = sizeof(query);
+	unsigned int i;
+
+	if (query_item->flags)
+		return -EINVAL;
+
+	if (!query_item->length)
+		return len;
+	else if (query_item->length < len)
+		return -ENOSPC;
+
+	if (copy_from_user(&query, query_ptr, len))
+		return -EFAULT;
+
+	for (i = 0; i < ARRAY_SIZE(query.rsvd); i++) {
+		if (query.rsvd[i])
+			return -EINVAL;
+	}
+
+	engine = intel_engine_lookup_user(i915, query.class, query.instance);
+	if (!engine)
+		return -ENOENT;
+
+	query.queued = atomic_read(&engine->request_stats.queued);
+	query.runnable = engine->request_stats.runnable;
+	query.running = intel_engine_last_submit(engine) -
+			intel_engine_get_seqno(engine);
+
+	if (copy_to_user(query_ptr, &query, len))
+		return -EFAULT;
+
+	return len;
+}
+
 static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
 					struct drm_i915_query_item *query_item) = {
 	query_topology_info,
+	query_engine_queues,
 };
 
 int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 9a00c30e4071..064c3d620286 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1637,6 +1637,7 @@ struct drm_i915_perf_oa_config {
 struct drm_i915_query_item {
 	__u64 query_id;
 #define DRM_I915_QUERY_TOPOLOGY_INFO    1
+#define DRM_I915_QUERY_ENGINE_QUEUES	2
 
 	/*
 	 * When set to zero by userspace, this is filled with the size of the
@@ -1734,6 +1735,31 @@ struct drm_i915_query_topology_info {
 	__u8 data[];
 };
 
+/**
+ * struct drm_i915_query_engine_queues
+ *
+ * Engine queues query enables userspace to query current counts of active
+ * requests in their different states.
+ */
+struct drm_i915_query_engine_queues {
+	/** Engine class as in enum drm_i915_gem_engine_class. */
+	__u16 class;
+
+	/** Engine instance number. */
+	__u16 instance;
+
+	/** Number of requests with unresolved fences and dependencies. */
+	__u32 queued;
+
+	/** Number of ready requests waiting on a slot on GPU. */
+	__u32 runnable;
+
+	/** Number of requests executing on the GPU. */
+	__u32 running;
+
+	__u32 rsvd[5];
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* ✗ Fi.CI.BAT: failure for Queued/runnable/running engine stats (rev3)
  2018-03-19 18:16 [PATCH v4 0/7] Queued/runnable/running engine stats Tvrtko Ursulin
                   ` (6 preceding siblings ...)
  2018-03-19 18:16 ` [PATCH 7/7] drm/i915: Engine queues query Tvrtko Ursulin
@ 2018-03-19 19:57 ` Patchwork
  7 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2018-03-19 19:57 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Queued/runnable/running engine stats (rev3)
URL   : https://patchwork.freedesktop.org/series/36926/
State : failure

== Summary ==

Series 36926v3 Queued/runnable/running engine stats
https://patchwork.freedesktop.org/api/1.0/series/36926/revisions/3/mbox/

---- Possible new issues:

Test kms_pipe_crc_basic:
        Subgroup read-crc-pipe-b-frame-sequence:
                pass       -> FAIL       (fi-skl-6770hq)

---- Known issues:

Test gem_ringfill:
        Subgroup basic-default-hang:
                dmesg-warn -> INCOMPLETE (fi-blb-e6850) fdo#101600 +1
Test kms_flip:
        Subgroup basic-flip-vs-wf_vblank:
                pass       -> FAIL       (fi-skl-6770hq) fdo#100368

fdo#101600 https://bugs.freedesktop.org/show_bug.cgi?id=101600
fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368

fi-bdw-5557u     total:285  pass:264  dwarn:0   dfail:0   fail:0   skip:21  time:434s
fi-bdw-gvtdvm    total:285  pass:261  dwarn:0   dfail:0   fail:0   skip:24  time:440s
fi-blb-e6850     total:146  pass:114  dwarn:0   dfail:0   fail:0   skip:31 
fi-bsw-n3050     total:285  pass:239  dwarn:0   dfail:0   fail:0   skip:46  time:538s
fi-bwr-2160      total:285  pass:180  dwarn:0   dfail:0   fail:0   skip:105 time:299s
fi-bxt-j4205     total:285  pass:256  dwarn:0   dfail:0   fail:0   skip:29  time:512s
fi-byt-j1900     total:285  pass:250  dwarn:0   dfail:0   fail:0   skip:35  time:516s
fi-byt-n2820     total:285  pass:246  dwarn:0   dfail:0   fail:0   skip:39  time:504s
fi-cfl-8700k     total:285  pass:257  dwarn:0   dfail:0   fail:0   skip:28  time:409s
fi-cfl-s2        total:285  pass:259  dwarn:0   dfail:0   fail:0   skip:26  time:579s
fi-cfl-u         total:285  pass:259  dwarn:0   dfail:0   fail:0   skip:26  time:511s
fi-cnl-drrs      total:285  pass:254  dwarn:3   dfail:0   fail:0   skip:28  time:540s
fi-elk-e7500     total:285  pass:225  dwarn:1   dfail:0   fail:0   skip:59  time:418s
fi-gdg-551       total:285  pass:176  dwarn:0   dfail:0   fail:1   skip:108 time:316s
fi-glk-1         total:285  pass:257  dwarn:0   dfail:0   fail:0   skip:28  time:535s
fi-hsw-4770      total:285  pass:258  dwarn:0   dfail:0   fail:0   skip:27  time:403s
fi-ilk-650       total:285  pass:225  dwarn:0   dfail:0   fail:0   skip:60  time:418s
fi-ivb-3520m     total:285  pass:256  dwarn:0   dfail:0   fail:0   skip:29  time:463s
fi-ivb-3770      total:285  pass:252  dwarn:0   dfail:0   fail:0   skip:33  time:428s
fi-kbl-7500u     total:285  pass:260  dwarn:1   dfail:0   fail:0   skip:24  time:472s
fi-kbl-7567u     total:285  pass:265  dwarn:0   dfail:0   fail:0   skip:20  time:463s
fi-kbl-r         total:285  pass:258  dwarn:0   dfail:0   fail:0   skip:27  time:514s
fi-pnv-d510      total:146  pass:113  dwarn:0   dfail:0   fail:0   skip:32 
fi-skl-6260u     total:285  pass:265  dwarn:0   dfail:0   fail:0   skip:20  time:438s
fi-skl-6600u     total:285  pass:258  dwarn:0   dfail:0   fail:0   skip:27  time:541s
fi-skl-6700hq    total:285  pass:259  dwarn:0   dfail:0   fail:0   skip:26  time:541s
fi-skl-6700k2    total:285  pass:261  dwarn:0   dfail:0   fail:0   skip:24  time:504s
fi-skl-6770hq    total:285  pass:263  dwarn:0   dfail:0   fail:2   skip:20  time:486s
fi-skl-guc       total:285  pass:257  dwarn:0   dfail:0   fail:0   skip:28  time:429s
fi-skl-gvtdvm    total:285  pass:262  dwarn:0   dfail:0   fail:0   skip:23  time:445s
fi-snb-2520m     total:285  pass:245  dwarn:0   dfail:0   fail:0   skip:40  time:569s
fi-snb-2600      total:285  pass:245  dwarn:0   dfail:0   fail:0   skip:40  time:405s

260af42eeff094e4768265a6ec8bbcb29b87e9a0 drm-tip: 2018y-03m-19d-17h-15m-08s UTC integration manifest
8c6da3b6af57 drm/i915: Engine queues query
1ef4b561e79b drm/i915/pmu: Add running counter
104ead8a4564 drm/i915/pmu: Add runnable counter
a13ee0c03cb5 drm/i915/pmu: Add queued counter
e4892df27036 drm/i915: Keep a count of requests submitted from userspace
a9b33ccca2b6 drm/i915: Keep a count of requests waiting for a slot on GPU
76102c836a10 drm/i915/pmu: Fix enable count array size and bounds checking

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8402/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 7/7] drm/i915: Engine queues query
  2018-03-19 18:16 ` [PATCH 7/7] drm/i915: Engine queues query Tvrtko Ursulin
@ 2018-03-31  1:04   ` Lionel Landwerlin
  2018-03-31  9:03     ` Chris Wilson
  0 siblings, 1 reply; 11+ messages in thread
From: Lionel Landwerlin @ 2018-03-31  1:04 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

On 19/03/18 18:16, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> As well as exposing active requests on engines via PMU, we can also export
> the current raw values (as tracked by i915 command submission) via a
> dedicated query.
>
> This is to satisfy customers who have userspace load balancing solutions
> implemented on top of their custom kernel patches.
>
> Userspace is now able to include DRM_I915_QUERY_ENGINE_QUEUES in their
> query list, pointing to initialized struct drm_i915_query_engine_queues
> entry. Fields describing engine class and instance userspace would like to
> know about need to be filled in, and i915 will fill in the rest.
>
> Multiple engines can be queried in one go by having multiple queries in
> the query list.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_query.c | 43 +++++++++++++++++++++++++++++++++++++++
>   include/uapi/drm/i915_drm.h       | 26 +++++++++++++++++++++++
>   2 files changed, 69 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
> index 3ace929dd90f..b3bc69e8deb7 100644
> --- a/drivers/gpu/drm/i915/i915_query.c
> +++ b/drivers/gpu/drm/i915/i915_query.c
> @@ -82,9 +82,52 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
>   	return total_length;
>   }
>   
> +static int
> +query_engine_queues(struct drm_i915_private *i915,
> +		    struct drm_i915_query_item *query_item)
> +{
> +	struct drm_i915_query_engine_queues __user *query_ptr =
> +				u64_to_user_ptr(query_item->data_ptr);
> +	struct drm_i915_query_engine_queues query;
> +	struct intel_engine_cs *engine;
> +	const int len = sizeof(query);
> +	unsigned int i;
> +
> +	if (query_item->flags)
> +		return -EINVAL;
> +
> +	if (!query_item->length)
> +		return len;
> +	else if (query_item->length < len)
> +		return -ENOSPC;

topology returns EINVAL in that case. I think ENOSPC makes more sense, 
do we need to change topology?

> +
> +	if (copy_from_user(&query, query_ptr, len))
> +		return -EFAULT;
> +
> +	for (i = 0; i < ARRAY_SIZE(query.rsvd); i++) {
> +		if (query.rsvd[i])
> +			return -EINVAL;
> +	}
> +
> +	engine = intel_engine_lookup_user(i915, query.class, query.instance);
> +	if (!engine)
> +		return -ENOENT;
> +
> +	query.queued = atomic_read(&engine->request_stats.queued);
> +	query.runnable = engine->request_stats.runnable;
> +	query.running = intel_engine_last_submit(engine) -
> +			intel_engine_get_seqno(engine);
> +
> +	if (copy_to_user(query_ptr, &query, len))
> +		return -EFAULT;
> +
> +	return len;
> +}
> +
>   static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
>   					struct drm_i915_query_item *query_item) = {
>   	query_topology_info,
> +	query_engine_queues,
>   };
>   
>   int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 9a00c30e4071..064c3d620286 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1637,6 +1637,7 @@ struct drm_i915_perf_oa_config {
>   struct drm_i915_query_item {
>   	__u64 query_id;
>   #define DRM_I915_QUERY_TOPOLOGY_INFO    1
> +#define DRM_I915_QUERY_ENGINE_QUEUES	2
>   
>   	/*
>   	 * When set to zero by userspace, this is filled with the size of the
> @@ -1734,6 +1735,31 @@ struct drm_i915_query_topology_info {
>   	__u8 data[];
>   };
>   
> +/**
> + * struct drm_i915_query_engine_queues
> + *
> + * Engine queues query enables userspace to query current counts of active
> + * requests in their different states.
> + */
> +struct drm_i915_query_engine_queues {
> +	/** Engine class as in enum drm_i915_gem_engine_class. */
> +	__u16 class;
> +
> +	/** Engine instance number. */
> +	__u16 instance;
> +
> +	/** Number of requests with unresolved fences and dependencies. */
> +	__u32 queued;
> +
> +	/** Number of ready requests waiting on a slot on GPU. */
> +	__u32 runnable;
> +
> +	/** Number of requests executing on the GPU. */
> +	__u32 running;
> +
> +	__u32 rsvd[5];
> +};
> +
>   #if defined(__cplusplus)
>   }
>   #endif


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 7/7] drm/i915: Engine queues query
  2018-03-31  1:04   ` Lionel Landwerlin
@ 2018-03-31  9:03     ` Chris Wilson
  0 siblings, 0 replies; 11+ messages in thread
From: Chris Wilson @ 2018-03-31  9:03 UTC (permalink / raw)
  To: Lionel Landwerlin, Tvrtko Ursulin, Intel-gfx

Quoting Lionel Landwerlin (2018-03-31 02:04:26)
> On 19/03/18 18:16, Tvrtko Ursulin wrote:
> > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >
> > As well as exposing active requests on engines via PMU, we can also export
> > the current raw values (as tracked by i915 command submission) via a
> > dedicated query.
> >
> > This is to satisfy customers who have userspace load balancing solutions
> > implemented on top of their custom kernel patches.
> >
> > Userspace is now able to include DRM_I915_QUERY_ENGINE_QUEUES in their
> > query list, pointing to initialized struct drm_i915_query_engine_queues
> > entry. Fields describing engine class and instance userspace would like to
> > know about need to be filled in, and i915 will fill in the rest.
> >
> > Multiple engines can be queried in one go by having multiple queries in
> > the query list.
> >
> > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_query.c | 43 +++++++++++++++++++++++++++++++++++++++
> >   include/uapi/drm/i915_drm.h       | 26 +++++++++++++++++++++++
> >   2 files changed, 69 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
> > index 3ace929dd90f..b3bc69e8deb7 100644
> > --- a/drivers/gpu/drm/i915/i915_query.c
> > +++ b/drivers/gpu/drm/i915/i915_query.c
> > @@ -82,9 +82,52 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
> >       return total_length;
> >   }
> >   
> > +static int
> > +query_engine_queues(struct drm_i915_private *i915,
> > +                 struct drm_i915_query_item *query_item)
> > +{
> > +     struct drm_i915_query_engine_queues __user *query_ptr =
> > +                             u64_to_user_ptr(query_item->data_ptr);
> > +     struct drm_i915_query_engine_queues query;
> > +     struct intel_engine_cs *engine;
> > +     const int len = sizeof(query);
> > +     unsigned int i;
> > +
> > +     if (query_item->flags)
> > +             return -EINVAL;
> > +
> > +     if (!query_item->length)
> > +             return len;
> > +     else if (query_item->length < len)
> > +             return -ENOSPC;
> 
> topology returns EINVAL in that case. I think ENOSPC makes more sense, 
> do we need to change topology?

I suggest -EINVAL, we are still doing user parameter checking. I think
we will find more useful cases for -ENOSPC later and so would prefer to
keep it clear.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-03-31  9:03 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-19 18:16 [PATCH v4 0/7] Queued/runnable/running engine stats Tvrtko Ursulin
2018-03-19 18:16 ` [PATCH 1/7] drm/i915/pmu: Fix enable count array size and bounds checking Tvrtko Ursulin
2018-03-19 18:16 ` [PATCH 2/7] drm/i915: Keep a count of requests waiting for a slot on GPU Tvrtko Ursulin
2018-03-19 18:16 ` [PATCH 3/7] drm/i915: Keep a count of requests submitted from userspace Tvrtko Ursulin
2018-03-19 18:16 ` [PATCH 4/7] drm/i915/pmu: Add queued counter Tvrtko Ursulin
2018-03-19 18:16 ` [PATCH 5/7] drm/i915/pmu: Add runnable counter Tvrtko Ursulin
2018-03-19 18:16 ` [PATCH 6/7] drm/i915/pmu: Add running counter Tvrtko Ursulin
2018-03-19 18:16 ` [PATCH 7/7] drm/i915: Engine queues query Tvrtko Ursulin
2018-03-31  1:04   ` Lionel Landwerlin
2018-03-31  9:03     ` Chris Wilson
2018-03-19 19:57 ` ✗ Fi.CI.BAT: failure for Queued/runnable/running engine stats (rev3) Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.