* [PATCH v5 0/7] Queued/runnable/running engine stats
@ 2018-04-05 12:39 Tvrtko Ursulin
From: Tvrtko Ursulin @ 2018-04-05 12:39 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Per-engine queue depths are an interesting metric for analyzing system load,
and are also useful to users who wish to load balance their submissions based
on them.

In this version I have split the metrics into three separate counters:

1. QUEUED - From execbuf time until the request becomes runnable, that is,
            until its dependencies have been resolved and fences signaled.
2. RUNNABLE - From becoming runnable until the request starts running on the
              GPU.
3. RUNNING - While the request is executing on the GPU.
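
In other words, the three counters partition each request's lifetime, where
"ready" below means all dependencies resolved and fences signaled:

  execbuf --[QUEUED]--> ready --[RUNNABLE]--> on GPU --[RUNNING]--> done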

When inspected with perf stat the output looks roughly like this:

#           time             counts unit events
   201.160490145               0.01      i915/rcs0-queued/
   201.160490145              19.13      i915/rcs0-runnable/
   201.160490145               2.39      i915/rcs0-running/

The reported numbers are average queue depths for the last query period.
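
Output like the above can be produced with an invocation along these lines
(event names as exposed by this series; a perf binary with interval mode
support is assumed):

  perf stat -a -I 1000 -e i915/rcs0-queued/,i915/rcs0-runnable/,i915/rcs0-running/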

v2:
 * Review feedback (see patch changelogs).
 * Renamed the counters and re-ordered some patches.

v3:
 * Review feedback and rebase.

v4:
 * Addition of last patch in the series, which supports a customer requirement
   to expose instantaneous queue values via the i915 query API.

v5:
 * Fixed accounting under wedging.
 * Error code ABI tweak.

Tvrtko Ursulin (7):
  drm/i915/pmu: Fix enable count array size and bounds checking
  drm/i915: Keep a count of requests waiting for a slot on GPU
  drm/i915: Keep a count of requests submitted from userspace
  drm/i915/pmu: Add queued counter
  drm/i915/pmu: Add runnable counter
  drm/i915/pmu: Add running counter
  drm/i915: Engine queues query

 drivers/gpu/drm/i915/i915_gem.c         |  1 +
 drivers/gpu/drm/i915/i915_pmu.c         | 81 +++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_query.c       | 43 +++++++++++++++++
 drivers/gpu/drm/i915/i915_request.c     | 10 ++++
 drivers/gpu/drm/i915/intel_engine_cs.c  |  6 ++-
 drivers/gpu/drm/i915/intel_lrc.c        |  1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h | 21 ++++++++-
 include/uapi/drm/i915_drm.h             | 45 +++++++++++++++++-
 8 files changed, 195 insertions(+), 13 deletions(-)

-- 
2.14.1


* [PATCH 1/7] drm/i915/pmu: Fix enable count array size and bounds checking
From: Tvrtko Ursulin @ 2018-04-05 12:39 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

The enable count array is supposed to have one counter for each possible
engine sampler. The sampler value is an array index encoded in
I915_PMU_SAMPLE_BITS bits, so the per-engine array must hold
1 << I915_PMU_SAMPLE_BITS entries, which means the current sizing and bounds
checking become incorrect as soon as more engine samplers are added.

At the same time tidy the asserts for readability and robustness.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: b46a33e271ed ("drm/i915/pmu: Expose a PMU interface for perf queries")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_pmu.c         | 13 +++++++++----
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 11fb76bd3860..eb60943671b3 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -549,7 +549,8 @@ static void i915_pmu_enable(struct perf_event *event)
 	 * Update the bitmask of enabled events and increment
 	 * the event reference counter.
 	 */
-	GEM_BUG_ON(bit >= I915_PMU_MASK_BITS);
+	BUILD_BUG_ON(ARRAY_SIZE(i915->pmu.enable_count) != I915_PMU_MASK_BITS);
+	GEM_BUG_ON(bit >= ARRAY_SIZE(i915->pmu.enable_count));
 	GEM_BUG_ON(i915->pmu.enable_count[bit] == ~0);
 	i915->pmu.enable |= BIT_ULL(bit);
 	i915->pmu.enable_count[bit]++;
@@ -573,7 +574,10 @@ static void i915_pmu_enable(struct perf_event *event)
 		GEM_BUG_ON(!engine);
 		engine->pmu.enable |= BIT(sample);
 
-		GEM_BUG_ON(sample >= I915_PMU_SAMPLE_BITS);
+		BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.enable_count) !=
+			     (1 << I915_PMU_SAMPLE_BITS));
+		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.enable_count));
+		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
 		GEM_BUG_ON(engine->pmu.enable_count[sample] == ~0);
 		engine->pmu.enable_count[sample]++;
 	}
@@ -605,7 +609,8 @@ static void i915_pmu_disable(struct perf_event *event)
 						  engine_event_class(event),
 						  engine_event_instance(event));
 		GEM_BUG_ON(!engine);
-		GEM_BUG_ON(sample >= I915_PMU_SAMPLE_BITS);
+		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.enable_count));
+		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
 		GEM_BUG_ON(engine->pmu.enable_count[sample] == 0);
 		/*
 		 * Decrement the reference count and clear the enabled
@@ -615,7 +620,7 @@ static void i915_pmu_disable(struct perf_event *event)
 			engine->pmu.enable &= ~BIT(sample);
 	}
 
-	GEM_BUG_ON(bit >= I915_PMU_MASK_BITS);
+	GEM_BUG_ON(bit >= ARRAY_SIZE(i915->pmu.enable_count));
 	GEM_BUG_ON(i915->pmu.enable_count[bit] == 0);
 	/*
 	 * Decrement the reference count and clear the enabled
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 256d58487559..0c548c400699 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -391,7 +391,7 @@ struct intel_engine_cs {
 		 *
 		 * Index number corresponds to the bit number from @enable.
 		 */
-		unsigned int enable_count[I915_PMU_SAMPLE_BITS];
+		unsigned int enable_count[1 << I915_PMU_SAMPLE_BITS];
 		/**
 		 * @sample: Counter values for sampling events.
 		 *
-- 
2.14.1


* [PATCH 2/7] drm/i915: Keep a count of requests waiting for a slot on GPU
From: Tvrtko Ursulin @ 2018-04-05 12:39 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Keep a per-engine number of runnable (waiting for GPU time) requests.

v2:
 * Move queued increment from insert_request to execlist_submit_request to
   avoid bumping when re-ordering for priority.
 * Support the counter on the ringbuffer submission path as well, albeit
   just notionally. (Chris Wilson)

v3:
 * Rebase.

v4:
 * Rename and move the stats into a container structure. (Chris Wilson)

v5:
 * Re-order fields in struct intel_engine_cs. (Chris Wilson)

v6-v8:
 * Rebases.

v9:
 * Fix accounting during wedging.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c         | 1 +
 drivers/gpu/drm/i915/i915_request.c     | 7 +++++++
 drivers/gpu/drm/i915/intel_engine_cs.c  | 5 +++--
 drivers/gpu/drm/i915/intel_lrc.c        | 1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h | 9 +++++++++
 5 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9650a7b10c5f..63f334d5f7fd 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3211,6 +3211,7 @@ static void nop_complete_submit_request(struct i915_request *request)
 	dma_fence_set_error(&request->fence, -EIO);
 
 	spin_lock_irqsave(&request->engine->timeline->lock, flags);
+	request->engine->request_stats.runnable++;
 	__i915_request_submit(request);
 	intel_engine_init_global_seqno(request->engine, request->global_seqno);
 	spin_unlock_irqrestore(&request->engine->timeline->lock, flags);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 585242831974..5c01291ad1cc 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -540,6 +540,9 @@ void __i915_request_submit(struct i915_request *request)
 	/* Transfer from per-context onto the global per-engine timeline */
 	move_to_timeline(request, engine->timeline);
 
+	GEM_BUG_ON(engine->request_stats.runnable == 0);
+	engine->request_stats.runnable--;
+
 	trace_i915_request_execute(request);
 
 	wake_up_all(&request->execute);
@@ -553,6 +556,8 @@ void i915_request_submit(struct i915_request *request)
 	/* Will be called from irq-context when using foreign fences. */
 	spin_lock_irqsave(&engine->timeline->lock, flags);
 
+	engine->request_stats.runnable++;
+
 	__i915_request_submit(request);
 
 	spin_unlock_irqrestore(&engine->timeline->lock, flags);
@@ -591,6 +596,8 @@ void __i915_request_unsubmit(struct i915_request *request)
 	/* Transfer back from the global per-engine timeline to per-context */
 	move_to_timeline(request, request->timeline);
 
+	engine->request_stats.runnable++;
+
 	/*
 	 * We don't need to wake_up any waiters on request->execute, they
 	 * will get woken by any other event or us re-adding this request
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 12486d8f534b..98254ff92785 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1934,12 +1934,13 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	if (i915_terminally_wedged(&engine->i915->gpu_error))
 		drm_printf(m, "*** WEDGED ***\n");
 
-	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], inflight %d\n",
+	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], inflight %d, runnable %u\n",
 		   intel_engine_get_seqno(engine),
 		   intel_engine_last_submit(engine),
 		   engine->hangcheck.seqno,
 		   jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp),
-		   engine->timeline->inflight_seqnos);
+		   engine->timeline->inflight_seqnos,
+		   engine->request_stats.runnable);
 	drm_printf(m, "\tReset count: %d (global %d)\n",
 		   i915_reset_engine_count(error, engine),
 		   i915_reset_count(error));
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3592288e4696..f6631ff11caf 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1113,6 +1113,7 @@ static void execlists_submit_request(struct i915_request *request)
 
 	queue_request(engine, &request->priotree, rq_prio(request));
 	submit_queue(engine, rq_prio(request));
+	engine->request_stats.runnable++;
 
 	GEM_BUG_ON(!engine->execlists.first);
 	GEM_BUG_ON(list_empty(&request->priotree.link));
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 0c548c400699..54d2ad1c8daa 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -338,6 +338,15 @@ struct intel_engine_cs {
 
 	struct drm_i915_gem_object *default_state;
 
+	struct {
+		/**
+		 * @runnable: Number of runnable requests sent to the backend.
+		 *
+		 * Count of requests waiting for the GPU to execute them.
+		 */
+		unsigned int runnable;
+	} request_stats;
+
 	atomic_t irq_count;
 	unsigned long irq_posted;
 #define ENGINE_IRQ_BREADCRUMB 0
-- 
2.14.1


* [PATCH 3/7] drm/i915: Keep a count of requests submitted from userspace
From: Tvrtko Ursulin @ 2018-04-05 12:39 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Keep a count of requests submitted from userspace and not yet runnable due to
unresolved dependencies.

v2: Rename and move under the container struct. (Chris Wilson)
v3: Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c     | 3 +++
 drivers/gpu/drm/i915/intel_engine_cs.c  | 3 ++-
 drivers/gpu/drm/i915/intel_ringbuffer.h | 8 ++++++++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 5c01291ad1cc..152321655fe6 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -640,6 +640,7 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 		rcu_read_lock();
 		request->engine->submit_request(request);
 		rcu_read_unlock();
+		atomic_dec(&request->engine->request_stats.queued);
 		break;
 
 	case FENCE_FREE:
@@ -1118,6 +1119,8 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
 		engine->schedule(request, request->ctx->priority);
 	rcu_read_unlock();
 
+	atomic_inc(&engine->request_stats.queued);
+
 	local_bh_disable();
 	i915_sw_fence_commit(&request->submit);
 	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 98254ff92785..e4992c2e23a4 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1934,12 +1934,13 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	if (i915_terminally_wedged(&engine->i915->gpu_error))
 		drm_printf(m, "*** WEDGED ***\n");
 
-	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], inflight %d, runnable %u\n",
+	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], inflight %d, queued %u, runnable %u\n",
 		   intel_engine_get_seqno(engine),
 		   intel_engine_last_submit(engine),
 		   engine->hangcheck.seqno,
 		   jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp),
 		   engine->timeline->inflight_seqnos,
+		   atomic_read(&engine->request_stats.queued),
 		   engine->request_stats.runnable);
 	drm_printf(m, "\tReset count: %d (global %d)\n",
 		   i915_reset_engine_count(error, engine),
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 54d2ad1c8daa..616066f536c9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -339,6 +339,14 @@ struct intel_engine_cs {
 	struct drm_i915_gem_object *default_state;
 
 	struct {
+		/**
+		 * @queued: Number of submitted requests with dependencies.
+		 *
+		 * Count of requests waiting for dependencies before they can be
+		 * submitted to the backend.
+		 */
+		atomic_t queued;
+
 		/**
 		 * @runnable: Number of runnable requests sent to the backend.
 		 *
-- 
2.14.1


* [PATCH 4/7] drm/i915/pmu: Add queued counter
From: Tvrtko Ursulin @ 2018-04-05 12:39 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We add a PMU counter to expose the number of requests which have been
submitted from userspace but are not yet runnable due to unresolved
dependencies and unsignaled fences.

This is useful to analyze the overall load of the system.

v2:
 * Rebase for name change and re-order.
 * Drop floating point constant. (Chris Wilson)

v3:
 * Change scale to 1024 for faster arithmetic. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c         | 40 +++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 include/uapi/drm/i915_drm.h             |  9 +++++++-
 3 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index eb60943671b3..07f5cac97b56 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -15,7 +15,8 @@
 #define ENGINE_SAMPLE_MASK \
 	(BIT(I915_SAMPLE_BUSY) | \
 	 BIT(I915_SAMPLE_WAIT) | \
-	 BIT(I915_SAMPLE_SEMA))
+	 BIT(I915_SAMPLE_SEMA) | \
+	 BIT(I915_SAMPLE_QUEUED))
 
 #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
 
@@ -199,6 +200,11 @@ static void engines_sample(struct drm_i915_private *dev_priv)
 
 		update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
 			      PERIOD, !!(val & RING_WAIT_SEMAPHORE));
+
+		if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED))
+			update_sample(&engine->pmu.sample[I915_SAMPLE_QUEUED],
+				      I915_SAMPLE_QUEUED_DIVISOR,
+				      atomic_read(&engine->request_stats.queued));
 	}
 
 	if (fw)
@@ -296,6 +302,7 @@ engine_event_status(struct intel_engine_cs *engine,
 	switch (sample) {
 	case I915_SAMPLE_BUSY:
 	case I915_SAMPLE_WAIT:
+	case I915_SAMPLE_QUEUED:
 		break;
 	case I915_SAMPLE_SEMA:
 		if (INTEL_GEN(engine->i915) < 6)
@@ -497,6 +504,9 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 		} else {
 			val = engine->pmu.sample[sample].cur;
 		}
+
+		if (sample == I915_SAMPLE_QUEUED)
+			val = div_u64(val, FREQUENCY);
 	} else {
 		switch (event->attr.config) {
 		case I915_PMU_ACTUAL_FREQUENCY:
@@ -752,6 +762,16 @@ static const struct attribute_group *i915_pmu_attr_groups[] = {
 { \
 	.sample = (__sample), \
 	.name = (__name), \
+	.suffix = "unit", \
+	.value = "ns", \
+}
+
+#define __engine_event_scale(__sample, __name, __scale) \
+{ \
+	.sample = (__sample), \
+	.name = (__name), \
+	.suffix = "scale", \
+	.value = (__scale), \
 }
 
 static struct i915_ext_attribute *
@@ -779,6 +799,9 @@ add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name,
 	return ++attr;
 }
 
+/* No brackets or quotes below please. */
+#define I915_SAMPLE_QUEUED_SCALE 0.0009765625
+
 static struct attribute **
 create_event_attributes(struct drm_i915_private *i915)
 {
@@ -795,10 +818,14 @@ create_event_attributes(struct drm_i915_private *i915)
 	static const struct {
 		enum drm_i915_pmu_engine_sample sample;
 		char *name;
+		char *suffix;
+		char *value;
 	} engine_events[] = {
 		__engine_event(I915_SAMPLE_BUSY, "busy"),
 		__engine_event(I915_SAMPLE_SEMA, "sema"),
 		__engine_event(I915_SAMPLE_WAIT, "wait"),
+		__engine_event_scale(I915_SAMPLE_QUEUED, "queued",
+				     __stringify(I915_SAMPLE_QUEUED_SCALE)),
 	};
 	unsigned int count = 0;
 	struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
@@ -808,6 +835,9 @@ create_event_attributes(struct drm_i915_private *i915)
 	enum intel_engine_id id;
 	unsigned int i;
 
+	BUILD_BUG_ON(I915_SAMPLE_QUEUED_DIVISOR !=
+		     (1 / I915_SAMPLE_QUEUED_SCALE));
+
 	/* Count how many counters we will be exposing. */
 	for (i = 0; i < ARRAY_SIZE(events); i++) {
 		if (!config_status(i915, events[i].config))
@@ -885,13 +915,15 @@ create_event_attributes(struct drm_i915_private *i915)
 								engine->instance,
 								engine_events[i].sample));
 
-			str = kasprintf(GFP_KERNEL, "%s-%s.unit",
-					engine->name, engine_events[i].name);
+			str = kasprintf(GFP_KERNEL, "%s-%s.%s",
+					engine->name, engine_events[i].name,
+					engine_events[i].suffix);
 			if (!str)
 				goto err;
 
 			*attr_iter++ = &pmu_iter->attr.attr;
-			pmu_iter = add_pmu_attr(pmu_iter, str, "ns");
+			pmu_iter = add_pmu_attr(pmu_iter, str,
+						engine_events[i].value);
 		}
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 616066f536c9..2324150fae06 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -414,7 +414,7 @@ struct intel_engine_cs {
 		 *
 		 * Our internal timer stores the current counters in this field.
 		 */
-#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_SEMA + 1)
+#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_QUEUED + 1)
 		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX];
 	} pmu;
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 7f5634ce8e88..6094cc9ca6d9 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -110,9 +110,13 @@ enum drm_i915_gem_engine_class {
 enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_BUSY = 0,
 	I915_SAMPLE_WAIT = 1,
-	I915_SAMPLE_SEMA = 2
+	I915_SAMPLE_SEMA = 2,
+	I915_SAMPLE_QUEUED = 3
 };
 
+ /* Divide counter value by divisor to get the real value. */
+#define I915_SAMPLE_QUEUED_DIVISOR (1024)
+
 #define I915_PMU_SAMPLE_BITS (4)
 #define I915_PMU_SAMPLE_MASK (0xf)
 #define I915_PMU_SAMPLE_INSTANCE_BITS (8)
@@ -133,6 +137,9 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_SEMA(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
 
+#define I915_PMU_ENGINE_QUEUED(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED)
+
 #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
-- 
2.14.1
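
For reference, a minimal userspace reader for the new counter could look like
the sketch below. This is only a sketch, not part of the series: it assumes a
kernel with these patches applied, a uapi header carrying the new defines,
enough privilege to open system-wide perf events, and it trims most error
handling. Reading the PMU type id from sysfs is the standard discovery method
for dynamic PMUs.

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>
#include <drm/i915_drm.h>

static int i915_pmu_type(void)
{
	FILE *file = fopen("/sys/bus/event_source/devices/i915/type", "r");
	int type = -1;

	if (file) {
		if (fscanf(file, "%d", &type) != 1)
			type = -1;
		fclose(file);
	}

	return type;
}

int main(void)
{
	struct perf_event_attr attr;
	uint64_t v0, v1;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.type = i915_pmu_type(); /* i915 is a dynamic PMU */
	attr.size = sizeof(attr);
	attr.config = I915_PMU_ENGINE_QUEUED(0, 0); /* rcs0: class 0, instance 0 */

	/* i915 events are system-wide: pid == -1, a single cpu. */
	fd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);
	if (fd < 0)
		return 1;

	/*
	 * The raw value accumulates queue depth x divisor over time, so the
	 * delta over roughly one second divided by the divisor approximates
	 * the average queue depth, which is the same arithmetic perf applies
	 * via the exported .scale attribute.
	 */
	if (read(fd, &v0, sizeof(v0)) != sizeof(v0))
		return 1;
	sleep(1);
	if (read(fd, &v1, sizeof(v1)) != sizeof(v1))
		return 1;

	printf("rcs0 average queued depth ~%.2f\n",
	       (double)(v1 - v0) / I915_SAMPLE_QUEUED_DIVISOR);

	close(fd);
	return 0;
}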


* [PATCH 5/7] drm/i915/pmu: Add runnable counter
From: Tvrtko Ursulin @ 2018-04-05 12:39 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We add a PMU counter to expose the number of requests with resolved
dependencies waiting for a slot on the GPU to run.

This is useful to analyze the overall load of the system.

v2: Don't limit to gen8+.

v3:
 * Rebase for dynamic sysfs.
 * Drop currently executing requests.

v4:
 * Sync with internal renaming.
 * Drop floating point constant. (Chris Wilson)

v5:
 * Change scale to 1024 for faster arithmetic. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c         | 18 ++++++++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 include/uapi/drm/i915_drm.h             |  7 ++++++-
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 07f5cac97b56..afc561e1aa92 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -16,7 +16,8 @@
 	(BIT(I915_SAMPLE_BUSY) | \
 	 BIT(I915_SAMPLE_WAIT) | \
 	 BIT(I915_SAMPLE_SEMA) | \
-	 BIT(I915_SAMPLE_QUEUED))
+	 BIT(I915_SAMPLE_QUEUED) | \
+	 BIT(I915_SAMPLE_RUNNABLE))
 
 #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
 
@@ -205,6 +206,11 @@ static void engines_sample(struct drm_i915_private *dev_priv)
 			update_sample(&engine->pmu.sample[I915_SAMPLE_QUEUED],
 				      I915_SAMPLE_QUEUED_DIVISOR,
 				      atomic_read(&engine->request_stats.queued));
+
+		if (engine->pmu.enable & BIT(I915_SAMPLE_RUNNABLE))
+			update_sample(&engine->pmu.sample[I915_SAMPLE_RUNNABLE],
+				      I915_SAMPLE_RUNNABLE_DIVISOR,
+				      engine->request_stats.runnable);
 	}
 
 	if (fw)
@@ -303,6 +309,7 @@ engine_event_status(struct intel_engine_cs *engine,
 	case I915_SAMPLE_BUSY:
 	case I915_SAMPLE_WAIT:
 	case I915_SAMPLE_QUEUED:
+	case I915_SAMPLE_RUNNABLE:
 		break;
 	case I915_SAMPLE_SEMA:
 		if (INTEL_GEN(engine->i915) < 6)
@@ -505,7 +512,8 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 			val = engine->pmu.sample[sample].cur;
 		}
 
-		if (sample == I915_SAMPLE_QUEUED)
+		if (sample == I915_SAMPLE_QUEUED ||
+		    sample == I915_SAMPLE_RUNNABLE)
 			val = div_u64(val, FREQUENCY);
 	} else {
 		switch (event->attr.config) {
@@ -801,6 +809,7 @@ add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name,
 
 /* No brackets or quotes below please. */
 #define I915_SAMPLE_QUEUED_SCALE 0.0009765625
+#define I915_SAMPLE_RUNNABLE_SCALE 0.0009765625
 
 static struct attribute **
 create_event_attributes(struct drm_i915_private *i915)
@@ -826,6 +835,8 @@ create_event_attributes(struct drm_i915_private *i915)
 		__engine_event(I915_SAMPLE_WAIT, "wait"),
 		__engine_event_scale(I915_SAMPLE_QUEUED, "queued",
 				     __stringify(I915_SAMPLE_QUEUED_SCALE)),
+		__engine_event_scale(I915_SAMPLE_RUNNABLE, "runnable",
+				     __stringify(I915_SAMPLE_RUNNABLE_SCALE)),
 	};
 	unsigned int count = 0;
 	struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
@@ -838,6 +849,9 @@ create_event_attributes(struct drm_i915_private *i915)
 	BUILD_BUG_ON(I915_SAMPLE_QUEUED_DIVISOR !=
 		     (1 / I915_SAMPLE_QUEUED_SCALE));
 
+	BUILD_BUG_ON(I915_SAMPLE_RUNNABLE_DIVISOR !=
+		     (1 / I915_SAMPLE_RUNNABLE_SCALE));
+
 	/* Count how many counters we will be exposing. */
 	for (i = 0; i < ARRAY_SIZE(events); i++) {
 		if (!config_status(i915, events[i].config))
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 2324150fae06..5af93e88c90f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -414,7 +414,7 @@ struct intel_engine_cs {
 		 *
 		 * Our internal timer stores the current counters in this field.
 		 */
-#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_QUEUED + 1)
+#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_RUNNABLE + 1)
 		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX];
 	} pmu;
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 6094cc9ca6d9..cf0265b20e37 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -111,11 +111,13 @@ enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_BUSY = 0,
 	I915_SAMPLE_WAIT = 1,
 	I915_SAMPLE_SEMA = 2,
-	I915_SAMPLE_QUEUED = 3
+	I915_SAMPLE_QUEUED = 3,
+	I915_SAMPLE_RUNNABLE = 4,
 };
 
  /* Divide counter value by divisor to get the real value. */
 #define I915_SAMPLE_QUEUED_DIVISOR (1024)
+#define I915_SAMPLE_RUNNABLE_DIVISOR (1024)
 
 #define I915_PMU_SAMPLE_BITS (4)
 #define I915_PMU_SAMPLE_MASK (0xf)
@@ -140,6 +142,9 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_QUEUED(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED)
 
+#define I915_PMU_ENGINE_RUNNABLE(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNABLE)
+
 #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
-- 
2.14.1


* [PATCH 6/7] drm/i915/pmu: Add running counter
From: Tvrtko Ursulin @ 2018-04-05 12:39 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We add a PMU counter to expose the number of requests currently executing
on the GPU, derived as the difference between the last submitted and the
last completed seqno.

This is useful to analyze the overall load of the system.

v2:
 * Rebase.
 * Drop floating point constant. (Chris Wilson)

v3:
 * Change scale to 1024 for faster arithmetic. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c         | 18 ++++++++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 include/uapi/drm/i915_drm.h             |  5 +++++
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index afc561e1aa92..bd7e695fc663 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -17,7 +17,8 @@
 	 BIT(I915_SAMPLE_WAIT) | \
 	 BIT(I915_SAMPLE_SEMA) | \
 	 BIT(I915_SAMPLE_QUEUED) | \
-	 BIT(I915_SAMPLE_RUNNABLE))
+	 BIT(I915_SAMPLE_RUNNABLE) | \
+	 BIT(I915_SAMPLE_RUNNING))
 
 #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
 
@@ -211,6 +212,11 @@ static void engines_sample(struct drm_i915_private *dev_priv)
 			update_sample(&engine->pmu.sample[I915_SAMPLE_RUNNABLE],
 				      I915_SAMPLE_RUNNABLE_DIVISOR,
 				      engine->request_stats.runnable);
+
+		if (engine->pmu.enable & BIT(I915_SAMPLE_RUNNING))
+			update_sample(&engine->pmu.sample[I915_SAMPLE_RUNNING],
+				      I915_SAMPLE_RUNNING_DIVISOR,
+				      last_seqno - current_seqno);
 	}
 
 	if (fw)
@@ -310,6 +316,7 @@ engine_event_status(struct intel_engine_cs *engine,
 	case I915_SAMPLE_WAIT:
 	case I915_SAMPLE_QUEUED:
 	case I915_SAMPLE_RUNNABLE:
+	case I915_SAMPLE_RUNNING:
 		break;
 	case I915_SAMPLE_SEMA:
 		if (INTEL_GEN(engine->i915) < 6)
@@ -513,7 +520,8 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 		}
 
 		if (sample == I915_SAMPLE_QUEUED ||
-		    sample == I915_SAMPLE_RUNNABLE)
+		    sample == I915_SAMPLE_RUNNABLE ||
+		    sample == I915_SAMPLE_RUNNING)
 			val = div_u64(val, FREQUENCY);
 	} else {
 		switch (event->attr.config) {
@@ -810,6 +818,7 @@ add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name,
 /* No brackets or quotes below please. */
 #define I915_SAMPLE_QUEUED_SCALE 0.0009765625
 #define I915_SAMPLE_RUNNABLE_SCALE 0.0009765625
+#define I915_SAMPLE_RUNNING_SCALE 0.0009765625
 
 static struct attribute **
 create_event_attributes(struct drm_i915_private *i915)
@@ -837,6 +846,8 @@ create_event_attributes(struct drm_i915_private *i915)
 				     __stringify(I915_SAMPLE_QUEUED_SCALE)),
 		__engine_event_scale(I915_SAMPLE_RUNNABLE, "runnable",
 				     __stringify(I915_SAMPLE_RUNNABLE_SCALE)),
+		__engine_event_scale(I915_SAMPLE_RUNNING, "running",
+				     __stringify(I915_SAMPLE_RUNNING_SCALE)),
 	};
 	unsigned int count = 0;
 	struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
@@ -852,6 +863,9 @@ create_event_attributes(struct drm_i915_private *i915)
 	BUILD_BUG_ON(I915_SAMPLE_RUNNABLE_DIVISOR !=
 		     (1 / I915_SAMPLE_RUNNABLE_SCALE));
 
+	BUILD_BUG_ON(I915_SAMPLE_RUNNING_DIVISOR !=
+		     (1 / I915_SAMPLE_RUNNING_SCALE));
+
 	/* Count how many counters we will be exposing. */
 	for (i = 0; i < ARRAY_SIZE(events); i++) {
 		if (!config_status(i915, events[i].config))
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 5af93e88c90f..d50b31eb43a5 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -414,7 +414,7 @@ struct intel_engine_cs {
 		 *
 		 * Our internal timer stores the current counters in this field.
 		 */
-#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_RUNNABLE + 1)
+#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_RUNNING + 1)
 		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX];
 	} pmu;
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index cf0265b20e37..9a00c30e4071 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -113,11 +113,13 @@ enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_SEMA = 2,
 	I915_SAMPLE_QUEUED = 3,
 	I915_SAMPLE_RUNNABLE = 4,
+	I915_SAMPLE_RUNNING = 5,
 };
 
  /* Divide counter value by divisor to get the real value. */
 #define I915_SAMPLE_QUEUED_DIVISOR (1024)
 #define I915_SAMPLE_RUNNABLE_DIVISOR (1024)
+#define I915_SAMPLE_RUNNING_DIVISOR (1024)
 
 #define I915_PMU_SAMPLE_BITS (4)
 #define I915_PMU_SAMPLE_MASK (0xf)
@@ -145,6 +147,9 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_RUNNABLE(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNABLE)
 
+#define I915_PMU_ENGINE_RUNNING(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNING)
+
 #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
-- 
2.14.1


* [PATCH 7/7] drm/i915: Engine queues query
From: Tvrtko Ursulin @ 2018-04-05 12:39 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

As well as exposing active requests on engines via PMU, we can also export
the current raw values (as tracked by i915 command submission) via a
dedicated query.

This is to satisfy customers who have userspace load balancing solutions
implemented on top of their custom kernel patches.

Userspace is now able to include DRM_I915_QUERY_ENGINE_QUEUES in its query
list, pointing to an initialized struct drm_i915_query_engine_queues entry.
Userspace fills in the fields describing the engine class and instance it
would like to query, and i915 fills in the rest.

Multiple engines can be queried in one go by having multiple queries in
the query list.

v2:
 * Use EINVAL for reporting insufficient buffer space. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
---
 drivers/gpu/drm/i915/i915_query.c | 43 +++++++++++++++++++++++++++++++++++++++
 include/uapi/drm/i915_drm.h       | 26 +++++++++++++++++++++++
 2 files changed, 69 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
index 3ace929dd90f..798672f5c104 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -82,9 +82,52 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
 	return total_length;
 }
 
+static int
+query_engine_queues(struct drm_i915_private *i915,
+		    struct drm_i915_query_item *query_item)
+{
+	struct drm_i915_query_engine_queues __user *query_ptr =
+				u64_to_user_ptr(query_item->data_ptr);
+	struct drm_i915_query_engine_queues query;
+	struct intel_engine_cs *engine;
+	const int len = sizeof(query);
+	unsigned int i;
+
+	if (query_item->flags)
+		return -EINVAL;
+
+	if (!query_item->length)
+		return len;
+	else if (query_item->length < len)
+		return -EINVAL;
+
+	if (copy_from_user(&query, query_ptr, len))
+		return -EFAULT;
+
+	for (i = 0; i < ARRAY_SIZE(query.rsvd); i++) {
+		if (query.rsvd[i])
+			return -EINVAL;
+	}
+
+	engine = intel_engine_lookup_user(i915, query.class, query.instance);
+	if (!engine)
+		return -ENOENT;
+
+	query.queued = atomic_read(&engine->request_stats.queued);
+	query.runnable = engine->request_stats.runnable;
+	query.running = intel_engine_last_submit(engine) -
+			intel_engine_get_seqno(engine);
+
+	if (copy_to_user(query_ptr, &query, len))
+		return -EFAULT;
+
+	return len;
+}
+
 static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
 					struct drm_i915_query_item *query_item) = {
 	query_topology_info,
+	query_engine_queues,
 };
 
 int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 9a00c30e4071..064c3d620286 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1637,6 +1637,7 @@ struct drm_i915_perf_oa_config {
 struct drm_i915_query_item {
 	__u64 query_id;
 #define DRM_I915_QUERY_TOPOLOGY_INFO    1
+#define DRM_I915_QUERY_ENGINE_QUEUES	2
 
 	/*
 	 * When set to zero by userspace, this is filled with the size of the
@@ -1734,6 +1735,31 @@ struct drm_i915_query_topology_info {
 	__u8 data[];
 };
 
+/**
+ * struct drm_i915_query_engine_queues
+ *
+ * Engine queues query enables userspace to query current counts of active
+ * requests in their different states.
+ */
+struct drm_i915_query_engine_queues {
+	/** Engine class as in enum drm_i915_gem_engine_class. */
+	__u16 class;
+
+	/** Engine instance number. */
+	__u16 instance;
+
+	/** Number of requests with unresolved fences and dependencies. */
+	__u32 queued;
+
+	/** Number of ready requests waiting on a slot on GPU. */
+	__u32 runnable;
+
+	/** Number of requests executing on the GPU. */
+	__u32 running;
+
+	__u32 rsvd[5];
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.14.1
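
For illustration, userspace consumption of the new query could look roughly
like the sketch below. It is not part of the series and assumes a kernel with
this patch, the updated uapi header and libdrm for drmIoctl(); the render
node path is just an example and error handling is trimmed.

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <xf86drm.h> /* drmIoctl(), from libdrm */
#include <drm/i915_drm.h>

int main(void)
{
	struct drm_i915_query_engine_queues queues;
	struct drm_i915_query_item item;
	struct drm_i915_query query;
	int fd = open("/dev/dri/renderD128", O_RDWR); /* example render node */

	if (fd < 0)
		return 1;

	/* Select the engine to query; rsvd[] must remain zeroed. */
	memset(&queues, 0, sizeof(queues));
	queues.class = I915_ENGINE_CLASS_RENDER;
	queues.instance = 0;

	memset(&item, 0, sizeof(item));
	item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
	item.length = sizeof(queues);
	item.data_ptr = (uintptr_t)&queues;

	memset(&query, 0, sizeof(query));
	query.num_items = 1;
	query.items_ptr = (uintptr_t)&item;

	/* On success the kernel writes the consumed length back into the item. */
	if (drmIoctl(fd, DRM_IOCTL_I915_QUERY, &query) == 0 && item.length > 0)
		printf("queued %u runnable %u running %u\n",
		       queues.queued, queues.runnable, queues.running);

	close(fd);
	return 0;
}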


* Re: [PATCH 7/7] drm/i915: Engine queues query
From: Lionel Landwerlin @ 2018-04-05 13:05 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

Looks fine to me; I would just add some comments on the uAPI.

Otherwise:

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

On 05/04/18 13:39, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> As well as exposing active requests on engines via PMU, we can also export
> the current raw values (as tracked by i915 command submission) via a
> dedicated query.
>
> This is to satisfy customers who have userspace load balancing solutions
> implemented on top of their custom kernel patches.
>
> Userspace is now able to include DRM_I915_QUERY_ENGINE_QUEUES in its query
> list, pointing to an initialized struct drm_i915_query_engine_queues entry.
> Userspace fills in the fields describing the engine class and instance it
> would like to query, and i915 fills in the rest.
>
> Multiple engines can be queried in one go by having multiple queries in
> the query list.
>
> v2:
>   * Use EINVAL for reporting insufficient buffer space. (Chris Wilson)
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_query.c | 43 +++++++++++++++++++++++++++++++++++++++
>   include/uapi/drm/i915_drm.h       | 26 +++++++++++++++++++++++
>   2 files changed, 69 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
> index 3ace929dd90f..798672f5c104 100644
> --- a/drivers/gpu/drm/i915/i915_query.c
> +++ b/drivers/gpu/drm/i915/i915_query.c
> @@ -82,9 +82,52 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
>   	return total_length;
>   }
>   
> +static int
> +query_engine_queues(struct drm_i915_private *i915,
> +		    struct drm_i915_query_item *query_item)
> +{
> +	struct drm_i915_query_engine_queues __user *query_ptr =
> +				u64_to_user_ptr(query_item->data_ptr);
> +	struct drm_i915_query_engine_queues query;
> +	struct intel_engine_cs *engine;
> +	const int len = sizeof(query);
> +	unsigned int i;
> +
> +	if (query_item->flags)
> +		return -EINVAL;
> +
> +	if (!query_item->length)
> +		return len;
> +	else if (query_item->length < len)
> +		return -EINVAL;
> +
> +	if (copy_from_user(&query, query_ptr, len))
> +		return -EFAULT;
> +
> +	for (i = 0; i < ARRAY_SIZE(query.rsvd); i++) {
> +		if (query.rsvd[i])
> +			return -EINVAL;
> +	}
> +
> +	engine = intel_engine_lookup_user(i915, query.class, query.instance);
> +	if (!engine)
> +		return -ENOENT;
> +
> +	query.queued = atomic_read(&engine->request_stats.queued);
> +	query.runnable = engine->request_stats.runnable;
> +	query.running = intel_engine_last_submit(engine) -
> +			intel_engine_get_seqno(engine);
> +
> +	if (copy_to_user(query_ptr, &query, len))
> +		return -EFAULT;
> +
> +	return len;
> +}
> +
>   static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
>   					struct drm_i915_query_item *query_item) = {
>   	query_topology_info,
> +	query_engine_queues,
>   };
>   
>   int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 9a00c30e4071..064c3d620286 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1637,6 +1637,7 @@ struct drm_i915_perf_oa_config {
>   struct drm_i915_query_item {
>   	__u64 query_id;
>   #define DRM_I915_QUERY_TOPOLOGY_INFO    1
> +#define DRM_I915_QUERY_ENGINE_QUEUES	2
>   
>   	/*
>   	 * When set to zero by userspace, this is filled with the size of the
> @@ -1734,6 +1735,31 @@ struct drm_i915_query_topology_info {
>   	__u8 data[];
>   };
>   
> +/**
> + * struct drm_i915_query_engine_queues
> + *
> + * Engine queues query enables userspace to query current counts of active
> + * requests in their different states.
> + */
> +struct drm_i915_query_engine_queues {
> +	/** Engine class as in enum drm_i915_gem_engine_class. */
> +	__u16 class;
> +
> +	/** Engine instance number. */
> +	__u16 instance;

Probably want to add that the previous 2 fields are set by userspace.

> +
> +	/** Number of requests with unresolved fences and dependencies. */
> +	__u32 queued;
> +
> +	/** Number of ready requests waiting on a slot on GPU. */
> +	__u32 runnable;
> +
> +	/** Number of requests executing on the GPU. */
> +	__u32 running;
> +
> +	__u32 rsvd[5];

Joonas made me add a comment for fields that are supposed to be cleared, 
probably applies here too.

> +};
> +
>   #if defined(__cplusplus)
>   }
>   #endif



* ✓ Fi.CI.BAT: success for Queued/runnable/running engine stats (rev4)
From: Patchwork @ 2018-04-05 13:49 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Queued/runnable/running engine stats (rev4)
URL   : https://patchwork.freedesktop.org/series/36926/
State : success

== Summary ==

Series 36926v4 Queued/runnable/running engine stats
https://patchwork.freedesktop.org/api/1.0/series/36926/revisions/4/mbox/

---- Known issues:

Test kms_flip:
        Subgroup basic-flip-vs-wf_vblank:
                pass       -> FAIL       (fi-cfl-s3) fdo#100368
Test kms_pipe_crc_basic:
        Subgroup nonblocking-crc-pipe-c-frame-sequence:
                pass       -> FAIL       (fi-cfl-s3) fdo#103481
        Subgroup suspend-read-crc-pipe-b:
                pass       -> INCOMPLETE (fi-snb-2520m) fdo#103713
        Subgroup suspend-read-crc-pipe-c:
                dmesg-warn -> PASS       (fi-glk-j4005) fdo#105644
Test prime_vgem:
        Subgroup basic-fence-flip:
                fail       -> PASS       (fi-ilk-650) fdo#104008

fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
fdo#103481 https://bugs.freedesktop.org/show_bug.cgi?id=103481
fdo#103713 https://bugs.freedesktop.org/show_bug.cgi?id=103713
fdo#105644 https://bugs.freedesktop.org/show_bug.cgi?id=105644
fdo#104008 https://bugs.freedesktop.org/show_bug.cgi?id=104008

fi-bdw-5557u     total:285  pass:264  dwarn:0   dfail:0   fail:0   skip:21  time:430s
fi-bdw-gvtdvm    total:285  pass:261  dwarn:0   dfail:0   fail:0   skip:24  time:441s
fi-blb-e6850     total:285  pass:220  dwarn:1   dfail:0   fail:0   skip:64  time:380s
fi-bsw-n3050     total:285  pass:239  dwarn:0   dfail:0   fail:0   skip:46  time:543s
fi-bwr-2160      total:285  pass:180  dwarn:0   dfail:0   fail:0   skip:105 time:297s
fi-bxt-dsi       total:285  pass:255  dwarn:0   dfail:0   fail:0   skip:30  time:514s
fi-bxt-j4205     total:285  pass:256  dwarn:0   dfail:0   fail:0   skip:29  time:514s
fi-byt-j1900     total:285  pass:250  dwarn:0   dfail:0   fail:0   skip:35  time:518s
fi-byt-n2820     total:285  pass:246  dwarn:0   dfail:0   fail:0   skip:39  time:508s
fi-cfl-8700k     total:285  pass:257  dwarn:0   dfail:0   fail:0   skip:28  time:411s
fi-cfl-s3        total:285  pass:257  dwarn:0   dfail:0   fail:2   skip:26  time:545s
fi-cfl-u         total:285  pass:259  dwarn:0   dfail:0   fail:0   skip:26  time:513s
fi-cnl-y3        total:285  pass:259  dwarn:0   dfail:0   fail:0   skip:26  time:583s
fi-elk-e7500     total:285  pass:226  dwarn:0   dfail:0   fail:0   skip:59  time:424s
fi-gdg-551       total:285  pass:176  dwarn:0   dfail:0   fail:1   skip:108 time:317s
fi-glk-1         total:285  pass:257  dwarn:0   dfail:0   fail:0   skip:28  time:536s
fi-glk-j4005     total:285  pass:256  dwarn:0   dfail:0   fail:0   skip:29  time:482s
fi-hsw-4770      total:285  pass:258  dwarn:0   dfail:0   fail:0   skip:27  time:405s
fi-ilk-650       total:285  pass:225  dwarn:0   dfail:0   fail:0   skip:60  time:426s
fi-ivb-3520m     total:285  pass:256  dwarn:0   dfail:0   fail:0   skip:29  time:468s
fi-ivb-3770      total:285  pass:252  dwarn:0   dfail:0   fail:0   skip:33  time:436s
fi-kbl-7500u     total:285  pass:260  dwarn:1   dfail:0   fail:0   skip:24  time:472s
fi-kbl-7567u     total:285  pass:265  dwarn:0   dfail:0   fail:0   skip:20  time:463s
fi-kbl-r         total:285  pass:258  dwarn:0   dfail:0   fail:0   skip:27  time:508s
fi-pnv-d510      total:285  pass:220  dwarn:1   dfail:0   fail:0   skip:64  time:665s
fi-skl-6260u     total:285  pass:265  dwarn:0   dfail:0   fail:0   skip:20  time:445s
fi-skl-6600u     total:285  pass:258  dwarn:0   dfail:0   fail:0   skip:27  time:532s
fi-skl-6700k2    total:285  pass:261  dwarn:0   dfail:0   fail:0   skip:24  time:502s
fi-skl-6770hq    total:285  pass:265  dwarn:0   dfail:0   fail:0   skip:20  time:504s
fi-skl-guc       total:285  pass:257  dwarn:0   dfail:0   fail:0   skip:28  time:427s
fi-skl-gvtdvm    total:285  pass:262  dwarn:0   dfail:0   fail:0   skip:23  time:445s
fi-snb-2520m     total:242  pass:208  dwarn:0   dfail:0   fail:0   skip:33 
fi-snb-2600      total:285  pass:245  dwarn:0   dfail:0   fail:0   skip:40  time:404s
Blacklisted hosts:
fi-cnl-psr       total:285  pass:256  dwarn:3   dfail:0   fail:0   skip:26  time:522s

0eddede73765b01ec287cad00e23bee23c216a16 drm-tip: 2018y-04m-05d-09h-51m-03s UTC integration manifest
2c169938f8d4 drm/i915: Engine queues query
d37cea8a93a5 drm/i915/pmu: Add running counter
e9fc2870a297 drm/i915/pmu: Add runnable counter
d30b4a00237d drm/i915/pmu: Add queued counter
6de912c97c97 drm/i915: Keep a count of requests submitted from userspace
c458e163eabb drm/i915: Keep a count of requests waiting for a slot on GPU
1d77e7b6fd84 drm/i915/pmu: Fix enable count array size and bounds checking

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8596/issues.html

* ✗ Fi.CI.IGT: failure for Queued/runnable/running engine stats (rev4)
From: Patchwork @ 2018-04-05 16:08 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Queued/runnable/running engine stats (rev4)
URL   : https://patchwork.freedesktop.org/series/36926/
State : failure

== Summary ==

---- Possible new issues:

Test kms_cursor_legacy:
        Subgroup cursor-vs-flip-toggle:
                fail       -> PASS       (shard-hsw)
Test kms_frontbuffer_tracking:
        Subgroup fbc-1p-primscrn-indfb-pgflip-blt:
                pass       -> FAIL       (shard-apl)

---- Known issues:

Test drv_selftest:
        Subgroup live_gtt:
                pass       -> INCOMPLETE (shard-apl) fdo#103927
Test kms_cursor_legacy:
        Subgroup flip-vs-cursor-toggle:
                pass       -> FAIL       (shard-hsw) fdo#102670
Test perf:
        Subgroup polling:
                pass       -> FAIL       (shard-hsw) fdo#102252

fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927
fdo#102670 https://bugs.freedesktop.org/show_bug.cgi?id=102670
fdo#102252 https://bugs.freedesktop.org/show_bug.cgi?id=102252

shard-apl        total:2657 pass:1810 dwarn:1   dfail:0   fail:8   skip:836 time:12190s
shard-hsw        total:2680 pass:1784 dwarn:1   dfail:0   fail:3   skip:891 time:11420s
shard-snb        total:2680 pass:1375 dwarn:1   dfail:0   fail:5   skip:1299 time:6952s
Blacklisted hosts:
shard-kbl        total:2622 pass:1918 dwarn:1   dfail:0   fail:7   skip:695 time:8911s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8596/shards.html

* Re: [PATCH 2/7] drm/i915: Keep a count of requests waiting for a slot on GPU
From: Chris Wilson @ 2018-04-06 20:16 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

Quoting Tvrtko Ursulin (2018-04-05 13:39:18)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Keep a per-engine number of runnable (waiting for GPU time) requests.
> 
> v2:
>  * Move queued increment from insert_request to execlist_submit_request to
>    avoid bumping when re-ordering for priority.
>  * Support the counter on the ringbuffer submission path as well, albeit
>    just notionally. (Chris Wilson)
> 
> v3:
>  * Rebase.
> 
> v4:
>  * Rename and move the stats into a container structure. (Chris Wilson)
> 
> v5:
>  * Re-order fields in struct intel_engine_cs. (Chris Wilson)
> 
> v6-v8:
>  * Rebases.
> 
> v9:
>  * Fix accounting during wedging.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c         | 1 +
>  drivers/gpu/drm/i915/i915_request.c     | 7 +++++++
>  drivers/gpu/drm/i915/intel_engine_cs.c  | 5 +++--
>  drivers/gpu/drm/i915/intel_lrc.c        | 1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.h | 9 +++++++++
>  5 files changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 9650a7b10c5f..63f334d5f7fd 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3211,6 +3211,7 @@ static void nop_complete_submit_request(struct i915_request *request)
>         dma_fence_set_error(&request->fence, -EIO);
>  
>         spin_lock_irqsave(&request->engine->timeline->lock, flags);
> +       request->engine->request_stats.runnable++;
>         __i915_request_submit(request);
>         intel_engine_init_global_seqno(request->engine, request->global_seqno);
>         spin_unlock_irqrestore(&request->engine->timeline->lock, flags);
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 585242831974..5c01291ad1cc 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -540,6 +540,9 @@ void __i915_request_submit(struct i915_request *request)
>         /* Transfer from per-context onto the global per-engine timeline */
>         move_to_timeline(request, engine->timeline);
>  
> +       GEM_BUG_ON(engine->request_stats.runnable == 0);
> +       engine->request_stats.runnable--;
> +
>         trace_i915_request_execute(request);
>  
>         wake_up_all(&request->execute);
> @@ -553,6 +556,8 @@ void i915_request_submit(struct i915_request *request)
>         /* Will be called from irq-context when using foreign fences. */
>         spin_lock_irqsave(&engine->timeline->lock, flags);
>  
> +       engine->request_stats.runnable++;

Hmm, I was thinking this should be in submit_notify(), as you want to
count from when all fences are signaled.

But you are using the timeline lock as its guard?

The only downside is having to repeat the increment in each path, and the
slight disparity for unsubmit. Not a blocker, I just had to actually think
about what you were doing, so maybe discuss that upfront in the commit
msg.
-Chris

* Re: [PATCH 3/7] drm/i915: Keep a count of requests submitted from userspace
From: Chris Wilson @ 2018-04-06 20:17 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

Quoting Tvrtko Ursulin (2018-04-05 13:39:19)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Keep a count of requests submitted from userspace and not yet runnable due to
> unresolved dependencies.
> 
> v2: Rename and move under the container struct. (Chris Wilson)
> v3: Rebase.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_request.c     | 3 +++
>  drivers/gpu/drm/i915/intel_engine_cs.c  | 3 ++-
>  drivers/gpu/drm/i915/intel_ringbuffer.h | 8 ++++++++
>  3 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 5c01291ad1cc..152321655fe6 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -640,6 +640,7 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>                 rcu_read_lock();
>                 request->engine->submit_request(request);
>                 rcu_read_unlock();
> +               atomic_dec(&request->engine->request_stats.queued);

But we use atomic here? Might as well use atomic for
request_stats.runnable here as well?
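
Something like this completely untested sketch, say, with the matching
runnable-- in __i915_request_submit() becoming an atomic_dec() as well:

	/* Runnable as soon as all fences have signaled... */
	atomic_inc(&request->engine->request_stats.runnable);
	/* ...and at that point no longer merely queued. */
	atomic_dec(&request->engine->request_stats.queued);

	rcu_read_lock();
	request->engine->submit_request(request);
	rcu_read_unlock();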
-Chris

* Re: [PATCH 4/7] drm/i915/pmu: Add queued counter
  2018-04-05 12:39 ` [PATCH 4/7] drm/i915/pmu: Add queued counter Tvrtko Ursulin
@ 2018-04-06 20:19   ` Chris Wilson
  0 siblings, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2018-04-06 20:19 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

Quoting Tvrtko Ursulin (2018-04-05 13:39:20)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> We add a PMU counter to expose the number of requests which have been
> submitted from userspace but are not yet runnable due to dependencies and
> unsignaled fences.
> 
> This is useful to analyze the overall load of the system.
> 
> v2:
>  * Rebase for name change and re-order.
>  * Drop floating point constant. (Chris Wilson)
> 
> v3:
>  * Change scale to 1024 for faster arithmetic. (Chris Wilson)
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

I have nothing to complain about,
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

* Re: [PATCH 5/7] drm/i915/pmu: Add runnable counter
  2018-04-05 12:39 ` [PATCH 5/7] drm/i915/pmu: Add runnable counter Tvrtko Ursulin
@ 2018-04-06 20:22   ` Chris Wilson
  0 siblings, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2018-04-06 20:22 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

Quoting Tvrtko Ursulin (2018-04-05 13:39:21)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> We add a PMU counter to expose the number of requests with resolved
> dependencies waiting for a slot on the GPU to run.
> 
> This is useful to analyze the overall load of the system.
> 
> v2: Don't limit to gen8+.
> 
> v3:
>  * Rebase for dynamic sysfs.
>  * Drop currently executing requests.
> 
> v4:
>  * Sync with internal renaming.
>  * Drop floating point constant. (Chris Wilson)
> 
> v5:
>  * Change scale to 1024 for faster arithmetic. (Chris Wilson)
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Cunningly disguised as the patch I thought I just read,

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

* Re: [PATCH 6/7] drm/i915/pmu: Add running counter
  2018-04-05 12:39 ` [PATCH 6/7] drm/i915/pmu: Add running counter Tvrtko Ursulin
@ 2018-04-06 20:24   ` Chris Wilson
  2018-04-09  9:13     ` Tvrtko Ursulin
  0 siblings, 1 reply; 31+ messages in thread
From: Chris Wilson @ 2018-04-06 20:24 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

Quoting Tvrtko Ursulin (2018-04-05 13:39:22)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> We add a PMU counter to expose the number of requests currently executing
> on the GPU.
> 
> This is useful to analyze the overall load of the system.
> 
> v2:
>  * Rebase.
>  * Drop floating point constant. (Chris Wilson)
> 
> v3:
>  * Change scale to 1024 for faster arithmetic. (Chris Wilson)
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Do we want these separate in the final push? Is there value in reverting
one but not the others? They seem a triumvirate.
-Chris

* Re: [PATCH 7/7] drm/i915: Engine queues query
  2018-04-05 13:05   ` Lionel Landwerlin
@ 2018-04-06 20:25     ` Chris Wilson
  0 siblings, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2018-04-06 20:25 UTC (permalink / raw)
  To: Lionel Landwerlin, Tvrtko Ursulin, Intel-gfx

Quoting Lionel Landwerlin (2018-04-05 14:05:52)
> On 05/04/18 13:39, Tvrtko Ursulin wrote:
> > +
> > +     /** Number of requests with unresolved fences and dependencies. */
> > +     __u32 queued;
> > +
> > +     /** Number of ready requests waiting on a slot on GPU. */
> > +     __u32 runnable;
> > +
> > +     /** Number of requests executing on the GPU. */
> > +     __u32 running;
> > +
> > +     __u32 rsvd[5];
> 
> Joonas made me add a comment for fields that are supposed to be cleared, 
> probably applies here too.

__u32 mbz[5]; ?
-Chris

* Re: [PATCH 3/7] drm/i915: Keep a count of requests submitted from userspace
  2018-04-06 20:17   ` Chris Wilson
@ 2018-04-09  9:11     ` Tvrtko Ursulin
  2018-04-09  9:25       ` Chris Wilson
  0 siblings, 1 reply; 31+ messages in thread
From: Tvrtko Ursulin @ 2018-04-09  9:11 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, Intel-gfx


On 06/04/2018 21:17, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-04-05 13:39:19)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Keep a count of requests submitted from userspace and not yet runnable due to
>> unresolved dependencies.
>>
>> v2: Rename and move under the container struct. (Chris Wilson)
>> v3: Rebase.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_request.c     | 3 +++
>>   drivers/gpu/drm/i915/intel_engine_cs.c  | 3 ++-
>>   drivers/gpu/drm/i915/intel_ringbuffer.h | 8 ++++++++
>>   3 files changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>> index 5c01291ad1cc..152321655fe6 100644
>> --- a/drivers/gpu/drm/i915/i915_request.c
>> +++ b/drivers/gpu/drm/i915/i915_request.c
>> @@ -640,6 +640,7 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>>                  rcu_read_lock();
>>                  request->engine->submit_request(request);
>>                  rcu_read_unlock();
>> +               atomic_dec(&request->engine->request_stats.queued);
> 
> But we use atomic here? Might as well use atomic for
> request_stats.runnable here as well?

I admit it can read a bit uneven.

For runnable I wanted to avoid another atomic by putting it under the 
engine timeline lock.

But for queued I did not want to start taking the same lock when adding 
a request.

Your proposal to make runnable atomic_t and move to submit_notify would 
indeed simplify things, but at a cost of one more atomic in that path. 
Perhaps the code path is heavy enough for one new atomic to be 
completely hidden in it, and code simplification to win?
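
To spell out the two variants (sketches only, neither tested here):

	/* v3 as posted: no new atomic, but repeated in every backend */
	spin_lock_irqsave(&engine->timeline->lock, flags);
	engine->request_stats.runnable++;
	__i915_request_submit(request);
	spin_unlock_irqrestore(&engine->timeline->lock, flags);

	/* your proposal: once in submit_notify(), one extra atomic */
	atomic_inc(&engine->request_stats.runnable);
	engine->submit_request(request);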

Regards,

Tvrtko



* Re: [PATCH 6/7] drm/i915/pmu: Add running counter
  2018-04-06 20:24   ` Chris Wilson
@ 2018-04-09  9:13     ` Tvrtko Ursulin
  0 siblings, 0 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2018-04-09  9:13 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, Intel-gfx


On 06/04/2018 21:24, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-04-05 13:39:22)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> We add a PMU counter to expose the number of requests currently executing
>> on the GPU.
>>
>> This is useful to analyze the overall load of the system.
>>
>> v2:
>>   * Rebase.
>>   * Drop floating point constant. (Chris Wilson)
>>
>> v3:
>>   * Change scale to 1024 for faster arithmetic. (Chris Wilson)
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Do we want these separate in the final push? Is there value in reverting
> one but not the others? They seem a triumvirate.

I think the only benefit to have them separate for me was that rebasing 
was marginally easier. I can just as well squash them if that is preferred.

Regards,

Tvrtko

* Re: [PATCH 3/7] drm/i915: Keep a count of requests submitted from userspace
  2018-04-09  9:11     ` Tvrtko Ursulin
@ 2018-04-09  9:25       ` Chris Wilson
  2018-04-09 10:17         ` Tvrtko Ursulin
  0 siblings, 1 reply; 31+ messages in thread
From: Chris Wilson @ 2018-04-09  9:25 UTC (permalink / raw)
  To: Tvrtko Ursulin, Tvrtko Ursulin, Intel-gfx

Quoting Tvrtko Ursulin (2018-04-09 10:11:53)
> 
> On 06/04/2018 21:17, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-04-05 13:39:19)
> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >> Keep a count of requests submitted from userspace and not yet runnable due to
> >> unresolved dependencies.
> >>
> >> v2: Rename and move under the container struct. (Chris Wilson)
> >> v3: Rebase.
> >>
> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/i915_request.c     | 3 +++
> >>   drivers/gpu/drm/i915/intel_engine_cs.c  | 3 ++-
> >>   drivers/gpu/drm/i915/intel_ringbuffer.h | 8 ++++++++
> >>   3 files changed, 13 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> >> index 5c01291ad1cc..152321655fe6 100644
> >> --- a/drivers/gpu/drm/i915/i915_request.c
> >> +++ b/drivers/gpu/drm/i915/i915_request.c
> >> @@ -640,6 +640,7 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> >>                  rcu_read_lock();
> >>                  request->engine->submit_request(request);
> >>                  rcu_read_unlock();
> >> +               atomic_dec(&request->engine->request_stats.queued);
> > 
> > But we use atomic here? Might as well use atomic for
> > request_stats.runnable here as well?
> 
> I admit it can read a bit uneven.
> 
> For runnable I wanted to avoid another atomic by putting it under the 
> engine timeline lock.
> 
> But for queued I did not want to start taking the same lock when adding 
> a request.
> 
> Your proposal to make runnable atomic_t and move to submit_notify would 
> indeed simplify things, but at a cost of one more atomic in that path. 
> Perhaps the code path is heavy enough for one new atomic to be 
> completely hidden in it, and code simplification to win?

It also solidifies that we are moving from one counter to the next.
(There must be some common 64b cmpxchg for doing that!) Going from +1
locked operations to +2 here isn't the end of the world, but I can
certainly appreciated trying to keep the number down (especially for aux
information like stats). 

now = atomic64_read(&stats.queued_runnable);
do {
	old = now;
	new_queued = upper_32_bits(old) - 1;
	new_runnable = lower_32_bits(old) + 1;
	now = atomic64_cmpxchg(&stats.queued_runnable,
				old, (new_runnable | (u64)new_queued << 32));
} while (now != old);

Downside being that we either then use atomic64 throughout or we mix
atomic32/atomic64 knowing that we're on x86. (I feel like someone else
must have solved this problem in a much neater way, before they went to
per-cpu stats ;)
-Chris

* Re: [PATCH 3/7] drm/i915: Keep a count of requests submitted from userspace
  2018-04-09  9:25       ` Chris Wilson
@ 2018-04-09 10:17         ` Tvrtko Ursulin
  2018-04-09 10:27           ` Chris Wilson
  0 siblings, 1 reply; 31+ messages in thread
From: Tvrtko Ursulin @ 2018-04-09 10:17 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, Intel-gfx


On 09/04/2018 10:25, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-04-09 10:11:53)
>>
>> On 06/04/2018 21:17, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-04-05 13:39:19)
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> Keep a count of requests submitted from userspace and not yet runnable due to
>>>> unresolved dependencies.
>>>>
>>>> v2: Rename and move under the container struct. (Chris Wilson)
>>>> v3: Rebase.
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/i915_request.c     | 3 +++
>>>>    drivers/gpu/drm/i915/intel_engine_cs.c  | 3 ++-
>>>>    drivers/gpu/drm/i915/intel_ringbuffer.h | 8 ++++++++
>>>>    3 files changed, 13 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>>>> index 5c01291ad1cc..152321655fe6 100644
>>>> --- a/drivers/gpu/drm/i915/i915_request.c
>>>> +++ b/drivers/gpu/drm/i915/i915_request.c
>>>> @@ -640,6 +640,7 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>>>>                   rcu_read_lock();
>>>>                   request->engine->submit_request(request);
>>>>                   rcu_read_unlock();
>>>> +               atomic_dec(&request->engine->request_stats.queued);
>>>
>>> But we use atomic here? Might as well use atomic for
>>> request_stats.runnable here as well?
>>
>> I admit it can read a bit uneven.
>>
>> For runnable I wanted to avoid another atomic by putting it under the
>> engine timeline lock.
>>
>> But for queued I did not want to start taking the same lock when adding
>> a request.
>>
>> Your proposal to make runnable atomic_t and move to submit_notify would
>> indeed simplify things, but at a cost of one more atomic in that path.
>> Perhaps the code path is heavy enough for one new atomic to be
>> completely hidden in it, and code simplification to win?
> 
> It also solidifies that we are moving from one counter to the next.
> (There must be some common 64b cmpxchg for doing that!) Going from +1
> locked operations to +2 here isn't the end of the world, but I can
> certainly appreciate trying to keep the number down (especially for aux
> information like stats).
> 
> now = atomic64_read(&stats.queued_runnable);
> do {
> 	old = now;
> 	new_queued = upper_32_bits(old) - 1;
> 	new_runnable = lower_32_bits(old) + 1;
> 	now = atomic64_cmpxchg(&stats.queued_runnable,
> 				old, (new_runnable | (u64)new_queued << 32));
> } while (now != old);

Hm, don't know, we have to be careful with these retry loops. More
importantly, I am not sure it isn't overkill.

> Downside being that we either then use atomic64 throughout or we mix
> atomic32/atomic64 knowing that we're on x86. (I feel like someone else
> must have solved this problem in a much neater way, before they went to
> per-cpu stats ;)

Is the winky implying you know who and where? :) We have three potential 
solutions now, even more if the winky is suggesting something.

For me it is still a choice between what I have versus simplifying the 
code paths by going another atomic_t.

Regards,

Tvrtko

* Re: [PATCH 3/7] drm/i915: Keep a count of requests submitted from userspace
  2018-04-09 10:17         ` Tvrtko Ursulin
@ 2018-04-09 10:27           ` Chris Wilson
  2018-04-09 10:29             ` Chris Wilson
  2018-04-09 10:40             ` Tvrtko Ursulin
  0 siblings, 2 replies; 31+ messages in thread
From: Chris Wilson @ 2018-04-09 10:27 UTC (permalink / raw)
  To: Tvrtko Ursulin, Tvrtko Ursulin, Intel-gfx

Quoting Tvrtko Ursulin (2018-04-09 11:17:04)
> 
> On 09/04/2018 10:25, Chris Wilson wrote:
> > Downside being that we either then use atomic64 throughout or we mix
> > atomic32/atomic64 knowing that we're on x86. (I feel like someone else
> > must have solved this problem in a much neater way, before they went to
> > per-cpu stats ;)
> 
> Is the winky implying you know who and where? :) We have three potential 
> solutions now, even more if the winky is suggesting something.

Nah, just that atomic/locked counters are so old hat. Not sure if there
are any good examples of hotpath counters that remain applicable to
our code.
-Chris

* Re: [PATCH 3/7] drm/i915: Keep a count of requests submitted from userspace
  2018-04-09 10:27           ` Chris Wilson
@ 2018-04-09 10:29             ` Chris Wilson
  2018-04-09 10:40             ` Tvrtko Ursulin
  1 sibling, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2018-04-09 10:29 UTC (permalink / raw)
  To: Tvrtko Ursulin, Tvrtko Ursulin, Intel-gfx

Quoting Chris Wilson (2018-04-09 11:27:17)
> Quoting Tvrtko Ursulin (2018-04-09 11:17:04)
> > 
> > On 09/04/2018 10:25, Chris Wilson wrote:
> > > Downside being that we either then use atomic64 throughout or we mix
> > > atomic32/atomic64 knowing that we're on x86. (I feel like someone else
> > > must have solved this problem in a much neater way, before they went to
> > > per-cpu stats ;)
> > 
> > Is the winky implying you know who and where? :) We have three potential 
> > solutions now, even more if the winky is suggesting something.
> 
> Nah, just that atomic/locked counters are so old hat. Not sure if there
> are any good examples of hotpath counters that remain applicable to
> our code.

With an underlying nudge that perhaps our hotpath code isn't so hot, and
that we could squeeze out a few more cycles.
-Chris

* Re: [PATCH 3/7] drm/i915: Keep a count of requests submitted from userspace
  2018-04-09 10:27           ` Chris Wilson
  2018-04-09 10:29             ` Chris Wilson
@ 2018-04-09 10:40             ` Tvrtko Ursulin
  2018-04-09 10:51               ` Chris Wilson
  1 sibling, 1 reply; 31+ messages in thread
From: Tvrtko Ursulin @ 2018-04-09 10:40 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, Intel-gfx


On 09/04/2018 11:27, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-04-09 11:17:04)
>>
>> On 09/04/2018 10:25, Chris Wilson wrote:
>>> Downside being that we either then use atomic64 throughout or we mix
>>> atomic32/atomic64 knowing that we're on x86. (I feel like someone else
>>> must have solved this problem in a much neater way, before they went to
>>> per-cpu stats ;)
>>
>> Is the winky implying you know who and where? :) We have three potential
>> solutions now, even more if the winky is suggesting something.
> 
> Nah, just that atomic/locked counters are so old hat. Not sure if there
> are any good examples of hotpath counters that remain applicable to
> our code.

Leave it as is then for now and improve if we discover it is not good 
enough?

Regards,

Tvrtko

* Re: [PATCH 3/7] drm/i915: Keep a count of requests submitted from userspace
  2018-04-09 10:40             ` Tvrtko Ursulin
@ 2018-04-09 10:51               ` Chris Wilson
  2018-04-09 11:43                 ` Tvrtko Ursulin
  0 siblings, 1 reply; 31+ messages in thread
From: Chris Wilson @ 2018-04-09 10:51 UTC (permalink / raw)
  To: Tvrtko Ursulin, Tvrtko Ursulin, Intel-gfx

Quoting Tvrtko Ursulin (2018-04-09 11:40:08)
> 
> On 09/04/2018 11:27, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-04-09 11:17:04)
> >>
> >> On 09/04/2018 10:25, Chris Wilson wrote:
> >>> Downside being that we either then use atomic64 throughout or we mix
> >>> atomic32/atomic64 knowing that we're on x86. (I feel like someone else
> >>> must have solved this problem in a much neater way, before they went to
> >>> per-cpu stats ;)
> >>
> >> Is the winky implying you know who and where? :) We have three potential
> >> solutions now, even more if the winky is suggesting something.
> > 
> > Nah, just that atomic/locked counters are so old hat. Not sure if there
> > are any good examples of hotpath counters that remain applicable to
> > our code.
> 
> Leave it as is then for now and improve if we discover it is not good 
> enough?

I did have an ulterior motive in that the cmpxchg did resolve one issue
that irked me with the two counters being updated out of sync. Minor,
minor glitches :)

I don't have a strong preference either way. These instructions on the
submit are not likely to stand out, as compared to the biggest fish of
ksoftirqd, execlists_schedule() and execlists_dequeue().
-Chris

* Re: [PATCH 3/7] drm/i915: Keep a count of requests submitted from userspace
  2018-04-09 10:51               ` Chris Wilson
@ 2018-04-09 11:43                 ` Tvrtko Ursulin
  2018-04-09 11:54                   ` Chris Wilson
  0 siblings, 1 reply; 31+ messages in thread
From: Tvrtko Ursulin @ 2018-04-09 11:43 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, Intel-gfx


On 09/04/2018 11:51, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-04-09 11:40:08)
>>
>> On 09/04/2018 11:27, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-04-09 11:17:04)
>>>>
>>>> On 09/04/2018 10:25, Chris Wilson wrote:
>>>>> Downside being that we either then use atomic64 throughout or we mix
>>>>> atomic32/atomic64 knowing that we're on x86. (I feel like someone else
>>>>> must have solved this problem in a much neater way, before they went to
>>>>> per-cpu stats ;)
>>>>
>>>> Is the winky implying you know who and where? :) We have three potential
>>>> solutions now, even more if the winky is suggesting something.
>>>
>>> Nah, just that atomic/locked counters are so old hat. Not sure if there
>>> are any good examples of hotpath counters that remain applicable to
>>> our code.
>>
>> Leave it as is then for now and improve if we discover it is not good
>> enough?
> 
> I did have an ulterior motive in that the cmpxchg did resolve one issue
> that irked me with the two counters being updated out of sync. Minor,
> minor glitches :)
> 
> I don't have a strong preference either way. These instructions on the
> submit are not likely to stand out, as compared to the biggest fish of
> ksoftirqd, execlists_schedule() and execlists_dequeue().

I could move the queued decrement from submit_notify to the backends,
right next to runnable++? Then both would be under the
engine->timeline->lock, so any inconsistencies in readout should, I
hope, be dismissable?
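
Sketch of what I mean for execlists_submit_request(), still under the
lock taken just above these lines:

	queue_request(engine, &request->priotree, rq_prio(request));
	submit_queue(engine, rq_prio(request));

	/* Both counters now change together, under engine->timeline->lock. */
	engine->request_stats.runnable++;
	atomic_dec(&engine->request_stats.queued);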

Regards,

Tvrtko

* Re: [PATCH 3/7] drm/i915: Keep a count of requests submitted from userspace
  2018-04-09 11:43                 ` Tvrtko Ursulin
@ 2018-04-09 11:54                   ` Chris Wilson
  0 siblings, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2018-04-09 11:54 UTC (permalink / raw)
  To: Tvrtko Ursulin, Tvrtko Ursulin, Intel-gfx

Quoting Tvrtko Ursulin (2018-04-09 12:43:50)
> 
> On 09/04/2018 11:51, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-04-09 11:40:08)
> >>
> >> On 09/04/2018 11:27, Chris Wilson wrote:
> >>> Quoting Tvrtko Ursulin (2018-04-09 11:17:04)
> >>>>
> >>>> On 09/04/2018 10:25, Chris Wilson wrote:
> >>>>> Downside being that we either then use atomic64 throughout or we mix
> >>>>> atomic32/atomic64 knowing that we're on x86. (I feel like someone else
> >>>>> must have solved this problem in a much neater way, before they went to
> >>>>> per-cpu stats ;)
> >>>>
> >>>> Is the winky implying you know who and where? :) We have three potential
> >>>> solutions now, even more if the winky is suggesting something.
> >>>
> >>> Nah, just that atomic/locked counters are so old hat. Not sure if there
> >>> are any good examples of hotpath counters that remain applicable to
> >>> our code.
> >>
> >> Leave it as is then for now and improve if we discover it is not good
> >> enough?
> > 
> > I did have an ulterior motive in that the cmpxchg did resolve one issue
> > that irked me with the two counters being updated out of sync. Minor,
> > minor glitches :)
> > 
> > I don't have a strong preference either way. These instructions on the
> > submit are not likely to stand out, as compared to the biggest fish of
> > ksoftirqd, execlists_schedule() and execlists_dequeue().
> 
> I could move the queued decrement from submit_notify to the backends,
> right next to runnable++? Then both would be under the
> engine->timeline->lock, so any inconsistencies in readout should, I
> hope, be dismissable?

Fair. I have this itch to add a request->state,
	switch (request->state) {
	case QUEUED:
		stats->queued--;
		switch (now) {
			case QUEUED:
				BUG();
			case READY:
				stats->runnable++;
			case EXEC:
				break;
		}
		break;
	case ...
	}
	request->state = now;

Stop me. Please, stop me.
-Chris

* [PATCH v10 2/7] drm/i915: Keep a count of requests waiting for a slot on GPU
  2018-04-05 12:39 ` [PATCH 2/7] drm/i915: Keep a count of requests waiting for a slot on GPU Tvrtko Ursulin
  2018-04-06 20:16   ` Chris Wilson
@ 2018-04-09 16:37   ` Tvrtko Ursulin
  1 sibling, 0 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2018-04-09 16:37 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Keep a per-engine number of runnable (waiting for GPU time) requests.

We choose to manage the runnable counter at the backend level instead of at
the request submit_notify callback. The latter would be more consolidated
and less code, but it would require making the counter either atomic_t or
taking the engine->timeline->lock in submit_notify. So the choice is to do
it at the backend level for the benefit of fewer atomic instructions.

v2:
 * Move queued increment from insert_request to execlists_submit_request to
   avoid bumping when re-ordering for priority.
 * Support the counter on the ringbuffer submission path as well, albeit
   just notionally. (Chris Wilson)

v3:
 * Rebase.

v4:
 * Rename and move the stats into a container structure. (Chris Wilson)

v5:
 * Re-order fields in struct intel_engine_cs. (Chris Wilson)

v6-v8:
 * Rebases.

v9:
 * Fix accounting during wedging.

v10:
 * Improved commit message. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c         | 1 +
 drivers/gpu/drm/i915/i915_request.c     | 7 +++++++
 drivers/gpu/drm/i915/intel_engine_cs.c  | 5 +++--
 drivers/gpu/drm/i915/intel_lrc.c        | 1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h | 9 +++++++++
 5 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 28ab0beff86c..aa8d19fac167 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3304,6 +3304,7 @@ static void nop_complete_submit_request(struct i915_request *request)
 	dma_fence_set_error(&request->fence, -EIO);
 
 	spin_lock_irqsave(&request->engine->timeline->lock, flags);
+	request->engine->request_stats.runnable++;
 	__i915_request_submit(request);
 	intel_engine_init_global_seqno(request->engine, request->global_seqno);
 	spin_unlock_irqrestore(&request->engine->timeline->lock, flags);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 9ca9c24b4421..2617bd008845 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -494,6 +494,9 @@ void __i915_request_submit(struct i915_request *request)
 	/* Transfer from per-context onto the global per-engine timeline */
 	move_to_timeline(request, engine->timeline);
 
+	GEM_BUG_ON(engine->request_stats.runnable == 0);
+	engine->request_stats.runnable--;
+
 	trace_i915_request_execute(request);
 
 	wake_up_all(&request->execute);
@@ -507,6 +510,8 @@ void i915_request_submit(struct i915_request *request)
 	/* Will be called from irq-context when using foreign fences. */
 	spin_lock_irqsave(&engine->timeline->lock, flags);
 
+	engine->request_stats.runnable++;
+
 	__i915_request_submit(request);
 
 	spin_unlock_irqrestore(&engine->timeline->lock, flags);
@@ -545,6 +550,8 @@ void __i915_request_unsubmit(struct i915_request *request)
 	/* Transfer back from the global per-engine timeline to per-context */
 	move_to_timeline(request, request->timeline);
 
+	engine->request_stats.runnable++;
+
 	/*
 	 * We don't need to wake_up any waiters on request->execute, they
 	 * will get woken by any other event or us re-adding this request
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 12486d8f534b..98254ff92785 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1934,12 +1934,13 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	if (i915_terminally_wedged(&engine->i915->gpu_error))
 		drm_printf(m, "*** WEDGED ***\n");
 
-	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], inflight %d\n",
+	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], inflight %d, runnable %u\n",
 		   intel_engine_get_seqno(engine),
 		   intel_engine_last_submit(engine),
 		   engine->hangcheck.seqno,
 		   jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp),
-		   engine->timeline->inflight_seqnos);
+		   engine->timeline->inflight_seqnos,
+		   engine->request_stats.runnable);
 	drm_printf(m, "\tReset count: %d (global %d)\n",
 		   i915_reset_engine_count(error, engine),
 		   i915_reset_count(error));
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 02b25bf2378a..16ea95ff7c51 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1124,6 +1124,7 @@ static void execlists_submit_request(struct i915_request *request)
 
 	queue_request(engine, &request->priotree, rq_prio(request));
 	submit_queue(engine, rq_prio(request));
+	engine->request_stats.runnable++;
 
 	GEM_BUG_ON(!engine->execlists.first);
 	GEM_BUG_ON(list_empty(&request->priotree.link));
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 0c548c400699..54d2ad1c8daa 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -338,6 +338,15 @@ struct intel_engine_cs {
 
 	struct drm_i915_gem_object *default_state;
 
+	struct {
+		/**
+		 * @runnable: Number of runnable requests sent to the backend.
+		 *
+		 * Count of requests waiting for the GPU to execute them.
+		 */
+		unsigned int runnable;
+	} request_stats;
+
 	atomic_t irq_count;
 	unsigned long irq_posted;
 #define ENGINE_IRQ_BREADCRUMB 0
-- 
2.14.1


* [PATCH v4 3/7] drm/i915: Keep a count of requests submitted from userspace
  2018-04-05 12:39 ` [PATCH 3/7] drm/i915: Keep a count of requests submitted from userspace Tvrtko Ursulin
  2018-04-06 20:17   ` Chris Wilson
@ 2018-04-09 16:38   ` Tvrtko Ursulin
  1 sibling, 0 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2018-04-09 16:38 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Keep a count of requests submitted from userspace and not yet runnable due to
unresolved dependencies.

v2: Rename and move under the container struct. (Chris Wilson)
v3: Rebase.
v4: Move decrement site to the backend to shrink the window of double-
    accounting as much as possible. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c     | 3 +++
 drivers/gpu/drm/i915/intel_engine_cs.c  | 3 ++-
 drivers/gpu/drm/i915/intel_lrc.c        | 2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.h | 8 ++++++++
 4 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 2617bd008845..997be595d7e7 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -511,6 +511,7 @@ void i915_request_submit(struct i915_request *request)
 	spin_lock_irqsave(&engine->timeline->lock, flags);
 
 	engine->request_stats.runnable++;
+	atomic_dec(&engine->request_stats.queued);
 
 	__i915_request_submit(request);
 
@@ -1072,6 +1073,8 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
 		engine->schedule(request, request->ctx->priority);
 	rcu_read_unlock();
 
+	atomic_inc(&engine->request_stats.queued);
+
 	local_bh_disable();
 	i915_sw_fence_commit(&request->submit);
 	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 98254ff92785..e4992c2e23a4 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1934,12 +1934,13 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	if (i915_terminally_wedged(&engine->i915->gpu_error))
 		drm_printf(m, "*** WEDGED ***\n");
 
-	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], inflight %d, runnable %u\n",
+	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], inflight %d, queued %u, runnable %u\n",
 		   intel_engine_get_seqno(engine),
 		   intel_engine_last_submit(engine),
 		   engine->hangcheck.seqno,
 		   jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp),
 		   engine->timeline->inflight_seqnos,
+		   atomic_read(&engine->request_stats.queued),
 		   engine->request_stats.runnable);
 	drm_printf(m, "\tReset count: %d (global %d)\n",
 		   i915_reset_engine_count(error, engine),
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 16ea95ff7c51..ddd14e30be6c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1124,7 +1124,9 @@ static void execlists_submit_request(struct i915_request *request)
 
 	queue_request(engine, &request->priotree, rq_prio(request));
 	submit_queue(engine, rq_prio(request));
+
 	engine->request_stats.runnable++;
+	atomic_dec(&engine->request_stats.queued);
 
 	GEM_BUG_ON(!engine->execlists.first);
 	GEM_BUG_ON(list_empty(&request->priotree.link));
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 54d2ad1c8daa..616066f536c9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -339,6 +339,14 @@ struct intel_engine_cs {
 	struct drm_i915_gem_object *default_state;
 
 	struct {
+		/**
+		 * @queued: Number of submitted requests with dependencies.
+		 *
+		 * Count of requests waiting for dependencies before they can be
+		 * submitted to the backend.
+		 */
+		atomic_t queued;
+
 		/**
 		 * @runnable: Number of runnable requests sent to the backend.
 		 *
-- 
2.14.1


* [PATCH v3 7/7] drm/i915: Engine queues query
  2018-04-05 12:39 ` [PATCH 7/7] drm/i915: Engine queues query Tvrtko Ursulin
  2018-04-05 13:05   ` Lionel Landwerlin
@ 2018-04-09 16:38   ` Tvrtko Ursulin
  1 sibling, 0 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2018-04-09 16:38 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

As well as exposing active requests on engines via PMU, we can also export
the current raw values (as tracked by i915 command submission) via a
dedicated query.

This is to satisfy customers who have userspace load balancing solutions
implemented on top of their custom kernel patches.

Userspace is now able to include DRM_I915_QUERY_ENGINE_QUEUES in its
query list, pointing to an initialized struct drm_i915_query_engine_queues
entry. The fields describing the engine class and instance userspace would
like to know about need to be filled in, and i915 will fill in the rest.

Multiple engines can be queried in one go by having multiple queries in
the query list.
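
For reference, issuing the query from userspace would look roughly like
the following untested sketch (it assumes an already open DRM fd, this
updated i915_drm.h and drmIoctl() from libdrm's xf86drm.h):

	struct drm_i915_query_engine_queues queues = {
		.class = I915_ENGINE_CLASS_RENDER,
		.instance = 0,
		/* rsvd[] must remain zeroed; designated init does that. */
	};
	struct drm_i915_query_item item = {
		.query_id = DRM_I915_QUERY_ENGINE_QUEUES,
		.length = sizeof(queues),
		.data_ptr = (uintptr_t)&queues,
	};
	struct drm_i915_query query = {
		.num_items = 1,
		.items_ptr = (uintptr_t)&item,
	};

	/* On success the kernel fills in the three counters. */
	if (drmIoctl(fd, DRM_IOCTL_I915_QUERY, &query) == 0 && item.length > 0)
		printf("queued %u, runnable %u, running %u\n",
		       queues.queued, queues.runnable, queues.running);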

v2:
 * Use EINVAL for reporting insufficient buffer space. (Chris Wilson)

v3:
 * One more reserved dword because I like even numbers.
 Lionel Landwerlin:
 * Document input fields.
 * Document reserved bits must be zero.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_query.c | 43 +++++++++++++++++++++++++++++++++++++++
 include/uapi/drm/i915_drm.h       | 29 ++++++++++++++++++++++++++
 2 files changed, 72 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
index 3ace929dd90f..798672f5c104 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -82,9 +82,52 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
 	return total_length;
 }
 
+static int
+query_engine_queues(struct drm_i915_private *i915,
+		    struct drm_i915_query_item *query_item)
+{
+	struct drm_i915_query_engine_queues __user *query_ptr =
+				u64_to_user_ptr(query_item->data_ptr);
+	struct drm_i915_query_engine_queues query;
+	struct intel_engine_cs *engine;
+	const int len = sizeof(query);
+	unsigned int i;
+
+	if (query_item->flags)
+		return -EINVAL;
+
+	if (!query_item->length)
+		return len;
+	else if (query_item->length < len)
+		return -EINVAL;
+
+	if (copy_from_user(&query, query_ptr, len))
+		return -EFAULT;
+
+	for (i = 0; i < ARRAY_SIZE(query.rsvd); i++) {
+		if (query.rsvd[i])
+			return -EINVAL;
+	}
+
+	engine = intel_engine_lookup_user(i915, query.class, query.instance);
+	if (!engine)
+		return -ENOENT;
+
+	query.queued = atomic_read(&engine->request_stats.queued);
+	query.runnable = engine->request_stats.runnable;
+	query.running = intel_engine_last_submit(engine) -
+			intel_engine_get_seqno(engine);
+
+	if (copy_to_user(query_ptr, &query, len))
+		return -EFAULT;
+
+	return len;
+}
+
 static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
 					struct drm_i915_query_item *query_item) = {
 	query_topology_info,
+	query_engine_queues,
 };
 
 int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 9a00c30e4071..c82035b71824 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1637,6 +1637,7 @@ struct drm_i915_perf_oa_config {
 struct drm_i915_query_item {
 	__u64 query_id;
 #define DRM_I915_QUERY_TOPOLOGY_INFO    1
+#define DRM_I915_QUERY_ENGINE_QUEUES	2
 
 	/*
 	 * When set to zero by userspace, this is filled with the size of the
@@ -1734,6 +1735,34 @@ struct drm_i915_query_topology_info {
 	__u8 data[];
 };
 
+/**
+ * struct drm_i915_query_engine_queues
+ *
+ * Engine queues query enables userspace to query current counts of active
+ * requests in their different states.
+ */
+struct drm_i915_query_engine_queues {
+	/**
+	 * Engine class as in enum drm_i915_gem_engine_class (set by userspace).
+	 */
+	__u16 class;
+
+	/** Engine instance number (set by userspace). */
+	__u16 instance;
+
+	/** Number of requests with unresolved fences and dependencies. */
+	__u32 queued;
+
+	/** Number of ready requests waiting on a slot on GPU. */
+	__u32 runnable;
+
+	/** Number of requests executing on the GPU. */
+	__u32 running;
+
+	/** Reserved bits must be set to zero by userspace. */
+	__u32 rsvd[6];
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.14.1


* ✗ Fi.CI.BAT: failure for Queued/runnable/running engine stats (rev7)
  2018-04-05 12:39 [PATCH v5 0/7] Queued/runnable/running engine stats Tvrtko Ursulin
                   ` (8 preceding siblings ...)
  2018-04-05 16:08 ` ✗ Fi.CI.IGT: failure " Patchwork
@ 2018-04-09 17:12 ` Patchwork
  9 siblings, 0 replies; 31+ messages in thread
From: Patchwork @ 2018-04-09 17:12 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Queued/runnable/running engine stats (rev7)
URL   : https://patchwork.freedesktop.org/series/36926/
State : failure

== Summary ==

Series 36926v7 Queued/runnable/running engine stats
https://patchwork.freedesktop.org/api/1.0/series/36926/revisions/7/mbox/

---- Possible new issues:

Test gem_exec_gttfill:
        Subgroup basic:
                skip       -> PASS       (fi-pnv-d510)
Test gvt_basic:
        Subgroup invalid-placeholder-test:
                skip       -> INCOMPLETE (fi-elk-e7500)

fi-bdw-5557u     total:285  pass:264  dwarn:0   dfail:0   fail:0   skip:21  time:429s
fi-bdw-gvtdvm    total:285  pass:261  dwarn:0   dfail:0   fail:0   skip:24  time:445s
fi-blb-e6850     total:285  pass:220  dwarn:1   dfail:0   fail:0   skip:64  time:382s
fi-bsw-n3050     total:285  pass:239  dwarn:0   dfail:0   fail:0   skip:46  time:541s
fi-bwr-2160      total:285  pass:180  dwarn:0   dfail:0   fail:0   skip:105 time:297s
fi-bxt-dsi       total:285  pass:255  dwarn:0   dfail:0   fail:0   skip:30  time:517s
fi-bxt-j4205     total:285  pass:256  dwarn:0   dfail:0   fail:0   skip:29  time:516s
fi-byt-j1900     total:285  pass:250  dwarn:0   dfail:0   fail:0   skip:35  time:521s
fi-byt-n2820     total:285  pass:246  dwarn:0   dfail:0   fail:0   skip:39  time:512s
fi-cfl-8700k     total:285  pass:257  dwarn:0   dfail:0   fail:0   skip:28  time:409s
fi-cfl-s3        total:285  pass:259  dwarn:0   dfail:0   fail:0   skip:26  time:564s
fi-cfl-u         total:285  pass:259  dwarn:0   dfail:0   fail:0   skip:26  time:515s
fi-cnl-y3        total:285  pass:259  dwarn:0   dfail:0   fail:0   skip:26  time:580s
fi-elk-e7500     total:285  pass:226  dwarn:0   dfail:0   fail:0   skip:58 
fi-gdg-551       total:285  pass:176  dwarn:0   dfail:0   fail:1   skip:108 time:318s
fi-glk-1         total:285  pass:257  dwarn:0   dfail:0   fail:0   skip:28  time:541s
fi-glk-j4005     total:285  pass:256  dwarn:0   dfail:0   fail:0   skip:29  time:489s
fi-hsw-4770      total:285  pass:258  dwarn:0   dfail:0   fail:0   skip:27  time:408s
fi-ilk-650       total:285  pass:225  dwarn:0   dfail:0   fail:0   skip:60  time:422s
fi-ivb-3520m     total:285  pass:256  dwarn:0   dfail:0   fail:0   skip:29  time:470s
fi-ivb-3770      total:285  pass:252  dwarn:0   dfail:0   fail:0   skip:33  time:433s
fi-kbl-7500u     total:285  pass:260  dwarn:1   dfail:0   fail:0   skip:24  time:471s
fi-kbl-7567u     total:285  pass:265  dwarn:0   dfail:0   fail:0   skip:20  time:474s
fi-kbl-r         total:285  pass:258  dwarn:0   dfail:0   fail:0   skip:27  time:513s
fi-pnv-d510      total:285  pass:220  dwarn:1   dfail:0   fail:0   skip:64  time:668s
fi-skl-6260u     total:285  pass:265  dwarn:0   dfail:0   fail:0   skip:20  time:444s
fi-skl-6600u     total:285  pass:258  dwarn:0   dfail:0   fail:0   skip:27  time:536s
fi-skl-6700k2    total:285  pass:261  dwarn:0   dfail:0   fail:0   skip:24  time:500s
fi-skl-6770hq    total:285  pass:265  dwarn:0   dfail:0   fail:0   skip:20  time:503s
fi-skl-guc       total:285  pass:257  dwarn:0   dfail:0   fail:0   skip:28  time:431s
fi-skl-gvtdvm    total:285  pass:262  dwarn:0   dfail:0   fail:0   skip:23  time:445s
fi-snb-2520m     total:285  pass:245  dwarn:0   dfail:0   fail:0   skip:40  time:580s
fi-snb-2600      total:285  pass:245  dwarn:0   dfail:0   fail:0   skip:40  time:406s

1cda370ffded69ce8c5ffa4fba3564a952730b97 drm-tip: 2018y-04m-09d-14h-03m-58s UTC integration manifest
966c10a475b1 drm/i915: Engine queues query
f2cd91627fb3 drm/i915/pmu: Add running counter
f85771222d50 drm/i915/pmu: Add runnable counter
d640c428d4e0 drm/i915/pmu: Add queued counter
765f3adef3ef drm/i915: Keep a count of requests submitted from userspace
7e5d1b1b004e drm/i915: Keep a count of requests waiting for a slot on GPU
0f51d0f81fea drm/i915/pmu: Fix enable count array size and bounds checking

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8646/issues.html

end of thread

Thread overview: 31+ messages
2018-04-05 12:39 [PATCH v5 0/7] Queued/runnable/running engine stats Tvrtko Ursulin
2018-04-05 12:39 ` [PATCH 1/7] drm/i915/pmu: Fix enable count array size and bounds checking Tvrtko Ursulin
2018-04-05 12:39 ` [PATCH 2/7] drm/i915: Keep a count of requests waiting for a slot on GPU Tvrtko Ursulin
2018-04-06 20:16   ` Chris Wilson
2018-04-09 16:37   ` [PATCH v10 " Tvrtko Ursulin
2018-04-05 12:39 ` [PATCH 3/7] drm/i915: Keep a count of requests submitted from userspace Tvrtko Ursulin
2018-04-06 20:17   ` Chris Wilson
2018-04-09  9:11     ` Tvrtko Ursulin
2018-04-09  9:25       ` Chris Wilson
2018-04-09 10:17         ` Tvrtko Ursulin
2018-04-09 10:27           ` Chris Wilson
2018-04-09 10:29             ` Chris Wilson
2018-04-09 10:40             ` Tvrtko Ursulin
2018-04-09 10:51               ` Chris Wilson
2018-04-09 11:43                 ` Tvrtko Ursulin
2018-04-09 11:54                   ` Chris Wilson
2018-04-09 16:38   ` [PATCH v4 " Tvrtko Ursulin
2018-04-05 12:39 ` [PATCH 4/7] drm/i915/pmu: Add queued counter Tvrtko Ursulin
2018-04-06 20:19   ` Chris Wilson
2018-04-05 12:39 ` [PATCH 5/7] drm/i915/pmu: Add runnable counter Tvrtko Ursulin
2018-04-06 20:22   ` Chris Wilson
2018-04-05 12:39 ` [PATCH 6/7] drm/i915/pmu: Add running counter Tvrtko Ursulin
2018-04-06 20:24   ` Chris Wilson
2018-04-09  9:13     ` Tvrtko Ursulin
2018-04-05 12:39 ` [PATCH 7/7] drm/i915: Engine queues query Tvrtko Ursulin
2018-04-05 13:05   ` Lionel Landwerlin
2018-04-06 20:25     ` Chris Wilson
2018-04-09 16:38   ` [PATCH v3 " Tvrtko Ursulin
2018-04-05 13:49 ` ✓ Fi.CI.BAT: success for Queued/runnable/running engine stats (rev4) Patchwork
2018-04-05 16:08 ` ✗ Fi.CI.IGT: failure " Patchwork
2018-04-09 17:12 ` ✗ Fi.CI.BAT: failure for Queued/runnable/running engine stats (rev7) Patchwork
