* [PATCH v6 0/7] Queued/runnable/running engine stats
@ 2018-06-06 12:48 Tvrtko Ursulin
  2018-06-06 12:48 ` [PATCH 1/7] drm/i915/pmu: Fix enable count array size and bounds checking Tvrtko Ursulin
                   ` (12 more replies)
  0 siblings, 13 replies; 25+ messages in thread
From: Tvrtko Ursulin @ 2018-06-06 12:48 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Per-engine queue depths are an interesting metric for analyzing system load,
and also for users who wish to load balance their submissions based on them.

In this version I have split the metrics into three separate counters:

1. QUEUED - From execbuf time until the request becomes runnable, that is
            until its dependencies have been resolved and fences signaled.
2. RUNNABLE - From runnable to running on the GPU.
3. RUNNING - Running on the GPU.

When inspected with perf stat the output looks roughly like this:

#           time             counts unit events
   201.160490145               0.01      i915/rcs0-queued/
   201.160490145              19.13      i915/rcs0-runnable/
   201.160490145               2.39      i915/rcs0-running/

The reported numbers are average queue depths for the last query period.
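
For example, the counters can be sampled from the command line with
something along the lines of (event names as reported above):

  $ perf stat -a -I 1000 -e i915/rcs0-queued/,i915/rcs0-runnable/,i915/rcs0-running/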

v2:
 * Review feedback (see patch changelogs).
 * Renamed the counters and re-ordered some patches.

v3:
 * Review feedback and rebase.

v4:
 * Addition of the last patch in the series, which supports a customer
   requirement to expose instantaneous queue values via the i915 query API.

v5:
 * Fixed accounting under wedging.
 * Error code ABI tweak.

v6:
 * Rebase and update for recently discovered possible hrtimer inaccuracy.

Tvrtko Ursulin (7):
  drm/i915/pmu: Fix enable count array size and bounds checking
  drm/i915: Keep a count of requests waiting for a slot on GPU
  drm/i915: Keep a count of requests submitted from userspace
  drm/i915/pmu: Add queued counter
  drm/i915/pmu: Add runnable counter
  drm/i915/pmu: Add running counter
  drm/i915: Engine queues query

 drivers/gpu/drm/i915/i915_gem.c         |  1 +
 drivers/gpu/drm/i915/i915_pmu.c         | 96 +++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_query.c       | 43 +++++++++++
 drivers/gpu/drm/i915/i915_request.c     | 10 +++
 drivers/gpu/drm/i915/intel_engine_cs.c  |  6 +-
 drivers/gpu/drm/i915/intel_lrc.c        |  3 +
 drivers/gpu/drm/i915/intel_ringbuffer.h | 21 +++++-
 include/uapi/drm/i915_drm.h             | 48 ++++++++++++-
 8 files changed, 209 insertions(+), 19 deletions(-)

-- 
2.17.1


* [PATCH 1/7] drm/i915/pmu: Fix enable count array size and bounds checking
  2018-06-06 12:48 [PATCH v6 0/7] Queued/runnable/running engine stats Tvrtko Ursulin
@ 2018-06-06 12:48 ` Tvrtko Ursulin
  2018-06-06 12:48 ` [PATCH 2/7] drm/i915: Keep a count of requests waiting for a slot on GPU Tvrtko Ursulin
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Tvrtko Ursulin @ 2018-06-06 12:48 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

The enable count array is supposed to have one counter for each possible
engine sampler. As such, array sizing and bounds checking will not be
correct when more engine samplers are added.

At the same time tidy the assert for readability and robustness.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: b46a33e271ed ("drm/i915/pmu: Expose a PMU interface for perf queries")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_pmu.c         | 13 +++++++++----
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index c39541ed2219..b8c6953867ee 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -593,7 +593,8 @@ static void i915_pmu_enable(struct perf_event *event)
 	 * Update the bitmask of enabled events and increment
 	 * the event reference counter.
 	 */
-	GEM_BUG_ON(bit >= I915_PMU_MASK_BITS);
+	BUILD_BUG_ON(ARRAY_SIZE(i915->pmu.enable_count) != I915_PMU_MASK_BITS);
+	GEM_BUG_ON(bit >= ARRAY_SIZE(i915->pmu.enable_count));
 	GEM_BUG_ON(i915->pmu.enable_count[bit] == ~0);
 	i915->pmu.enable |= BIT_ULL(bit);
 	i915->pmu.enable_count[bit]++;
@@ -617,7 +618,10 @@ static void i915_pmu_enable(struct perf_event *event)
 		GEM_BUG_ON(!engine);
 		engine->pmu.enable |= BIT(sample);
 
-		GEM_BUG_ON(sample >= I915_PMU_SAMPLE_BITS);
+		BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.enable_count) !=
+			     (1 << I915_PMU_SAMPLE_BITS));
+		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.enable_count));
+		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
 		GEM_BUG_ON(engine->pmu.enable_count[sample] == ~0);
 		engine->pmu.enable_count[sample]++;
 	}
@@ -649,7 +653,8 @@ static void i915_pmu_disable(struct perf_event *event)
 						  engine_event_class(event),
 						  engine_event_instance(event));
 		GEM_BUG_ON(!engine);
-		GEM_BUG_ON(sample >= I915_PMU_SAMPLE_BITS);
+		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.enable_count));
+		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
 		GEM_BUG_ON(engine->pmu.enable_count[sample] == 0);
 		/*
 		 * Decrement the reference count and clear the enabled
@@ -659,7 +664,7 @@ static void i915_pmu_disable(struct perf_event *event)
 			engine->pmu.enable &= ~BIT(sample);
 	}
 
-	GEM_BUG_ON(bit >= I915_PMU_MASK_BITS);
+	GEM_BUG_ON(bit >= ARRAY_SIZE(i915->pmu.enable_count));
 	GEM_BUG_ON(i915->pmu.enable_count[bit] == 0);
 	/*
 	 * Decrement the reference count and clear the enabled
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index acef385c4c80..2f3232599d80 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -397,7 +397,7 @@ struct intel_engine_cs {
 		 *
 		 * Index number corresponds to the bit number from @enable.
 		 */
-		unsigned int enable_count[I915_PMU_SAMPLE_BITS];
+		unsigned int enable_count[1 << I915_PMU_SAMPLE_BITS];
 		/**
 		 * @sample: Counter values for sampling events.
 		 *
-- 
2.17.1


* [PATCH 2/7] drm/i915: Keep a count of requests waiting for a slot on GPU
  2018-06-06 12:48 [PATCH v6 0/7] Queued/runnable/running engine stats Tvrtko Ursulin
  2018-06-06 12:48 ` [PATCH 1/7] drm/i915/pmu: Fix enable count array size and bounds checking Tvrtko Ursulin
@ 2018-06-06 12:48 ` Tvrtko Ursulin
  2018-06-06 12:48 ` [PATCH 3/7] drm/i915: Keep a count of requests submitted from userspace Tvrtko Ursulin
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Tvrtko Ursulin @ 2018-06-06 12:48 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Keep a per-engine number of runnable (waiting for GPU time) requests.

We choose to manage the runnable counter at the backend level instead of in
the request submit_notify callback. The latter would be more consolidated
and less code, but it would require either making the counter an atomic_t or
taking the engine->timeline->lock in submit_notify. So the choice is to do
it at the backend level, for the benefit of fewer atomic instructions.

v2:
 * Move queued increment from insert_request to execlist_submit_request to
   avoid bumping when re-ordering for priority.
 * Support the counter on the ringbuffer submission path as well, albeit
   just notionally. (Chris Wilson)

v3:
 * Rebase.

v4:
 * Rename and move the stats into a container structure. (Chris Wilson)

v5:
 * Re-order fields in struct intel_engine_cs. (Chris Wilson)

v6-v8:
 * Rebases.

v9:
 * Fix accounting during wedging.

v10:
 * Improved commit message. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c         | 1 +
 drivers/gpu/drm/i915/i915_request.c     | 7 +++++++
 drivers/gpu/drm/i915/intel_engine_cs.c  | 5 +++--
 drivers/gpu/drm/i915/intel_lrc.c        | 1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h | 9 +++++++++
 5 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 86f1f9aaa119..451f4399ed63 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3256,6 +3256,7 @@ static void nop_complete_submit_request(struct i915_request *request)
 	dma_fence_set_error(&request->fence, -EIO);
 
 	spin_lock_irqsave(&request->engine->timeline.lock, flags);
+	request->engine->request_stats.runnable++;
 	__i915_request_submit(request);
 	intel_engine_init_global_seqno(request->engine, request->global_seqno);
 	spin_unlock_irqrestore(&request->engine->timeline.lock, flags);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index f187250e60c6..b8ddcd23a6f3 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -541,6 +541,9 @@ void __i915_request_submit(struct i915_request *request)
 	/* Transfer from per-context onto the global per-engine timeline */
 	move_to_timeline(request, &engine->timeline);
 
+	GEM_BUG_ON(engine->request_stats.runnable == 0);
+	engine->request_stats.runnable--;
+
 	trace_i915_request_execute(request);
 
 	wake_up_all(&request->execute);
@@ -554,6 +557,8 @@ void i915_request_submit(struct i915_request *request)
 	/* Will be called from irq-context when using foreign fences. */
 	spin_lock_irqsave(&engine->timeline.lock, flags);
 
+	engine->request_stats.runnable++;
+
 	__i915_request_submit(request);
 
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
@@ -592,6 +597,8 @@ void __i915_request_unsubmit(struct i915_request *request)
 	/* Transfer back from the global per-engine timeline to per-context */
 	move_to_timeline(request, request->timeline);
 
+	engine->request_stats.runnable++;
+
 	/*
 	 * We don't need to wake_up any waiters on request->execute, they
 	 * will get woken by any other event or us re-adding this request
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 2ec2e60dc670..1bb3be96ca08 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1420,11 +1420,12 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	if (i915_terminally_wedged(&engine->i915->gpu_error))
 		drm_printf(m, "*** WEDGED ***\n");
 
-	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms]\n",
+	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], runnable %u\n",
 		   intel_engine_get_seqno(engine),
 		   intel_engine_last_submit(engine),
 		   engine->hangcheck.seqno,
-		   jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp));
+		   jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp),
+		   engine->request_stats.runnable);
 	drm_printf(m, "\tReset count: %d (global %d)\n",
 		   i915_reset_engine_count(error, engine),
 		   i915_reset_count(error));
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 091e28f0e024..ed90f7a46e9a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1201,6 +1201,7 @@ static void execlists_submit_request(struct i915_request *request)
 
 	queue_request(engine, &request->sched, rq_prio(request));
 	submit_queue(engine, rq_prio(request));
+	engine->request_stats.runnable++;
 
 	GEM_BUG_ON(!engine->execlists.first);
 	GEM_BUG_ON(list_empty(&request->sched.link));
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 2f3232599d80..3e0cfac49755 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -344,6 +344,15 @@ struct intel_engine_cs {
 	struct drm_i915_gem_object *default_state;
 	void *pinned_default_state;
 
+	struct {
+		/**
+		 * @runnable: Number of runnable requests sent to the backend.
+		 *
+		 * Count of requests waiting for the GPU to execute them.
+		 */
+		unsigned int runnable;
+	} request_stats;
+
 	atomic_t irq_count;
 	unsigned long irq_posted;
 #define ENGINE_IRQ_BREADCRUMB 0
-- 
2.17.1


* [PATCH 3/7] drm/i915: Keep a count of requests submitted from userspace
  2018-06-06 12:48 [PATCH v6 0/7] Queued/runnable/running engine stats Tvrtko Ursulin
  2018-06-06 12:48 ` [PATCH 1/7] drm/i915/pmu: Fix enable count array size and bounds checking Tvrtko Ursulin
  2018-06-06 12:48 ` [PATCH 2/7] drm/i915: Keep a count of requests waiting for a slot on GPU Tvrtko Ursulin
@ 2018-06-06 12:48 ` Tvrtko Ursulin
  2018-06-06 12:48 ` [PATCH 4/7] drm/i915/pmu: Add queued counter Tvrtko Ursulin
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Tvrtko Ursulin @ 2018-06-06 12:48 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Keep a count of requests submitted from userspace which are not yet runnable
due to unresolved dependencies.

v2: Rename and move under the container struct. (Chris Wilson)
v3: Rebase.
v4: Move decrement site to the backend to shrink the window of double-
    accounting as much as possible. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c     | 3 +++
 drivers/gpu/drm/i915/intel_engine_cs.c  | 3 ++-
 drivers/gpu/drm/i915/intel_lrc.c        | 2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.h | 8 ++++++++
 4 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index b8ddcd23a6f3..53ba70ea7af3 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -558,6 +558,7 @@ void i915_request_submit(struct i915_request *request)
 	spin_lock_irqsave(&engine->timeline.lock, flags);
 
 	engine->request_stats.runnable++;
+	atomic_dec(&engine->request_stats.queued);
 
 	__i915_request_submit(request);
 
@@ -1109,6 +1110,8 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
 	}
 	request->emitted_jiffies = jiffies;
 
+	atomic_inc(&engine->request_stats.queued);
+
 	/*
 	 * Let the backend know a new request has arrived that may need
 	 * to adjust the existing execution schedule due to a high priority
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 1bb3be96ca08..d2a51aefed38 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1420,11 +1420,12 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	if (i915_terminally_wedged(&engine->i915->gpu_error))
 		drm_printf(m, "*** WEDGED ***\n");
 
-	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], runnable %u\n",
+	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], queued %u, runnable %u\n",
 		   intel_engine_get_seqno(engine),
 		   intel_engine_last_submit(engine),
 		   engine->hangcheck.seqno,
 		   jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp),
+		   atomic_read(&engine->request_stats.queued),
 		   engine->request_stats.runnable);
 	drm_printf(m, "\tReset count: %d (global %d)\n",
 		   i915_reset_engine_count(error, engine),
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index ed90f7a46e9a..cac9888376b0 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1201,7 +1201,9 @@ static void execlists_submit_request(struct i915_request *request)
 
 	queue_request(engine, &request->sched, rq_prio(request));
 	submit_queue(engine, rq_prio(request));
+
 	engine->request_stats.runnable++;
+	atomic_dec(&engine->request_stats.queued);
 
 	GEM_BUG_ON(!engine->execlists.first);
 	GEM_BUG_ON(list_empty(&request->sched.link));
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 3e0cfac49755..eeb7f3662195 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -345,6 +345,14 @@ struct intel_engine_cs {
 	void *pinned_default_state;
 
 	struct {
+		/**
+		 * @queued: Number of submitted requests with dependencies.
+		 *
+		 * Count of requests waiting for dependencies before they can be
+		 * submitted to the backend.
+		 */
+		atomic_t queued;
+
 		/**
 		 * @runnable: Number of runnable requests sent to the backend.
 		 *
-- 
2.17.1


* [PATCH 4/7] drm/i915/pmu: Add queued counter
  2018-06-06 12:48 [PATCH v6 0/7] Queued/runnable/running engine stats Tvrtko Ursulin
                   ` (2 preceding siblings ...)
  2018-06-06 12:48 ` [PATCH 3/7] drm/i915: Keep a count of requests submitted from userspace Tvrtko Ursulin
@ 2018-06-06 12:48 ` Tvrtko Ursulin
  2018-06-06 13:16   ` Chris Wilson
  2018-06-06 14:39   ` Tvrtko Ursulin
  2018-06-06 12:48 ` [PATCH 5/7] drm/i915/pmu: Add runnable counter Tvrtko Ursulin
                   ` (8 subsequent siblings)
  12 siblings, 2 replies; 25+ messages in thread
From: Tvrtko Ursulin @ 2018-06-06 12:48 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We add a PMU counter to expose the number of requests which have been
submitted from userspace but are not yet runnable due to dependencies and
unsignaled fences.

This is useful to analyze the overall load of the system.
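
For reference, my reading of the arithmetic below, with Q a steady queue
depth and the sampling timer periods adding up to one second:

  raw      = Q * 1e9 ns * 1024 / 1e6   = Q * 1024000
  read     = raw / MSEC_PER_SEC        = Q * 1024
  reported = read * 0.0009765625       = Q  (average queue depth)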

v2:
 * Rebase for name change and re-order.
 * Drop floating point constant. (Chris Wilson)

v3:
 * Change scale to 1024 for faster arithmetic. (Chris Wilson)

v4:
 * Refactored for timer period accounting.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c         | 53 ++++++++++++++++++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 include/uapi/drm/i915_drm.h             |  9 ++++-
 3 files changed, 52 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index b8c6953867ee..5f8cc3fe1826 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -15,7 +15,8 @@
 #define ENGINE_SAMPLE_MASK \
 	(BIT(I915_SAMPLE_BUSY) | \
 	 BIT(I915_SAMPLE_WAIT) | \
-	 BIT(I915_SAMPLE_SEMA))
+	 BIT(I915_SAMPLE_SEMA) | \
+	 BIT(I915_SAMPLE_QUEUED))
 
 #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
 
@@ -161,6 +162,12 @@ add_sample(struct i915_pmu_sample *sample, u32 val)
 	sample->cur += val;
 }
 
+static void
+add_sample_mult(struct i915_pmu_sample *sample, u32 val, u32 mul)
+{
+	sample->cur += mul_u32_u32(val, mul);
+}
+
 static void
 engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 {
@@ -204,6 +211,12 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 		if (val & RING_WAIT_SEMAPHORE)
 			add_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
 				   period_ns);
+
+		if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED))
+			add_sample_mult(&engine->pmu.sample[I915_SAMPLE_QUEUED],
+					atomic_read(&engine->request_stats.queued),
+					(u64)period_ns *
+					I915_SAMPLE_QUEUED_DIVISOR / 1000000);
 	}
 
 	if (fw)
@@ -212,12 +225,6 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 	intel_runtime_pm_put(dev_priv);
 }
 
-static void
-add_sample_mult(struct i915_pmu_sample *sample, u32 val, u32 mul)
-{
-	sample->cur += mul_u32_u32(val, mul);
-}
-
 static void
 frequency_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 {
@@ -323,6 +330,7 @@ engine_event_status(struct intel_engine_cs *engine,
 	switch (sample) {
 	case I915_SAMPLE_BUSY:
 	case I915_SAMPLE_WAIT:
+	case I915_SAMPLE_QUEUED:
 		break;
 	case I915_SAMPLE_SEMA:
 		if (INTEL_GEN(engine->i915) < 6)
@@ -540,6 +548,9 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 			val = ktime_to_ns(intel_engine_get_busy_time(engine));
 		} else {
 			val = engine->pmu.sample[sample].cur;
+
+			if (sample == I915_SAMPLE_QUEUED)
+				val = div_u64(val, MSEC_PER_SEC);  /* to qd */
 		}
 	} else {
 		switch (event->attr.config) {
@@ -796,6 +807,16 @@ static const struct attribute_group *i915_pmu_attr_groups[] = {
 { \
 	.sample = (__sample), \
 	.name = (__name), \
+	.suffix = "unit", \
+	.value = "ns", \
+}
+
+#define __engine_event_scale(__sample, __name, __scale) \
+{ \
+	.sample = (__sample), \
+	.name = (__name), \
+	.suffix = "scale", \
+	.value = (__scale), \
 }
 
 static struct i915_ext_attribute *
@@ -823,6 +844,9 @@ add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name,
 	return ++attr;
 }
 
+/* No brackets or quotes below please. */
+#define I915_SAMPLE_QUEUED_SCALE 0.0009765625
+
 static struct attribute **
 create_event_attributes(struct drm_i915_private *i915)
 {
@@ -839,10 +863,14 @@ create_event_attributes(struct drm_i915_private *i915)
 	static const struct {
 		enum drm_i915_pmu_engine_sample sample;
 		char *name;
+		char *suffix;
+		char *value;
 	} engine_events[] = {
 		__engine_event(I915_SAMPLE_BUSY, "busy"),
 		__engine_event(I915_SAMPLE_SEMA, "sema"),
 		__engine_event(I915_SAMPLE_WAIT, "wait"),
+		__engine_event_scale(I915_SAMPLE_QUEUED, "queued",
+				     __stringify(I915_SAMPLE_QUEUED_SCALE)),
 	};
 	unsigned int count = 0;
 	struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
@@ -852,6 +880,9 @@ create_event_attributes(struct drm_i915_private *i915)
 	enum intel_engine_id id;
 	unsigned int i;
 
+	BUILD_BUG_ON(I915_SAMPLE_QUEUED_DIVISOR !=
+		     (1 / I915_SAMPLE_QUEUED_SCALE));
+
 	/* Count how many counters we will be exposing. */
 	for (i = 0; i < ARRAY_SIZE(events); i++) {
 		if (!config_status(i915, events[i].config))
@@ -929,13 +960,15 @@ create_event_attributes(struct drm_i915_private *i915)
 								engine->instance,
 								engine_events[i].sample));
 
-			str = kasprintf(GFP_KERNEL, "%s-%s.unit",
-					engine->name, engine_events[i].name);
+			str = kasprintf(GFP_KERNEL, "%s-%s.%s",
+					engine->name, engine_events[i].name,
+					engine_events[i].suffix);
 			if (!str)
 				goto err;
 
 			*attr_iter++ = &pmu_iter->attr.attr;
-			pmu_iter = add_pmu_attr(pmu_iter, str, "ns");
+			pmu_iter = add_pmu_attr(pmu_iter, str,
+						engine_events[i].value);
 		}
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index eeb7f3662195..902b63eeaf50 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -420,7 +420,7 @@ struct intel_engine_cs {
 		 *
 		 * Our internal timer stores the current counters in this field.
 		 */
-#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_SEMA + 1)
+#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_QUEUED + 1)
 		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX];
 	} pmu;
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 7f5634ce8e88..6094cc9ca6d9 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -110,9 +110,13 @@ enum drm_i915_gem_engine_class {
 enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_BUSY = 0,
 	I915_SAMPLE_WAIT = 1,
-	I915_SAMPLE_SEMA = 2
+	I915_SAMPLE_SEMA = 2,
+	I915_SAMPLE_QUEUED = 3
 };
 
+ /* Divide counter value by divisor to get the real value. */
+#define I915_SAMPLE_QUEUED_DIVISOR (1024)
+
 #define I915_PMU_SAMPLE_BITS (4)
 #define I915_PMU_SAMPLE_MASK (0xf)
 #define I915_PMU_SAMPLE_INSTANCE_BITS (8)
@@ -133,6 +137,9 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_SEMA(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
 
+#define I915_PMU_ENGINE_QUEUED(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED)
+
 #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
-- 
2.17.1


* [PATCH 5/7] drm/i915/pmu: Add runnable counter
  2018-06-06 12:48 [PATCH v6 0/7] Queued/runnable/running engine stats Tvrtko Ursulin
                   ` (3 preceding siblings ...)
  2018-06-06 12:48 ` [PATCH 4/7] drm/i915/pmu: Add queued counter Tvrtko Ursulin
@ 2018-06-06 12:48 ` Tvrtko Ursulin
  2018-06-06 14:39   ` Tvrtko Ursulin
  2018-06-06 12:48 ` [PATCH 6/7] drm/i915/pmu: Add running counter Tvrtko Ursulin
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 25+ messages in thread
From: Tvrtko Ursulin @ 2018-06-06 12:48 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We add a PMU counter to expose the number of requests with resolved
dependencies waiting for a slot on the GPU to run.

This is useful to analyze the overall load of the system.

v2: Don't limit to gen8+.

v3:
 * Rebase for dynamic sysfs.
 * Drop currently executing requests.

v4:
 * Sync with internal renaming.
 * Drop floating point constant. (Chris Wilson)

v5:
 * Change scale to 1024 for faster arithmetic. (Chris Wilson)

v6:
 * Refactored for timer period accounting.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c         | 19 +++++++++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 include/uapi/drm/i915_drm.h             |  7 ++++++-
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 5f8cc3fe1826..41527b682c72 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -16,7 +16,8 @@
 	(BIT(I915_SAMPLE_BUSY) | \
 	 BIT(I915_SAMPLE_WAIT) | \
 	 BIT(I915_SAMPLE_SEMA) | \
-	 BIT(I915_SAMPLE_QUEUED))
+	 BIT(I915_SAMPLE_QUEUED) | \
+	 BIT(I915_SAMPLE_RUNNABLE))
 
 #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
 
@@ -217,6 +218,12 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 					atomic_read(&engine->request_stats.queued),
 					(u64)period_ns *
 					I915_SAMPLE_QUEUED_DIVISOR / 1000000);
+
+		if (engine->pmu.enable & BIT(I915_SAMPLE_RUNNABLE))
+			add_sample_mult(&engine->pmu.sample[I915_SAMPLE_RUNNABLE],
+					engine->request_stats.runnable,
+					(u64)period_ns *
+					I915_SAMPLE_QUEUED_DIVISOR / 1000000);
 	}
 
 	if (fw)
@@ -331,6 +338,7 @@ engine_event_status(struct intel_engine_cs *engine,
 	case I915_SAMPLE_BUSY:
 	case I915_SAMPLE_WAIT:
 	case I915_SAMPLE_QUEUED:
+	case I915_SAMPLE_RUNNABLE:
 		break;
 	case I915_SAMPLE_SEMA:
 		if (INTEL_GEN(engine->i915) < 6)
@@ -549,7 +557,8 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 		} else {
 			val = engine->pmu.sample[sample].cur;
 
-			if (sample == I915_SAMPLE_QUEUED)
+			if (sample == I915_SAMPLE_QUEUED ||
+			    sample == I915_SAMPLE_RUNNABLE)
 				val = div_u64(val, MSEC_PER_SEC);  /* to qd */
 		}
 	} else {
@@ -846,6 +855,7 @@ add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name,
 
 /* No brackets or quotes below please. */
 #define I915_SAMPLE_QUEUED_SCALE 0.0009765625
+#define I915_SAMPLE_RUNNABLE_SCALE 0.0009765625
 
 static struct attribute **
 create_event_attributes(struct drm_i915_private *i915)
@@ -871,6 +881,8 @@ create_event_attributes(struct drm_i915_private *i915)
 		__engine_event(I915_SAMPLE_WAIT, "wait"),
 		__engine_event_scale(I915_SAMPLE_QUEUED, "queued",
 				     __stringify(I915_SAMPLE_QUEUED_SCALE)),
+		__engine_event_scale(I915_SAMPLE_RUNNABLE, "runnable",
+				     __stringify(I915_SAMPLE_RUNNABLE_SCALE)),
 	};
 	unsigned int count = 0;
 	struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
@@ -883,6 +895,9 @@ create_event_attributes(struct drm_i915_private *i915)
 	BUILD_BUG_ON(I915_SAMPLE_QUEUED_DIVISOR !=
 		     (1 / I915_SAMPLE_QUEUED_SCALE));
 
+	BUILD_BUG_ON(I915_SAMPLE_RUNNABLE_DIVISOR !=
+		     (1 / I915_SAMPLE_RUNNABLE_SCALE));
+
 	/* Count how many counters we will be exposing. */
 	for (i = 0; i < ARRAY_SIZE(events); i++) {
 		if (!config_status(i915, events[i].config))
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 902b63eeaf50..703cea694f0d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -420,7 +420,7 @@ struct intel_engine_cs {
 		 *
 		 * Our internal timer stores the current counters in this field.
 		 */
-#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_QUEUED + 1)
+#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_RUNNABLE + 1)
 		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX];
 	} pmu;
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 6094cc9ca6d9..cf0265b20e37 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -111,11 +111,13 @@ enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_BUSY = 0,
 	I915_SAMPLE_WAIT = 1,
 	I915_SAMPLE_SEMA = 2,
-	I915_SAMPLE_QUEUED = 3
+	I915_SAMPLE_QUEUED = 3,
+	I915_SAMPLE_RUNNABLE = 4,
 };
 
  /* Divide counter value by divisor to get the real value. */
 #define I915_SAMPLE_QUEUED_DIVISOR (1024)
+#define I915_SAMPLE_RUNNABLE_DIVISOR (1024)
 
 #define I915_PMU_SAMPLE_BITS (4)
 #define I915_PMU_SAMPLE_MASK (0xf)
@@ -140,6 +142,9 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_QUEUED(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED)
 
+#define I915_PMU_ENGINE_RUNNABLE(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNABLE)
+
 #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
-- 
2.17.1


* [PATCH 6/7] drm/i915/pmu: Add running counter
  2018-06-06 12:48 [PATCH v6 0/7] Queued/runnable/running engine stats Tvrtko Ursulin
                   ` (4 preceding siblings ...)
  2018-06-06 12:48 ` [PATCH 5/7] drm/i915/pmu: Add runnable counter Tvrtko Ursulin
@ 2018-06-06 12:48 ` Tvrtko Ursulin
  2018-06-06 14:40   ` Tvrtko Ursulin
  2018-06-06 12:48 ` [PATCH 7/7] drm/i915: Engine queues query Tvrtko Ursulin
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 25+ messages in thread
From: Tvrtko Ursulin @ 2018-06-06 12:48 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We add a PMU counter to expose the number of requests currently executing
on the GPU.

This is useful to analyze the overall load of the system.

v2:
 * Rebase.
 * Drop floating point constant. (Chris Wilson)

v3:
 * Change scale to 1024 for faster arithmetic. (Chris Wilson)

v4:
 * Refactored for timer period accounting.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c         | 19 +++++++++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 include/uapi/drm/i915_drm.h             |  5 +++++
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 41527b682c72..60dc68e4911c 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -17,7 +17,8 @@
 	 BIT(I915_SAMPLE_WAIT) | \
 	 BIT(I915_SAMPLE_SEMA) | \
 	 BIT(I915_SAMPLE_QUEUED) | \
-	 BIT(I915_SAMPLE_RUNNABLE))
+	 BIT(I915_SAMPLE_RUNNABLE) | \
+	 BIT(I915_SAMPLE_RUNNING))
 
 #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
 
@@ -224,6 +225,12 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 					engine->request_stats.runnable,
 					(u64)period_ns *
 					I915_SAMPLE_QUEUED_DIVISOR / 1000000);
+
+		if (engine->pmu.enable & BIT(I915_SAMPLE_RUNNING))
+			add_sample_mult(&engine->pmu.sample[I915_SAMPLE_RUNNING],
+					last_seqno - current_seqno,
+					(u64)period_ns *
+					I915_SAMPLE_QUEUED_DIVISOR / 1000000);
 	}
 
 	if (fw)
@@ -339,6 +346,7 @@ engine_event_status(struct intel_engine_cs *engine,
 	case I915_SAMPLE_WAIT:
 	case I915_SAMPLE_QUEUED:
 	case I915_SAMPLE_RUNNABLE:
+	case I915_SAMPLE_RUNNING:
 		break;
 	case I915_SAMPLE_SEMA:
 		if (INTEL_GEN(engine->i915) < 6)
@@ -558,7 +566,8 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 			val = engine->pmu.sample[sample].cur;
 
 			if (sample == I915_SAMPLE_QUEUED ||
-			    sample == I915_SAMPLE_RUNNABLE)
+			    sample == I915_SAMPLE_RUNNABLE ||
+			    sample == I915_SAMPLE_RUNNING)
 				val = div_u64(val, MSEC_PER_SEC);  /* to qd */
 		}
 	} else {
@@ -856,6 +865,7 @@ add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name,
 /* No brackets or quotes below please. */
 #define I915_SAMPLE_QUEUED_SCALE 0.0009765625
 #define I915_SAMPLE_RUNNABLE_SCALE 0.0009765625
+#define I915_SAMPLE_RUNNING_SCALE 0.0009765625
 
 static struct attribute **
 create_event_attributes(struct drm_i915_private *i915)
@@ -883,6 +893,8 @@ create_event_attributes(struct drm_i915_private *i915)
 				     __stringify(I915_SAMPLE_QUEUED_SCALE)),
 		__engine_event_scale(I915_SAMPLE_RUNNABLE, "runnable",
 				     __stringify(I915_SAMPLE_RUNNABLE_SCALE)),
+		__engine_event_scale(I915_SAMPLE_RUNNING, "running",
+				     __stringify(I915_SAMPLE_RUNNING_SCALE)),
 	};
 	unsigned int count = 0;
 	struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
@@ -898,6 +910,9 @@ create_event_attributes(struct drm_i915_private *i915)
 	BUILD_BUG_ON(I915_SAMPLE_RUNNABLE_DIVISOR !=
 		     (1 / I915_SAMPLE_RUNNABLE_SCALE));
 
+	BUILD_BUG_ON(I915_SAMPLE_RUNNING_DIVISOR !=
+		     (1 / I915_SAMPLE_RUNNING_SCALE));
+
 	/* Count how many counters we will be exposing. */
 	for (i = 0; i < ARRAY_SIZE(events); i++) {
 		if (!config_status(i915, events[i].config))
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 703cea694f0d..bff20cfd6870 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -420,7 +420,7 @@ struct intel_engine_cs {
 		 *
 		 * Our internal timer stores the current counters in this field.
 		 */
-#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_RUNNABLE + 1)
+#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_RUNNING + 1)
 		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX];
 	} pmu;
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index cf0265b20e37..9a00c30e4071 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -113,11 +113,13 @@ enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_SEMA = 2,
 	I915_SAMPLE_QUEUED = 3,
 	I915_SAMPLE_RUNNABLE = 4,
+	I915_SAMPLE_RUNNING = 5,
 };
 
  /* Divide counter value by divisor to get the real value. */
 #define I915_SAMPLE_QUEUED_DIVISOR (1024)
 #define I915_SAMPLE_RUNNABLE_DIVISOR (1024)
+#define I915_SAMPLE_RUNNING_DIVISOR (1024)
 
 #define I915_PMU_SAMPLE_BITS (4)
 #define I915_PMU_SAMPLE_MASK (0xf)
@@ -145,6 +147,9 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_RUNNABLE(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNABLE)
 
+#define I915_PMU_ENGINE_RUNNING(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNING)
+
 #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
-- 
2.17.1


* [PATCH 7/7] drm/i915: Engine queues query
  2018-06-06 12:48 [PATCH v6 0/7] Queued/runnable/running engine stats Tvrtko Ursulin
                   ` (5 preceding siblings ...)
  2018-06-06 12:48 ` [PATCH 6/7] drm/i915/pmu: Add running counter Tvrtko Ursulin
@ 2018-06-06 12:48 ` Tvrtko Ursulin
  2018-06-06 13:31 ` ✓ Fi.CI.BAT: success for Queued/runnable/running engine stats (rev8) Patchwork
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Tvrtko Ursulin @ 2018-06-06 12:48 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

As well as exposing active requests on engines via PMU, we can also export
the current raw values (as tracked by i915 command submission) via a
dedicated query.

This is to satisfy customers who have userspace load balancing solutions
implemented on top of their custom kernel patches.

Userspace is now able to include DRM_I915_QUERY_ENGINE_QUEUES in its query
list, pointing to an initialized struct drm_i915_query_engine_queues entry.
The fields describing the engine class and instance userspace would like to
know about need to be filled in, and i915 will fill in the rest.

Multiple engines can be queried in one go by having multiple queries in
the query list.
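
For illustration, a minimal userspace sketch (error handling trimmed,
assuming a DRM fd opened on an i915 device and the uapi additions from this
patch) could look like:

  #include <stdint.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <drm/i915_drm.h>

  /* Sketch: read the queue counts for rcs0 (render class, instance 0). */
  static int query_rcs0_queues(int fd, struct drm_i915_query_engine_queues *q)
  {
          struct drm_i915_query_item item = {};
          struct drm_i915_query query = {};

          memset(q, 0, sizeof(*q));
          q->class = I915_ENGINE_CLASS_RENDER;
          q->instance = 0;

          item.query_id = DRM_I915_QUERY_ENGINE_QUEUES;
          item.length = sizeof(*q);
          item.data_ptr = (uintptr_t)q;

          query.num_items = 1;
          query.items_ptr = (uintptr_t)&item;

          if (ioctl(fd, DRM_IOCTL_I915_QUERY, &query))
                  return -1;

          /* On success the kernel writes the copied size back to length. */
          return item.length == sizeof(*q) ? 0 : -1;
  }

Setting item.length to zero, as with the other query items, just reports the
required structure size without copying anything back.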

v2:
 * Use EINVAL for reporting insufficient buffer space. (Chris Wilson)

v3:
 * One more reserved dword because I like even numbers.
 Lionel Landwerlin:
 * Document input fields.
 * Document reserved bits must be zero.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_query.c | 43 +++++++++++++++++++++++++++++++
 include/uapi/drm/i915_drm.h       | 29 +++++++++++++++++++++
 2 files changed, 72 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
index 3f502eef2431..6a01c1c58f4f 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -84,9 +84,52 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
 	return total_length;
 }
 
+static int
+query_engine_queues(struct drm_i915_private *i915,
+		    struct drm_i915_query_item *query_item)
+{
+	struct drm_i915_query_engine_queues __user *query_ptr =
+				u64_to_user_ptr(query_item->data_ptr);
+	struct drm_i915_query_engine_queues query;
+	struct intel_engine_cs *engine;
+	const int len = sizeof(query);
+	unsigned int i;
+
+	if (query_item->flags)
+		return -EINVAL;
+
+	if (!query_item->length)
+		return len;
+	else if (query_item->length < len)
+		return -EINVAL;
+
+	if (copy_from_user(&query, query_ptr, len))
+		return -EFAULT;
+
+	for (i = 0; i < ARRAY_SIZE(query.rsvd); i++) {
+		if (query.rsvd[i])
+			return -EINVAL;
+	}
+
+	engine = intel_engine_lookup_user(i915, query.class, query.instance);
+	if (!engine)
+		return -ENOENT;
+
+	query.queued = atomic_read(&engine->request_stats.queued);
+	query.runnable = engine->request_stats.runnable;
+	query.running = intel_engine_last_submit(engine) -
+			intel_engine_get_seqno(engine);
+
+	if (copy_to_user(query_ptr, &query, len))
+		return -EFAULT;
+
+	return len;
+}
+
 static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
 					struct drm_i915_query_item *query_item) = {
 	query_topology_info,
+	query_engine_queues,
 };
 
 int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 9a00c30e4071..c82035b71824 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1637,6 +1637,7 @@ struct drm_i915_perf_oa_config {
 struct drm_i915_query_item {
 	__u64 query_id;
 #define DRM_I915_QUERY_TOPOLOGY_INFO    1
+#define DRM_I915_QUERY_ENGINE_QUEUES	2
 
 	/*
 	 * When set to zero by userspace, this is filled with the size of the
@@ -1734,6 +1735,34 @@ struct drm_i915_query_topology_info {
 	__u8 data[];
 };
 
+/**
+ * struct drm_i915_query_engine_queues
+ *
+ * Engine queues query enables userspace to query current counts of active
+ * requests in their different states.
+ */
+struct drm_i915_query_engine_queues {
+	/**
+	 * Engine class as in enum drm_i915_gem_engine_class (set by userspace).
+	 */
+	__u16 class;
+
+	/** Engine instance number (set by userspace). */
+	__u16 instance;
+
+	/** Number of requests with unresolved fences and dependencies. */
+	__u32 queued;
+
+	/** Number of ready requests waiting on a slot on GPU. */
+	__u32 runnable;
+
+	/** Number of requests executing on the GPU. */
+	__u32 running;
+
+	/** Reserved bits must be set to zero by userspace. */
+	__u32 rsvd[6];
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.17.1


* Re: [PATCH 4/7] drm/i915/pmu: Add queued counter
  2018-06-06 12:48 ` [PATCH 4/7] drm/i915/pmu: Add queued counter Tvrtko Ursulin
@ 2018-06-06 13:16   ` Chris Wilson
  2018-06-06 13:24     ` Tvrtko Ursulin
  2018-06-06 14:39   ` Tvrtko Ursulin
  1 sibling, 1 reply; 25+ messages in thread
From: Chris Wilson @ 2018-06-06 13:16 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

Quoting Tvrtko Ursulin (2018-06-06 13:48:45)
> @@ -204,6 +211,12 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
>                 if (val & RING_WAIT_SEMAPHORE)
>                         add_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
>                                    period_ns);
> +
> +               if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED))
> +                       add_sample_mult(&engine->pmu.sample[I915_SAMPLE_QUEUED],
> +                                       atomic_read(&engine->request_stats.queued),
> +                                       (u64)period_ns *
> +                                       I915_SAMPLE_QUEUED_DIVISOR / 1000000);

Doesn't this promote to a 64b divide?
-Chris

* Re: [PATCH 4/7] drm/i915/pmu: Add queued counter
  2018-06-06 13:16   ` Chris Wilson
@ 2018-06-06 13:24     ` Tvrtko Ursulin
  0 siblings, 0 replies; 25+ messages in thread
From: Tvrtko Ursulin @ 2018-06-06 13:24 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, Intel-gfx


On 06/06/2018 14:16, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-06-06 13:48:45)
>> @@ -204,6 +211,12 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
>>                  if (val & RING_WAIT_SEMAPHORE)
>>                          add_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
>>                                     period_ns);
>> +
>> +               if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED))
>> +                       add_sample_mult(&engine->pmu.sample[I915_SAMPLE_QUEUED],
>> +                                       atomic_read(&engine->request_stats.queued),
>> +                                       (u64)period_ns *
>> +                                       I915_SAMPLE_QUEUED_DIVISOR / 1000000);
> 
> Doesn't this promote to a 64b divide?

Yes my bad. Will need to use div_u64 and resend the three musketeers..

Regards,

Tvrtko


* ✓ Fi.CI.BAT: success for Queued/runnable/running engine stats (rev8)
  2018-06-06 12:48 [PATCH v6 0/7] Queued/runnable/running engine stats Tvrtko Ursulin
                   ` (6 preceding siblings ...)
  2018-06-06 12:48 ` [PATCH 7/7] drm/i915: Engine queues query Tvrtko Ursulin
@ 2018-06-06 13:31 ` Patchwork
  2018-06-06 15:17 ` ✓ Fi.CI.BAT: success for Queued/runnable/running engine stats (rev11) Patchwork
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Patchwork @ 2018-06-06 13:31 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Queued/runnable/running engine stats (rev8)
URL   : https://patchwork.freedesktop.org/series/36926/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4282 -> Patchwork_9216 =

== Summary - WARNING ==

  Minor unknown changes coming with Patchwork_9216 need to be verified
  manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9216, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/36926/revisions/8/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9216:

  === IGT changes ===

    ==== Warnings ====

    igt@gem_exec_gttfill@basic:
      fi-pnv-d510:        SKIP -> PASS

    
== Known issues ==

  Here are the changes found in Patchwork_9216 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@gem_mmap_gtt@basic-small-bo-tiledx:
      fi-gdg-551:         PASS -> FAIL (fdo#102575)

    igt@kms_chamelium@hdmi-hpd-fast:
      fi-kbl-7500u:       SKIP -> FAIL (fdo#102672, fdo#103841)

    igt@kms_pipe_crc_basic@read-crc-pipe-b-frame-sequence:
      fi-glk-j4005:       PASS -> FAIL (fdo#103481)

    igt@kms_pipe_crc_basic@read-crc-pipe-c:
      fi-glk-j4005:       PASS -> DMESG-WARN (fdo#106097)

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c:
      fi-glk-j4005:       PASS -> DMESG-WARN (fdo#106000) +2

    
  fdo#102575 https://bugs.freedesktop.org/show_bug.cgi?id=102575
  fdo#102672 https://bugs.freedesktop.org/show_bug.cgi?id=102672
  fdo#103481 https://bugs.freedesktop.org/show_bug.cgi?id=103481
  fdo#103841 https://bugs.freedesktop.org/show_bug.cgi?id=103841
  fdo#106000 https://bugs.freedesktop.org/show_bug.cgi?id=106000
  fdo#106097 https://bugs.freedesktop.org/show_bug.cgi?id=106097


== Participating hosts (41 -> 36) ==

  Missing    (5): fi-ctg-p8600 fi-byt-squawks fi-ilk-m540 fi-cnl-y3 fi-skl-6700hq 


== Build changes ==

    * Linux: CI_DRM_4282 -> Patchwork_9216

  CI_DRM_4282: c1064b9be065603680d060184da1a93d404dcf0c @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4508: 78a68c905066beeefd24b3a4518d817a811e8798 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9216: 3526ea28bf9d8a5738b216fe2604a3db05550e86 @ git://anongit.freedesktop.org/gfx-ci/linux


== Kernel 32bit build ==

Warning: Kernel 32bit buildtest failed:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9216/build_32bit.log

  CHK     include/config/kernel.release
  CHK     include/generated/uapi/linux/version.h
  CHK     include/generated/utsrelease.h
  CHK     include/generated/bounds.h
  CHK     include/generated/timeconst.h
  CHK     include/generated/asm-offsets.h
  CALL    scripts/checksyscalls.sh
  CHK     scripts/mod/devicetable-offsets.h
  CHK     include/generated/compile.h
  CHK     kernel/config_data.h
  CHK     include/generated/uapi/linux/version.h
  DATAREL arch/x86/boot/compressed/vmlinux
Kernel: arch/x86/boot/bzImage is ready  (#1)
  Building modules, stage 2.
  MODPOST 106 modules
ERROR: "__udivdi3" [drivers/gpu/drm/i915/i915.ko] undefined!
ERROR: "__divdi3" [drivers/gpu/drm/i915/i915.ko] undefined!
scripts/Makefile.modpost:92: recipe for target '__modpost' failed
make[1]: *** [__modpost] Error 1
Makefile:1277: recipe for target 'modules' failed
make: *** [modules] Error 2


== Linux commits ==

3526ea28bf9d drm/i915: Engine queues query
e6c408b549fb drm/i915/pmu: Add running counter
e965ab7817a1 drm/i915/pmu: Add runnable counter
ad921ac04ce7 drm/i915/pmu: Add queued counter
5d81c6d42e39 drm/i915: Keep a count of requests submitted from userspace
ca7e61ec8ffc drm/i915: Keep a count of requests waiting for a slot on GPU
bbed24a2461b drm/i915/pmu: Fix enable count array size and bounds checking

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9216/issues.html

* [PATCH 4/7] drm/i915/pmu: Add queued counter
  2018-06-06 12:48 ` [PATCH 4/7] drm/i915/pmu: Add queued counter Tvrtko Ursulin
  2018-06-06 13:16   ` Chris Wilson
@ 2018-06-06 14:39   ` Tvrtko Ursulin
  2018-06-07 13:24     ` Tvrtko Ursulin
  1 sibling, 1 reply; 25+ messages in thread
From: Tvrtko Ursulin @ 2018-06-06 14:39 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We add a PMU counter to expose the number of requests which have been
submitted from userspace but are not yet runnable due to dependencies and
unsignaled fences.

This is useful to analyze the overall load of the system.

v2:
 * Rebase for name change and re-order.
 * Drop floating point constant. (Chris Wilson)

v3:
 * Change scale to 1024 for faster arithmetic. (Chris Wilson)

v4:
 * Refactored for timer period accounting.

v5:
 * Avoid 64-bit division. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c         | 54 ++++++++++++++++++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 include/uapi/drm/i915_drm.h             |  9 ++++-
 3 files changed, 53 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index b8c6953867ee..ba2205d92190 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -15,7 +15,8 @@
 #define ENGINE_SAMPLE_MASK \
 	(BIT(I915_SAMPLE_BUSY) | \
 	 BIT(I915_SAMPLE_WAIT) | \
-	 BIT(I915_SAMPLE_SEMA))
+	 BIT(I915_SAMPLE_SEMA) | \
+	 BIT(I915_SAMPLE_QUEUED))
 
 #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
 
@@ -161,6 +162,12 @@ add_sample(struct i915_pmu_sample *sample, u32 val)
 	sample->cur += val;
 }
 
+static void
+add_sample_mult(struct i915_pmu_sample *sample, u32 val, u32 mul)
+{
+	sample->cur += mul_u32_u32(val, mul);
+}
+
 static void
 engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 {
@@ -204,6 +211,13 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 		if (val & RING_WAIT_SEMAPHORE)
 			add_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
 				   period_ns);
+
+		if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED))
+			add_sample_mult(&engine->pmu.sample[I915_SAMPLE_QUEUED],
+					atomic_read(&engine->request_stats.queued),
+					div_u64((u64)period_ns *
+						I915_SAMPLE_QUEUED_DIVISOR,
+						1000000));
 	}
 
 	if (fw)
@@ -212,12 +226,6 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 	intel_runtime_pm_put(dev_priv);
 }
 
-static void
-add_sample_mult(struct i915_pmu_sample *sample, u32 val, u32 mul)
-{
-	sample->cur += mul_u32_u32(val, mul);
-}
-
 static void
 frequency_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 {
@@ -323,6 +331,7 @@ engine_event_status(struct intel_engine_cs *engine,
 	switch (sample) {
 	case I915_SAMPLE_BUSY:
 	case I915_SAMPLE_WAIT:
+	case I915_SAMPLE_QUEUED:
 		break;
 	case I915_SAMPLE_SEMA:
 		if (INTEL_GEN(engine->i915) < 6)
@@ -540,6 +549,9 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 			val = ktime_to_ns(intel_engine_get_busy_time(engine));
 		} else {
 			val = engine->pmu.sample[sample].cur;
+
+			if (sample == I915_SAMPLE_QUEUED)
+				val = div_u64(val, MSEC_PER_SEC);  /* to qd */
 		}
 	} else {
 		switch (event->attr.config) {
@@ -796,6 +808,16 @@ static const struct attribute_group *i915_pmu_attr_groups[] = {
 { \
 	.sample = (__sample), \
 	.name = (__name), \
+	.suffix = "unit", \
+	.value = "ns", \
+}
+
+#define __engine_event_scale(__sample, __name, __scale) \
+{ \
+	.sample = (__sample), \
+	.name = (__name), \
+	.suffix = "scale", \
+	.value = (__scale), \
 }
 
 static struct i915_ext_attribute *
@@ -823,6 +845,9 @@ add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name,
 	return ++attr;
 }
 
+/* No brackets or quotes below please. */
+#define I915_SAMPLE_QUEUED_SCALE 0.0009765625
+
 static struct attribute **
 create_event_attributes(struct drm_i915_private *i915)
 {
@@ -839,10 +864,14 @@ create_event_attributes(struct drm_i915_private *i915)
 	static const struct {
 		enum drm_i915_pmu_engine_sample sample;
 		char *name;
+		char *suffix;
+		char *value;
 	} engine_events[] = {
 		__engine_event(I915_SAMPLE_BUSY, "busy"),
 		__engine_event(I915_SAMPLE_SEMA, "sema"),
 		__engine_event(I915_SAMPLE_WAIT, "wait"),
+		__engine_event_scale(I915_SAMPLE_QUEUED, "queued",
+				     __stringify(I915_SAMPLE_QUEUED_SCALE)),
 	};
 	unsigned int count = 0;
 	struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
@@ -852,6 +881,9 @@ create_event_attributes(struct drm_i915_private *i915)
 	enum intel_engine_id id;
 	unsigned int i;
 
+	BUILD_BUG_ON(I915_SAMPLE_QUEUED_DIVISOR !=
+		     (1 / I915_SAMPLE_QUEUED_SCALE));
+
 	/* Count how many counters we will be exposing. */
 	for (i = 0; i < ARRAY_SIZE(events); i++) {
 		if (!config_status(i915, events[i].config))
@@ -929,13 +961,15 @@ create_event_attributes(struct drm_i915_private *i915)
 								engine->instance,
 								engine_events[i].sample));
 
-			str = kasprintf(GFP_KERNEL, "%s-%s.unit",
-					engine->name, engine_events[i].name);
+			str = kasprintf(GFP_KERNEL, "%s-%s.%s",
+					engine->name, engine_events[i].name,
+					engine_events[i].suffix);
 			if (!str)
 				goto err;
 
 			*attr_iter++ = &pmu_iter->attr.attr;
-			pmu_iter = add_pmu_attr(pmu_iter, str, "ns");
+			pmu_iter = add_pmu_attr(pmu_iter, str,
+						engine_events[i].value);
 		}
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index eeb7f3662195..902b63eeaf50 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -420,7 +420,7 @@ struct intel_engine_cs {
 		 *
 		 * Our internal timer stores the current counters in this field.
 		 */
-#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_SEMA + 1)
+#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_QUEUED + 1)
 		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX];
 	} pmu;
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 7f5634ce8e88..6094cc9ca6d9 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -110,9 +110,13 @@ enum drm_i915_gem_engine_class {
 enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_BUSY = 0,
 	I915_SAMPLE_WAIT = 1,
-	I915_SAMPLE_SEMA = 2
+	I915_SAMPLE_SEMA = 2,
+	I915_SAMPLE_QUEUED = 3
 };
 
+ /* Divide counter value by divisor to get the real value. */
+#define I915_SAMPLE_QUEUED_DIVISOR (1024)
+
 #define I915_PMU_SAMPLE_BITS (4)
 #define I915_PMU_SAMPLE_MASK (0xf)
 #define I915_PMU_SAMPLE_INSTANCE_BITS (8)
@@ -133,6 +137,9 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_SEMA(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
 
+#define I915_PMU_ENGINE_QUEUED(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED)
+
 #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
-- 
2.17.1


* [PATCH 5/7] drm/i915/pmu: Add runnable counter
  2018-06-06 12:48 ` [PATCH 5/7] drm/i915/pmu: Add runnable counter Tvrtko Ursulin
@ 2018-06-06 14:39   ` Tvrtko Ursulin
  2018-06-07 13:24     ` Tvrtko Ursulin
  0 siblings, 1 reply; 25+ messages in thread
From: Tvrtko Ursulin @ 2018-06-06 14:39 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We add a PMU counter to expose the number of requests with resolved
dependencies waiting for a slot on the GPU to run.

This is useful to analyze the overall load of the system.

v2: Don't limit to gen8+.

v3:
 * Rebase for dynamic sysfs.
 * Drop currently executing requests.

v4:
 * Sync with internal renaming.
 * Drop floating point constant. (Chris Wilson)

v5:
 * Change scale to 1024 for faster arithmetic. (Chris Wilson)

v6:
 * Refactored for timer period accounting.

v7:
 * Avoid 64-division. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c         | 20 ++++++++++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 include/uapi/drm/i915_drm.h             |  7 ++++++-
 3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index ba2205d92190..46a516a748c8 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -16,7 +16,8 @@
 	(BIT(I915_SAMPLE_BUSY) | \
 	 BIT(I915_SAMPLE_WAIT) | \
 	 BIT(I915_SAMPLE_SEMA) | \
-	 BIT(I915_SAMPLE_QUEUED))
+	 BIT(I915_SAMPLE_QUEUED) | \
+	 BIT(I915_SAMPLE_RUNNABLE))
 
 #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
 
@@ -218,6 +219,13 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 					div_u64((u64)period_ns *
 						I915_SAMPLE_QUEUED_DIVISOR,
 						1000000));
+
+		if (engine->pmu.enable & BIT(I915_SAMPLE_RUNNABLE))
+			add_sample_mult(&engine->pmu.sample[I915_SAMPLE_RUNNABLE],
+					engine->request_stats.runnable,
+					div_u64((u64)period_ns *
+						I915_SAMPLE_QUEUED_DIVISOR,
+						1000000));
 	}
 
 	if (fw)
@@ -332,6 +340,7 @@ engine_event_status(struct intel_engine_cs *engine,
 	case I915_SAMPLE_BUSY:
 	case I915_SAMPLE_WAIT:
 	case I915_SAMPLE_QUEUED:
+	case I915_SAMPLE_RUNNABLE:
 		break;
 	case I915_SAMPLE_SEMA:
 		if (INTEL_GEN(engine->i915) < 6)
@@ -550,7 +559,8 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 		} else {
 			val = engine->pmu.sample[sample].cur;
 
-			if (sample == I915_SAMPLE_QUEUED)
+			if (sample == I915_SAMPLE_QUEUED ||
+			    sample == I915_SAMPLE_RUNNABLE)
 				val = div_u64(val, MSEC_PER_SEC);  /* to qd */
 		}
 	} else {
@@ -847,6 +857,7 @@ add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name,
 
 /* No brackets or quotes below please. */
 #define I915_SAMPLE_QUEUED_SCALE 0.0009765625
+#define I915_SAMPLE_RUNNABLE_SCALE 0.0009765625
 
 static struct attribute **
 create_event_attributes(struct drm_i915_private *i915)
@@ -872,6 +883,8 @@ create_event_attributes(struct drm_i915_private *i915)
 		__engine_event(I915_SAMPLE_WAIT, "wait"),
 		__engine_event_scale(I915_SAMPLE_QUEUED, "queued",
 				     __stringify(I915_SAMPLE_QUEUED_SCALE)),
+		__engine_event_scale(I915_SAMPLE_RUNNABLE, "runnable",
+				     __stringify(I915_SAMPLE_RUNNABLE_SCALE)),
 	};
 	unsigned int count = 0;
 	struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
@@ -884,6 +897,9 @@ create_event_attributes(struct drm_i915_private *i915)
 	BUILD_BUG_ON(I915_SAMPLE_QUEUED_DIVISOR !=
 		     (1 / I915_SAMPLE_QUEUED_SCALE));
 
+	BUILD_BUG_ON(I915_SAMPLE_RUNNABLE_DIVISOR !=
+		     (1 / I915_SAMPLE_RUNNABLE_SCALE));
+
 	/* Count how many counters we will be exposing. */
 	for (i = 0; i < ARRAY_SIZE(events); i++) {
 		if (!config_status(i915, events[i].config))
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 902b63eeaf50..703cea694f0d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -420,7 +420,7 @@ struct intel_engine_cs {
 		 *
 		 * Our internal timer stores the current counters in this field.
 		 */
-#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_QUEUED + 1)
+#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_RUNNABLE + 1)
 		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX];
 	} pmu;
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 6094cc9ca6d9..cf0265b20e37 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -111,11 +111,13 @@ enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_BUSY = 0,
 	I915_SAMPLE_WAIT = 1,
 	I915_SAMPLE_SEMA = 2,
-	I915_SAMPLE_QUEUED = 3
+	I915_SAMPLE_QUEUED = 3,
+	I915_SAMPLE_RUNNABLE = 4,
 };
 
  /* Divide counter value by divisor to get the real value. */
 #define I915_SAMPLE_QUEUED_DIVISOR (1024)
+#define I915_SAMPLE_RUNNABLE_DIVISOR (1024)
 
 #define I915_PMU_SAMPLE_BITS (4)
 #define I915_PMU_SAMPLE_MASK (0xf)
@@ -140,6 +142,9 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_QUEUED(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED)
 
+#define I915_PMU_ENGINE_RUNNABLE(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNABLE)
+
 #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 6/7] drm/i915/pmu: Add running counter
  2018-06-06 12:48 ` [PATCH 6/7] drm/i915/pmu: Add running counter Tvrtko Ursulin
@ 2018-06-06 14:40   ` Tvrtko Ursulin
  2018-06-06 15:23     ` Chris Wilson
  2018-06-07 13:25     ` Tvrtko Ursulin
  0 siblings, 2 replies; 25+ messages in thread
From: Tvrtko Ursulin @ 2018-06-06 14:40 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We add a PMU counter to expose the number of requests currently executing
on the GPU.

This is useful to analyze the overall load of the system.

v2:
 * Rebase.
 * Drop floating point constant. (Chris Wilson)

v3:
 * Change scale to 1024 for faster arithmetic. (Chris Wilson)

v4:
 * Refactored for timer period accounting.

v5:
 * Avoid 64-division. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c         | 20 ++++++++++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 include/uapi/drm/i915_drm.h             |  5 +++++
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 46a516a748c8..9ecaf662b5c1 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -17,7 +17,8 @@
 	 BIT(I915_SAMPLE_WAIT) | \
 	 BIT(I915_SAMPLE_SEMA) | \
 	 BIT(I915_SAMPLE_QUEUED) | \
-	 BIT(I915_SAMPLE_RUNNABLE))
+	 BIT(I915_SAMPLE_RUNNABLE) | \
+	 BIT(I915_SAMPLE_RUNNING))
 
 #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
 
@@ -226,6 +227,13 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 					div_u64((u64)period_ns *
 						I915_SAMPLE_QUEUED_DIVISOR,
 						1000000));
+
+		if (engine->pmu.enable & BIT(I915_SAMPLE_RUNNING))
+			add_sample_mult(&engine->pmu.sample[I915_SAMPLE_RUNNING],
+					last_seqno - current_seqno,
+					div_u64((u64)period_ns *
+						I915_SAMPLE_QUEUED_DIVISOR,
+						1000000));
 	}
 
 	if (fw)
@@ -341,6 +349,7 @@ engine_event_status(struct intel_engine_cs *engine,
 	case I915_SAMPLE_WAIT:
 	case I915_SAMPLE_QUEUED:
 	case I915_SAMPLE_RUNNABLE:
+	case I915_SAMPLE_RUNNING:
 		break;
 	case I915_SAMPLE_SEMA:
 		if (INTEL_GEN(engine->i915) < 6)
@@ -560,7 +569,8 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 			val = engine->pmu.sample[sample].cur;
 
 			if (sample == I915_SAMPLE_QUEUED ||
-			    sample == I915_SAMPLE_RUNNABLE)
+			    sample == I915_SAMPLE_RUNNABLE ||
+			    sample == I915_SAMPLE_RUNNING)
 				val = div_u64(val, MSEC_PER_SEC);  /* to qd */
 		}
 	} else {
@@ -858,6 +868,7 @@ add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name,
 /* No brackets or quotes below please. */
 #define I915_SAMPLE_QUEUED_SCALE 0.0009765625
 #define I915_SAMPLE_RUNNABLE_SCALE 0.0009765625
+#define I915_SAMPLE_RUNNING_SCALE 0.0009765625
 
 static struct attribute **
 create_event_attributes(struct drm_i915_private *i915)
@@ -885,6 +896,8 @@ create_event_attributes(struct drm_i915_private *i915)
 				     __stringify(I915_SAMPLE_QUEUED_SCALE)),
 		__engine_event_scale(I915_SAMPLE_RUNNABLE, "runnable",
 				     __stringify(I915_SAMPLE_RUNNABLE_SCALE)),
+		__engine_event_scale(I915_SAMPLE_RUNNING, "running",
+				     __stringify(I915_SAMPLE_RUNNING_SCALE)),
 	};
 	unsigned int count = 0;
 	struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
@@ -900,6 +913,9 @@ create_event_attributes(struct drm_i915_private *i915)
 	BUILD_BUG_ON(I915_SAMPLE_RUNNABLE_DIVISOR !=
 		     (1 / I915_SAMPLE_RUNNABLE_SCALE));
 
+	BUILD_BUG_ON(I915_SAMPLE_RUNNING_DIVISOR !=
+		     (1 / I915_SAMPLE_RUNNING_SCALE));
+
 	/* Count how many counters we will be exposing. */
 	for (i = 0; i < ARRAY_SIZE(events); i++) {
 		if (!config_status(i915, events[i].config))
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 703cea694f0d..bff20cfd6870 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -420,7 +420,7 @@ struct intel_engine_cs {
 		 *
 		 * Our internal timer stores the current counters in this field.
 		 */
-#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_RUNNABLE + 1)
+#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_RUNNING + 1)
 		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX];
 	} pmu;
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index cf0265b20e37..9a00c30e4071 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -113,11 +113,13 @@ enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_SEMA = 2,
 	I915_SAMPLE_QUEUED = 3,
 	I915_SAMPLE_RUNNABLE = 4,
+	I915_SAMPLE_RUNNING = 5,
 };
 
  /* Divide counter value by divisor to get the real value. */
 #define I915_SAMPLE_QUEUED_DIVISOR (1024)
 #define I915_SAMPLE_RUNNABLE_DIVISOR (1024)
+#define I915_SAMPLE_RUNNING_DIVISOR (1024)
 
 #define I915_PMU_SAMPLE_BITS (4)
 #define I915_PMU_SAMPLE_MASK (0xf)
@@ -145,6 +147,9 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_RUNNABLE(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNABLE)
 
+#define I915_PMU_ENGINE_RUNNING(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNING)
+
 #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* ✓ Fi.CI.BAT: success for Queued/runnable/running engine stats (rev11)
  2018-06-06 12:48 [PATCH v6 0/7] Queued/runnable/running engine stats Tvrtko Ursulin
                   ` (7 preceding siblings ...)
  2018-06-06 13:31 ` ✓ Fi.CI.BAT: success for Queued/runnable/running engine stats (rev8) Patchwork
@ 2018-06-06 15:17 ` Patchwork
  2018-06-06 16:06 ` ✗ Fi.CI.IGT: failure for Queued/runnable/running engine stats (rev8) Patchwork
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Patchwork @ 2018-06-06 15:17 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Queued/runnable/running engine stats (rev11)
URL   : https://patchwork.freedesktop.org/series/36926/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4282 -> Patchwork_9220 =

== Summary - WARNING ==

  Minor unknown changes coming with Patchwork_9220 need to be verified
  manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9220, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/36926/revisions/11/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9220:

  === IGT changes ===

    ==== Warnings ====

    igt@gem_exec_gttfill@basic:
      fi-pnv-d510:        SKIP -> PASS

    
== Known issues ==

  Here are the changes found in Patchwork_9220 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@gem_exec_nop@basic-series:
      fi-glk-j4005:       PASS -> DMESG-WARN (fdo#106097) +1

    igt@gem_mmap_gtt@basic-small-bo-tiledx:
      fi-gdg-551:         PASS -> FAIL (fdo#102575)

    igt@kms_flip@basic-flip-vs-dpms:
      fi-glk-j4005:       PASS -> DMESG-WARN (fdo#106000)

    
    ==== Possible fixes ====

    igt@gem_sync@basic-each:
      fi-cnl-y3:          INCOMPLETE (fdo#105086) -> PASS

    
  fdo#102575 https://bugs.freedesktop.org/show_bug.cgi?id=102575
  fdo#105086 https://bugs.freedesktop.org/show_bug.cgi?id=105086
  fdo#106000 https://bugs.freedesktop.org/show_bug.cgi?id=106000
  fdo#106097 https://bugs.freedesktop.org/show_bug.cgi?id=106097


== Participating hosts (41 -> 37) ==

  Missing    (4): fi-ctg-p8600 fi-ilk-m540 fi-byt-squawks fi-skl-6700hq 


== Build changes ==

    * Linux: CI_DRM_4282 -> Patchwork_9220

  CI_DRM_4282: c1064b9be065603680d060184da1a93d404dcf0c @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4508: 78a68c905066beeefd24b3a4518d817a811e8798 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9220: 5a041e661e3d916252b0c672dd517bb66f898277 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

5a041e661e3d drm/i915: Engine queues query
6e8b3968f1e4 drm/i915/pmu: Add running counter
a359d4f40428 drm/i915/pmu: Add runnable counter
a66ec00ebf55 drm/i915/pmu: Add queued counter
c332424de0d4 drm/i915: Keep a count of requests submitted from userspace
2cc8813d4252 drm/i915: Keep a count of requests waiting for a slot on GPU
66ba9510b5ec drm/i915/pmu: Fix enable count array size and bounds checking

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9220/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/7] drm/i915/pmu: Add running counter
  2018-06-06 14:40   ` Tvrtko Ursulin
@ 2018-06-06 15:23     ` Chris Wilson
  2018-06-06 15:52       ` Tvrtko Ursulin
  2018-06-07 13:25     ` Tvrtko Ursulin
  1 sibling, 1 reply; 25+ messages in thread
From: Chris Wilson @ 2018-06-06 15:23 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

Quoting Tvrtko Ursulin (2018-06-06 15:40:10)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> We add a PMU counter to expose the number of requests currently executing
> on the GPU.
> 
> This is useful to analyze the overall load of the system.
> 
> v2:
>  * Rebase.
>  * Drop floating point constant. (Chris Wilson)
> 
> v3:
>  * Change scale to 1024 for faster arithmetic. (Chris Wilson)
> 
> v4:
>  * Refactored for timer period accounting.
> 
> v5:
>  * Avoid 64-division. (Chris Wilson)
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
>  
> @@ -226,6 +227,13 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
>                                         div_u64((u64)period_ns *
>                                                 I915_SAMPLE_QUEUED_DIVISOR,
>                                                 1000000));
> +
> +               if (engine->pmu.enable & BIT(I915_SAMPLE_RUNNING))
> +                       add_sample_mult(&engine->pmu.sample[I915_SAMPLE_RUNNING],
> +                                       last_seqno - current_seqno,
> +                                       div_u64((u64)period_ns *
> +                                               I915_SAMPLE_QUEUED_DIVISOR,
> +                                               1000000));

Are we worried about losing precision with qd.ns?

add_sample_mult(SAMPLE, x, period_ns); here

> @@ -560,7 +569,8 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
>                         val = engine->pmu.sample[sample].cur;
>  
>                         if (sample == I915_SAMPLE_QUEUED ||
> -                           sample == I915_SAMPLE_RUNNABLE)
> +                           sample == I915_SAMPLE_RUNNABLE ||
> +                           sample == I915_SAMPLE_RUNNING)
>                                 val = div_u64(val, MSEC_PER_SEC);  /* to qd */

and val = div_u64(val * I915_SAMPLE_QUEUED_DIVISOR, NSEC_PER_SEC);

So that gives us a limit of ~1 million qd (assuming the user cares for
about 1s intervals). Up to 8 million wlog with

	val = div_u64(val * I915_SAMPLE_QUEUED_DIVISOR/8, NSEC_PER_SEC/8);

Anyway, just concerned about having more than one 64b division and want to
provoke you into thinking of a way of avoiding it :)
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/7] drm/i915/pmu: Add running counter
  2018-06-06 15:23     ` Chris Wilson
@ 2018-06-06 15:52       ` Tvrtko Ursulin
  0 siblings, 0 replies; 25+ messages in thread
From: Tvrtko Ursulin @ 2018-06-06 15:52 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, Intel-gfx


On 06/06/2018 16:23, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-06-06 15:40:10)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> We add a PMU counter to expose the number of requests currently executing
>> on the GPU.
>>
>> This is useful to analyze the overall load of the system.
>>
>> v2:
>>   * Rebase.
>>   * Drop floating point constant. (Chris Wilson)
>>
>> v3:
>>   * Change scale to 1024 for faster arithmetic. (Chris Wilson)
>>
>> v4:
>>   * Refactored for timer period accounting.
>>
>> v5:
>>   * Avoid 64-division. (Chris Wilson)
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> ---
>>   #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
>>   
>> @@ -226,6 +227,13 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
>>                                          div_u64((u64)period_ns *
>>                                                  I915_SAMPLE_QUEUED_DIVISOR,
>>                                                  1000000));
>> +
>> +               if (engine->pmu.enable & BIT(I915_SAMPLE_RUNNING))
>> +                       add_sample_mult(&engine->pmu.sample[I915_SAMPLE_RUNNING],
>> +                                       last_seqno - current_seqno,
>> +                                       div_u64((u64)period_ns *
>> +                                               I915_SAMPLE_QUEUED_DIVISOR,
>> +                                               1000000));
> 
> Are we worried about losing precision with qd.ns?
> 
> add_sample_mult(SAMPLE, x, period_ns); here
> 
>> @@ -560,7 +569,8 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
>>                          val = engine->pmu.sample[sample].cur;
>>   
>>                          if (sample == I915_SAMPLE_QUEUED ||
>> -                           sample == I915_SAMPLE_RUNNABLE)
>> +                           sample == I915_SAMPLE_RUNNABLE ||
>> +                           sample == I915_SAMPLE_RUNNING)
>>                                  val = div_u64(val, MSEC_PER_SEC);  /* to qd */
> 
> and val = div_u64(val * I915_SAMPLE_QUEUED_DIVISOR, NSEC_PER_SEC);

Yeah that works, thanks.

> So that gives us a limit of ~1 million qd (assuming the user cares for
> about 1s intervals). Up to 8 million wlog with
> 
> 	val = div_u64(val * I915_SAMPLE_QUEUED_DIVISOR/8, NSEC_PER_SEC/8);

Or keep it in qd.us, as for frequency. I think precision is plenty in any case.

> Anyway, just concerned about having more than one 64b division and want to
> provoke you into thinking of a way of avoiding it :)

It is an optimized 64-bit divide, or "64-division" as I garbled it in the 
commit message :), so not as bad as a full 64/64 division, but your idea is 
still very good.
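
For reference, a minimal standalone sketch of the scheme being discussed
(accumulate qd.ns in the sampling timer, divide only once at readout).
The names and values below are simplified stand-ins for illustration,
not the actual driver helpers:

#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC   1000000000ULL
#define QUEUED_DIVISOR 1000ULL	/* exported perf scale is then 0.001 */

struct pmu_sample { uint64_t cur; };

/* Timer tick: queue depth qd observed for period_ns nanoseconds. */
static void add_sample_mult(struct pmu_sample *s, uint32_t qd, uint32_t period_ns)
{
	s->cur += (uint64_t)qd * period_ns;	/* accumulates qd.ns */
}

/* Event read: a single division converts qd.ns to 1/1000 qd units. */
static uint64_t read_sample(const struct pmu_sample *s)
{
	return s->cur / (NSEC_PER_SEC / QUEUED_DIVISOR);
}

int main(void)
{
	struct pmu_sample s = { 0 };
	int i;

	/* Simulate 1000 timer ticks of 1ms each at a queue depth of 3. */
	for (i = 0; i < 1000; i++)
		add_sample_mult(&s, 3, 1000000);

	/* Prints 3000; multiplied by the 0.001 scale that is 3.0, i.e. an
	 * average queue depth of 3 over the simulated second.
	 */
	printf("%llu\n", (unsigned long long)read_sample(&s));

	return 0;
}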

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* ✗ Fi.CI.IGT: failure for Queued/runnable/running engine stats (rev8)
  2018-06-06 12:48 [PATCH v6 0/7] Queued/runnable/running engine stats Tvrtko Ursulin
                   ` (8 preceding siblings ...)
  2018-06-06 15:17 ` ✓ Fi.CI.BAT: success for Queued/runnable/running engine stats (rev11) Patchwork
@ 2018-06-06 16:06 ` Patchwork
  2018-06-06 17:26 ` ✓ Fi.CI.IGT: success for Queued/runnable/running engine stats (rev11) Patchwork
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Patchwork @ 2018-06-06 16:06 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Queued/runnable/running engine stats (rev8)
URL   : https://patchwork.freedesktop.org/series/36926/
State : failure

== Summary ==

= CI Bug Log - changes from CI_DRM_4282_full -> Patchwork_9216_full =

== Summary - FAILURE ==

  Serious unknown changes coming with Patchwork_9216_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9216_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/36926/revisions/8/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9216_full:

  === IGT changes ===

    ==== Possible regressions ====

    igt@drv_selftest@live_hangcheck:
      shard-apl:          PASS -> DMESG-FAIL

    
    ==== Warnings ====

    igt@drv_selftest@live_execlists:
      shard-apl:          PASS -> SKIP +1

    igt@gem_exec_schedule@deep-blt:
      shard-kbl:          SKIP -> PASS

    igt@gem_mocs_settings@mocs-rc6-render:
      shard-kbl:          PASS -> SKIP

    
== Known issues ==

  Here are the changes found in Patchwork_9216_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@drv_selftest@live_gtt:
      shard-kbl:          PASS -> FAIL (fdo#105347)

    igt@gem_ppgtt@blt-vs-render-ctx0:
      shard-kbl:          PASS -> INCOMPLETE (fdo#103665, fdo#106023)

    igt@kms_flip@2x-flip-vs-expired-vblank:
      shard-hsw:          PASS -> FAIL (fdo#105363)

    igt@kms_flip@flip-vs-absolute-wf_vblank-interruptible:
      shard-glk:          PASS -> FAIL (fdo#100368)

    
    ==== Possible fixes ====

    igt@gem_exec_big:
      shard-hsw:          INCOMPLETE (fdo#103540) -> PASS

    igt@kms_atomic_transition@1x-modeset-transitions-nonblocking-fencing:
      shard-glk:          FAIL (fdo#105703) -> PASS +1

    igt@kms_cursor_legacy@2x-nonblocking-modeset-vs-cursor-atomic:
      shard-glk:          FAIL (fdo#106509, fdo#105454) -> PASS

    igt@kms_flip@plain-flip-ts-check:
      shard-glk:          FAIL (fdo#100368) -> PASS

    igt@kms_flip_tiling@flip-to-x-tiled:
      shard-glk:          FAIL (fdo#104724, fdo#103822) -> PASS

    igt@kms_setmode@basic:
      shard-apl:          FAIL (fdo#99912) -> PASS

    
  fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
  fdo#103540 https://bugs.freedesktop.org/show_bug.cgi?id=103540
  fdo#103665 https://bugs.freedesktop.org/show_bug.cgi?id=103665
  fdo#103822 https://bugs.freedesktop.org/show_bug.cgi?id=103822
  fdo#104724 https://bugs.freedesktop.org/show_bug.cgi?id=104724
  fdo#105347 https://bugs.freedesktop.org/show_bug.cgi?id=105347
  fdo#105363 https://bugs.freedesktop.org/show_bug.cgi?id=105363
  fdo#105454 https://bugs.freedesktop.org/show_bug.cgi?id=105454
  fdo#105703 https://bugs.freedesktop.org/show_bug.cgi?id=105703
  fdo#106023 https://bugs.freedesktop.org/show_bug.cgi?id=106023
  fdo#106509 https://bugs.freedesktop.org/show_bug.cgi?id=106509
  fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912


== Participating hosts (5 -> 5) ==

  No changes in participating hosts


== Build changes ==

    * Linux: CI_DRM_4282 -> Patchwork_9216

  CI_DRM_4282: c1064b9be065603680d060184da1a93d404dcf0c @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4508: 78a68c905066beeefd24b3a4518d817a811e8798 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9216: 3526ea28bf9d8a5738b216fe2604a3db05550e86 @ git://anongit.freedesktop.org/gfx-ci/linux

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9216/shards.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* ✓ Fi.CI.IGT: success for Queued/runnable/running engine stats (rev11)
  2018-06-06 12:48 [PATCH v6 0/7] Queued/runnable/running engine stats Tvrtko Ursulin
                   ` (9 preceding siblings ...)
  2018-06-06 16:06 ` ✗ Fi.CI.IGT: failure for Queued/runnable/running engine stats (rev8) Patchwork
@ 2018-06-06 17:26 ` Patchwork
  2018-06-07 13:57 ` ✓ Fi.CI.BAT: success for Queued/runnable/running engine stats (rev14) Patchwork
  2018-06-07 17:55 ` ✓ Fi.CI.IGT: " Patchwork
  12 siblings, 0 replies; 25+ messages in thread
From: Patchwork @ 2018-06-06 17:26 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Queued/runnable/running engine stats (rev11)
URL   : https://patchwork.freedesktop.org/series/36926/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4282_full -> Patchwork_9220_full =

== Summary - WARNING ==

  Minor unknown changes coming with Patchwork_9220_full need to be verified
  manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9220_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/36926/revisions/11/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9220_full:

  === IGT changes ===

    ==== Warnings ====

    igt@gem_exec_schedule@deep-blt:
      shard-kbl:          SKIP -> PASS +1

    igt@gem_exec_schedule@deep-bsd2:
      shard-kbl:          PASS -> SKIP +2

    
== Known issues ==

  Here are the changes found in Patchwork_9220_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@drv_selftest@live_hangcheck:
      shard-apl:          PASS -> DMESG-FAIL (fdo#106560)

    igt@gem_ppgtt@blt-vs-render-ctx0:
      shard-kbl:          PASS -> INCOMPLETE (fdo#103665, fdo#106023)

    
    ==== Possible fixes ====

    igt@drv_selftest@live_gtt:
      shard-glk:          INCOMPLETE (fdo#103359, k.org#198133) -> PASS

    igt@kms_atomic_transition@1x-modeset-transitions-nonblocking-fencing:
      shard-glk:          FAIL (fdo#105703) -> PASS +1

    igt@kms_cursor_legacy@2x-nonblocking-modeset-vs-cursor-atomic:
      shard-glk:          FAIL (fdo#105454, fdo#106509) -> PASS

    igt@kms_flip@flip-vs-expired-vblank:
      shard-hsw:          FAIL (fdo#105189) -> PASS

    igt@kms_flip@plain-flip-fb-recreate-interruptible:
      shard-glk:          FAIL (fdo#100368) -> PASS +1

    igt@kms_flip_tiling@flip-x-tiled:
      shard-glk:          FAIL (fdo#104724, fdo#103822) -> PASS

    igt@kms_flip_tiling@flip-y-tiled:
      shard-glk:          FAIL (fdo#104724) -> PASS

    
  fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
  fdo#103359 https://bugs.freedesktop.org/show_bug.cgi?id=103359
  fdo#103665 https://bugs.freedesktop.org/show_bug.cgi?id=103665
  fdo#103822 https://bugs.freedesktop.org/show_bug.cgi?id=103822
  fdo#104724 https://bugs.freedesktop.org/show_bug.cgi?id=104724
  fdo#105189 https://bugs.freedesktop.org/show_bug.cgi?id=105189
  fdo#105454 https://bugs.freedesktop.org/show_bug.cgi?id=105454
  fdo#105703 https://bugs.freedesktop.org/show_bug.cgi?id=105703
  fdo#106023 https://bugs.freedesktop.org/show_bug.cgi?id=106023
  fdo#106509 https://bugs.freedesktop.org/show_bug.cgi?id=106509
  fdo#106560 https://bugs.freedesktop.org/show_bug.cgi?id=106560
  k.org#198133 https://bugzilla.kernel.org/show_bug.cgi?id=198133


== Participating hosts (5 -> 5) ==

  No changes in participating hosts


== Build changes ==

    * Linux: CI_DRM_4282 -> Patchwork_9220

  CI_DRM_4282: c1064b9be065603680d060184da1a93d404dcf0c @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4508: 78a68c905066beeefd24b3a4518d817a811e8798 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9220: 5a041e661e3d916252b0c672dd517bb66f898277 @ git://anongit.freedesktop.org/gfx-ci/linux

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9220/shards.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 4/7] drm/i915/pmu: Add queued counter
  2018-06-06 14:39   ` Tvrtko Ursulin
@ 2018-06-07 13:24     ` Tvrtko Ursulin
  0 siblings, 0 replies; 25+ messages in thread
From: Tvrtko Ursulin @ 2018-06-07 13:24 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We add a PMU counter to expose the number of requests which have been
submitted from userspace but are not yet runnable due to dependencies and
unsignaled fences.

This is useful to analyze the overall load of the system.
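
For illustration, a minimal userspace sketch of reading the new counter
through perf_event_open() using the I915_PMU_ENGINE_QUEUED() config macro
added below. The sysfs lookup of the dynamic PMU type and the include path
for the uapi header may differ per system, it may need root depending on
perf_event_paranoid, and error handling is kept deliberately minimal:

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>
#include <drm/i915_drm.h>

static uint64_t read_counter(int fd)
{
	uint64_t val = 0;

	if (read(fd, &val, sizeof(val)) != sizeof(val))
		perror("read");

	return val;
}

int main(void)
{
	struct perf_event_attr attr;
	uint64_t before, after;
	unsigned int type;
	FILE *file;
	int fd;

	/* The i915 PMU exports its dynamically assigned type id in sysfs. */
	file = fopen("/sys/bus/event_source/devices/i915/type", "r");
	if (!file || fscanf(file, "%u", &type) != 1)
		return 1;
	fclose(file);

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = type;
	attr.config = I915_PMU_ENGINE_QUEUED(0, 0); /* class 0/instance 0 = rcs0 */

	/* i915 is an uncore PMU: open system-wide (pid = -1) on one CPU. */
	fd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);
	if (fd < 0)
		return 1;

	before = read_counter(fd);
	sleep(1);
	after = read_counter(fd);

	/* 0.001 is the scale exported for this event; the delta over one
	 * second is then the average queue depth in that interval.
	 */
	printf("queued: %.2f\n", (after - before) * 0.001);

	close(fd);

	return 0;
}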

v2:
 * Rebase for name change and re-order.
 * Drop floating point constant. (Chris Wilson)

v3:
 * Change scale to 1024 for faster arithmetic. (Chris Wilson)

v4:
 * Refactored for timer period accounting.

v5:
 * Avoid 64-division. (Chris Wilson)

v6:
 * Do fewer divisions by accumulating in qd.ns units. (Chris Wilson)
 * Change counter scale to avoid multiplication in readout and increase
   counter headroom.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c         | 58 ++++++++++++++++++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 include/uapi/drm/i915_drm.h             |  9 +++-
 3 files changed, 57 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index b8c6953867ee..f8a819600ebc 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -15,7 +15,8 @@
 #define ENGINE_SAMPLE_MASK \
 	(BIT(I915_SAMPLE_BUSY) | \
 	 BIT(I915_SAMPLE_WAIT) | \
-	 BIT(I915_SAMPLE_SEMA))
+	 BIT(I915_SAMPLE_SEMA) | \
+	 BIT(I915_SAMPLE_QUEUED))
 
 #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
 
@@ -161,6 +162,12 @@ add_sample(struct i915_pmu_sample *sample, u32 val)
 	sample->cur += val;
 }
 
+static void
+add_sample_mult(struct i915_pmu_sample *sample, u32 val, u32 mul)
+{
+	sample->cur += mul_u32_u32(val, mul);
+}
+
 static void
 engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 {
@@ -204,6 +211,11 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 		if (val & RING_WAIT_SEMAPHORE)
 			add_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
 				   period_ns);
+
+		if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED))
+			add_sample_mult(&engine->pmu.sample[I915_SAMPLE_QUEUED],
+					atomic_read(&engine->request_stats.queued),
+					period_ns);
 	}
 
 	if (fw)
@@ -212,12 +224,6 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 	intel_runtime_pm_put(dev_priv);
 }
 
-static void
-add_sample_mult(struct i915_pmu_sample *sample, u32 val, u32 mul)
-{
-	sample->cur += mul_u32_u32(val, mul);
-}
-
 static void
 frequency_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 {
@@ -323,6 +329,7 @@ engine_event_status(struct intel_engine_cs *engine,
 	switch (sample) {
 	case I915_SAMPLE_BUSY:
 	case I915_SAMPLE_WAIT:
+	case I915_SAMPLE_QUEUED:
 		break;
 	case I915_SAMPLE_SEMA:
 		if (INTEL_GEN(engine->i915) < 6)
@@ -540,6 +547,15 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 			val = ktime_to_ns(intel_engine_get_busy_time(engine));
 		} else {
 			val = engine->pmu.sample[sample].cur;
+
+			if (sample == I915_SAMPLE_QUEUED) {
+				BUILD_BUG_ON(NSEC_PER_SEC %
+					     I915_SAMPLE_QUEUED_DIVISOR);
+				/* to qd */
+				val = div_u64(val,
+					      NSEC_PER_SEC /
+					      I915_SAMPLE_QUEUED_DIVISOR);
+			}
 		}
 	} else {
 		switch (event->attr.config) {
@@ -796,6 +812,16 @@ static const struct attribute_group *i915_pmu_attr_groups[] = {
 { \
 	.sample = (__sample), \
 	.name = (__name), \
+	.suffix = "unit", \
+	.value = "ns", \
+}
+
+#define __engine_event_scale(__sample, __name, __scale) \
+{ \
+	.sample = (__sample), \
+	.name = (__name), \
+	.suffix = "scale", \
+	.value = (__scale), \
 }
 
 static struct i915_ext_attribute *
@@ -823,6 +849,9 @@ add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name,
 	return ++attr;
 }
 
+/* No brackets or quotes below please. */
+#define I915_SAMPLE_QUEUED_SCALE 0.001
+
 static struct attribute **
 create_event_attributes(struct drm_i915_private *i915)
 {
@@ -839,10 +868,14 @@ create_event_attributes(struct drm_i915_private *i915)
 	static const struct {
 		enum drm_i915_pmu_engine_sample sample;
 		char *name;
+		char *suffix;
+		char *value;
 	} engine_events[] = {
 		__engine_event(I915_SAMPLE_BUSY, "busy"),
 		__engine_event(I915_SAMPLE_SEMA, "sema"),
 		__engine_event(I915_SAMPLE_WAIT, "wait"),
+		__engine_event_scale(I915_SAMPLE_QUEUED, "queued",
+				     __stringify(I915_SAMPLE_QUEUED_SCALE)),
 	};
 	unsigned int count = 0;
 	struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
@@ -852,6 +885,9 @@ create_event_attributes(struct drm_i915_private *i915)
 	enum intel_engine_id id;
 	unsigned int i;
 
+	BUILD_BUG_ON(I915_SAMPLE_QUEUED_DIVISOR !=
+		     (1 / I915_SAMPLE_QUEUED_SCALE));
+
 	/* Count how many counters we will be exposing. */
 	for (i = 0; i < ARRAY_SIZE(events); i++) {
 		if (!config_status(i915, events[i].config))
@@ -929,13 +965,15 @@ create_event_attributes(struct drm_i915_private *i915)
 								engine->instance,
 								engine_events[i].sample));
 
-			str = kasprintf(GFP_KERNEL, "%s-%s.unit",
-					engine->name, engine_events[i].name);
+			str = kasprintf(GFP_KERNEL, "%s-%s.%s",
+					engine->name, engine_events[i].name,
+					engine_events[i].suffix);
 			if (!str)
 				goto err;
 
 			*attr_iter++ = &pmu_iter->attr.attr;
-			pmu_iter = add_pmu_attr(pmu_iter, str, "ns");
+			pmu_iter = add_pmu_attr(pmu_iter, str,
+						engine_events[i].value);
 		}
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index eeb7f3662195..902b63eeaf50 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -420,7 +420,7 @@ struct intel_engine_cs {
 		 *
 		 * Our internal timer stores the current counters in this field.
 		 */
-#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_SEMA + 1)
+#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_QUEUED + 1)
 		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX];
 	} pmu;
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 7f5634ce8e88..d01a26160a89 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -110,9 +110,13 @@ enum drm_i915_gem_engine_class {
 enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_BUSY = 0,
 	I915_SAMPLE_WAIT = 1,
-	I915_SAMPLE_SEMA = 2
+	I915_SAMPLE_SEMA = 2,
+	I915_SAMPLE_QUEUED = 3
 };
 
+ /* Divide counter value by divisor to get the real value. */
+#define I915_SAMPLE_QUEUED_DIVISOR (1000)
+
 #define I915_PMU_SAMPLE_BITS (4)
 #define I915_PMU_SAMPLE_MASK (0xf)
 #define I915_PMU_SAMPLE_INSTANCE_BITS (8)
@@ -133,6 +137,9 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_SEMA(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
 
+#define I915_PMU_ENGINE_QUEUED(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED)
+
 #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 5/7] drm/i915/pmu: Add runnable counter
  2018-06-06 14:39   ` Tvrtko Ursulin
@ 2018-06-07 13:24     ` Tvrtko Ursulin
  0 siblings, 0 replies; 25+ messages in thread
From: Tvrtko Ursulin @ 2018-06-07 13:24 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We add a PMU counter to expose the number of requests with resolved
dependencies waiting for a slot on the GPU to run.

This is useful to analyze the overall load of the system.

v2: Don't limit to gen8+.

v3:
 * Rebase for dynamic sysfs.
 * Drop currently executing requests.

v4:
 * Sync with internal renaming.
 * Drop floating point constant. (Chris Wilson)

v5:
 * Change scale to 1024 for faster arithmetic. (Chris Wilson)

v6:
 * Refactored for timer period accounting.

v7:
 * Avoid 64-division. (Chris Wilson)

v8:
 * Do fewer divisions by accumulating in qd.ns units. (Chris Wilson)
 * Change counter scale to avoid multiplication in readout and increase
   counter headroom.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c         | 20 ++++++++++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 include/uapi/drm/i915_drm.h             |  7 ++++++-
 3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index f8a819600ebc..bdfb430909b4 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -16,7 +16,8 @@
 	(BIT(I915_SAMPLE_BUSY) | \
 	 BIT(I915_SAMPLE_WAIT) | \
 	 BIT(I915_SAMPLE_SEMA) | \
-	 BIT(I915_SAMPLE_QUEUED))
+	 BIT(I915_SAMPLE_QUEUED) | \
+	 BIT(I915_SAMPLE_RUNNABLE))
 
 #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
 
@@ -216,6 +217,11 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 			add_sample_mult(&engine->pmu.sample[I915_SAMPLE_QUEUED],
 					atomic_read(&engine->request_stats.queued),
 					period_ns);
+
+		if (engine->pmu.enable & BIT(I915_SAMPLE_RUNNABLE))
+			add_sample_mult(&engine->pmu.sample[I915_SAMPLE_RUNNABLE],
+					engine->request_stats.runnable,
+					period_ns);
 	}
 
 	if (fw)
@@ -330,6 +336,7 @@ engine_event_status(struct intel_engine_cs *engine,
 	case I915_SAMPLE_BUSY:
 	case I915_SAMPLE_WAIT:
 	case I915_SAMPLE_QUEUED:
+	case I915_SAMPLE_RUNNABLE:
 		break;
 	case I915_SAMPLE_SEMA:
 		if (INTEL_GEN(engine->i915) < 6)
@@ -548,9 +555,12 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 		} else {
 			val = engine->pmu.sample[sample].cur;
 
-			if (sample == I915_SAMPLE_QUEUED) {
+			if (sample == I915_SAMPLE_QUEUED ||
+			    sample == I915_SAMPLE_RUNNABLE) {
 				BUILD_BUG_ON(NSEC_PER_SEC %
 					     I915_SAMPLE_QUEUED_DIVISOR);
+				BUILD_BUG_ON(I915_SAMPLE_QUEUED_DIVISOR !=
+					     I915_SAMPLE_RUNNABLE_DIVISOR);
 				/* to qd */
 				val = div_u64(val,
 					      NSEC_PER_SEC /
@@ -851,6 +861,7 @@ add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name,
 
 /* No brackets or quotes below please. */
 #define I915_SAMPLE_QUEUED_SCALE 0.001
+#define I915_SAMPLE_RUNNABLE_SCALE 0.001
 
 static struct attribute **
 create_event_attributes(struct drm_i915_private *i915)
@@ -876,6 +887,8 @@ create_event_attributes(struct drm_i915_private *i915)
 		__engine_event(I915_SAMPLE_WAIT, "wait"),
 		__engine_event_scale(I915_SAMPLE_QUEUED, "queued",
 				     __stringify(I915_SAMPLE_QUEUED_SCALE)),
+		__engine_event_scale(I915_SAMPLE_RUNNABLE, "runnable",
+				     __stringify(I915_SAMPLE_RUNNABLE_SCALE)),
 	};
 	unsigned int count = 0;
 	struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
@@ -888,6 +901,9 @@ create_event_attributes(struct drm_i915_private *i915)
 	BUILD_BUG_ON(I915_SAMPLE_QUEUED_DIVISOR !=
 		     (1 / I915_SAMPLE_QUEUED_SCALE));
 
+	BUILD_BUG_ON(I915_SAMPLE_RUNNABLE_DIVISOR !=
+		     (1 / I915_SAMPLE_RUNNABLE_SCALE));
+
 	/* Count how many counters we will be exposing. */
 	for (i = 0; i < ARRAY_SIZE(events); i++) {
 		if (!config_status(i915, events[i].config))
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 902b63eeaf50..703cea694f0d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -420,7 +420,7 @@ struct intel_engine_cs {
 		 *
 		 * Our internal timer stores the current counters in this field.
 		 */
-#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_QUEUED + 1)
+#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_RUNNABLE + 1)
 		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX];
 	} pmu;
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index d01a26160a89..11a5822dbc4d 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -111,11 +111,13 @@ enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_BUSY = 0,
 	I915_SAMPLE_WAIT = 1,
 	I915_SAMPLE_SEMA = 2,
-	I915_SAMPLE_QUEUED = 3
+	I915_SAMPLE_QUEUED = 3,
+	I915_SAMPLE_RUNNABLE = 4,
 };
 
  /* Divide counter value by divisor to get the real value. */
 #define I915_SAMPLE_QUEUED_DIVISOR (1000)
+#define I915_SAMPLE_RUNNABLE_DIVISOR (1000)
 
 #define I915_PMU_SAMPLE_BITS (4)
 #define I915_PMU_SAMPLE_MASK (0xf)
@@ -140,6 +142,9 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_QUEUED(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED)
 
+#define I915_PMU_ENGINE_RUNNABLE(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNABLE)
+
 #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 6/7] drm/i915/pmu: Add running counter
  2018-06-06 14:40   ` Tvrtko Ursulin
  2018-06-06 15:23     ` Chris Wilson
@ 2018-06-07 13:25     ` Tvrtko Ursulin
  2018-06-07 14:45       ` Chris Wilson
  1 sibling, 1 reply; 25+ messages in thread
From: Tvrtko Ursulin @ 2018-06-07 13:25 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We add a PMU counter to expose the number of requests currently executing
on the GPU.

This is useful to analyze the overall load of the system.

v2:
 * Rebase.
 * Drop floating point constant. (Chris Wilson)

v3:
 * Change scale to 1024 for faster arithmetic. (Chris Wilson)

v4:
 * Refactored for timer period accounting.

v5:
 * Avoid 64-division. (Chris Wilson)

v6:
 * Do fewer divisions by accumulating in qd.ns units. (Chris Wilson)
 * Change counter scale to avoid multiplication in readout and increase
   counter headroom.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c         | 20 ++++++++++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 include/uapi/drm/i915_drm.h             |  5 +++++
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index bdfb430909b4..73b6fe7cc6af 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -17,7 +17,8 @@
 	 BIT(I915_SAMPLE_WAIT) | \
 	 BIT(I915_SAMPLE_SEMA) | \
 	 BIT(I915_SAMPLE_QUEUED) | \
-	 BIT(I915_SAMPLE_RUNNABLE))
+	 BIT(I915_SAMPLE_RUNNABLE) | \
+	 BIT(I915_SAMPLE_RUNNING))
 
 #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
 
@@ -222,6 +223,11 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 			add_sample_mult(&engine->pmu.sample[I915_SAMPLE_RUNNABLE],
 					engine->request_stats.runnable,
 					period_ns);
+
+		if (engine->pmu.enable & BIT(I915_SAMPLE_RUNNING))
+			add_sample_mult(&engine->pmu.sample[I915_SAMPLE_RUNNING],
+					last_seqno - current_seqno,
+					period_ns);
 	}
 
 	if (fw)
@@ -337,6 +343,7 @@ engine_event_status(struct intel_engine_cs *engine,
 	case I915_SAMPLE_WAIT:
 	case I915_SAMPLE_QUEUED:
 	case I915_SAMPLE_RUNNABLE:
+	case I915_SAMPLE_RUNNING:
 		break;
 	case I915_SAMPLE_SEMA:
 		if (INTEL_GEN(engine->i915) < 6)
@@ -556,11 +563,14 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 			val = engine->pmu.sample[sample].cur;
 
 			if (sample == I915_SAMPLE_QUEUED ||
-			    sample == I915_SAMPLE_RUNNABLE) {
+			    sample == I915_SAMPLE_RUNNABLE ||
+			    sample == I915_SAMPLE_RUNNING) {
 				BUILD_BUG_ON(NSEC_PER_SEC %
 					     I915_SAMPLE_QUEUED_DIVISOR);
 				BUILD_BUG_ON(I915_SAMPLE_QUEUED_DIVISOR !=
 					     I915_SAMPLE_RUNNABLE_DIVISOR);
+				BUILD_BUG_ON(I915_SAMPLE_QUEUED_DIVISOR !=
+					     I915_SAMPLE_RUNNING_DIVISOR);
 				/* to qd */
 				val = div_u64(val,
 					      NSEC_PER_SEC /
@@ -862,6 +872,7 @@ add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name,
 /* No brackets or quotes below please. */
 #define I915_SAMPLE_QUEUED_SCALE 0.001
 #define I915_SAMPLE_RUNNABLE_SCALE 0.001
+#define I915_SAMPLE_RUNNING_SCALE 0.001
 
 static struct attribute **
 create_event_attributes(struct drm_i915_private *i915)
@@ -889,6 +900,8 @@ create_event_attributes(struct drm_i915_private *i915)
 				     __stringify(I915_SAMPLE_QUEUED_SCALE)),
 		__engine_event_scale(I915_SAMPLE_RUNNABLE, "runnable",
 				     __stringify(I915_SAMPLE_RUNNABLE_SCALE)),
+		__engine_event_scale(I915_SAMPLE_RUNNING, "running",
+				     __stringify(I915_SAMPLE_RUNNING_SCALE)),
 	};
 	unsigned int count = 0;
 	struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
@@ -904,6 +917,9 @@ create_event_attributes(struct drm_i915_private *i915)
 	BUILD_BUG_ON(I915_SAMPLE_RUNNABLE_DIVISOR !=
 		     (1 / I915_SAMPLE_RUNNABLE_SCALE));
 
+	BUILD_BUG_ON(I915_SAMPLE_RUNNING_DIVISOR !=
+		     (1 / I915_SAMPLE_RUNNING_SCALE));
+
 	/* Count how many counters we will be exposing. */
 	for (i = 0; i < ARRAY_SIZE(events); i++) {
 		if (!config_status(i915, events[i].config))
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 703cea694f0d..bff20cfd6870 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -420,7 +420,7 @@ struct intel_engine_cs {
 		 *
 		 * Our internal timer stores the current counters in this field.
 		 */
-#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_RUNNABLE + 1)
+#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_RUNNING + 1)
 		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX];
 	} pmu;
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 11a5822dbc4d..6ae2ef7c0dde 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -113,11 +113,13 @@ enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_SEMA = 2,
 	I915_SAMPLE_QUEUED = 3,
 	I915_SAMPLE_RUNNABLE = 4,
+	I915_SAMPLE_RUNNING = 5,
 };
 
  /* Divide counter value by divisor to get the real value. */
 #define I915_SAMPLE_QUEUED_DIVISOR (1000)
 #define I915_SAMPLE_RUNNABLE_DIVISOR (1000)
+#define I915_SAMPLE_RUNNING_DIVISOR (1000)
 
 #define I915_PMU_SAMPLE_BITS (4)
 #define I915_PMU_SAMPLE_MASK (0xf)
@@ -145,6 +147,9 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_RUNNABLE(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNABLE)
 
+#define I915_PMU_ENGINE_RUNNING(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNING)
+
 #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* ✓ Fi.CI.BAT: success for Queued/runnable/running engine stats (rev14)
  2018-06-06 12:48 [PATCH v6 0/7] Queued/runnable/running engine stats Tvrtko Ursulin
                   ` (10 preceding siblings ...)
  2018-06-06 17:26 ` ✓ Fi.CI.IGT: success for Queued/runnable/running engine stats (rev11) Patchwork
@ 2018-06-07 13:57 ` Patchwork
  2018-06-07 17:55 ` ✓ Fi.CI.IGT: " Patchwork
  12 siblings, 0 replies; 25+ messages in thread
From: Patchwork @ 2018-06-07 13:57 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Queued/runnable/running engine stats (rev14)
URL   : https://patchwork.freedesktop.org/series/36926/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4289 -> Patchwork_9228 =

== Summary - WARNING ==

  Minor unknown changes coming with Patchwork_9228 need to be verified
  manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9228, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/36926/revisions/14/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9228:

  === IGT changes ===

    ==== Warnings ====

    igt@gem_exec_gttfill@basic:
      fi-pnv-d510:        SKIP -> PASS

    igt@kms_pipe_crc_basic@read-crc-pipe-c:
      fi-glk-j4005:       SKIP -> PASS

    
== Known issues ==

  Here are the changes found in Patchwork_9228 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@drv_module_reload@basic-reload:
      fi-ilk-650:         PASS -> DMESG-WARN (fdo#106387) +2

    igt@gem_mmap_gtt@basic-small-bo-tiledx:
      fi-gdg-551:         PASS -> FAIL (fdo#102575)

    igt@kms_flip@basic-flip-vs-dpms:
      fi-glk-j4005:       PASS -> DMESG-WARN (fdo#106097, fdo#106000)

    igt@kms_pipe_crc_basic@nonblocking-crc-pipe-b-frame-sequence:
      fi-cfl-s3:          PASS -> FAIL (fdo#103481)

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c:
      fi-cnl-psr:         PASS -> DMESG-WARN (fdo#104951)

    
    ==== Possible fixes ====

    igt@gem_exec_suspend@basic-s3:
      fi-glk-j4005:       DMESG-WARN (fdo#106000) -> PASS +1

    igt@gem_sync@basic-many-each:
      fi-cnl-y3:          INCOMPLETE (fdo#105086) -> PASS

    igt@kms_flip@basic-flip-vs-wf_vblank:
      fi-glk-j4005:       FAIL (fdo#100368) -> PASS

    igt@kms_pipe_crc_basic@read-crc-pipe-b-frame-sequence:
      fi-glk-j4005:       FAIL (fdo#103481) -> PASS

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
      fi-cnl-psr:         DMESG-WARN (fdo#104951) -> PASS

    igt@prime_vgem@basic-fence-flip:
      fi-ilk-650:         FAIL (fdo#104008) -> PASS

    
  fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
  fdo#102575 https://bugs.freedesktop.org/show_bug.cgi?id=102575
  fdo#103481 https://bugs.freedesktop.org/show_bug.cgi?id=103481
  fdo#104008 https://bugs.freedesktop.org/show_bug.cgi?id=104008
  fdo#104951 https://bugs.freedesktop.org/show_bug.cgi?id=104951
  fdo#105086 https://bugs.freedesktop.org/show_bug.cgi?id=105086
  fdo#106000 https://bugs.freedesktop.org/show_bug.cgi?id=106000
  fdo#106097 https://bugs.freedesktop.org/show_bug.cgi?id=106097
  fdo#106387 https://bugs.freedesktop.org/show_bug.cgi?id=106387


== Participating hosts (40 -> 37) ==

  Missing    (3): fi-ilk-m540 fi-byt-squawks fi-skl-6700hq 


== Build changes ==

    * Linux: CI_DRM_4289 -> Patchwork_9228

  CI_DRM_4289: 0e963d962be75b4e3d3d1c884e1bf4600473096d @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4509: c8f1ae58e1b7da17af4722a5ce5a9cd8b9a34059 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9228: a0bcb68e7c4dfe1f5eeadd5b5aa3c5b63a97b67d @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

a0bcb68e7c4d drm/i915: Engine queues query
96fbd1d34956 drm/i915/pmu: Add running counter
8a1c70584497 drm/i915/pmu: Add runnable counter
a2f6d02d7479 drm/i915/pmu: Add queued counter
d9a4aa2db316 drm/i915: Keep a count of requests submitted from userspace
aaa1b93ef39b drm/i915: Keep a count of requests waiting for a slot on GPU
41210304d3a4 drm/i915/pmu: Fix enable count array size and bounds checking

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9228/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/7] drm/i915/pmu: Add running counter
  2018-06-07 13:25     ` Tvrtko Ursulin
@ 2018-06-07 14:45       ` Chris Wilson
  0 siblings, 0 replies; 25+ messages in thread
From: Chris Wilson @ 2018-06-07 14:45 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

Quoting Tvrtko Ursulin (2018-06-07 14:25:28)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> We add a PMU counter to expose the number of requests currently executing
> on the GPU.
> 
> This is useful to analyze the overall load of the system.
> 
> v2:
>  * Rebase.
>  * Drop floating point constant. (Chris Wilson)
> 
> v3:
>  * Change scale to 1024 for faster arithmetics. (Chris Wilson)
> 
> v4:
>  * Refactored for timer period accounting.
> 
> v5:
>  * Avoid 64-division. (Chris Wilson)
> 
> v6:
>  * Do fewer divisions by accumulating in qd.ns units. (Chris Wilson)
>  * Change counter scale to avoid multiplication in readout and increase
>    counter headroom.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

I can't spot any nits to pick. That means I actually have to review it
now, right?
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* ✓ Fi.CI.IGT: success for Queued/runnable/running engine stats (rev14)
  2018-06-06 12:48 [PATCH v6 0/7] Queued/runnable/running engine stats Tvrtko Ursulin
                   ` (11 preceding siblings ...)
  2018-06-07 13:57 ` ✓ Fi.CI.BAT: success for Queued/runnable/running engine stats (rev14) Patchwork
@ 2018-06-07 17:55 ` Patchwork
  12 siblings, 0 replies; 25+ messages in thread
From: Patchwork @ 2018-06-07 17:55 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Queued/runnable/running engine stats (rev14)
URL   : https://patchwork.freedesktop.org/series/36926/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4289_full -> Patchwork_9228_full =

== Summary - WARNING ==

  Minor unknown changes coming with Patchwork_9228_full need to be verified
  manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9228_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9228_full:

  === IGT changes ===

    ==== Warnings ====

    igt@gem_exec_schedule@deep-bsd2:
      shard-kbl:          PASS -> SKIP +1

    igt@gem_exec_schedule@deep-vebox:
      shard-kbl:          SKIP -> PASS +1

    
== Known issues ==

  Here are the changes found in Patchwork_9228_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@drv_selftest@live_gtt:
      shard-kbl:          PASS -> FAIL (fdo#105347)

    igt@drv_selftest@live_hangcheck:
      shard-apl:          PASS -> DMESG-FAIL (fdo#106560)
      shard-glk:          NOTRUN -> DMESG-FAIL (fdo#106560)

    igt@kms_cursor_legacy@2x-nonblocking-modeset-vs-cursor-atomic:
      shard-glk:          PASS -> FAIL (fdo#106509)

    igt@kms_flip@2x-flip-vs-expired-vblank-interruptible:
      shard-glk:          PASS -> FAIL (fdo#105189)

    igt@kms_flip@2x-plain-flip-fb-recreate-interruptible:
      shard-glk:          PASS -> FAIL (fdo#100368)

    igt@kms_flip_tiling@flip-to-y-tiled:
      shard-glk:          PASS -> FAIL (fdo#104724)

    
    ==== Possible fixes ====

    igt@drv_selftest@live_gtt:
      shard-glk:          INCOMPLETE (k.org#198133, fdo#103359) -> PASS

    igt@kms_atomic_transition@1x-modeset-transitions-nonblocking-fencing:
      shard-glk:          FAIL (fdo#105703) -> PASS +1

    igt@kms_flip@dpms-vs-vblank-race-interruptible:
      shard-hsw:          FAIL (fdo#103060) -> PASS

    igt@kms_flip@plain-flip-ts-check-interruptible:
      shard-hsw:          FAIL (fdo#103928) -> PASS +1

    igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-shrfb-draw-mmap-wc:
      shard-glk:          FAIL (fdo#103167, fdo#104724) -> PASS

    igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes:
      shard-kbl:          INCOMPLETE (fdo#103665) -> PASS

    igt@kms_setmode@basic:
      shard-apl:          FAIL (fdo#99912) -> PASS

    
  fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
  fdo#103060 https://bugs.freedesktop.org/show_bug.cgi?id=103060
  fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
  fdo#103359 https://bugs.freedesktop.org/show_bug.cgi?id=103359
  fdo#103665 https://bugs.freedesktop.org/show_bug.cgi?id=103665
  fdo#103928 https://bugs.freedesktop.org/show_bug.cgi?id=103928
  fdo#104724 https://bugs.freedesktop.org/show_bug.cgi?id=104724
  fdo#105189 https://bugs.freedesktop.org/show_bug.cgi?id=105189
  fdo#105347 https://bugs.freedesktop.org/show_bug.cgi?id=105347
  fdo#105703 https://bugs.freedesktop.org/show_bug.cgi?id=105703
  fdo#106509 https://bugs.freedesktop.org/show_bug.cgi?id=106509
  fdo#106560 https://bugs.freedesktop.org/show_bug.cgi?id=106560
  fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912
  k.org#198133 https://bugzilla.kernel.org/show_bug.cgi?id=198133


== Participating hosts (5 -> 5) ==

  No changes in participating hosts


== Build changes ==

    * Linux: CI_DRM_4289 -> Patchwork_9228

  CI_DRM_4289: 0e963d962be75b4e3d3d1c884e1bf4600473096d @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4509: c8f1ae58e1b7da17af4722a5ce5a9cd8b9a34059 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9228: a0bcb68e7c4dfe1f5eeadd5b5aa3c5b63a97b67d @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9228/shards.html

Thread overview: 25+ messages
2018-06-06 12:48 [PATCH v6 0/7] Queued/runnable/running engine stats Tvrtko Ursulin
2018-06-06 12:48 ` [PATCH 1/7] drm/i915/pmu: Fix enable count array size and bounds checking Tvrtko Ursulin
2018-06-06 12:48 ` [PATCH 2/7] drm/i915: Keep a count of requests waiting for a slot on GPU Tvrtko Ursulin
2018-06-06 12:48 ` [PATCH 3/7] drm/i915: Keep a count of requests submitted from userspace Tvrtko Ursulin
2018-06-06 12:48 ` [PATCH 4/7] drm/i915/pmu: Add queued counter Tvrtko Ursulin
2018-06-06 13:16   ` Chris Wilson
2018-06-06 13:24     ` Tvrtko Ursulin
2018-06-06 14:39   ` Tvrtko Ursulin
2018-06-07 13:24     ` Tvrtko Ursulin
2018-06-06 12:48 ` [PATCH 5/7] drm/i915/pmu: Add runnable counter Tvrtko Ursulin
2018-06-06 14:39   ` Tvrtko Ursulin
2018-06-07 13:24     ` Tvrtko Ursulin
2018-06-06 12:48 ` [PATCH 6/7] drm/i915/pmu: Add running counter Tvrtko Ursulin
2018-06-06 14:40   ` Tvrtko Ursulin
2018-06-06 15:23     ` Chris Wilson
2018-06-06 15:52       ` Tvrtko Ursulin
2018-06-07 13:25     ` Tvrtko Ursulin
2018-06-07 14:45       ` Chris Wilson
2018-06-06 12:48 ` [PATCH 7/7] drm/i915: Engine queues query Tvrtko Ursulin
2018-06-06 13:31 ` ✓ Fi.CI.BAT: success for Queued/runnable/running engine stats (rev8) Patchwork
2018-06-06 15:17 ` ✓ Fi.CI.BAT: success for Queued/runnable/running engine stats (rev11) Patchwork
2018-06-06 16:06 ` ✗ Fi.CI.IGT: failure for Queued/runnable/running engine stats (rev8) Patchwork
2018-06-06 17:26 ` ✓ Fi.CI.IGT: success for Queued/runnable/running engine stats (rev11) Patchwork
2018-06-07 13:57 ` ✓ Fi.CI.BAT: success for Queued/runnable/running engine stats (rev14) Patchwork
2018-06-07 17:55 ` ✓ Fi.CI.IGT: " Patchwork
