* [RFC 00/13] 21st century intel_gpu_top
@ 2018-10-03 12:03 Tvrtko Ursulin
From: Tvrtko Ursulin @ 2018-10-03 12:03 UTC
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

A collection of patches I have sent before, sometimes together and sometimes
separately, which enable intel_gpu_top to report queue depths (which also
translate into an overall GPU load average) and per-DRM-client, per-engine
busyness.

This enables a fancy intel_gpu_top which looks like this (a picture is worth a
thousand words):

intel-gpu-top - load avg  3.30,  1.51,  0.08;  949/ 949 MHz;    0% RC6;  14.66 Watts;     3605 irqs/s

      IMC reads:     4651 MiB/s
     IMC writes:       25 MiB/s

          ENGINE      BUSY                                                                                Q   r   R MI_SEMA MI_WAIT
     Render/3D/0    61.51% |█████████████████████████████████████████████▌                            |   3   0   1      0%      0%
       Blitter/0     0.00% |                                                                          |   0   0   0      0%      0%
         Video/0    60.86% |█████████████████████████████████████████████                             |   1   0   1      0%      0%
         Video/1    59.04% |███████████████████████████████████████████▋                              |   1   0   1      0%      0%
  VideoEnhance/0     0.00% |                                                                          |   0   0   0      0%      0%

  PID            NAME     Render/3D/0            Blitter/0              Video/0               Video/1            VideoEnhance/0
23373        gem_wsim |█████▎              ||                    ||████████▍           ||█████▎              ||                    |
23374        gem_wsim |███▉                ||                    ||██▏                 ||███                 ||                    |
23375        gem_wsim |███                 ||                    ||█▍                  ||███▌                ||                    |

All of this work actually came about via different feature requests which did
not directly ask for it, such as an engine queue depth query and a per-context
engine busyness ioctl. Those bits need userspace which does not exist yet, so
I have removed them from this posting to avoid confusion.

What remains is a set of patches which add some PMU counters and a completely
new sysfs interface to enable intel_gpu_top to read the per-client stats.

The IGT counterpart will be sent separately.
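
For illustration, a minimal sketch of how userspace can read one of the new
counters, in this case the render engine queue depth feeding the Q column
above (r and R map to the runnable and running counters). This assumes the
I915_PMU_ENGINE_QUEUED() macro and divisor added in patch 4 are visible via
an updated uapi header, and trims error handling:

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>
#include <drm/i915_drm.h>

static int open_i915_event(uint64_t config)
{
	struct perf_event_attr attr;
	FILE *file;

	memset(&attr, 0, sizeof(attr));

	/* Dynamic PMU type as published by perf core in sysfs. */
	file = fopen("/sys/bus/event_source/devices/i915/type", "r");
	if (!file)
		return -1;
	if (fscanf(file, "%u", &attr.type) != 1) {
		fclose(file);
		return -1;
	}
	fclose(file);

	attr.size = sizeof(attr);
	attr.config = config;

	/* i915 is an uncore PMU: pid must be -1, any online CPU works. */
	return syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);
}

int main(void)
{
	uint64_t val;
	int fd;

	fd = open_i915_event(I915_PMU_ENGINE_QUEUED(I915_ENGINE_CLASS_RENDER,
						    0));
	if (fd < 0 || read(fd, &val, sizeof(val)) != sizeof(val))
		return 1;

	/* Cumulative value; tools sample twice and report the delta. */
	printf("queued: %.3f\n", (double)val / I915_SAMPLE_QUEUED_DIVISOR);
	close(fd);

	return 0;
}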

Tvrtko Ursulin (13):
  drm/i915/pmu: Fix enable count array size and bounds checking
  drm/i915: Keep a count of requests waiting for a slot on GPU
  drm/i915: Keep a count of requests submitted from userspace
  drm/i915/pmu: Add queued counter
  drm/i915/pmu: Add runnable counter
  drm/i915/pmu: Add running counter
  drm/i915: Store engine backpointer in the intel_context
  drm/i915: Move intel_engine_context_in/out into intel_lrc.c
  drm/i915: Track per-context engine busyness
  drm/i915: Expose list of clients in sysfs
  drm/i915: Update client name on context create
  drm/i915: Expose per-engine client busyness
  drm/i915: Add sysfs toggle to enable per-client engine stats

 drivers/gpu/drm/i915/i915_drv.h         |  39 +++++
 drivers/gpu/drm/i915/i915_gem.c         | 197 +++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_context.c |  18 ++-
 drivers/gpu/drm/i915/i915_gem_context.h |  18 +++
 drivers/gpu/drm/i915/i915_pmu.c         | 103 +++++++++++--
 drivers/gpu/drm/i915/i915_request.c     |  10 ++
 drivers/gpu/drm/i915/i915_sysfs.c       |  81 ++++++++++
 drivers/gpu/drm/i915/intel_engine_cs.c  |  33 +++-
 drivers/gpu/drm/i915/intel_lrc.c        | 109 ++++++++++++-
 drivers/gpu/drm/i915/intel_ringbuffer.h |  76 +++------
 include/uapi/drm/i915_drm.h             |  19 ++-
 11 files changed, 614 insertions(+), 89 deletions(-)

-- 
2.17.1

* [RFC 01/13] drm/i915/pmu: Fix enable count array size and bounds checking
From: Tvrtko Ursulin @ 2018-10-03 12:03 UTC
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

The enable count array is supposed to have one counter for each possible
engine sampler. As such, the array sizing and bounds checking are not correct
once more engine samplers are added.

At the same time, tidy the asserts for readability and robustness.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: b46a33e271ed ("drm/i915/pmu: Expose a PMU interface for perf queries")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_pmu.c         | 13 +++++++++----
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index d6c8f8fdfda5..417fda7208be 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -594,7 +594,8 @@ static void i915_pmu_enable(struct perf_event *event)
 	 * Update the bitmask of enabled events and increment
 	 * the event reference counter.
 	 */
-	GEM_BUG_ON(bit >= I915_PMU_MASK_BITS);
+	BUILD_BUG_ON(ARRAY_SIZE(i915->pmu.enable_count) != I915_PMU_MASK_BITS);
+	GEM_BUG_ON(bit >= ARRAY_SIZE(i915->pmu.enable_count));
 	GEM_BUG_ON(i915->pmu.enable_count[bit] == ~0);
 	i915->pmu.enable |= BIT_ULL(bit);
 	i915->pmu.enable_count[bit]++;
@@ -618,7 +619,10 @@ static void i915_pmu_enable(struct perf_event *event)
 		GEM_BUG_ON(!engine);
 		engine->pmu.enable |= BIT(sample);
 
-		GEM_BUG_ON(sample >= I915_PMU_SAMPLE_BITS);
+		BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.enable_count) !=
+			     (1 << I915_PMU_SAMPLE_BITS));
+		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.enable_count));
+		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
 		GEM_BUG_ON(engine->pmu.enable_count[sample] == ~0);
 		engine->pmu.enable_count[sample]++;
 	}
@@ -650,7 +654,8 @@ static void i915_pmu_disable(struct perf_event *event)
 						  engine_event_class(event),
 						  engine_event_instance(event));
 		GEM_BUG_ON(!engine);
-		GEM_BUG_ON(sample >= I915_PMU_SAMPLE_BITS);
+		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.enable_count));
+		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
 		GEM_BUG_ON(engine->pmu.enable_count[sample] == 0);
 		/*
 		 * Decrement the reference count and clear the enabled
@@ -660,7 +665,7 @@ static void i915_pmu_disable(struct perf_event *event)
 			engine->pmu.enable &= ~BIT(sample);
 	}
 
-	GEM_BUG_ON(bit >= I915_PMU_MASK_BITS);
+	GEM_BUG_ON(bit >= ARRAY_SIZE(i915->pmu.enable_count));
 	GEM_BUG_ON(i915->pmu.enable_count[bit] == 0);
 	/*
 	 * Decrement the reference count and clear the enabled
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index f6ec48a75a69..7078132fc631 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -432,7 +432,7 @@ struct intel_engine_cs {
 		 *
 		 * Index number corresponds to the bit number from @enable.
 		 */
-		unsigned int enable_count[I915_PMU_SAMPLE_BITS];
+		unsigned int enable_count[1 << I915_PMU_SAMPLE_BITS];
 		/**
 		 * @sample: Counter values for sampling events.
 		 *
-- 
2.17.1

* [RFC 02/13] drm/i915: Keep a count of requests waiting for a slot on GPU
From: Tvrtko Ursulin @ 2018-10-03 12:03 UTC
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Keep a per-engine count of runnable (waiting for GPU time) requests.

We choose to manage the runnable counter at the backend level instead of in
the request submit_notify callback. The latter would be more consolidated and
less code, but it would require either making the counter an atomic_t or
taking the engine->timeline->lock in submit_notify. So the choice is to do it
at the backend level, for the benefit of fewer atomic instructions.

v2:
 * Move queued increment from insert_request to execlist_submit_request to
   avoid bumping when re-ordering for priority.
 * Support the counter on the ringbuffer submission path as well, albeit
   just notionally. (Chris Wilson)

v3:
 * Rebase.

v4:
 * Rename and move the stats into a container structure. (Chris Wilson)

v5:
 * Re-order fields in struct intel_engine_cs. (Chris Wilson)

v6-v8:
 * Rebases.

v9:
 * Fix accounting during wedging.

v10:
 * Improved commit message. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
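A userspace analogy of the trade-off above, not kernel code: when every
update site already holds the same lock, a plain integer suffices, whereas
counting from submit_notify would pay for an atomic read-modify-write on
every request:

#include <pthread.h>
#include <stdatomic.h>

static pthread_mutex_t timeline_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int runnable;		/* protected by timeline_lock */
static atomic_uint runnable_atomic;	/* lock-free alternative */

/* Chosen shape: the backend paths already hold the lock, plain ++ is safe. */
static void count_in_backend(void)
{
	pthread_mutex_lock(&timeline_lock);
	runnable++;
	pthread_mutex_unlock(&timeline_lock);
}

/* Rejected shape: callable from any context, but needs an atomic. */
static void count_in_submit_notify(void)
{
	atomic_fetch_add(&runnable_atomic, 1);
}

int main(void)
{
	count_in_backend();
	count_in_submit_notify();
	return 0;
}
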
 drivers/gpu/drm/i915/i915_gem.c         | 1 +
 drivers/gpu/drm/i915/i915_request.c     | 7 +++++++
 drivers/gpu/drm/i915/intel_engine_cs.c  | 5 +++--
 drivers/gpu/drm/i915/intel_lrc.c        | 1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h | 9 +++++++++
 5 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7d45e71100bc..d3a730f6ef65 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3309,6 +3309,7 @@ static void nop_complete_submit_request(struct i915_request *request)
 	dma_fence_set_error(&request->fence, -EIO);
 
 	spin_lock_irqsave(&request->engine->timeline.lock, flags);
+	request->engine->request_stats.runnable++;
 	__i915_request_submit(request);
 	intel_engine_init_global_seqno(request->engine, request->global_seqno);
 	spin_unlock_irqrestore(&request->engine->timeline.lock, flags);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index abd4dacbab8e..689f838e849c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -457,6 +457,9 @@ void __i915_request_submit(struct i915_request *request)
 	/* Transfer from per-context onto the global per-engine timeline */
 	move_to_timeline(request, &engine->timeline);
 
+	GEM_BUG_ON(engine->request_stats.runnable == 0);
+	engine->request_stats.runnable--;
+
 	trace_i915_request_execute(request);
 
 	wake_up_all(&request->execute);
@@ -470,6 +473,8 @@ void i915_request_submit(struct i915_request *request)
 	/* Will be called from irq-context when using foreign fences. */
 	spin_lock_irqsave(&engine->timeline.lock, flags);
 
+	engine->request_stats.runnable++;
+
 	__i915_request_submit(request);
 
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
@@ -507,6 +512,8 @@ void __i915_request_unsubmit(struct i915_request *request)
 	/* Transfer back from the global per-engine timeline to per-context */
 	move_to_timeline(request, request->timeline);
 
+	engine->request_stats.runnable++;
+
 	/*
 	 * We don't need to wake_up any waiters on request->execute, they
 	 * will get woken by any other event or us re-adding this request
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 1c6143bdf5a4..f46ef765aed0 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1460,11 +1460,12 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	if (i915_terminally_wedged(&engine->i915->gpu_error))
 		drm_printf(m, "*** WEDGED ***\n");
 
-	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms]\n",
+	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], runnable %u\n",
 		   intel_engine_get_seqno(engine),
 		   intel_engine_last_submit(engine),
 		   engine->hangcheck.seqno,
-		   jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp));
+		   jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp),
+		   engine->request_stats.runnable);
 	drm_printf(m, "\tReset count: %d (global %d)\n",
 		   i915_reset_engine_count(error, engine),
 		   i915_reset_count(error));
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 28d56387edf5..f0c2673fce49 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1023,6 +1023,7 @@ static void queue_request(struct intel_engine_cs *engine,
 			  int prio)
 {
 	list_add_tail(&node->link, i915_sched_lookup_priolist(engine, prio));
+	engine->request_stats.runnable++;
 }
 
 static void __submit_queue_imm(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 7078132fc631..07491b6c7796 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -380,6 +380,15 @@ struct intel_engine_cs {
 	struct drm_i915_gem_object *default_state;
 	void *pinned_default_state;
 
+	struct {
+		/**
+		 * @runnable: Number of runnable requests sent to the backend.
+		 *
+		 * Count of requests waiting for the GPU to execute them.
+		 */
+		unsigned int runnable;
+	} request_stats;
+
 	unsigned long irq_posted;
 #define ENGINE_IRQ_BREADCRUMB 0
 
-- 
2.17.1

* [RFC 03/13] drm/i915: Keep a count of requests submitted from userspace
From: Tvrtko Ursulin @ 2018-10-03 12:03 UTC
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Keep a count of requests submitted from userspace and not yet runnable due to
unresolved dependencies.

v2: Rename and move under the container struct. (Chris Wilson)
v3: Rebase.
v4: Move decrement site to the backend to shrink the window of double-
    accounting as much as possible. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c     | 3 +++
 drivers/gpu/drm/i915/intel_engine_cs.c  | 3 ++-
 drivers/gpu/drm/i915/intel_lrc.c        | 1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h | 8 ++++++++
 4 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 689f838e849c..97a157f47a87 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -474,6 +474,7 @@ void i915_request_submit(struct i915_request *request)
 	spin_lock_irqsave(&engine->timeline.lock, flags);
 
 	engine->request_stats.runnable++;
+	atomic_dec(&engine->request_stats.queued);
 
 	__i915_request_submit(request);
 
@@ -1036,6 +1037,8 @@ void i915_request_add(struct i915_request *request)
 	}
 	request->emitted_jiffies = jiffies;
 
+	atomic_inc(&engine->request_stats.queued);
+
 	/*
 	 * Let the backend know a new request has arrived that may need
 	 * to adjust the existing execution schedule due to a high priority
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index f46ef765aed0..1b3f562246c9 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1460,11 +1460,12 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	if (i915_terminally_wedged(&engine->i915->gpu_error))
 		drm_printf(m, "*** WEDGED ***\n");
 
-	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], runnable %u\n",
+	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], queued %u, runnable %u\n",
 		   intel_engine_get_seqno(engine),
 		   intel_engine_last_submit(engine),
 		   engine->hangcheck.seqno,
 		   jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp),
+		   atomic_read(&engine->request_stats.queued),
 		   engine->request_stats.runnable);
 	drm_printf(m, "\tReset count: %d (global %d)\n",
 		   i915_reset_engine_count(error, engine),
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f0c2673fce49..3e8b16ab1b33 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1024,6 +1024,7 @@ static void queue_request(struct intel_engine_cs *engine,
 {
 	list_add_tail(&node->link, i915_sched_lookup_priolist(engine, prio));
 	engine->request_stats.runnable++;
+	atomic_dec(&engine->request_stats.queued);
 }
 
 static void __submit_queue_imm(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 07491b6c7796..dc11ed10bac4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -381,6 +381,14 @@ struct intel_engine_cs {
 	void *pinned_default_state;
 
 	struct {
+		/**
+		 * @queued: Number of submitted requests with dependencies.
+		 *
+		 * Count of requests waiting for dependencies before they can be
+		 * submitted to the backend.
+		 */
+		atomic_t queued;
+
 		/**
 		 * @runnable: Number of runnable requests sent to the backend.
 		 *
-- 
2.17.1

* [RFC 04/13] drm/i915/pmu: Add queued counter
From: Tvrtko Ursulin @ 2018-10-03 12:03 UTC
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We add a PMU counter to expose the number of requests which have been
submitted from userspace but are not yet runnable due to unresolved
dependencies and unsignaled fences.

This is useful to analyze the overall load of the system.

v2:
 * Rebase for name change and re-order.
 * Drop floating point constant. (Chris Wilson)

v3:
 * Change scale to 1024 for faster arithmetic. (Chris Wilson)

v4:
 * Refactored for timer period accounting.

v5:
 * Avoid 64-bit division. (Chris Wilson)

v6:
 * Do fewer divisions by accumulating in qd.ns units. (Chris Wilson)
 * Change counter scale to avoid multiplication in readout and increase
   counter headroom.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
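A worked example of the arithmetic above, with made-up numbers: the sampling
timer accumulates queue-depth multiplied by the timer period in nanoseconds,
readout does a single division (see the __i915_pmu_event_read() hunk below),
and userspace applies the exported 0.001 scale:

#include <stdio.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define I915_SAMPLE_QUEUED_DIVISOR 1000

int main(void)
{
	uint64_t cur = 0, val;
	int i;

	/* Queue depth of 2 sampled over 1s in 200 x 5ms timer periods. */
	for (i = 0; i < 200; i++)
		cur += 2 * 5000000ULL; /* qd * period_ns, as add_sample_mult() */

	/* Readout: one division, no multiplication. */
	val = cur / (NSEC_PER_SEC / I915_SAMPLE_QUEUED_DIVISOR);

	/* perf applies the 0.001 scale from sysfs -> average queue depth. */
	printf("queued = %.3f\n", (double)val * 0.001); /* prints 2.000 */

	return 0;
}
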
 drivers/gpu/drm/i915/i915_pmu.c         | 58 ++++++++++++++++++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 include/uapi/drm/i915_drm.h             |  9 +++-
 3 files changed, 57 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 417fda7208be..13449537e2a7 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -16,7 +16,8 @@
 #define ENGINE_SAMPLE_MASK \
 	(BIT(I915_SAMPLE_BUSY) | \
 	 BIT(I915_SAMPLE_WAIT) | \
-	 BIT(I915_SAMPLE_SEMA))
+	 BIT(I915_SAMPLE_SEMA) | \
+	 BIT(I915_SAMPLE_QUEUED))
 
 #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
 
@@ -162,6 +163,12 @@ add_sample(struct i915_pmu_sample *sample, u32 val)
 	sample->cur += val;
 }
 
+static void
+add_sample_mult(struct i915_pmu_sample *sample, u32 val, u32 mul)
+{
+	sample->cur += mul_u32_u32(val, mul);
+}
+
 static void
 engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 {
@@ -205,6 +212,11 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 		if (val & RING_WAIT_SEMAPHORE)
 			add_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
 				   period_ns);
+
+		if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED))
+			add_sample_mult(&engine->pmu.sample[I915_SAMPLE_QUEUED],
+					atomic_read(&engine->request_stats.queued),
+					period_ns);
 	}
 
 	if (fw)
@@ -213,12 +225,6 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 	intel_runtime_pm_put(dev_priv);
 }
 
-static void
-add_sample_mult(struct i915_pmu_sample *sample, u32 val, u32 mul)
-{
-	sample->cur += mul_u32_u32(val, mul);
-}
-
 static void
 frequency_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 {
@@ -324,6 +330,7 @@ engine_event_status(struct intel_engine_cs *engine,
 	switch (sample) {
 	case I915_SAMPLE_BUSY:
 	case I915_SAMPLE_WAIT:
+	case I915_SAMPLE_QUEUED:
 		break;
 	case I915_SAMPLE_SEMA:
 		if (INTEL_GEN(engine->i915) < 6)
@@ -541,6 +548,15 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 			val = ktime_to_ns(intel_engine_get_busy_time(engine));
 		} else {
 			val = engine->pmu.sample[sample].cur;
+
+			if (sample == I915_SAMPLE_QUEUED) {
+				BUILD_BUG_ON(NSEC_PER_SEC %
+					     I915_SAMPLE_QUEUED_DIVISOR);
+				/* to qd */
+				val = div_u64(val,
+					      NSEC_PER_SEC /
+					      I915_SAMPLE_QUEUED_DIVISOR);
+			}
 		}
 	} else {
 		switch (event->attr.config) {
@@ -797,6 +813,16 @@ static const struct attribute_group *i915_pmu_attr_groups[] = {
 { \
 	.sample = (__sample), \
 	.name = (__name), \
+	.suffix = "unit", \
+	.value = "ns", \
+}
+
+#define __engine_event_scale(__sample, __name, __scale) \
+{ \
+	.sample = (__sample), \
+	.name = (__name), \
+	.suffix = "scale", \
+	.value = (__scale), \
 }
 
 static struct i915_ext_attribute *
@@ -824,6 +850,9 @@ add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name,
 	return ++attr;
 }
 
+/* No brackets or quotes below please. */
+#define I915_SAMPLE_QUEUED_SCALE 0.001
+
 static struct attribute **
 create_event_attributes(struct drm_i915_private *i915)
 {
@@ -840,10 +869,14 @@ create_event_attributes(struct drm_i915_private *i915)
 	static const struct {
 		enum drm_i915_pmu_engine_sample sample;
 		char *name;
+		char *suffix;
+		char *value;
 	} engine_events[] = {
 		__engine_event(I915_SAMPLE_BUSY, "busy"),
 		__engine_event(I915_SAMPLE_SEMA, "sema"),
 		__engine_event(I915_SAMPLE_WAIT, "wait"),
+		__engine_event_scale(I915_SAMPLE_QUEUED, "queued",
+				     __stringify(I915_SAMPLE_QUEUED_SCALE)),
 	};
 	unsigned int count = 0;
 	struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
@@ -853,6 +886,9 @@ create_event_attributes(struct drm_i915_private *i915)
 	enum intel_engine_id id;
 	unsigned int i;
 
+	BUILD_BUG_ON(I915_SAMPLE_QUEUED_DIVISOR !=
+		     (1 / I915_SAMPLE_QUEUED_SCALE));
+
 	/* Count how many counters we will be exposing. */
 	for (i = 0; i < ARRAY_SIZE(events); i++) {
 		if (!config_status(i915, events[i].config))
@@ -930,13 +966,15 @@ create_event_attributes(struct drm_i915_private *i915)
 								engine->instance,
 								engine_events[i].sample));
 
-			str = kasprintf(GFP_KERNEL, "%s-%s.unit",
-					engine->name, engine_events[i].name);
+			str = kasprintf(GFP_KERNEL, "%s-%s.%s",
+					engine->name, engine_events[i].name,
+					engine_events[i].suffix);
 			if (!str)
 				goto err;
 
 			*attr_iter++ = &pmu_iter->attr.attr;
-			pmu_iter = add_pmu_attr(pmu_iter, str, "ns");
+			pmu_iter = add_pmu_attr(pmu_iter, str,
+						engine_events[i].value);
 		}
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index dc11ed10bac4..b44dee354dc6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -455,7 +455,7 @@ struct intel_engine_cs {
 		 *
 		 * Our internal timer stores the current counters in this field.
 		 */
-#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_SEMA + 1)
+#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_QUEUED + 1)
 		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX];
 	} pmu;
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 298b2e197744..dc76c4102c7a 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -110,9 +110,13 @@ enum drm_i915_gem_engine_class {
 enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_BUSY = 0,
 	I915_SAMPLE_WAIT = 1,
-	I915_SAMPLE_SEMA = 2
+	I915_SAMPLE_SEMA = 2,
+	I915_SAMPLE_QUEUED = 3
 };
 
+ /* Divide counter value by divisor to get the real value. */
+#define I915_SAMPLE_QUEUED_DIVISOR (1000)
+
 #define I915_PMU_SAMPLE_BITS (4)
 #define I915_PMU_SAMPLE_MASK (0xf)
 #define I915_PMU_SAMPLE_INSTANCE_BITS (8)
@@ -133,6 +137,9 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_SEMA(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
 
+#define I915_PMU_ENGINE_QUEUED(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED)
+
 #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
-- 
2.17.1

* [RFC 05/13] drm/i915/pmu: Add runnable counter
From: Tvrtko Ursulin @ 2018-10-03 12:03 UTC
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We add a PMU counter to expose the number of requests with resolved
dependencies waiting for a slot on the GPU to run.

This is useful to analyze the overall load of the system.

v2: Don't limit to gen8+.

v3:
 * Rebase for dynamic sysfs.
 * Drop currently executing requests.

v4:
 * Sync with internal renaming.
 * Drop floating point constant. (Chris Wilson)

v5:
 * Change scale to 1024 for faster arithmetic. (Chris Wilson)

v6:
 * Refactored for timer period accounting.

v7:
 * Avoid 64-bit division. (Chris Wilson)

v8:
 * Do fewer divisions by accumulating in qd.ns units. (Chris Wilson)
 * Change counter scale to avoid multiplication in readout and increase
   counter headroom.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c         | 20 ++++++++++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 include/uapi/drm/i915_drm.h             |  7 ++++++-
 3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 13449537e2a7..b01a2e66d33a 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -17,7 +17,8 @@
 	(BIT(I915_SAMPLE_BUSY) | \
 	 BIT(I915_SAMPLE_WAIT) | \
 	 BIT(I915_SAMPLE_SEMA) | \
-	 BIT(I915_SAMPLE_QUEUED))
+	 BIT(I915_SAMPLE_QUEUED) | \
+	 BIT(I915_SAMPLE_RUNNABLE))
 
 #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
 
@@ -217,6 +218,11 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 			add_sample_mult(&engine->pmu.sample[I915_SAMPLE_QUEUED],
 					atomic_read(&engine->request_stats.queued),
 					period_ns);
+
+		if (engine->pmu.enable & BIT(I915_SAMPLE_RUNNABLE))
+			add_sample_mult(&engine->pmu.sample[I915_SAMPLE_RUNNABLE],
+					engine->request_stats.runnable,
+					period_ns);
 	}
 
 	if (fw)
@@ -331,6 +337,7 @@ engine_event_status(struct intel_engine_cs *engine,
 	case I915_SAMPLE_BUSY:
 	case I915_SAMPLE_WAIT:
 	case I915_SAMPLE_QUEUED:
+	case I915_SAMPLE_RUNNABLE:
 		break;
 	case I915_SAMPLE_SEMA:
 		if (INTEL_GEN(engine->i915) < 6)
@@ -549,9 +556,12 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 		} else {
 			val = engine->pmu.sample[sample].cur;
 
-			if (sample == I915_SAMPLE_QUEUED) {
+			if (sample == I915_SAMPLE_QUEUED ||
+			    sample == I915_SAMPLE_RUNNABLE) {
 				BUILD_BUG_ON(NSEC_PER_SEC %
 					     I915_SAMPLE_QUEUED_DIVISOR);
+				BUILD_BUG_ON(I915_SAMPLE_QUEUED_DIVISOR !=
+					     I915_SAMPLE_RUNNABLE_DIVISOR);
 				/* to qd */
 				val = div_u64(val,
 					      NSEC_PER_SEC /
@@ -852,6 +862,7 @@ add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name,
 
 /* No brackets or quotes below please. */
 #define I915_SAMPLE_QUEUED_SCALE 0.001
+#define I915_SAMPLE_RUNNABLE_SCALE 0.001
 
 static struct attribute **
 create_event_attributes(struct drm_i915_private *i915)
@@ -877,6 +888,8 @@ create_event_attributes(struct drm_i915_private *i915)
 		__engine_event(I915_SAMPLE_WAIT, "wait"),
 		__engine_event_scale(I915_SAMPLE_QUEUED, "queued",
 				     __stringify(I915_SAMPLE_QUEUED_SCALE)),
+		__engine_event_scale(I915_SAMPLE_RUNNABLE, "runnable",
+				     __stringify(I915_SAMPLE_RUNNABLE_SCALE)),
 	};
 	unsigned int count = 0;
 	struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
@@ -889,6 +902,9 @@ create_event_attributes(struct drm_i915_private *i915)
 	BUILD_BUG_ON(I915_SAMPLE_QUEUED_DIVISOR !=
 		     (1 / I915_SAMPLE_QUEUED_SCALE));
 
+	BUILD_BUG_ON(I915_SAMPLE_RUNNABLE_DIVISOR !=
+		     (1 / I915_SAMPLE_RUNNABLE_SCALE));
+
 	/* Count how many counters we will be exposing. */
 	for (i = 0; i < ARRAY_SIZE(events); i++) {
 		if (!config_status(i915, events[i].config))
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index b44dee354dc6..50914b0ed826 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -455,7 +455,7 @@ struct intel_engine_cs {
 		 *
 		 * Our internal timer stores the current counters in this field.
 		 */
-#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_QUEUED + 1)
+#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_RUNNABLE + 1)
 		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX];
 	} pmu;
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index dc76c4102c7a..5bb7f53f1a3d 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -111,11 +111,13 @@ enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_BUSY = 0,
 	I915_SAMPLE_WAIT = 1,
 	I915_SAMPLE_SEMA = 2,
-	I915_SAMPLE_QUEUED = 3
+	I915_SAMPLE_QUEUED = 3,
+	I915_SAMPLE_RUNNABLE = 4,
 };
 
  /* Divide counter value by divisor to get the real value. */
 #define I915_SAMPLE_QUEUED_DIVISOR (1000)
+#define I915_SAMPLE_RUNNABLE_DIVISOR (1000)
 
 #define I915_PMU_SAMPLE_BITS (4)
 #define I915_PMU_SAMPLE_MASK (0xf)
@@ -140,6 +142,9 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_QUEUED(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED)
 
+#define I915_PMU_ENGINE_RUNNABLE(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNABLE)
+
 #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
-- 
2.17.1

* [RFC 06/13] drm/i915/pmu: Add running counter
From: Tvrtko Ursulin @ 2018-10-03 12:03 UTC
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We add a PMU counter to expose the number of requests currently executing
on the GPU.

This is useful to analyze the overall load of the system.

v2:
 * Rebase.
 * Drop floating point constant. (Chris Wilson)

v3:
 * Change scale to 1024 for faster arithmetic. (Chris Wilson)

v4:
 * Refactored for timer period accounting.

v5:
 * Avoid 64-bit division. (Chris Wilson)

v6:
 * Do fewer divisions by accumulating in qd.ns units. (Chris Wilson)
 * Change counter scale to avoid multiplication in readout and increase
   counter headroom.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c         | 20 ++++++++++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 include/uapi/drm/i915_drm.h             |  5 +++++
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index b01a2e66d33a..7435bce23b8f 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -18,7 +18,8 @@
 	 BIT(I915_SAMPLE_WAIT) | \
 	 BIT(I915_SAMPLE_SEMA) | \
 	 BIT(I915_SAMPLE_QUEUED) | \
-	 BIT(I915_SAMPLE_RUNNABLE))
+	 BIT(I915_SAMPLE_RUNNABLE) | \
+	 BIT(I915_SAMPLE_RUNNING))
 
 #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
 
@@ -223,6 +224,11 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 			add_sample_mult(&engine->pmu.sample[I915_SAMPLE_RUNNABLE],
 					engine->request_stats.runnable,
 					period_ns);
+
+		if (engine->pmu.enable & BIT(I915_SAMPLE_RUNNING))
+			add_sample_mult(&engine->pmu.sample[I915_SAMPLE_RUNNING],
+					last_seqno - current_seqno,
+					period_ns);
 	}
 
 	if (fw)
@@ -338,6 +344,7 @@ engine_event_status(struct intel_engine_cs *engine,
 	case I915_SAMPLE_WAIT:
 	case I915_SAMPLE_QUEUED:
 	case I915_SAMPLE_RUNNABLE:
+	case I915_SAMPLE_RUNNING:
 		break;
 	case I915_SAMPLE_SEMA:
 		if (INTEL_GEN(engine->i915) < 6)
@@ -557,11 +564,14 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 			val = engine->pmu.sample[sample].cur;
 
 			if (sample == I915_SAMPLE_QUEUED ||
-			    sample == I915_SAMPLE_RUNNABLE) {
+			    sample == I915_SAMPLE_RUNNABLE ||
+			    sample == I915_SAMPLE_RUNNING) {
 				BUILD_BUG_ON(NSEC_PER_SEC %
 					     I915_SAMPLE_QUEUED_DIVISOR);
 				BUILD_BUG_ON(I915_SAMPLE_QUEUED_DIVISOR !=
 					     I915_SAMPLE_RUNNABLE_DIVISOR);
+				BUILD_BUG_ON(I915_SAMPLE_QUEUED_DIVISOR !=
+					     I915_SAMPLE_RUNNING_DIVISOR);
 				/* to qd */
 				val = div_u64(val,
 					      NSEC_PER_SEC /
@@ -863,6 +873,7 @@ add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name,
 /* No brackets or quotes below please. */
 #define I915_SAMPLE_QUEUED_SCALE 0.001
 #define I915_SAMPLE_RUNNABLE_SCALE 0.001
+#define I915_SAMPLE_RUNNING_SCALE 0.001
 
 static struct attribute **
 create_event_attributes(struct drm_i915_private *i915)
@@ -890,6 +901,8 @@ create_event_attributes(struct drm_i915_private *i915)
 				     __stringify(I915_SAMPLE_QUEUED_SCALE)),
 		__engine_event_scale(I915_SAMPLE_RUNNABLE, "runnable",
 				     __stringify(I915_SAMPLE_RUNNABLE_SCALE)),
+		__engine_event_scale(I915_SAMPLE_RUNNING, "running",
+				     __stringify(I915_SAMPLE_RUNNING_SCALE)),
 	};
 	unsigned int count = 0;
 	struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
@@ -905,6 +918,9 @@ create_event_attributes(struct drm_i915_private *i915)
 	BUILD_BUG_ON(I915_SAMPLE_RUNNABLE_DIVISOR !=
 		     (1 / I915_SAMPLE_RUNNABLE_SCALE));
 
+	BUILD_BUG_ON(I915_SAMPLE_RUNNING_DIVISOR !=
+		     (1 / I915_SAMPLE_RUNNING_SCALE));
+
 	/* Count how many counters we will be exposing. */
 	for (i = 0; i < ARRAY_SIZE(events); i++) {
 		if (!config_status(i915, events[i].config))
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 50914b0ed826..8b53ed069063 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -455,7 +455,7 @@ struct intel_engine_cs {
 		 *
 		 * Our internal timer stores the current counters in this field.
 		 */
-#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_RUNNABLE + 1)
+#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_RUNNING + 1)
 		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX];
 	} pmu;
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 5bb7f53f1a3d..10279c0ef94c 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -113,11 +113,13 @@ enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_SEMA = 2,
 	I915_SAMPLE_QUEUED = 3,
 	I915_SAMPLE_RUNNABLE = 4,
+	I915_SAMPLE_RUNNING = 5,
 };
 
  /* Divide counter value by divisor to get the real value. */
 #define I915_SAMPLE_QUEUED_DIVISOR (1000)
 #define I915_SAMPLE_RUNNABLE_DIVISOR (1000)
+#define I915_SAMPLE_RUNNING_DIVISOR (1000)
 
 #define I915_PMU_SAMPLE_BITS (4)
 #define I915_PMU_SAMPLE_MASK (0xf)
@@ -145,6 +147,9 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_RUNNABLE(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNABLE)
 
+#define I915_PMU_ENGINE_RUNNING(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNING)
+
 #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
-- 
2.17.1

* [RFC 07/13] drm/i915: Store engine backpointer in the intel_context
From: Tvrtko Ursulin @ 2018-10-03 12:04 UTC
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

It will become useful in a later patch.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 1 +
 drivers/gpu/drm/i915/i915_gem_context.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 8cbe58070561..7cf5fdf93583 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -343,6 +343,7 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 		struct intel_context *ce = &ctx->__engine[n];
 
 		ce->gem_context = ctx;
+		ce->engine = dev_priv->engine[n];
 	}
 
 	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 08165f6a0a84..1c376cf316ca 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -163,6 +163,7 @@ struct i915_gem_context {
 	/** engine: per-engine logical HW state */
 	struct intel_context {
 		struct i915_gem_context *gem_context;
+		struct intel_engine_cs *engine;
 		struct i915_vma *state;
 		struct intel_ring *ring;
 		u32 *lrc_reg_state;
-- 
2.17.1

* [RFC 08/13] drm/i915: Move intel_engine_context_in/out into intel_lrc.c
From: Tvrtko Ursulin @ 2018-10-03 12:04 UTC
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

intel_lrc.c is the only caller, so to avoid some header file ordering issues
in future patches, move these two functions over there.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 57 +++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h | 55 ------------------------
 2 files changed, 57 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3e8b16ab1b33..2ef541ce7768 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -329,6 +329,63 @@ execlists_context_status_change(struct i915_request *rq, unsigned long status)
 				   status, rq);
 }
 
+static inline void
+intel_engine_context_in(struct intel_engine_cs *engine)
+{
+	unsigned long flags;
+
+	if (READ_ONCE(engine->stats.enabled) == 0)
+		return;
+
+	write_seqlock_irqsave(&engine->stats.lock, flags);
+
+	if (engine->stats.enabled > 0) {
+		if (engine->stats.active++ == 0)
+			engine->stats.start = ktime_get();
+		GEM_BUG_ON(engine->stats.active == 0);
+	}
+
+	write_sequnlock_irqrestore(&engine->stats.lock, flags);
+}
+
+static inline void
+intel_engine_context_out(struct intel_engine_cs *engine)
+{
+	unsigned long flags;
+
+	if (READ_ONCE(engine->stats.enabled) == 0)
+		return;
+
+	write_seqlock_irqsave(&engine->stats.lock, flags);
+
+	if (engine->stats.enabled > 0) {
+		ktime_t last;
+
+		if (engine->stats.active && --engine->stats.active == 0) {
+			/*
+			 * Decrement the active context count and in case GPU
+			 * is now idle add up to the running total.
+			 */
+			last = ktime_sub(ktime_get(), engine->stats.start);
+
+			engine->stats.total = ktime_add(engine->stats.total,
+							last);
+		} else if (engine->stats.active == 0) {
+			/*
+			 * After turning on engine stats, context out might be
+			 * the first event in which case we account from the
+			 * time stats gathering was turned on.
+			 */
+			last = ktime_sub(ktime_get(), engine->stats.enabled_at);
+
+			engine->stats.total = ktime_add(engine->stats.total,
+							last);
+		}
+	}
+
+	write_sequnlock_irqrestore(&engine->stats.lock, flags);
+}
+
 inline void
 execlists_user_begin(struct intel_engine_execlists *execlists,
 		     const struct execlist_port *port)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 8b53ed069063..b3092d736f8c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -1150,61 +1150,6 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 struct intel_engine_cs *
 intel_engine_lookup_user(struct drm_i915_private *i915, u8 class, u8 instance);
 
-static inline void intel_engine_context_in(struct intel_engine_cs *engine)
-{
-	unsigned long flags;
-
-	if (READ_ONCE(engine->stats.enabled) == 0)
-		return;
-
-	write_seqlock_irqsave(&engine->stats.lock, flags);
-
-	if (engine->stats.enabled > 0) {
-		if (engine->stats.active++ == 0)
-			engine->stats.start = ktime_get();
-		GEM_BUG_ON(engine->stats.active == 0);
-	}
-
-	write_sequnlock_irqrestore(&engine->stats.lock, flags);
-}
-
-static inline void intel_engine_context_out(struct intel_engine_cs *engine)
-{
-	unsigned long flags;
-
-	if (READ_ONCE(engine->stats.enabled) == 0)
-		return;
-
-	write_seqlock_irqsave(&engine->stats.lock, flags);
-
-	if (engine->stats.enabled > 0) {
-		ktime_t last;
-
-		if (engine->stats.active && --engine->stats.active == 0) {
-			/*
-			 * Decrement the active context count and in case GPU
-			 * is now idle add up to the running total.
-			 */
-			last = ktime_sub(ktime_get(), engine->stats.start);
-
-			engine->stats.total = ktime_add(engine->stats.total,
-							last);
-		} else if (engine->stats.active == 0) {
-			/*
-			 * After turning on engine stats, context out might be
-			 * the first event in which case we account from the
-			 * time stats gathering was turned on.
-			 */
-			last = ktime_sub(ktime_get(), engine->stats.enabled_at);
-
-			engine->stats.total = ktime_add(engine->stats.total,
-							last);
-		}
-	}
-
-	write_sequnlock_irqrestore(&engine->stats.lock, flags);
-}
-
 int intel_enable_engine_stats(struct intel_engine_cs *engine);
 void intel_disable_engine_stats(struct intel_engine_cs *engine);
 
-- 
2.17.1

* [RFC 09/13] drm/i915: Track per-context engine busyness
From: Tvrtko Ursulin @ 2018-10-03 12:04 UTC
  To: Intel-gfx; +Cc: gordon.kelly

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Some customers want to know how much GPU time their clients are using in
order to make dynamic load-balancing decisions.

With the hooks already in place which track the overall engine busyness,
we can extend that slightly to split that time between contexts.

v2: Fix accounting for tail updates.
v3: Rebase.
v4: Mark currently running contexts as active on stats enable.
v5: Include some headers to fix the build.
v6: Added fine grained lock.
v7: Convert to seqlock. (Chris Wilson)
v8: Rebase and tidy with helpers.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: gordon.kelly@intel.com
---
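For reference, one way a monitoring tool could turn two samples of the
cumulative per-context busy time (as returned by the
intel_context_get_busy_time() helper below) into a busyness percentage; a
sketch with hypothetical numbers, not part of the patch:

#include <stdio.h>
#include <stdint.h>

/* The busy_ns_* arguments are cumulative busy times sampled wall_ns apart. */
static double busy_pct(uint64_t busy_ns_t0, uint64_t busy_ns_t1,
		       uint64_t wall_ns)
{
	return 100.0 * (double)(busy_ns_t1 - busy_ns_t0) / (double)wall_ns;
}

int main(void)
{
	/* E.g. 615.1ms of context runtime over a 1s sampling window. */
	printf("%.2f%%\n", busy_pct(0, 615100000ULL, 1000000000ULL));
	return 0;
}
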
 drivers/gpu/drm/i915/i915_gem_context.c |  1 +
 drivers/gpu/drm/i915/i915_gem_context.h | 17 +++++++
 drivers/gpu/drm/i915/intel_engine_cs.c  | 27 +++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        | 62 +++++++++++++++++++++----
 4 files changed, 97 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 7cf5fdf93583..d0fa5dfb8389 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -344,6 +344,7 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 
 		ce->gem_context = ctx;
 		ce->engine = dev_priv->engine[n];
+		seqlock_init(&ce->stats.lock);
 	}
 
 	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 1c376cf316ca..151362aa95b9 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -28,6 +28,7 @@
 #include <linux/bitops.h>
 #include <linux/list.h>
 #include <linux/radix-tree.h>
+#include <linux/seqlock.h>
 
 #include "i915_gem.h"
 #include "i915_scheduler.h"
@@ -170,6 +171,13 @@ struct i915_gem_context {
 		u64 lrc_desc;
 		int pin_count;
 
+		struct intel_context_stats {
+			seqlock_t lock;
+			bool active;
+			ktime_t start;
+			ktime_t total;
+		} stats;
+
 		const struct intel_context_ops *ops;
 	} __engine[I915_NUM_ENGINES];
 
@@ -364,4 +372,13 @@ static inline void i915_gem_context_put(struct i915_gem_context *ctx)
 	kref_put(&ctx->ref, i915_gem_context_release);
 }
 
+static inline void
+__intel_context_stats_start(struct intel_context_stats *stats, ktime_t now)
+{
+	stats->start = now;
+	stats->active = true;
+}
+
+ktime_t intel_context_get_busy_time(struct intel_context *ce);
+
 #endif /* !__I915_GEM_CONTEXT_H__ */
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 1b3f562246c9..e75b71b97347 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1632,6 +1632,14 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine)
 
 		engine->stats.enabled_at = ktime_get();
 
+		/* Mark currently running context as active. */
+		if (port_isset(port)) {
+			struct i915_request *rq = port_request(port);
+
+			__intel_context_stats_start(&rq->hw_context->stats,
+						    engine->stats.enabled_at);
+		}
+
 		/* XXX submission method oblivious? */
 		while (num_ports-- && port_isset(port)) {
 			engine->stats.active++;
@@ -1705,6 +1713,25 @@ void intel_disable_engine_stats(struct intel_engine_cs *engine)
 	write_sequnlock_irqrestore(&engine->stats.lock, flags);
 }
 
+ktime_t intel_context_get_busy_time(struct intel_context *ce)
+{
+	unsigned int seq;
+	ktime_t total;
+
+	do {
+		seq = read_seqbegin(&ce->stats.lock);
+
+		total = ce->stats.total;
+
+		if (ce->stats.active)
+			total = ktime_add(total,
+					  ktime_sub(ktime_get(),
+						    ce->stats.start));
+	} while (read_seqretry(&ce->stats.lock, seq));
+
+	return total;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/mock_engine.c"
 #include "selftests/intel_engine_cs.c"
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 2ef541ce7768..b5dbca4e4724 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -330,18 +330,48 @@ execlists_context_status_change(struct i915_request *rq, unsigned long status)
 }
 
 static inline void
-intel_engine_context_in(struct intel_engine_cs *engine)
+intel_context_stats_start(struct intel_context_stats *stats, ktime_t now)
 {
+	write_seqlock(&stats->lock);
+	__intel_context_stats_start(stats, now);
+	write_sequnlock(&stats->lock);
+}
+
+static inline void
+intel_context_stats_stop(struct intel_context_stats *stats, ktime_t now)
+{
+	write_seqlock(&stats->lock);
+	GEM_BUG_ON(!stats->start);
+	stats->total = ktime_add(stats->total, ktime_sub(now, stats->start));
+	stats->active = false;
+	write_sequnlock(&stats->lock);
+}
+
+static inline void
+intel_context_in(struct intel_context *ce, bool submit)
+{
+	struct intel_engine_cs *engine = ce->engine;
 	unsigned long flags;
+	ktime_t now;
 
 	if (READ_ONCE(engine->stats.enabled) == 0)
 		return;
 
 	write_seqlock_irqsave(&engine->stats.lock, flags);
 
+	if (submit) {
+		now = ktime_get();
+		intel_context_stats_start(&ce->stats, now);
+	} else {
+		now = 0;
+	}
+
 	if (engine->stats.enabled > 0) {
-		if (engine->stats.active++ == 0)
-			engine->stats.start = ktime_get();
+		if (engine->stats.active++ == 0) {
+			if (!now)
+				now = ktime_get();
+			engine->stats.start = now;
+		}
 		GEM_BUG_ON(engine->stats.active == 0);
 	}
 
@@ -349,8 +379,9 @@ intel_engine_context_in(struct intel_engine_cs *engine)
 }
 
 static inline void
-intel_engine_context_out(struct intel_engine_cs *engine)
+intel_context_out(struct intel_context *ce)
 {
+	struct intel_engine_cs *engine = ce->engine;
 	unsigned long flags;
 
 	if (READ_ONCE(engine->stats.enabled) == 0)
@@ -359,14 +390,25 @@ intel_engine_context_out(struct intel_engine_cs *engine)
 	write_seqlock_irqsave(&engine->stats.lock, flags);
 
 	if (engine->stats.enabled > 0) {
+		struct execlist_port *next_port = &engine->execlists.port[1];
+		ktime_t now = ktime_get();
 		ktime_t last;
 
+		intel_context_stats_stop(&ce->stats, now);
+
+		if (port_isset(next_port)) {
+			struct i915_request *next_rq = port_request(next_port);
+
+			intel_context_stats_start(&next_rq->hw_context->stats,
+						  now);
+		}
+
 		if (engine->stats.active && --engine->stats.active == 0) {
 			/*
 			 * Decrement the active context count and in case GPU
 			 * is now idle add up to the running total.
 			 */
-			last = ktime_sub(ktime_get(), engine->stats.start);
+			last = ktime_sub(now, engine->stats.start);
 
 			engine->stats.total = ktime_add(engine->stats.total,
 							last);
@@ -376,7 +418,7 @@ intel_engine_context_out(struct intel_engine_cs *engine)
 			 * the first event in which case we account from the
 			 * time stats gathering was turned on.
 			 */
-			last = ktime_sub(ktime_get(), engine->stats.enabled_at);
+			last = ktime_sub(now, engine->stats.enabled_at);
 
 			engine->stats.total = ktime_add(engine->stats.total,
 							last);
@@ -400,16 +442,16 @@ execlists_user_end(struct intel_engine_execlists *execlists)
 }
 
 static inline void
-execlists_context_schedule_in(struct i915_request *rq)
+execlists_context_schedule_in(struct i915_request *rq, unsigned int port)
 {
 	execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN);
-	intel_engine_context_in(rq->engine);
+	intel_context_in(rq->hw_context, port == 0);
 }
 
 static inline void
 execlists_context_schedule_out(struct i915_request *rq, unsigned long status)
 {
-	intel_engine_context_out(rq->engine);
+	intel_context_out(rq->hw_context);
 	execlists_context_status_change(rq, status);
 	trace_i915_request_out(rq);
 }
@@ -484,7 +526,7 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
 		if (rq) {
 			GEM_BUG_ON(count > !n);
 			if (!count++)
-				execlists_context_schedule_in(rq);
+				execlists_context_schedule_in(rq, n);
 			port_set(&port[n], port_pack(rq, count));
 			desc = execlists_update_context(rq);
 			GEM_DEBUG_EXEC(port[n].context_id = upper_32_bits(desc));
-- 
2.17.1

* [RFC 10/13] drm/i915: Expose list of clients in sysfs
From: Tvrtko Ursulin @ 2018-10-03 12:04 UTC
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Expose a list of clients with open file handles in sysfs.

This will be a basis for a top-like utility showing per-client and per-
engine GPU load.

Currently we only expose each client's pid and name under opaque numbered
directories in /sys/class/drm/card0/clients/.

For instance:

/sys/class/drm/card0/clients/3/name: Xorg
/sys/class/drm/card0/clients/3/pid: 5664

v2:
 Chris Wilson:
 * Enclose new members into dedicated structs.
 * Protect against failed sysfs registration.

v3:
 * sysfs_attr_init.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
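A quick sketch of how a tool might walk the new directory, assuming the
layout documented above and keeping error handling minimal:

#include <stdio.h>
#include <dirent.h>

static void read_attr(const char *base, const char *id, const char *attr,
		      char *buf, int len)
{
	char path[512];
	FILE *f;

	buf[0] = '\0';
	snprintf(path, sizeof(path), "%s/%s/%s", base, id, attr);
	f = fopen(path, "r");
	if (f) {
		if (!fgets(buf, len, f))
			buf[0] = '\0';
		fclose(f);
	}
}

int main(void)
{
	const char *base = "/sys/class/drm/card0/clients";
	struct dirent *de;
	DIR *dir;

	dir = opendir(base);
	if (!dir)
		return 1;

	while ((de = readdir(dir)) != NULL) {
		char name[64], pid[16];

		if (de->d_name[0] == '.')
			continue;

		read_attr(base, de->d_name, "name", name, sizeof(name));
		read_attr(base, de->d_name, "pid", pid, sizeof(pid));
		printf("client %s: %s (pid %s)\n", de->d_name, name, pid);
	}

	closedir(dir);
	return 0;
}
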
 drivers/gpu/drm/i915/i915_drv.h   |  19 +++++
 drivers/gpu/drm/i915/i915_gem.c   | 121 ++++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_sysfs.c |   8 ++
 3 files changed, 140 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2264b30ce51a..f7022533eab1 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -367,6 +367,20 @@ struct drm_i915_file_private {
 	/** ban_score: Accumulated score of all ctx bans and fast hangs. */
 	atomic_t ban_score;
 	unsigned long hang_timestamp;
+
+	struct i915_drm_client {
+		unsigned int id;
+
+		pid_t pid;
+		char *name;
+
+		struct kobject *root;
+
+		struct {
+			struct device_attribute pid;
+			struct device_attribute name;
+		} attr;
+	} client;
 };
 
 /* Interface history:
@@ -2171,6 +2185,11 @@ struct drm_i915_private {
 
 	struct i915_pmu pmu;
 
+	struct i915_drm_clients {
+		struct kobject *root;
+		atomic_t serial;
+	} clients;
+
 	/*
 	 * NOTE: This is the dri1/ums dungeon, don't add stuff here. Your patch
 	 * will be rejected. Instead look for a better place.
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d3a730f6ef65..44a875863401 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5888,6 +5888,96 @@ int i915_gem_freeze_late(struct drm_i915_private *i915)
 	return 0;
 }
 
+static ssize_t
+show_client_name(struct device *kdev, struct device_attribute *attr, char *buf)
+{
+	struct drm_i915_file_private *file_priv =
+		container_of(attr, struct drm_i915_file_private,
+			     client.attr.name);
+
+	return snprintf(buf, PAGE_SIZE, "%s", file_priv->client.name);
+}
+
+static ssize_t
+show_client_pid(struct device *kdev, struct device_attribute *attr, char *buf)
+{
+	struct drm_i915_file_private *file_priv =
+		container_of(attr, struct drm_i915_file_private,
+			     client.attr.pid);
+
+	return snprintf(buf, PAGE_SIZE, "%u", file_priv->client.pid);
+}
+
+static int
+i915_gem_add_client(struct drm_i915_private *i915,
+		struct drm_i915_file_private *file_priv,
+		struct task_struct *task,
+		unsigned int serial)
+{
+	int ret = -ENOMEM;
+	struct device_attribute *attr;
+	char id[32];
+
+	if (!i915->clients.root)
+		goto err_name;
+
+	file_priv->client.name = kstrdup(task->comm, GFP_KERNEL);
+	if (!file_priv->client.name)
+		goto err_name;
+
+	snprintf(id, sizeof(id), "%u", serial);
+	file_priv->client.root = kobject_create_and_add(id,
+							i915->clients.root);
+	if (!file_priv->client.root)
+		goto err_client;
+
+	attr = &file_priv->client.attr.name;
+	sysfs_attr_init(&attr->attr);
+	attr->attr.name = "name";
+	attr->attr.mode = 0444;
+	attr->show = show_client_name;
+
+	ret = sysfs_create_file(file_priv->client.root,
+				(struct attribute *)attr);
+	if (ret)
+		goto err_attr_name;
+
+	attr = &file_priv->client.attr.pid;
+	sysfs_attr_init(&attr->attr);
+	attr->attr.name = "pid";
+	attr->attr.mode = 0444;
+	attr->show = show_client_pid;
+
+	ret = sysfs_create_file(file_priv->client.root,
+				(struct attribute *)attr);
+	if (ret)
+		goto err_attr_pid;
+
+	file_priv->client.pid = pid_nr(get_task_pid(task, PIDTYPE_PID));
+
+	return 0;
+
+err_attr_pid:
+	sysfs_remove_file(file_priv->client.root,
+			  (struct attribute *)&file_priv->client.attr.name);
+err_attr_name:
+	kobject_put(file_priv->client.root);
+err_client:
+	kfree(file_priv->client.name);
+err_name:
+	return ret;
+}
+
+static void i915_gem_remove_client(struct drm_i915_file_private *file_priv)
+{
+	sysfs_remove_file(file_priv->client.root,
+			  (struct attribute *)&file_priv->client.attr.pid);
+	sysfs_remove_file(file_priv->client.root,
+			  (struct attribute *)&file_priv->client.attr.name);
+	kobject_put(file_priv->client.root);
+	kfree(file_priv->client.name);
+}
+
 void i915_gem_release(struct drm_device *dev, struct drm_file *file)
 {
 	struct drm_i915_file_private *file_priv = file->driver_priv;
@@ -5901,33 +5991,48 @@ void i915_gem_release(struct drm_device *dev, struct drm_file *file)
 	list_for_each_entry(request, &file_priv->mm.request_list, client_link)
 		request->file_priv = NULL;
 	spin_unlock(&file_priv->mm.lock);
+
+	i915_gem_remove_client(file_priv);
 }
 
 int i915_gem_open(struct drm_i915_private *i915, struct drm_file *file)
 {
+	int ret = -ENOMEM;
 	struct drm_i915_file_private *file_priv;
-	int ret;
 
 	DRM_DEBUG("\n");
 
 	file_priv = kzalloc(sizeof(*file_priv), GFP_KERNEL);
 	if (!file_priv)
-		return -ENOMEM;
+		goto err_alloc;
+
+	file_priv->client.id = atomic_inc_return(&i915->clients.serial);
+	ret = i915_gem_add_client(i915, file_priv, current,
+				  file_priv->client.id);
+	if (ret)
+		goto err_client;
 
 	file->driver_priv = file_priv;
+	ret = i915_gem_context_open(i915, file);
+	if (ret)
+		goto err_context;
+
 	file_priv->dev_priv = i915;
 	file_priv->file = file;
+	file_priv->bsd_engine = -1;
+	file_priv->hang_timestamp = jiffies;
 
 	spin_lock_init(&file_priv->mm.lock);
 	INIT_LIST_HEAD(&file_priv->mm.request_list);
 
-	file_priv->bsd_engine = -1;
-	file_priv->hang_timestamp = jiffies;
-
-	ret = i915_gem_context_open(i915, file);
-	if (ret)
-		kfree(file_priv);
+	return 0;
 
+err_context:
+	i915_gem_remove_client(file_priv);
+err_client:
+	atomic_dec(&i915->clients.serial);
+	kfree(file_priv);
+err_alloc:
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index e5e6f6bb2b05..d809259456ef 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -581,6 +581,11 @@ void i915_setup_sysfs(struct drm_i915_private *dev_priv)
 	struct device *kdev = dev_priv->drm.primary->kdev;
 	int ret;
 
+	dev_priv->clients.root =
+		kobject_create_and_add("clients", &kdev->kobj);
+	if (!dev_priv->clients.root)
+		DRM_ERROR("Per-client sysfs setup failed\n");
+
 #ifdef CONFIG_PM
 	if (HAS_RC6(dev_priv)) {
 		ret = sysfs_merge_group(&kdev->kobj,
@@ -641,4 +646,7 @@ void i915_teardown_sysfs(struct drm_i915_private *dev_priv)
 	sysfs_unmerge_group(&kdev->kobj, &rc6_attr_group);
 	sysfs_unmerge_group(&kdev->kobj, &rc6p_attr_group);
 #endif
+
+	if (dev_priv->clients.root)
+		kobject_put(dev_priv->clients.root);
 }
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [RFC 11/13] drm/i915: Update client name on context create
  2018-10-03 12:03 [RFC 00/13] 21st century intel_gpu_top Tvrtko Ursulin
                   ` (9 preceding siblings ...)
  2018-10-03 12:04 ` [RFC 10/13] drm/i915: Expose list of clients in sysfs Tvrtko Ursulin
@ 2018-10-03 12:04 ` Tvrtko Ursulin
  2018-10-03 12:04 ` [RFC 12/13] drm/i915: Expose per-engine client busyness Tvrtko Ursulin
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Tvrtko Ursulin @ 2018-10-03 12:04 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Some clients have the DRM fd passed to them over a socket by the X server.

Grab the real client name and pid when such a client creates its first
context, and update the exposed data for more useful enumeration.
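
For illustration, a hypothetical client which inherited its DRM fd could
look like the sketch below (built against the kernel uapi headers; the
open() here merely stands in for receiving the fd over a socket):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>

#include <drm/i915_drm.h>

int main(void)
{
	struct drm_i915_gem_context_create create;
	int fd;

	/* Stand-in for an fd received over a socket from the X server. */
	fd = open("/dev/dri/card0", O_RDWR);
	if (fd < 0)
		return 1;

	memset(&create, 0, sizeof(create));

	/*
	 * Up to this point sysfs shows the original opener's name and
	 * pid; the first context create re-registers the client as the
	 * calling process.
	 */
	if (ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE, &create) == 0)
		printf("context %u created\n", create.ctx_id);

	return 0;
}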

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |  8 ++++++++
 drivers/gpu/drm/i915/i915_gem.c         |  4 ++--
 drivers/gpu/drm/i915/i915_gem_context.c | 16 +++++++++++++---
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f7022533eab1..075a600e066f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3273,6 +3273,14 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 void i915_gem_object_unpin_from_display_plane(struct i915_vma *vma);
 int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj,
 				int align);
+
+int
+i915_gem_add_client(struct drm_i915_private *i915,
+		struct drm_i915_file_private *file_priv,
+		struct task_struct *task,
+		unsigned int serial);
+void i915_gem_remove_client(struct drm_i915_file_private *file_priv);
+
 int i915_gem_open(struct drm_i915_private *i915, struct drm_file *file);
 void i915_gem_release(struct drm_device *dev, struct drm_file *file);
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 44a875863401..fe684d04be22 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5908,7 +5908,7 @@ show_client_pid(struct device *kdev, struct device_attribute *attr, char *buf)
 	return snprintf(buf, PAGE_SIZE, "%u", file_priv->client.pid);
 }
 
-static int
+int
 i915_gem_add_client(struct drm_i915_private *i915,
 		struct drm_i915_file_private *file_priv,
 		struct task_struct *task,
@@ -5968,7 +5968,7 @@ i915_gem_add_client(struct drm_i915_private *i915,
 	return ret;
 }
 
-static void i915_gem_remove_client(struct drm_i915_file_private *file_priv)
+void i915_gem_remove_client(struct drm_i915_file_private *file_priv)
 {
 	sysfs_remove_file(file_priv->client.root,
 			  (struct attribute *)&file_priv->client.attr.pid);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index d0fa5dfb8389..738dd7e583e8 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -779,9 +779,10 @@ static bool client_is_banned(struct drm_i915_file_private *file_priv)
 int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 				  struct drm_file *file)
 {
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	pid_t pid = pid_nr(get_task_pid(current, PIDTYPE_PID));
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct drm_i915_gem_context_create *args = data;
-	struct drm_i915_file_private *file_priv = file->driver_priv;
 	struct i915_gem_context *ctx;
 	int ret;
 
@@ -793,8 +794,7 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 
 	if (client_is_banned(file_priv)) {
 		DRM_DEBUG("client %s[%d] banned from creating ctx\n",
-			  current->comm,
-			  pid_nr(get_task_pid(current, PIDTYPE_PID)));
+			  current->comm, pid);
 
 		return -EIO;
 	}
@@ -803,6 +803,16 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		return ret;
 
+	if (file_priv->client.pid != pid) {
+		i915_gem_remove_client(file_priv);
+		ret = i915_gem_add_client(dev_priv, file_priv, current,
+					  file_priv->client.id);
+		if (ret) {
+			mutex_unlock(&dev->struct_mutex);
+			return ret;
+		}
+	}
+
 	ctx = i915_gem_create_context(dev_priv, file_priv);
 	mutex_unlock(&dev->struct_mutex);
 	if (IS_ERR(ctx))
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [RFC 12/13] drm/i915: Expose per-engine client busyness
  2018-10-03 12:03 [RFC 00/13] 21st century intel_gpu_top Tvrtko Ursulin
                   ` (10 preceding siblings ...)
  2018-10-03 12:04 ` [RFC 11/13] drm/i915: Update client name on context create Tvrtko Ursulin
@ 2018-10-03 12:04 ` Tvrtko Ursulin
  2018-10-03 12:04 ` [RFC 13/13] drm/i915: Add sysfs toggle to enable per-client engine stats Tvrtko Ursulin
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Tvrtko Ursulin @ 2018-10-03 12:04 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Expose per-client and per-engine busyness under the previously added sysfs
client root.

There is one new file per engine instance, located under the 'busy'
directory.

Each contains a monotonically increasing, nanosecond-resolution count of
the time the client's jobs were executing on the GPU.

$ cat /sys/class/drm/card0/clients/5/busy/rcs0
32516602

This data can serve as an interface for implementing a top-like utility
for GPU jobs. For instance, I have prototyped a tool in IGT which
produces periodic output like:

neverball[  6011]:  rcs0:  41.01%  bcs0:   0.00%  vcs0:   0.00%  vecs0:   0.00%
     Xorg[  5664]:  rcs0:  31.16%  bcs0:   0.00%  vcs0:   0.00%  vecs0:   0.00%
    xfwm4[  5727]:  rcs0:   0.00%  bcs0:   0.00%  vcs0:   0.00%  vecs0:   0.00%

This tool can also be extended to use the i915 PMU to show overall engine
busyness, and engine load using the queue depth metric.
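
The counters only become percentages once sampled: a tool reads the file
twice and divides the busy delta by the wall-clock delta. A minimal
sketch (client id 5 and engine rcs0 hard-coded purely for illustration):

#include <stdio.h>
#include <time.h>
#include <unistd.h>

static unsigned long long read_busy(const char *path)
{
	unsigned long long ns = 0;
	FILE *f = fopen(path, "r");

	if (f) {
		if (fscanf(f, "%llu", &ns) != 1)
			ns = 0;
		fclose(f);
	}

	return ns;
}

int main(void)
{
	const char *path = "/sys/class/drm/card0/clients/5/busy/rcs0";
	unsigned long long b0, b1;
	struct timespec t0, t1;
	double wall;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	b0 = read_busy(path);

	sleep(1); /* sampling period */

	clock_gettime(CLOCK_MONOTONIC, &t1);
	b1 = read_busy(path);

	wall = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);

	printf("rcs0: %6.2f%%\n", 100.0 * (b1 - b0) / wall);

	return 0;
}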

v2: Use intel_context_engine_get_busy_time.
v3: New directory structure.
v4: Rebase.
v5: sysfs_attr_init.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h |  8 ++++
 drivers/gpu/drm/i915/i915_gem.c | 81 +++++++++++++++++++++++++++++++--
 2 files changed, 86 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 075a600e066f..a75f8345db27 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -330,6 +330,12 @@ struct drm_i915_private;
 struct i915_mm_struct;
 struct i915_mmu_object;
 
+struct i915_engine_busy_attribute {
+	struct device_attribute attr;
+	struct drm_i915_file_private *file_priv;
+	struct intel_engine_cs *engine;
+};
+
 struct drm_i915_file_private {
 	struct drm_i915_private *dev_priv;
 	struct drm_file *file;
@@ -375,10 +381,12 @@ struct drm_i915_file_private {
 		char *name;
 
 		struct kobject *root;
+		struct kobject *busy_root;
 
 		struct {
 			struct device_attribute pid;
 			struct device_attribute name;
+			struct i915_engine_busy_attribute busy[I915_NUM_ENGINES];
 		} attr;
 	} client;
 };
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fe684d04be22..7a1a1279f39b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5908,6 +5908,37 @@ show_client_pid(struct device *kdev, struct device_attribute *attr, char *buf)
 	return snprintf(buf, PAGE_SIZE, "%u", file_priv->client.pid);
 }
 
+struct busy_ctx {
+	struct intel_engine_cs *engine;
+	u64 total;
+};
+
+static int busy_add(int _id, void *p, void *data)
+{
+	struct busy_ctx *bc = data;
+	struct i915_gem_context *ctx = p;
+	struct intel_context *ce = to_intel_context(ctx, bc->engine);
+
+	bc->total += ktime_to_ns(intel_context_get_busy_time(ce));
+
+	return 0;
+}
+
+static ssize_t
+show_client_busy(struct device *kdev, struct device_attribute *attr, char *buf)
+{
+	struct i915_engine_busy_attribute *i915_attr =
+		container_of(attr, typeof(*i915_attr), attr);
+	struct drm_i915_file_private *file_priv = i915_attr->file_priv;
+	struct busy_ctx bc = { .engine = i915_attr->engine };
+
+	rcu_read_lock();
+	idr_for_each(&file_priv->context_idr, busy_add, &bc);
+	rcu_read_unlock();
+
+	return snprintf(buf, PAGE_SIZE, "%llu\n", bc.total);
+}
+
 int
 i915_gem_add_client(struct drm_i915_private *i915,
 		struct drm_i915_file_private *file_priv,
@@ -5915,8 +5946,10 @@ i915_gem_add_client(struct drm_i915_private *i915,
 		unsigned int serial)
 {
 	int ret = -ENOMEM;
+	struct intel_engine_cs *engine;
 	struct device_attribute *attr;
-	char id[32];
+	enum intel_engine_id id, id2;
+	char idstr[32];
 
 	if (!i915->clients.root)
 		goto err_name;
@@ -5925,8 +5958,8 @@ i915_gem_add_client(struct drm_i915_private *i915,
 	if (!file_priv->client.name)
 		goto err_name;
 
-	snprintf(id, sizeof(id), "%u", serial);
-	file_priv->client.root = kobject_create_and_add(id,
+	snprintf(idstr, sizeof(idstr), "%u", serial);
+	file_priv->client.root = kobject_create_and_add(idstr,
 							i915->clients.root);
 	if (!file_priv->client.root)
 		goto err_client;
@@ -5953,10 +5986,42 @@ i915_gem_add_client(struct drm_i915_private *i915,
 	if (ret)
 		goto err_attr_pid;
 
+	file_priv->client.busy_root =
+			kobject_create_and_add("busy", file_priv->client.root);
+	if (!file_priv->client.busy_root)
+		goto err_busy_root;
+
+	for_each_engine(engine, i915, id) {
+		file_priv->client.attr.busy[id].file_priv = file_priv;
+		file_priv->client.attr.busy[id].engine = engine;
+		attr = &file_priv->client.attr.busy[id].attr;
+		sysfs_attr_init(&attr->attr);
+		attr->attr.name = engine->name;
+		attr->attr.mode = 0444;
+		attr->show = show_client_busy;
+
+		ret = sysfs_create_file(file_priv->client.busy_root,
+				(struct attribute *)attr);
+		if (ret)
+			goto err_attr_busy;
+	}
+
 	file_priv->client.pid = pid_nr(get_task_pid(task, PIDTYPE_PID));
 
 	return 0;
 
+err_attr_busy:
+	for_each_engine(engine, i915, id2) {
+		if (id2 == id)
+			break;
+
+		sysfs_remove_file(file_priv->client.busy_root,
+				  (struct attribute *)&file_priv->client.attr.busy[id2]);
+	}
+	kobject_put(file_priv->client.busy_root);
+err_busy_root:
+	sysfs_remove_file(file_priv->client.root,
+			  (struct attribute *)&file_priv->client.attr.pid);
 err_attr_pid:
 	sysfs_remove_file(file_priv->client.root,
 			  (struct attribute *)&file_priv->client.attr.name);
@@ -5970,10 +6035,20 @@ i915_gem_add_client(struct drm_i915_private *i915,
 
 void i915_gem_remove_client(struct drm_i915_file_private *file_priv)
 {
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	for_each_engine(engine, file_priv->dev_priv, id)
+		sysfs_remove_file(file_priv->client.busy_root,
+				  (struct attribute *)&file_priv->client.attr.busy[id]);
+
+	kobject_put(file_priv->client.busy_root);
+
 	sysfs_remove_file(file_priv->client.root,
 			  (struct attribute *)&file_priv->client.attr.pid);
 	sysfs_remove_file(file_priv->client.root,
 			  (struct attribute *)&file_priv->client.attr.name);
+
 	kobject_put(file_priv->client.root);
 	kfree(file_priv->client.name);
 }
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [RFC 13/13] drm/i915: Add sysfs toggle to enable per-client engine stats
  2018-10-03 12:03 [RFC 00/13] 21st century intel_gpu_top Tvrtko Ursulin
                   ` (11 preceding siblings ...)
  2018-10-03 12:04 ` [RFC 12/13] drm/i915: Expose per-engine client busyness Tvrtko Ursulin
@ 2018-10-03 12:04 ` Tvrtko Ursulin
  2018-10-03 12:36 ` [RFC 00/13] 21st century intel_gpu_top Chris Wilson
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Tvrtko Ursulin @ 2018-10-03 12:04 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

By default we do not collect any per-engine and per-context
statistics.

Add a new sysfs toggle to enable this facility:

$ echo 1 >/sys/class/drm/card0/clients/enable_stats
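
A sketch of how a tool might flip the toggle and read it back (same
card0 path assumed; as the implementation below shows, the write is
rejected with -EINVAL when the engines do not support busy stats):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/sys/class/drm/card0/clients/enable_stats";
	char state = '?';
	int fd;

	fd = open(path, O_RDWR);
	if (fd < 0)
		return 1; /* file absent without this patch */

	/* Fails with EINVAL if engine busy stats are unsupported. */
	if (write(fd, "1", 1) < 0)
		perror("enable_stats");

	/* Reading back reports the current state. */
	if (lseek(fd, 0, SEEK_SET) == 0 && read(fd, &state, 1) == 1)
		printf("per-client stats enabled: %c\n", state);

	close(fd);

	return 0;
}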

v2: Rebase.
v3: sysfs_attr_init.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h   |  4 ++
 drivers/gpu/drm/i915/i915_sysfs.c | 73 +++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a75f8345db27..2d611335d367 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2196,6 +2196,10 @@ struct drm_i915_private {
 	struct i915_drm_clients {
 		struct kobject *root;
 		atomic_t serial;
+		struct {
+			bool enabled;
+			struct device_attribute attr;
+		} stats;
 	} clients;
 
 	/*
diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index d809259456ef..bf33b0cab7c8 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -576,9 +576,67 @@ static void i915_setup_error_capture(struct device *kdev) {}
 static void i915_teardown_error_capture(struct device *kdev) {}
 #endif
 
+static ssize_t
+show_client_stats(struct device *kdev, struct device_attribute *attr, char *buf)
+{
+	struct drm_i915_private *i915 =
+		container_of(attr, struct drm_i915_private, clients.stats.attr);
+
+	return snprintf(buf, PAGE_SIZE, "%u\n", i915->clients.stats.enabled);
+}
+
+static ssize_t
+store_client_stats(struct device *kdev, struct device_attribute *attr,
+		   const char *buf, size_t count)
+{
+	struct drm_i915_private *i915 =
+		container_of(attr, struct drm_i915_private, clients.stats.attr);
+	bool disable = false;
+	bool enable = false;
+	bool val = false;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	int ret;
+
+	/* Use RCS as proxy for all engines. */
+	if (!intel_engine_supports_stats(i915->engine[RCS]))
+		return -EINVAL;
+
+	ret = kstrtobool(buf, &val);
+	if (ret)
+		return ret;
+
+	ret = i915_mutex_lock_interruptible(&i915->drm);
+	if (ret)
+		return ret;
+
+	if (val && !i915->clients.stats.enabled)
+		enable = true;
+	else if (!val && i915->clients.stats.enabled)
+		disable = true;
+
+	if (!enable && !disable)
+		goto out;
+
+	for_each_engine(engine, i915, id) {
+		if (enable)
+			WARN_ON_ONCE(intel_enable_engine_stats(engine));
+		else if (disable)
+			intel_disable_engine_stats(engine);
+	}
+
+	i915->clients.stats.enabled = val;
+
+out:
+	mutex_unlock(&i915->drm.struct_mutex);
+
+	return count;
+}
+
 void i915_setup_sysfs(struct drm_i915_private *dev_priv)
 {
 	struct device *kdev = dev_priv->drm.primary->kdev;
+	struct device_attribute *attr;
 	int ret;
 
 	dev_priv->clients.root =
@@ -586,6 +644,18 @@ void i915_setup_sysfs(struct drm_i915_private *dev_priv)
 	if (!dev_priv->clients.root)
 		DRM_ERROR("Per-client sysfs setup failed\n");
 
+	attr = &dev_priv->clients.stats.attr;
+	sysfs_attr_init(&attr->attr);
+	attr->attr.name = "enable_stats";
+	attr->attr.mode = 0664;
+	attr->show = show_client_stats;
+	attr->store = store_client_stats;
+
+	ret = sysfs_create_file(dev_priv->clients.root,
+				(struct attribute *)attr);
+	if (ret)
+		DRM_ERROR("Per-client sysfs setup failed! (%d)\n", ret);
+
 #ifdef CONFIG_PM
 	if (HAS_RC6(dev_priv)) {
 		ret = sysfs_merge_group(&kdev->kobj,
@@ -647,6 +717,9 @@ void i915_teardown_sysfs(struct drm_i915_private *dev_priv)
 	sysfs_unmerge_group(&kdev->kobj, &rc6p_attr_group);
 #endif
 
+	sysfs_remove_file(dev_priv->clients.root,
+			  (struct attribute *)&dev_priv->clients.stats.attr);
+
 	if (dev_priv->clients.root)
 		kobject_put(dev_priv->clients.root);
 }
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [RFC 00/13] 21st century intel_gpu_top
  2018-10-03 12:03 [RFC 00/13] 21st century intel_gpu_top Tvrtko Ursulin
                   ` (12 preceding siblings ...)
  2018-10-03 12:04 ` [RFC 13/13] drm/i915: Add sysfs toggle to enable per-client engine stats Tvrtko Ursulin
@ 2018-10-03 12:36 ` Chris Wilson
  2018-10-03 12:57   ` Tvrtko Ursulin
  2018-10-03 14:38 ` ✗ Fi.CI.BAT: failure for " Patchwork
  2018-10-10 11:49 ` [RFC 00/13] " Tvrtko Ursulin
  15 siblings, 1 reply; 18+ messages in thread
From: Chris Wilson @ 2018-10-03 12:36 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2018-10-03 13:03:53)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> A collection of patches which I have been sending before, sometimes together and
> sometimes separately, which enable intel_gpu_top to report queue depths (also
> translates as overall GPU load average) and per DRM client per engine busyness.

Queued falls apart with v.engine and I don't have a good suggestion for
a remedy. :(
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [RFC 00/13] 21st century intel_gpu_top
  2018-10-03 12:36 ` [RFC 00/13] 21st century intel_gpu_top Chris Wilson
@ 2018-10-03 12:57   ` Tvrtko Ursulin
  0 siblings, 0 replies; 18+ messages in thread
From: Tvrtko Ursulin @ 2018-10-03 12:57 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin


On 03/10/2018 13:36, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-10-03 13:03:53)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> A collection of patches which I have been sending before, sometimes together and
>> sometimes separately, which enable intel_gpu_top to report queue depths (also
>> translates as overall GPU load average) and per DRM client per engine busyness.
> 
> Queued falls apart with v.engine and I don't have a good suggestion for
> a remedy. :(

Indeed, I forgot about it. I have now even found a few-months-old branch
with queued and runnable already removed.

I think we also talked about the option of exposing aggregate engine 
class counters but that also has problems.

We could go global and not expose this per engine, but that wouldn't 
make <gen11 users happy.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* ✗ Fi.CI.BAT: failure for 21st century intel_gpu_top
  2018-10-03 12:03 [RFC 00/13] 21st century intel_gpu_top Tvrtko Ursulin
                   ` (13 preceding siblings ...)
  2018-10-03 12:36 ` [RFC 00/13] 21st century intel_gpu_top Chris Wilson
@ 2018-10-03 14:38 ` Patchwork
  2018-10-10 11:49 ` [RFC 00/13] " Tvrtko Ursulin
  15 siblings, 0 replies; 18+ messages in thread
From: Patchwork @ 2018-10-03 14:38 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: 21st century intel_gpu_top
URL   : https://patchwork.freedesktop.org/series/50498/
State : failure

== Summary ==

Applying: drm/i915/pmu: Fix enable count array size and bounds checking
Applying: drm/i915: Keep a count of requests waiting for a slot on GPU
Applying: drm/i915: Keep a count of requests submitted from userspace
Applying: drm/i915/pmu: Add queued counter
Applying: drm/i915/pmu: Add runnable counter
Applying: drm/i915/pmu: Add running counter
Applying: drm/i915: Store engine backpointer in the intel_context
Using index info to reconstruct a base tree...
M	drivers/gpu/drm/i915/i915_gem_context.h
Falling back to patching base and 3-way merge...
Auto-merging drivers/gpu/drm/i915/i915_gem_context.h
CONFLICT (content): Merge conflict in drivers/gpu/drm/i915/i915_gem_context.h
error: Failed to merge in the changes.
Patch failed at 0007 drm/i915: Store engine backpointer in the intel_context
Use 'git am --show-current-patch' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [RFC 00/13] 21st century intel_gpu_top
  2018-10-03 12:03 [RFC 00/13] 21st century intel_gpu_top Tvrtko Ursulin
                   ` (14 preceding siblings ...)
  2018-10-03 14:38 ` ✗ Fi.CI.BAT: failure for " Patchwork
@ 2018-10-10 11:49 ` Tvrtko Ursulin
  15 siblings, 0 replies; 18+ messages in thread
From: Tvrtko Ursulin @ 2018-10-10 11:49 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx


On 03/10/2018 13:03, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> A collection of patches which I have been sending before, sometimes together and
> sometimes separately, which enable intel_gpu_top to report queue depths (also
> translates as overall GPU load average) and per DRM client per engine busyness.
> 
> This enables a fancy intel_gpu_top which looks like this (a picture is worth a
> thousand words):
> 
> intel-gpu-top - load avg  3.30,  1.51,  0.08;  949/ 949 MHz;    0% RC6;  14.66 Watts;     3605 irqs/s
> 
>        IMC reads:     4651 MiB/s
>       IMC writes:       25 MiB/s
> 
>            ENGINE      BUSY                                                                                Q   r   R MI_SEMA MI_WAIT
>       Render/3D/0    61.51% |█████████████████████████████████████████████▌                            |   3   0   1      0%      0%
>         Blitter/0     0.00% |                                                                          |   0   0   0      0%      0%
>           Video/0    60.86% |█████████████████████████████████████████████                             |   1   0   1      0%      0%
>           Video/1    59.04% |███████████████████████████████████████████▋                              |   1   0   1      0%      0%
>    VideoEnhance/0     0.00% |                                                                          |   0   0   0      0%      0%
> 
>    PID            NAME     Render/3D/0            Blitter/0              Video/0               Video/1            VideoEnhance/0
> 23373        gem_wsim |█████▎              ||                    ||████████▍           ||█████▎              ||                    |
> 23374        gem_wsim |███▉                ||                    ||██▏                 ||███                 ||                    |
> 23375        gem_wsim |███                 ||                    ||█▍                  ||███▌                ||                    |
> 
> All of this work actually came to be via different feature requests not directly
> asking for this. Things like engine queue depth query and per context engine
> busyness ioctl. Those bits need userspace which is not there yet and so I have
> removed them from this posting to avoid confusion.
> 
> What remains is a set of patches which add some PMU counters and a completely
> new sysfs interface to enable intel_gpu_top to read the per client stats.
> 
> IGT counterpart will be sent separately.

FWIW at least one more person thinks this would be a nice-to-have
feature - https://twitter.com/IntelGraphics/status/1047991913972826112.
But it sure feels weird to cross-link Twitter to intel-gfx! Sign of the
times.. :)

Regards,

Tvrtko

> 
> Tvrtko Ursulin (13):
>    drm/i915/pmu: Fix enable count array size and bounds checking
>    drm/i915: Keep a count of requests waiting for a slot on GPU
>    drm/i915: Keep a count of requests submitted from userspace
>    drm/i915/pmu: Add queued counter
>    drm/i915/pmu: Add runnable counter
>    drm/i915/pmu: Add running counter
>    drm/i915: Store engine backpointer in the intel_context
>    drm/i915: Move intel_engine_context_in/out into intel_lrc.c
>    drm/i915: Track per-context engine busyness
>    drm/i915: Expose list of clients in sysfs
>    drm/i915: Update client name on context create
>    drm/i915: Expose per-engine client busyness
>    drm/i915: Add sysfs toggle to enable per-client engine stats
> 
>   drivers/gpu/drm/i915/i915_drv.h         |  39 +++++
>   drivers/gpu/drm/i915/i915_gem.c         | 197 +++++++++++++++++++++++-
>   drivers/gpu/drm/i915/i915_gem_context.c |  18 ++-
>   drivers/gpu/drm/i915/i915_gem_context.h |  18 +++
>   drivers/gpu/drm/i915/i915_pmu.c         | 103 +++++++++++--
>   drivers/gpu/drm/i915/i915_request.c     |  10 ++
>   drivers/gpu/drm/i915/i915_sysfs.c       |  81 ++++++++++
>   drivers/gpu/drm/i915/intel_engine_cs.c  |  33 +++-
>   drivers/gpu/drm/i915/intel_lrc.c        | 109 ++++++++++++-
>   drivers/gpu/drm/i915/intel_ringbuffer.h |  76 +++------
>   include/uapi/drm/i915_drm.h             |  19 ++-
>   11 files changed, 614 insertions(+), 89 deletions(-)
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

