* [PATCH 1/2] drm/i915: Keep a count of requests waiting for a slot on GPU
@ 2017-11-22 12:46 Tvrtko Ursulin
  2017-11-22 12:46 ` [PATCH 2/2] drm/i915/pmu: Add queued counter Tvrtko Ursulin
  2017-11-22 12:59 ` [PATCH 1/2] drm/i915: Keep a count of requests waiting for a slot on GPU Chris Wilson
  0 siblings, 2 replies; 10+ messages in thread
From: Tvrtko Ursulin @ 2017-11-22 12:46 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Keep a per-engine number of runnable (waiting for GPU time) requests.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_request.c | 5 +++++
 drivers/gpu/drm/i915/intel_engine_cs.c  | 5 +++--
 drivers/gpu/drm/i915/intel_lrc.c        | 1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h | 8 ++++++++
 4 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 7325469ce754..e3c74cafa7d4 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -480,6 +480,9 @@ void __i915_gem_request_submit(struct drm_i915_gem_request *request)
 	engine->emit_breadcrumb(request,
 				request->ring->vaddr + request->postfix);
 
+	GEM_BUG_ON(engine->queued == 0);
+	engine->queued--;
+
 	spin_lock(&request->timeline->lock);
 	list_move_tail(&request->link, &timeline->requests);
 	spin_unlock(&request->timeline->lock);
@@ -525,6 +528,8 @@ void __i915_gem_request_unsubmit(struct drm_i915_gem_request *request)
 	timeline = request->timeline;
 	GEM_BUG_ON(timeline == engine->timeline);
 
+	engine->queued++;
+
 	spin_lock(&timeline->lock);
 	list_move(&request->link, &timeline->requests);
 	spin_unlock(&timeline->lock);
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index d53680c08cb0..cc9d60130ddd 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1675,12 +1675,13 @@ void intel_engine_dump(struct intel_engine_cs *engine, struct drm_printer *m)
 	u64 addr;
 
 	drm_printf(m, "%s\n", engine->name);
-	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], inflight %d\n",
+	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], inflight %d, queued %d\n",
 		   intel_engine_get_seqno(engine),
 		   intel_engine_last_submit(engine),
 		   engine->hangcheck.seqno,
 		   jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp),
-		   engine->timeline->inflight_seqnos);
+		   engine->timeline->inflight_seqnos,
+		   INTEL_GEN(dev_priv) >= 8 ? engine->queued : -1);
 	drm_printf(m, "\tReset count: %d\n",
 		   i915_reset_engine_count(error, engine));
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 570864583e28..c4c53ad67b4c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -953,6 +953,7 @@ static void insert_request(struct intel_engine_cs *engine,
 {
 	struct i915_priolist *p = lookup_priolist(engine, pt, prio);
 
+	engine->queued++;
 	list_add_tail(&pt->link, &ptr_mask_bits(p, 1)->requests);
 	if (ptr_unmask_bits(p, 1))
 		tasklet_hi_schedule(&engine->execlists.tasklet);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 37a389ff031e..fe1651ec9756 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -297,6 +297,14 @@ struct intel_engine_cs {
 
 	struct intel_ring *buffer;
 	struct intel_timeline *timeline;
+	/**
+	 * @queued: Number of runnable requests submitted to the backend.
+	 *
+	 * Count of requests waiting for the GPU to execute them.
+	 *
+	 * Valid only with the execlists and GuC submission backends.
+	 */
+	unsigned int queued;
 
 	struct drm_i915_gem_object *default_state;
 
-- 
2.14.1


* [PATCH 2/2] drm/i915/pmu: Add queued counter
  2017-11-22 12:46 [PATCH 1/2] drm/i915: Keep a count of requests waiting for a slot on GPU Tvrtko Ursulin
@ 2017-11-22 12:46 ` Tvrtko Ursulin
  2017-11-22 13:01   ` Chris Wilson
  2017-11-22 21:15   ` Rogozhkin, Dmitry V
  2017-11-22 12:59 ` [PATCH 1/2] drm/i915: Keep a count of requests waiting for a slot on GPU Chris Wilson
  1 sibling, 2 replies; 10+ messages in thread
From: Tvrtko Ursulin @ 2017-11-22 12:46 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We add a PMU counter to expose the number of requests currently submitted
to the GPU, plus the number of runnable requests waiting on GPU time.

This is useful to analyze the overall load of the system.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c | 30 +++++++++++++++++++++++++-----
 include/uapi/drm/i915_drm.h     |  6 ++++++
 2 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 112243720ff3..b2b4b32af35f 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -36,7 +36,8 @@
 #define ENGINE_SAMPLE_MASK \
 	(BIT(I915_SAMPLE_BUSY) | \
 	 BIT(I915_SAMPLE_WAIT) | \
-	 BIT(I915_SAMPLE_SEMA))
+	 BIT(I915_SAMPLE_SEMA) | \
+	 BIT(I915_SAMPLE_QUEUED))
 
 #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
 
@@ -223,6 +224,12 @@ static void engines_sample(struct drm_i915_private *dev_priv)
 
 		update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
 			      PERIOD, !!(val & RING_WAIT_SEMAPHORE));
+
+		if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED))
+			update_sample(&engine->pmu.sample[I915_SAMPLE_QUEUED],
+				      1 / I915_SAMPLE_QUEUED_SCALE,
+				      engine->queued +
+				      (last_seqno - current_seqno));
 	}
 
 	if (fw)
@@ -310,6 +317,10 @@ static int engine_event_init(struct perf_event *event)
 		if (INTEL_GEN(i915) < 6)
 			return -ENODEV;
 		break;
+	case I915_SAMPLE_QUEUED:
+		if (INTEL_GEN(i915) < 8)
+			return -ENODEV;
+		break;
 	default:
 		return -ENOENT;
 	}
@@ -399,6 +410,10 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 		} else if (sample == I915_SAMPLE_BUSY &&
 			   engine->pmu.busy_stats) {
 			val = ktime_to_ns(intel_engine_get_busy_time(engine));
+		} else if (sample == I915_SAMPLE_QUEUED) {
+			val =
+			   div_u64(engine->pmu.sample[I915_SAMPLE_QUEUED].cur,
+				   FREQUENCY);
 		} else {
 			val = engine->pmu.sample[sample].cur;
 		}
@@ -679,13 +694,18 @@ static ssize_t i915_pmu_event_show(struct device *dev,
 	I915_EVENT_STR(_name.unit, _unit)
 
 #define I915_ENGINE_EVENT(_name, _class, _instance, _sample) \
-	I915_EVENT_ATTR(_name, __I915_PMU_ENGINE(_class, _instance, _sample)), \
+	I915_EVENT_ATTR(_name, __I915_PMU_ENGINE(_class, _instance, _sample))
+
+#define I915_ENGINE_EVENT_NS(_name, _class, _instance, _sample) \
+	I915_ENGINE_EVENT(_name, _class, _instance, _sample), \
 	I915_EVENT_STR(_name.unit, "ns")
 
 #define I915_ENGINE_EVENTS(_name, _class, _instance) \
-	I915_ENGINE_EVENT(_name##_instance-busy, _class, _instance, I915_SAMPLE_BUSY), \
-	I915_ENGINE_EVENT(_name##_instance-sema, _class, _instance, I915_SAMPLE_SEMA), \
-	I915_ENGINE_EVENT(_name##_instance-wait, _class, _instance, I915_SAMPLE_WAIT)
+	I915_ENGINE_EVENT_NS(_name##_instance-busy, _class, _instance, I915_SAMPLE_BUSY), \
+	I915_ENGINE_EVENT_NS(_name##_instance-sema, _class, _instance, I915_SAMPLE_SEMA), \
+	I915_ENGINE_EVENT_NS(_name##_instance-wait, _class, _instance, I915_SAMPLE_WAIT), \
+	I915_ENGINE_EVENT(_name##_instance-queued, _class, _instance, I915_SAMPLE_QUEUED), \
+	I915_EVENT_STR(_name##_instance-queued.scale, __stringify(I915_SAMPLE_QUEUED_SCALE))
 
 static struct attribute *i915_pmu_events_attrs[] = {
 	I915_ENGINE_EVENTS(rcs, I915_ENGINE_CLASS_RENDER, 0),
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 915a6e85a855..20ee668d1428 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -111,9 +111,12 @@ enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_BUSY = 0,
 	I915_SAMPLE_WAIT = 1,
 	I915_SAMPLE_SEMA = 2,
+	I915_SAMPLE_QUEUED = 3,
 	I915_ENGINE_SAMPLE_MAX /* non-ABI */
 };
 
+#define I915_SAMPLE_QUEUED_SCALE 1e-2 /* No braces please. */
+
 #define I915_PMU_SAMPLE_BITS (4)
 #define I915_PMU_SAMPLE_MASK (0xf)
 #define I915_PMU_SAMPLE_INSTANCE_BITS (8)
@@ -134,6 +137,9 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_SEMA(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
 
+#define I915_PMU_ENGINE_QUEUED(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED)
+
 #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
-- 
2.14.1


* Re: [PATCH 1/2] drm/i915: Keep a count of requests waiting for a slot on GPU
  2017-11-22 12:46 [PATCH 1/2] drm/i915: Keep a count of requests waiting for a slot on GPU Tvrtko Ursulin
  2017-11-22 12:46 ` [PATCH 2/2] drm/i915/pmu: Add queued counter Tvrtko Ursulin
@ 2017-11-22 12:59 ` Chris Wilson
  2017-11-22 13:31   ` Tvrtko Ursulin
  1 sibling, 1 reply; 10+ messages in thread
From: Chris Wilson @ 2017-11-22 12:59 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

Quoting Tvrtko Ursulin (2017-11-22 12:46:21)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Keep a per-engine number of runnable (waiting for GPU time) requests.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_request.c | 5 +++++
>  drivers/gpu/drm/i915/intel_engine_cs.c  | 5 +++--
>  drivers/gpu/drm/i915/intel_lrc.c        | 1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.h | 8 ++++++++
>  4 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index 7325469ce754..e3c74cafa7d4 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -480,6 +480,9 @@ void __i915_gem_request_submit(struct drm_i915_gem_request *request)
>         engine->emit_breadcrumb(request,
>                                 request->ring->vaddr + request->postfix);
>  
> +       GEM_BUG_ON(engine->queued == 0);
> +       engine->queued--;

Ok, so under engine->timeline->lock.
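That is, the accounting relies on the submit path already holding the
engine timeline lock; a sketch of the assumed call context (the
i915_gem_request_submit() wrapper of that era, not part of this patch):

	/* i915_gem_request_submit() - the locked wrapper */
	struct intel_engine_cs *engine = request->engine;
	unsigned long flags;

	spin_lock_irqsave(&engine->timeline->lock, flags);
	__i915_gem_request_submit(request);	/* engine->queued-- runs here */
	spin_unlock_irqrestore(&engine->timeline->lock, flags);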

> +
>         spin_lock(&request->timeline->lock);
>         list_move_tail(&request->link, &timeline->requests);
>         spin_unlock(&request->timeline->lock);
> @@ -525,6 +528,8 @@ void __i915_gem_request_unsubmit(struct drm_i915_gem_request *request)
>         timeline = request->timeline;
>         GEM_BUG_ON(timeline == engine->timeline);
>  
> +       engine->queued++;
> +
>         spin_lock(&timeline->lock);
>         list_move(&request->link, &timeline->requests);
>         spin_unlock(&timeline->lock);
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index d53680c08cb0..cc9d60130ddd 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -1675,12 +1675,13 @@ void intel_engine_dump(struct intel_engine_cs *engine, struct drm_printer *m)
>         u64 addr;
>  
>         drm_printf(m, "%s\n", engine->name);
> -       drm_printf(m, "    current seqno %x, last %x, hangcheck %x [%d ms], inflight %d\n",
> +       drm_printf(m, "    current seqno %x, last %x, hangcheck %x [%d ms], inflight %d, queued %d\n",
>                    intel_engine_get_seqno(engine),
>                    intel_engine_last_submit(engine),
>                    engine->hangcheck.seqno,
>                    jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp),
> -                  engine->timeline->inflight_seqnos);
> +                  engine->timeline->inflight_seqnos,
> +                  INTEL_GEN(dev_priv) >= 8 ? engine->queued : -1);

Not gen8 specific, just add engine->queued++ to i9xx_submit_request().
-Chris
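
For illustration, that suggestion would look roughly like the sketch
below; the body of i9xx_submit_request() is an assumption based on the
legacy ringbuffer submission path of that era and may not match the
exact tree state:

	static void i9xx_submit_request(struct drm_i915_gem_request *request)
	{
		struct drm_i915_private *dev_priv = request->i915;

		request->engine->queued++;	/* before the immediate submit below */
		i915_gem_request_submit(request);

		I915_WRITE_TAIL(request->engine,
				intel_ring_set_tail(request->ring, request->tail));
	}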

* Re: [PATCH 2/2] drm/i915/pmu: Add queued counter
  2017-11-22 12:46 ` [PATCH 2/2] drm/i915/pmu: Add queued counter Tvrtko Ursulin
@ 2017-11-22 13:01   ` Chris Wilson
  2017-11-22 21:15   ` Rogozhkin, Dmitry V
  1 sibling, 0 replies; 10+ messages in thread
From: Chris Wilson @ 2017-11-22 13:01 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

Quoting Tvrtko Ursulin (2017-11-22 12:46:22)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> We add a PMU counter to expose the number of requests currently submitted
> to the GPU, plus the number of runnable requests waiting on GPU time.
> 
> This is useful to analyze the overall load of the system.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_pmu.c | 30 +++++++++++++++++++++++++-----
>  include/uapi/drm/i915_drm.h     |  6 ++++++
>  2 files changed, 31 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index 112243720ff3..b2b4b32af35f 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -36,7 +36,8 @@
>  #define ENGINE_SAMPLE_MASK \
>         (BIT(I915_SAMPLE_BUSY) | \
>          BIT(I915_SAMPLE_WAIT) | \
> -        BIT(I915_SAMPLE_SEMA))
> +        BIT(I915_SAMPLE_SEMA) | \
> +        BIT(I915_SAMPLE_QUEUED))
>  
>  #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
>  
> @@ -223,6 +224,12 @@ static void engines_sample(struct drm_i915_private *dev_priv)
>  
>                 update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
>                               PERIOD, !!(val & RING_WAIT_SEMAPHORE));
> +
> +               if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED))
> +                       update_sample(&engine->pmu.sample[I915_SAMPLE_QUEUED],
> +                                     1 / I915_SAMPLE_QUEUED_SCALE,
> +                                     engine->queued +

Use READ_ONCE(engine->queued) to indicate that this read is outside of
its lock, and to flag that we haven't serialised it with last_seqno. (And
no serialisation can be done against current_seqno ;)
-Chris
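
A sketch of the resulting hunk in engines_sample(), with the unlocked
read made explicit (the same logic as the patch, only annotated):

	if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED))
		update_sample(&engine->pmu.sample[I915_SAMPLE_QUEUED],
			      1 / I915_SAMPLE_QUEUED_SCALE,
			      READ_ONCE(engine->queued) +
			      (last_seqno - current_seqno));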

* Re: [PATCH 1/2] drm/i915: Keep a count of requests waiting for a slot on GPU
  2017-11-22 12:59 ` [PATCH 1/2] drm/i915: Keep a count of requests waiting for a slot on GPU Chris Wilson
@ 2017-11-22 13:31   ` Tvrtko Ursulin
  2017-11-22 13:35     ` Chris Wilson
  0 siblings, 1 reply; 10+ messages in thread
From: Tvrtko Ursulin @ 2017-11-22 13:31 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, Intel-gfx


On 22/11/2017 12:59, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2017-11-22 12:46:21)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Keep a per-engine number of runnable (waiting for GPU time) requests.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_gem_request.c | 5 +++++
>>   drivers/gpu/drm/i915/intel_engine_cs.c  | 5 +++--
>>   drivers/gpu/drm/i915/intel_lrc.c        | 1 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.h | 8 ++++++++
>>   4 files changed, 17 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
>> index 7325469ce754..e3c74cafa7d4 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_request.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
>> @@ -480,6 +480,9 @@ void __i915_gem_request_submit(struct drm_i915_gem_request *request)
>>          engine->emit_breadcrumb(request,
>>                                  request->ring->vaddr + request->postfix);
>>   
>> +       GEM_BUG_ON(engine->queued == 0);
>> +       engine->queued--;
> 
> Ok, so under engine->timeline->lock.
> 
>> +
>>          spin_lock(&request->timeline->lock);
>>          list_move_tail(&request->link, &timeline->requests);
>>          spin_unlock(&request->timeline->lock);
>> @@ -525,6 +528,8 @@ void __i915_gem_request_unsubmit(struct drm_i915_gem_request *request)
>>          timeline = request->timeline;
>>          GEM_BUG_ON(timeline == engine->timeline);
>>   
>> +       engine->queued++;
>> +
>>          spin_lock(&timeline->lock);
>>          list_move(&request->link, &timeline->requests);
>>          spin_unlock(&timeline->lock);
>> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
>> index d53680c08cb0..cc9d60130ddd 100644
>> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
>> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
>> @@ -1675,12 +1675,13 @@ void intel_engine_dump(struct intel_engine_cs *engine, struct drm_printer *m)
>>          u64 addr;
>>   
>>          drm_printf(m, "%s\n", engine->name);
>> -       drm_printf(m, "    current seqno %x, last %x, hangcheck %x [%d ms], inflight %d\n",
>> +       drm_printf(m, "    current seqno %x, last %x, hangcheck %x [%d ms], inflight %d, queued %d\n",
>>                     intel_engine_get_seqno(engine),
>>                     intel_engine_last_submit(engine),
>>                     engine->hangcheck.seqno,
>>                     jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp),
>> -                  engine->timeline->inflight_seqnos);
>> +                  engine->timeline->inflight_seqnos,
>> +                  INTEL_GEN(dev_priv) >= 8 ? engine->queued : -1);
> 
> Not gen8 specific, just add engine->queued++ to i9xx_submit_request().

But where to put the decrement, and more importantly, how not to make
it lag reality from the retire worker? :(

Regards,

Tvrtko

* Re: [PATCH 1/2] drm/i915: Keep a count of requests waiting for a slot on GPU
  2017-11-22 13:31   ` Tvrtko Ursulin
@ 2017-11-22 13:35     ` Chris Wilson
  2017-11-22 14:00       ` Tvrtko Ursulin
  0 siblings, 1 reply; 10+ messages in thread
From: Chris Wilson @ 2017-11-22 13:35 UTC (permalink / raw)
  To: Tvrtko Ursulin, Tvrtko Ursulin, Intel-gfx

Quoting Tvrtko Ursulin (2017-11-22 13:31:56)
> 
> On 22/11/2017 12:59, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2017-11-22 12:46:21)
> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >> Keep a per-engine number of runnable (waiting for GPU time) requests.
> >>
> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/i915_gem_request.c | 5 +++++
> >>   drivers/gpu/drm/i915/intel_engine_cs.c  | 5 +++--
> >>   drivers/gpu/drm/i915/intel_lrc.c        | 1 +
> >>   drivers/gpu/drm/i915/intel_ringbuffer.h | 8 ++++++++
> >>   4 files changed, 17 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> >> index 7325469ce754..e3c74cafa7d4 100644
> >> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> >> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> >> @@ -480,6 +480,9 @@ void __i915_gem_request_submit(struct drm_i915_gem_request *request)
> >>          engine->emit_breadcrumb(request,
> >>                                  request->ring->vaddr + request->postfix);
> >>   
> >> +       GEM_BUG_ON(engine->queued == 0);
> >> +       engine->queued--;
> > 
> > Ok, so under engine->timeline->lock.
> > 
> >> +
> >>          spin_lock(&request->timeline->lock);
> >>          list_move_tail(&request->link, &timeline->requests);
> >>          spin_unlock(&request->timeline->lock);
> >> @@ -525,6 +528,8 @@ void __i915_gem_request_unsubmit(struct drm_i915_gem_request *request)
> >>          timeline = request->timeline;
> >>          GEM_BUG_ON(timeline == engine->timeline);
> >>   
> >> +       engine->queued++;
> >> +
> >>          spin_lock(&timeline->lock);
> >>          list_move(&request->link, &timeline->requests);
> >>          spin_unlock(&timeline->lock);
> >> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> >> index d53680c08cb0..cc9d60130ddd 100644
> >> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> >> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> >> @@ -1675,12 +1675,13 @@ void intel_engine_dump(struct intel_engine_cs *engine, struct drm_printer *m)
> >>          u64 addr;
> >>   
> >>          drm_printf(m, "%s\n", engine->name);
> >> -       drm_printf(m, "    current seqno %x, last %x, hangcheck %x [%d ms], inflight %d\n",
> >> +       drm_printf(m, "    current seqno %x, last %x, hangcheck %x [%d ms], inflight %d, queued %d\n",
> >>                     intel_engine_get_seqno(engine),
> >>                     intel_engine_last_submit(engine),
> >>                     engine->hangcheck.seqno,
> >>                     jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp),
> >> -                  engine->timeline->inflight_seqnos);
> >> +                  engine->timeline->inflight_seqnos,
> >> +                  INTEL_GEN(dev_priv) >= 8 ? engine->queued : -1);
> > 
> > Not gen8 specific, just add engine->queued++ to i9xx_submit_request().
> 
> But where to put the decrement, and more importantly, how not to make
> it lag reality from the retire worker? :(

The decrement is in __i915_gem_request_submit as before.
So basically engine->queued should remain 0, since we aren't keeping a
queue of work for the HW and are just submitting into the ringbuffer as
soon as we are ready. (This may not always remain so...) Hence the
(last_seqno - current_seqno) term is so important.
-Chris
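
As a concrete example of the accounting (hypothetical numbers): with
execlists, if three requests sit in the priority tree awaiting a port
and two more have been submitted to the hardware but not yet completed,
engine->queued is 3 and (last_seqno - current_seqno) is 2, so the
sampler sees a queue depth of 5. With legacy ring submission,
engine->queued stays 0 and the seqno delta alone covers everything
between submission and completion.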

* Re: [PATCH 1/2] drm/i915: Keep a count of requests waiting for a slot on GPU
  2017-11-22 13:35     ` Chris Wilson
@ 2017-11-22 14:00       ` Tvrtko Ursulin
  0 siblings, 0 replies; 10+ messages in thread
From: Tvrtko Ursulin @ 2017-11-22 14:00 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, Intel-gfx


On 22/11/2017 13:35, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2017-11-22 13:31:56)
>>
>> On 22/11/2017 12:59, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2017-11-22 12:46:21)
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> Keep a per-engine number of runnable (waiting for GPU time) requests.
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/i915_gem_request.c | 5 +++++
>>>>    drivers/gpu/drm/i915/intel_engine_cs.c  | 5 +++--
>>>>    drivers/gpu/drm/i915/intel_lrc.c        | 1 +
>>>>    drivers/gpu/drm/i915/intel_ringbuffer.h | 8 ++++++++
>>>>    4 files changed, 17 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
>>>> index 7325469ce754..e3c74cafa7d4 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem_request.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
>>>> @@ -480,6 +480,9 @@ void __i915_gem_request_submit(struct drm_i915_gem_request *request)
>>>>           engine->emit_breadcrumb(request,
>>>>                                   request->ring->vaddr + request->postfix);
>>>>    
>>>> +       GEM_BUG_ON(engine->queued == 0);
>>>> +       engine->queued--;
>>>
>>> Ok, so under engine->timeline->lock.
>>>
>>>> +
>>>>           spin_lock(&request->timeline->lock);
>>>>           list_move_tail(&request->link, &timeline->requests);
>>>>           spin_unlock(&request->timeline->lock);
>>>> @@ -525,6 +528,8 @@ void __i915_gem_request_unsubmit(struct drm_i915_gem_request *request)
>>>>           timeline = request->timeline;
>>>>           GEM_BUG_ON(timeline == engine->timeline);
>>>>    
>>>> +       engine->queued++;
>>>> +
>>>>           spin_lock(&timeline->lock);
>>>>           list_move(&request->link, &timeline->requests);
>>>>           spin_unlock(&timeline->lock);
>>>> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
>>>> index d53680c08cb0..cc9d60130ddd 100644
>>>> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
>>>> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
>>>> @@ -1675,12 +1675,13 @@ void intel_engine_dump(struct intel_engine_cs *engine, struct drm_printer *m)
>>>>           u64 addr;
>>>>    
>>>>           drm_printf(m, "%s\n", engine->name);
>>>> -       drm_printf(m, "    current seqno %x, last %x, hangcheck %x [%d ms], inflight %d\n",
>>>> +       drm_printf(m, "    current seqno %x, last %x, hangcheck %x [%d ms], inflight %d, queued %d\n",
>>>>                      intel_engine_get_seqno(engine),
>>>>                      intel_engine_last_submit(engine),
>>>>                      engine->hangcheck.seqno,
>>>>                      jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp),
>>>> -                  engine->timeline->inflight_seqnos);
>>>> +                  engine->timeline->inflight_seqnos,
>>>> +                  INTEL_GEN(dev_priv) >= 8 ? engine->queued : -1);
>>>
>>> Not gen8 specific, just add engine->queued++ to i9xx_submit_request().
>>
>> But where to put the decrement, and more importantly, how not to make
>> it lag reality from the retire worker? :(
> 
> The decrement is in __i915_gem_request_submit as before.
> So basically engine->queued should remain 0, since we aren't keeping a
> queue of work for the HW and are just submitting into the ringbuffer as
> soon as we are ready. (This may not always remain so...) Hence the
> (last_seqno - current_seqno) term is so important.

I keep getting lost in these callbacks...

Right, so if we put the increment in i9xx_submit_request, it calls 
i915_gem_request_submit which immediately decrements it, and as you say 
it remains at zero then.

So you just want to look at last_seqno - current_seqno on <gen8, which
will be the totality of submitted work, so that sounds OK. It makes the
metric usable on those platforms as well. Sorry for the confusion.

Regards,

Tvrtko
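
For example (hypothetical values): on a gen7 engine with
intel_engine_get_seqno() returning 0x10 and intel_engine_last_submit()
returning 0x13, last_seqno - current_seqno gives 3 requests submitted
but not yet completed, which is exactly the queue depth the counter
wants to report on those platforms.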

* Re: [PATCH 2/2] drm/i915/pmu: Add queued counter
  2017-11-22 12:46 ` [PATCH 2/2] drm/i915/pmu: Add queued counter Tvrtko Ursulin
  2017-11-22 13:01   ` Chris Wilson
@ 2017-11-22 21:15   ` Rogozhkin, Dmitry V
  2017-11-22 21:38     ` Chris Wilson
  1 sibling, 1 reply; 10+ messages in thread
From: Rogozhkin, Dmitry V @ 2017-11-22 21:15 UTC (permalink / raw)
  To: tursulin; +Cc: Intel-gfx

On Wed, 2017-11-22 at 12:46 +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> We add a PMU counter to expose the number of requests currently submitted
> to the GPU, plus the number of runnable requests waiting on GPU time.
> 
> This is useful to analyze the overall load of the system.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_pmu.c | 30 +++++++++++++++++++++++++-----
>  include/uapi/drm/i915_drm.h     |  6 ++++++
>  2 files changed, 31 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index 112243720ff3..b2b4b32af35f 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -36,7 +36,8 @@
>  #define ENGINE_SAMPLE_MASK \
>  	(BIT(I915_SAMPLE_BUSY) | \
>  	 BIT(I915_SAMPLE_WAIT) | \
> -	 BIT(I915_SAMPLE_SEMA))
> +	 BIT(I915_SAMPLE_SEMA) | \
> +	 BIT(I915_SAMPLE_QUEUED))
>  
>  #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
>  
> @@ -223,6 +224,12 @@ static void engines_sample(struct drm_i915_private *dev_priv)
>  
>  		update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
>  			      PERIOD, !!(val & RING_WAIT_SEMAPHORE));
> +
> +		if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED))
> +			update_sample(&engine->pmu.sample[I915_SAMPLE_QUEUED],
> +				      1 / I915_SAMPLE_QUEUED_SCALE,
> +				      engine->queued +
> +				      (last_seqno - current_seqno));
>  	}
>  
>  	if (fw)
> @@ -310,6 +317,10 @@ static int engine_event_init(struct perf_event *event)
>  		if (INTEL_GEN(i915) < 6)
>  			return -ENODEV;
>  		break;
> +	case I915_SAMPLE_QUEUED:
> +		if (INTEL_GEN(i915) < 8)
> +			return -ENODEV;
> +		break;
>  	default:
>  		return -ENOENT;
>  	}
> @@ -399,6 +410,10 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
>  		} else if (sample == I915_SAMPLE_BUSY &&
>  			   engine->pmu.busy_stats) {
>  			val = ktime_to_ns(intel_engine_get_busy_time(engine));
> +		} else if (sample == I915_SAMPLE_QUEUED) {
> +			val =
> +			   div_u64(engine->pmu.sample[I915_SAMPLE_QUEUED].cur,
> +				   FREQUENCY);
>  		} else {
>  			val = engine->pmu.sample[sample].cur;
>  		}
> @@ -679,13 +694,18 @@ static ssize_t i915_pmu_event_show(struct device *dev,
>  	I915_EVENT_STR(_name.unit, _unit)
>  
>  #define I915_ENGINE_EVENT(_name, _class, _instance, _sample) \
> -	I915_EVENT_ATTR(_name, __I915_PMU_ENGINE(_class, _instance, _sample)), \
> +	I915_EVENT_ATTR(_name, __I915_PMU_ENGINE(_class, _instance, _sample))
> +
> +#define I915_ENGINE_EVENT_NS(_name, _class, _instance, _sample) \
> +	I915_ENGINE_EVENT(_name, _class, _instance, _sample), \
>  	I915_EVENT_STR(_name.unit, "ns")
>  
>  #define I915_ENGINE_EVENTS(_name, _class, _instance) \
> -	I915_ENGINE_EVENT(_name##_instance-busy, _class, _instance, I915_SAMPLE_BUSY), \
> -	I915_ENGINE_EVENT(_name##_instance-sema, _class, _instance, I915_SAMPLE_SEMA), \
> -	I915_ENGINE_EVENT(_name##_instance-wait, _class, _instance, I915_SAMPLE_WAIT)
> +	I915_ENGINE_EVENT_NS(_name##_instance-busy, _class, _instance, I915_SAMPLE_BUSY), \
> +	I915_ENGINE_EVENT_NS(_name##_instance-sema, _class, _instance, I915_SAMPLE_SEMA), \
> +	I915_ENGINE_EVENT_NS(_name##_instance-wait, _class, _instance, I915_SAMPLE_WAIT), \
> +	I915_ENGINE_EVENT(_name##_instance-queued, _class, _instance, I915_SAMPLE_QUEUED), \
> +	I915_EVENT_STR(_name##_instance-queued.scale, __stringify(I915_SAMPLE_QUEUED_SCALE))

We expose queued as an "instant" metric, i.e. it is the number of
requests at the very moment we query the metric, i.e. not an
ever-growing counter - is that right? I doubt such a metric will make
sense for perf-stat. Can we somehow restrict it to be queried by uAPI
only and avoid perf-stat for it?

>  
>  static struct attribute *i915_pmu_events_attrs[] = {
>  	I915_ENGINE_EVENTS(rcs, I915_ENGINE_CLASS_RENDER, 0),
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 915a6e85a855..20ee668d1428 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -111,9 +111,12 @@ enum drm_i915_pmu_engine_sample {
>  	I915_SAMPLE_BUSY = 0,
>  	I915_SAMPLE_WAIT = 1,
>  	I915_SAMPLE_SEMA = 2,
> +	I915_SAMPLE_QUEUED = 3,
>  	I915_ENGINE_SAMPLE_MAX /* non-ABI */
>  };
>  
> +#define I915_SAMPLE_QUEUED_SCALE 1e-2 /* No braces please. */
> +
>  #define I915_PMU_SAMPLE_BITS (4)
>  #define I915_PMU_SAMPLE_MASK (0xf)
>  #define I915_PMU_SAMPLE_INSTANCE_BITS (8)
> @@ -134,6 +137,9 @@ enum drm_i915_pmu_engine_sample {
>  #define I915_PMU_ENGINE_SEMA(class, instance) \
>  	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
>  
> +#define I915_PMU_ENGINE_QUEUED(class, instance) \
> +	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED)
> +
>  #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
>  
>  #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)


* Re: [PATCH 2/2] drm/i915/pmu: Add queued counter
  2017-11-22 21:15   ` Rogozhkin, Dmitry V
@ 2017-11-22 21:38     ` Chris Wilson
  2017-11-24  8:14       ` Tvrtko Ursulin
  0 siblings, 1 reply; 10+ messages in thread
From: Chris Wilson @ 2017-11-22 21:38 UTC (permalink / raw)
  To: Rogozhkin, Dmitry V, tursulin; +Cc: Intel-gfx

Quoting Rogozhkin, Dmitry V (2017-11-22 21:15:24)
> On Wed, 2017-11-22 at 12:46 +0000, Tvrtko Ursulin wrote:
> > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > 
> > We add a PMU counter to expose the number of requests currently submitted
> > to the GPU, plus the number of runnable requests waiting on GPU time.
> > 
> > This is useful to analyze the overall load of the system.
> > 
> > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_pmu.c | 30 +++++++++++++++++++++++++-----
> >  include/uapi/drm/i915_drm.h     |  6 ++++++
> >  2 files changed, 31 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> > index 112243720ff3..b2b4b32af35f 100644
> > --- a/drivers/gpu/drm/i915/i915_pmu.c
> > +++ b/drivers/gpu/drm/i915/i915_pmu.c
> > @@ -36,7 +36,8 @@
> >  #define ENGINE_SAMPLE_MASK \
> >       (BIT(I915_SAMPLE_BUSY) | \
> >        BIT(I915_SAMPLE_WAIT) | \
> > -      BIT(I915_SAMPLE_SEMA))
> > +      BIT(I915_SAMPLE_SEMA) | \
> > +      BIT(I915_SAMPLE_QUEUED))
> >  
> >  #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
> >  
> > @@ -223,6 +224,12 @@ static void engines_sample(struct drm_i915_private *dev_priv)
> >  
> >               update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
> >                             PERIOD, !!(val & RING_WAIT_SEMAPHORE));
> > +
> > +             if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED))
> > +                     update_sample(&engine->pmu.sample[I915_SAMPLE_QUEUED],
> > +                                   1 / I915_SAMPLE_QUEUED_SCALE,
> > +                                   engine->queued +
> > +                                   (last_seqno - current_seqno));
> >       }
> >  
> >       if (fw)
> > @@ -310,6 +317,10 @@ static int engine_event_init(struct perf_event *event)
> >               if (INTEL_GEN(i915) < 6)
> >                       return -ENODEV;
> >               break;
> > +     case I915_SAMPLE_QUEUED:
> > +             if (INTEL_GEN(i915) < 8)
> > +                     return -ENODEV;
> > +             break;
> >       default:
> >               return -ENOENT;
> >       }
> > @@ -399,6 +410,10 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
> >               } else if (sample == I915_SAMPLE_BUSY &&
> >                          engine->pmu.busy_stats) {
> >                       val = ktime_to_ns(intel_engine_get_busy_time(engine));
> > +             } else if (sample == I915_SAMPLE_QUEUED) {
> > +                     val =
> > +                        div_u64(engine->pmu.sample[I915_SAMPLE_QUEUED].cur,
> > +                                FREQUENCY);
> >               } else {
> >                       val = engine->pmu.sample[sample].cur;
> >               }
> > @@ -679,13 +694,18 @@ static ssize_t i915_pmu_event_show(struct device *dev,
> >       I915_EVENT_STR(_name.unit, _unit)
> >  
> >  #define I915_ENGINE_EVENT(_name, _class, _instance, _sample) \
> > -     I915_EVENT_ATTR(_name, __I915_PMU_ENGINE(_class, _instance, _sample)), \
> > +     I915_EVENT_ATTR(_name, __I915_PMU_ENGINE(_class, _instance, _sample))
> > +
> > +#define I915_ENGINE_EVENT_NS(_name, _class, _instance, _sample) \
> > +     I915_ENGINE_EVENT(_name, _class, _instance, _sample), \
> >       I915_EVENT_STR(_name.unit, "ns")
> >  
> >  #define I915_ENGINE_EVENTS(_name, _class, _instance) \
> > -     I915_ENGINE_EVENT(_name##_instance-busy, _class, _instance, I915_SAMPLE_BUSY), \
> > -     I915_ENGINE_EVENT(_name##_instance-sema, _class, _instance, I915_SAMPLE_SEMA), \
> > -     I915_ENGINE_EVENT(_name##_instance-wait, _class, _instance, I915_SAMPLE_WAIT)
> > +     I915_ENGINE_EVENT_NS(_name##_instance-busy, _class, _instance, I915_SAMPLE_BUSY), \
> > +     I915_ENGINE_EVENT_NS(_name##_instance-sema, _class, _instance, I915_SAMPLE_SEMA), \
> > +     I915_ENGINE_EVENT_NS(_name##_instance-wait, _class, _instance, I915_SAMPLE_WAIT), \
> > +     I915_ENGINE_EVENT(_name##_instance-queued, _class, _instance, I915_SAMPLE_QUEUED), \
> > +     I915_EVENT_STR(_name##_instance-queued.scale, __stringify(I915_SAMPLE_QUEUED_SCALE))
> 
> We expose queued as an "instant" metric, i.e. it is the number of
> requests at the very moment we query the metric, i.e. not an
> ever-growing counter - is that right? I doubt such a metric will make
> sense for perf-stat. Can we somehow restrict it to be queried by uAPI
> only and avoid perf-stat for it?

True, I forgot that Tvrtko normalised it. We don't need to, and so allow
the user to apply their own normalisation and generate average values
for the last N seconds instead; aiui.
-Chris
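
As a usage sketch (the event name is what the macros in the patch would
generate, and perf's userspace applies the advertised .scale itself;
both are assumptions about the final uAPI):

	# system-wide, print the queued counter once per second
	perf stat -a -I 1000 -e i915/rcs0-queued/ -- sleep 10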

* Re: [PATCH 2/2] drm/i915/pmu: Add queued counter
  2017-11-22 21:38     ` Chris Wilson
@ 2017-11-24  8:14       ` Tvrtko Ursulin
  0 siblings, 0 replies; 10+ messages in thread
From: Tvrtko Ursulin @ 2017-11-24  8:14 UTC (permalink / raw)
  To: Chris Wilson, Rogozhkin, Dmitry V, tursulin; +Cc: Intel-gfx


On 22/11/2017 21:38, Chris Wilson wrote:
> Quoting Rogozhkin, Dmitry V (2017-11-22 21:15:24)
>> On Wed, 2017-11-22 at 12:46 +0000, Tvrtko Ursulin wrote:
>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>
>>> We add a PMU counter to expose the number of requests currently submitted
>>> to the GPU, plus the number of runnable requests waiting on GPU time.
>>>
>>> This is useful to analyze the overall load of the system.
>>>
>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_pmu.c | 30 +++++++++++++++++++++++++-----
>>>   include/uapi/drm/i915_drm.h     |  6 ++++++
>>>   2 files changed, 31 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>>> index 112243720ff3..b2b4b32af35f 100644
>>> --- a/drivers/gpu/drm/i915/i915_pmu.c
>>> +++ b/drivers/gpu/drm/i915/i915_pmu.c
>>> @@ -36,7 +36,8 @@
>>>   #define ENGINE_SAMPLE_MASK \
>>>        (BIT(I915_SAMPLE_BUSY) | \
>>>         BIT(I915_SAMPLE_WAIT) | \
>>> -      BIT(I915_SAMPLE_SEMA))
>>> +      BIT(I915_SAMPLE_SEMA) | \
>>> +      BIT(I915_SAMPLE_QUEUED))
>>>   
>>>   #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
>>>   
>>> @@ -223,6 +224,12 @@ static void engines_sample(struct drm_i915_private *dev_priv)
>>>   
>>>                update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
>>>                              PERIOD, !!(val & RING_WAIT_SEMAPHORE));
>>> +
>>> +             if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED))
>>> +                     update_sample(&engine->pmu.sample[I915_SAMPLE_QUEUED],
>>> +                                   1 / I915_SAMPLE_QUEUED_SCALE,
>>> +                                   engine->queued +
>>> +                                   (last_seqno - current_seqno));
>>>        }
>>>   
>>>        if (fw)
>>> @@ -310,6 +317,10 @@ static int engine_event_init(struct perf_event *event)
>>>                if (INTEL_GEN(i915) < 6)
>>>                        return -ENODEV;
>>>                break;
>>> +     case I915_SAMPLE_QUEUED:
>>> +             if (INTEL_GEN(i915) < 8)
>>> +                     return -ENODEV;
>>> +             break;
>>>        default:
>>>                return -ENOENT;
>>>        }
>>> @@ -399,6 +410,10 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
>>>                } else if (sample == I915_SAMPLE_BUSY &&
>>>                           engine->pmu.busy_stats) {
>>>                        val = ktime_to_ns(intel_engine_get_busy_time(engine));
>>> +             } else if (sample == I915_SAMPLE_QUEUED) {
>>> +                     val =
>>> +                        div_u64(engine->pmu.sample[I915_SAMPLE_QUEUED].cur,
>>> +                                FREQUENCY);
>>>                } else {
>>>                        val = engine->pmu.sample[sample].cur;
>>>                }
>>> @@ -679,13 +694,18 @@ static ssize_t i915_pmu_event_show(struct device *dev,
>>>        I915_EVENT_STR(_name.unit, _unit)
>>>   
>>>   #define I915_ENGINE_EVENT(_name, _class, _instance, _sample) \
>>> -     I915_EVENT_ATTR(_name, __I915_PMU_ENGINE(_class, _instance, _sample)), \
>>> +     I915_EVENT_ATTR(_name, __I915_PMU_ENGINE(_class, _instance, _sample))
>>> +
>>> +#define I915_ENGINE_EVENT_NS(_name, _class, _instance, _sample) \
>>> +     I915_ENGINE_EVENT(_name, _class, _instance, _sample), \
>>>        I915_EVENT_STR(_name.unit, "ns")
>>>   
>>>   #define I915_ENGINE_EVENTS(_name, _class, _instance) \
>>> -     I915_ENGINE_EVENT(_name##_instance-busy, _class, _instance, I915_SAMPLE_BUSY), \
>>> -     I915_ENGINE_EVENT(_name##_instance-sema, _class, _instance, I915_SAMPLE_SEMA), \
>>> -     I915_ENGINE_EVENT(_name##_instance-wait, _class, _instance, I915_SAMPLE_WAIT)
>>> +     I915_ENGINE_EVENT_NS(_name##_instance-busy, _class, _instance, I915_SAMPLE_BUSY), \
>>> +     I915_ENGINE_EVENT_NS(_name##_instance-sema, _class, _instance, I915_SAMPLE_SEMA), \
>>> +     I915_ENGINE_EVENT_NS(_name##_instance-wait, _class, _instance, I915_SAMPLE_WAIT), \
>>> +     I915_ENGINE_EVENT(_name##_instance-queued, _class, _instance, I915_SAMPLE_QUEUED), \
>>> +     I915_EVENT_STR(_name##_instance-queued.scale, __stringify(I915_SAMPLE_QUEUED_SCALE))
>>
>> We expose queued as an "instant" metric, i.e. it is the number of
>> requests at the very moment we query the metric, i.e. not an
>> ever-growing counter - is that right? I doubt such a metric will make
>> sense for perf-stat. Can we somehow restrict it to be queried by uAPI
>> only and avoid perf-stat for it?
> 
> True, I forgot that Tvrtko normalised it. We don't need to, and so allow
> the user to apply their own normalisation and generate average values
> for the last N seconds instead; aiui.

It is not instantaneous as implemented, but sampled like the other
counters. So when userspace reads two samples, it gets the average QD
for the period between them. The more often it reads, the more accurate
the QD it gets.

That to me makes it sound useful for system monitoring.

For the load-balancing use case you are saying you would prefer an
instantaneous metric - so not an average over the previous period but
just the current state?

And you are saying that wouldn't make sense for perf stat, so you would
like to block it? I don't think we can do that.

In general I don't at the moment see this as appropriate to expose via
the PMU. Perhaps instantaneous QD could be exported via a private ioctl,
with the usual problems.

Regards,

Tvrtko
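
To make the averaging concrete, a hypothetical userspace helper for
deriving the mean queue depth from two raw counter reads (assuming the
driver's internal periodic sampling and the 1e-2 scale exposed above):

	#include <stdint.h>

	/* two raw perf counter values read dt seconds apart */
	static double i915_avg_queue_depth(uint64_t prev, uint64_t cur, double dt)
	{
		/* the kernel divides the accumulated sum by its sampling
		 * frequency on read, so the value advances by
		 * QD / I915_SAMPLE_QUEUED_SCALE per second; applying the
		 * advertised 1e-2 scale recovers the average QD */
		return (double)(cur - prev) * 1e-2 / dt;
	}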