* [PATCH 1/2] drm/i915: Keep a count of requests waiting for a slot on GPU @ 2017-11-22 12:46 Tvrtko Ursulin 2017-11-22 12:46 ` [PATCH 2/2] drm/i915/pmu: Add queued counter Tvrtko Ursulin 2017-11-22 12:59 ` [PATCH 1/2] drm/i915: Keep a count of requests waiting for a slot on GPU Chris Wilson 0 siblings, 2 replies; 10+ messages in thread From: Tvrtko Ursulin @ 2017-11-22 12:46 UTC (permalink / raw) To: Intel-gfx From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Keep a per-engine number of runnable (waiting for GPU time) requests. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> --- drivers/gpu/drm/i915/i915_gem_request.c | 5 +++++ drivers/gpu/drm/i915/intel_engine_cs.c | 5 +++-- drivers/gpu/drm/i915/intel_lrc.c | 1 + drivers/gpu/drm/i915/intel_ringbuffer.h | 8 ++++++++ 4 files changed, 17 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c index 7325469ce754..e3c74cafa7d4 100644 --- a/drivers/gpu/drm/i915/i915_gem_request.c +++ b/drivers/gpu/drm/i915/i915_gem_request.c @@ -480,6 +480,9 @@ void __i915_gem_request_submit(struct drm_i915_gem_request *request) engine->emit_breadcrumb(request, request->ring->vaddr + request->postfix); + GEM_BUG_ON(engine->queued == 0); + engine->queued--; + spin_lock(&request->timeline->lock); list_move_tail(&request->link, &timeline->requests); spin_unlock(&request->timeline->lock); @@ -525,6 +528,8 @@ void __i915_gem_request_unsubmit(struct drm_i915_gem_request *request) timeline = request->timeline; GEM_BUG_ON(timeline == engine->timeline); + engine->queued++; + spin_lock(&timeline->lock); list_move(&request->link, &timeline->requests); spin_unlock(&timeline->lock); diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c index d53680c08cb0..cc9d60130ddd 100644 --- a/drivers/gpu/drm/i915/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/intel_engine_cs.c @@ -1675,12 +1675,13 @@ void intel_engine_dump(struct intel_engine_cs 
*engine, struct drm_printer *m) u64 addr; drm_printf(m, "%s\n", engine->name); - drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], inflight %d\n", + drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], inflight %d, queued %d\n", intel_engine_get_seqno(engine), intel_engine_last_submit(engine), engine->hangcheck.seqno, jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp), - engine->timeline->inflight_seqnos); + engine->timeline->inflight_seqnos, + INTEL_GEN(dev_priv) >= 8 ? engine->queued : -1); drm_printf(m, "\tReset count: %d\n", i915_reset_engine_count(error, engine)); diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 570864583e28..c4c53ad67b4c 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -953,6 +953,7 @@ static void insert_request(struct intel_engine_cs *engine, { struct i915_priolist *p = lookup_priolist(engine, pt, prio); + engine->queued++; list_add_tail(&pt->link, &ptr_mask_bits(p, 1)->requests); if (ptr_unmask_bits(p, 1)) tasklet_hi_schedule(&engine->execlists.tasklet); diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index 37a389ff031e..fe1651ec9756 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -297,6 +297,14 @@ struct intel_engine_cs { struct intel_ring *buffer; struct intel_timeline *timeline; + /** + * @queued: Number of runnable requests submitted to the backend. + * + * Count of requests waiting for the GPU to execute them. + * + * Valid only with execlists and GuC submissions backends. + */ + unsigned int queued; struct drm_i915_gem_object *default_state; -- 2.14.1 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/2] drm/i915/pmu: Add queued counter 2017-11-22 12:46 [PATCH 1/2] drm/i915: Keep a count of requests waiting for a slot on GPU Tvrtko Ursulin @ 2017-11-22 12:46 ` Tvrtko Ursulin 2017-11-22 13:01 ` Chris Wilson 2017-11-22 21:15 ` Rogozhkin, Dmitry V 2017-11-22 12:59 ` [PATCH 1/2] drm/i915: Keep a count of requests waiting for a slot on GPU Chris Wilson 1 sibling, 2 replies; 10+ messages in thread From: Tvrtko Ursulin @ 2017-11-22 12:46 UTC (permalink / raw) To: Intel-gfx From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> We add a PMU counter to expose the number of requests currently submitted to the GPU, plus the number of runnable requests waiting on GPU time. This is useful to analyze the overall load of the system. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> --- drivers/gpu/drm/i915/i915_pmu.c | 30 +++++++++++++++++++++++++----- include/uapi/drm/i915_drm.h | 6 ++++++ 2 files changed, 31 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c index 112243720ff3..b2b4b32af35f 100644 --- a/drivers/gpu/drm/i915/i915_pmu.c +++ b/drivers/gpu/drm/i915/i915_pmu.c @@ -36,7 +36,8 @@ #define ENGINE_SAMPLE_MASK \ (BIT(I915_SAMPLE_BUSY) | \ BIT(I915_SAMPLE_WAIT) | \ - BIT(I915_SAMPLE_SEMA)) + BIT(I915_SAMPLE_SEMA) | \ + BIT(I915_SAMPLE_QUEUED)) #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS) @@ -223,6 +224,12 @@ static void engines_sample(struct drm_i915_private *dev_priv) update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA], PERIOD, !!(val & RING_WAIT_SEMAPHORE)); + + if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED)) + update_sample(&engine->pmu.sample[I915_SAMPLE_QUEUED], + 1 / I915_SAMPLE_QUEUED_SCALE, + engine->queued + + (last_seqno - current_seqno)); } if (fw) @@ -310,6 +317,10 @@ static int engine_event_init(struct perf_event *event) if (INTEL_GEN(i915) < 6) return -ENODEV; break; + case I915_SAMPLE_QUEUED: + if (INTEL_GEN(i915) < 8) + return -ENODEV; + break; default: return -ENOENT; } @@ 
-399,6 +410,10 @@ static u64 __i915_pmu_event_read(struct perf_event *event) } else if (sample == I915_SAMPLE_BUSY && engine->pmu.busy_stats) { val = ktime_to_ns(intel_engine_get_busy_time(engine)); + } else if (sample == I915_SAMPLE_QUEUED) { + val = + div_u64(engine->pmu.sample[I915_SAMPLE_QUEUED].cur, + FREQUENCY); } else { val = engine->pmu.sample[sample].cur; } @@ -679,13 +694,18 @@ static ssize_t i915_pmu_event_show(struct device *dev, I915_EVENT_STR(_name.unit, _unit) #define I915_ENGINE_EVENT(_name, _class, _instance, _sample) \ - I915_EVENT_ATTR(_name, __I915_PMU_ENGINE(_class, _instance, _sample)), \ + I915_EVENT_ATTR(_name, __I915_PMU_ENGINE(_class, _instance, _sample)) + +#define I915_ENGINE_EVENT_NS(_name, _class, _instance, _sample) \ + I915_ENGINE_EVENT(_name, _class, _instance, _sample), \ I915_EVENT_STR(_name.unit, "ns") #define I915_ENGINE_EVENTS(_name, _class, _instance) \ - I915_ENGINE_EVENT(_name##_instance-busy, _class, _instance, I915_SAMPLE_BUSY), \ - I915_ENGINE_EVENT(_name##_instance-sema, _class, _instance, I915_SAMPLE_SEMA), \ - I915_ENGINE_EVENT(_name##_instance-wait, _class, _instance, I915_SAMPLE_WAIT) + I915_ENGINE_EVENT_NS(_name##_instance-busy, _class, _instance, I915_SAMPLE_BUSY), \ + I915_ENGINE_EVENT_NS(_name##_instance-sema, _class, _instance, I915_SAMPLE_SEMA), \ + I915_ENGINE_EVENT_NS(_name##_instance-wait, _class, _instance, I915_SAMPLE_WAIT), \ + I915_ENGINE_EVENT(_name##_instance-queued, _class, _instance, I915_SAMPLE_QUEUED), \ + I915_EVENT_STR(_name##_instance-queued.scale, __stringify(I915_SAMPLE_QUEUED_SCALE)) static struct attribute *i915_pmu_events_attrs[] = { I915_ENGINE_EVENTS(rcs, I915_ENGINE_CLASS_RENDER, 0), diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 915a6e85a855..20ee668d1428 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -111,9 +111,12 @@ enum drm_i915_pmu_engine_sample { I915_SAMPLE_BUSY = 0, I915_SAMPLE_WAIT = 1, I915_SAMPLE_SEMA = 2, + 
I915_SAMPLE_QUEUED = 3, I915_ENGINE_SAMPLE_MAX /* non-ABI */ }; +#define I915_SAMPLE_QUEUED_SCALE 1e-2 /* No braces please. */ + #define I915_PMU_SAMPLE_BITS (4) #define I915_PMU_SAMPLE_MASK (0xf) #define I915_PMU_SAMPLE_INSTANCE_BITS (8) @@ -134,6 +137,9 @@ enum drm_i915_pmu_engine_sample { #define I915_PMU_ENGINE_SEMA(class, instance) \ __I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA) +#define I915_PMU_ENGINE_QUEUED(class, instance) \ + __I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED) + #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x)) #define I915_PMU_ACTUAL_FREQUENCY __I915_PMU_OTHER(0) -- 2.14.1 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] drm/i915/pmu: Add queued counter 2017-11-22 12:46 ` [PATCH 2/2] drm/i915/pmu: Add queued counter Tvrtko Ursulin @ 2017-11-22 13:01 ` Chris Wilson 2017-11-22 21:15 ` Rogozhkin, Dmitry V 1 sibling, 0 replies; 10+ messages in thread From: Chris Wilson @ 2017-11-22 13:01 UTC (permalink / raw) To: Tvrtko Ursulin, Intel-gfx Quoting Tvrtko Ursulin (2017-11-22 12:46:22) > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > We add a PMU counter to expose the number of requests currently submitted > to the GPU, plus the number of runnable requests waiting on GPU time. > > This is useful to analyze the overall load of the system. > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > --- > drivers/gpu/drm/i915/i915_pmu.c | 30 +++++++++++++++++++++++++----- > include/uapi/drm/i915_drm.h | 6 ++++++ > 2 files changed, 31 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c > index 112243720ff3..b2b4b32af35f 100644 > --- a/drivers/gpu/drm/i915/i915_pmu.c > +++ b/drivers/gpu/drm/i915/i915_pmu.c > @@ -36,7 +36,8 @@ > #define ENGINE_SAMPLE_MASK \ > (BIT(I915_SAMPLE_BUSY) | \ > BIT(I915_SAMPLE_WAIT) | \ > - BIT(I915_SAMPLE_SEMA)) > + BIT(I915_SAMPLE_SEMA) | \ > + BIT(I915_SAMPLE_QUEUED)) > > #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS) > > @@ -223,6 +224,12 @@ static void engines_sample(struct drm_i915_private *dev_priv) > > update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA], > PERIOD, !!(val & RING_WAIT_SEMAPHORE)); > + > + if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED)) > + update_sample(&engine->pmu.sample[I915_SAMPLE_QUEUED], > + 1 / I915_SAMPLE_QUEUED_SCALE, > + engine->queued + READ_ONCE(engine->queued) to indicate that this is outside of its lock. Useful to indicate that we haven't serialised it with last_seqno. 
(And no serialisation can be done against current_seqno ;) -Chris
* Re: [PATCH 2/2] drm/i915/pmu: Add queued counter 2017-11-22 12:46 ` [PATCH 2/2] drm/i915/pmu: Add queued counter Tvrtko Ursulin 2017-11-22 13:01 ` Chris Wilson @ 2017-11-22 21:15 ` Rogozhkin, Dmitry V 2017-11-22 21:38 ` Chris Wilson 1 sibling, 1 reply; 10+ messages in thread From: Rogozhkin, Dmitry V @ 2017-11-22 21:15 UTC (permalink / raw) To: tursulin; +Cc: Intel-gfx On Wed, 2017-11-22 at 12:46 +0000, Tvrtko Ursulin wrote: > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > We add a PMU counter to expose the number of requests currently submitted > to the GPU, plus the number of runnable requests waiting on GPU time. > > This is useful to analyze the overall load of the system. > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > --- > drivers/gpu/drm/i915/i915_pmu.c | 30 +++++++++++++++++++++++++----- > include/uapi/drm/i915_drm.h | 6 ++++++ > 2 files changed, 31 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c > index 112243720ff3..b2b4b32af35f 100644 > --- a/drivers/gpu/drm/i915/i915_pmu.c > +++ b/drivers/gpu/drm/i915/i915_pmu.c > @@ -36,7 +36,8 @@ > #define ENGINE_SAMPLE_MASK \ > (BIT(I915_SAMPLE_BUSY) | \ > BIT(I915_SAMPLE_WAIT) | \ > - BIT(I915_SAMPLE_SEMA)) > + BIT(I915_SAMPLE_SEMA) | \ > + BIT(I915_SAMPLE_QUEUED)) > > #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS) > > @@ -223,6 +224,12 @@ static void engines_sample(struct drm_i915_private *dev_priv) > > update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA], > PERIOD, !!(val & RING_WAIT_SEMAPHORE)); > + > + if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED)) > + update_sample(&engine->pmu.sample[I915_SAMPLE_QUEUED], > + 1 / I915_SAMPLE_QUEUED_SCALE, > + engine->queued + > + (last_seqno - current_seqno)); > } > > if (fw) > @@ -310,6 +317,10 @@ static int engine_event_init(struct perf_event *event) > if (INTEL_GEN(i915) < 6) > return -ENODEV; > break; > + case I915_SAMPLE_QUEUED: > + if (INTEL_GEN(i915) < 8) > + return -ENODEV; 
> + break; > default: > return -ENOENT; > } > @@ -399,6 +410,10 @@ static u64 __i915_pmu_event_read(struct perf_event *event) > } else if (sample == I915_SAMPLE_BUSY && > engine->pmu.busy_stats) { > val = ktime_to_ns(intel_engine_get_busy_time(engine)); > + } else if (sample == I915_SAMPLE_QUEUED) { > + val = > + div_u64(engine->pmu.sample[I915_SAMPLE_QUEUED].cur, > + FREQUENCY); > } else { > val = engine->pmu.sample[sample].cur; > } > @@ -679,13 +694,18 @@ static ssize_t i915_pmu_event_show(struct device *dev, > I915_EVENT_STR(_name.unit, _unit) > > #define I915_ENGINE_EVENT(_name, _class, _instance, _sample) \ > - I915_EVENT_ATTR(_name, __I915_PMU_ENGINE(_class, _instance, _sample)), \ > + I915_EVENT_ATTR(_name, __I915_PMU_ENGINE(_class, _instance, _sample)) > + > +#define I915_ENGINE_EVENT_NS(_name, _class, _instance, _sample) \ > + I915_ENGINE_EVENT(_name, _class, _instance, _sample), \ > I915_EVENT_STR(_name.unit, "ns") > > #define I915_ENGINE_EVENTS(_name, _class, _instance) \ > - I915_ENGINE_EVENT(_name##_instance-busy, _class, _instance, I915_SAMPLE_BUSY), \ > - I915_ENGINE_EVENT(_name##_instance-sema, _class, _instance, I915_SAMPLE_SEMA), \ > - I915_ENGINE_EVENT(_name##_instance-wait, _class, _instance, I915_SAMPLE_WAIT) > + I915_ENGINE_EVENT_NS(_name##_instance-busy, _class, _instance, I915_SAMPLE_BUSY), \ > + I915_ENGINE_EVENT_NS(_name##_instance-sema, _class, _instance, I915_SAMPLE_SEMA), \ > + I915_ENGINE_EVENT_NS(_name##_instance-wait, _class, _instance, I915_SAMPLE_WAIT), \ > + I915_ENGINE_EVENT(_name##_instance-queued, _class, _instance, I915_SAMPLE_QUEUED), \ > + I915_EVENT_STR(_name##_instance-queued.scale, __stringify(I915_SAMPLE_QUEUED_SCALE)) We expose queued as an "instant" metric, i.e. that's a number of requests on the very moment when we query the metric, i.e. that's not an ever growing counter - is that right? I doubt such a metric will make sense for perf-stat. 
Can we somehow restrict it to be queried by uAPI only and avoid perf-stat for it? > > static struct attribute *i915_pmu_events_attrs[] = { > I915_ENGINE_EVENTS(rcs, I915_ENGINE_CLASS_RENDER, 0), > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h > index 915a6e85a855..20ee668d1428 100644 > --- a/include/uapi/drm/i915_drm.h > +++ b/include/uapi/drm/i915_drm.h > @@ -111,9 +111,12 @@ enum drm_i915_pmu_engine_sample { > I915_SAMPLE_BUSY = 0, > I915_SAMPLE_WAIT = 1, > I915_SAMPLE_SEMA = 2, > + I915_SAMPLE_QUEUED = 3, > I915_ENGINE_SAMPLE_MAX /* non-ABI */ > }; > > +#define I915_SAMPLE_QUEUED_SCALE 1e-2 /* No braces please. */ > + > #define I915_PMU_SAMPLE_BITS (4) > #define I915_PMU_SAMPLE_MASK (0xf) > #define I915_PMU_SAMPLE_INSTANCE_BITS (8) > @@ -134,6 +137,9 @@ enum drm_i915_pmu_engine_sample { > #define I915_PMU_ENGINE_SEMA(class, instance) \ > __I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA) > > +#define I915_PMU_ENGINE_QUEUED(class, instance) \ > + __I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED) > + > #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x)) > > #define I915_PMU_ACTUAL_FREQUENCY __I915_PMU_OTHER(0) _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] drm/i915/pmu: Add queued counter 2017-11-22 21:15 ` Rogozhkin, Dmitry V @ 2017-11-22 21:38 ` Chris Wilson 2017-11-24 8:14 ` Tvrtko Ursulin 0 siblings, 1 reply; 10+ messages in thread From: Chris Wilson @ 2017-11-22 21:38 UTC (permalink / raw) To: Rogozhkin, Dmitry V, tursulin; +Cc: Intel-gfx Quoting Rogozhkin, Dmitry V (2017-11-22 21:15:24) > On Wed, 2017-11-22 at 12:46 +0000, Tvrtko Ursulin wrote: > > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > > > We add a PMU counter to expose the number of requests currently submitted > > to the GPU, plus the number of runnable requests waiting on GPU time. > > > > This is useful to analyze the overall load of the system. > > > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > --- > > drivers/gpu/drm/i915/i915_pmu.c | 30 +++++++++++++++++++++++++----- > > include/uapi/drm/i915_drm.h | 6 ++++++ > > 2 files changed, 31 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c > > index 112243720ff3..b2b4b32af35f 100644 > > --- a/drivers/gpu/drm/i915/i915_pmu.c > > +++ b/drivers/gpu/drm/i915/i915_pmu.c > > @@ -36,7 +36,8 @@ > > #define ENGINE_SAMPLE_MASK \ > > (BIT(I915_SAMPLE_BUSY) | \ > > BIT(I915_SAMPLE_WAIT) | \ > > - BIT(I915_SAMPLE_SEMA)) > > + BIT(I915_SAMPLE_SEMA) | \ > > + BIT(I915_SAMPLE_QUEUED)) > > > > #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS) > > > > @@ -223,6 +224,12 @@ static void engines_sample(struct drm_i915_private *dev_priv) > > > > update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA], > > PERIOD, !!(val & RING_WAIT_SEMAPHORE)); > > + > > + if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED)) > > + update_sample(&engine->pmu.sample[I915_SAMPLE_QUEUED], > > + 1 / I915_SAMPLE_QUEUED_SCALE, > > + engine->queued + > > + (last_seqno - current_seqno)); > > } > > > > if (fw) > > @@ -310,6 +317,10 @@ static int engine_event_init(struct perf_event *event) > > if (INTEL_GEN(i915) < 6) > > return -ENODEV; > > break; 
> > + case I915_SAMPLE_QUEUED: > > + if (INTEL_GEN(i915) < 8) > > + return -ENODEV; > > + break; > > default: > > return -ENOENT; > > } > > @@ -399,6 +410,10 @@ static u64 __i915_pmu_event_read(struct perf_event *event) > > } else if (sample == I915_SAMPLE_BUSY && > > engine->pmu.busy_stats) { > > val = ktime_to_ns(intel_engine_get_busy_time(engine)); > > + } else if (sample == I915_SAMPLE_QUEUED) { > > + val = > > + div_u64(engine->pmu.sample[I915_SAMPLE_QUEUED].cur, > > + FREQUENCY); > > } else { > > val = engine->pmu.sample[sample].cur; > > } > > @@ -679,13 +694,18 @@ static ssize_t i915_pmu_event_show(struct device *dev, > > I915_EVENT_STR(_name.unit, _unit) > > > > #define I915_ENGINE_EVENT(_name, _class, _instance, _sample) \ > > - I915_EVENT_ATTR(_name, __I915_PMU_ENGINE(_class, _instance, _sample)), \ > > + I915_EVENT_ATTR(_name, __I915_PMU_ENGINE(_class, _instance, _sample)) > > + > > +#define I915_ENGINE_EVENT_NS(_name, _class, _instance, _sample) \ > > + I915_ENGINE_EVENT(_name, _class, _instance, _sample), \ > > I915_EVENT_STR(_name.unit, "ns") > > > > #define I915_ENGINE_EVENTS(_name, _class, _instance) \ > > - I915_ENGINE_EVENT(_name##_instance-busy, _class, _instance, I915_SAMPLE_BUSY), \ > > - I915_ENGINE_EVENT(_name##_instance-sema, _class, _instance, I915_SAMPLE_SEMA), \ > > - I915_ENGINE_EVENT(_name##_instance-wait, _class, _instance, I915_SAMPLE_WAIT) > > + I915_ENGINE_EVENT_NS(_name##_instance-busy, _class, _instance, I915_SAMPLE_BUSY), \ > > + I915_ENGINE_EVENT_NS(_name##_instance-sema, _class, _instance, I915_SAMPLE_SEMA), \ > > + I915_ENGINE_EVENT_NS(_name##_instance-wait, _class, _instance, I915_SAMPLE_WAIT), \ > > + I915_ENGINE_EVENT(_name##_instance-queued, _class, _instance, I915_SAMPLE_QUEUED), \ > > + I915_EVENT_STR(_name##_instance-queued.scale, __stringify(I915_SAMPLE_QUEUED_SCALE)) > > We expose queued as an "instant" metric, i.e. that's a number of > requests on the very moment when we query the metric, i.e. 
that's not an > ever growing counter - is that right? I doubt such a metric will make > sense for perf-stat. Can we somehow restrict it to be queried by uAPI > only and avoid perf-stat for it? True, I forgot that Tvrtko normalised it. We don't need to, and so allow the user to apply their own normalisation and generate average values for the last N seconds instead; aiui. -Chris
* Re: [PATCH 2/2] drm/i915/pmu: Add queued counter 2017-11-22 21:38 ` Chris Wilson @ 2017-11-24 8:14 ` Tvrtko Ursulin 0 siblings, 0 replies; 10+ messages in thread From: Tvrtko Ursulin @ 2017-11-24 8:14 UTC (permalink / raw) To: Chris Wilson, Rogozhkin, Dmitry V, tursulin; +Cc: Intel-gfx On 22/11/2017 21:38, Chris Wilson wrote: > Quoting Rogozhkin, Dmitry V (2017-11-22 21:15:24) >> On Wed, 2017-11-22 at 12:46 +0000, Tvrtko Ursulin wrote: >>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>> >>> We add a PMU counter to expose the number of requests currently submitted >>> to the GPU, plus the number of runnable requests waiting on GPU time. >>> >>> This is useful to analyze the overall load of the system. >>> >>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>> --- >>> drivers/gpu/drm/i915/i915_pmu.c | 30 +++++++++++++++++++++++++----- >>> include/uapi/drm/i915_drm.h | 6 ++++++ >>> 2 files changed, 31 insertions(+), 5 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c >>> index 112243720ff3..b2b4b32af35f 100644 >>> --- a/drivers/gpu/drm/i915/i915_pmu.c >>> +++ b/drivers/gpu/drm/i915/i915_pmu.c >>> @@ -36,7 +36,8 @@ >>> #define ENGINE_SAMPLE_MASK \ >>> (BIT(I915_SAMPLE_BUSY) | \ >>> BIT(I915_SAMPLE_WAIT) | \ >>> - BIT(I915_SAMPLE_SEMA)) >>> + BIT(I915_SAMPLE_SEMA) | \ >>> + BIT(I915_SAMPLE_QUEUED)) >>> >>> #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS) >>> >>> @@ -223,6 +224,12 @@ static void engines_sample(struct drm_i915_private *dev_priv) >>> >>> update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA], >>> PERIOD, !!(val & RING_WAIT_SEMAPHORE)); >>> + >>> + if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED)) >>> + update_sample(&engine->pmu.sample[I915_SAMPLE_QUEUED], >>> + 1 / I915_SAMPLE_QUEUED_SCALE, >>> + engine->queued + >>> + (last_seqno - current_seqno)); >>> } >>> >>> if (fw) >>> @@ -310,6 +317,10 @@ static int engine_event_init(struct perf_event *event) >>> if (INTEL_GEN(i915) < 6) >>> 
return -ENODEV; >>> break; >>> + case I915_SAMPLE_QUEUED: >>> + if (INTEL_GEN(i915) < 8) >>> + return -ENODEV; >>> + break; >>> default: >>> return -ENOENT; >>> } >>> @@ -399,6 +410,10 @@ static u64 __i915_pmu_event_read(struct perf_event *event) >>> } else if (sample == I915_SAMPLE_BUSY && >>> engine->pmu.busy_stats) { >>> val = ktime_to_ns(intel_engine_get_busy_time(engine)); >>> + } else if (sample == I915_SAMPLE_QUEUED) { >>> + val = >>> + div_u64(engine->pmu.sample[I915_SAMPLE_QUEUED].cur, >>> + FREQUENCY); >>> } else { >>> val = engine->pmu.sample[sample].cur; >>> } >>> @@ -679,13 +694,18 @@ static ssize_t i915_pmu_event_show(struct device *dev, >>> I915_EVENT_STR(_name.unit, _unit) >>> >>> #define I915_ENGINE_EVENT(_name, _class, _instance, _sample) \ >>> - I915_EVENT_ATTR(_name, __I915_PMU_ENGINE(_class, _instance, _sample)), \ >>> + I915_EVENT_ATTR(_name, __I915_PMU_ENGINE(_class, _instance, _sample)) >>> + >>> +#define I915_ENGINE_EVENT_NS(_name, _class, _instance, _sample) \ >>> + I915_ENGINE_EVENT(_name, _class, _instance, _sample), \ >>> I915_EVENT_STR(_name.unit, "ns") >>> >>> #define I915_ENGINE_EVENTS(_name, _class, _instance) \ >>> - I915_ENGINE_EVENT(_name##_instance-busy, _class, _instance, I915_SAMPLE_BUSY), \ >>> - I915_ENGINE_EVENT(_name##_instance-sema, _class, _instance, I915_SAMPLE_SEMA), \ >>> - I915_ENGINE_EVENT(_name##_instance-wait, _class, _instance, I915_SAMPLE_WAIT) >>> + I915_ENGINE_EVENT_NS(_name##_instance-busy, _class, _instance, I915_SAMPLE_BUSY), \ >>> + I915_ENGINE_EVENT_NS(_name##_instance-sema, _class, _instance, I915_SAMPLE_SEMA), \ >>> + I915_ENGINE_EVENT_NS(_name##_instance-wait, _class, _instance, I915_SAMPLE_WAIT), \ >>> + I915_ENGINE_EVENT(_name##_instance-queued, _class, _instance, I915_SAMPLE_QUEUED), \ >>> + I915_EVENT_STR(_name##_instance-queued.scale, __stringify(I915_SAMPLE_QUEUED_SCALE)) >> >> We expose queued as an "instant" metric, i.e. 
that's a number of >> requests on the very moment when we query the metric, i.e. that's not an >> ever growing counter - is that right? I doubt such a metric will make >> sense for perf-stat. Can we somehow restrict it to be queried by uAPI >> only and avoid perf-stat for it? > > True, I forgot that Tvrtko normalised it. We don't need to, and so allow > the user to apply their own normalisation and generate average values > for the last N seconds instead; aiui. It is not instantaneous as implemented but sampled as the other counters. So when two samples are read from userspace it gets the average QD for the period between two samples. The more often it reads, the more accurate QD it gets. That to me makes it sounds useful for system monitoring. For load-balancing use case you are saying you would prefer an instantaneous metric - so not average of the previous period but just the current state? And you are saying that wouldn't make sense for perf stat so would like to block it? I don't think we can do that. In general I don't at the moment see this being appropriate to be exposed via the PMU. Perhaps instantaneous QD could be exported via private ioctl, with the usual problems. Regards, Tvrtko _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] drm/i915: Keep a count of requests waiting for a slot on GPU 2017-11-22 12:46 [PATCH 1/2] drm/i915: Keep a count of requests waiting for a slot on GPU Tvrtko Ursulin 2017-11-22 12:46 ` [PATCH 2/2] drm/i915/pmu: Add queued counter Tvrtko Ursulin @ 2017-11-22 12:59 ` Chris Wilson 2017-11-22 13:31 ` Tvrtko Ursulin 1 sibling, 1 reply; 10+ messages in thread From: Chris Wilson @ 2017-11-22 12:59 UTC (permalink / raw) To: Tvrtko Ursulin, Intel-gfx Quoting Tvrtko Ursulin (2017-11-22 12:46:21) > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > Keep a per-engine number of runnable (waiting for GPU time) requests. > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > --- > drivers/gpu/drm/i915/i915_gem_request.c | 5 +++++ > drivers/gpu/drm/i915/intel_engine_cs.c | 5 +++-- > drivers/gpu/drm/i915/intel_lrc.c | 1 + > drivers/gpu/drm/i915/intel_ringbuffer.h | 8 ++++++++ > 4 files changed, 17 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c > index 7325469ce754..e3c74cafa7d4 100644 > --- a/drivers/gpu/drm/i915/i915_gem_request.c > +++ b/drivers/gpu/drm/i915/i915_gem_request.c > @@ -480,6 +480,9 @@ void __i915_gem_request_submit(struct drm_i915_gem_request *request) > engine->emit_breadcrumb(request, > request->ring->vaddr + request->postfix); > > + GEM_BUG_ON(engine->queued == 0); > + engine->queued--; Ok, so under engine->timeline->lock. 
> + > spin_lock(&request->timeline->lock); > list_move_tail(&request->link, &timeline->requests); > spin_unlock(&request->timeline->lock); > @@ -525,6 +528,8 @@ void __i915_gem_request_unsubmit(struct drm_i915_gem_request *request) > timeline = request->timeline; > GEM_BUG_ON(timeline == engine->timeline); > > + engine->queued++; > + > spin_lock(&timeline->lock); > list_move(&request->link, &timeline->requests); > spin_unlock(&timeline->lock); > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c > index d53680c08cb0..cc9d60130ddd 100644 > --- a/drivers/gpu/drm/i915/intel_engine_cs.c > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c > @@ -1675,12 +1675,13 @@ void intel_engine_dump(struct intel_engine_cs *engine, struct drm_printer *m) > u64 addr; > > drm_printf(m, "%s\n", engine->name); > - drm_printf(m, " current seqno %x, last %x, hangcheck %x [%d ms], inflight %d\n", > + drm_printf(m, " current seqno %x, last %x, hangcheck %x [%d ms], inflight %d, queued %d\n", > intel_engine_get_seqno(engine), > intel_engine_last_submit(engine), > engine->hangcheck.seqno, > jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp), > - engine->timeline->inflight_seqnos); > + engine->timeline->inflight_seqnos, > + INTEL_GEN(dev_priv) >= 8 ? engine->queued : -1); Not gen8 specific, just add engine->queued++ to i9xx_submit_request(). -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] drm/i915: Keep a count of requests waiting for a slot on GPU 2017-11-22 12:59 ` [PATCH 1/2] drm/i915: Keep a count of requests waiting for a slot on GPU Chris Wilson @ 2017-11-22 13:31 ` Tvrtko Ursulin 2017-11-22 13:35 ` Chris Wilson 0 siblings, 1 reply; 10+ messages in thread From: Tvrtko Ursulin @ 2017-11-22 13:31 UTC (permalink / raw) To: Chris Wilson, Tvrtko Ursulin, Intel-gfx On 22/11/2017 12:59, Chris Wilson wrote: > Quoting Tvrtko Ursulin (2017-11-22 12:46:21) >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >> >> Keep a per-engine number of runnable (waiting for GPU time) requests. >> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >> --- >> drivers/gpu/drm/i915/i915_gem_request.c | 5 +++++ >> drivers/gpu/drm/i915/intel_engine_cs.c | 5 +++-- >> drivers/gpu/drm/i915/intel_lrc.c | 1 + >> drivers/gpu/drm/i915/intel_ringbuffer.h | 8 ++++++++ >> 4 files changed, 17 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c >> index 7325469ce754..e3c74cafa7d4 100644 >> --- a/drivers/gpu/drm/i915/i915_gem_request.c >> +++ b/drivers/gpu/drm/i915/i915_gem_request.c >> @@ -480,6 +480,9 @@ void __i915_gem_request_submit(struct drm_i915_gem_request *request) >> engine->emit_breadcrumb(request, >> request->ring->vaddr + request->postfix); >> >> + GEM_BUG_ON(engine->queued == 0); >> + engine->queued--; > > Ok, so under engine->timeline->lock. 
> >> + >> spin_lock(&request->timeline->lock); >> list_move_tail(&request->link, &timeline->requests); >> spin_unlock(&request->timeline->lock); >> @@ -525,6 +528,8 @@ void __i915_gem_request_unsubmit(struct drm_i915_gem_request *request) >> timeline = request->timeline; >> GEM_BUG_ON(timeline == engine->timeline); >> >> + engine->queued++; >> + >> spin_lock(&timeline->lock); >> list_move(&request->link, &timeline->requests); >> spin_unlock(&timeline->lock); >> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c >> index d53680c08cb0..cc9d60130ddd 100644 >> --- a/drivers/gpu/drm/i915/intel_engine_cs.c >> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c >> @@ -1675,12 +1675,13 @@ void intel_engine_dump(struct intel_engine_cs *engine, struct drm_printer *m) >> u64 addr; >> >> drm_printf(m, "%s\n", engine->name); >> - drm_printf(m, " current seqno %x, last %x, hangcheck %x [%d ms], inflight %d\n", >> + drm_printf(m, " current seqno %x, last %x, hangcheck %x [%d ms], inflight %d, queued %d\n", >> intel_engine_get_seqno(engine), >> intel_engine_last_submit(engine), >> engine->hangcheck.seqno, >> jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp), >> - engine->timeline->inflight_seqnos); >> + engine->timeline->inflight_seqnos, >> + INTEL_GEN(dev_priv) >= 8 ? engine->queued : -1); > > Not gen8 specific, just add engine->queued++ to i9xx_submit_request(). But where to put the decrement, and more importantly, how not make it lag the reality from the retire worker? :( Regards, Tvrtko _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] drm/i915: Keep a count of requests waiting for a slot on GPU 2017-11-22 13:31 ` Tvrtko Ursulin @ 2017-11-22 13:35 ` Chris Wilson 2017-11-22 14:00 ` Tvrtko Ursulin 0 siblings, 1 reply; 10+ messages in thread From: Chris Wilson @ 2017-11-22 13:35 UTC (permalink / raw) To: Tvrtko Ursulin, Tvrtko Ursulin, Intel-gfx Quoting Tvrtko Ursulin (2017-11-22 13:31:56) > > On 22/11/2017 12:59, Chris Wilson wrote: > > Quoting Tvrtko Ursulin (2017-11-22 12:46:21) > >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > >> > >> Keep a per-engine number of runnable (waiting for GPU time) requests. > >> > >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > >> --- > >> drivers/gpu/drm/i915/i915_gem_request.c | 5 +++++ > >> drivers/gpu/drm/i915/intel_engine_cs.c | 5 +++-- > >> drivers/gpu/drm/i915/intel_lrc.c | 1 + > >> drivers/gpu/drm/i915/intel_ringbuffer.h | 8 ++++++++ > >> 4 files changed, 17 insertions(+), 2 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c > >> index 7325469ce754..e3c74cafa7d4 100644 > >> --- a/drivers/gpu/drm/i915/i915_gem_request.c > >> +++ b/drivers/gpu/drm/i915/i915_gem_request.c > >> @@ -480,6 +480,9 @@ void __i915_gem_request_submit(struct drm_i915_gem_request *request) > >> engine->emit_breadcrumb(request, > >> request->ring->vaddr + request->postfix); > >> > >> + GEM_BUG_ON(engine->queued == 0); > >> + engine->queued--; > > > > Ok, so under engine->timeline->lock. 
> > > >> + > >> spin_lock(&request->timeline->lock); > >> list_move_tail(&request->link, &timeline->requests); > >> spin_unlock(&request->timeline->lock); > >> @@ -525,6 +528,8 @@ void __i915_gem_request_unsubmit(struct drm_i915_gem_request *request) > >> timeline = request->timeline; > >> GEM_BUG_ON(timeline == engine->timeline); > >> > >> + engine->queued++; > >> + > >> spin_lock(&timeline->lock); > >> list_move(&request->link, &timeline->requests); > >> spin_unlock(&timeline->lock); > >> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c > >> index d53680c08cb0..cc9d60130ddd 100644 > >> --- a/drivers/gpu/drm/i915/intel_engine_cs.c > >> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c > >> @@ -1675,12 +1675,13 @@ void intel_engine_dump(struct intel_engine_cs *engine, struct drm_printer *m) > >> u64 addr; > >> > >> drm_printf(m, "%s\n", engine->name); > >> - drm_printf(m, " current seqno %x, last %x, hangcheck %x [%d ms], inflight %d\n", > >> + drm_printf(m, " current seqno %x, last %x, hangcheck %x [%d ms], inflight %d, queued %d\n", > >> intel_engine_get_seqno(engine), > >> intel_engine_last_submit(engine), > >> engine->hangcheck.seqno, > >> jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp), > >> - engine->timeline->inflight_seqnos); > >> + engine->timeline->inflight_seqnos, > >> + INTEL_GEN(dev_priv) >= 8 ? engine->queued : -1); > > > > Not gen8 specific, just add engine->queued++ to i9xx_submit_request(). > > But where to put the decrement, and more importantly, how not make it > lag the reality from the retire worker? :( The decrement is in __i915_gem_request_submit as before. So basically it should remain 0, since we aren't keeping a queue of work for the HW and just submitting into the ringbuffer as soon as we are ready. (This may not always remain so...) Hence why the (last_seqno - current_seqno) was so important. 
-Chris
* Re: [PATCH 1/2] drm/i915: Keep a count of requests waiting for a slot on GPU 2017-11-22 13:35 ` Chris Wilson @ 2017-11-22 14:00 ` Tvrtko Ursulin 0 siblings, 0 replies; 10+ messages in thread From: Tvrtko Ursulin @ 2017-11-22 14:00 UTC (permalink / raw) To: Chris Wilson, Tvrtko Ursulin, Intel-gfx On 22/11/2017 13:35, Chris Wilson wrote: > Quoting Tvrtko Ursulin (2017-11-22 13:31:56) >> >> On 22/11/2017 12:59, Chris Wilson wrote: >>> Quoting Tvrtko Ursulin (2017-11-22 12:46:21) >>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>> >>>> Keep a per-engine number of runnable (waiting for GPU time) requests. >>>> >>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>> --- >>>> drivers/gpu/drm/i915/i915_gem_request.c | 5 +++++ >>>> drivers/gpu/drm/i915/intel_engine_cs.c | 5 +++-- >>>> drivers/gpu/drm/i915/intel_lrc.c | 1 + >>>> drivers/gpu/drm/i915/intel_ringbuffer.h | 8 ++++++++ >>>> 4 files changed, 17 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c >>>> index 7325469ce754..e3c74cafa7d4 100644 >>>> --- a/drivers/gpu/drm/i915/i915_gem_request.c >>>> +++ b/drivers/gpu/drm/i915/i915_gem_request.c >>>> @@ -480,6 +480,9 @@ void __i915_gem_request_submit(struct drm_i915_gem_request *request) >>>> engine->emit_breadcrumb(request, >>>> request->ring->vaddr + request->postfix); >>>> >>>> + GEM_BUG_ON(engine->queued == 0); >>>> + engine->queued--; >>> >>> Ok, so under engine->timeline->lock. 
>>> >>>> + >>>> spin_lock(&request->timeline->lock); >>>> list_move_tail(&request->link, &timeline->requests); >>>> spin_unlock(&request->timeline->lock); >>>> @@ -525,6 +528,8 @@ void __i915_gem_request_unsubmit(struct drm_i915_gem_request *request) >>>> timeline = request->timeline; >>>> GEM_BUG_ON(timeline == engine->timeline); >>>> >>>> + engine->queued++; >>>> + >>>> spin_lock(&timeline->lock); >>>> list_move(&request->link, &timeline->requests); >>>> spin_unlock(&timeline->lock); >>>> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c >>>> index d53680c08cb0..cc9d60130ddd 100644 >>>> --- a/drivers/gpu/drm/i915/intel_engine_cs.c >>>> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c >>>> @@ -1675,12 +1675,13 @@ void intel_engine_dump(struct intel_engine_cs *engine, struct drm_printer *m) >>>> u64 addr; >>>> >>>> drm_printf(m, "%s\n", engine->name); >>>> - drm_printf(m, " current seqno %x, last %x, hangcheck %x [%d ms], inflight %d\n", >>>> + drm_printf(m, " current seqno %x, last %x, hangcheck %x [%d ms], inflight %d, queued %d\n", >>>> intel_engine_get_seqno(engine), >>>> intel_engine_last_submit(engine), >>>> engine->hangcheck.seqno, >>>> jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp), >>>> - engine->timeline->inflight_seqnos); >>>> + engine->timeline->inflight_seqnos, >>>> + INTEL_GEN(dev_priv) >= 8 ? engine->queued : -1); >>> >>> Not gen8 specific, just add engine->queued++ to i9xx_submit_request(). >> >> But where to put the decrement, and more importantly, how not make it >> lag the reality from the retire worker? :( > > The decrement is in __i915_gem_request_submit as before. > So basically it should remain 0, since we aren't keeping a queue of work > for the HW and just submitting into the ringbuffer as soon as we are > ready. (This may not always remain so...) Hence why the (last_seqno - > current_seqno) was so important. I keep getting lost in these callbacks... 
Right, so if we put the increment in i9xx_submit_request, it calls i915_gem_request_submit which immediately decrements it, and as you say it remains at zero then. So you just want to look at the last_seqno - current_seqno on <gen8. Which will be the totality of submitted stuff so sounds OK. Makes the metric usable on those platforms as well. Sorry for the confusion. Regards, Tvrtko
Thread overview: 10+ messages

2017-11-22 12:46 [PATCH 1/2] drm/i915: Keep a count of requests waiting for a slot on GPU Tvrtko Ursulin
2017-11-22 12:46 ` [PATCH 2/2] drm/i915/pmu: Add queued counter Tvrtko Ursulin
2017-11-22 13:01   ` Chris Wilson
2017-11-22 21:15     ` Rogozhkin, Dmitry V
2017-11-22 21:38       ` Chris Wilson
2017-11-24  8:14         ` Tvrtko Ursulin
2017-11-22 12:59 ` [PATCH 1/2] drm/i915: Keep a count of requests waiting for a slot on GPU Chris Wilson
2017-11-22 13:31   ` Tvrtko Ursulin
2017-11-22 13:35     ` Chris Wilson
2017-11-22 14:00       ` Tvrtko Ursulin