* [RFC 1/6] drm/i915/pmu: Fix enable count array size and bounds checking
2018-01-22 18:43 [RFC v2 0/6] Queued/runnable/running engine stats Tvrtko Ursulin
@ 2018-01-22 18:43 ` Tvrtko Ursulin
2018-01-22 18:43 ` [RFC 2/6] drm/i915: Keep a count of requests waiting for a slot on GPU Tvrtko Ursulin
` (7 subsequent siblings)
8 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-01-22 18:43 UTC (permalink / raw)
To: Intel-gfx
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
The enable count array is supposed to have one counter for each possible
engine sampler. As such, the array sizing and bounds checking are not
correct when more engine samplers are added.
At the same time tidy the assert for readability and robustness.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: b46a33e271ed ("drm/i915/pmu: Expose a PMU interface for perf queries")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/i915_pmu.c | 13 +++++++++----
drivers/gpu/drm/i915/intel_ringbuffer.h | 2 +-
2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 065a28c713c4..cbfca4a255ab 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -476,7 +476,8 @@ static void i915_pmu_enable(struct perf_event *event)
* Update the bitmask of enabled events and increment
* the event reference counter.
*/
- GEM_BUG_ON(bit >= I915_PMU_MASK_BITS);
+ BUILD_BUG_ON(ARRAY_SIZE(i915->pmu.enable_count) != I915_PMU_MASK_BITS);
+ GEM_BUG_ON(bit >= ARRAY_SIZE(i915->pmu.enable_count));
GEM_BUG_ON(i915->pmu.enable_count[bit] == ~0);
i915->pmu.enable |= BIT_ULL(bit);
i915->pmu.enable_count[bit]++;
@@ -500,7 +501,10 @@ static void i915_pmu_enable(struct perf_event *event)
GEM_BUG_ON(!engine);
engine->pmu.enable |= BIT(sample);
- GEM_BUG_ON(sample >= I915_PMU_SAMPLE_BITS);
+ BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.enable_count) !=
+ (1 << I915_PMU_SAMPLE_BITS));
+ GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.enable_count));
+ GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
GEM_BUG_ON(engine->pmu.enable_count[sample] == ~0);
if (engine->pmu.enable_count[sample]++ == 0) {
/*
@@ -554,7 +558,8 @@ static void i915_pmu_disable(struct perf_event *event)
engine_event_class(event),
engine_event_instance(event));
GEM_BUG_ON(!engine);
- GEM_BUG_ON(sample >= I915_PMU_SAMPLE_BITS);
+ GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.enable_count));
+ GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
GEM_BUG_ON(engine->pmu.enable_count[sample] == 0);
/*
* Decrement the reference count and clear the enabled
@@ -582,7 +587,7 @@ static void i915_pmu_disable(struct perf_event *event)
}
}
- GEM_BUG_ON(bit >= I915_PMU_MASK_BITS);
+ GEM_BUG_ON(bit >= ARRAY_SIZE(i915->pmu.enable_count));
GEM_BUG_ON(i915->pmu.enable_count[bit] == 0);
/*
* Decrement the reference count and clear the enabled
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index c5ff203e42d6..27a0c47db51e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -358,7 +358,7 @@ struct intel_engine_cs {
*
* Index number corresponds to the bit number from @enable.
*/
- unsigned int enable_count[I915_PMU_SAMPLE_BITS];
+ unsigned int enable_count[1 << I915_PMU_SAMPLE_BITS];
/**
* @sample: Counter values for sampling events.
*
--
2.14.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC 2/6] drm/i915: Keep a count of requests waiting for a slot on GPU
2018-01-22 18:43 [RFC v2 0/6] Queued/runnable/running engine stats Tvrtko Ursulin
2018-01-22 18:43 ` [RFC 1/6] drm/i915/pmu: Fix enable count array size and bounds checking Tvrtko Ursulin
@ 2018-01-22 18:43 ` Tvrtko Ursulin
2018-01-22 18:48 ` Chris Wilson
2018-01-22 18:43 ` [RFC 3/6] drm/i915: Keep a count of requests submitted from userspace Tvrtko Ursulin
` (6 subsequent siblings)
8 siblings, 1 reply; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-01-22 18:43 UTC (permalink / raw)
To: Intel-gfx
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Keep a per-engine number of runnable (waiting for GPU time) requests.
v2:
* Move queued increment from insert_request to execlist_submit_request to
avoid bumping when re-ordering for priority.
* Support the counter on the ringbuffer submission path as well, albeit
just notionally. (Chris Wilson)
v3:
* Rebase.
v4:
* Rename and move the stats into a container structure. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_gem_request.c | 7 +++++++
drivers/gpu/drm/i915/intel_engine_cs.c | 5 +++--
drivers/gpu/drm/i915/intel_lrc.c | 2 ++
drivers/gpu/drm/i915/intel_ringbuffer.h | 9 +++++++++
4 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index a0f451b4a4e8..8da350bacff1 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -502,6 +502,9 @@ void __i915_gem_request_submit(struct drm_i915_gem_request *request)
engine->emit_breadcrumb(request,
request->ring->vaddr + request->postfix);
+ GEM_BUG_ON(engine->request_stats.runnable == 0);
+ engine->request_stats.runnable--;
+
spin_lock(&request->timeline->lock);
list_move_tail(&request->link, &timeline->requests);
spin_unlock(&request->timeline->lock);
@@ -517,6 +520,8 @@ void i915_gem_request_submit(struct drm_i915_gem_request *request)
/* Will be called from irq-context when using foreign fences. */
spin_lock_irqsave(&engine->timeline->lock, flags);
+ engine->request_stats.runnable++;
+
__i915_gem_request_submit(request);
spin_unlock_irqrestore(&engine->timeline->lock, flags);
@@ -548,6 +553,8 @@ void __i915_gem_request_unsubmit(struct drm_i915_gem_request *request)
timeline = request->timeline;
GEM_BUG_ON(timeline == engine->timeline);
+ engine->request_stats.runnable++;
+
spin_lock(&timeline->lock);
list_move(&request->link, &timeline->requests);
spin_unlock(&timeline->lock);
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 7eebfbb95e89..8377a77cfbe7 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1731,12 +1731,13 @@ void intel_engine_dump(struct intel_engine_cs *engine,
if (i915_terminally_wedged(&engine->i915->gpu_error))
drm_printf(m, "*** WEDGED ***\n");
- drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], inflight %d\n",
+ drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], inflight %d, runnable %u\n",
intel_engine_get_seqno(engine),
intel_engine_last_submit(engine),
engine->hangcheck.seqno,
jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp),
- engine->timeline->inflight_seqnos);
+ engine->timeline->inflight_seqnos,
+ engine->request_stats.runnable);
drm_printf(m, "\tReset count: %d (global %d)\n",
i915_reset_engine_count(error, engine),
i915_reset_count(error));
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 51e61b04a555..319937e67a6e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -965,6 +965,8 @@ static void execlists_submit_request(struct drm_i915_gem_request *request)
/* Will be called from irq-context when using foreign fences. */
spin_lock_irqsave(&engine->timeline->lock, flags);
+ engine->request_stats.runnable++;
+
insert_request(engine, &request->priotree, request->priotree.priority);
GEM_BUG_ON(!engine->execlists.first);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 27a0c47db51e..d7ee7831288d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -303,6 +303,15 @@ struct intel_engine_cs {
struct intel_ring *buffer;
struct intel_timeline *timeline;
+ struct {
+ /**
+ * @runnable: Number of runnable requests sent to the backend.
+ *
+ * Count of requests waiting for the GPU to execute them.
+ */
+ unsigned int runnable;
+ } request_stats;
+
struct drm_i915_gem_object *default_state;
atomic_t irq_count;
--
2.14.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC 2/6] drm/i915: Keep a count of requests waiting for a slot on GPU
2018-01-22 18:43 ` [RFC 2/6] drm/i915: Keep a count of requests waiting for a slot on GPU Tvrtko Ursulin
@ 2018-01-22 18:48 ` Chris Wilson
0 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2018-01-22 18:48 UTC (permalink / raw)
To: Tvrtko Ursulin, Intel-gfx
Quoting Tvrtko Ursulin (2018-01-22 18:43:54)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> Keep a per-engine number of runnable (waiting for GPU time) requests.
>
> v2:
> * Move queued increment from insert_request to execlist_submit_request to
> avoid bumping when re-ordering for priority.
> * Support the counter on the ringbuffer submission path as well, albeit
> just notionally. (Chris Wilson)
>
> v3:
> * Rebase.
>
> v4:
> * Rename and move the stats into a container structure. (Chris Wilson)
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem_request.c | 7 +++++++
> drivers/gpu/drm/i915/intel_engine_cs.c | 5 +++--
> drivers/gpu/drm/i915/intel_lrc.c | 2 ++
> drivers/gpu/drm/i915/intel_ringbuffer.h | 9 +++++++++
> 4 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index a0f451b4a4e8..8da350bacff1 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -502,6 +502,9 @@ void __i915_gem_request_submit(struct drm_i915_gem_request *request)
> engine->emit_breadcrumb(request,
> request->ring->vaddr + request->postfix);
>
> + GEM_BUG_ON(engine->request_stats.runnable == 0);
> + engine->request_stats.runnable--;
> +
> spin_lock(&request->timeline->lock);
> list_move_tail(&request->link, &timeline->requests);
> spin_unlock(&request->timeline->lock);
> @@ -517,6 +520,8 @@ void i915_gem_request_submit(struct drm_i915_gem_request *request)
> /* Will be called from irq-context when using foreign fences. */
> spin_lock_irqsave(&engine->timeline->lock, flags);
>
> + engine->request_stats.runnable++;
> +
> __i915_gem_request_submit(request);
>
> spin_unlock_irqrestore(&engine->timeline->lock, flags);
> @@ -548,6 +553,8 @@ void __i915_gem_request_unsubmit(struct drm_i915_gem_request *request)
> timeline = request->timeline;
> GEM_BUG_ON(timeline == engine->timeline);
>
> + engine->request_stats.runnable++;
> +
> spin_lock(&timeline->lock);
> list_move(&request->link, &timeline->requests);
> spin_unlock(&timeline->lock);
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 7eebfbb95e89..8377a77cfbe7 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -1731,12 +1731,13 @@ void intel_engine_dump(struct intel_engine_cs *engine,
> if (i915_terminally_wedged(&engine->i915->gpu_error))
> drm_printf(m, "*** WEDGED ***\n");
>
> - drm_printf(m, " current seqno %x, last %x, hangcheck %x [%d ms], inflight %d\n",
> + drm_printf(m, " current seqno %x, last %x, hangcheck %x [%d ms], inflight %d, runnable %u\n",
> intel_engine_get_seqno(engine),
> intel_engine_last_submit(engine),
> engine->hangcheck.seqno,
> jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp),
> - engine->timeline->inflight_seqnos);
> + engine->timeline->inflight_seqnos,
> + engine->request_stats.runnable);
> drm_printf(m, " Reset count: %d (global %d)\n",
> i915_reset_engine_count(error, engine),
> i915_reset_count(error));
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 51e61b04a555..319937e67a6e 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -965,6 +965,8 @@ static void execlists_submit_request(struct drm_i915_gem_request *request)
> /* Will be called from irq-context when using foreign fences. */
> spin_lock_irqsave(&engine->timeline->lock, flags);
>
> + engine->request_stats.runnable++;
> +
> insert_request(engine, &request->priotree, request->priotree.priority);
>
> GEM_BUG_ON(!engine->execlists.first);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 27a0c47db51e..d7ee7831288d 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -303,6 +303,15 @@ struct intel_engine_cs {
> struct intel_ring *buffer;
> struct intel_timeline *timeline;
>
> + struct {
> + /**
> + * @runnable: Number of runnable requests sent to the backend.
> + *
> + * Count of requests waiting for the GPU to execute them.
> + */
> + unsigned int runnable;
> + } request_stats;
> +
> struct drm_i915_gem_object *default_state;
Just thinking about easy holes, probably want to keep the pointer above
next to the other pointers. I'll let you argue cachelines ;)
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC 3/6] drm/i915: Keep a count of requests submitted from userspace
2018-01-22 18:43 [RFC v2 0/6] Queued/runnable/running engine stats Tvrtko Ursulin
2018-01-22 18:43 ` [RFC 1/6] drm/i915/pmu: Fix enable count array size and bounds checking Tvrtko Ursulin
2018-01-22 18:43 ` [RFC 2/6] drm/i915: Keep a count of requests waiting for a slot on GPU Tvrtko Ursulin
@ 2018-01-22 18:43 ` Tvrtko Ursulin
2018-01-22 18:43 ` [RFC 4/6] drm/i915/pmu: Add queued counter Tvrtko Ursulin
` (5 subsequent siblings)
8 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-01-22 18:43 UTC (permalink / raw)
To: Intel-gfx
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Keep a count of requests submitted from userspace and not yet runnable due
to unresolved dependencies.
v2: Rename and move under the container struct. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_gem_request.c | 3 +++
drivers/gpu/drm/i915/intel_engine_cs.c | 3 ++-
drivers/gpu/drm/i915/intel_ringbuffer.h | 8 ++++++++
3 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 8da350bacff1..125a598b886c 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -599,6 +599,7 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
rcu_read_lock();
request->engine->submit_request(request);
rcu_read_unlock();
+ atomic_dec(&request->engine->request_stats.queued);
break;
case FENCE_FREE:
@@ -1067,6 +1068,8 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
if (engine->schedule)
engine->schedule(request, request->ctx->priority);
+ atomic_inc(&engine->request_stats.queued);
+
local_bh_disable();
i915_sw_fence_commit(&request->submit);
local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 8377a77cfbe7..46b2a92cb7a2 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1731,12 +1731,13 @@ void intel_engine_dump(struct intel_engine_cs *engine,
if (i915_terminally_wedged(&engine->i915->gpu_error))
drm_printf(m, "*** WEDGED ***\n");
- drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], inflight %d, runnable %u\n",
+ drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms], inflight %d, queued %u, runnable %u\n",
intel_engine_get_seqno(engine),
intel_engine_last_submit(engine),
engine->hangcheck.seqno,
jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp),
engine->timeline->inflight_seqnos,
+ atomic_read(&engine->request_stats.queued),
engine->request_stats.runnable);
drm_printf(m, "\tReset count: %d (global %d)\n",
i915_reset_engine_count(error, engine),
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index d7ee7831288d..4519788cc5a1 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -304,6 +304,14 @@ struct intel_engine_cs {
struct intel_timeline *timeline;
struct {
+ /**
+ * @queued: Number of submitted requests with dependencies.
+ *
+ * Count of requests waiting for dependencies before they can be
+ * submitted to the backend.
+ */
+ atomic_t queued;
+
/**
* @runnable: Number of runnable requests sent to the backend.
*
--
2.14.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC 4/6] drm/i915/pmu: Add queued counter
2018-01-22 18:43 [RFC v2 0/6] Queued/runnable/running engine stats Tvrtko Ursulin
` (2 preceding siblings ...)
2018-01-22 18:43 ` [RFC 3/6] drm/i915: Keep a count of requests submitted from userspace Tvrtko Ursulin
@ 2018-01-22 18:43 ` Tvrtko Ursulin
2018-01-22 18:56 ` Chris Wilson
2018-01-22 18:43 ` [RFC 5/6] drm/i915/pmu: Add runnable counter Tvrtko Ursulin
` (4 subsequent siblings)
8 siblings, 1 reply; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-01-22 18:43 UTC (permalink / raw)
To: Intel-gfx
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
We add a PMU counter to expose the number of requests which have been
submitted from userspace but are not yet runnable due to dependencies and
unsignaled fences.
This is useful to analyze the overall load of the system.
v2:
* Rebase for name change and re-order.
* Drop floating point constant. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_pmu.c | 40 +++++++++++++++++++++++++++++----
drivers/gpu/drm/i915/intel_ringbuffer.h | 2 +-
include/uapi/drm/i915_drm.h | 9 +++++++-
3 files changed, 45 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index cbfca4a255ab..8eefdf09a30a 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -36,7 +36,8 @@
#define ENGINE_SAMPLE_MASK \
(BIT(I915_SAMPLE_BUSY) | \
BIT(I915_SAMPLE_WAIT) | \
- BIT(I915_SAMPLE_SEMA))
+ BIT(I915_SAMPLE_SEMA) | \
+ BIT(I915_SAMPLE_QUEUED))
#define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
@@ -220,6 +221,11 @@ static void engines_sample(struct drm_i915_private *dev_priv)
update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
PERIOD, !!(val & RING_WAIT_SEMAPHORE));
+
+ if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED))
+ update_sample(&engine->pmu.sample[I915_SAMPLE_QUEUED],
+ I915_SAMPLE_QUEUED_DIVISOR,
+ atomic_read(&engine->request_stats.queued));
}
if (fw)
@@ -297,6 +303,7 @@ engine_event_status(struct intel_engine_cs *engine,
switch (sample) {
case I915_SAMPLE_BUSY:
case I915_SAMPLE_WAIT:
+ case I915_SAMPLE_QUEUED:
break;
case I915_SAMPLE_SEMA:
if (INTEL_GEN(engine->i915) < 6)
@@ -407,6 +414,9 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
} else {
val = engine->pmu.sample[sample].cur;
}
+
+ if (sample == I915_SAMPLE_QUEUED)
+ val = div_u64(val, FREQUENCY);
} else {
switch (event->attr.config) {
case I915_PMU_ACTUAL_FREQUENCY:
@@ -719,6 +729,16 @@ static const struct attribute_group *i915_pmu_attr_groups[] = {
{ \
.sample = (__sample), \
.name = (__name), \
+ .suffix = "unit", \
+ .value = "ns", \
+}
+
+#define __engine_event_scale(__sample, __name, __scale) \
+{ \
+ .sample = (__sample), \
+ .name = (__name), \
+ .suffix = "scale", \
+ .value = (__scale), \
}
static struct i915_ext_attribute *
@@ -746,6 +766,9 @@ add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name,
return ++attr;
}
+/* No brackets or quotes below please. */
+#define I915_SAMPLE_QUEUED_SCALE 0.01
+
static struct attribute **
create_event_attributes(struct drm_i915_private *i915)
{
@@ -762,10 +785,14 @@ create_event_attributes(struct drm_i915_private *i915)
static const struct {
enum drm_i915_pmu_engine_sample sample;
char *name;
+ char *suffix;
+ char *value;
} engine_events[] = {
__engine_event(I915_SAMPLE_BUSY, "busy"),
__engine_event(I915_SAMPLE_SEMA, "sema"),
__engine_event(I915_SAMPLE_WAIT, "wait"),
+ __engine_event_scale(I915_SAMPLE_QUEUED, "queued",
+ __stringify(I915_SAMPLE_QUEUED_SCALE)),
};
unsigned int count = 0;
struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
@@ -775,6 +802,9 @@ create_event_attributes(struct drm_i915_private *i915)
enum intel_engine_id id;
unsigned int i;
+ BUILD_BUG_ON(I915_SAMPLE_QUEUED_DIVISOR !=
+ (1 / I915_SAMPLE_QUEUED_SCALE));
+
/* Count how many counters we will be exposing. */
for (i = 0; i < ARRAY_SIZE(events); i++) {
if (!config_status(i915, events[i].config))
@@ -852,13 +882,15 @@ create_event_attributes(struct drm_i915_private *i915)
engine->instance,
engine_events[i].sample));
- str = kasprintf(GFP_KERNEL, "%s-%s.unit",
- engine->name, engine_events[i].name);
+ str = kasprintf(GFP_KERNEL, "%s-%s.%s",
+ engine->name, engine_events[i].name,
+ engine_events[i].suffix);
if (!str)
goto err;
*attr_iter++ = &pmu_iter->attr.attr;
- pmu_iter = add_pmu_attr(pmu_iter, str, "ns");
+ pmu_iter = add_pmu_attr(pmu_iter, str,
+ engine_events[i].value);
}
}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 4519788cc5a1..580f07b2a5dd 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -381,7 +381,7 @@ struct intel_engine_cs {
*
* Our internal timer stores the current counters in this field.
*/
-#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_SEMA + 1)
+#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_QUEUED + 1)
struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX];
/**
* @busy_stats: Has enablement of engine stats tracking been
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 536ee4febd74..968bdc3269cb 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -110,9 +110,13 @@ enum drm_i915_gem_engine_class {
enum drm_i915_pmu_engine_sample {
I915_SAMPLE_BUSY = 0,
I915_SAMPLE_WAIT = 1,
- I915_SAMPLE_SEMA = 2
+ I915_SAMPLE_SEMA = 2,
+ I915_SAMPLE_QUEUED = 3
};
+ /* Divide counter value by divisor to get the real value. */
+#define I915_SAMPLE_QUEUED_DIVISOR (100)
+
#define I915_PMU_SAMPLE_BITS (4)
#define I915_PMU_SAMPLE_MASK (0xf)
#define I915_PMU_SAMPLE_INSTANCE_BITS (8)
@@ -133,6 +137,9 @@ enum drm_i915_pmu_engine_sample {
#define I915_PMU_ENGINE_SEMA(class, instance) \
__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
+#define I915_PMU_ENGINE_QUEUED(class, instance) \
+ __I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED)
+
#define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
#define I915_PMU_ACTUAL_FREQUENCY __I915_PMU_OTHER(0)
--
2.14.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC 4/6] drm/i915/pmu: Add queued counter
2018-01-22 18:43 ` [RFC 4/6] drm/i915/pmu: Add queued counter Tvrtko Ursulin
@ 2018-01-22 18:56 ` Chris Wilson
2018-01-24 18:01 ` Tvrtko Ursulin
0 siblings, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2018-01-22 18:56 UTC (permalink / raw)
To: Tvrtko Ursulin, Intel-gfx
Quoting Tvrtko Ursulin (2018-01-22 18:43:56)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> We add a PMU counter to expose the number of requests which have been
> submitted from userspace but are not yet runnable due dependencies and
> unsignaled fences.
>
> This is useful to analyze the overall load of the system.
>
> v2:
> * Rebase for name change and re-order.
> * Drop floating point constant. (Chris Wilson)
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
> drivers/gpu/drm/i915/i915_pmu.c | 40 +++++++++++++++++++++++++++++----
> drivers/gpu/drm/i915/intel_ringbuffer.h | 2 +-
> include/uapi/drm/i915_drm.h | 9 +++++++-
> 3 files changed, 45 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index cbfca4a255ab..8eefdf09a30a 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -36,7 +36,8 @@
> #define ENGINE_SAMPLE_MASK \
> (BIT(I915_SAMPLE_BUSY) | \
> BIT(I915_SAMPLE_WAIT) | \
> - BIT(I915_SAMPLE_SEMA))
> + BIT(I915_SAMPLE_SEMA) | \
> + BIT(I915_SAMPLE_QUEUED))
>
> #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
>
> @@ -220,6 +221,11 @@ static void engines_sample(struct drm_i915_private *dev_priv)
>
> update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
> PERIOD, !!(val & RING_WAIT_SEMAPHORE));
> +
> + if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED))
> + update_sample(&engine->pmu.sample[I915_SAMPLE_QUEUED],
> + I915_SAMPLE_QUEUED_DIVISOR,
> + atomic_read(&engine->request_stats.queued));
engine->request_stats.foo works for me, and reads quite nicely.
> +/* No brackets or quotes below please. */
> +#define I915_SAMPLE_QUEUED_SCALE 0.01
> + /* Divide counter value by divisor to get the real value. */
> +#define I915_SAMPLE_QUEUED_DIVISOR (100)
I'm just thinking of favouring the sampler arithmetic by using 128. As
far as userspace is concerned, the difference is not going to be that
noticeable, even less so if you choose 256.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 4/6] drm/i915/pmu: Add queued counter
2018-01-22 18:56 ` Chris Wilson
@ 2018-01-24 18:01 ` Tvrtko Ursulin
0 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-01-24 18:01 UTC (permalink / raw)
To: Chris Wilson, Tvrtko Ursulin, Intel-gfx
On 22/01/2018 18:56, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-01-22 18:43:56)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> We add a PMU counter to expose the number of requests which have been
>> submitted from userspace but are not yet runnable due dependencies and
>> unsignaled fences.
>>
>> This is useful to analyze the overall load of the system.
>>
>> v2:
>> * Rebase for name change and re-order.
>> * Drop floating point constant. (Chris Wilson)
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> ---
>> drivers/gpu/drm/i915/i915_pmu.c | 40 +++++++++++++++++++++++++++++----
>> drivers/gpu/drm/i915/intel_ringbuffer.h | 2 +-
>> include/uapi/drm/i915_drm.h | 9 +++++++-
>> 3 files changed, 45 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>> index cbfca4a255ab..8eefdf09a30a 100644
>> --- a/drivers/gpu/drm/i915/i915_pmu.c
>> +++ b/drivers/gpu/drm/i915/i915_pmu.c
>> @@ -36,7 +36,8 @@
>> #define ENGINE_SAMPLE_MASK \
>> (BIT(I915_SAMPLE_BUSY) | \
>> BIT(I915_SAMPLE_WAIT) | \
>> - BIT(I915_SAMPLE_SEMA))
>> + BIT(I915_SAMPLE_SEMA) | \
>> + BIT(I915_SAMPLE_QUEUED))
>>
>> #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
>>
>> @@ -220,6 +221,11 @@ static void engines_sample(struct drm_i915_private *dev_priv)
>>
>> update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
>> PERIOD, !!(val & RING_WAIT_SEMAPHORE));
>> +
>> + if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED))
>> + update_sample(&engine->pmu.sample[I915_SAMPLE_QUEUED],
>> + I915_SAMPLE_QUEUED_DIVISOR,
>> + atomic_read(&engine->request_stats.queued));
>
> engine->request_stats.foo works for me, and reads quite nicely.
>
>> +/* No brackets or quotes below please. */
>> +#define I915_SAMPLE_QUEUED_SCALE 0.01
>
>> + /* Divide counter value by divisor to get the real value. */
>> +#define I915_SAMPLE_QUEUED_DIVISOR (100)
>
> I'm just thinking of favouring the sampler arithmetic by using 128. As
> far as userspace the difference is not going to that noticeable, less if
> you chose 256.
I'll do 1024 then, but the CPU usage in the sampling thread is so low
anyway.
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC 5/6] drm/i915/pmu: Add runnable counter
2018-01-22 18:43 [RFC v2 0/6] Queued/runnable/running engine stats Tvrtko Ursulin
` (3 preceding siblings ...)
2018-01-22 18:43 ` [RFC 4/6] drm/i915/pmu: Add queued counter Tvrtko Ursulin
@ 2018-01-22 18:43 ` Tvrtko Ursulin
2018-01-22 18:43 ` [RFC 6/6] drm/i915/pmu: Add running counter Tvrtko Ursulin
` (3 subsequent siblings)
8 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-01-22 18:43 UTC (permalink / raw)
To: Intel-gfx
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
We add a PMU counter to expose the number of requests with resolved
dependencies waiting for a slot on the GPU to run.
This is useful to analyze the overall load of the system.
v2: Don't limit to gen8+.
v3:
* Rebase for dynamic sysfs.
* Drop currently executing requests.
v4:
* Sync with internal renaming.
* Drop floating point constant. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_pmu.c | 18 ++++++++++++++++--
drivers/gpu/drm/i915/intel_ringbuffer.h | 2 +-
include/uapi/drm/i915_drm.h | 7 ++++++-
3 files changed, 23 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 8eefdf09a30a..f332eff6d057 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -37,7 +37,8 @@
(BIT(I915_SAMPLE_BUSY) | \
BIT(I915_SAMPLE_WAIT) | \
BIT(I915_SAMPLE_SEMA) | \
- BIT(I915_SAMPLE_QUEUED))
+ BIT(I915_SAMPLE_QUEUED) | \
+ BIT(I915_SAMPLE_RUNNABLE))
#define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
@@ -226,6 +227,11 @@ static void engines_sample(struct drm_i915_private *dev_priv)
update_sample(&engine->pmu.sample[I915_SAMPLE_QUEUED],
I915_SAMPLE_QUEUED_DIVISOR,
atomic_read(&engine->request_stats.queued));
+
+ if (engine->pmu.enable & BIT(I915_SAMPLE_RUNNABLE))
+ update_sample(&engine->pmu.sample[I915_SAMPLE_RUNNABLE],
+ I915_SAMPLE_RUNNABLE_DIVISOR,
+ engine->request_stats.runnable);
}
if (fw)
@@ -304,6 +310,7 @@ engine_event_status(struct intel_engine_cs *engine,
case I915_SAMPLE_BUSY:
case I915_SAMPLE_WAIT:
case I915_SAMPLE_QUEUED:
+ case I915_SAMPLE_RUNNABLE:
break;
case I915_SAMPLE_SEMA:
if (INTEL_GEN(engine->i915) < 6)
@@ -415,7 +422,8 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
val = engine->pmu.sample[sample].cur;
}
- if (sample == I915_SAMPLE_QUEUED)
+ if (sample == I915_SAMPLE_QUEUED ||
+ sample == I915_SAMPLE_RUNNABLE)
val = div_u64(val, FREQUENCY);
} else {
switch (event->attr.config) {
@@ -768,6 +776,7 @@ add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name,
/* No brackets or quotes below please. */
#define I915_SAMPLE_QUEUED_SCALE 0.01
+#define I915_SAMPLE_RUNNABLE_SCALE 0.01
static struct attribute **
create_event_attributes(struct drm_i915_private *i915)
@@ -793,6 +802,8 @@ create_event_attributes(struct drm_i915_private *i915)
__engine_event(I915_SAMPLE_WAIT, "wait"),
__engine_event_scale(I915_SAMPLE_QUEUED, "queued",
__stringify(I915_SAMPLE_QUEUED_SCALE)),
+ __engine_event_scale(I915_SAMPLE_RUNNABLE, "runnable",
+ __stringify(I915_SAMPLE_RUNNABLE_SCALE)),
};
unsigned int count = 0;
struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
@@ -805,6 +816,9 @@ create_event_attributes(struct drm_i915_private *i915)
BUILD_BUG_ON(I915_SAMPLE_QUEUED_DIVISOR !=
(1 / I915_SAMPLE_QUEUED_SCALE));
+ BUILD_BUG_ON(I915_SAMPLE_RUNNABLE_DIVISOR !=
+ (1 / I915_SAMPLE_RUNNABLE_SCALE));
+
/* Count how many counters we will be exposing. */
for (i = 0; i < ARRAY_SIZE(events); i++) {
if (!config_status(i915, events[i].config))
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 580f07b2a5dd..a06f1fc0c150 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -381,7 +381,7 @@ struct intel_engine_cs {
*
* Our internal timer stores the current counters in this field.
*/
-#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_QUEUED + 1)
+#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_RUNNABLE + 1)
struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX];
/**
* @busy_stats: Has enablement of engine stats tracking been
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 968bdc3269cb..05951839abe0 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -111,11 +111,13 @@ enum drm_i915_pmu_engine_sample {
I915_SAMPLE_BUSY = 0,
I915_SAMPLE_WAIT = 1,
I915_SAMPLE_SEMA = 2,
- I915_SAMPLE_QUEUED = 3
+ I915_SAMPLE_QUEUED = 3,
+ I915_SAMPLE_RUNNABLE = 4,
};
/* Divide counter value by divisor to get the real value. */
#define I915_SAMPLE_QUEUED_DIVISOR (100)
+#define I915_SAMPLE_RUNNABLE_DIVISOR (100)
#define I915_PMU_SAMPLE_BITS (4)
#define I915_PMU_SAMPLE_MASK (0xf)
@@ -140,6 +142,9 @@ enum drm_i915_pmu_engine_sample {
#define I915_PMU_ENGINE_QUEUED(class, instance) \
__I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED)
+#define I915_PMU_ENGINE_RUNNABLE(class, instance) \
+ __I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNABLE)
+
#define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
#define I915_PMU_ACTUAL_FREQUENCY __I915_PMU_OTHER(0)
--
2.14.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC 6/6] drm/i915/pmu: Add running counter
2018-01-22 18:43 [RFC v2 0/6] Queued/runnable/running engine stats Tvrtko Ursulin
` (4 preceding siblings ...)
2018-01-22 18:43 ` [RFC 5/6] drm/i915/pmu: Add runnable counter Tvrtko Ursulin
@ 2018-01-22 18:43 ` Tvrtko Ursulin
2018-01-22 18:52 ` [RFC v2 0/6] Queued/runnable/running engine stats Chris Wilson
` (2 subsequent siblings)
8 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-01-22 18:43 UTC (permalink / raw)
To: Intel-gfx
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
We add a PMU counter to expose the number of requests currently executing
on the GPU.
This is useful to analyze the overall load of the system.
v2:
* Rebase.
* Drop floating point constant. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_pmu.c | 18 ++++++++++++++++--
drivers/gpu/drm/i915/intel_ringbuffer.h | 2 +-
include/uapi/drm/i915_drm.h | 5 +++++
3 files changed, 22 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index f332eff6d057..86d9b9fb6aef 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -38,7 +38,8 @@
BIT(I915_SAMPLE_WAIT) | \
BIT(I915_SAMPLE_SEMA) | \
BIT(I915_SAMPLE_QUEUED) | \
- BIT(I915_SAMPLE_RUNNABLE))
+ BIT(I915_SAMPLE_RUNNABLE) | \
+ BIT(I915_SAMPLE_RUNNING))
#define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
@@ -232,6 +233,11 @@ static void engines_sample(struct drm_i915_private *dev_priv)
update_sample(&engine->pmu.sample[I915_SAMPLE_RUNNABLE],
I915_SAMPLE_RUNNABLE_DIVISOR,
engine->request_stats.runnable);
+
+ if (engine->pmu.enable & BIT(I915_SAMPLE_RUNNING))
+ update_sample(&engine->pmu.sample[I915_SAMPLE_RUNNING],
+ I915_SAMPLE_RUNNING_DIVISOR,
+ last_seqno - current_seqno);
}
if (fw)
@@ -311,6 +317,7 @@ engine_event_status(struct intel_engine_cs *engine,
case I915_SAMPLE_WAIT:
case I915_SAMPLE_QUEUED:
case I915_SAMPLE_RUNNABLE:
+ case I915_SAMPLE_RUNNING:
break;
case I915_SAMPLE_SEMA:
if (INTEL_GEN(engine->i915) < 6)
@@ -423,7 +430,8 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
}
if (sample == I915_SAMPLE_QUEUED ||
- sample == I915_SAMPLE_RUNNABLE)
+ sample == I915_SAMPLE_RUNNABLE ||
+ sample == I915_SAMPLE_RUNNING)
val = div_u64(val, FREQUENCY);
} else {
switch (event->attr.config) {
@@ -777,6 +785,7 @@ add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name,
/* No brackets or quotes below please. */
#define I915_SAMPLE_QUEUED_SCALE 0.01
#define I915_SAMPLE_RUNNABLE_SCALE 0.01
+#define I915_SAMPLE_RUNNING_SCALE 0.01
static struct attribute **
create_event_attributes(struct drm_i915_private *i915)
@@ -804,6 +813,8 @@ create_event_attributes(struct drm_i915_private *i915)
__stringify(I915_SAMPLE_QUEUED_SCALE)),
__engine_event_scale(I915_SAMPLE_RUNNABLE, "runnable",
__stringify(I915_SAMPLE_RUNNABLE_SCALE)),
+ __engine_event_scale(I915_SAMPLE_RUNNING, "running",
+ __stringify(I915_SAMPLE_RUNNING_SCALE)),
};
unsigned int count = 0;
struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
@@ -819,6 +830,9 @@ create_event_attributes(struct drm_i915_private *i915)
BUILD_BUG_ON(I915_SAMPLE_RUNNABLE_DIVISOR !=
(1 / I915_SAMPLE_RUNNABLE_SCALE));
+ BUILD_BUG_ON(I915_SAMPLE_RUNNING_DIVISOR !=
+ (1 / I915_SAMPLE_RUNNING_SCALE));
+
/* Count how many counters we will be exposing. */
for (i = 0; i < ARRAY_SIZE(events); i++) {
if (!config_status(i915, events[i].config))
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index a06f1fc0c150..2adc87e48dab 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -381,7 +381,7 @@ struct intel_engine_cs {
*
* Our internal timer stores the current counters in this field.
*/
-#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_RUNNABLE + 1)
+#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_RUNNING + 1)
struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX];
/**
* @busy_stats: Has enablement of engine stats tracking been
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 05951839abe0..1618da74d8d8 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -113,11 +113,13 @@ enum drm_i915_pmu_engine_sample {
I915_SAMPLE_SEMA = 2,
I915_SAMPLE_QUEUED = 3,
I915_SAMPLE_RUNNABLE = 4,
+ I915_SAMPLE_RUNNING = 5,
};
/* Divide counter value by divisor to get the real value. */
#define I915_SAMPLE_QUEUED_DIVISOR (100)
#define I915_SAMPLE_RUNNABLE_DIVISOR (100)
+#define I915_SAMPLE_RUNNING_DIVISOR (100)
#define I915_PMU_SAMPLE_BITS (4)
#define I915_PMU_SAMPLE_MASK (0xf)
@@ -145,6 +147,9 @@ enum drm_i915_pmu_engine_sample {
#define I915_PMU_ENGINE_RUNNABLE(class, instance) \
__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNABLE)
+#define I915_PMU_ENGINE_RUNNING(class, instance) \
+ __I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNING)
+
#define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
#define I915_PMU_ACTUAL_FREQUENCY __I915_PMU_OTHER(0)
--
2.14.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC v2 0/6] Queued/runnable/running engine stats
2018-01-22 18:43 [RFC v2 0/6] Queued/runnable/running engine stats Tvrtko Ursulin
` (5 preceding siblings ...)
2018-01-22 18:43 ` [RFC 6/6] drm/i915/pmu: Add running counter Tvrtko Ursulin
@ 2018-01-22 18:52 ` Chris Wilson
2018-01-24 18:01 ` Tvrtko Ursulin
2018-01-22 19:21 ` ✓ Fi.CI.BAT: success for " Patchwork
2018-01-23 5:11 ` ✗ Fi.CI.IGT: failure " Patchwork
8 siblings, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2018-01-22 18:52 UTC (permalink / raw)
To: Tvrtko Ursulin, Intel-gfx
Quoting Tvrtko Ursulin (2018-01-22 18:43:52)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> Per-engine queue depths are an interesting metric for analyzing the system load
> and also for users who wish to use it to load balance their submissions based
> on it.
>
> In this version I have split the metrics into three separate counters:
>
> 1. QUEUED - From execbuf time to request being runnable - runnable meaning until
> dependencies have been resolved and fences signaled.
> 2. RUNNABLE - From runnable to running on the GPU.
> 3. RUNNING - Running on the GPU.
>
> When inspected with perf stat the output looks roughly like this:
>
> # time counts unit events
> 201.160490145 0.01 i915/rcs0-queued/
> 201.160490145 19.13 i915/rcs0-runnable/
> 201.160490145 2.39 i915/rcs0-running/
>
> The reported numbers are average queue depths for the last query period.
>
> Having split out metrics should be more flexible for all users, and it is still
> possible to fetch an atomic snapshot of all using the perf groups for those
> wanting to combine them.
>
> For users wanting instantaneous numbers instead of averaged, we could potentially
> expose them using the query API Lionel is working on.
> (https://patchwork.freedesktop.org/series/36622/)
>
> For instance a query packet could look like:
>
> #define DRM_I915_QUERY_ENGINE_QUEUES 0x04
>
> struct drm_i915_query_engine_queues {
> __u8 class;
> __u8 instance;
>
> __u8 pad[2];
>
> __u32 queued;
> __u32 runnable;
> __u32 running;
> };
>
> I also have patches to expose this via intel-gpu-top, using the perf API.
Can you stick a ewma loadavg just after the hostname in intel-gpu-overlay,
pretty please? :)
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC v2 0/6] Queued/runnable/running engine stats
2018-01-22 18:52 ` [RFC v2 0/6] Queued/runnable/running engine stats Chris Wilson
@ 2018-01-24 18:01 ` Tvrtko Ursulin
2018-01-24 18:07 ` Chris Wilson
0 siblings, 1 reply; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-01-24 18:01 UTC (permalink / raw)
To: Chris Wilson, Tvrtko Ursulin, Intel-gfx
On 22/01/2018 18:52, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-01-22 18:43:52)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Per-engine queue depths are an interesting metric for analyzing the system load
>> and also for users who wish to use it to load balance their submissions based
>> on it.
>>
>> In this version I have split the metrics into three separate counters:
>>
>> 1. QUEUED - From execbuf time to request being runnable - runnable meaning until
>> dependencies have been resolved and fences signaled.
>> 2. RUNNABLE - From runnable to running on the GPU.
>> 3. RUNNING - Running on the GPU.
>>
>> When inspected with perf stat the output looks roughly like this:
>>
>> # time counts unit events
>> 201.160490145 0.01 i915/rcs0-queued/
>> 201.160490145 19.13 i915/rcs0-runnable/
>> 201.160490145 2.39 i915/rcs0-running/
>>
>> The reported numbers are average queue depths for the last query period.
>>
>> Having split out metrics should be more flexible for all users, and it is still
>> possible to fetch an atomic snapshot of all using the perf groups for those
>> wanting to combine them.
>>
>> For users wanting instantaneous numbers instead of averaged, we could potentially
>> expose them using the query API Lionel is working on.
>> (https://patchwork.freedesktop.org/series/36622/)
>>
>> For instance a query packet could look like:
>>
>> #define DRM_I915_QUERY_ENGINE_QUEUES 0x04
>>
>> struct drm_i915_query_engine_queues {
>> __u8 class;
>> __u8 instance;
>>
>> __u8 pad[2];
>>
>> __u32 queued;
>> __u32 runnable;
>> __u32 running;
>> };
>>
>> I also have patches to expose this via intel-gpu-top, using the perf API.
>
> Can you stick a ewma loadavg just after the hostname in intel-gpu-overlay,
> pretty please? :)
Sure, just one period and all three counters aggregated?
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC v2 0/6] Queued/runnable/running engine stats
2018-01-24 18:01 ` Tvrtko Ursulin
@ 2018-01-24 18:07 ` Chris Wilson
0 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2018-01-24 18:07 UTC (permalink / raw)
To: Tvrtko Ursulin, Tvrtko Ursulin, Intel-gfx
Quoting Tvrtko Ursulin (2018-01-24 18:01:14)
>
> On 22/01/2018 18:52, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-01-22 18:43:52)
> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >> Per-engine queue depths are an interesting metric for analyzing the system load
> >> and also for users who wish to use it to load balance their submissions based
> >> on it.
> >>
> >> In this version I have split the metrics into three separate counters:
> >>
> >> 1. QUEUED - From execbuf time to request being runnable - runnable meaning until
> >> dependencies have been resolved and fences signaled.
> >> 2. RUNNABLE - From runnable to running on the GPU.
> >> 3. RUNNING - Running on the GPU.
> >>
> >> When inspected with perf stat the output looks roughly like this:
> >>
> >> # time counts unit events
> >> 201.160490145 0.01 i915/rcs0-queued/
> >> 201.160490145 19.13 i915/rcs0-runnable/
> >> 201.160490145 2.39 i915/rcs0-running/
> >>
> >> The reported numbers are average queue depths for the last query period.
> >>
> >> Having split out metrics should be more flexible for all users, and it is still
> >> possible to fetch an atomic snapshot of all using the perf groups for those
> >> wanting to combine them.
> >>
> >> For users wanting instantaneous numbers instead of averaged, we could potentially
> >> expose them using the query API Lionel is working on.
> >> (https://patchwork.freedesktop.org/series/36622/)
> >>
> >> For instance a query packet could look like:
> >>
> >> #define DRM_I915_QUERY_ENGINE_QUEUES 0x04
> >>
> >> struct drm_i915_query_engine_queues {
> >> __u8 class;
> >> __u8 instance;
> >>
> >> __u8 pad[2];
> >>
> >> __u32 queued;
> >> __u32 runnable;
> >> __u32 running;
> >> };
> >>
> >> I also have patches to expose this via intel-gpu-top, using the perf API.
> >
> > Can you stick a ewma loadavg just after the hostname in intel-gpu-overlay,
> > pretty please? :)
>
> Sure, just one period and all three counters aggregated?
Hmm, just runnable + running I think matches loadavg best. (For the cpu
that is the number of tasks in the runqueue.) I think having the 1s,
30s, 15m figures would be useful but they can be computed in userspace
from the single (combined) sampler.
But the problem with just runnable + running is that we don't see those
inter-engine dependencies so clearly (but it does hide inter-device waits
etc), so I don't know.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
* ✓ Fi.CI.BAT: success for Queued/runnable/running engine stats
2018-01-22 18:43 [RFC v2 0/6] Queued/runnable/running engine stats Tvrtko Ursulin
` (6 preceding siblings ...)
2018-01-22 18:52 ` [RFC v2 0/6] Queued/runnable/running engine stats Chris Wilson
@ 2018-01-22 19:21 ` Patchwork
2018-01-23 5:11 ` ✗ Fi.CI.IGT: failure " Patchwork
8 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2018-01-22 19:21 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: intel-gfx
== Series Details ==
Series: Queued/runnable/running engine stats
URL : https://patchwork.freedesktop.org/series/36926/
State : success
== Summary ==
Series 36926v1 Queued/runnable/running engine stats
https://patchwork.freedesktop.org/api/1.0/series/36926/revisions/1/mbox/
Test gem_mmap_gtt:
Subgroup basic-small-bo-tiledx:
fail -> PASS (fi-gdg-551) fdo#102575
Test gem_ringfill:
Subgroup basic-default-hang:
dmesg-warn -> INCOMPLETE (fi-blb-e6850) fdo#101600 +1
Test kms_pipe_crc_basic:
Subgroup suspend-read-crc-pipe-b:
incomplete -> PASS (fi-snb-2520m) fdo#103713
fdo#102575 https://bugs.freedesktop.org/show_bug.cgi?id=102575
fdo#101600 https://bugs.freedesktop.org/show_bug.cgi?id=101600
fdo#103713 https://bugs.freedesktop.org/show_bug.cgi?id=103713
fi-bdw-5557u total:288 pass:267 dwarn:0 dfail:0 fail:0 skip:21 time:420s
fi-bdw-gvtdvm total:288 pass:264 dwarn:0 dfail:0 fail:0 skip:24 time:422s
fi-blb-e6850 total:146 pass:114 dwarn:0 dfail:0 fail:0 skip:31
fi-bsw-n3050 total:288 pass:242 dwarn:0 dfail:0 fail:0 skip:46 time:483s
fi-bwr-2160 total:288 pass:183 dwarn:0 dfail:0 fail:0 skip:105 time:280s
fi-bxt-dsi total:288 pass:258 dwarn:0 dfail:0 fail:0 skip:30 time:485s
fi-bxt-j4205 total:288 pass:259 dwarn:0 dfail:0 fail:0 skip:29 time:482s
fi-byt-j1900 total:288 pass:253 dwarn:0 dfail:0 fail:0 skip:35 time:468s
fi-byt-n2820 total:288 pass:249 dwarn:0 dfail:0 fail:0 skip:39 time:455s
fi-elk-e7500 total:224 pass:168 dwarn:9 dfail:1 fail:0 skip:45
fi-gdg-551 total:288 pass:180 dwarn:0 dfail:0 fail:0 skip:108 time:278s
fi-glk-1 total:288 pass:260 dwarn:0 dfail:0 fail:0 skip:28 time:517s
fi-hsw-4770 total:288 pass:261 dwarn:0 dfail:0 fail:0 skip:27 time:390s
fi-hsw-4770r total:288 pass:261 dwarn:0 dfail:0 fail:0 skip:27 time:398s
fi-ilk-650 total:288 pass:228 dwarn:0 dfail:0 fail:0 skip:60 time:413s
fi-ivb-3520m total:288 pass:259 dwarn:0 dfail:0 fail:0 skip:29 time:452s
fi-ivb-3770 total:288 pass:255 dwarn:0 dfail:0 fail:0 skip:33 time:410s
fi-kbl-7500u total:288 pass:263 dwarn:1 dfail:0 fail:0 skip:24 time:456s
fi-kbl-7560u total:288 pass:269 dwarn:0 dfail:0 fail:0 skip:19 time:502s
fi-kbl-7567u total:288 pass:268 dwarn:0 dfail:0 fail:0 skip:20 time:455s
fi-kbl-r total:288 pass:261 dwarn:0 dfail:0 fail:0 skip:27 time:502s
fi-pnv-d510 total:146 pass:113 dwarn:0 dfail:0 fail:0 skip:32
fi-skl-6260u total:288 pass:268 dwarn:0 dfail:0 fail:0 skip:20 time:428s
fi-skl-6600u total:288 pass:261 dwarn:0 dfail:0 fail:0 skip:27 time:506s
fi-skl-6700hq total:288 pass:262 dwarn:0 dfail:0 fail:0 skip:26 time:525s
fi-skl-6700k2 total:288 pass:264 dwarn:0 dfail:0 fail:0 skip:24 time:485s
fi-skl-6770hq total:288 pass:268 dwarn:0 dfail:0 fail:0 skip:20 time:481s
fi-skl-guc total:288 pass:260 dwarn:0 dfail:0 fail:0 skip:28 time:416s
fi-skl-gvtdvm total:288 pass:265 dwarn:0 dfail:0 fail:0 skip:23 time:430s
fi-snb-2520m total:288 pass:248 dwarn:0 dfail:0 fail:0 skip:40 time:520s
fi-snb-2600 total:288 pass:248 dwarn:0 dfail:0 fail:0 skip:40 time:399s
Blacklisted hosts:
fi-cfl-s2 total:288 pass:262 dwarn:0 dfail:0 fail:0 skip:26 time:569s
fi-glk-dsi total:288 pass:258 dwarn:0 dfail:0 fail:0 skip:30 time:470s
06c8efda323ac918fad0e26d81e8884574ec8b84 drm-tip: 2018y-01m-22d-17h-43m-26s UTC integration manifest
04a7a34b2eb0 drm/i915/pmu: Add running counter
72966345ddb8 drm/i915/pmu: Add runnable counter
a1536333b48a drm/i915/pmu: Add queued counter
ccc54acc0b69 drm/i915: Keep a count of requests submitted from userspace
209a5eb505fd drm/i915: Keep a count of requests waiting for a slot on GPU
b7740c97908f drm/i915/pmu: Fix enable count array size and bounds checking
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7744/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
* ✗ Fi.CI.IGT: failure for Queued/runnable/running engine stats
2018-01-22 18:43 [RFC v2 0/6] Queued/runnable/running engine stats Tvrtko Ursulin
` (7 preceding siblings ...)
2018-01-22 19:21 ` ✓ Fi.CI.BAT: success for " Patchwork
@ 2018-01-23 5:11 ` Patchwork
8 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2018-01-23 5:11 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: intel-gfx
== Series Details ==
Series: Queued/runnable/running engine stats
URL : https://patchwork.freedesktop.org/series/36926/
State : failure
== Summary ==
Test kms_flip:
Subgroup busy-flip-interruptible:
pass -> FAIL (shard-apl) fdo#103257
Subgroup flip-vs-panning-vs-hang-interruptible:
dmesg-warn -> PASS (shard-snb) fdo#103821
Test gem_eio:
Subgroup in-flight-external:
pass -> INCOMPLETE (shard-snb)
fail -> INCOMPLETE (shard-hsw) fdo#104676 +1
pass -> INCOMPLETE (shard-apl)
Subgroup in-flight-internal:
pass -> INCOMPLETE (shard-snb)
pass -> INCOMPLETE (shard-apl)
Test kms_frontbuffer_tracking:
Subgroup fbc-1p-offscren-pri-shrfb-draw-blt:
pass -> FAIL (shard-snb) fdo#101623 +1
Subgroup fbc-tilingchange:
fail -> PASS (shard-apl)
Subgroup fbc-1p-shrfb-fliptrack:
pass -> DMESG-FAIL (shard-apl) fdo#103167 +2
Test gem_softpin:
Subgroup noreloc-s4:
fail -> SKIP (shard-snb) fdo#103375
Test gem_exec_suspend:
Subgroup basic-s4-devices:
pass -> INCOMPLETE (shard-snb)
Test perf:
Subgroup buffer-fill:
pass -> FAIL (shard-apl) fdo#103755
Subgroup oa-exponents:
fail -> PASS (shard-apl) fdo#102254
Test kms_cursor_legacy:
Subgroup cursor-vs-flip-legacy:
fail -> PASS (shard-apl) fdo#103355
fdo#103257 https://bugs.freedesktop.org/show_bug.cgi?id=103257
fdo#103821 https://bugs.freedesktop.org/show_bug.cgi?id=103821
fdo#104676 https://bugs.freedesktop.org/show_bug.cgi?id=104676
fdo#101623 https://bugs.freedesktop.org/show_bug.cgi?id=101623
fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
fdo#103375 https://bugs.freedesktop.org/show_bug.cgi?id=103375
fdo#103755 https://bugs.freedesktop.org/show_bug.cgi?id=103755
fdo#102254 https://bugs.freedesktop.org/show_bug.cgi?id=102254
fdo#103355 https://bugs.freedesktop.org/show_bug.cgi?id=103355
shard-apl total:2670 pass:1658 dwarn:1 dfail:1 fail:26 skip:982 time:12752s
shard-hsw total:2721 pass:1707 dwarn:1 dfail:0 fail:10 skip:1001 time:15055s
shard-snb total:2598 pass:1241 dwarn:1 dfail:0 fail:7 skip:1346 time:7137s
Blacklisted hosts:
shard-kbl total:2670 pass:1788 dwarn:1 dfail:0 fail:22 skip:857 time:10063s
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7744/shards.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC 4/6] drm/i915/pmu: Add queued counter
2018-01-18 10:41 [RFC 0/6] Submitted queue depth stats Tvrtko Ursulin
@ 2018-01-18 10:41 ` Tvrtko Ursulin
0 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-01-18 10:41 UTC (permalink / raw)
To: Intel-gfx
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
We add a PMU counter to expose the number of requests which are ready to
run and waiting on a free slot on the GPU.
This is useful to analyze the overall load of the system.
v2: Don't limit to gen8+.
v3:
* Rebase for dynamic sysfs.
* Drop currently executing requests.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_pmu.c | 34 +++++++++++++++++++++++++++++----
drivers/gpu/drm/i915/intel_ringbuffer.h | 2 +-
include/uapi/drm/i915_drm.h | 8 +++++++-
3 files changed, 38 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index cbfca4a255ab..aaf48e85c35e 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -36,7 +36,8 @@
#define ENGINE_SAMPLE_MASK \
(BIT(I915_SAMPLE_BUSY) | \
BIT(I915_SAMPLE_WAIT) | \
- BIT(I915_SAMPLE_SEMA))
+ BIT(I915_SAMPLE_SEMA) | \
+ BIT(I915_SAMPLE_QUEUED))
#define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
@@ -220,6 +221,11 @@ static void engines_sample(struct drm_i915_private *dev_priv)
update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
PERIOD, !!(val & RING_WAIT_SEMAPHORE));
+
+ if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED))
+ update_sample(&engine->pmu.sample[I915_SAMPLE_QUEUED],
+ 1 / I915_SAMPLE_QUEUED_SCALE,
+ engine->queued);
}
if (fw)
@@ -297,6 +303,7 @@ engine_event_status(struct intel_engine_cs *engine,
switch (sample) {
case I915_SAMPLE_BUSY:
case I915_SAMPLE_WAIT:
+ case I915_SAMPLE_QUEUED:
break;
case I915_SAMPLE_SEMA:
if (INTEL_GEN(engine->i915) < 6)
@@ -407,6 +414,9 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
} else {
val = engine->pmu.sample[sample].cur;
}
+
+ if (sample == I915_SAMPLE_QUEUED)
+ val = div_u64(val, FREQUENCY);
} else {
switch (event->attr.config) {
case I915_PMU_ACTUAL_FREQUENCY:
@@ -719,6 +729,16 @@ static const struct attribute_group *i915_pmu_attr_groups[] = {
{ \
.sample = (__sample), \
.name = (__name), \
+ .suffix = "unit", \
+ .value = "ns", \
+}
+
+#define __engine_event_scale(__sample, __name, __scale) \
+{ \
+ .sample = (__sample), \
+ .name = (__name), \
+ .suffix = "scale", \
+ .value = (__scale), \
}
static struct i915_ext_attribute *
@@ -762,10 +782,14 @@ create_event_attributes(struct drm_i915_private *i915)
static const struct {
enum drm_i915_pmu_engine_sample sample;
char *name;
+ char *suffix;
+ char *value;
} engine_events[] = {
__engine_event(I915_SAMPLE_BUSY, "busy"),
__engine_event(I915_SAMPLE_SEMA, "sema"),
__engine_event(I915_SAMPLE_WAIT, "wait"),
+ __engine_event_scale(I915_SAMPLE_QUEUED, "queued",
+ __stringify(I915_SAMPLE_QUEUED_SCALE)),
};
unsigned int count = 0;
struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
@@ -852,13 +876,15 @@ create_event_attributes(struct drm_i915_private *i915)
engine->instance,
engine_events[i].sample));
- str = kasprintf(GFP_KERNEL, "%s-%s.unit",
- engine->name, engine_events[i].name);
+ str = kasprintf(GFP_KERNEL, "%s-%s.%s",
+ engine->name, engine_events[i].name,
+ engine_events[i].suffix);
if (!str)
goto err;
*attr_iter++ = &pmu_iter->attr.attr;
- pmu_iter = add_pmu_attr(pmu_iter, str, "ns");
+ pmu_iter = add_pmu_attr(pmu_iter, str,
+ engine_events[i].value);
}
}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 77fff2488cde..84541b91bcd8 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -379,7 +379,7 @@ struct intel_engine_cs {
*
* Our internal timer stores the current counters in this field.
*/
-#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_SEMA + 1)
+#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_QUEUED + 1)
struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX];
/**
* @busy_stats: Has enablement of engine stats tracking been
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 536ee4febd74..83458e5b1ac7 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -110,9 +110,12 @@ enum drm_i915_gem_engine_class {
enum drm_i915_pmu_engine_sample {
I915_SAMPLE_BUSY = 0,
I915_SAMPLE_WAIT = 1,
- I915_SAMPLE_SEMA = 2
+ I915_SAMPLE_SEMA = 2,
+ I915_SAMPLE_QUEUED = 3
};
+#define I915_SAMPLE_QUEUED_SCALE 1e-2 /* No braces please. */
+
#define I915_PMU_SAMPLE_BITS (4)
#define I915_PMU_SAMPLE_MASK (0xf)
#define I915_PMU_SAMPLE_INSTANCE_BITS (8)
@@ -133,6 +136,9 @@ enum drm_i915_pmu_engine_sample {
#define I915_PMU_ENGINE_SEMA(class, instance) \
__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
+#define I915_PMU_ENGINE_QUEUED(class, instance) \
+ __I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED)
+
#define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
#define I915_PMU_ACTUAL_FREQUENCY __I915_PMU_OTHER(0)
--
2.14.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 16+ messages in thread