* [RFC 0/6] Submitted queue depth stats
@ 2018-01-18 10:41 Tvrtko Ursulin
From: Tvrtko Ursulin @ 2018-01-18 10:41 UTC
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Per-engine queue depths are a useful metric for analyzing system load, and also
for users who wish to load balance their submissions based on them.

In this version I have split the metrics into three separate counters:

1. SUBMITTED - From execbuf time to request being runnable - meaning
	       dependencies have been resolved and fences signaled.
2. QUEUED - From runnable to running on the GPU.
3. RUNNING - Running on the GPU.

When inspected with perf stat the output looks roughly like this:

#           time             counts unit events
   201.160490145               0.01      i915/rcs0-submitted/
   201.160490145              19.13      i915/rcs0-queued/
   201.160490145               2.39      i915/rcs0-running/

The reported numbers are average queue depths for the last query period.
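
To picture how an average depth can fall out of a sampled counter, here is a
minimal sketch. It is not the i915 PMU code. It assumes a sampler that adds the
instantaneous depth to an accumulator on every tick, and a reader that divides
the accumulator delta by the number of ticks in the query period. All names
below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accumulator; not taken from the i915 driver. */
struct depth_sampler {
	uint64_t sum;     /* sum of instantaneous depths seen so far */
	uint64_t samples; /* number of sampling ticks so far */
};

/* Called on every sampling tick with the current queue depth. */
static void depth_sample(struct depth_sampler *s, uint32_t depth)
{
	s->sum += depth;
	s->samples++;
}

/*
 * Average depth between two snapshots of the accumulator, i.e. over one
 * query period. Returns 0.0 when no tick fell into the period.
 */
static double depth_average(const struct depth_sampler *prev,
			    const struct depth_sampler *now)
{
	uint64_t ticks;

	assert(now->samples >= prev->samples);
	ticks = now->samples - prev->samples;
	if (!ticks)
		return 0.0;

	return (double)(now->sum - prev->sum) / (double)ticks;
}
```

A reader keeps the previous snapshot, takes a new one at the end of each period,
and reports depth_average(&prev, &now); fractional values then arise naturally.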

Splitting the metrics out should be more flexible for all users, and those
wanting to combine them can still fetch an atomic snapshot of all three by
using perf groups.
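
A sketch of the consumer side of such a group read. It assumes the three events
were opened as one group with read_format set to
PERF_FORMAT_GROUP | PERF_FORMAT_TOTAL_TIME_ENABLED, so that one read() on the
group leader returns the member count, the enabled time and then one value per
event. The value order is assumed to follow the order in which the events were
added to the group. group_snapshot and decode_group are hypothetical names, and
opening the events is left out because the event config values are defined by
the patches themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_COUNTERS 3 /* three counters opened as one group (assumed) */

struct group_snapshot {
	uint64_t time_enabled;
	uint64_t value[NUM_COUNTERS];
};

/*
 * Decode the buffer returned by read() on the group leader.
 * Expected layout: u64 nr; u64 time_enabled; u64 value[nr];
 * Returns 0 on success, -1 on a short buffer or unexpected member count.
 */
static int decode_group(const void *buf, size_t len,
			struct group_snapshot *out)
{
	uint64_t words[2 + NUM_COUNTERS];

	if (len < sizeof(words))
		return -1;

	memcpy(words, buf, sizeof(words));
	if (words[0] != NUM_COUNTERS)
		return -1;

	out->time_enabled = words[1];
	memcpy(out->value, &words[2], sizeof(out->value));
	return 0;
}
```

Because all values come from a single read(), they describe the same instant,
which is what makes the snapshot usable for combining the counters.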

For users wanting instantaneous numbers instead of averages, we could
potentially expose them using the query API Lionel is working on.
(https://patchwork.freedesktop.org/series/36622/)

For instance a query packet could look like:

#define DRM_I915_QUERY_ENGINE_QUEUES		0x04

struct drm_i915_query_engine_queues {
	__u8 class;
	__u8 instance;

	__u8 pad[2];

	__u32 submitted;
	__u32 queued;
	__u32 running;
};
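
The two pad bytes keep the 32-bit members naturally aligned, so the packet
should carry no implicit padding. Below is a minimal userspace sketch to check
that at compile time. It uses uint8_t/uint32_t as stand-ins for __u8/__u32 and
assumes the layout proposed above, which is only a proposal and not existing
uAPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the proposed packet; illustrative only. */
struct drm_i915_query_engine_queues {
	uint8_t class;
	uint8_t instance;

	uint8_t pad[2];

	uint32_t submitted;
	uint32_t queued;
	uint32_t running;
};

/* The first counter starts right after the 4 byte header. */
_Static_assert(offsetof(struct drm_i915_query_engine_queues, submitted) == 4,
	       "unexpected hole before the counters");

/* 4 bytes of header plus three 32-bit counters. */
_Static_assert(sizeof(struct drm_i915_query_engine_queues) == 16,
	       "unexpected packet size");
```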

I also have patches to expose this via intel-gpu-top, using the perf API.

Tvrtko Ursulin (6):
  drm/i915/pmu: Fix enable count array size and bounds checking
  drm/i915: Keep a count of requests waiting for a slot on GPU
  drm/i915: Keep a count of requests submitted from userspace
  drm/i915/pmu: Add queued counter
  drm/i915/pmu: Add submitted counter
  drm/i915/pmu: Add running counter

 drivers/gpu/drm/i915/i915_gem_request.c | 10 +++++
 drivers/gpu/drm/i915/i915_pmu.c         | 67 +++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/intel_engine_cs.c  |  6 ++-
 drivers/gpu/drm/i915/intel_lrc.c        |  2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h | 19 +++++++++-
 include/uapi/drm/i915_drm.h             | 18 ++++++++-
 6 files changed, 109 insertions(+), 13 deletions(-)

-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [RFC v2 0/6] Queued/runnable/running engine stats
@ 2018-01-22 18:43 Tvrtko Ursulin
From: Tvrtko Ursulin @ 2018-01-22 18:43 UTC
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Per-engine queue depths are a useful metric for analyzing system load, and also
for users who wish to load balance their submissions based on them.

In this version I have split the metrics into three separate counters:

1. QUEUED - From execbuf time until the request is runnable, meaning its
            dependencies have been resolved and fences signaled.
2. RUNNABLE - From runnable to running on the GPU.
3. RUNNING - Running on the GPU.
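
Read as a state machine, each in-flight request is counted in exactly one of
the three buckets and only moves forward through them. Here is a minimal sketch
of that bookkeeping, with hypothetical names and no locking. It is not the
driver implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical request states, in the order a request passes through them. */
enum req_state { REQ_QUEUED, REQ_RUNNABLE, REQ_RUNNING, REQ_DONE };

struct engine_depths {
	uint32_t count[3]; /* indexed by REQ_QUEUED..REQ_RUNNING */
};

struct request {
	enum req_state state;
};

/* execbuf time: the request enters the QUEUED bucket. */
static void request_submit(struct engine_depths *e, struct request *rq)
{
	rq->state = REQ_QUEUED;
	e->count[REQ_QUEUED]++;
}

/*
 * Move the request to its next state: QUEUED -> RUNNABLE once dependencies
 * are resolved, RUNNABLE -> RUNNING once on the GPU, RUNNING -> DONE on
 * completion. A finished request is no longer counted anywhere.
 */
static void request_advance(struct engine_depths *e, struct request *rq)
{
	assert(rq->state != REQ_DONE);

	e->count[rq->state]--;
	rq->state = (enum req_state)(rq->state + 1);
	if (rq->state != REQ_DONE)
		e->count[rq->state]++;
}
```

With this bookkeeping the sum of the three counts always equals the number of
requests that have been submitted but not yet completed.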

When inspected with perf stat the output looks roughly like this:

#           time             counts unit events
   201.160490145               0.01      i915/rcs0-queued/
   201.160490145              19.13      i915/rcs0-runnable/
   201.160490145               2.39      i915/rcs0-running/

The reported numbers are average queue depths for the last query period.

Splitting the metrics out should be more flexible for all users, and those
wanting to combine them can still fetch an atomic snapshot of all three by
using perf groups.

For users wanting instantaneous numbers instead of averages, we could
potentially expose them using the query API Lionel is working on.
(https://patchwork.freedesktop.org/series/36622/)

For instance a query packet could look like:

#define DRM_I915_QUERY_ENGINE_QUEUES		0x04

struct drm_i915_query_engine_queues {
	__u8 class;
	__u8 instance;

	__u8 pad[2];

	__u32 queued;
	__u32 runnable;
	__u32 running;
};

I also have patches to expose this via intel-gpu-top, using the perf API.

v2:
 * Review feedback (see patch changelogs).
 * Renamed the counters and re-ordered some patches.

Tvrtko Ursulin (6):
  drm/i915/pmu: Fix enable count array size and bounds checking
  drm/i915: Keep a count of requests waiting for a slot on GPU
  drm/i915: Keep a count of requests submitted from userspace
  drm/i915/pmu: Add queued counter
  drm/i915/pmu: Add runnable counter
  drm/i915/pmu: Add running counter

 drivers/gpu/drm/i915/i915_gem_request.c | 10 ++++
 drivers/gpu/drm/i915/i915_pmu.c         | 81 +++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/intel_engine_cs.c  |  6 ++-
 drivers/gpu/drm/i915/intel_lrc.c        |  2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h | 21 ++++++++-
 include/uapi/drm/i915_drm.h             | 19 +++++++-
 6 files changed, 126 insertions(+), 13 deletions(-)

-- 
2.14.1


Thread overview: 15+ messages
2018-01-18 10:41 [RFC 0/6] Submitted queue depth stats Tvrtko Ursulin
2018-01-18 10:41 ` [RFC 1/6] drm/i915/pmu: Fix enable count array size and bounds checking Tvrtko Ursulin
2018-01-18 10:41 ` [RFC 2/6] drm/i915: Keep a count of requests waiting for a slot on GPU Tvrtko Ursulin
2018-01-18 10:41 ` [RFC 3/6] drm/i915: Keep a count of requests submitted from userspace Tvrtko Ursulin
2018-01-18 10:41 ` [RFC 4/6] drm/i915/pmu: Add queued counter Tvrtko Ursulin
2018-01-18 10:41 ` [RFC 5/6] drm/i915/pmu: Add submitted counter Tvrtko Ursulin
2018-01-18 10:41 ` [RFC 6/6] drm/i915/pmu: Add running counter Tvrtko Ursulin
2018-01-18 11:57   ` Chris Wilson
2018-01-19 11:45     ` Tvrtko Ursulin
2018-01-19 13:40       ` Chris Wilson
2018-01-19 13:48         ` Tvrtko Ursulin
2018-01-18 12:04 ` ✗ Fi.CI.BAT: warning for Submitted queue depth stats Patchwork
2018-01-22 18:43 [RFC v2 0/6] Queued/runnable/running engine stats Tvrtko Ursulin
2018-01-22 18:43 ` [RFC 4/6] drm/i915/pmu: Add queued counter Tvrtko Ursulin
2018-01-22 18:56   ` Chris Wilson
2018-01-24 18:01     ` Tvrtko Ursulin
