[RFC 0/8] Introduce framework for forwarding generic non-OA performance

* [RFC 0/8] Introduce framework for forwarding generic non-OA performance
@ 2015-08-05  5:55 sourab.gupta
  2015-08-05  5:55 ` [RFC 1/8] drm/i915: Add a new PMU for handling non-OA counter data profiling requests sourab.gupta
                   ` (7 more replies)
  0 siblings, 8 replies; 31+ messages in thread
From: sourab.gupta @ 2015-08-05  5:55 UTC (permalink / raw)
  To: intel-gfx; +Cc: Insoo Woo, Peter Zijlstra, Jabin Wu, Sourab Gupta

From: Sourab Gupta <sourab.gupta@intel.com>

This is an updated patch set (v3 - changes list at end), which builds upon the
multi context OA patch set introduced earlier at:

http://lists.freedesktop.org/archives/intel-gfx/2015-August/072949.html

The OA unit, as such, is specific to render ring and can't cater to performance
data requirements for other GPU engines.
Specifically, the media workloads may utilize other GPU engines, but there is
currently no framework which can be used to query performance statistics for
non-RCS workloads and provide this data to userspace tools. This patch set
tries to address this specific problem. The aim of this patch series is to
build upon the perf event framework developed earlier and use it for
forwarding performance data of non-RCS engine workloads.

Since the previous PMU is customized to handle OA reports, a new perf PMU is
added to handle generic non-OA performance data. An example of such non-OA
performance data is the timestamps/mmio registers.
This patch set enables framework for capturing the timestamps at
batch buffer boundaries, by inserting commands for the same in ringbuffer,
and forwarding the samples to userspace through perf interface.
Nevertheless, the framework and data structures can be extended to introduce
more performance data types (other than timestamps). The intention here is to
introduce a framework to enable capturing of generic performance data and
forwarding the same to userspace using perf apis.

The reports generated will again have an additional footer for metadata
information such as ctx_id, pid, ring id and tags (in the same way as done
for OA reports specified in the patch series earlier). This information can be
used by userspace tools such as MVP (Modular Video Profiler) to associate
reports with individual contexts and different stages of workload execution.

In this patch set, the timestamps are captured at BB boundaries by inserting
the commands in the ringbuffer at the batchbuffer boundaries. As specified
earlier, for a system wide GPU profiler, the relative complexity of doing this
in kernel is significantly less than supporting this usecase through userspace
command insertion by all the different components.

The final patch in the series tries to extend the data structures to enable
capture of upto 8 MMIO register values, in conjunction with timestamps

v2: This patch series has the following changes wrt the one floated earlier:
    - Removing synchronous waits during event stop/destroy
    - segregating the book-keeping data for the samples from destination buffer
      and collecting it into a separate list
    - managing the lifetime of destination buffer with the help of gem active
      reference tracking
    - having the scope of i915 device mutex limited to places of gem interaction
      and having the pmu data structures protected with a per pmu lock
    - userspace can now control the metadata it wants by requesting the same
      during event init. The sample is sent with the requested metadata in a
      packed format.
    - Some patches merged together and a few more introduced
    - mmio whitelist in place

v3: Changes made:
    - Meeting semantics for flush (ensuring to flush samples before returning).
    - spin_lock used in place of spin_lock_irqsave.
    - Using BUILD_BUG_ON macros to test the alignment/size requirements.
    - Some code restructuring/optimization, better nomenclature, and error
      handling.

Sourab Gupta (8):
  drm/i915: Add a new PMU for handling non-OA counter data profiling
    requests
  drm/i915: Add mechanism for forwarding the timestamp data through perf
  drm/i915: Handle event stop and destroy for GPU commands submitted
  drm/i915: Insert commands for capturing timestamps in the ring
  drm/i915: Add support for forwarding ring id in sample metadata
    through perf
  drm/i915: Add support for forwarding pid in timestamp sample metadata
    through perf
  drm/i915: Add support for forwarding execbuffer tags in timestamp
    sample metadata
  drm/i915: Support for retrieving MMIO register values alongwith
    timestamps through perf

 drivers/gpu/drm/i915/i915_dma.c     |   2 +
 drivers/gpu/drm/i915/i915_drv.h     |  43 +++
 drivers/gpu/drm/i915/i915_oa_perf.c | 690 ++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_reg.h     |   2 +
 include/uapi/drm/i915_drm.h         |  42 +++
 5 files changed, 779 insertions(+)

-- 
1.8.5.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread