All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/12] i915 perf support for command stream based OA, GPU and workload metrics capture
@ 2017-07-31  7:59 Sagar Arun Kamble
  2017-07-31  7:59 ` [PATCH 01/12] drm/i915: Add ctx getparam ioctl parameter to retrieve ctx unique id Sagar Arun Kamble
                   ` (12 more replies)
  0 siblings, 13 replies; 34+ messages in thread
From: Sagar Arun Kamble @ 2017-07-31  7:59 UTC (permalink / raw)
  To: intel-gfx; +Cc: Matthew Auld

This series is prepared from below two series posted by Sourab in March.
1. https://patchwork.freedesktop.org/series/21351/ - Collect command stream
   based OA reports using i915 perf
2. https://patchwork.freedesktop.org/series/21352/ - Collect command stream
   based GPU metrics for all engines using i915 perf

This series addresses most of the review comments from above two. Major
change is moving the stream structure and information from dev_priv to
per-engine structures. Stating below the intent of this series from cover
letters of earlier series.

This series adds framework for
1. Collection of OA reports associated with the render command stream, which
are collected around batchbuffer boundaries.
2. Collect other metadata such as ctx_id, pid, tag etc. with the samples,
and thus we can establish the association of samples collected with the
corresponding process/workload.
3. Collection of GPU performance metrics associated with the command stream of
a particular engine. These metrics include timestamps of work submission and
completion on engines, mmio metrics, etc. These metrics are are collected
around batchbuffer boundaries.

Functionality to be added in future patches:
1. GPU/CPU cross-timestamp sync patches need to be reworked as requested by
   kernel maintainers.
2. Some of the data types being collected through these patches can be done in
   the userspace and that is yet to be finalized. Based on that some of the
   functionality from this series can be pruned.
3. Add support in the perf IGT tests for verifying CS based perf functionality.

Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>

Sourab Gupta (12):
  drm/i915: Add ctx getparam ioctl parameter to retrieve ctx unique id
  drm/i915: Expose OA sample source to userspace
  drm/i915: Framework for capturing command stream based OA reports and
    ctx id info.
  drm/i915: Flush periodic samples, in case of no pending CS sample
    requests
  drm/i915: Inform userspace about command stream OA buf overflow
  drm/i915: Populate ctx ID for periodic OA reports
  drm/i915: Add support for having pid output with OA report
  drm/i915: Add support for emitting execbuffer tags through OA counter
    reports
  drm/i915: Add support for collecting timestamps on all gpu engines
  drm/i915: Extract raw GPU timestamps from OA reports to forward in
    perf samples
  drm/i915: Async check for streams data availability with hrtimer
    rescheduling
  drm/i915: Support for capturing MMIO register values

 drivers/gpu/drm/i915/i915_drv.h            |  165 ++-
 drivers/gpu/drm/i915/i915_gem.c            |    1 +
 drivers/gpu/drm/i915/i915_gem_context.c    |    3 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   11 +
 drivers/gpu/drm/i915/i915_perf.c           | 1790 ++++++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_reg.h            |    6 +
 drivers/gpu/drm/i915/intel_engine_cs.c     |    4 +
 drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h    |    8 +
 include/uapi/drm/i915_drm.h                |   69 ++
 10 files changed, 1798 insertions(+), 261 deletions(-)

-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 01/12] drm/i915: Add ctx getparam ioctl parameter to retrieve ctx unique id
  2017-07-31  7:59 [PATCH 00/12] i915 perf support for command stream based OA, GPU and workload metrics capture Sagar Arun Kamble
@ 2017-07-31  7:59 ` Sagar Arun Kamble
  2017-07-31  7:59 ` [PATCH 02/12] drm/i915: Expose OA sample source to userspace Sagar Arun Kamble
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Sagar Arun Kamble @ 2017-07-31  7:59 UTC (permalink / raw)
  To: intel-gfx; +Cc: Sourab Gupta

From: Sourab Gupta <sourab.gupta@intel.com>

This patch adds a new ctx getparam ioctl parameter, which can be used to
retrieve ctx unique id by userspace.

This can be used by userspace to map the OA reports received in the
i915 perf samples with their associated ctx's (The OA reports have the
hw ctx ID information embedded for Gen8+).
Otherwise the userspace has no way of maintaining this association,
since it has the knowledge of only per-drm file specific ctx handles.

Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 3 +++
 include/uapi/drm/i915_drm.h             | 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index ed91ac8..d6128aa 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1084,6 +1084,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_CONTEXT_PARAM_BANNABLE:
 		args->value = i915_gem_context_is_bannable(ctx);
 		break;
+	case I915_CONTEXT_PARAM_HW_ID:
+		args->value = ctx->hw_id;
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 7ccbd6a..29f1501 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1327,6 +1327,7 @@ struct drm_i915_gem_context_param {
 #define I915_CONTEXT_PARAM_GTT_SIZE	0x3
 #define I915_CONTEXT_PARAM_NO_ERROR_CAPTURE	0x4
 #define I915_CONTEXT_PARAM_BANNABLE	0x5
+#define I915_CONTEXT_PARAM_HW_ID	0x6
 	__u64 value;
 };
 
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 02/12] drm/i915: Expose OA sample source to userspace
  2017-07-31  7:59 [PATCH 00/12] i915 perf support for command stream based OA, GPU and workload metrics capture Sagar Arun Kamble
  2017-07-31  7:59 ` [PATCH 01/12] drm/i915: Add ctx getparam ioctl parameter to retrieve ctx unique id Sagar Arun Kamble
@ 2017-07-31  7:59 ` Sagar Arun Kamble
  2017-07-31  7:59 ` [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info Sagar Arun Kamble
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Sagar Arun Kamble @ 2017-07-31  7:59 UTC (permalink / raw)
  To: intel-gfx; +Cc: Sourab Gupta

From: Sourab Gupta <sourab.gupta@intel.com>

This patch exposes a new sample source field to userspace. This field can
be populated to specify the origin of the OA report.
Currently, the OA samples are being generated only periodically, and hence
there's only source flag enum definition right now, but there are other
means of generating OA samples, such as via MI_RPC commands. The OA_SOURCE
sample type is introducing a mechanism (for userspace) to distinguish
various OA reports generated via different sources.
This is not intended as a replacement for the reason field that's part of
Gen8+ OA reports. For automatically triggered reports written to the
OABUFFER the reason field will distinguish e.g. periodic vs ctx-switch vs
GO transition reasons for the OA unit writing a report. However, The reason
field is overloaded as the RPT_ID field for MI_RPC reports so we need our
own way of tracking the difference.

v2: Renamed the source enum type and values. Updated commit description.
(Robert). Changed payload field source to u64 to keep all sample data
aligned at 8 bytes. (Lionel)

Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 25 +++++++++++++++++++++++++
 include/uapi/drm/i915_drm.h      | 13 +++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 96682fd..b272653 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -329,6 +329,7 @@
 };
 
 #define SAMPLE_OA_REPORT      (1<<0)
+#define SAMPLE_OA_SOURCE      (1<<1)
 
 /**
  * struct perf_open_properties - for validated properties given to open a stream
@@ -559,6 +560,22 @@ static int append_oa_sample(struct i915_perf_stream *stream,
 		return -EFAULT;
 	buf += sizeof(header);
 
+	/*
+	 * Sample has metadata containting OA_SOURCE followed by OA_REPORT.
+	 * Need to maintain this uapi w.r.t any reorganizing later not realizing
+	 * the ordering.
+	 * Currently there are a number of different automatic triggers for
+	 * writing OA reports to the OABUFFER like periodic, ctx-switch, go
+	 * transition. These are considered as source 'OABUFFER'.
+	 */
+	if (sample_flags & SAMPLE_OA_SOURCE) {
+		u64 source = I915_PERF_SAMPLE_OA_SOURCE_OABUFFER;
+
+		if (copy_to_user(buf, &source, 8))
+			return -EFAULT;
+		buf += 8;
+	}
+
 	if (sample_flags & SAMPLE_OA_REPORT) {
 		if (copy_to_user(buf, report, report_size))
 			return -EFAULT;
@@ -2048,6 +2065,11 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	stream->sample_flags |= SAMPLE_OA_REPORT;
 	stream->sample_size += format_size;
 
+	if (props->sample_flags & SAMPLE_OA_SOURCE) {
+		stream->sample_flags |= SAMPLE_OA_SOURCE;
+		stream->sample_size += 8;
+	}
+
 	dev_priv->perf.oa.oa_buffer.format_size = format_size;
 	if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
 		return -EINVAL;
@@ -2749,6 +2771,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 			props->oa_periodic = true;
 			props->oa_period_exponent = value;
 			break;
+		case DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE:
+			props->sample_flags |= SAMPLE_OA_SOURCE;
+			break;
 		case DRM_I915_PERF_PROP_MAX:
 			MISSING_CASE(id);
 			return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 29f1501..a1314c5 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1348,6 +1348,11 @@ enum drm_i915_oa_format {
 	I915_OA_FORMAT_MAX	    /* non-ABI */
 };
 
+enum drm_i915_perf_sample_oa_source {
+	I915_PERF_SAMPLE_OA_SOURCE_OABUFFER,
+	I915_PERF_SAMPLE_OA_SOURCE_MAX	/* non-ABI */
+};
+
 enum drm_i915_perf_property_id {
 	/**
 	 * Open the stream for a specific context handle (as used with
@@ -1382,6 +1387,13 @@ enum drm_i915_perf_property_id {
 	 */
 	DRM_I915_PERF_PROP_OA_EXPONENT,
 
+	/**
+	 * The value of this property set to 1 requests inclusion of sample
+	 * source field to be given to userspace. The sample source field
+	 * specifies the origin of OA report.
+	 */
+	DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE,
+
 	DRM_I915_PERF_PROP_MAX /* non-ABI */
 };
 
@@ -1447,6 +1459,7 @@ enum drm_i915_perf_record_type {
 	 * struct {
 	 *     struct drm_i915_perf_record_header header;
 	 *
+	 *     { u64 source; } && DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE
 	 *     { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
 	 * };
 	 */
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.
  2017-07-31  7:59 [PATCH 00/12] i915 perf support for command stream based OA, GPU and workload metrics capture Sagar Arun Kamble
  2017-07-31  7:59 ` [PATCH 01/12] drm/i915: Add ctx getparam ioctl parameter to retrieve ctx unique id Sagar Arun Kamble
  2017-07-31  7:59 ` [PATCH 02/12] drm/i915: Expose OA sample source to userspace Sagar Arun Kamble
@ 2017-07-31  7:59 ` Sagar Arun Kamble
  2017-07-31  8:34   ` Chris Wilson
                     ` (3 more replies)
  2017-07-31  7:59 ` [PATCH 04/12] drm/i915: Flush periodic samples, in case of no pending CS sample requests Sagar Arun Kamble
                   ` (9 subsequent siblings)
  12 siblings, 4 replies; 34+ messages in thread
From: Sagar Arun Kamble @ 2017-07-31  7:59 UTC (permalink / raw)
  To: intel-gfx; +Cc: Sourab Gupta

From: Sourab Gupta <sourab.gupta@intel.com>

This patch introduces a framework to capture OA counter reports associated
with Render command stream. We can then associate the reports captured
through this mechanism with their corresponding context id's. This can be
further extended to associate any other metadata information with the
corresponding samples (since the association with Render command stream
gives us the ability to capture these information while inserting the
corresponding capture commands into the command stream).

The OA reports generated in this way are associated with a corresponding
workload, and thus can be used the delimit the workload (i.e. sample the
counters at the workload boundaries), within an ongoing stream of periodic
counter snapshots.

There may be usecases wherein we need more than periodic OA capture mode
which is supported currently. This mode is primarily used for two usecases:
    - Ability to capture system wide metrics, alongwith the ability to map
      the reports back to individual contexts (particularly for HSW).
    - Ability to inject tags for work, into the reports. This provides
      visibility into the multiple stages of work within single context.

The userspace will be able to distinguish between the periodic and CS based
OA reports by the virtue of source_info sample field.

The command MI_REPORT_PERF_COUNT can be used to capture snapshots of OA
counters, and is inserted at BB boundaries.
The data thus captured will be stored in a separate buffer, which will
be different from the buffer used otherwise for periodic OA capture mode.
The metadata information pertaining to snapshot is maintained in a list,
which also has offsets into the gem buffer object per captured snapshot.
In order to track whether the gpu has completed processing the node,
a field pertaining to corresponding gem request is added, which is tracked
for completion of the command.

Both periodic and CS based reports are associated with a single stream
(corresponding to render engine), and it is expected to have the samples
in the sequential order according to their timestamps. Now, since these
reports are collected in separate buffers, these are merge sorted at the
time of forwarding to userspace during the read call.

v2: Aligning with the non-perf interface (custom drm ioctl based). Also,
few related patches are squashed together for better readability

v3: Updated perf sample capture emit hook name. Reserving space upfront
in the ring for emitting sample capture commands and using
req->fence.seqno for tracking samples. Added SRCU protection for streams.
Changed the stream last_request tracking to resv object. (Chris)
Updated perf.sample_lock spin_lock usage to avoid softlockups. Moved
stream to global per-engine structure. (Sagar)
Update unpin and put in the free routines to i915_vma_unpin_and_release.
Making use of perf stream cs_buffer vma resv instead of separate resv obj.
Pruned perf stream vma resv during gem_idle. (Chris)
Changed payload field ctx_id to u64 to keep all sample data aligned at 8
bytes. (Lionel)
stall/flush prior to sample capture is not added. Do we need to give this
control to user to select whether to stall/flush at each sample?

Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            |  101 ++-
 drivers/gpu/drm/i915/i915_gem.c            |    1 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    8 +
 drivers/gpu/drm/i915/i915_perf.c           | 1185 ++++++++++++++++++++++------
 drivers/gpu/drm/i915/intel_engine_cs.c     |    4 +
 drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h    |    5 +
 include/uapi/drm/i915_drm.h                |   15 +
 8 files changed, 1073 insertions(+), 248 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2c7456f..8b1cecf 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1985,6 +1985,24 @@ struct i915_perf_stream_ops {
 	 * The stream will always be disabled before this is called.
 	 */
 	void (*destroy)(struct i915_perf_stream *stream);
+
+	/*
+	 * @emit_sample_capture: Emit the commands in the command streamer
+	 * for a particular gpu engine.
+	 *
+	 * The commands are inserted to capture the perf sample data at
+	 * specific points during workload execution, such as before and after
+	 * the batch buffer.
+	 */
+	void (*emit_sample_capture)(struct i915_perf_stream *stream,
+				    struct drm_i915_gem_request *request,
+				    bool preallocate);
+};
+
+enum i915_perf_stream_state {
+	I915_PERF_STREAM_DISABLED,
+	I915_PERF_STREAM_ENABLE_IN_PROGRESS,
+	I915_PERF_STREAM_ENABLED,
 };
 
 /**
@@ -1997,9 +2015,9 @@ struct i915_perf_stream {
 	struct drm_i915_private *dev_priv;
 
 	/**
-	 * @link: Links the stream into ``&drm_i915_private->streams``
+	 * @engine: Engine to which this stream corresponds.
 	 */
-	struct list_head link;
+	struct intel_engine_cs *engine;
 
 	/**
 	 * @sample_flags: Flags representing the `DRM_I915_PERF_PROP_SAMPLE_*`
@@ -2022,17 +2040,41 @@ struct i915_perf_stream {
 	struct i915_gem_context *ctx;
 
 	/**
-	 * @enabled: Whether the stream is currently enabled, considering
-	 * whether the stream was opened in a disabled state and based
-	 * on `I915_PERF_IOCTL_ENABLE` and `I915_PERF_IOCTL_DISABLE` calls.
+	 * @state: Current stream state, which can be either disabled, enabled,
+	 * or enable_in_progress, while considering whether the stream was
+	 * opened in a disabled state and based on `I915_PERF_IOCTL_ENABLE` and
+	 * `I915_PERF_IOCTL_DISABLE` calls.
 	 */
-	bool enabled;
+	enum i915_perf_stream_state state;
+
+	/**
+	 * @cs_mode: Whether command stream based perf sample collection is
+	 * enabled for this stream
+	 */
+	bool cs_mode;
+
+	/**
+	 * @using_oa: Whether OA unit is in use for this particular stream
+	 */
+	bool using_oa;
 
 	/**
 	 * @ops: The callbacks providing the implementation of this specific
 	 * type of configured stream.
 	 */
 	const struct i915_perf_stream_ops *ops;
+
+	/* Command stream based perf data buffer */
+	struct {
+		struct i915_vma *vma;
+		u8 *vaddr;
+	} cs_buffer;
+
+	struct list_head cs_samples;
+	spinlock_t cs_samples_lock;
+
+	wait_queue_head_t poll_wq;
+	bool pollin;
 };
 
 /**
@@ -2095,7 +2137,8 @@ struct i915_oa_ops {
 	int (*read)(struct i915_perf_stream *stream,
 		    char __user *buf,
 		    size_t count,
-		    size_t *offset);
+		    size_t *offset,
+		    u32 ts);
 
 	/**
 	 * @oa_hw_tail_read: read the OA tail pointer register
@@ -2107,6 +2150,36 @@ struct i915_oa_ops {
 	u32 (*oa_hw_tail_read)(struct drm_i915_private *dev_priv);
 };
 
+/*
+ * i915_perf_cs_sample - Sample element to hold info about a single perf
+ * sample data associated with a particular GPU command stream.
+ */
+struct i915_perf_cs_sample {
+	/**
+	 * @link: Links the sample into ``&stream->cs_samples``
+	 */
+	struct list_head link;
+
+	/**
+	 * @request: GEM request associated with the sample. The commands to
+	 * capture the perf metrics are inserted into the command streamer in
+	 * context of this request.
+	 */
+	struct drm_i915_gem_request *request;
+
+	/**
+	 * @offset: Offset into ``&stream->cs_buffer``
+	 * where the perf metrics will be collected, when the commands inserted
+	 * into the command stream are executed by GPU.
+	 */
+	u32 offset;
+
+	/**
+	 * @ctx_id: Context ID associated with this perf sample
+	 */
+	u32 ctx_id;
+};
+
 struct intel_cdclk_state {
 	unsigned int cdclk, vco, ref;
 };
@@ -2431,17 +2504,10 @@ struct drm_i915_private {
 		struct ctl_table_header *sysctl_header;
 
 		struct mutex lock;
-		struct list_head streams;
-
-		struct {
-			struct i915_perf_stream *exclusive_stream;
 
-			u32 specific_ctx_id;
-
-			struct hrtimer poll_check_timer;
-			wait_queue_head_t poll_wq;
-			bool pollin;
+		struct hrtimer poll_check_timer;
 
+		struct {
 			/**
 			 * For rate limiting any notifications of spurious
 			 * invalid OA reports
@@ -3636,6 +3702,8 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data,
 void i915_oa_init_reg_state(struct intel_engine_cs *engine,
 			    struct i915_gem_context *ctx,
 			    uint32_t *reg_state);
+void i915_perf_emit_sample_capture(struct drm_i915_gem_request *req,
+				   bool preallocate);
 
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct i915_address_space *vm,
@@ -3795,6 +3863,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
 /* i915_perf.c */
 extern void i915_perf_init(struct drm_i915_private *dev_priv);
 extern void i915_perf_fini(struct drm_i915_private *dev_priv);
+extern void i915_perf_streams_mark_idle(struct drm_i915_private *dev_priv);
 extern void i915_perf_register(struct drm_i915_private *dev_priv);
 extern void i915_perf_unregister(struct drm_i915_private *dev_priv);
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 000a764..7b01548 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3220,6 +3220,7 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 
 	intel_engines_mark_idle(dev_priv);
 	i915_gem_timelines_mark_idle(dev_priv);
+	i915_perf_streams_mark_idle(dev_priv);
 
 	GEM_BUG_ON(!dev_priv->gt.awake);
 	dev_priv->gt.awake = false;
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 5fa4476..bfe546b 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1194,12 +1194,16 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 	if (err)
 		goto err_request;
 
+	i915_perf_emit_sample_capture(rq, true);
+
 	err = eb->engine->emit_bb_start(rq,
 					batch->node.start, PAGE_SIZE,
 					cache->gen > 5 ? 0 : I915_DISPATCH_SECURE);
 	if (err)
 		goto err_request;
 
+	i915_perf_emit_sample_capture(rq, false);
+
 	GEM_BUG_ON(!reservation_object_test_signaled_rcu(batch->resv, true));
 	i915_vma_move_to_active(batch, rq, 0);
 	reservation_object_lock(batch->resv, NULL);
@@ -2029,6 +2033,8 @@ static int eb_submit(struct i915_execbuffer *eb)
 			return err;
 	}
 
+	i915_perf_emit_sample_capture(eb->request, true);
+
 	err = eb->engine->emit_bb_start(eb->request,
 					eb->batch->node.start +
 					eb->batch_start_offset,
@@ -2037,6 +2043,8 @@ static int eb_submit(struct i915_execbuffer *eb)
 	if (err)
 		return err;
 
+	i915_perf_emit_sample_capture(eb->request, false);
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index b272653..57e1936 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -193,6 +193,7 @@
 
 #include <linux/anon_inodes.h>
 #include <linux/sizes.h>
+#include <linux/srcu.h>
 
 #include "i915_drv.h"
 #include "i915_oa_hsw.h"
@@ -288,6 +289,12 @@
 #define OAREPORT_REASON_CTX_SWITCH     (1<<3)
 #define OAREPORT_REASON_CLK_RATIO      (1<<5)
 
+/* Data common to periodic and RCS based OA samples */
+struct i915_perf_sample_data {
+	u64 source;
+	u64 ctx_id;
+	const u8 *report;
+};
 
 /* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate
  *
@@ -328,8 +335,19 @@
 	[I915_OA_FORMAT_C4_B8]		    = { 7, 64 },
 };
 
+/* Duplicated from similar static enum in i915_gem_execbuffer.c */
+#define I915_USER_RINGS (4)
+static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {
+	[I915_EXEC_DEFAULT]     = RCS,
+	[I915_EXEC_RENDER]      = RCS,
+	[I915_EXEC_BLT]         = BCS,
+	[I915_EXEC_BSD]         = VCS,
+	[I915_EXEC_VEBOX]       = VECS
+};
+
 #define SAMPLE_OA_REPORT      (1<<0)
 #define SAMPLE_OA_SOURCE      (1<<1)
+#define SAMPLE_CTX_ID	      (1<<2)
 
 /**
  * struct perf_open_properties - for validated properties given to open a stream
@@ -340,6 +358,9 @@
  * @oa_format: An OA unit HW report format
  * @oa_periodic: Whether to enable periodic OA unit sampling
  * @oa_period_exponent: The OA unit sampling period is derived from this
+ * @cs_mode: Whether the stream is configured to enable collection of metrics
+ * associated with command stream of a particular GPU engine
+ * @engine: The GPU engine associated with the stream in case cs_mode is enabled
  *
  * As read_properties_unlocked() enumerates and validates the properties given
  * to open a stream of metrics the configuration is built up in the structure
@@ -356,6 +377,10 @@ struct perf_open_properties {
 	int oa_format;
 	bool oa_periodic;
 	int oa_period_exponent;
+
+	/* Command stream mode */
+	bool cs_mode;
+	enum intel_engine_id engine;
 };
 
 static u32 gen8_oa_hw_tail_read(struct drm_i915_private *dev_priv)
@@ -371,6 +396,266 @@ static u32 gen7_oa_hw_tail_read(struct drm_i915_private *dev_priv)
 }
 
 /**
+ * i915_perf_emit_sample_capture - Insert the commands to capture metrics into
+ * the command stream of a GPU engine.
+ * @request: request in whose context the metrics are being collected.
+ * @preallocate: allocate space in ring for related sample.
+ *
+ * The function provides a hook through which the commands to capture perf
+ * metrics, are inserted into the command stream of a GPU engine.
+ */
+void i915_perf_emit_sample_capture(struct drm_i915_gem_request *request,
+				   bool preallocate)
+{
+	struct intel_engine_cs *engine = request->engine;
+	struct drm_i915_private *dev_priv = engine->i915;
+	struct i915_perf_stream *stream;
+	int idx;
+
+	if (!dev_priv->perf.initialized)
+		return;
+
+	idx = srcu_read_lock(&engine->perf_srcu);
+	stream = srcu_dereference(engine->exclusive_stream, &engine->perf_srcu);
+	if (stream && (stream->state == I915_PERF_STREAM_ENABLED) &&
+				stream->cs_mode)
+		stream->ops->emit_sample_capture(stream, request,
+						 preallocate);
+	srcu_read_unlock(&engine->perf_srcu, idx);
+}
+
+/**
+ * release_perf_samples - Release old perf samples to make space for new
+ * sample data.
+ * @stream: Stream from which space is to be freed up.
+ * @target_size: Space required to be freed up.
+ *
+ * We also dereference the associated request before deleting the sample.
+ * Also, no need to check whether the commands associated with old samples
+ * have been completed. This is because these sample entries are anyways going
+ * to be replaced by a new sample, and gpu will eventually overwrite the buffer
+ * contents, when the request associated with new sample completes.
+ */
+static void release_perf_samples(struct i915_perf_stream *stream,
+				 u32 target_size)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+	struct i915_perf_cs_sample *sample, *next;
+	u32 sample_size = dev_priv->perf.oa.oa_buffer.format_size;
+	u32 size = 0;
+
+	list_for_each_entry_safe
+		(sample, next, &stream->cs_samples, link) {
+		size += sample_size;
+		i915_gem_request_put(sample->request);
+		list_del(&sample->link);
+		kfree(sample);
+
+		if (size >= target_size)
+			break;
+	}
+}
+
+/**
+ * insert_perf_sample - Insert a perf sample entry to the sample list.
+ * @stream: Stream into which sample is to be inserted.
+ * @sample: perf CS sample to be inserted into the list
+ *
+ * This function never fails, since it always manages to insert the sample.
+ * If the space is exhausted in the buffer, it will remove the older
+ * entries in order to make space.
+ */
+static void insert_perf_sample(struct i915_perf_stream *stream,
+				struct i915_perf_cs_sample *sample)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+	struct i915_perf_cs_sample *first, *last;
+	int max_offset = stream->cs_buffer.vma->obj->base.size;
+	u32 sample_size = dev_priv->perf.oa.oa_buffer.format_size;
+	unsigned long flags;
+
+	spin_lock_irqsave(&stream->cs_samples_lock, flags);
+	if (list_empty(&stream->cs_samples)) {
+		sample->offset = 0;
+		list_add_tail(&sample->link, &stream->cs_samples);
+		spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
+		return;
+	}
+
+	first = list_first_entry(&stream->cs_samples, typeof(*first),
+				link);
+	last = list_last_entry(&stream->cs_samples, typeof(*last),
+				link);
+
+	if (last->offset >= first->offset) {
+		/* Sufficient space available at the end of buffer? */
+		if (last->offset + 2*sample_size < max_offset)
+			sample->offset = last->offset + sample_size;
+		/*
+		 * Wraparound condition. Is sufficient space available at
+		 * beginning of buffer?
+		 */
+		else if (sample_size < first->offset)
+			sample->offset = 0;
+		/* Insufficient space. Overwrite existing old entries */
+		else {
+			u32 target_size = sample_size - first->offset;
+
+			release_perf_samples(stream, target_size);
+			sample->offset = 0;
+		}
+	} else {
+		/* Sufficient space available? */
+		if (last->offset + 2*sample_size < first->offset)
+			sample->offset = last->offset + sample_size;
+		/* Insufficient space. Overwrite existing old entries */
+		else {
+			u32 target_size = sample_size -
+				(first->offset - last->offset -
+				sample_size);
+
+			release_perf_samples(stream, target_size);
+			sample->offset = last->offset + sample_size;
+		}
+	}
+	list_add_tail(&sample->link, &stream->cs_samples);
+	spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
+}
+
+/**
+ * i915_emit_oa_report_capture - Insert the commands to capture OA
+ * reports metrics into the render command stream
+ * @request: request in whose context the metrics are being collected.
+ * @preallocate: allocate space in ring for related sample.
+ * @offset: command stream buffer offset where the OA metrics need to be
+ * collected
+ */
+static int i915_emit_oa_report_capture(
+				struct drm_i915_gem_request *request,
+				bool preallocate,
+				u32 offset)
+{
+	struct drm_i915_private *dev_priv = request->i915;
+	struct intel_engine_cs *engine = request->engine;
+	struct i915_perf_stream *stream;
+	u32 addr = 0;
+	u32 cmd, len = 4, *cs;
+	int idx;
+
+	idx = srcu_read_lock(&engine->perf_srcu);
+	stream = srcu_dereference(engine->exclusive_stream, &engine->perf_srcu);
+	addr = stream->cs_buffer.vma->node.start + offset;
+	srcu_read_unlock(&engine->perf_srcu, idx);
+
+	if (WARN_ON(addr & 0x3f)) {
+		DRM_ERROR("OA buffer address not aligned to 64 byte\n");
+		return -EINVAL;
+	}
+
+	if (preallocate)
+		request->reserved_space += len;
+	else
+		request->reserved_space -= len;
+
+	cs = intel_ring_begin(request, 4);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	cmd = MI_REPORT_PERF_COUNT | (1<<0);
+	if (INTEL_GEN(dev_priv) >= 8)
+		cmd |= (2<<0);
+
+	*cs++ = cmd;
+	*cs++ = addr | MI_REPORT_PERF_COUNT_GGTT;
+	*cs++ = request->fence.seqno;
+
+	if (INTEL_GEN(dev_priv) >= 8)
+		*cs++ = 0;
+	else
+		*cs++ = MI_NOOP;
+
+	intel_ring_advance(request, cs);
+
+	return 0;
+}
+
+/**
+ * i915_perf_stream_emit_sample_capture - Insert the commands to capture perf
+ * metrics into the GPU command stream
+ * @stream: An i915-perf stream opened for GPU metrics
+ * @request: request in whose context the metrics are being collected.
+ * @preallocate: allocate space in ring for related sample.
+ */
+static void i915_perf_stream_emit_sample_capture(
+					struct i915_perf_stream *stream,
+					struct drm_i915_gem_request *request,
+					bool preallocate)
+{
+	struct reservation_object *resv = stream->cs_buffer.vma->resv;
+	struct i915_perf_cs_sample *sample;
+	unsigned long flags;
+	int ret;
+
+	sample = kzalloc(sizeof(*sample), GFP_KERNEL);
+	if (sample == NULL) {
+		DRM_ERROR("Perf sample alloc failed\n");
+		return;
+	}
+
+	sample->request = i915_gem_request_get(request);
+	sample->ctx_id = request->ctx->hw_id;
+
+	insert_perf_sample(stream, sample);
+
+	if (stream->sample_flags & SAMPLE_OA_REPORT) {
+		ret = i915_emit_oa_report_capture(request,
+						  preallocate,
+						  sample->offset);
+		if (ret)
+			goto err_unref;
+	}
+
+	reservation_object_lock(resv, NULL);
+	if (reservation_object_reserve_shared(resv) == 0)
+		reservation_object_add_shared_fence(resv, &request->fence);
+	reservation_object_unlock(resv);
+
+	i915_vma_move_to_active(stream->cs_buffer.vma, request,
+					EXEC_OBJECT_WRITE);
+	return;
+
+err_unref:
+	i915_gem_request_put(sample->request);
+	spin_lock_irqsave(&stream->cs_samples_lock, flags);
+	list_del(&sample->link);
+	spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
+	kfree(sample);
+}
+
+/**
+ * i915_perf_stream_release_samples - Release the perf command stream samples
+ * @stream: Stream from which sample are to be released.
+ *
+ * Note: The associated requests should be completed before releasing the
+ * references here.
+ */
+static void i915_perf_stream_release_samples(struct i915_perf_stream *stream)
+{
+	struct i915_perf_cs_sample *entry, *next;
+	unsigned long flags;
+
+	list_for_each_entry_safe
+		(entry, next, &stream->cs_samples, link) {
+		i915_gem_request_put(entry->request);
+
+		spin_lock_irqsave(&stream->cs_samples_lock, flags);
+		list_del(&entry->link);
+		spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
+		kfree(entry);
+	}
+}
+
+/**
  * oa_buffer_check_unlocked - check for data and update tail ptr state
  * @dev_priv: i915 device instance
  *
@@ -521,12 +806,13 @@ static int append_oa_status(struct i915_perf_stream *stream,
 }
 
 /**
- * append_oa_sample - Copies single OA report into userspace read() buffer.
- * @stream: An i915-perf stream opened for OA metrics
+ * append_perf_sample - Copies single perf sample into userspace read() buffer.
+ * @stream: An i915-perf stream opened for perf samples
  * @buf: destination buffer given by userspace
  * @count: the number of bytes userspace wants to read
  * @offset: (inout): the current position for writing into @buf
- * @report: A single OA report to (optionally) include as part of the sample
+ * @data: perf sample data which contains (optionally) metrics configured
+ * earlier when opening a stream
  *
  * The contents of a sample are configured through `DRM_I915_PERF_PROP_SAMPLE_*`
  * properties when opening a stream, tracked as `stream->sample_flags`. This
@@ -537,11 +823,11 @@ static int append_oa_status(struct i915_perf_stream *stream,
  *
  * Returns: 0 on success, negative error code on failure.
  */
-static int append_oa_sample(struct i915_perf_stream *stream,
+static int append_perf_sample(struct i915_perf_stream *stream,
 			    char __user *buf,
 			    size_t count,
 			    size_t *offset,
-			    const u8 *report)
+			    const struct i915_perf_sample_data *data)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
@@ -569,16 +855,21 @@ static int append_oa_sample(struct i915_perf_stream *stream,
 	 * transition. These are considered as source 'OABUFFER'.
 	 */
 	if (sample_flags & SAMPLE_OA_SOURCE) {
-		u64 source = I915_PERF_SAMPLE_OA_SOURCE_OABUFFER;
+		if (copy_to_user(buf, &data->source, 8))
+			return -EFAULT;
+		buf += 8;
+	}
 
-		if (copy_to_user(buf, &source, 8))
+	if (sample_flags & SAMPLE_CTX_ID) {
+		if (copy_to_user(buf, &data->ctx_id, 8))
 			return -EFAULT;
 		buf += 8;
 	}
 
 	if (sample_flags & SAMPLE_OA_REPORT) {
-		if (copy_to_user(buf, report, report_size))
+		if (copy_to_user(buf, data->report, report_size))
 			return -EFAULT;
+		buf += report_size;
 	}
 
 	(*offset) += header.size;
@@ -587,11 +878,54 @@ static int append_oa_sample(struct i915_perf_stream *stream,
 }
 
 /**
+ * append_oa_buffer_sample - Copies single periodic OA report into userspace
+ * read() buffer.
+ * @stream: An i915-perf stream opened for OA metrics
+ * @buf: destination buffer given by userspace
+ * @count: the number of bytes userspace wants to read
+ * @offset: (inout): the current position for writing into @buf
+ * @report: A single OA report to (optionally) include as part of the sample
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+static int append_oa_buffer_sample(struct i915_perf_stream *stream,
+				char __user *buf, size_t count,
+				size_t *offset,	const u8 *report)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+	u32 sample_flags = stream->sample_flags;
+	struct i915_perf_sample_data data = { 0 };
+	u32 *report32 = (u32 *)report;
+
+	if (sample_flags & SAMPLE_OA_SOURCE)
+		data.source = I915_PERF_SAMPLE_OA_SOURCE_OABUFFER;
+
+	if (sample_flags & SAMPLE_CTX_ID) {
+		if (INTEL_INFO(dev_priv)->gen < 8)
+			data.ctx_id = 0;
+		else {
+			/*
+			 * XXX: Just keep the lower 21 bits for now since I'm
+			 * not entirely sure if the HW touches any of the higher
+			 * bits in this field
+			 */
+			data.ctx_id = report32[2] & 0x1fffff;
+		}
+	}
+
+	if (sample_flags & SAMPLE_OA_REPORT)
+		data.report = report;
+
+	return append_perf_sample(stream, buf, count, offset, &data);
+}
+
+/**
  * Copies all buffered OA reports into userspace read() buffer.
  * @stream: An i915-perf stream opened for OA metrics
  * @buf: destination buffer given by userspace
  * @count: the number of bytes userspace wants to read
  * @offset: (inout): the current position for writing into @buf
+ * @ts: copy OA reports till this timestamp
  *
  * Notably any error condition resulting in a short read (-%ENOSPC or
  * -%EFAULT) will be returned even though one or more records may
@@ -609,7 +943,8 @@ static int append_oa_sample(struct i915_perf_stream *stream,
 static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 				  char __user *buf,
 				  size_t count,
-				  size_t *offset)
+				  size_t *offset,
+				  u32 ts)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
@@ -623,7 +958,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 	u32 taken;
 	int ret = 0;
 
-	if (WARN_ON(!stream->enabled))
+	if (WARN_ON(stream->state != I915_PERF_STREAM_ENABLED))
 		return -EIO;
 
 	spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags);
@@ -669,6 +1004,11 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 		u32 *report32 = (void *)report;
 		u32 ctx_id;
 		u32 reason;
+		u32 report_ts = report32[1];
+
+		/* Report timestamp should not exceed the given ts */
+		if (report_ts > ts)
+			break;
 
 		/*
 		 * All the report sizes factor neatly into the buffer
@@ -750,23 +1090,23 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 		 * switches since it's not-uncommon for periodic samples to
 		 * identify a switch before any 'context switch' report.
 		 */
-		if (!dev_priv->perf.oa.exclusive_stream->ctx ||
-		    dev_priv->perf.oa.specific_ctx_id == ctx_id ||
+		if (!stream->ctx ||
+		    stream->engine->specific_ctx_id == ctx_id ||
 		    (dev_priv->perf.oa.oa_buffer.last_ctx_id ==
-		     dev_priv->perf.oa.specific_ctx_id) ||
+		     stream->engine->specific_ctx_id) ||
 		    reason & OAREPORT_REASON_CTX_SWITCH) {
 
 			/*
 			 * While filtering for a single context we avoid
 			 * leaking the IDs of other contexts.
 			 */
-			if (dev_priv->perf.oa.exclusive_stream->ctx &&
-			    dev_priv->perf.oa.specific_ctx_id != ctx_id) {
+			if (stream->ctx &&
+			    stream->engine->specific_ctx_id != ctx_id) {
 				report32[2] = INVALID_CTX_ID;
 			}
 
-			ret = append_oa_sample(stream, buf, count, offset,
-					       report);
+			ret = append_oa_buffer_sample(stream, buf, count,
+						      offset, report);
 			if (ret)
 				break;
 
@@ -807,6 +1147,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
  * @buf: destination buffer given by userspace
  * @count: the number of bytes userspace wants to read
  * @offset: (inout): the current position for writing into @buf
+ * @ts: copy OA reports till this timestamp
  *
  * Checks OA unit status registers and if necessary appends corresponding
  * status records for userspace (such as for a buffer full condition) and then
@@ -824,7 +1165,8 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 static int gen8_oa_read(struct i915_perf_stream *stream,
 			char __user *buf,
 			size_t count,
-			size_t *offset)
+			size_t *offset,
+			u32 ts)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	u32 oastatus;
@@ -877,7 +1219,7 @@ static int gen8_oa_read(struct i915_perf_stream *stream,
 			   oastatus & ~GEN8_OASTATUS_REPORT_LOST);
 	}
 
-	return gen8_append_oa_reports(stream, buf, count, offset);
+	return gen8_append_oa_reports(stream, buf, count, offset, ts);
 }
 
 /**
@@ -886,6 +1228,7 @@ static int gen8_oa_read(struct i915_perf_stream *stream,
  * @buf: destination buffer given by userspace
  * @count: the number of bytes userspace wants to read
  * @offset: (inout): the current position for writing into @buf
+ * @ts: copy OA reports till this timestamp
  *
  * Notably any error condition resulting in a short read (-%ENOSPC or
  * -%EFAULT) will be returned even though one or more records may
@@ -903,7 +1246,8 @@ static int gen8_oa_read(struct i915_perf_stream *stream,
 static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 				  char __user *buf,
 				  size_t count,
-				  size_t *offset)
+				  size_t *offset,
+				  u32 ts)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
@@ -917,7 +1261,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 	u32 taken;
 	int ret = 0;
 
-	if (WARN_ON(!stream->enabled))
+	if (WARN_ON(stream->state != I915_PERF_STREAM_ENABLED))
 		return -EIO;
 
 	spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags);
@@ -984,7 +1328,12 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 			continue;
 		}
 
-		ret = append_oa_sample(stream, buf, count, offset, report);
+		/* Report timestamp should not exceed the given ts */
+		if (report32[1] > ts)
+			break;
+
+		ret = append_oa_buffer_sample(stream, buf, count, offset,
+					      report);
 		if (ret)
 			break;
 
@@ -1022,6 +1371,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
  * @buf: destination buffer given by userspace
  * @count: the number of bytes userspace wants to read
  * @offset: (inout): the current position for writing into @buf
+ * @ts: copy OA reports till this timestamp
  *
  * Checks Gen 7 specific OA unit status registers and if necessary appends
  * corresponding status records for userspace (such as for a buffer full
@@ -1035,7 +1385,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 static int gen7_oa_read(struct i915_perf_stream *stream,
 			char __user *buf,
 			size_t count,
-			size_t *offset)
+			size_t *offset,
+			u32 ts)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	u32 oastatus1;
@@ -1097,16 +1448,172 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
 			GEN7_OASTATUS1_REPORT_LOST;
 	}
 
-	return gen7_append_oa_reports(stream, buf, count, offset);
+	return gen7_append_oa_reports(stream, buf, count, offset, ts);
+}
+
+/**
+ * append_cs_buffer_sample - Copies single perf sample data associated with
+ * GPU command stream, into userspace read() buffer.
+ * @stream: An i915-perf stream opened for perf CS metrics
+ * @buf: destination buffer given by userspace
+ * @count: the number of bytes userspace wants to read
+ * @offset: (inout): the current position for writing into @buf
+ * @node: Sample data associated with perf metrics
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+static int append_cs_buffer_sample(struct i915_perf_stream *stream,
+				char __user *buf,
+				size_t count,
+				size_t *offset,
+				struct i915_perf_cs_sample *node)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+	struct i915_perf_sample_data data = { 0 };
+	u32 sample_flags = stream->sample_flags;
+	int ret = 0;
+
+	if (sample_flags & SAMPLE_OA_REPORT) {
+		const u8 *report = stream->cs_buffer.vaddr + node->offset;
+		u32 sample_ts = *(u32 *)(report + 4);
+
+		data.report = report;
+
+		/* First, append the periodic OA samples having lower
+		 * timestamp values
+		 */
+		ret = dev_priv->perf.oa.ops.read(stream, buf, count, offset,
+						 sample_ts);
+		if (ret)
+			return ret;
+	}
+
+	if (sample_flags & SAMPLE_OA_SOURCE)
+		data.source = I915_PERF_SAMPLE_OA_SOURCE_CS;
+
+	if (sample_flags & SAMPLE_CTX_ID)
+		data.ctx_id = node->ctx_id;
+
+	return append_perf_sample(stream, buf, count, offset, &data);
 }
 
 /**
- * i915_oa_wait_unlocked - handles blocking IO until OA data available
+ * append_cs_buffer_samples: Copies all command stream based perf samples
+ * into userspace read() buffer.
+ * @stream: An i915-perf stream opened for perf CS metrics
+ * @buf: destination buffer given by userspace
+ * @count: the number of bytes userspace wants to read
+ * @offset: (inout): the current position for writing into @buf
+ *
+ * Notably any error condition resulting in a short read (-%ENOSPC or
+ * -%EFAULT) will be returned even though one or more records may
+ * have been successfully copied. In this case it's up to the caller
+ * to decide if the error should be squashed before returning to
+ * userspace.
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+static int append_cs_buffer_samples(struct i915_perf_stream *stream,
+				char __user *buf,
+				size_t count,
+				size_t *offset)
+{
+	struct i915_perf_cs_sample *entry, *next;
+	LIST_HEAD(free_list);
+	int ret = 0;
+	unsigned long flags;
+
+	spin_lock_irqsave(&stream->cs_samples_lock, flags);
+	if (list_empty(&stream->cs_samples)) {
+		spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
+		return 0;
+	}
+	list_for_each_entry_safe(entry, next,
+				 &stream->cs_samples, link) {
+		if (!i915_gem_request_completed(entry->request))
+			break;
+		list_move_tail(&entry->link, &free_list);
+	}
+	spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
+
+	if (list_empty(&free_list))
+		return 0;
+
+	list_for_each_entry_safe(entry, next, &free_list, link) {
+		ret = append_cs_buffer_sample(stream, buf, count, offset,
+					      entry);
+		if (ret)
+			break;
+
+		list_del(&entry->link);
+		i915_gem_request_put(entry->request);
+		kfree(entry);
+	}
+
+	/* Don't discard remaining entries, keep them for next read */
+	spin_lock_irqsave(&stream->cs_samples_lock, flags);
+	list_splice(&free_list, &stream->cs_samples);
+	spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
+
+	return ret;
+}
+
+/*
+ * cs_buffer_is_empty - Checks whether the command stream buffer
+ * associated with the stream has data available.
  * @stream: An i915-perf stream opened for OA metrics
  *
+ * Returns: true if atleast one request associated with command stream is
+ * completed, else returns false.
+ */
+static bool cs_buffer_is_empty(struct i915_perf_stream *stream)
+
+{
+	struct i915_perf_cs_sample *entry = NULL;
+	struct drm_i915_gem_request *request = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&stream->cs_samples_lock, flags);
+	entry = list_first_entry_or_null(&stream->cs_samples,
+			struct i915_perf_cs_sample, link);
+	if (entry)
+		request = entry->request;
+	spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
+
+	if (!entry)
+		return true;
+	else if (!i915_gem_request_completed(request))
+		return true;
+	else
+		return false;
+}
+
+/**
+ * stream_have_data_unlocked - Checks whether the stream has data available
+ * @stream: An i915-perf stream opened for OA metrics
+ *
+ * For command stream based streams, check if the command stream buffer has
+ * atleast one sample available, if not return false, irrespective of periodic
+ * oa buffer having the data or not.
+ */
+
+static bool stream_have_data_unlocked(struct i915_perf_stream *stream)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+
+	if (stream->cs_mode)
+		return !cs_buffer_is_empty(stream);
+	else
+		return oa_buffer_check_unlocked(dev_priv);
+}
+
+/**
+ * i915_perf_stream_wait_unlocked - handles blocking IO until data available
+ * @stream: An i915-perf stream opened for GPU metrics
+ *
  * Called when userspace tries to read() from a blocking stream FD opened
- * for OA metrics. It waits until the hrtimer callback finds a non-empty
- * OA buffer and wakes us.
+ * for perf metrics. It waits until the hrtimer callback finds a non-empty
+ * command stream buffer / OA buffer and wakes us.
  *
  * Note: it's acceptable to have this return with some false positives
  * since any subsequent read handling will return -EAGAIN if there isn't
@@ -1114,7 +1621,7 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
  *
  * Returns: zero on success or a negative error code
  */
-static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
+static int i915_perf_stream_wait_unlocked(struct i915_perf_stream *stream)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 
@@ -1122,32 +1629,47 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
 	if (!dev_priv->perf.oa.periodic)
 		return -EIO;
 
-	return wait_event_interruptible(dev_priv->perf.oa.poll_wq,
-					oa_buffer_check_unlocked(dev_priv));
+	if (stream->cs_mode) {
+		long int ret;
+
+		/* Wait for the all sampled requests. */
+		ret = reservation_object_wait_timeout_rcu(
+						    stream->cs_buffer.vma->resv,
+						    true,
+						    true,
+						    MAX_SCHEDULE_TIMEOUT);
+		if (unlikely(ret < 0)) {
+			DRM_DEBUG_DRIVER("Failed to wait for sampled requests: %li\n", ret);
+			return ret;
+		}
+	}
+
+	return wait_event_interruptible(stream->poll_wq,
+					stream_have_data_unlocked(stream));
 }
 
 /**
- * i915_oa_poll_wait - call poll_wait() for an OA stream poll()
- * @stream: An i915-perf stream opened for OA metrics
+ * i915_perf_stream_poll_wait - call poll_wait() for an stream poll()
+ * @stream: An i915-perf stream opened for GPU metrics
  * @file: An i915 perf stream file
  * @wait: poll() state table
  *
- * For handling userspace polling on an i915 perf stream opened for OA metrics,
+ * For handling userspace polling on an i915 perf stream opened for metrics,
  * this starts a poll_wait with the wait queue that our hrtimer callback wakes
- * when it sees data ready to read in the circular OA buffer.
+ * when it sees data ready to read either in command stream buffer or in the
+ * circular OA buffer.
  */
-static void i915_oa_poll_wait(struct i915_perf_stream *stream,
+static void i915_perf_stream_poll_wait(struct i915_perf_stream *stream,
 			      struct file *file,
 			      poll_table *wait)
 {
-	struct drm_i915_private *dev_priv = stream->dev_priv;
-
-	poll_wait(file, &dev_priv->perf.oa.poll_wq, wait);
+	poll_wait(file, &stream->poll_wq, wait);
 }
 
 /**
- * i915_oa_read - just calls through to &i915_oa_ops->read
- * @stream: An i915-perf stream opened for OA metrics
+ * i915_perf_stream_read - Reads perf metrics available into userspace read
+ * buffer
+ * @stream: An i915-perf stream opened for GPU metrics
  * @buf: destination buffer given by userspace
  * @count: the number of bytes userspace wants to read
  * @offset: (inout): the current position for writing into @buf
@@ -1157,14 +1679,21 @@ static void i915_oa_poll_wait(struct i915_perf_stream *stream,
  *
  * Returns: zero on success or a negative error code
  */
-static int i915_oa_read(struct i915_perf_stream *stream,
+static int i915_perf_stream_read(struct i915_perf_stream *stream,
 			char __user *buf,
 			size_t count,
 			size_t *offset)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 
-	return dev_priv->perf.oa.ops.read(stream, buf, count, offset);
+
+	if (stream->cs_mode)
+		return append_cs_buffer_samples(stream, buf, count, offset);
+	else if (stream->sample_flags & SAMPLE_OA_REPORT)
+		return dev_priv->perf.oa.ops.read(stream, buf, count, offset,
+						U32_MAX);
+	else
+		return -EINVAL;
 }
 
 /**
@@ -1182,7 +1711,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 
 	if (i915.enable_execlists)
-		dev_priv->perf.oa.specific_ctx_id = stream->ctx->hw_id;
+		stream->engine->specific_ctx_id = stream->ctx->hw_id;
 	else {
 		struct intel_engine_cs *engine = dev_priv->engine[RCS];
 		struct intel_ring *ring;
@@ -1209,7 +1738,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
 		 * i915_ggtt_offset() on the fly) considering the difference
 		 * with gen8+ and execlists
 		 */
-		dev_priv->perf.oa.specific_ctx_id =
+		stream->engine->specific_ctx_id =
 			i915_ggtt_offset(stream->ctx->engine[engine->id].state);
 	}
 
@@ -1228,13 +1757,13 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 
 	if (i915.enable_execlists) {
-		dev_priv->perf.oa.specific_ctx_id = INVALID_CTX_ID;
+		stream->engine->specific_ctx_id = INVALID_CTX_ID;
 	} else {
 		struct intel_engine_cs *engine = dev_priv->engine[RCS];
 
 		mutex_lock(&dev_priv->drm.struct_mutex);
 
-		dev_priv->perf.oa.specific_ctx_id = INVALID_CTX_ID;
+		stream->engine->specific_ctx_id = INVALID_CTX_ID;
 		engine->context_unpin(engine, stream->ctx);
 
 		mutex_unlock(&dev_priv->drm.struct_mutex);
@@ -1242,13 +1771,28 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)
 }
 
 static void
+free_cs_buffer(struct i915_perf_stream *stream)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+
+	mutex_lock(&dev_priv->drm.struct_mutex);
+
+	i915_gem_object_unpin_map(stream->cs_buffer.vma->obj);
+	i915_vma_unpin_and_release(&stream->cs_buffer.vma);
+
+	stream->cs_buffer.vma = NULL;
+	stream->cs_buffer.vaddr = NULL;
+
+	mutex_unlock(&dev_priv->drm.struct_mutex);
+}
+
+static void
 free_oa_buffer(struct drm_i915_private *i915)
 {
 	mutex_lock(&i915->drm.struct_mutex);
 
 	i915_gem_object_unpin_map(i915->perf.oa.oa_buffer.vma->obj);
-	i915_vma_unpin(i915->perf.oa.oa_buffer.vma);
-	i915_gem_object_put(i915->perf.oa.oa_buffer.vma->obj);
+	i915_vma_unpin_and_release(&i915->perf.oa.oa_buffer.vma);
 
 	i915->perf.oa.oa_buffer.vma = NULL;
 	i915->perf.oa.oa_buffer.vaddr = NULL;
@@ -1256,27 +1800,41 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)
 	mutex_unlock(&i915->drm.struct_mutex);
 }
 
-static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
+static void i915_perf_stream_destroy(struct i915_perf_stream *stream)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
-
-	BUG_ON(stream != dev_priv->perf.oa.exclusive_stream);
+	struct intel_engine_cs *engine = stream->engine;
+	struct i915_perf_stream *engine_stream;
+	int idx;
+
+	idx = srcu_read_lock(&engine->perf_srcu);
+	engine_stream = srcu_dereference(engine->exclusive_stream,
+					 &engine->perf_srcu);
+	if (WARN_ON(stream != engine_stream))
+		return;
+	srcu_read_unlock(&engine->perf_srcu, idx);
 
 	/*
 	 * Unset exclusive_stream first, it might be checked while
 	 * disabling the metric set on gen8+.
 	 */
-	dev_priv->perf.oa.exclusive_stream = NULL;
+	rcu_assign_pointer(stream->engine->exclusive_stream, NULL);
+	synchronize_srcu(&stream->engine->perf_srcu);
 
-	dev_priv->perf.oa.ops.disable_metric_set(dev_priv);
+	if (stream->using_oa) {
+		dev_priv->perf.oa.ops.disable_metric_set(dev_priv);
 
-	free_oa_buffer(dev_priv);
+		free_oa_buffer(dev_priv);
 
-	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
-	intel_runtime_pm_put(dev_priv);
+		intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+		intel_runtime_pm_put(dev_priv);
 
-	if (stream->ctx)
-		oa_put_render_ctx_id(stream);
+		if (stream->ctx)
+			oa_put_render_ctx_id(stream);
+	}
+
+	if (stream->cs_mode)
+		free_cs_buffer(stream);
 
 	if (dev_priv->perf.oa.spurious_report_rs.missed) {
 		DRM_NOTE("%d spurious OA report notices suppressed due to ratelimiting\n",
@@ -1325,11 +1883,6 @@ static void gen7_init_oa_buffer(struct drm_i915_private *dev_priv)
 	 * memory...
 	 */
 	memset(dev_priv->perf.oa.oa_buffer.vaddr, 0, OA_BUFFER_SIZE);
-
-	/* Maybe make ->pollin per-stream state if we support multiple
-	 * concurrent streams in the future.
-	 */
-	dev_priv->perf.oa.pollin = false;
 }
 
 static void gen8_init_oa_buffer(struct drm_i915_private *dev_priv)
@@ -1383,33 +1936,26 @@ static void gen8_init_oa_buffer(struct drm_i915_private *dev_priv)
 	 * memory...
 	 */
 	memset(dev_priv->perf.oa.oa_buffer.vaddr, 0, OA_BUFFER_SIZE);
-
-	/*
-	 * Maybe make ->pollin per-stream state if we support multiple
-	 * concurrent streams in the future.
-	 */
-	dev_priv->perf.oa.pollin = false;
 }
 
-static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
+static int alloc_obj(struct drm_i915_private *dev_priv,
+		     struct i915_vma **vma, u8 **vaddr)
 {
 	struct drm_i915_gem_object *bo;
-	struct i915_vma *vma;
 	int ret;
 
-	if (WARN_ON(dev_priv->perf.oa.oa_buffer.vma))
-		return -ENODEV;
+	intel_runtime_pm_get(dev_priv);
 
 	ret = i915_mutex_lock_interruptible(&dev_priv->drm);
 	if (ret)
-		return ret;
+		goto out;
 
 	BUILD_BUG_ON_NOT_POWER_OF_2(OA_BUFFER_SIZE);
 	BUILD_BUG_ON(OA_BUFFER_SIZE < SZ_128K || OA_BUFFER_SIZE > SZ_16M);
 
 	bo = i915_gem_object_create(dev_priv, OA_BUFFER_SIZE);
 	if (IS_ERR(bo)) {
-		DRM_ERROR("Failed to allocate OA buffer\n");
+		DRM_ERROR("Failed to allocate i915 perf obj\n");
 		ret = PTR_ERR(bo);
 		goto unlock;
 	}
@@ -1419,42 +1965,83 @@ static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
 		goto err_unref;
 
 	/* PreHSW required 512K alignment, HSW requires 16M */
-	vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);
-	if (IS_ERR(vma)) {
-		ret = PTR_ERR(vma);
+	*vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);
+	if (IS_ERR(*vma)) {
+		ret = PTR_ERR(*vma);
 		goto err_unref;
 	}
-	dev_priv->perf.oa.oa_buffer.vma = vma;
 
-	dev_priv->perf.oa.oa_buffer.vaddr =
-		i915_gem_object_pin_map(bo, I915_MAP_WB);
-	if (IS_ERR(dev_priv->perf.oa.oa_buffer.vaddr)) {
-		ret = PTR_ERR(dev_priv->perf.oa.oa_buffer.vaddr);
+	*vaddr = i915_gem_object_pin_map(bo, I915_MAP_WB);
+	if (IS_ERR(*vaddr)) {
+		ret = PTR_ERR(*vaddr);
 		goto err_unpin;
 	}
 
-	dev_priv->perf.oa.ops.init_oa_buffer(dev_priv);
-
-	DRM_DEBUG_DRIVER("OA Buffer initialized, gtt offset = 0x%x, vaddr = %p\n",
-			 i915_ggtt_offset(dev_priv->perf.oa.oa_buffer.vma),
-			 dev_priv->perf.oa.oa_buffer.vaddr);
-
 	goto unlock;
 
 err_unpin:
-	__i915_vma_unpin(vma);
+	i915_vma_unpin(*vma);
 
 err_unref:
 	i915_gem_object_put(bo);
 
-	dev_priv->perf.oa.oa_buffer.vaddr = NULL;
-	dev_priv->perf.oa.oa_buffer.vma = NULL;
-
 unlock:
 	mutex_unlock(&dev_priv->drm.struct_mutex);
+out:
+	intel_runtime_pm_put(dev_priv);
 	return ret;
 }
 
+static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
+{
+	struct i915_vma *vma;
+	u8 *vaddr;
+	int ret;
+
+	if (WARN_ON(dev_priv->perf.oa.oa_buffer.vma))
+		return -ENODEV;
+
+	ret = alloc_obj(dev_priv, &vma, &vaddr);
+	if (ret)
+		return ret;
+
+	dev_priv->perf.oa.oa_buffer.vma = vma;
+	dev_priv->perf.oa.oa_buffer.vaddr = vaddr;
+
+	dev_priv->perf.oa.ops.init_oa_buffer(dev_priv);
+
+	DRM_DEBUG_DRIVER("OA Buffer initialized, gtt offset = 0x%x, vaddr = %p",
+			 i915_ggtt_offset(dev_priv->perf.oa.oa_buffer.vma),
+			 dev_priv->perf.oa.oa_buffer.vaddr);
+	return 0;
+}
+
+static int alloc_cs_buffer(struct i915_perf_stream *stream)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+	struct i915_vma *vma;
+	u8 *vaddr;
+	int ret;
+
+	if (WARN_ON(stream->cs_buffer.vma))
+		return -ENODEV;
+
+	ret = alloc_obj(dev_priv, &vma, &vaddr);
+	if (ret)
+		return ret;
+
+	stream->cs_buffer.vma = vma;
+	stream->cs_buffer.vaddr = vaddr;
+	if (WARN_ON(!list_empty(&stream->cs_samples)))
+		INIT_LIST_HEAD(&stream->cs_samples);
+
+	DRM_DEBUG_DRIVER("Command stream buf initialized, gtt offset = 0x%x, vaddr = %p",
+			 i915_ggtt_offset(stream->cs_buffer.vma),
+			 stream->cs_buffer.vaddr);
+
+	return 0;
+}
+
 static void config_oa_regs(struct drm_i915_private *dev_priv,
 			   const struct i915_oa_reg *regs,
 			   int n_regs)
@@ -1859,6 +2446,10 @@ static void gen8_disable_metric_set(struct drm_i915_private *dev_priv)
 
 static void gen7_oa_enable(struct drm_i915_private *dev_priv)
 {
+	struct i915_perf_stream *stream;
+	struct intel_engine_cs *engine = dev_priv->engine[RCS];
+	int idx;
+
 	/*
 	 * Reset buf pointers so we don't forward reports from before now.
 	 *
@@ -1870,11 +2461,11 @@ static void gen7_oa_enable(struct drm_i915_private *dev_priv)
 	 */
 	gen7_init_oa_buffer(dev_priv);
 
-	if (dev_priv->perf.oa.exclusive_stream->enabled) {
-		struct i915_gem_context *ctx =
-			dev_priv->perf.oa.exclusive_stream->ctx;
-		u32 ctx_id = dev_priv->perf.oa.specific_ctx_id;
-
+	idx = srcu_read_lock(&engine->perf_srcu);
+	stream = srcu_dereference(engine->exclusive_stream, &engine->perf_srcu);
+	if (stream->state != I915_PERF_STREAM_DISABLED) {
+		struct i915_gem_context *ctx = stream->ctx;
+		u32 ctx_id = engine->specific_ctx_id;
 		bool periodic = dev_priv->perf.oa.periodic;
 		u32 period_exponent = dev_priv->perf.oa.period_exponent;
 		u32 report_format = dev_priv->perf.oa.oa_buffer.format;
@@ -1889,6 +2480,7 @@ static void gen7_oa_enable(struct drm_i915_private *dev_priv)
 			   GEN7_OACONTROL_ENABLE);
 	} else
 		I915_WRITE(GEN7_OACONTROL, 0);
+	srcu_read_unlock(&engine->perf_srcu, idx);
 }
 
 static void gen8_oa_enable(struct drm_i915_private *dev_priv)
@@ -1917,22 +2509,23 @@ static void gen8_oa_enable(struct drm_i915_private *dev_priv)
 }
 
 /**
- * i915_oa_stream_enable - handle `I915_PERF_IOCTL_ENABLE` for OA stream
- * @stream: An i915 perf stream opened for OA metrics
+ * i915_perf_stream_enable - handle `I915_PERF_IOCTL_ENABLE` for perf stream
+ * @stream: An i915 perf stream opened for GPU metrics
  *
  * [Re]enables hardware periodic sampling according to the period configured
  * when opening the stream. This also starts a hrtimer that will periodically
  * check for data in the circular OA buffer for notifying userspace (e.g.
  * during a read() or poll()).
  */
-static void i915_oa_stream_enable(struct i915_perf_stream *stream)
+static void i915_perf_stream_enable(struct i915_perf_stream *stream)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 
-	dev_priv->perf.oa.ops.oa_enable(dev_priv);
+	if (stream->sample_flags & SAMPLE_OA_REPORT)
+		dev_priv->perf.oa.ops.oa_enable(dev_priv);
 
-	if (dev_priv->perf.oa.periodic)
-		hrtimer_start(&dev_priv->perf.oa.poll_check_timer,
+	if (stream->cs_mode || dev_priv->perf.oa.periodic)
+		hrtimer_start(&dev_priv->perf.poll_check_timer,
 			      ns_to_ktime(POLL_PERIOD),
 			      HRTIMER_MODE_REL_PINNED);
 }
@@ -1948,34 +2541,39 @@ static void gen8_oa_disable(struct drm_i915_private *dev_priv)
 }
 
 /**
- * i915_oa_stream_disable - handle `I915_PERF_IOCTL_DISABLE` for OA stream
- * @stream: An i915 perf stream opened for OA metrics
+ * i915_perf_stream_disable - handle `I915_PERF_IOCTL_DISABLE` for perf stream
+ * @stream: An i915 perf stream opened for GPU metrics
  *
  * Stops the OA unit from periodically writing counter reports into the
  * circular OA buffer. This also stops the hrtimer that periodically checks for
  * data in the circular OA buffer, for notifying userspace.
  */
-static void i915_oa_stream_disable(struct i915_perf_stream *stream)
+static void i915_perf_stream_disable(struct i915_perf_stream *stream)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 
-	dev_priv->perf.oa.ops.oa_disable(dev_priv);
+	if (stream->cs_mode || dev_priv->perf.oa.periodic)
+		hrtimer_cancel(&dev_priv->perf.poll_check_timer);
+
+	if (stream->cs_mode)
+		i915_perf_stream_release_samples(stream);
 
-	if (dev_priv->perf.oa.periodic)
-		hrtimer_cancel(&dev_priv->perf.oa.poll_check_timer);
+	if (stream->sample_flags & SAMPLE_OA_REPORT)
+		dev_priv->perf.oa.ops.oa_disable(dev_priv);
 }
 
-static const struct i915_perf_stream_ops i915_oa_stream_ops = {
-	.destroy = i915_oa_stream_destroy,
-	.enable = i915_oa_stream_enable,
-	.disable = i915_oa_stream_disable,
-	.wait_unlocked = i915_oa_wait_unlocked,
-	.poll_wait = i915_oa_poll_wait,
-	.read = i915_oa_read,
+static const struct i915_perf_stream_ops perf_stream_ops = {
+	.destroy = i915_perf_stream_destroy,
+	.enable = i915_perf_stream_enable,
+	.disable = i915_perf_stream_disable,
+	.wait_unlocked = i915_perf_stream_wait_unlocked,
+	.poll_wait = i915_perf_stream_poll_wait,
+	.read = i915_perf_stream_read,
+	.emit_sample_capture = i915_perf_stream_emit_sample_capture,
 };
 
 /**
- * i915_oa_stream_init - validate combined props for OA stream and init
+ * i915_perf_stream_init - validate combined props for stream and init
  * @stream: An i915 perf stream
  * @param: The open parameters passed to `DRM_I915_PERF_OPEN`
  * @props: The property state that configures stream (individually validated)
@@ -1984,58 +2582,35 @@ static void i915_oa_stream_disable(struct i915_perf_stream *stream)
  * doesn't ensure that the combination necessarily makes sense.
  *
  * At this point it has been determined that userspace wants a stream of
- * OA metrics, but still we need to further validate the combined
+ * perf metrics, but still we need to further validate the combined
  * properties are OK.
  *
  * If the configuration makes sense then we can allocate memory for
- * a circular OA buffer and apply the requested metric set configuration.
+ * a circular perf buffer and apply the requested metric set configuration.
  *
  * Returns: zero on success or a negative error code.
  */
-static int i915_oa_stream_init(struct i915_perf_stream *stream,
+static int i915_perf_stream_init(struct i915_perf_stream *stream,
 			       struct drm_i915_perf_open_param *param,
 			       struct perf_open_properties *props)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
-	int format_size;
+	bool require_oa_unit = props->sample_flags & (SAMPLE_OA_REPORT |
+						      SAMPLE_OA_SOURCE);
+	bool cs_sample_data = props->sample_flags & SAMPLE_OA_REPORT;
+	struct i915_perf_stream *curr_stream;
+	struct intel_engine_cs *engine = NULL;
+	int idx;
 	int ret;
 
-	/* If the sysfs metrics/ directory wasn't registered for some
-	 * reason then don't let userspace try their luck with config
-	 * IDs
-	 */
-	if (!dev_priv->perf.metrics_kobj) {
-		DRM_DEBUG("OA metrics weren't advertised via sysfs\n");
-		return -EINVAL;
-	}
-
-	if (!(props->sample_flags & SAMPLE_OA_REPORT)) {
-		DRM_DEBUG("Only OA report sampling supported\n");
-		return -EINVAL;
-	}
-
-	if (!dev_priv->perf.oa.ops.init_oa_buffer) {
-		DRM_DEBUG("OA unit not supported\n");
-		return -ENODEV;
-	}
-
-	/* To avoid the complexity of having to accurately filter
-	 * counter reports and marshal to the appropriate client
-	 * we currently only allow exclusive access
-	 */
-	if (dev_priv->perf.oa.exclusive_stream) {
-		DRM_DEBUG("OA unit already in use\n");
-		return -EBUSY;
-	}
-
-	if (!props->metrics_set) {
-		DRM_DEBUG("OA metric set not specified\n");
-		return -EINVAL;
-	}
-
-	if (!props->oa_format) {
-		DRM_DEBUG("OA report format not specified\n");
-		return -EINVAL;
+	if ((props->sample_flags & SAMPLE_CTX_ID) && !props->cs_mode) {
+		if (IS_HASWELL(dev_priv)) {
+			DRM_ERROR("On HSW, context ID sampling only supported via command stream\n");
+			return -EINVAL;
+		} else if (!i915.enable_execlists) {
+			DRM_ERROR("On Gen8+ without execlists, context ID sampling only supported via command stream\n");
+			return -EINVAL;
+		}
 	}
 
 	/* We set up some ratelimit state to potentially throttle any _NOTES
@@ -2060,70 +2635,167 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 
 	stream->sample_size = sizeof(struct drm_i915_perf_record_header);
 
-	format_size = dev_priv->perf.oa.oa_formats[props->oa_format].size;
+	if (require_oa_unit) {
+		int format_size;
 
-	stream->sample_flags |= SAMPLE_OA_REPORT;
-	stream->sample_size += format_size;
+		/* If the sysfs metrics/ directory wasn't registered for some
+		 * reason then don't let userspace try their luck with config
+		 * IDs
+		 */
+		if (!dev_priv->perf.metrics_kobj) {
+			DRM_DEBUG("OA metrics weren't advertised via sysfs\n");
+			return -EINVAL;
+		}
 
-	if (props->sample_flags & SAMPLE_OA_SOURCE) {
-		stream->sample_flags |= SAMPLE_OA_SOURCE;
-		stream->sample_size += 8;
-	}
+		if (!dev_priv->perf.oa.ops.init_oa_buffer) {
+			DRM_DEBUG("OA unit not supported\n");
+			return -ENODEV;
+		}
 
-	dev_priv->perf.oa.oa_buffer.format_size = format_size;
-	if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
-		return -EINVAL;
+		if (!props->metrics_set) {
+			DRM_DEBUG("OA metric set not specified\n");
+			return -EINVAL;
+		}
+
+		if (!props->oa_format) {
+			DRM_DEBUG("OA report format not specified\n");
+			return -EINVAL;
+		}
+
+		if (props->cs_mode && (props->engine != RCS)) {
+			DRM_ERROR("Command stream OA metrics only available via Render CS\n");
+			return -EINVAL;
+		}
+
+		engine = dev_priv->engine[RCS];
+		stream->using_oa = true;
+
+		idx = srcu_read_lock(&engine->perf_srcu);
+		curr_stream = srcu_dereference(engine->exclusive_stream,
+					       &engine->perf_srcu);
+		if (curr_stream) {
+			DRM_ERROR("Stream already opened\n");
+			ret = -EINVAL;
+			goto err_enable;
+		}
+		srcu_read_unlock(&engine->perf_srcu, idx);
+
+		format_size =
+			dev_priv->perf.oa.oa_formats[props->oa_format].size;
+
+		if (props->sample_flags & SAMPLE_OA_REPORT) {
+			stream->sample_flags |= SAMPLE_OA_REPORT;
+			stream->sample_size += format_size;
+		}
+
+		if (props->sample_flags & SAMPLE_OA_SOURCE) {
+			if (!(props->sample_flags & SAMPLE_OA_REPORT)) {
+				DRM_ERROR("OA source type can't be sampled without OA report\n");
+				return -EINVAL;
+			}
+			stream->sample_flags |= SAMPLE_OA_SOURCE;
+			stream->sample_size += 8;
+		}
+
+		dev_priv->perf.oa.oa_buffer.format_size = format_size;
+		if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
+			return -EINVAL;
+
+		dev_priv->perf.oa.oa_buffer.format =
+			dev_priv->perf.oa.oa_formats[props->oa_format].format;
+
+		dev_priv->perf.oa.metrics_set = props->metrics_set;
 
-	dev_priv->perf.oa.oa_buffer.format =
-		dev_priv->perf.oa.oa_formats[props->oa_format].format;
+		dev_priv->perf.oa.periodic = props->oa_periodic;
+		if (dev_priv->perf.oa.periodic)
+			dev_priv->perf.oa.period_exponent =
+				props->oa_period_exponent;
 
-	dev_priv->perf.oa.metrics_set = props->metrics_set;
+		if (stream->ctx) {
+			ret = oa_get_render_ctx_id(stream);
+			if (ret)
+				return ret;
+		}
 
-	dev_priv->perf.oa.periodic = props->oa_periodic;
-	if (dev_priv->perf.oa.periodic)
-		dev_priv->perf.oa.period_exponent = props->oa_period_exponent;
+		/* PRM - observability performance counters:
+		 *
+		 *   OACONTROL, performance counter enable, note:
+		 *
+		 *   "When this bit is set, in order to have coherent counts,
+		 *   RC6 power state and trunk clock gating must be disabled.
+		 *   This can be achieved by programming MMIO registers as
+		 *   0xA094=0 and 0xA090[31]=1"
+		 *
+		 *   In our case we are expecting that taking pm + FORCEWAKE
+		 *   references will effectively disable RC6.
+		 */
+		intel_runtime_pm_get(dev_priv);
+		intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
 
-	if (stream->ctx) {
-		ret = oa_get_render_ctx_id(stream);
+		ret = alloc_oa_buffer(dev_priv);
 		if (ret)
-			return ret;
+			goto err_oa_buf_alloc;
+
+		ret = dev_priv->perf.oa.ops.enable_metric_set(dev_priv);
+		if (ret)
+			goto err_enable;
 	}
 
-	/* PRM - observability performance counters:
-	 *
-	 *   OACONTROL, performance counter enable, note:
-	 *
-	 *   "When this bit is set, in order to have coherent counts,
-	 *   RC6 power state and trunk clock gating must be disabled.
-	 *   This can be achieved by programming MMIO registers as
-	 *   0xA094=0 and 0xA090[31]=1"
-	 *
-	 *   In our case we are expecting that taking pm + FORCEWAKE
-	 *   references will effectively disable RC6.
-	 */
-	intel_runtime_pm_get(dev_priv);
-	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+	if (props->sample_flags & SAMPLE_CTX_ID) {
+		stream->sample_flags |= SAMPLE_CTX_ID;
+		stream->sample_size += 8;
+	}
 
-	ret = alloc_oa_buffer(dev_priv);
-	if (ret)
-		goto err_oa_buf_alloc;
+	if (props->cs_mode) {
+		if (!cs_sample_data) {
+			DRM_ERROR("Stream engine given without requesting any CS data to sample\n");
+			ret = -EINVAL;
+			goto err_enable;
+		}
 
-	ret = dev_priv->perf.oa.ops.enable_metric_set(dev_priv);
-	if (ret)
-		goto err_enable;
+		if (!(props->sample_flags & SAMPLE_CTX_ID)) {
+			DRM_ERROR("Stream engine given without requesting any CS specific property\n");
+			ret = -EINVAL;
+			goto err_enable;
+		}
 
-	stream->ops = &i915_oa_stream_ops;
+		engine = dev_priv->engine[props->engine];
 
-	dev_priv->perf.oa.exclusive_stream = stream;
+		idx = srcu_read_lock(&engine->perf_srcu);
+		curr_stream = srcu_dereference(engine->exclusive_stream,
+					       &engine->perf_srcu);
+		if (curr_stream) {
+			DRM_ERROR("Stream already opened\n");
+			ret = -EINVAL;
+			goto err_enable;
+		}
+		srcu_read_unlock(&engine->perf_srcu, idx);
+
+		INIT_LIST_HEAD(&stream->cs_samples);
+		ret = alloc_cs_buffer(stream);
+		if (ret)
+			goto err_enable;
+
+		stream->cs_mode = true;
+	}
+
+	init_waitqueue_head(&stream->poll_wq);
+	stream->pollin = false;
+	stream->ops = &perf_stream_ops;
+	stream->engine = engine;
+	rcu_assign_pointer(engine->exclusive_stream, stream);
 
 	return 0;
 
 err_enable:
-	free_oa_buffer(dev_priv);
+	if (require_oa_unit)
+		free_oa_buffer(dev_priv);
 
 err_oa_buf_alloc:
-	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
-	intel_runtime_pm_put(dev_priv);
+	if (require_oa_unit) {
+		intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+		intel_runtime_pm_put(dev_priv);
+	}
 	if (stream->ctx)
 		oa_put_render_ctx_id(stream);
 
@@ -2219,7 +2891,7 @@ static ssize_t i915_perf_read(struct file *file,
 	 * disabled stream as an error. In particular it might otherwise lead
 	 * to a deadlock for blocking file descriptors...
 	 */
-	if (!stream->enabled)
+	if (stream->state == I915_PERF_STREAM_DISABLED)
 		return -EIO;
 
 	if (!(file->f_flags & O_NONBLOCK)) {
@@ -2254,25 +2926,32 @@ static ssize_t i915_perf_read(struct file *file,
 	 * effectively ensures we back off until the next hrtimer callback
 	 * before reporting another POLLIN event.
 	 */
-	if (ret >= 0 || ret == -EAGAIN) {
-		/* Maybe make ->pollin per-stream state if we support multiple
-		 * concurrent streams in the future.
-		 */
-		dev_priv->perf.oa.pollin = false;
-	}
+	if (ret >= 0 || ret == -EAGAIN)
+		stream->pollin = false;
 
 	return ret;
 }
 
-static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer)
+static enum hrtimer_restart poll_check_timer_cb(struct hrtimer *hrtimer)
 {
+	struct i915_perf_stream *stream;
 	struct drm_i915_private *dev_priv =
 		container_of(hrtimer, typeof(*dev_priv),
-			     perf.oa.poll_check_timer);
-
-	if (oa_buffer_check_unlocked(dev_priv)) {
-		dev_priv->perf.oa.pollin = true;
-		wake_up(&dev_priv->perf.oa.poll_wq);
+			     perf.poll_check_timer);
+	int idx;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	for_each_engine(engine, dev_priv, id) {
+		idx = srcu_read_lock(&engine->perf_srcu);
+		stream = srcu_dereference(engine->exclusive_stream,
+					  &engine->perf_srcu);
+		if (stream && (stream->state == I915_PERF_STREAM_ENABLED) &&
+		    stream_have_data_unlocked(stream)) {
+			stream->pollin = true;
+			wake_up(&stream->poll_wq);
+		}
+		srcu_read_unlock(&engine->perf_srcu, idx);
 	}
 
 	hrtimer_forward_now(hrtimer, ns_to_ktime(POLL_PERIOD));
@@ -2311,7 +2990,7 @@ static unsigned int i915_perf_poll_locked(struct drm_i915_private *dev_priv,
 	 * the hrtimer/oa_poll_check_timer_cb to notify us when there are
 	 * samples to read.
 	 */
-	if (dev_priv->perf.oa.pollin)
+	if (stream->pollin)
 		events |= POLLIN;
 
 	return events;
@@ -2355,14 +3034,16 @@ static unsigned int i915_perf_poll(struct file *file, poll_table *wait)
  */
 static void i915_perf_enable_locked(struct i915_perf_stream *stream)
 {
-	if (stream->enabled)
+	if (stream->state != I915_PERF_STREAM_DISABLED)
 		return;
 
 	/* Allow stream->ops->enable() to refer to this */
-	stream->enabled = true;
+	stream->state = I915_PERF_STREAM_ENABLE_IN_PROGRESS;
 
 	if (stream->ops->enable)
 		stream->ops->enable(stream);
+
+	stream->state = I915_PERF_STREAM_ENABLED;
 }
 
 /**
@@ -2381,11 +3062,11 @@ static void i915_perf_enable_locked(struct i915_perf_stream *stream)
  */
 static void i915_perf_disable_locked(struct i915_perf_stream *stream)
 {
-	if (!stream->enabled)
+	if (stream->state != I915_PERF_STREAM_ENABLED)
 		return;
 
 	/* Allow stream->ops->disable() to refer to this */
-	stream->enabled = false;
+	stream->state = I915_PERF_STREAM_DISABLED;
 
 	if (stream->ops->disable)
 		stream->ops->disable(stream);
@@ -2457,14 +3138,12 @@ static long i915_perf_ioctl(struct file *file,
  */
 static void i915_perf_destroy_locked(struct i915_perf_stream *stream)
 {
-	if (stream->enabled)
+	if (stream->state == I915_PERF_STREAM_ENABLED)
 		i915_perf_disable_locked(stream);
 
 	if (stream->ops->destroy)
 		stream->ops->destroy(stream);
 
-	list_del(&stream->link);
-
 	if (stream->ctx)
 		i915_gem_context_put(stream->ctx);
 
@@ -2524,7 +3203,7 @@ static int i915_perf_release(struct inode *inode, struct file *file)
  *
  * In the case where userspace is interested in OA unit metrics then further
  * config validation and stream initialization details will be handled by
- * i915_oa_stream_init(). The code here should only validate config state that
+ * i915_perf_stream_init(). The code here should only validate config state that
  * will be relevant to all stream types / backends.
  *
  * Returns: zero on success or a negative error code.
@@ -2593,7 +3272,7 @@ static int i915_perf_release(struct inode *inode, struct file *file)
 	stream->dev_priv = dev_priv;
 	stream->ctx = specific_ctx;
 
-	ret = i915_oa_stream_init(stream, param, props);
+	ret = i915_perf_stream_init(stream, param, props);
 	if (ret)
 		goto err_alloc;
 
@@ -2606,8 +3285,6 @@ static int i915_perf_release(struct inode *inode, struct file *file)
 		goto err_flags;
 	}
 
-	list_add(&stream->link, &dev_priv->perf.streams);
-
 	if (param->flags & I915_PERF_FLAG_FD_CLOEXEC)
 		f_flags |= O_CLOEXEC;
 	if (param->flags & I915_PERF_FLAG_FD_NONBLOCK)
@@ -2625,7 +3302,6 @@ static int i915_perf_release(struct inode *inode, struct file *file)
 	return stream_fd;
 
 err_open:
-	list_del(&stream->link);
 err_flags:
 	if (stream->ops->destroy)
 		stream->ops->destroy(stream);
@@ -2774,6 +3450,29 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 		case DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE:
 			props->sample_flags |= SAMPLE_OA_SOURCE;
 			break;
+		case DRM_I915_PERF_PROP_ENGINE: {
+				unsigned int user_ring_id =
+					value & I915_EXEC_RING_MASK;
+				enum intel_engine_id engine;
+
+				if (user_ring_id > I915_USER_RINGS)
+					return -EINVAL;
+
+				/* XXX: Currently only RCS is supported.
+				 * Remove this check when support for other
+				 * engines is added
+				 */
+				engine = user_ring_map[user_ring_id];
+				if (engine != RCS)
+					return -EINVAL;
+
+				props->cs_mode = true;
+				props->engine = engine;
+			}
+			break;
+		case DRM_I915_PERF_PROP_SAMPLE_CTX_ID:
+			props->sample_flags |= SAMPLE_CTX_ID;
+			break;
 		case DRM_I915_PERF_PROP_MAX:
 			MISSING_CASE(id);
 			return -EINVAL;
@@ -3002,6 +3701,30 @@ void i915_perf_unregister(struct drm_i915_private *dev_priv)
 	{}
 };
 
+void i915_perf_streams_mark_idle(struct drm_i915_private *dev_priv)
+{
+	struct intel_engine_cs *engine;
+	struct i915_perf_stream *stream;
+	enum intel_engine_id id;
+	int idx;
+
+	for_each_engine(engine, dev_priv, id) {
+		idx = srcu_read_lock(&engine->perf_srcu);
+		stream = srcu_dereference(engine->exclusive_stream,
+					  &engine->perf_srcu);
+		if (stream && (stream->state == I915_PERF_STREAM_ENABLED) &&
+					stream->cs_mode) {
+			struct reservation_object *resv =
+						stream->cs_buffer.vma->resv;
+
+			reservation_object_lock(resv, NULL);
+			reservation_object_add_excl_fence(resv, NULL);
+			reservation_object_unlock(resv);
+		}
+		srcu_read_unlock(&engine->perf_srcu, idx);
+	}
+}
+
 /**
  * i915_perf_init - initialize i915-perf state on module load
  * @dev_priv: i915 device instance
@@ -3125,12 +3848,10 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
 	}
 
 	if (dev_priv->perf.oa.n_builtin_sets) {
-		hrtimer_init(&dev_priv->perf.oa.poll_check_timer,
+		hrtimer_init(&dev_priv->perf.poll_check_timer,
 				CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-		dev_priv->perf.oa.poll_check_timer.function = oa_poll_check_timer_cb;
-		init_waitqueue_head(&dev_priv->perf.oa.poll_wq);
+		dev_priv->perf.poll_check_timer.function = poll_check_timer_cb;
 
-		INIT_LIST_HEAD(&dev_priv->perf.streams);
 		mutex_init(&dev_priv->perf.lock);
 		spin_lock_init(&dev_priv->perf.oa.oa_buffer.ptr_lock);
 
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 9ab5969..1a2e843 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -317,6 +317,10 @@ int intel_engines_init(struct drm_i915_private *dev_priv)
 			goto cleanup;
 
 		GEM_BUG_ON(!engine->submit_request);
+
+		/* Perf stream related initialization for Engine */
+		rcu_assign_pointer(engine->exclusive_stream, NULL);
+		init_srcu_struct(&engine->perf_srcu);
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index cdf084e..4333623 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1622,6 +1622,8 @@ void intel_engine_cleanup(struct intel_engine_cs *engine)
 
 	intel_engine_cleanup_common(engine);
 
+	cleanup_srcu_struct(&engine->perf_srcu);
+
 	dev_priv->engine[engine->id] = NULL;
 	kfree(engine);
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index d33c934..0ac8491 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -441,6 +441,11 @@ struct intel_engine_cs {
 	 * certain bits to encode the command length in the header).
 	 */
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
+
+	/* Global per-engine stream */
+	struct srcu_struct perf_srcu;
+	struct i915_perf_stream __rcu *exclusive_stream;
+	u32 specific_ctx_id;
 };
 
 static inline unsigned int
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index a1314c5..768b1a5 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1350,6 +1350,7 @@ enum drm_i915_oa_format {
 
 enum drm_i915_perf_sample_oa_source {
 	I915_PERF_SAMPLE_OA_SOURCE_OABUFFER,
+	I915_PERF_SAMPLE_OA_SOURCE_CS,
 	I915_PERF_SAMPLE_OA_SOURCE_MAX	/* non-ABI */
 };
 
@@ -1394,6 +1395,19 @@ enum drm_i915_perf_property_id {
 	 */
 	DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE,
 
+	/**
+	 * The value of this property specifies the GPU engine for which
+	 * the samples need to be collected. Specifying this property also
+	 * implies the command stream based sample collection.
+	 */
+	DRM_I915_PERF_PROP_ENGINE,
+
+	/**
+	 * The value of this property set to 1 requests inclusion of context ID
+	 * in the perf sample data.
+	 */
+	DRM_I915_PERF_PROP_SAMPLE_CTX_ID,
+
 	DRM_I915_PERF_PROP_MAX /* non-ABI */
 };
 
@@ -1460,6 +1474,7 @@ enum drm_i915_perf_record_type {
 	 *     struct drm_i915_perf_record_header header;
 	 *
 	 *     { u64 source; } && DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE
+	 *     { u64 ctx_id; } && DRM_I915_PERF_PROP_SAMPLE_CTX_ID
 	 *     { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
 	 * };
 	 */
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 04/12] drm/i915: Flush periodic samples, in case of no pending CS sample requests
  2017-07-31  7:59 [PATCH 00/12] i915 perf support for command stream based OA, GPU and workload metrics capture Sagar Arun Kamble
                   ` (2 preceding siblings ...)
  2017-07-31  7:59 ` [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info Sagar Arun Kamble
@ 2017-07-31  7:59 ` Sagar Arun Kamble
  2017-07-31 16:52   ` kbuild test robot
  2017-07-31  7:59 ` [PATCH 05/12] drm/i915: Inform userspace about command stream OA buf overflow Sagar Arun Kamble
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 34+ messages in thread
From: Sagar Arun Kamble @ 2017-07-31  7:59 UTC (permalink / raw)
  To: intel-gfx; +Cc: Sourab Gupta

From: Sourab Gupta <sourab.gupta@intel.com>

When there are no pending CS OA samples, flush the periodic OA samples
collected so far.

We can safely forward the periodic OA samples in the case we
have no pending CS samples, but we can't do so in the case we have
pending CS samples, since we don't know what the ordering between
pending CS samples and periodic samples will eventually be. If we
have no pending CS sample, it won't be possible for future pending CS
sample to have timestamps earlier than current periodic timestamp.

Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |   5 +-
 drivers/gpu/drm/i915/i915_perf.c | 142 ++++++++++++++++++++++++++++++---------
 2 files changed, 113 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8b1cecf..886fc5e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2138,7 +2138,8 @@ struct i915_oa_ops {
 		    char __user *buf,
 		    size_t count,
 		    size_t *offset,
-		    u32 ts);
+		    u32 ts,
+		    u32 max_reports);
 
 	/**
 	 * @oa_hw_tail_read: read the OA tail pointer register
@@ -2604,6 +2605,8 @@ struct drm_i915_private {
 			u32 gen7_latched_oastatus1;
 			u32 ctx_oactxctrl_offset;
 			u32 ctx_flexeu0_offset;
+			u32 n_pending_periodic_samples;
+			u32 pending_periodic_ts;
 
 			/**
 			 * The RPT_ID/reason field for Gen8+ includes a bit
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 57e1936..462d180 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -656,7 +656,7 @@ static void i915_perf_stream_release_samples(struct i915_perf_stream *stream)
 }
 
 /**
- * oa_buffer_check_unlocked - check for data and update tail ptr state
+ * oa_buffer_num_reports_unlocked - check for data and update tail ptr state
  * @dev_priv: i915 device instance
  *
  * This is either called via fops (for blocking reads in user ctx) or the poll
@@ -669,7 +669,7 @@ static void i915_perf_stream_release_samples(struct i915_perf_stream *stream)
  * the pointers time to 'age' before they are made available for reading.
  * (See description of OA_TAIL_MARGIN_NSEC above for further details.)
  *
- * Besides returning true when there is data available to read() this function
+ * Besides returning num of reports when there is data available to read() it
  * also has the side effect of updating the oa_buffer.tails[], .aging_timestamp
  * and .aged_tail_idx state used for reading.
  *
@@ -677,14 +677,15 @@ static void i915_perf_stream_release_samples(struct i915_perf_stream *stream)
  * only called while the stream is enabled, while the global OA configuration
  * can't be modified.
  *
- * Returns: %true if the OA buffer contains data, else %false
+ * Returns: number of samples available to read
  */
-static bool oa_buffer_check_unlocked(struct drm_i915_private *dev_priv)
+static u32 oa_buffer_num_reports_unlocked(
+			struct drm_i915_private *dev_priv, u32 *last_ts)
 {
 	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
 	unsigned long flags;
 	unsigned int aged_idx;
-	u32 head, hw_tail, aged_tail, aging_tail;
+	u32 head, hw_tail, aged_tail, aging_tail, num_reports = 0;
 	u64 now;
 
 	/* We have to consider the (unlikely) possibility that read() errors
@@ -725,6 +726,13 @@ static bool oa_buffer_check_unlocked(struct drm_i915_private *dev_priv)
 	if (aging_tail != INVALID_TAIL_PTR &&
 	    ((now - dev_priv->perf.oa.oa_buffer.aging_timestamp) >
 	     OA_TAIL_MARGIN_NSEC)) {
+		u32 mask = (OA_BUFFER_SIZE - 1);
+		u32 gtt_offset = i915_ggtt_offset(
+				dev_priv->perf.oa.oa_buffer.vma);
+		u32 head = (dev_priv->perf.oa.oa_buffer.head - gtt_offset)
+				& mask;
+		u8 *oa_buf_base = dev_priv->perf.oa.oa_buffer.vaddr;
+		u32 *report32;
 
 		aged_idx ^= 1;
 		dev_priv->perf.oa.oa_buffer.aged_tail_idx = aged_idx;
@@ -734,6 +742,14 @@ static bool oa_buffer_check_unlocked(struct drm_i915_private *dev_priv)
 		/* Mark that we need a new pointer to start aging... */
 		dev_priv->perf.oa.oa_buffer.tails[!aged_idx].offset = INVALID_TAIL_PTR;
 		aging_tail = INVALID_TAIL_PTR;
+
+		num_reports = OA_TAKEN(((aged_tail - gtt_offset) & mask), head)/
+				report_size;
+
+		/* read the timestamp of last OA report */
+		head = (head + report_size*(num_reports - 1)) & mask;
+		report32 = (u32 *)(oa_buf_base + head);
+		*last_ts = report32[1];
 	}
 
 	/* Update the aging tail
@@ -767,8 +783,7 @@ static bool oa_buffer_check_unlocked(struct drm_i915_private *dev_priv)
 
 	spin_unlock_irqrestore(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags);
 
-	return aged_tail == INVALID_TAIL_PTR ?
-		false : OA_TAKEN(aged_tail, head) >= report_size;
+	return aged_tail == INVALID_TAIL_PTR ? 0 : num_reports;
 }
 
 /**
@@ -926,6 +941,7 @@ static int append_oa_buffer_sample(struct i915_perf_stream *stream,
  * @count: the number of bytes userspace wants to read
  * @offset: (inout): the current position for writing into @buf
  * @ts: copy OA reports till this timestamp
+ * @max_reports: max number of OA reports to copy
  *
  * Notably any error condition resulting in a short read (-%ENOSPC or
  * -%EFAULT) will be returned even though one or more records may
@@ -944,7 +960,8 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 				  char __user *buf,
 				  size_t count,
 				  size_t *offset,
-				  u32 ts)
+				  u32 ts,
+				  u32 max_reports)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
@@ -957,6 +974,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 	u32 head, tail;
 	u32 taken;
 	int ret = 0;
+	u32 report_count = 0;
 
 	if (WARN_ON(stream->state != I915_PERF_STREAM_ENABLED))
 		return -EIO;
@@ -998,7 +1016,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 
 
 	for (/* none */;
-	     (taken = OA_TAKEN(tail, head));
+	     (taken = OA_TAKEN(tail, head)) && (report_count <= max_reports);
 	     head = (head + report_size) & mask) {
 		u8 *report = oa_buf_base + head;
 		u32 *report32 = (void *)report;
@@ -1110,6 +1128,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 			if (ret)
 				break;
 
+			report_count++;
 			dev_priv->perf.oa.oa_buffer.last_ctx_id = ctx_id;
 		}
 
@@ -1148,6 +1167,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
  * @count: the number of bytes userspace wants to read
  * @offset: (inout): the current position for writing into @buf
  * @ts: copy OA reports till this timestamp
+ * @max_reports: max number of OA reports to copy
  *
  * Checks OA unit status registers and if necessary appends corresponding
  * status records for userspace (such as for a buffer full condition) and then
@@ -1166,7 +1186,8 @@ static int gen8_oa_read(struct i915_perf_stream *stream,
 			char __user *buf,
 			size_t count,
 			size_t *offset,
-			u32 ts)
+			u32 ts,
+			u32 max_reports)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	u32 oastatus;
@@ -1219,7 +1240,8 @@ static int gen8_oa_read(struct i915_perf_stream *stream,
 			   oastatus & ~GEN8_OASTATUS_REPORT_LOST);
 	}
 
-	return gen8_append_oa_reports(stream, buf, count, offset, ts);
+	return gen8_append_oa_reports(stream, buf, count, offset, ts,
+					max_reports);
 }
 
 /**
@@ -1229,6 +1251,7 @@ static int gen8_oa_read(struct i915_perf_stream *stream,
  * @count: the number of bytes userspace wants to read
  * @offset: (inout): the current position for writing into @buf
  * @ts: copy OA reports till this timestamp
+ * @max_reports: max number of OA reports to copy
  *
  * Notably any error condition resulting in a short read (-%ENOSPC or
  * -%EFAULT) will be returned even though one or more records may
@@ -1247,7 +1270,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 				  char __user *buf,
 				  size_t count,
 				  size_t *offset,
-				  u32 ts)
+				  u32 ts,
+				  u32 max_reports)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
@@ -1260,6 +1284,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 	u32 head, tail;
 	u32 taken;
 	int ret = 0;
+	u32 report_count = 0;
 
 	if (WARN_ON(stream->state != I915_PERF_STREAM_ENABLED))
 		return -EIO;
@@ -1298,7 +1323,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 
 
 	for (/* none */;
-	     (taken = OA_TAKEN(tail, head));
+	     (taken = OA_TAKEN(tail, head)) && (report_count <= max_reports);
 	     head = (head + report_size) & mask) {
 		u8 *report = oa_buf_base + head;
 		u32 *report32 = (void *)report;
@@ -1337,6 +1362,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 		if (ret)
 			break;
 
+		report_count++;
 		/* The above report-id field sanity check is based on
 		 * the assumption that the OA buffer is initially
 		 * zeroed and we reset the field after copying so the
@@ -1372,6 +1398,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
  * @count: the number of bytes userspace wants to read
  * @offset: (inout): the current position for writing into @buf
  * @ts: copy OA reports till this timestamp
+ * @max_reports: max number of OA reports to copy
  *
  * Checks Gen 7 specific OA unit status registers and if necessary appends
  * corresponding status records for userspace (such as for a buffer full
@@ -1386,7 +1413,8 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
 			char __user *buf,
 			size_t count,
 			size_t *offset,
-			u32 ts)
+			u32 ts,
+			u32 max_reports)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	u32 oastatus1;
@@ -1448,7 +1476,8 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
 			GEN7_OASTATUS1_REPORT_LOST;
 	}
 
-	return gen7_append_oa_reports(stream, buf, count, offset, ts);
+	return gen7_append_oa_reports(stream, buf, count, offset, ts,
+					max_reports);
 }
 
 /**
@@ -1483,7 +1512,7 @@ static int append_cs_buffer_sample(struct i915_perf_stream *stream,
 		 * timestamp values
 		 */
 		ret = dev_priv->perf.oa.ops.read(stream, buf, count, offset,
-						 sample_ts);
+						 sample_ts, U32_MAX);
 		if (ret)
 			return ret;
 	}
@@ -1518,6 +1547,7 @@ static int append_cs_buffer_samples(struct i915_perf_stream *stream,
 				size_t count,
 				size_t *offset)
 {
+	struct drm_i915_private *dev_priv = stream->dev_priv;
 	struct i915_perf_cs_sample *entry, *next;
 	LIST_HEAD(free_list);
 	int ret = 0;
@@ -1526,7 +1556,7 @@ static int append_cs_buffer_samples(struct i915_perf_stream *stream,
 	spin_lock_irqsave(&stream->cs_samples_lock, flags);
 	if (list_empty(&stream->cs_samples)) {
 		spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
-		return 0;
+		goto pending_periodic;
 	}
 	list_for_each_entry_safe(entry, next,
 				 &stream->cs_samples, link) {
@@ -1537,7 +1567,7 @@ static int append_cs_buffer_samples(struct i915_perf_stream *stream,
 	spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
 
 	if (list_empty(&free_list))
-		return 0;
+		goto pending_periodic;
 
 	list_for_each_entry_safe(entry, next, &free_list, link) {
 		ret = append_cs_buffer_sample(stream, buf, count, offset,
@@ -1556,18 +1586,37 @@ static int append_cs_buffer_samples(struct i915_perf_stream *stream,
 	spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
 
 	return ret;
+
+pending_periodic:
+	if (!((stream->sample_flags & SAMPLE_OA_REPORT) &&
+			dev_priv->perf.oa.n_pending_periodic_samples))
+		return 0;
+
+	ret = dev_priv->perf.oa.ops.read(stream, buf, count, offset,
+				dev_priv->perf.oa.pending_periodic_ts,
+				dev_priv->perf.oa.n_pending_periodic_samples);
+	dev_priv->perf.oa.n_pending_periodic_samples = 0;
+	dev_priv->perf.oa.pending_periodic_ts = 0;
+	return ret;
 }
 
+enum cs_buf_state {
+	CS_BUF_EMPTY,
+	CS_BUF_REQ_PENDING,
+	CS_BUF_HAVE_DATA,
+};
+
 /*
- * cs_buffer_is_empty - Checks whether the command stream buffer
+ * cs_buffer_state - Checks whether the command stream buffer
  * associated with the stream has data available.
  * @stream: An i915-perf stream opened for OA metrics
  *
- * Returns: true if atleast one request associated with command stream is
- * completed, else returns false.
+ * Returns:
+ * CS_BUF_HAVE_DATA	- if there is atleast one completed request
+ * CS_BUF_REQ_PENDING	- there are requests pending, but no completed requests
+ * CS_BUF_EMPTY		- no requests scheduled
  */
-static bool cs_buffer_is_empty(struct i915_perf_stream *stream)
-
+static enum cs_buf_state cs_buffer_state(struct i915_perf_stream *stream)
 {
 	struct i915_perf_cs_sample *entry = NULL;
 	struct drm_i915_gem_request *request = NULL;
@@ -1581,30 +1630,57 @@ static bool cs_buffer_is_empty(struct i915_perf_stream *stream)
 	spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
 
 	if (!entry)
-		return true;
+		return CS_BUF_EMPTY;
 	else if (!i915_gem_request_completed(request))
-		return true;
+		return CS_BUF_REQ_PENDING;
 	else
-		return false;
+		return CS_BUF_HAVE_DATA;
 }
 
 /**
  * stream_have_data_unlocked - Checks whether the stream has data available
  * @stream: An i915-perf stream opened for OA metrics
  *
- * For command stream based streams, check if the command stream buffer has
- * atleast one sample available, if not return false, irrespective of periodic
- * oa buffer having the data or not.
+ * Note: We can safely forward the periodic OA samples in the case we have no
+ * pending CS samples, but we can't do so in the case we have pending CS
+ * samples, since we don't know what the ordering between pending CS samples
+ * and periodic samples will eventually be. If we have no pending CS sample,
+ * it won't be possible for future pending CS sample to have timestamps
+ * earlier than current periodic timestamp.
  */
 
 static bool stream_have_data_unlocked(struct i915_perf_stream *stream)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
+	enum cs_buf_state state = CS_BUF_EMPTY;
+	u32 num_samples = 0, last_ts = 0;
+
+	dev_priv->perf.oa.n_pending_periodic_samples = 0;
+	dev_priv->perf.oa.pending_periodic_ts = 0;
+	num_samples = oa_buffer_num_reports_unlocked(dev_priv,
+						     &last_ts);
 
 	if (stream->cs_mode)
-		return !cs_buffer_is_empty(stream);
-	else
-		return oa_buffer_check_unlocked(dev_priv);
+		state = cs_buffer_state(stream);
+
+	switch (state) {
+	case CS_BUF_EMPTY:
+		if (stream->sample_flags & SAMPLE_OA_REPORT) {
+			dev_priv->perf.oa.n_pending_periodic_samples =
+							num_samples;
+			dev_priv->perf.oa.pending_periodic_ts = last_ts;
+			return (num_samples != 0);
+		} else
+			return false;
+
+	case CS_BUF_HAVE_DATA:
+		return true;
+
+	case CS_BUF_REQ_PENDING:
+	default:
+		return false;
+	}
+	return false;
 }
 
 /**
@@ -1691,7 +1767,7 @@ static int i915_perf_stream_read(struct i915_perf_stream *stream,
 		return append_cs_buffer_samples(stream, buf, count, offset);
 	else if (stream->sample_flags & SAMPLE_OA_REPORT)
 		return dev_priv->perf.oa.ops.read(stream, buf, count, offset,
-						U32_MAX);
+						U32_MAX, U32_MAX);
 	else
 		return -EINVAL;
 }
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 05/12] drm/i915: Inform userspace about command stream OA buf overflow
  2017-07-31  7:59 [PATCH 00/12] i915 perf support for command stream based OA, GPU and workload metrics capture Sagar Arun Kamble
                   ` (3 preceding siblings ...)
  2017-07-31  7:59 ` [PATCH 04/12] drm/i915: Flush periodic samples, in case of no pending CS sample requests Sagar Arun Kamble
@ 2017-07-31  7:59 ` Sagar Arun Kamble
  2017-07-31  7:59 ` [PATCH 06/12] drm/i915: Populate ctx ID for periodic OA reports Sagar Arun Kamble
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Sagar Arun Kamble @ 2017-07-31  7:59 UTC (permalink / raw)
  To: intel-gfx; +Cc: Sourab Gupta

From: Sourab Gupta <sourab.gupta@intel.com>

Considering how we don't currently give userspace control over the
OA buffer size and always configure a large 16MB buffer,
then a buffer overflow does anyway likely indicate that something
has gone quite badly wrong.

Here we set a status flag to detect overflow and inform userspace
of the report_lost condition accordingly. This is in line with the
behavior of the periodic OA buffer.

Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  2 ++
 drivers/gpu/drm/i915/i915_perf.c | 15 +++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 886fc5e..fb81315 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2068,6 +2068,8 @@ struct i915_perf_stream {
 	struct {
 		struct i915_vma *vma;
 		u8 *vaddr;
+#define I915_PERF_CMD_STREAM_BUF_STATUS_OVERFLOW (1<<0)
+		u32 status;
 	} cs_buffer;
 
 	struct list_head cs_samples;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 462d180..905c5bb 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -501,6 +501,8 @@ static void insert_perf_sample(struct i915_perf_stream *stream,
 		else {
 			u32 target_size = sample_size - first->offset;
 
+			stream->cs_buffer.status |=
+				I915_PERF_CMD_STREAM_BUF_STATUS_OVERFLOW;
 			release_perf_samples(stream, target_size);
 			sample->offset = 0;
 		}
@@ -514,6 +516,8 @@ static void insert_perf_sample(struct i915_perf_stream *stream,
 				(first->offset - last->offset -
 				sample_size);
 
+			stream->cs_buffer.status |=
+				I915_PERF_CMD_STREAM_BUF_STATUS_OVERFLOW;
 			release_perf_samples(stream, target_size);
 			sample->offset = last->offset + sample_size;
 		}
@@ -1552,6 +1556,17 @@ static int append_cs_buffer_samples(struct i915_perf_stream *stream,
 	LIST_HEAD(free_list);
 	int ret = 0;
 	unsigned long flags;
+	u32 status = stream->cs_buffer.status;
+
+	if (unlikely(status & I915_PERF_CMD_STREAM_BUF_STATUS_OVERFLOW)) {
+		ret = append_oa_status(stream, buf, count, offset,
+				       DRM_I915_PERF_RECORD_OA_BUFFER_LOST);
+		if (ret)
+			return ret;
+
+		stream->cs_buffer.status &=
+				~I915_PERF_CMD_STREAM_BUF_STATUS_OVERFLOW;
+	}
 
 	spin_lock_irqsave(&stream->cs_samples_lock, flags);
 	if (list_empty(&stream->cs_samples)) {
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 06/12] drm/i915: Populate ctx ID for periodic OA reports
  2017-07-31  7:59 [PATCH 00/12] i915 perf support for command stream based OA, GPU and workload metrics capture Sagar Arun Kamble
                   ` (4 preceding siblings ...)
  2017-07-31  7:59 ` [PATCH 05/12] drm/i915: Inform userspace about command stream OA buf overflow Sagar Arun Kamble
@ 2017-07-31  7:59 ` Sagar Arun Kamble
  2017-07-31  9:27   ` Lionel Landwerlin
  2017-07-31 18:17   ` kbuild test robot
  2017-07-31  7:59 ` [PATCH 07/12] drm/i915: Add support for having pid output with OA report Sagar Arun Kamble
                   ` (6 subsequent siblings)
  12 siblings, 2 replies; 34+ messages in thread
From: Sagar Arun Kamble @ 2017-07-31  7:59 UTC (permalink / raw)
  To: intel-gfx; +Cc: Sourab Gupta

From: Sourab Gupta <sourab.gupta@intel.com>

This adds support for populating the ctx id for the periodic OA reports
when requested through the corresponding property.

For Gen8, the OA reports itself have the ctx ID and it is the one
programmed into HW while submitting workloads. Thus it's retrieved from
reports itself.
For Gen7, the OA reports don't have any such field, and we can populate
this field with the last seen ctx ID while sending CS reports.

Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  8 ++++++
 drivers/gpu/drm/i915/i915_perf.c | 58 +++++++++++++++++++++++++++++++---------
 2 files changed, 54 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index fb81315..6c011f3 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2077,6 +2077,8 @@ struct i915_perf_stream {
 
 	wait_queue_head_t poll_wq;
 	bool pollin;
+
+	u32 last_ctx_id;
 };
 
 /**
@@ -2151,6 +2153,12 @@ struct i915_oa_ops {
 	 * generations.
 	 */
 	u32 (*oa_hw_tail_read)(struct drm_i915_private *dev_priv);
+
+	/**
+	 * @get_ctx_id: Retrieve the ctx_id associated with the (periodic) OA
+	 * report.
+	 */
+	u32 (*get_ctx_id)(struct i915_perf_stream *stream, const u8 *report);
 };
 
 /*
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 905c5bb..1f5ebdb 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -790,6 +790,45 @@ static u32 oa_buffer_num_reports_unlocked(
 	return aged_tail == INVALID_TAIL_PTR ? 0 : num_reports;
 }
 
+static u32 gen7_oa_buffer_get_ctx_id(struct i915_perf_stream *stream,
+				    const u8 *report)
+{
+	if (!stream->cs_mode)
+		WARN_ONCE(1,
+			"CTX ID can't be retrieved if command stream mode not enabled");
+
+	/*
+	 * OA reports generated in Gen7 don't have the ctx ID information.
+	 * Therefore, just rely on the ctx ID information from the last CS
+	 * sample forwarded
+	 */
+	return stream->last_ctx_id;
+}
+
+static u32 gen8_oa_buffer_get_ctx_id(struct i915_perf_stream *stream,
+				    const u8 *report)
+{
+	u32 ctx_id;
+
+	/* The ctx ID present in the OA reports have intel_context::hw_id
+	 * present, since this is programmed into the ELSP in execlist mode.
+	 * In non-execlist mode, fall back to retrieving the ctx ID from the
+	 * last saved ctx ID from command stream mode.
+	 */
+	if (i915.enable_execlists) {
+		u32 *report32 = (void *)report;
+
+		ctx_id = report32[2] & 0x1fffff;
+	} else {
+		if (!stream->cs_mode)
+			WARN_ONCE(1,
+				"CTX ID can't be retrieved if command stream mode not enabled");
+
+		ctx_id = stream->last_ctx_id;
+	}
+	return ctx_id;
+}
+
 /**
  * append_oa_status - Appends a status record to a userspace read() buffer.
  * @stream: An i915-perf stream opened for OA metrics
@@ -914,22 +953,12 @@ static int append_oa_buffer_sample(struct i915_perf_stream *stream,
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	u32 sample_flags = stream->sample_flags;
 	struct i915_perf_sample_data data = { 0 };
-	u32 *report32 = (u32 *)report;
 
 	if (sample_flags & SAMPLE_OA_SOURCE)
 		data.source = I915_PERF_SAMPLE_OA_SOURCE_OABUFFER;
 
 	if (sample_flags & SAMPLE_CTX_ID) {
-		if (INTEL_INFO(dev_priv)->gen < 8)
-			data.ctx_id = 0;
-		else {
-			/*
-			 * XXX: Just keep the lower 21 bits for now since I'm
-			 * not entirely sure if the HW touches any of the higher
-			 * bits in this field
-			 */
-			data.ctx_id = report32[2] & 0x1fffff;
-		}
+		data.ctx_id = dev_priv->perf.oa.ops.get_ctx_id(stream, report);
 	}
 
 	if (sample_flags & SAMPLE_OA_REPORT)
@@ -1524,8 +1553,10 @@ static int append_cs_buffer_sample(struct i915_perf_stream *stream,
 	if (sample_flags & SAMPLE_OA_SOURCE)
 		data.source = I915_PERF_SAMPLE_OA_SOURCE_CS;
 
-	if (sample_flags & SAMPLE_CTX_ID)
+	if (sample_flags & SAMPLE_CTX_ID) {
 		data.ctx_id = node->ctx_id;
+		stream->last_ctx_id = data.ctx_id;
+	}
 
 	return append_perf_sample(stream, buf, count, offset, &data);
 }
@@ -3838,6 +3869,7 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
 		dev_priv->perf.oa.ops.read = gen7_oa_read;
 		dev_priv->perf.oa.ops.oa_hw_tail_read =
 			gen7_oa_hw_tail_read;
+		dev_priv->perf.oa.ops.get_ctx_id = gen7_oa_buffer_get_ctx_id;
 
 		dev_priv->perf.oa.timestamp_frequency = 12500000;
 
@@ -3933,6 +3965,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
 			dev_priv->perf.oa.ops.read = gen8_oa_read;
 			dev_priv->perf.oa.ops.oa_hw_tail_read =
 				gen8_oa_hw_tail_read;
+			dev_priv->perf.oa.ops.get_ctx_id =
+				gen8_oa_buffer_get_ctx_id;
 
 			dev_priv->perf.oa.oa_formats = gen8_plus_oa_formats;
 		}
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 07/12] drm/i915: Add support for having pid output with OA report
  2017-07-31  7:59 [PATCH 00/12] i915 perf support for command stream based OA, GPU and workload metrics capture Sagar Arun Kamble
                   ` (5 preceding siblings ...)
  2017-07-31  7:59 ` [PATCH 06/12] drm/i915: Populate ctx ID for periodic OA reports Sagar Arun Kamble
@ 2017-07-31  7:59 ` Sagar Arun Kamble
  2017-07-31 19:24   ` kbuild test robot
  2017-07-31  7:59 ` [PATCH 08/12] drm/i915: Add support for emitting execbuffer tags through OA counter reports Sagar Arun Kamble
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 34+ messages in thread
From: Sagar Arun Kamble @ 2017-07-31  7:59 UTC (permalink / raw)
  To: intel-gfx; +Cc: Sourab Gupta

From: Sourab Gupta <sourab.gupta@intel.com>

This patch introduces flags and adds support for having pid output with
the OA reports generated through the RCS commands.

When the stream is opened with pid sample type, the pid information is also
captured through the command stream samples and forwarded along with the
OA reports.

v2: Changed payload field pid to u64 to keep all sample data aligned at 8
bytes. (Lionel)

Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  7 ++++++
 drivers/gpu/drm/i915/i915_perf.c | 48 +++++++++++++++++++++++++++++++++++++++-
 include/uapi/drm/i915_drm.h      |  7 ++++++
 3 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6c011f3..b56ea20 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2079,6 +2079,7 @@ struct i915_perf_stream {
 	bool pollin;
 
 	u32 last_ctx_id;
+	u32 last_pid;
 };
 
 /**
@@ -2189,6 +2190,12 @@ struct i915_perf_cs_sample {
 	 * @ctx_id: Context ID associated with this perf sample
 	 */
 	u32 ctx_id;
+
+	/**
+	 * @pid: PID of the process in context of which the workload was
+	 * submitted, pertaining to this perf sample
+	 */
+	u32 pid;
 };
 
 struct intel_cdclk_state {
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 1f5ebdb..5ac1a41 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -293,6 +293,7 @@
 struct i915_perf_sample_data {
 	u64 source;
 	u64 ctx_id;
+	u64 pid;
 	const u8 *report;
 };
 
@@ -348,6 +349,7 @@ struct i915_perf_sample_data {
 #define SAMPLE_OA_REPORT      (1<<0)
 #define SAMPLE_OA_SOURCE      (1<<1)
 #define SAMPLE_CTX_ID	      (1<<2)
+#define SAMPLE_PID	      (1<<3)
 
 /**
  * struct perf_open_properties - for validated properties given to open a stream
@@ -608,6 +610,7 @@ static void i915_perf_stream_emit_sample_capture(
 
 	sample->request = i915_gem_request_get(request);
 	sample->ctx_id = request->ctx->hw_id;
+	sample->pid = current->pid;
 
 	insert_perf_sample(stream, sample);
 
@@ -924,6 +927,12 @@ static int append_perf_sample(struct i915_perf_stream *stream,
 		buf += 8;
 	}
 
+	if (sample_flags & SAMPLE_PID) {
+		if (copy_to_user(buf, &data->pid, 8))
+			return -EFAULT;
+		buf += 8;
+	}
+
 	if (sample_flags & SAMPLE_OA_REPORT) {
 		if (copy_to_user(buf, data->report, report_size))
 			return -EFAULT;
@@ -961,6 +970,9 @@ static int append_oa_buffer_sample(struct i915_perf_stream *stream,
 		data.ctx_id = dev_priv->perf.oa.ops.get_ctx_id(stream, report);
 	}
 
+	if (sample_flags & SAMPLE_PID)
+		data.pid = stream->last_pid;
+
 	if (sample_flags & SAMPLE_OA_REPORT)
 		data.report = report;
 
@@ -1558,6 +1570,11 @@ static int append_cs_buffer_sample(struct i915_perf_stream *stream,
 		stream->last_ctx_id = data.ctx_id;
 	}
 
+	if (sample_flags & SAMPLE_PID) {
+		data.pid = node->pid;
+		stream->last_pid = node->pid;
+	}
+
 	return append_perf_sample(stream, buf, count, offset, &data);
 }
 
@@ -2719,6 +2736,7 @@ static int i915_perf_stream_init(struct i915_perf_stream *stream,
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	bool require_oa_unit = props->sample_flags & (SAMPLE_OA_REPORT |
 						      SAMPLE_OA_SOURCE);
+	bool require_cs_mode = props->sample_flags & SAMPLE_PID;
 	bool cs_sample_data = props->sample_flags & SAMPLE_OA_REPORT;
 	struct i915_perf_stream *curr_stream;
 	struct intel_engine_cs *engine = NULL;
@@ -2866,6 +2884,20 @@ static int i915_perf_stream_init(struct i915_perf_stream *stream,
 	if (props->sample_flags & SAMPLE_CTX_ID) {
 		stream->sample_flags |= SAMPLE_CTX_ID;
 		stream->sample_size += 8;
+
+		/*
+		 * NB: it's meaningful to request SAMPLE_CTX_ID with just CS
+		 * mode or periodic OA mode sampling but we don't allow
+		 * SAMPLE_CTX_ID without either mode
+		 */
+		if (!require_oa_unit)
+			require_cs_mode = true;
+	}
+
+	if (require_cs_mode && !props->cs_mode) {
+		DRM_ERROR("PID sampling requires a ring to be specified");
+		ret = -EINVAL;
+		goto err_enable;
 	}
 
 	if (props->cs_mode) {
@@ -2875,12 +2907,23 @@ static int i915_perf_stream_init(struct i915_perf_stream *stream,
 			goto err_enable;
 		}
 
-		if (!(props->sample_flags & SAMPLE_CTX_ID)) {
+		/*
+		 * The only time we should allow enabling CS mode if it's not
+		 * strictly required, is if SAMPLE_CTX_ID has been requested
+		 * as it's usable with periodic OA or CS sampling.
+		 */
+		if (!require_cs_mode &&
+		    !(props->sample_flags & SAMPLE_CTX_ID)) {
 			DRM_ERROR("Stream engine given without requesting any CS specific property\n");
 			ret = -EINVAL;
 			goto err_enable;
 		}
 
+		if (props->sample_flags & SAMPLE_PID) {
+			stream->sample_flags |= SAMPLE_PID;
+			stream->sample_size += 8;
+		}
+
 		engine = dev_priv->engine[props->engine];
 
 		idx = srcu_read_lock(&engine->perf_srcu);
@@ -3595,6 +3638,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 		case DRM_I915_PERF_PROP_SAMPLE_CTX_ID:
 			props->sample_flags |= SAMPLE_CTX_ID;
 			break;
+		case DRM_I915_PERF_PROP_SAMPLE_PID:
+			props->sample_flags |= SAMPLE_PID;
+			break;
 		case DRM_I915_PERF_PROP_MAX:
 			MISSING_CASE(id);
 			return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 768b1a5..34d8e41 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1408,6 +1408,12 @@ enum drm_i915_perf_property_id {
 	 */
 	DRM_I915_PERF_PROP_SAMPLE_CTX_ID,
 
+	/**
+	 * The value of this property set to 1 requests inclusion of pid in the
+	 * perf sample data.
+	 */
+	DRM_I915_PERF_PROP_SAMPLE_PID,
+
 	DRM_I915_PERF_PROP_MAX /* non-ABI */
 };
 
@@ -1475,6 +1481,7 @@ enum drm_i915_perf_record_type {
 	 *
 	 *     { u64 source; } && DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE
 	 *     { u64 ctx_id; } && DRM_I915_PERF_PROP_SAMPLE_CTX_ID
+	 *     { u64 pid; } && DRM_I915_PERF_PROP_SAMPLE_PID
 	 *     { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
 	 * };
 	 */
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 08/12] drm/i915: Add support for emitting execbuffer tags through OA counter reports
  2017-07-31  7:59 [PATCH 00/12] i915 perf support for command stream based OA, GPU and workload metrics capture Sagar Arun Kamble
                   ` (6 preceding siblings ...)
  2017-07-31  7:59 ` [PATCH 07/12] drm/i915: Add support for having pid output with OA report Sagar Arun Kamble
@ 2017-07-31  7:59 ` Sagar Arun Kamble
  2017-07-31  7:59 ` [PATCH 09/12] drm/i915: Add support for collecting timestamps on all gpu engines Sagar Arun Kamble
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Sagar Arun Kamble @ 2017-07-31  7:59 UTC (permalink / raw)
  To: intel-gfx; +Cc: Sourab Gupta

From: Sourab Gupta <sourab.gupta@intel.com>

This patch enables userspace to specify tags (per workload), provided via
execbuffer ioctl, which could be added to OA reports, to help associate
reports with the corresponding workloads.

There may be multiple stages within a single context, from a userspace
perspective. An ability is needed to individually associate the OA reports
with their corresponding workloads(execbuffers), which may not be possible
solely with ctx_id or pid information. This patch enables such a mechanism.

In this patch, upper 32 bits of rsvd1 field, which were previously unused
are now being used to pass in the tag.

v2: Updated i915_execbuffer2_get_tag to get the tag properly. (Sagar)
Changed tag size to 64 bit to ensure all sample fields are aligned at 8
bytes. (Lionel)

Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            | 18 +++++++++++--
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 11 +++++---
 drivers/gpu/drm/i915/i915_perf.c           | 41 ++++++++++++++++++++++++++----
 include/uapi/drm/i915_drm.h                | 12 +++++++++
 4 files changed, 71 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b56ea20..c4f7462 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1996,7 +1996,8 @@ struct i915_perf_stream_ops {
 	 */
 	void (*emit_sample_capture)(struct i915_perf_stream *stream,
 				    struct drm_i915_gem_request *request,
-				    bool preallocate);
+				    bool preallocate,
+				    u32 tag);
 };
 
 enum i915_perf_stream_state {
@@ -2080,6 +2081,7 @@ struct i915_perf_stream {
 
 	u32 last_ctx_id;
 	u32 last_pid;
+	u32 last_tag;
 };
 
 /**
@@ -2196,6 +2198,17 @@ struct i915_perf_cs_sample {
 	 * submitted, pertaining to this perf sample
 	 */
 	u32 pid;
+
+	/**
+	 * @tag: Tag associated with workload, for which the perf sample is
+	 * being collected.
+	 *
+	 * Userspace can specify tags (provided via execbuffer ioctl), which
+	 * can be associated with the perf samples, and be used to functionally
+	 * distinguish different workload stages, and associate samples with
+	 * these different stages.
+	 */
+	u32 tag;
 };
 
 struct intel_cdclk_state {
@@ -3723,7 +3736,8 @@ void i915_oa_init_reg_state(struct intel_engine_cs *engine,
 			    struct i915_gem_context *ctx,
 			    uint32_t *reg_state);
 void i915_perf_emit_sample_capture(struct drm_i915_gem_request *req,
-				   bool preallocate);
+				   bool preallocate,
+				   u32 tag);
 
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct i915_address_space *vm,
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index bfe546b..92585df 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -242,6 +242,7 @@ struct i915_execbuffer {
 	 */
 	int lut_size;
 	struct hlist_head *buckets; /** ht for relocation handles */
+	uint32_t tag;
 };
 
 /*
@@ -1194,7 +1195,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 	if (err)
 		goto err_request;
 
-	i915_perf_emit_sample_capture(rq, true);
+	i915_perf_emit_sample_capture(rq, true, eb->tag);
 
 	err = eb->engine->emit_bb_start(rq,
 					batch->node.start, PAGE_SIZE,
@@ -1202,7 +1203,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 	if (err)
 		goto err_request;
 
-	i915_perf_emit_sample_capture(rq, false);
+	i915_perf_emit_sample_capture(rq, false, eb->tag);
 
 	GEM_BUG_ON(!reservation_object_test_signaled_rcu(batch->resv, true));
 	i915_vma_move_to_active(batch, rq, 0);
@@ -2033,7 +2034,7 @@ static int eb_submit(struct i915_execbuffer *eb)
 			return err;
 	}
 
-	i915_perf_emit_sample_capture(eb->request, true);
+	i915_perf_emit_sample_capture(eb->request, true, eb->tag);
 
 	err = eb->engine->emit_bb_start(eb->request,
 					eb->batch->node.start +
@@ -2043,7 +2044,7 @@ static int eb_submit(struct i915_execbuffer *eb)
 	if (err)
 		return err;
 
-	i915_perf_emit_sample_capture(eb->request, false);
+	i915_perf_emit_sample_capture(eb->request, false, eb->tag);
 
 	return 0;
 }
@@ -2168,6 +2169,8 @@ static int eb_submit(struct i915_execbuffer *eb)
 	if (!eb.engine)
 		return -EINVAL;
 
+	eb.tag	= i915_execbuffer2_get_tag(*args);
+
 	if (args->flags & I915_EXEC_RESOURCE_STREAMER) {
 		if (!HAS_RESOURCE_STREAMER(eb.i915)) {
 			DRM_DEBUG("RS is only allowed for Haswell, Gen8 and above\n");
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 5ac1a41..c7f8e7f 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -294,6 +294,7 @@ struct i915_perf_sample_data {
 	u64 source;
 	u64 ctx_id;
 	u64 pid;
+	u64 tag;
 	const u8 *report;
 };
 
@@ -350,6 +351,7 @@ struct i915_perf_sample_data {
 #define SAMPLE_OA_SOURCE      (1<<1)
 #define SAMPLE_CTX_ID	      (1<<2)
 #define SAMPLE_PID	      (1<<3)
+#define SAMPLE_TAG	      (1<<4)
 
 /**
  * struct perf_open_properties - for validated properties given to open a stream
@@ -402,12 +404,14 @@ static u32 gen7_oa_hw_tail_read(struct drm_i915_private *dev_priv)
  * the command stream of a GPU engine.
  * @request: request in whose context the metrics are being collected.
  * @preallocate: allocate space in ring for related sample.
+ * @tag: userspace provided tag to be associated with the perf sample
  *
  * The function provides a hook through which the commands to capture perf
  * metrics, are inserted into the command stream of a GPU engine.
  */
 void i915_perf_emit_sample_capture(struct drm_i915_gem_request *request,
-				   bool preallocate)
+				   bool preallocate,
+				   u32 tag)
 {
 	struct intel_engine_cs *engine = request->engine;
 	struct drm_i915_private *dev_priv = engine->i915;
@@ -422,7 +426,8 @@ void i915_perf_emit_sample_capture(struct drm_i915_gem_request *request,
 	if (stream && (stream->state == I915_PERF_STREAM_ENABLED) &&
 				stream->cs_mode)
 		stream->ops->emit_sample_capture(stream, request,
-						 preallocate);
+						 preallocate, tag);
+
 	srcu_read_unlock(&engine->perf_srcu, idx);
 }
 
@@ -591,11 +596,13 @@ static int i915_emit_oa_report_capture(
  * @stream: An i915-perf stream opened for GPU metrics
  * @request: request in whose context the metrics are being collected.
  * @preallocate: allocate space in ring for related sample.
+ * @tag: userspace provided tag to be associated with the perf sample
  */
 static void i915_perf_stream_emit_sample_capture(
 					struct i915_perf_stream *stream,
 					struct drm_i915_gem_request *request,
-					bool preallocate)
+					bool preallocate,
+					u32 tag)
 {
 	struct reservation_object *resv = stream->cs_buffer.vma->resv;
 	struct i915_perf_cs_sample *sample;
@@ -611,6 +618,7 @@ static void i915_perf_stream_emit_sample_capture(
 	sample->request = i915_gem_request_get(request);
 	sample->ctx_id = request->ctx->hw_id;
 	sample->pid = current->pid;
+	sample->tag = tag;
 
 	insert_perf_sample(stream, sample);
 
@@ -933,6 +941,12 @@ static int append_perf_sample(struct i915_perf_stream *stream,
 		buf += 8;
 	}
 
+	if (sample_flags & SAMPLE_TAG) {
+		if (copy_to_user(buf, &data->tag, 8))
+			return -EFAULT;
+		buf += 8;
+	}
+
 	if (sample_flags & SAMPLE_OA_REPORT) {
 		if (copy_to_user(buf, data->report, report_size))
 			return -EFAULT;
@@ -973,6 +987,9 @@ static int append_oa_buffer_sample(struct i915_perf_stream *stream,
 	if (sample_flags & SAMPLE_PID)
 		data.pid = stream->last_pid;
 
+	if (sample_flags & SAMPLE_TAG)
+		data.tag = stream->last_tag;
+
 	if (sample_flags & SAMPLE_OA_REPORT)
 		data.report = report;
 
@@ -1575,6 +1592,11 @@ static int append_cs_buffer_sample(struct i915_perf_stream *stream,
 		stream->last_pid = node->pid;
 	}
 
+	if (sample_flags & SAMPLE_TAG) {
+		data.tag = node->tag;
+		stream->last_tag = node->tag;
+	}
+
 	return append_perf_sample(stream, buf, count, offset, &data);
 }
 
@@ -2736,7 +2758,8 @@ static int i915_perf_stream_init(struct i915_perf_stream *stream,
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	bool require_oa_unit = props->sample_flags & (SAMPLE_OA_REPORT |
 						      SAMPLE_OA_SOURCE);
-	bool require_cs_mode = props->sample_flags & SAMPLE_PID;
+	bool require_cs_mode = props->sample_flags & (SAMPLE_PID |
+						      SAMPLE_TAG);
 	bool cs_sample_data = props->sample_flags & SAMPLE_OA_REPORT;
 	struct i915_perf_stream *curr_stream;
 	struct intel_engine_cs *engine = NULL;
@@ -2895,7 +2918,7 @@ static int i915_perf_stream_init(struct i915_perf_stream *stream,
 	}
 
 	if (require_cs_mode && !props->cs_mode) {
-		DRM_ERROR("PID sampling requires a ring to be specified");
+		DRM_ERROR("PID/TAG sampling requires a ring to be specified");
 		ret = -EINVAL;
 		goto err_enable;
 	}
@@ -2924,6 +2947,11 @@ static int i915_perf_stream_init(struct i915_perf_stream *stream,
 			stream->sample_size += 8;
 		}
 
+		if (props->sample_flags & SAMPLE_TAG) {
+			stream->sample_flags |= SAMPLE_TAG;
+			stream->sample_size += 8;
+		}
+
 		engine = dev_priv->engine[props->engine];
 
 		idx = srcu_read_lock(&engine->perf_srcu);
@@ -3641,6 +3669,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 		case DRM_I915_PERF_PROP_SAMPLE_PID:
 			props->sample_flags |= SAMPLE_PID;
 			break;
+		case DRM_I915_PERF_PROP_SAMPLE_TAG:
+			props->sample_flags |= SAMPLE_TAG;
+			break;
 		case DRM_I915_PERF_PROP_MAX:
 			MISSING_CASE(id);
 			return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 34d8e41..0e522d4 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -935,6 +935,11 @@ struct drm_i915_gem_execbuffer2 {
 #define i915_execbuffer2_get_context_id(eb2) \
 	((eb2).rsvd1 & I915_EXEC_CONTEXT_ID_MASK)
 
+/* upper 32 bits of rsvd1 field contain tag */
+#define I915_EXEC_TAG_MASK		(0xffffffff00000000UL)
+#define i915_execbuffer2_get_tag(eb2) \
+	(((eb2).rsvd1 & I915_EXEC_TAG_MASK) >> 32)
+
 struct drm_i915_gem_pin {
 	/** Handle of the buffer to be pinned. */
 	__u32 handle;
@@ -1414,6 +1419,12 @@ enum drm_i915_perf_property_id {
 	 */
 	DRM_I915_PERF_PROP_SAMPLE_PID,
 
+	/**
+	 * The value of this property set to 1 requests inclusion of tag in the
+	 * perf sample data.
+	 */
+	DRM_I915_PERF_PROP_SAMPLE_TAG,
+
 	DRM_I915_PERF_PROP_MAX /* non-ABI */
 };
 
@@ -1482,6 +1493,7 @@ enum drm_i915_perf_record_type {
 	 *     { u64 source; } && DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE
 	 *     { u64 ctx_id; } && DRM_I915_PERF_PROP_SAMPLE_CTX_ID
 	 *     { u64 pid; } && DRM_I915_PERF_PROP_SAMPLE_PID
+	 *     { u64 tag; } && DRM_I915_PERF_PROP_SAMPLE_TAG
 	 *     { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
 	 * };
 	 */
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 09/12] drm/i915: Add support for collecting timestamps on all gpu engines
  2017-07-31  7:59 [PATCH 00/12] i915 perf support for command stream based OA, GPU and workload metrics capture Sagar Arun Kamble
                   ` (7 preceding siblings ...)
  2017-07-31  7:59 ` [PATCH 08/12] drm/i915: Add support for emitting execbuffer tags through OA counter reports Sagar Arun Kamble
@ 2017-07-31  7:59 ` Sagar Arun Kamble
  2017-07-31  7:59 ` [PATCH 10/12] drm/i915: Extract raw GPU timestamps from OA reports to forward in perf samples Sagar Arun Kamble
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Sagar Arun Kamble @ 2017-07-31  7:59 UTC (permalink / raw)
  To: intel-gfx; +Cc: Sourab Gupta

From: Sourab Gupta <sourab.gupta@intel.com>

With this patch, for RCS, timestamps and OA reports can be collected
together, and provided to userspace in separate sample fields. For other
engines, the capabilility to collect timestamps is added.

The thing to note is that, still only a single stream instance can be
opened at any particular time. Though that stream may now be opened for any
gpu engine, for collection of timestamp samples.

So, this patch doesn't add the support to open multiple concurrent streams,
as yet.

Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  25 ++++-
 drivers/gpu/drm/i915/i915_perf.c | 229 ++++++++++++++++++++++++++++++++-------
 drivers/gpu/drm/i915/i915_reg.h  |   2 +
 include/uapi/drm/i915_drm.h      |   7 ++
 4 files changed, 222 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c4f7462..0763280 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2182,11 +2182,30 @@ struct i915_perf_cs_sample {
 	struct drm_i915_gem_request *request;
 
 	/**
-	 * @offset: Offset into ``&stream->cs_buffer``
-	 * where the perf metrics will be collected, when the commands inserted
+	 * @start_offset: Offset into ``&stream->cs_buffer
+	 * where the metrics will be collected, when the commands inserted
 	 * into the command stream are executed by GPU.
 	 */
-	u32 offset;
+	u32 start_offset;
+
+	/**
+	 * @oa_offset: Offset into ``&stream->cs_buffer
+	 * where the OA report will be collected (if the stream is configured
+	 * for collection of OA samples).
+	 */
+	u32 oa_offset;
+
+	/**
+	 * @ts_offset: Offset into ``&stream->cs_buffer
+	 * where the timestamps will be collected (if the stream is configured
+	 * for collection of timestamp data)
+	 */
+	u32 ts_offset;
+
+	/**
+	 * @size: buffer size corresponding to this perf sample
+	 */
+	u32 size;
 
 	/**
 	 * @ctx_id: Context ID associated with this perf sample
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index c7f8e7f..2c7ab98 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -289,12 +289,17 @@
 #define OAREPORT_REASON_CTX_SWITCH     (1<<3)
 #define OAREPORT_REASON_CLK_RATIO      (1<<5)
 
-/* Data common to periodic and RCS based OA samples */
+#define OA_ADDR_ALIGN 64
+#define TS_ADDR_ALIGN 8
+#define I915_PERF_TS_SAMPLE_SIZE 8
+
+/*Data common to perf samples (periodic OA / CS based OA / Timestamps)*/
 struct i915_perf_sample_data {
 	u64 source;
 	u64 ctx_id;
 	u64 pid;
 	u64 tag;
+	u64 ts;
 	const u8 *report;
 };
 
@@ -352,6 +357,7 @@ struct i915_perf_sample_data {
 #define SAMPLE_CTX_ID	      (1<<2)
 #define SAMPLE_PID	      (1<<3)
 #define SAMPLE_TAG	      (1<<4)
+#define SAMPLE_TS	      (1<<5)
 
 /**
  * struct perf_open_properties - for validated properties given to open a stream
@@ -446,14 +452,12 @@ void i915_perf_emit_sample_capture(struct drm_i915_gem_request *request,
 static void release_perf_samples(struct i915_perf_stream *stream,
 				 u32 target_size)
 {
-	struct drm_i915_private *dev_priv = stream->dev_priv;
 	struct i915_perf_cs_sample *sample, *next;
-	u32 sample_size = dev_priv->perf.oa.oa_buffer.format_size;
 	u32 size = 0;
 
 	list_for_each_entry_safe
 		(sample, next, &stream->cs_samples, link) {
-		size += sample_size;
+		size += sample->size;
 		i915_gem_request_put(sample->request);
 		list_del(&sample->link);
 		kfree(sample);
@@ -478,15 +482,24 @@ static void insert_perf_sample(struct i915_perf_stream *stream,
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	struct i915_perf_cs_sample *first, *last;
 	int max_offset = stream->cs_buffer.vma->obj->base.size;
-	u32 sample_size = dev_priv->perf.oa.oa_buffer.format_size;
 	unsigned long flags;
+	u32 offset, sample_size = 0;
+
+	if (stream->sample_flags & SAMPLE_OA_REPORT)
+		sample_size += dev_priv->perf.oa.oa_buffer.format_size;
+	else if (stream->sample_flags & SAMPLE_TS) {
+		/*
+		 * XXX: Since TS data can anyways be derived from OA report, so
+		 * no need to capture it for RCS engine, if capture oa data is
+		 * called already.
+		 */
+		sample_size += I915_PERF_TS_SAMPLE_SIZE;
+	}
 
 	spin_lock_irqsave(&stream->cs_samples_lock, flags);
 	if (list_empty(&stream->cs_samples)) {
-		sample->offset = 0;
-		list_add_tail(&sample->link, &stream->cs_samples);
-		spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
-		return;
+		offset = 0;
+		goto out;
 	}
 
 	first = list_first_entry(&stream->cs_samples, typeof(*first),
@@ -494,41 +507,61 @@ static void insert_perf_sample(struct i915_perf_stream *stream,
 	last = list_last_entry(&stream->cs_samples, typeof(*last),
 				link);
 
-	if (last->offset >= first->offset) {
+	if (last->start_offset >= first->start_offset) {
 		/* Sufficient space available at the end of buffer? */
-		if (last->offset + 2*sample_size < max_offset)
-			sample->offset = last->offset + sample_size;
+		if (last->start_offset + last->size + sample_size < max_offset)
+			offset = last->start_offset + last->size;
 		/*
 		 * Wraparound condition. Is sufficient space available at
 		 * beginning of buffer?
 		 */
-		else if (sample_size < first->offset)
-			sample->offset = 0;
+		else if (sample_size < first->start_offset)
+			offset = 0;
 		/* Insufficient space. Overwrite existing old entries */
 		else {
-			u32 target_size = sample_size - first->offset;
+			u32 target_size = sample_size - first->start_offset;
 
 			stream->cs_buffer.status |=
 				I915_PERF_CMD_STREAM_BUF_STATUS_OVERFLOW;
 			release_perf_samples(stream, target_size);
-			sample->offset = 0;
+			offset = 0;
 		}
 	} else {
 		/* Sufficient space available? */
-		if (last->offset + 2*sample_size < first->offset)
-			sample->offset = last->offset + sample_size;
+		if (last->start_offset + last->size + sample_size
+						< first->start_offset)
+			offset = last->start_offset + last->size;
+
 		/* Insufficient space. Overwrite existing old entries */
 		else {
 			u32 target_size = sample_size -
-				(first->offset - last->offset -
-				sample_size);
+				(first->start_offset - last->start_offset -
+				last->size);
 
 			stream->cs_buffer.status |=
 				I915_PERF_CMD_STREAM_BUF_STATUS_OVERFLOW;
 			release_perf_samples(stream, target_size);
-			sample->offset = last->offset + sample_size;
+			offset = last->start_offset + sample_size;
 		}
 	}
+
+out:
+	sample->start_offset = offset;
+	sample->size = sample_size;
+	if (stream->sample_flags & SAMPLE_OA_REPORT) {
+		sample->oa_offset = offset;
+		/* Ensure 64 byte alignment of oa_offset */
+		sample->oa_offset = ALIGN(sample->oa_offset, OA_ADDR_ALIGN);
+		offset = sample->oa_offset +
+			 dev_priv->perf.oa.oa_buffer.format_size;
+	}
+	if (stream->sample_flags & SAMPLE_TS) {
+		sample->ts_offset = offset;
+		/* Ensure 8 byte alignment of ts_offset */
+		sample->ts_offset = ALIGN(sample->ts_offset, TS_ADDR_ALIGN);
+		offset = sample->ts_offset + I915_PERF_TS_SAMPLE_SIZE;
+	}
+
 	list_add_tail(&sample->link, &stream->cs_samples);
 	spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
 }
@@ -591,6 +624,82 @@ static int i915_emit_oa_report_capture(
 }
 
 /**
+ * i915_emit_ts_capture - Insert the commands to capture timestamp
+ * data into the GPU command stream
+ * @request: request in whose context the timestamps are being collected.
+ * @preallocate: allocate space in ring for related sample.
+ * @offset: command stream buffer offset where the timestamp data needs to be
+ * collected
+ */
+static int i915_emit_ts_capture(struct drm_i915_gem_request *request,
+					 bool preallocate,
+					 u32 offset)
+{
+	struct drm_i915_private *dev_priv = request->i915;
+	struct intel_engine_cs *engine = request->engine;
+	struct i915_perf_stream *stream;
+	u32 addr = 0;
+	u32 cmd, len = 6, *cs;
+	int idx;
+
+	if (preallocate)
+		request->reserved_space += len;
+	else
+		request->reserved_space -= len;
+
+	cs = intel_ring_begin(request, 6);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	idx = srcu_read_lock(&engine->perf_srcu);
+	stream = rcu_dereference(engine->exclusive_stream);
+	addr = stream->cs_buffer.vma->node.start + offset;
+	srcu_read_unlock(&engine->perf_srcu, idx);
+
+	if (request->engine->id == RCS) {
+		if (INTEL_GEN(dev_priv) >= 8)
+			cmd = GFX_OP_PIPE_CONTROL(6);
+		else
+			cmd = GFX_OP_PIPE_CONTROL(5);
+
+		*cs++ = cmd;
+		*cs++ = PIPE_CONTROL_GLOBAL_GTT_IVB |
+				PIPE_CONTROL_TIMESTAMP_WRITE;
+		*cs++ = addr | PIPE_CONTROL_GLOBAL_GTT;
+		*cs++ = 0;
+		*cs++ = 0;
+
+		if (INTEL_GEN(dev_priv) >= 8)
+			*cs++ = 0;
+		else
+			*cs++ = MI_NOOP;
+	} else {
+		uint32_t cmd;
+
+		cmd = MI_FLUSH_DW + 1;
+		if (INTEL_GEN(dev_priv) >= 8)
+			cmd += 1;
+
+		cmd |= MI_FLUSH_DW_OP_STAMP;
+
+		*cs++ = cmd;
+		*cs++ = addr | MI_FLUSH_DW_USE_GTT;
+		*cs++ = 0;
+		*cs++ = 0;
+
+		if (INTEL_GEN(dev_priv) >= 8)
+			*cs++ = 0;
+		else
+			*cs++ = MI_NOOP;
+		*cs++ = MI_NOOP;
+	}
+
+	intel_ring_advance(request, cs);
+
+	return 0;
+}
+
+/**
  * i915_perf_stream_emit_sample_capture - Insert the commands to capture perf
  * metrics into the GPU command stream
  * @stream: An i915-perf stream opened for GPU metrics
@@ -625,7 +734,17 @@ static void i915_perf_stream_emit_sample_capture(
 	if (stream->sample_flags & SAMPLE_OA_REPORT) {
 		ret = i915_emit_oa_report_capture(request,
 						  preallocate,
-						  sample->offset);
+						  sample->oa_offset);
+		if (ret)
+			goto err_unref;
+	} else if (stream->sample_flags & SAMPLE_TS) {
+		/*
+		 * XXX: Since TS data can anyways be derived from OA report, so
+		 * no need to capture it for RCS engine, if capture oa data is
+		 * called already.
+		 */
+		ret = i915_emit_ts_capture(request, preallocate,
+						    sample->ts_offset);
 		if (ret)
 			goto err_unref;
 	}
@@ -947,6 +1066,12 @@ static int append_perf_sample(struct i915_perf_stream *stream,
 		buf += 8;
 	}
 
+	if (sample_flags & SAMPLE_TS) {
+		if (copy_to_user(buf, &data->ts, I915_PERF_TS_SAMPLE_SIZE))
+			return -EFAULT;
+		buf += I915_PERF_TS_SAMPLE_SIZE;
+	}
+
 	if (sample_flags & SAMPLE_OA_REPORT) {
 		if (copy_to_user(buf, data->report, report_size))
 			return -EFAULT;
@@ -990,6 +1115,12 @@ static int append_oa_buffer_sample(struct i915_perf_stream *stream,
 	if (sample_flags & SAMPLE_TAG)
 		data.tag = stream->last_tag;
 
+	/* TODO: Derive timestamp from OA report,
+	 * after scaling with the ts base
+	 */
+	if (sample_flags & SAMPLE_TS)
+		data.ts = 0;
+
 	if (sample_flags & SAMPLE_OA_REPORT)
 		data.report = report;
 
@@ -1565,7 +1696,8 @@ static int append_cs_buffer_sample(struct i915_perf_stream *stream,
 	int ret = 0;
 
 	if (sample_flags & SAMPLE_OA_REPORT) {
-		const u8 *report = stream->cs_buffer.vaddr + node->offset;
+		const u8 *report = stream->cs_buffer.vaddr + node->oa_offset;
+
 		u32 sample_ts = *(u32 *)(report + 4);
 
 		data.report = report;
@@ -1597,6 +1729,19 @@ static int append_cs_buffer_sample(struct i915_perf_stream *stream,
 		stream->last_tag = node->tag;
 	}
 
+	if (sample_flags & SAMPLE_TS) {
+		/* For RCS, if OA samples are also being collected, derive the
+		 * timestamp from OA report, after scaling with the TS base.
+		 * Else, forward the timestamp collected via command stream.
+		 */
+		/* TODO: derive the timestamp from OA report */
+		if (sample_flags & SAMPLE_OA_REPORT)
+			data.ts = 0;
+		else
+			data.ts = *(u64 *) (stream->cs_buffer.vaddr +
+					   node->ts_offset);
+	}
+
 	return append_perf_sample(stream, buf, count, offset, &data);
 }
 
@@ -2760,7 +2905,8 @@ static int i915_perf_stream_init(struct i915_perf_stream *stream,
 						      SAMPLE_OA_SOURCE);
 	bool require_cs_mode = props->sample_flags & (SAMPLE_PID |
 						      SAMPLE_TAG);
-	bool cs_sample_data = props->sample_flags & SAMPLE_OA_REPORT;
+	bool cs_sample_data = props->sample_flags & (SAMPLE_OA_REPORT |
+							SAMPLE_TS);
 	struct i915_perf_stream *curr_stream;
 	struct intel_engine_cs *engine = NULL;
 	int idx;
@@ -2917,8 +3063,21 @@ static int i915_perf_stream_init(struct i915_perf_stream *stream,
 			require_cs_mode = true;
 	}
 
+	if (props->sample_flags & SAMPLE_TS) {
+		stream->sample_flags |= SAMPLE_TS;
+		stream->sample_size += I915_PERF_TS_SAMPLE_SIZE;
+
+		/*
+		 * NB: it's meaningful to request SAMPLE_TS with just CS
+		 * mode or periodic OA mode sampling but we don't allow
+		 * SAMPLE_TS without either mode
+		 */
+		if (!require_oa_unit)
+			require_cs_mode = true;
+	}
+
 	if (require_cs_mode && !props->cs_mode) {
-		DRM_ERROR("PID/TAG sampling requires a ring to be specified");
+		DRM_ERROR("PID/TAG/TS sampling requires engine to be specified");
 		ret = -EINVAL;
 		goto err_enable;
 	}
@@ -2932,11 +3091,11 @@ static int i915_perf_stream_init(struct i915_perf_stream *stream,
 
 		/*
 		 * The only time we should allow enabling CS mode if it's not
-		 * strictly required, is if SAMPLE_CTX_ID has been requested
-		 * as it's usable with periodic OA or CS sampling.
+		 * strictly required, is if SAMPLE_CTX_ID/SAMPLE_TS has been
+		 * requested as they're usable with periodic OA or CS sampling.
 		 */
 		if (!require_cs_mode &&
-		    !(props->sample_flags & SAMPLE_CTX_ID)) {
+		    !(props->sample_flags & (SAMPLE_CTX_ID | SAMPLE_TS))) {
 			DRM_ERROR("Stream engine given without requesting any CS specific property\n");
 			ret = -EINVAL;
 			goto err_enable;
@@ -3646,21 +3805,12 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 		case DRM_I915_PERF_PROP_ENGINE: {
 				unsigned int user_ring_id =
 					value & I915_EXEC_RING_MASK;
-				enum intel_engine_id engine;
 
 				if (user_ring_id > I915_USER_RINGS)
 					return -EINVAL;
 
-				/* XXX: Currently only RCS is supported.
-				 * Remove this check when support for other
-				 * engines is added
-				 */
-				engine = user_ring_map[user_ring_id];
-				if (engine != RCS)
-					return -EINVAL;
-
 				props->cs_mode = true;
-				props->engine = engine;
+				props->engine = user_ring_map[user_ring_id];
 			}
 			break;
 		case DRM_I915_PERF_PROP_SAMPLE_CTX_ID:
@@ -3672,6 +3822,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 		case DRM_I915_PERF_PROP_SAMPLE_TAG:
 			props->sample_flags |= SAMPLE_TAG;
 			break;
+		case DRM_I915_PERF_PROP_SAMPLE_TS:
+			props->sample_flags |= SAMPLE_TS;
+			break;
 		case DRM_I915_PERF_PROP_MAX:
 			MISSING_CASE(id);
 			return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 1dc7e7a..ecd5794 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -434,6 +434,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define   MI_FLUSH_DW_STORE_INDEX	(1<<21)
 #define   MI_INVALIDATE_TLB		(1<<18)
 #define   MI_FLUSH_DW_OP_STOREDW	(1<<14)
+#define   MI_FLUSH_DW_OP_STAMP		(3<<14)
 #define   MI_FLUSH_DW_OP_MASK		(3<<14)
 #define   MI_FLUSH_DW_NOTIFY		(1<<8)
 #define   MI_INVALIDATE_BSD		(1<<7)
@@ -517,6 +518,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define   PIPE_CONTROL_TLB_INVALIDATE			(1<<18)
 #define   PIPE_CONTROL_MEDIA_STATE_CLEAR		(1<<16)
 #define   PIPE_CONTROL_QW_WRITE				(1<<14)
+#define   PIPE_CONTROL_TIMESTAMP_WRITE			(3<<14)
 #define   PIPE_CONTROL_POST_SYNC_OP_MASK                (3<<14)
 #define   PIPE_CONTROL_DEPTH_STALL			(1<<13)
 #define   PIPE_CONTROL_WRITE_FLUSH			(1<<12)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 0e522d4..4d27075 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1425,6 +1425,12 @@ enum drm_i915_perf_property_id {
 	 */
 	DRM_I915_PERF_PROP_SAMPLE_TAG,
 
+	/**
+	 * The value of this property set to 1 requests inclusion of timestamp
+	 * in the perf sample data.
+	 */
+	DRM_I915_PERF_PROP_SAMPLE_TS,
+
 	DRM_I915_PERF_PROP_MAX /* non-ABI */
 };
 
@@ -1494,6 +1500,7 @@ enum drm_i915_perf_record_type {
 	 *     { u64 ctx_id; } && DRM_I915_PERF_PROP_SAMPLE_CTX_ID
 	 *     { u64 pid; } && DRM_I915_PERF_PROP_SAMPLE_PID
 	 *     { u64 tag; } && DRM_I915_PERF_PROP_SAMPLE_TAG
+	 *     { u64 timestamp; } && DRM_I915_PERF_PROP_SAMPLE_TS
 	 *     { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
 	 * };
 	 */
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 10/12] drm/i915: Extract raw GPU timestamps from OA reports to forward in perf samples
  2017-07-31  7:59 [PATCH 00/12] i915 perf support for command stream based OA, GPU and workload metrics capture Sagar Arun Kamble
                   ` (8 preceding siblings ...)
  2017-07-31  7:59 ` [PATCH 09/12] drm/i915: Add support for collecting timestamps on all gpu engines Sagar Arun Kamble
@ 2017-07-31  7:59 ` Sagar Arun Kamble
  2017-07-31  7:59 ` [PATCH 11/12] drm/i915: Async check for streams data availability with hrtimer rescheduling Sagar Arun Kamble
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Sagar Arun Kamble @ 2017-07-31  7:59 UTC (permalink / raw)
  To: intel-gfx; +Cc: Sourab Gupta

From: Sourab Gupta <sourab.gupta@intel.com>

The OA reports contain the least significant 32 bits of the gpu timestamp.
This patch enables retrieval of the timestamp field from OA reports, to
forward as 64 bit raw gpu timestamps in the perf samples.

Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  1 +
 drivers/gpu/drm/i915/i915_perf.c | 48 ++++++++++++++++++++++++++++++----------
 drivers/gpu/drm/i915/i915_reg.h  |  4 ++++
 3 files changed, 41 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0763280..c7823ff 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2656,6 +2656,7 @@ struct drm_i915_private {
 			u32 ctx_flexeu0_offset;
 			u32 n_pending_periodic_samples;
 			u32 pending_periodic_ts;
+			u64 last_gpu_ts;
 
 			/**
 			 * The RPT_ID/reason field for Gen8+ includes a bit
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 2c7ab98..24d0823 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1084,6 +1084,26 @@ static int append_perf_sample(struct i915_perf_stream *stream,
 }
 
 /**
+ * get_gpu_ts_from_oa_report - Retrieve absolute gpu timestamp from OA report
+ *
+ * Note: We are assuming that we're updating last_gpu_ts frequently enough so
+ * that it's never possible to see multiple overflows before we compare
+ * sample_ts to last_gpu_ts. Since this is significantly large duration
+ * (~6min for 80ns ts base), we can safely assume so.
+ */
+static u64 get_gpu_ts_from_oa_report(struct drm_i915_private *dev_priv,
+					const u8 *report)
+{
+	u32 sample_ts = *(u32 *)(report + 4);
+	u32 delta;
+
+	delta = sample_ts - (u32)dev_priv->perf.oa.last_gpu_ts;
+	dev_priv->perf.oa.last_gpu_ts += delta;
+
+	return dev_priv->perf.oa.last_gpu_ts;
+}
+
+/**
  * append_oa_buffer_sample - Copies single periodic OA report into userspace
  * read() buffer.
  * @stream: An i915-perf stream opened for OA metrics
@@ -1115,11 +1135,8 @@ static int append_oa_buffer_sample(struct i915_perf_stream *stream,
 	if (sample_flags & SAMPLE_TAG)
 		data.tag = stream->last_tag;
 
-	/* TODO: Derive timestamp from OA report,
-	 * after scaling with the ts base
-	 */
 	if (sample_flags & SAMPLE_TS)
-		data.ts = 0;
+		data.ts = get_gpu_ts_from_oa_report(dev_priv, report);
 
 	if (sample_flags & SAMPLE_OA_REPORT)
 		data.report = report;
@@ -1693,6 +1710,7 @@ static int append_cs_buffer_sample(struct i915_perf_stream *stream,
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	struct i915_perf_sample_data data = { 0 };
 	u32 sample_flags = stream->sample_flags;
+	u64 gpu_ts = 0;
 	int ret = 0;
 
 	if (sample_flags & SAMPLE_OA_REPORT) {
@@ -1709,6 +1727,9 @@ static int append_cs_buffer_sample(struct i915_perf_stream *stream,
 						 sample_ts, U32_MAX);
 		if (ret)
 			return ret;
+
+		if (sample_flags & SAMPLE_TS)
+			gpu_ts = get_gpu_ts_from_oa_report(dev_priv, report);
 	}
 
 	if (sample_flags & SAMPLE_OA_SOURCE)
@@ -1730,16 +1751,13 @@ static int append_cs_buffer_sample(struct i915_perf_stream *stream,
 	}
 
 	if (sample_flags & SAMPLE_TS) {
-		/* For RCS, if OA samples are also being collected, derive the
-		 * timestamp from OA report, after scaling with the TS base.
+		/* If OA sampling is enabled, derive the ts from OA report.
 		 * Else, forward the timestamp collected via command stream.
 		 */
-		/* TODO: derive the timestamp from OA report */
-		if (sample_flags & SAMPLE_OA_REPORT)
-			data.ts = 0;
-		else
-			data.ts = *(u64 *) (stream->cs_buffer.vaddr +
+		if (!(sample_flags & SAMPLE_OA_REPORT))
+			gpu_ts = *(u64 *) (stream->cs_buffer.vaddr +
 					   node->ts_offset);
+		data.ts = gpu_ts;
 	}
 
 	return append_perf_sample(stream, buf, count, offset, &data);
@@ -2827,9 +2845,15 @@ static void i915_perf_stream_enable(struct i915_perf_stream *stream)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 
-	if (stream->sample_flags & SAMPLE_OA_REPORT)
+	if (stream->sample_flags & SAMPLE_OA_REPORT) {
 		dev_priv->perf.oa.ops.oa_enable(dev_priv);
 
+		if (stream->sample_flags & SAMPLE_TS)
+			dev_priv->perf.oa.last_gpu_ts =
+				I915_READ64_2x32(GT_TIMESTAMP_COUNT,
+					GT_TIMESTAMP_COUNT_UDW);
+	}
+
 	if (stream->cs_mode || dev_priv->perf.oa.periodic)
 		hrtimer_start(&dev_priv->perf.poll_check_timer,
 			      ns_to_ktime(POLL_PERIOD),
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index ecd5794..05be687 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -617,6 +617,10 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define PS_DEPTH_COUNT                  _MMIO(0x2350)
 #define PS_DEPTH_COUNT_UDW		_MMIO(0x2350 + 4)
 
+/* Timestamp count register */
+#define GT_TIMESTAMP_COUNT		_MMIO(0x2358)
+#define GT_TIMESTAMP_COUNT_UDW		_MMIO(0x2358 + 4)
+
 /* There are the 4 64-bit counter registers, one for each stream output */
 #define GEN7_SO_NUM_PRIMS_WRITTEN(n)		_MMIO(0x5200 + (n) * 8)
 #define GEN7_SO_NUM_PRIMS_WRITTEN_UDW(n)	_MMIO(0x5200 + (n) * 8 + 4)
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 11/12] drm/i915: Async check for streams data availability with hrtimer rescheduling
  2017-07-31  7:59 [PATCH 00/12] i915 perf support for command stream based OA, GPU and workload metrics capture Sagar Arun Kamble
                   ` (9 preceding siblings ...)
  2017-07-31  7:59 ` [PATCH 10/12] drm/i915: Extract raw GPU timestamps from OA reports to forward in perf samples Sagar Arun Kamble
@ 2017-07-31  7:59 ` Sagar Arun Kamble
  2017-07-31  7:59 ` [PATCH 12/12] drm/i915: Support for capturing MMIO register values Sagar Arun Kamble
  2017-07-31  9:02 ` ✓ Fi.CI.BAT: success for i915 perf support for command stream based OA, GPU and workload metrics capture Patchwork
  12 siblings, 0 replies; 34+ messages in thread
From: Sagar Arun Kamble @ 2017-07-31  7:59 UTC (permalink / raw)
  To: intel-gfx; +Cc: Sourab Gupta

From: Sourab Gupta <sourab.gupta@intel.com>

This patch ensures hrtimer is rescheduled immediately during cb by
doing async call to check for streams data availability.

Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 24d0823..6b9bea7 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -3308,12 +3308,10 @@ static ssize_t i915_perf_read(struct file *file,
 	return ret;
 }
 
-static enum hrtimer_restart poll_check_timer_cb(struct hrtimer *hrtimer)
+static void wake_up_perf_streams(void *data, async_cookie_t cookie)
 {
+	struct drm_i915_private *dev_priv = data;
 	struct i915_perf_stream *stream;
-	struct drm_i915_private *dev_priv =
-		container_of(hrtimer, typeof(*dev_priv),
-			     perf.poll_check_timer);
 	int idx;
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
@@ -3329,6 +3327,15 @@ static enum hrtimer_restart poll_check_timer_cb(struct hrtimer *hrtimer)
 		}
 		srcu_read_unlock(&engine->perf_srcu, idx);
 	}
+}
+
+static enum hrtimer_restart poll_check_timer_cb(struct hrtimer *hrtimer)
+{
+	struct drm_i915_private *dev_priv =
+		container_of(hrtimer, typeof(*dev_priv),
+			     perf.poll_check_timer);
+
+	async_schedule(wake_up_perf_streams, dev_priv);
 
 	hrtimer_forward_now(hrtimer, ns_to_ktime(POLL_PERIOD));
 
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 12/12] drm/i915: Support for capturing MMIO register values
  2017-07-31  7:59 [PATCH 00/12] i915 perf support for command stream based OA, GPU and workload metrics capture Sagar Arun Kamble
                   ` (10 preceding siblings ...)
  2017-07-31  7:59 ` [PATCH 11/12] drm/i915: Async check for streams data availability with hrtimer rescheduling Sagar Arun Kamble
@ 2017-07-31  7:59 ` Sagar Arun Kamble
  2017-07-31 11:49   ` kbuild test robot
  2017-07-31 12:08   ` kbuild test robot
  2017-07-31  9:02 ` ✓ Fi.CI.BAT: success for i915 perf support for command stream based OA, GPU and workload metrics capture Patchwork
  12 siblings, 2 replies; 34+ messages in thread
From: Sagar Arun Kamble @ 2017-07-31  7:59 UTC (permalink / raw)
  To: intel-gfx; +Cc: Sourab Gupta

From: Sourab Gupta <sourab.gupta@intel.com>

This patch adds support for capturing MMIO register values through
i915 perf interface.
The userspace can request upto 8 MMIO register values to be dumped.
The addresses of these registers can be passed through the corresponding
property 'value' field while opening the stream.
The commands to dump the values of these MMIO registers are then
inserted into the ring alongwith other commands.

v2: Updated error return on copy_from_user failure. (Chris)

Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |  10 ++
 drivers/gpu/drm/i915/i915_perf.c        | 174 +++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_ringbuffer.h |   3 +
 include/uapi/drm/i915_drm.h             |  14 +++
 4 files changed, 198 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c7823ff..01f9abe 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2203,6 +2203,13 @@ struct i915_perf_cs_sample {
 	u32 ts_offset;
 
 	/**
+	 * @mmio_offset: Offset into ``&stream->cs_buffer`` where the mmio reg
+	 * values for this perf sample will be collected (if the stream is
+	 * configured for collection of mmio data)
+	 */
+	u32 mmio_offset;
+
+	/**
 	 * @size: buffer size corresponding to this perf sample
 	 */
 	u32 size;
@@ -2669,6 +2676,9 @@ struct drm_i915_private {
 			const struct i915_oa_format *oa_formats;
 			int n_builtin_sets;
 		} oa;
+
+		u32 num_mmio;
+		u32 mmio_list[I915_PERF_MMIO_NUM_MAX];
 	} perf;
 
 	/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 6b9bea7..8c909cf 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -301,6 +301,7 @@ struct i915_perf_sample_data {
 	u64 tag;
 	u64 ts;
 	const u8 *report;
+	const u8 *mmio;
 };
 
 /* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate
@@ -358,6 +359,7 @@ struct i915_perf_sample_data {
 #define SAMPLE_PID	      (1<<3)
 #define SAMPLE_TAG	      (1<<4)
 #define SAMPLE_TS	      (1<<5)
+#define SAMPLE_MMIO	      (1<<6)
 
 /**
  * struct perf_open_properties - for validated properties given to open a stream
@@ -496,6 +498,9 @@ static void insert_perf_sample(struct i915_perf_stream *stream,
 		sample_size += I915_PERF_TS_SAMPLE_SIZE;
 	}
 
+	if (stream->sample_flags & SAMPLE_MMIO)
+		sample_size += 4 * stream->engine->num_mmio;
+
 	spin_lock_irqsave(&stream->cs_samples_lock, flags);
 	if (list_empty(&stream->cs_samples)) {
 		offset = 0;
@@ -561,6 +566,10 @@ static void insert_perf_sample(struct i915_perf_stream *stream,
 		sample->ts_offset = ALIGN(sample->ts_offset, TS_ADDR_ALIGN);
 		offset = sample->ts_offset + I915_PERF_TS_SAMPLE_SIZE;
 	}
+	if (stream->sample_flags & SAMPLE_MMIO) {
+		sample->mmio_offset = offset;
+		offset = sample->mmio_offset + 4 * stream->engine->num_mmio;
+	}
 
 	list_add_tail(&sample->link, &stream->cs_samples);
 	spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
@@ -700,6 +709,61 @@ static int i915_emit_ts_capture(struct drm_i915_gem_request *request,
 }
 
 /**
+ * i915_emit_mmio_capture - Insert the commands to capture mmio
+ * data into the GPU command stream
+ * @request: request in whose context the mmio data being collected.
+ * @preallocate: allocate space in ring for related sample.
+ * @offset: command stream buffer offset where the data needs to be collected
+ */
+static int i915_engine_stream_capture_mmio(struct drm_i915_gem_request *request,
+					   bool preallocate,
+					   u32 offset)
+{
+	struct drm_i915_private *dev_priv = request->i915;
+	struct intel_engine_cs *engine = request->engine;
+	struct i915_perf_stream *stream;
+	int i, num_mmio = engine->num_mmio;
+	u32 mmio_addr;
+	u32 cmd, len, *cs;
+	int idx;
+
+	len = 4 * num_mmio;
+
+	if (preallocate)
+		request->reserved_space += len;
+	else
+		request->reserved_space -= len;
+
+	cs = intel_ring_begin(request, 4 * num_mmio);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	idx = srcu_read_lock(&engine->perf_srcu);
+	stream = rcu_dereference(engine->exclusive_stream);
+	mmio_addr = stream->cs_buffer.vma->node.start + offset;
+	srcu_read_unlock(&engine->perf_srcu, idx);
+
+
+	if (INTEL_GEN(dev_priv) >= 8)
+		cmd = MI_STORE_REGISTER_MEM_GEN8 | MI_SRM_LRM_GLOBAL_GTT;
+	else
+		cmd = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT;
+
+	for (i = 0; i < num_mmio; i++) {
+		*cs++ = cmd;
+		*cs++ = engine->mmio_list[i];
+		*cs++ = mmio_addr + 4*i;
+
+		if (INTEL_GEN(dev_priv) >= 8)
+			*cs++ = 0;
+		else
+			*cs++ = MI_NOOP;
+	}
+	intel_ring_advance(request, cs);
+	return 0;
+}
+
+/**
  * i915_perf_stream_emit_sample_capture - Insert the commands to capture perf
  * metrics into the GPU command stream
  * @stream: An i915-perf stream opened for GPU metrics
@@ -749,6 +813,14 @@ static void i915_perf_stream_emit_sample_capture(
 			goto err_unref;
 	}
 
+	if (stream->sample_flags & SAMPLE_MMIO) {
+		ret = i915_engine_stream_capture_mmio(request,
+				preallocate,
+				sample->mmio_offset);
+		if (ret)
+			goto err_unref;
+	}
+
 	reservation_object_lock(resv, NULL);
 	if (reservation_object_reserve_shared(resv) == 0)
 		reservation_object_add_shared_fence(resv, &request->fence);
@@ -1072,6 +1144,12 @@ static int append_perf_sample(struct i915_perf_stream *stream,
 		buf += I915_PERF_TS_SAMPLE_SIZE;
 	}
 
+	if (sample_flags & SAMPLE_MMIO) {
+		if (copy_to_user(buf, data->mmio, 4 * stream->engine->num_mmio))
+			return -EFAULT;
+		buf += 4 * stream->engine->num_mmio;
+	}
+
 	if (sample_flags & SAMPLE_OA_REPORT) {
 		if (copy_to_user(buf, data->report, report_size))
 			return -EFAULT;
@@ -1121,6 +1199,7 @@ static int append_oa_buffer_sample(struct i915_perf_stream *stream,
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	u32 sample_flags = stream->sample_flags;
 	struct i915_perf_sample_data data = { 0 };
+	u32 mmio_list_dummy[I915_PERF_MMIO_NUM_MAX] = { 0 };
 
 	if (sample_flags & SAMPLE_OA_SOURCE)
 		data.source = I915_PERF_SAMPLE_OA_SOURCE_OABUFFER;
@@ -1138,6 +1217,10 @@ static int append_oa_buffer_sample(struct i915_perf_stream *stream,
 	if (sample_flags & SAMPLE_TS)
 		data.ts = get_gpu_ts_from_oa_report(dev_priv, report);
 
+	/* Periodic OA samples don't have mmio associated with them */
+	if (sample_flags & SAMPLE_MMIO)
+		data.mmio = (u8 *)mmio_list_dummy;
+
 	if (sample_flags & SAMPLE_OA_REPORT)
 		data.report = report;
 
@@ -1760,6 +1843,9 @@ static int append_cs_buffer_sample(struct i915_perf_stream *stream,
 		data.ts = gpu_ts;
 	}
 
+	if (sample_flags & SAMPLE_MMIO)
+		data.mmio = stream->cs_buffer.vaddr + node->mmio_offset;
+
 	return append_perf_sample(stream, buf, count, offset, &data);
 }
 
@@ -2928,9 +3014,11 @@ static int i915_perf_stream_init(struct i915_perf_stream *stream,
 	bool require_oa_unit = props->sample_flags & (SAMPLE_OA_REPORT |
 						      SAMPLE_OA_SOURCE);
 	bool require_cs_mode = props->sample_flags & (SAMPLE_PID |
-						      SAMPLE_TAG);
+						      SAMPLE_TAG |
+						      SAMPLE_MMIO);
 	bool cs_sample_data = props->sample_flags & (SAMPLE_OA_REPORT |
-							SAMPLE_TS);
+							SAMPLE_TS |
+							SAMPLE_MMIO);
 	struct i915_perf_stream *curr_stream;
 	struct intel_engine_cs *engine = NULL;
 	int idx;
@@ -3101,7 +3189,8 @@ static int i915_perf_stream_init(struct i915_perf_stream *stream,
 	}
 
 	if (require_cs_mode && !props->cs_mode) {
-		DRM_ERROR("PID/TAG/TS sampling requires engine to be specified");
+		DRM_ERROR(
+			"PID/TAG/TS/MMIO sampling requires engine to be specified");
 		ret = -EINVAL;
 		goto err_enable;
 	}
@@ -3137,6 +3226,16 @@ static int i915_perf_stream_init(struct i915_perf_stream *stream,
 
 		engine = dev_priv->engine[props->engine];
 
+		if (props->sample_flags & SAMPLE_MMIO) {
+			memset(engine->mmio_list, 0, I915_PERF_MMIO_NUM_MAX);
+			memcpy(engine->mmio_list, dev_priv->perf.mmio_list,
+			       4 * engine->num_mmio);
+			engine->num_mmio = dev_priv->perf.num_mmio;
+
+			stream->sample_flags |= SAMPLE_MMIO;
+			stream->sample_size += 4 * engine->num_mmio;
+		}
+
 		idx = srcu_read_lock(&engine->perf_srcu);
 		curr_stream = srcu_dereference(engine->exclusive_stream,
 					       &engine->perf_srcu);
@@ -3703,6 +3802,69 @@ static u64 oa_exponent_to_ns(struct drm_i915_private *dev_priv, int exponent)
 		       dev_priv->perf.oa.timestamp_frequency);
 }
 
+static int check_mmio_whitelist(struct drm_i915_private *dev_priv, u32 num_mmio)
+{
+#define GEN_RANGE(l, h) GENMASK(h, l)
+	static const struct register_whitelist {
+		i915_reg_t mmio;
+		uint32_t size;
+		/* supported gens, 0x10 for 4, 0x30 for 4 and 5, etc. */
+		uint32_t gen_bitmask;
+	} whitelist[] = {
+		{ GEN6_GT_GFX_RC6, 4, GEN_RANGE(7, 9) },
+		{ GEN6_GT_GFX_RC6p, 4, GEN_RANGE(7, 9) },
+	};
+	int i, count;
+
+	for (count = 0; count < num_mmio; count++) {
+		/* Coarse check on mmio reg addresses being non zero */
+		if (!dev_priv->perf.mmio_list[count])
+			return -EINVAL;
+
+		for (i = 0; i < ARRAY_SIZE(whitelist); i++) {
+			if ((i915_mmio_reg_offset(whitelist[i].mmio) ==
+				dev_priv->perf.mmio_list[count]) &&
+			    (1 << INTEL_INFO(dev_priv)->gen &
+					whitelist[i].gen_bitmask))
+				break;
+		}
+
+		if (i == ARRAY_SIZE(whitelist))
+			return -EINVAL;
+	}
+	return 0;
+}
+
+static int copy_mmio_list(struct drm_i915_private *dev_priv,
+				void __user *mmio)
+{
+	void __user *mmio_list = ((u8 __user *)mmio + 4);
+	u32 num_mmio;
+	int ret;
+
+	if (!mmio)
+		return -EINVAL;
+
+	ret = get_user(num_mmio, (u32 __user *)mmio);
+	if (ret)
+		return ret;
+
+	if (num_mmio > I915_PERF_MMIO_NUM_MAX)
+		return -EINVAL;
+
+	memset(dev_priv->perf.mmio_list, 0, I915_PERF_MMIO_NUM_MAX);
+	if (copy_from_user(dev_priv->perf.mmio_list, mmio_list, 4 * num_mmio))
+		return -EFAULT;
+
+	ret = check_mmio_whitelist(dev_priv, num_mmio);
+	if (ret)
+		return ret;
+
+	dev_priv->perf.num_mmio = num_mmio;
+
+	return 0;
+}
+
 /**
  * read_properties_unlocked - validate + copy userspace stream open properties
  * @dev_priv: i915 device instance
@@ -3856,6 +4018,12 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 		case DRM_I915_PERF_PROP_SAMPLE_TS:
 			props->sample_flags |= SAMPLE_TS;
 			break;
+		case DRM_I915_PERF_PROP_SAMPLE_MMIO:
+			ret = copy_mmio_list(dev_priv, (u64 __user *)value);
+			if (ret)
+				return ret;
+			props->sample_flags |= SAMPLE_MMIO;
+			break;
 		case DRM_I915_PERF_PROP_MAX:
 			MISSING_CASE(id);
 			return -EINVAL;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 0ac8491..537d2b9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -446,6 +446,9 @@ struct intel_engine_cs {
 	struct srcu_struct perf_srcu;
 	struct i915_perf_stream __rcu *exclusive_stream;
 	u32 specific_ctx_id;
+
+	u32 num_mmio;
+	u32 mmio_list[I915_PERF_MMIO_NUM_MAX];
 };
 
 static inline unsigned int
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 4d27075..7b2a64cd 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1359,6 +1359,12 @@ enum drm_i915_perf_sample_oa_source {
 	I915_PERF_SAMPLE_OA_SOURCE_MAX	/* non-ABI */
 };
 
+#define I915_PERF_MMIO_NUM_MAX	8
+struct drm_i915_perf_mmio_list {
+	__u32 num_mmio;
+	__u32 mmio_list[I915_PERF_MMIO_NUM_MAX];
+};
+
 enum drm_i915_perf_property_id {
 	/**
 	 * Open the stream for a specific context handle (as used with
@@ -1431,6 +1437,13 @@ enum drm_i915_perf_property_id {
 	 */
 	DRM_I915_PERF_PROP_SAMPLE_TS,
 
+	/**
+	 * This property requests inclusion of mmio register values in the perf
+	 * sample data. The value of this property specifies the address of user
+	 * struct having the register addresses.
+	 */
+	DRM_I915_PERF_PROP_SAMPLE_MMIO,
+
 	DRM_I915_PERF_PROP_MAX /* non-ABI */
 };
 
@@ -1501,6 +1514,7 @@ enum drm_i915_perf_record_type {
 	 *     { u64 pid; } && DRM_I915_PERF_PROP_SAMPLE_PID
 	 *     { u64 tag; } && DRM_I915_PERF_PROP_SAMPLE_TAG
 	 *     { u64 timestamp; } && DRM_I915_PERF_PROP_SAMPLE_TS
+	 *     { u32 mmio[]; } && DRM_I915_PERF_PROP_SAMPLE_MMIO
 	 *     { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
 	 * };
 	 */
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.
  2017-07-31  7:59 ` [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info Sagar Arun Kamble
@ 2017-07-31  8:34   ` Chris Wilson
  2017-07-31 10:11     ` Chris Wilson
  2017-07-31  9:43   ` Lionel Landwerlin
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 34+ messages in thread
From: Chris Wilson @ 2017-07-31  8:34 UTC (permalink / raw)
  To: Sagar Arun Kamble, intel-gfx; +Cc: Sourab Gupta

Quoting Sagar Arun Kamble (2017-07-31 08:59:36)
> +/**
> + * i915_perf_stream_emit_sample_capture - Insert the commands to capture perf
> + * metrics into the GPU command stream
> + * @stream: An i915-perf stream opened for GPU metrics
> + * @request: request in whose context the metrics are being collected.
> + * @preallocate: allocate space in ring for related sample.
> + */
> +static void i915_perf_stream_emit_sample_capture(
> +                                       struct i915_perf_stream *stream,
> +                                       struct drm_i915_gem_request *request,
> +                                       bool preallocate)
> +{
> +       struct reservation_object *resv = stream->cs_buffer.vma->resv;
> +       struct i915_perf_cs_sample *sample;
> +       unsigned long flags;
> +       int ret;
> +
> +       sample = kzalloc(sizeof(*sample), GFP_KERNEL);
> +       if (sample == NULL) {
> +               DRM_ERROR("Perf sample alloc failed\n");
> +               return;
> +       }
> +
> +       sample->request = i915_gem_request_get(request);
> +       sample->ctx_id = request->ctx->hw_id;
> +
> +       insert_perf_sample(stream, sample);
> +
> +       if (stream->sample_flags & SAMPLE_OA_REPORT) {
> +               ret = i915_emit_oa_report_capture(request,
> +                                                 preallocate,
> +                                                 sample->offset);
> +               if (ret)
> +                       goto err_unref;
> +       }

This is incorrect as the requests may be reordered. You either need to
declare the linear ordering of requests through the sample buffer, or we
have to delay setting sample->offset until execution, and even then we
need to disable preemption when using SAMPLE_OA_REPORT.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* ✓ Fi.CI.BAT: success for i915 perf support for command stream based OA, GPU and workload metrics capture
  2017-07-31  7:59 [PATCH 00/12] i915 perf support for command stream based OA, GPU and workload metrics capture Sagar Arun Kamble
                   ` (11 preceding siblings ...)
  2017-07-31  7:59 ` [PATCH 12/12] drm/i915: Support for capturing MMIO register values Sagar Arun Kamble
@ 2017-07-31  9:02 ` Patchwork
  12 siblings, 0 replies; 34+ messages in thread
From: Patchwork @ 2017-07-31  9:02 UTC (permalink / raw)
  To: sagar.a.kamble; +Cc: intel-gfx

== Series Details ==

Series: i915 perf support for command stream based OA, GPU and workload metrics capture
URL   : https://patchwork.freedesktop.org/series/28104/
State : success

== Summary ==

Series 28104v1 i915 perf support for command stream based OA, GPU and workload metrics capture
https://patchwork.freedesktop.org/api/1.0/series/28104/revisions/1/mbox/

Test gem_exec_flush:
        Subgroup basic-batch-kernel-default-uc:
                pass       -> FAIL       (fi-snb-2600) fdo#100007
Test kms_pipe_crc_basic:
        Subgroup suspend-read-crc-pipe-b:
                dmesg-warn -> PASS       (fi-byt-j1900) fdo#101705

fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007
fdo#101705 https://bugs.freedesktop.org/show_bug.cgi?id=101705

fi-bdw-5557u     total:280  pass:269  dwarn:0   dfail:0   fail:0   skip:11  time:442s
fi-bdw-gvtdvm    total:280  pass:266  dwarn:0   dfail:0   fail:0   skip:14  time:430s
fi-blb-e6850     total:280  pass:225  dwarn:1   dfail:0   fail:0   skip:54  time:356s
fi-bsw-n3050     total:280  pass:244  dwarn:0   dfail:0   fail:0   skip:36  time:537s
fi-bxt-j4205     total:280  pass:261  dwarn:0   dfail:0   fail:0   skip:19  time:509s
fi-byt-j1900     total:280  pass:256  dwarn:0   dfail:0   fail:0   skip:24  time:497s
fi-byt-n2820     total:280  pass:251  dwarn:1   dfail:0   fail:0   skip:28  time:486s
fi-glk-2a        total:280  pass:261  dwarn:0   dfail:0   fail:0   skip:19  time:596s
fi-hsw-4770      total:280  pass:264  dwarn:0   dfail:0   fail:0   skip:16  time:440s
fi-hsw-4770r     total:280  pass:264  dwarn:0   dfail:0   fail:0   skip:16  time:414s
fi-ilk-650       total:280  pass:230  dwarn:0   dfail:0   fail:0   skip:50  time:416s
fi-ivb-3520m     total:280  pass:262  dwarn:0   dfail:0   fail:0   skip:18  time:503s
fi-ivb-3770      total:280  pass:262  dwarn:0   dfail:0   fail:0   skip:18  time:473s
fi-kbl-7500u     total:280  pass:262  dwarn:0   dfail:0   fail:0   skip:18  time:466s
fi-kbl-7560u     total:280  pass:270  dwarn:0   dfail:0   fail:0   skip:10  time:570s
fi-kbl-r         total:280  pass:262  dwarn:0   dfail:0   fail:0   skip:18  time:585s
fi-pnv-d510      total:280  pass:224  dwarn:1   dfail:0   fail:0   skip:55  time:572s
fi-skl-6260u     total:280  pass:270  dwarn:0   dfail:0   fail:0   skip:10  time:459s
fi-skl-6700hq    total:280  pass:263  dwarn:0   dfail:0   fail:0   skip:17  time:582s
fi-skl-6700k     total:280  pass:262  dwarn:0   dfail:0   fail:0   skip:18  time:467s
fi-skl-6770hq    total:280  pass:270  dwarn:0   dfail:0   fail:0   skip:10  time:484s
fi-skl-gvtdvm    total:280  pass:267  dwarn:0   dfail:0   fail:0   skip:13  time:443s
fi-skl-x1585l    total:280  pass:269  dwarn:0   dfail:0   fail:0   skip:11  time:476s
fi-snb-2520m     total:280  pass:252  dwarn:0   dfail:0   fail:0   skip:28  time:550s
fi-snb-2600      total:280  pass:250  dwarn:0   dfail:0   fail:1   skip:29  time:409s

7c1a6431ac6c971d30e461ad3f7fe7ea2c52492d drm-tip: 2017y-07m-31d-08h-08m-30s UTC integration manifest
78b799e574a7 drm/i915: Support for capturing MMIO register values
fcbe733a8743 drm/i915: Async check for streams data availability with hrtimer rescheduling
9d4f005ddb9d drm/i915: Extract raw GPU timestamps from OA reports to forward in perf samples
a8bd9d8a39e0 drm/i915: Add support for collecting timestamps on all gpu engines
7f3358b51a16 drm/i915: Add support for emitting execbuffer tags through OA counter reports
3b46273a5d38 drm/i915: Add support for having pid output with OA report
452c678b26ea drm/i915: Populate ctx ID for periodic OA reports
b361169e85db drm/i915: Inform userspace about command stream OA buf overflow
01d93db2437c drm/i915: Flush periodic samples, in case of no pending CS sample requests
42ad44f11ca0 drm/i915: Framework for capturing command stream based OA reports and ctx id info.
b880c7f0e191 drm/i915: Expose OA sample source to userspace
c45bf3b57ab1 drm/i915: Add ctx getparam ioctl parameter to retrieve ctx unique id

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5298/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 06/12] drm/i915: Populate ctx ID for periodic OA reports
  2017-07-31  7:59 ` [PATCH 06/12] drm/i915: Populate ctx ID for periodic OA reports Sagar Arun Kamble
@ 2017-07-31  9:27   ` Lionel Landwerlin
  2017-07-31 10:42     ` Kamble, Sagar A
  2017-07-31 18:17   ` kbuild test robot
  1 sibling, 1 reply; 34+ messages in thread
From: Lionel Landwerlin @ 2017-07-31  9:27 UTC (permalink / raw)
  To: Sagar Arun Kamble, intel-gfx; +Cc: Sourab Gupta

Hi Sagar,

I'm curious to what happens if 2 contexts submit requests which a time 
period smaller than the sampling OA period on Gen7.5.
My understanding is that with this change you'll only retain the last 
submission and then the ctx_id reported in the SAMPLE_CTX_ID field will 
be incorrect for the first workload.

Am I missing something?

-
Lionel

On 31/07/17 08:59, Sagar Arun Kamble wrote:
> From: Sourab Gupta <sourab.gupta@intel.com>
>
> This adds support for populating the ctx id for the periodic OA reports
> when requested through the corresponding property.
>
> For Gen8, the OA reports itself have the ctx ID and it is the one
> programmed into HW while submitting workloads. Thus it's retrieved from
> reports itself.
> For Gen7, the OA reports don't have any such field, and we can populate
> this field with the last seen ctx ID while sending CS reports.
>
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h  |  8 ++++++
>   drivers/gpu/drm/i915/i915_perf.c | 58 +++++++++++++++++++++++++++++++---------
>   2 files changed, 54 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index fb81315..6c011f3 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2077,6 +2077,8 @@ struct i915_perf_stream {
>   
>   	wait_queue_head_t poll_wq;
>   	bool pollin;
> +
> +	u32 last_ctx_id;
>   };
>   
>   /**
> @@ -2151,6 +2153,12 @@ struct i915_oa_ops {
>   	 * generations.
>   	 */
>   	u32 (*oa_hw_tail_read)(struct drm_i915_private *dev_priv);
> +
> +	/**
> +	 * @get_ctx_id: Retrieve the ctx_id associated with the (periodic) OA
> +	 * report.
> +	 */
> +	u32 (*get_ctx_id)(struct i915_perf_stream *stream, const u8 *report);
>   };
>   
>   /*
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 905c5bb..1f5ebdb 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -790,6 +790,45 @@ static u32 oa_buffer_num_reports_unlocked(
>   	return aged_tail == INVALID_TAIL_PTR ? 0 : num_reports;
>   }
>   
> +static u32 gen7_oa_buffer_get_ctx_id(struct i915_perf_stream *stream,
> +				    const u8 *report)
> +{
> +	if (!stream->cs_mode)
> +		WARN_ONCE(1,
> +			"CTX ID can't be retrieved if command stream mode not enabled");
> +
> +	/*
> +	 * OA reports generated in Gen7 don't have the ctx ID information.
> +	 * Therefore, just rely on the ctx ID information from the last CS
> +	 * sample forwarded
> +	 */
> +	return stream->last_ctx_id;
> +}
> +
> +static u32 gen8_oa_buffer_get_ctx_id(struct i915_perf_stream *stream,
> +				    const u8 *report)
> +{
> +	u32 ctx_id;
> +
> +	/* The ctx ID present in the OA reports have intel_context::hw_id
> +	 * present, since this is programmed into the ELSP in execlist mode.
> +	 * In non-execlist mode, fall back to retrieving the ctx ID from the
> +	 * last saved ctx ID from command stream mode.
> +	 */
> +	if (i915.enable_execlists) {
> +		u32 *report32 = (void *)report;
> +
> +		ctx_id = report32[2] & 0x1fffff;
> +	} else {
> +		if (!stream->cs_mode)
> +			WARN_ONCE(1,
> +				"CTX ID can't be retrieved if command stream mode not enabled");
> +
> +		ctx_id = stream->last_ctx_id;
> +	}
> +	return ctx_id;
> +}
> +
>   /**
>    * append_oa_status - Appends a status record to a userspace read() buffer.
>    * @stream: An i915-perf stream opened for OA metrics
> @@ -914,22 +953,12 @@ static int append_oa_buffer_sample(struct i915_perf_stream *stream,
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   	u32 sample_flags = stream->sample_flags;
>   	struct i915_perf_sample_data data = { 0 };
> -	u32 *report32 = (u32 *)report;
>   
>   	if (sample_flags & SAMPLE_OA_SOURCE)
>   		data.source = I915_PERF_SAMPLE_OA_SOURCE_OABUFFER;
>   
>   	if (sample_flags & SAMPLE_CTX_ID) {
> -		if (INTEL_INFO(dev_priv)->gen < 8)
> -			data.ctx_id = 0;
> -		else {
> -			/*
> -			 * XXX: Just keep the lower 21 bits for now since I'm
> -			 * not entirely sure if the HW touches any of the higher
> -			 * bits in this field
> -			 */
> -			data.ctx_id = report32[2] & 0x1fffff;
> -		}
> +		data.ctx_id = dev_priv->perf.oa.ops.get_ctx_id(stream, report);
>   	}
>   
>   	if (sample_flags & SAMPLE_OA_REPORT)
> @@ -1524,8 +1553,10 @@ static int append_cs_buffer_sample(struct i915_perf_stream *stream,
>   	if (sample_flags & SAMPLE_OA_SOURCE)
>   		data.source = I915_PERF_SAMPLE_OA_SOURCE_CS;
>   
> -	if (sample_flags & SAMPLE_CTX_ID)
> +	if (sample_flags & SAMPLE_CTX_ID) {
>   		data.ctx_id = node->ctx_id;
> +		stream->last_ctx_id = data.ctx_id;
> +	}
>   
>   	return append_perf_sample(stream, buf, count, offset, &data);
>   }
> @@ -3838,6 +3869,7 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
>   		dev_priv->perf.oa.ops.read = gen7_oa_read;
>   		dev_priv->perf.oa.ops.oa_hw_tail_read =
>   			gen7_oa_hw_tail_read;
> +		dev_priv->perf.oa.ops.get_ctx_id = gen7_oa_buffer_get_ctx_id;
>   
>   		dev_priv->perf.oa.timestamp_frequency = 12500000;
>   
> @@ -3933,6 +3965,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
>   			dev_priv->perf.oa.ops.read = gen8_oa_read;
>   			dev_priv->perf.oa.ops.oa_hw_tail_read =
>   				gen8_oa_hw_tail_read;
> +			dev_priv->perf.oa.ops.get_ctx_id =
> +				gen8_oa_buffer_get_ctx_id;
>   
>   			dev_priv->perf.oa.oa_formats = gen8_plus_oa_formats;
>   		}


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.
  2017-07-31  7:59 ` [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info Sagar Arun Kamble
  2017-07-31  8:34   ` Chris Wilson
@ 2017-07-31  9:43   ` Lionel Landwerlin
  2017-07-31 11:38     ` sourab gupta
  2017-07-31 15:38   ` kbuild test robot
  2017-07-31 15:45   ` Lionel Landwerlin
  3 siblings, 1 reply; 34+ messages in thread
From: Lionel Landwerlin @ 2017-07-31  9:43 UTC (permalink / raw)
  To: Sagar Arun Kamble, intel-gfx; +Cc: Sourab Gupta

On 31/07/17 08:59, Sagar Arun Kamble wrote:
> From: Sourab Gupta <sourab.gupta@intel.com>
>
> This patch introduces a framework to capture OA counter reports associated
> with Render command stream. We can then associate the reports captured
> through this mechanism with their corresponding context id's. This can be
> further extended to associate any other metadata information with the
> corresponding samples (since the association with Render command stream
> gives us the ability to capture these information while inserting the
> corresponding capture commands into the command stream).
>
> The OA reports generated in this way are associated with a corresponding
> workload, and thus can be used the delimit the workload (i.e. sample the
> counters at the workload boundaries), within an ongoing stream of periodic
> counter snapshots.
>
> There may be usecases wherein we need more than periodic OA capture mode
> which is supported currently. This mode is primarily used for two usecases:
>      - Ability to capture system wide metrics, alongwith the ability to map
>        the reports back to individual contexts (particularly for HSW).
>      - Ability to inject tags for work, into the reports. This provides
>        visibility into the multiple stages of work within single context.
>
> The userspace will be able to distinguish between the periodic and CS based
> OA reports by the virtue of source_info sample field.
>
> The command MI_REPORT_PERF_COUNT can be used to capture snapshots of OA
> counters, and is inserted at BB boundaries.
> The data thus captured will be stored in a separate buffer, which will
> be different from the buffer used otherwise for periodic OA capture mode.
> The metadata information pertaining to snapshot is maintained in a list,
> which also has offsets into the gem buffer object per captured snapshot.
> In order to track whether the gpu has completed processing the node,
> a field pertaining to corresponding gem request is added, which is tracked
> for completion of the command.
>
> Both periodic and CS based reports are associated with a single stream
> (corresponding to render engine), and it is expected to have the samples
> in the sequential order according to their timestamps. Now, since these
> reports are collected in separate buffers, these are merge sorted at the
> time of forwarding to userspace during the read call.
>
> v2: Aligning with the non-perf interface (custom drm ioctl based). Also,
> few related patches are squashed together for better readability
>
> v3: Updated perf sample capture emit hook name. Reserving space upfront
> in the ring for emitting sample capture commands and using
> req->fence.seqno for tracking samples. Added SRCU protection for streams.
> Changed the stream last_request tracking to resv object. (Chris)
> Updated perf.sample_lock spin_lock usage to avoid softlockups. Moved
> stream to global per-engine structure. (Sagar)
> Update unpin and put in the free routines to i915_vma_unpin_and_release.
> Making use of perf stream cs_buffer vma resv instead of separate resv obj.
> Pruned perf stream vma resv during gem_idle. (Chris)
> Changed payload field ctx_id to u64 to keep all sample data aligned at 8
> bytes. (Lionel)
> stall/flush prior to sample capture is not added. Do we need to give this
> control to user to select whether to stall/flush at each sample?
>
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> Signed-off-by: Robert Bragg <robert@sixbynine.org>
> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h            |  101 ++-
>   drivers/gpu/drm/i915/i915_gem.c            |    1 +
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c |    8 +
>   drivers/gpu/drm/i915/i915_perf.c           | 1185 ++++++++++++++++++++++------
>   drivers/gpu/drm/i915/intel_engine_cs.c     |    4 +
>   drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +
>   drivers/gpu/drm/i915/intel_ringbuffer.h    |    5 +
>   include/uapi/drm/i915_drm.h                |   15 +
>   8 files changed, 1073 insertions(+), 248 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 2c7456f..8b1cecf 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1985,6 +1985,24 @@ struct i915_perf_stream_ops {
>   	 * The stream will always be disabled before this is called.
>   	 */
>   	void (*destroy)(struct i915_perf_stream *stream);
> +
> +	/*
> +	 * @emit_sample_capture: Emit the commands in the command streamer
> +	 * for a particular gpu engine.
> +	 *
> +	 * The commands are inserted to capture the perf sample data at
> +	 * specific points during workload execution, such as before and after
> +	 * the batch buffer.
> +	 */
> +	void (*emit_sample_capture)(struct i915_perf_stream *stream,
> +				    struct drm_i915_gem_request *request,
> +				    bool preallocate);
> +};
> +
> +enum i915_perf_stream_state {
> +	I915_PERF_STREAM_DISABLED,
> +	I915_PERF_STREAM_ENABLE_IN_PROGRESS,
> +	I915_PERF_STREAM_ENABLED,
>   };
>   
>   /**
> @@ -1997,9 +2015,9 @@ struct i915_perf_stream {
>   	struct drm_i915_private *dev_priv;
>   
>   	/**
> -	 * @link: Links the stream into ``&drm_i915_private->streams``
> +	 * @engine: Engine to which this stream corresponds.
>   	 */
> -	struct list_head link;
> +	struct intel_engine_cs *engine;
>   
>   	/**
>   	 * @sample_flags: Flags representing the `DRM_I915_PERF_PROP_SAMPLE_*`
> @@ -2022,17 +2040,41 @@ struct i915_perf_stream {
>   	struct i915_gem_context *ctx;
>   
>   	/**
> -	 * @enabled: Whether the stream is currently enabled, considering
> -	 * whether the stream was opened in a disabled state and based
> -	 * on `I915_PERF_IOCTL_ENABLE` and `I915_PERF_IOCTL_DISABLE` calls.
> +	 * @state: Current stream state, which can be either disabled, enabled,
> +	 * or enable_in_progress, while considering whether the stream was
> +	 * opened in a disabled state and based on `I915_PERF_IOCTL_ENABLE` and
> +	 * `I915_PERF_IOCTL_DISABLE` calls.
>   	 */
> -	bool enabled;
> +	enum i915_perf_stream_state state;
> +
> +	/**
> +	 * @cs_mode: Whether command stream based perf sample collection is
> +	 * enabled for this stream
> +	 */
> +	bool cs_mode;
> +
> +	/**
> +	 * @using_oa: Whether OA unit is in use for this particular stream
> +	 */
> +	bool using_oa;
>   
>   	/**
>   	 * @ops: The callbacks providing the implementation of this specific
>   	 * type of configured stream.
>   	 */
>   	const struct i915_perf_stream_ops *ops;
> +
> +	/* Command stream based perf data buffer */
> +	struct {
> +		struct i915_vma *vma;
> +		u8 *vaddr;
> +	} cs_buffer;
> +
> +	struct list_head cs_samples;
> +	spinlock_t cs_samples_lock;
> +
> +	wait_queue_head_t poll_wq;
> +	bool pollin;
>   };
>   
>   /**
> @@ -2095,7 +2137,8 @@ struct i915_oa_ops {
>   	int (*read)(struct i915_perf_stream *stream,
>   		    char __user *buf,
>   		    size_t count,
> -		    size_t *offset);
> +		    size_t *offset,
> +		    u32 ts);
>   
>   	/**
>   	 * @oa_hw_tail_read: read the OA tail pointer register
> @@ -2107,6 +2150,36 @@ struct i915_oa_ops {
>   	u32 (*oa_hw_tail_read)(struct drm_i915_private *dev_priv);
>   };
>   
> +/*
> + * i915_perf_cs_sample - Sample element to hold info about a single perf
> + * sample data associated with a particular GPU command stream.
> + */
> +struct i915_perf_cs_sample {
> +	/**
> +	 * @link: Links the sample into ``&stream->cs_samples``
> +	 */
> +	struct list_head link;
> +
> +	/**
> +	 * @request: GEM request associated with the sample. The commands to
> +	 * capture the perf metrics are inserted into the command streamer in
> +	 * context of this request.
> +	 */
> +	struct drm_i915_gem_request *request;
> +
> +	/**
> +	 * @offset: Offset into ``&stream->cs_buffer``
> +	 * where the perf metrics will be collected, when the commands inserted
> +	 * into the command stream are executed by GPU.
> +	 */
> +	u32 offset;
> +
> +	/**
> +	 * @ctx_id: Context ID associated with this perf sample
> +	 */
> +	u32 ctx_id;
> +};
> +
>   struct intel_cdclk_state {
>   	unsigned int cdclk, vco, ref;
>   };
> @@ -2431,17 +2504,10 @@ struct drm_i915_private {
>   		struct ctl_table_header *sysctl_header;
>   
>   		struct mutex lock;
> -		struct list_head streams;
> -
> -		struct {
> -			struct i915_perf_stream *exclusive_stream;
>   
> -			u32 specific_ctx_id;
> -
> -			struct hrtimer poll_check_timer;
> -			wait_queue_head_t poll_wq;
> -			bool pollin;
> +		struct hrtimer poll_check_timer;
>   
> +		struct {
>   			/**
>   			 * For rate limiting any notifications of spurious
>   			 * invalid OA reports
> @@ -3636,6 +3702,8 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data,
>   void i915_oa_init_reg_state(struct intel_engine_cs *engine,
>   			    struct i915_gem_context *ctx,
>   			    uint32_t *reg_state);
> +void i915_perf_emit_sample_capture(struct drm_i915_gem_request *req,
> +				   bool preallocate);
>   
>   /* i915_gem_evict.c */
>   int __must_check i915_gem_evict_something(struct i915_address_space *vm,
> @@ -3795,6 +3863,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
>   /* i915_perf.c */
>   extern void i915_perf_init(struct drm_i915_private *dev_priv);
>   extern void i915_perf_fini(struct drm_i915_private *dev_priv);
> +extern void i915_perf_streams_mark_idle(struct drm_i915_private *dev_priv);
>   extern void i915_perf_register(struct drm_i915_private *dev_priv);
>   extern void i915_perf_unregister(struct drm_i915_private *dev_priv);
>   
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 000a764..7b01548 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3220,6 +3220,7 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
>   
>   	intel_engines_mark_idle(dev_priv);
>   	i915_gem_timelines_mark_idle(dev_priv);
> +	i915_perf_streams_mark_idle(dev_priv);
>   
>   	GEM_BUG_ON(!dev_priv->gt.awake);
>   	dev_priv->gt.awake = false;
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 5fa4476..bfe546b 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1194,12 +1194,16 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
>   	if (err)
>   		goto err_request;
>   
> +	i915_perf_emit_sample_capture(rq, true);
> +
>   	err = eb->engine->emit_bb_start(rq,
>   					batch->node.start, PAGE_SIZE,
>   					cache->gen > 5 ? 0 : I915_DISPATCH_SECURE);
>   	if (err)
>   		goto err_request;
>   
> +	i915_perf_emit_sample_capture(rq, false);
> +
>   	GEM_BUG_ON(!reservation_object_test_signaled_rcu(batch->resv, true));
>   	i915_vma_move_to_active(batch, rq, 0);
>   	reservation_object_lock(batch->resv, NULL);
> @@ -2029,6 +2033,8 @@ static int eb_submit(struct i915_execbuffer *eb)
>   			return err;
>   	}
>   
> +	i915_perf_emit_sample_capture(eb->request, true);
> +
>   	err = eb->engine->emit_bb_start(eb->request,
>   					eb->batch->node.start +
>   					eb->batch_start_offset,
> @@ -2037,6 +2043,8 @@ static int eb_submit(struct i915_execbuffer *eb)
>   	if (err)
>   		return err;
>   
> +	i915_perf_emit_sample_capture(eb->request, false);
> +
>   	return 0;
>   }
>   
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index b272653..57e1936 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -193,6 +193,7 @@
>   
>   #include <linux/anon_inodes.h>
>   #include <linux/sizes.h>
> +#include <linux/srcu.h>
>   
>   #include "i915_drv.h"
>   #include "i915_oa_hsw.h"
> @@ -288,6 +289,12 @@
>   #define OAREPORT_REASON_CTX_SWITCH     (1<<3)
>   #define OAREPORT_REASON_CLK_RATIO      (1<<5)
>   
> +/* Data common to periodic and RCS based OA samples */
> +struct i915_perf_sample_data {
> +	u64 source;
> +	u64 ctx_id;
> +	const u8 *report;
> +};
>   
>   /* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate
>    *
> @@ -328,8 +335,19 @@
>   	[I915_OA_FORMAT_C4_B8]		    = { 7, 64 },
>   };
>   
> +/* Duplicated from similar static enum in i915_gem_execbuffer.c */
> +#define I915_USER_RINGS (4)
> +static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {
> +	[I915_EXEC_DEFAULT]     = RCS,
> +	[I915_EXEC_RENDER]      = RCS,
> +	[I915_EXEC_BLT]         = BCS,
> +	[I915_EXEC_BSD]         = VCS,
> +	[I915_EXEC_VEBOX]       = VECS
> +};
> +
>   #define SAMPLE_OA_REPORT      (1<<0)
>   #define SAMPLE_OA_SOURCE      (1<<1)
> +#define SAMPLE_CTX_ID	      (1<<2)
>   
>   /**
>    * struct perf_open_properties - for validated properties given to open a stream
> @@ -340,6 +358,9 @@
>    * @oa_format: An OA unit HW report format
>    * @oa_periodic: Whether to enable periodic OA unit sampling
>    * @oa_period_exponent: The OA unit sampling period is derived from this
> + * @cs_mode: Whether the stream is configured to enable collection of metrics
> + * associated with command stream of a particular GPU engine
> + * @engine: The GPU engine associated with the stream in case cs_mode is enabled
>    *
>    * As read_properties_unlocked() enumerates and validates the properties given
>    * to open a stream of metrics the configuration is built up in the structure
> @@ -356,6 +377,10 @@ struct perf_open_properties {
>   	int oa_format;
>   	bool oa_periodic;
>   	int oa_period_exponent;
> +
> +	/* Command stream mode */
> +	bool cs_mode;
> +	enum intel_engine_id engine;
>   };
>   
>   static u32 gen8_oa_hw_tail_read(struct drm_i915_private *dev_priv)
> @@ -371,6 +396,266 @@ static u32 gen7_oa_hw_tail_read(struct drm_i915_private *dev_priv)
>   }
>   
>   /**
> + * i915_perf_emit_sample_capture - Insert the commands to capture metrics into
> + * the command stream of a GPU engine.
> + * @request: request in whose context the metrics are being collected.
> + * @preallocate: allocate space in ring for related sample.
> + *
> + * The function provides a hook through which the commands to capture perf
> + * metrics, are inserted into the command stream of a GPU engine.
> + */
> +void i915_perf_emit_sample_capture(struct drm_i915_gem_request *request,
> +				   bool preallocate)
> +{
> +	struct intel_engine_cs *engine = request->engine;
> +	struct drm_i915_private *dev_priv = engine->i915;
> +	struct i915_perf_stream *stream;
> +	int idx;
> +
> +	if (!dev_priv->perf.initialized)
> +		return;
> +
> +	idx = srcu_read_lock(&engine->perf_srcu);
> +	stream = srcu_dereference(engine->exclusive_stream, &engine->perf_srcu);
> +	if (stream && (stream->state == I915_PERF_STREAM_ENABLED) &&
> +				stream->cs_mode)
> +		stream->ops->emit_sample_capture(stream, request,
> +						 preallocate);
> +	srcu_read_unlock(&engine->perf_srcu, idx);
> +}
> +
> +/**
> + * release_perf_samples - Release old perf samples to make space for new
> + * sample data.
> + * @stream: Stream from which space is to be freed up.
> + * @target_size: Space required to be freed up.
> + *
> + * We also dereference the associated request before deleting the sample.
> + * Also, no need to check whether the commands associated with old samples
> + * have been completed. This is because these sample entries are anyways going
> + * to be replaced by a new sample, and gpu will eventually overwrite the buffer
> + * contents, when the request associated with new sample completes.
> + */
> +static void release_perf_samples(struct i915_perf_stream *stream,
> +				 u32 target_size)
> +{
> +	struct drm_i915_private *dev_priv = stream->dev_priv;
> +	struct i915_perf_cs_sample *sample, *next;
> +	u32 sample_size = dev_priv->perf.oa.oa_buffer.format_size;
> +	u32 size = 0;
> +
> +	list_for_each_entry_safe
> +		(sample, next, &stream->cs_samples, link) {
> +		size += sample_size;
> +		i915_gem_request_put(sample->request);
> +		list_del(&sample->link);
> +		kfree(sample);
> +
> +		if (size >= target_size)
> +			break;
> +	}
> +}
> +
> +/**
> + * insert_perf_sample - Insert a perf sample entry to the sample list.
> + * @stream: Stream into which sample is to be inserted.
> + * @sample: perf CS sample to be inserted into the list
> + *
> + * This function never fails, since it always manages to insert the sample.
> + * If the space is exhausted in the buffer, it will remove the older
> + * entries in order to make space.
> + */
> +static void insert_perf_sample(struct i915_perf_stream *stream,
> +				struct i915_perf_cs_sample *sample)
> +{
> +	struct drm_i915_private *dev_priv = stream->dev_priv;
> +	struct i915_perf_cs_sample *first, *last;
> +	int max_offset = stream->cs_buffer.vma->obj->base.size;
> +	u32 sample_size = dev_priv->perf.oa.oa_buffer.format_size;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&stream->cs_samples_lock, flags);
> +	if (list_empty(&stream->cs_samples)) {
> +		sample->offset = 0;
> +		list_add_tail(&sample->link, &stream->cs_samples);
> +		spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> +		return;
> +	}
> +
> +	first = list_first_entry(&stream->cs_samples, typeof(*first),
> +				link);
> +	last = list_last_entry(&stream->cs_samples, typeof(*last),
> +				link);
> +
> +	if (last->offset >= first->offset) {
> +		/* Sufficient space available at the end of buffer? */
> +		if (last->offset + 2*sample_size < max_offset)
> +			sample->offset = last->offset + sample_size;
> +		/*
> +		 * Wraparound condition. Is sufficient space available at
> +		 * beginning of buffer?
> +		 */
> +		else if (sample_size < first->offset)
> +			sample->offset = 0;
> +		/* Insufficient space. Overwrite existing old entries */
> +		else {
> +			u32 target_size = sample_size - first->offset;
> +
> +			release_perf_samples(stream, target_size);
> +			sample->offset = 0;
> +		}
> +	} else {
> +		/* Sufficient space available? */
> +		if (last->offset + 2*sample_size < first->offset)
> +			sample->offset = last->offset + sample_size;
> +		/* Insufficient space. Overwrite existing old entries */
> +		else {
> +			u32 target_size = sample_size -
> +				(first->offset - last->offset -
> +				sample_size);
> +
> +			release_perf_samples(stream, target_size);
> +			sample->offset = last->offset + sample_size;
> +		}
> +	}
> +	list_add_tail(&sample->link, &stream->cs_samples);
> +	spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> +}
> +
> +/**
> + * i915_emit_oa_report_capture - Insert the commands to capture OA
> + * reports metrics into the render command stream
> + * @request: request in whose context the metrics are being collected.
> + * @preallocate: allocate space in ring for related sample.
> + * @offset: command stream buffer offset where the OA metrics need to be
> + * collected
> + */
> +static int i915_emit_oa_report_capture(
> +				struct drm_i915_gem_request *request,
> +				bool preallocate,
> +				u32 offset)
> +{
> +	struct drm_i915_private *dev_priv = request->i915;
> +	struct intel_engine_cs *engine = request->engine;
> +	struct i915_perf_stream *stream;
> +	u32 addr = 0;
> +	u32 cmd, len = 4, *cs;
> +	int idx;
> +
> +	idx = srcu_read_lock(&engine->perf_srcu);
> +	stream = srcu_dereference(engine->exclusive_stream, &engine->perf_srcu);
> +	addr = stream->cs_buffer.vma->node.start + offset;
> +	srcu_read_unlock(&engine->perf_srcu, idx);
> +
> +	if (WARN_ON(addr & 0x3f)) {
> +		DRM_ERROR("OA buffer address not aligned to 64 byte\n");
> +		return -EINVAL;
> +	}
> +
> +	if (preallocate)
> +		request->reserved_space += len;
> +	else
> +		request->reserved_space -= len;
> +
> +	cs = intel_ring_begin(request, 4);
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	cmd = MI_REPORT_PERF_COUNT | (1<<0);
> +	if (INTEL_GEN(dev_priv) >= 8)
> +		cmd |= (2<<0);
> +
> +	*cs++ = cmd;
> +	*cs++ = addr | MI_REPORT_PERF_COUNT_GGTT;
> +	*cs++ = request->fence.seqno;
> +
> +	if (INTEL_GEN(dev_priv) >= 8)
> +		*cs++ = 0;
> +	else
> +		*cs++ = MI_NOOP;
> +
> +	intel_ring_advance(request, cs);
> +
> +	return 0;
> +}
> +
> +/**
> + * i915_perf_stream_emit_sample_capture - Insert the commands to capture perf
> + * metrics into the GPU command stream
> + * @stream: An i915-perf stream opened for GPU metrics
> + * @request: request in whose context the metrics are being collected.
> + * @preallocate: allocate space in ring for related sample.
> + */
> +static void i915_perf_stream_emit_sample_capture(
> +					struct i915_perf_stream *stream,
> +					struct drm_i915_gem_request *request,
> +					bool preallocate)
> +{
> +	struct reservation_object *resv = stream->cs_buffer.vma->resv;
> +	struct i915_perf_cs_sample *sample;
> +	unsigned long flags;
> +	int ret;
> +
> +	sample = kzalloc(sizeof(*sample), GFP_KERNEL);
> +	if (sample == NULL) {
> +		DRM_ERROR("Perf sample alloc failed\n");
> +		return;
> +	}
> +
> +	sample->request = i915_gem_request_get(request);
> +	sample->ctx_id = request->ctx->hw_id;
> +
> +	insert_perf_sample(stream, sample);
> +
> +	if (stream->sample_flags & SAMPLE_OA_REPORT) {
> +		ret = i915_emit_oa_report_capture(request,
> +						  preallocate,
> +						  sample->offset);
> +		if (ret)
> +			goto err_unref;
> +	}
> +
> +	reservation_object_lock(resv, NULL);
> +	if (reservation_object_reserve_shared(resv) == 0)
> +		reservation_object_add_shared_fence(resv, &request->fence);
> +	reservation_object_unlock(resv);
> +
> +	i915_vma_move_to_active(stream->cs_buffer.vma, request,
> +					EXEC_OBJECT_WRITE);
> +	return;
> +
> +err_unref:
> +	i915_gem_request_put(sample->request);
> +	spin_lock_irqsave(&stream->cs_samples_lock, flags);
> +	list_del(&sample->link);
> +	spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> +	kfree(sample);
> +}
> +
> +/**
> + * i915_perf_stream_release_samples - Release the perf command stream samples
> + * @stream: Stream from which sample are to be released.
> + *
> + * Note: The associated requests should be completed before releasing the
> + * references here.
> + */
> +static void i915_perf_stream_release_samples(struct i915_perf_stream *stream)
> +{
> +	struct i915_perf_cs_sample *entry, *next;
> +	unsigned long flags;
> +
> +	list_for_each_entry_safe
> +		(entry, next, &stream->cs_samples, link) {
> +		i915_gem_request_put(entry->request);
> +
> +		spin_lock_irqsave(&stream->cs_samples_lock, flags);
> +		list_del(&entry->link);
> +		spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> +		kfree(entry);
> +	}
> +}
> +
> +/**
>    * oa_buffer_check_unlocked - check for data and update tail ptr state
>    * @dev_priv: i915 device instance
>    *
> @@ -521,12 +806,13 @@ static int append_oa_status(struct i915_perf_stream *stream,
>   }
>   
>   /**
> - * append_oa_sample - Copies single OA report into userspace read() buffer.
> - * @stream: An i915-perf stream opened for OA metrics
> + * append_perf_sample - Copies single perf sample into userspace read() buffer.
> + * @stream: An i915-perf stream opened for perf samples
>    * @buf: destination buffer given by userspace
>    * @count: the number of bytes userspace wants to read
>    * @offset: (inout): the current position for writing into @buf
> - * @report: A single OA report to (optionally) include as part of the sample
> + * @data: perf sample data which contains (optionally) metrics configured
> + * earlier when opening a stream
>    *
>    * The contents of a sample are configured through `DRM_I915_PERF_PROP_SAMPLE_*`
>    * properties when opening a stream, tracked as `stream->sample_flags`. This
> @@ -537,11 +823,11 @@ static int append_oa_status(struct i915_perf_stream *stream,
>    *
>    * Returns: 0 on success, negative error code on failure.
>    */
> -static int append_oa_sample(struct i915_perf_stream *stream,
> +static int append_perf_sample(struct i915_perf_stream *stream,
>   			    char __user *buf,
>   			    size_t count,
>   			    size_t *offset,
> -			    const u8 *report)
> +			    const struct i915_perf_sample_data *data)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
> @@ -569,16 +855,21 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>   	 * transition. These are considered as source 'OABUFFER'.
>   	 */
>   	if (sample_flags & SAMPLE_OA_SOURCE) {
> -		u64 source = I915_PERF_SAMPLE_OA_SOURCE_OABUFFER;
> +		if (copy_to_user(buf, &data->source, 8))
> +			return -EFAULT;
> +		buf += 8;
> +	}
>   
> -		if (copy_to_user(buf, &source, 8))
> +	if (sample_flags & SAMPLE_CTX_ID) {
> +		if (copy_to_user(buf, &data->ctx_id, 8))
>   			return -EFAULT;
>   		buf += 8;
>   	}
>   
>   	if (sample_flags & SAMPLE_OA_REPORT) {
> -		if (copy_to_user(buf, report, report_size))
> +		if (copy_to_user(buf, data->report, report_size))
>   			return -EFAULT;
> +		buf += report_size;
>   	}
>   
>   	(*offset) += header.size;
> @@ -587,11 +878,54 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>   }
>   
>   /**
> + * append_oa_buffer_sample - Copies single periodic OA report into userspace
> + * read() buffer.
> + * @stream: An i915-perf stream opened for OA metrics
> + * @buf: destination buffer given by userspace
> + * @count: the number of bytes userspace wants to read
> + * @offset: (inout): the current position for writing into @buf
> + * @report: A single OA report to (optionally) include as part of the sample
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +static int append_oa_buffer_sample(struct i915_perf_stream *stream,
> +				char __user *buf, size_t count,
> +				size_t *offset,	const u8 *report)
> +{
> +	struct drm_i915_private *dev_priv = stream->dev_priv;
> +	u32 sample_flags = stream->sample_flags;
> +	struct i915_perf_sample_data data = { 0 };
> +	u32 *report32 = (u32 *)report;
> +
> +	if (sample_flags & SAMPLE_OA_SOURCE)
> +		data.source = I915_PERF_SAMPLE_OA_SOURCE_OABUFFER;
> +
> +	if (sample_flags & SAMPLE_CTX_ID) {
> +		if (INTEL_INFO(dev_priv)->gen < 8)
> +			data.ctx_id = 0;
> +		else {
> +			/*
> +			 * XXX: Just keep the lower 21 bits for now since I'm
> +			 * not entirely sure if the HW touches any of the higher
> +			 * bits in this field
> +			 */
> +			data.ctx_id = report32[2] & 0x1fffff;
> +		}
> +	}
> +
> +	if (sample_flags & SAMPLE_OA_REPORT)
> +		data.report = report;
> +
> +	return append_perf_sample(stream, buf, count, offset, &data);
> +}
> +
> +/**
>    * Copies all buffered OA reports into userspace read() buffer.
>    * @stream: An i915-perf stream opened for OA metrics
>    * @buf: destination buffer given by userspace
>    * @count: the number of bytes userspace wants to read
>    * @offset: (inout): the current position for writing into @buf
> + * @ts: copy OA reports till this timestamp
>    *
>    * Notably any error condition resulting in a short read (-%ENOSPC or
>    * -%EFAULT) will be returned even though one or more records may
> @@ -609,7 +943,8 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>   static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   				  char __user *buf,
>   				  size_t count,
> -				  size_t *offset)
> +				  size_t *offset,
> +				  u32 ts)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
> @@ -623,7 +958,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   	u32 taken;
>   	int ret = 0;
>   
> -	if (WARN_ON(!stream->enabled))
> +	if (WARN_ON(stream->state != I915_PERF_STREAM_ENABLED))
>   		return -EIO;
>   
>   	spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags);
> @@ -669,6 +1004,11 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   		u32 *report32 = (void *)report;
>   		u32 ctx_id;
>   		u32 reason;
> +		u32 report_ts = report32[1];
> +
> +		/* Report timestamp should not exceed the given ts */
> +		if (report_ts > ts)
> +			break;
>   
>   		/*
>   		 * All the report sizes factor neatly into the buffer
> @@ -750,23 +1090,23 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   		 * switches since it's not-uncommon for periodic samples to
>   		 * identify a switch before any 'context switch' report.
>   		 */
> -		if (!dev_priv->perf.oa.exclusive_stream->ctx ||
> -		    dev_priv->perf.oa.specific_ctx_id == ctx_id ||
> +		if (!stream->ctx ||
> +		    stream->engine->specific_ctx_id == ctx_id ||
>   		    (dev_priv->perf.oa.oa_buffer.last_ctx_id ==
> -		     dev_priv->perf.oa.specific_ctx_id) ||
> +		     stream->engine->specific_ctx_id) ||
>   		    reason & OAREPORT_REASON_CTX_SWITCH) {
>   
>   			/*
>   			 * While filtering for a single context we avoid
>   			 * leaking the IDs of other contexts.
>   			 */
> -			if (dev_priv->perf.oa.exclusive_stream->ctx &&
> -			    dev_priv->perf.oa.specific_ctx_id != ctx_id) {
> +			if (stream->ctx &&
> +			    stream->engine->specific_ctx_id != ctx_id) {
>   				report32[2] = INVALID_CTX_ID;
>   			}
>   
> -			ret = append_oa_sample(stream, buf, count, offset,
> -					       report);
> +			ret = append_oa_buffer_sample(stream, buf, count,
> +						      offset, report);
>   			if (ret)
>   				break;
>   
> @@ -807,6 +1147,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>    * @buf: destination buffer given by userspace
>    * @count: the number of bytes userspace wants to read
>    * @offset: (inout): the current position for writing into @buf
> + * @ts: copy OA reports till this timestamp
>    *
>    * Checks OA unit status registers and if necessary appends corresponding
>    * status records for userspace (such as for a buffer full condition) and then
> @@ -824,7 +1165,8 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   static int gen8_oa_read(struct i915_perf_stream *stream,
>   			char __user *buf,
>   			size_t count,
> -			size_t *offset)
> +			size_t *offset,
> +			u32 ts)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   	u32 oastatus;
> @@ -877,7 +1219,7 @@ static int gen8_oa_read(struct i915_perf_stream *stream,
>   			   oastatus & ~GEN8_OASTATUS_REPORT_LOST);
>   	}
>   
> -	return gen8_append_oa_reports(stream, buf, count, offset);
> +	return gen8_append_oa_reports(stream, buf, count, offset, ts);
>   }
>   
>   /**
> @@ -886,6 +1228,7 @@ static int gen8_oa_read(struct i915_perf_stream *stream,
>    * @buf: destination buffer given by userspace
>    * @count: the number of bytes userspace wants to read
>    * @offset: (inout): the current position for writing into @buf
> + * @ts: copy OA reports till this timestamp
>    *
>    * Notably any error condition resulting in a short read (-%ENOSPC or
>    * -%EFAULT) will be returned even though one or more records may
> @@ -903,7 +1246,8 @@ static int gen8_oa_read(struct i915_perf_stream *stream,
>   static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>   				  char __user *buf,
>   				  size_t count,
> -				  size_t *offset)
> +				  size_t *offset,
> +				  u32 ts)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
> @@ -917,7 +1261,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>   	u32 taken;
>   	int ret = 0;
>   
> -	if (WARN_ON(!stream->enabled))
> +	if (WARN_ON(stream->state != I915_PERF_STREAM_ENABLED))
>   		return -EIO;
>   
>   	spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags);
> @@ -984,7 +1328,12 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>   			continue;
>   		}
>   
> -		ret = append_oa_sample(stream, buf, count, offset, report);
> +		/* Report timestamp should not exceed the given ts */
> +		if (report32[1] > ts)
> +			break;
> +
> +		ret = append_oa_buffer_sample(stream, buf, count, offset,
> +					      report);
>   		if (ret)
>   			break;
>   
> @@ -1022,6 +1371,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>    * @buf: destination buffer given by userspace
>    * @count: the number of bytes userspace wants to read
>    * @offset: (inout): the current position for writing into @buf
> + * @ts: copy OA reports till this timestamp
>    *
>    * Checks Gen 7 specific OA unit status registers and if necessary appends
>    * corresponding status records for userspace (such as for a buffer full
> @@ -1035,7 +1385,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>   static int gen7_oa_read(struct i915_perf_stream *stream,
>   			char __user *buf,
>   			size_t count,
> -			size_t *offset)
> +			size_t *offset,
> +			u32 ts)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   	u32 oastatus1;
> @@ -1097,16 +1448,172 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
>   			GEN7_OASTATUS1_REPORT_LOST;
>   	}
>   
> -	return gen7_append_oa_reports(stream, buf, count, offset);
> +	return gen7_append_oa_reports(stream, buf, count, offset, ts);
> +}
> +
> +/**
> + * append_cs_buffer_sample - Copies single perf sample data associated with
> + * GPU command stream, into userspace read() buffer.
> + * @stream: An i915-perf stream opened for perf CS metrics
> + * @buf: destination buffer given by userspace
> + * @count: the number of bytes userspace wants to read
> + * @offset: (inout): the current position for writing into @buf
> + * @node: Sample data associated with perf metrics
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +static int append_cs_buffer_sample(struct i915_perf_stream *stream,
> +				char __user *buf,
> +				size_t count,
> +				size_t *offset,
> +				struct i915_perf_cs_sample *node)
> +{
> +	struct drm_i915_private *dev_priv = stream->dev_priv;
> +	struct i915_perf_sample_data data = { 0 };
> +	u32 sample_flags = stream->sample_flags;
> +	int ret = 0;
> +
> +	if (sample_flags & SAMPLE_OA_REPORT) {
> +		const u8 *report = stream->cs_buffer.vaddr + node->offset;
> +		u32 sample_ts = *(u32 *)(report + 4);
> +
> +		data.report = report;
> +
> +		/* First, append the periodic OA samples having lower
> +		 * timestamp values
> +		 */
> +		ret = dev_priv->perf.oa.ops.read(stream, buf, count, offset,
> +						 sample_ts);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	if (sample_flags & SAMPLE_OA_SOURCE)
> +		data.source = I915_PERF_SAMPLE_OA_SOURCE_CS;
> +
> +	if (sample_flags & SAMPLE_CTX_ID)
> +		data.ctx_id = node->ctx_id;
> +
> +	return append_perf_sample(stream, buf, count, offset, &data);
>   }
>   
>   /**
> - * i915_oa_wait_unlocked - handles blocking IO until OA data available
> + * append_cs_buffer_samples: Copies all command stream based perf samples
> + * into userspace read() buffer.
> + * @stream: An i915-perf stream opened for perf CS metrics
> + * @buf: destination buffer given by userspace
> + * @count: the number of bytes userspace wants to read
> + * @offset: (inout): the current position for writing into @buf
> + *
> + * Notably any error condition resulting in a short read (-%ENOSPC or
> + * -%EFAULT) will be returned even though one or more records may
> + * have been successfully copied. In this case it's up to the caller
> + * to decide if the error should be squashed before returning to
> + * userspace.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +static int append_cs_buffer_samples(struct i915_perf_stream *stream,
> +				char __user *buf,
> +				size_t count,
> +				size_t *offset)
> +{
> +	struct i915_perf_cs_sample *entry, *next;
> +	LIST_HEAD(free_list);
> +	int ret = 0;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&stream->cs_samples_lock, flags);
> +	if (list_empty(&stream->cs_samples)) {
> +		spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> +		return 0;
> +	}
> +	list_for_each_entry_safe(entry, next,
> +				 &stream->cs_samples, link) {
> +		if (!i915_gem_request_completed(entry->request))
> +			break;
> +		list_move_tail(&entry->link, &free_list);
> +	}
> +	spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> +
> +	if (list_empty(&free_list))
> +		return 0;
> +
> +	list_for_each_entry_safe(entry, next, &free_list, link) {
> +		ret = append_cs_buffer_sample(stream, buf, count, offset,
> +					      entry);
> +		if (ret)
> +			break;
> +
> +		list_del(&entry->link);
> +		i915_gem_request_put(entry->request);
> +		kfree(entry);
> +	}
> +
> +	/* Don't discard remaining entries, keep them for next read */
> +	spin_lock_irqsave(&stream->cs_samples_lock, flags);
> +	list_splice(&free_list, &stream->cs_samples);
> +	spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> +
> +	return ret;
> +}
> +
> +/*
> + * cs_buffer_is_empty - Checks whether the command stream buffer
> + * associated with the stream has data available.
>    * @stream: An i915-perf stream opened for OA metrics
>    *
> + * Returns: true if atleast one request associated with command stream is
> + * completed, else returns false.
> + */
> +static bool cs_buffer_is_empty(struct i915_perf_stream *stream)
> +
> +{
> +	struct i915_perf_cs_sample *entry = NULL;
> +	struct drm_i915_gem_request *request = NULL;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&stream->cs_samples_lock, flags);
> +	entry = list_first_entry_or_null(&stream->cs_samples,
> +			struct i915_perf_cs_sample, link);
> +	if (entry)
> +		request = entry->request;
> +	spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> +
> +	if (!entry)
> +		return true;
> +	else if (!i915_gem_request_completed(request))
> +		return true;
> +	else
> +		return false;
> +}
> +
> +/**
> + * stream_have_data_unlocked - Checks whether the stream has data available
> + * @stream: An i915-perf stream opened for OA metrics
> + *
> + * For command stream based streams, check if the command stream buffer has
> + * atleast one sample available, if not return false, irrespective of periodic
> + * oa buffer having the data or not.
> + */
> +
> +static bool stream_have_data_unlocked(struct i915_perf_stream *stream)
> +{
> +	struct drm_i915_private *dev_priv = stream->dev_priv;
> +
> +	if (stream->cs_mode)
> +		return !cs_buffer_is_empty(stream);
> +	else
> +		return oa_buffer_check_unlocked(dev_priv);
> +}
> +
> +/**
> + * i915_perf_stream_wait_unlocked - handles blocking IO until data available
> + * @stream: An i915-perf stream opened for GPU metrics
> + *
>    * Called when userspace tries to read() from a blocking stream FD opened
> - * for OA metrics. It waits until the hrtimer callback finds a non-empty
> - * OA buffer and wakes us.
> + * for perf metrics. It waits until the hrtimer callback finds a non-empty
> + * command stream buffer / OA buffer and wakes us.
>    *
>    * Note: it's acceptable to have this return with some false positives
>    * since any subsequent read handling will return -EAGAIN if there isn't
> @@ -1114,7 +1621,7 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
>    *
>    * Returns: zero on success or a negative error code
>    */
> -static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
> +static int i915_perf_stream_wait_unlocked(struct i915_perf_stream *stream)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   
> @@ -1122,32 +1629,47 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
>   	if (!dev_priv->perf.oa.periodic)
>   		return -EIO;
>   
> -	return wait_event_interruptible(dev_priv->perf.oa.poll_wq,
> -					oa_buffer_check_unlocked(dev_priv));
> +	if (stream->cs_mode) {
> +		long int ret;
> +
> +		/* Wait for the all sampled requests. */
> +		ret = reservation_object_wait_timeout_rcu(
> +						    stream->cs_buffer.vma->resv,
> +						    true,
> +						    true,
> +						    MAX_SCHEDULE_TIMEOUT);
> +		if (unlikely(ret < 0)) {
> +			DRM_DEBUG_DRIVER("Failed to wait for sampled requests: %li\n", ret);
> +			return ret;
> +		}
> +	}
> +
> +	return wait_event_interruptible(stream->poll_wq,
> +					stream_have_data_unlocked(stream));
>   }
>   
>   /**
> - * i915_oa_poll_wait - call poll_wait() for an OA stream poll()
> - * @stream: An i915-perf stream opened for OA metrics
> + * i915_perf_stream_poll_wait - call poll_wait() for an stream poll()
> + * @stream: An i915-perf stream opened for GPU metrics
>    * @file: An i915 perf stream file
>    * @wait: poll() state table
>    *
> - * For handling userspace polling on an i915 perf stream opened for OA metrics,
> + * For handling userspace polling on an i915 perf stream opened for metrics,
>    * this starts a poll_wait with the wait queue that our hrtimer callback wakes
> - * when it sees data ready to read in the circular OA buffer.
> + * when it sees data ready to read either in command stream buffer or in the
> + * circular OA buffer.
>    */
> -static void i915_oa_poll_wait(struct i915_perf_stream *stream,
> +static void i915_perf_stream_poll_wait(struct i915_perf_stream *stream,
>   			      struct file *file,
>   			      poll_table *wait)
>   {
> -	struct drm_i915_private *dev_priv = stream->dev_priv;
> -
> -	poll_wait(file, &dev_priv->perf.oa.poll_wq, wait);
> +	poll_wait(file, &stream->poll_wq, wait);
>   }
>   
>   /**
> - * i915_oa_read - just calls through to &i915_oa_ops->read
> - * @stream: An i915-perf stream opened for OA metrics
> + * i915_perf_stream_read - Reads perf metrics available into userspace read
> + * buffer
> + * @stream: An i915-perf stream opened for GPU metrics
>    * @buf: destination buffer given by userspace
>    * @count: the number of bytes userspace wants to read
>    * @offset: (inout): the current position for writing into @buf
> @@ -1157,14 +1679,21 @@ static void i915_oa_poll_wait(struct i915_perf_stream *stream,
>    *
>    * Returns: zero on success or a negative error code
>    */
> -static int i915_oa_read(struct i915_perf_stream *stream,
> +static int i915_perf_stream_read(struct i915_perf_stream *stream,
>   			char __user *buf,
>   			size_t count,
>   			size_t *offset)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   
> -	return dev_priv->perf.oa.ops.read(stream, buf, count, offset);
> +

Does the following code mean that a perf stream is either in cs_mode or 
OA mode?
I couldn't see that condition in the function processing the opening 
parameters.

The comments in the patch description also says :

"Both periodic and CS based reports are associated with a single stream"

The following code seems to contradict that. Can you explain how it works?

Thanks

> +	if (stream->cs_mode)
> +		return append_cs_buffer_samples(stream, buf, count, offset);
> +	else if (stream->sample_flags & SAMPLE_OA_REPORT)
> +		return dev_priv->perf.oa.ops.read(stream, buf, count, offset,
> +						U32_MAX);
> +	else
> +		return -EINVAL;
>   }
>   
>   /**
> @@ -1182,7 +1711,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   
>   	if (i915.enable_execlists)
> -		dev_priv->perf.oa.specific_ctx_id = stream->ctx->hw_id;
> +		stream->engine->specific_ctx_id = stream->ctx->hw_id;
>   	else {
>   		struct intel_engine_cs *engine = dev_priv->engine[RCS];
>   		struct intel_ring *ring;
> @@ -1209,7 +1738,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>   		 * i915_ggtt_offset() on the fly) considering the difference
>   		 * with gen8+ and execlists
>   		 */
> -		dev_priv->perf.oa.specific_ctx_id =
> +		stream->engine->specific_ctx_id =
>   			i915_ggtt_offset(stream->ctx->engine[engine->id].state);
>   	}
>   
> @@ -1228,13 +1757,13 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   
>   	if (i915.enable_execlists) {
> -		dev_priv->perf.oa.specific_ctx_id = INVALID_CTX_ID;
> +		stream->engine->specific_ctx_id = INVALID_CTX_ID;
>   	} else {
>   		struct intel_engine_cs *engine = dev_priv->engine[RCS];
>   
>   		mutex_lock(&dev_priv->drm.struct_mutex);
>   
> -		dev_priv->perf.oa.specific_ctx_id = INVALID_CTX_ID;
> +		stream->engine->specific_ctx_id = INVALID_CTX_ID;
>   		engine->context_unpin(engine, stream->ctx);
>   
>   		mutex_unlock(&dev_priv->drm.struct_mutex);
> @@ -1242,13 +1771,28 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)
>   }
>   
>   static void
> +free_cs_buffer(struct i915_perf_stream *stream)
> +{
> +	struct drm_i915_private *dev_priv = stream->dev_priv;
> +
> +	mutex_lock(&dev_priv->drm.struct_mutex);
> +
> +	i915_gem_object_unpin_map(stream->cs_buffer.vma->obj);
> +	i915_vma_unpin_and_release(&stream->cs_buffer.vma);
> +
> +	stream->cs_buffer.vma = NULL;
> +	stream->cs_buffer.vaddr = NULL;
> +
> +	mutex_unlock(&dev_priv->drm.struct_mutex);
> +}
> +
> +static void
>   free_oa_buffer(struct drm_i915_private *i915)
>   {
>   	mutex_lock(&i915->drm.struct_mutex);
>   
>   	i915_gem_object_unpin_map(i915->perf.oa.oa_buffer.vma->obj);
> -	i915_vma_unpin(i915->perf.oa.oa_buffer.vma);
> -	i915_gem_object_put(i915->perf.oa.oa_buffer.vma->obj);
> +	i915_vma_unpin_and_release(&i915->perf.oa.oa_buffer.vma);
>   
>   	i915->perf.oa.oa_buffer.vma = NULL;
>   	i915->perf.oa.oa_buffer.vaddr = NULL;
> @@ -1256,27 +1800,41 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)
>   	mutex_unlock(&i915->drm.struct_mutex);
>   }
>   
> -static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
> +static void i915_perf_stream_destroy(struct i915_perf_stream *stream)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
> -
> -	BUG_ON(stream != dev_priv->perf.oa.exclusive_stream);
> +	struct intel_engine_cs *engine = stream->engine;
> +	struct i915_perf_stream *engine_stream;
> +	int idx;
> +
> +	idx = srcu_read_lock(&engine->perf_srcu);
> +	engine_stream = srcu_dereference(engine->exclusive_stream,
> +					 &engine->perf_srcu);
> +	if (WARN_ON(stream != engine_stream))
> +		return;
> +	srcu_read_unlock(&engine->perf_srcu, idx);
>   
>   	/*
>   	 * Unset exclusive_stream first, it might be checked while
>   	 * disabling the metric set on gen8+.
>   	 */
> -	dev_priv->perf.oa.exclusive_stream = NULL;
> +	rcu_assign_pointer(stream->engine->exclusive_stream, NULL);
> +	synchronize_srcu(&stream->engine->perf_srcu);
>   
> -	dev_priv->perf.oa.ops.disable_metric_set(dev_priv);
> +	if (stream->using_oa) {
> +		dev_priv->perf.oa.ops.disable_metric_set(dev_priv);
>   
> -	free_oa_buffer(dev_priv);
> +		free_oa_buffer(dev_priv);
>   
> -	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> -	intel_runtime_pm_put(dev_priv);
> +		intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> +		intel_runtime_pm_put(dev_priv);
>   
> -	if (stream->ctx)
> -		oa_put_render_ctx_id(stream);
> +		if (stream->ctx)
> +			oa_put_render_ctx_id(stream);
> +	}
> +
> +	if (stream->cs_mode)
> +		free_cs_buffer(stream);
>   
>   	if (dev_priv->perf.oa.spurious_report_rs.missed) {
>   		DRM_NOTE("%d spurious OA report notices suppressed due to ratelimiting\n",
> @@ -1325,11 +1883,6 @@ static void gen7_init_oa_buffer(struct drm_i915_private *dev_priv)
>   	 * memory...
>   	 */
>   	memset(dev_priv->perf.oa.oa_buffer.vaddr, 0, OA_BUFFER_SIZE);
> -
> -	/* Maybe make ->pollin per-stream state if we support multiple
> -	 * concurrent streams in the future.
> -	 */
> -	dev_priv->perf.oa.pollin = false;
>   }
>   
>   static void gen8_init_oa_buffer(struct drm_i915_private *dev_priv)
> @@ -1383,33 +1936,26 @@ static void gen8_init_oa_buffer(struct drm_i915_private *dev_priv)
>   	 * memory...
>   	 */
>   	memset(dev_priv->perf.oa.oa_buffer.vaddr, 0, OA_BUFFER_SIZE);
> -
> -	/*
> -	 * Maybe make ->pollin per-stream state if we support multiple
> -	 * concurrent streams in the future.
> -	 */
> -	dev_priv->perf.oa.pollin = false;
>   }
>   
> -static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
> +static int alloc_obj(struct drm_i915_private *dev_priv,
> +		     struct i915_vma **vma, u8 **vaddr)
>   {
>   	struct drm_i915_gem_object *bo;
> -	struct i915_vma *vma;
>   	int ret;
>   
> -	if (WARN_ON(dev_priv->perf.oa.oa_buffer.vma))
> -		return -ENODEV;
> +	intel_runtime_pm_get(dev_priv);
>   
>   	ret = i915_mutex_lock_interruptible(&dev_priv->drm);
>   	if (ret)
> -		return ret;
> +		goto out;
>   
>   	BUILD_BUG_ON_NOT_POWER_OF_2(OA_BUFFER_SIZE);
>   	BUILD_BUG_ON(OA_BUFFER_SIZE < SZ_128K || OA_BUFFER_SIZE > SZ_16M);
>   
>   	bo = i915_gem_object_create(dev_priv, OA_BUFFER_SIZE);
>   	if (IS_ERR(bo)) {
> -		DRM_ERROR("Failed to allocate OA buffer\n");
> +		DRM_ERROR("Failed to allocate i915 perf obj\n");
>   		ret = PTR_ERR(bo);
>   		goto unlock;
>   	}
> @@ -1419,42 +1965,83 @@ static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
>   		goto err_unref;
>   
>   	/* PreHSW required 512K alignment, HSW requires 16M */
> -	vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);
> -	if (IS_ERR(vma)) {
> -		ret = PTR_ERR(vma);
> +	*vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);
> +	if (IS_ERR(*vma)) {
> +		ret = PTR_ERR(*vma);
>   		goto err_unref;
>   	}
> -	dev_priv->perf.oa.oa_buffer.vma = vma;
>   
> -	dev_priv->perf.oa.oa_buffer.vaddr =
> -		i915_gem_object_pin_map(bo, I915_MAP_WB);
> -	if (IS_ERR(dev_priv->perf.oa.oa_buffer.vaddr)) {
> -		ret = PTR_ERR(dev_priv->perf.oa.oa_buffer.vaddr);
> +	*vaddr = i915_gem_object_pin_map(bo, I915_MAP_WB);
> +	if (IS_ERR(*vaddr)) {
> +		ret = PTR_ERR(*vaddr);
>   		goto err_unpin;
>   	}
>   
> -	dev_priv->perf.oa.ops.init_oa_buffer(dev_priv);
> -
> -	DRM_DEBUG_DRIVER("OA Buffer initialized, gtt offset = 0x%x, vaddr = %p\n",
> -			 i915_ggtt_offset(dev_priv->perf.oa.oa_buffer.vma),
> -			 dev_priv->perf.oa.oa_buffer.vaddr);
> -
>   	goto unlock;
>   
>   err_unpin:
> -	__i915_vma_unpin(vma);
> +	i915_vma_unpin(*vma);
>   
>   err_unref:
>   	i915_gem_object_put(bo);
>   
> -	dev_priv->perf.oa.oa_buffer.vaddr = NULL;
> -	dev_priv->perf.oa.oa_buffer.vma = NULL;
> -
>   unlock:
>   	mutex_unlock(&dev_priv->drm.struct_mutex);
> +out:
> +	intel_runtime_pm_put(dev_priv);
>   	return ret;
>   }
>   
> +static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
> +{
> +	struct i915_vma *vma;
> +	u8 *vaddr;
> +	int ret;
> +
> +	if (WARN_ON(dev_priv->perf.oa.oa_buffer.vma))
> +		return -ENODEV;
> +
> +	ret = alloc_obj(dev_priv, &vma, &vaddr);
> +	if (ret)
> +		return ret;
> +
> +	dev_priv->perf.oa.oa_buffer.vma = vma;
> +	dev_priv->perf.oa.oa_buffer.vaddr = vaddr;
> +
> +	dev_priv->perf.oa.ops.init_oa_buffer(dev_priv);
> +
> +	DRM_DEBUG_DRIVER("OA Buffer initialized, gtt offset = 0x%x, vaddr = %p",
> +			 i915_ggtt_offset(dev_priv->perf.oa.oa_buffer.vma),
> +			 dev_priv->perf.oa.oa_buffer.vaddr);
> +	return 0;
> +}
> +
> +static int alloc_cs_buffer(struct i915_perf_stream *stream)
> +{
> +	struct drm_i915_private *dev_priv = stream->dev_priv;
> +	struct i915_vma *vma;
> +	u8 *vaddr;
> +	int ret;
> +
> +	if (WARN_ON(stream->cs_buffer.vma))
> +		return -ENODEV;
> +
> +	ret = alloc_obj(dev_priv, &vma, &vaddr);
> +	if (ret)
> +		return ret;
> +
> +	stream->cs_buffer.vma = vma;
> +	stream->cs_buffer.vaddr = vaddr;
> +	if (WARN_ON(!list_empty(&stream->cs_samples)))
> +		INIT_LIST_HEAD(&stream->cs_samples);
> +
> +	DRM_DEBUG_DRIVER("Command stream buf initialized, gtt offset = 0x%x, vaddr = %p",
> +			 i915_ggtt_offset(stream->cs_buffer.vma),
> +			 stream->cs_buffer.vaddr);
> +
> +	return 0;
> +}
> +
>   static void config_oa_regs(struct drm_i915_private *dev_priv,
>   			   const struct i915_oa_reg *regs,
>   			   int n_regs)
> @@ -1859,6 +2446,10 @@ static void gen8_disable_metric_set(struct drm_i915_private *dev_priv)
>   
>   static void gen7_oa_enable(struct drm_i915_private *dev_priv)
>   {
> +	struct i915_perf_stream *stream;
> +	struct intel_engine_cs *engine = dev_priv->engine[RCS];
> +	int idx;
> +
>   	/*
>   	 * Reset buf pointers so we don't forward reports from before now.
>   	 *
> @@ -1870,11 +2461,11 @@ static void gen7_oa_enable(struct drm_i915_private *dev_priv)
>   	 */
>   	gen7_init_oa_buffer(dev_priv);
>   
> -	if (dev_priv->perf.oa.exclusive_stream->enabled) {
> -		struct i915_gem_context *ctx =
> -			dev_priv->perf.oa.exclusive_stream->ctx;
> -		u32 ctx_id = dev_priv->perf.oa.specific_ctx_id;
> -
> +	idx = srcu_read_lock(&engine->perf_srcu);
> +	stream = srcu_dereference(engine->exclusive_stream, &engine->perf_srcu);
> +	if (stream->state != I915_PERF_STREAM_DISABLED) {
> +		struct i915_gem_context *ctx = stream->ctx;
> +		u32 ctx_id = engine->specific_ctx_id;
>   		bool periodic = dev_priv->perf.oa.periodic;
>   		u32 period_exponent = dev_priv->perf.oa.period_exponent;
>   		u32 report_format = dev_priv->perf.oa.oa_buffer.format;
> @@ -1889,6 +2480,7 @@ static void gen7_oa_enable(struct drm_i915_private *dev_priv)
>   			   GEN7_OACONTROL_ENABLE);
>   	} else
>   		I915_WRITE(GEN7_OACONTROL, 0);
> +	srcu_read_unlock(&engine->perf_srcu, idx);
>   }
>   
>   static void gen8_oa_enable(struct drm_i915_private *dev_priv)
> @@ -1917,22 +2509,23 @@ static void gen8_oa_enable(struct drm_i915_private *dev_priv)
>   }
>   
>   /**
> - * i915_oa_stream_enable - handle `I915_PERF_IOCTL_ENABLE` for OA stream
> - * @stream: An i915 perf stream opened for OA metrics
> + * i915_perf_stream_enable - handle `I915_PERF_IOCTL_ENABLE` for perf stream
> + * @stream: An i915 perf stream opened for GPU metrics
>    *
>    * [Re]enables hardware periodic sampling according to the period configured
>    * when opening the stream. This also starts a hrtimer that will periodically
>    * check for data in the circular OA buffer for notifying userspace (e.g.
>    * during a read() or poll()).
>    */
> -static void i915_oa_stream_enable(struct i915_perf_stream *stream)
> +static void i915_perf_stream_enable(struct i915_perf_stream *stream)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   
> -	dev_priv->perf.oa.ops.oa_enable(dev_priv);
> +	if (stream->sample_flags & SAMPLE_OA_REPORT)
> +		dev_priv->perf.oa.ops.oa_enable(dev_priv);
>   
> -	if (dev_priv->perf.oa.periodic)
> -		hrtimer_start(&dev_priv->perf.oa.poll_check_timer,
> +	if (stream->cs_mode || dev_priv->perf.oa.periodic)
> +		hrtimer_start(&dev_priv->perf.poll_check_timer,
>   			      ns_to_ktime(POLL_PERIOD),
>   			      HRTIMER_MODE_REL_PINNED);
>   }
> @@ -1948,34 +2541,39 @@ static void gen8_oa_disable(struct drm_i915_private *dev_priv)
>   }
>   
>   /**
> - * i915_oa_stream_disable - handle `I915_PERF_IOCTL_DISABLE` for OA stream
> - * @stream: An i915 perf stream opened for OA metrics
> + * i915_perf_stream_disable - handle `I915_PERF_IOCTL_DISABLE` for perf stream
> + * @stream: An i915 perf stream opened for GPU metrics
>    *
>    * Stops the OA unit from periodically writing counter reports into the
>    * circular OA buffer. This also stops the hrtimer that periodically checks for
>    * data in the circular OA buffer, for notifying userspace.
>    */
> -static void i915_oa_stream_disable(struct i915_perf_stream *stream)
> +static void i915_perf_stream_disable(struct i915_perf_stream *stream)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   
> -	dev_priv->perf.oa.ops.oa_disable(dev_priv);
> +	if (stream->cs_mode || dev_priv->perf.oa.periodic)
> +		hrtimer_cancel(&dev_priv->perf.poll_check_timer);
> +
> +	if (stream->cs_mode)
> +		i915_perf_stream_release_samples(stream);
>   
> -	if (dev_priv->perf.oa.periodic)
> -		hrtimer_cancel(&dev_priv->perf.oa.poll_check_timer);
> +	if (stream->sample_flags & SAMPLE_OA_REPORT)
> +		dev_priv->perf.oa.ops.oa_disable(dev_priv);
>   }
>   
> -static const struct i915_perf_stream_ops i915_oa_stream_ops = {
> -	.destroy = i915_oa_stream_destroy,
> -	.enable = i915_oa_stream_enable,
> -	.disable = i915_oa_stream_disable,
> -	.wait_unlocked = i915_oa_wait_unlocked,
> -	.poll_wait = i915_oa_poll_wait,
> -	.read = i915_oa_read,
> +static const struct i915_perf_stream_ops perf_stream_ops = {
> +	.destroy = i915_perf_stream_destroy,
> +	.enable = i915_perf_stream_enable,
> +	.disable = i915_perf_stream_disable,
> +	.wait_unlocked = i915_perf_stream_wait_unlocked,
> +	.poll_wait = i915_perf_stream_poll_wait,
> +	.read = i915_perf_stream_read,
> +	.emit_sample_capture = i915_perf_stream_emit_sample_capture,
>   };
>   
>   /**
> - * i915_oa_stream_init - validate combined props for OA stream and init
> + * i915_perf_stream_init - validate combined props for stream and init
>    * @stream: An i915 perf stream
>    * @param: The open parameters passed to `DRM_I915_PERF_OPEN`
>    * @props: The property state that configures stream (individually validated)
> @@ -1984,58 +2582,35 @@ static void i915_oa_stream_disable(struct i915_perf_stream *stream)
>    * doesn't ensure that the combination necessarily makes sense.
>    *
>    * At this point it has been determined that userspace wants a stream of
> - * OA metrics, but still we need to further validate the combined
> + * perf metrics, but still we need to further validate the combined
>    * properties are OK.
>    *
>    * If the configuration makes sense then we can allocate memory for
> - * a circular OA buffer and apply the requested metric set configuration.
> + * a circular perf buffer and apply the requested metric set configuration.
>    *
>    * Returns: zero on success or a negative error code.
>    */
> -static int i915_oa_stream_init(struct i915_perf_stream *stream,
> +static int i915_perf_stream_init(struct i915_perf_stream *stream,
>   			       struct drm_i915_perf_open_param *param,
>   			       struct perf_open_properties *props)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
> -	int format_size;
> +	bool require_oa_unit = props->sample_flags & (SAMPLE_OA_REPORT |
> +						      SAMPLE_OA_SOURCE);
> +	bool cs_sample_data = props->sample_flags & SAMPLE_OA_REPORT;
> +	struct i915_perf_stream *curr_stream;
> +	struct intel_engine_cs *engine = NULL;
> +	int idx;
>   	int ret;
>   
> -	/* If the sysfs metrics/ directory wasn't registered for some
> -	 * reason then don't let userspace try their luck with config
> -	 * IDs
> -	 */
> -	if (!dev_priv->perf.metrics_kobj) {
> -		DRM_DEBUG("OA metrics weren't advertised via sysfs\n");
> -		return -EINVAL;
> -	}
> -
> -	if (!(props->sample_flags & SAMPLE_OA_REPORT)) {
> -		DRM_DEBUG("Only OA report sampling supported\n");
> -		return -EINVAL;
> -	}
> -
> -	if (!dev_priv->perf.oa.ops.init_oa_buffer) {
> -		DRM_DEBUG("OA unit not supported\n");
> -		return -ENODEV;
> -	}
> -
> -	/* To avoid the complexity of having to accurately filter
> -	 * counter reports and marshal to the appropriate client
> -	 * we currently only allow exclusive access
> -	 */
> -	if (dev_priv->perf.oa.exclusive_stream) {
> -		DRM_DEBUG("OA unit already in use\n");
> -		return -EBUSY;
> -	}
> -
> -	if (!props->metrics_set) {
> -		DRM_DEBUG("OA metric set not specified\n");
> -		return -EINVAL;
> -	}
> -
> -	if (!props->oa_format) {
> -		DRM_DEBUG("OA report format not specified\n");
> -		return -EINVAL;
> +	if ((props->sample_flags & SAMPLE_CTX_ID) && !props->cs_mode) {
> +		if (IS_HASWELL(dev_priv)) {
> +			DRM_ERROR("On HSW, context ID sampling only supported via command stream\n");
> +			return -EINVAL;
> +		} else if (!i915.enable_execlists) {
> +			DRM_ERROR("On Gen8+ without execlists, context ID sampling only supported via command stream\n");
> +			return -EINVAL;
> +		}
>   	}
>   
>   	/* We set up some ratelimit state to potentially throttle any _NOTES
> @@ -2060,70 +2635,167 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
>   
>   	stream->sample_size = sizeof(struct drm_i915_perf_record_header);
>   
> -	format_size = dev_priv->perf.oa.oa_formats[props->oa_format].size;
> +	if (require_oa_unit) {
> +		int format_size;
>   
> -	stream->sample_flags |= SAMPLE_OA_REPORT;
> -	stream->sample_size += format_size;
> +		/* If the sysfs metrics/ directory wasn't registered for some
> +		 * reason then don't let userspace try their luck with config
> +		 * IDs
> +		 */
> +		if (!dev_priv->perf.metrics_kobj) {
> +			DRM_DEBUG("OA metrics weren't advertised via sysfs\n");
> +			return -EINVAL;
> +		}
>   
> -	if (props->sample_flags & SAMPLE_OA_SOURCE) {
> -		stream->sample_flags |= SAMPLE_OA_SOURCE;
> -		stream->sample_size += 8;
> -	}
> +		if (!dev_priv->perf.oa.ops.init_oa_buffer) {
> +			DRM_DEBUG("OA unit not supported\n");
> +			return -ENODEV;
> +		}
>   
> -	dev_priv->perf.oa.oa_buffer.format_size = format_size;
> -	if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
> -		return -EINVAL;
> +		if (!props->metrics_set) {
> +			DRM_DEBUG("OA metric set not specified\n");
> +			return -EINVAL;
> +		}
> +
> +		if (!props->oa_format) {
> +			DRM_DEBUG("OA report format not specified\n");
> +			return -EINVAL;
> +		}
> +
> +		if (props->cs_mode && (props->engine != RCS)) {
> +			DRM_ERROR("Command stream OA metrics only available via Render CS\n");
> +			return -EINVAL;
> +		}
> +
> +		engine = dev_priv->engine[RCS];
> +		stream->using_oa = true;
> +
> +		idx = srcu_read_lock(&engine->perf_srcu);
> +		curr_stream = srcu_dereference(engine->exclusive_stream,
> +					       &engine->perf_srcu);
> +		if (curr_stream) {
> +			DRM_ERROR("Stream already opened\n");
> +			ret = -EINVAL;
> +			goto err_enable;
> +		}
> +		srcu_read_unlock(&engine->perf_srcu, idx);
> +
> +		format_size =
> +			dev_priv->perf.oa.oa_formats[props->oa_format].size;
> +
> +		if (props->sample_flags & SAMPLE_OA_REPORT) {
> +			stream->sample_flags |= SAMPLE_OA_REPORT;
> +			stream->sample_size += format_size;
> +		}
> +
> +		if (props->sample_flags & SAMPLE_OA_SOURCE) {
> +			if (!(props->sample_flags & SAMPLE_OA_REPORT)) {
> +				DRM_ERROR("OA source type can't be sampled without OA report\n");
> +				return -EINVAL;
> +			}
> +			stream->sample_flags |= SAMPLE_OA_SOURCE;
> +			stream->sample_size += 8;
> +		}
> +
> +		dev_priv->perf.oa.oa_buffer.format_size = format_size;
> +		if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
> +			return -EINVAL;
> +
> +		dev_priv->perf.oa.oa_buffer.format =
> +			dev_priv->perf.oa.oa_formats[props->oa_format].format;
> +
> +		dev_priv->perf.oa.metrics_set = props->metrics_set;
>   
> -	dev_priv->perf.oa.oa_buffer.format =
> -		dev_priv->perf.oa.oa_formats[props->oa_format].format;
> +		dev_priv->perf.oa.periodic = props->oa_periodic;
> +		if (dev_priv->perf.oa.periodic)
> +			dev_priv->perf.oa.period_exponent =
> +				props->oa_period_exponent;
>   
> -	dev_priv->perf.oa.metrics_set = props->metrics_set;
> +		if (stream->ctx) {
> +			ret = oa_get_render_ctx_id(stream);
> +			if (ret)
> +				return ret;
> +		}
>   
> -	dev_priv->perf.oa.periodic = props->oa_periodic;
> -	if (dev_priv->perf.oa.periodic)
> -		dev_priv->perf.oa.period_exponent = props->oa_period_exponent;
> +		/* PRM - observability performance counters:
> +		 *
> +		 *   OACONTROL, performance counter enable, note:
> +		 *
> +		 *   "When this bit is set, in order to have coherent counts,
> +		 *   RC6 power state and trunk clock gating must be disabled.
> +		 *   This can be achieved by programming MMIO registers as
> +		 *   0xA094=0 and 0xA090[31]=1"
> +		 *
> +		 *   In our case we are expecting that taking pm + FORCEWAKE
> +		 *   references will effectively disable RC6.
> +		 */
> +		intel_runtime_pm_get(dev_priv);
> +		intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
>   
> -	if (stream->ctx) {
> -		ret = oa_get_render_ctx_id(stream);
> +		ret = alloc_oa_buffer(dev_priv);
>   		if (ret)
> -			return ret;
> +			goto err_oa_buf_alloc;
> +
> +		ret = dev_priv->perf.oa.ops.enable_metric_set(dev_priv);
> +		if (ret)
> +			goto err_enable;
>   	}
>   
> -	/* PRM - observability performance counters:
> -	 *
> -	 *   OACONTROL, performance counter enable, note:
> -	 *
> -	 *   "When this bit is set, in order to have coherent counts,
> -	 *   RC6 power state and trunk clock gating must be disabled.
> -	 *   This can be achieved by programming MMIO registers as
> -	 *   0xA094=0 and 0xA090[31]=1"
> -	 *
> -	 *   In our case we are expecting that taking pm + FORCEWAKE
> -	 *   references will effectively disable RC6.
> -	 */
> -	intel_runtime_pm_get(dev_priv);
> -	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
> +	if (props->sample_flags & SAMPLE_CTX_ID) {
> +		stream->sample_flags |= SAMPLE_CTX_ID;
> +		stream->sample_size += 8;
> +	}
>   
> -	ret = alloc_oa_buffer(dev_priv);
> -	if (ret)
> -		goto err_oa_buf_alloc;
> +	if (props->cs_mode) {
> +		if (!cs_sample_data) {
> +			DRM_ERROR("Stream engine given without requesting any CS data to sample\n");
> +			ret = -EINVAL;
> +			goto err_enable;
> +		}
>   
> -	ret = dev_priv->perf.oa.ops.enable_metric_set(dev_priv);
> -	if (ret)
> -		goto err_enable;
> +		if (!(props->sample_flags & SAMPLE_CTX_ID)) {
> +			DRM_ERROR("Stream engine given without requesting any CS specific property\n");
> +			ret = -EINVAL;
> +			goto err_enable;
> +		}
>   
> -	stream->ops = &i915_oa_stream_ops;
> +		engine = dev_priv->engine[props->engine];
>   
> -	dev_priv->perf.oa.exclusive_stream = stream;
> +		idx = srcu_read_lock(&engine->perf_srcu);
> +		curr_stream = srcu_dereference(engine->exclusive_stream,
> +					       &engine->perf_srcu);
> +		if (curr_stream) {
> +			DRM_ERROR("Stream already opened\n");
> +			ret = -EINVAL;
> +			goto err_enable;
> +		}
> +		srcu_read_unlock(&engine->perf_srcu, idx);
> +
> +		INIT_LIST_HEAD(&stream->cs_samples);
> +		ret = alloc_cs_buffer(stream);
> +		if (ret)
> +			goto err_enable;
> +
> +		stream->cs_mode = true;
> +	}
> +
> +	init_waitqueue_head(&stream->poll_wq);
> +	stream->pollin = false;
> +	stream->ops = &perf_stream_ops;
> +	stream->engine = engine;
> +	rcu_assign_pointer(engine->exclusive_stream, stream);
>   
>   	return 0;
>   
>   err_enable:
> -	free_oa_buffer(dev_priv);
> +	if (require_oa_unit)
> +		free_oa_buffer(dev_priv);
>   
>   err_oa_buf_alloc:
> -	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> -	intel_runtime_pm_put(dev_priv);
> +	if (require_oa_unit) {
> +		intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> +		intel_runtime_pm_put(dev_priv);
> +	}
>   	if (stream->ctx)
>   		oa_put_render_ctx_id(stream);
>   
> @@ -2219,7 +2891,7 @@ static ssize_t i915_perf_read(struct file *file,
>   	 * disabled stream as an error. In particular it might otherwise lead
>   	 * to a deadlock for blocking file descriptors...
>   	 */
> -	if (!stream->enabled)
> +	if (stream->state == I915_PERF_STREAM_DISABLED)
>   		return -EIO;
>   
>   	if (!(file->f_flags & O_NONBLOCK)) {
> @@ -2254,25 +2926,32 @@ static ssize_t i915_perf_read(struct file *file,
>   	 * effectively ensures we back off until the next hrtimer callback
>   	 * before reporting another POLLIN event.
>   	 */
> -	if (ret >= 0 || ret == -EAGAIN) {
> -		/* Maybe make ->pollin per-stream state if we support multiple
> -		 * concurrent streams in the future.
> -		 */
> -		dev_priv->perf.oa.pollin = false;
> -	}
> +	if (ret >= 0 || ret == -EAGAIN)
> +		stream->pollin = false;
>   
>   	return ret;
>   }
>   
> -static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer)
> +static enum hrtimer_restart poll_check_timer_cb(struct hrtimer *hrtimer)
>   {
> +	struct i915_perf_stream *stream;
>   	struct drm_i915_private *dev_priv =
>   		container_of(hrtimer, typeof(*dev_priv),
> -			     perf.oa.poll_check_timer);
> -
> -	if (oa_buffer_check_unlocked(dev_priv)) {
> -		dev_priv->perf.oa.pollin = true;
> -		wake_up(&dev_priv->perf.oa.poll_wq);
> +			     perf.poll_check_timer);
> +	int idx;
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +
> +	for_each_engine(engine, dev_priv, id) {
> +		idx = srcu_read_lock(&engine->perf_srcu);
> +		stream = srcu_dereference(engine->exclusive_stream,
> +					  &engine->perf_srcu);
> +		if (stream && (stream->state == I915_PERF_STREAM_ENABLED) &&
> +		    stream_have_data_unlocked(stream)) {
> +			stream->pollin = true;
> +			wake_up(&stream->poll_wq);
> +		}
> +		srcu_read_unlock(&engine->perf_srcu, idx);
>   	}
>   
>   	hrtimer_forward_now(hrtimer, ns_to_ktime(POLL_PERIOD));
> @@ -2311,7 +2990,7 @@ static unsigned int i915_perf_poll_locked(struct drm_i915_private *dev_priv,
>   	 * the hrtimer/oa_poll_check_timer_cb to notify us when there are
>   	 * samples to read.
>   	 */
> -	if (dev_priv->perf.oa.pollin)
> +	if (stream->pollin)
>   		events |= POLLIN;
>   
>   	return events;
> @@ -2355,14 +3034,16 @@ static unsigned int i915_perf_poll(struct file *file, poll_table *wait)
>    */
>   static void i915_perf_enable_locked(struct i915_perf_stream *stream)
>   {
> -	if (stream->enabled)
> +	if (stream->state != I915_PERF_STREAM_DISABLED)
>   		return;
>   
>   	/* Allow stream->ops->enable() to refer to this */
> -	stream->enabled = true;
> +	stream->state = I915_PERF_STREAM_ENABLE_IN_PROGRESS;
>   
>   	if (stream->ops->enable)
>   		stream->ops->enable(stream);
> +
> +	stream->state = I915_PERF_STREAM_ENABLED;
>   }
>   
>   /**
> @@ -2381,11 +3062,11 @@ static void i915_perf_enable_locked(struct i915_perf_stream *stream)
>    */
>   static void i915_perf_disable_locked(struct i915_perf_stream *stream)
>   {
> -	if (!stream->enabled)
> +	if (stream->state != I915_PERF_STREAM_ENABLED)
>   		return;
>   
>   	/* Allow stream->ops->disable() to refer to this */
> -	stream->enabled = false;
> +	stream->state = I915_PERF_STREAM_DISABLED;
>   
>   	if (stream->ops->disable)
>   		stream->ops->disable(stream);
> @@ -2457,14 +3138,12 @@ static long i915_perf_ioctl(struct file *file,
>    */
>   static void i915_perf_destroy_locked(struct i915_perf_stream *stream)
>   {
> -	if (stream->enabled)
> +	if (stream->state == I915_PERF_STREAM_ENABLED)
>   		i915_perf_disable_locked(stream);
>   
>   	if (stream->ops->destroy)
>   		stream->ops->destroy(stream);
>   
> -	list_del(&stream->link);
> -
>   	if (stream->ctx)
>   		i915_gem_context_put(stream->ctx);
>   
> @@ -2524,7 +3203,7 @@ static int i915_perf_release(struct inode *inode, struct file *file)
>    *
>    * In the case where userspace is interested in OA unit metrics then further
>    * config validation and stream initialization details will be handled by
> - * i915_oa_stream_init(). The code here should only validate config state that
> + * i915_perf_stream_init(). The code here should only validate config state that
>    * will be relevant to all stream types / backends.
>    *
>    * Returns: zero on success or a negative error code.
> @@ -2593,7 +3272,7 @@ static int i915_perf_release(struct inode *inode, struct file *file)
>   	stream->dev_priv = dev_priv;
>   	stream->ctx = specific_ctx;
>   
> -	ret = i915_oa_stream_init(stream, param, props);
> +	ret = i915_perf_stream_init(stream, param, props);
>   	if (ret)
>   		goto err_alloc;
>   
> @@ -2606,8 +3285,6 @@ static int i915_perf_release(struct inode *inode, struct file *file)
>   		goto err_flags;
>   	}
>   
> -	list_add(&stream->link, &dev_priv->perf.streams);
> -
>   	if (param->flags & I915_PERF_FLAG_FD_CLOEXEC)
>   		f_flags |= O_CLOEXEC;
>   	if (param->flags & I915_PERF_FLAG_FD_NONBLOCK)
> @@ -2625,7 +3302,6 @@ static int i915_perf_release(struct inode *inode, struct file *file)
>   	return stream_fd;
>   
>   err_open:
> -	list_del(&stream->link);
>   err_flags:
>   	if (stream->ops->destroy)
>   		stream->ops->destroy(stream);
> @@ -2774,6 +3450,29 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
>   		case DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE:
>   			props->sample_flags |= SAMPLE_OA_SOURCE;
>   			break;
> +		case DRM_I915_PERF_PROP_ENGINE: {
> +				unsigned int user_ring_id =
> +					value & I915_EXEC_RING_MASK;
> +				enum intel_engine_id engine;
> +
> +				if (user_ring_id > I915_USER_RINGS)
> +					return -EINVAL;
> +
> +				/* XXX: Currently only RCS is supported.
> +				 * Remove this check when support for other
> +				 * engines is added
> +				 */
> +				engine = user_ring_map[user_ring_id];
> +				if (engine != RCS)
> +					return -EINVAL;
> +
> +				props->cs_mode = true;
> +				props->engine = engine;
> +			}
> +			break;
> +		case DRM_I915_PERF_PROP_SAMPLE_CTX_ID:
> +			props->sample_flags |= SAMPLE_CTX_ID;
> +			break;
>   		case DRM_I915_PERF_PROP_MAX:
>   			MISSING_CASE(id);
>   			return -EINVAL;
> @@ -3002,6 +3701,30 @@ void i915_perf_unregister(struct drm_i915_private *dev_priv)
>   	{}
>   };
>   
> +void i915_perf_streams_mark_idle(struct drm_i915_private *dev_priv)
> +{
> +	struct intel_engine_cs *engine;
> +	struct i915_perf_stream *stream;
> +	enum intel_engine_id id;
> +	int idx;
> +
> +	for_each_engine(engine, dev_priv, id) {
> +		idx = srcu_read_lock(&engine->perf_srcu);
> +		stream = srcu_dereference(engine->exclusive_stream,
> +					  &engine->perf_srcu);
> +		if (stream && (stream->state == I915_PERF_STREAM_ENABLED) &&
> +					stream->cs_mode) {
> +			struct reservation_object *resv =
> +						stream->cs_buffer.vma->resv;
> +
> +			reservation_object_lock(resv, NULL);
> +			reservation_object_add_excl_fence(resv, NULL);
> +			reservation_object_unlock(resv);
> +		}
> +		srcu_read_unlock(&engine->perf_srcu, idx);
> +	}
> +}
> +
>   /**
>    * i915_perf_init - initialize i915-perf state on module load
>    * @dev_priv: i915 device instance
> @@ -3125,12 +3848,10 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
>   	}
>   
>   	if (dev_priv->perf.oa.n_builtin_sets) {
> -		hrtimer_init(&dev_priv->perf.oa.poll_check_timer,
> +		hrtimer_init(&dev_priv->perf.poll_check_timer,
>   				CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> -		dev_priv->perf.oa.poll_check_timer.function = oa_poll_check_timer_cb;
> -		init_waitqueue_head(&dev_priv->perf.oa.poll_wq);
> +		dev_priv->perf.poll_check_timer.function = poll_check_timer_cb;
>   
> -		INIT_LIST_HEAD(&dev_priv->perf.streams);
>   		mutex_init(&dev_priv->perf.lock);
>   		spin_lock_init(&dev_priv->perf.oa.oa_buffer.ptr_lock);
>   
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 9ab5969..1a2e843 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -317,6 +317,10 @@ int intel_engines_init(struct drm_i915_private *dev_priv)
>   			goto cleanup;
>   
>   		GEM_BUG_ON(!engine->submit_request);
> +
> +		/* Perf stream related initialization for Engine */
> +		rcu_assign_pointer(engine->exclusive_stream, NULL);
> +		init_srcu_struct(&engine->perf_srcu);
>   	}
>   
>   	return 0;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index cdf084e..4333623 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1622,6 +1622,8 @@ void intel_engine_cleanup(struct intel_engine_cs *engine)
>   
>   	intel_engine_cleanup_common(engine);
>   
> +	cleanup_srcu_struct(&engine->perf_srcu);
> +
>   	dev_priv->engine[engine->id] = NULL;
>   	kfree(engine);
>   }
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index d33c934..0ac8491 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -441,6 +441,11 @@ struct intel_engine_cs {
>   	 * certain bits to encode the command length in the header).
>   	 */
>   	u32 (*get_cmd_length_mask)(u32 cmd_header);
> +
> +	/* Global per-engine stream */
> +	struct srcu_struct perf_srcu;
> +	struct i915_perf_stream __rcu *exclusive_stream;
> +	u32 specific_ctx_id;
>   };
>   
>   static inline unsigned int
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index a1314c5..768b1a5 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1350,6 +1350,7 @@ enum drm_i915_oa_format {
>   
>   enum drm_i915_perf_sample_oa_source {
>   	I915_PERF_SAMPLE_OA_SOURCE_OABUFFER,
> +	I915_PERF_SAMPLE_OA_SOURCE_CS,
>   	I915_PERF_SAMPLE_OA_SOURCE_MAX	/* non-ABI */
>   };
>   
> @@ -1394,6 +1395,19 @@ enum drm_i915_perf_property_id {
>   	 */
>   	DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE,
>   
> +	/**
> +	 * The value of this property specifies the GPU engine for which
> +	 * the samples need to be collected. Specifying this property also
> +	 * implies the command stream based sample collection.
> +	 */
> +	DRM_I915_PERF_PROP_ENGINE,
> +
> +	/**
> +	 * The value of this property set to 1 requests inclusion of context ID
> +	 * in the perf sample data.
> +	 */
> +	DRM_I915_PERF_PROP_SAMPLE_CTX_ID,
> +
>   	DRM_I915_PERF_PROP_MAX /* non-ABI */
>   };
>   
> @@ -1460,6 +1474,7 @@ enum drm_i915_perf_record_type {
>   	 *     struct drm_i915_perf_record_header header;
>   	 *
>   	 *     { u64 source; } && DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE
> +	 *     { u64 ctx_id; } && DRM_I915_PERF_PROP_SAMPLE_CTX_ID
>   	 *     { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
>   	 * };
>   	 */


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.
  2017-07-31  8:34   ` Chris Wilson
@ 2017-07-31 10:11     ` Chris Wilson
  2017-08-02  4:44       ` Kamble, Sagar A
  0 siblings, 1 reply; 34+ messages in thread
From: Chris Wilson @ 2017-07-31 10:11 UTC (permalink / raw)
  To: Sagar Arun Kamble, intel-gfx; +Cc: Sourab Gupta

Quoting Chris Wilson (2017-07-31 09:34:30)
> Quoting Sagar Arun Kamble (2017-07-31 08:59:36)
> > +/**
> > + * i915_perf_stream_emit_sample_capture - Insert the commands to capture perf
> > + * metrics into the GPU command stream
> > + * @stream: An i915-perf stream opened for GPU metrics
> > + * @request: request in whose context the metrics are being collected.
> > + * @preallocate: allocate space in ring for related sample.
> > + */
> > +static void i915_perf_stream_emit_sample_capture(
> > +                                       struct i915_perf_stream *stream,
> > +                                       struct drm_i915_gem_request *request,
> > +                                       bool preallocate)
> > +{
> > +       struct reservation_object *resv = stream->cs_buffer.vma->resv;
> > +       struct i915_perf_cs_sample *sample;
> > +       unsigned long flags;
> > +       int ret;
> > +
> > +       sample = kzalloc(sizeof(*sample), GFP_KERNEL);
> > +       if (sample == NULL) {
> > +               DRM_ERROR("Perf sample alloc failed\n");
> > +               return;
> > +       }
> > +
> > +       sample->request = i915_gem_request_get(request);
> > +       sample->ctx_id = request->ctx->hw_id;
> > +
> > +       insert_perf_sample(stream, sample);
> > +
> > +       if (stream->sample_flags & SAMPLE_OA_REPORT) {
> > +               ret = i915_emit_oa_report_capture(request,
> > +                                                 preallocate,
> > +                                                 sample->offset);
> > +               if (ret)
> > +                       goto err_unref;
> > +       }
> 
> This is incorrect as the requests may be reordered. You either need to
> declare the linear ordering of requests through the sample buffer, or we
> have to delay setting sample->offset until execution, and even then we
> need to disable preemption when using SAMPLE_OA_REPORT.

Thinking about it, you do need to serialise based on stream->vma, or
else where a stream->vma per capture context.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 06/12] drm/i915: Populate ctx ID for periodic OA reports
  2017-07-31  9:27   ` Lionel Landwerlin
@ 2017-07-31 10:42     ` Kamble, Sagar A
  0 siblings, 0 replies; 34+ messages in thread
From: Kamble, Sagar A @ 2017-07-31 10:42 UTC (permalink / raw)
  To: Landwerlin, Lionel G, intel-gfx; +Cc: Sourab Gupta

Ctx_id for first submission will be its corresponding context as CS sample for that is allocated during submission with ctx_id taken from ctx->hw_id.
For periodic reports, cs sample after those reports will have the ctx_id info as the timestamp of that CS sample's report is greater than periodic report.

With no CS samples, periodic reports can't be associated with last context hence that would need change in following patch to set last ctx id to INVALID
f5f73cf drm/i915: Flush periodic samples, in case of no pending CS sample requests

Timestamps of OA reports taken before and after batch are used to associate ctx_id information with OA reports.
So for e.g. for batches B1, B2 if following is the timeline:
B1.start -> P1 -> P2 -> B1.end -> P3 -> B2.start -> P4 -> B2.end

Then while reading CS samples will be read first interleaved with OA samples so
Read sequence will be
1. Read B1.start report
2. Read P1 and P2 and associate with B1's context
3. Read B1.end report
4. Read P3 and associate with B1 (this is incorrect - should not be tagged with any context)
5. Read B2.start report
6. Read P4 and associate with B2's context
7. Read B2.end report



-----Original Message-----
From: Landwerlin, Lionel G 
Sent: Monday, July 31, 2017 2:57 PM
To: Kamble, Sagar A <sagar.a.kamble@intel.com>; intel-gfx@lists.freedesktop.org
Cc: Sourab Gupta <sourab.gupta@intel.com>
Subject: Re: [Intel-gfx] [PATCH 06/12] drm/i915: Populate ctx ID for periodic OA reports

Hi Sagar,

I'm curious to what happens if 2 contexts submit requests which a time period smaller than the sampling OA period on Gen7.5.
My understanding is that with this change you'll only retain the last submission and then the ctx_id reported in the SAMPLE_CTX_ID field will be incorrect for the first workload.

Am I missing something?

-
Lionel

On 31/07/17 08:59, Sagar Arun Kamble wrote:
> From: Sourab Gupta <sourab.gupta@intel.com>
>
> This adds support for populating the ctx id for the periodic OA 
> reports when requested through the corresponding property.
>
> For Gen8, the OA reports itself have the ctx ID and it is the one 
> programmed into HW while submitting workloads. Thus it's retrieved 
> from reports itself.
> For Gen7, the OA reports don't have any such field, and we can 
> populate this field with the last seen ctx ID while sending CS reports.
>
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h  |  8 ++++++
>   drivers/gpu/drm/i915/i915_perf.c | 58 +++++++++++++++++++++++++++++++---------
>   2 files changed, 54 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h 
> b/drivers/gpu/drm/i915/i915_drv.h index fb81315..6c011f3 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2077,6 +2077,8 @@ struct i915_perf_stream {
>   
>   	wait_queue_head_t poll_wq;
>   	bool pollin;
> +
> +	u32 last_ctx_id;
>   };
>   
>   /**
> @@ -2151,6 +2153,12 @@ struct i915_oa_ops {
>   	 * generations.
>   	 */
>   	u32 (*oa_hw_tail_read)(struct drm_i915_private *dev_priv);
> +
> +	/**
> +	 * @get_ctx_id: Retrieve the ctx_id associated with the (periodic) OA
> +	 * report.
> +	 */
> +	u32 (*get_ctx_id)(struct i915_perf_stream *stream, const u8 
> +*report);
>   };
>   
>   /*
> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
> b/drivers/gpu/drm/i915/i915_perf.c
> index 905c5bb..1f5ebdb 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -790,6 +790,45 @@ static u32 oa_buffer_num_reports_unlocked(
>   	return aged_tail == INVALID_TAIL_PTR ? 0 : num_reports;
>   }
>   
> +static u32 gen7_oa_buffer_get_ctx_id(struct i915_perf_stream *stream,
> +				    const u8 *report)
> +{
> +	if (!stream->cs_mode)
> +		WARN_ONCE(1,
> +			"CTX ID can't be retrieved if command stream mode not enabled");
> +
> +	/*
> +	 * OA reports generated in Gen7 don't have the ctx ID information.
> +	 * Therefore, just rely on the ctx ID information from the last CS
> +	 * sample forwarded
> +	 */
> +	return stream->last_ctx_id;
> +}
> +
> +static u32 gen8_oa_buffer_get_ctx_id(struct i915_perf_stream *stream,
> +				    const u8 *report)
> +{
> +	u32 ctx_id;
> +
> +	/* The ctx ID present in the OA reports have intel_context::hw_id
> +	 * present, since this is programmed into the ELSP in execlist mode.
> +	 * In non-execlist mode, fall back to retrieving the ctx ID from the
> +	 * last saved ctx ID from command stream mode.
> +	 */
> +	if (i915.enable_execlists) {
> +		u32 *report32 = (void *)report;
> +
> +		ctx_id = report32[2] & 0x1fffff;
> +	} else {
> +		if (!stream->cs_mode)
> +			WARN_ONCE(1,
> +				"CTX ID can't be retrieved if command stream mode not enabled");
> +
> +		ctx_id = stream->last_ctx_id;
> +	}
> +	return ctx_id;
> +}
> +
>   /**
>    * append_oa_status - Appends a status record to a userspace read() buffer.
>    * @stream: An i915-perf stream opened for OA metrics @@ -914,22 
> +953,12 @@ static int append_oa_buffer_sample(struct i915_perf_stream *stream,
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   	u32 sample_flags = stream->sample_flags;
>   	struct i915_perf_sample_data data = { 0 };
> -	u32 *report32 = (u32 *)report;
>   
>   	if (sample_flags & SAMPLE_OA_SOURCE)
>   		data.source = I915_PERF_SAMPLE_OA_SOURCE_OABUFFER;
>   
>   	if (sample_flags & SAMPLE_CTX_ID) {
> -		if (INTEL_INFO(dev_priv)->gen < 8)
> -			data.ctx_id = 0;
> -		else {
> -			/*
> -			 * XXX: Just keep the lower 21 bits for now since I'm
> -			 * not entirely sure if the HW touches any of the higher
> -			 * bits in this field
> -			 */
> -			data.ctx_id = report32[2] & 0x1fffff;
> -		}
> +		data.ctx_id = dev_priv->perf.oa.ops.get_ctx_id(stream, report);
>   	}
>   
>   	if (sample_flags & SAMPLE_OA_REPORT) @@ -1524,8 +1553,10 @@ static 
> int append_cs_buffer_sample(struct i915_perf_stream *stream,
>   	if (sample_flags & SAMPLE_OA_SOURCE)
>   		data.source = I915_PERF_SAMPLE_OA_SOURCE_CS;
>   
> -	if (sample_flags & SAMPLE_CTX_ID)
> +	if (sample_flags & SAMPLE_CTX_ID) {
>   		data.ctx_id = node->ctx_id;
> +		stream->last_ctx_id = data.ctx_id;
> +	}
>   
>   	return append_perf_sample(stream, buf, count, offset, &data);
>   }
> @@ -3838,6 +3869,7 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
>   		dev_priv->perf.oa.ops.read = gen7_oa_read;
>   		dev_priv->perf.oa.ops.oa_hw_tail_read =
>   			gen7_oa_hw_tail_read;
> +		dev_priv->perf.oa.ops.get_ctx_id = gen7_oa_buffer_get_ctx_id;
>   
>   		dev_priv->perf.oa.timestamp_frequency = 12500000;
>   
> @@ -3933,6 +3965,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
>   			dev_priv->perf.oa.ops.read = gen8_oa_read;
>   			dev_priv->perf.oa.ops.oa_hw_tail_read =
>   				gen8_oa_hw_tail_read;
> +			dev_priv->perf.oa.ops.get_ctx_id =
> +				gen8_oa_buffer_get_ctx_id;
>   
>   			dev_priv->perf.oa.oa_formats = gen8_plus_oa_formats;
>   		}


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.
  2017-07-31  9:43   ` Lionel Landwerlin
@ 2017-07-31 11:38     ` sourab gupta
  2017-07-31 14:25       ` Lionel Landwerlin
  0 siblings, 1 reply; 34+ messages in thread
From: sourab gupta @ 2017-07-31 11:38 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx, Sourab Gupta


[-- Attachment #1.1: Type: text/plain, Size: 67659 bytes --]

On Mon, Jul 31, 2017 at 3:13 PM, Lionel Landwerlin <
lionel.g.landwerlin@intel.com> wrote:

> On 31/07/17 08:59, Sagar Arun Kamble wrote:
>
>> From: Sourab Gupta <sourab.gupta@intel.com>
>>
>> This patch introduces a framework to capture OA counter reports associated
>> with Render command stream. We can then associate the reports captured
>> through this mechanism with their corresponding context id's. This can be
>> further extended to associate any other metadata information with the
>> corresponding samples (since the association with Render command stream
>> gives us the ability to capture these information while inserting the
>> corresponding capture commands into the command stream).
>>
>> The OA reports generated in this way are associated with a corresponding
>> workload, and thus can be used the delimit the workload (i.e. sample the
>> counters at the workload boundaries), within an ongoing stream of periodic
>> counter snapshots.
>>
>> There may be usecases wherein we need more than periodic OA capture mode
>> which is supported currently. This mode is primarily used for two
>> usecases:
>>      - Ability to capture system wide metrics, alongwith the ability to
>> map
>>        the reports back to individual contexts (particularly for HSW).
>>      - Ability to inject tags for work, into the reports. This provides
>>        visibility into the multiple stages of work within single context.
>>
>> The userspace will be able to distinguish between the periodic and CS
>> based
>> OA reports by the virtue of source_info sample field.
>>
>> The command MI_REPORT_PERF_COUNT can be used to capture snapshots of OA
>> counters, and is inserted at BB boundaries.
>> The data thus captured will be stored in a separate buffer, which will
>> be different from the buffer used otherwise for periodic OA capture mode.
>> The metadata information pertaining to snapshot is maintained in a list,
>> which also has offsets into the gem buffer object per captured snapshot.
>> In order to track whether the gpu has completed processing the node,
>> a field pertaining to corresponding gem request is added, which is tracked
>> for completion of the command.
>>
>> Both periodic and CS based reports are associated with a single stream
>> (corresponding to render engine), and it is expected to have the samples
>> in the sequential order according to their timestamps. Now, since these
>> reports are collected in separate buffers, these are merge sorted at the
>> time of forwarding to userspace during the read call.
>>
>> v2: Aligning with the non-perf interface (custom drm ioctl based). Also,
>> few related patches are squashed together for better readability
>>
>> v3: Updated perf sample capture emit hook name. Reserving space upfront
>> in the ring for emitting sample capture commands and using
>> req->fence.seqno for tracking samples. Added SRCU protection for streams.
>> Changed the stream last_request tracking to resv object. (Chris)
>> Updated perf.sample_lock spin_lock usage to avoid softlockups. Moved
>> stream to global per-engine structure. (Sagar)
>> Update unpin and put in the free routines to i915_vma_unpin_and_release.
>> Making use of perf stream cs_buffer vma resv instead of separate resv obj.
>> Pruned perf stream vma resv during gem_idle. (Chris)
>> Changed payload field ctx_id to u64 to keep all sample data aligned at 8
>> bytes. (Lionel)
>> stall/flush prior to sample capture is not added. Do we need to give this
>> control to user to select whether to stall/flush at each sample?
>>
>> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
>> Signed-off-by: Robert Bragg <robert@sixbynine.org>
>> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h            |  101 ++-
>>   drivers/gpu/drm/i915/i915_gem.c            |    1 +
>>   drivers/gpu/drm/i915/i915_gem_execbuffer.c |    8 +
>>   drivers/gpu/drm/i915/i915_perf.c           | 1185
>> ++++++++++++++++++++++------
>>   drivers/gpu/drm/i915/intel_engine_cs.c     |    4 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.h    |    5 +
>>   include/uapi/drm/i915_drm.h                |   15 +
>>   8 files changed, 1073 insertions(+), 248 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h
>> b/drivers/gpu/drm/i915/i915_drv.h
>> index 2c7456f..8b1cecf 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -1985,6 +1985,24 @@ struct i915_perf_stream_ops {
>>          * The stream will always be disabled before this is called.
>>          */
>>         void (*destroy)(struct i915_perf_stream *stream);
>> +
>> +       /*
>> +        * @emit_sample_capture: Emit the commands in the command streamer
>> +        * for a particular gpu engine.
>> +        *
>> +        * The commands are inserted to capture the perf sample data at
>> +        * specific points during workload execution, such as before and
>> after
>> +        * the batch buffer.
>> +        */
>> +       void (*emit_sample_capture)(struct i915_perf_stream *stream,
>> +                                   struct drm_i915_gem_request *request,
>> +                                   bool preallocate);
>> +};
>> +
>> +enum i915_perf_stream_state {
>> +       I915_PERF_STREAM_DISABLED,
>> +       I915_PERF_STREAM_ENABLE_IN_PROGRESS,
>> +       I915_PERF_STREAM_ENABLED,
>>   };
>>     /**
>> @@ -1997,9 +2015,9 @@ struct i915_perf_stream {
>>         struct drm_i915_private *dev_priv;
>>         /**
>> -        * @link: Links the stream into ``&drm_i915_private->streams``
>> +        * @engine: Engine to which this stream corresponds.
>>          */
>> -       struct list_head link;
>> +       struct intel_engine_cs *engine;
>>         /**
>>          * @sample_flags: Flags representing the
>> `DRM_I915_PERF_PROP_SAMPLE_*`
>> @@ -2022,17 +2040,41 @@ struct i915_perf_stream {
>>         struct i915_gem_context *ctx;
>>         /**
>> -        * @enabled: Whether the stream is currently enabled, considering
>> -        * whether the stream was opened in a disabled state and based
>> -        * on `I915_PERF_IOCTL_ENABLE` and `I915_PERF_IOCTL_DISABLE`
>> calls.
>> +        * @state: Current stream state, which can be either disabled,
>> enabled,
>> +        * or enable_in_progress, while considering whether the stream was
>> +        * opened in a disabled state and based on
>> `I915_PERF_IOCTL_ENABLE` and
>> +        * `I915_PERF_IOCTL_DISABLE` calls.
>>          */
>> -       bool enabled;
>> +       enum i915_perf_stream_state state;
>> +
>> +       /**
>> +        * @cs_mode: Whether command stream based perf sample collection
>> is
>> +        * enabled for this stream
>> +        */
>> +       bool cs_mode;
>> +
>> +       /**
>> +        * @using_oa: Whether OA unit is in use for this particular stream
>> +        */
>> +       bool using_oa;
>>         /**
>>          * @ops: The callbacks providing the implementation of this
>> specific
>>          * type of configured stream.
>>          */
>>         const struct i915_perf_stream_ops *ops;
>> +
>> +       /* Command stream based perf data buffer */
>> +       struct {
>> +               struct i915_vma *vma;
>> +               u8 *vaddr;
>> +       } cs_buffer;
>> +
>> +       struct list_head cs_samples;
>> +       spinlock_t cs_samples_lock;
>> +
>> +       wait_queue_head_t poll_wq;
>> +       bool pollin;
>>   };
>>     /**
>> @@ -2095,7 +2137,8 @@ struct i915_oa_ops {
>>         int (*read)(struct i915_perf_stream *stream,
>>                     char __user *buf,
>>                     size_t count,
>> -                   size_t *offset);
>> +                   size_t *offset,
>> +                   u32 ts);
>>         /**
>>          * @oa_hw_tail_read: read the OA tail pointer register
>> @@ -2107,6 +2150,36 @@ struct i915_oa_ops {
>>         u32 (*oa_hw_tail_read)(struct drm_i915_private *dev_priv);
>>   };
>>   +/*
>> + * i915_perf_cs_sample - Sample element to hold info about a single perf
>> + * sample data associated with a particular GPU command stream.
>> + */
>> +struct i915_perf_cs_sample {
>> +       /**
>> +        * @link: Links the sample into ``&stream->cs_samples``
>> +        */
>> +       struct list_head link;
>> +
>> +       /**
>> +        * @request: GEM request associated with the sample. The commands
>> to
>> +        * capture the perf metrics are inserted into the command
>> streamer in
>> +        * context of this request.
>> +        */
>> +       struct drm_i915_gem_request *request;
>> +
>> +       /**
>> +        * @offset: Offset into ``&stream->cs_buffer``
>> +        * where the perf metrics will be collected, when the commands
>> inserted
>> +        * into the command stream are executed by GPU.
>> +        */
>> +       u32 offset;
>> +
>> +       /**
>> +        * @ctx_id: Context ID associated with this perf sample
>> +        */
>> +       u32 ctx_id;
>> +};
>> +
>>   struct intel_cdclk_state {
>>         unsigned int cdclk, vco, ref;
>>   };
>> @@ -2431,17 +2504,10 @@ struct drm_i915_private {
>>                 struct ctl_table_header *sysctl_header;
>>                 struct mutex lock;
>> -               struct list_head streams;
>> -
>> -               struct {
>> -                       struct i915_perf_stream *exclusive_stream;
>>   -                     u32 specific_ctx_id;
>> -
>> -                       struct hrtimer poll_check_timer;
>> -                       wait_queue_head_t poll_wq;
>> -                       bool pollin;
>> +               struct hrtimer poll_check_timer;
>>   +             struct {
>>                         /**
>>                          * For rate limiting any notifications of spurious
>>                          * invalid OA reports
>> @@ -3636,6 +3702,8 @@ int i915_perf_open_ioctl(struct drm_device *dev,
>> void *data,
>>   void i915_oa_init_reg_state(struct intel_engine_cs *engine,
>>                             struct i915_gem_context *ctx,
>>                             uint32_t *reg_state);
>> +void i915_perf_emit_sample_capture(struct drm_i915_gem_request *req,
>> +                                  bool preallocate);
>>     /* i915_gem_evict.c */
>>   int __must_check i915_gem_evict_something(struct i915_address_space
>> *vm,
>> @@ -3795,6 +3863,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs
>> *engine,
>>   /* i915_perf.c */
>>   extern void i915_perf_init(struct drm_i915_private *dev_priv);
>>   extern void i915_perf_fini(struct drm_i915_private *dev_priv);
>> +extern void i915_perf_streams_mark_idle(struct drm_i915_private
>> *dev_priv);
>>   extern void i915_perf_register(struct drm_i915_private *dev_priv);
>>   extern void i915_perf_unregister(struct drm_i915_private *dev_priv);
>>   diff --git a/drivers/gpu/drm/i915/i915_gem.c
>> b/drivers/gpu/drm/i915/i915_gem.c
>> index 000a764..7b01548 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -3220,6 +3220,7 @@ bool i915_gem_unset_wedged(struct drm_i915_private
>> *i915)
>>         intel_engines_mark_idle(dev_priv);
>>         i915_gem_timelines_mark_idle(dev_priv);
>> +       i915_perf_streams_mark_idle(dev_priv);
>>         GEM_BUG_ON(!dev_priv->gt.awake);
>>         dev_priv->gt.awake = false;
>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> index 5fa4476..bfe546b 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> @@ -1194,12 +1194,16 @@ static int __reloc_gpu_alloc(struct
>> i915_execbuffer *eb,
>>         if (err)
>>                 goto err_request;
>>   +     i915_perf_emit_sample_capture(rq, true);
>> +
>>         err = eb->engine->emit_bb_start(rq,
>>                                         batch->node.start, PAGE_SIZE,
>>                                         cache->gen > 5 ? 0 :
>> I915_DISPATCH_SECURE);
>>         if (err)
>>                 goto err_request;
>>   +     i915_perf_emit_sample_capture(rq, false);
>> +
>>         GEM_BUG_ON(!reservation_object_test_signaled_rcu(batch->resv,
>> true));
>>         i915_vma_move_to_active(batch, rq, 0);
>>         reservation_object_lock(batch->resv, NULL);
>> @@ -2029,6 +2033,8 @@ static int eb_submit(struct i915_execbuffer *eb)
>>                         return err;
>>         }
>>   +     i915_perf_emit_sample_capture(eb->request, true);
>> +
>>         err = eb->engine->emit_bb_start(eb->request,
>>                                         eb->batch->node.start +
>>                                         eb->batch_start_offset,
>> @@ -2037,6 +2043,8 @@ static int eb_submit(struct i915_execbuffer *eb)
>>         if (err)
>>                 return err;
>>   +     i915_perf_emit_sample_capture(eb->request, false);
>> +
>>         return 0;
>>   }
>>   diff --git a/drivers/gpu/drm/i915/i915_perf.c
>> b/drivers/gpu/drm/i915/i915_perf.c
>> index b272653..57e1936 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -193,6 +193,7 @@
>>     #include <linux/anon_inodes.h>
>>   #include <linux/sizes.h>
>> +#include <linux/srcu.h>
>>     #include "i915_drv.h"
>>   #include "i915_oa_hsw.h"
>> @@ -288,6 +289,12 @@
>>   #define OAREPORT_REASON_CTX_SWITCH     (1<<3)
>>   #define OAREPORT_REASON_CLK_RATIO      (1<<5)
>>   +/* Data common to periodic and RCS based OA samples */
>> +struct i915_perf_sample_data {
>> +       u64 source;
>> +       u64 ctx_id;
>> +       const u8 *report;
>> +};
>>     /* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate
>>    *
>> @@ -328,8 +335,19 @@
>>         [I915_OA_FORMAT_C4_B8]              = { 7, 64 },
>>   };
>>   +/* Duplicated from similar static enum in i915_gem_execbuffer.c */
>> +#define I915_USER_RINGS (4)
>> +static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {
>> +       [I915_EXEC_DEFAULT]     = RCS,
>> +       [I915_EXEC_RENDER]      = RCS,
>> +       [I915_EXEC_BLT]         = BCS,
>> +       [I915_EXEC_BSD]         = VCS,
>> +       [I915_EXEC_VEBOX]       = VECS
>> +};
>> +
>>   #define SAMPLE_OA_REPORT      (1<<0)
>>   #define SAMPLE_OA_SOURCE      (1<<1)
>> +#define SAMPLE_CTX_ID        (1<<2)
>>     /**
>>    * struct perf_open_properties - for validated properties given to open
>> a stream
>> @@ -340,6 +358,9 @@
>>    * @oa_format: An OA unit HW report format
>>    * @oa_periodic: Whether to enable periodic OA unit sampling
>>    * @oa_period_exponent: The OA unit sampling period is derived from this
>> + * @cs_mode: Whether the stream is configured to enable collection of
>> metrics
>> + * associated with command stream of a particular GPU engine
>> + * @engine: The GPU engine associated with the stream in case cs_mode is
>> enabled
>>    *
>>    * As read_properties_unlocked() enumerates and validates the
>> properties given
>>    * to open a stream of metrics the configuration is built up in the
>> structure
>> @@ -356,6 +377,10 @@ struct perf_open_properties {
>>         int oa_format;
>>         bool oa_periodic;
>>         int oa_period_exponent;
>> +
>> +       /* Command stream mode */
>> +       bool cs_mode;
>> +       enum intel_engine_id engine;
>>   };
>>     static u32 gen8_oa_hw_tail_read(struct drm_i915_private *dev_priv)
>> @@ -371,6 +396,266 @@ static u32 gen7_oa_hw_tail_read(struct
>> drm_i915_private *dev_priv)
>>   }
>>     /**
>> + * i915_perf_emit_sample_capture - Insert the commands to capture
>> metrics into
>> + * the command stream of a GPU engine.
>> + * @request: request in whose context the metrics are being collected.
>> + * @preallocate: allocate space in ring for related sample.
>> + *
>> + * The function provides a hook through which the commands to capture
>> perf
>> + * metrics, are inserted into the command stream of a GPU engine.
>> + */
>> +void i915_perf_emit_sample_capture(struct drm_i915_gem_request *request,
>> +                                  bool preallocate)
>> +{
>> +       struct intel_engine_cs *engine = request->engine;
>> +       struct drm_i915_private *dev_priv = engine->i915;
>> +       struct i915_perf_stream *stream;
>> +       int idx;
>> +
>> +       if (!dev_priv->perf.initialized)
>> +               return;
>> +
>> +       idx = srcu_read_lock(&engine->perf_srcu);
>> +       stream = srcu_dereference(engine->exclusive_stream,
>> &engine->perf_srcu);
>> +       if (stream && (stream->state == I915_PERF_STREAM_ENABLED) &&
>> +                               stream->cs_mode)
>> +               stream->ops->emit_sample_capture(stream, request,
>> +                                                preallocate);
>> +       srcu_read_unlock(&engine->perf_srcu, idx);
>> +}
>> +
>> +/**
>> + * release_perf_samples - Release old perf samples to make space for new
>> + * sample data.
>> + * @stream: Stream from which space is to be freed up.
>> + * @target_size: Space required to be freed up.
>> + *
>> + * We also dereference the associated request before deleting the sample.
>> + * Also, no need to check whether the commands associated with old
>> samples
>> + * have been completed. This is because these sample entries are anyways
>> going
>> + * to be replaced by a new sample, and gpu will eventually overwrite the
>> buffer
>> + * contents, when the request associated with new sample completes.
>> + */
>> +static void release_perf_samples(struct i915_perf_stream *stream,
>> +                                u32 target_size)
>> +{
>> +       struct drm_i915_private *dev_priv = stream->dev_priv;
>> +       struct i915_perf_cs_sample *sample, *next;
>> +       u32 sample_size = dev_priv->perf.oa.oa_buffer.format_size;
>> +       u32 size = 0;
>> +
>> +       list_for_each_entry_safe
>> +               (sample, next, &stream->cs_samples, link) {
>> +               size += sample_size;
>> +               i915_gem_request_put(sample->request);
>> +               list_del(&sample->link);
>> +               kfree(sample);
>> +
>> +               if (size >= target_size)
>> +                       break;
>> +       }
>> +}
>> +
>> +/**
>> + * insert_perf_sample - Insert a perf sample entry to the sample list.
>> + * @stream: Stream into which sample is to be inserted.
>> + * @sample: perf CS sample to be inserted into the list
>> + *
>> + * This function never fails, since it always manages to insert the
>> sample.
>> + * If the space is exhausted in the buffer, it will remove the older
>> + * entries in order to make space.
>> + */
>> +static void insert_perf_sample(struct i915_perf_stream *stream,
>> +                               struct i915_perf_cs_sample *sample)
>> +{
>> +       struct drm_i915_private *dev_priv = stream->dev_priv;
>> +       struct i915_perf_cs_sample *first, *last;
>> +       int max_offset = stream->cs_buffer.vma->obj->base.size;
>> +       u32 sample_size = dev_priv->perf.oa.oa_buffer.format_size;
>> +       unsigned long flags;
>> +
>> +       spin_lock_irqsave(&stream->cs_samples_lock, flags);
>> +       if (list_empty(&stream->cs_samples)) {
>> +               sample->offset = 0;
>> +               list_add_tail(&sample->link, &stream->cs_samples);
>> +               spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
>> +               return;
>> +       }
>> +
>> +       first = list_first_entry(&stream->cs_samples, typeof(*first),
>> +                               link);
>> +       last = list_last_entry(&stream->cs_samples, typeof(*last),
>> +                               link);
>> +
>> +       if (last->offset >= first->offset) {
>> +               /* Sufficient space available at the end of buffer? */
>> +               if (last->offset + 2*sample_size < max_offset)
>> +                       sample->offset = last->offset + sample_size;
>> +               /*
>> +                * Wraparound condition. Is sufficient space available at
>> +                * beginning of buffer?
>> +                */
>> +               else if (sample_size < first->offset)
>> +                       sample->offset = 0;
>> +               /* Insufficient space. Overwrite existing old entries */
>> +               else {
>> +                       u32 target_size = sample_size - first->offset;
>> +
>> +                       release_perf_samples(stream, target_size);
>> +                       sample->offset = 0;
>> +               }
>> +       } else {
>> +               /* Sufficient space available? */
>> +               if (last->offset + 2*sample_size < first->offset)
>> +                       sample->offset = last->offset + sample_size;
>> +               /* Insufficient space. Overwrite existing old entries */
>> +               else {
>> +                       u32 target_size = sample_size -
>> +                               (first->offset - last->offset -
>> +                               sample_size);
>> +
>> +                       release_perf_samples(stream, target_size);
>> +                       sample->offset = last->offset + sample_size;
>> +               }
>> +       }
>> +       list_add_tail(&sample->link, &stream->cs_samples);
>> +       spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
>> +}
>> +
>> +/**
>> + * i915_emit_oa_report_capture - Insert the commands to capture OA
>> + * reports metrics into the render command stream
>> + * @request: request in whose context the metrics are being collected.
>> + * @preallocate: allocate space in ring for related sample.
>> + * @offset: command stream buffer offset where the OA metrics need to be
>> + * collected
>> + */
>> +static int i915_emit_oa_report_capture(
>> +                               struct drm_i915_gem_request *request,
>> +                               bool preallocate,
>> +                               u32 offset)
>> +{
>> +       struct drm_i915_private *dev_priv = request->i915;
>> +       struct intel_engine_cs *engine = request->engine;
>> +       struct i915_perf_stream *stream;
>> +       u32 addr = 0;
>> +       u32 cmd, len = 4, *cs;
>> +       int idx;
>> +
>> +       idx = srcu_read_lock(&engine->perf_srcu);
>> +       stream = srcu_dereference(engine->exclusive_stream,
>> &engine->perf_srcu);
>> +       addr = stream->cs_buffer.vma->node.start + offset;
>> +       srcu_read_unlock(&engine->perf_srcu, idx);
>> +
>> +       if (WARN_ON(addr & 0x3f)) {
>> +               DRM_ERROR("OA buffer address not aligned to 64 byte\n");
>> +               return -EINVAL;
>> +       }
>> +
>> +       if (preallocate)
>> +               request->reserved_space += len;
>> +       else
>> +               request->reserved_space -= len;
>> +
>> +       cs = intel_ring_begin(request, 4);
>> +       if (IS_ERR(cs))
>> +               return PTR_ERR(cs);
>> +
>> +       cmd = MI_REPORT_PERF_COUNT | (1<<0);
>> +       if (INTEL_GEN(dev_priv) >= 8)
>> +               cmd |= (2<<0);
>> +
>> +       *cs++ = cmd;
>> +       *cs++ = addr | MI_REPORT_PERF_COUNT_GGTT;
>> +       *cs++ = request->fence.seqno;
>> +
>> +       if (INTEL_GEN(dev_priv) >= 8)
>> +               *cs++ = 0;
>> +       else
>> +               *cs++ = MI_NOOP;
>> +
>> +       intel_ring_advance(request, cs);
>> +
>> +       return 0;
>> +}
>> +
>> +/**
>> + * i915_perf_stream_emit_sample_capture - Insert the commands to
>> capture perf
>> + * metrics into the GPU command stream
>> + * @stream: An i915-perf stream opened for GPU metrics
>> + * @request: request in whose context the metrics are being collected.
>> + * @preallocate: allocate space in ring for related sample.
>> + */
>> +static void i915_perf_stream_emit_sample_capture(
>> +                                       struct i915_perf_stream *stream,
>> +                                       struct drm_i915_gem_request
>> *request,
>> +                                       bool preallocate)
>> +{
>> +       struct reservation_object *resv = stream->cs_buffer.vma->resv;
>> +       struct i915_perf_cs_sample *sample;
>> +       unsigned long flags;
>> +       int ret;
>> +
>> +       sample = kzalloc(sizeof(*sample), GFP_KERNEL);
>> +       if (sample == NULL) {
>> +               DRM_ERROR("Perf sample alloc failed\n");
>> +               return;
>> +       }
>> +
>> +       sample->request = i915_gem_request_get(request);
>> +       sample->ctx_id = request->ctx->hw_id;
>> +
>> +       insert_perf_sample(stream, sample);
>> +
>> +       if (stream->sample_flags & SAMPLE_OA_REPORT) {
>> +               ret = i915_emit_oa_report_capture(request,
>> +                                                 preallocate,
>> +                                                 sample->offset);
>> +               if (ret)
>> +                       goto err_unref;
>> +       }
>> +
>> +       reservation_object_lock(resv, NULL);
>> +       if (reservation_object_reserve_shared(resv) == 0)
>> +               reservation_object_add_shared_fence(resv,
>> &request->fence);
>> +       reservation_object_unlock(resv);
>> +
>> +       i915_vma_move_to_active(stream->cs_buffer.vma, request,
>> +                                       EXEC_OBJECT_WRITE);
>> +       return;
>> +
>> +err_unref:
>> +       i915_gem_request_put(sample->request);
>> +       spin_lock_irqsave(&stream->cs_samples_lock, flags);
>> +       list_del(&sample->link);
>> +       spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
>> +       kfree(sample);
>> +}
>> +
>> +/**
>> + * i915_perf_stream_release_samples - Release the perf command stream
>> samples
>> + * @stream: Stream from which sample are to be released.
>> + *
>> + * Note: The associated requests should be completed before releasing the
>> + * references here.
>> + */
>> +static void i915_perf_stream_release_samples(struct i915_perf_stream
>> *stream)
>> +{
>> +       struct i915_perf_cs_sample *entry, *next;
>> +       unsigned long flags;
>> +
>> +       list_for_each_entry_safe
>> +               (entry, next, &stream->cs_samples, link) {
>> +               i915_gem_request_put(entry->request);
>> +
>> +               spin_lock_irqsave(&stream->cs_samples_lock, flags);
>> +               list_del(&entry->link);
>> +               spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
>> +               kfree(entry);
>> +       }
>> +}
>> +
>> +/**
>>    * oa_buffer_check_unlocked - check for data and update tail ptr state
>>    * @dev_priv: i915 device instance
>>    *
>> @@ -521,12 +806,13 @@ static int append_oa_status(struct i915_perf_stream
>> *stream,
>>   }
>>     /**
>> - * append_oa_sample - Copies single OA report into userspace read()
>> buffer.
>> - * @stream: An i915-perf stream opened for OA metrics
>> + * append_perf_sample - Copies single perf sample into userspace read()
>> buffer.
>> + * @stream: An i915-perf stream opened for perf samples
>>    * @buf: destination buffer given by userspace
>>    * @count: the number of bytes userspace wants to read
>>    * @offset: (inout): the current position for writing into @buf
>> - * @report: A single OA report to (optionally) include as part of the
>> sample
>> + * @data: perf sample data which contains (optionally) metrics configured
>> + * earlier when opening a stream
>>    *
>>    * The contents of a sample are configured through
>> `DRM_I915_PERF_PROP_SAMPLE_*`
>>    * properties when opening a stream, tracked as `stream->sample_flags`.
>> This
>> @@ -537,11 +823,11 @@ static int append_oa_status(struct i915_perf_stream
>> *stream,
>>    *
>>    * Returns: 0 on success, negative error code on failure.
>>    */
>> -static int append_oa_sample(struct i915_perf_stream *stream,
>> +static int append_perf_sample(struct i915_perf_stream *stream,
>>                             char __user *buf,
>>                             size_t count,
>>                             size_t *offset,
>> -                           const u8 *report)
>> +                           const struct i915_perf_sample_data *data)
>>   {
>>         struct drm_i915_private *dev_priv = stream->dev_priv;
>>         int report_size = dev_priv->perf.oa.oa_buffer.format_size;
>> @@ -569,16 +855,21 @@ static int append_oa_sample(struct i915_perf_stream
>> *stream,
>>          * transition. These are considered as source 'OABUFFER'.
>>          */
>>         if (sample_flags & SAMPLE_OA_SOURCE) {
>> -               u64 source = I915_PERF_SAMPLE_OA_SOURCE_OABUFFER;
>> +               if (copy_to_user(buf, &data->source, 8))
>> +                       return -EFAULT;
>> +               buf += 8;
>> +       }
>>   -             if (copy_to_user(buf, &source, 8))
>> +       if (sample_flags & SAMPLE_CTX_ID) {
>> +               if (copy_to_user(buf, &data->ctx_id, 8))
>>                         return -EFAULT;
>>                 buf += 8;
>>         }
>>         if (sample_flags & SAMPLE_OA_REPORT) {
>> -               if (copy_to_user(buf, report, report_size))
>> +               if (copy_to_user(buf, data->report, report_size))
>>                         return -EFAULT;
>> +               buf += report_size;
>>         }
>>         (*offset) += header.size;
>> @@ -587,11 +878,54 @@ static int append_oa_sample(struct i915_perf_stream
>> *stream,
>>   }
>>     /**
>> + * append_oa_buffer_sample - Copies single periodic OA report into
>> userspace
>> + * read() buffer.
>> + * @stream: An i915-perf stream opened for OA metrics
>> + * @buf: destination buffer given by userspace
>> + * @count: the number of bytes userspace wants to read
>> + * @offset: (inout): the current position for writing into @buf
>> + * @report: A single OA report to (optionally) include as part of the
>> sample
>> + *
>> + * Returns: 0 on success, negative error code on failure.
>> + */
>> +static int append_oa_buffer_sample(struct i915_perf_stream *stream,
>> +                               char __user *buf, size_t count,
>> +                               size_t *offset, const u8 *report)
>> +{
>> +       struct drm_i915_private *dev_priv = stream->dev_priv;
>> +       u32 sample_flags = stream->sample_flags;
>> +       struct i915_perf_sample_data data = { 0 };
>> +       u32 *report32 = (u32 *)report;
>> +
>> +       if (sample_flags & SAMPLE_OA_SOURCE)
>> +               data.source = I915_PERF_SAMPLE_OA_SOURCE_OABUFFER;
>> +
>> +       if (sample_flags & SAMPLE_CTX_ID) {
>> +               if (INTEL_INFO(dev_priv)->gen < 8)
>> +                       data.ctx_id = 0;
>> +               else {
>> +                       /*
>> +                        * XXX: Just keep the lower 21 bits for now since
>> I'm
>> +                        * not entirely sure if the HW touches any of the
>> higher
>> +                        * bits in this field
>> +                        */
>> +                       data.ctx_id = report32[2] & 0x1fffff;
>> +               }
>> +       }
>> +
>> +       if (sample_flags & SAMPLE_OA_REPORT)
>> +               data.report = report;
>> +
>> +       return append_perf_sample(stream, buf, count, offset, &data);
>> +}
>> +
>> +/**
>>    * Copies all buffered OA reports into userspace read() buffer.
>>    * @stream: An i915-perf stream opened for OA metrics
>>    * @buf: destination buffer given by userspace
>>    * @count: the number of bytes userspace wants to read
>>    * @offset: (inout): the current position for writing into @buf
>> + * @ts: copy OA reports till this timestamp
>>    *
>>    * Notably any error condition resulting in a short read (-%ENOSPC or
>>    * -%EFAULT) will be returned even though one or more records may
>> @@ -609,7 +943,8 @@ static int append_oa_sample(struct i915_perf_stream
>> *stream,
>>   static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>>                                   char __user *buf,
>>                                   size_t count,
>> -                                 size_t *offset)
>> +                                 size_t *offset,
>> +                                 u32 ts)
>>   {
>>         struct drm_i915_private *dev_priv = stream->dev_priv;
>>         int report_size = dev_priv->perf.oa.oa_buffer.format_size;
>> @@ -623,7 +958,7 @@ static int gen8_append_oa_reports(struct
>> i915_perf_stream *stream,
>>         u32 taken;
>>         int ret = 0;
>>   -     if (WARN_ON(!stream->enabled))
>> +       if (WARN_ON(stream->state != I915_PERF_STREAM_ENABLED))
>>                 return -EIO;
>>         spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags);
>> @@ -669,6 +1004,11 @@ static int gen8_append_oa_reports(struct
>> i915_perf_stream *stream,
>>                 u32 *report32 = (void *)report;
>>                 u32 ctx_id;
>>                 u32 reason;
>> +               u32 report_ts = report32[1];
>> +
>> +               /* Report timestamp should not exceed the given ts */
>> +               if (report_ts > ts)
>> +                       break;
>>                 /*
>>                  * All the report sizes factor neatly into the buffer
>> @@ -750,23 +1090,23 @@ static int gen8_append_oa_reports(struct
>> i915_perf_stream *stream,
>>                  * switches since it's not-uncommon for periodic samples
>> to
>>                  * identify a switch before any 'context switch' report.
>>                  */
>> -               if (!dev_priv->perf.oa.exclusive_stream->ctx ||
>> -                   dev_priv->perf.oa.specific_ctx_id == ctx_id ||
>> +               if (!stream->ctx ||
>> +                   stream->engine->specific_ctx_id == ctx_id ||
>>                     (dev_priv->perf.oa.oa_buffer.last_ctx_id ==
>> -                    dev_priv->perf.oa.specific_ctx_id) ||
>> +                    stream->engine->specific_ctx_id) ||
>>                     reason & OAREPORT_REASON_CTX_SWITCH) {
>>                         /*
>>                          * While filtering for a single context we avoid
>>                          * leaking the IDs of other contexts.
>>                          */
>> -                       if (dev_priv->perf.oa.exclusive_stream->ctx &&
>> -                           dev_priv->perf.oa.specific_ctx_id != ctx_id)
>> {
>> +                       if (stream->ctx &&
>> +                           stream->engine->specific_ctx_id != ctx_id) {
>>                                 report32[2] = INVALID_CTX_ID;
>>                         }
>>   -                     ret = append_oa_sample(stream, buf, count, offset,
>> -                                              report);
>> +                       ret = append_oa_buffer_sample(stream, buf, count,
>> +                                                     offset, report);
>>                         if (ret)
>>                                 break;
>>   @@ -807,6 +1147,7 @@ static int gen8_append_oa_reports(struct
>> i915_perf_stream *stream,
>>    * @buf: destination buffer given by userspace
>>    * @count: the number of bytes userspace wants to read
>>    * @offset: (inout): the current position for writing into @buf
>> + * @ts: copy OA reports till this timestamp
>>    *
>>    * Checks OA unit status registers and if necessary appends
>> corresponding
>>    * status records for userspace (such as for a buffer full condition)
>> and then
>> @@ -824,7 +1165,8 @@ static int gen8_append_oa_reports(struct
>> i915_perf_stream *stream,
>>   static int gen8_oa_read(struct i915_perf_stream *stream,
>>                         char __user *buf,
>>                         size_t count,
>> -                       size_t *offset)
>> +                       size_t *offset,
>> +                       u32 ts)
>>   {
>>         struct drm_i915_private *dev_priv = stream->dev_priv;
>>         u32 oastatus;
>> @@ -877,7 +1219,7 @@ static int gen8_oa_read(struct i915_perf_stream
>> *stream,
>>                            oastatus & ~GEN8_OASTATUS_REPORT_LOST);
>>         }
>>   -     return gen8_append_oa_reports(stream, buf, count, offset);
>> +       return gen8_append_oa_reports(stream, buf, count, offset, ts);
>>   }
>>     /**
>> @@ -886,6 +1228,7 @@ static int gen8_oa_read(struct i915_perf_stream
>> *stream,
>>    * @buf: destination buffer given by userspace
>>    * @count: the number of bytes userspace wants to read
>>    * @offset: (inout): the current position for writing into @buf
>> + * @ts: copy OA reports till this timestamp
>>    *
>>    * Notably any error condition resulting in a short read (-%ENOSPC or
>>    * -%EFAULT) will be returned even though one or more records may
>> @@ -903,7 +1246,8 @@ static int gen8_oa_read(struct i915_perf_stream
>> *stream,
>>   static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>>                                   char __user *buf,
>>                                   size_t count,
>> -                                 size_t *offset)
>> +                                 size_t *offset,
>> +                                 u32 ts)
>>   {
>>         struct drm_i915_private *dev_priv = stream->dev_priv;
>>         int report_size = dev_priv->perf.oa.oa_buffer.format_size;
>> @@ -917,7 +1261,7 @@ static int gen7_append_oa_reports(struct
>> i915_perf_stream *stream,
>>         u32 taken;
>>         int ret = 0;
>>   -     if (WARN_ON(!stream->enabled))
>> +       if (WARN_ON(stream->state != I915_PERF_STREAM_ENABLED))
>>                 return -EIO;
>>         spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags);
>> @@ -984,7 +1328,12 @@ static int gen7_append_oa_reports(struct
>> i915_perf_stream *stream,
>>                         continue;
>>                 }
>>   -             ret = append_oa_sample(stream, buf, count, offset,
>> report);
>> +               /* Report timestamp should not exceed the given ts */
>> +               if (report32[1] > ts)
>> +                       break;
>> +
>> +               ret = append_oa_buffer_sample(stream, buf, count, offset,
>> +                                             report);
>>                 if (ret)
>>                         break;
>>   @@ -1022,6 +1371,7 @@ static int gen7_append_oa_reports(struct
>> i915_perf_stream *stream,
>>    * @buf: destination buffer given by userspace
>>    * @count: the number of bytes userspace wants to read
>>    * @offset: (inout): the current position for writing into @buf
>> + * @ts: copy OA reports till this timestamp
>>    *
>>    * Checks Gen 7 specific OA unit status registers and if necessary
>> appends
>>    * corresponding status records for userspace (such as for a buffer full
>> @@ -1035,7 +1385,8 @@ static int gen7_append_oa_reports(struct
>> i915_perf_stream *stream,
>>   static int gen7_oa_read(struct i915_perf_stream *stream,
>>                         char __user *buf,
>>                         size_t count,
>> -                       size_t *offset)
>> +                       size_t *offset,
>> +                       u32 ts)
>>   {
>>         struct drm_i915_private *dev_priv = stream->dev_priv;
>>         u32 oastatus1;
>> @@ -1097,16 +1448,172 @@ static int gen7_oa_read(struct i915_perf_stream
>> *stream,
>>                         GEN7_OASTATUS1_REPORT_LOST;
>>         }
>>   -     return gen7_append_oa_reports(stream, buf, count, offset);
>> +       return gen7_append_oa_reports(stream, buf, count, offset, ts);
>> +}
>> +
>> +/**
>> + * append_cs_buffer_sample - Copies single perf sample data associated
>> with
>> + * GPU command stream, into userspace read() buffer.
>> + * @stream: An i915-perf stream opened for perf CS metrics
>> + * @buf: destination buffer given by userspace
>> + * @count: the number of bytes userspace wants to read
>> + * @offset: (inout): the current position for writing into @buf
>> + * @node: Sample data associated with perf metrics
>> + *
>> + * Returns: 0 on success, negative error code on failure.
>> + */
>> +static int append_cs_buffer_sample(struct i915_perf_stream *stream,
>> +                               char __user *buf,
>> +                               size_t count,
>> +                               size_t *offset,
>> +                               struct i915_perf_cs_sample *node)
>> +{
>> +       struct drm_i915_private *dev_priv = stream->dev_priv;
>> +       struct i915_perf_sample_data data = { 0 };
>> +       u32 sample_flags = stream->sample_flags;
>> +       int ret = 0;
>> +
>> +       if (sample_flags & SAMPLE_OA_REPORT) {
>> +               const u8 *report = stream->cs_buffer.vaddr + node->offset;
>> +               u32 sample_ts = *(u32 *)(report + 4);
>> +
>> +               data.report = report;
>> +
>> +               /* First, append the periodic OA samples having lower
>> +                * timestamp values
>> +                */
>> +               ret = dev_priv->perf.oa.ops.read(stream, buf, count,
>> offset,
>> +                                                sample_ts);
>> +               if (ret)
>> +                       return ret;
>> +       }
>> +
>> +       if (sample_flags & SAMPLE_OA_SOURCE)
>> +               data.source = I915_PERF_SAMPLE_OA_SOURCE_CS;
>> +
>> +       if (sample_flags & SAMPLE_CTX_ID)
>> +               data.ctx_id = node->ctx_id;
>> +
>> +       return append_perf_sample(stream, buf, count, offset, &data);
>>   }
>>     /**
>> - * i915_oa_wait_unlocked - handles blocking IO until OA data available
>> + * append_cs_buffer_samples: Copies all command stream based perf samples
>> + * into userspace read() buffer.
>> + * @stream: An i915-perf stream opened for perf CS metrics
>> + * @buf: destination buffer given by userspace
>> + * @count: the number of bytes userspace wants to read
>> + * @offset: (inout): the current position for writing into @buf
>> + *
>> + * Notably any error condition resulting in a short read (-%ENOSPC or
>> + * -%EFAULT) will be returned even though one or more records may
>> + * have been successfully copied. In this case it's up to the caller
>> + * to decide if the error should be squashed before returning to
>> + * userspace.
>> + *
>> + * Returns: 0 on success, negative error code on failure.
>> + */
>> +static int append_cs_buffer_samples(struct i915_perf_stream *stream,
>> +                               char __user *buf,
>> +                               size_t count,
>> +                               size_t *offset)
>> +{
>> +       struct i915_perf_cs_sample *entry, *next;
>> +       LIST_HEAD(free_list);
>> +       int ret = 0;
>> +       unsigned long flags;
>> +
>> +       spin_lock_irqsave(&stream->cs_samples_lock, flags);
>> +       if (list_empty(&stream->cs_samples)) {
>> +               spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
>> +               return 0;
>> +       }
>> +       list_for_each_entry_safe(entry, next,
>> +                                &stream->cs_samples, link) {
>> +               if (!i915_gem_request_completed(entry->request))
>> +                       break;
>> +               list_move_tail(&entry->link, &free_list);
>> +       }
>> +       spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
>> +
>> +       if (list_empty(&free_list))
>> +               return 0;
>> +
>> +       list_for_each_entry_safe(entry, next, &free_list, link) {
>> +               ret = append_cs_buffer_sample(stream, buf, count, offset,
>> +                                             entry);
>> +               if (ret)
>> +                       break;
>> +
>> +               list_del(&entry->link);
>> +               i915_gem_request_put(entry->request);
>> +               kfree(entry);
>> +       }
>> +
>> +       /* Don't discard remaining entries, keep them for next read */
>> +       spin_lock_irqsave(&stream->cs_samples_lock, flags);
>> +       list_splice(&free_list, &stream->cs_samples);
>> +       spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
>> +
>> +       return ret;
>> +}
>> +
>> +/*
>> + * cs_buffer_is_empty - Checks whether the command stream buffer
>> + * associated with the stream has data available.
>>    * @stream: An i915-perf stream opened for OA metrics
>>    *
>> + * Returns: true if atleast one request associated with command stream is
>> + * completed, else returns false.
>> + */
>> +static bool cs_buffer_is_empty(struct i915_perf_stream *stream)
>> +
>> +{
>> +       struct i915_perf_cs_sample *entry = NULL;
>> +       struct drm_i915_gem_request *request = NULL;
>> +       unsigned long flags;
>> +
>> +       spin_lock_irqsave(&stream->cs_samples_lock, flags);
>> +       entry = list_first_entry_or_null(&stream->cs_samples,
>> +                       struct i915_perf_cs_sample, link);
>> +       if (entry)
>> +               request = entry->request;
>> +       spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
>> +
>> +       if (!entry)
>> +               return true;
>> +       else if (!i915_gem_request_completed(request))
>> +               return true;
>> +       else
>> +               return false;
>> +}
>> +
>> +/**
>> + * stream_have_data_unlocked - Checks whether the stream has data
>> available
>> + * @stream: An i915-perf stream opened for OA metrics
>> + *
>> + * For command stream based streams, check if the command stream buffer
>> has
>> + * atleast one sample available, if not return false, irrespective of
>> periodic
>> + * oa buffer having the data or not.
>> + */
>> +
>> +static bool stream_have_data_unlocked(struct i915_perf_stream *stream)
>> +{
>> +       struct drm_i915_private *dev_priv = stream->dev_priv;
>> +
>> +       if (stream->cs_mode)
>> +               return !cs_buffer_is_empty(stream);
>> +       else
>> +               return oa_buffer_check_unlocked(dev_priv);
>> +}
>> +
>> +/**
>> + * i915_perf_stream_wait_unlocked - handles blocking IO until data
>> available
>> + * @stream: An i915-perf stream opened for GPU metrics
>> + *
>>    * Called when userspace tries to read() from a blocking stream FD
>> opened
>> - * for OA metrics. It waits until the hrtimer callback finds a non-empty
>> - * OA buffer and wakes us.
>> + * for perf metrics. It waits until the hrtimer callback finds a
>> non-empty
>> + * command stream buffer / OA buffer and wakes us.
>>    *
>>    * Note: it's acceptable to have this return with some false positives
>>    * since any subsequent read handling will return -EAGAIN if there isn't
>> @@ -1114,7 +1621,7 @@ static int gen7_oa_read(struct i915_perf_stream
>> *stream,
>>    *
>>    * Returns: zero on success or a negative error code
>>    */
>> -static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
>> +static int i915_perf_stream_wait_unlocked(struct i915_perf_stream
>> *stream)
>>   {
>>         struct drm_i915_private *dev_priv = stream->dev_priv;
>>   @@ -1122,32 +1629,47 @@ static int i915_oa_wait_unlocked(struct
>> i915_perf_stream *stream)
>>         if (!dev_priv->perf.oa.periodic)
>>                 return -EIO;
>>   -     return wait_event_interruptible(dev_priv->perf.oa.poll_wq,
>> -                                       oa_buffer_check_unlocked(dev_
>> priv));
>> +       if (stream->cs_mode) {
>> +               long int ret;
>> +
>> +               /* Wait for the all sampled requests. */
>> +               ret = reservation_object_wait_timeout_rcu(
>> +
>>  stream->cs_buffer.vma->resv,
>> +                                                   true,
>> +                                                   true,
>> +                                                   MAX_SCHEDULE_TIMEOUT);
>> +               if (unlikely(ret < 0)) {
>> +                       DRM_DEBUG_DRIVER("Failed to wait for sampled
>> requests: %li\n", ret);
>> +                       return ret;
>> +               }
>> +       }
>> +
>> +       return wait_event_interruptible(stream->poll_wq,
>> +                                       stream_have_data_unlocked(str
>> eam));
>>   }
>>     /**
>> - * i915_oa_poll_wait - call poll_wait() for an OA stream poll()
>> - * @stream: An i915-perf stream opened for OA metrics
>> + * i915_perf_stream_poll_wait - call poll_wait() for an stream poll()
>> + * @stream: An i915-perf stream opened for GPU metrics
>>    * @file: An i915 perf stream file
>>    * @wait: poll() state table
>>    *
>> - * For handling userspace polling on an i915 perf stream opened for OA
>> metrics,
>> + * For handling userspace polling on an i915 perf stream opened for
>> metrics,
>>    * this starts a poll_wait with the wait queue that our hrtimer
>> callback wakes
>> - * when it sees data ready to read in the circular OA buffer.
>> + * when it sees data ready to read either in command stream buffer or in
>> the
>> + * circular OA buffer.
>>    */
>> -static void i915_oa_poll_wait(struct i915_perf_stream *stream,
>> +static void i915_perf_stream_poll_wait(struct i915_perf_stream *stream,
>>                               struct file *file,
>>                               poll_table *wait)
>>   {
>> -       struct drm_i915_private *dev_priv = stream->dev_priv;
>> -
>> -       poll_wait(file, &dev_priv->perf.oa.poll_wq, wait);
>> +       poll_wait(file, &stream->poll_wq, wait);
>>   }
>>     /**
>> - * i915_oa_read - just calls through to &i915_oa_ops->read
>> - * @stream: An i915-perf stream opened for OA metrics
>> + * i915_perf_stream_read - Reads perf metrics available into userspace
>> read
>> + * buffer
>> + * @stream: An i915-perf stream opened for GPU metrics
>>    * @buf: destination buffer given by userspace
>>    * @count: the number of bytes userspace wants to read
>>    * @offset: (inout): the current position for writing into @buf
>> @@ -1157,14 +1679,21 @@ static void i915_oa_poll_wait(struct
>> i915_perf_stream *stream,
>>    *
>>    * Returns: zero on success or a negative error code
>>    */
>> -static int i915_oa_read(struct i915_perf_stream *stream,
>> +static int i915_perf_stream_read(struct i915_perf_stream *stream,
>>                         char __user *buf,
>>                         size_t count,
>>                         size_t *offset)
>>   {
>>         struct drm_i915_private *dev_priv = stream->dev_priv;
>>   -     return dev_priv->perf.oa.ops.read(stream, buf, count, offset);
>> +
>>
>
> Does the following code mean that a perf stream is either in cs_mode or OA
> mode?
> I couldn't see that condition in the function processing the opening
> parameters.
>
> The comments in the patch description also says :
>
> "Both periodic and CS based reports are associated with a single stream"
>
> The following code seems to contradict that. Can you explain how it works?
>
> Thanks
>

Hi Lionel,

If you look closely, append_cs_buffer_sample() function does merge sorting
of
OA reports from two independent buffers (OA buffer which has the periodic OA
samples and Command stream buffer for RCS based OA reports). This is done on
the basis of the report timestamps.
Therefore, in the code below, if stream->cs_mode is enabled, that means the
append_cs_buffer_samples() function needs to be called which will take care
of
collating the samples from these two independent buffers and copying to
stream's
buffer in merge sort'ed order. If cs_mode is not enabled, we can simply
collect
samples from periodic OA buffer and forward them to userspace (done via
perf.oa.ops.read() function).
Hope this addresses your question.

Regards,
Sourab

>
> +       if (stream->cs_mode)
>> +               return append_cs_buffer_samples(stream, buf, count,
>> offset);
>> +       else if (stream->sample_flags & SAMPLE_OA_REPORT)
>> +               return dev_priv->perf.oa.ops.read(stream, buf, count,
>> offset,
>> +                                               U32_MAX);
>> +       else
>> +               return -EINVAL;
>>   }
>>     /**
>> @@ -1182,7 +1711,7 @@ static int oa_get_render_ctx_id(struct
>> i915_perf_stream *stream)
>>         struct drm_i915_private *dev_priv = stream->dev_priv;
>>         if (i915.enable_execlists)
>> -               dev_priv->perf.oa.specific_ctx_id = stream->ctx->hw_id;
>> +               stream->engine->specific_ctx_id = stream->ctx->hw_id;
>>         else {
>>                 struct intel_engine_cs *engine = dev_priv->engine[RCS];
>>                 struct intel_ring *ring;
>> @@ -1209,7 +1738,7 @@ static int oa_get_render_ctx_id(struct
>> i915_perf_stream *stream)
>>                  * i915_ggtt_offset() on the fly) considering the
>> difference
>>                  * with gen8+ and execlists
>>                  */
>> -               dev_priv->perf.oa.specific_ctx_id =
>> +               stream->engine->specific_ctx_id =
>>                         i915_ggtt_offset(stream->ctx->
>> engine[engine->id].state);
>>         }
>>   @@ -1228,13 +1757,13 @@ static void oa_put_render_ctx_id(struct
>> i915_perf_stream *stream)
>>         struct drm_i915_private *dev_priv = stream->dev_priv;
>>         if (i915.enable_execlists) {
>> -               dev_priv->perf.oa.specific_ctx_id = INVALID_CTX_ID;
>> +               stream->engine->specific_ctx_id = INVALID_CTX_ID;
>>         } else {
>>                 struct intel_engine_cs *engine = dev_priv->engine[RCS];
>>                 mutex_lock(&dev_priv->drm.struct_mutex);
>>   -             dev_priv->perf.oa.specific_ctx_id = INVALID_CTX_ID;
>> +               stream->engine->specific_ctx_id = INVALID_CTX_ID;
>>                 engine->context_unpin(engine, stream->ctx);
>>                 mutex_unlock(&dev_priv->drm.struct_mutex);
>> @@ -1242,13 +1771,28 @@ static void oa_put_render_ctx_id(struct
>> i915_perf_stream *stream)
>>   }
>>     static void
>> +free_cs_buffer(struct i915_perf_stream *stream)
>> +{
>> +       struct drm_i915_private *dev_priv = stream->dev_priv;
>> +
>> +       mutex_lock(&dev_priv->drm.struct_mutex);
>> +
>> +       i915_gem_object_unpin_map(stream->cs_buffer.vma->obj);
>> +       i915_vma_unpin_and_release(&stream->cs_buffer.vma);
>> +
>> +       stream->cs_buffer.vma = NULL;
>> +       stream->cs_buffer.vaddr = NULL;
>> +
>> +       mutex_unlock(&dev_priv->drm.struct_mutex);
>> +}
>> +
>> +static void
>>   free_oa_buffer(struct drm_i915_private *i915)
>>   {
>>         mutex_lock(&i915->drm.struct_mutex);
>>         i915_gem_object_unpin_map(i915->perf.oa.oa_buffer.vma->obj);
>> -       i915_vma_unpin(i915->perf.oa.oa_buffer.vma);
>> -       i915_gem_object_put(i915->perf.oa.oa_buffer.vma->obj);
>> +       i915_vma_unpin_and_release(&i915->perf.oa.oa_buffer.vma);
>>         i915->perf.oa.oa_buffer.vma = NULL;
>>         i915->perf.oa.oa_buffer.vaddr = NULL;
>> @@ -1256,27 +1800,41 @@ static void oa_put_render_ctx_id(struct
>> i915_perf_stream *stream)
>>         mutex_unlock(&i915->drm.struct_mutex);
>>   }
>>   -static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
>> +static void i915_perf_stream_destroy(struct i915_perf_stream *stream)
>>   {
>>         struct drm_i915_private *dev_priv = stream->dev_priv;
>> -
>> -       BUG_ON(stream != dev_priv->perf.oa.exclusive_stream);
>> +       struct intel_engine_cs *engine = stream->engine;
>> +       struct i915_perf_stream *engine_stream;
>> +       int idx;
>> +
>> +       idx = srcu_read_lock(&engine->perf_srcu);
>> +       engine_stream = srcu_dereference(engine->exclusive_stream,
>> +                                        &engine->perf_srcu);
>> +       if (WARN_ON(stream != engine_stream))
>> +               return;
>> +       srcu_read_unlock(&engine->perf_srcu, idx);
>>         /*
>>          * Unset exclusive_stream first, it might be checked while
>>          * disabling the metric set on gen8+.
>>          */
>> -       dev_priv->perf.oa.exclusive_stream = NULL;
>> +       rcu_assign_pointer(stream->engine->exclusive_stream, NULL);
>> +       synchronize_srcu(&stream->engine->perf_srcu);
>>   -     dev_priv->perf.oa.ops.disable_metric_set(dev_priv);
>> +       if (stream->using_oa) {
>> +               dev_priv->perf.oa.ops.disable_metric_set(dev_priv);
>>   -     free_oa_buffer(dev_priv);
>> +               free_oa_buffer(dev_priv);
>>   -     intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
>> -       intel_runtime_pm_put(dev_priv);
>> +               intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
>> +               intel_runtime_pm_put(dev_priv);
>>   -     if (stream->ctx)
>> -               oa_put_render_ctx_id(stream);
>> +               if (stream->ctx)
>> +                       oa_put_render_ctx_id(stream);
>> +       }
>> +
>> +       if (stream->cs_mode)
>> +               free_cs_buffer(stream);
>>         if (dev_priv->perf.oa.spurious_report_rs.missed) {
>>                 DRM_NOTE("%d spurious OA report notices suppressed due to
>> ratelimiting\n",
>> @@ -1325,11 +1883,6 @@ static void gen7_init_oa_buffer(struct
>> drm_i915_private *dev_priv)
>>          * memory...
>>          */
>>         memset(dev_priv->perf.oa.oa_buffer.vaddr, 0, OA_BUFFER_SIZE);
>> -
>> -       /* Maybe make ->pollin per-stream state if we support multiple
>> -        * concurrent streams in the future.
>> -        */
>> -       dev_priv->perf.oa.pollin = false;
>>   }
>>     static void gen8_init_oa_buffer(struct drm_i915_private *dev_priv)
>> @@ -1383,33 +1936,26 @@ static void gen8_init_oa_buffer(struct
>> drm_i915_private *dev_priv)
>>          * memory...
>>          */
>>         memset(dev_priv->perf.oa.oa_buffer.vaddr, 0, OA_BUFFER_SIZE);
>> -
>> -       /*
>> -        * Maybe make ->pollin per-stream state if we support multiple
>> -        * concurrent streams in the future.
>> -        */
>> -       dev_priv->perf.oa.pollin = false;
>>   }
>>   -static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
>> +static int alloc_obj(struct drm_i915_private *dev_priv,
>> +                    struct i915_vma **vma, u8 **vaddr)
>>   {
>>         struct drm_i915_gem_object *bo;
>> -       struct i915_vma *vma;
>>         int ret;
>>   -     if (WARN_ON(dev_priv->perf.oa.oa_buffer.vma))
>> -               return -ENODEV;
>> +       intel_runtime_pm_get(dev_priv);
>>         ret = i915_mutex_lock_interruptible(&dev_priv->drm);
>>         if (ret)
>> -               return ret;
>> +               goto out;
>>         BUILD_BUG_ON_NOT_POWER_OF_2(OA_BUFFER_SIZE);
>>         BUILD_BUG_ON(OA_BUFFER_SIZE < SZ_128K || OA_BUFFER_SIZE > SZ_16M);
>>         bo = i915_gem_object_create(dev_priv, OA_BUFFER_SIZE);
>>         if (IS_ERR(bo)) {
>> -               DRM_ERROR("Failed to allocate OA buffer\n");
>> +               DRM_ERROR("Failed to allocate i915 perf obj\n");
>>                 ret = PTR_ERR(bo);
>>                 goto unlock;
>>         }
>> @@ -1419,42 +1965,83 @@ static int alloc_oa_buffer(struct
>> drm_i915_private *dev_priv)
>>                 goto err_unref;
>>         /* PreHSW required 512K alignment, HSW requires 16M */
>> -       vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);
>> -       if (IS_ERR(vma)) {
>> -               ret = PTR_ERR(vma);
>> +       *vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);
>> +       if (IS_ERR(*vma)) {
>> +               ret = PTR_ERR(*vma);
>>                 goto err_unref;
>>         }
>> -       dev_priv->perf.oa.oa_buffer.vma = vma;
>>   -     dev_priv->perf.oa.oa_buffer.vaddr =
>> -               i915_gem_object_pin_map(bo, I915_MAP_WB);
>> -       if (IS_ERR(dev_priv->perf.oa.oa_buffer.vaddr)) {
>> -               ret = PTR_ERR(dev_priv->perf.oa.oa_buffer.vaddr);
>> +       *vaddr = i915_gem_object_pin_map(bo, I915_MAP_WB);
>> +       if (IS_ERR(*vaddr)) {
>> +               ret = PTR_ERR(*vaddr);
>>                 goto err_unpin;
>>         }
>>   -     dev_priv->perf.oa.ops.init_oa_buffer(dev_priv);
>> -
>> -       DRM_DEBUG_DRIVER("OA Buffer initialized, gtt offset = 0x%x, vaddr
>> = %p\n",
>> -                        i915_ggtt_offset(dev_priv->per
>> f.oa.oa_buffer.vma),
>> -                        dev_priv->perf.oa.oa_buffer.vaddr);
>> -
>>         goto unlock;
>>     err_unpin:
>> -       __i915_vma_unpin(vma);
>> +       i915_vma_unpin(*vma);
>>     err_unref:
>>         i915_gem_object_put(bo);
>>   -     dev_priv->perf.oa.oa_buffer.vaddr = NULL;
>> -       dev_priv->perf.oa.oa_buffer.vma = NULL;
>> -
>>   unlock:
>>         mutex_unlock(&dev_priv->drm.struct_mutex);
>> +out:
>> +       intel_runtime_pm_put(dev_priv);
>>         return ret;
>>   }
>>   +static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
>> +{
>> +       struct i915_vma *vma;
>> +       u8 *vaddr;
>> +       int ret;
>> +
>> +       if (WARN_ON(dev_priv->perf.oa.oa_buffer.vma))
>> +               return -ENODEV;
>> +
>> +       ret = alloc_obj(dev_priv, &vma, &vaddr);
>> +       if (ret)
>> +               return ret;
>> +
>> +       dev_priv->perf.oa.oa_buffer.vma = vma;
>> +       dev_priv->perf.oa.oa_buffer.vaddr = vaddr;
>> +
>> +       dev_priv->perf.oa.ops.init_oa_buffer(dev_priv);
>> +
>> +       DRM_DEBUG_DRIVER("OA Buffer initialized, gtt offset = 0x%x, vaddr
>> = %p",
>> +                        i915_ggtt_offset(dev_priv->per
>> f.oa.oa_buffer.vma),
>> +                        dev_priv->perf.oa.oa_buffer.vaddr);
>> +       return 0;
>> +}
>> +
>> +static int alloc_cs_buffer(struct i915_perf_stream *stream)
>> +{
>> +       struct drm_i915_private *dev_priv = stream->dev_priv;
>> +       struct i915_vma *vma;
>> +       u8 *vaddr;
>> +       int ret;
>> +
>> +       if (WARN_ON(stream->cs_buffer.vma))
>> +               return -ENODEV;
>> +
>> +       ret = alloc_obj(dev_priv, &vma, &vaddr);
>> +       if (ret)
>> +               return ret;
>> +
>> +       stream->cs_buffer.vma = vma;
>> +       stream->cs_buffer.vaddr = vaddr;
>> +       if (WARN_ON(!list_empty(&stream->cs_samples)))
>> +               INIT_LIST_HEAD(&stream->cs_samples);
>> +
>> +       DRM_DEBUG_DRIVER("Command stream buf initialized, gtt offset =
>> 0x%x, vaddr = %p",
>> +                        i915_ggtt_offset(stream->cs_buffer.vma),
>> +                        stream->cs_buffer.vaddr);
>> +
>> +       return 0;
>> +}
>> +
>>   static void config_oa_regs(struct drm_i915_private *dev_priv,
>>                            const struct i915_oa_reg *regs,
>>                            int n_regs)
>> @@ -1859,6 +2446,10 @@ static void gen8_disable_metric_set(struct
>> drm_i915_private *dev_priv)
>>     static void gen7_oa_enable(struct drm_i915_private *dev_priv)
>>   {
>> +       struct i915_perf_stream *stream;
>> +       struct intel_engine_cs *engine = dev_priv->engine[RCS];
>> +       int idx;
>> +
>>         /*
>>          * Reset buf pointers so we don't forward reports from before now.
>>          *
>> @@ -1870,11 +2461,11 @@ static void gen7_oa_enable(struct
>> drm_i915_private *dev_priv)
>>          */
>>         gen7_init_oa_buffer(dev_priv);
>>   -     if (dev_priv->perf.oa.exclusive_stream->enabled) {
>> -               struct i915_gem_context *ctx =
>> -                       dev_priv->perf.oa.exclusive_stream->ctx;
>> -               u32 ctx_id = dev_priv->perf.oa.specific_ctx_id;
>> -
>> +       idx = srcu_read_lock(&engine->perf_srcu);
>> +       stream = srcu_dereference(engine->exclusive_stream,
>> &engine->perf_srcu);
>> +       if (stream->state != I915_PERF_STREAM_DISABLED) {
>> +               struct i915_gem_context *ctx = stream->ctx;
>> +               u32 ctx_id = engine->specific_ctx_id;
>>                 bool periodic = dev_priv->perf.oa.periodic;
>>                 u32 period_exponent = dev_priv->perf.oa.period_exponent;
>>                 u32 report_format = dev_priv->perf.oa.oa_buffer.format;
>> @@ -1889,6 +2480,7 @@ static void gen7_oa_enable(struct drm_i915_private
>> *dev_priv)
>>                            GEN7_OACONTROL_ENABLE);
>>         } else
>>                 I915_WRITE(GEN7_OACONTROL, 0);
>> +       srcu_read_unlock(&engine->perf_srcu, idx);
>>   }
>>     static void gen8_oa_enable(struct drm_i915_private *dev_priv)
>> @@ -1917,22 +2509,23 @@ static void gen8_oa_enable(struct
>> drm_i915_private *dev_priv)
>>   }
>>     /**
>> - * i915_oa_stream_enable - handle `I915_PERF_IOCTL_ENABLE` for OA stream
>> - * @stream: An i915 perf stream opened for OA metrics
>> + * i915_perf_stream_enable - handle `I915_PERF_IOCTL_ENABLE` for perf
>> stream
>> + * @stream: An i915 perf stream opened for GPU metrics
>>    *
>>    * [Re]enables hardware periodic sampling according to the period
>> configured
>>    * when opening the stream. This also starts a hrtimer that will
>> periodically
>>    * check for data in the circular OA buffer for notifying userspace
>> (e.g.
>>    * during a read() or poll()).
>>    */
>> -static void i915_oa_stream_enable(struct i915_perf_stream *stream)
>> +static void i915_perf_stream_enable(struct i915_perf_stream *stream)
>>   {
>>         struct drm_i915_private *dev_priv = stream->dev_priv;
>>   -     dev_priv->perf.oa.ops.oa_enable(dev_priv);
>> +       if (stream->sample_flags & SAMPLE_OA_REPORT)
>> +               dev_priv->perf.oa.ops.oa_enable(dev_priv);
>>   -     if (dev_priv->perf.oa.periodic)
>> -               hrtimer_start(&dev_priv->perf.oa.poll_check_timer,
>> +       if (stream->cs_mode || dev_priv->perf.oa.periodic)
>> +               hrtimer_start(&dev_priv->perf.poll_check_timer,
>>                               ns_to_ktime(POLL_PERIOD),
>>                               HRTIMER_MODE_REL_PINNED);
>>   }
>> @@ -1948,34 +2541,39 @@ static void gen8_oa_disable(struct
>> drm_i915_private *dev_priv)
>>   }
>>     /**
>> - * i915_oa_stream_disable - handle `I915_PERF_IOCTL_DISABLE` for OA
>> stream
>> - * @stream: An i915 perf stream opened for OA metrics
>> + * i915_perf_stream_disable - handle `I915_PERF_IOCTL_DISABLE` for perf
>> stream
>> + * @stream: An i915 perf stream opened for GPU metrics
>>    *
>>    * Stops the OA unit from periodically writing counter reports into the
>>    * circular OA buffer. This also stops the hrtimer that periodically
>> checks for
>>    * data in the circular OA buffer, for notifying userspace.
>>    */
>> -static void i915_oa_stream_disable(struct i915_perf_stream *stream)
>> +static void i915_perf_stream_disable(struct i915_perf_stream *stream)
>>   {
>>         struct drm_i915_private *dev_priv = stream->dev_priv;
>>   -     dev_priv->perf.oa.ops.oa_disable(dev_priv);
>> +       if (stream->cs_mode || dev_priv->perf.oa.periodic)
>> +               hrtimer_cancel(&dev_priv->perf.poll_check_timer);
>> +
>> +       if (stream->cs_mode)
>> +               i915_perf_stream_release_samples(stream);
>>   -     if (dev_priv->perf.oa.periodic)
>> -               hrtimer_cancel(&dev_priv->perf.oa.poll_check_timer);
>> +       if (stream->sample_flags & SAMPLE_OA_REPORT)
>> +               dev_priv->perf.oa.ops.oa_disable(dev_priv);
>>   }
>>   -static const struct i915_perf_stream_ops i915_oa_stream_ops = {
>> -       .destroy = i915_oa_stream_destroy,
>> -       .enable = i915_oa_stream_enable,
>> -       .disable = i915_oa_stream_disable,
>> -       .wait_unlocked = i915_oa_wait_unlocked,
>> -       .poll_wait = i915_oa_poll_wait,
>> -       .read = i915_oa_read,
>> +static const struct i915_perf_stream_ops perf_stream_ops = {
>> +       .destroy = i915_perf_stream_destroy,
>> +       .enable = i915_perf_stream_enable,
>> +       .disable = i915_perf_stream_disable,
>> +       .wait_unlocked = i915_perf_stream_wait_unlocked,
>> +       .poll_wait = i915_perf_stream_poll_wait,
>> +       .read = i915_perf_stream_read,
>> +       .emit_sample_capture = i915_perf_stream_emit_sample_capture,
>>   };
>>     /**
>> - * i915_oa_stream_init - validate combined props for OA stream and init
>> + * i915_perf_stream_init - validate combined props for stream and init
>>    * @stream: An i915 perf stream
>>    * @param: The open parameters passed to `DRM_I915_PERF_OPEN`
>>    * @props: The property state that configures stream (individually
>> validated)
>> @@ -1984,58 +2582,35 @@ static void i915_oa_stream_disable(struct
>> i915_perf_stream *stream)
>>    * doesn't ensure that the combination necessarily makes sense.
>>    *
>>    * At this point it has been determined that userspace wants a stream of
>
>

[-- Attachment #1.2: Type: text/html, Size: 80649 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 12/12] drm/i915: Support for capturing MMIO register values
  2017-07-31  7:59 ` [PATCH 12/12] drm/i915: Support for capturing MMIO register values Sagar Arun Kamble
@ 2017-07-31 11:49   ` kbuild test robot
  2017-07-31 12:08   ` kbuild test robot
  1 sibling, 0 replies; 34+ messages in thread
From: kbuild test robot @ 2017-07-31 11:49 UTC (permalink / raw)
  To: Sagar Arun Kamble; +Cc: intel-gfx, Sourab Gupta, kbuild-all

[-- Attachment #1: Type: text/plain, Size: 7382 bytes --]

Hi Sourab,

[auto build test WARNING on drm-intel/for-linux-next]
[also build test WARNING on next-20170731]
[cannot apply to v4.13-rc3]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Sagar-Arun-Kamble/i915-perf-support-for-command-stream-based-OA-GPU-and-workload-metrics-capture/20170731-184412
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
config: i386-randconfig-x071-201731 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   drivers/gpu/drm/i915/i915_perf.c: In function 'read_properties_unlocked':
>> drivers/gpu/drm/i915/i915_perf.c:4022:35: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
       ret = copy_mmio_list(dev_priv, (u64 __user *)value);
                                      ^

vim +4022 drivers/gpu/drm/i915/i915_perf.c

  3867	
  3868	/**
  3869	 * read_properties_unlocked - validate + copy userspace stream open properties
  3870	 * @dev_priv: i915 device instance
  3871	 * @uprops: The array of u64 key value pairs given by userspace
  3872	 * @n_props: The number of key value pairs expected in @uprops
  3873	 * @props: The stream configuration built up while validating properties
  3874	 *
  3875	 * Note this function only validates properties in isolation it doesn't
  3876	 * validate that the combination of properties makes sense or that all
  3877	 * properties necessary for a particular kind of stream have been set.
  3878	 *
  3879	 * Note that there currently aren't any ordering requirements for properties so
  3880	 * we shouldn't validate or assume anything about ordering here. This doesn't
  3881	 * rule out defining new properties with ordering requirements in the future.
  3882	 */
  3883	static int read_properties_unlocked(struct drm_i915_private *dev_priv,
  3884					    u64 __user *uprops,
  3885					    u32 n_props,
  3886					    struct perf_open_properties *props)
  3887	{
  3888		u64 __user *uprop = uprops;
  3889		int i;
  3890	
  3891		memset(props, 0, sizeof(struct perf_open_properties));
  3892	
  3893		if (!n_props) {
  3894			DRM_DEBUG("No i915 perf properties given\n");
  3895			return -EINVAL;
  3896		}
  3897	
  3898		/* Considering that ID = 0 is reserved and assuming that we don't
  3899		 * (currently) expect any configurations to ever specify duplicate
  3900		 * values for a particular property ID then the last _PROP_MAX value is
  3901		 * one greater than the maximum number of properties we expect to get
  3902		 * from userspace.
  3903		 */
  3904		if (n_props >= DRM_I915_PERF_PROP_MAX) {
  3905			DRM_DEBUG("More i915 perf properties specified than exist\n");
  3906			return -EINVAL;
  3907		}
  3908	
  3909		for (i = 0; i < n_props; i++) {
  3910			u64 oa_period, oa_freq_hz;
  3911			u64 id, value;
  3912			int ret;
  3913	
  3914			ret = get_user(id, uprop);
  3915			if (ret)
  3916				return ret;
  3917	
  3918			ret = get_user(value, uprop + 1);
  3919			if (ret)
  3920				return ret;
  3921	
  3922			if (id == 0 || id >= DRM_I915_PERF_PROP_MAX) {
  3923				DRM_DEBUG("Unknown i915 perf property ID\n");
  3924				return -EINVAL;
  3925			}
  3926	
  3927			switch ((enum drm_i915_perf_property_id)id) {
  3928			case DRM_I915_PERF_PROP_CTX_HANDLE:
  3929				props->single_context = 1;
  3930				props->ctx_handle = value;
  3931				break;
  3932			case DRM_I915_PERF_PROP_SAMPLE_OA:
  3933				props->sample_flags |= SAMPLE_OA_REPORT;
  3934				break;
  3935			case DRM_I915_PERF_PROP_OA_METRICS_SET:
  3936				if (value == 0 ||
  3937				    value > dev_priv->perf.oa.n_builtin_sets) {
  3938					DRM_DEBUG("Unknown OA metric set ID\n");
  3939					return -EINVAL;
  3940				}
  3941				props->metrics_set = value;
  3942				break;
  3943			case DRM_I915_PERF_PROP_OA_FORMAT:
  3944				if (value == 0 || value >= I915_OA_FORMAT_MAX) {
  3945					DRM_DEBUG("Out-of-range OA report format %llu\n",
  3946						  value);
  3947					return -EINVAL;
  3948				}
  3949				if (!dev_priv->perf.oa.oa_formats[value].size) {
  3950					DRM_DEBUG("Unsupported OA report format %llu\n",
  3951						  value);
  3952					return -EINVAL;
  3953				}
  3954				props->oa_format = value;
  3955				break;
  3956			case DRM_I915_PERF_PROP_OA_EXPONENT:
  3957				if (value > OA_EXPONENT_MAX) {
  3958					DRM_DEBUG("OA timer exponent too high (> %u)\n",
  3959						 OA_EXPONENT_MAX);
  3960					return -EINVAL;
  3961				}
  3962	
  3963				/* Theoretically we can program the OA unit to sample
  3964				 * e.g. every 160ns for HSW, 167ns for BDW/SKL or 104ns
  3965				 * for BXT. We don't allow such high sampling
  3966				 * frequencies by default unless root.
  3967				 */
  3968	
  3969				BUILD_BUG_ON(sizeof(oa_period) != 8);
  3970				oa_period = oa_exponent_to_ns(dev_priv, value);
  3971	
  3972				/* This check is primarily to ensure that oa_period <=
  3973				 * UINT32_MAX (before passing to do_div which only
  3974				 * accepts a u32 denominator), but we can also skip
  3975				 * checking anything < 1Hz which implicitly can't be
  3976				 * limited via an integer oa_max_sample_rate.
  3977				 */
  3978				if (oa_period <= NSEC_PER_SEC) {
  3979					u64 tmp = NSEC_PER_SEC;
  3980					do_div(tmp, oa_period);
  3981					oa_freq_hz = tmp;
  3982				} else
  3983					oa_freq_hz = 0;
  3984	
  3985				if (oa_freq_hz > i915_oa_max_sample_rate &&
  3986				    !capable(CAP_SYS_ADMIN)) {
  3987					DRM_DEBUG("OA exponent would exceed the max sampling frequency (sysctl dev.i915.oa_max_sample_rate) %uHz without root privileges\n",
  3988						  i915_oa_max_sample_rate);
  3989					return -EACCES;
  3990				}
  3991	
  3992				props->oa_periodic = true;
  3993				props->oa_period_exponent = value;
  3994				break;
  3995			case DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE:
  3996				props->sample_flags |= SAMPLE_OA_SOURCE;
  3997				break;
  3998			case DRM_I915_PERF_PROP_ENGINE: {
  3999					unsigned int user_ring_id =
  4000						value & I915_EXEC_RING_MASK;
  4001	
  4002					if (user_ring_id > I915_USER_RINGS)
  4003						return -EINVAL;
  4004	
  4005					props->cs_mode = true;
  4006					props->engine = user_ring_map[user_ring_id];
  4007				}
  4008				break;
  4009			case DRM_I915_PERF_PROP_SAMPLE_CTX_ID:
  4010				props->sample_flags |= SAMPLE_CTX_ID;
  4011				break;
  4012			case DRM_I915_PERF_PROP_SAMPLE_PID:
  4013				props->sample_flags |= SAMPLE_PID;
  4014				break;
  4015			case DRM_I915_PERF_PROP_SAMPLE_TAG:
  4016				props->sample_flags |= SAMPLE_TAG;
  4017				break;
  4018			case DRM_I915_PERF_PROP_SAMPLE_TS:
  4019				props->sample_flags |= SAMPLE_TS;
  4020				break;
  4021			case DRM_I915_PERF_PROP_SAMPLE_MMIO:
> 4022				ret = copy_mmio_list(dev_priv, (u64 __user *)value);
  4023				if (ret)
  4024					return ret;
  4025				props->sample_flags |= SAMPLE_MMIO;
  4026				break;
  4027			case DRM_I915_PERF_PROP_MAX:
  4028				MISSING_CASE(id);
  4029				return -EINVAL;
  4030			}
  4031	
  4032			uprop += 2;
  4033		}
  4034	
  4035		return 0;
  4036	}
  4037	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 28308 bytes --]

[-- Attachment #3: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 12/12] drm/i915: Support for capturing MMIO register values
  2017-07-31  7:59 ` [PATCH 12/12] drm/i915: Support for capturing MMIO register values Sagar Arun Kamble
  2017-07-31 11:49   ` kbuild test robot
@ 2017-07-31 12:08   ` kbuild test robot
  1 sibling, 0 replies; 34+ messages in thread
From: kbuild test robot @ 2017-07-31 12:08 UTC (permalink / raw)
  To: Sagar Arun Kamble; +Cc: intel-gfx, Sourab Gupta, kbuild-all

[-- Attachment #1: Type: text/plain, Size: 14694 bytes --]

Hi Sourab,

[auto build test ERROR on drm-intel/for-linux-next]
[also build test ERROR on next-20170731]
[cannot apply to v4.13-rc3]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Sagar-Arun-Kamble/i915-perf-support-for-command-stream-based-OA-GPU-and-workload-metrics-capture/20170731-184412
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
config: i386-randconfig-x003-201731 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/gpu//drm/i915/i915_perf.c: In function 'read_properties_unlocked':
>> drivers/gpu//drm/i915/i915_perf.c:4022:35: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
       ret = copy_mmio_list(dev_priv, (u64 __user *)value);
                                      ^
   Cyclomatic Complexity 5 include/linux/compiler.h:__read_once_size
   Cyclomatic Complexity 5 include/linux/compiler.h:__write_once_size
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:fls
   Cyclomatic Complexity 1 include/linux/log2.h:__ilog2_u32
   Cyclomatic Complexity 3 include/linux/log2.h:is_power_of_2
   Cyclomatic Complexity 1 arch/x86/include/asm/current.h:get_current
   Cyclomatic Complexity 1 include/linux/list.h:INIT_LIST_HEAD
   Cyclomatic Complexity 1 include/linux/list.h:__list_add_valid
   Cyclomatic Complexity 1 include/linux/list.h:__list_del_entry_valid
   Cyclomatic Complexity 2 include/linux/list.h:__list_add
   Cyclomatic Complexity 1 include/linux/list.h:list_add_tail
   Cyclomatic Complexity 1 include/linux/list.h:__list_del
   Cyclomatic Complexity 2 include/linux/list.h:__list_del_entry
   Cyclomatic Complexity 1 include/linux/list.h:list_del
   Cyclomatic Complexity 1 include/linux/list.h:list_move_tail
   Cyclomatic Complexity 1 include/linux/list.h:list_empty
   Cyclomatic Complexity 1 include/linux/list.h:__list_splice
   Cyclomatic Complexity 2 include/linux/list.h:list_splice
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:atomic_read
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:atomic_dec
   Cyclomatic Complexity 1 include/asm-generic/getorder.h:__get_order
   Cyclomatic Complexity 1 include/linux/err.h:PTR_ERR
   Cyclomatic Complexity 1 include/linux/thread_info.h:check_object_size
   Cyclomatic Complexity 6 include/linux/thread_info.h:check_copy_size
   Cyclomatic Complexity 1 include/linux/spinlock.h:spinlock_check
   Cyclomatic Complexity 1 include/linux/spinlock.h:spin_unlock_irqrestore
   Cyclomatic Complexity 1 include/linux/ktime.h:ns_to_ktime
   Cyclomatic Complexity 56 include/linux/slab.h:kmalloc_index
   Cyclomatic Complexity 67 include/linux/slab.h:kmalloc_large
   Cyclomatic Complexity 9 include/linux/slab.h:kmalloc
   Cyclomatic Complexity 1 include/linux/slab.h:kzalloc
   Cyclomatic Complexity 2 include/linux/uaccess.h:copy_from_user
   Cyclomatic Complexity 2 include/linux/uaccess.h:copy_to_user
   Cyclomatic Complexity 1 include/linux/ratelimit.h:ratelimit_set_flags
   Cyclomatic Complexity 1 include/drm/drm_mm.h:drm_mm_node_allocated
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_reg.h:i915_mmio_reg_offset
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_gem_request.h:dma_fence_is_i915
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_gem_request.h:i915_gem_request_global_seqno
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_gem_request.h:i915_seqno_passed
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_gem_request.h:i915_gem_active_raw
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/intel_ringbuffer.h:intel_read_status_page
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/intel_ringbuffer.h:intel_engine_get_seqno
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_vma.h:i915_vma_is_ggtt
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_vma.h:i915_vma_pin_count
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_vma.h:i915_vma_is_pinned
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_vma.h:__i915_vma_unpin
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_drv.h:intel_info
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_drv.h:i915_gem_object_has_pinned_pages
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_perf.c:gen8_oa_hw_tail_read
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_perf.c:gen7_oa_hw_tail_read
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_perf.c:get_gpu_ts_from_oa_report
   Cyclomatic Complexity 2 drivers/gpu//drm/i915/i915_perf.c:config_oa_regs
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_perf.c:hsw_disable_metric_set
   Cyclomatic Complexity 5 drivers/gpu//drm/i915/i915_perf.c:gen8_update_reg_state_unlocked
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_perf.c:gen7_oa_disable
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_perf.c:gen8_oa_disable
   Cyclomatic Complexity 3 drivers/gpu//drm/i915/i915_perf.c:i915_perf_read_locked
   Cyclomatic Complexity 3 arch/x86/include/asm/div64.h:div_u64_rem
   Cyclomatic Complexity 1 include/linux/math64.h:div_u64
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_perf.c:oa_exponent_to_ns
   Cyclomatic Complexity 10 drivers/gpu//drm/i915/i915_perf.c:check_mmio_whitelist
   Cyclomatic Complexity 5 drivers/gpu//drm/i915/i915_perf.c:i915_perf_disable_locked
   Cyclomatic Complexity 3 drivers/gpu//drm/i915/i915_perf.c:i915_perf_poll_locked
   Cyclomatic Complexity 4 drivers/gpu//drm/i915/i915_perf.c:append_oa_status
   Cyclomatic Complexity 25 drivers/gpu//drm/i915/i915_perf.c:append_perf_sample
   Cyclomatic Complexity 21 drivers/gpu//drm/i915/i915_perf.c:append_cs_buffer_sample
   Cyclomatic Complexity 12 include/linux/poll.h:poll_wait
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_perf.c:i915_perf_stream_poll_wait
   Cyclomatic Complexity 5 drivers/gpu//drm/i915/i915_perf.c:i915_perf_enable_locked
   Cyclomatic Complexity 3 drivers/gpu//drm/i915/i915_perf.c:i915_perf_ioctl_locked
   Cyclomatic Complexity 15 drivers/gpu//drm/i915/i915_perf.c:append_oa_buffer_sample
   Cyclomatic Complexity 1 include/linux/srcu.h:srcu_read_lock
   Cyclomatic Complexity 1 include/linux/srcu.h:srcu_read_unlock
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_perf.c:i915_perf_ioctl
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_perf.c:i915_perf_poll
   Cyclomatic Complexity 10 drivers/gpu//drm/i915/i915_perf.c:i915_perf_read
   Cyclomatic Complexity 3 drivers/gpu//drm/i915/i915_perf.c:oa_put_render_ctx_id
   Cyclomatic Complexity 10 drivers/gpu//drm/i915/i915_perf.c:copy_mmio_list
   Cyclomatic Complexity 1 include/linux/err.h:IS_ERR
   Cyclomatic Complexity 124 drivers/gpu//drm/i915/i915_perf.c:read_properties_unlocked
   Cyclomatic Complexity 2 include/linux/thread_info.h:copy_overflow
   Cyclomatic Complexity 11 drivers/gpu//drm/i915/i915_perf.c:gen8_oa_buffer_get_ctx_id
   Cyclomatic Complexity 9 drivers/gpu//drm/i915/i915_perf.c:gen7_oa_buffer_get_ctx_id
   Cyclomatic Complexity 1 include/linux/rcupdate.h:rcu_read_lock
   Cyclomatic Complexity 1 include/linux/idr.h:idr_find
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_drv.h:__i915_gem_context_lookup_rcu
   Cyclomatic Complexity 1 include/linux/kref.h:kref_get_unless_zero
   Cyclomatic Complexity 1 include/linux/rcupdate.h:rcu_read_unlock
   Cyclomatic Complexity 4 drivers/gpu//drm/i915/i915_drv.h:i915_gem_context_lookup
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/intel_ringbuffer.h:intel_ring_advance
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_drv.h:__i915_gem_object_unpin_pages
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_drv.h:i915_gem_object_unpin_pages
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_drv.h:i915_gem_object_unpin_map
   Cyclomatic Complexity 3 drivers/gpu//drm/i915/i915_gem_request.h:__i915_gem_request_completed
   Cyclomatic Complexity 3 drivers/gpu//drm/i915/i915_gem_request.h:i915_gem_request_completed
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_vma.h:i915_ggtt_offset
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_perf.c:gen8_init_oa_buffer
   Cyclomatic Complexity 1 drivers/gpu//drm/i915/i915_perf.c:gen8_oa_enable

vim +4022 drivers/gpu//drm/i915/i915_perf.c

  3867	
  3868	/**
  3869	 * read_properties_unlocked - validate + copy userspace stream open properties
  3870	 * @dev_priv: i915 device instance
  3871	 * @uprops: The array of u64 key value pairs given by userspace
  3872	 * @n_props: The number of key value pairs expected in @uprops
  3873	 * @props: The stream configuration built up while validating properties
  3874	 *
  3875	 * Note this function only validates properties in isolation it doesn't
  3876	 * validate that the combination of properties makes sense or that all
  3877	 * properties necessary for a particular kind of stream have been set.
  3878	 *
  3879	 * Note that there currently aren't any ordering requirements for properties so
  3880	 * we shouldn't validate or assume anything about ordering here. This doesn't
  3881	 * rule out defining new properties with ordering requirements in the future.
  3882	 */
  3883	static int read_properties_unlocked(struct drm_i915_private *dev_priv,
  3884					    u64 __user *uprops,
  3885					    u32 n_props,
  3886					    struct perf_open_properties *props)
  3887	{
  3888		u64 __user *uprop = uprops;
  3889		int i;
  3890	
  3891		memset(props, 0, sizeof(struct perf_open_properties));
  3892	
  3893		if (!n_props) {
  3894			DRM_DEBUG("No i915 perf properties given\n");
  3895			return -EINVAL;
  3896		}
  3897	
  3898		/* Considering that ID = 0 is reserved and assuming that we don't
  3899		 * (currently) expect any configurations to ever specify duplicate
  3900		 * values for a particular property ID then the last _PROP_MAX value is
  3901		 * one greater than the maximum number of properties we expect to get
  3902		 * from userspace.
  3903		 */
  3904		if (n_props >= DRM_I915_PERF_PROP_MAX) {
  3905			DRM_DEBUG("More i915 perf properties specified than exist\n");
  3906			return -EINVAL;
  3907		}
  3908	
  3909		for (i = 0; i < n_props; i++) {
  3910			u64 oa_period, oa_freq_hz;
  3911			u64 id, value;
  3912			int ret;
  3913	
  3914			ret = get_user(id, uprop);
  3915			if (ret)
  3916				return ret;
  3917	
  3918			ret = get_user(value, uprop + 1);
  3919			if (ret)
  3920				return ret;
  3921	
  3922			if (id == 0 || id >= DRM_I915_PERF_PROP_MAX) {
  3923				DRM_DEBUG("Unknown i915 perf property ID\n");
  3924				return -EINVAL;
  3925			}
  3926	
  3927			switch ((enum drm_i915_perf_property_id)id) {
  3928			case DRM_I915_PERF_PROP_CTX_HANDLE:
  3929				props->single_context = 1;
  3930				props->ctx_handle = value;
  3931				break;
  3932			case DRM_I915_PERF_PROP_SAMPLE_OA:
  3933				props->sample_flags |= SAMPLE_OA_REPORT;
  3934				break;
  3935			case DRM_I915_PERF_PROP_OA_METRICS_SET:
  3936				if (value == 0 ||
  3937				    value > dev_priv->perf.oa.n_builtin_sets) {
  3938					DRM_DEBUG("Unknown OA metric set ID\n");
  3939					return -EINVAL;
  3940				}
  3941				props->metrics_set = value;
  3942				break;
  3943			case DRM_I915_PERF_PROP_OA_FORMAT:
  3944				if (value == 0 || value >= I915_OA_FORMAT_MAX) {
  3945					DRM_DEBUG("Out-of-range OA report format %llu\n",
  3946						  value);
  3947					return -EINVAL;
  3948				}
  3949				if (!dev_priv->perf.oa.oa_formats[value].size) {
  3950					DRM_DEBUG("Unsupported OA report format %llu\n",
  3951						  value);
  3952					return -EINVAL;
  3953				}
  3954				props->oa_format = value;
  3955				break;
  3956			case DRM_I915_PERF_PROP_OA_EXPONENT:
  3957				if (value > OA_EXPONENT_MAX) {
  3958					DRM_DEBUG("OA timer exponent too high (> %u)\n",
  3959						 OA_EXPONENT_MAX);
  3960					return -EINVAL;
  3961				}
  3962	
  3963				/* Theoretically we can program the OA unit to sample
  3964				 * e.g. every 160ns for HSW, 167ns for BDW/SKL or 104ns
  3965				 * for BXT. We don't allow such high sampling
  3966				 * frequencies by default unless root.
  3967				 */
  3968	
  3969				BUILD_BUG_ON(sizeof(oa_period) != 8);
  3970				oa_period = oa_exponent_to_ns(dev_priv, value);
  3971	
  3972				/* This check is primarily to ensure that oa_period <=
  3973				 * UINT32_MAX (before passing to do_div which only
  3974				 * accepts a u32 denominator), but we can also skip
  3975				 * checking anything < 1Hz which implicitly can't be
  3976				 * limited via an integer oa_max_sample_rate.
  3977				 */
  3978				if (oa_period <= NSEC_PER_SEC) {
  3979					u64 tmp = NSEC_PER_SEC;
  3980					do_div(tmp, oa_period);
  3981					oa_freq_hz = tmp;
  3982				} else
  3983					oa_freq_hz = 0;
  3984	
  3985				if (oa_freq_hz > i915_oa_max_sample_rate &&
  3986				    !capable(CAP_SYS_ADMIN)) {
  3987					DRM_DEBUG("OA exponent would exceed the max sampling frequency (sysctl dev.i915.oa_max_sample_rate) %uHz without root privileges\n",
  3988						  i915_oa_max_sample_rate);
  3989					return -EACCES;
  3990				}
  3991	
  3992				props->oa_periodic = true;
  3993				props->oa_period_exponent = value;
  3994				break;
  3995			case DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE:
  3996				props->sample_flags |= SAMPLE_OA_SOURCE;
  3997				break;
  3998			case DRM_I915_PERF_PROP_ENGINE: {
  3999					unsigned int user_ring_id =
  4000						value & I915_EXEC_RING_MASK;
  4001	
  4002					if (user_ring_id > I915_USER_RINGS)
  4003						return -EINVAL;
  4004	
  4005					props->cs_mode = true;
  4006					props->engine = user_ring_map[user_ring_id];
  4007				}
  4008				break;
  4009			case DRM_I915_PERF_PROP_SAMPLE_CTX_ID:
  4010				props->sample_flags |= SAMPLE_CTX_ID;
  4011				break;
  4012			case DRM_I915_PERF_PROP_SAMPLE_PID:
  4013				props->sample_flags |= SAMPLE_PID;
  4014				break;
  4015			case DRM_I915_PERF_PROP_SAMPLE_TAG:
  4016				props->sample_flags |= SAMPLE_TAG;
  4017				break;
  4018			case DRM_I915_PERF_PROP_SAMPLE_TS:
  4019				props->sample_flags |= SAMPLE_TS;
  4020				break;
  4021			case DRM_I915_PERF_PROP_SAMPLE_MMIO:
> 4022				ret = copy_mmio_list(dev_priv, (u64 __user *)value);
  4023				if (ret)
  4024					return ret;
  4025				props->sample_flags |= SAMPLE_MMIO;
  4026				break;
  4027			case DRM_I915_PERF_PROP_MAX:
  4028				MISSING_CASE(id);
  4029				return -EINVAL;
  4030			}
  4031	
  4032			uprop += 2;
  4033		}
  4034	
  4035		return 0;
  4036	}
  4037	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 29266 bytes --]

[-- Attachment #3: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.
  2017-07-31 11:38     ` sourab gupta
@ 2017-07-31 14:25       ` Lionel Landwerlin
  0 siblings, 0 replies; 34+ messages in thread
From: Lionel Landwerlin @ 2017-07-31 14:25 UTC (permalink / raw)
  To: sourab gupta; +Cc: intel-gfx, Sourab Gupta


[-- Attachment #1.1: Type: text/plain, Size: 358 bytes --]

Thanks for the details!

On 31/07/17 12:38, sourab gupta wrote:
>
>
> On Mon, Jul 31, 2017 at 3:13 PM, Lionel Landwerlin 
> <lionel.g.landwerlin@intel.com <mailto:lionel.g.landwerlin@intel.com>> 
> wrote:
>
>     On 31/07/17 08:59, Sagar Arun Kamble wrote:
>
>         From: Sourab Gupta <sourab.gupta@intel.com
>         <mailto:sourab.gupta@intel.com>>
>


[-- Attachment #1.2: Type: text/html, Size: 1545 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.
  2017-07-31  7:59 ` [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info Sagar Arun Kamble
  2017-07-31  8:34   ` Chris Wilson
  2017-07-31  9:43   ` Lionel Landwerlin
@ 2017-07-31 15:38   ` kbuild test robot
  2017-07-31 15:45   ` Lionel Landwerlin
  3 siblings, 0 replies; 34+ messages in thread
From: kbuild test robot @ 2017-07-31 15:38 UTC (permalink / raw)
  To: Sagar Arun Kamble; +Cc: intel-gfx, Sourab Gupta, kbuild-all

[-- Attachment #1: Type: text/plain, Size: 15689 bytes --]

Hi Sourab,

[auto build test WARNING on drm-intel/for-linux-next]
[also build test WARNING on next-20170731]
[cannot apply to v4.13-rc3]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Sagar-Arun-Kamble/i915-perf-support-for-command-stream-based-OA-GPU-and-workload-metrics-capture/20170731-184412
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   WARNING: convert(1) not found, for SVG to PDF conversion install ImageMagick (https://www.imagemagick.org)
   include/linux/init.h:1: warning: no structured comments found
   include/linux/mod_devicetable.h:687: warning: Excess struct/union/enum/typedef member 'ver_major' description in 'fsl_mc_device_id'
   include/linux/mod_devicetable.h:687: warning: Excess struct/union/enum/typedef member 'ver_minor' description in 'fsl_mc_device_id'
   kernel/sched/core.c:2080: warning: No description found for parameter 'rf'
   kernel/sched/core.c:2080: warning: Excess function parameter 'cookie' description in 'try_to_wake_up_local'
   include/linux/wait.h:555: warning: No description found for parameter 'wq'
   include/linux/wait.h:555: warning: Excess function parameter 'wq_head' description in 'wait_event_interruptible_hrtimeout'
   include/linux/wait.h:759: warning: No description found for parameter 'wq_head'
   include/linux/wait.h:759: warning: Excess function parameter 'wq' description in 'wait_event_killable'
   include/linux/kthread.h:26: warning: Excess function parameter '...' description in 'kthread_create'
   kernel/sys.c:1: warning: no structured comments found
   include/linux/device.h:968: warning: No description found for parameter 'dma_ops'
   drivers/dma-buf/seqno-fence.c:1: warning: no structured comments found
   include/linux/iio/iio.h:603: warning: No description found for parameter 'trig_readonly'
   include/linux/iio/trigger.h:151: warning: No description found for parameter 'indio_dev'
   include/linux/iio/trigger.h:151: warning: No description found for parameter 'trig'
   include/linux/device.h:969: warning: No description found for parameter 'dma_ops'
   drivers/ata/libata-eh.c:1449: warning: No description found for parameter 'link'
   drivers/ata/libata-eh.c:1449: warning: Excess function parameter 'ap' description in 'ata_eh_done'
   drivers/ata/libata-eh.c:1590: warning: No description found for parameter 'qc'
   drivers/ata/libata-eh.c:1590: warning: Excess function parameter 'dev' description in 'ata_eh_request_sense'
   drivers/mtd/nand/nand_base.c:2751: warning: Excess function parameter 'cached' description in 'nand_write_page'
   drivers/mtd/nand/nand_base.c:2751: warning: Excess function parameter 'cached' description in 'nand_write_page'
   arch/s390/include/asm/cmb.h:1: warning: no structured comments found
   drivers/scsi/scsi_lib.c:1116: warning: No description found for parameter 'rq'
   drivers/scsi/constants.c:1: warning: no structured comments found
   include/linux/usb/gadget.h:230: warning: No description found for parameter 'claimed'
   include/linux/usb/gadget.h:230: warning: No description found for parameter 'enabled'
   include/linux/usb/gadget.h:412: warning: No description found for parameter 'quirk_altset_not_supp'
   include/linux/usb/gadget.h:412: warning: No description found for parameter 'quirk_stall_not_supp'
   include/linux/usb/gadget.h:412: warning: No description found for parameter 'quirk_zlp_not_supp'
   fs/inode.c:1666: warning: No description found for parameter 'rcu'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_transaction'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_next_transaction'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_list'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_vfs_inode'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_flags'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_rsv_handle'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_reserved'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_type'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_line_no'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_start_jiffies'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_requested_credits'
   include/linux/jbd2.h:497: warning: No description found for parameter 'saved_alloc_context'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_chkpt_bhs'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_devname'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_average_commit_time'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_min_batch_time'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_max_batch_time'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_commit_callback'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_failed_commit'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_chksum_driver'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_csum_seed'
   fs/jbd2/transaction.c:511: warning: No description found for parameter 'type'
   fs/jbd2/transaction.c:511: warning: No description found for parameter 'line_no'
   fs/jbd2/transaction.c:641: warning: No description found for parameter 'gfp_mask'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'debugfs_init'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_open_object'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_close_object'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'prime_handle_to_fd'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'prime_fd_to_handle'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_export'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_import'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_pin'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_unpin'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_res_obj'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_get_sg_table'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_import_sg_table'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_vmap'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_vunmap'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_mmap'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_vm_ops'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'major'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'minor'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'patchlevel'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'name'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'desc'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'date'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'driver_features'
   drivers/gpu/drm/drm_modes.c:1623: warning: No description found for parameter 'display'
   drivers/gpu/drm/drm_modes.c:1623: warning: Excess function parameter 'connector' description in 'drm_mode_is_420_only'
>> drivers/gpu/drm/i915/i915_drv.h:2000: warning: No description found for parameter 'emit_sample_capture'
>> drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'cs_buffer'
>> drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'cs_samples'
>> drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'cs_samples_lock'
>> drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'poll_wq'
>> drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'pollin'
>> drivers/gpu/drm/i915/i915_drv.h:2000: warning: No description found for parameter 'emit_sample_capture'
>> drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'cs_buffer'
>> drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'cs_samples'
>> drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'cs_samples_lock'
>> drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'poll_wq'
>> drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'pollin'
>> drivers/gpu/drm/i915/i915_drv.h:2000: warning: No description found for parameter 'emit_sample_capture'
>> drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'cs_buffer'
>> drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'cs_samples'
>> drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'cs_samples_lock'
>> drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'poll_wq'
>> drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'pollin'
>> drivers/gpu/drm/i915/i915_perf.c:1: warning: no structured comments found
>> drivers/gpu/drm/i915/i915_perf.c:1: warning: no structured comments found

vim +/emit_sample_capture +2000 drivers/gpu/drm/i915/i915_drv.h

  1925	
  1926	/**
  1927	 * struct i915_perf_stream_ops - the OPs to support a specific stream type
  1928	 */
  1929	struct i915_perf_stream_ops {
  1930		/**
  1931		 * @enable: Enables the collection of HW samples, either in response to
  1932		 * `I915_PERF_IOCTL_ENABLE` or implicitly called when stream is opened
  1933		 * without `I915_PERF_FLAG_DISABLED`.
  1934		 */
  1935		void (*enable)(struct i915_perf_stream *stream);
  1936	
  1937		/**
  1938		 * @disable: Disables the collection of HW samples, either in response
  1939		 * to `I915_PERF_IOCTL_DISABLE` or implicitly called before destroying
  1940		 * the stream.
  1941		 */
  1942		void (*disable)(struct i915_perf_stream *stream);
  1943	
  1944		/**
  1945		 * @poll_wait: Call poll_wait, passing a wait queue that will be woken
  1946		 * once there is something ready to read() for the stream
  1947		 */
  1948		void (*poll_wait)(struct i915_perf_stream *stream,
  1949				  struct file *file,
  1950				  poll_table *wait);
  1951	
  1952		/**
  1953		 * @wait_unlocked: For handling a blocking read, wait until there is
  1954		 * something to ready to read() for the stream. E.g. wait on the same
  1955		 * wait queue that would be passed to poll_wait().
  1956		 */
  1957		int (*wait_unlocked)(struct i915_perf_stream *stream);
  1958	
  1959		/**
  1960		 * @read: Copy buffered metrics as records to userspace
  1961		 * **buf**: the userspace, destination buffer
  1962		 * **count**: the number of bytes to copy, requested by userspace
  1963		 * **offset**: zero at the start of the read, updated as the read
  1964		 * proceeds, it represents how many bytes have been copied so far and
  1965		 * the buffer offset for copying the next record.
  1966		 *
  1967		 * Copy as many buffered i915 perf samples and records for this stream
  1968		 * to userspace as will fit in the given buffer.
  1969		 *
  1970		 * Only write complete records; returning -%ENOSPC if there isn't room
  1971		 * for a complete record.
  1972		 *
  1973		 * Return any error condition that results in a short read such as
  1974		 * -%ENOSPC or -%EFAULT, even though these may be squashed before
  1975		 * returning to userspace.
  1976		 */
  1977		int (*read)(struct i915_perf_stream *stream,
  1978			    char __user *buf,
  1979			    size_t count,
  1980			    size_t *offset);
  1981	
  1982		/**
  1983		 * @destroy: Cleanup any stream specific resources.
  1984		 *
  1985		 * The stream will always be disabled before this is called.
  1986		 */
  1987		void (*destroy)(struct i915_perf_stream *stream);
  1988	
  1989		/*
  1990		 * @emit_sample_capture: Emit the commands in the command streamer
  1991		 * for a particular gpu engine.
  1992		 *
  1993		 * The commands are inserted to capture the perf sample data at
  1994		 * specific points during workload execution, such as before and after
  1995		 * the batch buffer.
  1996		 */
  1997		void (*emit_sample_capture)(struct i915_perf_stream *stream,
  1998					    struct drm_i915_gem_request *request,
  1999					    bool preallocate);
> 2000	};
  2001	
  2002	enum i915_perf_stream_state {
  2003		I915_PERF_STREAM_DISABLED,
  2004		I915_PERF_STREAM_ENABLE_IN_PROGRESS,
  2005		I915_PERF_STREAM_ENABLED,
  2006	};
  2007	
  2008	/**
  2009	 * struct i915_perf_stream - state for a single open stream FD
  2010	 */
  2011	struct i915_perf_stream {
  2012		/**
  2013		 * @dev_priv: i915 drm device
  2014		 */
  2015		struct drm_i915_private *dev_priv;
  2016	
  2017		/**
  2018		 * @engine: Engine to which this stream corresponds.
  2019		 */
  2020		struct intel_engine_cs *engine;
  2021	
  2022		/**
  2023		 * @sample_flags: Flags representing the `DRM_I915_PERF_PROP_SAMPLE_*`
  2024		 * properties given when opening a stream, representing the contents
  2025		 * of a single sample as read() by userspace.
  2026		 */
  2027		u32 sample_flags;
  2028	
  2029		/**
  2030		 * @sample_size: Considering the configured contents of a sample
  2031		 * combined with the required header size, this is the total size
  2032		 * of a single sample record.
  2033		 */
  2034		int sample_size;
  2035	
  2036		/**
  2037		 * @ctx: %NULL if measuring system-wide across all contexts or a
  2038		 * specific context that is being monitored.
  2039		 */
  2040		struct i915_gem_context *ctx;
  2041	
  2042		/**
  2043		 * @state: Current stream state, which can be either disabled, enabled,
  2044		 * or enable_in_progress, while considering whether the stream was
  2045		 * opened in a disabled state and based on `I915_PERF_IOCTL_ENABLE` and
  2046		 * `I915_PERF_IOCTL_DISABLE` calls.
  2047		 */
  2048		enum i915_perf_stream_state state;
  2049	
  2050		/**
  2051		 * @cs_mode: Whether command stream based perf sample collection is
  2052		 * enabled for this stream
  2053		 */
  2054		bool cs_mode;
  2055	
  2056		/**
  2057		 * @using_oa: Whether OA unit is in use for this particular stream
  2058		 */
  2059		bool using_oa;
  2060	
  2061		/**
  2062		 * @ops: The callbacks providing the implementation of this specific
  2063		 * type of configured stream.
  2064		 */
  2065		const struct i915_perf_stream_ops *ops;
  2066	
  2067		/* Command stream based perf data buffer */
  2068		struct {
  2069			struct i915_vma *vma;
  2070			u8 *vaddr;
  2071		} cs_buffer;
  2072	
  2073		struct list_head cs_samples;
  2074		spinlock_t cs_samples_lock;
  2075	
  2076		wait_queue_head_t poll_wq;
  2077		bool pollin;
> 2078	};
  2079	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6735 bytes --]

[-- Attachment #3: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.
  2017-07-31  7:59 ` [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info Sagar Arun Kamble
                     ` (2 preceding siblings ...)
  2017-07-31 15:38   ` kbuild test robot
@ 2017-07-31 15:45   ` Lionel Landwerlin
  2017-08-01  9:29     ` Kamble, Sagar A
  3 siblings, 1 reply; 34+ messages in thread
From: Lionel Landwerlin @ 2017-07-31 15:45 UTC (permalink / raw)
  To: Sagar Arun Kamble, intel-gfx; +Cc: Sourab Gupta

On 31/07/17 08:59, Sagar Arun Kamble wrote:
> From: Sourab Gupta <sourab.gupta@intel.com>
>
> This patch introduces a framework to capture OA counter reports associated
> with Render command stream. We can then associate the reports captured
> through this mechanism with their corresponding context id's. This can be
> further extended to associate any other metadata information with the
> corresponding samples (since the association with Render command stream
> gives us the ability to capture these information while inserting the
> corresponding capture commands into the command stream).
>
> The OA reports generated in this way are associated with a corresponding
> workload, and thus can be used the delimit the workload (i.e. sample the
> counters at the workload boundaries), within an ongoing stream of periodic
> counter snapshots.
>
> There may be usecases wherein we need more than periodic OA capture mode
> which is supported currently. This mode is primarily used for two usecases:
>      - Ability to capture system wide metrics, alongwith the ability to map
>        the reports back to individual contexts (particularly for HSW).
>      - Ability to inject tags for work, into the reports. This provides
>        visibility into the multiple stages of work within single context.
>
> The userspace will be able to distinguish between the periodic and CS based
> OA reports by the virtue of source_info sample field.
>
> The command MI_REPORT_PERF_COUNT can be used to capture snapshots of OA
> counters, and is inserted at BB boundaries.
> The data thus captured will be stored in a separate buffer, which will
> be different from the buffer used otherwise for periodic OA capture mode.
> The metadata information pertaining to snapshot is maintained in a list,
> which also has offsets into the gem buffer object per captured snapshot.
> In order to track whether the gpu has completed processing the node,
> a field pertaining to corresponding gem request is added, which is tracked
> for completion of the command.
>
> Both periodic and CS based reports are associated with a single stream
> (corresponding to render engine), and it is expected to have the samples
> in the sequential order according to their timestamps. Now, since these
> reports are collected in separate buffers, these are merge sorted at the
> time of forwarding to userspace during the read call.
>
> v2: Aligning with the non-perf interface (custom drm ioctl based). Also,
> few related patches are squashed together for better readability
>
> v3: Updated perf sample capture emit hook name. Reserving space upfront
> in the ring for emitting sample capture commands and using
> req->fence.seqno for tracking samples. Added SRCU protection for streams.
> Changed the stream last_request tracking to resv object. (Chris)
> Updated perf.sample_lock spin_lock usage to avoid softlockups. Moved
> stream to global per-engine structure. (Sagar)
> Update unpin and put in the free routines to i915_vma_unpin_and_release.
> Making use of perf stream cs_buffer vma resv instead of separate resv obj.
> Pruned perf stream vma resv during gem_idle. (Chris)
> Changed payload field ctx_id to u64 to keep all sample data aligned at 8
> bytes. (Lionel)
> stall/flush prior to sample capture is not added. Do we need to give this
> control to user to select whether to stall/flush at each sample?
>
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> Signed-off-by: Robert Bragg <robert@sixbynine.org>
> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h            |  101 ++-
>   drivers/gpu/drm/i915/i915_gem.c            |    1 +
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c |    8 +
>   drivers/gpu/drm/i915/i915_perf.c           | 1185 ++++++++++++++++++++++------
>   drivers/gpu/drm/i915/intel_engine_cs.c     |    4 +
>   drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +
>   drivers/gpu/drm/i915/intel_ringbuffer.h    |    5 +
>   include/uapi/drm/i915_drm.h                |   15 +
>   8 files changed, 1073 insertions(+), 248 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 2c7456f..8b1cecf 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1985,6 +1985,24 @@ struct i915_perf_stream_ops {
>   	 * The stream will always be disabled before this is called.
>   	 */
>   	void (*destroy)(struct i915_perf_stream *stream);
> +
> +	/*
> +	 * @emit_sample_capture: Emit the commands in the command streamer
> +	 * for a particular gpu engine.
> +	 *
> +	 * The commands are inserted to capture the perf sample data at
> +	 * specific points during workload execution, such as before and after
> +	 * the batch buffer.
> +	 */
> +	void (*emit_sample_capture)(struct i915_perf_stream *stream,
> +				    struct drm_i915_gem_request *request,
> +				    bool preallocate);
> +};
> +

It seems the motivation for this following enum is mostly to deal with 
the fact that engine->perf_srcu is set before the OA unit is configured.
Would it possible to set it later so that we get rid of the enum?

> +enum i915_perf_stream_state {
> +	I915_PERF_STREAM_DISABLED,
> +	I915_PERF_STREAM_ENABLE_IN_PROGRESS,
> +	I915_PERF_STREAM_ENABLED,
>   };
>   
>   /**
> @@ -1997,9 +2015,9 @@ struct i915_perf_stream {
>   	struct drm_i915_private *dev_priv;
>   
>   	/**
> -	 * @link: Links the stream into ``&drm_i915_private->streams``
> +	 * @engine: Engine to which this stream corresponds.
>   	 */
> -	struct list_head link;
> +	struct intel_engine_cs *engine;

This series only supports cs_mode on the RCS command stream.
Does it really make sense to add an srcu on all the engines rather than 
keeping it part of dev_priv->perf ?

We can always add that later if needed.

>   
>   	/**
>   	 * @sample_flags: Flags representing the `DRM_I915_PERF_PROP_SAMPLE_*`
> @@ -2022,17 +2040,41 @@ struct i915_perf_stream {
>   	struct i915_gem_context *ctx;
>   
>   	/**
> -	 * @enabled: Whether the stream is currently enabled, considering
> -	 * whether the stream was opened in a disabled state and based
> -	 * on `I915_PERF_IOCTL_ENABLE` and `I915_PERF_IOCTL_DISABLE` calls.
> +	 * @state: Current stream state, which can be either disabled, enabled,
> +	 * or enable_in_progress, while considering whether the stream was
> +	 * opened in a disabled state and based on `I915_PERF_IOCTL_ENABLE` and
> +	 * `I915_PERF_IOCTL_DISABLE` calls.
>   	 */
> -	bool enabled;
> +	enum i915_perf_stream_state state;
> +
> +	/**
> +	 * @cs_mode: Whether command stream based perf sample collection is
> +	 * enabled for this stream
> +	 */
> +	bool cs_mode;
> +
> +	/**
> +	 * @using_oa: Whether OA unit is in use for this particular stream
> +	 */
> +	bool using_oa;
>   
>   	/**
>   	 * @ops: The callbacks providing the implementation of this specific
>   	 * type of configured stream.
>   	 */
>   	const struct i915_perf_stream_ops *ops;
> +
> +	/* Command stream based perf data buffer */
> +	struct {
> +		struct i915_vma *vma;
> +		u8 *vaddr;
> +	} cs_buffer;
> +
> +	struct list_head cs_samples;
> +	spinlock_t cs_samples_lock;
> +
> +	wait_queue_head_t poll_wq;
> +	bool pollin;
>   };
>   
>   /**
> @@ -2095,7 +2137,8 @@ struct i915_oa_ops {
>   	int (*read)(struct i915_perf_stream *stream,
>   		    char __user *buf,
>   		    size_t count,
> -		    size_t *offset);
> +		    size_t *offset,
> +		    u32 ts);
>   
>   	/**
>   	 * @oa_hw_tail_read: read the OA tail pointer register
> @@ -2107,6 +2150,36 @@ struct i915_oa_ops {
>   	u32 (*oa_hw_tail_read)(struct drm_i915_private *dev_priv);
>   };
>   
> +/*
> + * i915_perf_cs_sample - Sample element to hold info about a single perf
> + * sample data associated with a particular GPU command stream.
> + */
> +struct i915_perf_cs_sample {
> +	/**
> +	 * @link: Links the sample into ``&stream->cs_samples``
> +	 */
> +	struct list_head link;
> +
> +	/**
> +	 * @request: GEM request associated with the sample. The commands to
> +	 * capture the perf metrics are inserted into the command streamer in
> +	 * context of this request.
> +	 */
> +	struct drm_i915_gem_request *request;
> +
> +	/**
> +	 * @offset: Offset into ``&stream->cs_buffer``
> +	 * where the perf metrics will be collected, when the commands inserted
> +	 * into the command stream are executed by GPU.
> +	 */
> +	u32 offset;
> +
> +	/**
> +	 * @ctx_id: Context ID associated with this perf sample
> +	 */
> +	u32 ctx_id;
> +};
> +
>   struct intel_cdclk_state {
>   	unsigned int cdclk, vco, ref;
>   };
> @@ -2431,17 +2504,10 @@ struct drm_i915_private {
>   		struct ctl_table_header *sysctl_header;
>   
>   		struct mutex lock;
> -		struct list_head streams;
> -
> -		struct {
> -			struct i915_perf_stream *exclusive_stream;
>   
> -			u32 specific_ctx_id;
> -
> -			struct hrtimer poll_check_timer;
> -			wait_queue_head_t poll_wq;
> -			bool pollin;
> +		struct hrtimer poll_check_timer;
>   
> +		struct {
>   			/**
>   			 * For rate limiting any notifications of spurious
>   			 * invalid OA reports
> @@ -3636,6 +3702,8 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data,
>   void i915_oa_init_reg_state(struct intel_engine_cs *engine,
>   			    struct i915_gem_context *ctx,
>   			    uint32_t *reg_state);
> +void i915_perf_emit_sample_capture(struct drm_i915_gem_request *req,
> +				   bool preallocate);
>   
>   /* i915_gem_evict.c */
>   int __must_check i915_gem_evict_something(struct i915_address_space *vm,
> @@ -3795,6 +3863,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
>   /* i915_perf.c */
>   extern void i915_perf_init(struct drm_i915_private *dev_priv);
>   extern void i915_perf_fini(struct drm_i915_private *dev_priv);
> +extern void i915_perf_streams_mark_idle(struct drm_i915_private *dev_priv);
>   extern void i915_perf_register(struct drm_i915_private *dev_priv);
>   extern void i915_perf_unregister(struct drm_i915_private *dev_priv);
>   
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 000a764..7b01548 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3220,6 +3220,7 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
>   
>   	intel_engines_mark_idle(dev_priv);
>   	i915_gem_timelines_mark_idle(dev_priv);
> +	i915_perf_streams_mark_idle(dev_priv);
>   
>   	GEM_BUG_ON(!dev_priv->gt.awake);
>   	dev_priv->gt.awake = false;
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 5fa4476..bfe546b 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1194,12 +1194,16 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
>   	if (err)
>   		goto err_request;
>   
> +	i915_perf_emit_sample_capture(rq, true);
> +
>   	err = eb->engine->emit_bb_start(rq,
>   					batch->node.start, PAGE_SIZE,
>   					cache->gen > 5 ? 0 : I915_DISPATCH_SECURE);
>   	if (err)
>   		goto err_request;
>   
> +	i915_perf_emit_sample_capture(rq, false);
> +
>   	GEM_BUG_ON(!reservation_object_test_signaled_rcu(batch->resv, true));
>   	i915_vma_move_to_active(batch, rq, 0);
>   	reservation_object_lock(batch->resv, NULL);
> @@ -2029,6 +2033,8 @@ static int eb_submit(struct i915_execbuffer *eb)
>   			return err;
>   	}
>   
> +	i915_perf_emit_sample_capture(eb->request, true);
> +
>   	err = eb->engine->emit_bb_start(eb->request,
>   					eb->batch->node.start +
>   					eb->batch_start_offset,
> @@ -2037,6 +2043,8 @@ static int eb_submit(struct i915_execbuffer *eb)
>   	if (err)
>   		return err;
>   
> +	i915_perf_emit_sample_capture(eb->request, false);
> +
>   	return 0;
>   }
>   
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index b272653..57e1936 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -193,6 +193,7 @@
>   
>   #include <linux/anon_inodes.h>
>   #include <linux/sizes.h>
> +#include <linux/srcu.h>
>   
>   #include "i915_drv.h"
>   #include "i915_oa_hsw.h"
> @@ -288,6 +289,12 @@
>   #define OAREPORT_REASON_CTX_SWITCH     (1<<3)
>   #define OAREPORT_REASON_CLK_RATIO      (1<<5)
>   
> +/* Data common to periodic and RCS based OA samples */
> +struct i915_perf_sample_data {
> +	u64 source;
> +	u64 ctx_id;
> +	const u8 *report;
> +};
>   
>   /* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate
>    *
> @@ -328,8 +335,19 @@
>   	[I915_OA_FORMAT_C4_B8]		    = { 7, 64 },
>   };
>   
> +/* Duplicated from similar static enum in i915_gem_execbuffer.c */
> +#define I915_USER_RINGS (4)
> +static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {
> +	[I915_EXEC_DEFAULT]     = RCS,
> +	[I915_EXEC_RENDER]      = RCS,
> +	[I915_EXEC_BLT]         = BCS,
> +	[I915_EXEC_BSD]         = VCS,
> +	[I915_EXEC_VEBOX]       = VECS
> +};
> +
>   #define SAMPLE_OA_REPORT      (1<<0)
>   #define SAMPLE_OA_SOURCE      (1<<1)
> +#define SAMPLE_CTX_ID	      (1<<2)
>   
>   /**
>    * struct perf_open_properties - for validated properties given to open a stream
> @@ -340,6 +358,9 @@
>    * @oa_format: An OA unit HW report format
>    * @oa_periodic: Whether to enable periodic OA unit sampling
>    * @oa_period_exponent: The OA unit sampling period is derived from this
> + * @cs_mode: Whether the stream is configured to enable collection of metrics
> + * associated with command stream of a particular GPU engine
> + * @engine: The GPU engine associated with the stream in case cs_mode is enabled
>    *
>    * As read_properties_unlocked() enumerates and validates the properties given
>    * to open a stream of metrics the configuration is built up in the structure
> @@ -356,6 +377,10 @@ struct perf_open_properties {
>   	int oa_format;
>   	bool oa_periodic;
>   	int oa_period_exponent;
> +
> +	/* Command stream mode */
> +	bool cs_mode;
> +	enum intel_engine_id engine;
>   };
>   
>   static u32 gen8_oa_hw_tail_read(struct drm_i915_private *dev_priv)
> @@ -371,6 +396,266 @@ static u32 gen7_oa_hw_tail_read(struct drm_i915_private *dev_priv)
>   }
>   
>   /**
> + * i915_perf_emit_sample_capture - Insert the commands to capture metrics into
> + * the command stream of a GPU engine.
> + * @request: request in whose context the metrics are being collected.
> + * @preallocate: allocate space in ring for related sample.
> + *
> + * The function provides a hook through which the commands to capture perf
> + * metrics, are inserted into the command stream of a GPU engine.
> + */
> +void i915_perf_emit_sample_capture(struct drm_i915_gem_request *request,
> +				   bool preallocate)
> +{
> +	struct intel_engine_cs *engine = request->engine;
> +	struct drm_i915_private *dev_priv = engine->i915;
> +	struct i915_perf_stream *stream;
> +	int idx;
> +
> +	if (!dev_priv->perf.initialized)
> +		return;
> +
> +	idx = srcu_read_lock(&engine->perf_srcu);
> +	stream = srcu_dereference(engine->exclusive_stream, &engine->perf_srcu);
> +	if (stream && (stream->state == I915_PERF_STREAM_ENABLED) &&
> +				stream->cs_mode)
> +		stream->ops->emit_sample_capture(stream, request,
> +						 preallocate);
> +	srcu_read_unlock(&engine->perf_srcu, idx);
> +}
> +
> +/**
> + * release_perf_samples - Release old perf samples to make space for new
> + * sample data.
> + * @stream: Stream from which space is to be freed up.
> + * @target_size: Space required to be freed up.
> + *
> + * We also dereference the associated request before deleting the sample.
> + * Also, no need to check whether the commands associated with old samples
> + * have been completed. This is because these sample entries are anyways going
> + * to be replaced by a new sample, and gpu will eventually overwrite the buffer
> + * contents, when the request associated with new sample completes.
> + */
> +static void release_perf_samples(struct i915_perf_stream *stream,
> +				 u32 target_size)
> +{
> +	struct drm_i915_private *dev_priv = stream->dev_priv;
> +	struct i915_perf_cs_sample *sample, *next;
> +	u32 sample_size = dev_priv->perf.oa.oa_buffer.format_size;
> +	u32 size = 0;
> +
> +	list_for_each_entry_safe
> +		(sample, next, &stream->cs_samples, link) {
> +		size += sample_size;
> +		i915_gem_request_put(sample->request);
> +		list_del(&sample->link);
> +		kfree(sample);
> +
> +		if (size >= target_size)
> +			break;
> +	}
> +}
> +
> +/**
> + * insert_perf_sample - Insert a perf sample entry to the sample list.
> + * @stream: Stream into which sample is to be inserted.
> + * @sample: perf CS sample to be inserted into the list
> + *
> + * This function never fails, since it always manages to insert the sample.
> + * If the space is exhausted in the buffer, it will remove the older
> + * entries in order to make space.
> + */
> +static void insert_perf_sample(struct i915_perf_stream *stream,
> +				struct i915_perf_cs_sample *sample)
> +{
> +	struct drm_i915_private *dev_priv = stream->dev_priv;
> +	struct i915_perf_cs_sample *first, *last;
> +	int max_offset = stream->cs_buffer.vma->obj->base.size;
> +	u32 sample_size = dev_priv->perf.oa.oa_buffer.format_size;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&stream->cs_samples_lock, flags);
> +	if (list_empty(&stream->cs_samples)) {
> +		sample->offset = 0;
> +		list_add_tail(&sample->link, &stream->cs_samples);
> +		spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> +		return;
> +	}
> +
> +	first = list_first_entry(&stream->cs_samples, typeof(*first),
> +				link);
> +	last = list_last_entry(&stream->cs_samples, typeof(*last),
> +				link);
> +
> +	if (last->offset >= first->offset) {
> +		/* Sufficient space available at the end of buffer? */
> +		if (last->offset + 2*sample_size < max_offset)
> +			sample->offset = last->offset + sample_size;
> +		/*
> +		 * Wraparound condition. Is sufficient space available at
> +		 * beginning of buffer?
> +		 */
> +		else if (sample_size < first->offset)
> +			sample->offset = 0;
> +		/* Insufficient space. Overwrite existing old entries */
> +		else {
> +			u32 target_size = sample_size - first->offset;
> +
> +			release_perf_samples(stream, target_size);
> +			sample->offset = 0;
> +		}
> +	} else {
> +		/* Sufficient space available? */
> +		if (last->offset + 2*sample_size < first->offset)
> +			sample->offset = last->offset + sample_size;
> +		/* Insufficient space. Overwrite existing old entries */
> +		else {
> +			u32 target_size = sample_size -
> +				(first->offset - last->offset -
> +				sample_size);
> +
> +			release_perf_samples(stream, target_size);
> +			sample->offset = last->offset + sample_size;
> +		}
> +	}
> +	list_add_tail(&sample->link, &stream->cs_samples);
> +	spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> +}
> +
> +/**
> + * i915_emit_oa_report_capture - Insert the commands to capture OA
> + * reports metrics into the render command stream
> + * @request: request in whose context the metrics are being collected.
> + * @preallocate: allocate space in ring for related sample.
> + * @offset: command stream buffer offset where the OA metrics need to be
> + * collected
> + */
> +static int i915_emit_oa_report_capture(
> +				struct drm_i915_gem_request *request,
> +				bool preallocate,
> +				u32 offset)
> +{
> +	struct drm_i915_private *dev_priv = request->i915;
> +	struct intel_engine_cs *engine = request->engine;
> +	struct i915_perf_stream *stream;
> +	u32 addr = 0;
> +	u32 cmd, len = 4, *cs;
> +	int idx;
> +
> +	idx = srcu_read_lock(&engine->perf_srcu);
> +	stream = srcu_dereference(engine->exclusive_stream, &engine->perf_srcu);
> +	addr = stream->cs_buffer.vma->node.start + offset;
> +	srcu_read_unlock(&engine->perf_srcu, idx);
> +
> +	if (WARN_ON(addr & 0x3f)) {
> +		DRM_ERROR("OA buffer address not aligned to 64 byte\n");
> +		return -EINVAL;
> +	}
> +
> +	if (preallocate)
> +		request->reserved_space += len;
> +	else
> +		request->reserved_space -= len;
> +
> +	cs = intel_ring_begin(request, 4);
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	cmd = MI_REPORT_PERF_COUNT | (1<<0);
> +	if (INTEL_GEN(dev_priv) >= 8)
> +		cmd |= (2<<0);
> +
> +	*cs++ = cmd;
> +	*cs++ = addr | MI_REPORT_PERF_COUNT_GGTT;
> +	*cs++ = request->fence.seqno;
> +
> +	if (INTEL_GEN(dev_priv) >= 8)
> +		*cs++ = 0;
> +	else
> +		*cs++ = MI_NOOP;
> +
> +	intel_ring_advance(request, cs);
> +
> +	return 0;
> +}
> +
> +/**
> + * i915_perf_stream_emit_sample_capture - Insert the commands to capture perf
> + * metrics into the GPU command stream
> + * @stream: An i915-perf stream opened for GPU metrics
> + * @request: request in whose context the metrics are being collected.
> + * @preallocate: allocate space in ring for related sample.
> + */
> +static void i915_perf_stream_emit_sample_capture(
> +					struct i915_perf_stream *stream,
> +					struct drm_i915_gem_request *request,
> +					bool preallocate)
> +{
> +	struct reservation_object *resv = stream->cs_buffer.vma->resv;
> +	struct i915_perf_cs_sample *sample;
> +	unsigned long flags;
> +	int ret;
> +
> +	sample = kzalloc(sizeof(*sample), GFP_KERNEL);
> +	if (sample == NULL) {
> +		DRM_ERROR("Perf sample alloc failed\n");
> +		return;
> +	}
> +
> +	sample->request = i915_gem_request_get(request);
> +	sample->ctx_id = request->ctx->hw_id;
> +
> +	insert_perf_sample(stream, sample);
> +
> +	if (stream->sample_flags & SAMPLE_OA_REPORT) {
> +		ret = i915_emit_oa_report_capture(request,
> +						  preallocate,
> +						  sample->offset);
> +		if (ret)
> +			goto err_unref;
> +	}
> +
> +	reservation_object_lock(resv, NULL);
> +	if (reservation_object_reserve_shared(resv) == 0)
> +		reservation_object_add_shared_fence(resv, &request->fence);
> +	reservation_object_unlock(resv);
> +
> +	i915_vma_move_to_active(stream->cs_buffer.vma, request,
> +					EXEC_OBJECT_WRITE);
> +	return;
> +
> +err_unref:
> +	i915_gem_request_put(sample->request);
> +	spin_lock_irqsave(&stream->cs_samples_lock, flags);
> +	list_del(&sample->link);
> +	spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> +	kfree(sample);
> +}
> +
> +/**
> + * i915_perf_stream_release_samples - Release the perf command stream samples
> + * @stream: Stream from which sample are to be released.
> + *
> + * Note: The associated requests should be completed before releasing the
> + * references here.
> + */
> +static void i915_perf_stream_release_samples(struct i915_perf_stream *stream)
> +{
> +	struct i915_perf_cs_sample *entry, *next;
> +	unsigned long flags;
> +
> +	list_for_each_entry_safe
> +		(entry, next, &stream->cs_samples, link) {
> +		i915_gem_request_put(entry->request);
> +
> +		spin_lock_irqsave(&stream->cs_samples_lock, flags);
> +		list_del(&entry->link);
> +		spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> +		kfree(entry);
> +	}
> +}
> +
> +/**
>    * oa_buffer_check_unlocked - check for data and update tail ptr state
>    * @dev_priv: i915 device instance
>    *
> @@ -521,12 +806,13 @@ static int append_oa_status(struct i915_perf_stream *stream,
>   }
>   
>   /**
> - * append_oa_sample - Copies single OA report into userspace read() buffer.
> - * @stream: An i915-perf stream opened for OA metrics
> + * append_perf_sample - Copies single perf sample into userspace read() buffer.
> + * @stream: An i915-perf stream opened for perf samples
>    * @buf: destination buffer given by userspace
>    * @count: the number of bytes userspace wants to read
>    * @offset: (inout): the current position for writing into @buf
> - * @report: A single OA report to (optionally) include as part of the sample
> + * @data: perf sample data which contains (optionally) metrics configured
> + * earlier when opening a stream
>    *
>    * The contents of a sample are configured through `DRM_I915_PERF_PROP_SAMPLE_*`
>    * properties when opening a stream, tracked as `stream->sample_flags`. This
> @@ -537,11 +823,11 @@ static int append_oa_status(struct i915_perf_stream *stream,
>    *
>    * Returns: 0 on success, negative error code on failure.
>    */
> -static int append_oa_sample(struct i915_perf_stream *stream,
> +static int append_perf_sample(struct i915_perf_stream *stream,
>   			    char __user *buf,
>   			    size_t count,
>   			    size_t *offset,
> -			    const u8 *report)
> +			    const struct i915_perf_sample_data *data)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
> @@ -569,16 +855,21 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>   	 * transition. These are considered as source 'OABUFFER'.
>   	 */
>   	if (sample_flags & SAMPLE_OA_SOURCE) {
> -		u64 source = I915_PERF_SAMPLE_OA_SOURCE_OABUFFER;
> +		if (copy_to_user(buf, &data->source, 8))
> +			return -EFAULT;
> +		buf += 8;
> +	}
>   
> -		if (copy_to_user(buf, &source, 8))
> +	if (sample_flags & SAMPLE_CTX_ID) {
> +		if (copy_to_user(buf, &data->ctx_id, 8))
>   			return -EFAULT;
>   		buf += 8;
>   	}
>   
>   	if (sample_flags & SAMPLE_OA_REPORT) {
> -		if (copy_to_user(buf, report, report_size))
> +		if (copy_to_user(buf, data->report, report_size))
>   			return -EFAULT;
> +		buf += report_size;
>   	}
>   
>   	(*offset) += header.size;
> @@ -587,11 +878,54 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>   }
>   
>   /**
> + * append_oa_buffer_sample - Copies single periodic OA report into userspace
> + * read() buffer.
> + * @stream: An i915-perf stream opened for OA metrics
> + * @buf: destination buffer given by userspace
> + * @count: the number of bytes userspace wants to read
> + * @offset: (inout): the current position for writing into @buf
> + * @report: A single OA report to (optionally) include as part of the sample
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +static int append_oa_buffer_sample(struct i915_perf_stream *stream,
> +				char __user *buf, size_t count,
> +				size_t *offset,	const u8 *report)
> +{
> +	struct drm_i915_private *dev_priv = stream->dev_priv;
> +	u32 sample_flags = stream->sample_flags;
> +	struct i915_perf_sample_data data = { 0 };
> +	u32 *report32 = (u32 *)report;
> +
> +	if (sample_flags & SAMPLE_OA_SOURCE)
> +		data.source = I915_PERF_SAMPLE_OA_SOURCE_OABUFFER;
> +
> +	if (sample_flags & SAMPLE_CTX_ID) {
> +		if (INTEL_INFO(dev_priv)->gen < 8)
> +			data.ctx_id = 0;
> +		else {
> +			/*
> +			 * XXX: Just keep the lower 21 bits for now since I'm
> +			 * not entirely sure if the HW touches any of the higher
> +			 * bits in this field
> +			 */
> +			data.ctx_id = report32[2] & 0x1fffff;
> +		}
> +	}
> +
> +	if (sample_flags & SAMPLE_OA_REPORT)
> +		data.report = report;
> +
> +	return append_perf_sample(stream, buf, count, offset, &data);
> +}
> +
> +/**
>    * Copies all buffered OA reports into userspace read() buffer.
>    * @stream: An i915-perf stream opened for OA metrics
>    * @buf: destination buffer given by userspace
>    * @count: the number of bytes userspace wants to read
>    * @offset: (inout): the current position for writing into @buf
> + * @ts: copy OA reports till this timestamp
>    *
>    * Notably any error condition resulting in a short read (-%ENOSPC or
>    * -%EFAULT) will be returned even though one or more records may
> @@ -609,7 +943,8 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>   static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   				  char __user *buf,
>   				  size_t count,
> -				  size_t *offset)
> +				  size_t *offset,
> +				  u32 ts)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
> @@ -623,7 +958,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   	u32 taken;
>   	int ret = 0;
>   
> -	if (WARN_ON(!stream->enabled))
> +	if (WARN_ON(stream->state != I915_PERF_STREAM_ENABLED))
>   		return -EIO;
>   
>   	spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags);
> @@ -669,6 +1004,11 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   		u32 *report32 = (void *)report;
>   		u32 ctx_id;
>   		u32 reason;
> +		u32 report_ts = report32[1];
> +
> +		/* Report timestamp should not exceed the given ts */
> +		if (report_ts > ts)
> +			break;
>   
>   		/*
>   		 * All the report sizes factor neatly into the buffer
> @@ -750,23 +1090,23 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   		 * switches since it's not-uncommon for periodic samples to
>   		 * identify a switch before any 'context switch' report.
>   		 */
> -		if (!dev_priv->perf.oa.exclusive_stream->ctx ||
> -		    dev_priv->perf.oa.specific_ctx_id == ctx_id ||
> +		if (!stream->ctx ||
> +		    stream->engine->specific_ctx_id == ctx_id ||
>   		    (dev_priv->perf.oa.oa_buffer.last_ctx_id ==
> -		     dev_priv->perf.oa.specific_ctx_id) ||
> +		     stream->engine->specific_ctx_id) ||
>   		    reason & OAREPORT_REASON_CTX_SWITCH) {
>   
>   			/*
>   			 * While filtering for a single context we avoid
>   			 * leaking the IDs of other contexts.
>   			 */
> -			if (dev_priv->perf.oa.exclusive_stream->ctx &&
> -			    dev_priv->perf.oa.specific_ctx_id != ctx_id) {
> +			if (stream->ctx &&
> +			    stream->engine->specific_ctx_id != ctx_id) {
>   				report32[2] = INVALID_CTX_ID;
>   			}
>   
> -			ret = append_oa_sample(stream, buf, count, offset,
> -					       report);
> +			ret = append_oa_buffer_sample(stream, buf, count,
> +						      offset, report);
>   			if (ret)
>   				break;
>   
> @@ -807,6 +1147,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>    * @buf: destination buffer given by userspace
>    * @count: the number of bytes userspace wants to read
>    * @offset: (inout): the current position for writing into @buf
> + * @ts: copy OA reports till this timestamp
>    *
>    * Checks OA unit status registers and if necessary appends corresponding
>    * status records for userspace (such as for a buffer full condition) and then
> @@ -824,7 +1165,8 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   static int gen8_oa_read(struct i915_perf_stream *stream,
>   			char __user *buf,
>   			size_t count,
> -			size_t *offset)
> +			size_t *offset,
> +			u32 ts)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   	u32 oastatus;
> @@ -877,7 +1219,7 @@ static int gen8_oa_read(struct i915_perf_stream *stream,
>   			   oastatus & ~GEN8_OASTATUS_REPORT_LOST);
>   	}
>   
> -	return gen8_append_oa_reports(stream, buf, count, offset);
> +	return gen8_append_oa_reports(stream, buf, count, offset, ts);
>   }
>   
>   /**
> @@ -886,6 +1228,7 @@ static int gen8_oa_read(struct i915_perf_stream *stream,
>    * @buf: destination buffer given by userspace
>    * @count: the number of bytes userspace wants to read
>    * @offset: (inout): the current position for writing into @buf
> + * @ts: copy OA reports till this timestamp
>    *
>    * Notably any error condition resulting in a short read (-%ENOSPC or
>    * -%EFAULT) will be returned even though one or more records may
> @@ -903,7 +1246,8 @@ static int gen8_oa_read(struct i915_perf_stream *stream,
>   static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>   				  char __user *buf,
>   				  size_t count,
> -				  size_t *offset)
> +				  size_t *offset,
> +				  u32 ts)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
> @@ -917,7 +1261,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>   	u32 taken;
>   	int ret = 0;
>   
> -	if (WARN_ON(!stream->enabled))
> +	if (WARN_ON(stream->state != I915_PERF_STREAM_ENABLED))
>   		return -EIO;
>   
>   	spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags);
> @@ -984,7 +1328,12 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>   			continue;
>   		}
>   
> -		ret = append_oa_sample(stream, buf, count, offset, report);
> +		/* Report timestamp should not exceed the given ts */
> +		if (report32[1] > ts)
> +			break;
> +
> +		ret = append_oa_buffer_sample(stream, buf, count, offset,
> +					      report);
>   		if (ret)
>   			break;
>   
> @@ -1022,6 +1371,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>    * @buf: destination buffer given by userspace
>    * @count: the number of bytes userspace wants to read
>    * @offset: (inout): the current position for writing into @buf
> + * @ts: copy OA reports till this timestamp
>    *
>    * Checks Gen 7 specific OA unit status registers and if necessary appends
>    * corresponding status records for userspace (such as for a buffer full
> @@ -1035,7 +1385,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>   static int gen7_oa_read(struct i915_perf_stream *stream,
>   			char __user *buf,
>   			size_t count,
> -			size_t *offset)
> +			size_t *offset,
> +			u32 ts)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   	u32 oastatus1;
> @@ -1097,16 +1448,172 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
>   			GEN7_OASTATUS1_REPORT_LOST;
>   	}
>   
> -	return gen7_append_oa_reports(stream, buf, count, offset);
> +	return gen7_append_oa_reports(stream, buf, count, offset, ts);
> +}
> +
> +/**
> + * append_cs_buffer_sample - Copies single perf sample data associated with
> + * GPU command stream, into userspace read() buffer.
> + * @stream: An i915-perf stream opened for perf CS metrics
> + * @buf: destination buffer given by userspace
> + * @count: the number of bytes userspace wants to read
> + * @offset: (inout): the current position for writing into @buf
> + * @node: Sample data associated with perf metrics
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +static int append_cs_buffer_sample(struct i915_perf_stream *stream,
> +				char __user *buf,
> +				size_t count,
> +				size_t *offset,
> +				struct i915_perf_cs_sample *node)
> +{
> +	struct drm_i915_private *dev_priv = stream->dev_priv;
> +	struct i915_perf_sample_data data = { 0 };
> +	u32 sample_flags = stream->sample_flags;
> +	int ret = 0;
> +
> +	if (sample_flags & SAMPLE_OA_REPORT) {
> +		const u8 *report = stream->cs_buffer.vaddr + node->offset;
> +		u32 sample_ts = *(u32 *)(report + 4);
> +
> +		data.report = report;
> +
> +		/* First, append the periodic OA samples having lower
> +		 * timestamp values
> +		 */
> +		ret = dev_priv->perf.oa.ops.read(stream, buf, count, offset,
> +						 sample_ts);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	if (sample_flags & SAMPLE_OA_SOURCE)
> +		data.source = I915_PERF_SAMPLE_OA_SOURCE_CS;
> +
> +	if (sample_flags & SAMPLE_CTX_ID)
> +		data.ctx_id = node->ctx_id;
> +
> +	return append_perf_sample(stream, buf, count, offset, &data);
>   }
>   
>   /**
> - * i915_oa_wait_unlocked - handles blocking IO until OA data available
> + * append_cs_buffer_samples: Copies all command stream based perf samples
> + * into userspace read() buffer.
> + * @stream: An i915-perf stream opened for perf CS metrics
> + * @buf: destination buffer given by userspace
> + * @count: the number of bytes userspace wants to read
> + * @offset: (inout): the current position for writing into @buf
> + *
> + * Notably any error condition resulting in a short read (-%ENOSPC or
> + * -%EFAULT) will be returned even though one or more records may
> + * have been successfully copied. In this case it's up to the caller
> + * to decide if the error should be squashed before returning to
> + * userspace.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +static int append_cs_buffer_samples(struct i915_perf_stream *stream,
> +				char __user *buf,
> +				size_t count,
> +				size_t *offset)
> +{
> +	struct i915_perf_cs_sample *entry, *next;
> +	LIST_HEAD(free_list);
> +	int ret = 0;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&stream->cs_samples_lock, flags);
> +	if (list_empty(&stream->cs_samples)) {
> +		spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> +		return 0;
> +	}
> +	list_for_each_entry_safe(entry, next,
> +				 &stream->cs_samples, link) {
> +		if (!i915_gem_request_completed(entry->request))
> +			break;
> +		list_move_tail(&entry->link, &free_list);
> +	}
> +	spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> +
> +	if (list_empty(&free_list))
> +		return 0;
> +
> +	list_for_each_entry_safe(entry, next, &free_list, link) {
> +		ret = append_cs_buffer_sample(stream, buf, count, offset,
> +					      entry);
> +		if (ret)
> +			break;
> +
> +		list_del(&entry->link);
> +		i915_gem_request_put(entry->request);
> +		kfree(entry);
> +	}
> +
> +	/* Don't discard remaining entries, keep them for next read */
> +	spin_lock_irqsave(&stream->cs_samples_lock, flags);
> +	list_splice(&free_list, &stream->cs_samples);
> +	spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> +
> +	return ret;
> +}
> +
> +/*
> + * cs_buffer_is_empty - Checks whether the command stream buffer
> + * associated with the stream has data available.
>    * @stream: An i915-perf stream opened for OA metrics
>    *
> + * Returns: true if atleast one request associated with command stream is
> + * completed, else returns false.
> + */
> +static bool cs_buffer_is_empty(struct i915_perf_stream *stream)
> +
> +{
> +	struct i915_perf_cs_sample *entry = NULL;
> +	struct drm_i915_gem_request *request = NULL;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&stream->cs_samples_lock, flags);
> +	entry = list_first_entry_or_null(&stream->cs_samples,
> +			struct i915_perf_cs_sample, link);
> +	if (entry)
> +		request = entry->request;
> +	spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> +
> +	if (!entry)
> +		return true;
> +	else if (!i915_gem_request_completed(request))
> +		return true;
> +	else
> +		return false;
> +}
> +
> +/**
> + * stream_have_data_unlocked - Checks whether the stream has data available
> + * @stream: An i915-perf stream opened for OA metrics
> + *
> + * For command stream based streams, check if the command stream buffer has
> + * atleast one sample available, if not return false, irrespective of periodic
> + * oa buffer having the data or not.
> + */
> +
> +static bool stream_have_data_unlocked(struct i915_perf_stream *stream)
> +{
> +	struct drm_i915_private *dev_priv = stream->dev_priv;
> +
> +	if (stream->cs_mode)
> +		return !cs_buffer_is_empty(stream);
> +	else
> +		return oa_buffer_check_unlocked(dev_priv);
> +}
> +
> +/**
> + * i915_perf_stream_wait_unlocked - handles blocking IO until data available
> + * @stream: An i915-perf stream opened for GPU metrics
> + *
>    * Called when userspace tries to read() from a blocking stream FD opened
> - * for OA metrics. It waits until the hrtimer callback finds a non-empty
> - * OA buffer and wakes us.
> + * for perf metrics. It waits until the hrtimer callback finds a non-empty
> + * command stream buffer / OA buffer and wakes us.
>    *
>    * Note: it's acceptable to have this return with some false positives
>    * since any subsequent read handling will return -EAGAIN if there isn't
> @@ -1114,7 +1621,7 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
>    *
>    * Returns: zero on success or a negative error code
>    */
> -static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
> +static int i915_perf_stream_wait_unlocked(struct i915_perf_stream *stream)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   
> @@ -1122,32 +1629,47 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
>   	if (!dev_priv->perf.oa.periodic)
>   		return -EIO;
>   
> -	return wait_event_interruptible(dev_priv->perf.oa.poll_wq,
> -					oa_buffer_check_unlocked(dev_priv));
> +	if (stream->cs_mode) {
> +		long int ret;
> +
> +		/* Wait for the all sampled requests. */
> +		ret = reservation_object_wait_timeout_rcu(
> +						    stream->cs_buffer.vma->resv,
> +						    true,
> +						    true,
> +						    MAX_SCHEDULE_TIMEOUT);
> +		if (unlikely(ret < 0)) {
> +			DRM_DEBUG_DRIVER("Failed to wait for sampled requests: %li\n", ret);
> +			return ret;
> +		}
> +	}
> +
> +	return wait_event_interruptible(stream->poll_wq,
> +					stream_have_data_unlocked(stream));
>   }
>   
>   /**
> - * i915_oa_poll_wait - call poll_wait() for an OA stream poll()
> - * @stream: An i915-perf stream opened for OA metrics
> + * i915_perf_stream_poll_wait - call poll_wait() for an stream poll()
> + * @stream: An i915-perf stream opened for GPU metrics
>    * @file: An i915 perf stream file
>    * @wait: poll() state table
>    *
> - * For handling userspace polling on an i915 perf stream opened for OA metrics,
> + * For handling userspace polling on an i915 perf stream opened for metrics,
>    * this starts a poll_wait with the wait queue that our hrtimer callback wakes
> - * when it sees data ready to read in the circular OA buffer.
> + * when it sees data ready to read either in command stream buffer or in the
> + * circular OA buffer.
>    */
> -static void i915_oa_poll_wait(struct i915_perf_stream *stream,
> +static void i915_perf_stream_poll_wait(struct i915_perf_stream *stream,
>   			      struct file *file,
>   			      poll_table *wait)
>   {
> -	struct drm_i915_private *dev_priv = stream->dev_priv;
> -
> -	poll_wait(file, &dev_priv->perf.oa.poll_wq, wait);
> +	poll_wait(file, &stream->poll_wq, wait);
>   }
>   
>   /**
> - * i915_oa_read - just calls through to &i915_oa_ops->read
> - * @stream: An i915-perf stream opened for OA metrics
> + * i915_perf_stream_read - Reads perf metrics available into userspace read
> + * buffer
> + * @stream: An i915-perf stream opened for GPU metrics
>    * @buf: destination buffer given by userspace
>    * @count: the number of bytes userspace wants to read
>    * @offset: (inout): the current position for writing into @buf
> @@ -1157,14 +1679,21 @@ static void i915_oa_poll_wait(struct i915_perf_stream *stream,
>    *
>    * Returns: zero on success or a negative error code
>    */
> -static int i915_oa_read(struct i915_perf_stream *stream,
> +static int i915_perf_stream_read(struct i915_perf_stream *stream,
>   			char __user *buf,
>   			size_t count,
>   			size_t *offset)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   
> -	return dev_priv->perf.oa.ops.read(stream, buf, count, offset);
> +
> +	if (stream->cs_mode)
> +		return append_cs_buffer_samples(stream, buf, count, offset);
> +	else if (stream->sample_flags & SAMPLE_OA_REPORT)
> +		return dev_priv->perf.oa.ops.read(stream, buf, count, offset,
> +						U32_MAX);
> +	else
> +		return -EINVAL;
>   }
>   
>   /**
> @@ -1182,7 +1711,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   
>   	if (i915.enable_execlists)
> -		dev_priv->perf.oa.specific_ctx_id = stream->ctx->hw_id;
> +		stream->engine->specific_ctx_id = stream->ctx->hw_id;
>   	else {
>   		struct intel_engine_cs *engine = dev_priv->engine[RCS];
>   		struct intel_ring *ring;
> @@ -1209,7 +1738,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>   		 * i915_ggtt_offset() on the fly) considering the difference
>   		 * with gen8+ and execlists
>   		 */
> -		dev_priv->perf.oa.specific_ctx_id =
> +		stream->engine->specific_ctx_id =
>   			i915_ggtt_offset(stream->ctx->engine[engine->id].state);
>   	}
>   
> @@ -1228,13 +1757,13 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   
>   	if (i915.enable_execlists) {
> -		dev_priv->perf.oa.specific_ctx_id = INVALID_CTX_ID;
> +		stream->engine->specific_ctx_id = INVALID_CTX_ID;
>   	} else {
>   		struct intel_engine_cs *engine = dev_priv->engine[RCS];
>   
>   		mutex_lock(&dev_priv->drm.struct_mutex);
>   
> -		dev_priv->perf.oa.specific_ctx_id = INVALID_CTX_ID;
> +		stream->engine->specific_ctx_id = INVALID_CTX_ID;
>   		engine->context_unpin(engine, stream->ctx);
>   
>   		mutex_unlock(&dev_priv->drm.struct_mutex);
> @@ -1242,13 +1771,28 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)
>   }
>   
>   static void
> +free_cs_buffer(struct i915_perf_stream *stream)
> +{
> +	struct drm_i915_private *dev_priv = stream->dev_priv;
> +
> +	mutex_lock(&dev_priv->drm.struct_mutex);
> +
> +	i915_gem_object_unpin_map(stream->cs_buffer.vma->obj);
> +	i915_vma_unpin_and_release(&stream->cs_buffer.vma);
> +
> +	stream->cs_buffer.vma = NULL;
> +	stream->cs_buffer.vaddr = NULL;
> +
> +	mutex_unlock(&dev_priv->drm.struct_mutex);
> +}
> +
> +static void
>   free_oa_buffer(struct drm_i915_private *i915)
>   {
>   	mutex_lock(&i915->drm.struct_mutex);
>   
>   	i915_gem_object_unpin_map(i915->perf.oa.oa_buffer.vma->obj);
> -	i915_vma_unpin(i915->perf.oa.oa_buffer.vma);
> -	i915_gem_object_put(i915->perf.oa.oa_buffer.vma->obj);
> +	i915_vma_unpin_and_release(&i915->perf.oa.oa_buffer.vma);
>   
>   	i915->perf.oa.oa_buffer.vma = NULL;
>   	i915->perf.oa.oa_buffer.vaddr = NULL;
> @@ -1256,27 +1800,41 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)
>   	mutex_unlock(&i915->drm.struct_mutex);
>   }
>   
> -static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
> +static void i915_perf_stream_destroy(struct i915_perf_stream *stream)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
> -
> -	BUG_ON(stream != dev_priv->perf.oa.exclusive_stream);
> +	struct intel_engine_cs *engine = stream->engine;
> +	struct i915_perf_stream *engine_stream;
> +	int idx;
> +
> +	idx = srcu_read_lock(&engine->perf_srcu);
> +	engine_stream = srcu_dereference(engine->exclusive_stream,
> +					 &engine->perf_srcu);
> +	if (WARN_ON(stream != engine_stream))
> +		return;
> +	srcu_read_unlock(&engine->perf_srcu, idx);
>   
>   	/*
>   	 * Unset exclusive_stream first, it might be checked while
>   	 * disabling the metric set on gen8+.
>   	 */
> -	dev_priv->perf.oa.exclusive_stream = NULL;
> +	rcu_assign_pointer(stream->engine->exclusive_stream, NULL);
> +	synchronize_srcu(&stream->engine->perf_srcu);
>   
> -	dev_priv->perf.oa.ops.disable_metric_set(dev_priv);
> +	if (stream->using_oa) {
> +		dev_priv->perf.oa.ops.disable_metric_set(dev_priv);
>   
> -	free_oa_buffer(dev_priv);
> +		free_oa_buffer(dev_priv);
>   
> -	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> -	intel_runtime_pm_put(dev_priv);
> +		intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> +		intel_runtime_pm_put(dev_priv);
>   
> -	if (stream->ctx)
> -		oa_put_render_ctx_id(stream);
> +		if (stream->ctx)
> +			oa_put_render_ctx_id(stream);
> +	}
> +
> +	if (stream->cs_mode)
> +		free_cs_buffer(stream);
>   
>   	if (dev_priv->perf.oa.spurious_report_rs.missed) {
>   		DRM_NOTE("%d spurious OA report notices suppressed due to ratelimiting\n",
> @@ -1325,11 +1883,6 @@ static void gen7_init_oa_buffer(struct drm_i915_private *dev_priv)
>   	 * memory...
>   	 */
>   	memset(dev_priv->perf.oa.oa_buffer.vaddr, 0, OA_BUFFER_SIZE);
> -
> -	/* Maybe make ->pollin per-stream state if we support multiple
> -	 * concurrent streams in the future.
> -	 */
> -	dev_priv->perf.oa.pollin = false;
>   }
>   
>   static void gen8_init_oa_buffer(struct drm_i915_private *dev_priv)
> @@ -1383,33 +1936,26 @@ static void gen8_init_oa_buffer(struct drm_i915_private *dev_priv)
>   	 * memory...
>   	 */
>   	memset(dev_priv->perf.oa.oa_buffer.vaddr, 0, OA_BUFFER_SIZE);
> -
> -	/*
> -	 * Maybe make ->pollin per-stream state if we support multiple
> -	 * concurrent streams in the future.
> -	 */
> -	dev_priv->perf.oa.pollin = false;
>   }
>   
> -static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
> +static int alloc_obj(struct drm_i915_private *dev_priv,
> +		     struct i915_vma **vma, u8 **vaddr)
>   {
>   	struct drm_i915_gem_object *bo;
> -	struct i915_vma *vma;
>   	int ret;
>   
> -	if (WARN_ON(dev_priv->perf.oa.oa_buffer.vma))
> -		return -ENODEV;
> +	intel_runtime_pm_get(dev_priv);
>   
>   	ret = i915_mutex_lock_interruptible(&dev_priv->drm);
>   	if (ret)
> -		return ret;
> +		goto out;
>   
>   	BUILD_BUG_ON_NOT_POWER_OF_2(OA_BUFFER_SIZE);
>   	BUILD_BUG_ON(OA_BUFFER_SIZE < SZ_128K || OA_BUFFER_SIZE > SZ_16M);
>   
>   	bo = i915_gem_object_create(dev_priv, OA_BUFFER_SIZE);
>   	if (IS_ERR(bo)) {
> -		DRM_ERROR("Failed to allocate OA buffer\n");
> +		DRM_ERROR("Failed to allocate i915 perf obj\n");
>   		ret = PTR_ERR(bo);
>   		goto unlock;
>   	}
> @@ -1419,42 +1965,83 @@ static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
>   		goto err_unref;
>   
>   	/* PreHSW required 512K alignment, HSW requires 16M */
> -	vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);
> -	if (IS_ERR(vma)) {
> -		ret = PTR_ERR(vma);
> +	*vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);
> +	if (IS_ERR(*vma)) {
> +		ret = PTR_ERR(*vma);
>   		goto err_unref;
>   	}
> -	dev_priv->perf.oa.oa_buffer.vma = vma;
>   
> -	dev_priv->perf.oa.oa_buffer.vaddr =
> -		i915_gem_object_pin_map(bo, I915_MAP_WB);
> -	if (IS_ERR(dev_priv->perf.oa.oa_buffer.vaddr)) {
> -		ret = PTR_ERR(dev_priv->perf.oa.oa_buffer.vaddr);
> +	*vaddr = i915_gem_object_pin_map(bo, I915_MAP_WB);
> +	if (IS_ERR(*vaddr)) {
> +		ret = PTR_ERR(*vaddr);
>   		goto err_unpin;
>   	}
>   
> -	dev_priv->perf.oa.ops.init_oa_buffer(dev_priv);
> -
> -	DRM_DEBUG_DRIVER("OA Buffer initialized, gtt offset = 0x%x, vaddr = %p\n",
> -			 i915_ggtt_offset(dev_priv->perf.oa.oa_buffer.vma),
> -			 dev_priv->perf.oa.oa_buffer.vaddr);
> -
>   	goto unlock;
>   
>   err_unpin:
> -	__i915_vma_unpin(vma);
> +	i915_vma_unpin(*vma);
>   
>   err_unref:
>   	i915_gem_object_put(bo);
>   
> -	dev_priv->perf.oa.oa_buffer.vaddr = NULL;
> -	dev_priv->perf.oa.oa_buffer.vma = NULL;
> -
>   unlock:
>   	mutex_unlock(&dev_priv->drm.struct_mutex);
> +out:
> +	intel_runtime_pm_put(dev_priv);
>   	return ret;
>   }
>   
> +static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
> +{
> +	struct i915_vma *vma;
> +	u8 *vaddr;
> +	int ret;
> +
> +	if (WARN_ON(dev_priv->perf.oa.oa_buffer.vma))
> +		return -ENODEV;
> +
> +	ret = alloc_obj(dev_priv, &vma, &vaddr);
> +	if (ret)
> +		return ret;
> +
> +	dev_priv->perf.oa.oa_buffer.vma = vma;
> +	dev_priv->perf.oa.oa_buffer.vaddr = vaddr;
> +
> +	dev_priv->perf.oa.ops.init_oa_buffer(dev_priv);
> +
> +	DRM_DEBUG_DRIVER("OA Buffer initialized, gtt offset = 0x%x, vaddr = %p",
> +			 i915_ggtt_offset(dev_priv->perf.oa.oa_buffer.vma),
> +			 dev_priv->perf.oa.oa_buffer.vaddr);
> +	return 0;
> +}
> +
> +static int alloc_cs_buffer(struct i915_perf_stream *stream)
> +{
> +	struct drm_i915_private *dev_priv = stream->dev_priv;
> +	struct i915_vma *vma;
> +	u8 *vaddr;
> +	int ret;
> +
> +	if (WARN_ON(stream->cs_buffer.vma))
> +		return -ENODEV;
> +
> +	ret = alloc_obj(dev_priv, &vma, &vaddr);
> +	if (ret)
> +		return ret;
> +
> +	stream->cs_buffer.vma = vma;
> +	stream->cs_buffer.vaddr = vaddr;
> +	if (WARN_ON(!list_empty(&stream->cs_samples)))
> +		INIT_LIST_HEAD(&stream->cs_samples);
> +
> +	DRM_DEBUG_DRIVER("Command stream buf initialized, gtt offset = 0x%x, vaddr = %p",
> +			 i915_ggtt_offset(stream->cs_buffer.vma),
> +			 stream->cs_buffer.vaddr);
> +
> +	return 0;
> +}
> +
>   static void config_oa_regs(struct drm_i915_private *dev_priv,
>   			   const struct i915_oa_reg *regs,
>   			   int n_regs)
> @@ -1859,6 +2446,10 @@ static void gen8_disable_metric_set(struct drm_i915_private *dev_priv)
>   
>   static void gen7_oa_enable(struct drm_i915_private *dev_priv)
>   {
> +	struct i915_perf_stream *stream;
> +	struct intel_engine_cs *engine = dev_priv->engine[RCS];
> +	int idx;
> +
>   	/*
>   	 * Reset buf pointers so we don't forward reports from before now.
>   	 *
> @@ -1870,11 +2461,11 @@ static void gen7_oa_enable(struct drm_i915_private *dev_priv)
>   	 */
>   	gen7_init_oa_buffer(dev_priv);
>   
> -	if (dev_priv->perf.oa.exclusive_stream->enabled) {
> -		struct i915_gem_context *ctx =
> -			dev_priv->perf.oa.exclusive_stream->ctx;
> -		u32 ctx_id = dev_priv->perf.oa.specific_ctx_id;
> -
> +	idx = srcu_read_lock(&engine->perf_srcu);
> +	stream = srcu_dereference(engine->exclusive_stream, &engine->perf_srcu);
> +	if (stream->state != I915_PERF_STREAM_DISABLED) {
> +		struct i915_gem_context *ctx = stream->ctx;
> +		u32 ctx_id = engine->specific_ctx_id;
>   		bool periodic = dev_priv->perf.oa.periodic;
>   		u32 period_exponent = dev_priv->perf.oa.period_exponent;
>   		u32 report_format = dev_priv->perf.oa.oa_buffer.format;
> @@ -1889,6 +2480,7 @@ static void gen7_oa_enable(struct drm_i915_private *dev_priv)
>   			   GEN7_OACONTROL_ENABLE);
>   	} else
>   		I915_WRITE(GEN7_OACONTROL, 0);
> +	srcu_read_unlock(&engine->perf_srcu, idx);
>   }
>   
>   static void gen8_oa_enable(struct drm_i915_private *dev_priv)
> @@ -1917,22 +2509,23 @@ static void gen8_oa_enable(struct drm_i915_private *dev_priv)
>   }
>   
>   /**
> - * i915_oa_stream_enable - handle `I915_PERF_IOCTL_ENABLE` for OA stream
> - * @stream: An i915 perf stream opened for OA metrics
> + * i915_perf_stream_enable - handle `I915_PERF_IOCTL_ENABLE` for perf stream
> + * @stream: An i915 perf stream opened for GPU metrics
>    *
>    * [Re]enables hardware periodic sampling according to the period configured
>    * when opening the stream. This also starts a hrtimer that will periodically
>    * check for data in the circular OA buffer for notifying userspace (e.g.
>    * during a read() or poll()).
>    */
> -static void i915_oa_stream_enable(struct i915_perf_stream *stream)
> +static void i915_perf_stream_enable(struct i915_perf_stream *stream)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   
> -	dev_priv->perf.oa.ops.oa_enable(dev_priv);
> +	if (stream->sample_flags & SAMPLE_OA_REPORT)
> +		dev_priv->perf.oa.ops.oa_enable(dev_priv);
>   
> -	if (dev_priv->perf.oa.periodic)
> -		hrtimer_start(&dev_priv->perf.oa.poll_check_timer,
> +	if (stream->cs_mode || dev_priv->perf.oa.periodic)
> +		hrtimer_start(&dev_priv->perf.poll_check_timer,
>   			      ns_to_ktime(POLL_PERIOD),
>   			      HRTIMER_MODE_REL_PINNED);
>   }
> @@ -1948,34 +2541,39 @@ static void gen8_oa_disable(struct drm_i915_private *dev_priv)
>   }
>   
>   /**
> - * i915_oa_stream_disable - handle `I915_PERF_IOCTL_DISABLE` for OA stream
> - * @stream: An i915 perf stream opened for OA metrics
> + * i915_perf_stream_disable - handle `I915_PERF_IOCTL_DISABLE` for perf stream
> + * @stream: An i915 perf stream opened for GPU metrics
>    *
>    * Stops the OA unit from periodically writing counter reports into the
>    * circular OA buffer. This also stops the hrtimer that periodically checks for
>    * data in the circular OA buffer, for notifying userspace.
>    */
> -static void i915_oa_stream_disable(struct i915_perf_stream *stream)
> +static void i915_perf_stream_disable(struct i915_perf_stream *stream)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   
> -	dev_priv->perf.oa.ops.oa_disable(dev_priv);
> +	if (stream->cs_mode || dev_priv->perf.oa.periodic)
> +		hrtimer_cancel(&dev_priv->perf.poll_check_timer);
> +
> +	if (stream->cs_mode)
> +		i915_perf_stream_release_samples(stream);
>   
> -	if (dev_priv->perf.oa.periodic)
> -		hrtimer_cancel(&dev_priv->perf.oa.poll_check_timer);
> +	if (stream->sample_flags & SAMPLE_OA_REPORT)
> +		dev_priv->perf.oa.ops.oa_disable(dev_priv);
>   }
>   
> -static const struct i915_perf_stream_ops i915_oa_stream_ops = {
> -	.destroy = i915_oa_stream_destroy,
> -	.enable = i915_oa_stream_enable,
> -	.disable = i915_oa_stream_disable,
> -	.wait_unlocked = i915_oa_wait_unlocked,
> -	.poll_wait = i915_oa_poll_wait,
> -	.read = i915_oa_read,
> +static const struct i915_perf_stream_ops perf_stream_ops = {
> +	.destroy = i915_perf_stream_destroy,
> +	.enable = i915_perf_stream_enable,
> +	.disable = i915_perf_stream_disable,
> +	.wait_unlocked = i915_perf_stream_wait_unlocked,
> +	.poll_wait = i915_perf_stream_poll_wait,
> +	.read = i915_perf_stream_read,
> +	.emit_sample_capture = i915_perf_stream_emit_sample_capture,
>   };
>   
>   /**
> - * i915_oa_stream_init - validate combined props for OA stream and init
> + * i915_perf_stream_init - validate combined props for stream and init
>    * @stream: An i915 perf stream
>    * @param: The open parameters passed to `DRM_I915_PERF_OPEN`
>    * @props: The property state that configures stream (individually validated)
> @@ -1984,58 +2582,35 @@ static void i915_oa_stream_disable(struct i915_perf_stream *stream)
>    * doesn't ensure that the combination necessarily makes sense.
>    *
>    * At this point it has been determined that userspace wants a stream of
> - * OA metrics, but still we need to further validate the combined
> + * perf metrics, but still we need to further validate the combined
>    * properties are OK.
>    *
>    * If the configuration makes sense then we can allocate memory for
> - * a circular OA buffer and apply the requested metric set configuration.
> + * a circular perf buffer and apply the requested metric set configuration.
>    *
>    * Returns: zero on success or a negative error code.
>    */
> -static int i915_oa_stream_init(struct i915_perf_stream *stream,
> +static int i915_perf_stream_init(struct i915_perf_stream *stream,
>   			       struct drm_i915_perf_open_param *param,
>   			       struct perf_open_properties *props)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
> -	int format_size;
> +	bool require_oa_unit = props->sample_flags & (SAMPLE_OA_REPORT |
> +						      SAMPLE_OA_SOURCE);
> +	bool cs_sample_data = props->sample_flags & SAMPLE_OA_REPORT;
> +	struct i915_perf_stream *curr_stream;
> +	struct intel_engine_cs *engine = NULL;
> +	int idx;
>   	int ret;
>   
> -	/* If the sysfs metrics/ directory wasn't registered for some
> -	 * reason then don't let userspace try their luck with config
> -	 * IDs
> -	 */
> -	if (!dev_priv->perf.metrics_kobj) {
> -		DRM_DEBUG("OA metrics weren't advertised via sysfs\n");
> -		return -EINVAL;
> -	}
> -
> -	if (!(props->sample_flags & SAMPLE_OA_REPORT)) {
> -		DRM_DEBUG("Only OA report sampling supported\n");
> -		return -EINVAL;
> -	}
> -
> -	if (!dev_priv->perf.oa.ops.init_oa_buffer) {
> -		DRM_DEBUG("OA unit not supported\n");
> -		return -ENODEV;
> -	}
> -
> -	/* To avoid the complexity of having to accurately filter
> -	 * counter reports and marshal to the appropriate client
> -	 * we currently only allow exclusive access
> -	 */
> -	if (dev_priv->perf.oa.exclusive_stream) {
> -		DRM_DEBUG("OA unit already in use\n");
> -		return -EBUSY;
> -	}
> -
> -	if (!props->metrics_set) {
> -		DRM_DEBUG("OA metric set not specified\n");
> -		return -EINVAL;
> -	}
> -
> -	if (!props->oa_format) {
> -		DRM_DEBUG("OA report format not specified\n");
> -		return -EINVAL;
> +	if ((props->sample_flags & SAMPLE_CTX_ID) && !props->cs_mode) {
> +		if (IS_HASWELL(dev_priv)) {
> +			DRM_ERROR("On HSW, context ID sampling only supported via command stream\n");
> +			return -EINVAL;
> +		} else if (!i915.enable_execlists) {
> +			DRM_ERROR("On Gen8+ without execlists, context ID sampling only supported via command stream\n");
> +			return -EINVAL;
> +		}
>   	}
>   
>   	/* We set up some ratelimit state to potentially throttle any _NOTES
> @@ -2060,70 +2635,167 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
>   
>   	stream->sample_size = sizeof(struct drm_i915_perf_record_header);
>   
> -	format_size = dev_priv->perf.oa.oa_formats[props->oa_format].size;
> +	if (require_oa_unit) {
> +		int format_size;
>   
> -	stream->sample_flags |= SAMPLE_OA_REPORT;
> -	stream->sample_size += format_size;
> +		/* If the sysfs metrics/ directory wasn't registered for some
> +		 * reason then don't let userspace try their luck with config
> +		 * IDs
> +		 */
> +		if (!dev_priv->perf.metrics_kobj) {
> +			DRM_DEBUG("OA metrics weren't advertised via sysfs\n");
> +			return -EINVAL;
> +		}
>   
> -	if (props->sample_flags & SAMPLE_OA_SOURCE) {
> -		stream->sample_flags |= SAMPLE_OA_SOURCE;
> -		stream->sample_size += 8;
> -	}
> +		if (!dev_priv->perf.oa.ops.init_oa_buffer) {
> +			DRM_DEBUG("OA unit not supported\n");
> +			return -ENODEV;
> +		}
>   
> -	dev_priv->perf.oa.oa_buffer.format_size = format_size;
> -	if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
> -		return -EINVAL;
> +		if (!props->metrics_set) {
> +			DRM_DEBUG("OA metric set not specified\n");
> +			return -EINVAL;
> +		}
> +
> +		if (!props->oa_format) {
> +			DRM_DEBUG("OA report format not specified\n");
> +			return -EINVAL;
> +		}
> +
> +		if (props->cs_mode && (props->engine != RCS)) {
> +			DRM_ERROR("Command stream OA metrics only available via Render CS\n");
> +			return -EINVAL;
> +		}
> +
> +		engine = dev_priv->engine[RCS];
> +		stream->using_oa = true;
> +
> +		idx = srcu_read_lock(&engine->perf_srcu);
> +		curr_stream = srcu_dereference(engine->exclusive_stream,
> +					       &engine->perf_srcu);
> +		if (curr_stream) {
> +			DRM_ERROR("Stream already opened\n");
> +			ret = -EINVAL;
> +			goto err_enable;
> +		}
> +		srcu_read_unlock(&engine->perf_srcu, idx);
> +
> +		format_size =
> +			dev_priv->perf.oa.oa_formats[props->oa_format].size;
> +
> +		if (props->sample_flags & SAMPLE_OA_REPORT) {
> +			stream->sample_flags |= SAMPLE_OA_REPORT;
> +			stream->sample_size += format_size;
> +		}
> +
> +		if (props->sample_flags & SAMPLE_OA_SOURCE) {
> +			if (!(props->sample_flags & SAMPLE_OA_REPORT)) {
> +				DRM_ERROR("OA source type can't be sampled without OA report\n");
> +				return -EINVAL;
> +			}
> +			stream->sample_flags |= SAMPLE_OA_SOURCE;
> +			stream->sample_size += 8;
> +		}
> +
> +		dev_priv->perf.oa.oa_buffer.format_size = format_size;
> +		if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
> +			return -EINVAL;
> +
> +		dev_priv->perf.oa.oa_buffer.format =
> +			dev_priv->perf.oa.oa_formats[props->oa_format].format;
> +
> +		dev_priv->perf.oa.metrics_set = props->metrics_set;
>   
> -	dev_priv->perf.oa.oa_buffer.format =
> -		dev_priv->perf.oa.oa_formats[props->oa_format].format;
> +		dev_priv->perf.oa.periodic = props->oa_periodic;
> +		if (dev_priv->perf.oa.periodic)
> +			dev_priv->perf.oa.period_exponent =
> +				props->oa_period_exponent;
>   
> -	dev_priv->perf.oa.metrics_set = props->metrics_set;
> +		if (stream->ctx) {
> +			ret = oa_get_render_ctx_id(stream);
> +			if (ret)
> +				return ret;
> +		}
>   
> -	dev_priv->perf.oa.periodic = props->oa_periodic;
> -	if (dev_priv->perf.oa.periodic)
> -		dev_priv->perf.oa.period_exponent = props->oa_period_exponent;
> +		/* PRM - observability performance counters:
> +		 *
> +		 *   OACONTROL, performance counter enable, note:
> +		 *
> +		 *   "When this bit is set, in order to have coherent counts,
> +		 *   RC6 power state and trunk clock gating must be disabled.
> +		 *   This can be achieved by programming MMIO registers as
> +		 *   0xA094=0 and 0xA090[31]=1"
> +		 *
> +		 *   In our case we are expecting that taking pm + FORCEWAKE
> +		 *   references will effectively disable RC6.
> +		 */
> +		intel_runtime_pm_get(dev_priv);
> +		intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
>   
> -	if (stream->ctx) {
> -		ret = oa_get_render_ctx_id(stream);
> +		ret = alloc_oa_buffer(dev_priv);
>   		if (ret)
> -			return ret;
> +			goto err_oa_buf_alloc;
> +
> +		ret = dev_priv->perf.oa.ops.enable_metric_set(dev_priv);
> +		if (ret)
> +			goto err_enable;
>   	}
>   
> -	/* PRM - observability performance counters:
> -	 *
> -	 *   OACONTROL, performance counter enable, note:
> -	 *
> -	 *   "When this bit is set, in order to have coherent counts,
> -	 *   RC6 power state and trunk clock gating must be disabled.
> -	 *   This can be achieved by programming MMIO registers as
> -	 *   0xA094=0 and 0xA090[31]=1"
> -	 *
> -	 *   In our case we are expecting that taking pm + FORCEWAKE
> -	 *   references will effectively disable RC6.
> -	 */
> -	intel_runtime_pm_get(dev_priv);
> -	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
> +	if (props->sample_flags & SAMPLE_CTX_ID) {
> +		stream->sample_flags |= SAMPLE_CTX_ID;
> +		stream->sample_size += 8;
> +	}
>   
> -	ret = alloc_oa_buffer(dev_priv);
> -	if (ret)
> -		goto err_oa_buf_alloc;
> +	if (props->cs_mode) {
> +		if (!cs_sample_data) {
> +			DRM_ERROR("Stream engine given without requesting any CS data to sample\n");
> +			ret = -EINVAL;
> +			goto err_enable;
> +		}
>   
> -	ret = dev_priv->perf.oa.ops.enable_metric_set(dev_priv);
> -	if (ret)
> -		goto err_enable;
> +		if (!(props->sample_flags & SAMPLE_CTX_ID)) {
> +			DRM_ERROR("Stream engine given without requesting any CS specific property\n");
> +			ret = -EINVAL;
> +			goto err_enable;
> +		}
>   
> -	stream->ops = &i915_oa_stream_ops;
> +		engine = dev_priv->engine[props->engine];
>   
> -	dev_priv->perf.oa.exclusive_stream = stream;
> +		idx = srcu_read_lock(&engine->perf_srcu);
> +		curr_stream = srcu_dereference(engine->exclusive_stream,
> +					       &engine->perf_srcu);
> +		if (curr_stream) {
> +			DRM_ERROR("Stream already opened\n");
> +			ret = -EINVAL;
> +			goto err_enable;
> +		}
> +		srcu_read_unlock(&engine->perf_srcu, idx);
> +
> +		INIT_LIST_HEAD(&stream->cs_samples);
> +		ret = alloc_cs_buffer(stream);
> +		if (ret)
> +			goto err_enable;
> +
> +		stream->cs_mode = true;
> +	}
> +
> +	init_waitqueue_head(&stream->poll_wq);
> +	stream->pollin = false;
> +	stream->ops = &perf_stream_ops;
> +	stream->engine = engine;
> +	rcu_assign_pointer(engine->exclusive_stream, stream);
>   
>   	return 0;
>   
>   err_enable:
> -	free_oa_buffer(dev_priv);
> +	if (require_oa_unit)
> +		free_oa_buffer(dev_priv);
>   
>   err_oa_buf_alloc:
> -	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> -	intel_runtime_pm_put(dev_priv);
> +	if (require_oa_unit) {
> +		intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> +		intel_runtime_pm_put(dev_priv);
> +	}
>   	if (stream->ctx)
>   		oa_put_render_ctx_id(stream);
>   
> @@ -2219,7 +2891,7 @@ static ssize_t i915_perf_read(struct file *file,
>   	 * disabled stream as an error. In particular it might otherwise lead
>   	 * to a deadlock for blocking file descriptors...
>   	 */
> -	if (!stream->enabled)
> +	if (stream->state == I915_PERF_STREAM_DISABLED)
>   		return -EIO;
>   
>   	if (!(file->f_flags & O_NONBLOCK)) {
> @@ -2254,25 +2926,32 @@ static ssize_t i915_perf_read(struct file *file,
>   	 * effectively ensures we back off until the next hrtimer callback
>   	 * before reporting another POLLIN event.
>   	 */
> -	if (ret >= 0 || ret == -EAGAIN) {
> -		/* Maybe make ->pollin per-stream state if we support multiple
> -		 * concurrent streams in the future.
> -		 */
> -		dev_priv->perf.oa.pollin = false;
> -	}
> +	if (ret >= 0 || ret == -EAGAIN)
> +		stream->pollin = false;
>   
>   	return ret;
>   }
>   
> -static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer)
> +static enum hrtimer_restart poll_check_timer_cb(struct hrtimer *hrtimer)
>   {
> +	struct i915_perf_stream *stream;
>   	struct drm_i915_private *dev_priv =
>   		container_of(hrtimer, typeof(*dev_priv),
> -			     perf.oa.poll_check_timer);
> -
> -	if (oa_buffer_check_unlocked(dev_priv)) {
> -		dev_priv->perf.oa.pollin = true;
> -		wake_up(&dev_priv->perf.oa.poll_wq);
> +			     perf.poll_check_timer);
> +	int idx;
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +
> +	for_each_engine(engine, dev_priv, id) {
> +		idx = srcu_read_lock(&engine->perf_srcu);
> +		stream = srcu_dereference(engine->exclusive_stream,
> +					  &engine->perf_srcu);
> +		if (stream && (stream->state == I915_PERF_STREAM_ENABLED) &&
> +		    stream_have_data_unlocked(stream)) {
> +			stream->pollin = true;
> +			wake_up(&stream->poll_wq);
> +		}
> +		srcu_read_unlock(&engine->perf_srcu, idx);
>   	}
>   
>   	hrtimer_forward_now(hrtimer, ns_to_ktime(POLL_PERIOD));
> @@ -2311,7 +2990,7 @@ static unsigned int i915_perf_poll_locked(struct drm_i915_private *dev_priv,
>   	 * the hrtimer/oa_poll_check_timer_cb to notify us when there are
>   	 * samples to read.
>   	 */
> -	if (dev_priv->perf.oa.pollin)
> +	if (stream->pollin)
>   		events |= POLLIN;
>   
>   	return events;
> @@ -2355,14 +3034,16 @@ static unsigned int i915_perf_poll(struct file *file, poll_table *wait)
>    */
>   static void i915_perf_enable_locked(struct i915_perf_stream *stream)
>   {
> -	if (stream->enabled)
> +	if (stream->state != I915_PERF_STREAM_DISABLED)
>   		return;
>   
>   	/* Allow stream->ops->enable() to refer to this */
> -	stream->enabled = true;
> +	stream->state = I915_PERF_STREAM_ENABLE_IN_PROGRESS;
>   
>   	if (stream->ops->enable)
>   		stream->ops->enable(stream);
> +
> +	stream->state = I915_PERF_STREAM_ENABLED;
>   }
>   
>   /**
> @@ -2381,11 +3062,11 @@ static void i915_perf_enable_locked(struct i915_perf_stream *stream)
>    */
>   static void i915_perf_disable_locked(struct i915_perf_stream *stream)
>   {
> -	if (!stream->enabled)
> +	if (stream->state != I915_PERF_STREAM_ENABLED)
>   		return;
>   
>   	/* Allow stream->ops->disable() to refer to this */
> -	stream->enabled = false;
> +	stream->state = I915_PERF_STREAM_DISABLED;
>   
>   	if (stream->ops->disable)
>   		stream->ops->disable(stream);
> @@ -2457,14 +3138,12 @@ static long i915_perf_ioctl(struct file *file,
>    */
>   static void i915_perf_destroy_locked(struct i915_perf_stream *stream)
>   {
> -	if (stream->enabled)
> +	if (stream->state == I915_PERF_STREAM_ENABLED)
>   		i915_perf_disable_locked(stream);
>   
>   	if (stream->ops->destroy)
>   		stream->ops->destroy(stream);
>   
> -	list_del(&stream->link);
> -
>   	if (stream->ctx)
>   		i915_gem_context_put(stream->ctx);
>   
> @@ -2524,7 +3203,7 @@ static int i915_perf_release(struct inode *inode, struct file *file)
>    *
>    * In the case where userspace is interested in OA unit metrics then further
>    * config validation and stream initialization details will be handled by
> - * i915_oa_stream_init(). The code here should only validate config state that
> + * i915_perf_stream_init(). The code here should only validate config state that
>    * will be relevant to all stream types / backends.
>    *
>    * Returns: zero on success or a negative error code.
> @@ -2593,7 +3272,7 @@ static int i915_perf_release(struct inode *inode, struct file *file)
>   	stream->dev_priv = dev_priv;
>   	stream->ctx = specific_ctx;
>   
> -	ret = i915_oa_stream_init(stream, param, props);
> +	ret = i915_perf_stream_init(stream, param, props);
>   	if (ret)
>   		goto err_alloc;
>   
> @@ -2606,8 +3285,6 @@ static int i915_perf_release(struct inode *inode, struct file *file)
>   		goto err_flags;
>   	}
>   
> -	list_add(&stream->link, &dev_priv->perf.streams);
> -
>   	if (param->flags & I915_PERF_FLAG_FD_CLOEXEC)
>   		f_flags |= O_CLOEXEC;
>   	if (param->flags & I915_PERF_FLAG_FD_NONBLOCK)
> @@ -2625,7 +3302,6 @@ static int i915_perf_release(struct inode *inode, struct file *file)
>   	return stream_fd;
>   
>   err_open:
> -	list_del(&stream->link);
>   err_flags:
>   	if (stream->ops->destroy)
>   		stream->ops->destroy(stream);
> @@ -2774,6 +3450,29 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
>   		case DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE:
>   			props->sample_flags |= SAMPLE_OA_SOURCE;
>   			break;
> +		case DRM_I915_PERF_PROP_ENGINE: {
> +				unsigned int user_ring_id =
> +					value & I915_EXEC_RING_MASK;
> +				enum intel_engine_id engine;
> +
> +				if (user_ring_id > I915_USER_RINGS)
> +					return -EINVAL;
> +
> +				/* XXX: Currently only RCS is supported.
> +				 * Remove this check when support for other
> +				 * engines is added
> +				 */
> +				engine = user_ring_map[user_ring_id];
> +				if (engine != RCS)
> +					return -EINVAL;
> +
> +				props->cs_mode = true;
> +				props->engine = engine;
> +			}
> +			break;
> +		case DRM_I915_PERF_PROP_SAMPLE_CTX_ID:
> +			props->sample_flags |= SAMPLE_CTX_ID;
> +			break;
>   		case DRM_I915_PERF_PROP_MAX:
>   			MISSING_CASE(id);
>   			return -EINVAL;
> @@ -3002,6 +3701,30 @@ void i915_perf_unregister(struct drm_i915_private *dev_priv)
>   	{}
>   };
>   
> +void i915_perf_streams_mark_idle(struct drm_i915_private *dev_priv)
> +{
> +	struct intel_engine_cs *engine;
> +	struct i915_perf_stream *stream;
> +	enum intel_engine_id id;
> +	int idx;
> +
> +	for_each_engine(engine, dev_priv, id) {
> +		idx = srcu_read_lock(&engine->perf_srcu);
> +		stream = srcu_dereference(engine->exclusive_stream,
> +					  &engine->perf_srcu);
> +		if (stream && (stream->state == I915_PERF_STREAM_ENABLED) &&
> +					stream->cs_mode) {
> +			struct reservation_object *resv =
> +						stream->cs_buffer.vma->resv;
> +
> +			reservation_object_lock(resv, NULL);
> +			reservation_object_add_excl_fence(resv, NULL);
> +			reservation_object_unlock(resv);
> +		}
> +		srcu_read_unlock(&engine->perf_srcu, idx);
> +	}
> +}
> +
>   /**
>    * i915_perf_init - initialize i915-perf state on module load
>    * @dev_priv: i915 device instance
> @@ -3125,12 +3848,10 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
>   	}
>   
>   	if (dev_priv->perf.oa.n_builtin_sets) {
> -		hrtimer_init(&dev_priv->perf.oa.poll_check_timer,
> +		hrtimer_init(&dev_priv->perf.poll_check_timer,
>   				CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> -		dev_priv->perf.oa.poll_check_timer.function = oa_poll_check_timer_cb;
> -		init_waitqueue_head(&dev_priv->perf.oa.poll_wq);
> +		dev_priv->perf.poll_check_timer.function = poll_check_timer_cb;
>   
> -		INIT_LIST_HEAD(&dev_priv->perf.streams);
>   		mutex_init(&dev_priv->perf.lock);
>   		spin_lock_init(&dev_priv->perf.oa.oa_buffer.ptr_lock);
>   
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 9ab5969..1a2e843 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -317,6 +317,10 @@ int intel_engines_init(struct drm_i915_private *dev_priv)
>   			goto cleanup;
>   
>   		GEM_BUG_ON(!engine->submit_request);
> +
> +		/* Perf stream related initialization for Engine */
> +		rcu_assign_pointer(engine->exclusive_stream, NULL);
> +		init_srcu_struct(&engine->perf_srcu);
>   	}
>   
>   	return 0;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index cdf084e..4333623 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1622,6 +1622,8 @@ void intel_engine_cleanup(struct intel_engine_cs *engine)
>   
>   	intel_engine_cleanup_common(engine);
>   
> +	cleanup_srcu_struct(&engine->perf_srcu);
> +
>   	dev_priv->engine[engine->id] = NULL;
>   	kfree(engine);
>   }
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index d33c934..0ac8491 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -441,6 +441,11 @@ struct intel_engine_cs {
>   	 * certain bits to encode the command length in the header).
>   	 */
>   	u32 (*get_cmd_length_mask)(u32 cmd_header);
> +
> +	/* Global per-engine stream */
> +	struct srcu_struct perf_srcu;
> +	struct i915_perf_stream __rcu *exclusive_stream;
> +	u32 specific_ctx_id;
>   };
>   
>   static inline unsigned int
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index a1314c5..768b1a5 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1350,6 +1350,7 @@ enum drm_i915_oa_format {
>   
>   enum drm_i915_perf_sample_oa_source {
>   	I915_PERF_SAMPLE_OA_SOURCE_OABUFFER,
> +	I915_PERF_SAMPLE_OA_SOURCE_CS,
>   	I915_PERF_SAMPLE_OA_SOURCE_MAX	/* non-ABI */
>   };
>   
> @@ -1394,6 +1395,19 @@ enum drm_i915_perf_property_id {
>   	 */
>   	DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE,
>   
> +	/**
> +	 * The value of this property specifies the GPU engine for which
> +	 * the samples need to be collected. Specifying this property also
> +	 * implies the command stream based sample collection.
> +	 */
> +	DRM_I915_PERF_PROP_ENGINE,
> +
> +	/**
> +	 * The value of this property set to 1 requests inclusion of context ID
> +	 * in the perf sample data.
> +	 */
> +	DRM_I915_PERF_PROP_SAMPLE_CTX_ID,
> +
>   	DRM_I915_PERF_PROP_MAX /* non-ABI */
>   };
>   
> @@ -1460,6 +1474,7 @@ enum drm_i915_perf_record_type {
>   	 *     struct drm_i915_perf_record_header header;
>   	 *
>   	 *     { u64 source; } && DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE
> +	 *     { u64 ctx_id; } && DRM_I915_PERF_PROP_SAMPLE_CTX_ID
>   	 *     { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
>   	 * };
>   	 */


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 04/12] drm/i915: Flush periodic samples, in case of no pending CS sample requests
  2017-07-31  7:59 ` [PATCH 04/12] drm/i915: Flush periodic samples, in case of no pending CS sample requests Sagar Arun Kamble
@ 2017-07-31 16:52   ` kbuild test robot
  0 siblings, 0 replies; 34+ messages in thread
From: kbuild test robot @ 2017-07-31 16:52 UTC (permalink / raw)
  To: Sagar Arun Kamble; +Cc: intel-gfx, Sourab Gupta, kbuild-all

[-- Attachment #1: Type: text/plain, Size: 22381 bytes --]

Hi Sourab,

[auto build test WARNING on drm-intel/for-linux-next]
[also build test WARNING on next-20170731]
[cannot apply to v4.13-rc3]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Sagar-Arun-Kamble/i915-perf-support-for-command-stream-based-OA-GPU-and-workload-metrics-capture/20170731-184412
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   WARNING: convert(1) not found, for SVG to PDF conversion install ImageMagick (https://www.imagemagick.org)
   include/linux/init.h:1: warning: no structured comments found
   include/linux/mod_devicetable.h:687: warning: Excess struct/union/enum/typedef member 'ver_major' description in 'fsl_mc_device_id'
   include/linux/mod_devicetable.h:687: warning: Excess struct/union/enum/typedef member 'ver_minor' description in 'fsl_mc_device_id'
   kernel/sched/core.c:2080: warning: No description found for parameter 'rf'
   kernel/sched/core.c:2080: warning: Excess function parameter 'cookie' description in 'try_to_wake_up_local'
   include/linux/wait.h:555: warning: No description found for parameter 'wq'
   include/linux/wait.h:555: warning: Excess function parameter 'wq_head' description in 'wait_event_interruptible_hrtimeout'
   include/linux/wait.h:759: warning: No description found for parameter 'wq_head'
   include/linux/wait.h:759: warning: Excess function parameter 'wq' description in 'wait_event_killable'
   include/linux/kthread.h:26: warning: Excess function parameter '...' description in 'kthread_create'
   kernel/sys.c:1: warning: no structured comments found
   include/linux/device.h:968: warning: No description found for parameter 'dma_ops'
   drivers/dma-buf/seqno-fence.c:1: warning: no structured comments found
   include/linux/iio/iio.h:603: warning: No description found for parameter 'trig_readonly'
   include/linux/iio/trigger.h:151: warning: No description found for parameter 'indio_dev'
   include/linux/iio/trigger.h:151: warning: No description found for parameter 'trig'
   include/linux/device.h:969: warning: No description found for parameter 'dma_ops'
   drivers/ata/libata-eh.c:1449: warning: No description found for parameter 'link'
   drivers/ata/libata-eh.c:1449: warning: Excess function parameter 'ap' description in 'ata_eh_done'
   drivers/ata/libata-eh.c:1590: warning: No description found for parameter 'qc'
   drivers/ata/libata-eh.c:1590: warning: Excess function parameter 'dev' description in 'ata_eh_request_sense'
   drivers/mtd/nand/nand_base.c:2751: warning: Excess function parameter 'cached' description in 'nand_write_page'
   drivers/mtd/nand/nand_base.c:2751: warning: Excess function parameter 'cached' description in 'nand_write_page'
   arch/s390/include/asm/cmb.h:1: warning: no structured comments found
   drivers/scsi/scsi_lib.c:1116: warning: No description found for parameter 'rq'
   drivers/scsi/constants.c:1: warning: no structured comments found
   include/linux/usb/gadget.h:230: warning: No description found for parameter 'claimed'
   include/linux/usb/gadget.h:230: warning: No description found for parameter 'enabled'
   include/linux/usb/gadget.h:412: warning: No description found for parameter 'quirk_altset_not_supp'
   include/linux/usb/gadget.h:412: warning: No description found for parameter 'quirk_stall_not_supp'
   include/linux/usb/gadget.h:412: warning: No description found for parameter 'quirk_zlp_not_supp'
   fs/inode.c:1666: warning: No description found for parameter 'rcu'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_transaction'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_next_transaction'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_list'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_vfs_inode'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_flags'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_rsv_handle'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_reserved'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_type'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_line_no'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_start_jiffies'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_requested_credits'
   include/linux/jbd2.h:497: warning: No description found for parameter 'saved_alloc_context'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_chkpt_bhs'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_devname'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_average_commit_time'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_min_batch_time'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_max_batch_time'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_commit_callback'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_failed_commit'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_chksum_driver'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_csum_seed'
   fs/jbd2/transaction.c:511: warning: No description found for parameter 'type'
   fs/jbd2/transaction.c:511: warning: No description found for parameter 'line_no'
   fs/jbd2/transaction.c:641: warning: No description found for parameter 'gfp_mask'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'debugfs_init'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_open_object'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_close_object'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'prime_handle_to_fd'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'prime_fd_to_handle'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_export'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_import'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_pin'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_unpin'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_res_obj'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_get_sg_table'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_import_sg_table'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_vmap'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_vunmap'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_mmap'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_vm_ops'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'major'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'minor'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'patchlevel'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'name'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'desc'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'date'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'driver_features'
   drivers/gpu/drm/drm_modes.c:1623: warning: No description found for parameter 'display'
   drivers/gpu/drm/drm_modes.c:1623: warning: Excess function parameter 'connector' description in 'drm_mode_is_420_only'
>> drivers/gpu/drm/i915/i915_perf.c:684: warning: No description found for parameter 'last_ts'
>> drivers/gpu/drm/i915/i915_perf.c:684: warning: No description found for parameter 'last_ts'
>> drivers/gpu/drm/i915/i915_perf.c:684: warning: No description found for parameter 'last_ts'
>> drivers/gpu/drm/i915/i915_perf.c:684: warning: No description found for parameter 'last_ts'
>> drivers/gpu/drm/i915/i915_perf.c:684: warning: No description found for parameter 'last_ts'
>> drivers/gpu/drm/i915/i915_perf.c:684: warning: No description found for parameter 'last_ts'
>> drivers/gpu/drm/i915/i915_perf.c:684: warning: No description found for parameter 'last_ts'
>> drivers/gpu/drm/i915/i915_perf.c:684: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_drv.h:2000: warning: No description found for parameter 'emit_sample_capture'
   drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'cs_buffer'
   drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'cs_samples'
   drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'cs_samples_lock'
   drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'poll_wq'
   drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'pollin'
   drivers/gpu/drm/i915/i915_drv.h:2000: warning: No description found for parameter 'emit_sample_capture'
   drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'cs_buffer'
   drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'cs_samples'
   drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'cs_samples_lock'
   drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'poll_wq'
   drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'pollin'
>> drivers/gpu/drm/i915/i915_perf.c:684: warning: No description found for parameter 'last_ts'
>> drivers/gpu/drm/i915/i915_perf.c:684: warning: No description found for parameter 'last_ts'
>> drivers/gpu/drm/i915/i915_perf.c:684: warning: No description found for parameter 'last_ts'
>> drivers/gpu/drm/i915/i915_perf.c:684: warning: No description found for parameter 'last_ts'
>> drivers/gpu/drm/i915/i915_perf.c:684: warning: No description found for parameter 'last_ts'
>> drivers/gpu/drm/i915/i915_perf.c:684: warning: No description found for parameter 'last_ts'
>> drivers/gpu/drm/i915/i915_perf.c:684: warning: No description found for parameter 'last_ts'
>> drivers/gpu/drm/i915/i915_perf.c:684: warning: No description found for parameter 'last_ts'
>> drivers/gpu/drm/i915/i915_perf.c:684: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_drv.h:2000: warning: No description found for parameter 'emit_sample_capture'
   drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'cs_buffer'
   drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'cs_samples'
   drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'cs_samples_lock'
   drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'poll_wq'
   drivers/gpu/drm/i915/i915_drv.h:2078: warning: No description found for parameter 'pollin'
>> drivers/gpu/drm/i915/i915_perf.c:684: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:1: warning: no structured comments found
>> drivers/gpu/drm/i915/i915_perf.c:684: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:1: warning: no structured comments found
>> drivers/gpu/drm/i915/i915_perf.c:684: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:1: warning: no structured comments found

vim +/last_ts +684 drivers/gpu/drm/i915/i915_perf.c

b0aca6b4 Sourab Gupta 2017-07-31  657  
b0aca6b4 Sourab Gupta 2017-07-31  658  /**
24459f50 Sourab Gupta 2017-07-31  659   * oa_buffer_num_reports_unlocked - check for data and update tail ptr state
0dd860cf Robert Bragg 2017-05-11  660   * @dev_priv: i915 device instance
d7965152 Robert Bragg 2016-11-07  661   *
0dd860cf Robert Bragg 2017-05-11  662   * This is either called via fops (for blocking reads in user ctx) or the poll
0dd860cf Robert Bragg 2017-05-11  663   * check hrtimer (atomic ctx) to check the OA buffer tail pointer and check
0dd860cf Robert Bragg 2017-05-11  664   * if there is data available for userspace to read.
d7965152 Robert Bragg 2016-11-07  665   *
0dd860cf Robert Bragg 2017-05-11  666   * This function is central to providing a workaround for the OA unit tail
0dd860cf Robert Bragg 2017-05-11  667   * pointer having a race with respect to what data is visible to the CPU.
0dd860cf Robert Bragg 2017-05-11  668   * It is responsible for reading tail pointers from the hardware and giving
0dd860cf Robert Bragg 2017-05-11  669   * the pointers time to 'age' before they are made available for reading.
0dd860cf Robert Bragg 2017-05-11  670   * (See description of OA_TAIL_MARGIN_NSEC above for further details.)
0dd860cf Robert Bragg 2017-05-11  671   *
24459f50 Sourab Gupta 2017-07-31  672   * Besides returning num of reports when there is data available to read() it
0dd860cf Robert Bragg 2017-05-11  673   * also has the side effect of updating the oa_buffer.tails[], .aging_timestamp
0dd860cf Robert Bragg 2017-05-11  674   * and .aged_tail_idx state used for reading.
0dd860cf Robert Bragg 2017-05-11  675   *
0dd860cf Robert Bragg 2017-05-11  676   * Note: It's safe to read OA config state here unlocked, assuming that this is
0dd860cf Robert Bragg 2017-05-11  677   * only called while the stream is enabled, while the global OA configuration
0dd860cf Robert Bragg 2017-05-11  678   * can't be modified.
0dd860cf Robert Bragg 2017-05-11  679   *
24459f50 Sourab Gupta 2017-07-31  680   * Returns: number of samples available to read
d7965152 Robert Bragg 2016-11-07  681   */
24459f50 Sourab Gupta 2017-07-31  682  static u32 oa_buffer_num_reports_unlocked(
24459f50 Sourab Gupta 2017-07-31  683  			struct drm_i915_private *dev_priv, u32 *last_ts)
d7965152 Robert Bragg 2016-11-07 @684  {
d7965152 Robert Bragg 2016-11-07  685  	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
0dd860cf Robert Bragg 2017-05-11  686  	unsigned long flags;
0dd860cf Robert Bragg 2017-05-11  687  	unsigned int aged_idx;
24459f50 Sourab Gupta 2017-07-31  688  	u32 head, hw_tail, aged_tail, aging_tail, num_reports = 0;
0dd860cf Robert Bragg 2017-05-11  689  	u64 now;
0dd860cf Robert Bragg 2017-05-11  690  
0dd860cf Robert Bragg 2017-05-11  691  	/* We have to consider the (unlikely) possibility that read() errors
0dd860cf Robert Bragg 2017-05-11  692  	 * could result in an OA buffer reset which might reset the head,
0dd860cf Robert Bragg 2017-05-11  693  	 * tails[] and aged_tail state.
0dd860cf Robert Bragg 2017-05-11  694  	 */
0dd860cf Robert Bragg 2017-05-11  695  	spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags);
0dd860cf Robert Bragg 2017-05-11  696  
0dd860cf Robert Bragg 2017-05-11  697  	/* NB: The head we observe here might effectively be a little out of
0dd860cf Robert Bragg 2017-05-11  698  	 * date (between head and tails[aged_idx].offset if there is currently
0dd860cf Robert Bragg 2017-05-11  699  	 * a read() in progress.
0dd860cf Robert Bragg 2017-05-11  700  	 */
0dd860cf Robert Bragg 2017-05-11  701  	head = dev_priv->perf.oa.oa_buffer.head;
0dd860cf Robert Bragg 2017-05-11  702  
0dd860cf Robert Bragg 2017-05-11  703  	aged_idx = dev_priv->perf.oa.oa_buffer.aged_tail_idx;
0dd860cf Robert Bragg 2017-05-11  704  	aged_tail = dev_priv->perf.oa.oa_buffer.tails[aged_idx].offset;
0dd860cf Robert Bragg 2017-05-11  705  	aging_tail = dev_priv->perf.oa.oa_buffer.tails[!aged_idx].offset;
0dd860cf Robert Bragg 2017-05-11  706  
19f81df2 Robert Bragg 2017-06-13  707  	hw_tail = dev_priv->perf.oa.ops.oa_hw_tail_read(dev_priv);
0dd860cf Robert Bragg 2017-05-11  708  
0dd860cf Robert Bragg 2017-05-11  709  	/* The tail pointer increases in 64 byte increments,
0dd860cf Robert Bragg 2017-05-11  710  	 * not in report_size steps...
0dd860cf Robert Bragg 2017-05-11  711  	 */
0dd860cf Robert Bragg 2017-05-11  712  	hw_tail &= ~(report_size - 1);
0dd860cf Robert Bragg 2017-05-11  713  
0dd860cf Robert Bragg 2017-05-11  714  	now = ktime_get_mono_fast_ns();
0dd860cf Robert Bragg 2017-05-11  715  
4117ebc7 Robert Bragg 2017-05-11  716  	/* Update the aged tail
4117ebc7 Robert Bragg 2017-05-11  717  	 *
4117ebc7 Robert Bragg 2017-05-11  718  	 * Flip the tail pointer available for read()s once the aging tail is
4117ebc7 Robert Bragg 2017-05-11  719  	 * old enough to trust that the corresponding data will be visible to
4117ebc7 Robert Bragg 2017-05-11  720  	 * the CPU...
4117ebc7 Robert Bragg 2017-05-11  721  	 *
4117ebc7 Robert Bragg 2017-05-11  722  	 * Do this before updating the aging pointer in case we may be able to
4117ebc7 Robert Bragg 2017-05-11  723  	 * immediately start aging a new pointer too (if new data has become
4117ebc7 Robert Bragg 2017-05-11  724  	 * available) without needing to wait for a later hrtimer callback.
4117ebc7 Robert Bragg 2017-05-11  725  	 */
4117ebc7 Robert Bragg 2017-05-11  726  	if (aging_tail != INVALID_TAIL_PTR &&
4117ebc7 Robert Bragg 2017-05-11  727  	    ((now - dev_priv->perf.oa.oa_buffer.aging_timestamp) >
4117ebc7 Robert Bragg 2017-05-11  728  	     OA_TAIL_MARGIN_NSEC)) {
24459f50 Sourab Gupta 2017-07-31  729  		u32 mask = (OA_BUFFER_SIZE - 1);
24459f50 Sourab Gupta 2017-07-31  730  		u32 gtt_offset = i915_ggtt_offset(
24459f50 Sourab Gupta 2017-07-31  731  				dev_priv->perf.oa.oa_buffer.vma);
24459f50 Sourab Gupta 2017-07-31  732  		u32 head = (dev_priv->perf.oa.oa_buffer.head - gtt_offset)
24459f50 Sourab Gupta 2017-07-31  733  				& mask;
24459f50 Sourab Gupta 2017-07-31  734  		u8 *oa_buf_base = dev_priv->perf.oa.oa_buffer.vaddr;
24459f50 Sourab Gupta 2017-07-31  735  		u32 *report32;
19f81df2 Robert Bragg 2017-06-13  736  
4117ebc7 Robert Bragg 2017-05-11  737  		aged_idx ^= 1;
4117ebc7 Robert Bragg 2017-05-11  738  		dev_priv->perf.oa.oa_buffer.aged_tail_idx = aged_idx;
4117ebc7 Robert Bragg 2017-05-11  739  
4117ebc7 Robert Bragg 2017-05-11  740  		aged_tail = aging_tail;
4117ebc7 Robert Bragg 2017-05-11  741  
4117ebc7 Robert Bragg 2017-05-11  742  		/* Mark that we need a new pointer to start aging... */
4117ebc7 Robert Bragg 2017-05-11  743  		dev_priv->perf.oa.oa_buffer.tails[!aged_idx].offset = INVALID_TAIL_PTR;
4117ebc7 Robert Bragg 2017-05-11  744  		aging_tail = INVALID_TAIL_PTR;
24459f50 Sourab Gupta 2017-07-31  745  
24459f50 Sourab Gupta 2017-07-31  746  		num_reports = OA_TAKEN(((aged_tail - gtt_offset) & mask), head)/
24459f50 Sourab Gupta 2017-07-31  747  				report_size;
24459f50 Sourab Gupta 2017-07-31  748  
24459f50 Sourab Gupta 2017-07-31  749  		/* read the timestamp of last OA report */
24459f50 Sourab Gupta 2017-07-31  750  		head = (head + report_size*(num_reports - 1)) & mask;
24459f50 Sourab Gupta 2017-07-31  751  		report32 = (u32 *)(oa_buf_base + head);
24459f50 Sourab Gupta 2017-07-31  752  		*last_ts = report32[1];
4117ebc7 Robert Bragg 2017-05-11  753  	}
4117ebc7 Robert Bragg 2017-05-11  754  
0dd860cf Robert Bragg 2017-05-11  755  	/* Update the aging tail
0dd860cf Robert Bragg 2017-05-11  756  	 *
0dd860cf Robert Bragg 2017-05-11  757  	 * We throttle aging tail updates until we have a new tail that
0dd860cf Robert Bragg 2017-05-11  758  	 * represents >= one report more data than is already available for
0dd860cf Robert Bragg 2017-05-11  759  	 * reading. This ensures there will be enough data for a successful
0dd860cf Robert Bragg 2017-05-11  760  	 * read once this new pointer has aged and ensures we will give the new
0dd860cf Robert Bragg 2017-05-11  761  	 * pointer time to age.
0dd860cf Robert Bragg 2017-05-11  762  	 */
0dd860cf Robert Bragg 2017-05-11  763  	if (aging_tail == INVALID_TAIL_PTR &&
0dd860cf Robert Bragg 2017-05-11  764  	    (aged_tail == INVALID_TAIL_PTR ||
0dd860cf Robert Bragg 2017-05-11  765  	     OA_TAKEN(hw_tail, aged_tail) >= report_size)) {
0dd860cf Robert Bragg 2017-05-11  766  		struct i915_vma *vma = dev_priv->perf.oa.oa_buffer.vma;
0dd860cf Robert Bragg 2017-05-11  767  		u32 gtt_offset = i915_ggtt_offset(vma);
0dd860cf Robert Bragg 2017-05-11  768  
0dd860cf Robert Bragg 2017-05-11  769  		/* Be paranoid and do a bounds check on the pointer read back
0dd860cf Robert Bragg 2017-05-11  770  		 * from hardware, just in case some spurious hardware condition
0dd860cf Robert Bragg 2017-05-11  771  		 * could put the tail out of bounds...
0dd860cf Robert Bragg 2017-05-11  772  		 */
0dd860cf Robert Bragg 2017-05-11  773  		if (hw_tail >= gtt_offset &&
0dd860cf Robert Bragg 2017-05-11  774  		    hw_tail < (gtt_offset + OA_BUFFER_SIZE)) {
0dd860cf Robert Bragg 2017-05-11  775  			dev_priv->perf.oa.oa_buffer.tails[!aged_idx].offset =
0dd860cf Robert Bragg 2017-05-11  776  				aging_tail = hw_tail;
0dd860cf Robert Bragg 2017-05-11  777  			dev_priv->perf.oa.oa_buffer.aging_timestamp = now;
0dd860cf Robert Bragg 2017-05-11  778  		} else {
0dd860cf Robert Bragg 2017-05-11  779  			DRM_ERROR("Ignoring spurious out of range OA buffer tail pointer = %u\n",
0dd860cf Robert Bragg 2017-05-11  780  				  hw_tail);
0dd860cf Robert Bragg 2017-05-11  781  		}
0dd860cf Robert Bragg 2017-05-11  782  	}
0dd860cf Robert Bragg 2017-05-11  783  
0dd860cf Robert Bragg 2017-05-11  784  	spin_unlock_irqrestore(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags);
0dd860cf Robert Bragg 2017-05-11  785  
24459f50 Sourab Gupta 2017-07-31  786  	return aged_tail == INVALID_TAIL_PTR ? 0 : num_reports;
d7965152 Robert Bragg 2016-11-07  787  }
d7965152 Robert Bragg 2016-11-07  788  

:::::: The code at line 684 was first introduced by commit
:::::: d79651522e89c4ffa8992b48dfe449f0c583f809 drm/i915: Enable i915 perf stream for Haswell OA unit

:::::: TO: Robert Bragg <robert@sixbynine.org>
:::::: CC: Daniel Vetter <daniel.vetter@ffwll.ch>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6735 bytes --]

[-- Attachment #3: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 06/12] drm/i915: Populate ctx ID for periodic OA reports
  2017-07-31  7:59 ` [PATCH 06/12] drm/i915: Populate ctx ID for periodic OA reports Sagar Arun Kamble
  2017-07-31  9:27   ` Lionel Landwerlin
@ 2017-07-31 18:17   ` kbuild test robot
  1 sibling, 0 replies; 34+ messages in thread
From: kbuild test robot @ 2017-07-31 18:17 UTC (permalink / raw)
  To: Sagar Arun Kamble; +Cc: intel-gfx, Sourab Gupta, kbuild-all

[-- Attachment #1: Type: text/plain, Size: 34642 bytes --]

Hi Sourab,

[auto build test WARNING on drm-intel/for-linux-next]
[also build test WARNING on next-20170731]
[cannot apply to v4.13-rc3]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Sagar-Arun-Kamble/i915-perf-support-for-command-stream-based-OA-GPU-and-workload-metrics-capture/20170731-184412
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   WARNING: convert(1) not found, for SVG to PDF conversion install ImageMagick (https://www.imagemagick.org)
   include/linux/init.h:1: warning: no structured comments found
   include/linux/mod_devicetable.h:687: warning: Excess struct/union/enum/typedef member 'ver_major' description in 'fsl_mc_device_id'
   include/linux/mod_devicetable.h:687: warning: Excess struct/union/enum/typedef member 'ver_minor' description in 'fsl_mc_device_id'
   kernel/sched/core.c:2080: warning: No description found for parameter 'rf'
   kernel/sched/core.c:2080: warning: Excess function parameter 'cookie' description in 'try_to_wake_up_local'
   include/linux/wait.h:555: warning: No description found for parameter 'wq'
   include/linux/wait.h:555: warning: Excess function parameter 'wq_head' description in 'wait_event_interruptible_hrtimeout'
   include/linux/wait.h:759: warning: No description found for parameter 'wq_head'
   include/linux/wait.h:759: warning: Excess function parameter 'wq' description in 'wait_event_killable'
   include/linux/kthread.h:26: warning: Excess function parameter '...' description in 'kthread_create'
   kernel/sys.c:1: warning: no structured comments found
   include/linux/device.h:968: warning: No description found for parameter 'dma_ops'
   drivers/dma-buf/seqno-fence.c:1: warning: no structured comments found
   include/linux/iio/iio.h:603: warning: No description found for parameter 'trig_readonly'
   include/linux/iio/trigger.h:151: warning: No description found for parameter 'indio_dev'
   include/linux/iio/trigger.h:151: warning: No description found for parameter 'trig'
   include/linux/device.h:969: warning: No description found for parameter 'dma_ops'
   drivers/ata/libata-eh.c:1449: warning: No description found for parameter 'link'
   drivers/ata/libata-eh.c:1449: warning: Excess function parameter 'ap' description in 'ata_eh_done'
   drivers/ata/libata-eh.c:1590: warning: No description found for parameter 'qc'
   drivers/ata/libata-eh.c:1590: warning: Excess function parameter 'dev' description in 'ata_eh_request_sense'
   drivers/mtd/nand/nand_base.c:2751: warning: Excess function parameter 'cached' description in 'nand_write_page'
   drivers/mtd/nand/nand_base.c:2751: warning: Excess function parameter 'cached' description in 'nand_write_page'
   arch/s390/include/asm/cmb.h:1: warning: no structured comments found
   drivers/scsi/scsi_lib.c:1116: warning: No description found for parameter 'rq'
   drivers/scsi/constants.c:1: warning: no structured comments found
   include/linux/usb/gadget.h:230: warning: No description found for parameter 'claimed'
   include/linux/usb/gadget.h:230: warning: No description found for parameter 'enabled'
   include/linux/usb/gadget.h:412: warning: No description found for parameter 'quirk_altset_not_supp'
   include/linux/usb/gadget.h:412: warning: No description found for parameter 'quirk_stall_not_supp'
   include/linux/usb/gadget.h:412: warning: No description found for parameter 'quirk_zlp_not_supp'
   fs/inode.c:1666: warning: No description found for parameter 'rcu'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_transaction'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_next_transaction'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_list'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_vfs_inode'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_flags'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_rsv_handle'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_reserved'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_type'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_line_no'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_start_jiffies'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_requested_credits'
   include/linux/jbd2.h:497: warning: No description found for parameter 'saved_alloc_context'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_chkpt_bhs'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_devname'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_average_commit_time'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_min_batch_time'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_max_batch_time'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_commit_callback'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_failed_commit'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_chksum_driver'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_csum_seed'
   fs/jbd2/transaction.c:511: warning: No description found for parameter 'type'
   fs/jbd2/transaction.c:511: warning: No description found for parameter 'line_no'
   fs/jbd2/transaction.c:641: warning: No description found for parameter 'gfp_mask'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'debugfs_init'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_open_object'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_close_object'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'prime_handle_to_fd'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'prime_fd_to_handle'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_export'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_import'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_pin'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_unpin'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_res_obj'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_get_sg_table'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_import_sg_table'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_vmap'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_vunmap'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_mmap'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_vm_ops'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'major'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'minor'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'patchlevel'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'name'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'desc'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'date'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'driver_features'
   drivers/gpu/drm/drm_modes.c:1623: warning: No description found for parameter 'display'
   drivers/gpu/drm/drm_modes.c:1623: warning: Excess function parameter 'connector' description in 'drm_mode_is_420_only'
   drivers/gpu/drm/i915/i915_perf.c:688: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:688: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:688: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:688: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:688: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:688: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:688: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:688: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_drv.h:2000: warning: No description found for parameter 'emit_sample_capture'
   drivers/gpu/drm/i915/i915_drv.h:2082: warning: No description found for parameter 'cs_buffer'
   drivers/gpu/drm/i915/i915_drv.h:2082: warning: No description found for parameter 'cs_samples'
   drivers/gpu/drm/i915/i915_drv.h:2082: warning: No description found for parameter 'cs_samples_lock'
   drivers/gpu/drm/i915/i915_drv.h:2082: warning: No description found for parameter 'poll_wq'
   drivers/gpu/drm/i915/i915_drv.h:2082: warning: No description found for parameter 'pollin'
>> drivers/gpu/drm/i915/i915_drv.h:2082: warning: No description found for parameter 'last_ctx_id'
   drivers/gpu/drm/i915/i915_drv.h:2000: warning: No description found for parameter 'emit_sample_capture'
   drivers/gpu/drm/i915/i915_drv.h:2082: warning: No description found for parameter 'cs_buffer'
   drivers/gpu/drm/i915/i915_drv.h:2082: warning: No description found for parameter 'cs_samples'
   drivers/gpu/drm/i915/i915_drv.h:2082: warning: No description found for parameter 'cs_samples_lock'
   drivers/gpu/drm/i915/i915_drv.h:2082: warning: No description found for parameter 'poll_wq'
   drivers/gpu/drm/i915/i915_drv.h:2082: warning: No description found for parameter 'pollin'
>> drivers/gpu/drm/i915/i915_drv.h:2082: warning: No description found for parameter 'last_ctx_id'
   drivers/gpu/drm/i915/i915_perf.c:688: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:688: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:688: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:688: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:688: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:688: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:688: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:688: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:688: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_drv.h:2000: warning: No description found for parameter 'emit_sample_capture'
   drivers/gpu/drm/i915/i915_drv.h:2082: warning: No description found for parameter 'cs_buffer'
   drivers/gpu/drm/i915/i915_drv.h:2082: warning: No description found for parameter 'cs_samples'
   drivers/gpu/drm/i915/i915_drv.h:2082: warning: No description found for parameter 'cs_samples_lock'
   drivers/gpu/drm/i915/i915_drv.h:2082: warning: No description found for parameter 'poll_wq'
   drivers/gpu/drm/i915/i915_drv.h:2082: warning: No description found for parameter 'pollin'
>> drivers/gpu/drm/i915/i915_drv.h:2082: warning: No description found for parameter 'last_ctx_id'
   drivers/gpu/drm/i915/i915_perf.c:688: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:1: warning: no structured comments found
   drivers/gpu/drm/i915/i915_perf.c:688: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:1: warning: no structured comments found
   drivers/gpu/drm/i915/i915_perf.c:688: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:1: warning: no structured comments found
   drivers/gpu/drm/i915/i915_perf.c:688: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:1: warning: no structured comments found
   drivers/gpu/drm/i915/i915_perf.c:688: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:1: warning: no structured comments found
   drivers/gpu/drm/i915/i915_perf.c:688: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:1: warning: no structured comments found
   drivers/gpu/drm/i915/i915_perf.c:689: warning: No description found for parameter 'last_ts'
   drivers/gpu/host1x/bus.c:50: warning: Excess function parameter 'driver' description in 'host1x_subdev_add'
   Documentation/doc-guide/sphinx.rst:121: ERROR: Unknown target name: "sphinx c domain".
   kernel/sched/fair.c:7584: WARNING: Inline emphasis start-string without end-string.
   kernel/time/timer.c:1200: ERROR: Unexpected indentation.
   kernel/time/timer.c:1202: ERROR: Unexpected indentation.
   kernel/time/timer.c:1203: WARNING: Block quote ends without a blank line; unexpected unindent.
   include/linux/wait.h:108: WARNING: Block quote ends without a blank line; unexpected unindent.
   include/linux/wait.h:111: ERROR: Unexpected indentation.
   include/linux/wait.h:113: WARNING: Block quote ends without a blank line; unexpected unindent.
   kernel/time/hrtimer.c:991: WARNING: Block quote ends without a blank line; unexpected unindent.
   kernel/signal.c:323: WARNING: Inline literal start-string without end-string.
   kernel/rcu/tree.c:3187: ERROR: Unexpected indentation.
   kernel/rcu/tree.c:3214: ERROR: Unexpected indentation.
   kernel/rcu/tree.c:3215: WARNING: Bullet list ends without a blank line; unexpected unindent.
   include/linux/iio/iio.h:219: ERROR: Unexpected indentation.
   include/linux/iio/iio.h:220: WARNING: Block quote ends without a blank line; unexpected unindent.
   include/linux/iio/iio.h:226: WARNING: Definition list ends without a blank line; unexpected unindent.
   drivers/iio/industrialio-core.c:633: ERROR: Unknown target name: "iio_val".
   drivers/iio/industrialio-core.c:640: ERROR: Unknown target name: "iio_val".
   drivers/ata/libata-core.c:5906: ERROR: Unknown target name: "hw".
   drivers/message/fusion/mptbase.c:5051: WARNING: Definition list ends without a blank line; unexpected unindent.
   drivers/tty/serial/serial_core.c:1897: WARNING: Definition list ends without a blank line; unexpected unindent.
   drivers/pci/pci.c:3470: ERROR: Unexpected indentation.
   include/linux/regulator/driver.h:271: ERROR: Unknown target name: "regulator_regmap_x_voltage".
   include/linux/spi/spi.h:373: ERROR: Unexpected indentation.
   drivers/w1/w1_io.c:196: WARNING: Definition list ends without a blank line; unexpected unindent.
   block/bio.c:404: ERROR: Unknown target name: "gfp".
   include/drm/drm_modeset_helper_vtables.h:1182: WARNING: Bullet list ends without a blank line; unexpected unindent.
   drivers/gpu/drm/drm_scdc_helper.c:203: ERROR: Unexpected indentation.
   drivers/gpu/drm/drm_scdc_helper.c:204: WARNING: Block quote ends without a blank line; unexpected unindent.
   Documentation/gpu/todo.rst:111: ERROR: Unknown target name: "drm_fb".
   sound/soc/soc-core.c:2703: ERROR: Unknown target name: "snd_soc_daifmt".
   sound/core/jack.c:312: ERROR: Unknown target name: "snd_jack_btn".
   Documentation/media/v4l-drivers/imx.rst:: WARNING: document isn't included in any toctree
   Documentation/virtual/kvm/vcpu-requests.rst:: WARNING: document isn't included in any toctree
   Documentation/dev-tools/kselftest.rst:15: WARNING: Could not lex literal_block as "c". Highlighting skipped.
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 43: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 56: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 69: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 82: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 96: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 109: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 122: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 133: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 164: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 193: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 43: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 56: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 69: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 82: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 96: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 109: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 122: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 133: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 164: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 193: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 43: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 56: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 69: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 82: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 96: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 109: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 122: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 133: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 164: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 193: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 43: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 56: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 69: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 82: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 96: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 109: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 122: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 133: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 164: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 193: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 43: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 56: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 69: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 82: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 96: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 109: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 122: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 133: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 164: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 193: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 43: Having multiple values in <test> isn't supported and may not work as expected

vim +/last_ctx_id +2082 drivers/gpu/drm/i915/i915_drv.h

eec688e1 Robert Bragg 2016-11-07  1925  
16d98b31 Robert Bragg 2016-12-07  1926  /**
16d98b31 Robert Bragg 2016-12-07  1927   * struct i915_perf_stream_ops - the OPs to support a specific stream type
16d98b31 Robert Bragg 2016-12-07  1928   */
eec688e1 Robert Bragg 2016-11-07  1929  struct i915_perf_stream_ops {
16d98b31 Robert Bragg 2016-12-07  1930  	/**
16d98b31 Robert Bragg 2016-12-07  1931  	 * @enable: Enables the collection of HW samples, either in response to
16d98b31 Robert Bragg 2016-12-07  1932  	 * `I915_PERF_IOCTL_ENABLE` or implicitly called when stream is opened
16d98b31 Robert Bragg 2016-12-07  1933  	 * without `I915_PERF_FLAG_DISABLED`.
eec688e1 Robert Bragg 2016-11-07  1934  	 */
eec688e1 Robert Bragg 2016-11-07  1935  	void (*enable)(struct i915_perf_stream *stream);
eec688e1 Robert Bragg 2016-11-07  1936  
16d98b31 Robert Bragg 2016-12-07  1937  	/**
16d98b31 Robert Bragg 2016-12-07  1938  	 * @disable: Disables the collection of HW samples, either in response
16d98b31 Robert Bragg 2016-12-07  1939  	 * to `I915_PERF_IOCTL_DISABLE` or implicitly called before destroying
16d98b31 Robert Bragg 2016-12-07  1940  	 * the stream.
eec688e1 Robert Bragg 2016-11-07  1941  	 */
eec688e1 Robert Bragg 2016-11-07  1942  	void (*disable)(struct i915_perf_stream *stream);
eec688e1 Robert Bragg 2016-11-07  1943  
16d98b31 Robert Bragg 2016-12-07  1944  	/**
16d98b31 Robert Bragg 2016-12-07  1945  	 * @poll_wait: Call poll_wait, passing a wait queue that will be woken
eec688e1 Robert Bragg 2016-11-07  1946  	 * once there is something ready to read() for the stream
eec688e1 Robert Bragg 2016-11-07  1947  	 */
eec688e1 Robert Bragg 2016-11-07  1948  	void (*poll_wait)(struct i915_perf_stream *stream,
eec688e1 Robert Bragg 2016-11-07  1949  			  struct file *file,
eec688e1 Robert Bragg 2016-11-07  1950  			  poll_table *wait);
eec688e1 Robert Bragg 2016-11-07  1951  
16d98b31 Robert Bragg 2016-12-07  1952  	/**
16d98b31 Robert Bragg 2016-12-07  1953  	 * @wait_unlocked: For handling a blocking read, wait until there is
16d98b31 Robert Bragg 2016-12-07  1954  	 * something to ready to read() for the stream. E.g. wait on the same
d7965152 Robert Bragg 2016-11-07  1955  	 * wait queue that would be passed to poll_wait().
eec688e1 Robert Bragg 2016-11-07  1956  	 */
eec688e1 Robert Bragg 2016-11-07  1957  	int (*wait_unlocked)(struct i915_perf_stream *stream);
eec688e1 Robert Bragg 2016-11-07  1958  
16d98b31 Robert Bragg 2016-12-07  1959  	/**
16d98b31 Robert Bragg 2016-12-07  1960  	 * @read: Copy buffered metrics as records to userspace
16d98b31 Robert Bragg 2016-12-07  1961  	 * **buf**: the userspace, destination buffer
16d98b31 Robert Bragg 2016-12-07  1962  	 * **count**: the number of bytes to copy, requested by userspace
16d98b31 Robert Bragg 2016-12-07  1963  	 * **offset**: zero at the start of the read, updated as the read
16d98b31 Robert Bragg 2016-12-07  1964  	 * proceeds, it represents how many bytes have been copied so far and
16d98b31 Robert Bragg 2016-12-07  1965  	 * the buffer offset for copying the next record.
eec688e1 Robert Bragg 2016-11-07  1966  	 *
16d98b31 Robert Bragg 2016-12-07  1967  	 * Copy as many buffered i915 perf samples and records for this stream
16d98b31 Robert Bragg 2016-12-07  1968  	 * to userspace as will fit in the given buffer.
eec688e1 Robert Bragg 2016-11-07  1969  	 *
16d98b31 Robert Bragg 2016-12-07  1970  	 * Only write complete records; returning -%ENOSPC if there isn't room
16d98b31 Robert Bragg 2016-12-07  1971  	 * for a complete record.
eec688e1 Robert Bragg 2016-11-07  1972  	 *
16d98b31 Robert Bragg 2016-12-07  1973  	 * Return any error condition that results in a short read such as
16d98b31 Robert Bragg 2016-12-07  1974  	 * -%ENOSPC or -%EFAULT, even though these may be squashed before
16d98b31 Robert Bragg 2016-12-07  1975  	 * returning to userspace.
eec688e1 Robert Bragg 2016-11-07  1976  	 */
eec688e1 Robert Bragg 2016-11-07  1977  	int (*read)(struct i915_perf_stream *stream,
eec688e1 Robert Bragg 2016-11-07  1978  		    char __user *buf,
eec688e1 Robert Bragg 2016-11-07  1979  		    size_t count,
eec688e1 Robert Bragg 2016-11-07  1980  		    size_t *offset);
eec688e1 Robert Bragg 2016-11-07  1981  
16d98b31 Robert Bragg 2016-12-07  1982  	/**
16d98b31 Robert Bragg 2016-12-07  1983  	 * @destroy: Cleanup any stream specific resources.
eec688e1 Robert Bragg 2016-11-07  1984  	 *
eec688e1 Robert Bragg 2016-11-07  1985  	 * The stream will always be disabled before this is called.
eec688e1 Robert Bragg 2016-11-07  1986  	 */
eec688e1 Robert Bragg 2016-11-07  1987  	void (*destroy)(struct i915_perf_stream *stream);
b0aca6b4 Sourab Gupta 2017-07-31  1988  
b0aca6b4 Sourab Gupta 2017-07-31  1989  	/*
b0aca6b4 Sourab Gupta 2017-07-31  1990  	 * @emit_sample_capture: Emit the commands in the command streamer
b0aca6b4 Sourab Gupta 2017-07-31  1991  	 * for a particular gpu engine.
b0aca6b4 Sourab Gupta 2017-07-31  1992  	 *
b0aca6b4 Sourab Gupta 2017-07-31  1993  	 * The commands are inserted to capture the perf sample data at
b0aca6b4 Sourab Gupta 2017-07-31  1994  	 * specific points during workload execution, such as before and after
b0aca6b4 Sourab Gupta 2017-07-31  1995  	 * the batch buffer.
b0aca6b4 Sourab Gupta 2017-07-31  1996  	 */
b0aca6b4 Sourab Gupta 2017-07-31  1997  	void (*emit_sample_capture)(struct i915_perf_stream *stream,
b0aca6b4 Sourab Gupta 2017-07-31  1998  				    struct drm_i915_gem_request *request,
b0aca6b4 Sourab Gupta 2017-07-31  1999  				    bool preallocate);
b0aca6b4 Sourab Gupta 2017-07-31 @2000  };
b0aca6b4 Sourab Gupta 2017-07-31  2001  
b0aca6b4 Sourab Gupta 2017-07-31  2002  enum i915_perf_stream_state {
b0aca6b4 Sourab Gupta 2017-07-31  2003  	I915_PERF_STREAM_DISABLED,
b0aca6b4 Sourab Gupta 2017-07-31  2004  	I915_PERF_STREAM_ENABLE_IN_PROGRESS,
b0aca6b4 Sourab Gupta 2017-07-31  2005  	I915_PERF_STREAM_ENABLED,
eec688e1 Robert Bragg 2016-11-07  2006  };
eec688e1 Robert Bragg 2016-11-07  2007  
16d98b31 Robert Bragg 2016-12-07  2008  /**
16d98b31 Robert Bragg 2016-12-07  2009   * struct i915_perf_stream - state for a single open stream FD
16d98b31 Robert Bragg 2016-12-07  2010   */
eec688e1 Robert Bragg 2016-11-07  2011  struct i915_perf_stream {
16d98b31 Robert Bragg 2016-12-07  2012  	/**
16d98b31 Robert Bragg 2016-12-07  2013  	 * @dev_priv: i915 drm device
16d98b31 Robert Bragg 2016-12-07  2014  	 */
eec688e1 Robert Bragg 2016-11-07  2015  	struct drm_i915_private *dev_priv;
eec688e1 Robert Bragg 2016-11-07  2016  
16d98b31 Robert Bragg 2016-12-07  2017  	/**
b0aca6b4 Sourab Gupta 2017-07-31  2018  	 * @engine: Engine to which this stream corresponds.
16d98b31 Robert Bragg 2016-12-07  2019  	 */
b0aca6b4 Sourab Gupta 2017-07-31  2020  	struct intel_engine_cs *engine;
eec688e1 Robert Bragg 2016-11-07  2021  
16d98b31 Robert Bragg 2016-12-07  2022  	/**
16d98b31 Robert Bragg 2016-12-07  2023  	 * @sample_flags: Flags representing the `DRM_I915_PERF_PROP_SAMPLE_*`
16d98b31 Robert Bragg 2016-12-07  2024  	 * properties given when opening a stream, representing the contents
16d98b31 Robert Bragg 2016-12-07  2025  	 * of a single sample as read() by userspace.
16d98b31 Robert Bragg 2016-12-07  2026  	 */
eec688e1 Robert Bragg 2016-11-07  2027  	u32 sample_flags;
16d98b31 Robert Bragg 2016-12-07  2028  
16d98b31 Robert Bragg 2016-12-07  2029  	/**
16d98b31 Robert Bragg 2016-12-07  2030  	 * @sample_size: Considering the configured contents of a sample
16d98b31 Robert Bragg 2016-12-07  2031  	 * combined with the required header size, this is the total size
16d98b31 Robert Bragg 2016-12-07  2032  	 * of a single sample record.
16d98b31 Robert Bragg 2016-12-07  2033  	 */
d7965152 Robert Bragg 2016-11-07  2034  	int sample_size;
eec688e1 Robert Bragg 2016-11-07  2035  
16d98b31 Robert Bragg 2016-12-07  2036  	/**
16d98b31 Robert Bragg 2016-12-07  2037  	 * @ctx: %NULL if measuring system-wide across all contexts or a
16d98b31 Robert Bragg 2016-12-07  2038  	 * specific context that is being monitored.
16d98b31 Robert Bragg 2016-12-07  2039  	 */
eec688e1 Robert Bragg 2016-11-07  2040  	struct i915_gem_context *ctx;
16d98b31 Robert Bragg 2016-12-07  2041  
16d98b31 Robert Bragg 2016-12-07  2042  	/**
b0aca6b4 Sourab Gupta 2017-07-31  2043  	 * @state: Current stream state, which can be either disabled, enabled,
b0aca6b4 Sourab Gupta 2017-07-31  2044  	 * or enable_in_progress, while considering whether the stream was
b0aca6b4 Sourab Gupta 2017-07-31  2045  	 * opened in a disabled state and based on `I915_PERF_IOCTL_ENABLE` and
b0aca6b4 Sourab Gupta 2017-07-31  2046  	 * `I915_PERF_IOCTL_DISABLE` calls.
16d98b31 Robert Bragg 2016-12-07  2047  	 */
b0aca6b4 Sourab Gupta 2017-07-31  2048  	enum i915_perf_stream_state state;
b0aca6b4 Sourab Gupta 2017-07-31  2049  
b0aca6b4 Sourab Gupta 2017-07-31  2050  	/**
b0aca6b4 Sourab Gupta 2017-07-31  2051  	 * @cs_mode: Whether command stream based perf sample collection is
b0aca6b4 Sourab Gupta 2017-07-31  2052  	 * enabled for this stream
b0aca6b4 Sourab Gupta 2017-07-31  2053  	 */
b0aca6b4 Sourab Gupta 2017-07-31  2054  	bool cs_mode;
b0aca6b4 Sourab Gupta 2017-07-31  2055  
b0aca6b4 Sourab Gupta 2017-07-31  2056  	/**
b0aca6b4 Sourab Gupta 2017-07-31  2057  	 * @using_oa: Whether OA unit is in use for this particular stream
b0aca6b4 Sourab Gupta 2017-07-31  2058  	 */
b0aca6b4 Sourab Gupta 2017-07-31  2059  	bool using_oa;
eec688e1 Robert Bragg 2016-11-07  2060  
16d98b31 Robert Bragg 2016-12-07  2061  	/**
16d98b31 Robert Bragg 2016-12-07  2062  	 * @ops: The callbacks providing the implementation of this specific
16d98b31 Robert Bragg 2016-12-07  2063  	 * type of configured stream.
16d98b31 Robert Bragg 2016-12-07  2064  	 */
d7965152 Robert Bragg 2016-11-07  2065  	const struct i915_perf_stream_ops *ops;
b0aca6b4 Sourab Gupta 2017-07-31  2066  
b0aca6b4 Sourab Gupta 2017-07-31  2067  	/* Command stream based perf data buffer */
b0aca6b4 Sourab Gupta 2017-07-31  2068  	struct {
b0aca6b4 Sourab Gupta 2017-07-31  2069  		struct i915_vma *vma;
b0aca6b4 Sourab Gupta 2017-07-31  2070  		u8 *vaddr;
71fd8fc0 Sourab Gupta 2017-07-31  2071  #define I915_PERF_CMD_STREAM_BUF_STATUS_OVERFLOW (1<<0)
71fd8fc0 Sourab Gupta 2017-07-31  2072  		u32 status;
b0aca6b4 Sourab Gupta 2017-07-31  2073  	} cs_buffer;
b0aca6b4 Sourab Gupta 2017-07-31  2074  
b0aca6b4 Sourab Gupta 2017-07-31  2075  	struct list_head cs_samples;
b0aca6b4 Sourab Gupta 2017-07-31  2076  	spinlock_t cs_samples_lock;
b0aca6b4 Sourab Gupta 2017-07-31  2077  
b0aca6b4 Sourab Gupta 2017-07-31  2078  	wait_queue_head_t poll_wq;
b0aca6b4 Sourab Gupta 2017-07-31  2079  	bool pollin;
7405a923 Sourab Gupta 2017-07-31  2080  
7405a923 Sourab Gupta 2017-07-31  2081  	u32 last_ctx_id;
d7965152 Robert Bragg 2016-11-07 @2082  };
d7965152 Robert Bragg 2016-11-07  2083  

:::::: The code at line 2082 was first introduced by commit
:::::: d79651522e89c4ffa8992b48dfe449f0c583f809 drm/i915: Enable i915 perf stream for Haswell OA unit

:::::: TO: Robert Bragg <robert@sixbynine.org>
:::::: CC: Daniel Vetter <daniel.vetter@ffwll.ch>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6735 bytes --]

[-- Attachment #3: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 07/12] drm/i915: Add support for having pid output with OA report
  2017-07-31  7:59 ` [PATCH 07/12] drm/i915: Add support for having pid output with OA report Sagar Arun Kamble
@ 2017-07-31 19:24   ` kbuild test robot
  0 siblings, 0 replies; 34+ messages in thread
From: kbuild test robot @ 2017-07-31 19:24 UTC (permalink / raw)
  To: Sagar Arun Kamble; +Cc: intel-gfx, Sourab Gupta, kbuild-all

[-- Attachment #1: Type: text/plain, Size: 34982 bytes --]

Hi Sourab,

[auto build test WARNING on drm-intel/for-linux-next]
[also build test WARNING on next-20170731]
[cannot apply to v4.13-rc3]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Sagar-Arun-Kamble/i915-perf-support-for-command-stream-based-OA-GPU-and-workload-metrics-capture/20170731-184412
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   WARNING: convert(1) not found, for SVG to PDF conversion install ImageMagick (https://www.imagemagick.org)
   include/linux/init.h:1: warning: no structured comments found
   include/linux/mod_devicetable.h:687: warning: Excess struct/union/enum/typedef member 'ver_major' description in 'fsl_mc_device_id'
   include/linux/mod_devicetable.h:687: warning: Excess struct/union/enum/typedef member 'ver_minor' description in 'fsl_mc_device_id'
   kernel/sched/core.c:2080: warning: No description found for parameter 'rf'
   kernel/sched/core.c:2080: warning: Excess function parameter 'cookie' description in 'try_to_wake_up_local'
   include/linux/wait.h:555: warning: No description found for parameter 'wq'
   include/linux/wait.h:555: warning: Excess function parameter 'wq_head' description in 'wait_event_interruptible_hrtimeout'
   include/linux/wait.h:759: warning: No description found for parameter 'wq_head'
   include/linux/wait.h:759: warning: Excess function parameter 'wq' description in 'wait_event_killable'
   include/linux/kthread.h:26: warning: Excess function parameter '...' description in 'kthread_create'
   kernel/sys.c:1: warning: no structured comments found
   include/linux/device.h:968: warning: No description found for parameter 'dma_ops'
   drivers/dma-buf/seqno-fence.c:1: warning: no structured comments found
   include/linux/iio/iio.h:603: warning: No description found for parameter 'trig_readonly'
   include/linux/iio/trigger.h:151: warning: No description found for parameter 'indio_dev'
   include/linux/iio/trigger.h:151: warning: No description found for parameter 'trig'
   include/linux/device.h:969: warning: No description found for parameter 'dma_ops'
   drivers/ata/libata-eh.c:1449: warning: No description found for parameter 'link'
   drivers/ata/libata-eh.c:1449: warning: Excess function parameter 'ap' description in 'ata_eh_done'
   drivers/ata/libata-eh.c:1590: warning: No description found for parameter 'qc'
   drivers/ata/libata-eh.c:1590: warning: Excess function parameter 'dev' description in 'ata_eh_request_sense'
   drivers/mtd/nand/nand_base.c:2751: warning: Excess function parameter 'cached' description in 'nand_write_page'
   drivers/mtd/nand/nand_base.c:2751: warning: Excess function parameter 'cached' description in 'nand_write_page'
   arch/s390/include/asm/cmb.h:1: warning: no structured comments found
   drivers/scsi/scsi_lib.c:1116: warning: No description found for parameter 'rq'
   drivers/scsi/constants.c:1: warning: no structured comments found
   include/linux/usb/gadget.h:230: warning: No description found for parameter 'claimed'
   include/linux/usb/gadget.h:230: warning: No description found for parameter 'enabled'
   include/linux/usb/gadget.h:412: warning: No description found for parameter 'quirk_altset_not_supp'
   include/linux/usb/gadget.h:412: warning: No description found for parameter 'quirk_stall_not_supp'
   include/linux/usb/gadget.h:412: warning: No description found for parameter 'quirk_zlp_not_supp'
   fs/inode.c:1666: warning: No description found for parameter 'rcu'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_transaction'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_next_transaction'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_list'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_vfs_inode'
   include/linux/jbd2.h:443: warning: No description found for parameter 'i_flags'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_rsv_handle'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_reserved'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_type'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_line_no'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_start_jiffies'
   include/linux/jbd2.h:497: warning: No description found for parameter 'h_requested_credits'
   include/linux/jbd2.h:497: warning: No description found for parameter 'saved_alloc_context'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_chkpt_bhs'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_devname'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_average_commit_time'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_min_batch_time'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_max_batch_time'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_commit_callback'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_failed_commit'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_chksum_driver'
   include/linux/jbd2.h:1050: warning: No description found for parameter 'j_csum_seed'
   fs/jbd2/transaction.c:511: warning: No description found for parameter 'type'
   fs/jbd2/transaction.c:511: warning: No description found for parameter 'line_no'
   fs/jbd2/transaction.c:641: warning: No description found for parameter 'gfp_mask'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'debugfs_init'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_open_object'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_close_object'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'prime_handle_to_fd'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'prime_fd_to_handle'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_export'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_import'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_pin'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_unpin'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_res_obj'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_get_sg_table'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_import_sg_table'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_vmap'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_vunmap'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_prime_mmap'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'gem_vm_ops'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'major'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'minor'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'patchlevel'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'name'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'desc'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'date'
   include/drm/drm_drv.h:553: warning: No description found for parameter 'driver_features'
   drivers/gpu/drm/drm_modes.c:1623: warning: No description found for parameter 'display'
   drivers/gpu/drm/drm_modes.c:1623: warning: Excess function parameter 'connector' description in 'drm_mode_is_420_only'
   drivers/gpu/drm/i915/i915_perf.c:691: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:691: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:691: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:691: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:691: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:691: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:691: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:691: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_drv.h:2000: warning: No description found for parameter 'emit_sample_capture'
   drivers/gpu/drm/i915/i915_drv.h:2083: warning: No description found for parameter 'cs_buffer'
   drivers/gpu/drm/i915/i915_drv.h:2083: warning: No description found for parameter 'cs_samples'
   drivers/gpu/drm/i915/i915_drv.h:2083: warning: No description found for parameter 'cs_samples_lock'
   drivers/gpu/drm/i915/i915_drv.h:2083: warning: No description found for parameter 'poll_wq'
   drivers/gpu/drm/i915/i915_drv.h:2083: warning: No description found for parameter 'pollin'
   drivers/gpu/drm/i915/i915_drv.h:2083: warning: No description found for parameter 'last_ctx_id'
>> drivers/gpu/drm/i915/i915_drv.h:2083: warning: No description found for parameter 'last_pid'
   drivers/gpu/drm/i915/i915_drv.h:2000: warning: No description found for parameter 'emit_sample_capture'
   drivers/gpu/drm/i915/i915_drv.h:2083: warning: No description found for parameter 'cs_buffer'
   drivers/gpu/drm/i915/i915_drv.h:2083: warning: No description found for parameter 'cs_samples'
   drivers/gpu/drm/i915/i915_drv.h:2083: warning: No description found for parameter 'cs_samples_lock'
   drivers/gpu/drm/i915/i915_drv.h:2083: warning: No description found for parameter 'poll_wq'
   drivers/gpu/drm/i915/i915_drv.h:2083: warning: No description found for parameter 'pollin'
   drivers/gpu/drm/i915/i915_drv.h:2083: warning: No description found for parameter 'last_ctx_id'
>> drivers/gpu/drm/i915/i915_drv.h:2083: warning: No description found for parameter 'last_pid'
   drivers/gpu/drm/i915/i915_perf.c:691: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:691: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:691: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:691: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:691: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:691: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:691: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:691: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:691: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_drv.h:2000: warning: No description found for parameter 'emit_sample_capture'
   drivers/gpu/drm/i915/i915_drv.h:2083: warning: No description found for parameter 'cs_buffer'
   drivers/gpu/drm/i915/i915_drv.h:2083: warning: No description found for parameter 'cs_samples'
   drivers/gpu/drm/i915/i915_drv.h:2083: warning: No description found for parameter 'cs_samples_lock'
   drivers/gpu/drm/i915/i915_drv.h:2083: warning: No description found for parameter 'poll_wq'
   drivers/gpu/drm/i915/i915_drv.h:2083: warning: No description found for parameter 'pollin'
   drivers/gpu/drm/i915/i915_drv.h:2083: warning: No description found for parameter 'last_ctx_id'
>> drivers/gpu/drm/i915/i915_drv.h:2083: warning: No description found for parameter 'last_pid'
   drivers/gpu/drm/i915/i915_perf.c:691: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:1: warning: no structured comments found
   drivers/gpu/drm/i915/i915_perf.c:691: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:1: warning: no structured comments found
   drivers/gpu/drm/i915/i915_perf.c:691: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:1: warning: no structured comments found
   drivers/gpu/drm/i915/i915_perf.c:691: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:1: warning: no structured comments found
   drivers/gpu/drm/i915/i915_perf.c:691: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:1: warning: no structured comments found
   drivers/gpu/drm/i915/i915_perf.c:691: warning: No description found for parameter 'last_ts'
   drivers/gpu/drm/i915/i915_perf.c:1: warning: no structured comments found
   drivers/gpu/drm/i915/i915_perf.c:692: warning: No description found for parameter 'last_ts'
   drivers/gpu/host1x/bus.c:50: warning: Excess function parameter 'driver' description in 'host1x_subdev_add'
   Documentation/doc-guide/sphinx.rst:121: ERROR: Unknown target name: "sphinx c domain".
   kernel/sched/fair.c:7584: WARNING: Inline emphasis start-string without end-string.
   kernel/time/timer.c:1200: ERROR: Unexpected indentation.
   kernel/time/timer.c:1202: ERROR: Unexpected indentation.
   kernel/time/timer.c:1203: WARNING: Block quote ends without a blank line; unexpected unindent.
   include/linux/wait.h:108: WARNING: Block quote ends without a blank line; unexpected unindent.
   include/linux/wait.h:111: ERROR: Unexpected indentation.
   include/linux/wait.h:113: WARNING: Block quote ends without a blank line; unexpected unindent.
   kernel/time/hrtimer.c:991: WARNING: Block quote ends without a blank line; unexpected unindent.
   kernel/signal.c:323: WARNING: Inline literal start-string without end-string.
   kernel/rcu/tree.c:3187: ERROR: Unexpected indentation.
   kernel/rcu/tree.c:3214: ERROR: Unexpected indentation.
   kernel/rcu/tree.c:3215: WARNING: Bullet list ends without a blank line; unexpected unindent.
   include/linux/iio/iio.h:219: ERROR: Unexpected indentation.
   include/linux/iio/iio.h:220: WARNING: Block quote ends without a blank line; unexpected unindent.
   include/linux/iio/iio.h:226: WARNING: Definition list ends without a blank line; unexpected unindent.
   drivers/iio/industrialio-core.c:633: ERROR: Unknown target name: "iio_val".
   drivers/iio/industrialio-core.c:640: ERROR: Unknown target name: "iio_val".
   drivers/ata/libata-core.c:5906: ERROR: Unknown target name: "hw".
   drivers/message/fusion/mptbase.c:5051: WARNING: Definition list ends without a blank line; unexpected unindent.
   drivers/tty/serial/serial_core.c:1897: WARNING: Definition list ends without a blank line; unexpected unindent.
   drivers/pci/pci.c:3470: ERROR: Unexpected indentation.
   include/linux/regulator/driver.h:271: ERROR: Unknown target name: "regulator_regmap_x_voltage".
   include/linux/spi/spi.h:373: ERROR: Unexpected indentation.
   drivers/w1/w1_io.c:196: WARNING: Definition list ends without a blank line; unexpected unindent.
   block/bio.c:404: ERROR: Unknown target name: "gfp".
   include/drm/drm_modeset_helper_vtables.h:1182: WARNING: Bullet list ends without a blank line; unexpected unindent.
   drivers/gpu/drm/drm_scdc_helper.c:203: ERROR: Unexpected indentation.
   drivers/gpu/drm/drm_scdc_helper.c:204: WARNING: Block quote ends without a blank line; unexpected unindent.
   Documentation/gpu/todo.rst:111: ERROR: Unknown target name: "drm_fb".
   sound/soc/soc-core.c:2703: ERROR: Unknown target name: "snd_soc_daifmt".
   sound/core/jack.c:312: ERROR: Unknown target name: "snd_jack_btn".
   Documentation/media/v4l-drivers/imx.rst:: WARNING: document isn't included in any toctree
   Documentation/virtual/kvm/vcpu-requests.rst:: WARNING: document isn't included in any toctree
   Documentation/dev-tools/kselftest.rst:15: WARNING: Could not lex literal_block as "c". Highlighting skipped.
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 43: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 56: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 69: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 82: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 96: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 109: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 122: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 133: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 164: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 193: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 43: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 56: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 69: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 82: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 96: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 109: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 122: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 133: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 164: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 193: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 43: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 56: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 69: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 82: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 96: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 109: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 122: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 133: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 164: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 193: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 43: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 56: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 69: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 82: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 96: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 109: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 122: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 133: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 164: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 193: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 43: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 56: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 69: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 82: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 96: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 109: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 122: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 133: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 164: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 193: Having multiple values in <test> isn't supported and may not work as expected
   Fontconfig warning: "~/.fonts.conf", line 43: Having multiple values in <test> isn't supported and may not work as expected

vim +/last_pid +2083 drivers/gpu/drm/i915/i915_drv.h

eec688e1 Robert Bragg 2016-11-07  1925  
16d98b31 Robert Bragg 2016-12-07  1926  /**
16d98b31 Robert Bragg 2016-12-07  1927   * struct i915_perf_stream_ops - the OPs to support a specific stream type
16d98b31 Robert Bragg 2016-12-07  1928   */
eec688e1 Robert Bragg 2016-11-07  1929  struct i915_perf_stream_ops {
16d98b31 Robert Bragg 2016-12-07  1930  	/**
16d98b31 Robert Bragg 2016-12-07  1931  	 * @enable: Enables the collection of HW samples, either in response to
16d98b31 Robert Bragg 2016-12-07  1932  	 * `I915_PERF_IOCTL_ENABLE` or implicitly called when stream is opened
16d98b31 Robert Bragg 2016-12-07  1933  	 * without `I915_PERF_FLAG_DISABLED`.
eec688e1 Robert Bragg 2016-11-07  1934  	 */
eec688e1 Robert Bragg 2016-11-07  1935  	void (*enable)(struct i915_perf_stream *stream);
eec688e1 Robert Bragg 2016-11-07  1936  
16d98b31 Robert Bragg 2016-12-07  1937  	/**
16d98b31 Robert Bragg 2016-12-07  1938  	 * @disable: Disables the collection of HW samples, either in response
16d98b31 Robert Bragg 2016-12-07  1939  	 * to `I915_PERF_IOCTL_DISABLE` or implicitly called before destroying
16d98b31 Robert Bragg 2016-12-07  1940  	 * the stream.
eec688e1 Robert Bragg 2016-11-07  1941  	 */
eec688e1 Robert Bragg 2016-11-07  1942  	void (*disable)(struct i915_perf_stream *stream);
eec688e1 Robert Bragg 2016-11-07  1943  
16d98b31 Robert Bragg 2016-12-07  1944  	/**
16d98b31 Robert Bragg 2016-12-07  1945  	 * @poll_wait: Call poll_wait, passing a wait queue that will be woken
eec688e1 Robert Bragg 2016-11-07  1946  	 * once there is something ready to read() for the stream
eec688e1 Robert Bragg 2016-11-07  1947  	 */
eec688e1 Robert Bragg 2016-11-07  1948  	void (*poll_wait)(struct i915_perf_stream *stream,
eec688e1 Robert Bragg 2016-11-07  1949  			  struct file *file,
eec688e1 Robert Bragg 2016-11-07  1950  			  poll_table *wait);
eec688e1 Robert Bragg 2016-11-07  1951  
16d98b31 Robert Bragg 2016-12-07  1952  	/**
16d98b31 Robert Bragg 2016-12-07  1953  	 * @wait_unlocked: For handling a blocking read, wait until there is
16d98b31 Robert Bragg 2016-12-07  1954  	 * something to ready to read() for the stream. E.g. wait on the same
d7965152 Robert Bragg 2016-11-07  1955  	 * wait queue that would be passed to poll_wait().
eec688e1 Robert Bragg 2016-11-07  1956  	 */
eec688e1 Robert Bragg 2016-11-07  1957  	int (*wait_unlocked)(struct i915_perf_stream *stream);
eec688e1 Robert Bragg 2016-11-07  1958  
16d98b31 Robert Bragg 2016-12-07  1959  	/**
16d98b31 Robert Bragg 2016-12-07  1960  	 * @read: Copy buffered metrics as records to userspace
16d98b31 Robert Bragg 2016-12-07  1961  	 * **buf**: the userspace, destination buffer
16d98b31 Robert Bragg 2016-12-07  1962  	 * **count**: the number of bytes to copy, requested by userspace
16d98b31 Robert Bragg 2016-12-07  1963  	 * **offset**: zero at the start of the read, updated as the read
16d98b31 Robert Bragg 2016-12-07  1964  	 * proceeds, it represents how many bytes have been copied so far and
16d98b31 Robert Bragg 2016-12-07  1965  	 * the buffer offset for copying the next record.
eec688e1 Robert Bragg 2016-11-07  1966  	 *
16d98b31 Robert Bragg 2016-12-07  1967  	 * Copy as many buffered i915 perf samples and records for this stream
16d98b31 Robert Bragg 2016-12-07  1968  	 * to userspace as will fit in the given buffer.
eec688e1 Robert Bragg 2016-11-07  1969  	 *
16d98b31 Robert Bragg 2016-12-07  1970  	 * Only write complete records; returning -%ENOSPC if there isn't room
16d98b31 Robert Bragg 2016-12-07  1971  	 * for a complete record.
eec688e1 Robert Bragg 2016-11-07  1972  	 *
16d98b31 Robert Bragg 2016-12-07  1973  	 * Return any error condition that results in a short read such as
16d98b31 Robert Bragg 2016-12-07  1974  	 * -%ENOSPC or -%EFAULT, even though these may be squashed before
16d98b31 Robert Bragg 2016-12-07  1975  	 * returning to userspace.
eec688e1 Robert Bragg 2016-11-07  1976  	 */
eec688e1 Robert Bragg 2016-11-07  1977  	int (*read)(struct i915_perf_stream *stream,
eec688e1 Robert Bragg 2016-11-07  1978  		    char __user *buf,
eec688e1 Robert Bragg 2016-11-07  1979  		    size_t count,
eec688e1 Robert Bragg 2016-11-07  1980  		    size_t *offset);
eec688e1 Robert Bragg 2016-11-07  1981  
16d98b31 Robert Bragg 2016-12-07  1982  	/**
16d98b31 Robert Bragg 2016-12-07  1983  	 * @destroy: Cleanup any stream specific resources.
eec688e1 Robert Bragg 2016-11-07  1984  	 *
eec688e1 Robert Bragg 2016-11-07  1985  	 * The stream will always be disabled before this is called.
eec688e1 Robert Bragg 2016-11-07  1986  	 */
eec688e1 Robert Bragg 2016-11-07  1987  	void (*destroy)(struct i915_perf_stream *stream);
b0aca6b4 Sourab Gupta 2017-07-31  1988  
b0aca6b4 Sourab Gupta 2017-07-31  1989  	/*
b0aca6b4 Sourab Gupta 2017-07-31  1990  	 * @emit_sample_capture: Emit the commands in the command streamer
b0aca6b4 Sourab Gupta 2017-07-31  1991  	 * for a particular gpu engine.
b0aca6b4 Sourab Gupta 2017-07-31  1992  	 *
b0aca6b4 Sourab Gupta 2017-07-31  1993  	 * The commands are inserted to capture the perf sample data at
b0aca6b4 Sourab Gupta 2017-07-31  1994  	 * specific points during workload execution, such as before and after
b0aca6b4 Sourab Gupta 2017-07-31  1995  	 * the batch buffer.
b0aca6b4 Sourab Gupta 2017-07-31  1996  	 */
b0aca6b4 Sourab Gupta 2017-07-31  1997  	void (*emit_sample_capture)(struct i915_perf_stream *stream,
b0aca6b4 Sourab Gupta 2017-07-31  1998  				    struct drm_i915_gem_request *request,
b0aca6b4 Sourab Gupta 2017-07-31  1999  				    bool preallocate);
b0aca6b4 Sourab Gupta 2017-07-31 @2000  };
b0aca6b4 Sourab Gupta 2017-07-31  2001  
b0aca6b4 Sourab Gupta 2017-07-31  2002  enum i915_perf_stream_state {
b0aca6b4 Sourab Gupta 2017-07-31  2003  	I915_PERF_STREAM_DISABLED,
b0aca6b4 Sourab Gupta 2017-07-31  2004  	I915_PERF_STREAM_ENABLE_IN_PROGRESS,
b0aca6b4 Sourab Gupta 2017-07-31  2005  	I915_PERF_STREAM_ENABLED,
eec688e1 Robert Bragg 2016-11-07  2006  };
eec688e1 Robert Bragg 2016-11-07  2007  
16d98b31 Robert Bragg 2016-12-07  2008  /**
16d98b31 Robert Bragg 2016-12-07  2009   * struct i915_perf_stream - state for a single open stream FD
16d98b31 Robert Bragg 2016-12-07  2010   */
eec688e1 Robert Bragg 2016-11-07  2011  struct i915_perf_stream {
16d98b31 Robert Bragg 2016-12-07  2012  	/**
16d98b31 Robert Bragg 2016-12-07  2013  	 * @dev_priv: i915 drm device
16d98b31 Robert Bragg 2016-12-07  2014  	 */
eec688e1 Robert Bragg 2016-11-07  2015  	struct drm_i915_private *dev_priv;
eec688e1 Robert Bragg 2016-11-07  2016  
16d98b31 Robert Bragg 2016-12-07  2017  	/**
b0aca6b4 Sourab Gupta 2017-07-31  2018  	 * @engine: Engine to which this stream corresponds.
16d98b31 Robert Bragg 2016-12-07  2019  	 */
b0aca6b4 Sourab Gupta 2017-07-31  2020  	struct intel_engine_cs *engine;
eec688e1 Robert Bragg 2016-11-07  2021  
16d98b31 Robert Bragg 2016-12-07  2022  	/**
16d98b31 Robert Bragg 2016-12-07  2023  	 * @sample_flags: Flags representing the `DRM_I915_PERF_PROP_SAMPLE_*`
16d98b31 Robert Bragg 2016-12-07  2024  	 * properties given when opening a stream, representing the contents
16d98b31 Robert Bragg 2016-12-07  2025  	 * of a single sample as read() by userspace.
16d98b31 Robert Bragg 2016-12-07  2026  	 */
eec688e1 Robert Bragg 2016-11-07  2027  	u32 sample_flags;
16d98b31 Robert Bragg 2016-12-07  2028  
16d98b31 Robert Bragg 2016-12-07  2029  	/**
16d98b31 Robert Bragg 2016-12-07  2030  	 * @sample_size: Considering the configured contents of a sample
16d98b31 Robert Bragg 2016-12-07  2031  	 * combined with the required header size, this is the total size
16d98b31 Robert Bragg 2016-12-07  2032  	 * of a single sample record.
16d98b31 Robert Bragg 2016-12-07  2033  	 */
d7965152 Robert Bragg 2016-11-07  2034  	int sample_size;
eec688e1 Robert Bragg 2016-11-07  2035  
16d98b31 Robert Bragg 2016-12-07  2036  	/**
16d98b31 Robert Bragg 2016-12-07  2037  	 * @ctx: %NULL if measuring system-wide across all contexts or a
16d98b31 Robert Bragg 2016-12-07  2038  	 * specific context that is being monitored.
16d98b31 Robert Bragg 2016-12-07  2039  	 */
eec688e1 Robert Bragg 2016-11-07  2040  	struct i915_gem_context *ctx;
16d98b31 Robert Bragg 2016-12-07  2041  
16d98b31 Robert Bragg 2016-12-07  2042  	/**
b0aca6b4 Sourab Gupta 2017-07-31  2043  	 * @state: Current stream state, which can be either disabled, enabled,
b0aca6b4 Sourab Gupta 2017-07-31  2044  	 * or enable_in_progress, while considering whether the stream was
b0aca6b4 Sourab Gupta 2017-07-31  2045  	 * opened in a disabled state and based on `I915_PERF_IOCTL_ENABLE` and
b0aca6b4 Sourab Gupta 2017-07-31  2046  	 * `I915_PERF_IOCTL_DISABLE` calls.
16d98b31 Robert Bragg 2016-12-07  2047  	 */
b0aca6b4 Sourab Gupta 2017-07-31  2048  	enum i915_perf_stream_state state;
b0aca6b4 Sourab Gupta 2017-07-31  2049  
b0aca6b4 Sourab Gupta 2017-07-31  2050  	/**
b0aca6b4 Sourab Gupta 2017-07-31  2051  	 * @cs_mode: Whether command stream based perf sample collection is
b0aca6b4 Sourab Gupta 2017-07-31  2052  	 * enabled for this stream
b0aca6b4 Sourab Gupta 2017-07-31  2053  	 */
b0aca6b4 Sourab Gupta 2017-07-31  2054  	bool cs_mode;
b0aca6b4 Sourab Gupta 2017-07-31  2055  
b0aca6b4 Sourab Gupta 2017-07-31  2056  	/**
b0aca6b4 Sourab Gupta 2017-07-31  2057  	 * @using_oa: Whether OA unit is in use for this particular stream
b0aca6b4 Sourab Gupta 2017-07-31  2058  	 */
b0aca6b4 Sourab Gupta 2017-07-31  2059  	bool using_oa;
eec688e1 Robert Bragg 2016-11-07  2060  
16d98b31 Robert Bragg 2016-12-07  2061  	/**
16d98b31 Robert Bragg 2016-12-07  2062  	 * @ops: The callbacks providing the implementation of this specific
16d98b31 Robert Bragg 2016-12-07  2063  	 * type of configured stream.
16d98b31 Robert Bragg 2016-12-07  2064  	 */
d7965152 Robert Bragg 2016-11-07  2065  	const struct i915_perf_stream_ops *ops;
b0aca6b4 Sourab Gupta 2017-07-31  2066  
b0aca6b4 Sourab Gupta 2017-07-31  2067  	/* Command stream based perf data buffer */
b0aca6b4 Sourab Gupta 2017-07-31  2068  	struct {
b0aca6b4 Sourab Gupta 2017-07-31  2069  		struct i915_vma *vma;
b0aca6b4 Sourab Gupta 2017-07-31  2070  		u8 *vaddr;
71fd8fc0 Sourab Gupta 2017-07-31  2071  #define I915_PERF_CMD_STREAM_BUF_STATUS_OVERFLOW (1<<0)
71fd8fc0 Sourab Gupta 2017-07-31  2072  		u32 status;
b0aca6b4 Sourab Gupta 2017-07-31  2073  	} cs_buffer;
b0aca6b4 Sourab Gupta 2017-07-31  2074  
b0aca6b4 Sourab Gupta 2017-07-31  2075  	struct list_head cs_samples;
b0aca6b4 Sourab Gupta 2017-07-31  2076  	spinlock_t cs_samples_lock;
b0aca6b4 Sourab Gupta 2017-07-31  2077  
b0aca6b4 Sourab Gupta 2017-07-31  2078  	wait_queue_head_t poll_wq;
b0aca6b4 Sourab Gupta 2017-07-31  2079  	bool pollin;
7405a923 Sourab Gupta 2017-07-31  2080  
7405a923 Sourab Gupta 2017-07-31  2081  	u32 last_ctx_id;
928b6006 Sourab Gupta 2017-07-31  2082  	u32 last_pid;
d7965152 Robert Bragg 2016-11-07 @2083  };
d7965152 Robert Bragg 2016-11-07  2084  

:::::: The code at line 2083 was first introduced by commit
:::::: d79651522e89c4ffa8992b48dfe449f0c583f809 drm/i915: Enable i915 perf stream for Haswell OA unit

:::::: TO: Robert Bragg <robert@sixbynine.org>
:::::: CC: Daniel Vetter <daniel.vetter@ffwll.ch>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6735 bytes --]

[-- Attachment #3: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.
  2017-07-31 15:45   ` Lionel Landwerlin
@ 2017-08-01  9:29     ` Kamble, Sagar A
  2017-08-01 18:05       ` sourab gupta
  0 siblings, 1 reply; 34+ messages in thread
From: Kamble, Sagar A @ 2017-08-01  9:29 UTC (permalink / raw)
  To: Landwerlin, Lionel G, intel-gfx; +Cc: Sourab Gupta



-----Original Message-----
From: Landwerlin, Lionel G 
Sent: Monday, July 31, 2017 9:16 PM
To: Kamble, Sagar A <sagar.a.kamble@intel.com>; intel-gfx@lists.freedesktop.org
Cc: Sourab Gupta <sourab.gupta@intel.com>
Subject: Re: [Intel-gfx] [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.

On 31/07/17 08:59, Sagar Arun Kamble wrote:
> From: Sourab Gupta <sourab.gupta@intel.com>
>
> This patch introduces a framework to capture OA counter reports associated
> with Render command stream. We can then associate the reports captured
> through this mechanism with their corresponding context id's. This can be
> further extended to associate any other metadata information with the
> corresponding samples (since the association with Render command stream
> gives us the ability to capture these information while inserting the
> corresponding capture commands into the command stream).
>
> The OA reports generated in this way are associated with a corresponding
> workload, and thus can be used the delimit the workload (i.e. sample the
> counters at the workload boundaries), within an ongoing stream of periodic
> counter snapshots.
>
> There may be usecases wherein we need more than periodic OA capture mode
> which is supported currently. This mode is primarily used for two usecases:
>      - Ability to capture system wide metrics, alongwith the ability to map
>        the reports back to individual contexts (particularly for HSW).
>      - Ability to inject tags for work, into the reports. This provides
>        visibility into the multiple stages of work within single context.
>
> The userspace will be able to distinguish between the periodic and CS based
> OA reports by the virtue of source_info sample field.
>
> The command MI_REPORT_PERF_COUNT can be used to capture snapshots of OA
> counters, and is inserted at BB boundaries.
> The data thus captured will be stored in a separate buffer, which will
> be different from the buffer used otherwise for periodic OA capture mode.
> The metadata information pertaining to snapshot is maintained in a list,
> which also has offsets into the gem buffer object per captured snapshot.
> In order to track whether the gpu has completed processing the node,
> a field pertaining to corresponding gem request is added, which is tracked
> for completion of the command.
>
> Both periodic and CS based reports are associated with a single stream
> (corresponding to render engine), and it is expected to have the samples
> in the sequential order according to their timestamps. Now, since these
> reports are collected in separate buffers, these are merge sorted at the
> time of forwarding to userspace during the read call.
>
> v2: Aligning with the non-perf interface (custom drm ioctl based). Also,
> few related patches are squashed together for better readability
>
> v3: Updated perf sample capture emit hook name. Reserving space upfront
> in the ring for emitting sample capture commands and using
> req->fence.seqno for tracking samples. Added SRCU protection for streams.
> Changed the stream last_request tracking to resv object. (Chris)
> Updated perf.sample_lock spin_lock usage to avoid softlockups. Moved
> stream to global per-engine structure. (Sagar)
> Update unpin and put in the free routines to i915_vma_unpin_and_release.
> Making use of perf stream cs_buffer vma resv instead of separate resv obj.
> Pruned perf stream vma resv during gem_idle. (Chris)
> Changed payload field ctx_id to u64 to keep all sample data aligned at 8
> bytes. (Lionel)
> stall/flush prior to sample capture is not added. Do we need to give this
> control to user to select whether to stall/flush at each sample?
>
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> Signed-off-by: Robert Bragg <robert@sixbynine.org>
> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h            |  101 ++-
>   drivers/gpu/drm/i915/i915_gem.c            |    1 +
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c |    8 +
>   drivers/gpu/drm/i915/i915_perf.c           | 1185 ++++++++++++++++++++++------
>   drivers/gpu/drm/i915/intel_engine_cs.c     |    4 +
>   drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +
>   drivers/gpu/drm/i915/intel_ringbuffer.h    |    5 +
>   include/uapi/drm/i915_drm.h                |   15 +
>   8 files changed, 1073 insertions(+), 248 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 2c7456f..8b1cecf 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1985,6 +1985,24 @@ struct i915_perf_stream_ops {
>   	 * The stream will always be disabled before this is called.
>   	 */
>   	void (*destroy)(struct i915_perf_stream *stream);
> +
> +	/*
> +	 * @emit_sample_capture: Emit the commands in the command streamer
> +	 * for a particular gpu engine.
> +	 *
> +	 * The commands are inserted to capture the perf sample data at
> +	 * specific points during workload execution, such as before and after
> +	 * the batch buffer.
> +	 */
> +	void (*emit_sample_capture)(struct i915_perf_stream *stream,
> +				    struct drm_i915_gem_request *request,
> +				    bool preallocate);
> +};
> +

It seems the motivation for this following enum is mostly to deal with 
the fact that engine->perf_srcu is set before the OA unit is configured.
Would it possible to set it later so that we get rid of the enum?

<Sagar> I will try to make this as just binary state. This enum is defining the state of the stream. I too got confused with purpose of IN_PROGRESS.
SRCU is used for synchronizing stream state check.
IN_PROGRESS will enable us to not advertently try to access the stream vma for inserting the samples, but I guess depending on disabled/enabled should
suffice.

> +enum i915_perf_stream_state {
> +	I915_PERF_STREAM_DISABLED,
> +	I915_PERF_STREAM_ENABLE_IN_PROGRESS,
> +	I915_PERF_STREAM_ENABLED,
>   };
>   
>   /**
> @@ -1997,9 +2015,9 @@ struct i915_perf_stream {
>   	struct drm_i915_private *dev_priv;
>   
>   	/**
> -	 * @link: Links the stream into ``&drm_i915_private->streams``
> +	 * @engine: Engine to which this stream corresponds.
>   	 */
> -	struct list_head link;
> +	struct intel_engine_cs *engine;

This series only supports cs_mode on the RCS command stream.
Does it really make sense to add an srcu on all the engines rather than 
keeping it part of dev_priv->perf ?

We can always add that later if needed.

<sagar> Yes. Will change this.
>   
>   	/**
>   	 * @sample_flags: Flags representing the `DRM_I915_PERF_PROP_SAMPLE_*`
> @@ -2022,17 +2040,41 @@ struct i915_perf_stream {
>   	struct i915_gem_context *ctx;
>   
>   	/**
> -	 * @enabled: Whether the stream is currently enabled, considering
> -	 * whether the stream was opened in a disabled state and based
> -	 * on `I915_PERF_IOCTL_ENABLE` and `I915_PERF_IOCTL_DISABLE` calls.
> +	 * @state: Current stream state, which can be either disabled, enabled,
> +	 * or enable_in_progress, while considering whether the stream was
> +	 * opened in a disabled state and based on `I915_PERF_IOCTL_ENABLE` and
> +	 * `I915_PERF_IOCTL_DISABLE` calls.
>   	 */
> -	bool enabled;
> +	enum i915_perf_stream_state state;
> +
> +	/**
> +	 * @cs_mode: Whether command stream based perf sample collection is
> +	 * enabled for this stream
> +	 */
> +	bool cs_mode;
> +
> +	/**
> +	 * @using_oa: Whether OA unit is in use for this particular stream
> +	 */
> +	bool using_oa;
>   
>   	/**
>   	 * @ops: The callbacks providing the implementation of this specific
>   	 * type of configured stream.
>   	 */
>   	const struct i915_perf_stream_ops *ops;
> +
> +	/* Command stream based perf data buffer */
> +	struct {
> +		struct i915_vma *vma;
> +		u8 *vaddr;
> +	} cs_buffer;
> +
> +	struct list_head cs_samples;
> +	spinlock_t cs_samples_lock;
> +
> +	wait_queue_head_t poll_wq;
> +	bool pollin;
>   };
>   
>   /**
> @@ -2095,7 +2137,8 @@ struct i915_oa_ops {
>   	int (*read)(struct i915_perf_stream *stream,
>   		    char __user *buf,
>   		    size_t count,
> -		    size_t *offset);
> +		    size_t *offset,
> +		    u32 ts);
>   
>   	/**
>   	 * @oa_hw_tail_read: read the OA tail pointer register
> @@ -2107,6 +2150,36 @@ struct i915_oa_ops {
>   	u32 (*oa_hw_tail_read)(struct drm_i915_private *dev_priv);
>   };
>   
> +/*
> + * i915_perf_cs_sample - Sample element to hold info about a single perf
> + * sample data associated with a particular GPU command stream.
> + */
> +struct i915_perf_cs_sample {
> +	/**
> +	 * @link: Links the sample into ``&stream->cs_samples``
> +	 */
> +	struct list_head link;
> +
> +	/**
> +	 * @request: GEM request associated with the sample. The commands to
> +	 * capture the perf metrics are inserted into the command streamer in
> +	 * context of this request.
> +	 */
> +	struct drm_i915_gem_request *request;
> +
> +	/**
> +	 * @offset: Offset into ``&stream->cs_buffer``
> +	 * where the perf metrics will be collected, when the commands inserted
> +	 * into the command stream are executed by GPU.
> +	 */
> +	u32 offset;
> +
> +	/**
> +	 * @ctx_id: Context ID associated with this perf sample
> +	 */
> +	u32 ctx_id;
> +};
> +
>   struct intel_cdclk_state {
>   	unsigned int cdclk, vco, ref;
>   };
> @@ -2431,17 +2504,10 @@ struct drm_i915_private {
>   		struct ctl_table_header *sysctl_header;
>   
>   		struct mutex lock;
> -		struct list_head streams;
> -
> -		struct {
> -			struct i915_perf_stream *exclusive_stream;
>   
> -			u32 specific_ctx_id;
> -
> -			struct hrtimer poll_check_timer;
> -			wait_queue_head_t poll_wq;
> -			bool pollin;
> +		struct hrtimer poll_check_timer;
>   
> +		struct {
>   			/**
>   			 * For rate limiting any notifications of spurious
>   			 * invalid OA reports
> @@ -3636,6 +3702,8 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data,
>   void i915_oa_init_reg_state(struct intel_engine_cs *engine,
>   			    struct i915_gem_context *ctx,
>   			    uint32_t *reg_state);
> +void i915_perf_emit_sample_capture(struct drm_i915_gem_request *req,
> +				   bool preallocate);
>   
>   /* i915_gem_evict.c */
>   int __must_check i915_gem_evict_something(struct i915_address_space *vm,
> @@ -3795,6 +3863,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
>   /* i915_perf.c */
>   extern void i915_perf_init(struct drm_i915_private *dev_priv);
>   extern void i915_perf_fini(struct drm_i915_private *dev_priv);
> +extern void i915_perf_streams_mark_idle(struct drm_i915_private *dev_priv);
>   extern void i915_perf_register(struct drm_i915_private *dev_priv);
>   extern void i915_perf_unregister(struct drm_i915_private *dev_priv);
>   
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 000a764..7b01548 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3220,6 +3220,7 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
>   
>   	intel_engines_mark_idle(dev_priv);
>   	i915_gem_timelines_mark_idle(dev_priv);
> +	i915_perf_streams_mark_idle(dev_priv);
>   
>   	GEM_BUG_ON(!dev_priv->gt.awake);
>   	dev_priv->gt.awake = false;
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 5fa4476..bfe546b 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1194,12 +1194,16 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
>   	if (err)
>   		goto err_request;
>   
> +	i915_perf_emit_sample_capture(rq, true);
> +
>   	err = eb->engine->emit_bb_start(rq,
>   					batch->node.start, PAGE_SIZE,
>   					cache->gen > 5 ? 0 : I915_DISPATCH_SECURE);
>   	if (err)
>   		goto err_request;
>   
> +	i915_perf_emit_sample_capture(rq, false);
> +
>   	GEM_BUG_ON(!reservation_object_test_signaled_rcu(batch->resv, true));
>   	i915_vma_move_to_active(batch, rq, 0);
>   	reservation_object_lock(batch->resv, NULL);
> @@ -2029,6 +2033,8 @@ static int eb_submit(struct i915_execbuffer *eb)
>   			return err;
>   	}
>   
> +	i915_perf_emit_sample_capture(eb->request, true);
> +
>   	err = eb->engine->emit_bb_start(eb->request,
>   					eb->batch->node.start +
>   					eb->batch_start_offset,
> @@ -2037,6 +2043,8 @@ static int eb_submit(struct i915_execbuffer *eb)
>   	if (err)
>   		return err;
>   
> +	i915_perf_emit_sample_capture(eb->request, false);
> +
>   	return 0;
>   }
>   
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index b272653..57e1936 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -193,6 +193,7 @@
>   
>   #include <linux/anon_inodes.h>
>   #include <linux/sizes.h>
> +#include <linux/srcu.h>
>   
>   #include "i915_drv.h"
>   #include "i915_oa_hsw.h"
> @@ -288,6 +289,12 @@
>   #define OAREPORT_REASON_CTX_SWITCH     (1<<3)
>   #define OAREPORT_REASON_CLK_RATIO      (1<<5)
>   
> +/* Data common to periodic and RCS based OA samples */
> +struct i915_perf_sample_data {
> +	u64 source;
> +	u64 ctx_id;
> +	const u8 *report;
> +};
>   
>   /* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate
>    *
> @@ -328,8 +335,19 @@
>   	[I915_OA_FORMAT_C4_B8]		    = { 7, 64 },
>   };
>   
> +/* Duplicated from similar static enum in i915_gem_execbuffer.c */
> +#define I915_USER_RINGS (4)
> +static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {
> +	[I915_EXEC_DEFAULT]     = RCS,
> +	[I915_EXEC_RENDER]      = RCS,
> +	[I915_EXEC_BLT]         = BCS,
> +	[I915_EXEC_BSD]         = VCS,
> +	[I915_EXEC_VEBOX]       = VECS
> +};
> +
>   #define SAMPLE_OA_REPORT      (1<<0)
>   #define SAMPLE_OA_SOURCE      (1<<1)
> +#define SAMPLE_CTX_ID	      (1<<2)
>   
>   /**
>    * struct perf_open_properties - for validated properties given to open a stream
> @@ -340,6 +358,9 @@
>    * @oa_format: An OA unit HW report format
>    * @oa_periodic: Whether to enable periodic OA unit sampling
>    * @oa_period_exponent: The OA unit sampling period is derived from this
> + * @cs_mode: Whether the stream is configured to enable collection of metrics
> + * associated with command stream of a particular GPU engine
> + * @engine: The GPU engine associated with the stream in case cs_mode is enabled
>    *
>    * As read_properties_unlocked() enumerates and validates the properties given
>    * to open a stream of metrics the configuration is built up in the structure
> @@ -356,6 +377,10 @@ struct perf_open_properties {
>   	int oa_format;
>   	bool oa_periodic;
>   	int oa_period_exponent;
> +
> +	/* Command stream mode */
> +	bool cs_mode;
> +	enum intel_engine_id engine;
>   };
>   
>   static u32 gen8_oa_hw_tail_read(struct drm_i915_private *dev_priv)
> @@ -371,6 +396,266 @@ static u32 gen7_oa_hw_tail_read(struct drm_i915_private *dev_priv)
>   }
>   
>   /**
> + * i915_perf_emit_sample_capture - Insert the commands to capture metrics into
> + * the command stream of a GPU engine.
> + * @request: request in whose context the metrics are being collected.
> + * @preallocate: allocate space in ring for related sample.
> + *
> + * The function provides a hook through which the commands to capture perf
> + * metrics, are inserted into the command stream of a GPU engine.
> + */
> +void i915_perf_emit_sample_capture(struct drm_i915_gem_request *request,
> +				   bool preallocate)
> +{
> +	struct intel_engine_cs *engine = request->engine;
> +	struct drm_i915_private *dev_priv = engine->i915;
> +	struct i915_perf_stream *stream;
> +	int idx;
> +
> +	if (!dev_priv->perf.initialized)
> +		return;
> +
> +	idx = srcu_read_lock(&engine->perf_srcu);
> +	stream = srcu_dereference(engine->exclusive_stream, &engine->perf_srcu);
> +	if (stream && (stream->state == I915_PERF_STREAM_ENABLED) &&
> +				stream->cs_mode)
> +		stream->ops->emit_sample_capture(stream, request,
> +						 preallocate);
> +	srcu_read_unlock(&engine->perf_srcu, idx);
> +}
> +
> +/**
> + * release_perf_samples - Release old perf samples to make space for new
> + * sample data.
> + * @stream: Stream from which space is to be freed up.
> + * @target_size: Space required to be freed up.
> + *
> + * We also dereference the associated request before deleting the sample.
> + * Also, no need to check whether the commands associated with old samples
> + * have been completed. This is because these sample entries are anyways going
> + * to be replaced by a new sample, and gpu will eventually overwrite the buffer
> + * contents, when the request associated with new sample completes.
> + */
> +static void release_perf_samples(struct i915_perf_stream *stream,
> +				 u32 target_size)
> +{
> +	struct drm_i915_private *dev_priv = stream->dev_priv;
> +	struct i915_perf_cs_sample *sample, *next;
> +	u32 sample_size = dev_priv->perf.oa.oa_buffer.format_size;
> +	u32 size = 0;
> +
> +	list_for_each_entry_safe
> +		(sample, next, &stream->cs_samples, link) {
> +		size += sample_size;
> +		i915_gem_request_put(sample->request);
> +		list_del(&sample->link);
> +		kfree(sample);
> +
> +		if (size >= target_size)
> +			break;
> +	}
> +}
> +
> +/**
> + * insert_perf_sample - Insert a perf sample entry to the sample list.
> + * @stream: Stream into which sample is to be inserted.
> + * @sample: perf CS sample to be inserted into the list
> + *
> + * This function never fails, since it always manages to insert the sample.
> + * If the space is exhausted in the buffer, it will remove the older
> + * entries in order to make space.
> + */
> +static void insert_perf_sample(struct i915_perf_stream *stream,
> +				struct i915_perf_cs_sample *sample)
> +{
> +	struct drm_i915_private *dev_priv = stream->dev_priv;
> +	struct i915_perf_cs_sample *first, *last;
> +	int max_offset = stream->cs_buffer.vma->obj->base.size;
> +	u32 sample_size = dev_priv->perf.oa.oa_buffer.format_size;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&stream->cs_samples_lock, flags);
> +	if (list_empty(&stream->cs_samples)) {
> +		sample->offset = 0;
> +		list_add_tail(&sample->link, &stream->cs_samples);
> +		spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> +		return;
> +	}
> +
> +	first = list_first_entry(&stream->cs_samples, typeof(*first),
> +				link);
> +	last = list_last_entry(&stream->cs_samples, typeof(*last),
> +				link);
> +
> +	if (last->offset >= first->offset) {
> +		/* Sufficient space available at the end of buffer? */
> +		if (last->offset + 2*sample_size < max_offset)
> +			sample->offset = last->offset + sample_size;
> +		/*
> +		 * Wraparound condition. Is sufficient space available at
> +		 * beginning of buffer?
> +		 */
> +		else if (sample_size < first->offset)
> +			sample->offset = 0;
> +		/* Insufficient space. Overwrite existing old entries */
> +		else {
> +			u32 target_size = sample_size - first->offset;
> +
> +			release_perf_samples(stream, target_size);
> +			sample->offset = 0;
> +		}
> +	} else {
> +		/* Sufficient space available? */
> +		if (last->offset + 2*sample_size < first->offset)
> +			sample->offset = last->offset + sample_size;
> +		/* Insufficient space. Overwrite existing old entries */
> +		else {
> +			u32 target_size = sample_size -
> +				(first->offset - last->offset -
> +				sample_size);
> +
> +			release_perf_samples(stream, target_size);
> +			sample->offset = last->offset + sample_size;
> +		}
> +	}
> +	list_add_tail(&sample->link, &stream->cs_samples);
> +	spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> +}
> +
> +/**
> + * i915_emit_oa_report_capture - Insert the commands to capture OA
> + * reports metrics into the render command stream
> + * @request: request in whose context the metrics are being collected.
> + * @preallocate: allocate space in ring for related sample.
> + * @offset: command stream buffer offset where the OA metrics need to be
> + * collected
> + */
> +static int i915_emit_oa_report_capture(
> +				struct drm_i915_gem_request *request,
> +				bool preallocate,
> +				u32 offset)
> +{
> +	struct drm_i915_private *dev_priv = request->i915;
> +	struct intel_engine_cs *engine = request->engine;
> +	struct i915_perf_stream *stream;
> +	u32 addr = 0;
> +	u32 cmd, len = 4, *cs;
> +	int idx;
> +
> +	idx = srcu_read_lock(&engine->perf_srcu);
> +	stream = srcu_dereference(engine->exclusive_stream, &engine->perf_srcu);
> +	addr = stream->cs_buffer.vma->node.start + offset;
> +	srcu_read_unlock(&engine->perf_srcu, idx);
> +
> +	if (WARN_ON(addr & 0x3f)) {
> +		DRM_ERROR("OA buffer address not aligned to 64 byte\n");
> +		return -EINVAL;
> +	}
> +
> +	if (preallocate)
> +		request->reserved_space += len;
> +	else
> +		request->reserved_space -= len;
> +
> +	cs = intel_ring_begin(request, 4);
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	cmd = MI_REPORT_PERF_COUNT | (1<<0);
> +	if (INTEL_GEN(dev_priv) >= 8)
> +		cmd |= (2<<0);
> +
> +	*cs++ = cmd;
> +	*cs++ = addr | MI_REPORT_PERF_COUNT_GGTT;
> +	*cs++ = request->fence.seqno;
> +
> +	if (INTEL_GEN(dev_priv) >= 8)
> +		*cs++ = 0;
> +	else
> +		*cs++ = MI_NOOP;
> +
> +	intel_ring_advance(request, cs);
> +
> +	return 0;
> +}
> +
> +/**
> + * i915_perf_stream_emit_sample_capture - Insert the commands to capture perf
> + * metrics into the GPU command stream
> + * @stream: An i915-perf stream opened for GPU metrics
> + * @request: request in whose context the metrics are being collected.
> + * @preallocate: allocate space in ring for related sample.
> + */
> +static void i915_perf_stream_emit_sample_capture(
> +					struct i915_perf_stream *stream,
> +					struct drm_i915_gem_request *request,
> +					bool preallocate)
> +{
> +	struct reservation_object *resv = stream->cs_buffer.vma->resv;
> +	struct i915_perf_cs_sample *sample;
> +	unsigned long flags;
> +	int ret;
> +
> +	sample = kzalloc(sizeof(*sample), GFP_KERNEL);
> +	if (sample == NULL) {
> +		DRM_ERROR("Perf sample alloc failed\n");
> +		return;
> +	}
> +
> +	sample->request = i915_gem_request_get(request);
> +	sample->ctx_id = request->ctx->hw_id;
> +
> +	insert_perf_sample(stream, sample);
> +
> +	if (stream->sample_flags & SAMPLE_OA_REPORT) {
> +		ret = i915_emit_oa_report_capture(request,
> +						  preallocate,
> +						  sample->offset);
> +		if (ret)
> +			goto err_unref;
> +	}
> +
> +	reservation_object_lock(resv, NULL);
> +	if (reservation_object_reserve_shared(resv) == 0)
> +		reservation_object_add_shared_fence(resv, &request->fence);
> +	reservation_object_unlock(resv);
> +
> +	i915_vma_move_to_active(stream->cs_buffer.vma, request,
> +					EXEC_OBJECT_WRITE);
> +	return;
> +
> +err_unref:
> +	i915_gem_request_put(sample->request);
> +	spin_lock_irqsave(&stream->cs_samples_lock, flags);
> +	list_del(&sample->link);
> +	spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> +	kfree(sample);
> +}
> +
> +/**
> + * i915_perf_stream_release_samples - Release the perf command stream samples
> + * @stream: Stream from which sample are to be released.
> + *
> + * Note: The associated requests should be completed before releasing the
> + * references here.
> + */
> +static void i915_perf_stream_release_samples(struct i915_perf_stream *stream)
> +{
> +	struct i915_perf_cs_sample *entry, *next;
> +	unsigned long flags;
> +
> +	list_for_each_entry_safe
> +		(entry, next, &stream->cs_samples, link) {
> +		i915_gem_request_put(entry->request);
> +
> +		spin_lock_irqsave(&stream->cs_samples_lock, flags);
> +		list_del(&entry->link);
> +		spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> +		kfree(entry);
> +	}
> +}
> +
> +/**
>    * oa_buffer_check_unlocked - check for data and update tail ptr state
>    * @dev_priv: i915 device instance
>    *
> @@ -521,12 +806,13 @@ static int append_oa_status(struct i915_perf_stream *stream,
>   }
>   
>   /**
> - * append_oa_sample - Copies single OA report into userspace read() buffer.
> - * @stream: An i915-perf stream opened for OA metrics
> + * append_perf_sample - Copies single perf sample into userspace read() buffer.
> + * @stream: An i915-perf stream opened for perf samples
>    * @buf: destination buffer given by userspace
>    * @count: the number of bytes userspace wants to read
>    * @offset: (inout): the current position for writing into @buf
> - * @report: A single OA report to (optionally) include as part of the sample
> + * @data: perf sample data which contains (optionally) metrics configured
> + * earlier when opening a stream
>    *
>    * The contents of a sample are configured through `DRM_I915_PERF_PROP_SAMPLE_*`
>    * properties when opening a stream, tracked as `stream->sample_flags`. This
> @@ -537,11 +823,11 @@ static int append_oa_status(struct i915_perf_stream *stream,
>    *
>    * Returns: 0 on success, negative error code on failure.
>    */
> -static int append_oa_sample(struct i915_perf_stream *stream,
> +static int append_perf_sample(struct i915_perf_stream *stream,
>   			    char __user *buf,
>   			    size_t count,
>   			    size_t *offset,
> -			    const u8 *report)
> +			    const struct i915_perf_sample_data *data)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
> @@ -569,16 +855,21 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>   	 * transition. These are considered as source 'OABUFFER'.
>   	 */
>   	if (sample_flags & SAMPLE_OA_SOURCE) {
> -		u64 source = I915_PERF_SAMPLE_OA_SOURCE_OABUFFER;
> +		if (copy_to_user(buf, &data->source, 8))
> +			return -EFAULT;
> +		buf += 8;
> +	}
>   
> -		if (copy_to_user(buf, &source, 8))
> +	if (sample_flags & SAMPLE_CTX_ID) {
> +		if (copy_to_user(buf, &data->ctx_id, 8))
>   			return -EFAULT;
>   		buf += 8;
>   	}
>   
>   	if (sample_flags & SAMPLE_OA_REPORT) {
> -		if (copy_to_user(buf, report, report_size))
> +		if (copy_to_user(buf, data->report, report_size))
>   			return -EFAULT;
> +		buf += report_size;
>   	}
>   
>   	(*offset) += header.size;
> @@ -587,11 +878,54 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>   }
>   
>   /**
> + * append_oa_buffer_sample - Copies single periodic OA report into userspace
> + * read() buffer.
> + * @stream: An i915-perf stream opened for OA metrics
> + * @buf: destination buffer given by userspace
> + * @count: the number of bytes userspace wants to read
> + * @offset: (inout): the current position for writing into @buf
> + * @report: A single OA report to (optionally) include as part of the sample
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +static int append_oa_buffer_sample(struct i915_perf_stream *stream,
> +				char __user *buf, size_t count,
> +				size_t *offset,	const u8 *report)
> +{
> +	struct drm_i915_private *dev_priv = stream->dev_priv;
> +	u32 sample_flags = stream->sample_flags;
> +	struct i915_perf_sample_data data = { 0 };
> +	u32 *report32 = (u32 *)report;
> +
> +	if (sample_flags & SAMPLE_OA_SOURCE)
> +		data.source = I915_PERF_SAMPLE_OA_SOURCE_OABUFFER;
> +
> +	if (sample_flags & SAMPLE_CTX_ID) {
> +		if (INTEL_INFO(dev_priv)->gen < 8)
> +			data.ctx_id = 0;
> +		else {
> +			/*
> +			 * XXX: Just keep the lower 21 bits for now since I'm
> +			 * not entirely sure if the HW touches any of the higher
> +			 * bits in this field
> +			 */
> +			data.ctx_id = report32[2] & 0x1fffff;
> +		}
> +	}
> +
> +	if (sample_flags & SAMPLE_OA_REPORT)
> +		data.report = report;
> +
> +	return append_perf_sample(stream, buf, count, offset, &data);
> +}
> +
> +/**
>    * Copies all buffered OA reports into userspace read() buffer.
>    * @stream: An i915-perf stream opened for OA metrics
>    * @buf: destination buffer given by userspace
>    * @count: the number of bytes userspace wants to read
>    * @offset: (inout): the current position for writing into @buf
> + * @ts: copy OA reports till this timestamp
>    *
>    * Notably any error condition resulting in a short read (-%ENOSPC or
>    * -%EFAULT) will be returned even though one or more records may
> @@ -609,7 +943,8 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>   static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   				  char __user *buf,
>   				  size_t count,
> -				  size_t *offset)
> +				  size_t *offset,
> +				  u32 ts)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
> @@ -623,7 +958,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   	u32 taken;
>   	int ret = 0;
>   
> -	if (WARN_ON(!stream->enabled))
> +	if (WARN_ON(stream->state != I915_PERF_STREAM_ENABLED))
>   		return -EIO;
>   
>   	spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags);
> @@ -669,6 +1004,11 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   		u32 *report32 = (void *)report;
>   		u32 ctx_id;
>   		u32 reason;
> +		u32 report_ts = report32[1];
> +
> +		/* Report timestamp should not exceed the given ts */
> +		if (report_ts > ts)
> +			break;
>   
>   		/*
>   		 * All the report sizes factor neatly into the buffer
> @@ -750,23 +1090,23 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   		 * switches since it's not-uncommon for periodic samples to
>   		 * identify a switch before any 'context switch' report.
>   		 */
> -		if (!dev_priv->perf.oa.exclusive_stream->ctx ||
> -		    dev_priv->perf.oa.specific_ctx_id == ctx_id ||
> +		if (!stream->ctx ||
> +		    stream->engine->specific_ctx_id == ctx_id ||
>   		    (dev_priv->perf.oa.oa_buffer.last_ctx_id ==
> -		     dev_priv->perf.oa.specific_ctx_id) ||
> +		     stream->engine->specific_ctx_id) ||
>   		    reason & OAREPORT_REASON_CTX_SWITCH) {
>   
>   			/*
>   			 * While filtering for a single context we avoid
>   			 * leaking the IDs of other contexts.
>   			 */
> -			if (dev_priv->perf.oa.exclusive_stream->ctx &&
> -			    dev_priv->perf.oa.specific_ctx_id != ctx_id) {
> +			if (stream->ctx &&
> +			    stream->engine->specific_ctx_id != ctx_id) {
>   				report32[2] = INVALID_CTX_ID;
>   			}
>   
> -			ret = append_oa_sample(stream, buf, count, offset,
> -					       report);
> +			ret = append_oa_buffer_sample(stream, buf, count,
> +						      offset, report);
>   			if (ret)
>   				break;
>   
> @@ -807,6 +1147,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>    * @buf: destination buffer given by userspace
>    * @count: the number of bytes userspace wants to read
>    * @offset: (inout): the current position for writing into @buf
> + * @ts: copy OA reports till this timestamp
>    *
>    * Checks OA unit status registers and if necessary appends corresponding
>    * status records for userspace (such as for a buffer full condition) and then
> @@ -824,7 +1165,8 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   static int gen8_oa_read(struct i915_perf_stream *stream,
>   			char __user *buf,
>   			size_t count,
> -			size_t *offset)
> +			size_t *offset,
> +			u32 ts)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   	u32 oastatus;
> @@ -877,7 +1219,7 @@ static int gen8_oa_read(struct i915_perf_stream *stream,
>   			   oastatus & ~GEN8_OASTATUS_REPORT_LOST);
>   	}
>   
> -	return gen8_append_oa_reports(stream, buf, count, offset);
> +	return gen8_append_oa_reports(stream, buf, count, offset, ts);
>   }
>   
>   /**
> @@ -886,6 +1228,7 @@ static int gen8_oa_read(struct i915_perf_stream *stream,
>    * @buf: destination buffer given by userspace
>    * @count: the number of bytes userspace wants to read
>    * @offset: (inout): the current position for writing into @buf
> + * @ts: copy OA reports till this timestamp
>    *
>    * Notably any error condition resulting in a short read (-%ENOSPC or
>    * -%EFAULT) will be returned even though one or more records may
> @@ -903,7 +1246,8 @@ static int gen8_oa_read(struct i915_perf_stream *stream,
>   static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>   				  char __user *buf,
>   				  size_t count,
> -				  size_t *offset)
> +				  size_t *offset,
> +				  u32 ts)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
> @@ -917,7 +1261,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>   	u32 taken;
>   	int ret = 0;
>   
> -	if (WARN_ON(!stream->enabled))
> +	if (WARN_ON(stream->state != I915_PERF_STREAM_ENABLED))
>   		return -EIO;
>   
>   	spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags);
> @@ -984,7 +1328,12 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>   			continue;
>   		}
>   
> -		ret = append_oa_sample(stream, buf, count, offset, report);
> +		/* Report timestamp should not exceed the given ts */
> +		if (report32[1] > ts)
> +			break;
> +
> +		ret = append_oa_buffer_sample(stream, buf, count, offset,
> +					      report);
>   		if (ret)
>   			break;
>   
> @@ -1022,6 +1371,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>    * @buf: destination buffer given by userspace
>    * @count: the number of bytes userspace wants to read
>    * @offset: (inout): the current position for writing into @buf
> + * @ts: copy OA reports till this timestamp
>    *
>    * Checks Gen 7 specific OA unit status registers and if necessary appends
>    * corresponding status records for userspace (such as for a buffer full
> @@ -1035,7 +1385,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>   static int gen7_oa_read(struct i915_perf_stream *stream,
>   			char __user *buf,
>   			size_t count,
> -			size_t *offset)
> +			size_t *offset,
> +			u32 ts)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   	u32 oastatus1;
> @@ -1097,16 +1448,172 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
>   			GEN7_OASTATUS1_REPORT_LOST;
>   	}
>   
> -	return gen7_append_oa_reports(stream, buf, count, offset);
> +	return gen7_append_oa_reports(stream, buf, count, offset, ts);
> +}
> +
> +/**
> + * append_cs_buffer_sample - Copies single perf sample data associated with
> + * GPU command stream, into userspace read() buffer.
> + * @stream: An i915-perf stream opened for perf CS metrics
> + * @buf: destination buffer given by userspace
> + * @count: the number of bytes userspace wants to read
> + * @offset: (inout): the current position for writing into @buf
> + * @node: Sample data associated with perf metrics
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +static int append_cs_buffer_sample(struct i915_perf_stream *stream,
> +				char __user *buf,
> +				size_t count,
> +				size_t *offset,
> +				struct i915_perf_cs_sample *node)
> +{
> +	struct drm_i915_private *dev_priv = stream->dev_priv;
> +	struct i915_perf_sample_data data = { 0 };
> +	u32 sample_flags = stream->sample_flags;
> +	int ret = 0;
> +
> +	if (sample_flags & SAMPLE_OA_REPORT) {
> +		const u8 *report = stream->cs_buffer.vaddr + node->offset;
> +		u32 sample_ts = *(u32 *)(report + 4);
> +
> +		data.report = report;
> +
> +		/* First, append the periodic OA samples having lower
> +		 * timestamp values
> +		 */
> +		ret = dev_priv->perf.oa.ops.read(stream, buf, count, offset,
> +						 sample_ts);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	if (sample_flags & SAMPLE_OA_SOURCE)
> +		data.source = I915_PERF_SAMPLE_OA_SOURCE_CS;
> +
> +	if (sample_flags & SAMPLE_CTX_ID)
> +		data.ctx_id = node->ctx_id;
> +
> +	return append_perf_sample(stream, buf, count, offset, &data);
>   }
>   
>   /**
> - * i915_oa_wait_unlocked - handles blocking IO until OA data available
> + * append_cs_buffer_samples: Copies all command stream based perf samples
> + * into userspace read() buffer.
> + * @stream: An i915-perf stream opened for perf CS metrics
> + * @buf: destination buffer given by userspace
> + * @count: the number of bytes userspace wants to read
> + * @offset: (inout): the current position for writing into @buf
> + *
> + * Notably any error condition resulting in a short read (-%ENOSPC or
> + * -%EFAULT) will be returned even though one or more records may
> + * have been successfully copied. In this case it's up to the caller
> + * to decide if the error should be squashed before returning to
> + * userspace.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +static int append_cs_buffer_samples(struct i915_perf_stream *stream,
> +				char __user *buf,
> +				size_t count,
> +				size_t *offset)
> +{
> +	struct i915_perf_cs_sample *entry, *next;
> +	LIST_HEAD(free_list);
> +	int ret = 0;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&stream->cs_samples_lock, flags);
> +	if (list_empty(&stream->cs_samples)) {
> +		spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> +		return 0;
> +	}
> +	list_for_each_entry_safe(entry, next,
> +				 &stream->cs_samples, link) {
> +		if (!i915_gem_request_completed(entry->request))
> +			break;
> +		list_move_tail(&entry->link, &free_list);
> +	}
> +	spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> +
> +	if (list_empty(&free_list))
> +		return 0;
> +
> +	list_for_each_entry_safe(entry, next, &free_list, link) {
> +		ret = append_cs_buffer_sample(stream, buf, count, offset,
> +					      entry);
> +		if (ret)
> +			break;
> +
> +		list_del(&entry->link);
> +		i915_gem_request_put(entry->request);
> +		kfree(entry);
> +	}
> +
> +	/* Don't discard remaining entries, keep them for next read */
> +	spin_lock_irqsave(&stream->cs_samples_lock, flags);
> +	list_splice(&free_list, &stream->cs_samples);
> +	spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> +
> +	return ret;
> +}
> +
> +/*
> + * cs_buffer_is_empty - Checks whether the command stream buffer
> + * associated with the stream has data available.
>    * @stream: An i915-perf stream opened for OA metrics
>    *
> + * Returns: true if atleast one request associated with command stream is
> + * completed, else returns false.
> + */
> +static bool cs_buffer_is_empty(struct i915_perf_stream *stream)
> +
> +{
> +	struct i915_perf_cs_sample *entry = NULL;
> +	struct drm_i915_gem_request *request = NULL;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&stream->cs_samples_lock, flags);
> +	entry = list_first_entry_or_null(&stream->cs_samples,
> +			struct i915_perf_cs_sample, link);
> +	if (entry)
> +		request = entry->request;
> +	spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> +
> +	if (!entry)
> +		return true;
> +	else if (!i915_gem_request_completed(request))
> +		return true;
> +	else
> +		return false;
> +}
> +
> +/**
> + * stream_have_data_unlocked - Checks whether the stream has data available
> + * @stream: An i915-perf stream opened for OA metrics
> + *
> + * For command stream based streams, check if the command stream buffer has
> + * atleast one sample available, if not return false, irrespective of periodic
> + * oa buffer having the data or not.
> + */
> +
> +static bool stream_have_data_unlocked(struct i915_perf_stream *stream)
> +{
> +	struct drm_i915_private *dev_priv = stream->dev_priv;
> +
> +	if (stream->cs_mode)
> +		return !cs_buffer_is_empty(stream);
> +	else
> +		return oa_buffer_check_unlocked(dev_priv);
> +}
> +
> +/**
> + * i915_perf_stream_wait_unlocked - handles blocking IO until data available
> + * @stream: An i915-perf stream opened for GPU metrics
> + *
>    * Called when userspace tries to read() from a blocking stream FD opened
> - * for OA metrics. It waits until the hrtimer callback finds a non-empty
> - * OA buffer and wakes us.
> + * for perf metrics. It waits until the hrtimer callback finds a non-empty
> + * command stream buffer / OA buffer and wakes us.
>    *
>    * Note: it's acceptable to have this return with some false positives
>    * since any subsequent read handling will return -EAGAIN if there isn't
> @@ -1114,7 +1621,7 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
>    *
>    * Returns: zero on success or a negative error code
>    */
> -static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
> +static int i915_perf_stream_wait_unlocked(struct i915_perf_stream *stream)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   
> @@ -1122,32 +1629,47 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
>   	if (!dev_priv->perf.oa.periodic)
>   		return -EIO;
>   
> -	return wait_event_interruptible(dev_priv->perf.oa.poll_wq,
> -					oa_buffer_check_unlocked(dev_priv));
> +	if (stream->cs_mode) {
> +		long int ret;
> +
> +		/* Wait for the all sampled requests. */
> +		ret = reservation_object_wait_timeout_rcu(
> +						    stream->cs_buffer.vma->resv,
> +						    true,
> +						    true,
> +						    MAX_SCHEDULE_TIMEOUT);
> +		if (unlikely(ret < 0)) {
> +			DRM_DEBUG_DRIVER("Failed to wait for sampled requests: %li\n", ret);
> +			return ret;
> +		}
> +	}
> +
> +	return wait_event_interruptible(stream->poll_wq,
> +					stream_have_data_unlocked(stream));
>   }
>   
>   /**
> - * i915_oa_poll_wait - call poll_wait() for an OA stream poll()
> - * @stream: An i915-perf stream opened for OA metrics
> + * i915_perf_stream_poll_wait - call poll_wait() for an stream poll()
> + * @stream: An i915-perf stream opened for GPU metrics
>    * @file: An i915 perf stream file
>    * @wait: poll() state table
>    *
> - * For handling userspace polling on an i915 perf stream opened for OA metrics,
> + * For handling userspace polling on an i915 perf stream opened for metrics,
>    * this starts a poll_wait with the wait queue that our hrtimer callback wakes
> - * when it sees data ready to read in the circular OA buffer.
> + * when it sees data ready to read either in command stream buffer or in the
> + * circular OA buffer.
>    */
> -static void i915_oa_poll_wait(struct i915_perf_stream *stream,
> +static void i915_perf_stream_poll_wait(struct i915_perf_stream *stream,
>   			      struct file *file,
>   			      poll_table *wait)
>   {
> -	struct drm_i915_private *dev_priv = stream->dev_priv;
> -
> -	poll_wait(file, &dev_priv->perf.oa.poll_wq, wait);
> +	poll_wait(file, &stream->poll_wq, wait);
>   }
>   
>   /**
> - * i915_oa_read - just calls through to &i915_oa_ops->read
> - * @stream: An i915-perf stream opened for OA metrics
> + * i915_perf_stream_read - Reads perf metrics available into userspace read
> + * buffer
> + * @stream: An i915-perf stream opened for GPU metrics
>    * @buf: destination buffer given by userspace
>    * @count: the number of bytes userspace wants to read
>    * @offset: (inout): the current position for writing into @buf
> @@ -1157,14 +1679,21 @@ static void i915_oa_poll_wait(struct i915_perf_stream *stream,
>    *
>    * Returns: zero on success or a negative error code
>    */
> -static int i915_oa_read(struct i915_perf_stream *stream,
> +static int i915_perf_stream_read(struct i915_perf_stream *stream,
>   			char __user *buf,
>   			size_t count,
>   			size_t *offset)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   
> -	return dev_priv->perf.oa.ops.read(stream, buf, count, offset);
> +
> +	if (stream->cs_mode)
> +		return append_cs_buffer_samples(stream, buf, count, offset);
> +	else if (stream->sample_flags & SAMPLE_OA_REPORT)
> +		return dev_priv->perf.oa.ops.read(stream, buf, count, offset,
> +						U32_MAX);
> +	else
> +		return -EINVAL;
>   }
>   
>   /**
> @@ -1182,7 +1711,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   
>   	if (i915.enable_execlists)
> -		dev_priv->perf.oa.specific_ctx_id = stream->ctx->hw_id;
> +		stream->engine->specific_ctx_id = stream->ctx->hw_id;
>   	else {
>   		struct intel_engine_cs *engine = dev_priv->engine[RCS];
>   		struct intel_ring *ring;
> @@ -1209,7 +1738,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>   		 * i915_ggtt_offset() on the fly) considering the difference
>   		 * with gen8+ and execlists
>   		 */
> -		dev_priv->perf.oa.specific_ctx_id =
> +		stream->engine->specific_ctx_id =
>   			i915_ggtt_offset(stream->ctx->engine[engine->id].state);
>   	}
>   
> @@ -1228,13 +1757,13 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   
>   	if (i915.enable_execlists) {
> -		dev_priv->perf.oa.specific_ctx_id = INVALID_CTX_ID;
> +		stream->engine->specific_ctx_id = INVALID_CTX_ID;
>   	} else {
>   		struct intel_engine_cs *engine = dev_priv->engine[RCS];
>   
>   		mutex_lock(&dev_priv->drm.struct_mutex);
>   
> -		dev_priv->perf.oa.specific_ctx_id = INVALID_CTX_ID;
> +		stream->engine->specific_ctx_id = INVALID_CTX_ID;
>   		engine->context_unpin(engine, stream->ctx);
>   
>   		mutex_unlock(&dev_priv->drm.struct_mutex);
> @@ -1242,13 +1771,28 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)
>   }
>   
>   static void
> +free_cs_buffer(struct i915_perf_stream *stream)
> +{
> +	struct drm_i915_private *dev_priv = stream->dev_priv;
> +
> +	mutex_lock(&dev_priv->drm.struct_mutex);
> +
> +	i915_gem_object_unpin_map(stream->cs_buffer.vma->obj);
> +	i915_vma_unpin_and_release(&stream->cs_buffer.vma);
> +
> +	stream->cs_buffer.vma = NULL;
> +	stream->cs_buffer.vaddr = NULL;
> +
> +	mutex_unlock(&dev_priv->drm.struct_mutex);
> +}
> +
> +static void
>   free_oa_buffer(struct drm_i915_private *i915)
>   {
>   	mutex_lock(&i915->drm.struct_mutex);
>   
>   	i915_gem_object_unpin_map(i915->perf.oa.oa_buffer.vma->obj);
> -	i915_vma_unpin(i915->perf.oa.oa_buffer.vma);
> -	i915_gem_object_put(i915->perf.oa.oa_buffer.vma->obj);
> +	i915_vma_unpin_and_release(&i915->perf.oa.oa_buffer.vma);
>   
>   	i915->perf.oa.oa_buffer.vma = NULL;
>   	i915->perf.oa.oa_buffer.vaddr = NULL;
> @@ -1256,27 +1800,41 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)
>   	mutex_unlock(&i915->drm.struct_mutex);
>   }
>   
> -static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
> +static void i915_perf_stream_destroy(struct i915_perf_stream *stream)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
> -
> -	BUG_ON(stream != dev_priv->perf.oa.exclusive_stream);
> +	struct intel_engine_cs *engine = stream->engine;
> +	struct i915_perf_stream *engine_stream;
> +	int idx;
> +
> +	idx = srcu_read_lock(&engine->perf_srcu);
> +	engine_stream = srcu_dereference(engine->exclusive_stream,
> +					 &engine->perf_srcu);
> +	if (WARN_ON(stream != engine_stream))
> +		return;
> +	srcu_read_unlock(&engine->perf_srcu, idx);
>   
>   	/*
>   	 * Unset exclusive_stream first, it might be checked while
>   	 * disabling the metric set on gen8+.
>   	 */
> -	dev_priv->perf.oa.exclusive_stream = NULL;
> +	rcu_assign_pointer(stream->engine->exclusive_stream, NULL);
> +	synchronize_srcu(&stream->engine->perf_srcu);
>   
> -	dev_priv->perf.oa.ops.disable_metric_set(dev_priv);
> +	if (stream->using_oa) {
> +		dev_priv->perf.oa.ops.disable_metric_set(dev_priv);
>   
> -	free_oa_buffer(dev_priv);
> +		free_oa_buffer(dev_priv);
>   
> -	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> -	intel_runtime_pm_put(dev_priv);
> +		intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> +		intel_runtime_pm_put(dev_priv);
>   
> -	if (stream->ctx)
> -		oa_put_render_ctx_id(stream);
> +		if (stream->ctx)
> +			oa_put_render_ctx_id(stream);
> +	}
> +
> +	if (stream->cs_mode)
> +		free_cs_buffer(stream);
>   
>   	if (dev_priv->perf.oa.spurious_report_rs.missed) {
>   		DRM_NOTE("%d spurious OA report notices suppressed due to ratelimiting\n",
> @@ -1325,11 +1883,6 @@ static void gen7_init_oa_buffer(struct drm_i915_private *dev_priv)
>   	 * memory...
>   	 */
>   	memset(dev_priv->perf.oa.oa_buffer.vaddr, 0, OA_BUFFER_SIZE);
> -
> -	/* Maybe make ->pollin per-stream state if we support multiple
> -	 * concurrent streams in the future.
> -	 */
> -	dev_priv->perf.oa.pollin = false;
>   }
>   
>   static void gen8_init_oa_buffer(struct drm_i915_private *dev_priv)
> @@ -1383,33 +1936,26 @@ static void gen8_init_oa_buffer(struct drm_i915_private *dev_priv)
>   	 * memory...
>   	 */
>   	memset(dev_priv->perf.oa.oa_buffer.vaddr, 0, OA_BUFFER_SIZE);
> -
> -	/*
> -	 * Maybe make ->pollin per-stream state if we support multiple
> -	 * concurrent streams in the future.
> -	 */
> -	dev_priv->perf.oa.pollin = false;
>   }
>   
> -static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
> +static int alloc_obj(struct drm_i915_private *dev_priv,
> +		     struct i915_vma **vma, u8 **vaddr)
>   {
>   	struct drm_i915_gem_object *bo;
> -	struct i915_vma *vma;
>   	int ret;
>   
> -	if (WARN_ON(dev_priv->perf.oa.oa_buffer.vma))
> -		return -ENODEV;
> +	intel_runtime_pm_get(dev_priv);
>   
>   	ret = i915_mutex_lock_interruptible(&dev_priv->drm);
>   	if (ret)
> -		return ret;
> +		goto out;
>   
>   	BUILD_BUG_ON_NOT_POWER_OF_2(OA_BUFFER_SIZE);
>   	BUILD_BUG_ON(OA_BUFFER_SIZE < SZ_128K || OA_BUFFER_SIZE > SZ_16M);
>   
>   	bo = i915_gem_object_create(dev_priv, OA_BUFFER_SIZE);
>   	if (IS_ERR(bo)) {
> -		DRM_ERROR("Failed to allocate OA buffer\n");
> +		DRM_ERROR("Failed to allocate i915 perf obj\n");
>   		ret = PTR_ERR(bo);
>   		goto unlock;
>   	}
> @@ -1419,42 +1965,83 @@ static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
>   		goto err_unref;
>   
>   	/* PreHSW required 512K alignment, HSW requires 16M */
> -	vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);
> -	if (IS_ERR(vma)) {
> -		ret = PTR_ERR(vma);
> +	*vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);
> +	if (IS_ERR(*vma)) {
> +		ret = PTR_ERR(*vma);
>   		goto err_unref;
>   	}
> -	dev_priv->perf.oa.oa_buffer.vma = vma;
>   
> -	dev_priv->perf.oa.oa_buffer.vaddr =
> -		i915_gem_object_pin_map(bo, I915_MAP_WB);
> -	if (IS_ERR(dev_priv->perf.oa.oa_buffer.vaddr)) {
> -		ret = PTR_ERR(dev_priv->perf.oa.oa_buffer.vaddr);
> +	*vaddr = i915_gem_object_pin_map(bo, I915_MAP_WB);
> +	if (IS_ERR(*vaddr)) {
> +		ret = PTR_ERR(*vaddr);
>   		goto err_unpin;
>   	}
>   
> -	dev_priv->perf.oa.ops.init_oa_buffer(dev_priv);
> -
> -	DRM_DEBUG_DRIVER("OA Buffer initialized, gtt offset = 0x%x, vaddr = %p\n",
> -			 i915_ggtt_offset(dev_priv->perf.oa.oa_buffer.vma),
> -			 dev_priv->perf.oa.oa_buffer.vaddr);
> -
>   	goto unlock;
>   
>   err_unpin:
> -	__i915_vma_unpin(vma);
> +	i915_vma_unpin(*vma);
>   
>   err_unref:
>   	i915_gem_object_put(bo);
>   
> -	dev_priv->perf.oa.oa_buffer.vaddr = NULL;
> -	dev_priv->perf.oa.oa_buffer.vma = NULL;
> -
>   unlock:
>   	mutex_unlock(&dev_priv->drm.struct_mutex);
> +out:
> +	intel_runtime_pm_put(dev_priv);
>   	return ret;
>   }
>   
> +static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
> +{
> +	struct i915_vma *vma;
> +	u8 *vaddr;
> +	int ret;
> +
> +	if (WARN_ON(dev_priv->perf.oa.oa_buffer.vma))
> +		return -ENODEV;
> +
> +	ret = alloc_obj(dev_priv, &vma, &vaddr);
> +	if (ret)
> +		return ret;
> +
> +	dev_priv->perf.oa.oa_buffer.vma = vma;
> +	dev_priv->perf.oa.oa_buffer.vaddr = vaddr;
> +
> +	dev_priv->perf.oa.ops.init_oa_buffer(dev_priv);
> +
> +	DRM_DEBUG_DRIVER("OA Buffer initialized, gtt offset = 0x%x, vaddr = %p",
> +			 i915_ggtt_offset(dev_priv->perf.oa.oa_buffer.vma),
> +			 dev_priv->perf.oa.oa_buffer.vaddr);
> +	return 0;
> +}
> +
> +static int alloc_cs_buffer(struct i915_perf_stream *stream)
> +{
> +	struct drm_i915_private *dev_priv = stream->dev_priv;
> +	struct i915_vma *vma;
> +	u8 *vaddr;
> +	int ret;
> +
> +	if (WARN_ON(stream->cs_buffer.vma))
> +		return -ENODEV;
> +
> +	ret = alloc_obj(dev_priv, &vma, &vaddr);
> +	if (ret)
> +		return ret;
> +
> +	stream->cs_buffer.vma = vma;
> +	stream->cs_buffer.vaddr = vaddr;
> +	if (WARN_ON(!list_empty(&stream->cs_samples)))
> +		INIT_LIST_HEAD(&stream->cs_samples);
> +
> +	DRM_DEBUG_DRIVER("Command stream buf initialized, gtt offset = 0x%x, vaddr = %p",
> +			 i915_ggtt_offset(stream->cs_buffer.vma),
> +			 stream->cs_buffer.vaddr);
> +
> +	return 0;
> +}
> +
>   static void config_oa_regs(struct drm_i915_private *dev_priv,
>   			   const struct i915_oa_reg *regs,
>   			   int n_regs)
> @@ -1859,6 +2446,10 @@ static void gen8_disable_metric_set(struct drm_i915_private *dev_priv)
>   
>   static void gen7_oa_enable(struct drm_i915_private *dev_priv)
>   {
> +	struct i915_perf_stream *stream;
> +	struct intel_engine_cs *engine = dev_priv->engine[RCS];
> +	int idx;
> +
>   	/*
>   	 * Reset buf pointers so we don't forward reports from before now.
>   	 *
> @@ -1870,11 +2461,11 @@ static void gen7_oa_enable(struct drm_i915_private *dev_priv)
>   	 */
>   	gen7_init_oa_buffer(dev_priv);
>   
> -	if (dev_priv->perf.oa.exclusive_stream->enabled) {
> -		struct i915_gem_context *ctx =
> -			dev_priv->perf.oa.exclusive_stream->ctx;
> -		u32 ctx_id = dev_priv->perf.oa.specific_ctx_id;
> -
> +	idx = srcu_read_lock(&engine->perf_srcu);
> +	stream = srcu_dereference(engine->exclusive_stream, &engine->perf_srcu);
> +	if (stream->state != I915_PERF_STREAM_DISABLED) {
> +		struct i915_gem_context *ctx = stream->ctx;
> +		u32 ctx_id = engine->specific_ctx_id;
>   		bool periodic = dev_priv->perf.oa.periodic;
>   		u32 period_exponent = dev_priv->perf.oa.period_exponent;
>   		u32 report_format = dev_priv->perf.oa.oa_buffer.format;
> @@ -1889,6 +2480,7 @@ static void gen7_oa_enable(struct drm_i915_private *dev_priv)
>   			   GEN7_OACONTROL_ENABLE);
>   	} else
>   		I915_WRITE(GEN7_OACONTROL, 0);
> +	srcu_read_unlock(&engine->perf_srcu, idx);
>   }
>   
>   static void gen8_oa_enable(struct drm_i915_private *dev_priv)
> @@ -1917,22 +2509,23 @@ static void gen8_oa_enable(struct drm_i915_private *dev_priv)
>   }
>   
>   /**
> - * i915_oa_stream_enable - handle `I915_PERF_IOCTL_ENABLE` for OA stream
> - * @stream: An i915 perf stream opened for OA metrics
> + * i915_perf_stream_enable - handle `I915_PERF_IOCTL_ENABLE` for perf stream
> + * @stream: An i915 perf stream opened for GPU metrics
>    *
>    * [Re]enables hardware periodic sampling according to the period configured
>    * when opening the stream. This also starts a hrtimer that will periodically
>    * check for data in the circular OA buffer for notifying userspace (e.g.
>    * during a read() or poll()).
>    */
> -static void i915_oa_stream_enable(struct i915_perf_stream *stream)
> +static void i915_perf_stream_enable(struct i915_perf_stream *stream)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   
> -	dev_priv->perf.oa.ops.oa_enable(dev_priv);
> +	if (stream->sample_flags & SAMPLE_OA_REPORT)
> +		dev_priv->perf.oa.ops.oa_enable(dev_priv);
>   
> -	if (dev_priv->perf.oa.periodic)
> -		hrtimer_start(&dev_priv->perf.oa.poll_check_timer,
> +	if (stream->cs_mode || dev_priv->perf.oa.periodic)
> +		hrtimer_start(&dev_priv->perf.poll_check_timer,
>   			      ns_to_ktime(POLL_PERIOD),
>   			      HRTIMER_MODE_REL_PINNED);
>   }
> @@ -1948,34 +2541,39 @@ static void gen8_oa_disable(struct drm_i915_private *dev_priv)
>   }
>   
>   /**
> - * i915_oa_stream_disable - handle `I915_PERF_IOCTL_DISABLE` for OA stream
> - * @stream: An i915 perf stream opened for OA metrics
> + * i915_perf_stream_disable - handle `I915_PERF_IOCTL_DISABLE` for perf stream
> + * @stream: An i915 perf stream opened for GPU metrics
>    *
>    * Stops the OA unit from periodically writing counter reports into the
>    * circular OA buffer. This also stops the hrtimer that periodically checks for
>    * data in the circular OA buffer, for notifying userspace.
>    */
> -static void i915_oa_stream_disable(struct i915_perf_stream *stream)
> +static void i915_perf_stream_disable(struct i915_perf_stream *stream)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
>   
> -	dev_priv->perf.oa.ops.oa_disable(dev_priv);
> +	if (stream->cs_mode || dev_priv->perf.oa.periodic)
> +		hrtimer_cancel(&dev_priv->perf.poll_check_timer);
> +
> +	if (stream->cs_mode)
> +		i915_perf_stream_release_samples(stream);
>   
> -	if (dev_priv->perf.oa.periodic)
> -		hrtimer_cancel(&dev_priv->perf.oa.poll_check_timer);
> +	if (stream->sample_flags & SAMPLE_OA_REPORT)
> +		dev_priv->perf.oa.ops.oa_disable(dev_priv);
>   }
>   
> -static const struct i915_perf_stream_ops i915_oa_stream_ops = {
> -	.destroy = i915_oa_stream_destroy,
> -	.enable = i915_oa_stream_enable,
> -	.disable = i915_oa_stream_disable,
> -	.wait_unlocked = i915_oa_wait_unlocked,
> -	.poll_wait = i915_oa_poll_wait,
> -	.read = i915_oa_read,
> +static const struct i915_perf_stream_ops perf_stream_ops = {
> +	.destroy = i915_perf_stream_destroy,
> +	.enable = i915_perf_stream_enable,
> +	.disable = i915_perf_stream_disable,
> +	.wait_unlocked = i915_perf_stream_wait_unlocked,
> +	.poll_wait = i915_perf_stream_poll_wait,
> +	.read = i915_perf_stream_read,
> +	.emit_sample_capture = i915_perf_stream_emit_sample_capture,
>   };
>   
>   /**
> - * i915_oa_stream_init - validate combined props for OA stream and init
> + * i915_perf_stream_init - validate combined props for stream and init
>    * @stream: An i915 perf stream
>    * @param: The open parameters passed to `DRM_I915_PERF_OPEN`
>    * @props: The property state that configures stream (individually validated)
> @@ -1984,58 +2582,35 @@ static void i915_oa_stream_disable(struct i915_perf_stream *stream)
>    * doesn't ensure that the combination necessarily makes sense.
>    *
>    * At this point it has been determined that userspace wants a stream of
> - * OA metrics, but still we need to further validate the combined
> + * perf metrics, but still we need to further validate the combined
>    * properties are OK.
>    *
>    * If the configuration makes sense then we can allocate memory for
> - * a circular OA buffer and apply the requested metric set configuration.
> + * a circular perf buffer and apply the requested metric set configuration.
>    *
>    * Returns: zero on success or a negative error code.
>    */
> -static int i915_oa_stream_init(struct i915_perf_stream *stream,
> +static int i915_perf_stream_init(struct i915_perf_stream *stream,
>   			       struct drm_i915_perf_open_param *param,
>   			       struct perf_open_properties *props)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
> -	int format_size;
> +	bool require_oa_unit = props->sample_flags & (SAMPLE_OA_REPORT |
> +						      SAMPLE_OA_SOURCE);
> +	bool cs_sample_data = props->sample_flags & SAMPLE_OA_REPORT;
> +	struct i915_perf_stream *curr_stream;
> +	struct intel_engine_cs *engine = NULL;
> +	int idx;
>   	int ret;
>   
> -	/* If the sysfs metrics/ directory wasn't registered for some
> -	 * reason then don't let userspace try their luck with config
> -	 * IDs
> -	 */
> -	if (!dev_priv->perf.metrics_kobj) {
> -		DRM_DEBUG("OA metrics weren't advertised via sysfs\n");
> -		return -EINVAL;
> -	}
> -
> -	if (!(props->sample_flags & SAMPLE_OA_REPORT)) {
> -		DRM_DEBUG("Only OA report sampling supported\n");
> -		return -EINVAL;
> -	}
> -
> -	if (!dev_priv->perf.oa.ops.init_oa_buffer) {
> -		DRM_DEBUG("OA unit not supported\n");
> -		return -ENODEV;
> -	}
> -
> -	/* To avoid the complexity of having to accurately filter
> -	 * counter reports and marshal to the appropriate client
> -	 * we currently only allow exclusive access
> -	 */
> -	if (dev_priv->perf.oa.exclusive_stream) {
> -		DRM_DEBUG("OA unit already in use\n");
> -		return -EBUSY;
> -	}
> -
> -	if (!props->metrics_set) {
> -		DRM_DEBUG("OA metric set not specified\n");
> -		return -EINVAL;
> -	}
> -
> -	if (!props->oa_format) {
> -		DRM_DEBUG("OA report format not specified\n");
> -		return -EINVAL;
> +	if ((props->sample_flags & SAMPLE_CTX_ID) && !props->cs_mode) {
> +		if (IS_HASWELL(dev_priv)) {
> +			DRM_ERROR("On HSW, context ID sampling only supported via command stream\n");
> +			return -EINVAL;
> +		} else if (!i915.enable_execlists) {
> +			DRM_ERROR("On Gen8+ without execlists, context ID sampling only supported via command stream\n");
> +			return -EINVAL;
> +		}
>   	}
>   
>   	/* We set up some ratelimit state to potentially throttle any _NOTES
> @@ -2060,70 +2635,167 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
>   
>   	stream->sample_size = sizeof(struct drm_i915_perf_record_header);
>   
> -	format_size = dev_priv->perf.oa.oa_formats[props->oa_format].size;
> +	if (require_oa_unit) {
> +		int format_size;
>   
> -	stream->sample_flags |= SAMPLE_OA_REPORT;
> -	stream->sample_size += format_size;
> +		/* If the sysfs metrics/ directory wasn't registered for some
> +		 * reason then don't let userspace try their luck with config
> +		 * IDs
> +		 */
> +		if (!dev_priv->perf.metrics_kobj) {
> +			DRM_DEBUG("OA metrics weren't advertised via sysfs\n");
> +			return -EINVAL;
> +		}
>   
> -	if (props->sample_flags & SAMPLE_OA_SOURCE) {
> -		stream->sample_flags |= SAMPLE_OA_SOURCE;
> -		stream->sample_size += 8;
> -	}
> +		if (!dev_priv->perf.oa.ops.init_oa_buffer) {
> +			DRM_DEBUG("OA unit not supported\n");
> +			return -ENODEV;
> +		}
>   
> -	dev_priv->perf.oa.oa_buffer.format_size = format_size;
> -	if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
> -		return -EINVAL;
> +		if (!props->metrics_set) {
> +			DRM_DEBUG("OA metric set not specified\n");
> +			return -EINVAL;
> +		}
> +
> +		if (!props->oa_format) {
> +			DRM_DEBUG("OA report format not specified\n");
> +			return -EINVAL;
> +		}
> +
> +		if (props->cs_mode && (props->engine != RCS)) {
> +			DRM_ERROR("Command stream OA metrics only available via Render CS\n");
> +			return -EINVAL;
> +		}
> +
> +		engine = dev_priv->engine[RCS];
> +		stream->using_oa = true;
> +
> +		idx = srcu_read_lock(&engine->perf_srcu);
> +		curr_stream = srcu_dereference(engine->exclusive_stream,
> +					       &engine->perf_srcu);
> +		if (curr_stream) {
> +			DRM_ERROR("Stream already opened\n");
> +			ret = -EINVAL;
> +			goto err_enable;
> +		}
> +		srcu_read_unlock(&engine->perf_srcu, idx);
> +
> +		format_size =
> +			dev_priv->perf.oa.oa_formats[props->oa_format].size;
> +
> +		if (props->sample_flags & SAMPLE_OA_REPORT) {
> +			stream->sample_flags |= SAMPLE_OA_REPORT;
> +			stream->sample_size += format_size;
> +		}
> +
> +		if (props->sample_flags & SAMPLE_OA_SOURCE) {
> +			if (!(props->sample_flags & SAMPLE_OA_REPORT)) {
> +				DRM_ERROR("OA source type can't be sampled without OA report\n");
> +				return -EINVAL;
> +			}
> +			stream->sample_flags |= SAMPLE_OA_SOURCE;
> +			stream->sample_size += 8;
> +		}
> +
> +		dev_priv->perf.oa.oa_buffer.format_size = format_size;
> +		if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
> +			return -EINVAL;
> +
> +		dev_priv->perf.oa.oa_buffer.format =
> +			dev_priv->perf.oa.oa_formats[props->oa_format].format;
> +
> +		dev_priv->perf.oa.metrics_set = props->metrics_set;
>   
> -	dev_priv->perf.oa.oa_buffer.format =
> -		dev_priv->perf.oa.oa_formats[props->oa_format].format;
> +		dev_priv->perf.oa.periodic = props->oa_periodic;
> +		if (dev_priv->perf.oa.periodic)
> +			dev_priv->perf.oa.period_exponent =
> +				props->oa_period_exponent;
>   
> -	dev_priv->perf.oa.metrics_set = props->metrics_set;
> +		if (stream->ctx) {
> +			ret = oa_get_render_ctx_id(stream);
> +			if (ret)
> +				return ret;
> +		}
>   
> -	dev_priv->perf.oa.periodic = props->oa_periodic;
> -	if (dev_priv->perf.oa.periodic)
> -		dev_priv->perf.oa.period_exponent = props->oa_period_exponent;
> +		/* PRM - observability performance counters:
> +		 *
> +		 *   OACONTROL, performance counter enable, note:
> +		 *
> +		 *   "When this bit is set, in order to have coherent counts,
> +		 *   RC6 power state and trunk clock gating must be disabled.
> +		 *   This can be achieved by programming MMIO registers as
> +		 *   0xA094=0 and 0xA090[31]=1"
> +		 *
> +		 *   In our case we are expecting that taking pm + FORCEWAKE
> +		 *   references will effectively disable RC6.
> +		 */
> +		intel_runtime_pm_get(dev_priv);
> +		intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
>   
> -	if (stream->ctx) {
> -		ret = oa_get_render_ctx_id(stream);
> +		ret = alloc_oa_buffer(dev_priv);
>   		if (ret)
> -			return ret;
> +			goto err_oa_buf_alloc;
> +
> +		ret = dev_priv->perf.oa.ops.enable_metric_set(dev_priv);
> +		if (ret)
> +			goto err_enable;
>   	}
>   
> -	/* PRM - observability performance counters:
> -	 *
> -	 *   OACONTROL, performance counter enable, note:
> -	 *
> -	 *   "When this bit is set, in order to have coherent counts,
> -	 *   RC6 power state and trunk clock gating must be disabled.
> -	 *   This can be achieved by programming MMIO registers as
> -	 *   0xA094=0 and 0xA090[31]=1"
> -	 *
> -	 *   In our case we are expecting that taking pm + FORCEWAKE
> -	 *   references will effectively disable RC6.
> -	 */
> -	intel_runtime_pm_get(dev_priv);
> -	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
> +	if (props->sample_flags & SAMPLE_CTX_ID) {
> +		stream->sample_flags |= SAMPLE_CTX_ID;
> +		stream->sample_size += 8;
> +	}
>   
> -	ret = alloc_oa_buffer(dev_priv);
> -	if (ret)
> -		goto err_oa_buf_alloc;
> +	if (props->cs_mode) {
> +		if (!cs_sample_data) {
> +			DRM_ERROR("Stream engine given without requesting any CS data to sample\n");
> +			ret = -EINVAL;
> +			goto err_enable;
> +		}
>   
> -	ret = dev_priv->perf.oa.ops.enable_metric_set(dev_priv);
> -	if (ret)
> -		goto err_enable;
> +		if (!(props->sample_flags & SAMPLE_CTX_ID)) {
> +			DRM_ERROR("Stream engine given without requesting any CS specific property\n");
> +			ret = -EINVAL;
> +			goto err_enable;
> +		}
>   
> -	stream->ops = &i915_oa_stream_ops;
> +		engine = dev_priv->engine[props->engine];
>   
> -	dev_priv->perf.oa.exclusive_stream = stream;
> +		idx = srcu_read_lock(&engine->perf_srcu);
> +		curr_stream = srcu_dereference(engine->exclusive_stream,
> +					       &engine->perf_srcu);
> +		if (curr_stream) {
> +			DRM_ERROR("Stream already opened\n");
> +			ret = -EINVAL;
> +			goto err_enable;
> +		}
> +		srcu_read_unlock(&engine->perf_srcu, idx);
> +
> +		INIT_LIST_HEAD(&stream->cs_samples);
> +		ret = alloc_cs_buffer(stream);
> +		if (ret)
> +			goto err_enable;
> +
> +		stream->cs_mode = true;
> +	}
> +
> +	init_waitqueue_head(&stream->poll_wq);
> +	stream->pollin = false;
> +	stream->ops = &perf_stream_ops;
> +	stream->engine = engine;
> +	rcu_assign_pointer(engine->exclusive_stream, stream);
>   
>   	return 0;
>   
>   err_enable:
> -	free_oa_buffer(dev_priv);
> +	if (require_oa_unit)
> +		free_oa_buffer(dev_priv);
>   
>   err_oa_buf_alloc:
> -	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> -	intel_runtime_pm_put(dev_priv);
> +	if (require_oa_unit) {
> +		intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> +		intel_runtime_pm_put(dev_priv);
> +	}
>   	if (stream->ctx)
>   		oa_put_render_ctx_id(stream);
>   
> @@ -2219,7 +2891,7 @@ static ssize_t i915_perf_read(struct file *file,
>   	 * disabled stream as an error. In particular it might otherwise lead
>   	 * to a deadlock for blocking file descriptors...
>   	 */
> -	if (!stream->enabled)
> +	if (stream->state == I915_PERF_STREAM_DISABLED)
>   		return -EIO;
>   
>   	if (!(file->f_flags & O_NONBLOCK)) {
> @@ -2254,25 +2926,32 @@ static ssize_t i915_perf_read(struct file *file,
>   	 * effectively ensures we back off until the next hrtimer callback
>   	 * before reporting another POLLIN event.
>   	 */
> -	if (ret >= 0 || ret == -EAGAIN) {
> -		/* Maybe make ->pollin per-stream state if we support multiple
> -		 * concurrent streams in the future.
> -		 */
> -		dev_priv->perf.oa.pollin = false;
> -	}
> +	if (ret >= 0 || ret == -EAGAIN)
> +		stream->pollin = false;
>   
>   	return ret;
>   }
>   
> -static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer)
> +static enum hrtimer_restart poll_check_timer_cb(struct hrtimer *hrtimer)
>   {
> +	struct i915_perf_stream *stream;
>   	struct drm_i915_private *dev_priv =
>   		container_of(hrtimer, typeof(*dev_priv),
> -			     perf.oa.poll_check_timer);
> -
> -	if (oa_buffer_check_unlocked(dev_priv)) {
> -		dev_priv->perf.oa.pollin = true;
> -		wake_up(&dev_priv->perf.oa.poll_wq);
> +			     perf.poll_check_timer);
> +	int idx;
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +
> +	for_each_engine(engine, dev_priv, id) {
> +		idx = srcu_read_lock(&engine->perf_srcu);
> +		stream = srcu_dereference(engine->exclusive_stream,
> +					  &engine->perf_srcu);
> +		if (stream && (stream->state == I915_PERF_STREAM_ENABLED) &&
> +		    stream_have_data_unlocked(stream)) {
> +			stream->pollin = true;
> +			wake_up(&stream->poll_wq);
> +		}
> +		srcu_read_unlock(&engine->perf_srcu, idx);
>   	}
>   
>   	hrtimer_forward_now(hrtimer, ns_to_ktime(POLL_PERIOD));
> @@ -2311,7 +2990,7 @@ static unsigned int i915_perf_poll_locked(struct drm_i915_private *dev_priv,
>   	 * the hrtimer/oa_poll_check_timer_cb to notify us when there are
>   	 * samples to read.
>   	 */
> -	if (dev_priv->perf.oa.pollin)
> +	if (stream->pollin)
>   		events |= POLLIN;
>   
>   	return events;
> @@ -2355,14 +3034,16 @@ static unsigned int i915_perf_poll(struct file *file, poll_table *wait)
>    */
>   static void i915_perf_enable_locked(struct i915_perf_stream *stream)
>   {
> -	if (stream->enabled)
> +	if (stream->state != I915_PERF_STREAM_DISABLED)
>   		return;
>   
>   	/* Allow stream->ops->enable() to refer to this */
> -	stream->enabled = true;
> +	stream->state = I915_PERF_STREAM_ENABLE_IN_PROGRESS;
>   
>   	if (stream->ops->enable)
>   		stream->ops->enable(stream);
> +
> +	stream->state = I915_PERF_STREAM_ENABLED;
>   }
>   
>   /**
> @@ -2381,11 +3062,11 @@ static void i915_perf_enable_locked(struct i915_perf_stream *stream)
>    */
>   static void i915_perf_disable_locked(struct i915_perf_stream *stream)
>   {
> -	if (!stream->enabled)
> +	if (stream->state != I915_PERF_STREAM_ENABLED)
>   		return;
>   
>   	/* Allow stream->ops->disable() to refer to this */
> -	stream->enabled = false;
> +	stream->state = I915_PERF_STREAM_DISABLED;
>   
>   	if (stream->ops->disable)
>   		stream->ops->disable(stream);
> @@ -2457,14 +3138,12 @@ static long i915_perf_ioctl(struct file *file,
>    */
>   static void i915_perf_destroy_locked(struct i915_perf_stream *stream)
>   {
> -	if (stream->enabled)
> +	if (stream->state == I915_PERF_STREAM_ENABLED)
>   		i915_perf_disable_locked(stream);
>   
>   	if (stream->ops->destroy)
>   		stream->ops->destroy(stream);
>   
> -	list_del(&stream->link);
> -
>   	if (stream->ctx)
>   		i915_gem_context_put(stream->ctx);
>   
> @@ -2524,7 +3203,7 @@ static int i915_perf_release(struct inode *inode, struct file *file)
>    *
>    * In the case where userspace is interested in OA unit metrics then further
>    * config validation and stream initialization details will be handled by
> - * i915_oa_stream_init(). The code here should only validate config state that
> + * i915_perf_stream_init(). The code here should only validate config state that
>    * will be relevant to all stream types / backends.
>    *
>    * Returns: zero on success or a negative error code.
> @@ -2593,7 +3272,7 @@ static int i915_perf_release(struct inode *inode, struct file *file)
>   	stream->dev_priv = dev_priv;
>   	stream->ctx = specific_ctx;
>   
> -	ret = i915_oa_stream_init(stream, param, props);
> +	ret = i915_perf_stream_init(stream, param, props);
>   	if (ret)
>   		goto err_alloc;
>   
> @@ -2606,8 +3285,6 @@ static int i915_perf_release(struct inode *inode, struct file *file)
>   		goto err_flags;
>   	}
>   
> -	list_add(&stream->link, &dev_priv->perf.streams);
> -
>   	if (param->flags & I915_PERF_FLAG_FD_CLOEXEC)
>   		f_flags |= O_CLOEXEC;
>   	if (param->flags & I915_PERF_FLAG_FD_NONBLOCK)
> @@ -2625,7 +3302,6 @@ static int i915_perf_release(struct inode *inode, struct file *file)
>   	return stream_fd;
>   
>   err_open:
> -	list_del(&stream->link);
>   err_flags:
>   	if (stream->ops->destroy)
>   		stream->ops->destroy(stream);
> @@ -2774,6 +3450,29 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
>   		case DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE:
>   			props->sample_flags |= SAMPLE_OA_SOURCE;
>   			break;
> +		case DRM_I915_PERF_PROP_ENGINE: {
> +				unsigned int user_ring_id =
> +					value & I915_EXEC_RING_MASK;
> +				enum intel_engine_id engine;
> +
> +				if (user_ring_id > I915_USER_RINGS)
> +					return -EINVAL;
> +
> +				/* XXX: Currently only RCS is supported.
> +				 * Remove this check when support for other
> +				 * engines is added
> +				 */
> +				engine = user_ring_map[user_ring_id];
> +				if (engine != RCS)
> +					return -EINVAL;
> +
> +				props->cs_mode = true;
> +				props->engine = engine;
> +			}
> +			break;
> +		case DRM_I915_PERF_PROP_SAMPLE_CTX_ID:
> +			props->sample_flags |= SAMPLE_CTX_ID;
> +			break;
>   		case DRM_I915_PERF_PROP_MAX:
>   			MISSING_CASE(id);
>   			return -EINVAL;
> @@ -3002,6 +3701,30 @@ void i915_perf_unregister(struct drm_i915_private *dev_priv)
>   	{}
>   };
>   
> +void i915_perf_streams_mark_idle(struct drm_i915_private *dev_priv)
> +{
> +	struct intel_engine_cs *engine;
> +	struct i915_perf_stream *stream;
> +	enum intel_engine_id id;
> +	int idx;
> +
> +	for_each_engine(engine, dev_priv, id) {
> +		idx = srcu_read_lock(&engine->perf_srcu);
> +		stream = srcu_dereference(engine->exclusive_stream,
> +					  &engine->perf_srcu);
> +		if (stream && (stream->state == I915_PERF_STREAM_ENABLED) &&
> +					stream->cs_mode) {
> +			struct reservation_object *resv =
> +						stream->cs_buffer.vma->resv;
> +
> +			reservation_object_lock(resv, NULL);
> +			reservation_object_add_excl_fence(resv, NULL);
> +			reservation_object_unlock(resv);
> +		}
> +		srcu_read_unlock(&engine->perf_srcu, idx);
> +	}
> +}
> +
>   /**
>    * i915_perf_init - initialize i915-perf state on module load
>    * @dev_priv: i915 device instance
> @@ -3125,12 +3848,10 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
>   	}
>   
>   	if (dev_priv->perf.oa.n_builtin_sets) {
> -		hrtimer_init(&dev_priv->perf.oa.poll_check_timer,
> +		hrtimer_init(&dev_priv->perf.poll_check_timer,
>   				CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> -		dev_priv->perf.oa.poll_check_timer.function = oa_poll_check_timer_cb;
> -		init_waitqueue_head(&dev_priv->perf.oa.poll_wq);
> +		dev_priv->perf.poll_check_timer.function = poll_check_timer_cb;
>   
> -		INIT_LIST_HEAD(&dev_priv->perf.streams);
>   		mutex_init(&dev_priv->perf.lock);
>   		spin_lock_init(&dev_priv->perf.oa.oa_buffer.ptr_lock);
>   
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 9ab5969..1a2e843 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -317,6 +317,10 @@ int intel_engines_init(struct drm_i915_private *dev_priv)
>   			goto cleanup;
>   
>   		GEM_BUG_ON(!engine->submit_request);
> +
> +		/* Perf stream related initialization for Engine */
> +		rcu_assign_pointer(engine->exclusive_stream, NULL);
> +		init_srcu_struct(&engine->perf_srcu);
>   	}
>   
>   	return 0;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index cdf084e..4333623 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1622,6 +1622,8 @@ void intel_engine_cleanup(struct intel_engine_cs *engine)
>   
>   	intel_engine_cleanup_common(engine);
>   
> +	cleanup_srcu_struct(&engine->perf_srcu);
> +
>   	dev_priv->engine[engine->id] = NULL;
>   	kfree(engine);
>   }
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index d33c934..0ac8491 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -441,6 +441,11 @@ struct intel_engine_cs {
>   	 * certain bits to encode the command length in the header).
>   	 */
>   	u32 (*get_cmd_length_mask)(u32 cmd_header);
> +
> +	/* Global per-engine stream */
> +	struct srcu_struct perf_srcu;
> +	struct i915_perf_stream __rcu *exclusive_stream;
> +	u32 specific_ctx_id;
>   };
>   
>   static inline unsigned int
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index a1314c5..768b1a5 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1350,6 +1350,7 @@ enum drm_i915_oa_format {
>   
>   enum drm_i915_perf_sample_oa_source {
>   	I915_PERF_SAMPLE_OA_SOURCE_OABUFFER,
> +	I915_PERF_SAMPLE_OA_SOURCE_CS,
>   	I915_PERF_SAMPLE_OA_SOURCE_MAX	/* non-ABI */
>   };
>   
> @@ -1394,6 +1395,19 @@ enum drm_i915_perf_property_id {
>   	 */
>   	DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE,
>   
> +	/**
> +	 * The value of this property specifies the GPU engine for which
> +	 * the samples need to be collected. Specifying this property also
> +	 * implies the command stream based sample collection.
> +	 */
> +	DRM_I915_PERF_PROP_ENGINE,
> +
> +	/**
> +	 * The value of this property set to 1 requests inclusion of context ID
> +	 * in the perf sample data.
> +	 */
> +	DRM_I915_PERF_PROP_SAMPLE_CTX_ID,
> +
>   	DRM_I915_PERF_PROP_MAX /* non-ABI */
>   };
>   
> @@ -1460,6 +1474,7 @@ enum drm_i915_perf_record_type {
>   	 *     struct drm_i915_perf_record_header header;
>   	 *
>   	 *     { u64 source; } && DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE
> +	 *     { u64 ctx_id; } && DRM_I915_PERF_PROP_SAMPLE_CTX_ID
>   	 *     { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
>   	 * };
>   	 */


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.
  2017-08-01  9:29     ` Kamble, Sagar A
@ 2017-08-01 18:05       ` sourab gupta
  2017-08-01 20:58         ` Lionel Landwerlin
  0 siblings, 1 reply; 34+ messages in thread
From: sourab gupta @ 2017-08-01 18:05 UTC (permalink / raw)
  To: Kamble, Sagar A; +Cc: intel-gfx, Sourab Gupta


[-- Attachment #1.1: Type: text/plain, Size: 94019 bytes --]

On Tue, Aug 1, 2017 at 2:59 PM, Kamble, Sagar A <sagar.a.kamble@intel.com>
wrote:

>
>
> -----Original Message-----
> From: Landwerlin, Lionel G
> Sent: Monday, July 31, 2017 9:16 PM
> To: Kamble, Sagar A <sagar.a.kamble@intel.com>;
> intel-gfx@lists.freedesktop.org
> Cc: Sourab Gupta <sourab.gupta@intel.com>
> Subject: Re: [Intel-gfx] [PATCH 03/12] drm/i915: Framework for capturing
> command stream based OA reports and ctx id info.
>
> On 31/07/17 08:59, Sagar Arun Kamble wrote:
> > From: Sourab Gupta <sourab.gupta@intel.com>
> >
> > This patch introduces a framework to capture OA counter reports
> associated
> > with Render command stream. We can then associate the reports captured
> > through this mechanism with their corresponding context id's. This can be
> > further extended to associate any other metadata information with the
> > corresponding samples (since the association with Render command stream
> > gives us the ability to capture these information while inserting the
> > corresponding capture commands into the command stream).
> >
> > The OA reports generated in this way are associated with a corresponding
> > workload, and thus can be used the delimit the workload (i.e. sample the
> > counters at the workload boundaries), within an ongoing stream of
> periodic
> > counter snapshots.
> >
> > There may be usecases wherein we need more than periodic OA capture mode
> > which is supported currently. This mode is primarily used for two
> usecases:
> >      - Ability to capture system wide metrics, alongwith the ability to
> map
> >        the reports back to individual contexts (particularly for HSW).
> >      - Ability to inject tags for work, into the reports. This provides
> >        visibility into the multiple stages of work within single context.
> >
> > The userspace will be able to distinguish between the periodic and CS
> based
> > OA reports by the virtue of source_info sample field.
> >
> > The command MI_REPORT_PERF_COUNT can be used to capture snapshots of OA
> > counters, and is inserted at BB boundaries.
> > The data thus captured will be stored in a separate buffer, which will
> > be different from the buffer used otherwise for periodic OA capture mode.
> > The metadata information pertaining to snapshot is maintained in a list,
> > which also has offsets into the gem buffer object per captured snapshot.
> > In order to track whether the gpu has completed processing the node,
> > a field pertaining to corresponding gem request is added, which is
> tracked
> > for completion of the command.
> >
> > Both periodic and CS based reports are associated with a single stream
> > (corresponding to render engine), and it is expected to have the samples
> > in the sequential order according to their timestamps. Now, since these
> > reports are collected in separate buffers, these are merge sorted at the
> > time of forwarding to userspace during the read call.
> >
> > v2: Aligning with the non-perf interface (custom drm ioctl based). Also,
> > few related patches are squashed together for better readability
> >
> > v3: Updated perf sample capture emit hook name. Reserving space upfront
> > in the ring for emitting sample capture commands and using
> > req->fence.seqno for tracking samples. Added SRCU protection for streams.
> > Changed the stream last_request tracking to resv object. (Chris)
> > Updated perf.sample_lock spin_lock usage to avoid softlockups. Moved
> > stream to global per-engine structure. (Sagar)
> > Update unpin and put in the free routines to i915_vma_unpin_and_release.
> > Making use of perf stream cs_buffer vma resv instead of separate resv
> obj.
> > Pruned perf stream vma resv during gem_idle. (Chris)
> > Changed payload field ctx_id to u64 to keep all sample data aligned at 8
> > bytes. (Lionel)
> > stall/flush prior to sample capture is not added. Do we need to give this
> > control to user to select whether to stall/flush at each sample?
> >
> > Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> > Signed-off-by: Robert Bragg <robert@sixbynine.org>
> > Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_drv.h            |  101 ++-
> >   drivers/gpu/drm/i915/i915_gem.c            |    1 +
> >   drivers/gpu/drm/i915/i915_gem_execbuffer.c |    8 +
> >   drivers/gpu/drm/i915/i915_perf.c           | 1185
> ++++++++++++++++++++++------
> >   drivers/gpu/drm/i915/intel_engine_cs.c     |    4 +
> >   drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +
> >   drivers/gpu/drm/i915/intel_ringbuffer.h    |    5 +
> >   include/uapi/drm/i915_drm.h                |   15 +
> >   8 files changed, 1073 insertions(+), 248 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> b/drivers/gpu/drm/i915/i915_drv.h
> > index 2c7456f..8b1cecf 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1985,6 +1985,24 @@ struct i915_perf_stream_ops {
> >        * The stream will always be disabled before this is called.
> >        */
> >       void (*destroy)(struct i915_perf_stream *stream);
> > +
> > +     /*
> > +      * @emit_sample_capture: Emit the commands in the command streamer
> > +      * for a particular gpu engine.
> > +      *
> > +      * The commands are inserted to capture the perf sample data at
> > +      * specific points during workload execution, such as before and
> after
> > +      * the batch buffer.
> > +      */
> > +     void (*emit_sample_capture)(struct i915_perf_stream *stream,
> > +                                 struct drm_i915_gem_request *request,
> > +                                 bool preallocate);
> > +};
> > +
>
> It seems the motivation for this following enum is mostly to deal with
> the fact that engine->perf_srcu is set before the OA unit is configured.
> Would it possible to set it later so that we get rid of the enum?
>
> <Sagar> I will try to make this as just binary state. This enum is
> defining the state of the stream. I too got confused with purpose of
> IN_PROGRESS.
> SRCU is used for synchronizing stream state check.
> IN_PROGRESS will enable us to not advertently try to access the stream vma
> for inserting the samples, but I guess depending on disabled/enabled should
> suffice.
>

Hi Sagar/Lionel,

The purpose of the tristate was to workaround a particular kludge of
working with just enabled/disabled boolean state. I'll explain below.

Let's say we have only boolean state.
i915_perf_emit_sample_capture() function would depend on
stream->enabled in order to insert the MI_RPC command in RCS.
If you see i915_perf_enable_locked(), stream->enabled is set before
stream->ops->enable(). The stream->ops->enable() function actually
enables the OA hardware to capture reports, and if MI_RPC commands
are submitted before OA hw is enabled, it may hang the gpu.

Also, we can't change the order of calling these operations inside
i915_perf_enable_locked() since gen7_update_oacontrol_locked()
function depends on stream->enabled flag to enable the OA
hw unit (i.e. it needs the flag to be true).

To workaround this problem, I introduced a tristate here.
If you can suggest some alternate solution to this problem,
we can remove this tristate kludge here.

Regards,
Sourab


> > +enum i915_perf_stream_state {
> > +     I915_PERF_STREAM_DISABLED,
> > +     I915_PERF_STREAM_ENABLE_IN_PROGRESS,
> > +     I915_PERF_STREAM_ENABLED,
> >   };
> >
> >   /**
> > @@ -1997,9 +2015,9 @@ struct i915_perf_stream {
> >       struct drm_i915_private *dev_priv;
> >
> >       /**
> > -      * @link: Links the stream into ``&drm_i915_private->streams``
> > +      * @engine: Engine to which this stream corresponds.
> >        */
> > -     struct list_head link;
> > +     struct intel_engine_cs *engine;
>
> This series only supports cs_mode on the RCS command stream.
> Does it really make sense to add an srcu on all the engines rather than
> keeping it part of dev_priv->perf ?
>
> We can always add that later if needed.
>
> <sagar> Yes. Will change this.
> >
> >       /**
> >        * @sample_flags: Flags representing the
> `DRM_I915_PERF_PROP_SAMPLE_*`
> > @@ -2022,17 +2040,41 @@ struct i915_perf_stream {
> >       struct i915_gem_context *ctx;
> >
> >       /**
> > -      * @enabled: Whether the stream is currently enabled, considering
> > -      * whether the stream was opened in a disabled state and based
> > -      * on `I915_PERF_IOCTL_ENABLE` and `I915_PERF_IOCTL_DISABLE` calls.
> > +      * @state: Current stream state, which can be either disabled,
> enabled,
> > +      * or enable_in_progress, while considering whether the stream was
> > +      * opened in a disabled state and based on
> `I915_PERF_IOCTL_ENABLE` and
> > +      * `I915_PERF_IOCTL_DISABLE` calls.
> >        */
> > -     bool enabled;
> > +     enum i915_perf_stream_state state;
> > +
> > +     /**
> > +      * @cs_mode: Whether command stream based perf sample collection is
> > +      * enabled for this stream
> > +      */
> > +     bool cs_mode;
> > +
> > +     /**
> > +      * @using_oa: Whether OA unit is in use for this particular stream
> > +      */
> > +     bool using_oa;
> >
> >       /**
> >        * @ops: The callbacks providing the implementation of this
> specific
> >        * type of configured stream.
> >        */
> >       const struct i915_perf_stream_ops *ops;
> > +
> > +     /* Command stream based perf data buffer */
> > +     struct {
> > +             struct i915_vma *vma;
> > +             u8 *vaddr;
> > +     } cs_buffer;
> > +
> > +     struct list_head cs_samples;
> > +     spinlock_t cs_samples_lock;
> > +
> > +     wait_queue_head_t poll_wq;
> > +     bool pollin;
> >   };
> >
> >   /**
> > @@ -2095,7 +2137,8 @@ struct i915_oa_ops {
> >       int (*read)(struct i915_perf_stream *stream,
> >                   char __user *buf,
> >                   size_t count,
> > -                 size_t *offset);
> > +                 size_t *offset,
> > +                 u32 ts);
> >
> >       /**
> >        * @oa_hw_tail_read: read the OA tail pointer register
> > @@ -2107,6 +2150,36 @@ struct i915_oa_ops {
> >       u32 (*oa_hw_tail_read)(struct drm_i915_private *dev_priv);
> >   };
> >
> > +/*
> > + * i915_perf_cs_sample - Sample element to hold info about a single perf
> > + * sample data associated with a particular GPU command stream.
> > + */
> > +struct i915_perf_cs_sample {
> > +     /**
> > +      * @link: Links the sample into ``&stream->cs_samples``
> > +      */
> > +     struct list_head link;
> > +
> > +     /**
> > +      * @request: GEM request associated with the sample. The commands
> to
> > +      * capture the perf metrics are inserted into the command streamer
> in
> > +      * context of this request.
> > +      */
> > +     struct drm_i915_gem_request *request;
> > +
> > +     /**
> > +      * @offset: Offset into ``&stream->cs_buffer``
> > +      * where the perf metrics will be collected, when the commands
> inserted
> > +      * into the command stream are executed by GPU.
> > +      */
> > +     u32 offset;
> > +
> > +     /**
> > +      * @ctx_id: Context ID associated with this perf sample
> > +      */
> > +     u32 ctx_id;
> > +};
> > +
> >   struct intel_cdclk_state {
> >       unsigned int cdclk, vco, ref;
> >   };
> > @@ -2431,17 +2504,10 @@ struct drm_i915_private {
> >               struct ctl_table_header *sysctl_header;
> >
> >               struct mutex lock;
> > -             struct list_head streams;
> > -
> > -             struct {
> > -                     struct i915_perf_stream *exclusive_stream;
> >
> > -                     u32 specific_ctx_id;
> > -
> > -                     struct hrtimer poll_check_timer;
> > -                     wait_queue_head_t poll_wq;
> > -                     bool pollin;
> > +             struct hrtimer poll_check_timer;
> >
> > +             struct {
> >                       /**
> >                        * For rate limiting any notifications of spurious
> >                        * invalid OA reports
> > @@ -3636,6 +3702,8 @@ int i915_perf_open_ioctl(struct drm_device *dev,
> void *data,
> >   void i915_oa_init_reg_state(struct intel_engine_cs *engine,
> >                           struct i915_gem_context *ctx,
> >                           uint32_t *reg_state);
> > +void i915_perf_emit_sample_capture(struct drm_i915_gem_request *req,
> > +                                bool preallocate);
> >
> >   /* i915_gem_evict.c */
> >   int __must_check i915_gem_evict_something(struct i915_address_space
> *vm,
> > @@ -3795,6 +3863,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs
> *engine,
> >   /* i915_perf.c */
> >   extern void i915_perf_init(struct drm_i915_private *dev_priv);
> >   extern void i915_perf_fini(struct drm_i915_private *dev_priv);
> > +extern void i915_perf_streams_mark_idle(struct drm_i915_private
> *dev_priv);
> >   extern void i915_perf_register(struct drm_i915_private *dev_priv);
> >   extern void i915_perf_unregister(struct drm_i915_private *dev_priv);
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c
> b/drivers/gpu/drm/i915/i915_gem.c
> > index 000a764..7b01548 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3220,6 +3220,7 @@ bool i915_gem_unset_wedged(struct drm_i915_private
> *i915)
> >
> >       intel_engines_mark_idle(dev_priv);
> >       i915_gem_timelines_mark_idle(dev_priv);
> > +     i915_perf_streams_mark_idle(dev_priv);
> >
> >       GEM_BUG_ON(!dev_priv->gt.awake);
> >       dev_priv->gt.awake = false;
> > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > index 5fa4476..bfe546b 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > @@ -1194,12 +1194,16 @@ static int __reloc_gpu_alloc(struct
> i915_execbuffer *eb,
> >       if (err)
> >               goto err_request;
> >
> > +     i915_perf_emit_sample_capture(rq, true);
> > +
> >       err = eb->engine->emit_bb_start(rq,
> >                                       batch->node.start, PAGE_SIZE,
> >                                       cache->gen > 5 ? 0 :
> I915_DISPATCH_SECURE);
> >       if (err)
> >               goto err_request;
> >
> > +     i915_perf_emit_sample_capture(rq, false);
> > +
> >       GEM_BUG_ON(!reservation_object_test_signaled_rcu(batch->resv,
> true));
> >       i915_vma_move_to_active(batch, rq, 0);
> >       reservation_object_lock(batch->resv, NULL);
> > @@ -2029,6 +2033,8 @@ static int eb_submit(struct i915_execbuffer *eb)
> >                       return err;
> >       }
> >
> > +     i915_perf_emit_sample_capture(eb->request, true);
> > +
> >       err = eb->engine->emit_bb_start(eb->request,
> >                                       eb->batch->node.start +
> >                                       eb->batch_start_offset,
> > @@ -2037,6 +2043,8 @@ static int eb_submit(struct i915_execbuffer *eb)
> >       if (err)
> >               return err;
> >
> > +     i915_perf_emit_sample_capture(eb->request, false);
> > +
> >       return 0;
> >   }
> >
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c
> b/drivers/gpu/drm/i915/i915_perf.c
> > index b272653..57e1936 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -193,6 +193,7 @@
> >
> >   #include <linux/anon_inodes.h>
> >   #include <linux/sizes.h>
> > +#include <linux/srcu.h>
> >
> >   #include "i915_drv.h"
> >   #include "i915_oa_hsw.h"
> > @@ -288,6 +289,12 @@
> >   #define OAREPORT_REASON_CTX_SWITCH     (1<<3)
> >   #define OAREPORT_REASON_CLK_RATIO      (1<<5)
> >
> > +/* Data common to periodic and RCS based OA samples */
> > +struct i915_perf_sample_data {
> > +     u64 source;
> > +     u64 ctx_id;
> > +     const u8 *report;
> > +};
> >
> >   /* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate
> >    *
> > @@ -328,8 +335,19 @@
> >       [I915_OA_FORMAT_C4_B8]              = { 7, 64 },
> >   };
> >
> > +/* Duplicated from similar static enum in i915_gem_execbuffer.c */
> > +#define I915_USER_RINGS (4)
> > +static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {
> > +     [I915_EXEC_DEFAULT]     = RCS,
> > +     [I915_EXEC_RENDER]      = RCS,
> > +     [I915_EXEC_BLT]         = BCS,
> > +     [I915_EXEC_BSD]         = VCS,
> > +     [I915_EXEC_VEBOX]       = VECS
> > +};
> > +
> >   #define SAMPLE_OA_REPORT      (1<<0)
> >   #define SAMPLE_OA_SOURCE      (1<<1)
> > +#define SAMPLE_CTX_ID              (1<<2)
> >
> >   /**
> >    * struct perf_open_properties - for validated properties given to
> open a stream
> > @@ -340,6 +358,9 @@
> >    * @oa_format: An OA unit HW report format
> >    * @oa_periodic: Whether to enable periodic OA unit sampling
> >    * @oa_period_exponent: The OA unit sampling period is derived from
> this
> > + * @cs_mode: Whether the stream is configured to enable collection of
> metrics
> > + * associated with command stream of a particular GPU engine
> > + * @engine: The GPU engine associated with the stream in case cs_mode
> is enabled
> >    *
> >    * As read_properties_unlocked() enumerates and validates the
> properties given
> >    * to open a stream of metrics the configuration is built up in the
> structure
> > @@ -356,6 +377,10 @@ struct perf_open_properties {
> >       int oa_format;
> >       bool oa_periodic;
> >       int oa_period_exponent;
> > +
> > +     /* Command stream mode */
> > +     bool cs_mode;
> > +     enum intel_engine_id engine;
> >   };
> >
> >   static u32 gen8_oa_hw_tail_read(struct drm_i915_private *dev_priv)
> > @@ -371,6 +396,266 @@ static u32 gen7_oa_hw_tail_read(struct
> drm_i915_private *dev_priv)
> >   }
> >
> >   /**
> > + * i915_perf_emit_sample_capture - Insert the commands to capture
> metrics into
> > + * the command stream of a GPU engine.
> > + * @request: request in whose context the metrics are being collected.
> > + * @preallocate: allocate space in ring for related sample.
> > + *
> > + * The function provides a hook through which the commands to capture
> perf
> > + * metrics, are inserted into the command stream of a GPU engine.
> > + */
> > +void i915_perf_emit_sample_capture(struct drm_i915_gem_request
> *request,
> > +                                bool preallocate)
> > +{
> > +     struct intel_engine_cs *engine = request->engine;
> > +     struct drm_i915_private *dev_priv = engine->i915;
> > +     struct i915_perf_stream *stream;
> > +     int idx;
> > +
> > +     if (!dev_priv->perf.initialized)
> > +             return;
> > +
> > +     idx = srcu_read_lock(&engine->perf_srcu);
> > +     stream = srcu_dereference(engine->exclusive_stream,
> &engine->perf_srcu);
> > +     if (stream && (stream->state == I915_PERF_STREAM_ENABLED) &&
> > +                             stream->cs_mode)
> > +             stream->ops->emit_sample_capture(stream, request,
> > +                                              preallocate);
> > +     srcu_read_unlock(&engine->perf_srcu, idx);
> > +}
> > +
> > +/**
> > + * release_perf_samples - Release old perf samples to make space for new
> > + * sample data.
> > + * @stream: Stream from which space is to be freed up.
> > + * @target_size: Space required to be freed up.
> > + *
> > + * We also dereference the associated request before deleting the
> sample.
> > + * Also, no need to check whether the commands associated with old
> samples
> > + * have been completed. This is because these sample entries are
> anyways going
> > + * to be replaced by a new sample, and gpu will eventually overwrite
> the buffer
> > + * contents, when the request associated with new sample completes.
> > + */
> > +static void release_perf_samples(struct i915_perf_stream *stream,
> > +                              u32 target_size)
> > +{
> > +     struct drm_i915_private *dev_priv = stream->dev_priv;
> > +     struct i915_perf_cs_sample *sample, *next;
> > +     u32 sample_size = dev_priv->perf.oa.oa_buffer.format_size;
> > +     u32 size = 0;
> > +
> > +     list_for_each_entry_safe
> > +             (sample, next, &stream->cs_samples, link) {
> > +             size += sample_size;
> > +             i915_gem_request_put(sample->request);
> > +             list_del(&sample->link);
> > +             kfree(sample);
> > +
> > +             if (size >= target_size)
> > +                     break;
> > +     }
> > +}
> > +
> > +/**
> > + * insert_perf_sample - Insert a perf sample entry to the sample list.
> > + * @stream: Stream into which sample is to be inserted.
> > + * @sample: perf CS sample to be inserted into the list
> > + *
> > + * This function never fails, since it always manages to insert the
> sample.
> > + * If the space is exhausted in the buffer, it will remove the older
> > + * entries in order to make space.
> > + */
> > +static void insert_perf_sample(struct i915_perf_stream *stream,
> > +                             struct i915_perf_cs_sample *sample)
> > +{
> > +     struct drm_i915_private *dev_priv = stream->dev_priv;
> > +     struct i915_perf_cs_sample *first, *last;
> > +     int max_offset = stream->cs_buffer.vma->obj->base.size;
> > +     u32 sample_size = dev_priv->perf.oa.oa_buffer.format_size;
> > +     unsigned long flags;
> > +
> > +     spin_lock_irqsave(&stream->cs_samples_lock, flags);
> > +     if (list_empty(&stream->cs_samples)) {
> > +             sample->offset = 0;
> > +             list_add_tail(&sample->link, &stream->cs_samples);
> > +             spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> > +             return;
> > +     }
> > +
> > +     first = list_first_entry(&stream->cs_samples, typeof(*first),
> > +                             link);
> > +     last = list_last_entry(&stream->cs_samples, typeof(*last),
> > +                             link);
> > +
> > +     if (last->offset >= first->offset) {
> > +             /* Sufficient space available at the end of buffer? */
> > +             if (last->offset + 2*sample_size < max_offset)
> > +                     sample->offset = last->offset + sample_size;
> > +             /*
> > +              * Wraparound condition. Is sufficient space available at
> > +              * beginning of buffer?
> > +              */
> > +             else if (sample_size < first->offset)
> > +                     sample->offset = 0;
> > +             /* Insufficient space. Overwrite existing old entries */
> > +             else {
> > +                     u32 target_size = sample_size - first->offset;
> > +
> > +                     release_perf_samples(stream, target_size);
> > +                     sample->offset = 0;
> > +             }
> > +     } else {
> > +             /* Sufficient space available? */
> > +             if (last->offset + 2*sample_size < first->offset)
> > +                     sample->offset = last->offset + sample_size;
> > +             /* Insufficient space. Overwrite existing old entries */
> > +             else {
> > +                     u32 target_size = sample_size -
> > +                             (first->offset - last->offset -
> > +                             sample_size);
> > +
> > +                     release_perf_samples(stream, target_size);
> > +                     sample->offset = last->offset + sample_size;
> > +             }
> > +     }
> > +     list_add_tail(&sample->link, &stream->cs_samples);
> > +     spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> > +}
> > +
> > +/**
> > + * i915_emit_oa_report_capture - Insert the commands to capture OA
> > + * reports metrics into the render command stream
> > + * @request: request in whose context the metrics are being collected.
> > + * @preallocate: allocate space in ring for related sample.
> > + * @offset: command stream buffer offset where the OA metrics need to be
> > + * collected
> > + */
> > +static int i915_emit_oa_report_capture(
> > +                             struct drm_i915_gem_request *request,
> > +                             bool preallocate,
> > +                             u32 offset)
> > +{
> > +     struct drm_i915_private *dev_priv = request->i915;
> > +     struct intel_engine_cs *engine = request->engine;
> > +     struct i915_perf_stream *stream;
> > +     u32 addr = 0;
> > +     u32 cmd, len = 4, *cs;
> > +     int idx;
> > +
> > +     idx = srcu_read_lock(&engine->perf_srcu);
> > +     stream = srcu_dereference(engine->exclusive_stream,
> &engine->perf_srcu);
> > +     addr = stream->cs_buffer.vma->node.start + offset;
> > +     srcu_read_unlock(&engine->perf_srcu, idx);
> > +
> > +     if (WARN_ON(addr & 0x3f)) {
> > +             DRM_ERROR("OA buffer address not aligned to 64 byte\n");
> > +             return -EINVAL;
> > +     }
> > +
> > +     if (preallocate)
> > +             request->reserved_space += len;
> > +     else
> > +             request->reserved_space -= len;
> > +
> > +     cs = intel_ring_begin(request, 4);
> > +     if (IS_ERR(cs))
> > +             return PTR_ERR(cs);
> > +
> > +     cmd = MI_REPORT_PERF_COUNT | (1<<0);
> > +     if (INTEL_GEN(dev_priv) >= 8)
> > +             cmd |= (2<<0);
> > +
> > +     *cs++ = cmd;
> > +     *cs++ = addr | MI_REPORT_PERF_COUNT_GGTT;
> > +     *cs++ = request->fence.seqno;
> > +
> > +     if (INTEL_GEN(dev_priv) >= 8)
> > +             *cs++ = 0;
> > +     else
> > +             *cs++ = MI_NOOP;
> > +
> > +     intel_ring_advance(request, cs);
> > +
> > +     return 0;
> > +}
> > +
> > +/**
> > + * i915_perf_stream_emit_sample_capture - Insert the commands to
> capture perf
> > + * metrics into the GPU command stream
> > + * @stream: An i915-perf stream opened for GPU metrics
> > + * @request: request in whose context the metrics are being collected.
> > + * @preallocate: allocate space in ring for related sample.
> > + */
> > +static void i915_perf_stream_emit_sample_capture(
> > +                                     struct i915_perf_stream *stream,
> > +                                     struct drm_i915_gem_request
> *request,
> > +                                     bool preallocate)
> > +{
> > +     struct reservation_object *resv = stream->cs_buffer.vma->resv;
> > +     struct i915_perf_cs_sample *sample;
> > +     unsigned long flags;
> > +     int ret;
> > +
> > +     sample = kzalloc(sizeof(*sample), GFP_KERNEL);
> > +     if (sample == NULL) {
> > +             DRM_ERROR("Perf sample alloc failed\n");
> > +             return;
> > +     }
> > +
> > +     sample->request = i915_gem_request_get(request);
> > +     sample->ctx_id = request->ctx->hw_id;
> > +
> > +     insert_perf_sample(stream, sample);
> > +
> > +     if (stream->sample_flags & SAMPLE_OA_REPORT) {
> > +             ret = i915_emit_oa_report_capture(request,
> > +                                               preallocate,
> > +                                               sample->offset);
> > +             if (ret)
> > +                     goto err_unref;
> > +     }
> > +
> > +     reservation_object_lock(resv, NULL);
> > +     if (reservation_object_reserve_shared(resv) == 0)
> > +             reservation_object_add_shared_fence(resv,
> &request->fence);
> > +     reservation_object_unlock(resv);
> > +
> > +     i915_vma_move_to_active(stream->cs_buffer.vma, request,
> > +                                     EXEC_OBJECT_WRITE);
> > +     return;
> > +
> > +err_unref:
> > +     i915_gem_request_put(sample->request);
> > +     spin_lock_irqsave(&stream->cs_samples_lock, flags);
> > +     list_del(&sample->link);
> > +     spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> > +     kfree(sample);
> > +}
> > +
> > +/**
> > + * i915_perf_stream_release_samples - Release the perf command stream
> samples
> > + * @stream: Stream from which sample are to be released.
> > + *
> > + * Note: The associated requests should be completed before releasing
> the
> > + * references here.
> > + */
> > +static void i915_perf_stream_release_samples(struct i915_perf_stream
> *stream)
> > +{
> > +     struct i915_perf_cs_sample *entry, *next;
> > +     unsigned long flags;
> > +
> > +     list_for_each_entry_safe
> > +             (entry, next, &stream->cs_samples, link) {
> > +             i915_gem_request_put(entry->request);
> > +
> > +             spin_lock_irqsave(&stream->cs_samples_lock, flags);
> > +             list_del(&entry->link);
> > +             spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> > +             kfree(entry);
> > +     }
> > +}
> > +
> > +/**
> >    * oa_buffer_check_unlocked - check for data and update tail ptr state
> >    * @dev_priv: i915 device instance
> >    *
> > @@ -521,12 +806,13 @@ static int append_oa_status(struct
> i915_perf_stream *stream,
> >   }
> >
> >   /**
> > - * append_oa_sample - Copies single OA report into userspace read()
> buffer.
> > - * @stream: An i915-perf stream opened for OA metrics
> > + * append_perf_sample - Copies single perf sample into userspace read()
> buffer.
> > + * @stream: An i915-perf stream opened for perf samples
> >    * @buf: destination buffer given by userspace
> >    * @count: the number of bytes userspace wants to read
> >    * @offset: (inout): the current position for writing into @buf
> > - * @report: A single OA report to (optionally) include as part of the
> sample
> > + * @data: perf sample data which contains (optionally) metrics
> configured
> > + * earlier when opening a stream
> >    *
> >    * The contents of a sample are configured through
> `DRM_I915_PERF_PROP_SAMPLE_*`
> >    * properties when opening a stream, tracked as
> `stream->sample_flags`. This
> > @@ -537,11 +823,11 @@ static int append_oa_status(struct
> i915_perf_stream *stream,
> >    *
> >    * Returns: 0 on success, negative error code on failure.
> >    */
> > -static int append_oa_sample(struct i915_perf_stream *stream,
> > +static int append_perf_sample(struct i915_perf_stream *stream,
> >                           char __user *buf,
> >                           size_t count,
> >                           size_t *offset,
> > -                         const u8 *report)
> > +                         const struct i915_perf_sample_data *data)
> >   {
> >       struct drm_i915_private *dev_priv = stream->dev_priv;
> >       int report_size = dev_priv->perf.oa.oa_buffer.format_size;
> > @@ -569,16 +855,21 @@ static int append_oa_sample(struct
> i915_perf_stream *stream,
> >        * transition. These are considered as source 'OABUFFER'.
> >        */
> >       if (sample_flags & SAMPLE_OA_SOURCE) {
> > -             u64 source = I915_PERF_SAMPLE_OA_SOURCE_OABUFFER;
> > +             if (copy_to_user(buf, &data->source, 8))
> > +                     return -EFAULT;
> > +             buf += 8;
> > +     }
> >
> > -             if (copy_to_user(buf, &source, 8))
> > +     if (sample_flags & SAMPLE_CTX_ID) {
> > +             if (copy_to_user(buf, &data->ctx_id, 8))
> >                       return -EFAULT;
> >               buf += 8;
> >       }
> >
> >       if (sample_flags & SAMPLE_OA_REPORT) {
> > -             if (copy_to_user(buf, report, report_size))
> > +             if (copy_to_user(buf, data->report, report_size))
> >                       return -EFAULT;
> > +             buf += report_size;
> >       }
> >
> >       (*offset) += header.size;
> > @@ -587,11 +878,54 @@ static int append_oa_sample(struct
> i915_perf_stream *stream,
> >   }
> >
> >   /**
> > + * append_oa_buffer_sample - Copies single periodic OA report into
> userspace
> > + * read() buffer.
> > + * @stream: An i915-perf stream opened for OA metrics
> > + * @buf: destination buffer given by userspace
> > + * @count: the number of bytes userspace wants to read
> > + * @offset: (inout): the current position for writing into @buf
> > + * @report: A single OA report to (optionally) include as part of the
> sample
> > + *
> > + * Returns: 0 on success, negative error code on failure.
> > + */
> > +static int append_oa_buffer_sample(struct i915_perf_stream *stream,
> > +                             char __user *buf, size_t count,
> > +                             size_t *offset, const u8 *report)
> > +{
> > +     struct drm_i915_private *dev_priv = stream->dev_priv;
> > +     u32 sample_flags = stream->sample_flags;
> > +     struct i915_perf_sample_data data = { 0 };
> > +     u32 *report32 = (u32 *)report;
> > +
> > +     if (sample_flags & SAMPLE_OA_SOURCE)
> > +             data.source = I915_PERF_SAMPLE_OA_SOURCE_OABUFFER;
> > +
> > +     if (sample_flags & SAMPLE_CTX_ID) {
> > +             if (INTEL_INFO(dev_priv)->gen < 8)
> > +                     data.ctx_id = 0;
> > +             else {
> > +                     /*
> > +                      * XXX: Just keep the lower 21 bits for now since
> I'm
> > +                      * not entirely sure if the HW touches any of the
> higher
> > +                      * bits in this field
> > +                      */
> > +                     data.ctx_id = report32[2] & 0x1fffff;
> > +             }
> > +     }
> > +
> > +     if (sample_flags & SAMPLE_OA_REPORT)
> > +             data.report = report;
> > +
> > +     return append_perf_sample(stream, buf, count, offset, &data);
> > +}
> > +
> > +/**
> >    * Copies all buffered OA reports into userspace read() buffer.
> >    * @stream: An i915-perf stream opened for OA metrics
> >    * @buf: destination buffer given by userspace
> >    * @count: the number of bytes userspace wants to read
> >    * @offset: (inout): the current position for writing into @buf
> > + * @ts: copy OA reports till this timestamp
> >    *
> >    * Notably any error condition resulting in a short read (-%ENOSPC or
> >    * -%EFAULT) will be returned even though one or more records may
> > @@ -609,7 +943,8 @@ static int append_oa_sample(struct i915_perf_stream
> *stream,
> >   static int gen8_append_oa_reports(struct i915_perf_stream *stream,
> >                                 char __user *buf,
> >                                 size_t count,
> > -                               size_t *offset)
> > +                               size_t *offset,
> > +                               u32 ts)
> >   {
> >       struct drm_i915_private *dev_priv = stream->dev_priv;
> >       int report_size = dev_priv->perf.oa.oa_buffer.format_size;
> > @@ -623,7 +958,7 @@ static int gen8_append_oa_reports(struct
> i915_perf_stream *stream,
> >       u32 taken;
> >       int ret = 0;
> >
> > -     if (WARN_ON(!stream->enabled))
> > +     if (WARN_ON(stream->state != I915_PERF_STREAM_ENABLED))
> >               return -EIO;
> >
> >       spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags);
> > @@ -669,6 +1004,11 @@ static int gen8_append_oa_reports(struct
> i915_perf_stream *stream,
> >               u32 *report32 = (void *)report;
> >               u32 ctx_id;
> >               u32 reason;
> > +             u32 report_ts = report32[1];
> > +
> > +             /* Report timestamp should not exceed the given ts */
> > +             if (report_ts > ts)
> > +                     break;
> >
> >               /*
> >                * All the report sizes factor neatly into the buffer
> > @@ -750,23 +1090,23 @@ static int gen8_append_oa_reports(struct
> i915_perf_stream *stream,
> >                * switches since it's not-uncommon for periodic samples to
> >                * identify a switch before any 'context switch' report.
> >                */
> > -             if (!dev_priv->perf.oa.exclusive_stream->ctx ||
> > -                 dev_priv->perf.oa.specific_ctx_id == ctx_id ||
> > +             if (!stream->ctx ||
> > +                 stream->engine->specific_ctx_id == ctx_id ||
> >                   (dev_priv->perf.oa.oa_buffer.last_ctx_id ==
> > -                  dev_priv->perf.oa.specific_ctx_id) ||
> > +                  stream->engine->specific_ctx_id) ||
> >                   reason & OAREPORT_REASON_CTX_SWITCH) {
> >
> >                       /*
> >                        * While filtering for a single context we avoid
> >                        * leaking the IDs of other contexts.
> >                        */
> > -                     if (dev_priv->perf.oa.exclusive_stream->ctx &&
> > -                         dev_priv->perf.oa.specific_ctx_id != ctx_id) {
> > +                     if (stream->ctx &&
> > +                         stream->engine->specific_ctx_id != ctx_id) {
> >                               report32[2] = INVALID_CTX_ID;
> >                       }
> >
> > -                     ret = append_oa_sample(stream, buf, count, offset,
> > -                                            report);
> > +                     ret = append_oa_buffer_sample(stream, buf, count,
> > +                                                   offset, report);
> >                       if (ret)
> >                               break;
> >
> > @@ -807,6 +1147,7 @@ static int gen8_append_oa_reports(struct
> i915_perf_stream *stream,
> >    * @buf: destination buffer given by userspace
> >    * @count: the number of bytes userspace wants to read
> >    * @offset: (inout): the current position for writing into @buf
> > + * @ts: copy OA reports till this timestamp
> >    *
> >    * Checks OA unit status registers and if necessary appends
> corresponding
> >    * status records for userspace (such as for a buffer full condition)
> and then
> > @@ -824,7 +1165,8 @@ static int gen8_append_oa_reports(struct
> i915_perf_stream *stream,
> >   static int gen8_oa_read(struct i915_perf_stream *stream,
> >                       char __user *buf,
> >                       size_t count,
> > -                     size_t *offset)
> > +                     size_t *offset,
> > +                     u32 ts)
> >   {
> >       struct drm_i915_private *dev_priv = stream->dev_priv;
> >       u32 oastatus;
> > @@ -877,7 +1219,7 @@ static int gen8_oa_read(struct i915_perf_stream
> *stream,
> >                          oastatus & ~GEN8_OASTATUS_REPORT_LOST);
> >       }
> >
> > -     return gen8_append_oa_reports(stream, buf, count, offset);
> > +     return gen8_append_oa_reports(stream, buf, count, offset, ts);
> >   }
> >
> >   /**
> > @@ -886,6 +1228,7 @@ static int gen8_oa_read(struct i915_perf_stream
> *stream,
> >    * @buf: destination buffer given by userspace
> >    * @count: the number of bytes userspace wants to read
> >    * @offset: (inout): the current position for writing into @buf
> > + * @ts: copy OA reports till this timestamp
> >    *
> >    * Notably any error condition resulting in a short read (-%ENOSPC or
> >    * -%EFAULT) will be returned even though one or more records may
> > @@ -903,7 +1246,8 @@ static int gen8_oa_read(struct i915_perf_stream
> *stream,
> >   static int gen7_append_oa_reports(struct i915_perf_stream *stream,
> >                                 char __user *buf,
> >                                 size_t count,
> > -                               size_t *offset)
> > +                               size_t *offset,
> > +                               u32 ts)
> >   {
> >       struct drm_i915_private *dev_priv = stream->dev_priv;
> >       int report_size = dev_priv->perf.oa.oa_buffer.format_size;
> > @@ -917,7 +1261,7 @@ static int gen7_append_oa_reports(struct
> i915_perf_stream *stream,
> >       u32 taken;
> >       int ret = 0;
> >
> > -     if (WARN_ON(!stream->enabled))
> > +     if (WARN_ON(stream->state != I915_PERF_STREAM_ENABLED))
> >               return -EIO;
> >
> >       spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags);
> > @@ -984,7 +1328,12 @@ static int gen7_append_oa_reports(struct
> i915_perf_stream *stream,
> >                       continue;
> >               }
> >
> > -             ret = append_oa_sample(stream, buf, count, offset, report);
> > +             /* Report timestamp should not exceed the given ts */
> > +             if (report32[1] > ts)
> > +                     break;
> > +
> > +             ret = append_oa_buffer_sample(stream, buf, count, offset,
> > +                                           report);
> >               if (ret)
> >                       break;
> >
> > @@ -1022,6 +1371,7 @@ static int gen7_append_oa_reports(struct
> i915_perf_stream *stream,
> >    * @buf: destination buffer given by userspace
> >    * @count: the number of bytes userspace wants to read
> >    * @offset: (inout): the current position for writing into @buf
> > + * @ts: copy OA reports till this timestamp
> >    *
> >    * Checks Gen 7 specific OA unit status registers and if necessary
> appends
> >    * corresponding status records for userspace (such as for a buffer
> full
> > @@ -1035,7 +1385,8 @@ static int gen7_append_oa_reports(struct
> i915_perf_stream *stream,
> >   static int gen7_oa_read(struct i915_perf_stream *stream,
> >                       char __user *buf,
> >                       size_t count,
> > -                     size_t *offset)
> > +                     size_t *offset,
> > +                     u32 ts)
> >   {
> >       struct drm_i915_private *dev_priv = stream->dev_priv;
> >       u32 oastatus1;
> > @@ -1097,16 +1448,172 @@ static int gen7_oa_read(struct i915_perf_stream
> *stream,
> >                       GEN7_OASTATUS1_REPORT_LOST;
> >       }
> >
> > -     return gen7_append_oa_reports(stream, buf, count, offset);
> > +     return gen7_append_oa_reports(stream, buf, count, offset, ts);
> > +}
> > +
> > +/**
> > + * append_cs_buffer_sample - Copies single perf sample data associated
> with
> > + * GPU command stream, into userspace read() buffer.
> > + * @stream: An i915-perf stream opened for perf CS metrics
> > + * @buf: destination buffer given by userspace
> > + * @count: the number of bytes userspace wants to read
> > + * @offset: (inout): the current position for writing into @buf
> > + * @node: Sample data associated with perf metrics
> > + *
> > + * Returns: 0 on success, negative error code on failure.
> > + */
> > +static int append_cs_buffer_sample(struct i915_perf_stream *stream,
> > +                             char __user *buf,
> > +                             size_t count,
> > +                             size_t *offset,
> > +                             struct i915_perf_cs_sample *node)
> > +{
> > +     struct drm_i915_private *dev_priv = stream->dev_priv;
> > +     struct i915_perf_sample_data data = { 0 };
> > +     u32 sample_flags = stream->sample_flags;
> > +     int ret = 0;
> > +
> > +     if (sample_flags & SAMPLE_OA_REPORT) {
> > +             const u8 *report = stream->cs_buffer.vaddr + node->offset;
> > +             u32 sample_ts = *(u32 *)(report + 4);
> > +
> > +             data.report = report;
> > +
> > +             /* First, append the periodic OA samples having lower
> > +              * timestamp values
> > +              */
> > +             ret = dev_priv->perf.oa.ops.read(stream, buf, count,
> offset,
> > +                                              sample_ts);
> > +             if (ret)
> > +                     return ret;
> > +     }
> > +
> > +     if (sample_flags & SAMPLE_OA_SOURCE)
> > +             data.source = I915_PERF_SAMPLE_OA_SOURCE_CS;
> > +
> > +     if (sample_flags & SAMPLE_CTX_ID)
> > +             data.ctx_id = node->ctx_id;
> > +
> > +     return append_perf_sample(stream, buf, count, offset, &data);
> >   }
> >
> >   /**
> > - * i915_oa_wait_unlocked - handles blocking IO until OA data available
> > + * append_cs_buffer_samples: Copies all command stream based perf
> samples
> > + * into userspace read() buffer.
> > + * @stream: An i915-perf stream opened for perf CS metrics
> > + * @buf: destination buffer given by userspace
> > + * @count: the number of bytes userspace wants to read
> > + * @offset: (inout): the current position for writing into @buf
> > + *
> > + * Notably any error condition resulting in a short read (-%ENOSPC or
> > + * -%EFAULT) will be returned even though one or more records may
> > + * have been successfully copied. In this case it's up to the caller
> > + * to decide if the error should be squashed before returning to
> > + * userspace.
> > + *
> > + * Returns: 0 on success, negative error code on failure.
> > + */
> > +static int append_cs_buffer_samples(struct i915_perf_stream *stream,
> > +                             char __user *buf,
> > +                             size_t count,
> > +                             size_t *offset)
> > +{
> > +     struct i915_perf_cs_sample *entry, *next;
> > +     LIST_HEAD(free_list);
> > +     int ret = 0;
> > +     unsigned long flags;
> > +
> > +     spin_lock_irqsave(&stream->cs_samples_lock, flags);
> > +     if (list_empty(&stream->cs_samples)) {
> > +             spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> > +             return 0;
> > +     }
> > +     list_for_each_entry_safe(entry, next,
> > +                              &stream->cs_samples, link) {
> > +             if (!i915_gem_request_completed(entry->request))
> > +                     break;
> > +             list_move_tail(&entry->link, &free_list);
> > +     }
> > +     spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> > +
> > +     if (list_empty(&free_list))
> > +             return 0;
> > +
> > +     list_for_each_entry_safe(entry, next, &free_list, link) {
> > +             ret = append_cs_buffer_sample(stream, buf, count, offset,
> > +                                           entry);
> > +             if (ret)
> > +                     break;
> > +
> > +             list_del(&entry->link);
> > +             i915_gem_request_put(entry->request);
> > +             kfree(entry);
> > +     }
> > +
> > +     /* Don't discard remaining entries, keep them for next read */
> > +     spin_lock_irqsave(&stream->cs_samples_lock, flags);
> > +     list_splice(&free_list, &stream->cs_samples);
> > +     spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> > +
> > +     return ret;
> > +}
> > +
> > +/*
> > + * cs_buffer_is_empty - Checks whether the command stream buffer
> > + * associated with the stream has data available.
> >    * @stream: An i915-perf stream opened for OA metrics
> >    *
> > + * Returns: true if atleast one request associated with command stream
> is
> > + * completed, else returns false.
> > + */
> > +static bool cs_buffer_is_empty(struct i915_perf_stream *stream)
> > +
> > +{
> > +     struct i915_perf_cs_sample *entry = NULL;
> > +     struct drm_i915_gem_request *request = NULL;
> > +     unsigned long flags;
> > +
> > +     spin_lock_irqsave(&stream->cs_samples_lock, flags);
> > +     entry = list_first_entry_or_null(&stream->cs_samples,
> > +                     struct i915_perf_cs_sample, link);
> > +     if (entry)
> > +             request = entry->request;
> > +     spin_unlock_irqrestore(&stream->cs_samples_lock, flags);
> > +
> > +     if (!entry)
> > +             return true;
> > +     else if (!i915_gem_request_completed(request))
> > +             return true;
> > +     else
> > +             return false;
> > +}
> > +
> > +/**
> > + * stream_have_data_unlocked - Checks whether the stream has data
> available
> > + * @stream: An i915-perf stream opened for OA metrics
> > + *
> > + * For command stream based streams, check if the command stream buffer
> has
> > + * atleast one sample available, if not return false, irrespective of
> periodic
> > + * oa buffer having the data or not.
> > + */
> > +
> > +static bool stream_have_data_unlocked(struct i915_perf_stream *stream)
> > +{
> > +     struct drm_i915_private *dev_priv = stream->dev_priv;
> > +
> > +     if (stream->cs_mode)
> > +             return !cs_buffer_is_empty(stream);
> > +     else
> > +             return oa_buffer_check_unlocked(dev_priv);
> > +}
> > +
> > +/**
> > + * i915_perf_stream_wait_unlocked - handles blocking IO until data
> available
> > + * @stream: An i915-perf stream opened for GPU metrics
> > + *
> >    * Called when userspace tries to read() from a blocking stream FD
> opened
> > - * for OA metrics. It waits until the hrtimer callback finds a non-empty
> > - * OA buffer and wakes us.
> > + * for perf metrics. It waits until the hrtimer callback finds a
> non-empty
> > + * command stream buffer / OA buffer and wakes us.
> >    *
> >    * Note: it's acceptable to have this return with some false positives
> >    * since any subsequent read handling will return -EAGAIN if there
> isn't
> > @@ -1114,7 +1621,7 @@ static int gen7_oa_read(struct i915_perf_stream
> *stream,
> >    *
> >    * Returns: zero on success or a negative error code
> >    */
> > -static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
> > +static int i915_perf_stream_wait_unlocked(struct i915_perf_stream
> *stream)
> >   {
> >       struct drm_i915_private *dev_priv = stream->dev_priv;
> >
> > @@ -1122,32 +1629,47 @@ static int i915_oa_wait_unlocked(struct
> i915_perf_stream *stream)
> >       if (!dev_priv->perf.oa.periodic)
> >               return -EIO;
> >
> > -     return wait_event_interruptible(dev_priv->perf.oa.poll_wq,
> > -                                     oa_buffer_check_unlocked(dev_
> priv));
> > +     if (stream->cs_mode) {
> > +             long int ret;
> > +
> > +             /* Wait for the all sampled requests. */
> > +             ret = reservation_object_wait_timeout_rcu(
> > +
>  stream->cs_buffer.vma->resv,
> > +                                                 true,
> > +                                                 true,
> > +                                                 MAX_SCHEDULE_TIMEOUT);
> > +             if (unlikely(ret < 0)) {
> > +                     DRM_DEBUG_DRIVER("Failed to wait for sampled
> requests: %li\n", ret);
> > +                     return ret;
> > +             }
> > +     }
> > +
> > +     return wait_event_interruptible(stream->poll_wq,
> > +                                     stream_have_data_unlocked(
> stream));
> >   }
> >
> >   /**
> > - * i915_oa_poll_wait - call poll_wait() for an OA stream poll()
> > - * @stream: An i915-perf stream opened for OA metrics
> > + * i915_perf_stream_poll_wait - call poll_wait() for an stream poll()
> > + * @stream: An i915-perf stream opened for GPU metrics
> >    * @file: An i915 perf stream file
> >    * @wait: poll() state table
> >    *
> > - * For handling userspace polling on an i915 perf stream opened for OA
> metrics,
> > + * For handling userspace polling on an i915 perf stream opened for
> metrics,
> >    * this starts a poll_wait with the wait queue that our hrtimer
> callback wakes
> > - * when it sees data ready to read in the circular OA buffer.
> > + * when it sees data ready to read either in command stream buffer or
> in the
> > + * circular OA buffer.
> >    */
> > -static void i915_oa_poll_wait(struct i915_perf_stream *stream,
> > +static void i915_perf_stream_poll_wait(struct i915_perf_stream *stream,
> >                             struct file *file,
> >                             poll_table *wait)
> >   {
> > -     struct drm_i915_private *dev_priv = stream->dev_priv;
> > -
> > -     poll_wait(file, &dev_priv->perf.oa.poll_wq, wait);
> > +     poll_wait(file, &stream->poll_wq, wait);
> >   }
> >
> >   /**
> > - * i915_oa_read - just calls through to &i915_oa_ops->read
> > - * @stream: An i915-perf stream opened for OA metrics
> > + * i915_perf_stream_read - Reads perf metrics available into userspace
> read
> > + * buffer
> > + * @stream: An i915-perf stream opened for GPU metrics
> >    * @buf: destination buffer given by userspace
> >    * @count: the number of bytes userspace wants to read
> >    * @offset: (inout): the current position for writing into @buf
> > @@ -1157,14 +1679,21 @@ static void i915_oa_poll_wait(struct
> i915_perf_stream *stream,
> >    *
> >    * Returns: zero on success or a negative error code
> >    */
> > -static int i915_oa_read(struct i915_perf_stream *stream,
> > +static int i915_perf_stream_read(struct i915_perf_stream *stream,
> >                       char __user *buf,
> >                       size_t count,
> >                       size_t *offset)
> >   {
> >       struct drm_i915_private *dev_priv = stream->dev_priv;
> >
> > -     return dev_priv->perf.oa.ops.read(stream, buf, count, offset);
> > +
> > +     if (stream->cs_mode)
> > +             return append_cs_buffer_samples(stream, buf, count,
> offset);
> > +     else if (stream->sample_flags & SAMPLE_OA_REPORT)
> > +             return dev_priv->perf.oa.ops.read(stream, buf, count,
> offset,
> > +                                             U32_MAX);
> > +     else
> > +             return -EINVAL;
> >   }
> >
> >   /**
> > @@ -1182,7 +1711,7 @@ static int oa_get_render_ctx_id(struct
> i915_perf_stream *stream)
> >       struct drm_i915_private *dev_priv = stream->dev_priv;
> >
> >       if (i915.enable_execlists)
> > -             dev_priv->perf.oa.specific_ctx_id = stream->ctx->hw_id;
> > +             stream->engine->specific_ctx_id = stream->ctx->hw_id;
> >       else {
> >               struct intel_engine_cs *engine = dev_priv->engine[RCS];
> >               struct intel_ring *ring;
> > @@ -1209,7 +1738,7 @@ static int oa_get_render_ctx_id(struct
> i915_perf_stream *stream)
> >                * i915_ggtt_offset() on the fly) considering the
> difference
> >                * with gen8+ and execlists
> >                */
> > -             dev_priv->perf.oa.specific_ctx_id =
> > +             stream->engine->specific_ctx_id =
> >                       i915_ggtt_offset(stream->ctx->
> engine[engine->id].state);
> >       }
> >
> > @@ -1228,13 +1757,13 @@ static void oa_put_render_ctx_id(struct
> i915_perf_stream *stream)
> >       struct drm_i915_private *dev_priv = stream->dev_priv;
> >
> >       if (i915.enable_execlists) {
> > -             dev_priv->perf.oa.specific_ctx_id = INVALID_CTX_ID;
> > +             stream->engine->specific_ctx_id = INVALID_CTX_ID;
> >       } else {
> >               struct intel_engine_cs *engine = dev_priv->engine[RCS];
> >
> >               mutex_lock(&dev_priv->drm.struct_mutex);
> >
> > -             dev_priv->perf.oa.specific_ctx_id = INVALID_CTX_ID;
> > +             stream->engine->specific_ctx_id = INVALID_CTX_ID;
> >               engine->context_unpin(engine, stream->ctx);
> >
> >               mutex_unlock(&dev_priv->drm.struct_mutex);
> > @@ -1242,13 +1771,28 @@ static void oa_put_render_ctx_id(struct
> i915_perf_stream *stream)
> >   }
> >
> >   static void
> > +free_cs_buffer(struct i915_perf_stream *stream)
> > +{
> > +     struct drm_i915_private *dev_priv = stream->dev_priv;
> > +
> > +     mutex_lock(&dev_priv->drm.struct_mutex);
> > +
> > +     i915_gem_object_unpin_map(stream->cs_buffer.vma->obj);
> > +     i915_vma_unpin_and_release(&stream->cs_buffer.vma);
> > +
> > +     stream->cs_buffer.vma = NULL;
> > +     stream->cs_buffer.vaddr = NULL;
> > +
> > +     mutex_unlock(&dev_priv->drm.struct_mutex);
> > +}
> > +
> > +static void
> >   free_oa_buffer(struct drm_i915_private *i915)
> >   {
> >       mutex_lock(&i915->drm.struct_mutex);
> >
> >       i915_gem_object_unpin_map(i915->perf.oa.oa_buffer.vma->obj);
> > -     i915_vma_unpin(i915->perf.oa.oa_buffer.vma);
> > -     i915_gem_object_put(i915->perf.oa.oa_buffer.vma->obj);
> > +     i915_vma_unpin_and_release(&i915->perf.oa.oa_buffer.vma);
> >
> >       i915->perf.oa.oa_buffer.vma = NULL;
> >       i915->perf.oa.oa_buffer.vaddr = NULL;
> > @@ -1256,27 +1800,41 @@ static void oa_put_render_ctx_id(struct
> i915_perf_stream *stream)
> >       mutex_unlock(&i915->drm.struct_mutex);
> >   }
> >
> > -static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
> > +static void i915_perf_stream_destroy(struct i915_perf_stream *stream)
> >   {
> >       struct drm_i915_private *dev_priv = stream->dev_priv;
> > -
> > -     BUG_ON(stream != dev_priv->perf.oa.exclusive_stream);
> > +     struct intel_engine_cs *engine = stream->engine;
> > +     struct i915_perf_stream *engine_stream;
> > +     int idx;
> > +
> > +     idx = srcu_read_lock(&engine->perf_srcu);
> > +     engine_stream = srcu_dereference(engine->exclusive_stream,
> > +                                      &engine->perf_srcu);
> > +     if (WARN_ON(stream != engine_stream))
> > +             return;
> > +     srcu_read_unlock(&engine->perf_srcu, idx);
> >
> >       /*
> >        * Unset exclusive_stream first, it might be checked while
> >        * disabling the metric set on gen8+.
> >        */
> > -     dev_priv->perf.oa.exclusive_stream = NULL;
> > +     rcu_assign_pointer(stream->engine->exclusive_stream, NULL);
> > +     synchronize_srcu(&stream->engine->perf_srcu);
> >
> > -     dev_priv->perf.oa.ops.disable_metric_set(dev_priv);
> > +     if (stream->using_oa) {
> > +             dev_priv->perf.oa.ops.disable_metric_set(dev_priv);
> >
> > -     free_oa_buffer(dev_priv);
> > +             free_oa_buffer(dev_priv);
> >
> > -     intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> > -     intel_runtime_pm_put(dev_priv);
> > +             intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> > +             intel_runtime_pm_put(dev_priv);
> >
> > -     if (stream->ctx)
> > -             oa_put_render_ctx_id(stream);
> > +             if (stream->ctx)
> > +                     oa_put_render_ctx_id(stream);
> > +     }
> > +
> > +     if (stream->cs_mode)
> > +             free_cs_buffer(stream);
> >
> >       if (dev_priv->perf.oa.spurious_report_rs.missed) {
> >               DRM_NOTE("%d spurious OA report notices suppressed due to
> ratelimiting\n",
> > @@ -1325,11 +1883,6 @@ static void gen7_init_oa_buffer(struct
> drm_i915_private *dev_priv)
> >        * memory...
> >        */
> >       memset(dev_priv->perf.oa.oa_buffer.vaddr, 0, OA_BUFFER_SIZE);
> > -
> > -     /* Maybe make ->pollin per-stream state if we support multiple
> > -      * concurrent streams in the future.
> > -      */
> > -     dev_priv->perf.oa.pollin = false;
> >   }
> >
> >   static void gen8_init_oa_buffer(struct drm_i915_private *dev_priv)
> > @@ -1383,33 +1936,26 @@ static void gen8_init_oa_buffer(struct
> drm_i915_private *dev_priv)
> >        * memory...
> >        */
> >       memset(dev_priv->perf.oa.oa_buffer.vaddr, 0, OA_BUFFER_SIZE);
> > -
> > -     /*
> > -      * Maybe make ->pollin per-stream state if we support multiple
> > -      * concurrent streams in the future.
> > -      */
> > -     dev_priv->perf.oa.pollin = false;
> >   }
> >
> > -static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
> > +static int alloc_obj(struct drm_i915_private *dev_priv,
> > +                  struct i915_vma **vma, u8 **vaddr)
> >   {
> >       struct drm_i915_gem_object *bo;
> > -     struct i915_vma *vma;
> >       int ret;
> >
> > -     if (WARN_ON(dev_priv->perf.oa.oa_buffer.vma))
> > -             return -ENODEV;
> > +     intel_runtime_pm_get(dev_priv);
> >
> >       ret = i915_mutex_lock_interruptible(&dev_priv->drm);
> >       if (ret)
> > -             return ret;
> > +             goto out;
> >
> >       BUILD_BUG_ON_NOT_POWER_OF_2(OA_BUFFER_SIZE);
> >       BUILD_BUG_ON(OA_BUFFER_SIZE < SZ_128K || OA_BUFFER_SIZE > SZ_16M);
> >
> >       bo = i915_gem_object_create(dev_priv, OA_BUFFER_SIZE);
> >       if (IS_ERR(bo)) {
> > -             DRM_ERROR("Failed to allocate OA buffer\n");
> > +             DRM_ERROR("Failed to allocate i915 perf obj\n");
> >               ret = PTR_ERR(bo);
> >               goto unlock;
> >       }
> > @@ -1419,42 +1965,83 @@ static int alloc_oa_buffer(struct
> drm_i915_private *dev_priv)
> >               goto err_unref;
> >
> >       /* PreHSW required 512K alignment, HSW requires 16M */
> > -     vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);
> > -     if (IS_ERR(vma)) {
> > -             ret = PTR_ERR(vma);
> > +     *vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);
> > +     if (IS_ERR(*vma)) {
> > +             ret = PTR_ERR(*vma);
> >               goto err_unref;
> >       }
> > -     dev_priv->perf.oa.oa_buffer.vma = vma;
> >
> > -     dev_priv->perf.oa.oa_buffer.vaddr =
> > -             i915_gem_object_pin_map(bo, I915_MAP_WB);
> > -     if (IS_ERR(dev_priv->perf.oa.oa_buffer.vaddr)) {
> > -             ret = PTR_ERR(dev_priv->perf.oa.oa_buffer.vaddr);
> > +     *vaddr = i915_gem_object_pin_map(bo, I915_MAP_WB);
> > +     if (IS_ERR(*vaddr)) {
> > +             ret = PTR_ERR(*vaddr);
> >               goto err_unpin;
> >       }
> >
> > -     dev_priv->perf.oa.ops.init_oa_buffer(dev_priv);
> > -
> > -     DRM_DEBUG_DRIVER("OA Buffer initialized, gtt offset = 0x%x, vaddr
> = %p\n",
> > -                      i915_ggtt_offset(dev_priv->
> perf.oa.oa_buffer.vma),
> > -                      dev_priv->perf.oa.oa_buffer.vaddr);
> > -
> >       goto unlock;
> >
> >   err_unpin:
> > -     __i915_vma_unpin(vma);
> > +     i915_vma_unpin(*vma);
> >
> >   err_unref:
> >       i915_gem_object_put(bo);
> >
> > -     dev_priv->perf.oa.oa_buffer.vaddr = NULL;
> > -     dev_priv->perf.oa.oa_buffer.vma = NULL;
> > -
> >   unlock:
> >       mutex_unlock(&dev_priv->drm.struct_mutex);
> > +out:
> > +     intel_runtime_pm_put(dev_priv);
> >       return ret;
> >   }
> >
> > +static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
> > +{
> > +     struct i915_vma *vma;
> > +     u8 *vaddr;
> > +     int ret;
> > +
> > +     if (WARN_ON(dev_priv->perf.oa.oa_buffer.vma))
> > +             return -ENODEV;
> > +
> > +     ret = alloc_obj(dev_priv, &vma, &vaddr);
> > +     if (ret)
> > +             return ret;
> > +
> > +     dev_priv->perf.oa.oa_buffer.vma = vma;
> > +     dev_priv->perf.oa.oa_buffer.vaddr = vaddr;
> > +
> > +     dev_priv->perf.oa.ops.init_oa_buffer(dev_priv);
> > +
> > +     DRM_DEBUG_DRIVER("OA Buffer initialized, gtt offset = 0x%x, vaddr
> = %p",
> > +                      i915_ggtt_offset(dev_priv->
> perf.oa.oa_buffer.vma),
> > +                      dev_priv->perf.oa.oa_buffer.vaddr);
> > +     return 0;
> > +}
> > +
> > +static int alloc_cs_buffer(struct i915_perf_stream *stream)
> > +{
> > +     struct drm_i915_private *dev_priv = stream->dev_priv;
> > +     struct i915_vma *vma;
> > +     u8 *vaddr;
> > +     int ret;
> > +
> > +     if (WARN_ON(stream->cs_buffer.vma))
> > +             return -ENODEV;
> > +
> > +     ret = alloc_obj(dev_priv, &vma, &vaddr);
> > +     if (ret)
> > +             return ret;
> > +
> > +     stream->cs_buffer.vma = vma;
> > +     stream->cs_buffer.vaddr = vaddr;
> > +     if (WARN_ON(!list_empty(&stream->cs_samples)))
> > +             INIT_LIST_HEAD(&stream->cs_samples);
> > +
> > +     DRM_DEBUG_DRIVER("Command stream buf initialized, gtt offset =
> 0x%x, vaddr = %p",
> > +                      i915_ggtt_offset(stream->cs_buffer.vma),
> > +                      stream->cs_buffer.vaddr);
> > +
> > +     return 0;
> > +}
> > +
> >   static void config_oa_regs(struct drm_i915_private *dev_priv,
> >                          const struct i915_oa_reg *regs,
> >                          int n_regs)
> > @@ -1859,6 +2446,10 @@ static void gen8_disable_metric_set(struct
> drm_i915_private *dev_priv)
> >
> >   static void gen7_oa_enable(struct drm_i915_private *dev_priv)
> >   {
> > +     struct i915_perf_stream *stream;
> > +     struct intel_engine_cs *engine = dev_priv->engine[RCS];
> > +     int idx;
> > +
> >       /*
> >        * Reset buf pointers so we don't forward reports from before now.
> >        *
> > @@ -1870,11 +2461,11 @@ static void gen7_oa_enable(struct
> drm_i915_private *dev_priv)
> >        */
> >       gen7_init_oa_buffer(dev_priv);
> >
> > -     if (dev_priv->perf.oa.exclusive_stream->enabled) {
> > -             struct i915_gem_context *ctx =
> > -                     dev_priv->perf.oa.exclusive_stream->ctx;
> > -             u32 ctx_id = dev_priv->perf.oa.specific_ctx_id;
> > -
> > +     idx = srcu_read_lock(&engine->perf_srcu);
> > +     stream = srcu_dereference(engine->exclusive_stream,
> &engine->perf_srcu);
> > +     if (stream->state != I915_PERF_STREAM_DISABLED) {
> > +             struct i915_gem_context *ctx = stream->ctx;
> > +             u32 ctx_id = engine->specific_ctx_id;
> >               bool periodic = dev_priv->perf.oa.periodic;
> >               u32 period_exponent = dev_priv->perf.oa.period_exponent;
> >               u32 report_format = dev_priv->perf.oa.oa_buffer.format;
> > @@ -1889,6 +2480,7 @@ static void gen7_oa_enable(struct drm_i915_private
> *dev_priv)
> >                          GEN7_OACONTROL_ENABLE);
> >       } else
> >               I915_WRITE(GEN7_OACONTROL, 0);
> > +     srcu_read_unlock(&engine->perf_srcu, idx);
> >   }
> >
> >   static void gen8_oa_enable(struct drm_i915_private *dev_priv)
> > @@ -1917,22 +2509,23 @@ static void gen8_oa_enable(struct
> drm_i915_private *dev_priv)
> >   }
> >
> >   /**
> > - * i915_oa_stream_enable - handle `I915_PERF_IOCTL_ENABLE` for OA stream
> > - * @stream: An i915 perf stream opened for OA metrics
> > + * i915_perf_stream_enable - handle `I915_PERF_IOCTL_ENABLE` for perf
> stream
> > + * @stream: An i915 perf stream opened for GPU metrics
> >    *
> >    * [Re]enables hardware periodic sampling according to the period
> configured
> >    * when opening the stream. This also starts a hrtimer that will
> periodically
> >    * check for data in the circular OA buffer for notifying userspace
> (e.g.
> >    * during a read() or poll()).
> >    */
> > -static void i915_oa_stream_enable(struct i915_perf_stream *stream)
> > +static void i915_perf_stream_enable(struct i915_perf_stream *stream)
> >   {
> >       struct drm_i915_private *dev_priv = stream->dev_priv;
> >
> > -     dev_priv->perf.oa.ops.oa_enable(dev_priv);
> > +     if (stream->sample_flags & SAMPLE_OA_REPORT)
> > +             dev_priv->perf.oa.ops.oa_enable(dev_priv);
> >
> > -     if (dev_priv->perf.oa.periodic)
> > -             hrtimer_start(&dev_priv->perf.oa.poll_check_timer,
> > +     if (stream->cs_mode || dev_priv->perf.oa.periodic)
> > +             hrtimer_start(&dev_priv->perf.poll_check_timer,
> >                             ns_to_ktime(POLL_PERIOD),
> >                             HRTIMER_MODE_REL_PINNED);
> >   }
> > @@ -1948,34 +2541,39 @@ static void gen8_oa_disable(struct
> drm_i915_private *dev_priv)
> >   }
> >
> >   /**
> > - * i915_oa_stream_disable - handle `I915_PERF_IOCTL_DISABLE` for OA
> stream
> > - * @stream: An i915 perf stream opened for OA metrics
> > + * i915_perf_stream_disable - handle `I915_PERF_IOCTL_DISABLE` for perf
> stream
> > + * @stream: An i915 perf stream opened for GPU metrics
> >    *
> >    * Stops the OA unit from periodically writing counter reports into the
> >    * circular OA buffer. This also stops the hrtimer that periodically
> checks for
> >    * data in the circular OA buffer, for notifying userspace.
> >    */
> > -static void i915_oa_stream_disable(struct i915_perf_stream *stream)
> > +static void i915_perf_stream_disable(struct i915_perf_stream *stream)
> >   {
> >       struct drm_i915_private *dev_priv = stream->dev_priv;
> >
> > -     dev_priv->perf.oa.ops.oa_disable(dev_priv);
> > +     if (stream->cs_mode || dev_priv->perf.oa.periodic)
> > +             hrtimer_cancel(&dev_priv->perf.poll_check_timer);
> > +
> > +     if (stream->cs_mode)
> > +             i915_perf_stream_release_samples(stream);
> >
> > -     if (dev_priv->perf.oa.periodic)
> > -             hrtimer_cancel(&dev_priv->perf.oa.poll_check_timer);
> > +     if (stream->sample_flags & SAMPLE_OA_REPORT)
> > +             dev_priv->perf.oa.ops.oa_disable(dev_priv);
> >   }
> >
> > -static const struct i915_perf_stream_ops i915_oa_stream_ops = {
> > -     .destroy = i915_oa_stream_destroy,
> > -     .enable = i915_oa_stream_enable,
> > -     .disable = i915_oa_stream_disable,
> > -     .wait_unlocked = i915_oa_wait_unlocked,
> > -     .poll_wait = i915_oa_poll_wait,
> > -     .read = i915_oa_read,
> > +static const struct i915_perf_stream_ops perf_stream_ops = {
> > +     .destroy = i915_perf_stream_destroy,
> > +     .enable = i915_perf_stream_enable,
> > +     .disable = i915_perf_stream_disable,
> > +     .wait_unlocked = i915_perf_stream_wait_unlocked,
> > +     .poll_wait = i915_perf_stream_poll_wait,
> > +     .read = i915_perf_stream_read,
> > +     .emit_sample_capture = i915_perf_stream_emit_sample_capture,
> >   };
> >
> >   /**
> > - * i915_oa_stream_init - validate combined props for OA stream and init
> > + * i915_perf_stream_init - validate combined props for stream and init
> >    * @stream: An i915 perf stream
> >    * @param: The open parameters passed to `DRM_I915_PERF_OPEN`
> >    * @props: The property state that configures stream (individually
> validated)
> > @@ -1984,58 +2582,35 @@ static void i915_oa_stream_disable(struct
> i915_perf_stream *stream)
> >    * doesn't ensure that the combination necessarily makes sense.
> >    *
> >    * At this point it has been determined that userspace wants a stream
> of
> > - * OA metrics, but still we need to further validate the combined
> > + * perf metrics, but still we need to further validate the combined
> >    * properties are OK.
> >    *
> >    * If the configuration makes sense then we can allocate memory for
> > - * a circular OA buffer and apply the requested metric set
> configuration.
> > + * a circular perf buffer and apply the requested metric set
> configuration.
> >    *
> >    * Returns: zero on success or a negative error code.
> >    */
> > -static int i915_oa_stream_init(struct i915_perf_stream *stream,
> > +static int i915_perf_stream_init(struct i915_perf_stream *stream,
> >                              struct drm_i915_perf_open_param *param,
> >                              struct perf_open_properties *props)
> >   {
> >       struct drm_i915_private *dev_priv = stream->dev_priv;
> > -     int format_size;
> > +     bool require_oa_unit = props->sample_flags & (SAMPLE_OA_REPORT |
> > +                                                   SAMPLE_OA_SOURCE);
> > +     bool cs_sample_data = props->sample_flags & SAMPLE_OA_REPORT;
> > +     struct i915_perf_stream *curr_stream;
> > +     struct intel_engine_cs *engine = NULL;
> > +     int idx;
> >       int ret;
> >
> > -     /* If the sysfs metrics/ directory wasn't registered for some
> > -      * reason then don't let userspace try their luck with config
> > -      * IDs
> > -      */
> > -     if (!dev_priv->perf.metrics_kobj) {
> > -             DRM_DEBUG("OA metrics weren't advertised via sysfs\n");
> > -             return -EINVAL;
> > -     }
> > -
> > -     if (!(props->sample_flags & SAMPLE_OA_REPORT)) {
> > -             DRM_DEBUG("Only OA report sampling supported\n");
> > -             return -EINVAL;
> > -     }
> > -
> > -     if (!dev_priv->perf.oa.ops.init_oa_buffer) {
> > -             DRM_DEBUG("OA unit not supported\n");
> > -             return -ENODEV;
> > -     }
> > -
> > -     /* To avoid the complexity of having to accurately filter
> > -      * counter reports and marshal to the appropriate client
> > -      * we currently only allow exclusive access
> > -      */
> > -     if (dev_priv->perf.oa.exclusive_stream) {
> > -             DRM_DEBUG("OA unit already in use\n");
> > -             return -EBUSY;
> > -     }
> > -
> > -     if (!props->metrics_set) {
> > -             DRM_DEBUG("OA metric set not specified\n");
> > -             return -EINVAL;
> > -     }
> > -
> > -     if (!props->oa_format) {
> > -             DRM_DEBUG("OA report format not specified\n");
> > -             return -EINVAL;
> > +     if ((props->sample_flags & SAMPLE_CTX_ID) && !props->cs_mode) {
> > +             if (IS_HASWELL(dev_priv)) {
> > +                     DRM_ERROR("On HSW, context ID sampling only
> supported via command stream\n");
> > +                     return -EINVAL;
> > +             } else if (!i915.enable_execlists) {
> > +                     DRM_ERROR("On Gen8+ without execlists, context ID
> sampling only supported via command stream\n");
> > +                     return -EINVAL;
> > +             }
> >       }
> >
> >       /* We set up some ratelimit state to potentially throttle any
> _NOTES
> > @@ -2060,70 +2635,167 @@ static int i915_oa_stream_init(struct
> i915_perf_stream *stream,
> >
> >       stream->sample_size = sizeof(struct drm_i915_perf_record_header);
> >
> > -     format_size = dev_priv->perf.oa.oa_formats[props->oa_format].size;
> > +     if (require_oa_unit) {
> > +             int format_size;
> >
> > -     stream->sample_flags |= SAMPLE_OA_REPORT;
> > -     stream->sample_size += format_size;
> > +             /* If the sysfs metrics/ directory wasn't registered for
> some
> > +              * reason then don't let userspace try their luck with
> config
> > +              * IDs
> > +              */
> > +             if (!dev_priv->perf.metrics_kobj) {
> > +                     DRM_DEBUG("OA metrics weren't advertised via
> sysfs\n");
> > +                     return -EINVAL;
> > +             }
> >
> > -     if (props->sample_flags & SAMPLE_OA_SOURCE) {
> > -             stream->sample_flags |= SAMPLE_OA_SOURCE;
> > -             stream->sample_size += 8;
> > -     }
> > +             if (!dev_priv->perf.oa.ops.init_oa_buffer) {
> > +                     DRM_DEBUG("OA unit not supported\n");
> > +                     return -ENODEV;
> > +             }
> >
> > -     dev_priv->perf.oa.oa_buffer.format_size = format_size;
> > -     if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
> > -             return -EINVAL;
> > +             if (!props->metrics_set) {
> > +                     DRM_DEBUG("OA metric set not specified\n");
> > +                     return -EINVAL;
> > +             }
> > +
> > +             if (!props->oa_format) {
> > +                     DRM_DEBUG("OA report format not specified\n");
> > +                     return -EINVAL;
> > +             }
> > +
> > +             if (props->cs_mode && (props->engine != RCS)) {
> > +                     DRM_ERROR("Command stream OA metrics only
> available via Render CS\n");
> > +                     return -EINVAL;
> > +             }
> > +
> > +             engine = dev_priv->engine[RCS];
> > +             stream->using_oa = true;
> > +
> > +             idx = srcu_read_lock(&engine->perf_srcu);
> > +             curr_stream = srcu_dereference(engine->exclusive_stream,
> > +                                            &engine->perf_srcu);
> > +             if (curr_stream) {
> > +                     DRM_ERROR("Stream already opened\n");
> > +                     ret = -EINVAL;
> > +                     goto err_enable;
> > +             }
> > +             srcu_read_unlock(&engine->perf_srcu, idx);
> > +
> > +             format_size =
> > +                     dev_priv->perf.oa.oa_formats[
> props->oa_format].size;
> > +
> > +             if (props->sample_flags & SAMPLE_OA_REPORT) {
> > +                     stream->sample_flags |= SAMPLE_OA_REPORT;
> > +                     stream->sample_size += format_size;
> > +             }
> > +
> > +             if (props->sample_flags & SAMPLE_OA_SOURCE) {
> > +                     if (!(props->sample_flags & SAMPLE_OA_REPORT)) {
> > +                             DRM_ERROR("OA source type can't be sampled
> without OA report\n");
> > +                             return -EINVAL;
> > +                     }
> > +                     stream->sample_flags |= SAMPLE_OA_SOURCE;
> > +                     stream->sample_size += 8;
> > +             }
> > +
> > +             dev_priv->perf.oa.oa_buffer.format_size = format_size;
> > +             if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
> > +                     return -EINVAL;
> > +
> > +             dev_priv->perf.oa.oa_buffer.format =
> > +                     dev_priv->perf.oa.oa_formats[
> props->oa_format].format;
> > +
> > +             dev_priv->perf.oa.metrics_set = props->metrics_set;
> >
> > -     dev_priv->perf.oa.oa_buffer.format =
> > -             dev_priv->perf.oa.oa_formats[props->oa_format].format;
> > +             dev_priv->perf.oa.periodic = props->oa_periodic;
> > +             if (dev_priv->perf.oa.periodic)
> > +                     dev_priv->perf.oa.period_exponent =
> > +                             props->oa_period_exponent;
> >
> > -     dev_priv->perf.oa.metrics_set = props->metrics_set;
> > +             if (stream->ctx) {
> > +                     ret = oa_get_render_ctx_id(stream);
> > +                     if (ret)
> > +                             return ret;
> > +             }
> >
> > -     dev_priv->perf.oa.periodic = props->oa_periodic;
> > -     if (dev_priv->perf.oa.periodic)
> > -             dev_priv->perf.oa.period_exponent =
> props->oa_period_exponent;
> > +             /* PRM - observability performance counters:
> > +              *
> > +              *   OACONTROL, performance counter enable, note:
> > +              *
> > +              *   "When this bit is set, in order to have coherent
> counts,
> > +              *   RC6 power state and trunk clock gating must be
> disabled.
> > +              *   This can be achieved by programming MMIO registers as
> > +              *   0xA094=0 and 0xA090[31]=1"
> > +              *
> > +              *   In our case we are expecting that taking pm +
> FORCEWAKE
> > +              *   references will effectively disable RC6.
> > +              */
> > +             intel_runtime_pm_get(dev_priv);
> > +             intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
> >
> > -     if (stream->ctx) {
> > -             ret = oa_get_render_ctx_id(stream);
> > +             ret = alloc_oa_buffer(dev_priv);
> >               if (ret)
> > -                     return ret;
> > +                     goto err_oa_buf_alloc;
> > +
> > +             ret = dev_priv->perf.oa.ops.enable_metric_set(dev_priv);
> > +             if (ret)
> > +                     goto err_enable;
> >       }
> >
> > -     /* PRM - observability performance counters:
> > -      *
> > -      *   OACONTROL, performance counter enable, note:
> > -      *
> > -      *   "When this bit is set, in order to have coherent counts,
> > -      *   RC6 power state and trunk clock gating must be disabled.
> > -      *   This can be achieved by programming MMIO registers as
> > -      *   0xA094=0 and 0xA090[31]=1"
> > -      *
> > -      *   In our case we are expecting that taking pm + FORCEWAKE
> > -      *   references will effectively disable RC6.
> > -      */
> > -     intel_runtime_pm_get(dev_priv);
> > -     intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
> > +     if (props->sample_flags & SAMPLE_CTX_ID) {
> > +             stream->sample_flags |= SAMPLE_CTX_ID;
> > +             stream->sample_size += 8;
> > +     }
> >
> > -     ret = alloc_oa_buffer(dev_priv);
> > -     if (ret)
> > -             goto err_oa_buf_alloc;
> > +     if (props->cs_mode) {
> > +             if (!cs_sample_data) {
> > +                     DRM_ERROR("Stream engine given without requesting
> any CS data to sample\n");
> > +                     ret = -EINVAL;
> > +                     goto err_enable;
> > +             }
> >
> > -     ret = dev_priv->perf.oa.ops.enable_metric_set(dev_priv);
> > -     if (ret)
> > -             goto err_enable;
> > +             if (!(props->sample_flags & SAMPLE_CTX_ID)) {
> > +                     DRM_ERROR("Stream engine given without requesting
> any CS specific property\n");
> > +                     ret = -EINVAL;
> > +                     goto err_enable;
> > +             }
> >
> > -     stream->ops = &i915_oa_stream_ops;
> > +             engine = dev_priv->engine[props->engine];
> >
> > -     dev_priv->perf.oa.exclusive_stream = stream;
> > +             idx = srcu_read_lock(&engine->perf_srcu);
> > +             curr_stream = srcu_dereference(engine->exclusive_stream,
> > +                                            &engine->perf_srcu);
> > +             if (curr_stream) {
> > +                     DRM_ERROR("Stream already opened\n");
> > +                     ret = -EINVAL;
> > +                     goto err_enable;
> > +             }
> > +             srcu_read_unlock(&engine->perf_srcu, idx);
> > +
> > +             INIT_LIST_HEAD(&stream->cs_samples);
> > +             ret = alloc_cs_buffer(stream);
> > +             if (ret)
> > +                     goto err_enable;
> > +
> > +             stream->cs_mode = true;
> > +     }
> > +
> > +     init_waitqueue_head(&stream->poll_wq);
> > +     stream->pollin = false;
> > +     stream->ops = &perf_stream_ops;
> > +     stream->engine = engine;
> > +     rcu_assign_pointer(engine->exclusive_stream, stream);
> >
> >       return 0;
> >
> >   err_enable:
> > -     free_oa_buffer(dev_priv);
> > +     if (require_oa_unit)
> > +             free_oa_buffer(dev_priv);
> >
> >   err_oa_buf_alloc:
> > -     intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> > -     intel_runtime_pm_put(dev_priv);
> > +     if (require_oa_unit) {
> > +             intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> > +             intel_runtime_pm_put(dev_priv);
> > +     }
> >       if (stream->ctx)
> >               oa_put_render_ctx_id(stream);
> >
> > @@ -2219,7 +2891,7 @@ static ssize_t i915_perf_read(struct file *file,
> >        * disabled stream as an error. In particular it might otherwise
> lead
> >        * to a deadlock for blocking file descriptors...
> >        */
> > -     if (!stream->enabled)
> > +     if (stream->state == I915_PERF_STREAM_DISABLED)
> >               return -EIO;
> >
> >       if (!(file->f_flags & O_NONBLOCK)) {
> > @@ -2254,25 +2926,32 @@ static ssize_t i915_perf_read(struct file *file,
> >        * effectively ensures we back off until the next hrtimer callback
> >        * before reporting another POLLIN event.
> >        */
> > -     if (ret >= 0 || ret == -EAGAIN) {
> > -             /* Maybe make ->pollin per-stream state if we support
> multiple
> > -              * concurrent streams in the future.
> > -              */
> > -             dev_priv->perf.oa.pollin = false;
> > -     }
> > +     if (ret >= 0 || ret == -EAGAIN)
> > +             stream->pollin = false;
> >
> >       return ret;
> >   }
> >
> > -static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer
> *hrtimer)
> > +static enum hrtimer_restart poll_check_timer_cb(struct hrtimer *hrtimer)
> >   {
> > +     struct i915_perf_stream *stream;
> >       struct drm_i915_private *dev_priv =
> >               container_of(hrtimer, typeof(*dev_priv),
> > -                          perf.oa.poll_check_timer);
> > -
> > -     if (oa_buffer_check_unlocked(dev_priv)) {
> > -             dev_priv->perf.oa.pollin = true;
> > -             wake_up(&dev_priv->perf.oa.poll_wq);
> > +                          perf.poll_check_timer);
> > +     int idx;
> > +     struct intel_engine_cs *engine;
> > +     enum intel_engine_id id;
> > +
> > +     for_each_engine(engine, dev_priv, id) {
> > +             idx = srcu_read_lock(&engine->perf_srcu);
> > +             stream = srcu_dereference(engine->exclusive_stream,
> > +                                       &engine->perf_srcu);
> > +             if (stream && (stream->state == I915_PERF_STREAM_ENABLED)
> &&
> > +                 stream_have_data_unlocked(stream)) {
> > +                     stream->pollin = true;
> > +                     wake_up(&stream->poll_wq);
> > +             }
> > +             srcu_read_unlock(&engine->perf_srcu, idx);
> >       }
> >
> >       hrtimer_forward_now(hrtimer, ns_to_ktime(POLL_PERIOD));
> > @@ -2311,7 +2990,7 @@ static unsigned int i915_perf_poll_locked(struct
> drm_i915_private *dev_priv,
> >        * the hrtimer/oa_poll_check_timer_cb to notify us when there are
> >        * samples to read.
> >        */
> > -     if (dev_priv->perf.oa.pollin)
> > +     if (stream->pollin)
> >               events |= POLLIN;
> >
> >       return events;
> > @@ -2355,14 +3034,16 @@ static unsigned int i915_perf_poll(struct file
> *file, poll_table *wait)
> >    */
> >   static void i915_perf_enable_locked(struct i915_perf_stream *stream)
> >   {
> > -     if (stream->enabled)
> > +     if (stream->state != I915_PERF_STREAM_DISABLED)
> >               return;
> >
> >       /* Allow stream->ops->enable() to refer to this */
> > -     stream->enabled = true;
> > +     stream->state = I915_PERF_STREAM_ENABLE_IN_PROGRESS;
> >
> >       if (stream->ops->enable)
> >               stream->ops->enable(stream);
> > +
> > +     stream->state = I915_PERF_STREAM_ENABLED;
> >   }
> >
> >   /**
> > @@ -2381,11 +3062,11 @@ static void i915_perf_enable_locked(struct
> i915_perf_stream *stream)
> >    */
> >   static void i915_perf_disable_locked(struct i915_perf_stream *stream)
> >   {
> > -     if (!stream->enabled)
> > +     if (stream->state != I915_PERF_STREAM_ENABLED)
> >               return;
> >
> >       /* Allow stream->ops->disable() to refer to this */
> > -     stream->enabled = false;
> > +     stream->state = I915_PERF_STREAM_DISABLED;
> >
> >       if (stream->ops->disable)
> >               stream->ops->disable(stream);
> > @@ -2457,14 +3138,12 @@ static long i915_perf_ioctl(struct file *file,
> >    */
> >   static void i915_perf_destroy_locked(struct i915_perf_stream *stream)
> >   {
> > -     if (stream->enabled)
> > +     if (stream->state == I915_PERF_STREAM_ENABLED)
> >               i915_perf_disable_locked(stream);
> >
> >       if (stream->ops->destroy)
> >               stream->ops->destroy(stream);
> >
> > -     list_del(&stream->link);
> > -
> >       if (stream->ctx)
> >               i915_gem_context_put(stream->ctx);
> >
> > @@ -2524,7 +3203,7 @@ static int i915_perf_release(struct inode *inode,
> struct file *file)
> >    *
> >    * In the case where userspace is interested in OA unit metrics then
> further
> >    * config validation and stream initialization details will be handled
> by
> > - * i915_oa_stream_init(). The code here should only validate config
> state that
> > + * i915_perf_stream_init(). The code here should only validate config
> state that
> >    * will be relevant to all stream types / backends.
> >    *
> >    * Returns: zero on success or a negative error code.
> > @@ -2593,7 +3272,7 @@ static int i915_perf_release(struct inode *inode,
> struct file *file)
> >       stream->dev_priv = dev_priv;
> >       stream->ctx = specific_ctx;
> >
> > -     ret = i915_oa_stream_init(stream, param, props);
> > +     ret = i915_perf_stream_init(stream, param, props);
> >       if (ret)
> >               goto err_alloc;
> >
> > @@ -2606,8 +3285,6 @@ static int i915_perf_release(struct inode *inode,
> struct file *file)
> >               goto err_flags;
> >       }
> >
> > -     list_add(&stream->link, &dev_priv->perf.streams);
> > -
> >       if (param->flags & I915_PERF_FLAG_FD_CLOEXEC)
> >               f_flags |= O_CLOEXEC;
> >       if (param->flags & I915_PERF_FLAG_FD_NONBLOCK)
> > @@ -2625,7 +3302,6 @@ static int i915_perf_release(struct inode *inode,
> struct file *file)
> >       return stream_fd;
> >
> >   err_open:
> > -     list_del(&stream->link);
> >   err_flags:
> >       if (stream->ops->destroy)
> >               stream->ops->destroy(stream);
> > @@ -2774,6 +3450,29 @@ static int read_properties_unlocked(struct
> drm_i915_private *dev_priv,
> >               case DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE:
> >                       props->sample_flags |= SAMPLE_OA_SOURCE;
> >                       break;
> > +             case DRM_I915_PERF_PROP_ENGINE: {
> > +                             unsigned int user_ring_id =
> > +                                     value & I915_EXEC_RING_MASK;
> > +                             enum intel_engine_id engine;
> > +
> > +                             if (user_ring_id > I915_USER_RINGS)
> > +                                     return -EINVAL;
> > +
> > +                             /* XXX: Currently only RCS is supported.
> > +                              * Remove this check when support for other
> > +                              * engines is added
> > +                              */
> > +                             engine = user_ring_map[user_ring_id];
> > +                             if (engine != RCS)
> > +                                     return -EINVAL;
> > +
> > +                             props->cs_mode = true;
> > +                             props->engine = engine;
> > +                     }
> > +                     break;
> > +             case DRM_I915_PERF_PROP_SAMPLE_CTX_ID:
> > +                     props->sample_flags |= SAMPLE_CTX_ID;
> > +                     break;
> >               case DRM_I915_PERF_PROP_MAX:
> >                       MISSING_CASE(id);
> >                       return -EINVAL;
> > @@ -3002,6 +3701,30 @@ void i915_perf_unregister(struct drm_i915_private
> *dev_priv)
> >       {}
> >   };
> >
> > +void i915_perf_streams_mark_idle(struct drm_i915_private *dev_priv)
> > +{
> > +     struct intel_engine_cs *engine;
> > +     struct i915_perf_stream *stream;
> > +     enum intel_engine_id id;
> > +     int idx;
> > +
> > +     for_each_engine(engine, dev_priv, id) {
> > +             idx = srcu_read_lock(&engine->perf_srcu);
> > +             stream = srcu_dereference(engine->exclusive_stream,
> > +                                       &engine->perf_srcu);
> > +             if (stream && (stream->state == I915_PERF_STREAM_ENABLED)
> &&
> > +                                     stream->cs_mode) {
> > +                     struct reservation_object *resv =
> > +
>  stream->cs_buffer.vma->resv;
> > +
> > +                     reservation_object_lock(resv, NULL);
> > +                     reservation_object_add_excl_fence(resv, NULL);
> > +                     reservation_object_unlock(resv);
> > +             }
> > +             srcu_read_unlock(&engine->perf_srcu, idx);
> > +     }
> > +}
> > +
> >   /**
> >    * i915_perf_init - initialize i915-perf state on module load
> >    * @dev_priv: i915 device instance
> > @@ -3125,12 +3848,10 @@ void i915_perf_init(struct drm_i915_private
> *dev_priv)
> >       }
> >
> >       if (dev_priv->perf.oa.n_builtin_sets) {
> > -             hrtimer_init(&dev_priv->perf.oa.poll_check_timer,
> > +             hrtimer_init(&dev_priv->perf.poll_check_timer,
> >                               CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> > -             dev_priv->perf.oa.poll_check_timer.function =
> oa_poll_check_timer_cb;
> > -             init_waitqueue_head(&dev_priv->perf.oa.poll_wq);
> > +             dev_priv->perf.poll_check_timer.function =
> poll_check_timer_cb;
> >
> > -             INIT_LIST_HEAD(&dev_priv->perf.streams);
> >               mutex_init(&dev_priv->perf.lock);
> >               spin_lock_init(&dev_priv->perf.oa.oa_buffer.ptr_lock);
> >
> > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c
> b/drivers/gpu/drm/i915/intel_engine_cs.c
> > index 9ab5969..1a2e843 100644
> > --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> > @@ -317,6 +317,10 @@ int intel_engines_init(struct drm_i915_private
> *dev_priv)
> >                       goto cleanup;
> >
> >               GEM_BUG_ON(!engine->submit_request);
> > +
> > +             /* Perf stream related initialization for Engine */
> > +             rcu_assign_pointer(engine->exclusive_stream, NULL);
> > +             init_srcu_struct(&engine->perf_srcu);
> >       }
> >
> >       return 0;
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
> b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index cdf084e..4333623 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -1622,6 +1622,8 @@ void intel_engine_cleanup(struct intel_engine_cs
> *engine)
> >
> >       intel_engine_cleanup_common(engine);
> >
> > +     cleanup_srcu_struct(&engine->perf_srcu);
> > +
> >       dev_priv->engine[engine->id] = NULL;
> >       kfree(engine);
> >   }
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h
> b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > index d33c934..0ac8491 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > @@ -441,6 +441,11 @@ struct intel_engine_cs {
> >        * certain bits to encode the command length in the header).
> >        */
> >       u32 (*get_cmd_length_mask)(u32 cmd_header);
> > +
> > +     /* Global per-engine stream */
> > +     struct srcu_struct perf_srcu;
> > +     struct i915_perf_stream __rcu *exclusive_stream;
> > +     u32 specific_ctx_id;
> >   };
> >
> >   static inline unsigned int
> > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> > index a1314c5..768b1a5 100644
> > --- a/include/uapi/drm/i915_drm.h
> > +++ b/include/uapi/drm/i915_drm.h
> > @@ -1350,6 +1350,7 @@ enum drm_i915_oa_format {
> >
> >   enum drm_i915_perf_sample_oa_source {
> >       I915_PERF_SAMPLE_OA_SOURCE_OABUFFER,
> > +     I915_PERF_SAMPLE_OA_SOURCE_CS,
> >       I915_PERF_SAMPLE_OA_SOURCE_MAX  /* non-ABI */
> >   };
> >
> > @@ -1394,6 +1395,19 @@ enum drm_i915_perf_property_id {
> >        */
> >       DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE,
> >
> > +     /**
> > +      * The value of this property specifies the GPU engine for which
> > +      * the samples need to be collected. Specifying this property also
> > +      * implies the command stream based sample collection.
> > +      */
> > +     DRM_I915_PERF_PROP_ENGINE,
> > +
> > +     /**
> > +      * The value of this property set to 1 requests inclusion of
> context ID
> > +      * in the perf sample data.
> > +      */
> > +     DRM_I915_PERF_PROP_SAMPLE_CTX_ID,
> > +
> >       DRM_I915_PERF_PROP_MAX /* non-ABI */
> >   };
> >
> > @@ -1460,6 +1474,7 @@ enum drm_i915_perf_record_type {
> >        *     struct drm_i915_perf_record_header header;
> >        *
> >        *     { u64 source; } && DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE
> > +      *     { u64 ctx_id; } && DRM_I915_PERF_PROP_SAMPLE_CTX_ID
> >        *     { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
> >        * };
> >        */
>
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>

[-- Attachment #1.2: Type: text/html, Size: 124103 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.
  2017-08-01 18:05       ` sourab gupta
@ 2017-08-01 20:58         ` Lionel Landwerlin
  2017-08-02  2:47           ` sourab gupta
  0 siblings, 1 reply; 34+ messages in thread
From: Lionel Landwerlin @ 2017-08-01 20:58 UTC (permalink / raw)
  To: sourab gupta, Kamble, Sagar A; +Cc: intel-gfx, Sourab Gupta


[-- Attachment #1.1: Type: text/plain, Size: 8514 bytes --]

On 01/08/17 19:05, sourab gupta wrote:
>
>
> On Tue, Aug 1, 2017 at 2:59 PM, Kamble, Sagar A 
> <sagar.a.kamble@intel.com <mailto:sagar.a.kamble@intel.com>> wrote:
>
>
>
>     -----Original Message-----
>     From: Landwerlin, Lionel G
>     Sent: Monday, July 31, 2017 9:16 PM
>     To: Kamble, Sagar A <sagar.a.kamble@intel.com
>     <mailto:sagar.a.kamble@intel.com>>;
>     intel-gfx@lists.freedesktop.org
>     <mailto:intel-gfx@lists.freedesktop.org>
>     Cc: Sourab Gupta <sourab.gupta@intel.com
>     <mailto:sourab.gupta@intel.com>>
>     Subject: Re: [Intel-gfx] [PATCH 03/12] drm/i915: Framework for
>     capturing command stream based OA reports and ctx id info.
>
>     On 31/07/17 08:59, Sagar Arun Kamble wrote:
>     > From: Sourab Gupta <sourab.gupta@intel.com
>     <mailto:sourab.gupta@intel.com>>
>     >
>     > This patch introduces a framework to capture OA counter reports
>     associated
>     > with Render command stream. We can then associate the reports
>     captured
>     > through this mechanism with their corresponding context id's.
>     This can be
>     > further extended to associate any other metadata information
>     with the
>     > corresponding samples (since the association with Render command
>     stream
>     > gives us the ability to capture these information while
>     inserting the
>     > corresponding capture commands into the command stream).
>     >
>     > The OA reports generated in this way are associated with a
>     corresponding
>     > workload, and thus can be used the delimit the workload (i.e.
>     sample the
>     > counters at the workload boundaries), within an ongoing stream
>     of periodic
>     > counter snapshots.
>     >
>     > There may be usecases wherein we need more than periodic OA
>     capture mode
>     > which is supported currently. This mode is primarily used for
>     two usecases:
>     >      - Ability to capture system wide metrics, alongwith the
>     ability to map
>     >        the reports back to individual contexts (particularly for
>     HSW).
>     >      - Ability to inject tags for work, into the reports. This
>     provides
>     >        visibility into the multiple stages of work within single
>     context.
>     >
>     > The userspace will be able to distinguish between the periodic
>     and CS based
>     > OA reports by the virtue of source_info sample field.
>     >
>     > The command MI_REPORT_PERF_COUNT can be used to capture
>     snapshots of OA
>     > counters, and is inserted at BB boundaries.
>     > The data thus captured will be stored in a separate buffer,
>     which will
>     > be different from the buffer used otherwise for periodic OA
>     capture mode.
>     > The metadata information pertaining to snapshot is maintained in
>     a list,
>     > which also has offsets into the gem buffer object per captured
>     snapshot.
>     > In order to track whether the gpu has completed processing the node,
>     > a field pertaining to corresponding gem request is added, which
>     is tracked
>     > for completion of the command.
>     >
>     > Both periodic and CS based reports are associated with a single
>     stream
>     > (corresponding to render engine), and it is expected to have the
>     samples
>     > in the sequential order according to their timestamps. Now,
>     since these
>     > reports are collected in separate buffers, these are merge
>     sorted at the
>     > time of forwarding to userspace during the read call.
>     >
>     > v2: Aligning with the non-perf interface (custom drm ioctl
>     based). Also,
>     > few related patches are squashed together for better readability
>     >
>     > v3: Updated perf sample capture emit hook name. Reserving space
>     upfront
>     > in the ring for emitting sample capture commands and using
>     > req->fence.seqno for tracking samples. Added SRCU protection for
>     streams.
>     > Changed the stream last_request tracking to resv object. (Chris)
>     > Updated perf.sample_lock spin_lock usage to avoid softlockups. Moved
>     > stream to global per-engine structure. (Sagar)
>     > Update unpin and put in the free routines to
>     i915_vma_unpin_and_release.
>     > Making use of perf stream cs_buffer vma resv instead of separate
>     resv obj.
>     > Pruned perf stream vma resv during gem_idle. (Chris)
>     > Changed payload field ctx_id to u64 to keep all sample data
>     aligned at 8
>     > bytes. (Lionel)
>     > stall/flush prior to sample capture is not added. Do we need to
>     give this
>     > control to user to select whether to stall/flush at each sample?
>     >
>     > Signed-off-by: Sourab Gupta <sourab.gupta@intel.com
>     <mailto:sourab.gupta@intel.com>>
>     > Signed-off-by: Robert Bragg <robert@sixbynine.org
>     <mailto:robert@sixbynine.org>>
>     > Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com
>     <mailto:sagar.a.kamble@intel.com>>
>     > ---
>     >   drivers/gpu/drm/i915/i915_drv.h |  101 ++-
>     >   drivers/gpu/drm/i915/i915_gem.c |    1 +
>     >   drivers/gpu/drm/i915/i915_gem_execbuffer.c |    8 +
>     >   drivers/gpu/drm/i915/i915_perf.c  | 1185
>     ++++++++++++++++++++++------
>     >   drivers/gpu/drm/i915/intel_engine_cs.c  |    4 +
>     >   drivers/gpu/drm/i915/intel_ringbuffer.c |    2 +
>     >   drivers/gpu/drm/i915/intel_ringbuffer.h |    5 +
>     >   include/uapi/drm/i915_drm.h                |  15 +
>     >   8 files changed, 1073 insertions(+), 248 deletions(-)
>     >
>     > diff --git a/drivers/gpu/drm/i915/i915_drv.h
>     b/drivers/gpu/drm/i915/i915_drv.h
>     > index 2c7456f..8b1cecf 100644
>     > --- a/drivers/gpu/drm/i915/i915_drv.h
>     > +++ b/drivers/gpu/drm/i915/i915_drv.h
>     > @@ -1985,6 +1985,24 @@ struct i915_perf_stream_ops {
>     >        * The stream will always be disabled before this is called.
>     >        */
>     >       void (*destroy)(struct i915_perf_stream *stream);
>     > +
>     > +     /*
>     > +      * @emit_sample_capture: Emit the commands in the command
>     streamer
>     > +      * for a particular gpu engine.
>     > +      *
>     > +      * The commands are inserted to capture the perf sample
>     data at
>     > +      * specific points during workload execution, such as
>     before and after
>     > +      * the batch buffer.
>     > +      */
>     > +     void (*emit_sample_capture)(struct i915_perf_stream *stream,
>     > +                                 struct drm_i915_gem_request
>     *request,
>     > +                                 bool preallocate);
>     > +};
>     > +
>
>     It seems the motivation for this following enum is mostly to deal with
>     the fact that engine->perf_srcu is set before the OA unit is
>     configured.
>     Would it possible to set it later so that we get rid of the enum?
>
>     <Sagar> I will try to make this as just binary state. This enum is
>     defining the state of the stream. I too got confused with purpose
>     of IN_PROGRESS.
>     SRCU is used for synchronizing stream state check.
>     IN_PROGRESS will enable us to not advertently try to access the
>     stream vma for inserting the samples, but I guess depending on
>     disabled/enabled should
>     suffice.
>
>
> Hi Sagar/Lionel,

Hi Sourab,

Thanks again for your input on this.

>
> The purpose of the tristate was to workaround a particular kludge of
> working with just enabled/disabled boolean state. I'll explain below.
>
> Let's say we have only boolean state.
> i915_perf_emit_sample_capture() function would depend on
> stream->enabled in order to insert the MI_RPC command in RCS.
> If you see i915_perf_enable_locked(), stream->enabled is set before
> stream->ops->enable(). The stream->ops->enable() function actually
> enables the OA hardware to capture reports, and if MI_RPC commands
> are submitted before OA hw is enabled, it may hang the gpu.

Do you remember if this is documented anywhere?
I couldn't find anything in the MI_RPC instruction.

>
> Also, we can't change the order of calling these operations inside
> i915_perf_enable_locked() since gen7_update_oacontrol_locked()
> function depends on stream->enabled flag to enable the OA
> hw unit (i.e. it needs the flag to be true).

We can probably work around that by passing some arguments.

> To workaround this problem, I introduced a tristate here.
> If you can suggest some alternate solution to this problem,
> we can remove this tristate kludge here.
> Regards,
> Sourab
>



[-- Attachment #1.2: Type: text/html, Size: 17181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.
  2017-08-01 20:58         ` Lionel Landwerlin
@ 2017-08-02  2:47           ` sourab gupta
  2017-08-02  4:25             ` Kamble, Sagar A
  0 siblings, 1 reply; 34+ messages in thread
From: sourab gupta @ 2017-08-02  2:47 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx, Sourab Gupta


[-- Attachment #1.1: Type: text/plain, Size: 7965 bytes --]

On Wed, Aug 2, 2017 at 2:28 AM, Lionel Landwerlin <
lionel.g.landwerlin@intel.com> wrote:

> On 01/08/17 19:05, sourab gupta wrote:
>
>
>
> On Tue, Aug 1, 2017 at 2:59 PM, Kamble, Sagar A <sagar.a.kamble@intel.com>
> wrote:
>
>>
>>
>> -----Original Message-----
>> From: Landwerlin, Lionel G
>> Sent: Monday, July 31, 2017 9:16 PM
>> To: Kamble, Sagar A <sagar.a.kamble@intel.com>;
>> intel-gfx@lists.freedesktop.org
>> Cc: Sourab Gupta <sourab.gupta@intel.com>
>> Subject: Re: [Intel-gfx] [PATCH 03/12] drm/i915: Framework for capturing
>> command stream based OA reports and ctx id info.
>>
>> On 31/07/17 08:59, Sagar Arun Kamble wrote:
>> > From: Sourab Gupta <sourab.gupta@intel.com>
>> >
>> > This patch introduces a framework to capture OA counter reports
>> associated
>> > with Render command stream. We can then associate the reports captured
>> > through this mechanism with their corresponding context id's. This can
>> be
>> > further extended to associate any other metadata information with the
>> > corresponding samples (since the association with Render command stream
>> > gives us the ability to capture these information while inserting the
>> > corresponding capture commands into the command stream).
>> >
>> > The OA reports generated in this way are associated with a corresponding
>> > workload, and thus can be used the delimit the workload (i.e. sample the
>> > counters at the workload boundaries), within an ongoing stream of
>> periodic
>> > counter snapshots.
>> >
>> > There may be usecases wherein we need more than periodic OA capture mode
>> > which is supported currently. This mode is primarily used for two
>> usecases:
>> >      - Ability to capture system wide metrics, alongwith the ability to
>> map
>> >        the reports back to individual contexts (particularly for HSW).
>> >      - Ability to inject tags for work, into the reports. This provides
>> >        visibility into the multiple stages of work within single
>> context.
>> >
>> > The userspace will be able to distinguish between the periodic and CS
>> based
>> > OA reports by the virtue of source_info sample field.
>> >
>> > The command MI_REPORT_PERF_COUNT can be used to capture snapshots of OA
>> > counters, and is inserted at BB boundaries.
>> > The data thus captured will be stored in a separate buffer, which will
>> > be different from the buffer used otherwise for periodic OA capture
>> mode.
>> > The metadata information pertaining to snapshot is maintained in a list,
>> > which also has offsets into the gem buffer object per captured snapshot.
>> > In order to track whether the gpu has completed processing the node,
>> > a field pertaining to corresponding gem request is added, which is
>> tracked
>> > for completion of the command.
>> >
>> > Both periodic and CS based reports are associated with a single stream
>> > (corresponding to render engine), and it is expected to have the samples
>> > in the sequential order according to their timestamps. Now, since these
>> > reports are collected in separate buffers, these are merge sorted at the
>> > time of forwarding to userspace during the read call.
>> >
>> > v2: Aligning with the non-perf interface (custom drm ioctl based). Also,
>> > few related patches are squashed together for better readability
>> >
>> > v3: Updated perf sample capture emit hook name. Reserving space upfront
>> > in the ring for emitting sample capture commands and using
>> > req->fence.seqno for tracking samples. Added SRCU protection for
>> streams.
>> > Changed the stream last_request tracking to resv object. (Chris)
>> > Updated perf.sample_lock spin_lock usage to avoid softlockups. Moved
>> > stream to global per-engine structure. (Sagar)
>> > Update unpin and put in the free routines to i915_vma_unpin_and_release.
>> > Making use of perf stream cs_buffer vma resv instead of separate resv
>> obj.
>> > Pruned perf stream vma resv during gem_idle. (Chris)
>> > Changed payload field ctx_id to u64 to keep all sample data aligned at 8
>> > bytes. (Lionel)
>> > stall/flush prior to sample capture is not added. Do we need to give
>> this
>> > control to user to select whether to stall/flush at each sample?
>> >
>> > Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
>> > Signed-off-by: Robert Bragg <robert@sixbynine.org>
>> > Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
>> > ---
>> >   drivers/gpu/drm/i915/i915_drv.h            |  101 ++-
>> >   drivers/gpu/drm/i915/i915_gem.c            |    1 +
>> >   drivers/gpu/drm/i915/i915_gem_execbuffer.c |    8 +
>> >   drivers/gpu/drm/i915/i915_perf.c           | 1185
>> ++++++++++++++++++++++------
>> >   drivers/gpu/drm/i915/intel_engine_cs.c     |    4 +
>> >   drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +
>> >   drivers/gpu/drm/i915/intel_ringbuffer.h    |    5 +
>> >   include/uapi/drm/i915_drm.h                |   15 +
>> >   8 files changed, 1073 insertions(+), 248 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
>> b/drivers/gpu/drm/i915/i915_drv.h
>> > index 2c7456f..8b1cecf 100644
>> > --- a/drivers/gpu/drm/i915/i915_drv.h
>> > +++ b/drivers/gpu/drm/i915/i915_drv.h
>> > @@ -1985,6 +1985,24 @@ struct i915_perf_stream_ops {
>> >        * The stream will always be disabled before this is called.
>> >        */
>> >       void (*destroy)(struct i915_perf_stream *stream);
>> > +
>> > +     /*
>> > +      * @emit_sample_capture: Emit the commands in the command streamer
>> > +      * for a particular gpu engine.
>> > +      *
>> > +      * The commands are inserted to capture the perf sample data at
>> > +      * specific points during workload execution, such as before and
>> after
>> > +      * the batch buffer.
>> > +      */
>> > +     void (*emit_sample_capture)(struct i915_perf_stream *stream,
>> > +                                 struct drm_i915_gem_request *request,
>> > +                                 bool preallocate);
>> > +};
>> > +
>>
>> It seems the motivation for this following enum is mostly to deal with
>> the fact that engine->perf_srcu is set before the OA unit is configured.
>> Would it possible to set it later so that we get rid of the enum?
>>
>> <Sagar> I will try to make this as just binary state. This enum is
>> defining the state of the stream. I too got confused with purpose of
>> IN_PROGRESS.
>> SRCU is used for synchronizing stream state check.
>> IN_PROGRESS will enable us to not advertently try to access the stream
>> vma for inserting the samples, but I guess depending on disabled/enabled
>> should
>> suffice.
>>
>
> Hi Sagar/Lionel,
>
>
> Hi Sourab,
>
> Thanks again for your input on this.
>
>
> The purpose of the tristate was to workaround a particular kludge of
> working with just enabled/disabled boolean state. I'll explain below.
>
> Let's say we have only boolean state.
> i915_perf_emit_sample_capture() function would depend on
> stream->enabled in order to insert the MI_RPC command in RCS.
> If you see i915_perf_enable_locked(), stream->enabled is set before
> stream->ops->enable(). The stream->ops->enable() function actually
> enables the OA hardware to capture reports, and if MI_RPC commands
> are submitted before OA hw is enabled, it may hang the gpu.
>
>
> Do you remember if this is documented anywhere?
> I couldn't find anything in the MI_RPC instruction.
>
> Sorry, I don't happen to remember any documentation. Probably, you can
check this out by submitting MI_RPC without enabling OA.

>
> Also, we can't change the order of calling these operations inside
> i915_perf_enable_locked() since gen7_update_oacontrol_locked()
> function depends on stream->enabled flag to enable the OA
> hw unit (i.e. it needs the flag to be true).
>
>
> We can probably work around that by passing some arguments.
>
> To workaround this problem, I introduced a tristate here.
> If you can suggest some alternate solution to this problem,
> we can remove this tristate kludge here.
> Regards,
> Sourab
>
>
>
>

[-- Attachment #1.2: Type: text/html, Size: 17575 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.
  2017-08-02  2:47           ` sourab gupta
@ 2017-08-02  4:25             ` Kamble, Sagar A
  0 siblings, 0 replies; 34+ messages in thread
From: Kamble, Sagar A @ 2017-08-02  4:25 UTC (permalink / raw)
  To: sourab gupta, Landwerlin, Lionel G; +Cc: intel-gfx, Sourab Gupta


[-- Attachment #1.1: Type: text/plain, Size: 8697 bytes --]



From: sourab gupta [mailto:sourabgupta@gmail.com]
Sent: Wednesday, August 2, 2017 8:17 AM
To: Landwerlin, Lionel G <lionel.g.landwerlin@intel.com>
Cc: Kamble, Sagar A <sagar.a.kamble@intel.com>; intel-gfx@lists.freedesktop.org; Sourab Gupta <sourab.gupta@intel.com>
Subject: Re: [Intel-gfx] [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.



On Wed, Aug 2, 2017 at 2:28 AM, Lionel Landwerlin <lionel.g.landwerlin@intel.com<mailto:lionel.g.landwerlin@intel.com>> wrote:
On 01/08/17 19:05, sourab gupta wrote:


On Tue, Aug 1, 2017 at 2:59 PM, Kamble, Sagar A <sagar.a.kamble@intel.com<mailto:sagar.a.kamble@intel.com>> wrote:


-----Original Message-----
From: Landwerlin, Lionel G
Sent: Monday, July 31, 2017 9:16 PM
To: Kamble, Sagar A <sagar.a.kamble@intel.com<mailto:sagar.a.kamble@intel.com>>; intel-gfx@lists.freedesktop.org<mailto:intel-gfx@lists.freedesktop.org>
Cc: Sourab Gupta <sourab.gupta@intel.com<mailto:sourab.gupta@intel.com>>
Subject: Re: [Intel-gfx] [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.

On 31/07/17 08:59, Sagar Arun Kamble wrote:
> From: Sourab Gupta <sourab.gupta@intel.com<mailto:sourab.gupta@intel.com>>
>
> This patch introduces a framework to capture OA counter reports associated
> with Render command stream. We can then associate the reports captured
> through this mechanism with their corresponding context id's. This can be
> further extended to associate any other metadata information with the
> corresponding samples (since the association with Render command stream
> gives us the ability to capture these information while inserting the
> corresponding capture commands into the command stream).
>
> The OA reports generated in this way are associated with a corresponding
> workload, and thus can be used the delimit the workload (i.e. sample the
> counters at the workload boundaries), within an ongoing stream of periodic
> counter snapshots.
>
> There may be usecases wherein we need more than periodic OA capture mode
> which is supported currently. This mode is primarily used for two usecases:
>      - Ability to capture system wide metrics, alongwith the ability to map
>        the reports back to individual contexts (particularly for HSW).
>      - Ability to inject tags for work, into the reports. This provides
>        visibility into the multiple stages of work within single context.
>
> The userspace will be able to distinguish between the periodic and CS based
> OA reports by the virtue of source_info sample field.
>
> The command MI_REPORT_PERF_COUNT can be used to capture snapshots of OA
> counters, and is inserted at BB boundaries.
> The data thus captured will be stored in a separate buffer, which will
> be different from the buffer used otherwise for periodic OA capture mode.
> The metadata information pertaining to snapshot is maintained in a list,
> which also has offsets into the gem buffer object per captured snapshot.
> In order to track whether the gpu has completed processing the node,
> a field pertaining to corresponding gem request is added, which is tracked
> for completion of the command.
>
> Both periodic and CS based reports are associated with a single stream
> (corresponding to render engine), and it is expected to have the samples
> in the sequential order according to their timestamps. Now, since these
> reports are collected in separate buffers, these are merge sorted at the
> time of forwarding to userspace during the read call.
>
> v2: Aligning with the non-perf interface (custom drm ioctl based). Also,
> few related patches are squashed together for better readability
>
> v3: Updated perf sample capture emit hook name. Reserving space upfront
> in the ring for emitting sample capture commands and using
> req->fence.seqno for tracking samples. Added SRCU protection for streams.
> Changed the stream last_request tracking to resv object. (Chris)
> Updated perf.sample_lock spin_lock usage to avoid softlockups. Moved
> stream to global per-engine structure. (Sagar)
> Update unpin and put in the free routines to i915_vma_unpin_and_release.
> Making use of perf stream cs_buffer vma resv instead of separate resv obj.
> Pruned perf stream vma resv during gem_idle. (Chris)
> Changed payload field ctx_id to u64 to keep all sample data aligned at 8
> bytes. (Lionel)
> stall/flush prior to sample capture is not added. Do we need to give this
> control to user to select whether to stall/flush at each sample?
>
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com<mailto:sourab.gupta@intel.com>>
> Signed-off-by: Robert Bragg <robert@sixbynine.org<mailto:robert@sixbynine.org>>
> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com<mailto:sagar.a.kamble@intel.com>>
> ---
>   drivers/gpu/drm/i915/i915_drv.h            |  101 ++-
>   drivers/gpu/drm/i915/i915_gem.c            |    1 +
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c |    8 +
>   drivers/gpu/drm/i915/i915_perf.c           | 1185 ++++++++++++++++++++++------
>   drivers/gpu/drm/i915/intel_engine_cs.c     |    4 +
>   drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +
>   drivers/gpu/drm/i915/intel_ringbuffer.h    |    5 +
>   include/uapi/drm/i915_drm.h                |   15 +
>   8 files changed, 1073 insertions(+), 248 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 2c7456f..8b1cecf 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1985,6 +1985,24 @@ struct i915_perf_stream_ops {
>        * The stream will always be disabled before this is called.
>        */
>       void (*destroy)(struct i915_perf_stream *stream);
> +
> +     /*
> +      * @emit_sample_capture: Emit the commands in the command streamer
> +      * for a particular gpu engine.
> +      *
> +      * The commands are inserted to capture the perf sample data at
> +      * specific points during workload execution, such as before and after
> +      * the batch buffer.
> +      */
> +     void (*emit_sample_capture)(struct i915_perf_stream *stream,
> +                                 struct drm_i915_gem_request *request,
> +                                 bool preallocate);
> +};
> +

It seems the motivation for this following enum is mostly to deal with
the fact that engine->perf_srcu is set before the OA unit is configured.
Would it possible to set it later so that we get rid of the enum?
<Sagar> I will try to make this as just binary state. This enum is defining the state of the stream. I too got confused with purpose of IN_PROGRESS.
SRCU is used for synchronizing stream state check.
IN_PROGRESS will enable us to not advertently try to access the stream vma for inserting the samples, but I guess depending on disabled/enabled should
suffice.

Hi Sagar/Lionel,

Hi Sourab,

Thanks again for your input on this.



The purpose of the tristate was to workaround a particular kludge of
working with just enabled/disabled boolean state. I'll explain below.

Let's say we have only boolean state.
i915_perf_emit_sample_capture() function would depend on
stream->enabled in order to insert the MI_RPC command in RCS.
If you see i915_perf_enable_locked(), stream->enabled is set before
stream->ops->enable(). The stream->ops->enable() function actually
enables the OA hardware to capture reports, and if MI_RPC commands
are submitted before OA hw is enabled, it may hang the gpu.

Do you remember if this is documented anywhere?
I couldn't find anything in the MI_RPC instruction.
Sorry, I don't happen to remember any documentation. Probably, you can check this out by submitting MI_RPC without enabling OA.
Thing is gen7_oa_enable is depending on enabled to be true.
Lionel, can I change it to :

        if (!stream->enabled) {
                I915_WRITE(GEN7_OACONTROL,
                           (ctx_id & GEN7_OACONTROL_CTX_MASK) |
                           (period_exponent <<

With stream->enabled set to true at the end of i915_perf_enable_locked.


Also, we can't change the order of calling these operations inside
i915_perf_enable_locked() since gen7_update_oacontrol_locked()
function depends on stream->enabled flag to enable the OA
hw unit (i.e. it needs the flag to be true).

We can probably work around that by passing some arguments.


To workaround this problem, I introduced a tristate here.
If you can suggest some alternate solution to this problem,
we can remove this tristate kludge here.
Regards,
Sourab




[-- Attachment #1.2: Type: text/html, Size: 18562 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.
  2017-07-31 10:11     ` Chris Wilson
@ 2017-08-02  4:44       ` Kamble, Sagar A
  0 siblings, 0 replies; 34+ messages in thread
From: Kamble, Sagar A @ 2017-08-02  4:44 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Sourab Gupta

Hi Chis,

Understood the need to handle request reordering. 
Are you suggesting following paths:
1. cs samples list for stream be read based on the order of submission from submit timestamps/OA capture timestamps.
2. put the commands to capture during eb_submit and patch the offset in vma where data is to be captured, populate cs sample list during __i915_gem_request_submit

For preemption, it would then simplify by just discarding the cs sample and relying on corresponding next  __i915_gem_request_submit.

Thanks
Sagar

-----Original Message-----
From: Chris Wilson [mailto:chris@chris-wilson.co.uk] 
Sent: Monday, July 31, 2017 3:42 PM
To: Kamble, Sagar A <sagar.a.kamble@intel.com>; intel-gfx@lists.freedesktop.org
Cc: Sourab Gupta <sourab.gupta@intel.com>
Subject: Re: [Intel-gfx] [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.

Quoting Chris Wilson (2017-07-31 09:34:30)
> Quoting Sagar Arun Kamble (2017-07-31 08:59:36)
> > +/**
> > + * i915_perf_stream_emit_sample_capture - Insert the commands to 
> > +capture perf
> > + * metrics into the GPU command stream
> > + * @stream: An i915-perf stream opened for GPU metrics
> > + * @request: request in whose context the metrics are being collected.
> > + * @preallocate: allocate space in ring for related sample.
> > + */
> > +static void i915_perf_stream_emit_sample_capture(
> > +                                       struct i915_perf_stream *stream,
> > +                                       struct drm_i915_gem_request *request,
> > +                                       bool preallocate) {
> > +       struct reservation_object *resv = stream->cs_buffer.vma->resv;
> > +       struct i915_perf_cs_sample *sample;
> > +       unsigned long flags;
> > +       int ret;
> > +
> > +       sample = kzalloc(sizeof(*sample), GFP_KERNEL);
> > +       if (sample == NULL) {
> > +               DRM_ERROR("Perf sample alloc failed\n");
> > +               return;
> > +       }
> > +
> > +       sample->request = i915_gem_request_get(request);
> > +       sample->ctx_id = request->ctx->hw_id;
> > +
> > +       insert_perf_sample(stream, sample);
> > +
> > +       if (stream->sample_flags & SAMPLE_OA_REPORT) {
> > +               ret = i915_emit_oa_report_capture(request,
> > +                                                 preallocate,
> > +                                                 sample->offset);
> > +               if (ret)
> > +                       goto err_unref;
> > +       }
> 
> This is incorrect as the requests may be reordered. You either need to 
> declare the linear ordering of requests through the sample buffer, or 
> we have to delay setting sample->offset until execution, and even then 
> we need to disable preemption when using SAMPLE_OA_REPORT.

Thinking about it, you do need to serialise based on stream->vma, or else where a stream->vma per capture context.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2017-08-02  4:44 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-31  7:59 [PATCH 00/12] i915 perf support for command stream based OA, GPU and workload metrics capture Sagar Arun Kamble
2017-07-31  7:59 ` [PATCH 01/12] drm/i915: Add ctx getparam ioctl parameter to retrieve ctx unique id Sagar Arun Kamble
2017-07-31  7:59 ` [PATCH 02/12] drm/i915: Expose OA sample source to userspace Sagar Arun Kamble
2017-07-31  7:59 ` [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info Sagar Arun Kamble
2017-07-31  8:34   ` Chris Wilson
2017-07-31 10:11     ` Chris Wilson
2017-08-02  4:44       ` Kamble, Sagar A
2017-07-31  9:43   ` Lionel Landwerlin
2017-07-31 11:38     ` sourab gupta
2017-07-31 14:25       ` Lionel Landwerlin
2017-07-31 15:38   ` kbuild test robot
2017-07-31 15:45   ` Lionel Landwerlin
2017-08-01  9:29     ` Kamble, Sagar A
2017-08-01 18:05       ` sourab gupta
2017-08-01 20:58         ` Lionel Landwerlin
2017-08-02  2:47           ` sourab gupta
2017-08-02  4:25             ` Kamble, Sagar A
2017-07-31  7:59 ` [PATCH 04/12] drm/i915: Flush periodic samples, in case of no pending CS sample requests Sagar Arun Kamble
2017-07-31 16:52   ` kbuild test robot
2017-07-31  7:59 ` [PATCH 05/12] drm/i915: Inform userspace about command stream OA buf overflow Sagar Arun Kamble
2017-07-31  7:59 ` [PATCH 06/12] drm/i915: Populate ctx ID for periodic OA reports Sagar Arun Kamble
2017-07-31  9:27   ` Lionel Landwerlin
2017-07-31 10:42     ` Kamble, Sagar A
2017-07-31 18:17   ` kbuild test robot
2017-07-31  7:59 ` [PATCH 07/12] drm/i915: Add support for having pid output with OA report Sagar Arun Kamble
2017-07-31 19:24   ` kbuild test robot
2017-07-31  7:59 ` [PATCH 08/12] drm/i915: Add support for emitting execbuffer tags through OA counter reports Sagar Arun Kamble
2017-07-31  7:59 ` [PATCH 09/12] drm/i915: Add support for collecting timestamps on all gpu engines Sagar Arun Kamble
2017-07-31  7:59 ` [PATCH 10/12] drm/i915: Extract raw GPU timestamps from OA reports to forward in perf samples Sagar Arun Kamble
2017-07-31  7:59 ` [PATCH 11/12] drm/i915: Async check for streams data availability with hrtimer rescheduling Sagar Arun Kamble
2017-07-31  7:59 ` [PATCH 12/12] drm/i915: Support for capturing MMIO register values Sagar Arun Kamble
2017-07-31 11:49   ` kbuild test robot
2017-07-31 12:08   ` kbuild test robot
2017-07-31  9:02 ` ✓ Fi.CI.BAT: success for i915 perf support for command stream based OA, GPU and workload metrics capture Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.