* [PATCH 0/8] Collect command stream based OA reports using i915 perf
@ 2017-03-16  6:14 sourab.gupta
  2017-03-16  6:14 ` [PATCH 1/8] drm/i915: Add ctx getparam ioctl parameter to retrieve ctx unique id sourab.gupta
                   ` (9 more replies)
  0 siblings, 10 replies; 24+ messages in thread
From: sourab.gupta @ 2017-03-16  6:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Sourab Gupta, Matthew Auld

From: Sourab Gupta <sourab.gupta@intel.com>

This series adds a framework for the collection of OA reports associated with
the render command stream; these reports are collected around batchbuffer
boundaries.

Refloating the series rebased on Robert's latest patch set for
'Enabling OA unit for Gen 8 and 9 in i915 perf', which can be found here:
https://patchwork.freedesktop.org/series/20084/

Since Robert's patches are being reviewed and this patch series extends his
framework to collect command stream based OA metrics, it would be good to keep
this work in perspective. Looking to receive feedback (and possibly r-b's :))
on the series.

Since the OA reports are collected in association with the render command
stream, this also gives us the ability to collect other metadata, such as
ctx_id, pid, etc., with the samples, and thus to establish the association of
the collected samples with the corresponding process/workload.

These patches can be found for viewing at
https://github.com/sourabgu/linux/tree/oa-6march2017

Sourab Gupta (8):
  drm/i915: Add ctx getparam ioctl parameter to retrieve ctx unique id
  drm/i915: Expose OA sample source to userspace
  drm/i915: Framework for capturing command stream based OA reports
  drm/i915: flush periodic samples, in case of no pending CS sample
    requests
  drm/i915: Inform userspace about command stream OA buf overflow
  drm/i915: Populate ctx ID for periodic OA reports
  drm/i915: Add support for having pid output with OA report
  drm/i915: Add support for emitting execbuffer tags through OA counter
    reports

 drivers/gpu/drm/i915/i915_drv.h            |  125 ++-
 drivers/gpu/drm/i915/i915_gem_context.c    |    3 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    6 +
 drivers/gpu/drm/i915/i915_perf.c           | 1149 ++++++++++++++++++++++++----
 include/uapi/drm/i915_drm.h                |   49 ++
 5 files changed, 1184 insertions(+), 148 deletions(-)

-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH 1/8] drm/i915: Add ctx getparam ioctl parameter to retrieve ctx unique id
  2017-03-16  6:14 [PATCH 0/8] Collect command stream based OA reports using i915 perf sourab.gupta
@ 2017-03-16  6:14 ` sourab.gupta
  2017-03-16  8:23   ` Chris Wilson
  2017-03-16  6:14 ` [PATCH 2/8] drm/i915: Expose OA sample source to userspace sourab.gupta
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: sourab.gupta @ 2017-03-16  6:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Sourab Gupta, Matthew Auld

From: Sourab Gupta <sourab.gupta@intel.com>

This patch adds a new ctx getparam ioctl parameter, which userspace can use
to retrieve the unique ID of a context.

This can be used by userspace to map the OA reports received in the
i915 perf samples to their associated contexts (the OA reports have the
HW ctx ID information embedded for Gen8+).
Otherwise userspace has no way of maintaining this association, since it
only has knowledge of the per-drm-file ctx handles.

Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 3 +++
 include/uapi/drm/i915_drm.h             | 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index baceca1..b6d2125 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1062,6 +1062,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_CONTEXT_PARAM_BANNABLE:
 		args->value = i915_gem_context_is_bannable(ctx);
 		break;
+	case I915_CONTEXT_PARAM_HW_ID:
+		args->value = ctx->hw_id;
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 03b8338..835e711 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1291,6 +1291,7 @@ struct drm_i915_gem_context_param {
 #define I915_CONTEXT_PARAM_GTT_SIZE	0x3
 #define I915_CONTEXT_PARAM_NO_ERROR_CAPTURE	0x4
 #define I915_CONTEXT_PARAM_BANNABLE	0x5
+#define I915_CONTEXT_PARAM_HW_ID	0x6
 	__u64 value;
 };
 
-- 
1.9.1



* [PATCH 2/8] drm/i915: Expose OA sample source to userspace
  2017-03-16  6:14 [PATCH 0/8] Collect command stream based OA reports using i915 perf sourab.gupta
  2017-03-16  6:14 ` [PATCH 1/8] drm/i915: Add ctx getparam ioctl parameter to retrieve ctx unique id sourab.gupta
@ 2017-03-16  6:14 ` sourab.gupta
  2017-03-16 14:20   ` Robert Bragg
  2017-03-16  6:14 ` [PATCH 3/8] drm/i915: Framework for capturing command stream based OA reports sourab.gupta
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: sourab.gupta @ 2017-03-16  6:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Sourab Gupta, Matthew Auld

From: Sourab Gupta <sourab.gupta@intel.com>

This patch exposes a new sample source field to userspace. This field can
be populated to specify the origin of the OA report.
Currently, OA samples are generated only periodically, and hence only one
source enum value is defined right now, but there are other means of
generating OA samples, such as via MI_RPC commands. The OA_SOURCE sample
type introduces a mechanism for userspace to distinguish OA reports
generated via these different sources.

Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Robert Bragg <robert@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_perf.c | 18 ++++++++++++++++++
 include/uapi/drm/i915_drm.h      | 14 ++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 4b1db73..540c5b2 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -324,6 +324,7 @@
 };
 
 #define SAMPLE_OA_REPORT      (1<<0)
+#define SAMPLE_OA_SOURCE_INFO	(1<<1)
 
 /**
  * struct perf_open_properties - for validated properties given to open a stream
@@ -659,6 +660,15 @@ static int append_oa_sample(struct i915_perf_stream *stream,
 		return -EFAULT;
 	buf += sizeof(header);
 
+	if (sample_flags & SAMPLE_OA_SOURCE_INFO) {
+		enum drm_i915_perf_oa_event_source source =
+				I915_PERF_OA_EVENT_SOURCE_PERIODIC;
+
+		if (copy_to_user(buf, &source, 4))
+			return -EFAULT;
+		buf += 4;
+	}
+
 	if (sample_flags & SAMPLE_OA_REPORT) {
 		if (copy_to_user(buf, report, report_size))
 			return -EFAULT;
@@ -2030,6 +2040,11 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	stream->sample_flags |= SAMPLE_OA_REPORT;
 	stream->sample_size += format_size;
 
+	if (props->sample_flags & SAMPLE_OA_SOURCE_INFO) {
+		stream->sample_flags |= SAMPLE_OA_SOURCE_INFO;
+		stream->sample_size += 4;
+	}
+
 	dev_priv->perf.oa.oa_buffer.format_size = format_size;
 	if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
 		return -EINVAL;
@@ -2814,6 +2829,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 			props->oa_periodic = true;
 			props->oa_period_exponent = value;
 			break;
+		case DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE:
+			props->sample_flags |= SAMPLE_OA_SOURCE_INFO;
+			break;
 		default:
 			MISSING_CASE(id);
 			DRM_DEBUG("Unknown i915 perf property ID\n");
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 835e711..c597e36 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1312,6 +1312,12 @@ enum drm_i915_oa_format {
 	I915_OA_FORMAT_MAX	    /* non-ABI */
 };
 
+enum drm_i915_perf_oa_event_source {
+	I915_PERF_OA_EVENT_SOURCE_PERIODIC,
+
+	I915_PERF_OA_EVENT_SOURCE_MAX	/* non-ABI */
+};
+
 enum drm_i915_perf_property_id {
 	/**
 	 * Open the stream for a specific context handle (as used with
@@ -1346,6 +1352,13 @@ enum drm_i915_perf_property_id {
 	 */
 	DRM_I915_PERF_PROP_OA_EXPONENT,
 
+	/**
+	 * The value of this property set to 1 requests inclusion of a sample
+	 * source field in the samples given to userspace. The sample source
+	 * field specifies the origin of the OA report.
+	 */
+	DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE,
+
 	DRM_I915_PERF_PROP_MAX /* non-ABI */
 };
 
@@ -1411,6 +1424,7 @@ enum drm_i915_perf_record_type {
 	 * struct {
 	 *     struct drm_i915_perf_record_header header;
 	 *
+	 *     { u32 source_info; } && DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE
 	 *     { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
 	 * };
 	 */
-- 
1.9.1



* [PATCH 3/8] drm/i915: Framework for capturing command stream based OA reports
  2017-03-16  6:14 [PATCH 0/8] Collect command stream based OA reports using i915 perf sourab.gupta
  2017-03-16  6:14 ` [PATCH 1/8] drm/i915: Add ctx getparam ioctl parameter to retrieve ctx unique id sourab.gupta
  2017-03-16  6:14 ` [PATCH 2/8] drm/i915: Expose OA sample source to userspace sourab.gupta
@ 2017-03-16  6:14 ` sourab.gupta
  2017-03-16  8:10   ` Chris Wilson
  2017-03-16  8:31   ` Chris Wilson
  2017-03-16  6:14 ` [PATCH 4/8] drm/i915: flush periodic samples, in case of no pending CS sample requests sourab.gupta
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 24+ messages in thread
From: sourab.gupta @ 2017-03-16  6:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Sourab Gupta, Matthew Auld

From: Sourab Gupta <sourab.gupta@intel.com>

This patch introduces a framework to capture OA counter reports associated
with the render command stream. We can then associate the reports captured
through this mechanism with their corresponding context IDs. This can be
further extended to associate any other metadata with the corresponding
samples (since the association with the render command stream gives us the
ability to capture this information while inserting the corresponding
capture commands into the command stream).

The OA reports generated in this way are associated with a corresponding
workload, and thus can be used to delimit the workload (i.e. sample the
counters at the workload boundaries) within an ongoing stream of periodic
counter snapshots.

There may be use cases wherein we need more than the periodic OA capture
mode which is supported currently. This mode is primarily aimed at two use
cases:
    - Ability to capture system wide metrics, along with the ability to map
      the reports back to individual contexts (particularly for HSW).
    - Ability to inject tags for work into the reports. This provides
      visibility into the multiple stages of work within a single context.

Userspace will be able to distinguish between the periodic and CS based
OA reports by virtue of the source_info sample field.

The MI_REPORT_PERF_COUNT command can be used to capture snapshots of OA
counters, and is inserted at BB boundaries.
The data thus captured is stored in a separate buffer, distinct from the
buffer otherwise used for the periodic OA capture mode.
The metadata pertaining to each snapshot is maintained in a list, which
also holds, per captured snapshot, the offset into the gem buffer object.
In order to track whether the gpu has completed processing a node, a field
pointing to the corresponding gem request is added, which is tracked for
completion of the command.

Both periodic and RCS based reports are associated with a single stream
(corresponding to the render engine), and the samples are expected to be in
sequential order according to their timestamps. Since these reports are
collected in separate buffers, they are merge-sorted at the time of
forwarding to userspace during the read call.

v2: Aligning with the non-perf interface (custom drm ioctl based). Also,
a few related patches are squashed together for better readability.

Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Robert Bragg <robert@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_drv.h            |  88 ++-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +
 drivers/gpu/drm/i915/i915_perf.c           | 888 ++++++++++++++++++++++++-----
 include/uapi/drm/i915_drm.h                |  15 +
 4 files changed, 861 insertions(+), 134 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ef104ff5..0b70052 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1972,6 +1972,23 @@ struct i915_perf_stream_ops {
 	 * The stream will always be disabled before this is called.
 	 */
 	void (*destroy)(struct i915_perf_stream *stream);
+
+	/*
+	 * @command_stream_hook: Emit the commands in the command streamer
+	 * for a particular gpu engine.
+	 *
+	 * The commands are inserted to capture the perf sample data at
+	 * specific points during workload execution, such as before and after
+	 * the batch buffer.
+	 */
+	void (*command_stream_hook)(struct i915_perf_stream *stream,
+				struct drm_i915_gem_request *request);
+};
+
+enum i915_perf_stream_state {
+	I915_PERF_STREAM_DISABLED,
+	I915_PERF_STREAM_ENABLE_IN_PROGRESS,
+	I915_PERF_STREAM_ENABLED,
 };
 
 /**
@@ -1989,6 +2006,10 @@ struct i915_perf_stream {
 	struct list_head link;
 
 	/**
+	 * @engine: GPU engine associated with this particular stream
+	 */
+	enum intel_engine_id engine;
+	/**
 	 * @sample_flags: Flags representing the `DRM_I915_PERF_PROP_SAMPLE_*`
 	 * properties given when opening a stream, representing the contents
 	 * of a single sample as read() by userspace.
@@ -2009,11 +2030,25 @@ struct i915_perf_stream {
 	struct i915_gem_context *ctx;
 
 	/**
-	 * @enabled: Whether the stream is currently enabled, considering
-	 * whether the stream was opened in a disabled state and based
-	 * on `I915_PERF_IOCTL_ENABLE` and `I915_PERF_IOCTL_DISABLE` calls.
+	 * @state: Current stream state, which can be either disabled, enabled,
+	 * or enable_in_progress, while considering whether the stream was
+	 * opened in a disabled state and based on `I915_PERF_IOCTL_ENABLE` and
+	 * `I915_PERF_IOCTL_DISABLE` calls.
 	 */
-	bool enabled;
+	enum i915_perf_stream_state state;
+
+	/**
+	 * @cs_mode: Whether command stream based perf sample collection is
+	 * enabled for this stream
+	 */
+	bool cs_mode;
+
+	/**
+	 * @last_request: Contains an RCU guarded pointer to the last request
+	 * associated with the stream, when command stream mode is enabled for
+	 * the stream.
+	 */
+	struct i915_gem_active last_request;
 
 	/**
 	 * @ops: The callbacks providing the implementation of this specific
@@ -2074,7 +2109,8 @@ struct i915_oa_ops {
 	int (*read)(struct i915_perf_stream *stream,
 		    char __user *buf,
 		    size_t count,
-		    size_t *offset);
+		    size_t *offset,
+		    u32 ts);
 
 	/**
 	 * @oa_buffer_check: Check for OA buffer data + update tail
@@ -2093,6 +2129,36 @@ struct i915_oa_ops {
 	bool (*oa_buffer_check)(struct drm_i915_private *dev_priv);
 };
 
+/*
+ * i915_perf_cs_sample - Sample element to hold info about a single perf
+ * sample data associated with a particular GPU command stream.
+ */
+struct i915_perf_cs_sample {
+	/**
+	 * @link: Links the sample into ``&drm_i915_private->perf.cs_samples``
+	 */
+	struct list_head link;
+
+	/**
+	 * @request: Gem request associated with the sample. The commands to
+	 * capture the perf metrics are inserted into the command streamer in
+	 * context of this request.
+	 */
+	struct drm_i915_gem_request *request;
+
+	/**
+	 * @offset: Offset into ``&drm_i915_private->perf.command_stream_buf``
+	 * where the perf metrics will be collected, when the commands inserted
+	 * into the command stream are executed by GPU.
+	 */
+	u32 offset;
+
+	/**
+	 * @ctx_id: Context ID associated with this perf sample
+	 */
+	u32 ctx_id;
+};
+
 struct intel_cdclk_state {
 	unsigned int cdclk, vco, ref;
 };
@@ -2421,6 +2487,8 @@ struct drm_i915_private {
 		struct ctl_table_header *sysctl_header;
 
 		struct mutex lock;
+
+		struct mutex streams_lock;
 		struct list_head streams;
 
 		spinlock_t hook_lock;
@@ -2532,6 +2600,15 @@ struct drm_i915_private {
 			const struct i915_oa_format *oa_formats;
 			int n_builtin_sets;
 		} oa;
+
+		/* Command stream based perf data buffer */
+		struct {
+			struct i915_vma *vma;
+			u8 *vaddr;
+		} command_stream_buf;
+
+		struct list_head cs_samples;
+		spinlock_t sample_lock;
 	} perf;
 
 	/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
@@ -3593,6 +3670,7 @@ void i915_oa_init_reg_state(struct intel_engine_cs *engine,
 void i915_oa_update_reg_state(struct intel_engine_cs *engine,
 			      struct i915_gem_context *ctx,
 			      uint32_t *reg_state);
+void i915_perf_command_stream_hook(struct drm_i915_gem_request *req);
 
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct i915_address_space *vm,
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index aa75ea2..7af32c97 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1441,12 +1441,16 @@ static void eb_export_fence(struct drm_i915_gem_object *obj,
 	if (exec_len == 0)
 		exec_len = params->batch->size - params->args_batch_start_offset;
 
+	i915_perf_command_stream_hook(params->request);
+
 	ret = params->engine->emit_bb_start(params->request,
 					    exec_start, exec_len,
 					    params->dispatch_flags);
 	if (ret)
 		return ret;
 
+	i915_perf_command_stream_hook(params->request);
+
 	i915_gem_execbuffer_move_to_active(vmas, params->request);
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 540c5b2..3321dad 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -285,6 +285,12 @@
 #define OAREPORT_REASON_CTX_SWITCH     (1<<3)
 #define OAREPORT_REASON_CLK_RATIO      (1<<5)
 
+/* Data common to periodic and RCS based OA samples */
+struct oa_sample_data {
+	u32 source;
+	u32 ctx_id;
+	const u8 *report;
+};
 
 /* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate
  *
@@ -323,8 +329,19 @@
 	[I915_OA_FORMAT_C4_B8]		    = { 7, 64 },
 };
 
+/* Duplicated from similar static enum in i915_gem_execbuffer.c */
+#define I915_USER_RINGS (4)
+static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {
+	[I915_EXEC_DEFAULT]	= RCS,
+	[I915_EXEC_RENDER]	= RCS,
+	[I915_EXEC_BLT]		= BCS,
+	[I915_EXEC_BSD]		= VCS,
+	[I915_EXEC_VEBOX]	= VECS
+};
+
 #define SAMPLE_OA_REPORT      (1<<0)
 #define SAMPLE_OA_SOURCE_INFO	(1<<1)
+#define SAMPLE_CTX_ID		(1<<2)
 
 /**
  * struct perf_open_properties - for validated properties given to open a stream
@@ -335,6 +352,9 @@
  * @oa_format: An OA unit HW report format
  * @oa_periodic: Whether to enable periodic OA unit sampling
  * @oa_period_exponent: The OA unit sampling period is derived from this
+ * @cs_mode: Whether the stream is configured to enable collection of metrics
+ * associated with command stream of a particular GPU engine
+ * @engine: The GPU engine associated with the stream in case cs_mode is enabled
  *
  * As read_properties_unlocked() enumerates and validates the properties given
  * to open a stream of metrics the configuration is built up in the structure
@@ -351,9 +371,218 @@ struct perf_open_properties {
 	int oa_format;
 	bool oa_periodic;
 	int oa_period_exponent;
+
+	/* Command stream mode */
+	bool cs_mode;
+	enum intel_engine_id engine;
 };
 
 /**
+ * i915_perf_command_stream_hook - Insert the commands to capture metrics into the
+ * command stream of a GPU engine.
+ * @request: request in whose context the metrics are being collected.
+ *
+ * The function provides a hook through which the commands to capture perf
+ * metrics, are inserted into the command stream of a GPU engine.
+ */
+void i915_perf_command_stream_hook(struct drm_i915_gem_request *request)
+{
+	struct intel_engine_cs *engine = request->engine;
+	struct drm_i915_private *dev_priv = engine->i915;
+	struct i915_perf_stream *stream;
+
+	if (!dev_priv->perf.initialized)
+		return;
+
+	mutex_lock(&dev_priv->perf.streams_lock);
+	list_for_each_entry(stream, &dev_priv->perf.streams, link) {
+		if ((stream->state == I915_PERF_STREAM_ENABLED) &&
+					stream->cs_mode)
+			stream->ops->command_stream_hook(stream, request);
+	}
+	mutex_unlock(&dev_priv->perf.streams_lock);
+}
+
+/**
+ * release_perf_samples - Release old perf samples to make space for new
+ * sample data.
+ * @dev_priv: i915 device private
+ * @target_size: Space required to be freed up.
+ *
+ * We also dereference the associated request before deleting the sample.
+ * Also, no need to check whether the commands associated with old samples
+ * have been completed. This is because these sample entries are anyway going
+ * to be replaced by a new sample, and the gpu will eventually overwrite the buffer
+ * contents, when the request associated with new sample completes.
+ */
+static void release_perf_samples(struct drm_i915_private *dev_priv,
+					u32 target_size)
+{
+	struct i915_perf_cs_sample *sample, *next;
+	u32 sample_size = dev_priv->perf.oa.oa_buffer.format_size;
+	u32 size = 0;
+
+	list_for_each_entry_safe
+		(sample, next, &dev_priv->perf.cs_samples, link) {
+
+		size += sample_size;
+		i915_gem_request_put(sample->request);
+		list_del(&sample->link);
+		kfree(sample);
+
+		if (size >= target_size)
+			break;
+	}
+}
+
+/**
+ * insert_perf_sample - Insert a perf sample entry to the sample list.
+ * @dev_priv: i915 device private
+ * @sample: perf CS sample to be inserted into the list
+ *
+ * This function never fails, since it always manages to insert the sample.
+ * If the space is exhausted in the buffer, it will remove the older
+ * entries in order to make space.
+ */
+static void insert_perf_sample(struct drm_i915_private *dev_priv,
+				struct i915_perf_cs_sample *sample)
+{
+	struct i915_perf_cs_sample *first, *last;
+	int max_offset = dev_priv->perf.command_stream_buf.vma->obj->base.size;
+	u32 sample_size = dev_priv->perf.oa.oa_buffer.format_size;
+
+	spin_lock(&dev_priv->perf.sample_lock);
+	if (list_empty(&dev_priv->perf.cs_samples)) {
+		sample->offset = 0;
+		list_add_tail(&sample->link, &dev_priv->perf.cs_samples);
+		spin_unlock(&dev_priv->perf.sample_lock);
+		return;
+	}
+
+	first = list_first_entry(&dev_priv->perf.cs_samples, typeof(*first),
+				link);
+	last = list_last_entry(&dev_priv->perf.cs_samples, typeof(*last),
+				link);
+
+	if (last->offset >= first->offset) {
+		/* Sufficient space available at the end of buffer? */
+		if (last->offset + 2*sample_size < max_offset)
+			sample->offset = last->offset + sample_size;
+		/*
+		 * Wraparound condition. Is sufficient space available at
+		 * beginning of buffer?
+		 */
+		else if (sample_size < first->offset)
+			sample->offset = 0;
+		/* Insufficient space. Overwrite existing old entries */
+		else {
+			u32 target_size = sample_size - first->offset;
+
+			release_perf_samples(dev_priv, target_size);
+			sample->offset = 0;
+		}
+	} else {
+		/* Sufficient space available? */
+		if (last->offset + 2*sample_size < first->offset)
+			sample->offset = last->offset + sample_size;
+		/* Insufficient space. Overwrite existing old entries */
+		else {
+			u32 target_size = sample_size -
+				(first->offset - last->offset -
+				sample_size);
+
+			release_perf_samples(dev_priv, target_size);
+			sample->offset = last->offset + sample_size;
+		}
+	}
+	list_add_tail(&sample->link, &dev_priv->perf.cs_samples);
+	spin_unlock(&dev_priv->perf.sample_lock);
+}
+
+/**
+ * i915_perf_command_stream_hook_oa - Insert the commands to capture OA reports
+ * metrics into the render command stream
+ * @stream: An i915-perf stream opened for OA metrics
+ * @request: request in whose context the metrics are being collected.
+ *
+ */
+static void i915_perf_command_stream_hook_oa(struct i915_perf_stream *stream,
+					struct drm_i915_gem_request *request)
+{
+	struct drm_i915_private *dev_priv = request->i915;
+	struct i915_gem_context *ctx = request->ctx;
+	struct i915_perf_cs_sample *sample;
+	u32 addr = 0;
+	u32 cmd, *cs;
+
+	sample = kzalloc(sizeof(*sample), GFP_KERNEL);
+	if (sample == NULL) {
+		DRM_ERROR("Perf sample alloc failed\n");
+		return;
+	}
+
+	cs = intel_ring_begin(request, 4);
+	if (IS_ERR(cs)) {
+		kfree(sample);
+		return;
+	}
+
+	sample->ctx_id = ctx->hw_id;
+	i915_gem_request_assign(&sample->request, request);
+
+	insert_perf_sample(dev_priv, sample);
+
+	addr = dev_priv->perf.command_stream_buf.vma->node.start +
+		sample->offset;
+
+	if (WARN_ON(addr & 0x3f)) {
+		DRM_ERROR("OA buffer address not aligned to 64 bytes");
+		return;
+	}
+
+	cmd = MI_REPORT_PERF_COUNT | (1<<0);
+	if (INTEL_GEN(dev_priv) >= 8)
+		cmd |= (2<<0);
+
+	*cs++ = cmd;
+	*cs++ = addr | MI_REPORT_PERF_COUNT_GGTT;
+	*cs++ = request->global_seqno;
+
+	if (INTEL_GEN(dev_priv) >= 8)
+		*cs++ = 0;
+	else
+		*cs++ = MI_NOOP;
+
+	intel_ring_advance(request, cs);
+
+	i915_gem_active_set(&stream->last_request, request);
+	i915_vma_move_to_active(dev_priv->perf.command_stream_buf.vma, request,
+					EXEC_OBJECT_WRITE);
+}
+
+/**
+ * i915_oa_rcs_release_samples - Release the perf command stream samples
+ * @dev_priv: i915 device private
+ *
+ * Note: The associated requests should be completed before releasing the
+ * references here.
+ */
+static void i915_oa_rcs_release_samples(struct drm_i915_private *dev_priv)
+{
+	struct i915_perf_cs_sample *entry, *next;
+
+	list_for_each_entry_safe
+		(entry, next, &dev_priv->perf.cs_samples, link) {
+		i915_gem_request_put(entry->request);
+
+		spin_lock(&dev_priv->perf.sample_lock);
+		list_del(&entry->link);
+		spin_unlock(&dev_priv->perf.sample_lock);
+		kfree(entry);
+	}
+}
+
+/**
  * gen8_oa_buffer_check_unlocked - check for data and update tail ptr state
  * @dev_priv: i915 device instance
  *
@@ -626,7 +855,8 @@ static int append_oa_status(struct i915_perf_stream *stream,
  * @buf: destination buffer given by userspace
  * @count: the number of bytes userspace wants to read
  * @offset: (inout): the current position for writing into @buf
- * @report: A single OA report to (optionally) include as part of the sample
+ * @data: OA sample data, which contains (optionally) OA report with other
+ * sample metadata.
  *
  * The contents of a sample are configured through `DRM_I915_PERF_PROP_SAMPLE_*`
  * properties when opening a stream, tracked as `stream->sample_flags`. This
@@ -641,7 +871,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
 			    char __user *buf,
 			    size_t count,
 			    size_t *offset,
-			    const u8 *report)
+			    const struct oa_sample_data *data)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
@@ -661,17 +891,21 @@ static int append_oa_sample(struct i915_perf_stream *stream,
 	buf += sizeof(header);
 
 	if (sample_flags & SAMPLE_OA_SOURCE_INFO) {
-		enum drm_i915_perf_oa_event_source source =
-				I915_PERF_OA_EVENT_SOURCE_PERIODIC;
+		if (copy_to_user(buf, &data->source, 4))
+			return -EFAULT;
+		buf += 4;
+	}
 
-		if (copy_to_user(buf, &source, 4))
+	if (sample_flags & SAMPLE_CTX_ID) {
+		if (copy_to_user(buf, &data->ctx_id, 4))
 			return -EFAULT;
 		buf += 4;
 	}
 
 	if (sample_flags & SAMPLE_OA_REPORT) {
-		if (copy_to_user(buf, report, report_size))
+		if (copy_to_user(buf, data->report, report_size))
 			return -EFAULT;
+		buf += report_size;
 	}
 
 	(*offset) += header.size;
@@ -680,11 +914,47 @@ static int append_oa_sample(struct i915_perf_stream *stream,
 }
 
 /**
+ * append_oa_buffer_sample - Copies single periodic OA report into userspace
+ * read() buffer.
+ * @stream: An i915-perf stream opened for OA metrics
+ * @buf: destination buffer given by userspace
+ * @count: the number of bytes userspace wants to read
+ * @offset: (inout): the current position for writing into @buf
+ * @report: A single OA report to (optionally) include as part of the sample
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+static int append_oa_buffer_sample(struct i915_perf_stream *stream,
+				char __user *buf, size_t count,
+				size_t *offset,	const u8 *report)
+{
+	u32 sample_flags = stream->sample_flags;
+	struct oa_sample_data data = { 0 };
+
+	if (sample_flags & SAMPLE_OA_SOURCE_INFO)
+		data.source = I915_PERF_OA_EVENT_SOURCE_PERIODIC;
+
+	/*
+	 * FIXME: append_oa_buffer_sample: read ctx ID from report and map
+	 * that to an intel_context::hw_id
+	 */
+	if (sample_flags & SAMPLE_CTX_ID)
+		data.ctx_id = 0;
+
+	if (sample_flags & SAMPLE_OA_REPORT)
+		data.report = report;
+
+	return append_oa_sample(stream, buf, count, offset, &data);
+}
+
+
+/**
  * Copies all buffered OA reports into userspace read() buffer.
  * @stream: An i915-perf stream opened for OA metrics
  * @buf: destination buffer given by userspace
  * @count: the number of bytes userspace wants to read
  * @offset: (inout): the current position for writing into @buf
+ * @ts: copy OA reports till this timestamp
  *
  * Notably any error condition resulting in a short read (-%ENOSPC or
  * -%EFAULT) will be returned even though one or more records may
@@ -702,7 +972,8 @@ static int append_oa_sample(struct i915_perf_stream *stream,
 static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 				  char __user *buf,
 				  size_t count,
-				  size_t *offset)
+				  size_t *offset,
+				  u32 ts)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
@@ -716,7 +987,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 	u32 taken;
 	int ret = 0;
 
-	if (WARN_ON(!stream->enabled))
+	if (WARN_ON(stream->state != I915_PERF_STREAM_ENABLED))
 		return -EIO;
 
 	spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags);
@@ -759,6 +1030,11 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 		u32 *report32 = (void *)report;
 		u32 ctx_id;
 		u32 reason;
+		u32 report_ts = report32[1];
+
+		/* Report timestamp should not exceed the given ts */
+		if (report_ts > ts)
+			break;
 
 		/* All the report sizes factor neatly into the buffer
 		 * size so we never expect to see a report split
@@ -846,8 +1122,8 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 				report32[2] = INVALID_CTX_ID;
 			}
 
-			ret = append_oa_sample(stream, buf, count, offset,
-					       report);
+			ret = append_oa_buffer_sample(stream, buf, count,
+							offset, report);
 			if (ret)
 				break;
 
@@ -886,6 +1162,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
  * @buf: destination buffer given by userspace
  * @count: the number of bytes userspace wants to read
  * @offset: (inout): the current position for writing into @buf
+ * @ts: copy OA reports till this timestamp
  *
  * Checks OA unit status registers and if necessary appends corresponding
  * status records for userspace (such as for a buffer full condition) and then
@@ -903,7 +1180,8 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 static int gen8_oa_read(struct i915_perf_stream *stream,
 			char __user *buf,
 			size_t count,
-			size_t *offset)
+			size_t *offset,
+			u32 ts)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	u32 oastatus;
@@ -953,7 +1231,7 @@ static int gen8_oa_read(struct i915_perf_stream *stream,
 		}
 	}
 
-	return gen8_append_oa_reports(stream, buf, count, offset);
+	return gen8_append_oa_reports(stream, buf, count, offset, ts);
 }
 
 /**
@@ -962,6 +1240,7 @@ static int gen8_oa_read(struct i915_perf_stream *stream,
  * @buf: destination buffer given by userspace
  * @count: the number of bytes userspace wants to read
  * @offset: (inout): the current position for writing into @buf
+ * @ts: copy OA reports till this timestamp
  *
  * Notably any error condition resulting in a short read (-%ENOSPC or
  * -%EFAULT) will be returned even though one or more records may
@@ -979,7 +1258,8 @@ static int gen8_oa_read(struct i915_perf_stream *stream,
 static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 				  char __user *buf,
 				  size_t count,
-				  size_t *offset)
+				  size_t *offset,
+				  u32 ts)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
@@ -993,7 +1273,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 	u32 taken;
 	int ret = 0;
 
-	if (WARN_ON(!stream->enabled))
+	if (WARN_ON(stream->state != I915_PERF_STREAM_ENABLED))
 		return -EIO;
 
 	spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags);
@@ -1060,7 +1340,11 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 			continue;
 		}
 
-		ret = append_oa_sample(stream, buf, count, offset, report);
+		/* Report timestamp should not exceed the given ts */
+		if (report32[1] > ts)
+			break;
+
+		ret = append_oa_buffer_sample(stream, buf, count, offset, report);
 		if (ret)
 			break;
 
@@ -1099,6 +1383,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
  * @buf: destination buffer given by userspace
  * @count: the number of bytes userspace wants to read
  * @offset: (inout): the current position for writing into @buf
+ * @ts: copy OA reports till this timestamp
  *
  * Checks Gen 7 specific OA unit status registers and if necessary appends
  * corresponding status records for userspace (such as for a buffer full
@@ -1112,7 +1397,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 static int gen7_oa_read(struct i915_perf_stream *stream,
 			char __user *buf,
 			size_t count,
-			size_t *offset)
+			size_t *offset,
+			u32 ts)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	u32 oastatus1;
@@ -1174,7 +1460,159 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
 			GEN7_OASTATUS1_REPORT_LOST;
 	}
 
-	return gen7_append_oa_reports(stream, buf, count, offset);
+	return gen7_append_oa_reports(stream, buf, count, offset, ts);
+}
+
+/**
+ * append_oa_rcs_sample - Copies a single RCS based OA report into the
+ * userspace read() buffer.
+ * @stream: An i915-perf stream opened for OA metrics
+ * @buf: destination buffer given by userspace
+ * @count: the number of bytes userspace wants to read
+ * @offset: (inout): the current position for writing into @buf
+ * @node: Sample data associated with RCS based OA report
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+static int append_oa_rcs_sample(struct i915_perf_stream *stream,
+				char __user *buf,
+				size_t count,
+				size_t *offset,
+				struct i915_perf_cs_sample *node)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+	struct oa_sample_data data = { 0 };
+	const u8 *report = dev_priv->perf.command_stream_buf.vaddr +
+				node->offset;
+	u32 sample_flags = stream->sample_flags;
+	u32 report_ts;
+	int ret;
+
+	/* First, append the periodic OA samples having lower timestamps */
+	report_ts = *(u32 *)(report + 4);
+	ret = dev_priv->perf.oa.ops.read(stream, buf, count, offset, report_ts);
+	if (ret)
+		return ret;
+
+	if (sample_flags & SAMPLE_OA_SOURCE_INFO)
+		data.source = I915_PERF_OA_EVENT_SOURCE_RCS;
+
+	if (sample_flags & SAMPLE_CTX_ID)
+		data.ctx_id = node->ctx_id;
+
+	if (sample_flags & SAMPLE_OA_REPORT)
+		data.report = report;
+
+	return append_oa_sample(stream, buf, count, offset, &data);
+}
+
+/**
+ * oa_rcs_append_reports - Copies all command stream based OA reports into the
+ * userspace read() buffer.
+ * @stream: An i915-perf stream opened for OA metrics
+ * @buf: destination buffer given by userspace
+ * @count: the number of bytes userspace wants to read
+ * @offset: (inout): the current position for writing into @buf
+ *
+ * Notably any error condition resulting in a short read (-%ENOSPC or
+ * -%EFAULT) will be returned even though one or more records may
+ * have been successfully copied. In this case it's up to the caller
+ * to decide if the error should be squashed before returning to
+ * userspace.
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+static int oa_rcs_append_reports(struct i915_perf_stream *stream,
+				char __user *buf,
+				size_t count,
+				size_t *offset)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+	struct i915_perf_cs_sample *entry, *next;
+	LIST_HEAD(free_list);
+	int ret = 0;
+
+	spin_lock(&dev_priv->perf.sample_lock);
+	if (list_empty(&dev_priv->perf.cs_samples)) {
+		spin_unlock(&dev_priv->perf.sample_lock);
+		return 0;
+	}
+	list_for_each_entry_safe(entry, next,
+				 &dev_priv->perf.cs_samples, link) {
+		if (!i915_gem_request_completed(entry->request))
+			break;
+		list_move_tail(&entry->link, &free_list);
+	}
+	spin_unlock(&dev_priv->perf.sample_lock);
+
+	if (list_empty(&free_list))
+		return 0;
+
+	list_for_each_entry_safe(entry, next, &free_list, link) {
+		ret = append_oa_rcs_sample(stream, buf, count, offset, entry);
+		if (ret)
+			break;
+
+		list_del(&entry->link);
+		i915_gem_request_put(entry->request);
+		kfree(entry);
+	}
+
+	/* Don't discard remaining entries, keep them for next read */
+	spin_lock(&dev_priv->perf.sample_lock);
+	list_splice(&free_list, &dev_priv->perf.cs_samples);
+	spin_unlock(&dev_priv->perf.sample_lock);
+
+	return ret;
+}
+
+/*
+ * command_stream_buf_is_empty - Checks whether the command stream buffer
+ * associated with the stream has any completed samples available.
+ * @stream: An i915-perf stream opened for OA metrics
+ *
+ * Returns: true if the sample list is empty or the oldest sample's request
+ * has not yet completed (i.e. nothing is ready to read), false otherwise.
+ */
+static bool command_stream_buf_is_empty(struct i915_perf_stream *stream)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+	struct i915_perf_cs_sample *entry = NULL;
+	struct drm_i915_gem_request *request = NULL;
+
+	spin_lock(&dev_priv->perf.sample_lock);
+	entry = list_first_entry_or_null(&dev_priv->perf.cs_samples,
+			struct i915_perf_cs_sample, link);
+	if (entry)
+		request = entry->request;
+	spin_unlock(&dev_priv->perf.sample_lock);
+
+	if (!entry)
+		return true;
+	else if (!i915_gem_request_completed(request))
+		return true;
+	else
+		return false;
+}
+
+/**
+ * stream_have_data_unlocked - Checks whether the stream has data available
+ * @stream: An i915-perf stream opened for OA metrics
+ *
+ * For command stream based streams, check whether the command stream buffer
+ * has at least one sample available; if not, return false regardless of
+ * whether the periodic OA buffer has data.
+ */
+static bool stream_have_data_unlocked(struct i915_perf_stream *stream)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+
+	if (stream->cs_mode)
+		return !command_stream_buf_is_empty(stream);
+	else
+		return dev_priv->perf.oa.ops.oa_buffer_check(dev_priv);
 }
 
 /**
@@ -1199,8 +1637,22 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
 	if (!dev_priv->perf.oa.periodic)
 		return -EIO;
 
+	if (stream->cs_mode) {
+		int ret;
+
+		/*
+		 * Wait for the last submitted 'active' request.
+		 */
+		ret = i915_gem_active_wait(&stream->last_request,
+					   I915_WAIT_INTERRUPTIBLE);
+		if (ret) {
+			DRM_ERROR("Failed to wait for stream active request\n");
+			return ret;
+		}
+	}
+
 	return wait_event_interruptible(dev_priv->perf.oa.poll_wq,
-					dev_priv->perf.oa.ops.oa_buffer_check(dev_priv));
+					stream_have_data_unlocked(stream));
 }
 
 /**
@@ -1241,7 +1693,12 @@ static int i915_oa_read(struct i915_perf_stream *stream,
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 
-	return dev_priv->perf.oa.ops.read(stream, buf, count, offset);
+	if (stream->cs_mode)
+		return oa_rcs_append_reports(stream, buf, count, offset);
+	else
+		return dev_priv->perf.oa.ops.read(stream, buf, count, offset,
+						U32_MAX);
 }
 
 /**
@@ -1315,6 +1772,21 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)
 }
 
 static void
+free_command_stream_buf(struct drm_i915_private *dev_priv)
+{
+	mutex_lock(&dev_priv->drm.struct_mutex);
+
+	i915_gem_object_unpin_map(dev_priv->perf.command_stream_buf.vma->obj);
+	i915_vma_unpin(dev_priv->perf.command_stream_buf.vma);
+	i915_gem_object_put(dev_priv->perf.command_stream_buf.vma->obj);
+
+	dev_priv->perf.command_stream_buf.vma = NULL;
+	dev_priv->perf.command_stream_buf.vaddr = NULL;
+
+	mutex_unlock(&dev_priv->drm.struct_mutex);
+}
+
+static void
 free_oa_buffer(struct drm_i915_private *i915)
 {
 	mutex_lock(&i915->drm.struct_mutex);
@@ -1333,7 +1805,8 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 
-	BUG_ON(stream != dev_priv->perf.oa.exclusive_stream);
+	if (WARN_ON(stream != dev_priv->perf.oa.exclusive_stream))
+		return;
 
 	dev_priv->perf.oa.ops.disable_metric_set(dev_priv);
 
@@ -1345,6 +1818,9 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
 	if (stream->ctx)
 		oa_put_render_ctx_id(stream);
 
+	if (stream->cs_mode)
+		free_command_stream_buf(dev_priv);
+
 	dev_priv->perf.oa.exclusive_stream = NULL;
 }
 
@@ -1446,25 +1922,24 @@ static void gen8_init_oa_buffer(struct drm_i915_private *dev_priv)
 	dev_priv->perf.oa.pollin = false;
 }
 
-static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
+static int alloc_obj(struct drm_i915_private *dev_priv,
+		     struct i915_vma **vma, u8 **vaddr)
 {
 	struct drm_i915_gem_object *bo;
-	struct i915_vma *vma;
 	int ret;
 
-	if (WARN_ON(dev_priv->perf.oa.oa_buffer.vma))
-		return -ENODEV;
+	intel_runtime_pm_get(dev_priv);
 
 	ret = i915_mutex_lock_interruptible(&dev_priv->drm);
 	if (ret)
-		return ret;
+		goto out;
 
 	BUILD_BUG_ON_NOT_POWER_OF_2(OA_BUFFER_SIZE);
 	BUILD_BUG_ON(OA_BUFFER_SIZE < SZ_128K || OA_BUFFER_SIZE > SZ_16M);
 
 	bo = i915_gem_object_create(dev_priv, OA_BUFFER_SIZE);
 	if (IS_ERR(bo)) {
-		DRM_ERROR("Failed to allocate OA buffer\n");
+		DRM_ERROR("Failed to allocate i915 perf obj\n");
 		ret = PTR_ERR(bo);
 		goto unlock;
 	}
@@ -1474,42 +1949,83 @@ static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
 		goto err_unref;
 
 	/* PreHSW required 512K alignment, HSW requires 16M */
-	vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);
-	if (IS_ERR(vma)) {
-		ret = PTR_ERR(vma);
+	*vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);
+	if (IS_ERR(*vma)) {
+		ret = PTR_ERR(*vma);
 		goto err_unref;
 	}
-	dev_priv->perf.oa.oa_buffer.vma = vma;
 
-	dev_priv->perf.oa.oa_buffer.vaddr =
-		i915_gem_object_pin_map(bo, I915_MAP_WB);
-	if (IS_ERR(dev_priv->perf.oa.oa_buffer.vaddr)) {
-		ret = PTR_ERR(dev_priv->perf.oa.oa_buffer.vaddr);
+	*vaddr = i915_gem_object_pin_map(bo, I915_MAP_WB);
+	if (IS_ERR(*vaddr)) {
+		ret = PTR_ERR(*vaddr);
 		goto err_unpin;
 	}
 
-	dev_priv->perf.oa.ops.init_oa_buffer(dev_priv);
-
-	DRM_DEBUG_DRIVER("OA Buffer initialized, gtt offset = 0x%x, vaddr = %p\n",
-			 i915_ggtt_offset(dev_priv->perf.oa.oa_buffer.vma),
-			 dev_priv->perf.oa.oa_buffer.vaddr);
-
 	goto unlock;
 
 err_unpin:
-	__i915_vma_unpin(vma);
+	i915_vma_unpin(*vma);
 
 err_unref:
 	i915_gem_object_put(bo);
 
-	dev_priv->perf.oa.oa_buffer.vaddr = NULL;
-	dev_priv->perf.oa.oa_buffer.vma = NULL;
-
 unlock:
 	mutex_unlock(&dev_priv->drm.struct_mutex);
+out:
+	intel_runtime_pm_put(dev_priv);
 	return ret;
 }
 
+static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
+{
+	struct i915_vma *vma;
+	u8 *vaddr;
+	int ret;
+
+	if (WARN_ON(dev_priv->perf.oa.oa_buffer.vma))
+		return -ENODEV;
+
+	ret = alloc_obj(dev_priv, &vma, &vaddr);
+	if (ret)
+		return ret;
+
+	dev_priv->perf.oa.oa_buffer.vma = vma;
+	dev_priv->perf.oa.oa_buffer.vaddr = vaddr;
+
+	dev_priv->perf.oa.ops.init_oa_buffer(dev_priv);
+
+	DRM_DEBUG_DRIVER("OA Buffer initialized, gtt offset = 0x%x, vaddr = %p\n",
+			 i915_ggtt_offset(dev_priv->perf.oa.oa_buffer.vma),
+			 dev_priv->perf.oa.oa_buffer.vaddr);
+	return 0;
+}
+
+static int alloc_command_stream_buf(struct drm_i915_private *dev_priv)
+{
+	struct i915_vma *vma;
+	u8 *vaddr;
+	int ret;
+
+	if (WARN_ON(dev_priv->perf.command_stream_buf.vma))
+		return -ENODEV;
+
+	ret = alloc_obj(dev_priv, &vma, &vaddr);
+	if (ret)
+		return ret;
+
+	dev_priv->perf.command_stream_buf.vma = vma;
+	dev_priv->perf.command_stream_buf.vaddr = vaddr;
+	if (WARN_ON(!list_empty(&dev_priv->perf.cs_samples)))
+		INIT_LIST_HEAD(&dev_priv->perf.cs_samples);
+
+	DRM_DEBUG_DRIVER(
+		"command stream buf initialized, gtt offset = 0x%x, vaddr = %p\n",
+		 i915_ggtt_offset(dev_priv->perf.command_stream_buf.vma),
+		 dev_priv->perf.command_stream_buf.vaddr);
+
+	return 0;
+}
+
 static void config_oa_regs(struct drm_i915_private *dev_priv,
 			   const struct i915_oa_reg *regs,
 			   int n_regs)
@@ -1848,7 +2364,8 @@ static void gen7_update_oacontrol_locked(struct drm_i915_private *dev_priv)
 {
 	assert_spin_locked(&dev_priv->perf.hook_lock);
 
-	if (dev_priv->perf.oa.exclusive_stream->enabled) {
+	if (dev_priv->perf.oa.exclusive_stream->state !=
+					I915_PERF_STREAM_DISABLED) {
 		struct i915_gem_context *ctx =
 			dev_priv->perf.oa.exclusive_stream->ctx;
 		u32 ctx_id = dev_priv->perf.oa.specific_ctx_id;
@@ -1954,10 +2471,20 @@ static void i915_oa_stream_disable(struct i915_perf_stream *stream)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 
-	dev_priv->perf.oa.ops.oa_disable(dev_priv);
-
 	if (dev_priv->perf.oa.periodic)
 		hrtimer_cancel(&dev_priv->perf.oa.poll_check_timer);
+
+	if (stream->cs_mode) {
+		/*
+		 * Wait for the last submitted 'active' request, before freeing
+		 * the requests associated with the stream.
+		 */
+		i915_gem_active_wait(&stream->last_request,
+					   I915_WAIT_INTERRUPTIBLE);
+		i915_oa_rcs_release_samples(dev_priv);
+	}
+
+	dev_priv->perf.oa.ops.oa_disable(dev_priv);
 }
 
 static const struct i915_perf_stream_ops i915_oa_stream_ops = {
@@ -1967,6 +2494,7 @@ static void i915_oa_stream_disable(struct i915_perf_stream *stream)
 	.wait_unlocked = i915_oa_wait_unlocked,
 	.poll_wait = i915_oa_poll_wait,
 	.read = i915_oa_read,
+	.command_stream_hook = i915_perf_command_stream_hook_oa,
 };
 
 /**
@@ -1992,28 +2520,11 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 			       struct perf_open_properties *props)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
-	int format_size;
+	bool require_oa_unit = props->sample_flags & (SAMPLE_OA_REPORT |
+						      SAMPLE_OA_SOURCE_INFO);
+	bool cs_sample_data = props->sample_flags & SAMPLE_OA_REPORT;
 	int ret;
 
-	/* If the sysfs metrics/ directory wasn't registered for some
-	 * reason then don't let userspace try their luck with config
-	 * IDs
-	 */
-	if (!dev_priv->perf.metrics_kobj) {
-		DRM_DEBUG("OA metrics weren't advertised via sysfs\n");
-		return -EINVAL;
-	}
-
-	if (!(props->sample_flags & SAMPLE_OA_REPORT)) {
-		DRM_DEBUG("Only OA report sampling supported\n");
-		return -EINVAL;
-	}
-
-	if (!dev_priv->perf.oa.ops.init_oa_buffer) {
-		DRM_DEBUG("OA unit not supported\n");
-		return -ENODEV;
-	}
-
 	/* To avoid the complexity of having to accurately filter
 	 * counter reports and marshal to the appropriate client
 	 * we currently only allow exclusive access
@@ -2023,80 +2534,154 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 		return -EBUSY;
 	}
 
-	if (!props->metrics_set) {
-		DRM_DEBUG("OA metric set not specified\n");
-		return -EINVAL;
-	}
-
-	if (!props->oa_format) {
-		DRM_DEBUG("OA report format not specified\n");
-		return -EINVAL;
+	if ((props->sample_flags & SAMPLE_CTX_ID) && !props->cs_mode) {
+		if (IS_HASWELL(dev_priv)) {
+			DRM_ERROR(
+				"On HSW, context ID sampling only supported via command stream\n");
+			return -EINVAL;
+		} else if (!i915.enable_execlists) {
+			DRM_ERROR(
+				"On Gen8+ without execlists, context ID sampling only supported via command stream\n");
+			return -EINVAL;
+		}
 	}
 
 	stream->sample_size = sizeof(struct drm_i915_perf_record_header);
 
-	format_size = dev_priv->perf.oa.oa_formats[props->oa_format].size;
+	if (require_oa_unit) {
+		int format_size;
 
-	stream->sample_flags |= SAMPLE_OA_REPORT;
-	stream->sample_size += format_size;
+		/* If the sysfs metrics/ directory wasn't registered for some
+		 * reason then don't let userspace try their luck with config
+		 * IDs
+		 */
+		if (!dev_priv->perf.metrics_kobj) {
+			DRM_DEBUG("OA metrics weren't advertised via sysfs\n");
+			return -EINVAL;
+		}
 
-	if (props->sample_flags & SAMPLE_OA_SOURCE_INFO) {
-		stream->sample_flags |= SAMPLE_OA_SOURCE_INFO;
-		stream->sample_size += 4;
-	}
+		if (!dev_priv->perf.oa.ops.init_oa_buffer) {
+			DRM_DEBUG("OA unit not supported\n");
+			return -ENODEV;
+		}
 
-	dev_priv->perf.oa.oa_buffer.format_size = format_size;
-	if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
-		return -EINVAL;
+		if (!props->metrics_set) {
+			DRM_DEBUG("OA metric set not specified\n");
+			return -EINVAL;
+		}
 
-	dev_priv->perf.oa.oa_buffer.format =
-		dev_priv->perf.oa.oa_formats[props->oa_format].format;
+		if (!props->oa_format) {
+			DRM_DEBUG("OA report format not specified\n");
+			return -EINVAL;
+		}
 
-	dev_priv->perf.oa.metrics_set = props->metrics_set;
+		if (props->cs_mode && (props->engine != RCS)) {
+			DRM_ERROR(
+				  "Command stream OA metrics only available via Render CS\n");
+			return -EINVAL;
+		}
+		stream->engine = RCS;
 
-	dev_priv->perf.oa.periodic = props->oa_periodic;
-	if (dev_priv->perf.oa.periodic)
-		dev_priv->perf.oa.period_exponent = props->oa_period_exponent;
+		format_size =
+			dev_priv->perf.oa.oa_formats[props->oa_format].size;
+
+		if (props->sample_flags & SAMPLE_OA_REPORT) {
+			stream->sample_flags |= SAMPLE_OA_REPORT;
+			stream->sample_size += format_size;
+		}
+
+		if (props->sample_flags & SAMPLE_OA_SOURCE_INFO) {
+			if (!(props->sample_flags & SAMPLE_OA_REPORT)) {
+				DRM_ERROR(
+					  "OA source type can't be sampled without OA report\n");
+				return -EINVAL;
+			}
+			stream->sample_flags |= SAMPLE_OA_SOURCE_INFO;
+			stream->sample_size += 4;
+		}
+
+		dev_priv->perf.oa.oa_buffer.format_size = format_size;
+		if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
+			return -EINVAL;
+
+		dev_priv->perf.oa.oa_buffer.format =
+			dev_priv->perf.oa.oa_formats[props->oa_format].format;
+
+		dev_priv->perf.oa.metrics_set = props->metrics_set;
+
+		dev_priv->perf.oa.periodic = props->oa_periodic;
+		if (dev_priv->perf.oa.periodic)
+			dev_priv->perf.oa.period_exponent = props->oa_period_exponent;
+
+		if (stream->ctx) {
+			ret = oa_get_render_ctx_id(stream);
+			if (ret)
+				return ret;
+		}
 
-	if (stream->ctx) {
-		ret = oa_get_render_ctx_id(stream);
+		ret = alloc_oa_buffer(dev_priv);
 		if (ret)
-			return ret;
+			goto err_oa_buf_alloc;
+
+		/* PRM - observability performance counters:
+		 *
+		 *   OACONTROL, performance counter enable, note:
+		 *
+		 *   "When this bit is set, in order to have coherent counts,
+		 *   RC6 power state and trunk clock gating must be disabled.
+		 *   This can be achieved by programming MMIO registers as
+		 *   0xA094=0 and 0xA090[31]=1"
+		 *
+		 *   In our case we are expecting that taking pm + FORCEWAKE
+		 *   references will effectively disable RC6.
+		 */
+		intel_runtime_pm_get(dev_priv);
+		intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+
+		ret = dev_priv->perf.oa.ops.enable_metric_set(dev_priv);
+		if (ret)
+			goto err_enable;
+
 	}
 
-	ret = alloc_oa_buffer(dev_priv);
-	if (ret)
-		goto err_oa_buf_alloc;
+	if (props->sample_flags & SAMPLE_CTX_ID) {
+		stream->sample_flags |= SAMPLE_CTX_ID;
+		stream->sample_size += 4;
+	}
 
-	/* PRM - observability performance counters:
-	 *
-	 *   OACONTROL, performance counter enable, note:
-	 *
-	 *   "When this bit is set, in order to have coherent counts,
-	 *   RC6 power state and trunk clock gating must be disabled.
-	 *   This can be achieved by programming MMIO registers as
-	 *   0xA094=0 and 0xA090[31]=1"
-	 *
-	 *   In our case we are expecting that taking pm + FORCEWAKE
-	 *   references will effectively disable RC6.
-	 */
-	intel_runtime_pm_get(dev_priv);
-	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+	if (props->cs_mode) {
+		if (!cs_sample_data) {
+			DRM_ERROR(
+				"Stream engine given without requesting any CS data to sample\n");
+			ret = -EINVAL;
+			goto err_enable;
+		}
 
-	ret = dev_priv->perf.oa.ops.enable_metric_set(dev_priv);
-	if (ret)
-		goto err_enable;
+		if (!(props->sample_flags & SAMPLE_CTX_ID)) {
+			DRM_ERROR(
+				"Stream engine given without requesting any CS specific property\n");
+			ret = -EINVAL;
+			goto err_enable;
+		}
+		stream->cs_mode = true;
+		ret = alloc_command_stream_buf(dev_priv);
+		if (ret)
+			goto err_enable;
 
-	stream->ops = &i915_oa_stream_ops;
+		init_request_active(&stream->last_request, NULL);
+	}
 
+	stream->ops = &i915_oa_stream_ops;
 	dev_priv->perf.oa.exclusive_stream = stream;
 
 	return 0;
 
 err_enable:
-	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
-	intel_runtime_pm_put(dev_priv);
-	free_oa_buffer(dev_priv);
+	if (require_oa_unit) {
+		intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+		intel_runtime_pm_put(dev_priv);
+		free_oa_buffer(dev_priv);
+	}
 
 err_oa_buf_alloc:
 	if (stream->ctx)
@@ -2265,7 +2850,7 @@ static ssize_t i915_perf_read(struct file *file,
 	 * disabled stream as an error. In particular it might otherwise lead
 	 * to a deadlock for blocking file descriptors...
 	 */
-	if (!stream->enabled)
+	if (stream->state == I915_PERF_STREAM_DISABLED)
 		return -EIO;
 
 	if (!(file->f_flags & O_NONBLOCK)) {
@@ -2312,13 +2897,21 @@ static ssize_t i915_perf_read(struct file *file,
 
 static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer)
 {
+	struct i915_perf_stream *stream;
 	struct drm_i915_private *dev_priv =
 		container_of(hrtimer, typeof(*dev_priv),
 			     perf.oa.poll_check_timer);
 
-	if (dev_priv->perf.oa.ops.oa_buffer_check(dev_priv)) {
-		dev_priv->perf.oa.pollin = true;
-		wake_up(&dev_priv->perf.oa.poll_wq);
+	/* No need to protect the streams list here, since the hrtimer is
+	 * disabled before the stream is removed from the list, and currently a
+	 * single exclusive_stream is supported.
+	 * XXX: revisit this when multiple concurrent streams are supported.
+	 */
+	list_for_each_entry(stream, &dev_priv->perf.streams, link) {
+		if (stream_have_data_unlocked(stream)) {
+			dev_priv->perf.oa.pollin = true;
+			wake_up(&dev_priv->perf.oa.poll_wq);
+		}
 	}
 
 	hrtimer_forward_now(hrtimer, ns_to_ktime(POLL_PERIOD));
@@ -2401,14 +2994,16 @@ static unsigned int i915_perf_poll(struct file *file, poll_table *wait)
  */
 static void i915_perf_enable_locked(struct i915_perf_stream *stream)
 {
-	if (stream->enabled)
+	if (stream->state != I915_PERF_STREAM_DISABLED)
 		return;
 
 	/* Allow stream->ops->enable() to refer to this */
-	stream->enabled = true;
+	stream->state = I915_PERF_STREAM_ENABLE_IN_PROGRESS;
 
 	if (stream->ops->enable)
 		stream->ops->enable(stream);
+
+	stream->state = I915_PERF_STREAM_ENABLED;
 }
 
 /**
@@ -2427,11 +3022,11 @@ static void i915_perf_enable_locked(struct i915_perf_stream *stream)
  */
 static void i915_perf_disable_locked(struct i915_perf_stream *stream)
 {
-	if (!stream->enabled)
+	if (stream->state != I915_PERF_STREAM_ENABLED)
 		return;
 
 	/* Allow stream->ops->disable() to refer to this */
-	stream->enabled = false;
+	stream->state = I915_PERF_STREAM_DISABLED;
 
 	if (stream->ops->disable)
 		stream->ops->disable(stream);
@@ -2503,14 +3098,18 @@ static long i915_perf_ioctl(struct file *file,
  */
 static void i915_perf_destroy_locked(struct i915_perf_stream *stream)
 {
-	if (stream->enabled)
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+
+	mutex_lock(&dev_priv->perf.streams_lock);
+	list_del(&stream->link);
+	mutex_unlock(&dev_priv->perf.streams_lock);
+
+	if (stream->state == I915_PERF_STREAM_ENABLED)
 		i915_perf_disable_locked(stream);
 
 	if (stream->ops->destroy)
 		stream->ops->destroy(stream);
 
-	list_del(&stream->link);
-
 	if (stream->ctx)
 		i915_gem_context_put_unlocked(stream->ctx);
 
@@ -2673,7 +3272,9 @@ static int i915_perf_release(struct inode *inode, struct file *file)
 		goto err_alloc;
 	}
 
+	mutex_lock(&dev_priv->perf.streams_lock);
 	list_add(&stream->link, &dev_priv->perf.streams);
+	mutex_unlock(&dev_priv->perf.streams_lock);
 
 	if (param->flags & I915_PERF_FLAG_FD_CLOEXEC)
 		f_flags |= O_CLOEXEC;
@@ -2692,7 +3293,9 @@ static int i915_perf_release(struct inode *inode, struct file *file)
 	return stream_fd;
 
 err_open:
+	mutex_lock(&dev_priv->perf.streams_lock);
 	list_del(&stream->link);
+	mutex_unlock(&dev_priv->perf.streams_lock);
 	if (stream->ops->destroy)
 		stream->ops->destroy(stream);
 err_alloc:
@@ -2832,6 +3435,30 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 		case DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE:
 			props->sample_flags |= SAMPLE_OA_SOURCE_INFO;
 			break;
+		case DRM_I915_PERF_PROP_ENGINE: {
+				unsigned int user_ring_id =
+					value & I915_EXEC_RING_MASK;
+				enum intel_engine_id engine;
+
+				if (user_ring_id > I915_USER_RINGS)
+					return -EINVAL;
+
+				/* XXX: Currently only RCS is supported.
+				 * Remove this check when support for other
+				 * engines is added
+				 */
+				engine = user_ring_map[user_ring_id];
+				if (engine != RCS)
+					return -EINVAL;
+
+				props->cs_mode = true;
+				props->engine = engine;
+			}
+			break;
+		case DRM_I915_PERF_PROP_SAMPLE_CTX_ID:
+			props->sample_flags |= SAMPLE_CTX_ID;
+			break;
 		default:
 			MISSING_CASE(id);
 			DRM_DEBUG("Unknown i915 perf property ID\n");
@@ -3140,8 +3767,11 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
 		init_waitqueue_head(&dev_priv->perf.oa.poll_wq);
 
 		INIT_LIST_HEAD(&dev_priv->perf.streams);
+		INIT_LIST_HEAD(&dev_priv->perf.cs_samples);
 		mutex_init(&dev_priv->perf.lock);
+		mutex_init(&dev_priv->perf.streams_lock);
 		spin_lock_init(&dev_priv->perf.hook_lock);
+		spin_lock_init(&dev_priv->perf.sample_lock);
 		spin_lock_init(&dev_priv->perf.oa.oa_buffer.ptr_lock);
 
 		dev_priv->perf.sysctl_header = register_sysctl_table(dev_root);
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index c597e36..4cbefc8 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1314,6 +1314,7 @@ enum drm_i915_oa_format {
 
 enum drm_i915_perf_oa_event_source {
 	I915_PERF_OA_EVENT_SOURCE_PERIODIC,
+	I915_PERF_OA_EVENT_SOURCE_RCS,
 
 	I915_PERF_OA_EVENT_SOURCE_MAX	/* non-ABI */
 };
@@ -1359,6 +1360,19 @@ enum drm_i915_perf_property_id {
 	 */
 	DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE,
 
+	/**
+	 * The value of this property specifies the GPU engine for which
+	 * the samples need to be collected. Specifying this property also
+	 * implies the command stream based sample collection.
+	 */
+	DRM_I915_PERF_PROP_ENGINE,
+
+	/**
+	 * The value of this property set to 1 requests inclusion of context ID
+	 * in the perf sample data.
+	 */
+	DRM_I915_PERF_PROP_SAMPLE_CTX_ID,
+
 	DRM_I915_PERF_PROP_MAX /* non-ABI */
 };
 
@@ -1425,6 +1439,7 @@ enum drm_i915_perf_record_type {
 	 *     struct drm_i915_perf_record_header header;
 	 *
 	 *     { u32 source_info; } && DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE
+	 *     { u32 ctx_id; } && DRM_I915_PERF_PROP_SAMPLE_CTX_ID
 	 *     { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
 	 * };
 	 */
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH 4/8] drm/i915: flush periodic samples, in case of no pending CS sample requests
  9 siblings, 0 replies; 24+ messages in thread
From: sourab.gupta @ 2017-03-16  6:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Sourab Gupta, Matthew Auld

From: Sourab Gupta <sourab.gupta@intel.com>

When there are no pending CS OA samples, flush the periodic OA samples
collected so far.

We can safely forward the periodic OA samples when there are no
pending CS samples, but we can't do so while CS samples are pending,
since we don't yet know how the pending CS samples and the periodic
samples will eventually be ordered. Once there are no pending CS
samples, no future CS sample can have a timestamp earlier than the
current periodic timestamp, so the collected periodic samples can be
flushed safely.

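The ordering argument can be captured in a minimal standalone sketch
(illustrative only, not driver code; periodic_flush_limit_ts is a
hypothetical helper, not a function introduced by this patch):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Pick the timestamp limit up to which periodic OA reports may be
 * forwarded to userspace:
 *
 *  - with a pending CS sample, stop at the oldest pending CS sample's
 *    timestamp, since periodic reports past it may need to be ordered
 *    after that CS sample;
 *  - with no pending CS sample, flush everything (limit = UINT32_MAX),
 *    because any future CS sample can only carry a later timestamp.
 */
static uint32_t periodic_flush_limit_ts(bool cs_sample_pending,
					uint32_t oldest_cs_sample_ts)
{
	return cs_sample_pending ? oldest_cs_sample_ts : UINT32_MAX;
}
```

The actual read path passes this limit as the 'ts' argument added to
the gen7/gen8 append functions in patch 3.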
Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  10 ++-
 drivers/gpu/drm/i915/i915_perf.c | 175 +++++++++++++++++++++++++++++----------
 2 files changed, 139 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0b70052..d0f43e9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2110,10 +2110,11 @@ struct i915_oa_ops {
 		    char __user *buf,
 		    size_t count,
 		    size_t *offset,
-		    u32 ts);
+		    u32 ts,
+		    u32 max_reports);
 
 	/**
-	 * @oa_buffer_check: Check for OA buffer data + update tail
+	 * @oa_buffer_num_reports: Return number of OA reports + update tail
 	 *
 	 * This is either called via fops or the poll check hrtimer (atomic
 	 * ctx) without any locks taken.
@@ -2126,7 +2127,8 @@ struct i915_oa_ops {
 	 * here, which will be handled gracefully - likely resulting in an
 	 * %EAGAIN error for userspace.
 	 */
-	bool (*oa_buffer_check)(struct drm_i915_private *dev_priv);
+	u32 (*oa_buffer_num_reports)(struct drm_i915_private *dev_priv,
+					u32 *last_ts);
 };
 
 /*
@@ -2589,6 +2591,8 @@ struct drm_i915_private {
 			u32 gen7_latched_oastatus1;
 			u32 ctx_oactxctrl_off;
 			u32 ctx_flexeu0_off;
+			u32 n_pending_periodic_samples;
+			u32 pending_periodic_ts;
 
 			/* The RPT_ID/reason field for Gen8+ includes a bit
 			 * to determine if the CTX ID in the report is valid
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 3321dad..d1d9853 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -583,7 +583,7 @@ static void i915_oa_rcs_release_samples(struct drm_i915_private *dev_priv)
 }
 
 /**
- * gen8_oa_buffer_check_unlocked - check for data and update tail ptr state
+ * gen8_oa_buffer_num_reports_unlocked - count reports and update tail ptr state
  * @dev_priv: i915 device instance
  *
  * This is either called via fops (for blocking reads in user ctx) or the poll
@@ -596,7 +596,7 @@ static void i915_oa_rcs_release_samples(struct drm_i915_private *dev_priv)
  * the pointers time to 'age' before they are made available for reading.
  * (See description of OA_TAIL_MARGIN_NSEC above for further details.)
  *
- * Besides returning true when there is data available to read() this function
+ * Besides returning the number of reports when there is data available to read(), it
  * also has the side effect of updating the oa_buffer.tails[], .aging_timestamp
  * and .aged_tail_idx state used for reading.
  *
@@ -604,14 +604,15 @@ static void i915_oa_rcs_release_samples(struct drm_i915_private *dev_priv)
  * only called while the stream is enabled, while the global OA configuration
  * can't be modified.
  *
- * Returns: %true if the OA buffer contains data, else %false
+ * Returns: number of samples available to read
  */
-static bool gen8_oa_buffer_check_unlocked(struct drm_i915_private *dev_priv)
+static u32 gen8_oa_buffer_num_reports_unlocked(
+			struct drm_i915_private *dev_priv, u32 *last_ts)
 {
 	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
 	unsigned long flags;
 	unsigned int aged_idx;
-	u32 head, hw_tail, aged_tail, aging_tail;
+	u32 head, hw_tail, aged_tail, aging_tail, num_reports = 0;
 	u64 now;
 
 	/* We have to consider the (unlikely) possibility that read() errors
@@ -652,6 +653,13 @@ static bool gen8_oa_buffer_check_unlocked(struct drm_i915_private *dev_priv)
 	if (aging_tail != INVALID_TAIL_PTR &&
 	    ((now - dev_priv->perf.oa.oa_buffer.aging_timestamp) >
 	     OA_TAIL_MARGIN_NSEC)) {
+		u32 mask = (OA_BUFFER_SIZE - 1);
+		u32 gtt_offset = i915_ggtt_offset(
+				dev_priv->perf.oa.oa_buffer.vma);
+		u32 head = (dev_priv->perf.oa.oa_buffer.head - gtt_offset)
+				& mask;
+		u8 *oa_buf_base = dev_priv->perf.oa.oa_buffer.vaddr;
+		u32 *report32;
 
 		aged_idx ^= 1;
 		dev_priv->perf.oa.oa_buffer.aged_tail_idx = aged_idx;
@@ -661,6 +669,14 @@ static bool gen8_oa_buffer_check_unlocked(struct drm_i915_private *dev_priv)
 		/* Mark that we need a new pointer to start aging... */
 		dev_priv->perf.oa.oa_buffer.tails[!aged_idx].offset = INVALID_TAIL_PTR;
 		aging_tail = INVALID_TAIL_PTR;
+
+		num_reports = OA_TAKEN((aged_tail - gtt_offset) & mask,
+				       head) / report_size;
+
+		/* Read the timestamp of the last OA report */
+		head = (head + report_size * (num_reports - 1)) & mask;
+		report32 = (u32 *)(oa_buf_base + head);
+		*last_ts = report32[1];
 	}
 
 	/* Update the aging tail
@@ -694,12 +710,11 @@ static bool gen8_oa_buffer_check_unlocked(struct drm_i915_private *dev_priv)
 
 	spin_unlock_irqrestore(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags);
 
-	return aged_tail == INVALID_TAIL_PTR ?
-		false : OA_TAKEN(aged_tail, head) >= report_size;
+	return aged_tail == INVALID_TAIL_PTR ? 0 : num_reports;
 }
 
 /**
- * gen7_oa_buffer_check_unlocked - check for data and update tail ptr state
+ * gen7_oa_buffer_num_reports_unlocked - check for data and update tail ptr state
  * @dev_priv: i915 device instance
  *
  * This is either called via fops (for blocking reads in user ctx) or the poll
@@ -712,7 +727,7 @@ static bool gen8_oa_buffer_check_unlocked(struct drm_i915_private *dev_priv)
  * the pointers time to 'age' before they are made available for reading.
  * (See description of OA_TAIL_MARGIN_NSEC above for further details.)
  *
- * Besides returning true when there is data available to read() this function
+ * Besides returning the number of reports available to read(), this function
  * also has the side effect of updating the oa_buffer.tails[], .aging_timestamp
  * and .aged_tail_idx state used for reading.
  *
@@ -720,15 +735,16 @@ static bool gen8_oa_buffer_check_unlocked(struct drm_i915_private *dev_priv)
  * only called while the stream is enabled, while the global OA configuration
  * can't be modified.
  *
- * Returns: %true if the OA buffer contains data, else %false
+ * Returns: number of samples available to read
  */
-static bool gen7_oa_buffer_check_unlocked(struct drm_i915_private *dev_priv)
+static u32 gen7_oa_buffer_num_reports_unlocked(
+			struct drm_i915_private *dev_priv, u32 *last_ts)
 {
 	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
 	unsigned long flags;
 	unsigned int aged_idx;
 	u32 oastatus1;
-	u32 head, hw_tail, aged_tail, aging_tail;
+	u32 head, hw_tail, aged_tail, aging_tail, num_reports = 0;
 	u64 now;
 
 	/* We have to consider the (unlikely) possibility that read() errors
@@ -770,6 +786,14 @@ static bool gen7_oa_buffer_check_unlocked(struct drm_i915_private *dev_priv)
 	if (aging_tail != INVALID_TAIL_PTR &&
 	    ((now - dev_priv->perf.oa.oa_buffer.aging_timestamp) >
 	     OA_TAIL_MARGIN_NSEC)) {
+		u32 mask = (OA_BUFFER_SIZE - 1);
+		u32 gtt_offset = i915_ggtt_offset(
+				dev_priv->perf.oa.oa_buffer.vma);
+		u32 head = (dev_priv->perf.oa.oa_buffer.head - gtt_offset)
+				& mask;
+		u8 *oa_buf_base = dev_priv->perf.oa.oa_buffer.vaddr;
+		u32 *report32;
+
 		aged_idx ^= 1;
 		dev_priv->perf.oa.oa_buffer.aged_tail_idx = aged_idx;
 
@@ -778,6 +802,14 @@ static bool gen7_oa_buffer_check_unlocked(struct drm_i915_private *dev_priv)
 		/* Mark that we need a new pointer to start aging... */
 		dev_priv->perf.oa.oa_buffer.tails[!aged_idx].offset = INVALID_TAIL_PTR;
 		aging_tail = INVALID_TAIL_PTR;
+
+		num_reports = OA_TAKEN((aged_tail - gtt_offset) & mask,
+				       head) / report_size;
+
+		/* Read the timestamp of the last OA report */
+		head = (head + report_size * (num_reports - 1)) & mask;
+		report32 = (u32 *)(oa_buf_base + head);
+		*last_ts = report32[1];
 	}
 
 	/* Update the aging tail
@@ -811,8 +843,7 @@ static bool gen7_oa_buffer_check_unlocked(struct drm_i915_private *dev_priv)
 
 	spin_unlock_irqrestore(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags);
 
-	return aged_tail == INVALID_TAIL_PTR ?
-		false : OA_TAKEN(aged_tail, head) >= report_size;
+	return aged_tail == INVALID_TAIL_PTR ? 0 : num_reports;
 }
 
 /**
@@ -955,6 +986,7 @@ static int append_oa_buffer_sample(struct i915_perf_stream *stream,
  * @count: the number of bytes userspace wants to read
  * @offset: (inout): the current position for writing into @buf
  * @ts: copy OA reports till this timestamp
+ * @max_reports: max number of OA reports to copy
  *
  * Notably any error condition resulting in a short read (-%ENOSPC or
  * -%EFAULT) will be returned even though one or more records may
@@ -973,7 +1005,8 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 				  char __user *buf,
 				  size_t count,
 				  size_t *offset,
-				  u32 ts)
+				  u32 ts,
+				  u32 max_reports)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
@@ -986,6 +1019,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 	u32 head, tail;
 	u32 taken;
 	int ret = 0;
+	u32 report_count = 0;
 
 	if (WARN_ON(stream->state != I915_PERF_STREAM_ENABLED))
 		return -EIO;
@@ -1024,7 +1058,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 
 
 	for (/* none */;
-	     (taken = OA_TAKEN(tail, head));
+	     (taken = OA_TAKEN(tail, head)) && (report_count < max_reports);
 	     head = (head + report_size) & mask) {
 		u8 *report = oa_buf_base + head;
 		u32 *report32 = (void *)report;
@@ -1127,6 +1161,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 			if (ret)
 				break;
 
+			report_count++;
 			dev_priv->perf.oa.oa_buffer.last_ctx_id = ctx_id;
 		}
 
@@ -1163,6 +1198,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
  * @count: the number of bytes userspace wants to read
  * @offset: (inout): the current position for writing into @buf
  * @ts: copy OA reports till this timestamp
+ * @max_reports: max number of OA reports to copy
  *
  * Checks OA unit status registers and if necessary appends corresponding
  * status records for userspace (such as for a buffer full condition) and then
@@ -1181,7 +1217,8 @@ static int gen8_oa_read(struct i915_perf_stream *stream,
 			char __user *buf,
 			size_t count,
 			size_t *offset,
-			u32 ts)
+			u32 ts,
+			u32 max_reports)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	u32 oastatus;
@@ -1231,7 +1268,8 @@ static int gen8_oa_read(struct i915_perf_stream *stream,
 		}
 	}
 
-	return gen8_append_oa_reports(stream, buf, count, offset, ts);
+	return gen8_append_oa_reports(stream, buf, count, offset, ts,
+					max_reports);
 }
 
 /**
@@ -1241,6 +1279,7 @@ static int gen8_oa_read(struct i915_perf_stream *stream,
  * @count: the number of bytes userspace wants to read
  * @offset: (inout): the current position for writing into @buf
  * @ts: copy OA reports till this timestamp
+ * @max_reports: max number of OA reports to copy
  *
  * Notably any error condition resulting in a short read (-%ENOSPC or
  * -%EFAULT) will be returned even though one or more records may
@@ -1259,7 +1298,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 				  char __user *buf,
 				  size_t count,
 				  size_t *offset,
-				  u32 ts)
+				  u32 ts,
+				  u32 max_reports)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
@@ -1272,6 +1312,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 	u32 head, tail;
 	u32 taken;
 	int ret = 0;
+	u32 report_count = 0;
 
 	if (WARN_ON(stream->state != I915_PERF_STREAM_ENABLED))
 		return -EIO;
@@ -1310,7 +1351,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 
 
 	for (/* none */;
-	     (taken = OA_TAKEN(tail, head));
+	     (taken = OA_TAKEN(tail, head)) && (report_count < max_reports);
 	     head = (head + report_size) & mask) {
 		u8 *report = oa_buf_base + head;
 		u32 *report32 = (void *)report;
@@ -1348,6 +1389,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 		if (ret)
 			break;
 
+		report_count++;
 		/* The above report-id field sanity check is based on
 		 * the assumption that the OA buffer is initially
 		 * zeroed and we reset the field after copying so the
@@ -1384,6 +1426,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
  * @count: the number of bytes userspace wants to read
  * @offset: (inout): the current position for writing into @buf
  * @ts: copy OA reports till this timestamp
+ * @max_reports: max number of OA reports to copy
  *
  * Checks Gen 7 specific OA unit status registers and if necessary appends
  * corresponding status records for userspace (such as for a buffer full
@@ -1398,7 +1441,8 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
 			char __user *buf,
 			size_t count,
 			size_t *offset,
-			u32 ts)
+			u32 ts,
+			u32 max_reports)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	u32 oastatus1;
@@ -1460,7 +1504,8 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
 			GEN7_OASTATUS1_REPORT_LOST;
 	}
 
-	return gen7_append_oa_reports(stream, buf, count, offset, ts);
+	return gen7_append_oa_reports(stream, buf, count, offset, ts,
+					max_reports);
 }
 
 /**
@@ -1490,7 +1535,8 @@ static int append_oa_rcs_sample(struct i915_perf_stream *stream,
 
 	/* First, append the periodic OA samples having lower timestamps */
 	report_ts = *(u32 *)(report + 4);
-	ret = dev_priv->perf.oa.ops.read(stream, buf, count, offset, report_ts);
+	ret = dev_priv->perf.oa.ops.read(stream, buf, count, offset,
+					report_ts, U32_MAX);
 	if (ret)
 		return ret;
 
@@ -1535,7 +1581,7 @@ static int oa_rcs_append_reports(struct i915_perf_stream *stream,
 	spin_lock(&dev_priv->perf.sample_lock);
 	if (list_empty(&dev_priv->perf.cs_samples)) {
 		spin_unlock(&dev_priv->perf.sample_lock);
-		return 0;
+		goto pending_periodic;
 	}
 	list_for_each_entry_safe(entry, next,
 				 &dev_priv->perf.cs_samples, link) {
@@ -1546,7 +1592,7 @@ static int oa_rcs_append_reports(struct i915_perf_stream *stream,
 	spin_unlock(&dev_priv->perf.sample_lock);
 
 	if (list_empty(&free_list))
-		return 0;
+		goto pending_periodic;
 
 	list_for_each_entry_safe(entry, next, &free_list, link) {
 		ret = append_oa_rcs_sample(stream, buf, count, offset, entry);
@@ -1564,18 +1610,37 @@ static int oa_rcs_append_reports(struct i915_perf_stream *stream,
 	spin_unlock(&dev_priv->perf.sample_lock);
 
 	return ret;
+
+pending_periodic:
+	if (!dev_priv->perf.oa.n_pending_periodic_samples)
+		return 0;
+
+	ret = dev_priv->perf.oa.ops.read(stream, buf, count, offset,
+				dev_priv->perf.oa.pending_periodic_ts,
+				dev_priv->perf.oa.n_pending_periodic_samples);
+	dev_priv->perf.oa.n_pending_periodic_samples = 0;
+	dev_priv->perf.oa.pending_periodic_ts = 0;
+	return ret;
 }
 
+enum cs_buf_state {
+	CS_BUF_EMPTY,
+	CS_BUF_REQ_PENDING,
+	CS_BUF_HAVE_DATA,
+};
+
 /*
- * command_stream_buf_is_empty - Checks whether the command stream buffer
+ * command_stream_buf_state - Checks whether the command stream buffer
  * associated with the stream has data available.
  * @stream: An i915-perf stream opened for OA metrics
  *
- * Returns: true if atleast one request associated with command stream is
- * completed, else returns false.
+ * Returns:
+ * CS_BUF_HAVE_DATA	- if there is at least one completed request
+ * CS_BUF_REQ_PENDING	- there are requests pending, but no completed requests
+ * CS_BUF_EMPTY		- no requests scheduled
  */
-static bool command_stream_buf_is_empty(struct i915_perf_stream *stream)
-
+static enum cs_buf_state command_stream_buf_state(
+				struct i915_perf_stream *stream)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	struct i915_perf_cs_sample *entry = NULL;
@@ -1589,30 +1654,54 @@ static bool command_stream_buf_is_empty(struct i915_perf_stream *stream)
 	spin_unlock(&dev_priv->perf.sample_lock);
 
 	if (!entry)
-		return true;
+		return CS_BUF_EMPTY;
 	else if (!i915_gem_request_completed(request))
-		return true;
+		return CS_BUF_REQ_PENDING;
 	else
-		return false;
+		return CS_BUF_HAVE_DATA;
 }
 
 /**
  * stream_have_data_unlocked - Checks whether the stream has data available
  * @stream: An i915-perf stream opened for OA metrics
  *
- * For command stream based streams, check if the command stream buffer has
- * atleast one sample available, if not return false, irrespective of periodic
- * oa buffer having the data or not.
+ * Note: We can safely forward the periodic OA samples in the case we have no
+ * pending CS samples, but we can't do so in the case we have pending CS
+ * samples, since we don't know what the ordering between pending CS samples
+ * and periodic samples will eventually be. If we have no pending CS samples,
+ * no future CS sample can have a timestamp earlier than the current
+ * periodic timestamp.
  */
 
 static bool stream_have_data_unlocked(struct i915_perf_stream *stream)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
+	enum cs_buf_state state;
+	u32 num_samples, last_ts = 0;
 
+	dev_priv->perf.oa.n_pending_periodic_samples = 0;
+	dev_priv->perf.oa.pending_periodic_ts = 0;
+	num_samples = dev_priv->perf.oa.ops.oa_buffer_num_reports(dev_priv,
+								&last_ts);
 	if (stream->cs_mode)
-		return !command_stream_buf_is_empty(stream);
+		state = command_stream_buf_state(stream);
 	else
-		return dev_priv->perf.oa.ops.oa_buffer_check(dev_priv);
+		state = CS_BUF_EMPTY;
+
+	switch (state) {
+	case CS_BUF_EMPTY:
+		dev_priv->perf.oa.n_pending_periodic_samples = num_samples;
+		dev_priv->perf.oa.pending_periodic_ts = last_ts;
+		return (num_samples != 0);
+
+	case CS_BUF_HAVE_DATA:
+		return true;
+
+	case CS_BUF_REQ_PENDING:
+	default:
+		return false;
+	}
+	return false;
 }
 
 /**
@@ -1698,7 +1787,7 @@ static int i915_oa_read(struct i915_perf_stream *stream,
 		return oa_rcs_append_reports(stream, buf, count, offset);
 	else
 		return dev_priv->perf.oa.ops.read(stream, buf, count, offset,
-						U32_MAX);
+						U32_MAX, U32_MAX);
 }
 
 /**
@@ -3688,8 +3777,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
 		dev_priv->perf.oa.ops.oa_enable = gen7_oa_enable;
 		dev_priv->perf.oa.ops.oa_disable = gen7_oa_disable;
 		dev_priv->perf.oa.ops.read = gen7_oa_read;
-		dev_priv->perf.oa.ops.oa_buffer_check =
-			gen7_oa_buffer_check_unlocked;
+		dev_priv->perf.oa.ops.oa_buffer_num_reports =
+			gen7_oa_buffer_num_reports_unlocked;
 
 		dev_priv->perf.oa.oa_formats = hsw_oa_formats;
 
@@ -3753,8 +3842,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
 			dev_priv->perf.oa.ops.oa_enable = gen8_oa_enable;
 			dev_priv->perf.oa.ops.oa_disable = gen8_oa_disable;
 			dev_priv->perf.oa.ops.read = gen8_oa_read;
-			dev_priv->perf.oa.ops.oa_buffer_check =
-				gen8_oa_buffer_check_unlocked;
+			dev_priv->perf.oa.ops.oa_buffer_num_reports =
+				gen8_oa_buffer_num_reports_unlocked;
 
 			dev_priv->perf.oa.oa_formats = gen8_plus_oa_formats;
 		}
-- 
1.9.1

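As a side note for reviewers, the wrap-around arithmetic that gen7/gen8_oa_buffer_num_reports_unlocked() perform can be sketched in isolation. This is an illustration only, not the driver code: the GGTT offset handling is dropped, and OA_BUFFER_SIZE is assumed to be the 16MB power-of-two buffer the driver configures.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: a 16MB power-of-two OA buffer, as i915 configures. */
#define OA_BUFFER_SIZE (16u * 1024 * 1024)

/* Bytes available between head and tail on the circular buffer,
 * mirroring the driver's OA_TAKEN() macro. */
#define OA_TAKEN(tail, head) (((tail) - (head)) & (OA_BUFFER_SIZE - 1))

/* Whole reports available to read; any trailing partial report is
 * ignored, as in the *_num_reports_unlocked() helpers. */
static uint32_t oa_num_reports(uint32_t head, uint32_t aged_tail,
			       uint32_t report_size)
{
	return OA_TAKEN(aged_tail, head) / report_size;
}
```

The masking makes the subtraction robust across the buffer wrap, which is the case the `& mask` terms in the patch are there to handle.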
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5/8] drm/i915: Inform userspace about command stream OA buf overflow
  2017-03-16  6:14 [PATCH 0/8] Collect command stream based OA reports using i915 perf sourab.gupta
                   ` (3 preceding siblings ...)
  2017-03-16  6:14 ` [PATCH 4/8] drm/i915: flush periodic samples, in case of no pending CS sample requests sourab.gupta
@ 2017-03-16  6:14 ` sourab.gupta
  2017-03-16  6:14 ` [PATCH 6/8] drm/i915: Populate ctx ID for periodic OA reports sourab.gupta
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: sourab.gupta @ 2017-03-16  6:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Sourab Gupta, Matthew Auld

From: Sourab Gupta <sourab.gupta@intel.com>

Since we don't currently give userspace control over the OA buffer size
and always configure a large 16MB buffer, a buffer overflow likely
indicates that something has gone quite badly wrong.

Here we set a status flag to detect overflow and inform userspace
of the report_lost condition accordingly. This is in line with the
behavior of the periodic OA buffer.

Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  2 ++
 drivers/gpu/drm/i915/i915_perf.c | 15 +++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d0f43e9..0f2a552 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2609,6 +2609,8 @@ struct drm_i915_private {
 		struct {
 			struct i915_vma *vma;
 			u8 *vaddr;
+#define I915_PERF_CMD_STREAM_BUF_STATUS_OVERFLOW (1<<0)
+			u32 status;
 		} command_stream_buf;
 
 		struct list_head cs_samples;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index d1d9853..2841d0a 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -478,6 +478,8 @@ static void insert_perf_sample(struct drm_i915_private *dev_priv,
 		else {
 			u32 target_size = sample_size - first->offset;
 
+			dev_priv->perf.command_stream_buf.status |=
+				I915_PERF_CMD_STREAM_BUF_STATUS_OVERFLOW;
 			release_perf_samples(dev_priv, target_size);
 			sample->offset = 0;
 		}
@@ -491,6 +493,8 @@ static void insert_perf_sample(struct drm_i915_private *dev_priv,
 				(first->offset - last->offset -
 				sample_size);
 
+			dev_priv->perf.command_stream_buf.status |=
+				I915_PERF_CMD_STREAM_BUF_STATUS_OVERFLOW;
 			release_perf_samples(dev_priv, target_size);
 			sample->offset = last->offset + sample_size;
 		}
@@ -1577,6 +1581,17 @@ static int oa_rcs_append_reports(struct i915_perf_stream *stream,
 	struct i915_perf_cs_sample *entry, *next;
 	LIST_HEAD(free_list);
 	int ret = 0;
+	u32 status = dev_priv->perf.command_stream_buf.status;
+
+	if (unlikely(status & I915_PERF_CMD_STREAM_BUF_STATUS_OVERFLOW)) {
+		ret = append_oa_status(stream, buf, count, offset,
+				       DRM_I915_PERF_RECORD_OA_BUFFER_LOST);
+		if (ret)
+			return ret;
+
+		dev_priv->perf.command_stream_buf.status &=
+				~I915_PERF_CMD_STREAM_BUF_STATUS_OVERFLOW;
+	}
 
 	spin_lock(&dev_priv->perf.sample_lock);
 	if (list_empty(&dev_priv->perf.cs_samples)) {
-- 
1.9.1

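The sticky status-bit pattern this patch uses — latch the bit at insert time when old samples are released, then report and clear it at read time — can be sketched in isolation. The cs_buf struct below is a hypothetical miniature, not the driver's actual bookkeeping.

```c
#include <assert.h>
#include <stdint.h>

#define CS_BUF_STATUS_OVERFLOW (1u << 0)

/* Hypothetical miniature of the command stream buffer bookkeeping. */
struct cs_buf {
	uint32_t size;		/* total capacity in bytes */
	uint32_t used;		/* bytes currently holding samples */
	uint32_t status;	/* sticky status bits */
};

/* Insert side: when there is no room, release the oldest samples and
 * latch the overflow bit, as insert_perf_sample() does in the patch. */
static void cs_buf_insert(struct cs_buf *buf, uint32_t sample_size)
{
	if (buf->used + sample_size > buf->size) {
		buf->status |= CS_BUF_STATUS_OVERFLOW;
		buf->used = buf->size - sample_size;	/* oldest dropped */
	}
	buf->used += sample_size;
}

/* Read side: report the overflow once, then clear it, as
 * oa_rcs_append_reports() does before forwarding samples. */
static int cs_buf_check_overflow(struct cs_buf *buf)
{
	if (buf->status & CS_BUF_STATUS_OVERFLOW) {
		buf->status &= ~CS_BUF_STATUS_OVERFLOW;
		return 1;	/* caller emits an OA_BUFFER_LOST record */
	}
	return 0;
}
```

The clear-on-read makes the condition edge-triggered from userspace's point of view, matching the behavior of the periodic OA buffer's REPORT_LOST handling.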

* [PATCH 6/8] drm/i915: Populate ctx ID for periodic OA reports
  2017-03-16  6:14 [PATCH 0/8] Collect command stream based OA reports using i915 perf sourab.gupta
                   ` (4 preceding siblings ...)
  2017-03-16  6:14 ` [PATCH 5/8] drm/i915: Inform userspace about command stream OA buf overflow sourab.gupta
@ 2017-03-16  6:14 ` sourab.gupta
  2017-03-16  6:14 ` [PATCH 7/8] drm/i915: Add support for having pid output with OA report sourab.gupta
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: sourab.gupta @ 2017-03-16  6:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Sourab Gupta, Matthew Auld

From: Sourab Gupta <sourab.gupta@intel.com>

This adds support for populating the ctx id for the periodic OA reports
when requested through the corresponding property.

For Gen8, the OA reports themselves contain the ctx ID, which is the one
programmed into HW while submitting workloads, so it is retrieved directly
from the reports. For Gen7, the OA reports don't have any such field, so we
populate this field with the last ctx ID seen while forwarding CS reports.

Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  7 ++++++
 drivers/gpu/drm/i915/i915_perf.c | 53 +++++++++++++++++++++++++++++++++++-----
 2 files changed, 54 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0f2a552..7a6dcb3 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2129,6 +2129,12 @@ struct i915_oa_ops {
 	 */
 	u32 (*oa_buffer_num_reports)(struct drm_i915_private *dev_priv,
 					u32 *last_ts);
+
+	/**
+	 * @get_ctx_id: Retrieve the ctx_id associated with the (periodic) OA
+	 * report.
+	 */
+	u32 (*get_ctx_id)(struct i915_perf_stream *stream, const u8 *report);
 };
 
 /*
@@ -2613,6 +2619,7 @@ struct drm_i915_private {
 			u32 status;
 		} command_stream_buf;
 
+		u32 last_cmd_stream_ctx_id;
 		struct list_head cs_samples;
 		spinlock_t sample_lock;
 	} perf;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 2841d0a..208179f 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -850,6 +850,46 @@ static u32 gen7_oa_buffer_num_reports_unlocked(
 	return aged_tail == INVALID_TAIL_PTR ? 0 : num_reports;
 }
 
+static u32 gen7_oa_buffer_get_ctx_id(struct i915_perf_stream *stream,
+				    const u8 *report)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+
+	if (!stream->cs_mode)
+		WARN_ONCE(1,
+			"CTX ID can't be retrieved if command stream mode not enabled");
+
+	/*
+	 * OA reports generated in Gen7 don't have the ctx ID information.
+	 * Therefore, just rely on the ctx ID information from the last CS
+	 * sample forwarded
+	 */
+	return dev_priv->perf.last_cmd_stream_ctx_id;
+}
+
+static u32 gen8_oa_buffer_get_ctx_id(struct i915_perf_stream *stream,
+				    const u8 *report)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+
+	/* The ctx ID field in Gen8+ OA reports contains intel_context::hw_id,
+	 * since this is what is programmed into the ELSP in execlist mode.
+	 * In non-execlist mode, fall back to the ctx ID last saved from
+	 * command stream mode.
+	 */
+	if (i915.enable_execlists) {
+		const u32 *report32 = (const void *)report;
+
+		return report32[2] & 0x1fffff;
+	} else {
+		if (!stream->cs_mode)
+			WARN_ONCE(1,
+				  "CTX ID can't be retrieved if command stream mode not enabled");
+
+		return dev_priv->perf.last_cmd_stream_ctx_id;
+	}
+}
+
 /**
  * append_oa_status - Appends a status record to a userspace read() buffer.
  * @stream: An i915-perf stream opened for OA metrics
@@ -963,18 +1003,15 @@ static int append_oa_buffer_sample(struct i915_perf_stream *stream,
 				char __user *buf, size_t count,
 				size_t *offset,	const u8 *report)
 {
+	struct drm_i915_private *dev_priv = stream->dev_priv;
 	u32 sample_flags = stream->sample_flags;
 	struct oa_sample_data data = { 0 };
 
 	if (sample_flags & SAMPLE_OA_SOURCE_INFO)
 		data.source = I915_PERF_OA_EVENT_SOURCE_PERIODIC;
 
-	/*
-	 * FIXME: append_oa_buffer_sample: read ctx ID from report and map
-	 * that to a intel_context::hw_id"
-	 */
 	if (sample_flags & SAMPLE_CTX_ID)
-		data.ctx_id = 0;
+		data.ctx_id = dev_priv->perf.oa.ops.get_ctx_id(stream, report);
 
 	if (sample_flags & SAMPLE_OA_REPORT)
 		data.report = report;
@@ -1547,8 +1584,10 @@ static int append_oa_rcs_sample(struct i915_perf_stream *stream,
 	if (sample_flags & SAMPLE_OA_SOURCE_INFO)
 		data.source = I915_PERF_OA_EVENT_SOURCE_RCS;
 
-	if (sample_flags & SAMPLE_CTX_ID)
+	if (sample_flags & SAMPLE_CTX_ID) {
 		data.ctx_id = node->ctx_id;
+		dev_priv->perf.last_cmd_stream_ctx_id = node->ctx_id;
+	}
 
 	if (sample_flags & SAMPLE_OA_REPORT)
 		data.report = report;
@@ -3794,6 +3833,7 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
 		dev_priv->perf.oa.ops.read = gen7_oa_read;
 		dev_priv->perf.oa.ops.oa_buffer_num_reports =
 			gen7_oa_buffer_num_reports_unlocked;
+		dev_priv->perf.oa.ops.get_ctx_id = gen7_oa_buffer_get_ctx_id;
 
 		dev_priv->perf.oa.oa_formats = hsw_oa_formats;
 
@@ -3859,6 +3899,7 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
 			dev_priv->perf.oa.ops.read = gen8_oa_read;
 			dev_priv->perf.oa.ops.oa_buffer_num_reports =
 				gen8_oa_buffer_num_reports_unlocked;
+			dev_priv->perf.oa.ops.get_ctx_id = gen8_oa_buffer_get_ctx_id;
 
 			dev_priv->perf.oa.oa_formats = gen8_plus_oa_formats;
 		}
-- 
1.9.1

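On Gen8+ the context ID is carried in the report itself; the dword extraction this patch performs can be sketched as below. The dword index and 21-bit mask are taken from the patch; the report contents in the usage are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Gen8+ OA reports carry the hw context ID (intel_context::hw_id) in
 * the third dword; only the low 21 bits are the ID, matching the mask
 * applied in gen8_oa_buffer_get_ctx_id(). */
static uint32_t gen8_report_ctx_id(const uint8_t *report)
{
	const uint32_t *report32 = (const uint32_t *)report;

	return report32[2] & 0x1fffff;
}
```

Any reserved high bits in that dword are masked off, so a report whose third dword carries extra flag bits still yields the bare hw_id.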

* [PATCH 7/8] drm/i915: Add support for having pid output with OA report
  2017-03-16  6:14 [PATCH 0/8] Collect command stream based OA reports using i915 perf sourab.gupta
                   ` (5 preceding siblings ...)
  2017-03-16  6:14 ` [PATCH 6/8] drm/i915: Populate ctx ID for periodic OA reports sourab.gupta
@ 2017-03-16  6:14 ` sourab.gupta
  2017-03-16  6:14 ` [PATCH 8/8] drm/i915: Add support for emitting execbuffer tags through OA counter reports sourab.gupta
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: sourab.gupta @ 2017-03-16  6:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Sourab Gupta, Matthew Auld

From: Sourab Gupta <sourab.gupta@intel.com>

This patch introduces flags and adds support for including the pid with
the OA reports generated through the RCS commands.

When the stream is opened with pid sample type, the pid information is also
captured through the command stream samples and forwarded along with the
OA reports.

Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  7 ++++++
 drivers/gpu/drm/i915/i915_perf.c | 50 ++++++++++++++++++++++++++++++++++++++--
 include/uapi/drm/i915_drm.h      |  7 ++++++
 3 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7a6dcb3..fa1d3fc 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2165,6 +2165,12 @@ struct i915_perf_cs_sample {
 	 * @ctx_id: Context ID associated with this perf sample
 	 */
 	u32 ctx_id;
+
+	/**
+	 * @pid: PID of the process that submitted the workload pertaining
+	 * to this perf sample
+	 */
+	u32 pid;
 };
 
 struct intel_cdclk_state {
@@ -2620,6 +2626,7 @@ struct drm_i915_private {
 		} command_stream_buf;
 
 		u32 last_cmd_stream_ctx_id;
+		u32 last_pid;
 		struct list_head cs_samples;
 		spinlock_t sample_lock;
 	} perf;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 208179f..6e8af2d 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -289,6 +289,7 @@
 struct oa_sample_data {
 	u32 source;
 	u32 ctx_id;
+	u32 pid;
 	const u8 *report;
 };
 
@@ -342,6 +343,7 @@ struct oa_sample_data {
 #define SAMPLE_OA_REPORT      (1<<0)
 #define SAMPLE_OA_SOURCE_INFO	(1<<1)
 #define SAMPLE_CTX_ID		(1<<2)
+#define SAMPLE_PID		(1<<3)
 
 /**
  * struct perf_open_properties - for validated properties given to open a stream
@@ -532,6 +534,7 @@ static void i915_perf_command_stream_hook_oa(struct i915_perf_stream *stream,
 	}
 
 	sample->ctx_id = ctx->hw_id;
+	sample->pid = current->pid;
 	i915_gem_request_assign(&sample->request, request);
 
 	insert_perf_sample(dev_priv, sample);
@@ -977,6 +980,12 @@ static int append_oa_sample(struct i915_perf_stream *stream,
 		buf += 4;
 	}
 
+	if (sample_flags & SAMPLE_PID) {
+		if (copy_to_user(buf, &data->pid, 4))
+			return -EFAULT;
+		buf += 4;
+	}
+
 	if (sample_flags & SAMPLE_OA_REPORT) {
 		if (copy_to_user(buf, data->report, report_size))
 			return -EFAULT;
@@ -1013,6 +1022,9 @@ static int append_oa_buffer_sample(struct i915_perf_stream *stream,
 	if (sample_flags & SAMPLE_CTX_ID)
 		data.ctx_id = dev_priv->perf.oa.ops.get_ctx_id(stream, report);
 
+	if (sample_flags & SAMPLE_PID)
+		data.pid = dev_priv->perf.last_pid;
+
 	if (sample_flags & SAMPLE_OA_REPORT)
 		data.report = report;
 
@@ -1589,6 +1601,11 @@ static int append_oa_rcs_sample(struct i915_perf_stream *stream,
 		dev_priv->perf.last_cmd_stream_ctx_id = node->ctx_id;
 	}
 
+	if (sample_flags & SAMPLE_PID) {
+		data.pid = node->pid;
+		dev_priv->perf.last_pid = node->pid;
+	}
+
 	if (sample_flags & SAMPLE_OA_REPORT)
 		data.report = report;
 
@@ -2665,6 +2682,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	bool require_oa_unit = props->sample_flags & (SAMPLE_OA_REPORT |
 						      SAMPLE_OA_SOURCE_INFO);
+	bool require_cs_mode = props->sample_flags & SAMPLE_PID;
 	bool cs_sample_data = props->sample_flags & SAMPLE_OA_REPORT;
 	int ret;
 
@@ -2790,6 +2808,20 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	if (props->sample_flags & SAMPLE_CTX_ID) {
 		stream->sample_flags |= SAMPLE_CTX_ID;
 		stream->sample_size += 4;
+
+		/*
+		 * NB: it's meaningful to request SAMPLE_CTX_ID with just CS
+		 * mode or periodic OA mode sampling but we don't allow
+		 * SAMPLE_CTX_ID without either mode
+		 */
+		if (!require_oa_unit)
+			require_cs_mode = true;
+	}
+
+	if (require_cs_mode && !props->cs_mode) {
+		DRM_ERROR("PID sampling requires a ring to be specified");
+		ret = -EINVAL;
+		goto err_enable;
 	}
 
 	if (props->cs_mode) {
@@ -2800,13 +2832,25 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 			goto err_enable;
 		}
 
-		if (!(props->sample_flags & SAMPLE_CTX_ID)) {
+		/*
+		 * The only time we should allow enabling CS mode when it's not
+		 * strictly required is if SAMPLE_CTX_ID has been requested,
+		 * as it's usable with either periodic OA or CS sampling.
+		 */
+		if (!require_cs_mode &&
+		    !(props->sample_flags & SAMPLE_CTX_ID)) {
 			DRM_ERROR(
 				"Stream engine given without requesting any CS specific property");
 			ret = -EINVAL;
 			goto err_enable;
 		}
 		stream->cs_mode = true;
+
+		if (props->sample_flags & SAMPLE_PID) {
+			stream->sample_flags |= SAMPLE_PID;
+			stream->sample_size += 4;
+		}
+
 		ret = alloc_command_stream_buf(dev_priv);
 		if (ret)
 			goto err_enable;
@@ -3601,7 +3645,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 		case DRM_I915_PERF_PROP_SAMPLE_CTX_ID:
 			props->sample_flags |= SAMPLE_CTX_ID;
 			break;
-
+		case DRM_I915_PERF_PROP_SAMPLE_PID:
+			props->sample_flags |= SAMPLE_PID;
+			break;
 		default:
 			MISSING_CASE(id);
 			DRM_DEBUG("Unknown i915 perf property ID\n");
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 4cbefc8..25cf612 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1373,6 +1373,12 @@ enum drm_i915_perf_property_id {
 	 */
 	DRM_I915_PERF_PROP_SAMPLE_CTX_ID,
 
+	/**
+	 * The value of this property set to 1 requests inclusion of pid in the
+	 * perf sample data.
+	 */
+	DRM_I915_PERF_PROP_SAMPLE_PID,
+
 	DRM_I915_PERF_PROP_MAX /* non-ABI */
 };
 
@@ -1440,6 +1446,7 @@ enum drm_i915_perf_record_type {
 	 *
 	 *     { u32 source_info; } && DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE
 	 *     { u32 ctx_id; } && DRM_I915_PERF_PROP_SAMPLE_CTX_ID
+	 *     { u32 pid; } && DRM_I915_PERF_PROP_SAMPLE_PID
 	 *     { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
 	 * };
 	 */
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH 8/8] drm/i915: Add support for emitting execbuffer tags through OA counter reports
  2017-03-16  6:14 [PATCH 0/8] Collect command stream based OA reports using i915 perf sourab.gupta
                   ` (6 preceding siblings ...)
  2017-03-16  6:14 ` [PATCH 7/8] drm/i915: Add support for having pid output with OA report sourab.gupta
@ 2017-03-16  6:14 ` sourab.gupta
  2017-03-16  8:13   ` Chris Wilson
  2017-03-16  6:30 ` ✓ Fi.CI.BAT: success for Collect command stream based OA reports using i915 perf Patchwork
  2017-03-16 12:59 ` [PATCH 0/8] " Robert Bragg
  9 siblings, 1 reply; 24+ messages in thread
From: sourab.gupta @ 2017-03-16  6:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Sourab Gupta, Matthew Auld

From: Sourab Gupta <sourab.gupta@intel.com>

This patch enables userspace to specify tags (per workload), provided via
execbuffer ioctl, which could be added to OA reports, to help associate
reports with the corresponding workloads.

There may be multiple stages within a single context, from a userspace
perspective. An ability is needed to individually associate the OA reports
with their corresponding workloads (execbuffers), which may not be possible
solely with ctx_id or pid information. This patch enables such a mechanism.

In this patch, the upper 32 bits of the rsvd1 field, which were previously
unused, are now used to pass in the tag.

Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            | 17 +++++++++++--
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  6 +++--
 drivers/gpu/drm/i915/i915_perf.c           | 40 +++++++++++++++++++++++++-----
 include/uapi/drm/i915_drm.h                | 12 +++++++++
 4 files changed, 65 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index fa1d3fc..414afa5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1982,7 +1982,8 @@ struct i915_perf_stream_ops {
 	 * the batch buffer.
 	 */
 	void (*command_stream_hook)(struct i915_perf_stream *stream,
-				struct drm_i915_gem_request *request);
+				struct drm_i915_gem_request *request,
+				u32 tag);
 };
 
 enum i915_perf_stream_state {
@@ -2171,6 +2172,17 @@ struct i915_perf_cs_sample {
 	 * submitted, pertaining to this perf sample
 	 */
 	u32 pid;
+
+	/**
+	 * @tag: Tag associated with workload, for which the perf sample is
+	 * being collected.
+	 *
+	 * Userspace can specify tags (provided via execbuffer ioctl), which
+	 * can be associated with the perf samples, and be used to functionally
+	 * distinguish different workload stages, and associate samples with
+	 * these different stages.
+	 */
+	u32 tag;
 };
 
 struct intel_cdclk_state {
@@ -2627,6 +2639,7 @@ struct drm_i915_private {
 
 		u32 last_cmd_stream_ctx_id;
 		u32 last_pid;
+		u32 last_tag;
 		struct list_head cs_samples;
 		spinlock_t sample_lock;
 	} perf;
@@ -3690,7 +3703,7 @@ void i915_oa_init_reg_state(struct intel_engine_cs *engine,
 void i915_oa_update_reg_state(struct intel_engine_cs *engine,
 			      struct i915_gem_context *ctx,
 			      uint32_t *reg_state);
-void i915_perf_command_stream_hook(struct drm_i915_gem_request *req);
+void i915_perf_command_stream_hook(struct drm_i915_gem_request *req, u32 tag);
 
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct i915_address_space *vm,
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 7af32c97..b42d47e 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -59,6 +59,7 @@ struct i915_execbuffer_params {
 	struct intel_engine_cs          *engine;
 	struct i915_gem_context         *ctx;
 	struct drm_i915_gem_request     *request;
+	uint32_t			tag;
 };
 
 struct eb_vmas {
@@ -1441,7 +1442,7 @@ static void eb_export_fence(struct drm_i915_gem_object *obj,
 	if (exec_len == 0)
 		exec_len = params->batch->size - params->args_batch_start_offset;
 
-	i915_perf_command_stream_hook(params->request);
+	i915_perf_command_stream_hook(params->request, params->tag);
 
 	ret = params->engine->emit_bb_start(params->request,
 					    exec_start, exec_len,
@@ -1449,7 +1450,7 @@ static void eb_export_fence(struct drm_i915_gem_object *obj,
 	if (ret)
 		return ret;
 
-	i915_perf_command_stream_hook(params->request);
+	i915_perf_command_stream_hook(params->request, params->tag);
 
 	i915_gem_execbuffer_move_to_active(vmas, params->request);
 
@@ -1791,6 +1792,7 @@ static void eb_export_fence(struct drm_i915_gem_object *obj,
 	params->engine                    = engine;
 	params->dispatch_flags          = dispatch_flags;
 	params->ctx                     = ctx;
+	params->tag			= i915_execbuffer2_get_tag(*args);
 
 	trace_i915_gem_request_queue(params->request, dispatch_flags);
 
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 6e8af2d..759865e 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -290,6 +290,7 @@ struct oa_sample_data {
 	u32 source;
 	u32 ctx_id;
 	u32 pid;
+	u32 tag;
 	const u8 *report;
 };
 
@@ -344,6 +345,7 @@ struct oa_sample_data {
 #define SAMPLE_OA_SOURCE_INFO	(1<<1)
 #define SAMPLE_CTX_ID		(1<<2)
 #define SAMPLE_PID		(1<<3)
+#define SAMPLE_TAG		(1<<4)
 
 /**
  * struct perf_open_properties - for validated properties given to open a stream
@@ -387,7 +389,8 @@ struct perf_open_properties {
  * The function provides a hook through which the commands to capture perf
  * metrics, are inserted into the command stream of a GPU engine.
  */
-void i915_perf_command_stream_hook(struct drm_i915_gem_request *request)
+void i915_perf_command_stream_hook(struct drm_i915_gem_request *request,
+					u32 tag)
 {
 	struct intel_engine_cs *engine = request->engine;
 	struct drm_i915_private *dev_priv = engine->i915;
@@ -400,7 +403,7 @@ void i915_perf_command_stream_hook(struct drm_i915_gem_request *request)
 	list_for_each_entry(stream, &dev_priv->perf.streams, link) {
 		if ((stream->state == I915_PERF_STREAM_ENABLED) &&
 					stream->cs_mode)
-			stream->ops->command_stream_hook(stream, request);
+			stream->ops->command_stream_hook(stream, request, tag);
 	}
 	mutex_unlock(&dev_priv->perf.streams_lock);
 }
@@ -510,10 +513,11 @@ static void insert_perf_sample(struct drm_i915_private *dev_priv,
  * metrics into the render command stream
  * @stream: An i915-perf stream opened for OA metrics
  * @request: request in whose context the metrics are being collected.
- *
+ * @tag: userspace provided tag to be associated with the perf sample
  */
 static void i915_perf_command_stream_hook_oa(struct i915_perf_stream *stream,
-					struct drm_i915_gem_request *request)
+					struct drm_i915_gem_request *request,
+					u32 tag)
 {
 	struct drm_i915_private *dev_priv = request->i915;
 	struct i915_gem_context *ctx = request->ctx;
@@ -535,6 +539,7 @@ static void i915_perf_command_stream_hook_oa(struct i915_perf_stream *stream,
 
 	sample->ctx_id = ctx->hw_id;
 	sample->pid = current->pid;
+	sample->tag = tag;
 	i915_gem_request_assign(&sample->request, request);
 
 	insert_perf_sample(dev_priv, sample);
@@ -986,6 +991,12 @@ static int append_oa_sample(struct i915_perf_stream *stream,
 		buf += 4;
 	}
 
+	if (sample_flags & SAMPLE_TAG) {
+		if (copy_to_user(buf, &data->tag, 4))
+			return -EFAULT;
+		buf += 4;
+	}
+
 	if (sample_flags & SAMPLE_OA_REPORT) {
 		if (copy_to_user(buf, data->report, report_size))
 			return -EFAULT;
@@ -1025,6 +1036,9 @@ static int append_oa_buffer_sample(struct i915_perf_stream *stream,
 	if (sample_flags & SAMPLE_PID)
 		data.pid = dev_priv->perf.last_pid;
 
+	if (sample_flags & SAMPLE_TAG)
+		data.tag = dev_priv->perf.last_tag;
+
 	if (sample_flags & SAMPLE_OA_REPORT)
 		data.report = report;
 
@@ -1606,6 +1620,11 @@ static int append_oa_rcs_sample(struct i915_perf_stream *stream,
 		dev_priv->perf.last_pid = node->pid;
 	}
 
+	if (sample_flags & SAMPLE_TAG) {
+		data.tag = node->tag;
+		dev_priv->perf.last_tag = node->tag;
+	}
+
 	if (sample_flags & SAMPLE_OA_REPORT)
 		data.report = report;
 
@@ -2682,7 +2701,8 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	struct drm_i915_private *dev_priv = stream->dev_priv;
 	bool require_oa_unit = props->sample_flags & (SAMPLE_OA_REPORT |
 						      SAMPLE_OA_SOURCE_INFO);
-	bool require_cs_mode = props->sample_flags & SAMPLE_PID;
+	bool require_cs_mode = props->sample_flags & (SAMPLE_PID |
+						      SAMPLE_TAG);
 	bool cs_sample_data = props->sample_flags & SAMPLE_OA_REPORT;
 	int ret;
 
@@ -2819,7 +2839,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	}
 
 	if (require_cs_mode && !props->cs_mode) {
-		DRM_ERROR("PID sampling requires a ring to be specified");
+		DRM_ERROR("PID/TAG sampling requires a ring to be specified");
 		ret = -EINVAL;
 		goto err_enable;
 	}
@@ -2851,6 +2871,11 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 			stream->sample_size += 4;
 		}
 
+		if (props->sample_flags & SAMPLE_TAG) {
+			stream->sample_flags |= SAMPLE_TAG;
+			stream->sample_size += 4;
+		}
+
 		ret = alloc_command_stream_buf(dev_priv);
 		if (ret)
 			goto err_enable;
@@ -3648,6 +3673,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 		case DRM_I915_PERF_PROP_SAMPLE_PID:
 			props->sample_flags |= SAMPLE_PID;
 			break;
+		case DRM_I915_PERF_PROP_SAMPLE_TAG:
+			props->sample_flags |= SAMPLE_TAG;
+			break;
 		default:
 			MISSING_CASE(id);
 			DRM_DEBUG("Unknown i915 perf property ID\n");
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 25cf612..ab5fdc0 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -899,6 +899,11 @@ struct drm_i915_gem_execbuffer2 {
 #define i915_execbuffer2_get_context_id(eb2) \
 	((eb2).rsvd1 & I915_EXEC_CONTEXT_ID_MASK)
 
+/* upper 32 bits of rsvd1 field contain tag */
+#define I915_EXEC_TAG_MASK		(0xffffffff00000000UL)
+#define i915_execbuffer2_get_tag(eb2) \
+	((eb2).rsvd1 & I915_EXEC_TAG_MASK)
+
 struct drm_i915_gem_pin {
 	/** Handle of the buffer to be pinned. */
 	__u32 handle;
@@ -1379,6 +1384,12 @@ enum drm_i915_perf_property_id {
 	 */
 	DRM_I915_PERF_PROP_SAMPLE_PID,
 
+	/**
+	 * The value of this property set to 1 requests inclusion of tag in the
+	 * perf sample data.
+	 */
+	DRM_I915_PERF_PROP_SAMPLE_TAG,
+
 	DRM_I915_PERF_PROP_MAX /* non-ABI */
 };
 
@@ -1447,6 +1458,7 @@ enum drm_i915_perf_record_type {
 	 *     { u32 source_info; } && DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE
 	 *     { u32 ctx_id; } && DRM_I915_PERF_PROP_SAMPLE_CTX_ID
 	 *     { u32 pid; } && DRM_I915_PERF_PROP_SAMPLE_PID
+	 *     { u32 tag; } && DRM_I915_PERF_PROP_SAMPLE_TAG
 	 *     { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
 	 * };
 	 */
-- 
1.9.1


* ✓ Fi.CI.BAT: success for Collect command stream based OA reports using i915 perf
  2017-03-16  6:14 [PATCH 0/8] Collect command stream based OA reports using i915 perf sourab.gupta
                   ` (7 preceding siblings ...)
  2017-03-16  6:14 ` [PATCH 8/8] drm/i915: Add support for emitting execbuffer tags through OA counter reports sourab.gupta
@ 2017-03-16  6:30 ` Patchwork
  2017-03-16 12:59 ` [PATCH 0/8] " Robert Bragg
  9 siblings, 0 replies; 24+ messages in thread
From: Patchwork @ 2017-03-16  6:30 UTC (permalink / raw)
  To: sourab.gupta; +Cc: intel-gfx

== Series Details ==

Series: Collect command stream based OA reports using i915 perf
URL   : https://patchwork.freedesktop.org/series/21351/
State : success

== Summary ==

Series 21351v1 Collect command stream based OA reports using i915 perf
https://patchwork.freedesktop.org/api/1.0/series/21351/revisions/1/mbox/

Test gem_exec_flush:
        Subgroup basic-batch-kernel-default-uc:
                fail       -> PASS       (fi-snb-2600) fdo#100007

fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007

fi-bdw-5557u     total:278  pass:267  dwarn:0   dfail:0   fail:0   skip:11  time: 455s
fi-bsw-n3050     total:278  pass:239  dwarn:0   dfail:0   fail:0   skip:39  time: 579s
fi-bxt-j4205     total:278  pass:259  dwarn:0   dfail:0   fail:0   skip:19  time: 540s
fi-bxt-t5700     total:278  pass:258  dwarn:0   dfail:0   fail:0   skip:20  time: 534s
fi-byt-j1900     total:278  pass:251  dwarn:0   dfail:0   fail:0   skip:27  time: 499s
fi-byt-n2820     total:278  pass:247  dwarn:0   dfail:0   fail:0   skip:31  time: 496s
fi-hsw-4770      total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time: 439s
fi-hsw-4770r     total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time: 434s
fi-ilk-650       total:278  pass:228  dwarn:0   dfail:0   fail:0   skip:50  time: 436s
fi-ivb-3520m     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time: 515s
fi-ivb-3770      total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time: 495s
fi-kbl-7500u     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time: 482s
fi-skl-6260u     total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time: 482s
fi-skl-6700hq    total:278  pass:261  dwarn:0   dfail:0   fail:0   skip:17  time: 604s
fi-skl-6700k     total:278  pass:256  dwarn:4   dfail:0   fail:0   skip:18  time: 491s
fi-skl-6770hq    total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time: 515s
fi-snb-2520m     total:278  pass:250  dwarn:0   dfail:0   fail:0   skip:28  time: 546s
fi-snb-2600      total:278  pass:249  dwarn:0   dfail:0   fail:0   skip:29  time: 411s

d7cb114e1327b50c609ee6e67122cc0293ea515f drm-tip: 2017y-03m-15d-21h-47m-56s UTC integration manifest
1408d1e drm/i915: Expose OA sample source to userspace
dabfab4 drm/i915: Add ctx getparam ioctl parameter to retrieve ctx unique id

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_4194/

* Re: [PATCH 3/8] drm/i915: Framework for capturing command stream based OA reports
  2017-03-16  6:14 ` [PATCH 3/8] drm/i915: Framework for capturing command stream based OA reports sourab.gupta
@ 2017-03-16  8:10   ` Chris Wilson
  2017-03-16  8:54     ` sourab gupta
  2017-03-16  8:31   ` Chris Wilson
  1 sibling, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2017-03-16  8:10 UTC (permalink / raw)
  To: sourab.gupta; +Cc: Daniel Vetter, intel-gfx, Matthew Auld

On Thu, Mar 16, 2017 at 11:44:10AM +0530, sourab.gupta@intel.com wrote:
> @@ -3593,6 +3670,7 @@ void i915_oa_init_reg_state(struct intel_engine_cs *engine,
>  void i915_oa_update_reg_state(struct intel_engine_cs *engine,
>  			      struct i915_gem_context *ctx,
>  			      uint32_t *reg_state);
> +void i915_perf_command_stream_hook(struct drm_i915_gem_request *req);
>  
>  /* i915_gem_evict.c */
>  int __must_check i915_gem_evict_something(struct i915_address_space *vm,
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index aa75ea2..7af32c97 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1441,12 +1441,16 @@ static void eb_export_fence(struct drm_i915_gem_object *obj,
>  	if (exec_len == 0)
>  		exec_len = params->batch->size - params->args_batch_start_offset;
>  
> +	i915_perf_command_stream_hook(params->request);

Could you have named it anything more cryptic and inconsistent?

Quite clearly this can fail, so justify the non handling of errors.

DRM_ERROR is not error handling, it is an indication that this patch
isn't ready.

> +void i915_perf_command_stream_hook(struct drm_i915_gem_request *request)
> +{
> +	struct intel_engine_cs *engine = request->engine;
> +	struct drm_i915_private *dev_priv = engine->i915;
> +	struct i915_perf_stream *stream;
> +
> +	if (!dev_priv->perf.initialized)
> +		return;
> +
> +	mutex_lock(&dev_priv->perf.streams_lock);

Justify a new global lock very, very carefully on execbuf.

> +	list_for_each_entry(stream, &dev_priv->perf.streams, link) {
> +		if ((stream->state == I915_PERF_STREAM_ENABLED) &&
> +					stream->cs_mode)
> +			stream->ops->command_stream_hook(stream, request);
> +	}
> +	mutex_unlock(&dev_priv->perf.streams_lock);
> +}

> +static void i915_perf_command_stream_hook_oa(struct i915_perf_stream *stream,
> +					struct drm_i915_gem_request *request)
> +{
> +	struct drm_i915_private *dev_priv = request->i915;
> +	struct i915_gem_context *ctx = request->ctx;
> +	struct i915_perf_cs_sample *sample;
> +	u32 addr = 0;
> +	u32 cmd, *cs;
> +
> +	sample = kzalloc(sizeof(*sample), GFP_KERNEL);
> +	if (sample == NULL) {
> +		DRM_ERROR("Perf sample alloc failed\n");
> +		return;
> +	}
> +
> +	cs = intel_ring_begin(request, 4);
> +	if (IS_ERR(cs)) {
> +		kfree(sample);
> +		return;
> +	}
> +
> +	sample->ctx_id = ctx->hw_id;
> +	i915_gem_request_assign(&sample->request, request);

> +
> +	i915_gem_active_set(&stream->last_request, request);

Hint, you don't need you own request trap.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 8/8] drm/i915: Add support for emitting execbuffer tags through OA counter reports
  2017-03-16  6:14 ` [PATCH 8/8] drm/i915: Add support for emitting execbuffer tags through OA counter reports sourab.gupta
@ 2017-03-16  8:13   ` Chris Wilson
  0 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2017-03-16  8:13 UTC (permalink / raw)
  To: sourab.gupta; +Cc: Daniel Vetter, intel-gfx, Matthew Auld

On Thu, Mar 16, 2017 at 11:44:15AM +0530, sourab.gupta@intel.com wrote:
> From: Sourab Gupta <sourab.gupta@intel.com>
> 
> This patch enables userspace to specify tags (per workload), provided via
> execbuffer ioctl, which could be added to OA reports, to help associate
> reports with the corresponding workloads.
> 
> There may be multiple stages within a single context, from a userspace
> perspective. An ability is needed to individually associate the OA reports
> with their corresponding workloads (execbuffers), which may not be possible
> solely with ctx_id or pid information. This patch enables such a mechanism.
> 
> In this patch, the upper 32 bits of the rsvd1 field, which were previously
> unused, are now used to pass in the tag.

Clearly this patch hasn't been tested.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 1/8] drm/i915: Add ctx getparam ioctl parameter to retrieve ctx unique id
  2017-03-16  6:14 ` [PATCH 1/8] drm/i915: Add ctx getparam ioctl parameter to retrieve ctx unique id sourab.gupta
@ 2017-03-16  8:23   ` Chris Wilson
  0 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2017-03-16  8:23 UTC (permalink / raw)
  To: sourab.gupta; +Cc: Daniel Vetter, intel-gfx, Matthew Auld

On Thu, Mar 16, 2017 at 11:44:08AM +0530, sourab.gupta@intel.com wrote:
> From: Sourab Gupta <sourab.gupta@intel.com>
> 
> This patch adds a new ctx getparam ioctl parameter, which can be used to
> retrieve ctx unique id by userspace.
> 
> This can be used by userspace to map the OA reports received in the
> i915 perf samples with their associated ctx's (The OA reports have the
> hw ctx ID information embedded for Gen8+).
> Otherwise the userspace has no way of maintaining this association,
> since it has the knowledge of only per-drm file specific ctx handles.
> 
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 3/8] drm/i915: Framework for capturing command stream based OA reports
  2017-03-16  6:14 ` [PATCH 3/8] drm/i915: Framework for capturing command stream based OA reports sourab.gupta
  2017-03-16  8:10   ` Chris Wilson
@ 2017-03-16  8:31   ` Chris Wilson
  2017-03-16  8:57     ` sourab gupta
  1 sibling, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2017-03-16  8:31 UTC (permalink / raw)
  To: sourab.gupta; +Cc: Daniel Vetter, intel-gfx, Matthew Auld

On Thu, Mar 16, 2017 at 11:44:10AM +0530, sourab.gupta@intel.com wrote:
> +static void i915_perf_command_stream_hook_oa(struct i915_perf_stream *stream,
> +					struct drm_i915_gem_request *request)
> +{
> +	struct drm_i915_private *dev_priv = request->i915;
> +	struct i915_gem_context *ctx = request->ctx;
> +	struct i915_perf_cs_sample *sample;
> +	u32 addr = 0;
> +	u32 cmd, *cs;
> +
> +	sample = kzalloc(sizeof(*sample), GFP_KERNEL);
> +	if (sample == NULL) {
> +		DRM_ERROR("Perf sample alloc failed\n");
> +		return;
> +	}
> +
> +	cs = intel_ring_begin(request, 4);
> +	if (IS_ERR(cs)) {
> +		kfree(sample);
> +		return;
> +	}
> +
> +	sample->ctx_id = ctx->hw_id;
> +	i915_gem_request_assign(&sample->request, request);
> +
> +	insert_perf_sample(dev_priv, sample);
> +
> +	addr = dev_priv->perf.command_stream_buf.vma->node.start +
> +		sample->offset;
> +
> +	if (WARN_ON(addr & 0x3f)) {
> +		DRM_ERROR("OA buffer address not aligned to 64 byte");
> +		return;
> +	}
> +
> +	cmd = MI_REPORT_PERF_COUNT | (1<<0);
> +	if (INTEL_GEN(dev_priv) >= 8)
> +		cmd |= (2<<0);
> +
> +	*cs++ = cmd;
> +	*cs++ = addr | MI_REPORT_PERF_COUNT_GGTT;
> +	*cs++ = request->global_seqno;

request->global_seqno is always zero at this point, and is subject to
change after assignment.

[ctx:engine or request->fence.context]:request->fence.seqno is permanent.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 3/8] drm/i915: Framework for capturing command stream based OA reports
  2017-03-16  8:10   ` Chris Wilson
@ 2017-03-16  8:54     ` sourab gupta
  2017-03-16  9:03       ` Chris Wilson
  0 siblings, 1 reply; 24+ messages in thread
From: sourab gupta @ 2017-03-16  8:54 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx, Auld, Matthew

On Thu, 2017-03-16 at 01:10 -0700, Chris Wilson wrote:
> On Thu, Mar 16, 2017 at 11:44:10AM +0530, sourab.gupta@intel.com wrote:
> > @@ -3593,6 +3670,7 @@ void i915_oa_init_reg_state(struct intel_engine_cs *engine,
> >  void i915_oa_update_reg_state(struct intel_engine_cs *engine,
> >  			      struct i915_gem_context *ctx,
> >  			      uint32_t *reg_state);
> > +void i915_perf_command_stream_hook(struct drm_i915_gem_request *req);
> >  
> >  /* i915_gem_evict.c */
> >  int __must_check i915_gem_evict_something(struct i915_address_space *vm,
> > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > index aa75ea2..7af32c97 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > @@ -1441,12 +1441,16 @@ static void eb_export_fence(struct drm_i915_gem_object *obj,
> >  	if (exec_len == 0)
> >  		exec_len = params->batch->size - params->args_batch_start_offset;
> >  
> > +	i915_perf_command_stream_hook(params->request);
> 
> Could you have named it anything more cryptic and inconsistent?

Sorry. Would 'i915_perf_capture_metrics' work?
Can you please suggest an appropriate name otherwise.
> 
> Quite clearly this can fail, so justify the non handling of errors.
> 
> DRM_ERROR is not error handling, it is an indication that this patch
> isn't ready.

Well, the intent was to have minimal effect on the normal execbuf routine,
even if we fail. But, I guess I'm mistaken.
I'll rectify this to handle the case where the perf callback fails.
	
> 
> > +void i915_perf_command_stream_hook(struct drm_i915_gem_request *request)
> > +{
> > +	struct intel_engine_cs *engine = request->engine;
> > +	struct drm_i915_private *dev_priv = engine->i915;
> > +	struct i915_perf_stream *stream;
> > +
> > +	if (!dev_priv->perf.initialized)
> > +		return;
> > +
> > +	mutex_lock(&dev_priv->perf.streams_lock);
> 
> Justify a new global lock very, very carefully on execbuf.

The lock introduced here protects the perf.streams list against
addition/deletion while we're processing the list during the execbuf call.
The other places where the mutex is taken are when a new stream is being
created (using the perf_open ioctl) or a stream is being destroyed
(perf_release ioctl), which just protect the list_add and list_del on
the perf.streams list.
As such, there should not be much impact on the execbuf path.
Does this seem reasonable?

> 
> > +	list_for_each_entry(stream, &dev_priv->perf.streams, link) {
> > +		if ((stream->state == I915_PERF_STREAM_ENABLED) &&
> > +					stream->cs_mode)
> > +			stream->ops->command_stream_hook(stream, request);
> > +	}
> > +	mutex_unlock(&dev_priv->perf.streams_lock);
> > +}
> 
> > +static void i915_perf_command_stream_hook_oa(struct i915_perf_stream *stream,
> > +					struct drm_i915_gem_request *request)
> > +{
> > +	struct drm_i915_private *dev_priv = request->i915;
> > +	struct i915_gem_context *ctx = request->ctx;
> > +	struct i915_perf_cs_sample *sample;
> > +	u32 addr = 0;
> > +	u32 cmd, *cs;
> > +
> > +	sample = kzalloc(sizeof(*sample), GFP_KERNEL);
> > +	if (sample == NULL) {
> > +		DRM_ERROR("Perf sample alloc failed\n");
> > +		return;
> > +	}
> > +
> > +	cs = intel_ring_begin(request, 4);
> > +	if (IS_ERR(cs)) {
> > +		kfree(sample);
> > +		return;
> > +	}
> > +
> > +	sample->ctx_id = ctx->hw_id;
> > +	i915_gem_request_assign(&sample->request, request);
> 
> > +
> > +	i915_gem_active_set(&stream->last_request, request);
> 
> Hint, you don't need you own request trap.
Sorry, I didn't understand you fully here. I'm storing the reference to
the last active request associated with the stream inside the
last_request 'gem_active' field. Do I need to do something differently
here?

> -Chris
> 



* Re: [PATCH 3/8] drm/i915: Framework for capturing command stream based OA reports
  2017-03-16  8:31   ` Chris Wilson
@ 2017-03-16  8:57     ` sourab gupta
  0 siblings, 0 replies; 24+ messages in thread
From: sourab gupta @ 2017-03-16  8:57 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx, Auld, Matthew

On Thu, 2017-03-16 at 01:31 -0700, Chris Wilson wrote:
> On Thu, Mar 16, 2017 at 11:44:10AM +0530, sourab.gupta@intel.com wrote:
> > +static void i915_perf_command_stream_hook_oa(struct i915_perf_stream *stream,
> > +					struct drm_i915_gem_request *request)
> > +{
> > +	struct drm_i915_private *dev_priv = request->i915;
> > +	struct i915_gem_context *ctx = request->ctx;
> > +	struct i915_perf_cs_sample *sample;
> > +	u32 addr = 0;
> > +	u32 cmd, *cs;
> > +
> > +	sample = kzalloc(sizeof(*sample), GFP_KERNEL);
> > +	if (sample == NULL) {
> > +		DRM_ERROR("Perf sample alloc failed\n");
> > +		return;
> > +	}
> > +
> > +	cs = intel_ring_begin(request, 4);
> > +	if (IS_ERR(cs)) {
> > +		kfree(sample);
> > +		return;
> > +	}
> > +
> > +	sample->ctx_id = ctx->hw_id;
> > +	i915_gem_request_assign(&sample->request, request);
> > +
> > +	insert_perf_sample(dev_priv, sample);
> > +
> > +	addr = dev_priv->perf.command_stream_buf.vma->node.start +
> > +		sample->offset;
> > +
> > +	if (WARN_ON(addr & 0x3f)) {
> > +		DRM_ERROR("OA buffer address not aligned to 64 byte");
> > +		return;
> > +	}
> > +
> > +	cmd = MI_REPORT_PERF_COUNT | (1<<0);
> > +	if (INTEL_GEN(dev_priv) >= 8)
> > +		cmd |= (2<<0);
> > +
> > +	*cs++ = cmd;
> > +	*cs++ = addr | MI_REPORT_PERF_COUNT_GGTT;
> > +	*cs++ = request->global_seqno;
> 
> request->global_seqno is always zero at this point, and is subject to
> change after assignment.
> 
> [ctx:engine or request->fence.context]:request->fence.seqno is permanent.
> -Chris
> 
Thanks for pointing out. I'll use request->fence.seqno instead.

Regards,
Sourab



* Re: [PATCH 3/8] drm/i915: Framework for capturing command stream based OA reports
  2017-03-16  8:54     ` sourab gupta
@ 2017-03-16  9:03       ` Chris Wilson
  2017-03-16  9:52         ` sourab gupta
  0 siblings, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2017-03-16  9:03 UTC (permalink / raw)
  To: sourab gupta; +Cc: Daniel Vetter, intel-gfx, Auld, Matthew

On Thu, Mar 16, 2017 at 02:24:55PM +0530, sourab gupta wrote:
> On Thu, 2017-03-16 at 01:10 -0700, Chris Wilson wrote:
> > On Thu, Mar 16, 2017 at 11:44:10AM +0530, sourab.gupta@intel.com wrote:
> > > @@ -3593,6 +3670,7 @@ void i915_oa_init_reg_state(struct intel_engine_cs *engine,
> > >  void i915_oa_update_reg_state(struct intel_engine_cs *engine,
> > >  			      struct i915_gem_context *ctx,
> > >  			      uint32_t *reg_state);
> > > +void i915_perf_command_stream_hook(struct drm_i915_gem_request *req);
> > >  
> > >  /* i915_gem_evict.c */
> > >  int __must_check i915_gem_evict_something(struct i915_address_space *vm,
> > > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > > index aa75ea2..7af32c97 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > > @@ -1441,12 +1441,16 @@ static void eb_export_fence(struct drm_i915_gem_object *obj,
> > >  	if (exec_len == 0)
> > >  		exec_len = params->batch->size - params->args_batch_start_offset;
> > >  
> > > +	i915_perf_command_stream_hook(params->request);
> > 
> > Could you have named it anything more cryptic and inconsistent?
> 
> Sorry. Would 'i915_perf_capture_metrics' work?
> Can you please suggest an appropriate name otherwise.

The verb we use for writing into the command stream is emit. So
i915_perf_emit_samples() (emit_record is clumsy as it is not clear
whether 'record' is the verb or the noun).

> > 
> > Quite clearly this can fail, so justify the non handling of errors.
> > 
> > DRM_ERROR is not error handling, it is an indication that this patch
> > isn't ready.
> 
> Well, the intent was to have minimal effect on the normal execbuf routine,
> even if we fail. But, I guess I'm mistaken.
> I'll rectify this to handle the case where the perf callback fails.

That all depends on whether or not you can handle the unbalanced
metrics. If simply missing a report is fine, then just kill the
DRM_ERROR.

The bigger question is whether the following emit can fail -- once
the batch is in the request, no more failures are tolerable. You have to
preallocate reserved space.

Don't you need a flush before the emit following the batch?

> > > +void i915_perf_command_stream_hook(struct drm_i915_gem_request *request)
> > > +{
> > > +	struct intel_engine_cs *engine = request->engine;
> > > +	struct drm_i915_private *dev_priv = engine->i915;
> > > +	struct i915_perf_stream *stream;
> > > +
> > > +	if (!dev_priv->perf.initialized)
> > > +		return;
> > > +
> > > +	mutex_lock(&dev_priv->perf.streams_lock);
> > 
> > Justify a new global lock very, very carefully on execbuf.
> 
> The lock introduced here is to protect the perf.streams list against
> addition/deletion while we're processing the list during execbuf call.
> The other places where the mutex is taken is when a new stream is being
> created (using perf_open ioctl) or a stream is being destroyed
> (perf_release ioctl), which just protect the list_add and list_del into
> the perf.streams list.
> As such, there should not be much impact on execbuf path.
> Does this seem reasonable?

It doesn't sound like it needs a mutex, which will simplify the other
users as well (if switched to, for example, an RCU protected list).
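A minimal sketch of what that conversion could look like, assuming the field
and hook names from the patch under review (the RCU details here are an
illustration of the suggestion, not a reviewed design):

```c
/* perf_open/perf_release would use list_add_rcu()/list_del_rcu()
 * (still serialised against each other by a mutex), followed by
 * synchronize_rcu() before a stream may be freed. */

rcu_read_lock();
list_for_each_entry_rcu(stream, &dev_priv->perf.streams, link) {
	if (stream->state == I915_PERF_STREAM_ENABLED && stream->cs_mode)
		stream->ops->command_stream_hook(stream, request);
}
rcu_read_unlock();
```

This keeps the execbuf path lock-free: readers only pay for rcu_read_lock(),
while stream add/remove stays rare and can afford the heavier synchronisation.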

> > > +	list_for_each_entry(stream, &dev_priv->perf.streams, link) {
> > > +		if ((stream->state == I915_PERF_STREAM_ENABLED) &&
> > > +					stream->cs_mode)
> > > +			stream->ops->command_stream_hook(stream, request);
> > > +	}
> > > +	mutex_unlock(&dev_priv->perf.streams_lock);
> > > +}
> > 
> > > +static void i915_perf_command_stream_hook_oa(struct i915_perf_stream *stream,
> > > +					struct drm_i915_gem_request *request)
> > > +{
> > > +	struct drm_i915_private *dev_priv = request->i915;
> > > +	struct i915_gem_context *ctx = request->ctx;
> > > +	struct i915_perf_cs_sample *sample;
> > > +	u32 addr = 0;
> > > +	u32 cmd, *cs;
> > > +
> > > +	sample = kzalloc(sizeof(*sample), GFP_KERNEL);
> > > +	if (sample == NULL) {
> > > +		DRM_ERROR("Perf sample alloc failed\n");
> > > +		return;
> > > +	}
> > > +
> > > +	cs = intel_ring_begin(request, 4);
> > > +	if (IS_ERR(cs)) {
> > > +		kfree(sample);
> > > +		return;
> > > +	}
> > > +
> > > +	sample->ctx_id = ctx->hw_id;
> > > +	i915_gem_request_assign(&sample->request, request);
> > 
> > > +
> > > +	i915_gem_active_set(&stream->last_request, request);
> > 
> > Hint, you don't need your own request trap.
> Sorry, I didn't understand you fully here. I'm storing the reference to
> the last active request associated with the stream inside the
> last_request 'gem_active' field. Do I need to do something differently
> here?

It's the duplication.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/8] drm/i915: Framework for capturing command stream based OA reports
  2017-03-16  9:03       ` Chris Wilson
@ 2017-03-16  9:52         ` sourab gupta
  2017-03-16 10:09           ` Chris Wilson
  0 siblings, 1 reply; 24+ messages in thread
From: sourab gupta @ 2017-03-16  9:52 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx, Auld, Matthew

On Thu, 2017-03-16 at 02:03 -0700, Chris Wilson wrote:
> On Thu, Mar 16, 2017 at 02:24:55PM +0530, sourab gupta wrote:
> > On Thu, 2017-03-16 at 01:10 -0700, Chris Wilson wrote:
> > > On Thu, Mar 16, 2017 at 11:44:10AM +0530, sourab.gupta@intel.com wrote:
> > > > @@ -3593,6 +3670,7 @@ void i915_oa_init_reg_state(struct intel_engine_cs *engine,
> > > >  void i915_oa_update_reg_state(struct intel_engine_cs *engine,
> > > >  			      struct i915_gem_context *ctx,
> > > >  			      uint32_t *reg_state);
> > > > +void i915_perf_command_stream_hook(struct drm_i915_gem_request *req);
> > > >  
> > > >  /* i915_gem_evict.c */
> > > >  int __must_check i915_gem_evict_something(struct i915_address_space *vm,
> > > > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > > > index aa75ea2..7af32c97 100644
> > > > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > > > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > > > @@ -1441,12 +1441,16 @@ static void eb_export_fence(struct drm_i915_gem_object *obj,
> > > >  	if (exec_len == 0)
> > > >  		exec_len = params->batch->size - params->args_batch_start_offset;
> > > >  
> > > > +	i915_perf_command_stream_hook(params->request);
> > > 
> > > Could you have named it anything more cryptic and inconsistent?
> > 
> > Sorry. Would 'i915_perf_capture_metrics' work?
> > Can you please suggest an appropriate name otherwise.
> 
> The verb we use for writing into the command stream is emit. So
> i915_perf_emit_samples() (emit_record is clumsy as it is not clear
> whether 'record' is the verb or the noun).
> 
Thanks for suggesting. I'll use 'i915_perf_emit_samples' here.

> > > 
> > > Quite clearly this can fail, so justify the non-handling of errors.
> > > 
> > > DRM_ERROR is not error handling, it is an indication that this patch
> > > isn't ready.
> > 
> > Well, the intent was to have minimal effect on the normal execbuf routine,
> > even if we fail. But, I guess I'm mistaken.
> > I'll rectify this to handle the case where the perf callback fails.
> 
> That all depends on whether or not you can handle the unbalanced
> metrics. If simply missing a report is fine, then just kill the
> DRM_ERROR.
> 
> The bigger question is whether the following emit can fail -- once
> the batch is in the request, no more failures are tolerable. You have to
> preallocate reserved space.
> 
> Don't you need a flush before the emit following the batch?
> 

Ok. So, that would mean that we have to first of all reserve the space
required by two 'perf_emit_samples' calls, so that we can't fail for the
lack of space in the emit following the batch.
Probably, I could pass an additional boolean parameter 'reserve_space'
set to true in the first call, which would reserve the space for both
emit_samples() calls (through intel_ring_begin)?


Would it still need the flush before the emit following the batch?
Ideally, the metrics should be collected as close to batch as possible.
So, if there are cache flush/tlb invalidate commands, it would introduce
some lag between the batch and following emit_samples command.
Sorry, I'm not able to gauge the need for flush here. I can understand
it's needed before the batch is programmed for flushing the cache/TLB
entries for the new workload to be submitted. But for the Sample_emit
commands, which generally only capture OA/timestamp/mmio metrics, is
this still required? 


> > > > +void i915_perf_command_stream_hook(struct drm_i915_gem_request *request)
> > > > +{
> > > > +	struct intel_engine_cs *engine = request->engine;
> > > > +	struct drm_i915_private *dev_priv = engine->i915;
> > > > +	struct i915_perf_stream *stream;
> > > > +
> > > > +	if (!dev_priv->perf.initialized)
> > > > +		return;
> > > > +
> > > > +	mutex_lock(&dev_priv->perf.streams_lock);
> > > 
> > > Justify a new global lock very, very carefully on execbuf.
> > 
> > The lock introduced here is to protect the perf.streams list against
> > addition/deletion while we're processing the list during execbuf call.
> > The other places where the mutex is taken is when a new stream is being
> > created (using perf_open ioctl) or a stream is being destroyed
> > (perf_release ioctl), which just protect the list_add and list_del into
> > the perf.streams list.
> > As such, there should not be much impact on execbuf path.
> > Does this seem reasonable?
> 
> It doesn't sound like it needs a mutex, which will simplify the other
> users as well (if switched to, for example, an RCU protected list).

Ok. Sounds reasonable, I'll switch to using an RCU protected list here.

> 
> > > > +	list_for_each_entry(stream, &dev_priv->perf.streams, link) {
> > > > +		if ((stream->state == I915_PERF_STREAM_ENABLED) &&
> > > > +					stream->cs_mode)
> > > > +			stream->ops->command_stream_hook(stream, request);
> > > > +	}
> > > > +	mutex_unlock(&dev_priv->perf.streams_lock);
> > > > +}
> > > 
> > > > +static void i915_perf_command_stream_hook_oa(struct i915_perf_stream *stream,
> > > > +					struct drm_i915_gem_request *request)
> > > > +{
> > > > +	struct drm_i915_private *dev_priv = request->i915;
> > > > +	struct i915_gem_context *ctx = request->ctx;
> > > > +	struct i915_perf_cs_sample *sample;
> > > > +	u32 addr = 0;
> > > > +	u32 cmd, *cs;
> > > > +
> > > > +	sample = kzalloc(sizeof(*sample), GFP_KERNEL);
> > > > +	if (sample == NULL) {
> > > > +		DRM_ERROR("Perf sample alloc failed\n");
> > > > +		return;
> > > > +	}
> > > > +
> > > > +	cs = intel_ring_begin(request, 4);
> > > > +	if (IS_ERR(cs)) {
> > > > +		kfree(sample);
> > > > +		return;
> > > > +	}
> > > > +
> > > > +	sample->ctx_id = ctx->hw_id;
> > > > +	i915_gem_request_assign(&sample->request, request);
> > > 
> > > > +
> > > > +	i915_gem_active_set(&stream->last_request, request);
> > > 
> > > Hint, you don't need your own request trap.
> > Sorry, I didn't understand you fully here. I'm storing the reference to
> > the last active request associated with the stream inside the
> > last_request 'gem_active' field. Do I need to do something differently
> > here?
> 
> It's the duplication.

Are you suggesting that request_assign() and active_set() is
duplication? 
Actually, I'm using the last_request active tracker for the purpose of
waiting on the last request to complete, whenever needed.
But still I need to get reference for every request for which the
commands for collection of metrics are emitted. This is because I need
to check for their completion before collecting the associated metrics.
This is done inside 'append_command_stream_samples' function, which also
takes care of releasing the reference taken here.
Am I missing something w.r.t. the active_set() functionality?

> -Chris
> 

Thanks,
Sourab





* Re: [PATCH 3/8] drm/i915: Framework for capturing command stream based OA reports
  2017-03-16  9:52         ` sourab gupta
@ 2017-03-16 10:09           ` Chris Wilson
  2017-03-16 13:12             ` sourab gupta
  0 siblings, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2017-03-16 10:09 UTC (permalink / raw)
  To: sourab gupta; +Cc: Daniel Vetter, intel-gfx, Auld, Matthew

On Thu, Mar 16, 2017 at 03:22:03PM +0530, sourab gupta wrote:
> On Thu, 2017-03-16 at 02:03 -0700, Chris Wilson wrote:
> > On Thu, Mar 16, 2017 at 02:24:55PM +0530, sourab gupta wrote:
> > > On Thu, 2017-03-16 at 01:10 -0700, Chris Wilson wrote:
> > > > On Thu, Mar 16, 2017 at 11:44:10AM +0530, sourab.gupta@intel.com wrote:
> > > > > @@ -3593,6 +3670,7 @@ void i915_oa_init_reg_state(struct intel_engine_cs *engine,
> > > > >  void i915_oa_update_reg_state(struct intel_engine_cs *engine,
> > > > >  			      struct i915_gem_context *ctx,
> > > > >  			      uint32_t *reg_state);
> > > > > +void i915_perf_command_stream_hook(struct drm_i915_gem_request *req);
> > > > >  
> > > > >  /* i915_gem_evict.c */
> > > > >  int __must_check i915_gem_evict_something(struct i915_address_space *vm,
> > > > > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > > > > index aa75ea2..7af32c97 100644
> > > > > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > > > > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > > > > @@ -1441,12 +1441,16 @@ static void eb_export_fence(struct drm_i915_gem_object *obj,
> > > > >  	if (exec_len == 0)
> > > > >  		exec_len = params->batch->size - params->args_batch_start_offset;
> > > > >  
> > > > > +	i915_perf_command_stream_hook(params->request);
> > > > 
> > > > Could you have named it anything more cryptic and inconsistent?
> > > 
> > > Sorry. Would 'i915_perf_capture_metrics' work?
> > > Can you please suggest an appropriate name otherwise.
> > 
> > The verb we use for writing into the command stream is emit. So
> > i915_perf_emit_samples() (emit_record is clumsy as it is not clear
> > whether 'record' is the verb or the noun).
> > 
> Thanks for suggesting. I'll use 'i915_perf_emit_samples' here.
> 
> > > > 
> > > > Quite clearly this can fail, so justify the non-handling of errors.
> > > > 
> > > > DRM_ERROR is not error handling, it is an indication that this patch
> > > > isn't ready.
> > > 
> > > Well, the intent was to have minimal effect on the normal execbuf routine,
> > > even if we fail. But, I guess I'm mistaken.
> > > I'll rectify this to handle the case where the perf callback fails.
> > 
> > That all depends on whether or not you can handle the unbalanced
> > metrics. If simply missing a report is fine, then just kill the
> > DRM_ERROR.
> > 
> > The bigger question is whether the following emit can fail -- once
> > the batch is in the request, no more failures are tolerable. You have to
> > preallocate reserved space.
> > 
> > Don't you need a flush before the emit following the batch?
> > 
> 
> Ok. So, that would mean that we have to first of all reserve the space
> required by two 'perf_emit_samples' calls, so that we can't fail for the
> lack of space in the emit following the batch.
> Probably, I could pass an additional boolean parameter 'reserve_space'
> set to true in the first call, which would reserve the space for both
> emit_samples() calls (through intel_ring_begin)?

Hmm, yes, you can just tweak the request->reserved_space in the first
call to preallocate some space (and make sure it remains available) for the
second.

perf_emit_samples(rq, bool preallocate) {
	/* then in each sample callback */
	cs_len = foo;
	if (preallocate)
		rq->reserved_space += cs_len;
	else
		rq->reserved_space -= cs_len;
	cs = intel_ring_begin(rq, cs_len);
}
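As a concrete illustration of that accounting, here is a tiny self-contained
userspace model of the reserve-then-consume idea (all names below are invented
for the sketch; this is not the real i915 request API):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of ring-space accounting: reserved_space is the number of
 * bytes that must stay free.  The pre-batch emit grows the reservation
 * so the post-batch emit cannot run out of space; the post-batch emit
 * shrinks it again and then consumes the bytes it had reserved. */
struct request {
	size_t ring_space;	/* bytes currently free in the ring */
	size_t reserved_space;	/* bytes that must remain free */
};

/* Returns 0 on success, -1 if the ring cannot satisfy the request. */
static int ring_begin(struct request *rq, size_t len)
{
	if (rq->ring_space < len + rq->reserved_space)
		return -1;	/* the real code would wait or wrap here */
	rq->ring_space -= len;
	return 0;
}

static int perf_emit_sample(struct request *rq, size_t cs_len,
			    int preallocate)
{
	if (preallocate)
		rq->reserved_space += cs_len;	/* keep room for 2nd emit */
	else
		rq->reserved_space -= cs_len;	/* now spend that room */
	return ring_begin(rq, cs_len);
}
```

With a 64-byte ring, a first emit of 16 bytes also reserves 16 more for the
second emit, so an intervening 24-byte batch still leaves the post-batch emit
guaranteed to succeed.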
	
> Would it still need the flush before the emit following the batch?
> Ideally, the metrics should be collected as close to batch as possible.
> So, if there are cache flush/tlb invalidate commands, it would introduce
> some lag between the batch and following emit_samples command.
> Sorry, I'm not able to gauge the need for flush here. I can understand
> it's needed before the batch is programmed for flushing the cache/TLB
> entries for the new workload to be submitted. But for the Sample_emit
> commands, which generally only capture OA/timestamp/mmio metrics, is
> this still required? 

Depends on the desired accuracy. If you want your metrics sampled after
the user pipeline is completed, you need a stall at least, a flush if
your metrics include e.g. fragment data. If you want samples
taken whilst the user's batch is still executing, then no.
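For reference, such a stall is typically expressed as a PIPE_CONTROL ahead of
the sample capture; a rough, non-authoritative sketch in the emission style of
this series (the exact flush flags depend on which metrics are wanted, so
treat the bits below as placeholders):

```c
cs = intel_ring_begin(request, 4);
if (IS_ERR(cs))
	return PTR_ERR(cs);

*cs++ = GFX_OP_PIPE_CONTROL(4);
*cs++ = PIPE_CONTROL_CS_STALL |			/* wait for prior commands */
	PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;	/* if fragment data matters */
*cs++ = 0;	/* no post-sync address */
*cs++ = 0;	/* no post-sync data */
intel_ring_advance(request, cs);
```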

> > > > > +void i915_perf_command_stream_hook(struct drm_i915_gem_request *request)
> > > > > +{
> > > > > +	struct intel_engine_cs *engine = request->engine;
> > > > > +	struct drm_i915_private *dev_priv = engine->i915;
> > > > > +	struct i915_perf_stream *stream;
> > > > > +
> > > > > +	if (!dev_priv->perf.initialized)
> > > > > +		return;
> > > > > +
> > > > > +	mutex_lock(&dev_priv->perf.streams_lock);
> > > > 
> > > > Justify a new global lock very, very carefully on execbuf.
> > > 
> > > The lock introduced here is to protect the perf.streams list against
> > > addition/deletion while we're processing the list during execbuf call.
> > > The other places where the mutex is taken is when a new stream is being
> > > created (using perf_open ioctl) or a stream is being destroyed
> > > (perf_release ioctl), which just protect the list_add and list_del into
> > > the perf.streams list.
> > > As such, there should not be much impact on execbuf path.
> > > Does this seem reasonable?
> > 
> > It doesn't sound like it needs a mutex, which will simplify the other
> > users as well (if switched to, for example, an RCU protected list).
> 
> Ok. Sounds reasonable, I'll switch to using an RCU protected list here.

(I may be overthinking this, but it still seems overkill and made the
timer callback uglier than expected.)

> > > > > +	list_for_each_entry(stream, &dev_priv->perf.streams, link) {
> > > > > +		if ((stream->state == I915_PERF_STREAM_ENABLED) &&
> > > > > +					stream->cs_mode)
> > > > > +			stream->ops->command_stream_hook(stream, request);
> > > > > +	}
> > > > > +	mutex_unlock(&dev_priv->perf.streams_lock);
> > > > > +}
> > > > 
> > > > > +static void i915_perf_command_stream_hook_oa(struct i915_perf_stream *stream,
> > > > > +					struct drm_i915_gem_request *request)
> > > > > +{
> > > > > +	struct drm_i915_private *dev_priv = request->i915;
> > > > > +	struct i915_gem_context *ctx = request->ctx;
> > > > > +	struct i915_perf_cs_sample *sample;
> > > > > +	u32 addr = 0;
> > > > > +	u32 cmd, *cs;
> > > > > +
> > > > > +	sample = kzalloc(sizeof(*sample), GFP_KERNEL);
> > > > > +	if (sample == NULL) {
> > > > > +		DRM_ERROR("Perf sample alloc failed\n");
> > > > > +		return;
> > > > > +	}
> > > > > +
> > > > > +	cs = intel_ring_begin(request, 4);
> > > > > +	if (IS_ERR(cs)) {
> > > > > +		kfree(sample);
> > > > > +		return;
> > > > > +	}
> > > > > +
> > > > > +	sample->ctx_id = ctx->hw_id;
> > > > > +	i915_gem_request_assign(&sample->request, request);
> > > > 
> > > > > +
> > > > > +	i915_gem_active_set(&stream->last_request, request);
> > > > 
> > > > Hint, you don't need your own request trap.
> > > Sorry, I didn't understand you fully here. I'm storing the reference to
> > > the last active request associated with the stream inside the
> > > last_request 'gem_active' field. Do I need to do something differently
> > > here?
> > 
> > It's the duplication.
> 
> Are you suggesting that request_assign() and active_set() is
> duplication? 
> Actually, I'm using the last_request active tracker for the purpose of
> waiting on the last request to complete, whenever needed.
> But still I need to get reference for every request for which the
> commands for collection of metrics are emitted. This is because I need
> to check for their completion before collecting the associated metrics.

Missed the sample / stream difference. request_assign() means updating the
request field; had you used
	sample->request = i915_gem_request_get(request)
it would have been clearer.

Note that the requests are not ordered for the stream, so you cannot
track the last_request so easily.

> This is done inside 'append_command_stream_samples' function, which also
> takes care of releasing the reference taken here.
> Am I missing something w.r.t. the active_set() functionality?

I was mostly thrown by the idea that you were reassigning requests,
which history has shown to be a bad idea (hence i915_gem_active).
However, stream->last_request should be a reservation_object.
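A hedged sketch of that suggestion, using the reservation_object helpers of
roughly this kernel generation (names approximate; the locking details would
need checking against the actual API):

```c
/* at emit time, instead of i915_gem_active_set(): */
reservation_object_lock(stream->resv, NULL);
reservation_object_add_excl_fence(stream->resv, &request->fence);
reservation_object_unlock(stream->resv);

/* later, to wait for everything emitted on the stream so far: */
reservation_object_wait_timeout_rcu(stream->resv,
				    true,	/* wait for all fences */
				    true,	/* interruptible */
				    MAX_SCHEDULE_TIMEOUT);
```

Since a reservation_object collects fences from many requests, it also sidesteps
the fact that the stream's requests are not ordered.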
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [PATCH 0/8] Collect command stream based OA reports using i915 perf
  2017-03-16  6:14 [PATCH 0/8] Collect command stream based OA reports using i915 perf sourab.gupta
                   ` (8 preceding siblings ...)
  2017-03-16  6:30 ` ✓ Fi.CI.BAT: success for Collect command stream based OA reports using i915 perf Patchwork
@ 2017-03-16 12:59 ` Robert Bragg
  2017-03-17  5:29   ` sourab gupta
  9 siblings, 1 reply; 24+ messages in thread
From: Robert Bragg @ 2017-03-16 12:59 UTC (permalink / raw)
  To: Gupta, Sourab; +Cc: Daniel Vetter, Intel Graphics Development, Matthew Auld

On Thu, Mar 16, 2017 at 6:14 AM,  <sourab.gupta@intel.com> wrote:
> From: Sourab Gupta <sourab.gupta@intel.com>
>
> This series adds framework for collection of OA reports associated with the
> render command stream, which are collected around batchbuffer boundaries.
>
> Refloating the series rebased on Robert's latest patch set for
> 'Enabling OA unit for Gen 8 and 9 in i915 perf', which can be found here:
> https://patchwork.freedesktop.org/series/20084/
>
> Since Robert's patches are being reviewed and this patch series extends his
> framework to collect command stream based OA metrics, it would be good to keep
> this work in perspective. Looking to receive feedback (and possibly r-b's :))
> on the series.
>
> Since the OA reports collected associated with the render command stream, this
> also gives us the ability to collect other metadata such as ctx_id, pid, etc.
> with the samples, and thus we can establish the association of samples
> collected with the corresponding process/workload.
>
> These patches can be found for viewing at
> https://github.com/sourabgu/linux/tree/oa-6march2017
>
> Sourab Gupta (8):
>   drm/i915: Add ctx getparam ioctl parameter to retrieve ctx unique id
>   drm/i915: Expose OA sample source to userspace
>   drm/i915: Framework for capturing command stream based OA reports
>   drm/i915: flush periodic samples, in case of no pending CS sample
>     requests
>   drm/i915: Inform userspace about command stream OA buf overflow
>   drm/i915: Populate ctx ID for periodic OA reports
>   drm/i915: Add support for having pid output with OA report
>   drm/i915: Add support for emitting execbuffer tags through OA counter
>     reports

Thanks for the updated series Sourab.

I think it could really help to have a pointer to some userspace that
can be used to exercise these new features. Maybe you could look at
adding support to the gputop-csv command line tool which is probably
the simplest, usable userspace for i915 perf we have currently. A
pointer to some work-in-progress IGT tests could be good too.

Br,
- Robert

>
>  drivers/gpu/drm/i915/i915_drv.h            |  125 ++-
>  drivers/gpu/drm/i915/i915_gem_context.c    |    3 +
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |    6 +
>  drivers/gpu/drm/i915/i915_perf.c           | 1149 ++++++++++++++++++++++++----
>  include/uapi/drm/i915_drm.h                |   49 ++
>  5 files changed, 1184 insertions(+), 148 deletions(-)
>
> --
> 1.9.1
>


* Re: [PATCH 3/8] drm/i915: Framework for capturing command stream based OA reports
  2017-03-16 10:09           ` Chris Wilson
@ 2017-03-16 13:12             ` sourab gupta
  2017-03-16 13:27               ` Chris Wilson
  0 siblings, 1 reply; 24+ messages in thread
From: sourab gupta @ 2017-03-16 13:12 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx, Auld, Matthew

On Thu, 2017-03-16 at 03:09 -0700, Chris Wilson wrote:
> On Thu, Mar 16, 2017 at 03:22:03PM +0530, sourab gupta wrote:
> > On Thu, 2017-03-16 at 02:03 -0700, Chris Wilson wrote:
> > > On Thu, Mar 16, 2017 at 02:24:55PM +0530, sourab gupta wrote:
> > > > On Thu, 2017-03-16 at 01:10 -0700, Chris Wilson wrote:
> > > > > On Thu, Mar 16, 2017 at 11:44:10AM +0530, sourab.gupta@intel.com wrote:
> > > > > > @@ -3593,6 +3670,7 @@ void i915_oa_init_reg_state(struct intel_engine_cs *engine,
> > > > > >  void i915_oa_update_reg_state(struct intel_engine_cs *engine,
> > > > > >  			      struct i915_gem_context *ctx,
> > > > > >  			      uint32_t *reg_state);
> > > > > > +void i915_perf_command_stream_hook(struct drm_i915_gem_request *req);
> > > > > >  
> > > > > >  /* i915_gem_evict.c */
> > > > > >  int __must_check i915_gem_evict_something(struct i915_address_space *vm,
> > > > > > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > > > > > index aa75ea2..7af32c97 100644
> > > > > > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > > > > > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > > > > > @@ -1441,12 +1441,16 @@ static void eb_export_fence(struct drm_i915_gem_object *obj,
> > > > > >  	if (exec_len == 0)
> > > > > >  		exec_len = params->batch->size - params->args_batch_start_offset;
> > > > > >  
> > > > > > +	i915_perf_command_stream_hook(params->request);
> > > > > 
> > > > > Could you have named it anything more cryptic and inconsistent?
> > > > 
> > > > Sorry. Would 'i915_perf_capture_metrics' work?
> > > > Can you please suggest an appropriate name otherwise.
> > > 
> > > The verb we use for writing into the command stream is emit. So
> > > i915_perf_emit_samples() (emit_record is clumsy as it is not clear
> > > whether 'record' is the verb or the noun).
> > > 
> > Thanks for suggesting. I'll use 'i915_perf_emit_samples' here.
> > 
> > > > > 
> > > > > Quite clearly this can fail, so justify the non-handling of errors.
> > > > > 
> > > > > DRM_ERROR is not error handling, it is an indication that this patch
> > > > > isn't ready.
> > > > 
> > > > Well, the intent was to have minimal effect on the normal execbuf routine,
> > > > even if we fail. But, I guess I'm mistaken.
> > > > I'll rectify this to handle the case where the perf callback fails.
> > > 
> > > That all depends on whether or not you can handle the unbalanced
> > > metrics. If simply missing a report is fine, then just kill the
> > > DRM_ERROR.
> > > 
> > > The bigger question is whether the following emit can fail -- once
> > > the batch is in the request, no more failures are tolerable. You have to
> > > preallocate reserved space.
> > > 
> > > Don't you need a flush before the emit following the batch?
> > > 
> > 
> > Ok. So, that would mean that we have to first of all reserve the space
> > required by two 'perf_emit_samples' calls, so that we can't fail for the
> > lack of space in the emit following the batch.
> > Probably, I could pass an additional boolean parameter 'reserve_space'
> > set to true in the first call, which would reserve the space for both
> > emit_samples() calls (through intel_ring_begin)?
> 
> Hmm, yes, you can just tweak the request->reserved_space in the first
> call to preallocate some space (and make sure it remains available) for the
> second.
> 
> perf_emit_samples(rq, bool preallocate) {
> 	/* then in each sample callback */
> 	cs_len = foo;
> 	if (preallocate)
> 		rq->reserved_space += cs_len;
> 	else
> 		rq->reserved_space -= cs_len;
> 	cs = intel_ring_begin(rq, cs_len);
> }
> 	
Ok. I would do that. Thanks for suggesting.

> > Would it still need the flush before the emit following the batch?
> > Ideally, the metrics should be collected as close to batch as possible.
> > So, if there are cache flush/tlb invalidate commands, it would introduce
> > some lag between the batch and following emit_samples command.
> > Sorry, I'm not able to gauge the need for flush here. I can understand
> > it's needed before the batch is programmed for flushing the cache/TLB
> > entries for the new workload to be submitted. But for the Sample_emit
> > commands, which generally only capture OA/timestamp/mmio metrics, is
> > this still required? 
> 
> Depends on the desired accuracy. If you want your metrics sampled after
> the user pipeline is completed, you need a stall at least, a flush if
> your metrics include e.g. fragment data. If you want samples
> taken whilst the user's batch is still executing, then no.
> 
I guess, in that case, I would at least need a stall to ensure the user
pipeline is completed before metrics are collected.
Any additional inputs here, Robert?

> > > > > > +void i915_perf_command_stream_hook(struct drm_i915_gem_request *request)
> > > > > > +{
> > > > > > +	struct intel_engine_cs *engine = request->engine;
> > > > > > +	struct drm_i915_private *dev_priv = engine->i915;
> > > > > > +	struct i915_perf_stream *stream;
> > > > > > +
> > > > > > +	if (!dev_priv->perf.initialized)
> > > > > > +		return;
> > > > > > +
> > > > > > +	mutex_lock(&dev_priv->perf.streams_lock);
> > > > > 
> > > > > Justify a new global lock very, very carefully on execbuf.
> > > > 
> > > > The lock introduced here is to protect the perf.streams list against
> > > > addition/deletion while we're processing the list during execbuf call.
> > > > The other places where the mutex is taken is when a new stream is being
> > > > created (using perf_open ioctl) or a stream is being destroyed
> > > > (perf_release ioctl), which just protect the list_add and list_del into
> > > > the perf.streams list.
> > > > As such, there should not be much impact on execbuf path.
> > > > Does this seem reasonable?
> > > 
> > > It doesn't sound like it needs a mutex, which will simplify the other
> > > users as well (if switched to, for example, an RCU protected list).
> > 
> > Ok. Sounds reasonable, I'll switch to using an RCU protected list here.
> 
> (I may be overthinking this, but it still seems overkill and made the
> timer callback uglier than expected.)
> 
Are you suggesting that using RCU here is overkill, and a mutex would do?

> > > > > > +	list_for_each_entry(stream, &dev_priv->perf.streams, link) {
> > > > > > +		if ((stream->state == I915_PERF_STREAM_ENABLED) &&
> > > > > > +					stream->cs_mode)
> > > > > > +			stream->ops->command_stream_hook(stream, request);
> > > > > > +	}
> > > > > > +	mutex_unlock(&dev_priv->perf.streams_lock);
> > > > > > +}
> > > > > 
> > > > > > +static void i915_perf_command_stream_hook_oa(struct i915_perf_stream *stream,
> > > > > > +					struct drm_i915_gem_request *request)
> > > > > > +{
> > > > > > +	struct drm_i915_private *dev_priv = request->i915;
> > > > > > +	struct i915_gem_context *ctx = request->ctx;
> > > > > > +	struct i915_perf_cs_sample *sample;
> > > > > > +	u32 addr = 0;
> > > > > > +	u32 cmd, *cs;
> > > > > > +
> > > > > > +	sample = kzalloc(sizeof(*sample), GFP_KERNEL);
> > > > > > +	if (sample == NULL) {
> > > > > > +		DRM_ERROR("Perf sample alloc failed\n");
> > > > > > +		return;
> > > > > > +	}
> > > > > > +
> > > > > > +	cs = intel_ring_begin(request, 4);
> > > > > > +	if (IS_ERR(cs)) {
> > > > > > +		kfree(sample);
> > > > > > +		return;
> > > > > > +	}
> > > > > > +
> > > > > > +	sample->ctx_id = ctx->hw_id;
> > > > > > +	i915_gem_request_assign(&sample->request, request);
> > > > > 
> > > > > > +
> > > > > > +	i915_gem_active_set(&stream->last_request, request);
> > > > > 
> > > > > Hint, you don't need your own request trap.
> > > > Sorry, I didn't understand you fully here. I'm storing the reference to
> > > > the last active request associated with the stream inside the
> > > > last_request 'gem_active' field. Do I need to do something differently
> > > > here?
> > > 
> > > It's the duplication.
> > 
> > Are you suggesting that request_assign() and active_set() is
> > duplication? 
> > Actually, I'm using the last_request active tracker for the purpose of
> > waiting on the last request to complete, whenever needed.
> > But still I need to get reference for every request for which the
> > commands for collection of metrics are emitted. This is because I need
> > to check for their completion before collecting the associated metrics.
> 
> Missed the sample / stream difference. request_assign() means updating the
> request field; had you used
> 	sample->request = i915_gem_request_get(request)
> it would have been clearer.
> 
> Note that the requests are not ordered for the stream, so you cannot
> track the last_request so easily.
> 
> > This is done inside 'append_command_stream_samples' function, which also
> > takes care of releasing the reference taken here.
> > Am I missing something w.r.t. the active_set() functionality?
> 
> I was mostly thrown by the idea that you were reassigning requests,
> which history has shown to be a bad idea (hence i915_gem_active).
> However, stream->last_request should be a reservation_object.
> -Chris
> 
If I understand correctly, I need to have last_request be of type
reservation_object, holding all fences corresponding to the requests
pertaining to the stream. So, instead of i915_gem_active_set, I need to
do something like this:
	reservation_object_add_excl_fence(obj->resv, &rq->fence); /* Or any
other api to be used here ? */
And, when we want to wait for requests submitted on the stream to
complete, we have to wait for fences associated with the 'last_request'
reservation_object, e.g. as done in the
'i915_gem_object_wait_reservation' function.
Is this all there is to it, or am I missing some piece of action with
this reservation_object?

Thanks,
Sourab

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/8] drm/i915: Framework for capturing command stream based OA reports
  2017-03-16 13:12             ` sourab gupta
@ 2017-03-16 13:27               ` Chris Wilson
  0 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2017-03-16 13:27 UTC (permalink / raw)
  To: sourab gupta; +Cc: Daniel Vetter, intel-gfx, Auld, Matthew

On Thu, Mar 16, 2017 at 06:42:21PM +0530, sourab gupta wrote:
> On Thu, 2017-03-16 at 03:09 -0700, Chris Wilson wrote:
> > On Thu, Mar 16, 2017 at 03:22:03PM +0530, sourab gupta wrote:
> > > On Thu, 2017-03-16 at 02:03 -0700, Chris Wilson wrote:
> > > > On Thu, Mar 16, 2017 at 02:24:55PM +0530, sourab gupta wrote:
> > > > > On Thu, 2017-03-16 at 01:10 -0700, Chris Wilson wrote:
> > > > > > On Thu, Mar 16, 2017 at 11:44:10AM +0530, sourab.gupta@intel.com wrote:
> > > > > > > +void i915_perf_command_stream_hook(struct drm_i915_gem_request *request)
> > > > > > > +{
> > > > > > > +	struct intel_engine_cs *engine = request->engine;
> > > > > > > +	struct drm_i915_private *dev_priv = engine->i915;
> > > > > > > +	struct i915_perf_stream *stream;
> > > > > > > +
> > > > > > > +	if (!dev_priv->perf.initialized)
> > > > > > > +		return;
> > > > > > > +
> > > > > > > +	mutex_lock(&dev_priv->perf.streams_lock);
> > > > > > 
> > > > > > Justify a new global lock very, very carefully on execbuf.
> > > > > 
> > > > > The lock introduced here is to protect the the perf.streams list against
> > > > > addition/deletion while we're processing the list during execbuf call.
> > > > > The other places where the mutex is taken is when a new stream is being
> > > > > created (using perf_open ioctl) or a stream is being destroyed
> > > > > (perf_release ioctl), which just protect the list_add and list_del into
> > > > > the perf.streams list.
> > > > > As such, there should not be much impact on execbuf path.
> > > > > Does this seem reasonable?
> > > > 
> > > > It doesn't sound like it needs a mutex, which will simplify the other
> > > > users as well (if switched to, for example, an RCU protected list).
> > > 
> > > Ok. Sounds reasonable, I'll switch to using an RCU protected list here.
> > 
> > (I may be overthinking this, but it still seems overkill and made the
> > timer callback uglier than expected.)
> > 
> Are you suggesting that using RCU here is overkill, and mutex would do?

No, I still think it can be done without a mutex, just slightly more
ugly due to the potential sleep here (perhaps srcu?). It would just be a
shame when we get parallel submission of execbuf sorted for it all to be
serialised by walking a simple list.
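
The lockless list Chris suggests might be sketched roughly as below. This is
a non-runnable kernel-style sketch, not a tested implementation; the names
`perf.streams` and `stream->link` follow the patch under review, the
`command_stream_hook` op is hypothetical, and SRCU would be needed instead of
plain RCU if the per-stream work can sleep:

```c
/* Writer side (perf open/release ioctls): writers still serialise
 * against each other, but readers take no lock at all. */
mutex_lock(&dev_priv->perf.lock);
list_add_rcu(&stream->link, &dev_priv->perf.streams);
mutex_unlock(&dev_priv->perf.lock);

/* ... on stream destruction ... */
list_del_rcu(&stream->link);
synchronize_rcu();      /* wait for in-flight readers before freeing */
kfree(stream);

/* Reader side (the execbuf hook): lockless list walk. */
rcu_read_lock();
list_for_each_entry_rcu(stream, &dev_priv->perf.streams, link)
	stream->ops->command_stream_hook(stream, request);
rcu_read_unlock();
```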

> > I was mostly thrown by the idea that you were reassigning requests,
> > which history has shown to be a bad idea (hence i915_gem_active).
> > However, stream->last_request should be a reservation_object.
> > 
> If I understand correctly, I need to have last_request of type
> reservation object to hold all fences corresponding to the requests
> pertaining to the stream. So, instead of i915_gem_active_set, I need to
> do something like this:
> 	reservation_object_add_excl_fence(obj->resv, &rq->fence); /* Or any
> other api to be used here ? */

reservation_object_lock();
if (reservation_object_reserve_shared() == 0)
	reservation_object_add_shared_fence();
reservation_object_unlock();

> And, when we want to wait for requests submitted on the stream to
> complete, we have to wait for fences associated with the 'last_request'
> reservation_object, for e.g. as done in
> 'i915_gem_object_wait_reservation' function.
> Is this all there is to it, or am I missing some piece of action with
> this reservation_object?

reservation_object_test_signaled_rcu(.test_all = true)
reservation_object_wait_timeout_rcu(.wait_all = true)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 2/8] drm/i915: Expose OA sample source to userspace
  2017-03-16  6:14 ` [PATCH 2/8] drm/i915: Expose OA sample source to userspace sourab.gupta
@ 2017-03-16 14:20   ` Robert Bragg
  0 siblings, 0 replies; 24+ messages in thread
From: Robert Bragg @ 2017-03-16 14:20 UTC (permalink / raw)
  To: Gupta, Sourab; +Cc: Daniel Vetter, Intel Graphics Development, Matthew Auld

On Thu, Mar 16, 2017 at 6:14 AM,  <sourab.gupta@intel.com> wrote:
> From: Sourab Gupta <sourab.gupta@intel.com>
>
> This patch exposes a new sample source field to userspace. This field can
> be populated to specify the origin of the OA report.
> Currently, the OA samples are generated only periodically, and hence
> there's only a single source flag in the enum definition right now, but
> there are other means of generating OA samples, such as via MI_RPC
> commands. The OA_SOURCE sample type introduces a mechanism (for
> userspace) to distinguish OA reports generated via different sources.

Maybe we could clarify that it's not intended as a replacement for the
reason field that's part of Gen8+ OA reports. For automatically
triggered reports written to the OABUFFER, the reason field will
distinguish e.g. periodic vs ctx-switch vs GO transition reasons for
the OA unit writing a report. The reason field is overloaded as the
RPT_ID field for MI_RPC reports, so we need our own way of tracking the
difference.

>
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> Signed-off-by: Robert Bragg <robert@sixbynine.org>
> ---
>  drivers/gpu/drm/i915/i915_perf.c | 18 ++++++++++++++++++
>  include/uapi/drm/i915_drm.h      | 14 ++++++++++++++
>  2 files changed, 32 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 4b1db73..540c5b2 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -324,6 +324,7 @@
>  };
>
>  #define SAMPLE_OA_REPORT      (1<<0)
> +#define SAMPLE_OA_SOURCE_INFO  (1<<1)
>
>  /**
>   * struct perf_open_properties - for validated properties given to open a stream
> @@ -659,6 +660,15 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>                 return -EFAULT;
>         buf += sizeof(header);
>

I think I'd add a note mentioning that the ordering of _OA_SOURCE_INFO
before the _OA_REPORT forms part of the uapi - just as an extra guard
against someone reorganising the code without realising that. Maybe
overly paranoid though.

> +       if (sample_flags & SAMPLE_OA_SOURCE_INFO) {
> +               enum drm_i915_perf_oa_event_source source =
> +                               I915_PERF_OA_EVENT_SOURCE_PERIODIC;
> +
> +               if (copy_to_user(buf, &source, 4))
> +                       return -EFAULT;
> +               buf += 4;
> +       }

I think we should keep all sample sections 8 byte aligned to keep any
embedded u64 values we might want within records naturally aligned.
Currently we know OA reports all have power-of-two sizes of 64, 128 or
256 bytes. The _OA_REPORT_LOST and _OA_BUFFER_LOST records only have
an 8 byte header.
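
The alignment arithmetic here can be illustrated with a small userspace
sketch (the sizes are illustrative values from the discussion, not the
driver's actual layout):

```c
#include <stddef.h>

/* A sample record is laid out as: header, optional source section,
 * then the raw OA report.  Rounding each optional section up to a
 * multiple of 8 bytes keeps any u64 members in later sections
 * naturally aligned. */
#define ALIGN8(x) (((x) + 7u) & ~7u)

static unsigned report_offset(unsigned header_size, unsigned source_size)
{
	/* Offset of the OA report within the sample record. */
	return header_size + ALIGN8(source_size);
}
```

With an 8-byte header and a 4-byte source field padded out to 8, a
64/128/256-byte OA report always starts on an 8-byte boundary.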

'PERIODIC' may not be an accurate differentiator. As mentioned above
with the comment about then Gen8+ 'reason' field there are a number of
different automatic triggers for writing OA reports to the OABUFFER.
Maybe 'OABUFFER' would be better.

I'm not sure the final '_INFO' suffix is necessary here. With that
suffix I'd expect richer, more-detailed data, and I'd expect the data
to be about the source, while this is just about identifying the
source. (Not too important at this stage - just mentioning how it
comes across when I read it currently.)

> +
>         if (sample_flags & SAMPLE_OA_REPORT) {
>                 if (copy_to_user(buf, report, report_size))
>                         return -EFAULT;
> @@ -2030,6 +2040,11 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
>         stream->sample_flags |= SAMPLE_OA_REPORT;
>         stream->sample_size += format_size;
>
> +       if (props->sample_flags & SAMPLE_OA_SOURCE_INFO) {
> +               stream->sample_flags |= SAMPLE_OA_SOURCE_INFO;
> +               stream->sample_size += 4;
> +       }

8 - as commented above.

> +
>         dev_priv->perf.oa.oa_buffer.format_size = format_size;
>         if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
>                 return -EINVAL;
> @@ -2814,6 +2829,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
>                         props->oa_periodic = true;
>                         props->oa_period_exponent = value;
>                         break;
> +               case DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE:
> +                       props->sample_flags |= SAMPLE_OA_SOURCE_INFO;
> +                       break;
>                 default:
>                         MISSING_CASE(id);
>                         DRM_DEBUG("Unknown i915 perf property ID\n");
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 835e711..c597e36 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1312,6 +1312,12 @@ enum drm_i915_oa_format {
>         I915_OA_FORMAT_MAX          /* non-ABI */
>  };
>
> +enum drm_i915_perf_oa_event_source {
> +       I915_PERF_OA_EVENT_SOURCE_PERIODIC,
> +
> +       I915_PERF_OA_EVENT_SOURCE_MAX   /* non-ABI */
> +};
> +
>  enum drm_i915_perf_property_id {
>         /**
>          * Open the stream for a specific context handle (as used with
> @@ -1346,6 +1352,13 @@ enum drm_i915_perf_property_id {
>          */
>         DRM_I915_PERF_PROP_OA_EXPONENT,
>
> +       /**
> +        * The value of this property set to 1 requests inclusion of sample
> +        * source field to be given to userspace. The sample source field
> +        * specifies the origin of OA report.
> +        */
> +       DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE,
> +
>         DRM_I915_PERF_PROP_MAX /* non-ABI */
>  };
>
> @@ -1411,6 +1424,7 @@ enum drm_i915_perf_record_type {
>          * struct {
>          *     struct drm_i915_perf_record_header header;
>          *
> +        *     { u32 source_info; } && DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE

u64

Considering how 8 bytes is obviously overkill for the info here I'm
wondering if we should consider maintaining the ability to pack more
data into the same section in the future.

Maybe u64 source_info:4, reserved:60 bitfields could make it easier to
combine with more data later? (This is a convention taken from
uapi/linux/perf_event.h.)
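
The bitfield convention being referred to might look like this (a
hypothetical layout for illustration only; the field widths and the
`oa_sample_source` name are not from the patch):

```c
#include <stdint.h>

/* Hypothetical packing following the uapi/linux/perf_event.h bitfield
 * convention: a few bits identify the source, the rest are reserved,
 * so the section stays a single u64 and can absorb more data later
 * without changing the record layout. */
struct oa_sample_source {
	uint64_t source   : 4;    /* e.g. 0 == OABUFFER/periodic */
	uint64_t reserved : 60;   /* must be zero for now */
};
```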


>          *     { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
>          * };
>          */
> --
> 1.9.1
>

* Re: [PATCH 0/8] Collect command stream based OA reports using i915 perf
  2017-03-16 12:59 ` [PATCH 0/8] " Robert Bragg
@ 2017-03-17  5:29   ` sourab gupta
  0 siblings, 0 replies; 24+ messages in thread
From: sourab gupta @ 2017-03-17  5:29 UTC (permalink / raw)
  To: Robert Bragg; +Cc: Daniel Vetter, Intel Graphics Development, Auld, Matthew

On Thu, 2017-03-16 at 05:59 -0700, Robert Bragg wrote:
> On Thu, Mar 16, 2017 at 6:14 AM,  <sourab.gupta@intel.com> wrote:
> > From: Sourab Gupta <sourab.gupta@intel.com>
> >
> > This series adds framework for collection of OA reports associated with the
> > render command stream, which are collected around batchbuffer boundaries.
> >
> > Refloating the series rebased on Robert's latest patch set for
> > 'Enabling OA unit for Gen 8 and 9 in i915 perf', which can be found here:
> > https://patchwork.freedesktop.org/series/20084/
> >
> > Since Robert's patches are being reviewed and this patch series extends his
> > framework to collect command stream based OA metrics, it would be good to keep
> > this work in perspective. Looking to receive feedback (and possibly r-b's :))
> > on the series.
> >
> > Since the OA reports are collected in association with the render command
> > stream, this also gives us the ability to collect other metadata such as
> > ctx_id, pid, etc. with the samples, and thus we can associate the samples
> > collected with the corresponding process/workload.
> >
> > These patches can be found for viewing at
> > https://github.com/sourabgu/linux/tree/oa-6march2017
> >
> > Sourab Gupta (8):
> >   drm/i915: Add ctx getparam ioctl parameter to retrieve ctx unique id
> >   drm/i915: Expose OA sample source to userspace
> >   drm/i915: Framework for capturing command stream based OA reports
> >   drm/i915: flush periodic samples, in case of no pending CS sample
> >     requests
> >   drm/i915: Inform userspace about command stream OA buf overflow
> >   drm/i915: Populate ctx ID for periodic OA reports
> >   drm/i915: Add support for having pid output with OA report
> >   drm/i915: Add support for emitting execbuffer tags through OA counter
> >     reports
> 
> Thanks for the updated series Sourab.
> 
> I think it could really help to have a pointer to some userspace that
> can be used to exercise these new features. Maybe you could look at
> adding support to the gputop-csv command line tool which is probably
> the simplest, usable userspace for i915 perf we have currently. A
> pointer to some work-in-progress IGT tests could be good too.
> 
> Br,
> - Robert
> 
Hi Robert,

I'll aim to publish this during next revision.

Thanks,
Sourab
> >
> >  drivers/gpu/drm/i915/i915_drv.h            |  125 ++-
> >  drivers/gpu/drm/i915/i915_gem_context.c    |    3 +
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c |    6 +
> >  drivers/gpu/drm/i915/i915_perf.c           | 1149 ++++++++++++++++++++++++----
> >  include/uapi/drm/i915_drm.h                |   49 ++
> >  5 files changed, 1184 insertions(+), 148 deletions(-)
> >
> > --
> > 1.9.1
> >



end of thread, other threads:[~2017-03-17  5:26 UTC | newest]

Thread overview: 24+ messages
2017-03-16  6:14 [PATCH 0/8] Collect command stream based OA reports using i915 perf sourab.gupta
2017-03-16  6:14 ` [PATCH 1/8] drm/i915: Add ctx getparam ioctl parameter to retrieve ctx unique id sourab.gupta
2017-03-16  8:23   ` Chris Wilson
2017-03-16  6:14 ` [PATCH 2/8] drm/i915: Expose OA sample source to userspace sourab.gupta
2017-03-16 14:20   ` Robert Bragg
2017-03-16  6:14 ` [PATCH 3/8] drm/i915: Framework for capturing command stream based OA reports sourab.gupta
2017-03-16  8:10   ` Chris Wilson
2017-03-16  8:54     ` sourab gupta
2017-03-16  9:03       ` Chris Wilson
2017-03-16  9:52         ` sourab gupta
2017-03-16 10:09           ` Chris Wilson
2017-03-16 13:12             ` sourab gupta
2017-03-16 13:27               ` Chris Wilson
2017-03-16  8:31   ` Chris Wilson
2017-03-16  8:57     ` sourab gupta
2017-03-16  6:14 ` [PATCH 4/8] drm/i915: flush periodic samples, in case of no pending CS sample requests sourab.gupta
2017-03-16  6:14 ` [PATCH 5/8] drm/i915: Inform userspace about command stream OA buf overflow sourab.gupta
2017-03-16  6:14 ` [PATCH 6/8] drm/i915: Populate ctx ID for periodic OA reports sourab.gupta
2017-03-16  6:14 ` [PATCH 7/8] drm/i915: Add support for having pid output with OA report sourab.gupta
2017-03-16  6:14 ` [PATCH 8/8] drm/i915: Add support for emitting execbuffer tags through OA counter reports sourab.gupta
2017-03-16  8:13   ` Chris Wilson
2017-03-16  6:30 ` ✓ Fi.CI.BAT: success for Collect command stream based OA reports using i915 perf Patchwork
2017-03-16 12:59 ` [PATCH 0/8] " Robert Bragg
2017-03-17  5:29   ` sourab gupta
