All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] drm/i915: serialized performance queries
@ 2018-10-08 15:18 Lionel Landwerlin
  2018-10-08 15:18 ` [RFC PATCH 1/3] drm/i915/perf: allow holding preemption on filtered ctx Lionel Landwerlin
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Lionel Landwerlin @ 2018-10-08 15:18 UTC (permalink / raw)
  To: intel-gfx

Hi all,

This is a early stage series on which I'm looking for feedback.

Some background :

Performance queries (sampling performance counters through
MI_REPORT_PERF_COUNT instruction) commands requires the hardware to be
programmed with the desired configuration to allow particular
performance data to be recorded. Up to this series, an application
querying performance data would have to close/reopen the i915/perf
stream each time it wanted to gather different type of performance
data.

This series introduce a new mechanism through the execbuf parameters
to specify what configuration should be used for the set of commands
given into the execbuf's batchbuffer.

Motivation :

Giving the configuration needed to gather performance data at execbuf
time together with holding preemption on the batchs of the same
application allows data to be gathered using just MI_REPORT_PERF_COUNT
instructions and also to go through all many configuration without
slowing down the application. The application can now serialize many
queries of different types and doesn't have to wait for completion of
a previous query to submit a new one with a different configuration.



These patches are in no way final, in particular the execbuf uapi
changes is probably unworkable in production (it was just a quick way
to prove things are working). I heard discussions about execbuf3, this
could probably be tied into that.

Looking forward to your comments.

Thanks,

Lionel Landwerlin (3):
  drm/i915/perf: allow holding preemption on filtered ctx
  drm/i915/perf: allow for CS OA configs to be created lazily
  drm/i915: add a new perf configuration execbuf parameter

 drivers/gpu/drm/i915/i915_drv.c            |   4 +
 drivers/gpu/drm/i915/i915_drv.h            |  22 +-
 drivers/gpu/drm/i915/i915_gem_context.c    |   1 +
 drivers/gpu/drm/i915/i915_gem_context.h    |   3 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  60 +++++-
 drivers/gpu/drm/i915/i915_perf.c           | 227 ++++++++++++++++++---
 drivers/gpu/drm/i915/i915_request.c        |   4 +
 drivers/gpu/drm/i915/i915_request.h        |   2 +
 drivers/gpu/drm/i915/intel_gpu_commands.h  |   1 +
 drivers/gpu/drm/i915/intel_lrc.c           |  15 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c    |  11 +-
 include/uapi/drm/i915_drm.h                |  20 +-
 12 files changed, 327 insertions(+), 43 deletions(-)

--
2.19.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH 1/3] drm/i915/perf: allow holding preemption on filtered ctx
  2018-10-08 15:18 [RFC PATCH 0/3] drm/i915: serialized performance queries Lionel Landwerlin
@ 2018-10-08 15:18 ` Lionel Landwerlin
  2018-10-08 15:35   ` Chris Wilson
  2018-10-08 15:18 ` [RFC PATCH 2/3] drm/i915/perf: allow for CS OA configs to be created lazily Lionel Landwerlin
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Lionel Landwerlin @ 2018-10-08 15:18 UTC (permalink / raw)
  To: intel-gfx

We would like to make use of perf in Vulkan. The Vulkan API is much
lower level than OpenGL, with applications directly exposed to the
concept of command buffers (pretty much equivalent to our batch
buffers). In Vulkan, queries are always limited in scope to a command
buffer. In OpenGL, the lack of command buffer concept meant that
queries' duration could span multiple command buffers.

With that restriction gone in Vulkan, we would like to simplify
measuring performance just by measuring the deltas between the counter
snapshots written by 2 MI_RECORD_PERF_COUNT commands, rather than the
more complex scheme we currently have in the GL driver, using 2
MI_RECORD_PERF_COUNT commands and doing some post processing on the
stream of OA reports, coming from the global OA buffer, to remove any
unrelated deltas in between the 2 MI_RECORD_PERF_COUNT.

Disabling preemption only apply to a single context with which want to
query performance counters for and is considered a privileged
operation, by default protected by CAP_SYS_ADMIN. It is possible to
enable it for a normal user by disabling the paranoid stream setting.

v2: Store preemption setting in intel_context (Chris)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c |  1 +
 drivers/gpu/drm/i915/i915_gem_context.h |  3 +++
 drivers/gpu/drm/i915/i915_perf.c        | 32 +++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_lrc.c        |  2 +-
 include/uapi/drm/i915_drm.h             |  8 +++++++
 5 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 8cbe58070561..c80655dbccc1 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -343,6 +343,7 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 		struct intel_context *ce = &ctx->__engine[n];
 
 		ce->gem_context = ctx;
+		ce->arb_enable = MI_ARB_ENABLE;
 	}
 
 	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index f6d870b1f73e..295d53763ee0 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -171,6 +171,9 @@ struct i915_gem_context {
 		int pin_count;
 
 		const struct intel_context_ops *ops;
+
+		/* arb_enable: Control preemption */
+		u32 arb_enable;
 	} __engine[I915_NUM_ENGINES];
 
 	/** ring_size: size for allocating the per-engine ring buffer */
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 664b96bb65a3..e2a96b6844fe 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -354,6 +354,7 @@ struct perf_open_properties {
 	u32 sample_flags;
 
 	u64 single_context:1;
+	u64 context_disable_preemption:1;
 	u64 ctx_handle;
 
 	/* OA sampling state */
@@ -1360,6 +1361,12 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
 	mutex_lock(&dev_priv->drm.struct_mutex);
 	dev_priv->perf.oa.exclusive_stream = NULL;
 	dev_priv->perf.oa.ops.disable_metric_set(dev_priv);
+	if (stream->ctx) {
+		struct intel_context *ce =
+			to_intel_context(stream->ctx, dev_priv->engine[RCS]);
+
+		ce->arb_enable = MI_ARB_ENABLE;
+	}
 	mutex_unlock(&dev_priv->drm.struct_mutex);
 
 	free_oa_buffer(dev_priv);
@@ -2099,6 +2106,13 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 		goto err_enable;
 	}
 
+	if (props->context_disable_preemption) {
+		struct intel_context *ce =
+			to_intel_context(stream->ctx, dev_priv->engine[RCS]);
+
+		ce->arb_enable = MI_ARB_DISABLE;
+	}
+
 	stream->ops = &i915_oa_stream_ops;
 
 	dev_priv->perf.oa.exclusive_stream = stream;
@@ -2555,6 +2569,15 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
 		}
 	}
 
+	if (props->context_disable_preemption) {
+		if (!props->single_context) {
+			DRM_DEBUG("preemption disable with no context\n");
+			ret = -EINVAL;
+			goto err;
+		}
+		privileged_op = true;
+	}
+
 	/*
 	 * On Haswell the OA unit supports clock gating off for a specific
 	 * context and in this mode there's no visibility of metrics for the
@@ -2569,8 +2592,10 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
 	 * MI_REPORT_PERF_COUNT commands and so consider it a privileged op to
 	 * enable the OA unit by default.
 	 */
-	if (IS_HASWELL(dev_priv) && specific_ctx)
+	if (IS_HASWELL(dev_priv) && specific_ctx &&
+	    !props->context_disable_preemption) {
 		privileged_op = false;
+	}
 
 	/* Similar to perf's kernel.perf_paranoid_cpu sysctl option
 	 * we check a dev.i915.perf_stream_paranoid sysctl option
@@ -2579,7 +2604,7 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
 	 */
 	if (privileged_op &&
 	    i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
-		DRM_DEBUG("Insufficient privileges to open system-wide i915 perf stream\n");
+		DRM_DEBUG("Insufficient privileges to open i915 perf stream\n");
 		ret = -EACCES;
 		goto err_ctx;
 	}
@@ -2771,6 +2796,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 			props->oa_periodic = true;
 			props->oa_period_exponent = value;
 			break;
+		case DRM_I915_PERF_PROP_HOLD_PREEMPTION:
+			props->context_disable_preemption = value != 0 ? 1 : 0;
+			break;
 		case DRM_I915_PERF_PROP_MAX:
 			MISSING_CASE(id);
 			return -EINVAL;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index ff0e2b36cb8b..b240332838c1 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1896,7 +1896,7 @@ static int gen8_emit_bb_start(struct i915_request *rq,
 	 *
 	 * That satisfies both the GPGPU w/a and our heavy-handed paranoia.
 	 */
-	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+	*cs++ = MI_ARB_ON_OFF | rq->hw_context->arb_enable;
 
 	/* FIXME(BDW): Address space and security selectors. */
 	*cs++ = MI_BATCH_BUFFER_START_GEN8 |
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 298b2e197744..62f669030741 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1540,6 +1540,14 @@ enum drm_i915_perf_property_id {
 	 */
 	DRM_I915_PERF_PROP_OA_EXPONENT,
 
+	/**
+	 * Specifying this property is only valid when specify a context to
+	 * filter with DRM_I915_PERF_PROP_CTX_HANDLE. Specifying this property
+	 * will hold preemption of the particular context we want to gather
+	 * performance data about.
+	 */
+	DRM_I915_PERF_PROP_HOLD_PREEMPTION,
+
 	DRM_I915_PERF_PROP_MAX /* non-ABI */
 };
 
-- 
2.19.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH 2/3] drm/i915/perf: allow for CS OA configs to be created lazily
  2018-10-08 15:18 [RFC PATCH 0/3] drm/i915: serialized performance queries Lionel Landwerlin
  2018-10-08 15:18 ` [RFC PATCH 1/3] drm/i915/perf: allow holding preemption on filtered ctx Lionel Landwerlin
@ 2018-10-08 15:18 ` Lionel Landwerlin
  2018-10-08 15:34   ` Chris Wilson
  2018-10-08 15:18 ` [RFC PATCH 3/3] drm/i915: add a new perf configuration execbuf parameter Lionel Landwerlin
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Lionel Landwerlin @ 2018-10-08 15:18 UTC (permalink / raw)
  To: intel-gfx

Here we introduce a mechanism by which the execbuf part of the i915
driver will be able to request that a batch buffer containing the
programming for a particular OA config be created.

We'll execute these OA configuration buffers right before executing a
set of userspace commands so that a particular user batchbuffer be
executed with a given OA configuration.

This mechanism essentially allows the userspace driver to go through
several OA configuration without having to open/close the i915/perf
stream.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h           |  22 ++-
 drivers/gpu/drm/i915/i915_perf.c          | 195 ++++++++++++++++++----
 drivers/gpu/drm/i915/intel_gpu_commands.h |   1 +
 3 files changed, 187 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2264b30ce51a..a35715cd7608 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1378,6 +1378,10 @@ struct i915_oa_config {
 	struct attribute *attrs[2];
 	struct device_attribute sysfs_metric_id;
 
+	struct i915_vma *vma;
+
+	struct list_head vma_link;
+
 	atomic_t ref_count;
 };
 
@@ -1979,11 +1983,21 @@ struct drm_i915_private {
 		struct mutex metrics_lock;
 
 		/*
-		 * List of dynamic configurations, you need to hold
-		 * dev_priv->perf.metrics_lock to access it.
+		 * List of dynamic configurations (struct i915_oa_config), you
+		 * need to hold dev_priv->perf.metrics_lock to access it.
 		 */
 		struct idr metrics_idr;
 
+		/*
+		 * List of dynamic configurations (struct i915_oa_config)
+		 * which have an allocated buffer in GGTT for reconfiguration,
+		 * you need to hold dev_priv->perf.metrics_lock to access it.
+		 * Elements are added to the list lazilly on execbuf (when a
+		 * particular configuration is requested). The list is freed
+		 * upon closing the perf stream.
+		 */
+		struct list_head metrics_buffers;
+
 		/*
 		 * Lock associated with anything below within this structure
 		 * except exclusive_stream.
@@ -3315,6 +3329,10 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data,
 void i915_oa_init_reg_state(struct intel_engine_cs *engine,
 			    struct i915_gem_context *ctx,
 			    uint32_t *reg_state);
+int i915_perf_get_oa_config(struct drm_i915_private *i915,
+			    int metrics_set,
+			    struct i915_oa_config **out_config,
+			    struct i915_vma **out_vma);
 
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct i915_address_space *vm,
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index e2a96b6844fe..39c5b44862d4 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -364,9 +364,16 @@ struct perf_open_properties {
 	int oa_period_exponent;
 };
 
-static void free_oa_config(struct drm_i915_private *dev_priv,
-			   struct i915_oa_config *oa_config)
+static void put_oa_config(struct i915_oa_config *oa_config)
 {
+	if (!atomic_dec_and_test(&oa_config->ref_count))
+		return;
+
+	if (oa_config->vma) {
+		list_del(&oa_config->vma_link);
+		i915_vma_put(oa_config->vma);
+	}
+
 	if (!PTR_ERR(oa_config->flex_regs))
 		kfree(oa_config->flex_regs);
 	if (!PTR_ERR(oa_config->b_counter_regs))
@@ -376,38 +383,152 @@ static void free_oa_config(struct drm_i915_private *dev_priv,
 	kfree(oa_config);
 }
 
-static void put_oa_config(struct drm_i915_private *dev_priv,
-			  struct i915_oa_config *oa_config)
+static u32 *write_cs_mi_lri(u32 *cs, const struct i915_oa_reg *reg_data, u32 n_regs)
 {
-	if (!atomic_dec_and_test(&oa_config->ref_count))
-		return;
+	u32 i;
 
-	free_oa_config(dev_priv, oa_config);
+	for (i = 0; i < n_regs; i++) {
+		if ((i % MI_LOAD_REGISTER_IMM_MAX_REGS) == 0) {
+			u32 n_lri = min(n_regs - i,
+					(u32) MI_LOAD_REGISTER_IMM_MAX_REGS);
+
+			*cs++ = MI_LOAD_REGISTER_IMM(n_lri);
+		}
+		*cs++ = i915_mmio_reg_offset(reg_data[i].addr);
+		*cs++ = reg_data[i].value;
+	}
+
+	return cs;
 }
 
-static int get_oa_config(struct drm_i915_private *dev_priv,
-			 int metrics_set,
-			 struct i915_oa_config **out_config)
+static int alloc_oa_config_buffer(struct drm_i915_private *i915,
+				  struct i915_oa_config *oa_config)
 {
+	struct drm_i915_gem_object *bo;
+	size_t config_length = 0;
 	int ret;
+	u32 *cs;
 
-	if (metrics_set == 1) {
-		*out_config = &dev_priv->perf.oa.test_config;
-		atomic_inc(&dev_priv->perf.oa.test_config.ref_count);
-		return 0;
+	if (oa_config->mux_regs_len > 0) {
+		config_length += DIV_ROUND_UP(oa_config->mux_regs_len,
+					      MI_LOAD_REGISTER_IMM_MAX_REGS) * 4;
+		config_length += oa_config->mux_regs_len * 8;
 	}
+	if (oa_config->b_counter_regs_len > 0) {
+		config_length += DIV_ROUND_UP(oa_config->b_counter_regs_len,
+					      MI_LOAD_REGISTER_IMM_MAX_REGS) * 4;
+		config_length += oa_config->b_counter_regs_len * 8;
+	}
+	if (oa_config->flex_regs_len > 0) {
+		config_length += DIV_ROUND_UP(oa_config->flex_regs_len,
+					      MI_LOAD_REGISTER_IMM_MAX_REGS) * 4;
+		config_length += oa_config->flex_regs_len * 8;
+	}
+	config_length += 4; /* MI_BATCH_BUFFER_END */
+	config_length = ALIGN(config_length, I915_GTT_PAGE_SIZE);
 
-	ret = mutex_lock_interruptible(&dev_priv->perf.metrics_lock);
+	ret = i915_mutex_lock_interruptible(&i915->drm);
 	if (ret)
 		return ret;
 
-	*out_config = idr_find(&dev_priv->perf.metrics_idr, metrics_set);
-	if (!*out_config)
-		ret = -EINVAL;
-	else
-		atomic_inc(&(*out_config)->ref_count);
+	bo = i915_gem_object_create(i915, config_length);
+	if (IS_ERR(bo)) {
+		ret = PTR_ERR(bo);
+		goto unlock;
+	}
+
+	ret = i915_gem_object_set_cache_level(bo, I915_CACHE_LLC);
+	if (ret)
+		goto err_unref;
 
-	mutex_unlock(&dev_priv->perf.metrics_lock);
+	oa_config->vma = i915_gem_object_ggtt_pin(bo, NULL, 0, config_length, 0);
+	if (IS_ERR(oa_config->vma)) {
+		ret = PTR_ERR(oa_config->vma);
+		oa_config->vma = NULL;
+		goto err_unref;
+	}
+
+	cs = i915_gem_object_pin_map(bo, I915_MAP_WB);
+	if (IS_ERR(cs)) {
+		ret = PTR_ERR(cs);
+		goto err_unpin;
+	}
+
+	memset(cs, 0, config_length);
+
+	cs = write_cs_mi_lri(cs, oa_config->mux_regs, oa_config->mux_regs_len);
+	cs = write_cs_mi_lri(cs, oa_config->b_counter_regs, oa_config->b_counter_regs_len);
+	cs = write_cs_mi_lri(cs, oa_config->flex_regs, oa_config->flex_regs_len);
+
+	*cs++ = MI_BATCH_BUFFER_END;
+
+	i915_gem_object_unpin_map(bo);
+
+	goto unlock;
+
+err_unpin:
+	__i915_vma_unpin(oa_config->vma);
+
+err_unref:
+	oa_config->vma = NULL;
+	i915_gem_object_put(bo);
+
+unlock:
+	mutex_unlock(&i915->drm.struct_mutex);
+	return ret;
+}
+
+int i915_perf_get_oa_config(struct drm_i915_private *i915,
+			    int metrics_set,
+			    struct i915_oa_config **out_config,
+			    struct i915_vma **out_vma)
+{
+	int ret = 0;
+	struct i915_oa_config *oa_config;
+
+	if (!i915->perf.initialized)
+		return -ENODEV;
+
+	ret = mutex_lock_interruptible(&i915->perf.metrics_lock);
+	if (ret)
+		return ret;
+
+	if (metrics_set == 1) {
+		oa_config = &i915->perf.oa.test_config;
+	} else {
+		oa_config = idr_find(&i915->perf.metrics_idr, metrics_set);
+		if (!oa_config) {
+			ret = -EINVAL;
+			goto unlock;
+		}
+	}
+
+	if (out_config) {
+		atomic_inc(&oa_config->ref_count);
+		*out_config = oa_config;
+	}
+
+	if (out_vma) {
+		if (oa_config->vma) {
+			*out_vma = i915_vma_get(oa_config->vma);
+		} else {
+			ret = alloc_oa_config_buffer(i915, oa_config);
+			if (ret) {
+				goto err_buf_alloc;
+			} else {
+				list_add(&oa_config->vma_link,
+					 &i915->perf.metrics_buffers);
+				*out_vma = i915_vma_get(oa_config->vma);
+			}
+		}
+	}
+
+	goto unlock;
+
+err_buf_alloc:
+	put_oa_config(oa_config);
+unlock:
+	mutex_unlock(&i915->perf.metrics_lock);
 
 	return ret;
 }
@@ -1377,7 +1498,7 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
 	if (stream->ctx)
 		oa_put_render_ctx_id(stream);
 
-	put_oa_config(dev_priv, stream->oa_config);
+	put_oa_config(stream->oa_config);
 
 	if (dev_priv->perf.oa.spurious_report_rs.missed) {
 		DRM_NOTE("%d spurious OA report notices suppressed due to ratelimiting\n",
@@ -2070,7 +2191,8 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 		}
 	}
 
-	ret = get_oa_config(dev_priv, props->metrics_set, &stream->oa_config);
+	ret = i915_perf_get_oa_config(dev_priv, props->metrics_set,
+				      &stream->oa_config, NULL);
 	if (ret) {
 		DRM_DEBUG("Invalid OA config id=%i\n", props->metrics_set);
 		goto err_config;
@@ -2115,6 +2237,8 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 
 	stream->ops = &i915_oa_stream_ops;
 
+	DRM_DEBUG("opening stream oa config uuid=%s\n", stream->oa_config->uuid);
+
 	dev_priv->perf.oa.exclusive_stream = stream;
 
 	mutex_unlock(&dev_priv->drm.struct_mutex);
@@ -2129,7 +2253,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	free_oa_buffer(dev_priv);
 
 err_oa_buf_alloc:
-	put_oa_config(dev_priv, stream->oa_config);
+	put_oa_config(stream->oa_config);
 
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 	intel_runtime_pm_put(dev_priv);
@@ -2496,9 +2620,21 @@ static int i915_perf_release(struct inode *inode, struct file *file)
 {
 	struct i915_perf_stream *stream = file->private_data;
 	struct drm_i915_private *dev_priv = stream->dev_priv;
+	struct i915_oa_config *oa_config, *next;
 
 	mutex_lock(&dev_priv->perf.lock);
+
 	i915_perf_destroy_locked(stream);
+
+	/* Dispose of all oa config batch buffers. */
+	mutex_lock(&dev_priv->perf.metrics_lock);
+	list_for_each_entry_safe(oa_config, next, &dev_priv->perf.metrics_buffers, vma_link) {
+		list_del(&oa_config->vma_link);
+		i915_vma_put(oa_config->vma);
+		oa_config->vma = NULL;
+	}
+	mutex_unlock(&dev_priv->perf.metrics_lock);
+
 	mutex_unlock(&dev_priv->perf.lock);
 
 	return 0;
@@ -3294,7 +3430,7 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
 sysfs_err:
 	mutex_unlock(&dev_priv->perf.metrics_lock);
 reg_err:
-	put_oa_config(dev_priv, oa_config);
+	put_oa_config(oa_config);
 	DRM_DEBUG("Failed to add new OA config\n");
 	return err;
 }
@@ -3348,7 +3484,7 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data,
 
 	DRM_DEBUG("Removed config %s id=%i\n", oa_config->uuid, oa_config->id);
 
-	put_oa_config(dev_priv, oa_config);
+	put_oa_config(oa_config);
 
 config_err:
 	mutex_unlock(&dev_priv->perf.metrics_lock);
@@ -3492,6 +3628,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
 		init_waitqueue_head(&dev_priv->perf.oa.poll_wq);
 
 		INIT_LIST_HEAD(&dev_priv->perf.streams);
+		INIT_LIST_HEAD(&dev_priv->perf.metrics_buffers);
+
 		mutex_init(&dev_priv->perf.lock);
 		spin_lock_init(&dev_priv->perf.oa.oa_buffer.ptr_lock);
 
@@ -3508,10 +3646,9 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
 
 static int destroy_config(int id, void *p, void *data)
 {
-	struct drm_i915_private *dev_priv = data;
 	struct i915_oa_config *oa_config = p;
 
-	put_oa_config(dev_priv, oa_config);
+	put_oa_config(oa_config);
 
 	return 0;
 }
@@ -3525,7 +3662,7 @@ void i915_perf_fini(struct drm_i915_private *dev_priv)
 	if (!dev_priv->perf.initialized)
 		return;
 
-	idr_for_each(&dev_priv->perf.metrics_idr, destroy_config, dev_priv);
+	idr_for_each(&dev_priv->perf.metrics_idr, destroy_config, NULL);
 	idr_destroy(&dev_priv->perf.metrics_idr);
 
 	unregister_sysctl_table(dev_priv->perf.sysctl_header);
diff --git a/drivers/gpu/drm/i915/intel_gpu_commands.h b/drivers/gpu/drm/i915/intel_gpu_commands.h
index 105e2a9e874a..9fb9f3a0cb60 100644
--- a/drivers/gpu/drm/i915/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/intel_gpu_commands.h
@@ -122,6 +122,7 @@
  */
 #define MI_LOAD_REGISTER_IMM(x)	MI_INSTR(0x22, 2*(x)-1)
 #define   MI_LRI_FORCE_POSTED		(1<<12)
+#define MI_LOAD_REGISTER_IMM_MAX_REGS (126)
 #define MI_STORE_REGISTER_MEM        MI_INSTR(0x24, 1)
 #define MI_STORE_REGISTER_MEM_GEN8   MI_INSTR(0x24, 2)
 #define   MI_SRM_LRM_GLOBAL_GTT		(1<<22)
-- 
2.19.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH 3/3] drm/i915: add a new perf configuration execbuf parameter
  2018-10-08 15:18 [RFC PATCH 0/3] drm/i915: serialized performance queries Lionel Landwerlin
  2018-10-08 15:18 ` [RFC PATCH 1/3] drm/i915/perf: allow holding preemption on filtered ctx Lionel Landwerlin
  2018-10-08 15:18 ` [RFC PATCH 2/3] drm/i915/perf: allow for CS OA configs to be created lazily Lionel Landwerlin
@ 2018-10-08 15:18 ` Lionel Landwerlin
  2018-10-08 15:44   ` Chris Wilson
  2018-10-08 15:30 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915: serialized performance queries Patchwork
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Lionel Landwerlin @ 2018-10-08 15:18 UTC (permalink / raw)
  To: intel-gfx

We want the ability to dispatch a set of command buffer to the
hardware, each with a different OA configuration. To achieve this, we
reuse a couple of fields from the execbuf2 struct (I CAN HAZ
execbuf3?) to notify what OA configuration should be used for a batch
buffer. This requires the process making the execbuf with this flag to
also own the perf fd at the time of execbuf.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c            |  4 ++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 60 +++++++++++++++++++---
 drivers/gpu/drm/i915/i915_request.c        |  4 ++
 drivers/gpu/drm/i915/i915_request.h        |  2 +
 drivers/gpu/drm/i915/intel_lrc.c           | 13 ++++-
 drivers/gpu/drm/i915/intel_ringbuffer.c    | 11 +++-
 include/uapi/drm/i915_drm.h                | 12 ++++-
 7 files changed, 97 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 193023427b40..564c2e749fd8 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -444,6 +444,10 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_PARAM_MMAP_GTT_COHERENT:
 		value = INTEL_INFO(dev_priv)->has_coherent_ggtt;
 		break;
+	case I915_PARAM_HAS_EXEC_PERF_CONFIG:
+		/* Obviously requires perf support. */
+		value = dev_priv->perf.initialized;
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 09187286d346..8b963641f142 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -286,6 +286,8 @@ struct i915_execbuffer {
 	 */
 	int lut_size;
 	struct hlist_head *buckets; /** ht for relocation handles */
+
+	struct i915_vma *oa_config; /** HW configuration for OA, NULL is not needed. */
 };
 
 #define exec_entry(EB, VMA) (&(EB)->exec[(VMA)->exec_flags - (EB)->flags])
@@ -1121,6 +1123,32 @@ static void clflush_write32(u32 *addr, u32 value, unsigned int flushes)
 		*addr = value;
 }
 
+static int
+get_execbuf_oa_config(struct drm_i915_private *dev_priv,
+		      int perf_fd, u32 oa_config_id,
+		      struct i915_vma **out_oa_vma)
+{
+	struct file *perf_file;
+	int ret;
+
+	if (!dev_priv->perf.oa.exclusive_stream)
+		return -EINVAL;
+
+	perf_file = fget(perf_fd);
+	if (!perf_file)
+		return -EINVAL;
+
+	if (perf_file->private_data != dev_priv->perf.oa.exclusive_stream)
+		return -EINVAL;
+
+	fput(perf_file);
+
+	ret = i915_perf_get_oa_config(dev_priv, oa_config_id,
+				      NULL, out_oa_vma);
+
+	return ret;
+}
+
 static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 			     struct i915_vma *vma,
 			     unsigned int len)
@@ -1173,6 +1201,9 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 		goto err_unpin;
 	}
 
+	rq->oa_config = eb->oa_config;
+	eb->oa_config = NULL;
+
 	err = i915_request_await_object(rq, vma->obj, true);
 	if (err)
 		goto err_request;
@@ -1875,12 +1906,15 @@ static bool i915_gem_check_execbuffer(struct drm_i915_gem_execbuffer2 *exec)
 			return false;
 	}
 
-	if (exec->DR4 == 0xffffffff) {
-		DRM_DEBUG("UXA submitting garbage DR4, fixing up\n");
-		exec->DR4 = 0;
+	/* We reuse DR1 & DR4 fields for passing the perf config detail. */
+	if (!(exec->flags & I915_EXEC_PERF_CONFIG)) {
+		if (exec->DR4 == 0xffffffff) {
+			DRM_DEBUG("UXA submitting garbage DR4, fixing up\n");
+			exec->DR4 = 0;
+		}
+		if (exec->DR1 || exec->DR4)
+			return false;
 	}
-	if (exec->DR1 || exec->DR4)
-		return false;
 
 	if ((exec->batch_start_offset | exec->batch_len) & 0x7)
 		return false;
@@ -2224,6 +2258,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	eb.buffer_count = args->buffer_count;
 	eb.batch_start_offset = args->batch_start_offset;
 	eb.batch_len = args->batch_len;
+	eb.oa_config = NULL;
 
 	eb.batch_flags = 0;
 	if (args->flags & I915_EXEC_SECURE) {
@@ -2253,9 +2288,16 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		}
 	}
 
+	if (args->flags & I915_EXEC_PERF_CONFIG) {
+		err = get_execbuf_oa_config(eb.i915, args->DR1, args->DR4,
+					    &eb.oa_config);
+		if (err)
+			goto err_out_fence;
+	}
+
 	err = eb_create(&eb);
 	if (err)
-		goto err_out_fence;
+		goto err_perf;
 
 	GEM_BUG_ON(!eb.lut_size);
 
@@ -2365,6 +2407,9 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		goto err_batch_unpin;
 	}
 
+	eb.request->oa_config = eb.oa_config;
+	eb.oa_config = NULL;
+
 	if (in_fence) {
 		err = i915_request_await_dma_fence(eb.request, in_fence);
 		if (err < 0)
@@ -2426,6 +2471,9 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	i915_gem_context_put(eb.ctx);
 err_destroy:
 	eb_destroy(&eb);
+err_perf:
+	if (eb.oa_config)
+		i915_vma_put(eb.oa_config);
 err_out_fence:
 	if (out_fence_fd != -1)
 		put_unused_fd(out_fence_fd);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index abd4dacbab8e..8fb134793925 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -379,6 +379,9 @@ static void i915_request_retire(struct i915_request *request)
 
 	unreserve_gt(request->i915);
 
+	if (request->oa_config)
+		i915_vma_put(request->oa_config);
+
 	i915_sched_node_fini(request->i915, &request->sched);
 	i915_request_put(request);
 }
@@ -704,6 +707,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	rq->batch = NULL;
 	rq->capture_list = NULL;
 	rq->waitboost = false;
+	rq->oa_config = NULL;
 
 	/*
 	 * Reserve space in the ring buffer for all the commands required to
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 90e9d170a0cd..7a42c9b94877 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -188,6 +188,8 @@ struct i915_request {
 	struct drm_i915_file_private *file_priv;
 	/** file_priv list entry for this request */
 	struct list_head client_link;
+
+	struct i915_vma *oa_config; /** HW configuration for OA, NULL is not needed. */
 };
 
 #define I915_FENCE_GFP (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b240332838c1..d3d8c0c60d65 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1858,6 +1858,8 @@ static int gen8_emit_bb_start(struct i915_request *rq,
 {
 	u32 *cs;
 	int ret;
+	bool use_oa_config =
+		rq->i915->perf.oa.exclusive_stream && rq->oa_config;
 
 	/* Don't rely in hw updating PDPs, specially in lite-restore.
 	 * Ideally, we should set Force PD Restore in ctx descriptor,
@@ -1875,10 +1877,19 @@ static int gen8_emit_bb_start(struct i915_request *rq,
 		rq->gem_context->ppgtt->pd_dirty_rings &= ~intel_engine_flag(rq->engine);
 	}
 
-	cs = intel_ring_begin(rq, 6);
+	cs = intel_ring_begin(rq, use_oa_config ? 10 : 6);
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
+	if (use_oa_config) {
+		u32 oa_config_offset = i915_ggtt_offset(rq->oa_config);
+
+		*cs++ = MI_BATCH_BUFFER_START_GEN8;
+		*cs++ = oa_config_offset;
+		*cs++ = 0;
+		*cs++ = MI_NOOP;
+	}
+
 	/*
 	 * WaDisableCtxRestoreArbitration:bdw,chv
 	 *
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index b8a7a014d46d..d8ebcf91ce93 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2037,11 +2037,20 @@ hsw_emit_bb_start(struct i915_request *rq,
 		  unsigned int dispatch_flags)
 {
 	u32 *cs;
+	bool use_oa_config =
+		rq->i915->perf.oa.exclusive_stream && rq->oa_config;
 
-	cs = intel_ring_begin(rq, 2);
+	cs = intel_ring_begin(rq, use_oa_config ? 4 : 2);
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
+	if (use_oa_config) {
+		u32 oa_config_offset = i915_ggtt_offset(rq->oa_config);
+
+		*cs++ = MI_BATCH_BUFFER_START;
+		*cs++ = oa_config_offset;
+	}
+
 	*cs++ = MI_BATCH_BUFFER_START | (dispatch_flags & I915_DISPATCH_SECURE ?
 		0 : MI_BATCH_PPGTT_HSW | MI_BATCH_NON_SECURE_HSW);
 	/* bit0-7 is the length on GEN6+ */
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 62f669030741..4f0b39796d80 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -559,6 +559,8 @@ typedef struct drm_i915_irq_wait {
  */
 #define I915_PARAM_MMAP_GTT_COHERENT	52
 
+#define I915_PARAM_HAS_EXEC_PERF_CONFIG 53
+
 typedef struct drm_i915_getparam {
 	__s32 param;
 	/*
@@ -1078,7 +1080,15 @@ struct drm_i915_gem_execbuffer2 {
  */
 #define I915_EXEC_FENCE_ARRAY   (1<<19)
 
-#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_ARRAY<<1))
+/* Request that perf monitoring hardware be reprogrammed before executing the
+ * commands from the batch in the execbuf. The DR1 & DR4 fields of the execbuf
+ * must respectively contain the file descriptor of the perf monitoring device
+ * and the configuration to program.
+ */
+#define I915_EXEC_PERF_CONFIG   (1<<20)
+
+
+#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_PERF_CONFIG<<1))
 
 #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
 #define i915_execbuffer2_set_context_id(eb2, context) \
-- 
2.19.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for drm/i915: serialized performance queries
  2018-10-08 15:18 [RFC PATCH 0/3] drm/i915: serialized performance queries Lionel Landwerlin
                   ` (2 preceding siblings ...)
  2018-10-08 15:18 ` [RFC PATCH 3/3] drm/i915: add a new perf configuration execbuf parameter Lionel Landwerlin
@ 2018-10-08 15:30 ` Patchwork
  2018-10-08 15:32 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Patchwork @ 2018-10-08 15:30 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: serialized performance queries
URL   : https://patchwork.freedesktop.org/series/50698/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
40102d5976dd drm/i915/perf: allow holding preemption on filtered ctx
ccae77ecfc25 drm/i915/perf: allow for CS OA configs to be created lazily
-:109: CHECK:SPACING: No space is necessary after a cast
#109: FILE: drivers/gpu/drm/i915/i915_perf.c:393:
+					(u32) MI_LOAD_REGISTER_IMM_MAX_REGS);

total: 0 errors, 0 warnings, 1 checks, 336 lines checked
81dc092f58d5 drm/i915: add a new perf configuration execbuf parameter
-:271: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#271: FILE: include/uapi/drm/i915_drm.h:1088:
+#define I915_EXEC_PERF_CONFIG   (1<<20)
                                   ^

-:273: CHECK:LINE_SPACING: Please don't use multiple blank lines
#273: FILE: include/uapi/drm/i915_drm.h:1090:
+
+

-:274: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#274: FILE: include/uapi/drm/i915_drm.h:1091:
+#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_PERF_CONFIG<<1))
                                                           ^

total: 0 errors, 0 warnings, 3 checks, 218 lines checked

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* ✗ Fi.CI.SPARSE: warning for drm/i915: serialized performance queries
  2018-10-08 15:18 [RFC PATCH 0/3] drm/i915: serialized performance queries Lionel Landwerlin
                   ` (3 preceding siblings ...)
  2018-10-08 15:30 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915: serialized performance queries Patchwork
@ 2018-10-08 15:32 ` Patchwork
  2018-10-08 15:55 ` ✓ Fi.CI.BAT: success " Patchwork
  2018-10-08 17:14 ` ✗ Fi.CI.IGT: failure " Patchwork
  6 siblings, 0 replies; 13+ messages in thread
From: Patchwork @ 2018-10-08 15:32 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: serialized performance queries
URL   : https://patchwork.freedesktop.org/series/50698/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Sparse version: v0.5.2
Commit: drm/i915/perf: allow holding preemption on filtered ctx
Okay!

Commit: drm/i915/perf: allow for CS OA configs to be created lazily
+drivers/gpu/drm/i915/i915_perf.c:392:37: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3727:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3745:16: warning: expression using sizeof(void)

Commit: drm/i915: add a new perf configuration execbuf parameter
Okay!

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 2/3] drm/i915/perf: allow for CS OA configs to be created lazily
  2018-10-08 15:18 ` [RFC PATCH 2/3] drm/i915/perf: allow for CS OA configs to be created lazily Lionel Landwerlin
@ 2018-10-08 15:34   ` Chris Wilson
  2018-10-08 15:44     ` Lionel Landwerlin
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2018-10-08 15:34 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2018-10-08 16:18:21)
> Here we introduce a mechanism by which the execbuf part of the i915
> driver will be able to request that a batch buffer containing the
> programming for a particular OA config be created.
> 
> We'll execute these OA configuration buffers right before executing a
> set of userspace commands so that a particular user batchbuffer be
> executed with a given OA configuration.
> 
> This mechanism essentially allows the userspace driver to go through
> several OA configuration without having to open/close the i915/perf
> stream.
> 
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h           |  22 ++-
>  drivers/gpu/drm/i915/i915_perf.c          | 195 ++++++++++++++++++----
>  drivers/gpu/drm/i915/intel_gpu_commands.h |   1 +
>  3 files changed, 187 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 2264b30ce51a..a35715cd7608 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1378,6 +1378,10 @@ struct i915_oa_config {
>         struct attribute *attrs[2];
>         struct device_attribute sysfs_metric_id;
>  
> +       struct i915_vma *vma;
> +
> +       struct list_head vma_link;
> +
>         atomic_t ref_count;
>  };
>  
> @@ -1979,11 +1983,21 @@ struct drm_i915_private {
>                 struct mutex metrics_lock;
>  
>                 /*
> -                * List of dynamic configurations, you need to hold
> -                * dev_priv->perf.metrics_lock to access it.
> +                * List of dynamic configurations (struct i915_oa_config), you
> +                * need to hold dev_priv->perf.metrics_lock to access it.
>                  */
>                 struct idr metrics_idr;
>  
> +               /*
> +                * List of dynamic configurations (struct i915_oa_config)
> +                * which have an allocated buffer in GGTT for reconfiguration,
> +                * you need to hold dev_priv->perf.metrics_lock to access it.
> +                * Elements are added to the list lazilly on execbuf (when a
> +                * particular configuration is requested). The list is freed
> +                * upon closing the perf stream.
> +                */
> +               struct list_head metrics_buffers;
> +
>                 /*
>                  * Lock associated with anything below within this structure
>                  * except exclusive_stream.
> @@ -3315,6 +3329,10 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data,
>  void i915_oa_init_reg_state(struct intel_engine_cs *engine,
>                             struct i915_gem_context *ctx,
>                             uint32_t *reg_state);
> +int i915_perf_get_oa_config(struct drm_i915_private *i915,
> +                           int metrics_set,
> +                           struct i915_oa_config **out_config,
> +                           struct i915_vma **out_vma);
>  
>  /* i915_gem_evict.c */
>  int __must_check i915_gem_evict_something(struct i915_address_space *vm,
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index e2a96b6844fe..39c5b44862d4 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -364,9 +364,16 @@ struct perf_open_properties {
>         int oa_period_exponent;
>  };
>  
> -static void free_oa_config(struct drm_i915_private *dev_priv,
> -                          struct i915_oa_config *oa_config)
> +static void put_oa_config(struct i915_oa_config *oa_config)
>  {
> +       if (!atomic_dec_and_test(&oa_config->ref_count))
> +               return;
> +
> +       if (oa_config->vma) {
> +               list_del(&oa_config->vma_link);
> +               i915_vma_put(oa_config->vma);
> +       }
> +
>         if (!PTR_ERR(oa_config->flex_regs))
>                 kfree(oa_config->flex_regs);
>         if (!PTR_ERR(oa_config->b_counter_regs))
> @@ -376,38 +383,152 @@ static void free_oa_config(struct drm_i915_private *dev_priv,
>         kfree(oa_config);
>  }
>  
> -static void put_oa_config(struct drm_i915_private *dev_priv,
> -                         struct i915_oa_config *oa_config)
> +static u32 *write_cs_mi_lri(u32 *cs, const struct i915_oa_reg *reg_data, u32 n_regs)
>  {
> -       if (!atomic_dec_and_test(&oa_config->ref_count))
> -               return;
> +       u32 i;
>  
> -       free_oa_config(dev_priv, oa_config);
> +       for (i = 0; i < n_regs; i++) {
> +               if ((i % MI_LOAD_REGISTER_IMM_MAX_REGS) == 0) {
> +                       u32 n_lri = min(n_regs - i,
> +                                       (u32) MI_LOAD_REGISTER_IMM_MAX_REGS);
> +
> +                       *cs++ = MI_LOAD_REGISTER_IMM(n_lri);
> +               }
> +               *cs++ = i915_mmio_reg_offset(reg_data[i].addr);
> +               *cs++ = reg_data[i].value;
> +       }
> +
> +       return cs;
>  }
>  
> -static int get_oa_config(struct drm_i915_private *dev_priv,
> -                        int metrics_set,
> -                        struct i915_oa_config **out_config)
> +static int alloc_oa_config_buffer(struct drm_i915_private *i915,
> +                                 struct i915_oa_config *oa_config)
>  {
> +       struct drm_i915_gem_object *bo;
> +       size_t config_length = 0;
>         int ret;
> +       u32 *cs;
>  
> -       if (metrics_set == 1) {
> -               *out_config = &dev_priv->perf.oa.test_config;
> -               atomic_inc(&dev_priv->perf.oa.test_config.ref_count);
> -               return 0;
> +       if (oa_config->mux_regs_len > 0) {
> +               config_length += DIV_ROUND_UP(oa_config->mux_regs_len,
> +                                             MI_LOAD_REGISTER_IMM_MAX_REGS) * 4;
> +               config_length += oa_config->mux_regs_len * 8;
>         }
> +       if (oa_config->b_counter_regs_len > 0) {
> +               config_length += DIV_ROUND_UP(oa_config->b_counter_regs_len,
> +                                             MI_LOAD_REGISTER_IMM_MAX_REGS) * 4;
> +               config_length += oa_config->b_counter_regs_len * 8;
> +       }
> +       if (oa_config->flex_regs_len > 0) {
> +               config_length += DIV_ROUND_UP(oa_config->flex_regs_len,
> +                                             MI_LOAD_REGISTER_IMM_MAX_REGS) * 4;
> +               config_length += oa_config->flex_regs_len * 8;
> +       }
> +       config_length += 4; /* MI_BATCH_BUFFER_END */
> +       config_length = ALIGN(config_length, I915_GTT_PAGE_SIZE);
>  
> -       ret = mutex_lock_interruptible(&dev_priv->perf.metrics_lock);
> +       ret = i915_mutex_lock_interruptible(&i915->drm);
>         if (ret)
>                 return ret;
>  
> -       *out_config = idr_find(&dev_priv->perf.metrics_idr, metrics_set);
> -       if (!*out_config)
> -               ret = -EINVAL;
> -       else
> -               atomic_inc(&(*out_config)->ref_count);
> +       bo = i915_gem_object_create(i915, config_length);
> +       if (IS_ERR(bo)) {
> +               ret = PTR_ERR(bo);
> +               goto unlock;
> +       }
> +
> +       ret = i915_gem_object_set_cache_level(bo, I915_CACHE_LLC);

Don't enable snoop on a batchbuffer.

> +       if (ret)
> +               goto err_unref;
>  
> -       mutex_unlock(&dev_priv->perf.metrics_lock);
> +       oa_config->vma = i915_gem_object_ggtt_pin(bo, NULL, 0, config_length, 0);

Why have you pinned it?

> +       if (IS_ERR(oa_config->vma)) {
> +               ret = PTR_ERR(oa_config->vma);
> +               oa_config->vma = NULL;
> +               goto err_unref;
> +       }
> +
> +       cs = i915_gem_object_pin_map(bo, I915_MAP_WB);
> +       if (IS_ERR(cs)) {
> +               ret = PTR_ERR(cs);
> +               goto err_unpin;
> +       }
> +
> +       memset(cs, 0, config_length);

Already zero. Or use create_internal to avoid shmemfs overhead. And
since you write all bytes, you can just ignore it.

> +       cs = write_cs_mi_lri(cs, oa_config->mux_regs, oa_config->mux_regs_len);
> +       cs = write_cs_mi_lri(cs, oa_config->b_counter_regs, oa_config->b_counter_regs_len);
> +       cs = write_cs_mi_lri(cs, oa_config->flex_regs, oa_config->flex_regs_len);
> +
> +       *cs++ = MI_BATCH_BUFFER_END;
> +
> +       i915_gem_object_unpin_map(bo);
> +
> +       goto unlock;
> +
> +err_unpin:
> +       __i915_vma_unpin(oa_config->vma);
> +
> +err_unref:
> +       oa_config->vma = NULL;
> +       i915_gem_object_put(bo);
> +
> +unlock:
> +       mutex_unlock(&i915->drm.struct_mutex);
> +       return ret;
> +}
> +
> +int i915_perf_get_oa_config(struct drm_i915_private *i915,
> +                           int metrics_set,
> +                           struct i915_oa_config **out_config,
> +                           struct i915_vma **out_vma)
> +{
> +       int ret = 0;
> +       struct i915_oa_config *oa_config;
> +
> +       if (!i915->perf.initialized)
> +               return -ENODEV;
> +
> +       ret = mutex_lock_interruptible(&i915->perf.metrics_lock);
> +       if (ret)
> +               return ret;
> +
> +       if (metrics_set == 1) {
> +               oa_config = &i915->perf.oa.test_config;
> +       } else {
> +               oa_config = idr_find(&i915->perf.metrics_idr, metrics_set);
> +               if (!oa_config) {
> +                       ret = -EINVAL;
> +                       goto unlock;
> +               }
> +       }
> +
> +       if (out_config) {
> +               atomic_inc(&oa_config->ref_count);
> +               *out_config = oa_config;
> +       }
> +
> +       if (out_vma) {
> +               if (oa_config->vma) {
> +                       *out_vma = i915_vma_get(oa_config->vma);
> +               } else {
> +                       ret = alloc_oa_config_buffer(i915, oa_config);
> +                       if (ret) {
> +                               goto err_buf_alloc;
> +                       } else {
> +                               list_add(&oa_config->vma_link,
> +                                        &i915->perf.metrics_buffers);
> +                               *out_vma = i915_vma_get(oa_config->vma);
> +                       }
> +               }

Where is out_vma used so we can check if the litetime tracking is ok as
so far you are releasing it before we know it is idle.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 1/3] drm/i915/perf: allow holding preemption on filtered ctx
  2018-10-08 15:18 ` [RFC PATCH 1/3] drm/i915/perf: allow holding preemption on filtered ctx Lionel Landwerlin
@ 2018-10-08 15:35   ` Chris Wilson
  2018-10-08 15:45     ` Lionel Landwerlin
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2018-10-08 15:35 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2018-10-08 16:18:20)
> We would like to make use of perf in Vulkan. The Vulkan API is much
> lower level than OpenGL, with applications directly exposed to the
> concept of command buffers (pretty much equivalent to our batch
> buffers). In Vulkan, queries are always limited in scope to a command
> buffer. In OpenGL, the lack of command buffer concept meant that
> queries' duration could span multiple command buffers.
> 
> With that restriction gone in Vulkan, we would like to simplify
> measuring performance just by measuring the deltas between the counter
> snapshots written by 2 MI_RECORD_PERF_COUNT commands, rather than the
> more complex scheme we currently have in the GL driver, using 2
> MI_RECORD_PERF_COUNT commands and doing some post processing on the
> stream of OA reports, coming from the global OA buffer, to remove any
> unrelated deltas in between the 2 MI_RECORD_PERF_COUNT.
> 
> Disabling preemption only apply to a single context with which want to
> query performance counters for and is considered a privileged
> operation, by default protected by CAP_SYS_ADMIN. It is possible to
> enable it for a normal user by disabling the paranoid stream setting.

I'm am uncomfortable with disabling preemption like this. I suggest we
at least have the preemption timeout in place first.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 2/3] drm/i915/perf: allow for CS OA configs to be created lazily
  2018-10-08 15:34   ` Chris Wilson
@ 2018-10-08 15:44     ` Lionel Landwerlin
  0 siblings, 0 replies; 13+ messages in thread
From: Lionel Landwerlin @ 2018-10-08 15:44 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 08/10/2018 16:34, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2018-10-08 16:18:21)
>> Here we introduce a mechanism by which the execbuf part of the i915
>> driver will be able to request that a batch buffer containing the
>> programming for a particular OA config be created.
>>
>> We'll execute these OA configuration buffers right before executing a
>> set of userspace commands so that a particular user batchbuffer be
>> executed with a given OA configuration.
>>
>> This mechanism essentially allows the userspace driver to go through
>> several OA configuration without having to open/close the i915/perf
>> stream.
>>
>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h           |  22 ++-
>>   drivers/gpu/drm/i915/i915_perf.c          | 195 ++++++++++++++++++----
>>   drivers/gpu/drm/i915/intel_gpu_commands.h |   1 +
>>   3 files changed, 187 insertions(+), 31 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 2264b30ce51a..a35715cd7608 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -1378,6 +1378,10 @@ struct i915_oa_config {
>>          struct attribute *attrs[2];
>>          struct device_attribute sysfs_metric_id;
>>   
>> +       struct i915_vma *vma;
>> +
>> +       struct list_head vma_link;
>> +
>>          atomic_t ref_count;
>>   };
>>   
>> @@ -1979,11 +1983,21 @@ struct drm_i915_private {
>>                  struct mutex metrics_lock;
>>   
>>                  /*
>> -                * List of dynamic configurations, you need to hold
>> -                * dev_priv->perf.metrics_lock to access it.
>> +                * List of dynamic configurations (struct i915_oa_config), you
>> +                * need to hold dev_priv->perf.metrics_lock to access it.
>>                   */
>>                  struct idr metrics_idr;
>>   
>> +               /*
>> +                * List of dynamic configurations (struct i915_oa_config)
>> +                * which have an allocated buffer in GGTT for reconfiguration,
>> +                * you need to hold dev_priv->perf.metrics_lock to access it.
>> +                * Elements are added to the list lazilly on execbuf (when a
>> +                * particular configuration is requested). The list is freed
>> +                * upon closing the perf stream.
>> +                */
>> +               struct list_head metrics_buffers;
>> +
>>                  /*
>>                   * Lock associated with anything below within this structure
>>                   * except exclusive_stream.
>> @@ -3315,6 +3329,10 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data,
>>   void i915_oa_init_reg_state(struct intel_engine_cs *engine,
>>                              struct i915_gem_context *ctx,
>>                              uint32_t *reg_state);
>> +int i915_perf_get_oa_config(struct drm_i915_private *i915,
>> +                           int metrics_set,
>> +                           struct i915_oa_config **out_config,
>> +                           struct i915_vma **out_vma);
>>   
>>   /* i915_gem_evict.c */
>>   int __must_check i915_gem_evict_something(struct i915_address_space *vm,
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index e2a96b6844fe..39c5b44862d4 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -364,9 +364,16 @@ struct perf_open_properties {
>>          int oa_period_exponent;
>>   };
>>   
>> -static void free_oa_config(struct drm_i915_private *dev_priv,
>> -                          struct i915_oa_config *oa_config)
>> +static void put_oa_config(struct i915_oa_config *oa_config)
>>   {
>> +       if (!atomic_dec_and_test(&oa_config->ref_count))
>> +               return;
>> +
>> +       if (oa_config->vma) {
>> +               list_del(&oa_config->vma_link);
>> +               i915_vma_put(oa_config->vma);
>> +       }
>> +
>>          if (!PTR_ERR(oa_config->flex_regs))
>>                  kfree(oa_config->flex_regs);
>>          if (!PTR_ERR(oa_config->b_counter_regs))
>> @@ -376,38 +383,152 @@ static void free_oa_config(struct drm_i915_private *dev_priv,
>>          kfree(oa_config);
>>   }
>>   
>> -static void put_oa_config(struct drm_i915_private *dev_priv,
>> -                         struct i915_oa_config *oa_config)
>> +static u32 *write_cs_mi_lri(u32 *cs, const struct i915_oa_reg *reg_data, u32 n_regs)
>>   {
>> -       if (!atomic_dec_and_test(&oa_config->ref_count))
>> -               return;
>> +       u32 i;
>>   
>> -       free_oa_config(dev_priv, oa_config);
>> +       for (i = 0; i < n_regs; i++) {
>> +               if ((i % MI_LOAD_REGISTER_IMM_MAX_REGS) == 0) {
>> +                       u32 n_lri = min(n_regs - i,
>> +                                       (u32) MI_LOAD_REGISTER_IMM_MAX_REGS);
>> +
>> +                       *cs++ = MI_LOAD_REGISTER_IMM(n_lri);
>> +               }
>> +               *cs++ = i915_mmio_reg_offset(reg_data[i].addr);
>> +               *cs++ = reg_data[i].value;
>> +       }
>> +
>> +       return cs;
>>   }
>>   
>> -static int get_oa_config(struct drm_i915_private *dev_priv,
>> -                        int metrics_set,
>> -                        struct i915_oa_config **out_config)
>> +static int alloc_oa_config_buffer(struct drm_i915_private *i915,
>> +                                 struct i915_oa_config *oa_config)
>>   {
>> +       struct drm_i915_gem_object *bo;
>> +       size_t config_length = 0;
>>          int ret;
>> +       u32 *cs;
>>   
>> -       if (metrics_set == 1) {
>> -               *out_config = &dev_priv->perf.oa.test_config;
>> -               atomic_inc(&dev_priv->perf.oa.test_config.ref_count);
>> -               return 0;
>> +       if (oa_config->mux_regs_len > 0) {
>> +               config_length += DIV_ROUND_UP(oa_config->mux_regs_len,
>> +                                             MI_LOAD_REGISTER_IMM_MAX_REGS) * 4;
>> +               config_length += oa_config->mux_regs_len * 8;
>>          }
>> +       if (oa_config->b_counter_regs_len > 0) {
>> +               config_length += DIV_ROUND_UP(oa_config->b_counter_regs_len,
>> +                                             MI_LOAD_REGISTER_IMM_MAX_REGS) * 4;
>> +               config_length += oa_config->b_counter_regs_len * 8;
>> +       }
>> +       if (oa_config->flex_regs_len > 0) {
>> +               config_length += DIV_ROUND_UP(oa_config->flex_regs_len,
>> +                                             MI_LOAD_REGISTER_IMM_MAX_REGS) * 4;
>> +               config_length += oa_config->flex_regs_len * 8;
>> +       }
>> +       config_length += 4; /* MI_BATCH_BUFFER_END */
>> +       config_length = ALIGN(config_length, I915_GTT_PAGE_SIZE);
>>   
>> -       ret = mutex_lock_interruptible(&dev_priv->perf.metrics_lock);
>> +       ret = i915_mutex_lock_interruptible(&i915->drm);
>>          if (ret)
>>                  return ret;
>>   
>> -       *out_config = idr_find(&dev_priv->perf.metrics_idr, metrics_set);
>> -       if (!*out_config)
>> -               ret = -EINVAL;
>> -       else
>> -               atomic_inc(&(*out_config)->ref_count);
>> +       bo = i915_gem_object_create(i915, config_length);
>> +       if (IS_ERR(bo)) {
>> +               ret = PTR_ERR(bo);
>> +               goto unlock;
>> +       }
>> +
>> +       ret = i915_gem_object_set_cache_level(bo, I915_CACHE_LLC);
> Don't enable snoop on a batchbuffer.


Oh right, dropping.


>
>> +       if (ret)
>> +               goto err_unref;
>>   
>> -       mutex_unlock(&dev_priv->perf.metrics_lock);
>> +       oa_config->vma = i915_gem_object_ggtt_pin(bo, NULL, 0, config_length, 0);
> Why have you pinned it?


Duh, I guess I can just pin it at execbuf time!

Thanks!


>
>> +       if (IS_ERR(oa_config->vma)) {
>> +               ret = PTR_ERR(oa_config->vma);
>> +               oa_config->vma = NULL;
>> +               goto err_unref;
>> +       }
>> +
>> +       cs = i915_gem_object_pin_map(bo, I915_MAP_WB);
>> +       if (IS_ERR(cs)) {
>> +               ret = PTR_ERR(cs);
>> +               goto err_unpin;
>> +       }
>> +
>> +       memset(cs, 0, config_length);
> Already zero. Or use create_internal to avoid shmemfs overhead. And
> since you write all bytes, you can just ignore it.


Cool, will drop.


>
>> +       cs = write_cs_mi_lri(cs, oa_config->mux_regs, oa_config->mux_regs_len);
>> +       cs = write_cs_mi_lri(cs, oa_config->b_counter_regs, oa_config->b_counter_regs_len);
>> +       cs = write_cs_mi_lri(cs, oa_config->flex_regs, oa_config->flex_regs_len);
>> +
>> +       *cs++ = MI_BATCH_BUFFER_END;
>> +
>> +       i915_gem_object_unpin_map(bo);
>> +
>> +       goto unlock;
>> +
>> +err_unpin:
>> +       __i915_vma_unpin(oa_config->vma);
>> +
>> +err_unref:
>> +       oa_config->vma = NULL;
>> +       i915_gem_object_put(bo);
>> +
>> +unlock:
>> +       mutex_unlock(&i915->drm.struct_mutex);
>> +       return ret;
>> +}
>> +
>> +int i915_perf_get_oa_config(struct drm_i915_private *i915,
>> +                           int metrics_set,
>> +                           struct i915_oa_config **out_config,
>> +                           struct i915_vma **out_vma)
>> +{
>> +       int ret = 0;
>> +       struct i915_oa_config *oa_config;
>> +
>> +       if (!i915->perf.initialized)
>> +               return -ENODEV;
>> +
>> +       ret = mutex_lock_interruptible(&i915->perf.metrics_lock);
>> +       if (ret)
>> +               return ret;
>> +
>> +       if (metrics_set == 1) {
>> +               oa_config = &i915->perf.oa.test_config;
>> +       } else {
>> +               oa_config = idr_find(&i915->perf.metrics_idr, metrics_set);
>> +               if (!oa_config) {
>> +                       ret = -EINVAL;
>> +                       goto unlock;
>> +               }
>> +       }
>> +
>> +       if (out_config) {
>> +               atomic_inc(&oa_config->ref_count);
>> +               *out_config = oa_config;
>> +       }
>> +
>> +       if (out_vma) {
>> +               if (oa_config->vma) {
>> +                       *out_vma = i915_vma_get(oa_config->vma);
>> +               } else {
>> +                       ret = alloc_oa_config_buffer(i915, oa_config);
>> +                       if (ret) {
>> +                               goto err_buf_alloc;
>> +                       } else {
>> +                               list_add(&oa_config->vma_link,
>> +                                        &i915->perf.metrics_buffers);
>> +                               *out_vma = i915_vma_get(oa_config->vma);
>> +                       }
>> +               }
> Where is out_vma used so we can check if the litetime tracking is ok as
> so far you are releasing it before we know it is idle.


It's part of patch 3.


> -Chris
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 3/3] drm/i915: add a new perf configuration execbuf parameter
  2018-10-08 15:18 ` [RFC PATCH 3/3] drm/i915: add a new perf configuration execbuf parameter Lionel Landwerlin
@ 2018-10-08 15:44   ` Chris Wilson
  0 siblings, 0 replies; 13+ messages in thread
From: Chris Wilson @ 2018-10-08 15:44 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2018-10-08 16:18:22)
> We want the ability to dispatch a set of command buffer to the
> hardware, each with a different OA configuration. To achieve this, we
> reuse a couple of fields from the execbuf2 struct (I CAN HAZ
> execbuf3?) to notify what OA configuration should be used for a batch
> buffer. This requires the process making the execbuf with this flag to
> also own the perf fd at the time of execbuf.

Sigh. It's a distinct step from emit_bb_start and should be using
emit_bb_start itself to execute the batch. Use i915_vma_move_to_active
to couple into retirement correctly, and use pin properly.

If you feel emit_bb is doing too much, split it up into primitives
rather than continue to overload it; emit_bb is used and will be used
outside of execbuf.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 1/3] drm/i915/perf: allow holding preemption on filtered ctx
  2018-10-08 15:35   ` Chris Wilson
@ 2018-10-08 15:45     ` Lionel Landwerlin
  0 siblings, 0 replies; 13+ messages in thread
From: Lionel Landwerlin @ 2018-10-08 15:45 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 08/10/2018 16:35, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2018-10-08 16:18:20)
>> We would like to make use of perf in Vulkan. The Vulkan API is much
>> lower level than OpenGL, with applications directly exposed to the
>> concept of command buffers (pretty much equivalent to our batch
>> buffers). In Vulkan, queries are always limited in scope to a command
>> buffer. In OpenGL, the lack of command buffer concept meant that
>> queries' duration could span multiple command buffers.
>>
>> With that restriction gone in Vulkan, we would like to simplify
>> measuring performance just by measuring the deltas between the counter
>> snapshots written by 2 MI_RECORD_PERF_COUNT commands, rather than the
>> more complex scheme we currently have in the GL driver, using 2
>> MI_RECORD_PERF_COUNT commands and doing some post processing on the
>> stream of OA reports, coming from the global OA buffer, to remove any
>> unrelated deltas in between the 2 MI_RECORD_PERF_COUNT.
>>
>> Disabling preemption only apply to a single context with which want to
>> query performance counters for and is considered a privileged
>> operation, by default protected by CAP_SYS_ADMIN. It is possible to
>> enable it for a normal user by disabling the paranoid stream setting.
> I'm am uncomfortable with disabling preemption like this. I suggest we
> at least have the preemption timeout in place first.
> -Chris
>
Sure, I'm not super happy about that either but that's the hardware we 
have to work with :/

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* ✓ Fi.CI.BAT: success for drm/i915: serialized performance queries
  2018-10-08 15:18 [RFC PATCH 0/3] drm/i915: serialized performance queries Lionel Landwerlin
                   ` (4 preceding siblings ...)
  2018-10-08 15:32 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2018-10-08 15:55 ` Patchwork
  2018-10-08 17:14 ` ✗ Fi.CI.IGT: failure " Patchwork
  6 siblings, 0 replies; 13+ messages in thread
From: Patchwork @ 2018-10-08 15:55 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: serialized performance queries
URL   : https://patchwork.freedesktop.org/series/50698/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4950 -> Patchwork_10390 =

== Summary - SUCCESS ==

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/50698/revisions/1/mbox/

== Known issues ==

  Here are the changes found in Patchwork_10390 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@drv_getparams_basic@basic-subslice-total:
      fi-snb-2520m:       PASS -> DMESG-WARN (fdo#103713) +10

    igt@gem_exec_suspend@basic-s3:
      fi-kbl-soraka:      NOTRUN -> INCOMPLETE (fdo#107774, fdo#107859, fdo#107556)

    igt@kms_flip@basic-flip-vs-dpms:
      fi-skl-6700hq:      PASS -> DMESG-WARN (fdo#105998)

    igt@kms_pipe_crc_basic@hang-read-crc-pipe-b:
      fi-byt-clapper:     PASS -> FAIL (fdo#107362, fdo#103191)

    
    ==== Possible fixes ====

    igt@gem_exec_suspend@basic-s4-devices:
      fi-kbl-7500u:       DMESG-WARN (fdo#105128, fdo#107139) -> PASS

    igt@kms_flip@basic-flip-vs-modeset:
      fi-hsw-4770r:       DMESG-WARN (fdo#105602) -> PASS

    igt@kms_frontbuffer_tracking@basic:
      fi-byt-clapper:     FAIL (fdo#103167) -> PASS

    
  fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
  fdo#103191 https://bugs.freedesktop.org/show_bug.cgi?id=103191
  fdo#103713 https://bugs.freedesktop.org/show_bug.cgi?id=103713
  fdo#105128 https://bugs.freedesktop.org/show_bug.cgi?id=105128
  fdo#105602 https://bugs.freedesktop.org/show_bug.cgi?id=105602
  fdo#105998 https://bugs.freedesktop.org/show_bug.cgi?id=105998
  fdo#107139 https://bugs.freedesktop.org/show_bug.cgi?id=107139
  fdo#107362 https://bugs.freedesktop.org/show_bug.cgi?id=107362
  fdo#107556 https://bugs.freedesktop.org/show_bug.cgi?id=107556
  fdo#107774 https://bugs.freedesktop.org/show_bug.cgi?id=107774
  fdo#107859 https://bugs.freedesktop.org/show_bug.cgi?id=107859


== Participating hosts (50 -> 45) ==

  Additional (1): fi-kbl-soraka 
  Missing    (6): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-icl-u2 fi-ctg-p8600 


== Build changes ==

    * Linux: CI_DRM_4950 -> Patchwork_10390

  CI_DRM_4950: a9abc43bebfb6de62c2c3747a22fadfa17b61d8b @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4671: b121f7d42c260ae3a050c3f440d1c11f7cff7d1a @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_10390: 81dc092f58d50b34a5dd22efb926bc0c65142288 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

81dc092f58d5 drm/i915: add a new perf configuration execbuf parameter
ccae77ecfc25 drm/i915/perf: allow for CS OA configs to be created lazily
40102d5976dd drm/i915/perf: allow holding preemption on filtered ctx

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10390/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* ✗ Fi.CI.IGT: failure for drm/i915: serialized performance queries
  2018-10-08 15:18 [RFC PATCH 0/3] drm/i915: serialized performance queries Lionel Landwerlin
                   ` (5 preceding siblings ...)
  2018-10-08 15:55 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2018-10-08 17:14 ` Patchwork
  6 siblings, 0 replies; 13+ messages in thread
From: Patchwork @ 2018-10-08 17:14 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: serialized performance queries
URL   : https://patchwork.freedesktop.org/series/50698/
State : failure

== Summary ==

= CI Bug Log - changes from CI_DRM_4950_full -> Patchwork_10390_full =

== Summary - FAILURE ==

  Serious unknown changes coming with Patchwork_10390_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_10390_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_10390_full:

  === IGT changes ===

    ==== Possible regressions ====

    igt@pm_rps@min-max-config-idle:
      shard-hsw:          PASS -> FAIL

    
    ==== Warnings ====

    igt@kms_vblank@pipe-a-wait-forked-busy:
      shard-snb:          PASS -> SKIP +1

    
== Known issues ==

  Here are the changes found in Patchwork_10390_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@gem_cpu_reloc@full:
      shard-skl:          NOTRUN -> INCOMPLETE (fdo#108073)

    igt@gem_exec_big:
      shard-hsw:          PASS -> TIMEOUT (fdo#107937)

    igt@kms_available_modes_crc@available_mode_test_crc:
      shard-apl:          PASS -> FAIL (fdo#106641)

    igt@kms_ccs@pipe-a-crc-sprite-planes-basic:
      shard-skl:          NOTRUN -> FAIL (fdo#105458)

    igt@kms_color@pipe-c-legacy-gamma:
      shard-skl:          NOTRUN -> FAIL (fdo#104782)

    igt@kms_cursor_crc@cursor-128x128-suspend:
      shard-glk:          PASS -> INCOMPLETE (k.org#198133, fdo#103359) +1
      shard-snb:          PASS -> DMESG-WARN (fdo#102365)

    igt@kms_cursor_crc@cursor-64x21-random:
      shard-apl:          PASS -> FAIL (fdo#103232) +4

    igt@kms_cursor_legacy@2x-cursor-vs-flip-legacy:
      shard-glk:          NOTRUN -> INCOMPLETE (k.org#198133, fdo#103359)

    igt@kms_flip@flip-vs-expired-vblank-interruptible:
      shard-skl:          PASS -> FAIL (fdo#105363)

    igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-pwrite:
      shard-apl:          PASS -> FAIL (fdo#103167) +1

    {igt@kms_plane_alpha_blend@pipe-a-coverage-7efc}:
      shard-skl:          NOTRUN -> FAIL (fdo#108145) +1

    igt@kms_rotation_crc@exhaust-fences:
      shard-skl:          NOTRUN -> DMESG-WARN (fdo#105748)

    igt@kms_setmode@basic:
      shard-kbl:          PASS -> FAIL (fdo#99912)

    igt@kms_universal_plane@universal-plane-pipe-c-functional:
      shard-apl:          PASS -> FAIL (fdo#103166)

    igt@perf_pmu@busy-accuracy-98-rcs0:
      shard-snb:          SKIP -> INCOMPLETE (fdo#105411)

    igt@pm_rpm@legacy-planes:
      shard-skl:          PASS -> INCOMPLETE (fdo#105959, fdo#107807)

    igt@pm_rpm@universal-planes:
      shard-skl:          PASS -> INCOMPLETE (fdo#107807)

    
    ==== Possible fixes ====

    igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-a:
      shard-skl:          DMESG-WARN (fdo#107956) -> PASS

    igt@kms_ccs@pipe-a-crc-primary-rotation-180:
      shard-skl:          FAIL (fdo#107725) -> PASS

    igt@kms_ccs@pipe-a-crc-sprite-planes-basic:
      shard-glk:          FAIL (fdo#108145) -> PASS

    igt@kms_color@pipe-a-legacy-gamma:
      shard-apl:          FAIL (fdo#108145, fdo#104782) -> PASS

    igt@kms_color@pipe-b-legacy-gamma:
      shard-apl:          FAIL (fdo#104782) -> PASS

    igt@kms_draw_crc@draw-method-rgb565-mmap-gtt-xtiled:
      shard-skl:          FAIL (fdo#103184) -> PASS

    igt@kms_flip@2x-flip-vs-rmfb-interruptible:
      shard-glk:          INCOMPLETE (k.org#198133, fdo#103359) -> PASS

    igt@kms_flip@flip-vs-expired-vblank-interruptible:
      shard-glk:          FAIL (fdo#105363) -> PASS

    igt@kms_flip@plain-flip-fb-recreate-interruptible:
      shard-skl:          FAIL (fdo#100368) -> PASS

    igt@kms_frontbuffer_tracking@fbc-1p-primscrn-indfb-msflip-blt:
      shard-skl:          FAIL (fdo#103167) -> PASS +2

    igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-move:
      shard-apl:          FAIL (fdo#103167) -> PASS

    igt@kms_frontbuffer_tracking@fbc-1p-rte:
      shard-apl:          FAIL (fdo#105682, fdo#103167) -> PASS

    igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-onoff:
      shard-glk:          FAIL (fdo#103167) -> PASS +7

    igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes:
      shard-skl:          INCOMPLETE (fdo#104108) -> PASS

    igt@kms_plane_multiple@atomic-pipe-a-tiling-y:
      shard-apl:          FAIL (fdo#103166) -> PASS +1

    igt@kms_setmode@basic:
      shard-apl:          FAIL (fdo#99912) -> PASS

    
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
  fdo#102365 https://bugs.freedesktop.org/show_bug.cgi?id=102365
  fdo#103166 https://bugs.freedesktop.org/show_bug.cgi?id=103166
  fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
  fdo#103184 https://bugs.freedesktop.org/show_bug.cgi?id=103184
  fdo#103232 https://bugs.freedesktop.org/show_bug.cgi?id=103232
  fdo#103359 https://bugs.freedesktop.org/show_bug.cgi?id=103359
  fdo#104108 https://bugs.freedesktop.org/show_bug.cgi?id=104108
  fdo#104782 https://bugs.freedesktop.org/show_bug.cgi?id=104782
  fdo#105363 https://bugs.freedesktop.org/show_bug.cgi?id=105363
  fdo#105411 https://bugs.freedesktop.org/show_bug.cgi?id=105411
  fdo#105458 https://bugs.freedesktop.org/show_bug.cgi?id=105458
  fdo#105682 https://bugs.freedesktop.org/show_bug.cgi?id=105682
  fdo#105748 https://bugs.freedesktop.org/show_bug.cgi?id=105748
  fdo#105959 https://bugs.freedesktop.org/show_bug.cgi?id=105959
  fdo#106641 https://bugs.freedesktop.org/show_bug.cgi?id=106641
  fdo#107725 https://bugs.freedesktop.org/show_bug.cgi?id=107725
  fdo#107807 https://bugs.freedesktop.org/show_bug.cgi?id=107807
  fdo#107937 https://bugs.freedesktop.org/show_bug.cgi?id=107937
  fdo#107956 https://bugs.freedesktop.org/show_bug.cgi?id=107956
  fdo#108073 https://bugs.freedesktop.org/show_bug.cgi?id=108073
  fdo#108145 https://bugs.freedesktop.org/show_bug.cgi?id=108145
  fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912
  k.org#198133 https://bugzilla.kernel.org/show_bug.cgi?id=198133


== Participating hosts (6 -> 6) ==

  No changes in participating hosts


== Build changes ==

    * Linux: CI_DRM_4950 -> Patchwork_10390

  CI_DRM_4950: a9abc43bebfb6de62c2c3747a22fadfa17b61d8b @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4671: b121f7d42c260ae3a050c3f440d1c11f7cff7d1a @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_10390: 81dc092f58d50b34a5dd22efb926bc0c65142288 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10390/shards.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-10-08 17:14 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-08 15:18 [RFC PATCH 0/3] drm/i915: serialized performance queries Lionel Landwerlin
2018-10-08 15:18 ` [RFC PATCH 1/3] drm/i915/perf: allow holding preemption on filtered ctx Lionel Landwerlin
2018-10-08 15:35   ` Chris Wilson
2018-10-08 15:45     ` Lionel Landwerlin
2018-10-08 15:18 ` [RFC PATCH 2/3] drm/i915/perf: allow for CS OA configs to be created lazily Lionel Landwerlin
2018-10-08 15:34   ` Chris Wilson
2018-10-08 15:44     ` Lionel Landwerlin
2018-10-08 15:18 ` [RFC PATCH 3/3] drm/i915: add a new perf configuration execbuf parameter Lionel Landwerlin
2018-10-08 15:44   ` Chris Wilson
2018-10-08 15:30 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915: serialized performance queries Patchwork
2018-10-08 15:32 ` ✗ Fi.CI.SPARSE: " Patchwork
2018-10-08 15:55 ` ✓ Fi.CI.BAT: success " Patchwork
2018-10-08 17:14 ` ✗ Fi.CI.IGT: failure " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.