[PATCH v6 00/11] drm/i915: Vulkan performance query support

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v6 00/11] drm/i915: Vulkan performance query support
@ 2019-07-01 11:34 Lionel Landwerlin
  2019-07-01 11:34 ` [PATCH v6 01/11] drm/i915/perf: add missing delay for OA muxes configuration Lionel Landwerlin
                   ` (14 more replies)
  0 siblings, 15 replies; 44+ messages in thread
From: Lionel Landwerlin @ 2019-07-01 11:34 UTC (permalink / raw)
  To: intel-gfx

Hi all,

Here are a number of fixes and improvement over v5.

Here is a summary :

     * Name offsets/fields used in the scratch buffer

     * Save/restore used CS_GPR registers for perf delay

     * Limiting taking of global lock now that we have configuration
       happening on CS

     * Prevent structure to be listed more than once in execbuffer
       extension chain

Many thanks to Chris for his comments.

Cheers,

Lionel Landwerlin (11):
  drm/i915/perf: add missing delay for OA muxes configuration
  drm/i915/perf: introduce a versioning of the i915-perf uapi
  drm/i915/perf: allow for CS OA configs to be created lazily
  drm/i915: enumerate scratch fields
  drm/i915/perf: implement active wait for noa configurations
  drm/i915: introduce a mechanism to extend execbuf2
  drm/i915: add syncobj timeline support
  drm/i915: add a new perf configuration execbuf parameter
  drm/i915/perf: allow holding preemption on filtered ctx
  drm/i915/perf: execute OA configuration from command stream
  drm/i915: add support for perf configuration queries

 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 452 +++++++++++--
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |   2 +
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   9 +
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h  |  25 +
 drivers/gpu/drm/i915/gt/intel_gt.h            |   6 +-
 drivers/gpu/drm/i915/gt/intel_gt_types.h      |  20 +
 drivers/gpu/drm/i915/gt/intel_lrc.c           |  32 +-
 drivers/gpu/drm/i915/gt/intel_ringbuffer.c    |  35 +-
 drivers/gpu/drm/i915/i915_debugfs.c           |  25 +
 drivers/gpu/drm/i915/i915_drv.c               |  11 +-
 drivers/gpu/drm/i915/i915_drv.h               |  61 +-
 drivers/gpu/drm/i915/i915_perf.c              | 629 +++++++++++++++---
 drivers/gpu/drm/i915/i915_priolist_types.h    |   7 +
 drivers/gpu/drm/i915/i915_query.c             | 279 ++++++++
 drivers/gpu/drm/i915/i915_reg.h               |   4 +-
 drivers/gpu/drm/i915/i915_request.c           |   4 +-
 drivers/gpu/drm/i915/i915_request.h           |  14 +-
 drivers/gpu/drm/i915/intel_guc_submission.c   |  10 +-
 drivers/gpu/drm/i915/intel_pm.c               |   5 +-
 include/uapi/drm/i915_drm.h                   | 193 +++++-
 20 files changed, 1625 insertions(+), 198 deletions(-)

--
2.21.0.392.gf8f6787159e
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v6 01/11] drm/i915/perf: add missing delay for OA muxes configuration
  2019-07-01 11:34 [PATCH v6 00/11] drm/i915: Vulkan performance query support Lionel Landwerlin
@ 2019-07-01 11:34 ` Lionel Landwerlin
  2019-07-01 11:34 ` [PATCH v6 02/11] drm/i915/perf: introduce a versioning of the i915-perf uapi Lionel Landwerlin
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: Lionel Landwerlin @ 2019-07-01 11:34 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 19f81df2859eb1 ("drm/i915/perf: Add OA unit support for Gen 8+")
---
 drivers/gpu/drm/i915/i915_perf.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 357e63beb373..2094358860d5 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1838,6 +1838,29 @@ static int gen8_enable_metric_set(struct i915_perf_stream *stream)
 
 	config_oa_regs(dev_priv, oa_config->mux_regs, oa_config->mux_regs_len);
 
+	/* It apparently takes a fairly long time for a new MUX
+	 * configuration to be be applied after these register writes.
+	 * This delay duration was derived empirically based on the
+	 * render_basic config but hopefully it covers the maximum
+	 * configuration latency.
+	 *
+	 * As a fallback, the checks in _append_oa_reports() to skip
+	 * invalid OA reports do also seem to work to discard reports
+	 * generated before this config has completed - albeit not
+	 * silently.
+	 *
+	 * Unfortunately this is essentially a magic number, since we
+	 * don't currently know of a reliable mechanism for predicting
+	 * how long the MUX config will take to apply and besides
+	 * seeing invalid reports we don't know of a reliable way to
+	 * explicitly check that the MUX config has landed.
+	 *
+	 * It's even possible we've miss characterized the underlying
+	 * problem - it just seems like the simplest explanation why
+	 * a delay at this location would mitigate any invalid reports.
+	 */
+	usleep_range(15000, 20000);
+
 	config_oa_regs(dev_priv, oa_config->b_counter_regs,
 		       oa_config->b_counter_regs_len);
 
-- 
2.21.0.392.gf8f6787159e

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v6 02/11] drm/i915/perf: introduce a versioning of the i915-perf uapi
  2019-07-01 11:34 [PATCH v6 00/11] drm/i915: Vulkan performance query support Lionel Landwerlin
  2019-07-01 11:34 ` [PATCH v6 01/11] drm/i915/perf: add missing delay for OA muxes configuration Lionel Landwerlin
@ 2019-07-01 11:34 ` Lionel Landwerlin
  2019-07-01 12:45   ` Chris Wilson
  2019-07-01 11:34 ` [PATCH v6 03/11] drm/i915/perf: allow for CS OA configs to be created lazily Lionel Landwerlin
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 44+ messages in thread
From: Lionel Landwerlin @ 2019-07-01 11:34 UTC (permalink / raw)
  To: intel-gfx

Reporting this version will help application figure out what level of
the support the running kernel provides.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c |  3 +++
 include/uapi/drm/i915_drm.h     | 21 +++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 794c6814a6d0..fa02e8f033d7 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -483,6 +483,9 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_PARAM_MMAP_GTT_COHERENT:
 		value = INTEL_INFO(dev_priv)->has_coherent_ggtt;
 		break;
+	case I915_PARAM_PERF_REVISION:
+		value = 1;
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 328d05e77d9f..e27a8eda9121 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -610,6 +610,13 @@ typedef struct drm_i915_irq_wait {
  * See I915_EXEC_FENCE_OUT and I915_EXEC_FENCE_SUBMIT.
  */
 #define I915_PARAM_HAS_EXEC_SUBMIT_FENCE 53
+
+/*
+ * Revision of the i915-perf uAPI. The value returned helps determine what
+ * i915-perf features are available. See drm_i915_perf_property_id.
+ */
+#define I915_PARAM_PERF_REVISION	54
+
 /* Must be kept compact -- no holes and well documented */
 
 typedef struct drm_i915_getparam {
@@ -1843,23 +1850,31 @@ enum drm_i915_perf_property_id {
 	 * Open the stream for a specific context handle (as used with
 	 * execbuffer2). A stream opened for a specific context this way
 	 * won't typically require root privileges.
+	 *
+	 * This property is available in perf revision 1.
 	 */
 	DRM_I915_PERF_PROP_CTX_HANDLE = 1,
 
 	/**
 	 * A value of 1 requests the inclusion of raw OA unit reports as
 	 * part of stream samples.
+	 *
+	 * This property is available in perf revision 1.
 	 */
 	DRM_I915_PERF_PROP_SAMPLE_OA,
 
 	/**
 	 * The value specifies which set of OA unit metrics should be
 	 * be configured, defining the contents of any OA unit reports.
+	 *
+	 * This property is available in perf revision 1.
 	 */
 	DRM_I915_PERF_PROP_OA_METRICS_SET,
 
 	/**
 	 * The value specifies the size and layout of OA unit reports.
+	 *
+	 * This property is available in perf revision 1.
 	 */
 	DRM_I915_PERF_PROP_OA_FORMAT,
 
@@ -1869,6 +1884,8 @@ enum drm_i915_perf_property_id {
 	 * from this exponent as follows:
 	 *
 	 *   80ns * 2^(period_exponent + 1)
+	 *
+	 * This property is available in perf revision 1.
 	 */
 	DRM_I915_PERF_PROP_OA_EXPONENT,
 
@@ -1900,6 +1917,8 @@ struct drm_i915_perf_open_param {
  * to close and re-open a stream with the same configuration.
  *
  * It's undefined whether any pending data for the stream will be lost.
+ *
+ * This ioctl is available in perf revision 1.
  */
 #define I915_PERF_IOCTL_ENABLE	_IO('i', 0x0)
 
@@ -1907,6 +1926,8 @@ struct drm_i915_perf_open_param {
  * Disable data capture for a stream.
  *
  * It is an error to try and read a stream that is disabled.
+ *
+ * This ioctl is available in perf revision 1.
  */
 #define I915_PERF_IOCTL_DISABLE	_IO('i', 0x1)
 
-- 
2.21.0.392.gf8f6787159e

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v6 03/11] drm/i915/perf: allow for CS OA configs to be created lazily
  2019-07-01 11:34 [PATCH v6 00/11] drm/i915: Vulkan performance query support Lionel Landwerlin
  2019-07-01 11:34 ` [PATCH v6 01/11] drm/i915/perf: add missing delay for OA muxes configuration Lionel Landwerlin
  2019-07-01 11:34 ` [PATCH v6 02/11] drm/i915/perf: introduce a versioning of the i915-perf uapi Lionel Landwerlin
@ 2019-07-01 11:34 ` Lionel Landwerlin
  2019-07-01 13:06   ` Chris Wilson
                     ` (2 more replies)
  2019-07-01 11:34 ` [PATCH v6 04/11] drm/i915: enumerate scratch fields Lionel Landwerlin
                   ` (11 subsequent siblings)
  14 siblings, 3 replies; 44+ messages in thread
From: Lionel Landwerlin @ 2019-07-01 11:34 UTC (permalink / raw)
  To: intel-gfx

Here we introduce a mechanism by which the execbuf part of the i915
driver will be able to request that a batch buffer containing the
programming for a particular OA config be created.

We'll execute these OA configuration buffers right before executing a
set of userspace commands so that a particular user batchbuffer be
executed with a given OA configuration.

This mechanism essentially allows the userspace driver to go through
several OA configuration without having to open/close the i915/perf
stream.

v2: No need for locking on object OA config object creation (Chris)
    Flush cpu mapping of OA config (Chris)

v3: Properly deal with the perf_metric lock (Chris/Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h |   1 +
 drivers/gpu/drm/i915/i915_drv.h              |  24 ++-
 drivers/gpu/drm/i915/i915_perf.c             | 186 +++++++++++++++----
 3 files changed, 175 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
index eec31e36aca7..e7eff9db343e 100644
--- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
@@ -126,6 +126,7 @@
  */
 #define MI_LOAD_REGISTER_IMM(x)	MI_INSTR(0x22, 2*(x)-1)
 #define   MI_LRI_FORCE_POSTED		(1<<12)
+#define MI_LOAD_REGISTER_IMM_MAX_REGS (126)
 #define MI_STORE_REGISTER_MEM        MI_INSTR(0x24, 1)
 #define MI_STORE_REGISTER_MEM_GEN8   MI_INSTR(0x24, 2)
 #define   MI_SRM_LRM_GLOBAL_GTT		(1<<22)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7e981b03face..df39b2ee6bd9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1096,6 +1096,8 @@ struct i915_oa_reg {
 };
 
 struct i915_oa_config {
+	struct drm_i915_private *i915;
+
 	char uuid[UUID_STRING_LEN + 1];
 	int id;
 
@@ -1110,6 +1112,10 @@ struct i915_oa_config {
 	struct attribute *attrs[2];
 	struct device_attribute sysfs_metric_id;
 
+	struct drm_i915_gem_object *obj;
+
+	struct list_head vma_link;
+
 	atomic_t ref_count;
 };
 
@@ -1696,11 +1702,21 @@ struct drm_i915_private {
 		struct mutex metrics_lock;
 
 		/*
-		 * List of dynamic configurations, you need to hold
-		 * dev_priv->perf.metrics_lock to access it.
+		 * List of dynamic configurations (struct i915_oa_config), you
+		 * need to hold dev_priv->perf.metrics_lock to access it.
 		 */
 		struct idr metrics_idr;
 
+		/*
+		 * List of dynamic configurations (struct i915_oa_config)
+		 * which have an allocated buffer in GGTT for reconfiguration,
+		 * you need to hold dev_priv->perf.metrics_lock to access it.
+		 * Elements are added to the list lazilly on execbuf (when a
+		 * particular configuration is requested). The list is freed
+		 * upon closing the perf stream.
+		 */
+		struct list_head metrics_buffers;
+
 		/*
 		 * Lock associated with anything below within this structure
 		 * except exclusive_stream.
@@ -2587,6 +2603,10 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data,
 void i915_oa_init_reg_state(struct intel_engine_cs *engine,
 			    struct intel_context *ce,
 			    u32 *reg_state);
+int i915_perf_get_oa_config(struct drm_i915_private *i915,
+			    int metrics_set,
+			    struct i915_oa_config **out_config,
+			    struct drm_i915_gem_object **out_obj);
 
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct i915_address_space *vm,
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 2094358860d5..5ba771468078 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -366,9 +366,20 @@ struct perf_open_properties {
 	int oa_period_exponent;
 };
 
-static void free_oa_config(struct drm_i915_private *dev_priv,
-			   struct i915_oa_config *oa_config)
+static void put_oa_config(struct i915_oa_config *oa_config)
 {
+	if (!atomic_dec_and_test(&oa_config->ref_count))
+		return;
+
+	if (oa_config->obj) {
+		struct drm_i915_private *i915 = oa_config->i915;
+
+		mutex_lock(&i915->perf.metrics_lock);
+		list_del(&oa_config->vma_link);
+		i915_gem_object_put(oa_config->obj);
+		mutex_unlock(&i915->perf.metrics_lock);
+	}
+
 	if (!PTR_ERR(oa_config->flex_regs))
 		kfree(oa_config->flex_regs);
 	if (!PTR_ERR(oa_config->b_counter_regs))
@@ -378,38 +389,124 @@ static void free_oa_config(struct drm_i915_private *dev_priv,
 	kfree(oa_config);
 }
 
-static void put_oa_config(struct drm_i915_private *dev_priv,
-			  struct i915_oa_config *oa_config)
+static u32 *write_cs_mi_lri(u32 *cs, const struct i915_oa_reg *reg_data, u32 n_regs)
 {
-	if (!atomic_dec_and_test(&oa_config->ref_count))
-		return;
+	u32 i;
+
+	for (i = 0; i < n_regs; i++) {
+		if ((i % MI_LOAD_REGISTER_IMM_MAX_REGS) == 0) {
+			u32 n_lri = min(n_regs - i,
+					(u32) MI_LOAD_REGISTER_IMM_MAX_REGS);
+
+			*cs++ = MI_LOAD_REGISTER_IMM(n_lri);
+		}
+		*cs++ = i915_mmio_reg_offset(reg_data[i].addr);
+		*cs++ = reg_data[i].value;
+	}
 
-	free_oa_config(dev_priv, oa_config);
+	return cs;
 }
 
-static int get_oa_config(struct drm_i915_private *dev_priv,
-			 int metrics_set,
-			 struct i915_oa_config **out_config)
+static int alloc_oa_config_buffer(struct drm_i915_private *i915,
+				  struct i915_oa_config *oa_config)
 {
-	int ret;
+	struct drm_i915_gem_object *bo;
+	size_t config_length = 0;
+	u32 *cs;
 
-	if (metrics_set == 1) {
-		*out_config = &dev_priv->perf.oa.test_config;
-		atomic_inc(&dev_priv->perf.oa.test_config.ref_count);
-		return 0;
+	if (oa_config->mux_regs_len > 0) {
+		config_length += DIV_ROUND_UP(oa_config->mux_regs_len,
+					      MI_LOAD_REGISTER_IMM_MAX_REGS) * 4;
+		config_length += oa_config->mux_regs_len * 8;
+	}
+	if (oa_config->b_counter_regs_len > 0) {
+		config_length += DIV_ROUND_UP(oa_config->b_counter_regs_len,
+					      MI_LOAD_REGISTER_IMM_MAX_REGS) * 4;
+		config_length += oa_config->b_counter_regs_len * 8;
+	}
+	if (oa_config->flex_regs_len > 0) {
+		config_length += DIV_ROUND_UP(oa_config->flex_regs_len,
+					      MI_LOAD_REGISTER_IMM_MAX_REGS) * 4;
+		config_length += oa_config->flex_regs_len * 8;
 	}
+	config_length += 4; /* MI_BATCH_BUFFER_END */
+	config_length = ALIGN(config_length, I915_GTT_PAGE_SIZE);
 
-	ret = mutex_lock_interruptible(&dev_priv->perf.metrics_lock);
+	bo = i915_gem_object_create_shmem(i915, config_length);
+	if (IS_ERR(bo))
+		return PTR_ERR(bo);
+
+	cs = i915_gem_object_pin_map(bo, I915_MAP_WB);
+	if (IS_ERR(cs)) {
+		i915_gem_object_put(bo);
+		return PTR_ERR(cs);
+	}
+
+	cs = write_cs_mi_lri(cs, oa_config->mux_regs, oa_config->mux_regs_len);
+	cs = write_cs_mi_lri(cs, oa_config->b_counter_regs, oa_config->b_counter_regs_len);
+	cs = write_cs_mi_lri(cs, oa_config->flex_regs, oa_config->flex_regs_len);
+
+	*cs++ = MI_BATCH_BUFFER_END;
+
+	i915_gem_object_flush_map(bo);
+	i915_gem_object_unpin_map(bo);
+
+	oa_config->obj = bo;
+
+	return 0;
+}
+
+int i915_perf_get_oa_config(struct drm_i915_private *i915,
+			    int metrics_set,
+			    struct i915_oa_config **out_config,
+			    struct drm_i915_gem_object **out_obj)
+{
+	int ret = 0;
+	struct i915_oa_config *oa_config;
+
+	if (!i915->perf.initialized)
+		return -ENODEV;
+
+	ret = mutex_lock_interruptible(&i915->perf.metrics_lock);
 	if (ret)
 		return ret;
 
-	*out_config = idr_find(&dev_priv->perf.metrics_idr, metrics_set);
-	if (!*out_config)
-		ret = -EINVAL;
-	else
-		atomic_inc(&(*out_config)->ref_count);
+	if (metrics_set == 1) {
+		oa_config = &i915->perf.oa.test_config;
+	} else {
+		oa_config = idr_find(&i915->perf.metrics_idr, metrics_set);
+		if (!oa_config) {
+			ret = -EINVAL;
+			goto err_unlock;
+		}
+	}
+
+	if (out_config) {
+		atomic_inc(&oa_config->ref_count);
+		*out_config = oa_config;
+	}
 
-	mutex_unlock(&dev_priv->perf.metrics_lock);
+	if (out_obj) {
+		if (oa_config->obj) {
+			*out_obj = i915_gem_object_get(oa_config->obj);
+		} else {
+			ret = alloc_oa_config_buffer(i915, oa_config);
+			if (ret)
+				goto err_unlock;
+
+			list_add(&oa_config->vma_link,
+				 &i915->perf.metrics_buffers);
+			*out_obj = i915_gem_object_get(oa_config->obj);
+		}
+	}
+
+err_unlock:
+	mutex_unlock(&i915->perf.metrics_lock);
+
+	if (ret && out_config) {
+		put_oa_config(oa_config);
+		*out_config = NULL;
+	}
 
 	return ret;
 }
@@ -1380,7 +1477,7 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
 	if (stream->ctx)
 		oa_put_render_ctx_id(stream);
 
-	put_oa_config(dev_priv, stream->oa_config);
+	put_oa_config(stream->oa_config);
 
 	if (dev_priv->perf.oa.spurious_report_rs.missed) {
 		DRM_NOTE("%d spurious OA report notices suppressed due to ratelimiting\n",
@@ -2117,7 +2214,8 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 		}
 	}
 
-	ret = get_oa_config(dev_priv, props->metrics_set, &stream->oa_config);
+	ret = i915_perf_get_oa_config(dev_priv, props->metrics_set,
+				      &stream->oa_config, NULL);
 	if (ret) {
 		DRM_DEBUG("Invalid OA config id=%i\n", props->metrics_set);
 		goto err_config;
@@ -2155,6 +2253,8 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 		goto err_enable;
 	}
 
+	DRM_DEBUG("opening stream oa config uuid=%s\n", stream->oa_config->uuid);
+
 	mutex_unlock(&dev_priv->drm.struct_mutex);
 
 	return 0;
@@ -2168,7 +2268,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	free_oa_buffer(dev_priv);
 
 err_oa_buf_alloc:
-	put_oa_config(dev_priv, stream->oa_config);
+	put_oa_config(stream->oa_config);
 
 	intel_uncore_forcewake_put(&dev_priv->uncore, FORCEWAKE_ALL);
 	intel_runtime_pm_put(&dev_priv->runtime_pm, stream->wakeref);
@@ -2535,9 +2635,21 @@ static int i915_perf_release(struct inode *inode, struct file *file)
 {
 	struct i915_perf_stream *stream = file->private_data;
 	struct drm_i915_private *dev_priv = stream->dev_priv;
+	struct i915_oa_config *oa_config, *next;
 
 	mutex_lock(&dev_priv->perf.lock);
+
 	i915_perf_destroy_locked(stream);
+
+	/* Dispose of all oa config batch buffers. */
+	mutex_lock(&dev_priv->perf.metrics_lock);
+	list_for_each_entry_safe(oa_config, next, &dev_priv->perf.metrics_buffers, vma_link) {
+		list_del(&oa_config->vma_link);
+		i915_gem_object_put(oa_config->obj);
+		oa_config->obj = NULL;
+	}
+	mutex_unlock(&dev_priv->perf.metrics_lock);
+
 	mutex_unlock(&dev_priv->perf.lock);
 
 	return 0;
@@ -2965,6 +3077,7 @@ void i915_perf_register(struct drm_i915_private *dev_priv)
 	if (ret)
 		goto sysfs_error;
 
+	dev_priv->perf.oa.test_config.i915 = dev_priv;
 	atomic_set(&dev_priv->perf.oa.test_config.ref_count, 1);
 
 	goto exit;
@@ -3221,6 +3334,7 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
 		return -ENOMEM;
 	}
 
+	oa_config->i915 = dev_priv;
 	atomic_set(&oa_config->ref_count, 1);
 
 	if (!uuid_is_valid(args->uuid)) {
@@ -3320,7 +3434,7 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
 sysfs_err:
 	mutex_unlock(&dev_priv->perf.metrics_lock);
 reg_err:
-	put_oa_config(dev_priv, oa_config);
+	put_oa_config(oa_config);
 	DRM_DEBUG("Failed to add new OA config\n");
 	return err;
 }
@@ -3356,13 +3470,13 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data,
 
 	ret = mutex_lock_interruptible(&dev_priv->perf.metrics_lock);
 	if (ret)
-		goto lock_err;
+		return ret;
 
 	oa_config = idr_find(&dev_priv->perf.metrics_idr, *arg);
 	if (!oa_config) {
 		DRM_DEBUG("Failed to remove unknown OA config\n");
 		ret = -ENOENT;
-		goto config_err;
+		goto err_unlock;
 	}
 
 	GEM_BUG_ON(*arg != oa_config->id);
@@ -3372,13 +3486,16 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data,
 
 	idr_remove(&dev_priv->perf.metrics_idr, *arg);
 
+	mutex_unlock(&dev_priv->perf.metrics_lock);
+
 	DRM_DEBUG("Removed config %s id=%i\n", oa_config->uuid, oa_config->id);
 
-	put_oa_config(dev_priv, oa_config);
+	put_oa_config(oa_config);
+
+	return 0;
 
-config_err:
+err_unlock:
 	mutex_unlock(&dev_priv->perf.metrics_lock);
-lock_err:
 	return ret;
 }
 
@@ -3520,6 +3637,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
 		init_waitqueue_head(&dev_priv->perf.oa.poll_wq);
 
 		INIT_LIST_HEAD(&dev_priv->perf.streams);
+		INIT_LIST_HEAD(&dev_priv->perf.metrics_buffers);
+
 		mutex_init(&dev_priv->perf.lock);
 		spin_lock_init(&dev_priv->perf.oa.oa_buffer.ptr_lock);
 
@@ -3536,10 +3655,9 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
 
 static int destroy_config(int id, void *p, void *data)
 {
-	struct drm_i915_private *dev_priv = data;
 	struct i915_oa_config *oa_config = p;
 
-	put_oa_config(dev_priv, oa_config);
+	put_oa_config(oa_config);
 
 	return 0;
 }
@@ -3553,7 +3671,7 @@ void i915_perf_fini(struct drm_i915_private *dev_priv)
 	if (!dev_priv->perf.initialized)
 		return;
 
-	idr_for_each(&dev_priv->perf.metrics_idr, destroy_config, dev_priv);
+	idr_for_each(&dev_priv->perf.metrics_idr, destroy_config, NULL);
 	idr_destroy(&dev_priv->perf.metrics_idr);
 
 	unregister_sysctl_table(dev_priv->perf.sysctl_header);
-- 
2.21.0.392.gf8f6787159e

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v6 04/11] drm/i915: enumerate scratch fields
  2019-07-01 11:34 [PATCH v6 00/11] drm/i915: Vulkan performance query support Lionel Landwerlin
                   ` (2 preceding siblings ...)
  2019-07-01 11:34 ` [PATCH v6 03/11] drm/i915/perf: allow for CS OA configs to be created lazily Lionel Landwerlin
@ 2019-07-01 11:34 ` Lionel Landwerlin
  2019-07-01 12:07   ` Chris Wilson
  2019-07-01 11:34 ` [PATCH v6 05/11] drm/i915/perf: implement active wait for noa configurations Lionel Landwerlin
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 44+ messages in thread
From: Lionel Landwerlin @ 2019-07-01 11:34 UTC (permalink / raw)
  To: intel-gfx

We have a bunch of offsets in the scratch buffer. As we're about to
add some more, let's group all of the offsets in a common location.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt.h         |  6 +++--
 drivers/gpu/drm/i915/gt/intel_gt_types.h   | 15 +++++++++++
 drivers/gpu/drm/i915/gt/intel_lrc.c        | 24 ++++++++++-------
 drivers/gpu/drm/i915/gt/intel_ringbuffer.c | 31 +++++++++++++++-------
 4 files changed, 54 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index cf3c6cecc8ee..d9ce1775be53 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -24,9 +24,11 @@ void intel_gt_chipset_flush(struct intel_gt *gt);
 int intel_gt_init_scratch(struct intel_gt *gt, unsigned int size);
 void intel_gt_fini_scratch(struct intel_gt *gt);
 
-static inline u32 intel_gt_scratch_offset(const struct intel_gt *gt)
+static inline u32 intel_gt_scratch_offset(const struct intel_gt *gt,
+					  enum intel_gt_scratch_field field)
 {
-	return i915_ggtt_offset(gt->scratch);
+
+	return i915_ggtt_offset(gt->scratch) + field;
 }
 
 #endif /* __INTEL_GT_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index c03e56628ee2..e625a5e320d3 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -57,4 +57,19 @@ struct intel_gt {
 	struct i915_vma *scratch;
 };
 
+enum intel_gt_scratch_field {
+	/* 8 bytes */
+	INTEL_GT_SCRATCH_FIELD_DEFAULT = 0,
+
+	/* 8 bytes */
+	INTEL_GT_SCRATCH_FIELD_CLEAR_SLM_WA = 128,
+
+	/* 8 bytes */
+	INTEL_GT_SCRATCH_FIELD_RENDER_FLUSH = 128,
+
+	/* 8 bytes */
+	INTEL_GT_SCRATCH_FIELD_COHERENTL3_WA = 256,
+
+};
+
 #endif /* __INTEL_GT_TYPES_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 471e134de186..cce8337bdf9c 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1758,7 +1758,8 @@ gen8_emit_flush_coherentl3_wa(struct intel_engine_cs *engine, u32 *batch)
 	/* NB no one else is allowed to scribble over scratch + 256! */
 	*batch++ = MI_STORE_REGISTER_MEM_GEN8 | MI_SRM_LRM_GLOBAL_GTT;
 	*batch++ = i915_mmio_reg_offset(GEN8_L3SQCREG4);
-	*batch++ = intel_gt_scratch_offset(engine->gt) + 256;
+	*batch++ = intel_gt_scratch_offset(engine->gt,
+					   INTEL_GT_SCRATCH_FIELD_COHERENTL3_WA);
 	*batch++ = 0;
 
 	*batch++ = MI_LOAD_REGISTER_IMM(1);
@@ -1772,7 +1773,8 @@ gen8_emit_flush_coherentl3_wa(struct intel_engine_cs *engine, u32 *batch)
 
 	*batch++ = MI_LOAD_REGISTER_MEM_GEN8 | MI_SRM_LRM_GLOBAL_GTT;
 	*batch++ = i915_mmio_reg_offset(GEN8_L3SQCREG4);
-	*batch++ = intel_gt_scratch_offset(engine->gt) + 256;
+	*batch++ = intel_gt_scratch_offset(engine->gt,
+					   INTEL_GT_SCRATCH_FIELD_COHERENTL3_WA);
 	*batch++ = 0;
 
 	return batch;
@@ -1804,13 +1806,14 @@ static u32 *gen8_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
 
 	/* WaClearSlmSpaceAtContextSwitch:bdw,chv */
 	/* Actual scratch location is at 128 bytes offset */
-	batch = gen8_emit_pipe_control(batch,
-				       PIPE_CONTROL_FLUSH_L3 |
-				       PIPE_CONTROL_GLOBAL_GTT_IVB |
-				       PIPE_CONTROL_CS_STALL |
-				       PIPE_CONTROL_QW_WRITE,
-				       intel_gt_scratch_offset(engine->gt) +
-				       2 * CACHELINE_BYTES);
+	batch = gen8_emit_pipe_control(
+		batch,
+		PIPE_CONTROL_FLUSH_L3 |
+		PIPE_CONTROL_GLOBAL_GTT_IVB |
+		PIPE_CONTROL_CS_STALL |
+		PIPE_CONTROL_QW_WRITE,
+		intel_gt_scratch_offset(engine->gt,
+					INTEL_GT_SCRATCH_FIELD_CLEAR_SLM_WA));
 
 	*batch++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
 
@@ -2503,7 +2506,8 @@ static int gen8_emit_flush_render(struct i915_request *request,
 {
 	struct intel_engine_cs *engine = request->engine;
 	u32 scratch_addr =
-		intel_gt_scratch_offset(engine->gt) + 2 * CACHELINE_BYTES;
+		intel_gt_scratch_offset(engine->gt,
+					INTEL_GT_SCRATCH_FIELD_RENDER_FLUSH);
 	bool vf_flush_wa = false, dc_flush_wa = false;
 	u32 *cs, flags = 0;
 	int len;
diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
index 81f9b0422e6a..02a4a52e2019 100644
--- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
@@ -77,7 +77,8 @@ gen2_render_ring_flush(struct i915_request *rq, u32 mode)
 	*cs++ = cmd;
 	while (num_store_dw--) {
 		*cs++ = MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL;
-		*cs++ = intel_gt_scratch_offset(rq->engine->gt);
+		*cs++ = intel_gt_scratch_offset(rq->engine->gt,
+						INTEL_GT_SCRATCH_FIELD_DEFAULT);
 		*cs++ = 0;
 	}
 	*cs++ = MI_FLUSH | MI_NO_WRITE_FLUSH;
@@ -150,7 +151,8 @@ gen4_render_ring_flush(struct i915_request *rq, u32 mode)
 	 */
 	if (mode & EMIT_INVALIDATE) {
 		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
-		*cs++ = intel_gt_scratch_offset(rq->engine->gt) |
+		*cs++ = intel_gt_scratch_offset(rq->engine->gt,
+						INTEL_GT_SCRATCH_FIELD_DEFAULT) |
 			PIPE_CONTROL_GLOBAL_GTT;
 		*cs++ = 0;
 		*cs++ = 0;
@@ -159,7 +161,8 @@ gen4_render_ring_flush(struct i915_request *rq, u32 mode)
 			*cs++ = MI_FLUSH;
 
 		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
-		*cs++ = intel_gt_scratch_offset(rq->engine->gt) |
+		*cs++ = intel_gt_scratch_offset(rq->engine->gt,
+						INTEL_GT_SCRATCH_FIELD_DEFAULT) |
 			PIPE_CONTROL_GLOBAL_GTT;
 		*cs++ = 0;
 		*cs++ = 0;
@@ -213,7 +216,8 @@ static int
 gen6_emit_post_sync_nonzero_flush(struct i915_request *rq)
 {
 	u32 scratch_addr =
-		intel_gt_scratch_offset(rq->engine->gt) + 2 * CACHELINE_BYTES;
+		intel_gt_scratch_offset(rq->engine->gt,
+					INTEL_GT_SCRATCH_FIELD_RENDER_FLUSH);
 	u32 *cs;
 
 	cs = intel_ring_begin(rq, 6);
@@ -247,7 +251,8 @@ static int
 gen6_render_ring_flush(struct i915_request *rq, u32 mode)
 {
 	u32 scratch_addr =
-		intel_gt_scratch_offset(rq->engine->gt) + 2 * CACHELINE_BYTES;
+		intel_gt_scratch_offset(rq->engine->gt,
+					INTEL_GT_SCRATCH_FIELD_RENDER_FLUSH);
 	u32 *cs, flags = 0;
 	int ret;
 
@@ -305,7 +310,8 @@ static u32 *gen6_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 
 	*cs++ = GFX_OP_PIPE_CONTROL(4);
 	*cs++ = PIPE_CONTROL_QW_WRITE;
-	*cs++ = intel_gt_scratch_offset(rq->engine->gt) |
+	*cs++ = intel_gt_scratch_offset(rq->engine->gt,
+					INTEL_GT_SCRATCH_FIELD_DEFAULT) |
 		PIPE_CONTROL_GLOBAL_GTT;
 	*cs++ = 0;
 
@@ -350,7 +356,8 @@ static int
 gen7_render_ring_flush(struct i915_request *rq, u32 mode)
 {
 	u32 scratch_addr =
-		intel_gt_scratch_offset(rq->engine->gt) + 2 * CACHELINE_BYTES;
+		intel_gt_scratch_offset(rq->engine->gt,
+					INTEL_GT_SCRATCH_FIELD_RENDER_FLUSH);
 	u32 *cs, flags = 0;
 
 	/*
@@ -1079,7 +1086,9 @@ i830_emit_bb_start(struct i915_request *rq,
 		   u64 offset, u32 len,
 		   unsigned int dispatch_flags)
 {
-	u32 *cs, cs_offset = intel_gt_scratch_offset(rq->engine->gt);
+	u32 *cs, cs_offset =
+		intel_gt_scratch_offset(rq->engine->gt,
+					INTEL_GT_SCRATCH_FIELD_DEFAULT);
 
 	GEM_BUG_ON(rq->engine->gt->scratch->size < I830_WA_SIZE);
 
@@ -1523,7 +1532,8 @@ static int flush_pd_dir(struct i915_request *rq)
 	/* Stall until the page table load is complete */
 	*cs++ = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT;
 	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine->mmio_base));
-	*cs++ = intel_gt_scratch_offset(rq->engine->gt);
+	*cs++ = intel_gt_scratch_offset(rq->engine->gt,
+					INTEL_GT_SCRATCH_FIELD_DEFAULT);
 	*cs++ = MI_NOOP;
 
 	intel_ring_advance(rq, cs);
@@ -1639,7 +1649,8 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
 			/* Insert a delay before the next switch! */
 			*cs++ = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT;
 			*cs++ = i915_mmio_reg_offset(last_reg);
-			*cs++ = intel_gt_scratch_offset(rq->engine->gt);
+			*cs++ = intel_gt_scratch_offset(rq->engine->gt,
+							INTEL_GT_SCRATCH_FIELD_DEFAULT);
 			*cs++ = MI_NOOP;
 		}
 		*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
-- 
2.21.0.392.gf8f6787159e

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v6 05/11] drm/i915/perf: implement active wait for noa configurations
  2019-07-01 11:34 [PATCH v6 00/11] drm/i915: Vulkan performance query support Lionel Landwerlin
                   ` (3 preceding siblings ...)
  2019-07-01 11:34 ` [PATCH v6 04/11] drm/i915: enumerate scratch fields Lionel Landwerlin
@ 2019-07-01 11:34 ` Lionel Landwerlin
  2019-07-01 12:43   ` Chris Wilson
  2019-07-01 11:34 ` [PATCH v6 06/11] drm/i915: introduce a mechanism to extend execbuf2 Lionel Landwerlin
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 44+ messages in thread
From: Lionel Landwerlin @ 2019-07-01 11:34 UTC (permalink / raw)
  To: intel-gfx

NOA configuration take some amount of time to apply. That amount of
time depends on the size of the GT. There is no documented time for
this. For example, past experimentations with powergating
configuration changes seem to indicate a 60~70us delay. We go with
500us as default for now which should be over the required amount of
time (according to HW architects).

v2: Don't forget to save/restore registers used for the wait (Chris)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  24 ++
 drivers/gpu/drm/i915/gt/intel_gt_types.h     |   5 +
 drivers/gpu/drm/i915/i915_debugfs.c          |  25 +++
 drivers/gpu/drm/i915/i915_drv.h              |   8 +
 drivers/gpu/drm/i915/i915_perf.c             | 225 ++++++++++++++++++-
 drivers/gpu/drm/i915/i915_reg.h              |   4 +-
 6 files changed, 288 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
index e7eff9db343e..4a66af38c87b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
@@ -151,6 +151,7 @@
 #define   MI_BATCH_GTT		    (2<<6) /* aliased with (1<<7) on gen4 */
 #define MI_BATCH_BUFFER_START_GEN8	MI_INSTR(0x31, 1)
 #define   MI_BATCH_RESOURCE_STREAMER (1<<10)
+#define   MI_BATCH_PREDICATE         (1 << 15) /* HSW+ on RCS only*/
 
 /*
  * 3D instructions used by the kernel
@@ -226,6 +227,29 @@
 #define   PIPE_CONTROL_DEPTH_CACHE_FLUSH		(1<<0)
 #define   PIPE_CONTROL_GLOBAL_GTT (1<<2) /* in addr dword */
 
+#define MI_MATH(x) MI_INSTR(0x1a, (x)-1)
+#define   MI_ALU_OP(op, src1, src2) (((op) << 20) | ((src1) << 10) | (src2))
+/* operands */
+#define   MI_ALU_OP_NOOP     0
+#define   MI_ALU_OP_LOAD     128
+#define   MI_ALU_OP_LOADINV  1152
+#define   MI_ALU_OP_LOAD0    129
+#define   MI_ALU_OP_LOAD1    1153
+#define   MI_ALU_OP_ADD      256
+#define   MI_ALU_OP_SUB      257
+#define   MI_ALU_OP_AND      258
+#define   MI_ALU_OP_OR       259
+#define   MI_ALU_OP_XOR      260
+#define   MI_ALU_OP_STORE    384
+#define   MI_ALU_OP_STOREINV 1408
+/* sources */
+#define   MI_ALU_SRC_REG(x)  (x) /* 0 -> 15 */
+#define   MI_ALU_SRC_SRCA    32
+#define   MI_ALU_SRC_SRCB    33
+#define   MI_ALU_SRC_ACCU    49
+#define   MI_ALU_SRC_ZF      50
+#define   MI_ALU_SRC_CF      51
+
 /*
  * Commands used only by the command parser
  */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index e625a5e320d3..0750ac49a05b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -70,6 +70,11 @@ enum intel_gt_scratch_field {
 	/* 8 bytes */
 	INTEL_GT_SCRATCH_FIELD_COHERENTL3_WA = 256,
 
+	/* 6 * 8 bytes */
+	INTEL_GT_SCRATCH_FIELD_PERF_CS_GPR = 2048,
+
+	/* 4 bytes */
+	INTEL_GT_SCRATCH_FIELD_PERF_PREDICATE_RESULT_1 = 2096,
 };
 
 #endif /* __INTEL_GT_TYPES_H__ */
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index eeecdad0e3ca..6b49fda145e7 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3646,6 +3646,30 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_wedged_fops,
 			i915_wedged_get, i915_wedged_set,
 			"%llu\n");
 
+static int
+i915_perf_noa_delay_set(void *data, u64 val)
+{
+	struct drm_i915_private *i915 = data;
+
+	atomic64_set(&i915->perf.oa.noa_programming_delay, val);
+	return 0;
+}
+
+static int
+i915_perf_noa_delay_get(void *data, u64 *val)
+{
+	struct drm_i915_private *i915 = data;
+
+	*val = atomic64_read(&i915->perf.oa.noa_programming_delay);
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(i915_perf_noa_delay_fops,
+			i915_perf_noa_delay_get,
+			i915_perf_noa_delay_set,
+			"%llu\n");
+
+
 #define DROP_UNBOUND	BIT(0)
 #define DROP_BOUND	BIT(1)
 #define DROP_RETIRE	BIT(2)
@@ -4411,6 +4435,7 @@ static const struct i915_debugfs_files {
 	const char *name;
 	const struct file_operations *fops;
 } i915_debugfs_files[] = {
+	{"i915_perf_noa_delay", &i915_perf_noa_delay_fops},
 	{"i915_wedged", &i915_wedged_fops},
 	{"i915_cache_sharing", &i915_cache_sharing_fops},
 	{"i915_gem_drop_caches", &i915_drop_caches_fops},
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index df39b2ee6bd9..fe93a260bd28 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1837,6 +1837,14 @@ struct drm_i915_private {
 
 			struct i915_oa_ops ops;
 			const struct i915_oa_format *oa_formats;
+
+			/**
+			 * A batch buffer doing a wait on the GPU for the NOA
+			 * logic to be reprogrammed.
+			 */
+			struct i915_vma *noa_wait;
+
+			atomic64_t noa_programming_delay;
 		} oa;
 	} perf;
 
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 5ba771468078..03e6908282e3 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -197,6 +197,7 @@
 
 #include "gem/i915_gem_context.h"
 #include "gem/i915_gem_pm.h"
+#include "gt/intel_gt.h"
 #include "gt/intel_lrc_reg.h"
 
 #include "i915_drv.h"
@@ -429,7 +430,7 @@ static int alloc_oa_config_buffer(struct drm_i915_private *i915,
 					      MI_LOAD_REGISTER_IMM_MAX_REGS) * 4;
 		config_length += oa_config->flex_regs_len * 8;
 	}
-	config_length += 4; /* MI_BATCH_BUFFER_END */
+	config_length += 12; /* MI_BATCH_BUFFER_START into noa_wait loop */
 	config_length = ALIGN(config_length, I915_GTT_PAGE_SIZE);
 
 	bo = i915_gem_object_create_shmem(i915, config_length);
@@ -446,7 +447,12 @@ static int alloc_oa_config_buffer(struct drm_i915_private *i915,
 	cs = write_cs_mi_lri(cs, oa_config->b_counter_regs, oa_config->b_counter_regs_len);
 	cs = write_cs_mi_lri(cs, oa_config->flex_regs, oa_config->flex_regs_len);
 
-	*cs++ = MI_BATCH_BUFFER_END;
+
+	/* Jump into the NOA wait busy loop. */
+	*cs++ = (INTEL_GEN(i915) < 8 ?
+		 MI_BATCH_BUFFER_START : MI_BATCH_BUFFER_START_GEN8);
+	*cs++ = i915_ggtt_offset(i915->perf.oa.noa_wait);
+	*cs++ = 0;
 
 	i915_gem_object_flush_map(bo);
 	i915_gem_object_unpin_map(bo);
@@ -1467,6 +1473,7 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
 	mutex_lock(&dev_priv->drm.struct_mutex);
 	dev_priv->perf.oa.exclusive_stream = NULL;
 	dev_priv->perf.oa.ops.disable_metric_set(dev_priv);
+	i915_vma_unpin_and_release(&dev_priv->perf.oa.noa_wait, 0);
 	mutex_unlock(&dev_priv->drm.struct_mutex);
 
 	free_oa_buffer(dev_priv);
@@ -1653,6 +1660,204 @@ static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
 	return ret;
 }
 
+static u32 *save_register(struct drm_i915_private *i915, u32 *cs,
+			  i915_reg_t reg, u32 offset, u32 dword_count)
+{
+	uint32_t d;
+
+	for (d = 0; d < dword_count; d++) {
+		*cs++ = INTEL_GEN(i915) >= 8 ?
+			MI_STORE_REGISTER_MEM_GEN8 : MI_STORE_REGISTER_MEM;
+		*cs++ = i915_mmio_reg_offset(reg) + 4 * d;
+		*cs++ = intel_gt_scratch_offset(&i915->gt, offset) + 4 * d;
+		if (INTEL_GEN(i915) >= 8)
+			*cs++ = 0;
+	}
+
+	return cs;
+}
+
+static u32 *restore_register(struct drm_i915_private *i915, u32 *cs,
+			     i915_reg_t reg, u32 offset, u32 dword_count)
+{
+	uint32_t d;
+
+	for (d = 0; d < dword_count; d++) {
+		*cs++ = INTEL_GEN(i915) >= 8 ?
+			MI_LOAD_REGISTER_MEM_GEN8 : MI_LOAD_REGISTER_MEM;
+		*cs++ = i915_mmio_reg_offset(reg);
+		*cs++ = intel_gt_scratch_offset(&i915->gt, offset);
+		if (INTEL_GEN(i915) >= 8)
+			*cs++ = 0;
+	}
+
+	return cs;
+}
+
+static int alloc_noa_wait(struct drm_i915_private *i915)
+{
+	struct drm_i915_gem_object *bo;
+	struct i915_vma *vma;
+	u64 delay_ns = atomic64_read(&i915->perf.oa.noa_programming_delay), delay_ticks;
+	u32 *batch, *ts0, *cs, *jump;
+	int ret, i;
+
+	bo = i915_gem_object_create_shmem(i915, 4096);
+	if (IS_ERR(bo)) {
+		DRM_ERROR("Failed to allocate NOA wait batchbuffer\n");
+		return PTR_ERR(bo);
+	}
+
+	/*
+	 * We pin in GGTT because we jump into this buffer now because
+	 * multiple OA config BOs will have a jump to this address and it
+	 * needs to be fixed during the lifetime of the i915/perf stream.
+	 */
+	vma = i915_gem_object_ggtt_pin(bo, NULL, 0, 4096, 0);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
+		goto err_unref;
+	}
+
+	batch = cs = i915_gem_object_pin_map(bo, I915_MAP_WB);
+	if (IS_ERR(batch)) {
+		ret = PTR_ERR(batch);
+		goto err_unpin;
+	}
+
+	/* Save registers. */
+	for (i = 0; i <= 5; i++) {
+		cs = save_register(i915, cs, HSW_CS_GPR(i),
+				   INTEL_GT_SCRATCH_FIELD_PERF_CS_GPR + 8 * i, 2);
+	}
+	cs = save_register(i915, cs, MI_PREDICATE_RESULT_1,
+			   INTEL_GT_SCRATCH_FIELD_PERF_PREDICATE_RESULT_1, 1);
+
+	/* First timestamp snapshot location. */
+	ts0 = cs;
+
+	/*
+	 * Initial snapshot of the timestamp register to implement the wait.
+	 * We work with 32b values, so clear out the top 32b bits of the
+	 * register because the ALU works 64bits.
+	 */
+	*cs++ = MI_LOAD_REGISTER_IMM(1);
+	*cs++ = i915_mmio_reg_offset(HSW_CS_GPR(0)) + 4;
+	*cs++ = 0;
+	*cs++ = MI_LOAD_REGISTER_REG | (3 - 2);
+	*cs++ = i915_mmio_reg_offset(RING_TIMESTAMP(RENDER_RING_BASE));
+	*cs++ = i915_mmio_reg_offset(HSW_CS_GPR(0));
+
+	/*
+	 * This is the location we're going to jump back into until the
+	 * required amount of time has passed.
+	 */
+	jump = cs;
+
+	/*
+	 * Take another snapshot of the timestamp register. Take care to clear
+	 * up the top 32bits of CS_GPR(1) as we're using it for other
+	 * operations below.
+	 */
+	*cs++ = MI_LOAD_REGISTER_IMM(1);
+	*cs++ = i915_mmio_reg_offset(HSW_CS_GPR(1)) + 4;
+	*cs++ = 0;
+	*cs++ = MI_LOAD_REGISTER_REG | (3 - 2);
+	*cs++ = i915_mmio_reg_offset(RING_TIMESTAMP(RENDER_RING_BASE));
+	*cs++ = i915_mmio_reg_offset(HSW_CS_GPR(1));
+
+	/*
+	 * Do a diff between the 2 timestamps and store the result back into
+	 * CS_GPR(1).
+	 */
+	*cs++ = MI_MATH(5);
+	*cs++ = MI_ALU_OP(MI_ALU_OP_LOAD, MI_ALU_SRC_SRCA, MI_ALU_SRC_REG(1));
+	*cs++ = MI_ALU_OP(MI_ALU_OP_LOAD, MI_ALU_SRC_SRCB, MI_ALU_SRC_REG(0));
+	*cs++ = MI_ALU_OP(MI_ALU_OP_SUB, 0, 0);
+	*cs++ = MI_ALU_OP(MI_ALU_OP_STORE, MI_ALU_SRC_REG(2), MI_ALU_SRC_ACCU);
+	*cs++ = MI_ALU_OP(MI_ALU_OP_STORE, MI_ALU_SRC_REG(3), MI_ALU_SRC_CF);
+
+	/*
+	 * Transfer the carry flag (set to 1 if ts1 < ts0, meaning the
+	 * timestamp have rolled over the 32bits) into the predicate register
+	 * to be used for the predicated jump.
+	 */
+	*cs++ = MI_LOAD_REGISTER_REG | (3 - 2);
+	*cs++ = i915_mmio_reg_offset(HSW_CS_GPR(3));
+	*cs++ = i915_mmio_reg_offset(MI_PREDICATE_RESULT_1);
+
+	/* Restart from the beginning if we had timestamps roll over. */
+	*cs++ = (INTEL_GEN(i915) < 8 ?
+		 MI_BATCH_BUFFER_START : MI_BATCH_BUFFER_START_GEN8) |
+		MI_BATCH_PREDICATE;
+	*cs++ = vma->node.start;
+	*cs++ = 0;
+
+	/*
+	 * Now add the diff between to previous timestamps and add it to :
+	 *      (((1 * << 64) - 1) - delay_ns)
+	 *
+	 * When the Carry Flag contains 1 this means the elapsed time is
+	 * longer than the expected delay, and we can exit the wait loop.
+	 */
+	delay_ticks = 0xffffffffffffffff -
+		DIV64_U64_ROUND_UP(delay_ns *
+				   RUNTIME_INFO(i915)->cs_timestamp_frequency_khz,
+				   1000000ull);
+	*cs++ = MI_LOAD_REGISTER_IMM(2);
+	*cs++ = i915_mmio_reg_offset(HSW_CS_GPR(4));
+	*cs++ = lower_32_bits(delay_ticks);
+	*cs++ = i915_mmio_reg_offset(HSW_CS_GPR(4)) + 4;
+	*cs++ = upper_32_bits(delay_ticks);
+
+	*cs++ = MI_MATH(4);
+	*cs++ = MI_ALU_OP(MI_ALU_OP_LOAD, MI_ALU_SRC_SRCA, MI_ALU_SRC_REG(2));
+	*cs++ = MI_ALU_OP(MI_ALU_OP_LOAD, MI_ALU_SRC_SRCB, MI_ALU_SRC_REG(4));
+	*cs++ = MI_ALU_OP(MI_ALU_OP_ADD, 0, 0);
+	*cs++ = MI_ALU_OP(MI_ALU_OP_STOREINV, MI_ALU_SRC_REG(5), MI_ALU_SRC_CF);
+
+	/*
+	 * Transfer the result into the predicate register to be used for the
+	 * predicated jump.
+	 */
+	*cs++ = MI_LOAD_REGISTER_REG | (3 - 2);
+	*cs++ = i915_mmio_reg_offset(HSW_CS_GPR(5));
+	*cs++ = i915_mmio_reg_offset(MI_PREDICATE_RESULT_1);
+
+	/* Predicate the jump.  */
+	*cs++ = (INTEL_GEN(i915) < 8 ?
+		 MI_BATCH_BUFFER_START : MI_BATCH_BUFFER_START_GEN8) |
+		MI_BATCH_PREDICATE;
+	*cs++ = vma->node.start + (jump - batch) * 4;
+	*cs++ = 0;
+
+	/* Restore registers. */
+	for (i = 0; i <= 5; i++) {
+		cs = restore_register(i915, cs, HSW_CS_GPR(i),
+				      INTEL_GT_SCRATCH_FIELD_PERF_CS_GPR + 8 * i, 2);
+	}
+	cs = restore_register(i915, cs, MI_PREDICATE_RESULT_1,
+			      INTEL_GT_SCRATCH_FIELD_PERF_PREDICATE_RESULT_1, 1);
+
+	/* And return to the ring. */
+	*cs++ = MI_BATCH_BUFFER_END;
+
+	i915_gem_object_flush_map(bo);
+	i915_gem_object_unpin_map(bo);
+
+	i915->perf.oa.noa_wait = vma;
+
+	return 0;
+
+err_unpin:
+	__i915_vma_unpin(vma);
+
+err_unref:
+	i915_gem_object_put(bo);
+
+	return ret;
+}
+
 static void config_oa_regs(struct drm_i915_private *dev_priv,
 			   const struct i915_oa_reg *regs,
 			   u32 n_regs)
@@ -2221,6 +2426,12 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 		goto err_config;
 	}
 
+	ret = alloc_noa_wait(dev_priv);
+	if (ret) {
+		DRM_DEBUG("Unable to allocate NOA wait batch buffer\n");
+		goto err_noa_wait_alloc;
+	}
+
 	/* PRM - observability performance counters:
 	 *
 	 *   OACONTROL, performance counter enable, note:
@@ -2273,6 +2484,13 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	intel_uncore_forcewake_put(&dev_priv->uncore, FORCEWAKE_ALL);
 	intel_runtime_pm_put(&dev_priv->runtime_pm, stream->wakeref);
 
+	mutex_lock(&dev_priv->drm.struct_mutex);
+	i915_vma_unpin_and_release(&dev_priv->perf.oa.noa_wait, 0);
+	mutex_unlock(&dev_priv->drm.struct_mutex);
+
+err_noa_wait_alloc:
+	i915_oa_config_put(stream->oa_config);
+
 err_config:
 	if (stream->ctx)
 		oa_put_render_ctx_id(stream);
@@ -3649,6 +3867,9 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
 		mutex_init(&dev_priv->perf.metrics_lock);
 		idr_init(&dev_priv->perf.metrics_idr);
 
+		atomic64_set(&dev_priv->perf.oa.noa_programming_delay,
+			     500 * 1000 /* 500us */);
+
 		dev_priv->perf.initialized = true;
 	}
 }
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 7e6009cefb18..e12b2fccef70 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -567,7 +567,9 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define MI_PREDICATE_SRC0_UDW	_MMIO(0x2400 + 4)
 #define MI_PREDICATE_SRC1	_MMIO(0x2408)
 #define MI_PREDICATE_SRC1_UDW	_MMIO(0x2408 + 4)
-
+#define MI_PREDICATE_DATA       _MMIO(0x2410)
+#define MI_PREDICATE_RESULT     _MMIO(0x2418)
+#define MI_PREDICATE_RESULT_1   _MMIO(0x241c)
 #define MI_PREDICATE_RESULT_2	_MMIO(0x2214)
 #define  LOWER_SLICE_ENABLED	(1 << 0)
 #define  LOWER_SLICE_DISABLED	(0 << 0)
-- 
2.21.0.392.gf8f6787159e

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v6 06/11] drm/i915: introduce a mechanism to extend execbuf2
  2019-07-01 11:34 [PATCH v6 00/11] drm/i915: Vulkan performance query support Lionel Landwerlin
                   ` (4 preceding siblings ...)
  2019-07-01 11:34 ` [PATCH v6 05/11] drm/i915/perf: implement active wait for noa configurations Lionel Landwerlin
@ 2019-07-01 11:34 ` Lionel Landwerlin
  2019-07-01 15:17   ` Chris Wilson
  2019-07-01 11:34 ` [PATCH v6 07/11] drm/i915: add syncobj timeline support Lionel Landwerlin
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 44+ messages in thread
From: Lionel Landwerlin @ 2019-07-01 11:34 UTC (permalink / raw)
  To: intel-gfx

We're planning to use this for a couple of new feature where we need
to provide additional parameters to execbuf.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 32 ++++++++++++++++++-
 include/uapi/drm/i915_drm.h                   | 25 +++++++++++++--
 2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 1c5dfbfad71b..9887fa9e3ac8 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -23,6 +23,7 @@
 #include "i915_gem_clflush.h"
 #include "i915_gem_context.h"
 #include "i915_trace.h"
+#include "i915_user_extensions.h"
 #include "intel_drv.h"
 
 enum {
@@ -271,6 +272,10 @@ struct i915_execbuffer {
 	 */
 	int lut_size;
 	struct hlist_head *buckets; /** ht for relocation handles */
+
+	struct {
+		u64 flags; /** Available extensions parameters */
+	} extensions;
 };
 
 #define exec_entry(EB, VMA) (&(EB)->exec[(VMA)->exec_flags - (EB)->flags])
@@ -1969,7 +1974,7 @@ static bool i915_gem_check_execbuffer(struct drm_i915_gem_execbuffer2 *exec)
 		return false;
 
 	/* Kernel clipping was a DRI1 misfeature */
-	if (!(exec->flags & I915_EXEC_FENCE_ARRAY)) {
+	if (!(exec->flags & (I915_EXEC_FENCE_ARRAY | I915_EXEC_EXT))) {
 		if (exec->num_cliprects || exec->cliprects_ptr)
 			return false;
 	}
@@ -2347,6 +2352,27 @@ signal_fence_array(struct i915_execbuffer *eb,
 	}
 }
 
+static const i915_user_extension_fn execbuf_extensions[] = {
+};
+
+static int
+parse_execbuf2_extensions(struct drm_i915_gem_execbuffer2 *args,
+			  struct i915_execbuffer *eb)
+{
+	eb->extensions.flags = 0;
+
+	if (!(args->flags & I915_EXEC_EXT))
+		return 0;
+
+	if (args->num_cliprects != 0)
+		return -EINVAL;
+
+	return i915_user_extensions(u64_to_user_ptr(args->cliprects_ptr),
+				    execbuf_extensions,
+				    ARRAY_SIZE(execbuf_extensions),
+				    eb);
+}
+
 static int
 i915_gem_do_execbuffer(struct drm_device *dev,
 		       struct drm_file *file,
@@ -2393,6 +2419,10 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (args->flags & I915_EXEC_IS_PINNED)
 		eb.batch_flags |= I915_DISPATCH_PINNED;
 
+	err = parse_execbuf2_extensions(args, &eb);
+	if (err)
+		return err;
+
 	if (args->flags & I915_EXEC_FENCE_IN) {
 		in_fence = sync_file_get_fence(lower_32_bits(args->rsvd2));
 		if (!in_fence)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index e27a8eda9121..efa195d6994e 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1013,6 +1013,10 @@ struct drm_i915_gem_exec_fence {
 	__u32 flags;
 };
 
+enum drm_i915_gem_execbuffer_ext {
+	DRM_I915_GEM_EXECBUFFER_EXT_MAX /* non-ABI */
+};
+
 struct drm_i915_gem_execbuffer2 {
 	/**
 	 * List of gem_exec_object2 structs
@@ -1029,8 +1033,14 @@ struct drm_i915_gem_execbuffer2 {
 	__u32 num_cliprects;
 	/**
 	 * This is a struct drm_clip_rect *cliprects if I915_EXEC_FENCE_ARRAY
-	 * is not set.  If I915_EXEC_FENCE_ARRAY is set, then this is a
-	 * struct drm_i915_gem_exec_fence *fences.
+	 * & I915_EXEC_EXT are not set.
+	 *
+	 * If I915_EXEC_FENCE_ARRAY is set, then this is a pointer to an array
+	 * of struct drm_i915_gem_exec_fence and num_cliprects is the length
+	 * of the array.
+	 *
+	 * If I915_EXEC_EXT is set, then this is a pointer to a single struct
+	 * drm_i915_gem_base_execbuffer_ext and num_cliprects is 0.
 	 */
 	__u64 cliprects_ptr;
 #define I915_EXEC_RING_MASK              (0x3f)
@@ -1148,7 +1158,16 @@ struct drm_i915_gem_execbuffer2 {
  */
 #define I915_EXEC_FENCE_SUBMIT		(1 << 20)
 
-#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_SUBMIT << 1))
+/*
+ * Setting I915_EXEC_EXT implies that drm_i915_gem_execbuffer2.cliprects_ptr
+ * is treated as a pointer to an linked list of i915_user_extension. Each
+ * i915_user_extension node is the base of a larger structure. The list of
+ * supported structures are listed in the drm_i915_gem_execbuffer_ext
+ * enum.
+ */
+#define I915_EXEC_EXT		(1 << 21)
+
+#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_EXT<<1))
 
 #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
 #define i915_execbuffer2_set_context_id(eb2, context) \
-- 
2.21.0.392.gf8f6787159e

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v6 07/11] drm/i915: add syncobj timeline support
  2019-07-01 11:34 [PATCH v6 00/11] drm/i915: Vulkan performance query support Lionel Landwerlin
                   ` (5 preceding siblings ...)
  2019-07-01 11:34 ` [PATCH v6 06/11] drm/i915: introduce a mechanism to extend execbuf2 Lionel Landwerlin
@ 2019-07-01 11:34 ` Lionel Landwerlin
  2019-07-01 13:13   ` Chris Wilson
                     ` (2 more replies)
  2019-07-01 11:34 ` [PATCH v6 08/11] drm/i915: add a new perf configuration execbuf parameter Lionel Landwerlin
                   ` (7 subsequent siblings)
  14 siblings, 3 replies; 44+ messages in thread
From: Lionel Landwerlin @ 2019-07-01 11:34 UTC (permalink / raw)
  To: intel-gfx

Introduces a new parameters to execbuf so that we can specify syncobj
handles as well as timeline points.

v2: Reuse i915_user_extension_fn

v3: Check that the chained extension is only present once (Chris)

v4: Check that dma_fence_chain_find_seqno returns a non NULL fence (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 288 ++++++++++++++----
 drivers/gpu/drm/i915/i915_drv.c               |   4 +-
 include/uapi/drm/i915_drm.h                   |  38 +++
 3 files changed, 273 insertions(+), 57 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 9887fa9e3ac8..d3004fc1f995 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -213,6 +213,13 @@ enum {
  * the batchbuffer in trusted mode, otherwise the ioctl is rejected.
  */
 
+struct i915_eb_fences {
+	struct drm_syncobj *syncobj; /* Use with ptr_mask_bits() */
+	struct dma_fence *dma_fence;
+	u64 value;
+	struct dma_fence_chain *chain_fence;
+};
+
 struct i915_execbuffer {
 	struct drm_i915_private *i915; /** i915 backpointer */
 	struct drm_file *file; /** per-file lookup tables and limits */
@@ -275,6 +282,7 @@ struct i915_execbuffer {
 
 	struct {
 		u64 flags; /** Available extensions parameters */
+		struct drm_i915_gem_execbuffer_ext_timeline_fences timeline_fences;
 	} extensions;
 };
 
@@ -2224,67 +2232,198 @@ eb_select_engine(struct i915_execbuffer *eb,
 }
 
 static void
-__free_fence_array(struct drm_syncobj **fences, unsigned int n)
+__free_fence_array(struct i915_eb_fences *fences, unsigned int n)
 {
-	while (n--)
-		drm_syncobj_put(ptr_mask_bits(fences[n], 2));
+	while (n--) {
+		drm_syncobj_put(ptr_mask_bits(fences[n].syncobj, 2));
+		dma_fence_put(fences[n].dma_fence);
+		kfree(fences[n].chain_fence);
+	}
 	kvfree(fences);
 }
 
-static struct drm_syncobj **
-get_fence_array(struct drm_i915_gem_execbuffer2 *args,
-		struct drm_file *file)
+static struct i915_eb_fences *
+get_timeline_fence_array(struct i915_execbuffer *eb, int *out_n_fences)
+{
+	struct drm_i915_gem_execbuffer_ext_timeline_fences *timeline_fences =
+		&eb->extensions.timeline_fences;
+	struct drm_i915_gem_exec_fence __user *user_fences;
+	struct i915_eb_fences *fences;
+	u64 __user *user_values;
+	const u64 num_fences = timeline_fences->fence_count;
+	unsigned long n;
+	int err;
+
+	*out_n_fences = num_fences;
+
+	/* Check multiplication overflow for access_ok() and kvmalloc_array() */
+	BUILD_BUG_ON(sizeof(size_t) > sizeof(unsigned long));
+	if (num_fences > min_t(unsigned long,
+			       ULONG_MAX / sizeof(*user_fences),
+			       SIZE_MAX / sizeof(*fences)))
+		return ERR_PTR(-EINVAL);
+
+	user_fences = u64_to_user_ptr(timeline_fences->handles_ptr);
+	if (!access_ok(user_fences, num_fences * sizeof(*user_fences)))
+		return ERR_PTR(-EFAULT);
+
+	user_values = u64_to_user_ptr(timeline_fences->values_ptr);
+	if (!access_ok(user_values, num_fences * sizeof(*user_values)))
+		return ERR_PTR(-EFAULT);
+
+	fences = kvmalloc_array(num_fences, sizeof(*fences),
+				__GFP_NOWARN | GFP_KERNEL);
+	if (!fences)
+		return ERR_PTR(-ENOMEM);
+
+	BUILD_BUG_ON(~(ARCH_KMALLOC_MINALIGN - 1) &
+		     ~__I915_EXEC_FENCE_UNKNOWN_FLAGS);
+
+	for (n = 0; n < num_fences; n++) {
+		struct drm_i915_gem_exec_fence user_fence;
+		struct drm_syncobj *syncobj;
+		struct dma_fence *fence = NULL;
+		u64 point;
+
+		if (__copy_from_user(&user_fence, user_fences++, sizeof(user_fence))) {
+			err = -EFAULT;
+			goto err;
+		}
+
+		if (user_fence.flags & __I915_EXEC_FENCE_UNKNOWN_FLAGS) {
+			err = -EINVAL;
+			goto err;
+		}
+
+		if (__get_user(point, user_values++)) {
+			err = -EFAULT;
+			goto err;
+		}
+
+		syncobj = drm_syncobj_find(eb->file, user_fence.handle);
+		if (!syncobj) {
+			DRM_DEBUG("Invalid syncobj handle provided\n");
+			err = -EINVAL;
+			goto err;
+		}
+
+		if (user_fence.flags & I915_EXEC_FENCE_WAIT) {
+			fence = drm_syncobj_fence_get(syncobj);
+			if (!fence) {
+				DRM_DEBUG("Syncobj handle has no fence\n");
+				drm_syncobj_put(syncobj);
+				err = -EINVAL;
+				goto err;
+			}
+
+			err = dma_fence_chain_find_seqno(&fence, point);
+			if (err || !fence) {
+				DRM_DEBUG("Syncobj handle missing requested point\n");
+				drm_syncobj_put(syncobj);
+				err = err != 0 ? err : -EINVAL;
+				goto err;
+			}
+		}
+
+		/*
+		 * For timeline syncobjs we need to preallocate chains for
+		 * later signaling.
+		 */
+		if (point != 0 && user_fence.flags & I915_EXEC_FENCE_SIGNAL) {
+			fences[n].chain_fence =
+				kmalloc(sizeof(*fences[n].chain_fence),
+					GFP_KERNEL);
+			if (!fences[n].chain_fence) {
+				dma_fence_put(fence);
+				drm_syncobj_put(syncobj);
+				err = -ENOMEM;
+				DRM_DEBUG("Unable to alloc chain_fence\n");
+				goto err;
+			}
+		} else {
+			fences[n].chain_fence = NULL;
+		}
+
+		fences[n].syncobj = ptr_pack_bits(syncobj, user_fence.flags, 2);
+		fences[n].dma_fence = fence;
+		fences[n].value = point;
+	}
+
+	return fences;
+
+err:
+	__free_fence_array(fences, n);
+	return ERR_PTR(err);
+}
+
+static struct i915_eb_fences *
+get_legacy_fence_array(struct i915_execbuffer *eb,
+		       int *out_n_fences)
 {
-	const unsigned long nfences = args->num_cliprects;
+	struct drm_i915_gem_execbuffer2 *args = eb->args;
 	struct drm_i915_gem_exec_fence __user *user;
-	struct drm_syncobj **fences;
+	struct i915_eb_fences *fences;
+	const u32 num_fences = args->num_cliprects;
 	unsigned long n;
 	int err;
 
-	if (!(args->flags & I915_EXEC_FENCE_ARRAY))
-		return NULL;
+	*out_n_fences = num_fences;
 
 	/* Check multiplication overflow for access_ok() and kvmalloc_array() */
 	BUILD_BUG_ON(sizeof(size_t) > sizeof(unsigned long));
-	if (nfences > min_t(unsigned long,
-			    ULONG_MAX / sizeof(*user),
-			    SIZE_MAX / sizeof(*fences)))
+	if (*out_n_fences > min_t(unsigned long,
+				  ULONG_MAX / sizeof(*user),
+				  SIZE_MAX / sizeof(*fences)))
 		return ERR_PTR(-EINVAL);
 
 	user = u64_to_user_ptr(args->cliprects_ptr);
-	if (!access_ok(user, nfences * sizeof(*user)))
+	if (!access_ok(user, *out_n_fences * sizeof(*user)))
 		return ERR_PTR(-EFAULT);
 
-	fences = kvmalloc_array(nfences, sizeof(*fences),
+	fences = kvmalloc_array(*out_n_fences, sizeof(*fences),
 				__GFP_NOWARN | GFP_KERNEL);
 	if (!fences)
 		return ERR_PTR(-ENOMEM);
 
-	for (n = 0; n < nfences; n++) {
-		struct drm_i915_gem_exec_fence fence;
+	for (n = 0; n < *out_n_fences; n++) {
+		struct drm_i915_gem_exec_fence user_fence;
 		struct drm_syncobj *syncobj;
+		struct dma_fence *fence = NULL;
 
-		if (__copy_from_user(&fence, user++, sizeof(fence))) {
+		if (__copy_from_user(&user_fence, user++, sizeof(user_fence))) {
 			err = -EFAULT;
 			goto err;
 		}
 
-		if (fence.flags & __I915_EXEC_FENCE_UNKNOWN_FLAGS) {
+		if (user_fence.flags & __I915_EXEC_FENCE_UNKNOWN_FLAGS) {
 			err = -EINVAL;
 			goto err;
 		}
 
-		syncobj = drm_syncobj_find(file, fence.handle);
+		syncobj = drm_syncobj_find(eb->file, user_fence.handle);
 		if (!syncobj) {
 			DRM_DEBUG("Invalid syncobj handle provided\n");
 			err = -ENOENT;
 			goto err;
 		}
 
+		if (user_fence.flags & I915_EXEC_FENCE_WAIT) {
+			fence = drm_syncobj_fence_get(syncobj);
+			if (!fence) {
+				DRM_DEBUG("Syncobj handle has no fence\n");
+				drm_syncobj_put(syncobj);
+				err = -EINVAL;
+				goto err;
+			}
+		}
+
 		BUILD_BUG_ON(~(ARCH_KMALLOC_MINALIGN - 1) &
 			     ~__I915_EXEC_FENCE_UNKNOWN_FLAGS);
 
-		fences[n] = ptr_pack_bits(syncobj, fence.flags, 2);
+		fences[n].syncobj = ptr_pack_bits(syncobj, user_fence.flags, 2);
+		fences[n].dma_fence = fence;
+		fences[n].value = 0;
+		fences[n].chain_fence = NULL;
 	}
 
 	return fences;
@@ -2294,37 +2433,44 @@ get_fence_array(struct drm_i915_gem_execbuffer2 *args,
 	return ERR_PTR(err);
 }
 
+static struct i915_eb_fences *
+get_fence_array(struct i915_execbuffer *eb, int *out_n_fences)
+{
+	if (eb->args->flags & I915_EXEC_FENCE_ARRAY)
+		return get_legacy_fence_array(eb, out_n_fences);
+
+	if (eb->extensions.flags & BIT(DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES))
+		return get_timeline_fence_array(eb, out_n_fences);
+
+	*out_n_fences = 0;
+	return NULL;
+}
+
 static void
-put_fence_array(struct drm_i915_gem_execbuffer2 *args,
-		struct drm_syncobj **fences)
+put_fence_array(struct i915_eb_fences *fences, int nfences)
 {
 	if (fences)
-		__free_fence_array(fences, args->num_cliprects);
+		__free_fence_array(fences, nfences);
 }
 
 static int
 await_fence_array(struct i915_execbuffer *eb,
-		  struct drm_syncobj **fences)
+		  struct i915_eb_fences *fences,
+		  int nfences)
 {
-	const unsigned int nfences = eb->args->num_cliprects;
 	unsigned int n;
 	int err;
 
 	for (n = 0; n < nfences; n++) {
 		struct drm_syncobj *syncobj;
-		struct dma_fence *fence;
 		unsigned int flags;
 
-		syncobj = ptr_unpack_bits(fences[n], &flags, 2);
+		syncobj = ptr_unpack_bits(fences[n].syncobj, &flags, 2);
 		if (!(flags & I915_EXEC_FENCE_WAIT))
 			continue;
 
-		fence = drm_syncobj_fence_get(syncobj);
-		if (!fence)
-			return -EINVAL;
-
-		err = i915_request_await_dma_fence(eb->request, fence);
-		dma_fence_put(fence);
+		err = i915_request_await_dma_fence(eb->request,
+						   fences[n].dma_fence);
 		if (err < 0)
 			return err;
 	}
@@ -2334,9 +2480,9 @@ await_fence_array(struct i915_execbuffer *eb,
 
 static void
 signal_fence_array(struct i915_execbuffer *eb,
-		   struct drm_syncobj **fences)
+		   struct i915_eb_fences *fences,
+		   int nfences)
 {
-	const unsigned int nfences = eb->args->num_cliprects;
 	struct dma_fence * const fence = &eb->request->fence;
 	unsigned int n;
 
@@ -2344,15 +2490,46 @@ signal_fence_array(struct i915_execbuffer *eb,
 		struct drm_syncobj *syncobj;
 		unsigned int flags;
 
-		syncobj = ptr_unpack_bits(fences[n], &flags, 2);
+		syncobj = ptr_unpack_bits(fences[n].syncobj, &flags, 2);
 		if (!(flags & I915_EXEC_FENCE_SIGNAL))
 			continue;
 
-		drm_syncobj_replace_fence(syncobj, fence);
+		if (fences[n].chain_fence) {
+			drm_syncobj_add_point(syncobj, fences[n].chain_fence,
+					      fence, fences[n].value);
+			/*
+			 * The chain's ownership is transfered to the
+			 * timeline.
+			 */
+			fences[n].chain_fence = NULL;
+		} else {
+			drm_syncobj_replace_fence(syncobj, fence);
+		}
 	}
 }
 
+static int parse_timeline_fences(struct i915_user_extension __user *ext, void *data)
+{
+	struct i915_execbuffer *eb = data;
+
+	/* Timeline fences are incompatible with the fence array flag. */
+	if (eb->args->flags & I915_EXEC_FENCE_ARRAY)
+		return -EINVAL;
+
+	if (eb->extensions.flags & BIT(DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES))
+		return -EINVAL;
+
+	if (copy_from_user(&eb->extensions.timeline_fences, ext,
+			   sizeof(eb->extensions.timeline_fences)))
+		return -EFAULT;
+
+	eb->extensions.flags |= BIT(DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES);
+
+	return 0;
+}
+
 static const i915_user_extension_fn execbuf_extensions[] = {
+        [DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES] = parse_timeline_fences,
 };
 
 static int
@@ -2377,14 +2554,15 @@ static int
 i915_gem_do_execbuffer(struct drm_device *dev,
 		       struct drm_file *file,
 		       struct drm_i915_gem_execbuffer2 *args,
-		       struct drm_i915_gem_exec_object2 *exec,
-		       struct drm_syncobj **fences)
+		       struct drm_i915_gem_exec_object2 *exec)
 {
 	struct i915_execbuffer eb;
 	struct dma_fence *in_fence = NULL;
 	struct dma_fence *exec_fence = NULL;
 	struct sync_file *out_fence = NULL;
+	struct i915_eb_fences *fences = NULL;
 	int out_fence_fd = -1;
+	int nfences = 0;
 	int err;
 
 	BUILD_BUG_ON(__EXEC_INTERNAL_FLAGS & ~__I915_EXEC_ILLEGAL_FLAGS);
@@ -2423,10 +2601,16 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (err)
 		return err;
 
+	fences = get_fence_array(&eb, &nfences);
+	if (IS_ERR(fences))
+		return PTR_ERR(fences);
+
 	if (args->flags & I915_EXEC_FENCE_IN) {
 		in_fence = sync_file_get_fence(lower_32_bits(args->rsvd2));
-		if (!in_fence)
-			return -EINVAL;
+		if (!in_fence) {
+			err = -EINVAL;
+			goto err_fences;
+		}
 	}
 
 	if (args->flags & I915_EXEC_FENCE_SUBMIT) {
@@ -2584,7 +2768,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	}
 
 	if (fences) {
-		err = await_fence_array(&eb, fences);
+		err = await_fence_array(&eb, fences, nfences);
 		if (err)
 			goto err_request;
 	}
@@ -2613,7 +2797,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	i915_request_add(eb.request);
 
 	if (fences)
-		signal_fence_array(&eb, fences);
+		signal_fence_array(&eb, fences, nfences);
 
 	if (out_fence) {
 		if (err == 0) {
@@ -2648,6 +2832,8 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	dma_fence_put(exec_fence);
 err_in_fence:
 	dma_fence_put(in_fence);
+err_fences:
+	put_fence_array(fences, nfences);
 	return err;
 }
 
@@ -2741,7 +2927,7 @@ i915_gem_execbuffer_ioctl(struct drm_device *dev, void *data,
 			exec2_list[i].flags = 0;
 	}
 
-	err = i915_gem_do_execbuffer(dev, file, &exec2, exec2_list, NULL);
+	err = i915_gem_do_execbuffer(dev, file, &exec2, exec2_list);
 	if (exec2.flags & __EXEC_HAS_RELOC) {
 		struct drm_i915_gem_exec_object __user *user_exec_list =
 			u64_to_user_ptr(args->buffers_ptr);
@@ -2772,7 +2958,6 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 {
 	struct drm_i915_gem_execbuffer2 *args = data;
 	struct drm_i915_gem_exec_object2 *exec2_list;
-	struct drm_syncobj **fences = NULL;
 	const size_t count = args->buffer_count;
 	int err;
 
@@ -2800,15 +2985,7 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 		return -EFAULT;
 	}
 
-	if (args->flags & I915_EXEC_FENCE_ARRAY) {
-		fences = get_fence_array(args, file);
-		if (IS_ERR(fences)) {
-			kvfree(exec2_list);
-			return PTR_ERR(fences);
-		}
-	}
-
-	err = i915_gem_do_execbuffer(dev, file, args, exec2_list, fences);
+	err = i915_gem_do_execbuffer(dev, file, args, exec2_list);
 
 	/*
 	 * Now that we have begun execution of the batchbuffer, we ignore
@@ -2848,7 +3025,6 @@ end:;
 	}
 
 	args->flags &= ~__I915_EXEC_UNKNOWN_FLAGS;
-	put_fence_array(args, fences);
 	kvfree(exec2_list);
 	return err;
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index fa02e8f033d7..088f9d2af3fa 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -457,6 +457,7 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_EXEC_BATCH_FIRST:
 	case I915_PARAM_HAS_EXEC_FENCE_ARRAY:
 	case I915_PARAM_HAS_EXEC_SUBMIT_FENCE:
+	case I915_PARAM_HAS_EXEC_TIMELINE_FENCES:
 		/* For the time being all of these are always true;
 		 * if some supported hardware does not have one of these
 		 * features this value needs to be provided from
@@ -3220,7 +3221,8 @@ static struct drm_driver driver = {
 	 */
 	.driver_features =
 	    DRIVER_GEM |
-	    DRIVER_RENDER | DRIVER_MODESET | DRIVER_ATOMIC | DRIVER_SYNCOBJ,
+	    DRIVER_RENDER | DRIVER_MODESET | DRIVER_ATOMIC | DRIVER_SYNCOBJ |
+	    DRIVER_SYNCOBJ_TIMELINE,
 	.release = i915_driver_release,
 	.open = i915_driver_open,
 	.lastclose = i915_driver_lastclose,
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index efa195d6994e..6bd76a0d29e5 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -617,6 +617,12 @@ typedef struct drm_i915_irq_wait {
  */
 #define I915_PARAM_PERF_REVISION	54
 
+/* Query whether DRM_I915_GEM_EXECBUFFER2 supports supplying an array of
+ * timeline syncobj through drm_i915_gem_execbuf_ext_timeline_fences. See
+ * I915_EXEC_EXT.
+ */
+#define I915_PARAM_HAS_EXEC_TIMELINE_FENCES 55
+
 /* Must be kept compact -- no holes and well documented */
 
 typedef struct drm_i915_getparam {
@@ -1014,9 +1020,41 @@ struct drm_i915_gem_exec_fence {
 };
 
 enum drm_i915_gem_execbuffer_ext {
+	/**
+	 * See drm_i915_gem_execbuf_ext_timeline_fences.
+	 */
+	DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES = 0,
+
 	DRM_I915_GEM_EXECBUFFER_EXT_MAX /* non-ABI */
 };
 
+/**
+ * This structure describes an array of drm_syncobj and associated points for
+ * timeline variants of drm_syncobj. It is invalid to append this structure to
+ * the execbuf if I915_EXEC_FENCE_ARRAY is set.
+ */
+struct drm_i915_gem_execbuffer_ext_timeline_fences {
+	struct i915_user_extension base;
+
+	/**
+	 * Number of element in the handles_ptr & value_ptr arrays.
+	 */
+	__u64 fence_count;
+
+	/**
+	 * Pointer to an array of struct drm_i915_gem_exec_fence of length
+	 * fence_count.
+	 */
+	__u64 handles_ptr;
+
+	/**
+	 * Pointer to an array of u64 values of length fence_count. Values
+	 * must be 0 for a binary drm_syncobj. A Value of 0 for a timeline
+	 * drm_syncobj is invalid as it turns a drm_syncobj into a binary one.
+	 */
+	__u64 values_ptr;
+};
+
 struct drm_i915_gem_execbuffer2 {
 	/**
 	 * List of gem_exec_object2 structs
-- 
2.21.0.392.gf8f6787159e

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v6 08/11] drm/i915: add a new perf configuration execbuf parameter
  2019-07-01 11:34 [PATCH v6 00/11] drm/i915: Vulkan performance query support Lionel Landwerlin
                   ` (6 preceding siblings ...)
  2019-07-01 11:34 ` [PATCH v6 07/11] drm/i915: add syncobj timeline support Lionel Landwerlin
@ 2019-07-01 11:34 ` Lionel Landwerlin
  2019-07-01 12:05   ` Chris Wilson
  2019-07-01 11:34 ` [PATCH v6 09/11] drm/i915/perf: allow holding preemption on filtered ctx Lionel Landwerlin
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 44+ messages in thread
From: Lionel Landwerlin @ 2019-07-01 11:34 UTC (permalink / raw)
  To: intel-gfx

We want the ability to dispatch a set of command buffer to the
hardware, each with a different OA configuration. To achieve this, we
reuse a couple of fields from the execbuf2 struct (I CAN HAZ
execbuf3?) to notify what OA configuration should be used for a batch
buffer. This requires the process making the execbuf with this flag to
also own the perf fd at the time of execbuf.

v2: Add a emit_oa_config() vfunc in the intel_engine_cs (Chris)
    Move oa_config vma to active (Chris)

v3: Don't drop the lock for engine lookup (Chris)
    Move OA config vma to active before writing the ringbuffer (Chris)

v4: Reuse i915_user_extension_fn
    Serialize requests with OA config updates

v5: Check that the chained extension is only present once (Chris)
    Unpin oa_vma in main path (Chris)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 124 +++++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |   2 +
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   9 ++
 drivers/gpu/drm/i915/gt/intel_lrc.c           |   1 +
 drivers/gpu/drm/i915/gt/intel_ringbuffer.c    |   4 +-
 drivers/gpu/drm/i915/i915_drv.c               |   4 +
 drivers/gpu/drm/i915/i915_drv.h               |   8 +-
 drivers/gpu/drm/i915/i915_perf.c              |  23 ++--
 include/uapi/drm/i915_drm.h                   |  37 ++++++
 9 files changed, 196 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index d3004fc1f995..f92bace9caff 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -283,7 +283,12 @@ struct i915_execbuffer {
 	struct {
 		u64 flags; /** Available extensions parameters */
 		struct drm_i915_gem_execbuffer_ext_timeline_fences timeline_fences;
+		struct drm_i915_gem_execbuffer_ext_perf perf_config;
 	} extensions;
+
+	struct i915_oa_config *oa_config; /** HW configuration for OA, NULL is not needed. */
+	struct drm_i915_gem_object *oa_bo;
+	struct i915_vma *oa_vma;
 };
 
 #define exec_entry(EB, VMA) (&(EB)->exec[(VMA)->exec_flags - (EB)->flags])
@@ -1210,6 +1215,21 @@ static int reloc_move_to_gpu(struct i915_request *rq, struct i915_vma *vma)
 	return err;
 }
 
+
+static int
+get_execbuf_oa_config(struct i915_execbuffer *eb)
+{
+	eb->oa_config = NULL;
+	eb->oa_vma = NULL;
+	eb->oa_bo = NULL;
+
+	if ((eb->extensions.flags & BIT(DRM_I915_GEM_EXECBUFFER_EXT_PERF)) == 0)
+		return 0;
+
+	return i915_perf_get_oa_config(eb->i915, eb->extensions.perf_config.oa_config,
+				       &eb->oa_config, &eb->oa_bo);
+}
+
 static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 			     struct i915_vma *vma,
 			     unsigned int len)
@@ -2072,6 +2092,40 @@ add_to_client(struct i915_request *rq, struct drm_file *file)
 	list_add_tail(&rq->client_link, &rq->file_priv->mm.request_list);
 }
 
+static int eb_oa_config(struct i915_execbuffer *eb)
+{
+	int err;
+
+	if (!eb->oa_config)
+		return 0;
+
+	err = i915_active_request_set(&eb->engine->last_oa_config,
+				      eb->request);
+	if (err)
+		return err;
+
+	/*
+	 * If the config hasn't changed, skip reconfiguring the HW (this is
+	 * subject to a delay we want to avoid has much as possible).
+	 */
+	if (eb->oa_config == eb->i915->perf.oa.exclusive_stream->oa_config)
+		return 0;
+
+	err = i915_vma_move_to_active(eb->oa_vma, eb->request, 0);
+	if (err)
+		return err;
+
+	err = eb->engine->emit_bb_start(eb->request,
+					eb->oa_vma->node.start,
+					0, I915_DISPATCH_SECURE);
+	if (err)
+		return err;
+
+	swap(eb->oa_config, eb->i915->perf.oa.exclusive_stream->oa_config);
+
+	return 0;
+}
+
 static int eb_submit(struct i915_execbuffer *eb)
 {
 	int err;
@@ -2098,6 +2152,10 @@ static int eb_submit(struct i915_execbuffer *eb)
 			return err;
 	}
 
+	err = eb_oa_config(eb);
+	if (err)
+		return err;
+
 	err = eb->engine->emit_bb_start(eb->request,
 					eb->batch->node.start +
 					eb->batch_start_offset,
@@ -2528,8 +2586,25 @@ static int parse_timeline_fences(struct i915_user_extension __user *ext, void *d
 	return 0;
 }
 
+static int parse_perf_config(struct i915_user_extension __user *ext, void *data)
+{
+	struct i915_execbuffer *eb = data;
+
+	if (eb->extensions.flags & BIT(DRM_I915_GEM_EXECBUFFER_EXT_PERF))
+		return -EINVAL;
+
+	if (copy_from_user(&eb->extensions.perf_config, ext,
+			   sizeof(eb->extensions.perf_config)))
+		return -EFAULT;
+
+	eb->extensions.flags |= BIT(DRM_I915_GEM_EXECBUFFER_EXT_PERF);
+
+	return 0;
+}
+
 static const i915_user_extension_fn execbuf_extensions[] = {
         [DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES] = parse_timeline_fences,
+        [DRM_I915_GEM_EXECBUFFER_EXT_PERF] = parse_perf_config,
 };
 
 static int
@@ -2634,9 +2709,13 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		}
 	}
 
+	err = get_execbuf_oa_config(&eb);
+	if (err)
+		goto err_oa_config;
+
 	err = eb_create(&eb);
 	if (err)
-		goto err_out_fence;
+		goto err_oa_config;
 
 	GEM_BUG_ON(!eb.lut_size);
 
@@ -2661,6 +2740,27 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (unlikely(err))
 		goto err_unlock;
 
+	if (eb.extensions.flags & BIT(DRM_I915_GEM_EXECBUFFER_EXT_PERF)) {
+		struct file *perf_file;
+
+		if (!intel_engine_has_oa(eb.engine)) {
+			err = -ENODEV;
+			goto err_engine;
+		}
+
+		perf_file = fget(eb.extensions.perf_config.perf_fd);
+		if (!perf_file)
+			goto err_engine;
+
+		if (perf_file->private_data != eb.i915->perf.oa.exclusive_stream)
+			err = -EINVAL;
+
+		fput(perf_file);
+
+		if (unlikely(err))
+			goto err_engine;
+	}
+
 	err = eb_wait_for_ring(&eb); /* may temporarily drop struct_mutex */
 	if (unlikely(err))
 		goto err_engine;
@@ -2781,6 +2881,20 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		}
 	}
 
+	if (eb.extensions.flags & BIT(DRM_I915_GEM_EXECBUFFER_EXT_PERF)) {
+		eb.oa_vma = i915_vma_instance(eb.oa_bo,
+					      &eb.engine->i915->ggtt.vm, NULL);
+		if (unlikely(IS_ERR(eb.oa_vma))) {
+			err = PTR_ERR(eb.oa_vma);
+			eb.oa_vma = NULL;
+			goto err_request;
+		}
+
+		err = i915_vma_pin(eb.oa_vma, 0, 0, PIN_GLOBAL);
+		if (err)
+			goto err_request;
+	}
+
 	/*
 	 * Whilst this request exists, batch_obj will be on the
 	 * active_list, and so will hold the active reference. Only when this
@@ -2825,7 +2939,13 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	i915_gem_context_put(eb.gem_context);
 err_destroy:
 	eb_destroy(&eb);
-err_out_fence:
+err_oa_config:
+	if (eb.oa_config) {
+		i915_gem_object_put(eb.oa_bo);
+		i915_oa_config_put(eb.oa_config);
+	}
+	if (eb.oa_vma)
+		i915_vma_unpin(eb.oa_vma);
 	if (out_fence_fd != -1)
 		put_unused_fd(out_fence_fd);
 err_exec_fence:
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index d1508f0b4c84..b3ee3e3b58dd 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -859,6 +859,8 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
 
 	engine->set_default_submission(engine);
 
+	INIT_ACTIVE_REQUEST(&engine->last_oa_config);
+
 	return 0;
 
 err_unpin:
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 7e056114344e..39da40937e7f 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -363,6 +363,8 @@ struct intel_engine_cs {
 	struct i915_wa_list wa_list;
 	struct i915_wa_list whitelist;
 
+	struct i915_active_request last_oa_config;
+
 	u32             irq_keep_mask; /* always keep these interrupts */
 	u32		irq_enable_mask; /* bitmask to enable ring interrupt */
 	void		(*irq_enable)(struct intel_engine_cs *engine);
@@ -446,6 +448,7 @@ struct intel_engine_cs {
 #define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
 #define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(4)
 #define I915_ENGINE_IS_VIRTUAL       BIT(5)
+#define I915_ENGINE_HAS_OA           BIT(6)
 	unsigned int flags;
 
 	/*
@@ -541,6 +544,12 @@ intel_engine_is_virtual(const struct intel_engine_cs *engine)
 	return engine->flags & I915_ENGINE_IS_VIRTUAL;
 }
 
+static inline bool
+intel_engine_has_oa(const struct intel_engine_cs *engine)
+{
+	return engine->flags & I915_ENGINE_HAS_OA;
+}
+
 #define instdone_slice_mask(dev_priv__) \
 	(IS_GEN(dev_priv__, 7) ? \
 	 1 : RUNTIME_INFO(dev_priv__)->sseu.slice_mask)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index cce8337bdf9c..001b696834f2 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2769,6 +2769,7 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine)
 		engine->init_context = gen8_init_rcs_context;
 		engine->emit_flush = gen8_emit_flush_render;
 		engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_rcs;
+		engine->flags |= I915_ENGINE_HAS_OA;
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
index 02a4a52e2019..476d465f701c 100644
--- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
@@ -2206,8 +2206,10 @@ static void setup_rcs(struct intel_engine_cs *engine)
 		engine->irq_enable_mask = I915_USER_INTERRUPT;
 	}
 
-	if (IS_HASWELL(i915))
+	if (IS_HASWELL(i915)) {
 		engine->emit_bb_start = hsw_emit_bb_start;
+		engine->flags |= I915_ENGINE_HAS_OA;
+	}
 
 	engine->resume = rcs_resume;
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 088f9d2af3fa..a08c6123bf38 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -487,6 +487,10 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_PARAM_PERF_REVISION:
 		value = 1;
 		break;
+	case I915_PARAM_HAS_EXEC_PERF_CONFIG:
+		/* Obviously requires perf support. */
+		value = dev_priv->perf.initialized;
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index fe93a260bd28..664554114718 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1116,7 +1116,7 @@ struct i915_oa_config {
 
 	struct list_head vma_link;
 
-	atomic_t ref_count;
+	struct kref ref;
 };
 
 struct i915_perf_stream;
@@ -2615,6 +2615,12 @@ int i915_perf_get_oa_config(struct drm_i915_private *i915,
 			    int metrics_set,
 			    struct i915_oa_config **out_config,
 			    struct drm_i915_gem_object **out_obj);
+void i915_oa_config_release(struct kref *ref);
+
+static inline void i915_oa_config_put(struct i915_oa_config *oa_config)
+{
+	kref_put(&oa_config->ref, i915_oa_config_release);
+}
 
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct i915_address_space *vm,
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 03e6908282e3..6b659b5f1948 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -367,10 +367,9 @@ struct perf_open_properties {
 	int oa_period_exponent;
 };
 
-static void put_oa_config(struct i915_oa_config *oa_config)
+void i915_oa_config_release(struct kref *ref)
 {
-	if (!atomic_dec_and_test(&oa_config->ref_count))
-		return;
+	struct i915_oa_config *oa_config = container_of(ref, typeof(*oa_config), ref);
 
 	if (oa_config->obj) {
 		struct drm_i915_private *i915 = oa_config->i915;
@@ -488,7 +487,7 @@ int i915_perf_get_oa_config(struct drm_i915_private *i915,
 	}
 
 	if (out_config) {
-		atomic_inc(&oa_config->ref_count);
+		kref_get(&oa_config->ref);
 		*out_config = oa_config;
 	}
 
@@ -510,7 +509,7 @@ int i915_perf_get_oa_config(struct drm_i915_private *i915,
 	mutex_unlock(&i915->perf.metrics_lock);
 
 	if (ret && out_config) {
-		put_oa_config(oa_config);
+		i915_oa_config_put(oa_config);
 		*out_config = NULL;
 	}
 
@@ -1484,7 +1483,7 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
 	if (stream->ctx)
 		oa_put_render_ctx_id(stream);
 
-	put_oa_config(stream->oa_config);
+	i915_oa_config_put(stream->oa_config);
 
 	if (dev_priv->perf.oa.spurious_report_rs.missed) {
 		DRM_NOTE("%d spurious OA report notices suppressed due to ratelimiting\n",
@@ -2479,7 +2478,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	free_oa_buffer(dev_priv);
 
 err_oa_buf_alloc:
-	put_oa_config(stream->oa_config);
+	i915_oa_config_put(stream->oa_config);
 
 	intel_uncore_forcewake_put(&dev_priv->uncore, FORCEWAKE_ALL);
 	intel_runtime_pm_put(&dev_priv->runtime_pm, stream->wakeref);
@@ -3296,7 +3295,7 @@ void i915_perf_register(struct drm_i915_private *dev_priv)
 		goto sysfs_error;
 
 	dev_priv->perf.oa.test_config.i915 = dev_priv;
-	atomic_set(&dev_priv->perf.oa.test_config.ref_count, 1);
+	kref_init(&dev_priv->perf.oa.test_config.ref);
 
 	goto exit;
 
@@ -3553,7 +3552,7 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
 	}
 
 	oa_config->i915 = dev_priv;
-	atomic_set(&oa_config->ref_count, 1);
+	kref_init(&oa_config->ref);
 
 	if (!uuid_is_valid(args->uuid)) {
 		DRM_DEBUG("Invalid uuid format for OA config\n");
@@ -3652,7 +3651,7 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
 sysfs_err:
 	mutex_unlock(&dev_priv->perf.metrics_lock);
 reg_err:
-	put_oa_config(oa_config);
+	i915_oa_config_put(oa_config);
 	DRM_DEBUG("Failed to add new OA config\n");
 	return err;
 }
@@ -3708,7 +3707,7 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data,
 
 	DRM_DEBUG("Removed config %s id=%i\n", oa_config->uuid, oa_config->id);
 
-	put_oa_config(oa_config);
+	i915_oa_config_put(oa_config);
 
 	return 0;
 
@@ -3878,7 +3877,7 @@ static int destroy_config(int id, void *p, void *data)
 {
 	struct i915_oa_config *oa_config = p;
 
-	put_oa_config(oa_config);
+	i915_oa_config_put(oa_config);
 
 	return 0;
 }
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 6bd76a0d29e5..96bc09af8cc4 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -623,6 +623,16 @@ typedef struct drm_i915_irq_wait {
  */
 #define I915_PARAM_HAS_EXEC_TIMELINE_FENCES 55
 
+/*
+ * Request an i915/perf performance configuration change before running the
+ * commands given in an execbuf.
+ *
+ * Performance configuration ID and the file descriptor of the i915 perf
+ * stream are given through drm_i915_gem_execbuffer_ext_perf. See
+ * I915_EXEC_EXT.
+ */
+#define I915_PARAM_HAS_EXEC_PERF_CONFIG 56
+
 /* Must be kept compact -- no holes and well documented */
 
 typedef struct drm_i915_getparam {
@@ -1025,6 +1035,12 @@ enum drm_i915_gem_execbuffer_ext {
 	 */
 	DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES = 0,
 
+	/**
+	 * This identifier is associated with
+	 * drm_i915_gem_execbuffer_perf_ext.
+	 */
+	DRM_I915_GEM_EXECBUFFER_EXT_PERF,
+
 	DRM_I915_GEM_EXECBUFFER_EXT_MAX /* non-ABI */
 };
 
@@ -1055,6 +1071,27 @@ struct drm_i915_gem_execbuffer_ext_timeline_fences {
 	__u64 values_ptr;
 };
 
+struct drm_i915_gem_execbuffer_ext_perf {
+	struct i915_user_extension base;
+
+	/**
+	 * Performance file descriptor returned by DRM_IOCTL_I915_PERF_OPEN.
+	 * This is used to identify that the application
+	 */
+	__s32 perf_fd;
+
+	/**
+	 * Unused for now. Must be cleared to zero.
+	 */
+	__u32 pad;
+
+	/**
+	 * OA configuration ID to switch to before executing the commands
+	 * associated to the execbuf.
+	 */
+	__u64 oa_config;
+};
+
 struct drm_i915_gem_execbuffer2 {
 	/**
 	 * List of gem_exec_object2 structs
-- 
2.21.0.392.gf8f6787159e

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v6 09/11] drm/i915/perf: allow holding preemption on filtered ctx
  2019-07-01 11:34 [PATCH v6 00/11] drm/i915: Vulkan performance query support Lionel Landwerlin
                   ` (7 preceding siblings ...)
  2019-07-01 11:34 ` [PATCH v6 08/11] drm/i915: add a new perf configuration execbuf parameter Lionel Landwerlin
@ 2019-07-01 11:34 ` Lionel Landwerlin
  2019-07-01 12:03   ` Chris Wilson
  2019-07-01 11:34 ` [PATCH v6 10/11] drm/i915/perf: execute OA configuration from command stream Lionel Landwerlin
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 44+ messages in thread
From: Lionel Landwerlin @ 2019-07-01 11:34 UTC (permalink / raw)
  To: intel-gfx

We would like to make use of perf in Vulkan. The Vulkan API is much
lower level than OpenGL, with applications directly exposed to the
concept of command buffers (pretty much equivalent to our batch
buffers). In Vulkan, queries are always limited in scope to a command
buffer. In OpenGL, the lack of command buffer concept meant that
queries' duration could span multiple command buffers.

With that restriction gone in Vulkan, we would like to simplify
measuring performance just by measuring the deltas between the counter
snapshots written by 2 MI_RECORD_PERF_COUNT commands, rather than the
more complex scheme we currently have in the GL driver, using 2
MI_RECORD_PERF_COUNT commands and doing some post processing on the
stream of OA reports, coming from the global OA buffer, to remove any
unrelated deltas in between the 2 MI_RECORD_PERF_COUNT.

Disabling preemption only apply to a single context with which want to
query performance counters for and is considered a privileged
operation, by default protected by CAP_SYS_ADMIN. It is possible to
enable it for a normal user by disabling the paranoid stream setting.

v2: Store preemption setting in intel_context (Chris)

v3: Use priorities to avoid preemption rather than the HW mechanism

v4: Just modify the port priority reporting function

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  8 +++++++
 drivers/gpu/drm/i915/gt/intel_lrc.c           |  7 +++++-
 drivers/gpu/drm/i915/i915_drv.c               |  2 +-
 drivers/gpu/drm/i915/i915_drv.h               |  8 +++++++
 drivers/gpu/drm/i915/i915_perf.c              | 22 +++++++++++++++++--
 drivers/gpu/drm/i915/i915_priolist_types.h    |  7 ++++++
 drivers/gpu/drm/i915/i915_request.c           |  4 ++--
 drivers/gpu/drm/i915/i915_request.h           | 14 +++++++++++-
 drivers/gpu/drm/i915/intel_guc_submission.c   | 10 ++++++++-
 drivers/gpu/drm/i915/intel_pm.c               |  5 +++--
 include/uapi/drm/i915_drm.h                   | 11 ++++++++++
 11 files changed, 88 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index f92bace9caff..012d6d7f54e2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2104,6 +2104,14 @@ static int eb_oa_config(struct i915_execbuffer *eb)
 	if (err)
 		return err;
 
+	/*
+	 * If the perf stream was opened with hold preemption, flag the
+	 * request properly so that the priority of the request is bumped once
+	 * it reaches the execlist ports.
+	 */
+	if (eb->i915->perf.oa.exclusive_stream->hold_preemption)
+		eb->request->flags |= I915_REQUEST_FLAGS_PERF;
+
 	/*
 	 * If the config hasn't changed, skip reconfiguring the HW (this is
 	 * subject to a delay we want to avoid has much as possible).
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 001b696834f2..3ec7fe6d2790 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -256,7 +256,12 @@ static inline int rq_prio(const struct i915_request *rq)
 
 static int effective_prio(const struct i915_request *rq)
 {
-	int prio = rq_prio(rq);
+	int prio;
+
+	if (i915_request_has_perf(rq))
+		prio = I915_USER_PRIORITY(I915_PRIORITY_PERF);
+	else
+		prio = rq_prio(rq);
 
 	/*
 	 * On unwinding the active request, we give it a priority bump
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index a08c6123bf38..6395deadf73f 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -485,7 +485,7 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data,
 		value = INTEL_INFO(dev_priv)->has_coherent_ggtt;
 		break;
 	case I915_PARAM_PERF_REVISION:
-		value = 1;
+		value = 2;
 		break;
 	case I915_PARAM_HAS_EXEC_PERF_CONFIG:
 		/* Obviously requires perf support. */
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 664554114718..84f9dec59a19 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1232,6 +1232,14 @@ struct i915_perf_stream {
 	 */
 	bool enabled;
 
+	/**
+	 * @hold_preemption: Whether preemption is put on hold for command
+	 * submissions done on the @ctx. This is useful for some drivers that
+	 * cannot easily post process the OA buffer context to subtract delta
+	 * of performance counters not associated with @ctx.
+	 */
+	bool hold_preemption;
+
 	/**
 	 * @ops: The callbacks providing the implementation of this specific
 	 * type of configured stream.
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 6b659b5f1948..272036316192 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -344,6 +344,8 @@ static const struct i915_oa_format gen8_plus_oa_formats[I915_OA_FORMAT_MAX] = {
  * struct perf_open_properties - for validated properties given to open a stream
  * @sample_flags: `DRM_I915_PERF_PROP_SAMPLE_*` properties are tracked as flags
  * @single_context: Whether a single or all gpu contexts should be monitored
+ * @hold_preemption: Whether the preemption is disabled for the filtered
+ *                   context
  * @ctx_handle: A gem ctx handle for use with @single_context
  * @metrics_set: An ID for an OA unit metric set advertised via sysfs
  * @oa_format: An OA unit HW report format
@@ -358,6 +360,7 @@ struct perf_open_properties {
 	u32 sample_flags;
 
 	u64 single_context:1;
+	u64 hold_preemption:1;
 	u64 ctx_handle;
 
 	/* OA sampling state */
@@ -2399,6 +2402,8 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	stream->sample_flags |= SAMPLE_OA_REPORT;
 	stream->sample_size += format_size;
 
+	stream->hold_preemption = props->hold_preemption;
+
 	dev_priv->perf.oa.oa_buffer.format_size = format_size;
 	if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
 		return -EINVAL;
@@ -2937,6 +2942,15 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
 		}
 	}
 
+	if (props->hold_preemption) {
+		if (!props->single_context) {
+			DRM_DEBUG("preemption disable with no context\n");
+			ret = -EINVAL;
+			goto err;
+		}
+		privileged_op = true;
+	}
+
 	/*
 	 * On Haswell the OA unit supports clock gating off for a specific
 	 * context and in this mode there's no visibility of metrics for the
@@ -2951,8 +2965,9 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
 	 * MI_REPORT_PERF_COUNT commands and so consider it a privileged op to
 	 * enable the OA unit by default.
 	 */
-	if (IS_HASWELL(dev_priv) && specific_ctx)
+	if (IS_HASWELL(dev_priv) && specific_ctx && !props->hold_preemption) {
 		privileged_op = false;
+	}
 
 	/* Similar to perf's kernel.perf_paranoid_cpu sysctl option
 	 * we check a dev.i915.perf_stream_paranoid sysctl option
@@ -2961,7 +2976,7 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
 	 */
 	if (privileged_op &&
 	    i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
-		DRM_DEBUG("Insufficient privileges to open system-wide i915 perf stream\n");
+		DRM_DEBUG("Insufficient privileges to open i915 perf stream\n");
 		ret = -EACCES;
 		goto err_ctx;
 	}
@@ -3153,6 +3168,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 			props->oa_periodic = true;
 			props->oa_period_exponent = value;
 			break;
+		case DRM_I915_PERF_PROP_HOLD_PREEMPTION:
+			props->hold_preemption = !!value;
+			break;
 		case DRM_I915_PERF_PROP_MAX:
 			MISSING_CASE(id);
 			return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_priolist_types.h b/drivers/gpu/drm/i915/i915_priolist_types.h
index 49709de69875..bb9d1e1fb94a 100644
--- a/drivers/gpu/drm/i915/i915_priolist_types.h
+++ b/drivers/gpu/drm/i915/i915_priolist_types.h
@@ -17,6 +17,13 @@ enum {
 	I915_PRIORITY_NORMAL = I915_CONTEXT_DEFAULT_PRIORITY,
 	I915_PRIORITY_MAX = I915_CONTEXT_MAX_USER_PRIORITY + 1,
 
+	/* Requests containing performance queries must not be preempted by
+	 * another context. They get scheduled with their default priority and
+	 * once they reach the execlist ports we bump them to
+	 * I915_PRIORITY_PERF so that they stick to the HW until they finish.
+	 */
+	I915_PRIORITY_PERF,
+
 	I915_PRIORITY_INVALID = INT_MIN
 };
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 5ff87c4a0cd5..222c9c56e9de 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -292,7 +292,7 @@ static bool i915_request_retire(struct i915_request *rq)
 		dma_fence_signal_locked(&rq->fence);
 	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags))
 		i915_request_cancel_breadcrumb(rq);
-	if (rq->waitboost) {
+	if (i915_request_has_waitboost(rq)) {
 		GEM_BUG_ON(!atomic_read(&rq->i915->gt_pm.rps.num_waiters));
 		atomic_dec(&rq->i915->gt_pm.rps.num_waiters);
 	}
@@ -684,7 +684,7 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
 	rq->file_priv = NULL;
 	rq->batch = NULL;
 	rq->capture_list = NULL;
-	rq->waitboost = false;
+	rq->flags = 0;
 	rq->execution_mask = ALL_ENGINES;
 
 	INIT_LIST_HEAD(&rq->active_list);
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index b58ceef92e20..3bc2a3b8b9ca 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -216,7 +216,9 @@ struct i915_request {
 	/** Time at which this request was emitted, in jiffies. */
 	unsigned long emitted_jiffies;
 
-	bool waitboost;
+#define I915_REQUEST_FLAGS_WAITBOOST BIT(0)
+#define I915_REQUEST_FLAGS_PERF      BIT(1)
+	u32 flags;
 
 	/** timeline->request entry for this request */
 	struct list_head link;
@@ -430,6 +432,16 @@ static inline void i915_request_mark_complete(struct i915_request *rq)
 	rq->hwsp_seqno = (u32 *)&rq->fence.seqno; /* decouple from HWSP */
 }
 
+static inline bool i915_request_has_waitboost(const struct i915_request *rq)
+{
+	return rq->flags & I915_REQUEST_FLAGS_WAITBOOST;
+}
+
+static inline bool i915_request_has_perf(const struct i915_request *rq)
+{
+	return rq->flags & I915_REQUEST_FLAGS_PERF;
+}
+
 bool i915_retire_requests(struct drm_i915_private *i915);
 
 #endif /* I915_REQUEST_H */
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 12c22359fdac..48e7ae3d67a2 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -707,6 +707,14 @@ static inline int rq_prio(const struct i915_request *rq)
 	return rq->sched.attr.priority | __NO_PREEMPTION;
 }
 
+static inline int effective_prio(const struct i915_request *rq)
+{
+	if (i915_request_has_perf(rq))
+		return I915_USER_PRIORITY(I915_PRIORITY_PERF) | __NO_PREEMPTION;
+
+	return rq_prio(rq);
+}
+
 static struct i915_request *schedule_in(struct i915_request *rq, int idx)
 {
 	trace_i915_request_in(rq, idx);
@@ -747,7 +755,7 @@ static void __guc_dequeue(struct intel_engine_cs *engine)
 				&engine->i915->guc.preempt_work[engine->id];
 			int prio = execlists->queue_priority_hint;
 
-			if (i915_scheduler_need_preempt(prio, rq_prio(last))) {
+			if (i915_scheduler_need_preempt(prio, effective_prio(last))) {
 				intel_write_status_page(engine,
 							I915_GEM_HWS_PREEMPT,
 							GUC_PREEMPT_INPROGRESS);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index d9a7a13ce32a..491251419c67 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6891,9 +6891,10 @@ void gen6_rps_boost(struct i915_request *rq)
 	/* Serializes with i915_request_retire() */
 	boost = false;
 	spin_lock_irqsave(&rq->lock, flags);
-	if (!rq->waitboost && !dma_fence_is_signaled_locked(&rq->fence)) {
+	if (!i915_request_has_waitboost(rq) &&
+	    !dma_fence_is_signaled_locked(&rq->fence)) {
 		boost = !atomic_fetch_inc(&rps->num_waiters);
-		rq->waitboost = true;
+		rq->flags |= I915_REQUEST_FLAGS_WAITBOOST;
 	}
 	spin_unlock_irqrestore(&rq->lock, flags);
 	if (!boost)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 96bc09af8cc4..f2b7a6c1e870 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1983,6 +1983,17 @@ enum drm_i915_perf_property_id {
 	 */
 	DRM_I915_PERF_PROP_OA_EXPONENT,
 
+	/**
+	 * Specifying this property is only valid when specify a context to
+	 * filter with DRM_I915_PERF_PROP_CTX_HANDLE. Specifying this property
+	 * will hold preemption of the particular context we want to gather
+	 * performance data about. The execbuf2 submissions must include a
+	 * drm_i915_gem_execbuffer_ext_perf parameter for this to apply.
+	 *
+	 * This property is available in perf revision 2.
+	 */
+	DRM_I915_PERF_PROP_HOLD_PREEMPTION,
+
 	DRM_I915_PERF_PROP_MAX /* non-ABI */
 };
 
-- 
2.21.0.392.gf8f6787159e

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v6 10/11] drm/i915/perf: execute OA configuration from command stream
  2019-07-01 11:34 [PATCH v6 00/11] drm/i915: Vulkan performance query support Lionel Landwerlin
                   ` (8 preceding siblings ...)
  2019-07-01 11:34 ` [PATCH v6 09/11] drm/i915/perf: allow holding preemption on filtered ctx Lionel Landwerlin
@ 2019-07-01 11:34 ` Lionel Landwerlin
  2019-07-01 13:32   ` Chris Wilson
  2019-07-01 11:34 ` [PATCH v6 11/11] drm/i915: add support for perf configuration queries Lionel Landwerlin
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 44+ messages in thread
From: Lionel Landwerlin @ 2019-07-01 11:34 UTC (permalink / raw)
  To: intel-gfx

We haven't run into issues with programming the global OA/NOA
registers configuration from CPU so far, but HW engineers actually
recommend doing this from the command streamer.

Since we have a command buffer prepared for the execbuffer side of
things, we can reuse that approach here too.

This also allows us to significantly reduce the amount of time we hold
the main lock.

v2: Drop the global lock as much as possible

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |   7 +
 drivers/gpu/drm/i915/i915_perf.c | 261 ++++++++++++++++++-------------
 2 files changed, 161 insertions(+), 107 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 84f9dec59a19..a2430bc3ffd9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1205,6 +1205,13 @@ struct i915_perf_stream {
 	 */
 	intel_wakeref_t wakeref;
 
+	/**
+	 * @initial_config_rq: First request run at the opening of the i915
+	 * perf stream to configure the HW. Should be NULL after the perf
+	 * stream has been opened successfully.
+	 */
+	struct i915_request *initial_config_rq;
+
 	/**
 	 * @sample_flags: Flags representing the `DRM_I915_PERF_PROP_SAMPLE_*`
 	 * properties given when opening a stream, representing the contents
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 272036316192..8ee4bd712cb6 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -392,6 +392,19 @@ void i915_oa_config_release(struct kref *ref)
 	kfree(oa_config);
 }
 
+static void i915_oa_config_dispose_buffers(struct drm_i915_private *i915)
+{
+	struct i915_oa_config *oa_config, *next;
+
+	mutex_lock(&i915->perf.metrics_lock);
+	list_for_each_entry_safe(oa_config, next, &i915->perf.metrics_buffers, vma_link) {
+		list_del(&oa_config->vma_link);
+		i915_gem_object_put(oa_config->obj);
+		oa_config->obj = NULL;
+	}
+	mutex_unlock(&i915->perf.metrics_lock);
+}
+
 static u32 *write_cs_mi_lri(u32 *cs, const struct i915_oa_reg *reg_data, u32 n_regs)
 {
 	u32 i;
@@ -1449,6 +1462,14 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)
 	}
 }
 
+static void free_noa_wait(struct drm_i915_private *i915)
+{
+	mutex_lock(&i915->drm.struct_mutex);
+	i915_vma_unpin_and_release(&i915->perf.oa.noa_wait,
+				   I915_VMA_RELEASE_MAP);
+	mutex_unlock(&i915->drm.struct_mutex);
+}
+
 static void
 free_oa_buffer(struct drm_i915_private *i915)
 {
@@ -1468,16 +1489,17 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
 
 	BUG_ON(stream != dev_priv->perf.oa.exclusive_stream);
 
+	dev_priv->perf.oa.ops.disable_metric_set(dev_priv);
+
 	/*
 	 * Unset exclusive_stream first, it will be checked while disabling
 	 * the metric set on gen8+.
 	 */
 	mutex_lock(&dev_priv->drm.struct_mutex);
 	dev_priv->perf.oa.exclusive_stream = NULL;
-	dev_priv->perf.oa.ops.disable_metric_set(dev_priv);
-	i915_vma_unpin_and_release(&dev_priv->perf.oa.noa_wait, 0);
 	mutex_unlock(&dev_priv->drm.struct_mutex);
 
+	free_noa_wait(dev_priv);
 	free_oa_buffer(dev_priv);
 
 	intel_uncore_forcewake_put(&dev_priv->uncore, FORCEWAKE_ALL);
@@ -1710,6 +1732,10 @@ static int alloc_noa_wait(struct drm_i915_private *i915)
 		return PTR_ERR(bo);
 	}
 
+	ret = i915_mutex_lock_interruptible(&i915->drm);
+	if (ret)
+		goto err_unref;
+
 	/*
 	 * We pin in GGTT because we jump into this buffer now because
 	 * multiple OA config BOs will have a jump to this address and it
@@ -1717,10 +1743,13 @@ static int alloc_noa_wait(struct drm_i915_private *i915)
 	 */
 	vma = i915_gem_object_ggtt_pin(bo, NULL, 0, 4096, 0);
 	if (IS_ERR(vma)) {
+		mutex_unlock(&i915->drm.struct_mutex);
 		ret = PTR_ERR(vma);
 		goto err_unref;
 	}
 
+	mutex_unlock(&i915->drm.struct_mutex);
+
 	batch = cs = i915_gem_object_pin_map(bo, I915_MAP_WB);
 	if (IS_ERR(batch)) {
 		ret = PTR_ERR(batch);
@@ -1852,7 +1881,11 @@ static int alloc_noa_wait(struct drm_i915_private *i915)
 	return 0;
 
 err_unpin:
-	__i915_vma_unpin(vma);
+	mutex_lock(&i915->drm.struct_mutex);
+	i915_vma_unpin_and_release(&i915->perf.oa.noa_wait, 0);
+	mutex_unlock(&i915->drm.struct_mutex);
+
+	return ret;
 
 err_unref:
 	i915_gem_object_put(bo);
@@ -1860,23 +1893,55 @@ static int alloc_noa_wait(struct drm_i915_private *i915)
 	return ret;
 }
 
-static void config_oa_regs(struct drm_i915_private *dev_priv,
-			   const struct i915_oa_reg *regs,
-			   u32 n_regs)
+static int emit_oa_config(struct drm_i915_private *i915,
+			  struct i915_perf_stream *stream)
 {
-	u32 i;
+	struct i915_oa_config *oa_config = stream->oa_config;
+	struct i915_request *rq = stream->initial_config_rq;
+	struct i915_vma *vma;
+	u32 *cs;
+	int err;
 
-	for (i = 0; i < n_regs; i++) {
-		const struct i915_oa_reg *reg = regs + i;
+	vma = i915_vma_instance(oa_config->obj, &i915->ggtt.vm, NULL);
+	if (unlikely(IS_ERR(vma)))
+		return PTR_ERR(vma);
+
+	err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL);
+	if (err)
+		return err;
 
-		I915_WRITE(reg->addr, reg->value);
+	err = i915_vma_move_to_active(vma, rq, 0);
+	if (err) {
+		i915_vma_unpin(vma);
+		return err;
 	}
+
+	cs = intel_ring_begin(rq, INTEL_GEN(i915) >= 8 ? 4 : 2);
+	if (IS_ERR(cs)) {
+		i915_vma_unpin(vma);
+		return PTR_ERR(cs);
+	}
+
+	if (INTEL_GEN(i915) > 8) {
+		*cs++ = MI_BATCH_BUFFER_START_GEN8;
+		*cs++ = lower_32_bits(vma->node.start);
+		*cs++ = upper_32_bits(vma->node.start);
+		*cs++ = MI_NOOP;
+	} else {
+		*cs++ = MI_BATCH_BUFFER_START;
+		*cs++ = vma->node.start;
+	}
+
+	intel_ring_advance(rq, cs);
+
+	i915_vma_unpin(vma);
+
+	return 0;
 }
 
 static int hsw_enable_metric_set(struct i915_perf_stream *stream)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
-	const struct i915_oa_config *oa_config = stream->oa_config;
 
 	/* PRM:
 	 *
@@ -1892,35 +1957,7 @@ static int hsw_enable_metric_set(struct i915_perf_stream *stream)
 	I915_WRITE(GEN6_UCGCTL1, (I915_READ(GEN6_UCGCTL1) |
 				  GEN6_CSUNIT_CLOCK_GATE_DISABLE));
 
-	config_oa_regs(dev_priv, oa_config->mux_regs, oa_config->mux_regs_len);
-
-	/* It apparently takes a fairly long time for a new MUX
-	 * configuration to be be applied after these register writes.
-	 * This delay duration was derived empirically based on the
-	 * render_basic config but hopefully it covers the maximum
-	 * configuration latency.
-	 *
-	 * As a fallback, the checks in _append_oa_reports() to skip
-	 * invalid OA reports do also seem to work to discard reports
-	 * generated before this config has completed - albeit not
-	 * silently.
-	 *
-	 * Unfortunately this is essentially a magic number, since we
-	 * don't currently know of a reliable mechanism for predicting
-	 * how long the MUX config will take to apply and besides
-	 * seeing invalid reports we don't know of a reliable way to
-	 * explicitly check that the MUX config has landed.
-	 *
-	 * It's even possible we've miss characterized the underlying
-	 * problem - it just seems like the simplest explanation why
-	 * a delay at this location would mitigate any invalid reports.
-	 */
-	usleep_range(15000, 20000);
-
-	config_oa_regs(dev_priv, oa_config->b_counter_regs,
-		       oa_config->b_counter_regs_len);
-
-	return 0;
+	return emit_oa_config(dev_priv, stream);
 }
 
 static void hsw_disable_metric_set(struct drm_i915_private *dev_priv)
@@ -2025,10 +2062,18 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 {
 	unsigned int map_type = i915_coherent_map_type(dev_priv);
 	struct i915_gem_context *ctx;
-	struct i915_request *rq;
 	int ret;
 
-	lockdep_assert_held(&dev_priv->drm.struct_mutex);
+	/* When calling without a configuration, we're tearing down the i915
+	 * perf stream. Don't be interruptible in that case.
+	 */
+	if (oa_config) {
+		ret = i915_mutex_lock_interruptible(&dev_priv->drm);
+		if (ret)
+			return ret;
+	} else {
+		mutex_lock(&dev_priv->drm.struct_mutex);
+	}
 
 	/*
 	 * The OA register config is setup through the context image. This image
@@ -2047,7 +2092,7 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 				     I915_WAIT_LOCKED,
 				     MAX_SCHEDULE_TIMEOUT);
 	if (ret)
-		return ret;
+		goto unlock;
 
 	/* Update all contexts now that we've stalled the submission. */
 	list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
@@ -2070,7 +2115,8 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 						       map_type);
 			if (IS_ERR(regs)) {
 				i915_gem_context_unlock_engines(ctx);
-				return PTR_ERR(regs);
+				ret = PTR_ERR(regs);
+				goto unlock;
 			}
 
 			ce->state->obj->mm.dirty = true;
@@ -2084,16 +2130,14 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 	}
 
 	/*
-	 * Apply the configuration by doing one context restore of the edited
-	 * context image.
+	 * The above configuration will be applied when called
+	 * config_oa_regs().
 	 */
-	rq = i915_request_create(dev_priv->engine[RCS0]->kernel_context);
-	if (IS_ERR(rq))
-		return PTR_ERR(rq);
 
-	i915_request_add(rq);
+unlock:
+	mutex_unlock(&dev_priv->drm.struct_mutex);
 
-	return 0;
+	return ret;
 }
 
 static int gen8_enable_metric_set(struct i915_perf_stream *stream)
@@ -2140,35 +2184,7 @@ static int gen8_enable_metric_set(struct i915_perf_stream *stream)
 	if (ret)
 		return ret;
 
-	config_oa_regs(dev_priv, oa_config->mux_regs, oa_config->mux_regs_len);
-
-	/* It apparently takes a fairly long time for a new MUX
-	 * configuration to be be applied after these register writes.
-	 * This delay duration was derived empirically based on the
-	 * render_basic config but hopefully it covers the maximum
-	 * configuration latency.
-	 *
-	 * As a fallback, the checks in _append_oa_reports() to skip
-	 * invalid OA reports do also seem to work to discard reports
-	 * generated before this config has completed - albeit not
-	 * silently.
-	 *
-	 * Unfortunately this is essentially a magic number, since we
-	 * don't currently know of a reliable mechanism for predicting
-	 * how long the MUX config will take to apply and besides
-	 * seeing invalid reports we don't know of a reliable way to
-	 * explicitly check that the MUX config has landed.
-	 *
-	 * It's even possible we've miss characterized the underlying
-	 * problem - it just seems like the simplest explanation why
-	 * a delay at this location would mitigate any invalid reports.
-	 */
-	usleep_range(15000, 20000);
-
-	config_oa_regs(dev_priv, oa_config->b_counter_regs,
-		       oa_config->b_counter_regs_len);
-
-	return 0;
+	return emit_oa_config(dev_priv, stream);
 }
 
 static void gen8_disable_metric_set(struct drm_i915_private *dev_priv)
@@ -2339,7 +2355,9 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 			       struct perf_open_properties *props)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
+	struct drm_i915_gem_object *obj;
 	int format_size;
+	long timeout;
 	int ret;
 
 	/* If the sysfs metrics/ directory wasn't registered for some
@@ -2423,13 +2441,6 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 		}
 	}
 
-	ret = i915_perf_get_oa_config(dev_priv, props->metrics_set,
-				      &stream->oa_config, NULL);
-	if (ret) {
-		DRM_DEBUG("Invalid OA config id=%i\n", props->metrics_set);
-		goto err_config;
-	}
-
 	ret = alloc_noa_wait(dev_priv);
 	if (ret) {
 		DRM_DEBUG("Unable to allocate NOA wait batch buffer\n");
@@ -2455,47 +2466,90 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	if (ret)
 		goto err_oa_buf_alloc;
 
+	ret = i915_perf_get_oa_config(dev_priv, props->metrics_set,
+				      &stream->oa_config, &obj);
+	if (ret) {
+		DRM_DEBUG("Invalid OA config id=%i\n", props->metrics_set);
+		goto err_config;
+	}
+
+	/*
+	 * We just need the buffer to be created, but not our own reference on
+	 * it as the oa_config already has one.
+	 */
+	i915_gem_object_put(obj);
+
+	stream->initial_config_rq =
+		i915_request_create(dev_priv->engine[RCS0]->kernel_context);
+	if (IS_ERR(stream->initial_config_rq)) {
+		ret = PTR_ERR(stream->initial_config_rq);
+		goto err_initial_config;
+	}
+
+	stream->ops = &i915_oa_stream_ops;
+
 	ret = i915_mutex_lock_interruptible(&dev_priv->drm);
 	if (ret)
 		goto err_lock;
 
-	stream->ops = &i915_oa_stream_ops;
+	ret = i915_active_request_set(&dev_priv->engine[RCS0]->last_oa_config,
+				      stream->initial_config_rq);
+	if (ret) {
+		mutex_unlock(&dev_priv->drm.struct_mutex);
+		goto err_lock;
+	}
+
 	dev_priv->perf.oa.exclusive_stream = stream;
 
+	mutex_unlock(&dev_priv->drm.struct_mutex);
+
 	ret = dev_priv->perf.oa.ops.enable_metric_set(stream);
 	if (ret) {
 		DRM_DEBUG("Unable to enable metric set\n");
 		goto err_enable;
 	}
 
-	DRM_DEBUG("opening stream oa config uuid=%s\n", stream->oa_config->uuid);
+	i915_request_get(stream->initial_config_rq);
 
-	mutex_unlock(&dev_priv->drm.struct_mutex);
+	i915_request_add(stream->initial_config_rq);
+
+	timeout = i915_request_wait(stream->initial_config_rq,
+				    I915_WAIT_INTERRUPTIBLE,
+				    MAX_SCHEDULE_TIMEOUT);
+	i915_request_put(stream->initial_config_rq);
+	stream->initial_config_rq = NULL;
+
+	ret = timeout < 0 ? timeout : 0;
+	if (ret)
+		goto err_enable;
+
+	DRM_DEBUG("opening stream oa config uuid=%s\n", stream->oa_config->uuid);
 
 	return 0;
 
 err_enable:
+	mutex_lock(&dev_priv->drm.struct_mutex);
 	dev_priv->perf.oa.exclusive_stream = NULL;
-	dev_priv->perf.oa.ops.disable_metric_set(dev_priv);
 	mutex_unlock(&dev_priv->drm.struct_mutex);
+	dev_priv->perf.oa.ops.disable_metric_set(dev_priv);
 
 err_lock:
-	free_oa_buffer(dev_priv);
+	i915_request_add(stream->initial_config_rq);
 
-err_oa_buf_alloc:
+err_initial_config:
 	i915_oa_config_put(stream->oa_config);
+	i915_oa_config_dispose_buffers(dev_priv);
+
+err_config:
+	free_oa_buffer(dev_priv);
 
+err_oa_buf_alloc:
 	intel_uncore_forcewake_put(&dev_priv->uncore, FORCEWAKE_ALL);
 	intel_runtime_pm_put(&dev_priv->runtime_pm, stream->wakeref);
 
-	mutex_lock(&dev_priv->drm.struct_mutex);
-	i915_vma_unpin_and_release(&dev_priv->perf.oa.noa_wait, 0);
-	mutex_unlock(&dev_priv->drm.struct_mutex);
+	free_noa_wait(dev_priv);
 
 err_noa_wait_alloc:
-	i915_oa_config_put(stream->oa_config);
-
-err_config:
 	if (stream->ctx)
 		oa_put_render_ctx_id(stream);
 
@@ -2857,20 +2911,13 @@ static int i915_perf_release(struct inode *inode, struct file *file)
 {
 	struct i915_perf_stream *stream = file->private_data;
 	struct drm_i915_private *dev_priv = stream->dev_priv;
-	struct i915_oa_config *oa_config, *next;
 
 	mutex_lock(&dev_priv->perf.lock);
 
 	i915_perf_destroy_locked(stream);
 
 	/* Dispose of all oa config batch buffers. */
-	mutex_lock(&dev_priv->perf.metrics_lock);
-	list_for_each_entry_safe(oa_config, next, &dev_priv->perf.metrics_buffers, vma_link) {
-		list_del(&oa_config->vma_link);
-		i915_gem_object_put(oa_config->obj);
-		oa_config->obj = NULL;
-	}
-	mutex_unlock(&dev_priv->perf.metrics_lock);
+	i915_oa_config_dispose_buffers(dev_priv);
 
 	mutex_unlock(&dev_priv->perf.lock);
 
-- 
2.21.0.392.gf8f6787159e

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v6 11/11] drm/i915: add support for perf configuration queries
  2019-07-01 11:34 [PATCH v6 00/11] drm/i915: Vulkan performance query support Lionel Landwerlin
                   ` (9 preceding siblings ...)
  2019-07-01 11:34 ` [PATCH v6 10/11] drm/i915/perf: execute OA configuration from command stream Lionel Landwerlin
@ 2019-07-01 11:34 ` Lionel Landwerlin
  2019-07-01 13:08 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Vulkan performance query support (rev6) Patchwork
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: Lionel Landwerlin @ 2019-07-01 11:34 UTC (permalink / raw)
  To: intel-gfx

Listing configurations at the moment is supported only through sysfs.
This might cause issues for applications wanting to list
configurations from a container where sysfs isn't available.

This change adds a way to query the number of configurations and their
content through the i915 query uAPI.

v2: Fix sparse warnings (Lionel)
    Add support to query configuration using uuid (Lionel)

v3: Fix some inconsistency in uapi header (Lionel)
    Fix unlocking when not locked issue (Lionel)
    Add debug messages (Lionel)

v4: Fix missing unlock (Dan)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h   |   6 +
 drivers/gpu/drm/i915/i915_perf.c  |   3 +
 drivers/gpu/drm/i915/i915_query.c | 279 ++++++++++++++++++++++++++++++
 include/uapi/drm/i915_drm.h       |  65 ++++++-
 4 files changed, 350 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a2430bc3ffd9..2948f8be2293 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1732,6 +1732,12 @@ struct drm_i915_private {
 		 */
 		struct list_head metrics_buffers;
 
+		/*
+		 * Number of dynamic configurations, you need to hold
+		 * dev_priv->perf.metrics_lock to access it.
+		 */
+		u32 n_metrics;
+
 		/*
 		 * Lock associated with anything below within this structure
 		 * except exclusive_stream.
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 8ee4bd712cb6..b9047211e08d 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -3707,6 +3707,8 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
 		goto sysfs_err;
 	}
 
+	dev_priv->perf.n_metrics++;
+
 	mutex_unlock(&dev_priv->perf.metrics_lock);
 
 	DRM_DEBUG("Added config %s id=%i\n", oa_config->uuid, oa_config->id);
@@ -3767,6 +3769,7 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data,
 			   &oa_config->sysfs_metric);
 
 	idr_remove(&dev_priv->perf.metrics_idr, *arg);
+	dev_priv->perf.n_metrics--;
 
 	mutex_unlock(&dev_priv->perf.metrics_lock);
 
diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
index 7b7016171057..c1e203c7b2c2 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -143,10 +143,289 @@ query_engine_info(struct drm_i915_private *i915,
 	return len;
 }
 
+static int can_copy_perf_config_registers_or_number(u32 user_n_regs,
+						    u64 user_regs_ptr,
+						    u32 kernel_n_regs)
+{
+	/*
+	 * We'll just put the number of registers, and won't copy the
+	 * register.
+	 */
+	if (user_n_regs == 0)
+		return 0;
+
+	if (user_n_regs < kernel_n_regs)
+		return -EINVAL;
+
+	if (!access_ok(u64_to_user_ptr(user_regs_ptr),
+		       2 * sizeof(u32) * kernel_n_regs))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int copy_perf_config_registers_or_number(const struct i915_oa_reg *kernel_regs,
+						u32 kernel_n_regs,
+						u64 user_regs_ptr,
+						u32 *user_n_regs)
+{
+	u32 r;
+
+	if (*user_n_regs == 0) {
+		*user_n_regs = kernel_n_regs;
+		return 0;
+	}
+
+	*user_n_regs = kernel_n_regs;
+
+	for (r = 0; r < kernel_n_regs; r++) {
+		u32 __user *user_reg_ptr =
+			u64_to_user_ptr(user_regs_ptr + sizeof(u32) * r * 2);
+		u32 __user *user_val_ptr =
+			u64_to_user_ptr(user_regs_ptr + sizeof(u32) * r * 2 +
+					sizeof(u32));
+		int ret;
+
+		ret = __put_user(i915_mmio_reg_offset(kernel_regs[r].addr),
+				 user_reg_ptr);
+		if (ret)
+			return -EFAULT;
+
+		ret = __put_user(kernel_regs[r].value, user_val_ptr);
+		if (ret)
+			return -EFAULT;
+	}
+
+	return 0;
+}
+
+static int query_perf_config_data(struct drm_i915_private *i915,
+				  struct drm_i915_query_item *query_item,
+				  bool use_uuid)
+{
+	struct drm_i915_query_perf_config __user *user_query_config_ptr =
+		u64_to_user_ptr(query_item->data_ptr);
+	struct drm_i915_perf_oa_config __user *user_config_ptr =
+		u64_to_user_ptr(query_item->data_ptr +
+				sizeof(struct drm_i915_query_perf_config));
+	struct drm_i915_perf_oa_config user_config;
+	struct i915_oa_config *oa_config = NULL;
+	u32 flags, total_size;
+	int ret;
+
+	if (!i915->perf.initialized)
+		return -ENODEV;
+
+	total_size = sizeof(struct drm_i915_query_perf_config) +
+		sizeof(struct drm_i915_perf_oa_config);
+
+	if (query_item->length == 0)
+		return total_size;
+
+	if (query_item->length < total_size) {
+		DRM_DEBUG("Invalid query config data item size=%u expected=%u\n",
+			  query_item->length, total_size);
+		return -EINVAL;
+	}
+
+	if (!access_ok(user_query_config_ptr, total_size))
+		return -EFAULT;
+
+	if (__get_user(flags, &user_query_config_ptr->flags))
+		return -EFAULT;
+
+	if (flags != 0)
+		return -EINVAL;
+
+	ret = mutex_lock_interruptible(&i915->perf.metrics_lock);
+	if (ret)
+		return ret;
+
+	if (use_uuid) {
+		char uuid[UUID_STRING_LEN + 1] = { 0, };
+		struct i915_oa_config *tmp;
+		int id;
+
+		BUILD_BUG_ON(sizeof(user_query_config_ptr->uuid) >= sizeof(uuid));
+
+		if (__copy_from_user(uuid, user_query_config_ptr->uuid,
+				     sizeof(user_query_config_ptr->uuid))) {
+			ret = -EFAULT;
+			goto out;
+		}
+
+		idr_for_each_entry(&i915->perf.metrics_idr, tmp, id) {
+			if (!strcmp(tmp->uuid, uuid)) {
+				oa_config = tmp;
+				break;
+			}
+		}
+	} else {
+		u64 config_id;
+
+		if (__get_user(config_id, &user_query_config_ptr->config)) {
+			ret = -EFAULT;
+			goto out;
+		}
+
+		if (config_id == 1)
+			oa_config = &i915->perf.oa.test_config;
+		else
+			oa_config = idr_find(&i915->perf.metrics_idr, config_id);
+	}
+
+	if (!oa_config) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	if (__copy_from_user(&user_config, user_config_ptr,
+			     sizeof(user_config))) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	ret = can_copy_perf_config_registers_or_number(user_config.n_boolean_regs,
+						       user_config.boolean_regs_ptr,
+						       oa_config->b_counter_regs_len);
+	if (ret)
+		goto out;
+
+	ret = can_copy_perf_config_registers_or_number(user_config.n_flex_regs,
+						       user_config.flex_regs_ptr,
+						       oa_config->flex_regs_len);
+	if (ret)
+		goto out;
+
+	ret = can_copy_perf_config_registers_or_number(user_config.n_mux_regs,
+						       user_config.mux_regs_ptr,
+						       oa_config->mux_regs_len);
+	if (ret)
+		goto out;
+
+	ret = copy_perf_config_registers_or_number(oa_config->b_counter_regs,
+						   oa_config->b_counter_regs_len,
+						   user_config.boolean_regs_ptr,
+						   &user_config.n_boolean_regs);
+	if (ret)
+		goto out;
+
+	ret = copy_perf_config_registers_or_number(oa_config->flex_regs,
+						   oa_config->flex_regs_len,
+						   user_config.flex_regs_ptr,
+						   &user_config.n_flex_regs);
+	if (ret)
+		goto out;
+
+	ret = copy_perf_config_registers_or_number(oa_config->mux_regs,
+						   oa_config->mux_regs_len,
+						   user_config.mux_regs_ptr,
+						   &user_config.n_mux_regs);
+	if (ret)
+		goto out;
+
+	memcpy(user_config.uuid, oa_config->uuid, sizeof(user_config.uuid));
+
+	if (__copy_to_user(user_config_ptr, &user_config,
+			   sizeof(user_config))) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	ret = total_size;
+
+out:
+	mutex_unlock(&i915->perf.metrics_lock);
+	return ret;
+}
+
+static int query_perf_config_list(struct drm_i915_private *i915,
+				  struct drm_i915_query_item *query_item)
+{
+	struct drm_i915_query_perf_config __user *user_query_config_ptr =
+		u64_to_user_ptr(query_item->data_ptr);
+	struct i915_oa_config *oa_config;
+	u32 flags, total_size;
+	u64 n_configs;
+	int ret, id;
+
+	if (!i915->perf.initialized)
+		return -ENODEV;
+
+	/* Count the default test configuration */
+	n_configs = i915->perf.n_metrics + 1;
+	total_size = sizeof(struct drm_i915_query_perf_config) +
+		sizeof(u64) * n_configs;
+
+	if (query_item->length == 0)
+		return total_size;
+
+	if (query_item->length < total_size) {
+		DRM_DEBUG("Invalid query config list item size=%u expected=%u\n",
+			  query_item->length, total_size);
+		return -EINVAL;
+	}
+
+	if (!access_ok(user_query_config_ptr, total_size))
+		return -EFAULT;
+
+	if (__get_user(flags, &user_query_config_ptr->flags))
+		return -EFAULT;
+
+	if (flags != 0)
+		return -EINVAL;
+
+	if (__put_user(n_configs, &user_query_config_ptr->config))
+		return -EFAULT;
+
+	if (__put_user((u64)1ULL, &user_query_config_ptr->data[0]))
+		return -EFAULT;
+
+	ret = mutex_lock_interruptible(&i915->perf.metrics_lock);
+	if (ret)
+		return ret;
+
+	n_configs = 1;
+	idr_for_each_entry(&i915->perf.metrics_idr, oa_config, id) {
+		u64 __user *item =
+			u64_to_user_ptr(query_item->data_ptr +
+					sizeof(struct drm_i915_query_perf_config) +
+					n_configs * sizeof(u64));
+
+		if (__put_user((u64)id, item)) {
+			ret = -EFAULT;
+			goto out;
+		}
+		n_configs++;
+	}
+
+	ret = total_size;
+
+out:
+	mutex_unlock(&i915->perf.metrics_lock);
+	return ret;
+}
+
+static int query_perf_config(struct drm_i915_private *i915,
+			     struct drm_i915_query_item *query_item)
+{
+	switch (query_item->flags) {
+	case DRM_I915_QUERY_PERF_CONFIG_LIST:
+		return query_perf_config_list(i915, query_item);
+	case DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID:
+		return query_perf_config_data(i915, query_item, true);
+	case DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_ID:
+		return query_perf_config_data(i915, query_item, false);
+	default:
+		return -EINVAL;
+	}
+}
+
 static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
 					struct drm_i915_query_item *query_item) = {
 	query_topology_info,
 	query_engine_info,
+	query_perf_config,
 };
 
 int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index f2b7a6c1e870..49a9dacead08 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1036,8 +1036,7 @@ enum drm_i915_gem_execbuffer_ext {
 	DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES = 0,
 
 	/**
-	 * This identifier is associated with
-	 * drm_i915_gem_execbuffer_perf_ext.
+	 * See drm_i915_gem_execbuffer_perf_ext.
 	 */
 	DRM_I915_GEM_EXECBUFFER_EXT_PERF,
 
@@ -2109,6 +2108,7 @@ struct drm_i915_query_item {
 	__u64 query_id;
 #define DRM_I915_QUERY_TOPOLOGY_INFO    1
 #define DRM_I915_QUERY_ENGINE_INFO	2
+#define DRM_I915_QUERY_PERF_CONFIG      3
 /* Must be kept compact -- no holes and well documented */
 
 	/*
@@ -2120,9 +2120,18 @@ struct drm_i915_query_item {
 	__s32 length;
 
 	/*
-	 * Unused for now. Must be cleared to zero.
+	 * When query_id == DRM_I915_QUERY_TOPOLOGY_INFO, must be 0.
+	 *
+	 * When query_id == DRM_I915_QUERY_PERF_CONFIG, must be one of the
+	 * following :
+	 *         - DRM_I915_QUERY_PERF_CONFIG_LIST
+	 *         - DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID
+	 *         - DRM_I915_QUERY_PERF_CONFIG_FOR_UUID
 	 */
 	__u32 flags;
+#define DRM_I915_QUERY_PERF_CONFIG_LIST          1
+#define DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID 2
+#define DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_ID   3
 
 	/*
 	 * Data will be written at the location pointed by data_ptr when the
@@ -2248,6 +2257,56 @@ struct drm_i915_query_engine_info {
 	struct drm_i915_engine_info engines[];
 };
 
+/*
+ * Data written by the kernel with query DRM_I915_QUERY_PERF_CONFIG.
+ */
+struct drm_i915_query_perf_config {
+	union {
+		/*
+		 * When query_item.flags == DRM_I915_QUERY_PERF_CONFIG_LIST, i915 sets
+		 * this fields to the number of configurations available.
+		 */
+		__u64 n_configs;
+
+		/*
+		 * When query_id == DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_ID,
+		 * i915 will use the value in this field as configuration
+		 * identifier to decide what data to write into config_ptr.
+		 */
+		__u64 config;
+
+		/*
+		 * When query_id == DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID,
+		 * i915 will use the value in this field as configuration
+		 * identifier to decide what data to write into config_ptr.
+		 *
+		 * String formatted like "%08x-%04x-%04x-%04x-%012x"
+		 */
+		char uuid[36];
+	};
+
+	/*
+	 * Unused for now. Must be cleared to zero.
+	 */
+	__u32 flags;
+
+	/*
+	 * When query_item.flags == DRM_I915_QUERY_PERF_CONFIG_LIST, i915 will
+	 * write an array of __u64 of configuration identifiers.
+	 *
+	 * When query_item.flags == DRM_I915_QUERY_PERF_CONFIG_DATA, i915 will
+	 * write a struct drm_i915_perf_oa_config. If the following fields of
+	 * drm_i915_perf_oa_config are set not set to 0, i915 will write into
+	 * the associated pointers the values of submitted when the
+	 * configuration was created :
+	 *
+	 *         - n_mux_regs
+	 *         - n_boolean_regs
+	 *         - n_flex_regs
+	 */
+	__u8 data[];
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.21.0.392.gf8f6787159e

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 09/11] drm/i915/perf: allow holding preemption on filtered ctx
  2019-07-01 11:34 ` [PATCH v6 09/11] drm/i915/perf: allow holding preemption on filtered ctx Lionel Landwerlin
@ 2019-07-01 12:03   ` Chris Wilson
  2019-07-01 12:10     ` Lionel Landwerlin
  0 siblings, 1 reply; 44+ messages in thread
From: Chris Wilson @ 2019-07-01 12:03 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2019-07-01 12:34:35)
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index f92bace9caff..012d6d7f54e2 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -2104,6 +2104,14 @@ static int eb_oa_config(struct i915_execbuffer *eb)
>         if (err)
>                 return err;
>  
> +       /*
> +        * If the perf stream was opened with hold preemption, flag the
> +        * request properly so that the priority of the request is bumped once
> +        * it reaches the execlist ports.
> +        */
> +       if (eb->i915->perf.oa.exclusive_stream->hold_preemption)
> +               eb->request->flags |= I915_REQUEST_FLAGS_PERF;

Just to reassure myself that this is the behaviour you:

If the exclusive_stream is changed before the request is executed, it is
likely that we no longer notice the earlier preemption-protection. This
should not matter because the listener is no longer interested in those
events?
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 08/11] drm/i915: add a new perf configuration execbuf parameter
  2019-07-01 11:34 ` [PATCH v6 08/11] drm/i915: add a new perf configuration execbuf parameter Lionel Landwerlin
@ 2019-07-01 12:05   ` Chris Wilson
  2019-07-01 12:14     ` Lionel Landwerlin
  0 siblings, 1 reply; 44+ messages in thread
From: Chris Wilson @ 2019-07-01 12:05 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2019-07-01 12:34:34)
> +static int eb_oa_config(struct i915_execbuffer *eb)
> +{
> +       int err;
> +
> +       if (!eb->oa_config)
> +               return 0;
> +
> +       err = i915_active_request_set(&eb->engine->last_oa_config,
> +                                     eb->request);
> +       if (err)
> +               return err;
> +
> +       /*
> +        * If the config hasn't changed, skip reconfiguring the HW (this is
> +        * subject to a delay we want to avoid has much as possible).
> +        */
> +       if (eb->oa_config == eb->i915->perf.oa.exclusive_stream->oa_config)
> +               return 0;

So what's the story for resets? I presume the OA config is lost on a
device reset, and possible an engine reset? If so, then if we reset, we
lose the config and do not notice.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 04/11] drm/i915: enumerate scratch fields
  2019-07-01 11:34 ` [PATCH v6 04/11] drm/i915: enumerate scratch fields Lionel Landwerlin
@ 2019-07-01 12:07   ` Chris Wilson
  0 siblings, 0 replies; 44+ messages in thread
From: Chris Wilson @ 2019-07-01 12:07 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2019-07-01 12:34:30)
> We have a bunch of offsets in the scratch buffer. As we're about to
> add some more, let's group all of the offsets in a common location.
> 
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 09/11] drm/i915/perf: allow holding preemption on filtered ctx
  2019-07-01 12:03   ` Chris Wilson
@ 2019-07-01 12:10     ` Lionel Landwerlin
  2019-07-01 14:37       ` Chris Wilson
  0 siblings, 1 reply; 44+ messages in thread
From: Lionel Landwerlin @ 2019-07-01 12:10 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 01/07/2019 15:03, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-07-01 12:34:35)
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> index f92bace9caff..012d6d7f54e2 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> @@ -2104,6 +2104,14 @@ static int eb_oa_config(struct i915_execbuffer *eb)
>>          if (err)
>>                  return err;
>>   
>> +       /*
>> +        * If the perf stream was opened with hold preemption, flag the
>> +        * request properly so that the priority of the request is bumped once
>> +        * it reaches the execlist ports.
>> +        */
>> +       if (eb->i915->perf.oa.exclusive_stream->hold_preemption)
>> +               eb->request->flags |= I915_REQUEST_FLAGS_PERF;
> Just to reassure myself that this is the behaviour you:
>
> If the exclusive_stream is changed before the request is executed, it is
> likely that we no longer notice the earlier preemption-protection. This
> should not matter because the listener is no longer interested in those
> events?
> -Chris
>

Yeah, dropping the perf stream before your queries complete and you're 
in undefined behavior territory.


-Lionel

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 08/11] drm/i915: add a new perf configuration execbuf parameter
  2019-07-01 12:05   ` Chris Wilson
@ 2019-07-01 12:14     ` Lionel Landwerlin
  0 siblings, 0 replies; 44+ messages in thread
From: Lionel Landwerlin @ 2019-07-01 12:14 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 01/07/2019 15:05, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-07-01 12:34:34)
>> +static int eb_oa_config(struct i915_execbuffer *eb)
>> +{
>> +       int err;
>> +
>> +       if (!eb->oa_config)
>> +               return 0;
>> +
>> +       err = i915_active_request_set(&eb->engine->last_oa_config,
>> +                                     eb->request);
>> +       if (err)
>> +               return err;
>> +
>> +       /*
>> +        * If the config hasn't changed, skip reconfiguring the HW (this is
>> +        * subject to a delay we want to avoid has much as possible).
>> +        */
>> +       if (eb->oa_config == eb->i915->perf.oa.exclusive_stream->oa_config)
>> +               return 0;
> So what's the story for resets? I presume the OA config is lost on a
> device reset, and possible an engine reset? If so, then if we reset, we
> lose the config and do not notice.
> -Chris
>
The current story (before those patches) for resets and OA is already 
pretty undefined.

I haven't actually gone to look at all the OA register see what they 
reset value would be.


At the moment we should consider a reset to make your results invalid.

I'll try to dig a bit on this.


-Lionel

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 05/11] drm/i915/perf: implement active wait for noa configurations
  2019-07-01 11:34 ` [PATCH v6 05/11] drm/i915/perf: implement active wait for noa configurations Lionel Landwerlin
@ 2019-07-01 12:43   ` Chris Wilson
  2019-07-01 13:10     ` Lionel Landwerlin
  0 siblings, 1 reply; 44+ messages in thread
From: Chris Wilson @ 2019-07-01 12:43 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2019-07-01 12:34:31)
> NOA configuration take some amount of time to apply. That amount of
> time depends on the size of the GT. There is no documented time for
> this. For example, past experimentations with powergating
> configuration changes seem to indicate a 60~70us delay. We go with
> 500us as default for now which should be over the required amount of
> time (according to HW architects).
> 
> v2: Don't forget to save/restore registers used for the wait (Chris)
> 
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  24 ++
>  drivers/gpu/drm/i915/gt/intel_gt_types.h     |   5 +
>  drivers/gpu/drm/i915/i915_debugfs.c          |  25 +++
>  drivers/gpu/drm/i915/i915_drv.h              |   8 +
>  drivers/gpu/drm/i915/i915_perf.c             | 225 ++++++++++++++++++-
>  drivers/gpu/drm/i915/i915_reg.h              |   4 +-
>  6 files changed, 288 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> index e7eff9db343e..4a66af38c87b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> @@ -151,6 +151,7 @@
>  #define   MI_BATCH_GTT             (2<<6) /* aliased with (1<<7) on gen4 */
>  #define MI_BATCH_BUFFER_START_GEN8     MI_INSTR(0x31, 1)
>  #define   MI_BATCH_RESOURCE_STREAMER (1<<10)
> +#define   MI_BATCH_PREDICATE         (1 << 15) /* HSW+ on RCS only*/
>  
>  /*
>   * 3D instructions used by the kernel
> @@ -226,6 +227,29 @@
>  #define   PIPE_CONTROL_DEPTH_CACHE_FLUSH               (1<<0)
>  #define   PIPE_CONTROL_GLOBAL_GTT (1<<2) /* in addr dword */
>  
> +#define MI_MATH(x) MI_INSTR(0x1a, (x)-1)
> +#define   MI_ALU_OP(op, src1, src2) (((op) << 20) | ((src1) << 10) | (src2))
> +/* operands */
> +#define   MI_ALU_OP_NOOP     0
> +#define   MI_ALU_OP_LOAD     128
> +#define   MI_ALU_OP_LOADINV  1152
> +#define   MI_ALU_OP_LOAD0    129
> +#define   MI_ALU_OP_LOAD1    1153
> +#define   MI_ALU_OP_ADD      256
> +#define   MI_ALU_OP_SUB      257
> +#define   MI_ALU_OP_AND      258
> +#define   MI_ALU_OP_OR       259
> +#define   MI_ALU_OP_XOR      260
> +#define   MI_ALU_OP_STORE    384
> +#define   MI_ALU_OP_STOREINV 1408
> +/* sources */
> +#define   MI_ALU_SRC_REG(x)  (x) /* 0 -> 15 */
> +#define   MI_ALU_SRC_SRCA    32
> +#define   MI_ALU_SRC_SRCB    33
> +#define   MI_ALU_SRC_ACCU    49
> +#define   MI_ALU_SRC_ZF      50
> +#define   MI_ALU_SRC_CF      51
> +
>  /*
>   * Commands used only by the command parser
>   */
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> index e625a5e320d3..0750ac49a05b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> @@ -70,6 +70,11 @@ enum intel_gt_scratch_field {
>         /* 8 bytes */
>         INTEL_GT_SCRATCH_FIELD_COHERENTL3_WA = 256,
>  
> +       /* 6 * 8 bytes */
> +       INTEL_GT_SCRATCH_FIELD_PERF_CS_GPR = 2048,
> +
> +       /* 4 bytes */
> +       INTEL_GT_SCRATCH_FIELD_PERF_PREDICATE_RESULT_1 = 2096,
>  };
>  
>  #endif /* __INTEL_GT_TYPES_H__ */
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index eeecdad0e3ca..6b49fda145e7 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -3646,6 +3646,30 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_wedged_fops,
>                         i915_wedged_get, i915_wedged_set,
>                         "%llu\n");
>  
> +static int
> +i915_perf_noa_delay_set(void *data, u64 val)
> +{
> +       struct drm_i915_private *i915 = data;
> +
> +       atomic64_set(&i915->perf.oa.noa_programming_delay, val);
> +       return 0;
> +}
> +
> +static int
> +i915_perf_noa_delay_get(void *data, u64 *val)
> +{
> +       struct drm_i915_private *i915 = data;
> +
> +       *val = atomic64_read(&i915->perf.oa.noa_programming_delay);
> +       return 0;
> +}
> +
> +DEFINE_SIMPLE_ATTRIBUTE(i915_perf_noa_delay_fops,
> +                       i915_perf_noa_delay_get,
> +                       i915_perf_noa_delay_set,
> +                       "%llu\n");
> +
> +
>  #define DROP_UNBOUND   BIT(0)
>  #define DROP_BOUND     BIT(1)
>  #define DROP_RETIRE    BIT(2)
> @@ -4411,6 +4435,7 @@ static const struct i915_debugfs_files {
>         const char *name;
>         const struct file_operations *fops;
>  } i915_debugfs_files[] = {
> +       {"i915_perf_noa_delay", &i915_perf_noa_delay_fops},
>         {"i915_wedged", &i915_wedged_fops},
>         {"i915_cache_sharing", &i915_cache_sharing_fops},
>         {"i915_gem_drop_caches", &i915_drop_caches_fops},
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index df39b2ee6bd9..fe93a260bd28 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1837,6 +1837,14 @@ struct drm_i915_private {
>  
>                         struct i915_oa_ops ops;
>                         const struct i915_oa_format *oa_formats;
> +
> +                       /**
> +                        * A batch buffer doing a wait on the GPU for the NOA
> +                        * logic to be reprogrammed.
> +                        */
> +                       struct i915_vma *noa_wait;
> +
> +                       atomic64_t noa_programming_delay;
>                 } oa;
>         } perf;
>  
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 5ba771468078..03e6908282e3 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -197,6 +197,7 @@
>  
>  #include "gem/i915_gem_context.h"
>  #include "gem/i915_gem_pm.h"
> +#include "gt/intel_gt.h"
>  #include "gt/intel_lrc_reg.h"
>  
>  #include "i915_drv.h"
> @@ -429,7 +430,7 @@ static int alloc_oa_config_buffer(struct drm_i915_private *i915,
>                                               MI_LOAD_REGISTER_IMM_MAX_REGS) * 4;
>                 config_length += oa_config->flex_regs_len * 8;
>         }
> -       config_length += 4; /* MI_BATCH_BUFFER_END */
> +       config_length += 12; /* MI_BATCH_BUFFER_START into noa_wait loop */
>         config_length = ALIGN(config_length, I915_GTT_PAGE_SIZE);
>  
>         bo = i915_gem_object_create_shmem(i915, config_length);
> @@ -446,7 +447,12 @@ static int alloc_oa_config_buffer(struct drm_i915_private *i915,
>         cs = write_cs_mi_lri(cs, oa_config->b_counter_regs, oa_config->b_counter_regs_len);
>         cs = write_cs_mi_lri(cs, oa_config->flex_regs, oa_config->flex_regs_len);
>  
> -       *cs++ = MI_BATCH_BUFFER_END;
> +
> +       /* Jump into the NOA wait busy loop. */
> +       *cs++ = (INTEL_GEN(i915) < 8 ?
> +                MI_BATCH_BUFFER_START : MI_BATCH_BUFFER_START_GEN8);
> +       *cs++ = i915_ggtt_offset(i915->perf.oa.noa_wait);
> +       *cs++ = 0;
>  
>         i915_gem_object_flush_map(bo);
>         i915_gem_object_unpin_map(bo);
> @@ -1467,6 +1473,7 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
>         mutex_lock(&dev_priv->drm.struct_mutex);
>         dev_priv->perf.oa.exclusive_stream = NULL;
>         dev_priv->perf.oa.ops.disable_metric_set(dev_priv);
> +       i915_vma_unpin_and_release(&dev_priv->perf.oa.noa_wait, 0);
>         mutex_unlock(&dev_priv->drm.struct_mutex);
>  
>         free_oa_buffer(dev_priv);
> @@ -1653,6 +1660,204 @@ static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
>         return ret;
>  }
>  
> +static u32 *save_register(struct drm_i915_private *i915, u32 *cs,
> +                         i915_reg_t reg, u32 offset, u32 dword_count)
> +{
> +       uint32_t d;
> +
> +       for (d = 0; d < dword_count; d++) {
> +               *cs++ = INTEL_GEN(i915) >= 8 ?
> +                       MI_STORE_REGISTER_MEM_GEN8 : MI_STORE_REGISTER_MEM;
> +               *cs++ = i915_mmio_reg_offset(reg) + 4 * d;
> +               *cs++ = intel_gt_scratch_offset(&i915->gt, offset) + 4 * d;
> +               if (INTEL_GEN(i915) >= 8)
> +                       *cs++ = 0;

Will anyone care about the extra nop on hsw? :)

> +static int alloc_noa_wait(struct drm_i915_private *i915)
> +{
> +       struct drm_i915_gem_object *bo;
> +       struct i915_vma *vma;
> +       u64 delay_ns = atomic64_read(&i915->perf.oa.noa_programming_delay), delay_ticks;
> +       u32 *batch, *ts0, *cs, *jump;
> +       int ret, i;
> +
> +       bo = i915_gem_object_create_shmem(i915, 4096);

So swappable. At 4k, we almost consume as much in our bookkeeping as we
allocate for the backing store. Yes, it's atrocious.

Hang on. This can never be unpinned as it stores absolute addresses. So
this can be i915_gem_object_create_internal().

> +       if (IS_ERR(bo)) {
> +               DRM_ERROR("Failed to allocate NOA wait batchbuffer\n");
> +               return PTR_ERR(bo);
> +       }
> +
> +       /*
> +        * We pin in GGTT because we jump into this buffer now because
> +        * multiple OA config BOs will have a jump to this address and it
> +        * needs to be fixed during the lifetime of the i915/perf stream.
> +        */
> +       vma = i915_gem_object_ggtt_pin(bo, NULL, 0, 4096, 0);
> +       if (IS_ERR(vma)) {
> +               ret = PTR_ERR(vma);
> +               goto err_unref;
> +       }
> +
> +       batch = cs = i915_gem_object_pin_map(bo, I915_MAP_WB);
> +       if (IS_ERR(batch)) {
> +               ret = PTR_ERR(batch);
> +               goto err_unpin;
> +       }
> +
> +       /* Save registers. */
> +       for (i = 0; i <= 5; i++) {
> +               cs = save_register(i915, cs, HSW_CS_GPR(i),
> +                                  INTEL_GT_SCRATCH_FIELD_PERF_CS_GPR + 8 * i, 2);
> +       }
> +       cs = save_register(i915, cs, MI_PREDICATE_RESULT_1,
> +                          INTEL_GT_SCRATCH_FIELD_PERF_PREDICATE_RESULT_1, 1);
> +
> +       /* First timestamp snapshot location. */
> +       ts0 = cs;

Did this ever get used?

> +       /*
> +        * Initial snapshot of the timestamp register to implement the wait.
> +        * We work with 32b values, so clear out the top 32b bits of the
> +        * register because the ALU works 64bits.
> +        */
> +       *cs++ = MI_LOAD_REGISTER_IMM(1);
> +       *cs++ = i915_mmio_reg_offset(HSW_CS_GPR(0)) + 4;
> +       *cs++ = 0;
> +       *cs++ = MI_LOAD_REGISTER_REG | (3 - 2);
> +       *cs++ = i915_mmio_reg_offset(RING_TIMESTAMP(RENDER_RING_BASE));
> +       *cs++ = i915_mmio_reg_offset(HSW_CS_GPR(0));
> +
> +       /*
> +        * This is the location we're going to jump back into until the
> +        * required amount of time has passed.
> +        */
> +       jump = cs;
> +
> +       /*
> +        * Take another snapshot of the timestamp register. Take care to clear
> +        * up the top 32bits of CS_GPR(1) as we're using it for other
> +        * operations below.
> +        */
> +       *cs++ = MI_LOAD_REGISTER_IMM(1);
> +       *cs++ = i915_mmio_reg_offset(HSW_CS_GPR(1)) + 4;
> +       *cs++ = 0;
> +       *cs++ = MI_LOAD_REGISTER_REG | (3 - 2);
> +       *cs++ = i915_mmio_reg_offset(RING_TIMESTAMP(RENDER_RING_BASE));
> +       *cs++ = i915_mmio_reg_offset(HSW_CS_GPR(1));

cs = get_timestamp(cs, 1);

enum { START, NOW, DELTA, RESULT, TARGET } ?

That would also help with save/restore registers, as all CS_GPR should
then be named.

> +       /*
> +        * Do a diff between the 2 timestamps and store the result back into
> +        * CS_GPR(1).
> +        */
> +       *cs++ = MI_MATH(5);
> +       *cs++ = MI_ALU_OP(MI_ALU_OP_LOAD, MI_ALU_SRC_SRCA, MI_ALU_SRC_REG(1));
> +       *cs++ = MI_ALU_OP(MI_ALU_OP_LOAD, MI_ALU_SRC_SRCB, MI_ALU_SRC_REG(0));
> +       *cs++ = MI_ALU_OP(MI_ALU_OP_SUB, 0, 0);
> +       *cs++ = MI_ALU_OP(MI_ALU_OP_STORE, MI_ALU_SRC_REG(2), MI_ALU_SRC_ACCU);
> +       *cs++ = MI_ALU_OP(MI_ALU_OP_STORE, MI_ALU_SRC_REG(3), MI_ALU_SRC_CF);
> +
> +       /*
> +        * Transfer the carry flag (set to 1 if ts1 < ts0, meaning the
> +        * timestamp have rolled over the 32bits) into the predicate register
> +        * to be used for the predicated jump.
> +        */
> +       *cs++ = MI_LOAD_REGISTER_REG | (3 - 2);
> +       *cs++ = i915_mmio_reg_offset(HSW_CS_GPR(3));
> +       *cs++ = i915_mmio_reg_offset(MI_PREDICATE_RESULT_1);
> +
> +       /* Restart from the beginning if we had timestamps roll over. */
> +       *cs++ = (INTEL_GEN(i915) < 8 ?
> +                MI_BATCH_BUFFER_START : MI_BATCH_BUFFER_START_GEN8) |
> +               MI_BATCH_PREDICATE;
> +       *cs++ = vma->node.start;
> +       *cs++ = 0;
> +
> +       /*
> +        * Now add the diff between to previous timestamps and add it to :
> +        *      (((1 * << 64) - 1) - delay_ns)
> +        *
> +        * When the Carry Flag contains 1 this means the elapsed time is
> +        * longer than the expected delay, and we can exit the wait loop.
> +        */
> +       delay_ticks = 0xffffffffffffffff -
> +               DIV64_U64_ROUND_UP(delay_ns *
> +                                  RUNTIME_INFO(i915)->cs_timestamp_frequency_khz,
> +                                  1000000ull);
> +       *cs++ = MI_LOAD_REGISTER_IMM(2);
> +       *cs++ = i915_mmio_reg_offset(HSW_CS_GPR(4));
> +       *cs++ = lower_32_bits(delay_ticks);
> +       *cs++ = i915_mmio_reg_offset(HSW_CS_GPR(4)) + 4;
> +       *cs++ = upper_32_bits(delay_ticks);

Now, I was expecting to compute the 32b end timestamp and compare now to
that using the carry-flag to indicate the completion.

Why the detour? (I'm sure I am missing something here.)

Quick question request for >32b delays are rejected in the user debug api?

> +       *cs++ = MI_MATH(4);
> +       *cs++ = MI_ALU_OP(MI_ALU_OP_LOAD, MI_ALU_SRC_SRCA, MI_ALU_SRC_REG(2));
> +       *cs++ = MI_ALU_OP(MI_ALU_OP_LOAD, MI_ALU_SRC_SRCB, MI_ALU_SRC_REG(4));
> +       *cs++ = MI_ALU_OP(MI_ALU_OP_ADD, 0, 0);
> +       *cs++ = MI_ALU_OP(MI_ALU_OP_STOREINV, MI_ALU_SRC_REG(5), MI_ALU_SRC_CF);

Comparing delta against target delay. Inverted store to give 
	(delay - target) >= 0

> +       /*
> +        * Transfer the result into the predicate register to be used for the
> +        * predicated jump.
> +        */
> +       *cs++ = MI_LOAD_REGISTER_REG | (3 - 2);
> +       *cs++ = i915_mmio_reg_offset(HSW_CS_GPR(5));
> +       *cs++ = i915_mmio_reg_offset(MI_PREDICATE_RESULT_1);
> +
> +       /* Predicate the jump.  */
> +       *cs++ = (INTEL_GEN(i915) < 8 ?
> +                MI_BATCH_BUFFER_START : MI_BATCH_BUFFER_START_GEN8) |
> +               MI_BATCH_PREDICATE;

The joy of being on rcs.

> +       *cs++ = vma->node.start + (jump - batch) * 4;

Mutters i915_ggtt_offset(vma)

> +       *cs++ = 0;
> +
> +       /* Restore registers. */
> +       for (i = 0; i <= 5; i++) {
> +               cs = restore_register(i915, cs, HSW_CS_GPR(i),
> +                                     INTEL_GT_SCRATCH_FIELD_PERF_CS_GPR + 8 * i, 2);
> +       }
> +       cs = restore_register(i915, cs, MI_PREDICATE_RESULT_1,
> +                             INTEL_GT_SCRATCH_FIELD_PERF_PREDICATE_RESULT_1, 1);
> +
> +       /* And return to the ring. */
> +       *cs++ = MI_BATCH_BUFFER_END;

GEM_BUG_ON(cs - batch > PAGE_SIZE/sizeof(*batch));

> +
> +       i915_gem_object_flush_map(bo);
> +       i915_gem_object_unpin_map(bo);
> +
> +       i915->perf.oa.noa_wait = vma;
> +
> +       return 0;
> +
> +err_unpin:
> +       __i915_vma_unpin(vma);
> +
> +err_unref:
> +       i915_gem_object_put(bo);
> +
> +       return ret;
> +}

Certainly seems to do what you say on the tin.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 02/11] drm/i915/perf: introduce a versioning of the i915-perf uapi
  2019-07-01 11:34 ` [PATCH v6 02/11] drm/i915/perf: introduce a versioning of the i915-perf uapi Lionel Landwerlin
@ 2019-07-01 12:45   ` Chris Wilson
  0 siblings, 0 replies; 44+ messages in thread
From: Chris Wilson @ 2019-07-01 12:45 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2019-07-01 12:34:28)
> Reporting this version will help application figure out what level of
> the support the running kernel provides.
> 
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.c |  3 +++
>  include/uapi/drm/i915_drm.h     | 21 +++++++++++++++++++++
>  2 files changed, 24 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 794c6814a6d0..fa02e8f033d7 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -483,6 +483,9 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data,
>         case I915_PARAM_MMAP_GTT_COHERENT:
>                 value = INTEL_INFO(dev_priv)->has_coherent_ggtt;
>                 break;
> +       case I915_PARAM_PERF_REVISION:
> +               value = 1;

I would suggest making i915_perf_ioctl_version() and putting the value
1 there so you can document changes within i915_perf.c
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 03/11] drm/i915/perf: allow for CS OA configs to be created lazily
  2019-07-01 11:34 ` [PATCH v6 03/11] drm/i915/perf: allow for CS OA configs to be created lazily Lionel Landwerlin
@ 2019-07-01 13:06   ` Chris Wilson
  2019-07-01 13:45     ` Lionel Landwerlin
  2019-07-01 15:09   ` Chris Wilson
  2019-07-09  8:30   ` Chris Wilson
  2 siblings, 1 reply; 44+ messages in thread
From: Chris Wilson @ 2019-07-01 13:06 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2019-07-01 12:34:29)
> @@ -2535,9 +2635,21 @@ static int i915_perf_release(struct inode *inode, struct file *file)
>  {
>         struct i915_perf_stream *stream = file->private_data;
>         struct drm_i915_private *dev_priv = stream->dev_priv;
> +       struct i915_oa_config *oa_config, *next;
>  
>         mutex_lock(&dev_priv->perf.lock);
> +
>         i915_perf_destroy_locked(stream);
> +
> +       /* Dispose of all oa config batch buffers. */
> +       mutex_lock(&dev_priv->perf.metrics_lock);
> +       list_for_each_entry_safe(oa_config, next, &dev_priv->perf.metrics_buffers, vma_link) {
> +               list_del(&oa_config->vma_link);
> +               i915_gem_object_put(oa_config->obj);
> +               oa_config->obj = NULL;
> +       }
> +       mutex_unlock(&dev_priv->perf.metrics_lock);

What's the reference chain from the i915_perf fd to the i915_device?
What's even keeping the module alive!

Shouldn't be a drm_dev_get() in i915_perf_open_ioctl() and a
drm_dev_put() here?

So there may be more than one stream, sharing the same oa_config. If a
stream closes, you let all the current streams keep their reference and
the next gets a new object. Looks like there's some scope for
duplication, but looks safe enough. My main worry was for zombie
oa_config.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Vulkan performance query support (rev6)
  2019-07-01 11:34 [PATCH v6 00/11] drm/i915: Vulkan performance query support Lionel Landwerlin
                   ` (10 preceding siblings ...)
  2019-07-01 11:34 ` [PATCH v6 11/11] drm/i915: add support for perf configuration queries Lionel Landwerlin
@ 2019-07-01 13:08 ` Patchwork
  2019-07-01 13:14 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: Patchwork @ 2019-07-01 13:08 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Vulkan performance query support (rev6)
URL   : https://patchwork.freedesktop.org/series/60916/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
063d9e6a602f drm/i915/perf: add missing delay for OA muxes configuration
-:7: WARNING:COMMIT_MESSAGE: Missing commit description - Add an appropriate one

total: 0 errors, 1 warnings, 0 checks, 29 lines checked
5d6fb48512bb drm/i915/perf: introduce a versioning of the i915-perf uapi
61476a43f18a drm/i915/perf: allow for CS OA configs to be created lazily
-:138: CHECK:SPACING: No space is necessary after a cast
#138: FILE: drivers/gpu/drm/i915/i915_perf.c:399:
+					(u32) MI_LOAD_REGISTER_IMM_MAX_REGS);

total: 0 errors, 0 warnings, 1 checks, 361 lines checked
d84a895e334c drm/i915: enumerate scratch fields
-:25: CHECK:BRACES: Blank lines aren't necessary after an open brace '{'
#25: FILE: drivers/gpu/drm/i915/gt/intel_gt.h:30:
 {
+

-:89: CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
#89: FILE: drivers/gpu/drm/i915/gt/intel_lrc.c:1809:
+	batch = gen8_emit_pipe_control(

total: 0 errors, 0 warnings, 2 checks, 171 lines checked
9762d251a468 drm/i915/perf: implement active wait for noa configurations
-:33: CHECK:SPACING: spaces preferred around that '-' (ctx:VxV)
#33: FILE: drivers/gpu/drm/i915/gt/intel_gpu_commands.h:230:
+#define MI_MATH(x) MI_INSTR(0x1a, (x)-1)
                                      ^

-:106: CHECK:LINE_SPACING: Please don't use multiple blank lines
#106: FILE: drivers/gpu/drm/i915/i915_debugfs.c:3672:
+
+

-:163: CHECK:LINE_SPACING: Please don't use multiple blank lines
#163: FILE: drivers/gpu/drm/i915/i915_perf.c:450:
 
+

-:187: CHECK:PREFER_KERNEL_TYPES: Prefer kernel type 'u32' over 'uint32_t'
#187: FILE: drivers/gpu/drm/i915/i915_perf.c:1666:
+	uint32_t d;

-:204: CHECK:PREFER_KERNEL_TYPES: Prefer kernel type 'u32' over 'uint32_t'
#204: FILE: drivers/gpu/drm/i915/i915_perf.c:1683:
+	uint32_t d;

-:243: CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#243: FILE: drivers/gpu/drm/i915/i915_perf.c:1722:
+	batch = cs = i915_gem_object_pin_map(bo, I915_MAP_WB);

total: 0 errors, 0 warnings, 6 checks, 381 lines checked
35af21623514 drm/i915: introduce a mechanism to extend execbuf2
-:128: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#128: FILE: include/uapi/drm/i915_drm.h:1170:
+#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_EXT<<1))
                                                   ^

total: 0 errors, 0 warnings, 1 checks, 105 lines checked
d0bdcdd65e5d drm/i915: add syncobj timeline support
-:347: WARNING:TYPO_SPELLING: 'transfered' may be misspelled - perhaps 'transferred'?
#347: FILE: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:2501:
+			 * The chain's ownership is transfered to the

-:378: ERROR:CODE_INDENT: code indent should use tabs where possible
#378: FILE: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:2532:
+        [DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES] = parse_timeline_fences,$

-:378: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#378: FILE: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:2532:
+        [DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES] = parse_timeline_fences,$

total: 1 errors, 2 warnings, 0 checks, 521 lines checked
243b2d67cc0f drm/i915: add a new perf configuration execbuf parameter
-:48: CHECK:LINE_SPACING: Please don't use multiple blank lines
#48: FILE: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:1218:
 
+

-:140: ERROR:CODE_INDENT: code indent should use tabs where possible
#140: FILE: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:2607:
+        [DRM_I915_GEM_EXECBUFFER_EXT_PERF] = parse_perf_config,$

-:140: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#140: FILE: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:2607:
+        [DRM_I915_GEM_EXECBUFFER_EXT_PERF] = parse_perf_config,$

total: 1 errors, 1 warnings, 1 checks, 405 lines checked
aa1c4a084e04 drm/i915/perf: allow holding preemption on filtered ctx
-:154: WARNING:BRACES: braces {} are not necessary for single statement blocks
#154: FILE: drivers/gpu/drm/i915/i915_perf.c:2968:
+	if (IS_HASWELL(dev_priv) && specific_ctx && !props->hold_preemption) {
 		privileged_op = false;
+	}

total: 0 errors, 1 warnings, 0 checks, 220 lines checked
ee7f0b256c01 drm/i915/perf: execute OA configuration from command stream
e4ee8661e287 drm/i915: add support for perf configuration queries

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 05/11] drm/i915/perf: implement active wait for noa configurations
  2019-07-01 12:43   ` Chris Wilson
@ 2019-07-01 13:10     ` Lionel Landwerlin
  0 siblings, 0 replies; 44+ messages in thread
From: Lionel Landwerlin @ 2019-07-01 13:10 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 01/07/2019 15:43, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-07-01 12:34:31)
>> NOA configuration take some amount of time to apply. That amount of
>> time depends on the size of the GT. There is no documented time for
>> this. For example, past experimentations with powergating
>> configuration changes seem to indicate a 60~70us delay. We go with
>> 500us as default for now which should be over the required amount of
>> time (according to HW architects).
>>
>> v2: Don't forget to save/restore registers used for the wait (Chris)
>>
>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  24 ++
>>   drivers/gpu/drm/i915/gt/intel_gt_types.h     |   5 +
>>   drivers/gpu/drm/i915/i915_debugfs.c          |  25 +++
>>   drivers/gpu/drm/i915/i915_drv.h              |   8 +
>>   drivers/gpu/drm/i915/i915_perf.c             | 225 ++++++++++++++++++-
>>   drivers/gpu/drm/i915/i915_reg.h              |   4 +-
>>   6 files changed, 288 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
>> index e7eff9db343e..4a66af38c87b 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
>> @@ -151,6 +151,7 @@
>>   #define   MI_BATCH_GTT             (2<<6) /* aliased with (1<<7) on gen4 */
>>   #define MI_BATCH_BUFFER_START_GEN8     MI_INSTR(0x31, 1)
>>   #define   MI_BATCH_RESOURCE_STREAMER (1<<10)
>> +#define   MI_BATCH_PREDICATE         (1 << 15) /* HSW+ on RCS only*/
>>   
>>   /*
>>    * 3D instructions used by the kernel
>> @@ -226,6 +227,29 @@
>>   #define   PIPE_CONTROL_DEPTH_CACHE_FLUSH               (1<<0)
>>   #define   PIPE_CONTROL_GLOBAL_GTT (1<<2) /* in addr dword */
>>   
>> +#define MI_MATH(x) MI_INSTR(0x1a, (x)-1)
>> +#define   MI_ALU_OP(op, src1, src2) (((op) << 20) | ((src1) << 10) | (src2))
>> +/* operands */
>> +#define   MI_ALU_OP_NOOP     0
>> +#define   MI_ALU_OP_LOAD     128
>> +#define   MI_ALU_OP_LOADINV  1152
>> +#define   MI_ALU_OP_LOAD0    129
>> +#define   MI_ALU_OP_LOAD1    1153
>> +#define   MI_ALU_OP_ADD      256
>> +#define   MI_ALU_OP_SUB      257
>> +#define   MI_ALU_OP_AND      258
>> +#define   MI_ALU_OP_OR       259
>> +#define   MI_ALU_OP_XOR      260
>> +#define   MI_ALU_OP_STORE    384
>> +#define   MI_ALU_OP_STOREINV 1408
>> +/* sources */
>> +#define   MI_ALU_SRC_REG(x)  (x) /* 0 -> 15 */
>> +#define   MI_ALU_SRC_SRCA    32
>> +#define   MI_ALU_SRC_SRCB    33
>> +#define   MI_ALU_SRC_ACCU    49
>> +#define   MI_ALU_SRC_ZF      50
>> +#define   MI_ALU_SRC_CF      51
>> +
>>   /*
>>    * Commands used only by the command parser
>>    */
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
>> index e625a5e320d3..0750ac49a05b 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
>> @@ -70,6 +70,11 @@ enum intel_gt_scratch_field {
>>          /* 8 bytes */
>>          INTEL_GT_SCRATCH_FIELD_COHERENTL3_WA = 256,
>>   
>> +       /* 6 * 8 bytes */
>> +       INTEL_GT_SCRATCH_FIELD_PERF_CS_GPR = 2048,
>> +
>> +       /* 4 bytes */
>> +       INTEL_GT_SCRATCH_FIELD_PERF_PREDICATE_RESULT_1 = 2096,
>>   };
>>   
>>   #endif /* __INTEL_GT_TYPES_H__ */
>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>> index eeecdad0e3ca..6b49fda145e7 100644
>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>> @@ -3646,6 +3646,30 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_wedged_fops,
>>                          i915_wedged_get, i915_wedged_set,
>>                          "%llu\n");
>>   
>> +static int
>> +i915_perf_noa_delay_set(void *data, u64 val)
>> +{
>> +       struct drm_i915_private *i915 = data;
>> +
>> +       atomic64_set(&i915->perf.oa.noa_programming_delay, val);
>> +       return 0;
>> +}
>> +
>> +static int
>> +i915_perf_noa_delay_get(void *data, u64 *val)
>> +{
>> +       struct drm_i915_private *i915 = data;
>> +
>> +       *val = atomic64_read(&i915->perf.oa.noa_programming_delay);
>> +       return 0;
>> +}
>> +
>> +DEFINE_SIMPLE_ATTRIBUTE(i915_perf_noa_delay_fops,
>> +                       i915_perf_noa_delay_get,
>> +                       i915_perf_noa_delay_set,
>> +                       "%llu\n");
>> +
>> +
>>   #define DROP_UNBOUND   BIT(0)
>>   #define DROP_BOUND     BIT(1)
>>   #define DROP_RETIRE    BIT(2)
>> @@ -4411,6 +4435,7 @@ static const struct i915_debugfs_files {
>>          const char *name;
>>          const struct file_operations *fops;
>>   } i915_debugfs_files[] = {
>> +       {"i915_perf_noa_delay", &i915_perf_noa_delay_fops},
>>          {"i915_wedged", &i915_wedged_fops},
>>          {"i915_cache_sharing", &i915_cache_sharing_fops},
>>          {"i915_gem_drop_caches", &i915_drop_caches_fops},
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index df39b2ee6bd9..fe93a260bd28 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -1837,6 +1837,14 @@ struct drm_i915_private {
>>   
>>                          struct i915_oa_ops ops;
>>                          const struct i915_oa_format *oa_formats;
>> +
>> +                       /**
>> +                        * A batch buffer doing a wait on the GPU for the NOA
>> +                        * logic to be reprogrammed.
>> +                        */
>> +                       struct i915_vma *noa_wait;
>> +
>> +                       atomic64_t noa_programming_delay;
>>                  } oa;
>>          } perf;
>>   
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index 5ba771468078..03e6908282e3 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -197,6 +197,7 @@
>>   
>>   #include "gem/i915_gem_context.h"
>>   #include "gem/i915_gem_pm.h"
>> +#include "gt/intel_gt.h"
>>   #include "gt/intel_lrc_reg.h"
>>   
>>   #include "i915_drv.h"
>> @@ -429,7 +430,7 @@ static int alloc_oa_config_buffer(struct drm_i915_private *i915,
>>                                                MI_LOAD_REGISTER_IMM_MAX_REGS) * 4;
>>                  config_length += oa_config->flex_regs_len * 8;
>>          }
>> -       config_length += 4; /* MI_BATCH_BUFFER_END */
>> +       config_length += 12; /* MI_BATCH_BUFFER_START into noa_wait loop */
>>          config_length = ALIGN(config_length, I915_GTT_PAGE_SIZE);
>>   
>>          bo = i915_gem_object_create_shmem(i915, config_length);
>> @@ -446,7 +447,12 @@ static int alloc_oa_config_buffer(struct drm_i915_private *i915,
>>          cs = write_cs_mi_lri(cs, oa_config->b_counter_regs, oa_config->b_counter_regs_len);
>>          cs = write_cs_mi_lri(cs, oa_config->flex_regs, oa_config->flex_regs_len);
>>   
>> -       *cs++ = MI_BATCH_BUFFER_END;
>> +
>> +       /* Jump into the NOA wait busy loop. */
>> +       *cs++ = (INTEL_GEN(i915) < 8 ?
>> +                MI_BATCH_BUFFER_START : MI_BATCH_BUFFER_START_GEN8);
>> +       *cs++ = i915_ggtt_offset(i915->perf.oa.noa_wait);
>> +       *cs++ = 0;
>>   
>>          i915_gem_object_flush_map(bo);
>>          i915_gem_object_unpin_map(bo);
>> @@ -1467,6 +1473,7 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
>>          mutex_lock(&dev_priv->drm.struct_mutex);
>>          dev_priv->perf.oa.exclusive_stream = NULL;
>>          dev_priv->perf.oa.ops.disable_metric_set(dev_priv);
>> +       i915_vma_unpin_and_release(&dev_priv->perf.oa.noa_wait, 0);
>>          mutex_unlock(&dev_priv->drm.struct_mutex);
>>   
>>          free_oa_buffer(dev_priv);
>> @@ -1653,6 +1660,204 @@ static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
>>          return ret;
>>   }
>>   
>> +static u32 *save_register(struct drm_i915_private *i915, u32 *cs,
>> +                         i915_reg_t reg, u32 offset, u32 dword_count)
>> +{
>> +       uint32_t d;
>> +
>> +       for (d = 0; d < dword_count; d++) {
>> +               *cs++ = INTEL_GEN(i915) >= 8 ?
>> +                       MI_STORE_REGISTER_MEM_GEN8 : MI_STORE_REGISTER_MEM;
>> +               *cs++ = i915_mmio_reg_offset(reg) + 4 * d;
>> +               *cs++ = intel_gt_scratch_offset(&i915->gt, offset) + 4 * d;
>> +               if (INTEL_GEN(i915) >= 8)
>> +                       *cs++ = 0;
> Will anyone care about the extra nop on hsw? :)


Alright :)

I didn't noticed I left a NOOP below too.


>
>> +static int alloc_noa_wait(struct drm_i915_private *i915)
>> +{
>> +       struct drm_i915_gem_object *bo;
>> +       struct i915_vma *vma;
>> +       u64 delay_ns = atomic64_read(&i915->perf.oa.noa_programming_delay), delay_ticks;
>> +       u32 *batch, *ts0, *cs, *jump;
>> +       int ret, i;
>> +
>> +       bo = i915_gem_object_create_shmem(i915, 4096);
> So swappable. At 4k, we almost consume as much in our bookkeeping as we
> allocate for the backing store. Yes, it's atrocious.
>
> Hang on. This can never be unpinned as it stores absolute addresses. So
> this can be i915_gem_object_create_internal().
>
>> +       if (IS_ERR(bo)) {
>> +               DRM_ERROR("Failed to allocate NOA wait batchbuffer\n");
>> +               return PTR_ERR(bo);
>> +       }
>> +
>> +       /*
>> +        * We pin in GGTT because we jump into this buffer now because
>> +        * multiple OA config BOs will have a jump to this address and it
>> +        * needs to be fixed during the lifetime of the i915/perf stream.
>> +        */
>> +       vma = i915_gem_object_ggtt_pin(bo, NULL, 0, 4096, 0);
>> +       if (IS_ERR(vma)) {
>> +               ret = PTR_ERR(vma);
>> +               goto err_unref;
>> +       }
>> +
>> +       batch = cs = i915_gem_object_pin_map(bo, I915_MAP_WB);
>> +       if (IS_ERR(batch)) {
>> +               ret = PTR_ERR(batch);
>> +               goto err_unpin;
>> +       }
>> +
>> +       /* Save registers. */
>> +       for (i = 0; i <= 5; i++) {
>> +               cs = save_register(i915, cs, HSW_CS_GPR(i),
>> +                                  INTEL_GT_SCRATCH_FIELD_PERF_CS_GPR + 8 * i, 2);
>> +       }
>> +       cs = save_register(i915, cs, MI_PREDICATE_RESULT_1,
>> +                          INTEL_GT_SCRATCH_FIELD_PERF_PREDICATE_RESULT_1, 1);
>> +
>> +       /* First timestamp snapshot location. */
>> +       ts0 = cs;
> Did this ever get used?


Oops, my mistake. This is buggy, the jump back to the beginning should 
come back here.


>
>> +       /*
>> +        * Initial snapshot of the timestamp register to implement the wait.
>> +        * We work with 32b values, so clear out the top 32b bits of the
>> +        * register because the ALU works 64bits.
>> +        */
>> +       *cs++ = MI_LOAD_REGISTER_IMM(1);
>> +       *cs++ = i915_mmio_reg_offset(HSW_CS_GPR(0)) + 4;
>> +       *cs++ = 0;
>> +       *cs++ = MI_LOAD_REGISTER_REG | (3 - 2);
>> +       *cs++ = i915_mmio_reg_offset(RING_TIMESTAMP(RENDER_RING_BASE));
>> +       *cs++ = i915_mmio_reg_offset(HSW_CS_GPR(0));
>> +
>> +       /*
>> +        * This is the location we're going to jump back into until the
>> +        * required amount of time has passed.
>> +        */
>> +       jump = cs;
>> +
>> +       /*
>> +        * Take another snapshot of the timestamp register. Take care to clear
>> +        * up the top 32bits of CS_GPR(1) as we're using it for other
>> +        * operations below.
>> +        */
>> +       *cs++ = MI_LOAD_REGISTER_IMM(1);
>> +       *cs++ = i915_mmio_reg_offset(HSW_CS_GPR(1)) + 4;
>> +       *cs++ = 0;
>> +       *cs++ = MI_LOAD_REGISTER_REG | (3 - 2);
>> +       *cs++ = i915_mmio_reg_offset(RING_TIMESTAMP(RENDER_RING_BASE));
>> +       *cs++ = i915_mmio_reg_offset(HSW_CS_GPR(1));
> cs = get_timestamp(cs, 1);
>
> enum { START, NOW, DELTA, RESULT, TARGET } ?
>
> That would also help with save/restore registers, as all CS_GPR should
> then be named.


Very sensible :)

If only you had 
https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/intel/common/gen_mi_builder.h


>
>> +       /*
>> +        * Do a diff between the 2 timestamps and store the result back into
>> +        * CS_GPR(1).
>> +        */
>> +       *cs++ = MI_MATH(5);
>> +       *cs++ = MI_ALU_OP(MI_ALU_OP_LOAD, MI_ALU_SRC_SRCA, MI_ALU_SRC_REG(1));
>> +       *cs++ = MI_ALU_OP(MI_ALU_OP_LOAD, MI_ALU_SRC_SRCB, MI_ALU_SRC_REG(0));
>> +       *cs++ = MI_ALU_OP(MI_ALU_OP_SUB, 0, 0);
>> +       *cs++ = MI_ALU_OP(MI_ALU_OP_STORE, MI_ALU_SRC_REG(2), MI_ALU_SRC_ACCU);
>> +       *cs++ = MI_ALU_OP(MI_ALU_OP_STORE, MI_ALU_SRC_REG(3), MI_ALU_SRC_CF);
>> +
>> +       /*
>> +        * Transfer the carry flag (set to 1 if ts1 < ts0, meaning the
>> +        * timestamp have rolled over the 32bits) into the predicate register
>> +        * to be used for the predicated jump.
>> +        */
>> +       *cs++ = MI_LOAD_REGISTER_REG | (3 - 2);
>> +       *cs++ = i915_mmio_reg_offset(HSW_CS_GPR(3));
>> +       *cs++ = i915_mmio_reg_offset(MI_PREDICATE_RESULT_1);
>> +
>> +       /* Restart from the beginning if we had timestamps roll over. */
>> +       *cs++ = (INTEL_GEN(i915) < 8 ?
>> +                MI_BATCH_BUFFER_START : MI_BATCH_BUFFER_START_GEN8) |
>> +               MI_BATCH_PREDICATE;
>> +       *cs++ = vma->node.start;
>> +       *cs++ = 0;
>> +
>> +       /*
>> +        * Now add the diff between to previous timestamps and add it to :
>> +        *      (((1 * << 64) - 1) - delay_ns)
>> +        *
>> +        * When the Carry Flag contains 1 this means the elapsed time is
>> +        * longer than the expected delay, and we can exit the wait loop.
>> +        */
>> +       delay_ticks = 0xffffffffffffffff -
>> +               DIV64_U64_ROUND_UP(delay_ns *
>> +                                  RUNTIME_INFO(i915)->cs_timestamp_frequency_khz,
>> +                                  1000000ull);
>> +       *cs++ = MI_LOAD_REGISTER_IMM(2);
>> +       *cs++ = i915_mmio_reg_offset(HSW_CS_GPR(4));
>> +       *cs++ = lower_32_bits(delay_ticks);
>> +       *cs++ = i915_mmio_reg_offset(HSW_CS_GPR(4)) + 4;
>> +       *cs++ = upper_32_bits(delay_ticks);
> Now, I was expecting to compute the 32b end timestamp and compare now to
> that using the carry-flag to indicate the completion.
>
> Why the detour? (I'm sure I am missing something here.)


Probably trying to save a register... My thinking is not that deep too ;)


>
> Quick question request for >32b delays are rejected in the user debug api?


Nop :( Will add.


>
>> +       *cs++ = MI_MATH(4);
>> +       *cs++ = MI_ALU_OP(MI_ALU_OP_LOAD, MI_ALU_SRC_SRCA, MI_ALU_SRC_REG(2));
>> +       *cs++ = MI_ALU_OP(MI_ALU_OP_LOAD, MI_ALU_SRC_SRCB, MI_ALU_SRC_REG(4));
>> +       *cs++ = MI_ALU_OP(MI_ALU_OP_ADD, 0, 0);
>> +       *cs++ = MI_ALU_OP(MI_ALU_OP_STOREINV, MI_ALU_SRC_REG(5), MI_ALU_SRC_CF);
> Comparing delta against target delay. Inverted store to give
> 	(delay - target) >= 0


I don't trust myself too much with ALU changes ;)


>
>> +       /*
>> +        * Transfer the result into the predicate register to be used for the
>> +        * predicated jump.
>> +        */
>> +       *cs++ = MI_LOAD_REGISTER_REG | (3 - 2);
>> +       *cs++ = i915_mmio_reg_offset(HSW_CS_GPR(5));
>> +       *cs++ = i915_mmio_reg_offset(MI_PREDICATE_RESULT_1);
>> +
>> +       /* Predicate the jump.  */
>> +       *cs++ = (INTEL_GEN(i915) < 8 ?
>> +                MI_BATCH_BUFFER_START : MI_BATCH_BUFFER_START_GEN8) |
>> +               MI_BATCH_PREDICATE;
> The joy of being on rcs.


You'll be delighted to know about this new TGL generation where all CS 
(afaict) have graduated from predication.


>
>> +       *cs++ = vma->node.start + (jump - batch) * 4;
> Mutters i915_ggtt_offset(vma)
>
>> +       *cs++ = 0;
>> +
>> +       /* Restore registers. */
>> +       for (i = 0; i <= 5; i++) {
>> +               cs = restore_register(i915, cs, HSW_CS_GPR(i),
>> +                                     INTEL_GT_SCRATCH_FIELD_PERF_CS_GPR + 8 * i, 2);
>> +       }
>> +       cs = restore_register(i915, cs, MI_PREDICATE_RESULT_1,
>> +                             INTEL_GT_SCRATCH_FIELD_PERF_PREDICATE_RESULT_1, 1);
>> +
>> +       /* And return to the ring. */
>> +       *cs++ = MI_BATCH_BUFFER_END;
> GEM_BUG_ON(cs - batch > PAGE_SIZE/sizeof(*batch));


Done!


>
>> +
>> +       i915_gem_object_flush_map(bo);
>> +       i915_gem_object_unpin_map(bo);
>> +
>> +       i915->perf.oa.noa_wait = vma;
>> +
>> +       return 0;
>> +
>> +err_unpin:
>> +       __i915_vma_unpin(vma);
>> +
>> +err_unref:
>> +       i915_gem_object_put(bo);
>> +
>> +       return ret;
>> +}
> Certainly seems to do what you say on the tin.
> -Chris
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 07/11] drm/i915: add syncobj timeline support
  2019-07-01 11:34 ` [PATCH v6 07/11] drm/i915: add syncobj timeline support Lionel Landwerlin
@ 2019-07-01 13:13   ` Chris Wilson
  2019-07-01 13:15     ` Lionel Landwerlin
  2019-07-01 13:18   ` Chris Wilson
  2019-07-03  8:56   ` Chris Wilson
  2 siblings, 1 reply; 44+ messages in thread
From: Chris Wilson @ 2019-07-01 13:13 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2019-07-01 12:34:33)
>  struct i915_execbuffer {
>         struct drm_i915_private *i915; /** i915 backpointer */
>         struct drm_file *file; /** per-file lookup tables and limits */
> @@ -275,6 +282,7 @@ struct i915_execbuffer {
>  
>         struct {
>                 u64 flags; /** Available extensions parameters */
> +               struct drm_i915_gem_execbuffer_ext_timeline_fences timeline_fences;
>         } extensions;
>  };
> +static int parse_timeline_fences(struct i915_user_extension __user *ext, void *data)
> +{
> +       struct i915_execbuffer *eb = data;
> +
> +       /* Timeline fences are incompatible with the fence array flag. */
> +       if (eb->args->flags & I915_EXEC_FENCE_ARRAY)
> +               return -EINVAL;
> +
> +       if (eb->extensions.flags & BIT(DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES))
> +               return -EINVAL;

flags is 64b, so wiser if we use BIT_ULL() from the start. You don't
want to copy my bugs ;)
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* ✗ Fi.CI.SPARSE: warning for drm/i915: Vulkan performance query support (rev6)
  2019-07-01 11:34 [PATCH v6 00/11] drm/i915: Vulkan performance query support Lionel Landwerlin
                   ` (11 preceding siblings ...)
  2019-07-01 13:08 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Vulkan performance query support (rev6) Patchwork
@ 2019-07-01 13:14 ` Patchwork
  2019-07-01 13:38 ` ✓ Fi.CI.BAT: success " Patchwork
  2019-07-02 18:02 ` ✗ Fi.CI.IGT: failure " Patchwork
  14 siblings, 0 replies; 44+ messages in thread
From: Patchwork @ 2019-07-01 13:14 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Vulkan performance query support (rev6)
URL   : https://patchwork.freedesktop.org/series/60916/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Sparse version: v0.5.2
Commit: drm/i915/perf: add missing delay for OA muxes configuration
Okay!

Commit: drm/i915/perf: introduce a versioning of the i915-perf uapi
Okay!

Commit: drm/i915/perf: allow for CS OA configs to be created lazily
+drivers/gpu/drm/i915/i915_perf.c:398:37: warning: expression using sizeof(void)

Commit: drm/i915: enumerate scratch fields
Okay!

Commit: drm/i915/perf: implement active wait for noa configurations
+  ^~~~~~~~~~~~~~~~~~
+cc1: all warnings being treated as errors
+drivers/gpu/drm/i915/i915_perf.c:1803:23: warning: constant 0xffffffffffffffff is so big it is unsigned long
+drivers/gpu/drm/i915/i915_perf.c:2492:27: warning: call with no type!
+drivers/gpu/drm/i915/i915_perf.c:2492:2: error: implicit declaration of function ‘i915_oa_config_put’; did you mean ‘i915_gem_context_put’? [-Werror=implicit-function-declaration]
+drivers/gpu/drm/i915/i915_perf.c:2492:9: error: undefined identifier 'i915_oa_config_put'
+drivers/gpu/drm/i915/i915_perf.c: In function ‘i915_oa_stream_init’:
+  i915_gem_context_put
+  i915_oa_config_put(stream->oa_config);
+make[1]: *** [drivers/gpu/drm/i915] Error 2
+make[2]: *** [drivers/gpu/drm/i915/i915_perf.o] Error 1
+make[2]: *** Waiting for unfinished jobs....
+make: *** [drivers/gpu/drm/] Error 2

Commit: drm/i915: introduce a mechanism to extend execbuf2
Okay!

Commit: drm/i915: add syncobj timeline support
+./include/linux/mm.h:663:13: error: not a function <noident>

Commit: drm/i915: add a new perf configuration execbuf parameter
-  ^~~~~~~~~~~~~~~~~~
-cc1: all warnings being treated as errors
-drivers/gpu/drm/i915/i915_perf.c:2491:27: warning: call with no type!
-drivers/gpu/drm/i915/i915_perf.c:2491:2: error: implicit declaration of function ‘i915_oa_config_put’; did you mean ‘i915_gem_context_put’? [-Werror=implicit-function-declaration]
-drivers/gpu/drm/i915/i915_perf.c:2491:9: error: undefined identifier 'i915_oa_config_put'
-drivers/gpu/drm/i915/i915_perf.c: In function ‘i915_oa_stream_init’:
-  i915_gem_context_put
-  i915_oa_config_put(stream->oa_config);
-make[1]: *** [drivers/gpu/drm/i915] Error 2
-make[2]: *** [drivers/gpu/drm/i915/i915_perf.o] Error 1
-make[2]: *** Waiting for unfinished jobs....
-make: *** [drivers/gpu/drm/] Error 2

Commit: drm/i915/perf: allow holding preemption on filtered ctx
Okay!

Commit: drm/i915/perf: execute OA configuration from command stream
Okay!

Commit: drm/i915: add support for perf configuration queries
Okay!

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 07/11] drm/i915: add syncobj timeline support
  2019-07-01 13:13   ` Chris Wilson
@ 2019-07-01 13:15     ` Lionel Landwerlin
  0 siblings, 0 replies; 44+ messages in thread
From: Lionel Landwerlin @ 2019-07-01 13:15 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 01/07/2019 16:13, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-07-01 12:34:33)
>>   struct i915_execbuffer {
>>          struct drm_i915_private *i915; /** i915 backpointer */
>>          struct drm_file *file; /** per-file lookup tables and limits */
>> @@ -275,6 +282,7 @@ struct i915_execbuffer {
>>   
>>          struct {
>>                  u64 flags; /** Available extensions parameters */
>> +               struct drm_i915_gem_execbuffer_ext_timeline_fences timeline_fences;
>>          } extensions;
>>   };
>> +static int parse_timeline_fences(struct i915_user_extension __user *ext, void *data)
>> +{
>> +       struct i915_execbuffer *eb = data;
>> +
>> +       /* Timeline fences are incompatible with the fence array flag. */
>> +       if (eb->args->flags & I915_EXEC_FENCE_ARRAY)
>> +               return -EINVAL;
>> +
>> +       if (eb->extensions.flags & BIT(DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES))
>> +               return -EINVAL;
> flags is 64b, so wiser if we use BIT_ULL() from the start. You don't
> want to copy my bugs ;)
> -Chris
>
Dammit! Why aren't all bit macros 64bits? :)


-Lionel

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 07/11] drm/i915: add syncobj timeline support
  2019-07-01 11:34 ` [PATCH v6 07/11] drm/i915: add syncobj timeline support Lionel Landwerlin
  2019-07-01 13:13   ` Chris Wilson
@ 2019-07-01 13:18   ` Chris Wilson
  2019-07-01 13:22     ` Lionel Landwerlin
  2019-07-03  8:56   ` Chris Wilson
  2 siblings, 1 reply; 44+ messages in thread
From: Chris Wilson @ 2019-07-01 13:18 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2019-07-01 12:34:33)
> +               /*
> +                * For timeline syncobjs we need to preallocate chains for
> +                * later signaling.
> +                */
> +               if (point != 0 && user_fence.flags & I915_EXEC_FENCE_SIGNAL) {
> +                       fences[n].chain_fence =
> +                               kmalloc(sizeof(*fences[n].chain_fence),
> +                                       GFP_KERNEL);
> +                       if (!fences[n].chain_fence) {
> +                               dma_fence_put(fence);
> +                               drm_syncobj_put(syncobj);
> +                               err = -ENOMEM;
> +                               DRM_DEBUG("Unable to alloc chain_fence\n");

This is like throwing a grenade, waiting for the explosion, and then
saying "bang" under your breath. :)
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 07/11] drm/i915: add syncobj timeline support
  2019-07-01 13:18   ` Chris Wilson
@ 2019-07-01 13:22     ` Lionel Landwerlin
  0 siblings, 0 replies; 44+ messages in thread
From: Lionel Landwerlin @ 2019-07-01 13:22 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 01/07/2019 16:18, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-07-01 12:34:33)
>> +               /*
>> +                * For timeline syncobjs we need to preallocate chains for
>> +                * later signaling.
>> +                */
>> +               if (point != 0 && user_fence.flags & I915_EXEC_FENCE_SIGNAL) {
>> +                       fences[n].chain_fence =
>> +                               kmalloc(sizeof(*fences[n].chain_fence),
>> +                                       GFP_KERNEL);
>> +                       if (!fences[n].chain_fence) {
>> +                               dma_fence_put(fence);
>> +                               drm_syncobj_put(syncobj);
>> +                               err = -ENOMEM;
>> +                               DRM_DEBUG("Unable to alloc chain_fence\n");
> This is like throwing a grenade, waiting for the explosion, and then
> saying "bang" under your breath. :)
> -Chris
>
I don't get your point :)


-Lionel

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 10/11] drm/i915/perf: execute OA configuration from command stream
  2019-07-01 11:34 ` [PATCH v6 10/11] drm/i915/perf: execute OA configuration from command stream Lionel Landwerlin
@ 2019-07-01 13:32   ` Chris Wilson
  2019-07-01 13:42     ` Lionel Landwerlin
  0 siblings, 1 reply; 44+ messages in thread
From: Chris Wilson @ 2019-07-01 13:32 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2019-07-01 12:34:36)
> @@ -1860,23 +1893,55 @@ static int alloc_noa_wait(struct drm_i915_private *i915)
>         return ret;
>  }
>  
> -static void config_oa_regs(struct drm_i915_private *dev_priv,
> -                          const struct i915_oa_reg *regs,
> -                          u32 n_regs)
> +static int emit_oa_config(struct drm_i915_private *i915,
> +                         struct i915_perf_stream *stream)
>  {
> -       u32 i;
> +       struct i915_oa_config *oa_config = stream->oa_config;
> +       struct i915_request *rq = stream->initial_config_rq;
> +       struct i915_vma *vma;
> +       u32 *cs;
> +       int err;
>  
> -       for (i = 0; i < n_regs; i++) {
> -               const struct i915_oa_reg *reg = regs + i;
> +       vma = i915_vma_instance(oa_config->obj, &i915->ggtt.vm, NULL);
> +       if (unlikely(IS_ERR(vma)))
> +               return PTR_ERR(vma);
> +
> +       err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL);
> +       if (err)
> +               return err;

No pinning underneath the timeline->mutex.

...

> @@ -2455,47 +2466,90 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
>         if (ret)
>                 goto err_oa_buf_alloc;
>  
> +       ret = i915_perf_get_oa_config(dev_priv, props->metrics_set,
> +                                     &stream->oa_config, &obj);
> +       if (ret) {
> +               DRM_DEBUG("Invalid OA config id=%i\n", props->metrics_set);
> +               goto err_config;
> +       }
> +
> +       /*
> +        * We just need the buffer to be created, but not our own reference on
> +        * it as the oa_config already has one.
> +        */
> +       i915_gem_object_put(obj);
> +
> +       stream->initial_config_rq =
> +               i915_request_create(dev_priv->engine[RCS0]->kernel_context);
> +       if (IS_ERR(stream->initial_config_rq)) {
> +               ret = PTR_ERR(stream->initial_config_rq);
> +               goto err_initial_config;
> +       }
> +
> +       stream->ops = &i915_oa_stream_ops;
> +
>         ret = i915_mutex_lock_interruptible(&dev_priv->drm);
>         if (ret)
>                 goto err_lock;

This locking is inverted as timeline->mutex is not a complete guard for
request allocation yet.

> -       stream->ops = &i915_oa_stream_ops;
> +       ret = i915_active_request_set(&dev_priv->engine[RCS0]->last_oa_config,
> +                                     stream->initial_config_rq);

I'm not convinced you want this (and the missing mutex) on the engine,
as it is just describing the perf oa_config timeline. I think it's
better to put that at the same granularity as it is used.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* ✓ Fi.CI.BAT: success for drm/i915: Vulkan performance query support (rev6)
  2019-07-01 11:34 [PATCH v6 00/11] drm/i915: Vulkan performance query support Lionel Landwerlin
                   ` (12 preceding siblings ...)
  2019-07-01 13:14 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2019-07-01 13:38 ` Patchwork
  2019-07-02 18:02 ` ✗ Fi.CI.IGT: failure " Patchwork
  14 siblings, 0 replies; 44+ messages in thread
From: Patchwork @ 2019-07-01 13:38 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Vulkan performance query support (rev6)
URL   : https://patchwork.freedesktop.org/series/60916/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_6390 -> Patchwork_13480
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/

Known issues
------------

  Here are the changes found in Patchwork_13480 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_tiled_pread_basic:
    - fi-icl-u3:          [PASS][1] -> [DMESG-WARN][2] ([fdo#107724]) +1 similar issue
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/fi-icl-u3/igt@gem_tiled_pread_basic.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/fi-icl-u3/igt@gem_tiled_pread_basic.html

  * igt@i915_pm_rpm@module-reload:
    - fi-cml-u:           [PASS][3] -> [DMESG-WARN][4] ([fdo#111012])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/fi-cml-u/igt@i915_pm_rpm@module-reload.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/fi-cml-u/igt@i915_pm_rpm@module-reload.html

  
#### Possible fixes ####

  * igt@gem_mmap_gtt@basic-copy:
    - fi-icl-u3:          [DMESG-WARN][5] ([fdo#107724]) -> [PASS][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/fi-icl-u3/igt@gem_mmap_gtt@basic-copy.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/fi-icl-u3/igt@gem_mmap_gtt@basic-copy.html

  * igt@i915_pm_rpm@module-reload:
    - fi-skl-6770hq:      [FAIL][7] ([fdo#108511]) -> [PASS][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/fi-skl-6770hq/igt@i915_pm_rpm@module-reload.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/fi-skl-6770hq/igt@i915_pm_rpm@module-reload.html

  * igt@i915_selftest@live_contexts:
    - fi-icl-dsi:         [INCOMPLETE][9] ([fdo#107713] / [fdo#108569]) -> [PASS][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/fi-icl-dsi/igt@i915_selftest@live_contexts.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/fi-icl-dsi/igt@i915_selftest@live_contexts.html

  * igt@kms_busy@basic-flip-c:
    - fi-skl-6770hq:      [SKIP][11] ([fdo#109271] / [fdo#109278]) -> [PASS][12] +2 similar issues
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/fi-skl-6770hq/igt@kms_busy@basic-flip-c.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/fi-skl-6770hq/igt@kms_busy@basic-flip-c.html

  * igt@kms_flip@basic-flip-vs-dpms:
    - fi-skl-6770hq:      [SKIP][13] ([fdo#109271]) -> [PASS][14] +23 similar issues
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/fi-skl-6770hq/igt@kms_flip@basic-flip-vs-dpms.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/fi-skl-6770hq/igt@kms_flip@basic-flip-vs-dpms.html

  
  [fdo#107713]: https://bugs.freedesktop.org/show_bug.cgi?id=107713
  [fdo#107724]: https://bugs.freedesktop.org/show_bug.cgi?id=107724
  [fdo#108511]: https://bugs.freedesktop.org/show_bug.cgi?id=108511
  [fdo#108569]: https://bugs.freedesktop.org/show_bug.cgi?id=108569
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#111012]: https://bugs.freedesktop.org/show_bug.cgi?id=111012


Participating hosts (52 -> 45)
------------------------------

  Additional (1): fi-kbl-7567u 
  Missing    (8): fi-kbl-soraka fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-icl-y fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * Linux: CI_DRM_6390 -> Patchwork_13480

  CI_DRM_6390: 4c6c23fdf450ab43bb4046ac1fb1439ebf176564 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5075: 03779dd3de8a57544f124d9952a6d2b3e34e34ca @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_13480: e4ee8661e287f57ac107e317b2adcb2fe2b364ec @ git://anongit.freedesktop.org/gfx-ci/linux


== Kernel 32bit build ==

Warning: Kernel 32bit buildtest failed:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/build_32bit.log

  CALL    scripts/checksyscalls.sh
  CALL    scripts/atomic/check-atomics.sh
  CHK     include/generated/compile.h
Kernel: arch/x86/boot/bzImage is ready  (#1)
  Building modules, stage 2.
  MODPOST 112 modules
ERROR: "__udivdi3" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
ERROR: "__divdi3" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
scripts/Makefile.modpost:91: recipe for target '__modpost' failed
make[1]: *** [__modpost] Error 1
Makefile:1287: recipe for target 'modules' failed
make: *** [modules] Error 2


== Linux commits ==

e4ee8661e287 drm/i915: add support for perf configuration queries
ee7f0b256c01 drm/i915/perf: execute OA configuration from command stream
aa1c4a084e04 drm/i915/perf: allow holding preemption on filtered ctx
243b2d67cc0f drm/i915: add a new perf configuration execbuf parameter
d0bdcdd65e5d drm/i915: add syncobj timeline support
35af21623514 drm/i915: introduce a mechanism to extend execbuf2
9762d251a468 drm/i915/perf: implement active wait for noa configurations
d84a895e334c drm/i915: enumerate scratch fields
61476a43f18a drm/i915/perf: allow for CS OA configs to be created lazily
5d6fb48512bb drm/i915/perf: introduce a versioning of the i915-perf uapi
063d9e6a602f drm/i915/perf: add missing delay for OA muxes configuration

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 10/11] drm/i915/perf: execute OA configuration from command stream
  2019-07-01 13:32   ` Chris Wilson
@ 2019-07-01 13:42     ` Lionel Landwerlin
  0 siblings, 0 replies; 44+ messages in thread
From: Lionel Landwerlin @ 2019-07-01 13:42 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 01/07/2019 16:32, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-07-01 12:34:36)
>> @@ -1860,23 +1893,55 @@ static int alloc_noa_wait(struct drm_i915_private *i915)
>>          return ret;
>>   }
>>   
>> -static void config_oa_regs(struct drm_i915_private *dev_priv,
>> -                          const struct i915_oa_reg *regs,
>> -                          u32 n_regs)
>> +static int emit_oa_config(struct drm_i915_private *i915,
>> +                         struct i915_perf_stream *stream)
>>   {
>> -       u32 i;
>> +       struct i915_oa_config *oa_config = stream->oa_config;
>> +       struct i915_request *rq = stream->initial_config_rq;
>> +       struct i915_vma *vma;
>> +       u32 *cs;
>> +       int err;
>>   
>> -       for (i = 0; i < n_regs; i++) {
>> -               const struct i915_oa_reg *reg = regs + i;
>> +       vma = i915_vma_instance(oa_config->obj, &i915->ggtt.vm, NULL);
>> +       if (unlikely(IS_ERR(vma)))
>> +               return PTR_ERR(vma);
>> +
>> +       err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL);
>> +       if (err)
>> +               return err;
> No pinning underneath the timeline->mutex.
>
> ...


Hmm... But in this change emit_oa_config() is called by the 
enable_metricset() vfunc from i915_oa_stream_init() without holding 
drm.struct_mutex.

That doesn't seem to be under the timeline->mutex.


>
>> @@ -2455,47 +2466,90 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
>>          if (ret)
>>                  goto err_oa_buf_alloc;
>>   
>> +       ret = i915_perf_get_oa_config(dev_priv, props->metrics_set,
>> +                                     &stream->oa_config, &obj);
>> +       if (ret) {
>> +               DRM_DEBUG("Invalid OA config id=%i\n", props->metrics_set);
>> +               goto err_config;
>> +       }
>> +
>> +       /*
>> +        * We just need the buffer to be created, but not our own reference on
>> +        * it as the oa_config already has one.
>> +        */
>> +       i915_gem_object_put(obj);
>> +
>> +       stream->initial_config_rq =
>> +               i915_request_create(dev_priv->engine[RCS0]->kernel_context);
>> +       if (IS_ERR(stream->initial_config_rq)) {
>> +               ret = PTR_ERR(stream->initial_config_rq);
>> +               goto err_initial_config;
>> +       }
>> +
>> +       stream->ops = &i915_oa_stream_ops;
>> +
>>          ret = i915_mutex_lock_interruptible(&dev_priv->drm);
>>          if (ret)
>>                  goto err_lock;
> This locking is inverted as timeline->mutex is not a complete guard for
> request allocation yet.


So intel_context_lock_pinned() around the request allocation and setting 
the active request then?

With the struct_mutex lock taken around it?


>
>> -       stream->ops = &i915_oa_stream_ops;
>> +       ret = i915_active_request_set(&dev_priv->engine[RCS0]->last_oa_config,
>> +                                     stream->initial_config_rq);
> I'm not convinced you want this (and the missing mutex) on the engine,
> as it is just describing the perf oa_config timeline. I think it's
> better to put that at the same granularity as it is used.
> -Chris
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 03/11] drm/i915/perf: allow for CS OA configs to be created lazily
  2019-07-01 13:06   ` Chris Wilson
@ 2019-07-01 13:45     ` Lionel Landwerlin
  0 siblings, 0 replies; 44+ messages in thread
From: Lionel Landwerlin @ 2019-07-01 13:45 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 01/07/2019 16:06, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-07-01 12:34:29)
>> @@ -2535,9 +2635,21 @@ static int i915_perf_release(struct inode *inode, struct file *file)
>>   {
>>          struct i915_perf_stream *stream = file->private_data;
>>          struct drm_i915_private *dev_priv = stream->dev_priv;
>> +       struct i915_oa_config *oa_config, *next;
>>   
>>          mutex_lock(&dev_priv->perf.lock);
>> +
>>          i915_perf_destroy_locked(stream);
>> +
>> +       /* Dispose of all oa config batch buffers. */
>> +       mutex_lock(&dev_priv->perf.metrics_lock);
>> +       list_for_each_entry_safe(oa_config, next, &dev_priv->perf.metrics_buffers, vma_link) {
>> +               list_del(&oa_config->vma_link);
>> +               i915_gem_object_put(oa_config->obj);
>> +               oa_config->obj = NULL;
>> +       }
>> +       mutex_unlock(&dev_priv->perf.metrics_lock);
> What's the reference chain from the i915_perf fd to the i915_device?
> What's even keeping the module alive!
>
> Shouldn't be a drm_dev_get() in i915_perf_open_ioctl() and a
> drm_dev_put() here?


Aye!

Looks like a candidate for stable...


>
> So there may be more than one stream, sharing the same oa_config. If a
> stream closes, you let all the current streams keep their reference and
> the next gets a new object. Looks like there's some scope for
> duplication, but looks safe enough. My main worry was for zombie
> oa_config.


The goal of this loop is to garbage collect the config BOs once OA isn't 
used anymore.

Right now there is only one engine with OA support.

We could potentially put that list on the engine to be safe.


Thanks,


-Lionel


> -Chris
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 09/11] drm/i915/perf: allow holding preemption on filtered ctx
  2019-07-01 12:10     ` Lionel Landwerlin
@ 2019-07-01 14:37       ` Chris Wilson
  2019-07-09  9:18         ` Lionel Landwerlin
  0 siblings, 1 reply; 44+ messages in thread
From: Chris Wilson @ 2019-07-01 14:37 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2019-07-01 13:10:53)
> On 01/07/2019 15:03, Chris Wilson wrote:
> > Quoting Lionel Landwerlin (2019-07-01 12:34:35)
> >> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> >> index f92bace9caff..012d6d7f54e2 100644
> >> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> >> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> >> @@ -2104,6 +2104,14 @@ static int eb_oa_config(struct i915_execbuffer *eb)
> >>          if (err)
> >>                  return err;
> >>   
> >> +       /*
> >> +        * If the perf stream was opened with hold preemption, flag the
> >> +        * request properly so that the priority of the request is bumped once
> >> +        * it reaches the execlist ports.
> >> +        */
> >> +       if (eb->i915->perf.oa.exclusive_stream->hold_preemption)
> >> +               eb->request->flags |= I915_REQUEST_FLAGS_PERF;
> > Just to reassure myself that this is the behaviour you:
> >
> > If the exclusive_stream is changed before the request is executed, it is
> > likely that we no longer notice the earlier preemption-protection. This
> > should not matter because the listener is no longer interested in those
> > events?
> > -Chris
> >
> 
> Yeah, dropping the perf stream before your queries complete and you're 
> in undefined behavior territory.

Then this should do what you want, and if I break it in future, I have
to fix it ;)

Hmm, this definitely merits some selftest/igt as I am very liable to
break it.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 03/11] drm/i915/perf: allow for CS OA configs to be created lazily
  2019-07-01 11:34 ` [PATCH v6 03/11] drm/i915/perf: allow for CS OA configs to be created lazily Lionel Landwerlin
  2019-07-01 13:06   ` Chris Wilson
@ 2019-07-01 15:09   ` Chris Wilson
  2019-07-09  6:47     ` Lionel Landwerlin
  2019-07-09  8:30   ` Chris Wilson
  2 siblings, 1 reply; 44+ messages in thread
From: Chris Wilson @ 2019-07-01 15:09 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2019-07-01 12:34:29)
>  struct i915_oa_config {
> +       struct drm_i915_private *i915;
> +
>         char uuid[UUID_STRING_LEN + 1];
>         int id;
>  
> @@ -1110,6 +1112,10 @@ struct i915_oa_config {
>         struct attribute *attrs[2];
>         struct device_attribute sysfs_metric_id;
>  
> +       struct drm_i915_gem_object *obj;
> +
> +       struct list_head vma_link;
> +
>         atomic_t ref_count;
>  };

> -static void free_oa_config(struct drm_i915_private *dev_priv,
> -                          struct i915_oa_config *oa_config)
> +static void put_oa_config(struct i915_oa_config *oa_config)
>  {
> +       if (!atomic_dec_and_test(&oa_config->ref_count))
> +               return;

I strongly advise that ref_count be replaced by struct kref, just so that
we get the benefit of debugging.

put_oa_config -> kref_put(&oa_config->ref, free_oa_config)
(free_oa_config takes kref as its arg and uses container_of())

> +int i915_perf_get_oa_config(struct drm_i915_private *i915,
> +                           int metrics_set,
> +                           struct i915_oa_config **out_config,
> +                           struct drm_i915_gem_object **out_obj)
> +{
> +       int ret = 0;
> +       struct i915_oa_config *oa_config;
> +
> +       if (!i915->perf.initialized)
> +               return -ENODEV;
> +
> +       ret = mutex_lock_interruptible(&i915->perf.metrics_lock);
>         if (ret)
>                 return ret;
>  
> -       *out_config = idr_find(&dev_priv->perf.metrics_idr, metrics_set);
> -       if (!*out_config)
> -               ret = -EINVAL;
> -       else
> -               atomic_inc(&(*out_config)->ref_count);
> +       if (metrics_set == 1) {
> +               oa_config = &i915->perf.oa.test_config;
> +       } else {
> +               oa_config = idr_find(&i915->perf.metrics_idr, metrics_set);

Why not have the builtin[1] inside the idr?
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 06/11] drm/i915: introduce a mechanism to extend execbuf2
  2019-07-01 11:34 ` [PATCH v6 06/11] drm/i915: introduce a mechanism to extend execbuf2 Lionel Landwerlin
@ 2019-07-01 15:17   ` Chris Wilson
  2019-07-02 11:36     ` Lionel Landwerlin
  0 siblings, 1 reply; 44+ messages in thread
From: Chris Wilson @ 2019-07-01 15:17 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2019-07-01 12:34:32)
> We're planning to use this for a couple of new feature where we need
> to provide additional parameters to execbuf.
> 
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

Looks ok, are you convinced by I915_EXEC_EXT? It doesn't roll off the
tongue too well for me, but I guess EXT is a bit more ingrained in
your cerebral cortex.

> ---
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 32 ++++++++++++++++++-
>  include/uapi/drm/i915_drm.h                   | 25 +++++++++++++--
>  2 files changed, 53 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 1c5dfbfad71b..9887fa9e3ac8 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -23,6 +23,7 @@
>  #include "i915_gem_clflush.h"
>  #include "i915_gem_context.h"
>  #include "i915_trace.h"
> +#include "i915_user_extensions.h"
>  #include "intel_drv.h"
>  
>  enum {
> @@ -271,6 +272,10 @@ struct i915_execbuffer {
>          */
>         int lut_size;
>         struct hlist_head *buckets; /** ht for relocation handles */
> +
> +       struct {
> +               u64 flags; /** Available extensions parameters */
> +       } extensions;
>  };
>  
>  #define exec_entry(EB, VMA) (&(EB)->exec[(VMA)->exec_flags - (EB)->flags])
> @@ -1969,7 +1974,7 @@ static bool i915_gem_check_execbuffer(struct drm_i915_gem_execbuffer2 *exec)
>                 return false;
>  
>         /* Kernel clipping was a DRI1 misfeature */
> -       if (!(exec->flags & I915_EXEC_FENCE_ARRAY)) {
> +       if (!(exec->flags & (I915_EXEC_FENCE_ARRAY | I915_EXEC_EXT))) {
>                 if (exec->num_cliprects || exec->cliprects_ptr)
>                         return false;
>         }
> @@ -2347,6 +2352,27 @@ signal_fence_array(struct i915_execbuffer *eb,
>         }
>  }
>  
> +static const i915_user_extension_fn execbuf_extensions[] = {
> +};
> +
> +static int
> +parse_execbuf2_extensions(struct drm_i915_gem_execbuffer2 *args,
> +                         struct i915_execbuffer *eb)
> +{
> +       eb->extensions.flags = 0;
> +
> +       if (!(args->flags & I915_EXEC_EXT))
> +               return 0;
> +
> +       if (args->num_cliprects != 0)
> +               return -EINVAL;
> +
> +       return i915_user_extensions(u64_to_user_ptr(args->cliprects_ptr),
> +                                   execbuf_extensions,
> +                                   ARRAY_SIZE(execbuf_extensions),
> +                                   eb);
> +}
> +
>  static int
>  i915_gem_do_execbuffer(struct drm_device *dev,
>                        struct drm_file *file,
> @@ -2393,6 +2419,10 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>         if (args->flags & I915_EXEC_IS_PINNED)
>                 eb.batch_flags |= I915_DISPATCH_PINNED;
>  
> +       err = parse_execbuf2_extensions(args, &eb);
> +       if (err)
> +               return err;
> +
>         if (args->flags & I915_EXEC_FENCE_IN) {
>                 in_fence = sync_file_get_fence(lower_32_bits(args->rsvd2));
>                 if (!in_fence)
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index e27a8eda9121..efa195d6994e 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1013,6 +1013,10 @@ struct drm_i915_gem_exec_fence {
>         __u32 flags;
>  };
>  
> +enum drm_i915_gem_execbuffer_ext {
> +       DRM_I915_GEM_EXECBUFFER_EXT_MAX /* non-ABI */

We have a weird mix of trying to avoid drm_i915_gem and yet it's
plastered all over the structs. Sigh.

> +};

enums next to uABI make me nervous :)

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 06/11] drm/i915: introduce a mechanism to extend execbuf2
  2019-07-01 15:17   ` Chris Wilson
@ 2019-07-02 11:36     ` Lionel Landwerlin
  0 siblings, 0 replies; 44+ messages in thread
From: Lionel Landwerlin @ 2019-07-02 11:36 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 01/07/2019 18:17, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-07-01 12:34:32)
>> We're planning to use this for a couple of new feature where we need
>> to provide additional parameters to execbuf.
>>
>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Looks ok, are you convinced by I915_EXEC_EXT? It doesn't roll off the
> tongue too well for me, but I guess EXT is a bit more ingrained in
> your cerebral cortex.


I'm open to any suggestion for the name :)


>
>> ---
>>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 32 ++++++++++++++++++-
>>   include/uapi/drm/i915_drm.h                   | 25 +++++++++++++--
>>   2 files changed, 53 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> index 1c5dfbfad71b..9887fa9e3ac8 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> @@ -23,6 +23,7 @@
>>   #include "i915_gem_clflush.h"
>>   #include "i915_gem_context.h"
>>   #include "i915_trace.h"
>> +#include "i915_user_extensions.h"
>>   #include "intel_drv.h"
>>   
>>   enum {
>> @@ -271,6 +272,10 @@ struct i915_execbuffer {
>>           */
>>          int lut_size;
>>          struct hlist_head *buckets; /** ht for relocation handles */
>> +
>> +       struct {
>> +               u64 flags; /** Available extensions parameters */
>> +       } extensions;
>>   };
>>   
>>   #define exec_entry(EB, VMA) (&(EB)->exec[(VMA)->exec_flags - (EB)->flags])
>> @@ -1969,7 +1974,7 @@ static bool i915_gem_check_execbuffer(struct drm_i915_gem_execbuffer2 *exec)
>>                  return false;
>>   
>>          /* Kernel clipping was a DRI1 misfeature */
>> -       if (!(exec->flags & I915_EXEC_FENCE_ARRAY)) {
>> +       if (!(exec->flags & (I915_EXEC_FENCE_ARRAY | I915_EXEC_EXT))) {
>>                  if (exec->num_cliprects || exec->cliprects_ptr)
>>                          return false;
>>          }
>> @@ -2347,6 +2352,27 @@ signal_fence_array(struct i915_execbuffer *eb,
>>          }
>>   }
>>   
>> +static const i915_user_extension_fn execbuf_extensions[] = {
>> +};
>> +
>> +static int
>> +parse_execbuf2_extensions(struct drm_i915_gem_execbuffer2 *args,
>> +                         struct i915_execbuffer *eb)
>> +{
>> +       eb->extensions.flags = 0;
>> +
>> +       if (!(args->flags & I915_EXEC_EXT))
>> +               return 0;
>> +
>> +       if (args->num_cliprects != 0)
>> +               return -EINVAL;
>> +
>> +       return i915_user_extensions(u64_to_user_ptr(args->cliprects_ptr),
>> +                                   execbuf_extensions,
>> +                                   ARRAY_SIZE(execbuf_extensions),
>> +                                   eb);
>> +}
>> +
>>   static int
>>   i915_gem_do_execbuffer(struct drm_device *dev,
>>                         struct drm_file *file,
>> @@ -2393,6 +2419,10 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>>          if (args->flags & I915_EXEC_IS_PINNED)
>>                  eb.batch_flags |= I915_DISPATCH_PINNED;
>>   
>> +       err = parse_execbuf2_extensions(args, &eb);
>> +       if (err)
>> +               return err;
>> +
>>          if (args->flags & I915_EXEC_FENCE_IN) {
>>                  in_fence = sync_file_get_fence(lower_32_bits(args->rsvd2));
>>                  if (!in_fence)
>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>> index e27a8eda9121..efa195d6994e 100644
>> --- a/include/uapi/drm/i915_drm.h
>> +++ b/include/uapi/drm/i915_drm.h
>> @@ -1013,6 +1013,10 @@ struct drm_i915_gem_exec_fence {
>>          __u32 flags;
>>   };
>>   
>> +enum drm_i915_gem_execbuffer_ext {
>> +       DRM_I915_GEM_EXECBUFFER_EXT_MAX /* non-ABI */
> We have a weird mix of trying to avoid drm_i915_gem and yet it's
> plastered all over the structs. Sigh.


Yeah, I couldn't figure out what is desired.

Happy to change it if you have a naming scheme.


>
>> +};
> enums next to uABI make me nervous :)
>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> -Chris
>

Thanks a lot,


-Lionel

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* ✗ Fi.CI.IGT: failure for drm/i915: Vulkan performance query support (rev6)
  2019-07-01 11:34 [PATCH v6 00/11] drm/i915: Vulkan performance query support Lionel Landwerlin
                   ` (13 preceding siblings ...)
  2019-07-01 13:38 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2019-07-02 18:02 ` Patchwork
  14 siblings, 0 replies; 44+ messages in thread
From: Patchwork @ 2019-07-02 18:02 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Vulkan performance query support (rev6)
URL   : https://patchwork.freedesktop.org/series/60916/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_6390_full -> Patchwork_13480_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_13480_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_13480_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_13480_full:

### IGT changes ###

#### Possible regressions ####

  * igt@perf@blocking:
    - shard-hsw:          [PASS][1] -> [DMESG-WARN][2] +12 similar issues
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-hsw8/igt@perf@blocking.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-hsw7/igt@perf@blocking.html

  * igt@perf@create-destroy-userspace-config:
    - shard-glk:          [PASS][3] -> [DMESG-WARN][4] +12 similar issues
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-glk9/igt@perf@create-destroy-userspace-config.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-glk7/igt@perf@create-destroy-userspace-config.html

  * igt@perf@gen8-unprivileged-single-ctx-counters:
    - shard-skl:          [PASS][5] -> [DMESG-WARN][6] +10 similar issues
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-skl9/igt@perf@gen8-unprivileged-single-ctx-counters.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-skl5/igt@perf@gen8-unprivileged-single-ctx-counters.html

  * igt@perf@invalid-oa-exponent:
    - shard-iclb:         [PASS][7] -> [DMESG-WARN][8] +12 similar issues
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-iclb2/igt@perf@invalid-oa-exponent.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-iclb8/igt@perf@invalid-oa-exponent.html

  * igt@perf@invalid-oa-format-id:
    - shard-kbl:          [PASS][9] -> [DMESG-WARN][10] +12 similar issues
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-kbl3/igt@perf@invalid-oa-format-id.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-kbl1/igt@perf@invalid-oa-format-id.html
    - shard-hsw:          NOTRUN -> [DMESG-WARN][11]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-hsw1/igt@perf@invalid-oa-format-id.html

  * igt@perf@invalid-oa-metric-set-id:
    - shard-skl:          [PASS][12] -> [INCOMPLETE][13] +2 similar issues
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-skl1/igt@perf@invalid-oa-metric-set-id.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-skl7/igt@perf@invalid-oa-metric-set-id.html

  * igt@perf@low-oa-exponent-permissions:
    - shard-apl:          [PASS][14] -> [DMESG-WARN][15] +11 similar issues
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-apl8/igt@perf@low-oa-exponent-permissions.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-apl6/igt@perf@low-oa-exponent-permissions.html

  * igt@runner@aborted:
    - shard-hsw:          NOTRUN -> [FAIL][16]
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-hsw4/igt@runner@aborted.html
    - shard-kbl:          NOTRUN -> [FAIL][17]
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-kbl1/igt@runner@aborted.html
    - shard-iclb:         NOTRUN -> [FAIL][18]
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-iclb1/igt@runner@aborted.html
    - shard-apl:          NOTRUN -> [FAIL][19]
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-apl1/igt@runner@aborted.html

  
#### Warnings ####

  * igt@perf@blocking:
    - shard-skl:          [FAIL][20] ([fdo#110728]) -> [DMESG-WARN][21]
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-skl10/igt@perf@blocking.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-skl3/igt@perf@blocking.html

  
Known issues
------------

  Here are the changes found in Patchwork_13480_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_await@wide-contexts:
    - shard-iclb:         [PASS][22] -> [FAIL][23] ([fdo#110769] / [fdo#110946])
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-iclb1/igt@gem_exec_await@wide-contexts.html
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-iclb2/igt@gem_exec_await@wide-contexts.html

  * igt@i915_suspend@fence-restore-untiled:
    - shard-apl:          [PASS][24] -> [DMESG-WARN][25] ([fdo#108566]) +1 similar issue
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-apl5/igt@i915_suspend@fence-restore-untiled.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-apl8/igt@i915_suspend@fence-restore-untiled.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible:
    - shard-glk:          [PASS][26] -> [FAIL][27] ([fdo#105363])
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-glk4/igt@kms_flip@flip-vs-expired-vblank-interruptible.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-glk3/igt@kms_flip@flip-vs-expired-vblank-interruptible.html

  * igt@kms_plane_alpha_blend@pipe-a-coverage-7efc:
    - shard-skl:          [PASS][28] -> [FAIL][29] ([fdo#108145])
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-skl2/igt@kms_plane_alpha_blend@pipe-a-coverage-7efc.html
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-skl4/igt@kms_plane_alpha_blend@pipe-a-coverage-7efc.html

  * igt@kms_psr2_su@page_flip:
    - shard-iclb:         [PASS][30] -> [SKIP][31] ([fdo#109642])
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-iclb2/igt@kms_psr2_su@page_flip.html
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-iclb3/igt@kms_psr2_su@page_flip.html

  * igt@kms_psr@psr2_primary_page_flip:
    - shard-iclb:         [PASS][32] -> [SKIP][33] ([fdo#109441])
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-iclb2/igt@kms_psr@psr2_primary_page_flip.html
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-iclb4/igt@kms_psr@psr2_primary_page_flip.html

  * igt@kms_setmode@basic:
    - shard-apl:          [PASS][34] -> [FAIL][35] ([fdo#99912])
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-apl4/igt@kms_setmode@basic.html
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-apl2/igt@kms_setmode@basic.html

  * igt@perf@disabled-read-error:
    - shard-kbl:          [PASS][36] -> [INCOMPLETE][37] ([fdo#103665]) +3 similar issues
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-kbl4/igt@perf@disabled-read-error.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-kbl2/igt@perf@disabled-read-error.html
    - shard-hsw:          [PASS][38] -> [INCOMPLETE][39] ([fdo#103540]) +2 similar issues
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-hsw1/igt@perf@disabled-read-error.html
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-hsw7/igt@perf@disabled-read-error.html

  * igt@perf@invalid-oa-metric-set-id:
    - shard-apl:          [PASS][40] -> [INCOMPLETE][41] ([fdo#103927]) +4 similar issues
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-apl5/igt@perf@invalid-oa-metric-set-id.html
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-apl1/igt@perf@invalid-oa-metric-set-id.html
    - shard-glk:          [PASS][42] -> [INCOMPLETE][43] ([fdo#103359] / [k.org#198133]) +3 similar issues
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-glk3/igt@perf@invalid-oa-metric-set-id.html
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-glk3/igt@perf@invalid-oa-metric-set-id.html

  * igt@perf@oa-exponents:
    - shard-skl:          [PASS][44] -> [INCOMPLETE][45] ([fdo#104108])
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-skl5/igt@perf@oa-exponents.html
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-skl8/igt@perf@oa-exponents.html

  * igt@perf@oa-formats:
    - shard-iclb:         [PASS][46] -> [INCOMPLETE][47] ([fdo#107713]) +3 similar issues
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-iclb5/igt@perf@oa-formats.html
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-iclb8/igt@perf@oa-formats.html
    - shard-hsw:          [PASS][48] -> [INCOMPLETE][49] ([fdo#103540] / [fdo#108767])
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-hsw5/igt@perf@oa-formats.html
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-hsw2/igt@perf@oa-formats.html

  
#### Possible fixes ####

  * igt@gem_exec_balancer@smoke:
    - shard-iclb:         [SKIP][50] ([fdo#110854]) -> [PASS][51]
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-iclb5/igt@gem_exec_balancer@smoke.html
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-iclb1/igt@gem_exec_balancer@smoke.html

  * igt@gem_exec_schedule@preemptive-hang-vebox:
    - shard-iclb:         [INCOMPLETE][52] ([fdo#107713]) -> [PASS][53]
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-iclb7/igt@gem_exec_schedule@preemptive-hang-vebox.html
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-iclb3/igt@gem_exec_schedule@preemptive-hang-vebox.html

  * igt@i915_selftest@mock_requests:
    - shard-skl:          [INCOMPLETE][54] ([fdo#110550]) -> [PASS][55]
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-skl2/igt@i915_selftest@mock_requests.html
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-skl7/igt@i915_selftest@mock_requests.html
    - shard-glk:          [INCOMPLETE][56] ([fdo#103359] / [k.org#198133]) -> [PASS][57]
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-glk6/igt@i915_selftest@mock_requests.html
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-glk9/igt@i915_selftest@mock_requests.html

  * igt@kms_flip@flip-vs-expired-vblank:
    - shard-skl:          [FAIL][58] ([fdo#105363]) -> [PASS][59]
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-skl5/igt@kms_flip@flip-vs-expired-vblank.html
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-skl5/igt@kms_flip@flip-vs-expired-vblank.html

  * igt@kms_flip@flip-vs-suspend:
    - shard-apl:          [DMESG-WARN][60] ([fdo#108566]) -> [PASS][61]
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-apl6/igt@kms_flip@flip-vs-suspend.html
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-apl7/igt@kms_flip@flip-vs-suspend.html

  * igt@kms_flip@plain-flip-fb-recreate-interruptible:
    - shard-hsw:          [INCOMPLETE][62] ([fdo#103540]) -> [PASS][63]
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-hsw4/igt@kms_flip@plain-flip-fb-recreate-interruptible.html
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-hsw1/igt@kms_flip@plain-flip-fb-recreate-interruptible.html

  * igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-indfb-draw-blt:
    - shard-iclb:         [FAIL][64] ([fdo#103167]) -> [PASS][65] +1 similar issue
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-iclb2/igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-indfb-draw-blt.html
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-iclb3/igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-indfb-draw-blt.html

  * igt@kms_frontbuffer_tracking@fbc-suspend:
    - shard-kbl:          [INCOMPLETE][66] ([fdo#103665]) -> [PASS][67]
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-kbl4/igt@kms_frontbuffer_tracking@fbc-suspend.html
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-kbl7/igt@kms_frontbuffer_tracking@fbc-suspend.html

  * igt@kms_plane_alpha_blend@pipe-b-constant-alpha-min:
    - shard-skl:          [FAIL][68] ([fdo#108145]) -> [PASS][69]
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-skl6/igt@kms_plane_alpha_blend@pipe-b-constant-alpha-min.html
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-skl4/igt@kms_plane_alpha_blend@pipe-b-constant-alpha-min.html

  * igt@kms_plane_lowres@pipe-a-tiling-y:
    - shard-iclb:         [FAIL][70] ([fdo#103166]) -> [PASS][71]
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-iclb5/igt@kms_plane_lowres@pipe-a-tiling-y.html
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-iclb1/igt@kms_plane_lowres@pipe-a-tiling-y.html

  * igt@kms_psr@no_drrs:
    - shard-iclb:         [FAIL][72] ([fdo#108341]) -> [PASS][73]
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-iclb1/igt@kms_psr@no_drrs.html
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-iclb2/igt@kms_psr@no_drrs.html

  * igt@kms_psr@psr2_cursor_mmap_cpu:
    - shard-iclb:         [SKIP][74] ([fdo#109441]) -> [PASS][75]
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6390/shard-iclb1/igt@kms_psr@psr2_cursor_mmap_cpu.html
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/shard-iclb2/igt@kms_psr@psr2_cursor_mmap_cpu.html

  
  [fdo#103166]: https://bugs.freedesktop.org/show_bug.cgi?id=103166
  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#103359]: https://bugs.freedesktop.org/show_bug.cgi?id=103359
  [fdo#103540]: https://bugs.freedesktop.org/show_bug.cgi?id=103540
  [fdo#103665]: https://bugs.freedesktop.org/show_bug.cgi?id=103665
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#104108]: https://bugs.freedesktop.org/show_bug.cgi?id=104108
  [fdo#105363]: https://bugs.freedesktop.org/show_bug.cgi?id=105363
  [fdo#107713]: https://bugs.freedesktop.org/show_bug.cgi?id=107713
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#108341]: https://bugs.freedesktop.org/show_bug.cgi?id=108341
  [fdo#108566]: https://bugs.freedesktop.org/show_bug.cgi?id=108566
  [fdo#108767]: https://bugs.freedesktop.org/show_bug.cgi?id=108767
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#109642]: https://bugs.freedesktop.org/show_bug.cgi?id=109642
  [fdo#110550]: https://bugs.freedesktop.org/show_bug.cgi?id=110550
  [fdo#110728]: https://bugs.freedesktop.org/show_bug.cgi?id=110728
  [fdo#110769]: https://bugs.freedesktop.org/show_bug.cgi?id=110769
  [fdo#110854]: https://bugs.freedesktop.org/show_bug.cgi?id=110854
  [fdo#110946]: https://bugs.freedesktop.org/show_bug.cgi?id=110946
  [fdo#99912]: https://bugs.freedesktop.org/show_bug.cgi?id=99912
  [k.org#198133]: https://bugzilla.kernel.org/show_bug.cgi?id=198133


Participating hosts (10 -> 10)
------------------------------

  No changes in participating hosts


Build changes
-------------

  * Linux: CI_DRM_6390 -> Patchwork_13480

  CI_DRM_6390: 4c6c23fdf450ab43bb4046ac1fb1439ebf176564 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5075: 03779dd3de8a57544f124d9952a6d2b3e34e34ca @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_13480: e4ee8661e287f57ac107e317b2adcb2fe2b364ec @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13480/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 07/11] drm/i915: add syncobj timeline support
  2019-07-01 11:34 ` [PATCH v6 07/11] drm/i915: add syncobj timeline support Lionel Landwerlin
  2019-07-01 13:13   ` Chris Wilson
  2019-07-01 13:18   ` Chris Wilson
@ 2019-07-03  8:56   ` Chris Wilson
  2019-07-03  9:17     ` Lionel Landwerlin
  2 siblings, 1 reply; 44+ messages in thread
From: Chris Wilson @ 2019-07-03  8:56 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2019-07-01 12:34:33)
> +               syncobj = drm_syncobj_find(eb->file, user_fence.handle);
> +               if (!syncobj) {
> +                       DRM_DEBUG("Invalid syncobj handle provided\n");
> +                       err = -EINVAL;
> +                       goto err;
> +               }
> +
> +               if (user_fence.flags & I915_EXEC_FENCE_WAIT) {
> +                       fence = drm_syncobj_fence_get(syncobj);
> +                       if (!fence) {
> +                               DRM_DEBUG("Syncobj handle has no fence\n");
> +                               drm_syncobj_put(syncobj);
> +                               err = -EINVAL;
> +                               goto err;
> +                       }
> +
> +                       err = dma_fence_chain_find_seqno(&fence, point);

I'm very dubious about chain_find_seqno().

It returns -EINVAL if the point is older than the first in the chain --
it is in an unknown state, but may be signaled since we remove signaled
links from the chain. If we are waiting for an already signaled syncpt,
we should not be erring out!

Do we allow later requests to insert earlier syncpt into the chain? If
so, then the request we wait on here may be woefully inaccurate and
quite easily lead to cycles in the fence tree. We have no way of
resolving such deadlocks -- we would have to treat this fence as a
foreign fence and install a backup timer. Alternatively, we only allow
this to return the exact fence for a syncpt, and proxies for the rest.

> +                       if (err || !fence) {
> +                               DRM_DEBUG("Syncobj handle missing requested point\n");
> +                               drm_syncobj_put(syncobj);
> +                               err = err != 0 ? err : -EINVAL;
> +                               goto err;
> +                       }
> +               }
> +
> +               /*
> +                * For timeline syncobjs we need to preallocate chains for
> +                * later signaling.
> +                */
> +               if (point != 0 && user_fence.flags & I915_EXEC_FENCE_SIGNAL) {
> +                       fences[n].chain_fence =
> +                               kmalloc(sizeof(*fences[n].chain_fence),
> +                                       GFP_KERNEL);
> +                       if (!fences[n].chain_fence) {
> +                               dma_fence_put(fence);
> +                               drm_syncobj_put(syncobj);
> +                               err = -ENOMEM;
> +                               DRM_DEBUG("Unable to alloc chain_fence\n");
> +                               goto err;
> +                       }

What happens if we later try to insert two fences for the same syncpt?
Should we not reserve the slot in the chain to reject duplicates?
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 07/11] drm/i915: add syncobj timeline support
  2019-07-03  8:56   ` Chris Wilson
@ 2019-07-03  9:17     ` Lionel Landwerlin
  2019-07-15 11:30       ` Koenig, Christian
  0 siblings, 1 reply; 44+ messages in thread
From: Lionel Landwerlin @ 2019-07-03  9:17 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Koenig, Christian

On 03/07/2019 11:56, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-07-01 12:34:33)
>> +               syncobj = drm_syncobj_find(eb->file, user_fence.handle);
>> +               if (!syncobj) {
>> +                       DRM_DEBUG("Invalid syncobj handle provided\n");
>> +                       err = -EINVAL;
>> +                       goto err;
>> +               }
>> +
>> +               if (user_fence.flags & I915_EXEC_FENCE_WAIT) {
>> +                       fence = drm_syncobj_fence_get(syncobj);
>> +                       if (!fence) {
>> +                               DRM_DEBUG("Syncobj handle has no fence\n");
>> +                               drm_syncobj_put(syncobj);
>> +                               err = -EINVAL;
>> +                               goto err;
>> +                       }
>> +
>> +                       err = dma_fence_chain_find_seqno(&fence, point);
> I'm very dubious about chain_find_seqno().
>
> It returns -EINVAL if the point is older than the first in the chain --
> it is in an unknown state, but may be signaled since we remove signaled
> links from the chain. If we are waiting for an already signaled syncpt,
> we should not be erring out!


You're right, I got this wrong.

We can get fence = NULL if the point is already signaled.

The easiest would be to skip it from the list, or add the stub fence.


I guess the CTS got lucky that it always got the point needed before it 
was garbage collected...


>
> Do we allow later requests to insert earlier syncpt into the chain? If
> so, then the request we wait on here may be woefully inaccurate and
> quite easily lead to cycles in the fence tree. We have no way of
> resolving such deadlocks -- we would have to treat this fence as a
> foreign fence and install a backup timer. Alternatively, we only allow
> this to return the exact fence for a syncpt, and proxies for the rest.


Adding points < latest added point is forbidden.

I wish we enforced it a bit more than what's currently done in 
drm_syncobj_add_point().

In my view we should :

     - lock the syncobj in get_timeline_fence_array() do the sanity 
check there.

     - keep the lock until we add the point to the timeline

     - unlock once added


That way we would ensure that the application cannot generate invalid 
timelines and error out if it does.

We could do the same for host signaling in 
drm_syncobj_timeline_signal_ioctl/drm_syncobj_transfer_to_timeline 
(there the locking a lot shorter).

That requires holding the lock for longer than maybe other driver would 
prefer.


Ccing Christian who can tell whether that's out of question for AMD.


Cheers,


-Lionel


>> +                       if (err || !fence) {
>> +                               DRM_DEBUG("Syncobj handle missing requested point\n");
>> +                               drm_syncobj_put(syncobj);
>> +                               err = err != 0 ? err : -EINVAL;
>> +                               goto err;
>> +                       }
>> +               }
>> +
>> +               /*
>> +                * For timeline syncobjs we need to preallocate chains for
>> +                * later signaling.
>> +                */
>> +               if (point != 0 && user_fence.flags & I915_EXEC_FENCE_SIGNAL) {
>> +                       fences[n].chain_fence =
>> +                               kmalloc(sizeof(*fences[n].chain_fence),
>> +                                       GFP_KERNEL);
>> +                       if (!fences[n].chain_fence) {
>> +                               dma_fence_put(fence);
>> +                               drm_syncobj_put(syncobj);
>> +                               err = -ENOMEM;
>> +                               DRM_DEBUG("Unable to alloc chain_fence\n");
>> +                               goto err;
>> +                       }
> What happens if we later try to insert two fences for the same syncpt?
> Should we not reserve the slot in the chain to reject duplicates?
> -Chris
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 03/11] drm/i915/perf: allow for CS OA configs to be created lazily
  2019-07-01 15:09   ` Chris Wilson
@ 2019-07-09  6:47     ` Lionel Landwerlin
  2019-07-09  8:31       ` Chris Wilson
  0 siblings, 1 reply; 44+ messages in thread
From: Lionel Landwerlin @ 2019-07-09  6:47 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 01/07/2019 18:09, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-07-01 12:34:29)
>>   struct i915_oa_config {
>> +       struct drm_i915_private *i915;
>> +
>>          char uuid[UUID_STRING_LEN + 1];
>>          int id;
>>   
>> @@ -1110,6 +1112,10 @@ struct i915_oa_config {
>>          struct attribute *attrs[2];
>>          struct device_attribute sysfs_metric_id;
>>   
>> +       struct drm_i915_gem_object *obj;
>> +
>> +       struct list_head vma_link;
>> +
>>          atomic_t ref_count;
>>   };
>> -static void free_oa_config(struct drm_i915_private *dev_priv,
>> -                          struct i915_oa_config *oa_config)
>> +static void put_oa_config(struct i915_oa_config *oa_config)
>>   {
>> +       if (!atomic_dec_and_test(&oa_config->ref_count))
>> +               return;
> I strongly advise that ref_count be replaced by struct kref, just so that
> we get the benefit of debugging.
>
> put_oa_config -> kref_put(&oa_config->ref, free_oa_config)
> (free_oa_config takes kref as its arg and uses container_of())


This is done in "drm/i915: add a new perf configuration execbuf parameter"

I'll factor it in this commit.


>
>> +int i915_perf_get_oa_config(struct drm_i915_private *i915,
>> +                           int metrics_set,
>> +                           struct i915_oa_config **out_config,
>> +                           struct drm_i915_gem_object **out_obj)
>> +{
>> +       int ret = 0;
>> +       struct i915_oa_config *oa_config;
>> +
>> +       if (!i915->perf.initialized)
>> +               return -ENODEV;
>> +
>> +       ret = mutex_lock_interruptible(&i915->perf.metrics_lock);
>>          if (ret)
>>                  return ret;
>>   
>> -       *out_config = idr_find(&dev_priv->perf.metrics_idr, metrics_set);
>> -       if (!*out_config)
>> -               ret = -EINVAL;
>> -       else
>> -               atomic_inc(&(*out_config)->ref_count);
>> +       if (metrics_set == 1) {
>> +               oa_config = &i915->perf.oa.test_config;
>> +       } else {
>> +               oa_config = idr_find(&i915->perf.metrics_idr, metrics_set);
> Why not have the builtin[1] inside the idr?


I think it was just a way to avoid removing it from the idr through 
userspace calls.


> -Chris
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 03/11] drm/i915/perf: allow for CS OA configs to be created lazily
  2019-07-01 11:34 ` [PATCH v6 03/11] drm/i915/perf: allow for CS OA configs to be created lazily Lionel Landwerlin
  2019-07-01 13:06   ` Chris Wilson
  2019-07-01 15:09   ` Chris Wilson
@ 2019-07-09  8:30   ` Chris Wilson
  2 siblings, 0 replies; 44+ messages in thread
From: Chris Wilson @ 2019-07-09  8:30 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2019-07-01 12:34:29)
> Here we introduce a mechanism by which the execbuf part of the i915
> driver will be able to request that a batch buffer containing the
> programming for a particular OA config be created.
> 
> We'll execute these OA configuration buffers right before executing a
> set of userspace commands so that a particular user batchbuffer be
> executed with a given OA configuration.
> 
> This mechanism essentially allows the userspace driver to go through
> several OA configuration without having to open/close the i915/perf
> stream.
> 
> v2: No need for locking on object OA config object creation (Chris)
>     Flush cpu mapping of OA config (Chris)
> 
> v3: Properly deal with the perf_metric lock (Chris/Lionel)
> 
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 03/11] drm/i915/perf: allow for CS OA configs to be created lazily
  2019-07-09  6:47     ` Lionel Landwerlin
@ 2019-07-09  8:31       ` Chris Wilson
  0 siblings, 0 replies; 44+ messages in thread
From: Chris Wilson @ 2019-07-09  8:31 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2019-07-09 07:47:04)
> On 01/07/2019 18:09, Chris Wilson wrote:
> > Quoting Lionel Landwerlin (2019-07-01 12:34:29)
> >> +       if (metrics_set == 1) {
> >> +               oa_config = &i915->perf.oa.test_config;
> >> +       } else {
> >> +               oa_config = idr_find(&i915->perf.metrics_idr, metrics_set);
> > Why not have the builtin[1] inside the idr?
> 
> 
> I think it was just a way to avoid removing it from the idr through 
> userspace calls.

It might just be simpler to have the filter in the ioctl?
if (arg->id <= BUILTINS)
	return -EINVAL;
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 09/11] drm/i915/perf: allow holding preemption on filtered ctx
  2019-07-01 14:37       ` Chris Wilson
@ 2019-07-09  9:18         ` Lionel Landwerlin
  0 siblings, 0 replies; 44+ messages in thread
From: Lionel Landwerlin @ 2019-07-09  9:18 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 01/07/2019 17:37, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-07-01 13:10:53)
>> On 01/07/2019 15:03, Chris Wilson wrote:
>>> Quoting Lionel Landwerlin (2019-07-01 12:34:35)
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>>> index f92bace9caff..012d6d7f54e2 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>>> @@ -2104,6 +2104,14 @@ static int eb_oa_config(struct i915_execbuffer *eb)
>>>>           if (err)
>>>>                   return err;
>>>>    
>>>> +       /*
>>>> +        * If the perf stream was opened with hold preemption, flag the
>>>> +        * request properly so that the priority of the request is bumped once
>>>> +        * it reaches the execlist ports.
>>>> +        */
>>>> +       if (eb->i915->perf.oa.exclusive_stream->hold_preemption)
>>>> +               eb->request->flags |= I915_REQUEST_FLAGS_PERF;
>>> Just to reassure myself that this is the behaviour you:
>>>
>>> If the exclusive_stream is changed before the request is executed, it is
>>> likely that we no longer notice the earlier preemption-protection. This
>>> should not matter because the listener is no longer interested in those
>>> events?
>>> -Chris
>>>
>> Yeah, dropping the perf stream before your queries complete and you're
>> in undefined behavior territory.
> Then this should do what you want, and if I break it in future, I have
> to fix it ;)
>
> Hmm, this definitely merits some selftest/igt as I am very liable to
> break it.
>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> -Chris
>

I had an IGT test that used a spinning batch for a couple of seconds 
(same trick as the noa wait patch) and verified that no other context 
would come up in the OA buffer.

This might need allowing the preempt context or whatever is used for 
hang checks.


-Lionel

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 07/11] drm/i915: add syncobj timeline support
  2019-07-03  9:17     ` Lionel Landwerlin
@ 2019-07-15 11:30       ` Koenig, Christian
  2019-07-16  8:17         ` Lionel Landwerlin
  0 siblings, 1 reply; 44+ messages in thread
From: Koenig, Christian @ 2019-07-15 11:30 UTC (permalink / raw)
  To: Lionel Landwerlin, Chris Wilson, intel-gfx

Hi Lionel,

sorry for the delayed response, I'm just back from vacation.

Am 03.07.19 um 11:17 schrieb Lionel Landwerlin:
> On 03/07/2019 11:56, Chris Wilson wrote:
>> Quoting Lionel Landwerlin (2019-07-01 12:34:33)
>>> +               syncobj = drm_syncobj_find(eb->file, 
>>> user_fence.handle);
>>> +               if (!syncobj) {
>>> +                       DRM_DEBUG("Invalid syncobj handle provided\n");
>>> +                       err = -EINVAL;
>>> +                       goto err;
>>> +               }
>>> +
>>> +               if (user_fence.flags & I915_EXEC_FENCE_WAIT) {
>>> +                       fence = drm_syncobj_fence_get(syncobj);
>>> +                       if (!fence) {
>>> +                               DRM_DEBUG("Syncobj handle has no 
>>> fence\n");
>>> +                               drm_syncobj_put(syncobj);
>>> +                               err = -EINVAL;
>>> +                               goto err;
>>> +                       }
>>> +
>>> +                       err = dma_fence_chain_find_seqno(&fence, 
>>> point);
>> I'm very dubious about chain_find_seqno().
>>
>> It returns -EINVAL if the point is older than the first in the chain --
>> it is in an unknown state, but may be signaled since we remove signaled
>> links from the chain. If we are waiting for an already signaled syncpt,
>> we should not be erring out!
>
>
> You're right, I got this wrong.
>
> We can get fence = NULL if the point is already signaled.
>
> The easiest would be to skip it from the list, or add the stub fence.
>
>
> I guess the CTS got lucky that it always got the point needed before 
> it was garbage collected...

The topmost point is never garbage collected. So IIRC the check is 
actually correct and you should never get NULL here.

>>
>> Do we allow later requests to insert earlier syncpt into the chain? If
>> so, then the request we wait on here may be woefully inaccurate and
>> quite easily lead to cycles in the fence tree. We have no way of
>> resolving such deadlocks -- we would have to treat this fence as a
>> foreign fence and install a backup timer. Alternatively, we only allow
>> this to return the exact fence for a syncpt, and proxies for the rest.
>
>
> Adding points < latest added point is forbidden.
>
> I wish we enforced it a bit more than what's currently done in 
> drm_syncobj_add_point().
>
> In my view we should :
>
>     - lock the syncobj in get_timeline_fence_array() do the sanity 
> check there.
>
>     - keep the lock until we add the point to the timeline
>
>     - unlock once added
>
>
> That way we would ensure that the application cannot generate invalid 
> timelines and error out if it does.
>
> We could do the same for host signaling in 
> drm_syncobj_timeline_signal_ioctl/drm_syncobj_transfer_to_timeline 
> (there the locking a lot shorter).
>
> That requires holding the lock for longer than maybe other driver 
> would prefer.
>
>
> Ccing Christian who can tell whether that's out of question for AMD.

Yeah, adding the lock was the only other option I could see as well, but 
we intentionally decided against that.

Since we have multiple out sync objects we would need to use a ww_mutex 
as lock here.

That in turn would result in a another rather complicated dance for 
deadlock avoidance. Something which each driver would have to implement 
correctly.

That doesn't sounds like a good idea to me just to improve error checking.

As long as it is only in the same process userspace could check that as 
well before doing the submission.

Regards,
Christian.



>
>
> Cheers,
>
>
> -Lionel
>
>
>>> +                       if (err || !fence) {
>>> +                               DRM_DEBUG("Syncobj handle missing 
>>> requested point\n");
>>> +                               drm_syncobj_put(syncobj);
>>> +                               err = err != 0 ? err : -EINVAL;
>>> +                               goto err;
>>> +                       }
>>> +               }
>>> +
>>> +               /*
>>> +                * For timeline syncobjs we need to preallocate 
>>> chains for
>>> +                * later signaling.
>>> +                */
>>> +               if (point != 0 && user_fence.flags & 
>>> I915_EXEC_FENCE_SIGNAL) {
>>> +                       fences[n].chain_fence =
>>> + kmalloc(sizeof(*fences[n].chain_fence),
>>> +                                       GFP_KERNEL);
>>> +                       if (!fences[n].chain_fence) {
>>> +                               dma_fence_put(fence);
>>> +                               drm_syncobj_put(syncobj);
>>> +                               err = -ENOMEM;
>>> +                               DRM_DEBUG("Unable to alloc 
>>> chain_fence\n");
>>> +                               goto err;
>>> +                       }
>> What happens if we later try to insert two fences for the same syncpt?
>> Should we not reserve the slot in the chain to reject duplicates?
>> -Chris
>>
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v6 07/11] drm/i915: add syncobj timeline support
  2019-07-15 11:30       ` Koenig, Christian
@ 2019-07-16  8:17         ` Lionel Landwerlin
  0 siblings, 0 replies; 44+ messages in thread
From: Lionel Landwerlin @ 2019-07-16  8:17 UTC (permalink / raw)
  To: Koenig, Christian, Chris Wilson, intel-gfx

On 15/07/2019 14:30, Koenig, Christian wrote:
> Hi Lionel,
>
> sorry for the delayed response, I'm just back from vacation.
>
> Am 03.07.19 um 11:17 schrieb Lionel Landwerlin:
>> On 03/07/2019 11:56, Chris Wilson wrote:
>>> Quoting Lionel Landwerlin (2019-07-01 12:34:33)
>>>> +               syncobj = drm_syncobj_find(eb->file,
>>>> user_fence.handle);
>>>> +               if (!syncobj) {
>>>> +                       DRM_DEBUG("Invalid syncobj handle provided\n");
>>>> +                       err = -EINVAL;
>>>> +                       goto err;
>>>> +               }
>>>> +
>>>> +               if (user_fence.flags & I915_EXEC_FENCE_WAIT) {
>>>> +                       fence = drm_syncobj_fence_get(syncobj);
>>>> +                       if (!fence) {
>>>> +                               DRM_DEBUG("Syncobj handle has no
>>>> fence\n");
>>>> +                               drm_syncobj_put(syncobj);
>>>> +                               err = -EINVAL;
>>>> +                               goto err;
>>>> +                       }
>>>> +
>>>> +                       err = dma_fence_chain_find_seqno(&fence,
>>>> point);
>>> I'm very dubious about chain_find_seqno().
>>>
>>> It returns -EINVAL if the point is older than the first in the chain --
>>> it is in an unknown state, but may be signaled since we remove signaled
>>> links from the chain. If we are waiting for an already signaled syncpt,
>>> we should not be erring out!
>>
>> You're right, I got this wrong.
>>
>> We can get fence = NULL if the point is already signaled.
>>
>> The easiest would be to skip it from the list, or add the stub fence.
>>
>>
>> I guess the CTS got lucky that it always got the point needed before
>> it was garbage collected...
> The topmost point is never garbage collected. So IIRC the check is
> actually correct and you should never get NULL here.
>
>>> Do we allow later requests to insert earlier syncpt into the chain? If
>>> so, then the request we wait on here may be woefully inaccurate and
>>> quite easily lead to cycles in the fence tree. We have no way of
>>> resolving such deadlocks -- we would have to treat this fence as a
>>> foreign fence and install a backup timer. Alternatively, we only allow
>>> this to return the exact fence for a syncpt, and proxies for the rest.
>>
>> Adding points < latest added point is forbidden.
>>
>> I wish we enforced it a bit more than what's currently done in
>> drm_syncobj_add_point().
>>
>> In my view we should :
>>
>>      - lock the syncobj in get_timeline_fence_array() do the sanity
>> check there.
>>
>>      - keep the lock until we add the point to the timeline
>>
>>      - unlock once added
>>
>>
>> That way we would ensure that the application cannot generate invalid
>> timelines and error out if it does.
>>
>> We could do the same for host signaling in
>> drm_syncobj_timeline_signal_ioctl/drm_syncobj_transfer_to_timeline
>> (there the locking a lot shorter).
>>
>> That requires holding the lock for longer than maybe other driver
>> would prefer.
>>
>>
>> Ccing Christian who can tell whether that's out of question for AMD.
> Yeah, adding the lock was the only other option I could see as well, but
> we intentionally decided against that.
>
> Since we have multiple out sync objects we would need to use a ww_mutex
> as lock here.
>
> That in turn would result in a another rather complicated dance for
> deadlock avoidance. Something which each driver would have to implement
> correctly.
>
> That doesn't sounds like a good idea to me just to improve error checking.
>
> As long as it is only in the same process userspace could check that as
> well before doing the submission.


Thanks Christian,


Would you be opposed to exposing an _locked() version of 
drm_syncobj_add_point() and have a static inline do the locking?

I don't think it would be a difference for your driver and we could add 
checking with a proxy fence Chris suggested on our side.


We could also allow do checks in drm_syncobj_timeline_signal_ioctl().


-Lionel


>
> Regards,
> Christian.
>
>
>
>>
>> Cheers,
>>
>>
>> -Lionel
>>
>>
>>>> +                       if (err || !fence) {
>>>> +                               DRM_DEBUG("Syncobj handle missing
>>>> requested point\n");
>>>> +                               drm_syncobj_put(syncobj);
>>>> +                               err = err != 0 ? err : -EINVAL;
>>>> +                               goto err;
>>>> +                       }
>>>> +               }
>>>> +
>>>> +               /*
>>>> +                * For timeline syncobjs we need to preallocate
>>>> chains for
>>>> +                * later signaling.
>>>> +                */
>>>> +               if (point != 0 && user_fence.flags &
>>>> I915_EXEC_FENCE_SIGNAL) {
>>>> +                       fences[n].chain_fence =
>>>> + kmalloc(sizeof(*fences[n].chain_fence),
>>>> +                                       GFP_KERNEL);
>>>> +                       if (!fences[n].chain_fence) {
>>>> +                               dma_fence_put(fence);
>>>> +                               drm_syncobj_put(syncobj);
>>>> +                               err = -ENOMEM;
>>>> +                               DRM_DEBUG("Unable to alloc
>>>> chain_fence\n");
>>>> +                               goto err;
>>>> +                       }
>>> What happens if we later try to insert two fences for the same syncpt?
>>> Should we not reserve the slot in the chain to reject duplicates?
>>> -Chris
>>>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2019-07-16  8:17 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-01 11:34 [PATCH v6 00/11] drm/i915: Vulkan performance query support Lionel Landwerlin
2019-07-01 11:34 ` [PATCH v6 01/11] drm/i915/perf: add missing delay for OA muxes configuration Lionel Landwerlin
2019-07-01 11:34 ` [PATCH v6 02/11] drm/i915/perf: introduce a versioning of the i915-perf uapi Lionel Landwerlin
2019-07-01 12:45   ` Chris Wilson
2019-07-01 11:34 ` [PATCH v6 03/11] drm/i915/perf: allow for CS OA configs to be created lazily Lionel Landwerlin
2019-07-01 13:06   ` Chris Wilson
2019-07-01 13:45     ` Lionel Landwerlin
2019-07-01 15:09   ` Chris Wilson
2019-07-09  6:47     ` Lionel Landwerlin
2019-07-09  8:31       ` Chris Wilson
2019-07-09  8:30   ` Chris Wilson
2019-07-01 11:34 ` [PATCH v6 04/11] drm/i915: enumerate scratch fields Lionel Landwerlin
2019-07-01 12:07   ` Chris Wilson
2019-07-01 11:34 ` [PATCH v6 05/11] drm/i915/perf: implement active wait for noa configurations Lionel Landwerlin
2019-07-01 12:43   ` Chris Wilson
2019-07-01 13:10     ` Lionel Landwerlin
2019-07-01 11:34 ` [PATCH v6 06/11] drm/i915: introduce a mechanism to extend execbuf2 Lionel Landwerlin
2019-07-01 15:17   ` Chris Wilson
2019-07-02 11:36     ` Lionel Landwerlin
2019-07-01 11:34 ` [PATCH v6 07/11] drm/i915: add syncobj timeline support Lionel Landwerlin
2019-07-01 13:13   ` Chris Wilson
2019-07-01 13:15     ` Lionel Landwerlin
2019-07-01 13:18   ` Chris Wilson
2019-07-01 13:22     ` Lionel Landwerlin
2019-07-03  8:56   ` Chris Wilson
2019-07-03  9:17     ` Lionel Landwerlin
2019-07-15 11:30       ` Koenig, Christian
2019-07-16  8:17         ` Lionel Landwerlin
2019-07-01 11:34 ` [PATCH v6 08/11] drm/i915: add a new perf configuration execbuf parameter Lionel Landwerlin
2019-07-01 12:05   ` Chris Wilson
2019-07-01 12:14     ` Lionel Landwerlin
2019-07-01 11:34 ` [PATCH v6 09/11] drm/i915/perf: allow holding preemption on filtered ctx Lionel Landwerlin
2019-07-01 12:03   ` Chris Wilson
2019-07-01 12:10     ` Lionel Landwerlin
2019-07-01 14:37       ` Chris Wilson
2019-07-09  9:18         ` Lionel Landwerlin
2019-07-01 11:34 ` [PATCH v6 10/11] drm/i915/perf: execute OA configuration from command stream Lionel Landwerlin
2019-07-01 13:32   ` Chris Wilson
2019-07-01 13:42     ` Lionel Landwerlin
2019-07-01 11:34 ` [PATCH v6 11/11] drm/i915: add support for perf configuration queries Lionel Landwerlin
2019-07-01 13:08 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Vulkan performance query support (rev6) Patchwork
2019-07-01 13:14 ` ✗ Fi.CI.SPARSE: " Patchwork
2019-07-01 13:38 ` ✓ Fi.CI.BAT: success " Patchwork
2019-07-02 18:02 ` ✗ Fi.CI.IGT: failure " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.