All of lore.kernel.org
 help / color / mirror / Atom feed
* cmdparser perf improvement
@ 2016-08-12 15:07 Chris Wilson
  2016-08-12 15:07 ` [PATCH 1/9] drm/i915/cmdparser: Make initialisation failure non-fatal Chris Wilson
                   ` (9 more replies)
  0 siblings, 10 replies; 30+ messages in thread
From: Chris Wilson @ 2016-08-12 15:07 UTC (permalink / raw)
  To: intel-gfx

From the moment the cmdparser was enabled (4.0) we got regression reports
about the performance regression, e.g. most notable on Baytrail

	http://www.spinics.net/lists/dri-devel/msg80933.html
	msg->id:1428627643.3417.22.camel@collabora.com

Whilst this doesn't make the cmdparser free, it does significantly
reduce the overhead. (The cached vmappings and better hash were tested
at the time and demonstrated to reduce the impact on the user's workload
to the point where the new kernel was an improvement over the last known
good). This builds upon the regression fixes to stop the cmdparser
falling over in the first place.
-Chris

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/9] drm/i915/cmdparser: Make initialisation failure non-fatal
  2016-08-12 15:07 cmdparser perf improvement Chris Wilson
@ 2016-08-12 15:07 ` Chris Wilson
  2016-08-15 11:56   ` Matthew Auld
  2016-08-12 15:07 ` [PATCH 2/9] drm/i915/cmdparser: Add the TIMESTAMP register for the other engines Chris Wilson
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Chris Wilson @ 2016-08-12 15:07 UTC (permalink / raw)
  To: intel-gfx

If the developer adds a register in the wrong order, we BUG during boot.
That makes development and testing very difficult. Let's be a bit more
friendly and disable the command parser with a big warning if the tables
are invalid.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 30 ++++++++++++++++++------------
 drivers/gpu/drm/i915/i915_drv.h        |  2 +-
 drivers/gpu/drm/i915/intel_engine_cs.c |  6 ++++--
 3 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index a1f4683f5c35..1882dc28c750 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -746,17 +746,15 @@ static void fini_hash_table(struct intel_engine_cs *engine)
  * Optionally initializes fields related to batch buffer command parsing in the
  * struct intel_engine_cs based on whether the platform requires software
  * command parsing.
- *
- * Return: non-zero if initialization fails
  */
-int intel_engine_init_cmd_parser(struct intel_engine_cs *engine)
+void intel_engine_init_cmd_parser(struct intel_engine_cs *engine)
 {
 	const struct drm_i915_cmd_table *cmd_tables;
 	int cmd_table_count;
 	int ret;
 
 	if (!IS_GEN7(engine->i915))
-		return 0;
+		return;
 
 	switch (engine->id) {
 	case RCS:
@@ -811,24 +809,32 @@ int intel_engine_init_cmd_parser(struct intel_engine_cs *engine)
 		break;
 	default:
 		MISSING_CASE(engine->id);
-		BUG();
+		return;
 	}
 
-	BUG_ON(!validate_cmds_sorted(engine, cmd_tables, cmd_table_count));
-	BUG_ON(!validate_regs_sorted(engine));
+	if (!hash_empty(engine->cmd_hash)) {
+		DRM_DEBUG_DRIVER("%s: no commands?\n", engine->name);
+		return;
+	}
 
-	WARN_ON(!hash_empty(engine->cmd_hash));
+	if (!validate_cmds_sorted(engine, cmd_tables, cmd_table_count)) {
+		DRM_ERROR("%s: command descriptions are not sorted\n",
+			  engine->name);
+		return;
+	}
+	if (!validate_regs_sorted(engine)) {
+		DRM_ERROR("%s: registers are not sorted\n", engine->name);
+		return;
+	}
 
 	ret = init_hash_table(engine, cmd_tables, cmd_table_count);
 	if (ret) {
-		DRM_ERROR("CMD: cmd_parser_init failed!\n");
+		DRM_ERROR("%s: initialised failed!\n", engine->name);
 		fini_hash_table(engine);
-		return ret;
+		return;
 	}
 
 	engine->needs_cmd_parser = true;
-
-	return 0;
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 52207b086286..f5b187662059 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3608,7 +3608,7 @@ const char *i915_cache_level_str(struct drm_i915_private *i915, int type);
 
 /* i915_cmd_parser.c */
 int i915_cmd_parser_get_version(struct drm_i915_private *dev_priv);
-int intel_engine_init_cmd_parser(struct intel_engine_cs *engine);
+void intel_engine_init_cmd_parser(struct intel_engine_cs *engine);
 void intel_engine_cleanup_cmd_parser(struct intel_engine_cs *engine);
 int intel_engine_cmd_parser(struct intel_engine_cs *engine,
 			    struct drm_i915_gem_object *batch_obj,
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 63440c6a6349..0eb19388eba4 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -239,6 +239,8 @@ void intel_engine_setup_common(struct intel_engine_cs *engine)
 	intel_engine_init_requests(engine);
 	intel_engine_init_hangcheck(engine);
 	i915_gem_batch_pool_init(engine, &engine->batch_pool);
+
+	intel_engine_init_cmd_parser(engine);
 }
 
 int intel_engine_create_scratch(struct intel_engine_cs *engine, int size)
@@ -305,7 +307,7 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
 	if (ret)
 		return ret;
 
-	return intel_engine_init_cmd_parser(engine);
+	return 0;
 }
 
 /**
@@ -319,8 +321,8 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine)
 {
 	intel_engine_cleanup_scratch(engine);
 
-	intel_engine_cleanup_cmd_parser(engine);
 	i915_gem_render_state_fini(engine);
 	intel_engine_fini_breadcrumbs(engine);
+	intel_engine_cleanup_cmd_parser(engine);
 	i915_gem_batch_pool_fini(&engine->batch_pool);
 }
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/9] drm/i915/cmdparser: Add the TIMESTAMP register for the other engines
  2016-08-12 15:07 cmdparser perf improvement Chris Wilson
  2016-08-12 15:07 ` [PATCH 1/9] drm/i915/cmdparser: Make initialisation failure non-fatal Chris Wilson
@ 2016-08-12 15:07 ` Chris Wilson
  2016-08-16 11:03   ` Matthew Auld
  2016-08-12 15:07 ` [PATCH 3/9] drm/i915/cmdparser: Use cached vmappings Chris Wilson
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Chris Wilson @ 2016-08-12 15:07 UTC (permalink / raw)
  To: intel-gfx

Since I have been using the BCS_TIMESTAMP to measure latency of
execution upon the blitter ring, allow regular userspace to also read
from that register. They are already allowed RCS_TIMESTAMP!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 1882dc28c750..5fbd049f8095 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -458,6 +458,7 @@ static const struct drm_i915_reg_descriptor gen7_render_regs[] = {
 	REG32(GEN7_GPGPU_DISPATCHDIMX),
 	REG32(GEN7_GPGPU_DISPATCHDIMY),
 	REG32(GEN7_GPGPU_DISPATCHDIMZ),
+	REG64_IDX(RING_TIMESTAMP, BSD_RING_BASE),
 	REG64_IDX(GEN7_SO_NUM_PRIMS_WRITTEN, 0),
 	REG64_IDX(GEN7_SO_NUM_PRIMS_WRITTEN, 1),
 	REG64_IDX(GEN7_SO_NUM_PRIMS_WRITTEN, 2),
@@ -473,6 +474,7 @@ static const struct drm_i915_reg_descriptor gen7_render_regs[] = {
 	REG32(GEN7_L3SQCREG1),
 	REG32(GEN7_L3CNTLREG2),
 	REG32(GEN7_L3CNTLREG3),
+	REG64_IDX(RING_TIMESTAMP, BLT_RING_BASE),
 };
 
 static const struct drm_i915_reg_descriptor hsw_render_regs[] = {
@@ -502,7 +504,10 @@ static const struct drm_i915_reg_descriptor hsw_render_regs[] = {
 };
 
 static const struct drm_i915_reg_descriptor gen7_blt_regs[] = {
+	REG64_IDX(RING_TIMESTAMP, RENDER_RING_BASE),
+	REG64_IDX(RING_TIMESTAMP, BSD_RING_BASE),
 	REG32(BCS_SWCTRL),
+	REG64_IDX(RING_TIMESTAMP, BLT_RING_BASE),
 };
 
 static const struct drm_i915_reg_descriptor ivb_master_regs[] = {
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/9] drm/i915/cmdparser: Use cached vmappings
  2016-08-12 15:07 cmdparser perf improvement Chris Wilson
  2016-08-12 15:07 ` [PATCH 1/9] drm/i915/cmdparser: Make initialisation failure non-fatal Chris Wilson
  2016-08-12 15:07 ` [PATCH 2/9] drm/i915/cmdparser: Add the TIMESTAMP register for the other engines Chris Wilson
@ 2016-08-12 15:07 ` Chris Wilson
  2016-08-16 10:50   ` Matthew Auld
  2016-08-16 19:04   ` Matthew Auld
  2016-08-12 15:07 ` [PATCH 4/9] drm/i915/cmdparser: Only cache the dst vmap Chris Wilson
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 30+ messages in thread
From: Chris Wilson @ 2016-08-12 15:07 UTC (permalink / raw)
  To: intel-gfx

The single largest factor in the overhead of parsing the commands is the
setup of the virtual mapping to provide a continuous block for the batch
buffer. If we keep those vmappings around (against the better judgement
of mm/vmalloc.c, which we offset by handwaving and looking suggestively
at the shrinker) we can dramatically improve the performance of the
parser for small batches (such as media workloads). Furthermore, we can
use the prepare shmem read/write functions to determine  how best we
need to clflush the range (rather than every page of the object).

The impact of caching both src/dst vmaps is +80% on ivb and +140% on byt
for the throughput on small batches. (Caching just the dst vmap and
iterating over the src, doing a page by page copy is roughly 5% slower
on both platforms. That may be an acceptable trade-off to eliminate one
cached vmapping, and we may be able to reduce the per-page copying overhead
further.) For *this* simple test case, the cmdparser is now within a
factor of 2 of ideal performance.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c     | 121 ++++++++++-------------------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   6 ++
 2 files changed, 47 insertions(+), 80 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 5fbd049f8095..545c333663c0 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -942,98 +942,57 @@ find_reg_in_tables(const struct drm_i915_reg_table *tables,
 	return NULL;
 }
 
-static u32 *vmap_batch(struct drm_i915_gem_object *obj,
-		       unsigned start, unsigned len)
-{
-	int i;
-	void *addr = NULL;
-	struct sg_page_iter sg_iter;
-	int first_page = start >> PAGE_SHIFT;
-	int last_page = (len + start + 4095) >> PAGE_SHIFT;
-	int npages = last_page - first_page;
-	struct page **pages;
-
-	pages = drm_malloc_ab(npages, sizeof(*pages));
-	if (pages == NULL) {
-		DRM_DEBUG_DRIVER("Failed to get space for pages\n");
-		goto finish;
-	}
-
-	i = 0;
-	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, first_page) {
-		pages[i++] = sg_page_iter_page(&sg_iter);
-		if (i == npages)
-			break;
-	}
-
-	addr = vmap(pages, i, 0, PAGE_KERNEL);
-	if (addr == NULL) {
-		DRM_DEBUG_DRIVER("Failed to vmap pages\n");
-		goto finish;
-	}
-
-finish:
-	if (pages)
-		drm_free_large(pages);
-	return (u32*)addr;
-}
-
-/* Returns a vmap'd pointer to dest_obj, which the caller must unmap */
-static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
+/* Returns a vmap'd pointer to dst_obj, which the caller must unmap */
+static u32 *copy_batch(struct drm_i915_gem_object *dst_obj,
 		       struct drm_i915_gem_object *src_obj,
 		       u32 batch_start_offset,
-		       u32 batch_len)
+		       u32 batch_len,
+		       bool *needs_clflush_after)
 {
-	unsigned int needs_clflush;
-	void *src_base, *src;
-	void *dst = NULL;
+	unsigned int src_needs_clflush;
+	unsigned int dst_needs_clflush;
+	void *src, *dst;
 	int ret;
 
-	if (batch_len > dest_obj->base.size ||
-	    batch_len + batch_start_offset > src_obj->base.size)
-		return ERR_PTR(-E2BIG);
-
-	if (WARN_ON(dest_obj->pages_pin_count == 0))
-		return ERR_PTR(-ENODEV);
-
-	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
-	if (ret) {
-		DRM_DEBUG_DRIVER("CMD: failed to prepare shadow batch\n");
+	ret = i915_gem_obj_prepare_shmem_read(src_obj, &src_needs_clflush);
+	if (ret)
 		return ERR_PTR(ret);
-	}
 
-	src_base = vmap_batch(src_obj, batch_start_offset, batch_len);
-	if (!src_base) {
-		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
-		ret = -ENOMEM;
+	ret = i915_gem_obj_prepare_shmem_write(dst_obj, &dst_needs_clflush);
+	if (ret) {
+		dst = ERR_PTR(ret);
 		goto unpin_src;
 	}
 
-	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
-	if (ret) {
-		DRM_DEBUG_DRIVER("CMD: Failed to set shadow batch to CPU\n");
-		goto unmap_src;
+	src = i915_gem_object_pin_map(src_obj, I915_MAP_WB);
+	if (IS_ERR(src)) {
+		dst = src;
+		goto unpin_dst;
 	}
 
-	dst = vmap_batch(dest_obj, 0, batch_len);
-	if (!dst) {
-		DRM_DEBUG_DRIVER("CMD: Failed to vmap shadow batch\n");
-		ret = -ENOMEM;
+	dst = i915_gem_object_pin_map(dst_obj, I915_MAP_WB);
+	if (IS_ERR(dst))
 		goto unmap_src;
-	}
 
-	src = src_base + offset_in_page(batch_start_offset);
-	if (needs_clflush)
+	src += batch_start_offset;
+	if (src_needs_clflush)
 		drm_clflush_virt_range(src, batch_len);
 
+	if (dst_needs_clflush & CLFLUSH_BEFORE)
+		batch_len = roundup(batch_len, boot_cpu_data.x86_clflush_size);
+
 	memcpy(dst, src, batch_len);
 
+	/* dst_obj is returned with vmap pinned */
+	*needs_clflush_after = dst_needs_clflush & CLFLUSH_AFTER;
+
 unmap_src:
-	vunmap(src_base);
+	i915_gem_object_unpin_map(src_obj);
+unpin_dst:
+	i915_gem_object_unpin_pages(dst_obj);
 unpin_src:
 	i915_gem_object_unpin_pages(src_obj);
-
-	return ret ? ERR_PTR(ret) : dst;
+	return dst;
 }
 
 static bool check_cmd(const struct intel_engine_cs *engine,
@@ -1190,16 +1149,18 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
 			    u32 batch_len,
 			    bool is_master)
 {
-	u32 *cmd, *batch_base, *batch_end;
+	u32 *cmd, *batch_end;
 	struct drm_i915_cmd_descriptor default_desc = { 0 };
 	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
+	bool needs_clflush_after = false;
 	int ret = 0;
 
-	batch_base = copy_batch(shadow_batch_obj, batch_obj,
-				batch_start_offset, batch_len);
-	if (IS_ERR(batch_base)) {
+	cmd = copy_batch(shadow_batch_obj, batch_obj,
+			 batch_start_offset, batch_len,
+			 &needs_clflush_after);
+	if (IS_ERR(cmd)) {
 		DRM_DEBUG_DRIVER("CMD: Failed to copy batch\n");
-		return PTR_ERR(batch_base);
+		return PTR_ERR(cmd);
 	}
 
 	/*
@@ -1207,9 +1168,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
 	 * large or larger and copy_batch() will write MI_NOPs to the extra
 	 * space. Parsing should be faster in some cases this way.
 	 */
-	batch_end = batch_base + (batch_len / sizeof(*batch_end));
-
-	cmd = batch_base;
+	batch_end = cmd + (batch_len / sizeof(*batch_end));
 	while (cmd < batch_end) {
 		const struct drm_i915_cmd_descriptor *desc;
 		u32 length;
@@ -1268,7 +1227,9 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
 		ret = -EINVAL;
 	}
 
-	vunmap(batch_base);
+	if (ret == 0 && needs_clflush_after)
+		drm_clflush_virt_range(shadow_batch_obj->mapping, batch_len);
+	i915_gem_object_unpin_map(shadow_batch_obj);
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 14b60350fbcc..0fbe5aae7b16 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1801,6 +1801,12 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		ret = -EINVAL;
 		goto err;
 	}
+	if (args->batch_start_offset > eb.batch->size ||
+	    args->batch_len > eb.batch->size - args->batch_start_offset) {
+		DRM_DEBUG("Attempting to use out-of-bounds batch\n");
+		ret = -EINVAL;
+		goto err;
+	}
 
 	if (intel_engine_needs_cmd_parser(eb.engine) && args->batch_len) {
 		struct i915_vma *vma;
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 4/9] drm/i915/cmdparser: Only cache the dst vmap
  2016-08-12 15:07 cmdparser perf improvement Chris Wilson
                   ` (2 preceding siblings ...)
  2016-08-12 15:07 ` [PATCH 3/9] drm/i915/cmdparser: Use cached vmappings Chris Wilson
@ 2016-08-12 15:07 ` Chris Wilson
  2016-08-17  7:51   ` Matthew Auld
  2016-08-12 15:07 ` [PATCH 5/9] drm/i915/cmdparser: Improve hash function Chris Wilson
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Chris Wilson @ 2016-08-12 15:07 UTC (permalink / raw)
  To: intel-gfx

For simplicity, we want to continue using a contiguous mapping of the
command buffer, but we can reduce the number of vmappings we hold by
switching over to a page-by-page copy from the user batch buffer to the
shadow. The cost for saving one linear mapping is about 5% in trivial
workloads - which is more or less the overhead in calling kmap_atomic().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 545c333663c0..4c903081604c 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -951,7 +951,8 @@ static u32 *copy_batch(struct drm_i915_gem_object *dst_obj,
 {
 	unsigned int src_needs_clflush;
 	unsigned int dst_needs_clflush;
-	void *src, *dst;
+	void *dst, *ptr;
+	int offset, n;
 	int ret;
 
 	ret = i915_gem_obj_prepare_shmem_read(src_obj, &src_needs_clflush);
@@ -964,30 +965,33 @@ static u32 *copy_batch(struct drm_i915_gem_object *dst_obj,
 		goto unpin_src;
 	}
 
-	src = i915_gem_object_pin_map(src_obj, I915_MAP_WB);
-	if (IS_ERR(src)) {
-		dst = src;
-		goto unpin_dst;
-	}
-
 	dst = i915_gem_object_pin_map(dst_obj, I915_MAP_WB);
 	if (IS_ERR(dst))
-		goto unmap_src;
-
-	src += batch_start_offset;
-	if (src_needs_clflush)
-		drm_clflush_virt_range(src, batch_len);
+		goto unpin_dst;
 
+	ptr = dst;
+	offset = offset_in_page(batch_start_offset);
 	if (dst_needs_clflush & CLFLUSH_BEFORE)
 		batch_len = roundup(batch_len, boot_cpu_data.x86_clflush_size);
 
-	memcpy(dst, src, batch_len);
+	for (n = batch_start_offset >> PAGE_SHIFT; batch_len; n++) {
+		int len = min_t(int, batch_len, PAGE_SIZE - offset);
+		void *vaddr;
+
+		vaddr = kmap_atomic(i915_gem_object_get_page(src_obj, n));
+		if (src_needs_clflush)
+			drm_clflush_virt_range(vaddr + offset, len);
+		memcpy(ptr, vaddr + offset, len);
+		kunmap_atomic(vaddr);
+
+		ptr += len;
+		batch_len -= len;
+		offset = 0;
+	}
 
 	/* dst_obj is returned with vmap pinned */
 	*needs_clflush_after = dst_needs_clflush & CLFLUSH_AFTER;
 
-unmap_src:
-	i915_gem_object_unpin_map(src_obj);
 unpin_dst:
 	i915_gem_object_unpin_pages(dst_obj);
 unpin_src:
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 5/9] drm/i915/cmdparser: Improve hash function
  2016-08-12 15:07 cmdparser perf improvement Chris Wilson
                   ` (3 preceding siblings ...)
  2016-08-12 15:07 ` [PATCH 4/9] drm/i915/cmdparser: Only cache the dst vmap Chris Wilson
@ 2016-08-12 15:07 ` Chris Wilson
  2016-08-12 17:42   ` Matthew Auld
  2016-08-12 15:07 ` [PATCH 6/9] drm/i915/cmdparser: Compare against the previous command descriptor Chris Wilson
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Chris Wilson @ 2016-08-12 15:07 UTC (permalink / raw)
  To: intel-gfx

The existing code's hashfunction is very suboptimal (most 3D commands
use the same bucket degrading the hash to a long list). The code even
acknowledge that the issue was known and the fix simple:

/*
 * If we attempt to generate a perfect hash, we should be able to look at bits
 * 31:29 of a command from a batch buffer and use the full mask for that
 * client. The existing INSTR_CLIENT_MASK/SHIFT defines can be used for this.
 */

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 51 +++++++++++++++++++++-------------
 1 file changed, 31 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 4c903081604c..274f2136a846 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -86,24 +86,24 @@
  * general bitmasking mechanism.
  */
 
-#define STD_MI_OPCODE_MASK  0xFF800000
-#define STD_3D_OPCODE_MASK  0xFFFF0000
-#define STD_2D_OPCODE_MASK  0xFFC00000
-#define STD_MFX_OPCODE_MASK 0xFFFF0000
+#define STD_MI_OPCODE_SHIFT  (32 - 9)
+#define STD_3D_OPCODE_SHIFT  (32 - 16)
+#define STD_2D_OPCODE_SHIFT  (32 - 10)
+#define STD_MFX_OPCODE_SHIFT (32 - 16)
 
 #define CMD(op, opm, f, lm, fl, ...)				\
 	{							\
 		.flags = (fl) | ((f) ? CMD_DESC_FIXED : 0),	\
-		.cmd = { (op), (opm) },				\
+		.cmd = { (op), ~0u << (opm) },			\
 		.length = { (lm) },				\
 		__VA_ARGS__					\
 	}
 
 /* Convenience macros to compress the tables */
-#define SMI STD_MI_OPCODE_MASK
-#define S3D STD_3D_OPCODE_MASK
-#define S2D STD_2D_OPCODE_MASK
-#define SMFX STD_MFX_OPCODE_MASK
+#define SMI STD_MI_OPCODE_SHIFT
+#define S3D STD_3D_OPCODE_SHIFT
+#define S2D STD_2D_OPCODE_SHIFT
+#define SMFX STD_MFX_OPCODE_SHIFT
 #define F true
 #define S CMD_DESC_SKIP
 #define R CMD_DESC_REJECT
@@ -696,12 +696,26 @@ struct cmd_node {
  * non-opcode bits being set. But if we don't include those bits, some 3D
  * commands may hash to the same bucket due to not including opcode bits that
  * make the command unique. For now, we will risk hashing to the same bucket.
- *
- * If we attempt to generate a perfect hash, we should be able to look at bits
- * 31:29 of a command from a batch buffer and use the full mask for that
- * client. The existing INSTR_CLIENT_MASK/SHIFT defines can be used for this.
  */
-#define CMD_HASH_MASK STD_MI_OPCODE_MASK
+static inline u32 cmd_header_key(u32 x)
+{
+	u32 shift;
+
+	switch (x >> INSTR_CLIENT_SHIFT) {
+	default:
+	case INSTR_MI_CLIENT:
+		shift = STD_MI_OPCODE_SHIFT;
+		break;
+	case INSTR_RC_CLIENT:
+		shift = STD_3D_OPCODE_SHIFT;
+		break;
+	case INSTR_BC_CLIENT:
+		shift = STD_2D_OPCODE_SHIFT;
+		break;
+	}
+
+	return x >> shift;
+}
 
 static int init_hash_table(struct intel_engine_cs *engine,
 			   const struct drm_i915_cmd_table *cmd_tables,
@@ -725,7 +739,7 @@ static int init_hash_table(struct intel_engine_cs *engine,
 
 			desc_node->desc = desc;
 			hash_add(engine->cmd_hash, &desc_node->node,
-				 desc->cmd.value & CMD_HASH_MASK);
+				 cmd_header_key(desc->cmd.value));
 		}
 	}
 
@@ -864,12 +878,9 @@ find_cmd_in_table(struct intel_engine_cs *engine,
 	struct cmd_node *desc_node;
 
 	hash_for_each_possible(engine->cmd_hash, desc_node, node,
-			       cmd_header & CMD_HASH_MASK) {
+			       cmd_header_key(cmd_header)) {
 		const struct drm_i915_cmd_descriptor *desc = desc_node->desc;
-		u32 masked_cmd = desc->cmd.mask & cmd_header;
-		u32 masked_value = desc->cmd.value & desc->cmd.mask;
-
-		if (masked_cmd == masked_value)
+		if (((cmd_header ^ desc->cmd.value) & desc->cmd.mask) == 0)
 			return desc;
 	}
 
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 6/9] drm/i915/cmdparser: Compare against the previous command descriptor
  2016-08-12 15:07 cmdparser perf improvement Chris Wilson
                   ` (4 preceding siblings ...)
  2016-08-12 15:07 ` [PATCH 5/9] drm/i915/cmdparser: Improve hash function Chris Wilson
@ 2016-08-12 15:07 ` Chris Wilson
  2016-08-15 11:00   ` Matthew Auld
  2016-08-12 15:07 ` [PATCH 7/9] drm/i915/cmdparser: Check for SKIP descriptors first Chris Wilson
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Chris Wilson @ 2016-08-12 15:07 UTC (permalink / raw)
  To: intel-gfx

On the blitter (and in test code), we see long sequences of repeated
commands, e.g. XY_PIXEL_BLT, XY_SCANLINE_BLT, or XY_SRC_COPY. For these,
we can skip the hashtable lookup by remembering the previous command
descriptor and doing a straightforward compare of the command header.
The corollary is that we need to do one extra comparison before lookup
up new commands.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 274f2136a846..3b1100a0e0cb 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -350,6 +350,9 @@ static const struct drm_i915_cmd_descriptor hsw_blt_cmds[] = {
 	CMD(  MI_LOAD_SCAN_LINES_EXCL,          SMI,   !F,  0x3F,   R  ),
 };
 
+static const struct drm_i915_cmd_descriptor noop_desc =
+	CMD(MI_NOOP, SMI, F, 1, S);
+
 #undef CMD
 #undef SMI
 #undef S3D
@@ -898,11 +901,14 @@ find_cmd_in_table(struct intel_engine_cs *engine,
 static const struct drm_i915_cmd_descriptor*
 find_cmd(struct intel_engine_cs *engine,
 	 u32 cmd_header,
+	 const struct drm_i915_cmd_descriptor *desc,
 	 struct drm_i915_cmd_descriptor *default_desc)
 {
-	const struct drm_i915_cmd_descriptor *desc;
 	u32 mask;
 
+	if (((cmd_header ^ desc->cmd.value) & desc->cmd.mask) == 0)
+		return desc;
+
 	desc = find_cmd_in_table(engine, cmd_header);
 	if (desc)
 		return desc;
@@ -911,10 +917,10 @@ find_cmd(struct intel_engine_cs *engine,
 	if (!mask)
 		return NULL;
 
-	BUG_ON(!default_desc);
-	default_desc->flags = CMD_DESC_SKIP;
+	default_desc->cmd.value = cmd_header;
+	default_desc->cmd.mask = 0xffff0000;
 	default_desc->length.mask = mask;
-
+	default_desc->flags = CMD_DESC_SKIP;
 	return default_desc;
 }
 
@@ -1165,7 +1171,8 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
 			    bool is_master)
 {
 	u32 *cmd, *batch_end;
-	struct drm_i915_cmd_descriptor default_desc = { 0 };
+	struct drm_i915_cmd_descriptor default_desc = noop_desc;
+	const struct drm_i915_cmd_descriptor *desc = &default_desc;
 	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
 	bool needs_clflush_after = false;
 	int ret = 0;
@@ -1185,13 +1192,12 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
 	 */
 	batch_end = cmd + (batch_len / sizeof(*batch_end));
 	while (cmd < batch_end) {
-		const struct drm_i915_cmd_descriptor *desc;
 		u32 length;
 
 		if (*cmd == MI_BATCH_BUFFER_END)
 			break;
 
-		desc = find_cmd(engine, *cmd, &default_desc);
+		desc = find_cmd(engine, *cmd, desc, &default_desc);
 		if (!desc) {
 			DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
 					 *cmd);
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 7/9] drm/i915/cmdparser: Check for SKIP descriptors first
  2016-08-12 15:07 cmdparser perf improvement Chris Wilson
                   ` (5 preceding siblings ...)
  2016-08-12 15:07 ` [PATCH 6/9] drm/i915/cmdparser: Compare against the previous command descriptor Chris Wilson
@ 2016-08-12 15:07 ` Chris Wilson
  2016-08-12 17:46   ` Matthew Auld
  2016-08-12 15:07 ` [PATCH 8/9] drm/i915/cmdparser: Use binary search for faster register lookup Chris Wilson
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Chris Wilson @ 2016-08-12 15:07 UTC (permalink / raw)
  To: intel-gfx

If the command descriptor says to skip it, ignore checking for anyother
other conflict.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 3b1100a0e0cb..b88607bb971a 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -1022,6 +1022,9 @@ static bool check_cmd(const struct intel_engine_cs *engine,
 		      const bool is_master,
 		      bool *oacontrol_set)
 {
+	if (desc->flags & CMD_DESC_SKIP)
+		return true;
+
 	if (desc->flags & CMD_DESC_REJECT) {
 		DRM_DEBUG_DRIVER("CMD: Rejected command: 0x%08X\n", *cmd);
 		return false;
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 8/9] drm/i915/cmdparser: Use binary search for faster register lookup
  2016-08-12 15:07 cmdparser perf improvement Chris Wilson
                   ` (6 preceding siblings ...)
  2016-08-12 15:07 ` [PATCH 7/9] drm/i915/cmdparser: Check for SKIP descriptors first Chris Wilson
@ 2016-08-12 15:07 ` Chris Wilson
  2016-08-12 16:04   ` Matthew Auld
  2016-08-12 15:07 ` [PATCH 9/9] drm/i915/cmdparser: Accelerate copies from WC memory Chris Wilson
  2016-08-12 15:35 ` ✗ Ro.CI.BAT: failure for series starting with [1/9] drm/i915/cmdparser: Make initialisation failure non-fatal Patchwork
  9 siblings, 1 reply; 30+ messages in thread
From: Chris Wilson @ 2016-08-12 15:07 UTC (permalink / raw)
  To: intel-gfx

A signifcant proportion of the cmdparsing time for some batches is the
cost to find the register in the mmiotable. We ensure that those tables
are in ascending order such that we could do a binary search if it was
ever merited. It is.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 42 ++++++++++++++++------------------
 1 file changed, 20 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index b88607bb971a..cea3ef7299cc 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -925,36 +925,37 @@ find_cmd(struct intel_engine_cs *engine,
 }
 
 static const struct drm_i915_reg_descriptor *
-find_reg(const struct drm_i915_reg_descriptor *table,
-	 int count, u32 addr)
+__find_reg(const struct drm_i915_reg_descriptor *table, int count, u32 addr)
 {
-	int i;
-
-	for (i = 0; i < count; i++) {
-		if (i915_mmio_reg_offset(table[i].addr) == addr)
-			return &table[i];
+	int start = 0, end = count;
+	while (start < end) {
+		int mid = start + (end - start) / 2;
+		int ret = addr - i915_mmio_reg_offset(table[mid].addr);
+		if (ret < 0)
+			end = mid;
+		else if (ret > 0)
+			start = mid + 1;
+		else
+			return &table[mid];
 	}
-
 	return NULL;
 }
 
 static const struct drm_i915_reg_descriptor *
-find_reg_in_tables(const struct drm_i915_reg_table *tables,
-		   int count, bool is_master, u32 addr)
+find_reg(const struct intel_engine_cs *engine, bool is_master, u32 addr)
 {
-	int i;
-	const struct drm_i915_reg_table *table;
-	const struct drm_i915_reg_descriptor *reg;
+	const struct drm_i915_reg_table *table = engine->reg_tables;
+	int count = engine->reg_table_count;
 
-	for (i = 0; i < count; i++) {
-		table = &tables[i];
+	do {
 		if (!table->master || is_master) {
-			reg = find_reg(table->regs, table->num_regs,
-				       addr);
+			const struct drm_i915_reg_descriptor *reg;
+
+			reg = __find_reg(table->regs, table->num_regs, addr);
 			if (reg != NULL)
 				return reg;
 		}
-	}
+	} while (table++, --count);
 
 	return NULL;
 }
@@ -1049,10 +1050,7 @@ static bool check_cmd(const struct intel_engine_cs *engine,
 		     offset += step) {
 			const u32 reg_addr = cmd[offset] & desc->reg.mask;
 			const struct drm_i915_reg_descriptor *reg =
-				find_reg_in_tables(engine->reg_tables,
-						   engine->reg_table_count,
-						   is_master,
-						   reg_addr);
+				find_reg(engine, is_master, reg_addr);
 
 			if (!reg) {
 				DRM_DEBUG_DRIVER("CMD: Rejected register 0x%08X in command: 0x%08X (exec_id=%d)\n",
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 9/9] drm/i915/cmdparser: Accelerate copies from WC memory
  2016-08-12 15:07 cmdparser perf improvement Chris Wilson
                   ` (7 preceding siblings ...)
  2016-08-12 15:07 ` [PATCH 8/9] drm/i915/cmdparser: Use binary search for faster register lookup Chris Wilson
@ 2016-08-12 15:07 ` Chris Wilson
  2016-08-12 15:14   ` Chris Wilson
  2016-08-17 16:33   ` Matthew Auld
  2016-08-12 15:35 ` ✗ Ro.CI.BAT: failure for series starting with [1/9] drm/i915/cmdparser: Make initialisation failure non-fatal Patchwork
  9 siblings, 2 replies; 30+ messages in thread
From: Chris Wilson @ 2016-08-12 15:07 UTC (permalink / raw)
  To: intel-gfx

If we need to use clflush to prepare our batch for reads from memory, we
can bypass the cache instead by using non-temporal copies.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 58 ++++++++++++++++++++++------------
 drivers/gpu/drm/i915/i915_debugfs.c    | 24 --------------
 drivers/gpu/drm/i915/i915_drv.c        | 19 -----------
 drivers/gpu/drm/i915/i915_gem.c        | 48 ++++++++++++++++------------
 drivers/gpu/drm/i915/i915_gem_gtt.c    | 17 +++++++---
 drivers/gpu/drm/i915/i915_gem_tiling.c |  4 ---
 drivers/gpu/drm/i915/i915_irq.c        |  2 --
 drivers/gpu/drm/i915/intel_uncore.c    |  6 ++--
 8 files changed, 81 insertions(+), 97 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index cea3ef7299cc..3244ef1401ad 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -969,8 +969,7 @@ static u32 *copy_batch(struct drm_i915_gem_object *dst_obj,
 {
 	unsigned int src_needs_clflush;
 	unsigned int dst_needs_clflush;
-	void *dst, *ptr;
-	int offset, n;
+	void *dst;
 	int ret;
 
 	ret = i915_gem_obj_prepare_shmem_read(src_obj, &src_needs_clflush);
@@ -987,24 +986,43 @@ static u32 *copy_batch(struct drm_i915_gem_object *dst_obj,
 	if (IS_ERR(dst))
 		goto unpin_dst;
 
-	ptr = dst;
-	offset = offset_in_page(batch_start_offset);
-	if (dst_needs_clflush & CLFLUSH_BEFORE)
-		batch_len = roundup(batch_len, boot_cpu_data.x86_clflush_size);
-
-	for (n = batch_start_offset >> PAGE_SHIFT; batch_len; n++) {
-		int len = min_t(int, batch_len, PAGE_SIZE - offset);
-		void *vaddr;
-
-		vaddr = kmap_atomic(i915_gem_object_get_page(src_obj, n));
-		if (src_needs_clflush)
-			drm_clflush_virt_range(vaddr + offset, len);
-		memcpy(ptr, vaddr + offset, len);
-		kunmap_atomic(vaddr);
-
-		ptr += len;
-		batch_len -= len;
-		offset = 0;
+	if (src_needs_clflush &&
+	    i915_memcpy_from_wc((void *)(uintptr_t)batch_start_offset, 0, 0)) {
+		void *src;
+
+		src = i915_gem_object_pin_map(src_obj, I915_MAP_WC);
+		if (IS_ERR(src))
+			goto shmem_copy;
+
+		i915_memcpy_from_wc(dst,
+				    src + batch_start_offset,
+				    ALIGN(batch_len, 16));
+		i915_gem_object_unpin_map(src_obj);
+	} else {
+		void *ptr;
+		int offset, n;
+
+shmem_copy:
+		offset = offset_in_page(batch_start_offset);
+		if (dst_needs_clflush & CLFLUSH_BEFORE)
+			batch_len = roundup(batch_len,
+					    boot_cpu_data.x86_clflush_size);
+
+		ptr = dst;
+		for (n = batch_start_offset >> PAGE_SHIFT; batch_len; n++) {
+			int len = min_t(int, batch_len, PAGE_SIZE - offset);
+			void *vaddr;
+
+			vaddr = kmap_atomic(i915_gem_object_get_page(src_obj, n));
+			if (src_needs_clflush)
+				drm_clflush_virt_range(vaddr + offset, len);
+			memcpy(ptr, vaddr + offset, len);
+			kunmap_atomic(vaddr);
+
+			ptr += len;
+			batch_len -= len;
+			offset = 0;
+		}
 	}
 
 	/* dst_obj is returned with vmap pinned */
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 2fe88d930ca7..8dcdc27afe80 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -715,18 +715,13 @@ static int i915_gem_seqno_info(struct seq_file *m, void *data)
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct intel_engine_cs *engine;
-	int ret;
 
-	ret = mutex_lock_interruptible(&dev->struct_mutex);
-	if (ret)
-		return ret;
 	intel_runtime_pm_get(dev_priv);
 
 	for_each_engine(engine, dev_priv)
 		i915_ring_seqno_info(m, engine);
 
 	intel_runtime_pm_put(dev_priv);
-	mutex_unlock(&dev->struct_mutex);
 
 	return 0;
 }
@@ -1379,11 +1374,7 @@ static int ironlake_drpc_info(struct seq_file *m)
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	u32 rgvmodectl, rstdbyctl;
 	u16 crstandvid;
-	int ret;
 
-	ret = mutex_lock_interruptible(&dev->struct_mutex);
-	if (ret)
-		return ret;
 	intel_runtime_pm_get(dev_priv);
 
 	rgvmodectl = I915_READ(MEMMODECTL);
@@ -1391,7 +1382,6 @@ static int ironlake_drpc_info(struct seq_file *m)
 	crstandvid = I915_READ16(CRSTANDVID);
 
 	intel_runtime_pm_put(dev_priv);
-	mutex_unlock(&dev->struct_mutex);
 
 	seq_printf(m, "HD boost: %s\n", yesno(rgvmodectl & MEMMODE_BOOST_EN));
 	seq_printf(m, "Boost freq: %d\n",
@@ -2179,11 +2169,7 @@ static int i915_swizzle_info(struct seq_file *m, void *data)
 	struct drm_info_node *node = m->private;
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = to_i915(dev);
-	int ret;
 
-	ret = mutex_lock_interruptible(&dev->struct_mutex);
-	if (ret)
-		return ret;
 	intel_runtime_pm_get(dev_priv);
 
 	seq_printf(m, "bit6 swizzle for X-tiling = %s\n",
@@ -2223,7 +2209,6 @@ static int i915_swizzle_info(struct seq_file *m, void *data)
 		seq_puts(m, "L-shaped memory detected\n");
 
 	intel_runtime_pm_put(dev_priv);
-	mutex_unlock(&dev->struct_mutex);
 
 	return 0;
 }
@@ -4729,13 +4714,9 @@ i915_wedged_set(void *data, u64 val)
 	if (i915_reset_in_progress(&dev_priv->gpu_error))
 		return -EAGAIN;
 
-	intel_runtime_pm_get(dev_priv);
-
 	i915_handle_error(dev_priv, val,
 			  "Manually setting wedged to %llu", val);
 
-	intel_runtime_pm_put(dev_priv);
-
 	return 0;
 }
 
@@ -4976,20 +4957,15 @@ i915_cache_sharing_get(void *data, u64 *val)
 	struct drm_device *dev = data;
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	u32 snpcr;
-	int ret;
 
 	if (!(IS_GEN6(dev) || IS_GEN7(dev)))
 		return -ENODEV;
 
-	ret = mutex_lock_interruptible(&dev->struct_mutex);
-	if (ret)
-		return ret;
 	intel_runtime_pm_get(dev_priv);
 
 	snpcr = I915_READ(GEN6_MBCUNIT_SNPCR);
 
 	intel_runtime_pm_put(dev_priv);
-	mutex_unlock(&dev_priv->drm.struct_mutex);
 
 	*val = (snpcr & GEN6_MBC_SNPCR_MASK) >> GEN6_MBC_SNPCR_SHIFT;
 
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index c040c6329804..b458faa0d349 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -2293,24 +2293,6 @@ static int intel_runtime_suspend(struct device *device)
 
 	DRM_DEBUG_KMS("Suspending device\n");
 
-	/*
-	 * We could deadlock here in case another thread holding struct_mutex
-	 * calls RPM suspend concurrently, since the RPM suspend will wait
-	 * first for this RPM suspend to finish. In this case the concurrent
-	 * RPM resume will be followed by its RPM suspend counterpart. Still
-	 * for consistency return -EAGAIN, which will reschedule this suspend.
-	 */
-	if (!mutex_trylock(&dev->struct_mutex)) {
-		DRM_DEBUG_KMS("device lock contention, deffering suspend\n");
-		/*
-		 * Bump the expiration timestamp, otherwise the suspend won't
-		 * be rescheduled.
-		 */
-		pm_runtime_mark_last_busy(device);
-
-		return -EAGAIN;
-	}
-
 	disable_rpm_wakeref_asserts(dev_priv);
 
 	/*
@@ -2318,7 +2300,6 @@ static int intel_runtime_suspend(struct device *device)
 	 * an RPM reference.
 	 */
 	i915_gem_release_all_mmaps(dev_priv);
-	mutex_unlock(&dev->struct_mutex);
 
 	intel_guc_suspend(dev);
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5c1acfc10bc4..a26bfd7d6aab 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1434,11 +1434,9 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		goto err;
 
-	intel_runtime_pm_get(dev_priv);
-
 	ret = i915_mutex_lock_interruptible(dev);
 	if (ret)
-		goto err_rpm;
+		goto err;
 
 	ret = -EFAULT;
 	/* We can only do the GTT pwrite on untiled buffers, as otherwise
@@ -1449,7 +1447,9 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 	 */
 	if (!i915_gem_object_has_struct_page(obj) ||
 	    cpu_write_needs_clflush(obj)) {
+		intel_runtime_pm_get(dev_priv);
 		ret = i915_gem_gtt_pwrite_fast(dev_priv, obj, args, file);
+		intel_runtime_pm_put(dev_priv);
 		/* Note that the gtt paths might fail with non-page-backed user
 		 * pointers (e.g. gtt mappings when moving data between
 		 * textures). Fallback to the shmem path in that case. */
@@ -1464,12 +1464,8 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 
 	i915_gem_object_put(obj);
 	mutex_unlock(&dev->struct_mutex);
-	intel_runtime_pm_put(dev_priv);
-
 	return ret;
 
-err_rpm:
-	intel_runtime_pm_put(dev_priv);
 err:
 	i915_gem_object_put_unlocked(obj);
 	return ret;
@@ -1833,9 +1829,13 @@ i915_gem_release_mmap(struct drm_i915_gem_object *obj)
 	/* Serialisation between user GTT access and our code depends upon
 	 * revoking the CPU's PTE whilst the mutex is held. The next user
 	 * pagefault then has to wait until we release the mutex.
+	 *
+	 * Note that RPM complicates somewhat by adding an additional
+	 * requirement that operations to the GGTT be made holding the RPM
+	 * wakeref. This in turns allow us to release the mmap from within
+	 * the RPM suspend code ignoring the struct_mutex serialisation in
+	 * lieu of the RPM barriers.
 	 */
-	lockdep_assert_held(&obj->base.dev->struct_mutex);
-
 	if (!obj->fault_mappable)
 		return;
 
@@ -1854,11 +1854,21 @@ i915_gem_release_mmap(struct drm_i915_gem_object *obj)
 	obj->fault_mappable = false;
 }
 
+static void assert_rpm_release_all_mmaps(struct drm_i915_private *dev_priv)
+{
+	assert_rpm_wakelock_held(dev_priv);
+}
+
 void
 i915_gem_release_all_mmaps(struct drm_i915_private *dev_priv)
 {
 	struct drm_i915_gem_object *obj;
 
+	/* This should only be called by RPM as we require the bound_list
+	 * to be protected by the RPM barriers and not struct_mutex.
+	 * We check that we are holding the wakeref whenever we manipulate
+	 * the dev_priv->mm.bound_list (via assert_rpm_release_all_mmaps).
+	 */
 	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list)
 		i915_gem_release_mmap(obj);
 }
@@ -2402,9 +2412,11 @@ i915_gem_object_retire__read(struct i915_gem_active *active,
 	 * so that we don't steal from recently used but inactive objects
 	 * (unless we are forced to ofc!)
 	 */
-	if (obj->bind_count)
+	if (obj->bind_count) {
+		assert_rpm_release_all_mmaps(request->i915);
 		list_move_tail(&obj->global_list,
 			       &request->i915->mm.bound_list);
+	}
 
 	if (i915_gem_object_has_active_reference(obj)) {
 		i915_gem_object_clear_active_reference(obj);
@@ -2881,9 +2893,11 @@ int i915_vma_unbind(struct i915_vma *vma)
 
 	/* Since the unbound list is global, only move to that list if
 	 * no more VMAs exist. */
-	if (--obj->bind_count == 0)
+	if (--obj->bind_count == 0) {
+		assert_rpm_release_all_mmaps(to_i915(obj->base.dev));
 		list_move_tail(&obj->global_list,
 			       &to_i915(obj->base.dev)->mm.unbound_list);
+	}
 
 	/* And finally now the object is completely decoupled from this vma,
 	 * we can drop its hold on the backing storage and allow it to be
@@ -3071,6 +3085,7 @@ search_free:
 	}
 	GEM_BUG_ON(!i915_gem_valid_gtt_space(vma, obj->cache_level));
 
+	assert_rpm_release_all_mmaps(dev_priv);
 	list_move_tail(&obj->global_list, &dev_priv->mm.bound_list);
 	list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
 	obj->bind_count++;
@@ -3420,7 +3435,6 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
 int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
 			       struct drm_file *file)
 {
-	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct drm_i915_gem_caching *args = data;
 	struct drm_i915_gem_object *obj;
 	enum i915_cache_level level;
@@ -3449,11 +3463,9 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
-	intel_runtime_pm_get(dev_priv);
-
 	ret = i915_mutex_lock_interruptible(dev);
 	if (ret)
-		goto rpm_put;
+		return ret;
 
 	obj = i915_gem_object_lookup(file, args->handle);
 	if (!obj) {
@@ -3462,13 +3474,9 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
 	}
 
 	ret = i915_gem_object_set_cache_level(obj, level);
-
 	i915_gem_object_put(obj);
 unlock:
 	mutex_unlock(&dev->struct_mutex);
-rpm_put:
-	intel_runtime_pm_put(dev_priv);
-
 	return ret;
 }
 
@@ -4174,8 +4182,6 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
 
 	kfree(obj->bit_17);
 	i915_gem_object_free(obj);
-
-	intel_runtime_pm_put(dev_priv);
 }
 
 void __i915_gem_object_release_unless_active(struct drm_i915_gem_object *obj)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index fe7f9887ee67..67a3ff960b0d 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2594,6 +2594,7 @@ static int ggtt_bind_vma(struct i915_vma *vma,
 			 enum i915_cache_level cache_level,
 			 u32 flags)
 {
+	struct drm_i915_private *i915 = to_i915(vma->vm->dev);
 	struct drm_i915_gem_object *obj = vma->obj;
 	u32 pte_flags = 0;
 	int ret;
@@ -2606,8 +2607,10 @@ static int ggtt_bind_vma(struct i915_vma *vma,
 	if (obj->gt_ro)
 		pte_flags |= PTE_READ_ONLY;
 
+	intel_runtime_pm_get(i915);
 	vma->vm->insert_entries(vma->vm, vma->pages, vma->node.start,
 				cache_level, pte_flags);
+	intel_runtime_pm_get(i915);
 
 	/*
 	 * Without aliasing PPGTT there's no difference between
@@ -2623,6 +2626,7 @@ static int aliasing_gtt_bind_vma(struct i915_vma *vma,
 				 enum i915_cache_level cache_level,
 				 u32 flags)
 {
+	struct drm_i915_private *i915 = to_i915(vma->vm->dev);
 	u32 pte_flags;
 	int ret;
 
@@ -2637,14 +2641,15 @@ static int aliasing_gtt_bind_vma(struct i915_vma *vma,
 
 
 	if (flags & I915_VMA_GLOBAL_BIND) {
+		intel_runtime_pm_get(i915);
 		vma->vm->insert_entries(vma->vm,
 					vma->pages, vma->node.start,
 					cache_level, pte_flags);
+		intel_runtime_pm_put(i915);
 	}
 
 	if (flags & I915_VMA_LOCAL_BIND) {
-		struct i915_hw_ppgtt *appgtt =
-			to_i915(vma->vm->dev)->mm.aliasing_ppgtt;
+		struct i915_hw_ppgtt *appgtt = i915->mm.aliasing_ppgtt;
 		appgtt->base.insert_entries(&appgtt->base,
 					    vma->pages, vma->node.start,
 					    cache_level, pte_flags);
@@ -2655,13 +2660,17 @@ static int aliasing_gtt_bind_vma(struct i915_vma *vma,
 
 static void ggtt_unbind_vma(struct i915_vma *vma)
 {
-	struct i915_hw_ppgtt *appgtt = to_i915(vma->vm->dev)->mm.aliasing_ppgtt;
+	struct drm_i915_private *i915 = to_i915(vma->vm->dev);
+	struct i915_hw_ppgtt *appgtt = i915->mm.aliasing_ppgtt;
 	const u64 size = min(vma->size, vma->node.size);
 
-	if (vma->flags & I915_VMA_GLOBAL_BIND)
+	if (vma->flags & I915_VMA_GLOBAL_BIND) {
+		intel_runtime_pm_get(i915);
 		vma->vm->clear_range(vma->vm,
 				     vma->node.start, size,
 				     true);
+		intel_runtime_pm_put(i915);
+	}
 
 	if (vma->flags & I915_VMA_LOCAL_BIND && appgtt)
 		appgtt->base.clear_range(&appgtt->base,
diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c
index a14b1e3d4c78..08f796a4f5f6 100644
--- a/drivers/gpu/drm/i915/i915_gem_tiling.c
+++ b/drivers/gpu/drm/i915/i915_gem_tiling.c
@@ -204,8 +204,6 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
-	intel_runtime_pm_get(dev_priv);
-
 	mutex_lock(&dev->struct_mutex);
 	if (obj->pin_display || obj->framebuffer_references) {
 		err = -EBUSY;
@@ -301,8 +299,6 @@ err:
 	i915_gem_object_put(obj);
 	mutex_unlock(&dev->struct_mutex);
 
-	intel_runtime_pm_put(dev_priv);
-
 	return err;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index ebb83d5a448b..3d9c2a21dfbd 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2523,7 +2523,6 @@ static void i915_reset_and_wakeup(struct drm_i915_private *dev_priv)
 		 * simulated reset via debugs, so get an RPM reference.
 		 */
 		intel_runtime_pm_get(dev_priv);
-
 		intel_prepare_reset(dev_priv);
 
 		/*
@@ -2535,7 +2534,6 @@ static void i915_reset_and_wakeup(struct drm_i915_private *dev_priv)
 		ret = i915_reset(dev_priv);
 
 		intel_finish_reset(dev_priv);
-
 		intel_runtime_pm_put(dev_priv);
 
 		if (ret == 0)
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 43f833901b8e..a6b04da4bf21 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1414,7 +1414,7 @@ int i915_reg_read_ioctl(struct drm_device *dev,
 	struct register_whitelist const *entry = whitelist;
 	unsigned size;
 	i915_reg_t offset_ldw, offset_udw;
-	int i, ret = 0;
+	int i, ret;
 
 	for (i = 0; i < ARRAY_SIZE(whitelist); i++, entry++) {
 		if (i915_mmio_reg_offset(entry->offset_ldw) == (reg->offset & -entry->size) &&
@@ -1436,6 +1436,7 @@ int i915_reg_read_ioctl(struct drm_device *dev,
 
 	intel_runtime_pm_get(dev_priv);
 
+	ret = 0;
 	switch (size) {
 	case 8 | 1:
 		reg->val = I915_READ64_2x32(offset_ldw, offset_udw);
@@ -1454,10 +1455,9 @@ int i915_reg_read_ioctl(struct drm_device *dev,
 		break;
 	default:
 		ret = -EINVAL;
-		goto out;
+		break;
 	}
 
-out:
 	intel_runtime_pm_put(dev_priv);
 	return ret;
 }
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 9/9] drm/i915/cmdparser: Accelerate copies from WC memory
  2016-08-12 15:07 ` [PATCH 9/9] drm/i915/cmdparser: Accelerate copies from WC memory Chris Wilson
@ 2016-08-12 15:14   ` Chris Wilson
  2016-08-17 16:33   ` Matthew Auld
  1 sibling, 0 replies; 30+ messages in thread
From: Chris Wilson @ 2016-08-12 15:14 UTC (permalink / raw)
  To: intel-gfx

On Fri, Aug 12, 2016 at 04:07:30PM +0100, Chris Wilson wrote:
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 2fe88d930ca7..8dcdc27afe80 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -715,18 +715,13 @@ static int i915_gem_seqno_info(struct seq_file *m, void *data)
>  	struct drm_device *dev = node->minor->dev;
>  	struct drm_i915_private *dev_priv = to_i915(dev);
>  	struct intel_engine_cs *engine;
> -	int ret;
>  
> -	ret = mutex_lock_interruptible(&dev->struct_mutex);
> -	if (ret)
> -		return ret;
>  	intel_runtime_pm_get(dev_priv);
>  
>  	for_each_engine(engine, dev_priv)
>  		i915_ring_seqno_info(m, engine);
>  
>  	intel_runtime_pm_put(dev_priv);
> -	mutex_unlock(&dev->struct_mutex);
....
On noes, rebase damage. /o\
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* ✗ Ro.CI.BAT: failure for series starting with [1/9] drm/i915/cmdparser: Make initialisation failure non-fatal
  2016-08-12 15:07 cmdparser perf improvement Chris Wilson
                   ` (8 preceding siblings ...)
  2016-08-12 15:07 ` [PATCH 9/9] drm/i915/cmdparser: Accelerate copies from WC memory Chris Wilson
@ 2016-08-12 15:35 ` Patchwork
  9 siblings, 0 replies; 30+ messages in thread
From: Patchwork @ 2016-08-12 15:35 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/9] drm/i915/cmdparser: Make initialisation failure non-fatal
URL   : https://patchwork.freedesktop.org/series/11031/
State : failure

== Summary ==

Applying: drm/i915/cmdparser: Make initialisation failure non-fatal
fatal: sha1 information is lacking or useless (drivers/gpu/drm/i915/i915_drv.h).
error: could not build fake ancestor
Patch failed at 0001 drm/i915/cmdparser: Make initialisation failure non-fatal
The copy of the patch that failed is found in: .git/rebase-apply/patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 8/9] drm/i915/cmdparser: Use binary search for faster register lookup
  2016-08-12 15:07 ` [PATCH 8/9] drm/i915/cmdparser: Use binary search for faster register lookup Chris Wilson
@ 2016-08-12 16:04   ` Matthew Auld
  0 siblings, 0 replies; 30+ messages in thread
From: Matthew Auld @ 2016-08-12 16:04 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

On 12 August 2016 at 16:07, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> A signifcant proportion of the cmdparsing time for some batches is the
> cost to find the register in the mmiotable. We ensure that those tables
> are in ascending order such that we could do a binary search if it was
> ever merited. It is.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cool.
s/signifcant/significant/

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/9] drm/i915/cmdparser: Improve hash function
  2016-08-12 15:07 ` [PATCH 5/9] drm/i915/cmdparser: Improve hash function Chris Wilson
@ 2016-08-12 17:42   ` Matthew Auld
  2016-08-12 17:58     ` Chris Wilson
  0 siblings, 1 reply; 30+ messages in thread
From: Matthew Auld @ 2016-08-12 17:42 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

> -#define STD_MI_OPCODE_MASK  0xFF800000
> -#define STD_3D_OPCODE_MASK  0xFFFF0000
> -#define STD_2D_OPCODE_MASK  0xFFC00000
> -#define STD_MFX_OPCODE_MASK 0xFFFF0000
> +#define STD_MI_OPCODE_SHIFT  (32 - 9)
> +#define STD_3D_OPCODE_SHIFT  (32 - 16)
> +#define STD_2D_OPCODE_SHIFT  (32 - 10)
> +#define STD_MFX_OPCODE_SHIFT (32 - 16)
Why don't we make use of this one in cmd_header_key? What client is it
supposed to map to?
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 7/9] drm/i915/cmdparser: Check for SKIP descriptors first
  2016-08-12 15:07 ` [PATCH 7/9] drm/i915/cmdparser: Check for SKIP descriptors first Chris Wilson
@ 2016-08-12 17:46   ` Matthew Auld
  0 siblings, 0 replies; 30+ messages in thread
From: Matthew Auld @ 2016-08-12 17:46 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/9] drm/i915/cmdparser: Improve hash function
  2016-08-12 17:42   ` Matthew Auld
@ 2016-08-12 17:58     ` Chris Wilson
  2016-08-12 19:47       ` Matthew Auld
  0 siblings, 1 reply; 30+ messages in thread
From: Chris Wilson @ 2016-08-12 17:58 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Intel Graphics Development

On Fri, Aug 12, 2016 at 06:42:30PM +0100, Matthew Auld wrote:
> > -#define STD_MI_OPCODE_MASK  0xFF800000
> > -#define STD_3D_OPCODE_MASK  0xFFFF0000
> > -#define STD_2D_OPCODE_MASK  0xFFC00000
> > -#define STD_MFX_OPCODE_MASK 0xFFFF0000
> > +#define STD_MI_OPCODE_SHIFT  (32 - 9)
> > +#define STD_3D_OPCODE_SHIFT  (32 - 16)
> > +#define STD_2D_OPCODE_SHIFT  (32 - 10)
> > +#define STD_MFX_OPCODE_SHIFT (32 - 16)
> Why don't we make use of this one in cmd_header_key? What client is it
> supposed to map to?

It doesn't map to its own CLIENT, it reuses the RC_CLIENT for its
commands. (iirc)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/9] drm/i915/cmdparser: Improve hash function
  2016-08-12 17:58     ` Chris Wilson
@ 2016-08-12 19:47       ` Matthew Auld
  0 siblings, 0 replies; 30+ messages in thread
From: Matthew Auld @ 2016-08-12 19:47 UTC (permalink / raw)
  To: Chris Wilson, Intel Graphics Development

On 12 August 2016 at 18:58, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Fri, Aug 12, 2016 at 06:42:30PM +0100, Matthew Auld wrote:
>> > -#define STD_MI_OPCODE_MASK  0xFF800000
>> > -#define STD_3D_OPCODE_MASK  0xFFFF0000
>> > -#define STD_2D_OPCODE_MASK  0xFFC00000
>> > -#define STD_MFX_OPCODE_MASK 0xFFFF0000
>> > +#define STD_MI_OPCODE_SHIFT  (32 - 9)
>> > +#define STD_3D_OPCODE_SHIFT  (32 - 16)
>> > +#define STD_2D_OPCODE_SHIFT  (32 - 10)
>> > +#define STD_MFX_OPCODE_SHIFT (32 - 16)
>> Why don't we make use of this one in cmd_header_key? What client is it
>> supposed to map to?
>
> It doesn't map to its own CLIENT, it reuses the RC_CLIENT for its
> commands. (iirc)
hmm, okay.

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/9] drm/i915/cmdparser: Compare against the previous command descriptor
  2016-08-12 15:07 ` [PATCH 6/9] drm/i915/cmdparser: Compare against the previous command descriptor Chris Wilson
@ 2016-08-15 11:00   ` Matthew Auld
  2016-08-15 14:37     ` Chris Wilson
  0 siblings, 1 reply; 30+ messages in thread
From: Matthew Auld @ 2016-08-15 11:00 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

On 12 August 2016 at 16:07, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On the blitter (and in test code), we see long sequences of repeated
> commands, e.g. XY_PIXEL_BLT, XY_SCANLINE_BLT, or XY_SRC_COPY. For these,
> we can skip the hashtable lookup by remembering the previous command
> descriptor and doing a straightforward compare of the command header.
> The corollary is that we need to do one extra comparison before lookup
> up new commands.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_cmd_parser.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index 274f2136a846..3b1100a0e0cb 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -350,6 +350,9 @@ static const struct drm_i915_cmd_descriptor hsw_blt_cmds[] = {
>         CMD(  MI_LOAD_SCAN_LINES_EXCL,          SMI,   !F,  0x3F,   R  ),
>  };
>
> +static const struct drm_i915_cmd_descriptor noop_desc =
> +       CMD(MI_NOOP, SMI, F, 1, S);
> +
>  #undef CMD
>  #undef SMI
>  #undef S3D
> @@ -898,11 +901,14 @@ find_cmd_in_table(struct intel_engine_cs *engine,
>  static const struct drm_i915_cmd_descriptor*
>  find_cmd(struct intel_engine_cs *engine,
>          u32 cmd_header,
> +        const struct drm_i915_cmd_descriptor *desc,
>          struct drm_i915_cmd_descriptor *default_desc)
>  {
> -       const struct drm_i915_cmd_descriptor *desc;
>         u32 mask;
>
> +       if (((cmd_header ^ desc->cmd.value) & desc->cmd.mask) == 0)
> +               return desc;
> +
>         desc = find_cmd_in_table(engine, cmd_header);
>         if (desc)
>                 return desc;
> @@ -911,10 +917,10 @@ find_cmd(struct intel_engine_cs *engine,
>         if (!mask)
>                 return NULL;
>
> -       BUG_ON(!default_desc);
Why remove this, was it overkill?

> -       default_desc->flags = CMD_DESC_SKIP;
> +       default_desc->cmd.value = cmd_header;
> +       default_desc->cmd.mask = 0xffff0000;
Where did you pluck this mask from?
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/9] drm/i915/cmdparser: Make initialisation failure non-fatal
  2016-08-12 15:07 ` [PATCH 1/9] drm/i915/cmdparser: Make initialisation failure non-fatal Chris Wilson
@ 2016-08-15 11:56   ` Matthew Auld
  2016-08-15 14:34     ` Chris Wilson
  0 siblings, 1 reply; 30+ messages in thread
From: Matthew Auld @ 2016-08-15 11:56 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

On 12 August 2016 at 16:07, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> If the developer adds a register in the wrong order, we BUG during boot.
> That makes development and testing very difficult. Let's be a bit more
> friendly and disable the command parser with a big warning if the tables
> are invalid.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_cmd_parser.c | 30 ++++++++++++++++++------------
>  drivers/gpu/drm/i915/i915_drv.h        |  2 +-
>  drivers/gpu/drm/i915/intel_engine_cs.c |  6 ++++--
>  3 files changed, 23 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index a1f4683f5c35..1882dc28c750 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -746,17 +746,15 @@ static void fini_hash_table(struct intel_engine_cs *engine)
>   * Optionally initializes fields related to batch buffer command parsing in the
>   * struct intel_engine_cs based on whether the platform requires software
>   * command parsing.
> - *
> - * Return: non-zero if initialization fails
>   */
> -int intel_engine_init_cmd_parser(struct intel_engine_cs *engine)
> +void intel_engine_init_cmd_parser(struct intel_engine_cs *engine)
>  {
>         const struct drm_i915_cmd_table *cmd_tables;
>         int cmd_table_count;
>         int ret;
>
>         if (!IS_GEN7(engine->i915))
> -               return 0;
> +               return;
>
>         switch (engine->id) {
>         case RCS:
> @@ -811,24 +809,32 @@ int intel_engine_init_cmd_parser(struct intel_engine_cs *engine)
>                 break;
>         default:
>                 MISSING_CASE(engine->id);
> -               BUG();
> +               return;
>         }
>
> -       BUG_ON(!validate_cmds_sorted(engine, cmd_tables, cmd_table_count));
> -       BUG_ON(!validate_regs_sorted(engine));
> +       if (!hash_empty(engine->cmd_hash)) {
> +               DRM_DEBUG_DRIVER("%s: no commands?\n", engine->name);
> +               return;
> +       }
"no commands?", !hash_empty should mean we already have commands, not
that we don't, right?

With that explained or fixed:
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/9] drm/i915/cmdparser: Make initialisation failure non-fatal
  2016-08-15 11:56   ` Matthew Auld
@ 2016-08-15 14:34     ` Chris Wilson
  0 siblings, 0 replies; 30+ messages in thread
From: Chris Wilson @ 2016-08-15 14:34 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Intel Graphics Development

On Mon, Aug 15, 2016 at 12:56:44PM +0100, Matthew Auld wrote:
> On 12 August 2016 at 16:07, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > If the developer adds a register in the wrong order, we BUG during boot.
> > That makes development and testing very difficult. Let's be a bit more
> > friendly and disable the command parser with a big warning if the tables
> > are invalid.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_cmd_parser.c | 30 ++++++++++++++++++------------
> >  drivers/gpu/drm/i915/i915_drv.h        |  2 +-
> >  drivers/gpu/drm/i915/intel_engine_cs.c |  6 ++++--
> >  3 files changed, 23 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > index a1f4683f5c35..1882dc28c750 100644
> > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > @@ -746,17 +746,15 @@ static void fini_hash_table(struct intel_engine_cs *engine)
> >   * Optionally initializes fields related to batch buffer command parsing in the
> >   * struct intel_engine_cs based on whether the platform requires software
> >   * command parsing.
> > - *
> > - * Return: non-zero if initialization fails
> >   */
> > -int intel_engine_init_cmd_parser(struct intel_engine_cs *engine)
> > +void intel_engine_init_cmd_parser(struct intel_engine_cs *engine)
> >  {
> >         const struct drm_i915_cmd_table *cmd_tables;
> >         int cmd_table_count;
> >         int ret;
> >
> >         if (!IS_GEN7(engine->i915))
> > -               return 0;
> > +               return;
> >
> >         switch (engine->id) {
> >         case RCS:
> > @@ -811,24 +809,32 @@ int intel_engine_init_cmd_parser(struct intel_engine_cs *engine)
> >                 break;
> >         default:
> >                 MISSING_CASE(engine->id);
> > -               BUG();
> > +               return;
> >         }
> >
> > -       BUG_ON(!validate_cmds_sorted(engine, cmd_tables, cmd_table_count));
> > -       BUG_ON(!validate_regs_sorted(engine));
> > +       if (!hash_empty(engine->cmd_hash)) {
> > +               DRM_DEBUG_DRIVER("%s: no commands?\n", engine->name);
> > +               return;
> > +       }
> "no commands?", !hash_empty should mean we already have commands, not
> that we don't, right?

Oh, it should be "inited twice?"; return;

We can just kill it since we should catch that error earlier and now the
command parser is initialised from the common engine setup the error is
even less likely to ever be hit.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/9] drm/i915/cmdparser: Compare against the previous command descriptor
  2016-08-15 11:00   ` Matthew Auld
@ 2016-08-15 14:37     ` Chris Wilson
  2016-08-17  7:55       ` Matthew Auld
  0 siblings, 1 reply; 30+ messages in thread
From: Chris Wilson @ 2016-08-15 14:37 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Intel Graphics Development

On Mon, Aug 15, 2016 at 12:00:32PM +0100, Matthew Auld wrote:
> On 12 August 2016 at 16:07, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > On the blitter (and in test code), we see long sequences of repeated
> > commands, e.g. XY_PIXEL_BLT, XY_SCANLINE_BLT, or XY_SRC_COPY. For these,
> > we can skip the hashtable lookup by remembering the previous command
> > descriptor and doing a straightforward compare of the command header.
> > The corollary is that we need to do one extra comparison before lookup
> > up new commands.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_cmd_parser.c | 20 +++++++++++++-------
> >  1 file changed, 13 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > index 274f2136a846..3b1100a0e0cb 100644
> > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > @@ -350,6 +350,9 @@ static const struct drm_i915_cmd_descriptor hsw_blt_cmds[] = {
> >         CMD(  MI_LOAD_SCAN_LINES_EXCL,          SMI,   !F,  0x3F,   R  ),
> >  };
> >
> > +static const struct drm_i915_cmd_descriptor noop_desc =
> > +       CMD(MI_NOOP, SMI, F, 1, S);
> > +
> >  #undef CMD
> >  #undef SMI
> >  #undef S3D
> > @@ -898,11 +901,14 @@ find_cmd_in_table(struct intel_engine_cs *engine,
> >  static const struct drm_i915_cmd_descriptor*
> >  find_cmd(struct intel_engine_cs *engine,
> >          u32 cmd_header,
> > +        const struct drm_i915_cmd_descriptor *desc,
> >          struct drm_i915_cmd_descriptor *default_desc)
> >  {
> > -       const struct drm_i915_cmd_descriptor *desc;
> >         u32 mask;
> >
> > +       if (((cmd_header ^ desc->cmd.value) & desc->cmd.mask) == 0)
> > +               return desc;
> > +
> >         desc = find_cmd_in_table(engine, cmd_header);
> >         if (desc)
> >                 return desc;
> > @@ -911,10 +917,10 @@ find_cmd(struct intel_engine_cs *engine,
> >         if (!mask)
> >                 return NULL;
> >
> > -       BUG_ON(!default_desc);
> Why remove this, was it overkill?

The NULL dereference on the next line is all we need to debug this, i.e.
it gives us no more information and we know by construction it can never
be NULL.

> 
> > -       default_desc->flags = CMD_DESC_SKIP;
> > +       default_desc->cmd.value = cmd_header;
> > +       default_desc->cmd.mask = 0xffff0000;
> Where did you pluck this mask from?

It is the most general cmd header mask.

	#define MIN_OPCODE_SHIFT 16
	cmd.mask = ~0u << MIN_OPCODE_SHIFT
?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/9] drm/i915/cmdparser: Use cached vmappings
  2016-08-12 15:07 ` [PATCH 3/9] drm/i915/cmdparser: Use cached vmappings Chris Wilson
@ 2016-08-16 10:50   ` Matthew Auld
  2016-08-16 10:57     ` Chris Wilson
  2016-08-16 19:04   ` Matthew Auld
  1 sibling, 1 reply; 30+ messages in thread
From: Matthew Auld @ 2016-08-16 10:50 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

> +       if (dst_needs_clflush & CLFLUSH_BEFORE)
> +               batch_len = roundup(batch_len, boot_cpu_data.x86_clflush_size);
> +
>         memcpy(dst, src, batch_len);
>
> +       /* dst_obj is returned with vmap pinned */
> +       *needs_clflush_after = dst_needs_clflush & CLFLUSH_AFTER;
Where did CLFLUSH_{BEFORE, AFTER} come from, I can't find them?
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/9] drm/i915/cmdparser: Use cached vmappings
  2016-08-16 10:50   ` Matthew Auld
@ 2016-08-16 10:57     ` Chris Wilson
  0 siblings, 0 replies; 30+ messages in thread
From: Chris Wilson @ 2016-08-16 10:57 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Intel Graphics Development

On Tue, Aug 16, 2016 at 11:50:29AM +0100, Matthew Auld wrote:
> > +       if (dst_needs_clflush & CLFLUSH_BEFORE)
> > +               batch_len = roundup(batch_len, boot_cpu_data.x86_clflush_size);
> > +
> >         memcpy(dst, src, batch_len);
> >
> > +       /* dst_obj is returned with vmap pinned */
> > +       *needs_clflush_after = dst_needs_clflush & CLFLUSH_AFTER;
> Where did CLFLUSH_{BEFORE, AFTER} come from, I can't find them?

They are part of the required coherency fixes earlier in the bigger
series.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/9] drm/i915/cmdparser: Add the TIMESTAMP register for the other engines
  2016-08-12 15:07 ` [PATCH 2/9] drm/i915/cmdparser: Add the TIMESTAMP register for the other engines Chris Wilson
@ 2016-08-16 11:03   ` Matthew Auld
  0 siblings, 0 replies; 30+ messages in thread
From: Matthew Auld @ 2016-08-16 11:03 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/9] drm/i915/cmdparser: Use cached vmappings
  2016-08-12 15:07 ` [PATCH 3/9] drm/i915/cmdparser: Use cached vmappings Chris Wilson
  2016-08-16 10:50   ` Matthew Auld
@ 2016-08-16 19:04   ` Matthew Auld
  2016-08-16 19:10     ` Chris Wilson
  1 sibling, 1 reply; 30+ messages in thread
From: Matthew Auld @ 2016-08-16 19:04 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

> +       if (dst_needs_clflush & CLFLUSH_BEFORE)
> +               batch_len = roundup(batch_len, boot_cpu_data.x86_clflush_size);
hmm, this bit doesn't seem obvious to me. What am I missing?
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/9] drm/i915/cmdparser: Use cached vmappings
  2016-08-16 19:04   ` Matthew Auld
@ 2016-08-16 19:10     ` Chris Wilson
  2016-08-17  7:35       ` Matthew Auld
  0 siblings, 1 reply; 30+ messages in thread
From: Chris Wilson @ 2016-08-16 19:10 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Intel Graphics Development

On Tue, Aug 16, 2016 at 08:04:35PM +0100, Matthew Auld wrote:
> > +       if (dst_needs_clflush & CLFLUSH_BEFORE)
> > +               batch_len = roundup(batch_len, boot_cpu_data.x86_clflush_size);
> hmm, this bit doesn't seem obvious to me. What am I missing?

The code is optimized to work on cachelines (to work on a partial
cacheline requires a flush before the read). We know that the batch is
in whole pages and so we can always round up to the end of the next
cacheline. We also know that we can read more than the declared length
of the batch into the local page as we only validate as much as
required. So it is safe to read from beyond the end of the batch, and to
do so avoids having to insert clflushes before the read.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/9] drm/i915/cmdparser: Use cached vmappings
  2016-08-16 19:10     ` Chris Wilson
@ 2016-08-17  7:35       ` Matthew Auld
  0 siblings, 0 replies; 30+ messages in thread
From: Matthew Auld @ 2016-08-17  7:35 UTC (permalink / raw)
  To: Chris Wilson, Intel Graphics Development

On 16 August 2016 at 20:10, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Tue, Aug 16, 2016 at 08:04:35PM +0100, Matthew Auld wrote:
>> > +       if (dst_needs_clflush & CLFLUSH_BEFORE)
>> > +               batch_len = roundup(batch_len, boot_cpu_data.x86_clflush_size);
>> hmm, this bit doesn't seem obvious to me. What am I missing?
>
> The code is optimized to work on cachelines (to work on a partial
> cacheline requires a flush before the read). We know that the batch is
> in whole pages and so we can always round up to the end of the next
> cacheline. We also know that we can read more than the declared length
> of the batch into the local page as we only validate as much as
> required. So it is safe to read from beyond the end of the batch, and to
> do so avoids having to insert clflushes before the read.
Ah, right ofc.

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/9] drm/i915/cmdparser: Only cache the dst vmap
  2016-08-12 15:07 ` [PATCH 4/9] drm/i915/cmdparser: Only cache the dst vmap Chris Wilson
@ 2016-08-17  7:51   ` Matthew Auld
  0 siblings, 0 replies; 30+ messages in thread
From: Matthew Auld @ 2016-08-17  7:51 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/9] drm/i915/cmdparser: Compare against the previous command descriptor
  2016-08-15 14:37     ` Chris Wilson
@ 2016-08-17  7:55       ` Matthew Auld
  0 siblings, 0 replies; 30+ messages in thread
From: Matthew Auld @ 2016-08-17  7:55 UTC (permalink / raw)
  To: Chris Wilson, Intel Graphics Development

>> > -       default_desc->flags = CMD_DESC_SKIP;
>> > +       default_desc->cmd.value = cmd_header;
>> > +       default_desc->cmd.mask = 0xffff0000;
>> Where did you pluck this mask from?
>
> It is the most general cmd header mask.
>
>         #define MIN_OPCODE_SHIFT 16
>         cmd.mask = ~0u << MIN_OPCODE_SHIFT
> ?
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 9/9] drm/i915/cmdparser: Accelerate copies from WC memory
  2016-08-12 15:07 ` [PATCH 9/9] drm/i915/cmdparser: Accelerate copies from WC memory Chris Wilson
  2016-08-12 15:14   ` Chris Wilson
@ 2016-08-17 16:33   ` Matthew Auld
  1 sibling, 0 replies; 30+ messages in thread
From: Matthew Auld @ 2016-08-17 16:33 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

On 12 August 2016 at 16:07, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> If we need to use clflush to prepare our batch for reads from memory, we
> can bypass the cache instead by using non-temporal copies.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_cmd_parser.c | 58 ++++++++++++++++++++++------------
>  drivers/gpu/drm/i915/i915_debugfs.c    | 24 --------------
>  drivers/gpu/drm/i915/i915_drv.c        | 19 -----------
>  drivers/gpu/drm/i915/i915_gem.c        | 48 ++++++++++++++++------------
>  drivers/gpu/drm/i915/i915_gem_gtt.c    | 17 +++++++---
>  drivers/gpu/drm/i915/i915_gem_tiling.c |  4 ---
>  drivers/gpu/drm/i915/i915_irq.c        |  2 --
>  drivers/gpu/drm/i915/intel_uncore.c    |  6 ++--
>  8 files changed, 81 insertions(+), 97 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index cea3ef7299cc..3244ef1401ad 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -969,8 +969,7 @@ static u32 *copy_batch(struct drm_i915_gem_object *dst_obj,
>  {
>         unsigned int src_needs_clflush;
>         unsigned int dst_needs_clflush;
> -       void *dst, *ptr;
> -       int offset, n;
> +       void *dst;
>         int ret;
>
>         ret = i915_gem_obj_prepare_shmem_read(src_obj, &src_needs_clflush);
> @@ -987,24 +986,43 @@ static u32 *copy_batch(struct drm_i915_gem_object *dst_obj,
>         if (IS_ERR(dst))
>                 goto unpin_dst;
>
> -       ptr = dst;
> -       offset = offset_in_page(batch_start_offset);
> -       if (dst_needs_clflush & CLFLUSH_BEFORE)
> -               batch_len = roundup(batch_len, boot_cpu_data.x86_clflush_size);
> -
> -       for (n = batch_start_offset >> PAGE_SHIFT; batch_len; n++) {
> -               int len = min_t(int, batch_len, PAGE_SIZE - offset);
> -               void *vaddr;
> -
> -               vaddr = kmap_atomic(i915_gem_object_get_page(src_obj, n));
> -               if (src_needs_clflush)
> -                       drm_clflush_virt_range(vaddr + offset, len);
> -               memcpy(ptr, vaddr + offset, len);
> -               kunmap_atomic(vaddr);
> -
> -               ptr += len;
> -               batch_len -= len;
> -               offset = 0;
> +       if (src_needs_clflush &&
> +           i915_memcpy_from_wc((void *)(uintptr_t)batch_start_offset, 0, 0)) {
> +               void *src;
> +
> +               src = i915_gem_object_pin_map(src_obj, I915_MAP_WC);
> +               if (IS_ERR(src))
> +                       goto shmem_copy;
> +
> +               i915_memcpy_from_wc(dst,
> +                                   src + batch_start_offset,
> +                                   ALIGN(batch_len, 16));
> +               i915_gem_object_unpin_map(src_obj);
> +       } else {
> +               void *ptr;
> +               int offset, n;
> +
> +shmem_copy:
I think Joonas may shed another tear at the sight of this :)

> +               offset = offset_in_page(batch_start_offset);
> +               if (dst_needs_clflush & CLFLUSH_BEFORE)
> +                       batch_len = roundup(batch_len,
> +                                           boot_cpu_data.x86_clflush_size);
> +
> +               ptr = dst;
> +               for (n = batch_start_offset >> PAGE_SHIFT; batch_len; n++) {
> +                       int len = min_t(int, batch_len, PAGE_SIZE - offset);
> +                       void *vaddr;
> +
> +                       vaddr = kmap_atomic(i915_gem_object_get_page(src_obj, n));
> +                       if (src_needs_clflush)
> +                               drm_clflush_virt_range(vaddr + offset, len);
> +                       memcpy(ptr, vaddr + offset, len);
> +                       kunmap_atomic(vaddr);
> +
> +                       ptr += len;
> +                       batch_len -= len;
> +                       offset = 0;
> +               }
>         }
>

Disregarding the rest, which seems unrelated to this patch.

Reviewed-by: Matthew Auld <matthew.auld@intel.com>

>         /* dst_obj is returned with vmap pinned */
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 2fe88d930ca7..8dcdc27afe80 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -715,18 +715,13 @@ static int i915_gem_seqno_info(struct seq_file *m, void *data)
>         struct drm_device *dev = node->minor->dev;
>         struct drm_i915_private *dev_priv = to_i915(dev);
>         struct intel_engine_cs *engine;
> -       int ret;
>
> -       ret = mutex_lock_interruptible(&dev->struct_mutex);
> -       if (ret)
> -               return ret;
>         intel_runtime_pm_get(dev_priv);
>
>         for_each_engine(engine, dev_priv)
>                 i915_ring_seqno_info(m, engine);
>
>         intel_runtime_pm_put(dev_priv);
> -       mutex_unlock(&dev->struct_mutex);
>
>         return 0;
>  }
> @@ -1379,11 +1374,7 @@ static int ironlake_drpc_info(struct seq_file *m)
>         struct drm_i915_private *dev_priv = to_i915(dev);
>         u32 rgvmodectl, rstdbyctl;
>         u16 crstandvid;
> -       int ret;
>
> -       ret = mutex_lock_interruptible(&dev->struct_mutex);
> -       if (ret)
> -               return ret;
>         intel_runtime_pm_get(dev_priv);
>
>         rgvmodectl = I915_READ(MEMMODECTL);
> @@ -1391,7 +1382,6 @@ static int ironlake_drpc_info(struct seq_file *m)
>         crstandvid = I915_READ16(CRSTANDVID);
>
>         intel_runtime_pm_put(dev_priv);
> -       mutex_unlock(&dev->struct_mutex);
>
>         seq_printf(m, "HD boost: %s\n", yesno(rgvmodectl & MEMMODE_BOOST_EN));
>         seq_printf(m, "Boost freq: %d\n",
> @@ -2179,11 +2169,7 @@ static int i915_swizzle_info(struct seq_file *m, void *data)
>         struct drm_info_node *node = m->private;
>         struct drm_device *dev = node->minor->dev;
>         struct drm_i915_private *dev_priv = to_i915(dev);
> -       int ret;
>
> -       ret = mutex_lock_interruptible(&dev->struct_mutex);
> -       if (ret)
> -               return ret;
>         intel_runtime_pm_get(dev_priv);
>
>         seq_printf(m, "bit6 swizzle for X-tiling = %s\n",
> @@ -2223,7 +2209,6 @@ static int i915_swizzle_info(struct seq_file *m, void *data)
>                 seq_puts(m, "L-shaped memory detected\n");
>
>         intel_runtime_pm_put(dev_priv);
> -       mutex_unlock(&dev->struct_mutex);
>
>         return 0;
>  }
> @@ -4729,13 +4714,9 @@ i915_wedged_set(void *data, u64 val)
>         if (i915_reset_in_progress(&dev_priv->gpu_error))
>                 return -EAGAIN;
>
> -       intel_runtime_pm_get(dev_priv);
> -
>         i915_handle_error(dev_priv, val,
>                           "Manually setting wedged to %llu", val);
>
> -       intel_runtime_pm_put(dev_priv);
> -
>         return 0;
>  }
>
> @@ -4976,20 +4957,15 @@ i915_cache_sharing_get(void *data, u64 *val)
>         struct drm_device *dev = data;
>         struct drm_i915_private *dev_priv = to_i915(dev);
>         u32 snpcr;
> -       int ret;
>
>         if (!(IS_GEN6(dev) || IS_GEN7(dev)))
>                 return -ENODEV;
>
> -       ret = mutex_lock_interruptible(&dev->struct_mutex);
> -       if (ret)
> -               return ret;
>         intel_runtime_pm_get(dev_priv);
>
>         snpcr = I915_READ(GEN6_MBCUNIT_SNPCR);
>
>         intel_runtime_pm_put(dev_priv);
> -       mutex_unlock(&dev_priv->drm.struct_mutex);
>
>         *val = (snpcr & GEN6_MBC_SNPCR_MASK) >> GEN6_MBC_SNPCR_SHIFT;
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index c040c6329804..b458faa0d349 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -2293,24 +2293,6 @@ static int intel_runtime_suspend(struct device *device)
>
>         DRM_DEBUG_KMS("Suspending device\n");
>
> -       /*
> -        * We could deadlock here in case another thread holding struct_mutex
> -        * calls RPM suspend concurrently, since the RPM suspend will wait
> -        * first for this RPM suspend to finish. In this case the concurrent
> -        * RPM resume will be followed by its RPM suspend counterpart. Still
> -        * for consistency return -EAGAIN, which will reschedule this suspend.
> -        */
> -       if (!mutex_trylock(&dev->struct_mutex)) {
> -               DRM_DEBUG_KMS("device lock contention, deffering suspend\n");
> -               /*
> -                * Bump the expiration timestamp, otherwise the suspend won't
> -                * be rescheduled.
> -                */
> -               pm_runtime_mark_last_busy(device);
> -
> -               return -EAGAIN;
> -       }
> -
>         disable_rpm_wakeref_asserts(dev_priv);
>
>         /*
> @@ -2318,7 +2300,6 @@ static int intel_runtime_suspend(struct device *device)
>          * an RPM reference.
>          */
>         i915_gem_release_all_mmaps(dev_priv);
> -       mutex_unlock(&dev->struct_mutex);
>
>         intel_guc_suspend(dev);
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 5c1acfc10bc4..a26bfd7d6aab 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1434,11 +1434,9 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
>         if (ret)
>                 goto err;
>
> -       intel_runtime_pm_get(dev_priv);
> -
>         ret = i915_mutex_lock_interruptible(dev);
>         if (ret)
> -               goto err_rpm;
> +               goto err;
>
>         ret = -EFAULT;
>         /* We can only do the GTT pwrite on untiled buffers, as otherwise
> @@ -1449,7 +1447,9 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
>          */
>         if (!i915_gem_object_has_struct_page(obj) ||
>             cpu_write_needs_clflush(obj)) {
> +               intel_runtime_pm_get(dev_priv);
>                 ret = i915_gem_gtt_pwrite_fast(dev_priv, obj, args, file);
> +               intel_runtime_pm_put(dev_priv);
>                 /* Note that the gtt paths might fail with non-page-backed user
>                  * pointers (e.g. gtt mappings when moving data between
>                  * textures). Fallback to the shmem path in that case. */
> @@ -1464,12 +1464,8 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
>
>         i915_gem_object_put(obj);
>         mutex_unlock(&dev->struct_mutex);
> -       intel_runtime_pm_put(dev_priv);
> -
>         return ret;
>
> -err_rpm:
> -       intel_runtime_pm_put(dev_priv);
>  err:
>         i915_gem_object_put_unlocked(obj);
>         return ret;
> @@ -1833,9 +1829,13 @@ i915_gem_release_mmap(struct drm_i915_gem_object *obj)
>         /* Serialisation between user GTT access and our code depends upon
>          * revoking the CPU's PTE whilst the mutex is held. The next user
>          * pagefault then has to wait until we release the mutex.
> +        *
> +        * Note that RPM complicates somewhat by adding an additional
> +        * requirement that operations to the GGTT be made holding the RPM
> +        * wakeref. This in turns allow us to release the mmap from within
> +        * the RPM suspend code ignoring the struct_mutex serialisation in
> +        * lieu of the RPM barriers.
>          */
> -       lockdep_assert_held(&obj->base.dev->struct_mutex);
> -
>         if (!obj->fault_mappable)
>                 return;
>
> @@ -1854,11 +1854,21 @@ i915_gem_release_mmap(struct drm_i915_gem_object *obj)
>         obj->fault_mappable = false;
>  }
>
> +static void assert_rpm_release_all_mmaps(struct drm_i915_private *dev_priv)
> +{
> +       assert_rpm_wakelock_held(dev_priv);
> +}
> +
>  void
>  i915_gem_release_all_mmaps(struct drm_i915_private *dev_priv)
>  {
>         struct drm_i915_gem_object *obj;
>
> +       /* This should only be called by RPM as we require the bound_list
> +        * to be protected by the RPM barriers and not struct_mutex.
> +        * We check that we are holding the wakeref whenever we manipulate
> +        * the dev_priv->mm.bound_list (via assert_rpm_release_all_mmaps).
> +        */
>         list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list)
>                 i915_gem_release_mmap(obj);
>  }
> @@ -2402,9 +2412,11 @@ i915_gem_object_retire__read(struct i915_gem_active *active,
>          * so that we don't steal from recently used but inactive objects
>          * (unless we are forced to ofc!)
>          */
> -       if (obj->bind_count)
> +       if (obj->bind_count) {
> +               assert_rpm_release_all_mmaps(request->i915);
>                 list_move_tail(&obj->global_list,
>                                &request->i915->mm.bound_list);
> +       }
>
>         if (i915_gem_object_has_active_reference(obj)) {
>                 i915_gem_object_clear_active_reference(obj);
> @@ -2881,9 +2893,11 @@ int i915_vma_unbind(struct i915_vma *vma)
>
>         /* Since the unbound list is global, only move to that list if
>          * no more VMAs exist. */
> -       if (--obj->bind_count == 0)
> +       if (--obj->bind_count == 0) {
> +               assert_rpm_release_all_mmaps(to_i915(obj->base.dev));
>                 list_move_tail(&obj->global_list,
>                                &to_i915(obj->base.dev)->mm.unbound_list);
> +       }
>
>         /* And finally now the object is completely decoupled from this vma,
>          * we can drop its hold on the backing storage and allow it to be
> @@ -3071,6 +3085,7 @@ search_free:
>         }
>         GEM_BUG_ON(!i915_gem_valid_gtt_space(vma, obj->cache_level));
>
> +       assert_rpm_release_all_mmaps(dev_priv);
>         list_move_tail(&obj->global_list, &dev_priv->mm.bound_list);
>         list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
>         obj->bind_count++;
> @@ -3420,7 +3435,6 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
>  int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
>                                struct drm_file *file)
>  {
> -       struct drm_i915_private *dev_priv = to_i915(dev);
>         struct drm_i915_gem_caching *args = data;
>         struct drm_i915_gem_object *obj;
>         enum i915_cache_level level;
> @@ -3449,11 +3463,9 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
>                 return -EINVAL;
>         }
>
> -       intel_runtime_pm_get(dev_priv);
> -
>         ret = i915_mutex_lock_interruptible(dev);
>         if (ret)
> -               goto rpm_put;
> +               return ret;
>
>         obj = i915_gem_object_lookup(file, args->handle);
>         if (!obj) {
> @@ -3462,13 +3474,9 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
>         }
>
>         ret = i915_gem_object_set_cache_level(obj, level);
> -
>         i915_gem_object_put(obj);
>  unlock:
>         mutex_unlock(&dev->struct_mutex);
> -rpm_put:
> -       intel_runtime_pm_put(dev_priv);
> -
>         return ret;
>  }
>
> @@ -4174,8 +4182,6 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
>
>         kfree(obj->bit_17);
>         i915_gem_object_free(obj);
> -
> -       intel_runtime_pm_put(dev_priv);
>  }
>
>  void __i915_gem_object_release_unless_active(struct drm_i915_gem_object *obj)
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index fe7f9887ee67..67a3ff960b0d 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -2594,6 +2594,7 @@ static int ggtt_bind_vma(struct i915_vma *vma,
>                          enum i915_cache_level cache_level,
>                          u32 flags)
>  {
> +       struct drm_i915_private *i915 = to_i915(vma->vm->dev);
>         struct drm_i915_gem_object *obj = vma->obj;
>         u32 pte_flags = 0;
>         int ret;
> @@ -2606,8 +2607,10 @@ static int ggtt_bind_vma(struct i915_vma *vma,
>         if (obj->gt_ro)
>                 pte_flags |= PTE_READ_ONLY;
>
> +       intel_runtime_pm_get(i915);
>         vma->vm->insert_entries(vma->vm, vma->pages, vma->node.start,
>                                 cache_level, pte_flags);
> +       intel_runtime_pm_get(i915);
>
>         /*
>          * Without aliasing PPGTT there's no difference between
> @@ -2623,6 +2626,7 @@ static int aliasing_gtt_bind_vma(struct i915_vma *vma,
>                                  enum i915_cache_level cache_level,
>                                  u32 flags)
>  {
> +       struct drm_i915_private *i915 = to_i915(vma->vm->dev);
>         u32 pte_flags;
>         int ret;
>
> @@ -2637,14 +2641,15 @@ static int aliasing_gtt_bind_vma(struct i915_vma *vma,
>
>
>         if (flags & I915_VMA_GLOBAL_BIND) {
> +               intel_runtime_pm_get(i915);
>                 vma->vm->insert_entries(vma->vm,
>                                         vma->pages, vma->node.start,
>                                         cache_level, pte_flags);
> +               intel_runtime_pm_put(i915);
>         }
>
>         if (flags & I915_VMA_LOCAL_BIND) {
> -               struct i915_hw_ppgtt *appgtt =
> -                       to_i915(vma->vm->dev)->mm.aliasing_ppgtt;
> +               struct i915_hw_ppgtt *appgtt = i915->mm.aliasing_ppgtt;
>                 appgtt->base.insert_entries(&appgtt->base,
>                                             vma->pages, vma->node.start,
>                                             cache_level, pte_flags);
> @@ -2655,13 +2660,17 @@ static int aliasing_gtt_bind_vma(struct i915_vma *vma,
>
>  static void ggtt_unbind_vma(struct i915_vma *vma)
>  {
> -       struct i915_hw_ppgtt *appgtt = to_i915(vma->vm->dev)->mm.aliasing_ppgtt;
> +       struct drm_i915_private *i915 = to_i915(vma->vm->dev);
> +       struct i915_hw_ppgtt *appgtt = i915->mm.aliasing_ppgtt;
>         const u64 size = min(vma->size, vma->node.size);
>
> -       if (vma->flags & I915_VMA_GLOBAL_BIND)
> +       if (vma->flags & I915_VMA_GLOBAL_BIND) {
> +               intel_runtime_pm_get(i915);
>                 vma->vm->clear_range(vma->vm,
>                                      vma->node.start, size,
>                                      true);
> +               intel_runtime_pm_put(i915);
> +       }
>
>         if (vma->flags & I915_VMA_LOCAL_BIND && appgtt)
>                 appgtt->base.clear_range(&appgtt->base,
> diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c
> index a14b1e3d4c78..08f796a4f5f6 100644
> --- a/drivers/gpu/drm/i915/i915_gem_tiling.c
> +++ b/drivers/gpu/drm/i915/i915_gem_tiling.c
> @@ -204,8 +204,6 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
>                 return -EINVAL;
>         }
>
> -       intel_runtime_pm_get(dev_priv);
> -
>         mutex_lock(&dev->struct_mutex);
>         if (obj->pin_display || obj->framebuffer_references) {
>                 err = -EBUSY;
> @@ -301,8 +299,6 @@ err:
>         i915_gem_object_put(obj);
>         mutex_unlock(&dev->struct_mutex);
>
> -       intel_runtime_pm_put(dev_priv);
> -
>         return err;
>  }
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index ebb83d5a448b..3d9c2a21dfbd 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2523,7 +2523,6 @@ static void i915_reset_and_wakeup(struct drm_i915_private *dev_priv)
>                  * simulated reset via debugs, so get an RPM reference.
>                  */
>                 intel_runtime_pm_get(dev_priv);
> -
>                 intel_prepare_reset(dev_priv);
>
>                 /*
> @@ -2535,7 +2534,6 @@ static void i915_reset_and_wakeup(struct drm_i915_private *dev_priv)
>                 ret = i915_reset(dev_priv);
>
>                 intel_finish_reset(dev_priv);
> -
>                 intel_runtime_pm_put(dev_priv);
>
>                 if (ret == 0)
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 43f833901b8e..a6b04da4bf21 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -1414,7 +1414,7 @@ int i915_reg_read_ioctl(struct drm_device *dev,
>         struct register_whitelist const *entry = whitelist;
>         unsigned size;
>         i915_reg_t offset_ldw, offset_udw;
> -       int i, ret = 0;
> +       int i, ret;
>
>         for (i = 0; i < ARRAY_SIZE(whitelist); i++, entry++) {
>                 if (i915_mmio_reg_offset(entry->offset_ldw) == (reg->offset & -entry->size) &&
> @@ -1436,6 +1436,7 @@ int i915_reg_read_ioctl(struct drm_device *dev,
>
>         intel_runtime_pm_get(dev_priv);
>
> +       ret = 0;
>         switch (size) {
>         case 8 | 1:
>                 reg->val = I915_READ64_2x32(offset_ldw, offset_udw);
> @@ -1454,10 +1455,9 @@ int i915_reg_read_ioctl(struct drm_device *dev,
>                 break;
>         default:
>                 ret = -EINVAL;
> -               goto out;
> +               break;
>         }
>
> -out:
>         intel_runtime_pm_put(dev_priv);
>         return ret;
>  }
> --
> 2.8.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2016-08-17 16:33 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-12 15:07 cmdparser perf improvement Chris Wilson
2016-08-12 15:07 ` [PATCH 1/9] drm/i915/cmdparser: Make initialisation failure non-fatal Chris Wilson
2016-08-15 11:56   ` Matthew Auld
2016-08-15 14:34     ` Chris Wilson
2016-08-12 15:07 ` [PATCH 2/9] drm/i915/cmdparser: Add the TIMESTAMP register for the other engines Chris Wilson
2016-08-16 11:03   ` Matthew Auld
2016-08-12 15:07 ` [PATCH 3/9] drm/i915/cmdparser: Use cached vmappings Chris Wilson
2016-08-16 10:50   ` Matthew Auld
2016-08-16 10:57     ` Chris Wilson
2016-08-16 19:04   ` Matthew Auld
2016-08-16 19:10     ` Chris Wilson
2016-08-17  7:35       ` Matthew Auld
2016-08-12 15:07 ` [PATCH 4/9] drm/i915/cmdparser: Only cache the dst vmap Chris Wilson
2016-08-17  7:51   ` Matthew Auld
2016-08-12 15:07 ` [PATCH 5/9] drm/i915/cmdparser: Improve hash function Chris Wilson
2016-08-12 17:42   ` Matthew Auld
2016-08-12 17:58     ` Chris Wilson
2016-08-12 19:47       ` Matthew Auld
2016-08-12 15:07 ` [PATCH 6/9] drm/i915/cmdparser: Compare against the previous command descriptor Chris Wilson
2016-08-15 11:00   ` Matthew Auld
2016-08-15 14:37     ` Chris Wilson
2016-08-17  7:55       ` Matthew Auld
2016-08-12 15:07 ` [PATCH 7/9] drm/i915/cmdparser: Check for SKIP descriptors first Chris Wilson
2016-08-12 17:46   ` Matthew Auld
2016-08-12 15:07 ` [PATCH 8/9] drm/i915/cmdparser: Use binary search for faster register lookup Chris Wilson
2016-08-12 16:04   ` Matthew Auld
2016-08-12 15:07 ` [PATCH 9/9] drm/i915/cmdparser: Accelerate copies from WC memory Chris Wilson
2016-08-12 15:14   ` Chris Wilson
2016-08-17 16:33   ` Matthew Auld
2016-08-12 15:35 ` ✗ Ro.CI.BAT: failure for series starting with [1/9] drm/i915/cmdparser: Make initialisation failure non-fatal Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.