All of lore.kernel.org
 help / color / mirror / Atom feed
* cmdparser overhead reduction
@ 2015-11-20 10:55 Chris Wilson
  2015-11-20 10:55 ` [PATCH v2 1/6] drm/i915: Eliminate vmap overhead for cmd parser Chris Wilson
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Chris Wilson @ 2015-11-20 10:55 UTC (permalink / raw)
  To: intel-gfx

I spent yonks trying to define tests that produce reliable results for
demonstrating the impact of the cmdparser, that don't require inspection
of a perf profile. So far, with any reliability (because gen7 thermal
throttling makes life difficult) I can demonstrate the impact of using
vmap + WC. Improving the hash function still relies on inspecting the
perf profile of real applications (i.e. games) where the easiest metrics
to gather such as frame times are dominated by the render time. Nor do I
have a metric that is sensitive to timing, such as the bug reported in

"libva decoding performance regression with kernel 4.0-rc"
1428627643.3417.22.camel@collabora.com

What I can demonstrate is that eliminating the vmap overhead affects
throughput by about 2x on small batches, and using WC on byt further
improves throughput by about 30%. And from that bug report thread,
applying the patches prevented the missed deadlines.

Despite all of this the cmdparser still imposes severe overhead (e.g.
throughput reduction of 2x on batches).
-Chris

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 1/6] drm/i915: Eliminate vmap overhead for cmd parser
  2015-11-20 10:55 cmdparser overhead reduction Chris Wilson
@ 2015-11-20 10:55 ` Chris Wilson
  2015-11-20 14:41   ` Ville Syrjälä
  2015-11-20 10:55 ` [PATCH v2 2/6] drm/i915: Cache last cmd descriptor when parsing Chris Wilson
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Chris Wilson @ 2015-11-20 10:55 UTC (permalink / raw)
  To: intel-gfx

With a little complexity to handle cmds straddling page boundaries, we
can completely avoiding having to vmap the batch and the shadow batch
objects whilst running the command parser.

On ivb i7-3720MQ:

x11perf -dot before 54.3M, after 53.2M (max 203M)
glxgears before 7110 fps, after 7300 fps (max 7860 fps)

Before:
Time to blt 16384 bytes x      1:	 12.400µs, 1.2GiB/s
Time to blt 16384 bytes x   4096:	  3.055µs, 5.0GiB/s

After:
Time to blt 16384 bytes x      1:	  8.600µs, 1.8GiB/s
Time to blt 16384 bytes x   4096:	  2.456µs, 6.2GiB/s

Removing the vmap is mostly a win, except we lose in a few cases where
the batch size is greater than a page due to the extra complexity (loss
of a simple cache efficient large copy, and boundary handling).

v2: Reorder so that we do check oacontrol remaining set at end-of-batch
v3: Stage the copy through a temporary page so that we only copy into
dst commands that have been verified.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 305 +++++++++++++++++----------------
 1 file changed, 153 insertions(+), 152 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 814d894ed925..db3a04ea036a 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -860,100 +860,6 @@ find_reg(const struct drm_i915_reg_descriptor *table,
 	return NULL;
 }
 
-static u32 *vmap_batch(struct drm_i915_gem_object *obj,
-		       unsigned start, unsigned len)
-{
-	int i;
-	void *addr = NULL;
-	struct sg_page_iter sg_iter;
-	int first_page = start >> PAGE_SHIFT;
-	int last_page = (len + start + 4095) >> PAGE_SHIFT;
-	int npages = last_page - first_page;
-	struct page **pages;
-
-	pages = drm_malloc_ab(npages, sizeof(*pages));
-	if (pages == NULL) {
-		DRM_DEBUG_DRIVER("Failed to get space for pages\n");
-		goto finish;
-	}
-
-	i = 0;
-	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, first_page) {
-		pages[i++] = sg_page_iter_page(&sg_iter);
-		if (i == npages)
-			break;
-	}
-
-	addr = vmap(pages, i, 0, PAGE_KERNEL);
-	if (addr == NULL) {
-		DRM_DEBUG_DRIVER("Failed to vmap pages\n");
-		goto finish;
-	}
-
-finish:
-	if (pages)
-		drm_free_large(pages);
-	return (u32*)addr;
-}
-
-/* Returns a vmap'd pointer to dest_obj, which the caller must unmap */
-static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
-		       struct drm_i915_gem_object *src_obj,
-		       u32 batch_start_offset,
-		       u32 batch_len)
-{
-	int needs_clflush = 0;
-	void *src_base, *src;
-	void *dst = NULL;
-	int ret;
-
-	if (batch_len > dest_obj->base.size ||
-	    batch_len + batch_start_offset > src_obj->base.size)
-		return ERR_PTR(-E2BIG);
-
-	if (WARN_ON(dest_obj->pages_pin_count == 0))
-		return ERR_PTR(-ENODEV);
-
-	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
-	if (ret) {
-		DRM_DEBUG_DRIVER("CMD: failed to prepare shadow batch\n");
-		return ERR_PTR(ret);
-	}
-
-	src_base = vmap_batch(src_obj, batch_start_offset, batch_len);
-	if (!src_base) {
-		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
-		ret = -ENOMEM;
-		goto unpin_src;
-	}
-
-	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
-	if (ret) {
-		DRM_DEBUG_DRIVER("CMD: Failed to set shadow batch to CPU\n");
-		goto unmap_src;
-	}
-
-	dst = vmap_batch(dest_obj, 0, batch_len);
-	if (!dst) {
-		DRM_DEBUG_DRIVER("CMD: Failed to vmap shadow batch\n");
-		ret = -ENOMEM;
-		goto unmap_src;
-	}
-
-	src = src_base + offset_in_page(batch_start_offset);
-	if (needs_clflush)
-		drm_clflush_virt_range(src, batch_len);
-
-	memcpy(dst, src, batch_len);
-
-unmap_src:
-	vunmap(src_base);
-unpin_src:
-	i915_gem_object_unpin_pages(src_obj);
-
-	return ret ? ERR_PTR(ret) : dst;
-}
-
 /**
  * i915_needs_cmd_parser() - should a given ring use software command parsing?
  * @ring: the ring in question
@@ -1097,6 +1003,7 @@ static bool check_cmd(const struct intel_engine_cs *ring,
 }
 
 #define LENGTH_BIAS 2
+#define MAX_PARTIAL 256
 
 /**
  * i915_parse_cmds() - parse a submitted batch buffer for privilege violations
@@ -1120,86 +1027,180 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 		    u32 batch_len,
 		    bool is_master)
 {
-	u32 *cmd, *batch_base, *batch_end;
+	const struct drm_i915_cmd_descriptor *desc;
+	unsigned dst_iter, src_iter;
+	int needs_clflush = 0;
+	struct get_page rewind;
+	void *src, *dst, *tmp;
+	u32 partial, length;
+	unsigned in, out;
 	struct drm_i915_cmd_descriptor default_desc = { 0 };
 	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
 	int ret = 0;
 
-	batch_base = copy_batch(shadow_batch_obj, batch_obj,
-				batch_start_offset, batch_len);
-	if (IS_ERR(batch_base)) {
-		DRM_DEBUG_DRIVER("CMD: Failed to copy batch\n");
-		return PTR_ERR(batch_base);
+	if (batch_len > shadow_batch_obj->base.size ||
+	    batch_len + batch_start_offset > batch_obj->base.size)
+		return -E2BIG;
+
+	if (WARN_ON(shadow_batch_obj->pages_pin_count == 0))
+		return -ENODEV;
+
+	ret = i915_gem_obj_prepare_shmem_read(batch_obj, &needs_clflush);
+	if (ret) {
+		DRM_DEBUG_DRIVER("CMD: failed to prepare shadow batch\n");
+		return ret;
+	}
+
+	ret = i915_gem_object_set_to_cpu_domain(shadow_batch_obj, true);
+	if (ret) {
+		DRM_DEBUG_DRIVER("CMD: Failed to set shadow batch to CPU\n");
+		goto unpin;
 	}
 
+	tmp = kmalloc(PAGE_SIZE + MAX_PARTIAL, GFP_KERNEL);
+	if (tmp == NULL)
+		return -ENOMEM;
+
 	/*
 	 * We use the batch length as size because the shadow object is as
 	 * large or larger and copy_batch() will write MI_NOPs to the extra
 	 * space. Parsing should be faster in some cases this way.
 	 */
-	batch_end = batch_base + (batch_len / sizeof(*batch_end));
+	ret = -EINVAL;
+	rewind = batch_obj->get_page;
+
+	dst_iter = 0;
+	dst = kmap_atomic(i915_gem_object_get_page(shadow_batch_obj, dst_iter));
+	out = 0;
+
+	in = offset_in_page(batch_start_offset);
+	partial = 0;
+	for (src_iter = batch_start_offset >> PAGE_SHIFT;
+	     src_iter < batch_obj->base.size >> PAGE_SHIFT;
+	     src_iter++) {
+		u32 *cmd, *page_end, *batch_end;
+		u32 this;
+
+		this = batch_len;
+		if (this > PAGE_SIZE - in)
+			this = PAGE_SIZE - in;
+
+		src = kmap_atomic(i915_gem_object_get_page(batch_obj, src_iter));
+		if (needs_clflush)
+			drm_clflush_virt_range(src + in, this);
+
+		if (this == PAGE_SIZE && partial == 0)
+			copy_page(tmp, src);
+		else
+			memcpy(tmp+partial, src+in, this);
+
+		cmd = tmp;
+		page_end = tmp + this + partial;
+		batch_end = tmp + batch_len + partial;
+		partial = 0;
+
+		do {
+			if (*cmd == MI_BATCH_BUFFER_END) {
+				if (oacontrol_set) {
+					DRM_DEBUG_DRIVER("CMD: batch set OACONTROL but did not clear it\n");
+					goto unmap;
+				}
 
-	cmd = batch_base;
-	while (cmd < batch_end) {
-		const struct drm_i915_cmd_descriptor *desc;
-		u32 length;
+				cmd++; /* copy this cmd to dst */
+				batch_len = this; /* no more src */
+				ret = 0;
+				break;
+			}
 
-		if (*cmd == MI_BATCH_BUFFER_END)
-			break;
+			desc = find_cmd(ring, *cmd, &default_desc);
+			if (!desc) {
+				DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
+						 *cmd);
+				goto unmap;
+			}
 
-		desc = find_cmd(ring, *cmd, &default_desc);
-		if (!desc) {
-			DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
-					 *cmd);
-			ret = -EINVAL;
-			break;
-		}
+			/*
+			 * If the batch buffer contains a chained batch, return an
+			 * error that tells the caller to abort and dispatch the
+			 * workload as a non-secure batch.
+			 */
+			if (desc->cmd.value == MI_BATCH_BUFFER_START) {
+				ret = -EACCES;
+				goto unmap;
+			}
 
-		/*
-		 * If the batch buffer contains a chained batch, return an
-		 * error that tells the caller to abort and dispatch the
-		 * workload as a non-secure batch.
-		 */
-		if (desc->cmd.value == MI_BATCH_BUFFER_START) {
-			ret = -EACCES;
-			break;
-		}
+			if (desc->flags & CMD_DESC_FIXED)
+				length = desc->length.fixed;
+			else
+				length = ((*cmd & desc->length.mask) + LENGTH_BIAS);
 
-		if (desc->flags & CMD_DESC_FIXED)
-			length = desc->length.fixed;
-		else
-			length = ((*cmd & desc->length.mask) + LENGTH_BIAS);
-
-		if ((batch_end - cmd) < length) {
-			DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X length=%u batchlen=%td\n",
-					 *cmd,
-					 length,
-					 batch_end - cmd);
-			ret = -EINVAL;
-			break;
-		}
+			if (unlikely(cmd + length > page_end)) {
+				if (unlikely(cmd + length > batch_end)) {
+					DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X length=%u batchlen=%td\n",
+							 *cmd, length, batch_end - cmd);
+					goto unmap;
+				}
 
-		if (!check_cmd(ring, desc, cmd, length, is_master,
-			       &oacontrol_set)) {
-			ret = -EINVAL;
-			break;
-		}
+				if (WARN_ON(length > MAX_PARTIAL)) {
+					ret = -ENODEV;
+					goto unmap;
+				}
 
-		cmd += length;
-	}
+				partial = 4*(page_end - cmd);
+				break;
+			}
 
-	if (oacontrol_set) {
-		DRM_DEBUG_DRIVER("CMD: batch set OACONTROL but did not clear it\n");
-		ret = -EINVAL;
-	}
+			if (!check_cmd(ring, desc, cmd, length, is_master,
+				       &oacontrol_set))
+				goto unmap;
+
+			cmd += length;
+		} while (cmd < page_end);
+
+		/* Copy the validated page to the GPU batch */
+		{
+			/* exclude partial cmd, we copy it next time */
+			unsigned dst_length = (void *)cmd - tmp;
+			in = 0;
+			do {
+				int len;
+
+				if (out == PAGE_SIZE) {
+					kunmap_atomic(dst);
+					dst = kmap_atomic(i915_gem_object_get_page(shadow_batch_obj, ++dst_iter));
+					out = 0;
+				}
 
-	if (cmd >= batch_end) {
-		DRM_DEBUG_DRIVER("CMD: Got to the end of the buffer w/o a BBE cmd!\n");
-		ret = -EINVAL;
-	}
+				len = dst_length;
+				if (len + out > PAGE_SIZE)
+					len = PAGE_SIZE - out;
+				if (len == PAGE_SIZE)
+					copy_page(dst, tmp + in);
+				else
+					memcpy(dst + out, tmp + in, len);
+				in += len;
+				out += len;
+				dst_length -= len;
+			} while (dst_length);
+		}
+
+		batch_len -= this;
+		if (batch_len == 0)
+			break;
 
-	vunmap(batch_base);
+		if (partial)
+			memmove(tmp, cmd, partial);
 
+		kunmap_atomic(src);
+		in = 0;
+	}
+unmap:
+	kunmap_atomic(src);
+	kunmap_atomic(dst);
+	batch_obj->get_page = rewind;
+	kfree(tmp);
+unpin:
+	i915_gem_object_unpin_pages(batch_obj);
 	return ret;
 }
 
-- 
2.6.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 2/6] drm/i915: Cache last cmd descriptor when parsing
  2015-11-20 10:55 cmdparser overhead reduction Chris Wilson
  2015-11-20 10:55 ` [PATCH v2 1/6] drm/i915: Eliminate vmap overhead for cmd parser Chris Wilson
@ 2015-11-20 10:55 ` Chris Wilson
  2015-11-20 15:08   ` Ville Syrjälä
  2015-12-01 17:30   ` Ville Syrjälä
  2015-11-20 10:55 ` [PATCH v2 3/6] drm/i915: Use WC copies on !llc platforms for the command parser Chris Wilson
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 26+ messages in thread
From: Chris Wilson @ 2015-11-20 10:55 UTC (permalink / raw)
  To: intel-gfx

The cmd parser has the biggest impact on the BLT ring, because it is
relatively verbose composed to the other engines as the vertex data is
inline. It also typically has runs of repeating commands (again since
the vertex data is inline, it typically has sequences of XY_SETUP_BLT,
XY_SCANLINE_BLT..) We can easily reduce the impact of cmdparsing on
benchmarks by then caching the last descriptor and comparing it against
the next command header. To get maximum benefit, we also want to take
advantage of skipping a few validations and length determinations if the
header is unchanged between commands.

ivb i7-3720QM:
x11perf -dot: before 52.3M, after 124M (max 203M)
glxgears: before 7310 fps, after 7550 fps (max 7860 fps)

v2: Fix initial cached cmd descriptor to match MI_NOOP.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 133 +++++++++++++++------------------
 drivers/gpu/drm/i915/i915_drv.h        |  10 +--
 2 files changed, 64 insertions(+), 79 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index db3a04ea036a..c6f6d9f2b2ce 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -794,6 +794,9 @@ void i915_cmd_parser_fini_ring(struct intel_engine_cs *ring)
 	fini_hash_table(ring);
 }
 
+/*
+ * Returns a pointer to a descriptor for the command specified by cmd_header.
+ */
 static const struct drm_i915_cmd_descriptor*
 find_cmd_in_table(struct intel_engine_cs *ring,
 		  u32 cmd_header)
@@ -813,37 +816,6 @@ find_cmd_in_table(struct intel_engine_cs *ring,
 	return NULL;
 }
 
-/*
- * Returns a pointer to a descriptor for the command specified by cmd_header.
- *
- * The caller must supply space for a default descriptor via the default_desc
- * parameter. If no descriptor for the specified command exists in the ring's
- * command parser tables, this function fills in default_desc based on the
- * ring's default length encoding and returns default_desc.
- */
-static const struct drm_i915_cmd_descriptor*
-find_cmd(struct intel_engine_cs *ring,
-	 u32 cmd_header,
-	 struct drm_i915_cmd_descriptor *default_desc)
-{
-	const struct drm_i915_cmd_descriptor *desc;
-	u32 mask;
-
-	desc = find_cmd_in_table(ring, cmd_header);
-	if (desc)
-		return desc;
-
-	mask = ring->get_cmd_length_mask(cmd_header);
-	if (!mask)
-		return NULL;
-
-	BUG_ON(!default_desc);
-	default_desc->flags = CMD_DESC_SKIP;
-	default_desc->length.mask = mask;
-
-	return default_desc;
-}
-
 static const struct drm_i915_reg_descriptor *
 find_reg(const struct drm_i915_reg_descriptor *table,
 	 int count, u32 addr)
@@ -886,17 +858,6 @@ static bool check_cmd(const struct intel_engine_cs *ring,
 		      const bool is_master,
 		      bool *oacontrol_set)
 {
-	if (desc->flags & CMD_DESC_REJECT) {
-		DRM_DEBUG_DRIVER("CMD: Rejected command: 0x%08X\n", *cmd);
-		return false;
-	}
-
-	if ((desc->flags & CMD_DESC_MASTER) && !is_master) {
-		DRM_DEBUG_DRIVER("CMD: Rejected master-only command: 0x%08X\n",
-				 *cmd);
-		return false;
-	}
-
 	if (desc->flags & CMD_DESC_REGISTER) {
 		/*
 		 * Get the distance between individual register offset
@@ -1027,14 +988,15 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 		    u32 batch_len,
 		    bool is_master)
 {
-	const struct drm_i915_cmd_descriptor *desc;
+	struct drm_i915_cmd_descriptor default_desc = { CMD_DESC_SKIP };
+	const struct drm_i915_cmd_descriptor *desc = &default_desc;
+	u32 last_cmd_header = 0;
 	unsigned dst_iter, src_iter;
 	int needs_clflush = 0;
 	struct get_page rewind;
 	void *src, *dst, *tmp;
-	u32 partial, length;
+	u32 partial, length = 1;
 	unsigned in, out;
-	struct drm_i915_cmd_descriptor default_desc = { 0 };
 	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
 	int ret = 0;
 
@@ -1100,40 +1062,63 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 		partial = 0;
 
 		do {
-			if (*cmd == MI_BATCH_BUFFER_END) {
-				if (oacontrol_set) {
-					DRM_DEBUG_DRIVER("CMD: batch set OACONTROL but did not clear it\n");
-					goto unmap;
+			if (*cmd != last_cmd_header) {
+				if (*cmd == MI_BATCH_BUFFER_END) {
+					if (unlikely(oacontrol_set)) {
+						DRM_DEBUG_DRIVER("CMD: batch set OACONTROL but did not clear it\n");
+						goto unmap;
+					}
+
+					cmd++; /* copy this cmd to dst */
+					batch_len = this; /* no more src */
+					ret = 0;
+					break;
 				}
 
-				cmd++; /* copy this cmd to dst */
-				batch_len = this; /* no more src */
-				ret = 0;
-				break;
-			}
-
-			desc = find_cmd(ring, *cmd, &default_desc);
-			if (!desc) {
-				DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
-						 *cmd);
-				goto unmap;
-			}
+				desc = find_cmd_in_table(ring, *cmd);
+				if (desc) {
+					if (unlikely(desc->flags & CMD_DESC_REJECT)) {
+						DRM_DEBUG_DRIVER("CMD: Rejected command: 0x%08X\n", *cmd);
+						goto unmap;
+					}
+
+					if (unlikely((desc->flags & CMD_DESC_MASTER) && !is_master)) {
+						DRM_DEBUG_DRIVER("CMD: Rejected master-only command: 0x%08X\n", *cmd);
+						goto unmap;
+					}
+
+					/*
+					 * If the batch buffer contains a
+					 * chained batch, return an error that
+					 * tells the caller to abort and
+					 * dispatch the workload as a
+					 * non-secure batch.
+					 */
+					if (unlikely(desc->cmd.value == MI_BATCH_BUFFER_START)) {
+						ret = -EACCES;
+						goto unmap;
+					}
+
+					if (desc->flags & CMD_DESC_FIXED)
+						length = desc->length.fixed;
+					else
+						length = (*cmd & desc->length.mask) + LENGTH_BIAS;
+				} else {
+					u32 mask = ring->get_cmd_length_mask(*cmd);
+					if (unlikely(!mask)) {
+						DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n", *cmd);
+						goto unmap;
+					}
+
+					default_desc.length.mask = mask;
+					desc = &default_desc;
+
+					length = (*cmd & mask) + LENGTH_BIAS;
+				}
 
-			/*
-			 * If the batch buffer contains a chained batch, return an
-			 * error that tells the caller to abort and dispatch the
-			 * workload as a non-secure batch.
-			 */
-			if (desc->cmd.value == MI_BATCH_BUFFER_START) {
-				ret = -EACCES;
-				goto unmap;
+				last_cmd_header = *cmd;
 			}
 
-			if (desc->flags & CMD_DESC_FIXED)
-				length = desc->length.fixed;
-			else
-				length = ((*cmd & desc->length.mask) + LENGTH_BIAS);
-
 			if (unlikely(cmd + length > page_end)) {
 				if (unlikely(cmd + length > batch_end)) {
 					DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X length=%u batchlen=%td\n",
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8d939649b919..28d5bfceae3b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2333,12 +2333,12 @@ struct drm_i915_cmd_descriptor {
 	 *                  is the DRM master
 	 */
 	u32 flags;
+#define CMD_DESC_SKIP     (0)
 #define CMD_DESC_FIXED    (1<<0)
-#define CMD_DESC_SKIP     (1<<1)
-#define CMD_DESC_REJECT   (1<<2)
-#define CMD_DESC_REGISTER (1<<3)
-#define CMD_DESC_BITMASK  (1<<4)
-#define CMD_DESC_MASTER   (1<<5)
+#define CMD_DESC_REJECT   (1<<1)
+#define CMD_DESC_REGISTER (1<<2)
+#define CMD_DESC_BITMASK  (1<<3)
+#define CMD_DESC_MASTER   (1<<4)
 
 	/*
 	 * The command's unique identification bits and the bitmask to get them.
-- 
2.6.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 3/6] drm/i915: Use WC copies on !llc platforms for the command parser
  2015-11-20 10:55 cmdparser overhead reduction Chris Wilson
  2015-11-20 10:55 ` [PATCH v2 1/6] drm/i915: Eliminate vmap overhead for cmd parser Chris Wilson
  2015-11-20 10:55 ` [PATCH v2 2/6] drm/i915: Cache last cmd descriptor when parsing Chris Wilson
@ 2015-11-20 10:55 ` Chris Wilson
  2015-11-20 15:05   ` Ville Syrjälä
  2015-11-20 10:55 ` [PATCH v2 4/6] drm/i915: Reduce arithmetic operations during cmd parser lookup Chris Wilson
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Chris Wilson @ 2015-11-20 10:55 UTC (permalink / raw)
  To: intel-gfx

Since we blow the TLB caches by using kmap/kunmap, we may as well go the
whole hog and see if declaring our destination page as WC is faster than
keeping it as WB and using clflush. It should be!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index c6f6d9f2b2ce..4a3e90b042c5 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -992,9 +992,10 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 	const struct drm_i915_cmd_descriptor *desc = &default_desc;
 	u32 last_cmd_header = 0;
 	unsigned dst_iter, src_iter;
-	int needs_clflush = 0;
 	struct get_page rewind;
 	void *src, *dst, *tmp;
+	int src_needs_clflush = 0;
+	bool dst_needs_clflush;
 	u32 partial, length = 1;
 	unsigned in, out;
 	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
@@ -1007,13 +1008,19 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 	if (WARN_ON(shadow_batch_obj->pages_pin_count == 0))
 		return -ENODEV;
 
-	ret = i915_gem_obj_prepare_shmem_read(batch_obj, &needs_clflush);
+	ret = i915_gem_obj_prepare_shmem_read(batch_obj, &src_needs_clflush);
 	if (ret) {
 		DRM_DEBUG_DRIVER("CMD: failed to prepare shadow batch\n");
 		return ret;
 	}
 
-	ret = i915_gem_object_set_to_cpu_domain(shadow_batch_obj, true);
+	dst_needs_clflush =
+		shadow_batch_obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
+		!INTEL_INFO(shadow_batch_obj->base.dev)->has_llc;
+	if (dst_needs_clflush)
+		ret = i915_gem_object_set_to_gtt_domain(shadow_batch_obj, true);
+	else
+		ret = i915_gem_object_set_to_cpu_domain(shadow_batch_obj, true);
 	if (ret) {
 		DRM_DEBUG_DRIVER("CMD: Failed to set shadow batch to CPU\n");
 		goto unpin;
@@ -1048,7 +1055,7 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 			this = PAGE_SIZE - in;
 
 		src = kmap_atomic(i915_gem_object_get_page(batch_obj, src_iter));
-		if (needs_clflush)
+		if (src_needs_clflush)
 			drm_clflush_virt_range(src + in, this);
 
 		if (this == PAGE_SIZE && partial == 0)
@@ -1151,6 +1158,8 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 				int len;
 
 				if (out == PAGE_SIZE) {
+					if (dst_needs_clflush)
+						drm_clflush_virt_range(dst, PAGE_SIZE);
 					kunmap_atomic(dst);
 					dst = kmap_atomic(i915_gem_object_get_page(shadow_batch_obj, ++dst_iter));
 					out = 0;
@@ -1179,6 +1188,8 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 		kunmap_atomic(src);
 		in = 0;
 	}
+	if (dst_needs_clflush)
+		drm_clflush_virt_range(dst, out);
 unmap:
 	kunmap_atomic(src);
 	kunmap_atomic(dst);
-- 
2.6.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 4/6] drm/i915: Reduce arithmetic operations during cmd parser lookup
  2015-11-20 10:55 cmdparser overhead reduction Chris Wilson
                   ` (2 preceding siblings ...)
  2015-11-20 10:55 ` [PATCH v2 3/6] drm/i915: Use WC copies on !llc platforms for the command parser Chris Wilson
@ 2015-11-20 10:55 ` Chris Wilson
  2015-11-20 15:02   ` Ville Syrjälä
  2015-11-20 10:56 ` [PATCH v2 5/6] drm/i915: Reduce pointer indirection " Chris Wilson
  2015-11-20 10:56 ` [PATCH v2 6/6] drm/i915: Improve hash function for the command parser Chris Wilson
  5 siblings, 1 reply; 26+ messages in thread
From: Chris Wilson @ 2015-11-20 10:55 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 4a3e90b042c5..cfd07bfe6e75 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -806,10 +806,7 @@ find_cmd_in_table(struct intel_engine_cs *ring,
 	hash_for_each_possible(ring->cmd_hash, desc_node, node,
 			       cmd_header & CMD_HASH_MASK) {
 		const struct drm_i915_cmd_descriptor *desc = desc_node->desc;
-		u32 masked_cmd = desc->cmd.mask & cmd_header;
-		u32 masked_value = desc->cmd.value & desc->cmd.mask;
-
-		if (masked_cmd == masked_value)
+		if (((cmd_header ^ desc->cmd.value) & desc->cmd.mask) == 0)
 			return desc;
 	}
 
-- 
2.6.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 5/6] drm/i915: Reduce pointer indirection during cmd parser lookup
  2015-11-20 10:55 cmdparser overhead reduction Chris Wilson
                   ` (3 preceding siblings ...)
  2015-11-20 10:55 ` [PATCH v2 4/6] drm/i915: Reduce arithmetic operations during cmd parser lookup Chris Wilson
@ 2015-11-20 10:56 ` Chris Wilson
  2015-11-20 15:27   ` Ville Syrjälä
  2015-11-20 10:56 ` [PATCH v2 6/6] drm/i915: Improve hash function for the command parser Chris Wilson
  5 siblings, 1 reply; 26+ messages in thread
From: Chris Wilson @ 2015-11-20 10:56 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 51 ++++++++--------------------------
 drivers/gpu/drm/i915/i915_drv.h        |  4 ++-
 2 files changed, 14 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index cfd07bfe6e75..ea9df2bb87de 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -113,7 +113,7 @@
 
 /*            Command                          Mask   Fixed Len   Action
 	      ---------------------------------------------------------- */
-static const struct drm_i915_cmd_descriptor common_cmds[] = {
+static struct drm_i915_cmd_descriptor common_cmds[] = {
 	CMD(  MI_NOOP,                          SMI,    F,  1,      S  ),
 	CMD(  MI_USER_INTERRUPT,                SMI,    F,  1,      R  ),
 	CMD(  MI_WAIT_FOR_EVENT,                SMI,    F,  1,      M  ),
@@ -146,7 +146,7 @@ static const struct drm_i915_cmd_descriptor common_cmds[] = {
 	CMD(  MI_BATCH_BUFFER_START,            SMI,   !F,  0xFF,   S  ),
 };
 
-static const struct drm_i915_cmd_descriptor render_cmds[] = {
+static struct drm_i915_cmd_descriptor render_cmds[] = {
 	CMD(  MI_FLUSH,                         SMI,    F,  1,      S  ),
 	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
 	CMD(  MI_PREDICATE,                     SMI,    F,  1,      S  ),
@@ -207,7 +207,7 @@ static const struct drm_i915_cmd_descriptor render_cmds[] = {
 	      }},						       ),
 };
 
-static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
+static struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
 	CMD(  MI_SET_PREDICATE,                 SMI,    F,  1,      S  ),
 	CMD(  MI_RS_CONTROL,                    SMI,    F,  1,      S  ),
 	CMD(  MI_URB_ATOMIC_ALLOC,              SMI,    F,  1,      S  ),
@@ -229,7 +229,7 @@ static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
 	CMD(  GFX_OP_3DSTATE_BINDING_TABLE_EDIT_PS,  S3D,   !F,  0x1FF,  S  ),
 };
 
-static const struct drm_i915_cmd_descriptor video_cmds[] = {
+static struct drm_i915_cmd_descriptor video_cmds[] = {
 	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
 	CMD(  MI_SET_APPID,                     SMI,    F,  1,      S  ),
 	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   B,
@@ -273,7 +273,7 @@ static const struct drm_i915_cmd_descriptor video_cmds[] = {
 	CMD(  MFX_WAIT,                         SMFX,   F,  1,      S  ),
 };
 
-static const struct drm_i915_cmd_descriptor vecs_cmds[] = {
+static struct drm_i915_cmd_descriptor vecs_cmds[] = {
 	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
 	CMD(  MI_SET_APPID,                     SMI,    F,  1,      S  ),
 	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   B,
@@ -311,7 +311,7 @@ static const struct drm_i915_cmd_descriptor vecs_cmds[] = {
 	      }},						       ),
 };
 
-static const struct drm_i915_cmd_descriptor blt_cmds[] = {
+static struct drm_i915_cmd_descriptor blt_cmds[] = {
 	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   R  ),
 	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0x3FF,  B,
 	      .bits = {{
@@ -344,7 +344,7 @@ static const struct drm_i915_cmd_descriptor blt_cmds[] = {
 	CMD(  SRC_COPY_BLT,                     S2D,   !F,  0x3F,   S  ),
 };
 
-static const struct drm_i915_cmd_descriptor hsw_blt_cmds[] = {
+static struct drm_i915_cmd_descriptor hsw_blt_cmds[] = {
 	CMD(  MI_LOAD_SCAN_LINES_INCL,          SMI,   !F,  0x3F,   M  ),
 	CMD(  MI_LOAD_SCAN_LINES_EXCL,          SMI,   !F,  0x3F,   R  ),
 };
@@ -618,11 +618,6 @@ static bool validate_regs_sorted(struct intel_engine_cs *ring)
 			     ring->master_reg_count);
 }
 
-struct cmd_node {
-	const struct drm_i915_cmd_descriptor *desc;
-	struct hlist_node node;
-};
-
 /*
  * Different command ranges have different numbers of bits for the opcode. For
  * example, MI commands use bits 31:23 while 3D commands use bits 31:16. The
@@ -651,16 +646,8 @@ static int init_hash_table(struct intel_engine_cs *ring,
 		const struct drm_i915_cmd_table *table = &cmd_tables[i];
 
 		for (j = 0; j < table->count; j++) {
-			const struct drm_i915_cmd_descriptor *desc =
-				&table->table[j];
-			struct cmd_node *desc_node =
-				kmalloc(sizeof(*desc_node), GFP_KERNEL);
-
-			if (!desc_node)
-				return -ENOMEM;
-
-			desc_node->desc = desc;
-			hash_add(ring->cmd_hash, &desc_node->node,
+			struct drm_i915_cmd_descriptor *desc = &table->table[j];
+			hash_add(ring->cmd_hash, &desc->node[ring->id],
 				 desc->cmd.value & CMD_HASH_MASK);
 		}
 	}
@@ -668,18 +655,6 @@ static int init_hash_table(struct intel_engine_cs *ring,
 	return 0;
 }
 
-static void fini_hash_table(struct intel_engine_cs *ring)
-{
-	struct hlist_node *tmp;
-	struct cmd_node *desc_node;
-	int i;
-
-	hash_for_each_safe(ring->cmd_hash, i, tmp, desc_node, node) {
-		hash_del(&desc_node->node);
-		kfree(desc_node);
-	}
-}
-
 /**
  * i915_cmd_parser_init_ring() - set cmd parser related fields for a ringbuffer
  * @ring: the ringbuffer to initialize
@@ -770,7 +745,6 @@ int i915_cmd_parser_init_ring(struct intel_engine_cs *ring)
 	ret = init_hash_table(ring, cmd_tables, cmd_table_count);
 	if (ret) {
 		DRM_ERROR("CMD: cmd_parser_init failed!\n");
-		fini_hash_table(ring);
 		return ret;
 	}
 
@@ -790,8 +764,6 @@ void i915_cmd_parser_fini_ring(struct intel_engine_cs *ring)
 {
 	if (!ring->needs_cmd_parser)
 		return;
-
-	fini_hash_table(ring);
 }
 
 /*
@@ -801,11 +773,10 @@ static const struct drm_i915_cmd_descriptor*
 find_cmd_in_table(struct intel_engine_cs *ring,
 		  u32 cmd_header)
 {
-	struct cmd_node *desc_node;
+	const struct drm_i915_cmd_descriptor *desc;
 
-	hash_for_each_possible(ring->cmd_hash, desc_node, node,
+	hash_for_each_possible(ring->cmd_hash, desc, node[ring->id],
 			       cmd_header & CMD_HASH_MASK) {
-		const struct drm_i915_cmd_descriptor *desc = desc_node->desc;
 		if (((cmd_header ^ desc->cmd.value) & desc->cmd.mask) == 0)
 			return desc;
 	}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 28d5bfceae3b..5960f76f1438 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2396,6 +2396,8 @@ struct drm_i915_cmd_descriptor {
 		u32 condition_offset;
 		u32 condition_mask;
 	} bits[MAX_CMD_DESC_BITMASKS];
+
+	struct hlist_node node[I915_NUM_RINGS];
 };
 
 /*
@@ -2405,7 +2407,7 @@ struct drm_i915_cmd_descriptor {
  * descriptors, which must be sorted with command opcodes in ascending order.
  */
 struct drm_i915_cmd_table {
-	const struct drm_i915_cmd_descriptor *table;
+	struct drm_i915_cmd_descriptor *table;
 	int count;
 };
 
-- 
2.6.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 6/6] drm/i915: Improve hash function for the command parser
  2015-11-20 10:55 cmdparser overhead reduction Chris Wilson
                   ` (4 preceding siblings ...)
  2015-11-20 10:56 ` [PATCH v2 5/6] drm/i915: Reduce pointer indirection " Chris Wilson
@ 2015-11-20 10:56 ` Chris Wilson
  2015-11-20 15:13   ` Ville Syrjälä
  5 siblings, 1 reply; 26+ messages in thread
From: Chris Wilson @ 2015-11-20 10:56 UTC (permalink / raw)
  To: intel-gfx

The existing code's hashfunction is very suboptimal (most 3D commands
use the same bucket degrading the hash to a long list). The code even
acknowledge that the issue was known and the fix simple:

/*
 * If we attempt to generate a perfect hash, we should be able to look at bits
 * 31:29 of a command from a batch buffer and use the full mask for that
 * client. The existing INSTR_CLIENT_MASK/SHIFT defines can be used for this.
 */

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 45 +++++++++++++++++++++-------------
 1 file changed, 28 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index ea9df2bb87de..7a0250c1d201 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -86,24 +86,24 @@
  * general bitmasking mechanism.
  */
 
-#define STD_MI_OPCODE_MASK  0xFF800000
-#define STD_3D_OPCODE_MASK  0xFFFF0000
-#define STD_2D_OPCODE_MASK  0xFFC00000
-#define STD_MFX_OPCODE_MASK 0xFFFF0000
+#define STD_MI_OPCODE_SHIFT  (32 - 9)
+#define STD_3D_OPCODE_SHIFT  (32 - 16)
+#define STD_2D_OPCODE_SHIFT  (32 - 10)
+#define STD_MFX_OPCODE_SHIFT (32 - 16)
 
 #define CMD(op, opm, f, lm, fl, ...)				\
 	{							\
 		.flags = (fl) | ((f) ? CMD_DESC_FIXED : 0),	\
-		.cmd = { (op), (opm) },				\
+		.cmd = { (op), ~0 << (opm) }, 			\
 		.length = { (lm) },				\
 		__VA_ARGS__					\
 	}
 
 /* Convenience macros to compress the tables */
-#define SMI STD_MI_OPCODE_MASK
-#define S3D STD_3D_OPCODE_MASK
-#define S2D STD_2D_OPCODE_MASK
-#define SMFX STD_MFX_OPCODE_MASK
+#define SMI STD_MI_OPCODE_SHIFT
+#define S3D STD_3D_OPCODE_SHIFT
+#define S2D STD_2D_OPCODE_SHIFT
+#define SMFX STD_MFX_OPCODE_SHIFT
 #define F true
 #define S CMD_DESC_SKIP
 #define R CMD_DESC_REJECT
@@ -627,12 +627,24 @@ static bool validate_regs_sorted(struct intel_engine_cs *ring)
  * non-opcode bits being set. But if we don't include those bits, some 3D
  * commands may hash to the same bucket due to not including opcode bits that
  * make the command unique. For now, we will risk hashing to the same bucket.
- *
- * If we attempt to generate a perfect hash, we should be able to look at bits
- * 31:29 of a command from a batch buffer and use the full mask for that
- * client. The existing INSTR_CLIENT_MASK/SHIFT defines can be used for this.
  */
-#define CMD_HASH_MASK STD_MI_OPCODE_MASK
+static inline u32 cmd_header_key(u32 x)
+{
+	u32 shift;
+	switch (x >> INSTR_CLIENT_SHIFT) {
+	default:
+	case INSTR_MI_CLIENT:
+		shift = STD_MI_OPCODE_SHIFT;
+		break;
+	case INSTR_RC_CLIENT:
+		shift = STD_3D_OPCODE_SHIFT;
+		break;
+	case INSTR_BC_CLIENT:
+		shift = STD_2D_OPCODE_SHIFT;
+		break;
+	}
+	return x >> shift;
+}
 
 static int init_hash_table(struct intel_engine_cs *ring,
 			   const struct drm_i915_cmd_table *cmd_tables,
@@ -648,7 +660,7 @@ static int init_hash_table(struct intel_engine_cs *ring,
 		for (j = 0; j < table->count; j++) {
 			struct drm_i915_cmd_descriptor *desc = &table->table[j];
 			hash_add(ring->cmd_hash, &desc->node[ring->id],
-				 desc->cmd.value & CMD_HASH_MASK);
+				 cmd_header_key(desc->cmd.value));
 		}
 	}
 
@@ -776,10 +788,9 @@ find_cmd_in_table(struct intel_engine_cs *ring,
 	const struct drm_i915_cmd_descriptor *desc;
 
 	hash_for_each_possible(ring->cmd_hash, desc, node[ring->id],
-			       cmd_header & CMD_HASH_MASK) {
+			       cmd_header_key(cmd_header))
 		if (((cmd_header ^ desc->cmd.value) & desc->cmd.mask) == 0)
 			return desc;
-	}
 
 	return NULL;
 }
-- 
2.6.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/6] drm/i915: Eliminate vmap overhead for cmd parser
  2015-11-20 10:55 ` [PATCH v2 1/6] drm/i915: Eliminate vmap overhead for cmd parser Chris Wilson
@ 2015-11-20 14:41   ` Ville Syrjälä
  2015-11-20 14:52     ` Chris Wilson
  2015-11-20 15:31     ` [PATCH v3] " Chris Wilson
  0 siblings, 2 replies; 26+ messages in thread
From: Ville Syrjälä @ 2015-11-20 14:41 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Nov 20, 2015 at 10:55:56AM +0000, Chris Wilson wrote:
> With a little complexity to handle cmds straddling page boundaries, we
> can completely avoiding having to vmap the batch and the shadow batch
> objects whilst running the command parser.
> 
> On ivb i7-3720MQ:
> 
> x11perf -dot before 54.3M, after 53.2M (max 203M)
> glxgears before 7110 fps, after 7300 fps (max 7860 fps)
> 
> Before:
> Time to blt 16384 bytes x      1:	 12.400µs, 1.2GiB/s
> Time to blt 16384 bytes x   4096:	  3.055µs, 5.0GiB/s
> 
> After:
> Time to blt 16384 bytes x      1:	  8.600µs, 1.8GiB/s
> Time to blt 16384 bytes x   4096:	  2.456µs, 6.2GiB/s
> 
> Removing the vmap is mostly a win, except we lose in a few cases where
> the batch size is greater than a page due to the extra complexity (loss
> of a simple cache efficient large copy, and boundary handling).
> 
> v2: Reorder so that we do check oacontrol remaining set at end-of-batch
> v3: Stage the copy through a temporary page so that we only copy into
> dst commands that have been verified.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_cmd_parser.c | 305 +++++++++++++++++----------------
>  1 file changed, 153 insertions(+), 152 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index 814d894ed925..db3a04ea036a 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -860,100 +860,6 @@ find_reg(const struct drm_i915_reg_descriptor *table,
>  	return NULL;
>  }
>  
> -static u32 *vmap_batch(struct drm_i915_gem_object *obj,
> -		       unsigned start, unsigned len)
> -{
> -	int i;
> -	void *addr = NULL;
> -	struct sg_page_iter sg_iter;
> -	int first_page = start >> PAGE_SHIFT;
> -	int last_page = (len + start + 4095) >> PAGE_SHIFT;
> -	int npages = last_page - first_page;
> -	struct page **pages;
> -
> -	pages = drm_malloc_ab(npages, sizeof(*pages));
> -	if (pages == NULL) {
> -		DRM_DEBUG_DRIVER("Failed to get space for pages\n");
> -		goto finish;
> -	}
> -
> -	i = 0;
> -	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, first_page) {
> -		pages[i++] = sg_page_iter_page(&sg_iter);
> -		if (i == npages)
> -			break;
> -	}
> -
> -	addr = vmap(pages, i, 0, PAGE_KERNEL);
> -	if (addr == NULL) {
> -		DRM_DEBUG_DRIVER("Failed to vmap pages\n");
> -		goto finish;
> -	}
> -
> -finish:
> -	if (pages)
> -		drm_free_large(pages);
> -	return (u32*)addr;
> -}
> -
> -/* Returns a vmap'd pointer to dest_obj, which the caller must unmap */
> -static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
> -		       struct drm_i915_gem_object *src_obj,
> -		       u32 batch_start_offset,
> -		       u32 batch_len)
> -{
> -	int needs_clflush = 0;
> -	void *src_base, *src;
> -	void *dst = NULL;
> -	int ret;
> -
> -	if (batch_len > dest_obj->base.size ||
> -	    batch_len + batch_start_offset > src_obj->base.size)
> -		return ERR_PTR(-E2BIG);
> -
> -	if (WARN_ON(dest_obj->pages_pin_count == 0))
> -		return ERR_PTR(-ENODEV);
> -
> -	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
> -	if (ret) {
> -		DRM_DEBUG_DRIVER("CMD: failed to prepare shadow batch\n");
> -		return ERR_PTR(ret);
> -	}
> -
> -	src_base = vmap_batch(src_obj, batch_start_offset, batch_len);
> -	if (!src_base) {
> -		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
> -		ret = -ENOMEM;
> -		goto unpin_src;
> -	}
> -
> -	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
> -	if (ret) {
> -		DRM_DEBUG_DRIVER("CMD: Failed to set shadow batch to CPU\n");
> -		goto unmap_src;
> -	}
> -
> -	dst = vmap_batch(dest_obj, 0, batch_len);
> -	if (!dst) {
> -		DRM_DEBUG_DRIVER("CMD: Failed to vmap shadow batch\n");
> -		ret = -ENOMEM;
> -		goto unmap_src;
> -	}
> -
> -	src = src_base + offset_in_page(batch_start_offset);
> -	if (needs_clflush)
> -		drm_clflush_virt_range(src, batch_len);
> -
> -	memcpy(dst, src, batch_len);
> -
> -unmap_src:
> -	vunmap(src_base);
> -unpin_src:
> -	i915_gem_object_unpin_pages(src_obj);
> -
> -	return ret ? ERR_PTR(ret) : dst;
> -}
> -
>  /**
>   * i915_needs_cmd_parser() - should a given ring use software command parsing?
>   * @ring: the ring in question
> @@ -1097,6 +1003,7 @@ static bool check_cmd(const struct intel_engine_cs *ring,
>  }
>  
>  #define LENGTH_BIAS 2
> +#define MAX_PARTIAL 256
>  
>  /**
>   * i915_parse_cmds() - parse a submitted batch buffer for privilege violations
> @@ -1120,86 +1027,180 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
>  		    u32 batch_len,
>  		    bool is_master)
>  {
> -	u32 *cmd, *batch_base, *batch_end;
> +	const struct drm_i915_cmd_descriptor *desc;
> +	unsigned dst_iter, src_iter;
> +	int needs_clflush = 0;
> +	struct get_page rewind;
> +	void *src, *dst, *tmp;
> +	u32 partial, length;
> +	unsigned in, out;
>  	struct drm_i915_cmd_descriptor default_desc = { 0 };
>  	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
>  	int ret = 0;
>  
> -	batch_base = copy_batch(shadow_batch_obj, batch_obj,
> -				batch_start_offset, batch_len);
> -	if (IS_ERR(batch_base)) {
> -		DRM_DEBUG_DRIVER("CMD: Failed to copy batch\n");
> -		return PTR_ERR(batch_base);
> +	if (batch_len > shadow_batch_obj->base.size ||

AFAIK that can't actaully happen since we allocate the shadow batch
based on batch_len. But I see it was already like this in the old code.

> +	    batch_len + batch_start_offset > batch_obj->base.size)

This worries me more. Shouldn't we also have this check somewhere for
the non-cmd_parser cases? Nor can I see any overflow check for
'batch_len+batch_start_offset'.

> +		return -E2BIG;
> +
> +	if (WARN_ON(shadow_batch_obj->pages_pin_count == 0))
> +		return -ENODEV;
> +
> +	ret = i915_gem_obj_prepare_shmem_read(batch_obj, &needs_clflush);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("CMD: failed to prepare shadow batch\n");
> +		return ret;
> +	}
> +
> +	ret = i915_gem_object_set_to_cpu_domain(shadow_batch_obj, true);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("CMD: Failed to set shadow batch to CPU\n");
> +		goto unpin;
>  	}
>  
> +	tmp = kmalloc(PAGE_SIZE + MAX_PARTIAL, GFP_KERNEL);

GFP_TEMPORARY perhaps?

> +	if (tmp == NULL)
> +		return -ENOMEM;
> +
>  	/*
>  	 * We use the batch length as size because the shadow object is as
>  	 * large or larger and copy_batch() will write MI_NOPs to the extra

copy_batch() is gone.

>  	 * space. Parsing should be faster in some cases this way.
>  	 */
> -	batch_end = batch_base + (batch_len / sizeof(*batch_end));
> +	ret = -EINVAL;
> +	rewind = batch_obj->get_page;

Shouldn't the get_page work just fine without this rewind trick? As in if
some other user wants one of the previous pages it restarts iterating
from 0?

> +
> +	dst_iter = 0;
> +	dst = kmap_atomic(i915_gem_object_get_page(shadow_batch_obj, dst_iter));
> +	out = 0;
> +
> +	in = offset_in_page(batch_start_offset);
> +	partial = 0;
> +	for (src_iter = batch_start_offset >> PAGE_SHIFT;
> +	     src_iter < batch_obj->base.size >> PAGE_SHIFT;
> +	     src_iter++) {
> +		u32 *cmd, *page_end, *batch_end;
> +		u32 this;
> +
> +		this = batch_len;
> +		if (this > PAGE_SIZE - in)
> +			this = PAGE_SIZE - in;
> +
> +		src = kmap_atomic(i915_gem_object_get_page(batch_obj, src_iter));
> +		if (needs_clflush)
> +			drm_clflush_virt_range(src + in, this);
> +
> +		if (this == PAGE_SIZE && partial == 0)
> +			copy_page(tmp, src);
> +		else
> +			memcpy(tmp+partial, src+in, this);
> +
> +		cmd = tmp;
> +		page_end = tmp + this + partial;
> +		batch_end = tmp + batch_len + partial;
> +		partial = 0;
> +
> +		do {
> +			if (*cmd == MI_BATCH_BUFFER_END) {
> +				if (oacontrol_set) {
> +					DRM_DEBUG_DRIVER("CMD: batch set OACONTROL but did not clear it\n");
> +					goto unmap;
> +				}
>  
> -	cmd = batch_base;
> -	while (cmd < batch_end) {
> -		const struct drm_i915_cmd_descriptor *desc;
> -		u32 length;
> +				cmd++; /* copy this cmd to dst */
> +				batch_len = this; /* no more src */
> +				ret = 0;
> +				break;
> +			}
>  
> -		if (*cmd == MI_BATCH_BUFFER_END)
> -			break;
> +			desc = find_cmd(ring, *cmd, &default_desc);
> +			if (!desc) {
> +				DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
> +						 *cmd);
> +				goto unmap;
> +			}
>  
> -		desc = find_cmd(ring, *cmd, &default_desc);
> -		if (!desc) {
> -			DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
> -					 *cmd);
> -			ret = -EINVAL;
> -			break;
> -		}

It's quite hard to see what's changed due to the reindent. Any chance
you could do the reindent in a prep patch just help my poor brain a bit?

> +			/*
> +			 * If the batch buffer contains a chained batch, return an
> +			 * error that tells the caller to abort and dispatch the
> +			 * workload as a non-secure batch.
> +			 */
> +			if (desc->cmd.value == MI_BATCH_BUFFER_START) {
> +				ret = -EACCES;
> +				goto unmap;
> +			}
>  
> -		/*
> -		 * If the batch buffer contains a chained batch, return an
> -		 * error that tells the caller to abort and dispatch the
> -		 * workload as a non-secure batch.
> -		 */
> -		if (desc->cmd.value == MI_BATCH_BUFFER_START) {
> -			ret = -EACCES;
> -			break;
> -		}
> +			if (desc->flags & CMD_DESC_FIXED)
> +				length = desc->length.fixed;
> +			else
> +				length = ((*cmd & desc->length.mask) + LENGTH_BIAS);
>  
> -		if (desc->flags & CMD_DESC_FIXED)
> -			length = desc->length.fixed;
> -		else
> -			length = ((*cmd & desc->length.mask) + LENGTH_BIAS);
> -
> -		if ((batch_end - cmd) < length) {
> -			DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X length=%u batchlen=%td\n",
> -					 *cmd,
> -					 length,
> -					 batch_end - cmd);
> -			ret = -EINVAL;
> -			break;
> -		}
> +			if (unlikely(cmd + length > page_end)) {
> +				if (unlikely(cmd + length > batch_end)) {
> +					DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X length=%u batchlen=%td\n",
> +							 *cmd, length, batch_end - cmd);
> +					goto unmap;
> +				}
>  
> -		if (!check_cmd(ring, desc, cmd, length, is_master,
> -			       &oacontrol_set)) {
> -			ret = -EINVAL;
> -			break;
> -		}
> +				if (WARN_ON(length > MAX_PARTIAL)) {
> +					ret = -ENODEV;
> +					goto unmap;
> +				}
>  
> -		cmd += length;
> -	}
> +				partial = 4*(page_end - cmd);
> +				break;
> +			}
>  
> -	if (oacontrol_set) {
> -		DRM_DEBUG_DRIVER("CMD: batch set OACONTROL but did not clear it\n");
> -		ret = -EINVAL;
> -	}
> +			if (!check_cmd(ring, desc, cmd, length, is_master,
> +				       &oacontrol_set))
> +				goto unmap;
> +
> +			cmd += length;
> +		} while (cmd < page_end);
> +
> +		/* Copy the validated page to the GPU batch */
> +		{
> +			/* exclude partial cmd, we copy it next time */
> +			unsigned dst_length = (void *)cmd - tmp;
> +			in = 0;
> +			do {
> +				int len;
> +
> +				if (out == PAGE_SIZE) {
> +					kunmap_atomic(dst);
> +					dst = kmap_atomic(i915_gem_object_get_page(shadow_batch_obj, ++dst_iter));
> +					out = 0;
> +				}
>  
> -	if (cmd >= batch_end) {
> -		DRM_DEBUG_DRIVER("CMD: Got to the end of the buffer w/o a BBE cmd!\n");
> -		ret = -EINVAL;
> -	}
> +				len = dst_length;
> +				if (len + out > PAGE_SIZE)
> +					len = PAGE_SIZE - out;
> +				if (len == PAGE_SIZE)
> +					copy_page(dst, tmp + in);
> +				else
> +					memcpy(dst + out, tmp + in, len);
> +				in += len;
> +				out += len;
> +				dst_length -= len;
> +			} while (dst_length);
> +		}
> +
> +		batch_len -= this;
> +		if (batch_len == 0)
> +			break;
>  
> -	vunmap(batch_base);
> +		if (partial)
> +			memmove(tmp, cmd, partial);
>  
> +		kunmap_atomic(src);
> +		in = 0;
> +	}
> +unmap:
> +	kunmap_atomic(src);
> +	kunmap_atomic(dst);
> +	batch_obj->get_page = rewind;
> +	kfree(tmp);
> +unpin:
> +	i915_gem_object_unpin_pages(batch_obj);
>  	return ret;
>  }
>  
> -- 
> 2.6.2

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/6] drm/i915: Eliminate vmap overhead for cmd parser
  2015-11-20 14:41   ` Ville Syrjälä
@ 2015-11-20 14:52     ` Chris Wilson
  2015-11-20 15:31     ` [PATCH v3] " Chris Wilson
  1 sibling, 0 replies; 26+ messages in thread
From: Chris Wilson @ 2015-11-20 14:52 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

On Fri, Nov 20, 2015 at 04:41:53PM +0200, Ville Syrjälä wrote:
> > +	if (batch_len > shadow_batch_obj->base.size ||
> 
> AFAIK that can't actaully happen since we allocate the shadow batch
> based on batch_len. But I see it was already like this in the old code.
> 
> > +	    batch_len + batch_start_offset > batch_obj->base.size)
> 
> This worries me more. Shouldn't we also have this check somewhere for
> the non-cmd_parser cases? Nor can I see any overflow check for
> 'batch_len+batch_start_offset'.

True, that is worthy of being moved to execbuf validation.

> > +		return -E2BIG;
> > +
> > +	if (WARN_ON(shadow_batch_obj->pages_pin_count == 0))
> > +		return -ENODEV;
> > +
> > +	ret = i915_gem_obj_prepare_shmem_read(batch_obj, &needs_clflush);
> > +	if (ret) {
> > +		DRM_DEBUG_DRIVER("CMD: failed to prepare shadow batch\n");
> > +		return ret;
> > +	}
> > +
> > +	ret = i915_gem_object_set_to_cpu_domain(shadow_batch_obj, true);
> > +	if (ret) {
> > +		DRM_DEBUG_DRIVER("CMD: Failed to set shadow batch to CPU\n");
> > +		goto unpin;
> >  	}
> >  
> > +	tmp = kmalloc(PAGE_SIZE + MAX_PARTIAL, GFP_KERNEL);
> 
> GFP_TEMPORARY perhaps?

Ok.

> > +	if (tmp == NULL)
> > +		return -ENOMEM;
> > +
> >  	/*
> >  	 * We use the batch length as size because the shadow object is as
> >  	 * large or larger and copy_batch() will write MI_NOPs to the extra
> 
> copy_batch() is gone.

I guess the whole comment is now misleading. Comments, never trust them!

> >  	 * space. Parsing should be faster in some cases this way.
> >  	 */
> > -	batch_end = batch_base + (batch_len / sizeof(*batch_end));
> > +	ret = -EINVAL;
> > +	rewind = batch_obj->get_page;
> 
> Shouldn't the get_page work just fine without this rewind trick? As in if
> some other user wants one of the previous pages it restarts iterating
> from 0?

Yes, it works fine, it is just a trick to keep the cache of the
last location intact for other paths (basically trying to keep the cache
for the direct user paths).

> > -		if (*cmd == MI_BATCH_BUFFER_END)
> > -			break;
> > +			desc = find_cmd(ring, *cmd, &default_desc);
> > +			if (!desc) {
> > +				DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
> > +						 *cmd);
> > +				goto unmap;
> > +			}
> >  
> > -		desc = find_cmd(ring, *cmd, &default_desc);
> > -		if (!desc) {
> > -			DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
> > -					 *cmd);
> > -			ret = -EINVAL;
> > -			break;
> > -		}
> 
> It's quite hard to see what's changed due to the reindent. Any chance
> you could do the reindent in a prep patch just help my poor brain a bit?

Or maybe the ignore-whitespace option for send-email?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 4/6] drm/i915: Reduce arithmetic operations during cmd parser lookup
  2015-11-20 10:55 ` [PATCH v2 4/6] drm/i915: Reduce arithmetic operations during cmd parser lookup Chris Wilson
@ 2015-11-20 15:02   ` Ville Syrjälä
  0 siblings, 0 replies; 26+ messages in thread
From: Ville Syrjälä @ 2015-11-20 15:02 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Nov 20, 2015 at 10:55:59AM +0000, Chris Wilson wrote:
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_cmd_parser.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index 4a3e90b042c5..cfd07bfe6e75 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -806,10 +806,7 @@ find_cmd_in_table(struct intel_engine_cs *ring,
>  	hash_for_each_possible(ring->cmd_hash, desc_node, node,
>  			       cmd_header & CMD_HASH_MASK) {
>  		const struct drm_i915_cmd_descriptor *desc = desc_node->desc;
> -		u32 masked_cmd = desc->cmd.mask & cmd_header;
> -		u32 masked_value = desc->cmd.value & desc->cmd.mask;
> -
> -		if (masked_cmd == masked_value)
> +		if (((cmd_header ^ desc->cmd.value) & desc->cmd.mask) == 0)

Hmm. '(a & m) == (b & m)' vs '((a ^ b) & m) == 0'.

So two & vs. one ^ and one &. Is that an improvement, dunno.

At least the result should be the same, so based on that:
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

>  			return desc;
>  	}
>  
> -- 
> 2.6.2
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 3/6] drm/i915: Use WC copies on !llc platforms for the command parser
  2015-11-20 10:55 ` [PATCH v2 3/6] drm/i915: Use WC copies on !llc platforms for the command parser Chris Wilson
@ 2015-11-20 15:05   ` Ville Syrjälä
  2015-11-20 15:22     ` Chris Wilson
  0 siblings, 1 reply; 26+ messages in thread
From: Ville Syrjälä @ 2015-11-20 15:05 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Nov 20, 2015 at 10:55:58AM +0000, Chris Wilson wrote:
> Since we blow the TLB caches by using kmap/kunmap, we may as well go the
> whole hog and see if declaring our destination page as WC is faster than
> keeping it as WB and using clflush. It should be!

Is this description for another patch? I can't see any WC stuff in
there.

> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_cmd_parser.c | 19 +++++++++++++++----
>  1 file changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index c6f6d9f2b2ce..4a3e90b042c5 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -992,9 +992,10 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
>  	const struct drm_i915_cmd_descriptor *desc = &default_desc;
>  	u32 last_cmd_header = 0;
>  	unsigned dst_iter, src_iter;
> -	int needs_clflush = 0;
>  	struct get_page rewind;
>  	void *src, *dst, *tmp;
> +	int src_needs_clflush = 0;
> +	bool dst_needs_clflush;
>  	u32 partial, length = 1;
>  	unsigned in, out;
>  	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
> @@ -1007,13 +1008,19 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
>  	if (WARN_ON(shadow_batch_obj->pages_pin_count == 0))
>  		return -ENODEV;
>  
> -	ret = i915_gem_obj_prepare_shmem_read(batch_obj, &needs_clflush);
> +	ret = i915_gem_obj_prepare_shmem_read(batch_obj, &src_needs_clflush);
>  	if (ret) {
>  		DRM_DEBUG_DRIVER("CMD: failed to prepare shadow batch\n");
>  		return ret;
>  	}
>  
> -	ret = i915_gem_object_set_to_cpu_domain(shadow_batch_obj, true);
> +	dst_needs_clflush =
> +		shadow_batch_obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
> +		!INTEL_INFO(shadow_batch_obj->base.dev)->has_llc;
> +	if (dst_needs_clflush)
> +		ret = i915_gem_object_set_to_gtt_domain(shadow_batch_obj, true);
> +	else
> +		ret = i915_gem_object_set_to_cpu_domain(shadow_batch_obj, true);
>  	if (ret) {
>  		DRM_DEBUG_DRIVER("CMD: Failed to set shadow batch to CPU\n");
>  		goto unpin;
> @@ -1048,7 +1055,7 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
>  			this = PAGE_SIZE - in;
>  
>  		src = kmap_atomic(i915_gem_object_get_page(batch_obj, src_iter));
> -		if (needs_clflush)
> +		if (src_needs_clflush)
>  			drm_clflush_virt_range(src + in, this);
>  
>  		if (this == PAGE_SIZE && partial == 0)
> @@ -1151,6 +1158,8 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
>  				int len;
>  
>  				if (out == PAGE_SIZE) {
> +					if (dst_needs_clflush)
> +						drm_clflush_virt_range(dst, PAGE_SIZE);
>  					kunmap_atomic(dst);
>  					dst = kmap_atomic(i915_gem_object_get_page(shadow_batch_obj, ++dst_iter));
>  					out = 0;
> @@ -1179,6 +1188,8 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
>  		kunmap_atomic(src);
>  		in = 0;
>  	}
> +	if (dst_needs_clflush)
> +		drm_clflush_virt_range(dst, out);
>  unmap:
>  	kunmap_atomic(src);
>  	kunmap_atomic(dst);
> -- 
> 2.6.2
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/6] drm/i915: Cache last cmd descriptor when parsing
  2015-11-20 10:55 ` [PATCH v2 2/6] drm/i915: Cache last cmd descriptor when parsing Chris Wilson
@ 2015-11-20 15:08   ` Ville Syrjälä
  2015-11-20 15:44     ` Chris Wilson
  2015-12-01 17:30   ` Ville Syrjälä
  1 sibling, 1 reply; 26+ messages in thread
From: Ville Syrjälä @ 2015-11-20 15:08 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Nov 20, 2015 at 10:55:57AM +0000, Chris Wilson wrote:
> The cmd parser has the biggest impact on the BLT ring, because it is
> relatively verbose composed to the other engines as the vertex data is
> inline. It also typically has runs of repeating commands (again since
> the vertex data is inline, it typically has sequences of XY_SETUP_BLT,
> XY_SCANLINE_BLT..) We can easily reduce the impact of cmdparsing on
> benchmarks by then caching the last descriptor and comparing it against
> the next command header. To get maximum benefit, we also want to take
> advantage of skipping a few validations and length determinations if the
> header is unchanged between commands.
> 
> ivb i7-3720QM:
> x11perf -dot: before 52.3M, after 124M (max 203M)
> glxgears: before 7310 fps, after 7550 fps (max 7860 fps)
> 
> v2: Fix initial cached cmd descriptor to match MI_NOOP.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_cmd_parser.c | 133 +++++++++++++++------------------
>  drivers/gpu/drm/i915/i915_drv.h        |  10 +--
>  2 files changed, 64 insertions(+), 79 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index db3a04ea036a..c6f6d9f2b2ce 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -794,6 +794,9 @@ void i915_cmd_parser_fini_ring(struct intel_engine_cs *ring)
>  	fini_hash_table(ring);
>  }
>  
> +/*
> + * Returns a pointer to a descriptor for the command specified by cmd_header.
> + */
>  static const struct drm_i915_cmd_descriptor*
>  find_cmd_in_table(struct intel_engine_cs *ring,
>  		  u32 cmd_header)
> @@ -813,37 +816,6 @@ find_cmd_in_table(struct intel_engine_cs *ring,
>  	return NULL;
>  }
>  
> -/*
> - * Returns a pointer to a descriptor for the command specified by cmd_header.
> - *
> - * The caller must supply space for a default descriptor via the default_desc
> - * parameter. If no descriptor for the specified command exists in the ring's
> - * command parser tables, this function fills in default_desc based on the
> - * ring's default length encoding and returns default_desc.
> - */
> -static const struct drm_i915_cmd_descriptor*
> -find_cmd(struct intel_engine_cs *ring,
> -	 u32 cmd_header,
> -	 struct drm_i915_cmd_descriptor *default_desc)
> -{
> -	const struct drm_i915_cmd_descriptor *desc;
> -	u32 mask;
> -
> -	desc = find_cmd_in_table(ring, cmd_header);
> -	if (desc)
> -		return desc;
> -
> -	mask = ring->get_cmd_length_mask(cmd_header);
> -	if (!mask)
> -		return NULL;
> -
> -	BUG_ON(!default_desc);
> -	default_desc->flags = CMD_DESC_SKIP;
> -	default_desc->length.mask = mask;
> -
> -	return default_desc;
> -}
> -
>  static const struct drm_i915_reg_descriptor *
>  find_reg(const struct drm_i915_reg_descriptor *table,
>  	 int count, u32 addr)
> @@ -886,17 +858,6 @@ static bool check_cmd(const struct intel_engine_cs *ring,
>  		      const bool is_master,
>  		      bool *oacontrol_set)
>  {
> -	if (desc->flags & CMD_DESC_REJECT) {
> -		DRM_DEBUG_DRIVER("CMD: Rejected command: 0x%08X\n", *cmd);
> -		return false;
> -	}
> -
> -	if ((desc->flags & CMD_DESC_MASTER) && !is_master) {
> -		DRM_DEBUG_DRIVER("CMD: Rejected master-only command: 0x%08X\n",
> -				 *cmd);
> -		return false;
> -	}
> -
>  	if (desc->flags & CMD_DESC_REGISTER) {
>  		/*
>  		 * Get the distance between individual register offset
> @@ -1027,14 +988,15 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
>  		    u32 batch_len,
>  		    bool is_master)
>  {
> -	const struct drm_i915_cmd_descriptor *desc;
> +	struct drm_i915_cmd_descriptor default_desc = { CMD_DESC_SKIP };
> +	const struct drm_i915_cmd_descriptor *desc = &default_desc;
> +	u32 last_cmd_header = 0;
>  	unsigned dst_iter, src_iter;
>  	int needs_clflush = 0;
>  	struct get_page rewind;
>  	void *src, *dst, *tmp;
> -	u32 partial, length;
> +	u32 partial, length = 1;
>  	unsigned in, out;
> -	struct drm_i915_cmd_descriptor default_desc = { 0 };
>  	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
>  	int ret = 0;
>  
> @@ -1100,40 +1062,63 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
>  		partial = 0;
>  
>  		do {
> -			if (*cmd == MI_BATCH_BUFFER_END) {
> -				if (oacontrol_set) {
> -					DRM_DEBUG_DRIVER("CMD: batch set OACONTROL but did not clear it\n");
> -					goto unmap;
> +			if (*cmd != last_cmd_header) {
> +				if (*cmd == MI_BATCH_BUFFER_END) {
> +					if (unlikely(oacontrol_set)) {
> +						DRM_DEBUG_DRIVER("CMD: batch set OACONTROL but did not clear it\n");
> +						goto unmap;
> +					}
> +
> +					cmd++; /* copy this cmd to dst */
> +					batch_len = this; /* no more src */
> +					ret = 0;
> +					break;
>  				}
>  
> -				cmd++; /* copy this cmd to dst */
> -				batch_len = this; /* no more src */
> -				ret = 0;
> -				break;
> -			}
> -
> -			desc = find_cmd(ring, *cmd, &default_desc);
> -			if (!desc) {
> -				DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
> -						 *cmd);
> -				goto unmap;
> -			}
> +				desc = find_cmd_in_table(ring, *cmd);
> +				if (desc) {
> +					if (unlikely(desc->flags & CMD_DESC_REJECT)) {
> +						DRM_DEBUG_DRIVER("CMD: Rejected command: 0x%08X\n", *cmd);
> +						goto unmap;
> +					}

Yikes. More indenttion. It's getting really deep here. Any sane way to
reduce it?

> +
> +					if (unlikely((desc->flags & CMD_DESC_MASTER) && !is_master)) {
> +						DRM_DEBUG_DRIVER("CMD: Rejected master-only command: 0x%08X\n", *cmd);
> +						goto unmap;
> +					}
> +
> +					/*
> +					 * If the batch buffer contains a
> +					 * chained batch, return an error that
> +					 * tells the caller to abort and
> +					 * dispatch the workload as a
> +					 * non-secure batch.
> +					 */
> +					if (unlikely(desc->cmd.value == MI_BATCH_BUFFER_START)) {
> +						ret = -EACCES;
> +						goto unmap;
> +					}
> +
> +					if (desc->flags & CMD_DESC_FIXED)
> +						length = desc->length.fixed;
> +					else
> +						length = (*cmd & desc->length.mask) + LENGTH_BIAS;
> +				} else {
> +					u32 mask = ring->get_cmd_length_mask(*cmd);
> +					if (unlikely(!mask)) {
> +						DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n", *cmd);
> +						goto unmap;
> +					}
> +
> +					default_desc.length.mask = mask;
> +					desc = &default_desc;
> +
> +					length = (*cmd & mask) + LENGTH_BIAS;
> +				}
>  
> -			/*
> -			 * If the batch buffer contains a chained batch, return an
> -			 * error that tells the caller to abort and dispatch the
> -			 * workload as a non-secure batch.
> -			 */
> -			if (desc->cmd.value == MI_BATCH_BUFFER_START) {
> -				ret = -EACCES;
> -				goto unmap;
> +				last_cmd_header = *cmd;
>  			}
>  
> -			if (desc->flags & CMD_DESC_FIXED)
> -				length = desc->length.fixed;
> -			else
> -				length = ((*cmd & desc->length.mask) + LENGTH_BIAS);
> -
>  			if (unlikely(cmd + length > page_end)) {
>  				if (unlikely(cmd + length > batch_end)) {
>  					DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X length=%u batchlen=%td\n",
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 8d939649b919..28d5bfceae3b 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2333,12 +2333,12 @@ struct drm_i915_cmd_descriptor {
>  	 *                  is the DRM master
>  	 */
>  	u32 flags;
> +#define CMD_DESC_SKIP     (0)
>  #define CMD_DESC_FIXED    (1<<0)
> -#define CMD_DESC_SKIP     (1<<1)
> -#define CMD_DESC_REJECT   (1<<2)
> -#define CMD_DESC_REGISTER (1<<3)
> -#define CMD_DESC_BITMASK  (1<<4)
> -#define CMD_DESC_MASTER   (1<<5)
> +#define CMD_DESC_REJECT   (1<<1)
> +#define CMD_DESC_REGISTER (1<<2)
> +#define CMD_DESC_BITMASK  (1<<3)
> +#define CMD_DESC_MASTER   (1<<4)
>  
>  	/*
>  	 * The command's unique identification bits and the bitmask to get them.
> -- 
> 2.6.2
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 6/6] drm/i915: Improve hash function for the command parser
  2015-11-20 10:56 ` [PATCH v2 6/6] drm/i915: Improve hash function for the command parser Chris Wilson
@ 2015-11-20 15:13   ` Ville Syrjälä
  0 siblings, 0 replies; 26+ messages in thread
From: Ville Syrjälä @ 2015-11-20 15:13 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Nov 20, 2015 at 10:56:01AM +0000, Chris Wilson wrote:
> The existing code's hashfunction is very suboptimal (most 3D commands
> use the same bucket degrading the hash to a long list). The code even
> acknowledge that the issue was known and the fix simple:

Do we have any statistics/some easy way to gather them to see how
the hash is performing?

> 
> /*
>  * If we attempt to generate a perfect hash, we should be able to look at bits
>  * 31:29 of a command from a batch buffer and use the full mask for that
>  * client. The existing INSTR_CLIENT_MASK/SHIFT defines can be used for this.
>  */
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_cmd_parser.c | 45 +++++++++++++++++++++-------------
>  1 file changed, 28 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index ea9df2bb87de..7a0250c1d201 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -86,24 +86,24 @@
>   * general bitmasking mechanism.
>   */
>  
> -#define STD_MI_OPCODE_MASK  0xFF800000
> -#define STD_3D_OPCODE_MASK  0xFFFF0000
> -#define STD_2D_OPCODE_MASK  0xFFC00000
> -#define STD_MFX_OPCODE_MASK 0xFFFF0000
> +#define STD_MI_OPCODE_SHIFT  (32 - 9)
> +#define STD_3D_OPCODE_SHIFT  (32 - 16)
> +#define STD_2D_OPCODE_SHIFT  (32 - 10)
> +#define STD_MFX_OPCODE_SHIFT (32 - 16)
>  
>  #define CMD(op, opm, f, lm, fl, ...)				\
>  	{							\
>  		.flags = (fl) | ((f) ? CMD_DESC_FIXED : 0),	\
> -		.cmd = { (op), (opm) },				\
> +		.cmd = { (op), ~0 << (opm) }, 			\
>  		.length = { (lm) },				\
>  		__VA_ARGS__					\
>  	}
>  
>  /* Convenience macros to compress the tables */
> -#define SMI STD_MI_OPCODE_MASK
> -#define S3D STD_3D_OPCODE_MASK
> -#define S2D STD_2D_OPCODE_MASK
> -#define SMFX STD_MFX_OPCODE_MASK
> +#define SMI STD_MI_OPCODE_SHIFT
> +#define S3D STD_3D_OPCODE_SHIFT
> +#define S2D STD_2D_OPCODE_SHIFT
> +#define SMFX STD_MFX_OPCODE_SHIFT
>  #define F true
>  #define S CMD_DESC_SKIP
>  #define R CMD_DESC_REJECT
> @@ -627,12 +627,24 @@ static bool validate_regs_sorted(struct intel_engine_cs *ring)
>   * non-opcode bits being set. But if we don't include those bits, some 3D
>   * commands may hash to the same bucket due to not including opcode bits that
>   * make the command unique. For now, we will risk hashing to the same bucket.
> - *
> - * If we attempt to generate a perfect hash, we should be able to look at bits
> - * 31:29 of a command from a batch buffer and use the full mask for that
> - * client. The existing INSTR_CLIENT_MASK/SHIFT defines can be used for this.
>   */
> -#define CMD_HASH_MASK STD_MI_OPCODE_MASK
> +static inline u32 cmd_header_key(u32 x)
> +{
> +	u32 shift;
> +	switch (x >> INSTR_CLIENT_SHIFT) {
> +	default:
> +	case INSTR_MI_CLIENT:
> +		shift = STD_MI_OPCODE_SHIFT;
> +		break;
> +	case INSTR_RC_CLIENT:
> +		shift = STD_3D_OPCODE_SHIFT;
> +		break;
> +	case INSTR_BC_CLIENT:
> +		shift = STD_2D_OPCODE_SHIFT;
> +		break;
> +	}
> +	return x >> shift;
> +}
>  
>  static int init_hash_table(struct intel_engine_cs *ring,
>  			   const struct drm_i915_cmd_table *cmd_tables,
> @@ -648,7 +660,7 @@ static int init_hash_table(struct intel_engine_cs *ring,
>  		for (j = 0; j < table->count; j++) {
>  			struct drm_i915_cmd_descriptor *desc = &table->table[j];
>  			hash_add(ring->cmd_hash, &desc->node[ring->id],
> -				 desc->cmd.value & CMD_HASH_MASK);
> +				 cmd_header_key(desc->cmd.value));
>  		}
>  	}
>  
> @@ -776,10 +788,9 @@ find_cmd_in_table(struct intel_engine_cs *ring,
>  	const struct drm_i915_cmd_descriptor *desc;
>  
>  	hash_for_each_possible(ring->cmd_hash, desc, node[ring->id],
> -			       cmd_header & CMD_HASH_MASK) {
> +			       cmd_header_key(cmd_header))
>  		if (((cmd_header ^ desc->cmd.value) & desc->cmd.mask) == 0)
>  			return desc;
> -	}
>  
>  	return NULL;
>  }
> -- 
> 2.6.2
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 3/6] drm/i915: Use WC copies on !llc platforms for the command parser
  2015-11-20 15:05   ` Ville Syrjälä
@ 2015-11-20 15:22     ` Chris Wilson
  2015-12-01 17:32       ` Ville Syrjälä
  0 siblings, 1 reply; 26+ messages in thread
From: Chris Wilson @ 2015-11-20 15:22 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

On Fri, Nov 20, 2015 at 05:05:05PM +0200, Ville Syrjälä wrote:
> On Fri, Nov 20, 2015 at 10:55:58AM +0000, Chris Wilson wrote:
> > Since we blow the TLB caches by using kmap/kunmap, we may as well go the
> > whole hog and see if declaring our destination page as WC is faster than
> > keeping it as WB and using clflush. It should be!
> 
> Is this description for another patch? I can't see any WC stuff in
> there.

No, just badly written. (Admittedly at one point I did experiment with
remapping the pages as WC and discovered that did stop_macheine()...)

drm/i915: Perform inline clflushes from the command parser

On incoherent architectures, we can avoid having to clflush the
destination shadow batch as a separate pass by inlining the call to
clflush whilst we already have the kmap_atomic() around for the page.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/6] drm/i915: Reduce pointer indirection during cmd parser lookup
  2015-11-20 10:56 ` [PATCH v2 5/6] drm/i915: Reduce pointer indirection " Chris Wilson
@ 2015-11-20 15:27   ` Ville Syrjälä
  2015-11-20 15:34     ` Chris Wilson
  2015-12-01 17:39     ` Ville Syrjälä
  0 siblings, 2 replies; 26+ messages in thread
From: Ville Syrjälä @ 2015-11-20 15:27 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Nov 20, 2015 at 10:56:00AM +0000, Chris Wilson wrote:
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_cmd_parser.c | 51 ++++++++--------------------------
>  drivers/gpu/drm/i915/i915_drv.h        |  4 ++-
>  2 files changed, 14 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index cfd07bfe6e75..ea9df2bb87de 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -113,7 +113,7 @@
>  
>  /*            Command                          Mask   Fixed Len   Action
>  	      ---------------------------------------------------------- */
> -static const struct drm_i915_cmd_descriptor common_cmds[] = {
> +static struct drm_i915_cmd_descriptor common_cmds[] = {

I'm a little sad to see the const gone. All this gets moved out of
rodata.

>  	CMD(  MI_NOOP,                          SMI,    F,  1,      S  ),
>  	CMD(  MI_USER_INTERRUPT,                SMI,    F,  1,      R  ),
>  	CMD(  MI_WAIT_FOR_EVENT,                SMI,    F,  1,      M  ),
> @@ -146,7 +146,7 @@ static const struct drm_i915_cmd_descriptor common_cmds[] = {
>  	CMD(  MI_BATCH_BUFFER_START,            SMI,   !F,  0xFF,   S  ),
>  };
>  
> -static const struct drm_i915_cmd_descriptor render_cmds[] = {
> +static struct drm_i915_cmd_descriptor render_cmds[] = {
>  	CMD(  MI_FLUSH,                         SMI,    F,  1,      S  ),
>  	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
>  	CMD(  MI_PREDICATE,                     SMI,    F,  1,      S  ),
> @@ -207,7 +207,7 @@ static const struct drm_i915_cmd_descriptor render_cmds[] = {
>  	      }},						       ),
>  };
>  
> -static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
> +static struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
>  	CMD(  MI_SET_PREDICATE,                 SMI,    F,  1,      S  ),
>  	CMD(  MI_RS_CONTROL,                    SMI,    F,  1,      S  ),
>  	CMD(  MI_URB_ATOMIC_ALLOC,              SMI,    F,  1,      S  ),
> @@ -229,7 +229,7 @@ static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
>  	CMD(  GFX_OP_3DSTATE_BINDING_TABLE_EDIT_PS,  S3D,   !F,  0x1FF,  S  ),
>  };
>  
> -static const struct drm_i915_cmd_descriptor video_cmds[] = {
> +static struct drm_i915_cmd_descriptor video_cmds[] = {
>  	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
>  	CMD(  MI_SET_APPID,                     SMI,    F,  1,      S  ),
>  	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   B,
> @@ -273,7 +273,7 @@ static const struct drm_i915_cmd_descriptor video_cmds[] = {
>  	CMD(  MFX_WAIT,                         SMFX,   F,  1,      S  ),
>  };
>  
> -static const struct drm_i915_cmd_descriptor vecs_cmds[] = {
> +static struct drm_i915_cmd_descriptor vecs_cmds[] = {
>  	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
>  	CMD(  MI_SET_APPID,                     SMI,    F,  1,      S  ),
>  	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   B,
> @@ -311,7 +311,7 @@ static const struct drm_i915_cmd_descriptor vecs_cmds[] = {
>  	      }},						       ),
>  };
>  
> -static const struct drm_i915_cmd_descriptor blt_cmds[] = {
> +static struct drm_i915_cmd_descriptor blt_cmds[] = {
>  	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   R  ),
>  	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0x3FF,  B,
>  	      .bits = {{
> @@ -344,7 +344,7 @@ static const struct drm_i915_cmd_descriptor blt_cmds[] = {
>  	CMD(  SRC_COPY_BLT,                     S2D,   !F,  0x3F,   S  ),
>  };
>  
> -static const struct drm_i915_cmd_descriptor hsw_blt_cmds[] = {
> +static struct drm_i915_cmd_descriptor hsw_blt_cmds[] = {
>  	CMD(  MI_LOAD_SCAN_LINES_INCL,          SMI,   !F,  0x3F,   M  ),
>  	CMD(  MI_LOAD_SCAN_LINES_EXCL,          SMI,   !F,  0x3F,   R  ),
>  };
> @@ -618,11 +618,6 @@ static bool validate_regs_sorted(struct intel_engine_cs *ring)
>  			     ring->master_reg_count);
>  }
>  
> -struct cmd_node {
> -	const struct drm_i915_cmd_descriptor *desc;
> -	struct hlist_node node;
> -};
> -
>  /*
>   * Different command ranges have different numbers of bits for the opcode. For
>   * example, MI commands use bits 31:23 while 3D commands use bits 31:16. The
> @@ -651,16 +646,8 @@ static int init_hash_table(struct intel_engine_cs *ring,
>  		const struct drm_i915_cmd_table *table = &cmd_tables[i];
>  
>  		for (j = 0; j < table->count; j++) {
> -			const struct drm_i915_cmd_descriptor *desc =
> -				&table->table[j];
> -			struct cmd_node *desc_node =
> -				kmalloc(sizeof(*desc_node), GFP_KERNEL);
> -
> -			if (!desc_node)
> -				return -ENOMEM;

init_hash_table() can no longer fail -> void?

> -
> -			desc_node->desc = desc;
> -			hash_add(ring->cmd_hash, &desc_node->node,
> +			struct drm_i915_cmd_descriptor *desc = &table->table[j];
> +			hash_add(ring->cmd_hash, &desc->node[ring->id],
>  				 desc->cmd.value & CMD_HASH_MASK);
>  		}
>  	}
> @@ -668,18 +655,6 @@ static int init_hash_table(struct intel_engine_cs *ring,
>  	return 0;
>  }
>  
> -static void fini_hash_table(struct intel_engine_cs *ring)
> -{
> -	struct hlist_node *tmp;
> -	struct cmd_node *desc_node;
> -	int i;
> -
> -	hash_for_each_safe(ring->cmd_hash, i, tmp, desc_node, node) {
> -		hash_del(&desc_node->node);
> -		kfree(desc_node);
> -	}
> -}
> -
>  /**
>   * i915_cmd_parser_init_ring() - set cmd parser related fields for a ringbuffer
>   * @ring: the ringbuffer to initialize
> @@ -770,7 +745,6 @@ int i915_cmd_parser_init_ring(struct intel_engine_cs *ring)
>  	ret = init_hash_table(ring, cmd_tables, cmd_table_count);
>  	if (ret) {
>  		DRM_ERROR("CMD: cmd_parser_init failed!\n");
> -		fini_hash_table(ring);
>  		return ret;
>  	}
>  
> @@ -790,8 +764,6 @@ void i915_cmd_parser_fini_ring(struct intel_engine_cs *ring)
>  {
>  	if (!ring->needs_cmd_parser)
>  		return;
> -
> -	fini_hash_table(ring);
>  }

i915_cmd_parser_fini_ring() is a nop now. Kill it?

>  
>  /*
> @@ -801,11 +773,10 @@ static const struct drm_i915_cmd_descriptor*
>  find_cmd_in_table(struct intel_engine_cs *ring,
>  		  u32 cmd_header)
>  {
> -	struct cmd_node *desc_node;
> +	const struct drm_i915_cmd_descriptor *desc;
>  
> -	hash_for_each_possible(ring->cmd_hash, desc_node, node,
> +	hash_for_each_possible(ring->cmd_hash, desc, node[ring->id],
>  			       cmd_header & CMD_HASH_MASK) {
> -		const struct drm_i915_cmd_descriptor *desc = desc_node->desc;
>  		if (((cmd_header ^ desc->cmd.value) & desc->cmd.mask) == 0)
>  			return desc;

At least we still return this as const, so the caller can't accidentally
clobber it, even if we lose the rodata protection.

Apart from the int vs. void thing and the nop
i915_cmd_parser_fini_ring() the patch looks fine to me.

>  	}
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 28d5bfceae3b..5960f76f1438 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2396,6 +2396,8 @@ struct drm_i915_cmd_descriptor {
>  		u32 condition_offset;
>  		u32 condition_mask;
>  	} bits[MAX_CMD_DESC_BITMASKS];
> +
> +	struct hlist_node node[I915_NUM_RINGS];
>  };
>  
>  /*
> @@ -2405,7 +2407,7 @@ struct drm_i915_cmd_descriptor {
>   * descriptors, which must be sorted with command opcodes in ascending order.
>   */
>  struct drm_i915_cmd_table {
> -	const struct drm_i915_cmd_descriptor *table;
> +	struct drm_i915_cmd_descriptor *table;
>  	int count;
>  };
>  
> -- 
> 2.6.2
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v3] drm/i915: Eliminate vmap overhead for cmd parser
  2015-11-20 14:41   ` Ville Syrjälä
  2015-11-20 14:52     ` Chris Wilson
@ 2015-11-20 15:31     ` Chris Wilson
  2015-11-25 19:51       ` Ville Syrjälä
  1 sibling, 1 reply; 26+ messages in thread
From: Chris Wilson @ 2015-11-20 15:31 UTC (permalink / raw)
  To: intel-gfx

With a little complexity to handle cmds straddling page boundaries, we
can completely avoiding having to vmap the batch and the shadow batch
objects whilst running the command parser.

On ivb i7-3720MQ:

x11perf -dot before 54.3M, after 53.2M (max 203M)
glxgears before 7110 fps, after 7300 fps (max 7860 fps)

Before:
Time to blt 16384 bytes x      1:	 12.400µs, 1.2GiB/s
Time to blt 16384 bytes x   4096:	  3.055µs, 5.0GiB/s

After:
Time to blt 16384 bytes x      1:	  8.600µs, 1.8GiB/s
Time to blt 16384 bytes x   4096:	  2.456µs, 6.2GiB/s

Removing the vmap is mostly a win, except we lose in a few cases where
the batch size is greater than a page due to the extra complexity (loss
of a simple cache efficient large copy, and boundary handling).

v2: Reorder so that we do check oacontrol remaining set at end-of-batch
v3: Stage the copy through a temporary page so that we only copy into
dst commands that have been verified.
v4: Move the batch offset/length validation to execbuf.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 258 ++++++++++++++++-----------------
 1 file changed, 127 insertions(+), 131 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 814d894ed925..86910e17505a 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -860,100 +860,6 @@ find_reg(const struct drm_i915_reg_descriptor *table,
 	return NULL;
 }
 
-static u32 *vmap_batch(struct drm_i915_gem_object *obj,
-		       unsigned start, unsigned len)
-{
-	int i;
-	void *addr = NULL;
-	struct sg_page_iter sg_iter;
-	int first_page = start >> PAGE_SHIFT;
-	int last_page = (len + start + 4095) >> PAGE_SHIFT;
-	int npages = last_page - first_page;
-	struct page **pages;
-
-	pages = drm_malloc_ab(npages, sizeof(*pages));
-	if (pages == NULL) {
-		DRM_DEBUG_DRIVER("Failed to get space for pages\n");
-		goto finish;
-	}
-
-	i = 0;
-	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, first_page) {
-		pages[i++] = sg_page_iter_page(&sg_iter);
-		if (i == npages)
-			break;
-	}
-
-	addr = vmap(pages, i, 0, PAGE_KERNEL);
-	if (addr == NULL) {
-		DRM_DEBUG_DRIVER("Failed to vmap pages\n");
-		goto finish;
-	}
-
-finish:
-	if (pages)
-		drm_free_large(pages);
-	return (u32*)addr;
-}
-
-/* Returns a vmap'd pointer to dest_obj, which the caller must unmap */
-static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
-		       struct drm_i915_gem_object *src_obj,
-		       u32 batch_start_offset,
-		       u32 batch_len)
-{
-	int needs_clflush = 0;
-	void *src_base, *src;
-	void *dst = NULL;
-	int ret;
-
-	if (batch_len > dest_obj->base.size ||
-	    batch_len + batch_start_offset > src_obj->base.size)
-		return ERR_PTR(-E2BIG);
-
-	if (WARN_ON(dest_obj->pages_pin_count == 0))
-		return ERR_PTR(-ENODEV);
-
-	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
-	if (ret) {
-		DRM_DEBUG_DRIVER("CMD: failed to prepare shadow batch\n");
-		return ERR_PTR(ret);
-	}
-
-	src_base = vmap_batch(src_obj, batch_start_offset, batch_len);
-	if (!src_base) {
-		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
-		ret = -ENOMEM;
-		goto unpin_src;
-	}
-
-	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
-	if (ret) {
-		DRM_DEBUG_DRIVER("CMD: Failed to set shadow batch to CPU\n");
-		goto unmap_src;
-	}
-
-	dst = vmap_batch(dest_obj, 0, batch_len);
-	if (!dst) {
-		DRM_DEBUG_DRIVER("CMD: Failed to vmap shadow batch\n");
-		ret = -ENOMEM;
-		goto unmap_src;
-	}
-
-	src = src_base + offset_in_page(batch_start_offset);
-	if (needs_clflush)
-		drm_clflush_virt_range(src, batch_len);
-
-	memcpy(dst, src, batch_len);
-
-unmap_src:
-	vunmap(src_base);
-unpin_src:
-	i915_gem_object_unpin_pages(src_obj);
-
-	return ret ? ERR_PTR(ret) : dst;
-}
-
 /**
  * i915_needs_cmd_parser() - should a given ring use software command parsing?
  * @ring: the ring in question
@@ -1097,6 +1003,7 @@ static bool check_cmd(const struct intel_engine_cs *ring,
 }
 
 #define LENGTH_BIAS 2
+#define MAX_PARTIAL 256
 
 /**
  * i915_parse_cmds() - parse a submitted batch buffer for privilege violations
@@ -1120,39 +1027,91 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 		    u32 batch_len,
 		    bool is_master)
 {
-	u32 *cmd, *batch_base, *batch_end;
+	const struct drm_i915_cmd_descriptor *desc;
+	unsigned dst_iter, src_iter;
+	int needs_clflush = 0;
+	struct get_page rewind;
+	void *src, *dst, *tmp;
+	u32 partial, length;
+	unsigned in, out;
 	struct drm_i915_cmd_descriptor default_desc = { 0 };
 	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
-	int ret = 0;
+	int ret;
 
-	batch_base = copy_batch(shadow_batch_obj, batch_obj,
-				batch_start_offset, batch_len);
-	if (IS_ERR(batch_base)) {
-		DRM_DEBUG_DRIVER("CMD: Failed to copy batch\n");
-		return PTR_ERR(batch_base);
+	if (WARN_ON(shadow_batch_obj->pages_pin_count == 0))
+		return -ENODEV;
+
+	ret = i915_gem_obj_prepare_shmem_read(batch_obj, &needs_clflush);
+	if (ret) {
+		DRM_DEBUG_DRIVER("CMD: failed to prepare shadow batch\n");
+		return ret;
 	}
 
-	/*
-	 * We use the batch length as size because the shadow object is as
-	 * large or larger and copy_batch() will write MI_NOPs to the extra
-	 * space. Parsing should be faster in some cases this way.
+	ret = i915_gem_object_set_to_cpu_domain(shadow_batch_obj, true);
+	if (ret) {
+		DRM_DEBUG_DRIVER("CMD: Failed to set shadow batch to CPU\n");
+		goto unpin;
+	}
+
+	tmp = kmalloc(PAGE_SIZE + MAX_PARTIAL, GFP_TEMPORARY);
+	if (tmp == NULL)
+		return -ENOMEM;
+
+	/* Keep the original cached iterator around as that is likely
+	 * to be more useful in future ioctls.
 	 */
-	batch_end = batch_base + (batch_len / sizeof(*batch_end));
+	rewind = batch_obj->get_page;
 
-	cmd = batch_base;
-	while (cmd < batch_end) {
-		const struct drm_i915_cmd_descriptor *desc;
-		u32 length;
+	ret = -EINVAL;
+
+	dst_iter = 0;
+	dst = kmap_atomic(i915_gem_object_get_page(shadow_batch_obj, dst_iter));
+	out = 0;
+
+	in = offset_in_page(batch_start_offset);
+	partial = 0;
+	for (src_iter = batch_start_offset >> PAGE_SHIFT;
+	     src_iter < batch_obj->base.size >> PAGE_SHIFT;
+	     src_iter++) {
+		u32 *cmd, *page_end, *batch_end;
+		u32 this;
+
+		this = batch_len;
+		if (this > PAGE_SIZE - in)
+			this = PAGE_SIZE - in;
 
-		if (*cmd == MI_BATCH_BUFFER_END)
+		src = kmap_atomic(i915_gem_object_get_page(batch_obj, src_iter));
+		if (needs_clflush)
+			drm_clflush_virt_range(src + in, this);
+
+		if (this == PAGE_SIZE && partial == 0)
+			copy_page(tmp, src);
+		else
+			memcpy(tmp+partial, src+in, this);
+
+		cmd = tmp;
+		page_end = tmp + this + partial;
+		batch_end = tmp + batch_len + partial;
+		partial = 0;
+
+		do {
+			if (*cmd == MI_BATCH_BUFFER_END) {
+				if (oacontrol_set) {
+					DRM_DEBUG_DRIVER("CMD: batch set OACONTROL but did not clear it\n");
+					goto unmap;
+				}
+
+				cmd++; /* copy this cmd to dst */
+				batch_len = this; /* no more src */
+				ret = 0;
 				break;
+			}
 
 			desc = find_cmd(ring, *cmd, &default_desc);
 			if (!desc) {
 				DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
 						 *cmd);
-			ret = -EINVAL;
-			break;
+				goto unmap;
 			}
 
 			/*
@@ -1162,7 +1121,7 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 			 */
 			if (desc->cmd.value == MI_BATCH_BUFFER_START) {
 				ret = -EACCES;
-			break;
+				goto unmap;
 			}
 
 			if (desc->flags & CMD_DESC_FIXED)
@@ -1170,36 +1129,73 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 			else
 				length = ((*cmd & desc->length.mask) + LENGTH_BIAS);
 
-		if ((batch_end - cmd) < length) {
+			if (unlikely(cmd + length > page_end)) {
+				if (unlikely(cmd + length > batch_end)) {
 					DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X length=%u batchlen=%td\n",
-					 *cmd,
-					 length,
-					 batch_end - cmd);
-			ret = -EINVAL;
-			break;
+							 *cmd, length, batch_end - cmd);
+					goto unmap;
 				}
 
-		if (!check_cmd(ring, desc, cmd, length, is_master,
-			       &oacontrol_set)) {
-			ret = -EINVAL;
+				if (WARN_ON(length > MAX_PARTIAL)) {
+					ret = -ENODEV;
+					goto unmap;
+				}
+
+				partial = 4*(page_end - cmd);
 				break;
 			}
 
+			if (!check_cmd(ring, desc, cmd, length, is_master,
+				       &oacontrol_set))
+				goto unmap;
+
 			cmd += length;
-	}
+		} while (cmd < page_end);
 
-	if (oacontrol_set) {
-		DRM_DEBUG_DRIVER("CMD: batch set OACONTROL but did not clear it\n");
-		ret = -EINVAL;
+		/* Copy the validated page to the GPU batch */
+		{
+			/* exclude partial cmd, we copy it next time */
+			unsigned dst_length = (void *)cmd - tmp;
+			in = 0;
+			do {
+				int len;
+
+				if (out == PAGE_SIZE) {
+					kunmap_atomic(dst);
+					dst = kmap_atomic(i915_gem_object_get_page(shadow_batch_obj, ++dst_iter));
+					out = 0;
 				}
 
-	if (cmd >= batch_end) {
-		DRM_DEBUG_DRIVER("CMD: Got to the end of the buffer w/o a BBE cmd!\n");
-		ret = -EINVAL;
+				len = dst_length;
+				if (len + out > PAGE_SIZE)
+					len = PAGE_SIZE - out;
+				if (len == PAGE_SIZE)
+					copy_page(dst, tmp + in);
+				else
+					memcpy(dst + out, tmp + in, len);
+				in += len;
+				out += len;
+				dst_length -= len;
+			} while (dst_length);
 		}
 
-	vunmap(batch_base);
+		batch_len -= this;
+		if (batch_len == 0)
+			break;
 
+		if (partial)
+			memmove(tmp, cmd, partial);
+
+		kunmap_atomic(src);
+		in = 0;
+	}
+unmap:
+	kunmap_atomic(src);
+	kunmap_atomic(dst);
+	batch_obj->get_page = rewind;
+	kfree(tmp);
+unpin:
+	i915_gem_object_unpin_pages(batch_obj);
 	return ret;
 }
 
-- 
2.6.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/6] drm/i915: Reduce pointer indirection during cmd parser lookup
  2015-11-20 15:27   ` Ville Syrjälä
@ 2015-11-20 15:34     ` Chris Wilson
  2015-11-20 15:47       ` Ville Syrjälä
  2015-12-01 17:39     ` Ville Syrjälä
  1 sibling, 1 reply; 26+ messages in thread
From: Chris Wilson @ 2015-11-20 15:34 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

On Fri, Nov 20, 2015 at 05:27:43PM +0200, Ville Syrjälä wrote:
> On Fri, Nov 20, 2015 at 10:56:00AM +0000, Chris Wilson wrote:
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_cmd_parser.c | 51 ++++++++--------------------------
> >  drivers/gpu/drm/i915/i915_drv.h        |  4 ++-
> >  2 files changed, 14 insertions(+), 41 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > index cfd07bfe6e75..ea9df2bb87de 100644
> > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > @@ -113,7 +113,7 @@
> >  
> >  /*            Command                          Mask   Fixed Len   Action
> >  	      ---------------------------------------------------------- */
> > -static const struct drm_i915_cmd_descriptor common_cmds[] = {
> > +static struct drm_i915_cmd_descriptor common_cmds[] = {
> 
> I'm a little sad to see the const gone. All this gets moved out of
> rodata.

I'm open to having this refused. Pretty much both of 4 & 5 can be
ignored after reducing the number of hash collisions in 6. Though not
having to duplicate the memory for the hash tables is nice.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/6] drm/i915: Cache last cmd descriptor when parsing
  2015-11-20 15:08   ` Ville Syrjälä
@ 2015-11-20 15:44     ` Chris Wilson
  0 siblings, 0 replies; 26+ messages in thread
From: Chris Wilson @ 2015-11-20 15:44 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

On Fri, Nov 20, 2015 at 05:08:06PM +0200, Ville Syrjälä wrote:
> On Fri, Nov 20, 2015 at 10:55:57AM +0000, Chris Wilson wrote:
> > +				desc = find_cmd_in_table(ring, *cmd);
> > +				if (desc) {
> > +					if (unlikely(desc->flags & CMD_DESC_REJECT)) {
> > +						DRM_DEBUG_DRIVER("CMD: Rejected command: 0x%08X\n", *cmd);
> > +						goto unmap;
> > +					}
> 
> Yikes. More indenttion. It's getting really deep here. Any sane way to
> reduce it?

Moving the parsing out to a helper and some of the state to an on-stack
struct and then let gcc fold it back in. I was being lazy, just letting
vim reindent the block is far easier!
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/6] drm/i915: Reduce pointer indirection during cmd parser lookup
  2015-11-20 15:34     ` Chris Wilson
@ 2015-11-20 15:47       ` Ville Syrjälä
  2015-11-23  8:09         ` Jani Nikula
  0 siblings, 1 reply; 26+ messages in thread
From: Ville Syrjälä @ 2015-11-20 15:47 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On Fri, Nov 20, 2015 at 03:34:22PM +0000, Chris Wilson wrote:
> On Fri, Nov 20, 2015 at 05:27:43PM +0200, Ville Syrjälä wrote:
> > On Fri, Nov 20, 2015 at 10:56:00AM +0000, Chris Wilson wrote:
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > ---
> > >  drivers/gpu/drm/i915/i915_cmd_parser.c | 51 ++++++++--------------------------
> > >  drivers/gpu/drm/i915/i915_drv.h        |  4 ++-
> > >  2 files changed, 14 insertions(+), 41 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > > index cfd07bfe6e75..ea9df2bb87de 100644
> > > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> > > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > > @@ -113,7 +113,7 @@
> > >  
> > >  /*            Command                          Mask   Fixed Len   Action
> > >  	      ---------------------------------------------------------- */
> > > -static const struct drm_i915_cmd_descriptor common_cmds[] = {
> > > +static struct drm_i915_cmd_descriptor common_cmds[] = {
> > 
> > I'm a little sad to see the const gone. All this gets moved out of
> > rodata.
> 
> I'm open to having this refused. Pretty much both of 4 & 5 can be
> ignored after reducing the number of hash collisions in 6. Though not
> having to duplicate the memory for the hash tables is nice.

I don't have a strong opinion either way. On one hand rodata is nice,
on the other hand simplicity from avoiding the kmalloc()s is nice.

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/6] drm/i915: Reduce pointer indirection during cmd parser lookup
  2015-11-20 15:47       ` Ville Syrjälä
@ 2015-11-23  8:09         ` Jani Nikula
  0 siblings, 0 replies; 26+ messages in thread
From: Jani Nikula @ 2015-11-23  8:09 UTC (permalink / raw)
  To: Ville Syrjälä, Chris Wilson, intel-gfx

On Fri, 20 Nov 2015, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
> On Fri, Nov 20, 2015 at 03:34:22PM +0000, Chris Wilson wrote:
>> On Fri, Nov 20, 2015 at 05:27:43PM +0200, Ville Syrjälä wrote:
>> > On Fri, Nov 20, 2015 at 10:56:00AM +0000, Chris Wilson wrote:
>> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > > ---
>> > >  drivers/gpu/drm/i915/i915_cmd_parser.c | 51 ++++++++--------------------------
>> > >  drivers/gpu/drm/i915/i915_drv.h        |  4 ++-
>> > >  2 files changed, 14 insertions(+), 41 deletions(-)
>> > > 
>> > > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
>> > > index cfd07bfe6e75..ea9df2bb87de 100644
>> > > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
>> > > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
>> > > @@ -113,7 +113,7 @@
>> > >  
>> > >  /*            Command                          Mask   Fixed Len   Action
>> > >  	      ---------------------------------------------------------- */
>> > > -static const struct drm_i915_cmd_descriptor common_cmds[] = {
>> > > +static struct drm_i915_cmd_descriptor common_cmds[] = {
>> > 
>> > I'm a little sad to see the const gone. All this gets moved out of
>> > rodata.
>> 
>> I'm open to having this refused. Pretty much both of 4 & 5 can be
>> ignored after reducing the number of hash collisions in 6. Though not
>> having to duplicate the memory for the hash tables is nice.
>
> I don't have a strong opinion either way. On one hand rodata is nice,
> on the other hand simplicity from avoiding the kmalloc()s is nice.

One could allocate (by kmalloc or statically) an array of struct
cmd_node instead of each of them individually, however IIUC that was not
the main optimization here but rather the reduction of the pointer
chasing.

BR,
Jani.




-- 
Jani Nikula, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3] drm/i915: Eliminate vmap overhead for cmd parser
  2015-11-20 15:31     ` [PATCH v3] " Chris Wilson
@ 2015-11-25 19:51       ` Ville Syrjälä
  2015-11-25 20:13         ` Chris Wilson
  0 siblings, 1 reply; 26+ messages in thread
From: Ville Syrjälä @ 2015-11-25 19:51 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Nov 20, 2015 at 03:31:23PM +0000, Chris Wilson wrote:
> With a little complexity to handle cmds straddling page boundaries, we
> can completely avoiding having to vmap the batch and the shadow batch
> objects whilst running the command parser.
> 
> On ivb i7-3720MQ:
> 
> x11perf -dot before 54.3M, after 53.2M (max 203M)
> glxgears before 7110 fps, after 7300 fps (max 7860 fps)
> 
> Before:
> Time to blt 16384 bytes x      1:	 12.400µs, 1.2GiB/s
> Time to blt 16384 bytes x   4096:	  3.055µs, 5.0GiB/s
> 
> After:
> Time to blt 16384 bytes x      1:	  8.600µs, 1.8GiB/s
> Time to blt 16384 bytes x   4096:	  2.456µs, 6.2GiB/s
> 
> Removing the vmap is mostly a win, except we lose in a few cases where
> the batch size is greater than a page due to the extra complexity (loss
> of a simple cache efficient large copy, and boundary handling).
> 
> v2: Reorder so that we do check oacontrol remaining set at end-of-batch
> v3: Stage the copy through a temporary page so that we only copy into
> dst commands that have been verified.
> v4: Move the batch offset/length validation to execbuf.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_cmd_parser.c | 258 ++++++++++++++++-----------------
>  1 file changed, 127 insertions(+), 131 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index 814d894ed925..86910e17505a 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -860,100 +860,6 @@ find_reg(const struct drm_i915_reg_descriptor *table,
>  	return NULL;
>  }
>  
> -static u32 *vmap_batch(struct drm_i915_gem_object *obj,
> -		       unsigned start, unsigned len)
> -{
> -	int i;
> -	void *addr = NULL;
> -	struct sg_page_iter sg_iter;
> -	int first_page = start >> PAGE_SHIFT;
> -	int last_page = (len + start + 4095) >> PAGE_SHIFT;
> -	int npages = last_page - first_page;
> -	struct page **pages;
> -
> -	pages = drm_malloc_ab(npages, sizeof(*pages));
> -	if (pages == NULL) {
> -		DRM_DEBUG_DRIVER("Failed to get space for pages\n");
> -		goto finish;
> -	}
> -
> -	i = 0;
> -	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, first_page) {
> -		pages[i++] = sg_page_iter_page(&sg_iter);
> -		if (i == npages)
> -			break;
> -	}
> -
> -	addr = vmap(pages, i, 0, PAGE_KERNEL);
> -	if (addr == NULL) {
> -		DRM_DEBUG_DRIVER("Failed to vmap pages\n");
> -		goto finish;
> -	}
> -
> -finish:
> -	if (pages)
> -		drm_free_large(pages);
> -	return (u32*)addr;
> -}
> -
> -/* Returns a vmap'd pointer to dest_obj, which the caller must unmap */
> -static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
> -		       struct drm_i915_gem_object *src_obj,
> -		       u32 batch_start_offset,
> -		       u32 batch_len)
> -{
> -	int needs_clflush = 0;
> -	void *src_base, *src;
> -	void *dst = NULL;
> -	int ret;
> -
> -	if (batch_len > dest_obj->base.size ||
> -	    batch_len + batch_start_offset > src_obj->base.size)
> -		return ERR_PTR(-E2BIG);
> -
> -	if (WARN_ON(dest_obj->pages_pin_count == 0))
> -		return ERR_PTR(-ENODEV);
> -
> -	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
> -	if (ret) {
> -		DRM_DEBUG_DRIVER("CMD: failed to prepare shadow batch\n");
> -		return ERR_PTR(ret);
> -	}
> -
> -	src_base = vmap_batch(src_obj, batch_start_offset, batch_len);
> -	if (!src_base) {
> -		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
> -		ret = -ENOMEM;
> -		goto unpin_src;
> -	}
> -
> -	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
> -	if (ret) {
> -		DRM_DEBUG_DRIVER("CMD: Failed to set shadow batch to CPU\n");
> -		goto unmap_src;
> -	}
> -
> -	dst = vmap_batch(dest_obj, 0, batch_len);
> -	if (!dst) {
> -		DRM_DEBUG_DRIVER("CMD: Failed to vmap shadow batch\n");
> -		ret = -ENOMEM;
> -		goto unmap_src;
> -	}
> -
> -	src = src_base + offset_in_page(batch_start_offset);
> -	if (needs_clflush)
> -		drm_clflush_virt_range(src, batch_len);
> -
> -	memcpy(dst, src, batch_len);
> -
> -unmap_src:
> -	vunmap(src_base);
> -unpin_src:
> -	i915_gem_object_unpin_pages(src_obj);
> -
> -	return ret ? ERR_PTR(ret) : dst;
> -}
> -
>  /**
>   * i915_needs_cmd_parser() - should a given ring use software command parsing?
>   * @ring: the ring in question
> @@ -1097,6 +1003,7 @@ static bool check_cmd(const struct intel_engine_cs *ring,
>  }
>  
>  #define LENGTH_BIAS 2
> +#define MAX_PARTIAL 256

There seems to some confusion whether this is bytes or dwords.

Also I guess we already end up allocating two pages anyway, so
maybe MAX_PARTIAL should just be one page? It's still not big
enough to cover the max legal cmd length AFAICS, so I think
the WARN in the check needs to be removed.

>  
>  /**
>   * i915_parse_cmds() - parse a submitted batch buffer for privilege violations
> @@ -1120,39 +1027,91 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
>  		    u32 batch_len,
>  		    bool is_master)
>  {
> -	u32 *cmd, *batch_base, *batch_end;
> +	const struct drm_i915_cmd_descriptor *desc;
> +	unsigned dst_iter, src_iter;
> +	int needs_clflush = 0;
> +	struct get_page rewind;
> +	void *src, *dst, *tmp;
> +	u32 partial, length;
> +	unsigned in, out;
>  	struct drm_i915_cmd_descriptor default_desc = { 0 };
>  	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
> -	int ret = 0;
> +	int ret;
>  
> -	batch_base = copy_batch(shadow_batch_obj, batch_obj,
> -				batch_start_offset, batch_len);
> -	if (IS_ERR(batch_base)) {
> -		DRM_DEBUG_DRIVER("CMD: Failed to copy batch\n");
> -		return PTR_ERR(batch_base);
> +	if (WARN_ON(shadow_batch_obj->pages_pin_count == 0))
> +		return -ENODEV;
> +
> +	ret = i915_gem_obj_prepare_shmem_read(batch_obj, &needs_clflush);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("CMD: failed to prepare shadow batch\n");
> +		return ret;
>  	}
>  
> -	/*
> -	 * We use the batch length as size because the shadow object is as
> -	 * large or larger and copy_batch() will write MI_NOPs to the extra
> -	 * space. Parsing should be faster in some cases this way.
> +	ret = i915_gem_object_set_to_cpu_domain(shadow_batch_obj, true);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("CMD: Failed to set shadow batch to CPU\n");
> +		goto unpin;
> +	}
> +
> +	tmp = kmalloc(PAGE_SIZE + MAX_PARTIAL, GFP_TEMPORARY);
> +	if (tmp == NULL)
> +		return -ENOMEM;
> +
> +	/* Keep the original cached iterator around as that is likely
> +	 * to be more useful in future ioctls.
>  	 */
> -	batch_end = batch_base + (batch_len / sizeof(*batch_end));
> +	rewind = batch_obj->get_page;
>  
> -	cmd = batch_base;
> -	while (cmd < batch_end) {
> -		const struct drm_i915_cmd_descriptor *desc;
> -		u32 length;
> +	ret = -EINVAL;
> +
> +	dst_iter = 0;
> +	dst = kmap_atomic(i915_gem_object_get_page(shadow_batch_obj, dst_iter));
> +	out = 0;
> +
> +	in = offset_in_page(batch_start_offset);
> +	partial = 0;
> +	for (src_iter = batch_start_offset >> PAGE_SHIFT;
> +	     src_iter < batch_obj->base.size >> PAGE_SHIFT;
> +	     src_iter++) {

So we're iterating over all the pages. Should be enough to iterate
until batch_start_offset+batch_len I suppose, but as long as we bail
out when we run out of batch it should be fine.

I see there's a batch_len check at the end, but I don't see us handling
the case when the user already gives us something with batch_len==0.
Maybe that should be rejected somewhere higher up?

Also what happens if we don't find MI_BATCH_BUFFER_END before running
out of batch? Oh, I see, we set ret=-EINVAL, and clear it to 0 when we
find MI_BATCH_BUFFER_END. So that part seems to be fine.

> +		u32 *cmd, *page_end, *batch_end;
> +		u32 this;
> +
> +		this = batch_len;

I was a bit concerned about batch_len & 3, but we already check for
batch_len&7==0 in i915_gem_check_execbuffer(), so it should be good here.

> +		if (this > PAGE_SIZE - in)
> +			this = PAGE_SIZE - in;
>  
> -		if (*cmd == MI_BATCH_BUFFER_END)
> +		src = kmap_atomic(i915_gem_object_get_page(batch_obj, src_iter));
> +		if (needs_clflush)
> +			drm_clflush_virt_range(src + in, this);
> +
> +		if (this == PAGE_SIZE && partial == 0)
> +			copy_page(tmp, src);
> +		else
> +			memcpy(tmp+partial, src+in, this);
> +
> +		cmd = tmp;
> +		page_end = tmp + this + partial;
> +		batch_end = tmp + batch_len + partial;
> +		partial = 0;
> +
> +		do {
> +			if (*cmd == MI_BATCH_BUFFER_END) {
> +				if (oacontrol_set) {
> +					DRM_DEBUG_DRIVER("CMD: batch set OACONTROL but did not clear it\n");
> +					goto unmap;
> +				}
> +
> +				cmd++; /* copy this cmd to dst */
> +				batch_len = this; /* no more src */
> +				ret = 0;
>  				break;
> +			}
>  
>  			desc = find_cmd(ring, *cmd, &default_desc);
>  			if (!desc) {
>  				DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
>  						 *cmd);
> -			ret = -EINVAL;
> -			break;
> +				goto unmap;
>  			}
>  
>  			/*
> @@ -1162,7 +1121,7 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
>  			 */
>  			if (desc->cmd.value == MI_BATCH_BUFFER_START) {
>  				ret = -EACCES;

Bleh. I wonder why we even bother with this thing on VLV/IVB when
it's that easy to bypass...

> -			break;
> +				goto unmap;
>  			}
>  
>  			if (desc->flags & CMD_DESC_FIXED)
> @@ -1170,36 +1129,73 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
>  			else
>  				length = ((*cmd & desc->length.mask) + LENGTH_BIAS);
>  
> -		if ((batch_end - cmd) < length) {
> +			if (unlikely(cmd + length > page_end)) {
> +				if (unlikely(cmd + length > batch_end)) {
>  					DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X length=%u batchlen=%td\n",
> -					 *cmd,
> -					 length,
> -					 batch_end - cmd);
> -			ret = -EINVAL;
> -			break;
> +							 *cmd, length, batch_end - cmd);
> +					goto unmap;
>  				}
>  
> -		if (!check_cmd(ring, desc, cmd, length, is_master,
> -			       &oacontrol_set)) {
> -			ret = -EINVAL;
> +				if (WARN_ON(length > MAX_PARTIAL)) {
> +					ret = -ENODEV;
> +					goto unmap;
> +				}
> +
> +				partial = 4*(page_end - cmd);
>  				break;
>  			}
>  
> +			if (!check_cmd(ring, desc, cmd, length, is_master,
> +				       &oacontrol_set))
> +				goto unmap;
> +
>  			cmd += length;
> -	}
> +		} while (cmd < page_end);
>  
> -	if (oacontrol_set) {
> -		DRM_DEBUG_DRIVER("CMD: batch set OACONTROL but did not clear it\n");
> -		ret = -EINVAL;
> +		/* Copy the validated page to the GPU batch */
> +		{
> +			/* exclude partial cmd, we copy it next time */
> +			unsigned dst_length = (void *)cmd - tmp;
> +			in = 0;
> +			do {
> +				int len;
> +
> +				if (out == PAGE_SIZE) {
> +					kunmap_atomic(dst);
> +					dst = kmap_atomic(i915_gem_object_get_page(shadow_batch_obj, ++dst_iter));
> +					out = 0;
>  				}
>  
> -	if (cmd >= batch_end) {
> -		DRM_DEBUG_DRIVER("CMD: Got to the end of the buffer w/o a BBE cmd!\n");
> -		ret = -EINVAL;
> +				len = dst_length;
> +				if (len + out > PAGE_SIZE)
> +					len = PAGE_SIZE - out;
> +				if (len == PAGE_SIZE)
> +					copy_page(dst, tmp + in);
> +				else
> +					memcpy(dst + out, tmp + in, len);
> +				in += len;
> +				out += len;
> +				dst_length -= len;
> +			} while (dst_length);
>  		}
>  
> -	vunmap(batch_base);
> +		batch_len -= this;
> +		if (batch_len == 0)
> +			break;
>  
> +		if (partial)
> +			memmove(tmp, cmd, partial);
> +
> +		kunmap_atomic(src);
> +		in = 0;
> +	}
> +unmap:
> +	kunmap_atomic(src);
> +	kunmap_atomic(dst);
> +	batch_obj->get_page = rewind;
> +	kfree(tmp);
> +unpin:
> +	i915_gem_object_unpin_pages(batch_obj);
>  	return ret;
>  }
>  
> -- 
> 2.6.2

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3] drm/i915: Eliminate vmap overhead for cmd parser
  2015-11-25 19:51       ` Ville Syrjälä
@ 2015-11-25 20:13         ` Chris Wilson
  2015-11-25 21:15           ` Ville Syrjälä
  0 siblings, 1 reply; 26+ messages in thread
From: Chris Wilson @ 2015-11-25 20:13 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

On Wed, Nov 25, 2015 at 09:51:08PM +0200, Ville Syrjälä wrote:
> On Fri, Nov 20, 2015 at 03:31:23PM +0000, Chris Wilson wrote:
> > @@ -1097,6 +1003,7 @@ static bool check_cmd(const struct intel_engine_cs *ring,
> >  }
> >  
> >  #define LENGTH_BIAS 2
> > +#define MAX_PARTIAL 256
> 
> There seems to some confusion whether this is bytes or dwords.

Indeed, I can't remember of the top of my head.

(Double checked that the set of commands that I was thinking were 132
bytes.)
 
> Also I guess we already end up allocating two pages anyway, so
> maybe MAX_PARTIAL should just be one page? It's still not big
> enough to cover the max legal cmd length AFAICS, so I think
> the WARN in the check needs to be removed.

Sure, rounding up the next 8192 byte slab cache doesn't seem like it
will bite us.

So #define MAX_PARTIAL_BYTES PAGE_SIZE

> > +	in = offset_in_page(batch_start_offset);
> > +	partial = 0;
> > +	for (src_iter = batch_start_offset >> PAGE_SHIFT;
> > +	     src_iter < batch_obj->base.size >> PAGE_SHIFT;
> > +	     src_iter++) {
> 
> So we're iterating over all the pages. Should be enough to iterate
> until batch_start_offset+batch_len I suppose, but as long as we bail
> out when we run out of batch it should be fine.

Right, this was mostly convenience for writing the loop bounds - it was
more or less a simple conversion from the old iterator.

> I see there's a batch_len check at the end, but I don't see us handling
> the case when the user already gives us something with batch_len==0.
> Maybe that should be rejected somewhere higher up?

batch_len = 0 is filtered out in the caller...
 
> Also what happens if we don't find MI_BATCH_BUFFER_END before running
> out of batch? Oh, I see, we set ret=-EINVAL, and clear it to 0 when we
> find MI_BATCH_BUFFER_END. So that part seems to be fine.
> 
> > +		u32 *cmd, *page_end, *batch_end;
> > +		u32 this;
> > +
> > +		this = batch_len;
> 
> I was a bit concerned about batch_len & 3, but we already check for
> batch_len&7==0 in i915_gem_check_execbuffer(), so it should be good here.

cmdparser_assert(batch_len > 0 && (batch_len & 3) == 0);

as documentation for the contract?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3] drm/i915: Eliminate vmap overhead for cmd parser
  2015-11-25 20:13         ` Chris Wilson
@ 2015-11-25 21:15           ` Ville Syrjälä
  0 siblings, 0 replies; 26+ messages in thread
From: Ville Syrjälä @ 2015-11-25 21:15 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On Wed, Nov 25, 2015 at 08:13:43PM +0000, Chris Wilson wrote:
> On Wed, Nov 25, 2015 at 09:51:08PM +0200, Ville Syrjälä wrote:
> > On Fri, Nov 20, 2015 at 03:31:23PM +0000, Chris Wilson wrote:
> > > @@ -1097,6 +1003,7 @@ static bool check_cmd(const struct intel_engine_cs *ring,
> > >  }
> > >  
> > >  #define LENGTH_BIAS 2
> > > +#define MAX_PARTIAL 256
> > 
> > There seems to some confusion whether this is bytes or dwords.
> 
> Indeed, I can't remember of the top of my head.
> 
> (Double checked that the set of commands that I was thinking were 132
> bytes.)
>  
> > Also I guess we already end up allocating two pages anyway, so
> > maybe MAX_PARTIAL should just be one page? It's still not big
> > enough to cover the max legal cmd length AFAICS, so I think
> > the WARN in the check needs to be removed.
> 
> Sure, rounding up the next 8192 byte slab cache doesn't seem like it
> will bite us.
> 
> So #define MAX_PARTIAL_BYTES PAGE_SIZE
> 
> > > +	in = offset_in_page(batch_start_offset);
> > > +	partial = 0;
> > > +	for (src_iter = batch_start_offset >> PAGE_SHIFT;
> > > +	     src_iter < batch_obj->base.size >> PAGE_SHIFT;
> > > +	     src_iter++) {
> > 
> > So we're iterating over all the pages. Should be enough to iterate
> > until batch_start_offset+batch_len I suppose, but as long as we bail
> > out when we run out of batch it should be fine.
> 
> Right, this was mostly convenience for writing the loop bounds - it was
> more or less a simple conversion from the old iterator.
> 
> > I see there's a batch_len check at the end, but I don't see us handling
> > the case when the user already gives us something with batch_len==0.
> > Maybe that should be rejected somewhere higher up?
> 
> batch_len = 0 is filtered out in the caller...

Oh yeah, see it now.

>  
> > Also what happens if we don't find MI_BATCH_BUFFER_END before running
> > out of batch? Oh, I see, we set ret=-EINVAL, and clear it to 0 when we
> > find MI_BATCH_BUFFER_END. So that part seems to be fine.
> > 
> > > +		u32 *cmd, *page_end, *batch_end;
> > > +		u32 this;
> > > +
> > > +		this = batch_len;
> > 
> > I was a bit concerned about batch_len & 3, but we already check for
> > batch_len&7==0 in i915_gem_check_execbuffer(), so it should be good here.
> 
> cmdparser_assert(batch_len > 0 && (batch_len & 3) == 0);
> 
> as documentation for the contract?

I won't insist, but feel free to add something like that if you
wish.

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/6] drm/i915: Cache last cmd descriptor when parsing
  2015-11-20 10:55 ` [PATCH v2 2/6] drm/i915: Cache last cmd descriptor when parsing Chris Wilson
  2015-11-20 15:08   ` Ville Syrjälä
@ 2015-12-01 17:30   ` Ville Syrjälä
  1 sibling, 0 replies; 26+ messages in thread
From: Ville Syrjälä @ 2015-12-01 17:30 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Nov 20, 2015 at 10:55:57AM +0000, Chris Wilson wrote:
> The cmd parser has the biggest impact on the BLT ring, because it is
> relatively verbose composed to the other engines as the vertex data is
> inline. It also typically has runs of repeating commands (again since
> the vertex data is inline, it typically has sequences of XY_SETUP_BLT,
> XY_SCANLINE_BLT..) We can easily reduce the impact of cmdparsing on
> benchmarks by then caching the last descriptor and comparing it against
> the next command header. To get maximum benefit, we also want to take
> advantage of skipping a few validations and length determinations if the
> header is unchanged between commands.
> 
> ivb i7-3720QM:
> x11perf -dot: before 52.3M, after 124M (max 203M)
> glxgears: before 7310 fps, after 7550 fps (max 7860 fps)
> 
> v2: Fix initial cached cmd descriptor to match MI_NOOP.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---

Didn't spot anything wrong, so:

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

>  drivers/gpu/drm/i915/i915_cmd_parser.c | 133 +++++++++++++++------------------
>  drivers/gpu/drm/i915/i915_drv.h        |  10 +--
>  2 files changed, 64 insertions(+), 79 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index db3a04ea036a..c6f6d9f2b2ce 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -794,6 +794,9 @@ void i915_cmd_parser_fini_ring(struct intel_engine_cs *ring)
>  	fini_hash_table(ring);
>  }
>  
> +/*
> + * Returns a pointer to a descriptor for the command specified by cmd_header.
> + */
>  static const struct drm_i915_cmd_descriptor*
>  find_cmd_in_table(struct intel_engine_cs *ring,
>  		  u32 cmd_header)
> @@ -813,37 +816,6 @@ find_cmd_in_table(struct intel_engine_cs *ring,
>  	return NULL;
>  }
>  
> -/*
> - * Returns a pointer to a descriptor for the command specified by cmd_header.
> - *
> - * The caller must supply space for a default descriptor via the default_desc
> - * parameter. If no descriptor for the specified command exists in the ring's
> - * command parser tables, this function fills in default_desc based on the
> - * ring's default length encoding and returns default_desc.
> - */
> -static const struct drm_i915_cmd_descriptor*
> -find_cmd(struct intel_engine_cs *ring,
> -	 u32 cmd_header,
> -	 struct drm_i915_cmd_descriptor *default_desc)
> -{
> -	const struct drm_i915_cmd_descriptor *desc;
> -	u32 mask;
> -
> -	desc = find_cmd_in_table(ring, cmd_header);
> -	if (desc)
> -		return desc;
> -
> -	mask = ring->get_cmd_length_mask(cmd_header);
> -	if (!mask)
> -		return NULL;
> -
> -	BUG_ON(!default_desc);
> -	default_desc->flags = CMD_DESC_SKIP;
> -	default_desc->length.mask = mask;
> -
> -	return default_desc;
> -}
> -
>  static const struct drm_i915_reg_descriptor *
>  find_reg(const struct drm_i915_reg_descriptor *table,
>  	 int count, u32 addr)
> @@ -886,17 +858,6 @@ static bool check_cmd(const struct intel_engine_cs *ring,
>  		      const bool is_master,
>  		      bool *oacontrol_set)
>  {
> -	if (desc->flags & CMD_DESC_REJECT) {
> -		DRM_DEBUG_DRIVER("CMD: Rejected command: 0x%08X\n", *cmd);
> -		return false;
> -	}
> -
> -	if ((desc->flags & CMD_DESC_MASTER) && !is_master) {
> -		DRM_DEBUG_DRIVER("CMD: Rejected master-only command: 0x%08X\n",
> -				 *cmd);
> -		return false;
> -	}
> -
>  	if (desc->flags & CMD_DESC_REGISTER) {
>  		/*
>  		 * Get the distance between individual register offset
> @@ -1027,14 +988,15 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
>  		    u32 batch_len,
>  		    bool is_master)
>  {
> -	const struct drm_i915_cmd_descriptor *desc;
> +	struct drm_i915_cmd_descriptor default_desc = { CMD_DESC_SKIP };
> +	const struct drm_i915_cmd_descriptor *desc = &default_desc;
> +	u32 last_cmd_header = 0;
>  	unsigned dst_iter, src_iter;
>  	int needs_clflush = 0;
>  	struct get_page rewind;
>  	void *src, *dst, *tmp;
> -	u32 partial, length;
> +	u32 partial, length = 1;
>  	unsigned in, out;
> -	struct drm_i915_cmd_descriptor default_desc = { 0 };
>  	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
>  	int ret = 0;
>  
> @@ -1100,40 +1062,63 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
>  		partial = 0;
>  
>  		do {
> -			if (*cmd == MI_BATCH_BUFFER_END) {
> -				if (oacontrol_set) {
> -					DRM_DEBUG_DRIVER("CMD: batch set OACONTROL but did not clear it\n");
> -					goto unmap;
> +			if (*cmd != last_cmd_header) {
> +				if (*cmd == MI_BATCH_BUFFER_END) {
> +					if (unlikely(oacontrol_set)) {
> +						DRM_DEBUG_DRIVER("CMD: batch set OACONTROL but did not clear it\n");
> +						goto unmap;
> +					}
> +
> +					cmd++; /* copy this cmd to dst */
> +					batch_len = this; /* no more src */
> +					ret = 0;
> +					break;
>  				}
>  
> -				cmd++; /* copy this cmd to dst */
> -				batch_len = this; /* no more src */
> -				ret = 0;
> -				break;
> -			}
> -
> -			desc = find_cmd(ring, *cmd, &default_desc);
> -			if (!desc) {
> -				DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
> -						 *cmd);
> -				goto unmap;
> -			}
> +				desc = find_cmd_in_table(ring, *cmd);
> +				if (desc) {
> +					if (unlikely(desc->flags & CMD_DESC_REJECT)) {
> +						DRM_DEBUG_DRIVER("CMD: Rejected command: 0x%08X\n", *cmd);
> +						goto unmap;
> +					}
> +
> +					if (unlikely((desc->flags & CMD_DESC_MASTER) && !is_master)) {
> +						DRM_DEBUG_DRIVER("CMD: Rejected master-only command: 0x%08X\n", *cmd);
> +						goto unmap;
> +					}
> +
> +					/*
> +					 * If the batch buffer contains a
> +					 * chained batch, return an error that
> +					 * tells the caller to abort and
> +					 * dispatch the workload as a
> +					 * non-secure batch.
> +					 */
> +					if (unlikely(desc->cmd.value == MI_BATCH_BUFFER_START)) {
> +						ret = -EACCES;
> +						goto unmap;
> +					}
> +
> +					if (desc->flags & CMD_DESC_FIXED)
> +						length = desc->length.fixed;
> +					else
> +						length = (*cmd & desc->length.mask) + LENGTH_BIAS;
> +				} else {
> +					u32 mask = ring->get_cmd_length_mask(*cmd);
> +					if (unlikely(!mask)) {
> +						DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n", *cmd);
> +						goto unmap;
> +					}
> +
> +					default_desc.length.mask = mask;
> +					desc = &default_desc;
> +
> +					length = (*cmd & mask) + LENGTH_BIAS;
> +				}
>  
> -			/*
> -			 * If the batch buffer contains a chained batch, return an
> -			 * error that tells the caller to abort and dispatch the
> -			 * workload as a non-secure batch.
> -			 */
> -			if (desc->cmd.value == MI_BATCH_BUFFER_START) {
> -				ret = -EACCES;
> -				goto unmap;
> +				last_cmd_header = *cmd;
>  			}
>  
> -			if (desc->flags & CMD_DESC_FIXED)
> -				length = desc->length.fixed;
> -			else
> -				length = ((*cmd & desc->length.mask) + LENGTH_BIAS);
> -
>  			if (unlikely(cmd + length > page_end)) {
>  				if (unlikely(cmd + length > batch_end)) {
>  					DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X length=%u batchlen=%td\n",
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 8d939649b919..28d5bfceae3b 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2333,12 +2333,12 @@ struct drm_i915_cmd_descriptor {
>  	 *                  is the DRM master
>  	 */
>  	u32 flags;
> +#define CMD_DESC_SKIP     (0)
>  #define CMD_DESC_FIXED    (1<<0)
> -#define CMD_DESC_SKIP     (1<<1)
> -#define CMD_DESC_REJECT   (1<<2)
> -#define CMD_DESC_REGISTER (1<<3)
> -#define CMD_DESC_BITMASK  (1<<4)
> -#define CMD_DESC_MASTER   (1<<5)
> +#define CMD_DESC_REJECT   (1<<1)
> +#define CMD_DESC_REGISTER (1<<2)
> +#define CMD_DESC_BITMASK  (1<<3)
> +#define CMD_DESC_MASTER   (1<<4)
>  
>  	/*
>  	 * The command's unique identification bits and the bitmask to get them.
> -- 
> 2.6.2
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 3/6] drm/i915: Use WC copies on !llc platforms for the command parser
  2015-11-20 15:22     ` Chris Wilson
@ 2015-12-01 17:32       ` Ville Syrjälä
  0 siblings, 0 replies; 26+ messages in thread
From: Ville Syrjälä @ 2015-12-01 17:32 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On Fri, Nov 20, 2015 at 03:22:03PM +0000, Chris Wilson wrote:
> On Fri, Nov 20, 2015 at 05:05:05PM +0200, Ville Syrjälä wrote:
> > On Fri, Nov 20, 2015 at 10:55:58AM +0000, Chris Wilson wrote:
> > > Since we blow the TLB caches by using kmap/kunmap, we may as well go the
> > > whole hog and see if declaring our destination page as WC is faster than
> > > keeping it as WB and using clflush. It should be!
> > 
> > Is this description for another patch? I can't see any WC stuff in
> > there.
> 
> No, just badly written. (Admittedly at one point I did experiment with
> remapping the pages as WC and discovered that did stop_macheine()...)
> 
> drm/i915: Perform inline clflushes from the command parser
> 
> On incoherent architectures, we can avoid having to clflush the
> destination shadow batch as a separate pass by inlining the call to
> clflush whilst we already have the kmap_atomic() around for the page.

With the new description:
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/6] drm/i915: Reduce pointer indirection during cmd parser lookup
  2015-11-20 15:27   ` Ville Syrjälä
  2015-11-20 15:34     ` Chris Wilson
@ 2015-12-01 17:39     ` Ville Syrjälä
  1 sibling, 0 replies; 26+ messages in thread
From: Ville Syrjälä @ 2015-12-01 17:39 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Nov 20, 2015 at 05:27:43PM +0200, Ville Syrjälä wrote:
> On Fri, Nov 20, 2015 at 10:56:00AM +0000, Chris Wilson wrote:
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_cmd_parser.c | 51 ++++++++--------------------------
> >  drivers/gpu/drm/i915/i915_drv.h        |  4 ++-
> >  2 files changed, 14 insertions(+), 41 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > index cfd07bfe6e75..ea9df2bb87de 100644
> > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > @@ -113,7 +113,7 @@
> >  
> >  /*            Command                          Mask   Fixed Len   Action
> >  	      ---------------------------------------------------------- */
> > -static const struct drm_i915_cmd_descriptor common_cmds[] = {
> > +static struct drm_i915_cmd_descriptor common_cmds[] = {
> 
> I'm a little sad to see the const gone. All this gets moved out of
> rodata.
> 
> >  	CMD(  MI_NOOP,                          SMI,    F,  1,      S  ),
> >  	CMD(  MI_USER_INTERRUPT,                SMI,    F,  1,      R  ),
> >  	CMD(  MI_WAIT_FOR_EVENT,                SMI,    F,  1,      M  ),
> > @@ -146,7 +146,7 @@ static const struct drm_i915_cmd_descriptor common_cmds[] = {
> >  	CMD(  MI_BATCH_BUFFER_START,            SMI,   !F,  0xFF,   S  ),
> >  };
> >  
> > -static const struct drm_i915_cmd_descriptor render_cmds[] = {
> > +static struct drm_i915_cmd_descriptor render_cmds[] = {
> >  	CMD(  MI_FLUSH,                         SMI,    F,  1,      S  ),
> >  	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
> >  	CMD(  MI_PREDICATE,                     SMI,    F,  1,      S  ),
> > @@ -207,7 +207,7 @@ static const struct drm_i915_cmd_descriptor render_cmds[] = {
> >  	      }},						       ),
> >  };
> >  
> > -static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
> > +static struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
> >  	CMD(  MI_SET_PREDICATE,                 SMI,    F,  1,      S  ),
> >  	CMD(  MI_RS_CONTROL,                    SMI,    F,  1,      S  ),
> >  	CMD(  MI_URB_ATOMIC_ALLOC,              SMI,    F,  1,      S  ),
> > @@ -229,7 +229,7 @@ static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
> >  	CMD(  GFX_OP_3DSTATE_BINDING_TABLE_EDIT_PS,  S3D,   !F,  0x1FF,  S  ),
> >  };
> >  
> > -static const struct drm_i915_cmd_descriptor video_cmds[] = {
> > +static struct drm_i915_cmd_descriptor video_cmds[] = {
> >  	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
> >  	CMD(  MI_SET_APPID,                     SMI,    F,  1,      S  ),
> >  	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   B,
> > @@ -273,7 +273,7 @@ static const struct drm_i915_cmd_descriptor video_cmds[] = {
> >  	CMD(  MFX_WAIT,                         SMFX,   F,  1,      S  ),
> >  };
> >  
> > -static const struct drm_i915_cmd_descriptor vecs_cmds[] = {
> > +static struct drm_i915_cmd_descriptor vecs_cmds[] = {
> >  	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
> >  	CMD(  MI_SET_APPID,                     SMI,    F,  1,      S  ),
> >  	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   B,
> > @@ -311,7 +311,7 @@ static const struct drm_i915_cmd_descriptor vecs_cmds[] = {
> >  	      }},						       ),
> >  };
> >  
> > -static const struct drm_i915_cmd_descriptor blt_cmds[] = {
> > +static struct drm_i915_cmd_descriptor blt_cmds[] = {
> >  	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   R  ),
> >  	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0x3FF,  B,
> >  	      .bits = {{
> > @@ -344,7 +344,7 @@ static const struct drm_i915_cmd_descriptor blt_cmds[] = {
> >  	CMD(  SRC_COPY_BLT,                     S2D,   !F,  0x3F,   S  ),
> >  };
> >  
> > -static const struct drm_i915_cmd_descriptor hsw_blt_cmds[] = {
> > +static struct drm_i915_cmd_descriptor hsw_blt_cmds[] = {
> >  	CMD(  MI_LOAD_SCAN_LINES_INCL,          SMI,   !F,  0x3F,   M  ),
> >  	CMD(  MI_LOAD_SCAN_LINES_EXCL,          SMI,   !F,  0x3F,   R  ),
> >  };
> > @@ -618,11 +618,6 @@ static bool validate_regs_sorted(struct intel_engine_cs *ring)
> >  			     ring->master_reg_count);
> >  }
> >  
> > -struct cmd_node {
> > -	const struct drm_i915_cmd_descriptor *desc;
> > -	struct hlist_node node;
> > -};
> > -
> >  /*
> >   * Different command ranges have different numbers of bits for the opcode. For
> >   * example, MI commands use bits 31:23 while 3D commands use bits 31:16. The
> > @@ -651,16 +646,8 @@ static int init_hash_table(struct intel_engine_cs *ring,
> >  		const struct drm_i915_cmd_table *table = &cmd_tables[i];
> >  
> >  		for (j = 0; j < table->count; j++) {
> > -			const struct drm_i915_cmd_descriptor *desc =
> > -				&table->table[j];
> > -			struct cmd_node *desc_node =
> > -				kmalloc(sizeof(*desc_node), GFP_KERNEL);
> > -
> > -			if (!desc_node)
> > -				return -ENOMEM;
> 
> init_hash_table() can no longer fail -> void?
> 
> > -
> > -			desc_node->desc = desc;
> > -			hash_add(ring->cmd_hash, &desc_node->node,
> > +			struct drm_i915_cmd_descriptor *desc = &table->table[j];
> > +			hash_add(ring->cmd_hash, &desc->node[ring->id],
> >  				 desc->cmd.value & CMD_HASH_MASK);
> >  		}
> >  	}
> > @@ -668,18 +655,6 @@ static int init_hash_table(struct intel_engine_cs *ring,
> >  	return 0;
> >  }
> >  
> > -static void fini_hash_table(struct intel_engine_cs *ring)
> > -{
> > -	struct hlist_node *tmp;
> > -	struct cmd_node *desc_node;
> > -	int i;
> > -
> > -	hash_for_each_safe(ring->cmd_hash, i, tmp, desc_node, node) {
> > -		hash_del(&desc_node->node);
> > -		kfree(desc_node);
> > -	}
> > -}
> > -
> >  /**
> >   * i915_cmd_parser_init_ring() - set cmd parser related fields for a ringbuffer
> >   * @ring: the ringbuffer to initialize
> > @@ -770,7 +745,6 @@ int i915_cmd_parser_init_ring(struct intel_engine_cs *ring)
> >  	ret = init_hash_table(ring, cmd_tables, cmd_table_count);
> >  	if (ret) {
> >  		DRM_ERROR("CMD: cmd_parser_init failed!\n");
> > -		fini_hash_table(ring);
> >  		return ret;
> >  	}
> >  
> > @@ -790,8 +764,6 @@ void i915_cmd_parser_fini_ring(struct intel_engine_cs *ring)
> >  {
> >  	if (!ring->needs_cmd_parser)
> >  		return;
> > -
> > -	fini_hash_table(ring);
> >  }
> 
> i915_cmd_parser_fini_ring() is a nop now. Kill it?
> 
> >  
> >  /*
> > @@ -801,11 +773,10 @@ static const struct drm_i915_cmd_descriptor*
> >  find_cmd_in_table(struct intel_engine_cs *ring,
> >  		  u32 cmd_header)
> >  {
> > -	struct cmd_node *desc_node;
> > +	const struct drm_i915_cmd_descriptor *desc;
> >  
> > -	hash_for_each_possible(ring->cmd_hash, desc_node, node,
> > +	hash_for_each_possible(ring->cmd_hash, desc, node[ring->id],
> >  			       cmd_header & CMD_HASH_MASK) {
> > -		const struct drm_i915_cmd_descriptor *desc = desc_node->desc;
> >  		if (((cmd_header ^ desc->cmd.value) & desc->cmd.mask) == 0)
> >  			return desc;
> 
> At least we still return this as const, so the caller can't accidentally
> clobber it, even if we lose the rodata protection.
> 
> Apart from the int vs. void thing and the nop
> i915_cmd_parser_fini_ring() the patch looks fine to me.

This too can get
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

since the now unnecessary error handling doesn't actually do anything.
Feel free to keep the r-b even if you decide to throw out the error
handling parts.

> 
> >  	}
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 28d5bfceae3b..5960f76f1438 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2396,6 +2396,8 @@ struct drm_i915_cmd_descriptor {
> >  		u32 condition_offset;
> >  		u32 condition_mask;
> >  	} bits[MAX_CMD_DESC_BITMASKS];
> > +
> > +	struct hlist_node node[I915_NUM_RINGS];
> >  };
> >  
> >  /*
> > @@ -2405,7 +2407,7 @@ struct drm_i915_cmd_descriptor {
> >   * descriptors, which must be sorted with command opcodes in ascending order.
> >   */
> >  struct drm_i915_cmd_table {
> > -	const struct drm_i915_cmd_descriptor *table;
> > +	struct drm_i915_cmd_descriptor *table;
> >  	int count;
> >  };
> >  
> > -- 
> > 2.6.2
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Ville Syrjälä
> Intel OTC
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2015-12-01 17:39 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-20 10:55 cmdparser overhead reduction Chris Wilson
2015-11-20 10:55 ` [PATCH v2 1/6] drm/i915: Eliminate vmap overhead for cmd parser Chris Wilson
2015-11-20 14:41   ` Ville Syrjälä
2015-11-20 14:52     ` Chris Wilson
2015-11-20 15:31     ` [PATCH v3] " Chris Wilson
2015-11-25 19:51       ` Ville Syrjälä
2015-11-25 20:13         ` Chris Wilson
2015-11-25 21:15           ` Ville Syrjälä
2015-11-20 10:55 ` [PATCH v2 2/6] drm/i915: Cache last cmd descriptor when parsing Chris Wilson
2015-11-20 15:08   ` Ville Syrjälä
2015-11-20 15:44     ` Chris Wilson
2015-12-01 17:30   ` Ville Syrjälä
2015-11-20 10:55 ` [PATCH v2 3/6] drm/i915: Use WC copies on !llc platforms for the command parser Chris Wilson
2015-11-20 15:05   ` Ville Syrjälä
2015-11-20 15:22     ` Chris Wilson
2015-12-01 17:32       ` Ville Syrjälä
2015-11-20 10:55 ` [PATCH v2 4/6] drm/i915: Reduce arithmetic operations during cmd parser lookup Chris Wilson
2015-11-20 15:02   ` Ville Syrjälä
2015-11-20 10:56 ` [PATCH v2 5/6] drm/i915: Reduce pointer indirection " Chris Wilson
2015-11-20 15:27   ` Ville Syrjälä
2015-11-20 15:34     ` Chris Wilson
2015-11-20 15:47       ` Ville Syrjälä
2015-11-23  8:09         ` Jani Nikula
2015-12-01 17:39     ` Ville Syrjälä
2015-11-20 10:56 ` [PATCH v2 6/6] drm/i915: Improve hash function for the command parser Chris Wilson
2015-11-20 15:13   ` Ville Syrjälä

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.