* [RFC 0/4] Command parser batch buffer copy
@ 2014-06-18 16:36 bradley.d.volkin
  2014-06-18 16:36 ` [RFC 1/4] drm/i915: Implement a framework for batch buffer pools bradley.d.volkin
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: bradley.d.volkin @ 2014-06-18 16:36 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

This series is what I have so far on the batch copy aspect of the
command parser. The individual patches and kerneldoc have details.
It doesn't work with full ppgtt at the moment, so isn't ready for
merge. But I'd like to get feedback on the direction and particularly
whether I've got the activity tracking, domain management, and madv
parts correct.

Brad Volkin (4):
  drm/i915: Implement a framework for batch buffer pools
  drm/i915: Use batch pools with the command parser
  drm/i915: Add a batch pool debugfs file
  drm/i915: Dispatch the shadow batch buffer

 Documentation/DocBook/drm.tmpl               |   5 +
 drivers/gpu/drm/i915/Makefile                |   1 +
 drivers/gpu/drm/i915/i915_cmd_parser.c       |  75 ++++++++++---
 drivers/gpu/drm/i915/i915_debugfs.c          |  48 +++++++++
 drivers/gpu/drm/i915/i915_drv.h              |  26 ++++-
 drivers/gpu/drm/i915/i915_gem.c              |   9 +-
 drivers/gpu/drm/i915/i915_gem_batch_pool.c   | 151 +++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c   |  82 +++++++++++----
 drivers/gpu/drm/i915/i915_gem_render_state.c |   2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c      |  12 +++
 drivers/gpu/drm/i915/intel_ringbuffer.h      |   7 ++
 11 files changed, 379 insertions(+), 39 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_batch_pool.c

-- 
1.8.3.2

* [RFC 1/4] drm/i915: Implement a framework for batch buffer pools
  2014-06-18 16:36 [RFC 0/4] Command parser batch buffer copy bradley.d.volkin
@ 2014-06-18 16:36 ` bradley.d.volkin
  2014-06-19  9:48   ` Tvrtko Ursulin
  2014-06-18 16:36 ` [RFC 2/4] drm/i915: Use batch pools with the command parser bradley.d.volkin
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: bradley.d.volkin @ 2014-06-18 16:36 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

This adds a small module for managing a pool of batch buffers.
The only current use case is for the command parser, as described
in the kerneldoc in the patch. The code is simple, but separating
it out makes it easier to change the underlying algorithms and to
extend to future use cases should they arise.

The interface is simple: alloc to create an empty pool, free to
clean it up; get to obtain a new buffer, put to return it to the
pool. Note that all buffers must be returned to the pool before
freeing it.
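
For illustration only (this is not part of the patch; the caller context
and the batch_size value below are made up), typical usage under
struct_mutex would look roughly like:

	struct i915_gem_batch_pool *pool;
	struct drm_i915_gem_object *obj;

	pool = i915_gem_batch_pool_alloc(dev, 64);
	if (IS_ERR(pool))
		return PTR_ERR(pool);

	/* per submission: find or allocate a buffer of at least
	 * batch_size bytes; get() marks it unpurgeable */
	obj = i915_gem_batch_pool_get(pool, batch_size);
	if (IS_ERR(obj))
		return PTR_ERR(obj);

	/* ... copy into and submit obj ... */

	/* return the buffer to the pool; put() marks it purgeable */
	i915_gem_batch_pool_put(pool, obj);

	/* on teardown, once every buffer has been returned */
	i915_gem_batch_pool_free(pool);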

The pool has a maximum number of buffers allowed due to some tests
(e.g. gem_exec_nop) creating a very large number of buffers (e.g.
___). Buffers are purgeable while in the pool, but not explicitly
truncated in order to avoid overhead during execbuf.

Locking is currently based on the caller holding the struct_mutex.
We already do that in the places where we will use the batch pool
for the command parser.

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---

r.e. pool capacity
My original testing showed something like thousands of buffers in
the pool after a gem_exec_nop run. But when I reran with the max
check disabled just now to get an actual number for the commit
message, the number was more like 130.  I developed and tested the
changes incrementally, and suspect that the original run was before
I implemented the actual copy operation. So I'm inclined to remove
or at least increase the cap in the final version. Thoughts?

---
 Documentation/DocBook/drm.tmpl             |   5 +
 drivers/gpu/drm/i915/Makefile              |   1 +
 drivers/gpu/drm/i915/i915_drv.h            |  19 ++++
 drivers/gpu/drm/i915/i915_gem.c            |   1 +
 drivers/gpu/drm/i915/i915_gem_batch_pool.c | 151 +++++++++++++++++++++++++++++
 5 files changed, 177 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_batch_pool.c

diff --git a/Documentation/DocBook/drm.tmpl b/Documentation/DocBook/drm.tmpl
index 7df3134..fcc0a1c 100644
--- a/Documentation/DocBook/drm.tmpl
+++ b/Documentation/DocBook/drm.tmpl
@@ -3939,6 +3939,11 @@ int num_ioctls;</synopsis>
 !Pdrivers/gpu/drm/i915/i915_cmd_parser.c batch buffer command parser
 !Idrivers/gpu/drm/i915/i915_cmd_parser.c
       </sect2>
+      <sect2>
+        <title>Batchbuffer Pools</title>
+!Pdrivers/gpu/drm/i915/i915_gem_batch_pool.c batch pool
+!Idrivers/gpu/drm/i915/i915_gem_batch_pool.c
+      </sect2>
     </sect1>
   </chapter>
 </part>
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index cad1683..b92fbe6 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -17,6 +17,7 @@ i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o
 
 # GEM code
 i915-y += i915_cmd_parser.o \
+	  i915_gem_batch_pool.o \
 	  i915_gem_context.o \
 	  i915_gem_render_state.o \
 	  i915_gem_debug.o \
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0640071..2a88b5e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1610,6 +1610,8 @@ struct drm_i915_gem_object {
 	/** Used in execbuf to temporarily hold a ref */
 	struct list_head obj_exec_link;
 
+	struct list_head batch_pool_list;
+
 	/**
 	 * This is set if the object is on the active lists (has pending
 	 * rendering and so a non-zero seqno), and is not set if it is on
@@ -1727,6 +1729,14 @@ struct drm_i915_gem_object {
 };
 #define to_intel_bo(x) container_of(x, struct drm_i915_gem_object, base)
 
+struct i915_gem_batch_pool {
+	struct drm_device *dev;
+	struct list_head active_list;
+	struct list_head inactive_list;
+	int count;
+	int max_count;
+};
+
 /**
  * Request queue structure.
  *
@@ -2508,6 +2518,15 @@ void i915_destroy_error_state(struct drm_device *dev);
 void i915_get_extra_instdone(struct drm_device *dev, uint32_t *instdone);
 const char *i915_cache_level_str(int type);
 
+/* i915_gem_batch_pool.c */
+struct i915_gem_batch_pool *i915_gem_batch_pool_alloc(struct drm_device *dev,
+						      int max_count);
+void i915_gem_batch_pool_free(struct i915_gem_batch_pool *pool);
+struct drm_i915_gem_object*
+i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool, size_t size);
+void i915_gem_batch_pool_put(struct i915_gem_batch_pool *pool,
+			     struct drm_i915_gem_object *obj);
+
 /* i915_cmd_parser.c */
 int i915_cmd_parser_get_version(void);
 int i915_cmd_parser_init_ring(struct intel_engine_cs *ring);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d857f58..d5e3001 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4324,6 +4324,7 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
 	INIT_LIST_HEAD(&obj->ring_list);
 	INIT_LIST_HEAD(&obj->obj_exec_link);
 	INIT_LIST_HEAD(&obj->vma_list);
+	INIT_LIST_HEAD(&obj->batch_pool_list);
 
 	obj->ops = ops;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
new file mode 100644
index 0000000..37cec7d
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -0,0 +1,151 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include "i915_drv.h"
+
+/**
+ * DOC: batch pool
+ *
+ * In order to submit batch buffers as 'secure', the software command parser
+ * must ensure that a batch buffer cannot be modified after parsing. It does
+ * this by copying the user provided batch buffer contents to a kernel owned
+ * buffer from which the hardware will actually execute, and by carefully
+ * managing the address space bindings for such buffers.
+ *
+ * The batch pool framework provides a mechanism for the driver to manage a
+ * set of scratch buffers to use for this purpose. The framework can be
+ * extended to support other use cases should they arise.
+ */
+
+/**
+ * i915_gem_batch_pool_alloc() - allocate and initialize a batch buffer pool
+ * @dev: the drm device
+ * @max_count: the maximum number of buffers allowed in the pool
+ *
+ * Return: the new batch buffer pool
+ */
+struct i915_gem_batch_pool *i915_gem_batch_pool_alloc(struct drm_device *dev,
+						      int max_count)
+{
+	struct i915_gem_batch_pool *pool;
+
+	pool = kmalloc(sizeof(*pool), GFP_KERNEL);
+	if (!pool)
+		return ERR_PTR(-ENOMEM);
+
+	pool->dev = dev;
+	INIT_LIST_HEAD(&pool->active_list);
+	INIT_LIST_HEAD(&pool->inactive_list);
+	pool->count = 0;
+	pool->max_count = max_count;
+
+	return pool;
+}
+
+/**
+ * i915_gem_batch_pool_free() - clean up a batch buffer pool
+ * @pool: the pool to clean up
+ *
+ * Note: Callers must hold the struct_mutex. Callers must also ensure
+ *       that all buffers have been returned to the pool.
+ */
+void i915_gem_batch_pool_free(struct i915_gem_batch_pool *pool)
+{
+	struct drm_i915_gem_object *obj, *next;
+
+	BUG_ON(!mutex_is_locked(&pool->dev->struct_mutex));
+	BUG_ON(!list_empty(&pool->active_list));
+
+	list_for_each_entry_safe(obj, next,
+				 &pool->inactive_list, batch_pool_list) {
+		list_del(&obj->batch_pool_list);
+		drm_gem_object_unreference(&obj->base);
+	}
+
+	kfree(pool);
+}
+
+/**
+ * i915_gem_batch_pool_get() - select a buffer from the pool
+ * @pool: the batch buffer pool
+ * @size: the minimum desired size of the returned buffer
+ *
+ * Finds or allocates a batch buffer in the pool with at least the requested
+ * size. The buffer will not be used to satisfy further
+ * i915_gem_batch_pool_get() requests until the corresponding
+ * i915_gem_batch_pool_put() call. The caller is responsible for any domain or
+ * active/inactive management for the returned buffer.
+ *
+ * Note: Callers must hold the struct_mutex
+ *
+ * Return: the selected batch buffer object; -EAGAIN if the pool is at capacity
+ */
+struct drm_i915_gem_object *
+i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
+			size_t size)
+{
+	struct drm_i915_gem_object *obj = NULL;
+	struct drm_i915_gem_object *tmp;
+
+	BUG_ON(!mutex_is_locked(&pool->dev->struct_mutex));
+
+	list_for_each_entry(tmp, &pool->inactive_list, batch_pool_list) {
+		if (tmp->base.size >= size) {
+			obj = tmp;
+			break;
+		}
+	}
+
+	if (!obj) {
+		if (pool->count == pool->max_count)
+			return ERR_PTR(-EAGAIN);
+
+		obj = i915_gem_alloc_object(pool->dev, size);
+		if (!obj)
+			return ERR_PTR(-ENOMEM);
+
+		list_add_tail(&obj->batch_pool_list, &pool->inactive_list);
+		pool->count++;
+	}
+
+	obj->madv = I915_MADV_WILLNEED;
+	list_move_tail(&obj->batch_pool_list, &pool->active_list);
+
+	return obj;
+}
+
+/**
+ * i915_gem_batch_pool_put() - return a buffer to the pool
+ * @pool: the batch buffer pool
+ * @obj: the batch buffer object
+ *
+ * Note: Callers must hold the struct_mutex
+ */
+void i915_gem_batch_pool_put(struct i915_gem_batch_pool *pool,
+			     struct drm_i915_gem_object *obj)
+{
+	BUG_ON(!mutex_is_locked(&pool->dev->struct_mutex));
+	obj->madv = I915_MADV_DONTNEED;
+	list_move_tail(&obj->batch_pool_list, &pool->inactive_list);
+}
-- 
1.8.3.2

* [RFC 2/4] drm/i915: Use batch pools with the command parser
  2014-06-18 16:36 [RFC 0/4] Command parser batch buffer copy bradley.d.volkin
  2014-06-18 16:36 ` [RFC 1/4] drm/i915: Implement a framework for batch buffer pools bradley.d.volkin
@ 2014-06-18 16:36 ` bradley.d.volkin
  2014-06-18 16:52   ` Chris Wilson
  2014-06-18 16:36 ` [RFC 3/4] drm/i915: Add a batch pool debugfs file bradley.d.volkin
  2014-06-18 16:36 ` [RFC 4/4] drm/i915: Dispatch the shadow batch buffer bradley.d.volkin
  3 siblings, 1 reply; 16+ messages in thread
From: bradley.d.volkin @ 2014-06-18 16:36 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

This patch sets up all of the tracking and copying necessary to
use batch pools with the command parser, but does not actually
dispatch the copied (shadow) batch to the hardware yet. We still
aren't quite ready to set the secure bit during dispatch.

Note that performance takes a hit from the copy in some cases
and will likely need some work. At a rough pass, the memcpy
appears to be the bottleneck. Without having done a deeper
analysis, two ideas that come to mind are:
1) Copy sections of the batch at a time, as they are reached
   by parsing. Might improve cache locality.
2) Copy only up to the userspace-supplied batch length and
   memset the rest of the buffer. Reduces the number of reads.
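
As a rough, untested sketch of idea (2), the memcpy in copy_batch()
could become something like the following (batch_len is hypothetical
here; it would have to be plumbed down from the execbuffer args, which
this patch does not do):

	/* Copy only the user-supplied batch and fill the tail of the
	 * shadow buffer with zeros, i.e. MI_NOOP, instead of reading
	 * the entire source object. */
	memcpy(dest_addr, src_addr, batch_len);
	memset((u8 *)dest_addr + batch_len, 0,
	       dest_obj->base.size - batch_len);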

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c       | 75 ++++++++++++++++++++++------
 drivers/gpu/drm/i915/i915_drv.h              |  7 ++-
 drivers/gpu/drm/i915/i915_gem.c              |  8 ++-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c   | 45 +++++++++++++++--
 drivers/gpu/drm/i915/i915_gem_render_state.c |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c      | 12 +++++
 drivers/gpu/drm/i915/intel_ringbuffer.h      |  7 +++
 7 files changed, 134 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index dea99d9..669afb0 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -831,6 +831,53 @@ finish:
 	return (u32*)addr;
 }
 
+/* Returns a vmap'd pointer to dest_obj, which the caller must unmap */
+static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
+		       struct drm_i915_gem_object *src_obj)
+{
+	int ret = 0;
+	int needs_clflush = 0;
+	u32 *src_addr, *dest_addr = NULL;
+
+	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
+	if (ret) {
+		DRM_DEBUG_DRIVER("CMD: failed to prep read\n");
+		return ERR_PTR(ret);
+	}
+
+	src_addr = vmap_batch(src_obj);
+	if (!src_addr) {
+		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
+		ret = -ENOMEM;
+		goto unpin_src;
+	}
+
+	if (needs_clflush)
+		drm_clflush_virt_range((char *)src_addr, src_obj->base.size);
+
+	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
+	if (ret) {
+		DRM_DEBUG_DRIVER("CMD: Failed to set batch CPU domain\n");
+		goto unmap_src;
+	}
+
+	dest_addr = vmap_batch(dest_obj);
+	if (!dest_addr) {
+		DRM_DEBUG_DRIVER("CMD: Failed to vmap shadow batch\n");
+		ret = -ENOMEM;
+		goto unmap_src;
+	}
+
+	memcpy(dest_addr, src_addr, src_obj->base.size);
+
+unmap_src:
+	vunmap(src_addr);
+unpin_src:
+	i915_gem_object_unpin_pages(src_obj);
+
+	return ret ? ERR_PTR(ret) : dest_addr;
+}
+
 /**
  * i915_needs_cmd_parser() - should a given ring use software command parsing?
  * @ring: the ring in question
@@ -952,6 +999,7 @@ static bool check_cmd(const struct intel_engine_cs *ring,
  * i915_parse_cmds() - parse a submitted batch buffer for privilege violations
  * @ring: the ring on which the batch is to execute
  * @batch_obj: the batch buffer in question
+ * @shadow_batch_obj: copy of the batch buffer in question
  * @batch_start_offset: byte offset in the batch at which execution starts
  * @is_master: is the submitting process the drm master?
  *
@@ -962,31 +1010,21 @@ static bool check_cmd(const struct intel_engine_cs *ring,
  */
 int i915_parse_cmds(struct intel_engine_cs *ring,
 		    struct drm_i915_gem_object *batch_obj,
+		    struct drm_i915_gem_object *shadow_batch_obj,
 		    u32 batch_start_offset,
 		    bool is_master)
 {
 	int ret = 0;
 	u32 *cmd, *batch_base, *batch_end;
 	struct drm_i915_cmd_descriptor default_desc = { 0 };
-	int needs_clflush = 0;
 	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
 
-	ret = i915_gem_obj_prepare_shmem_read(batch_obj, &needs_clflush);
-	if (ret) {
-		DRM_DEBUG_DRIVER("CMD: failed to prep read\n");
-		return ret;
-	}
-
-	batch_base = vmap_batch(batch_obj);
-	if (!batch_base) {
-		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
-		i915_gem_object_unpin_pages(batch_obj);
-		return -ENOMEM;
+	batch_base = copy_batch(shadow_batch_obj, batch_obj);
+	if (IS_ERR(batch_base)) {
+		DRM_DEBUG_DRIVER("CMD: Failed to copy batch\n");
+		return PTR_ERR(batch_base);
 	}
 
-	if (needs_clflush)
-		drm_clflush_virt_range((char *)batch_base, batch_obj->base.size);
-
 	cmd = batch_base + (batch_start_offset / sizeof(*cmd));
 	batch_end = cmd + (batch_obj->base.size / sizeof(*batch_end));
 
@@ -1039,7 +1077,12 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 
 	vunmap(batch_base);
 
-	i915_gem_object_unpin_pages(batch_obj);
+	if (!ret) {
+		ret = i915_gem_object_set_to_gtt_domain(shadow_batch_obj,
+							false);
+		if (ret)
+			DRM_DEBUG_DRIVER("CMD: Failed to set batch GTT domain\n");
+	}
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2a88b5e..cecf0f8 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1766,6 +1766,9 @@ struct drm_i915_gem_request {
 	/** Batch buffer related to this request if any */
 	struct drm_i915_gem_object *batch_obj;
 
+	/** Shadow copy of the batch buffer, if any, for the command parser */
+	struct drm_i915_gem_object *shadow_batch_obj;
+
 	/** Time at which this request was emitted, in jiffies. */
 	unsigned long emitted_jiffies;
 
@@ -2297,9 +2300,10 @@ int __must_check i915_gem_suspend(struct drm_device *dev);
 int __i915_add_request(struct intel_engine_cs *ring,
 		       struct drm_file *file,
 		       struct drm_i915_gem_object *batch_obj,
+		       struct drm_i915_gem_object *shadow_batch_obj,
 		       u32 *seqno);
 #define i915_add_request(ring, seqno) \
-	__i915_add_request(ring, NULL, NULL, seqno)
+	__i915_add_request(ring, NULL, NULL, NULL, seqno)
 int __must_check i915_wait_seqno(struct intel_engine_cs *ring,
 				 uint32_t seqno);
 int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
@@ -2534,6 +2538,7 @@ void i915_cmd_parser_fini_ring(struct intel_engine_cs *ring);
 bool i915_needs_cmd_parser(struct intel_engine_cs *ring);
 int i915_parse_cmds(struct intel_engine_cs *ring,
 		    struct drm_i915_gem_object *batch_obj,
+		    struct drm_i915_gem_object *shadow_batch_obj,
 		    u32 batch_start_offset,
 		    bool is_master);
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d5e3001..a477818 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2325,6 +2325,7 @@ i915_gem_get_seqno(struct drm_device *dev, u32 *seqno)
 int __i915_add_request(struct intel_engine_cs *ring,
 		       struct drm_file *file,
 		       struct drm_i915_gem_object *obj,
+		       struct drm_i915_gem_object *shadow_obj,
 		       u32 *out_seqno)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
@@ -2368,9 +2369,10 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	 * active_list, and so will hold the active reference. Only when this
 	 * request is retired will the the batch_obj be moved onto the
 	 * inactive_list and lose its active reference. Hence we do not need
-	 * to explicitly hold another reference here.
+	 * to explicitly hold another reference here. Same for shadow batch.
 	 */
 	request->batch_obj = obj;
+	request->shadow_batch_obj = shadow_obj;
 
 	/* Hold a reference to the current context so that we can inspect
 	 * it later in case a hangcheck error event fires.
@@ -2478,6 +2480,10 @@ static void i915_gem_free_request(struct drm_i915_gem_request *request)
 	if (request->ctx)
 		i915_gem_context_unreference(request->ctx);
 
+	if (request->shadow_batch_obj)
+		i915_gem_batch_pool_put(request->ring->batch_pool,
+					request->shadow_batch_obj);
+
 	kfree(request);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 93d7f72..0b263aa 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -992,13 +992,14 @@ static void
 i915_gem_execbuffer_retire_commands(struct drm_device *dev,
 				    struct drm_file *file,
 				    struct intel_engine_cs *ring,
-				    struct drm_i915_gem_object *obj)
+				    struct drm_i915_gem_object *obj,
+				    struct drm_i915_gem_object *shadow_obj)
 {
 	/* Unconditionally force add_request to emit a full flush. */
 	ring->gpu_caches_dirty = true;
 
 	/* Add a breadcrumb for the completion of the batch buffer */
-	(void)__i915_add_request(ring, file, obj, NULL);
+	(void)__i915_add_request(ring, file, obj, shadow_obj, NULL);
 }
 
 static int
@@ -1087,6 +1088,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct eb_vmas *eb;
 	struct drm_i915_gem_object *batch_obj;
+	struct drm_i915_gem_object *shadow_batch_obj = NULL;
 	struct drm_clip_rect *cliprects = NULL;
 	struct intel_engine_cs *ring;
 	struct intel_context *ctx;
@@ -1288,8 +1290,31 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	batch_obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
 
 	if (i915_needs_cmd_parser(ring)) {
+		shadow_batch_obj =
+			i915_gem_batch_pool_get(ring->batch_pool,
+						batch_obj->base.size);
+		if (IS_ERR(shadow_batch_obj)) {
+			ret = PTR_ERR(shadow_batch_obj);
+			/* Don't try to clean up the obj in the error path */
+			shadow_batch_obj = NULL;
+
+			/*
+			 * If the pool is at capacity, retiring requests might
+			 * return some buffers.
+			 */
+			if (ret == -EAGAIN)
+				i915_gem_retire_requests_ring(ring);
+
+			goto err;
+		}
+
+		ret = i915_gem_obj_ggtt_pin(shadow_batch_obj, 4096, 0);
+		if (ret)
+			goto err;
+
 		ret = i915_parse_cmds(ring,
 				      batch_obj,
+				      shadow_batch_obj,
 				      args->batch_start_offset,
 				      file->is_master);
 		if (ret)
@@ -1377,13 +1402,27 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
 
 	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
-	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
+	if (shadow_batch_obj) {
+		struct i915_vma *vma = i915_gem_obj_to_ggtt(shadow_batch_obj);
+
+		i915_vma_move_to_active(vma, ring);
+		i915_gem_object_ggtt_unpin(shadow_batch_obj);
+	}
+	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj,
+					    shadow_batch_obj);
 
 err:
 	/* the request owns the ref now */
 	i915_gem_context_unreference(ctx);
 	eb_destroy(eb);
 
+	if (ret && ring && shadow_batch_obj) {
+		if (i915_gem_obj_is_pinned(shadow_batch_obj))
+			i915_gem_object_ggtt_unpin(shadow_batch_obj);
+
+		i915_gem_batch_pool_put(ring->batch_pool, shadow_batch_obj);
+	}
+
 	mutex_unlock(&dev->struct_mutex);
 
 pre_mutex_err:
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index e60be3f..f614406 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -161,7 +161,7 @@ int i915_gem_render_state_init(struct intel_engine_cs *ring)
 
 	i915_vma_move_to_active(i915_gem_obj_to_ggtt(so.obj), ring);
 
-	ret = __i915_add_request(ring, NULL, so.obj, NULL);
+	ret = __i915_add_request(ring, NULL, so.obj, NULL, NULL);
 	/* __i915_add_request moves object to inactive if it fails */
 out:
 	render_state_fini(&so);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index b96edaf..4cd123a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1476,6 +1476,17 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	if (ret)
 		goto error;
 
+	/*
+	 * Benchmarking shows that the actual maximum number of buffers
+	 * in the pool varies a bit with workload, but 64 seems to cover
+	 * us fairly well.
+	 */
+	ring->batch_pool = i915_gem_batch_pool_alloc(dev, 64);
+	if (IS_ERR(ring->batch_pool)) {
+		ret = PTR_ERR(ring->batch_pool);
+		goto error;
+	}
+
 	ret = ring->init(ring);
 	if (ret)
 		goto error;
@@ -1513,6 +1524,7 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring)
 	cleanup_status_page(ring);
 
 	i915_cmd_parser_fini_ring(ring);
+	i915_gem_batch_pool_free(ring->batch_pool);
 
 	kfree(ringbuf);
 	ring->buffer = NULL;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index e72017b..82aae29 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -188,6 +188,13 @@ struct  intel_engine_cs {
 	bool needs_cmd_parser;
 
 	/*
+	 * A pool of objects to use as shadow copies of client batch buffers
+	 * when the command parser is enabled. Prevents the client from
+	 * modifying the batch contents after software parsing.
+	 */
+	struct i915_gem_batch_pool *batch_pool;
+
+	/*
 	 * Table of commands the command parser needs to know about
 	 * for this ring.
 	 */
-- 
1.8.3.2

* [RFC 3/4] drm/i915: Add a batch pool debugfs file
  2014-06-18 16:36 [RFC 0/4] Command parser batch buffer copy bradley.d.volkin
  2014-06-18 16:36 ` [RFC 1/4] drm/i915: Implement a framework for batch buffer pools bradley.d.volkin
  2014-06-18 16:36 ` [RFC 2/4] drm/i915: Use batch pools with the command parser bradley.d.volkin
@ 2014-06-18 16:36 ` bradley.d.volkin
  2014-06-18 16:36 ` [RFC 4/4] drm/i915: Dispatch the shadow batch buffer bradley.d.volkin
  3 siblings, 0 replies; 16+ messages in thread
From: bradley.d.volkin @ 2014-06-18 16:36 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

It provides some useful information about the buffers in
the per-ring command parser batch pools.
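
For illustration, each ring's entry in the file has roughly this shape
(the ring name and total are made up, and the per-object lines, which
come from describe_obj(), are abbreviated):

	render ring pool:
	 active:
	   [describe_obj() output for each in-flight shadow batch]
	 inactive:
	   [describe_obj() output for each idle, reusable buffer]
	 total: 2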

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 48 +++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 76c2572..a930313 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -564,6 +564,53 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 	return 0;
 }
 
+static int i915_gem_batch_pool_info(struct seq_file *m, void *data)
+{
+	struct drm_info_node *node = m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *ring;
+	struct drm_i915_gem_object *obj;
+	int count;
+	int ret, i;
+
+	ret = mutex_lock_interruptible(&dev->struct_mutex);
+	if (ret)
+		return ret;
+
+	for_each_ring(ring, dev_priv, i) {
+		count = 0;
+
+		seq_printf(m, "%s pool:\n", ring->name);
+
+		seq_puts(m, " active:\n");
+		list_for_each_entry(obj,
+				    &ring->batch_pool->active_list,
+				    batch_pool_list) {
+			seq_puts(m, "   ");
+			describe_obj(m, obj);
+			seq_putc(m, '\n');
+			count++;
+		}
+
+		seq_puts(m, " inactive:\n");
+		list_for_each_entry(obj,
+				    &ring->batch_pool->inactive_list,
+				    batch_pool_list) {
+			seq_puts(m, "   ");
+			describe_obj(m, obj);
+			seq_putc(m, '\n');
+			count++;
+		}
+
+		seq_printf(m, " total: %d\n", count);
+	}
+
+	mutex_unlock(&dev->struct_mutex);
+
+	return 0;
+}
+
 static int i915_gem_request_info(struct seq_file *m, void *data)
 {
 	struct drm_info_node *node = m->private;
@@ -3811,6 +3858,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_gem_hws_blt", i915_hws_info, 0, (void *)BCS},
 	{"i915_gem_hws_bsd", i915_hws_info, 0, (void *)VCS},
 	{"i915_gem_hws_vebox", i915_hws_info, 0, (void *)VECS},
+	{"i915_gem_batch_pool", i915_gem_batch_pool_info, 0},
 	{"i915_rstdby_delays", i915_rstdby_delays, 0},
 	{"i915_frequency_info", i915_frequency_info, 0},
 	{"i915_delayfreq_table", i915_delayfreq_table, 0},
-- 
1.8.3.2

* [RFC 4/4] drm/i915: Dispatch the shadow batch buffer
  2014-06-18 16:36 [RFC 0/4] Command parser batch buffer copy bradley.d.volkin
                   ` (2 preceding siblings ...)
  2014-06-18 16:36 ` [RFC 3/4] drm/i915: Add a batch pool debugfs file bradley.d.volkin
@ 2014-06-18 16:36 ` bradley.d.volkin
  3 siblings, 0 replies; 16+ messages in thread
From: bradley.d.volkin @ 2014-06-18 16:36 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

This is useful for testing the batch pool code with aliasing PPGTT.
It doesn't work with full PPGTT though; the GPU hangs and the whole
UI is corrupted. We need fixes for the secure dispatch path to
enable this for real.

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 37 ++++++++++++++++--------------
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 0b263aa..981f66b 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1321,31 +1321,34 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 			goto err;
 
 		/*
-		 * XXX: Actually do this when enabling batch copy...
-		 *
 		 * Set the DISPATCH_SECURE bit to remove the NON_SECURE bit
 		 * from MI_BATCH_BUFFER_START commands issued in the
 		 * dispatch_execbuffer implementations. We specifically don't
 		 * want that set when the command parser is enabled.
 		 */
+		flags |= I915_DISPATCH_SECURE;
 	}
 
-	/* snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
-	 * batch" bit. Hence we need to pin secure batches into the global gtt.
-	 * hsw should have this fixed, but bdw mucks it up again. */
-	if (flags & I915_DISPATCH_SECURE &&
-	    !batch_obj->has_global_gtt_mapping) {
-		/* When we have multiple VMs, we'll need to make sure that we
-		 * allocate space first */
-		struct i915_vma *vma = i915_gem_obj_to_ggtt(batch_obj);
-		BUG_ON(!vma);
-		vma->bind_vma(vma, batch_obj->cache_level, GLOBAL_BIND);
-	}
+	if (!shadow_batch_obj) {
+		/* snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
+		 * batch" bit. Hence we need to pin secure batches into the global gtt.
+		 * hsw should have this fixed, but bdw mucks it up again. */
+		if (flags & I915_DISPATCH_SECURE &&
+		    !batch_obj->has_global_gtt_mapping) {
+			/* When we have multiple VMs, we'll need to make sure that we
+			 * allocate space first */
+			struct i915_vma *vma = i915_gem_obj_to_ggtt(batch_obj);
+			BUG_ON(!vma);
+			vma->bind_vma(vma, batch_obj->cache_level, GLOBAL_BIND);
+		}
 
-	if (flags & I915_DISPATCH_SECURE)
-		exec_start += i915_gem_obj_ggtt_offset(batch_obj);
-	else
-		exec_start += i915_gem_obj_offset(batch_obj, vm);
+		if (flags & I915_DISPATCH_SECURE)
+			exec_start += i915_gem_obj_ggtt_offset(batch_obj);
+		else
+			exec_start += i915_gem_obj_offset(batch_obj, vm);
+	} else {
+		exec_start += i915_gem_obj_ggtt_offset(shadow_batch_obj);
+	}
 
 	ret = i915_gem_execbuffer_move_to_gpu(ring, &eb->vmas);
 	if (ret)
-- 
1.8.3.2

* Re: [RFC 2/4] drm/i915: Use batch pools with the command parser
  2014-06-18 16:36 ` [RFC 2/4] drm/i915: Use batch pools with the command parser bradley.d.volkin
@ 2014-06-18 16:52   ` Chris Wilson
  2014-06-18 17:49     ` Volkin, Bradley D
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2014-06-18 16:52 UTC (permalink / raw)
  To: bradley.d.volkin; +Cc: intel-gfx

On Wed, Jun 18, 2014 at 09:36:14AM -0700, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
> 
> This patch sets up all of the tracking and copying necessary to
> use batch pools with the command parser, but does not actually
> dispatch the copied (shadow) batch to the hardware yet. We still
> aren't quite ready to set the secure bit during dispatch.
> 
> Note that performance takes a hit from the copy in some cases
> and will likely need some work. At a rough pass, the memcpy
> appears to be the bottleneck. Without having done a deeper
> analysis, two ideas that come to mind are:
> 1) Copy sections of the batch at a time, as they are reached
>    by parsing. Might improve cache locality.
> 2) Copy only up to the userspace-supplied batch length and
>    memset the rest of the buffer. Reduces the number of reads.
> 
> Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_cmd_parser.c       | 75 ++++++++++++++++++++++------
>  drivers/gpu/drm/i915/i915_drv.h              |  7 ++-
>  drivers/gpu/drm/i915/i915_gem.c              |  8 ++-
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c   | 45 +++++++++++++++--
>  drivers/gpu/drm/i915/i915_gem_render_state.c |  2 +-
>  drivers/gpu/drm/i915/intel_ringbuffer.c      | 12 +++++
>  drivers/gpu/drm/i915/intel_ringbuffer.h      |  7 +++
>  7 files changed, 134 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index dea99d9..669afb0 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -831,6 +831,53 @@ finish:
>  	return (u32*)addr;
>  }
>  
> +/* Returns a vmap'd pointer to dest_obj, which the caller must unmap */
> +static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
> +		       struct drm_i915_gem_object *src_obj)
> +{
> +	int ret = 0;
> +	int needs_clflush = 0;
> +	u32 *src_addr, *dest_addr = NULL;
> +
> +	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("CMD: failed to prep read\n");
> +		return ERR_PTR(ret);
> +	}
> +
> +	src_addr = vmap_batch(src_obj);
> +	if (!src_addr) {
> +		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
> +		ret = -ENOMEM;
> +		goto unpin_src;
> +	}
> +
> +	if (needs_clflush)
> +		drm_clflush_virt_range((char *)src_addr, src_obj->base.size);
> +
> +	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("CMD: Failed to set batch CPU domain\n");
> +		goto unmap_src;
> +	}
> +
> +	dest_addr = vmap_batch(dest_obj);
> +	if (!dest_addr) {
> +		DRM_DEBUG_DRIVER("CMD: Failed to vmap shadow batch\n");
> +		ret = -ENOMEM;
> +		goto unmap_src;
> +	}
> +
> +	memcpy(dest_addr, src_addr, src_obj->base.size);
> +
> +unmap_src:
> +	vunmap(src_addr);
> +unpin_src:
> +	i915_gem_object_unpin_pages(src_obj);
> +
> +	return ret ? ERR_PTR(ret) : dest_addr;
> +}

src_obj->size? You should perhaps borrow a byt.

>  static int
> @@ -1087,6 +1088,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct eb_vmas *eb;
>  	struct drm_i915_gem_object *batch_obj;
> +	struct drm_i915_gem_object *shadow_batch_obj = NULL;
>  	struct drm_clip_rect *cliprects = NULL;
>  	struct intel_engine_cs *ring;
>  	struct intel_context *ctx;
> @@ -1288,8 +1290,31 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  	batch_obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
>  
>  	if (i915_needs_cmd_parser(ring)) {
> +		shadow_batch_obj =
> +			i915_gem_batch_pool_get(ring->batch_pool,
> +						batch_obj->base.size);
> +		if (IS_ERR(shadow_batch_obj)) {
> +			ret = PTR_ERR(shadow_batch_obj);
> +			/* Don't try to clean up the obj in the error path */
> +			shadow_batch_obj = NULL;
> +
> +			/*
> +			 * If the pool is at capcity, retiring requests might
> +			 * return some buffers.
> +			 */
> +			if (ret == -EAGAIN)
> +				i915_gem_retire_requests_ring(ring);
> +
> +			goto err;
> +		}
> +
> +		ret = i915_gem_obj_ggtt_pin(shadow_batch_obj, 4096, 0);
> +		if (ret)
> +			goto err;
> +
>  		ret = i915_parse_cmds(ring,
>  				      batch_obj,
> +				      shadow_batch_obj,
>  				      args->batch_start_offset,
>  				      file->is_master);

You could just allocate the shadow inside parse_cmds and return it. Then
replace the conventional batch_obj with the new pointer and add it to
the eb list. Similarly, you do not need to track shadow_batch_obj
explicitly in the requests; the pool can do its own busy tracking
external to the core, which reduces the invasiveness of the patch.

> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index e72017b..82aae29 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -188,6 +188,13 @@ struct  intel_engine_cs {
>  	bool needs_cmd_parser;
>  
>  	/*
> +	 * A pool of objects to use as shadow copies of client batch buffers
> +	 * when the command parser is enabled. Prevents the client from
> +	 * modifying the batch contents after software parsing.
> +	 */
> +	struct i915_gem_batch_pool *batch_pool;

Why are they tied to a ring?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [RFC 2/4] drm/i915: Use batch pools with the command parser
  2014-06-18 16:52   ` Chris Wilson
@ 2014-06-18 17:49     ` Volkin, Bradley D
  2014-06-18 18:11       ` Chris Wilson
  0 siblings, 1 reply; 16+ messages in thread
From: Volkin, Bradley D @ 2014-06-18 17:49 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Wed, Jun 18, 2014 at 09:52:52AM -0700, Chris Wilson wrote:
> On Wed, Jun 18, 2014 at 09:36:14AM -0700, bradley.d.volkin@intel.com wrote:
> > From: Brad Volkin <bradley.d.volkin@intel.com>
> > 
> > This patch sets up all of the tracking and copying necessary to
> > use batch pools with the command parser, but does not actually
> > dispatch the copied (shadow) batch to the hardware yet. We still
> > aren't quite ready to set the secure bit during dispatch.
> > 
> > Note that performance takes a hit from the copy in some cases
> > and will likely need some work. At a rough pass, the memcpy
> > appears to be the bottleneck. Without having done a deeper
> > analysis, two ideas that come to mind are:
> > 1) Copy sections of the batch at a time, as they are reached
> >    by parsing. Might improve cache locality.
> > 2) Copy only up to the userspace-supplied batch length and
> >    memset the rest of the buffer. Reduces the number of reads.
> > 
> > Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_cmd_parser.c       | 75 ++++++++++++++++++++++------
> >  drivers/gpu/drm/i915/i915_drv.h              |  7 ++-
> >  drivers/gpu/drm/i915/i915_gem.c              |  8 ++-
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c   | 45 +++++++++++++++--
> >  drivers/gpu/drm/i915/i915_gem_render_state.c |  2 +-
> >  drivers/gpu/drm/i915/intel_ringbuffer.c      | 12 +++++
> >  drivers/gpu/drm/i915/intel_ringbuffer.h      |  7 +++
> >  7 files changed, 134 insertions(+), 22 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > index dea99d9..669afb0 100644
> > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > @@ -831,6 +831,53 @@ finish:
> >  	return (u32*)addr;
> >  }
> >  
> > +/* Returns a vmap'd pointer to dest_obj, which the caller must unmap */
> > +static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
> > +		       struct drm_i915_gem_object *src_obj)
> > +{
> > +	int ret = 0;
> > +	int needs_clflush = 0;
> > +	u32 *src_addr, *dest_addr = NULL;
> > +
> > +	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
> > +	if (ret) {
> > +		DRM_DEBUG_DRIVER("CMD: failed to prep read\n");
> > +		return ERR_PTR(ret);
> > +	}
> > +
> > +	src_addr = vmap_batch(src_obj);
> > +	if (!src_addr) {
> > +		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
> > +		ret = -ENOMEM;
> > +		goto unpin_src;
> > +	}
> > +
> > +	if (needs_clflush)
> > +		drm_clflush_virt_range((char *)src_addr, src_obj->base.size);
> > +
> > +	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
> > +	if (ret) {
> > +		DRM_DEBUG_DRIVER("CMD: Failed to set batch CPU domain\n");
> > +		goto unmap_src;
> > +	}
> > +
> > +	dest_addr = vmap_batch(dest_obj);
> > +	if (!dest_addr) {
> > +		DRM_DEBUG_DRIVER("CMD: Failed to vmap shadow batch\n");
> > +		ret = -ENOMEM;
> > +		goto unmap_src;
> > +	}
> > +
> > +	memcpy(dest_addr, src_addr, src_obj->base.size);
> > +
> > +unmap_src:
> > +	vunmap(src_addr);
> > +unpin_src:
> > +	i915_gem_object_unpin_pages(src_obj);
> > +
> > +	return ret ? ERR_PTR(ret) : dest_addr;
> > +}
> 
> src_obj->size? You should perhaps borrow a byt.

Not sure I completely follow you here. The dest_obj is as big or bigger than
the source, so I think copying only as much data as exists in the source
object is right. I should be writing nops into any extra space though. And in
parse_cmds, I should update the batch_end calculation. Were those the issues
that you saw?

> 
> >  static int
> > @@ -1087,6 +1088,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >  	struct drm_i915_private *dev_priv = dev->dev_private;
> >  	struct eb_vmas *eb;
> >  	struct drm_i915_gem_object *batch_obj;
> > +	struct drm_i915_gem_object *shadow_batch_obj = NULL;
> >  	struct drm_clip_rect *cliprects = NULL;
> >  	struct intel_engine_cs *ring;
> >  	struct intel_context *ctx;
> > @@ -1288,8 +1290,31 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >  	batch_obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
> >  
> >  	if (i915_needs_cmd_parser(ring)) {
> > +		shadow_batch_obj =
> > +			i915_gem_batch_pool_get(ring->batch_pool,
> > +						batch_obj->base.size);
> > +		if (IS_ERR(shadow_batch_obj)) {
> > +			ret = PTR_ERR(shadow_batch_obj);
> > +			/* Don't try to clean up the obj in the error path */
> > +			shadow_batch_obj = NULL;
> > +
> > +			/*
> > +			 * If the pool is at capcity, retiring requests might
> > +			 * return some buffers.
> > +			 */
> > +			if (ret == -EAGAIN)
> > +				i915_gem_retire_requests_ring(ring);
> > +
> > +			goto err;
> > +		}
> > +
> > +		ret = i915_gem_obj_ggtt_pin(shadow_batch_obj, 4096, 0);
> > +		if (ret)
> > +			goto err;
> > +
> >  		ret = i915_parse_cmds(ring,
> >  				      batch_obj,
> > +				      shadow_batch_obj,
> >  				      args->batch_start_offset,
> >  				      file->is_master);
> 
> You could just allocate the shadow inside parse_cmds and return it. Then
> replace the conventional batch_boj with the new pointer and add it to
> the eb list. Similarly, you do not need to track shadow_batch_obj
> explicitly in the requests, the pool can do its own busy tracking
> external to the core and reduce the invasiveness of the patch.

Overall, I agree that this touched more places than I would have liked.

I initially thought about replacing batch_obj and hooking into the eb list,
but that seemed like it would involve trickier code w.r.t. ref counting,
making up an exec_entry for the vma, etc. Not the end of the world, but
it felt less obvious. I can look again though if you think it's worth it.

What specifically are you thinking in terms of implementing busy tracking
in the pool? The idea with adding the shadow object to the request was just
to get it back in the pool and purgeable asap. I also thought it would limit
some additional code given that we know buffers in the pool have had any
pending work completed. Maybe the suggested approach would do a better job
of those things though.

> 
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > index e72017b..82aae29 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > @@ -188,6 +188,13 @@ struct  intel_engine_cs {
> >  	bool needs_cmd_parser;
> >  
> >  	/*
> > +	 * A pool of objects to use as shadow copies of client batch buffers
> > +	 * when the command parser is enabled. Prevents the client from
> > +	 * modifying the batch contents after software parsing.
> > +	 */
> > +	struct i915_gem_batch_pool *batch_pool;
> 
> Why are they tied to a ring?

There was a question as to whether to make a global pool or per-ring
pools way back when I first sent the parser series. Daniel expressed
a slight preference for per-ring, but I don't object to a global pool.
I suppose a potential benefit to per-ring would be if userspace uses
different sized buffers on different rings, we'd more easily find a
suitable buffer. But enhancing the pool management could fix that too.

Brad

> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre

* Re: [RFC 2/4] drm/i915: Use batch pools with the command parser
  2014-06-18 17:49     ` Volkin, Bradley D
@ 2014-06-18 18:11       ` Chris Wilson
  2014-06-18 19:59         ` Daniel Vetter
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2014-06-18 18:11 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

On Wed, Jun 18, 2014 at 10:49:05AM -0700, Volkin, Bradley D wrote:
> On Wed, Jun 18, 2014 at 09:52:52AM -0700, Chris Wilson wrote:
> > On Wed, Jun 18, 2014 at 09:36:14AM -0700, bradley.d.volkin@intel.com wrote:
> > > From: Brad Volkin <bradley.d.volkin@intel.com>
> > > 
> > > This patch sets up all of the tracking and copying necessary to
> > > use batch pools with the command parser, but does not actually
> > > dispatch the copied (shadow) batch to the hardware yet. We still
> > > aren't quite ready to set the secure bit during dispatch.
> > > 
> > > Note that performance takes a hit from the copy in some cases
> > > and will likely need some work. At a rough pass, the memcpy
> > > appears to be the bottleneck. Without having done a deeper
> > > analysis, two ideas that come to mind are:
> > > 1) Copy sections of the batch at a time, as they are reached
> > >    by parsing. Might improve cache locality.
> > > 2) Copy only up to the userspace-supplied batch length and
> > >    memset the rest of the buffer. Reduces the number of reads.
> > > 
> > > Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/i915_cmd_parser.c       | 75 ++++++++++++++++++++++------
> > >  drivers/gpu/drm/i915/i915_drv.h              |  7 ++-
> > >  drivers/gpu/drm/i915/i915_gem.c              |  8 ++-
> > >  drivers/gpu/drm/i915/i915_gem_execbuffer.c   | 45 +++++++++++++++--
> > >  drivers/gpu/drm/i915/i915_gem_render_state.c |  2 +-
> > >  drivers/gpu/drm/i915/intel_ringbuffer.c      | 12 +++++
> > >  drivers/gpu/drm/i915/intel_ringbuffer.h      |  7 +++
> > >  7 files changed, 134 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > > index dea99d9..669afb0 100644
> > > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> > > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > > @@ -831,6 +831,53 @@ finish:
> > >  	return (u32*)addr;
> > >  }
> > >  
> > > +/* Returns a vmap'd pointer to dest_obj, which the caller must unmap */
> > > +static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
> > > +		       struct drm_i915_gem_object *src_obj)
> > > +{
> > > +	int ret = 0;
> > > +	int needs_clflush = 0;
> > > +	u32 *src_addr, *dest_addr = NULL;
> > > +
> > > +	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
> > > +	if (ret) {
> > > +		DRM_DEBUG_DRIVER("CMD: failed to prep read\n");
> > > +		return ERR_PTR(ret);
> > > +	}
> > > +
> > > +	src_addr = vmap_batch(src_obj);
> > > +	if (!src_addr) {
> > > +		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
> > > +		ret = -ENOMEM;
> > > +		goto unpin_src;
> > > +	}
> > > +
> > > +	if (needs_clflush)
> > > +		drm_clflush_virt_range((char *)src_addr, src_obj->base.size);
> > > +
> > > +	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
> > > +	if (ret) {
> > > +		DRM_DEBUG_DRIVER("CMD: Failed to set batch CPU domain\n");
> > > +		goto unmap_src;
> > > +	}
> > > +
> > > +	dest_addr = vmap_batch(dest_obj);
> > > +	if (!dest_addr) {
> > > +		DRM_DEBUG_DRIVER("CMD: Failed to vmap shadow batch\n");
> > > +		ret = -ENOMEM;
> > > +		goto unmap_src;
> > > +	}
> > > +
> > > +	memcpy(dest_addr, src_addr, src_obj->base.size);
> > > +
> > > +unmap_src:
> > > +	vunmap(src_addr);
> > > +unpin_src:
> > > +	i915_gem_object_unpin_pages(src_obj);
> > > +
> > > +	return ret ? ERR_PTR(ret) : dest_addr;
> > > +}
> > 
> > src_obj->size? You should perhaps borrow a byt.
> 
> Not sure I completely follow you here. The dest_obj is as big or bigger than
> the source, so I think copying only as much data as exists in the source
> object is right. I should be writing nops into any extra space though. And in
> parse_cmds, I should update the batch_end calculation. Were those the issues
> that you saw?

Just that generally src->size >> batch len and clflush will make it
twice as expensive on byt. (The object may be about 1000 times larger
than the batch at the extreme, libva *cough*.)

> > >  static int
> > > @@ -1087,6 +1088,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> > >  	struct drm_i915_private *dev_priv = dev->dev_private;
> > >  	struct eb_vmas *eb;
> > >  	struct drm_i915_gem_object *batch_obj;
> > > +	struct drm_i915_gem_object *shadow_batch_obj = NULL;
> > >  	struct drm_clip_rect *cliprects = NULL;
> > >  	struct intel_engine_cs *ring;
> > >  	struct intel_context *ctx;
> > > @@ -1288,8 +1290,31 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> > >  	batch_obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
> > >  
> > >  	if (i915_needs_cmd_parser(ring)) {
> > > +		shadow_batch_obj =
> > > +			i915_gem_batch_pool_get(ring->batch_pool,
> > > +						batch_obj->base.size);
> > > +		if (IS_ERR(shadow_batch_obj)) {
> > > +			ret = PTR_ERR(shadow_batch_obj);
> > > +			/* Don't try to clean up the obj in the error path */
> > > +			shadow_batch_obj = NULL;
> > > +
> > > +			/*
> > > +			 * If the pool is at capcity, retiring requests might
> > > +			 * return some buffers.
> > > +			 */
> > > +			if (ret == -EAGAIN)
> > > +				i915_gem_retire_requests_ring(ring);
> > > +
> > > +			goto err;
> > > +		}
> > > +
> > > +		ret = i915_gem_obj_ggtt_pin(shadow_batch_obj, 4096, 0);
> > > +		if (ret)
> > > +			goto err;
> > > +
> > >  		ret = i915_parse_cmds(ring,
> > >  				      batch_obj,
> > > +				      shadow_batch_obj,
> > >  				      args->batch_start_offset,
> > >  				      file->is_master);
> > 
> > You could just allocate the shadow inside parse_cmds and return it. Then
> > replace the conventional batch_boj with the new pointer and add it to
> > the eb list. Similarly, you do not need to track shadow_batch_obj
> > explicitly in the requests, the pool can do its own busy tracking
> > external to the core and reduce the invasiveness of the patch.
> 
> Overall, I agree that this touched more places than I would have liked.
> 
> I initially thought about replacing batch_obj and hooking into the eb list,
> but that seemed like it would involve trickier code w.r.t. ref counting,
> making up an exec_entry for the vma, etc. Not the end of the world, but
> it felt less obvious. I can look again though if you think it's worth it.

I think so. The request is already complicated enough and I think the
shadow batch can fit neatly inside the rules we already have with a
modicum of stitching here.
 
> What specifically are you thinking in terms of implementing busy tracking
> in the pool? The idea with adding the shadow object to the request was just
> to get it back in the pool and purgeable asap. I also thought it would limit
> some additional code given that we know buffers in the pool have had any
> pending work completed. Maybe the suggested approach would do a better job
> of those things though.

I was thinking that a pool is basically a list of bo, and you simply
query whether the oldest was still busy when we need a new bo. Which is
the same as how userspace implements its pool of active/inactive objects.
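
Very roughly, something like this (the single pool->buffers list and the
exact shape are just for illustration, not a concrete proposal):

	/* Oldest bo is at the head; reuse it iff the GPU is done with it. */
	obj = list_first_entry_or_null(&pool->buffers,
				       struct drm_i915_gem_object,
				       batch_pool_list);
	if (obj && !obj->active) {
		list_move_tail(&obj->batch_pool_list, &pool->buffers);
		return obj;
	}

	/* Otherwise allocate a fresh bo and add it to the tail. */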
 
> > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > > index e72017b..82aae29 100644
> > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > > @@ -188,6 +188,13 @@ struct  intel_engine_cs {
> > >  	bool needs_cmd_parser;
> > >  
> > >  	/*
> > > +	 * A pool of objects to use as shadow copies of client batch buffers
> > > +	 * when the command parser is enabled. Prevents the client from
> > > +	 * modifying the batch contents after software parsing.
> > > +	 */
> > > +	struct i915_gem_batch_pool *batch_pool;
> > 
> > Why are they tied to a ring?
> 
> There was a question as to whether to make a global pool or per-ring
> pools way back when I first sent the parser series. Daniel expressed
> a slight preference for per-ring, but I don't object to a global pool.
> I suppose a potential benefit to per-ring would be if userspace uses
> different sized buffers on different rings, we'd more easily find a
> suitable buffer. But enhancing the pool management could fix that too.

Batch buffer sizes are not fixed per ring anyway. I guess Daniel thought
it might work out easier, but memory is a global resource and should be
tracked primarily at the device level. Userspace segregates its caches
based on size to speed up searches.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [RFC 2/4] drm/i915: Use batch pools with the command parser
  2014-06-18 18:11       ` Chris Wilson
@ 2014-06-18 19:59         ` Daniel Vetter
  0 siblings, 0 replies; 16+ messages in thread
From: Daniel Vetter @ 2014-06-18 19:59 UTC (permalink / raw)
  To: Chris Wilson, Volkin, Bradley D, intel-gfx

On Wed, Jun 18, 2014 at 07:11:46PM +0100, Chris Wilson wrote:
> On Wed, Jun 18, 2014 at 10:49:05AM -0700, Volkin, Bradley D wrote:
> > What specifically are you thinking in terms of implementing busy tracking
> > in the pool? The idea with adding the shadow object to the request was just
> > to get it back in the pool and purgeable asap. I also thought it would limit
> > some additional code given that we know buffers in the pool have had any
> > pending work completed. Maybe the suggested approach would do a better job
> > of those things though.
> 
> I was thinking that a pool is basically a list of bo, and you simply
> query whether the oldest was still busy when we need a new bo. Which is
> the same as how userspace implements its pool of active/inactive objects.

Yeah, linking the ggtt vma of the shadow batch into the eb list should be
the most natural solution. We need the same trickery for secure batches to
make sure they're bound into the ggtt with full ppgtt anyway. Yeah, that's
one of the tasks I'm signed up on and slacked off about :(
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [RFC 1/4] drm/i915: Implement a framework for batch buffer pools
  2014-06-18 16:36 ` [RFC 1/4] drm/i915: Implement a framework for batch buffer pools bradley.d.volkin
@ 2014-06-19  9:48   ` Tvrtko Ursulin
  2014-06-19 17:35     ` Volkin, Bradley D
  0 siblings, 1 reply; 16+ messages in thread
From: Tvrtko Ursulin @ 2014-06-19  9:48 UTC (permalink / raw)
  To: bradley.d.volkin, intel-gfx


Hi Brad,

On 06/18/2014 05:36 PM, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
>
> This adds a small module for managing a pool of batch buffers.
> The only current use case is for the command parser, as described
> in the kerneldoc in the patch. The code is simple, but separating
> it out makes it easier to change the underlying algorithms and to
> extend to future use cases should they arise.
>
> The interface is simple: alloc to create an empty pool, free to
> clean it up; get to obtain a new buffer, put to return it to the
> pool. Note that all buffers must be returned to the pool before
> freeing it.
>
> The pool has a maximum number of buffers allowed due to some tests
> (e.g. gem_exec_nop) creating a very large number of buffers (e.g.
> ___). Buffers are purgeable while in the pool, but not explicitly
> truncated in order to avoid overhead during execbuf.
>
> Locking is currently based on the caller holding the struct_mutex.
> We already do that in the places where we will use the batch pool
> for the command parser.
>
> Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> ---
>
> r.e. pool capacity
> My original testing showed something like thousands of buffers in
> the pool after a gem_exec_nop run. But when I reran with the max
> check disabled just now to get an actual number for the commit
> message, the number was more like 130.  I developed and tested the
> changes incrementally, and suspect that the original run was before
> I implemented the actual copy operation. So I'm inclined to remove
> or at least increase the cap in the final version. Thoughts?

Some random thoughts:

Is it strictly necessary to cap the pool size? I ask because it seems to 
be introducing a limit where so far there wasn't an explicit one.

Are object sizes generally page aligned, or have you seen all sizes in
the distribution? Either way, I am wondering whether some sort of
rounding up would be more efficient. Would it cause a problem if a
slightly larger object was returned?

Given that objects are only ever added to the pool, once the maximum
number is allocated and there are no free ones of the exact size, it nags
userspace with EAGAIN and retires objects. But I wonder if the above
points could reduce that behaviour?

Could we get away without tracking the given-out objects in a list and
just keep a list of available ones? In that case, if an object can only
ever be either in the free pool or on one of the existing GEM
active/inactive lists, the same list head could be used?

Could it use its own locking just as easily? Just thinking that if the
future goal is fine-grained locking, this seems self-contained enough to
be doable straight away, unless I am missing something.

The above would make the pool more generic, but then I read Chris's
reply, which perhaps suggests making it more specialised, so I don't know.

Regards,

Tvrtko
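
For anyone skimming the thread, the interface under discussion is the four
entry points named in the commit message quoted above. A rough sketch of how
those prototypes might look; the exact names and signatures here are guesses
from the commit message, not copied from the RFC patch:

struct i915_gem_batch_pool;

/* Create an empty pool, and tear it down once every buffer has been put back. */
struct i915_gem_batch_pool *i915_gem_batch_pool_alloc(struct drm_device *dev);
void i915_gem_batch_pool_free(struct i915_gem_batch_pool *pool);

/* Take a buffer of at least @size out of the pool (allocating one if nothing
 * suitable is free), and return it once the shadow copy has been submitted. */
struct drm_i915_gem_object *
i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool, size_t size);
void i915_gem_batch_pool_put(struct i915_gem_batch_pool *pool,
                             struct drm_i915_gem_object *obj);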


* Re: [RFC 1/4] drm/i915: Implement a framework for batch buffer pools
  2014-06-19  9:48   ` Tvrtko Ursulin
@ 2014-06-19 17:35     ` Volkin, Bradley D
  2014-06-19 19:07       ` Daniel Vetter
  2014-06-20 13:25       ` Tvrtko Ursulin
  0 siblings, 2 replies; 16+ messages in thread
From: Volkin, Bradley D @ 2014-06-19 17:35 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Thu, Jun 19, 2014 at 02:48:29AM -0700, Tvrtko Ursulin wrote:
> 
> Hi Brad,
> 
> On 06/18/2014 05:36 PM, bradley.d.volkin@intel.com wrote:
> >
> > r.e. pool capacity
> > My original testing showed something like thousands of buffers in
> > the pool after a gem_exec_nop run. But when I reran with the max
> > check disabled just now to get an actual number for the commit
> > message, the number was more like 130.  I developed and tested the
> > changes incrementally, and suspect that the original run was before
> > I implemented the actual copy operation. So I'm inclined to remove
> > or at least increase the cap in the final version. Thoughts?
> 
> Some random thoughts:
> 
> Is it strictly necessary to cap the pool size? I ask because it seems to 
> be introducing a limit where so far there wasn't an explicit one.

No, I only added it because there were a huge number of buffers in the
pool at one point. But that seems to have been an artifact of my
development process, so unless someone says they really want to keep
the cap, I'm going to drop it in the next rev.

> 
> Are object sizes generally page aligned, or have you seen all sizes in
> the distribution? Either way, I am wondering whether some sort of
> rounding up would be more efficient. Would it cause a problem if a
> slightly larger object was returned?

I believe objects must have page-aligned sizes.

> 
> Given that objects are only ever added to the pool, once max number is 
> allocated and there are no free ones of exact size it nags userspace 
> with EAGAIN and retires objects. But I wonder if the above points could 
> reduce that behaviour?
> 
> Could we get away without tracking the given out objects in a list and 
> just keep a list of available ones? In which case if object can only 
> ever be either in the free pool or on one of the existing GEM 
> active/inactive lists the same list head could be used?
> 
> Could it use its own locking just as easily? Just thinking if the future 
> goal is to fine grain locking, this seems self contained enough to be 
> doable straight away unless I am missing something.

I thought about putting a spinlock in the batch pool struct for
exactly that reason. But we have to hold the struct_mutex for at least
drm_gem_object_unreference(). I guess we don't actually need it for
i915_gem_alloc_object(). I opted to just put the mutex_is_locked()
checks so the dependency on the lock being held was hopefully easy to
find by the poor soul who eventually tries to tackle the locking
improvements :). But if people would rather I add a lock to the batch
pool struct, I can do that.

Thanks,
Brad

> 
> The above would make the pool more generic, but then I read Chris's 
> reply which perhaps suggests to make it more specialised so I don't know.
> 
> Regards,
> 
> Tvrtko
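
The mutex_is_locked() checks Brad mentions would look roughly like the
fragment below; the pool->dev field and the exact placement are assumptions
for illustration, not the posted code:

/* Sketch: each pool entry point asserts that the caller already holds the
 * device struct_mutex, so a missing lock is loud and easy to find. */
struct drm_i915_gem_object *
i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool, size_t size)
{
        WARN_ON(!mutex_is_locked(&pool->dev->struct_mutex));

        /* ... walk the free list or allocate a new object ... */
        return NULL;
}

void i915_gem_batch_pool_put(struct i915_gem_batch_pool *pool,
                             struct drm_i915_gem_object *obj)
{
        WARN_ON(!mutex_is_locked(&pool->dev->struct_mutex));

        /* ... mark the object purgeable and park it on the free list ... */
}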


* Re: [RFC 1/4] drm/i915: Implement a framework for batch buffer pools
  2014-06-19 17:35     ` Volkin, Bradley D
@ 2014-06-19 19:07       ` Daniel Vetter
  2014-06-20 13:25       ` Tvrtko Ursulin
  1 sibling, 0 replies; 16+ messages in thread
From: Daniel Vetter @ 2014-06-19 19:07 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

On Thu, Jun 19, 2014 at 10:35:44AM -0700, Volkin, Bradley D wrote:
> On Thu, Jun 19, 2014 at 02:48:29AM -0700, Tvrtko Ursulin wrote:
> > 
> > Hi Brad,
> > 
> > On 06/18/2014 05:36 PM, bradley.d.volkin@intel.com wrote:
> > >
> > > r.e. pool capacity
> > > My original testing showed something like thousands of buffers in
> > > the pool after a gem_exec_nop run. But when I reran with the max
> > > check disabled just now to get an actual number for the commit
> > > message, the number was more like 130.  I developed and tested the
> > > changes incrementally, and suspect that the original run was before
> > > I implemented the actual copy operation. So I'm inclined to remove
> > > or at least increase the cap in the final version. Thoughts?
> > 
> > Some random thoughts:
> > 
> > Is it strictly necessary to cap the pool size? I ask because it seems to 
> > be introducing a limit where so far there wasn't an explicit one.
> 
> No, I only added it because there were a huge number of buffers in the
> pool at one point. But that seems to have been an artifact of my
> development process, so unless someone says they really want to keep
> the cap, I'm going to drop it in the next rev.

As long as we have a single global LRU and mark buffers correctly as
purgeable, the shrinker logic will size your working set optimally. So I
don't think a cap is necessary as long as you reuse eagerly.

> > Are object sizes generally page aligned, or have you seen all sizes in
> > the distribution? Either way, I am wondering whether some sort of
> > rounding up would be more efficient. Would it cause a problem if a
> > slightly larger object was returned?
> 
> I believe objects must have page-aligned sizes.

Yes.

> > Given that objects are only ever added to the pool, once max number is 
> > allocated and there are no free ones of exact size it nags userspace 
> > with EAGAIN and retires objects. But I wonder if the above points could 
> > reduce that behaviour?
> > 
> > Could we get away without tracking the given out objects in a list and 
> > just keep a list of available ones? In which case if object can only 
> > ever be either in the free pool or on one of the existing GEM 
> > active/inactive lists the same list head could be used?
> > 
> > Could it use its own locking just as easily? Just thinking if the future 
> > goal is to fine grain locking, this seems self contained enough to be 
> > doable straight away unless I am missing something.
> 
> I thought about putting a spinlock in the batch pool struct for
> exactly that reason. But we have to hold the struct_mutex for at least
> drm_gem_object_unreference(). I guess we don't actually need it for
> i915_gem_alloc_object(). I opted to just put the mutex_is_locked()
> checks so the dependency on the lock being held was hopefully easy to
> find by the poor soul who eventually tries to tackle the locking
> improvements :). But if people would rather I add a lock to the batch
> pool struct, I can do that.

If we ever get around to more fine-grained locking, I expect three classes
of locks: per-GEM-object locks, one LRU lock for all these lists, and a
low-level lock for actually submitting batches to the hw (i.e. tail/ring
updates and execlist submission). But before we have that, adding
fine-grained locks that must always nest within dev->struct_mutex will
only make the eventual conversion harder.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [RFC 1/4] drm/i915: Implement a framework for batch buffer pools
  2014-06-19 17:35     ` Volkin, Bradley D
  2014-06-19 19:07       ` Daniel Vetter
@ 2014-06-20 13:25       ` Tvrtko Ursulin
  2014-06-20 15:30         ` Volkin, Bradley D
  1 sibling, 1 reply; 16+ messages in thread
From: Tvrtko Ursulin @ 2014-06-20 13:25 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx


On 06/19/2014 06:35 PM, Volkin, Bradley D wrote:
> On Thu, Jun 19, 2014 at 02:48:29AM -0700, Tvrtko Ursulin wrote:
>>
>> Hi Brad,
>>
>> On 06/18/2014 05:36 PM, bradley.d.volkin@intel.com wrote:
>>>
>>> r.e. pool capacity
>>> My original testing showed something like thousands of buffers in
>>> the pool after a gem_exec_nop run. But when I reran with the max
>>> check disabled just now to get an actual number for the commit
>>> message, the number was more like 130.  I developed and tested the
>>> changes incrementally, and suspect that the original run was before
>>> I implemented the actual copy operation. So I'm inclined to remove
>>> or at least increase the cap in the final version. Thoughts?
>>
>> Some random thoughts:
>>
>> Is it strictly necessary to cap the pool size? I ask because it seems to
>> be introducing a limit where so far there wasn't an explicit one.
>
> No, I only added it because there were a huge number of buffers in the
> pool at one point. But that seems to have been an artifact of my
> development process, so unless someone says they really want to keep
> the cap, I'm going to drop it in the next rev.

Cap or no cap (I am for no cap), but the pool is still "grow only" at 
the moment, no? So one allocation storm and objects on the pool inactive 
list end up wasting memory forever.

Unless my novice eyes are missing something hidden? But it can't be,
since then there would have to be a mechanism letting the pool know that
some objects have expired.

Regards,

Tvrtko


* Re: [RFC 1/4] drm/i915: Implement a framework for batch buffer pools
  2014-06-20 13:25       ` Tvrtko Ursulin
@ 2014-06-20 15:30         ` Volkin, Bradley D
  2014-06-20 15:41           ` Tvrtko Ursulin
  0 siblings, 1 reply; 16+ messages in thread
From: Volkin, Bradley D @ 2014-06-20 15:30 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Fri, Jun 20, 2014 at 06:25:56AM -0700, Tvrtko Ursulin wrote:
> 
> On 06/19/2014 06:35 PM, Volkin, Bradley D wrote:
> > On Thu, Jun 19, 2014 at 02:48:29AM -0700, Tvrtko Ursulin wrote:
> >>
> >> Hi Brad,
> >>
> >> On 06/18/2014 05:36 PM, bradley.d.volkin@intel.com wrote:
> >>>
> >>> r.e. pool capacity
> >>> My original testing showed something like thousands of buffers in
> >>> the pool after a gem_exec_nop run. But when I reran with the max
> >>> check disabled just now to get an actual number for the commit
> >>> message, the number was more like 130.  I developed and tested the
> >>> changes incrementally, and suspect that the original run was before
> >>> I implemented the actual copy operation. So I'm inclined to remove
> >>> or at least increase the cap in the final version. Thoughts?
> >>
> >> Some random thoughts:
> >>
> >> Is it strictly necessary to cap the pool size? I ask because it seems to
> >> be introducing a limit where so far there wasn't an explicit one.
> >
> > No, I only added it because there were a huge number of buffers in the
> > pool at one point. But that seems to have been an artifact of my
> > development process, so unless someone says they really want to keep
> > the cap, I'm going to drop it in the next rev.
> 
> Cap or no cap (I am for no cap), but the pool is still "grow only" at 
> the moment, no? So one allocation storm and objects on the pool inactive 
> list end up wasting memory forever.

Oh, so what happens is that when you put() an object back in the pool, we
set obj->madv = I915_MADV_DONTNEED, which should tell the shrinker that it
can drop the backing storage for the object if we need space. When you get()
an object, we set obj->madv = I915_MADV_WILLNEED and get new backing pages.
So the number of objects grows (capped or not), but the memory used can be
controlled.

Brad

> 
> Unless my novice eyes are missing something hidden? But it can't be 
> since then there would have to be a mechanism letting the pool know that 
> some objects got expired.
> 
> Regards,
> 
> Tvrtko
> 
> 
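
The purgeable handling Brad describes above maps onto something like the
sketch below. obj->madv and the I915_MADV_* values are real driver
fields/flags, but the batch_pool_list link on the object, the list name and
the function bodies are simplified guesses rather than the RFC code:

/* Parking a buffer: it stays on the pool's list, but the shrinker is free to
 * reap its backing pages while it sits there. */
void i915_gem_batch_pool_put(struct i915_gem_batch_pool *pool,
                             struct drm_i915_gem_object *obj)
{
        list_move_tail(&obj->batch_pool_list, &pool->inactive_list);
        obj->madv = I915_MADV_DONTNEED;
}

/* Reusing a buffer: flip it back to WILLNEED; fresh backing pages are pinned
 * later, when the object is actually bound for the shadow copy. */
struct drm_i915_gem_object *
i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool, size_t size)
{
        struct drm_i915_gem_object *obj, *next;

        list_for_each_entry_safe(obj, next, &pool->inactive_list, batch_pool_list) {
                if (obj->base.size >= size) {
                        list_del(&obj->batch_pool_list);
                        obj->madv = I915_MADV_WILLNEED;
                        return obj;
                }
        }

        return NULL;    /* caller allocates a new object instead */
}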


* Re: [RFC 1/4] drm/i915: Implement a framework for batch buffer pools
  2014-06-20 15:30         ` Volkin, Bradley D
@ 2014-06-20 15:41           ` Tvrtko Ursulin
  2014-06-20 16:06             ` Volkin, Bradley D
  0 siblings, 1 reply; 16+ messages in thread
From: Tvrtko Ursulin @ 2014-06-20 15:41 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx


On 06/20/2014 04:30 PM, Volkin, Bradley D wrote:
> On Fri, Jun 20, 2014 at 06:25:56AM -0700, Tvrtko Ursulin wrote:
>>
>> On 06/19/2014 06:35 PM, Volkin, Bradley D wrote:
>>> On Thu, Jun 19, 2014 at 02:48:29AM -0700, Tvrtko Ursulin wrote:
>>>>
>>>> Hi Brad,
>>>>
>>>> On 06/18/2014 05:36 PM, bradley.d.volkin@intel.com wrote:
>>>>>
>>>>> r.e. pool capacity
>>>>> My original testing showed something like thousands of buffers in
>>>>> the pool after a gem_exec_nop run. But when I reran with the max
>>>>> check disabled just now to get an actual number for the commit
>>>>> message, the number was more like 130.  I developed and tested the
>>>>> changes incrementally, and suspect that the original run was before
>>>>> I implemented the actual copy operation. So I'm inclined to remove
>>>>> or at least increase the cap in the final version. Thoughts?
>>>>
>>>> Some random thoughts:
>>>>
>>>> Is it strictly necessary to cap the pool size? I ask because it seems to
>>>> be introducing a limit where so far there wasn't an explicit one.
>>>
>>> No, I only added it because there were a huge number of buffers in the
>>> pool at one point. But that seems to have been an artifact of my
>>> development process, so unless someone says they really want to keep
>>> the cap, I'm going to drop it in the next rev.
>>
>> Cap or no cap (I am for no cap), but the pool is still "grow only" at
>> the moment, no? So one allocation storm and objects on the pool inactive
>> list end up wasting memory forever.
>
> Oh, so what happens is that when you put() an object back in the pool, we
> set obj->madv = I915_MADV_DONTNEED, which should tell the shrinker that it
> can drop the backing storage for the object if we need space. When you get()
> an object, we set obj->madv = I915_MADV_WILLNEED and get new backing pages.
> So the number of objects grows (capped or not), but the memory used can be
> controlled.

Education time for me I see. :)

So the object is in pool->inactive_list _and_ in some other list, so
the shrinker can find it?

Thanks,

Tvrtko


* Re: [RFC 1/4] drm/i915: Implement a framework for batch buffer pools
  2014-06-20 15:41           ` Tvrtko Ursulin
@ 2014-06-20 16:06             ` Volkin, Bradley D
  0 siblings, 0 replies; 16+ messages in thread
From: Volkin, Bradley D @ 2014-06-20 16:06 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Fri, Jun 20, 2014 at 08:41:08AM -0700, Tvrtko Ursulin wrote:
> 
> On 06/20/2014 04:30 PM, Volkin, Bradley D wrote:
> > On Fri, Jun 20, 2014 at 06:25:56AM -0700, Tvrtko Ursulin wrote:
> >>
> >> On 06/19/2014 06:35 PM, Volkin, Bradley D wrote:
> >>> On Thu, Jun 19, 2014 at 02:48:29AM -0700, Tvrtko Ursulin wrote:
> >>>>
> >>>> Hi Brad,
> >>>>
> >>>> On 06/18/2014 05:36 PM, bradley.d.volkin@intel.com wrote:
> >> Cap or no cap (I am for no cap), but the pool is still "grow only" at
> >> the moment, no? So one allocation storm and objects on the pool inactive
> >> list end up wasting memory forever.
> >
> > Oh, so what happens is that when you put() an object back in the pool, we
> > set obj->madv = I915_MADV_DONTNEED, which should tell the shrinker that it
> > can drop the backing storage for the object if we need space. When you get()
> > an object, we set obj->madv = I915_MADV_WILLNEED and get new backing pages.
> > So the number of objects grows (capped or not), but the memory used can be
> > controlled.
> 
> Education time for me I see. :)
> 
> So the object is in pool->inactive_list _and_ in some other list, so
> the shrinker can find it?

Yes. In fact, they are on several other lists. Here's my understanding:

The dev_priv->mm struct has bound_list and unbound_list, which track which
objects are or are not bound in some gtt. The shrinker operates on these
lists to drop backing storage when we need physical space. The pool objects
are added to these lists when we explicitly call ggtt_pin() and  when the
object eventually gets an unbind().

The ring structs have an active_list, which tracks objects that are used by
work still in progress on that ring. The i915_address_space structs have an
inactive_list, which contains vmas that are bound in that address space but
are not used by work still in progress. The driver uses these to evict
objects in a given gtt/ppgtt when we need GPU virtual address space. We
explicitly put a pool object's vma on the active_list with a move_to_active()
call at the end of do_execbuffer(). Retiring requests moves it to the
appropriate i915_address_space inactive_list.

Brad

> 
> Thanks,
> 
> Tvrtko
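
Putting Brad's description together, the life of one shadow batch object
across those lists looks roughly like the annotated sequence below. The
batch-pool helpers are the hypothetical ones sketched earlier in the thread,
and the comments only gesture at the 2014-era i915 calls; treat this as a
diagram rather than real driver code:

static int shadow_batch_lifecycle(struct i915_gem_batch_pool *pool,
                                  struct intel_engine_cs *ring,
                                  u32 batch_len)
{
        struct drm_i915_gem_object *shadow;

        /* 1. Taken out of the pool: madv flips back to WILLNEED. */
        shadow = i915_gem_batch_pool_get(pool, batch_len);
        if (!shadow)
                return -ENOMEM;

        /* 2. Pinned into the ggtt for the copy/parse step, so the object now
         *    appears on dev_priv->mm.bound_list (shrinker-visible, but no
         *    longer purgeable). */

        /* 3. Marked active at the end of do_execbuffer(): the vma sits on the
         *    ring's active_list until the request it belongs to completes. */

        /* 4. Request retires: the vma drops onto the address space's
         *    inactive_list, and the object goes back into the pool as
         *    purgeable, where the shrinker can reclaim its pages if needed. */
        i915_gem_batch_pool_put(pool, shadow);

        return 0;
}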

