* Execbuf fixes and major tuning
@ 2016-08-22  8:03 Chris Wilson
  2016-08-22  8:03 ` [PATCH 01/17] drm/i915: Skip holding an object reference for execbuf preparation Chris Wilson
                   ` (18 more replies)
  0 siblings, 19 replies; 47+ messages in thread
From: Chris Wilson @ 2016-08-22  8:03 UTC (permalink / raw)
  To: intel-gfx

Not only do we fix up some warts in fulfilling the execbuf uABI, but we
also fix the major cause of ppgtt execbuf slowdown and make further
refinements to speed up relocation processing.
-Chris


* [PATCH 01/17] drm/i915: Skip holding an object reference for execbuf preparation
  2016-08-22  8:03 Execbuf fixes and major tuning Chris Wilson
@ 2016-08-22  8:03 ` Chris Wilson
  2016-08-22 11:21   ` Joonas Lahtinen
  2016-08-22  8:03 ` [PATCH 02/17] drm/i915: Defer active reference until required Chris Wilson
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 47+ messages in thread
From: Chris Wilson @ 2016-08-22  8:03 UTC (permalink / raw)
  To: intel-gfx

This is a golden oldie! We can shave a couple of locked instructions for
about 10% of the per-object overhead by not taking an extra kref whilst
reserving objects for an execbuf. Due to lock management this is safe,
as we cannot lose the original object reference without the lock.
Equally, because this relies on the heavy BKL^W struct_mutex, it is also
likely to be only a temporary optimisation until we have fine grained
locking. (That's what we said 5 years ago, so there's probably another
10 years before we get around to finer grained locking!)
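
As an aside (not part of the patch), the borrowed-reference idea in a
minimal userspace sketch with invented names: while the lock protecting
the owning handle table is held, the table's own reference keeps the
object alive, so a lookup can borrow that reference instead of taking
its own. Build with gcc -pthread.

/* A handle table owns one reference per slot; lookups under table_lock
 * may borrow that reference instead of bumping the refcount.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct object {
        int refcount;           /* only touched under table_lock here */
        int handle;
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct object *table[16]; /* each non-NULL slot owns a reference */

/* Borrowed pointer: valid only while table_lock is held. */
static struct object *lookup_borrowed(int handle)
{
        return table[handle & 15];
}

int main(void)
{
        struct object *obj = calloc(1, sizeof(*obj));

        obj->refcount = 1;      /* the table's reference */
        obj->handle = 3;
        table[3] = obj;

        pthread_mutex_lock(&table_lock);
        struct object *borrowed = lookup_borrowed(3);
        printf("handle %d, refcount still %d\n",
               borrowed->handle, borrowed->refcount);
        pthread_mutex_unlock(&table_lock);

        free(obj);
        return 0;
}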

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 601156c353cc..6baad503764d 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -164,7 +164,6 @@ eb_lookup_vmas(struct eb_vmas *eb,
 			goto err;
 		}
 
-		i915_gem_object_get(obj);
 		list_add_tail(&obj->obj_exec_link, &objects);
 	}
 	spin_unlock(&file->table_lock);
@@ -275,7 +274,6 @@ static void eb_destroy(struct eb_vmas *eb)
 				       exec_list);
 		list_del_init(&vma->exec_list);
 		i915_gem_execbuffer_unreserve_vma(vma);
-		i915_vma_put(vma);
 	}
 	kfree(eb);
 }
@@ -1017,7 +1015,6 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 		vma = list_first_entry(&eb->vmas, struct i915_vma, exec_list);
 		list_del_init(&vma->exec_list);
 		i915_gem_execbuffer_unreserve_vma(vma);
-		i915_vma_put(vma);
 	}
 
 	mutex_unlock(&dev->struct_mutex);
@@ -1424,7 +1421,6 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *engine,
 
 	vma->exec_entry = shadow_exec_entry;
 	vma->exec_entry->flags = __EXEC_OBJECT_HAS_PIN;
-	i915_gem_object_get(shadow_batch_obj);
 	list_add_tail(&vma->exec_list, &eb->vmas);
 
 out:
-- 
2.9.3


* [PATCH 02/17] drm/i915: Defer active reference until required
  2016-08-22  8:03 Execbuf fixes and major tuning Chris Wilson
  2016-08-22  8:03 ` [PATCH 01/17] drm/i915: Skip holding an object reference for execbuf preparation Chris Wilson
@ 2016-08-22  8:03 ` Chris Wilson
  2016-08-22  8:03 ` [PATCH 03/17] drm/i915: Allow the user to pass a context to any ring Chris Wilson
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 47+ messages in thread
From: Chris Wilson @ 2016-08-22  8:03 UTC (permalink / raw)
  To: intel-gfx

We only need the active reference to keep the object alive after the
handle has been deleted (so as to prevent a synchronous gem_close). Why
pay the price of a kref on every execbuf when we can insert that
final active ref just in time for the handle deletion?
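
As an aside (not part of the patch), the deferred-reference idea in a
minimal userspace sketch with invented names: submission only marks the
object active, and the extra reference is taken just in time when the
handle is closed while the object is still busy, then dropped on retire.

#include <stdbool.h>
#include <stdio.h>

struct obj {
        int refcount;
        bool active;            /* GPU still using it */
        bool active_ref;        /* did we take the deferred reference? */
};

static void get(struct obj *o) { o->refcount++; }
static void put(struct obj *o) { if (--o->refcount == 0) printf("freed\n"); }

static void submit(struct obj *o)
{
        o->active = true;       /* no refcount bump on the hot path */
}

static void close_handle(struct obj *o)
{
        if (o->active && !o->active_ref) {
                o->active_ref = true;   /* one reference, taken just in time */
                get(o);
        }
        put(o);                 /* drop the handle's reference */
}

static void retire(struct obj *o)
{
        o->active = false;
        if (o->active_ref) {
                o->active_ref = false;
                put(o);         /* last reference dropped if the handle is gone */
        }
}

int main(void)
{
        struct obj o = { .refcount = 1 };       /* the handle's reference */

        submit(&o);
        close_handle(&o);       /* object stays alive via active_ref */
        retire(&o);             /* prints "freed" */
        return 0;
}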

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h              | 27 +++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem.c              | 23 ++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_batch_pool.c   |  2 +-
 drivers/gpu/drm/i915/i915_gem_context.c      |  2 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c   |  5 +----
 drivers/gpu/drm/i915/i915_gem_render_state.c |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c      |  2 +-
 7 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 24672d6e36c1..a85316be9116 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2175,6 +2175,13 @@ struct drm_i915_gem_object {
 	((READ_ONCE((bo)->flags) >> I915_BO_ACTIVE_SHIFT) & I915_BO_ACTIVE_MASK)
 
 	/**
+	 * Have we taken a reference for the object for incomplete GPU
+	 * activity?
+	 */
+#define I915_BO_ACTIVE_REF_SHIFT (I915_BO_ACTIVE_SHIFT + I915_NUM_ENGINES)
+#define I915_BO_ACTIVE_REF_BIT BIT(I915_BO_ACTIVE_REF_SHIFT)
+
+	/**
 	 * This is set if the object has been written to since last bound
 	 * to the GTT
 	 */
@@ -2344,6 +2351,26 @@ i915_gem_object_has_active_engine(const struct drm_i915_gem_object *obj,
 	return obj->flags & BIT(engine + I915_BO_ACTIVE_SHIFT);
 }
 
+static inline bool
+i915_gem_object_has_active_reference(const struct drm_i915_gem_object *obj)
+{
+	return obj->flags & I915_BO_ACTIVE_REF_BIT;
+}
+
+static inline void
+i915_gem_object_set_active_reference(struct drm_i915_gem_object *obj)
+{
+	obj->flags |= I915_BO_ACTIVE_REF_BIT;
+}
+
+static inline void
+i915_gem_object_clear_active_reference(struct drm_i915_gem_object *obj)
+{
+	obj->flags &= ~I915_BO_ACTIVE_REF_BIT;
+}
+
+void __i915_gem_object_release_unless_active(struct drm_i915_gem_object *obj);
+
 static inline unsigned int
 i915_gem_object_get_tiling(struct drm_i915_gem_object *obj)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 04607d4115d6..0a48c90979a9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2444,7 +2444,10 @@ i915_gem_object_retire__read(struct i915_gem_active *active,
 		list_move_tail(&obj->global_list,
 			       &request->i915->mm.bound_list);
 
-	i915_gem_object_put(obj);
+	if (i915_gem_object_has_active_reference(obj)) {
+		i915_gem_object_clear_active_reference(obj);
+		i915_gem_object_put(obj);
+	}
 }
 
 static bool i915_context_is_banned(const struct i915_gem_context *ctx)
@@ -2675,6 +2678,12 @@ void i915_gem_close_object(struct drm_gem_object *gem, struct drm_file *file)
 	list_for_each_entry_safe(vma, vn, &obj->vma_list, obj_link)
 		if (vma->vm->file == fpriv)
 			i915_vma_close(vma);
+
+	if (i915_gem_object_is_active(obj) &&
+	    !i915_gem_object_has_active_reference(obj)) {
+		i915_gem_object_set_active_reference(obj);
+		i915_gem_object_get(obj);
+	}
 	mutex_unlock(&obj->base.dev->struct_mutex);
 }
 
@@ -4223,6 +4232,18 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
 	intel_runtime_pm_put(dev_priv);
 }
 
+void __i915_gem_object_release_unless_active(struct drm_i915_gem_object *obj)
+{
+	if (!obj)
+		return;
+
+	GEM_BUG_ON(i915_gem_object_has_active_reference(obj));
+	if (i915_gem_object_is_active(obj))
+		i915_gem_object_set_active_reference(obj);
+	else
+		i915_gem_object_put(obj);
+}
+
 int i915_gem_suspend(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index ed989596d9a3..cb25cad3318c 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -73,7 +73,7 @@ void i915_gem_batch_pool_fini(struct i915_gem_batch_pool *pool)
 		list_for_each_entry_safe(obj, next,
 					 &pool->cache_list[n],
 					 batch_pool_link)
-			i915_gem_object_put(obj);
+			__i915_gem_object_release_unless_active(obj);
 
 		INIT_LIST_HEAD(&pool->cache_list[n]);
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 35950ee46a1d..ab54f208a4a9 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -155,7 +155,7 @@ void i915_gem_context_free(struct kref *ctx_ref)
 		if (ce->ring)
 			intel_ring_free(ce->ring);
 
-		i915_vma_put(ce->state);
+		__i915_gem_object_release_unless_active(ce->state->obj);
 	}
 
 	put_pid(ctx->pid);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 6baad503764d..8f9d5ad0cfd8 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1280,15 +1280,12 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 
 	obj->dirty = 1; /* be paranoid  */
 
-	/* Add a reference if we're newly entering the active list.
-	 * The order in which we add operations to the retirement queue is
+	/* The order in which we add operations to the retirement queue is
 	 * vital here: mark_active adds to the start of the callback list,
 	 * such that subsequent callbacks are called first. Therefore we
 	 * add the active reference first and queue for it to be dropped
 	 * *last*.
 	 */
-	if (!i915_gem_object_is_active(obj))
-		i915_gem_object_get(obj);
 	i915_gem_object_set_active(obj, idx);
 	i915_gem_active_set(&obj->last_read[idx], req);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index 95b7e9afd5f8..055139343137 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -225,6 +225,6 @@ int i915_gem_render_state_init(struct drm_i915_gem_request *req)
 err_unpin:
 	i915_vma_unpin(so.vma);
 err_obj:
-	i915_gem_object_put(obj);
+	__i915_gem_object_release_unless_active(obj);
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index cc5bcd14b6df..63ed3fb9c49e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2014,7 +2014,7 @@ intel_engine_create_ring(struct intel_engine_cs *engine, int size)
 void
 intel_ring_free(struct intel_ring *ring)
 {
-	i915_vma_put(ring->vma);
+	__i915_gem_object_release_unless_active(ring->vma->obj);
 	list_del(&ring->link);
 	kfree(ring);
 }
-- 
2.9.3


* [PATCH 03/17] drm/i915: Allow the user to pass a context to any ring
  2016-08-22  8:03 Execbuf fixes and major tuning Chris Wilson
  2016-08-22  8:03 ` [PATCH 01/17] drm/i915: Skip holding an object reference for execbuf preparation Chris Wilson
  2016-08-22  8:03 ` [PATCH 02/17] drm/i915: Defer active reference until required Chris Wilson
@ 2016-08-22  8:03 ` Chris Wilson
  2016-08-22 11:23   ` Joonas Lahtinen
  2016-08-22  8:03 ` [PATCH 04/17] drm/i915: Fix i915_gem_evict_for_vma (soft-pinning) Chris Wilson
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 47+ messages in thread
From: Chris Wilson @ 2016-08-22  8:03 UTC (permalink / raw)
  To: intel-gfx

With full-ppgtt, we want the user to have full control over their memory
layout, with a separate instance per context. Forcing them to use a
shared memory layout for !RCS not only duplicates the amount of work we
have to do, but also defeats the memory segregation on offer.
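
To sketch what this permits from userspace (illustration only, not part
of the patch; error handling omitted and the batch setup assumed to
exist; structures and flags are the i915 uAPI from i915_drm.h): submit a
batch to the blitter with a user-created context, which then selects
that context's ppgtt instead of being rejected with -EINVAL.

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <i915_drm.h>   /* or <libdrm/i915_drm.h>, depending on install */

int exec_on_blt(int fd, uint32_t batch_handle, uint32_t batch_len)
{
        struct drm_i915_gem_context_create create;
        struct drm_i915_gem_exec_object2 obj;
        struct drm_i915_gem_execbuffer2 execbuf;

        memset(&create, 0, sizeof(create));
        if (ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE, &create))
                return -1;      /* context gets its own ppgtt */

        memset(&obj, 0, sizeof(obj));
        obj.handle = batch_handle;

        memset(&execbuf, 0, sizeof(execbuf));
        execbuf.buffers_ptr = (uintptr_t)&obj;
        execbuf.buffer_count = 1;
        execbuf.batch_len = batch_len;
        execbuf.flags = I915_EXEC_BLT;  /* a !RCS engine */
        i915_execbuffer2_set_context_id(execbuf, create.ctx_id);

        /* Before this patch, a non-default context on !RCS was -EINVAL. */
        return ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf);
}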

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 8f9d5ad0cfd8..fb1a64738fb8 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1250,12 +1250,9 @@ static struct i915_gem_context *
 i915_gem_validate_context(struct drm_device *dev, struct drm_file *file,
 			  struct intel_engine_cs *engine, const u32 ctx_id)
 {
-	struct i915_gem_context *ctx = NULL;
+	struct i915_gem_context *ctx;
 	struct i915_ctx_hang_stats *hs;
 
-	if (engine->id != RCS && ctx_id != DEFAULT_CONTEXT_HANDLE)
-		return ERR_PTR(-EINVAL);
-
 	ctx = i915_gem_context_lookup(file->driver_priv, ctx_id);
 	if (IS_ERR(ctx))
 		return ctx;
-- 
2.9.3


* [PATCH 04/17] drm/i915: Fix i915_gem_evict_for_vma (soft-pinning)
  2016-08-22  8:03 Execbuf fixes and major tuning Chris Wilson
                   ` (2 preceding siblings ...)
  2016-08-22  8:03 ` [PATCH 03/17] drm/i915: Allow the user to pass a context to any ring Chris Wilson
@ 2016-08-22  8:03 ` Chris Wilson
  2016-08-22  8:03 ` [PATCH 05/17] drm/i915: Pin the pages whilst operating on them Chris Wilson
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 47+ messages in thread
From: Chris Wilson @ 2016-08-22  8:03 UTC (permalink / raw)
  To: intel-gfx

Soft-pinning depends upon being able to check for the availability of an
interval and to evict overlapping objects from a drm_mm range manager
very quickly. Currently it uses a linear list, which makes performance
dire and soft-pinning not a suitable replacement.

It also helps if the routine reports the correct error codes as expected
by its callers and emits a tracepoint upon use.

For posterity, since the wrong patch was pushed (i.e. one that missed
these key points), this is the changelog that should have been on commit
506a8e87d8d2 ("drm/i915: Add soft-pinning API for execbuffer"):

Userspace can pass in an offset that it presumes the object is located
at. The kernel will then do its utmost to fit the object into that
location. The assumption is that userspace is handling its own object
locations (for example along with full-ppgtt) and that the kernel will
rarely have to make space for the user's requests.

This extends the DRM_IOCTL_I915_GEM_EXECBUFFER2 to do the following:
* if the user supplies a virtual address via the execobject->offset
  *and* sets the EXEC_OBJECT_PINNED flag in execobject->flags, then
  that object is placed at that offset in the address space selected
  by the context specifier in execbuffer.
* the location must be aligned to the GTT page size, 4096 bytes
* as the object is placed exactly as specified, it may be used by this
  execbuffer call without relocations pointing to it

It may fail to do so if:
* EINVAL is returned if the object does not have a 4096 byte aligned
  address
* the object conflicts with another pinned object (either pinned by
  hardware in that address space, e.g. scanouts in the aliasing ppgtt)
  or within the same batch.
  EBUSY is returned if the location is pinned by hardware
  EINVAL is returned if the location is already in use by the batch
* EINVAL is returned if the object conflicts with its own alignment (as meets
  the hardware requirements) or if the placement of the object does not fit
  within the address space

All other execbuffer errors apply.

Presence of this execbuf extension may be queried by passing
I915_PARAM_HAS_EXEC_SOFTPIN to DRM_IOCTL_I915_GETPARAM and checking for
a reported value of 1 (or greater).
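
For illustration only (not part of the patch; error handling omitted;
structures and flags are the i915 uAPI from i915_drm.h), userspace would
query for the extension and then place an object at a fixed, page-aligned
offset like so:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <i915_drm.h>   /* or <libdrm/i915_drm.h>, depending on install */

int supports_softpin(int fd)
{
        struct drm_i915_getparam gp;
        int value = 0;

        memset(&gp, 0, sizeof(gp));
        gp.param = I915_PARAM_HAS_EXEC_SOFTPIN;
        gp.value = &value;
        if (ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
                return 0;
        return value >= 1;
}

void pin_at(struct drm_i915_gem_exec_object2 *obj,
            uint32_t handle, uint64_t gtt_offset)
{
        memset(obj, 0, sizeof(*obj));
        obj->handle = handle;
        obj->offset = gtt_offset;               /* must be 4096-byte aligned */
        obj->flags = EXEC_OBJECT_PINNED;        /* no relocations needed for it */
}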

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/drm_mm.c                   |  9 +++
 drivers/gpu/drm/i915/i915_drv.h            |  3 +-
 drivers/gpu/drm/i915/i915_gem.c            | 12 +---
 drivers/gpu/drm/i915/i915_gem_evict.c      | 88 +++++++++++++++++++++---------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 10 ++--
 drivers/gpu/drm/i915/i915_trace.h          | 23 ++++++++
 7 files changed, 105 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index 11d44a1e0ab3..a13215a525e6 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -275,6 +275,15 @@ int drm_mm_reserve_node(struct drm_mm *mm, struct drm_mm_node *node)
 	if (hole_start > node->start || hole_end < end)
 		return -ENOSPC;
 
+	if (mm->color_adjust) {
+		u64 adj_start = hole_start, adj_end = hole_end;
+
+		mm->color_adjust(hole, node->color, &adj_start, &adj_end);
+		if (adj_start > node->start ||
+		    adj_end < node->start + node->size)
+			return -ENOSPC;
+	}
+
 	node->mm = mm;
 	node->allocated = 1;
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a85316be9116..30ae14870c47 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3470,7 +3470,8 @@ int __must_check i915_gem_evict_something(struct i915_address_space *vm,
 					  unsigned cache_level,
 					  u64 start, u64 end,
 					  unsigned flags);
-int __must_check i915_gem_evict_for_vma(struct i915_vma *target);
+int __must_check i915_gem_evict_for_vma(struct i915_vma *vma,
+					unsigned int flags);
 int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle);
 
 /* belongs in i915_gem_gtt.h */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0a48c90979a9..de895c7ef9e6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -68,8 +68,8 @@ insert_mappable_node(struct drm_i915_private *i915,
 {
 	memset(node, 0, sizeof(*node));
 	return drm_mm_insert_node_in_range_generic(&i915->ggtt.base.mm, node,
-						   size, 0, 0, 0,
-						   i915->ggtt.mappable_end,
+						   size, 0, -1,
+						   0, i915->ggtt.mappable_end,
 						   DRM_MM_SEARCH_DEFAULT,
 						   DRM_MM_CREATE_DEFAULT);
 }
@@ -2966,12 +2966,6 @@ static bool i915_gem_valid_gtt_space(struct i915_vma *vma,
 	if (vma->vm->mm.color_adjust == NULL)
 		return true;
 
-	if (!drm_mm_node_allocated(gtt_space))
-		return true;
-
-	if (list_empty(&gtt_space->node_list))
-		return true;
-
 	other = list_entry(gtt_space->node_list.prev, struct drm_mm_node, node_list);
 	if (other->allocated && !other->hole_follows && other->color != cache_level)
 		return false;
@@ -3056,7 +3050,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 		vma->node.color = obj->cache_level;
 		ret = drm_mm_reserve_node(&vma->vm->mm, &vma->node);
 		if (ret) {
-			ret = i915_gem_evict_for_vma(vma);
+			ret = i915_gem_evict_for_vma(vma, flags);
 			if (ret == 0)
 				ret = drm_mm_reserve_node(&vma->vm->mm, &vma->node);
 			if (ret)
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 815d5fbe07ac..a8f6825cf14f 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -205,43 +205,81 @@ found:
 	return ret;
 }
 
-int
-i915_gem_evict_for_vma(struct i915_vma *target)
+int i915_gem_evict_for_vma(struct i915_vma *target, unsigned int flags)
 {
-	struct drm_mm_node *node, *next;
+	struct list_head eviction_list;
+	struct drm_mm_node *node;
+	u64 start = target->node.start;
+	u64 end = start + target->node.size - 1;
+	struct i915_vma *vma, *next;
+	bool check_snoop;
+	int ret = 0;
 
-	list_for_each_entry_safe(node, next,
-			&target->vm->mm.head_node.node_list,
-			node_list) {
-		struct i915_vma *vma;
-		int ret;
+	trace_i915_gem_evict_vma(target, flags);
 
-		if (node->start + node->size <= target->node.start)
-			continue;
-		if (node->start >= target->node.start + target->node.size)
+	check_snoop = target->vm->mm.color_adjust;
+	if (check_snoop) {
+		if (start > target->vm->start)
+			start -= 4096;
+		if (end < target->vm->start + target->vm->total)
+			end += 4096;
+	}
+
+	node = drm_mm_interval_first(&target->vm->mm, start, end);
+	if (!node)
+		return 0;
+
+	INIT_LIST_HEAD(&eviction_list);
+	vma = container_of(node, typeof(*vma), node);
+	list_for_each_entry_from(vma,
+				 &target->vm->mm.head_node.node_list,
+				 node.node_list) {
+		if (vma->node.start > end)
 			break;
 
-		vma = container_of(node, typeof(*vma), node);
+		if (check_snoop) {
+			if (vma->node.start + vma->node.size == target->node.start) {
+				if (vma->node.color == target->node.color)
+					continue;
+			}
+			if (vma->node.start == target->node.start + target->node.size) {
+				if (vma->node.color == target->node.color)
+					continue;
+			}
+		}
 
-		if (i915_vma_is_pinned(vma)) {
-			if (!vma->exec_entry || i915_vma_pin_count(vma) > 1)
-				/* Object is pinned for some other use */
-				return -EBUSY;
+		if (vma->node.color == -1) {
+			ret = -ENOSPC;
+			break;
+		}
 
-			/* We need to evict a buffer in the same batch */
-			if (vma->exec_entry->flags & EXEC_OBJECT_PINNED)
-				/* Overlapping fixed objects in the same batch */
-				return -EINVAL;
+		if (flags & PIN_NONBLOCK &&
+		    (i915_vma_is_pinned(vma) || i915_vma_is_active(vma))) {
+			ret = -ENOSPC;
+			break;
+		}
 
-			return -ENOSPC;
+		/* Overlap of objects in the same batch? */
+		if (i915_vma_is_pinned(vma)) {
+			ret = -ENOSPC;
+			if (vma->exec_entry &&
+			    vma->exec_entry->flags & EXEC_OBJECT_PINNED)
+				ret = -EINVAL;
+			break;
 		}
 
-		ret = i915_vma_unbind(vma);
-		if (ret)
-			return ret;
+		__i915_vma_pin(vma);
+		list_add(&vma->exec_list, &eviction_list);
 	}
 
-	return 0;
+	list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
+		list_del_init(&vma->exec_list);
+		__i915_vma_unpin(vma);
+		if (ret == 0)
+			ret = i915_vma_unbind(vma);
+	}
+
+	return ret;
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index fb1a64738fb8..3a5f43960cb6 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -445,7 +445,7 @@ static void *reloc_iomap(struct drm_i915_gem_object *obj,
 			memset(&cache->node, 0, sizeof(cache->node));
 			ret = drm_mm_insert_node_in_range_generic
 				(&ggtt->base.mm, &cache->node,
-				 4096, 0, 0,
+				 4096, 0, -1,
 				 0, ggtt->mappable_end,
 				 DRM_MM_SEARCH_DEFAULT,
 				 DRM_MM_CREATE_DEFAULT);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 435c3fc73e18..0061c3841760 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1996,16 +1996,14 @@ static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
 		return ret;
 
 alloc:
-	ret = drm_mm_insert_node_in_range_generic(&ggtt->base.mm,
-						  &ppgtt->node, GEN6_PD_SIZE,
-						  GEN6_PD_ALIGN, 0,
-						  0, ggtt->base.total,
+	ret = drm_mm_insert_node_in_range_generic(&ggtt->base.mm, &ppgtt->node,
+						  GEN6_PD_SIZE, GEN6_PD_ALIGN,
+						  -1, 0, ggtt->base.total,
 						  DRM_MM_TOPDOWN);
 	if (ret == -ENOSPC && !retried) {
 		ret = i915_gem_evict_something(&ggtt->base,
 					       GEN6_PD_SIZE, GEN6_PD_ALIGN,
-					       I915_CACHE_NONE,
-					       0, ggtt->base.total,
+					       -1, 0, ggtt->base.total,
 					       0);
 		if (ret)
 			goto err_out;
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 178798002a73..7a46e7c8a0cd 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -450,6 +450,29 @@ TRACE_EVENT(i915_gem_evict_vm,
 	    TP_printk("dev=%d, vm=%p", __entry->dev, __entry->vm)
 );
 
+TRACE_EVENT(i915_gem_evict_vma,
+	    TP_PROTO(struct i915_vma *vma, unsigned int flags),
+	    TP_ARGS(vma, flags),
+
+	    TP_STRUCT__entry(
+			     __field(u32, dev)
+			     __field(struct i915_address_space *, vm)
+			     __field(u64, start)
+			     __field(u64, size)
+			     __field(unsigned int, flags)
+			    ),
+
+	    TP_fast_assign(
+			   __entry->dev = vma->vm->dev->primary->index;
+			   __entry->vm = vma->vm;
+			   __entry->start = vma->node.start;
+			   __entry->size = vma->node.size;
+			   __entry->flags = flags;
+			  ),
+
+	    TP_printk("dev=%d, vm=%p, start=%llx size=%llx, flags=%x", __entry->dev, __entry->vm, (long long)__entry->start, (long long)__entry->size, __entry->flags)
+);
+
 TRACE_EVENT(i915_gem_ring_sync_to,
 	    TP_PROTO(struct drm_i915_gem_request *to,
 		     struct drm_i915_gem_request *from),
-- 
2.9.3


* [PATCH 05/17] drm/i915: Pin the pages whilst operating on them
  2016-08-22  8:03 Execbuf fixes and major tuning Chris Wilson
                   ` (3 preceding siblings ...)
  2016-08-22  8:03 ` [PATCH 04/17] drm/i915: Fix i915_gem_evict_for_vma (soft-pinning) Chris Wilson
@ 2016-08-22  8:03 ` Chris Wilson
  2016-08-23 11:54   ` Joonas Lahtinen
  2016-08-22  8:03 ` [PATCH 06/17] drm/i915: Move obj->dirty:1 to obj->flags Chris Wilson
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 47+ messages in thread
From: Chris Wilson @ 2016-08-22  8:03 UTC (permalink / raw)
  To: intel-gfx

As a safety precaution, whilst we operate on the object's pages
(clflushing them, updating the LRU), make sure we hold a pin on those
pages to prevent them from disappearing underneath us.
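
As an aside (not part of the patch), the pin-while-operating idea in a
minimal userspace sketch with invented names: while the pin count is
raised, the "shrinker" may not drop the backing pages.

#include <assert.h>
#include <stdio.h>

struct buffer {
        int pages_pin_count;
        int has_pages;
};

static void pin_pages(struct buffer *b)   { b->pages_pin_count++; }
static void unpin_pages(struct buffer *b)
{
        assert(b->pages_pin_count > 0);
        b->pages_pin_count--;
}

/* A "shrinker" may only drop the pages when nobody holds a pin. */
static void try_release_pages(struct buffer *b)
{
        if (b->pages_pin_count == 0)
                b->has_pages = 0;
}

static void flush_and_bump_lru(struct buffer *b)
{
        pin_pages(b);
        assert(b->has_pages);   /* safe to touch the pages in here */
        /* ... clflush / LRU bookkeeping would go here ... */
        unpin_pages(b);
}

int main(void)
{
        struct buffer b = { .has_pages = 1 };

        flush_and_bump_lru(&b);
        try_release_pages(&b);
        printf("pages still present? %d\n", b.has_pages); /* 0 after unpin */
        return 0;
}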

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index de895c7ef9e6..e601b74c19f9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3251,6 +3251,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 	if (ret)
 		return ret;
 
+	i915_gem_object_pin_pages(obj);
 	i915_gem_object_flush_cpu_write_domain(obj);
 
 	/* Serialise direct access to this object with the barriers for
@@ -3280,6 +3281,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 
 	/* And bump the LRU for this access */
 	i915_gem_object_bump_inactive_ggtt(obj);
+	i915_gem_object_unpin_pages(obj);
 
 	return 0;
 }
-- 
2.9.3


* [PATCH 06/17] drm/i915: Move obj->dirty:1 to obj->flags
  2016-08-22  8:03 Execbuf fixes and major tuning Chris Wilson
                   ` (4 preceding siblings ...)
  2016-08-22  8:03 ` [PATCH 05/17] drm/i915: Pin the pages whilst operating on them Chris Wilson
@ 2016-08-22  8:03 ` Chris Wilson
  2016-08-23 12:01   ` Joonas Lahtinen
  2016-09-06 11:37   ` Dave Gordon
  2016-08-22  8:03 ` [PATCH 07/17] drm/i915: Use the precomputed value for whether to enable command parsing Chris Wilson
                   ` (12 subsequent siblings)
  18 siblings, 2 replies; 47+ messages in thread
From: Chris Wilson @ 2016-08-22  8:03 UTC (permalink / raw)
  To: intel-gfx

The obj->dirty bit is a companion to the obj->active bits that were
moved to the obj->flags bitmask. Since we also update this bit inside
the i915_vma_move_to_active() hotpath, we can aid gcc by also moving
the obj->dirty bit into the obj->flags bitmask.
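
As an aside (not part of the patch), the flag-packing idea in a minimal
userspace sketch; the shift values below are invented for the example
and do not match the driver's layout.

#include <stdio.h>

#define BIT(n)                  (1u << (n))

#define ACTIVE_SHIFT            0
#define NUM_ENGINES             4
#define ACTIVE_REF_SHIFT        (ACTIVE_SHIFT + NUM_ENGINES)
#define DIRTY_SHIFT             (ACTIVE_REF_SHIFT + 1)
#define DIRTY_BIT               BIT(DIRTY_SHIFT)

struct obj { unsigned int flags; };

static void set_dirty(struct obj *o)      { o->flags |= DIRTY_BIT; }
static void clear_dirty(struct obj *o)    { o->flags &= ~DIRTY_BIT; }
static int  is_dirty(const struct obj *o) { return !!(o->flags & DIRTY_BIT); }

int main(void)
{
        struct obj o = { 0 };

        o.flags |= BIT(ACTIVE_SHIFT + 2);       /* mark active on engine 2 */
        set_dirty(&o);                          /* same word, adjacent bits */
        printf("flags=%#x dirty=%d\n", o.flags, is_dirty(&o));
        clear_dirty(&o);
        return 0;
}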

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |  2 +-
 drivers/gpu/drm/i915/i915_drv.h            | 22 +++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem.c            | 20 ++++++++++----------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +--
 drivers/gpu/drm/i915/i915_gem_userptr.c    |  6 +++---
 drivers/gpu/drm/i915/i915_gpu_error.c      |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c           |  6 +++---
 7 files changed, 40 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 75ad7b9c243c..086053fa2820 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -156,7 +156,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		   i915_gem_active_get_seqno(&obj->last_write,
 					     &obj->base.dev->struct_mutex),
 		   i915_cache_level_str(to_i915(obj->base.dev), obj->cache_level),
-		   obj->dirty ? " dirty" : "",
+		   i915_gem_object_is_dirty(obj) ? " dirty" : "",
 		   obj->madv == I915_MADV_DONTNEED ? " purgeable" : "");
 	if (obj->base.name)
 		seq_printf(m, " (name: %d)", obj->base.name);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 30ae14870c47..5613f2886ad3 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2185,7 +2185,8 @@ struct drm_i915_gem_object {
 	 * This is set if the object has been written to since last bound
 	 * to the GTT
 	 */
-	unsigned int dirty:1;
+#define I915_BO_DIRTY_SHIFT (I915_BO_ACTIVE_REF_SHIFT + 1)
+#define I915_BO_DIRTY_BIT BIT(I915_BO_DIRTY_SHIFT)
 
 	/**
 	 * Advice: are the backing pages purgeable?
@@ -2371,6 +2372,25 @@ i915_gem_object_clear_active_reference(struct drm_i915_gem_object *obj)
 
 void __i915_gem_object_release_unless_active(struct drm_i915_gem_object *obj);
 
+static inline bool
+i915_gem_object_is_dirty(const struct drm_i915_gem_object *obj)
+{
+	return obj->flags & I915_BO_DIRTY_BIT;
+}
+
+static inline void
+i915_gem_object_set_dirty(struct drm_i915_gem_object *obj)
+{
+	GEM_BUG_ON(obj->pages_pin_count == 0);
+	obj->flags |= I915_BO_DIRTY_BIT;
+}
+
+static inline void
+i915_gem_object_clear_dirty(struct drm_i915_gem_object *obj)
+{
+	obj->flags &= ~I915_BO_DIRTY_BIT;
+}
+
 static inline unsigned int
 i915_gem_object_get_tiling(struct drm_i915_gem_object *obj)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e601b74c19f9..7b8abda541e6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -234,9 +234,9 @@ i915_gem_object_put_pages_phys(struct drm_i915_gem_object *obj)
 	}
 
 	if (obj->madv == I915_MADV_DONTNEED)
-		obj->dirty = 0;
+		i915_gem_object_clear_dirty(obj);
 
-	if (obj->dirty) {
+	if (i915_gem_object_is_dirty(obj)) {
 		struct address_space *mapping = obj->base.filp->f_mapping;
 		char *vaddr = obj->phys_handle->vaddr;
 		int i;
@@ -260,7 +260,7 @@ i915_gem_object_put_pages_phys(struct drm_i915_gem_object *obj)
 			put_page(page);
 			vaddr += PAGE_SIZE;
 		}
-		obj->dirty = 0;
+		i915_gem_object_clear_dirty(obj);
 	}
 
 	sg_free_table(obj->pages);
@@ -703,7 +703,7 @@ int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
 		obj->cache_dirty = true;
 
 	intel_fb_obj_invalidate(obj, ORIGIN_CPU);
-	obj->dirty = 1;
+	i915_gem_object_set_dirty(obj);
 	/* return with the pages pinned */
 	return 0;
 
@@ -1156,7 +1156,7 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,
 		goto out_unpin;
 
 	intel_fb_obj_invalidate(obj, ORIGIN_CPU);
-	obj->dirty = true;
+	i915_gem_object_set_dirty(obj);
 
 	user_data = u64_to_user_ptr(args->data_ptr);
 	offset = args->offset;
@@ -2099,10 +2099,10 @@ i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj)
 		i915_gem_object_save_bit_17_swizzle(obj);
 
 	if (obj->madv == I915_MADV_DONTNEED)
-		obj->dirty = 0;
+		i915_gem_object_clear_dirty(obj);
 
 	for_each_sgt_page(page, sgt_iter, obj->pages) {
-		if (obj->dirty)
+		if (i915_gem_object_is_dirty(obj))
 			set_page_dirty(page);
 
 		if (obj->madv == I915_MADV_WILLNEED)
@@ -2110,7 +2110,7 @@ i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj)
 
 		put_page(page);
 	}
-	obj->dirty = 0;
+	i915_gem_object_clear_dirty(obj);
 
 	sg_free_table(obj->pages);
 	kfree(obj->pages);
@@ -3272,7 +3272,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 	if (write) {
 		obj->base.read_domains = I915_GEM_DOMAIN_GTT;
 		obj->base.write_domain = I915_GEM_DOMAIN_GTT;
-		obj->dirty = 1;
+		i915_gem_object_set_dirty(obj);
 	}
 
 	trace_i915_gem_object_change_domain(obj,
@@ -4751,7 +4751,7 @@ i915_gem_object_create_from_data(struct drm_device *dev,
 	i915_gem_object_pin_pages(obj);
 	sg = obj->pages;
 	bytes = sg_copy_from_buffer(sg->sgl, sg->nents, (void *)data, size);
-	obj->dirty = 1;		/* Backing store is now out of date */
+	i915_gem_object_set_dirty(obj); /* Backing store is now out of date */
 	i915_gem_object_unpin_pages(obj);
 
 	if (WARN_ON(bytes != size)) {
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 3a5f43960cb6..125fb38eff40 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1275,14 +1275,13 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 
 	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
 
-	obj->dirty = 1; /* be paranoid  */
-
 	/* The order in which we add operations to the retirement queue is
 	 * vital here: mark_active adds to the start of the callback list,
 	 * such that subsequent callbacks are called first. Therefore we
 	 * add the active reference first and queue for it to be dropped
 	 * *last*.
 	 */
+	i915_gem_object_set_dirty(obj); /* be paranoid */
 	i915_gem_object_set_active(obj, idx);
 	i915_gem_active_set(&obj->last_read[idx], req);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index be54825ef3e8..581df2316ca5 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -679,18 +679,18 @@ i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj)
 	__i915_gem_userptr_set_active(obj, false);
 
 	if (obj->madv != I915_MADV_WILLNEED)
-		obj->dirty = 0;
+		i915_gem_object_clear_dirty(obj);
 
 	i915_gem_gtt_finish_object(obj);
 
 	for_each_sgt_page(page, sgt_iter, obj->pages) {
-		if (obj->dirty)
+		if (i915_gem_object_is_dirty(obj))
 			set_page_dirty(page);
 
 		mark_page_accessed(page);
 		put_page(page);
 	}
-	obj->dirty = 0;
+	i915_gem_object_clear_dirty(obj);
 
 	sg_free_table(obj->pages);
 	kfree(obj->pages);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index dbf1dcbd4692..75136ae98d9c 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -843,7 +843,7 @@ static void capture_bo(struct drm_i915_error_buffer *err,
 	err->write_domain = obj->base.write_domain;
 	err->fence_reg = vma->fence ? vma->fence->id : -1;
 	err->tiling = i915_gem_object_get_tiling(obj);
-	err->dirty = obj->dirty;
+	err->dirty = i915_gem_object_is_dirty(obj);
 	err->purgeable = obj->madv != I915_MADV_WILLNEED;
 	err->userptr = obj->userptr.mm != NULL;
 	err->cache_level = obj->cache_level;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 6b49df4316f4..e3ad05da2f77 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -795,7 +795,7 @@ static int intel_lr_context_pin(struct i915_gem_context *ctx,
 	lrc_reg_state[CTX_RING_BUFFER_START+1] =
 		i915_ggtt_offset(ce->ring->vma);
 	ce->lrc_reg_state = lrc_reg_state;
-	ce->state->obj->dirty = true;
+	i915_gem_object_set_dirty(ce->state->obj);
 
 	/* Invalidate GuC TLB. */
 	if (i915.enable_guc_submission) {
@@ -1969,7 +1969,7 @@ populate_lr_context(struct i915_gem_context *ctx,
 		DRM_DEBUG_DRIVER("Could not map object pages! (%d)\n", ret);
 		return ret;
 	}
-	ctx_obj->dirty = true;
+	i915_gem_object_set_dirty(ctx_obj);
 
 	/* The second page of the context object contains some fields which must
 	 * be set up prior to the first execution. */
@@ -2197,7 +2197,7 @@ void intel_lr_context_reset(struct drm_i915_private *dev_priv,
 		reg_state[CTX_RING_HEAD+1] = 0;
 		reg_state[CTX_RING_TAIL+1] = 0;
 
-		ce->state->obj->dirty = true;
+		i915_gem_object_set_dirty(ce->state->obj);
 		i915_gem_object_unpin_map(ce->state->obj);
 
 		ce->ring->head = 0;
-- 
2.9.3


* [PATCH 07/17] drm/i915: Use the precomputed value for whether to enable command parsing
  2016-08-22  8:03 Execbuf fixes and major tuning Chris Wilson
                   ` (5 preceding siblings ...)
  2016-08-22  8:03 ` [PATCH 06/17] drm/i915: Move obj->dirty:1 to obj->flags Chris Wilson
@ 2016-08-22  8:03 ` Chris Wilson
  2016-08-24 14:33   ` John Harrison
  2016-08-22  8:03 ` [PATCH 08/17] drm/i915: Drop spinlocks around adding to the client request list Chris Wilson
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 47+ messages in thread
From: Chris Wilson @ 2016-08-22  8:03 UTC (permalink / raw)
  To: intel-gfx

As i915.enable_cmd_parser is an unsafe option, make it read-only at
runtime. Now that it is constant, we can use the value determined during
initialisation to decide whether we need the cmdparser at execbuffer time.
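
As an aside (not part of the patch), the precompute-at-init idea in a
minimal userspace sketch with invented names: the parameter is read once
while setting up the engine, and the execbuffer path then tests a single
stored boolean.

#include <stdbool.h>
#include <stdio.h>

static bool param_enable_cmd_parser = true;     /* read-only after load */

struct engine {
        bool platform_requires_parser;
        bool uses_ppgtt;
        bool needs_cmd_parser;  /* precomputed at init */
};

static void engine_init(struct engine *e)
{
        e->needs_cmd_parser = e->platform_requires_parser &&
                              e->uses_ppgtt &&
                              param_enable_cmd_parser;
}

static bool needs_cmd_parser(const struct engine *e)
{
        return e->needs_cmd_parser;     /* hot path: a single load */
}

int main(void)
{
        struct engine e = {
                .platform_requires_parser = true,
                .uses_ppgtt = true,
        };

        engine_init(&e);
        printf("cmd parser in use: %d\n", needs_cmd_parser(&e));
        return 0;
}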

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c  | 21 ---------------------
 drivers/gpu/drm/i915/i915_drv.h         |  1 -
 drivers/gpu/drm/i915/i915_params.c      |  6 +++---
 drivers/gpu/drm/i915/i915_params.h      |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h | 15 +++++++++++++++
 5 files changed, 19 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 3c72b3b103e7..f749c8b52351 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -1036,27 +1036,6 @@ unpin_src:
 	return dst;
 }
 
-/**
- * intel_engine_needs_cmd_parser() - should a given engine use software
- *                                   command parsing?
- * @engine: the engine in question
- *
- * Only certain platforms require software batch buffer command parsing, and
- * only when enabled via module parameter.
- *
- * Return: true if the engine requires software command parsing
- */
-bool intel_engine_needs_cmd_parser(struct intel_engine_cs *engine)
-{
-	if (!engine->needs_cmd_parser)
-		return false;
-
-	if (!USES_PPGTT(engine->i915))
-		return false;
-
-	return (i915.enable_cmd_parser == 1);
-}
-
 static bool check_cmd(const struct intel_engine_cs *engine,
 		      const struct drm_i915_cmd_descriptor *desc,
 		      const u32 *cmd, u32 length,
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 5613f2886ad3..049ac53b05da 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3603,7 +3603,6 @@ const char *i915_cache_level_str(struct drm_i915_private *i915, int type);
 int i915_cmd_parser_get_version(struct drm_i915_private *dev_priv);
 void intel_engine_init_cmd_parser(struct intel_engine_cs *engine);
 void intel_engine_cleanup_cmd_parser(struct intel_engine_cs *engine);
-bool intel_engine_needs_cmd_parser(struct intel_engine_cs *engine);
 int intel_engine_cmd_parser(struct intel_engine_cs *engine,
 			    struct drm_i915_gem_object *batch_obj,
 			    struct drm_i915_gem_object *shadow_batch_obj,
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index e72a41223535..5879a7eae6cc 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -50,7 +50,7 @@ struct i915_params i915 __read_mostly = {
 	.error_capture = true,
 	.invert_brightness = 0,
 	.disable_display = 0,
-	.enable_cmd_parser = 1,
+	.enable_cmd_parser = true,
 	.use_mmio_flip = 0,
 	.mmio_debug = 0,
 	.verbose_state_checks = 1,
@@ -187,9 +187,9 @@ MODULE_PARM_DESC(invert_brightness,
 module_param_named(disable_display, i915.disable_display, bool, 0400);
 MODULE_PARM_DESC(disable_display, "Disable display (default: false)");
 
-module_param_named_unsafe(enable_cmd_parser, i915.enable_cmd_parser, int, 0600);
+module_param_named_unsafe(enable_cmd_parser, i915.enable_cmd_parser, bool, 0400);
 MODULE_PARM_DESC(enable_cmd_parser,
-		 "Enable command parsing (1=enabled [default], 0=disabled)");
+		 "Enable command parsing (true=enabled [default], false=disabled)");
 
 module_param_named_unsafe(use_mmio_flip, i915.use_mmio_flip, int, 0600);
 MODULE_PARM_DESC(use_mmio_flip,
diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h
index 94efc899c1ef..ad0b1d280c2b 100644
--- a/drivers/gpu/drm/i915/i915_params.h
+++ b/drivers/gpu/drm/i915/i915_params.h
@@ -44,7 +44,6 @@ struct i915_params {
 	int disable_power_well;
 	int enable_ips;
 	int invert_brightness;
-	int enable_cmd_parser;
 	int enable_guc_loading;
 	int enable_guc_submission;
 	int guc_log_level;
@@ -53,6 +52,7 @@ struct i915_params {
 	int edp_vswing;
 	unsigned int inject_load_failure;
 	/* leave bools at the end to not create holes */
+	bool enable_cmd_parser;
 	bool enable_hangcheck;
 	bool fastboot;
 	bool prefault_disable;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 84aea549de5d..c8cbe61deece 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -368,6 +368,21 @@ intel_engine_flag(const struct intel_engine_cs *engine)
 	return 1 << engine->id;
 }
 
+/**
+ * intel_engine_needs_cmd_parser() - should a given engine use software
+ *                                   command parsing?
+ * @engine: the engine in question
+ *
+ * Only certain platforms require software batch buffer command parsing, and
+ * only when enabled via module parameter.
+ *
+ * Return: true if the engine requires software command parsing
+ */
+static inline bool intel_engine_needs_cmd_parser(struct intel_engine_cs *engine)
+{
+	return engine->needs_cmd_parser;
+}
+
 static inline u32
 intel_engine_sync_index(struct intel_engine_cs *engine,
 			struct intel_engine_cs *other)
-- 
2.9.3


* [PATCH 08/17] drm/i915: Drop spinlocks around adding to the client request list
  2016-08-22  8:03 Execbuf fixes and major tuning Chris Wilson
                   ` (6 preceding siblings ...)
  2016-08-22  8:03 ` [PATCH 07/17] drm/i915: Use the precomputed value for whether to enable command parsing Chris Wilson
@ 2016-08-22  8:03 ` Chris Wilson
  2016-08-24 13:16   ` Mika Kuoppala
                     ` (2 more replies)
  2016-08-22  8:03 ` [PATCH 09/17] drm/i915: Amalgamate execbuffer parameter structures Chris Wilson
                   ` (10 subsequent siblings)
  18 siblings, 3 replies; 47+ messages in thread
From: Chris Wilson @ 2016-08-22  8:03 UTC (permalink / raw)
  To: intel-gfx

We can add to the tail of the client request list without holding the
spinlock, as the only other user is the throttle ioctl, which iterates
forwards over the list. It only needs protection against deletion of a
request as it reads it; it simply won't see a new request added to the
end of the list, or it would be too early and rejected. We can further
reduce the number of spinlocks required when throttling by removing
stale requests from the client_list as we throttle.
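
As an aside (not part of the patch), the prune-as-you-throttle idea in a
minimal userspace sketch with invented names: while walking the
per-client list, requests older than the throttle window are unlinked,
except for the newest stale one, which becomes the wait target.

#include <stdio.h>

#define WINDOW 20       /* "recent enough" cut-off, in fake jiffies */

struct request { long emitted; int linked; };

/* Returns the newest stale request (the one to wait on), unlinking the
 * older stale ones so subsequent walks get shorter.
 */
static struct request *throttle(struct request *reqs, int count, long now)
{
        struct request *target = NULL;

        for (int i = 0; i < count; i++) {
                if (!reqs[i].linked)
                        continue;
                if (reqs[i].emitted >= now - WINDOW)
                        break;                  /* the rest are recent */
                if (target)
                        target->linked = 0;     /* stale and superseded */
                target = &reqs[i];
        }
        return target;
}

int main(void)
{
        struct request reqs[] = {
                { 1, 1 }, { 5, 1 }, { 12, 1 }, { 95, 1 }, { 99, 1 },
        };
        struct request *wait_on = throttle(reqs, 5, 100);

        printf("wait on request emitted at %ld\n",
               wait_on ? wait_on->emitted : -1);
        return 0;
}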

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |  2 +-
 drivers/gpu/drm/i915/i915_gem.c            | 14 ++++++------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 14 +++++++-----
 drivers/gpu/drm/i915/i915_gem_request.c    | 34 ++++++------------------------
 drivers/gpu/drm/i915/i915_gem_request.h    |  4 +---
 5 files changed, 23 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 086053fa2820..996744708f31 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -480,7 +480,7 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
 		mutex_lock(&dev->struct_mutex);
 		request = list_first_entry_or_null(&file_priv->mm.request_list,
 						   struct drm_i915_gem_request,
-						   client_list);
+						   client_link);
 		rcu_read_lock();
 		task = pid_task(request && request->ctx->pid ?
 				request->ctx->pid : file->pid,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7b8abda541e6..e432211e8b24 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3673,16 +3673,14 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 		return -EIO;
 
 	spin_lock(&file_priv->mm.lock);
-	list_for_each_entry(request, &file_priv->mm.request_list, client_list) {
+	list_for_each_entry(request, &file_priv->mm.request_list, client_link) {
 		if (time_after_eq(request->emitted_jiffies, recent_enough))
 			break;
 
-		/*
-		 * Note that the request might not have been submitted yet.
-		 * In which case emitted_jiffies will be zero.
-		 */
-		if (!request->emitted_jiffies)
-			continue;
+		if (target) {
+			list_del(&target->client_link);
+			target->file_priv = NULL;
+		}
 
 		target = request;
 	}
@@ -4639,7 +4637,7 @@ void i915_gem_release(struct drm_device *dev, struct drm_file *file)
 	 * file_priv.
 	 */
 	spin_lock(&file_priv->mm.lock);
-	list_for_each_entry(request, &file_priv->mm.request_list, client_list)
+	list_for_each_entry(request, &file_priv->mm.request_list, client_link)
 		request->file_priv = NULL;
 	spin_unlock(&file_priv->mm.lock);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 125fb38eff40..5689445b1cd3 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1421,6 +1421,14 @@ out:
 	return vma;
 }
 
+static void
+add_to_client(struct drm_i915_gem_request *req,
+	      struct drm_file *file)
+{
+	req->file_priv = file->driver_priv;
+	list_add_tail(&req->client_link, &req->file_priv->mm.request_list);
+}
+
 static int
 execbuf_submit(struct i915_execbuffer_params *params,
 	       struct drm_i915_gem_execbuffer2 *args,
@@ -1512,6 +1520,7 @@ execbuf_submit(struct i915_execbuffer_params *params,
 	trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
 
 	i915_gem_execbuffer_move_to_active(vmas, params->request);
+	add_to_client(params->request, params->file);
 
 	return 0;
 }
@@ -1808,10 +1817,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	 */
 	params->request->batch = params->batch;
 
-	ret = i915_gem_request_add_to_client(params->request, file);
-	if (ret)
-		goto err_request;
-
 	/*
 	 * Save assorted stuff away to pass through to *_submission().
 	 * NB: This data should be 'persistent' and not local as it will
@@ -1825,7 +1830,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	params->ctx                     = ctx;
 
 	ret = execbuf_submit(params, args, &eb->vmas);
-err_request:
 	__i915_add_request(params->request, ret == 0);
 
 err_batch_unpin:
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 1a215320cefb..bf62427a35b7 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -115,42 +115,20 @@ const struct fence_ops i915_fence_ops = {
 	.timeline_value_str = i915_fence_timeline_value_str,
 };
 
-int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
-				   struct drm_file *file)
-{
-	struct drm_i915_private *dev_private;
-	struct drm_i915_file_private *file_priv;
-
-	WARN_ON(!req || !file || req->file_priv);
-
-	if (!req || !file)
-		return -EINVAL;
-
-	if (req->file_priv)
-		return -EINVAL;
-
-	dev_private = req->i915;
-	file_priv = file->driver_priv;
-
-	spin_lock(&file_priv->mm.lock);
-	req->file_priv = file_priv;
-	list_add_tail(&req->client_list, &file_priv->mm.request_list);
-	spin_unlock(&file_priv->mm.lock);
-
-	return 0;
-}
-
 static inline void
 i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
 {
-	struct drm_i915_file_private *file_priv = request->file_priv;
+	struct drm_i915_file_private *file_priv;
 
+	file_priv = request->file_priv;
 	if (!file_priv)
 		return;
 
 	spin_lock(&file_priv->mm.lock);
-	list_del(&request->client_list);
-	request->file_priv = NULL;
+	if (request->file_priv) {
+		list_del(&request->client_link);
+		request->file_priv = NULL;
+	}
 	spin_unlock(&file_priv->mm.lock);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 6c72bd8d9423..9d5a66bfc509 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -132,7 +132,7 @@ struct drm_i915_gem_request {
 
 	struct drm_i915_file_private *file_priv;
 	/** file_priv list entry for this request */
-	struct list_head client_list;
+	struct list_head client_link;
 
 	/**
 	 * The ELSP only accepts two elements at a time, so we queue
@@ -167,8 +167,6 @@ static inline bool fence_is_i915(struct fence *fence)
 struct drm_i915_gem_request * __must_check
 i915_gem_request_alloc(struct intel_engine_cs *engine,
 		       struct i915_gem_context *ctx);
-int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
-				   struct drm_file *file);
 void i915_gem_request_retire_upto(struct drm_i915_gem_request *req);
 
 static inline u32
-- 
2.9.3


* [PATCH 09/17] drm/i915: Amalgamate execbuffer parameter structures
  2016-08-22  8:03 Execbuf fixes and major tuning Chris Wilson
                   ` (7 preceding siblings ...)
  2016-08-22  8:03 ` [PATCH 08/17] drm/i915: Drop spinlocks around adding to the client request list Chris Wilson
@ 2016-08-22  8:03 ` Chris Wilson
  2016-08-24 13:20   ` John Harrison
  2016-08-22  8:03 ` [PATCH 10/17] drm/i915: Use vma->exec_entry as our double-entry placeholder Chris Wilson
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 47+ messages in thread
From: Chris Wilson @ 2016-08-22  8:03 UTC (permalink / raw)
  To: intel-gfx

Combine the two slightly overlapping parameter structures we pass around
the execbuffer routines into one.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 557 ++++++++++++-----------------
 1 file changed, 235 insertions(+), 322 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 5689445b1cd3..7cb5b9ad9212 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -49,70 +49,73 @@
 
 #define BATCH_OFFSET_BIAS (256*1024)
 
-struct i915_execbuffer_params {
-	struct drm_device               *dev;
-	struct drm_file                 *file;
-	struct i915_vma			*batch;
-	u32				dispatch_flags;
-	u32				args_batch_start_offset;
-	struct intel_engine_cs          *engine;
-	struct i915_gem_context         *ctx;
-	struct drm_i915_gem_request     *request;
-};
-
-struct eb_vmas {
+struct i915_execbuffer {
 	struct drm_i915_private *i915;
+	struct drm_file *file;
+	struct drm_i915_gem_execbuffer2 *args;
+	struct drm_i915_gem_exec_object2 *exec;
+	struct intel_engine_cs *engine;
+	struct i915_gem_context *ctx;
+	struct i915_address_space *vm;
+	struct i915_vma *batch;
+	struct drm_i915_gem_request *request;
+	u32 batch_start_offset;
+	unsigned int dispatch_flags;
+	struct drm_i915_gem_exec_object2 shadow_exec_entry;
+	bool need_relocs;
 	struct list_head vmas;
+	struct reloc_cache {
+		struct drm_mm_node node;
+		unsigned long vaddr;
+		unsigned int page;
+		bool use_64bit_reloc;
+	} reloc_cache;
 	int and;
 	union {
-		struct i915_vma *lut[0];
-		struct hlist_head buckets[0];
+		struct i915_vma **lut;
+		struct hlist_head *buckets;
 	};
 };
 
-static struct eb_vmas *
-eb_create(struct drm_i915_private *i915,
-	  struct drm_i915_gem_execbuffer2 *args)
+static int
+eb_create(struct i915_execbuffer *eb)
 {
-	struct eb_vmas *eb = NULL;
-
-	if (args->flags & I915_EXEC_HANDLE_LUT) {
-		unsigned size = args->buffer_count;
+	eb->lut = NULL;
+	if (eb->args->flags & I915_EXEC_HANDLE_LUT) {
+		unsigned int size = eb->args->buffer_count;
 		size *= sizeof(struct i915_vma *);
-		size += sizeof(struct eb_vmas);
-		eb = kmalloc(size, GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
+		eb->lut = kmalloc(size,
+				  GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
 	}
 
-	if (eb == NULL) {
-		unsigned size = args->buffer_count;
-		unsigned count = PAGE_SIZE / sizeof(struct hlist_head) / 2;
+	if (!eb->lut) {
+		unsigned int size = eb->args->buffer_count;
+		unsigned int count = PAGE_SIZE / sizeof(struct hlist_head) / 2;
 		BUILD_BUG_ON_NOT_POWER_OF_2(PAGE_SIZE / sizeof(struct hlist_head));
 		while (count > 2*size)
 			count >>= 1;
-		eb = kzalloc(count*sizeof(struct hlist_head) +
-			     sizeof(struct eb_vmas),
-			     GFP_TEMPORARY);
-		if (eb == NULL)
-			return eb;
+		eb->lut = kzalloc(count*sizeof(struct hlist_head),
+				  GFP_TEMPORARY);
+		if (!eb->lut)
+			return -ENOMEM;
 
 		eb->and = count - 1;
 	} else
-		eb->and = -args->buffer_count;
+		eb->and = -eb->args->buffer_count;
 
-	eb->i915 = i915;
 	INIT_LIST_HEAD(&eb->vmas);
-	return eb;
+	return 0;
 }
 
 static void
-eb_reset(struct eb_vmas *eb)
+eb_reset(struct i915_execbuffer *eb)
 {
 	if (eb->and >= 0)
 		memset(eb->buckets, 0, (eb->and+1)*sizeof(struct hlist_head));
 }
 
 static struct i915_vma *
-eb_get_batch(struct eb_vmas *eb)
+eb_get_batch(struct i915_execbuffer *eb)
 {
 	struct i915_vma *vma = list_entry(eb->vmas.prev, typeof(*vma), exec_list);
 
@@ -132,41 +135,37 @@ eb_get_batch(struct eb_vmas *eb)
 }
 
 static int
-eb_lookup_vmas(struct eb_vmas *eb,
-	       struct drm_i915_gem_exec_object2 *exec,
-	       const struct drm_i915_gem_execbuffer2 *args,
-	       struct i915_address_space *vm,
-	       struct drm_file *file)
+eb_lookup_vmas(struct i915_execbuffer *eb)
 {
 	struct drm_i915_gem_object *obj;
 	struct list_head objects;
 	int i, ret;
 
 	INIT_LIST_HEAD(&objects);
-	spin_lock(&file->table_lock);
+	spin_lock(&eb->file->table_lock);
 	/* Grab a reference to the object and release the lock so we can lookup
 	 * or create the VMA without using GFP_ATOMIC */
-	for (i = 0; i < args->buffer_count; i++) {
-		obj = to_intel_bo(idr_find(&file->object_idr, exec[i].handle));
+	for (i = 0; i < eb->args->buffer_count; i++) {
+		obj = to_intel_bo(idr_find(&eb->file->object_idr, eb->exec[i].handle));
 		if (obj == NULL) {
-			spin_unlock(&file->table_lock);
+			spin_unlock(&eb->file->table_lock);
 			DRM_DEBUG("Invalid object handle %d at index %d\n",
-				   exec[i].handle, i);
+				   eb->exec[i].handle, i);
 			ret = -ENOENT;
 			goto err;
 		}
 
 		if (!list_empty(&obj->obj_exec_link)) {
-			spin_unlock(&file->table_lock);
+			spin_unlock(&eb->file->table_lock);
 			DRM_DEBUG("Object %p [handle %d, index %d] appears more than once in object list\n",
-				   obj, exec[i].handle, i);
+				   obj, eb->exec[i].handle, i);
 			ret = -EINVAL;
 			goto err;
 		}
 
 		list_add_tail(&obj->obj_exec_link, &objects);
 	}
-	spin_unlock(&file->table_lock);
+	spin_unlock(&eb->file->table_lock);
 
 	i = 0;
 	while (!list_empty(&objects)) {
@@ -184,7 +183,7 @@ eb_lookup_vmas(struct eb_vmas *eb,
 		 * from the (obj, vm) we don't run the risk of creating
 		 * duplicated vmas for the same vm.
 		 */
-		vma = i915_gem_obj_lookup_or_create_vma(obj, vm, NULL);
+		vma = i915_gem_obj_lookup_or_create_vma(obj, eb->vm, NULL);
 		if (unlikely(IS_ERR(vma))) {
 			DRM_DEBUG("Failed to lookup VMA\n");
 			ret = PTR_ERR(vma);
@@ -195,11 +194,13 @@ eb_lookup_vmas(struct eb_vmas *eb,
 		list_add_tail(&vma->exec_list, &eb->vmas);
 		list_del_init(&obj->obj_exec_link);
 
-		vma->exec_entry = &exec[i];
+		vma->exec_entry = &eb->exec[i];
 		if (eb->and < 0) {
 			eb->lut[i] = vma;
 		} else {
-			uint32_t handle = args->flags & I915_EXEC_HANDLE_LUT ? i : exec[i].handle;
+			u32 handle =
+				eb->args->flags & I915_EXEC_HANDLE_LUT ?
+				i : eb->exec[i].handle;
 			vma->exec_handle = handle;
 			hlist_add_head(&vma->exec_node,
 				       &eb->buckets[handle & eb->and]);
@@ -226,7 +227,7 @@ err:
 	return ret;
 }
 
-static struct i915_vma *eb_get_vma(struct eb_vmas *eb, unsigned long handle)
+static struct i915_vma *eb_get_vma(struct i915_execbuffer *eb, unsigned long handle)
 {
 	if (eb->and < 0) {
 		if (handle >= -eb->and)
@@ -246,7 +247,7 @@ static struct i915_vma *eb_get_vma(struct eb_vmas *eb, unsigned long handle)
 }
 
 static void
-i915_gem_execbuffer_unreserve_vma(struct i915_vma *vma)
+eb_unreserve_vma(struct i915_vma *vma)
 {
 	struct drm_i915_gem_exec_object2 *entry;
 
@@ -264,8 +265,10 @@ i915_gem_execbuffer_unreserve_vma(struct i915_vma *vma)
 	entry->flags &= ~(__EXEC_OBJECT_HAS_FENCE | __EXEC_OBJECT_HAS_PIN);
 }
 
-static void eb_destroy(struct eb_vmas *eb)
+static void eb_destroy(struct i915_execbuffer *eb)
 {
+	i915_gem_context_put(eb->ctx);
+
 	while (!list_empty(&eb->vmas)) {
 		struct i915_vma *vma;
 
@@ -273,9 +276,8 @@ static void eb_destroy(struct eb_vmas *eb)
 				       struct i915_vma,
 				       exec_list);
 		list_del_init(&vma->exec_list);
-		i915_gem_execbuffer_unreserve_vma(vma);
+		eb_unreserve_vma(vma);
 	}
-	kfree(eb);
 }
 
 static inline int use_cpu_reloc(struct drm_i915_gem_object *obj)
@@ -316,21 +318,12 @@ relocation_target(const struct drm_i915_gem_relocation_entry *reloc,
 	return gen8_canonical_addr((int)reloc->delta + target_offset);
 }
 
-struct reloc_cache {
-	struct drm_i915_private *i915;
-	struct drm_mm_node node;
-	unsigned long vaddr;
-	unsigned int page;
-	bool use_64bit_reloc;
-};
-
 static void reloc_cache_init(struct reloc_cache *cache,
 			     struct drm_i915_private *i915)
 {
 	cache->page = -1;
 	cache->vaddr = 0;
-	cache->i915 = i915;
-	cache->use_64bit_reloc = INTEL_GEN(cache->i915) >= 8;
+	cache->use_64bit_reloc = INTEL_GEN(i915) >= 8;
 	cache->node.allocated = false;
 }
 
@@ -346,7 +339,14 @@ static inline unsigned int unmask_flags(unsigned long p)
 
 #define KMAP 0x4 /* after CLFLUSH_FLAGS */
 
-static void reloc_cache_fini(struct reloc_cache *cache)
+static inline struct i915_ggtt *cache_to_ggtt(struct reloc_cache *cache)
+{
+	struct drm_i915_private *i915 =
+		container_of(cache, struct i915_execbuffer, reloc_cache)->i915;
+	return &i915->ggtt;
+}
+
+static void reloc_cache_reset(struct reloc_cache *cache)
 {
 	void *vaddr;
 
@@ -364,7 +364,7 @@ static void reloc_cache_fini(struct reloc_cache *cache)
 		wmb();
 		io_mapping_unmap_atomic((void __iomem *)vaddr);
 		if (cache->node.allocated) {
-			struct i915_ggtt *ggtt = &cache->i915->ggtt;
+			struct i915_ggtt *ggtt = cache_to_ggtt(cache);
 
 			ggtt->base.clear_range(&ggtt->base,
 					       cache->node.start,
@@ -375,6 +375,9 @@ static void reloc_cache_fini(struct reloc_cache *cache)
 			i915_vma_unpin((struct i915_vma *)cache->node.mm);
 		}
 	}
+
+	cache->vaddr = 0;
+	cache->page = -1;
 }
 
 static void *reloc_kmap(struct drm_i915_gem_object *obj,
@@ -413,7 +416,7 @@ static void *reloc_iomap(struct drm_i915_gem_object *obj,
 			 struct reloc_cache *cache,
 			 int page)
 {
-	struct i915_ggtt *ggtt = &cache->i915->ggtt;
+	struct i915_ggtt *ggtt = cache_to_ggtt(cache);
 	unsigned long offset;
 	void *vaddr;
 
@@ -472,7 +475,7 @@ static void *reloc_iomap(struct drm_i915_gem_object *obj,
 		offset += page << PAGE_SHIFT;
 	}
 
-	vaddr = io_mapping_map_atomic_wc(&cache->i915->ggtt.mappable, offset);
+	vaddr = io_mapping_map_atomic_wc(&ggtt->mappable, offset);
 	cache->page = page;
 	cache->vaddr = (unsigned long)vaddr;
 
@@ -565,12 +568,10 @@ static bool object_is_idle(struct drm_i915_gem_object *obj)
 }
 
 static int
-i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
-				   struct eb_vmas *eb,
-				   struct drm_i915_gem_relocation_entry *reloc,
-				   struct reloc_cache *cache)
+eb_relocate_entry(struct drm_i915_gem_object *obj,
+		  struct i915_execbuffer *eb,
+		  struct drm_i915_gem_relocation_entry *reloc)
 {
-	struct drm_device *dev = obj->base.dev;
 	struct drm_gem_object *target_obj;
 	struct drm_i915_gem_object *target_i915_obj;
 	struct i915_vma *target_vma;
@@ -589,8 +590,8 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 	/* Sandybridge PPGTT errata: We need a global gtt mapping for MI and
 	 * pipe_control writes because the gpu doesn't properly redirect them
 	 * through the ppgtt for non_secure batchbuffers. */
-	if (unlikely(IS_GEN6(dev) &&
-	    reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION)) {
+	if (unlikely(IS_GEN6(eb->i915) &&
+		     reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION)) {
 		ret = i915_vma_bind(target_vma, target_i915_obj->cache_level,
 				    PIN_GLOBAL);
 		if (WARN_ONCE(ret, "Unexpected failure to bind target VMA!"))
@@ -631,7 +632,7 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 
 	/* Check that the relocation address is valid... */
 	if (unlikely(reloc->offset >
-		     obj->base.size - (cache->use_64bit_reloc ? 8 : 4))) {
+		     obj->base.size - (eb->reloc_cache.use_64bit_reloc ? 8 : 4))) {
 		DRM_DEBUG("Relocation beyond object bounds: "
 			  "obj %p target %d offset %d size %d.\n",
 			  obj, reloc->target_handle,
@@ -651,7 +652,7 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 	if (pagefault_disabled() && !object_is_idle(obj))
 		return -EFAULT;
 
-	ret = relocate_entry(obj, reloc, cache, target_offset);
+	ret = relocate_entry(obj, reloc, &eb->reloc_cache, target_offset);
 	if (ret)
 		return ret;
 
@@ -660,19 +661,15 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 	return 0;
 }
 
-static int
-i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
-				 struct eb_vmas *eb)
+static int eb_relocate_vma(struct i915_vma *vma, struct i915_execbuffer *eb)
 {
 #define N_RELOC(x) ((x) / sizeof(struct drm_i915_gem_relocation_entry))
 	struct drm_i915_gem_relocation_entry stack_reloc[N_RELOC(512)];
 	struct drm_i915_gem_relocation_entry __user *user_relocs;
 	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-	struct reloc_cache cache;
 	int remain, ret = 0;
 
 	user_relocs = u64_to_user_ptr(entry->relocs_ptr);
-	reloc_cache_init(&cache, eb->i915);
 
 	remain = entry->relocation_count;
 	while (remain) {
@@ -690,7 +687,7 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
 		do {
 			u64 offset = r->presumed_offset;
 
-			ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, r, &cache);
+			ret = eb_relocate_entry(vma->obj, eb, r);
 			if (ret)
 				goto out;
 
@@ -707,33 +704,29 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
 	}
 
 out:
-	reloc_cache_fini(&cache);
+	reloc_cache_reset(&eb->reloc_cache);
 	return ret;
 #undef N_RELOC
 }
 
 static int
-i915_gem_execbuffer_relocate_vma_slow(struct i915_vma *vma,
-				      struct eb_vmas *eb,
-				      struct drm_i915_gem_relocation_entry *relocs)
+eb_relocate_vma_slow(struct i915_vma *vma,
+		     struct i915_execbuffer *eb,
+		     struct drm_i915_gem_relocation_entry *relocs)
 {
 	const struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-	struct reloc_cache cache;
 	int i, ret = 0;
 
-	reloc_cache_init(&cache, eb->i915);
 	for (i = 0; i < entry->relocation_count; i++) {
-		ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, &relocs[i], &cache);
+		ret = eb_relocate_entry(vma->obj, eb, &relocs[i]);
 		if (ret)
 			break;
 	}
-	reloc_cache_fini(&cache);
-
+	reloc_cache_reset(&eb->reloc_cache);
 	return ret;
 }
 
-static int
-i915_gem_execbuffer_relocate(struct eb_vmas *eb)
+static int eb_relocate(struct i915_execbuffer *eb)
 {
 	struct i915_vma *vma;
 	int ret = 0;
@@ -747,7 +740,7 @@ i915_gem_execbuffer_relocate(struct eb_vmas *eb)
 	 */
 	pagefault_disable();
 	list_for_each_entry(vma, &eb->vmas, exec_list) {
-		ret = i915_gem_execbuffer_relocate_vma(vma, eb);
+		ret = eb_relocate_vma(vma, eb);
 		if (ret)
 			break;
 	}
@@ -763,9 +756,9 @@ static bool only_mappable_for_reloc(unsigned int flags)
 }
 
 static int
-i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
-				struct intel_engine_cs *engine,
-				bool *need_reloc)
+eb_reserve_vma(struct i915_vma *vma,
+	       struct intel_engine_cs *engine,
+	       bool *need_reloc)
 {
 	struct drm_i915_gem_object *obj = vma->obj;
 	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
@@ -885,33 +878,26 @@ eb_vma_misplaced(struct i915_vma *vma)
 	return false;
 }
 
-static int
-i915_gem_execbuffer_reserve(struct intel_engine_cs *engine,
-			    struct list_head *vmas,
-			    struct i915_gem_context *ctx,
-			    bool *need_relocs)
+static int eb_reserve(struct i915_execbuffer *eb)
 {
+	const bool has_fenced_gpu_access = INTEL_GEN(eb->i915) < 4;
 	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
-	struct i915_address_space *vm;
 	struct list_head ordered_vmas;
 	struct list_head pinned_vmas;
-	bool has_fenced_gpu_access = INTEL_GEN(engine->i915) < 4;
 	int retry;
 
-	vm = list_first_entry(vmas, struct i915_vma, exec_list)->vm;
-
 	INIT_LIST_HEAD(&ordered_vmas);
 	INIT_LIST_HEAD(&pinned_vmas);
-	while (!list_empty(vmas)) {
+	while (!list_empty(&eb->vmas)) {
 		struct drm_i915_gem_exec_object2 *entry;
 		bool need_fence, need_mappable;
 
-		vma = list_first_entry(vmas, struct i915_vma, exec_list);
+		vma = list_first_entry(&eb->vmas, struct i915_vma, exec_list);
 		obj = vma->obj;
 		entry = vma->exec_entry;
 
-		if (ctx->flags & CONTEXT_NO_ZEROMAP)
+		if (eb->ctx->flags & CONTEXT_NO_ZEROMAP)
 			entry->flags |= __EXEC_OBJECT_NEEDS_BIAS;
 
 		if (!has_fenced_gpu_access)
@@ -932,8 +918,8 @@ i915_gem_execbuffer_reserve(struct intel_engine_cs *engine,
 		obj->base.pending_read_domains = I915_GEM_GPU_DOMAINS & ~I915_GEM_DOMAIN_COMMAND;
 		obj->base.pending_write_domain = 0;
 	}
-	list_splice(&ordered_vmas, vmas);
-	list_splice(&pinned_vmas, vmas);
+	list_splice(&ordered_vmas, &eb->vmas);
+	list_splice(&pinned_vmas, &eb->vmas);
 
 	/* Attempt to pin all of the buffers into the GTT.
 	 * This is done in 3 phases:
@@ -952,27 +938,24 @@ i915_gem_execbuffer_reserve(struct intel_engine_cs *engine,
 		int ret = 0;
 
 		/* Unbind any ill-fitting objects or pin. */
-		list_for_each_entry(vma, vmas, exec_list) {
+		list_for_each_entry(vma, &eb->vmas, exec_list) {
 			if (!drm_mm_node_allocated(&vma->node))
 				continue;
 
 			if (eb_vma_misplaced(vma))
 				ret = i915_vma_unbind(vma);
 			else
-				ret = i915_gem_execbuffer_reserve_vma(vma,
-								      engine,
-								      need_relocs);
+				ret = eb_reserve_vma(vma, eb->engine, &eb->need_relocs);
 			if (ret)
 				goto err;
 		}
 
 		/* Bind fresh objects */
-		list_for_each_entry(vma, vmas, exec_list) {
+		list_for_each_entry(vma, &eb->vmas, exec_list) {
 			if (drm_mm_node_allocated(&vma->node))
 				continue;
 
-			ret = i915_gem_execbuffer_reserve_vma(vma, engine,
-							      need_relocs);
+			ret = eb_reserve_vma(vma, eb->engine, &eb->need_relocs);
 			if (ret)
 				goto err;
 		}
@@ -982,46 +965,37 @@ err:
 			return ret;
 
 		/* Decrement pin count for bound objects */
-		list_for_each_entry(vma, vmas, exec_list)
-			i915_gem_execbuffer_unreserve_vma(vma);
+		list_for_each_entry(vma, &eb->vmas, exec_list)
+			eb_unreserve_vma(vma);
 
-		ret = i915_gem_evict_vm(vm, true);
+		ret = i915_gem_evict_vm(eb->vm, true);
 		if (ret)
 			return ret;
 	} while (1);
 }
 
 static int
-i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
-				  struct drm_i915_gem_execbuffer2 *args,
-				  struct drm_file *file,
-				  struct intel_engine_cs *engine,
-				  struct eb_vmas *eb,
-				  struct drm_i915_gem_exec_object2 *exec,
-				  struct i915_gem_context *ctx)
+eb_relocate_slow(struct i915_execbuffer *eb)
 {
+	const unsigned int count = eb->args->buffer_count;
+	struct drm_device *dev = &eb->i915->drm;
 	struct drm_i915_gem_relocation_entry *reloc;
-	struct i915_address_space *vm;
 	struct i915_vma *vma;
-	bool need_relocs;
 	int *reloc_offset;
 	int i, total, ret;
-	unsigned count = args->buffer_count;
-
-	vm = list_first_entry(&eb->vmas, struct i915_vma, exec_list)->vm;
 
 	/* We may process another execbuffer during the unlock... */
 	while (!list_empty(&eb->vmas)) {
 		vma = list_first_entry(&eb->vmas, struct i915_vma, exec_list);
 		list_del_init(&vma->exec_list);
-		i915_gem_execbuffer_unreserve_vma(vma);
+		eb_unreserve_vma(vma);
 	}
 
 	mutex_unlock(&dev->struct_mutex);
 
 	total = 0;
 	for (i = 0; i < count; i++)
-		total += exec[i].relocation_count;
+		total += eb->exec[i].relocation_count;
 
 	reloc_offset = drm_malloc_ab(count, sizeof(*reloc_offset));
 	reloc = drm_malloc_ab(total, sizeof(*reloc));
@@ -1038,10 +1012,10 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 		u64 invalid_offset = (u64)-1;
 		int j;
 
-		user_relocs = u64_to_user_ptr(exec[i].relocs_ptr);
+		user_relocs = u64_to_user_ptr(eb->exec[i].relocs_ptr);
 
 		if (copy_from_user(reloc+total, user_relocs,
-				   exec[i].relocation_count * sizeof(*reloc))) {
+				   eb->exec[i].relocation_count * sizeof(*reloc))) {
 			ret = -EFAULT;
 			mutex_lock(&dev->struct_mutex);
 			goto err;
@@ -1056,7 +1030,7 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 		 * happened we would make the mistake of assuming that the
 		 * relocations were valid.
 		 */
-		for (j = 0; j < exec[i].relocation_count; j++) {
+		for (j = 0; j < eb->exec[i].relocation_count; j++) {
 			if (__copy_to_user(&user_relocs[j].presumed_offset,
 					   &invalid_offset,
 					   sizeof(invalid_offset))) {
@@ -1067,7 +1041,7 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 		}
 
 		reloc_offset[i] = total;
-		total += exec[i].relocation_count;
+		total += eb->exec[i].relocation_count;
 	}
 
 	ret = i915_mutex_lock_interruptible(dev);
@@ -1078,20 +1052,18 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 
 	/* reacquire the objects */
 	eb_reset(eb);
-	ret = eb_lookup_vmas(eb, exec, args, vm, file);
+	ret = eb_lookup_vmas(eb);
 	if (ret)
 		goto err;
 
-	need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
-	ret = i915_gem_execbuffer_reserve(engine, &eb->vmas, ctx,
-					  &need_relocs);
+	ret = eb_reserve(eb);
 	if (ret)
 		goto err;
 
 	list_for_each_entry(vma, &eb->vmas, exec_list) {
-		int offset = vma->exec_entry - exec;
-		ret = i915_gem_execbuffer_relocate_vma_slow(vma, eb,
-							    reloc + reloc_offset[offset]);
+		int idx = vma->exec_entry - eb->exec;
+
+		ret = eb_relocate_vma_slow(vma, eb, reloc + reloc_offset[idx]);
 		if (ret)
 			goto err;
 	}
@@ -1108,29 +1080,28 @@ err:
 	return ret;
 }
 
-static unsigned int eb_other_engines(struct drm_i915_gem_request *req)
+static unsigned int eb_other_engines(struct i915_execbuffer *eb)
 {
 	unsigned int mask;
 
-	mask = ~intel_engine_flag(req->engine) & I915_BO_ACTIVE_MASK;
+	mask = ~intel_engine_flag(eb->engine) & I915_BO_ACTIVE_MASK;
 	mask <<= I915_BO_ACTIVE_SHIFT;
 
 	return mask;
 }
 
 static int
-i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
-				struct list_head *vmas)
+eb_move_to_gpu(struct i915_execbuffer *eb)
 {
-	const unsigned int other_rings = eb_other_engines(req);
+	const unsigned int other_rings = eb_other_engines(eb);
 	struct i915_vma *vma;
 	int ret;
 
-	list_for_each_entry(vma, vmas, exec_list) {
+	list_for_each_entry(vma, &eb->vmas, exec_list) {
 		struct drm_i915_gem_object *obj = vma->obj;
 
 		if (obj->flags & other_rings) {
-			ret = i915_gem_object_sync(obj, req);
+			ret = i915_gem_object_sync(obj, eb->request);
 			if (ret)
 				return ret;
 		}
@@ -1140,10 +1111,10 @@ i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
 	}
 
 	/* Unconditionally flush any chipset caches (for streaming writes). */
-	i915_gem_chipset_flush(req->engine->i915);
+	i915_gem_chipset_flush(eb->i915);
 
 	/* Unconditionally invalidate GPU caches and TLBs. */
-	return req->engine->emit_flush(req, EMIT_INVALIDATE);
+	return eb->engine->emit_flush(eb->request, EMIT_INVALIDATE);
 }
 
 static bool
@@ -1246,24 +1217,25 @@ validate_exec_list(struct drm_device *dev,
 	return 0;
 }
 
-static struct i915_gem_context *
-i915_gem_validate_context(struct drm_device *dev, struct drm_file *file,
-			  struct intel_engine_cs *engine, const u32 ctx_id)
+static int eb_select_context(struct i915_execbuffer *eb)
 {
 	struct i915_gem_context *ctx;
-	struct i915_ctx_hang_stats *hs;
+	unsigned int ctx_id;
 
-	ctx = i915_gem_context_lookup(file->driver_priv, ctx_id);
-	if (IS_ERR(ctx))
-		return ctx;
+	ctx_id = i915_execbuffer2_get_context_id(*eb->args);
+	ctx = i915_gem_context_lookup(eb->file->driver_priv, ctx_id);
+	if (unlikely(IS_ERR(ctx)))
+		return PTR_ERR(ctx);
 
-	hs = &ctx->hang_stats;
-	if (hs->banned) {
+	if (unlikely(ctx->hang_stats.banned)) {
 		DRM_DEBUG("Context %u tried to submit while banned\n", ctx_id);
-		return ERR_PTR(-EIO);
+		return -EIO;
 	}
 
-	return ctx;
+	eb->ctx = i915_gem_context_get(ctx);
+	eb->vm = ctx->ppgtt ? &ctx->ppgtt->base : &eb->i915->ggtt.base;
+
+	return 0;
 }
 
 void i915_vma_move_to_active(struct i915_vma *vma,
@@ -1325,12 +1297,11 @@ static void eb_export_fence(struct drm_i915_gem_object *obj,
 }
 
 static void
-i915_gem_execbuffer_move_to_active(struct list_head *vmas,
-				   struct drm_i915_gem_request *req)
+eb_move_to_active(struct i915_execbuffer *eb)
 {
 	struct i915_vma *vma;
 
-	list_for_each_entry(vma, vmas, exec_list) {
+	list_for_each_entry(vma, &eb->vmas, exec_list) {
 		struct drm_i915_gem_object *obj = vma->obj;
 		u32 old_read = obj->base.read_domains;
 		u32 old_write = obj->base.write_domain;
@@ -1342,8 +1313,8 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 			obj->base.pending_read_domains |= obj->base.read_domains;
 		obj->base.read_domains = obj->base.pending_read_domains;
 
-		i915_vma_move_to_active(vma, req, vma->exec_entry->flags);
-		eb_export_fence(obj, req, vma->exec_entry->flags);
+		i915_vma_move_to_active(vma, eb->request, vma->exec_entry->flags);
+		eb_export_fence(obj, eb->request, vma->exec_entry->flags);
 		trace_i915_gem_object_change_domain(obj, old_read, old_write);
 	}
 }
@@ -1374,29 +1345,22 @@ i915_reset_gen7_sol_offsets(struct drm_i915_gem_request *req)
 	return 0;
 }
 
-static struct i915_vma *
-i915_gem_execbuffer_parse(struct intel_engine_cs *engine,
-			  struct drm_i915_gem_exec_object2 *shadow_exec_entry,
-			  struct drm_i915_gem_object *batch_obj,
-			  struct eb_vmas *eb,
-			  u32 batch_start_offset,
-			  u32 batch_len,
-			  bool is_master)
+static struct i915_vma *eb_parse(struct i915_execbuffer *eb, bool is_master)
 {
 	struct drm_i915_gem_object *shadow_batch_obj;
 	struct i915_vma *vma;
 	int ret;
 
-	shadow_batch_obj = i915_gem_batch_pool_get(&engine->batch_pool,
-						   PAGE_ALIGN(batch_len));
+	shadow_batch_obj = i915_gem_batch_pool_get(&eb->engine->batch_pool,
+						   PAGE_ALIGN(eb->args->batch_len));
 	if (IS_ERR(shadow_batch_obj))
 		return ERR_CAST(shadow_batch_obj);
 
-	ret = intel_engine_cmd_parser(engine,
-				      batch_obj,
+	ret = intel_engine_cmd_parser(eb->engine,
+				      eb->batch->obj,
 				      shadow_batch_obj,
-				      batch_start_offset,
-				      batch_len,
+				      eb->args->batch_start_offset,
+				      eb->args->batch_len,
 				      is_master);
 	if (ret) {
 		if (ret == -EACCES) /* unhandled chained batch */
@@ -1410,9 +1374,8 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *engine,
 	if (IS_ERR(vma))
 		goto out;
 
-	memset(shadow_exec_entry, 0, sizeof(*shadow_exec_entry));
-
-	vma->exec_entry = shadow_exec_entry;
+	vma->exec_entry =
+		memset(&eb->shadow_exec_entry, 0, sizeof(*vma->exec_entry));
 	vma->exec_entry->flags = __EXEC_OBJECT_HAS_PIN;
 	list_add_tail(&vma->exec_list, &eb->vmas);
 
@@ -1430,49 +1393,45 @@ add_to_client(struct drm_i915_gem_request *req,
 }
 
 static int
-execbuf_submit(struct i915_execbuffer_params *params,
-	       struct drm_i915_gem_execbuffer2 *args,
-	       struct list_head *vmas)
+execbuf_submit(struct i915_execbuffer *eb)
 {
-	struct drm_i915_private *dev_priv = params->request->i915;
-	u64 exec_start, exec_len;
 	int instp_mode;
 	u32 instp_mask;
 	int ret;
 
-	ret = i915_gem_execbuffer_move_to_gpu(params->request, vmas);
+	ret = eb_move_to_gpu(eb);
 	if (ret)
 		return ret;
 
-	ret = i915_switch_context(params->request);
+	ret = i915_switch_context(eb->request);
 	if (ret)
 		return ret;
 
-	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
+	instp_mode = eb->args->flags & I915_EXEC_CONSTANTS_MASK;
 	instp_mask = I915_EXEC_CONSTANTS_MASK;
 	switch (instp_mode) {
 	case I915_EXEC_CONSTANTS_REL_GENERAL:
 	case I915_EXEC_CONSTANTS_ABSOLUTE:
 	case I915_EXEC_CONSTANTS_REL_SURFACE:
-		if (instp_mode != 0 && params->engine->id != RCS) {
+		if (instp_mode != 0 && eb->engine->id != RCS) {
 			DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
 			return -EINVAL;
 		}
 
-		if (instp_mode != dev_priv->relative_constants_mode) {
-			if (INTEL_INFO(dev_priv)->gen < 4) {
+		if (instp_mode != eb->i915->relative_constants_mode) {
+			if (INTEL_INFO(eb->i915)->gen < 4) {
 				DRM_DEBUG("no rel constants on pre-gen4\n");
 				return -EINVAL;
 			}
 
-			if (INTEL_INFO(dev_priv)->gen > 5 &&
+			if (INTEL_INFO(eb->i915)->gen > 5 &&
 			    instp_mode == I915_EXEC_CONSTANTS_REL_SURFACE) {
 				DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
 				return -EINVAL;
 			}
 
 			/* The HW changed the meaning on this bit on gen6 */
-			if (INTEL_INFO(dev_priv)->gen >= 6)
+			if (INTEL_INFO(eb->i915)->gen >= 6)
 				instp_mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
 		}
 		break;
@@ -1481,11 +1440,11 @@ execbuf_submit(struct i915_execbuffer_params *params,
 		return -EINVAL;
 	}
 
-	if (params->engine->id == RCS &&
-	    instp_mode != dev_priv->relative_constants_mode) {
-		struct intel_ring *ring = params->request->ring;
+	if (eb->engine->id == RCS &&
+	    instp_mode != eb->i915->relative_constants_mode) {
+		struct intel_ring *ring = eb->request->ring;
 
-		ret = intel_ring_begin(params->request, 4);
+		ret = intel_ring_begin(eb->request, 4);
 		if (ret)
 			return ret;
 
@@ -1495,32 +1454,27 @@ execbuf_submit(struct i915_execbuffer_params *params,
 		intel_ring_emit(ring, instp_mask << 16 | instp_mode);
 		intel_ring_advance(ring);
 
-		dev_priv->relative_constants_mode = instp_mode;
+		eb->i915->relative_constants_mode = instp_mode;
 	}
 
-	if (args->flags & I915_EXEC_GEN7_SOL_RESET) {
-		ret = i915_reset_gen7_sol_offsets(params->request);
+	if (eb->args->flags & I915_EXEC_GEN7_SOL_RESET) {
+		ret = i915_reset_gen7_sol_offsets(eb->request);
 		if (ret)
 			return ret;
 	}
 
-	exec_len   = args->batch_len;
-	exec_start = params->batch->node.start +
-		     params->args_batch_start_offset;
-
-	if (exec_len == 0)
-		exec_len = params->batch->size - params->args_batch_start_offset;
-
-	ret = params->engine->emit_bb_start(params->request,
-					    exec_start, exec_len,
-					    params->dispatch_flags);
+	ret = eb->engine->emit_bb_start(eb->request,
+					eb->batch->node.start +
+					eb->batch_start_offset,
+					eb->args->batch_len,
+					eb->dispatch_flags);
 	if (ret)
 		return ret;
 
-	trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
+	trace_i915_gem_ring_dispatch(eb->request, eb->dispatch_flags);
 
-	i915_gem_execbuffer_move_to_active(vmas, params->request);
-	add_to_client(params->request, params->file);
+	eb_move_to_active(eb);
+	add_to_client(eb->request, eb->file);
 
 	return 0;
 }
@@ -1606,24 +1560,13 @@ eb_select_engine(struct drm_i915_private *dev_priv,
 }
 
 static int
-i915_gem_do_execbuffer(struct drm_device *dev, void *data,
+i915_gem_do_execbuffer(struct drm_device *dev,
 		       struct drm_file *file,
 		       struct drm_i915_gem_execbuffer2 *args,
 		       struct drm_i915_gem_exec_object2 *exec)
 {
-	struct drm_i915_private *dev_priv = to_i915(dev);
-	struct i915_ggtt *ggtt = &dev_priv->ggtt;
-	struct eb_vmas *eb;
-	struct drm_i915_gem_exec_object2 shadow_exec_entry;
-	struct intel_engine_cs *engine;
-	struct i915_gem_context *ctx;
-	struct i915_address_space *vm;
-	struct i915_execbuffer_params params_master; /* XXX: will be removed later */
-	struct i915_execbuffer_params *params = &params_master;
-	const u32 ctx_id = i915_execbuffer2_get_context_id(*args);
-	u32 dispatch_flags;
+	struct i915_execbuffer eb;
 	int ret;
-	bool need_relocs;
 
 	if (!i915_gem_check_execbuffer(args))
 		return -EINVAL;
@@ -1632,37 +1575,39 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (ret)
 		return ret;
 
-	dispatch_flags = 0;
+	eb.i915 = to_i915(dev);
+	eb.file = file;
+	eb.args = args;
+	eb.exec = exec;
+	eb.need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
+	reloc_cache_init(&eb.reloc_cache, eb.i915);
+
+	eb.dispatch_flags = 0;
 	if (args->flags & I915_EXEC_SECURE) {
 		if (!drm_is_current_master(file) || !capable(CAP_SYS_ADMIN))
 		    return -EPERM;
 
-		dispatch_flags |= I915_DISPATCH_SECURE;
+		eb.dispatch_flags |= I915_DISPATCH_SECURE;
 	}
 	if (args->flags & I915_EXEC_IS_PINNED)
-		dispatch_flags |= I915_DISPATCH_PINNED;
-
-	engine = eb_select_engine(dev_priv, file, args);
-	if (!engine)
-		return -EINVAL;
+		eb.dispatch_flags |= I915_DISPATCH_PINNED;
 
-	if (args->buffer_count < 1) {
-		DRM_DEBUG("execbuf with %d buffers\n", args->buffer_count);
+	eb.engine = eb_select_engine(eb.i915, file, args);
+	if (!eb.engine)
 		return -EINVAL;
-	}
 
 	if (args->flags & I915_EXEC_RESOURCE_STREAMER) {
-		if (!HAS_RESOURCE_STREAMER(dev)) {
+		if (!HAS_RESOURCE_STREAMER(eb.i915)) {
 			DRM_DEBUG("RS is only allowed for Haswell, Gen8 and above\n");
 			return -EINVAL;
 		}
-		if (engine->id != RCS) {
+		if (eb.engine->id != RCS) {
 			DRM_DEBUG("RS is not available on %s\n",
-				 engine->name);
+				 eb.engine->name);
 			return -EINVAL;
 		}
 
-		dispatch_flags |= I915_DISPATCH_RS;
+		eb.dispatch_flags |= I915_DISPATCH_RS;
 	}
 
 	/* Take a local wakeref for preparing to dispatch the execbuf as
@@ -1671,59 +1616,44 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	 * wakeref that we hold until the GPU has been idle for at least
 	 * 100ms.
 	 */
-	intel_runtime_pm_get(dev_priv);
+	intel_runtime_pm_get(eb.i915);
 
 	ret = i915_mutex_lock_interruptible(dev);
 	if (ret)
 		goto pre_mutex_err;
 
-	ctx = i915_gem_validate_context(dev, file, engine, ctx_id);
-	if (IS_ERR(ctx)) {
+	ret = eb_select_context(&eb);
+	if (ret) {
 		mutex_unlock(&dev->struct_mutex);
-		ret = PTR_ERR(ctx);
 		goto pre_mutex_err;
 	}
 
-	i915_gem_context_get(ctx);
-
-	if (ctx->ppgtt)
-		vm = &ctx->ppgtt->base;
-	else
-		vm = &ggtt->base;
-
-	memset(&params_master, 0x00, sizeof(params_master));
-
-	eb = eb_create(dev_priv, args);
-	if (eb == NULL) {
-		i915_gem_context_put(ctx);
+	if (eb_create(&eb)) {
+		i915_gem_context_put(eb.ctx);
 		mutex_unlock(&dev->struct_mutex);
 		ret = -ENOMEM;
 		goto pre_mutex_err;
 	}
 
 	/* Look up object handles */
-	ret = eb_lookup_vmas(eb, exec, args, vm, file);
+	ret = eb_lookup_vmas(&eb);
 	if (ret)
 		goto err;
 
 	/* take note of the batch buffer before we might reorder the lists */
-	params->batch = eb_get_batch(eb);
+	eb.batch = eb_get_batch(&eb);
 
 	/* Move the objects en-masse into the GTT, evicting if necessary. */
-	need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
-	ret = i915_gem_execbuffer_reserve(engine, &eb->vmas, ctx,
-					  &need_relocs);
+	ret = eb_reserve(&eb);
 	if (ret)
 		goto err;
 
 	/* The objects are in their final locations, apply the relocations. */
-	if (need_relocs)
-		ret = i915_gem_execbuffer_relocate(eb);
+	if (eb.need_relocs)
+		ret = eb_relocate(&eb);
 	if (ret) {
 		if (ret == -EFAULT) {
-			ret = i915_gem_execbuffer_relocate_slow(dev, args, file,
-								engine,
-								eb, exec, ctx);
+			ret = eb_relocate_slow(&eb);
 			BUG_ON(!mutex_is_locked(&dev->struct_mutex));
 		}
 		if (ret)
@@ -1731,28 +1661,23 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	}
 
 	/* Set the pending read domains for the batch buffer to COMMAND */
-	if (params->batch->obj->base.pending_write_domain) {
+	if (eb.batch->obj->base.pending_write_domain) {
 		DRM_DEBUG("Attempting to use self-modifying batch buffer\n");
 		ret = -EINVAL;
 		goto err;
 	}
-	if (args->batch_start_offset > params->batch->size ||
-	    args->batch_len > params->batch->size - args->batch_start_offset) {
+	if (args->batch_start_offset > eb.batch->size ||
+	    args->batch_len > eb.batch->size - args->batch_start_offset) {
 		DRM_DEBUG("Attempting to use out-of-bounds batch\n");
 		ret = -EINVAL;
 		goto err;
 	}
 
-	params->args_batch_start_offset = args->batch_start_offset;
-	if (intel_engine_needs_cmd_parser(engine) && args->batch_len) {
+	eb.batch_start_offset = args->batch_start_offset;
+	if (intel_engine_needs_cmd_parser(eb.engine) && args->batch_len) {
 		struct i915_vma *vma;
 
-		vma = i915_gem_execbuffer_parse(engine, &shadow_exec_entry,
-						params->batch->obj,
-						eb,
-						args->batch_start_offset,
-						args->batch_len,
-						drm_is_current_master(file));
+		vma = eb_parse(&eb, drm_is_current_master(file));
 		if (IS_ERR(vma)) {
 			ret = PTR_ERR(vma);
 			goto err;
@@ -1768,19 +1693,21 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 			 * specifically don't want that set on batches the
 			 * command parser has accepted.
 			 */
-			dispatch_flags |= I915_DISPATCH_SECURE;
-			params->args_batch_start_offset = 0;
-			params->batch = vma;
+			eb.dispatch_flags |= I915_DISPATCH_SECURE;
+			eb.batch_start_offset = 0;
+			eb.batch = vma;
 		}
 	}
 
-	params->batch->obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
+	eb.batch->obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
+	if (args->batch_len == 0)
+		args->batch_len = eb.batch->size - eb.batch_start_offset;
 
 	/* snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
 	 * batch" bit. Hence we need to pin secure batches into the global gtt.
 	 * hsw should have this fixed, but bdw mucks it up again. */
-	if (dispatch_flags & I915_DISPATCH_SECURE) {
-		struct drm_i915_gem_object *obj = params->batch->obj;
+	if (eb.dispatch_flags & I915_DISPATCH_SECURE) {
+		struct drm_i915_gem_object *obj = eb.batch->obj;
 		struct i915_vma *vma;
 
 		/*
@@ -1799,13 +1726,13 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 			goto err;
 		}
 
-		params->batch = vma;
+		eb.batch = vma;
 	}
 
 	/* Allocate a request for this batch buffer nice and early. */
-	params->request = i915_gem_request_alloc(engine, ctx);
-	if (IS_ERR(params->request)) {
-		ret = PTR_ERR(params->request);
+	eb.request = i915_gem_request_alloc(eb.engine, eb.ctx);
+	if (IS_ERR(eb.request)) {
+		ret = PTR_ERR(eb.request);
 		goto err_batch_unpin;
 	}
 
@@ -1815,22 +1742,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	 * inactive_list and lose its active reference. Hence we do not need
 	 * to explicitly hold another reference here.
 	 */
-	params->request->batch = params->batch;
+	eb.request->batch = eb.batch;
 
-	/*
-	 * Save assorted stuff away to pass through to *_submission().
-	 * NB: This data should be 'persistent' and not local as it will
-	 * kept around beyond the duration of the IOCTL once the GPU
-	 * scheduler arrives.
-	 */
-	params->dev                     = dev;
-	params->file                    = file;
-	params->engine                    = engine;
-	params->dispatch_flags          = dispatch_flags;
-	params->ctx                     = ctx;
-
-	ret = execbuf_submit(params, args, &eb->vmas);
-	__i915_add_request(params->request, ret == 0);
+	ret = execbuf_submit(&eb);
+	__i915_add_request(eb.request, ret == 0);
 
 err_batch_unpin:
 	/*
@@ -1839,19 +1754,17 @@ err_batch_unpin:
 	 * needs to be adjusted to also track the ggtt batch vma properly as
 	 * active.
 	 */
-	if (dispatch_flags & I915_DISPATCH_SECURE)
-		i915_vma_unpin(params->batch);
+	if (eb.dispatch_flags & I915_DISPATCH_SECURE)
+		i915_vma_unpin(eb.batch);
 err:
 	/* the request owns the ref now */
-	i915_gem_context_put(ctx);
-	eb_destroy(eb);
-
+	eb_destroy(&eb);
 	mutex_unlock(&dev->struct_mutex);
 
 pre_mutex_err:
 	/* intel_gpu_busy should also get a ref, so it will free when the device
 	 * is really idle. */
-	intel_runtime_pm_put(dev_priv);
+	intel_runtime_pm_put(eb.i915);
 	return ret;
 }
 
@@ -1918,7 +1831,7 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
 	exec2.flags = I915_EXEC_RENDER;
 	i915_execbuffer2_set_context_id(exec2, 0);
 
-	ret = i915_gem_do_execbuffer(dev, data, file, &exec2, exec2_list);
+	ret = i915_gem_do_execbuffer(dev, file, &exec2, exec2_list);
 	if (!ret) {
 		struct drm_i915_gem_exec_object __user *user_exec_list =
 			u64_to_user_ptr(args->buffers_ptr);
@@ -1982,7 +1895,7 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
 		return -EFAULT;
 	}
 
-	ret = i915_gem_do_execbuffer(dev, data, file, args, exec2_list);
+	ret = i915_gem_do_execbuffer(dev, file, args, exec2_list);
 	if (!ret) {
 		/* Copy the new buffer offsets back to the user's exec list. */
 		struct drm_i915_gem_exec_object2 __user *user_exec_list =
-- 
2.9.3


* [PATCH 10/17] drm/i915: Use vma->exec_entry as our double-entry placeholder
  2016-08-22  8:03 Execbuf fixes and major tuning Chris Wilson
                   ` (8 preceding siblings ...)
  2016-08-22  8:03 ` [PATCH 09/17] drm/i915: Amalgamate execbuffer parameter structures Chris Wilson
@ 2016-08-22  8:03 ` Chris Wilson
  2016-08-22  8:03 ` [PATCH 11/17] drm/i915: Store a direct lookup from object handle to vma Chris Wilson
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 47+ messages in thread
From: Chris Wilson @ 2016-08-22  8:03 UTC (permalink / raw)
  To: intel-gfx

This has the benefit of not requiring us to manipulate the
vma->exec_link list when tearing down the execbuffer, and provides a
marginally cheaper test for detecting the user error of supplying the
same handle more than once.
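
As a rough illustration only (a minimal userspace sketch with invented
fake_* types, not the driver code), the duplicate-handle check collapses
to a single pointer test on the entry placeholder, and teardown becomes
a simple clear:

	#include <stddef.h>
	#include <stdbool.h>

	struct fake_entry { unsigned int handle; };
	struct fake_vma { struct fake_entry *exec_entry; };

	/* A non-NULL exec_entry marks "already in this execbuf", so a
	 * repeated handle is caught without any list_empty() probing.
	 */
	static bool fake_eb_add(struct fake_vma *vma, struct fake_entry *entry)
	{
		if (vma->exec_entry)
			return false;	/* user listed the handle twice */
		vma->exec_entry = entry;
		return true;
	}

	/* Teardown only clears the marker; no list manipulation needed. */
	static void fake_eb_reset(struct fake_vma *vma)
	{
		vma->exec_entry = NULL;
	}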

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_evict.c      | 16 ++-----
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 74 ++++++++++++++++--------------
 drivers/gpu/drm/i915/i915_gem_gtt.c        |  1 -
 3 files changed, 43 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index a8f6825cf14f..79ce873b4891 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -52,9 +52,6 @@ mark_free(struct i915_vma *vma, unsigned int flags, struct list_head *unwind)
 	if (i915_vma_is_pinned(vma))
 		return false;
 
-	if (WARN_ON(!list_empty(&vma->exec_list)))
-		return false;
-
 	if (flags & PIN_NONFAULT && vma->obj->fault_mappable)
 		return false;
 
@@ -140,8 +137,6 @@ search_again:
 	list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
 		ret = drm_mm_scan_remove_block(&vma->node);
 		BUG_ON(ret);
-
-		INIT_LIST_HEAD(&vma->exec_list);
 	}
 
 	/* Can we unpin some objects such as idle hw contents,
@@ -188,16 +183,12 @@ found:
 		if (drm_mm_scan_remove_block(&vma->node))
 			__i915_vma_pin(vma);
 		else
-			list_del_init(&vma->exec_list);
+			list_del(&vma->exec_list);
 	}
 
 	/* Unbinding will emit any required flushes */
-	while (!list_empty(&eviction_list)) {
-		vma = list_first_entry(&eviction_list,
-				       struct i915_vma,
-				       exec_list);
-
-		list_del_init(&vma->exec_list);
+	ret = 0;
+	list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
 		__i915_vma_unpin(vma);
 		if (ret == 0)
 			ret = i915_vma_unbind(vma);
@@ -273,7 +264,6 @@ int i915_gem_evict_for_vma(struct i915_vma *target, unsigned int flags)
 	}
 
 	list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
-		list_del_init(&vma->exec_list);
 		__i915_vma_unpin(vma);
 		if (ret == 0)
 			ret = i915_vma_unbind(vma);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 7cb5b9ad9212..82496dd73309 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -103,13 +103,39 @@ eb_create(struct i915_execbuffer *eb)
 	} else
 		eb->and = -eb->args->buffer_count;
 
-	INIT_LIST_HEAD(&eb->vmas);
 	return 0;
 }
 
+static inline void
+__eb_unreserve_vma(struct i915_vma *vma,
+		   const struct drm_i915_gem_exec_object2 *entry)
+{
+	if (unlikely(entry->flags & __EXEC_OBJECT_HAS_FENCE))
+		i915_vma_unpin_fence(vma);
+
+	if (entry->flags & __EXEC_OBJECT_HAS_PIN)
+		__i915_vma_unpin(vma);
+}
+
+static void
+eb_unreserve_vma(struct i915_vma *vma)
+{
+	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
+
+	__eb_unreserve_vma(vma, entry);
+	entry->flags &= ~(__EXEC_OBJECT_HAS_FENCE | __EXEC_OBJECT_HAS_PIN);
+}
+
 static void
 eb_reset(struct i915_execbuffer *eb)
 {
+	struct i915_vma *vma;
+
+	list_for_each_entry(vma, &eb->vmas, exec_list) {
+		eb_unreserve_vma(vma);
+		vma->exec_entry = NULL;
+	}
+
 	if (eb->and >= 0)
 		memset(eb->buckets, 0, (eb->and+1)*sizeof(struct hlist_head));
 }
@@ -141,6 +167,8 @@ eb_lookup_vmas(struct i915_execbuffer *eb)
 	struct list_head objects;
 	int i, ret;
 
+	INIT_LIST_HEAD(&eb->vmas);
+
 	INIT_LIST_HEAD(&objects);
 	spin_lock(&eb->file->table_lock);
 	/* Grab a reference to the object and release the lock so we can lookup
@@ -246,38 +274,22 @@ static struct i915_vma *eb_get_vma(struct i915_execbuffer *eb, unsigned long han
 	}
 }
 
-static void
-eb_unreserve_vma(struct i915_vma *vma)
+static void eb_destroy(struct i915_execbuffer *eb)
 {
-	struct drm_i915_gem_exec_object2 *entry;
-
-	if (!drm_mm_node_allocated(&vma->node))
-		return;
-
-	entry = vma->exec_entry;
-
-	if (entry->flags & __EXEC_OBJECT_HAS_FENCE)
-		i915_vma_unpin_fence(vma);
+	struct i915_vma *vma;
 
-	if (entry->flags & __EXEC_OBJECT_HAS_PIN)
-		__i915_vma_unpin(vma);
+	list_for_each_entry(vma, &eb->vmas, exec_list) {
+		if (!vma->exec_entry)
+			continue;
 
-	entry->flags &= ~(__EXEC_OBJECT_HAS_FENCE | __EXEC_OBJECT_HAS_PIN);
-}
+		__eb_unreserve_vma(vma, vma->exec_entry);
+		vma->exec_entry = NULL;
+	}
 
-static void eb_destroy(struct i915_execbuffer *eb)
-{
 	i915_gem_context_put(eb->ctx);
 
-	while (!list_empty(&eb->vmas)) {
-		struct i915_vma *vma;
-
-		vma = list_first_entry(&eb->vmas,
-				       struct i915_vma,
-				       exec_list);
-		list_del_init(&vma->exec_list);
-		eb_unreserve_vma(vma);
-	}
+	if (eb->buckets)
+		kfree(eb->buckets);
 }
 
 static inline int use_cpu_reloc(struct drm_i915_gem_object *obj)
@@ -985,12 +997,7 @@ eb_relocate_slow(struct i915_execbuffer *eb)
 	int i, total, ret;
 
 	/* We may process another execbuffer during the unlock... */
-	while (!list_empty(&eb->vmas)) {
-		vma = list_first_entry(&eb->vmas, struct i915_vma, exec_list);
-		list_del_init(&vma->exec_list);
-		eb_unreserve_vma(vma);
-	}
-
+	eb_reset(eb);
 	mutex_unlock(&dev->struct_mutex);
 
 	total = 0;
@@ -1051,7 +1058,6 @@ eb_relocate_slow(struct i915_execbuffer *eb)
 	}
 
 	/* reacquire the objects */
-	eb_reset(eb);
 	ret = eb_lookup_vmas(eb);
 	if (ret)
 		goto err;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0061c3841760..f51e483569b9 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3357,7 +3357,6 @@ __i915_vma_create(struct drm_i915_gem_object *obj,
 	if (vma == NULL)
 		return ERR_PTR(-ENOMEM);
 
-	INIT_LIST_HEAD(&vma->exec_list);
 	for (i = 0; i < ARRAY_SIZE(vma->last_read); i++)
 		init_request_active(&vma->last_read[i], i915_vma_retire);
 	init_request_active(&vma->last_fence, NULL);
-- 
2.9.3


* [PATCH 11/17] drm/i915: Store a direct lookup from object handle to vma
  2016-08-22  8:03 Execbuf fixes and major tuning Chris Wilson
                   ` (9 preceding siblings ...)
  2016-08-22  8:03 ` [PATCH 10/17] drm/i915: Use vma->exec_entry as our double-entry placeholder Chris Wilson
@ 2016-08-22  8:03 ` Chris Wilson
  2016-08-22  8:03 ` [PATCH 12/17] drm/i915: Pass vma to relocate entry Chris Wilson
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 47+ messages in thread
From: Chris Wilson @ 2016-08-22  8:03 UTC (permalink / raw)
  To: intel-gfx

The advent of full-ppgtt led to an extra indirection between the object
and its binding. That extra indirection has a noticeable impact on how
fast we can convert from the user handles to our internal vma for
execbuffer. In order to bypass the extra indirection, we use a
resizable hashtable to jump from the object handle to the per-ctx vma.
rhashtable was considered but we don't need the online resizing feature
and the extra complexity proved to undermine its usefulness. Instead, we
simply reallocate the hashtable on demand in a background task and
serialize against any pending resize before iterating.

In non-full-ppgtt modes, multiple files and multiple contexts can share
the same vma. This leads to having multiple possible handle->vma links,
so we only use the first to establish the fast path. The majority of
buffers are not shared and so we should still be able to realise
speedups with multiple clients.
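
As a rough sketch of the fast path only (a simplified userspace model
with invented fake_* names and a fixed-size table; the driver keeps a
per-context table, resizes it from a worker, and falls back to the
object idr on a miss):

	#include <stdint.h>
	#include <stddef.h>

	#define FAKE_HT_BITS 4
	#define FAKE_HT_SIZE (1u << FAKE_HT_BITS)

	struct fake_vma {
		uint32_t handle;	/* per-context execbuf handle */
		struct fake_vma *next;	/* hash-chain link */
	};

	/* hash_32()-style multiplicative hash on the user handle */
	static unsigned int fake_hash(uint32_t handle)
	{
		return (handle * 0x9e370001u) >> (32 - FAKE_HT_BITS);
	}

	/* Fast path: handle -> vma without touching the object idr or
	 * walking the object's vma list.
	 */
	static struct fake_vma *
	fake_lookup(struct fake_vma *ht[FAKE_HT_SIZE], uint32_t handle)
	{
		struct fake_vma *vma;

		for (vma = ht[fake_hash(handle)]; vma; vma = vma->next)
			if (vma->handle == handle)
				return vma;

		return NULL;	/* miss: take the slow idr-based path */
	}

The table only needs to be approximately sized: when the load factor
drifts too far, a background resize is queued and simply flushed before
the next lookup pass.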

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |  17 +-
 drivers/gpu/drm/i915/i915_drv.h            |  19 ++-
 drivers/gpu/drm/i915/i915_gem.c            |   5 +-
 drivers/gpu/drm/i915/i915_gem_context.c    |  73 +++++++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 243 +++++++++++++++++------------
 drivers/gpu/drm/i915/i915_gem_gtt.c        |  20 +++
 drivers/gpu/drm/i915/i915_gem_gtt.h        |   8 +-
 7 files changed, 269 insertions(+), 116 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 996744708f31..d812157ed94c 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -208,9 +208,9 @@ static int obj_rank_by_stolen(void *priv,
 			      struct list_head *A, struct list_head *B)
 {
 	struct drm_i915_gem_object *a =
-		container_of(A, struct drm_i915_gem_object, obj_exec_link);
+		container_of(A, struct drm_i915_gem_object, tmp_link);
 	struct drm_i915_gem_object *b =
-		container_of(B, struct drm_i915_gem_object, obj_exec_link);
+		container_of(B, struct drm_i915_gem_object, tmp_link);
 
 	if (a->stolen->start < b->stolen->start)
 		return -1;
@@ -238,7 +238,7 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 		if (obj->stolen == NULL)
 			continue;
 
-		list_add(&obj->obj_exec_link, &stolen);
+		list_add(&obj->tmp_link, &stolen);
 
 		total_obj_size += obj->base.size;
 		total_gtt_size += i915_gem_obj_total_ggtt_size(obj);
@@ -248,7 +248,7 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 		if (obj->stolen == NULL)
 			continue;
 
-		list_add(&obj->obj_exec_link, &stolen);
+		list_add(&obj->tmp_link, &stolen);
 
 		total_obj_size += obj->base.size;
 		count++;
@@ -256,11 +256,11 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 	list_sort(NULL, &stolen, obj_rank_by_stolen);
 	seq_puts(m, "Stolen:\n");
 	while (!list_empty(&stolen)) {
-		obj = list_first_entry(&stolen, typeof(*obj), obj_exec_link);
+		obj = list_first_entry(&stolen, typeof(*obj), tmp_link);
 		seq_puts(m, "   ");
 		describe_obj(m, obj);
 		seq_putc(m, '\n');
-		list_del_init(&obj->obj_exec_link);
+		list_del(&obj->tmp_link);
 	}
 	mutex_unlock(&dev->struct_mutex);
 
@@ -1996,6 +1996,11 @@ static int i915_context_status(struct seq_file *m, void *unused)
 			seq_putc(m, '\n');
 		}
 
+		seq_printf(m, "\tvma hashtable size=%u (actual %u), count=%u\n",
+			   ctx->vma.ht_size,
+			   1 << ctx->vma.ht_bits,
+			   ctx->vma.ht_count);
+
 		seq_putc(m, '\n');
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 049ac53b05da..e7e7840d5a68 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -37,7 +37,7 @@
 #include <linux/i2c.h>
 #include <linux/i2c-algo-bit.h>
 #include <linux/backlight.h>
-#include <linux/hashtable.h>
+#include <linux/hash.h>
 #include <linux/intel-iommu.h>
 #include <linux/kref.h>
 #include <linux/pm_qos.h>
@@ -889,8 +889,11 @@ struct i915_ctx_hang_stats {
  */
 struct i915_gem_context {
 	struct kref ref;
+
 	struct drm_i915_private *i915;
 	struct drm_i915_file_private *file_priv;
+	struct list_head link;
+
 	struct i915_hw_ppgtt *ppgtt;
 	struct pid *pid;
 
@@ -919,7 +922,13 @@ struct i915_gem_context {
 	struct atomic_notifier_head status_notifier;
 	bool execlists_force_single_submission;
 
-	struct list_head link;
+	struct {
+		struct work_struct resize;
+		struct hlist_head *ht;
+		unsigned int ht_bits;
+		unsigned int ht_size;
+		unsigned int ht_count;
+	} vma;
 
 	u8 remap_slice;
 	bool closed:1;
@@ -2153,15 +2162,14 @@ struct drm_i915_gem_object {
 
 	/** List of VMAs backed by this object */
 	struct list_head vma_list;
+	struct i915_vma *vma_hashed;
 
 	/** Stolen memory for this object, instead of being backed by shmem. */
 	struct drm_mm_node *stolen;
 	struct list_head global_list;
 
-	/** Used in execbuf to temporarily hold a ref */
-	struct list_head obj_exec_link;
-
 	struct list_head batch_pool_link;
+	struct list_head tmp_link;
 
 	unsigned long flags;
 	/**
@@ -3123,6 +3131,7 @@ int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
 		  u32 flags);
 void __i915_vma_set_map_and_fenceable(struct i915_vma *vma);
 int __must_check i915_vma_unbind(struct i915_vma *vma);
+void i915_vma_unlink_ctx(struct i915_vma *vma);
 void i915_vma_close(struct i915_vma *vma);
 void i915_vma_destroy(struct i915_vma *vma);
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e432211e8b24..0ca3ef547136 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2679,6 +2679,10 @@ void i915_gem_close_object(struct drm_gem_object *gem, struct drm_file *file)
 		if (vma->vm->file == fpriv)
 			i915_vma_close(vma);
 
+	vma = obj->vma_hashed;
+	if (vma && vma->ctx->file_priv == fpriv)
+		i915_vma_unlink_ctx(vma);
+
 	if (i915_gem_object_is_active(obj) &&
 	    !i915_gem_object_has_active_reference(obj)) {
 		i915_gem_object_set_active_reference(obj);
@@ -4065,7 +4069,6 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
 				    i915_gem_object_retire__read);
 	init_request_active(&obj->last_write,
 			    i915_gem_object_retire__write);
-	INIT_LIST_HEAD(&obj->obj_exec_link);
 	INIT_LIST_HEAD(&obj->vma_list);
 	INIT_LIST_HEAD(&obj->batch_pool_link);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index ab54f208a4a9..3b4eec676d66 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -85,6 +85,7 @@
  *
  */
 
+#include <linux/log2.h>
 #include <drm/drmP.h>
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
@@ -99,6 +100,9 @@
 #define GEN6_CONTEXT_ALIGN (64<<10)
 #define GEN7_CONTEXT_ALIGN 4096
 
+/* Initial size (as log2) to preallocate the handle->object hashtable */
+#define VMA_HT_BITS 2u /* 4 x 2 pointers, 64 bytes minimum */
+
 static size_t get_context_alignment(struct drm_i915_private *dev_priv)
 {
 	if (IS_GEN6(dev_priv))
@@ -134,6 +138,64 @@ static int get_context_size(struct drm_i915_private *dev_priv)
 	return ret;
 }
 
+static void resize_vma_ht(struct work_struct *work)
+{
+	struct i915_gem_context *ctx =
+		container_of(work, typeof(*ctx), vma.resize);
+	unsigned int size, bits, new_bits, i;
+	struct hlist_head *new_ht;
+
+	bits = 1 + ilog2(4*ctx->vma.ht_count/3);
+	new_bits = min_t(unsigned int,
+			 max(bits, VMA_HT_BITS),
+			 sizeof(unsigned int)*8);
+	if (new_bits == ctx->vma.ht_bits)
+		goto out;
+
+	new_ht = kzalloc(sizeof(*new_ht)<<new_bits, GFP_KERNEL | __GFP_NOWARN);
+	if (!new_ht)
+		new_ht = vzalloc(sizeof(*new_ht)<<new_bits);
+	if (!new_ht)
+		/* pretend the resize succeeded and stop calling us for a bit! */
+		goto out;
+
+	size = 1 << ctx->vma.ht_bits;
+	for (i = 0; i < size; i++) {
+		struct i915_vma *vma;
+		struct hlist_node *tmp;
+
+		hlist_for_each_entry_safe(vma, tmp, &ctx->vma.ht[i], ctx_node)
+			hlist_add_head(&vma->ctx_node,
+				       &new_ht[hash_32(vma->ctx_handle,
+						       new_bits)]);
+	}
+	kvfree(ctx->vma.ht);
+	ctx->vma.ht = new_ht;
+	ctx->vma.ht_bits = new_bits;
+	smp_wmb();
+out:
+	ctx->vma.ht_size = 1 << bits;
+}
+
+static void decouple_vma(struct i915_gem_context *ctx)
+{
+	unsigned int i, size;
+
+	if (ctx->vma.ht_size & 1)
+		cancel_work_sync(&ctx->vma.resize);
+
+	size = 1 << ctx->vma.ht_bits;
+	for (i = 0; i < size; i++) {
+		struct i915_vma *vma;
+
+		hlist_for_each_entry(vma, &ctx->vma.ht[i], ctx_node) {
+			vma->obj->vma_hashed = NULL;
+			vma->ctx = NULL;
+		}
+	}
+	kvfree(ctx->vma.ht);
+}
+
 void i915_gem_context_free(struct kref *ctx_ref)
 {
 	struct i915_gem_context *ctx = container_of(ctx_ref, typeof(*ctx), ref);
@@ -143,6 +205,7 @@ void i915_gem_context_free(struct kref *ctx_ref)
 	trace_i915_context_free(ctx);
 	GEM_BUG_ON(!ctx->closed);
 
+	decouple_vma(ctx);
 	i915_ppgtt_put(ctx->ppgtt);
 
 	for (i = 0; i < I915_NUM_ENGINES; i++) {
@@ -159,6 +222,7 @@ void i915_gem_context_free(struct kref *ctx_ref)
 	}
 
 	put_pid(ctx->pid);
+
 	list_del(&ctx->link);
 
 	ida_simple_remove(&ctx->i915->context_hw_ida, ctx->hw_id);
@@ -281,6 +345,15 @@ __create_hw_context(struct drm_device *dev,
 
 	ctx->ggtt_alignment = get_context_alignment(dev_priv);
 
+	ctx->vma.ht_bits = VMA_HT_BITS;
+	ctx->vma.ht_size = 1 << ctx->vma.ht_bits;
+	ctx->vma.ht = kzalloc(sizeof(*ctx->vma.ht)*ctx->vma.ht_size,
+			      GFP_KERNEL);
+	if (!ctx->vma.ht)
+		goto err_out;
+
+	INIT_WORK(&ctx->vma.resize, resize_vma_ht);
+
 	if (dev_priv->hw_context_size) {
 		struct drm_i915_gem_object *obj;
 		struct i915_vma *vma;
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 82496dd73309..35751e855859 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -70,38 +70,33 @@ struct i915_execbuffer {
 		unsigned int page;
 		bool use_64bit_reloc;
 	} reloc_cache;
-	int and;
-	union {
-		struct i915_vma **lut;
-		struct hlist_head *buckets;
-	};
+	int lut_mask;
+	struct hlist_head *buckets;
 };
 
 static int
 eb_create(struct i915_execbuffer *eb)
 {
-	eb->lut = NULL;
-	if (eb->args->flags & I915_EXEC_HANDLE_LUT) {
-		unsigned int size = eb->args->buffer_count;
-		size *= sizeof(struct i915_vma *);
-		eb->lut = kmalloc(size,
-				  GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
-	}
-
-	if (!eb->lut) {
-		unsigned int size = eb->args->buffer_count;
-		unsigned int count = PAGE_SIZE / sizeof(struct hlist_head) / 2;
-		BUILD_BUG_ON_NOT_POWER_OF_2(PAGE_SIZE / sizeof(struct hlist_head));
-		while (count > 2*size)
-			count >>= 1;
-		eb->lut = kzalloc(count*sizeof(struct hlist_head),
-				  GFP_TEMPORARY);
-		if (!eb->lut)
-			return -ENOMEM;
-
-		eb->and = count - 1;
+	if ((eb->args->flags & I915_EXEC_HANDLE_LUT) == 0) {
+		unsigned int size = 1 + ilog2(eb->args->buffer_count);
+
+		do {
+			eb->buckets = kzalloc(sizeof(struct hlist_head) << size,
+					     GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
+			if (eb->buckets)
+				break;
+		} while (--size);
+
+		if (unlikely(!eb->buckets)) {
+			eb->buckets = kzalloc(sizeof(struct hlist_head),
+					      GFP_TEMPORARY);
+			if (unlikely(!eb->buckets))
+				return -ENOMEM;
+		}
+
+		eb->lut_mask = size;
 	} else
-		eb->and = -eb->args->buffer_count;
+		eb->lut_mask = -eb->args->buffer_count;
 
 	return 0;
 }
@@ -136,72 +131,103 @@ eb_reset(struct i915_execbuffer *eb)
 		vma->exec_entry = NULL;
 	}
 
-	if (eb->and >= 0)
-		memset(eb->buckets, 0, (eb->and+1)*sizeof(struct hlist_head));
+	if (eb->lut_mask >= 0)
+		memset(eb->buckets, 0,
+		       (1<<eb->lut_mask)*sizeof(struct hlist_head));
 }
 
-static struct i915_vma *
-eb_get_batch(struct i915_execbuffer *eb)
+#define to_ptr(T, x) ((T *)(uintptr_t)(x))
+
+static bool
+eb_add_vma(struct i915_execbuffer *eb, struct i915_vma *vma, int i)
 {
-	struct i915_vma *vma = list_entry(eb->vmas.prev, typeof(*vma), exec_list);
+	if (unlikely(vma->exec_entry)) {
+		DRM_DEBUG("Object [handle %d, index %d] appears more than once in object list\n",
+			  eb->exec[i].handle, i);
+		return false;
+	}
+	list_add_tail(&vma->exec_list, &eb->vmas);
 
-	/*
-	 * SNA is doing fancy tricks with compressing batch buffers, which leads
-	 * to negative relocation deltas. Usually that works out ok since the
-	 * relocate address is still positive, except when the batch is placed
-	 * very low in the GTT. Ensure this doesn't happen.
-	 *
-	 * Note that actual hangs have only been observed on gen7, but for
-	 * paranoia do it everywhere.
-	 */
-	if ((vma->exec_entry->flags & EXEC_OBJECT_PINNED) == 0)
-		vma->exec_entry->flags |= __EXEC_OBJECT_NEEDS_BIAS;
+	vma->exec_entry = &eb->exec[i];
+	if (eb->lut_mask >= 0) {
+		vma->exec_handle = eb->exec[i].handle;
+		hlist_add_head(&vma->exec_node,
+			       &eb->buckets[hash_32(vma->exec_handle,
+						    eb->lut_mask)]);
+	}
 
-	return vma;
+	eb->exec[i].rsvd2 = (uintptr_t)vma;
+	return true;
+}
+
+static inline struct hlist_head *ht_head(struct i915_gem_context *ctx,
+					 u32 handle)
+{
+	return &ctx->vma.ht[hash_32(handle, ctx->vma.ht_bits)];
 }
 
 static int
 eb_lookup_vmas(struct i915_execbuffer *eb)
 {
-	struct drm_i915_gem_object *obj;
-	struct list_head objects;
-	int i, ret;
+	const int count = eb->args->buffer_count;
+	struct i915_vma *vma;
+	int slow_pass = -1;
+	int i;
 
 	INIT_LIST_HEAD(&eb->vmas);
 
-	INIT_LIST_HEAD(&objects);
+	if (unlikely(eb->ctx->vma.ht_size & 1))
+		flush_work(&eb->ctx->vma.resize);
+	for (i = 0; i < count; i++) {
+		eb->exec[i].rsvd2 = 0;
+
+		hlist_for_each_entry(vma,
+				     ht_head(eb->ctx, eb->exec[i].handle),
+				     ctx_node) {
+			if (vma->ctx_handle != eb->exec[i].handle)
+				continue;
+
+			if (!eb_add_vma(eb, vma, i))
+				return -EINVAL;
+
+			goto next_vma;
+		}
+
+		if (slow_pass < 0)
+			slow_pass = i;
+next_vma: ;
+	}
+
+	if (slow_pass < 0)
+		return 0;
+
 	spin_lock(&eb->file->table_lock);
 	/* Grab a reference to the object and release the lock so we can lookup
 	 * or create the VMA without using GFP_ATOMIC */
-	for (i = 0; i < eb->args->buffer_count; i++) {
-		obj = to_intel_bo(idr_find(&eb->file->object_idr, eb->exec[i].handle));
-		if (obj == NULL) {
-			spin_unlock(&eb->file->table_lock);
-			DRM_DEBUG("Invalid object handle %d at index %d\n",
-				   eb->exec[i].handle, i);
-			ret = -ENOENT;
-			goto err;
-		}
+	for (i = slow_pass; i < count; i++) {
+		struct drm_i915_gem_object *obj;
+
+		if (eb->exec[i].rsvd2)
+			continue;
 
-		if (!list_empty(&obj->obj_exec_link)) {
+		obj = to_intel_bo(idr_find(&eb->file->object_idr,
+					   eb->exec[i].handle));
+		if (unlikely(!obj)) {
 			spin_unlock(&eb->file->table_lock);
-			DRM_DEBUG("Object %p [handle %d, index %d] appears more than once in object list\n",
-				   obj, eb->exec[i].handle, i);
-			ret = -EINVAL;
-			goto err;
+			DRM_DEBUG("Invalid object handle %d at index %d\n",
+				  eb->exec[i].handle, i);
+			return -ENOENT;
 		}
 
-		list_add_tail(&obj->obj_exec_link, &objects);
+		eb->exec[i].rsvd2 = 1 | (uintptr_t)obj;
 	}
 	spin_unlock(&eb->file->table_lock);
 
-	i = 0;
-	while (!list_empty(&objects)) {
-		struct i915_vma *vma;
+	for (i = slow_pass; i < count; i++) {
+		struct drm_i915_gem_object *obj;
 
-		obj = list_first_entry(&objects,
-				       struct drm_i915_gem_object,
-				       obj_exec_link);
+		if ((eb->exec[i].rsvd2 & 1) == 0)
+			continue;
 
 		/*
 		 * NOTE: We can leak any vmas created here when something fails
@@ -211,61 +237,72 @@ eb_lookup_vmas(struct i915_execbuffer *eb)
 		 * from the (obj, vm) we don't run the risk of creating
 		 * duplicated vmas for the same vm.
 		 */
+		obj = to_ptr(struct drm_i915_gem_object, eb->exec[i].rsvd2 & ~1);
 		vma = i915_gem_obj_lookup_or_create_vma(obj, eb->vm, NULL);
 		if (unlikely(IS_ERR(vma))) {
 			DRM_DEBUG("Failed to lookup VMA\n");
-			ret = PTR_ERR(vma);
-			goto err;
+			return PTR_ERR(vma);
 		}
 
-		/* Transfer ownership from the objects list to the vmas list. */
-		list_add_tail(&vma->exec_list, &eb->vmas);
-		list_del_init(&obj->obj_exec_link);
-
-		vma->exec_entry = &eb->exec[i];
-		if (eb->and < 0) {
-			eb->lut[i] = vma;
-		} else {
-			u32 handle =
-				eb->args->flags & I915_EXEC_HANDLE_LUT ?
-				i : eb->exec[i].handle;
-			vma->exec_handle = handle;
-			hlist_add_head(&vma->exec_node,
-				       &eb->buckets[handle & eb->and]);
+		/* First come, first served */
+		if (!vma->ctx) {
+			vma->ctx = eb->ctx;
+			vma->ctx_handle = eb->exec[i].handle;
+			hlist_add_head(&vma->ctx_node,
+				       ht_head(eb->ctx, eb->exec[i].handle));
+			eb->ctx->vma.ht_count++;
+			if (i915_vma_is_ggtt(vma)) {
+				GEM_BUG_ON(obj->vma_hashed);
+				obj->vma_hashed = vma;
+			}
 		}
-		++i;
+
+		if (!eb_add_vma(eb, vma, i))
+			return -EINVAL;
+	}
+	if (4*eb->ctx->vma.ht_count > 3*eb->ctx->vma.ht_size ||
+	    4*eb->ctx->vma.ht_count < eb->ctx->vma.ht_size) {
+		eb->ctx->vma.ht_size |= 1;
+		queue_work(system_highpri_wq, &eb->ctx->vma.resize);
 	}
 
 	return 0;
+}
+
+static struct i915_vma *
+eb_get_batch(struct i915_execbuffer *eb)
+{
+	struct i915_vma *vma;
 
+	vma = to_ptr(struct i915_vma, eb->exec[eb->args->buffer_count-1].rsvd2);
 
-err:
-	while (!list_empty(&objects)) {
-		obj = list_first_entry(&objects,
-				       struct drm_i915_gem_object,
-				       obj_exec_link);
-		list_del_init(&obj->obj_exec_link);
-		i915_gem_object_put(obj);
-	}
 	/*
-	 * Objects already transfered to the vmas list will be unreferenced by
-	 * eb_destroy.
+	 * SNA is doing fancy tricks with compressing batch buffers, which leads
+	 * to negative relocation deltas. Usually that works out ok since the
+	 * relocate address is still positive, except when the batch is placed
+	 * very low in the GTT. Ensure this doesn't happen.
+	 *
+	 * Note that actual hangs have only been observed on gen7, but for
+	 * paranoia do it everywhere.
 	 */
+	if ((vma->exec_entry->flags & EXEC_OBJECT_PINNED) == 0)
+		vma->exec_entry->flags |= __EXEC_OBJECT_NEEDS_BIAS;
 
-	return ret;
+	return vma;
 }
 
-static struct i915_vma *eb_get_vma(struct i915_execbuffer *eb, unsigned long handle)
+static struct i915_vma *
+eb_get_vma(struct i915_execbuffer *eb, unsigned long handle)
 {
-	if (eb->and < 0) {
-		if (handle >= -eb->and)
+	if (eb->lut_mask < 0) {
+		if (handle >= -eb->lut_mask)
 			return NULL;
-		return eb->lut[handle];
+		return to_ptr(struct i915_vma, eb->exec[handle].rsvd2);
 	} else {
 		struct hlist_head *head;
 		struct i915_vma *vma;
 
-		head = &eb->buckets[handle & eb->and];
+		head = &eb->buckets[hash_32(handle, eb->lut_mask)];
 		hlist_for_each_entry(vma, head, exec_node) {
 			if (vma->exec_handle == handle)
 				return vma;
@@ -288,7 +325,7 @@ static void eb_destroy(struct i915_execbuffer *eb)
 
 	i915_gem_context_put(eb->ctx);
 
-	if (eb->buckets)
+	if (eb->lut_mask >= 0)
 		kfree(eb->buckets);
 }
 
@@ -916,7 +953,7 @@ static int eb_reserve(struct i915_execbuffer *eb)
 			entry->flags &= ~EXEC_OBJECT_NEEDS_FENCE;
 		need_fence =
 			entry->flags & EXEC_OBJECT_NEEDS_FENCE &&
-			i915_gem_object_is_tiled(obj);
+			i915_gem_object_is_tiled(vma->obj);
 		need_mappable = need_fence || need_reloc_mappable(vma);
 
 		if (entry->flags & EXEC_OBJECT_PINNED)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index f51e483569b9..b07a84f2603f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3333,11 +3333,31 @@ void i915_vma_destroy(struct i915_vma *vma)
 	kmem_cache_free(to_i915(vma->obj->base.dev)->vmas, vma);
 }
 
+void i915_vma_unlink_ctx(struct i915_vma *vma)
+{
+	struct i915_gem_context *ctx = vma->ctx;
+
+	if (ctx->vma.ht_size & 1) {
+		cancel_work_sync(&ctx->vma.resize);
+		ctx->vma.ht_size &= ~1;
+	}
+
+	__hlist_del(&vma->ctx_node);
+	ctx->vma.ht_count--;
+
+	if (i915_vma_is_ggtt(vma))
+		vma->obj->vma_hashed = NULL;
+	vma->ctx = NULL;
+}
+
 void i915_vma_close(struct i915_vma *vma)
 {
 	GEM_BUG_ON(i915_vma_is_closed(vma));
 	vma->flags |= I915_VMA_CLOSED;
 
+	if (vma->ctx)
+		i915_vma_unlink_ctx(vma);
+
 	list_del_init(&vma->obj_link);
 	if (!i915_vma_is_active(vma) && !i915_vma_is_pinned(vma))
 		WARN_ON(i915_vma_unbind(vma));
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index a90999fe2e57..f9540683d2c0 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -166,6 +166,7 @@ struct i915_ggtt_view {
 extern const struct i915_ggtt_view i915_ggtt_view_normal;
 extern const struct i915_ggtt_view i915_ggtt_view_rotated;
 
+struct i915_gem_context;
 enum i915_cache_level;
 
 /**
@@ -226,6 +227,7 @@ struct i915_vma {
 	struct list_head vm_link;
 
 	struct list_head obj_link; /* Link in the object's VMA list */
+	struct hlist_node obj_node;
 
 	/** This vma's place in the batchbuffer or on the eviction list */
 	struct list_head exec_list;
@@ -234,8 +236,12 @@ struct i915_vma {
 	 * Used for performing relocations during execbuffer insertion.
 	 */
 	struct hlist_node exec_node;
-	unsigned long exec_handle;
 	struct drm_i915_gem_exec_object2 *exec_entry;
+	u32 exec_handle;
+
+	struct i915_gem_context *ctx;
+	struct hlist_node ctx_node;
+	u32 ctx_handle;
 };
 
 struct i915_vma *
-- 
2.9.3
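
The handle->vma lookup added above boils down to a per-context hash table that is probed before the idr slow path. A rough userspace model of that idea (all names below - demo_vma, demo_ctx, hash32 - are invented for illustration and are not the driver's types):

#include <stdint.h>
#include <stdio.h>

#define HT_BITS 4
#define HT_SIZE (1u << HT_BITS)

struct demo_vma {
	uint32_t handle;
	struct demo_vma *next;		/* bucket chain, like the ctx_node hlist */
};

struct demo_ctx {
	struct demo_vma *ht[HT_SIZE];
};

/* multiplicative hash, standing in for the kernel's hash_32() */
static unsigned int hash32(uint32_t v, unsigned int bits)
{
	return (uint32_t)(v * 0x9e3779b1u) >> (32 - bits);
}

static struct demo_vma *lookup(struct demo_ctx *ctx, uint32_t handle)
{
	struct demo_vma *vma;

	for (vma = ctx->ht[hash32(handle, HT_BITS)]; vma; vma = vma->next)
		if (vma->handle == handle)
			return vma;	/* fast path: cached by a prior execbuf */
	return NULL;			/* miss: caller takes the slow pass */
}

static void insert(struct demo_ctx *ctx, struct demo_vma *vma)
{
	struct demo_vma **head = &ctx->ht[hash32(vma->handle, HT_BITS)];

	vma->next = *head;		/* first come, first served */
	*head = vma;
}

int main(void)
{
	struct demo_ctx ctx = { { 0 } };
	struct demo_vma a = { .handle = 42 };

	insert(&ctx, &a);
	printf("handle 42 -> %p\n", (void *)lookup(&ctx, 42));
	printf("handle  7 -> %p\n", (void *)lookup(&ctx, 7));
	return 0;
}

The design choice mirrored here is the "first come, first served" rule from the diff: a later lookup never displaces an earlier cached binding for the same handle.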


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 12/17] drm/i915: Pass vma to relocate entry
  2016-08-22  8:03 Execbuf fixes and major tuning Chris Wilson
                   ` (10 preceding siblings ...)
  2016-08-22  8:03 ` [PATCH 11/17] drm/i915: Store a direct lookup from object handle to vma Chris Wilson
@ 2016-08-22  8:03 ` Chris Wilson
  2016-08-22  8:03 ` [PATCH 13/17] drm/i915: Eliminate lots of iterations over the execobjects array Chris Wilson
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 47+ messages in thread
From: Chris Wilson @ 2016-08-22  8:03 UTC (permalink / raw)
  To: intel-gfx

We can simplify our tracking of pending writes in an execbuf down to a
single bit in vma->exec_entry->flags, but that requires the relocation
function to know the object's vma. Pass it along.
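
The shape of the change, in a hedged userspace sketch (exec_entry, note_reloc and DEMO_OBJECT_WRITE are invented stand-ins for the driver's structures and the uAPI EXEC_OBJECT_WRITE flag): a write relocation sets a single bit on the target's entry, and later stages consult only that bit when deciding how to synchronise.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_OBJECT_WRITE	(1u << 2)	/* stand-in for EXEC_OBJECT_WRITE */

struct exec_entry {
	uint32_t handle;
	uint32_t flags;
};

struct reloc {
	uint32_t target_handle;
	uint32_t write_domain;	/* non-zero: the GPU writes through this reloc */
};

/* a write relocation is recorded as one bit on the target's entry */
static void note_reloc(struct exec_entry *target, const struct reloc *r)
{
	if (r->write_domain)
		target->flags |= DEMO_OBJECT_WRITE;
}

/* later, the submission path only needs this bit to pick the wait mode */
static bool needs_write_sync(const struct exec_entry *e)
{
	return e->flags & DEMO_OBJECT_WRITE;
}

int main(void)
{
	struct exec_entry target = { .handle = 1 };
	struct reloc r = { .target_handle = 1, .write_domain = 0x2 };

	note_reloc(&target, &r);
	printf("sync %u for write: %s\n", target.handle,
	       needs_write_sync(&target) ? "yes" : "no");
	return 0;
}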

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h            |   3 +-
 drivers/gpu/drm/i915/i915_gem.c            |   5 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 110 ++++++++++++-----------------
 drivers/gpu/drm/i915/intel_display.c       |   2 +-
 4 files changed, 53 insertions(+), 67 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e7e7840d5a68..899fe983e623 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3256,7 +3256,8 @@ i915_gem_obj_finish_shmem_access(struct drm_i915_gem_object *obj)
 
 int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
 int i915_gem_object_sync(struct drm_i915_gem_object *obj,
-			 struct drm_i915_gem_request *to);
+			 struct drm_i915_gem_request *to,
+			 bool write);
 void i915_vma_move_to_active(struct i915_vma *vma,
 			     struct drm_i915_gem_request *req,
 			     unsigned int flags);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0ca3ef547136..78faac2b780c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2798,7 +2798,8 @@ __i915_gem_object_sync(struct drm_i915_gem_request *to,
  */
 int
 i915_gem_object_sync(struct drm_i915_gem_object *obj,
-		     struct drm_i915_gem_request *to)
+		     struct drm_i915_gem_request *to,
+		     bool write)
 {
 	struct i915_gem_active *active;
 	unsigned long active_mask;
@@ -2810,7 +2811,7 @@ i915_gem_object_sync(struct drm_i915_gem_object *obj,
 	if (!active_mask)
 		return 0;
 
-	if (obj->base.pending_write_domain) {
+	if (write) {
 		active = obj->last_read;
 	} else {
 		active_mask = 1;
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 35751e855859..e9ac591f3a79 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -617,42 +617,25 @@ static bool object_is_idle(struct drm_i915_gem_object *obj)
 }
 
 static int
-eb_relocate_entry(struct drm_i915_gem_object *obj,
+eb_relocate_entry(struct i915_vma *vma,
 		  struct i915_execbuffer *eb,
 		  struct drm_i915_gem_relocation_entry *reloc)
 {
-	struct drm_gem_object *target_obj;
-	struct drm_i915_gem_object *target_i915_obj;
-	struct i915_vma *target_vma;
-	uint64_t target_offset;
+	struct i915_vma *target;
+	u64 target_offset;
 	int ret;
 
 	/* we've already hold a reference to all valid objects */
-	target_vma = eb_get_vma(eb, reloc->target_handle);
-	if (unlikely(target_vma == NULL))
+	target = eb_get_vma(eb, reloc->target_handle);
+	if (unlikely(!target))
 		return -ENOENT;
-	target_i915_obj = target_vma->obj;
-	target_obj = &target_vma->obj->base;
-
-	target_offset = gen8_canonical_addr(target_vma->node.start);
-
-	/* Sandybridge PPGTT errata: We need a global gtt mapping for MI and
-	 * pipe_control writes because the gpu doesn't properly redirect them
-	 * through the ppgtt for non_secure batchbuffers. */
-	if (unlikely(IS_GEN6(eb->i915) &&
-		     reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION)) {
-		ret = i915_vma_bind(target_vma, target_i915_obj->cache_level,
-				    PIN_GLOBAL);
-		if (WARN_ONCE(ret, "Unexpected failure to bind target VMA!"))
-			return ret;
-	}
 
 	/* Validate that the target is in a valid r/w GPU domain */
 	if (unlikely(reloc->write_domain & (reloc->write_domain - 1))) {
 		DRM_DEBUG("reloc with multiple write domains: "
-			  "obj %p target %d offset %d "
+			  "target %d offset %d "
 			  "read %08x write %08x",
-			  obj, reloc->target_handle,
+			  reloc->target_handle,
 			  (int) reloc->offset,
 			  reloc->read_domains,
 			  reloc->write_domain);
@@ -661,47 +644,60 @@ eb_relocate_entry(struct drm_i915_gem_object *obj,
 	if (unlikely((reloc->write_domain | reloc->read_domains)
 		     & ~I915_GEM_GPU_DOMAINS)) {
 		DRM_DEBUG("reloc with read/write non-GPU domains: "
-			  "obj %p target %d offset %d "
+			  "target %d offset %d "
 			  "read %08x write %08x",
-			  obj, reloc->target_handle,
+			  reloc->target_handle,
 			  (int) reloc->offset,
 			  reloc->read_domains,
 			  reloc->write_domain);
 		return -EINVAL;
 	}
 
-	target_obj->pending_read_domains |= reloc->read_domains;
-	target_obj->pending_write_domain |= reloc->write_domain;
+	if (reloc->write_domain)
+		target->exec_entry->flags |= EXEC_OBJECT_WRITE;
+
+	/* Sandybridge PPGTT errata: We need a global gtt mapping for MI and
+	 * pipe_control writes because the gpu doesn't properly redirect them
+	 * through the ppgtt for non_secure batchbuffers.
+	 */
+	if (unlikely(IS_GEN6(eb->i915) &&
+		     reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION)) {
+		ret = i915_vma_bind(target, target->obj->cache_level,
+				    PIN_GLOBAL);
+		if (WARN_ONCE(ret, "Unexpected failure to bind target VMA!"))
+			return ret;
+	}
 
 	/* If the relocation already has the right value in it, no
 	 * more work needs to be done.
 	 */
+	target_offset = gen8_canonical_addr(target->node.start);
 	if (target_offset == reloc->presumed_offset)
 		return 0;
 
 	/* Check that the relocation address is valid... */
 	if (unlikely(reloc->offset >
-		     obj->base.size - (eb->reloc_cache.use_64bit_reloc ? 8 : 4))) {
+		     vma->size - (eb->reloc_cache.use_64bit_reloc ? 8 : 4))) {
 		DRM_DEBUG("Relocation beyond object bounds: "
-			  "obj %p target %d offset %d size %d.\n",
-			  obj, reloc->target_handle,
-			  (int) reloc->offset,
-			  (int) obj->base.size);
+			  "target %d offset %d size %d.\n",
+			  reloc->target_handle,
+			  (int)reloc->offset,
+			  (int)vma->size);
 		return -EINVAL;
 	}
 	if (unlikely(reloc->offset & 3)) {
 		DRM_DEBUG("Relocation not 4-byte aligned: "
-			  "obj %p target %d offset %d.\n",
-			  obj, reloc->target_handle,
-			  (int) reloc->offset);
+			  "target %d offset %d.\n",
+			  reloc->target_handle,
+			  (int)reloc->offset);
 		return -EINVAL;
 	}
 
 	/* We can't wait for rendering with pagefaults disabled */
-	if (pagefault_disabled() && !object_is_idle(obj))
+	if (pagefault_disabled() && !object_is_idle(vma->obj))
 		return -EFAULT;
 
-	ret = relocate_entry(obj, reloc, &eb->reloc_cache, target_offset);
+	ret = relocate_entry(vma->obj, reloc, &eb->reloc_cache, target_offset);
 	if (ret)
 		return ret;
 
@@ -736,7 +732,7 @@ static int eb_relocate_vma(struct i915_vma *vma, struct i915_execbuffer *eb)
 		do {
 			u64 offset = r->presumed_offset;
 
-			ret = eb_relocate_entry(vma->obj, eb, r);
+			ret = eb_relocate_entry(vma, eb, r);
 			if (ret)
 				goto out;
 
@@ -767,7 +763,7 @@ eb_relocate_vma_slow(struct i915_vma *vma,
 	int i, ret = 0;
 
 	for (i = 0; i < entry->relocation_count; i++) {
-		ret = eb_relocate_entry(vma->obj, eb, &relocs[i]);
+		ret = eb_relocate_entry(vma, eb, &relocs[i]);
 		if (ret)
 			break;
 	}
@@ -809,7 +805,6 @@ eb_reserve_vma(struct i915_vma *vma,
 	       struct intel_engine_cs *engine,
 	       bool *need_reloc)
 {
-	struct drm_i915_gem_object *obj = vma->obj;
 	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
 	uint64_t flags;
 	int ret;
@@ -863,11 +858,6 @@ eb_reserve_vma(struct i915_vma *vma,
 		*need_reloc = true;
 	}
 
-	if (entry->flags & EXEC_OBJECT_WRITE) {
-		obj->base.pending_read_domains = I915_GEM_DOMAIN_RENDER;
-		obj->base.pending_write_domain = I915_GEM_DOMAIN_RENDER;
-	}
-
 	return 0;
 }
 
@@ -930,7 +920,6 @@ eb_vma_misplaced(struct i915_vma *vma)
 static int eb_reserve(struct i915_execbuffer *eb)
 {
 	const bool has_fenced_gpu_access = INTEL_GEN(eb->i915) < 4;
-	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
 	struct list_head ordered_vmas;
 	struct list_head pinned_vmas;
@@ -943,7 +932,6 @@ static int eb_reserve(struct i915_execbuffer *eb)
 		bool need_fence, need_mappable;
 
 		vma = list_first_entry(&eb->vmas, struct i915_vma, exec_list);
-		obj = vma->obj;
 		entry = vma->exec_entry;
 
 		if (eb->ctx->flags & CONTEXT_NO_ZEROMAP)
@@ -963,9 +951,6 @@ static int eb_reserve(struct i915_execbuffer *eb)
 			list_move(&vma->exec_list, &ordered_vmas);
 		} else
 			list_move_tail(&vma->exec_list, &ordered_vmas);
-
-		obj->base.pending_read_domains = I915_GEM_GPU_DOMAINS & ~I915_GEM_DOMAIN_COMMAND;
-		obj->base.pending_write_domain = 0;
 	}
 	list_splice(&ordered_vmas, &eb->vmas);
 	list_splice(&pinned_vmas, &eb->vmas);
@@ -1144,7 +1129,9 @@ eb_move_to_gpu(struct i915_execbuffer *eb)
 		struct drm_i915_gem_object *obj = vma->obj;
 
 		if (obj->flags & other_rings) {
-			ret = i915_gem_object_sync(obj, eb->request);
+			ret = i915_gem_object_sync(obj,
+						   eb->request,
+						   vma->exec_entry->flags & EXEC_OBJECT_WRITE);
 			if (ret)
 				return ret;
 		}
@@ -1349,12 +1336,10 @@ eb_move_to_active(struct i915_execbuffer *eb)
 		u32 old_read = obj->base.read_domains;
 		u32 old_write = obj->base.write_domain;
 
-		obj->base.write_domain = obj->base.pending_write_domain;
-		if (obj->base.write_domain)
-			vma->exec_entry->flags |= EXEC_OBJECT_WRITE;
-		else
-			obj->base.pending_read_domains |= obj->base.read_domains;
-		obj->base.read_domains = obj->base.pending_read_domains;
+		obj->base.write_domain = 0;
+		if (vma->exec_entry->flags & EXEC_OBJECT_WRITE)
+			obj->base.read_domains = 0;
+		obj->base.read_domains |= I915_GEM_GPU_DOMAINS;
 
 		i915_vma_move_to_active(vma, eb->request, vma->exec_entry->flags);
 		eb_export_fence(obj, eb->request, vma->exec_entry->flags);
@@ -1704,7 +1689,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	}
 
 	/* Set the pending read domains for the batch buffer to COMMAND */
-	if (eb.batch->obj->base.pending_write_domain) {
+	if (eb.batch->exec_entry->flags & EXEC_OBJECT_WRITE) {
 		DRM_DEBUG("Attempting to use self-modifying batch buffer\n");
 		ret = -EINVAL;
 		goto err;
@@ -1742,10 +1727,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		}
 	}
 
-	eb.batch->obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
-	if (args->batch_len == 0)
-		args->batch_len = eb.batch->size - eb.batch_start_offset;
-
 	/* snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
 	 * batch" bit. Hence we need to pin secure batches into the global gtt.
 	 * hsw should have this fixed, but bdw mucks it up again. */
@@ -1772,6 +1753,9 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		eb.batch = vma;
 	}
 
+	if (args->batch_len == 0)
+		args->batch_len = eb.batch->size - eb.batch_start_offset;
+
 	/* Allocate a request for this batch buffer nice and early. */
 	eb.request = i915_gem_request_alloc(eb.engine, eb.ctx);
 	if (IS_ERR(eb.request)) {
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 54e01631c1a9..4c6cb64ec220 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -12137,7 +12137,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 			goto cleanup_unpin;
 		}
 
-		ret = i915_gem_object_sync(obj, request);
+		ret = i915_gem_object_sync(obj, request, false);
 		if (ret)
 			goto cleanup_request;
 
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 13/17] drm/i915: Eliminate lots of iterations over the execobjects array
  2016-08-22  8:03 Execbuf fixes and major tuning Chris Wilson
                   ` (11 preceding siblings ...)
  2016-08-22  8:03 ` [PATCH 12/17] drm/i915: Pass vma to relocate entry Chris Wilson
@ 2016-08-22  8:03 ` Chris Wilson
  2016-08-25  6:03   ` [PATCH v2] " Chris Wilson
  2016-08-22  8:03 ` [PATCH 14/17] drm/i915: First try the previous execbuffer location Chris Wilson
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 47+ messages in thread
From: Chris Wilson @ 2016-08-22  8:03 UTC (permalink / raw)
  To: intel-gfx

The major scaling bottleneck in execbuffer is the processing of the
execobjects. Creating an auxiliary list is inefficient when compared to
using the execobject array we already have allocated.
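
A tiny model of how the execobject array itself can carry the lookup result (sketch only; demo_vma and exec_object are invented names, while the rsvd2 stashing and to_ptr() mirror the diff below): the resolved vma pointer lives in the entry's otherwise unused 64-bit rsvd2 field, so no auxiliary list has to be allocated or walked.

#include <stdint.h>
#include <stdio.h>

struct demo_vma {
	int id;
};

struct exec_object {
	uint32_t handle;
	uint64_t rsvd2;		/* reserved field, wide enough for a pointer */
};

#define to_ptr(T, x)	((T *)(uintptr_t)(x))

/* stash the resolved vma straight into the entry we already have */
static void resolve(struct exec_object *entry, struct demo_vma *vma)
{
	entry->rsvd2 = (uintptr_t)vma;
}

int main(void)
{
	struct demo_vma vma = { .id = 7 };
	struct exec_object exec[1] = { { .handle = 3 } };

	resolve(&exec[0], &vma);
	printf("entry 0 -> vma %d\n", to_ptr(struct demo_vma, exec[0].rsvd2)->id);
	return 0;
}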

Reservation is then split into phases. As we look up the VMA, we try to
bind it back into its active location. Only if that fails do we add it
to the unbound list for phase 2. In phase 2, we try to add all those
objects that could not fit into their previous location, with a
fallback to retrying all objects and evicting the VM in case of severe
fragmentation. (This is the same as before, except that phase 1 is now
done inline with looking up the VMA to avoid an iteration over the
execobject array. In the ideal case, we eliminate the separate
reservation phase.) During the reservation phase, we only evict from
the VM between passes (rather than, as currently, whilst trying to fit
every new VMA). In testing with Unreal Engine's Atlantis demo, which
stresses the eviction logic on gen7 class hardware, this speeds up the
framerate by a factor of 2.
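
A very rough userspace model of that two-pass flow (a sketch under invented names and sizes, not the driver code): the first pass binds whatever still fits, and only if something is left over do we evict the whole VM and retry once.

#include <stdbool.h>
#include <stdio.h>

#define VM_SIZE		8
static int vm_free = 3;		/* most of the VM is occupied by old bindings */

struct obj {
	int size;
	bool bound;
};

static bool try_bind(struct obj *o)
{
	if (o->bound)
		return true;		/* already resident, nothing to do */
	if (o->size > vm_free)
		return false;		/* the kernel would see -ENOSPC here */
	vm_free -= o->size;
	o->bound = true;
	return true;
}

/* stands in for evicting the full VM: drop every binding, ours included */
static void evict_vm(struct obj *objs, int n)
{
	for (int i = 0; i < n; i++)
		objs[i].bound = false;
	vm_free = VM_SIZE;
}

static int reserve(struct obj *objs, int n)
{
	for (int pass = 0; pass < 2; pass++) {
		int failed = 0;

		for (int i = 0; i < n; i++)	/* bind whatever is unbound */
			failed += !try_bind(&objs[i]);
		if (!failed)
			return 0;

		/* only between passes: too fragmented, wipe the VM and retry */
		evict_vm(objs, n);
	}
	return -1;			/* still no space after a full eviction */
}

int main(void)
{
	struct obj batch[3] = { {3}, {2}, {2} };

	printf("reserve returned %d, vm_free %d\n", reserve(batch, 3), vm_free);
	return 0;
}

The point mirrored here is that eviction happens between passes, not in the middle of fitting each individual object.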

The second loop amalgamation merges move_to_gpu and move_to_active. As
we always submit the request, even if incomplete, we can use the
current request to track the active VMAs as we perform the required
flushes and synchronisation.

The next big advancement is to avoid copying back to the user any
execobjects and relocations that are not changed.
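
A sketch of how "not changed" can be tracked cheaply (illustrative only; the single low bit used here is a stand-in for the UPDATE marker this patch ORs into page-aligned offsets): only entries whose offset actually moved get marked, and the writeback loop skips the rest.

#include <stdint.h>
#include <stdio.h>

#define UPDATE	0x1ull		/* low bit of a page-aligned offset is free */

static uint64_t place(uint64_t old_offset, uint64_t new_offset)
{
	if (old_offset == new_offset)
		return old_offset;	/* unchanged: nothing to copy back */
	return new_offset | UPDATE;	/* changed: mark for writeback */
}

int main(void)
{
	uint64_t placed[2] = { place(0x1000, 0x1000),	/* stays put */
			       place(0x2000, 0x3000) };	/* moved */

	for (int i = 0; i < 2; i++)
		if (placed[i] & UPDATE)
			printf("copy back entry %d -> %#llx\n", i,
			       (unsigned long long)(placed[i] & ~UPDATE));
	return 0;
}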

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h            |    3 +-
 drivers/gpu/drm/i915/i915_gem.c            |    2 +-
 drivers/gpu/drm/i915/i915_gem_evict.c      |   64 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 1555 +++++++++++++++-------------
 drivers/gpu/drm/i915/i915_gem_gtt.h        |    4 +-
 5 files changed, 874 insertions(+), 754 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 899fe983e623..a92c14ad30eb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3111,6 +3111,7 @@ int i915_gem_freeze_late(struct drm_i915_private *dev_priv);
 
 void *i915_gem_object_alloc(struct drm_device *dev);
 void i915_gem_object_free(struct drm_i915_gem_object *obj);
+bool i915_gem_object_flush_active(struct drm_i915_gem_object *obj);
 void i915_gem_object_init(struct drm_i915_gem_object *obj,
 			 const struct drm_i915_gem_object_ops *ops);
 struct drm_i915_gem_object *i915_gem_object_create(struct drm_device *dev,
@@ -3502,7 +3503,7 @@ int __must_check i915_gem_evict_something(struct i915_address_space *vm,
 					  unsigned flags);
 int __must_check i915_gem_evict_for_vma(struct i915_vma *vma,
 					unsigned int flags);
-int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle);
+int i915_gem_evict_vm(struct i915_address_space *vm);
 
 /* belongs in i915_gem_gtt.h */
 static inline void i915_gem_chipset_flush(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 78faac2b780c..1f35dd6219cb 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3034,7 +3034,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 			  size, obj->base.size,
 			  flags & PIN_MAPPABLE ? "mappable" : "total",
 			  end);
-		return -E2BIG;
+		return -ENOSPC;
 	}
 
 	ret = i915_gem_object_get_pages(obj);
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 79ce873b4891..d59b085de48e 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -55,7 +55,7 @@ mark_free(struct i915_vma *vma, unsigned int flags, struct list_head *unwind)
 	if (flags & PIN_NONFAULT && vma->obj->fault_mappable)
 		return false;
 
-	list_add(&vma->exec_list, unwind);
+	list_add(&vma->evict_link, unwind);
 	return drm_mm_scan_add_block(&vma->node);
 }
 
@@ -134,7 +134,7 @@ search_again:
 	} while (*++phase);
 
 	/* Nothing found, clean up and bail out! */
-	list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
+	list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
 		ret = drm_mm_scan_remove_block(&vma->node);
 		BUG_ON(ret);
 	}
@@ -179,16 +179,16 @@ found:
 	 * calling unbind (which may remove the active reference
 	 * of any of our objects, thus corrupting the list).
 	 */
-	list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
+	list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
 		if (drm_mm_scan_remove_block(&vma->node))
 			__i915_vma_pin(vma);
 		else
-			list_del(&vma->exec_list);
+			list_del(&vma->evict_link);
 	}
 
 	/* Unbinding will emit any required flushes */
 	ret = 0;
-	list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
+	list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
 		__i915_vma_unpin(vma);
 		if (ret == 0)
 			ret = i915_vma_unbind(vma);
@@ -260,10 +260,10 @@ int i915_gem_evict_for_vma(struct i915_vma *target, unsigned int flags)
 		}
 
 		__i915_vma_pin(vma);
-		list_add(&vma->exec_list, &eviction_list);
+		list_add(&vma->evict_link, &eviction_list);
 	}
 
-	list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
+	list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
 		__i915_vma_unpin(vma);
 		if (ret == 0)
 			ret = i915_vma_unbind(vma);
@@ -286,34 +286,48 @@ int i915_gem_evict_for_vma(struct i915_vma *target, unsigned int flags)
  * To clarify: This is for freeing up virtual address space, not for freeing
  * memory in e.g. the shrinker.
  */
-int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle)
+int i915_gem_evict_vm(struct i915_address_space *vm)
 {
+	struct list_head *phases[] = {
+		&vm->inactive_list,
+		&vm->active_list,
+		NULL
+	}, **phase;
+	struct list_head eviction_list;
 	struct i915_vma *vma, *next;
 	int ret;
 
 	WARN_ON(!mutex_is_locked(&vm->dev->struct_mutex));
 	trace_i915_gem_evict_vm(vm);
 
-	if (do_idle) {
-		struct drm_i915_private *dev_priv = to_i915(vm->dev);
-
-		if (i915_is_ggtt(vm)) {
-			ret = i915_gem_switch_to_kernel_context(dev_priv);
-			if (ret)
-				return ret;
-		}
-
-		ret = i915_gem_wait_for_idle(dev_priv, true);
+	/* Switch back to the default context in order to unpin
+	 * the existing context objects. However, such objects only
+	 * pin themselves inside the global GTT and performing the
+	 * switch otherwise is ineffective.
+	 */
+	if (i915_is_ggtt(vm)) {
+		ret = i915_gem_switch_to_kernel_context(to_i915(vm->dev));
 		if (ret)
 			return ret;
-
-		i915_gem_retire_requests(dev_priv);
-		WARN_ON(!list_empty(&vm->active_list));
 	}
 
-	list_for_each_entry_safe(vma, next, &vm->inactive_list, vm_link)
-		if (!i915_vma_is_pinned(vma))
-			WARN_ON(i915_vma_unbind(vma));
+	INIT_LIST_HEAD(&eviction_list);
+	phase = phases;
+	do {
+		list_for_each_entry(vma, *phase, vm_link) {
+			if (i915_vma_is_pinned(vma))
+				continue;
+
+			__i915_vma_pin(vma);
+			list_add(&vma->evict_link, &eviction_list);
+		}
+	} while (*++phase);
 
-	return 0;
+	ret = 0;
+	list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
+		__i915_vma_unpin(vma);
+		if (ret == 0)
+			ret = i915_vma_unbind(vma);
+	}
+	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index e9ac591f3a79..71b18fcbd8a7 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -41,11 +41,16 @@
 
 #define DBG_USE_CPU_RELOC 0 /* -1 force GTT relocs; 1 force CPU relocs */
 
-#define  __EXEC_OBJECT_HAS_PIN		(1<<31)
-#define  __EXEC_OBJECT_HAS_FENCE	(1<<30)
-#define  __EXEC_OBJECT_NEEDS_MAP	(1<<29)
-#define  __EXEC_OBJECT_NEEDS_BIAS	(1<<28)
+#define  __EXEC_OBJECT_HAS_PIN		BIT(31)
+#define  __EXEC_OBJECT_HAS_FENCE	BIT(30)
+#define  __EXEC_OBJECT_NEEDS_MAP	BIT(29)
+#define  __EXEC_OBJECT_NEEDS_BIAS	BIT(28)
 #define  __EXEC_OBJECT_INTERNAL_FLAGS (0xf<<28) /* all of the above */
+#define __EB_RESERVED (__EXEC_OBJECT_HAS_PIN | __EXEC_OBJECT_HAS_FENCE)
+
+#define __EXEC_HAS_RELOC	BIT(31)
+#define __EXEC_VALIDATED	BIT(30)
+#define UPDATE			PIN_OFFSET_FIXED
 
 #define BATCH_OFFSET_BIAS (256*1024)
 
@@ -59,21 +64,43 @@ struct i915_execbuffer {
 	struct i915_address_space *vm;
 	struct i915_vma *batch;
 	struct drm_i915_gem_request *request;
-	u32 batch_start_offset;
-	unsigned int dispatch_flags;
-	struct drm_i915_gem_exec_object2 shadow_exec_entry;
-	bool need_relocs;
-	struct list_head vmas;
+	struct list_head unbound;
+	struct list_head relocs;
 	struct reloc_cache {
 		struct drm_mm_node node;
 		unsigned long vaddr;
 		unsigned int page;
 		bool use_64bit_reloc;
+		bool has_llc;
+		bool has_fence;
 	} reloc_cache;
+	u64 invalid_flags;
+	u32 context_flags;
+	u32 dispatch_flags;
 	int lut_mask;
 	struct hlist_head *buckets;
 };
 
+#define to_ptr(T, x) ((T *)(uintptr_t)(x))
+
+/* Used to convert any address to canonical form.
+ * Starting from gen8, some commands (e.g. STATE_BASE_ADDRESS,
+ * MI_LOAD_REGISTER_MEM and others, see Broadwell PRM Vol2a) require the
+ * addresses to be in a canonical form:
+ * "GraphicsAddress[63:48] are ignored by the HW and assumed to be in correct
+ * canonical form [63:48] == [47]."
+ */
+#define GEN8_HIGH_ADDRESS_BIT 47
+static inline u64 gen8_canonical_addr(u64 address)
+{
+	return sign_extend64(address, GEN8_HIGH_ADDRESS_BIT);
+}
+
+static inline u64 gen8_noncanonical_addr(u64 address)
+{
+	return address & ((1ULL << (GEN8_HIGH_ADDRESS_BIT + 1)) - 1);
+}
+
 static int
 eb_create(struct i915_execbuffer *eb)
 {
@@ -101,80 +128,340 @@ eb_create(struct i915_execbuffer *eb)
 	return 0;
 }
 
+static bool
+eb_vma_misplaced(const struct drm_i915_gem_exec_object2 *entry,
+		 const struct i915_vma *vma)
+{
+	if ((entry->flags & __EXEC_OBJECT_HAS_PIN) == 0)
+		return true;
+
+	if (vma->node.size < entry->pad_to_size)
+		return true;
+
+	if (entry->alignment && vma->node.start & (entry->alignment - 1))
+		return true;
+
+	if (entry->flags & EXEC_OBJECT_PINNED &&
+	    vma->node.start != entry->offset)
+		return true;
+
+	if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS &&
+	    vma->node.start < BATCH_OFFSET_BIAS)
+		return true;
+
+	if ((entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) == 0 &&
+	    (vma->node.start + vma->node.size - 1) >> 32)
+		return true;
+
+	return false;
+}
+
+static void
+eb_pin_vma(struct i915_execbuffer *eb,
+	   struct drm_i915_gem_exec_object2 *entry,
+	   struct i915_vma *vma)
+{
+	u64 flags;
+
+	flags = vma->node.start;
+	flags |= PIN_USER | PIN_NONBLOCK | PIN_OFFSET_FIXED;
+	if (unlikely(entry->flags & EXEC_OBJECT_NEEDS_GTT))
+		flags |= PIN_GLOBAL;
+	if (unlikely(i915_vma_pin(vma, 0, 0, flags)))
+		return;
+
+	if (entry->flags & EXEC_OBJECT_NEEDS_FENCE) {
+		if (unlikely(i915_vma_get_fence(vma))) {
+			i915_vma_unpin(vma);
+			return;
+		}
+
+		if (i915_vma_pin_fence(vma))
+			entry->flags |= __EXEC_OBJECT_HAS_FENCE;
+	}
+
+	entry->flags |= __EXEC_OBJECT_HAS_PIN;
+}
+
 static inline void
 __eb_unreserve_vma(struct i915_vma *vma,
 		   const struct drm_i915_gem_exec_object2 *entry)
 {
+	GEM_BUG_ON((entry->flags & __EXEC_OBJECT_HAS_PIN) == 0);
+
 	if (unlikely(entry->flags & __EXEC_OBJECT_HAS_FENCE))
 		i915_vma_unpin_fence(vma);
 
-	if (entry->flags & __EXEC_OBJECT_HAS_PIN)
-		__i915_vma_unpin(vma);
+	__i915_vma_unpin(vma);
 }
 
-static void
-eb_unreserve_vma(struct i915_vma *vma)
+static inline void
+eb_unreserve_vma(struct i915_vma *vma,
+		 struct drm_i915_gem_exec_object2 *entry)
 {
-	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-
-	__eb_unreserve_vma(vma, entry);
-	entry->flags &= ~(__EXEC_OBJECT_HAS_FENCE | __EXEC_OBJECT_HAS_PIN);
+	if (entry->flags & __EXEC_OBJECT_HAS_PIN) {
+		__eb_unreserve_vma(vma, entry);
+		entry->flags &= ~__EB_RESERVED;
+	}
 }
 
-static void
-eb_reset(struct i915_execbuffer *eb)
+static int
+eb_add_vma(struct i915_execbuffer *eb,
+	   struct drm_i915_gem_exec_object2 *entry,
+	   struct i915_vma *vma)
 {
-	struct i915_vma *vma;
+	int ret;
 
-	list_for_each_entry(vma, &eb->vmas, exec_list) {
-		eb_unreserve_vma(vma);
-		vma->exec_entry = NULL;
-	}
+	GEM_BUG_ON(i915_vma_is_closed(vma));
 
-	if (eb->lut_mask >= 0)
-		memset(eb->buckets, 0,
-		       (1<<eb->lut_mask)*sizeof(struct hlist_head));
-}
+	if ((eb->args->flags & __EXEC_VALIDATED) == 0) {
+		if (unlikely(entry->flags & eb->invalid_flags))
+			return -EINVAL;
 
-#define to_ptr(T, x) ((T *)(uintptr_t)(x))
+		if (unlikely(entry->alignment && !is_power_of_2(entry->alignment)))
+			return -EINVAL;
 
-static bool
-eb_add_vma(struct i915_execbuffer *eb, struct i915_vma *vma, int i)
-{
-	if (unlikely(vma->exec_entry)) {
-		DRM_DEBUG("Object [handle %d, index %d] appears more than once in object list\n",
-			  eb->exec[i].handle, i);
-		return false;
+		/* Offset can be used as input (EXEC_OBJECT_PINNED), reject
+		 * any non-page-aligned or non-canonical addresses.
+		 */
+		if (entry->flags & EXEC_OBJECT_PINNED) {
+			if (unlikely(entry->offset !=
+				     gen8_canonical_addr(entry->offset & PAGE_MASK)))
+				return -EINVAL;
+		}
+
+		/* From drm_mm perspective address space is continuous,
+		 * so from this point we're always using non-canonical
+		 * form internally.
+		 */
+		entry->offset = gen8_noncanonical_addr(entry->offset);
+
+		/* pad_to_size was once a reserved field, so sanitize it */
+		if (entry->flags & EXEC_OBJECT_PAD_TO_SIZE) {
+			if (unlikely(offset_in_page(entry->pad_to_size)))
+				return -EINVAL;
+		} else {
+			entry->pad_to_size = 0;
+		}
+
+		if (unlikely(vma->exec_entry)) {
+			DRM_DEBUG("Object [handle %d, index %d] appears more than once in object list\n",
+				  entry->handle, (int)(entry - eb->exec));
+			return -EINVAL;
+		}
 	}
-	list_add_tail(&vma->exec_list, &eb->vmas);
 
-	vma->exec_entry = &eb->exec[i];
+	vma->exec_entry = entry;
+	entry->rsvd2 = (uintptr_t)vma;
+
 	if (eb->lut_mask >= 0) {
-		vma->exec_handle = eb->exec[i].handle;
+		vma->exec_handle = entry->handle;
 		hlist_add_head(&vma->exec_node,
-			       &eb->buckets[hash_32(vma->exec_handle,
+			       &eb->buckets[hash_32(entry->handle,
 						    eb->lut_mask)]);
 	}
 
-	eb->exec[i].rsvd2 = (uintptr_t)vma;
-	return true;
+	if (entry->relocation_count)
+		list_add_tail(&vma->reloc_link, &eb->relocs);
+
+	if (!eb->reloc_cache.has_fence) {
+		entry->flags &= ~EXEC_OBJECT_NEEDS_FENCE;
+	} else {
+		if (entry->flags & EXEC_OBJECT_NEEDS_FENCE &&
+		    i915_gem_object_is_tiled(vma->obj))
+			entry->flags |= EXEC_OBJECT_NEEDS_GTT | __EXEC_OBJECT_NEEDS_MAP;
+	}
+
+	if ((entry->flags & EXEC_OBJECT_PINNED) == 0)
+		entry->flags |= eb->context_flags;
+
+	ret = 0;
+	if (vma->node.size)
+		eb_pin_vma(eb, entry, vma);
+	if (eb_vma_misplaced(entry, vma)) {
+		eb_unreserve_vma(vma, entry);
+
+		list_add_tail(&vma->exec_link, &eb->unbound);
+		if (drm_mm_node_allocated(&vma->node))
+			ret = i915_vma_unbind(vma);
+	} else {
+		if (entry->offset != vma->node.start) {
+			entry->offset = vma->node.start | UPDATE;
+			eb->args->flags |= __EXEC_HAS_RELOC;
+		}
+	}
+	return ret;
 }
 
-static inline struct hlist_head *ht_head(struct i915_gem_context *ctx,
-					 u32 handle)
+static inline int use_cpu_reloc(const struct reloc_cache *cache,
+				const struct drm_i915_gem_object *obj)
+{
+	if (!i915_gem_object_has_struct_page(obj))
+		return false;
+
+	if (DBG_USE_CPU_RELOC)
+		return DBG_USE_CPU_RELOC > 0;
+
+	return (cache->has_llc ||
+		obj->base.write_domain == I915_GEM_DOMAIN_CPU ||
+		obj->cache_level != I915_CACHE_NONE);
+}
+
+static int
+eb_reserve_vma(struct i915_execbuffer *eb, struct i915_vma *vma)
+{
+	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
+	u64 flags;
+	int ret;
+
+	flags = PIN_USER | PIN_NONBLOCK;
+	if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
+		flags |= PIN_GLOBAL;
+
+	if (!drm_mm_node_allocated(&vma->node)) {
+		/* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
+		 * limit address to the first 4GBs for unflagged objects.
+		 */
+		if ((entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) == 0)
+			flags |= PIN_ZONE_4G;
+
+		if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
+			flags |= PIN_MAPPABLE;
+
+		if (entry->flags & EXEC_OBJECT_PINNED) {
+			flags |= entry->offset | PIN_OFFSET_FIXED;
+			/* force overlapping PINNED checks */
+			flags &= ~PIN_NONBLOCK;
+		} else if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
+			flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
+	}
+
+	ret = i915_vma_pin(vma, entry->pad_to_size, entry->alignment, flags);
+	if (ret)
+		return ret;
+
+	if (entry->offset != vma->node.start) {
+		entry->offset = vma->node.start | UPDATE;
+		eb->args->flags |= __EXEC_HAS_RELOC;
+	}
+	entry->flags |= __EXEC_OBJECT_HAS_PIN;
+
+	if (entry->flags & EXEC_OBJECT_NEEDS_FENCE) {
+		ret = i915_vma_get_fence(vma);
+		if (ret)
+			return ret;
+
+		if (i915_vma_pin_fence(vma))
+			entry->flags |= __EXEC_OBJECT_HAS_FENCE;
+	}
+
+	GEM_BUG_ON(eb_vma_misplaced(entry, vma));
+	return 0;
+}
+
+static int eb_reserve(struct i915_execbuffer *eb)
+{
+	const unsigned int count = eb->args->buffer_count;
+	struct list_head last;
+	struct i915_vma *vma;
+	unsigned int i, pass;
+	int ret;
+
+	/* Attempt to pin all of the buffers into the GTT.
+	 * This is done in 3 phases:
+	 *
+	 * 1a. Unbind all objects that do not match the GTT constraints for
+	 *     the execbuffer (fenceable, mappable, alignment etc).
+	 * 1b. Increment pin count for already bound objects.
+	 * 2.  Bind new objects.
+	 * 3.  Decrement pin count.
+	 *
+	 * This avoid unnecessary unbinding of later objects in order to make
+	 * room for the earlier objects *unless* we need to defragment.
+	 */
+
+	pass = 0;
+	ret = 0;
+	do {
+		list_for_each_entry(vma, &eb->unbound, exec_link) {
+			ret = eb_reserve_vma(eb, vma);
+			if (ret)
+				break;
+		}
+		if (ret != -ENOSPC || pass++)
+			return ret;
+
+		/* Resort *all* the objects into priority order */
+		INIT_LIST_HEAD(&eb->unbound);
+		INIT_LIST_HEAD(&last);
+		for (i = 0; i < count; i++) {
+			struct drm_i915_gem_exec_object2 *entry = &eb->exec[i];
+
+			vma = to_ptr(struct i915_vma, entry->rsvd2);
+			eb_unreserve_vma(vma, entry);
+
+			if (entry->flags & EXEC_OBJECT_PINNED)
+				list_add(&vma->exec_link, &eb->unbound);
+			else if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
+				list_add_tail(&vma->exec_link, &eb->unbound);
+			else
+				list_add_tail(&vma->exec_link, &last);
+		}
+		list_splice_tail(&last, &eb->unbound);
+
+		/* Too fragmented, unbind everything and retry */
+		ret = i915_gem_evict_vm(eb->vm);
+		if (ret)
+			return ret;
+	} while (1);
+}
+
+static inline struct hlist_head *
+ht_head(const struct i915_gem_context *ctx, u32 handle)
 {
 	return &ctx->vma.ht[hash_32(handle, ctx->vma.ht_bits)];
 }
 
+static int eb_batch_index(const struct i915_execbuffer *eb)
+{
+	return eb->args->buffer_count - 1;
+}
+
+static int eb_select_context(struct i915_execbuffer *eb)
+{
+	struct i915_gem_context *ctx;
+
+	ctx = i915_gem_context_lookup(eb->file->driver_priv, eb->args->rsvd1);
+	if (unlikely(IS_ERR(ctx)))
+		return PTR_ERR(ctx);
+
+	if (unlikely(ctx->hang_stats.banned)) {
+		DRM_DEBUG("Context %u tried to submit while banned\n",
+			  ctx->user_handle);
+		return -EIO;
+	}
+
+	eb->ctx = i915_gem_context_get(ctx);
+	eb->vm = ctx->ppgtt ? &ctx->ppgtt->base : &eb->i915->ggtt.base;
+
+	eb->context_flags = 0;
+	if (ctx->flags & CONTEXT_NO_ZEROMAP)
+		eb->context_flags |= __EXEC_OBJECT_NEEDS_BIAS;
+
+	return 0;
+}
+
 static int
 eb_lookup_vmas(struct i915_execbuffer *eb)
 {
 	const int count = eb->args->buffer_count;
 	struct i915_vma *vma;
+	struct idr *idr;
 	int slow_pass = -1;
-	int i;
+	int i, ret;
 
-	INIT_LIST_HEAD(&eb->vmas);
+	INIT_LIST_HEAD(&eb->relocs);
+	INIT_LIST_HEAD(&eb->unbound);
 
 	if (unlikely(eb->ctx->vma.ht_size & 1))
 		flush_work(&eb->ctx->vma.resize);
@@ -187,8 +474,9 @@ eb_lookup_vmas(struct i915_execbuffer *eb)
 			if (vma->ctx_handle != eb->exec[i].handle)
 				continue;
 
-			if (!eb_add_vma(eb, vma, i))
-				return -EINVAL;
+			ret = eb_add_vma(eb, &eb->exec[i], vma);
+			if (unlikely(ret))
+				return ret;
 
 			goto next_vma;
 		}
@@ -199,24 +487,25 @@ next_vma: ;
 	}
 
 	if (slow_pass < 0)
-		return 0;
+		goto out;
 
 	spin_lock(&eb->file->table_lock);
 	/* Grab a reference to the object and release the lock so we can lookup
 	 * or create the VMA without using GFP_ATOMIC */
+	idr = &eb->file->object_idr;
 	for (i = slow_pass; i < count; i++) {
 		struct drm_i915_gem_object *obj;
 
 		if (eb->exec[i].rsvd2)
 			continue;
 
-		obj = to_intel_bo(idr_find(&eb->file->object_idr,
-					   eb->exec[i].handle));
+		obj = to_intel_bo(idr_find(idr, eb->exec[i].handle));
 		if (unlikely(!obj)) {
 			spin_unlock(&eb->file->table_lock);
 			DRM_DEBUG("Invalid object handle %d at index %d\n",
 				  eb->exec[i].handle, i);
-			return -ENOENT;
+			ret = -ENOENT;
+			goto err;
 		}
 
 		eb->exec[i].rsvd2 = 1 | (uintptr_t)obj;
@@ -237,11 +526,12 @@ next_vma: ;
 		 * from the (obj, vm) we don't run the risk of creating
 		 * duplicated vmas for the same vm.
 		 */
-		obj = to_ptr(struct drm_i915_gem_object, eb->exec[i].rsvd2 & ~1);
+		obj = to_ptr(typeof(*obj), eb->exec[i].rsvd2 & ~1);
 		vma = i915_gem_obj_lookup_or_create_vma(obj, eb->vm, NULL);
 		if (unlikely(IS_ERR(vma))) {
 			DRM_DEBUG("Failed to lookup VMA\n");
-			return PTR_ERR(vma);
+			ret = PTR_ERR(vma);
+			goto err;
 		}
 
 		/* First come, first served */
@@ -257,8 +547,9 @@ next_vma: ;
 			}
 		}
 
-		if (!eb_add_vma(eb, vma, i))
-			return -EINVAL;
+		ret = eb_add_vma(eb, &eb->exec[i], vma);
+		if (unlikely(ret))
+			goto err;
 	}
 	if (4*eb->ctx->vma.ht_count > 3*eb->ctx->vma.ht_size ||
 	    4*eb->ctx->vma.ht_count < eb->ctx->vma.ht_size) {
@@ -266,15 +557,10 @@ next_vma: ;
 		queue_work(system_highpri_wq, &eb->ctx->vma.resize);
 	}
 
-	return 0;
-}
-
-static struct i915_vma *
-eb_get_batch(struct i915_execbuffer *eb)
-{
-	struct i915_vma *vma;
-
-	vma = to_ptr(struct i915_vma, eb->exec[eb->args->buffer_count-1].rsvd2);
+out:
+	/* take note of the batch buffer before we might reorder the lists */
+	i = eb_batch_index(eb);
+	eb->batch = to_ptr(struct i915_vma, eb->exec[i].rsvd2);
 
 	/*
 	 * SNA is doing fancy tricks with compressing batch buffers, which leads
@@ -285,14 +571,24 @@ eb_get_batch(struct i915_execbuffer *eb)
 	 * Note that actual hangs have only been observed on gen7, but for
 	 * paranoia do it everywhere.
 	 */
-	if ((vma->exec_entry->flags & EXEC_OBJECT_PINNED) == 0)
-		vma->exec_entry->flags |= __EXEC_OBJECT_NEEDS_BIAS;
+	if ((eb->exec[i].flags & EXEC_OBJECT_PINNED) == 0)
+		eb->exec[i].flags |= __EXEC_OBJECT_NEEDS_BIAS;
+	if (eb->reloc_cache.has_fence)
+		eb->exec[i].flags |= EXEC_OBJECT_NEEDS_FENCE;
 
-	return vma;
+	eb->args->flags |= __EXEC_VALIDATED;
+	return eb_reserve(eb);
+
+err:
+	for (i = slow_pass; i < count; i++) {
+		if (eb->exec[i].rsvd2 & 1)
+			eb->exec[i].rsvd2 = 0;
+	}
+	return ret;
 }
 
 static struct i915_vma *
-eb_get_vma(struct i915_execbuffer *eb, unsigned long handle)
+eb_get_vma(const struct i915_execbuffer *eb, unsigned long handle)
 {
 	if (eb->lut_mask < 0) {
 		if (handle >= -eb->lut_mask)
@@ -311,60 +607,58 @@ eb_get_vma(struct i915_execbuffer *eb, unsigned long handle)
 	}
 }
 
-static void eb_destroy(struct i915_execbuffer *eb)
+static void
+eb_reset(const struct i915_execbuffer *eb)
 {
-	struct i915_vma *vma;
+	const unsigned int count = eb->args->buffer_count;
+	unsigned int i;
 
-	list_for_each_entry(vma, &eb->vmas, exec_list) {
-		if (!vma->exec_entry)
-			continue;
+	for (i = 0; i < count; i++) {
+		struct drm_i915_gem_exec_object2 *entry = &eb->exec[i];
+		struct i915_vma *vma = to_ptr(struct i915_vma, entry->rsvd2);
 
-		__eb_unreserve_vma(vma, vma->exec_entry);
+		eb_unreserve_vma(vma, entry);
 		vma->exec_entry = NULL;
 	}
 
-	i915_gem_context_put(eb->ctx);
-
 	if (eb->lut_mask >= 0)
-		kfree(eb->buckets);
+		memset(eb->buckets, 0,
+		       (1<<eb->lut_mask)*sizeof(struct hlist_head));
 }
 
-static inline int use_cpu_reloc(struct drm_i915_gem_object *obj)
+static void eb_release_vma(const struct i915_execbuffer *eb)
 {
-	if (!i915_gem_object_has_struct_page(obj))
-		return false;
+	const unsigned int count = eb->args->buffer_count;
+	unsigned int i;
 
-	if (DBG_USE_CPU_RELOC)
-		return DBG_USE_CPU_RELOC > 0;
+	if (!eb->exec)
+		return;
 
-	return (HAS_LLC(obj->base.dev) ||
-		obj->base.write_domain == I915_GEM_DOMAIN_CPU ||
-		obj->cache_level != I915_CACHE_NONE);
-}
+	for (i = 0; i < count; i++) {
+		struct drm_i915_gem_exec_object2 *entry = &eb->exec[i];
+		struct i915_vma *vma = to_ptr(struct i915_vma, entry->rsvd2);
 
-/* Used to convert any address to canonical form.
- * Starting from gen8, some commands (e.g. STATE_BASE_ADDRESS,
- * MI_LOAD_REGISTER_MEM and others, see Broadwell PRM Vol2a) require the
- * addresses to be in a canonical form:
- * "GraphicsAddress[63:48] are ignored by the HW and assumed to be in correct
- * canonical form [63:48] == [47]."
- */
-#define GEN8_HIGH_ADDRESS_BIT 47
-static inline uint64_t gen8_canonical_addr(uint64_t address)
-{
-	return sign_extend64(address, GEN8_HIGH_ADDRESS_BIT);
+		if (!vma || !vma->exec_entry)
+			continue;
+
+		GEM_BUG_ON(vma->exec_entry != entry);
+		if (entry->flags & __EXEC_OBJECT_HAS_PIN)
+			__eb_unreserve_vma(vma, entry);
+		vma->exec_entry = NULL;
+	}
 }
 
-static inline uint64_t gen8_noncanonical_addr(uint64_t address)
+static void eb_destroy(const struct i915_execbuffer *eb)
 {
-	return address & ((1ULL << (GEN8_HIGH_ADDRESS_BIT + 1)) - 1);
+	if (eb->lut_mask >= 0)
+		kfree(eb->buckets);
 }
 
-static inline uint64_t
+static inline u64
 relocation_target(const struct drm_i915_gem_relocation_entry *reloc,
-		  uint64_t target_offset)
+		  const struct i915_vma *target)
 {
-	return gen8_canonical_addr((int)reloc->delta + target_offset);
+	return gen8_canonical_addr((int)reloc->delta + target->node.start);
 }
 
 static void reloc_cache_init(struct reloc_cache *cache,
@@ -372,6 +666,8 @@ static void reloc_cache_init(struct reloc_cache *cache,
 {
 	cache->page = -1;
 	cache->vaddr = 0;
+	cache->has_llc = HAS_LLC(i915);
+	cache->has_fence = INTEL_GEN(i915) < 4;
 	cache->use_64bit_reloc = INTEL_GEN(i915) >= 8;
 	cache->node.allocated = false;
 }
@@ -484,7 +780,7 @@ static void *reloc_iomap(struct drm_i915_gem_object *obj,
 		struct i915_vma *vma;
 		int ret;
 
-		if (use_cpu_reloc(obj))
+		if (use_cpu_reloc(cache, obj))
 			return NULL;
 
 		ret = i915_gem_object_set_to_gtt_domain(obj, true);
@@ -572,17 +868,17 @@ static void clflush_write32(u32 *addr, u32 value, unsigned int flushes)
 		*addr = value;
 }
 
-static int
+static u64
 relocate_entry(struct drm_i915_gem_object *obj,
 	       const struct drm_i915_gem_relocation_entry *reloc,
 	       struct reloc_cache *cache,
-	       u64 target_offset)
+	       const struct i915_vma *target)
 {
 	u64 offset = reloc->offset;
+	u64 target_offset = relocation_target(reloc, target);
 	bool wide = cache->use_64bit_reloc;
 	void *vaddr;
 
-	target_offset = relocation_target(reloc, target_offset);
 repeat:
 	vaddr = reloc_vaddr(obj, cache, offset >> PAGE_SHIFT);
 	if (IS_ERR(vaddr))
@@ -599,7 +895,7 @@ repeat:
 		goto repeat;
 	}
 
-	return 0;
+	return gen8_canonical_addr(target->node.start) | 1;
 }
 
 static bool object_is_idle(struct drm_i915_gem_object *obj)
@@ -616,13 +912,12 @@ static bool object_is_idle(struct drm_i915_gem_object *obj)
 	return true;
 }
 
-static int
-eb_relocate_entry(struct i915_vma *vma,
-		  struct i915_execbuffer *eb,
-		  struct drm_i915_gem_relocation_entry *reloc)
+static u64
+eb_relocate_entry(struct i915_execbuffer *eb,
+		  const struct i915_vma *vma,
+		  const struct drm_i915_gem_relocation_entry *reloc)
 {
 	struct i915_vma *target;
-	u64 target_offset;
 	int ret;
 
 	/* we've already hold a reference to all valid objects */
@@ -653,26 +948,28 @@ eb_relocate_entry(struct i915_vma *vma,
 		return -EINVAL;
 	}
 
-	if (reloc->write_domain)
+	if (reloc->write_domain) {
 		target->exec_entry->flags |= EXEC_OBJECT_WRITE;
 
-	/* Sandybridge PPGTT errata: We need a global gtt mapping for MI and
-	 * pipe_control writes because the gpu doesn't properly redirect them
-	 * through the ppgtt for non_secure batchbuffers.
-	 */
-	if (unlikely(IS_GEN6(eb->i915) &&
-		     reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION)) {
-		ret = i915_vma_bind(target, target->obj->cache_level,
-				    PIN_GLOBAL);
-		if (WARN_ONCE(ret, "Unexpected failure to bind target VMA!"))
-			return ret;
+		/* Sandybridge PPGTT errata: We need a global gtt mapping
+		 * for MI and pipe_control writes because the gpu doesn't
+		 * properly redirect them through the ppgtt for non_secure
+		 * batchbuffers.
+		 */
+		if (reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION &&
+		    IS_GEN6(eb->i915)) {
+			ret = i915_vma_bind(target, target->obj->cache_level,
+					    PIN_GLOBAL);
+			if (WARN_ONCE(ret,
+				      "Unexpected failure to bind target VMA!"))
+				return ret;
+		}
 	}
 
 	/* If the relocation already has the right value in it, no
 	 * more work needs to be done.
 	 */
-	target_offset = gen8_canonical_addr(target->node.start);
-	if (target_offset == reloc->presumed_offset)
+	if (gen8_canonical_addr(target->node.start) == reloc->presumed_offset)
 		return 0;
 
 	/* Check that the relocation address is valid... */
@@ -695,85 +992,90 @@ eb_relocate_entry(struct i915_vma *vma,
 
 	/* We can't wait for rendering with pagefaults disabled */
 	if (pagefault_disabled() && !object_is_idle(vma->obj))
-		return -EFAULT;
-
-	ret = relocate_entry(vma->obj, reloc, &eb->reloc_cache, target_offset);
-	if (ret)
-		return ret;
+		return -EBUSY;
 
 	/* and update the user's relocation entry */
-	reloc->presumed_offset = target_offset;
-	return 0;
+	return relocate_entry(vma->obj, reloc, &eb->reloc_cache, target);
 }
 
-static int eb_relocate_vma(struct i915_vma *vma, struct i915_execbuffer *eb)
+static int eb_relocate_vma(struct i915_execbuffer *eb,
+			   const struct i915_vma *vma)
 {
 #define N_RELOC(x) ((x) / sizeof(struct drm_i915_gem_relocation_entry))
-	struct drm_i915_gem_relocation_entry stack_reloc[N_RELOC(512)];
-	struct drm_i915_gem_relocation_entry __user *user_relocs;
-	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-	int remain, ret = 0;
-
-	user_relocs = u64_to_user_ptr(entry->relocs_ptr);
+	struct drm_i915_gem_relocation_entry stack[N_RELOC(512)];
+	struct drm_i915_gem_relocation_entry __user *urelocs;
+	const struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
+	unsigned int remain;
 
+	urelocs = u64_to_user_ptr(entry->relocs_ptr);
 	remain = entry->relocation_count;
-	while (remain) {
-		struct drm_i915_gem_relocation_entry *r = stack_reloc;
-		int count = remain;
-		if (count > ARRAY_SIZE(stack_reloc))
-			count = ARRAY_SIZE(stack_reloc);
-		remain -= count;
+	if (unlikely(remain > ULONG_MAX / sizeof(*urelocs)))
+		return -EINVAL;
+
+	/*
+	 * We must check that the entire relocation array is safe
+	 * to read. However, if the array is not writable the user loses
+	 * the updated relocation values.
+	 */
 
-		if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
-			ret = -EFAULT;
+	do {
+		struct drm_i915_gem_relocation_entry *r = stack;
+		unsigned int count =
+			min_t(unsigned int, remain, ARRAY_SIZE(stack));
+
+		if (__copy_from_user_inatomic(r, urelocs, count*sizeof(r[0]))) {
+			remain = -EFAULT;
 			goto out;
 		}
 
+		remain -= count;
 		do {
-			u64 offset = r->presumed_offset;
+			u64 offset = eb_relocate_entry(eb, vma, r);
 
-			ret = eb_relocate_entry(vma, eb, r);
-			if (ret)
-				goto out;
-
-			if (r->presumed_offset != offset &&
-			    __put_user(r->presumed_offset,
-				       &user_relocs->presumed_offset)) {
-				ret = -EFAULT;
+			if (offset == 0) {
+			} else if ((s64)offset < 0) {
+				remain = (s64)offset;
 				goto out;
+			} else {
+				__put_user(offset & ~1,
+					   &urelocs[r-stack].presumed_offset);
 			}
-
-			user_relocs++;
-			r++;
-		} while (--count);
-	}
-
+		} while (r++, --count);
+		urelocs += ARRAY_SIZE(stack);
+	} while (remain);
 out:
 	reloc_cache_reset(&eb->reloc_cache);
-	return ret;
+	return remain;
 #undef N_RELOC
 }
 
 static int
-eb_relocate_vma_slow(struct i915_vma *vma,
-		     struct i915_execbuffer *eb,
-		     struct drm_i915_gem_relocation_entry *relocs)
+eb_relocate_vma_slow(struct i915_execbuffer *eb,
+		     const struct i915_vma *vma)
 {
 	const struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-	int i, ret = 0;
+	struct drm_i915_gem_relocation_entry *relocs =
+		to_ptr(typeof(*relocs), entry->relocs_ptr);
+	unsigned int i;
+	int ret;
 
 	for (i = 0; i < entry->relocation_count; i++) {
-		ret = eb_relocate_entry(vma, eb, &relocs[i]);
-		if (ret)
-			break;
+		u64 offset = eb_relocate_entry(eb, vma, &relocs[i]);
+
+		if ((s64)offset < 0) {
+			ret = (s64)offset;
+			goto err;
+		}
 	}
+	ret = 0;
+err:
 	reloc_cache_reset(&eb->reloc_cache);
 	return ret;
 }
 
 static int eb_relocate(struct i915_execbuffer *eb)
 {
-	struct i915_vma *vma;
+	const struct i915_vma *vma;
 	int ret = 0;
 
 	/* This is the fast path and we cannot handle a pagefault whilst
@@ -784,293 +1086,175 @@ static int eb_relocate(struct i915_execbuffer *eb)
 	 * lockdep complains vehemently.
 	 */
 	pagefault_disable();
-	list_for_each_entry(vma, &eb->vmas, exec_list) {
-		ret = eb_relocate_vma(vma, eb);
-		if (ret)
-			break;
+	list_for_each_entry(vma, &eb->relocs, reloc_link) {
+retry:
+		ret = eb_relocate_vma(eb, vma);
+		if (ret == 0)
+			continue;
+
+		if (ret == -EBUSY) {
+			pagefault_enable();
+			ret = i915_gem_object_wait_rendering(vma->obj, false);
+			pagefault_disable();
+			if (ret == 0)
+				goto retry;
+		}
+		break;
 	}
 	pagefault_enable();
 
 	return ret;
 }
 
-static bool only_mappable_for_reloc(unsigned int flags)
+static int check_relocations(const struct drm_i915_gem_exec_object2 *entry)
 {
-	return (flags & (EXEC_OBJECT_NEEDS_FENCE | __EXEC_OBJECT_NEEDS_MAP)) ==
-		__EXEC_OBJECT_NEEDS_MAP;
-}
-
-static int
-eb_reserve_vma(struct i915_vma *vma,
-	       struct intel_engine_cs *engine,
-	       bool *need_reloc)
-{
-	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-	uint64_t flags;
-	int ret;
-
-	flags = PIN_USER;
-	if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
-		flags |= PIN_GLOBAL;
+	const unsigned long relocs_max =
+		ULONG_MAX / sizeof(struct drm_i915_gem_relocation_entry);
+	const char __user *addr, *end;
+	unsigned long size;
+	unsigned int nreloc;
+	char c;
+
+	nreloc = entry->relocation_count;
+	if (nreloc == 0)
+		return 0;
 
-	if (!drm_mm_node_allocated(&vma->node)) {
-		/* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
-		 * limit address to the first 4GBs for unflagged objects.
-		 */
-		if ((entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) == 0)
-			flags |= PIN_ZONE_4G;
-		if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
-			flags |= PIN_GLOBAL | PIN_MAPPABLE;
-		if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
-			flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
-		if (entry->flags & EXEC_OBJECT_PINNED)
-			flags |= entry->offset | PIN_OFFSET_FIXED;
-		if ((flags & PIN_MAPPABLE) == 0)
-			flags |= PIN_HIGH;
-	}
-
-	ret = i915_vma_pin(vma,
-			   entry->pad_to_size,
-			   entry->alignment,
-			   flags);
-	if ((ret == -ENOSPC || ret == -E2BIG) &&
-	    only_mappable_for_reloc(entry->flags))
-		ret = i915_vma_pin(vma,
-				   entry->pad_to_size,
-				   entry->alignment,
-				   flags & ~PIN_MAPPABLE);
-	if (ret)
-		return ret;
+	if (nreloc > relocs_max)
+		return -EINVAL;
 
-	entry->flags |= __EXEC_OBJECT_HAS_PIN;
+	addr = u64_to_user_ptr(entry->relocs_ptr);
+	size = nreloc * sizeof(struct drm_i915_gem_relocation_entry);
+	if (!access_ok(VERIFY_WRITE, addr, size))
+		return -EFAULT;
 
-	if (entry->flags & EXEC_OBJECT_NEEDS_FENCE) {
-		ret = i915_vma_get_fence(vma);
+	end = addr + size;
+	for (; addr < end; addr += PAGE_SIZE) {
+		int ret = __get_user(c, addr);
 		if (ret)
 			return ret;
-
-		if (i915_vma_pin_fence(vma))
-			entry->flags |= __EXEC_OBJECT_HAS_FENCE;
 	}
-
-	if (entry->offset != vma->node.start) {
-		entry->offset = vma->node.start;
-		*need_reloc = true;
-	}
-
-	return 0;
+	return __get_user(c, end - 1);
 }
 
-static bool
-need_reloc_mappable(struct i915_vma *vma)
+static int
+eb_copy_relocations(const struct i915_execbuffer *eb)
 {
-	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-
-	if (entry->relocation_count == 0)
-		return false;
-
-	if (!i915_vma_is_ggtt(vma))
-		return false;
+	const unsigned int count = eb->args->buffer_count;
+	unsigned int i;
+	int ret;
 
-	/* See also use_cpu_reloc() */
-	if (HAS_LLC(vma->obj->base.dev))
-		return false;
+	for (i = 0; i < count; i++) {
+		struct drm_i915_gem_relocation_entry __user *urelocs;
+		struct drm_i915_gem_relocation_entry *relocs;
+		unsigned int nreloc = eb->exec[i].relocation_count, j;
+		unsigned long size;
 
-	if (vma->obj->base.write_domain == I915_GEM_DOMAIN_CPU)
-		return false;
+		if (nreloc == 0)
+			continue;
 
-	return true;
-}
+		ret = check_relocations(&eb->exec[i]);
+		if (ret)
+			goto err;
 
-static bool
-eb_vma_misplaced(struct i915_vma *vma)
-{
-	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
+		urelocs = u64_to_user_ptr(eb->exec[i].relocs_ptr);
+		size = nreloc * sizeof(*relocs);
 
-	WARN_ON(entry->flags & __EXEC_OBJECT_NEEDS_MAP &&
-		!i915_vma_is_ggtt(vma));
+		relocs = drm_malloc_gfp(size, 1, GFP_TEMPORARY);
+		if (!relocs) {
+			ret = -ENOMEM;
+			goto err;
+		}
 
-	if (entry->alignment &&
-	    vma->node.start & (entry->alignment - 1))
-		return true;
+		/* copy_from_user is limited to 4GiB */
+		j = 0;
+		do {
+			u32 len = min_t(u64, 1ull<<31, size);
 
-	if (vma->node.size < entry->pad_to_size)
-		return true;
+			if (__copy_from_user(relocs + j, urelocs + j, len)) {
+				ret = -EFAULT;
+				goto err;
+			}
 
-	if (entry->flags & EXEC_OBJECT_PINNED &&
-	    vma->node.start != entry->offset)
-		return true;
+			size -= len;
+			BUILD_BUG_ON_NOT_POWER_OF_2(sizeof(*relocs));
+			j += len / sizeof(*relocs);
+		} while (size);
 
-	if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS &&
-	    vma->node.start < BATCH_OFFSET_BIAS)
-		return true;
+		/* As we do not update the known relocation offsets after
+		 * relocating (due to the complexities in lock handling),
+		 * we need to mark them as invalid now so that we force the
+		 * relocation processing next time. Just in case the target
+		 * object is evicted and then rebound into its old
+		 * presumed_offset before the next execbuffer - if that
+		 * happened we would make the mistake of assuming that the
+		 * relocations were valid.
+		 */
+		user_access_begin();
+		for (j = 0; j < nreloc; j++)
+			unsafe_put_user(-1,
+					&urelocs[j].presumed_offset,
+					end_user);
+end_user:
+		user_access_end();
 
-	/* avoid costly ping-pong once a batch bo ended up non-mappable */
-	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP &&
-	    !i915_vma_is_map_and_fenceable(vma))
-		return !only_mappable_for_reloc(entry->flags);
+		eb->exec[i].relocs_ptr = (uintptr_t)relocs;
+	}
 
-	if ((entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) == 0 &&
-	    (vma->node.start + vma->node.size - 1) >> 32)
-		return true;
+	return 0;
 
-	return false;
+err:
+	while (i--) {
+		struct drm_i915_gem_relocation_entry *relocs =
+			to_ptr(typeof(*relocs), eb->exec[i].relocs_ptr);
+		if (eb->exec[i].relocation_count)
+			drm_free_large(relocs);
+	}
+	return ret;
 }
 
-static int eb_reserve(struct i915_execbuffer *eb)
+static int eb_prefault_relocations(const struct i915_execbuffer *eb)
 {
-	const bool has_fenced_gpu_access = INTEL_GEN(eb->i915) < 4;
-	struct i915_vma *vma;
-	struct list_head ordered_vmas;
-	struct list_head pinned_vmas;
-	int retry;
-
-	INIT_LIST_HEAD(&ordered_vmas);
-	INIT_LIST_HEAD(&pinned_vmas);
-	while (!list_empty(&eb->vmas)) {
-		struct drm_i915_gem_exec_object2 *entry;
-		bool need_fence, need_mappable;
-
-		vma = list_first_entry(&eb->vmas, struct i915_vma, exec_list);
-		entry = vma->exec_entry;
-
-		if (eb->ctx->flags & CONTEXT_NO_ZEROMAP)
-			entry->flags |= __EXEC_OBJECT_NEEDS_BIAS;
-
-		if (!has_fenced_gpu_access)
-			entry->flags &= ~EXEC_OBJECT_NEEDS_FENCE;
-		need_fence =
-			entry->flags & EXEC_OBJECT_NEEDS_FENCE &&
-			i915_gem_object_is_tiled(vma->obj);
-		need_mappable = need_fence || need_reloc_mappable(vma);
-
-		if (entry->flags & EXEC_OBJECT_PINNED)
-			list_move_tail(&vma->exec_list, &pinned_vmas);
-		else if (need_mappable) {
-			entry->flags |= __EXEC_OBJECT_NEEDS_MAP;
-			list_move(&vma->exec_list, &ordered_vmas);
-		} else
-			list_move_tail(&vma->exec_list, &ordered_vmas);
-	}
-	list_splice(&ordered_vmas, &eb->vmas);
-	list_splice(&pinned_vmas, &eb->vmas);
-
-	/* Attempt to pin all of the buffers into the GTT.
-	 * This is done in 3 phases:
-	 *
-	 * 1a. Unbind all objects that do not match the GTT constraints for
-	 *     the execbuffer (fenceable, mappable, alignment etc).
-	 * 1b. Increment pin count for already bound objects.
-	 * 2.  Bind new objects.
-	 * 3.  Decrement pin count.
-	 *
-	 * This avoid unnecessary unbinding of later objects in order to make
-	 * room for the earlier objects *unless* we need to defragment.
-	 */
-	retry = 0;
-	do {
-		int ret = 0;
-
-		/* Unbind any ill-fitting objects or pin. */
-		list_for_each_entry(vma, &eb->vmas, exec_list) {
-			if (!drm_mm_node_allocated(&vma->node))
-				continue;
-
-			if (eb_vma_misplaced(vma))
-				ret = i915_vma_unbind(vma);
-			else
-				ret = eb_reserve_vma(vma, eb->engine, &eb->need_relocs);
-			if (ret)
-				goto err;
-		}
-
-		/* Bind fresh objects */
-		list_for_each_entry(vma, &eb->vmas, exec_list) {
-			if (drm_mm_node_allocated(&vma->node))
-				continue;
-
-			ret = eb_reserve_vma(vma, eb->engine, &eb->need_relocs);
-			if (ret)
-				goto err;
-		}
-
-err:
-		if (ret != -ENOSPC || retry++)
-			return ret;
+	const unsigned int count = eb->args->buffer_count;
+	unsigned int i;
 
-		/* Decrement pin count for bound objects */
-		list_for_each_entry(vma, &eb->vmas, exec_list)
-			eb_unreserve_vma(vma);
+	for (i = 0; i < count; i++) {
+		int ret;
 
-		ret = i915_gem_evict_vm(eb->vm, true);
+		ret = check_relocations(&eb->exec[i]);
 		if (ret)
 			return ret;
-	} while (1);
+	}
+
+	return 0;
 }
 
-static int
-eb_relocate_slow(struct i915_execbuffer *eb)
+static int eb_relocate_slow(struct i915_execbuffer *eb)
 {
-	const unsigned int count = eb->args->buffer_count;
 	struct drm_device *dev = &eb->i915->drm;
-	struct drm_i915_gem_relocation_entry *reloc;
-	struct i915_vma *vma;
-	int *reloc_offset;
-	int i, total, ret;
-
-	/* We may process another execbuffer during the unlock... */
-	eb_reset(eb);
-	mutex_unlock(&dev->struct_mutex);
-
-	total = 0;
-	for (i = 0; i < count; i++)
-		total += eb->exec[i].relocation_count;
-
-	reloc_offset = drm_malloc_ab(count, sizeof(*reloc_offset));
-	reloc = drm_malloc_ab(total, sizeof(*reloc));
-	if (reloc == NULL || reloc_offset == NULL) {
-		drm_free_large(reloc);
-		drm_free_large(reloc_offset);
-		mutex_lock(&dev->struct_mutex);
-		return -ENOMEM;
-	}
-
-	total = 0;
-	for (i = 0; i < count; i++) {
-		struct drm_i915_gem_relocation_entry __user *user_relocs;
-		u64 invalid_offset = (u64)-1;
-		int j;
-
-		user_relocs = u64_to_user_ptr(eb->exec[i].relocs_ptr);
-
-		if (copy_from_user(reloc+total, user_relocs,
-				   eb->exec[i].relocation_count * sizeof(*reloc))) {
-			ret = -EFAULT;
-			mutex_lock(&dev->struct_mutex);
-			goto err;
-		}
+	bool have_copy = false;
+	const struct i915_vma *vma;
+	int ret = 0;
 
-		/* As we do not update the known relocation offsets after
-		 * relocating (due to the complexities in lock handling),
-		 * we need to mark them as invalid now so that we force the
-		 * relocation processing next time. Just in case the target
-		 * object is evicted and then rebound into its old
-		 * presumed_offset before the next execbuffer - if that
-		 * happened we would make the mistake of assuming that the
-		 * relocations were valid.
-		 */
-		for (j = 0; j < eb->exec[i].relocation_count; j++) {
-			if (__copy_to_user(&user_relocs[j].presumed_offset,
-					   &invalid_offset,
-					   sizeof(invalid_offset))) {
-				ret = -EFAULT;
-				mutex_lock(&dev->struct_mutex);
-				goto err;
-			}
-		}
+repeat:
+	if (signal_pending(current))
+		return -ERESTARTSYS;
 
-		reloc_offset[i] = total;
-		total += eb->exec[i].relocation_count;
+	/* We may process another execbuffer during the unlock... */
+	eb_reset(eb);
+	mutex_unlock(&dev->struct_mutex);
+
+	if (ret == 0 && likely(!i915.prefault_disable)) {
+		ret = eb_prefault_relocations(eb);
+	} else if (!have_copy) {
+		ret = eb_copy_relocations(eb);
+		have_copy = true;
+	} else {
+		cond_resched();
+		ret = 0;
+	}
+	if (ret) {
+		mutex_lock(&dev->struct_mutex);
+		return ret;
 	}
 
 	ret = i915_mutex_lock_interruptible(dev);
@@ -1084,16 +1268,18 @@ eb_relocate_slow(struct i915_execbuffer *eb)
 	if (ret)
 		goto err;
 
-	ret = eb_reserve(eb);
-	if (ret)
-		goto err;
-
-	list_for_each_entry(vma, &eb->vmas, exec_list) {
-		int idx = vma->exec_entry - eb->exec;
-
-		ret = eb_relocate_vma_slow(vma, eb, reloc + reloc_offset[idx]);
-		if (ret)
-			goto err;
+	list_for_each_entry(vma, &eb->relocs, reloc_link) {
+		if (!have_copy) {
+			pagefault_disable();
+			ret = eb_relocate_vma(eb, vma);
+			pagefault_enable();
+			if (ret)
+				goto repeat;
+		} else {
+			ret = eb_relocate_vma_slow(eb, vma);
+			if (ret)
+				goto err;
+		}
 	}
 
 	/* Leave the user relocations as are, this is the painfully slow path,
@@ -1103,11 +1289,51 @@ eb_relocate_slow(struct i915_execbuffer *eb)
 	 */
 
 err:
-	drm_free_large(reloc);
-	drm_free_large(reloc_offset);
+	if (ret == -EAGAIN)
+		goto repeat;
+
+	if (have_copy) {
+		const unsigned int count = eb->args->buffer_count;
+		unsigned int i;
+
+		for (i = 0; i < count; i++) {
+			const struct drm_i915_gem_exec_object2 *entry =
+				&eb->exec[i];
+			struct drm_i915_gem_relocation_entry *relocs;
+
+			if (entry->relocation_count == 0)
+				continue;
+
+			relocs = to_ptr(typeof(*relocs), entry->relocs_ptr);
+			drm_free_large(relocs);
+		}
+	}
+
 	return ret;
 }
 
+static void eb_export_fence(struct drm_i915_gem_object *obj,
+			    struct drm_i915_gem_request *req,
+			    unsigned int flags)
+{
+	struct reservation_object *resv;
+
+	resv = i915_gem_object_get_dmabuf_resv(obj);
+	if (!resv)
+		return;
+
+	/* Ignore errors from failing to allocate the new fence, we can't
+	 * handle an error right now. Worst case should be missed
+	 * synchronisation leading to rendering corruption.
+	 */
+	ww_mutex_lock(&resv->lock, NULL);
+	if (flags & EXEC_OBJECT_WRITE)
+		reservation_object_add_excl_fence(resv, &req->fence);
+	else if (reservation_object_reserve_shared(resv) == 0)
+		reservation_object_add_shared_fence(resv, &req->fence);
+	ww_mutex_unlock(&resv->lock);
+}
+
 static unsigned int eb_other_engines(struct i915_execbuffer *eb)
 {
 	unsigned int mask;
@@ -1122,23 +1348,40 @@ static int
 eb_move_to_gpu(struct i915_execbuffer *eb)
 {
 	const unsigned int other_rings = eb_other_engines(eb);
-	struct i915_vma *vma;
+	const unsigned int count = eb->args->buffer_count;
+	unsigned int i;
 	int ret;
 
-	list_for_each_entry(vma, &eb->vmas, exec_list) {
+	for (i = 0; i < count; i++) {
+		const struct drm_i915_gem_exec_object2 *entry = &eb->exec[i];
+		struct i915_vma *vma = to_ptr(struct i915_vma, entry->rsvd2);
 		struct drm_i915_gem_object *obj = vma->obj;
 
 		if (obj->flags & other_rings) {
-			ret = i915_gem_object_sync(obj,
-						   eb->request,
-						   vma->exec_entry->flags & EXEC_OBJECT_WRITE);
+			ret = i915_gem_object_sync(obj, eb->request,
+						   entry->flags & EXEC_OBJECT_WRITE);
 			if (ret)
 				return ret;
 		}
 
-		if (obj->base.write_domain & I915_GEM_DOMAIN_CPU)
-			i915_gem_clflush_object(obj, false);
+		if (obj->base.write_domain) {
+			if (obj->base.write_domain & I915_GEM_DOMAIN_CPU)
+				i915_gem_clflush_object(obj, false);
+
+			obj->base.write_domain = 0;
+		}
+
+		if (entry->flags & EXEC_OBJECT_WRITE)
+			obj->base.read_domains = 0;
+		obj->base.read_domains |= I915_GEM_GPU_DOMAINS;
+
+		i915_vma_move_to_active(vma, eb->request, entry->flags);
+		eb_export_fence(obj, eb->request, entry->flags);
+
+		__eb_unreserve_vma(vma, entry);
+		vma->exec_entry = NULL;
 	}
+	eb->exec = NULL;
 
 	/* Unconditionally flush any chipset caches (for streaming writes). */
 	i915_gem_chipset_flush(eb->i915);
@@ -1170,104 +1413,6 @@ i915_gem_check_execbuffer(struct drm_i915_gem_execbuffer2 *exec)
 	return true;
 }
 
-static int
-validate_exec_list(struct drm_device *dev,
-		   struct drm_i915_gem_exec_object2 *exec,
-		   int count)
-{
-	unsigned relocs_total = 0;
-	unsigned relocs_max = UINT_MAX / sizeof(struct drm_i915_gem_relocation_entry);
-	unsigned invalid_flags;
-	int i;
-
-	/* INTERNAL flags must not overlap with external ones */
-	BUILD_BUG_ON(__EXEC_OBJECT_INTERNAL_FLAGS & ~__EXEC_OBJECT_UNKNOWN_FLAGS);
-
-	invalid_flags = __EXEC_OBJECT_UNKNOWN_FLAGS;
-	if (USES_FULL_PPGTT(dev))
-		invalid_flags |= EXEC_OBJECT_NEEDS_GTT;
-
-	for (i = 0; i < count; i++) {
-		char __user *ptr = u64_to_user_ptr(exec[i].relocs_ptr);
-		int length; /* limited by fault_in_pages_readable() */
-
-		if (exec[i].flags & invalid_flags)
-			return -EINVAL;
-
-		/* Offset can be used as input (EXEC_OBJECT_PINNED), reject
-		 * any non-page-aligned or non-canonical addresses.
-		 */
-		if (exec[i].flags & EXEC_OBJECT_PINNED) {
-			if (exec[i].offset !=
-			    gen8_canonical_addr(exec[i].offset & PAGE_MASK))
-				return -EINVAL;
-
-			/* From drm_mm perspective address space is continuous,
-			 * so from this point we're always using non-canonical
-			 * form internally.
-			 */
-			exec[i].offset = gen8_noncanonical_addr(exec[i].offset);
-		}
-
-		if (exec[i].alignment && !is_power_of_2(exec[i].alignment))
-			return -EINVAL;
-
-		/* pad_to_size was once a reserved field, so sanitize it */
-		if (exec[i].flags & EXEC_OBJECT_PAD_TO_SIZE) {
-			if (offset_in_page(exec[i].pad_to_size))
-				return -EINVAL;
-		} else {
-			exec[i].pad_to_size = 0;
-		}
-
-		/* First check for malicious input causing overflow in
-		 * the worst case where we need to allocate the entire
-		 * relocation tree as a single array.
-		 */
-		if (exec[i].relocation_count > relocs_max - relocs_total)
-			return -EINVAL;
-		relocs_total += exec[i].relocation_count;
-
-		length = exec[i].relocation_count *
-			sizeof(struct drm_i915_gem_relocation_entry);
-		/*
-		 * We must check that the entire relocation array is safe
-		 * to read, but since we may need to update the presumed
-		 * offsets during execution, check for full write access.
-		 */
-		if (!access_ok(VERIFY_WRITE, ptr, length))
-			return -EFAULT;
-
-		if (likely(!i915.prefault_disable)) {
-			if (fault_in_multipages_readable(ptr, length))
-				return -EFAULT;
-		}
-	}
-
-	return 0;
-}
-
-static int eb_select_context(struct i915_execbuffer *eb)
-{
-	struct i915_gem_context *ctx;
-	unsigned int ctx_id;
-
-	ctx_id = i915_execbuffer2_get_context_id(*eb->args);
-	ctx = i915_gem_context_lookup(eb->file->driver_priv, ctx_id);
-	if (unlikely(IS_ERR(ctx)))
-		return PTR_ERR(ctx);
-
-	if (unlikely(ctx->hang_stats.banned)) {
-		DRM_DEBUG("Context %u tried to submit while banned\n", ctx_id);
-		return -EIO;
-	}
-
-	eb->ctx = i915_gem_context_get(ctx);
-	eb->vm = ctx->ppgtt ? &ctx->ppgtt->base : &eb->i915->ggtt.base;
-
-	return 0;
-}
-
 void i915_vma_move_to_active(struct i915_vma *vma,
 			     struct drm_i915_gem_request *req,
 			     unsigned int flags)
@@ -1289,11 +1434,7 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 
 	if (flags & EXEC_OBJECT_WRITE) {
 		i915_gem_active_set(&obj->last_write, req);
-
 		intel_fb_obj_invalidate(obj, ORIGIN_CS);
-
-		/* update for the implicit flush after a batch */
-		obj->base.write_domain &= ~I915_GEM_GPU_DOMAINS;
 	}
 
 	if (flags & EXEC_OBJECT_NEEDS_FENCE)
@@ -1304,49 +1445,6 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 	list_move_tail(&vma->vm_link, &vma->vm->active_list);
 }
 
-static void eb_export_fence(struct drm_i915_gem_object *obj,
-			    struct drm_i915_gem_request *req,
-			    unsigned int flags)
-{
-	struct reservation_object *resv;
-
-	resv = i915_gem_object_get_dmabuf_resv(obj);
-	if (!resv)
-		return;
-
-	/* Ignore errors from failing to allocate the new fence, we can't
-	 * handle an error right now. Worst case should be missed
-	 * synchronisation leading to rendering corruption.
-	 */
-	ww_mutex_lock(&resv->lock, NULL);
-	if (flags & EXEC_OBJECT_WRITE)
-		reservation_object_add_excl_fence(resv, &req->fence);
-	else if (reservation_object_reserve_shared(resv) == 0)
-		reservation_object_add_shared_fence(resv, &req->fence);
-	ww_mutex_unlock(&resv->lock);
-}
-
-static void
-eb_move_to_active(struct i915_execbuffer *eb)
-{
-	struct i915_vma *vma;
-
-	list_for_each_entry(vma, &eb->vmas, exec_list) {
-		struct drm_i915_gem_object *obj = vma->obj;
-		u32 old_read = obj->base.read_domains;
-		u32 old_write = obj->base.write_domain;
-
-		obj->base.write_domain = 0;
-		if (vma->exec_entry->flags & EXEC_OBJECT_WRITE)
-			obj->base.read_domains = 0;
-		obj->base.read_domains |= I915_GEM_GPU_DOMAINS;
-
-		i915_vma_move_to_active(vma, eb->request, vma->exec_entry->flags);
-		eb_export_fence(obj, eb->request, vma->exec_entry->flags);
-		trace_i915_gem_object_change_domain(obj, old_read, old_write);
-	}
-}
-
 static int
 i915_reset_gen7_sol_offsets(struct drm_i915_gem_request *req)
 {
@@ -1358,16 +1456,16 @@ i915_reset_gen7_sol_offsets(struct drm_i915_gem_request *req)
 		return -EINVAL;
 	}
 
-	ret = intel_ring_begin(req, 4 * 3);
+	ret = intel_ring_begin(req, 4 * 2 + 2);
 	if (ret)
 		return ret;
 
+	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(4));
 	for (i = 0; i < 4; i++) {
-		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 		intel_ring_emit_reg(ring, GEN7_SO_WRITE_OFFSET(i));
 		intel_ring_emit(ring, 0);
 	}
-
+	intel_ring_emit(ring, MI_NOOP);
 	intel_ring_advance(ring);
 
 	return 0;
@@ -1403,9 +1501,10 @@ static struct i915_vma *eb_parse(struct i915_execbuffer *eb, bool is_master)
 		goto out;
 
 	vma->exec_entry =
-		memset(&eb->shadow_exec_entry, 0, sizeof(*vma->exec_entry));
+		memset(&eb->exec[eb->args->buffer_count++],
+		       0, sizeof(*vma->exec_entry));
 	vma->exec_entry->flags = __EXEC_OBJECT_HAS_PIN;
-	list_add_tail(&vma->exec_list, &eb->vmas);
+	vma->exec_entry->rsvd2 = (uintptr_t)vma;
 
 out:
 	i915_gem_object_unpin_pages(shadow_batch_obj);
@@ -1421,70 +1520,81 @@ add_to_client(struct drm_i915_gem_request *req,
 }
 
 static int
-execbuf_submit(struct i915_execbuffer *eb)
+eb_set_constants_offset(struct i915_execbuffer *eb)
 {
-	int instp_mode;
-	u32 instp_mask;
+	struct drm_i915_private *dev_priv = eb->i915;
+	struct intel_ring *ring;
+	u32 mode, mask;
 	int ret;
 
-	ret = eb_move_to_gpu(eb);
-	if (ret)
-		return ret;
-
-	ret = i915_switch_context(eb->request);
-	if (ret)
-		return ret;
-
-	instp_mode = eb->args->flags & I915_EXEC_CONSTANTS_MASK;
-	instp_mask = I915_EXEC_CONSTANTS_MASK;
-	switch (instp_mode) {
+	mode = eb->args->flags & I915_EXEC_CONSTANTS_MASK;
+	switch (mode) {
 	case I915_EXEC_CONSTANTS_REL_GENERAL:
 	case I915_EXEC_CONSTANTS_ABSOLUTE:
 	case I915_EXEC_CONSTANTS_REL_SURFACE:
-		if (instp_mode != 0 && eb->engine->id != RCS) {
-			DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
-			return -EINVAL;
-		}
-
-		if (instp_mode != eb->i915->relative_constants_mode) {
-			if (INTEL_INFO(eb->i915)->gen < 4) {
-				DRM_DEBUG("no rel constants on pre-gen4\n");
-				return -EINVAL;
-			}
-
-			if (INTEL_INFO(eb->i915)->gen > 5 &&
-			    instp_mode == I915_EXEC_CONSTANTS_REL_SURFACE) {
-				DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
-				return -EINVAL;
-			}
-
-			/* The HW changed the meaning on this bit on gen6 */
-			if (INTEL_INFO(eb->i915)->gen >= 6)
-				instp_mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
-		}
 		break;
 	default:
-		DRM_DEBUG("execbuf with unknown constants: %d\n", instp_mode);
+		DRM_DEBUG("execbuf with unknown constants: %d\n", mode);
 		return -EINVAL;
 	}
 
-	if (eb->engine->id == RCS &&
-	    instp_mode != eb->i915->relative_constants_mode) {
-		struct intel_ring *ring = eb->request->ring;
+	if (mode == dev_priv->relative_constants_mode)
+		return 0;
 
-		ret = intel_ring_begin(eb->request, 4);
-		if (ret)
-			return ret;
+	if (eb->engine->id != RCS) {
+		DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
+		return -EINVAL;
+	}
 
-		intel_ring_emit(ring, MI_NOOP);
-		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
-		intel_ring_emit_reg(ring, INSTPM);
-		intel_ring_emit(ring, instp_mask << 16 | instp_mode);
-		intel_ring_advance(ring);
+	if (INTEL_GEN(dev_priv) < 4) {
+		DRM_DEBUG("no rel constants on pre-gen4\n");
+		return -EINVAL;
+	}
 
-		eb->i915->relative_constants_mode = instp_mode;
+	if (INTEL_GEN(dev_priv) > 5 &&
+	    mode == I915_EXEC_CONSTANTS_REL_SURFACE) {
+		DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
+		return -EINVAL;
 	}
 
+	/* The HW changed the meaning on this bit on gen6 */
+	mask = I915_EXEC_CONSTANTS_MASK;
+	if (INTEL_GEN(dev_priv) >= 6)
+		mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
+
+	ret = intel_ring_begin(eb->request, 4);
+	if (ret)
+		return ret;
+
+	ring = eb->request->ring;
+	intel_ring_emit(ring, MI_NOOP);
+	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
+	intel_ring_emit_reg(ring, INSTPM);
+	intel_ring_emit(ring, mask << 16 | mode);
+	intel_ring_advance(ring);
+
+	dev_priv->relative_constants_mode = mode;
+
+	return 0;
+}
+
+static int
+eb_submit(struct i915_execbuffer *eb)
+{
+	int ret;
+
+	ret = eb_move_to_gpu(eb);
+	if (ret)
+		return ret;
+
+	ret = i915_switch_context(eb->request);
+	if (ret)
+		return ret;
+
+	ret = eb_set_constants_offset(eb);
+	if (ret)
+		return ret;
+
 	if (eb->args->flags & I915_EXEC_GEN7_SOL_RESET) {
 		ret = i915_reset_gen7_sol_offsets(eb->request);
 		if (ret)
@@ -1493,15 +1603,13 @@ execbuf_submit(struct i915_execbuffer *eb)
 
 	ret = eb->engine->emit_bb_start(eb->request,
 					eb->batch->node.start +
-					eb->batch_start_offset,
+					eb->args->batch_start_offset,
 					eb->args->batch_len,
 					eb->dispatch_flags);
 	if (ret)
 		return ret;
 
 	trace_i915_gem_ring_dispatch(eb->request, eb->dispatch_flags);
-
-	eb_move_to_active(eb);
 	add_to_client(eb->request, eb->file);
 
 	return 0;
@@ -1596,18 +1704,21 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	struct i915_execbuffer eb;
 	int ret;
 
+	BUILD_BUG_ON(__EXEC_OBJECT_INTERNAL_FLAGS & ~__EXEC_OBJECT_UNKNOWN_FLAGS);
+
 	if (!i915_gem_check_execbuffer(args))
 		return -EINVAL;
 
-	ret = validate_exec_list(dev, exec, args->buffer_count);
-	if (ret)
-		return ret;
-
 	eb.i915 = to_i915(dev);
 	eb.file = file;
 	eb.args = args;
-	eb.exec = exec;
-	eb.need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
+	if ((args->flags & I915_EXEC_NO_RELOC) == 0)
+		args->flags |= __EXEC_HAS_RELOC;
+	eb.exec = NULL;
+	eb.ctx = NULL;
+	eb.invalid_flags = __EXEC_OBJECT_UNKNOWN_FLAGS;
+	if (USES_FULL_PPGTT(eb.i915))
+		eb.invalid_flags |= EXEC_OBJECT_NEEDS_GTT;
 	reloc_cache_init(&eb.reloc_cache, eb.i915);
 
 	eb.dispatch_flags = 0;
@@ -1638,6 +1749,9 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		eb.dispatch_flags |= I915_DISPATCH_RS;
 	}
 
+	if (eb_create(&eb))
+		return -ENOMEM;
+
 	/* Take a local wakeref for preparing to dispatch the execbuf as
 	 * we expect to access the hardware fairly frequently in the
 	 * process. Upon first dispatch, we acquire another prolonged
@@ -1645,70 +1759,57 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	 * 100ms.
 	 */
 	intel_runtime_pm_get(eb.i915);
-
 	ret = i915_mutex_lock_interruptible(dev);
 	if (ret)
-		goto pre_mutex_err;
+		goto err_rpm;
 
 	ret = eb_select_context(&eb);
-	if (ret) {
-		mutex_unlock(&dev->struct_mutex);
-		goto pre_mutex_err;
-	}
+	if (unlikely(ret))
+		goto err_unlock;
 
-	if (eb_create(&eb)) {
-		i915_gem_context_put(eb.ctx);
-		mutex_unlock(&dev->struct_mutex);
-		ret = -ENOMEM;
-		goto pre_mutex_err;
-	}
-
-	/* Look up object handles */
+	eb.exec = exec;
 	ret = eb_lookup_vmas(&eb);
-	if (ret)
-		goto err;
-
-	/* take note of the batch buffer before we might reorder the lists */
-	eb.batch = eb_get_batch(&eb);
-
-	/* Move the objects en-masse into the GTT, evicting if necessary. */
-	ret = eb_reserve(&eb);
-	if (ret)
-		goto err;
+	if (unlikely(ret))
+		goto err_context;
 
 	/* The objects are in their final locations, apply the relocations. */
-	if (eb.need_relocs)
+	if (args->flags & __EXEC_HAS_RELOC && !list_empty(&eb.relocs)) {
 		ret = eb_relocate(&eb);
-	if (ret) {
-		if (ret == -EFAULT) {
+		if (ret == -EAGAIN || ret == -EFAULT)
 			ret = eb_relocate_slow(&eb);
-			BUG_ON(!mutex_is_locked(&dev->struct_mutex));
+		if (ret) {
+			/* If the user expects the execobject.offset and
+			 * reloc.presumed_offset to be an exact match,
+			 * as for using NO_RELOC, then we cannot update
+			 * the execobject.offset until we have completed
+			 * relocation.
+			 */
+			if (args->flags & I915_EXEC_NO_RELOC)
+				args->flags &= ~__EXEC_HAS_RELOC;
+			goto err_vma;
 		}
-		if (ret)
-			goto err;
 	}
 
 	/* Set the pending read domains for the batch buffer to COMMAND */
-	if (eb.batch->exec_entry->flags & EXEC_OBJECT_WRITE) {
+	if (unlikely(eb.batch->exec_entry->flags & EXEC_OBJECT_WRITE)) {
 		DRM_DEBUG("Attempting to use self-modifying batch buffer\n");
 		ret = -EINVAL;
-		goto err;
+		goto err_vma;
 	}
 	if (args->batch_start_offset > eb.batch->size ||
 	    args->batch_len > eb.batch->size - args->batch_start_offset) {
 		DRM_DEBUG("Attempting to use out-of-bounds batch\n");
 		ret = -EINVAL;
-		goto err;
+		goto err_vma;
 	}
 
-	eb.batch_start_offset = args->batch_start_offset;
 	if (intel_engine_needs_cmd_parser(eb.engine) && args->batch_len) {
 		struct i915_vma *vma;
 
 		vma = eb_parse(&eb, drm_is_current_master(file));
 		if (IS_ERR(vma)) {
 			ret = PTR_ERR(vma);
-			goto err;
+			goto err_vma;
 		}
 
 		if (vma) {
@@ -1722,7 +1823,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 			 * command parser has accepted.
 			 */
 			eb.dispatch_flags |= I915_DISPATCH_SECURE;
-			eb.batch_start_offset = 0;
+			eb.args->batch_start_offset = 0;
 			eb.batch = vma;
 		}
 	}
@@ -1731,7 +1832,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	 * batch" bit. Hence we need to pin secure batches into the global gtt.
 	 * hsw should have this fixed, but bdw mucks it up again. */
 	if (eb.dispatch_flags & I915_DISPATCH_SECURE) {
-		struct drm_i915_gem_object *obj = eb.batch->obj;
 		struct i915_vma *vma;
 
 		/*
@@ -1744,17 +1844,18 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		 *   fitting due to fragmentation.
 		 * So this is actually safe.
 		 */
-		vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, 0);
+		vma = i915_gem_object_ggtt_pin(eb.batch->obj, NULL,
+					       0, 0, 0);
 		if (IS_ERR(vma)) {
 			ret = PTR_ERR(vma);
-			goto err;
+			goto err_vma;
 		}
 
 		eb.batch = vma;
 	}
 
 	if (args->batch_len == 0)
-		args->batch_len = eb.batch->size - eb.batch_start_offset;
+		args->batch_len = eb.batch->size - eb.args->batch_start_offset;
 
 	/* Allocate a request for this batch buffer nice and early. */
 	eb.request = i915_gem_request_alloc(eb.engine, eb.ctx);
@@ -1771,7 +1872,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	 */
 	eb.request->batch = eb.batch;
 
-	ret = execbuf_submit(&eb);
+	ret = eb_submit(&eb);
 	__i915_add_request(eb.request, ret == 0);
 
 err_batch_unpin:
@@ -1783,15 +1884,18 @@ err_batch_unpin:
 	 */
 	if (eb.dispatch_flags & I915_DISPATCH_SECURE)
 		i915_vma_unpin(eb.batch);
-err:
 	/* the request owns the ref now */
-	eb_destroy(&eb);
+err_vma:
+	eb_release_vma(&eb);
+err_context:
+	i915_gem_context_put(eb.ctx);
+err_unlock:
 	mutex_unlock(&dev->struct_mutex);
-
-pre_mutex_err:
+err_rpm:
 	/* intel_gpu_busy should also get a ref, so it will free when the device
 	 * is really idle. */
 	intel_runtime_pm_put(eb.i915);
+	eb_destroy(&eb);
 	return ret;
 }
 
@@ -1815,8 +1919,12 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
 	}
 
 	/* Copy in the exec list from userland */
-	exec_list = drm_malloc_ab(sizeof(*exec_list), args->buffer_count);
-	exec2_list = drm_malloc_ab(sizeof(*exec2_list), args->buffer_count);
+	exec_list = drm_malloc_gfp(args->buffer_count,
+				   sizeof(*exec_list),
+				   __GFP_NOWARN | GFP_TEMPORARY);
+	exec2_list = drm_malloc_gfp(args->buffer_count + 1,
+				    sizeof(*exec2_list),
+				    __GFP_NOWARN | GFP_TEMPORARY);
 	if (exec_list == NULL || exec2_list == NULL) {
 		DRM_DEBUG("Failed to allocate exec list for %d buffers\n",
 			  args->buffer_count);
@@ -1841,7 +1949,7 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
 		exec2_list[i].relocs_ptr = exec_list[i].relocs_ptr;
 		exec2_list[i].alignment = exec_list[i].alignment;
 		exec2_list[i].offset = exec_list[i].offset;
-		if (INTEL_INFO(dev)->gen < 4)
+		if (INTEL_GEN(dev) < 4)
 			exec2_list[i].flags = EXEC_OBJECT_NEEDS_FENCE;
 		else
 			exec2_list[i].flags = 0;
@@ -1859,24 +1967,22 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
 	i915_execbuffer2_set_context_id(exec2, 0);
 
 	ret = i915_gem_do_execbuffer(dev, file, &exec2, exec2_list);
-	if (!ret) {
+	if (exec2.flags & __EXEC_HAS_RELOC) {
 		struct drm_i915_gem_exec_object __user *user_exec_list =
 			u64_to_user_ptr(args->buffers_ptr);
 
 		/* Copy the new buffer offsets back to the user's exec list. */
 		for (i = 0; i < args->buffer_count; i++) {
+			if ((exec2_list[i].offset & UPDATE) == 0)
+				continue;
+
 			exec2_list[i].offset =
-				gen8_canonical_addr(exec2_list[i].offset);
-			ret = __copy_to_user(&user_exec_list[i].offset,
-					     &exec2_list[i].offset,
-					     sizeof(user_exec_list[i].offset));
-			if (ret) {
-				ret = -EFAULT;
-				DRM_DEBUG("failed to copy %d exec entries "
-					  "back to user (%d)\n",
-					  args->buffer_count, ret);
+				gen8_canonical_addr(exec2_list[i].offset & PIN_OFFSET_MASK);
+			exec2_list[i].offset &= PIN_OFFSET_MASK;
+			if (__copy_to_user(&user_exec_list[i].offset,
+					   &exec2_list[i].offset,
+					   sizeof(user_exec_list[i].offset)))
 				break;
-			}
 		}
 	}
 
@@ -1890,11 +1996,11 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
 		     struct drm_file *file)
 {
 	struct drm_i915_gem_execbuffer2 *args = data;
-	struct drm_i915_gem_exec_object2 *exec2_list = NULL;
+	struct drm_i915_gem_exec_object2 *exec2_list;
 	int ret;
 
 	if (args->buffer_count < 1 ||
-	    args->buffer_count > UINT_MAX / sizeof(*exec2_list)) {
+	    args->buffer_count >= UINT_MAX / sizeof(*exec2_list)) {
 		DRM_DEBUG("execbuf2 with %d buffers\n", args->buffer_count);
 		return -EINVAL;
 	}
@@ -1904,45 +2010,42 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
-	exec2_list = drm_malloc_gfp(args->buffer_count,
+	exec2_list = drm_malloc_gfp(args->buffer_count + 1,
 				    sizeof(*exec2_list),
-				    GFP_TEMPORARY);
+				    __GFP_NOWARN | GFP_TEMPORARY);
 	if (exec2_list == NULL) {
 		DRM_DEBUG("Failed to allocate exec list for %d buffers\n",
 			  args->buffer_count);
 		return -ENOMEM;
 	}
-	ret = copy_from_user(exec2_list,
-			     u64_to_user_ptr(args->buffers_ptr),
-			     sizeof(*exec2_list) * args->buffer_count);
-	if (ret != 0) {
-		DRM_DEBUG("copy %d exec entries failed %d\n",
-			  args->buffer_count, ret);
+	if (copy_from_user(exec2_list,
+			   u64_to_user_ptr(args->buffers_ptr),
+			   sizeof(*exec2_list) * args->buffer_count)) {
+		DRM_DEBUG("copy %d exec entries failed\n", args->buffer_count);
 		drm_free_large(exec2_list);
 		return -EFAULT;
 	}
 
 	ret = i915_gem_do_execbuffer(dev, file, args, exec2_list);
-	if (!ret) {
+	if (args->flags & __EXEC_HAS_RELOC) {
 		/* Copy the new buffer offsets back to the user's exec list. */
 		struct drm_i915_gem_exec_object2 __user *user_exec_list =
-				   u64_to_user_ptr(args->buffers_ptr);
+			u64_to_user_ptr(args->buffers_ptr);
 		int i;
 
+		user_access_begin();
 		for (i = 0; i < args->buffer_count; i++) {
+			if ((exec2_list[i].offset & UPDATE) == 0)
+				continue;
+
 			exec2_list[i].offset =
-				gen8_canonical_addr(exec2_list[i].offset);
-			ret = __copy_to_user(&user_exec_list[i].offset,
-					     &exec2_list[i].offset,
-					     sizeof(user_exec_list[i].offset));
-			if (ret) {
-				ret = -EFAULT;
-				DRM_DEBUG("failed to copy %d exec entries "
-					  "back to user\n",
-					  args->buffer_count);
-				break;
-			}
+				gen8_canonical_addr(exec2_list[i].offset & PIN_OFFSET_MASK);
+			unsafe_put_user(exec2_list[i].offset,
+					&user_exec_list[i].offset,
+					end_user);
 		}
+end_user:
+		user_access_end();
 	}
 
 	drm_free_large(exec2_list);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index f9540683d2c0..ba04b0bf7fe0 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -230,7 +230,9 @@ struct i915_vma {
 	struct hlist_node obj_node;
 
 	/** This vma's place in the batchbuffer or on the eviction list */
-	struct list_head exec_list;
+	struct list_head exec_link;
+	struct list_head reloc_link;
+	struct list_head evict_link;
 
 	/**
 	 * Used for performing relocations during execbuffer insertion.
-- 
2.9.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 14/17] drm/i915: First try the previous execbuffer location
  2016-08-22  8:03 Execbuf fixes and major tuning Chris Wilson
                   ` (12 preceding siblings ...)
  2016-08-22  8:03 ` [PATCH 13/17] drm/i915: Eliminate lots of iterations over the execobjects array Chris Wilson
@ 2016-08-22  8:03 ` Chris Wilson
  2016-08-22  8:03 ` [PATCH 15/17] drm/i915: Wait upon userptr get-user-pages within execbuffer Chris Wilson
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 47+ messages in thread
From: Chris Wilson @ 2016-08-22  8:03 UTC (permalink / raw)
  To: intel-gfx

When choosing a slot for an execbuffer, we ideally want to use the same
address as last time (so that we don't have to rebind it) and the same
address as expected by the user (so that we don't have to fixup any
relocations pointing to it). By first trying to bind at the incoming
execbuffer->offset supplied by the user, or failing that at the
currently bound offset, we should achieve both goals and avoid the
rebind cost as well as the relocation penalty. However, if the object is
not currently bound at that address, we do not want to arbitrarily
unbind whatever object already occupies our chosen position, and so we
choose to rebind/relocate the incoming object instead. After we report
the new position back to the user, on the next pass the relocations
should have settled down.
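
As a rough sketch, the placement preference boils down to the following
(condensed from the eb_pin_vma() hunk below; the fallback to the normal
reserve/relocate path on failure is omitted):

	u64 flags;

	/* Prefer the offset the user expects, avoiding any relocations. */
	flags = entry->offset & PIN_OFFSET_MASK;

	/* If the object is already bound elsewhere, reuse that binding
	 * rather than evict whatever currently occupies the expected slot.
	 */
	if (vma->node.size && flags != vma->node.start)
		flags = vma->node.start | PIN_NOEVICT;

	flags |= PIN_USER | PIN_NONBLOCK | PIN_OFFSET_FIXED;
	if (unlikely(entry->flags & EXEC_OBJECT_NEEDS_GTT))
		flags |= PIN_GLOBAL;

	if (unlikely(i915_vma_pin(vma, 0, 0, flags)))
		return; /* leave it for the reserve/relocate path */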

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c            | 6 ++++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 9 ++++++---
 drivers/gpu/drm/i915/i915_gem_gtt.h        | 1 +
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1f35dd6219cb..0398f86683e7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3055,6 +3055,9 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 		vma->node.color = obj->cache_level;
 		ret = drm_mm_reserve_node(&vma->vm->mm, &vma->node);
 		if (ret) {
+			if (flags & PIN_NOEVICT)
+				goto err_unpin;
+
 			ret = i915_gem_evict_for_vma(vma, flags);
 			if (ret == 0)
 				ret = drm_mm_reserve_node(&vma->vm->mm, &vma->node);
@@ -3090,6 +3093,9 @@ search_free:
 							  search_flag,
 							  alloc_flag);
 		if (ret) {
+			if (flags & PIN_NOEVICT)
+				goto err_unpin;
+
 			ret = i915_gem_evict_something(vma->vm, size, alignment,
 						       obj->cache_level,
 						       start, end,
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 71b18fcbd8a7..1fa3b1888b0b 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -163,10 +163,14 @@ eb_pin_vma(struct i915_execbuffer *eb,
 {
 	u64 flags;
 
-	flags = vma->node.start;
+	flags = entry->offset & PIN_OFFSET_MASK;
+	if (vma->node.size && flags != vma->node.start)
+		flags = vma->node.start | PIN_NOEVICT;
+
 	flags |= PIN_USER | PIN_NONBLOCK | PIN_OFFSET_FIXED;
 	if (unlikely(entry->flags & EXEC_OBJECT_NEEDS_GTT))
 		flags |= PIN_GLOBAL;
+
 	if (unlikely(i915_vma_pin(vma, 0, 0, flags)))
 		return;
 
@@ -276,8 +280,7 @@ eb_add_vma(struct i915_execbuffer *eb,
 		entry->flags |= eb->context_flags;
 
 	ret = 0;
-	if (vma->node.size)
-		eb_pin_vma(eb, entry, vma);
+	eb_pin_vma(eb, entry, vma);
 	if (eb_vma_misplaced(entry, vma)) {
 		eb_unreserve_vma(vma, entry);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index ba04b0bf7fe0..9eeb9a37f177 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -645,6 +645,7 @@ void i915_gem_gtt_finish_object(struct drm_i915_gem_object *obj);
 #define PIN_MAPPABLE		BIT(1)
 #define PIN_ZONE_4G		BIT(2)
 #define PIN_NONFAULT		BIT(3)
+#define PIN_NOEVICT		BIT(4)
 
 #define PIN_MBZ			BIT(5) /* I915_VMA_PIN_OVERFLOW */
 #define PIN_GLOBAL		BIT(6) /* I915_VMA_GLOBAL_BIND */
-- 
2.9.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 15/17] drm/i915: Wait upon userptr get-user-pages within execbuffer
  2016-08-22  8:03 Execbuf fixes and major tuning Chris Wilson
                   ` (13 preceding siblings ...)
  2016-08-22  8:03 ` [PATCH 14/17] drm/i915: First try the previous execbuffer location Chris Wilson
@ 2016-08-22  8:03 ` Chris Wilson
  2016-08-23 10:53   ` Joonas Lahtinen
  2016-08-22  8:03 ` [PATCH 16/17] drm/i915: Remove superfluous i915_add_request_no_flush() helper Chris Wilson
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 47+ messages in thread
From: Chris Wilson @ 2016-08-22  8:03 UTC (permalink / raw)
  To: intel-gfx

This simply hides the EAGAIN caused by userptr when userspace causes
resource contention. However, it is quite beneficial with highly
contended userptr users as we avoid repeating the setup costs and
kernel-user context switches.
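
A minimal sketch of the mechanism, using the names introduced in the
hunks below: the get-user-pages workers are queued on a driver-private
workqueue so that the execbuffer slow path can flush (wait for) them
rather than bounce EAGAIN back to userspace:

	/* i915_gem_init_userptr(): a dedicated high-priority workqueue */
	dev_priv->mm.userptr_wq =
		alloc_workqueue("i915-userptr-acquire", WQ_HIGHPRI, 0);
	if (!dev_priv->mm.userptr_wq)
		return -ENOMEM;

	/* eb_relocate_slow(), called with struct_mutex dropped: wait for
	 * any outstanding get-user-pages workers before retrying.
	 */
	flush_workqueue(eb->i915->mm.userptr_wq);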

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c            |  1 +
 drivers/gpu/drm/i915/i915_drv.h            | 10 +++++++++-
 drivers/gpu/drm/i915/i915_gem.c            |  4 +++-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +++
 drivers/gpu/drm/i915/i915_gem_userptr.c    | 18 +++++++++++++++---
 5 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 13ae340ef1f3..c040c6329804 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -577,6 +577,7 @@ static void i915_gem_fini(struct drm_device *dev)
 	i915_gem_reset(dev);
 	i915_gem_cleanup_engines(dev);
 	i915_gem_context_fini(dev);
+	i915_gem_cleanup_userptr(dev_priv);
 	mutex_unlock(&dev->struct_mutex);
 
 	WARN_ON(!list_empty(&to_i915(dev)->context_list));
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a92c14ad30eb..ebf602ecde10 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1339,6 +1339,13 @@ struct i915_gem_mm {
 	struct list_head fence_list;
 
 	/**
+	 * Workqueue to fault in userptr pages, flushed by the execbuf
+	 * when required but otherwise left to userspace to try again
+	 * on EAGAIN.
+	 */
+	struct workqueue_struct *userptr_wq;
+
+	/**
 	 * Are we in a non-interruptible section of code like
 	 * modesetting?
 	 */
@@ -3097,7 +3104,8 @@ int i915_gem_set_tiling(struct drm_device *dev, void *data,
 			struct drm_file *file_priv);
 int i915_gem_get_tiling(struct drm_device *dev, void *data,
 			struct drm_file *file_priv);
-void i915_gem_init_userptr(struct drm_i915_private *dev_priv);
+int i915_gem_init_userptr(struct drm_i915_private *dev_priv);
+void i915_gem_cleanup_userptr(struct drm_i915_private *dev_priv);
 int i915_gem_userptr_ioctl(struct drm_device *dev, void *data,
 			   struct drm_file *file);
 int i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0398f86683e7..fee2011591fb 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4471,7 +4471,9 @@ int i915_gem_init(struct drm_device *dev)
 	 */
 	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
 
-	i915_gem_init_userptr(dev_priv);
+	ret = i915_gem_init_userptr(dev_priv);
+	if (ret)
+		goto out_unlock;
 
 	ret = i915_gem_init_ggtt(dev_priv);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 1fa3b1888b0b..7babed0b40ee 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1260,6 +1260,9 @@ repeat:
 		return ret;
 	}
 
+	/* A frequent cause for EAGAIN are currently unavailable client pages */
+	flush_workqueue(eb->i915->mm.userptr_wq);
+
 	ret = i915_mutex_lock_interruptible(dev);
 	if (ret) {
 		mutex_lock(&dev->struct_mutex);
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 581df2316ca5..13e902dfa134 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -376,7 +376,7 @@ __i915_mm_struct_free(struct kref *kref)
 	mutex_unlock(&mm->i915->mm_lock);
 
 	INIT_WORK(&mm->work, __i915_mm_struct_free__worker);
-	schedule_work(&mm->work);
+	queue_work(mm->i915->mm.userptr_wq, &mm->work);
 }
 
 static void
@@ -596,7 +596,7 @@ __i915_gem_userptr_get_pages_schedule(struct drm_i915_gem_object *obj,
 	get_task_struct(work->task);
 
 	INIT_WORK(&work->work, __i915_gem_userptr_get_pages_worker);
-	schedule_work(&work->work);
+	queue_work(to_i915(obj->base.dev)->mm.userptr_wq, &work->work);
 
 	*active = true;
 	return -EAGAIN;
@@ -820,8 +820,20 @@ i915_gem_userptr_ioctl(struct drm_device *dev, void *data, struct drm_file *file
 	return 0;
 }
 
-void i915_gem_init_userptr(struct drm_i915_private *dev_priv)
+int i915_gem_init_userptr(struct drm_i915_private *dev_priv)
 {
 	mutex_init(&dev_priv->mm_lock);
 	hash_init(dev_priv->mm_structs);
+
+	dev_priv->mm.userptr_wq =
+		alloc_workqueue("i915-userptr-acquire", WQ_HIGHPRI, 0);
+	if (!dev_priv->mm.userptr_wq)
+		return -ENOMEM;
+
+	return 0;
+}
+
+void i915_gem_cleanup_userptr(struct drm_i915_private *dev_priv)
+{
+	destroy_workqueue(dev_priv->mm.userptr_wq);
 }
-- 
2.9.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 16/17] drm/i915: Remove superfluous i915_add_request_no_flush() helper
  2016-08-22  8:03 Execbuf fixes and major tuning Chris Wilson
                   ` (14 preceding siblings ...)
  2016-08-22  8:03 ` [PATCH 15/17] drm/i915: Wait upon userptr get-user-pages within execbuffer Chris Wilson
@ 2016-08-22  8:03 ` Chris Wilson
  2016-08-23 10:21   ` Joonas Lahtinen
  2016-08-22  8:03 ` [PATCH 17/17] drm/i915: Use the MRU stack search after evicting Chris Wilson
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 47+ messages in thread
From: Chris Wilson @ 2016-08-22  8:03 UTC (permalink / raw)
  To: intel-gfx

The only time we need to emit a flush inside request emission is after
an execbuffer, for which we can use the full __i915_add_request(). All
other instances want the simpler i915_add_request() without flushing, so
remove the useless helper.
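
A quick sketch of the resulting convention (the execbuffer call below is
taken from earlier in this series; the rest is the plain helper):

	void __i915_add_request(struct drm_i915_gem_request *req, bool flush_caches);
	#define i915_add_request(req) \
		__i915_add_request(req, false)

	/* execbuffer, the one site that wants the implicit flush: */
	__i915_add_request(eb.request, ret == 0);

	/* everywhere else: */
	i915_add_request(req);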

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 2 +-
 drivers/gpu/drm/i915/i915_gem_request.h | 2 --
 drivers/gpu/drm/i915/intel_display.c    | 4 ++--
 drivers/gpu/drm/i915/intel_overlay.c    | 8 ++++----
 drivers/gpu/drm/i915/intel_pm.c         | 2 +-
 5 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 3b4eec676d66..e71cf17bba61 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1034,7 +1034,7 @@ int i915_gem_switch_to_kernel_context(struct drm_i915_private *dev_priv)
 			return PTR_ERR(req);
 
 		ret = i915_switch_context(req);
-		i915_add_request_no_flush(req);
+		i915_add_request(req);
 		if (ret)
 			return ret;
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 9d5a66bfc509..b556c1082a34 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -222,8 +222,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
 
 void __i915_add_request(struct drm_i915_gem_request *req, bool flush_caches);
 #define i915_add_request(req) \
-	__i915_add_request(req, true)
-#define i915_add_request_no_flush(req) \
 	__i915_add_request(req, false)
 
 struct intel_rps_client;
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 4c6cb64ec220..a44c2313c4f2 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -12149,7 +12149,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 		intel_mark_page_flip_active(intel_crtc, work);
 
 		work->flip_queued_req = i915_gem_request_get(request);
-		i915_add_request_no_flush(request);
+		i915_add_request(request);
 	}
 
 	i915_gem_track_fb(intel_fb_obj(old_fb), obj,
@@ -12164,7 +12164,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 	return 0;
 
 cleanup_request:
-	i915_add_request_no_flush(request);
+	i915_add_request(request);
 cleanup_unpin:
 	to_intel_plane_state(primary->state)->vma = work->old_vma;
 	intel_unpin_fb_vma(vma);
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index 7c392547711f..dde486cd2cd8 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -255,7 +255,7 @@ static int intel_overlay_on(struct intel_overlay *overlay)
 
 	ret = intel_ring_begin(req, 4);
 	if (ret) {
-		i915_add_request_no_flush(req);
+		i915_add_request(req);
 		return ret;
 	}
 
@@ -298,7 +298,7 @@ static int intel_overlay_continue(struct intel_overlay *overlay,
 
 	ret = intel_ring_begin(req, 2);
 	if (ret) {
-		i915_add_request_no_flush(req);
+		i915_add_request(req);
 		return ret;
 	}
 
@@ -373,7 +373,7 @@ static int intel_overlay_off(struct intel_overlay *overlay)
 
 	ret = intel_ring_begin(req, 6);
 	if (ret) {
-		i915_add_request_no_flush(req);
+		i915_add_request(req);
 		return ret;
 	}
 
@@ -437,7 +437,7 @@ static int intel_overlay_release_old_vid(struct intel_overlay *overlay)
 
 		ret = intel_ring_begin(req, 2);
 		if (ret) {
-			i915_add_request_no_flush(req);
+			i915_add_request(req);
 			return ret;
 		}
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 8a6751e14ab9..5633aa66f64b 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6676,7 +6676,7 @@ static void __intel_autoenable_gt_powersave(struct work_struct *work)
 		rcs->init_context(req);
 
 	/* Mark the device busy, calling intel_enable_gt_powersave() */
-	i915_add_request_no_flush(req);
+	i915_add_request(req);
 
 unlock:
 	mutex_unlock(&dev_priv->drm.struct_mutex);
-- 
2.9.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 17/17] drm/i915: Use the MRU stack search after evicting
  2016-08-22  8:03 Execbuf fixes and major tuning Chris Wilson
                   ` (15 preceding siblings ...)
  2016-08-22  8:03 ` [PATCH 16/17] drm/i915: Remove superfluous i915_add_request_no_flush() helper Chris Wilson
@ 2016-08-22  8:03 ` Chris Wilson
  2016-08-23 13:52 ` ✗ Fi.CI.BAT: failure for series starting with [01/17] drm/i915: Skip holding an object reference for execbuf preparation Patchwork
  2016-08-25  6:51 ` ✗ Fi.CI.BAT: warning for series starting with [01/17] drm/i915: Skip holding an object reference for execbuf preparation (rev2) Patchwork
  18 siblings, 0 replies; 47+ messages in thread
From: Chris Wilson @ 2016-08-22  8:03 UTC (permalink / raw)
  To: intel-gfx

When we evict from the GTT to make room for an object, the hole we
create is put onto the MRU stack inside the drm_mm range manager. On the
next search pass, we can speed up a PIN_HIGH allocation by referencing
that stack for the new hole.
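
Condensed from the hunk below, the insertion loop in i915_vma_insert()
then reads roughly as follows (sketch only; error unwinding and the
surrounding argument setup are trimmed):

search_free:
	ret = drm_mm_insert_node_in_range_generic(&vma->vm->mm, &vma->node,
						  size, alignment,
						  obj->cache_level,
						  start, end,
						  search_flag, alloc_flag);
	if (ret) {
		ret = i915_gem_evict_something(vma->vm, size, alignment,
					       obj->cache_level,
					       start, end, flags);
		if (ret == 0) {
			/* Eviction pushed a fresh hole onto the drm_mm MRU
			 * stack, so retry with the default search to pick
			 * it up first.
			 */
			search_flag = DRM_MM_SEARCH_DEFAULT;
			goto search_free;
		}

		goto err_unpin;
	}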

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fee2011591fb..76701000a538 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3100,8 +3100,10 @@ search_free:
 						       obj->cache_level,
 						       start, end,
 						       flags);
-			if (ret == 0)
+			if (ret == 0) {
+				search_flag = DRM_MM_SEARCH_DEFAULT;
 				goto search_free;
+			}
 
 			goto err_unpin;
 		}
-- 
2.9.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH 01/17] drm/i915: Skip holding an object reference for execbuf preparation
  2016-08-22  8:03 ` [PATCH 01/17] drm/i915: Skip holding an object reference for execbuf preparation Chris Wilson
@ 2016-08-22 11:21   ` Joonas Lahtinen
  2016-08-22 11:56     ` Chris Wilson
  0 siblings, 1 reply; 47+ messages in thread
From: Joonas Lahtinen @ 2016-08-22 11:21 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ma, 2016-08-22 at 09:03 +0100, Chris Wilson wrote:
> This is a golden oldie! We can shave a couple of locked instructions for
> about 10% of the per-object overhead by not taking an extra kref whilst
> reserving objects for an execbuf. Due to lock management this is safe,
> as we cannot lose the original object reference without the lock.
> Equally, because this relies on the heavy BKL^W struct_mutex, it is also
> likely to be only a temporary optimisation until we have fine grained
> locking. (That's what we said 5 years ago, so there's probably another
> 10 years before we get around to finer grained locking!)
> 

Should we sprinkle a couple of lockdep_assert_held for documentation?

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 03/17] drm/i915: Allow the user to pass a context to any ring
  2016-08-22  8:03 ` [PATCH 03/17] drm/i915: Allow the user to pass a context to any ring Chris Wilson
@ 2016-08-22 11:23   ` Joonas Lahtinen
  2016-08-22 12:23     ` Chris Wilson
  0 siblings, 1 reply; 47+ messages in thread
From: Joonas Lahtinen @ 2016-08-22 11:23 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ma, 2016-08-22 at 09:03 +0100, Chris Wilson wrote:
> With full-ppgtt, we want the user to have full control over their memory
> layout, with a separate instance per context. Forcing them to use a
> shared memory layout for !RCS not only duplicates the amount of work we
> have to do, but also defeats the memory segregation on offer.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 8f9d5ad0cfd8..fb1a64738fb8 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1250,12 +1250,9 @@ static struct i915_gem_context *
>  i915_gem_validate_context(struct drm_device *dev, struct drm_file *file,
>  			  struct intel_engine_cs *engine, const u32 ctx_id)
>  {
> -	struct i915_gem_context *ctx = NULL;
> +	struct i915_gem_context *ctx;
>  	struct i915_ctx_hang_stats *hs;
>  
> -	if (engine->id != RCS && ctx_id != DEFAULT_CONTEXT_HANDLE)
> -		return ERR_PTR(-EINVAL);
> -

One would think this existed due to lack of testing or bugs in early
hardware. Do we need to use IS_GEN or some other means of validation?

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 01/17] drm/i915: Skip holding an object reference for execbuf preparation
  2016-08-22 11:21   ` Joonas Lahtinen
@ 2016-08-22 11:56     ` Chris Wilson
  0 siblings, 0 replies; 47+ messages in thread
From: Chris Wilson @ 2016-08-22 11:56 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Mon, Aug 22, 2016 at 02:21:16PM +0300, Joonas Lahtinen wrote:
> On ma, 2016-08-22 at 09:03 +0100, Chris Wilson wrote:
> > This is a golden oldie! We can shave a couple of locked instructions for
> > about 10% of the per-object overhead by not taking an extra kref whilst
> > reserving objects for an execbuf. Due to lock management this is safe,
> > as we cannot lose the original object reference without the lock.
> > Equally, because this relies on the heavy BKL^W struct_mutex, it is also
> > likely to be only a temporary optimisation until we have fine grained
> > locking. (That's what we said 5 years ago, so there's probably another
> > 10 years before we get around to finer grained locking!)
> > 
> 
> Should we sprinkle a couple of lockdep_assert_held for documentation?

What's required is a struct mutex.serial counter so that we can assert
that the mutex wasn't unlocked since the lookup and before use.

i915_gem_execbuffer.c is pretty much self-contained (and will be even
more so in future) and the place where we drop the lock is scary enough
to carry lots of comments. I tend to focus the asserts on the boundaries
between pieces of code, but I'll have a look to see if there are
sensible places within execbuf that merit a sprinkling of lockdep
asserts.
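
For instance (lockdep_assert_held() is available today; the serial
counter is purely hypothetical, struct mutex has no such field):

	/* Documentation-only assert we can sprinkle now: */
	lockdep_assert_held(&dev->struct_mutex);

	/* Hypothetical: a mutex serial counter would let us assert that
	 * the lock was never dropped between lookup and use, e.g.
	 *
	 *	serial = dev->struct_mutex.serial;
	 *	ret = eb_lookup_vmas(&eb);
	 *	...
	 *	WARN_ON(dev->struct_mutex.serial != serial);
	 */
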
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 03/17] drm/i915: Allow the user to pass a context to any ring
  2016-08-22 11:23   ` Joonas Lahtinen
@ 2016-08-22 12:23     ` Chris Wilson
  2016-08-23 13:28       ` John Harrison
  0 siblings, 1 reply; 47+ messages in thread
From: Chris Wilson @ 2016-08-22 12:23 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Mon, Aug 22, 2016 at 02:23:28PM +0300, Joonas Lahtinen wrote:
> On ma, 2016-08-22 at 09:03 +0100, Chris Wilson wrote:
> > With full-ppgtt, we want the user to have full control over their memory
> > layout, with a separate instance per context. Forcing them to use a
> > shared memory layout for !RCS not only duplicates the amount of work we
> > have to do, but also defeats the memory segregation on offer.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 +----
> >  1 file changed, 1 insertion(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > index 8f9d5ad0cfd8..fb1a64738fb8 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > @@ -1250,12 +1250,9 @@ static struct i915_gem_context *
> >  i915_gem_validate_context(struct drm_device *dev, struct drm_file *file,
> >  			  struct intel_engine_cs *engine, const u32 ctx_id)
> >  {
> > -	struct i915_gem_context *ctx = NULL;
> > +	struct i915_gem_context *ctx;
> >  	struct i915_ctx_hang_stats *hs;
> >  
> > -	if (engine->id != RCS && ctx_id != DEFAULT_CONTEXT_HANDLE)
> > -		return ERR_PTR(-EINVAL);
> > -
> 
> One would think this existed due to lack of testing or bugs in early
> hardware. Do we need to use IS_GEN or some other means of validation?

No.

This has nothing to do with the hardware logical state (that is found
within intel_context and only enabled where appropriate). The
i915_gem_context is the driver's segregation between clients. Not only
is t required for tracking clients independently (currently hangstats,
but the context would be the first place we start enforcing cgroups like
controls), but it is vital for clients who want to control their memory
layout without conflicts (with themselves and others).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 16/17] drm/i915: Remove superfluous i915_add_request_no_flush() helper
  2016-08-22  8:03 ` [PATCH 16/17] drm/i915: Remove superfluous i915_add_request_no_flush() helper Chris Wilson
@ 2016-08-23 10:21   ` Joonas Lahtinen
  0 siblings, 0 replies; 47+ messages in thread
From: Joonas Lahtinen @ 2016-08-23 10:21 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ma, 2016-08-22 at 09:03 +0100, Chris Wilson wrote:
> The only time we need to emit a flush inside request emission is after
> an execbuffer, for which we can use the full __i915_add_request(). All
> other instances want the simpler i915_add_request() without flushing, so
> remove the useless helper.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

I'd probably make the title more like "Rename i915_add_request_no_flush
to i915_add_request", to make it less likely that there is trouble when
somebody is rebasing.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 15/17] drm/i915: Wait upon userptr get-user-pages within execbuffer
  2016-08-22  8:03 ` [PATCH 15/17] drm/i915: Wait upon userptr get-user-pages within execbuffer Chris Wilson
@ 2016-08-23 10:53   ` Joonas Lahtinen
  2016-08-23 11:14     ` Chris Wilson
  0 siblings, 1 reply; 47+ messages in thread
From: Joonas Lahtinen @ 2016-08-23 10:53 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ma, 2016-08-22 at 09:03 +0100, Chris Wilson wrote:
> @@ -4471,7 +4471,9 @@ int i915_gem_init(struct drm_device *dev)
>  	 */
>  	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
>  
> -	i915_gem_init_userptr(dev_priv);
> +	ret = i915_gem_init_userptr(dev_priv);
> +	if (ret)
> +		goto out_unlock;

Don't we need a new teardown label after successful
i915_gem_init_userptr()?
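
Something along these lines, presumably; err_userptr is a hypothetical
label, as the function currently has no local cleanup:

	ret = i915_gem_init_userptr(dev_priv);
	if (ret)
		goto out_unlock;

	ret = i915_gem_init_ggtt(dev_priv);
	if (ret)
		goto err_userptr;

	/* ... remaining init steps ... */

err_userptr:
	i915_gem_cleanup_userptr(dev_priv);
	/* then fall through to the existing out_unlock unwind */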

With that added,

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 15/17] drm/i915: Wait upon userptr get-user-pages within execbuffer
  2016-08-23 10:53   ` Joonas Lahtinen
@ 2016-08-23 11:14     ` Chris Wilson
  0 siblings, 0 replies; 47+ messages in thread
From: Chris Wilson @ 2016-08-23 11:14 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Tue, Aug 23, 2016 at 01:53:09PM +0300, Joonas Lahtinen wrote:
> On ma, 2016-08-22 at 09:03 +0100, Chris Wilson wrote:
> > @@ -4471,7 +4471,9 @@ int i915_gem_init(struct drm_device *dev)
> >  	 */
> >  	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
> >  
> > -	i915_gem_init_userptr(dev_priv);
> > +	ret = i915_gem_init_userptr(dev_priv);
> > +	if (ret)
> > +		goto out_unlock;
> 
> Don't we need a new teardown label after successful
> i915_gem_init_userptr()?

The entire function is devoid of local cleanup. Oh well.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 05/17] drm/i915: Pin the pages whilst operating on them
  2016-08-22  8:03 ` [PATCH 05/17] drm/i915: Pin the pages whilst operating on them Chris Wilson
@ 2016-08-23 11:54   ` Joonas Lahtinen
  0 siblings, 0 replies; 47+ messages in thread
From: Joonas Lahtinen @ 2016-08-23 11:54 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ma, 2016-08-22 at 09:03 +0100, Chris Wilson wrote:
> As a safety precaution, whilst we operate on the object's pages
> (clflushing them, updating the LRU), make sure we hold a pin on those
> pages to prevent them from disappearing underneath us.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
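
The shape of the pattern, for reference; the names here are illustrative
only, not helpers quoted from the series:

	static void flush_object_pages(struct example_obj *obj)
	{
		pin_pages(obj);		/* shrinker must leave the pages alone now */

		clflush_pages(obj);	/* safe: backing store cannot be reaped */
		move_to_lru_tail(obj);

		unpin_pages(obj);	/* drop the temporary pin again */
	}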

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 06/17] drm/i915: Move obj->dirty:1 to obj->flags
  2016-08-22  8:03 ` [PATCH 06/17] drm/i915: Move obj->dirty:1 to obj->flags Chris Wilson
@ 2016-08-23 12:01   ` Joonas Lahtinen
  2016-09-06 11:37   ` Dave Gordon
  1 sibling, 0 replies; 47+ messages in thread
From: Joonas Lahtinen @ 2016-08-23 12:01 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ma, 2016-08-22 at 09:03 +0100, Chris Wilson wrote:
> @@ -2185,7 +2185,8 @@ struct drm_i915_gem_object {
>  	 * This is set if the object has been written to since last bound
>  	 * to the GTT
>  	 */
> -	unsigned int dirty:1;
> +#define I915_BO_DIRTY_SHIFT (I915_BO_ACTIVE_REF_SHIFT + 1)
> +#define I915_BO_DIRTY_BIT BIT(I915_BO_DIRTY_SHIFT)

Just #define I915_BO_DIRTY
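
Something along these lines; the single define below is only a sketch of
the suggestion, reusing the bit position from the patch:

	/* as in the patch */
	#define I915_BO_DIRTY_SHIFT	(I915_BO_ACTIVE_REF_SHIFT + 1)
	#define I915_BO_DIRTY_BIT	BIT(I915_BO_DIRTY_SHIFT)

	/* suggested: a single define, no separate *_SHIFT/*_BIT pair */
	#define I915_BO_DIRTY		BIT(I915_BO_ACTIVE_REF_SHIFT + 1)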

With that fixed (remove in the other I915_BO_* too),

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 03/17] drm/i915: Allow the user to pass a context to any ring
  2016-08-22 12:23     ` Chris Wilson
@ 2016-08-23 13:28       ` John Harrison
  2016-08-23 13:33         ` John Harrison
  0 siblings, 1 reply; 47+ messages in thread
From: John Harrison @ 2016-08-23 13:28 UTC (permalink / raw)
  To: intel-gfx


On 22/08/2016 13:23, Chris Wilson wrote:
> On Mon, Aug 22, 2016 at 02:23:28PM +0300, Joonas Lahtinen wrote:
>> On ma, 2016-08-22 at 09:03 +0100, Chris Wilson wrote:
>>> With full-ppgtt, we want the user to have full control over their memory
>>> layout, with a separate instance per context. Forcing them to use a
>>> shared memory layout for !RCS not only duplicates the amount of work we
>>> have to do, but also defeats the memory segregation on offer.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> ---
>>>   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 +----
>>>   1 file changed, 1 insertion(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> index 8f9d5ad0cfd8..fb1a64738fb8 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> @@ -1250,12 +1250,9 @@ static struct i915_gem_context *
>>>   i915_gem_validate_context(struct drm_device *dev, struct drm_file *file,
>>>   			  struct intel_engine_cs *engine, const u32 ctx_id)
>>>   {
>>> -	struct i915_gem_context *ctx = NULL;
>>> +	struct i915_gem_context *ctx;
>>>   	struct i915_ctx_hang_stats *hs;
>>>   
>>> -	if (engine->id != RCS && ctx_id != DEFAULT_CONTEXT_HANDLE)
>>> -		return ERR_PTR(-EINVAL);
>>> -
>> One would think this existed due to lack of testing or bugs in early
>> hardware. Do we need to use IS_GEN or some other means of validation?
> No.
>
> This has nothing to do with the hardware logical state (that is found
> within intel_context and only enabled where appropriate). The
> i915_gem_context is the driver's segregation between clients. Not only
> is it required for tracking clients independently (currently hangstats,
> but the context would be the first place we start enforcing cgroups like
> controls), but it is vital for clients who want to control their memory
> layout without conflicts (with themselves and others).
> -Chris
>
It is also important for clients that want to submit lots of work in 
parallel from a single application by using multiple contexts. Other 
internal teams have been running with this patch for quite some time. I 
believe the only reason it has not been merged upstream before (it has 
been on the mailing list at least twice before that I know of) was the 
argument that there was no open source user.

Reviewed-by: John Harrison <john.c.harrison@intel.com>


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 03/17] drm/i915: Allow the user to pass a context to any ring
  2016-08-23 13:28       ` John Harrison
@ 2016-08-23 13:33         ` John Harrison
  2016-08-25 15:28           ` John Harrison
  0 siblings, 1 reply; 47+ messages in thread
From: John Harrison @ 2016-08-23 13:33 UTC (permalink / raw)
  To: intel-gfx


On 23/08/2016 14:28, John Harrison wrote:
> On 22/08/2016 13:23, Chris Wilson wrote:
>> On Mon, Aug 22, 2016 at 02:23:28PM +0300, Joonas Lahtinen wrote:
>>> On ma, 2016-08-22 at 09:03 +0100, Chris Wilson wrote:
>>>> With full-ppgtt, we want the user to have full control over their memory
>>>> layout, with a separate instance per context. Forcing them to use a
>>>> shared memory layout for !RCS not only duplicates the amount of work we
>>>> have to do, but also defeats the memory segregation on offer.
>>>>
>>>> Signed-off-by: Chris Wilson<chris@chris-wilson.co.uk>
>>>> ---
>>>>   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 +----
>>>>   1 file changed, 1 insertion(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>>> index 8f9d5ad0cfd8..fb1a64738fb8 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>>> @@ -1250,12 +1250,9 @@ static struct i915_gem_context *
>>>>   i915_gem_validate_context(struct drm_device *dev, struct drm_file *file,
>>>>   			  struct intel_engine_cs *engine, const u32 ctx_id)
>>>>   {
>>>> -	struct i915_gem_context *ctx = NULL;
>>>> +	struct i915_gem_context *ctx;
>>>>   	struct i915_ctx_hang_stats *hs;
>>>>   
>>>> -	if (engine->id != RCS && ctx_id != DEFAULT_CONTEXT_HANDLE)
>>>> -		return ERR_PTR(-EINVAL);
>>>> -
>>> One would think this existed due to lack of testing or bugs in early
>>> hardware. Do we need to use IS_GEN or some other means of validation?
>> No.
>>
>> This has nothing to do with the hardware logical state (that is found
>> within intel_context and only enabled where appropriate). The
>> i915_gem_context is the driver's segregation between clients. Not only
>> is it required for tracking clients independently (currently hangstats,
>> but the context would be the first place we start enforcing cgroups like
>> controls), but it is vital for clients who want to control their memory
>> layout without conflicts (with themselves and others).
>> -Chris
>>
> It is also important for clients that want to submit lots of work in 
> parallel from a single application by using multiple contexts. Other 
> internal teams have been running with this patch for quite some time. 
> I believe the only reason it has not been merged upstream before (it 
> has been on the mailing list at least twice before that I know of) was 
> the argument that there was no open source user.
>
> Reviewed-by: John Harrison <john.c.harrison@intel.com>
>

Actually, I just found a previous instance. It had an r-b from Daniel 
Thomas, but Tvrtko vetoed it on the grounds of needing IGT coverage first 
- message id '<565EF558.5050705@linux.intel.com>' on Dec 2nd 2015.


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* ✗ Fi.CI.BAT: failure for series starting with [01/17] drm/i915: Skip holding an object reference for execbuf preparation
  2016-08-22  8:03 Execbuf fixes and major tuning Chris Wilson
                   ` (16 preceding siblings ...)
  2016-08-22  8:03 ` [PATCH 17/17] drm/i915: Use the MRU stack search after evicting Chris Wilson
@ 2016-08-23 13:52 ` Patchwork
  2016-08-25  6:51 ` ✗ Fi.CI.BAT: warning for series starting with [01/17] drm/i915: Skip holding an object reference for execbuf preparation (rev2) Patchwork
  18 siblings, 0 replies; 47+ messages in thread
From: Patchwork @ 2016-08-23 13:52 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/17] drm/i915: Skip holding an object reference for execbuf preparation
URL   : https://patchwork.freedesktop.org/series/11399/
State : failure

== Summary ==

Series 11399v1 Series without cover letter
http://patchwork.freedesktop.org/api/1.0/series/11399/revisions/1/mbox/

Test kms_cursor_legacy:
        Subgroup basic-flip-vs-cursor-varying-size:
                fail       -> PASS       (fi-hsw-4770r)
                fail       -> PASS       (fi-hsw-4770k)
Test kms_pipe_crc_basic:
        Subgroup hang-read-crc-pipe-b:
                skip       -> PASS       (fi-skl-6260u)
        Subgroup suspend-read-crc-pipe-b:
                dmesg-warn -> INCOMPLETE (fi-hsw-4770k)

fi-bdw-5557u     total:249  pass:232  dwarn:0   dfail:0   fail:1   skip:16 
fi-bsw-n3050     total:249  pass:202  dwarn:0   dfail:0   fail:3   skip:44 
fi-byt-n2820     total:249  pass:204  dwarn:0   dfail:0   fail:4   skip:41 
fi-hsw-4770k     total:207  pass:184  dwarn:2   dfail:0   fail:0   skip:20 
fi-hsw-4770r     total:249  pass:222  dwarn:0   dfail:0   fail:1   skip:26 
fi-ivb-3520m     total:249  pass:217  dwarn:0   dfail:0   fail:1   skip:31 
fi-skl-6260u     total:249  pass:234  dwarn:0   dfail:0   fail:1   skip:14 
fi-skl-6700k     total:249  pass:214  dwarn:4   dfail:0   fail:3   skip:28 
fi-snb-2520m     total:249  pass:201  dwarn:4   dfail:0   fail:2   skip:42 
fi-snb-2600      total:249  pass:202  dwarn:4   dfail:0   fail:1   skip:42 

Results at /archive/results/CI_IGT_test/Patchwork_2402/

5a8708ec5c5bbc8eacf1f5b9cb815e6064e9737b drm-intel-nightly: 2016y-08m-23d-12h-12m-19s UTC integration manifest
fc94e4d drm/i915: Pin the pages whilst operating on them
b754ad4 drm/i915: Fix i915_gem_evict_for_vma (soft-pinning)
01051fe drm/i915: Allow the user to pass a context to any ring
986279d drm/i915: Defer active reference until required
2405119 drm/i915: Skip holding an object reference for execbuf preparation

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 08/17] drm/i915: Drop spinlocks around adding to the client request list
  2016-08-22  8:03 ` [PATCH 08/17] drm/i915: Drop spinlocks around adding to the client request list Chris Wilson
@ 2016-08-24 13:16   ` Mika Kuoppala
  2016-08-24 13:25     ` Chris Wilson
  2016-08-26  9:13   ` Mika Kuoppala
  2016-09-02 10:30   ` John Harrison
  2 siblings, 1 reply; 47+ messages in thread
From: Mika Kuoppala @ 2016-08-24 13:16 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> We can add to the tail of the client request list without holding the
> spinlock, as the only other user is the throttle ioctl, which iterates
> forwards over the list. The throttle only needs protection against
> deletion of a request as it reads it; it simply won't see a new request
> added to the end of the list, or it would be too early and rejected. We
> can further reduce the number of spinlocks required when throttling by
> removing stale requests from the client_list as we throttle.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
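
As a minimal illustration of why the plain iterator stays safe here (a
standalone sketch with simplified types, not the driver code itself):
only the entry visited on the previous iteration is ever unlinked, so
the cursor's ->next is still valid when the loop advances.

	#include <linux/list.h>
	#include <linux/jiffies.h>

	struct example_request {
		struct list_head client_link;
		unsigned long emitted_jiffies;
	};

	static struct example_request *
	throttle_scan(struct list_head *request_list, unsigned long recent_enough)
	{
		struct example_request *request, *target = NULL;

		list_for_each_entry(request, request_list, client_link) {
			if (time_after_eq(request->emitted_jiffies, recent_enough))
				break;

			/* unlink the entry behind the cursor, never the cursor */
			if (target)
				list_del(&target->client_link);

			target = request;
		}

		return target;
	}

The deletion always trails the traversal by one element, which is why the
_safe variant is not strictly needed.
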
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c        |  2 +-
>  drivers/gpu/drm/i915/i915_gem.c            | 14 ++++++------
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 14 +++++++-----
>  drivers/gpu/drm/i915/i915_gem_request.c    | 34 ++++++------------------------
>  drivers/gpu/drm/i915/i915_gem_request.h    |  4 +---
>  5 files changed, 23 insertions(+), 45 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 086053fa2820..996744708f31 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -480,7 +480,7 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
>  		mutex_lock(&dev->struct_mutex);
>  		request = list_first_entry_or_null(&file_priv->mm.request_list,
>  						   struct drm_i915_gem_request,
> -						   client_list);
> +						   client_link);
>  		rcu_read_lock();
>  		task = pid_task(request && request->ctx->pid ?
>  				request->ctx->pid : file->pid,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 7b8abda541e6..e432211e8b24 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3673,16 +3673,14 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
>  		return -EIO;
>  
>  	spin_lock(&file_priv->mm.lock);
> -	list_for_each_entry(request, &file_priv->mm.request_list, client_list) {
> +	list_for_each_entry(request, &file_priv->mm.request_list, client_link) {
>  		if (time_after_eq(request->emitted_jiffies, recent_enough))
>  			break;
>  
> -		/*
> -		 * Note that the request might not have been submitted yet.
> -		 * In which case emitted_jiffies will be zero.
> -		 */
> -		if (!request->emitted_jiffies)
> -			continue;
> +		if (target) {
> +			list_del(&target->client_link);
> +			target->file_priv = NULL;

No need for list_for_each_entry_safe as we are throwing out stuff before
the point of traversal?

Mika

> +		}
>  
>  		target = request;
>  	}
> @@ -4639,7 +4637,7 @@ void i915_gem_release(struct drm_device *dev, struct drm_file *file)
>  	 * file_priv.
>  	 */
>  	spin_lock(&file_priv->mm.lock);
> -	list_for_each_entry(request, &file_priv->mm.request_list, client_list)
> +	list_for_each_entry(request, &file_priv->mm.request_list, client_link)
>  		request->file_priv = NULL;
>  	spin_unlock(&file_priv->mm.lock);
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 125fb38eff40..5689445b1cd3 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1421,6 +1421,14 @@ out:
>  	return vma;
>  }
>  
> +static void
> +add_to_client(struct drm_i915_gem_request *req,
> +	      struct drm_file *file)
> +{
> +	req->file_priv = file->driver_priv;
> +	list_add_tail(&req->client_link, &req->file_priv->mm.request_list);
> +}
> +
>  static int
>  execbuf_submit(struct i915_execbuffer_params *params,
>  	       struct drm_i915_gem_execbuffer2 *args,
> @@ -1512,6 +1520,7 @@ execbuf_submit(struct i915_execbuffer_params *params,
>  	trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
>  
>  	i915_gem_execbuffer_move_to_active(vmas, params->request);
> +	add_to_client(params->request, params->file);
>  
>  	return 0;
>  }
> @@ -1808,10 +1817,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  	 */
>  	params->request->batch = params->batch;
>  
> -	ret = i915_gem_request_add_to_client(params->request, file);
> -	if (ret)
> -		goto err_request;
> -
>  	/*
>  	 * Save assorted stuff away to pass through to *_submission().
>  	 * NB: This data should be 'persistent' and not local as it will
> @@ -1825,7 +1830,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  	params->ctx                     = ctx;
>  
>  	ret = execbuf_submit(params, args, &eb->vmas);
> -err_request:
>  	__i915_add_request(params->request, ret == 0);
>  
>  err_batch_unpin:
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index 1a215320cefb..bf62427a35b7 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -115,42 +115,20 @@ const struct fence_ops i915_fence_ops = {
>  	.timeline_value_str = i915_fence_timeline_value_str,
>  };
>  
> -int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
> -				   struct drm_file *file)
> -{
> -	struct drm_i915_private *dev_private;
> -	struct drm_i915_file_private *file_priv;
> -
> -	WARN_ON(!req || !file || req->file_priv);
> -
> -	if (!req || !file)
> -		return -EINVAL;
> -
> -	if (req->file_priv)
> -		return -EINVAL;
> -
> -	dev_private = req->i915;
> -	file_priv = file->driver_priv;
> -
> -	spin_lock(&file_priv->mm.lock);
> -	req->file_priv = file_priv;
> -	list_add_tail(&req->client_list, &file_priv->mm.request_list);
> -	spin_unlock(&file_priv->mm.lock);
> -
> -	return 0;
> -}
> -
>  static inline void
>  i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
>  {
> -	struct drm_i915_file_private *file_priv = request->file_priv;
> +	struct drm_i915_file_private *file_priv;
>  
> +	file_priv = request->file_priv;
>  	if (!file_priv)
>  		return;
>  
>  	spin_lock(&file_priv->mm.lock);
> -	list_del(&request->client_list);
> -	request->file_priv = NULL;
> +	if (request->file_priv) {
> +		list_del(&request->client_link);
> +		request->file_priv = NULL;
> +	}
>  	spin_unlock(&file_priv->mm.lock);
>  }
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> index 6c72bd8d9423..9d5a66bfc509 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.h
> +++ b/drivers/gpu/drm/i915/i915_gem_request.h
> @@ -132,7 +132,7 @@ struct drm_i915_gem_request {
>  
>  	struct drm_i915_file_private *file_priv;
>  	/** file_priv list entry for this request */
> -	struct list_head client_list;
> +	struct list_head client_link;
>  
>  	/**
>  	 * The ELSP only accepts two elements at a time, so we queue
> @@ -167,8 +167,6 @@ static inline bool fence_is_i915(struct fence *fence)
>  struct drm_i915_gem_request * __must_check
>  i915_gem_request_alloc(struct intel_engine_cs *engine,
>  		       struct i915_gem_context *ctx);
> -int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
> -				   struct drm_file *file);
>  void i915_gem_request_retire_upto(struct drm_i915_gem_request *req);
>  
>  static inline u32
> -- 
> 2.9.3
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 09/17] drm/i915: Amalgamate execbuffer parameter structures
  2016-08-22  8:03 ` [PATCH 09/17] drm/i915: Amalgamate execbuffer parameter structures Chris Wilson
@ 2016-08-24 13:20   ` John Harrison
  0 siblings, 0 replies; 47+ messages in thread
From: John Harrison @ 2016-08-24 13:20 UTC (permalink / raw)
  To: intel-gfx

On 22/08/2016 09:03, Chris Wilson wrote:
> Combine the two slightly overlapping parameter structures we pass around
> the execbuffer routines into one.
Should also include something about renaming and refactoring the 
eb_* / i915_gem_execbuffer_* helper functions.

Also, while you are doing the rename/refactor, maybe add at least some 
brief kerneldocs?


> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 557 ++++++++++++-----------------
>   1 file changed, 235 insertions(+), 322 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 5689445b1cd3..7cb5b9ad9212 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -49,70 +49,73 @@
>   
>   #define BATCH_OFFSET_BIAS (256*1024)
>   
> -struct i915_execbuffer_params {
> -	struct drm_device               *dev;
> -	struct drm_file                 *file;
> -	struct i915_vma			*batch;
> -	u32				dispatch_flags;
> -	u32				args_batch_start_offset;
> -	struct intel_engine_cs          *engine;
> -	struct i915_gem_context         *ctx;
> -	struct drm_i915_gem_request     *request;
> -};
> -
> -struct eb_vmas {
> +struct i915_execbuffer {
>   	struct drm_i915_private *i915;
> +	struct drm_file *file;
> +	struct drm_i915_gem_execbuffer2 *args;
> +	struct drm_i915_gem_exec_object2 *exec;
> +	struct intel_engine_cs *engine;
> +	struct i915_gem_context *ctx;
> +	struct i915_address_space *vm;
> +	struct i915_vma *batch;
> +	struct drm_i915_gem_request *request;
> +	u32 batch_start_offset;
> +	unsigned int dispatch_flags;
> +	struct drm_i915_gem_exec_object2 shadow_exec_entry;
> +	bool need_relocs;
>   	struct list_head vmas;
> +	struct reloc_cache {
> +		struct drm_mm_node node;
> +		unsigned long vaddr;
> +		unsigned int page;
> +		bool use_64bit_reloc;
> +	} reloc_cache;
>   	int and;
>   	union {
> -		struct i915_vma *lut[0];
> -		struct hlist_head buckets[0];
> +		struct i915_vma **lut;
> +		struct hlist_head *buckets;
>   	};
>   };
>   
> -static struct eb_vmas *
> -eb_create(struct drm_i915_private *i915,
> -	  struct drm_i915_gem_execbuffer2 *args)
> +static int
> +eb_create(struct i915_execbuffer *eb)
Would this be better called eb_create_vma_lut() or similar? It doesn't 
create the eb itself (that is local to i915_gem_do_execbuffer) and all 
the basic initialisation is outside as well.

>   {
> -	struct eb_vmas *eb = NULL;
> -
> -	if (args->flags & I915_EXEC_HANDLE_LUT) {
> -		unsigned size = args->buffer_count;
> +	eb->lut = NULL;
> +	if (eb->args->flags & I915_EXEC_HANDLE_LUT) {
> +		unsigned int size = eb->args->buffer_count;
>   		size *= sizeof(struct i915_vma *);
> -		size += sizeof(struct eb_vmas);
> -		eb = kmalloc(size, GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
> +		eb->lut = kmalloc(size,
> +				  GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
>   	}
>   
> -	if (eb == NULL) {
> -		unsigned size = args->buffer_count;
> -		unsigned count = PAGE_SIZE / sizeof(struct hlist_head) / 2;
> +	if (!eb->lut) {
> +		unsigned int size = eb->args->buffer_count;
> +		unsigned int count = PAGE_SIZE / sizeof(struct hlist_head) / 2;
>   		BUILD_BUG_ON_NOT_POWER_OF_2(PAGE_SIZE / sizeof(struct hlist_head));
>   		while (count > 2*size)
>   			count >>= 1;
> -		eb = kzalloc(count*sizeof(struct hlist_head) +
> -			     sizeof(struct eb_vmas),
> -			     GFP_TEMPORARY);
> -		if (eb == NULL)
> -			return eb;
> +		eb->lut = kzalloc(count*sizeof(struct hlist_head),
> +				  GFP_TEMPORARY);
> +		if (!eb->lut)
> +			return -ENOMEM;
>   
>   		eb->and = count - 1;
>   	} else
> -		eb->and = -args->buffer_count;
> +		eb->and = -eb->args->buffer_count;
>   
> -	eb->i915 = i915;
>   	INIT_LIST_HEAD(&eb->vmas);
> -	return eb;
> +	return 0;
>   }
>   
>   static void
> -eb_reset(struct eb_vmas *eb)
> +eb_reset(struct i915_execbuffer *eb)
>   {
>   	if (eb->and >= 0)
>   		memset(eb->buckets, 0, (eb->and+1)*sizeof(struct hlist_head));
>   }
>   
>   static struct i915_vma *
> -eb_get_batch(struct eb_vmas *eb)
> +eb_get_batch(struct i915_execbuffer *eb)
>   {
>   	struct i915_vma *vma = list_entry(eb->vmas.prev, typeof(*vma), exec_list);
>   
> @@ -132,41 +135,37 @@ eb_get_batch(struct eb_vmas *eb)
>   }
>   
>   static int
> -eb_lookup_vmas(struct eb_vmas *eb,
> -	       struct drm_i915_gem_exec_object2 *exec,
> -	       const struct drm_i915_gem_execbuffer2 *args,
> -	       struct i915_address_space *vm,
> -	       struct drm_file *file)
> +eb_lookup_vmas(struct i915_execbuffer *eb)
>   {
>   	struct drm_i915_gem_object *obj;
>   	struct list_head objects;
>   	int i, ret;
>   
>   	INIT_LIST_HEAD(&objects);
> -	spin_lock(&file->table_lock);
> +	spin_lock(&eb->file->table_lock);
>   	/* Grab a reference to the object and release the lock so we can lookup
>   	 * or create the VMA without using GFP_ATOMIC */
> -	for (i = 0; i < args->buffer_count; i++) {
> -		obj = to_intel_bo(idr_find(&file->object_idr, exec[i].handle));
> +	for (i = 0; i < eb->args->buffer_count; i++) {
> +		obj = to_intel_bo(idr_find(&eb->file->object_idr, eb->exec[i].handle));
>   		if (obj == NULL) {
> -			spin_unlock(&file->table_lock);
> +			spin_unlock(&eb->file->table_lock);
>   			DRM_DEBUG("Invalid object handle %d at index %d\n",
> -				   exec[i].handle, i);
> +				   eb->exec[i].handle, i);
>   			ret = -ENOENT;
>   			goto err;
>   		}
>   
>   		if (!list_empty(&obj->obj_exec_link)) {
> -			spin_unlock(&file->table_lock);
> +			spin_unlock(&eb->file->table_lock);
>   			DRM_DEBUG("Object %p [handle %d, index %d] appears more than once in object list\n",
> -				   obj, exec[i].handle, i);
> +				   obj, eb->exec[i].handle, i);
>   			ret = -EINVAL;
>   			goto err;
>   		}
>   
>   		list_add_tail(&obj->obj_exec_link, &objects);
>   	}
> -	spin_unlock(&file->table_lock);
> +	spin_unlock(&eb->file->table_lock);
>   
>   	i = 0;
>   	while (!list_empty(&objects)) {
> @@ -184,7 +183,7 @@ eb_lookup_vmas(struct eb_vmas *eb,
>   		 * from the (obj, vm) we don't run the risk of creating
>   		 * duplicated vmas for the same vm.
>   		 */
> -		vma = i915_gem_obj_lookup_or_create_vma(obj, vm, NULL);
> +		vma = i915_gem_obj_lookup_or_create_vma(obj, eb->vm, NULL);
>   		if (unlikely(IS_ERR(vma))) {
>   			DRM_DEBUG("Failed to lookup VMA\n");
>   			ret = PTR_ERR(vma);
> @@ -195,11 +194,13 @@ eb_lookup_vmas(struct eb_vmas *eb,
>   		list_add_tail(&vma->exec_list, &eb->vmas);
>   		list_del_init(&obj->obj_exec_link);
>   
> -		vma->exec_entry = &exec[i];
> +		vma->exec_entry = &eb->exec[i];
>   		if (eb->and < 0) {
>   			eb->lut[i] = vma;
>   		} else {
> -			uint32_t handle = args->flags & I915_EXEC_HANDLE_LUT ? i : exec[i].handle;
> +			u32 handle =
> +				eb->args->flags & I915_EXEC_HANDLE_LUT ?
> +				i : eb->exec[i].handle;
>   			vma->exec_handle = handle;
>   			hlist_add_head(&vma->exec_node,
>   				       &eb->buckets[handle & eb->and]);
> @@ -226,7 +227,7 @@ err:
>   	return ret;
>   }
>   
> -static struct i915_vma *eb_get_vma(struct eb_vmas *eb, unsigned long handle)
> +static struct i915_vma *eb_get_vma(struct i915_execbuffer *eb, unsigned long handle)
>   {
>   	if (eb->and < 0) {
>   		if (handle >= -eb->and)
> @@ -246,7 +247,7 @@ static struct i915_vma *eb_get_vma(struct eb_vmas *eb, unsigned long handle)
>   }
>   
>   static void
> -i915_gem_execbuffer_unreserve_vma(struct i915_vma *vma)
> +eb_unreserve_vma(struct i915_vma *vma)
>   {
>   	struct drm_i915_gem_exec_object2 *entry;
>   
> @@ -264,8 +265,10 @@ i915_gem_execbuffer_unreserve_vma(struct i915_vma *vma)
>   	entry->flags &= ~(__EXEC_OBJECT_HAS_FENCE | __EXEC_OBJECT_HAS_PIN);
>   }
>   
> -static void eb_destroy(struct eb_vmas *eb)
> +static void eb_destroy(struct i915_execbuffer *eb)
>   {
> +	i915_gem_context_put(eb->ctx);
> +
>   	while (!list_empty(&eb->vmas)) {
>   		struct i915_vma *vma;
>   
> @@ -273,9 +276,8 @@ static void eb_destroy(struct eb_vmas *eb)
>   				       struct i915_vma,
>   				       exec_list);
>   		list_del_init(&vma->exec_list);
> -		i915_gem_execbuffer_unreserve_vma(vma);
> +		eb_unreserve_vma(vma);
>   	}
> -	kfree(eb);
>   }
>   
>   static inline int use_cpu_reloc(struct drm_i915_gem_object *obj)
> @@ -316,21 +318,12 @@ relocation_target(const struct drm_i915_gem_relocation_entry *reloc,
>   	return gen8_canonical_addr((int)reloc->delta + target_offset);
>   }
>   
> -struct reloc_cache {
> -	struct drm_i915_private *i915;
> -	struct drm_mm_node node;
> -	unsigned long vaddr;
> -	unsigned int page;
> -	bool use_64bit_reloc;
> -};
> -
>   static void reloc_cache_init(struct reloc_cache *cache,
>   			     struct drm_i915_private *i915)
>   {
>   	cache->page = -1;
>   	cache->vaddr = 0;
> -	cache->i915 = i915;
> -	cache->use_64bit_reloc = INTEL_GEN(cache->i915) >= 8;
> +	cache->use_64bit_reloc = INTEL_GEN(i915) >= 8;
>   	cache->node.allocated = false;
>   }
>   
> @@ -346,7 +339,14 @@ static inline unsigned int unmask_flags(unsigned long p)
>   
>   #define KMAP 0x4 /* after CLFLUSH_FLAGS */
>   
> -static void reloc_cache_fini(struct reloc_cache *cache)
> +static inline struct i915_ggtt *cache_to_ggtt(struct reloc_cache *cache)
> +{
> +	struct drm_i915_private *i915 =
> +		container_of(cache, struct i915_execbuffer, reloc_cache)->i915;
> +	return &i915->ggtt;
> +}
> +
> +static void reloc_cache_reset(struct reloc_cache *cache)
>   {
>   	void *vaddr;
>   
> @@ -364,7 +364,7 @@ static void reloc_cache_fini(struct reloc_cache *cache)
>   		wmb();
>   		io_mapping_unmap_atomic((void __iomem *)vaddr);
>   		if (cache->node.allocated) {
> -			struct i915_ggtt *ggtt = &cache->i915->ggtt;
> +			struct i915_ggtt *ggtt = cache_to_ggtt(cache);
>   
>   			ggtt->base.clear_range(&ggtt->base,
>   					       cache->node.start,
> @@ -375,6 +375,9 @@ static void reloc_cache_fini(struct reloc_cache *cache)
>   			i915_vma_unpin((struct i915_vma *)cache->node.mm);
>   		}
>   	}
> +
> +	cache->vaddr = 0;
> +	cache->page = -1;
>   }
>   
>   static void *reloc_kmap(struct drm_i915_gem_object *obj,
> @@ -413,7 +416,7 @@ static void *reloc_iomap(struct drm_i915_gem_object *obj,
>   			 struct reloc_cache *cache,
>   			 int page)
>   {
> -	struct i915_ggtt *ggtt = &cache->i915->ggtt;
> +	struct i915_ggtt *ggtt = cache_to_ggtt(cache);
>   	unsigned long offset;
>   	void *vaddr;
>   
> @@ -472,7 +475,7 @@ static void *reloc_iomap(struct drm_i915_gem_object *obj,
>   		offset += page << PAGE_SHIFT;
>   	}
>   
> -	vaddr = io_mapping_map_atomic_wc(&cache->i915->ggtt.mappable, offset);
> +	vaddr = io_mapping_map_atomic_wc(&ggtt->mappable, offset);
>   	cache->page = page;
>   	cache->vaddr = (unsigned long)vaddr;
>   
> @@ -565,12 +568,10 @@ static bool object_is_idle(struct drm_i915_gem_object *obj)
>   }
>   
>   static int
> -i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
> -				   struct eb_vmas *eb,
> -				   struct drm_i915_gem_relocation_entry *reloc,
> -				   struct reloc_cache *cache)
> +eb_relocate_entry(struct drm_i915_gem_object *obj,
> +		  struct i915_execbuffer *eb,
> +		  struct drm_i915_gem_relocation_entry *reloc)
>   {
> -	struct drm_device *dev = obj->base.dev;
>   	struct drm_gem_object *target_obj;
>   	struct drm_i915_gem_object *target_i915_obj;
>   	struct i915_vma *target_vma;
> @@ -589,8 +590,8 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>   	/* Sandybridge PPGTT errata: We need a global gtt mapping for MI and
>   	 * pipe_control writes because the gpu doesn't properly redirect them
>   	 * through the ppgtt for non_secure batchbuffers. */
> -	if (unlikely(IS_GEN6(dev) &&
> -	    reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION)) {
> +	if (unlikely(IS_GEN6(eb->i915) &&
> +		     reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION)) {
>   		ret = i915_vma_bind(target_vma, target_i915_obj->cache_level,
>   				    PIN_GLOBAL);
>   		if (WARN_ONCE(ret, "Unexpected failure to bind target VMA!"))
> @@ -631,7 +632,7 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>   
>   	/* Check that the relocation address is valid... */
>   	if (unlikely(reloc->offset >
> -		     obj->base.size - (cache->use_64bit_reloc ? 8 : 4))) {
> +		     obj->base.size - (eb->reloc_cache.use_64bit_reloc ? 8 : 4))) {
>   		DRM_DEBUG("Relocation beyond object bounds: "
>   			  "obj %p target %d offset %d size %d.\n",
>   			  obj, reloc->target_handle,
> @@ -651,7 +652,7 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>   	if (pagefault_disabled() && !object_is_idle(obj))
>   		return -EFAULT;
>   
> -	ret = relocate_entry(obj, reloc, cache, target_offset);
> +	ret = relocate_entry(obj, reloc, &eb->reloc_cache, target_offset);
>   	if (ret)
>   		return ret;
>   
> @@ -660,19 +661,15 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>   	return 0;
>   }
>   
> -static int
> -i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
> -				 struct eb_vmas *eb)
> +static int eb_relocate_vma(struct i915_vma *vma, struct i915_execbuffer *eb)
>   {
>   #define N_RELOC(x) ((x) / sizeof(struct drm_i915_gem_relocation_entry))
>   	struct drm_i915_gem_relocation_entry stack_reloc[N_RELOC(512)];
>   	struct drm_i915_gem_relocation_entry __user *user_relocs;
>   	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
> -	struct reloc_cache cache;
>   	int remain, ret = 0;
>   
>   	user_relocs = u64_to_user_ptr(entry->relocs_ptr);
> -	reloc_cache_init(&cache, eb->i915);
>   
>   	remain = entry->relocation_count;
>   	while (remain) {
> @@ -690,7 +687,7 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
>   		do {
>   			u64 offset = r->presumed_offset;
>   
> -			ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, r, &cache);
> +			ret = eb_relocate_entry(vma->obj, eb, r);
>   			if (ret)
>   				goto out;
>   
> @@ -707,33 +704,29 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
>   	}
>   
>   out:
> -	reloc_cache_fini(&cache);
> +	reloc_cache_reset(&eb->reloc_cache);
>   	return ret;
>   #undef N_RELOC
>   }
>   
>   static int
> -i915_gem_execbuffer_relocate_vma_slow(struct i915_vma *vma,
> -				      struct eb_vmas *eb,
> -				      struct drm_i915_gem_relocation_entry *relocs)
> +eb_relocate_vma_slow(struct i915_vma *vma,
> +		     struct i915_execbuffer *eb,
> +		     struct drm_i915_gem_relocation_entry *relocs)
>   {
>   	const struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
> -	struct reloc_cache cache;
>   	int i, ret = 0;
>   
> -	reloc_cache_init(&cache, eb->i915);
>   	for (i = 0; i < entry->relocation_count; i++) {
> -		ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, &relocs[i], &cache);
> +		ret = eb_relocate_entry(vma->obj, eb, &relocs[i]);
>   		if (ret)
>   			break;
>   	}
> -	reloc_cache_fini(&cache);
> -
> +	reloc_cache_reset(&eb->reloc_cache);
>   	return ret;
>   }
>   
> -static int
> -i915_gem_execbuffer_relocate(struct eb_vmas *eb)
> +static int eb_relocate(struct i915_execbuffer *eb)
>   {
>   	struct i915_vma *vma;
>   	int ret = 0;
> @@ -747,7 +740,7 @@ i915_gem_execbuffer_relocate(struct eb_vmas *eb)
>   	 */
>   	pagefault_disable();
>   	list_for_each_entry(vma, &eb->vmas, exec_list) {
> -		ret = i915_gem_execbuffer_relocate_vma(vma, eb);
> +		ret = eb_relocate_vma(vma, eb);
>   		if (ret)
>   			break;
>   	}
> @@ -763,9 +756,9 @@ static bool only_mappable_for_reloc(unsigned int flags)
>   }
>   
>   static int
> -i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
> -				struct intel_engine_cs *engine,
> -				bool *need_reloc)
> +eb_reserve_vma(struct i915_vma *vma,
> +	       struct intel_engine_cs *engine,
> +	       bool *need_reloc)
>   {
>   	struct drm_i915_gem_object *obj = vma->obj;
>   	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
> @@ -885,33 +878,26 @@ eb_vma_misplaced(struct i915_vma *vma)
>   	return false;
>   }
>   
> -static int
> -i915_gem_execbuffer_reserve(struct intel_engine_cs *engine,
> -			    struct list_head *vmas,
> -			    struct i915_gem_context *ctx,
> -			    bool *need_relocs)
> +static int eb_reserve(struct i915_execbuffer *eb)
>   {
> +	const bool has_fenced_gpu_access = INTEL_GEN(eb->i915) < 4;
>   	struct drm_i915_gem_object *obj;
>   	struct i915_vma *vma;
> -	struct i915_address_space *vm;
>   	struct list_head ordered_vmas;
>   	struct list_head pinned_vmas;
> -	bool has_fenced_gpu_access = INTEL_GEN(engine->i915) < 4;
>   	int retry;
>   
> -	vm = list_first_entry(vmas, struct i915_vma, exec_list)->vm;
> -
>   	INIT_LIST_HEAD(&ordered_vmas);
>   	INIT_LIST_HEAD(&pinned_vmas);
> -	while (!list_empty(vmas)) {
> +	while (!list_empty(&eb->vmas)) {
>   		struct drm_i915_gem_exec_object2 *entry;
>   		bool need_fence, need_mappable;
>   
> -		vma = list_first_entry(vmas, struct i915_vma, exec_list);
> +		vma = list_first_entry(&eb->vmas, struct i915_vma, exec_list);
>   		obj = vma->obj;
>   		entry = vma->exec_entry;
>   
> -		if (ctx->flags & CONTEXT_NO_ZEROMAP)
> +		if (eb->ctx->flags & CONTEXT_NO_ZEROMAP)
>   			entry->flags |= __EXEC_OBJECT_NEEDS_BIAS;
>   
>   		if (!has_fenced_gpu_access)
> @@ -932,8 +918,8 @@ i915_gem_execbuffer_reserve(struct intel_engine_cs *engine,
>   		obj->base.pending_read_domains = I915_GEM_GPU_DOMAINS & ~I915_GEM_DOMAIN_COMMAND;
>   		obj->base.pending_write_domain = 0;
>   	}
> -	list_splice(&ordered_vmas, vmas);
> -	list_splice(&pinned_vmas, vmas);
> +	list_splice(&ordered_vmas, &eb->vmas);
> +	list_splice(&pinned_vmas, &eb->vmas);
>   
>   	/* Attempt to pin all of the buffers into the GTT.
>   	 * This is done in 3 phases:
> @@ -952,27 +938,24 @@ i915_gem_execbuffer_reserve(struct intel_engine_cs *engine,
>   		int ret = 0;
>   
>   		/* Unbind any ill-fitting objects or pin. */
> -		list_for_each_entry(vma, vmas, exec_list) {
> +		list_for_each_entry(vma, &eb->vmas, exec_list) {
>   			if (!drm_mm_node_allocated(&vma->node))
>   				continue;
>   
>   			if (eb_vma_misplaced(vma))
>   				ret = i915_vma_unbind(vma);
>   			else
> -				ret = i915_gem_execbuffer_reserve_vma(vma,
> -								      engine,
> -								      need_relocs);
> +				ret = eb_reserve_vma(vma, eb->engine, &eb->need_relocs);
>   			if (ret)
>   				goto err;
>   		}
>   
>   		/* Bind fresh objects */
> -		list_for_each_entry(vma, vmas, exec_list) {
> +		list_for_each_entry(vma, &eb->vmas, exec_list) {
>   			if (drm_mm_node_allocated(&vma->node))
>   				continue;
>   
> -			ret = i915_gem_execbuffer_reserve_vma(vma, engine,
> -							      need_relocs);
> +			ret = eb_reserve_vma(vma, eb->engine, &eb->need_relocs);
>   			if (ret)
>   				goto err;
>   		}
> @@ -982,46 +965,37 @@ err:
>   			return ret;
>   
>   		/* Decrement pin count for bound objects */
> -		list_for_each_entry(vma, vmas, exec_list)
> -			i915_gem_execbuffer_unreserve_vma(vma);
> +		list_for_each_entry(vma, &eb->vmas, exec_list)
> +			eb_unreserve_vma(vma);
>   
> -		ret = i915_gem_evict_vm(vm, true);
> +		ret = i915_gem_evict_vm(eb->vm, true);
>   		if (ret)
>   			return ret;
>   	} while (1);
>   }
>   
>   static int
> -i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
> -				  struct drm_i915_gem_execbuffer2 *args,
> -				  struct drm_file *file,
> -				  struct intel_engine_cs *engine,
> -				  struct eb_vmas *eb,
> -				  struct drm_i915_gem_exec_object2 *exec,
> -				  struct i915_gem_context *ctx)
> +eb_relocate_slow(struct i915_execbuffer *eb)
>   {
> +	const unsigned int count = eb->args->buffer_count;
> +	struct drm_device *dev = &eb->i915->drm;
>   	struct drm_i915_gem_relocation_entry *reloc;
> -	struct i915_address_space *vm;
>   	struct i915_vma *vma;
> -	bool need_relocs;
>   	int *reloc_offset;
>   	int i, total, ret;
> -	unsigned count = args->buffer_count;
> -
> -	vm = list_first_entry(&eb->vmas, struct i915_vma, exec_list)->vm;
>   
>   	/* We may process another execbuffer during the unlock... */
>   	while (!list_empty(&eb->vmas)) {
>   		vma = list_first_entry(&eb->vmas, struct i915_vma, exec_list);
>   		list_del_init(&vma->exec_list);
> -		i915_gem_execbuffer_unreserve_vma(vma);
> +		eb_unreserve_vma(vma);
>   	}
>   
>   	mutex_unlock(&dev->struct_mutex);
>   
>   	total = 0;
>   	for (i = 0; i < count; i++)
> -		total += exec[i].relocation_count;
> +		total += eb->exec[i].relocation_count;
>   
>   	reloc_offset = drm_malloc_ab(count, sizeof(*reloc_offset));
>   	reloc = drm_malloc_ab(total, sizeof(*reloc));
> @@ -1038,10 +1012,10 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
>   		u64 invalid_offset = (u64)-1;
>   		int j;
>   
> -		user_relocs = u64_to_user_ptr(exec[i].relocs_ptr);
> +		user_relocs = u64_to_user_ptr(eb->exec[i].relocs_ptr);
>   
>   		if (copy_from_user(reloc+total, user_relocs,
> -				   exec[i].relocation_count * sizeof(*reloc))) {
> +				   eb->exec[i].relocation_count * sizeof(*reloc))) {
>   			ret = -EFAULT;
>   			mutex_lock(&dev->struct_mutex);
>   			goto err;
> @@ -1056,7 +1030,7 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
>   		 * happened we would make the mistake of assuming that the
>   		 * relocations were valid.
>   		 */
> -		for (j = 0; j < exec[i].relocation_count; j++) {
> +		for (j = 0; j < eb->exec[i].relocation_count; j++) {
>   			if (__copy_to_user(&user_relocs[j].presumed_offset,
>   					   &invalid_offset,
>   					   sizeof(invalid_offset))) {
> @@ -1067,7 +1041,7 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
>   		}
>   
>   		reloc_offset[i] = total;
> -		total += exec[i].relocation_count;
> +		total += eb->exec[i].relocation_count;
>   	}
>   
>   	ret = i915_mutex_lock_interruptible(dev);
> @@ -1078,20 +1052,18 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
>   
>   	/* reacquire the objects */
>   	eb_reset(eb);
> -	ret = eb_lookup_vmas(eb, exec, args, vm, file);
> +	ret = eb_lookup_vmas(eb);
>   	if (ret)
>   		goto err;
>   
> -	need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
> -	ret = i915_gem_execbuffer_reserve(engine, &eb->vmas, ctx,
> -					  &need_relocs);
> +	ret = eb_reserve(eb);
>   	if (ret)
>   		goto err;
>   
>   	list_for_each_entry(vma, &eb->vmas, exec_list) {
> -		int offset = vma->exec_entry - exec;
> -		ret = i915_gem_execbuffer_relocate_vma_slow(vma, eb,
> -							    reloc + reloc_offset[offset]);
> +		int idx = vma->exec_entry - eb->exec;
> +
> +		ret = eb_relocate_vma_slow(vma, eb, reloc + reloc_offset[idx]);
>   		if (ret)
>   			goto err;
>   	}
> @@ -1108,29 +1080,28 @@ err:
>   	return ret;
>   }
>   
> -static unsigned int eb_other_engines(struct drm_i915_gem_request *req)
> +static unsigned int eb_other_engines(struct i915_execbuffer *eb)
>   {
>   	unsigned int mask;
>   
> -	mask = ~intel_engine_flag(req->engine) & I915_BO_ACTIVE_MASK;
> +	mask = ~intel_engine_flag(eb->engine) & I915_BO_ACTIVE_MASK;
>   	mask <<= I915_BO_ACTIVE_SHIFT;
>   
>   	return mask;
>   }
>   
>   static int
> -i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
> -				struct list_head *vmas)
> +eb_move_to_gpu(struct i915_execbuffer *eb)
>   {
> -	const unsigned int other_rings = eb_other_engines(req);
> +	const unsigned int other_rings = eb_other_engines(eb);
>   	struct i915_vma *vma;
>   	int ret;
>   
> -	list_for_each_entry(vma, vmas, exec_list) {
> +	list_for_each_entry(vma, &eb->vmas, exec_list) {
>   		struct drm_i915_gem_object *obj = vma->obj;
>   
>   		if (obj->flags & other_rings) {
> -			ret = i915_gem_object_sync(obj, req);
> +			ret = i915_gem_object_sync(obj, eb->request);
>   			if (ret)
>   				return ret;
>   		}
> @@ -1140,10 +1111,10 @@ i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
>   	}
>   
>   	/* Unconditionally flush any chipset caches (for streaming writes). */
> -	i915_gem_chipset_flush(req->engine->i915);
> +	i915_gem_chipset_flush(eb->i915);
>   
>   	/* Unconditionally invalidate GPU caches and TLBs. */
> -	return req->engine->emit_flush(req, EMIT_INVALIDATE);
> +	return eb->engine->emit_flush(eb->request, EMIT_INVALIDATE);
>   }
>   
>   static bool
> @@ -1246,24 +1217,25 @@ validate_exec_list(struct drm_device *dev,
>   	return 0;
>   }
>   
> -static struct i915_gem_context *
> -i915_gem_validate_context(struct drm_device *dev, struct drm_file *file,
> -			  struct intel_engine_cs *engine, const u32 ctx_id)
> +static int eb_select_context(struct i915_execbuffer *eb)
>   {
>   	struct i915_gem_context *ctx;
> -	struct i915_ctx_hang_stats *hs;
> +	unsigned int ctx_id;
>   
> -	ctx = i915_gem_context_lookup(file->driver_priv, ctx_id);
> -	if (IS_ERR(ctx))
> -		return ctx;
> +	ctx_id = i915_execbuffer2_get_context_id(*eb->args);
> +	ctx = i915_gem_context_lookup(eb->file->driver_priv, ctx_id);
> +	if (unlikely(IS_ERR(ctx)))
> +		return PTR_ERR(ctx);
>   
> -	hs = &ctx->hang_stats;
> -	if (hs->banned) {
> +	if (unlikely(ctx->hang_stats.banned)) {
>   		DRM_DEBUG("Context %u tried to submit while banned\n", ctx_id);
> -		return ERR_PTR(-EIO);
> +		return -EIO;
>   	}
>   
> -	return ctx;
> +	eb->ctx = i915_gem_context_get(ctx);
> +	eb->vm = ctx->ppgtt ? &ctx->ppgtt->base : &eb->i915->ggtt.base;
> +
> +	return 0;
>   }
>   
>   void i915_vma_move_to_active(struct i915_vma *vma,
> @@ -1325,12 +1297,11 @@ static void eb_export_fence(struct drm_i915_gem_object *obj,
>   }
>   
>   static void
> -i915_gem_execbuffer_move_to_active(struct list_head *vmas,
> -				   struct drm_i915_gem_request *req)
> +eb_move_to_active(struct i915_execbuffer *eb)
>   {
>   	struct i915_vma *vma;
>   
> -	list_for_each_entry(vma, vmas, exec_list) {
> +	list_for_each_entry(vma, &eb->vmas, exec_list) {
>   		struct drm_i915_gem_object *obj = vma->obj;
>   		u32 old_read = obj->base.read_domains;
>   		u32 old_write = obj->base.write_domain;
> @@ -1342,8 +1313,8 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
>   			obj->base.pending_read_domains |= obj->base.read_domains;
>   		obj->base.read_domains = obj->base.pending_read_domains;
>   
> -		i915_vma_move_to_active(vma, req, vma->exec_entry->flags);
> -		eb_export_fence(obj, req, vma->exec_entry->flags);
> +		i915_vma_move_to_active(vma, eb->request, vma->exec_entry->flags);
> +		eb_export_fence(obj, eb->request, vma->exec_entry->flags);
>   		trace_i915_gem_object_change_domain(obj, old_read, old_write);
>   	}
>   }
> @@ -1374,29 +1345,22 @@ i915_reset_gen7_sol_offsets(struct drm_i915_gem_request *req)
>   	return 0;
>   }
>   
> -static struct i915_vma *
> -i915_gem_execbuffer_parse(struct intel_engine_cs *engine,
> -			  struct drm_i915_gem_exec_object2 *shadow_exec_entry,
> -			  struct drm_i915_gem_object *batch_obj,
> -			  struct eb_vmas *eb,
> -			  u32 batch_start_offset,
> -			  u32 batch_len,
> -			  bool is_master)
> +static struct i915_vma *eb_parse(struct i915_execbuffer *eb, bool is_master)
>   {
>   	struct drm_i915_gem_object *shadow_batch_obj;
>   	struct i915_vma *vma;
>   	int ret;
>   
> -	shadow_batch_obj = i915_gem_batch_pool_get(&engine->batch_pool,
> -						   PAGE_ALIGN(batch_len));
> +	shadow_batch_obj = i915_gem_batch_pool_get(&eb->engine->batch_pool,
> +						   PAGE_ALIGN(eb->args->batch_len));
>   	if (IS_ERR(shadow_batch_obj))
>   		return ERR_CAST(shadow_batch_obj);
>   
> -	ret = intel_engine_cmd_parser(engine,
> -				      batch_obj,
> +	ret = intel_engine_cmd_parser(eb->engine,
> +				      eb->batch->obj,
>   				      shadow_batch_obj,
> -				      batch_start_offset,
> -				      batch_len,
> +				      eb->args->batch_start_offset,
> +				      eb->args->batch_len,
>   				      is_master);
>   	if (ret) {
>   		if (ret == -EACCES) /* unhandled chained batch */
> @@ -1410,9 +1374,8 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *engine,
>   	if (IS_ERR(vma))
>   		goto out;
>   
> -	memset(shadow_exec_entry, 0, sizeof(*shadow_exec_entry));
> -
> -	vma->exec_entry = shadow_exec_entry;
> +	vma->exec_entry =
> +		memset(&eb->shadow_exec_entry, 0, sizeof(*vma->exec_entry));
I would say leave this as two lines. Merging them doesn't gain you 
anything but gives you an ugly line break mid-assignment.

>   	vma->exec_entry->flags = __EXEC_OBJECT_HAS_PIN;
>   	list_add_tail(&vma->exec_list, &eb->vmas);
>   
> @@ -1430,49 +1393,45 @@ add_to_client(struct drm_i915_gem_request *req,
>   }
>   
>   static int
> -execbuf_submit(struct i915_execbuffer_params *params,
> -	       struct drm_i915_gem_execbuffer2 *args,
> -	       struct list_head *vmas)
> +execbuf_submit(struct i915_execbuffer *eb)
>   {
> -	struct drm_i915_private *dev_priv = params->request->i915;
> -	u64 exec_start, exec_len;
>   	int instp_mode;
>   	u32 instp_mask;
>   	int ret;
>   
> -	ret = i915_gem_execbuffer_move_to_gpu(params->request, vmas);
> +	ret = eb_move_to_gpu(eb);
>   	if (ret)
>   		return ret;
>   
> -	ret = i915_switch_context(params->request);
> +	ret = i915_switch_context(eb->request);
>   	if (ret)
>   		return ret;
>   
> -	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
> +	instp_mode = eb->args->flags & I915_EXEC_CONSTANTS_MASK;
>   	instp_mask = I915_EXEC_CONSTANTS_MASK;
>   	switch (instp_mode) {
>   	case I915_EXEC_CONSTANTS_REL_GENERAL:
>   	case I915_EXEC_CONSTANTS_ABSOLUTE:
>   	case I915_EXEC_CONSTANTS_REL_SURFACE:
> -		if (instp_mode != 0 && params->engine->id != RCS) {
> +		if (instp_mode != 0 && eb->engine->id != RCS) {
>   			DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
>   			return -EINVAL;
>   		}
>   
> -		if (instp_mode != dev_priv->relative_constants_mode) {
> -			if (INTEL_INFO(dev_priv)->gen < 4) {
> +		if (instp_mode != eb->i915->relative_constants_mode) {
> +			if (INTEL_INFO(eb->i915)->gen < 4) {
>   				DRM_DEBUG("no rel constants on pre-gen4\n");
>   				return -EINVAL;
>   			}
>   
> -			if (INTEL_INFO(dev_priv)->gen > 5 &&
> +			if (INTEL_INFO(eb->i915)->gen > 5 &&
>   			    instp_mode == I915_EXEC_CONSTANTS_REL_SURFACE) {
>   				DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
>   				return -EINVAL;
>   			}
>   
>   			/* The HW changed the meaning on this bit on gen6 */
> -			if (INTEL_INFO(dev_priv)->gen >= 6)
> +			if (INTEL_INFO(eb->i915)->gen >= 6)
>   				instp_mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
>   		}
>   		break;
> @@ -1481,11 +1440,11 @@ execbuf_submit(struct i915_execbuffer_params *params,
>   		return -EINVAL;
>   	}
>   
> -	if (params->engine->id == RCS &&
> -	    instp_mode != dev_priv->relative_constants_mode) {
> -		struct intel_ring *ring = params->request->ring;
> +	if (eb->engine->id == RCS &&
> +	    instp_mode != eb->i915->relative_constants_mode) {
> +		struct intel_ring *ring = eb->request->ring;
>   
> -		ret = intel_ring_begin(params->request, 4);
> +		ret = intel_ring_begin(eb->request, 4);
>   		if (ret)
>   			return ret;
>   
> @@ -1495,32 +1454,27 @@ execbuf_submit(struct i915_execbuffer_params *params,
>   		intel_ring_emit(ring, instp_mask << 16 | instp_mode);
>   		intel_ring_advance(ring);
>   
> -		dev_priv->relative_constants_mode = instp_mode;
> +		eb->i915->relative_constants_mode = instp_mode;
>   	}
>   
> -	if (args->flags & I915_EXEC_GEN7_SOL_RESET) {
> -		ret = i915_reset_gen7_sol_offsets(params->request);
> +	if (eb->args->flags & I915_EXEC_GEN7_SOL_RESET) {
> +		ret = i915_reset_gen7_sol_offsets(eb->request);
>   		if (ret)
>   			return ret;
>   	}
>   
> -	exec_len   = args->batch_len;
> -	exec_start = params->batch->node.start +
> -		     params->args_batch_start_offset;
> -
> -	if (exec_len == 0)
> -		exec_len = params->batch->size - params->args_batch_start_offset;
> -
> -	ret = params->engine->emit_bb_start(params->request,
> -					    exec_start, exec_len,
> -					    params->dispatch_flags);
> +	ret = eb->engine->emit_bb_start(eb->request,
> +					eb->batch->node.start +
> +					eb->batch_start_offset,
> +					eb->args->batch_len,
> +					eb->dispatch_flags);
>   	if (ret)
>   		return ret;
>   
> -	trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
> +	trace_i915_gem_ring_dispatch(eb->request, eb->dispatch_flags);
>   
> -	i915_gem_execbuffer_move_to_active(vmas, params->request);
> -	add_to_client(params->request, params->file);
> +	eb_move_to_active(eb);
> +	add_to_client(eb->request, eb->file);
>   
>   	return 0;
>   }
> @@ -1606,24 +1560,13 @@ eb_select_engine(struct drm_i915_private *dev_priv,
>   }
>   
>   static int
> -i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> +i915_gem_do_execbuffer(struct drm_device *dev,
>   		       struct drm_file *file,
>   		       struct drm_i915_gem_execbuffer2 *args,
>   		       struct drm_i915_gem_exec_object2 *exec)
>   {
> -	struct drm_i915_private *dev_priv = to_i915(dev);
> -	struct i915_ggtt *ggtt = &dev_priv->ggtt;
> -	struct eb_vmas *eb;
> -	struct drm_i915_gem_exec_object2 shadow_exec_entry;
> -	struct intel_engine_cs *engine;
> -	struct i915_gem_context *ctx;
> -	struct i915_address_space *vm;
> -	struct i915_execbuffer_params params_master; /* XXX: will be removed later */
> -	struct i915_execbuffer_params *params = &params_master;
> -	const u32 ctx_id = i915_execbuffer2_get_context_id(*args);
> -	u32 dispatch_flags;
> +	struct i915_execbuffer eb;
>   	int ret;
> -	bool need_relocs;
>   
>   	if (!i915_gem_check_execbuffer(args))
>   		return -EINVAL;
> @@ -1632,37 +1575,39 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   	if (ret)
>   		return ret;
>   
> -	dispatch_flags = 0;
> +	eb.i915 = to_i915(dev);
> +	eb.file = file;
> +	eb.args = args;
> +	eb.exec = exec;
> +	eb.need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
> +	reloc_cache_init(&eb.reloc_cache, eb.i915);
> +
> +	eb.dispatch_flags = 0;
>   	if (args->flags & I915_EXEC_SECURE) {
>   		if (!drm_is_current_master(file) || !capable(CAP_SYS_ADMIN))
>   		    return -EPERM;
>   
> -		dispatch_flags |= I915_DISPATCH_SECURE;
> +		eb.dispatch_flags |= I915_DISPATCH_SECURE;
>   	}
>   	if (args->flags & I915_EXEC_IS_PINNED)
> -		dispatch_flags |= I915_DISPATCH_PINNED;
> -
> -	engine = eb_select_engine(dev_priv, file, args);
> -	if (!engine)
> -		return -EINVAL;
> +		eb.dispatch_flags |= I915_DISPATCH_PINNED;
>   
> -	if (args->buffer_count < 1) {
> -		DRM_DEBUG("execbuf with %d buffers\n", args->buffer_count);
> +	eb.engine = eb_select_engine(eb.i915, file, args);
> +	if (!eb.engine)
>   		return -EINVAL;
> -	}
>   
>   	if (args->flags & I915_EXEC_RESOURCE_STREAMER) {
> -		if (!HAS_RESOURCE_STREAMER(dev)) {
> +		if (!HAS_RESOURCE_STREAMER(eb.i915)) {
>   			DRM_DEBUG("RS is only allowed for Haswell, Gen8 and above\n");
>   			return -EINVAL;
>   		}
> -		if (engine->id != RCS) {
> +		if (eb.engine->id != RCS) {
>   			DRM_DEBUG("RS is not available on %s\n",
> -				 engine->name);
> +				 eb.engine->name);
>   			return -EINVAL;
>   		}
>   
> -		dispatch_flags |= I915_DISPATCH_RS;
> +		eb.dispatch_flags |= I915_DISPATCH_RS;
>   	}
>   
>   	/* Take a local wakeref for preparing to dispatch the execbuf as
> @@ -1671,59 +1616,44 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   	 * wakeref that we hold until the GPU has been idle for at least
>   	 * 100ms.
>   	 */
> -	intel_runtime_pm_get(dev_priv);
> +	intel_runtime_pm_get(eb.i915);
>   
>   	ret = i915_mutex_lock_interruptible(dev);
>   	if (ret)
>   		goto pre_mutex_err;
>   
> -	ctx = i915_gem_validate_context(dev, file, engine, ctx_id);
> -	if (IS_ERR(ctx)) {
> +	ret = eb_select_context(&eb);
> +	if (ret) {
>   		mutex_unlock(&dev->struct_mutex);
> -		ret = PTR_ERR(ctx);
>   		goto pre_mutex_err;
>   	}
>   
> -	i915_gem_context_get(ctx);
> -
> -	if (ctx->ppgtt)
> -		vm = &ctx->ppgtt->base;
> -	else
> -		vm = &ggtt->base;
> -
> -	memset(&params_master, 0x00, sizeof(params_master));
> -
> -	eb = eb_create(dev_priv, args);
> -	if (eb == NULL) {
> -		i915_gem_context_put(ctx);
> +	if (eb_create(&eb)) {
> +		i915_gem_context_put(eb.ctx);
>   		mutex_unlock(&dev->struct_mutex);
>   		ret = -ENOMEM;
>   		goto pre_mutex_err;
>   	}
>   
>   	/* Look up object handles */
> -	ret = eb_lookup_vmas(eb, exec, args, vm, file);
> +	ret = eb_lookup_vmas(&eb);
>   	if (ret)
>   		goto err;
>   
>   	/* take note of the batch buffer before we might reorder the lists */
> -	params->batch = eb_get_batch(eb);
> +	eb.batch = eb_get_batch(&eb);
>   
>   	/* Move the objects en-masse into the GTT, evicting if necessary. */
> -	need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
> -	ret = i915_gem_execbuffer_reserve(engine, &eb->vmas, ctx,
> -					  &need_relocs);
> +	ret = eb_reserve(&eb);
>   	if (ret)
>   		goto err;
>   
>   	/* The objects are in their final locations, apply the relocations. */
> -	if (need_relocs)
> -		ret = i915_gem_execbuffer_relocate(eb);
> +	if (eb.need_relocs)
> +		ret = eb_relocate(&eb);
>   	if (ret) {
>   		if (ret == -EFAULT) {
> -			ret = i915_gem_execbuffer_relocate_slow(dev, args, file,
> -								engine,
> -								eb, exec, ctx);
> +			ret = eb_relocate_slow(&eb);
>   			BUG_ON(!mutex_is_locked(&dev->struct_mutex));
>   		}
>   		if (ret)
> @@ -1731,28 +1661,23 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   	}
>   
>   	/* Set the pending read domains for the batch buffer to COMMAND */
> -	if (params->batch->obj->base.pending_write_domain) {
> +	if (eb.batch->obj->base.pending_write_domain) {
>   		DRM_DEBUG("Attempting to use self-modifying batch buffer\n");
>   		ret = -EINVAL;
>   		goto err;
>   	}
> -	if (args->batch_start_offset > params->batch->size ||
> -	    args->batch_len > params->batch->size - args->batch_start_offset) {
> +	if (args->batch_start_offset > eb.batch->size ||
> +	    args->batch_len > eb.batch->size - args->batch_start_offset) {
>   		DRM_DEBUG("Attempting to use out-of-bounds batch\n");
>   		ret = -EINVAL;
>   		goto err;
>   	}
>   
> -	params->args_batch_start_offset = args->batch_start_offset;
> -	if (intel_engine_needs_cmd_parser(engine) && args->batch_len) {
> +	eb.batch_start_offset = args->batch_start_offset;
> +	if (intel_engine_needs_cmd_parser(eb.engine) && args->batch_len) {
>   		struct i915_vma *vma;
>   
> -		vma = i915_gem_execbuffer_parse(engine, &shadow_exec_entry,
> -						params->batch->obj,
> -						eb,
> -						args->batch_start_offset,
> -						args->batch_len,
> -						drm_is_current_master(file));
> +		vma = eb_parse(&eb, drm_is_current_master(file));
>   		if (IS_ERR(vma)) {
>   			ret = PTR_ERR(vma);
>   			goto err;
> @@ -1768,19 +1693,21 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   			 * specifically don't want that set on batches the
>   			 * command parser has accepted.
>   			 */
> -			dispatch_flags |= I915_DISPATCH_SECURE;
> -			params->args_batch_start_offset = 0;
> -			params->batch = vma;
> +			eb.dispatch_flags |= I915_DISPATCH_SECURE;
> +			eb.batch_start_offset = 0;
> +			eb.batch = vma;
>   		}
>   	}
>   
> -	params->batch->obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
> +	eb.batch->obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
> +	if (args->batch_len == 0)
> +		args->batch_len = eb.batch->size - eb.batch_start_offset;
>   
>   	/* snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
>   	 * batch" bit. Hence we need to pin secure batches into the global gtt.
>   	 * hsw should have this fixed, but bdw mucks it up again. */
> -	if (dispatch_flags & I915_DISPATCH_SECURE) {
> -		struct drm_i915_gem_object *obj = params->batch->obj;
> +	if (eb.dispatch_flags & I915_DISPATCH_SECURE) {
> +		struct drm_i915_gem_object *obj = eb.batch->obj;
>   		struct i915_vma *vma;
>   
>   		/*
> @@ -1799,13 +1726,13 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   			goto err;
>   		}
>   
> -		params->batch = vma;
> +		eb.batch = vma;
>   	}
>   
>   	/* Allocate a request for this batch buffer nice and early. */
> -	params->request = i915_gem_request_alloc(engine, ctx);
> -	if (IS_ERR(params->request)) {
> -		ret = PTR_ERR(params->request);
> +	eb.request = i915_gem_request_alloc(eb.engine, eb.ctx);
> +	if (IS_ERR(eb.request)) {
> +		ret = PTR_ERR(eb.request);
>   		goto err_batch_unpin;
>   	}
>   
> @@ -1815,22 +1742,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   	 * inactive_list and lose its active reference. Hence we do not need
>   	 * to explicitly hold another reference here.
>   	 */
> -	params->request->batch = params->batch;
> +	eb.request->batch = eb.batch;
>   
> -	/*
> -	 * Save assorted stuff away to pass through to *_submission().
> -	 * NB: This data should be 'persistent' and not local as it will
> -	 * kept around beyond the duration of the IOCTL once the GPU
> -	 * scheduler arrives.
> -	 */
> -	params->dev                     = dev;
> -	params->file                    = file;
> -	params->engine                    = engine;
> -	params->dispatch_flags          = dispatch_flags;
> -	params->ctx                     = ctx;
> -
> -	ret = execbuf_submit(params, args, &eb->vmas);
> -	__i915_add_request(params->request, ret == 0);
> +	ret = execbuf_submit(&eb);
> +	__i915_add_request(eb.request, ret == 0);
>   
>   err_batch_unpin:
>   	/*
> @@ -1839,19 +1754,17 @@ err_batch_unpin:
>   	 * needs to be adjusted to also track the ggtt batch vma properly as
>   	 * active.
>   	 */
> -	if (dispatch_flags & I915_DISPATCH_SECURE)
> -		i915_vma_unpin(params->batch);
> +	if (eb.dispatch_flags & I915_DISPATCH_SECURE)
> +		i915_vma_unpin(eb.batch);
>   err:
>   	/* the request owns the ref now */
This comment refers to the dropping of the context reference on the line 
below. That line is now inside eb_destroy() so the comment should be 
moved/updated as well. Or just removed entirely.

> -	i915_gem_context_put(ctx);
> -	eb_destroy(eb);
> -
> +	eb_destroy(&eb);
>   	mutex_unlock(&dev->struct_mutex);
>   
>   pre_mutex_err:
>   	/* intel_gpu_busy should also get a ref, so it will free when the device
>   	 * is really idle. */
> -	intel_runtime_pm_put(dev_priv);
> +	intel_runtime_pm_put(eb.i915);
>   	return ret;
>   }
>   
> @@ -1918,7 +1831,7 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
>   	exec2.flags = I915_EXEC_RENDER;
>   	i915_execbuffer2_set_context_id(exec2, 0);
>   
> -	ret = i915_gem_do_execbuffer(dev, data, file, &exec2, exec2_list);
> +	ret = i915_gem_do_execbuffer(dev, file, &exec2, exec2_list);
>   	if (!ret) {
>   		struct drm_i915_gem_exec_object __user *user_exec_list =
>   			u64_to_user_ptr(args->buffers_ptr);
> @@ -1982,7 +1895,7 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
>   		return -EFAULT;
>   	}
>   
> -	ret = i915_gem_do_execbuffer(dev, data, file, args, exec2_list);
> +	ret = i915_gem_do_execbuffer(dev, file, args, exec2_list);
>   	if (!ret) {
>   		/* Copy the new buffer offsets back to the user's exec list. */
>   		struct drm_i915_gem_exec_object2 __user *user_exec_list =


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 08/17] drm/i915: Drop spinlocks around adding to the client request list
  2016-08-24 13:16   ` Mika Kuoppala
@ 2016-08-24 13:25     ` Chris Wilson
  0 siblings, 0 replies; 47+ messages in thread
From: Chris Wilson @ 2016-08-24 13:25 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx

On Wed, Aug 24, 2016 at 04:16:55PM +0300, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> > index 7b8abda541e6..e432211e8b24 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3673,16 +3673,14 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
> >  		return -EIO;
> >  
> >  	spin_lock(&file_priv->mm.lock);
> > -	list_for_each_entry(request, &file_priv->mm.request_list, client_list) {
> > +	list_for_each_entry(request, &file_priv->mm.request_list, client_link) {
> >  		if (time_after_eq(request->emitted_jiffies, recent_enough))
> >  			break;
> >  
> > -		/*
> > -		 * Note that the request might not have been submitted yet.
> > -		 * In which case emitted_jiffies will be zero.
> > -		 */
> > -		if (!request->emitted_jiffies)
> > -			continue;
> > +		if (target) {
> > +			list_del(&target->client_link);
> > +			target->file_priv = NULL;
> 
> No need for list_for_each_entry_safe as we are throwing out stuff before
> the point of traversal?

Exactly, there is a very strict ordering here. We are deleting before
the cursor and we only ever add at the end (which is past the cursor).
-Chris
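
A minimal userspace sketch of the delete-behind-the-cursor pattern
described above. The node/list helpers and the jiffies cut-off below are
simplified stand-ins invented for illustration; they only mimic the
kernel's <linux/list.h> and the throttle walk, not the i915 code itself.

#include <stdio.h>

struct node {
	struct node *prev, *next;
	int emitted_jiffies;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail_node(struct node *entry, struct node *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void list_del_node(struct node *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

int main(void)
{
	struct node head, nodes[4];
	struct node *pos, *target = NULL;

	list_init(&head);
	for (int i = 0; i < 4; i++) {
		nodes[i].emitted_jiffies = i;
		list_add_tail_node(&nodes[i], &head);
	}

	/* Throttle-style walk: once a newer candidate is found, the
	 * previous one lies strictly behind the cursor, so unlinking it
	 * never touches pos->next and the plain (non-_safe) iterator
	 * stays valid.
	 */
	for (pos = head.next; pos != &head; pos = pos->next) {
		if (pos->emitted_jiffies >= 2)
			break;
		if (target)
			list_del_node(target);
		target = pos;
	}

	printf("final target: %d\nremaining:",
	       target ? target->emitted_jiffies : -1);
	for (pos = head.next; pos != &head; pos = pos->next)
		printf(" %d", pos->emitted_jiffies);
	printf("\n");
	return 0;
}

New entries are only ever appended at the tail, past the cursor, which
is the other half of the ordering argument above.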

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 07/17] drm/i915: Use the precomputed value for whether to enable command parsing
  2016-08-22  8:03 ` [PATCH 07/17] drm/i915: Use the precomputed value for whether to enable command parsing Chris Wilson
@ 2016-08-24 14:33   ` John Harrison
  2016-08-27  9:12     ` Chris Wilson
  0 siblings, 1 reply; 47+ messages in thread
From: John Harrison @ 2016-08-24 14:33 UTC (permalink / raw)
  To: intel-gfx

On 22/08/2016 09:03, Chris Wilson wrote:
> As i915.enable_cmd_parser is an unsafe option, make it read-only at
> runtime. Now that it is constant, we can use the value determined during
> initialisation to decide whether we need the cmdparser at execbuffer time.
I'm confused. I can't see where i915.enable_cmd_parser is used now. It 
does not appear to be mentioned anywhere other than i915_params.c after 
this patch. It is not used to calculate engine->needs_cmd_parser. There 
also no longer seems to be a check for PPGTT either.

>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_cmd_parser.c  | 21 ---------------------
>   drivers/gpu/drm/i915/i915_drv.h         |  1 -
>   drivers/gpu/drm/i915/i915_params.c      |  6 +++---
>   drivers/gpu/drm/i915/i915_params.h      |  2 +-
>   drivers/gpu/drm/i915/intel_ringbuffer.h | 15 +++++++++++++++
>   5 files changed, 19 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index 3c72b3b103e7..f749c8b52351 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -1036,27 +1036,6 @@ unpin_src:
>   	return dst;
>   }
>   
> -/**
> - * intel_engine_needs_cmd_parser() - should a given engine use software
> - *                                   command parsing?
> - * @engine: the engine in question
> - *
> - * Only certain platforms require software batch buffer command parsing, and
> - * only when enabled via module parameter.
> - *
> - * Return: true if the engine requires software command parsing
> - */
> -bool intel_engine_needs_cmd_parser(struct intel_engine_cs *engine)
> -{
> -	if (!engine->needs_cmd_parser)
> -		return false;
> -
> -	if (!USES_PPGTT(engine->i915))
> -		return false;
> -
> -	return (i915.enable_cmd_parser == 1);
> -}
> -
>   static bool check_cmd(const struct intel_engine_cs *engine,
>   		      const struct drm_i915_cmd_descriptor *desc,
>   		      const u32 *cmd, u32 length,
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 5613f2886ad3..049ac53b05da 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3603,7 +3603,6 @@ const char *i915_cache_level_str(struct drm_i915_private *i915, int type);
>   int i915_cmd_parser_get_version(struct drm_i915_private *dev_priv);
>   void intel_engine_init_cmd_parser(struct intel_engine_cs *engine);
>   void intel_engine_cleanup_cmd_parser(struct intel_engine_cs *engine);
> -bool intel_engine_needs_cmd_parser(struct intel_engine_cs *engine);
>   int intel_engine_cmd_parser(struct intel_engine_cs *engine,
>   			    struct drm_i915_gem_object *batch_obj,
>   			    struct drm_i915_gem_object *shadow_batch_obj,
> diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
> index e72a41223535..5879a7eae6cc 100644
> --- a/drivers/gpu/drm/i915/i915_params.c
> +++ b/drivers/gpu/drm/i915/i915_params.c
> @@ -50,7 +50,7 @@ struct i915_params i915 __read_mostly = {
>   	.error_capture = true,
>   	.invert_brightness = 0,
>   	.disable_display = 0,
> -	.enable_cmd_parser = 1,
> +	.enable_cmd_parser = true,
>   	.use_mmio_flip = 0,
>   	.mmio_debug = 0,
>   	.verbose_state_checks = 1,
> @@ -187,9 +187,9 @@ MODULE_PARM_DESC(invert_brightness,
>   module_param_named(disable_display, i915.disable_display, bool, 0400);
>   MODULE_PARM_DESC(disable_display, "Disable display (default: false)");
>   
> -module_param_named_unsafe(enable_cmd_parser, i915.enable_cmd_parser, int, 0600);
> +module_param_named_unsafe(enable_cmd_parser, i915.enable_cmd_parser, bool, 0400);
>   MODULE_PARM_DESC(enable_cmd_parser,
> -		 "Enable command parsing (1=enabled [default], 0=disabled)");
> +		 "Enable command parsing (true=enabled [default], false=disabled)");
>   
>   module_param_named_unsafe(use_mmio_flip, i915.use_mmio_flip, int, 0600);
>   MODULE_PARM_DESC(use_mmio_flip,
> diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h
> index 94efc899c1ef..ad0b1d280c2b 100644
> --- a/drivers/gpu/drm/i915/i915_params.h
> +++ b/drivers/gpu/drm/i915/i915_params.h
> @@ -44,7 +44,6 @@ struct i915_params {
>   	int disable_power_well;
>   	int enable_ips;
>   	int invert_brightness;
> -	int enable_cmd_parser;
>   	int enable_guc_loading;
>   	int enable_guc_submission;
>   	int guc_log_level;
> @@ -53,6 +52,7 @@ struct i915_params {
>   	int edp_vswing;
>   	unsigned int inject_load_failure;
>   	/* leave bools at the end to not create holes */
> +	bool enable_cmd_parser;
>   	bool enable_hangcheck;
>   	bool fastboot;
>   	bool prefault_disable;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 84aea549de5d..c8cbe61deece 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -368,6 +368,21 @@ intel_engine_flag(const struct intel_engine_cs *engine)
>   	return 1 << engine->id;
>   }
>   
> +/**
> + * intel_engine_needs_cmd_parser() - should a given engine use software
> + *                                   command parsing?
> + * @engine: the engine in question
> + *
> + * Only certain platforms require software batch buffer command parsing, and
> + * only when enabled via module parameter.
> + *
> + * Return: true if the engine requires software command parsing
> + */
> +static inline bool intel_engine_needs_cmd_parser(struct intel_engine_cs *engine)
> +{
> +	return engine->needs_cmd_parser;
> +}
> +
>   static inline u32
>   intel_engine_sync_index(struct intel_engine_cs *engine,
>   			struct intel_engine_cs *other)


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2] drm/i915: Eliminate lots of iterations over the execobjects array
  2016-08-22  8:03 ` [PATCH 13/17] drm/i915: Eliminate lots of iterations over the execobjects array Chris Wilson
@ 2016-08-25  6:03   ` Chris Wilson
  0 siblings, 0 replies; 47+ messages in thread
From: Chris Wilson @ 2016-08-25  6:03 UTC (permalink / raw)
  To: intel-gfx

The major scaling bottleneck in execbuffer is the processing of the
execobjects. Creating an auxiliary list is inefficient when compared to
using the execobject array we already have allocated.

Reservation is then split into phases. As we look up the VMA, we
try and bind it back into its active location. Only if that fails do we
add it to the unbound list for phase 2. In phase 2, we try and add all those
objects that could not fit into their previous location, with fallback
to retrying all objects and evicting the VM in case of severe
fragmentation. (This is the same as before, except that phase 1 is now
done inline with looking up the VMA to avoid an iteration over the
execobject array. In the ideal case, we eliminate the separate reservation
phase). During the reservation phase, we only evict from the VM between
passes (rather than, as we do currently, while trying to fit every new
VMA). In testing with Unreal Engine's Atlantis demo, which stresses the
eviction logic on gen7 class hardware, this speeds up the framerate by a
factor of 2.
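
A rough, self-contained sketch of the reserve-then-evict-and-retry flow
outlined above. The toy_* names and the tiny one-dimensional address
space are invented for illustration, and the priority re-sort and pinned
objects of the real patch are omitted.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define VM_SIZE 16

struct toy_vma {
	int size;
	int offset;	/* -1 while unbound */
};

/* first-fit bind into the toy address space */
static bool toy_bind(bool used[VM_SIZE], struct toy_vma *v)
{
	for (int start = 0; start + v->size <= VM_SIZE; start++) {
		bool free = true;

		for (int i = 0; i < v->size; i++)
			free &= !used[start + i];
		if (!free)
			continue;

		for (int i = 0; i < v->size; i++)
			used[start + i] = true;
		v->offset = start;
		return true;
	}
	return false;
}

/* Pass 1 keeps whatever is already bound and fits the rest into idle
 * space; if anything fails to fit, the whole toy VM is cleared (the
 * evict-everything fallback) and we try once more.
 */
static int toy_reserve(bool used[VM_SIZE], struct toy_vma *v, int count)
{
	for (int pass = 0; pass < 2; pass++) {
		bool enospc = false;

		for (int i = 0; i < count; i++) {
			if (v[i].offset >= 0)
				continue;	/* already bound, keep in place */
			if (!toy_bind(used, &v[i]))
				enospc = true;
		}
		if (!enospc)
			return 0;

		memset(used, 0, VM_SIZE * sizeof(*used));
		for (int i = 0; i < count; i++)
			v[i].offset = -1;
	}
	return -1;	/* simply will not fit */
}

int main(void)
{
	bool used[VM_SIZE] = { false };
	/* one object left bound at an awkward offset by a previous execbuf */
	struct toy_vma objs[] = { { 6, 8 }, { 5, -1 }, { 4, -1 } };

	for (int i = 0; i < objs[0].size; i++)
		used[objs[0].offset + i] = true;

	printf("reserve: %d\n", toy_reserve(used, objs, 3));
	for (int i = 0; i < 3; i++)
		printf("obj%d -> %d\n", i, objs[i].offset);
	return 0;
}

Running it shows obj0 evicted from its old spot so that all three
objects fit on the second pass; the real code additionally re-sorts the
unbound list so the more difficult-to-fit objects are tried first.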

The second loop amalgamation is between move_to_gpu and move_to_active.
As we always submit the request, even if incomplete, we can use the
current request to track the active VMAs as we perform the flushes and
synchronisation required.

The next big advancement is to avoid copying back to the user any
execobjects and relocations that are not changed.

v2: Add a Theory of Operation spiel.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h            |    3 +-
 drivers/gpu/drm/i915/i915_gem.c            |    2 +-
 drivers/gpu/drm/i915/i915_gem_evict.c      |   64 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 1719 ++++++++++++++++------------
 drivers/gpu/drm/i915/i915_gem_gtt.h        |    4 +-
 5 files changed, 1022 insertions(+), 770 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 870e3c61eb68..f874a533eab9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3124,6 +3124,7 @@ int i915_gem_freeze_late(struct drm_i915_private *dev_priv);
 
 void *i915_gem_object_alloc(struct drm_device *dev);
 void i915_gem_object_free(struct drm_i915_gem_object *obj);
+bool i915_gem_object_flush_active(struct drm_i915_gem_object *obj);
 void i915_gem_object_init(struct drm_i915_gem_object *obj,
 			 const struct drm_i915_gem_object_ops *ops);
 struct drm_i915_gem_object *i915_gem_object_create(struct drm_device *dev,
@@ -3515,7 +3516,7 @@ int __must_check i915_gem_evict_something(struct i915_address_space *vm,
 					  unsigned flags);
 int __must_check i915_gem_evict_for_vma(struct i915_vma *vma,
 					unsigned int flags);
-int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle);
+int i915_gem_evict_vm(struct i915_address_space *vm);
 
 /* belongs in i915_gem_gtt.h */
 static inline void i915_gem_chipset_flush(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 78faac2b780c..1f35dd6219cb 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3034,7 +3034,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 			  size, obj->base.size,
 			  flags & PIN_MAPPABLE ? "mappable" : "total",
 			  end);
-		return -E2BIG;
+		return -ENOSPC;
 	}
 
 	ret = i915_gem_object_get_pages(obj);
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 79ce873b4891..d59b085de48e 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -55,7 +55,7 @@ mark_free(struct i915_vma *vma, unsigned int flags, struct list_head *unwind)
 	if (flags & PIN_NONFAULT && vma->obj->fault_mappable)
 		return false;
 
-	list_add(&vma->exec_list, unwind);
+	list_add(&vma->evict_link, unwind);
 	return drm_mm_scan_add_block(&vma->node);
 }
 
@@ -134,7 +134,7 @@ search_again:
 	} while (*++phase);
 
 	/* Nothing found, clean up and bail out! */
-	list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
+	list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
 		ret = drm_mm_scan_remove_block(&vma->node);
 		BUG_ON(ret);
 	}
@@ -179,16 +179,16 @@ found:
 	 * calling unbind (which may remove the active reference
 	 * of any of our objects, thus corrupting the list).
 	 */
-	list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
+	list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
 		if (drm_mm_scan_remove_block(&vma->node))
 			__i915_vma_pin(vma);
 		else
-			list_del(&vma->exec_list);
+			list_del(&vma->evict_link);
 	}
 
 	/* Unbinding will emit any required flushes */
 	ret = 0;
-	list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
+	list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
 		__i915_vma_unpin(vma);
 		if (ret == 0)
 			ret = i915_vma_unbind(vma);
@@ -260,10 +260,10 @@ int i915_gem_evict_for_vma(struct i915_vma *target, unsigned int flags)
 		}
 
 		__i915_vma_pin(vma);
-		list_add(&vma->exec_list, &eviction_list);
+		list_add(&vma->evict_link, &eviction_list);
 	}
 
-	list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
+	list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
 		__i915_vma_unpin(vma);
 		if (ret == 0)
 			ret = i915_vma_unbind(vma);
@@ -286,34 +286,48 @@ int i915_gem_evict_for_vma(struct i915_vma *target, unsigned int flags)
  * To clarify: This is for freeing up virtual address space, not for freeing
  * memory in e.g. the shrinker.
  */
-int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle)
+int i915_gem_evict_vm(struct i915_address_space *vm)
 {
+	struct list_head *phases[] = {
+		&vm->inactive_list,
+		&vm->active_list,
+		NULL
+	}, **phase;
+	struct list_head eviction_list;
 	struct i915_vma *vma, *next;
 	int ret;
 
 	WARN_ON(!mutex_is_locked(&vm->dev->struct_mutex));
 	trace_i915_gem_evict_vm(vm);
 
-	if (do_idle) {
-		struct drm_i915_private *dev_priv = to_i915(vm->dev);
-
-		if (i915_is_ggtt(vm)) {
-			ret = i915_gem_switch_to_kernel_context(dev_priv);
-			if (ret)
-				return ret;
-		}
-
-		ret = i915_gem_wait_for_idle(dev_priv, true);
+	/* Switch back to the default context in order to unpin
+	 * the existing context objects. However, such objects only
+	 * pin themselves inside the global GTT and performing the
+	 * switch otherwise is ineffective.
+	 */
+	if (i915_is_ggtt(vm)) {
+		ret = i915_gem_switch_to_kernel_context(to_i915(vm->dev));
 		if (ret)
 			return ret;
-
-		i915_gem_retire_requests(dev_priv);
-		WARN_ON(!list_empty(&vm->active_list));
 	}
 
-	list_for_each_entry_safe(vma, next, &vm->inactive_list, vm_link)
-		if (!i915_vma_is_pinned(vma))
-			WARN_ON(i915_vma_unbind(vma));
+	INIT_LIST_HEAD(&eviction_list);
+	phase = phases;
+	do {
+		list_for_each_entry(vma, *phase, vm_link) {
+			if (i915_vma_is_pinned(vma))
+				continue;
+
+			__i915_vma_pin(vma);
+			list_add(&vma->evict_link, &eviction_list);
+		}
+	} while (*++phase);
 
-	return 0;
+	ret = 0;
+	list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
+		__i915_vma_unpin(vma);
+		if (ret == 0)
+			ret = i915_vma_unbind(vma);
+	}
+	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index e9ac591f3a79..ad6684cc5fb4 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -41,14 +41,131 @@
 
 #define DBG_USE_CPU_RELOC 0 /* -1 force GTT relocs; 1 force CPU relocs */
 
-#define  __EXEC_OBJECT_HAS_PIN		(1<<31)
-#define  __EXEC_OBJECT_HAS_FENCE	(1<<30)
-#define  __EXEC_OBJECT_NEEDS_MAP	(1<<29)
-#define  __EXEC_OBJECT_NEEDS_BIAS	(1<<28)
+#define  __EXEC_OBJECT_HAS_PIN		BIT(31)
+#define  __EXEC_OBJECT_HAS_FENCE	BIT(30)
+#define  __EXEC_OBJECT_NEEDS_MAP	BIT(29)
+#define  __EXEC_OBJECT_NEEDS_BIAS	BIT(28)
 #define  __EXEC_OBJECT_INTERNAL_FLAGS (0xf<<28) /* all of the above */
+#define __EB_RESERVED (__EXEC_OBJECT_HAS_PIN | __EXEC_OBJECT_HAS_FENCE)
+
+#define __EXEC_HAS_RELOC	BIT(31)
+#define __EXEC_VALIDATED	BIT(30)
+#define UPDATE			PIN_OFFSET_FIXED
 
 #define BATCH_OFFSET_BIAS (256*1024)
 
+/**
+ * DOC: User command execution
+ *
+ * Userspace submits commands to be executed on the GPU as an instruction
+ * stream within a GEM object we call a batchbuffer. These instructions may
+ * refer to other GEM objects containing auxiliary state such as kernels,
+ * samplers, render targets and even secondary batchbuffers. Userspace does
+ * not know where in the GPU memory these objects reside and so before the
+ * batchbuffer is passed to the GPU for execution, those addresses in the
+ * batchbuffer and auxiliary objects are updated. This is known as relocation,
+ * or patching. To try and avoid having to relocate each object on the next
+ * execution, userspace is told the location of those objects in this pass,
+ * but this remains just a hint as the kernel may choose a new location for
+ * any object in the future.
+ *
+ * Processing an execbuf ioctl is conceptually split up into a few phases.
+ *
+ * 1. Validation - Ensure all the pointers, handles and flags are valid.
+ * 2. Reservation - Assign GPU address space for every object
+ * 3. Relocation - Update any addresses to point to the final locations
+ * 4. Serialisation - Order the request with respect to its dependencies
+ * 5. Construction - Construct a request to execute the batchbuffer
+ * 6. Submission (at some point in the future execution)
+ *
+ * Reserving resources for the execbuf is the most complicated phase. We
+ * neither want to have to migrate the object in the address space, nor do
+ * we want to have to update any relocations pointing to this object. Ideally,
+ * we want to leave the object where it is and for all the existing relocations
+ * to match. If the object is given a new address, or if userspace thinks the
+ * object is elsewhere, we have to parse all the relocation entries and update
+ * the addresses. Userspace can set the I915_EXEC_NORELOC flag to hint that
+ * all the target addresses in all of its objects match the value in the
+ * relocation entries and that they all match the presumed offsets given by the
+ * list of execbuffer objects. Using this knowledge, we know that if we haven't
+ * moved any buffers, all the relocation entries are valid and we can skip
+ * the update. (If userspace is wrong, the likely outcome is an impromptu GPU
+ * hang.)
+ *
+ * The reservation is done in multiple phases. First we try and keep any
+ * object already bound in its current location - so long as it meets the
+ * constraints imposed by the new execbuffer. Any object left unbound after the
+ * first pass is then fitted into any available idle space. If an object does
+ * not fit, all objects are removed from the reservation and the process rerun
+ * after sorting the objects into a priority order (more difficult to fit
+ * objects are tried first). Failing that, the entire VM is cleared and we try
+ * to fit the execbuf once last time before concluding that it simply will not
+ * fit.
+ *
+ * A small complication to all of this is that we allow userspace not only to
+ * specify an alignment and a size for the object in the address space, but
+ * we also allow userspace to specify the exact offset. These objects are
+ * simpler to place (the location is known a priori); all we have to do is make
+ * sure the space is available.
+ *
+ * Once all the objects are in place, patching up the buried pointers to point
+ * to the final locations is a fairly simple job of walking over the relocation
+ * entry arrays, looking up the right address and rewriting the value into
+ * the object. Simple! ... The relocation entries are stored in user memory
+ * and so to access them we have to copy them into a local buffer. That copy
+ * has to avoid taking any pagefaults as they may lead back to a GEM object
+ * requiring the struct_mutex (i.e. recursive deadlock). So once again we split
+ * the relocation into multiple passes. First we try to do everything within an
+ * atomic context (avoid the pagefaults) which requires that we never wait. If
+ * we detect that we may wait, or if we need to fault, then we have to fallback
+ * to a slower path. The slowpath has to drop the mutex. (Can you hear alarm
+ * bells yet?) Dropping the mutex means that we lose all the state we have
+ * built up so far for the execbuf and we must reset any global data. However,
+ * we do leave the objects pinned in their final locations - which is a
+ * potential issue for concurrent execbufs. Once we have left the mutex, we can
+ * allocate and copy all the relocation entries into a large array at our
+ * leisure, reacquire the mutex, reclaim all the objects and other state and
+ * then proceed to update any incorrect addresses with the objects.
+ *
+ * As we process the relocation entries, we maintain a record of whether the
+ * object is being written to. Using NORELOC, we expect userspace to provide
+ * this information instead. We also check whether we can skip the relocation
+ * by comparing the expected value inside the relocation entry with the target's
+ * final address. If they differ, we have to map the current object and rewrite
+ * the 4 or 8 byte pointer within.
+ *
+ * Serialising an execbuf is quite simple according to the rules of the GEM
+ * ABI. Execution within each context is ordered by the order of submission.
+ * Writes to any GEM object are in order of submission and are exclusive. Reads
+ * from a GEM object are unordered with respect to other reads, but ordered by
+ * writes. A write submitted after a read cannot occur before the read, and
+ * similarly any read submitted after a write cannot occur before the write.
+ * Writes are ordered between engines such that only one write occurs at any
+ * time (completing any reads beforehand) - using semaphores where available
+ * and CPU serialisation otherwise. Other GEM accesses obey the same rules: any
+ * write (either via mmaps using set-domain, or via pwrite) must flush all GPU
+ * reads before starting, and any read (either using set-domain or pread) must
+ * flush all GPU writes before starting. (Note we only employ a barrier before,
+ * we currently rely on userspace not concurrently starting a new execution
+ * whilst reading or writing to an object. This may be an advantage or not
+ * depending on how much you trust userspace not to shoot themselves in the
+ * foot.) Serialisation may just result in the request being inserted into
+ * a DAG awaiting its turn, but most simple is to wait on the CPU until
+ * all dependencies are resolved.
+ *
+ * After all of that, it is just a matter of closing the request and handing it to
+ * the hardware (well, leaving it in a queue to be executed). However, we also
+ * offer the ability for batchbuffers to be run with elevated privileges so
+ * that they access otherwise hidden registers. (Used to adjust L3 cache etc.)
+ * Before any batch is given extra privileges we first must check that it
+ * contains no nefarious instructions: we check that each instruction is from
+ * our whitelist and that all registers are also from an allowed list. We first
+ * copy the user's batchbuffer to a shadow (so that the user doesn't have
+ * access to it, either by the CPU or GPU as we scan it) and then parse each
+ * instruction. If everything is ok, we set a flag telling the hardware to run
+ * the batchbuffer in trusted mode, otherwise the ioctl is rejected.
+ */
+
 struct i915_execbuffer {
 	struct drm_i915_private *i915;
 	struct drm_file *file;
@@ -59,27 +176,59 @@ struct i915_execbuffer {
 	struct i915_address_space *vm;
 	struct i915_vma *batch;
 	struct drm_i915_gem_request *request;
-	u32 batch_start_offset;
-	unsigned int dispatch_flags;
-	struct drm_i915_gem_exec_object2 shadow_exec_entry;
-	bool need_relocs;
-	struct list_head vmas;
+	struct list_head unbound;
+	struct list_head relocs;
 	struct reloc_cache {
 		struct drm_mm_node node;
 		unsigned long vaddr;
 		unsigned int page;
 		bool use_64bit_reloc;
+		bool has_llc;
+		bool has_fence;
 	} reloc_cache;
-	int lut_mask;
+	u64 invalid_flags;
+	u32 context_flags;
+	u32 dispatch_flags;
+	int lut_size;
 	struct hlist_head *buckets;
 };
 
+#define to_ptr(T, x) ((T *)(uintptr_t)(x))
+
+/* Used to convert any address to canonical form.
+ * Starting from gen8, some commands (e.g. STATE_BASE_ADDRESS,
+ * MI_LOAD_REGISTER_MEM and others, see Broadwell PRM Vol2a) require the
+ * addresses to be in a canonical form:
+ * "GraphicsAddress[63:48] are ignored by the HW and assumed to be in correct
+ * canonical form [63:48] == [47]."
+ */
+#define GEN8_HIGH_ADDRESS_BIT 47
+static inline u64 gen8_canonical_addr(u64 address)
+{
+	return sign_extend64(address, GEN8_HIGH_ADDRESS_BIT);
+}
+
+static inline u64 gen8_noncanonical_addr(u64 address)
+{
+	return address & ((1ULL << (GEN8_HIGH_ADDRESS_BIT + 1)) - 1);
+}
+
 static int
 eb_create(struct i915_execbuffer *eb)
 {
 	if ((eb->args->flags & I915_EXEC_HANDLE_LUT) == 0) {
 		unsigned int size = 1 + ilog2(eb->args->buffer_count);
 
+		/* Without a 1:1 association between relocation handles and
+		 * the execobject[] index, we instead create a hashtable.
+		 * We size it dynamically based on available memory, starting
+		 * first with a 1:1 associative hash and scaling back until
+		 * the allocation succeeds.
+		 *
+		 * Later on we use a positive lut_size to indicate we are
+		 * using this hashtable, and a negative value to indicate a
+		 * direct lookup.
+		 */
 		do {
 			eb->buckets = kzalloc(sizeof(struct hlist_head) << size,
 					     GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
@@ -94,87 +243,347 @@ eb_create(struct i915_execbuffer *eb)
 				return -ENOMEM;
 		}
 
-		eb->lut_mask = size;
+		eb->lut_size = size;
 	} else
-		eb->lut_mask = -eb->args->buffer_count;
+		eb->lut_size = -eb->args->buffer_count;
 
 	return 0;
 }
 
+static bool
+eb_vma_misplaced(const struct drm_i915_gem_exec_object2 *entry,
+		 const struct i915_vma *vma)
+{
+	if ((entry->flags & __EXEC_OBJECT_HAS_PIN) == 0)
+		return true;
+
+	if (vma->node.size < entry->pad_to_size)
+		return true;
+
+	if (entry->alignment && vma->node.start & (entry->alignment - 1))
+		return true;
+
+	if (entry->flags & EXEC_OBJECT_PINNED &&
+	    vma->node.start != entry->offset)
+		return true;
+
+	if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS &&
+	    vma->node.start < BATCH_OFFSET_BIAS)
+		return true;
+
+	if ((entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) == 0 &&
+	    (vma->node.start + vma->node.size - 1) >> 32)
+		return true;
+
+	return false;
+}
+
+static void
+eb_pin_vma(struct i915_execbuffer *eb,
+	   struct drm_i915_gem_exec_object2 *entry,
+	   struct i915_vma *vma)
+{
+	u64 flags;
+
+	flags = vma->node.start;
+	flags |= PIN_USER | PIN_NONBLOCK | PIN_OFFSET_FIXED;
+	if (unlikely(entry->flags & EXEC_OBJECT_NEEDS_GTT))
+		flags |= PIN_GLOBAL;
+	if (unlikely(i915_vma_pin(vma, 0, 0, flags)))
+		return;
+
+	if (entry->flags & EXEC_OBJECT_NEEDS_FENCE) {
+		if (unlikely(i915_vma_get_fence(vma))) {
+			i915_vma_unpin(vma);
+			return;
+		}
+
+		if (i915_vma_pin_fence(vma))
+			entry->flags |= __EXEC_OBJECT_HAS_FENCE;
+	}
+
+	entry->flags |= __EXEC_OBJECT_HAS_PIN;
+}
+
 static inline void
 __eb_unreserve_vma(struct i915_vma *vma,
 		   const struct drm_i915_gem_exec_object2 *entry)
 {
+	GEM_BUG_ON((entry->flags & __EXEC_OBJECT_HAS_PIN) == 0);
+
 	if (unlikely(entry->flags & __EXEC_OBJECT_HAS_FENCE))
 		i915_vma_unpin_fence(vma);
 
-	if (entry->flags & __EXEC_OBJECT_HAS_PIN)
-		__i915_vma_unpin(vma);
+	__i915_vma_unpin(vma);
 }
 
-static void
-eb_unreserve_vma(struct i915_vma *vma)
+static inline void
+eb_unreserve_vma(struct i915_vma *vma,
+		 struct drm_i915_gem_exec_object2 *entry)
 {
-	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-
-	__eb_unreserve_vma(vma, entry);
-	entry->flags &= ~(__EXEC_OBJECT_HAS_FENCE | __EXEC_OBJECT_HAS_PIN);
+	if (entry->flags & __EXEC_OBJECT_HAS_PIN) {
+		__eb_unreserve_vma(vma, entry);
+		entry->flags &= ~__EB_RESERVED;
+	}
 }
 
-static void
-eb_reset(struct i915_execbuffer *eb)
+static int
+eb_add_vma(struct i915_execbuffer *eb,
+	   struct drm_i915_gem_exec_object2 *entry,
+	   struct i915_vma *vma)
 {
-	struct i915_vma *vma;
+	int ret;
 
-	list_for_each_entry(vma, &eb->vmas, exec_list) {
-		eb_unreserve_vma(vma);
-		vma->exec_entry = NULL;
+	GEM_BUG_ON(i915_vma_is_closed(vma));
+
+	if ((eb->args->flags & __EXEC_VALIDATED) == 0) {
+		if (unlikely(entry->flags & eb->invalid_flags))
+			return -EINVAL;
+
+		if (unlikely(entry->alignment && !is_power_of_2(entry->alignment)))
+			return -EINVAL;
+
+		/* Offset can be used as input (EXEC_OBJECT_PINNED), reject
+		 * any non-page-aligned or non-canonical addresses.
+		 */
+		if (entry->flags & EXEC_OBJECT_PINNED) {
+			if (unlikely(entry->offset !=
+				     gen8_canonical_addr(entry->offset & PAGE_MASK)))
+				return -EINVAL;
+		}
+
+		/* From drm_mm perspective address space is continuous,
+		 * so from this point we're always using non-canonical
+		 * form internally.
+		 */
+		entry->offset = gen8_noncanonical_addr(entry->offset);
+
+		/* pad_to_size was once a reserved field, so sanitize it */
+		if (entry->flags & EXEC_OBJECT_PAD_TO_SIZE) {
+			if (unlikely(offset_in_page(entry->pad_to_size)))
+				return -EINVAL;
+		} else {
+			entry->pad_to_size = 0;
+		}
+
+		if (unlikely(vma->exec_entry)) {
+			DRM_DEBUG("Object [handle %d, index %d] appears more than once in object list\n",
+				  entry->handle, (int)(entry - eb->exec));
+			return -EINVAL;
+		}
 	}
 
-	if (eb->lut_mask >= 0)
-		memset(eb->buckets, 0,
-		       (1<<eb->lut_mask)*sizeof(struct hlist_head));
-}
+	vma->exec_entry = entry;
+	entry->rsvd2 = (uintptr_t)vma;
 
-#define to_ptr(T, x) ((T *)(uintptr_t)(x))
+	if (eb->lut_size >= 0) {
+		vma->exec_handle = entry->handle;
+		hlist_add_head(&vma->exec_node,
+			       &eb->buckets[hash_32(entry->handle,
+						    eb->lut_size)]);
+	}
 
-static bool
-eb_add_vma(struct i915_execbuffer *eb, struct i915_vma *vma, int i)
+	if (entry->relocation_count)
+		list_add_tail(&vma->reloc_link, &eb->relocs);
+
+	if (!eb->reloc_cache.has_fence) {
+		entry->flags &= ~EXEC_OBJECT_NEEDS_FENCE;
+	} else {
+		if (entry->flags & EXEC_OBJECT_NEEDS_FENCE &&
+		    i915_gem_object_is_tiled(vma->obj))
+			entry->flags |= EXEC_OBJECT_NEEDS_GTT | __EXEC_OBJECT_NEEDS_MAP;
+	}
+
+	if ((entry->flags & EXEC_OBJECT_PINNED) == 0)
+		entry->flags |= eb->context_flags;
+
+	ret = 0;
+	if (vma->node.size)
+		eb_pin_vma(eb, entry, vma);
+	if (eb_vma_misplaced(entry, vma)) {
+		eb_unreserve_vma(vma, entry);
+
+		list_add_tail(&vma->exec_link, &eb->unbound);
+		if (drm_mm_node_allocated(&vma->node))
+			ret = i915_vma_unbind(vma);
+	} else {
+		if (entry->offset != vma->node.start) {
+			entry->offset = vma->node.start | UPDATE;
+			eb->args->flags |= __EXEC_HAS_RELOC;
+		}
+	}
+	return ret;
+}
+
+static inline int use_cpu_reloc(const struct reloc_cache *cache,
+				const struct drm_i915_gem_object *obj)
 {
-	if (unlikely(vma->exec_entry)) {
-		DRM_DEBUG("Object [handle %d, index %d] appears more than once in object list\n",
-			  eb->exec[i].handle, i);
+	if (!i915_gem_object_has_struct_page(obj))
 		return false;
+
+	if (DBG_USE_CPU_RELOC)
+		return DBG_USE_CPU_RELOC > 0;
+
+	return (cache->has_llc ||
+		obj->base.write_domain == I915_GEM_DOMAIN_CPU ||
+		obj->cache_level != I915_CACHE_NONE);
+}
+
+static int
+eb_reserve_vma(struct i915_execbuffer *eb, struct i915_vma *vma)
+{
+	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
+	u64 flags;
+	int ret;
+
+	flags = PIN_USER | PIN_NONBLOCK;
+	if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
+		flags |= PIN_GLOBAL;
+
+	if (!drm_mm_node_allocated(&vma->node)) {
+		/* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
+		 * limit address to the first 4GBs for unflagged objects.
+		 */
+		if ((entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) == 0)
+			flags |= PIN_ZONE_4G;
+
+		if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
+			flags |= PIN_MAPPABLE;
+
+		if (entry->flags & EXEC_OBJECT_PINNED) {
+			flags |= entry->offset | PIN_OFFSET_FIXED;
+			/* force overlapping PINNED checks */
+			flags &= ~PIN_NONBLOCK;
+		} else if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
+			flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
 	}
-	list_add_tail(&vma->exec_list, &eb->vmas);
 
-	vma->exec_entry = &eb->exec[i];
-	if (eb->lut_mask >= 0) {
-		vma->exec_handle = eb->exec[i].handle;
-		hlist_add_head(&vma->exec_node,
-			       &eb->buckets[hash_32(vma->exec_handle,
-						    eb->lut_mask)]);
+	ret = i915_vma_pin(vma, entry->pad_to_size, entry->alignment, flags);
+	if (ret)
+		return ret;
+
+	if (entry->offset != vma->node.start) {
+		entry->offset = vma->node.start | UPDATE;
+		eb->args->flags |= __EXEC_HAS_RELOC;
 	}
+	entry->flags |= __EXEC_OBJECT_HAS_PIN;
 
-	eb->exec[i].rsvd2 = (uintptr_t)vma;
-	return true;
+	if (entry->flags & EXEC_OBJECT_NEEDS_FENCE) {
+		ret = i915_vma_get_fence(vma);
+		if (ret)
+			return ret;
+
+		if (i915_vma_pin_fence(vma))
+			entry->flags |= __EXEC_OBJECT_HAS_FENCE;
+	}
+
+	GEM_BUG_ON(eb_vma_misplaced(entry, vma));
+	return 0;
 }
 
-static inline struct hlist_head *ht_head(struct i915_gem_context *ctx,
-					 u32 handle)
+static int eb_reserve(struct i915_execbuffer *eb)
+{
+	const unsigned int count = eb->args->buffer_count;
+	struct list_head last;
+	struct i915_vma *vma;
+	unsigned int i, pass;
+	int ret;
+
+	/* Attempt to pin all of the buffers into the GTT.
+	 * This is done in 3 phases:
+	 *
+	 * 1a. Unbind all objects that do not match the GTT constraints for
+	 *     the execbuffer (fenceable, mappable, alignment etc).
+	 * 1b. Increment pin count for already bound objects.
+	 * 2.  Bind new objects.
+	 * 3.  Decrement pin count.
+	 *
+	 * This avoids unnecessary unbinding of later objects in order to make
+	 * room for the earlier objects *unless* we need to defragment.
+	 */
+
+	pass = 0;
+	ret = 0;
+	do {
+		list_for_each_entry(vma, &eb->unbound, exec_link) {
+			ret = eb_reserve_vma(eb, vma);
+			if (ret)
+				break;
+		}
+		if (ret != -ENOSPC || pass++)
+			return ret;
+
+		/* Resort *all* the objects into priority order */
+		INIT_LIST_HEAD(&eb->unbound);
+		INIT_LIST_HEAD(&last);
+		for (i = 0; i < count; i++) {
+			struct drm_i915_gem_exec_object2 *entry = &eb->exec[i];
+
+			vma = to_ptr(struct i915_vma, entry->rsvd2);
+			eb_unreserve_vma(vma, entry);
+
+			if (entry->flags & EXEC_OBJECT_PINNED)
+				list_add(&vma->exec_link, &eb->unbound);
+			else if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
+				list_add_tail(&vma->exec_link, &eb->unbound);
+			else
+				list_add_tail(&vma->exec_link, &last);
+		}
+		list_splice_tail(&last, &eb->unbound);
+
+		/* Too fragmented, unbind everything and retry */
+		ret = i915_gem_evict_vm(eb->vm);
+		if (ret)
+			return ret;
+	} while (1);
+}
+
+static inline struct hlist_head *
+ht_head(const struct i915_gem_context *ctx, u32 handle)
 {
 	return &ctx->vma.ht[hash_32(handle, ctx->vma.ht_bits)];
 }
 
+static int eb_batch_index(const struct i915_execbuffer *eb)
+{
+	return eb->args->buffer_count - 1;
+}
+
+static int eb_select_context(struct i915_execbuffer *eb)
+{
+	struct i915_gem_context *ctx;
+
+	ctx = i915_gem_context_lookup(eb->file->driver_priv, eb->args->rsvd1);
+	if (unlikely(IS_ERR(ctx)))
+		return PTR_ERR(ctx);
+
+	if (unlikely(ctx->hang_stats.banned)) {
+		DRM_DEBUG("Context %u tried to submit while banned\n",
+			  ctx->user_handle);
+		return -EIO;
+	}
+
+	eb->ctx = i915_gem_context_get(ctx);
+	eb->vm = ctx->ppgtt ? &ctx->ppgtt->base : &eb->i915->ggtt.base;
+
+	eb->context_flags = 0;
+	if (ctx->flags & CONTEXT_NO_ZEROMAP)
+		eb->context_flags |= __EXEC_OBJECT_NEEDS_BIAS;
+
+	return 0;
+}
+
 static int
 eb_lookup_vmas(struct i915_execbuffer *eb)
 {
 	const int count = eb->args->buffer_count;
 	struct i915_vma *vma;
+	struct idr *idr;
 	int slow_pass = -1;
-	int i;
+	int i, ret;
 
-	INIT_LIST_HEAD(&eb->vmas);
+	INIT_LIST_HEAD(&eb->relocs);
+	INIT_LIST_HEAD(&eb->unbound);
 
 	if (unlikely(eb->ctx->vma.ht_size & 1))
 		flush_work(&eb->ctx->vma.resize);
@@ -187,8 +596,9 @@ eb_lookup_vmas(struct i915_execbuffer *eb)
 			if (vma->ctx_handle != eb->exec[i].handle)
 				continue;
 
-			if (!eb_add_vma(eb, vma, i))
-				return -EINVAL;
+			ret = eb_add_vma(eb, &eb->exec[i], vma);
+			if (unlikely(ret))
+				return ret;
 
 			goto next_vma;
 		}
@@ -199,24 +609,25 @@ next_vma: ;
 	}
 
 	if (slow_pass < 0)
-		return 0;
+		goto out;
 
 	spin_lock(&eb->file->table_lock);
 	/* Grab a reference to the object and release the lock so we can lookup
 	 * or create the VMA without using GFP_ATOMIC */
+	idr = &eb->file->object_idr;
 	for (i = slow_pass; i < count; i++) {
 		struct drm_i915_gem_object *obj;
 
 		if (eb->exec[i].rsvd2)
 			continue;
 
-		obj = to_intel_bo(idr_find(&eb->file->object_idr,
-					   eb->exec[i].handle));
+		obj = to_intel_bo(idr_find(idr, eb->exec[i].handle));
 		if (unlikely(!obj)) {
 			spin_unlock(&eb->file->table_lock);
 			DRM_DEBUG("Invalid object handle %d at index %d\n",
 				  eb->exec[i].handle, i);
-			return -ENOENT;
+			ret = -ENOENT;
+			goto err;
 		}
 
 		eb->exec[i].rsvd2 = 1 | (uintptr_t)obj;
@@ -237,11 +648,12 @@ next_vma: ;
 		 * from the (obj, vm) we don't run the risk of creating
 		 * duplicated vmas for the same vm.
 		 */
-		obj = to_ptr(struct drm_i915_gem_object, eb->exec[i].rsvd2 & ~1);
+		obj = to_ptr(typeof(*obj), eb->exec[i].rsvd2 & ~1);
 		vma = i915_gem_obj_lookup_or_create_vma(obj, eb->vm, NULL);
 		if (unlikely(IS_ERR(vma))) {
 			DRM_DEBUG("Failed to lookup VMA\n");
-			return PTR_ERR(vma);
+			ret = PTR_ERR(vma);
+			goto err;
 		}
 
 		/* First come, first served */
@@ -257,8 +669,9 @@ next_vma: ;
 			}
 		}
 
-		if (!eb_add_vma(eb, vma, i))
-			return -EINVAL;
+		ret = eb_add_vma(eb, &eb->exec[i], vma);
+		if (unlikely(ret))
+			goto err;
 	}
 	if (4*eb->ctx->vma.ht_count > 3*eb->ctx->vma.ht_size ||
 	    4*eb->ctx->vma.ht_count < eb->ctx->vma.ht_size) {
@@ -266,15 +679,10 @@ next_vma: ;
 		queue_work(system_highpri_wq, &eb->ctx->vma.resize);
 	}
 
-	return 0;
-}
-
-static struct i915_vma *
-eb_get_batch(struct i915_execbuffer *eb)
-{
-	struct i915_vma *vma;
-
-	vma = to_ptr(struct i915_vma, eb->exec[eb->args->buffer_count-1].rsvd2);
+out:
+	/* take note of the batch buffer before we might reorder the lists */
+	i = eb_batch_index(eb);
+	eb->batch = to_ptr(struct i915_vma, eb->exec[i].rsvd2);
 
 	/*
 	 * SNA is doing fancy tricks with compressing batch buffers, which leads
@@ -285,24 +693,34 @@ eb_get_batch(struct i915_execbuffer *eb)
 	 * Note that actual hangs have only been observed on gen7, but for
 	 * paranoia do it everywhere.
 	 */
-	if ((vma->exec_entry->flags & EXEC_OBJECT_PINNED) == 0)
-		vma->exec_entry->flags |= __EXEC_OBJECT_NEEDS_BIAS;
+	if ((eb->exec[i].flags & EXEC_OBJECT_PINNED) == 0)
+		eb->exec[i].flags |= __EXEC_OBJECT_NEEDS_BIAS;
+	if (eb->reloc_cache.has_fence)
+		eb->exec[i].flags |= EXEC_OBJECT_NEEDS_FENCE;
 
-	return vma;
+	eb->args->flags |= __EXEC_VALIDATED;
+	return eb_reserve(eb);
+
+err:
+	for (i = slow_pass; i < count; i++) {
+		if (eb->exec[i].rsvd2 & 1)
+			eb->exec[i].rsvd2 = 0;
+	}
+	return ret;
 }
 
 static struct i915_vma *
-eb_get_vma(struct i915_execbuffer *eb, unsigned long handle)
+eb_get_vma(const struct i915_execbuffer *eb, unsigned long handle)
 {
-	if (eb->lut_mask < 0) {
-		if (handle >= -eb->lut_mask)
+	if (eb->lut_size < 0) {
+		if (handle >= -eb->lut_size)
 			return NULL;
 		return to_ptr(struct i915_vma, eb->exec[handle].rsvd2);
 	} else {
 		struct hlist_head *head;
 		struct i915_vma *vma;
 
-		head = &eb->buckets[hash_32(handle, eb->lut_mask)];
+		head = &eb->buckets[hash_32(handle, eb->lut_size)];
 		hlist_for_each_entry(vma, head, exec_node) {
 			if (vma->exec_handle == handle)
 				return vma;
@@ -311,60 +729,58 @@ eb_get_vma(struct i915_execbuffer *eb, unsigned long handle)
 	}
 }
 
-static void eb_destroy(struct i915_execbuffer *eb)
+static void
+eb_reset(const struct i915_execbuffer *eb)
 {
-	struct i915_vma *vma;
+	const unsigned int count = eb->args->buffer_count;
+	unsigned int i;
 
-	list_for_each_entry(vma, &eb->vmas, exec_list) {
-		if (!vma->exec_entry)
-			continue;
+	for (i = 0; i < count; i++) {
+		struct drm_i915_gem_exec_object2 *entry = &eb->exec[i];
+		struct i915_vma *vma = to_ptr(struct i915_vma, entry->rsvd2);
 
-		__eb_unreserve_vma(vma, vma->exec_entry);
+		eb_unreserve_vma(vma, entry);
 		vma->exec_entry = NULL;
 	}
 
-	i915_gem_context_put(eb->ctx);
-
-	if (eb->lut_mask >= 0)
-		kfree(eb->buckets);
+	if (eb->lut_size >= 0)
+		memset(eb->buckets, 0,
+		       sizeof(struct hlist_head) << eb->lut_size);
 }
 
-static inline int use_cpu_reloc(struct drm_i915_gem_object *obj)
+static void eb_release_vma(const struct i915_execbuffer *eb)
 {
-	if (!i915_gem_object_has_struct_page(obj))
-		return false;
+	const unsigned int count = eb->args->buffer_count;
+	unsigned int i;
 
-	if (DBG_USE_CPU_RELOC)
-		return DBG_USE_CPU_RELOC > 0;
+	if (!eb->exec)
+		return;
 
-	return (HAS_LLC(obj->base.dev) ||
-		obj->base.write_domain == I915_GEM_DOMAIN_CPU ||
-		obj->cache_level != I915_CACHE_NONE);
-}
+	for (i = 0; i < count; i++) {
+		struct drm_i915_gem_exec_object2 *entry = &eb->exec[i];
+		struct i915_vma *vma = to_ptr(struct i915_vma, entry->rsvd2);
 
-/* Used to convert any address to canonical form.
- * Starting from gen8, some commands (e.g. STATE_BASE_ADDRESS,
- * MI_LOAD_REGISTER_MEM and others, see Broadwell PRM Vol2a) require the
- * addresses to be in a canonical form:
- * "GraphicsAddress[63:48] are ignored by the HW and assumed to be in correct
- * canonical form [63:48] == [47]."
- */
-#define GEN8_HIGH_ADDRESS_BIT 47
-static inline uint64_t gen8_canonical_addr(uint64_t address)
-{
-	return sign_extend64(address, GEN8_HIGH_ADDRESS_BIT);
+		if (!vma || !vma->exec_entry)
+			continue;
+
+		GEM_BUG_ON(vma->exec_entry != entry);
+		if (entry->flags & __EXEC_OBJECT_HAS_PIN)
+			__eb_unreserve_vma(vma, entry);
+		vma->exec_entry = NULL;
+	}
 }
 
-static inline uint64_t gen8_noncanonical_addr(uint64_t address)
+static void eb_destroy(const struct i915_execbuffer *eb)
 {
-	return address & ((1ULL << (GEN8_HIGH_ADDRESS_BIT + 1)) - 1);
+	if (eb->lut_size >= 0)
+		kfree(eb->buckets);
 }
 
-static inline uint64_t
+static inline u64
 relocation_target(const struct drm_i915_gem_relocation_entry *reloc,
-		  uint64_t target_offset)
+		  const struct i915_vma *target)
 {
-	return gen8_canonical_addr((int)reloc->delta + target_offset);
+	return gen8_canonical_addr((int)reloc->delta + target->node.start);
 }
 
 static void reloc_cache_init(struct reloc_cache *cache,
@@ -372,6 +788,8 @@ static void reloc_cache_init(struct reloc_cache *cache,
 {
 	cache->page = -1;
 	cache->vaddr = 0;
+	cache->has_llc = HAS_LLC(i915);
+	cache->has_fence = INTEL_GEN(i915) < 4;
 	cache->use_64bit_reloc = INTEL_GEN(i915) >= 8;
 	cache->node.allocated = false;
 }
@@ -484,7 +902,7 @@ static void *reloc_iomap(struct drm_i915_gem_object *obj,
 		struct i915_vma *vma;
 		int ret;
 
-		if (use_cpu_reloc(obj))
+		if (use_cpu_reloc(cache, obj))
 			return NULL;
 
 		ret = i915_gem_object_set_to_gtt_domain(obj, true);
@@ -572,17 +990,17 @@ static void clflush_write32(u32 *addr, u32 value, unsigned int flushes)
 		*addr = value;
 }
 
-static int
+static u64
 relocate_entry(struct drm_i915_gem_object *obj,
 	       const struct drm_i915_gem_relocation_entry *reloc,
 	       struct reloc_cache *cache,
-	       u64 target_offset)
+	       const struct i915_vma *target)
 {
 	u64 offset = reloc->offset;
+	u64 target_offset = relocation_target(reloc, target);
 	bool wide = cache->use_64bit_reloc;
 	void *vaddr;
 
-	target_offset = relocation_target(reloc, target_offset);
 repeat:
 	vaddr = reloc_vaddr(obj, cache, offset >> PAGE_SHIFT);
 	if (IS_ERR(vaddr))
@@ -599,7 +1017,7 @@ repeat:
 		goto repeat;
 	}
 
-	return 0;
+	return gen8_canonical_addr(target->node.start) | 1;
 }
 
 static bool object_is_idle(struct drm_i915_gem_object *obj)
@@ -616,13 +1034,12 @@ static bool object_is_idle(struct drm_i915_gem_object *obj)
 	return true;
 }
 
-static int
-eb_relocate_entry(struct i915_vma *vma,
-		  struct i915_execbuffer *eb,
-		  struct drm_i915_gem_relocation_entry *reloc)
+static u64
+eb_relocate_entry(struct i915_execbuffer *eb,
+		  const struct i915_vma *vma,
+		  const struct drm_i915_gem_relocation_entry *reloc)
 {
 	struct i915_vma *target;
-	u64 target_offset;
 	int ret;
 
 	/* we've already hold a reference to all valid objects */
@@ -653,26 +1070,28 @@ eb_relocate_entry(struct i915_vma *vma,
 		return -EINVAL;
 	}
 
-	if (reloc->write_domain)
+	if (reloc->write_domain) {
 		target->exec_entry->flags |= EXEC_OBJECT_WRITE;
 
-	/* Sandybridge PPGTT errata: We need a global gtt mapping for MI and
-	 * pipe_control writes because the gpu doesn't properly redirect them
-	 * through the ppgtt for non_secure batchbuffers.
-	 */
-	if (unlikely(IS_GEN6(eb->i915) &&
-		     reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION)) {
-		ret = i915_vma_bind(target, target->obj->cache_level,
-				    PIN_GLOBAL);
-		if (WARN_ONCE(ret, "Unexpected failure to bind target VMA!"))
-			return ret;
+		/* Sandybridge PPGTT errata: We need a global gtt mapping
+		 * for MI and pipe_control writes because the gpu doesn't
+		 * properly redirect them through the ppgtt for non_secure
+		 * batchbuffers.
+		 */
+		if (reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION &&
+		    IS_GEN6(eb->i915)) {
+			ret = i915_vma_bind(target, target->obj->cache_level,
+					    PIN_GLOBAL);
+			if (WARN_ONCE(ret,
+				      "Unexpected failure to bind target VMA!"))
+				return ret;
+		}
 	}
 
 	/* If the relocation already has the right value in it, no
 	 * more work needs to be done.
 	 */
-	target_offset = gen8_canonical_addr(target->node.start);
-	if (target_offset == reloc->presumed_offset)
+	if (gen8_canonical_addr(target->node.start) == reloc->presumed_offset)
 		return 0;
 
 	/* Check that the relocation address is valid... */
@@ -695,85 +1114,90 @@ eb_relocate_entry(struct i915_vma *vma,
 
 	/* We can't wait for rendering with pagefaults disabled */
 	if (pagefault_disabled() && !object_is_idle(vma->obj))
-		return -EFAULT;
-
-	ret = relocate_entry(vma->obj, reloc, &eb->reloc_cache, target_offset);
-	if (ret)
-		return ret;
+		return -EBUSY;
 
 	/* and update the user's relocation entry */
-	reloc->presumed_offset = target_offset;
-	return 0;
+	return relocate_entry(vma->obj, reloc, &eb->reloc_cache, target);
 }
 
-static int eb_relocate_vma(struct i915_vma *vma, struct i915_execbuffer *eb)
+static int eb_relocate_vma(struct i915_execbuffer *eb,
+			   const struct i915_vma *vma)
 {
 #define N_RELOC(x) ((x) / sizeof(struct drm_i915_gem_relocation_entry))
-	struct drm_i915_gem_relocation_entry stack_reloc[N_RELOC(512)];
-	struct drm_i915_gem_relocation_entry __user *user_relocs;
-	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-	int remain, ret = 0;
-
-	user_relocs = u64_to_user_ptr(entry->relocs_ptr);
+	struct drm_i915_gem_relocation_entry stack[N_RELOC(512)];
+	struct drm_i915_gem_relocation_entry __user *urelocs;
+	const struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
+	unsigned int remain;
 
+	urelocs = u64_to_user_ptr(entry->relocs_ptr);
 	remain = entry->relocation_count;
-	while (remain) {
-		struct drm_i915_gem_relocation_entry *r = stack_reloc;
-		int count = remain;
-		if (count > ARRAY_SIZE(stack_reloc))
-			count = ARRAY_SIZE(stack_reloc);
-		remain -= count;
+	if (unlikely(remain > ULONG_MAX / sizeof(*urelocs)))
+		return -EINVAL;
+
+	/*
+	 * We must check that the entire relocation array is safe
+	 * to read. However, if the array is not writable the user loses
+	 * the updated relocation values.
+	 */
 
-		if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
-			ret = -EFAULT;
+	do {
+		struct drm_i915_gem_relocation_entry *r = stack;
+		unsigned int count =
+			min_t(unsigned int, remain, ARRAY_SIZE(stack));
+
+		if (__copy_from_user_inatomic(r, urelocs, count*sizeof(r[0]))) {
+			remain = -EFAULT;
 			goto out;
 		}
 
+		remain -= count;
 		do {
-			u64 offset = r->presumed_offset;
+			u64 offset = eb_relocate_entry(eb, vma, r);
 
-			ret = eb_relocate_entry(vma, eb, r);
-			if (ret)
-				goto out;
-
-			if (r->presumed_offset != offset &&
-			    __put_user(r->presumed_offset,
-				       &user_relocs->presumed_offset)) {
-				ret = -EFAULT;
+			if (offset == 0) {
+			} else if ((s64)offset < 0) {
+				remain = (s64)offset;
 				goto out;
+			} else {
+				__put_user(offset & ~1,
+					   &urelocs[r-stack].presumed_offset);
 			}
-
-			user_relocs++;
-			r++;
-		} while (--count);
-	}
-
+		} while (r++, --count);
+		urelocs += ARRAY_SIZE(stack);
+	} while (remain);
 out:
 	reloc_cache_reset(&eb->reloc_cache);
-	return ret;
+	return remain;
 #undef N_RELOC
 }
 
 static int
-eb_relocate_vma_slow(struct i915_vma *vma,
-		     struct i915_execbuffer *eb,
-		     struct drm_i915_gem_relocation_entry *relocs)
+eb_relocate_vma_slow(struct i915_execbuffer *eb,
+		     const struct i915_vma *vma)
 {
 	const struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-	int i, ret = 0;
+	struct drm_i915_gem_relocation_entry *relocs =
+		to_ptr(typeof(*relocs), entry->relocs_ptr);
+	unsigned int i;
+	int ret;
 
 	for (i = 0; i < entry->relocation_count; i++) {
-		ret = eb_relocate_entry(vma, eb, &relocs[i]);
-		if (ret)
-			break;
+		u64 offset = eb_relocate_entry(eb, vma, &relocs[i]);
+
+		if ((s64)offset < 0) {
+			ret = (s64)offset;
+			goto err;
+		}
 	}
+	ret = 0;
+err:
 	reloc_cache_reset(&eb->reloc_cache);
 	return ret;
 }
 
 static int eb_relocate(struct i915_execbuffer *eb)
 {
-	struct i915_vma *vma;
+	const struct i915_vma *vma;
 	int ret = 0;
 
 	/* This is the fast path and we cannot handle a pagefault whilst
@@ -784,299 +1208,195 @@ static int eb_relocate(struct i915_execbuffer *eb)
 	 * lockdep complains vehemently.
 	 */
 	pagefault_disable();
-	list_for_each_entry(vma, &eb->vmas, exec_list) {
-		ret = eb_relocate_vma(vma, eb);
-		if (ret)
-			break;
+	list_for_each_entry(vma, &eb->relocs, reloc_link) {
+retry:
+		ret = eb_relocate_vma(eb, vma);
+		if (ret == 0)
+			continue;
+
+		if (ret == -EBUSY) {
+			pagefault_enable();
+			ret = i915_gem_object_wait_rendering(vma->obj, false);
+			pagefault_disable();
+			if (ret == 0)
+				goto retry;
+		}
+		break;
 	}
 	pagefault_enable();
 
 	return ret;
 }
 
-static bool only_mappable_for_reloc(unsigned int flags)
+static int check_relocations(const struct drm_i915_gem_exec_object2 *entry)
 {
-	return (flags & (EXEC_OBJECT_NEEDS_FENCE | __EXEC_OBJECT_NEEDS_MAP)) ==
-		__EXEC_OBJECT_NEEDS_MAP;
-}
-
-static int
-eb_reserve_vma(struct i915_vma *vma,
-	       struct intel_engine_cs *engine,
-	       bool *need_reloc)
-{
-	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-	uint64_t flags;
-	int ret;
-
-	flags = PIN_USER;
-	if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
-		flags |= PIN_GLOBAL;
+	const unsigned long relocs_max =
+		ULONG_MAX / sizeof(struct drm_i915_gem_relocation_entry);
+	const char __user *addr, *end;
+	unsigned long size;
+	unsigned int nreloc;
+	char c;
+
+	nreloc = entry->relocation_count;
+	if (nreloc == 0)
+		return 0;
 
-	if (!drm_mm_node_allocated(&vma->node)) {
-		/* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
-		 * limit address to the first 4GBs for unflagged objects.
-		 */
-		if ((entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) == 0)
-			flags |= PIN_ZONE_4G;
-		if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
-			flags |= PIN_GLOBAL | PIN_MAPPABLE;
-		if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
-			flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
-		if (entry->flags & EXEC_OBJECT_PINNED)
-			flags |= entry->offset | PIN_OFFSET_FIXED;
-		if ((flags & PIN_MAPPABLE) == 0)
-			flags |= PIN_HIGH;
-	}
-
-	ret = i915_vma_pin(vma,
-			   entry->pad_to_size,
-			   entry->alignment,
-			   flags);
-	if ((ret == -ENOSPC || ret == -E2BIG) &&
-	    only_mappable_for_reloc(entry->flags))
-		ret = i915_vma_pin(vma,
-				   entry->pad_to_size,
-				   entry->alignment,
-				   flags & ~PIN_MAPPABLE);
-	if (ret)
-		return ret;
+	if (nreloc > relocs_max)
+		return -EINVAL;
 
-	entry->flags |= __EXEC_OBJECT_HAS_PIN;
+	addr = u64_to_user_ptr(entry->relocs_ptr);
+	size = nreloc * sizeof(struct drm_i915_gem_relocation_entry);
+	if (!access_ok(VERIFY_WRITE, addr, size))
+		return -EFAULT;
 
-	if (entry->flags & EXEC_OBJECT_NEEDS_FENCE) {
-		ret = i915_vma_get_fence(vma);
+	end = addr + size;
+	for (; addr < end; addr += PAGE_SIZE) {
+		int ret = __get_user(c, addr);
 		if (ret)
 			return ret;
-
-		if (i915_vma_pin_fence(vma))
-			entry->flags |= __EXEC_OBJECT_HAS_FENCE;
-	}
-
-	if (entry->offset != vma->node.start) {
-		entry->offset = vma->node.start;
-		*need_reloc = true;
 	}
-
-	return 0;
+	return __get_user(c, end - 1);
 }
 
-static bool
-need_reloc_mappable(struct i915_vma *vma)
+static int
+eb_copy_relocations(const struct i915_execbuffer *eb)
 {
-	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-
-	if (entry->relocation_count == 0)
-		return false;
-
-	if (!i915_vma_is_ggtt(vma))
-		return false;
+	const unsigned int count = eb->args->buffer_count;
+	unsigned int i;
+	int ret;
 
-	/* See also use_cpu_reloc() */
-	if (HAS_LLC(vma->obj->base.dev))
-		return false;
+	for (i = 0; i < count; i++) {
+		struct drm_i915_gem_relocation_entry __user *urelocs;
+		struct drm_i915_gem_relocation_entry *relocs;
+		unsigned int nreloc = eb->exec[i].relocation_count, j;
+		unsigned long size;
 
-	if (vma->obj->base.write_domain == I915_GEM_DOMAIN_CPU)
-		return false;
+		if (nreloc == 0)
+			continue;
 
-	return true;
-}
+		ret = check_relocations(&eb->exec[i]);
+		if (ret)
+			goto err;
 
-static bool
-eb_vma_misplaced(struct i915_vma *vma)
-{
-	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
+		urelocs = u64_to_user_ptr(eb->exec[i].relocs_ptr);
+		size = nreloc * sizeof(*relocs);
 
-	WARN_ON(entry->flags & __EXEC_OBJECT_NEEDS_MAP &&
-		!i915_vma_is_ggtt(vma));
+		relocs = drm_malloc_gfp(size, 1, GFP_TEMPORARY);
+		if (!relocs) {
+			ret = -ENOMEM;
+			goto err;
+		}
 
-	if (entry->alignment &&
-	    vma->node.start & (entry->alignment - 1))
-		return true;
+		/* copy_from_user is limited to 4GiB */
+		j = 0;
+		do {
+			u32 len = min_t(u64, 1ull<<31, size);
 
-	if (vma->node.size < entry->pad_to_size)
-		return true;
+			if (__copy_from_user(relocs + j, urelocs + j, len)) {
+				ret = -EFAULT;
+				goto err;
+			}
 
-	if (entry->flags & EXEC_OBJECT_PINNED &&
-	    vma->node.start != entry->offset)
-		return true;
+			size -= len;
+			BUILD_BUG_ON_NOT_POWER_OF_2(sizeof(*relocs));
+			j += len / sizeof(*relocs);
+		} while (size);
 
-	if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS &&
-	    vma->node.start < BATCH_OFFSET_BIAS)
-		return true;
+		/* As we do not update the known relocation offsets after
+		 * relocating (due to the complexities in lock handling),
+		 * we need to mark them as invalid now so that we force the
+		 * relocation processing next time. Just in case the target
+		 * object is evicted and then rebound into its old
+		 * presumed_offset before the next execbuffer - if that
+		 * happened we would make the mistake of assuming that the
+		 * relocations were valid.
+		 */
+		user_access_begin();
+		for (j = 0; j < nreloc; j++)
+			unsafe_put_user(-1,
+					&urelocs[j].presumed_offset,
+					end_user);
+end_user:
+		user_access_end();
 
-	/* avoid costly ping-pong once a batch bo ended up non-mappable */
-	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP &&
-	    !i915_vma_is_map_and_fenceable(vma))
-		return !only_mappable_for_reloc(entry->flags);
+		eb->exec[i].relocs_ptr = (uintptr_t)relocs;
+	}
 
-	if ((entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) == 0 &&
-	    (vma->node.start + vma->node.size - 1) >> 32)
-		return true;
+	return 0;
 
-	return false;
+err:
+	while (i--) {
+		struct drm_i915_gem_relocation_entry *relocs =
+			to_ptr(typeof(*relocs), eb->exec[i].relocs_ptr);
+		if (eb->exec[i].relocation_count)
+			drm_free_large(relocs);
+	}
+	return ret;
 }
 
-static int eb_reserve(struct i915_execbuffer *eb)
+static int eb_prefault_relocations(const struct i915_execbuffer *eb)
 {
-	const bool has_fenced_gpu_access = INTEL_GEN(eb->i915) < 4;
-	struct i915_vma *vma;
-	struct list_head ordered_vmas;
-	struct list_head pinned_vmas;
-	int retry;
-
-	INIT_LIST_HEAD(&ordered_vmas);
-	INIT_LIST_HEAD(&pinned_vmas);
-	while (!list_empty(&eb->vmas)) {
-		struct drm_i915_gem_exec_object2 *entry;
-		bool need_fence, need_mappable;
-
-		vma = list_first_entry(&eb->vmas, struct i915_vma, exec_list);
-		entry = vma->exec_entry;
-
-		if (eb->ctx->flags & CONTEXT_NO_ZEROMAP)
-			entry->flags |= __EXEC_OBJECT_NEEDS_BIAS;
-
-		if (!has_fenced_gpu_access)
-			entry->flags &= ~EXEC_OBJECT_NEEDS_FENCE;
-		need_fence =
-			entry->flags & EXEC_OBJECT_NEEDS_FENCE &&
-			i915_gem_object_is_tiled(vma->obj);
-		need_mappable = need_fence || need_reloc_mappable(vma);
-
-		if (entry->flags & EXEC_OBJECT_PINNED)
-			list_move_tail(&vma->exec_list, &pinned_vmas);
-		else if (need_mappable) {
-			entry->flags |= __EXEC_OBJECT_NEEDS_MAP;
-			list_move(&vma->exec_list, &ordered_vmas);
-		} else
-			list_move_tail(&vma->exec_list, &ordered_vmas);
-	}
-	list_splice(&ordered_vmas, &eb->vmas);
-	list_splice(&pinned_vmas, &eb->vmas);
-
-	/* Attempt to pin all of the buffers into the GTT.
-	 * This is done in 3 phases:
-	 *
-	 * 1a. Unbind all objects that do not match the GTT constraints for
-	 *     the execbuffer (fenceable, mappable, alignment etc).
-	 * 1b. Increment pin count for already bound objects.
-	 * 2.  Bind new objects.
-	 * 3.  Decrement pin count.
-	 *
-	 * This avoid unnecessary unbinding of later objects in order to make
-	 * room for the earlier objects *unless* we need to defragment.
-	 */
-	retry = 0;
-	do {
-		int ret = 0;
-
-		/* Unbind any ill-fitting objects or pin. */
-		list_for_each_entry(vma, &eb->vmas, exec_list) {
-			if (!drm_mm_node_allocated(&vma->node))
-				continue;
-
-			if (eb_vma_misplaced(vma))
-				ret = i915_vma_unbind(vma);
-			else
-				ret = eb_reserve_vma(vma, eb->engine, &eb->need_relocs);
-			if (ret)
-				goto err;
-		}
-
-		/* Bind fresh objects */
-		list_for_each_entry(vma, &eb->vmas, exec_list) {
-			if (drm_mm_node_allocated(&vma->node))
-				continue;
-
-			ret = eb_reserve_vma(vma, eb->engine, &eb->need_relocs);
-			if (ret)
-				goto err;
-		}
-
-err:
-		if (ret != -ENOSPC || retry++)
-			return ret;
+	const unsigned int count = eb->args->buffer_count;
+	unsigned int i;
 
-		/* Decrement pin count for bound objects */
-		list_for_each_entry(vma, &eb->vmas, exec_list)
-			eb_unreserve_vma(vma);
+	for (i = 0; i < count; i++) {
+		int ret;
 
-		ret = i915_gem_evict_vm(eb->vm, true);
+		ret = check_relocations(&eb->exec[i]);
 		if (ret)
 			return ret;
-	} while (1);
+	}
+
+	return 0;
 }
 
-static int
-eb_relocate_slow(struct i915_execbuffer *eb)
+static int eb_relocate_slow(struct i915_execbuffer *eb)
 {
-	const unsigned int count = eb->args->buffer_count;
 	struct drm_device *dev = &eb->i915->drm;
-	struct drm_i915_gem_relocation_entry *reloc;
-	struct i915_vma *vma;
-	int *reloc_offset;
-	int i, total, ret;
+	bool have_copy = false;
+	const struct i915_vma *vma;
+	int ret = 0;
+
+repeat:
+	if (signal_pending(current)) {
+		ret = -ERESTARTSYS;
+		goto out;
+	}
 
 	/* We may process another execbuffer during the unlock... */
 	eb_reset(eb);
 	mutex_unlock(&dev->struct_mutex);
 
-	total = 0;
-	for (i = 0; i < count; i++)
-		total += eb->exec[i].relocation_count;
-
-	reloc_offset = drm_malloc_ab(count, sizeof(*reloc_offset));
-	reloc = drm_malloc_ab(total, sizeof(*reloc));
-	if (reloc == NULL || reloc_offset == NULL) {
-		drm_free_large(reloc);
-		drm_free_large(reloc_offset);
-		mutex_lock(&dev->struct_mutex);
-		return -ENOMEM;
+	/* We take 3 passes through the slowpath.
+	 *
+	 * 1 - we try to just prefault all the user relocation entries and
+	 * then attempt to reuse the atomic pagefault disabled fast path again.
+	 *
+	 * 2 - we copy the user entries to a local buffer here outside of the
+	 * lock and allow ourselves to wait upon any rendering before
+	 * relocations
+	 *
+	 * 3 - we already have a local copy of the relocation entries, but
+	 * were interrupted (EAGAIN) whilst waiting for the objects, try again.
+	 */
+	if (ret == 0 && likely(!i915.prefault_disable)) {
+		ret = eb_prefault_relocations(eb);
+	} else if (!have_copy) {
+		ret = eb_copy_relocations(eb);
+		have_copy = true;
+	} else {
+		cond_resched();
+		ret = 0;
 	}
-
-	total = 0;
-	for (i = 0; i < count; i++) {
-		struct drm_i915_gem_relocation_entry __user *user_relocs;
-		u64 invalid_offset = (u64)-1;
-		int j;
-
-		user_relocs = u64_to_user_ptr(eb->exec[i].relocs_ptr);
-
-		if (copy_from_user(reloc+total, user_relocs,
-				   eb->exec[i].relocation_count * sizeof(*reloc))) {
-			ret = -EFAULT;
-			mutex_lock(&dev->struct_mutex);
-			goto err;
-		}
-
-		/* As we do not update the known relocation offsets after
-		 * relocating (due to the complexities in lock handling),
-		 * we need to mark them as invalid now so that we force the
-		 * relocation processing next time. Just in case the target
-		 * object is evicted and then rebound into its old
-		 * presumed_offset before the next execbuffer - if that
-		 * happened we would make the mistake of assuming that the
-		 * relocations were valid.
-		 */
-		for (j = 0; j < eb->exec[i].relocation_count; j++) {
-			if (__copy_to_user(&user_relocs[j].presumed_offset,
-					   &invalid_offset,
-					   sizeof(invalid_offset))) {
-				ret = -EFAULT;
-				mutex_lock(&dev->struct_mutex);
-				goto err;
-			}
-		}
-
-		reloc_offset[i] = total;
-		total += eb->exec[i].relocation_count;
+	if (ret) {
+		mutex_lock(&dev->struct_mutex);
+		goto out;
 	}
 
 	ret = i915_mutex_lock_interruptible(dev);
 	if (ret) {
 		mutex_lock(&dev->struct_mutex);
-		goto err;
+		goto out;
 	}
 
 	/* reacquire the objects */
@@ -1084,16 +1404,18 @@ eb_relocate_slow(struct i915_execbuffer *eb)
 	if (ret)
 		goto err;
 
-	ret = eb_reserve(eb);
-	if (ret)
-		goto err;
-
-	list_for_each_entry(vma, &eb->vmas, exec_list) {
-		int idx = vma->exec_entry - eb->exec;
-
-		ret = eb_relocate_vma_slow(vma, eb, reloc + reloc_offset[idx]);
-		if (ret)
-			goto err;
+	list_for_each_entry(vma, &eb->relocs, reloc_link) {
+		if (!have_copy) {
+			pagefault_disable();
+			ret = eb_relocate_vma(eb, vma);
+			pagefault_enable();
+			if (ret)
+				goto repeat;
+		} else {
+			ret = eb_relocate_vma_slow(eb, vma);
+			if (ret)
+				goto err;
+		}
 	}
 
 	/* Leave the user relocations as are, this is the painfully slow path,
@@ -1103,9 +1425,50 @@ eb_relocate_slow(struct i915_execbuffer *eb)
 	 */
 
 err:
-	drm_free_large(reloc);
-	drm_free_large(reloc_offset);
-	return ret;
+	if (ret == -EAGAIN)
+		goto repeat;
+
+out:
+	if (have_copy) {
+		const unsigned int count = eb->args->buffer_count;
+		unsigned int i;
+
+		for (i = 0; i < count; i++) {
+			const struct drm_i915_gem_exec_object2 *entry =
+				&eb->exec[i];
+			struct drm_i915_gem_relocation_entry *relocs;
+
+			if (entry->relocation_count == 0)
+				continue;
+
+			relocs = to_ptr(typeof(*relocs), entry->relocs_ptr);
+			drm_free_large(relocs);
+		}
+	}
+
+	return ret ?: have_copy;
+}
+
+static void eb_export_fence(struct drm_i915_gem_object *obj,
+			    struct drm_i915_gem_request *req,
+			    unsigned int flags)
+{
+	struct reservation_object *resv;
+
+	resv = i915_gem_object_get_dmabuf_resv(obj);
+	if (!resv)
+		return;
+
+	/* Ignore errors from failing to allocate the new fence, we can't
+	 * handle an error right now. Worst case should be missed
+	 * synchronisation leading to rendering corruption.
+	 */
+	ww_mutex_lock(&resv->lock, NULL);
+	if (flags & EXEC_OBJECT_WRITE)
+		reservation_object_add_excl_fence(resv, &req->fence);
+	else if (reservation_object_reserve_shared(resv) == 0)
+		reservation_object_add_shared_fence(resv, &req->fence);
+	ww_mutex_unlock(&resv->lock);
 }
 
 static unsigned int eb_other_engines(struct i915_execbuffer *eb)
@@ -1122,23 +1485,40 @@ static int
 eb_move_to_gpu(struct i915_execbuffer *eb)
 {
 	const unsigned int other_rings = eb_other_engines(eb);
-	struct i915_vma *vma;
+	const unsigned int count = eb->args->buffer_count;
+	unsigned int i;
 	int ret;
 
-	list_for_each_entry(vma, &eb->vmas, exec_list) {
+	for (i = 0; i < count; i++) {
+		const struct drm_i915_gem_exec_object2 *entry = &eb->exec[i];
+		struct i915_vma *vma = to_ptr(struct i915_vma, entry->rsvd2);
 		struct drm_i915_gem_object *obj = vma->obj;
 
 		if (obj->flags & other_rings) {
-			ret = i915_gem_object_sync(obj,
-						   eb->request,
-						   vma->exec_entry->flags & EXEC_OBJECT_WRITE);
+			ret = i915_gem_object_sync(obj, eb->request,
+						   entry->flags & EXEC_OBJECT_WRITE);
 			if (ret)
 				return ret;
 		}
 
-		if (obj->base.write_domain & I915_GEM_DOMAIN_CPU)
-			i915_gem_clflush_object(obj, false);
+		if (obj->base.write_domain) {
+			if (obj->base.write_domain & I915_GEM_DOMAIN_CPU)
+				i915_gem_clflush_object(obj, false);
+
+			obj->base.write_domain = 0;
+		}
+
+		if (entry->flags & EXEC_OBJECT_WRITE)
+			obj->base.read_domains = 0;
+		obj->base.read_domains |= I915_GEM_GPU_DOMAINS;
+
+		i915_vma_move_to_active(vma, eb->request, entry->flags);
+		eb_export_fence(obj, eb->request, entry->flags);
+
+		__eb_unreserve_vma(vma, entry);
+		vma->exec_entry = NULL;
 	}
+	eb->exec = NULL;
 
 	/* Unconditionally flush any chipset caches (for streaming writes). */
 	i915_gem_chipset_flush(eb->i915);
@@ -1170,104 +1550,6 @@ i915_gem_check_execbuffer(struct drm_i915_gem_execbuffer2 *exec)
 	return true;
 }
 
-static int
-validate_exec_list(struct drm_device *dev,
-		   struct drm_i915_gem_exec_object2 *exec,
-		   int count)
-{
-	unsigned relocs_total = 0;
-	unsigned relocs_max = UINT_MAX / sizeof(struct drm_i915_gem_relocation_entry);
-	unsigned invalid_flags;
-	int i;
-
-	/* INTERNAL flags must not overlap with external ones */
-	BUILD_BUG_ON(__EXEC_OBJECT_INTERNAL_FLAGS & ~__EXEC_OBJECT_UNKNOWN_FLAGS);
-
-	invalid_flags = __EXEC_OBJECT_UNKNOWN_FLAGS;
-	if (USES_FULL_PPGTT(dev))
-		invalid_flags |= EXEC_OBJECT_NEEDS_GTT;
-
-	for (i = 0; i < count; i++) {
-		char __user *ptr = u64_to_user_ptr(exec[i].relocs_ptr);
-		int length; /* limited by fault_in_pages_readable() */
-
-		if (exec[i].flags & invalid_flags)
-			return -EINVAL;
-
-		/* Offset can be used as input (EXEC_OBJECT_PINNED), reject
-		 * any non-page-aligned or non-canonical addresses.
-		 */
-		if (exec[i].flags & EXEC_OBJECT_PINNED) {
-			if (exec[i].offset !=
-			    gen8_canonical_addr(exec[i].offset & PAGE_MASK))
-				return -EINVAL;
-
-			/* From drm_mm perspective address space is continuous,
-			 * so from this point we're always using non-canonical
-			 * form internally.
-			 */
-			exec[i].offset = gen8_noncanonical_addr(exec[i].offset);
-		}
-
-		if (exec[i].alignment && !is_power_of_2(exec[i].alignment))
-			return -EINVAL;
-
-		/* pad_to_size was once a reserved field, so sanitize it */
-		if (exec[i].flags & EXEC_OBJECT_PAD_TO_SIZE) {
-			if (offset_in_page(exec[i].pad_to_size))
-				return -EINVAL;
-		} else {
-			exec[i].pad_to_size = 0;
-		}
-
-		/* First check for malicious input causing overflow in
-		 * the worst case where we need to allocate the entire
-		 * relocation tree as a single array.
-		 */
-		if (exec[i].relocation_count > relocs_max - relocs_total)
-			return -EINVAL;
-		relocs_total += exec[i].relocation_count;
-
-		length = exec[i].relocation_count *
-			sizeof(struct drm_i915_gem_relocation_entry);
-		/*
-		 * We must check that the entire relocation array is safe
-		 * to read, but since we may need to update the presumed
-		 * offsets during execution, check for full write access.
-		 */
-		if (!access_ok(VERIFY_WRITE, ptr, length))
-			return -EFAULT;
-
-		if (likely(!i915.prefault_disable)) {
-			if (fault_in_multipages_readable(ptr, length))
-				return -EFAULT;
-		}
-	}
-
-	return 0;
-}
-
-static int eb_select_context(struct i915_execbuffer *eb)
-{
-	struct i915_gem_context *ctx;
-	unsigned int ctx_id;
-
-	ctx_id = i915_execbuffer2_get_context_id(*eb->args);
-	ctx = i915_gem_context_lookup(eb->file->driver_priv, ctx_id);
-	if (unlikely(IS_ERR(ctx)))
-		return PTR_ERR(ctx);
-
-	if (unlikely(ctx->hang_stats.banned)) {
-		DRM_DEBUG("Context %u tried to submit while banned\n", ctx_id);
-		return -EIO;
-	}
-
-	eb->ctx = i915_gem_context_get(ctx);
-	eb->vm = ctx->ppgtt ? &ctx->ppgtt->base : &eb->i915->ggtt.base;
-
-	return 0;
-}
-
 void i915_vma_move_to_active(struct i915_vma *vma,
 			     struct drm_i915_gem_request *req,
 			     unsigned int flags)
@@ -1289,11 +1571,7 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 
 	if (flags & EXEC_OBJECT_WRITE) {
 		i915_gem_active_set(&obj->last_write, req);
-
 		intel_fb_obj_invalidate(obj, ORIGIN_CS);
-
-		/* update for the implicit flush after a batch */
-		obj->base.write_domain &= ~I915_GEM_GPU_DOMAINS;
 	}
 
 	if (flags & EXEC_OBJECT_NEEDS_FENCE)
@@ -1304,49 +1582,6 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 	list_move_tail(&vma->vm_link, &vma->vm->active_list);
 }
 
-static void eb_export_fence(struct drm_i915_gem_object *obj,
-			    struct drm_i915_gem_request *req,
-			    unsigned int flags)
-{
-	struct reservation_object *resv;
-
-	resv = i915_gem_object_get_dmabuf_resv(obj);
-	if (!resv)
-		return;
-
-	/* Ignore errors from failing to allocate the new fence, we can't
-	 * handle an error right now. Worst case should be missed
-	 * synchronisation leading to rendering corruption.
-	 */
-	ww_mutex_lock(&resv->lock, NULL);
-	if (flags & EXEC_OBJECT_WRITE)
-		reservation_object_add_excl_fence(resv, &req->fence);
-	else if (reservation_object_reserve_shared(resv) == 0)
-		reservation_object_add_shared_fence(resv, &req->fence);
-	ww_mutex_unlock(&resv->lock);
-}
-
-static void
-eb_move_to_active(struct i915_execbuffer *eb)
-{
-	struct i915_vma *vma;
-
-	list_for_each_entry(vma, &eb->vmas, exec_list) {
-		struct drm_i915_gem_object *obj = vma->obj;
-		u32 old_read = obj->base.read_domains;
-		u32 old_write = obj->base.write_domain;
-
-		obj->base.write_domain = 0;
-		if (vma->exec_entry->flags & EXEC_OBJECT_WRITE)
-			obj->base.read_domains = 0;
-		obj->base.read_domains |= I915_GEM_GPU_DOMAINS;
-
-		i915_vma_move_to_active(vma, eb->request, vma->exec_entry->flags);
-		eb_export_fence(obj, eb->request, vma->exec_entry->flags);
-		trace_i915_gem_object_change_domain(obj, old_read, old_write);
-	}
-}
-
 static int
 i915_reset_gen7_sol_offsets(struct drm_i915_gem_request *req)
 {
@@ -1358,16 +1593,16 @@ i915_reset_gen7_sol_offsets(struct drm_i915_gem_request *req)
 		return -EINVAL;
 	}
 
-	ret = intel_ring_begin(req, 4 * 3);
+	ret = intel_ring_begin(req, 4 * 2 + 2);
 	if (ret)
 		return ret;
 
+	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(4));
 	for (i = 0; i < 4; i++) {
-		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 		intel_ring_emit_reg(ring, GEN7_SO_WRITE_OFFSET(i));
 		intel_ring_emit(ring, 0);
 	}
-
+	intel_ring_emit(ring, MI_NOOP);
 	intel_ring_advance(ring);
 
 	return 0;
@@ -1403,9 +1638,10 @@ static struct i915_vma *eb_parse(struct i915_execbuffer *eb, bool is_master)
 		goto out;
 
 	vma->exec_entry =
-		memset(&eb->shadow_exec_entry, 0, sizeof(*vma->exec_entry));
+		memset(&eb->exec[eb->args->buffer_count++],
+		       0, sizeof(*vma->exec_entry));
 	vma->exec_entry->flags = __EXEC_OBJECT_HAS_PIN;
-	list_add_tail(&vma->exec_list, &eb->vmas);
+	vma->exec_entry->rsvd2 = (uintptr_t)vma;
 
 out:
 	i915_gem_object_unpin_pages(shadow_batch_obj);
@@ -1421,70 +1657,81 @@ add_to_client(struct drm_i915_gem_request *req,
 }
 
 static int
-execbuf_submit(struct i915_execbuffer *eb)
+eb_set_constants_offset(struct i915_execbuffer *eb)
 {
-	int instp_mode;
-	u32 instp_mask;
+	struct drm_i915_private *dev_priv = eb->i915;
+	struct intel_ring *ring;
+	u32 mode, mask;
 	int ret;
 
-	ret = eb_move_to_gpu(eb);
-	if (ret)
-		return ret;
-
-	ret = i915_switch_context(eb->request);
-	if (ret)
-		return ret;
-
-	instp_mode = eb->args->flags & I915_EXEC_CONSTANTS_MASK;
-	instp_mask = I915_EXEC_CONSTANTS_MASK;
-	switch (instp_mode) {
+	mode = eb->args->flags & I915_EXEC_CONSTANTS_MASK;
+	switch (mode) {
 	case I915_EXEC_CONSTANTS_REL_GENERAL:
 	case I915_EXEC_CONSTANTS_ABSOLUTE:
 	case I915_EXEC_CONSTANTS_REL_SURFACE:
-		if (instp_mode != 0 && eb->engine->id != RCS) {
-			DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
-			return -EINVAL;
-		}
-
-		if (instp_mode != eb->i915->relative_constants_mode) {
-			if (INTEL_INFO(eb->i915)->gen < 4) {
-				DRM_DEBUG("no rel constants on pre-gen4\n");
-				return -EINVAL;
-			}
-
-			if (INTEL_INFO(eb->i915)->gen > 5 &&
-			    instp_mode == I915_EXEC_CONSTANTS_REL_SURFACE) {
-				DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
-				return -EINVAL;
-			}
-
-			/* The HW changed the meaning on this bit on gen6 */
-			if (INTEL_INFO(eb->i915)->gen >= 6)
-				instp_mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
-		}
 		break;
 	default:
-		DRM_DEBUG("execbuf with unknown constants: %d\n", instp_mode);
+		DRM_DEBUG("execbuf with unknown constants: %d\n", mode);
 		return -EINVAL;
 	}
 
-	if (eb->engine->id == RCS &&
-	    instp_mode != eb->i915->relative_constants_mode) {
-		struct intel_ring *ring = eb->request->ring;
+	if (mode == dev_priv->relative_constants_mode)
+		return 0;
 
-		ret = intel_ring_begin(eb->request, 4);
-		if (ret)
-			return ret;
+	if (eb->engine->id != RCS) {
+		DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
+		return -EINVAL;
+	}
 
-		intel_ring_emit(ring, MI_NOOP);
-		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
-		intel_ring_emit_reg(ring, INSTPM);
-		intel_ring_emit(ring, instp_mask << 16 | instp_mode);
-		intel_ring_advance(ring);
+	if (INTEL_GEN(dev_priv) < 4) {
+		DRM_DEBUG("no rel constants on pre-gen4\n");
+		return -EINVAL;
+	}
 
-		eb->i915->relative_constants_mode = instp_mode;
+	if (INTEL_GEN(dev_priv) > 5 &&
+	    mode == I915_EXEC_CONSTANTS_REL_SURFACE) {
+		DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
+		return -EINVAL;
 	}
 
+	/* The HW changed the meaning on this bit on gen6 */
+	mask = I915_EXEC_CONSTANTS_MASK;
+	if (INTEL_GEN(dev_priv) >= 6)
+		mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
+
+	ret = intel_ring_begin(eb->request, 4);
+	if (ret)
+		return ret;
+
+	ring = eb->request->ring;
+	intel_ring_emit(ring, MI_NOOP);
+	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
+	intel_ring_emit_reg(ring, INSTPM);
+	intel_ring_emit(ring, mask << 16 | mode);
+	intel_ring_advance(ring);
+
+	dev_priv->relative_constants_mode = mode;
+
+	return 0;
+}
+
+static int
+eb_submit(struct i915_execbuffer *eb)
+{
+	int ret;
+
+	ret = eb_move_to_gpu(eb);
+	if (ret)
+		return ret;
+
+	ret = i915_switch_context(eb->request);
+	if (ret)
+		return ret;
+
+	ret = eb_set_constants_offset(eb);
+	if (ret)
+		return ret;
+
 	if (eb->args->flags & I915_EXEC_GEN7_SOL_RESET) {
 		ret = i915_reset_gen7_sol_offsets(eb->request);
 		if (ret)
@@ -1493,15 +1740,13 @@ execbuf_submit(struct i915_execbuffer *eb)
 
 	ret = eb->engine->emit_bb_start(eb->request,
 					eb->batch->node.start +
-					eb->batch_start_offset,
+					eb->args->batch_start_offset,
 					eb->args->batch_len,
 					eb->dispatch_flags);
 	if (ret)
 		return ret;
 
 	trace_i915_gem_ring_dispatch(eb->request, eb->dispatch_flags);
-
-	eb_move_to_active(eb);
 	add_to_client(eb->request, eb->file);
 
 	return 0;
@@ -1596,18 +1841,21 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	struct i915_execbuffer eb;
 	int ret;
 
+	BUILD_BUG_ON(__EXEC_OBJECT_INTERNAL_FLAGS & ~__EXEC_OBJECT_UNKNOWN_FLAGS);
+
 	if (!i915_gem_check_execbuffer(args))
 		return -EINVAL;
 
-	ret = validate_exec_list(dev, exec, args->buffer_count);
-	if (ret)
-		return ret;
-
 	eb.i915 = to_i915(dev);
 	eb.file = file;
 	eb.args = args;
-	eb.exec = exec;
-	eb.need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
+	if ((args->flags & I915_EXEC_NO_RELOC) == 0)
+		args->flags |= __EXEC_HAS_RELOC;
+	eb.exec = NULL;
+	eb.ctx = NULL;
+	eb.invalid_flags = __EXEC_OBJECT_UNKNOWN_FLAGS;
+	if (USES_FULL_PPGTT(eb.i915))
+		eb.invalid_flags |= EXEC_OBJECT_NEEDS_GTT;
 	reloc_cache_init(&eb.reloc_cache, eb.i915);
 
 	eb.dispatch_flags = 0;
@@ -1638,6 +1886,9 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		eb.dispatch_flags |= I915_DISPATCH_RS;
 	}
 
+	if (eb_create(&eb))
+		return -ENOMEM;
+
 	/* Take a local wakeref for preparing to dispatch the execbuf as
 	 * we expect to access the hardware fairly frequently in the
 	 * process. Upon first dispatch, we acquire another prolonged
@@ -1645,70 +1896,56 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	 * 100ms.
 	 */
 	intel_runtime_pm_get(eb.i915);
-
 	ret = i915_mutex_lock_interruptible(dev);
 	if (ret)
-		goto pre_mutex_err;
+		goto err_rpm;
 
 	ret = eb_select_context(&eb);
-	if (ret) {
-		mutex_unlock(&dev->struct_mutex);
-		goto pre_mutex_err;
-	}
-
-	if (eb_create(&eb)) {
-		i915_gem_context_put(eb.ctx);
-		mutex_unlock(&dev->struct_mutex);
-		ret = -ENOMEM;
-		goto pre_mutex_err;
-	}
+	if (unlikely(ret))
+		goto err_unlock;
 
-	/* Look up object handles */
+	eb.exec = exec;
 	ret = eb_lookup_vmas(&eb);
-	if (ret)
-		goto err;
-
-	/* take note of the batch buffer before we might reorder the lists */
-	eb.batch = eb_get_batch(&eb);
-
-	/* Move the objects en-masse into the GTT, evicting if necessary. */
-	ret = eb_reserve(&eb);
-	if (ret)
-		goto err;
+	if (unlikely(ret))
+		goto err_context;
 
 	/* The objects are in their final locations, apply the relocations. */
-	if (eb.need_relocs)
+	if (args->flags & __EXEC_HAS_RELOC && !list_empty(&eb.relocs)) {
 		ret = eb_relocate(&eb);
-	if (ret) {
-		if (ret == -EFAULT) {
+		if (ret == -EAGAIN || ret == -EFAULT)
 			ret = eb_relocate_slow(&eb);
-			BUG_ON(!mutex_is_locked(&dev->struct_mutex));
-		}
-		if (ret)
-			goto err;
+		if (ret && args->flags & I915_EXEC_NO_RELOC)
+			/* If the user expects the execobject.offset and
+			 * reloc.presumed_offset to be an exact match,
+			 * as for using NO_RELOC, then we cannot update
+			 * the execobject.offset until we have completed
+			 * relocation.
+			 */
+			args->flags &= ~__EXEC_HAS_RELOC;
+		if (ret < 0)
+			goto err_vma;
 	}
 
 	/* Set the pending read domains for the batch buffer to COMMAND */
-	if (eb.batch->exec_entry->flags & EXEC_OBJECT_WRITE) {
+	if (unlikely(eb.batch->exec_entry->flags & EXEC_OBJECT_WRITE)) {
 		DRM_DEBUG("Attempting to use self-modifying batch buffer\n");
 		ret = -EINVAL;
-		goto err;
+		goto err_vma;
 	}
 	if (args->batch_start_offset > eb.batch->size ||
 	    args->batch_len > eb.batch->size - args->batch_start_offset) {
 		DRM_DEBUG("Attempting to use out-of-bounds batch\n");
 		ret = -EINVAL;
-		goto err;
+		goto err_vma;
 	}
 
-	eb.batch_start_offset = args->batch_start_offset;
 	if (intel_engine_needs_cmd_parser(eb.engine) && args->batch_len) {
 		struct i915_vma *vma;
 
 		vma = eb_parse(&eb, drm_is_current_master(file));
 		if (IS_ERR(vma)) {
 			ret = PTR_ERR(vma);
-			goto err;
+			goto err_vma;
 		}
 
 		if (vma) {
@@ -1722,7 +1959,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 			 * command parser has accepted.
 			 */
 			eb.dispatch_flags |= I915_DISPATCH_SECURE;
-			eb.batch_start_offset = 0;
+			eb.args->batch_start_offset = 0;
 			eb.batch = vma;
 		}
 	}
@@ -1731,7 +1968,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	 * batch" bit. Hence we need to pin secure batches into the global gtt.
 	 * hsw should have this fixed, but bdw mucks it up again. */
 	if (eb.dispatch_flags & I915_DISPATCH_SECURE) {
-		struct drm_i915_gem_object *obj = eb.batch->obj;
 		struct i915_vma *vma;
 
 		/*
@@ -1744,17 +1980,17 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		 *   fitting due to fragmentation.
 		 * So this is actually safe.
 		 */
-		vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, 0);
+		vma = i915_gem_object_ggtt_pin(eb.batch->obj, NULL, 0, 0, 0);
 		if (IS_ERR(vma)) {
 			ret = PTR_ERR(vma);
-			goto err;
+			goto err_vma;
 		}
 
 		eb.batch = vma;
 	}
 
 	if (args->batch_len == 0)
-		args->batch_len = eb.batch->size - eb.batch_start_offset;
+		args->batch_len = eb.batch->size - eb.args->batch_start_offset;
 
 	/* Allocate a request for this batch buffer nice and early. */
 	eb.request = i915_gem_request_alloc(eb.engine, eb.ctx);
@@ -1771,27 +2007,21 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	 */
 	eb.request->batch = eb.batch;
 
-	ret = execbuf_submit(&eb);
+	ret = eb_submit(&eb);
 	__i915_add_request(eb.request, ret == 0);
 
 err_batch_unpin:
-	/*
-	 * FIXME: We crucially rely upon the active tracking for the (ppgtt)
-	 * batch vma for correctness. For less ugly and less fragility this
-	 * needs to be adjusted to also track the ggtt batch vma properly as
-	 * active.
-	 */
 	if (eb.dispatch_flags & I915_DISPATCH_SECURE)
 		i915_vma_unpin(eb.batch);
-err:
-	/* the request owns the ref now */
-	eb_destroy(&eb);
+err_vma:
+	eb_release_vma(&eb);
+err_context:
+	i915_gem_context_put(eb.ctx);
+err_unlock:
 	mutex_unlock(&dev->struct_mutex);
-
-pre_mutex_err:
-	/* intel_gpu_busy should also get a ref, so it will free when the device
-	 * is really idle. */
+err_rpm:
 	intel_runtime_pm_put(eb.i915);
+	eb_destroy(&eb);
 	return ret;
 }
 
@@ -1815,8 +2045,12 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
 	}
 
 	/* Copy in the exec list from userland */
-	exec_list = drm_malloc_ab(sizeof(*exec_list), args->buffer_count);
-	exec2_list = drm_malloc_ab(sizeof(*exec2_list), args->buffer_count);
+	exec_list = drm_malloc_gfp(args->buffer_count,
+				   sizeof(*exec_list),
+				   __GFP_NOWARN | GFP_TEMPORARY);
+	exec2_list = drm_malloc_gfp(args->buffer_count + 1,
+				    sizeof(*exec2_list),
+				    __GFP_NOWARN | GFP_TEMPORARY);
 	if (exec_list == NULL || exec2_list == NULL) {
 		DRM_DEBUG("Failed to allocate exec list for %d buffers\n",
 			  args->buffer_count);
@@ -1841,7 +2075,7 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
 		exec2_list[i].relocs_ptr = exec_list[i].relocs_ptr;
 		exec2_list[i].alignment = exec_list[i].alignment;
 		exec2_list[i].offset = exec_list[i].offset;
-		if (INTEL_INFO(dev)->gen < 4)
+		if (INTEL_GEN(dev) < 4)
 			exec2_list[i].flags = EXEC_OBJECT_NEEDS_FENCE;
 		else
 			exec2_list[i].flags = 0;
@@ -1859,24 +2093,22 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
 	i915_execbuffer2_set_context_id(exec2, 0);
 
 	ret = i915_gem_do_execbuffer(dev, file, &exec2, exec2_list);
-	if (!ret) {
+	if (exec2.flags & __EXEC_HAS_RELOC) {
 		struct drm_i915_gem_exec_object __user *user_exec_list =
 			u64_to_user_ptr(args->buffers_ptr);
 
 		/* Copy the new buffer offsets back to the user's exec list. */
 		for (i = 0; i < args->buffer_count; i++) {
+			if ((exec2_list[i].offset & UPDATE) == 0)
+				continue;
+
 			exec2_list[i].offset =
-				gen8_canonical_addr(exec2_list[i].offset);
-			ret = __copy_to_user(&user_exec_list[i].offset,
-					     &exec2_list[i].offset,
-					     sizeof(user_exec_list[i].offset));
-			if (ret) {
-				ret = -EFAULT;
-				DRM_DEBUG("failed to copy %d exec entries "
-					  "back to user (%d)\n",
-					  args->buffer_count, ret);
+				gen8_canonical_addr(exec2_list[i].offset & PIN_OFFSET_MASK);
+			exec2_list[i].offset &= PIN_OFFSET_MASK;
+			if (__copy_to_user(&user_exec_list[i].offset,
+					   &exec2_list[i].offset,
+					   sizeof(user_exec_list[i].offset)))
 				break;
-			}
 		}
 	}
 
@@ -1890,11 +2122,11 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
 		     struct drm_file *file)
 {
 	struct drm_i915_gem_execbuffer2 *args = data;
-	struct drm_i915_gem_exec_object2 *exec2_list = NULL;
+	struct drm_i915_gem_exec_object2 *exec2_list;
 	int ret;
 
 	if (args->buffer_count < 1 ||
-	    args->buffer_count > UINT_MAX / sizeof(*exec2_list)) {
+	    args->buffer_count >= UINT_MAX / sizeof(*exec2_list)) {
 		DRM_DEBUG("execbuf2 with %d buffers\n", args->buffer_count);
 		return -EINVAL;
 	}
@@ -1904,45 +2136,48 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
-	exec2_list = drm_malloc_gfp(args->buffer_count,
+	exec2_list = drm_malloc_gfp(args->buffer_count + 1,
 				    sizeof(*exec2_list),
-				    GFP_TEMPORARY);
+				    __GFP_NOWARN | GFP_TEMPORARY);
 	if (exec2_list == NULL) {
 		DRM_DEBUG("Failed to allocate exec list for %d buffers\n",
 			  args->buffer_count);
 		return -ENOMEM;
 	}
-	ret = copy_from_user(exec2_list,
-			     u64_to_user_ptr(args->buffers_ptr),
-			     sizeof(*exec2_list) * args->buffer_count);
-	if (ret != 0) {
-		DRM_DEBUG("copy %d exec entries failed %d\n",
-			  args->buffer_count, ret);
+	if (copy_from_user(exec2_list,
+			   u64_to_user_ptr(args->buffers_ptr),
+			   sizeof(*exec2_list) * args->buffer_count)) {
+		DRM_DEBUG("copy %d exec entries failed\n", args->buffer_count);
 		drm_free_large(exec2_list);
 		return -EFAULT;
 	}
 
 	ret = i915_gem_do_execbuffer(dev, file, args, exec2_list);
-	if (!ret) {
-		/* Copy the new buffer offsets back to the user's exec list. */
+
+	/* Now that we have begun execution of the batchbuffer, we ignore
+	 * any new error after this point. Also given that we have already
+	 * updated the associated relocations, we try to write out the current
+	 * object locations irrespective of any error.
+	 */
+	if (args->flags & __EXEC_HAS_RELOC) {
 		struct drm_i915_gem_exec_object2 __user *user_exec_list =
-				   u64_to_user_ptr(args->buffers_ptr);
+			u64_to_user_ptr(args->buffers_ptr);
 		int i;
 
+		/* Copy the new buffer offsets back to the user's exec list. */
+		user_access_begin();
 		for (i = 0; i < args->buffer_count; i++) {
+			if ((exec2_list[i].offset & UPDATE) == 0)
+				continue;
+
 			exec2_list[i].offset =
-				gen8_canonical_addr(exec2_list[i].offset);
-			ret = __copy_to_user(&user_exec_list[i].offset,
-					     &exec2_list[i].offset,
-					     sizeof(user_exec_list[i].offset));
-			if (ret) {
-				ret = -EFAULT;
-				DRM_DEBUG("failed to copy %d exec entries "
-					  "back to user\n",
-					  args->buffer_count);
-				break;
-			}
+				gen8_canonical_addr(exec2_list[i].offset & PIN_OFFSET_MASK);
+			unsafe_put_user(exec2_list[i].offset,
+					&user_exec_list[i].offset,
+					end_user);
 		}
+end_user:
+		user_access_end();
 	}
 
 	drm_free_large(exec2_list);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index f9540683d2c0..ba04b0bf7fe0 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -230,7 +230,9 @@ struct i915_vma {
 	struct hlist_node obj_node;
 
 	/** This vma's place in the batchbuffer or on the eviction list */
-	struct list_head exec_list;
+	struct list_head exec_link;
+	struct list_head reloc_link;
+	struct list_head evict_link;
 
 	/**
 	 * Used for performing relocations during execbuffer insertion.
-- 
2.9.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread
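
A side note on the offset handling that the write-back hunks above rely on:
because the GPU offsets handed around here are page aligned, the low bits are
free to act as flags, which is how the series can cheaply mark which entries
actually gained a new location (the UPDATE test and the masking with ~1 /
PIN_OFFSET_MASK before copying back), and anything returned to userspace has
to be in gen8 canonical form, i.e. sign-extended from bit 47, which is the
job gen8_canonical_addr() does in the diff. The standalone sketch below
illustrates both tricks; the flag bit and mask values are chosen purely for
illustration and are not the definitions used by the series.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_PAGE_MASK   (~0xfffull)  /* GPU offsets are page aligned */
#define SKETCH_UPDATE_BIT  0x1ull       /* spare low bit: "offset changed" */

/* Sign-extend bit 47 so the upper bits agree, the role that
 * gen8_canonical_addr() plays for 48-bit addressing in the patch. */
static uint64_t canonical_48b(uint64_t address)
{
	return (uint64_t)(((int64_t)(address << 16)) >> 16);
}

int main(void)
{
	/* Example offset with bit 47 set, so the canonicalisation is visible. */
	uint64_t node_start = 0x0000820000045000ull;
	uint64_t stored, user_offset = 0;

	/* Kernel side: stash the new offset with the tag bit set. */
	stored = node_start | SKETCH_UPDATE_BIT;

	/* Write-back side: only tagged entries are copied out, and the
	 * tag is masked off before userspace ever sees the address. */
	if (stored & SKETCH_UPDATE_BIT) {
		user_offset = canonical_48b(stored & SKETCH_PAGE_MASK);
		printf("presumed_offset <- 0x%016llx\n",
		       (unsigned long long)user_offset);
	}

	assert((user_offset & ~SKETCH_PAGE_MASK) == 0);
	return 0;
}

Running it prints 0xffff820000045000 for the example address: the
page-aligned location with the upper 16 bits sign-extended from bit 47.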

* ✗ Fi.CI.BAT: warning for series starting with [01/17] drm/i915: Skip holding an object reference for execbuf preparation (rev2)
  2016-08-22  8:03 Execbuf fixes and major tuning Chris Wilson
                   ` (17 preceding siblings ...)
  2016-08-23 13:52 ` ✗ Fi.CI.BAT: failure for series starting with [01/17] drm/i915: Skip holding an object reference for execbuf preparation Patchwork
@ 2016-08-25  6:51 ` Patchwork
  18 siblings, 0 replies; 47+ messages in thread
From: Patchwork @ 2016-08-25  6:51 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/17] drm/i915: Skip holding an object reference for execbuf preparation (rev2)
URL   : https://patchwork.freedesktop.org/series/11399/
State : warning

== Summary ==

Series 11399v2 Series without cover letter
http://patchwork.freedesktop.org/api/1.0/series/11399/revisions/2/mbox/

Test drv_hangman:
        Subgroup error-state-basic:
                pass       -> DMESG-WARN (fi-snb-2600)
Test gem_basic:
        Subgroup bad-close:
                dmesg-warn -> PASS       (fi-snb-2600)
Test gem_exec_gttfill:
        Subgroup basic:
                pass       -> SKIP       (fi-snb-2600)
Test gem_ringfill:
        Subgroup basic-default-forked:
                pass       -> DMESG-WARN (fi-snb-2600)
                dmesg-warn -> PASS       (fi-snb-2520m)
        Subgroup basic-default-hang:
                dmesg-warn -> PASS       (fi-snb-2600)
                pass       -> DMESG-WARN (fi-snb-2520m)
Test kms_cursor_legacy:
        Subgroup basic-flip-vs-cursor-legacy:
                fail       -> PASS       (fi-bsw-n3050)
                fail       -> PASS       (fi-skl-6700k)
        Subgroup basic-flip-vs-cursor-varying-size:
                fail       -> PASS       (fi-bsw-n3050)
                fail       -> PASS       (fi-skl-6700k)

fi-bdw-5557u     total:252  pass:235  dwarn:0   dfail:0   fail:2   skip:15 
fi-bsw-n3050     total:252  pass:204  dwarn:0   dfail:0   fail:2   skip:46 
fi-hsw-4770k     total:252  pass:228  dwarn:0   dfail:0   fail:2   skip:22 
fi-hsw-4770r     total:252  pass:224  dwarn:0   dfail:0   fail:2   skip:26 
fi-ivb-3520m     total:252  pass:220  dwarn:0   dfail:0   fail:1   skip:31 
fi-skl-6260u     total:252  pass:236  dwarn:0   dfail:0   fail:2   skip:14 
fi-skl-6700k     total:252  pass:218  dwarn:4   dfail:0   fail:2   skip:28 
fi-snb-2520m     total:252  pass:199  dwarn:8   dfail:0   fail:2   skip:43 
fi-snb-2600      total:252  pass:198  dwarn:8   dfail:0   fail:2   skip:44 

Results at /archive/results/CI_IGT_test/Patchwork_2425/

e14363ed1c12c86726914e3a790662e84e631c37 drm-intel-nightly: 2016y-08m-24d-18h-39m-56s UTC integration manifest
0c45e82 drm/i915: Pin the pages whilst operating on them
1ed4323 drm/i915: Fix i915_gem_evict_for_vma (soft-pinning)
dd07652 drm/i915: Allow the user to pass a context to any ring
1c729cf drm/i915: Defer active reference until required
76e71f4 drm/i915: Skip holding an object reference for execbuf preparation

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 03/17] drm/i915: Allow the user to pass a context to any ring
  2016-08-23 13:33         ` John Harrison
@ 2016-08-25 15:28           ` John Harrison
  2016-08-29 12:21             ` Joonas Lahtinen
  0 siblings, 1 reply; 47+ messages in thread
From: John Harrison @ 2016-08-25 15:28 UTC (permalink / raw)
  To: intel-gfx


On 23/08/2016 14:33, John Harrison wrote:
> On 23/08/2016 14:28, John Harrison wrote:
>> On 22/08/2016 13:23, Chris Wilson wrote:
>>> On Mon, Aug 22, 2016 at 02:23:28PM +0300, Joonas Lahtinen wrote:
>>>> On ma, 2016-08-22 at 09:03 +0100, Chris Wilson wrote:
>>>>> With full-ppgtt, we want the user to have full control over their memory
>>>>> layout, with a separate instance per context. Forcing them to use a
>>>>> shared memory layout for !RCS not only duplicates the amount of work we
>>>>> have to do, but also defeats the memory segregation on offer.
>>>>>
>>>>> Signed-off-by: Chris Wilson<chris@chris-wilson.co.uk>
>>>>> ---
>>>>>   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 +----
>>>>>   1 file changed, 1 insertion(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>>>> index 8f9d5ad0cfd8..fb1a64738fb8 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>>>> @@ -1250,12 +1250,9 @@ static struct i915_gem_context *
>>>>>   i915_gem_validate_context(struct drm_device *dev, struct drm_file *file,
>>>>>   			  struct intel_engine_cs *engine, const u32 ctx_id)
>>>>>   {
>>>>> -	struct i915_gem_context *ctx = NULL;
>>>>> +	struct i915_gem_context *ctx;
>>>>>   	struct i915_ctx_hang_stats *hs;
>>>>>   
>>>>> -	if (engine->id != RCS && ctx_id != DEFAULT_CONTEXT_HANDLE)
>>>>> -		return ERR_PTR(-EINVAL);
>>>>> -
>>>> One would think this existed due to lack of testing or bugs in early
>>>> hardware. Do we need to use IS_GEN or some other means of validation?
>>> No.
>>>
>>> This has nothing to do with the hardware logical state (that is found
>>> within intel_context and only enabled where appropriate). The
>>> i915_gem_context is the driver's segregation between clients. Not only
>>> is it required for tracking clients independently (currently hangstats,
>>> but the context would be the first place we start enforcing cgroup-like
>>> controls), but it is vital for clients who want to control their memory
>>> layout without conflicts (with themselves and others).
>>> -Chris
>>>
>> It is also important for clients that want to submit lots of work in 
>> parallel from a single application by using multiple contexts. Other 
>> internal teams have been running with this patch for quite some time. 
>> I believe the only reason it has not been merged upstream before (it 
>> has been on the mailing list at least twice before that I know of) 
>> was the argument of no open source user.
>>
>> Reviewed-by: John Harrison <john.c.harrison@intel.com>
>>
>
> Actually, just found a previous instance. It had an r-b from Daniel 
> Thomas but Tvrtko vetoed it on the grounds of needing IGT coverage 
> first - message id '<565EF558.5050705@linux.intel.com>' on Dec 2nd 2015.
>

Just had a quick look at gem_ctx_switch and it does notice the change 
from this patch. Without it, the test skips the non-render engines; with 
it, it runs a bunch of non-default context tests across all engines. Is 
that sufficient to satisfy the IGT coverage requirement? Maybe with an 
update to make it fail rather than skip if it can't use a non-default 
context?

John.
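
For what it's worth, the skip-versus-fail distinction above boils down to
igt_require() versus igt_assert() around the first submission of a
non-default context on a !RCS ring. A rough sketch of what such a subtest
could look like follows; the IGT helper names and signatures are written
from memory rather than checked against a particular IGT revision, and it is
only an illustration, not the actual gem_ctx_switch code:

#include <stdint.h>
#include <unistd.h>

#include "igt.h"

/* Submit a trivial batch with a freshly created (non-default) context on
 * the requested ring and treat rejection as a failure rather than a skip. */
static void exec_nondefault_ctx(int fd, unsigned int ring)
{
	const uint32_t bbe = MI_BATCH_BUFFER_END;
	struct drm_i915_gem_exec_object2 obj = {
		.handle = gem_create(fd, 4096),
	};
	struct drm_i915_gem_execbuffer2 execbuf = {
		.buffers_ptr = (uintptr_t)&obj,
		.buffer_count = 1,
		.flags = ring,
	};
	uint32_t ctx = gem_context_create(fd);

	gem_write(fd, obj.handle, 0, &bbe, sizeof(bbe));
	i915_execbuffer2_set_context_id(execbuf, ctx);

	/* Skipping would be igt_require(__gem_execbuf(fd, &execbuf) == 0);
	 * asserting instead makes the missing uABI show up as a failure. */
	igt_assert_eq(__gem_execbuf(fd, &execbuf), 0);

	gem_context_destroy(fd, ctx);
	gem_close(fd, obj.handle);
}

igt_main
{
	int fd = -1;

	igt_fixture
		fd = drm_open_driver(DRIVER_INTEL);

	igt_subtest("nondefault-ctx-bsd")
		exec_nondefault_ctx(fd, I915_EXEC_BSD);

	igt_fixture
		close(fd);
}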



_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 08/17] drm/i915: Drop spinlocks around adding to the client request list
  2016-08-22  8:03 ` [PATCH 08/17] drm/i915: Drop spinlocks around adding to the client request list Chris Wilson
  2016-08-24 13:16   ` Mika Kuoppala
@ 2016-08-26  9:13   ` Mika Kuoppala
  2016-09-02 10:30   ` John Harrison
  2 siblings, 0 replies; 47+ messages in thread
From: Mika Kuoppala @ 2016-08-26  9:13 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Adding to the tail of the client request list is safe, as the only other
> user is the throttle ioctl, which iterates forwards over the list. It only
> needs protection against deletion of a request as it reads it; it simply
> won't see a new request added to the end of the list, or it would be too
> early and rejected. We can further reduce the number of spinlocks
> required when throttling by removing stale requests from the client_list
> as we throttle.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_debugfs.c        |  2 +-
>  drivers/gpu/drm/i915/i915_gem.c            | 14 ++++++------
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 14 +++++++-----
>  drivers/gpu/drm/i915/i915_gem_request.c    | 34 ++++++------------------------
>  drivers/gpu/drm/i915/i915_gem_request.h    |  4 +---
>  5 files changed, 23 insertions(+), 45 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 086053fa2820..996744708f31 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -480,7 +480,7 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
>  		mutex_lock(&dev->struct_mutex);
>  		request = list_first_entry_or_null(&file_priv->mm.request_list,
>  						   struct drm_i915_gem_request,
> -						   client_list);
> +						   client_link);
>  		rcu_read_lock();
>  		task = pid_task(request && request->ctx->pid ?
>  				request->ctx->pid : file->pid,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 7b8abda541e6..e432211e8b24 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3673,16 +3673,14 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
>  		return -EIO;
>  
>  	spin_lock(&file_priv->mm.lock);
> -	list_for_each_entry(request, &file_priv->mm.request_list, client_list) {
> +	list_for_each_entry(request, &file_priv->mm.request_list, client_link) {
>  		if (time_after_eq(request->emitted_jiffies, recent_enough))
>  			break;
>  
> -		/*
> -		 * Note that the request might not have been submitted yet.
> -		 * In which case emitted_jiffies will be zero.
> -		 */
> -		if (!request->emitted_jiffies)
> -			continue;
> +		if (target) {
> +			list_del(&target->client_link);
> +			target->file_priv = NULL;
> +		}
>  
>  		target = request;
>  	}
> @@ -4639,7 +4637,7 @@ void i915_gem_release(struct drm_device *dev, struct drm_file *file)
>  	 * file_priv.
>  	 */
>  	spin_lock(&file_priv->mm.lock);
> -	list_for_each_entry(request, &file_priv->mm.request_list, client_list)
> +	list_for_each_entry(request, &file_priv->mm.request_list, client_link)
>  		request->file_priv = NULL;
>  	spin_unlock(&file_priv->mm.lock);
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 125fb38eff40..5689445b1cd3 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1421,6 +1421,14 @@ out:
>  	return vma;
>  }
>  
> +static void
> +add_to_client(struct drm_i915_gem_request *req,
> +	      struct drm_file *file)
> +{
> +	req->file_priv = file->driver_priv;
> +	list_add_tail(&req->client_link, &req->file_priv->mm.request_list);
> +}
> +
>  static int
>  execbuf_submit(struct i915_execbuffer_params *params,
>  	       struct drm_i915_gem_execbuffer2 *args,
> @@ -1512,6 +1520,7 @@ execbuf_submit(struct i915_execbuffer_params *params,
>  	trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
>  
>  	i915_gem_execbuffer_move_to_active(vmas, params->request);
> +	add_to_client(params->request, params->file);
>  
>  	return 0;
>  }
> @@ -1808,10 +1817,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  	 */
>  	params->request->batch = params->batch;
>  
> -	ret = i915_gem_request_add_to_client(params->request, file);
> -	if (ret)
> -		goto err_request;
> -
>  	/*
>  	 * Save assorted stuff away to pass through to *_submission().
>  	 * NB: This data should be 'persistent' and not local as it will
> @@ -1825,7 +1830,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  	params->ctx                     = ctx;
>  
>  	ret = execbuf_submit(params, args, &eb->vmas);
> -err_request:
>  	__i915_add_request(params->request, ret == 0);
>  
>  err_batch_unpin:
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index 1a215320cefb..bf62427a35b7 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -115,42 +115,20 @@ const struct fence_ops i915_fence_ops = {
>  	.timeline_value_str = i915_fence_timeline_value_str,
>  };
>  
> -int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
> -				   struct drm_file *file)
> -{
> -	struct drm_i915_private *dev_private;
> -	struct drm_i915_file_private *file_priv;
> -
> -	WARN_ON(!req || !file || req->file_priv);
> -
> -	if (!req || !file)
> -		return -EINVAL;
> -
> -	if (req->file_priv)
> -		return -EINVAL;
> -
> -	dev_private = req->i915;
> -	file_priv = file->driver_priv;
> -
> -	spin_lock(&file_priv->mm.lock);
> -	req->file_priv = file_priv;
> -	list_add_tail(&req->client_list, &file_priv->mm.request_list);
> -	spin_unlock(&file_priv->mm.lock);
> -
> -	return 0;
> -}
> -
>  static inline void
>  i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
>  {
> -	struct drm_i915_file_private *file_priv = request->file_priv;
> +	struct drm_i915_file_private *file_priv;
>  
> +	file_priv = request->file_priv;
>  	if (!file_priv)
>  		return;
>  
>  	spin_lock(&file_priv->mm.lock);
> -	list_del(&request->client_list);
> -	request->file_priv = NULL;
> +	if (request->file_priv) {
> +		list_del(&request->client_link);
> +		request->file_priv = NULL;
> +	}
>  	spin_unlock(&file_priv->mm.lock);
>  }
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> index 6c72bd8d9423..9d5a66bfc509 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.h
> +++ b/drivers/gpu/drm/i915/i915_gem_request.h
> @@ -132,7 +132,7 @@ struct drm_i915_gem_request {
>  
>  	struct drm_i915_file_private *file_priv;
>  	/** file_priv list entry for this request */
> -	struct list_head client_list;
> +	struct list_head client_link;
>  
>  	/**
>  	 * The ELSP only accepts two elements at a time, so we queue
> @@ -167,8 +167,6 @@ static inline bool fence_is_i915(struct fence *fence)
>  struct drm_i915_gem_request * __must_check
>  i915_gem_request_alloc(struct intel_engine_cs *engine,
>  		       struct i915_gem_context *ctx);
> -int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
> -				   struct drm_file *file);
>  void i915_gem_request_retire_upto(struct drm_i915_gem_request *req);
>  
>  static inline u32
> -- 
> 2.9.3
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 07/17] drm/i915: Use the precomputed value for whether to enable command parsing
  2016-08-24 14:33   ` John Harrison
@ 2016-08-27  9:12     ` Chris Wilson
  0 siblings, 0 replies; 47+ messages in thread
From: Chris Wilson @ 2016-08-27  9:12 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx

On Wed, Aug 24, 2016 at 03:33:30PM +0100, John Harrison wrote:
> On 22/08/2016 09:03, Chris Wilson wrote:
> >As i915.enable_cmd_parser is an unsafe option, make it read-only at
> >runtime. Now that it is constant, we can use the value determined during
> >initialisation as to whether we need the cmdparser at execbuffer time.
> I'm confused. I can't see where i915.enable_cmd_parser is used now.
> It does not appear to be mentioned anywhere other than i915_params.c
> after this patch. It is not used to calculate
> engine->needs_cmd_parser. There also no longer seems to be a check
> for PPGTT either.

It was meant to be at the start of intel_engine_init_cmd_parser().
https://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=breadcrumbs&id=52265e7a149a1721f31c443ef27de362f74a3309

The joy of rebasing :|
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
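
For anyone reading along without following the link, the check being
described presumably amounts to an early bail-out at the top of
intel_engine_init_cmd_parser(), roughly of the shape sketched below. This is
only a guess at the hunk for illustration (line numbers omitted); the commit
linked above is the authoritative version.

--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ intel_engine_init_cmd_parser() @@
+	/* Honour the (now read-only) module option up front. */
+	if (!i915.enable_cmd_parser)
+		return;
+
 	/* existing per-engine command-table setup continues unchanged */
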
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 03/17] drm/i915: Allow the user to pass a context to any ring
  2016-08-25 15:28           ` John Harrison
@ 2016-08-29 12:21             ` Joonas Lahtinen
  0 siblings, 0 replies; 47+ messages in thread
From: Joonas Lahtinen @ 2016-08-29 12:21 UTC (permalink / raw)
  To: John Harrison, intel-gfx

On to, 2016-08-25 at 16:28 +0100, John Harrison wrote:
> Just had a quick look at gem_ctx_switch and that seems to notice the
> change with this patch. Without it skips non-render engines, with it
> runs a bunch of non-default context tests across all engines. Is that
> sufficient to satisfy the IGT coverage requirement? Maybe with an
> update to make it fail rather than skip if it can't use a non-default 
> context?
> 

Sounds good to me.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 08/17] drm/i915: Drop spinlocks around adding to the client request list
  2016-08-22  8:03 ` [PATCH 08/17] drm/i915: Drop spinlocks around adding to the client request list Chris Wilson
  2016-08-24 13:16   ` Mika Kuoppala
  2016-08-26  9:13   ` Mika Kuoppala
@ 2016-09-02 10:30   ` John Harrison
  2016-09-02 10:59     ` Chris Wilson
  2 siblings, 1 reply; 47+ messages in thread
From: John Harrison @ 2016-09-02 10:30 UTC (permalink / raw)
  To: intel-gfx

On 22/08/2016 09:03, Chris Wilson wrote:
> Adding to the tail of the client request list is safe, as the only other
> user is the throttle ioctl, which iterates forwards over the list. It only
> needs protection against deletion of a request as it reads it; it simply
> won't see a new request added to the end of the list, or it would be too
> early and rejected. We can further reduce the number of spinlocks
> required when throttling by removing stale requests from the client_list
> as we throttle.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c        |  2 +-
>   drivers/gpu/drm/i915/i915_gem.c            | 14 ++++++------
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 14 +++++++-----
>   drivers/gpu/drm/i915/i915_gem_request.c    | 34 ++++++------------------------
>   drivers/gpu/drm/i915/i915_gem_request.h    |  4 +---
>   5 files changed, 23 insertions(+), 45 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 086053fa2820..996744708f31 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -480,7 +480,7 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
>   		mutex_lock(&dev->struct_mutex);
>   		request = list_first_entry_or_null(&file_priv->mm.request_list,
>   						   struct drm_i915_gem_request,
> -						   client_list);
> +						   client_link);
>   		rcu_read_lock();
>   		task = pid_task(request && request->ctx->pid ?
>   				request->ctx->pid : file->pid,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 7b8abda541e6..e432211e8b24 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3673,16 +3673,14 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
>   		return -EIO;
>   
>   	spin_lock(&file_priv->mm.lock);
> -	list_for_each_entry(request, &file_priv->mm.request_list, client_list) {
> +	list_for_each_entry(request, &file_priv->mm.request_list, client_link) {
>   		if (time_after_eq(request->emitted_jiffies, recent_enough))
>   			break;
>   
> -		/*
> -		 * Note that the request might not have been submitted yet.
> -		 * In which case emitted_jiffies will be zero.
> -		 */
> -		if (!request->emitted_jiffies)
> -			continue;
> +		if (target) {
> +			list_del(&target->client_link);
> +			target->file_priv = NULL;
> +		}
>   
>   		target = request;
>   	}
> @@ -4639,7 +4637,7 @@ void i915_gem_release(struct drm_device *dev, struct drm_file *file)
>   	 * file_priv.
>   	 */
>   	spin_lock(&file_priv->mm.lock);
> -	list_for_each_entry(request, &file_priv->mm.request_list, client_list)
> +	list_for_each_entry(request, &file_priv->mm.request_list, client_link)
>   		request->file_priv = NULL;
>   	spin_unlock(&file_priv->mm.lock);
>   
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 125fb38eff40..5689445b1cd3 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1421,6 +1421,14 @@ out:
>   	return vma;
>   }
>   
> +static void
> +add_to_client(struct drm_i915_gem_request *req,
> +	      struct drm_file *file)
> +{
> +	req->file_priv = file->driver_priv;
> +	list_add_tail(&req->client_link, &req->file_priv->mm.request_list);
> +}
> +
>   static int
>   execbuf_submit(struct i915_execbuffer_params *params,
>   	       struct drm_i915_gem_execbuffer2 *args,
> @@ -1512,6 +1520,7 @@ execbuf_submit(struct i915_execbuffer_params *params,
>   	trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
>   
>   	i915_gem_execbuffer_move_to_active(vmas, params->request);
> +	add_to_client(params->request, params->file);
>   
>   	return 0;
>   }
> @@ -1808,10 +1817,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   	 */
>   	params->request->batch = params->batch;
>   
> -	ret = i915_gem_request_add_to_client(params->request, file);
> -	if (ret)
> -		goto err_request;
> -
>   	/*
>   	 * Save assorted stuff away to pass through to *_submission().
>   	 * NB: This data should be 'persistent' and not local as it will
> @@ -1825,7 +1830,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   	params->ctx                     = ctx;
>   
>   	ret = execbuf_submit(params, args, &eb->vmas);
> -err_request:
>   	__i915_add_request(params->request, ret == 0);
>   
>   err_batch_unpin:
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index 1a215320cefb..bf62427a35b7 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -115,42 +115,20 @@ const struct fence_ops i915_fence_ops = {
>   	.timeline_value_str = i915_fence_timeline_value_str,
>   };
>   
> -int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
> -				   struct drm_file *file)
> -{
> -	struct drm_i915_private *dev_private;
> -	struct drm_i915_file_private *file_priv;
> -
> -	WARN_ON(!req || !file || req->file_priv);
> -
> -	if (!req || !file)
> -		return -EINVAL;
> -
> -	if (req->file_priv)
> -		return -EINVAL;
> -
> -	dev_private = req->i915;
> -	file_priv = file->driver_priv;
> -
> -	spin_lock(&file_priv->mm.lock);
> -	req->file_priv = file_priv;
> -	list_add_tail(&req->client_list, &file_priv->mm.request_list);
> -	spin_unlock(&file_priv->mm.lock);
> -
> -	return 0;
> -}
> -
>   static inline void
>   i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
>   {
> -	struct drm_i915_file_private *file_priv = request->file_priv;
> +	struct drm_i915_file_private *file_priv;
>   
> +	file_priv = request->file_priv;
>   	if (!file_priv)
>   		return;
>   
>   	spin_lock(&file_priv->mm.lock);
> -	list_del(&request->client_list);
> -	request->file_priv = NULL;
> +	if (request->file_priv) {
Why check for request->file_priv again? The block above will exit if it 
is null. There surely can't be a race with remove_from_client being 
called concurrently with add_to_client? Especially as add_to_client no 
longer takes the spin_lock anyway.

> +		list_del(&request->client_link);
> +		request->file_priv = NULL;
> +	}
>   	spin_unlock(&file_priv->mm.lock);
>   }
>   
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> index 6c72bd8d9423..9d5a66bfc509 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.h
> +++ b/drivers/gpu/drm/i915/i915_gem_request.h
> @@ -132,7 +132,7 @@ struct drm_i915_gem_request {
>   
>   	struct drm_i915_file_private *file_priv;
>   	/** file_priv list entry for this request */
> -	struct list_head client_list;
> +	struct list_head client_link;
>   
>   	/**
>   	 * The ELSP only accepts two elements at a time, so we queue
> @@ -167,8 +167,6 @@ static inline bool fence_is_i915(struct fence *fence)
>   struct drm_i915_gem_request * __must_check
>   i915_gem_request_alloc(struct intel_engine_cs *engine,
>   		       struct i915_gem_context *ctx);
> -int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
> -				   struct drm_file *file);
>   void i915_gem_request_retire_upto(struct drm_i915_gem_request *req);
>   
>   static inline u32

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 08/17] drm/i915: Drop spinlocks around adding to the client request list
  2016-09-02 10:30   ` John Harrison
@ 2016-09-02 10:59     ` Chris Wilson
  2016-09-02 11:02       ` Chris Wilson
  0 siblings, 1 reply; 47+ messages in thread
From: Chris Wilson @ 2016-09-02 10:59 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx

On Fri, Sep 02, 2016 at 11:30:18AM +0100, John Harrison wrote:
> On 22/08/2016 09:03, Chris Wilson wrote:
> >Adding to the tail of the client request list no longer needs the
> >spinlock, as the only other user is the throttle ioctl, which iterates
> >forwards over the list. It only needs protection against deletion of a
> >request as it reads it; it simply won't see a new request added to the
> >end of the list, or it would be too early and rejected anyway. We can
> >further reduce the number of spinlocks required when throttling by
> >removing stale requests from the client_list as we throttle.
> >
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >---
> >  drivers/gpu/drm/i915/i915_debugfs.c        |  2 +-
> >  drivers/gpu/drm/i915/i915_gem.c            | 14 ++++++------
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 14 +++++++-----
> >  drivers/gpu/drm/i915/i915_gem_request.c    | 34 ++++++------------------------
> >  drivers/gpu/drm/i915/i915_gem_request.h    |  4 +---
> >  5 files changed, 23 insertions(+), 45 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> >index 086053fa2820..996744708f31 100644
> >--- a/drivers/gpu/drm/i915/i915_debugfs.c
> >+++ b/drivers/gpu/drm/i915/i915_debugfs.c
> >@@ -480,7 +480,7 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
> >  		mutex_lock(&dev->struct_mutex);
> >  		request = list_first_entry_or_null(&file_priv->mm.request_list,
> >  						   struct drm_i915_gem_request,
> >-						   client_list);
> >+						   client_link);
> >  		rcu_read_lock();
> >  		task = pid_task(request && request->ctx->pid ?
> >  				request->ctx->pid : file->pid,
> >diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >index 7b8abda541e6..e432211e8b24 100644
> >--- a/drivers/gpu/drm/i915/i915_gem.c
> >+++ b/drivers/gpu/drm/i915/i915_gem.c
> >@@ -3673,16 +3673,14 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
> >  		return -EIO;
> >  	spin_lock(&file_priv->mm.lock);
> >-	list_for_each_entry(request, &file_priv->mm.request_list, client_list) {
> >+	list_for_each_entry(request, &file_priv->mm.request_list, client_link) {
> >  		if (time_after_eq(request->emitted_jiffies, recent_enough))
> >  			break;
> >-		/*
> >-		 * Note that the request might not have been submitted yet.
> >-		 * In which case emitted_jiffies will be zero.
> >-		 */
> >-		if (!request->emitted_jiffies)
> >-			continue;
> >+		if (target) {
> >+			list_del(&target->client_link);
> >+			target->file_priv = NULL;
> >+		}
> >  		target = request;
> >  	}
> >@@ -4639,7 +4637,7 @@ void i915_gem_release(struct drm_device *dev, struct drm_file *file)
> >  	 * file_priv.
> >  	 */
> >  	spin_lock(&file_priv->mm.lock);
> >-	list_for_each_entry(request, &file_priv->mm.request_list, client_list)
> >+	list_for_each_entry(request, &file_priv->mm.request_list, client_link)
> >  		request->file_priv = NULL;
> >  	spin_unlock(&file_priv->mm.lock);
> >diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >index 125fb38eff40..5689445b1cd3 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >@@ -1421,6 +1421,14 @@ out:
> >  	return vma;
> >  }
> >+static void
> >+add_to_client(struct drm_i915_gem_request *req,
> >+	      struct drm_file *file)
> >+{
> >+	req->file_priv = file->driver_priv;
> >+	list_add_tail(&req->client_link, &req->file_priv->mm.request_list);
> >+}
> >+
> >  static int
> >  execbuf_submit(struct i915_execbuffer_params *params,
> >  	       struct drm_i915_gem_execbuffer2 *args,
> >@@ -1512,6 +1520,7 @@ execbuf_submit(struct i915_execbuffer_params *params,
> >  	trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
> >  	i915_gem_execbuffer_move_to_active(vmas, params->request);
> >+	add_to_client(params->request, params->file);
> >  	return 0;
> >  }
> >@@ -1808,10 +1817,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >  	 */
> >  	params->request->batch = params->batch;
> >-	ret = i915_gem_request_add_to_client(params->request, file);
> >-	if (ret)
> >-		goto err_request;
> >-
> >  	/*
> >  	 * Save assorted stuff away to pass through to *_submission().
> >  	 * NB: This data should be 'persistent' and not local as it will
> >@@ -1825,7 +1830,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >  	params->ctx                     = ctx;
> >  	ret = execbuf_submit(params, args, &eb->vmas);
> >-err_request:
> >  	__i915_add_request(params->request, ret == 0);
> >  err_batch_unpin:
> >diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> >index 1a215320cefb..bf62427a35b7 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_request.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_request.c
> >@@ -115,42 +115,20 @@ const struct fence_ops i915_fence_ops = {
> >  	.timeline_value_str = i915_fence_timeline_value_str,
> >  };
> >-int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
> >-				   struct drm_file *file)
> >-{
> >-	struct drm_i915_private *dev_private;
> >-	struct drm_i915_file_private *file_priv;
> >-
> >-	WARN_ON(!req || !file || req->file_priv);
> >-
> >-	if (!req || !file)
> >-		return -EINVAL;
> >-
> >-	if (req->file_priv)
> >-		return -EINVAL;
> >-
> >-	dev_private = req->i915;
> >-	file_priv = file->driver_priv;
> >-
> >-	spin_lock(&file_priv->mm.lock);
> >-	req->file_priv = file_priv;
> >-	list_add_tail(&req->client_list, &file_priv->mm.request_list);
> >-	spin_unlock(&file_priv->mm.lock);
> >-
> >-	return 0;
> >-}
> >-
> >  static inline void
> >  i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
> >  {
> >-	struct drm_i915_file_private *file_priv = request->file_priv;
> >+	struct drm_i915_file_private *file_priv;
> >+	file_priv = request->file_priv;
> >  	if (!file_priv)
> >  		return;
> >  	spin_lock(&file_priv->mm.lock);
> >-	list_del(&request->client_list);
> >-	request->file_priv = NULL;
> >+	if (request->file_priv) {
> Why check for request->file_priv again? The block above will exit if
> it is null. There surely can't be a race with remove_from_client
> being called concurrently with add_to_client? Especially as
> add_to_client no longer takes the spin_lock anyway.

We can however allow i915_gem_release() to be called concurrently. It
doesn't require struct_mutex.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 08/17] drm/i915: Drop spinlocks around adding to the client request list
  2016-09-02 10:59     ` Chris Wilson
@ 2016-09-02 11:02       ` Chris Wilson
  2016-09-02 13:20         ` John Harrison
  0 siblings, 1 reply; 47+ messages in thread
From: Chris Wilson @ 2016-09-02 11:02 UTC (permalink / raw)
  To: John Harrison, intel-gfx

On Fri, Sep 02, 2016 at 11:59:12AM +0100, Chris Wilson wrote:
> On Fri, Sep 02, 2016 at 11:30:18AM +0100, John Harrison wrote:
> > >+	if (request->file_priv) {
> > Why check for request->file_priv again? The block above will exit if
> > it is null. There surely can't be a race with remove_from_client
> > being called concurrently with add_to_client? Especially as
> > add_to_client no longer takes the spin_lock anyway.
> 
> We can however allow i915_gem_release() to be called concurrently. It
> doesn't require struct_mutex.

That's not correct. Setting file_priv = NULL is serialised by the
struct_mutex, not the file_priv->mm.lock. That spin_lock there is
misleading.
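
To spell that out, here is the helper as it ends up after this patch,
with comments giving one way to read the locking; treat the comments as
a sketch of the reasoning above rather than authoritative documentation:

static inline void
i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
{
	struct drm_i915_file_private *file_priv;

	/* Unlocked peek: if the request was never added to a client, or
	 * has already been removed, there is nothing to do and we avoid
	 * touching the spinlock entirely. */
	file_priv = request->file_priv;
	if (!file_priv)
		return;

	spin_lock(&file_priv->mm.lock);
	/* Recheck under mm.lock so that two paths unlinking the same
	 * request (retirement here, stale-request pruning in the
	 * throttle ioctl) cannot both call list_del(). */
	if (request->file_priv) {
		list_del(&request->client_link);
		request->file_priv = NULL;
	}
	spin_unlock(&file_priv->mm.lock);
}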
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 08/17] drm/i915: Drop spinlocks around adding to the client request list
  2016-09-02 11:02       ` Chris Wilson
@ 2016-09-02 13:20         ` John Harrison
  2016-09-02 13:38           ` Chris Wilson
  0 siblings, 1 reply; 47+ messages in thread
From: John Harrison @ 2016-09-02 13:20 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 02/09/2016 12:02, Chris Wilson wrote:
> On Fri, Sep 02, 2016 at 11:59:12AM +0100, Chris Wilson wrote:
>> On Fri, Sep 02, 2016 at 11:30:18AM +0100, John Harrison wrote:
>>>> +	if (request->file_priv) {
>>> Why check for request->file_priv again? The block above will exit if
>>> it is null. There surely can't be a race with remove_from_client
>>> being called concurrently with add_to_client? Especially as
>>> add_to_client no longer takes the spin_lock anyway.
>> We can however allow i915_gem_release() to be called concurrently. It
>> doesn't require struct_mutex.
> That's not correct. setting file_priv = NULL is serialised by the
> struct_mutex, not the file_priv->mm.lock. That spin_lock there is
> misleading.
> -Chris
>

The spinlock is to protect the list rather than the request, I thought.

AFAICS, i915_gem_release() is only called from i915_driver_preclose() 
and that holds the mutex lock. So you are safe?

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 08/17] drm/i915: Drop spinlocks around adding to the client request list
  2016-09-02 13:20         ` John Harrison
@ 2016-09-02 13:38           ` Chris Wilson
  0 siblings, 0 replies; 47+ messages in thread
From: Chris Wilson @ 2016-09-02 13:38 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx

On Fri, Sep 02, 2016 at 02:20:16PM +0100, John Harrison wrote:
> On 02/09/2016 12:02, Chris Wilson wrote:
> >On Fri, Sep 02, 2016 at 11:59:12AM +0100, Chris Wilson wrote:
> >>On Fri, Sep 02, 2016 at 11:30:18AM +0100, John Harrison wrote:
> >>>>+	if (request->file_priv) {
> >>>Why check for request->file_priv again? The block above will exit if
> >>>it is null. There surely can't be a race with remove_from_client
> >>>being called concurrently with add_to_client? Especially as
> >>>add_to_client no longer takes the spin_lock anyway.
> >>We can however allow i915_gem_release() to be called concurrently. It
> >>doesn't require struct_mutex.
> >That's not correct. setting file_priv = NULL is serialised by the
> >struct_mutex, not the file_priv->mm.lock. That spin_lock there is
> >misleading.
> >-Chris
> >
> 
> The spinlock is to protect the list rather than the request, I thought.

Hmm. We hold a ref to drm_file / drm_i915_file_private, the write
side of the list is serialised by struct_mutex, and we have RCU on the
list itself, so we could just swap the spinlock for an RCU list walk.
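
A rough sketch of what that read side could look like, assuming the
add/remove paths were switched to the _rcu list helpers and request
frees are already deferred to RCU (this is only a sketch of the idea,
not a tested conversion; the stale-request pruning above would also
have to move elsewhere, since a pure RCU reader cannot unlink entries):

	rcu_read_lock();
	list_for_each_entry_rcu(request, &file_priv->mm.request_list,
				client_link) {
		if (time_after_eq(request->emitted_jiffies, recent_enough))
			break;

		target = request;
	}
	/* Before dropping the RCU read lock we would still need to take
	 * a reference on target in a way that tolerates a concurrent
	 * free, e.g. something built on kref_get_unless_zero(). */
	rcu_read_unlock();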
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 06/17] drm/i915: Move obj->dirty:1 to obj->flags
  2016-08-22  8:03 ` [PATCH 06/17] drm/i915: Move obj->dirty:1 to obj->flags Chris Wilson
  2016-08-23 12:01   ` Joonas Lahtinen
@ 2016-09-06 11:37   ` Dave Gordon
  2016-09-06 13:16     ` Chris Wilson
  1 sibling, 1 reply; 47+ messages in thread
From: Dave Gordon @ 2016-09-06 11:37 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 22/08/16 09:03, Chris Wilson wrote:
> The obj->dirty bit is a companion to the obj->active bits that were
> moved to the obj->flags bitmask. Since we also update this bit inside
> the i915_vma_move_to_active() hotpath, we can aid gcc by also moving
> the obj->dirty bit to the obj->flags bitmask.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c        |  2 +-
>  drivers/gpu/drm/i915/i915_drv.h            | 22 +++++++++++++++++++++-
>  drivers/gpu/drm/i915/i915_gem.c            | 20 ++++++++++----------
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +--
>  drivers/gpu/drm/i915/i915_gem_userptr.c    |  6 +++---
>  drivers/gpu/drm/i915/i915_gpu_error.c      |  2 +-
>  drivers/gpu/drm/i915/intel_lrc.c           |  6 +++---
>  7 files changed, 40 insertions(+), 21 deletions(-)
>

[snip]

> @@ -3272,7 +3272,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>  	if (write) {
>  		obj->base.read_domains = I915_GEM_DOMAIN_GTT;
>  		obj->base.write_domain = I915_GEM_DOMAIN_GTT;
> -		obj->dirty = 1;
> +		i915_gem_object_set_dirty(obj);

At this point the object may or may not be pinned. From our conversation
back in April ...

>> What I particularly dislike about the current obj->dirty is that it is
>> strictly only valid inside a pin_pages/unpin_pages section. That isn't
>> clear from the API atm.
>> -Chris
>
> So, I tried replacing all instances of "obj->dirty = true" with my
> new function i915_gem_object_mark_dirty(), and added an assertion that
> it's called only when (pages_pin_count > 0) - and found a failure.
>
> Stack is:
>     i915_gem_object_mark_dirty
>     i915_gem_object_set_to_gtt_domain
>     i915_gem_set_domain_ioctl
>
> So is i915_gem_object_set_to_gtt_domain() wrong? It's done a
> get_pages but no pin_pages.

... and the same issue is still there. I think I worked out a 
hypothetical scenario where this would lead to data corruption:

* set to GTT => mark dirty
* BO paged out => flushed to swap, marked clean
* BO paged in => still clean
* update contents => still clean?
* get paged out => not written out?

But can this actually happen?

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 06/17] drm/i915: Move obj->dirty:1 to obj->flags
  2016-09-06 11:37   ` Dave Gordon
@ 2016-09-06 13:16     ` Chris Wilson
  0 siblings, 0 replies; 47+ messages in thread
From: Chris Wilson @ 2016-09-06 13:16 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Tue, Sep 06, 2016 at 12:37:33PM +0100, Dave Gordon wrote:
> On 22/08/16 09:03, Chris Wilson wrote:
> >The obj->dirty bit is a companion to the obj->active bits that were
> >moved to the obj->flags bitmask. Since we also update this bit inside
> >the i915_vma_move_to_active() hotpath, we can aid gcc by also moving
> >the obj->dirty bit to the obj->flags bitmask.
> >
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >---
> > drivers/gpu/drm/i915/i915_debugfs.c        |  2 +-
> > drivers/gpu/drm/i915/i915_drv.h            | 22 +++++++++++++++++++++-
> > drivers/gpu/drm/i915/i915_gem.c            | 20 ++++++++++----------
> > drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +--
> > drivers/gpu/drm/i915/i915_gem_userptr.c    |  6 +++---
> > drivers/gpu/drm/i915/i915_gpu_error.c      |  2 +-
> > drivers/gpu/drm/i915/intel_lrc.c           |  6 +++---
> > 7 files changed, 40 insertions(+), 21 deletions(-)
> >
> 
> [snip]
> 
> >@@ -3272,7 +3272,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
> > 	if (write) {
> > 		obj->base.read_domains = I915_GEM_DOMAIN_GTT;
> > 		obj->base.write_domain = I915_GEM_DOMAIN_GTT;
> >-		obj->dirty = 1;
> >+		i915_gem_object_set_dirty(obj);
> 
> At this point the object may or may not be pinned. From our
> conversation back in April ...
> 
> >>What I particularly dislike about the current obj->dirty is that it is
> >>strictly only valid inside a pin_pages/unpin_pages section. That isn't
> >>clear from the API atm.
> >>-Chris
> >
> >So, I tried replacing all instances of "obj->dirty = true" with my
> >new function i915_gem_object_mark_dirty(), and added an assertion that
> >it's called only when (pages_pin_count > 0) - and found a failure.
> >
> >Stack is:
> >    i915_gem_object_mark_dirty
> >    i915_gem_object_set_to_gtt_domain
> >    i915_gem_set_domain_ioctl
> >
> >So is i915_gem_object_set_to_gtt_domain() wrong? It's done a
> >get_pages but no pin_pages.
> 
> ... and the same issue is still there. I think I worked out a
> hypothetical scenario where this would lead to data corruption:
> 
> * set to GTT => mark dirty
> * BO paged out => flushed to swap, marked clean
> * BO paged in => still clean

Key step: how do we trigger the page in?

When we flush the pages out they are marked as being in the CPU domain.
Paging them back in requires user action, with set-domain, execbuf,
pwrite and a GTT mmap pagefault being the primary triggers. Any of
those will mark them dirty again.

> * update contents => still clean?
> * get paged out => not written out?
> 
> But can this actually happen?

Hopefully not, as the sequence should be more like:

* set to GTT => mark dirty
...
* BO paged out => flushed to swap, marked clean, marked as CPU
...
* set to write=GTT, BO paged in => mark dirty
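
For the record, a sketch of the assertion discussed above, layered on
the i915_gem_object_set_dirty() helper this patch adds;
i915_gem_object_mark_dirty() is the name from Dave's earlier experiment,
not something in the tree:

static inline void
i915_gem_object_mark_dirty(struct drm_i915_gem_object *obj)
{
	/* The dirty bit only has meaning while the backing pages are
	 * held, so complain if the caller has not pinned them. */
	WARN_ON(obj->pages_pin_count == 0);
	i915_gem_object_set_dirty(obj);
}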

-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2016-09-06 13:16 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-22  8:03 Execbuf fixes and major tuning Chris Wilson
2016-08-22  8:03 ` [PATCH 01/17] drm/i915: Skip holding an object reference for execbuf preparation Chris Wilson
2016-08-22 11:21   ` Joonas Lahtinen
2016-08-22 11:56     ` Chris Wilson
2016-08-22  8:03 ` [PATCH 02/17] drm/i915: Defer active reference until required Chris Wilson
2016-08-22  8:03 ` [PATCH 03/17] drm/i915: Allow the user to pass a context to any ring Chris Wilson
2016-08-22 11:23   ` Joonas Lahtinen
2016-08-22 12:23     ` Chris Wilson
2016-08-23 13:28       ` John Harrison
2016-08-23 13:33         ` John Harrison
2016-08-25 15:28           ` John Harrison
2016-08-29 12:21             ` Joonas Lahtinen
2016-08-22  8:03 ` [PATCH 04/17] drm/i915: Fix i915_gem_evict_for_vma (soft-pinning) Chris Wilson
2016-08-22  8:03 ` [PATCH 05/17] drm/i915: Pin the pages whilst operating on them Chris Wilson
2016-08-23 11:54   ` Joonas Lahtinen
2016-08-22  8:03 ` [PATCH 06/17] drm/i915: Move obj->dirty:1 to obj->flags Chris Wilson
2016-08-23 12:01   ` Joonas Lahtinen
2016-09-06 11:37   ` Dave Gordon
2016-09-06 13:16     ` Chris Wilson
2016-08-22  8:03 ` [PATCH 07/17] drm/i915: Use the precomputed value for whether to enable command parsing Chris Wilson
2016-08-24 14:33   ` John Harrison
2016-08-27  9:12     ` Chris Wilson
2016-08-22  8:03 ` [PATCH 08/17] drm/i915: Drop spinlocks around adding to the client request list Chris Wilson
2016-08-24 13:16   ` Mika Kuoppala
2016-08-24 13:25     ` Chris Wilson
2016-08-26  9:13   ` Mika Kuoppala
2016-09-02 10:30   ` John Harrison
2016-09-02 10:59     ` Chris Wilson
2016-09-02 11:02       ` Chris Wilson
2016-09-02 13:20         ` John Harrison
2016-09-02 13:38           ` Chris Wilson
2016-08-22  8:03 ` [PATCH 09/17] drm/i915: Amalgamate execbuffer parameter structures Chris Wilson
2016-08-24 13:20   ` John Harrison
2016-08-22  8:03 ` [PATCH 10/17] drm/i915: Use vma->exec_entry as our double-entry placeholder Chris Wilson
2016-08-22  8:03 ` [PATCH 11/17] drm/i915: Store a direct lookup from object handle to vma Chris Wilson
2016-08-22  8:03 ` [PATCH 12/17] drm/i915: Pass vma to relocate entry Chris Wilson
2016-08-22  8:03 ` [PATCH 13/17] drm/i915: Eliminate lots of iterations over the execobjects array Chris Wilson
2016-08-25  6:03   ` [PATCH v2] " Chris Wilson
2016-08-22  8:03 ` [PATCH 14/17] drm/i915: First try the previous execbuffer location Chris Wilson
2016-08-22  8:03 ` [PATCH 15/17] drm/i915: Wait upon userptr get-user-pages within execbuffer Chris Wilson
2016-08-23 10:53   ` Joonas Lahtinen
2016-08-23 11:14     ` Chris Wilson
2016-08-22  8:03 ` [PATCH 16/17] drm/i915: Remove superfluous i915_add_request_no_flush() helper Chris Wilson
2016-08-23 10:21   ` Joonas Lahtinen
2016-08-22  8:03 ` [PATCH 17/17] drm/i915: Use the MRU stack search after evicting Chris Wilson
2016-08-23 13:52 ` ✗ Fi.CI.BAT: failure for series starting with [01/17] drm/i915: Skip holding an object reference for execbuf preparation Patchwork
2016-08-25  6:51 ` ✗ Fi.CI.BAT: warning for series starting with [01/17] drm/i915: Skip holding an object reference for execbuf preparation (rev2) Patchwork
