* [PATCH 01/23] drm/i915: Drop vm.ref for duplicate vma on construction
@ 2020-07-02  8:32 ` Chris Wilson
  0 siblings, 0 replies; 56+ messages in thread
From: Chris Wilson @ 2020-07-02  8:32 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson, Tvrtko Ursulin, stable

As we allow parallel threads to create vma instances concurrently, and
we only filter out the duplicates upon reacquiring the spinlock for the
rbtree, we have to free the loser of the constructors' race. When
freeing, we should also drop any resource references acquired for the
redundant vma.
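
As a rough userspace analogy (illustrative names only, not the driver's
code), the losing thread must undo both the reference it took and the
allocation it made before acquiring the lock:

#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct vm { atomic_int ref; pthread_mutex_t lock; struct vma *vmas; };
struct vma { struct vm *vm; unsigned long key; struct vma *next; };

static struct vma *lookup_locked(struct vm *vm, unsigned long key)
{
        struct vma *pos;

        for (pos = vm->vmas; pos; pos = pos->next)
                if (pos->key == key)
                        return pos;
        return NULL;
}

static struct vma *vma_create(struct vm *vm, unsigned long key)
{
        struct vma *vma, *pos;

        atomic_fetch_add(&vm->ref, 1);          /* cf. i915_vm_get() */
        vma = calloc(1, sizeof(*vma));
        if (!vma) {
                atomic_fetch_sub(&vm->ref, 1);
                return NULL;
        }
        vma->vm = vm;
        vma->key = key;

        pthread_mutex_lock(&vm->lock);
        pos = lookup_locked(vm, key);
        if (pos) {
                /* lost the constructors' race: undo both steps */
                pthread_mutex_unlock(&vm->lock);
                atomic_fetch_sub(&vm->ref, 1);  /* cf. the added i915_vm_put() */
                free(vma);                      /* cf. i915_vma_free() */
                return pos;
        }
        vma->next = vm->vmas;
        vm->vmas = vma;
        pthread_mutex_unlock(&vm->lock);
        return vma;
}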

Fixes: 2850748ef876 ("drm/i915: Pull i915_vma_pin under the vm->mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
---
 drivers/gpu/drm/i915/i915_vma.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 1f63c4a1f055..7fe1f317cd2b 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -198,6 +198,7 @@ vma_create(struct drm_i915_gem_object *obj,
 		cmp = i915_vma_compare(pos, vm, view);
 		if (cmp == 0) {
 			spin_unlock(&obj->vma.lock);
+			i915_vm_put(vm);
 			i915_vma_free(vma);
 			return pos;
 		}
-- 
2.20.1



* [Intel-gfx] [PATCH 02/23] drm/i915/gem: Split the context's obj:vma lut into its own mutex
  2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
  (?)
@ 2020-07-02  8:32 ` Chris Wilson
  2020-07-02 22:09   ` Andi Shyti
  -1 siblings, 1 reply; 56+ messages in thread
From: Chris Wilson @ 2020-07-02  8:32 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Rather than reuse the common ctx->mutex for locking the execbuffer LUT,
split it into its own lock so that it is not taken [as part of
ctx->mutex] at inappropriate times. In particular, this avoids the lock
inversion that would arise once the next patch holds the timeline->mutex
for the whole execbuf submission.
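
As a rough userspace analogy (illustrative names, not the driver's API),
the split lets the LUT be updated under its own narrow lock while a
long-held submission lock is outstanding, without ever nesting the broad
context mutex inside it:

#include <pthread.h>

struct timeline { pthread_mutex_t mutex; };

struct context {
        pthread_mutex_t mutex;          /* broad context state */
        pthread_mutex_t lut_mutex;      /* protects only the handle->vma LUT */
};

static void lut_insert(struct context *ctx)
{
        pthread_mutex_lock(&ctx->lut_mutex);
        /* insert into the handles_vma radix tree here */
        pthread_mutex_unlock(&ctx->lut_mutex);
}

static void execbuf_submit(struct context *ctx, struct timeline *tl)
{
        pthread_mutex_lock(&tl->mutex); /* held across the whole submission */
        lut_insert(ctx);                /* only lut_mutex nests under tl->mutex */
        pthread_mutex_unlock(&tl->mutex);
        /* ctx->mutex is never taken here, so it cannot invert with tl->mutex */
}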

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c       | 11 +++++++----
 drivers/gpu/drm/i915/gem/i915_gem_context_types.h |  1 +
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c    |  4 ++--
 drivers/gpu/drm/i915/gem/i915_gem_object.c        |  4 ++--
 drivers/gpu/drm/i915/gem/selftests/mock_context.c |  4 +++-
 5 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 6675447a47b9..6574af699233 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -101,8 +101,7 @@ static void lut_close(struct i915_gem_context *ctx)
 	struct radix_tree_iter iter;
 	void __rcu **slot;
 
-	lockdep_assert_held(&ctx->mutex);
-
+	mutex_lock(&ctx->lut_mutex);
 	rcu_read_lock();
 	radix_tree_for_each_slot(slot, &ctx->handles_vma, &iter, 0) {
 		struct i915_vma *vma = rcu_dereference_raw(*slot);
@@ -135,6 +134,7 @@ static void lut_close(struct i915_gem_context *ctx)
 		i915_gem_object_put(obj);
 	}
 	rcu_read_unlock();
+	mutex_unlock(&ctx->lut_mutex);
 }
 
 static struct intel_context *
@@ -342,6 +342,7 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
 	spin_unlock(&ctx->i915->gem.contexts.lock);
 
 	mutex_destroy(&ctx->engines_mutex);
+	mutex_destroy(&ctx->lut_mutex);
 
 	if (ctx->timeline)
 		intel_timeline_put(ctx->timeline);
@@ -725,6 +726,7 @@ __create_context(struct drm_i915_private *i915)
 	RCU_INIT_POINTER(ctx->engines, e);
 
 	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
+	mutex_init(&ctx->lut_mutex);
 
 	/* NB: Mark all slices as needing a remap so that when the context first
 	 * loads it will restore whatever remap state already exists. If there
@@ -1312,11 +1314,11 @@ static int set_ppgtt(struct drm_i915_file_private *file_priv,
 	if (vm == rcu_access_pointer(ctx->vm))
 		goto unlock;
 
+	old = __set_ppgtt(ctx, vm);
+
 	/* Teardown the existing obj:vma cache, it will have to be rebuilt. */
 	lut_close(ctx);
 
-	old = __set_ppgtt(ctx, vm);
-
 	/*
 	 * We need to flush any requests using the current ppgtt before
 	 * we release it as the requests do not hold a reference themselves,
@@ -1330,6 +1332,7 @@ static int set_ppgtt(struct drm_i915_file_private *file_priv,
 	if (err) {
 		i915_vm_close(__set_ppgtt(ctx, old));
 		i915_vm_close(old);
+		lut_close(ctx); /* rebuild the old obj:vma cache */
 	}
 
 unlock:
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index 28760bd03265..ae14ca24a11f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -170,6 +170,7 @@ struct i915_gem_context {
 	 * per vm, which may be one per context or shared with the global GTT)
 	 */
 	struct radix_tree_root handles_vma;
+	struct mutex lut_mutex;
 
 	/**
 	 * @name: arbitrary name, used for user debug
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index b4862afaaf28..6d4bf38dcda8 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -782,7 +782,7 @@ static int __eb_add_lut(struct i915_execbuffer *eb,
 
 	/* Check that the context hasn't been closed in the meantime */
 	err = -EINTR;
-	if (!mutex_lock_interruptible(&ctx->mutex)) {
+	if (!mutex_lock_interruptible(&ctx->lut_mutex)) {
 		err = -ENOENT;
 		if (likely(!i915_gem_context_is_closed(ctx)))
 			err = radix_tree_insert(&ctx->handles_vma, handle, vma);
@@ -798,7 +798,7 @@ static int __eb_add_lut(struct i915_execbuffer *eb,
 			}
 			spin_unlock(&obj->lut_lock);
 		}
-		mutex_unlock(&ctx->mutex);
+		mutex_unlock(&ctx->lut_mutex);
 	}
 	if (unlikely(err))
 		goto err;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 6b69191c5543..f1165261f41e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -143,14 +143,14 @@ void i915_gem_close_object(struct drm_gem_object *gem, struct drm_file *file)
 		 * vma, in the same fd namespace, by virtue of flink/open.
 		 */
 
-		mutex_lock(&ctx->mutex);
+		mutex_lock(&ctx->lut_mutex);
 		vma = radix_tree_delete(&ctx->handles_vma, lut->handle);
 		if (vma) {
 			GEM_BUG_ON(vma->obj != obj);
 			GEM_BUG_ON(!atomic_read(&vma->open_count));
 			i915_vma_close(vma);
 		}
-		mutex_unlock(&ctx->mutex);
+		mutex_unlock(&ctx->lut_mutex);
 
 		i915_gem_context_put(lut->ctx);
 		i915_lut_handle_free(lut);
diff --git a/drivers/gpu/drm/i915/gem/selftests/mock_context.c b/drivers/gpu/drm/i915/gem/selftests/mock_context.c
index aa0d06cf1903..51b5a3421b40 100644
--- a/drivers/gpu/drm/i915/gem/selftests/mock_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/mock_context.c
@@ -23,6 +23,8 @@ mock_context(struct drm_i915_private *i915,
 	INIT_LIST_HEAD(&ctx->link);
 	ctx->i915 = i915;
 
+	mutex_init(&ctx->mutex);
+
 	spin_lock_init(&ctx->stale.lock);
 	INIT_LIST_HEAD(&ctx->stale.engines);
 
@@ -35,7 +37,7 @@ mock_context(struct drm_i915_private *i915,
 	RCU_INIT_POINTER(ctx->engines, e);
 
 	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
-	mutex_init(&ctx->mutex);
+	mutex_init(&ctx->lut_mutex);
 
 	if (name) {
 		struct i915_ppgtt *ppgtt;
-- 
2.20.1


* [Intel-gfx] [PATCH 03/23] drm/i915/gem: Drop forced struct_mutex from shrinker_taints_mutex
  2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
  (?)
  (?)
@ 2020-07-02  8:32 ` Chris Wilson
  2020-07-02 22:24   ` Andi Shyti
  -1 siblings, 1 reply; 56+ messages in thread
From: Chris Wilson @ 2020-07-02  8:32 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Since we no longer always take struct_mutex around everything, and we
want the freedom to create GEM objects, actually taking struct_mutex
inside the lock creation ends up pulling the mutex inside other locks.
Since we don't generally use struct_mutex any more, we can relax the
tainting.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
index 5b65ce738b16..1ced1e5d2ec0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
@@ -408,26 +408,15 @@ void i915_gem_driver_unregister__shrinker(struct drm_i915_private *i915)
 void i915_gem_shrinker_taints_mutex(struct drm_i915_private *i915,
 				    struct mutex *mutex)
 {
-	bool unlock = false;
-
 	if (!IS_ENABLED(CONFIG_LOCKDEP))
 		return;
 
-	if (!lockdep_is_held_type(&i915->drm.struct_mutex, -1)) {
-		mutex_acquire(&i915->drm.struct_mutex.dep_map,
-			      I915_MM_NORMAL, 0, _RET_IP_);
-		unlock = true;
-	}
-
 	fs_reclaim_acquire(GFP_KERNEL);
 
 	mutex_acquire(&mutex->dep_map, 0, 0, _RET_IP_);
 	mutex_release(&mutex->dep_map, _RET_IP_);
 
 	fs_reclaim_release(GFP_KERNEL);
-
-	if (unlock)
-		mutex_release(&i915->drm.struct_mutex.dep_map, _RET_IP_);
 }
 
 #define obj_to_i915(obj__) to_i915((obj__)->base.dev)
-- 
2.20.1


* [Intel-gfx] [PATCH 04/23] drm/i915/gem: Only revoke mmap handlers if active
  2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
                   ` (2 preceding siblings ...)
  (?)
@ 2020-07-02  8:32 ` Chris Wilson
  2020-07-02 12:35   ` Tvrtko Ursulin
  -1 siblings, 1 reply; 56+ messages in thread
From: Chris Wilson @ 2020-07-02  8:32 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Avoid waking up the device and taking stale locks if we know that the
object is not currently mmapped. This is particularly useful as not many
objects are actually mmapped and so we can destroy them without waking
the device up, and it gives us a little more freedom of workqueue
ordering during shutdown.
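
A minimal sketch of the guard, with hypothetical stand-ins for the real
checks (obj->userfault_count and RB_EMPTY_ROOT(&obj->mmo.offsets) in the
diff below):

#include <stdbool.h>

struct object {
        unsigned int userfault_count;   /* non-zero while a GTT mmap is live */
        bool have_mmap_offsets;         /* stand-in for a non-empty offsets rbtree */
};

/* stubs standing in for the expensive paths (wakeref, locks, tree walk) */
static void release_mmap_gtt(struct object *obj) { (void)obj; }
static void release_mmap_offset(struct object *obj) { (void)obj; }

static void release_mmap(struct object *obj)
{
        if (obj->userfault_count)       /* skip the device wakeup if never faulted */
                release_mmap_gtt(obj);

        if (obj->have_mmap_offsets)     /* skip the teardown if no offsets exist */
                release_mmap_offset(obj);
}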

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_mman.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index fe27c5b344e3..522ca4f51b53 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -516,8 +516,11 @@ void i915_gem_object_release_mmap_offset(struct drm_i915_gem_object *obj)
  */
 void i915_gem_object_release_mmap(struct drm_i915_gem_object *obj)
 {
-	i915_gem_object_release_mmap_gtt(obj);
-	i915_gem_object_release_mmap_offset(obj);
+	if (obj->userfault_count)
+		i915_gem_object_release_mmap_gtt(obj);
+
+	if (!RB_EMPTY_ROOT(&obj->mmo.offsets))
+		i915_gem_object_release_mmap_offset(obj);
 }
 
 static struct i915_mmap_offset *
-- 
2.20.1


* [Intel-gfx] [PATCH 05/23] drm/i915: Export ppgtt_bind_vma
  2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
                   ` (3 preceding siblings ...)
  (?)
@ 2020-07-02  8:32 ` Chris Wilson
  2020-07-03 10:09   ` Andi Shyti
  -1 siblings, 1 reply; 56+ messages in thread
From: Chris Wilson @ 2020-07-02  8:32 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Reuse ppgtt_bind_vma() for aliasing_ppgtt_bind_vma() so we can remove
some near-duplicated code. The catch is that we then need to pass along
the i915_address_space and not rely on vma->vm, as the two differ with
the aliasing-ppgtt.
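
The shape of the change, as a hypothetical stand-alone sketch (names are
illustrative, not the driver's): once the address space is an explicit
argument, the aliasing GGTT can hand ppgtt_bind() a vm other than
vma->vm.

#define LOCAL_BIND      0x1
#define GLOBAL_BIND     0x2

struct vm { const char *name; };
struct vma { struct vm *vm; };  /* nominally the GGTT for aliasing binds */

struct ggtt {
        struct vm vm;
        struct vm *alias;       /* the private ppgtt behind the aliasing GGTT */
};

/* operates purely on the vm it is handed, never on vma->vm */
static void ppgtt_bind(struct vm *vm, struct vma *vma, unsigned int flags)
{
        /* allocate the VA range and insert PTEs into *vm* */
}

static void aliasing_ggtt_bind(struct ggtt *ggtt, struct vma *vma,
                               unsigned int flags)
{
        if (flags & LOCAL_BIND)         /* reuse the ppgtt path on another vm */
                ppgtt_bind(ggtt->alias, vma, flags);
        if (flags & GLOBAL_BIND) {
                /* insert GGTT entries via &ggtt->vm */
        }
}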

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_client_blt.c    |  9 ++--
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |  7 +--
 drivers/gpu/drm/i915/gt/intel_ggtt.c          | 49 +++++++------------
 drivers/gpu/drm/i915/gt/intel_gtt.h           | 13 ++++-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         | 19 ++++---
 drivers/gpu/drm/i915/i915_vma.c               |  8 +--
 drivers/gpu/drm/i915/i915_vma_types.h         |  1 -
 drivers/gpu/drm/i915/selftests/mock_gtt.c     | 12 +++--
 8 files changed, 58 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
index d3a86a4d5c04..278664f831e7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
@@ -32,16 +32,17 @@ static void vma_clear_pages(struct i915_vma *vma)
 	vma->pages = NULL;
 }
 
-static int vma_bind(struct i915_vma *vma,
+static int vma_bind(struct i915_address_space *vm,
+		    struct i915_vma *vma,
 		    enum i915_cache_level cache_level,
 		    u32 flags)
 {
-	return vma->vm->vma_ops.bind_vma(vma, cache_level, flags);
+	return vm->vma_ops.bind_vma(vm, vma, cache_level, flags);
 }
 
-static void vma_unbind(struct i915_vma *vma)
+static void vma_unbind(struct i915_address_space *vm, struct i915_vma *vma)
 {
-	vma->vm->vma_ops.unbind_vma(vma);
+	vm->vma_ops.unbind_vma(vm, vma);
 }
 
 static const struct i915_vma_ops proxy_vma_ops = {
diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
index f4fec7eb4064..05497b50103f 100644
--- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
@@ -299,11 +299,12 @@ static void pd_vma_clear_pages(struct i915_vma *vma)
 	vma->pages = NULL;
 }
 
-static int pd_vma_bind(struct i915_vma *vma,
+static int pd_vma_bind(struct i915_address_space *vm,
+		       struct i915_vma *vma,
 		       enum i915_cache_level cache_level,
 		       u32 unused)
 {
-	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm);
+	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
 	struct gen6_ppgtt *ppgtt = vma->private;
 	u32 ggtt_offset = i915_ggtt_offset(vma) / I915_GTT_PAGE_SIZE;
 
@@ -314,7 +315,7 @@ static int pd_vma_bind(struct i915_vma *vma,
 	return 0;
 }
 
-static void pd_vma_unbind(struct i915_vma *vma)
+static void pd_vma_unbind(struct i915_address_space *vm, struct i915_vma *vma)
 {
 	struct gen6_ppgtt *ppgtt = vma->private;
 	struct i915_page_directory * const pd = ppgtt->base.pd;
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 323c328d444a..62979ea591f0 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -436,7 +436,8 @@ static void i915_ggtt_clear_range(struct i915_address_space *vm,
 	intel_gtt_clear_range(start >> PAGE_SHIFT, length >> PAGE_SHIFT);
 }
 
-static int ggtt_bind_vma(struct i915_vma *vma,
+static int ggtt_bind_vma(struct i915_address_space *vm,
+			 struct i915_vma *vma,
 			 enum i915_cache_level cache_level,
 			 u32 flags)
 {
@@ -451,15 +452,15 @@ static int ggtt_bind_vma(struct i915_vma *vma,
 	if (i915_gem_object_is_readonly(obj))
 		pte_flags |= PTE_READ_ONLY;
 
-	vma->vm->insert_entries(vma->vm, vma, cache_level, pte_flags);
+	vm->insert_entries(vm, vma, cache_level, pte_flags);
 	vma->page_sizes.gtt = I915_GTT_PAGE_SIZE;
 
 	return 0;
 }
 
-static void ggtt_unbind_vma(struct i915_vma *vma)
+static void ggtt_unbind_vma(struct i915_address_space *vm, struct i915_vma *vma)
 {
-	vma->vm->clear_range(vma->vm, vma->node.start, vma->size);
+	vm->clear_range(vm, vma->node.start, vma->size);
 }
 
 static int ggtt_reserve_guc_top(struct i915_ggtt *ggtt)
@@ -567,7 +568,8 @@ static int init_ggtt(struct i915_ggtt *ggtt)
 	return ret;
 }
 
-static int aliasing_gtt_bind_vma(struct i915_vma *vma,
+static int aliasing_gtt_bind_vma(struct i915_address_space *vm,
+				 struct i915_vma *vma,
 				 enum i915_cache_level cache_level,
 				 u32 flags)
 {
@@ -580,44 +582,27 @@ static int aliasing_gtt_bind_vma(struct i915_vma *vma,
 		pte_flags |= PTE_READ_ONLY;
 
 	if (flags & I915_VMA_LOCAL_BIND) {
-		struct i915_ppgtt *alias = i915_vm_to_ggtt(vma->vm)->alias;
+		struct i915_ppgtt *alias = i915_vm_to_ggtt(vm)->alias;
 
-		if (flags & I915_VMA_ALLOC) {
-			ret = alias->vm.allocate_va_range(&alias->vm,
-							  vma->node.start,
-							  vma->size);
-			if (ret)
-				return ret;
-
-			set_bit(I915_VMA_ALLOC_BIT, __i915_vma_flags(vma));
-		}
-
-		GEM_BUG_ON(!test_bit(I915_VMA_ALLOC_BIT,
-				     __i915_vma_flags(vma)));
-		alias->vm.insert_entries(&alias->vm, vma,
-					 cache_level, pte_flags);
+		ret = ppgtt_bind_vma(&alias->vm, vma, cache_level, flags);
+		if (ret)
+			return ret;
 	}
 
 	if (flags & I915_VMA_GLOBAL_BIND)
-		vma->vm->insert_entries(vma->vm, vma, cache_level, pte_flags);
+		vm->insert_entries(vm, vma, cache_level, pte_flags);
 
 	return 0;
 }
 
-static void aliasing_gtt_unbind_vma(struct i915_vma *vma)
+static void aliasing_gtt_unbind_vma(struct i915_address_space *vm,
+				    struct i915_vma *vma)
 {
-	if (i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND)) {
-		struct i915_address_space *vm = vma->vm;
-
+	if (i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND))
 		vm->clear_range(vm, vma->node.start, vma->size);
-	}
-
-	if (test_and_clear_bit(I915_VMA_ALLOC_BIT, __i915_vma_flags(vma))) {
-		struct i915_address_space *vm =
-			&i915_vm_to_ggtt(vma->vm)->alias->vm;
 
-		vm->clear_range(vm, vma->node.start, vma->size);
-	}
+	if (i915_vma_is_bound(vma, I915_VMA_LOCAL_BIND))
+		ppgtt_unbind_vma(&i915_vm_to_ggtt(vm)->alias->vm, vma);
 }
 
 static int init_aliasing_ppgtt(struct i915_ggtt *ggtt)
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index d93ebdf3fa0e..f2b75078e05f 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -198,14 +198,16 @@ struct intel_gt;
 
 struct i915_vma_ops {
 	/* Map an object into an address space with the given cache flags. */
-	int (*bind_vma)(struct i915_vma *vma,
+	int (*bind_vma)(struct i915_address_space *vm,
+			struct i915_vma *vma,
 			enum i915_cache_level cache_level,
 			u32 flags);
 	/*
 	 * Unmap an object from an address space. This usually consists of
 	 * setting the valid PTE entries to a reserved scratch page.
 	 */
-	void (*unbind_vma)(struct i915_vma *vma);
+	void (*unbind_vma)(struct i915_address_space *vm,
+			   struct i915_vma *vma);
 
 	int (*set_pages)(struct i915_vma *vma);
 	void (*clear_pages)(struct i915_vma *vma);
@@ -566,6 +568,13 @@ int ggtt_set_pages(struct i915_vma *vma);
 int ppgtt_set_pages(struct i915_vma *vma);
 void clear_pages(struct i915_vma *vma);
 
+int ppgtt_bind_vma(struct i915_address_space *vm,
+		   struct i915_vma *vma,
+		   enum i915_cache_level cache_level,
+		   u32 flags);
+void ppgtt_unbind_vma(struct i915_address_space *vm,
+		      struct i915_vma *vma);
+
 void gtt_write_workarounds(struct intel_gt *gt);
 
 void setup_private_pat(struct intel_uncore *uncore);
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index f86f7e68ce5e..f0862e924d11 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -155,16 +155,16 @@ struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt)
 	return ppgtt;
 }
 
-static int ppgtt_bind_vma(struct i915_vma *vma,
-			  enum i915_cache_level cache_level,
-			  u32 flags)
+int ppgtt_bind_vma(struct i915_address_space *vm,
+		   struct i915_vma *vma,
+		   enum i915_cache_level cache_level,
+		   u32 flags)
 {
 	u32 pte_flags;
 	int err;
 
-	if (flags & I915_VMA_ALLOC) {
-		err = vma->vm->allocate_va_range(vma->vm,
-						 vma->node.start, vma->size);
+	if (!test_bit(I915_VMA_ALLOC_BIT, __i915_vma_flags(vma))) {
+		err = vm->allocate_va_range(vm, vma->node.start, vma->size);
 		if (err)
 			return err;
 
@@ -176,17 +176,16 @@ static int ppgtt_bind_vma(struct i915_vma *vma,
 	if (i915_gem_object_is_readonly(vma->obj))
 		pte_flags |= PTE_READ_ONLY;
 
-	GEM_BUG_ON(!test_bit(I915_VMA_ALLOC_BIT, __i915_vma_flags(vma)));
-	vma->vm->insert_entries(vma->vm, vma, cache_level, pte_flags);
+	vm->insert_entries(vm, vma, cache_level, pte_flags);
 	wmb();
 
 	return 0;
 }
 
-static void ppgtt_unbind_vma(struct i915_vma *vma)
+void ppgtt_unbind_vma(struct i915_address_space *vm, struct i915_vma *vma)
 {
 	if (test_and_clear_bit(I915_VMA_ALLOC_BIT, __i915_vma_flags(vma)))
-		vma->vm->clear_range(vma->vm, vma->node.start, vma->size);
+		vm->clear_range(vm, vma->node.start, vma->size);
 }
 
 int ppgtt_set_pages(struct i915_vma *vma)
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 7fe1f317cd2b..627bac2e0252 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -308,7 +308,7 @@ static int __vma_bind(struct dma_fence_work *work)
 	struct i915_vma *vma = vw->vma;
 	int err;
 
-	err = vma->ops->bind_vma(vma, vw->cache_level, vw->flags);
+	err = vma->ops->bind_vma(vma->vm, vma, vw->cache_level, vw->flags);
 	if (err)
 		atomic_or(I915_VMA_ERROR, &vma->flags);
 
@@ -411,7 +411,7 @@ int i915_vma_bind(struct i915_vma *vma,
 
 		work->vma = vma;
 		work->cache_level = cache_level;
-		work->flags = bind_flags | I915_VMA_ALLOC;
+		work->flags = bind_flags;
 
 		/*
 		 * Note we only want to chain up to the migration fence on
@@ -437,7 +437,7 @@ int i915_vma_bind(struct i915_vma *vma,
 			work->pinned = vma->obj;
 		}
 	} else {
-		ret = vma->ops->bind_vma(vma, cache_level, bind_flags);
+		ret = vma->ops->bind_vma(vma->vm, vma, cache_level, bind_flags);
 		if (ret)
 			return ret;
 	}
@@ -1265,7 +1265,7 @@ void __i915_vma_evict(struct i915_vma *vma)
 
 	if (likely(atomic_read(&vma->vm->open))) {
 		trace_i915_vma_unbind(vma);
-		vma->ops->unbind_vma(vma);
+		vma->ops->unbind_vma(vma->vm, vma);
 	}
 	atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE),
 		   &vma->flags);
diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h
index 63831cdb7402..9e9082dc8f4b 100644
--- a/drivers/gpu/drm/i915/i915_vma_types.h
+++ b/drivers/gpu/drm/i915/i915_vma_types.h
@@ -235,7 +235,6 @@ struct i915_vma {
 #define I915_VMA_BIND_MASK (I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND)
 
 #define I915_VMA_ALLOC_BIT	12
-#define I915_VMA_ALLOC		((int)BIT(I915_VMA_ALLOC_BIT))
 
 #define I915_VMA_ERROR_BIT	13
 #define I915_VMA_ERROR		((int)BIT(I915_VMA_ERROR_BIT))
diff --git a/drivers/gpu/drm/i915/selftests/mock_gtt.c b/drivers/gpu/drm/i915/selftests/mock_gtt.c
index edc5e3dda8ca..b173086411ef 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gtt.c
@@ -38,7 +38,8 @@ static void mock_insert_entries(struct i915_address_space *vm,
 {
 }
 
-static int mock_bind_ppgtt(struct i915_vma *vma,
+static int mock_bind_ppgtt(struct i915_address_space *vm,
+			   struct i915_vma *vma,
 			   enum i915_cache_level cache_level,
 			   u32 flags)
 {
@@ -47,7 +48,8 @@ static int mock_bind_ppgtt(struct i915_vma *vma,
 	return 0;
 }
 
-static void mock_unbind_ppgtt(struct i915_vma *vma)
+static void mock_unbind_ppgtt(struct i915_address_space *vm,
+			      struct i915_vma *vma)
 {
 }
 
@@ -88,7 +90,8 @@ struct i915_ppgtt *mock_ppgtt(struct drm_i915_private *i915, const char *name)
 	return ppgtt;
 }
 
-static int mock_bind_ggtt(struct i915_vma *vma,
+static int mock_bind_ggtt(struct i915_address_space *vm,
+			  struct i915_vma *vma,
 			  enum i915_cache_level cache_level,
 			  u32 flags)
 {
@@ -96,7 +99,8 @@ static int mock_bind_ggtt(struct i915_vma *vma,
 	return 0;
 }
 
-static void mock_unbind_ggtt(struct i915_vma *vma)
+static void mock_unbind_ggtt(struct i915_address_space *vm,
+			     struct i915_vma *vma)
 {
 }
 
-- 
2.20.1


* [Intel-gfx] [PATCH 06/23] drm/i915: Preallocate stashes for vma page-directories
  2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
                   ` (4 preceding siblings ...)
  (?)
@ 2020-07-02  8:32 ` Chris Wilson
  2020-07-03 16:47   ` Tvrtko Ursulin
  -1 siblings, 1 reply; 56+ messages in thread
From: Chris Wilson @ 2020-07-02  8:32 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

We need to perform the DMA allocations used for page directories up
front so that we can include those allocations in our memory reservation
pass. The downside is that we have to assume the worst case, even before
we know the final layout, and always allocate enough page directories
for this object, even when there will be overlap.

It should be noted that the lifetime of the page directories' DMA is
more or less decoupled from individual fences, as they will be shared
across objects and across timelines.
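
The stash itself is just a singly-linked chain threaded through a spare
pointer in each page table: it is sized for the worst case up front, and
entries are popped without allocating once the locks are held. A
hypothetical userspace sketch of that mechanism (illustrative names):

#include <stdlib.h>

struct pt {
        struct pt *stash;       /* links spare, preallocated entries */
        /* ... page-table payload ... */
};

struct pt_stash { struct pt *pt; };

/* worst-case preallocation; on failure the caller frees the partial chain */
static int stash_alloc(struct pt_stash *stash, unsigned long count)
{
        while (count--) {
                struct pt *pt = calloc(1, sizeof(*pt));

                if (!pt)
                        return -1;
                pt->stash = stash->pt;
                stash->pt = pt;
        }
        return 0;
}

/* called with locks held: a pure pointer pop, no allocation, cannot fail */
static struct pt *stash_pop(struct pt_stash *stash)
{
        struct pt *pt = stash->pt;

        stash->pt = pt->stash;
        return pt;
}

/* release whatever the worst-case estimate left unused */
static void stash_free(struct pt_stash *stash)
{
        while (stash->pt) {
                struct pt *pt = stash->pt;

                stash->pt = pt->stash;
                free(pt);
        }
}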

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_client_blt.c    | 11 +--
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c          | 38 +++------
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 77 +++++-------------
 drivers/gpu/drm/i915/gt/intel_ggtt.c          | 45 +++++------
 drivers/gpu/drm/i915/gt/intel_gtt.h           | 39 ++++++---
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         | 80 ++++++++++++++++---
 drivers/gpu/drm/i915/i915_vma.c               | 29 ++++---
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 60 ++++++++------
 drivers/gpu/drm/i915/selftests/mock_gtt.c     | 22 ++---
 9 files changed, 224 insertions(+), 177 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
index 278664f831e7..947c8aa8e13e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
@@ -32,12 +32,13 @@ static void vma_clear_pages(struct i915_vma *vma)
 	vma->pages = NULL;
 }
 
-static int vma_bind(struct i915_address_space *vm,
-		    struct i915_vma *vma,
-		    enum i915_cache_level cache_level,
-		    u32 flags)
+static void vma_bind(struct i915_address_space *vm,
+		     struct i915_vm_pt_stash *stash,
+		     struct i915_vma *vma,
+		     enum i915_cache_level cache_level,
+		     u32 flags)
 {
-	return vm->vma_ops.bind_vma(vm, vma, cache_level, flags);
+	vm->vma_ops.bind_vma(vm, stash, vma, cache_level, flags);
 }
 
 static void vma_unbind(struct i915_address_space *vm, struct i915_vma *vma)
diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
index 05497b50103f..35e2b698f9ed 100644
--- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
@@ -177,16 +177,16 @@ static void gen6_flush_pd(struct gen6_ppgtt *ppgtt, u64 start, u64 end)
 	mutex_unlock(&ppgtt->flush);
 }
 
-static int gen6_alloc_va_range(struct i915_address_space *vm,
-			       u64 start, u64 length)
+static void gen6_alloc_va_range(struct i915_address_space *vm,
+				struct i915_vm_pt_stash *stash,
+				u64 start, u64 length)
 {
 	struct gen6_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
 	struct i915_page_directory * const pd = ppgtt->base.pd;
-	struct i915_page_table *pt, *alloc = NULL;
+	struct i915_page_table *pt;
 	intel_wakeref_t wakeref;
 	u64 from = start;
 	unsigned int pde;
-	int ret = 0;
 
 	wakeref = intel_runtime_pm_get(&vm->i915->runtime_pm);
 
@@ -197,21 +197,17 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 		if (px_base(pt) == px_base(&vm->scratch[1])) {
 			spin_unlock(&pd->lock);
 
-			pt = fetch_and_zero(&alloc);
-			if (!pt)
-				pt = alloc_pt(vm);
-			if (IS_ERR(pt)) {
-				ret = PTR_ERR(pt);
-				goto unwind_out;
-			}
+			pt = stash->pt[0];
+			GEM_BUG_ON(!pt);
 
 			fill32_px(pt, vm->scratch[0].encode);
 
 			spin_lock(&pd->lock);
 			if (pd->entry[pde] == &vm->scratch[1]) {
+				stash->pt[0] = pt->stash;
+				atomic_set(&pt->used, 0);
 				pd->entry[pde] = pt;
 			} else {
-				alloc = pt;
 				pt = pd->entry[pde];
 			}
 		}
@@ -223,15 +219,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 	if (i915_vma_is_bound(ppgtt->vma, I915_VMA_GLOBAL_BIND))
 		gen6_flush_pd(ppgtt, from, start);
 
-	goto out;
-
-unwind_out:
-	gen6_ppgtt_clear_range(vm, from, start - from);
-out:
-	if (alloc)
-		free_px(vm, alloc);
 	intel_runtime_pm_put(&vm->i915->runtime_pm, wakeref);
-	return ret;
 }
 
 static int gen6_ppgtt_init_scratch(struct gen6_ppgtt *ppgtt)
@@ -299,10 +287,11 @@ static void pd_vma_clear_pages(struct i915_vma *vma)
 	vma->pages = NULL;
 }
 
-static int pd_vma_bind(struct i915_address_space *vm,
-		       struct i915_vma *vma,
-		       enum i915_cache_level cache_level,
-		       u32 unused)
+static void pd_vma_bind(struct i915_address_space *vm,
+			struct i915_vm_pt_stash *stash,
+			struct i915_vma *vma,
+			enum i915_cache_level cache_level,
+			u32 unused)
 {
 	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
 	struct gen6_ppgtt *ppgtt = vma->private;
@@ -312,7 +301,6 @@ static int pd_vma_bind(struct i915_address_space *vm,
 	ppgtt->pd_addr = (gen6_pte_t __iomem *)ggtt->gsm + ggtt_offset;
 
 	gen6_flush_pd(ppgtt, 0, ppgtt->base.vm.total);
-	return 0;
 }
 
 static void pd_vma_unbind(struct i915_address_space *vm, struct i915_vma *vma)
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index 699125928272..e6f2acd445dd 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -269,14 +269,12 @@ static void gen8_ppgtt_clear(struct i915_address_space *vm,
 			   start, start + length, vm->top);
 }
 
-static int __gen8_ppgtt_alloc(struct i915_address_space * const vm,
-			      struct i915_page_directory * const pd,
-			      u64 * const start, const u64 end, int lvl)
+static void __gen8_ppgtt_alloc(struct i915_address_space * const vm,
+			       struct i915_vm_pt_stash *stash,
+			       struct i915_page_directory * const pd,
+			       u64 * const start, const u64 end, int lvl)
 {
-	const struct i915_page_scratch * const scratch = &vm->scratch[lvl];
-	struct i915_page_table *alloc = NULL;
 	unsigned int idx, len;
-	int ret = 0;
 
 	GEM_BUG_ON(end > vm->total >> GEN8_PTE_SHIFT);
 
@@ -297,49 +295,30 @@ static int __gen8_ppgtt_alloc(struct i915_address_space * const vm,
 			DBG("%s(%p):{ lvl:%d, idx:%d } allocating new tree\n",
 			    __func__, vm, lvl + 1, idx);
 
-			pt = fetch_and_zero(&alloc);
-			if (lvl) {
-				if (!pt) {
-					pt = &alloc_pd(vm)->pt;
-					if (IS_ERR(pt)) {
-						ret = PTR_ERR(pt);
-						goto out;
-					}
-				}
+			pt = stash->pt[!!lvl];
+			GEM_BUG_ON(!pt);
 
+			if (lvl ||
+			    gen8_pt_count(*start, end) < I915_PDES ||
+			    intel_vgpu_active(vm->i915))
 				fill_px(pt, vm->scratch[lvl].encode);
-			} else {
-				if (!pt) {
-					pt = alloc_pt(vm);
-					if (IS_ERR(pt)) {
-						ret = PTR_ERR(pt);
-						goto out;
-					}
-				}
-
-				if (intel_vgpu_active(vm->i915) ||
-				    gen8_pt_count(*start, end) < I915_PDES)
-					fill_px(pt, vm->scratch[lvl].encode);
-			}
 
 			spin_lock(&pd->lock);
-			if (likely(!pd->entry[idx]))
+			if (likely(!pd->entry[idx])) {
+				stash->pt[!!lvl] = pt->stash;
+				atomic_set(&pt->used, 0);
 				set_pd_entry(pd, idx, pt);
-			else
-				alloc = pt, pt = pd->entry[idx];
+			} else {
+				pt = pd->entry[idx];
+			}
 		}
 
 		if (lvl) {
 			atomic_inc(&pt->used);
 			spin_unlock(&pd->lock);
 
-			ret = __gen8_ppgtt_alloc(vm, as_pd(pt),
-						 start, end, lvl);
-			if (unlikely(ret)) {
-				if (release_pd_entry(pd, idx, pt, scratch))
-					free_px(vm, pt);
-				goto out;
-			}
+			__gen8_ppgtt_alloc(vm, stash,
+					   as_pd(pt), start, end, lvl);
 
 			spin_lock(&pd->lock);
 			atomic_dec(&pt->used);
@@ -359,18 +338,12 @@ static int __gen8_ppgtt_alloc(struct i915_address_space * const vm,
 		}
 	} while (idx++, --len);
 	spin_unlock(&pd->lock);
-out:
-	if (alloc)
-		free_px(vm, alloc);
-	return ret;
 }
 
-static int gen8_ppgtt_alloc(struct i915_address_space *vm,
-			    u64 start, u64 length)
+static void gen8_ppgtt_alloc(struct i915_address_space *vm,
+			     struct i915_vm_pt_stash *stash,
+			     u64 start, u64 length)
 {
-	u64 from;
-	int err;
-
 	GEM_BUG_ON(!IS_ALIGNED(start, BIT_ULL(GEN8_PTE_SHIFT)));
 	GEM_BUG_ON(!IS_ALIGNED(length, BIT_ULL(GEN8_PTE_SHIFT)));
 	GEM_BUG_ON(range_overflows(start, length, vm->total));
@@ -378,15 +351,9 @@ static int gen8_ppgtt_alloc(struct i915_address_space *vm,
 	start >>= GEN8_PTE_SHIFT;
 	length >>= GEN8_PTE_SHIFT;
 	GEM_BUG_ON(length == 0);
-	from = start;
-
-	err = __gen8_ppgtt_alloc(vm, i915_vm_to_ppgtt(vm)->pd,
-				 &start, start + length, vm->top);
-	if (unlikely(err && from != start))
-		__gen8_ppgtt_clear(vm, i915_vm_to_ppgtt(vm)->pd,
-				   from, start, vm->top);
 
-	return err;
+	__gen8_ppgtt_alloc(vm, stash, i915_vm_to_ppgtt(vm)->pd,
+			   &start, start + length, vm->top);
 }
 
 static __always_inline void
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 62979ea591f0..791e4070ef31 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -436,16 +436,17 @@ static void i915_ggtt_clear_range(struct i915_address_space *vm,
 	intel_gtt_clear_range(start >> PAGE_SHIFT, length >> PAGE_SHIFT);
 }
 
-static int ggtt_bind_vma(struct i915_address_space *vm,
-			 struct i915_vma *vma,
-			 enum i915_cache_level cache_level,
-			 u32 flags)
+static void ggtt_bind_vma(struct i915_address_space *vm,
+			  struct i915_vm_pt_stash *stash,
+			  struct i915_vma *vma,
+			  enum i915_cache_level cache_level,
+			  u32 flags)
 {
 	struct drm_i915_gem_object *obj = vma->obj;
 	u32 pte_flags;
 
 	if (i915_vma_is_bound(vma, ~flags & I915_VMA_BIND_MASK))
-		return 0;
+		return;
 
 	/* Applicable to VLV (gen8+ do not support RO in the GGTT) */
 	pte_flags = 0;
@@ -454,8 +455,6 @@ static int ggtt_bind_vma(struct i915_address_space *vm,
 
 	vm->insert_entries(vm, vma, cache_level, pte_flags);
 	vma->page_sizes.gtt = I915_GTT_PAGE_SIZE;
-
-	return 0;
 }
 
 static void ggtt_unbind_vma(struct i915_address_space *vm, struct i915_vma *vma)
@@ -568,31 +567,25 @@ static int init_ggtt(struct i915_ggtt *ggtt)
 	return ret;
 }
 
-static int aliasing_gtt_bind_vma(struct i915_address_space *vm,
-				 struct i915_vma *vma,
-				 enum i915_cache_level cache_level,
-				 u32 flags)
+static void aliasing_gtt_bind_vma(struct i915_address_space *vm,
+				  struct i915_vm_pt_stash *stash,
+				  struct i915_vma *vma,
+				  enum i915_cache_level cache_level,
+				  u32 flags)
 {
 	u32 pte_flags;
-	int ret;
 
 	/* Currently applicable only to VLV */
 	pte_flags = 0;
 	if (i915_gem_object_is_readonly(vma->obj))
 		pte_flags |= PTE_READ_ONLY;
 
-	if (flags & I915_VMA_LOCAL_BIND) {
-		struct i915_ppgtt *alias = i915_vm_to_ggtt(vm)->alias;
-
-		ret = ppgtt_bind_vma(&alias->vm, vma, cache_level, flags);
-		if (ret)
-			return ret;
-	}
+	if (flags & I915_VMA_LOCAL_BIND)
+		ppgtt_bind_vma(&i915_vm_to_ggtt(vm)->alias->vm,
+			       stash, vma, cache_level, flags);
 
 	if (flags & I915_VMA_GLOBAL_BIND)
 		vm->insert_entries(vm, vma, cache_level, pte_flags);
-
-	return 0;
 }
 
 static void aliasing_gtt_unbind_vma(struct i915_address_space *vm,
@@ -607,6 +600,7 @@ static void aliasing_gtt_unbind_vma(struct i915_address_space *vm,
 
 static int init_aliasing_ppgtt(struct i915_ggtt *ggtt)
 {
+	struct i915_vm_pt_stash stash = {};
 	struct i915_ppgtt *ppgtt;
 	int err;
 
@@ -619,15 +613,17 @@ static int init_aliasing_ppgtt(struct i915_ggtt *ggtt)
 		goto err_ppgtt;
 	}
 
+	err = i915_vm_alloc_pt_stash(&ppgtt->vm, &stash, ggtt->vm.total);
+	if (err)
+		goto err_ppgtt;
+
 	/*
 	 * Note we only pre-allocate as far as the end of the global
 	 * GTT. On 48b / 4-level page-tables, the difference is very,
 	 * very significant! We have to preallocate as GVT/vgpu does
 	 * not like the page directory disappearing.
 	 */
-	err = ppgtt->vm.allocate_va_range(&ppgtt->vm, 0, ggtt->vm.total);
-	if (err)
-		goto err_ppgtt;
+	ppgtt->vm.allocate_va_range(&ppgtt->vm, &stash, 0, ggtt->vm.total);
 
 	ggtt->alias = ppgtt;
 	ggtt->vm.bind_async_flags |= ppgtt->vm.bind_async_flags;
@@ -638,6 +634,7 @@ static int init_aliasing_ppgtt(struct i915_ggtt *ggtt)
 	GEM_BUG_ON(ggtt->vm.vma_ops.unbind_vma != ggtt_unbind_vma);
 	ggtt->vm.vma_ops.unbind_vma = aliasing_gtt_unbind_vma;
 
+	i915_vm_free_pt_stash(&ppgtt->vm, &stash);
 	return 0;
 
 err_ppgtt:
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index f2b75078e05f..8bd462d2fcd9 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -159,7 +159,10 @@ struct i915_page_scratch {
 
 struct i915_page_table {
 	struct i915_page_dma base;
-	atomic_t used;
+	union {
+		atomic_t used;
+		struct i915_page_table *stash;
+	};
 };
 
 struct i915_page_directory {
@@ -196,12 +199,18 @@ struct drm_i915_gem_object;
 struct i915_vma;
 struct intel_gt;
 
+struct i915_vm_pt_stash {
+	/* preallocated chains of page tables/directories */
+	struct i915_page_table *pt[2];
+};
+
 struct i915_vma_ops {
 	/* Map an object into an address space with the given cache flags. */
-	int (*bind_vma)(struct i915_address_space *vm,
-			struct i915_vma *vma,
-			enum i915_cache_level cache_level,
-			u32 flags);
+	void (*bind_vma)(struct i915_address_space *vm,
+			 struct i915_vm_pt_stash *stash,
+			 struct i915_vma *vma,
+			 enum i915_cache_level cache_level,
+			 u32 flags);
 	/*
 	 * Unmap an object from an address space. This usually consists of
 	 * setting the valid PTE entries to a reserved scratch page.
@@ -281,8 +290,9 @@ struct i915_address_space {
 			  u32 flags); /* Create a valid PTE */
 #define PTE_READ_ONLY	BIT(0)
 
-	int (*allocate_va_range)(struct i915_address_space *vm,
-				 u64 start, u64 length);
+	void (*allocate_va_range)(struct i915_address_space *vm,
+				  struct i915_vm_pt_stash *stash,
+				  u64 start, u64 length);
 	void (*clear_range)(struct i915_address_space *vm,
 			    u64 start, u64 length);
 	void (*insert_page)(struct i915_address_space *vm,
@@ -568,10 +578,11 @@ int ggtt_set_pages(struct i915_vma *vma);
 int ppgtt_set_pages(struct i915_vma *vma);
 void clear_pages(struct i915_vma *vma);
 
-int ppgtt_bind_vma(struct i915_address_space *vm,
-		   struct i915_vma *vma,
-		   enum i915_cache_level cache_level,
-		   u32 flags);
+void ppgtt_bind_vma(struct i915_address_space *vm,
+		    struct i915_vm_pt_stash *stash,
+		    struct i915_vma *vma,
+		    enum i915_cache_level cache_level,
+		    u32 flags);
 void ppgtt_unbind_vma(struct i915_address_space *vm,
 		      struct i915_vma *vma);
 
@@ -579,6 +590,12 @@ void gtt_write_workarounds(struct intel_gt *gt);
 
 void setup_private_pat(struct intel_uncore *uncore);
 
+int i915_vm_alloc_pt_stash(struct i915_address_space *vm,
+			   struct i915_vm_pt_stash *stash,
+			   u64 size);
+void i915_vm_free_pt_stash(struct i915_address_space *vm,
+			   struct i915_vm_pt_stash *stash);
+
 static inline struct sgt_dma {
 	struct scatterlist *sg;
 	dma_addr_t dma, max;
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index f0862e924d11..9633fd2d294d 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -155,19 +155,16 @@ struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt)
 	return ppgtt;
 }
 
-int ppgtt_bind_vma(struct i915_address_space *vm,
-		   struct i915_vma *vma,
-		   enum i915_cache_level cache_level,
-		   u32 flags)
+void ppgtt_bind_vma(struct i915_address_space *vm,
+		    struct i915_vm_pt_stash *stash,
+		    struct i915_vma *vma,
+		    enum i915_cache_level cache_level,
+		    u32 flags)
 {
 	u32 pte_flags;
-	int err;
 
 	if (!test_bit(I915_VMA_ALLOC_BIT, __i915_vma_flags(vma))) {
-		err = vm->allocate_va_range(vm, vma->node.start, vma->size);
-		if (err)
-			return err;
-
+		vm->allocate_va_range(vm, stash, vma->node.start, vma->size);
 		set_bit(I915_VMA_ALLOC_BIT, __i915_vma_flags(vma));
 	}
 
@@ -178,8 +175,6 @@ int ppgtt_bind_vma(struct i915_address_space *vm,
 
 	vm->insert_entries(vm, vma, cache_level, pte_flags);
 	wmb();
-
-	return 0;
 }
 
 void ppgtt_unbind_vma(struct i915_address_space *vm, struct i915_vma *vma)
@@ -188,12 +183,73 @@ void ppgtt_unbind_vma(struct i915_address_space *vm, struct i915_vma *vma)
 		vm->clear_range(vm, vma->node.start, vma->size);
 }
 
+static unsigned long pd_count(u64 size, int shift)
+{
+	/* Beware later misalignment */
+	return (size + 2 * (BIT_ULL(shift) - 1)) >> shift;
+}
+
+int i915_vm_alloc_pt_stash(struct i915_address_space *vm,
+			   struct i915_vm_pt_stash *stash,
+			   u64 size)
+{
+	unsigned long count;
+	int shift = 21;
+	int n;
+
+	count = pd_count(size, shift);
+	while (count--) {
+		struct i915_page_table *pt;
+
+		pt = alloc_pt(vm);
+		if (IS_ERR(pt)) {
+			i915_vm_free_pt_stash(vm, stash);
+			return PTR_ERR(pt);
+		}
+
+		pt->stash = stash->pt[0];
+		stash->pt[0] = pt;
+	}
+
+	for (n = 1; n < vm->top; n++) {
+		shift += 9;
+		count = pd_count(size, shift);
+		while (count--) {
+			struct i915_page_directory *pd;
+
+			pd = alloc_pd(vm);
+			if (IS_ERR(pd)) {
+				i915_vm_free_pt_stash(vm, stash);
+				return PTR_ERR(pd);
+			}
+
+			pd->pt.stash = stash->pt[1];
+			stash->pt[1] = &pd->pt;
+		}
+	}
+
+	return 0;
+}
+
+void i915_vm_free_pt_stash(struct i915_address_space *vm,
+			   struct i915_vm_pt_stash *stash)
+{
+	struct i915_page_table *pt;
+	int n;
+
+	for (n = 0; n < ARRAY_SIZE(stash->pt); n++) {
+		while ((pt = stash->pt[n])) {
+			stash->pt[n] = pt->stash;
+			free_px(vm, pt);
+		}
+	}
+}
+
 int ppgtt_set_pages(struct i915_vma *vma)
 {
 	GEM_BUG_ON(vma->pages);
 
 	vma->pages = vma->obj->mm.pages;
-
 	vma->page_sizes = vma->obj->mm.page_sizes;
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 627bac2e0252..fc8a083753bd 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -295,6 +295,8 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
 
 struct i915_vma_work {
 	struct dma_fence_work base;
+	struct i915_address_space *vm;
+	struct i915_vm_pt_stash stash;
 	struct i915_vma *vma;
 	struct drm_i915_gem_object *pinned;
 	struct i915_sw_dma_fence_cb cb;
@@ -306,13 +308,10 @@ static int __vma_bind(struct dma_fence_work *work)
 {
 	struct i915_vma_work *vw = container_of(work, typeof(*vw), base);
 	struct i915_vma *vma = vw->vma;
-	int err;
-
-	err = vma->ops->bind_vma(vma->vm, vma, vw->cache_level, vw->flags);
-	if (err)
-		atomic_or(I915_VMA_ERROR, &vma->flags);
 
-	return err;
+	vma->ops->bind_vma(vw->vm, &vw->stash,
+			   vma, vw->cache_level, vw->flags);
+	return 0;
 }
 
 static void __vma_release(struct dma_fence_work *work)
@@ -321,6 +320,9 @@ static void __vma_release(struct dma_fence_work *work)
 
 	if (vw->pinned)
 		__i915_gem_object_unpin_pages(vw->pinned);
+
+	i915_vm_free_pt_stash(vw->vm, &vw->stash);
+	i915_vm_put(vw->vm);
 }
 
 static const struct dma_fence_work_ops bind_ops = {
@@ -380,7 +382,6 @@ int i915_vma_bind(struct i915_vma *vma,
 {
 	u32 bind_flags;
 	u32 vma_flags;
-	int ret;
 
 	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
 	GEM_BUG_ON(vma->size > vma->node.size);
@@ -437,9 +438,7 @@ int i915_vma_bind(struct i915_vma *vma,
 			work->pinned = vma->obj;
 		}
 	} else {
-		ret = vma->ops->bind_vma(vma->vm, vma, cache_level, bind_flags);
-		if (ret)
-			return ret;
+		vma->ops->bind_vma(vma->vm, NULL, vma, cache_level, bind_flags);
 	}
 
 	atomic_or(bind_flags, &vma->flags);
@@ -878,11 +877,21 @@ int i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 		return err;
 
 	if (flags & vma->vm->bind_async_flags) {
+		u64 max_size;
+
 		work = i915_vma_work();
 		if (!work) {
 			err = -ENOMEM;
 			goto err_pages;
 		}
+
+		work->vm = i915_vm_get(vma->vm);
+
+		/* Allocate enough page directories to cover worst case */
+		max_size = max(size, vma->size);
+		if (flags & PIN_MAPPABLE)
+			max_size = max_t(u64, max_size, vma->fence_size);
+		i915_vm_alloc_pt_stash(vma->vm, &work->stash, max_size);
 	}
 
 	if (flags & PIN_GLOBAL)
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 0016ffc7d914..9b8fc990e9ef 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -172,35 +172,33 @@ static int igt_ppgtt_alloc(void *arg)
 
 	/* Check we can allocate the entire range */
 	for (size = 4096; size <= limit; size <<= 2) {
-		err = ppgtt->vm.allocate_va_range(&ppgtt->vm, 0, size);
-		if (err) {
-			if (err == -ENOMEM) {
-				pr_info("[1] Ran out of memory for va_range [0 + %llx] [bit %d]\n",
-					size, ilog2(size));
-				err = 0; /* virtual space too large! */
-			}
+		struct i915_vm_pt_stash stash = {};
+
+		err = i915_vm_alloc_pt_stash(&ppgtt->vm, &stash, size);
+		if (err)
 			goto err_ppgtt_cleanup;
-		}
 
+		ppgtt->vm.allocate_va_range(&ppgtt->vm, &stash, 0, size);
 		cond_resched();
 
 		ppgtt->vm.clear_range(&ppgtt->vm, 0, size);
+
+		i915_vm_free_pt_stash(&ppgtt->vm, &stash);
 	}
 
 	/* Check we can incrementally allocate the entire range */
 	for (last = 0, size = 4096; size <= limit; last = size, size <<= 2) {
-		err = ppgtt->vm.allocate_va_range(&ppgtt->vm,
-						  last, size - last);
-		if (err) {
-			if (err == -ENOMEM) {
-				pr_info("[2] Ran out of memory for va_range [%llx + %llx] [bit %d]\n",
-					last, size - last, ilog2(size));
-				err = 0; /* virtual space too large! */
-			}
+		struct i915_vm_pt_stash stash = {};
+
+		err = i915_vm_alloc_pt_stash(&ppgtt->vm, &stash, size - last);
+		if (err)
 			goto err_ppgtt_cleanup;
-		}
 
+		ppgtt->vm.allocate_va_range(&ppgtt->vm, &stash,
+					    last, size - last);
 		cond_resched();
+
+		i915_vm_free_pt_stash(&ppgtt->vm, &stash);
 	}
 
 err_ppgtt_cleanup:
@@ -284,9 +282,18 @@ static int lowlevel_hole(struct i915_address_space *vm,
 				break;
 			}
 
-			if (vm->allocate_va_range &&
-			    vm->allocate_va_range(vm, addr, BIT_ULL(size)))
-				break;
+			if (vm->allocate_va_range) {
+				struct i915_vm_pt_stash stash = {};
+
+				if (i915_vm_alloc_pt_stash(vm, &stash,
+							   BIT_ULL(size)))
+					break;
+
+				vm->allocate_va_range(vm, &stash,
+						      addr, BIT_ULL(size));
+
+				i915_vm_free_pt_stash(vm, &stash);
+			}
 
 			mock_vma->pages = obj->mm.pages;
 			mock_vma->node.size = BIT_ULL(size);
@@ -1881,6 +1888,7 @@ static int igt_cs_tlb(void *arg)
 			continue;
 
 		while (!__igt_timeout(end_time, NULL)) {
+			struct i915_vm_pt_stash stash = {};
 			struct i915_request *rq;
 			u64 offset;
 
@@ -1888,10 +1896,6 @@ static int igt_cs_tlb(void *arg)
 						   0, vm->total - PAGE_SIZE,
 						   chunk_size, PAGE_SIZE);
 
-			err = vm->allocate_va_range(vm, offset, chunk_size);
-			if (err)
-				goto end;
-
 			memset32(result, STACK_MAGIC, PAGE_SIZE / sizeof(u32));
 
 			vma = i915_vma_instance(bbe, vm, NULL);
@@ -1904,6 +1908,14 @@ static int igt_cs_tlb(void *arg)
 			if (err)
 				goto end;
 
+			err = i915_vm_alloc_pt_stash(vm, &stash, chunk_size);
+			if (err)
+				goto end;
+
+			vm->allocate_va_range(vm, &stash, offset, chunk_size);
+
+			i915_vm_free_pt_stash(vm, &stash);
+
 			/* Prime the TLB with the dummy pages */
 			for (i = 0; i < count; i++) {
 				vma->node.start = offset + i * PAGE_SIZE;
diff --git a/drivers/gpu/drm/i915/selftests/mock_gtt.c b/drivers/gpu/drm/i915/selftests/mock_gtt.c
index b173086411ef..5e4fb0fba34b 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gtt.c
@@ -38,14 +38,14 @@ static void mock_insert_entries(struct i915_address_space *vm,
 {
 }
 
-static int mock_bind_ppgtt(struct i915_address_space *vm,
-			   struct i915_vma *vma,
-			   enum i915_cache_level cache_level,
-			   u32 flags)
+static void mock_bind_ppgtt(struct i915_address_space *vm,
+			    struct i915_vm_pt_stash *stash,
+			    struct i915_vma *vma,
+			    enum i915_cache_level cache_level,
+			    u32 flags)
 {
 	GEM_BUG_ON(flags & I915_VMA_GLOBAL_BIND);
 	set_bit(I915_VMA_LOCAL_BIND_BIT, __i915_vma_flags(vma));
-	return 0;
 }
 
 static void mock_unbind_ppgtt(struct i915_address_space *vm,
@@ -74,6 +74,7 @@ struct i915_ppgtt *mock_ppgtt(struct drm_i915_private *i915, const char *name)
 	ppgtt->vm.i915 = i915;
 	ppgtt->vm.total = round_down(U64_MAX, PAGE_SIZE);
 	ppgtt->vm.file = ERR_PTR(-ENODEV);
+	ppgtt->vm.dma = &i915->drm.pdev->dev;
 
 	i915_address_space_init(&ppgtt->vm, VM_CLASS_PPGTT);
 
@@ -90,13 +91,12 @@ struct i915_ppgtt *mock_ppgtt(struct drm_i915_private *i915, const char *name)
 	return ppgtt;
 }
 
-static int mock_bind_ggtt(struct i915_address_space *vm,
-			  struct i915_vma *vma,
-			  enum i915_cache_level cache_level,
-			  u32 flags)
+static void mock_bind_ggtt(struct i915_address_space *vm,
+			   struct i915_vm_pt_stash *stash,
+			   struct i915_vma *vma,
+			   enum i915_cache_level cache_level,
+			   u32 flags)
 {
-	atomic_or(I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND, &vma->flags);
-	return 0;
 }
 
 static void mock_unbind_ggtt(struct i915_address_space *vm,
-- 
2.20.1


* [Intel-gfx] [PATCH 07/23] drm/i915: Switch to object allocations for page directories
  2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
                   ` (5 preceding siblings ...)
  (?)
@ 2020-07-02  8:32 ` Chris Wilson
  2020-07-03  8:44   ` Tvrtko Ursulin
  2020-07-03 16:36   ` Tvrtko Ursulin
  -1 siblings, 2 replies; 56+ messages in thread
From: Chris Wilson @ 2020-07-02  8:32 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

The GEM object is grossly overweight for the practicality of tracking
large numbers of individual pages, yet it is currently our only
abstraction for tracking DMA allocations. Since those allocations need
to be reserved upfront before an operation, and since we need to break
away from simple system memory, we need to ditch using plain struct page
wrappers.

In the process, we drop the WC mapping, as we ended up clflushing
everything anyway due to various issues across a wider range of
platforms. In a future step, though, we also need to drop the
kmap_atomic approach, which suggests we need to pre-map all the pages
and keep them mapped.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   1 +
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |   2 +-
 .../drm/i915/gem/selftests/i915_gem_context.c |   2 +-
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |  46 ++-
 drivers/gpu/drm/i915/gt/gen6_ppgtt.h          |   1 +
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  64 ++--
 drivers/gpu/drm/i915/gt/intel_ggtt.c          |  31 +-
 drivers/gpu/drm/i915/gt/intel_gtt.c           | 291 +++---------------
 drivers/gpu/drm/i915/gt/intel_gtt.h           |  92 ++----
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  25 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  16 +-
 drivers/gpu/drm/i915/gvt/scheduler.c          |  17 +-
 drivers/gpu/drm/i915/i915_drv.c               |   1 +
 drivers/gpu/drm/i915/i915_drv.h               |   5 -
 drivers/gpu/drm/i915/selftests/mock_gtt.c     |   2 +
 15 files changed, 183 insertions(+), 413 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 5335f799b548..d0847d7896f9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -282,6 +282,7 @@ struct drm_i915_gem_object {
 		} userptr;
 
 		unsigned long scratch;
+		u64 encode;
 
 		void *gvt_info;
 	};
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index 8291ede6902c..9fb06fcc8f8f 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -393,7 +393,7 @@ static int igt_mock_exhaust_device_supported_pages(void *arg)
 	 */
 
 	for (i = 1; i < BIT(ARRAY_SIZE(page_sizes)); i++) {
-		unsigned int combination = 0;
+		unsigned int combination = SZ_4K;
 
 		for (j = 0; j < ARRAY_SIZE(page_sizes); j++) {
 			if (i & BIT(j))
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index b81978890641..1308198543d8 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -1745,7 +1745,7 @@ static int check_scratch_page(struct i915_gem_context *ctx, u32 *out)
 	if (!vm)
 		return -ENODEV;
 
-	page = vm->scratch[0].base.page;
+	page = __px_page(vm->scratch[0]);
 	if (!page) {
 		pr_err("No scratch page!\n");
 		return -EINVAL;
diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
index 35e2b698f9ed..226e404c706d 100644
--- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
@@ -16,8 +16,10 @@ static inline void gen6_write_pde(const struct gen6_ppgtt *ppgtt,
 				  const unsigned int pde,
 				  const struct i915_page_table *pt)
 {
+	dma_addr_t addr = pt ? px_dma(pt) : px_dma(ppgtt->base.vm.scratch[1]);
+
 	/* Caller needs to make sure the write completes if necessary */
-	iowrite32(GEN6_PDE_ADDR_ENCODE(px_dma(pt)) | GEN6_PDE_VALID,
+	iowrite32(GEN6_PDE_ADDR_ENCODE(addr) | GEN6_PDE_VALID,
 		  ppgtt->pd_addr + pde);
 }
 
@@ -79,7 +81,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 {
 	struct gen6_ppgtt * const ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
 	const unsigned int first_entry = start / I915_GTT_PAGE_SIZE;
-	const gen6_pte_t scratch_pte = vm->scratch[0].encode;
+	const gen6_pte_t scratch_pte = vm->scratch[0]->encode;
 	unsigned int pde = first_entry / GEN6_PTES;
 	unsigned int pte = first_entry % GEN6_PTES;
 	unsigned int num_entries = length / I915_GTT_PAGE_SIZE;
@@ -90,8 +92,6 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 		const unsigned int count = min(num_entries, GEN6_PTES - pte);
 		gen6_pte_t *vaddr;
 
-		GEM_BUG_ON(px_base(pt) == px_base(&vm->scratch[1]));
-
 		num_entries -= count;
 
 		GEM_BUG_ON(count > atomic_read(&pt->used));
@@ -127,7 +127,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	struct sgt_dma iter = sgt_dma(vma);
 	gen6_pte_t *vaddr;
 
-	GEM_BUG_ON(pd->entry[act_pt] == &vm->scratch[1]);
+	GEM_BUG_ON(!pd->entry[act_pt]);
 
 	vaddr = kmap_atomic_px(i915_pt_entry(pd, act_pt));
 	do {
@@ -194,16 +194,16 @@ static void gen6_alloc_va_range(struct i915_address_space *vm,
 	gen6_for_each_pde(pt, pd, start, length, pde) {
 		const unsigned int count = gen6_pte_count(start, length);
 
-		if (px_base(pt) == px_base(&vm->scratch[1])) {
+		if (!pt) {
 			spin_unlock(&pd->lock);
 
 			pt = stash->pt[0];
 			GEM_BUG_ON(!pt);
 
-			fill32_px(pt, vm->scratch[0].encode);
+			fill32_px(pt, vm->scratch[0]->encode);
 
 			spin_lock(&pd->lock);
-			if (pd->entry[pde] == &vm->scratch[1]) {
+			if (!pd->entry[pde]) {
 				stash->pt[0] = pt->stash;
 				atomic_set(&pt->used, 0);
 				pd->entry[pde] = pt;
@@ -225,24 +225,21 @@ static void gen6_alloc_va_range(struct i915_address_space *vm,
 static int gen6_ppgtt_init_scratch(struct gen6_ppgtt *ppgtt)
 {
 	struct i915_address_space * const vm = &ppgtt->base.vm;
-	struct i915_page_directory * const pd = ppgtt->base.pd;
 	int ret;
 
-	ret = setup_scratch_page(vm, __GFP_HIGHMEM);
+	ret = setup_scratch_page(vm);
 	if (ret)
 		return ret;
 
-	vm->scratch[0].encode =
-		vm->pte_encode(px_dma(&vm->scratch[0]),
+	vm->scratch[0]->encode =
+		vm->pte_encode(px_dma(vm->scratch[0]),
 			       I915_CACHE_NONE, PTE_READ_ONLY);
 
-	if (unlikely(setup_page_dma(vm, px_base(&vm->scratch[1])))) {
-		cleanup_scratch_page(vm);
-		return -ENOMEM;
-	}
+	vm->scratch[1] = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
+	if (IS_ERR(vm->scratch[1]))
+		return PTR_ERR(vm->scratch[1]);
 
-	fill32_px(&vm->scratch[1], vm->scratch[0].encode);
-	memset_p(pd->entry, &vm->scratch[1], I915_PDES);
+	fill32_px(vm->scratch[1], vm->scratch[0]->encode);
 
 	return 0;
 }
@@ -250,13 +247,11 @@ static int gen6_ppgtt_init_scratch(struct gen6_ppgtt *ppgtt)
 static void gen6_ppgtt_free_pd(struct gen6_ppgtt *ppgtt)
 {
 	struct i915_page_directory * const pd = ppgtt->base.pd;
-	struct i915_page_dma * const scratch =
-		px_base(&ppgtt->base.vm.scratch[1]);
 	struct i915_page_table *pt;
 	u32 pde;
 
 	gen6_for_all_pdes(pt, pd, pde)
-		if (px_base(pt) != scratch)
+		if (pt)
 			free_px(&ppgtt->base.vm, pt);
 }
 
@@ -297,7 +292,7 @@ static void pd_vma_bind(struct i915_address_space *vm,
 	struct gen6_ppgtt *ppgtt = vma->private;
 	u32 ggtt_offset = i915_ggtt_offset(vma) / I915_GTT_PAGE_SIZE;
 
-	px_base(ppgtt->base.pd)->ggtt_offset = ggtt_offset * sizeof(gen6_pte_t);
+	ppgtt->pp_dir = ggtt_offset * sizeof(gen6_pte_t) << 10;
 	ppgtt->pd_addr = (gen6_pte_t __iomem *)ggtt->gsm + ggtt_offset;
 
 	gen6_flush_pd(ppgtt, 0, ppgtt->base.vm.total);
@@ -307,8 +302,6 @@ static void pd_vma_unbind(struct i915_address_space *vm, struct i915_vma *vma)
 {
 	struct gen6_ppgtt *ppgtt = vma->private;
 	struct i915_page_directory * const pd = ppgtt->base.pd;
-	struct i915_page_dma * const scratch =
-		px_base(&ppgtt->base.vm.scratch[1]);
 	struct i915_page_table *pt;
 	unsigned int pde;
 
@@ -317,11 +310,11 @@ static void pd_vma_unbind(struct i915_address_space *vm, struct i915_vma *vma)
 
 	/* Free all no longer used page tables */
 	gen6_for_all_pdes(pt, ppgtt->base.pd, pde) {
-		if (px_base(pt) == scratch || atomic_read(&pt->used))
+		if (!pt || atomic_read(&pt->used))
 			continue;
 
 		free_px(&ppgtt->base.vm, pt);
-		pd->entry[pde] = scratch;
+		pd->entry[pde] = NULL;
 	}
 
 	ppgtt->scan_for_unused_pt = false;
@@ -441,6 +434,7 @@ struct i915_ppgtt *gen6_ppgtt_create(struct intel_gt *gt)
 	ppgtt->base.vm.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.vm.cleanup = gen6_ppgtt_cleanup;
 
+	ppgtt->base.vm.alloc_pt_dma = alloc_pt_dma;
 	ppgtt->base.vm.pte_encode = ggtt->vm.pte_encode;
 
 	ppgtt->base.pd = __alloc_pd(sizeof(*ppgtt->base.pd));
diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.h b/drivers/gpu/drm/i915/gt/gen6_ppgtt.h
index 72e481806c96..7249672e5802 100644
--- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.h
+++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.h
@@ -14,6 +14,7 @@ struct gen6_ppgtt {
 	struct mutex flush;
 	struct i915_vma *vma;
 	gen6_pte_t __iomem *pd_addr;
+	u32 pp_dir;
 
 	atomic_t pin_count;
 	struct mutex pin_mutex;
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index e6f2acd445dd..d3f27beaac03 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -199,7 +199,7 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
 			      struct i915_page_directory * const pd,
 			      u64 start, const u64 end, int lvl)
 {
-	const struct i915_page_scratch * const scratch = &vm->scratch[lvl];
+	const struct drm_i915_gem_object * const scratch = vm->scratch[lvl];
 	unsigned int idx, len;
 
 	GEM_BUG_ON(end > vm->total >> GEN8_PTE_SHIFT);
@@ -239,7 +239,7 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
 
 			vaddr = kmap_atomic_px(pt);
 			memset64(vaddr + gen8_pd_index(start, 0),
-				 vm->scratch[0].encode,
+				 vm->scratch[0]->encode,
 				 count);
 			kunmap_atomic(vaddr);
 
@@ -301,7 +301,7 @@ static void __gen8_ppgtt_alloc(struct i915_address_space * const vm,
 			if (lvl ||
 			    gen8_pt_count(*start, end) < I915_PDES ||
 			    intel_vgpu_active(vm->i915))
-				fill_px(pt, vm->scratch[lvl].encode);
+				fill_px(pt, vm->scratch[lvl]->encode);
 
 			spin_lock(&pd->lock);
 			if (likely(!pd->entry[idx])) {
@@ -356,16 +356,6 @@ static void gen8_ppgtt_alloc(struct i915_address_space *vm,
 			   &start, start + length, vm->top);
 }
 
-static __always_inline void
-write_pte(gen8_pte_t *pte, const gen8_pte_t val)
-{
-	/* Magic delays? Or can we refine these to flush all in one pass? */
-	*pte = val;
-	wmb(); /* cpu to cache */
-	clflush(pte); /* cache to memory */
-	wmb(); /* visible to all */
-}
-
 static __always_inline u64
 gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
 		      struct i915_page_directory *pdp,
@@ -382,8 +372,7 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
 	vaddr = kmap_atomic_px(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
 	do {
 		GEM_BUG_ON(iter->sg->length < I915_GTT_PAGE_SIZE);
-		write_pte(&vaddr[gen8_pd_index(idx, 0)],
-			  pte_encode | iter->dma);
+		vaddr[gen8_pd_index(idx, 0)] = pte_encode | iter->dma;
 
 		iter->dma += I915_GTT_PAGE_SIZE;
 		if (iter->dma >= iter->max) {
@@ -406,10 +395,12 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
 				pd = pdp->entry[gen8_pd_index(idx, 2)];
 			}
 
+			clflush_cache_range(vaddr, PAGE_SIZE);
 			kunmap_atomic(vaddr);
 			vaddr = kmap_atomic_px(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
 		}
 	} while (1);
+	clflush_cache_range(vaddr, PAGE_SIZE);
 	kunmap_atomic(vaddr);
 
 	return idx;
@@ -465,7 +456,7 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
 
 		do {
 			GEM_BUG_ON(iter->sg->length < page_size);
-			write_pte(&vaddr[index++], encode | iter->dma);
+			vaddr[index++] = encode | iter->dma;
 
 			start += page_size;
 			iter->dma += page_size;
@@ -490,6 +481,7 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
 			}
 		} while (rem >= page_size && index < I915_PDES);
 
+		clflush_cache_range(vaddr, PAGE_SIZE);
 		kunmap_atomic(vaddr);
 
 		/*
@@ -521,7 +513,7 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
 			if (I915_SELFTEST_ONLY(vma->vm->scrub_64K)) {
 				u16 i;
 
-				encode = vma->vm->scratch[0].encode;
+				encode = vma->vm->scratch[0]->encode;
 				vaddr = kmap_atomic_px(i915_pt_entry(pd, maybe_64K));
 
 				for (i = 1; i < index; i += 16)
@@ -575,27 +567,31 @@ static int gen8_init_scratch(struct i915_address_space *vm)
 		GEM_BUG_ON(!clone->has_read_only);
 
 		vm->scratch_order = clone->scratch_order;
-		memcpy(vm->scratch, clone->scratch, sizeof(vm->scratch));
-		px_dma(&vm->scratch[0]) = 0; /* no xfer of ownership */
+		for (i = 0; i <= vm->top; i++)
+			vm->scratch[i] = i915_gem_object_get(clone->scratch[i]);
+
 		return 0;
 	}
 
-	ret = setup_scratch_page(vm, __GFP_HIGHMEM);
+	ret = setup_scratch_page(vm);
 	if (ret)
 		return ret;
 
-	vm->scratch[0].encode =
-		gen8_pte_encode(px_dma(&vm->scratch[0]),
+	vm->scratch[0]->encode =
+		gen8_pte_encode(px_dma(vm->scratch[0]),
 				I915_CACHE_LLC, vm->has_read_only);
 
 	for (i = 1; i <= vm->top; i++) {
-		if (unlikely(setup_page_dma(vm, px_base(&vm->scratch[i]))))
+		struct drm_i915_gem_object *obj;
+
+		obj = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
+		if (IS_ERR(obj))
 			goto free_scratch;
 
-		fill_px(&vm->scratch[i], vm->scratch[i - 1].encode);
-		vm->scratch[i].encode =
-			gen8_pde_encode(px_dma(&vm->scratch[i]),
-					I915_CACHE_LLC);
+		fill_px(obj, vm->scratch[i - 1]->encode);
+		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_LLC);
+
+		vm->scratch[i] = obj;
 	}
 
 	return 0;
@@ -621,7 +617,7 @@ static int gen8_preallocate_top_level_pdp(struct i915_ppgtt *ppgtt)
 		if (IS_ERR(pde))
 			return PTR_ERR(pde);
 
-		fill_px(pde, vm->scratch[1].encode);
+		fill_px(pde, vm->scratch[1]->encode);
 		set_pd_entry(pd, idx, pde);
 		atomic_inc(px_used(pde)); /* keep pinned */
 	}
@@ -642,12 +638,13 @@ gen8_alloc_top_pd(struct i915_address_space *vm)
 	if (unlikely(!pd))
 		return ERR_PTR(-ENOMEM);
 
-	if (unlikely(setup_page_dma(vm, px_base(pd)))) {
+	pd->pt.base = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
+	if (IS_ERR(pd->pt.base)) {
 		kfree(pd);
 		return ERR_PTR(-ENOMEM);
 	}
 
-	fill_page_dma(px_base(pd), vm->scratch[vm->top].encode, count);
+	fill_page_dma(px_base(pd), vm->scratch[vm->top]->encode, count);
 	atomic_inc(px_used(pd)); /* mark as pinned */
 	return pd;
 }
@@ -681,12 +678,7 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt)
 	 */
 	ppgtt->vm.has_read_only = !IS_GEN_RANGE(gt->i915, 11, 12);
 
-	/*
-	 * There are only few exceptions for gen >=6. chv and bxt.
-	 * And we are not sure about the latter so play safe for now.
-	 */
-	if (IS_CHERRYVIEW(gt->i915) || IS_BROXTON(gt->i915))
-		ppgtt->vm.pt_kmap_wc = true;
+	ppgtt->vm.alloc_pt_dma = alloc_pt_dma;
 
 	err = gen8_init_scratch(&ppgtt->vm);
 	if (err)
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 791e4070ef31..9db27a2e5f36 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -78,8 +78,6 @@ int i915_ggtt_init_hw(struct drm_i915_private *i915)
 {
 	int ret;
 
-	stash_init(&i915->mm.wc_stash);
-
 	/*
 	 * Note that we use page colouring to enforce a guard page at the
 	 * end of the address space. This is required as the CS may prefetch
@@ -232,7 +230,7 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
 
 	/* Fill the allocated but "unused" space beyond the end of the buffer */
 	while (gte < end)
-		gen8_set_pte(gte++, vm->scratch[0].encode);
+		gen8_set_pte(gte++, vm->scratch[0]->encode);
 
 	/*
 	 * We want to flush the TLBs only after we're certain all the PTE
@@ -283,7 +281,7 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
 
 	/* Fill the allocated but "unused" space beyond the end of the buffer */
 	while (gte < end)
-		iowrite32(vm->scratch[0].encode, gte++);
+		iowrite32(vm->scratch[0]->encode, gte++);
 
 	/*
 	 * We want to flush the TLBs only after we're certain all the PTE
@@ -303,7 +301,7 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
 	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
 	unsigned int first_entry = start / I915_GTT_PAGE_SIZE;
 	unsigned int num_entries = length / I915_GTT_PAGE_SIZE;
-	const gen8_pte_t scratch_pte = vm->scratch[0].encode;
+	const gen8_pte_t scratch_pte = vm->scratch[0]->encode;
 	gen8_pte_t __iomem *gtt_base =
 		(gen8_pte_t __iomem *)ggtt->gsm + first_entry;
 	const int max_entries = ggtt_total_entries(ggtt) - first_entry;
@@ -401,7 +399,7 @@ static void gen6_ggtt_clear_range(struct i915_address_space *vm,
 		 first_entry, num_entries, max_entries))
 		num_entries = max_entries;
 
-	scratch_pte = vm->scratch[0].encode;
+	scratch_pte = vm->scratch[0]->encode;
 	for (i = 0; i < num_entries; i++)
 		iowrite32(scratch_pte, &gtt_base[i]);
 }
@@ -712,18 +710,11 @@ static void ggtt_cleanup_hw(struct i915_ggtt *ggtt)
 void i915_ggtt_driver_release(struct drm_i915_private *i915)
 {
 	struct i915_ggtt *ggtt = &i915->ggtt;
-	struct pagevec *pvec;
 
 	fini_aliasing_ppgtt(ggtt);
 
 	intel_ggtt_fini_fences(ggtt);
 	ggtt_cleanup_hw(ggtt);
-
-	pvec = &i915->mm.wc_stash.pvec;
-	if (pvec->nr) {
-		set_pages_array_wb(pvec->pages, pvec->nr);
-		__pagevec_release(pvec);
-	}
 }
 
 static unsigned int gen6_get_total_gtt_size(u16 snb_gmch_ctl)
@@ -786,7 +777,7 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size)
 		return -ENOMEM;
 	}
 
-	ret = setup_scratch_page(&ggtt->vm, GFP_DMA32);
+	ret = setup_scratch_page(&ggtt->vm);
 	if (ret) {
 		drm_err(&i915->drm, "Scratch setup failed\n");
 		/* iounmap will also get called at remove, but meh */
@@ -794,8 +785,8 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size)
 		return ret;
 	}
 
-	ggtt->vm.scratch[0].encode =
-		ggtt->vm.pte_encode(px_dma(&ggtt->vm.scratch[0]),
+	ggtt->vm.scratch[0]->encode =
+		ggtt->vm.pte_encode(px_dma(ggtt->vm.scratch[0]),
 				    I915_CACHE_NONE, 0);
 
 	return 0;
@@ -821,7 +812,7 @@ static void gen6_gmch_remove(struct i915_address_space *vm)
 	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
 
 	iounmap(ggtt->gsm);
-	cleanup_scratch_page(vm);
+	free_scratch(vm);
 }
 
 static struct resource pci_resource(struct pci_dev *pdev, int bar)
@@ -849,6 +840,8 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
 	else
 		size = gen8_get_total_gtt_size(snb_gmch_ctl);
 
+	ggtt->vm.alloc_pt_dma = alloc_pt_dma;
+
 	ggtt->vm.total = (size / sizeof(gen8_pte_t)) * I915_GTT_PAGE_SIZE;
 	ggtt->vm.cleanup = gen6_gmch_remove;
 	ggtt->vm.insert_page = gen8_ggtt_insert_page;
@@ -997,6 +990,8 @@ static int gen6_gmch_probe(struct i915_ggtt *ggtt)
 	size = gen6_get_total_gtt_size(snb_gmch_ctl);
 	ggtt->vm.total = (size / sizeof(gen6_pte_t)) * I915_GTT_PAGE_SIZE;
 
+	ggtt->vm.alloc_pt_dma = alloc_pt_dma;
+
 	ggtt->vm.clear_range = nop_clear_range;
 	if (!HAS_FULL_PPGTT(i915) || intel_scanout_needs_vtd_wa(i915))
 		ggtt->vm.clear_range = gen6_ggtt_clear_range;
@@ -1047,6 +1042,8 @@ static int i915_gmch_probe(struct i915_ggtt *ggtt)
 	ggtt->gmadr =
 		(struct resource)DEFINE_RES_MEM(gmadr_base, ggtt->mappable_end);
 
+	ggtt->vm.alloc_pt_dma = alloc_pt_dma;
+
 	ggtt->do_idle_maps = needs_idle_maps(i915);
 	ggtt->vm.insert_page = i915_ggtt_insert_page;
 	ggtt->vm.insert_entries = i915_ggtt_insert_entries;
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 2a72cce63fd9..e0cc90942848 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -11,160 +11,24 @@
 #include "intel_gt.h"
 #include "intel_gtt.h"
 
-void stash_init(struct pagestash *stash)
+struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz)
 {
-	pagevec_init(&stash->pvec);
-	spin_lock_init(&stash->lock);
-}
-
-static struct page *stash_pop_page(struct pagestash *stash)
-{
-	struct page *page = NULL;
-
-	spin_lock(&stash->lock);
-	if (likely(stash->pvec.nr))
-		page = stash->pvec.pages[--stash->pvec.nr];
-	spin_unlock(&stash->lock);
-
-	return page;
-}
-
-static void stash_push_pagevec(struct pagestash *stash, struct pagevec *pvec)
-{
-	unsigned int nr;
-
-	spin_lock_nested(&stash->lock, SINGLE_DEPTH_NESTING);
-
-	nr = min_t(typeof(nr), pvec->nr, pagevec_space(&stash->pvec));
-	memcpy(stash->pvec.pages + stash->pvec.nr,
-	       pvec->pages + pvec->nr - nr,
-	       sizeof(pvec->pages[0]) * nr);
-	stash->pvec.nr += nr;
-
-	spin_unlock(&stash->lock);
-
-	pvec->nr -= nr;
-}
-
-static struct page *vm_alloc_page(struct i915_address_space *vm, gfp_t gfp)
-{
-	struct pagevec stack;
-	struct page *page;
-
-	if (I915_SELFTEST_ONLY(should_fail(&vm->fault_attr, 1)))
-		i915_gem_shrink_all(vm->i915);
-
-	page = stash_pop_page(&vm->free_pages);
-	if (page)
-		return page;
-
-	if (!vm->pt_kmap_wc)
-		return alloc_page(gfp);
-
-	/* Look in our global stash of WC pages... */
-	page = stash_pop_page(&vm->i915->mm.wc_stash);
-	if (page)
-		return page;
+	struct drm_i915_gem_object *obj;
+	int err;
 
-	/*
-	 * Otherwise batch allocate pages to amortize cost of set_pages_wc.
-	 *
-	 * We have to be careful as page allocation may trigger the shrinker
-	 * (via direct reclaim) which will fill up the WC stash underneath us.
-	 * So we add our WB pages into a temporary pvec on the stack and merge
-	 * them into the WC stash after all the allocations are complete.
-	 */
-	pagevec_init(&stack);
-	do {
-		struct page *page;
-
-		page = alloc_page(gfp);
-		if (unlikely(!page))
-			break;
-
-		stack.pages[stack.nr++] = page;
-	} while (pagevec_space(&stack));
-
-	if (stack.nr && !set_pages_array_wc(stack.pages, stack.nr)) {
-		page = stack.pages[--stack.nr];
-
-		/* Merge spare WC pages to the global stash */
-		if (stack.nr)
-			stash_push_pagevec(&vm->i915->mm.wc_stash, &stack);
-
-		/* Push any surplus WC pages onto the local VM stash */
-		if (stack.nr)
-			stash_push_pagevec(&vm->free_pages, &stack);
-	}
-
-	/* Return unwanted leftovers */
-	if (unlikely(stack.nr)) {
-		WARN_ON_ONCE(set_pages_array_wb(stack.pages, stack.nr));
-		__pagevec_release(&stack);
-	}
-
-	return page;
-}
-
-static void vm_free_pages_release(struct i915_address_space *vm,
-				  bool immediate)
-{
-	struct pagevec *pvec = &vm->free_pages.pvec;
-	struct pagevec stack;
-
-	lockdep_assert_held(&vm->free_pages.lock);
-	GEM_BUG_ON(!pagevec_count(pvec));
-
-	if (vm->pt_kmap_wc) {
-		/*
-		 * When we use WC, first fill up the global stash and then
-		 * only if full immediately free the overflow.
-		 */
-		stash_push_pagevec(&vm->i915->mm.wc_stash, pvec);
+	obj = i915_gem_object_create_internal(vm->i915, sz);
+	if (IS_ERR(obj))
+		return obj;
 
-		/*
-		 * As we have made some room in the VM's free_pages,
-		 * we can wait for it to fill again. Unless we are
-		 * inside i915_address_space_fini() and must
-		 * immediately release the pages!
-		 */
-		if (pvec->nr <= (immediate ? 0 : PAGEVEC_SIZE - 1))
-			return;
-
-		/*
-		 * We have to drop the lock to allow ourselves to sleep,
-		 * so take a copy of the pvec and clear the stash for
-		 * others to use it as we sleep.
-		 */
-		stack = *pvec;
-		pagevec_reinit(pvec);
-		spin_unlock(&vm->free_pages.lock);
-
-		pvec = &stack;
-		set_pages_array_wb(pvec->pages, pvec->nr);
-
-		spin_lock(&vm->free_pages.lock);
+	err = i915_gem_object_pin_pages(obj);
+	if (err) {
+		i915_gem_object_put(obj);
+		return ERR_PTR(err);
 	}
 
-	__pagevec_release(pvec);
-}
+	i915_gem_object_make_unshrinkable(obj);
 
-static void vm_free_page(struct i915_address_space *vm, struct page *page)
-{
-	/*
-	 * On !llc, we need to change the pages back to WB. We only do so
-	 * in bulk, so we rarely need to change the page attributes here,
-	 * but doing so requires a stop_machine() from deep inside arch/x86/mm.
-	 * To make detection of the possible sleep more likely, use an
-	 * unconditional might_sleep() for everybody.
-	 */
-	might_sleep();
-	spin_lock(&vm->free_pages.lock);
-	while (!pagevec_space(&vm->free_pages.pvec))
-		vm_free_pages_release(vm, false);
-	GEM_BUG_ON(pagevec_count(&vm->free_pages.pvec) >= PAGEVEC_SIZE);
-	pagevec_add(&vm->free_pages.pvec, page);
-	spin_unlock(&vm->free_pages.lock);
+	return obj;
 }
 
 void __i915_vm_close(struct i915_address_space *vm)
@@ -194,14 +58,7 @@ void __i915_vm_close(struct i915_address_space *vm)
 
 void i915_address_space_fini(struct i915_address_space *vm)
 {
-	spin_lock(&vm->free_pages.lock);
-	if (pagevec_count(&vm->free_pages.pvec))
-		vm_free_pages_release(vm, true);
-	GEM_BUG_ON(pagevec_count(&vm->free_pages.pvec));
-	spin_unlock(&vm->free_pages.lock);
-
 	drm_mm_takedown(&vm->mm);
-
 	mutex_destroy(&vm->mutex);
 }
 
@@ -246,8 +103,6 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 	drm_mm_init(&vm->mm, 0, vm->total);
 	vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
 
-	stash_init(&vm->free_pages);
-
 	INIT_LIST_HEAD(&vm->bound_list);
 }
 
@@ -264,64 +119,47 @@ void clear_pages(struct i915_vma *vma)
 	memset(&vma->page_sizes, 0, sizeof(vma->page_sizes));
 }
 
-static int __setup_page_dma(struct i915_address_space *vm,
-			    struct i915_page_dma *p,
-			    gfp_t gfp)
-{
-	p->page = vm_alloc_page(vm, gfp | I915_GFP_ALLOW_FAIL);
-	if (unlikely(!p->page))
-		return -ENOMEM;
-
-	p->daddr = dma_map_page_attrs(vm->dma,
-				      p->page, 0, PAGE_SIZE,
-				      PCI_DMA_BIDIRECTIONAL,
-				      DMA_ATTR_SKIP_CPU_SYNC |
-				      DMA_ATTR_NO_WARN);
-	if (unlikely(dma_mapping_error(vm->dma, p->daddr))) {
-		vm_free_page(vm, p->page);
-		return -ENOMEM;
-	}
-
-	return 0;
-}
-
-int setup_page_dma(struct i915_address_space *vm, struct i915_page_dma *p)
+dma_addr_t __px_dma(struct drm_i915_gem_object *p)
 {
-	return __setup_page_dma(vm, p, __GFP_HIGHMEM);
+	GEM_BUG_ON(!i915_gem_object_has_pages(p));
+	return sg_dma_address(p->mm.pages->sgl);
 }
 
-void cleanup_page_dma(struct i915_address_space *vm, struct i915_page_dma *p)
+struct page *__px_page(struct drm_i915_gem_object *p)
 {
-	dma_unmap_page(vm->dma, p->daddr, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-	vm_free_page(vm, p->page);
+	GEM_BUG_ON(!i915_gem_object_has_pages(p));
+	return sg_page(p->mm.pages->sgl);
 }
 
 void
-fill_page_dma(const struct i915_page_dma *p, const u64 val, unsigned int count)
+fill_page_dma(struct drm_i915_gem_object *p, const u64 val, unsigned int count)
 {
-	kunmap_atomic(memset64(kmap_atomic(p->page), val, count));
+	struct page *page = __px_page(p);
+	void *vaddr;
+
+	vaddr = kmap(page);
+	memset64(vaddr, val, count);
+	kunmap(page);
 }
 
-static void poison_scratch_page(struct page *page, unsigned long size)
+static void poison_scratch_page(struct drm_i915_gem_object *scratch)
 {
+	struct sgt_iter sgt;
+	struct page *page;
+
 	if (!IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
 		return;
 
-	GEM_BUG_ON(!IS_ALIGNED(size, PAGE_SIZE));
-
-	do {
+	for_each_sgt_page(page, sgt, scratch->mm.pages) {
 		void *vaddr;
 
 		vaddr = kmap(page);
 		memset(vaddr, POISON_FREE, PAGE_SIZE);
 		kunmap(page);
-
-		page = pfn_to_page(page_to_pfn(page) + 1);
-		size -= PAGE_SIZE;
-	} while (size);
+	}
 }
 
-int setup_scratch_page(struct i915_address_space *vm, gfp_t gfp)
+int setup_scratch_page(struct i915_address_space *vm)
 {
 	unsigned long size;
 
@@ -338,21 +176,19 @@ int setup_scratch_page(struct i915_address_space *vm, gfp_t gfp)
 	 */
 	size = I915_GTT_PAGE_SIZE_4K;
 	if (i915_vm_is_4lvl(vm) &&
-	    HAS_PAGE_SIZES(vm->i915, I915_GTT_PAGE_SIZE_64K)) {
+	    HAS_PAGE_SIZES(vm->i915, I915_GTT_PAGE_SIZE_64K))
 		size = I915_GTT_PAGE_SIZE_64K;
-		gfp |= __GFP_NOWARN;
-	}
-	gfp |= __GFP_ZERO | __GFP_RETRY_MAYFAIL;
 
 	do {
-		unsigned int order = get_order(size);
-		struct page *page;
-		dma_addr_t addr;
+		struct drm_i915_gem_object *obj;
 
-		page = alloc_pages(gfp, order);
-		if (unlikely(!page))
+		obj = vm->alloc_pt_dma(vm, size);
+		if (IS_ERR(obj))
 			goto skip;
 
+		if (obj->mm.page_sizes.sg < size)
+			goto skip_obj;
+
 		/*
 		 * Use a non-zero scratch page for debugging.
 		 *
@@ -362,61 +198,28 @@ int setup_scratch_page(struct i915_address_space *vm, gfp_t gfp)
 		 * should it ever be accidentally used, the effect should be
 		 * fairly benign.
 		 */
-		poison_scratch_page(page, size);
-
-		addr = dma_map_page_attrs(vm->dma,
-					  page, 0, size,
-					  PCI_DMA_BIDIRECTIONAL,
-					  DMA_ATTR_SKIP_CPU_SYNC |
-					  DMA_ATTR_NO_WARN);
-		if (unlikely(dma_mapping_error(vm->dma, addr)))
-			goto free_page;
-
-		if (unlikely(!IS_ALIGNED(addr, size)))
-			goto unmap_page;
-
-		vm->scratch[0].base.page = page;
-		vm->scratch[0].base.daddr = addr;
-		vm->scratch_order = order;
+		poison_scratch_page(obj);
+
+		vm->scratch[0] = obj;
+		vm->scratch_order = get_order(size);
 		return 0;
 
-unmap_page:
-		dma_unmap_page(vm->dma, addr, size, PCI_DMA_BIDIRECTIONAL);
-free_page:
-		__free_pages(page, order);
+skip_obj:
+		i915_gem_object_put(obj);
 skip:
 		if (size == I915_GTT_PAGE_SIZE_4K)
 			return -ENOMEM;
 
 		size = I915_GTT_PAGE_SIZE_4K;
-		gfp &= ~__GFP_NOWARN;
 	} while (1);
 }
 
-void cleanup_scratch_page(struct i915_address_space *vm)
-{
-	struct i915_page_dma *p = px_base(&vm->scratch[0]);
-	unsigned int order = vm->scratch_order;
-
-	dma_unmap_page(vm->dma, p->daddr, BIT(order) << PAGE_SHIFT,
-		       PCI_DMA_BIDIRECTIONAL);
-	__free_pages(p->page, order);
-}
-
 void free_scratch(struct i915_address_space *vm)
 {
 	int i;
 
-	if (!px_dma(&vm->scratch[0])) /* set to 0 on clones */
-		return;
-
-	for (i = 1; i <= vm->top; i++) {
-		if (!px_dma(&vm->scratch[i]))
-			break;
-		cleanup_page_dma(vm, px_base(&vm->scratch[i]));
-	}
-
-	cleanup_scratch_page(vm);
+	for (i = 0; i <= vm->top; i++)
+		i915_gem_object_put(vm->scratch[i]);
 }
 
 void gtt_write_workarounds(struct intel_gt *gt)
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 8bd462d2fcd9..57b31b36285f 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -134,31 +134,19 @@ typedef u64 gen8_pte_t;
 #define GEN8_PDE_IPS_64K BIT(11)
 #define GEN8_PDE_PS_2M   BIT(7)
 
+enum i915_cache_level;
+
+struct drm_i915_file_private;
+struct drm_i915_gem_object;
 struct i915_fence_reg;
+struct i915_vma;
+struct intel_gt;
 
 #define for_each_sgt_daddr(__dp, __iter, __sgt) \
 	__for_each_sgt_daddr(__dp, __iter, __sgt, I915_GTT_PAGE_SIZE)
 
-struct i915_page_dma {
-	struct page *page;
-	union {
-		dma_addr_t daddr;
-
-		/*
-		 * For gen6/gen7 only. This is the offset in the GGTT
-		 * where the page directory entries for PPGTT begin
-		 */
-		u32 ggtt_offset;
-	};
-};
-
-struct i915_page_scratch {
-	struct i915_page_dma base;
-	u64 encode;
-};
-
 struct i915_page_table {
-	struct i915_page_dma base;
+	struct drm_i915_gem_object *base;
 	union {
 		atomic_t used;
 		struct i915_page_table *stash;
@@ -179,12 +167,14 @@ struct i915_page_directory {
 	other)
 
 #define px_base(px) \
-	__px_choose_expr(px, struct i915_page_dma *, __x, \
-	__px_choose_expr(px, struct i915_page_scratch *, &__x->base, \
-	__px_choose_expr(px, struct i915_page_table *, &__x->base, \
-	__px_choose_expr(px, struct i915_page_directory *, &__x->pt.base, \
-	(void)0))))
-#define px_dma(px) (px_base(px)->daddr)
+	__px_choose_expr(px, struct drm_i915_gem_object *, __x, \
+	__px_choose_expr(px, struct i915_page_table *, __x->base, \
+	__px_choose_expr(px, struct i915_page_directory *, __x->pt.base, \
+	(void)0)))
+
+struct page *__px_page(struct drm_i915_gem_object *p);
+dma_addr_t __px_dma(struct drm_i915_gem_object *p);
+#define px_dma(px) (__px_dma(px_base(px)))
 
 #define px_pt(px) \
 	__px_choose_expr(px, struct i915_page_table *, __x, \
@@ -192,13 +182,6 @@ struct i915_page_directory {
 	(void)0))
 #define px_used(px) (&px_pt(px)->used)
 
-enum i915_cache_level;
-
-struct drm_i915_file_private;
-struct drm_i915_gem_object;
-struct i915_vma;
-struct intel_gt;
-
 struct i915_vm_pt_stash {
 	/* preallocated chains of page tables/directories */
 	struct i915_page_table *pt[2];
@@ -222,13 +205,6 @@ struct i915_vma_ops {
 	void (*clear_pages)(struct i915_vma *vma);
 };
 
-struct pagestash {
-	spinlock_t lock;
-	struct pagevec pvec;
-};
-
-void stash_init(struct pagestash *stash);
-
 struct i915_address_space {
 	struct kref ref;
 	struct rcu_work rcu;
@@ -265,7 +241,7 @@ struct i915_address_space {
 #define VM_CLASS_GGTT 0
 #define VM_CLASS_PPGTT 1
 
-	struct i915_page_scratch scratch[4];
+	struct drm_i915_gem_object *scratch[4];
 	unsigned int scratch_order;
 	unsigned int top;
 
@@ -274,17 +250,15 @@ struct i915_address_space {
 	 */
 	struct list_head bound_list;
 
-	struct pagestash free_pages;
-
 	/* Global GTT */
 	bool is_ggtt:1;
 
-	/* Some systems require uncached updates of the page directories */
-	bool pt_kmap_wc:1;
-
 	/* Some systems support read-only mappings for GGTT and/or PPGTT */
 	bool has_read_only:1;
 
+	struct drm_i915_gem_object *
+		(*alloc_pt_dma)(struct i915_address_space *vm, int sz);
+
 	u64 (*pte_encode)(dma_addr_t addr,
 			  enum i915_cache_level level,
 			  u32 flags); /* Create a valid PTE */
@@ -500,9 +474,9 @@ i915_pd_entry(const struct i915_page_directory * const pdp,
 static inline dma_addr_t
 i915_page_dir_dma_addr(const struct i915_ppgtt *ppgtt, const unsigned int n)
 {
-	struct i915_page_dma *pt = ppgtt->pd->entry[n];
+	struct i915_page_table *pt = ppgtt->pd->entry[n];
 
-	return px_dma(pt ?: px_base(&ppgtt->vm.scratch[ppgtt->vm.top]));
+	return __px_dma(pt ? px_base(pt) : ppgtt->vm.scratch[ppgtt->vm.top]);
 }
 
 void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt);
@@ -527,13 +501,10 @@ struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt);
 void i915_ggtt_suspend(struct i915_ggtt *gtt);
 void i915_ggtt_resume(struct i915_ggtt *ggtt);
 
-int setup_page_dma(struct i915_address_space *vm, struct i915_page_dma *p);
-void cleanup_page_dma(struct i915_address_space *vm, struct i915_page_dma *p);
-
-#define kmap_atomic_px(px) kmap_atomic(px_base(px)->page)
+#define kmap_atomic_px(px) kmap_atomic(__px_page(px_base(px)))
 
 void
-fill_page_dma(const struct i915_page_dma *p, const u64 val, unsigned int count);
+fill_page_dma(struct drm_i915_gem_object *p, const u64 val, unsigned int count);
 
 #define fill_px(px, v) fill_page_dma(px_base(px), (v), PAGE_SIZE / sizeof(u64))
 #define fill32_px(px, v) do {						\
@@ -541,37 +512,36 @@ fill_page_dma(const struct i915_page_dma *p, const u64 val, unsigned int count);
 	fill_px((px), v__ << 32 | v__);					\
 } while (0)
 
-int setup_scratch_page(struct i915_address_space *vm, gfp_t gfp);
-void cleanup_scratch_page(struct i915_address_space *vm);
+int setup_scratch_page(struct i915_address_space *vm);
 void free_scratch(struct i915_address_space *vm);
 
+struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz);
 struct i915_page_table *alloc_pt(struct i915_address_space *vm);
 struct i915_page_directory *alloc_pd(struct i915_address_space *vm);
 struct i915_page_directory *__alloc_pd(size_t sz);
 
-void free_pd(struct i915_address_space *vm, struct i915_page_dma *pd);
-
-#define free_px(vm, px) free_pd(vm, px_base(px))
+void free_pt(struct i915_address_space *vm, struct i915_page_table *pt);
+#define free_px(vm, px) free_pt(vm, px_pt(px))
 
 void
 __set_pd_entry(struct i915_page_directory * const pd,
 	       const unsigned short idx,
-	       struct i915_page_dma * const to,
+	       struct i915_page_table *pt,
 	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level));
 
 #define set_pd_entry(pd, idx, to) \
-	__set_pd_entry((pd), (idx), px_base(to), gen8_pde_encode)
+	__set_pd_entry((pd), (idx), px_pt(to), gen8_pde_encode)
 
 void
 clear_pd_entry(struct i915_page_directory * const pd,
 	       const unsigned short idx,
-	       const struct i915_page_scratch * const scratch);
+	       const struct drm_i915_gem_object * const scratch);
 
 bool
 release_pd_entry(struct i915_page_directory * const pd,
 		 const unsigned short idx,
 		 struct i915_page_table * const pt,
-		 const struct i915_page_scratch * const scratch);
+		 const struct drm_i915_gem_object * const scratch);
 void gen6_ggtt_invalidate(struct i915_ggtt *ggtt);
 
 int ggtt_set_pages(struct i915_vma *vma);
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index 9633fd2d294d..94bd969ebffd 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -18,7 +18,8 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm)
 	if (unlikely(!pt))
 		return ERR_PTR(-ENOMEM);
 
-	if (unlikely(setup_page_dma(vm, &pt->base))) {
+	pt->base = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
+	if (IS_ERR(pt->base)) {
 		kfree(pt);
 		return ERR_PTR(-ENOMEM);
 	}
@@ -47,7 +48,8 @@ struct i915_page_directory *alloc_pd(struct i915_address_space *vm)
 	if (unlikely(!pd))
 		return ERR_PTR(-ENOMEM);
 
-	if (unlikely(setup_page_dma(vm, px_base(pd)))) {
+	pd->pt.base = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
+	if (IS_ERR(pd->pt.base)) {
 		kfree(pd);
 		return ERR_PTR(-ENOMEM);
 	}
@@ -55,27 +57,28 @@ struct i915_page_directory *alloc_pd(struct i915_address_space *vm)
 	return pd;
 }
 
-void free_pd(struct i915_address_space *vm, struct i915_page_dma *pd)
+void free_pt(struct i915_address_space *vm, struct i915_page_table *pt)
 {
-	cleanup_page_dma(vm, pd);
-	kfree(pd);
+	i915_gem_object_put(pt->base);
+	kfree(pt);
 }
 
 static inline void
-write_dma_entry(struct i915_page_dma * const pdma,
+write_dma_entry(struct drm_i915_gem_object * const pdma,
 		const unsigned short idx,
 		const u64 encoded_entry)
 {
-	u64 * const vaddr = kmap_atomic(pdma->page);
+	u64 * const vaddr = kmap_atomic(__px_page(pdma));
 
 	vaddr[idx] = encoded_entry;
+	clflush_cache_range(&vaddr[idx], sizeof(u64));
 	kunmap_atomic(vaddr);
 }
 
 void
 __set_pd_entry(struct i915_page_directory * const pd,
 	       const unsigned short idx,
-	       struct i915_page_dma * const to,
+	       struct i915_page_table * const to,
 	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level))
 {
 	/* Each thread pre-pins the pd, and we may have a thread per pde. */
@@ -83,13 +86,13 @@ __set_pd_entry(struct i915_page_directory * const pd,
 
 	atomic_inc(px_used(pd));
 	pd->entry[idx] = to;
-	write_dma_entry(px_base(pd), idx, encode(to->daddr, I915_CACHE_LLC));
+	write_dma_entry(px_base(pd), idx, encode(px_dma(to), I915_CACHE_LLC));
 }
 
 void
 clear_pd_entry(struct i915_page_directory * const pd,
 	       const unsigned short idx,
-	       const struct i915_page_scratch * const scratch)
+	       const struct drm_i915_gem_object * const scratch)
 {
 	GEM_BUG_ON(atomic_read(px_used(pd)) == 0);
 
@@ -102,7 +105,7 @@ bool
 release_pd_entry(struct i915_page_directory * const pd,
 		 const unsigned short idx,
 		 struct i915_page_table * const pt,
-		 const struct i915_page_scratch * const scratch)
+		 const struct drm_i915_gem_object * const scratch)
 {
 	bool free = false;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 68a08486fc87..f1f27b7fc746 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -201,16 +201,18 @@ static struct i915_address_space *vm_alias(struct i915_address_space *vm)
 	return vm;
 }
 
+static u32 pp_dir(struct i915_address_space *vm)
+{
+	return to_gen6_ppgtt(i915_vm_to_ppgtt(vm))->pp_dir;
+}
+
 static void set_pp_dir(struct intel_engine_cs *engine)
 {
 	struct i915_address_space *vm = vm_alias(engine->gt->vm);
 
 	if (vm) {
-		struct i915_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
-
 		ENGINE_WRITE(engine, RING_PP_DIR_DCLV, PP_DIR_DCLV_2G);
-		ENGINE_WRITE(engine, RING_PP_DIR_BASE,
-			     px_base(ppgtt->pd)->ggtt_offset << 10);
+		ENGINE_WRITE(engine, RING_PP_DIR_BASE, pp_dir(vm));
 	}
 }
 
@@ -608,7 +610,7 @@ static const struct intel_context_ops ring_context_ops = {
 };
 
 static int load_pd_dir(struct i915_request *rq,
-		       const struct i915_ppgtt *ppgtt,
+		       struct i915_address_space *vm,
 		       u32 valid)
 {
 	const struct intel_engine_cs * const engine = rq->engine;
@@ -624,7 +626,7 @@ static int load_pd_dir(struct i915_request *rq,
 
 	*cs++ = MI_LOAD_REGISTER_IMM(1);
 	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine->mmio_base));
-	*cs++ = px_base(ppgtt->pd)->ggtt_offset << 10;
+	*cs++ = pp_dir(vm);
 
 	/* Stall until the page table load is complete? */
 	*cs++ = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT;
@@ -826,7 +828,7 @@ static int switch_mm(struct i915_request *rq, struct i915_address_space *vm)
 	 * post-sync op, this extra pass appears vital before a
 	 * mm switch!
 	 */
-	ret = load_pd_dir(rq, i915_vm_to_ppgtt(vm), PP_DIR_DCLV_2G);
+	ret = load_pd_dir(rq, vm, PP_DIR_DCLV_2G);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
index 3c3b9842bbbd..1570eb8aa978 100644
--- a/drivers/gpu/drm/i915/gvt/scheduler.c
+++ b/drivers/gpu/drm/i915/gvt/scheduler.c
@@ -403,6 +403,14 @@ static void release_shadow_wa_ctx(struct intel_shadow_wa_ctx *wa_ctx)
 	wa_ctx->indirect_ctx.shadow_va = NULL;
 }
 
+static void set_dma_address(struct i915_page_directory *pd, dma_addr_t addr)
+{
+	struct scatterlist *sg = pd->pt.base->mm.pages->sgl;
+
+	/* This is not a good idea */
+	sg->dma_address = addr;
+}
+
 static void set_context_ppgtt_from_shadow(struct intel_vgpu_workload *workload,
 					  struct intel_context *ce)
 {
@@ -411,7 +419,7 @@ static void set_context_ppgtt_from_shadow(struct intel_vgpu_workload *workload,
 	int i = 0;
 
 	if (mm->ppgtt_mm.root_entry_type == GTT_TYPE_PPGTT_ROOT_L4_ENTRY) {
-		px_dma(ppgtt->pd) = mm->ppgtt_mm.shadow_pdps[0];
+		set_dma_address(ppgtt->pd, mm->ppgtt_mm.shadow_pdps[0]);
 	} else {
 		for (i = 0; i < GVT_RING_CTX_NR_PDPS; i++) {
 			struct i915_page_directory * const pd =
@@ -421,7 +429,8 @@ static void set_context_ppgtt_from_shadow(struct intel_vgpu_workload *workload,
 			   shadow ppgtt. */
 			if (!pd)
 				break;
-			px_dma(pd) = mm->ppgtt_mm.shadow_pdps[i];
+
+			set_dma_address(pd, mm->ppgtt_mm.shadow_pdps[i]);
 		}
 	}
 }
@@ -1240,13 +1249,13 @@ i915_context_ppgtt_root_restore(struct intel_vgpu_submission *s,
 	int i;
 
 	if (i915_vm_is_4lvl(&ppgtt->vm)) {
-		px_dma(ppgtt->pd) = s->i915_context_pml4;
+		set_dma_address(ppgtt->pd, s->i915_context_pml4);
 	} else {
 		for (i = 0; i < GEN8_3LVL_PDPES; i++) {
 			struct i915_page_directory * const pd =
 				i915_pd_entry(ppgtt->pd, i);
 
-			px_dma(pd) = s->i915_context_pdps[i];
+			set_dma_address(pd, s->i915_context_pdps[i]);
 		}
 	}
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 67102dc26fce..ea281d7b0630 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1080,6 +1080,7 @@ static void i915_driver_release(struct drm_device *dev)
 
 	intel_memory_regions_driver_release(dev_priv);
 	i915_ggtt_driver_release(dev_priv);
+	i915_gem_drain_freed_objects(dev_priv);
 
 	i915_driver_mmio_release(dev_priv);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6e9072ab30a1..100c2029798f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -590,11 +590,6 @@ struct i915_gem_mm {
 	 */
 	atomic_t free_count;
 
-	/**
-	 * Small stash of WC pages
-	 */
-	struct pagestash wc_stash;
-
 	/**
 	 * tmpfs instance used for shmem backed objects
 	 */
diff --git a/drivers/gpu/drm/i915/selftests/mock_gtt.c b/drivers/gpu/drm/i915/selftests/mock_gtt.c
index 5e4fb0fba34b..63a29211652e 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gtt.c
@@ -78,6 +78,8 @@ struct i915_ppgtt *mock_ppgtt(struct drm_i915_private *i915, const char *name)
 
 	i915_address_space_init(&ppgtt->vm, VM_CLASS_PPGTT);
 
+	ppgtt->vm.alloc_pt_dma = alloc_pt_dma;
+
 	ppgtt->vm.clear_range = mock_clear_range;
 	ppgtt->vm.insert_page = mock_insert_page;
 	ppgtt->vm.insert_entries = mock_insert_entries;
-- 
2.20.1


* [Intel-gfx] [PATCH 08/23] drm/i915/gem: Don't drop the timeline lock during execbuf
@ 2020-07-02  8:32 ` Chris Wilson
  -1 siblings, 0 replies; 56+ messages in thread
From: Chris Wilson @ 2020-07-02  8:32 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Our timeline lock is our defence against a concurrent execbuf
interrupting our request construction. We need to hold it throughout or,
for example, a second thread may interject a relocation request in
between our own relocation request and execution in the ring.

A second, major benefit is that it allows us to preserve a large chunk
of the ringbuffer for our exclusive use, which should virtually
eliminate the threat of hitting a wait_for_space during request
construction -- although we should have already dropped other
contentious locks at that point.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 241 ++++++++++++------
 .../i915/gem/selftests/i915_gem_execbuffer.c  |  24 +-
 2 files changed, 186 insertions(+), 79 deletions(-)
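
[Not part of the patch -- a condensed, illustrative sketch of the locking
shape the hunks below move to. All names are taken from the diff itself
(eb_pin_engine(), __i915_request_create(), lockdep_pin_lock(),
__i915_request_commit(), __i915_request_queue()); the body is a
simplified outline rather than a compilable unit.]

static int sketch_submit_under_timeline_lock(struct i915_execbuffer *eb)
{
	struct intel_timeline *tl = eb->context->timeline;
	struct i915_request *rq;

	/*
	 * tl->mutex is taken in eb_pin_engine() and only dropped again in
	 * eb_unpin_engine(), so request construction may assume it is held
	 * throughout and no other execbuf can interleave on this timeline.
	 */
	lockdep_assert_held(&tl->mutex);

	rq = __i915_request_create(eb->context, GFP_KERNEL);
	if (IS_ERR(rq))
		return PTR_ERR(rq);
	rq->cookie = lockdep_pin_lock(&tl->mutex);

	/* relocations, batch emission and fence handling happen here,
	 * without ever releasing tl->mutex */

	__i915_request_commit(rq);
	__i915_request_queue(rq, &eb->gem_context->sched);
	return 0;
}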

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 6d4bf38dcda8..5e59b4a689b5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -259,6 +259,8 @@ struct i915_execbuffer {
 		bool has_fence : 1;
 		bool needs_unfenced : 1;
 
+		struct intel_context *ce;
+
 		struct i915_vma *target;
 		struct i915_request *rq;
 		struct i915_vma *rq_vma;
@@ -639,6 +641,35 @@ static int eb_reserve_vma(const struct i915_execbuffer *eb,
 	return 0;
 }
 
+static void retire_requests(struct intel_timeline *tl, struct i915_request *end)
+{
+	struct i915_request *rq, *rn;
+
+	list_for_each_entry_safe(rq, rn, &tl->requests, link)
+		if (rq == end || !i915_request_retire(rq))
+			break;
+}
+
+static int wait_for_timeline(struct intel_timeline *tl)
+{
+	do {
+		struct dma_fence *fence;
+		int err;
+
+		fence = i915_active_fence_get(&tl->last_request);
+		if (!fence)
+			return 0;
+
+		err = dma_fence_wait(fence, true);
+		dma_fence_put(fence);
+		if (err)
+			return err;
+
+		/* Retiring may trigger a barrier, requiring an extra pass */
+		retire_requests(tl, NULL);
+	} while (1);
+}
+
 static int eb_reserve(struct i915_execbuffer *eb)
 {
 	const unsigned int count = eb->buffer_count;
@@ -646,7 +677,6 @@ static int eb_reserve(struct i915_execbuffer *eb)
 	struct list_head last;
 	struct eb_vma *ev;
 	unsigned int i, pass;
-	int err = 0;
 
 	/*
 	 * Attempt to pin all of the buffers into the GTT.
@@ -662,18 +692,22 @@ static int eb_reserve(struct i915_execbuffer *eb)
 	 * room for the earlier objects *unless* we need to defragment.
 	 */
 
-	if (mutex_lock_interruptible(&eb->i915->drm.struct_mutex))
-		return -EINTR;
-
 	pass = 0;
 	do {
+		int err = 0;
+
+		if (mutex_lock_interruptible(&eb->i915->drm.struct_mutex))
+			return -EINTR;
+
 		list_for_each_entry(ev, &eb->unbound, bind_link) {
 			err = eb_reserve_vma(eb, ev, pin_flags);
 			if (err)
 				break;
 		}
-		if (!(err == -ENOSPC || err == -EAGAIN))
-			break;
+		if (!(err == -ENOSPC || err == -EAGAIN)) {
+			mutex_unlock(&eb->i915->drm.struct_mutex);
+			return err;
+		}
 
 		/* Resort *all* the objects into priority order */
 		INIT_LIST_HEAD(&eb->unbound);
@@ -702,11 +736,10 @@ static int eb_reserve(struct i915_execbuffer *eb)
 				list_add_tail(&ev->bind_link, &last);
 		}
 		list_splice_tail(&last, &eb->unbound);
+		mutex_unlock(&eb->i915->drm.struct_mutex);
 
 		if (err == -EAGAIN) {
-			mutex_unlock(&eb->i915->drm.struct_mutex);
 			flush_workqueue(eb->i915->mm.userptr_wq);
-			mutex_lock(&eb->i915->drm.struct_mutex);
 			continue;
 		}
 
@@ -715,25 +748,23 @@ static int eb_reserve(struct i915_execbuffer *eb)
 			break;
 
 		case 1:
-			/* Too fragmented, unbind everything and retry */
-			mutex_lock(&eb->context->vm->mutex);
-			err = i915_gem_evict_vm(eb->context->vm);
-			mutex_unlock(&eb->context->vm->mutex);
+			/*
+			 * Too fragmented, retire everything on the timeline
+			 * and so make it all [contexts included] available to
+			 * evict.
+			 */
+			err = wait_for_timeline(eb->context->timeline);
 			if (err)
-				goto unlock;
+				return err;
+
 			break;
 
 		default:
-			err = -ENOSPC;
-			goto unlock;
+			return -ENOSPC;
 		}
 
 		pin_flags = PIN_USER;
 	} while (1);
-
-unlock:
-	mutex_unlock(&eb->i915->drm.struct_mutex);
-	return err;
 }
 
 static unsigned int eb_batch_index(const struct i915_execbuffer *eb)
@@ -1000,13 +1031,44 @@ static int reloc_gpu_chain(struct reloc_cache *cache)
 	return err;
 }
 
+static struct i915_request *
+nested_request_create(struct intel_context *ce)
+{
+	struct i915_request *rq;
+
+	/* XXX This only works once; replace with shared timeline */
+	mutex_lock_nested(&ce->timeline->mutex, SINGLE_DEPTH_NESTING);
+	intel_context_enter(ce);
+
+	rq = __i915_request_create(ce, GFP_KERNEL);
+
+	intel_context_exit(ce);
+	if (IS_ERR(rq))
+		mutex_unlock(&ce->timeline->mutex);
+
+	return rq;
+}
+
+static void __i915_request_add(struct i915_request *rq,
+			       struct i915_sched_attr *attr)
+{
+	struct intel_timeline * const tl = i915_request_timeline(rq);
+
+	lockdep_assert_held(&tl->mutex);
+	lockdep_unpin_lock(&tl->mutex, rq->cookie);
+
+	__i915_request_commit(rq);
+	__i915_request_queue(rq, attr);
+}
+
 static unsigned int reloc_bb_flags(const struct reloc_cache *cache)
 {
 	return cache->gen > 5 ? 0 : I915_DISPATCH_SECURE;
 }
 
-static int reloc_gpu_flush(struct reloc_cache *cache)
+static int reloc_gpu_flush(struct i915_execbuffer *eb)
 {
+	struct reloc_cache *cache = &eb->reloc_cache;
 	struct i915_request *rq;
 	int err;
 
@@ -1037,7 +1099,9 @@ static int reloc_gpu_flush(struct reloc_cache *cache)
 		i915_request_set_error_once(rq, err);
 
 	intel_gt_chipset_flush(rq->engine->gt);
-	i915_request_add(rq);
+	__i915_request_add(rq, &eb->gem_context->sched);
+	if (i915_request_timeline(rq) != eb->context->timeline)
+		mutex_unlock(&i915_request_timeline(rq)->mutex);
 
 	return err;
 }
@@ -1096,27 +1160,15 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 	if (err)
 		goto err_unmap;
 
-	if (engine == eb->context->engine) {
-		rq = i915_request_create(eb->context);
-	} else {
-		struct intel_context *ce;
-
-		ce = intel_context_create(engine);
-		if (IS_ERR(ce)) {
-			err = PTR_ERR(ce);
-			goto err_unpin;
-		}
-
-		i915_vm_put(ce->vm);
-		ce->vm = i915_vm_get(eb->context->vm);
-
-		rq = intel_context_create_request(ce);
-		intel_context_put(ce);
-	}
+	if (cache->ce == eb->context)
+		rq = __i915_request_create(cache->ce, GFP_KERNEL);
+	else
+		rq = nested_request_create(cache->ce);
 	if (IS_ERR(rq)) {
 		err = PTR_ERR(rq);
 		goto err_unpin;
 	}
+	rq->cookie = lockdep_pin_lock(&i915_request_timeline(rq)->mutex);
 
 	err = intel_gt_buffer_pool_mark_active(pool, rq);
 	if (err)
@@ -1144,7 +1196,9 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 skip_request:
 	i915_request_set_error_once(rq, err);
 err_request:
-	i915_request_add(rq);
+	__i915_request_add(rq, &eb->gem_context->sched);
+	if (i915_request_timeline(rq) != eb->context->timeline)
+		mutex_unlock(&i915_request_timeline(rq)->mutex);
 err_unpin:
 	i915_vma_unpin(batch);
 err_unmap:
@@ -1154,11 +1208,6 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 	return err;
 }
 
-static bool reloc_can_use_engine(const struct intel_engine_cs *engine)
-{
-	return engine->class != VIDEO_DECODE_CLASS || !IS_GEN(engine->i915, 6);
-}
-
 static u32 *reloc_gpu(struct i915_execbuffer *eb,
 		      struct i915_vma *vma,
 		      unsigned int len)
@@ -1170,12 +1219,6 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb,
 	if (unlikely(!cache->rq)) {
 		struct intel_engine_cs *engine = eb->engine;
 
-		if (!reloc_can_use_engine(engine)) {
-			engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0];
-			if (!engine)
-				return ERR_PTR(-ENODEV);
-		}
-
 		err = __reloc_gpu_alloc(eb, engine, len);
 		if (unlikely(err))
 			return ERR_PTR(err);
@@ -1506,7 +1549,7 @@ static int eb_relocate(struct i915_execbuffer *eb)
 				break;
 		}
 
-		flush = reloc_gpu_flush(&eb->reloc_cache);
+		flush = reloc_gpu_flush(eb);
 		if (!err)
 			err = flush;
 	}
@@ -1730,21 +1773,17 @@ parser_mark_active(struct eb_parse_work *pw, struct intel_timeline *tl)
 {
 	int err;
 
-	mutex_lock(&tl->mutex);
-
 	err = __parser_mark_active(pw->shadow, tl, &pw->base.dma);
 	if (err)
-		goto unlock;
+		return err;
 
 	if (pw->trampoline) {
 		err = __parser_mark_active(pw->trampoline, tl, &pw->base.dma);
 		if (err)
-			goto unlock;
+			return err;
 	}
 
-unlock:
-	mutex_unlock(&tl->mutex);
-	return err;
+	return 0;
 }
 
 static int eb_parse_pipeline(struct i915_execbuffer *eb,
@@ -2037,6 +2076,54 @@ static struct i915_request *eb_throttle(struct intel_context *ce)
 	return i915_request_get(rq);
 }
 
+static bool reloc_can_use_engine(const struct intel_engine_cs *engine)
+{
+	return engine->class != VIDEO_DECODE_CLASS || !IS_GEN(engine->i915, 6);
+}
+
+static int __eb_pin_reloc_engine(struct i915_execbuffer *eb)
+{
+	struct intel_engine_cs *engine = eb->engine;
+	struct intel_context *ce;
+	int err;
+
+	if (reloc_can_use_engine(engine)) {
+		eb->reloc_cache.ce = eb->context;
+		return 0;
+	}
+
+	engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0];
+	if (!engine)
+		return -ENODEV;
+
+	ce = intel_context_create(engine);
+	if (IS_ERR(ce))
+		return PTR_ERR(ce);
+
+	/* Reuse eb->context->timeline with scheduler! */
+
+	i915_vm_put(ce->vm);
+	ce->vm = i915_vm_get(eb->context->vm);
+
+	err = intel_context_pin(ce);
+	if (err)
+		return err;
+
+	eb->reloc_cache.ce = ce;
+	return 0;
+}
+
+static void __eb_unpin_reloc_engine(struct i915_execbuffer *eb)
+{
+	struct intel_context *ce = eb->reloc_cache.ce;
+
+	if (ce == eb->context)
+		return;
+
+	intel_context_unpin(ce);
+	intel_context_put(ce);
+}
+
 static int __eb_pin_engine(struct i915_execbuffer *eb, struct intel_context *ce)
 {
 	struct intel_timeline *tl;
@@ -2080,9 +2167,7 @@ static int __eb_pin_engine(struct i915_execbuffer *eb, struct intel_context *ce)
 	intel_context_enter(ce);
 	rq = eb_throttle(ce);
 
-	intel_context_timeline_unlock(tl);
-
-	if (rq) {
+	while (rq) {
 		bool nonblock = eb->file->filp->f_flags & O_NONBLOCK;
 		long timeout;
 
@@ -2090,23 +2175,34 @@ static int __eb_pin_engine(struct i915_execbuffer *eb, struct intel_context *ce)
 		if (nonblock)
 			timeout = 0;
 
+		mutex_unlock(&tl->mutex);
+
 		timeout = i915_request_wait(rq,
 					    I915_WAIT_INTERRUPTIBLE,
 					    timeout);
 		i915_request_put(rq);
 
+		mutex_lock(&tl->mutex);
+
 		if (timeout < 0) {
 			err = nonblock ? -EWOULDBLOCK : timeout;
 			goto err_exit;
 		}
+
+		retire_requests(tl, NULL);
+		rq = eb_throttle(ce);
 	}
 
 	eb->engine = ce->engine;
 	eb->context = ce;
+
+	err = __eb_pin_reloc_engine(eb);
+	if (err)
+		goto err_exit;
+
 	return 0;
 
 err_exit:
-	mutex_lock(&tl->mutex);
 	intel_context_exit(ce);
 	intel_context_timeline_unlock(tl);
 err_unpin:
@@ -2117,11 +2213,11 @@ static int __eb_pin_engine(struct i915_execbuffer *eb, struct intel_context *ce)
 static void eb_unpin_engine(struct i915_execbuffer *eb)
 {
 	struct intel_context *ce = eb->context;
-	struct intel_timeline *tl = ce->timeline;
 
-	mutex_lock(&tl->mutex);
+	__eb_unpin_reloc_engine(eb);
+
 	intel_context_exit(ce);
-	mutex_unlock(&tl->mutex);
+	intel_context_timeline_unlock(ce->timeline);
 
 	intel_context_unpin(ce);
 }
@@ -2323,15 +2419,6 @@ signal_fence_array(struct i915_execbuffer *eb,
 	}
 }
 
-static void retire_requests(struct intel_timeline *tl, struct i915_request *end)
-{
-	struct i915_request *rq, *rn;
-
-	list_for_each_entry_safe(rq, rn, &tl->requests, link)
-		if (rq == end || !i915_request_retire(rq))
-			break;
-}
-
 static void eb_request_add(struct i915_execbuffer *eb)
 {
 	struct i915_request *rq = eb->request;
@@ -2360,8 +2447,6 @@ static void eb_request_add(struct i915_execbuffer *eb)
 	/* Try to clean up the client's timeline after submitting the request */
 	if (prev)
 		retire_requests(tl, prev);
-
-	mutex_unlock(&tl->mutex);
 }
 
 static int
@@ -2448,6 +2533,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	err = eb_pin_engine(&eb, file, args);
 	if (unlikely(err))
 		goto err_context;
+	lockdep_assert_held(&eb.context->timeline->mutex);
 
 	err = eb_relocate(&eb);
 	if (err) {
@@ -2515,11 +2601,12 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	GEM_BUG_ON(eb.reloc_cache.rq);
 
 	/* Allocate a request for this batch buffer nice and early. */
-	eb.request = i915_request_create(eb.context);
+	eb.request = __i915_request_create(eb.context, GFP_KERNEL);
 	if (IS_ERR(eb.request)) {
 		err = PTR_ERR(eb.request);
 		goto err_batch_unpin;
 	}
+	eb.request->cookie = lockdep_pin_lock(&eb.context->timeline->mutex);
 
 	if (in_fence) {
 		if (args->flags & I915_EXEC_FENCE_SUBMIT)
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
index 57c14d3340cd..992d46db1b33 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
@@ -7,6 +7,9 @@
 
 #include "gt/intel_engine_pm.h"
 #include "selftests/igt_flush_test.h"
+#include "selftests/mock_drm.h"
+
+#include "mock_context.h"
 
 static u64 read_reloc(const u32 *map, int x, const u64 mask)
 {
@@ -60,7 +63,7 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 
 	GEM_BUG_ON(!eb->reloc_cache.rq);
 	rq = i915_request_get(eb->reloc_cache.rq);
-	err = reloc_gpu_flush(&eb->reloc_cache);
+	err = reloc_gpu_flush(eb);
 	if (err)
 		goto put_rq;
 	GEM_BUG_ON(eb->reloc_cache.rq);
@@ -100,14 +103,22 @@ static int igt_gpu_reloc(void *arg)
 {
 	struct i915_execbuffer eb;
 	struct drm_i915_gem_object *scratch;
+	struct file *file;
 	int err = 0;
 	u32 *map;
 
+	file = mock_file(arg);
+	if (IS_ERR(file))
+		return PTR_ERR(file);
+
 	eb.i915 = arg;
+	eb.gem_context = live_context(arg, file);
+	if (IS_ERR(eb.gem_context))
+		goto err_file;
 
 	scratch = i915_gem_object_create_internal(eb.i915, 4096);
 	if (IS_ERR(scratch))
-		return PTR_ERR(scratch);
+		goto err_file;
 
 	map = i915_gem_object_pin_map(scratch, I915_MAP_WC);
 	if (IS_ERR(map)) {
@@ -130,8 +141,15 @@ static int igt_gpu_reloc(void *arg)
 		if (err)
 			goto err_put;
 
+		mutex_lock(&eb.context->timeline->mutex);
+		intel_context_enter(eb.context);
+		eb.reloc_cache.ce = eb.context;
+
 		err = __igt_gpu_reloc(&eb, scratch);
 
+		intel_context_exit(eb.context);
+		mutex_unlock(&eb.context->timeline->mutex);
+
 		intel_context_unpin(eb.context);
 err_put:
 		intel_context_put(eb.context);
@@ -146,6 +164,8 @@ static int igt_gpu_reloc(void *arg)
 
 err_scratch:
 	i915_gem_object_put(scratch);
+err_file:
+	fput(file);
 	return err;
 }
 
-- 
2.20.1


* [Intel-gfx] [PATCH 09/23] drm/i915/gem: Rename execbuf.bind_link to unbound_link
@ 2020-07-02  8:32 ` Chris Wilson
  -1 siblings, 0 replies; 56+ messages in thread
From: Chris Wilson @ 2020-07-02  8:32 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Rename the current list of unbound objects so that we can keep track of
all the objects that we need to bind, as well as the list of currently
unbound [unprocessed] objects.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)
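
[Illustrative note, not part of the patch: once the link is renamed, the
old name is free for the next patch in the series to reuse, so an eb_vma
ends up carrying two distinct links -- roughly:]

struct eb_vma {
	/* ... */
	struct list_head bind_link;	/* on eb->bind_list: every vma of this execbuf */
	struct list_head unbound_link;	/* on the local unbound list: still to be pinned */
	struct list_head reloc_link;
	/* ... */
};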

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 5e59b4a689b5..6abbd8e80f05 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -33,7 +33,7 @@ struct eb_vma {
 
 	/** This vma's place in the execbuf reservation list */
 	struct drm_i915_gem_exec_object2 *exec;
-	struct list_head bind_link;
+	struct list_head unbound_link;
 	struct list_head reloc_link;
 
 	struct hlist_node node;
@@ -594,7 +594,7 @@ eb_add_vma(struct i915_execbuffer *eb,
 		}
 	} else {
 		eb_unreserve_vma(ev);
-		list_add_tail(&ev->bind_link, &eb->unbound);
+		list_add_tail(&ev->unbound_link, &eb->unbound);
 	}
 }
 
@@ -699,7 +699,7 @@ static int eb_reserve(struct i915_execbuffer *eb)
 		if (mutex_lock_interruptible(&eb->i915->drm.struct_mutex))
 			return -EINTR;
 
-		list_for_each_entry(ev, &eb->unbound, bind_link) {
+		list_for_each_entry(ev, &eb->unbound, unbound_link) {
 			err = eb_reserve_vma(eb, ev, pin_flags);
 			if (err)
 				break;
@@ -725,15 +725,15 @@ static int eb_reserve(struct i915_execbuffer *eb)
 
 			if (flags & EXEC_OBJECT_PINNED)
 				/* Pinned must have their slot */
-				list_add(&ev->bind_link, &eb->unbound);
+				list_add(&ev->unbound_link, &eb->unbound);
 			else if (flags & __EXEC_OBJECT_NEEDS_MAP)
 				/* Map require the lowest 256MiB (aperture) */
-				list_add_tail(&ev->bind_link, &eb->unbound);
+				list_add_tail(&ev->unbound_link, &eb->unbound);
 			else if (!(flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS))
 				/* Prioritise 4GiB region for restricted bo */
-				list_add(&ev->bind_link, &last);
+				list_add(&ev->unbound_link, &last);
 			else
-				list_add_tail(&ev->bind_link, &last);
+				list_add_tail(&ev->unbound_link, &last);
 		}
 		list_splice_tail(&last, &eb->unbound);
 		mutex_unlock(&eb->i915->drm.struct_mutex);
-- 
2.20.1


* [Intel-gfx] [PATCH 10/23] drm/i915/gem: Break apart the early i915_vma_pin from execbuf object lookup
@ 2020-07-02  8:32 ` Chris Wilson
  -1 siblings, 0 replies; 56+ messages in thread
From: Chris Wilson @ 2020-07-02  8:32 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

As a prelude to the next step, where we want to perform all the object
allocations together under the same lock, we must first delay the
i915_vma_pin() as that implicitly does the allocations for us, one by
one. As it only does the allocations one by one, it is not allowed to
wait/evict, whereas by pulling all the allocations together, the entire
set can be scheduled as one.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 70 +++++++++++--------
 1 file changed, 39 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 6abbd8e80f05..1348cb5ec7e6 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -33,6 +33,8 @@ struct eb_vma {
 
 	/** This vma's place in the execbuf reservation list */
 	struct drm_i915_gem_exec_object2 *exec;
+
+	struct list_head bind_link;
 	struct list_head unbound_link;
 	struct list_head reloc_link;
 
@@ -240,8 +242,8 @@ struct i915_execbuffer {
 	/** actual size of execobj[] as we may extend it for the cmdparser */
 	unsigned int buffer_count;
 
-	/** list of vma not yet bound during reservation phase */
-	struct list_head unbound;
+	/** list of all vma required to be bound for this execbuf */
+	struct list_head bind_list;
 
 	/** list of vma that have execobj.relocation_count */
 	struct list_head relocs;
@@ -565,6 +567,8 @@ eb_add_vma(struct i915_execbuffer *eb,
 						    eb->lut_size)]);
 	}
 
+	list_add_tail(&ev->bind_link, &eb->bind_list);
+
 	if (entry->relocation_count)
 		list_add_tail(&ev->reloc_link, &eb->relocs);
 
@@ -586,16 +590,6 @@ eb_add_vma(struct i915_execbuffer *eb,
 
 		eb->batch = ev;
 	}
-
-	if (eb_pin_vma(eb, entry, ev)) {
-		if (entry->offset != vma->node.start) {
-			entry->offset = vma->node.start | UPDATE;
-			eb->args->flags |= __EXEC_HAS_RELOC;
-		}
-	} else {
-		eb_unreserve_vma(ev);
-		list_add_tail(&ev->unbound_link, &eb->unbound);
-	}
 }
 
 static int eb_reserve_vma(const struct i915_execbuffer *eb,
@@ -670,13 +664,31 @@ static int wait_for_timeline(struct intel_timeline *tl)
 	} while (1);
 }
 
-static int eb_reserve(struct i915_execbuffer *eb)
+static int eb_reserve_vm(struct i915_execbuffer *eb)
 {
-	const unsigned int count = eb->buffer_count;
 	unsigned int pin_flags = PIN_USER | PIN_NONBLOCK;
-	struct list_head last;
+	struct list_head last, unbound;
 	struct eb_vma *ev;
-	unsigned int i, pass;
+	unsigned int pass;
+
+	INIT_LIST_HEAD(&unbound);
+	list_for_each_entry(ev, &eb->bind_list, bind_link) {
+		struct drm_i915_gem_exec_object2 *entry = ev->exec;
+		struct i915_vma *vma = ev->vma;
+
+		if (eb_pin_vma(eb, entry, ev)) {
+			if (entry->offset != vma->node.start) {
+				entry->offset = vma->node.start | UPDATE;
+				eb->args->flags |= __EXEC_HAS_RELOC;
+			}
+		} else {
+			eb_unreserve_vma(ev);
+			list_add_tail(&ev->unbound_link, &unbound);
+		}
+	}
+
+	if (list_empty(&unbound))
+		return 0;
 
 	/*
 	 * Attempt to pin all of the buffers into the GTT.
@@ -699,7 +711,7 @@ static int eb_reserve(struct i915_execbuffer *eb)
 		if (mutex_lock_interruptible(&eb->i915->drm.struct_mutex))
 			return -EINTR;
 
-		list_for_each_entry(ev, &eb->unbound, unbound_link) {
+		list_for_each_entry(ev, &unbound, unbound_link) {
 			err = eb_reserve_vma(eb, ev, pin_flags);
 			if (err)
 				break;
@@ -710,13 +722,11 @@ static int eb_reserve(struct i915_execbuffer *eb)
 		}
 
 		/* Resort *all* the objects into priority order */
-		INIT_LIST_HEAD(&eb->unbound);
+		INIT_LIST_HEAD(&unbound);
 		INIT_LIST_HEAD(&last);
-		for (i = 0; i < count; i++) {
-			unsigned int flags;
+		list_for_each_entry(ev, &eb->bind_list, bind_link) {
+			unsigned int flags = ev->flags;
 
-			ev = &eb->vma[i];
-			flags = ev->flags;
 			if (flags & EXEC_OBJECT_PINNED &&
 			    flags & __EXEC_OBJECT_HAS_PIN)
 				continue;
@@ -725,17 +735,17 @@ static int eb_reserve(struct i915_execbuffer *eb)
 
 			if (flags & EXEC_OBJECT_PINNED)
 				/* Pinned must have their slot */
-				list_add(&ev->unbound_link, &eb->unbound);
+				list_add(&ev->unbound_link, &unbound);
 			else if (flags & __EXEC_OBJECT_NEEDS_MAP)
 				/* Map require the lowest 256MiB (aperture) */
-				list_add_tail(&ev->unbound_link, &eb->unbound);
+				list_add_tail(&ev->unbound_link, &unbound);
 			else if (!(flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS))
 				/* Prioritise 4GiB region for restricted bo */
 				list_add(&ev->unbound_link, &last);
 			else
 				list_add_tail(&ev->unbound_link, &last);
 		}
-		list_splice_tail(&last, &eb->unbound);
+		list_splice_tail(&last, &unbound);
 		mutex_unlock(&eb->i915->drm.struct_mutex);
 
 		if (err == -EAGAIN) {
@@ -884,8 +894,8 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
 	unsigned int i;
 	int err = 0;
 
+	INIT_LIST_HEAD(&eb->bind_list);
 	INIT_LIST_HEAD(&eb->relocs);
-	INIT_LIST_HEAD(&eb->unbound);
 
 	for (i = 0; i < eb->buffer_count; i++) {
 		struct i915_vma *vma;
@@ -1532,11 +1542,9 @@ static int eb_relocate(struct i915_execbuffer *eb)
 	if (err)
 		return err;
 
-	if (!list_empty(&eb->unbound)) {
-		err = eb_reserve(eb);
-		if (err)
-			return err;
-	}
+	err = eb_reserve_vm(eb);
+	if (err)
+		return err;
 
 	/* The objects are in their final locations, apply the relocations. */
 	if (eb->args->flags & __EXEC_HAS_RELOC) {
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Intel-gfx] [PATCH 11/23] drm/i915/gem: Remove the call for no-evict i915_vma_pin
  2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
                   ` (9 preceding siblings ...)
  (?)
@ 2020-07-02  8:32 ` Chris Wilson
  2020-07-03  8:59   ` Tvrtko Ursulin
  -1 siblings, 1 reply; 56+ messages in thread
From: Chris Wilson @ 2020-07-02  8:32 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Remove the stub i915_vma_pin() used for incrementally pinning objects for
execbuf (under the severe restriction that they must not wait on a
resource as we may have already pinned it) and replace it with an
i915_vma_pin_inplace() that is only allowed to reclaim the currently
bound location for the vma (and will never wait for a pinned resource).
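
A rough sketch of the intended calling pattern (illustrative only; the
error handling below is not lifted from the patch): try to reuse the
existing binding without blocking, and only fall back to the full
i915_vma_pin(), which may wait and evict, when that fails:

    /* fast path: reuse the current binding, never waits or evicts */
    if (!i915_vma_pin_inplace(vma, PIN_USER)) {
            /* slow path: may allocate, wait and evict */
            err = i915_vma_pin(vma, 0, 0, PIN_USER);
            if (err)
                    return err;
    }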

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 69 +++++++++++--------
 drivers/gpu/drm/i915/i915_vma.c               |  6 +-
 drivers/gpu/drm/i915/i915_vma.h               |  2 +
 3 files changed, 45 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 1348cb5ec7e6..18e9325dd98a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -452,49 +452,55 @@ static u64 eb_pin_flags(const struct drm_i915_gem_exec_object2 *entry,
 	return pin_flags;
 }
 
+static bool eb_pin_vma_fence_inplace(struct eb_vma *ev)
+{
+	struct i915_vma *vma = ev->vma;
+	struct i915_fence_reg *reg = vma->fence;
+
+	if (reg) {
+		if (READ_ONCE(reg->dirty))
+			return false;
+
+		atomic_inc(&reg->pin_count);
+		ev->flags |= __EXEC_OBJECT_HAS_FENCE;
+	} else {
+		if (i915_gem_object_is_tiled(vma->obj))
+			return false;
+	}
+
+	return true;
+}
+
 static inline bool
-eb_pin_vma(struct i915_execbuffer *eb,
-	   const struct drm_i915_gem_exec_object2 *entry,
-	   struct eb_vma *ev)
+eb_pin_vma_inplace(struct i915_execbuffer *eb,
+		   const struct drm_i915_gem_exec_object2 *entry,
+		   struct eb_vma *ev)
 {
 	struct i915_vma *vma = ev->vma;
-	u64 pin_flags;
+	unsigned int pin_flags;
 
-	if (vma->node.size)
-		pin_flags = vma->node.start;
-	else
-		pin_flags = entry->offset & PIN_OFFSET_MASK;
+	if (eb_vma_misplaced(entry, vma, ev->flags))
+		return false;
 
-	pin_flags |= PIN_USER | PIN_NOEVICT | PIN_OFFSET_FIXED;
+	pin_flags = PIN_USER;
 	if (unlikely(ev->flags & EXEC_OBJECT_NEEDS_GTT))
 		pin_flags |= PIN_GLOBAL;
 
 	/* Attempt to reuse the current location if available */
-	if (unlikely(i915_vma_pin(vma, 0, 0, pin_flags))) {
-		if (entry->flags & EXEC_OBJECT_PINNED)
-			return false;
-
-		/* Failing that pick any _free_ space if suitable */
-		if (unlikely(i915_vma_pin(vma,
-					  entry->pad_to_size,
-					  entry->alignment,
-					  eb_pin_flags(entry, ev->flags) |
-					  PIN_USER | PIN_NOEVICT)))
-			return false;
-	}
+	if (!i915_vma_pin_inplace(vma, pin_flags))
+		return false;
 
 	if (unlikely(ev->flags & EXEC_OBJECT_NEEDS_FENCE)) {
-		if (unlikely(i915_vma_pin_fence(vma))) {
-			i915_vma_unpin(vma);
+		if (!eb_pin_vma_fence_inplace(ev)) {
+			__i915_vma_unpin(vma);
 			return false;
 		}
-
-		if (vma->fence)
-			ev->flags |= __EXEC_OBJECT_HAS_FENCE;
 	}
 
+	GEM_BUG_ON(eb_vma_misplaced(entry, vma, ev->flags));
+
 	ev->flags |= __EXEC_OBJECT_HAS_PIN;
-	return !eb_vma_misplaced(entry, vma, ev->flags);
+	return true;
 }
 
 static int
@@ -676,14 +682,17 @@ static int eb_reserve_vm(struct i915_execbuffer *eb)
 		struct drm_i915_gem_exec_object2 *entry = ev->exec;
 		struct i915_vma *vma = ev->vma;
 
-		if (eb_pin_vma(eb, entry, ev)) {
+		if (eb_pin_vma_inplace(eb, entry, ev)) {
 			if (entry->offset != vma->node.start) {
 				entry->offset = vma->node.start | UPDATE;
 				eb->args->flags |= __EXEC_HAS_RELOC;
 			}
 		} else {
-			eb_unreserve_vma(ev);
-			list_add_tail(&ev->unbound_link, &unbound);
+			/* Lightly sort user placed objects to the fore */
+			if (ev->flags & EXEC_OBJECT_PINNED)
+				list_add(&ev->unbound_link, &unbound);
+			else
+				list_add_tail(&ev->unbound_link, &unbound);
 		}
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index fc8a083753bd..a00a026076e4 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -744,11 +744,13 @@ i915_vma_detach(struct i915_vma *vma)
 	list_del(&vma->vm_link);
 }
 
-static bool try_qad_pin(struct i915_vma *vma, unsigned int flags)
+bool i915_vma_pin_inplace(struct i915_vma *vma, unsigned int flags)
 {
 	unsigned int bound;
 	bool pinned = true;
 
+	GEM_BUG_ON(flags & ~I915_VMA_BIND_MASK);
+
 	bound = atomic_read(&vma->flags);
 	do {
 		if (unlikely(flags & ~bound))
@@ -869,7 +871,7 @@ int i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 	GEM_BUG_ON(!(flags & (PIN_USER | PIN_GLOBAL)));
 
 	/* First try and grab the pin without rebinding the vma */
-	if (try_qad_pin(vma, flags & I915_VMA_BIND_MASK))
+	if (i915_vma_pin_inplace(vma, flags & I915_VMA_BIND_MASK))
 		return 0;
 
 	err = vma_get_pages(vma);
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index d0d01f909548..03fea54fd573 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -236,6 +236,8 @@ static inline void i915_vma_unlock(struct i915_vma *vma)
 	dma_resv_unlock(vma->resv);
 }
 
+bool i915_vma_pin_inplace(struct i915_vma *vma, unsigned int flags);
+
 int __must_check
 i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags);
 int i915_ggtt_pin(struct i915_vma *vma, u32 align, unsigned int flags);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Intel-gfx] [PATCH 12/23] drm/i915: Add list_for_each_entry_safe_continue_reverse
  2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
                   ` (10 preceding siblings ...)
  (?)
@ 2020-07-02  8:32 ` Chris Wilson
  -1 siblings, 0 replies; 56+ messages in thread
From: Chris Wilson @ 2020-07-02  8:32 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

One more list iterator variant, for when we want to unwind from inside
one list iterator with the intention of restarting from the current
entry as the new head of the list.
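
A hypothetical example of the intended use (struct item, do_one() and
undo_one() are made up for illustration): walk forwards applying
do_one(), and on failure walk back over the entries already processed
to undo them, where undo_one() is free to delete the entry:

    struct item { struct list_head link; };
    struct item *pos, *n;
    int err = 0;

    list_for_each_entry_safe(pos, n, head, link) {
            err = do_one(pos);
            if (err) {
                    list_for_each_entry_safe_continue_reverse(pos, n,
                                                              head, link)
                            undo_one(pos); /* may remove pos from the list */
                    break;
            }
    }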

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_utils.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index 03a73d2bd50d..6ebccdd12d4c 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -266,6 +266,12 @@ static inline int list_is_last_rcu(const struct list_head *list,
 	return READ_ONCE(list->next) == head;
 }
 
+#define list_for_each_entry_safe_continue_reverse(pos, n, head, member)	\
+	for (pos = list_prev_entry(pos, member),			\
+	     n = list_prev_entry(pos, member);				\
+	     &pos->member != (head);					\
+	     pos = n, n = list_prev_entry(n, member))
+
 /*
  * Wait until the work is finally complete, even if it tries to postpone
  * by requeueing itself. Note, that if the worker never cancels itself,
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Intel-gfx] [PATCH 13/23] drm/i915: Always defer fenced work to the worker
  2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
                   ` (11 preceding siblings ...)
  (?)
@ 2020-07-02  8:32 ` Chris Wilson
  -1 siblings, 0 replies; 56+ messages in thread
From: Chris Wilson @ 2020-07-02  8:32 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Currently, if an error is raised we always call the cleanup locally
[and skip the main work callback]. However, some future users may need
to take a mutex to clean up, and so we cannot immediately execute the
cleanup as we may still be in interrupt context.

With the execute-immediate flag, for most cases this should result in
immediate cleanup of an error.
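
To sketch what this enables (my_work(), my_release() and their helpers
are hypothetical; the ops layout matches eb_bind_ops added later in
this series): the work callback is now only run if no error has been
set on the fence, so any unwinding that might need to sleep or take a
mutex can live in the normal completion path rather than in the fence
notification callback:

    static int my_work(struct dma_fence_work *f)
    {
            /* skipped entirely if f->dma.error is already set */
            return do_the_thing(f);
    }

    static void my_release(struct dma_fence_work *f)
    {
            /* cleanup for both the success and the error case */
            undo_the_thing(f);
    }

    static const struct dma_fence_work_ops my_ops = {
            .name = "my-work",
            .work = my_work,
            .release = my_release,
    };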

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_sw_fence_work.c | 25 +++++++++++------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c
index a3a81bb8f2c3..29f63ebc24e8 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence_work.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c
@@ -16,11 +16,14 @@ static void fence_complete(struct dma_fence_work *f)
 static void fence_work(struct work_struct *work)
 {
 	struct dma_fence_work *f = container_of(work, typeof(*f), work);
-	int err;
 
-	err = f->ops->work(f);
-	if (err)
-		dma_fence_set_error(&f->dma, err);
+	if (!f->dma.error) {
+		int err;
+
+		err = f->ops->work(f);
+		if (err)
+			dma_fence_set_error(&f->dma, err);
+	}
 
 	fence_complete(f);
 	dma_fence_put(&f->dma);
@@ -36,15 +39,11 @@ fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 		if (fence->error)
 			dma_fence_set_error(&f->dma, fence->error);
 
-		if (!f->dma.error) {
-			dma_fence_get(&f->dma);
-			if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags))
-				fence_work(&f->work);
-			else
-				queue_work(system_unbound_wq, &f->work);
-		} else {
-			fence_complete(f);
-		}
+		dma_fence_get(&f->dma);
+		if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags))
+			fence_work(&f->work);
+		else
+			queue_work(system_unbound_wq, &f->work);
 		break;
 
 	case FENCE_FREE:
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Intel-gfx] [PATCH 14/23] drm/i915/gem: Assign context id for async work
  2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
                   ` (12 preceding siblings ...)
  (?)
@ 2020-07-02  8:32 ` Chris Wilson
  -1 siblings, 0 replies; 56+ messages in thread
From: Chris Wilson @ 2020-07-02  8:32 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Allocate a few dma fence context ids that we can use to associate with the
async work [for the CPU] launched on behalf of this context. For extra fun,
we allow a configurable concurrency width.

A current example would be that we spawn an unbound worker for every
userptr get_pages. In the future, we wish to charge this work to the
context that initiated the async work and to impose concurrency limits
based on the context.
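
A worked example (the CPU count is purely illustrative):

    /*
     * With num_online_cpus() == 8:
     *   ctx->async.context = dma_fence_context_alloc(8);
     *   ctx->async.width   = 7; (used as a mask)
     *
     * Successive calls to i915_gem_context_async_id(ctx) then return
     * context + 0, context + 1, ..., context + 7, context + 0, ...
     * giving each GEM context a small, fixed set of queues onto which
     * its async work can be serialised.
     */
    u64 id = i915_gem_context_async_id(ctx);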

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c       | 4 ++++
 drivers/gpu/drm/i915/gem/i915_gem_context.h       | 6 ++++++
 drivers/gpu/drm/i915/gem/i915_gem_context_types.h | 6 ++++++
 3 files changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 6574af699233..6d3f2cc39e62 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -714,6 +714,10 @@ __create_context(struct drm_i915_private *i915)
 	ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
 	mutex_init(&ctx->mutex);
 
+	ctx->async.width = rounddown_pow_of_two(num_online_cpus());
+	ctx->async.context = dma_fence_context_alloc(ctx->async.width);
+	ctx->async.width--;
+
 	spin_lock_init(&ctx->stale.lock);
 	INIT_LIST_HEAD(&ctx->stale.engines);
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h
index 3702b2fb27ab..e104ff0ae740 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
@@ -134,6 +134,12 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, void *data,
 				       struct drm_file *file);
 
+static inline u64 i915_gem_context_async_id(struct i915_gem_context *ctx)
+{
+	return (ctx->async.context +
+		(atomic_fetch_inc(&ctx->async.cur) & ctx->async.width));
+}
+
 static inline struct i915_gem_context *
 i915_gem_context_get(struct i915_gem_context *ctx)
 {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index ae14ca24a11f..52561f98000f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -85,6 +85,12 @@ struct i915_gem_context {
 
 	struct intel_timeline *timeline;
 
+	struct {
+		u64 context;
+		atomic_t cur;
+		unsigned int width;
+	} async;
+
 	/**
 	 * @vm: unique address space (GTT)
 	 *
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Intel-gfx] [PATCH 15/23] drm/i915: Export a preallocate variant of i915_active_acquire()
  2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
                   ` (13 preceding siblings ...)
  (?)
@ 2020-07-02  8:32 ` Chris Wilson
  -1 siblings, 0 replies; 56+ messages in thread
From: Chris Wilson @ 2020-07-02  8:32 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Sometimes we have to be very careful not to allocate underneath a mutex
(or spinlock) and yet still want to track activity. Enter
i915_active_acquire_for_context(). This raises the activity counter on
i915_active prior to use and ensures that the fence-tree contains a slot
for the context.
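
The intended pattern, condensed from how the execbuffer code uses it
later in the series (assume 'ref' is the i915_active being tracked,
'idx' a fence-context id, 'fence' the new fence and vm->mutex the lock
we must not allocate under; ordering against 'prev' is trimmed):

    err = i915_active_acquire_for_context(ref, idx); /* may sleep/alloc */
    if (err)
            return err;

    mutex_lock(&vm->mutex);                    /* no allocations beyond here */
    prev = __i915_active_ref(ref, idx, fence); /* lookup only, no malloc */
    if (!IS_ERR_OR_NULL(prev))
            dma_fence_put(prev);               /* normally awaited first */
    mutex_unlock(&vm->mutex);

    i915_active_release(ref);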

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   2 +-
 drivers/gpu/drm/i915/gt/intel_timeline.c      |   4 +-
 drivers/gpu/drm/i915/i915_active.c            | 113 +++++++++++++++---
 drivers/gpu/drm/i915/i915_active.h            |  14 ++-
 4 files changed, 113 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 18e9325dd98a..a027a0557d3a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1782,7 +1782,7 @@ __parser_mark_active(struct i915_vma *vma,
 {
 	struct intel_gt_buffer_pool_node *node = vma->private;
 
-	return i915_active_ref(&node->active, tl, fence);
+	return i915_active_ref(&node->active, tl->fence_context, fence);
 }
 
 static int
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
index 4546284fede1..e4a5326633b8 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.c
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
@@ -479,7 +479,9 @@ __intel_timeline_get_seqno(struct intel_timeline *tl,
 	 * free it after the current request is retired, which ensures that
 	 * all writes into the cacheline from previous requests are complete.
 	 */
-	err = i915_active_ref(&tl->hwsp_cacheline->active, tl, &rq->fence);
+	err = i915_active_ref(&tl->hwsp_cacheline->active,
+			      tl->fence_context,
+			      &rq->fence);
 	if (err)
 		goto err_cacheline;
 
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index d960d0be5bd2..3f595446fd44 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -217,11 +217,10 @@ excl_retire(struct dma_fence *fence, struct dma_fence_cb *cb)
 }
 
 static struct i915_active_fence *
-active_instance(struct i915_active *ref, struct intel_timeline *tl)
+active_instance(struct i915_active *ref, u64 idx)
 {
 	struct active_node *node, *prealloc;
 	struct rb_node **p, *parent;
-	u64 idx = tl->fence_context;
 
 	/*
 	 * We track the most recently used timeline to skip a rbtree search
@@ -353,21 +352,17 @@ __active_del_barrier(struct i915_active *ref, struct active_node *node)
 	return ____active_del_barrier(ref, node, barrier_to_engine(node));
 }
 
-int i915_active_ref(struct i915_active *ref,
-		    struct intel_timeline *tl,
-		    struct dma_fence *fence)
+int i915_active_ref(struct i915_active *ref, u64 idx, struct dma_fence *fence)
 {
 	struct i915_active_fence *active;
 	int err;
 
-	lockdep_assert_held(&tl->mutex);
-
 	/* Prevent reaping in case we malloc/wait while building the tree */
 	err = i915_active_acquire(ref);
 	if (err)
 		return err;
 
-	active = active_instance(ref, tl);
+	active = active_instance(ref, idx);
 	if (!active) {
 		err = -ENOMEM;
 		goto out;
@@ -384,32 +379,104 @@ int i915_active_ref(struct i915_active *ref,
 		atomic_dec(&ref->count);
 	}
 	if (!__i915_active_fence_set(active, fence))
-		atomic_inc(&ref->count);
+		__i915_active_acquire(ref);
 
 out:
 	i915_active_release(ref);
 	return err;
 }
 
-struct dma_fence *
-i915_active_set_exclusive(struct i915_active *ref, struct dma_fence *f)
+static struct dma_fence *
+__i915_active_set_fence(struct i915_active *ref,
+			struct i915_active_fence *active,
+			struct dma_fence *fence)
 {
 	struct dma_fence *prev;
 
 	/* We expect the caller to manage the exclusive timeline ordering */
 	GEM_BUG_ON(i915_active_is_idle(ref));
 
+	if (is_barrier(active)) { /* proto-node used by our idle barrier */
+		/*
+		 * This request is on the kernel_context timeline, and so
+		 * we can use it to substitute for the pending idle-barrier
+		 * request that we want to emit on the kernel_context.
+		 */
+		__active_del_barrier(ref, node_from_active(active));
+		RCU_INIT_POINTER(active->fence, NULL);
+		atomic_dec(&ref->count);
+	}
+
 	rcu_read_lock();
-	prev = __i915_active_fence_set(&ref->excl, f);
+	prev = __i915_active_fence_set(active, fence);
 	if (prev)
 		prev = dma_fence_get_rcu(prev);
 	else
-		atomic_inc(&ref->count);
+		__i915_active_acquire(ref);
 	rcu_read_unlock();
 
 	return prev;
 }
 
+static struct i915_active_fence *
+__active_lookup(struct i915_active *ref, u64 idx)
+{
+	struct active_node *node;
+	struct rb_node *p;
+
+	/* Like active_instance() but with no malloc */
+
+	node = READ_ONCE(ref->cache);
+	if (node && node->timeline == idx)
+		return &node->base;
+
+	spin_lock_irq(&ref->tree_lock);
+	GEM_BUG_ON(i915_active_is_idle(ref));
+
+	p = ref->tree.rb_node;
+	while (p) {
+		node = rb_entry(p, struct active_node, node);
+		if (node->timeline == idx) {
+			ref->cache = node;
+			spin_unlock_irq(&ref->tree_lock);
+			return &node->base;
+		}
+
+		if (node->timeline < idx)
+			p = p->rb_right;
+		else
+			p = p->rb_left;
+	}
+
+	spin_unlock_irq(&ref->tree_lock);
+
+	return NULL;
+}
+
+struct dma_fence *
+__i915_active_ref(struct i915_active *ref, u64 idx, struct dma_fence *fence)
+{
+	struct dma_fence *prev = ERR_PTR(-ENOENT);
+	struct i915_active_fence *active;
+
+	if (!i915_active_acquire_if_busy(ref))
+		return ERR_PTR(-EINVAL);
+
+	active = __active_lookup(ref, idx);
+	if (active)
+		prev = __i915_active_set_fence(ref, active, fence);
+
+	i915_active_release(ref);
+	return prev;
+}
+
+struct dma_fence *
+i915_active_set_exclusive(struct i915_active *ref, struct dma_fence *f)
+{
+	/* We expect the caller to manage the exclusive timeline ordering */
+	return __i915_active_set_fence(ref, &ref->excl, f);
+}
+
 bool i915_active_acquire_if_busy(struct i915_active *ref)
 {
 	debug_active_assert(ref);
@@ -443,6 +510,24 @@ int i915_active_acquire(struct i915_active *ref)
 	return err;
 }
 
+int i915_active_acquire_for_context(struct i915_active *ref, u64 idx)
+{
+	struct i915_active_fence *active;
+	int err;
+
+	err = i915_active_acquire(ref);
+	if (err)
+		return err;
+
+	active = active_instance(ref, idx);
+	if (!active) {
+		i915_active_release(ref);
+		return -ENOMEM;
+	}
+
+	return 0; /* return with active ref */
+}
+
 void i915_active_release(struct i915_active *ref)
 {
 	debug_active_assert(ref);
@@ -804,7 +889,7 @@ int i915_active_acquire_preallocate_barrier(struct i915_active *ref,
 			 */
 			RCU_INIT_POINTER(node->base.fence, ERR_PTR(-EAGAIN));
 			node->base.cb.node.prev = (void *)engine;
-			atomic_inc(&ref->count);
+			__i915_active_acquire(ref);
 		}
 		GEM_BUG_ON(rcu_access_pointer(node->base.fence) != ERR_PTR(-EAGAIN));
 
diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
index cf4058150966..2e0bcb3289ec 100644
--- a/drivers/gpu/drm/i915/i915_active.h
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -163,14 +163,18 @@ void __i915_active_init(struct i915_active *ref,
 	__i915_active_init(ref, active, retire, &__mkey, &__wkey);	\
 } while (0)
 
-int i915_active_ref(struct i915_active *ref,
-		    struct intel_timeline *tl,
-		    struct dma_fence *fence);
+struct dma_fence *
+__i915_active_ref(struct i915_active *ref, u64 idx, struct dma_fence *fence);
+int i915_active_ref(struct i915_active *ref, u64 idx, struct dma_fence *fence);
 
 static inline int
 i915_active_add_request(struct i915_active *ref, struct i915_request *rq)
 {
-	return i915_active_ref(ref, i915_request_timeline(rq), &rq->fence);
+	struct intel_timeline *tl = i915_request_timeline(rq);
+
+	lockdep_assert_held(&tl->mutex);
+
+	return i915_active_ref(ref, tl->fence_context, &rq->fence);
 }
 
 struct dma_fence *
@@ -198,7 +202,9 @@ int i915_request_await_active(struct i915_request *rq,
 #define I915_ACTIVE_AWAIT_BARRIER BIT(2)
 
 int i915_active_acquire(struct i915_active *ref);
+int i915_active_acquire_for_context(struct i915_active *ref, u64 idx);
 bool i915_active_acquire_if_busy(struct i915_active *ref);
+
 void i915_active_release(struct i915_active *ref);
 
 static inline void __i915_active_acquire(struct i915_active *ref)
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Intel-gfx] [PATCH 16/23] drm/i915/gem: Separate the ww_mutex walker into its own list
  2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
                   ` (14 preceding siblings ...)
  (?)
@ 2020-07-02  8:32 ` Chris Wilson
  -1 siblings, 0 replies; 56+ messages in thread
From: Chris Wilson @ 2020-07-02  8:32 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

In preparation for making eb_vma bigger and heavier to run in parallel,
we need to stop applying an in-place swap() to reorder around ww_mutex
deadlocks. Keep the array intact and reorder the locks using a dedicated
list.
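
For reference, the general shape of the list-based backoff this moves
to (a simplified sketch with a made-up locked_obj type; the real thing
is eb_lock_vma() below, and the caller is assumed to have done
ww_acquire_init(ctx, &reservation_ww_class)): on -EDEADLK we drop every
lock already held, rotate those entries to the tail so the contended
entry becomes the new head, take it with the _slow variant and carry on:

    struct locked_obj {
            struct ww_mutex lock;
            struct list_head link;
    };

    static int lock_all(struct list_head *head, struct ww_acquire_ctx *ctx)
    {
            struct locked_obj *obj;
            int err = 0;

            list_for_each_entry(obj, head, link) {
                    err = ww_mutex_lock_interruptible(&obj->lock, ctx);
                    if (err == -EDEADLK) {
                            struct locked_obj *unlock = obj, *n;

                            /* backoff: unlock what we hold, demote to the tail */
                            list_for_each_entry_safe_continue_reverse(unlock, n,
                                                                      head, link) {
                                    ww_mutex_unlock(&unlock->lock);
                                    list_move_tail(&unlock->link, head);
                            }

                            /* the contended object is now first; wait for it */
                            err = ww_mutex_lock_slow_interruptible(&obj->lock, ctx);
                    }
                    if (err) {
                            list_for_each_entry_continue_reverse(obj, head, link)
                                    ww_mutex_unlock(&obj->lock);
                            break;
                    }
            }

            return err;
    }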

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 83 ++++++++++++-------
 1 file changed, 54 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index a027a0557d3a..faff64b66484 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -37,6 +37,7 @@ struct eb_vma {
 	struct list_head bind_link;
 	struct list_head unbound_link;
 	struct list_head reloc_link;
+	struct list_head submit_link;
 
 	struct hlist_node node;
 	u32 handle;
@@ -248,6 +249,8 @@ struct i915_execbuffer {
 	/** list of vma that have execobj.relocation_count */
 	struct list_head relocs;
 
+	struct list_head submit_list;
+
 	/**
 	 * Track the most recently used object for relocations, as we
 	 * frequently have to perform multiple relocations within the same
@@ -341,6 +344,42 @@ static void eb_vma_array_put(struct eb_vma_array *arr)
 	kref_put(&arr->kref, eb_vma_array_destroy);
 }
 
+static int
+eb_lock_vma(struct i915_execbuffer *eb, struct ww_acquire_ctx *acquire)
+{
+	struct eb_vma *ev;
+	int err = 0;
+
+	list_for_each_entry(ev, &eb->submit_list, submit_link) {
+		struct i915_vma *vma = ev->vma;
+
+		err = ww_mutex_lock_interruptible(&vma->resv->lock, acquire);
+		if (err == -EDEADLK) {
+			struct eb_vma *unlock = ev, *en;
+
+			list_for_each_entry_safe_continue_reverse(unlock, en,
+								  &eb->submit_list,
+								  submit_link) {
+				ww_mutex_unlock(&unlock->vma->resv->lock);
+				list_move_tail(&unlock->submit_link, &eb->submit_list);
+			}
+
+			GEM_BUG_ON(!list_is_first(&ev->submit_link, &eb->submit_list));
+			err = ww_mutex_lock_slow_interruptible(&vma->resv->lock,
+							       acquire);
+		}
+		if (err) {
+			list_for_each_entry_continue_reverse(ev,
+							     &eb->submit_list,
+							     submit_link)
+				ww_mutex_unlock(&ev->vma->resv->lock);
+			break;
+		}
+	}
+
+	return err;
+}
+
 static int eb_create(struct i915_execbuffer *eb)
 {
 	/* Allocate an extra slot for use by the command parser + sentinel */
@@ -393,6 +432,10 @@ static int eb_create(struct i915_execbuffer *eb)
 		eb->lut_size = -eb->buffer_count;
 	}
 
+	INIT_LIST_HEAD(&eb->bind_list);
+	INIT_LIST_HEAD(&eb->submit_list);
+	INIT_LIST_HEAD(&eb->relocs);
+
 	return 0;
 }
 
@@ -574,6 +617,7 @@ eb_add_vma(struct i915_execbuffer *eb,
 	}
 
 	list_add_tail(&ev->bind_link, &eb->bind_list);
+	list_add_tail(&ev->submit_link, &eb->submit_list);
 
 	if (entry->relocation_count)
 		list_add_tail(&ev->reloc_link, &eb->relocs);
@@ -903,9 +947,6 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
 	unsigned int i;
 	int err = 0;
 
-	INIT_LIST_HEAD(&eb->bind_list);
-	INIT_LIST_HEAD(&eb->relocs);
-
 	for (i = 0; i < eb->buffer_count; i++) {
 		struct i915_vma *vma;
 
@@ -1576,38 +1617,19 @@ static int eb_relocate(struct i915_execbuffer *eb)
 
 static int eb_move_to_gpu(struct i915_execbuffer *eb)
 {
-	const unsigned int count = eb->buffer_count;
 	struct ww_acquire_ctx acquire;
-	unsigned int i;
+	struct eb_vma *ev;
 	int err = 0;
 
 	ww_acquire_init(&acquire, &reservation_ww_class);
 
-	for (i = 0; i < count; i++) {
-		struct eb_vma *ev = &eb->vma[i];
-		struct i915_vma *vma = ev->vma;
-
-		err = ww_mutex_lock_interruptible(&vma->resv->lock, &acquire);
-		if (err == -EDEADLK) {
-			GEM_BUG_ON(i == 0);
-			do {
-				int j = i - 1;
-
-				ww_mutex_unlock(&eb->vma[j].vma->resv->lock);
-
-				swap(eb->vma[i],  eb->vma[j]);
-			} while (--i);
+	err = eb_lock_vma(eb, &acquire);
+	if (err)
+		goto err_fini;
 
-			err = ww_mutex_lock_slow_interruptible(&vma->resv->lock,
-							       &acquire);
-		}
-		if (err)
-			break;
-	}
 	ww_acquire_done(&acquire);
 
-	while (i--) {
-		struct eb_vma *ev = &eb->vma[i];
+	list_for_each_entry(ev, &eb->submit_list, submit_link) {
 		struct i915_vma *vma = ev->vma;
 		unsigned int flags = ev->flags;
 		struct drm_i915_gem_object *obj = vma->obj;
@@ -1664,6 +1686,8 @@ static int eb_move_to_gpu(struct i915_execbuffer *eb)
 	intel_gt_chipset_flush(eb->engine->gt);
 	return 0;
 
+err_fini:
+	ww_acquire_fini(&acquire);
 err_skip:
 	i915_request_set_error_once(eb->request, err);
 	return err;
@@ -1945,9 +1969,10 @@ static int eb_parse(struct i915_execbuffer *eb)
 	if (err)
 		goto err_trampoline;
 
-	eb->vma[eb->buffer_count].vma = i915_vma_get(shadow);
-	eb->vma[eb->buffer_count].flags = __EXEC_OBJECT_HAS_PIN;
 	eb->batch = &eb->vma[eb->buffer_count++];
+	eb->batch->vma = i915_vma_get(shadow);
+	eb->batch->flags = __EXEC_OBJECT_HAS_PIN;
+	list_add_tail(&eb->batch->submit_link, &eb->submit_list);
 	eb->vma[eb->buffer_count].vma = NULL;
 
 	eb->trampoline = trampoline;
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Intel-gfx] [PATCH 17/23] drm/i915/gem: Asynchronous GTT unbinding
  2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
                   ` (15 preceding siblings ...)
  (?)
@ 2020-07-02  8:32 ` Chris Wilson
  2020-07-05 22:01   ` Andi Shyti
  -1 siblings, 1 reply; 56+ messages in thread
From: Chris Wilson @ 2020-07-02  8:32 UTC (permalink / raw)
  To: intel-gfx; +Cc: Matthew Auld, Chris Wilson

It is reasonably common for userspace (even modern drivers like iris) to
reuse an active address for a new buffer. This would cause the
application to stall under its mutex (originally struct_mutex) until the
old batches were idle and it could synchronously remove the stale PTE.
However, we can queue up a job that waits on the signals for the old
nodes to complete and, upon those signals, removes the old nodes,
replacing them with the new ones for the batch. This is still CPU
driven, but in theory we can do the GTT patching from the GPU. The job
itself has a
completion signal allowing the execbuf to wait upon the rebinding, and
also other observers to coordinate with the common VM activity.

Letting userspace queue up more work lets it do more without blocking
other clients. In turn, we take care not to let it queue up too much
concurrent work, creating a small number of queues for each context to
limit the number of concurrent tasks.

The implementation relies on only scheduling one unbind operation per
vma as we use the unbound vma->node location to track the stale PTE.
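
For orientation before the diff, a heavily condensed view of the new
flow (error handling, pinning and the page-table stash are elided; the
eb_* helpers are the ones added by this patch):

    work = eb_vm_work(eb, count); /* dma_fence_work + per-context queue slot */

    mutex_lock(&vm->mutex);

    /* order this job behind the previous async job from our context */
    err = eb_vm_throttle(work);

    /* claim drm_mm space; any stale nodes are parked on work->evict_list */
    for (n = 0; n < work->count; n++)
            err = eb_reserve_vma(work, &work->bind[n]);

    mutex_unlock(&vm->mutex);

    /* the worker waits for the evicted vma to idle, then rewrites the PTEs */
    dma_fence_work_commit_imm(&work->base);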

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1402
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 921 ++++++++++++++++--
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |   1 +
 drivers/gpu/drm/i915/gt/intel_gtt.c           |   4 +
 drivers/gpu/drm/i915/gt/intel_gtt.h           |   2 +
 drivers/gpu/drm/i915/i915_gem.c               |   7 +
 drivers/gpu/drm/i915/i915_gem_gtt.c           |   5 +
 drivers/gpu/drm/i915/i915_vma.c               |  71 +-
 drivers/gpu/drm/i915/i915_vma.h               |   4 +
 8 files changed, 886 insertions(+), 129 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index faff64b66484..5fa84c802312 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -18,6 +18,7 @@
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_buffer_pool.h"
 #include "gt/intel_gt_pm.h"
+#include "gt/intel_gt_requests.h"
 #include "gt/intel_ring.h"
 
 #include "i915_drv.h"
@@ -43,6 +44,12 @@ struct eb_vma {
 	u32 handle;
 };
 
+struct eb_bind_vma {
+	struct eb_vma *ev;
+	struct drm_mm_node hole;
+	unsigned int bind_flags;
+};
+
 struct eb_vma_array {
 	struct kref kref;
 	struct eb_vma vma[];
@@ -66,11 +73,12 @@ struct eb_vma_array {
 	 I915_EXEC_RESOURCE_STREAMER)
 
 /* Catch emission of unexpected errors for CI! */
+#define __EINVAL__ 22
 #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
 #undef EINVAL
 #define EINVAL ({ \
 	DRM_DEBUG_DRIVER("EINVAL at %s:%d\n", __func__, __LINE__); \
-	22; \
+	__EINVAL__; \
 })
 #endif
 
@@ -311,6 +319,12 @@ static struct eb_vma_array *eb_vma_array_create(unsigned int count)
 	return arr;
 }
 
+static struct eb_vma_array *eb_vma_array_get(struct eb_vma_array *arr)
+{
+	kref_get(&arr->kref);
+	return arr;
+}
+
 static inline void eb_unreserve_vma(struct eb_vma *ev)
 {
 	struct i915_vma *vma = ev->vma;
@@ -444,7 +458,7 @@ eb_vma_misplaced(const struct drm_i915_gem_exec_object2 *entry,
 		 const struct i915_vma *vma,
 		 unsigned int flags)
 {
-	if (vma->node.size < entry->pad_to_size)
+	if (vma->node.size < max(vma->size, entry->pad_to_size))
 		return true;
 
 	if (entry->alignment && !IS_ALIGNED(vma->node.start, entry->alignment))
@@ -469,32 +483,6 @@ eb_vma_misplaced(const struct drm_i915_gem_exec_object2 *entry,
 	return false;
 }
 
-static u64 eb_pin_flags(const struct drm_i915_gem_exec_object2 *entry,
-			unsigned int exec_flags)
-{
-	u64 pin_flags = 0;
-
-	if (exec_flags & EXEC_OBJECT_NEEDS_GTT)
-		pin_flags |= PIN_GLOBAL;
-
-	/*
-	 * Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
-	 * limit address to the first 4GBs for unflagged objects.
-	 */
-	if (!(exec_flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS))
-		pin_flags |= PIN_ZONE_4G;
-
-	if (exec_flags & __EXEC_OBJECT_NEEDS_MAP)
-		pin_flags |= PIN_MAPPABLE;
-
-	if (exec_flags & EXEC_OBJECT_PINNED)
-		pin_flags |= entry->offset | PIN_OFFSET_FIXED;
-	else if (exec_flags & __EXEC_OBJECT_NEEDS_BIAS)
-		pin_flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
-
-	return pin_flags;
-}
-
 static bool eb_pin_vma_fence_inplace(struct eb_vma *ev)
 {
 	struct i915_vma *vma = ev->vma;
@@ -522,6 +510,10 @@ eb_pin_vma_inplace(struct i915_execbuffer *eb,
 	struct i915_vma *vma = ev->vma;
 	unsigned int pin_flags;
 
+	/* Concurrent async binds in progress, get in the queue */
+	if (!i915_active_is_idle(&vma->vm->binding))
+		return false;
+
 	if (eb_vma_misplaced(entry, vma, ev->flags))
 		return false;
 
@@ -642,45 +634,460 @@ eb_add_vma(struct i915_execbuffer *eb,
 	}
 }
 
-static int eb_reserve_vma(const struct i915_execbuffer *eb,
-			  struct eb_vma *ev,
-			  u64 pin_flags)
+struct eb_vm_work {
+	struct dma_fence_work base;
+	struct eb_vma_array *array;
+	struct eb_bind_vma *bind;
+	struct i915_address_space *vm;
+	struct i915_vm_pt_stash stash;
+	struct list_head evict_list;
+	u64 *p_flags;
+	u64 id;
+	unsigned long count;
+};
+
+static inline u64 node_end(const struct drm_mm_node *node)
+{
+	return node->start + node->size;
+}
+
+static int set_bind_fence(struct i915_vma *vma, struct eb_vm_work *work)
+{
+	struct dma_fence *prev;
+	int err = 0;
+
+	lockdep_assert_held(&vma->vm->mutex);
+	prev = i915_active_set_exclusive(&vma->active, &work->base.dma);
+	if (unlikely(prev)) {
+		err = i915_sw_fence_await_dma_fence(&work->base.chain, prev, 0,
+						    GFP_NOWAIT | __GFP_NOWARN);
+		dma_fence_put(prev);
+	}
+
+	return err < 0 ? err : 0;
+}
+
+static int await_evict(struct eb_vm_work *work, struct i915_vma *vma)
 {
-	struct drm_i915_gem_exec_object2 *entry = ev->exec;
-	struct i915_vma *vma = ev->vma;
 	int err;
 
-	if (drm_mm_node_allocated(&vma->node) &&
-	    eb_vma_misplaced(entry, vma, ev->flags)) {
-		err = i915_vma_unbind(vma);
+	if (rcu_access_pointer(vma->active.excl.fence) == &work->base.dma)
+		return 0;
+
+	/* Wait for all other previous activity */
+	err = i915_sw_fence_await_active(&work->base.chain,
+					 &vma->active,
+					 I915_ACTIVE_AWAIT_ACTIVE);
+	/* Then insert along the exclusive vm->mutex timeline */
+	if (err == 0)
+		err = set_bind_fence(vma, work);
+
+	return err;
+}
+
+static int
+evict_for_node(struct eb_vm_work *work,
+	       struct eb_bind_vma *const target,
+	       unsigned int flags)
+{
+	struct i915_vma *target_vma = target->ev->vma;
+	struct i915_address_space *vm = target_vma->vm;
+	const unsigned long color = target_vma->node.color;
+	const u64 start = target_vma->node.start;
+	const u64 end = start + target_vma->node.size;
+	u64 hole_start = start, hole_end = end;
+	struct i915_vma *vma, *next;
+	struct drm_mm_node *node;
+	LIST_HEAD(evict_list);
+	LIST_HEAD(steal_list);
+	int err = 0;
+
+	lockdep_assert_held(&vm->mutex);
+	GEM_BUG_ON(drm_mm_node_allocated(&target_vma->node));
+	GEM_BUG_ON(!IS_ALIGNED(start, I915_GTT_PAGE_SIZE));
+	GEM_BUG_ON(!IS_ALIGNED(end, I915_GTT_PAGE_SIZE));
+
+	if (i915_vm_has_cache_coloring(vm)) {
+		/* Expand search to cover neighbouring guard pages (or lack!) */
+		if (hole_start)
+			hole_start -= I915_GTT_PAGE_SIZE;
+
+		/* Always look at the page afterwards to avoid the end-of-GTT */
+		hole_end += I915_GTT_PAGE_SIZE;
+	}
+	GEM_BUG_ON(hole_start >= hole_end);
+
+	drm_mm_for_each_node_in_range(node, &vm->mm, hole_start, hole_end) {
+		GEM_BUG_ON(node == &target_vma->node);
+
+		/* If we find any non-objects (!vma), we cannot evict them */
+		if (node->color == I915_COLOR_UNEVICTABLE) {
+			err = -ENOSPC;
+			goto err;
+		}
+
+		/*
+		 * If we are using coloring to insert guard pages between
+		 * different cache domains within the address space, we have
+		 * to check whether the objects on either side of our range
+		 * abutt and conflict. If they are in conflict, then we evict
+		 * those as well to make room for our guard pages.
+		 */
+		if (i915_vm_has_cache_coloring(vm)) {
+			if (node_end(node) == start && node->color == color)
+				continue;
+
+			if (node->start == end && node->color == color)
+				continue;
+		}
+
+		GEM_BUG_ON(!drm_mm_node_allocated(node));
+		vma = container_of(node, typeof(*vma), node);
+
+		if (i915_vma_is_pinned(vma)) {
+			err = -ENOSPC;
+			goto err;
+		}
+
+		/* If this VMA is already being freed, or idle, steal it! */
+		if (!i915_active_acquire_if_busy(&vma->active)) {
+			list_move(&vma->vm_link, &steal_list);
+			continue;
+		}
+
+		if (flags & PIN_NONBLOCK)
+			err = -EAGAIN;
+		else
+			err = await_evict(work, vma);
+		i915_active_release(&vma->active);
 		if (err)
-			return err;
+			goto err;
+
+		GEM_BUG_ON(!i915_vma_is_active(vma));
+		list_move(&vma->vm_link, &evict_list);
 	}
 
-	err = i915_vma_pin(vma,
-			   entry->pad_to_size, entry->alignment,
-			   eb_pin_flags(entry, ev->flags) | pin_flags);
-	if (err)
-		return err;
+	list_for_each_entry_safe(vma, next, &steal_list, vm_link) {
+		atomic_and(~I915_VMA_BIND_MASK, &vma->flags);
+		__i915_vma_evict(vma);
+		drm_mm_remove_node(&vma->node);
+		/* No ref held; vma may now be concurrently freed */
+	}
 
-	if (entry->offset != vma->node.start) {
-		entry->offset = vma->node.start | UPDATE;
-		eb->args->flags |= __EXEC_HAS_RELOC;
+	/* No overlapping nodes to evict, claim the slot for ourselves! */
+	if (list_empty(&evict_list))
+		return drm_mm_reserve_node(&vm->mm, &target_vma->node);
+
+	/*
+	 * Mark this range as reserved.
+	 *
+	 * We have not yet removed the PTEs for the old evicted nodes, so
+	 * must prevent this range from being reused for anything else. The
+	 * PTE will be cleared when the range is idle (during the rebind
+	 * phase in the worker).
+	 */
+	target->hole.color = I915_COLOR_UNEVICTABLE;
+	target->hole.start = start;
+	target->hole.size = end;
+
+	list_for_each_entry(vma, &evict_list, vm_link) {
+		target->hole.start =
+			min(target->hole.start, vma->node.start);
+		target->hole.size =
+			max(target->hole.size, node_end(&vma->node));
+
+		GEM_BUG_ON(vma->node.mm != &vm->mm);
+		drm_mm_remove_node(&vma->node);
+		atomic_and(~I915_VMA_BIND_MASK, &vma->flags);
+		GEM_BUG_ON(i915_vma_is_pinned(vma));
 	}
+	list_splice(&evict_list, &work->evict_list);
 
-	if (unlikely(ev->flags & EXEC_OBJECT_NEEDS_FENCE)) {
-		err = i915_vma_pin_fence(vma);
-		if (unlikely(err)) {
-			i915_vma_unpin(vma);
-			return err;
+	target->hole.size -= target->hole.start;
+
+	return drm_mm_reserve_node(&vm->mm, &target->hole);
+
+err:
+	list_splice(&evict_list, &vm->bound_list);
+	list_splice(&steal_list, &vm->bound_list);
+	return err;
+}
+
+static int
+evict_in_range(struct eb_vm_work *work,
+	       struct eb_bind_vma * const target,
+	       u64 start, u64 end, u64 align)
+{
+	struct i915_vma *target_vma = target->ev->vma;
+	struct i915_address_space *vm = target_vma->vm;
+	struct i915_vma *active = NULL;
+	struct i915_vma *vma, *next;
+	struct drm_mm_scan scan;
+	LIST_HEAD(evict_list);
+	bool found = false;
+
+	lockdep_assert_held(&vm->mutex);
+
+	drm_mm_scan_init_with_range(&scan, &vm->mm,
+				    target_vma->node.size,
+				    align,
+				    target_vma->node.color,
+				    start, end,
+				    DRM_MM_INSERT_BEST);
+
+	list_for_each_entry_safe(vma, next, &vm->bound_list, vm_link) {
+		if (i915_vma_is_pinned(vma))
+			continue;
+
+		if (vma == active)
+			active = ERR_PTR(-EAGAIN);
+
+		/* Prefer to reuse idle nodes; push all active vma to the end */
+		if (active != ERR_PTR(-EAGAIN) && i915_vma_is_active(vma)) {
+			if (!active)
+				active = vma;
+
+			list_move_tail(&vma->vm_link, &vm->bound_list);
+			continue;
 		}
 
+		list_move(&vma->vm_link, &evict_list);
+		if (drm_mm_scan_add_block(&scan, &vma->node)) {
+			target_vma->node.start =
+				round_up(scan.hit_start, align);
+			found = true;
+			break;
+		}
+	}
+
+	list_for_each_entry(vma, &evict_list, vm_link)
+		drm_mm_scan_remove_block(&scan, &vma->node);
+	list_splice(&evict_list, &vm->bound_list);
+	if (!found)
+		return -ENOSPC;
+
+	return evict_for_node(work, target, 0);
+}
+
+static u64 random_offset(u64 start, u64 end, u64 len, u64 align)
+{
+	u64 range, addr;
+
+	GEM_BUG_ON(range_overflows(start, len, end));
+	GEM_BUG_ON(round_up(start, align) > round_down(end - len, align));
+
+	range = round_down(end - len, align) - round_up(start, align);
+	if (range) {
+		if (sizeof(unsigned long) == sizeof(u64)) {
+			addr = get_random_long();
+		} else {
+			addr = get_random_int();
+			if (range > U32_MAX) {
+				addr <<= 32;
+				addr |= get_random_int();
+			}
+		}
+		div64_u64_rem(addr, range, &addr);
+		start += addr;
+	}
+
+	return round_up(start, align);
+}
+
+static u64 align0(u64 align)
+{
+	return align <= I915_GTT_MIN_ALIGNMENT ? 0 : align;
+}
+
+static struct drm_mm_node *__best_hole(struct drm_mm *mm, u64 size)
+{
+	struct rb_node *rb = mm->holes_size.rb_root.rb_node;
+	struct drm_mm_node *best = NULL;
+
+	while (rb) {
+		struct drm_mm_node *node =
+			rb_entry(rb, struct drm_mm_node, rb_hole_size);
+
+		if (size <= node->hole_size) {
+			best = node;
+			rb = rb->rb_right;
+		} else {
+			rb = rb->rb_left;
+		}
+	}
+
+	return best;
+}
+
+static int best_hole(struct drm_mm *mm, struct drm_mm_node *node,
+		     u64 start, u64 end, u64 align)
+{
+	struct drm_mm_node *hole;
+	u64 size = node->size;
+
+	do {
+		hole = __best_hole(mm, size);
+		if (!hole)
+			return -ENOSPC;
+
+		node->start = round_up(max(start, drm_mm_hole_node_start(hole)),
+				       align);
+		if (min(drm_mm_hole_node_end(hole), end) >=
+		    node->start + node->size)
+			return drm_mm_reserve_node(mm, node);
+
+		/*
+		 * Too expensive to search for every single hole every time,
+		 * so just look for the next bigger hole, introducing enough
+		 * space for alignments. Finding the smallest hole with ideal
+		 * alignment scales very poorly, so we choose to waste space
+		 * if an alignment is forced. On the other hand, simply
+		 * randomly selecting an offset in 48b space will cause us
+		 * to use the majority of that space and exhaust all memory
+		 * in storing the page directories. Compromise is required.
+		 */
+		size = hole->hole_size + align;
+	} while (1);
+}
+
+static int eb_reserve_vma(struct eb_vm_work *work, struct eb_bind_vma *bind)
+{
+	struct drm_i915_gem_exec_object2 *entry = bind->ev->exec;
+	const unsigned int exec_flags = bind->ev->flags;
+	struct i915_vma *vma = bind->ev->vma;
+	struct i915_address_space *vm = vma->vm;
+	u64 start = 0, end = vm->total;
+	u64 align = entry->alignment ?: I915_GTT_MIN_ALIGNMENT;
+	unsigned int bind_flags;
+	int err;
+
+	lockdep_assert_held(&vm->mutex);
+
+	bind_flags = PIN_USER;
+	if (exec_flags & EXEC_OBJECT_NEEDS_GTT)
+		bind_flags |= PIN_GLOBAL;
+
+	if (drm_mm_node_allocated(&vma->node))
+		goto pin;
+
+	GEM_BUG_ON(i915_vma_is_pinned(vma));
+	GEM_BUG_ON(i915_vma_is_bound(vma, I915_VMA_BIND_MASK));
+	GEM_BUG_ON(i915_active_fence_isset(&vma->active.excl));
+	GEM_BUG_ON(!vma->size);
+
+	/* Reuse old address (if it doesn't conflict with new requirements) */
+	if (eb_vma_misplaced(entry, vma, exec_flags)) {
+		vma->node.start = entry->offset & PIN_OFFSET_MASK;
+		vma->node.size = max(entry->pad_to_size, vma->size);
+		vma->node.color = 0;
+		if (i915_vm_has_cache_coloring(vm))
+			vma->node.color = vma->obj->cache_level;
+	}
+
+	/*
+	 * Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
+	 * limit address to the first 4GBs for unflagged objects.
+	 */
+	if (!(exec_flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS))
+		end = min_t(u64, end, (1ULL << 32) - I915_GTT_PAGE_SIZE);
+
+	align = max(align, vma->display_alignment);
+	if (exec_flags & __EXEC_OBJECT_NEEDS_MAP) {
+		vma->node.size = max_t(u64, vma->node.size, vma->fence_size);
+		end = min_t(u64, end, i915_vm_to_ggtt(vm)->mappable_end);
+		align = max_t(u64, align, vma->fence_alignment);
+	}
+
+	if (exec_flags & __EXEC_OBJECT_NEEDS_BIAS)
+		start = BATCH_OFFSET_BIAS;
+
+	GEM_BUG_ON(!vma->node.size);
+	if (vma->node.size > end - start)
+		return -E2BIG;
+
+	/* Try the user's preferred location first (mandatory if soft-pinned) */
+	err = -__EINVAL__;
+	if (vma->node.start >= start &&
+	    IS_ALIGNED(vma->node.start, align) &&
+	    !range_overflows(vma->node.start, vma->node.size, end)) {
+		unsigned int pin_flags;
+
+		if (drm_mm_reserve_node(&vm->mm, &vma->node) == 0)
+			goto pin;
+
+		pin_flags = 0;
+		if (!(exec_flags & EXEC_OBJECT_PINNED))
+			pin_flags = PIN_NONBLOCK;
+
+		err = evict_for_node(work, bind, pin_flags);
+		if (err == 0)
+			goto pin;
+	}
+	if (exec_flags & EXEC_OBJECT_PINNED)
+		return err;
+
+	/* Try the first available free space */
+	if (!best_hole(&vm->mm, &vma->node, start, end, align))
+		goto pin;
+
+	/* Pick a random slot and see if it's available [O(N) worst case] */
+	vma->node.start = random_offset(start, end, vma->node.size, align);
+	if (evict_for_node(work, bind, 0) == 0)
+		goto pin;
+
+	/* Otherwise search all free space [degrades to O(N^2)] */
+	if (drm_mm_insert_node_in_range(&vm->mm, &vma->node,
+					vma->node.size,
+					align0(align),
+					vma->node.color,
+					start, end,
+					DRM_MM_INSERT_BEST) == 0)
+		goto pin;
+
+	/* Pretty busy! Loop over "LRU" and evict oldest in our search range */
+	err = evict_in_range(work, bind, start, end, align);
+	if (unlikely(err))
+		return err;
+
+pin:
+	if (unlikely(exec_flags & EXEC_OBJECT_NEEDS_FENCE)) {
+		err = __i915_vma_pin_fence(vma); /* XXX no waiting */
+		if (unlikely(err))
+			return err;
+
 		if (vma->fence)
-			ev->flags |= __EXEC_OBJECT_HAS_FENCE;
+			bind->ev->flags |= __EXEC_OBJECT_HAS_FENCE;
 	}
 
-	ev->flags |= __EXEC_OBJECT_HAS_PIN;
-	GEM_BUG_ON(eb_vma_misplaced(entry, vma, ev->flags));
+	bind_flags &= ~atomic_read(&vma->flags);
+	if (bind_flags) {
+		err = set_bind_fence(vma, work);
+		if (unlikely(err))
+			return err;
+
+		atomic_add(I915_VMA_PAGES_ACTIVE, &vma->pages_count);
+		atomic_or(bind_flags, &vma->flags);
+
+		if (i915_vma_is_ggtt(vma))
+			__i915_vma_set_map_and_fenceable(vma);
+
+		GEM_BUG_ON(!i915_vma_is_active(vma));
+		list_move_tail(&vma->vm_link, &vm->bound_list);
+		bind->bind_flags = bind_flags;
+	}
+	__i915_vma_pin(vma); /* and release */
+
+	GEM_BUG_ON(!bind_flags && !drm_mm_node_allocated(&vma->node));
+	GEM_BUG_ON(!(drm_mm_node_allocated(&vma->node) ^
+		     drm_mm_node_allocated(&bind->hole)));
+
+	if (entry->offset != vma->node.start) {
+		entry->offset = vma->node.start | UPDATE;
+		*work->p_flags |= __EXEC_HAS_RELOC;
+	}
+
+	bind->ev->flags |= __EXEC_OBJECT_HAS_PIN;
+	GEM_BUG_ON(eb_vma_misplaced(entry, vma, bind->ev->flags));
 
 	return 0;
 }
@@ -714,13 +1121,260 @@ static int wait_for_timeline(struct intel_timeline *tl)
 	} while (1);
 }
 
+static int __eb_bind_vma(struct eb_vm_work *work, int err)
+{
+	struct i915_address_space *vm = work->vm;
+	unsigned long n;
+
+	GEM_BUG_ON(!intel_gt_pm_is_awake(vm->gt));
+
+	/*
+	 * We have to wait until the stale nodes are completely idle before
+	 * we can remove their PTE and unbind their pages. Hence, after
+	 * claiming their slot in the drm_mm, we defer their removal to
+	 * after the fences are signaled.
+	 */
+	if (!list_empty(&work->evict_list)) {
+		struct i915_vma *vma, *vn;
+
+		mutex_lock(&vm->mutex);
+		list_for_each_entry_safe(vma, vn, &work->evict_list, vm_link) {
+			GEM_BUG_ON(vma->vm != vm);
+			__i915_vma_evict(vma);
+			GEM_BUG_ON(!i915_vma_is_active(vma));
+		}
+		mutex_unlock(&vm->mutex);
+	}
+
+	/*
+	 * Now we know the nodes we require in drm_mm are idle, we can
+	 * replace the PTE in those ranges with our own.
+	 */
+	for (n = 0; n < work->count; n++) {
+		struct eb_bind_vma *bind = &work->bind[n];
+		struct i915_vma *vma = bind->ev->vma;
+
+		if (!bind->bind_flags)
+			goto put;
+
+		GEM_BUG_ON(vma->vm != vm);
+		GEM_BUG_ON(!i915_vma_is_active(vma));
+
+		if (err == 0)
+			vma->ops->bind_vma(work->vm,
+					   &work->stash,
+					   vma,
+					   vma->obj->cache_level,
+					   bind->bind_flags);
+		else
+			atomic_and(~bind->bind_flags, &vma->flags);
+
+		if (drm_mm_node_allocated(&bind->hole)) {
+			mutex_lock(&vm->mutex);
+			GEM_BUG_ON(bind->hole.mm != &vm->mm);
+			GEM_BUG_ON(bind->hole.color != I915_COLOR_UNEVICTABLE);
+			GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
+			drm_mm_remove_node(&bind->hole);
+			if (!err) {
+				drm_mm_reserve_node(&vm->mm, &vma->node);
+				GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
+			} else {
+				list_del_init(&vma->vm_link);
+			}
+			mutex_unlock(&vm->mutex);
+		}
+		bind->bind_flags = 0;
+
+put:
+		GEM_BUG_ON(drm_mm_node_allocated(&bind->hole));
+		i915_vma_put_pages(vma);
+	}
+	work->count = 0;
+
+	return err;
+}
+
+static int eb_bind_vma(struct dma_fence_work *base)
+{
+	struct eb_vm_work *work = container_of(base, typeof(*work), base);
+
+	return __eb_bind_vma(work, 0);
+}
+
+static void eb_vma_work_release(struct dma_fence_work *base)
+{
+	struct eb_vm_work *work = container_of(base, typeof(*work), base);
+	unsigned long n;
+
+	if (work->id) {
+		if (work->count) {
+			GEM_BUG_ON(!work->base.dma.error);
+			__eb_bind_vma(work, work->base.dma.error);
+		}
+		i915_active_release(&work->vm->binding);
+	}
+
+	for (n = 0; n < work->count; n++)
+		i915_vma_put_pages(work->bind[n].ev->vma);
+	kvfree(work->bind);
+
+	eb_vma_array_put(work->array);
+
+	i915_vm_free_pt_stash(work->vm, &work->stash);
+	i915_vm_put(work->vm);
+}
+
+static const struct dma_fence_work_ops eb_bind_ops = {
+	.name = "eb_bind",
+	.work = eb_bind_vma,
+	.release = eb_vma_work_release,
+};
+
+static int eb_vm_work_cancel(struct eb_vm_work *work, int err)
+{
+	work->base.dma.error = err;
+	dma_fence_work_commit_imm(&work->base);
+
+	return err;
+}
+
+static struct eb_vm_work *eb_vm_work(struct i915_execbuffer *eb,
+				     unsigned long count)
+{
+	struct eb_vm_work *work;
+
+	work = kmalloc(sizeof(*work), GFP_KERNEL);
+	if (!work)
+		return NULL;
+
+	work->bind = kvmalloc(sizeof(*work->bind) * count, GFP_KERNEL);
+	if (!work->bind) {
+		kfree(work);
+		return NULL;
+	}
+	work->count = count;
+
+	INIT_LIST_HEAD(&work->evict_list);
+
+	dma_fence_work_init(&work->base, &eb_bind_ops);
+	work->array = eb_vma_array_get(eb->array);
+	work->p_flags = &eb->args->flags;
+	work->vm = i915_vm_get(eb->context->vm);
+	memset(&work->stash, 0, sizeof(work->stash));
+
+	/* Preallocate our slot in vm->binding, outside of vm->mutex */
+	work->id = i915_gem_context_async_id(eb->gem_context);
+	if (i915_active_acquire_for_context(&work->vm->binding, work->id)) {
+		work->id = 0;
+		eb_vm_work_cancel(work, -ENOMEM);
+		return NULL;
+	}
+
+	return work;
+}
+
+static int eb_vm_throttle(struct eb_vm_work *work)
+{
+	struct dma_fence *p;
+	int err;
+
+	/* Keep async work queued per context */
+	p = __i915_active_ref(&work->vm->binding, work->id, &work->base.dma);
+	if (IS_ERR_OR_NULL(p))
+		return PTR_ERR_OR_ZERO(p);
+
+	err = i915_sw_fence_await_dma_fence(&work->base.chain, p, 0,
+					    GFP_NOWAIT | __GFP_NOWARN);
+	dma_fence_put(p);
+
+	return err < 0 ? err : 0;
+}
+
+static int eb_prepare_vma(struct eb_vm_work *work,
+			  unsigned long idx,
+			  struct eb_vma *ev)
+{
+	struct eb_bind_vma *bind = &work->bind[idx];
+	struct i915_vma *vma = ev->vma;
+	u64 max_size;
+	int err;
+
+	bind->ev = ev;
+	bind->hole.flags = 0;
+	bind->bind_flags = 0;
+
+	/* Allocate enough page directories to cover worst case */
+	max_size = max(vma->size, ev->exec->pad_to_size);
+	if (ev->flags & __EXEC_OBJECT_NEEDS_MAP)
+		max_size = max_t(u64, max_size, vma->fence_size);
+
+	err = i915_vm_alloc_pt_stash(work->vm, &work->stash, max_size);
+	if (err)
+		return err;
+
+	return i915_vma_get_pages(vma);
+}
+
+static int wait_for_unbinds(struct i915_execbuffer *eb,
+			    struct list_head *unbound,
+			    int pass)
+{
+	struct eb_vma *ev;
+	int err;
+
+	list_for_each_entry(ev, unbound, unbound_link) {
+		struct i915_vma *vma = ev->vma;
+
+		GEM_BUG_ON(ev->flags & __EXEC_OBJECT_HAS_PIN);
+
+		if (drm_mm_node_allocated(&vma->node) &&
+		    eb_vma_misplaced(ev->exec, vma, ev->flags)) {
+			err = i915_vma_unbind(vma);
+			if (err)
+				return err;
+		}
+
+		/* Wait for previous to avoid reusing vma->node */
+		err = i915_vma_wait_for_unbind(vma);
+		if (err)
+			return err;
+	}
+
+	switch (pass) {
+	default:
+		return -ENOSPC;
+
+	case 2:
+		/*
+		 * Too fragmented, retire everything on the timeline and so
+		 * make it all [contexts included] available to evict.
+		 */
+		err = wait_for_timeline(eb->context->timeline);
+		if (err)
+			return err;
+
+		fallthrough;
+	case 1:
+		/* XXX ticket lock */
+		if (i915_active_wait(&eb->context->vm->binding))
+			return -EINTR;
+
+		fallthrough;
+	case 0:
+		return 0;
+	}
+}
+
 static int eb_reserve_vm(struct i915_execbuffer *eb)
 {
-	unsigned int pin_flags = PIN_USER | PIN_NONBLOCK;
+	struct i915_address_space *vm = eb->context->vm;
 	struct list_head last, unbound;
-	struct eb_vma *ev;
+	unsigned long count;
 	unsigned int pass;
+	struct eb_vma *ev;
+	int err = 0;
 
+	count = 0;
 	INIT_LIST_HEAD(&unbound);
 	list_for_each_entry(ev, &eb->bind_list, bind_link) {
 		struct drm_i915_gem_exec_object2 *entry = ev->exec;
@@ -737,44 +1391,87 @@ static int eb_reserve_vm(struct i915_execbuffer *eb)
 				list_add(&ev->unbound_link, &unbound);
 			else
 				list_add_tail(&ev->unbound_link, &unbound);
+			count++;
 		}
 	}
-
-	if (list_empty(&unbound))
+	if (count == 0)
 		return 0;
 
-	/*
-	 * Attempt to pin all of the buffers into the GTT.
-	 * This is done in 3 phases:
-	 *
-	 * 1a. Unbind all objects that do not match the GTT constraints for
-	 *     the execbuffer (fenceable, mappable, alignment etc).
-	 * 1b. Increment pin count for already bound objects.
-	 * 2.  Bind new objects.
-	 * 3.  Decrement pin count.
-	 *
-	 * This avoid unnecessary unbinding of later objects in order to make
-	 * room for the earlier objects *unless* we need to defragment.
-	 */
-
 	pass = 0;
 	do {
-		int err = 0;
+		struct eb_vm_work *work;
 
-		if (mutex_lock_interruptible(&eb->i915->drm.struct_mutex))
-			return -EINTR;
+		work = eb_vm_work(eb, count);
+		if (!work)
+			return -ENOMEM;
 
+		count = 0;
 		list_for_each_entry(ev, &unbound, unbound_link) {
-			err = eb_reserve_vma(eb, ev, pin_flags);
+			err = eb_prepare_vma(work, count++, ev);
+			if (err) {
+				work->count = count - 1;
+
+				if (eb_vm_work_cancel(work, err) == -EAGAIN)
+					goto retry;
+
+				return err;
+			}
+		}
+
+		/* No allocations allowed beyond this point */
+		if (mutex_lock_interruptible(&vm->mutex))
+			return eb_vm_work_cancel(work, -EINTR);
+
+		err = eb_vm_throttle(work);
+		if (err) {
+			mutex_unlock(&vm->mutex);
+			return eb_vm_work_cancel(work, err);
+		}
+
+		for (count = 0; count < work->count; count++) {
+			struct eb_bind_vma *bind = &work->bind[count];
+			struct i915_vma *vma;
+
+			ev = bind->ev;
+			vma = ev->vma;
+
+			/*
+			 * Check if this node is being evicted or must be.
+			 *
+			 * As we use the single node inside the vma to track
+			 * both the eviction and where to insert the new node,
+			 * we cannot handle migrating the vma inside the worker.
+			 */
+			if (drm_mm_node_allocated(&vma->node)) {
+				if (eb_vma_misplaced(ev->exec, vma, ev->flags)) {
+					err = -ENOSPC;
+					break;
+				}
+			} else {
+				if (i915_vma_is_active(vma)) {
+					err = -ENOSPC;
+					break;
+				}
+			}
+
+			err = i915_active_acquire(&vma->active);
+			if (!err) {
+				err = eb_reserve_vma(work, bind);
+				i915_active_release(&vma->active);
+			}
 			if (err)
 				break;
 		}
-		if (!(err == -ENOSPC || err == -EAGAIN)) {
-			mutex_unlock(&eb->i915->drm.struct_mutex);
+
+		mutex_unlock(&vm->mutex);
+
+		dma_fence_work_commit_imm(&work->base);
+		if (err != -ENOSPC)
 			return err;
-		}
 
+retry:
 		/* Resort *all* the objects into priority order */
+		count = 0;
 		INIT_LIST_HEAD(&unbound);
 		INIT_LIST_HEAD(&last);
 		list_for_each_entry(ev, &eb->bind_list, bind_link) {
@@ -785,6 +1482,7 @@ static int eb_reserve_vm(struct i915_execbuffer *eb)
 				continue;
 
 			eb_unreserve_vma(ev);
+			count++;
 
 			if (flags & EXEC_OBJECT_PINNED)
 				/* Pinned must have their slot */
@@ -799,34 +1497,21 @@ static int eb_reserve_vm(struct i915_execbuffer *eb)
 				list_add_tail(&ev->unbound_link, &last);
 		}
 		list_splice_tail(&last, &unbound);
-		mutex_unlock(&eb->i915->drm.struct_mutex);
+		GEM_BUG_ON(!count);
 
-		if (err == -EAGAIN) {
-			flush_workqueue(eb->i915->mm.userptr_wq);
-			continue;
-		}
-
-		switch (pass++) {
-		case 0:
-			break;
-
-		case 1:
-			/*
-			 * Too fragmented, retire everything on the timeline
-			 * and so make it all [contexts included] available to
-			 * evict.
-			 */
-			err = wait_for_timeline(eb->context->timeline);
-			if (err)
-				return err;
+		if (signal_pending(current))
+			return -EINTR;
 
-			break;
+		/* Now safe to wait with no reservations held */
 
-		default:
-			return -ENOSPC;
+		if (err == -EAGAIN) {
+			flush_workqueue(eb->i915->mm.userptr_wq);
+			pass = 0;
 		}
 
-		pin_flags = PIN_USER;
+		err = wait_for_unbinds(eb, &unbound, pass++);
+		if (err)
+			return err;
 	} while (1);
 }
 
@@ -1411,6 +2096,29 @@ relocate_entry(struct i915_execbuffer *eb,
 	return target->node.start | UPDATE;
 }
 
+static int gen6_fixup_ggtt(struct i915_vma *vma)
+{
+	int err;
+
+	if (i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND))
+		return 0;
+
+	err = i915_vma_wait_for_bind(vma);
+	if (err)
+		return err;
+
+	mutex_lock(&vma->vm->mutex);
+	if (!(atomic_fetch_or(I915_VMA_GLOBAL_BIND, &vma->flags) & I915_VMA_GLOBAL_BIND)) {
+		__i915_gem_object_pin_pages(vma->obj);
+		vma->ops->bind_vma(vma->vm, NULL, vma,
+				   vma->obj->cache_level,
+				   I915_VMA_GLOBAL_BIND);
+	}
+	mutex_unlock(&vma->vm->mutex);
+
+	return 0;
+}
+
 static u64
 eb_relocate_entry(struct i915_execbuffer *eb,
 		  struct eb_vma *ev,
@@ -1425,6 +2133,8 @@ eb_relocate_entry(struct i915_execbuffer *eb,
 	if (unlikely(!target))
 		return -ENOENT;
 
+	GEM_BUG_ON(!i915_vma_is_pinned(target->vma));
+
 	/* Validate that the target is in a valid r/w GPU domain */
 	if (unlikely(reloc->write_domain & (reloc->write_domain - 1))) {
 		drm_dbg(&i915->drm, "reloc with multiple write domains: "
@@ -1459,9 +2169,7 @@ eb_relocate_entry(struct i915_execbuffer *eb,
 		 */
 		if (reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION &&
 		    IS_GEN(eb->i915, 6)) {
-			err = i915_vma_bind(target->vma,
-					    target->vma->obj->cache_level,
-					    PIN_GLOBAL, NULL);
+			err = gen6_fixup_ggtt(target->vma);
 			if (err)
 				return err;
 		}
@@ -1673,7 +2381,6 @@ static int eb_move_to_gpu(struct i915_execbuffer *eb)
 			err = i915_vma_move_to_active(vma, eb->request, flags);
 
 		i915_vma_unlock(vma);
-		eb_unreserve_vma(ev);
 	}
 	ww_acquire_fini(&acquire);
 
@@ -2616,7 +3323,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	 * snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
 	 * batch" bit. Hence we need to pin secure batches into the global gtt.
 	 * hsw should have this fixed, but bdw mucks it up again. */
-	batch = eb.batch->vma;
+	batch = i915_vma_get(eb.batch->vma);
 	if (eb.batch_flags & I915_DISPATCH_SECURE) {
 		struct i915_vma *vma;
 
@@ -2636,6 +3343,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 			goto err_parse;
 		}
 
+		GEM_BUG_ON(vma->obj != batch->obj);
 		batch = vma;
 	}
 
@@ -2715,6 +3423,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 err_parse:
 	if (batch->private)
 		intel_gt_buffer_pool_put(batch->private);
+	i915_vma_put(batch);
 err_vma:
 	if (eb.trampoline)
 		i915_vma_unpin(eb.trampoline);
diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
index 226e404c706d..a717d53e9b37 100644
--- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
@@ -352,6 +352,7 @@ static struct i915_vma *pd_vma_create(struct gen6_ppgtt *ppgtt, int size)
 	atomic_set(&vma->flags, I915_VMA_GGTT);
 	vma->ggtt_view.type = I915_GGTT_VIEW_ROTATED; /* prevent fencing */
 
+	INIT_LIST_HEAD(&vma->vm_link);
 	INIT_LIST_HEAD(&vma->obj_link);
 	INIT_LIST_HEAD(&vma->closed_link);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index e0cc90942848..29b98828fd12 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -58,6 +58,8 @@ void __i915_vm_close(struct i915_address_space *vm)
 
 void i915_address_space_fini(struct i915_address_space *vm)
 {
+	i915_active_fini(&vm->binding);
+
 	drm_mm_takedown(&vm->mm);
 	mutex_destroy(&vm->mutex);
 }
@@ -103,6 +105,8 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 	drm_mm_init(&vm->mm, 0, vm->total);
 	vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
 
+	i915_active_init(&vm->binding, NULL, NULL);
+
 	INIT_LIST_HEAD(&vm->bound_list);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 57b31b36285f..c433e9c02842 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -250,6 +250,8 @@ struct i915_address_space {
 	 */
 	struct list_head bound_list;
 
+	struct i915_active binding;
+
 	/* Global GTT */
 	bool is_ggtt:1;
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9aa3066cb75d..e998f25f30a3 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -997,6 +997,9 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 		return vma;
 
 	if (i915_vma_misplaced(vma, size, alignment, flags)) {
+		if (flags & PIN_NOEVICT)
+			return ERR_PTR(-ENOSPC);
+
 		if (flags & PIN_NONBLOCK) {
 			if (i915_vma_is_pinned(vma) || i915_vma_is_active(vma))
 				return ERR_PTR(-ENOSPC);
@@ -1016,6 +1019,10 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 			return ERR_PTR(ret);
 	}
 
+	if (flags & PIN_NONBLOCK &&
+	    i915_active_fence_isset(&vma->active.excl))
+		return ERR_PTR(-EAGAIN);
+
 	ret = i915_vma_pin(vma, size, alignment, flags | PIN_GLOBAL);
 	if (ret)
 		return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index cb43381b0d37..7e1225874b03 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -219,6 +219,8 @@ int i915_gem_gtt_insert(struct i915_address_space *vm,
 		mode = DRM_MM_INSERT_HIGHEST;
 	if (flags & PIN_MAPPABLE)
 		mode = DRM_MM_INSERT_LOW;
+	if (flags & PIN_NOSEARCH)
+		mode |= DRM_MM_INSERT_ONCE;
 
 	/* We only allocate in PAGE_SIZE/GTT_PAGE_SIZE (4096) chunks,
 	 * so we know that we always have a minimum alignment of 4096.
@@ -236,6 +238,9 @@ int i915_gem_gtt_insert(struct i915_address_space *vm,
 	if (err != -ENOSPC)
 		return err;
 
+	if (flags & PIN_NOSEARCH)
+		return -ENOSPC;
+
 	if (mode & DRM_MM_INSERT_ONCE) {
 		err = drm_mm_insert_node_in_range(&vm->mm, node,
 						  size, alignment, color,
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index a00a026076e4..484e7a4cccfa 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -132,6 +132,7 @@ vma_create(struct drm_i915_gem_object *obj,
 		fs_reclaim_release(GFP_KERNEL);
 	}
 
+	INIT_LIST_HEAD(&vma->vm_link);
 	INIT_LIST_HEAD(&vma->closed_link);
 
 	if (view && view->type != I915_GGTT_VIEW_NORMAL) {
@@ -345,25 +346,37 @@ struct i915_vma_work *i915_vma_work(void)
 	return vw;
 }
 
-int i915_vma_wait_for_bind(struct i915_vma *vma)
+static int
+__i915_vma_wait_excl(struct i915_vma *vma, bool bound, unsigned int flags)
 {
+	struct dma_fence *fence;
 	int err = 0;
 
-	if (rcu_access_pointer(vma->active.excl.fence)) {
-		struct dma_fence *fence;
+	fence = i915_active_fence_get(&vma->active.excl);
+	if (!fence)
+		return 0;
 
-		rcu_read_lock();
-		fence = dma_fence_get_rcu_safe(&vma->active.excl.fence);
-		rcu_read_unlock();
-		if (fence) {
-			err = dma_fence_wait(fence, MAX_SCHEDULE_TIMEOUT);
-			dma_fence_put(fence);
-		}
+	if (drm_mm_node_allocated(&vma->node) == bound) {
+		if (flags & PIN_NOEVICT)
+			err = -EBUSY;
+		else
+			err = dma_fence_wait(fence, true);
 	}
 
+	dma_fence_put(fence);
 	return err;
 }
 
+int i915_vma_wait_for_bind(struct i915_vma *vma)
+{
+	return __i915_vma_wait_excl(vma, true, 0);
+}
+
+int i915_vma_wait_for_unbind(struct i915_vma *vma)
+{
+	return __i915_vma_wait_excl(vma, false, 0);
+}
+
 /**
  * i915_vma_bind - Sets up PTEs for an VMA in it's corresponding address space.
  * @vma: VMA to map
@@ -628,8 +641,9 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 	u64 start, end;
 	int ret;
 
-	GEM_BUG_ON(i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND));
+	GEM_BUG_ON(i915_vma_is_bound(vma, I915_VMA_BIND_MASK));
 	GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
+	GEM_BUG_ON(i915_active_fence_isset(&vma->active.excl));
 
 	size = max(size, vma->size);
 	alignment = max(alignment, vma->display_alignment);
@@ -725,7 +739,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
 	GEM_BUG_ON(!i915_gem_valid_gtt_space(vma, color));
 
-	list_add_tail(&vma->vm_link, &vma->vm->bound_list);
+	list_move_tail(&vma->vm_link, &vma->vm->bound_list);
 
 	return 0;
 }
@@ -733,15 +747,12 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 static void
 i915_vma_detach(struct i915_vma *vma)
 {
-	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
-	GEM_BUG_ON(i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND));
-
 	/*
 	 * And finally now the object is completely decoupled from this
 	 * vma, we can drop its hold on the backing storage and allow
 	 * it to be reaped by the shrinker.
 	 */
-	list_del(&vma->vm_link);
+	list_del_init(&vma->vm_link);
 }
 
 bool i915_vma_pin_inplace(struct i915_vma *vma, unsigned int flags)
@@ -789,7 +800,7 @@ bool i915_vma_pin_inplace(struct i915_vma *vma, unsigned int flags)
 	return pinned;
 }
 
-static int vma_get_pages(struct i915_vma *vma)
+int i915_vma_get_pages(struct i915_vma *vma)
 {
 	int err = 0;
 
@@ -836,7 +847,7 @@ static void __vma_put_pages(struct i915_vma *vma, unsigned int count)
 	mutex_unlock(&vma->pages_mutex);
 }
 
-static void vma_put_pages(struct i915_vma *vma)
+void i915_vma_put_pages(struct i915_vma *vma)
 {
 	if (atomic_add_unless(&vma->pages_count, -1, 1))
 		return;
@@ -853,9 +864,13 @@ static void vma_unbind_pages(struct i915_vma *vma)
 	/* The upper portion of pages_count is the number of bindings */
 	count = atomic_read(&vma->pages_count);
 	count >>= I915_VMA_PAGES_BIAS;
-	GEM_BUG_ON(!count);
+	if (count)
+		__vma_put_pages(vma, count | count << I915_VMA_PAGES_BIAS);
+}
 
-	__vma_put_pages(vma, count | count << I915_VMA_PAGES_BIAS);
+static int __wait_for_unbind(struct i915_vma *vma, unsigned int flags)
+{
+	return __i915_vma_wait_excl(vma, false, flags);
 }
 
 int i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
@@ -874,10 +889,14 @@ int i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 	if (i915_vma_pin_inplace(vma, flags & I915_VMA_BIND_MASK))
 		return 0;
 
-	err = vma_get_pages(vma);
+	err = i915_vma_get_pages(vma);
 	if (err)
 		return err;
 
+	err = __wait_for_unbind(vma, flags);
+	if (err)
+		goto err_pages;
+
 	if (flags & vma->vm->bind_async_flags) {
 		u64 max_size;
 
@@ -949,6 +968,10 @@ int i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 		goto err_unlock;
 
 	if (!(bound & I915_VMA_BIND_MASK)) {
+		err = __wait_for_unbind(vma, flags);
+		if (err)
+			goto err_active;
+
 		err = i915_vma_insert(vma, size, alignment, flags);
 		if (err)
 			goto err_active;
@@ -968,6 +991,7 @@ int i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 	GEM_BUG_ON(bound + I915_VMA_PAGES_ACTIVE < bound);
 	atomic_add(I915_VMA_PAGES_ACTIVE, &vma->pages_count);
 	list_move_tail(&vma->vm_link, &vma->vm->bound_list);
+	GEM_BUG_ON(!i915_vma_is_active(vma));
 
 	__i915_vma_pin(vma);
 	GEM_BUG_ON(!i915_vma_is_pinned(vma));
@@ -989,7 +1013,7 @@ int i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 	if (wakeref)
 		intel_runtime_pm_put(&vma->vm->i915->runtime_pm, wakeref);
 err_pages:
-	vma_put_pages(vma);
+	i915_vma_put_pages(vma);
 	return err;
 }
 
@@ -1093,6 +1117,7 @@ void i915_vma_release(struct kref *ref)
 		GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
 	}
 	GEM_BUG_ON(i915_vma_is_active(vma));
+	GEM_BUG_ON(!list_empty(&vma->vm_link));
 
 	if (vma->obj) {
 		struct drm_i915_gem_object *obj = vma->obj;
@@ -1152,7 +1177,7 @@ static void __i915_vma_iounmap(struct i915_vma *vma)
 {
 	GEM_BUG_ON(i915_vma_is_pinned(vma));
 
-	if (vma->iomap == NULL)
+	if (!vma->iomap)
 		return;
 
 	io_mapping_unmap(vma->iomap);
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 03fea54fd573..9a26e6cbe8cd 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -236,6 +236,9 @@ static inline void i915_vma_unlock(struct i915_vma *vma)
 	dma_resv_unlock(vma->resv);
 }
 
+int i915_vma_get_pages(struct i915_vma *vma);
+void i915_vma_put_pages(struct i915_vma *vma);
+
 bool i915_vma_pin_inplace(struct i915_vma *vma, unsigned int flags);
 
 int __must_check
@@ -379,6 +382,7 @@ void i915_vma_make_shrinkable(struct i915_vma *vma);
 void i915_vma_make_purgeable(struct i915_vma *vma);
 
 int i915_vma_wait_for_bind(struct i915_vma *vma);
+int i915_vma_wait_for_unbind(struct i915_vma *vma);
 
 static inline int i915_vma_sync(struct i915_vma *vma)
 {
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Intel-gfx] [PATCH 18/23] drm/i915/gem: Bind the fence async for execbuf
  2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
                   ` (16 preceding siblings ...)
  (?)
@ 2020-07-02  8:32 ` Chris Wilson
  -1 siblings, 0 replies; 56+ messages in thread
From: Chris Wilson @ 2020-07-02  8:32 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

It is illegal to wait on another vma while holding the vm->mutex, as
that easily leads to ABBA deadlocks (we wait on a second vma that waits
on us to release the vm->mutex). So while the vm->mutex exists, move the
waiting outside of the lock into the async binding pipeline.
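
Purely as illustration (not part of the patch, and not compilable on its
own): a minimal sketch of the ordering being avoided versus the one being
introduced. The variables vm, other, other_fence and work are hypothetical
placeholders; the helpers shown are ones that appear elsewhere in this
series.

	/* ABBA hazard: blocking on another vma while holding vm->mutex,
	 * while that vma's unbind may itself need vm->mutex. */
	mutex_lock(&vm->mutex);
	err = i915_vma_wait_for_unbind(other);
	mutex_unlock(&vm->mutex);

	/* Instead, express the dependency as a fence await on the async
	 * binding work, resolved outside the lock before the worker runs. */
	err = i915_sw_fence_await_dma_fence(&work->chain, other_fence, 0,
					    GFP_NOWAIT | __GFP_NOWARN);
	dma_fence_put(other_fence);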

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  21 +--
 drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c  | 137 +++++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_ggtt_fencing.h  |   5 +
 3 files changed, 151 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 5fa84c802312..7d12db713271 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1050,15 +1050,6 @@ static int eb_reserve_vma(struct eb_vm_work *work, struct eb_bind_vma *bind)
 		return err;
 
 pin:
-	if (unlikely(exec_flags & EXEC_OBJECT_NEEDS_FENCE)) {
-		err = __i915_vma_pin_fence(vma); /* XXX no waiting */
-		if (unlikely(err))
-			return err;
-
-		if (vma->fence)
-			bind->ev->flags |= __EXEC_OBJECT_HAS_FENCE;
-	}
-
 	bind_flags &= ~atomic_read(&vma->flags);
 	if (bind_flags) {
 		err = set_bind_fence(vma, work);
@@ -1089,6 +1080,15 @@ static int eb_reserve_vma(struct eb_vm_work *work, struct eb_bind_vma *bind)
 	bind->ev->flags |= __EXEC_OBJECT_HAS_PIN;
 	GEM_BUG_ON(eb_vma_misplaced(entry, vma, bind->ev->flags));
 
+	if (unlikely(exec_flags & EXEC_OBJECT_NEEDS_FENCE)) {
+		err = __i915_vma_pin_fence_async(vma, &work->base);
+		if (unlikely(err))
+			return err;
+
+		if (vma->fence)
+			bind->ev->flags |= __EXEC_OBJECT_HAS_FENCE;
+	}
+
 	return 0;
 }
 
@@ -1154,6 +1154,9 @@ static int __eb_bind_vma(struct eb_vm_work *work, int err)
 		struct eb_bind_vma *bind = &work->bind[n];
 		struct i915_vma *vma = bind->ev->vma;
 
+		if (bind->ev->flags & __EXEC_OBJECT_HAS_FENCE)
+			__i915_vma_apply_fence_async(vma);
+
 		if (!bind->bind_flags)
 			goto put;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
index 7fb36b12fe7a..734b6aa61809 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
@@ -21,10 +21,13 @@
  * IN THE SOFTWARE.
  */
 
+#include "i915_active.h"
 #include "i915_drv.h"
 #include "i915_scatterlist.h"
+#include "i915_sw_fence_work.h"
 #include "i915_pvinfo.h"
 #include "i915_vgpu.h"
+#include "i915_vma.h"
 
 /**
  * DOC: fence register handling
@@ -340,19 +343,37 @@ static struct i915_fence_reg *fence_find(struct i915_ggtt *ggtt)
 	return ERR_PTR(-EDEADLK);
 }
 
+static int fence_wait_bind(struct i915_fence_reg *reg)
+{
+	struct dma_fence *fence;
+	int err = 0;
+
+	fence = i915_active_fence_get(&reg->active.excl);
+	if (fence) {
+		err = dma_fence_wait(fence, true);
+		dma_fence_put(fence);
+	}
+
+	return err;
+}
+
 int __i915_vma_pin_fence(struct i915_vma *vma)
 {
 	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm);
-	struct i915_fence_reg *fence;
+	struct i915_fence_reg *fence = vma->fence;
 	struct i915_vma *set = i915_gem_object_is_tiled(vma->obj) ? vma : NULL;
 	int err;
 
 	lockdep_assert_held(&vma->vm->mutex);
 
 	/* Just update our place in the LRU if our fence is getting reused. */
-	if (vma->fence) {
-		fence = vma->fence;
+	if (fence) {
 		GEM_BUG_ON(fence->vma != vma);
+
+		err = fence_wait_bind(fence);
+		if (err)
+			return err;
+
 		atomic_inc(&fence->pin_count);
 		if (!fence->dirty) {
 			list_move_tail(&fence->link, &ggtt->fence_list);
@@ -384,6 +405,116 @@ int __i915_vma_pin_fence(struct i915_vma *vma)
 	return err;
 }
 
+static int set_bind_fence(struct i915_fence_reg *fence,
+			  struct dma_fence_work *work)
+{
+	struct dma_fence *prev;
+	int err;
+
+	if (rcu_access_pointer(fence->active.excl.fence) == &work->dma)
+		return 0;
+
+	err = i915_sw_fence_await_active(&work->chain,
+					 &fence->active,
+					 I915_ACTIVE_AWAIT_ACTIVE);
+	if (err)
+		return err;
+
+	if (i915_active_acquire(&fence->active))
+		return -ENOENT;
+
+	prev = i915_active_set_exclusive(&fence->active, &work->dma);
+	if (unlikely(prev)) {
+		err = i915_sw_fence_await_dma_fence(&work->chain, prev, 0,
+						    GFP_NOWAIT | __GFP_NOWARN);
+		dma_fence_put(prev);
+	}
+
+	i915_active_release(&fence->active);
+	return err < 0 ? err : 0;
+}
+
+int __i915_vma_pin_fence_async(struct i915_vma *vma,
+			       struct dma_fence_work *work)
+{
+	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm);
+	struct i915_vma *set = i915_gem_object_is_tiled(vma->obj) ? vma : NULL;
+	struct i915_fence_reg *fence = vma->fence;
+	int err;
+
+	lockdep_assert_held(&vma->vm->mutex);
+
+	/* Just update our place in the LRU if our fence is getting reused. */
+	if (fence) {
+		GEM_BUG_ON(fence->vma != vma);
+		GEM_BUG_ON(!i915_vma_is_map_and_fenceable(vma));
+	} else if (set) {
+		if (!i915_vma_is_map_and_fenceable(vma))
+			return -EINVAL;
+
+		fence = fence_find(ggtt);
+		if (IS_ERR(fence))
+			return -ENOSPC;
+
+		GEM_BUG_ON(atomic_read(&fence->pin_count));
+		fence->dirty = true;
+	} else {
+		return 0;
+	}
+
+	atomic_inc(&fence->pin_count);
+	list_move_tail(&fence->link, &ggtt->fence_list);
+	if (!fence->dirty)
+		return 0;
+
+	if (INTEL_GEN(fence_to_i915(fence)) < 4 &&
+	    rcu_access_pointer(vma->active.excl.fence) != &work->dma) {
+		/* implicit 'unfenced' GPU blits */
+		err = i915_sw_fence_await_active(&work->chain,
+						 &vma->active,
+						 I915_ACTIVE_AWAIT_ACTIVE);
+		if (err)
+			goto err_unpin;
+	}
+
+	err = set_bind_fence(fence, work);
+	if (err)
+		goto err_unpin;
+
+	if (set) {
+		fence->start = vma->node.start;
+		fence->size  = vma->fence_size;
+		fence->stride = i915_gem_object_get_stride(vma->obj);
+		fence->tiling = i915_gem_object_get_tiling(vma->obj);
+
+		vma->fence = fence;
+	} else {
+		fence->tiling = 0;
+		vma->fence = NULL;
+	}
+
+	set = xchg(&fence->vma, set);
+	if (set && set != vma) {
+		GEM_BUG_ON(set->fence != fence);
+		WRITE_ONCE(set->fence, NULL);
+		i915_vma_revoke_mmap(set);
+	}
+
+	return 0;
+
+err_unpin:
+	atomic_dec(&fence->pin_count);
+	return err;
+}
+
+void __i915_vma_apply_fence_async(struct i915_vma *vma)
+{
+	struct i915_fence_reg *fence = vma->fence;
+
+	if (fence->dirty)
+		fence_write(fence);
+}
+
 /**
  * i915_vma_pin_fence - set up fencing for a vma
  * @vma: vma to map through a fence reg
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.h b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.h
index 9eef679e1311..d306ac14d47e 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.h
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.h
@@ -30,6 +30,7 @@
 
 #include "i915_active.h"
 
+struct dma_fence_work;
 struct drm_i915_gem_object;
 struct i915_ggtt;
 struct i915_vma;
@@ -70,6 +71,10 @@ void i915_gem_object_do_bit_17_swizzle(struct drm_i915_gem_object *obj,
 void i915_gem_object_save_bit_17_swizzle(struct drm_i915_gem_object *obj,
 					 struct sg_table *pages);
 
+int __i915_vma_pin_fence_async(struct i915_vma *vma,
+			       struct dma_fence_work *work);
+void __i915_vma_apply_fence_async(struct i915_vma *vma);
+
 void intel_ggtt_init_fences(struct i915_ggtt *ggtt);
 void intel_ggtt_fini_fences(struct i915_ggtt *ggtt);
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Intel-gfx] [PATCH 19/23] drm/i915/gem: Include cmdparser in common execbuf pinning
  2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
                   ` (17 preceding siblings ...)
  (?)
@ 2020-07-02  8:32 ` Chris Wilson
  -1 siblings, 0 replies; 56+ messages in thread
From: Chris Wilson @ 2020-07-02  8:32 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Pull the cmdparser allocations into the reservation phase, so that they
are included in the common vma pinning pass.
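
A condensed view of the resulting flow, paraphrased from the hunks below
rather than quoted from them; error unwinding is elided:

	/* The shadow batch becomes just another eb_vma on eb->bind_list,
	 * so the common reservation pass pins it with the user objects. */
	err = eb_alloc_cmdparser(eb);
	if (err)
		return err;

	err = eb_reserve_vm(eb);	/* single pinning pass, shadow included */
	if (err)
		return err;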

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 316 ++++++++++--------
 1 file changed, 172 insertions(+), 144 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 7d12db713271..e19c0cbe1b7d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -52,6 +52,7 @@ struct eb_bind_vma {
 
 struct eb_vma_array {
 	struct kref kref;
+	struct list_head aux_list;
 	struct eb_vma vma[];
 };
 
@@ -246,7 +247,6 @@ struct i915_execbuffer {
 
 	struct i915_request *request; /** our request to build */
 	struct eb_vma *batch; /** identity of the batch obj/vma */
-	struct i915_vma *trampoline; /** trampoline used for chaining */
 
 	/** actual size of execobj[] as we may extend it for the cmdparser */
 	unsigned int buffer_count;
@@ -281,6 +281,11 @@ struct i915_execbuffer {
 		unsigned int rq_size;
 	} reloc_cache;
 
+	struct eb_cmdparser {
+		struct eb_vma *shadow;
+		struct eb_vma *trampoline;
+	} parser;
+
 	u64 invalid_flags; /** Set of execobj.flags that are invalid */
 	u32 context_flags; /** Set of execobj.flags to insert from the ctx */
 
@@ -298,6 +303,8 @@ struct i915_execbuffer {
 	struct eb_vma_array *array;
 };
 
+static struct drm_i915_gem_exec_object2 no_entry;
+
 static inline bool eb_use_cmdparser(const struct i915_execbuffer *eb)
 {
 	return intel_engine_requires_cmd_parser(eb->engine) ||
@@ -314,6 +321,7 @@ static struct eb_vma_array *eb_vma_array_create(unsigned int count)
 		return NULL;
 
 	kref_init(&arr->kref);
+	INIT_LIST_HEAD(&arr->aux_list);
 	arr->vma[0].vma = NULL;
 
 	return arr;
@@ -339,16 +347,31 @@ static inline void eb_unreserve_vma(struct eb_vma *ev)
 		       __EXEC_OBJECT_HAS_FENCE);
 }
 
+static void eb_vma_destroy(struct eb_vma *ev)
+{
+	eb_unreserve_vma(ev);
+	i915_vma_put(ev->vma);
+}
+
+static void eb_destroy_aux(struct eb_vma_array *arr)
+{
+	struct eb_vma *ev, *en;
+
+	list_for_each_entry_safe(ev, en, &arr->aux_list, reloc_link) {
+		eb_vma_destroy(ev);
+		kfree(ev);
+	}
+}
+
 static void eb_vma_array_destroy(struct kref *kref)
 {
 	struct eb_vma_array *arr = container_of(kref, typeof(*arr), kref);
-	struct eb_vma *ev = arr->vma;
+	struct eb_vma *ev;
 
-	while (ev->vma) {
-		eb_unreserve_vma(ev);
-		i915_vma_put(ev->vma);
-		ev++;
-	}
+	eb_destroy_aux(arr);
+
+	for (ev = arr->vma; ev->vma; ev++)
+		eb_vma_destroy(ev);
 
 	kvfree(arr);
 }
@@ -396,8 +419,8 @@ eb_lock_vma(struct i915_execbuffer *eb, struct ww_acquire_ctx *acquire)
 
 static int eb_create(struct i915_execbuffer *eb)
 {
-	/* Allocate an extra slot for use by the command parser + sentinel */
-	eb->array = eb_vma_array_create(eb->buffer_count + 2);
+	/* Allocate an extra slot for use by the sentinel */
+	eb->array = eb_vma_array_create(eb->buffer_count + 1);
 	if (!eb->array)
 		return -ENOMEM;
 
@@ -1072,7 +1095,7 @@ static int eb_reserve_vma(struct eb_vm_work *work, struct eb_bind_vma *bind)
 	GEM_BUG_ON(!(drm_mm_node_allocated(&vma->node) ^
 		     drm_mm_node_allocated(&bind->hole)));
 
-	if (entry->offset != vma->node.start) {
+	if (entry != &no_entry && entry->offset != vma->node.start) {
 		entry->offset = vma->node.start | UPDATE;
 		*work->p_flags |= __EXEC_HAS_RELOC;
 	}
@@ -1384,7 +1407,8 @@ static int eb_reserve_vm(struct i915_execbuffer *eb)
 		struct i915_vma *vma = ev->vma;
 
 		if (eb_pin_vma_inplace(eb, entry, ev)) {
-			if (entry->offset != vma->node.start) {
+			if (entry != &no_entry &&
+			    entry->offset != vma->node.start) {
 				entry->offset = vma->node.start | UPDATE;
 				eb->args->flags |= __EXEC_HAS_RELOC;
 			}
@@ -1518,6 +1542,112 @@ static int eb_reserve_vm(struct i915_execbuffer *eb)
 	} while (1);
 }
 
+static int eb_alloc_cmdparser(struct i915_execbuffer *eb)
+{
+	struct intel_gt_buffer_pool_node *pool;
+	struct i915_vma *vma;
+	struct eb_vma *ev;
+	unsigned int len;
+	int err;
+
+	if (range_overflows_t(u64,
+			      eb->batch_start_offset, eb->batch_len,
+			      eb->batch->vma->size)) {
+		drm_dbg(&eb->i915->drm,
+			"Attempting to use out-of-bounds batch\n");
+		return -EINVAL;
+	}
+
+	if (eb->batch_len == 0)
+		eb->batch_len = eb->batch->vma->size - eb->batch_start_offset;
+
+	if (!eb_use_cmdparser(eb))
+		return 0;
+
+	len = eb->batch_len;
+	if (!CMDPARSER_USES_GGTT(eb->i915)) {
+		/*
+		 * ppGTT backed shadow buffers must be mapped RO, to prevent
+		 * post-scan tampering
+		 */
+		if (!eb->context->vm->has_read_only) {
+			drm_dbg(&eb->i915->drm,
+				"Cannot prevent post-scan tampering without RO capable vm\n");
+			return -EINVAL;
+		}
+	} else {
+		len += I915_CMD_PARSER_TRAMPOLINE_SIZE;
+	}
+
+	pool = intel_gt_get_buffer_pool(eb->engine->gt, len);
+	if (IS_ERR(pool))
+		return PTR_ERR(pool);
+
+	ev = kzalloc(sizeof(*ev), GFP_KERNEL);
+	if (!ev) {
+		err = -ENOMEM;
+		goto err_pool;
+	}
+
+	vma = i915_vma_instance(pool->obj, eb->context->vm, NULL);
+	if (IS_ERR(vma)) {
+		err = PTR_ERR(vma);
+		goto err_ev;
+	}
+	i915_gem_object_set_readonly(vma->obj);
+	vma->private = pool;
+
+	ev->vma = i915_vma_get(vma);
+	ev->exec = &no_entry;
+	list_add(&ev->reloc_link, &eb->array->aux_list);
+	list_add(&ev->bind_link, &eb->bind_list);
+	list_add(&ev->submit_link, &eb->submit_list);
+
+	if (CMDPARSER_USES_GGTT(eb->i915)) {
+		eb->parser.trampoline = ev;
+
+		/*
+		 * Special care when binding will be required for full-ppgtt
+		 * as there will be distinct vm involved, and we will need to
+		 * separate the binding/eviction passes (different vm->mutex).
+		 */
+		if (GEM_WARN_ON(eb->context->vm != &eb->engine->gt->ggtt->vm)) {
+			ev = kzalloc(sizeof(*ev), GFP_KERNEL);
+			if (!ev) {
+				err = -ENOMEM;
+				goto err_pool;
+			}
+
+			vma = i915_vma_instance(pool->obj,
+						&eb->engine->gt->ggtt->vm,
+						NULL);
+			if (IS_ERR(vma)) {
+				err = PTR_ERR(vma);
+				goto err_ev;
+			}
+			vma->private = pool;
+
+			ev->vma = i915_vma_get(vma);
+			ev->exec = &no_entry;
+			list_add(&ev->reloc_link, &eb->array->aux_list);
+			list_add(&ev->bind_link, &eb->bind_list);
+			list_add(&ev->submit_link, &eb->submit_list);
+		}
+
+		ev->flags = EXEC_OBJECT_NEEDS_GTT;
+		eb->batch_flags |= I915_DISPATCH_SECURE;
+	}
+
+	eb->parser.shadow = ev;
+	return 0;
+
+err_ev:
+	kfree(ev);
+err_pool:
+	intel_gt_buffer_pool_put(pool);
+	return err;
+}
+
 static unsigned int eb_batch_index(const struct i915_execbuffer *eb)
 {
 	if (eb->args->flags & I915_EXEC_BATCH_FIRST)
@@ -1681,9 +1811,7 @@ static void eb_destroy(const struct i915_execbuffer *eb)
 {
 	GEM_BUG_ON(eb->reloc_cache.rq);
 
-	if (eb->array)
-		eb_vma_array_put(eb->array);
-
+	eb_vma_array_put(eb->array);
 	if (eb->lut_size > 0)
 		kfree(eb->buckets);
 }
@@ -2303,6 +2431,10 @@ static int eb_relocate(struct i915_execbuffer *eb)
 	if (err)
 		return err;
 
+	err = eb_alloc_cmdparser(eb);
+	if (err)
+		return err;
+
 	err = eb_reserve_vm(eb);
 	if (err)
 		return err;
@@ -2387,8 +2519,6 @@ static int eb_move_to_gpu(struct i915_execbuffer *eb)
 	}
 	ww_acquire_fini(&acquire);
 
-	eb_vma_array_put(fetch_and_zero(&eb->array));
-
 	if (unlikely(err))
 		goto err_skip;
 
@@ -2452,25 +2582,6 @@ static int i915_reset_gen7_sol_offsets(struct i915_request *rq)
 	return 0;
 }
 
-static struct i915_vma *
-shadow_batch_pin(struct drm_i915_gem_object *obj,
-		 struct i915_address_space *vm,
-		 unsigned int flags)
-{
-	struct i915_vma *vma;
-	int err;
-
-	vma = i915_vma_instance(obj, vm, NULL);
-	if (IS_ERR(vma))
-		return vma;
-
-	err = i915_vma_pin(vma, 0, 0, flags);
-	if (err)
-		return ERR_PTR(err);
-
-	return vma;
-}
-
 struct eb_parse_work {
 	struct dma_fence_work base;
 	struct intel_engine_cs *engine;
@@ -2522,19 +2633,10 @@ __parser_mark_active(struct i915_vma *vma,
 static int
 parser_mark_active(struct eb_parse_work *pw, struct intel_timeline *tl)
 {
-	int err;
-
-	err = __parser_mark_active(pw->shadow, tl, &pw->base.dma);
-	if (err)
-		return err;
-
-	if (pw->trampoline) {
-		err = __parser_mark_active(pw->trampoline, tl, &pw->base.dma);
-		if (err)
-			return err;
-	}
+	GEM_BUG_ON(pw->trampoline &&
+		   pw->trampoline->private != pw->shadow->private);
 
-	return 0;
+	return __parser_mark_active(pw->shadow, tl, &pw->base.dma);
 }
 
 static int eb_parse_pipeline(struct i915_execbuffer *eb,
@@ -2544,6 +2646,9 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb,
 	struct eb_parse_work *pw;
 	int err;
 
+	GEM_BUG_ON(!i915_vma_is_pinned(shadow));
+	GEM_BUG_ON(trampoline && !i915_vma_is_pinned(trampoline));
+
 	pw = kzalloc(sizeof(*pw), GFP_KERNEL);
 	if (!pw)
 		return -ENOMEM;
@@ -2622,82 +2727,26 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb,
 
 static int eb_parse(struct i915_execbuffer *eb)
 {
-	struct drm_i915_private *i915 = eb->i915;
-	struct intel_gt_buffer_pool_node *pool;
-	struct i915_vma *shadow, *trampoline;
-	unsigned int len;
 	int err;
 
-	if (!eb_use_cmdparser(eb))
-		return 0;
-
-	len = eb->batch_len;
-	if (!CMDPARSER_USES_GGTT(eb->i915)) {
-		/*
-		 * ppGTT backed shadow buffers must be mapped RO, to prevent
-		 * post-scan tampering
-		 */
-		if (!eb->context->vm->has_read_only) {
-			drm_dbg(&i915->drm,
-				"Cannot prevent post-scan tampering without RO capable vm\n");
-			return -EINVAL;
-		}
-	} else {
-		len += I915_CMD_PARSER_TRAMPOLINE_SIZE;
-	}
-
-	pool = intel_gt_get_buffer_pool(eb->engine->gt, len);
-	if (IS_ERR(pool))
-		return PTR_ERR(pool);
-
-	shadow = shadow_batch_pin(pool->obj, eb->context->vm, PIN_USER);
-	if (IS_ERR(shadow)) {
-		err = PTR_ERR(shadow);
-		goto err;
+	if (unlikely(eb->batch->flags & EXEC_OBJECT_WRITE)) {
+		drm_dbg(&eb->i915->drm,
+			"Attempting to use self-modifying batch buffer\n");
+		return -EINVAL;
 	}
-	i915_gem_object_set_readonly(shadow->obj);
-	shadow->private = pool;
 
-	trampoline = NULL;
-	if (CMDPARSER_USES_GGTT(eb->i915)) {
-		trampoline = shadow;
-
-		shadow = shadow_batch_pin(pool->obj,
-					  &eb->engine->gt->ggtt->vm,
-					  PIN_GLOBAL);
-		if (IS_ERR(shadow)) {
-			err = PTR_ERR(shadow);
-			shadow = trampoline;
-			goto err_shadow;
-		}
-		shadow->private = pool;
-
-		eb->batch_flags |= I915_DISPATCH_SECURE;
-	}
+	if (!eb->parser.shadow)
+		return 0;
 
-	err = eb_parse_pipeline(eb, shadow, trampoline);
+	err = eb_parse_pipeline(eb,
+				eb->parser.shadow->vma,
+				eb->parser.trampoline ? eb->parser.trampoline->vma : NULL);
 	if (err)
-		goto err_trampoline;
-
-	eb->batch = &eb->vma[eb->buffer_count++];
-	eb->batch->vma = i915_vma_get(shadow);
-	eb->batch->flags = __EXEC_OBJECT_HAS_PIN;
-	list_add_tail(&eb->batch->submit_link, &eb->submit_list);
-	eb->vma[eb->buffer_count].vma = NULL;
+		return err;
 
-	eb->trampoline = trampoline;
+	eb->batch = eb->parser.shadow;
 	eb->batch_start_offset = 0;
-
 	return 0;
-
-err_trampoline:
-	if (trampoline)
-		i915_vma_unpin(trampoline);
-err_shadow:
-	i915_vma_unpin(shadow);
-err:
-	intel_gt_buffer_pool_put(pool);
-	return err;
 }
 
 static void
@@ -2746,10 +2795,10 @@ static int eb_submit(struct i915_execbuffer *eb, struct i915_vma *batch)
 	if (err)
 		return err;
 
-	if (eb->trampoline) {
+	if (eb->parser.trampoline) {
 		GEM_BUG_ON(eb->batch_start_offset);
 		err = eb->engine->emit_bb_start(eb->request,
-						eb->trampoline->node.start +
+						eb->parser.trampoline->vma->node.start +
 						eb->batch_len,
 						0, 0);
 		if (err)
@@ -3234,7 +3283,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	eb.buffer_count = args->buffer_count;
 	eb.batch_start_offset = args->batch_start_offset;
 	eb.batch_len = args->batch_len;
-	eb.trampoline = NULL;
+	memset(&eb.parser, 0, sizeof(eb.parser));
 
 	eb.batch_flags = 0;
 	if (args->flags & I915_EXEC_SECURE) {
@@ -3300,24 +3349,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		goto err_vma;
 	}
 
-	if (unlikely(eb.batch->flags & EXEC_OBJECT_WRITE)) {
-		drm_dbg(&i915->drm,
-			"Attempting to use self-modifying batch buffer\n");
-		err = -EINVAL;
-		goto err_vma;
-	}
-
-	if (range_overflows_t(u64,
-			      eb.batch_start_offset, eb.batch_len,
-			      eb.batch->vma->size)) {
-		drm_dbg(&i915->drm, "Attempting to use out-of-bounds batch\n");
-		err = -EINVAL;
-		goto err_vma;
-	}
-
-	if (eb.batch_len == 0)
-		eb.batch_len = eb.batch->vma->size - eb.batch_start_offset;
-
 	err = eb_parse(&eb);
 	if (err)
 		goto err_vma;
@@ -3343,7 +3374,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		vma = i915_gem_object_ggtt_pin(batch->obj, NULL, 0, 0, 0);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
-			goto err_parse;
+			goto err_vma;
 		}
 
 		GEM_BUG_ON(vma->obj != batch->obj);
@@ -3395,8 +3426,9 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	 * to explicitly hold another reference here.
 	 */
 	eb.request->batch = batch;
-	if (batch->private)
-		intel_gt_buffer_pool_mark_active(batch->private, eb.request);
+	if (eb.parser.shadow)
+		intel_gt_buffer_pool_mark_active(eb.parser.shadow->vma->private,
+						 eb.request);
 
 	trace_i915_request_queue(eb.request, eb.batch_flags);
 	err = eb_submit(&eb, batch);
@@ -3423,13 +3455,9 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 err_batch_unpin:
 	if (eb.batch_flags & I915_DISPATCH_SECURE)
 		i915_vma_unpin(batch);
-err_parse:
-	if (batch->private)
-		intel_gt_buffer_pool_put(batch->private);
-	i915_vma_put(batch);
 err_vma:
-	if (eb.trampoline)
-		i915_vma_unpin(eb.trampoline);
+	if (eb.parser.shadow)
+		intel_gt_buffer_pool_put(eb.parser.shadow->vma->private);
 	eb_unpin_engine(&eb);
 err_context:
 	i915_gem_context_put(eb.gem_context);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Intel-gfx] [PATCH 20/23] drm/i915/gem: Include secure batch in common execbuf pinning
  2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
                   ` (18 preceding siblings ...)
  (?)
@ 2020-07-02  8:32 ` Chris Wilson
  -1 siblings, 0 replies; 56+ messages in thread
From: Chris Wilson @ 2020-07-02  8:32 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Pull the GGTT binding for the secure batch dispatch into the common vma
pinning routine for execbuf, so that there is just a single central
place for all i915_vma_pin().
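
For orientation only, a condensed sketch of the flow this converges on
(paraphrasing the hunks below, error handling elided):

	/* Mark the secure batch as needing the GGTT before reservation,
	 * instead of an ad hoc i915_gem_object_ggtt_pin() afterwards. */
	err = eb_secure_batch(eb);
	if (err)
		return err;

	err = eb_reserve_vm(eb);	/* the one place that pins vma */
	if (err)
		return err;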

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 88 +++++++++++--------
 1 file changed, 51 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index e19c0cbe1b7d..320840f9c629 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1648,6 +1648,48 @@ static int eb_alloc_cmdparser(struct i915_execbuffer *eb)
 	return err;
 }
 
+static int eb_secure_batch(struct i915_execbuffer *eb)
+{
+	struct i915_vma *vma = eb->batch->vma;
+
+	/*
+	 * snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
+	 * batch" bit. Hence we need to pin secure batches into the global gtt.
+	 * hsw should have this fixed, but bdw mucks it up again.
+	 */
+	if (!(eb->batch_flags & I915_DISPATCH_SECURE))
+		return 0;
+
+	if (GEM_WARN_ON(vma->vm != &eb->engine->gt->ggtt->vm)) {
+		struct eb_vma *ev;
+
+		ev = kzalloc(sizeof(*ev), GFP_KERNEL);
+		if (!ev)
+			return -ENOMEM;
+
+		vma = i915_vma_instance(vma->obj,
+					&eb->engine->gt->ggtt->vm,
+					NULL);
+		if (IS_ERR(vma)) {
+			kfree(ev);
+			return PTR_ERR(vma);
+		}
+
+		ev->vma = i915_vma_get(vma);
+		ev->exec = &no_entry;
+
+		list_add(&ev->submit_link, &eb->submit_list);
+		list_add(&ev->reloc_link, &eb->array->aux_list);
+		list_add(&ev->bind_link, &eb->bind_list);
+
+		GEM_BUG_ON(eb->batch->vma->private);
+		eb->batch = ev;
+	}
+
+	eb->batch->flags |= EXEC_OBJECT_NEEDS_GTT;
+	return 0;
+}
+
 static unsigned int eb_batch_index(const struct i915_execbuffer *eb)
 {
 	if (eb->args->flags & I915_EXEC_BATCH_FIRST)
@@ -2435,6 +2477,10 @@ static int eb_relocate(struct i915_execbuffer *eb)
 	if (err)
 		return err;
 
+	err = eb_secure_batch(eb);
+	if (err)
+		return err;
+
 	err = eb_reserve_vm(eb);
 	if (err)
 		return err;
@@ -2761,7 +2807,7 @@ add_to_client(struct i915_request *rq, struct drm_file *file)
 	spin_unlock(&file_priv->mm.lock);
 }
 
-static int eb_submit(struct i915_execbuffer *eb, struct i915_vma *batch)
+static int eb_submit(struct i915_execbuffer *eb)
 {
 	int err;
 
@@ -2788,7 +2834,7 @@ static int eb_submit(struct i915_execbuffer *eb, struct i915_vma *batch)
 	}
 
 	err = eb->engine->emit_bb_start(eb->request,
-					batch->node.start +
+					eb->batch->vma->node.start +
 					eb->batch_start_offset,
 					eb->batch_len,
 					eb->batch_flags);
@@ -3261,7 +3307,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	struct i915_execbuffer eb;
 	struct dma_fence *in_fence = NULL;
 	struct sync_file *out_fence = NULL;
-	struct i915_vma *batch;
 	int out_fence_fd = -1;
 	int err;
 
@@ -3353,34 +3398,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (err)
 		goto err_vma;
 
-	/*
-	 * snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
-	 * batch" bit. Hence we need to pin secure batches into the global gtt.
-	 * hsw should have this fixed, but bdw mucks it up again. */
-	batch = i915_vma_get(eb.batch->vma);
-	if (eb.batch_flags & I915_DISPATCH_SECURE) {
-		struct i915_vma *vma;
-
-		/*
-		 * So on first glance it looks freaky that we pin the batch here
-		 * outside of the reservation loop. But:
-		 * - The batch is already pinned into the relevant ppgtt, so we
-		 *   already have the backing storage fully allocated.
-		 * - No other BO uses the global gtt (well contexts, but meh),
-		 *   so we don't really have issues with multiple objects not
-		 *   fitting due to fragmentation.
-		 * So this is actually safe.
-		 */
-		vma = i915_gem_object_ggtt_pin(batch->obj, NULL, 0, 0, 0);
-		if (IS_ERR(vma)) {
-			err = PTR_ERR(vma);
-			goto err_vma;
-		}
-
-		GEM_BUG_ON(vma->obj != batch->obj);
-		batch = vma;
-	}
-
 	/* All GPU relocation batches must be submitted prior to the user rq */
 	GEM_BUG_ON(eb.reloc_cache.rq);
 
@@ -3388,7 +3405,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	eb.request = __i915_request_create(eb.context, GFP_KERNEL);
 	if (IS_ERR(eb.request)) {
 		err = PTR_ERR(eb.request);
-		goto err_batch_unpin;
+		goto err_vma;
 	}
 	eb.request->cookie = lockdep_pin_lock(&eb.context->timeline->mutex);
 
@@ -3425,13 +3442,13 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	 * inactive_list and lose its active reference. Hence we do not need
 	 * to explicitly hold another reference here.
 	 */
-	eb.request->batch = batch;
+	eb.request->batch = eb.batch->vma;
 	if (eb.parser.shadow)
 		intel_gt_buffer_pool_mark_active(eb.parser.shadow->vma->private,
 						 eb.request);
 
 	trace_i915_request_queue(eb.request, eb.batch_flags);
-	err = eb_submit(&eb, batch);
+	err = eb_submit(&eb);
 err_request:
 	add_to_client(eb.request, file);
 	i915_request_get(eb.request);
@@ -3452,9 +3469,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	}
 	i915_request_put(eb.request);
 
-err_batch_unpin:
-	if (eb.batch_flags & I915_DISPATCH_SECURE)
-		i915_vma_unpin(batch);
 err_vma:
 	if (eb.parser.shadow)
 		intel_gt_buffer_pool_put(eb.parser.shadow->vma->private);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Intel-gfx] [PATCH 21/23] drm/i915/gem: Reintroduce multiple passes for reloc processing
  2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
                   ` (19 preceding siblings ...)
  (?)
@ 2020-07-02  8:32 ` Chris Wilson
  -1 siblings, 0 replies; 56+ messages in thread
From: Chris Wilson @ 2020-07-02  8:32 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

The prospect of locking the entire submission sequence under a wide
ww_mutex re-imposes some key restrictions, in particular that we must
not call copy_(from|to)_user underneath the mutex (as the fault handlers
themselves may need to take the ww_mutex). To satisfy this requirement,
we need to split the relocation handling into multiple phases again.
After dropping the reservations, we need to allocate enough buffer space
both to receive the relocations copied from userspace and to serve as the
relocation command buffer. Once we have finished copying the relocations,
we can then re-acquire all the objects for the execbuf and rebind them,
including our new relocation objects. After we have bound
all the new and old objects into their final locations, we can then
convert the relocation entries into the GPU commands to update the
relocated vma. Finally, once it is all over and we have dropped the
ww_mutex for the last time, we can then complete the update of the user
relocation entries.
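
An illustrative outline of the phase split described above; the
eb_relocs_*() helper names here are hypothetical placeholders for this
sketch, not the exact functions added by the patch:

	static int eb_relocate_phased(struct i915_execbuffer *eb)
	{
		int err;

		/* Phase 1: no reservations held, copy_from_user() may fault. */
		err = eb_relocs_copy_user(eb);
		if (err)
			return err;

		/* Phase 2: re-acquire and bind all objects, including the
		 * freshly allocated relocation buffers. */
		err = eb_reserve_vm(eb);
		if (err)
			return err;

		/* Phase 3: rewrite the copied entries into GPU commands. */
		return eb_relocs_emit_gpu(eb);
	}

	/* Phase 4, only after the ww_mutex is dropped for the last time:
	 * copy the updated presumed offsets back to the user's entries. */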

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 758 ++++++++----------
 .../i915/gem/selftests/i915_gem_execbuffer.c  |   6 +
 2 files changed, 348 insertions(+), 416 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 320840f9c629..c325aed82629 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -265,20 +265,18 @@ struct i915_execbuffer {
 	 * obj/page
 	 */
 	struct reloc_cache {
-		struct drm_mm_node node; /** temporary GTT binding */
 		unsigned int gen; /** Cached value of INTEL_GEN */
 		bool use_64bit_reloc : 1;
-		bool has_llc : 1;
 		bool has_fence : 1;
 		bool needs_unfenced : 1;
 
 		struct intel_context *ce;
 
-		struct i915_vma *target;
-		struct i915_request *rq;
-		struct i915_vma *rq_vma;
-		u32 *rq_cmd;
-		unsigned int rq_size;
+		struct eb_relocs_link {
+			struct i915_vma *vma;
+		} head;
+		struct drm_i915_gem_relocation_entry *map;
+		unsigned int pos;
 	} reloc_cache;
 
 	struct eb_cmdparser {
@@ -305,6 +303,12 @@ struct i915_execbuffer {
 
 static struct drm_i915_gem_exec_object2 no_entry;
 
+static u64 noncanonical_addr(u64 addr, const struct i915_address_space *vm)
+{
+	GEM_BUG_ON(!is_power_of_2(vm->total));
+	return addr & (vm->total - 1);
+}
+
 static inline bool eb_use_cmdparser(const struct i915_execbuffer *eb)
 {
 	return intel_engine_requires_cmd_parser(eb->engine) ||
@@ -593,7 +597,7 @@ eb_validate_vma(struct i915_execbuffer *eb,
 	 * so from this point we're always using non-canonical
 	 * form internally.
 	 */
-	entry->offset = gen8_noncanonical_addr(entry->offset);
+	entry->offset = noncanonical_addr(entry->offset, eb->context->vm);
 
 	if (!eb->reloc_cache.has_fence) {
 		entry->flags &= ~EXEC_OBJECT_NEEDS_FENCE;
@@ -1851,8 +1855,6 @@ eb_get_vma(const struct i915_execbuffer *eb, unsigned long handle)
 
 static void eb_destroy(const struct i915_execbuffer *eb)
 {
-	GEM_BUG_ON(eb->reloc_cache.rq);
-
 	eb_vma_array_put(eb->array);
 	if (eb->lut_size > 0)
 		kfree(eb->buckets);
@@ -1870,83 +1872,9 @@ static void reloc_cache_init(struct reloc_cache *cache,
 {
 	/* Must be a variable in the struct to allow GCC to unroll. */
 	cache->gen = INTEL_GEN(i915);
-	cache->has_llc = HAS_LLC(i915);
 	cache->use_64bit_reloc = HAS_64BIT_RELOC(i915);
 	cache->has_fence = cache->gen < 4;
 	cache->needs_unfenced = INTEL_INFO(i915)->unfenced_needs_alignment;
-	cache->node.flags = 0;
-	cache->rq = NULL;
-	cache->target = NULL;
-}
-
-#define RELOC_TAIL 4
-
-static int reloc_gpu_chain(struct reloc_cache *cache)
-{
-	struct intel_gt_buffer_pool_node *pool;
-	struct i915_request *rq = cache->rq;
-	struct i915_vma *batch;
-	u32 *cmd;
-	int err;
-
-	pool = intel_gt_get_buffer_pool(rq->engine->gt, PAGE_SIZE);
-	if (IS_ERR(pool))
-		return PTR_ERR(pool);
-
-	batch = i915_vma_instance(pool->obj, rq->context->vm, NULL);
-	if (IS_ERR(batch)) {
-		err = PTR_ERR(batch);
-		goto out_pool;
-	}
-
-	err = i915_vma_pin(batch, 0, 0, PIN_USER | PIN_NONBLOCK);
-	if (err)
-		goto out_pool;
-
-	GEM_BUG_ON(cache->rq_size + RELOC_TAIL > PAGE_SIZE  / sizeof(u32));
-	cmd = cache->rq_cmd + cache->rq_size;
-	*cmd++ = MI_ARB_CHECK;
-	if (cache->gen >= 8)
-		*cmd++ = MI_BATCH_BUFFER_START_GEN8;
-	else if (cache->gen >= 6)
-		*cmd++ = MI_BATCH_BUFFER_START;
-	else
-		*cmd++ = MI_BATCH_BUFFER_START | MI_BATCH_GTT;
-	*cmd++ = lower_32_bits(batch->node.start);
-	*cmd++ = upper_32_bits(batch->node.start); /* Always 0 for gen<8 */
-	i915_gem_object_flush_map(cache->rq_vma->obj);
-	i915_gem_object_unpin_map(cache->rq_vma->obj);
-	cache->rq_vma = NULL;
-
-	err = intel_gt_buffer_pool_mark_active(pool, rq);
-	if (err == 0) {
-		i915_vma_lock(batch);
-		err = i915_request_await_object(rq, batch->obj, false);
-		if (err == 0)
-			err = i915_vma_move_to_active(batch, rq, 0);
-		i915_vma_unlock(batch);
-	}
-	i915_vma_unpin(batch);
-	if (err)
-		goto out_pool;
-
-	cmd = i915_gem_object_pin_map(batch->obj,
-				      cache->has_llc ?
-				      I915_MAP_FORCE_WB :
-				      I915_MAP_FORCE_WC);
-	if (IS_ERR(cmd)) {
-		err = PTR_ERR(cmd);
-		goto out_pool;
-	}
-
-	/* Return with batch mapping (cmd) still pinned */
-	cache->rq_cmd = cmd;
-	cache->rq_size = 0;
-	cache->rq_vma = batch;
-
-out_pool:
-	intel_gt_buffer_pool_put(pool);
-	return err;
 }
 
 static struct i915_request *
@@ -1984,28 +1912,18 @@ static unsigned int reloc_bb_flags(const struct reloc_cache *cache)
 	return cache->gen > 5 ? 0 : I915_DISPATCH_SECURE;
 }
 
-static int reloc_gpu_flush(struct i915_execbuffer *eb)
+static int
+reloc_gpu_flush(struct i915_execbuffer *eb, struct i915_request *rq, int err)
 {
 	struct reloc_cache *cache = &eb->reloc_cache;
-	struct i915_request *rq;
-	int err;
-
-	rq = fetch_and_zero(&cache->rq);
-	if (!rq)
-		return 0;
-
-	if (cache->rq_vma) {
-		struct drm_i915_gem_object *obj = cache->rq_vma->obj;
-
-		GEM_BUG_ON(cache->rq_size >= obj->base.size / sizeof(u32));
-		cache->rq_cmd[cache->rq_size++] = MI_BATCH_BUFFER_END;
+	u32 *cs;
 
-		__i915_gem_object_flush_map(obj,
-					    0, sizeof(u32) * cache->rq_size);
-		i915_gem_object_unpin_map(obj);
-	}
+	cs = memset(cache->map + cache->pos, 0, 4);
+	*cs++ = MI_BATCH_BUFFER_END;
+	__i915_gem_object_flush_map(cache->head.vma->obj,
+				    0, (void *)cs - (void *)cache->map);
+	i915_gem_object_unpin_map(cache->head.vma->obj);
 
-	err = 0;
 	if (rq->engine->emit_init_breadcrumb)
 		err = rq->engine->emit_init_breadcrumb(rq);
 	if (!err)
@@ -2018,6 +1936,7 @@ static int reloc_gpu_flush(struct i915_execbuffer *eb)
 
 	intel_gt_chipset_flush(rq->engine->gt);
 	__i915_request_add(rq, &eb->gem_context->sched);
+
 	if (i915_request_timeline(rq) != eb->context->timeline)
 		mutex_unlock(&i915_request_timeline(rq)->mutex);
 
@@ -2035,7 +1954,7 @@ static int reloc_move_to_gpu(struct i915_request *rq, struct i915_vma *vma)
 		i915_gem_clflush_object(obj, 0);
 	obj->write_domain = 0;
 
-	err = i915_request_await_object(rq, vma->obj, true);
+	err = i915_request_await_object(rq, obj, true);
 	if (err == 0)
 		err = i915_vma_move_to_active(vma, rq, EXEC_OBJECT_WRITE);
 
@@ -2044,130 +1963,6 @@ static int reloc_move_to_gpu(struct i915_request *rq, struct i915_vma *vma)
 	return err;
 }
 
-static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
-			     struct intel_engine_cs *engine,
-			     unsigned int len)
-{
-	struct reloc_cache *cache = &eb->reloc_cache;
-	struct intel_gt_buffer_pool_node *pool;
-	struct i915_request *rq;
-	struct i915_vma *batch;
-	u32 *cmd;
-	int err;
-
-	pool = intel_gt_get_buffer_pool(engine->gt, PAGE_SIZE);
-	if (IS_ERR(pool))
-		return PTR_ERR(pool);
-
-	cmd = i915_gem_object_pin_map(pool->obj,
-				      cache->has_llc ?
-				      I915_MAP_FORCE_WB :
-				      I915_MAP_FORCE_WC);
-	if (IS_ERR(cmd)) {
-		err = PTR_ERR(cmd);
-		goto out_pool;
-	}
-
-	batch = i915_vma_instance(pool->obj, eb->context->vm, NULL);
-	if (IS_ERR(batch)) {
-		err = PTR_ERR(batch);
-		goto err_unmap;
-	}
-
-	err = i915_vma_pin(batch, 0, 0, PIN_USER | PIN_NONBLOCK);
-	if (err)
-		goto err_unmap;
-
-	if (cache->ce == eb->context)
-		rq = __i915_request_create(cache->ce, GFP_KERNEL);
-	else
-		rq = nested_request_create(cache->ce);
-	if (IS_ERR(rq)) {
-		err = PTR_ERR(rq);
-		goto err_unpin;
-	}
-	rq->cookie = lockdep_pin_lock(&i915_request_timeline(rq)->mutex);
-
-	err = intel_gt_buffer_pool_mark_active(pool, rq);
-	if (err)
-		goto err_request;
-
-	i915_vma_lock(batch);
-	err = i915_request_await_object(rq, batch->obj, false);
-	if (err == 0)
-		err = i915_vma_move_to_active(batch, rq, 0);
-	i915_vma_unlock(batch);
-	if (err)
-		goto skip_request;
-
-	rq->batch = batch;
-	i915_vma_unpin(batch);
-
-	cache->rq = rq;
-	cache->rq_cmd = cmd;
-	cache->rq_size = 0;
-	cache->rq_vma = batch;
-
-	/* Return with batch mapping (cmd) still pinned */
-	goto out_pool;
-
-skip_request:
-	i915_request_set_error_once(rq, err);
-err_request:
-	__i915_request_add(rq, &eb->gem_context->sched);
-	if (i915_request_timeline(rq) != eb->context->timeline)
-		mutex_unlock(&i915_request_timeline(rq)->mutex);
-err_unpin:
-	i915_vma_unpin(batch);
-err_unmap:
-	i915_gem_object_unpin_map(pool->obj);
-out_pool:
-	intel_gt_buffer_pool_put(pool);
-	return err;
-}
-
-static u32 *reloc_gpu(struct i915_execbuffer *eb,
-		      struct i915_vma *vma,
-		      unsigned int len)
-{
-	struct reloc_cache *cache = &eb->reloc_cache;
-	u32 *cmd;
-	int err;
-
-	if (unlikely(!cache->rq)) {
-		struct intel_engine_cs *engine = eb->engine;
-
-		err = __reloc_gpu_alloc(eb, engine, len);
-		if (unlikely(err))
-			return ERR_PTR(err);
-	}
-
-	if (vma != cache->target) {
-		err = reloc_move_to_gpu(cache->rq, vma);
-		if (unlikely(err)) {
-			i915_request_set_error_once(cache->rq, err);
-			return ERR_PTR(err);
-		}
-
-		cache->target = vma;
-	}
-
-	if (unlikely(cache->rq_size + len >
-		     PAGE_SIZE / sizeof(u32) - RELOC_TAIL)) {
-		err = reloc_gpu_chain(cache);
-		if (unlikely(err)) {
-			i915_request_set_error_once(cache->rq, err);
-			return ERR_PTR(err);
-		}
-	}
-
-	GEM_BUG_ON(cache->rq_size + len >= PAGE_SIZE  / sizeof(u32));
-	cmd = cache->rq_cmd + cache->rq_size;
-	cache->rq_size += len;
-
-	return cmd;
-}
-
 static unsigned long vma_phys_addr(struct i915_vma *vma, u32 offset)
 {
 	struct page *page;
@@ -2182,30 +1977,28 @@ static unsigned long vma_phys_addr(struct i915_vma *vma, u32 offset)
 	return addr + offset_in_page(offset);
 }
 
-static int __reloc_entry_gpu(struct i915_execbuffer *eb,
-			     struct i915_vma *vma,
-			     u64 offset,
-			     u64 target_addr)
+static bool
+eb_relocs_vma_entry(struct i915_execbuffer *eb,
+		    struct eb_vma *ev,
+		    struct drm_i915_gem_relocation_entry *reloc)
 {
 	const unsigned int gen = eb->reloc_cache.gen;
-	unsigned int len;
+	struct i915_vma *target = eb_get_vma(eb, reloc->target_handle)->vma;
+	const u64 target_addr = relocation_target(reloc, target);
+	const u64 presumed =
+		noncanonical_addr(reloc->presumed_offset, target->vm);
+	u64 offset = reloc->offset;
 	u32 *batch;
-	u64 addr;
 
-	if (gen >= 8)
-		len = offset & 7 ? 8 : 5;
-	else if (gen >= 4)
-		len = 4;
-	else
-		len = 3;
-
-	batch = reloc_gpu(eb, vma, len);
-	if (IS_ERR(batch))
-		return PTR_ERR(batch);
+	/* Replace the reloc entry with the GPU commands */
+	batch = memset(reloc, 0, 32);
+	if (presumed == target->node.start)
+		return false;
 
-	addr = gen8_canonical_addr(vma->node.start + offset);
 	if (gen >= 8) {
-		if (offset & 7) {
+		u64 addr = gen8_canonical_addr(ev->vma->node.start + offset);
+
+		if (addr & 7) {
 			*batch++ = MI_STORE_DWORD_IMM_GEN4;
 			*batch++ = lower_32_bits(addr);
 			*batch++ = upper_32_bits(addr);
@@ -2227,107 +2020,64 @@ static int __reloc_entry_gpu(struct i915_execbuffer *eb,
 	} else if (gen >= 6) {
 		*batch++ = MI_STORE_DWORD_IMM_GEN4;
 		*batch++ = 0;
-		*batch++ = addr;
+		*batch++ = ev->vma->node.start + offset;
 		*batch++ = target_addr;
 	} else if (IS_I965G(eb->i915)) {
 		*batch++ = MI_STORE_DWORD_IMM_GEN4;
 		*batch++ = 0;
-		*batch++ = vma_phys_addr(vma, offset);
+		*batch++ = vma_phys_addr(ev->vma, offset);
 		*batch++ = target_addr;
 	} else if (gen >= 4) {
 		*batch++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
 		*batch++ = 0;
-		*batch++ = addr;
+		*batch++ = ev->vma->node.start + offset;
 		*batch++ = target_addr;
-	} else if (gen >= 3 &&
-		   !(IS_I915G(eb->i915) || IS_I915GM(eb->i915))) {
+	} else if (gen >= 3 && !(IS_I915G(eb->i915) || IS_I915GM(eb->i915))) {
 		*batch++ = MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL;
-		*batch++ = addr;
+		*batch++ = ev->vma->node.start + offset;
 		*batch++ = target_addr;
 	} else {
 		*batch++ = MI_STORE_DWORD_IMM;
-		*batch++ = vma_phys_addr(vma, offset);
+		*batch++ = vma_phys_addr(ev->vma, offset);
 		*batch++ = target_addr;
 	}
 
-	return 0;
-}
-
-static u64
-relocate_entry(struct i915_execbuffer *eb,
-	       struct i915_vma *vma,
-	       const struct drm_i915_gem_relocation_entry *reloc,
-	       const struct i915_vma *target)
-{
-	u64 target_addr = relocation_target(reloc, target);
-	int err;
-
-	err = __reloc_entry_gpu(eb, vma, reloc->offset, target_addr);
-	if (err)
-		return err;
-
-	return target->node.start | UPDATE;
-}
-
-static int gen6_fixup_ggtt(struct i915_vma *vma)
-{
-	int err;
-
-	if (i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND))
-		return 0;
-
-	err = i915_vma_wait_for_bind(vma);
-	if (err)
-		return err;
-
-	mutex_lock(&vma->vm->mutex);
-	if (!(atomic_fetch_or(I915_VMA_GLOBAL_BIND, &vma->flags) & I915_VMA_GLOBAL_BIND)) {
-		__i915_gem_object_pin_pages(vma->obj);
-		vma->ops->bind_vma(vma->vm, NULL, vma,
-				   vma->obj->cache_level,
-				   I915_VMA_GLOBAL_BIND);
-	}
-	mutex_unlock(&vma->vm->mutex);
-
-	return 0;
+	return true;
 }
 
-static u64
-eb_relocate_entry(struct i915_execbuffer *eb,
-		  struct eb_vma *ev,
-		  const struct drm_i915_gem_relocation_entry *reloc)
+static int
+eb_relocs_check_entry(struct i915_execbuffer *eb,
+		      struct eb_vma *ev,
+		      const struct drm_i915_gem_relocation_entry *reloc)
 {
 	struct drm_i915_private *i915 = eb->i915;
 	struct eb_vma *target;
-	int err;
 
 	/* we already hold a reference to all valid objects */
 	target = eb_get_vma(eb, reloc->target_handle);
 	if (unlikely(!target))
 		return -ENOENT;
 
-	GEM_BUG_ON(!i915_vma_is_pinned(target->vma));
-
 	/* Validate that the target is in a valid r/w GPU domain */
 	if (unlikely(reloc->write_domain & (reloc->write_domain - 1))) {
 		drm_dbg(&i915->drm, "reloc with multiple write domains: "
-			  "target %d offset %d "
-			  "read %08x write %08x",
-			  reloc->target_handle,
-			  (int) reloc->offset,
-			  reloc->read_domains,
-			  reloc->write_domain);
+			"target %d offset %llu "
+			"read %08x write %08x",
+			reloc->target_handle,
+			reloc->offset,
+			reloc->read_domains,
+			reloc->write_domain);
 		return -EINVAL;
 	}
 	if (unlikely((reloc->write_domain | reloc->read_domains)
 		     & ~I915_GEM_GPU_DOMAINS)) {
 		drm_dbg(&i915->drm, "reloc with read/write non-GPU domains: "
-			  "target %d offset %d "
-			  "read %08x write %08x",
-			  reloc->target_handle,
-			  (int) reloc->offset,
-			  reloc->read_domains,
-			  reloc->write_domain);
+			"target %d offset %llu "
+			"read %08x write %08x",
+			reloc->target_handle,
+			reloc->offset,
+			reloc->read_domains,
+			reloc->write_domain);
 		return -EINVAL;
 	}
 
@@ -2341,130 +2091,313 @@ eb_relocate_entry(struct i915_execbuffer *eb,
 		 * batchbuffers.
 		 */
 		if (reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION &&
-		    IS_GEN(eb->i915, 6)) {
-			err = gen6_fixup_ggtt(target->vma);
-			if (err)
-				return err;
-		}
+		    IS_GEN(eb->i915, 6))
+			target->flags |= EXEC_OBJECT_NEEDS_GTT;
 	}
 
-	/*
-	 * If the relocation already has the right value in it, no
-	 * more work needs to be done.
-	 */
-	if (gen8_canonical_addr(target->vma->node.start) == reloc->presumed_offset)
-		return 0;
-
 	/* Check that the relocation address is valid... */
 	if (unlikely(reloc->offset >
 		     ev->vma->size - (eb->reloc_cache.use_64bit_reloc ? 8 : 4))) {
 		drm_dbg(&i915->drm, "Relocation beyond object bounds: "
-			  "target %d offset %d size %d.\n",
-			  reloc->target_handle,
-			  (int)reloc->offset,
-			  (int)ev->vma->size);
+			"target %d offset %llu size %llu.\n",
+			reloc->target_handle,
+			reloc->offset,
+			ev->vma->size);
 		return -EINVAL;
 	}
 	if (unlikely(reloc->offset & 3)) {
 		drm_dbg(&i915->drm, "Relocation not 4-byte aligned: "
-			  "target %d offset %d.\n",
-			  reloc->target_handle,
-			  (int)reloc->offset);
+			"target %d offset %llu.\n",
+			reloc->target_handle,
+			reloc->offset);
 		return -EINVAL;
 	}
 
-	/*
-	 * If we write into the object, we need to force the synchronisation
-	 * barrier, either with an asynchronous clflush or if we executed the
-	 * patching using the GPU (though that should be serialised by the
-	 * timeline). To be completely sure, and since we are required to
-	 * do relocations we are already stalling, disable the user's opt
-	 * out of our synchronisation.
-	 */
-	ev->flags &= ~EXEC_OBJECT_ASYNC;
+	return 0;
+}
+
+static struct drm_i915_gem_relocation_entry *
+eb_relocs_grow(struct i915_execbuffer *eb, unsigned long *count)
+{
+#define RELOC_SZ SZ_64K
+#define N_RELOC (RELOC_SZ / sizeof(struct drm_i915_gem_relocation_entry) - 1)
+	struct reloc_cache *c = &eb->reloc_cache;
+	struct drm_i915_gem_relocation_entry *r;
+	unsigned long remain;
+
+	remain = N_RELOC - c->pos;
+	if (remain == 0) {
+		struct drm_i915_gem_object *obj;
+		struct i915_vma *vma;
+		struct eb_vma *ev;
+
+		obj = i915_gem_object_create_internal(eb->i915, RELOC_SZ);
+		if (IS_ERR(obj))
+			return ERR_CAST(obj);
+
+		if (c->gen >= 6)
+			i915_gem_object_set_cache_coherency(obj,
+							    I915_CACHE_LLC);
+
+		vma = i915_vma_instance(obj, eb->context->vm, NULL);
+		if (IS_ERR(vma)) {
+			i915_gem_object_put(obj);
+			return ERR_CAST(vma);
+		}
+
+		ev = kzalloc(sizeof(*ev), GFP_KERNEL);
+		if (!ev) {
+			i915_gem_object_put(obj);
+			return ERR_PTR(-ENOMEM);
+		}
+
+		vma->private = ev;
+		ev->vma = vma;
+		ev->exec = &no_entry;
+		ev->flags = EXEC_OBJECT_SUPPORTS_48B_ADDRESS;
+		list_add(&ev->bind_link, &eb->bind_list);
+		list_add(&ev->reloc_link, &eb->array->aux_list);
+
+		if (!c->head.vma) {
+			c->head.vma = vma;
+		} else {
+			struct eb_relocs_link *link;
+
+			link = (struct eb_relocs_link *)(c->map + c->pos);
+			link->vma = vma;
+		}
+
+		c->map = i915_gem_object_pin_map(obj, I915_MAP_WB);
+		c->pos = 0;
+
+		remain = N_RELOC;
+	}
+	*count = min(remain, *count);
+
+	r = c->map + c->pos;
+	c->pos += *count;
 
-	/* and update the user's relocation entry */
-	return relocate_entry(eb, ev->vma, reloc, target->vma);
+	return r;
 }
 
-static int eb_relocate_vma(struct i915_execbuffer *eb, struct eb_vma *ev)
+static int eb_relocs_copy_vma(struct i915_execbuffer *eb, struct eb_vma *ev)
 {
-#define N_RELOC(x) ((x) / sizeof(struct drm_i915_gem_relocation_entry))
-	struct drm_i915_gem_relocation_entry stack[N_RELOC(512)];
 	const struct drm_i915_gem_exec_object2 *entry = ev->exec;
 	struct drm_i915_gem_relocation_entry __user *urelocs =
 		u64_to_user_ptr(entry->relocs_ptr);
 	unsigned long remain = entry->relocation_count;
 
-	if (unlikely(remain > N_RELOC(ULONG_MAX)))
+	if (unlikely(remain > ULONG_MAX / sizeof(*urelocs)))
 		return -EINVAL;
 
-	/*
-	 * We must check that the entire relocation array is safe
-	 * to read. However, if the array is not writable the user loses
-	 * the updated relocation values.
-	 */
-	if (unlikely(!access_ok(urelocs, remain * sizeof(*urelocs))))
-		return -EFAULT;
-
 	do {
-		struct drm_i915_gem_relocation_entry *r = stack;
-		unsigned int count =
-			min_t(unsigned long, remain, ARRAY_SIZE(stack));
-		unsigned int copied;
+		struct drm_i915_gem_relocation_entry *r;
+		unsigned long count = remain;
+		int err;
 
-		/*
-		 * This is the fast path and we cannot handle a pagefault
-		 * whilst holding the struct mutex lest the user pass in the
-		 * relocations contained within a mmaped bo. For in such a case
-		 * we, the page fault handler would call i915_gem_fault() and
-		 * we would try to acquire the struct mutex again. Obviously
-		 * this is bad and so lockdep complains vehemently.
-		 */
-		copied = __copy_from_user(r, urelocs, count * sizeof(r[0]));
-		if (unlikely(copied))
+		r = eb_relocs_grow(eb, &count);
+		if (IS_ERR(r))
+			return PTR_ERR(r);
+
+		if (unlikely(copy_from_user(r, urelocs, count * sizeof(r[0]))))
 			return -EFAULT;
 
 		remain -= count;
-		do {
-			u64 offset = eb_relocate_entry(eb, ev, r);
+		urelocs += count;
 
-			if (likely(offset == 0)) {
-			} else if ((s64)offset < 0) {
-				return (int)offset;
-			} else {
-				/*
-				 * Note that reporting an error now
-				 * leaves everything in an inconsistent
-				 * state as we have *already* changed
-				 * the relocation value inside the
-				 * object. As we have not changed the
-				 * reloc.presumed_offset or will not
-				 * change the execobject.offset, on the
-				 * call we may not rewrite the value
-				 * inside the object, leaving it
-				 * dangling and causing a GPU hang. Unless
-				 * userspace dynamically rebuilds the
-				 * relocations on each execbuf rather than
-				 * presume a static tree.
-				 *
-				 * We did previously check if the relocations
-				 * were writable (access_ok), an error now
-				 * would be a strange race with mprotect,
-				 * having already demonstrated that we
-				 * can read from this userspace address.
-				 */
-				offset = gen8_canonical_addr(offset & ~UPDATE);
-				__put_user(offset,
-					   &urelocs[r - stack].presumed_offset);
-			}
-		} while (r++, --count);
-		urelocs += ARRAY_SIZE(stack);
+		do {
+			err = eb_relocs_check_entry(eb, ev, r++);
+			if (err)
+				return err;
+		} while (--count);
 	} while (remain);
 
 	return 0;
 }
 
+static int eb_relocs_copy_user(struct i915_execbuffer *eb)
+{
+	struct eb_vma *ev;
+	int err;
+
+	/* Drop everything before we copy_from_user */
+	list_for_each_entry(ev, &eb->bind_list, bind_link)
+		eb_unreserve_vma(ev);
+
+	eb->reloc_cache.head.vma = NULL;
+	eb->reloc_cache.pos = N_RELOC;
+
+	list_for_each_entry(ev, &eb->relocs, reloc_link) {
+		err = eb_relocs_copy_vma(eb, ev);
+		if (err)
+			return err;
+	}
+
+	/* Now reacquire everything, including the extra reloc bo */
+	return eb_reserve_vm(eb);
+}
+
+static struct drm_i915_gem_relocation_entry *
+get_gpu_relocs(struct i915_execbuffer *eb,
+	       struct i915_request *rq,
+	       unsigned long *count)
+{
+	struct reloc_cache *c = &eb->reloc_cache;
+	struct drm_i915_gem_relocation_entry *r;
+	unsigned long remain;
+
+	remain = N_RELOC - c->pos;
+	if (remain == 0) {
+		struct eb_relocs_link link;
+		const int gen = c->gen;
+		u32 *cs;
+
+		link = *(struct eb_relocs_link *)(c->map + c->pos);
+
+		cs = memset(c->map + c->pos, 0, 32);
+		*cs++ = MI_ARB_CHECK;
+		if (gen >= 8)
+			*cs++ = MI_BATCH_BUFFER_START_GEN8;
+		else if (gen >= 6)
+			*cs++ = MI_BATCH_BUFFER_START;
+		else
+			*cs++ = MI_BATCH_BUFFER_START | MI_BATCH_GTT;
+		*cs++ = lower_32_bits(link.vma->node.start);
+		*cs++ = upper_32_bits(link.vma->node.start);
+		i915_gem_object_flush_map(c->head.vma->obj);
+		i915_gem_object_unpin_map(c->head.vma->obj);
+
+		c->head = link;
+		c->map = NULL;
+	}
+
+	if (!c->map) {
+		struct i915_vma *vma = c->head.vma;
+		int err;
+
+		i915_vma_lock(vma);
+		err = i915_request_await_object(rq, vma->obj, false);
+		if (err == 0)
+			err = i915_vma_move_to_active(vma, rq, 0);
+		i915_vma_unlock(vma);
+		if (err)
+			return ERR_PTR(err);
+
+		c->map = page_mask_bits(vma->obj->mm.mapping);
+		c->pos = 0;
+
+		remain = N_RELOC;
+	}
+
+	*count = min(remain, *count);
+
+	r = c->map + c->pos;
+	c->pos += *count;
+
+	return r;
+}
+
+static int eb_relocs_gpu_vma(struct i915_execbuffer *eb,
+			     struct i915_request *rq,
+			     struct eb_vma *ev)
+{
+	const struct drm_i915_gem_exec_object2 *entry = ev->exec;
+	unsigned long remain = entry->relocation_count;
+	bool write = false;
+	int err = 0;
+
+	do {
+		struct drm_i915_gem_relocation_entry *r;
+		unsigned long count = remain;
+
+		r = get_gpu_relocs(eb, rq, &count);
+		if (IS_ERR(r))
+			return PTR_ERR(r);
+
+		remain -= count;
+		do {
+			write |= eb_relocs_vma_entry(eb, ev, r++);
+		} while (--count);
+	} while (remain);
+
+	if (write)
+		err = reloc_move_to_gpu(rq, ev->vma);
+
+	return err;
+}
+
+static struct i915_request *reloc_gpu_alloc(struct i915_execbuffer *eb)
+{
+	struct reloc_cache *cache = &eb->reloc_cache;
+	struct i915_request *rq;
+
+	if (cache->ce == eb->context)
+		rq = __i915_request_create(cache->ce, GFP_KERNEL);
+	else
+		rq = nested_request_create(cache->ce);
+	if (IS_ERR(rq))
+		return rq;
+
+	rq->cookie = lockdep_pin_lock(&i915_request_timeline(rq)->mutex);
+	return rq;
+}
+
+static int eb_relocs_gpu(struct i915_execbuffer *eb)
+{
+	struct i915_request *rq;
+	struct eb_vma *ev;
+	int err;
+
+	rq = reloc_gpu_alloc(eb);
+	if (IS_ERR(rq))
+		return PTR_ERR(rq);
+
+	rq->batch = eb->reloc_cache.head.vma;
+
+	eb->reloc_cache.map = NULL;
+	eb->reloc_cache.pos = 0;
+
+	err = 0;
+	list_for_each_entry(ev, &eb->relocs, reloc_link) {
+		err = eb_relocs_gpu_vma(eb, rq, ev);
+		if (err)
+			break;
+	}
+
+	return reloc_gpu_flush(eb, rq, err);
+}
+
+static void eb_relocs_update_vma(struct i915_execbuffer *eb, struct eb_vma *ev)
+{
+	const struct drm_i915_gem_exec_object2 *entry = ev->exec;
+	struct drm_i915_gem_relocation_entry __user *urelocs =
+		u64_to_user_ptr(entry->relocs_ptr);
+	unsigned long count = entry->relocation_count;
+
+	do {
+		u32 handle;
+
+		if (get_user(handle, &urelocs->target_handle) == 0) {
+			struct i915_vma *vma = eb_get_vma(eb, handle)->vma;
+			u64 offset = gen8_canonical_addr(vma->node.start);
+
+			if (put_user(offset, &urelocs->presumed_offset))
+				return;
+		}
+	} while (urelocs++, --count);
+}
+
+static void eb_relocs_update_user(struct i915_execbuffer *eb)
+{
+	struct eb_vma *ev;
+
+	if (!(eb->args->flags & __EXEC_HAS_RELOC))
+		return;
+
+	list_for_each_entry(ev, &eb->relocs, reloc_link)
+		eb_relocs_update_vma(eb, ev);
+}
+
 static int eb_relocate(struct i915_execbuffer *eb)
 {
 	int err;
@@ -2486,22 +2419,17 @@ static int eb_relocate(struct i915_execbuffer *eb)
 		return err;
 
 	/* The objects are in their final locations, apply the relocations. */
-	if (eb->args->flags & __EXEC_HAS_RELOC) {
-		struct eb_vma *ev;
-		int flush;
-
-		list_for_each_entry(ev, &eb->relocs, reloc_link) {
-			err = eb_relocate_vma(eb, ev);
-			if (err)
-				break;
-		}
+	if (eb->args->flags & __EXEC_HAS_RELOC && !list_empty(&eb->relocs)) {
+		err = eb_relocs_copy_user(eb);
+		if (err)
+			return err;
 
-		flush = reloc_gpu_flush(eb);
-		if (!err)
-			err = flush;
+		err = eb_relocs_gpu(eb);
+		if (err)
+			return err;
 	}
 
-	return err;
+	return 0;
 }
 
 static int eb_move_to_gpu(struct i915_execbuffer *eb)
@@ -3398,9 +3326,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (err)
 		goto err_vma;
 
-	/* All GPU relocation batches must be submitted prior to the user rq */
-	GEM_BUG_ON(eb.reloc_cache.rq);
-
 	/* Allocate a request for this batch buffer nice and early. */
 	eb.request = __i915_request_create(eb.context, GFP_KERNEL);
 	if (IS_ERR(eb.request)) {
@@ -3472,6 +3397,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 err_vma:
 	if (eb.parser.shadow)
 		intel_gt_buffer_pool_put(eb.parser.shadow->vma->private);
+	eb_relocs_update_user(&eb);
 	eb_unpin_engine(&eb);
 err_context:
 	i915_gem_context_put(eb.gem_context);
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
index 992d46db1b33..940090753949 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
@@ -11,6 +11,7 @@
 
 #include "mock_context.h"
 
+#if 0
 static u64 read_reloc(const u32 *map, int x, const u64 mask)
 {
 	u64 reloc;
@@ -18,10 +19,12 @@ static u64 read_reloc(const u32 *map, int x, const u64 mask)
 	memcpy(&reloc, &map[x], sizeof(reloc));
 	return reloc & mask;
 }
+#endif
 
 static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 			   struct drm_i915_gem_object *obj)
 {
+#if 0
 	const unsigned int offsets[] = { 8, 3, 0 };
 	const u64 mask =
 		GENMASK_ULL(eb->reloc_cache.use_64bit_reloc ? 63 : 31, 0);
@@ -97,6 +100,9 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 unpin_vma:
 	i915_vma_unpin(vma);
 	return err;
+#else
+	return 0;
+#endif
 }
 
 static int igt_gpu_reloc(void *arg)
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Intel-gfx] [PATCH 22/23] drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.
  2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
                   ` (20 preceding siblings ...)
  (?)
@ 2020-07-02  8:32 ` Chris Wilson
  2020-07-02 22:32   ` kernel test robot
  -1 siblings, 1 reply; 56+ messages in thread
From: Chris Wilson @ 2020-07-02  8:32 UTC (permalink / raw)
  To: intel-gfx

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

i915_gem_ww_ctx is used to lock all GEM BOs for pinning and memory
eviction. We don't use it yet, but let's start by adding the definition
first.

To use it, we have to pass a non-NULL ww to gem_object_lock and must not
unlock directly; the unlock of everything acquired is done in
i915_gem_ww_ctx_fini.
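
The expected calling pattern is the usual ww acquire/backoff loop. A
minimal sketch using the helpers added below (illustrative only; obj_a
and obj_b are placeholder objects and error handling is trimmed):

	struct i915_acquire_ctx acquire;
	int err;

	i915_acquire_ctx_init(&acquire);
retry:
	err = i915_acquire_ctx_lock(&acquire, obj_a);
	if (!err)
		err = i915_acquire_ctx_lock(&acquire, obj_b);
	if (err == -EDEADLK) {
		/* drop every held lock, then sleep on the contended one */
		err = i915_acquire_ctx_backoff(&acquire);
		if (!err)
			goto retry;
	}
	if (!err)
		i915_acquire_ctx_done(&acquire); /* no further locks will be taken */

	/* ... use the locked objects ... */

	i915_acquire_ctx_fini(&acquire); /* unlocks everything still held */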

Changes since v1:
- Change ww_ctx and obj order in locking functions (Joonas Lahtinen)

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/Makefile                 |   4 +
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |   8 +-
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   9 ++
 drivers/gpu/drm/i915/mm/i915_acquire_ctx.c    | 110 ++++++++++++++++++
 drivers/gpu/drm/i915/mm/i915_acquire_ctx.h    |  34 ++++++
 5 files changed, 161 insertions(+), 4 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/mm/i915_acquire_ctx.c
 create mode 100644 drivers/gpu/drm/i915/mm/i915_acquire_ctx.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 41a27fd5dbc7..33c85b4ff3ed 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -124,6 +124,10 @@ gt-y += \
 	gt/gen9_renderstate.o
 i915-y += $(gt-y)
 
+# Memory + DMA management
+i915-y += \
+	mm/i915_acquire_ctx.o
+
 # GEM (Graphics Execution Management) code
 gem-y += \
 	gem/i915_gem_busy.o \
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 2faa481cc18f..8b4a341ebc56 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -110,14 +110,14 @@ i915_gem_object_put(struct drm_i915_gem_object *obj)
 
 #define assert_object_held(obj) dma_resv_assert_held((obj)->base.resv)
 
-static inline void i915_gem_object_lock(struct drm_i915_gem_object *obj)
+static inline bool i915_gem_object_trylock(struct drm_i915_gem_object *obj)
 {
-	dma_resv_lock(obj->base.resv, NULL);
+	return dma_resv_trylock(obj->base.resv);
 }
 
-static inline bool i915_gem_object_trylock(struct drm_i915_gem_object *obj)
+static inline void i915_gem_object_lock(struct drm_i915_gem_object *obj)
 {
-	return dma_resv_trylock(obj->base.resv);
+	dma_resv_lock(obj->base.resv, NULL);
 }
 
 static inline int
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index d0847d7896f9..80b2cdd3875f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -206,6 +206,15 @@ struct drm_i915_gem_object {
 		 */
 		struct list_head region_link;
 
+		/**
+		 * @acquire_link: Link into @i915_acquire_ctx.list
+		 *
+		 * When we lock this object through i915_acquire_ctx_lock(), we add it
+		 * to the list to ensure we can unlock everything when
+		 * i915_acquire_ctx_backoff() or i915_acquire_ctx_fini() is called.
+		 */
+		struct list_head acquire_link;
+
 		struct sg_table *pages;
 		void *mapping;
 
diff --git a/drivers/gpu/drm/i915/mm/i915_acquire_ctx.c b/drivers/gpu/drm/i915/mm/i915_acquire_ctx.c
new file mode 100644
index 000000000000..7e8771d8a711
--- /dev/null
+++ b/drivers/gpu/drm/i915/mm/i915_acquire_ctx.c
@@ -0,0 +1,110 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#include <linux/dma-resv.h>
+
+#include "gem/i915_gem_object.h"
+
+#include "i915_acquire_ctx.h"
+
+void i915_acquire_ctx_init(struct i915_acquire_ctx *acquire)
+{
+	ww_acquire_init(&acquire->ctx, &reservation_ww_class);
+	INIT_LIST_HEAD(&acquire->list);
+	acquire->contended = NULL;
+}
+
+int i915_acquire_ctx_lock(struct i915_acquire_ctx *acquire,
+			  struct drm_i915_gem_object *obj)
+{
+	int ret;
+
+	ret = dma_resv_lock_interruptible(obj->base.resv, &acquire->ctx);
+	if (!ret)
+		list_add_tail(&obj->mm.acquire_link, &acquire->list);
+	if (ret == -EALREADY)
+		ret = 0;
+	if (ret == -EDEADLK)
+		acquire->contended = obj;
+
+	return ret;
+}
+
+static void i915_acquire_ctx_unlock_all(struct i915_acquire_ctx *acquire)
+{
+	struct drm_i915_gem_object *obj, *on;
+
+	list_for_each_entry_safe(obj, on, &acquire->list, mm.acquire_link)
+		i915_gem_object_unlock(obj);
+	INIT_LIST_HEAD(&acquire->list);
+}
+
+int __must_check i915_acquire_ctx_backoff(struct i915_acquire_ctx *acquire)
+{
+	struct drm_i915_gem_object *obj;
+	int ret = 0;
+
+	GEM_BUG_ON(!acquire->contended);
+
+	i915_acquire_ctx_unlock_all(acquire);
+
+	obj = fetch_and_zero(&acquire->contended);
+	ret = dma_resv_lock_slow_interruptible(obj->base.resv, &acquire->ctx);
+	if (!ret)
+		list_add_tail(&obj->mm.acquire_link, &acquire->list);
+
+	return ret;
+}
+
+void i915_acquire_ctx_fini(struct i915_acquire_ctx *acquire)
+{
+	GEM_BUG_ON(acquire->contended);
+
+	i915_acquire_ctx_unlock_all(acquire);
+	ww_acquire_fini(&acquire->ctx);
+}
+
+#if 0
+static int igt_acquire_ctx(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct drm_i915_gem_object *obj, *obj2;
+	struct i915_acquire_ctx acquire;
+	int err = 0;
+
+	obj = i915_gem_object_create_internal(i915, PAGE_SIZE);
+	if (IS_ERR(obj))
+		return PTR_ERR(obj);
+
+	obj2 = i915_gem_object_create_internal(i915, PAGE_SIZE);
+	if (IS_ERR(obj2)) {
+		err = PTR_ERR(obj2);
+		goto put1;
+	}
+
+	i915_acquire_ctx_init(&acquire);
+retry:
+	/* Lock the objects, twice for good measure (-EALREADY handling) */
+	err = i915_acquire_ctx_lock(&acquire, obj);
+	if (!err)
+		err = i915_acquire_ctx_lock(&acquire, obj);
+	if (!err)
+		err = i915_acquire_ctx_lock(&acquire, obj2);
+	if (!err)
+		err = i915_acquire_ctx_lock(&acquire, obj2);
+
+	if (err == -EDEADLK) {
+		err = i915_acquire_ctx_backoff(&acquire);
+		if (!err)
+			goto retry;
+	}
+	i915_acquire_ctx_fini(&acquire);
+	i915_gem_object_put(obj2);
+put1:
+	i915_gem_object_put(obj);
+	return err;
+}
+
+#endif
diff --git a/drivers/gpu/drm/i915/mm/i915_acquire_ctx.h b/drivers/gpu/drm/i915/mm/i915_acquire_ctx.h
new file mode 100644
index 000000000000..71cd9373c4fe
--- /dev/null
+++ b/drivers/gpu/drm/i915/mm/i915_acquire_ctx.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#ifndef __I915_ACQUIRE_CTX_H__
+#define __I915_ACQUIRE_CTX_H__
+
+#include <linux/list.h>
+#include <linux/ww_mutex.h>
+
+struct drm_i915_gem_object;
+
+struct i915_acquire_ctx {
+	struct ww_acquire_ctx ctx;
+	struct list_head list;
+	struct drm_i915_gem_object *contended;
+};
+
+void i915_acquire_ctx_init(struct i915_acquire_ctx *acquire);
+
+static inline void i915_acquire_ctx_done(struct i915_acquire_ctx *acquire)
+{
+	ww_acquire_done(&acquire->ctx);
+}
+
+void i915_acquire_ctx_fini(struct i915_acquire_ctx *acquire);
+
+int i915_acquire_ctx_lock(struct i915_acquire_ctx *acquire,
+			  struct drm_i915_gem_object *obj);
+
+int __must_check i915_acquire_ctx_backoff(struct i915_acquire_ctx *acquire);
+
+#endif /* __I915_ACQUIRE_CTX_H__ */
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Intel-gfx] [PATCH 23/23] drm/i915/gem: Pull execbuf dma resv under a single critical section
  2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
                   ` (21 preceding siblings ...)
  (?)
@ 2020-07-02  8:32 ` Chris Wilson
  -1 siblings, 0 replies; 56+ messages in thread
From: Chris Wilson @ 2020-07-02  8:32 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Acquire all the objects, their backing storage and the page directories
used by execbuf under a single common ww acquire context. We do,
however, have to restart the critical section a few times in order to
handle various restrictions (such as avoiding copy_(from|to)_user and
mmap_sem while the reservations are held).
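
In outline, i915_gem_do_execbuffer() now brackets everything with one
acquire context. A simplified sketch of the flow implemented below
(error paths and the temporary fini/init around the relocation
copy_from_user are omitted):

	i915_acquire_ctx_init(&eb.acquire);

	err = eb_relocate(&eb);		/* reserves and locks every object */
	if (err)
		goto err_vma;

	i915_acquire_ctx_done(&eb.acquire);	/* no further locks will be taken */

	/* parse the batch and move everything to the GPU while locked */

err_vma:
	i915_acquire_ctx_fini(&eb.acquire);	/* drops all held dma-resv locks */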

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 146 ++++++++----------
 1 file changed, 64 insertions(+), 82 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index c325aed82629..22b382053cdc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -20,6 +20,7 @@
 #include "gt/intel_gt_pm.h"
 #include "gt/intel_gt_requests.h"
 #include "gt/intel_ring.h"
+#include "mm/i915_acquire_ctx.h"
 
 #include "i915_drv.h"
 #include "i915_gem_clflush.h"
@@ -245,6 +246,8 @@ struct i915_execbuffer {
 	struct intel_context *context; /* logical state for the request */
 	struct i915_gem_context *gem_context; /** caller's context */
 
+	struct i915_acquire_ctx acquire; /** lock for _all_ DMA reservations */
+
 	struct i915_request *request; /** our request to build */
 	struct eb_vma *batch; /** identity of the batch obj/vma */
 
@@ -385,42 +388,6 @@ static void eb_vma_array_put(struct eb_vma_array *arr)
 	kref_put(&arr->kref, eb_vma_array_destroy);
 }
 
-static int
-eb_lock_vma(struct i915_execbuffer *eb, struct ww_acquire_ctx *acquire)
-{
-	struct eb_vma *ev;
-	int err = 0;
-
-	list_for_each_entry(ev, &eb->submit_list, submit_link) {
-		struct i915_vma *vma = ev->vma;
-
-		err = ww_mutex_lock_interruptible(&vma->resv->lock, acquire);
-		if (err == -EDEADLK) {
-			struct eb_vma *unlock = ev, *en;
-
-			list_for_each_entry_safe_continue_reverse(unlock, en,
-								  &eb->submit_list,
-								  submit_link) {
-				ww_mutex_unlock(&unlock->vma->resv->lock);
-				list_move_tail(&unlock->submit_link, &eb->submit_list);
-			}
-
-			GEM_BUG_ON(!list_is_first(&ev->submit_link, &eb->submit_list));
-			err = ww_mutex_lock_slow_interruptible(&vma->resv->lock,
-							       acquire);
-		}
-		if (err) {
-			list_for_each_entry_continue_reverse(ev,
-							     &eb->submit_list,
-							     submit_link)
-				ww_mutex_unlock(&ev->vma->resv->lock);
-			break;
-		}
-	}
-
-	return err;
-}
-
 static int eb_create(struct i915_execbuffer *eb)
 {
 	/* Allocate an extra slot for use by the sentinel */
@@ -661,6 +628,31 @@ eb_add_vma(struct i915_execbuffer *eb,
 	}
 }
 
+static int eb_reserve_mm(struct i915_execbuffer *eb)
+{
+	struct eb_vma *ev;
+	int err;
+
+	list_for_each_entry(ev, &eb->bind_list, bind_link) {
+		err = i915_acquire_ctx_lock(&eb->acquire, ev->vma->obj);
+		if (err == -EDEADLK) {
+			struct eb_vma *unlock = ev, *en;
+
+			list_for_each_entry_safe_continue_reverse(unlock, en,
+								  &eb->bind_list,
+								  bind_link)
+				list_move_tail(&unlock->bind_link,
+					       &eb->bind_list);
+
+			err = i915_acquire_ctx_backoff(&eb->acquire);
+		}
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 struct eb_vm_work {
 	struct dma_fence_work base;
 	struct eb_vma_array *array;
@@ -1338,10 +1330,13 @@ static int eb_prepare_vma(struct eb_vm_work *work,
 	if (ev->flags & __EXEC_OBJECT_NEEDS_MAP)
 		max_size = max_t(u64, max_size, vma->fence_size);
 
+	/* XXX pass eb->acquire to pt_stash for its DMA resv */
 	err = i915_vm_alloc_pt_stash(work->vm, &work->stash, max_size);
+	GEM_BUG_ON(err == -EDEADLK); /* all fresh, no contention */
 	if (err)
 		return err;
 
+	/* XXX just setup vma->pages, holding obj->pages under ww_mutex */
 	return i915_vma_get_pages(vma);
 }
 
@@ -1402,7 +1397,11 @@ static int eb_reserve_vm(struct i915_execbuffer *eb)
 	unsigned long count;
 	unsigned int pass;
 	struct eb_vma *ev;
-	int err = 0;
+	int err;
+
+	err = eb_reserve_mm(eb);
+	if (err)
+		return err;
 
 	count = 0;
 	INIT_LIST_HEAD(&unbound);
@@ -1533,6 +1532,8 @@ static int eb_reserve_vm(struct i915_execbuffer *eb)
 		if (signal_pending(current))
 			return -EINTR;
 
+		i915_acquire_ctx_fini(&eb->acquire);
+
 		/* Now safe to wait with no reservations held */
 
 		if (err == -EAGAIN) {
@@ -1541,6 +1542,9 @@ static int eb_reserve_vm(struct i915_execbuffer *eb)
 		}
 
 		err = wait_for_unbinds(eb, &unbound, pass++);
+		i915_acquire_ctx_init(&eb->acquire);
+		if (err == 0)
+			err = eb_reserve_mm(eb);
 		if (err)
 			return err;
 	} while (1);
@@ -1948,8 +1952,6 @@ static int reloc_move_to_gpu(struct i915_request *rq, struct i915_vma *vma)
 	struct drm_i915_gem_object *obj = vma->obj;
 	int err;
 
-	i915_vma_lock(vma);
-
 	if (obj->cache_dirty & ~obj->cache_coherent)
 		i915_gem_clflush_object(obj, 0);
 	obj->write_domain = 0;
@@ -1958,8 +1960,6 @@ static int reloc_move_to_gpu(struct i915_request *rq, struct i915_vma *vma)
 	if (err == 0)
 		err = i915_vma_move_to_active(vma, rq, EXEC_OBJECT_WRITE);
 
-	i915_vma_unlock(vma);
-
 	return err;
 }
 
@@ -2223,6 +2223,7 @@ static int eb_relocs_copy_user(struct i915_execbuffer *eb)
 	/* Drop everything before we copy_from_user */
 	list_for_each_entry(ev, &eb->bind_list, bind_link)
 		eb_unreserve_vma(ev);
+	i915_acquire_ctx_fini(&eb->acquire);
 
 	eb->reloc_cache.head.vma = NULL;
 	eb->reloc_cache.pos = N_RELOC;
@@ -2230,9 +2231,13 @@ static int eb_relocs_copy_user(struct i915_execbuffer *eb)
 	list_for_each_entry(ev, &eb->relocs, reloc_link) {
 		err = eb_relocs_copy_vma(eb, ev);
 		if (err)
-			return err;
+			break;
 	}
 
+	i915_acquire_ctx_init(&eb->acquire);
+	if (err)
+		return err;
+
 	/* Now reacquire everything, including the extra reloc bo */
 	return eb_reserve_vm(eb);
 }
@@ -2275,11 +2280,9 @@ get_gpu_relocs(struct i915_execbuffer *eb,
 		struct i915_vma *vma = c->head.vma;
 		int err;
 
-		i915_vma_lock(vma);
 		err = i915_request_await_object(rq, vma->obj, false);
 		if (err == 0)
 			err = i915_vma_move_to_active(vma, rq, 0);
-		i915_vma_unlock(vma);
 		if (err)
 			return ERR_PTR(err);
 
@@ -2434,17 +2437,8 @@ static int eb_relocate(struct i915_execbuffer *eb)
 
 static int eb_move_to_gpu(struct i915_execbuffer *eb)
 {
-	struct ww_acquire_ctx acquire;
 	struct eb_vma *ev;
-	int err = 0;
-
-	ww_acquire_init(&acquire, &reservation_ww_class);
-
-	err = eb_lock_vma(eb, &acquire);
-	if (err)
-		goto err_fini;
-
-	ww_acquire_done(&acquire);
+	int err;
 
 	list_for_each_entry(ev, &eb->submit_list, submit_link) {
 		struct i915_vma *vma = ev->vma;
@@ -2481,27 +2475,22 @@ static int eb_move_to_gpu(struct i915_execbuffer *eb)
 				flags &= ~EXEC_OBJECT_ASYNC;
 		}
 
-		if (err == 0 && !(flags & EXEC_OBJECT_ASYNC)) {
+		if (!(flags & EXEC_OBJECT_ASYNC)) {
 			err = i915_request_await_object
 				(eb->request, obj, flags & EXEC_OBJECT_WRITE);
+			if (unlikely(err))
+				goto err_skip;
 		}
 
-		if (err == 0)
-			err = i915_vma_move_to_active(vma, eb->request, flags);
-
-		i915_vma_unlock(vma);
+		err = i915_vma_move_to_active(vma, eb->request, flags);
+		if (unlikely(err))
+			goto err_skip;
 	}
-	ww_acquire_fini(&acquire);
-
-	if (unlikely(err))
-		goto err_skip;
 
 	/* Unconditionally flush any chipset caches (for streaming writes). */
 	intel_gt_chipset_flush(eb->engine->gt);
 	return 0;
 
-err_fini:
-	ww_acquire_fini(&acquire);
 err_skip:
 	i915_request_set_error_once(eb->request, err);
 	return err;
@@ -2653,39 +2642,27 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb,
 	/* Mark active refs early for this worker, in case we get interrupted */
 	err = parser_mark_active(pw, eb->context->timeline);
 	if (err)
-		goto err_commit;
-
-	err = dma_resv_lock_interruptible(pw->batch->resv, NULL);
-	if (err)
-		goto err_commit;
+		goto out;
 
 	err = dma_resv_reserve_shared(pw->batch->resv, 1);
 	if (err)
-		goto err_commit_unlock;
+		goto out;
 
 	/* Wait for all writes (and relocs) into the batch to complete */
 	err = i915_sw_fence_await_reservation(&pw->base.chain,
 					      pw->batch->resv, NULL, false,
 					      0, I915_FENCE_GFP);
 	if (err < 0)
-		goto err_commit_unlock;
+		goto out;
 
 	/* Keep the batch alive and unwritten as we parse */
 	dma_resv_add_shared_fence(pw->batch->resv, &pw->base.dma);
 
-	dma_resv_unlock(pw->batch->resv);
-
 	/* Force execution to wait for completion of the parser */
-	dma_resv_lock(shadow->resv, NULL);
 	dma_resv_add_excl_fence(shadow->resv, &pw->base.dma);
-	dma_resv_unlock(shadow->resv);
 
-	dma_fence_work_commit_imm(&pw->base);
-	return 0;
-
-err_commit_unlock:
-	dma_resv_unlock(pw->batch->resv);
-err_commit:
+	err = 0;
+out:
 	i915_sw_fence_set_error_once(&pw->base.chain, err);
 	dma_fence_work_commit_imm(&pw->base);
 	return err;
@@ -3309,6 +3286,8 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		goto err_context;
 	lockdep_assert_held(&eb.context->timeline->mutex);
 
+	i915_acquire_ctx_init(&eb.acquire);
+
 	err = eb_relocate(&eb);
 	if (err) {
 		/*
@@ -3322,6 +3301,8 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		goto err_vma;
 	}
 
+	i915_acquire_ctx_done(&eb.acquire);
+
 	err = eb_parse(&eb);
 	if (err)
 		goto err_vma;
@@ -3397,6 +3378,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 err_vma:
 	if (eb.parser.shadow)
 		intel_gt_buffer_pool_put(eb.parser.shadow->vma->private);
+	i915_acquire_ctx_fini(&eb.acquire);
 	eb_relocs_update_user(&eb);
 	eb_unpin_engine(&eb);
 err_context:
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/23] drm/i915: Drop vm.ref for duplicate vma on construction
  2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
                   ` (22 preceding siblings ...)
  (?)
@ 2020-07-02  9:17 ` Patchwork
  -1 siblings, 0 replies; 56+ messages in thread
From: Patchwork @ 2020-07-02  9:17 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/23] drm/i915: Drop vm.ref for duplicate vma on construction
URL   : https://patchwork.freedesktop.org/series/79037/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
2b716a4a1108 drm/i915: Drop vm.ref for duplicate vma on construction
02e97324e6dd drm/i915/gem: Split the context's obj:vma lut into its own mutex
-:82: CHECK:UNCOMMENTED_DEFINITION: struct mutex definition without comment
#82: FILE: drivers/gpu/drm/i915/gem/i915_gem_context_types.h:173:
+	struct mutex lut_mutex;

total: 0 errors, 0 warnings, 1 checks, 105 lines checked
981b52482265 drm/i915/gem: Drop forced struct_mutex from shrinker_taints_mutex
63cac79e5aff drm/i915/gem: Only revoke mmap handlers if active
ceb2d68f55a0 drm/i915: Export ppgtt_bind_vma
41eb0c64c9b6 drm/i915: Preallocate stashes for vma page-directories
84369a1c6dad drm/i915: Switch to object allocations for page directories
880e4a90f882 drm/i915/gem: Don't drop the timeline lock during execbuf
84600d1a1d00 drm/i915/gem: Rename execbuf.bind_link to unbound_link
6ce1d70bdaa0 drm/i915/gem: Break apart the early i915_vma_pin from execbuf object lookup
6095d5335d88 drm/i915/gem: Remove the call for no-evict i915_vma_pin
8041afeb3c3a drm/i915: Add list_for_each_entry_safe_continue_reverse
-:21: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'pos' - possible side-effects?
#21: FILE: drivers/gpu/drm/i915/i915_utils.h:269:
+#define list_for_each_entry_safe_continue_reverse(pos, n, head, member)	\
+	for (pos = list_prev_entry(pos, member),			\
+	     n = list_prev_entry(pos, member);				\
+	     &pos->member != (head);					\
+	     pos = n, n = list_prev_entry(n, member))

-:21: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'n' - possible side-effects?
#21: FILE: drivers/gpu/drm/i915/i915_utils.h:269:
+#define list_for_each_entry_safe_continue_reverse(pos, n, head, member)	\
+	for (pos = list_prev_entry(pos, member),			\
+	     n = list_prev_entry(pos, member);				\
+	     &pos->member != (head);					\
+	     pos = n, n = list_prev_entry(n, member))

-:21: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'member' - possible side-effects?
#21: FILE: drivers/gpu/drm/i915/i915_utils.h:269:
+#define list_for_each_entry_safe_continue_reverse(pos, n, head, member)	\
+	for (pos = list_prev_entry(pos, member),			\
+	     n = list_prev_entry(pos, member);				\
+	     &pos->member != (head);					\
+	     pos = n, n = list_prev_entry(n, member))

total: 0 errors, 0 warnings, 3 checks, 12 lines checked
5f2ba176cfcc drm/i915: Always defer fenced work to the worker
3bc8dcef47f8 drm/i915/gem: Assign context id for async work
658265850918 drm/i915: Export a preallocate variant of i915_active_acquire()
3474748efc48 drm/i915/gem: Separate the ww_mutex walker into its own list
9596fd217b0e drm/i915/gem: Asynchronous GTT unbinding
11531ee318da drm/i915/gem: Bind the fence async for execbuf
67d8c52b893a drm/i915/gem: Include cmdparser in common execbuf pinning
c9ad9f9d60dd drm/i915/gem: Include secure batch in common execbuf pinning
db49ec7ea2ce drm/i915/gem: Reintroduce multiple passes for reloc processing
-:993: WARNING:IF_0: Consider removing the code enclosed by this #if 0 and its #endif
#993: FILE: drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c:14:
+#if 0

-:1006: WARNING:IF_0: Consider removing the code enclosed by this #if 0 and its #endif
#1006: FILE: drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c:27:
+#if 0

total: 0 errors, 2 warnings, 0 checks, 969 lines checked
1375d1da0182 drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.
-:78: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#78: 
new file mode 100644

-:151: WARNING:IF_0: Consider removing the code enclosed by this #if 0 and its #endif
#151: FILE: drivers/gpu/drm/i915/mm/i915_acquire_ctx.c:69:
+#if 0

total: 0 errors, 2 warnings, 0 checks, 187 lines checked
6f377648e7f5 drm/i915/gem: Pull execbuf dma resv under a single critical section

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for series starting with [01/23] drm/i915: Drop vm.ref for duplicate vma on construction
  2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
                   ` (23 preceding siblings ...)
  (?)
@ 2020-07-02  9:18 ` Patchwork
  -1 siblings, 0 replies; 56+ messages in thread
From: Patchwork @ 2020-07-02  9:18 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/23] drm/i915: Drop vm.ref for duplicate vma on construction
URL   : https://patchwork.freedesktop.org/series/79037/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.0
Fast mode used, each commit won't be checked separately.
-
+drivers/gpu/drm/i915/display/intel_display.c:1223:22: error: Expected constant expression in case statement
+drivers/gpu/drm/i915/display/intel_display.c:1226:22: error: Expected constant expression in case statement
+drivers/gpu/drm/i915/display/intel_display.c:1229:22: error: Expected constant expression in case statement
+drivers/gpu/drm/i915/display/intel_display.c:1232:22: error: Expected constant expression in case statement
+drivers/gpu/drm/i915/gem/i915_gem_context.c:2274:17: error: bad integer constant expression
+drivers/gpu/drm/i915/gem/i915_gem_context.c:2275:17: error: bad integer constant expression
+drivers/gpu/drm/i915/gem/i915_gem_context.c:2276:17: error: bad integer constant expression
+drivers/gpu/drm/i915/gem/i915_gem_context.c:2277:17: error: bad integer constant expression
+drivers/gpu/drm/i915/gem/i915_gem_context.c:2278:17: error: bad integer constant expression
+drivers/gpu/drm/i915/gem/i915_gem_context.c:2279:17: error: bad integer constant expression
+drivers/gpu/drm/i915/gt/intel_lrc.c:2785:17: error: too long token expansion
+drivers/gpu/drm/i915/gt/intel_lrc.c:2785:17: error: too long token expansion
+drivers/gpu/drm/i915/gt/intel_reset.c:1310:5: warning: context imbalance in 'intel_gt_reset_trylock' - different lock contexts for basic block
+drivers/gpu/drm/i915/gt/sysfs_engines.c:61:10: error: bad integer constant expression
+drivers/gpu/drm/i915/gt/sysfs_engines.c:62:10: error: bad integer constant expression
+drivers/gpu/drm/i915/gt/sysfs_engines.c:66:10: error: bad integer constant expression
+drivers/gpu/drm/i915/gvt/mmio.c:287:23: warning: memcpy with byte count of 279040
+drivers/gpu/drm/i915/i915_perf.c:1425:15: warning: memset with byte count of 16777216
+drivers/gpu/drm/i915/i915_perf.c:1479:15: warning: memset with byte count of 16777216
+drivers/gpu/drm/i915/intel_wakeref.c:137:19: warning: context imbalance in 'wakeref_auto_timeout' - unexpected unlock
+drivers/gpu/drm/i915/selftests/i915_syncmap.c:80:54: warning: dubious: x | !y
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen8_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen8_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen8_write8' - different lock contexts for basic block

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [01/23] drm/i915: Drop vm.ref for duplicate vma on construction
  2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
                   ` (24 preceding siblings ...)
  (?)
@ 2020-07-02  9:40 ` Patchwork
  -1 siblings, 0 replies; 56+ messages in thread
From: Patchwork @ 2020-07-02  9:40 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/23] drm/i915: Drop vm.ref for duplicate vma on construction
URL   : https://patchwork.freedesktop.org/series/79037/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8692 -> Patchwork_18065
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/index.html

Known issues
------------

  Here are the changes found in Patchwork_18065 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_suspend@basic-s3:
    - fi-tgl-u2:          [PASS][1] -> [FAIL][2] ([i915#1888]) +1 similar issue
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/fi-tgl-u2/igt@gem_exec_suspend@basic-s3.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/fi-tgl-u2/igt@gem_exec_suspend@basic-s3.html

  * igt@gem_linear_blits@basic:
    - fi-tgl-u2:          [PASS][3] -> [DMESG-WARN][4] ([i915#402])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/fi-tgl-u2/igt@gem_linear_blits@basic.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/fi-tgl-u2/igt@gem_linear_blits@basic.html

  * igt@i915_pm_backlight@basic-brightness:
    - fi-whl-u:           [PASS][5] -> [DMESG-WARN][6] ([i915#95])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/fi-whl-u/igt@i915_pm_backlight@basic-brightness.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/fi-whl-u/igt@i915_pm_backlight@basic-brightness.html

  * igt@kms_flip@basic-flip-vs-wf_vblank@b-edp1:
    - fi-icl-u2:          [PASS][7] -> [DMESG-WARN][8] ([i915#1982])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/fi-icl-u2/igt@kms_flip@basic-flip-vs-wf_vblank@b-edp1.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/fi-icl-u2/igt@kms_flip@basic-flip-vs-wf_vblank@b-edp1.html

  
#### Possible fixes ####

  * igt@i915_pm_rpm@module-reload:
    - fi-glk-dsi:         [DMESG-WARN][9] ([i915#1982]) -> [PASS][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/fi-glk-dsi/igt@i915_pm_rpm@module-reload.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/fi-glk-dsi/igt@i915_pm_rpm@module-reload.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic:
    - {fi-kbl-7560u}:     [DMESG-WARN][11] ([i915#1982]) -> [PASS][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/fi-kbl-7560u/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/fi-kbl-7560u/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html

  
#### Warnings ####

  * igt@i915_pm_rpm@module-reload:
    - fi-kbl-x1275:       [SKIP][13] ([fdo#109271]) -> [DMESG-FAIL][14] ([i915#62])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/fi-kbl-x1275/igt@i915_pm_rpm@module-reload.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/fi-kbl-x1275/igt@i915_pm_rpm@module-reload.html

  * igt@kms_cursor_legacy@basic-flip-after-cursor-legacy:
    - fi-kbl-x1275:       [DMESG-WARN][15] ([i915#62] / [i915#92] / [i915#95]) -> [DMESG-WARN][16] ([i915#62] / [i915#92]) +1 similar issue
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/fi-kbl-x1275/igt@kms_cursor_legacy@basic-flip-after-cursor-legacy.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/fi-kbl-x1275/igt@kms_cursor_legacy@basic-flip-after-cursor-legacy.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#402]: https://gitlab.freedesktop.org/drm/intel/issues/402
  [i915#62]: https://gitlab.freedesktop.org/drm/intel/issues/62
  [i915#92]: https://gitlab.freedesktop.org/drm/intel/issues/92
  [i915#95]: https://gitlab.freedesktop.org/drm/intel/issues/95


Participating hosts (43 -> 37)
------------------------------

  Missing    (6): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * Linux: CI_DRM_8692 -> Patchwork_18065

  CI-20190529: 20190529
  CI_DRM_8692: e30abe29fd5407631a61d48f93bad5fdeba8080d @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5720: f35053d4b6d7bbcf6505ef67a8bd56acc7fb2eb2 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_18065: 6f377648e7f5e352bf1600ef1f8954ec04781765 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

6f377648e7f5 drm/i915/gem: Pull execbuf dma resv under a single critical section
1375d1da0182 drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.
db49ec7ea2ce drm/i915/gem: Reintroduce multiple passes for reloc processing
c9ad9f9d60dd drm/i915/gem: Include secure batch in common execbuf pinning
67d8c52b893a drm/i915/gem: Include cmdparser in common execbuf pinning
11531ee318da drm/i915/gem: Bind the fence async for execbuf
9596fd217b0e drm/i915/gem: Asynchronous GTT unbinding
3474748efc48 drm/i915/gem: Separate the ww_mutex walker into its own list
658265850918 drm/i915: Export a preallocate variant of i915_active_acquire()
3bc8dcef47f8 drm/i915/gem: Assign context id for async work
5f2ba176cfcc drm/i915: Always defer fenced work to the worker
8041afeb3c3a drm/i915: Add list_for_each_entry_safe_continue_reverse
6095d5335d88 drm/i915/gem: Remove the call for no-evict i915_vma_pin
6ce1d70bdaa0 drm/i915/gem: Break apart the early i915_vma_pin from execbuf object lookup
84600d1a1d00 drm/i915/gem: Rename execbuf.bind_link to unbound_link
880e4a90f882 drm/i915/gem: Don't drop the timeline lock during execbuf
84369a1c6dad drm/i915: Switch to object allocations for page directories
41eb0c64c9b6 drm/i915: Preallocate stashes for vma page-directories
ceb2d68f55a0 drm/i915: Export ppgtt_bind_vma
63cac79e5aff drm/i915/gem: Only revoke mmap handlers if active
981b52482265 drm/i915/gem: Drop forced struct_mutex from shrinker_taints_mutex
02e97324e6dd drm/i915/gem: Split the context's obj:vma lut into its own mutex
2b716a4a1108 drm/i915: Drop vm.ref for duplicate vma on construction

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Intel-gfx] [PATCH 01/23] drm/i915: Drop vm.ref for duplicate vma on construction
  2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
@ 2020-07-02 12:27   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 56+ messages in thread
From: Tvrtko Ursulin @ 2020-07-02 12:27 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: stable


On 02/07/2020 09:32, Chris Wilson wrote:
> As we allow for parallel threads to create vma instances in parallel,
> and we only filter out the duplicates upon reacquiring the spinlock for
> the rbtree, we have to free the loser of the constructors' race. When
> freeing, we should also drop any resource references acquired for the
> redundant vma.
> 
> Fixes: 2850748ef876 ("drm/i915: Pull i915_vma_pin under the vm->mutex")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: <stable@vger.kernel.org> # v5.5+
> ---
>   drivers/gpu/drm/i915/i915_vma.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 1f63c4a1f055..7fe1f317cd2b 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -198,6 +198,7 @@ vma_create(struct drm_i915_gem_object *obj,
>   		cmp = i915_vma_compare(pos, vm, view);
>   		if (cmp == 0) {
>   			spin_unlock(&obj->vma.lock);
> +			i915_vm_put(vm);
>   			i915_vma_free(vma);
>   			return pos;
>   		}
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Intel-gfx] [PATCH 01/23] drm/i915: Drop vm.ref for duplicate vma on construction
@ 2020-07-02 12:27   ` Tvrtko Ursulin
  0 siblings, 0 replies; 56+ messages in thread
From: Tvrtko Ursulin @ 2020-07-02 12:27 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: stable


On 02/07/2020 09:32, Chris Wilson wrote:
> As we allow for parallel threads to create vma instances in parallel,
> and we only filter out the duplicates upon reacquiring the spinlock for
> the rbtree, we have to free the loser of the constructors' race. When
> freeing, we should also drop any resource references acquired for the
> redundant vma.
> 
> Fixes: 2850748ef876 ("drm/i915: Pull i915_vma_pin under the vm->mutex")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: <stable@vger.kernel.org> # v5.5+
> ---
>   drivers/gpu/drm/i915/i915_vma.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 1f63c4a1f055..7fe1f317cd2b 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -198,6 +198,7 @@ vma_create(struct drm_i915_gem_object *obj,
>   		cmp = i915_vma_compare(pos, vm, view);
>   		if (cmp == 0) {
>   			spin_unlock(&obj->vma.lock);
> +			i915_vm_put(vm);
>   			i915_vma_free(vma);
>   			return pos;
>   		}
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Intel-gfx] [PATCH 04/23] drm/i915/gem: Only revoke mmap handlers if active
  2020-07-02  8:32 ` [Intel-gfx] [PATCH 04/23] drm/i915/gem: Only revoke mmap handlers if active Chris Wilson
@ 2020-07-02 12:35   ` Tvrtko Ursulin
  2020-07-02 12:47     ` Chris Wilson
  0 siblings, 1 reply; 56+ messages in thread
From: Tvrtko Ursulin @ 2020-07-02 12:35 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 02/07/2020 09:32, Chris Wilson wrote:
> Avoid waking up the device and taking stale locks if we know that the
> object is not currently mmapped. This is particularly useful as not many
> objects are actually mmapped and so we can destroy them without waking
> the device up, and gives us a little more freedom of workqueue ordering
> during shutdown.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_mman.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> index fe27c5b344e3..522ca4f51b53 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> @@ -516,8 +516,11 @@ void i915_gem_object_release_mmap_offset(struct drm_i915_gem_object *obj)
>    */
>   void i915_gem_object_release_mmap(struct drm_i915_gem_object *obj)
>   {
> -	i915_gem_object_release_mmap_gtt(obj);
> -	i915_gem_object_release_mmap_offset(obj);
> +	if (obj->userfault_count)
> +		i915_gem_object_release_mmap_gtt(obj);
> +
> +	if (!RB_EMPTY_ROOT(&obj->mmo.offsets))
> +		i915_gem_object_release_mmap_offset(obj);
>   }
>   
>   static struct i915_mmap_offset *
> 

Both conditions will need explaining why they are not racy.

First should normally be done under the ggtt->mutex, second under 
obj->mmo.lock.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Intel-gfx] [PATCH 04/23] drm/i915/gem: Only revoke mmap handlers if active
  2020-07-02 12:35   ` Tvrtko Ursulin
@ 2020-07-02 12:47     ` Chris Wilson
  2020-07-02 12:54       ` Chris Wilson
  0 siblings, 1 reply; 56+ messages in thread
From: Chris Wilson @ 2020-07-02 12:47 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-07-02 13:35:41)
> 
> On 02/07/2020 09:32, Chris Wilson wrote:
> > Avoid waking up the device and taking stale locks if we know that the
> > object is not currently mmapped. This is particularly useful as not many
> > objects are actually mmapped and so we can destroy them without waking
> > the device up, and gives us a little more freedom of workqueue ordering
> > during shutdown.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/gem/i915_gem_mman.c | 7 +++++--
> >   1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > index fe27c5b344e3..522ca4f51b53 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > @@ -516,8 +516,11 @@ void i915_gem_object_release_mmap_offset(struct drm_i915_gem_object *obj)
> >    */
> >   void i915_gem_object_release_mmap(struct drm_i915_gem_object *obj)
> >   {
> > -     i915_gem_object_release_mmap_gtt(obj);
> > -     i915_gem_object_release_mmap_offset(obj);
> > +     if (obj->userfault_count)
> > +             i915_gem_object_release_mmap_gtt(obj);
> > +
> > +     if (!RB_EMPTY_ROOT(&obj->mmo.offsets))
> > +             i915_gem_object_release_mmap_offset(obj);
> >   }
> >   
> >   static struct i915_mmap_offset *
> > 
> 
> Both conditions will need explaining why they are not racy.

It's an identical race even if you do take the mutex.

Thread A		Thread B
release_mmap		create_mmap_offset
  mutex_lock/unlock	...
  			mutex_lock/unlock

Thread A will only operate on a snapshot of the current state with or
without the mutex; if Thread B is concurrently adding new mmaps, that
may occur before or after Thread A makes the decision that the object is
clean.
Thread A can only assess the state at that moment in time, and only
cares enough to ensure that from its pov, it has cleared the old
mmaps.

During free, we know there can be no concurrency (refcnt==0) and so the
snapshot is true.
-Chris
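
A minimal, self-contained sketch of the snapshot argument above, using
made-up names rather than the i915 code (sketch_obj, sketch_do_revoke and
the mapped flag are all hypothetical):

	/* Sketch only: an unlocked snapshot gates the lock-taking revoke. */
	#include <linux/compiler.h>
	#include <linux/kref.h>
	#include <linux/types.h>

	struct sketch_obj {
		struct kref ref;
		bool mapped;		/* written under some other lock */
	};

	void sketch_do_revoke(struct sketch_obj *obj);	/* takes the real lock */

	static void sketch_release_mmap(struct sketch_obj *obj)
	{
		/*
		 * A concurrent mapper may run before or after this check;
		 * the caller only needs the mmaps it could see to be gone.
		 * On the free path (refcount already zero) no mapper can
		 * exist, so the unlocked snapshot is exact there.
		 */
		if (READ_ONCE(obj->mapped))
			sketch_do_revoke(obj);
	}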

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Intel-gfx] [PATCH 04/23] drm/i915/gem: Only revoke mmap handlers if active
  2020-07-02 12:47     ` Chris Wilson
@ 2020-07-02 12:54       ` Chris Wilson
  0 siblings, 0 replies; 56+ messages in thread
From: Chris Wilson @ 2020-07-02 12:54 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Chris Wilson (2020-07-02 13:47:00)
> Quoting Tvrtko Ursulin (2020-07-02 13:35:41)
> > 
> > On 02/07/2020 09:32, Chris Wilson wrote:
> > > Avoid waking up the device and taking stale locks if we know that the
> > > object is not currently mmapped. This is particularly useful as not many
> > > objects are actually mmapped and so we can destroy them without waking
> > > the device up, and gives us a little more freedom of workqueue ordering
> > > during shutdown.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > ---
> > >   drivers/gpu/drm/i915/gem/i915_gem_mman.c | 7 +++++--
> > >   1 file changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > index fe27c5b344e3..522ca4f51b53 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > @@ -516,8 +516,11 @@ void i915_gem_object_release_mmap_offset(struct drm_i915_gem_object *obj)
> > >    */
> > >   void i915_gem_object_release_mmap(struct drm_i915_gem_object *obj)
> > >   {
> > > -     i915_gem_object_release_mmap_gtt(obj);
> > > -     i915_gem_object_release_mmap_offset(obj);
> > > +     if (obj->userfault_count)
> > > +             i915_gem_object_release_mmap_gtt(obj);
> > > +
> > > +     if (!RB_EMPTY_ROOT(&obj->mmo.offsets))
> > > +             i915_gem_object_release_mmap_offset(obj);
> > >   }
> > >   
> > >   static struct i915_mmap_offset *
> > > 
> > 
> > Both conditions will need explaining why they are not racy.
> 
> It's an identical race even if you do take the mutex.
> 
> Thread A                Thread B
> release_mmap            create_mmap_offset
>   mutex_lock/unlock     ...
>                         mutex_lock/unlock
> 
> Thread A will only operate on a snapshot of the current state with or
> without the mutex; if Thread B is concurrently adding new mmaps, that
> may occur before or after Thread A makes the decision that the object
> is clean.
> Thread A can only assess the state at that moment in time, and only
> cares enough to ensure that from its pov, it has cleared the old
> mmaps.
> 
> During free, we know there can be no concurrency (refcnt==0) and so the
> snapshot is true.

Beyond the free use case, the serialisation of the individual releases is
coordinated by owning the backing storage operation, i.e. we release
when revoking the vma under the vma->vm->mutex, and the pages under
what is currently the obj->mm.lock; to create a new fault mapping, the
handlers will have taken a reference to either the vma or the backing
store and thus have serialised with the release.
i915_gem_object_release_mmap() should only be used on the free path,
since it is usual for us to have to do both. Now what are we doing in
set-tiling? The tiling only affects ggtt mappings...
-Chris
> -Chris

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [Intel-gfx] ✗ Fi.CI.IGT: failure for series starting with [01/23] drm/i915: Drop vm.ref for duplicate vma on construction
  2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
                   ` (26 preceding siblings ...)
  (?)
@ 2020-07-02 13:08 ` Patchwork
  -1 siblings, 0 replies; 56+ messages in thread
From: Patchwork @ 2020-07-02 13:08 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/23] drm/i915: Drop vm.ref for duplicate vma on construction
URL   : https://patchwork.freedesktop.org/series/79037/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_8692_full -> Patchwork_18065_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_18065_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_18065_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_18065_full:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_bad_reloc@negative-reloc-bltcopy:
    - shard-iclb:         [PASS][1] -> [FAIL][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-iclb5/igt@gem_bad_reloc@negative-reloc-bltcopy.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-iclb5/igt@gem_bad_reloc@negative-reloc-bltcopy.html
    - shard-kbl:          [PASS][3] -> [FAIL][4] +1 similar issue
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-kbl1/igt@gem_bad_reloc@negative-reloc-bltcopy.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-kbl3/igt@gem_bad_reloc@negative-reloc-bltcopy.html
    - shard-skl:          NOTRUN -> [FAIL][5]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-skl5/igt@gem_bad_reloc@negative-reloc-bltcopy.html

  * igt@gem_close@many-handles-one-vma:
    - shard-glk:          [PASS][6] -> [FAIL][7] +1 similar issue
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-glk2/igt@gem_close@many-handles-one-vma.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-glk1/igt@gem_close@many-handles-one-vma.html
    - shard-apl:          [PASS][8] -> [FAIL][9] +1 similar issue
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-apl3/igt@gem_close@many-handles-one-vma.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-apl8/igt@gem_close@many-handles-one-vma.html
    - shard-skl:          [PASS][10] -> [FAIL][11]
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-skl9/igt@gem_close@many-handles-one-vma.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-skl8/igt@gem_close@many-handles-one-vma.html
    - shard-tglb:         [PASS][12] -> [FAIL][13]
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-tglb5/igt@gem_close@many-handles-one-vma.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-tglb8/igt@gem_close@many-handles-one-vma.html
    - shard-hsw:          [PASS][14] -> [FAIL][15]
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-hsw4/igt@gem_close@many-handles-one-vma.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-hsw7/igt@gem_close@many-handles-one-vma.html
    - shard-snb:          [PASS][16] -> [FAIL][17]
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-snb2/igt@gem_close@many-handles-one-vma.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-snb6/igt@gem_close@many-handles-one-vma.html
    - shard-iclb:         NOTRUN -> [FAIL][18]
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-iclb6/igt@gem_close@many-handles-one-vma.html

  * igt@gem_sync@basic-many-each:
    - shard-iclb:         [PASS][19] -> [INCOMPLETE][20]
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-iclb6/igt@gem_sync@basic-many-each.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-iclb3/igt@gem_sync@basic-many-each.html

  * igt@kms_vblank@pipe-d-ts-continuation-dpms-rpm:
    - shard-tglb:         [PASS][21] -> [INCOMPLETE][22]
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-tglb8/igt@kms_vblank@pipe-d-ts-continuation-dpms-rpm.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-tglb8/igt@kms_vblank@pipe-d-ts-continuation-dpms-rpm.html

  

### Piglit changes ###

#### Possible regressions ####

  * spec@arb_tessellation_shader@execution@built-in-functions@tcs-op-bitand-not-uint-uvec3 (NEW):
    - pig-glk-j5005:      NOTRUN -> [FAIL][23]
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/pig-glk-j5005/spec@arb_tessellation_shader@execution@built-in-functions@tcs-op-bitand-not-uint-uvec3.html

  
New tests
---------

  New tests have been introduced between CI_DRM_8692_full and Patchwork_18065_full:

### New Piglit tests (1) ###

  * spec@arb_tessellation_shader@execution@built-in-functions@tcs-op-bitand-not-uint-uvec3:
    - Statuses : 1 fail(s)
    - Exec time: [0.12] s

  

Known issues
------------

  Here are the changes found in Patchwork_18065_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_reloc@basic-many-active@vecs0:
    - shard-kbl:          [PASS][24] -> [DMESG-WARN][25] ([i915#93] / [i915#95]) +2 similar issues
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-kbl1/igt@gem_exec_reloc@basic-many-active@vecs0.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-kbl6/igt@gem_exec_reloc@basic-many-active@vecs0.html

  * igt@gem_exec_whisper@basic-forked-all:
    - shard-glk:          [PASS][26] -> [DMESG-WARN][27] ([i915#118] / [i915#95])
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-glk4/igt@gem_exec_whisper@basic-forked-all.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-glk2/igt@gem_exec_whisper@basic-forked-all.html

  * igt@gem_userptr_blits@readonly-unsync:
    - shard-skl:          [PASS][28] -> [TIMEOUT][29] ([i915#1958] / [i915#2119]) +1 similar issue
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-skl7/igt@gem_userptr_blits@readonly-unsync.html
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-skl6/igt@gem_userptr_blits@readonly-unsync.html

  * igt@i915_selftest@mock@requests:
    - shard-skl:          [PASS][30] -> [INCOMPLETE][31] ([i915#198] / [i915#2110])
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-skl2/igt@i915_selftest@mock@requests.html
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-skl9/igt@i915_selftest@mock@requests.html
    - shard-hsw:          [PASS][32] -> [INCOMPLETE][33] ([i915#2110])
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-hsw6/igt@i915_selftest@mock@requests.html
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-hsw8/igt@i915_selftest@mock@requests.html

  * igt@kms_color@pipe-c-ctm-0-25:
    - shard-skl:          [PASS][34] -> [DMESG-WARN][35] ([i915#1982]) +13 similar issues
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-skl10/igt@kms_color@pipe-c-ctm-0-25.html
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-skl6/igt@kms_color@pipe-c-ctm-0-25.html

  * igt@kms_flip@flip-vs-suspend@a-dp1:
    - shard-kbl:          [PASS][36] -> [DMESG-WARN][37] ([i915#180]) +5 similar issues
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-kbl3/igt@kms_flip@flip-vs-suspend@a-dp1.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-kbl7/igt@kms_flip@flip-vs-suspend@a-dp1.html

  * igt@kms_flip@flip-vs-suspend@b-edp1:
    - shard-skl:          [PASS][38] -> [INCOMPLETE][39] ([i915#198])
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-skl5/igt@kms_flip@flip-vs-suspend@b-edp1.html
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-skl4/igt@kms_flip@flip-vs-suspend@b-edp1.html

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-onoff:
    - shard-snb:          [PASS][40] -> [SKIP][41] ([fdo#109271]) +2 similar issues
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-snb2/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-onoff.html
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-snb2/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-onoff.html

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-shrfb-pgflip-blt:
    - shard-apl:          [PASS][42] -> [DMESG-WARN][43] ([i915#1982])
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-apl3/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-shrfb-pgflip-blt.html
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-apl7/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-shrfb-pgflip-blt.html

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-render:
    - shard-kbl:          [PASS][44] -> [DMESG-WARN][45] ([i915#1982]) +1 similar issue
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-kbl2/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-render.html
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-kbl4/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-render.html

  * igt@kms_plane@plane-panning-top-left-pipe-a-planes:
    - shard-iclb:         [PASS][46] -> [DMESG-WARN][47] ([i915#1982])
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-iclb7/igt@kms_plane@plane-panning-top-left-pipe-a-planes.html
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-iclb8/igt@kms_plane@plane-panning-top-left-pipe-a-planes.html

  * igt@kms_plane_alpha_blend@pipe-c-constant-alpha-min:
    - shard-skl:          [PASS][48] -> [FAIL][49] ([fdo#108145] / [i915#265])
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-skl9/igt@kms_plane_alpha_blend@pipe-c-constant-alpha-min.html
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-skl8/igt@kms_plane_alpha_blend@pipe-c-constant-alpha-min.html

  * igt@kms_psr@psr2_cursor_mmap_cpu:
    - shard-iclb:         [PASS][50] -> [SKIP][51] ([fdo#109441]) +2 similar issues
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-iclb2/igt@kms_psr@psr2_cursor_mmap_cpu.html
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-iclb6/igt@kms_psr@psr2_cursor_mmap_cpu.html

  * igt@kms_setmode@basic:
    - shard-kbl:          [PASS][52] -> [FAIL][53] ([i915#31])
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-kbl6/igt@kms_setmode@basic.html
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-kbl4/igt@kms_setmode@basic.html

  * igt@kms_vblank@pipe-c-ts-continuation-suspend:
    - shard-kbl:          [PASS][54] -> [INCOMPLETE][55] ([i915#155] / [i915#794])
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-kbl7/igt@kms_vblank@pipe-c-ts-continuation-suspend.html
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-kbl1/igt@kms_vblank@pipe-c-ts-continuation-suspend.html

  * igt@perf_pmu@semaphore-busy@rcs0:
    - shard-apl:          [PASS][56] -> [FAIL][57] ([i915#1820])
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-apl7/igt@perf_pmu@semaphore-busy@rcs0.html
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-apl2/igt@perf_pmu@semaphore-busy@rcs0.html

  * igt@perf_pmu@semaphore-busy@vcs0:
    - shard-kbl:          [PASS][58] -> [FAIL][59] ([i915#1820]) +2 similar issues
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-kbl1/igt@perf_pmu@semaphore-busy@vcs0.html
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-kbl1/igt@perf_pmu@semaphore-busy@vcs0.html

  * igt@perf_pmu@semaphore-wait@rcs0:
    - shard-apl:          [PASS][60] -> [DMESG-WARN][61] ([i915#1635] / [i915#95]) +25 similar issues
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-apl8/igt@perf_pmu@semaphore-wait@rcs0.html
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-apl4/igt@perf_pmu@semaphore-wait@rcs0.html

  * igt@sysfs_heartbeat_interval@mixed@vecs0:
    - shard-glk:          [PASS][62] -> [FAIL][63] ([i915#1731])
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-glk6/igt@sysfs_heartbeat_interval@mixed@vecs0.html
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-glk4/igt@sysfs_heartbeat_interval@mixed@vecs0.html

  
#### Possible fixes ####

  * igt@gem_exec_reloc@basic-concurrent0:
    - shard-tglb:         [FAIL][64] ([i915#1930]) -> [PASS][65]
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-tglb3/igt@gem_exec_reloc@basic-concurrent0.html
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-tglb3/igt@gem_exec_reloc@basic-concurrent0.html
    - shard-glk:          [FAIL][66] ([i915#1930]) -> [PASS][67]
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-glk4/igt@gem_exec_reloc@basic-concurrent0.html
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-glk2/igt@gem_exec_reloc@basic-concurrent0.html
    - shard-apl:          [FAIL][68] ([i915#1930]) -> [PASS][69]
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-apl3/igt@gem_exec_reloc@basic-concurrent0.html
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-apl7/igt@gem_exec_reloc@basic-concurrent0.html
    - shard-kbl:          [FAIL][70] ([i915#1930]) -> [PASS][71]
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-kbl7/igt@gem_exec_reloc@basic-concurrent0.html
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-kbl3/igt@gem_exec_reloc@basic-concurrent0.html
    - shard-hsw:          [FAIL][72] ([i915#1930]) -> [PASS][73]
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-hsw4/igt@gem_exec_reloc@basic-concurrent0.html
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-hsw2/igt@gem_exec_reloc@basic-concurrent0.html
    - shard-iclb:         [FAIL][74] ([i915#1930]) -> [PASS][75]
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-iclb3/igt@gem_exec_reloc@basic-concurrent0.html
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-iclb7/igt@gem_exec_reloc@basic-concurrent0.html
    - shard-skl:          [FAIL][76] ([i915#1930]) -> [PASS][77]
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-skl6/igt@gem_exec_reloc@basic-concurrent0.html
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-skl1/igt@gem_exec_reloc@basic-concurrent0.html

  * igt@gem_exec_reloc@basic-concurrent16:
    - shard-snb:          [FAIL][78] ([i915#1930]) -> [PASS][79] +1 similar issue
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-snb6/igt@gem_exec_reloc@basic-concurrent16.html
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-snb5/igt@gem_exec_reloc@basic-concurrent16.html
    - shard-iclb:         [INCOMPLETE][80] ([i915#1958]) -> [PASS][81]
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-iclb7/igt@gem_exec_reloc@basic-concurrent16.html
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-iclb1/igt@gem_exec_reloc@basic-concurrent16.html
    - shard-hsw:          [TIMEOUT][82] ([i915#1958] / [i915#2119]) -> [PASS][83] +3 similar issues
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-hsw1/igt@gem_exec_reloc@basic-concurrent16.html
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-hsw8/igt@gem_exec_reloc@basic-concurrent16.html
    - shard-skl:          [INCOMPLETE][84] ([i915#1958]) -> [PASS][85]
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-skl10/igt@gem_exec_reloc@basic-concurrent16.html
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-skl8/igt@gem_exec_reloc@basic-concurrent16.html
    - shard-kbl:          [INCOMPLETE][86] ([i915#1958]) -> [PASS][87]
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-kbl7/igt@gem_exec_reloc@basic-concurrent16.html
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-kbl7/igt@gem_exec_reloc@basic-concurrent16.html
    - shard-apl:          [INCOMPLETE][88] ([i915#1635] / [i915#1958]) -> [PASS][89]
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-apl3/igt@gem_exec_reloc@basic-concurrent16.html
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-apl6/igt@gem_exec_reloc@basic-concurrent16.html
    - shard-tglb:         [INCOMPLETE][90] ([i915#1958]) -> [PASS][91]
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-tglb2/igt@gem_exec_reloc@basic-concurrent16.html
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-tglb5/igt@gem_exec_reloc@basic-concurrent16.html
    - shard-glk:          [INCOMPLETE][92] ([i915#1958] / [i915#58] / [k.org#198133]) -> [PASS][93]
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-glk1/igt@gem_exec_reloc@basic-concurrent16.html
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-glk3/igt@gem_exec_reloc@basic-concurrent16.html

  * igt@i915_pm_rpm@i2c:
    - shard-skl:          [DMESG-WARN][94] ([i915#1982]) -> [PASS][95] +6 similar issues
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-skl1/igt@i915_pm_rpm@i2c.html
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-skl5/igt@i915_pm_rpm@i2c.html

  * igt@i915_selftest@mock@requests:
    - shard-apl:          [INCOMPLETE][96] ([i915#1635] / [i915#2110]) -> [PASS][97]
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-apl1/igt@i915_selftest@mock@requests.html
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-apl2/igt@i915_selftest@mock@requests.html

  * igt@i915_suspend@fence-restore-tiled2untiled:
    - shard-skl:          [INCOMPLETE][98] ([i915#69]) -> [PASS][99]
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-skl3/igt@i915_suspend@fence-restore-tiled2untiled.html
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-skl4/igt@i915_suspend@fence-restore-tiled2untiled.html

  * igt@kms_big_fb@x-tiled-64bpp-rotate-0:
    - shard-glk:          [DMESG-FAIL][100] ([i915#118] / [i915#95]) -> [PASS][101] +1 similar issue
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-glk8/igt@kms_big_fb@x-tiled-64bpp-rotate-0.html
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-glk6/igt@kms_big_fb@x-tiled-64bpp-rotate-0.html

  * igt@kms_cursor_crc@pipe-a-cursor-suspend:
    - shard-kbl:          [DMESG-WARN][102] ([i915#180]) -> [PASS][103] +3 similar issues
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-kbl2/igt@kms_cursor_crc@pipe-a-cursor-suspend.html
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-kbl3/igt@kms_cursor_crc@pipe-a-cursor-suspend.html

  * igt@kms_cursor_legacy@pipe-a-forked-move:
    - shard-apl:          [DMESG-WARN][104] ([i915#1635] / [i915#95]) -> [PASS][105] +18 similar issues
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-apl1/igt@kms_cursor_legacy@pipe-a-forked-move.html
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-apl8/igt@kms_cursor_legacy@pipe-a-forked-move.html

  * igt@kms_flip@flip-vs-wf_vblank-interruptible@a-edp1:
    - shard-tglb:         [DMESG-WARN][106] ([i915#1982]) -> [PASS][107] +1 similar issue
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-tglb7/igt@kms_flip@flip-vs-wf_vblank-interruptible@a-edp1.html
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-tglb2/igt@kms_flip@flip-vs-wf_vblank-interruptible@a-edp1.html

  * igt@kms_flip@plain-flip-ts-check@a-edp1:
    - shard-skl:          [FAIL][108] ([i915#1928]) -> [PASS][109]
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-skl9/igt@kms_flip@plain-flip-ts-check@a-edp1.html
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-skl7/igt@kms_flip@plain-flip-ts-check@a-edp1.html

  * igt@kms_hdr@bpc-switch:
    - shard-skl:          [FAIL][110] ([i915#1188]) -> [PASS][111]
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-skl3/igt@kms_hdr@bpc-switch.html
   [111]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-skl4/igt@kms_hdr@bpc-switch.html

  * igt@kms_properties@crtc-properties-atomic:
    - shard-tglb:         [DMESG-WARN][112] ([i915#402]) -> [PASS][113]
   [112]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-tglb3/igt@kms_properties@crtc-properties-atomic.html
   [113]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-tglb3/igt@kms_properties@crtc-properties-atomic.html

  * igt@kms_psr@psr2_cursor_blt:
    - shard-iclb:         [SKIP][114] ([fdo#109441]) -> [PASS][115]
   [114]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-iclb7/igt@kms_psr@psr2_cursor_blt.html
   [115]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-iclb2/igt@kms_psr@psr2_cursor_blt.html

  * igt@kms_setmode@basic:
    - shard-hsw:          [FAIL][116] ([i915#31]) -> [PASS][117]
   [116]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-hsw7/igt@kms_setmode@basic.html
   [117]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-hsw8/igt@kms_setmode@basic.html

  * igt@perf@blocking-parameterized:
    - shard-tglb:         [FAIL][118] ([i915#1542]) -> [PASS][119]
   [118]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-tglb6/igt@perf@blocking-parameterized.html
   [119]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-tglb7/igt@perf@blocking-parameterized.html

  * igt@perf_pmu@enable-race@bcs0:
    - shard-glk:          [DMESG-WARN][120] ([i915#118] / [i915#95]) -> [PASS][121]
   [120]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-glk6/igt@perf_pmu@enable-race@bcs0.html
   [121]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-glk4/igt@perf_pmu@enable-race@bcs0.html

  
#### Warnings ####

  * igt@kms_chamelium@vga-edid-read:
    - shard-apl:          [SKIP][122] ([fdo#109271] / [fdo#111827]) -> [SKIP][123] ([fdo#109271] / [fdo#111827] / [i915#1635]) +1 similar issue
   [122]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-apl3/igt@kms_chamelium@vga-edid-read.html
   [123]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-apl7/igt@kms_chamelium@vga-edid-read.html

  * igt@kms_color@pipe-a-ctm-0-75:
    - shard-tglb:         [FAIL][124] ([i915#1149] / [i915#315]) -> [DMESG-FAIL][125] ([i915#1149] / [i915#402])
   [124]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-tglb2/igt@kms_color@pipe-a-ctm-0-75.html
   [125]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-tglb7/igt@kms_color@pipe-a-ctm-0-75.html

  * igt@kms_content_protection@atomic-dpms:
    - shard-kbl:          [TIMEOUT][126] ([i915#1319] / [i915#2119]) -> [TIMEOUT][127] ([i915#1319] / [i915#1958] / [i915#2119])
   [126]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-kbl7/igt@kms_content_protection@atomic-dpms.html
   [127]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-kbl3/igt@kms_content_protection@atomic-dpms.html

  * igt@kms_cursor_crc@pipe-d-cursor-512x512-random:
    - shard-apl:          [SKIP][128] ([fdo#109271] / [i915#1635]) -> [SKIP][129] ([fdo#109271]) +12 similar issues
   [128]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-apl1/igt@kms_cursor_crc@pipe-d-cursor-512x512-random.html
   [129]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-apl8/igt@kms_cursor_crc@pipe-d-cursor-512x512-random.html

  * igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-shrfb-draw-render:
    - shard-hsw:          [TIMEOUT][130] ([i915#1958] / [i915#2119]) -> [SKIP][131] ([fdo#109271]) +2 similar issues
   [130]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-hsw1/igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-shrfb-draw-render.html
   [131]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-hsw8/igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-shrfb-draw-render.html

  * igt@kms_frontbuffer_tracking@fbc-suspend:
    - shard-kbl:          [DMESG-WARN][132] ([i915#93] / [i915#95]) -> [DMESG-WARN][133] ([i915#180] / [i915#93] / [i915#95])
   [132]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-kbl3/igt@kms_frontbuffer_tracking@fbc-suspend.html
   [133]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-kbl1/igt@kms_frontbuffer_tracking@fbc-suspend.html

  * igt@kms_frontbuffer_tracking@psr-2p-primscrn-cur-indfb-onoff:
    - shard-apl:          [SKIP][134] ([fdo#109271]) -> [SKIP][135] ([fdo#109271] / [i915#1635]) +7 similar issues
   [134]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-apl2/igt@kms_frontbuffer_tracking@psr-2p-primscrn-cur-indfb-onoff.html
   [135]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-apl4/igt@kms_frontbuffer_tracking@psr-2p-primscrn-cur-indfb-onoff.html

  * igt@kms_plane_alpha_blend@pipe-a-constant-alpha-min:
    - shard-skl:          [FAIL][136] ([fdo#108145] / [i915#265]) -> [DMESG-WARN][137] ([i915#1982])
   [136]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-skl4/igt@kms_plane_alpha_blend@pipe-a-constant-alpha-min.html
   [137]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-skl2/igt@kms_plane_alpha_blend@pipe-a-constant-alpha-min.html

  * igt@runner@aborted:
    - shard-hsw:          [FAIL][138] ([i915#2110]) -> [FAIL][139] ([i915#1436] / [i915#2110])
   [138]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8692/shard-hsw6/igt@runner@aborted.html
   [139]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/shard-hsw8/igt@runner@aborted.html

  
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1149]: https://gitlab.freedesktop.org/drm/intel/issues/1149
  [i915#118]: https://gitlab.freedesktop.org/drm/intel/issues/118
  [i915#1188]: https://gitlab.freedesktop.org/drm/intel/issues/1188
  [i915#1319]: https://gitlab.freedesktop.org/drm/intel/issues/1319
  [i915#1436]: https://gitlab.freedesktop.org/drm/intel/issues/1436
  [i915#1542]: https://gitlab.freedesktop.org/drm/intel/issues/1542
  [i915#155]: https://gitlab.freedesktop.org/drm/intel/issues/155
  [i915#1635]: https://gitlab.freedesktop.org/drm/intel/issues/1635
  [i915#1731]: https://gitlab.freedesktop.org/drm/intel/issues/1731
  [i915#180]: https://gitlab.freedesktop.org/drm/intel/issues/180
  [i915#1820]: https://gitlab.freedesktop.org/drm/intel/issues/1820
  [i915#1928]: https://gitlab.freedesktop.org/drm/intel/issues/1928
  [i915#1930]: https://gitlab.freedesktop.org/drm/intel/issues/1930
  [i915#1958]: https://gitlab.freedesktop.org/drm/intel/issues/1958
  [i915#198]: https://gitlab.freedesktop.org/drm/intel/issues/198
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#2110]: https://gitlab.freedesktop.org/drm/intel/issues/2110
  [i915#2119]: https://gitlab.freedesktop.org/drm/intel/issues/2119
  [i915#265]: https://gitlab.freedesktop.org/drm/intel/issues/265
  [i915#31]: https://gitlab.freedesktop.org/drm/intel/issues/31
  [i915#315]: https://gitlab.freedesktop.org/drm/intel/issues/315
  [i915#402]: https://gitlab.freedesktop.org/drm/intel/issues/402
  [i915#58]: https://gitlab.freedesktop.org/drm/intel/issues/58
  [i915#69]: https://gitlab.freedesktop.org/drm/intel/issues/69
  [i915#794]: https://gitlab.freedesktop.org/drm/intel/issues/794
  [i915#93]: https://gitlab.freedesktop.org/drm/intel/issues/93
  [i915#95]: https://gitlab.freedesktop.org/drm/intel/issues/95
  [k.org#198133]: https://bugzilla.kernel.org/show_bug.cgi?id=198133


Participating hosts (10 -> 10)
------------------------------

  No changes in participating hosts


Build changes
-------------

  * Linux: CI_DRM_8692 -> Patchwork_18065

  CI-20190529: 20190529
  CI_DRM_8692: e30abe29fd5407631a61d48f93bad5fdeba8080d @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5720: f35053d4b6d7bbcf6505ef67a8bd56acc7fb2eb2 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_18065: 6f377648e7f5e352bf1600ef1f8954ec04781765 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18065/index.html

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Intel-gfx] [PATCH 01/23] drm/i915: Drop vm.ref for duplicate vma on construction
  2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
@ 2020-07-02 20:25   ` Andi Shyti
  -1 siblings, 0 replies; 56+ messages in thread
From: Andi Shyti @ 2020-07-02 20:25 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, stable, andi, andi.shyti

Hi Chris,

> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 1f63c4a1f055..7fe1f317cd2b 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -198,6 +198,7 @@ vma_create(struct drm_i915_gem_object *obj,
>  		cmp = i915_vma_compare(pos, vm, view);
>  		if (cmp == 0) {
>  			spin_unlock(&obj->vma.lock);
> +			i915_vm_put(vm);
>  			i915_vma_free(vma);

You are forgetting one return path that does not drop the reference.

would this be a solution:

@@ -106,6 +106,7 @@ vma_create(struct drm_i915_gem_object *obj,
 {
        struct i915_vma *vma;
        struct rb_node *rb, **p;
+       struct i915_vma *pos = ERR_PTR(-E2BIG);
 
        /* The aliasing_ppgtt should never be used directly! */
        GEM_BUG_ON(vm == &vm->gt->ggtt->alias->vm);
@@ -185,7 +186,6 @@ vma_create(struct drm_i915_gem_object *obj,
        rb = NULL;
        p = &obj->vma.tree.rb_node;
        while (*p) {
-               struct i915_vma *pos;
                long cmp;
 
                rb = *p;
@@ -197,12 +197,8 @@ vma_create(struct drm_i915_gem_object *obj,
                 * and dispose of ours.
                 */
                cmp = i915_vma_compare(pos, vm, view);
-               if (cmp == 0) {
-                       spin_unlock(&obj->vma.lock);
-                       i915_vm_put(vm);
-                       i915_vma_free(vma);
-                       return pos;
-               }
+               if (!cmp)
+                       goto err_unlock;
 
                if (cmp < 0)
                        p = &rb->rb_right;
@@ -230,8 +226,9 @@ vma_create(struct drm_i915_gem_object *obj,
 err_unlock:
        spin_unlock(&obj->vma.lock);
 err_vma:
+       i915_vm_put(vm);
        i915_vma_free(vma);
-       return ERR_PTR(-E2BIG);
+       return pos;
 }

Andi

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Intel-gfx] [PATCH 01/23] drm/i915: Drop vm.ref for duplicate vma on construction
@ 2020-07-02 20:25   ` Andi Shyti
  0 siblings, 0 replies; 56+ messages in thread
From: Andi Shyti @ 2020-07-02 20:25 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, stable

Hi Chris,

> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 1f63c4a1f055..7fe1f317cd2b 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -198,6 +198,7 @@ vma_create(struct drm_i915_gem_object *obj,
>  		cmp = i915_vma_compare(pos, vm, view);
>  		if (cmp == 0) {
>  			spin_unlock(&obj->vma.lock);
> +			i915_vm_put(vm);
>  			i915_vma_free(vma);

You are forgetting one return path that does not drop the reference.

would this be a solution:

@@ -106,6 +106,7 @@ vma_create(struct drm_i915_gem_object *obj,
 {
        struct i915_vma *vma;
        struct rb_node *rb, **p;
+       struct i915_vma *pos = ERR_PTR(-E2BIG);
 
        /* The aliasing_ppgtt should never be used directly! */
        GEM_BUG_ON(vm == &vm->gt->ggtt->alias->vm);
@@ -185,7 +186,6 @@ vma_create(struct drm_i915_gem_object *obj,
        rb = NULL;
        p = &obj->vma.tree.rb_node;
        while (*p) {
-               struct i915_vma *pos;
                long cmp;
 
                rb = *p;
@@ -197,12 +197,8 @@ vma_create(struct drm_i915_gem_object *obj,
                 * and dispose of ours.
                 */
                cmp = i915_vma_compare(pos, vm, view);
-               if (cmp == 0) {
-                       spin_unlock(&obj->vma.lock);
-                       i915_vm_put(vm);
-                       i915_vma_free(vma);
-                       return pos;
-               }
+               if (!cmp)
+                       goto err_unlock;
 
                if (cmp < 0)
                        p = &rb->rb_right;
@@ -230,8 +226,9 @@ vma_create(struct drm_i915_gem_object *obj,
 err_unlock:
        spin_unlock(&obj->vma.lock);
 err_vma:
+       i915_vm_put(vm);
        i915_vma_free(vma);
-       return ERR_PTR(-E2BIG);
+       return pos;
 }

Andi

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Intel-gfx] [PATCH 01/23] drm/i915: Drop vm.ref for duplicate vma on construction
  2020-07-02 20:25   ` Andi Shyti
@ 2020-07-02 20:38     ` Chris Wilson
  -1 siblings, 0 replies; 56+ messages in thread
From: Chris Wilson @ 2020-07-02 20:38 UTC (permalink / raw)
  To: Andi Shyti; +Cc: intel-gfx, stable, andi, andi.shyti

Quoting Andi Shyti (2020-07-02 21:25:45)
> Hi Chris,
> 
> > diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> > index 1f63c4a1f055..7fe1f317cd2b 100644
> > --- a/drivers/gpu/drm/i915/i915_vma.c
> > +++ b/drivers/gpu/drm/i915/i915_vma.c
> > @@ -198,6 +198,7 @@ vma_create(struct drm_i915_gem_object *obj,
> >               cmp = i915_vma_compare(pos, vm, view);
> >               if (cmp == 0) {
> >                       spin_unlock(&obj->vma.lock);
> > +                     i915_vm_put(vm);
> >                       i915_vma_free(vma);
> 
> You are forgetting one return path that does not drop the reference.
> 
> would this be a solution:
> 
> @@ -106,6 +106,7 @@ vma_create(struct drm_i915_gem_object *obj,
>  {
>         struct i915_vma *vma;
>         struct rb_node *rb, **p;
> +       struct i915_vma *pos = ERR_PTR(-E2BIG);
>  
>         /* The aliasing_ppgtt should never be used directly! */
>         GEM_BUG_ON(vm == &vm->gt->ggtt->alias->vm);
> @@ -185,7 +186,6 @@ vma_create(struct drm_i915_gem_object *obj,
>         rb = NULL;
>         p = &obj->vma.tree.rb_node;
>         while (*p) {
> -               struct i915_vma *pos;
>                 long cmp;
>  
>                 rb = *p;
> @@ -197,12 +197,8 @@ vma_create(struct drm_i915_gem_object *obj,
>                  * and dispose of ours.
>                  */
>                 cmp = i915_vma_compare(pos, vm, view);
> -               if (cmp == 0) {
> -                       spin_unlock(&obj->vma.lock);
> -                       i915_vm_put(vm);
> -                       i915_vma_free(vma);
> -                       return pos;
> -               }
> +               if (!cmp)
> +                       goto err_unlock;

Yeah, but you might as well do

if (cmp < 0)
 	p = right;
else if (cmp > 0)
 	p = left;
else
	goto err_unlock;
 
>                 if (cmp < 0)
>                         p = &rb->rb_right;
> @@ -230,8 +226,9 @@ vma_create(struct drm_i915_gem_object *obj,
>  err_unlock:
>         spin_unlock(&obj->vma.lock);
>  err_vma:
> +       i915_vm_put(vm);
>         i915_vma_free(vma);
> -       return ERR_PTR(-E2BIG);
> +       return pos;
>  }
> 
> Andi

Ta, going to send that as a patch?
-Chris
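
For reference, folding the two suggestions together gives a lookup loop
along these lines; it is pieced together from the hunks quoted above
rather than taken from a posted patch, and the rb_entry() line is an
assumption about the surrounding code:

	struct i915_vma *pos = ERR_PTR(-E2BIG);
	...
	rb = NULL;
	p = &obj->vma.tree.rb_node;
	while (*p) {
		long cmp;

		rb = *p;
		pos = rb_entry(rb, struct i915_vma, obj_node);

		cmp = i915_vma_compare(pos, vm, view);
		if (cmp < 0)
			p = &rb->rb_right;
		else if (cmp > 0)
			p = &rb->rb_left;
		else
			goto err_unlock;	/* duplicate: hand back 'pos' */
	}
	...
	err_unlock:
		spin_unlock(&obj->vma.lock);
	err_vma:
		i915_vm_put(vm);
		i915_vma_free(vma);
		return pos;

Every exit through err_vma then drops the vm reference, whether the loser
of the race returns the existing vma or an error pointer.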

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Intel-gfx] [PATCH 01/23] drm/i915: Drop vm.ref for duplicate vma on construction
@ 2020-07-02 20:38     ` Chris Wilson
  0 siblings, 0 replies; 56+ messages in thread
From: Chris Wilson @ 2020-07-02 20:38 UTC (permalink / raw)
  To: Andi Shyti; +Cc: intel-gfx, stable

Quoting Andi Shyti (2020-07-02 21:25:45)
> Hi Chris,
> 
> > diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> > index 1f63c4a1f055..7fe1f317cd2b 100644
> > --- a/drivers/gpu/drm/i915/i915_vma.c
> > +++ b/drivers/gpu/drm/i915/i915_vma.c
> > @@ -198,6 +198,7 @@ vma_create(struct drm_i915_gem_object *obj,
> >               cmp = i915_vma_compare(pos, vm, view);
> >               if (cmp == 0) {
> >                       spin_unlock(&obj->vma.lock);
> > +                     i915_vm_put(vm);
> >                       i915_vma_free(vma);
> 
> You are forgetting one return path that does not drop the reference.
> 
> would this be a solution:
> 
> @@ -106,6 +106,7 @@ vma_create(struct drm_i915_gem_object *obj,
>  {
>         struct i915_vma *vma;
>         struct rb_node *rb, **p;
> +       struct i915_vma *pos = ERR_PTR(-E2BIG);
>  
>         /* The aliasing_ppgtt should never be used directly! */
>         GEM_BUG_ON(vm == &vm->gt->ggtt->alias->vm);
> @@ -185,7 +186,6 @@ vma_create(struct drm_i915_gem_object *obj,
>         rb = NULL;
>         p = &obj->vma.tree.rb_node;
>         while (*p) {
> -               struct i915_vma *pos;
>                 long cmp;
>  
>                 rb = *p;
> @@ -197,12 +197,8 @@ vma_create(struct drm_i915_gem_object *obj,
>                  * and dispose of ours.
>                  */
>                 cmp = i915_vma_compare(pos, vm, view);
> -               if (cmp == 0) {
> -                       spin_unlock(&obj->vma.lock);
> -                       i915_vm_put(vm);
> -                       i915_vma_free(vma);
> -                       return pos;
> -               }
> +               if (!cmp)
> +                       goto err_unlock;

Yeah, but you might as well do

if (cmp < 0)
 	p = right;
else if (cmp > 0)
 	p = left;
else
	goto err_unlock;
 
>                 if (cmp < 0)
>                         p = &rb->rb_right;
> @@ -230,8 +226,9 @@ vma_create(struct drm_i915_gem_object *obj,
>  err_unlock:
>         spin_unlock(&obj->vma.lock);
>  err_vma:
> +       i915_vm_put(vm);
>         i915_vma_free(vma);
> -       return ERR_PTR(-E2BIG);
> +       return pos;
>  }
> 
> Andi

Ta, going to send that as a patch?
-Chris

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Intel-gfx] [PATCH 01/23] drm/i915: Drop vm.ref for duplicate vma on construction
  2020-07-02 20:38     ` Chris Wilson
@ 2020-07-02 20:56       ` Andi Shyti
  -1 siblings, 0 replies; 56+ messages in thread
From: Andi Shyti @ 2020-07-02 20:56 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Andi Shyti, intel-gfx, stable, andi.shyti

Hi Chris,

> Ta, going to send that as a patch?

mine was a suggestion; it was easier to build the diff than to
explain myself :)

If you want I can send it, though.

Andi

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Intel-gfx] [PATCH 01/23] drm/i915: Drop vm.ref for duplicate vma on construction
@ 2020-07-02 20:56       ` Andi Shyti
  0 siblings, 0 replies; 56+ messages in thread
From: Andi Shyti @ 2020-07-02 20:56 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, stable

Hi Chris,

> Ta, going to send that as a patch?

mine was a suggestion; it was easier to build the diff than to
explain myself :)

If you want I can send it, though.

Andi

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Intel-gfx] [PATCH 02/23] drm/i915/gem: Split the context's obj:vma lut into its own mutex
  2020-07-02  8:32 ` [Intel-gfx] [PATCH 02/23] drm/i915/gem: Split the context's obj:vma lut into its own mutex Chris Wilson
@ 2020-07-02 22:09   ` Andi Shyti
  2020-07-02 22:14     ` Chris Wilson
  0 siblings, 1 reply; 56+ messages in thread
From: Andi Shyti @ 2020-07-02 22:09 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

Hi Chris,

> @@ -1312,11 +1314,11 @@ static int set_ppgtt(struct drm_i915_file_private *file_priv,
>  	if (vm == rcu_access_pointer(ctx->vm))
>  		goto unlock;
>  
> +	old = __set_ppgtt(ctx, vm);
> +
>  	/* Teardown the existing obj:vma cache, it will have to be rebuilt. */
>  	lut_close(ctx);
>  
> -	old = __set_ppgtt(ctx, vm);
> -
>  	/*
>  	 * We need to flush any requests using the current ppgtt before
>  	 * we release it as the requests do not hold a reference themselves,
> @@ -1330,6 +1332,7 @@ static int set_ppgtt(struct drm_i915_file_private *file_priv,
>  	if (err) {
>  		i915_vm_close(__set_ppgtt(ctx, old));
>  		i915_vm_close(old);
> +		lut_close(ctx); /* rebuild the old obj:vma cache */

I don't really understand this but it doesn't hurt

> diff --git a/drivers/gpu/drm/i915/gem/selftests/mock_context.c b/drivers/gpu/drm/i915/gem/selftests/mock_context.c
> index aa0d06cf1903..51b5a3421b40 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/mock_context.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/mock_context.c
> @@ -23,6 +23,8 @@ mock_context(struct drm_i915_private *i915,
>  	INIT_LIST_HEAD(&ctx->link);
>  	ctx->i915 = i915;
>  
> +	mutex_init(&ctx->mutex);
> +
>  	spin_lock_init(&ctx->stale.lock);
>  	INIT_LIST_HEAD(&ctx->stale.engines);
>  
> @@ -35,7 +37,7 @@ mock_context(struct drm_i915_private *i915,
>  	RCU_INIT_POINTER(ctx->engines, e);
>  
>  	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
> -	mutex_init(&ctx->mutex);
> +	mutex_init(&ctx->lut_mutex);

...and I don't really understand why you moved the first
init(&ctx->mutex) above; is it just aesthetic?

Reviewed-by: Andi Shyti <andi.shyti@intel.com>

Andi

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Intel-gfx] [PATCH 02/23] drm/i915/gem: Split the context's obj:vma lut into its own mutex
  2020-07-02 22:09   ` Andi Shyti
@ 2020-07-02 22:14     ` Chris Wilson
  0 siblings, 0 replies; 56+ messages in thread
From: Chris Wilson @ 2020-07-02 22:14 UTC (permalink / raw)
  To: Andi Shyti; +Cc: intel-gfx

Quoting Andi Shyti (2020-07-02 23:09:44)
> Hi Chris,
> 
> > @@ -1312,11 +1314,11 @@ static int set_ppgtt(struct drm_i915_file_private *file_priv,
> >       if (vm == rcu_access_pointer(ctx->vm))
> >               goto unlock;
> >  
> > +     old = __set_ppgtt(ctx, vm);
> > +
> >       /* Teardown the existing obj:vma cache, it will have to be rebuilt. */
> >       lut_close(ctx);
> >  
> > -     old = __set_ppgtt(ctx, vm);
> > -
> >       /*
> >        * We need to flush any requests using the current ppgtt before
> >        * we release it as the requests do not hold a reference themselves,
> > @@ -1330,6 +1332,7 @@ static int set_ppgtt(struct drm_i915_file_private *file_priv,
> >       if (err) {
> >               i915_vm_close(__set_ppgtt(ctx, old));
> >               i915_vm_close(old);
> > +             lut_close(ctx); /* rebuild the old obj:vma cache */
> 
> I don't really understand this but it doesn't hurt

Yeah; another testcase required for racing set-vm against execbuf.
Outcome unknown, all that we have to avoid are explosions. Userspace is
allowed to shoot itself in the foot, but is not allowed to shoot anyone
else.
 
> > diff --git a/drivers/gpu/drm/i915/gem/selftests/mock_context.c b/drivers/gpu/drm/i915/gem/selftests/mock_context.c
> > index aa0d06cf1903..51b5a3421b40 100644
> > --- a/drivers/gpu/drm/i915/gem/selftests/mock_context.c
> > +++ b/drivers/gpu/drm/i915/gem/selftests/mock_context.c
> > @@ -23,6 +23,8 @@ mock_context(struct drm_i915_private *i915,
> >       INIT_LIST_HEAD(&ctx->link);
> >       ctx->i915 = i915;
> >  
> > +     mutex_init(&ctx->mutex);
> > +
> >       spin_lock_init(&ctx->stale.lock);
> >       INIT_LIST_HEAD(&ctx->stale.engines);
> >  
> > @@ -35,7 +37,7 @@ mock_context(struct drm_i915_private *i915,
> >       RCU_INIT_POINTER(ctx->engines, e);
> >  
> >       INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
> > -     mutex_init(&ctx->mutex);
> > +     mutex_init(&ctx->lut_mutex);
> 
> ...and I don't really understand why moved the first
> init(&ctx->mutex) above, is it just aesthetic?

Yup. The ctx->mutex is the broader one, so I felt it deserved to be
higher. Whereas here we are setting up the lut [handles_vma], so it was the
natural spot to place the ctx->lut_mutex; and I wanted some distance
between the pair to keep the confusion at bay.
-Chris
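
To make the split concrete, an abbreviated sketch of the resulting
layout, keeping only the fields discussed above (the comments are
illustrative, not taken from the patch):

	struct i915_gem_context {
		...
		struct mutex mutex;	/* broad context state: vm, engines, ... */
		...
		/* execbuf obj:vma lookup cache */
		struct radix_tree_root handles_vma;
		struct mutex lut_mutex;	/* guards only the lut above */
		...
	};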

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Intel-gfx] [PATCH 03/23] drm/i915/gem: Drop forced struct_mutex from shrinker_taints_mutex
  2020-07-02  8:32 ` [Intel-gfx] [PATCH 03/23] drm/i915/gem: Drop forced struct_mutex from shrinker_taints_mutex Chris Wilson
@ 2020-07-02 22:24   ` Andi Shyti
  0 siblings, 0 replies; 56+ messages in thread
From: Andi Shyti @ 2020-07-02 22:24 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

Hi Chris,

On Thu, Jul 02, 2020 at 09:32:05AM +0100, Chris Wilson wrote:
> Since we no longer always take struct_mutex around everything, and want
> the freedom to create GEM objects, actually taking struct_mutex inside
> the lock creation ends up pulling the mutex inside other locks. Since we
> don't generally use struct_mutex, we can relax the tainting.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Looks good!

Reviewed-by: Andi Shyti <andi.shyti@intel.com>

Andi

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Intel-gfx] [PATCH 22/23] drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.
  2020-07-02  8:32 ` [Intel-gfx] [PATCH 22/23] drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2 Chris Wilson
@ 2020-07-02 22:32   ` kernel test robot
  0 siblings, 0 replies; 56+ messages in thread
From: kernel test robot @ 2020-07-02 22:32 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2228 bytes --]

Hi Chris,

I love your patch! Perhaps something to improve:

[auto build test WARNING on drm-intel/for-linux-next]
[also build test WARNING on next-20200702]
[cannot apply to drm-tip/drm-tip linus/master v5.8-rc3]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patches, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Chris-Wilson/drm-i915-Drop-vm-ref-for-duplicate-vma-on-construction/20200702-170652
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
config: x86_64-allyesconfig (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 003a086ffc0d1affbb8300b36225fb8150a2d40a)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from drivers/gpu/drm/i915/mm/i915_acquire_ctx.c:10:
>> drivers/gpu/drm/i915/mm/i915_acquire_ctx.h:6:9: warning: '__I915_ACQIURE_CTX_H__' is used as a header guard here, followed by #define of a different macro [-Wheader-guard]
   #ifndef __I915_ACQIURE_CTX_H__
           ^~~~~~~~~~~~~~~~~~~~~~
   drivers/gpu/drm/i915/mm/i915_acquire_ctx.h:7:9: note: '__I915_ACQUIRE_CTX_H__' is defined here; did you mean '__I915_ACQIURE_CTX_H__'?
   #define __I915_ACQUIRE_CTX_H__
           ^~~~~~~~~~~~~~~~~~~~~~
           __I915_ACQIURE_CTX_H__
   1 warning generated.

vim +/__I915_ACQIURE_CTX_H__ +6 drivers/gpu/drm/i915/mm/i915_acquire_ctx.h

   > 6	#ifndef __I915_ACQIURE_CTX_H__
     7	#define __I915_ACQUIRE_CTX_H__
     8	
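
(The fix is just to make the two identifiers agree; for example the guard
would read

	#ifndef __I915_ACQUIRE_CTX_H__
	#define __I915_ACQUIRE_CTX_H__

	/* ... declarations ... */

	#endif /* __I915_ACQUIRE_CTX_H__ */

with the misspelt __I915_ACQIURE_CTX_H__ dropped.)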

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 75347 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Intel-gfx] [PATCH 07/23] drm/i915: Switch to object allocations for page directories
  2020-07-02  8:32 ` [Intel-gfx] [PATCH 07/23] drm/i915: Switch to object allocations for page directories Chris Wilson
@ 2020-07-03  8:44   ` Tvrtko Ursulin
  2020-07-03  9:00     ` Chris Wilson
  2020-07-03 16:36   ` Tvrtko Ursulin
  1 sibling, 1 reply; 56+ messages in thread
From: Tvrtko Ursulin @ 2020-07-03  8:44 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 02/07/2020 09:32, Chris Wilson wrote:
> The GEM object is grossly overweight for the practicality of tracking
> large numbers of individual pages, yet it is currently our only
> abstraction for tracking DMA allocations. Since those allocations need
> to be reserved upfront before an operation, and since we need to break
> away from simple system memory, we need to ditch using plain struct page
> wrappers.

[Calling all page table experts...] :)

So.. mostly 4k allocations via GEM objects? Sounds not ideal at first glance.

Reminder on why we need to break away from simple system memory? Need to 
have a list of GEM objects which can be locked in the ww locking phase? 
But how do you allocate these objects up front, when allocation needs to 
be under the ww lock in case evictions need to be triggered.

Regards,

Tvrtko

> In the process, we drop the WC mapping as we ended up clflushing
> everything anyway due to various issues across a wider range of
> platforms. Though in a future step, we need to drop the kmap_atomic
> approach which suggests we need to pre-map all the pages and keep them
> mapped.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   .../gpu/drm/i915/gem/i915_gem_object_types.h  |   1 +
>   .../gpu/drm/i915/gem/selftests/huge_pages.c   |   2 +-
>   .../drm/i915/gem/selftests/i915_gem_context.c |   2 +-
>   drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |  46 ++-
>   drivers/gpu/drm/i915/gt/gen6_ppgtt.h          |   1 +
>   drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  64 ++--
>   drivers/gpu/drm/i915/gt/intel_ggtt.c          |  31 +-
>   drivers/gpu/drm/i915/gt/intel_gtt.c           | 291 +++---------------
>   drivers/gpu/drm/i915/gt/intel_gtt.h           |  92 ++----
>   drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  25 +-
>   .../gpu/drm/i915/gt/intel_ring_submission.c   |  16 +-
>   drivers/gpu/drm/i915/gvt/scheduler.c          |  17 +-
>   drivers/gpu/drm/i915/i915_drv.c               |   1 +
>   drivers/gpu/drm/i915/i915_drv.h               |   5 -
>   drivers/gpu/drm/i915/selftests/mock_gtt.c     |   2 +
>   15 files changed, 183 insertions(+), 413 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index 5335f799b548..d0847d7896f9 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -282,6 +282,7 @@ struct drm_i915_gem_object {
>   		} userptr;
>   
>   		unsigned long scratch;
> +		u64 encode;
>   
>   		void *gvt_info;
>   	};
> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> index 8291ede6902c..9fb06fcc8f8f 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> @@ -393,7 +393,7 @@ static int igt_mock_exhaust_device_supported_pages(void *arg)
>   	 */
>   
>   	for (i = 1; i < BIT(ARRAY_SIZE(page_sizes)); i++) {
> -		unsigned int combination = 0;
> +		unsigned int combination = SZ_4K;
>   
>   		for (j = 0; j < ARRAY_SIZE(page_sizes); j++) {
>   			if (i & BIT(j))
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> index b81978890641..1308198543d8 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> @@ -1745,7 +1745,7 @@ static int check_scratch_page(struct i915_gem_context *ctx, u32 *out)
>   	if (!vm)
>   		return -ENODEV;
>   
> -	page = vm->scratch[0].base.page;
> +	page = __px_page(vm->scratch[0]);
>   	if (!page) {
>   		pr_err("No scratch page!\n");
>   		return -EINVAL;
> diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> index 35e2b698f9ed..226e404c706d 100644
> --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> @@ -16,8 +16,10 @@ static inline void gen6_write_pde(const struct gen6_ppgtt *ppgtt,
>   				  const unsigned int pde,
>   				  const struct i915_page_table *pt)
>   {
> +	dma_addr_t addr = pt ? px_dma(pt) : px_dma(ppgtt->base.vm.scratch[1]);
> +
>   	/* Caller needs to make sure the write completes if necessary */
> -	iowrite32(GEN6_PDE_ADDR_ENCODE(px_dma(pt)) | GEN6_PDE_VALID,
> +	iowrite32(GEN6_PDE_ADDR_ENCODE(addr) | GEN6_PDE_VALID,
>   		  ppgtt->pd_addr + pde);
>   }
>   
> @@ -79,7 +81,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
>   {
>   	struct gen6_ppgtt * const ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
>   	const unsigned int first_entry = start / I915_GTT_PAGE_SIZE;
> -	const gen6_pte_t scratch_pte = vm->scratch[0].encode;
> +	const gen6_pte_t scratch_pte = vm->scratch[0]->encode;
>   	unsigned int pde = first_entry / GEN6_PTES;
>   	unsigned int pte = first_entry % GEN6_PTES;
>   	unsigned int num_entries = length / I915_GTT_PAGE_SIZE;
> @@ -90,8 +92,6 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
>   		const unsigned int count = min(num_entries, GEN6_PTES - pte);
>   		gen6_pte_t *vaddr;
>   
> -		GEM_BUG_ON(px_base(pt) == px_base(&vm->scratch[1]));
> -
>   		num_entries -= count;
>   
>   		GEM_BUG_ON(count > atomic_read(&pt->used));
> @@ -127,7 +127,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>   	struct sgt_dma iter = sgt_dma(vma);
>   	gen6_pte_t *vaddr;
>   
> -	GEM_BUG_ON(pd->entry[act_pt] == &vm->scratch[1]);
> +	GEM_BUG_ON(!pd->entry[act_pt]);
>   
>   	vaddr = kmap_atomic_px(i915_pt_entry(pd, act_pt));
>   	do {
> @@ -194,16 +194,16 @@ static void gen6_alloc_va_range(struct i915_address_space *vm,
>   	gen6_for_each_pde(pt, pd, start, length, pde) {
>   		const unsigned int count = gen6_pte_count(start, length);
>   
> -		if (px_base(pt) == px_base(&vm->scratch[1])) {
> +		if (!pt) {
>   			spin_unlock(&pd->lock);
>   
>   			pt = stash->pt[0];
>   			GEM_BUG_ON(!pt);
>   
> -			fill32_px(pt, vm->scratch[0].encode);
> +			fill32_px(pt, vm->scratch[0]->encode);
>   
>   			spin_lock(&pd->lock);
> -			if (pd->entry[pde] == &vm->scratch[1]) {
> +			if (!pd->entry[pde]) {
>   				stash->pt[0] = pt->stash;
>   				atomic_set(&pt->used, 0);
>   				pd->entry[pde] = pt;
> @@ -225,24 +225,21 @@ static void gen6_alloc_va_range(struct i915_address_space *vm,
>   static int gen6_ppgtt_init_scratch(struct gen6_ppgtt *ppgtt)
>   {
>   	struct i915_address_space * const vm = &ppgtt->base.vm;
> -	struct i915_page_directory * const pd = ppgtt->base.pd;
>   	int ret;
>   
> -	ret = setup_scratch_page(vm, __GFP_HIGHMEM);
> +	ret = setup_scratch_page(vm);
>   	if (ret)
>   		return ret;
>   
> -	vm->scratch[0].encode =
> -		vm->pte_encode(px_dma(&vm->scratch[0]),
> +	vm->scratch[0]->encode =
> +		vm->pte_encode(px_dma(vm->scratch[0]),
>   			       I915_CACHE_NONE, PTE_READ_ONLY);
>   
> -	if (unlikely(setup_page_dma(vm, px_base(&vm->scratch[1])))) {
> -		cleanup_scratch_page(vm);
> -		return -ENOMEM;
> -	}
> +	vm->scratch[1] = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
> +	if (IS_ERR(vm->scratch[1]))
> +		return PTR_ERR(vm->scratch[1]);
>   
> -	fill32_px(&vm->scratch[1], vm->scratch[0].encode);
> -	memset_p(pd->entry, &vm->scratch[1], I915_PDES);
> +	fill32_px(vm->scratch[1], vm->scratch[0]->encode);
>   
>   	return 0;
>   }
> @@ -250,13 +247,11 @@ static int gen6_ppgtt_init_scratch(struct gen6_ppgtt *ppgtt)
>   static void gen6_ppgtt_free_pd(struct gen6_ppgtt *ppgtt)
>   {
>   	struct i915_page_directory * const pd = ppgtt->base.pd;
> -	struct i915_page_dma * const scratch =
> -		px_base(&ppgtt->base.vm.scratch[1]);
>   	struct i915_page_table *pt;
>   	u32 pde;
>   
>   	gen6_for_all_pdes(pt, pd, pde)
> -		if (px_base(pt) != scratch)
> +		if (pt)
>   			free_px(&ppgtt->base.vm, pt);
>   }
>   
> @@ -297,7 +292,7 @@ static void pd_vma_bind(struct i915_address_space *vm,
>   	struct gen6_ppgtt *ppgtt = vma->private;
>   	u32 ggtt_offset = i915_ggtt_offset(vma) / I915_GTT_PAGE_SIZE;
>   
> -	px_base(ppgtt->base.pd)->ggtt_offset = ggtt_offset * sizeof(gen6_pte_t);
> +	ppgtt->pp_dir = ggtt_offset * sizeof(gen6_pte_t) << 10;
>   	ppgtt->pd_addr = (gen6_pte_t __iomem *)ggtt->gsm + ggtt_offset;
>   
>   	gen6_flush_pd(ppgtt, 0, ppgtt->base.vm.total);
> @@ -307,8 +302,6 @@ static void pd_vma_unbind(struct i915_address_space *vm, struct i915_vma *vma)
>   {
>   	struct gen6_ppgtt *ppgtt = vma->private;
>   	struct i915_page_directory * const pd = ppgtt->base.pd;
> -	struct i915_page_dma * const scratch =
> -		px_base(&ppgtt->base.vm.scratch[1]);
>   	struct i915_page_table *pt;
>   	unsigned int pde;
>   
> @@ -317,11 +310,11 @@ static void pd_vma_unbind(struct i915_address_space *vm, struct i915_vma *vma)
>   
>   	/* Free all no longer used page tables */
>   	gen6_for_all_pdes(pt, ppgtt->base.pd, pde) {
> -		if (px_base(pt) == scratch || atomic_read(&pt->used))
> +		if (!pt || atomic_read(&pt->used))
>   			continue;
>   
>   		free_px(&ppgtt->base.vm, pt);
> -		pd->entry[pde] = scratch;
> +		pd->entry[pde] = NULL;
>   	}
>   
>   	ppgtt->scan_for_unused_pt = false;
> @@ -441,6 +434,7 @@ struct i915_ppgtt *gen6_ppgtt_create(struct intel_gt *gt)
>   	ppgtt->base.vm.insert_entries = gen6_ppgtt_insert_entries;
>   	ppgtt->base.vm.cleanup = gen6_ppgtt_cleanup;
>   
> +	ppgtt->base.vm.alloc_pt_dma = alloc_pt_dma;
>   	ppgtt->base.vm.pte_encode = ggtt->vm.pte_encode;
>   
>   	ppgtt->base.pd = __alloc_pd(sizeof(*ppgtt->base.pd));
> diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.h b/drivers/gpu/drm/i915/gt/gen6_ppgtt.h
> index 72e481806c96..7249672e5802 100644
> --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.h
> +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.h
> @@ -14,6 +14,7 @@ struct gen6_ppgtt {
>   	struct mutex flush;
>   	struct i915_vma *vma;
>   	gen6_pte_t __iomem *pd_addr;
> +	u32 pp_dir;
>   
>   	atomic_t pin_count;
>   	struct mutex pin_mutex;
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index e6f2acd445dd..d3f27beaac03 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -199,7 +199,7 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
>   			      struct i915_page_directory * const pd,
>   			      u64 start, const u64 end, int lvl)
>   {
> -	const struct i915_page_scratch * const scratch = &vm->scratch[lvl];
> +	const struct drm_i915_gem_object * const scratch = vm->scratch[lvl];
>   	unsigned int idx, len;
>   
>   	GEM_BUG_ON(end > vm->total >> GEN8_PTE_SHIFT);
> @@ -239,7 +239,7 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
>   
>   			vaddr = kmap_atomic_px(pt);
>   			memset64(vaddr + gen8_pd_index(start, 0),
> -				 vm->scratch[0].encode,
> +				 vm->scratch[0]->encode,
>   				 count);
>   			kunmap_atomic(vaddr);
>   
> @@ -301,7 +301,7 @@ static void __gen8_ppgtt_alloc(struct i915_address_space * const vm,
>   			if (lvl ||
>   			    gen8_pt_count(*start, end) < I915_PDES ||
>   			    intel_vgpu_active(vm->i915))
> -				fill_px(pt, vm->scratch[lvl].encode);
> +				fill_px(pt, vm->scratch[lvl]->encode);
>   
>   			spin_lock(&pd->lock);
>   			if (likely(!pd->entry[idx])) {
> @@ -356,16 +356,6 @@ static void gen8_ppgtt_alloc(struct i915_address_space *vm,
>   			   &start, start + length, vm->top);
>   }
>   
> -static __always_inline void
> -write_pte(gen8_pte_t *pte, const gen8_pte_t val)
> -{
> -	/* Magic delays? Or can we refine these to flush all in one pass? */
> -	*pte = val;
> -	wmb(); /* cpu to cache */
> -	clflush(pte); /* cache to memory */
> -	wmb(); /* visible to all */
> -}
> -
>   static __always_inline u64
>   gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
>   		      struct i915_page_directory *pdp,
> @@ -382,8 +372,7 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
>   	vaddr = kmap_atomic_px(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
>   	do {
>   		GEM_BUG_ON(iter->sg->length < I915_GTT_PAGE_SIZE);
> -		write_pte(&vaddr[gen8_pd_index(idx, 0)],
> -			  pte_encode | iter->dma);
> +		vaddr[gen8_pd_index(idx, 0)] = pte_encode | iter->dma;
>   
>   		iter->dma += I915_GTT_PAGE_SIZE;
>   		if (iter->dma >= iter->max) {
> @@ -406,10 +395,12 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
>   				pd = pdp->entry[gen8_pd_index(idx, 2)];
>   			}
>   
> +			clflush_cache_range(vaddr, PAGE_SIZE);
>   			kunmap_atomic(vaddr);
>   			vaddr = kmap_atomic_px(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
>   		}
>   	} while (1);
> +	clflush_cache_range(vaddr, PAGE_SIZE);
>   	kunmap_atomic(vaddr);
>   
>   	return idx;
> @@ -465,7 +456,7 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
>   
>   		do {
>   			GEM_BUG_ON(iter->sg->length < page_size);
> -			write_pte(&vaddr[index++], encode | iter->dma);
> +			vaddr[index++] = encode | iter->dma;
>   
>   			start += page_size;
>   			iter->dma += page_size;
> @@ -490,6 +481,7 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
>   			}
>   		} while (rem >= page_size && index < I915_PDES);
>   
> +		clflush_cache_range(vaddr, PAGE_SIZE);
>   		kunmap_atomic(vaddr);
>   
>   		/*
> @@ -521,7 +513,7 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
>   			if (I915_SELFTEST_ONLY(vma->vm->scrub_64K)) {
>   				u16 i;
>   
> -				encode = vma->vm->scratch[0].encode;
> +				encode = vma->vm->scratch[0]->encode;
>   				vaddr = kmap_atomic_px(i915_pt_entry(pd, maybe_64K));
>   
>   				for (i = 1; i < index; i += 16)
> @@ -575,27 +567,31 @@ static int gen8_init_scratch(struct i915_address_space *vm)
>   		GEM_BUG_ON(!clone->has_read_only);
>   
>   		vm->scratch_order = clone->scratch_order;
> -		memcpy(vm->scratch, clone->scratch, sizeof(vm->scratch));
> -		px_dma(&vm->scratch[0]) = 0; /* no xfer of ownership */
> +		for (i = 0; i <= vm->top; i++)
> +			vm->scratch[i] = i915_gem_object_get(clone->scratch[i]);
> +
>   		return 0;
>   	}
>   
> -	ret = setup_scratch_page(vm, __GFP_HIGHMEM);
> +	ret = setup_scratch_page(vm);
>   	if (ret)
>   		return ret;
>   
> -	vm->scratch[0].encode =
> -		gen8_pte_encode(px_dma(&vm->scratch[0]),
> +	vm->scratch[0]->encode =
> +		gen8_pte_encode(px_dma(vm->scratch[0]),
>   				I915_CACHE_LLC, vm->has_read_only);
>   
>   	for (i = 1; i <= vm->top; i++) {
> -		if (unlikely(setup_page_dma(vm, px_base(&vm->scratch[i]))))
> +		struct drm_i915_gem_object *obj;
> +
> +		obj = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
> +		if (IS_ERR(obj))
>   			goto free_scratch;
>   
> -		fill_px(&vm->scratch[i], vm->scratch[i - 1].encode);
> -		vm->scratch[i].encode =
> -			gen8_pde_encode(px_dma(&vm->scratch[i]),
> -					I915_CACHE_LLC);
> +		fill_px(obj, vm->scratch[i - 1]->encode);
> +		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_LLC);
> +
> +		vm->scratch[i] = obj;
>   	}
>   
>   	return 0;
> @@ -621,7 +617,7 @@ static int gen8_preallocate_top_level_pdp(struct i915_ppgtt *ppgtt)
>   		if (IS_ERR(pde))
>   			return PTR_ERR(pde);
>   
> -		fill_px(pde, vm->scratch[1].encode);
> +		fill_px(pde, vm->scratch[1]->encode);
>   		set_pd_entry(pd, idx, pde);
>   		atomic_inc(px_used(pde)); /* keep pinned */
>   	}
> @@ -642,12 +638,13 @@ gen8_alloc_top_pd(struct i915_address_space *vm)
>   	if (unlikely(!pd))
>   		return ERR_PTR(-ENOMEM);
>   
> -	if (unlikely(setup_page_dma(vm, px_base(pd)))) {
> +	pd->pt.base = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
> +	if (IS_ERR(pd->pt.base)) {
>   		kfree(pd);
>   		return ERR_PTR(-ENOMEM);
>   	}
>   
> -	fill_page_dma(px_base(pd), vm->scratch[vm->top].encode, count);
> +	fill_page_dma(px_base(pd), vm->scratch[vm->top]->encode, count);
>   	atomic_inc(px_used(pd)); /* mark as pinned */
>   	return pd;
>   }
> @@ -681,12 +678,7 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt)
>   	 */
>   	ppgtt->vm.has_read_only = !IS_GEN_RANGE(gt->i915, 11, 12);
>   
> -	/*
> -	 * There are only few exceptions for gen >=6. chv and bxt.
> -	 * And we are not sure about the latter so play safe for now.
> -	 */
> -	if (IS_CHERRYVIEW(gt->i915) || IS_BROXTON(gt->i915))
> -		ppgtt->vm.pt_kmap_wc = true;
> +	ppgtt->vm.alloc_pt_dma = alloc_pt_dma;
>   
>   	err = gen8_init_scratch(&ppgtt->vm);
>   	if (err)
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index 791e4070ef31..9db27a2e5f36 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -78,8 +78,6 @@ int i915_ggtt_init_hw(struct drm_i915_private *i915)
>   {
>   	int ret;
>   
> -	stash_init(&i915->mm.wc_stash);
> -
>   	/*
>   	 * Note that we use page colouring to enforce a guard page at the
>   	 * end of the address space. This is required as the CS may prefetch
> @@ -232,7 +230,7 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
>   
>   	/* Fill the allocated but "unused" space beyond the end of the buffer */
>   	while (gte < end)
> -		gen8_set_pte(gte++, vm->scratch[0].encode);
> +		gen8_set_pte(gte++, vm->scratch[0]->encode);
>   
>   	/*
>   	 * We want to flush the TLBs only after we're certain all the PTE
> @@ -283,7 +281,7 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
>   
>   	/* Fill the allocated but "unused" space beyond the end of the buffer */
>   	while (gte < end)
> -		iowrite32(vm->scratch[0].encode, gte++);
> +		iowrite32(vm->scratch[0]->encode, gte++);
>   
>   	/*
>   	 * We want to flush the TLBs only after we're certain all the PTE
> @@ -303,7 +301,7 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
>   	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
>   	unsigned int first_entry = start / I915_GTT_PAGE_SIZE;
>   	unsigned int num_entries = length / I915_GTT_PAGE_SIZE;
> -	const gen8_pte_t scratch_pte = vm->scratch[0].encode;
> +	const gen8_pte_t scratch_pte = vm->scratch[0]->encode;
>   	gen8_pte_t __iomem *gtt_base =
>   		(gen8_pte_t __iomem *)ggtt->gsm + first_entry;
>   	const int max_entries = ggtt_total_entries(ggtt) - first_entry;
> @@ -401,7 +399,7 @@ static void gen6_ggtt_clear_range(struct i915_address_space *vm,
>   		 first_entry, num_entries, max_entries))
>   		num_entries = max_entries;
>   
> -	scratch_pte = vm->scratch[0].encode;
> +	scratch_pte = vm->scratch[0]->encode;
>   	for (i = 0; i < num_entries; i++)
>   		iowrite32(scratch_pte, &gtt_base[i]);
>   }
> @@ -712,18 +710,11 @@ static void ggtt_cleanup_hw(struct i915_ggtt *ggtt)
>   void i915_ggtt_driver_release(struct drm_i915_private *i915)
>   {
>   	struct i915_ggtt *ggtt = &i915->ggtt;
> -	struct pagevec *pvec;
>   
>   	fini_aliasing_ppgtt(ggtt);
>   
>   	intel_ggtt_fini_fences(ggtt);
>   	ggtt_cleanup_hw(ggtt);
> -
> -	pvec = &i915->mm.wc_stash.pvec;
> -	if (pvec->nr) {
> -		set_pages_array_wb(pvec->pages, pvec->nr);
> -		__pagevec_release(pvec);
> -	}
>   }
>   
>   static unsigned int gen6_get_total_gtt_size(u16 snb_gmch_ctl)
> @@ -786,7 +777,7 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size)
>   		return -ENOMEM;
>   	}
>   
> -	ret = setup_scratch_page(&ggtt->vm, GFP_DMA32);
> +	ret = setup_scratch_page(&ggtt->vm);
>   	if (ret) {
>   		drm_err(&i915->drm, "Scratch setup failed\n");
>   		/* iounmap will also get called at remove, but meh */
> @@ -794,8 +785,8 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size)
>   		return ret;
>   	}
>   
> -	ggtt->vm.scratch[0].encode =
> -		ggtt->vm.pte_encode(px_dma(&ggtt->vm.scratch[0]),
> +	ggtt->vm.scratch[0]->encode =
> +		ggtt->vm.pte_encode(px_dma(ggtt->vm.scratch[0]),
>   				    I915_CACHE_NONE, 0);
>   
>   	return 0;
> @@ -821,7 +812,7 @@ static void gen6_gmch_remove(struct i915_address_space *vm)
>   	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
>   
>   	iounmap(ggtt->gsm);
> -	cleanup_scratch_page(vm);
> +	free_scratch(vm);
>   }
>   
>   static struct resource pci_resource(struct pci_dev *pdev, int bar)
> @@ -849,6 +840,8 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
>   	else
>   		size = gen8_get_total_gtt_size(snb_gmch_ctl);
>   
> +	ggtt->vm.alloc_pt_dma = alloc_pt_dma;
> +
>   	ggtt->vm.total = (size / sizeof(gen8_pte_t)) * I915_GTT_PAGE_SIZE;
>   	ggtt->vm.cleanup = gen6_gmch_remove;
>   	ggtt->vm.insert_page = gen8_ggtt_insert_page;
> @@ -997,6 +990,8 @@ static int gen6_gmch_probe(struct i915_ggtt *ggtt)
>   	size = gen6_get_total_gtt_size(snb_gmch_ctl);
>   	ggtt->vm.total = (size / sizeof(gen6_pte_t)) * I915_GTT_PAGE_SIZE;
>   
> +	ggtt->vm.alloc_pt_dma = alloc_pt_dma;
> +
>   	ggtt->vm.clear_range = nop_clear_range;
>   	if (!HAS_FULL_PPGTT(i915) || intel_scanout_needs_vtd_wa(i915))
>   		ggtt->vm.clear_range = gen6_ggtt_clear_range;
> @@ -1047,6 +1042,8 @@ static int i915_gmch_probe(struct i915_ggtt *ggtt)
>   	ggtt->gmadr =
>   		(struct resource)DEFINE_RES_MEM(gmadr_base, ggtt->mappable_end);
>   
> +	ggtt->vm.alloc_pt_dma = alloc_pt_dma;
> +
>   	ggtt->do_idle_maps = needs_idle_maps(i915);
>   	ggtt->vm.insert_page = i915_ggtt_insert_page;
>   	ggtt->vm.insert_entries = i915_ggtt_insert_entries;
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index 2a72cce63fd9..e0cc90942848 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -11,160 +11,24 @@
>   #include "intel_gt.h"
>   #include "intel_gtt.h"
>   
> -void stash_init(struct pagestash *stash)
> +struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz)
>   {
> -	pagevec_init(&stash->pvec);
> -	spin_lock_init(&stash->lock);
> -}
> -
> -static struct page *stash_pop_page(struct pagestash *stash)
> -{
> -	struct page *page = NULL;
> -
> -	spin_lock(&stash->lock);
> -	if (likely(stash->pvec.nr))
> -		page = stash->pvec.pages[--stash->pvec.nr];
> -	spin_unlock(&stash->lock);
> -
> -	return page;
> -}
> -
> -static void stash_push_pagevec(struct pagestash *stash, struct pagevec *pvec)
> -{
> -	unsigned int nr;
> -
> -	spin_lock_nested(&stash->lock, SINGLE_DEPTH_NESTING);
> -
> -	nr = min_t(typeof(nr), pvec->nr, pagevec_space(&stash->pvec));
> -	memcpy(stash->pvec.pages + stash->pvec.nr,
> -	       pvec->pages + pvec->nr - nr,
> -	       sizeof(pvec->pages[0]) * nr);
> -	stash->pvec.nr += nr;
> -
> -	spin_unlock(&stash->lock);
> -
> -	pvec->nr -= nr;
> -}
> -
> -static struct page *vm_alloc_page(struct i915_address_space *vm, gfp_t gfp)
> -{
> -	struct pagevec stack;
> -	struct page *page;
> -
> -	if (I915_SELFTEST_ONLY(should_fail(&vm->fault_attr, 1)))
> -		i915_gem_shrink_all(vm->i915);
> -
> -	page = stash_pop_page(&vm->free_pages);
> -	if (page)
> -		return page;
> -
> -	if (!vm->pt_kmap_wc)
> -		return alloc_page(gfp);
> -
> -	/* Look in our global stash of WC pages... */
> -	page = stash_pop_page(&vm->i915->mm.wc_stash);
> -	if (page)
> -		return page;
> +	struct drm_i915_gem_object *obj;
> +	int err;
>   
> -	/*
> -	 * Otherwise batch allocate pages to amortize cost of set_pages_wc.
> -	 *
> -	 * We have to be careful as page allocation may trigger the shrinker
> -	 * (via direct reclaim) which will fill up the WC stash underneath us.
> -	 * So we add our WB pages into a temporary pvec on the stack and merge
> -	 * them into the WC stash after all the allocations are complete.
> -	 */
> -	pagevec_init(&stack);
> -	do {
> -		struct page *page;
> -
> -		page = alloc_page(gfp);
> -		if (unlikely(!page))
> -			break;
> -
> -		stack.pages[stack.nr++] = page;
> -	} while (pagevec_space(&stack));
> -
> -	if (stack.nr && !set_pages_array_wc(stack.pages, stack.nr)) {
> -		page = stack.pages[--stack.nr];
> -
> -		/* Merge spare WC pages to the global stash */
> -		if (stack.nr)
> -			stash_push_pagevec(&vm->i915->mm.wc_stash, &stack);
> -
> -		/* Push any surplus WC pages onto the local VM stash */
> -		if (stack.nr)
> -			stash_push_pagevec(&vm->free_pages, &stack);
> -	}
> -
> -	/* Return unwanted leftovers */
> -	if (unlikely(stack.nr)) {
> -		WARN_ON_ONCE(set_pages_array_wb(stack.pages, stack.nr));
> -		__pagevec_release(&stack);
> -	}
> -
> -	return page;
> -}
> -
> -static void vm_free_pages_release(struct i915_address_space *vm,
> -				  bool immediate)
> -{
> -	struct pagevec *pvec = &vm->free_pages.pvec;
> -	struct pagevec stack;
> -
> -	lockdep_assert_held(&vm->free_pages.lock);
> -	GEM_BUG_ON(!pagevec_count(pvec));
> -
> -	if (vm->pt_kmap_wc) {
> -		/*
> -		 * When we use WC, first fill up the global stash and then
> -		 * only if full immediately free the overflow.
> -		 */
> -		stash_push_pagevec(&vm->i915->mm.wc_stash, pvec);
> +	obj = i915_gem_object_create_internal(vm->i915, sz);
> +	if (IS_ERR(obj))
> +		return obj;
>   
> -		/*
> -		 * As we have made some room in the VM's free_pages,
> -		 * we can wait for it to fill again. Unless we are
> -		 * inside i915_address_space_fini() and must
> -		 * immediately release the pages!
> -		 */
> -		if (pvec->nr <= (immediate ? 0 : PAGEVEC_SIZE - 1))
> -			return;
> -
> -		/*
> -		 * We have to drop the lock to allow ourselves to sleep,
> -		 * so take a copy of the pvec and clear the stash for
> -		 * others to use it as we sleep.
> -		 */
> -		stack = *pvec;
> -		pagevec_reinit(pvec);
> -		spin_unlock(&vm->free_pages.lock);
> -
> -		pvec = &stack;
> -		set_pages_array_wb(pvec->pages, pvec->nr);
> -
> -		spin_lock(&vm->free_pages.lock);
> +	err = i915_gem_object_pin_pages(obj);
> +	if (err) {
> +		i915_gem_object_put(obj);
> +		return ERR_PTR(err);
>   	}
>   
> -	__pagevec_release(pvec);
> -}
> +	i915_gem_object_make_unshrinkable(obj);
>   
> -static void vm_free_page(struct i915_address_space *vm, struct page *page)
> -{
> -	/*
> -	 * On !llc, we need to change the pages back to WB. We only do so
> -	 * in bulk, so we rarely need to change the page attributes here,
> -	 * but doing so requires a stop_machine() from deep inside arch/x86/mm.
> -	 * To make detection of the possible sleep more likely, use an
> -	 * unconditional might_sleep() for everybody.
> -	 */
> -	might_sleep();
> -	spin_lock(&vm->free_pages.lock);
> -	while (!pagevec_space(&vm->free_pages.pvec))
> -		vm_free_pages_release(vm, false);
> -	GEM_BUG_ON(pagevec_count(&vm->free_pages.pvec) >= PAGEVEC_SIZE);
> -	pagevec_add(&vm->free_pages.pvec, page);
> -	spin_unlock(&vm->free_pages.lock);
> +	return obj;
>   }
>   
>   void __i915_vm_close(struct i915_address_space *vm)
> @@ -194,14 +58,7 @@ void __i915_vm_close(struct i915_address_space *vm)
>   
>   void i915_address_space_fini(struct i915_address_space *vm)
>   {
> -	spin_lock(&vm->free_pages.lock);
> -	if (pagevec_count(&vm->free_pages.pvec))
> -		vm_free_pages_release(vm, true);
> -	GEM_BUG_ON(pagevec_count(&vm->free_pages.pvec));
> -	spin_unlock(&vm->free_pages.lock);
> -
>   	drm_mm_takedown(&vm->mm);
> -
>   	mutex_destroy(&vm->mutex);
>   }
>   
> @@ -246,8 +103,6 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
>   	drm_mm_init(&vm->mm, 0, vm->total);
>   	vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
>   
> -	stash_init(&vm->free_pages);
> -
>   	INIT_LIST_HEAD(&vm->bound_list);
>   }
>   
> @@ -264,64 +119,47 @@ void clear_pages(struct i915_vma *vma)
>   	memset(&vma->page_sizes, 0, sizeof(vma->page_sizes));
>   }
>   
> -static int __setup_page_dma(struct i915_address_space *vm,
> -			    struct i915_page_dma *p,
> -			    gfp_t gfp)
> -{
> -	p->page = vm_alloc_page(vm, gfp | I915_GFP_ALLOW_FAIL);
> -	if (unlikely(!p->page))
> -		return -ENOMEM;
> -
> -	p->daddr = dma_map_page_attrs(vm->dma,
> -				      p->page, 0, PAGE_SIZE,
> -				      PCI_DMA_BIDIRECTIONAL,
> -				      DMA_ATTR_SKIP_CPU_SYNC |
> -				      DMA_ATTR_NO_WARN);
> -	if (unlikely(dma_mapping_error(vm->dma, p->daddr))) {
> -		vm_free_page(vm, p->page);
> -		return -ENOMEM;
> -	}
> -
> -	return 0;
> -}
> -
> -int setup_page_dma(struct i915_address_space *vm, struct i915_page_dma *p)
> +dma_addr_t __px_dma(struct drm_i915_gem_object *p)
>   {
> -	return __setup_page_dma(vm, p, __GFP_HIGHMEM);
> +	GEM_BUG_ON(!i915_gem_object_has_pages(p));
> +	return sg_dma_address(p->mm.pages->sgl);
>   }
>   
> -void cleanup_page_dma(struct i915_address_space *vm, struct i915_page_dma *p)
> +struct page *__px_page(struct drm_i915_gem_object *p)
>   {
> -	dma_unmap_page(vm->dma, p->daddr, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> -	vm_free_page(vm, p->page);
> +	GEM_BUG_ON(!i915_gem_object_has_pages(p));
> +	return sg_page(p->mm.pages->sgl);
>   }
>   
>   void
> -fill_page_dma(const struct i915_page_dma *p, const u64 val, unsigned int count)
> +fill_page_dma(struct drm_i915_gem_object *p, const u64 val, unsigned int count)
>   {
> -	kunmap_atomic(memset64(kmap_atomic(p->page), val, count));
> +	struct page *page = __px_page(p);
> +	void *vaddr;
> +
> +	vaddr = kmap(page);
> +	memset64(vaddr, val, count);
> +	kunmap(page);
>   }
>   
> -static void poison_scratch_page(struct page *page, unsigned long size)
> +static void poison_scratch_page(struct drm_i915_gem_object *scratch)
>   {
> +	struct sgt_iter sgt;
> +	struct page *page;
> +
>   	if (!IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
>   		return;
>   
> -	GEM_BUG_ON(!IS_ALIGNED(size, PAGE_SIZE));
> -
> -	do {
> +	for_each_sgt_page(page, sgt, scratch->mm.pages) {
>   		void *vaddr;
>   
>   		vaddr = kmap(page);
>   		memset(vaddr, POISON_FREE, PAGE_SIZE);
>   		kunmap(page);
> -
> -		page = pfn_to_page(page_to_pfn(page) + 1);
> -		size -= PAGE_SIZE;
> -	} while (size);
> +	}
>   }
>   
> -int setup_scratch_page(struct i915_address_space *vm, gfp_t gfp)
> +int setup_scratch_page(struct i915_address_space *vm)
>   {
>   	unsigned long size;
>   
> @@ -338,21 +176,19 @@ int setup_scratch_page(struct i915_address_space *vm, gfp_t gfp)
>   	 */
>   	size = I915_GTT_PAGE_SIZE_4K;
>   	if (i915_vm_is_4lvl(vm) &&
> -	    HAS_PAGE_SIZES(vm->i915, I915_GTT_PAGE_SIZE_64K)) {
> +	    HAS_PAGE_SIZES(vm->i915, I915_GTT_PAGE_SIZE_64K))
>   		size = I915_GTT_PAGE_SIZE_64K;
> -		gfp |= __GFP_NOWARN;
> -	}
> -	gfp |= __GFP_ZERO | __GFP_RETRY_MAYFAIL;
>   
>   	do {
> -		unsigned int order = get_order(size);
> -		struct page *page;
> -		dma_addr_t addr;
> +		struct drm_i915_gem_object *obj;
>   
> -		page = alloc_pages(gfp, order);
> -		if (unlikely(!page))
> +		obj = vm->alloc_pt_dma(vm, size);
> +		if (IS_ERR(obj))
>   			goto skip;
>   
> +		if (obj->mm.page_sizes.sg < size)
> +			goto skip_obj;
> +
>   		/*
>   		 * Use a non-zero scratch page for debugging.
>   		 *
> @@ -362,61 +198,28 @@ int setup_scratch_page(struct i915_address_space *vm, gfp_t gfp)
>   		 * should it ever be accidentally used, the effect should be
>   		 * fairly benign.
>   		 */
> -		poison_scratch_page(page, size);
> -
> -		addr = dma_map_page_attrs(vm->dma,
> -					  page, 0, size,
> -					  PCI_DMA_BIDIRECTIONAL,
> -					  DMA_ATTR_SKIP_CPU_SYNC |
> -					  DMA_ATTR_NO_WARN);
> -		if (unlikely(dma_mapping_error(vm->dma, addr)))
> -			goto free_page;
> -
> -		if (unlikely(!IS_ALIGNED(addr, size)))
> -			goto unmap_page;
> -
> -		vm->scratch[0].base.page = page;
> -		vm->scratch[0].base.daddr = addr;
> -		vm->scratch_order = order;
> +		poison_scratch_page(obj);
> +
> +		vm->scratch[0] = obj;
> +		vm->scratch_order = get_order(size);
>   		return 0;
>   
> -unmap_page:
> -		dma_unmap_page(vm->dma, addr, size, PCI_DMA_BIDIRECTIONAL);
> -free_page:
> -		__free_pages(page, order);
> +skip_obj:
> +		i915_gem_object_put(obj);
>   skip:
>   		if (size == I915_GTT_PAGE_SIZE_4K)
>   			return -ENOMEM;
>   
>   		size = I915_GTT_PAGE_SIZE_4K;
> -		gfp &= ~__GFP_NOWARN;
>   	} while (1);
>   }
>   
> -void cleanup_scratch_page(struct i915_address_space *vm)
> -{
> -	struct i915_page_dma *p = px_base(&vm->scratch[0]);
> -	unsigned int order = vm->scratch_order;
> -
> -	dma_unmap_page(vm->dma, p->daddr, BIT(order) << PAGE_SHIFT,
> -		       PCI_DMA_BIDIRECTIONAL);
> -	__free_pages(p->page, order);
> -}
> -
>   void free_scratch(struct i915_address_space *vm)
>   {
>   	int i;
>   
> -	if (!px_dma(&vm->scratch[0])) /* set to 0 on clones */
> -		return;
> -
> -	for (i = 1; i <= vm->top; i++) {
> -		if (!px_dma(&vm->scratch[i]))
> -			break;
> -		cleanup_page_dma(vm, px_base(&vm->scratch[i]));
> -	}
> -
> -	cleanup_scratch_page(vm);
> +	for (i = 0; i <= vm->top; i++)
> +		i915_gem_object_put(vm->scratch[i]);
>   }
>   
>   void gtt_write_workarounds(struct intel_gt *gt)
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index 8bd462d2fcd9..57b31b36285f 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -134,31 +134,19 @@ typedef u64 gen8_pte_t;
>   #define GEN8_PDE_IPS_64K BIT(11)
>   #define GEN8_PDE_PS_2M   BIT(7)
>   
> +enum i915_cache_level;
> +
> +struct drm_i915_file_private;
> +struct drm_i915_gem_object;
>   struct i915_fence_reg;
> +struct i915_vma;
> +struct intel_gt;
>   
>   #define for_each_sgt_daddr(__dp, __iter, __sgt) \
>   	__for_each_sgt_daddr(__dp, __iter, __sgt, I915_GTT_PAGE_SIZE)
>   
> -struct i915_page_dma {
> -	struct page *page;
> -	union {
> -		dma_addr_t daddr;
> -
> -		/*
> -		 * For gen6/gen7 only. This is the offset in the GGTT
> -		 * where the page directory entries for PPGTT begin
> -		 */
> -		u32 ggtt_offset;
> -	};
> -};
> -
> -struct i915_page_scratch {
> -	struct i915_page_dma base;
> -	u64 encode;
> -};
> -
>   struct i915_page_table {
> -	struct i915_page_dma base;
> +	struct drm_i915_gem_object *base;
>   	union {
>   		atomic_t used;
>   		struct i915_page_table *stash;
> @@ -179,12 +167,14 @@ struct i915_page_directory {
>   	other)
>   
>   #define px_base(px) \
> -	__px_choose_expr(px, struct i915_page_dma *, __x, \
> -	__px_choose_expr(px, struct i915_page_scratch *, &__x->base, \
> -	__px_choose_expr(px, struct i915_page_table *, &__x->base, \
> -	__px_choose_expr(px, struct i915_page_directory *, &__x->pt.base, \
> -	(void)0))))
> -#define px_dma(px) (px_base(px)->daddr)
> +	__px_choose_expr(px, struct drm_i915_gem_object *, __x, \
> +	__px_choose_expr(px, struct i915_page_table *, __x->base, \
> +	__px_choose_expr(px, struct i915_page_directory *, __x->pt.base, \
> +	(void)0)))
> +
> +struct page *__px_page(struct drm_i915_gem_object *p);
> +dma_addr_t __px_dma(struct drm_i915_gem_object *p);
> +#define px_dma(px) (__px_dma(px_base(px)))
>   
>   #define px_pt(px) \
>   	__px_choose_expr(px, struct i915_page_table *, __x, \
> @@ -192,13 +182,6 @@ struct i915_page_directory {
>   	(void)0))
>   #define px_used(px) (&px_pt(px)->used)
>   
> -enum i915_cache_level;
> -
> -struct drm_i915_file_private;
> -struct drm_i915_gem_object;
> -struct i915_vma;
> -struct intel_gt;
> -
>   struct i915_vm_pt_stash {
>   	/* preallocated chains of page tables/directories */
>   	struct i915_page_table *pt[2];
> @@ -222,13 +205,6 @@ struct i915_vma_ops {
>   	void (*clear_pages)(struct i915_vma *vma);
>   };
>   
> -struct pagestash {
> -	spinlock_t lock;
> -	struct pagevec pvec;
> -};
> -
> -void stash_init(struct pagestash *stash);
> -
>   struct i915_address_space {
>   	struct kref ref;
>   	struct rcu_work rcu;
> @@ -265,7 +241,7 @@ struct i915_address_space {
>   #define VM_CLASS_GGTT 0
>   #define VM_CLASS_PPGTT 1
>   
> -	struct i915_page_scratch scratch[4];
> +	struct drm_i915_gem_object *scratch[4];
>   	unsigned int scratch_order;
>   	unsigned int top;
>   
> @@ -274,17 +250,15 @@ struct i915_address_space {
>   	 */
>   	struct list_head bound_list;
>   
> -	struct pagestash free_pages;
> -
>   	/* Global GTT */
>   	bool is_ggtt:1;
>   
> -	/* Some systems require uncached updates of the page directories */
> -	bool pt_kmap_wc:1;
> -
>   	/* Some systems support read-only mappings for GGTT and/or PPGTT */
>   	bool has_read_only:1;
>   
> +	struct drm_i915_gem_object *
> +		(*alloc_pt_dma)(struct i915_address_space *vm, int sz);
> +
>   	u64 (*pte_encode)(dma_addr_t addr,
>   			  enum i915_cache_level level,
>   			  u32 flags); /* Create a valid PTE */
> @@ -500,9 +474,9 @@ i915_pd_entry(const struct i915_page_directory * const pdp,
>   static inline dma_addr_t
>   i915_page_dir_dma_addr(const struct i915_ppgtt *ppgtt, const unsigned int n)
>   {
> -	struct i915_page_dma *pt = ppgtt->pd->entry[n];
> +	struct i915_page_table *pt = ppgtt->pd->entry[n];
>   
> -	return px_dma(pt ?: px_base(&ppgtt->vm.scratch[ppgtt->vm.top]));
> +	return __px_dma(pt ? px_base(pt) : ppgtt->vm.scratch[ppgtt->vm.top]);
>   }
>   
>   void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt);
> @@ -527,13 +501,10 @@ struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt);
>   void i915_ggtt_suspend(struct i915_ggtt *gtt);
>   void i915_ggtt_resume(struct i915_ggtt *ggtt);
>   
> -int setup_page_dma(struct i915_address_space *vm, struct i915_page_dma *p);
> -void cleanup_page_dma(struct i915_address_space *vm, struct i915_page_dma *p);
> -
> -#define kmap_atomic_px(px) kmap_atomic(px_base(px)->page)
> +#define kmap_atomic_px(px) kmap_atomic(__px_page(px_base(px)))
>   
>   void
> -fill_page_dma(const struct i915_page_dma *p, const u64 val, unsigned int count);
> +fill_page_dma(struct drm_i915_gem_object *p, const u64 val, unsigned int count);
>   
>   #define fill_px(px, v) fill_page_dma(px_base(px), (v), PAGE_SIZE / sizeof(u64))
>   #define fill32_px(px, v) do {						\
> @@ -541,37 +512,36 @@ fill_page_dma(const struct i915_page_dma *p, const u64 val, unsigned int count);
>   	fill_px((px), v__ << 32 | v__);					\
>   } while (0)
>   
> -int setup_scratch_page(struct i915_address_space *vm, gfp_t gfp);
> -void cleanup_scratch_page(struct i915_address_space *vm);
> +int setup_scratch_page(struct i915_address_space *vm);
>   void free_scratch(struct i915_address_space *vm);
>   
> +struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz);
>   struct i915_page_table *alloc_pt(struct i915_address_space *vm);
>   struct i915_page_directory *alloc_pd(struct i915_address_space *vm);
>   struct i915_page_directory *__alloc_pd(size_t sz);
>   
> -void free_pd(struct i915_address_space *vm, struct i915_page_dma *pd);
> -
> -#define free_px(vm, px) free_pd(vm, px_base(px))
> +void free_pt(struct i915_address_space *vm, struct i915_page_table *pt);
> +#define free_px(vm, px) free_pt(vm, px_pt(px))
>   
>   void
>   __set_pd_entry(struct i915_page_directory * const pd,
>   	       const unsigned short idx,
> -	       struct i915_page_dma * const to,
> +	       struct i915_page_table *pt,
>   	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level));
>   
>   #define set_pd_entry(pd, idx, to) \
> -	__set_pd_entry((pd), (idx), px_base(to), gen8_pde_encode)
> +	__set_pd_entry((pd), (idx), px_pt(to), gen8_pde_encode)
>   
>   void
>   clear_pd_entry(struct i915_page_directory * const pd,
>   	       const unsigned short idx,
> -	       const struct i915_page_scratch * const scratch);
> +	       const struct drm_i915_gem_object * const scratch);
>   
>   bool
>   release_pd_entry(struct i915_page_directory * const pd,
>   		 const unsigned short idx,
>   		 struct i915_page_table * const pt,
> -		 const struct i915_page_scratch * const scratch);
> +		 const struct drm_i915_gem_object * const scratch);
>   void gen6_ggtt_invalidate(struct i915_ggtt *ggtt);
>   
>   int ggtt_set_pages(struct i915_vma *vma);
> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> index 9633fd2d294d..94bd969ebffd 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> @@ -18,7 +18,8 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm)
>   	if (unlikely(!pt))
>   		return ERR_PTR(-ENOMEM);
>   
> -	if (unlikely(setup_page_dma(vm, &pt->base))) {
> +	pt->base = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
> +	if (IS_ERR(pt->base)) {
>   		kfree(pt);
>   		return ERR_PTR(-ENOMEM);
>   	}
> @@ -47,7 +48,8 @@ struct i915_page_directory *alloc_pd(struct i915_address_space *vm)
>   	if (unlikely(!pd))
>   		return ERR_PTR(-ENOMEM);
>   
> -	if (unlikely(setup_page_dma(vm, px_base(pd)))) {
> +	pd->pt.base = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
> +	if (IS_ERR(pd->pt.base)) {
>   		kfree(pd);
>   		return ERR_PTR(-ENOMEM);
>   	}
> @@ -55,27 +57,28 @@ struct i915_page_directory *alloc_pd(struct i915_address_space *vm)
>   	return pd;
>   }
>   
> -void free_pd(struct i915_address_space *vm, struct i915_page_dma *pd)
> +void free_pt(struct i915_address_space *vm, struct i915_page_table *pt)
>   {
> -	cleanup_page_dma(vm, pd);
> -	kfree(pd);
> +	i915_gem_object_put(pt->base);
> +	kfree(pt);
>   }
>   
>   static inline void
> -write_dma_entry(struct i915_page_dma * const pdma,
> +write_dma_entry(struct drm_i915_gem_object * const pdma,
>   		const unsigned short idx,
>   		const u64 encoded_entry)
>   {
> -	u64 * const vaddr = kmap_atomic(pdma->page);
> +	u64 * const vaddr = kmap_atomic(__px_page(pdma));
>   
>   	vaddr[idx] = encoded_entry;
> +	clflush_cache_range(&vaddr[idx], sizeof(u64));
>   	kunmap_atomic(vaddr);
>   }
>   
>   void
>   __set_pd_entry(struct i915_page_directory * const pd,
>   	       const unsigned short idx,
> -	       struct i915_page_dma * const to,
> +	       struct i915_page_table * const to,
>   	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level))
>   {
>   	/* Each thread pre-pins the pd, and we may have a thread per pde. */
> @@ -83,13 +86,13 @@ __set_pd_entry(struct i915_page_directory * const pd,
>   
>   	atomic_inc(px_used(pd));
>   	pd->entry[idx] = to;
> -	write_dma_entry(px_base(pd), idx, encode(to->daddr, I915_CACHE_LLC));
> +	write_dma_entry(px_base(pd), idx, encode(px_dma(to), I915_CACHE_LLC));
>   }
>   
>   void
>   clear_pd_entry(struct i915_page_directory * const pd,
>   	       const unsigned short idx,
> -	       const struct i915_page_scratch * const scratch)
> +	       const struct drm_i915_gem_object * const scratch)
>   {
>   	GEM_BUG_ON(atomic_read(px_used(pd)) == 0);
>   
> @@ -102,7 +105,7 @@ bool
>   release_pd_entry(struct i915_page_directory * const pd,
>   		 const unsigned short idx,
>   		 struct i915_page_table * const pt,
> -		 const struct i915_page_scratch * const scratch)
> +		 const struct drm_i915_gem_object * const scratch)
>   {
>   	bool free = false;
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> index 68a08486fc87..f1f27b7fc746 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> @@ -201,16 +201,18 @@ static struct i915_address_space *vm_alias(struct i915_address_space *vm)
>   	return vm;
>   }
>   
> +static u32 pp_dir(struct i915_address_space *vm)
> +{
> +	return to_gen6_ppgtt(i915_vm_to_ppgtt(vm))->pp_dir;
> +}
> +
>   static void set_pp_dir(struct intel_engine_cs *engine)
>   {
>   	struct i915_address_space *vm = vm_alias(engine->gt->vm);
>   
>   	if (vm) {
> -		struct i915_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
> -
>   		ENGINE_WRITE(engine, RING_PP_DIR_DCLV, PP_DIR_DCLV_2G);
> -		ENGINE_WRITE(engine, RING_PP_DIR_BASE,
> -			     px_base(ppgtt->pd)->ggtt_offset << 10);
> +		ENGINE_WRITE(engine, RING_PP_DIR_BASE, pp_dir(vm));
>   	}
>   }
>   
> @@ -608,7 +610,7 @@ static const struct intel_context_ops ring_context_ops = {
>   };
>   
>   static int load_pd_dir(struct i915_request *rq,
> -		       const struct i915_ppgtt *ppgtt,
> +		       struct i915_address_space *vm,
>   		       u32 valid)
>   {
>   	const struct intel_engine_cs * const engine = rq->engine;
> @@ -624,7 +626,7 @@ static int load_pd_dir(struct i915_request *rq,
>   
>   	*cs++ = MI_LOAD_REGISTER_IMM(1);
>   	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine->mmio_base));
> -	*cs++ = px_base(ppgtt->pd)->ggtt_offset << 10;
> +	*cs++ = pp_dir(vm);
>   
>   	/* Stall until the page table load is complete? */
>   	*cs++ = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT;
> @@ -826,7 +828,7 @@ static int switch_mm(struct i915_request *rq, struct i915_address_space *vm)
>   	 * post-sync op, this extra pass appears vital before a
>   	 * mm switch!
>   	 */
> -	ret = load_pd_dir(rq, i915_vm_to_ppgtt(vm), PP_DIR_DCLV_2G);
> +	ret = load_pd_dir(rq, vm, PP_DIR_DCLV_2G);
>   	if (ret)
>   		return ret;
>   
> diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
> index 3c3b9842bbbd..1570eb8aa978 100644
> --- a/drivers/gpu/drm/i915/gvt/scheduler.c
> +++ b/drivers/gpu/drm/i915/gvt/scheduler.c
> @@ -403,6 +403,14 @@ static void release_shadow_wa_ctx(struct intel_shadow_wa_ctx *wa_ctx)
>   	wa_ctx->indirect_ctx.shadow_va = NULL;
>   }
>   
> +static void set_dma_address(struct i915_page_directory *pd, dma_addr_t addr)
> +{
> +	struct scatterlist *sg = pd->pt.base->mm.pages->sgl;
> +
> +	/* This is not a good idea */
> +	sg->dma_address = addr;
> +}
> +
>   static void set_context_ppgtt_from_shadow(struct intel_vgpu_workload *workload,
>   					  struct intel_context *ce)
>   {
> @@ -411,7 +419,7 @@ static void set_context_ppgtt_from_shadow(struct intel_vgpu_workload *workload,
>   	int i = 0;
>   
>   	if (mm->ppgtt_mm.root_entry_type == GTT_TYPE_PPGTT_ROOT_L4_ENTRY) {
> -		px_dma(ppgtt->pd) = mm->ppgtt_mm.shadow_pdps[0];
> +		set_dma_address(ppgtt->pd, mm->ppgtt_mm.shadow_pdps[0]);
>   	} else {
>   		for (i = 0; i < GVT_RING_CTX_NR_PDPS; i++) {
>   			struct i915_page_directory * const pd =
> @@ -421,7 +429,8 @@ static void set_context_ppgtt_from_shadow(struct intel_vgpu_workload *workload,
>   			   shadow ppgtt. */
>   			if (!pd)
>   				break;
> -			px_dma(pd) = mm->ppgtt_mm.shadow_pdps[i];
> +
> +			set_dma_address(pd, mm->ppgtt_mm.shadow_pdps[i]);
>   		}
>   	}
>   }
> @@ -1240,13 +1249,13 @@ i915_context_ppgtt_root_restore(struct intel_vgpu_submission *s,
>   	int i;
>   
>   	if (i915_vm_is_4lvl(&ppgtt->vm)) {
> -		px_dma(ppgtt->pd) = s->i915_context_pml4;
> +		set_dma_address(ppgtt->pd, s->i915_context_pml4);
>   	} else {
>   		for (i = 0; i < GEN8_3LVL_PDPES; i++) {
>   			struct i915_page_directory * const pd =
>   				i915_pd_entry(ppgtt->pd, i);
>   
> -			px_dma(pd) = s->i915_context_pdps[i];
> +			set_dma_address(pd, s->i915_context_pdps[i]);
>   		}
>   	}
>   }
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 67102dc26fce..ea281d7b0630 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1080,6 +1080,7 @@ static void i915_driver_release(struct drm_device *dev)
>   
>   	intel_memory_regions_driver_release(dev_priv);
>   	i915_ggtt_driver_release(dev_priv);
> +	i915_gem_drain_freed_objects(dev_priv);
>   
>   	i915_driver_mmio_release(dev_priv);
>   
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 6e9072ab30a1..100c2029798f 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -590,11 +590,6 @@ struct i915_gem_mm {
>   	 */
>   	atomic_t free_count;
>   
> -	/**
> -	 * Small stash of WC pages
> -	 */
> -	struct pagestash wc_stash;
> -
>   	/**
>   	 * tmpfs instance used for shmem backed objects
>   	 */
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gtt.c b/drivers/gpu/drm/i915/selftests/mock_gtt.c
> index 5e4fb0fba34b..63a29211652e 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gtt.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gtt.c
> @@ -78,6 +78,8 @@ struct i915_ppgtt *mock_ppgtt(struct drm_i915_private *i915, const char *name)
>   
>   	i915_address_space_init(&ppgtt->vm, VM_CLASS_PPGTT);
>   
> +	ppgtt->vm.alloc_pt_dma = alloc_pt_dma;
> +
>   	ppgtt->vm.clear_range = mock_clear_range;
>   	ppgtt->vm.insert_page = mock_insert_page;
>   	ppgtt->vm.insert_entries = mock_insert_entries;
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Intel-gfx] [PATCH 11/23] drm/i915/gem: Remove the call for no-evict i915_vma_pin
  2020-07-02  8:32 ` [Intel-gfx] [PATCH 11/23] drm/i915/gem: Remove the call for no-evict i915_vma_pin Chris Wilson
@ 2020-07-03  8:59   ` Tvrtko Ursulin
  2020-07-03  9:23     ` Chris Wilson
  0 siblings, 1 reply; 56+ messages in thread
From: Tvrtko Ursulin @ 2020-07-03  8:59 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 02/07/2020 09:32, Chris Wilson wrote:
> Remove the stub i915_vma_pin() used for incrementally pinning objects for
> execbuf (under the severe restriction that they must not wait on a
> resource as we may have already pinned it) and replace it with a
> i915_vma_pin_inplace() that is only allowed to reclaim the currently
> bound location for the vma (and will never wait for a pinned resource).

Hm I thought the point of the previous patch ("drm/i915/gem: Break apart 
the early i915_vma_pin from execbuf object lookup") was to move the 
pinning into a phase under the ww lock, where it will be allowed. I 
misunderstood something?

Regards,

Tvrtko

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 69 +++++++++++--------
>   drivers/gpu/drm/i915/i915_vma.c               |  6 +-
>   drivers/gpu/drm/i915/i915_vma.h               |  2 +
>   3 files changed, 45 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 1348cb5ec7e6..18e9325dd98a 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -452,49 +452,55 @@ static u64 eb_pin_flags(const struct drm_i915_gem_exec_object2 *entry,
>   	return pin_flags;
>   }
>   
> +static bool eb_pin_vma_fence_inplace(struct eb_vma *ev)
> +{
> +	struct i915_vma *vma = ev->vma;
> +	struct i915_fence_reg *reg = vma->fence;
> +
> +	if (reg) {
> +		if (READ_ONCE(reg->dirty))
> +			return false;
> +
> +		atomic_inc(&reg->pin_count);
> +		ev->flags |= __EXEC_OBJECT_HAS_FENCE;
> +	} else {
> +		if (i915_gem_object_is_tiled(vma->obj))
> +			return false;
> +	}
> +
> +	return true;
> +}
> +
>   static inline bool
> -eb_pin_vma(struct i915_execbuffer *eb,
> -	   const struct drm_i915_gem_exec_object2 *entry,
> -	   struct eb_vma *ev)
> +eb_pin_vma_inplace(struct i915_execbuffer *eb,
> +		   const struct drm_i915_gem_exec_object2 *entry,
> +		   struct eb_vma *ev)
>   {
>   	struct i915_vma *vma = ev->vma;
> -	u64 pin_flags;
> +	unsigned int pin_flags;
>   
> -	if (vma->node.size)
> -		pin_flags = vma->node.start;
> -	else
> -		pin_flags = entry->offset & PIN_OFFSET_MASK;
> +	if (eb_vma_misplaced(entry, vma, ev->flags))
> +		return false;
>   
> -	pin_flags |= PIN_USER | PIN_NOEVICT | PIN_OFFSET_FIXED;
> +	pin_flags = PIN_USER;
>   	if (unlikely(ev->flags & EXEC_OBJECT_NEEDS_GTT))
>   		pin_flags |= PIN_GLOBAL;
>   
>   	/* Attempt to reuse the current location if available */
> -	if (unlikely(i915_vma_pin(vma, 0, 0, pin_flags))) {
> -		if (entry->flags & EXEC_OBJECT_PINNED)
> -			return false;
> -
> -		/* Failing that pick any _free_ space if suitable */
> -		if (unlikely(i915_vma_pin(vma,
> -					  entry->pad_to_size,
> -					  entry->alignment,
> -					  eb_pin_flags(entry, ev->flags) |
> -					  PIN_USER | PIN_NOEVICT)))
> -			return false;
> -	}
> +	if (!i915_vma_pin_inplace(vma, pin_flags))
> +		return false;
>   
>   	if (unlikely(ev->flags & EXEC_OBJECT_NEEDS_FENCE)) {
> -		if (unlikely(i915_vma_pin_fence(vma))) {
> -			i915_vma_unpin(vma);
> +		if (!eb_pin_vma_fence_inplace(ev)) {
> +			__i915_vma_unpin(vma);
>   			return false;
>   		}
> -
> -		if (vma->fence)
> -			ev->flags |= __EXEC_OBJECT_HAS_FENCE;
>   	}
>   
> +	GEM_BUG_ON(eb_vma_misplaced(entry, vma, ev->flags));
> +
>   	ev->flags |= __EXEC_OBJECT_HAS_PIN;
> -	return !eb_vma_misplaced(entry, vma, ev->flags);
> +	return true;
>   }
>   
>   static int
> @@ -676,14 +682,17 @@ static int eb_reserve_vm(struct i915_execbuffer *eb)
>   		struct drm_i915_gem_exec_object2 *entry = ev->exec;
>   		struct i915_vma *vma = ev->vma;
>   
> -		if (eb_pin_vma(eb, entry, ev)) {
> +		if (eb_pin_vma_inplace(eb, entry, ev)) {
>   			if (entry->offset != vma->node.start) {
>   				entry->offset = vma->node.start | UPDATE;
>   				eb->args->flags |= __EXEC_HAS_RELOC;
>   			}
>   		} else {
> -			eb_unreserve_vma(ev);
> -			list_add_tail(&ev->unbound_link, &unbound);
> +			/* Lightly sort user placed objects to the fore */
> +			if (ev->flags & EXEC_OBJECT_PINNED)
> +				list_add(&ev->unbound_link, &unbound);
> +			else
> +				list_add_tail(&ev->unbound_link, &unbound);
>   		}
>   	}
>   
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index fc8a083753bd..a00a026076e4 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -744,11 +744,13 @@ i915_vma_detach(struct i915_vma *vma)
>   	list_del(&vma->vm_link);
>   }
>   
> -static bool try_qad_pin(struct i915_vma *vma, unsigned int flags)
> +bool i915_vma_pin_inplace(struct i915_vma *vma, unsigned int flags)
>   {
>   	unsigned int bound;
>   	bool pinned = true;
>   
> +	GEM_BUG_ON(flags & ~I915_VMA_BIND_MASK);
> +
>   	bound = atomic_read(&vma->flags);
>   	do {
>   		if (unlikely(flags & ~bound))
> @@ -869,7 +871,7 @@ int i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
>   	GEM_BUG_ON(!(flags & (PIN_USER | PIN_GLOBAL)));
>   
>   	/* First try and grab the pin without rebinding the vma */
> -	if (try_qad_pin(vma, flags & I915_VMA_BIND_MASK))
> +	if (i915_vma_pin_inplace(vma, flags & I915_VMA_BIND_MASK))
>   		return 0;
>   
>   	err = vma_get_pages(vma);
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index d0d01f909548..03fea54fd573 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -236,6 +236,8 @@ static inline void i915_vma_unlock(struct i915_vma *vma)
>   	dma_resv_unlock(vma->resv);
>   }
>   
> +bool i915_vma_pin_inplace(struct i915_vma *vma, unsigned int flags);
> +
>   int __must_check
>   i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags);
>   int i915_ggtt_pin(struct i915_vma *vma, u32 align, unsigned int flags);
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Intel-gfx] [PATCH 07/23] drm/i915: Switch to object allocations for page directories
  2020-07-03  8:44   ` Tvrtko Ursulin
@ 2020-07-03  9:00     ` Chris Wilson
  2020-07-03  9:24       ` Tvrtko Ursulin
  0 siblings, 1 reply; 56+ messages in thread
From: Chris Wilson @ 2020-07-03  9:00 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-07-03 09:44:52)
> 
> On 02/07/2020 09:32, Chris Wilson wrote:
> > The GEM object is grossly overweight for the practicality of tracking
> > large numbers of individual pages, yet it is currently our only
> > abstraction for tracking DMA allocations. Since those allocations need
> > to be reserved upfront before an operation, and since we need to break
> > away from simple system memory, we need to ditch using plain struct page
> > wrappers.
> 
> [Calling all page table experts...] :)
> 
> So.. mostly 4k allocations via GEM objects? Sounds not ideal at first glance.
> 
> Reminder on why we need to break away from simple system memory?

The page tables are stored in device memory, which at the moment means
plain pages with dma mappings.

> Need to 
> have a list of GEM objects which can be locked in the ww locking phase? 

Yes, since we will need to be able to reserve all the device memory we
need for execution.

> But how do you allocate these objects up front, when allocation needs to 
> be under the ww lock in case evictions need to be triggered.

By preallocating enough objects to cover the page directories during
the reservation phase. The previous patch moved the allocations from the
point-of-use to before we insert the vma. Having made it the onus of the
caller to provide the page directories allocations, we can then do it
early on during the memory reservations.
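
Roughly, the shape I have in mind (sketch only, illustrative names, not
the exact interface):

struct i915_vm_pt_stash stash = {};
int err;

/* Count how many pt/pd objects the range will need and allocate them
 * all up front, before taking any critical locks. */
err = i915_vm_alloc_pt_stash(vm, &stash, vma->size);
if (err)
	return err;

/* The bind path then consumes preallocated entries from the stash
 * instead of allocating at point-of-use. */
vm->allocate_va_range(vm, &stash, vma->node.start, vma->size);

/* Return any unused spares. */
i915_vm_free_pt_stash(vm, &stash);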
-Chris

* Re: [Intel-gfx] [PATCH 11/23] drm/i915/gem: Remove the call for no-evict i915_vma_pin
  2020-07-03  8:59   ` Tvrtko Ursulin
@ 2020-07-03  9:23     ` Chris Wilson
  2020-07-06 14:43       ` Tvrtko Ursulin
  0 siblings, 1 reply; 56+ messages in thread
From: Chris Wilson @ 2020-07-03  9:23 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-07-03 09:59:01)
> 
> On 02/07/2020 09:32, Chris Wilson wrote:
> > Remove the stub i915_vma_pin() used for incrementally pining objects for
> > execbuf (under the severe restriction that they must not wait on a
> > resource as we may have already pinned it) and replace it with a
> > i915_vma_pin_inplace() that is only allowed to reclaim the currently
> > bound location for the vma (and will never wait for a pinned resource).
> 
> Hm I thought the point of the previous patch ("drm/i915/gem: Break apart 
> the early i915_vma_pin from execbuf object lookup") was to move the 
> pinning into a phase under the ww lock, where it will be allowed. I 
> misunderstood something?

Still different locks, and the vm->mutex is still being used for managing
the iova assignments.
-Chris

* Re: [Intel-gfx] [PATCH 07/23] drm/i915: Switch to object allocations for page directories
  2020-07-03  9:00     ` Chris Wilson
@ 2020-07-03  9:24       ` Tvrtko Ursulin
  2020-07-03  9:49         ` Chris Wilson
  0 siblings, 1 reply; 56+ messages in thread
From: Tvrtko Ursulin @ 2020-07-03  9:24 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/07/2020 10:00, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-07-03 09:44:52)
>>
>> On 02/07/2020 09:32, Chris Wilson wrote:
>>> The GEM object is grossly overweight for the practicality of tracking
>>> large numbers of individual pages, yet it is currently our only
>>> abstraction for tracking DMA allocations. Since those allocations need
>>> to be reserved upfront before an operation, and that we need to break
>>> away from simple system memory, we need to ditch using plain struct page
>>> wrappers.
>>
>> [Calling all page table experts...] :)
>>
>> So.. mostly 4k allocations via GEM objects? Sounds not ideal on first.

What is the relationship between object size and number of 4k objects 
needed for page tables?

>> Reminder on why we need to break away from simple system memory?
> 
> The page tables are stored in device memory, which at the moment are
> plain pages with dma mappings.
> 
>> Need to
>> have a list of GEM objects which can be locked in the ww locking phase?
> 
> Yes, since we will need to be able to reserve all the device memory we
> need for execution.
> 
>> But how do you allocate these objects up front, when allocation needs to
>> be under the ww lock in case evictions need to be triggered.
> 
> By preallocating enough objects to cover the page directories during
> the reservation phase. The previous patch moved the allocations from the
> point-of-use to before we insert the vma. Having made it the onus of the
> caller to provide the page directories allocations, we can then do it
> early on during the memory reservations.

Okay I missed the importance of the previous patch.

But preallocations have to be able to trigger evictions. Is the 
preallocating objects split then into creating objects and obtaining 
backing store? I do not see this in this patch, alloc_pt_dma both 
creates the object and pins the pages.

Regards,

Tvrtko


* Re: [Intel-gfx] [PATCH 07/23] drm/i915: Switch to object allocations for page directories
  2020-07-03  9:24       ` Tvrtko Ursulin
@ 2020-07-03  9:49         ` Chris Wilson
  2020-07-03 16:34           ` Tvrtko Ursulin
  0 siblings, 1 reply; 56+ messages in thread
From: Chris Wilson @ 2020-07-03  9:49 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-07-03 10:24:27)
> 
> On 03/07/2020 10:00, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-07-03 09:44:52)
> >>
> >> On 02/07/2020 09:32, Chris Wilson wrote:
> >>> The GEM object is grossly overweight for the practicality of tracking
> >>> large numbers of individual pages, yet it is currently our only
> >>> abstraction for tracking DMA allocations. Since those allocations need
> >>> to be reserved upfront before an operation, and that we need to break
> >>> away from simple system memory, we need to ditch using plain struct page
> >>> wrappers.
> >>
> >> [Calling all page table experts...] :)
> >>
> >> So.. mostly 4k allocations via GEM objects? Sounds not ideal on first.
> 
> What is the relationship between object size and number of 4k objects 
> needed for page tables?

1 pt (4KiB dma + small struct) per 2MiB + misalignment
1 pd (4KiB dma + ~4KiB struct) per 1GiB + misalignment
1 pd per 512GiB + misalignment
1 pd per 256TiB + misalignment
[top level is preallocated]

etc.
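
Back of the envelope, for example: a worst-case-misaligned 4GiB range is
2049 pt + 5 pd + 1 + 1 objects, i.e. ~2056 4KiB dma allocations, roughly
8MiB, or about 0.2% of the mapped size.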

> 
> >> Reminder on why we need to break away from simple system memory?
> > 
> > The page tables are stored in device memory, which at the moment are
> > plain pages with dma mappings.
> > 
> >> Need to
> >> have a list of GEM objects which can be locked in the ww locking phase?
> > 
> > Yes, since we will need to be able to reserve all the device memory we
> > need for execution.
> > 
> >> But how do you allocate these objects up front, when allocation needs to
> >> be under the ww lock in case evictions need to be triggered.
> > 
> > By preallocating enough objects to cover the page directories during
> > the reservation phase. The previous patch moved the allocations from the
> > point-of-use to before we insert the vma. Having made it the onus of the
> > caller to provide the page directories allocations, we can then do it
> > early on during the memory reservations.
> 
> Okay I missed the importance of the previous patch.
> 
> But preallocations have to be able to trigger evictions. Is the 
> preallocating objects split then into creating objects and obtaining 
> backing store? I do not see this in this patch, alloc_pt_dma both 
> creates the object and pins the pages.

Sure. It can be broken into two calls easily, or rather after having
allocated objects suitable for the page tables, they can then all be
reserved en masse with the rest of the objects. I was guilty of still
thinking in terms of system memory.
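
As a sketch (not what this patch does; alloc_pt_dma currently does both
steps), the split could look like:

static struct drm_i915_gem_object *
create_pt_dma(struct i915_address_space *vm, int sz)
{
	/* Phase 1: create the object only, no backing store yet. */
	return i915_gem_object_create_internal(vm->i915, sz);
}

static int reserve_pt_dma(struct drm_i915_gem_object *obj)
{
	int err;

	/* Phase 2: under the ww acquire, obtain the backing store;
	 * this is the step that may need to evict. */
	err = i915_gem_object_pin_pages(obj);
	if (err)
		return err;

	i915_gem_object_make_unshrinkable(obj);
	return 0;
}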

Worth keeping in mind is that the GGTT should never need extra
allocations, which should make a lot of the isolated object handling
easier.
aliasing-ppgtt) so that we don't need to allocate more objects during
critical phases.

My goal is to separate out the special cases for PIN_USER (i.e. execbuf)
where there are many, many objects and auxiliary allocations from the
special cases for the isolated PIN_GLOBAL, and from future special cases
for pageout; killing i915_vma_pin(PIN_USER).
-Chris

* Re: [Intel-gfx] [PATCH 05/23] drm/i915: Export ppgtt_bind_vma
  2020-07-02  8:32 ` [Intel-gfx] [PATCH 05/23] drm/i915: Export ppgtt_bind_vma Chris Wilson
@ 2020-07-03 10:09   ` Andi Shyti
  0 siblings, 0 replies; 56+ messages in thread
From: Andi Shyti @ 2020-07-03 10:09 UTC (permalink / raw)
  To: Chris Wilson, andi, andi.shyti; +Cc: intel-gfx

Hi Chris,

On Thu, Jul 02, 2020 at 09:32:07AM +0100, Chris Wilson wrote:
> Reuse the ppgtt_bind_vma() for aliasing_ppgtt_bind_vma() so we can
> reduce some code near-duplication. The catch is that we need to then
> pass along the i915_address_space and not rely on vma->vm, as they
> differ with the aliasing-ppgtt.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

very nice!

Reviewed-by: Andi Shyti <andi.shyti@intel.com>

Andi

* Re: [Intel-gfx] [PATCH 07/23] drm/i915: Switch to object allocations for page directories
  2020-07-03  9:49         ` Chris Wilson
@ 2020-07-03 16:34           ` Tvrtko Ursulin
  0 siblings, 0 replies; 56+ messages in thread
From: Tvrtko Ursulin @ 2020-07-03 16:34 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx



On 03/07/2020 10:49, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-07-03 10:24:27)
>>
>> On 03/07/2020 10:00, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2020-07-03 09:44:52)
>>>>
>>>> On 02/07/2020 09:32, Chris Wilson wrote:
>>>>> The GEM object is grossly overweight for the practicality of tracking
>>>>> large numbers of individual pages, yet it is currently our only
>>>>> abstraction for tracking DMA allocations. Since those allocations need
>>>>> to be reserved upfront before an operation, and that we need to break
>>>>> away from simple system memory, we need to ditch using plain struct page
>>>>> wrappers.
>>>>
>>>> [Calling all page table experts...] :)
>>>>
>>>> So.. mostly 4k allocations via GEM objects? Sounds not ideal on first.
>>
>> What is the relationship between object size and number of 4k objects
>> needed for page tables?
> 
> 1 pt (4KiB dma + small struct) per 2MiB + misalignment
> 1 pd (4KiB dma + ~4KiB struct) per 1GiB + misalignment
> 1 pd per 512GiB + misalignment
> 1 pd per 256TiB + misalignment
> [top level is preallocated]

Okay so not too much.

The advantage is that the direction seems right for making the page
table backing store in local memory take part in group ww locking
during reservation.

Although, strictly speaking, we could track any ww lock in the ww
context; it doesn't need to be the object's lock.

The disadvantage is increased system memory usage for GEM bo metadata.
Still, the route is open to replace this with some other (new) object,
as long as it provides a ww mutex.
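
E.g. something as small as this would do (hypothetical, just to
illustrate the point, not a proposal for the actual struct):

struct i915_pt_backing {
	struct ww_mutex lock;	/* joins the execbuf ww acquire context */
	struct sg_table *pages;	/* dma mapping of the 4KiB backing */
	u64 encode;		/* cached PDE encoding of the dma address */
};

as long as ww_mutex_lock(&pt->lock, ww_ctx) uses the same ww_class as
the object locks, so it can take part in the same acquire context.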

> etc.
> 
>>
>>>> Reminder on why we need to break away from simple system memory?
>>>
>>> The page tables are stored in device memory, which at the moment are
>>> plain pages with dma mappings.
>>>
>>>> Need to
>>>> have a list of GEM objects which can be locked in the ww locking phase?
>>>
>>> Yes, since we will need to be able to reserve all the device memory we
>>> need for execution.
>>>
>>>> But how do you allocate these objects up front, when allocation needs to
>>>> be under the ww lock in case evictions need to be triggered.
>>>
>>> By preallocating enough objects to cover the page directories during
>>> the reservation phase. The previous patch moved the allocations from the
>>> point-of-use to before we insert the vma. Having made it the onus of the
>>> caller to provide the page directories allocations, we can then do it
>>> early on during the memory reservations.
>>
>> Okay I missed the importance of the previous patch.
>>
>> But preallocations have to be able to trigger evictions. Is the
>> preallocating objects split then into creating objects and obtaining
>> backing store? I do not see this in this patch, alloc_pt_dma both
>> creates the object and pins the pages.
> 
> Sure. It can be broken into two calls easily, or rather after having
> allocated objects suitable for the page tables, they can then all be
> reserved en masse with the rest of the objects. I was guilty of still
> thinking in terms of system memory.

Yep, okay, I read this as: the respin will split the phases.

> Worth keeping in mind is that the GGTT should never need extra
> allocations, which should keep a lot of the isolated object handling
> easier. And some vm will have preallocated ranges (e.g. the
> aliasing-ppgtt) so that we don't need to allocate more objects during
> critical phases.
> 
> My goal is separate out the special cases for PIN_USER (i.e. execbuf)
> where there are many, many objects and auxiliary allocations from the
> special cases for the isolated PIN_GLOBAL, and from future special cases
> for pageout; killing i915_vma_pin(PIN_USER).

The PIN_USER part is clear; however, I am not sure why PIN_GLOBAL would
be exempt. There is always the case where the first submission against a
context needs to allocate stuff.

Regards,

Tvrtko

* Re: [Intel-gfx] [PATCH 07/23] drm/i915: Switch to object allocations for page directories
  2020-07-02  8:32 ` [Intel-gfx] [PATCH 07/23] drm/i915: Switch to object allocations for page directories Chris Wilson
  2020-07-03  8:44   ` Tvrtko Ursulin
@ 2020-07-03 16:36   ` Tvrtko Ursulin
  1 sibling, 0 replies; 56+ messages in thread
From: Tvrtko Ursulin @ 2020-07-03 16:36 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 02/07/2020 09:32, Chris Wilson wrote:
> The GEM object is grossly overweight for the practicality of tracking
> large numbers of individual pages, yet it is currently our only
> abstraction for tracking DMA allocations. Since those allocations need
> to be reserved upfront before an operation, and that we need to break
> away from simple system memory, we need to ditch using plain struct page
> wrappers.
> 
> In the process, we drop the WC mapping as we ended up clflushing
> everything anyway due to various issues across a wider range of
> platforms. Though in a future step, we need to drop the kmap_atomic
> approach which suggests we need to pre-map all the pages and keep them
> mapped.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   .../gpu/drm/i915/gem/i915_gem_object_types.h  |   1 +
>   .../gpu/drm/i915/gem/selftests/huge_pages.c   |   2 +-
>   .../drm/i915/gem/selftests/i915_gem_context.c |   2 +-
>   drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |  46 ++-
>   drivers/gpu/drm/i915/gt/gen6_ppgtt.h          |   1 +
>   drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  64 ++--
>   drivers/gpu/drm/i915/gt/intel_ggtt.c          |  31 +-
>   drivers/gpu/drm/i915/gt/intel_gtt.c           | 291 +++---------------
>   drivers/gpu/drm/i915/gt/intel_gtt.h           |  92 ++----
>   drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  25 +-
>   .../gpu/drm/i915/gt/intel_ring_submission.c   |  16 +-
>   drivers/gpu/drm/i915/gvt/scheduler.c          |  17 +-
>   drivers/gpu/drm/i915/i915_drv.c               |   1 +
>   drivers/gpu/drm/i915/i915_drv.h               |   5 -
>   drivers/gpu/drm/i915/selftests/mock_gtt.c     |   2 +
>   15 files changed, 183 insertions(+), 413 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index 5335f799b548..d0847d7896f9 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -282,6 +282,7 @@ struct drm_i915_gem_object {
>   		} userptr;
>   
>   		unsigned long scratch;
> +		u64 encode;

Name is not very descriptive and we should try to look for ways to share 
fields via unions as discussed on IRC.

Reading the rest of the patch in detail I'll leave for v2.

Regards,

Tvrtko

>   
>   		void *gvt_info;
>   	};
> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> index 8291ede6902c..9fb06fcc8f8f 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> @@ -393,7 +393,7 @@ static int igt_mock_exhaust_device_supported_pages(void *arg)
>   	 */
>   
>   	for (i = 1; i < BIT(ARRAY_SIZE(page_sizes)); i++) {
> -		unsigned int combination = 0;
> +		unsigned int combination = SZ_4K;
>   
>   		for (j = 0; j < ARRAY_SIZE(page_sizes); j++) {
>   			if (i & BIT(j))
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> index b81978890641..1308198543d8 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> @@ -1745,7 +1745,7 @@ static int check_scratch_page(struct i915_gem_context *ctx, u32 *out)
>   	if (!vm)
>   		return -ENODEV;
>   
> -	page = vm->scratch[0].base.page;
> +	page = __px_page(vm->scratch[0]);
>   	if (!page) {
>   		pr_err("No scratch page!\n");
>   		return -EINVAL;
> diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> index 35e2b698f9ed..226e404c706d 100644
> --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> @@ -16,8 +16,10 @@ static inline void gen6_write_pde(const struct gen6_ppgtt *ppgtt,
>   				  const unsigned int pde,
>   				  const struct i915_page_table *pt)
>   {
> +	dma_addr_t addr = pt ? px_dma(pt) : px_dma(ppgtt->base.vm.scratch[1]);
> +
>   	/* Caller needs to make sure the write completes if necessary */
> -	iowrite32(GEN6_PDE_ADDR_ENCODE(px_dma(pt)) | GEN6_PDE_VALID,
> +	iowrite32(GEN6_PDE_ADDR_ENCODE(addr) | GEN6_PDE_VALID,
>   		  ppgtt->pd_addr + pde);
>   }
>   
> @@ -79,7 +81,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
>   {
>   	struct gen6_ppgtt * const ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
>   	const unsigned int first_entry = start / I915_GTT_PAGE_SIZE;
> -	const gen6_pte_t scratch_pte = vm->scratch[0].encode;
> +	const gen6_pte_t scratch_pte = vm->scratch[0]->encode;
>   	unsigned int pde = first_entry / GEN6_PTES;
>   	unsigned int pte = first_entry % GEN6_PTES;
>   	unsigned int num_entries = length / I915_GTT_PAGE_SIZE;
> @@ -90,8 +92,6 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
>   		const unsigned int count = min(num_entries, GEN6_PTES - pte);
>   		gen6_pte_t *vaddr;
>   
> -		GEM_BUG_ON(px_base(pt) == px_base(&vm->scratch[1]));
> -
>   		num_entries -= count;
>   
>   		GEM_BUG_ON(count > atomic_read(&pt->used));
> @@ -127,7 +127,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>   	struct sgt_dma iter = sgt_dma(vma);
>   	gen6_pte_t *vaddr;
>   
> -	GEM_BUG_ON(pd->entry[act_pt] == &vm->scratch[1]);
> +	GEM_BUG_ON(!pd->entry[act_pt]);
>   
>   	vaddr = kmap_atomic_px(i915_pt_entry(pd, act_pt));
>   	do {
> @@ -194,16 +194,16 @@ static void gen6_alloc_va_range(struct i915_address_space *vm,
>   	gen6_for_each_pde(pt, pd, start, length, pde) {
>   		const unsigned int count = gen6_pte_count(start, length);
>   
> -		if (px_base(pt) == px_base(&vm->scratch[1])) {
> +		if (!pt) {
>   			spin_unlock(&pd->lock);
>   
>   			pt = stash->pt[0];
>   			GEM_BUG_ON(!pt);
>   
> -			fill32_px(pt, vm->scratch[0].encode);
> +			fill32_px(pt, vm->scratch[0]->encode);
>   
>   			spin_lock(&pd->lock);
> -			if (pd->entry[pde] == &vm->scratch[1]) {
> +			if (!pd->entry[pde]) {
>   				stash->pt[0] = pt->stash;
>   				atomic_set(&pt->used, 0);
>   				pd->entry[pde] = pt;
> @@ -225,24 +225,21 @@ static void gen6_alloc_va_range(struct i915_address_space *vm,
>   static int gen6_ppgtt_init_scratch(struct gen6_ppgtt *ppgtt)
>   {
>   	struct i915_address_space * const vm = &ppgtt->base.vm;
> -	struct i915_page_directory * const pd = ppgtt->base.pd;
>   	int ret;
>   
> -	ret = setup_scratch_page(vm, __GFP_HIGHMEM);
> +	ret = setup_scratch_page(vm);
>   	if (ret)
>   		return ret;
>   
> -	vm->scratch[0].encode =
> -		vm->pte_encode(px_dma(&vm->scratch[0]),
> +	vm->scratch[0]->encode =
> +		vm->pte_encode(px_dma(vm->scratch[0]),
>   			       I915_CACHE_NONE, PTE_READ_ONLY);
>   
> -	if (unlikely(setup_page_dma(vm, px_base(&vm->scratch[1])))) {
> -		cleanup_scratch_page(vm);
> -		return -ENOMEM;
> -	}
> +	vm->scratch[1] = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
> +	if (IS_ERR(vm->scratch[1]))
> +		return PTR_ERR(vm->scratch[1]);
>   
> -	fill32_px(&vm->scratch[1], vm->scratch[0].encode);
> -	memset_p(pd->entry, &vm->scratch[1], I915_PDES);
> +	fill32_px(vm->scratch[1], vm->scratch[0]->encode);
>   
>   	return 0;
>   }
> @@ -250,13 +247,11 @@ static int gen6_ppgtt_init_scratch(struct gen6_ppgtt *ppgtt)
>   static void gen6_ppgtt_free_pd(struct gen6_ppgtt *ppgtt)
>   {
>   	struct i915_page_directory * const pd = ppgtt->base.pd;
> -	struct i915_page_dma * const scratch =
> -		px_base(&ppgtt->base.vm.scratch[1]);
>   	struct i915_page_table *pt;
>   	u32 pde;
>   
>   	gen6_for_all_pdes(pt, pd, pde)
> -		if (px_base(pt) != scratch)
> +		if (pt)
>   			free_px(&ppgtt->base.vm, pt);
>   }
>   
> @@ -297,7 +292,7 @@ static void pd_vma_bind(struct i915_address_space *vm,
>   	struct gen6_ppgtt *ppgtt = vma->private;
>   	u32 ggtt_offset = i915_ggtt_offset(vma) / I915_GTT_PAGE_SIZE;
>   
> -	px_base(ppgtt->base.pd)->ggtt_offset = ggtt_offset * sizeof(gen6_pte_t);
> +	ppgtt->pp_dir = ggtt_offset * sizeof(gen6_pte_t) << 10;
>   	ppgtt->pd_addr = (gen6_pte_t __iomem *)ggtt->gsm + ggtt_offset;
>   
>   	gen6_flush_pd(ppgtt, 0, ppgtt->base.vm.total);
> @@ -307,8 +302,6 @@ static void pd_vma_unbind(struct i915_address_space *vm, struct i915_vma *vma)
>   {
>   	struct gen6_ppgtt *ppgtt = vma->private;
>   	struct i915_page_directory * const pd = ppgtt->base.pd;
> -	struct i915_page_dma * const scratch =
> -		px_base(&ppgtt->base.vm.scratch[1]);
>   	struct i915_page_table *pt;
>   	unsigned int pde;
>   
> @@ -317,11 +310,11 @@ static void pd_vma_unbind(struct i915_address_space *vm, struct i915_vma *vma)
>   
>   	/* Free all no longer used page tables */
>   	gen6_for_all_pdes(pt, ppgtt->base.pd, pde) {
> -		if (px_base(pt) == scratch || atomic_read(&pt->used))
> +		if (!pt || atomic_read(&pt->used))
>   			continue;
>   
>   		free_px(&ppgtt->base.vm, pt);
> -		pd->entry[pde] = scratch;
> +		pd->entry[pde] = NULL;
>   	}
>   
>   	ppgtt->scan_for_unused_pt = false;
> @@ -441,6 +434,7 @@ struct i915_ppgtt *gen6_ppgtt_create(struct intel_gt *gt)
>   	ppgtt->base.vm.insert_entries = gen6_ppgtt_insert_entries;
>   	ppgtt->base.vm.cleanup = gen6_ppgtt_cleanup;
>   
> +	ppgtt->base.vm.alloc_pt_dma = alloc_pt_dma;
>   	ppgtt->base.vm.pte_encode = ggtt->vm.pte_encode;
>   
>   	ppgtt->base.pd = __alloc_pd(sizeof(*ppgtt->base.pd));
> diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.h b/drivers/gpu/drm/i915/gt/gen6_ppgtt.h
> index 72e481806c96..7249672e5802 100644
> --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.h
> +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.h
> @@ -14,6 +14,7 @@ struct gen6_ppgtt {
>   	struct mutex flush;
>   	struct i915_vma *vma;
>   	gen6_pte_t __iomem *pd_addr;
> +	u32 pp_dir;
>   
>   	atomic_t pin_count;
>   	struct mutex pin_mutex;
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index e6f2acd445dd..d3f27beaac03 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -199,7 +199,7 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
>   			      struct i915_page_directory * const pd,
>   			      u64 start, const u64 end, int lvl)
>   {
> -	const struct i915_page_scratch * const scratch = &vm->scratch[lvl];
> +	const struct drm_i915_gem_object * const scratch = vm->scratch[lvl];
>   	unsigned int idx, len;
>   
>   	GEM_BUG_ON(end > vm->total >> GEN8_PTE_SHIFT);
> @@ -239,7 +239,7 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
>   
>   			vaddr = kmap_atomic_px(pt);
>   			memset64(vaddr + gen8_pd_index(start, 0),
> -				 vm->scratch[0].encode,
> +				 vm->scratch[0]->encode,
>   				 count);
>   			kunmap_atomic(vaddr);
>   
> @@ -301,7 +301,7 @@ static void __gen8_ppgtt_alloc(struct i915_address_space * const vm,
>   			if (lvl ||
>   			    gen8_pt_count(*start, end) < I915_PDES ||
>   			    intel_vgpu_active(vm->i915))
> -				fill_px(pt, vm->scratch[lvl].encode);
> +				fill_px(pt, vm->scratch[lvl]->encode);
>   
>   			spin_lock(&pd->lock);
>   			if (likely(!pd->entry[idx])) {
> @@ -356,16 +356,6 @@ static void gen8_ppgtt_alloc(struct i915_address_space *vm,
>   			   &start, start + length, vm->top);
>   }
>   
> -static __always_inline void
> -write_pte(gen8_pte_t *pte, const gen8_pte_t val)
> -{
> -	/* Magic delays? Or can we refine these to flush all in one pass? */
> -	*pte = val;
> -	wmb(); /* cpu to cache */
> -	clflush(pte); /* cache to memory */
> -	wmb(); /* visible to all */
> -}
> -
>   static __always_inline u64
>   gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
>   		      struct i915_page_directory *pdp,
> @@ -382,8 +372,7 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
>   	vaddr = kmap_atomic_px(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
>   	do {
>   		GEM_BUG_ON(iter->sg->length < I915_GTT_PAGE_SIZE);
> -		write_pte(&vaddr[gen8_pd_index(idx, 0)],
> -			  pte_encode | iter->dma);
> +		vaddr[gen8_pd_index(idx, 0)] = pte_encode | iter->dma;
>   
>   		iter->dma += I915_GTT_PAGE_SIZE;
>   		if (iter->dma >= iter->max) {
> @@ -406,10 +395,12 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
>   				pd = pdp->entry[gen8_pd_index(idx, 2)];
>   			}
>   
> +			clflush_cache_range(vaddr, PAGE_SIZE);
>   			kunmap_atomic(vaddr);
>   			vaddr = kmap_atomic_px(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
>   		}
>   	} while (1);
> +	clflush_cache_range(vaddr, PAGE_SIZE);
>   	kunmap_atomic(vaddr);
>   
>   	return idx;
> @@ -465,7 +456,7 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
>   
>   		do {
>   			GEM_BUG_ON(iter->sg->length < page_size);
> -			write_pte(&vaddr[index++], encode | iter->dma);
> +			vaddr[index++] = encode | iter->dma;
>   
>   			start += page_size;
>   			iter->dma += page_size;
> @@ -490,6 +481,7 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
>   			}
>   		} while (rem >= page_size && index < I915_PDES);
>   
> +		clflush_cache_range(vaddr, PAGE_SIZE);
>   		kunmap_atomic(vaddr);
>   
>   		/*
> @@ -521,7 +513,7 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
>   			if (I915_SELFTEST_ONLY(vma->vm->scrub_64K)) {
>   				u16 i;
>   
> -				encode = vma->vm->scratch[0].encode;
> +				encode = vma->vm->scratch[0]->encode;
>   				vaddr = kmap_atomic_px(i915_pt_entry(pd, maybe_64K));
>   
>   				for (i = 1; i < index; i += 16)
> @@ -575,27 +567,31 @@ static int gen8_init_scratch(struct i915_address_space *vm)
>   		GEM_BUG_ON(!clone->has_read_only);
>   
>   		vm->scratch_order = clone->scratch_order;
> -		memcpy(vm->scratch, clone->scratch, sizeof(vm->scratch));
> -		px_dma(&vm->scratch[0]) = 0; /* no xfer of ownership */
> +		for (i = 0; i <= vm->top; i++)
> +			vm->scratch[i] = i915_gem_object_get(clone->scratch[i]);
> +
>   		return 0;
>   	}
>   
> -	ret = setup_scratch_page(vm, __GFP_HIGHMEM);
> +	ret = setup_scratch_page(vm);
>   	if (ret)
>   		return ret;
>   
> -	vm->scratch[0].encode =
> -		gen8_pte_encode(px_dma(&vm->scratch[0]),
> +	vm->scratch[0]->encode =
> +		gen8_pte_encode(px_dma(vm->scratch[0]),
>   				I915_CACHE_LLC, vm->has_read_only);
>   
>   	for (i = 1; i <= vm->top; i++) {
> -		if (unlikely(setup_page_dma(vm, px_base(&vm->scratch[i]))))
> +		struct drm_i915_gem_object *obj;
> +
> +		obj = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
> +		if (IS_ERR(obj))
>   			goto free_scratch;
>   
> -		fill_px(&vm->scratch[i], vm->scratch[i - 1].encode);
> -		vm->scratch[i].encode =
> -			gen8_pde_encode(px_dma(&vm->scratch[i]),
> -					I915_CACHE_LLC);
> +		fill_px(obj, vm->scratch[i - 1]->encode);
> +		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_LLC);
> +
> +		vm->scratch[i] = obj;
>   	}
>   
>   	return 0;
> @@ -621,7 +617,7 @@ static int gen8_preallocate_top_level_pdp(struct i915_ppgtt *ppgtt)
>   		if (IS_ERR(pde))
>   			return PTR_ERR(pde);
>   
> -		fill_px(pde, vm->scratch[1].encode);
> +		fill_px(pde, vm->scratch[1]->encode);
>   		set_pd_entry(pd, idx, pde);
>   		atomic_inc(px_used(pde)); /* keep pinned */
>   	}
> @@ -642,12 +638,13 @@ gen8_alloc_top_pd(struct i915_address_space *vm)
>   	if (unlikely(!pd))
>   		return ERR_PTR(-ENOMEM);
>   
> -	if (unlikely(setup_page_dma(vm, px_base(pd)))) {
> +	pd->pt.base = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
> +	if (IS_ERR(pd->pt.base)) {
>   		kfree(pd);
>   		return ERR_PTR(-ENOMEM);
>   	}
>   
> -	fill_page_dma(px_base(pd), vm->scratch[vm->top].encode, count);
> +	fill_page_dma(px_base(pd), vm->scratch[vm->top]->encode, count);
>   	atomic_inc(px_used(pd)); /* mark as pinned */
>   	return pd;
>   }
> @@ -681,12 +678,7 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt)
>   	 */
>   	ppgtt->vm.has_read_only = !IS_GEN_RANGE(gt->i915, 11, 12);
>   
> -	/*
> -	 * There are only few exceptions for gen >=6. chv and bxt.
> -	 * And we are not sure about the latter so play safe for now.
> -	 */
> -	if (IS_CHERRYVIEW(gt->i915) || IS_BROXTON(gt->i915))
> -		ppgtt->vm.pt_kmap_wc = true;
> +	ppgtt->vm.alloc_pt_dma = alloc_pt_dma;
>   
>   	err = gen8_init_scratch(&ppgtt->vm);
>   	if (err)
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index 791e4070ef31..9db27a2e5f36 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -78,8 +78,6 @@ int i915_ggtt_init_hw(struct drm_i915_private *i915)
>   {
>   	int ret;
>   
> -	stash_init(&i915->mm.wc_stash);
> -
>   	/*
>   	 * Note that we use page colouring to enforce a guard page at the
>   	 * end of the address space. This is required as the CS may prefetch
> @@ -232,7 +230,7 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
>   
>   	/* Fill the allocated but "unused" space beyond the end of the buffer */
>   	while (gte < end)
> -		gen8_set_pte(gte++, vm->scratch[0].encode);
> +		gen8_set_pte(gte++, vm->scratch[0]->encode);
>   
>   	/*
>   	 * We want to flush the TLBs only after we're certain all the PTE
> @@ -283,7 +281,7 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
>   
>   	/* Fill the allocated but "unused" space beyond the end of the buffer */
>   	while (gte < end)
> -		iowrite32(vm->scratch[0].encode, gte++);
> +		iowrite32(vm->scratch[0]->encode, gte++);
>   
>   	/*
>   	 * We want to flush the TLBs only after we're certain all the PTE
> @@ -303,7 +301,7 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
>   	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
>   	unsigned int first_entry = start / I915_GTT_PAGE_SIZE;
>   	unsigned int num_entries = length / I915_GTT_PAGE_SIZE;
> -	const gen8_pte_t scratch_pte = vm->scratch[0].encode;
> +	const gen8_pte_t scratch_pte = vm->scratch[0]->encode;
>   	gen8_pte_t __iomem *gtt_base =
>   		(gen8_pte_t __iomem *)ggtt->gsm + first_entry;
>   	const int max_entries = ggtt_total_entries(ggtt) - first_entry;
> @@ -401,7 +399,7 @@ static void gen6_ggtt_clear_range(struct i915_address_space *vm,
>   		 first_entry, num_entries, max_entries))
>   		num_entries = max_entries;
>   
> -	scratch_pte = vm->scratch[0].encode;
> +	scratch_pte = vm->scratch[0]->encode;
>   	for (i = 0; i < num_entries; i++)
>   		iowrite32(scratch_pte, &gtt_base[i]);
>   }
> @@ -712,18 +710,11 @@ static void ggtt_cleanup_hw(struct i915_ggtt *ggtt)
>   void i915_ggtt_driver_release(struct drm_i915_private *i915)
>   {
>   	struct i915_ggtt *ggtt = &i915->ggtt;
> -	struct pagevec *pvec;
>   
>   	fini_aliasing_ppgtt(ggtt);
>   
>   	intel_ggtt_fini_fences(ggtt);
>   	ggtt_cleanup_hw(ggtt);
> -
> -	pvec = &i915->mm.wc_stash.pvec;
> -	if (pvec->nr) {
> -		set_pages_array_wb(pvec->pages, pvec->nr);
> -		__pagevec_release(pvec);
> -	}
>   }
>   
>   static unsigned int gen6_get_total_gtt_size(u16 snb_gmch_ctl)
> @@ -786,7 +777,7 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size)
>   		return -ENOMEM;
>   	}
>   
> -	ret = setup_scratch_page(&ggtt->vm, GFP_DMA32);
> +	ret = setup_scratch_page(&ggtt->vm);
>   	if (ret) {
>   		drm_err(&i915->drm, "Scratch setup failed\n");
>   		/* iounmap will also get called at remove, but meh */
> @@ -794,8 +785,8 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size)
>   		return ret;
>   	}
>   
> -	ggtt->vm.scratch[0].encode =
> -		ggtt->vm.pte_encode(px_dma(&ggtt->vm.scratch[0]),
> +	ggtt->vm.scratch[0]->encode =
> +		ggtt->vm.pte_encode(px_dma(ggtt->vm.scratch[0]),
>   				    I915_CACHE_NONE, 0);
>   
>   	return 0;
> @@ -821,7 +812,7 @@ static void gen6_gmch_remove(struct i915_address_space *vm)
>   	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
>   
>   	iounmap(ggtt->gsm);
> -	cleanup_scratch_page(vm);
> +	free_scratch(vm);
>   }
>   
>   static struct resource pci_resource(struct pci_dev *pdev, int bar)
> @@ -849,6 +840,8 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
>   	else
>   		size = gen8_get_total_gtt_size(snb_gmch_ctl);
>   
> +	ggtt->vm.alloc_pt_dma = alloc_pt_dma;
> +
>   	ggtt->vm.total = (size / sizeof(gen8_pte_t)) * I915_GTT_PAGE_SIZE;
>   	ggtt->vm.cleanup = gen6_gmch_remove;
>   	ggtt->vm.insert_page = gen8_ggtt_insert_page;
> @@ -997,6 +990,8 @@ static int gen6_gmch_probe(struct i915_ggtt *ggtt)
>   	size = gen6_get_total_gtt_size(snb_gmch_ctl);
>   	ggtt->vm.total = (size / sizeof(gen6_pte_t)) * I915_GTT_PAGE_SIZE;
>   
> +	ggtt->vm.alloc_pt_dma = alloc_pt_dma;
> +
>   	ggtt->vm.clear_range = nop_clear_range;
>   	if (!HAS_FULL_PPGTT(i915) || intel_scanout_needs_vtd_wa(i915))
>   		ggtt->vm.clear_range = gen6_ggtt_clear_range;
> @@ -1047,6 +1042,8 @@ static int i915_gmch_probe(struct i915_ggtt *ggtt)
>   	ggtt->gmadr =
>   		(struct resource)DEFINE_RES_MEM(gmadr_base, ggtt->mappable_end);
>   
> +	ggtt->vm.alloc_pt_dma = alloc_pt_dma;
> +
>   	ggtt->do_idle_maps = needs_idle_maps(i915);
>   	ggtt->vm.insert_page = i915_ggtt_insert_page;
>   	ggtt->vm.insert_entries = i915_ggtt_insert_entries;
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index 2a72cce63fd9..e0cc90942848 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -11,160 +11,24 @@
>   #include "intel_gt.h"
>   #include "intel_gtt.h"
>   
> -void stash_init(struct pagestash *stash)
> +struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz)
>   {
> -	pagevec_init(&stash->pvec);
> -	spin_lock_init(&stash->lock);
> -}
> -
> -static struct page *stash_pop_page(struct pagestash *stash)
> -{
> -	struct page *page = NULL;
> -
> -	spin_lock(&stash->lock);
> -	if (likely(stash->pvec.nr))
> -		page = stash->pvec.pages[--stash->pvec.nr];
> -	spin_unlock(&stash->lock);
> -
> -	return page;
> -}
> -
> -static void stash_push_pagevec(struct pagestash *stash, struct pagevec *pvec)
> -{
> -	unsigned int nr;
> -
> -	spin_lock_nested(&stash->lock, SINGLE_DEPTH_NESTING);
> -
> -	nr = min_t(typeof(nr), pvec->nr, pagevec_space(&stash->pvec));
> -	memcpy(stash->pvec.pages + stash->pvec.nr,
> -	       pvec->pages + pvec->nr - nr,
> -	       sizeof(pvec->pages[0]) * nr);
> -	stash->pvec.nr += nr;
> -
> -	spin_unlock(&stash->lock);
> -
> -	pvec->nr -= nr;
> -}
> -
> -static struct page *vm_alloc_page(struct i915_address_space *vm, gfp_t gfp)
> -{
> -	struct pagevec stack;
> -	struct page *page;
> -
> -	if (I915_SELFTEST_ONLY(should_fail(&vm->fault_attr, 1)))
> -		i915_gem_shrink_all(vm->i915);
> -
> -	page = stash_pop_page(&vm->free_pages);
> -	if (page)
> -		return page;
> -
> -	if (!vm->pt_kmap_wc)
> -		return alloc_page(gfp);
> -
> -	/* Look in our global stash of WC pages... */
> -	page = stash_pop_page(&vm->i915->mm.wc_stash);
> -	if (page)
> -		return page;
> +	struct drm_i915_gem_object *obj;
> +	int err;
>   
> -	/*
> -	 * Otherwise batch allocate pages to amortize cost of set_pages_wc.
> -	 *
> -	 * We have to be careful as page allocation may trigger the shrinker
> -	 * (via direct reclaim) which will fill up the WC stash underneath us.
> -	 * So we add our WB pages into a temporary pvec on the stack and merge
> -	 * them into the WC stash after all the allocations are complete.
> -	 */
> -	pagevec_init(&stack);
> -	do {
> -		struct page *page;
> -
> -		page = alloc_page(gfp);
> -		if (unlikely(!page))
> -			break;
> -
> -		stack.pages[stack.nr++] = page;
> -	} while (pagevec_space(&stack));
> -
> -	if (stack.nr && !set_pages_array_wc(stack.pages, stack.nr)) {
> -		page = stack.pages[--stack.nr];
> -
> -		/* Merge spare WC pages to the global stash */
> -		if (stack.nr)
> -			stash_push_pagevec(&vm->i915->mm.wc_stash, &stack);
> -
> -		/* Push any surplus WC pages onto the local VM stash */
> -		if (stack.nr)
> -			stash_push_pagevec(&vm->free_pages, &stack);
> -	}
> -
> -	/* Return unwanted leftovers */
> -	if (unlikely(stack.nr)) {
> -		WARN_ON_ONCE(set_pages_array_wb(stack.pages, stack.nr));
> -		__pagevec_release(&stack);
> -	}
> -
> -	return page;
> -}
> -
> -static void vm_free_pages_release(struct i915_address_space *vm,
> -				  bool immediate)
> -{
> -	struct pagevec *pvec = &vm->free_pages.pvec;
> -	struct pagevec stack;
> -
> -	lockdep_assert_held(&vm->free_pages.lock);
> -	GEM_BUG_ON(!pagevec_count(pvec));
> -
> -	if (vm->pt_kmap_wc) {
> -		/*
> -		 * When we use WC, first fill up the global stash and then
> -		 * only if full immediately free the overflow.
> -		 */
> -		stash_push_pagevec(&vm->i915->mm.wc_stash, pvec);
> +	obj = i915_gem_object_create_internal(vm->i915, sz);
> +	if (IS_ERR(obj))
> +		return obj;
>   
> -		/*
> -		 * As we have made some room in the VM's free_pages,
> -		 * we can wait for it to fill again. Unless we are
> -		 * inside i915_address_space_fini() and must
> -		 * immediately release the pages!
> -		 */
> -		if (pvec->nr <= (immediate ? 0 : PAGEVEC_SIZE - 1))
> -			return;
> -
> -		/*
> -		 * We have to drop the lock to allow ourselves to sleep,
> -		 * so take a copy of the pvec and clear the stash for
> -		 * others to use it as we sleep.
> -		 */
> -		stack = *pvec;
> -		pagevec_reinit(pvec);
> -		spin_unlock(&vm->free_pages.lock);
> -
> -		pvec = &stack;
> -		set_pages_array_wb(pvec->pages, pvec->nr);
> -
> -		spin_lock(&vm->free_pages.lock);
> +	err = i915_gem_object_pin_pages(obj);
> +	if (err) {
> +		i915_gem_object_put(obj);
> +		return ERR_PTR(err);
>   	}
>   
> -	__pagevec_release(pvec);
> -}
> +	i915_gem_object_make_unshrinkable(obj);
>   
> -static void vm_free_page(struct i915_address_space *vm, struct page *page)
> -{
> -	/*
> -	 * On !llc, we need to change the pages back to WB. We only do so
> -	 * in bulk, so we rarely need to change the page attributes here,
> -	 * but doing so requires a stop_machine() from deep inside arch/x86/mm.
> -	 * To make detection of the possible sleep more likely, use an
> -	 * unconditional might_sleep() for everybody.
> -	 */
> -	might_sleep();
> -	spin_lock(&vm->free_pages.lock);
> -	while (!pagevec_space(&vm->free_pages.pvec))
> -		vm_free_pages_release(vm, false);
> -	GEM_BUG_ON(pagevec_count(&vm->free_pages.pvec) >= PAGEVEC_SIZE);
> -	pagevec_add(&vm->free_pages.pvec, page);
> -	spin_unlock(&vm->free_pages.lock);
> +	return obj;
>   }
>   
>   void __i915_vm_close(struct i915_address_space *vm)
> @@ -194,14 +58,7 @@ void __i915_vm_close(struct i915_address_space *vm)
>   
>   void i915_address_space_fini(struct i915_address_space *vm)
>   {
> -	spin_lock(&vm->free_pages.lock);
> -	if (pagevec_count(&vm->free_pages.pvec))
> -		vm_free_pages_release(vm, true);
> -	GEM_BUG_ON(pagevec_count(&vm->free_pages.pvec));
> -	spin_unlock(&vm->free_pages.lock);
> -
>   	drm_mm_takedown(&vm->mm);
> -
>   	mutex_destroy(&vm->mutex);
>   }
>   
> @@ -246,8 +103,6 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
>   	drm_mm_init(&vm->mm, 0, vm->total);
>   	vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
>   
> -	stash_init(&vm->free_pages);
> -
>   	INIT_LIST_HEAD(&vm->bound_list);
>   }
>   
> @@ -264,64 +119,47 @@ void clear_pages(struct i915_vma *vma)
>   	memset(&vma->page_sizes, 0, sizeof(vma->page_sizes));
>   }
>   
> -static int __setup_page_dma(struct i915_address_space *vm,
> -			    struct i915_page_dma *p,
> -			    gfp_t gfp)
> -{
> -	p->page = vm_alloc_page(vm, gfp | I915_GFP_ALLOW_FAIL);
> -	if (unlikely(!p->page))
> -		return -ENOMEM;
> -
> -	p->daddr = dma_map_page_attrs(vm->dma,
> -				      p->page, 0, PAGE_SIZE,
> -				      PCI_DMA_BIDIRECTIONAL,
> -				      DMA_ATTR_SKIP_CPU_SYNC |
> -				      DMA_ATTR_NO_WARN);
> -	if (unlikely(dma_mapping_error(vm->dma, p->daddr))) {
> -		vm_free_page(vm, p->page);
> -		return -ENOMEM;
> -	}
> -
> -	return 0;
> -}
> -
> -int setup_page_dma(struct i915_address_space *vm, struct i915_page_dma *p)
> +dma_addr_t __px_dma(struct drm_i915_gem_object *p)
>   {
> -	return __setup_page_dma(vm, p, __GFP_HIGHMEM);
> +	GEM_BUG_ON(!i915_gem_object_has_pages(p));
> +	return sg_dma_address(p->mm.pages->sgl);
>   }
>   
> -void cleanup_page_dma(struct i915_address_space *vm, struct i915_page_dma *p)
> +struct page *__px_page(struct drm_i915_gem_object *p)
>   {
> -	dma_unmap_page(vm->dma, p->daddr, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> -	vm_free_page(vm, p->page);
> +	GEM_BUG_ON(!i915_gem_object_has_pages(p));
> +	return sg_page(p->mm.pages->sgl);
>   }
>   
>   void
> -fill_page_dma(const struct i915_page_dma *p, const u64 val, unsigned int count)
> +fill_page_dma(struct drm_i915_gem_object *p, const u64 val, unsigned int count)
>   {
> -	kunmap_atomic(memset64(kmap_atomic(p->page), val, count));
> +	struct page *page = __px_page(p);
> +	void *vaddr;
> +
> +	vaddr = kmap(page);
> +	memset64(vaddr, val, count);
> +	kunmap(page);
>   }
>   
> -static void poison_scratch_page(struct page *page, unsigned long size)
> +static void poison_scratch_page(struct drm_i915_gem_object *scratch)
>   {
> +	struct sgt_iter sgt;
> +	struct page *page;
> +
>   	if (!IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
>   		return;
>   
> -	GEM_BUG_ON(!IS_ALIGNED(size, PAGE_SIZE));
> -
> -	do {
> +	for_each_sgt_page(page, sgt, scratch->mm.pages) {
>   		void *vaddr;
>   
>   		vaddr = kmap(page);
>   		memset(vaddr, POISON_FREE, PAGE_SIZE);
>   		kunmap(page);
> -
> -		page = pfn_to_page(page_to_pfn(page) + 1);
> -		size -= PAGE_SIZE;
> -	} while (size);
> +	}
>   }
>   
> -int setup_scratch_page(struct i915_address_space *vm, gfp_t gfp)
> +int setup_scratch_page(struct i915_address_space *vm)
>   {
>   	unsigned long size;
>   
> @@ -338,21 +176,19 @@ int setup_scratch_page(struct i915_address_space *vm, gfp_t gfp)
>   	 */
>   	size = I915_GTT_PAGE_SIZE_4K;
>   	if (i915_vm_is_4lvl(vm) &&
> -	    HAS_PAGE_SIZES(vm->i915, I915_GTT_PAGE_SIZE_64K)) {
> +	    HAS_PAGE_SIZES(vm->i915, I915_GTT_PAGE_SIZE_64K))
>   		size = I915_GTT_PAGE_SIZE_64K;
> -		gfp |= __GFP_NOWARN;
> -	}
> -	gfp |= __GFP_ZERO | __GFP_RETRY_MAYFAIL;
>   
>   	do {
> -		unsigned int order = get_order(size);
> -		struct page *page;
> -		dma_addr_t addr;
> +		struct drm_i915_gem_object *obj;
>   
> -		page = alloc_pages(gfp, order);
> -		if (unlikely(!page))
> +		obj = vm->alloc_pt_dma(vm, size);
> +		if (IS_ERR(obj))
>   			goto skip;
>   
> +		if (obj->mm.page_sizes.sg < size)
> +			goto skip_obj;
> +
>   		/*
>   		 * Use a non-zero scratch page for debugging.
>   		 *
> @@ -362,61 +198,28 @@ int setup_scratch_page(struct i915_address_space *vm, gfp_t gfp)
>   		 * should it ever be accidentally used, the effect should be
>   		 * fairly benign.
>   		 */
> -		poison_scratch_page(page, size);
> -
> -		addr = dma_map_page_attrs(vm->dma,
> -					  page, 0, size,
> -					  PCI_DMA_BIDIRECTIONAL,
> -					  DMA_ATTR_SKIP_CPU_SYNC |
> -					  DMA_ATTR_NO_WARN);
> -		if (unlikely(dma_mapping_error(vm->dma, addr)))
> -			goto free_page;
> -
> -		if (unlikely(!IS_ALIGNED(addr, size)))
> -			goto unmap_page;
> -
> -		vm->scratch[0].base.page = page;
> -		vm->scratch[0].base.daddr = addr;
> -		vm->scratch_order = order;
> +		poison_scratch_page(obj);
> +
> +		vm->scratch[0] = obj;
> +		vm->scratch_order = get_order(size);
>   		return 0;
>   
> -unmap_page:
> -		dma_unmap_page(vm->dma, addr, size, PCI_DMA_BIDIRECTIONAL);
> -free_page:
> -		__free_pages(page, order);
> +skip_obj:
> +		i915_gem_object_put(obj);
>   skip:
>   		if (size == I915_GTT_PAGE_SIZE_4K)
>   			return -ENOMEM;
>   
>   		size = I915_GTT_PAGE_SIZE_4K;
> -		gfp &= ~__GFP_NOWARN;
>   	} while (1);
>   }
>   
> -void cleanup_scratch_page(struct i915_address_space *vm)
> -{
> -	struct i915_page_dma *p = px_base(&vm->scratch[0]);
> -	unsigned int order = vm->scratch_order;
> -
> -	dma_unmap_page(vm->dma, p->daddr, BIT(order) << PAGE_SHIFT,
> -		       PCI_DMA_BIDIRECTIONAL);
> -	__free_pages(p->page, order);
> -}
> -
>   void free_scratch(struct i915_address_space *vm)
>   {
>   	int i;
>   
> -	if (!px_dma(&vm->scratch[0])) /* set to 0 on clones */
> -		return;
> -
> -	for (i = 1; i <= vm->top; i++) {
> -		if (!px_dma(&vm->scratch[i]))
> -			break;
> -		cleanup_page_dma(vm, px_base(&vm->scratch[i]));
> -	}
> -
> -	cleanup_scratch_page(vm);
> +	for (i = 0; i <= vm->top; i++)
> +		i915_gem_object_put(vm->scratch[i]);
>   }
>   
>   void gtt_write_workarounds(struct intel_gt *gt)
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index 8bd462d2fcd9..57b31b36285f 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -134,31 +134,19 @@ typedef u64 gen8_pte_t;
>   #define GEN8_PDE_IPS_64K BIT(11)
>   #define GEN8_PDE_PS_2M   BIT(7)
>   
> +enum i915_cache_level;
> +
> +struct drm_i915_file_private;
> +struct drm_i915_gem_object;
>   struct i915_fence_reg;
> +struct i915_vma;
> +struct intel_gt;
>   
>   #define for_each_sgt_daddr(__dp, __iter, __sgt) \
>   	__for_each_sgt_daddr(__dp, __iter, __sgt, I915_GTT_PAGE_SIZE)
>   
> -struct i915_page_dma {
> -	struct page *page;
> -	union {
> -		dma_addr_t daddr;
> -
> -		/*
> -		 * For gen6/gen7 only. This is the offset in the GGTT
> -		 * where the page directory entries for PPGTT begin
> -		 */
> -		u32 ggtt_offset;
> -	};
> -};
> -
> -struct i915_page_scratch {
> -	struct i915_page_dma base;
> -	u64 encode;
> -};
> -
>   struct i915_page_table {
> -	struct i915_page_dma base;
> +	struct drm_i915_gem_object *base;
>   	union {
>   		atomic_t used;
>   		struct i915_page_table *stash;
> @@ -179,12 +167,14 @@ struct i915_page_directory {
>   	other)
>   
>   #define px_base(px) \
> -	__px_choose_expr(px, struct i915_page_dma *, __x, \
> -	__px_choose_expr(px, struct i915_page_scratch *, &__x->base, \
> -	__px_choose_expr(px, struct i915_page_table *, &__x->base, \
> -	__px_choose_expr(px, struct i915_page_directory *, &__x->pt.base, \
> -	(void)0))))
> -#define px_dma(px) (px_base(px)->daddr)
> +	__px_choose_expr(px, struct drm_i915_gem_object *, __x, \
> +	__px_choose_expr(px, struct i915_page_table *, __x->base, \
> +	__px_choose_expr(px, struct i915_page_directory *, __x->pt.base, \
> +	(void)0)))
> +
> +struct page *__px_page(struct drm_i915_gem_object *p);
> +dma_addr_t __px_dma(struct drm_i915_gem_object *p);
> +#define px_dma(px) (__px_dma(px_base(px)))
>   
>   #define px_pt(px) \
>   	__px_choose_expr(px, struct i915_page_table *, __x, \
> @@ -192,13 +182,6 @@ struct i915_page_directory {
>   	(void)0))
>   #define px_used(px) (&px_pt(px)->used)
>   
> -enum i915_cache_level;
> -
> -struct drm_i915_file_private;
> -struct drm_i915_gem_object;
> -struct i915_vma;
> -struct intel_gt;
> -
>   struct i915_vm_pt_stash {
>   	/* preallocated chains of page tables/directories */
>   	struct i915_page_table *pt[2];
> @@ -222,13 +205,6 @@ struct i915_vma_ops {
>   	void (*clear_pages)(struct i915_vma *vma);
>   };
>   
> -struct pagestash {
> -	spinlock_t lock;
> -	struct pagevec pvec;
> -};
> -
> -void stash_init(struct pagestash *stash);
> -
>   struct i915_address_space {
>   	struct kref ref;
>   	struct rcu_work rcu;
> @@ -265,7 +241,7 @@ struct i915_address_space {
>   #define VM_CLASS_GGTT 0
>   #define VM_CLASS_PPGTT 1
>   
> -	struct i915_page_scratch scratch[4];
> +	struct drm_i915_gem_object *scratch[4];
>   	unsigned int scratch_order;
>   	unsigned int top;
>   
> @@ -274,17 +250,15 @@ struct i915_address_space {
>   	 */
>   	struct list_head bound_list;
>   
> -	struct pagestash free_pages;
> -
>   	/* Global GTT */
>   	bool is_ggtt:1;
>   
> -	/* Some systems require uncached updates of the page directories */
> -	bool pt_kmap_wc:1;
> -
>   	/* Some systems support read-only mappings for GGTT and/or PPGTT */
>   	bool has_read_only:1;
>   
> +	struct drm_i915_gem_object *
> +		(*alloc_pt_dma)(struct i915_address_space *vm, int sz);
> +
>   	u64 (*pte_encode)(dma_addr_t addr,
>   			  enum i915_cache_level level,
>   			  u32 flags); /* Create a valid PTE */
> @@ -500,9 +474,9 @@ i915_pd_entry(const struct i915_page_directory * const pdp,
>   static inline dma_addr_t
>   i915_page_dir_dma_addr(const struct i915_ppgtt *ppgtt, const unsigned int n)
>   {
> -	struct i915_page_dma *pt = ppgtt->pd->entry[n];
> +	struct i915_page_table *pt = ppgtt->pd->entry[n];
>   
> -	return px_dma(pt ?: px_base(&ppgtt->vm.scratch[ppgtt->vm.top]));
> +	return __px_dma(pt ? px_base(pt) : ppgtt->vm.scratch[ppgtt->vm.top]);
>   }
>   
>   void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt);
> @@ -527,13 +501,10 @@ struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt);
>   void i915_ggtt_suspend(struct i915_ggtt *gtt);
>   void i915_ggtt_resume(struct i915_ggtt *ggtt);
>   
> -int setup_page_dma(struct i915_address_space *vm, struct i915_page_dma *p);
> -void cleanup_page_dma(struct i915_address_space *vm, struct i915_page_dma *p);
> -
> -#define kmap_atomic_px(px) kmap_atomic(px_base(px)->page)
> +#define kmap_atomic_px(px) kmap_atomic(__px_page(px_base(px)))
>   
>   void
> -fill_page_dma(const struct i915_page_dma *p, const u64 val, unsigned int count);
> +fill_page_dma(struct drm_i915_gem_object *p, const u64 val, unsigned int count);
>   
>   #define fill_px(px, v) fill_page_dma(px_base(px), (v), PAGE_SIZE / sizeof(u64))
>   #define fill32_px(px, v) do {						\
> @@ -541,37 +512,36 @@ fill_page_dma(const struct i915_page_dma *p, const u64 val, unsigned int count);
>   	fill_px((px), v__ << 32 | v__);					\
>   } while (0)
>   
> -int setup_scratch_page(struct i915_address_space *vm, gfp_t gfp);
> -void cleanup_scratch_page(struct i915_address_space *vm);
> +int setup_scratch_page(struct i915_address_space *vm);
>   void free_scratch(struct i915_address_space *vm);
>   
> +struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz);
>   struct i915_page_table *alloc_pt(struct i915_address_space *vm);
>   struct i915_page_directory *alloc_pd(struct i915_address_space *vm);
>   struct i915_page_directory *__alloc_pd(size_t sz);
>   
> -void free_pd(struct i915_address_space *vm, struct i915_page_dma *pd);
> -
> -#define free_px(vm, px) free_pd(vm, px_base(px))
> +void free_pt(struct i915_address_space *vm, struct i915_page_table *pt);
> +#define free_px(vm, px) free_pt(vm, px_pt(px))
>   
>   void
>   __set_pd_entry(struct i915_page_directory * const pd,
>   	       const unsigned short idx,
> -	       struct i915_page_dma * const to,
> +	       struct i915_page_table *pt,
>   	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level));
>   
>   #define set_pd_entry(pd, idx, to) \
> -	__set_pd_entry((pd), (idx), px_base(to), gen8_pde_encode)
> +	__set_pd_entry((pd), (idx), px_pt(to), gen8_pde_encode)
>   
>   void
>   clear_pd_entry(struct i915_page_directory * const pd,
>   	       const unsigned short idx,
> -	       const struct i915_page_scratch * const scratch);
> +	       const struct drm_i915_gem_object * const scratch);
>   
>   bool
>   release_pd_entry(struct i915_page_directory * const pd,
>   		 const unsigned short idx,
>   		 struct i915_page_table * const pt,
> -		 const struct i915_page_scratch * const scratch);
> +		 const struct drm_i915_gem_object * const scratch);
>   void gen6_ggtt_invalidate(struct i915_ggtt *ggtt);
>   
>   int ggtt_set_pages(struct i915_vma *vma);
> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> index 9633fd2d294d..94bd969ebffd 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> @@ -18,7 +18,8 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm)
>   	if (unlikely(!pt))
>   		return ERR_PTR(-ENOMEM);
>   
> -	if (unlikely(setup_page_dma(vm, &pt->base))) {
> +	pt->base = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
> +	if (IS_ERR(pt->base)) {
>   		kfree(pt);
>   		return ERR_PTR(-ENOMEM);
>   	}
> @@ -47,7 +48,8 @@ struct i915_page_directory *alloc_pd(struct i915_address_space *vm)
>   	if (unlikely(!pd))
>   		return ERR_PTR(-ENOMEM);
>   
> -	if (unlikely(setup_page_dma(vm, px_base(pd)))) {
> +	pd->pt.base = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
> +	if (IS_ERR(pd->pt.base)) {
>   		kfree(pd);
>   		return ERR_PTR(-ENOMEM);
>   	}
> @@ -55,27 +57,28 @@ struct i915_page_directory *alloc_pd(struct i915_address_space *vm)
>   	return pd;
>   }
>   
> -void free_pd(struct i915_address_space *vm, struct i915_page_dma *pd)
> +void free_pt(struct i915_address_space *vm, struct i915_page_table *pt)
>   {
> -	cleanup_page_dma(vm, pd);
> -	kfree(pd);
> +	i915_gem_object_put(pt->base);
> +	kfree(pt);
>   }
>   
>   static inline void
> -write_dma_entry(struct i915_page_dma * const pdma,
> +write_dma_entry(struct drm_i915_gem_object * const pdma,
>   		const unsigned short idx,
>   		const u64 encoded_entry)
>   {
> -	u64 * const vaddr = kmap_atomic(pdma->page);
> +	u64 * const vaddr = kmap_atomic(__px_page(pdma));
>   
>   	vaddr[idx] = encoded_entry;
> +	clflush_cache_range(&vaddr[idx], sizeof(u64));
>   	kunmap_atomic(vaddr);
>   }
>   
>   void
>   __set_pd_entry(struct i915_page_directory * const pd,
>   	       const unsigned short idx,
> -	       struct i915_page_dma * const to,
> +	       struct i915_page_table * const to,
>   	       u64 (*encode)(const dma_addr_t, const enum i915_cache_level))
>   {
>   	/* Each thread pre-pins the pd, and we may have a thread per pde. */
> @@ -83,13 +86,13 @@ __set_pd_entry(struct i915_page_directory * const pd,
>   
>   	atomic_inc(px_used(pd));
>   	pd->entry[idx] = to;
> -	write_dma_entry(px_base(pd), idx, encode(to->daddr, I915_CACHE_LLC));
> +	write_dma_entry(px_base(pd), idx, encode(px_dma(to), I915_CACHE_LLC));
>   }
>   
>   void
>   clear_pd_entry(struct i915_page_directory * const pd,
>   	       const unsigned short idx,
> -	       const struct i915_page_scratch * const scratch)
> +	       const struct drm_i915_gem_object * const scratch)
>   {
>   	GEM_BUG_ON(atomic_read(px_used(pd)) == 0);
>   
> @@ -102,7 +105,7 @@ bool
>   release_pd_entry(struct i915_page_directory * const pd,
>   		 const unsigned short idx,
>   		 struct i915_page_table * const pt,
> -		 const struct i915_page_scratch * const scratch)
> +		 const struct drm_i915_gem_object * const scratch)
>   {
>   	bool free = false;
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> index 68a08486fc87..f1f27b7fc746 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> @@ -201,16 +201,18 @@ static struct i915_address_space *vm_alias(struct i915_address_space *vm)
>   	return vm;
>   }
>   
> +static u32 pp_dir(struct i915_address_space *vm)
> +{
> +	return to_gen6_ppgtt(i915_vm_to_ppgtt(vm))->pp_dir;
> +}
> +
>   static void set_pp_dir(struct intel_engine_cs *engine)
>   {
>   	struct i915_address_space *vm = vm_alias(engine->gt->vm);
>   
>   	if (vm) {
> -		struct i915_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
> -
>   		ENGINE_WRITE(engine, RING_PP_DIR_DCLV, PP_DIR_DCLV_2G);
> -		ENGINE_WRITE(engine, RING_PP_DIR_BASE,
> -			     px_base(ppgtt->pd)->ggtt_offset << 10);
> +		ENGINE_WRITE(engine, RING_PP_DIR_BASE, pp_dir(vm));
>   	}
>   }
>   
> @@ -608,7 +610,7 @@ static const struct intel_context_ops ring_context_ops = {
>   };
>   
>   static int load_pd_dir(struct i915_request *rq,
> -		       const struct i915_ppgtt *ppgtt,
> +		       struct i915_address_space *vm,
>   		       u32 valid)
>   {
>   	const struct intel_engine_cs * const engine = rq->engine;
> @@ -624,7 +626,7 @@ static int load_pd_dir(struct i915_request *rq,
>   
>   	*cs++ = MI_LOAD_REGISTER_IMM(1);
>   	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine->mmio_base));
> -	*cs++ = px_base(ppgtt->pd)->ggtt_offset << 10;
> +	*cs++ = pp_dir(vm);
>   
>   	/* Stall until the page table load is complete? */
>   	*cs++ = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT;
> @@ -826,7 +828,7 @@ static int switch_mm(struct i915_request *rq, struct i915_address_space *vm)
>   	 * post-sync op, this extra pass appears vital before a
>   	 * mm switch!
>   	 */
> -	ret = load_pd_dir(rq, i915_vm_to_ppgtt(vm), PP_DIR_DCLV_2G);
> +	ret = load_pd_dir(rq, vm, PP_DIR_DCLV_2G);
>   	if (ret)
>   		return ret;
>   
> diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
> index 3c3b9842bbbd..1570eb8aa978 100644
> --- a/drivers/gpu/drm/i915/gvt/scheduler.c
> +++ b/drivers/gpu/drm/i915/gvt/scheduler.c
> @@ -403,6 +403,14 @@ static void release_shadow_wa_ctx(struct intel_shadow_wa_ctx *wa_ctx)
>   	wa_ctx->indirect_ctx.shadow_va = NULL;
>   }
>   
> +static void set_dma_address(struct i915_page_directory *pd, dma_addr_t addr)
> +{
> +	struct scatterlist *sg = pd->pt.base->mm.pages->sgl;
> +
> +	/* This is not a good idea */
> +	sg->dma_address = addr;
> +}
> +
>   static void set_context_ppgtt_from_shadow(struct intel_vgpu_workload *workload,
>   					  struct intel_context *ce)
>   {
> @@ -411,7 +419,7 @@ static void set_context_ppgtt_from_shadow(struct intel_vgpu_workload *workload,
>   	int i = 0;
>   
>   	if (mm->ppgtt_mm.root_entry_type == GTT_TYPE_PPGTT_ROOT_L4_ENTRY) {
> -		px_dma(ppgtt->pd) = mm->ppgtt_mm.shadow_pdps[0];
> +		set_dma_address(ppgtt->pd, mm->ppgtt_mm.shadow_pdps[0]);
>   	} else {
>   		for (i = 0; i < GVT_RING_CTX_NR_PDPS; i++) {
>   			struct i915_page_directory * const pd =
> @@ -421,7 +429,8 @@ static void set_context_ppgtt_from_shadow(struct intel_vgpu_workload *workload,
>   			   shadow ppgtt. */
>   			if (!pd)
>   				break;
> -			px_dma(pd) = mm->ppgtt_mm.shadow_pdps[i];
> +
> +			set_dma_address(pd, mm->ppgtt_mm.shadow_pdps[i]);
>   		}
>   	}
>   }
> @@ -1240,13 +1249,13 @@ i915_context_ppgtt_root_restore(struct intel_vgpu_submission *s,
>   	int i;
>   
>   	if (i915_vm_is_4lvl(&ppgtt->vm)) {
> -		px_dma(ppgtt->pd) = s->i915_context_pml4;
> +		set_dma_address(ppgtt->pd, s->i915_context_pml4);
>   	} else {
>   		for (i = 0; i < GEN8_3LVL_PDPES; i++) {
>   			struct i915_page_directory * const pd =
>   				i915_pd_entry(ppgtt->pd, i);
>   
> -			px_dma(pd) = s->i915_context_pdps[i];
> +			set_dma_address(pd, s->i915_context_pdps[i]);
>   		}
>   	}
>   }
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 67102dc26fce..ea281d7b0630 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1080,6 +1080,7 @@ static void i915_driver_release(struct drm_device *dev)
>   
>   	intel_memory_regions_driver_release(dev_priv);
>   	i915_ggtt_driver_release(dev_priv);
> +	i915_gem_drain_freed_objects(dev_priv);
>   
>   	i915_driver_mmio_release(dev_priv);
>   
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 6e9072ab30a1..100c2029798f 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -590,11 +590,6 @@ struct i915_gem_mm {
>   	 */
>   	atomic_t free_count;
>   
> -	/**
> -	 * Small stash of WC pages
> -	 */
> -	struct pagestash wc_stash;
> -
>   	/**
>   	 * tmpfs instance used for shmem backed objects
>   	 */
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gtt.c b/drivers/gpu/drm/i915/selftests/mock_gtt.c
> index 5e4fb0fba34b..63a29211652e 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gtt.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gtt.c
> @@ -78,6 +78,8 @@ struct i915_ppgtt *mock_ppgtt(struct drm_i915_private *i915, const char *name)
>   
>   	i915_address_space_init(&ppgtt->vm, VM_CLASS_PPGTT);
>   
> +	ppgtt->vm.alloc_pt_dma = alloc_pt_dma;
> +
>   	ppgtt->vm.clear_range = mock_clear_range;
>   	ppgtt->vm.insert_page = mock_insert_page;
>   	ppgtt->vm.insert_entries = mock_insert_entries;
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Intel-gfx] [PATCH 06/23] drm/i915: Preallocate stashes for vma page-directories
  2020-07-02  8:32 ` [Intel-gfx] [PATCH 06/23] drm/i915: Preallocate stashes for vma page-directories Chris Wilson
@ 2020-07-03 16:47   ` Tvrtko Ursulin
  0 siblings, 0 replies; 56+ messages in thread
From: Tvrtko Ursulin @ 2020-07-03 16:47 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 02/07/2020 09:32, Chris Wilson wrote:
> We need the DMA allocations used for page directories to be performed
> up front so that we can include those allocations in our memory
> reservation pass. The downside is that we have to assume the worst
> case, even before we know the final layout, and always allocate enough
> page directories for this object, even when there will be overlap.
> 
> It should be noted that the lifetime of the page directories' DMA is
> more or less decoupled from individual fences, as they will be shared
> across objects and across timelines.

Why specifically are you pointing this out?

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   .../gpu/drm/i915/gem/i915_gem_client_blt.c    | 11 +--
>   drivers/gpu/drm/i915/gt/gen6_ppgtt.c          | 38 +++------
>   drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 77 +++++-------------
>   drivers/gpu/drm/i915/gt/intel_ggtt.c          | 45 +++++------
>   drivers/gpu/drm/i915/gt/intel_gtt.h           | 39 ++++++---
>   drivers/gpu/drm/i915/gt/intel_ppgtt.c         | 80 ++++++++++++++++---
>   drivers/gpu/drm/i915/i915_vma.c               | 29 ++++---
>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 60 ++++++++------
>   drivers/gpu/drm/i915/selftests/mock_gtt.c     | 22 ++---
>   9 files changed, 224 insertions(+), 177 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
> index 278664f831e7..947c8aa8e13e 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
> @@ -32,12 +32,13 @@ static void vma_clear_pages(struct i915_vma *vma)
>   	vma->pages = NULL;
>   }
>   
> -static int vma_bind(struct i915_address_space *vm,
> -		    struct i915_vma *vma,
> -		    enum i915_cache_level cache_level,
> -		    u32 flags)
> +static void vma_bind(struct i915_address_space *vm,
> +		     struct i915_vm_pt_stash *stash,
> +		     struct i915_vma *vma,
> +		     enum i915_cache_level cache_level,
> +		     u32 flags)
>   {
> -	return vm->vma_ops.bind_vma(vm, vma, cache_level, flags);
> +	vm->vma_ops.bind_vma(vm, stash, vma, cache_level, flags);
>   }
>   
>   static void vma_unbind(struct i915_address_space *vm, struct i915_vma *vma)
> diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> index 05497b50103f..35e2b698f9ed 100644
> --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> @@ -177,16 +177,16 @@ static void gen6_flush_pd(struct gen6_ppgtt *ppgtt, u64 start, u64 end)
>   	mutex_unlock(&ppgtt->flush);
>   }
>   
> -static int gen6_alloc_va_range(struct i915_address_space *vm,
> -			       u64 start, u64 length)
> +static void gen6_alloc_va_range(struct i915_address_space *vm,
> +				struct i915_vm_pt_stash *stash,
> +				u64 start, u64 length)
>   {
>   	struct gen6_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
>   	struct i915_page_directory * const pd = ppgtt->base.pd;
> -	struct i915_page_table *pt, *alloc = NULL;
> +	struct i915_page_table *pt;
>   	intel_wakeref_t wakeref;
>   	u64 from = start;
>   	unsigned int pde;
> -	int ret = 0;
>   
>   	wakeref = intel_runtime_pm_get(&vm->i915->runtime_pm);
>   
> @@ -197,21 +197,17 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
>   		if (px_base(pt) == px_base(&vm->scratch[1])) {
>   			spin_unlock(&pd->lock);
>   
> -			pt = fetch_and_zero(&alloc);
> -			if (!pt)
> -				pt = alloc_pt(vm);
> -			if (IS_ERR(pt)) {
> -				ret = PTR_ERR(pt);
> -				goto unwind_out;
> -			}
> +			pt = stash->pt[0];
> +			GEM_BUG_ON(!pt);
>   
>   			fill32_px(pt, vm->scratch[0].encode);
>   
>   			spin_lock(&pd->lock);
>   			if (pd->entry[pde] == &vm->scratch[1]) {
> +				stash->pt[0] = pt->stash;
> +				atomic_set(&pt->used, 0);
>   				pd->entry[pde] = pt;
>   			} else {
> -				alloc = pt;
>   				pt = pd->entry[pde];
>   			}
>   		}
> @@ -223,15 +219,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
>   	if (i915_vma_is_bound(ppgtt->vma, I915_VMA_GLOBAL_BIND))
>   		gen6_flush_pd(ppgtt, from, start);
>   
> -	goto out;
> -
> -unwind_out:
> -	gen6_ppgtt_clear_range(vm, from, start - from);
> -out:
> -	if (alloc)
> -		free_px(vm, alloc);
>   	intel_runtime_pm_put(&vm->i915->runtime_pm, wakeref);
> -	return ret;
>   }
>   
>   static int gen6_ppgtt_init_scratch(struct gen6_ppgtt *ppgtt)
> @@ -299,10 +287,11 @@ static void pd_vma_clear_pages(struct i915_vma *vma)
>   	vma->pages = NULL;
>   }
>   
> -static int pd_vma_bind(struct i915_address_space *vm,
> -		       struct i915_vma *vma,
> -		       enum i915_cache_level cache_level,
> -		       u32 unused)
> +static void pd_vma_bind(struct i915_address_space *vm,
> +			struct i915_vm_pt_stash *stash,
> +			struct i915_vma *vma,
> +			enum i915_cache_level cache_level,
> +			u32 unused)
>   {
>   	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
>   	struct gen6_ppgtt *ppgtt = vma->private;
> @@ -312,7 +301,6 @@ static int pd_vma_bind(struct i915_address_space *vm,
>   	ppgtt->pd_addr = (gen6_pte_t __iomem *)ggtt->gsm + ggtt_offset;
>   
>   	gen6_flush_pd(ppgtt, 0, ppgtt->base.vm.total);
> -	return 0;
>   }
>   
>   static void pd_vma_unbind(struct i915_address_space *vm, struct i915_vma *vma)
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index 699125928272..e6f2acd445dd 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -269,14 +269,12 @@ static void gen8_ppgtt_clear(struct i915_address_space *vm,
>   			   start, start + length, vm->top);
>   }
>   
> -static int __gen8_ppgtt_alloc(struct i915_address_space * const vm,
> -			      struct i915_page_directory * const pd,
> -			      u64 * const start, const u64 end, int lvl)
> +static void __gen8_ppgtt_alloc(struct i915_address_space * const vm,
> +			       struct i915_vm_pt_stash *stash,
> +			       struct i915_page_directory * const pd,
> +			       u64 * const start, const u64 end, int lvl)
>   {
> -	const struct i915_page_scratch * const scratch = &vm->scratch[lvl];
> -	struct i915_page_table *alloc = NULL;
>   	unsigned int idx, len;
> -	int ret = 0;
>   
>   	GEM_BUG_ON(end > vm->total >> GEN8_PTE_SHIFT);
>   
> @@ -297,49 +295,30 @@ static int __gen8_ppgtt_alloc(struct i915_address_space * const vm,
>   			DBG("%s(%p):{ lvl:%d, idx:%d } allocating new tree\n",
>   			    __func__, vm, lvl + 1, idx);
>   
> -			pt = fetch_and_zero(&alloc);
> -			if (lvl) {
> -				if (!pt) {
> -					pt = &alloc_pd(vm)->pt;
> -					if (IS_ERR(pt)) {
> -						ret = PTR_ERR(pt);
> -						goto out;
> -					}
> -				}
> +			pt = stash->pt[!!lvl];
> +			GEM_BUG_ON(!pt);
>   
> +			if (lvl ||
> +			    gen8_pt_count(*start, end) < I915_PDES ||
> +			    intel_vgpu_active(vm->i915))
>   				fill_px(pt, vm->scratch[lvl].encode);
> -			} else {
> -				if (!pt) {
> -					pt = alloc_pt(vm);
> -					if (IS_ERR(pt)) {
> -						ret = PTR_ERR(pt);
> -						goto out;
> -					}
> -				}
> -
> -				if (intel_vgpu_active(vm->i915) ||
> -				    gen8_pt_count(*start, end) < I915_PDES)
> -					fill_px(pt, vm->scratch[lvl].encode);
> -			}
>   
>   			spin_lock(&pd->lock);
> -			if (likely(!pd->entry[idx]))
> +			if (likely(!pd->entry[idx])) {
> +				stash->pt[!!lvl] = pt->stash;
> +				atomic_set(&pt->used, 0);
>   				set_pd_entry(pd, idx, pt);
> -			else
> -				alloc = pt, pt = pd->entry[idx];
> +			} else {
> +				pt = pd->entry[idx];
> +			}
>   		}
>   
>   		if (lvl) {
>   			atomic_inc(&pt->used);
>   			spin_unlock(&pd->lock);
>   
> -			ret = __gen8_ppgtt_alloc(vm, as_pd(pt),
> -						 start, end, lvl);
> -			if (unlikely(ret)) {
> -				if (release_pd_entry(pd, idx, pt, scratch))
> -					free_px(vm, pt);
> -				goto out;
> -			}
> +			__gen8_ppgtt_alloc(vm, stash,
> +					   as_pd(pt), start, end, lvl);
>   
>   			spin_lock(&pd->lock);
>   			atomic_dec(&pt->used);
> @@ -359,18 +338,12 @@ static int __gen8_ppgtt_alloc(struct i915_address_space * const vm,
>   		}
>   	} while (idx++, --len);
>   	spin_unlock(&pd->lock);
> -out:
> -	if (alloc)
> -		free_px(vm, alloc);
> -	return ret;
>   }
>   
> -static int gen8_ppgtt_alloc(struct i915_address_space *vm,
> -			    u64 start, u64 length)
> +static void gen8_ppgtt_alloc(struct i915_address_space *vm,
> +			     struct i915_vm_pt_stash *stash,
> +			     u64 start, u64 length)
>   {
> -	u64 from;
> -	int err;
> -
>   	GEM_BUG_ON(!IS_ALIGNED(start, BIT_ULL(GEN8_PTE_SHIFT)));
>   	GEM_BUG_ON(!IS_ALIGNED(length, BIT_ULL(GEN8_PTE_SHIFT)));
>   	GEM_BUG_ON(range_overflows(start, length, vm->total));
> @@ -378,15 +351,9 @@ static int gen8_ppgtt_alloc(struct i915_address_space *vm,
>   	start >>= GEN8_PTE_SHIFT;
>   	length >>= GEN8_PTE_SHIFT;
>   	GEM_BUG_ON(length == 0);
> -	from = start;
> -
> -	err = __gen8_ppgtt_alloc(vm, i915_vm_to_ppgtt(vm)->pd,
> -				 &start, start + length, vm->top);
> -	if (unlikely(err && from != start))
> -		__gen8_ppgtt_clear(vm, i915_vm_to_ppgtt(vm)->pd,
> -				   from, start, vm->top);
>   
> -	return err;
> +	__gen8_ppgtt_alloc(vm, stash, i915_vm_to_ppgtt(vm)->pd,
> +			   &start, start + length, vm->top);
>   }
>   
>   static __always_inline void
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index 62979ea591f0..791e4070ef31 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -436,16 +436,17 @@ static void i915_ggtt_clear_range(struct i915_address_space *vm,
>   	intel_gtt_clear_range(start >> PAGE_SHIFT, length >> PAGE_SHIFT);
>   }
>   
> -static int ggtt_bind_vma(struct i915_address_space *vm,
> -			 struct i915_vma *vma,
> -			 enum i915_cache_level cache_level,
> -			 u32 flags)
> +static void ggtt_bind_vma(struct i915_address_space *vm,
> +			  struct i915_vm_pt_stash *stash,
> +			  struct i915_vma *vma,
> +			  enum i915_cache_level cache_level,
> +			  u32 flags)
>   {
>   	struct drm_i915_gem_object *obj = vma->obj;
>   	u32 pte_flags;
>   
>   	if (i915_vma_is_bound(vma, ~flags & I915_VMA_BIND_MASK))
> -		return 0;
> +		return;
>   
>   	/* Applicable to VLV (gen8+ do not support RO in the GGTT) */
>   	pte_flags = 0;
> @@ -454,8 +455,6 @@ static int ggtt_bind_vma(struct i915_address_space *vm,
>   
>   	vm->insert_entries(vm, vma, cache_level, pte_flags);
>   	vma->page_sizes.gtt = I915_GTT_PAGE_SIZE;
> -
> -	return 0;
>   }
>   
>   static void ggtt_unbind_vma(struct i915_address_space *vm, struct i915_vma *vma)
> @@ -568,31 +567,25 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>   	return ret;
>   }
>   
> -static int aliasing_gtt_bind_vma(struct i915_address_space *vm,
> -				 struct i915_vma *vma,
> -				 enum i915_cache_level cache_level,
> -				 u32 flags)
> +static void aliasing_gtt_bind_vma(struct i915_address_space *vm,
> +				  struct i915_vm_pt_stash *stash,
> +				  struct i915_vma *vma,
> +				  enum i915_cache_level cache_level,
> +				  u32 flags)
>   {
>   	u32 pte_flags;
> -	int ret;
>   
>   	/* Currently applicable only to VLV */
>   	pte_flags = 0;
>   	if (i915_gem_object_is_readonly(vma->obj))
>   		pte_flags |= PTE_READ_ONLY;
>   
> -	if (flags & I915_VMA_LOCAL_BIND) {
> -		struct i915_ppgtt *alias = i915_vm_to_ggtt(vm)->alias;
> -
> -		ret = ppgtt_bind_vma(&alias->vm, vma, cache_level, flags);
> -		if (ret)
> -			return ret;
> -	}
> +	if (flags & I915_VMA_LOCAL_BIND)
> +		ppgtt_bind_vma(&i915_vm_to_ggtt(vm)->alias->vm,
> +			       stash, vma, cache_level, flags);
>   
>   	if (flags & I915_VMA_GLOBAL_BIND)
>   		vm->insert_entries(vm, vma, cache_level, pte_flags);
> -
> -	return 0;
>   }
>   
>   static void aliasing_gtt_unbind_vma(struct i915_address_space *vm,
> @@ -607,6 +600,7 @@ static void aliasing_gtt_unbind_vma(struct i915_address_space *vm,
>   
>   static int init_aliasing_ppgtt(struct i915_ggtt *ggtt)
>   {
> +	struct i915_vm_pt_stash stash = {};
>   	struct i915_ppgtt *ppgtt;
>   	int err;
>   
> @@ -619,15 +613,17 @@ static int init_aliasing_ppgtt(struct i915_ggtt *ggtt)
>   		goto err_ppgtt;
>   	}
>   
> +	err = i915_vm_alloc_pt_stash(&ppgtt->vm, &stash, ggtt->vm.total);
> +	if (err)
> +		goto err_ppgtt;
> +
>   	/*
>   	 * Note we only pre-allocate as far as the end of the global
>   	 * GTT. On 48b / 4-level page-tables, the difference is very,
>   	 * very significant! We have to preallocate as GVT/vgpu does
>   	 * not like the page directory disappearing.
>   	 */
> -	err = ppgtt->vm.allocate_va_range(&ppgtt->vm, 0, ggtt->vm.total);
> -	if (err)
> -		goto err_ppgtt;
> +	ppgtt->vm.allocate_va_range(&ppgtt->vm, &stash, 0, ggtt->vm.total);
>   
>   	ggtt->alias = ppgtt;
>   	ggtt->vm.bind_async_flags |= ppgtt->vm.bind_async_flags;
> @@ -638,6 +634,7 @@ static int init_aliasing_ppgtt(struct i915_ggtt *ggtt)
>   	GEM_BUG_ON(ggtt->vm.vma_ops.unbind_vma != ggtt_unbind_vma);
>   	ggtt->vm.vma_ops.unbind_vma = aliasing_gtt_unbind_vma;
>   
> +	i915_vm_free_pt_stash(&ppgtt->vm, &stash);
>   	return 0;
>   
>   err_ppgtt:
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index f2b75078e05f..8bd462d2fcd9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -159,7 +159,10 @@ struct i915_page_scratch {
>   
>   struct i915_page_table {
>   	struct i915_page_dma base;
> -	atomic_t used;
> +	union {
> +		atomic_t used;
> +		struct i915_page_table *stash;

What is it for?

> +	};
>   };
>   
>   struct i915_page_directory {
> @@ -196,12 +199,18 @@ struct drm_i915_gem_object;
>   struct i915_vma;
>   struct intel_gt;
>   
> +struct i915_vm_pt_stash {
> +	/* preallocated chains of page tables/directories */
> +	struct i915_page_table *pt[2];

How does the chain work, so I don't have to reverse engineer it from the 
code?
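
(For what it's worth, my reading of the hunks in this patch is that the
stash is a pair of singly-linked LIFO lists, one of page tables and one of
page directories, threaded through the 'stash' member that aliases 'used'
in the union above. A condensed sketch of the push/pop, lifted from the
hunks below rather than a definitive description:

    /* alloc phase: push each preallocated table onto the stash */
    pt->stash = stash->pt[0];   /* pt[0]: tables, pt[1]: directories */
    stash->pt[0] = pt;

    /* bind phase: pop one (under pd->lock) when a PDE needs populating */
    pt = stash->pt[0];
    stash->pt[0] = pt->stash;
    atomic_set(&pt->used, 0);   /* switch the union back to 'used' */

i.e. i915_vm_alloc_pt_stash() pushes the worst-case number of tables,
allocate_va_range() pops only what it actually installs, and
i915_vm_free_pt_stash() frees whatever is left over.)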

> +};
> +
>   struct i915_vma_ops {
>   	/* Map an object into an address space with the given cache flags. */
> -	int (*bind_vma)(struct i915_address_space *vm,
> -			struct i915_vma *vma,
> -			enum i915_cache_level cache_level,
> -			u32 flags);
> +	void (*bind_vma)(struct i915_address_space *vm,
> +			 struct i915_vm_pt_stash *stash,
> +			 struct i915_vma *vma,
> +			 enum i915_cache_level cache_level,
> +			 u32 flags);
>   	/*
>   	 * Unmap an object from an address space. This usually consists of
>   	 * setting the valid PTE entries to a reserved scratch page.
> @@ -281,8 +290,9 @@ struct i915_address_space {
>   			  u32 flags); /* Create a valid PTE */
>   #define PTE_READ_ONLY	BIT(0)
>   
> -	int (*allocate_va_range)(struct i915_address_space *vm,
> -				 u64 start, u64 length);
> +	void (*allocate_va_range)(struct i915_address_space *vm,
> +				  struct i915_vm_pt_stash *stash,
> +				  u64 start, u64 length);
>   	void (*clear_range)(struct i915_address_space *vm,
>   			    u64 start, u64 length);
>   	void (*insert_page)(struct i915_address_space *vm,
> @@ -568,10 +578,11 @@ int ggtt_set_pages(struct i915_vma *vma);
>   int ppgtt_set_pages(struct i915_vma *vma);
>   void clear_pages(struct i915_vma *vma);
>   
> -int ppgtt_bind_vma(struct i915_address_space *vm,
> -		   struct i915_vma *vma,
> -		   enum i915_cache_level cache_level,
> -		   u32 flags);
> +void ppgtt_bind_vma(struct i915_address_space *vm,
> +		    struct i915_vm_pt_stash *stash,
> +		    struct i915_vma *vma,
> +		    enum i915_cache_level cache_level,
> +		    u32 flags);
>   void ppgtt_unbind_vma(struct i915_address_space *vm,
>   		      struct i915_vma *vma);
>   
> @@ -579,6 +590,12 @@ void gtt_write_workarounds(struct intel_gt *gt);
>   
>   void setup_private_pat(struct intel_uncore *uncore);
>   
> +int i915_vm_alloc_pt_stash(struct i915_address_space *vm,
> +			   struct i915_vm_pt_stash *stash,
> +			   u64 size);
> +void i915_vm_free_pt_stash(struct i915_address_space *vm,
> +			   struct i915_vm_pt_stash *stash);
> +
>   static inline struct sgt_dma {
>   	struct scatterlist *sg;
>   	dma_addr_t dma, max;
> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> index f0862e924d11..9633fd2d294d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> @@ -155,19 +155,16 @@ struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt)
>   	return ppgtt;
>   }
>   
> -int ppgtt_bind_vma(struct i915_address_space *vm,
> -		   struct i915_vma *vma,
> -		   enum i915_cache_level cache_level,
> -		   u32 flags)
> +void ppgtt_bind_vma(struct i915_address_space *vm,
> +		    struct i915_vm_pt_stash *stash,
> +		    struct i915_vma *vma,
> +		    enum i915_cache_level cache_level,
> +		    u32 flags)
>   {
>   	u32 pte_flags;
> -	int err;
>   
>   	if (!test_bit(I915_VMA_ALLOC_BIT, __i915_vma_flags(vma))) {
> -		err = vm->allocate_va_range(vm, vma->node.start, vma->size);
> -		if (err)
> -			return err;
> -
> +		vm->allocate_va_range(vm, stash, vma->node.start, vma->size);
>   		set_bit(I915_VMA_ALLOC_BIT, __i915_vma_flags(vma));
>   	}
>   
> @@ -178,8 +175,6 @@ int ppgtt_bind_vma(struct i915_address_space *vm,
>   
>   	vm->insert_entries(vm, vma, cache_level, pte_flags);
>   	wmb();
> -
> -	return 0;
>   }
>   
>   void ppgtt_unbind_vma(struct i915_address_space *vm, struct i915_vma *vma)
> @@ -188,12 +183,73 @@ void ppgtt_unbind_vma(struct i915_address_space *vm, struct i915_vma *vma)
>   		vm->clear_range(vm, vma->node.start, vma->size);
>   }
>   
> +static unsigned long pd_count(u64 size, int shift)
> +{
> +	/* Beware later misalignment */
> +	return (size + 2 * (BIT_ULL(shift) - 1)) >> shift;

Beware of what misalignment, and how? :)
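
(Stating my reading as an assumption: the 2x slack appears to cover a start
address that is not aligned to the span of one table, which can cost an
extra table at either end of the range. With shift = 21, for example,

    pd_count(SZ_2M, 21) = (2M + 2 * (2M - 1)) >> 21 = 2

so a 2 MiB range that straddles a 2 MiB boundary is still covered, whereas a
perfectly aligned 2 MiB range would have needed only one table.)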

> +}
> +
> +int i915_vm_alloc_pt_stash(struct i915_address_space *vm,
> +			   struct i915_vm_pt_stash *stash,
> +			   u64 size)
> +{
> +	unsigned long count;
> +	int shift = 21;

I wanted to ask what 21 is (2 MiB?), but it is probably better overall if
Matt or Mika reviewed this one.
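
(For reference, 21 does look like 2 MiB: one page table is a single 4 KiB
page holding 512 eight-byte entries, each mapping a 4 KiB page, so it spans

    512 * 4 KiB = 2 MiB = 1 << 21

of virtual address space, and each higher level multiplies the coverage by
another 512 entries, hence the shift += 9 per level further down.)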

Regards,

Tvrtko

> +	int n;
> +
> +	count = pd_count(size, shift);
> +	while (count--) {
> +		struct i915_page_table *pt;
> +
> +		pt = alloc_pt(vm);
> +		if (IS_ERR(pt)) {
> +			i915_vm_free_pt_stash(vm, stash);
> +			return PTR_ERR(pt);
> +		}
> +
> +		pt->stash = stash->pt[0];
> +		stash->pt[0] = pt;
> +	}
> +
> +	for (n = 1; n < vm->top; n++) {
> +		shift += 9;
> +		count = pd_count(size, shift);
> +		while (count--) {
> +			struct i915_page_directory *pd;
> +
> +			pd = alloc_pd(vm);
> +			if (IS_ERR(pd)) {
> +				i915_vm_free_pt_stash(vm, stash);
> +				return PTR_ERR(pd);
> +			}
> +
> +			pd->pt.stash = stash->pt[1];
> +			stash->pt[1] = &pd->pt;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +void i915_vm_free_pt_stash(struct i915_address_space *vm,
> +			   struct i915_vm_pt_stash *stash)
> +{
> +	struct i915_page_table *pt;
> +	int n;
> +
> +	for (n = 0; n < ARRAY_SIZE(stash->pt); n++) {
> +		while ((pt = stash->pt[n])) {
> +			stash->pt[n] = pt->stash;
> +			free_px(vm, pt);
> +		}
> +	}
> +}
> +
>   int ppgtt_set_pages(struct i915_vma *vma)
>   {
>   	GEM_BUG_ON(vma->pages);
>   
>   	vma->pages = vma->obj->mm.pages;
> -
>   	vma->page_sizes = vma->obj->mm.page_sizes;
>   
>   	return 0;
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 627bac2e0252..fc8a083753bd 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -295,6 +295,8 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
>   
>   struct i915_vma_work {
>   	struct dma_fence_work base;
> +	struct i915_address_space *vm;
> +	struct i915_vm_pt_stash stash;
>   	struct i915_vma *vma;
>   	struct drm_i915_gem_object *pinned;
>   	struct i915_sw_dma_fence_cb cb;
> @@ -306,13 +308,10 @@ static int __vma_bind(struct dma_fence_work *work)
>   {
>   	struct i915_vma_work *vw = container_of(work, typeof(*vw), base);
>   	struct i915_vma *vma = vw->vma;
> -	int err;
> -
> -	err = vma->ops->bind_vma(vma->vm, vma, vw->cache_level, vw->flags);
> -	if (err)
> -		atomic_or(I915_VMA_ERROR, &vma->flags);
>   
> -	return err;
> +	vma->ops->bind_vma(vw->vm, &vw->stash,
> +			   vma, vw->cache_level, vw->flags);
> +	return 0;
>   }
>   
>   static void __vma_release(struct dma_fence_work *work)
> @@ -321,6 +320,9 @@ static void __vma_release(struct dma_fence_work *work)
>   
>   	if (vw->pinned)
>   		__i915_gem_object_unpin_pages(vw->pinned);
> +
> +	i915_vm_free_pt_stash(vw->vm, &vw->stash);
> +	i915_vm_put(vw->vm);
>   }
>   
>   static const struct dma_fence_work_ops bind_ops = {
> @@ -380,7 +382,6 @@ int i915_vma_bind(struct i915_vma *vma,
>   {
>   	u32 bind_flags;
>   	u32 vma_flags;
> -	int ret;
>   
>   	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
>   	GEM_BUG_ON(vma->size > vma->node.size);
> @@ -437,9 +438,7 @@ int i915_vma_bind(struct i915_vma *vma,
>   			work->pinned = vma->obj;
>   		}
>   	} else {
> -		ret = vma->ops->bind_vma(vma->vm, vma, cache_level, bind_flags);
> -		if (ret)
> -			return ret;
> +		vma->ops->bind_vma(vma->vm, NULL, vma, cache_level, bind_flags);
>   	}
>   
>   	atomic_or(bind_flags, &vma->flags);
> @@ -878,11 +877,21 @@ int i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
>   		return err;
>   
>   	if (flags & vma->vm->bind_async_flags) {
> +		u64 max_size;
> +
>   		work = i915_vma_work();
>   		if (!work) {
>   			err = -ENOMEM;
>   			goto err_pages;
>   		}
> +
> +		work->vm = i915_vm_get(vma->vm);
> +
> +		/* Allocate enough page directories to cover worst case */
> +		max_size = max(size, vma->size);
> +		if (flags & PIN_MAPPABLE)
> +			max_size = max_t(u64, max_size, vma->fence_size);
> +		i915_vm_alloc_pt_stash(vma->vm, &work->stash, max_size);
>   	}
>   
>   	if (flags & PIN_GLOBAL)
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> index 0016ffc7d914..9b8fc990e9ef 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> @@ -172,35 +172,33 @@ static int igt_ppgtt_alloc(void *arg)
>   
>   	/* Check we can allocate the entire range */
>   	for (size = 4096; size <= limit; size <<= 2) {
> -		err = ppgtt->vm.allocate_va_range(&ppgtt->vm, 0, size);
> -		if (err) {
> -			if (err == -ENOMEM) {
> -				pr_info("[1] Ran out of memory for va_range [0 + %llx] [bit %d]\n",
> -					size, ilog2(size));
> -				err = 0; /* virtual space too large! */
> -			}
> +		struct i915_vm_pt_stash stash = {};
> +
> +		err = i915_vm_alloc_pt_stash(&ppgtt->vm, &stash, size);
> +		if (err)
>   			goto err_ppgtt_cleanup;
> -		}
>   
> +		ppgtt->vm.allocate_va_range(&ppgtt->vm, &stash, 0, size);
>   		cond_resched();
>   
>   		ppgtt->vm.clear_range(&ppgtt->vm, 0, size);
> +
> +		i915_vm_free_pt_stash(&ppgtt->vm, &stash);
>   	}
>   
>   	/* Check we can incrementally allocate the entire range */
>   	for (last = 0, size = 4096; size <= limit; last = size, size <<= 2) {
> -		err = ppgtt->vm.allocate_va_range(&ppgtt->vm,
> -						  last, size - last);
> -		if (err) {
> -			if (err == -ENOMEM) {
> -				pr_info("[2] Ran out of memory for va_range [%llx + %llx] [bit %d]\n",
> -					last, size - last, ilog2(size));
> -				err = 0; /* virtual space too large! */
> -			}
> +		struct i915_vm_pt_stash stash = {};
> +
> +		err = i915_vm_alloc_pt_stash(&ppgtt->vm, &stash, size - last);
> +		if (err)
>   			goto err_ppgtt_cleanup;
> -		}
>   
> +		ppgtt->vm.allocate_va_range(&ppgtt->vm, &stash,
> +					    last, size - last);
>   		cond_resched();
> +
> +		i915_vm_free_pt_stash(&ppgtt->vm, &stash);
>   	}
>   
>   err_ppgtt_cleanup:
> @@ -284,9 +282,18 @@ static int lowlevel_hole(struct i915_address_space *vm,
>   				break;
>   			}
>   
> -			if (vm->allocate_va_range &&
> -			    vm->allocate_va_range(vm, addr, BIT_ULL(size)))
> -				break;
> +			if (vm->allocate_va_range) {
> +				struct i915_vm_pt_stash stash = {};
> +
> +				if (i915_vm_alloc_pt_stash(vm, &stash,
> +							   BIT_ULL(size)))
> +					break;
> +
> +				vm->allocate_va_range(vm, &stash,
> +						      addr, BIT_ULL(size));
> +
> +				i915_vm_free_pt_stash(vm, &stash);
> +			}
>   
>   			mock_vma->pages = obj->mm.pages;
>   			mock_vma->node.size = BIT_ULL(size);
> @@ -1881,6 +1888,7 @@ static int igt_cs_tlb(void *arg)
>   			continue;
>   
>   		while (!__igt_timeout(end_time, NULL)) {
> +			struct i915_vm_pt_stash stash = {};
>   			struct i915_request *rq;
>   			u64 offset;
>   
> @@ -1888,10 +1896,6 @@ static int igt_cs_tlb(void *arg)
>   						   0, vm->total - PAGE_SIZE,
>   						   chunk_size, PAGE_SIZE);
>   
> -			err = vm->allocate_va_range(vm, offset, chunk_size);
> -			if (err)
> -				goto end;
> -
>   			memset32(result, STACK_MAGIC, PAGE_SIZE / sizeof(u32));
>   
>   			vma = i915_vma_instance(bbe, vm, NULL);
> @@ -1904,6 +1908,14 @@ static int igt_cs_tlb(void *arg)
>   			if (err)
>   				goto end;
>   
> +			err = i915_vm_alloc_pt_stash(vm, &stash, chunk_size);
> +			if (err)
> +				goto end;
> +
> +			vm->allocate_va_range(vm, &stash, offset, chunk_size);
> +
> +			i915_vm_free_pt_stash(vm, &stash);
> +
>   			/* Prime the TLB with the dummy pages */
>   			for (i = 0; i < count; i++) {
>   				vma->node.start = offset + i * PAGE_SIZE;
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gtt.c b/drivers/gpu/drm/i915/selftests/mock_gtt.c
> index b173086411ef..5e4fb0fba34b 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gtt.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gtt.c
> @@ -38,14 +38,14 @@ static void mock_insert_entries(struct i915_address_space *vm,
>   {
>   }
>   
> -static int mock_bind_ppgtt(struct i915_address_space *vm,
> -			   struct i915_vma *vma,
> -			   enum i915_cache_level cache_level,
> -			   u32 flags)
> +static void mock_bind_ppgtt(struct i915_address_space *vm,
> +			    struct i915_vm_pt_stash *stash,
> +			    struct i915_vma *vma,
> +			    enum i915_cache_level cache_level,
> +			    u32 flags)
>   {
>   	GEM_BUG_ON(flags & I915_VMA_GLOBAL_BIND);
>   	set_bit(I915_VMA_LOCAL_BIND_BIT, __i915_vma_flags(vma));
> -	return 0;
>   }
>   
>   static void mock_unbind_ppgtt(struct i915_address_space *vm,
> @@ -74,6 +74,7 @@ struct i915_ppgtt *mock_ppgtt(struct drm_i915_private *i915, const char *name)
>   	ppgtt->vm.i915 = i915;
>   	ppgtt->vm.total = round_down(U64_MAX, PAGE_SIZE);
>   	ppgtt->vm.file = ERR_PTR(-ENODEV);
> +	ppgtt->vm.dma = &i915->drm.pdev->dev;
>   
>   	i915_address_space_init(&ppgtt->vm, VM_CLASS_PPGTT);
>   
> @@ -90,13 +91,12 @@ struct i915_ppgtt *mock_ppgtt(struct drm_i915_private *i915, const char *name)
>   	return ppgtt;
>   }
>   
> -static int mock_bind_ggtt(struct i915_address_space *vm,
> -			  struct i915_vma *vma,
> -			  enum i915_cache_level cache_level,
> -			  u32 flags)
> +static void mock_bind_ggtt(struct i915_address_space *vm,
> +			   struct i915_vm_pt_stash *stash,
> +			   struct i915_vma *vma,
> +			   enum i915_cache_level cache_level,
> +			   u32 flags)
>   {
> -	atomic_or(I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND, &vma->flags);
> -	return 0;
>   }
>   
>   static void mock_unbind_ggtt(struct i915_address_space *vm,
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Intel-gfx] [PATCH 17/23] drm/i915/gem: Asynchronous GTT unbinding
  2020-07-02  8:32 ` [Intel-gfx] [PATCH 17/23] drm/i915/gem: Asynchronous GTT unbinding Chris Wilson
@ 2020-07-05 22:01   ` Andi Shyti
  2020-07-05 22:07     ` Chris Wilson
  0 siblings, 1 reply; 56+ messages in thread
From: Andi Shyti @ 2020-07-05 22:01 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, Matthew Auld

Hi Chris,

> +static int gen6_fixup_ggtt(struct i915_vma *vma)

You create this function here and remove it in patch 21. This
series is a bit confusing; can we have a final version of it?

Andi

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Intel-gfx] [PATCH 17/23] drm/i915/gem: Asynchronous GTT unbinding
  2020-07-05 22:01   ` Andi Shyti
@ 2020-07-05 22:07     ` Chris Wilson
  0 siblings, 0 replies; 56+ messages in thread
From: Chris Wilson @ 2020-07-05 22:07 UTC (permalink / raw)
  To: Andi Shyti; +Cc: intel-gfx, Matthew Auld

Quoting Andi Shyti (2020-07-05 23:01:29)
> Hi Chris,
> 
> > +static int gen6_fixup_ggtt(struct i915_vma *vma)
> 
> You create this function here and remove it in patch 21. This
> series is a bit confusing; can we have a final version of it?

It gets removed because the next patches reorder all the pinning around
this central function. Until then, the fixup happens after we do the
pinning, and those patches depend on this patch to provide the central
pinning. So it's a circular dependency, and this patch needs to provide
the fixup so that it works by itself.
-Chris

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Intel-gfx] [PATCH 11/23] drm/i915/gem: Remove the call for no-evict i915_vma_pin
  2020-07-03  9:23     ` Chris Wilson
@ 2020-07-06 14:43       ` Tvrtko Ursulin
  0 siblings, 0 replies; 56+ messages in thread
From: Tvrtko Ursulin @ 2020-07-06 14:43 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/07/2020 10:23, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-07-03 09:59:01)
>>
>> On 02/07/2020 09:32, Chris Wilson wrote:
>>> Remove the stub i915_vma_pin() used for incrementally pinning objects for
>>> execbuf (under the severe restriction that they must not wait on a
>>> resource as we may have already pinned it) and replace it with an
>>> i915_vma_pin_inplace() that is only allowed to reclaim the currently
>>> bound location for the vma (and will never wait for a pinned resource).
>>
>> Hm, I thought the point of the previous patch ("drm/i915/gem: Break apart
>> the early i915_vma_pin from execbuf object lookup") was to move the
>> pinning into a phase under the ww lock, where it will be allowed. Did I
>> misunderstand something?
> 
> Still different locks, and the vm->mutex is still being used for managing
> the iova assignments.

Right, I think I get it. For the record, I've asked for a cover letter with
a high-level design description, with emphasis on the flow of stages through
execbuf by the end of the series and on the separation of lookup and
reservation, and/or vm->mutex (ppgtt space) and obj->wwlock (backing store).

Regards,

Tvrtko



^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2020-07-06 14:43 UTC | newest]

Thread overview: 56+ messages
2020-07-02  8:32 [PATCH 01/23] drm/i915: Drop vm.ref for duplicate vma on construction Chris Wilson
2020-07-02  8:32 ` [Intel-gfx] " Chris Wilson
2020-07-02  8:32 ` [Intel-gfx] [PATCH 02/23] drm/i915/gem: Split the context's obj:vma lut into its own mutex Chris Wilson
2020-07-02 22:09   ` Andi Shyti
2020-07-02 22:14     ` Chris Wilson
2020-07-02  8:32 ` [Intel-gfx] [PATCH 03/23] drm/i915/gem: Drop forced struct_mutex from shrinker_taints_mutex Chris Wilson
2020-07-02 22:24   ` Andi Shyti
2020-07-02  8:32 ` [Intel-gfx] [PATCH 04/23] drm/i915/gem: Only revoke mmap handlers if active Chris Wilson
2020-07-02 12:35   ` Tvrtko Ursulin
2020-07-02 12:47     ` Chris Wilson
2020-07-02 12:54       ` Chris Wilson
2020-07-02  8:32 ` [Intel-gfx] [PATCH 05/23] drm/i915: Export ppgtt_bind_vma Chris Wilson
2020-07-03 10:09   ` Andi Shyti
2020-07-02  8:32 ` [Intel-gfx] [PATCH 06/23] drm/i915: Preallocate stashes for vma page-directories Chris Wilson
2020-07-03 16:47   ` Tvrtko Ursulin
2020-07-02  8:32 ` [Intel-gfx] [PATCH 07/23] drm/i915: Switch to object allocations for page directories Chris Wilson
2020-07-03  8:44   ` Tvrtko Ursulin
2020-07-03  9:00     ` Chris Wilson
2020-07-03  9:24       ` Tvrtko Ursulin
2020-07-03  9:49         ` Chris Wilson
2020-07-03 16:34           ` Tvrtko Ursulin
2020-07-03 16:36   ` Tvrtko Ursulin
2020-07-02  8:32 ` [Intel-gfx] [PATCH 08/23] drm/i915/gem: Don't drop the timeline lock during execbuf Chris Wilson
2020-07-02  8:32 ` [Intel-gfx] [PATCH 09/23] drm/i915/gem: Rename execbuf.bind_link to unbound_link Chris Wilson
2020-07-02  8:32 ` [Intel-gfx] [PATCH 10/23] drm/i915/gem: Break apart the early i915_vma_pin from execbuf object lookup Chris Wilson
2020-07-02  8:32 ` [Intel-gfx] [PATCH 11/23] drm/i915/gem: Remove the call for no-evict i915_vma_pin Chris Wilson
2020-07-03  8:59   ` Tvrtko Ursulin
2020-07-03  9:23     ` Chris Wilson
2020-07-06 14:43       ` Tvrtko Ursulin
2020-07-02  8:32 ` [Intel-gfx] [PATCH 12/23] drm/i915: Add list_for_each_entry_safe_continue_reverse Chris Wilson
2020-07-02  8:32 ` [Intel-gfx] [PATCH 13/23] drm/i915: Always defer fenced work to the worker Chris Wilson
2020-07-02  8:32 ` [Intel-gfx] [PATCH 14/23] drm/i915/gem: Assign context id for async work Chris Wilson
2020-07-02  8:32 ` [Intel-gfx] [PATCH 15/23] drm/i915: Export a preallocate variant of i915_active_acquire() Chris Wilson
2020-07-02  8:32 ` [Intel-gfx] [PATCH 16/23] drm/i915/gem: Separate the ww_mutex walker into its own list Chris Wilson
2020-07-02  8:32 ` [Intel-gfx] [PATCH 17/23] drm/i915/gem: Asynchronous GTT unbinding Chris Wilson
2020-07-05 22:01   ` Andi Shyti
2020-07-05 22:07     ` Chris Wilson
2020-07-02  8:32 ` [Intel-gfx] [PATCH 18/23] drm/i915/gem: Bind the fence async for execbuf Chris Wilson
2020-07-02  8:32 ` [Intel-gfx] [PATCH 19/23] drm/i915/gem: Include cmdparser in common execbuf pinning Chris Wilson
2020-07-02  8:32 ` [Intel-gfx] [PATCH 20/23] drm/i915/gem: Include secure batch " Chris Wilson
2020-07-02  8:32 ` [Intel-gfx] [PATCH 21/23] drm/i915/gem: Reintroduce multiple passes for reloc processing Chris Wilson
2020-07-02  8:32 ` [Intel-gfx] [PATCH 22/23] drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2 Chris Wilson
2020-07-02 22:32   ` kernel test robot
2020-07-02  8:32 ` [Intel-gfx] [PATCH 23/23] drm/i915/gem: Pull execbuf dma resv under a single critical section Chris Wilson
2020-07-02  9:17 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/23] drm/i915: Drop vm.ref for duplicate vma on construction Patchwork
2020-07-02  9:18 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2020-07-02  9:40 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2020-07-02 12:27 ` [Intel-gfx] [PATCH 01/23] " Tvrtko Ursulin
2020-07-02 12:27   ` Tvrtko Ursulin
2020-07-02 13:08 ` [Intel-gfx] ✗ Fi.CI.IGT: failure for series starting with [01/23] " Patchwork
2020-07-02 20:25 ` [Intel-gfx] [PATCH 01/23] " Andi Shyti
2020-07-02 20:25   ` Andi Shyti
2020-07-02 20:38   ` Chris Wilson
2020-07-02 20:38     ` Chris Wilson
2020-07-02 20:56     ` Andi Shyti
2020-07-02 20:56       ` Andi Shyti
