* WIP glance towards struct_mutex removal
@ 2019-06-14  7:09 Chris Wilson
  2019-06-14  7:09 ` [PATCH 01/39] drm/i915: Discard some redundant cache domain flushes Chris Wilson
                   ` (40 more replies)
  0 siblings, 41 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:09 UTC (permalink / raw)
  To: intel-gfx

This series is still under development; I am getting the coordination
right for async vma (having just made i915_vma refcounted, I need to
actually prevent the refcycles and fix up good old userspace in ggtt,
vma->open_count for everyone incl. gem_vm_ops).

[PATCH 01/39] drm/i915: Discard some redundant cache domain flushes
[PATCH 02/39] drm/i915: Refine i915_reset.lock_map
[PATCH 03/39] drm/i915: Avoid tainting i915_gem_park() with wakeref.lock
[PATCH 04/39] drm/i915: Enable refcount debugging for default debug levels

Endless/qos patches:
[PATCH 05/39] drm/i915: Keep contexts pinned until after the next kernel context switch
[PATCH 06/39] drm/i915: Stop retiring along engine
[PATCH 07/39] drm/i915: Replace engine->timeline with a plain list
[PATCH 08/39] drm/i915: Flush the execution-callbacks on retiring
[PATCH 09/39] drm/i915/execlists: Preempt-to-busy
[PATCH 10/39] drm/i915/execlists: Minimalistic timeslicing
[PATCH 11/39] drm/i915/execlists: Force preemption

Remove struct_mutex from i915_active:
[PATCH 12/39] dma-fence: Propagate errors to dma-fence-array
[PATCH 13/39] dma-fence: Report the composite sync_file status
[PATCH 14/39] dma-fence: Refactor signaling for manual invocation
[PATCH 15/39] dma-fence: Always execute signal callbacks
[PATCH 16/39] drm/i915: Execute signal callbacks from no-op
[PATCH 17/39] drm/i915: Make the semaphore saturation mask global
[PATCH 18/39] drm/i915: Throw away the active object retirement
[PATCH 19/39] drm/i915: Provide an i915_active.acquire callback
[PATCH 20/39] drm/i915: Push the i915_active.retire into a worker
[PATCH 21/39] drm/i915/overlay: Switch to using i915_active tracking
[PATCH 22/39] drm/i915: Forgo last_fence active request tracking
[PATCH 23/39] drm/i915: Extract intel_frontbuffer active tracking
[PATCH 24/39] drm/i915: Coordinate i915_active with its own mutex

Up to this point, we should be solid; and removing struct_mutex around
i915_active is a major milestone, as it was one of the two remaining
reasons for holding struct_mutex across request construction. In
principle, you now just need the timeline mutex, which is taken (and
the timeline pinned) by i915_request_create. (The remaining problem is
making retirement require only the timeline mutex as well.)
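
Roughly, the model this looks like from a caller's perspective (a
sketch; emit_something() is a hypothetical caller, and the locking
comments reflect this series rather than upstream):

  static int emit_something(struct intel_context *ce)
  {
          struct i915_request *rq;

          rq = i915_request_create(ce);   /* takes the timeline mutex */
          if (IS_ERR(rq))
                  return PTR_ERR(rq);

          /* ... emit commands under the timeline mutex alone ... */

          i915_request_add(rq);           /* publishes rq, drops the mutex */
          return 0;
  }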

Remove struct_mutex from i915_address_space:
[PATCH 25/39] drm/i915: Track ggtt fence reservations under its own
[PATCH 26/39] drm/i915: Only track bound elements of the GTT
[PATCH 27/39] drm/i915: Explicitly cleanup initial_plane_config
[PATCH 28/39] initial-plane-vma
[PATCH 29/39] drm/i915: Make i915_vma track its own kref
[PATCH 30/39] drm/i915: Propagate fence errors
[PATCH 31/39] drm/i915: Allow page pinning to be in the background
[PATCH 32/39] drm/i915: Allow vma binding to occur asynchronously
[PATCH 33/39] revoke-fence
[PATCH 34/39] drm/i915: Use vm->mutex for serialising GTT insertion

This is quite close. There is a major refleak in i915_vma, i.e. the
finishing touches of converting it to be refcounted, and I need to take
a pass over the async vma to force the worker even for ggtt if we need
to allocate pages (e.g. rotation, remap, partial).

At this point, we drop struct_mutex pretty much everywhere internally,
and should be free to start doing interesting things from within
obj->mm workers.

async get_pages:
[PATCH 35/39] drm/i915: Pin pages before waiting
[PATCH 36/39] drm/i915: Use reservation_object to coordinate userptr

Async get-pages is one such example; it is currently broken across
rebases and needs fixing up.

Remove struct_mutex for i915_requests:
[PATCH 37/39] drm/i915: Use forced preemptions in preference over
[PATCH 38/39] drm/i915: Remove logical HW ID
[PATCH 39/39] active-rings

And to complete the picture of removing struct_mutex around
i915_request, we need to be able to retire under the timeline lock
alone. With that in place, we can drop struct_mutex. This series
presumptuously removes it from GEM_EXECBUFFER_IOCTL; that is a little
premature, as there are still auxiliaries that need their own locks,
but it is the ultimate proof.
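
For illustration, the shape retirement then needs to take (a sketch
only; the tl->mutex and tl->requests fields assume the per-timeline
tracking this series is building towards):

  static void timeline_retire_requests(struct i915_timeline *tl)
  {
          struct i915_request *rq, *rn;

          mutex_lock(&tl->mutex);         /* no struct_mutex required */
          list_for_each_entry_safe(rq, rn, &tl->requests, link) {
                  if (!i915_request_completed(rq))
                          break;
                  i915_request_retire(rq);
          }
          mutex_unlock(&tl->mutex);
  }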

Please look over and flag any more issues you can spot.
-Chris


* [PATCH 01/39] drm/i915: Discard some redundant cache domain flushes
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
@ 2019-06-14  7:09 ` Chris Wilson
  2019-06-14  9:12   ` Matthew Auld
  2019-06-14  7:09 ` [PATCH 02/39] drm/i915: Refine i915_reset.lock_map Chris Wilson
                   ` (39 subsequent siblings)
  40 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:09 UTC (permalink / raw)
  To: intel-gfx

Since commit a679f58d0510 ("drm/i915: Flush pages on acquisition"), we
flush objects when we acquire their pages, and as such, when we create
an object for the purpose of writing into it, we do not need to flush
it manually.
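
For example, a typical write path is already sufficient on its own (a
sketch; fill_batch() is illustrative, but the object API calls are the
ones used in the hunks below):

  static int fill_batch(struct drm_i915_gem_object *obj)
  {
          u32 *cmd;

          cmd = i915_gem_object_pin_map(obj, I915_MAP_WB);
          if (IS_ERR(cmd))
                  return PTR_ERR(cmd);

          *cmd = MI_BATCH_BUFFER_END;     /* ... write the payload ... */

          i915_gem_object_flush_map(obj); /* flush our CPU writes */
          i915_gem_object_unpin_map(obj);

          /* No i915_gem_object_set_to_*_domain() required here. */
          return 0;
  }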

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c |  6 ------
 drivers/gpu/drm/i915/gt/selftest_workarounds.c        |  6 ------
 drivers/gpu/drm/i915/intel_guc_log.c                  | 11 +----------
 drivers/gpu/drm/i915/intel_overlay.c                  |  8 --------
 4 files changed, 1 insertion(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index 74b0e5871c4b..9e2878a59023 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -209,12 +209,6 @@ gpu_fill_dw(struct i915_vma *vma, u64 offset, unsigned long count, u32 value)
 	i915_gem_object_flush_map(obj);
 	i915_gem_object_unpin_map(obj);
 
-	i915_gem_object_lock(obj);
-	err = i915_gem_object_set_to_gtt_domain(obj, false);
-	i915_gem_object_unlock(obj);
-	if (err)
-		goto err;
-
 	vma = i915_vma_instance(obj, vma->vm, NULL);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
index c8d335d63f9c..93e9579b8a4f 100644
--- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
@@ -368,12 +368,6 @@ static struct i915_vma *create_batch(struct i915_gem_context *ctx)
 	if (err)
 		goto err_obj;
 
-	i915_gem_object_lock(obj);
-	err = i915_gem_object_set_to_wc_domain(obj, true);
-	i915_gem_object_unlock(obj);
-	if (err)
-		goto err_obj;
-
 	return vma;
 
 err_obj:
diff --git a/drivers/gpu/drm/i915/intel_guc_log.c b/drivers/gpu/drm/i915/intel_guc_log.c
index 67eadc82c396..4f9c53681f92 100644
--- a/drivers/gpu/drm/i915/intel_guc_log.c
+++ b/drivers/gpu/drm/i915/intel_guc_log.c
@@ -344,29 +344,20 @@ static void capture_logs_work(struct work_struct *work)
 static int guc_log_map(struct intel_guc_log *log)
 {
 	void *vaddr;
-	int ret;
 
 	lockdep_assert_held(&log->relay.lock);
 
 	if (!log->vma)
 		return -ENODEV;
 
-	i915_gem_object_lock(log->vma->obj);
-	ret = i915_gem_object_set_to_wc_domain(log->vma->obj, true);
-	i915_gem_object_unlock(log->vma->obj);
-	if (ret)
-		return ret;
-
 	/*
 	 * Create a WC (Uncached for read) vmalloc mapping of log
 	 * buffer pages, so that we can directly get the data
 	 * (up-to-date) from memory.
 	 */
 	vaddr = i915_gem_object_pin_map(log->vma->obj, I915_MAP_WC);
-	if (IS_ERR(vaddr)) {
-		DRM_ERROR("Couldn't map log buffer pages %d\n", ret);
+	if (IS_ERR(vaddr))
 		return PTR_ERR(vaddr);
-	}
 
 	log->relay.buf_addr = vaddr;
 
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index a2ac06a08715..21339b7f6a3e 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -1377,12 +1377,6 @@ void intel_overlay_setup(struct drm_i915_private *dev_priv)
 	if (ret)
 		goto out_free;
 
-	i915_gem_object_lock(overlay->reg_bo);
-	ret = i915_gem_object_set_to_gtt_domain(overlay->reg_bo, true);
-	i915_gem_object_unlock(overlay->reg_bo);
-	if (ret)
-		goto out_reg_bo;
-
 	memset_io(overlay->regs, 0, sizeof(struct overlay_registers));
 	update_polyphase_filter(overlay->regs);
 	update_reg_attrs(overlay, overlay->regs);
@@ -1391,8 +1385,6 @@ void intel_overlay_setup(struct drm_i915_private *dev_priv)
 	DRM_INFO("Initialized overlay support.\n");
 	return;
 
-out_reg_bo:
-	i915_gem_object_put(overlay->reg_bo);
 out_free:
 	kfree(overlay);
 }
-- 
2.20.1


* [PATCH 02/39] drm/i915: Refine i915_reset.lock_map
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
  2019-06-14  7:09 ` [PATCH 01/39] drm/i915: Discard some redundant cache domain flushes Chris Wilson
@ 2019-06-14  7:09 ` Chris Wilson
  2019-06-14 14:10   ` Mika Kuoppala
  2019-06-14  7:09 ` [PATCH 03/39] drm/i915: Avoid tainting i915_gem_park() with wakeref.lock Chris Wilson
                   ` (38 subsequent siblings)
  40 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:09 UTC (permalink / raw)
  To: intel-gfx

We already use a mutex to serialise i915_reset() and wedging, so all we
need is to link that into i915_request_wait() and we have our
lock-cycle detection.
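
That is, the wedge_mutex is really held across i915_reset(), while
i915_request_wait() merely tells lockdep that it could be (a condensed
sketch of the annotations in the diff below):

  /* i915_reset(): really serialised */
  mutex_lock(&error->wedge_mutex);
  /* ... perform the reset ... */
  mutex_unlock(&error->wedge_mutex);

  /* i915_request_wait(): fake acquisition, purely for lockdep */
  mutex_acquire(&error->wedge_mutex.dep_map, 0, 0, _THIS_IP_);
  /* ... sleep waiting for the request; a reset may run meanwhile ... */
  mutex_release(&error->wedge_mutex.dep_map, 0, _THIS_IP_);

  /* Waiting while holding any lock that i915_reset() also needs now
   * closes a cycle that lockdep will report.
   */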

v2.5: Take error mutex for selftests

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_reset.c            |  6 ++----
 drivers/gpu/drm/i915/i915_drv.h                  |  8 --------
 drivers/gpu/drm/i915/i915_gem.c                  |  3 ---
 drivers/gpu/drm/i915/i915_request.c              | 12 ++++++++++--
 drivers/gpu/drm/i915/selftests/mock_gem_device.c |  2 --
 5 files changed, 12 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 8ba7af8b7ced..41a294f5cc19 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -978,7 +978,7 @@ void i915_reset(struct drm_i915_private *i915,
 
 	might_sleep();
 	GEM_BUG_ON(!test_bit(I915_RESET_BACKOFF, &error->flags));
-	lock_map_acquire(&i915->gt.reset_lockmap);
+	mutex_lock(&error->wedge_mutex);
 
 	/* Clear any previous failed attempts at recovery. Time to try again. */
 	if (!__i915_gem_unset_wedged(i915))
@@ -1031,7 +1031,7 @@ void i915_reset(struct drm_i915_private *i915,
 finish:
 	reset_finish(i915);
 unlock:
-	lock_map_release(&i915->gt.reset_lockmap);
+	mutex_unlock(&error->wedge_mutex);
 	return;
 
 taint:
@@ -1147,9 +1147,7 @@ static void i915_reset_device(struct drm_i915_private *i915,
 		/* Flush everyone using a resource about to be clobbered */
 		synchronize_srcu_expedited(&error->reset_backoff_srcu);
 
-		mutex_lock(&error->wedge_mutex);
 		i915_reset(i915, engine_mask, reason);
-		mutex_unlock(&error->wedge_mutex);
 
 		intel_finish_reset(i915);
 	}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 90d94d904e65..3683ef6d4c28 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1901,14 +1901,6 @@ struct drm_i915_private {
 		ktime_t last_init_time;
 
 		struct i915_vma *scratch;
-
-		/*
-		 * We must never wait on the GPU while holding a lock as we
-		 * may need to perform a GPU reset. So while we don't need to
-		 * serialise wait/reset with an explicit lock, we do want
-		 * lockdep to detect potential dependency cycles.
-		 */
-		struct lockdep_map reset_lockmap;
 	} gt;
 
 	struct {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 4bbded4aa936..7232361973fd 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1746,7 +1746,6 @@ static void i915_gem_init__mm(struct drm_i915_private *i915)
 
 int i915_gem_init_early(struct drm_i915_private *dev_priv)
 {
-	static struct lock_class_key reset_key;
 	int err;
 
 	intel_gt_pm_init(dev_priv);
@@ -1754,8 +1753,6 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
 	INIT_LIST_HEAD(&dev_priv->gt.active_rings);
 	INIT_LIST_HEAD(&dev_priv->gt.closed_vma);
 	spin_lock_init(&dev_priv->gt.closed_lock);
-	lockdep_init_map(&dev_priv->gt.reset_lockmap,
-			 "i915.reset", &reset_key, 0);
 
 	i915_gem_init__mm(dev_priv);
 	i915_gem_init__pm(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 1cbc3ef4fc27..5311286578b7 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1444,7 +1444,15 @@ long i915_request_wait(struct i915_request *rq,
 		return -ETIME;
 
 	trace_i915_request_wait_begin(rq, flags);
-	lock_map_acquire(&rq->i915->gt.reset_lockmap);
+
+	/*
+	 * We must never wait on the GPU while holding a lock as we
+	 * may need to perform a GPU reset. So while we don't need to
+	 * serialise wait/reset with an explicit lock, we do want
+	 * lockdep to detect potential dependency cycles.
+	 */
+	mutex_acquire(&rq->i915->gpu_error.wedge_mutex.dep_map,
+		      0, 0, _THIS_IP_);
 
 	/*
 	 * Optimistic spin before touching IRQs.
@@ -1518,7 +1526,7 @@ long i915_request_wait(struct i915_request *rq,
 	dma_fence_remove_callback(&rq->fence, &wait.cb);
 
 out:
-	lock_map_release(&rq->i915->gt.reset_lockmap);
+	mutex_release(&rq->i915->gpu_error.wedge_mutex.dep_map, 0, _THIS_IP_);
 	trace_i915_request_wait_end(rq);
 	return timeout;
 }
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index 1e9ffced78c1..b7f3fbb4ae89 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -130,7 +130,6 @@ static struct dev_pm_domain pm_domain = {
 
 struct drm_i915_private *mock_gem_device(void)
 {
-	static struct lock_class_key reset_key;
 	struct drm_i915_private *i915;
 	struct pci_dev *pdev;
 	int err;
@@ -205,7 +204,6 @@ struct drm_i915_private *mock_gem_device(void)
 	INIT_LIST_HEAD(&i915->gt.active_rings);
 	INIT_LIST_HEAD(&i915->gt.closed_vma);
 	spin_lock_init(&i915->gt.closed_lock);
-	lockdep_init_map(&i915->gt.reset_lockmap, "i915.reset", &reset_key, 0);
 
 	mutex_lock(&i915->drm.struct_mutex);
 
-- 
2.20.1


* [PATCH 03/39] drm/i915: Avoid tainting i915_gem_park() with wakeref.lock
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
  2019-06-14  7:09 ` [PATCH 01/39] drm/i915: Discard some redundant cache domain flushes Chris Wilson
  2019-06-14  7:09 ` [PATCH 02/39] drm/i915: Refine i915_reset.lock_map Chris Wilson
@ 2019-06-14  7:09 ` Chris Wilson
  2019-06-14 14:47   ` Mika Kuoppala
  2019-06-14  7:09 ` [PATCH 04/39] drm/i915: Enable refcount debugging for default debug levels Chris Wilson
                   ` (37 subsequent siblings)
  40 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:09 UTC (permalink / raw)
  To: intel-gfx

While we need to flush the wakeref before parking, we do not need to
perform the i915_gem_park() itself underneath the wakeref lock, merely
the struct_mutex. If we rearrange the locks, we can avoid the unnecessary
tainting.
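
In outline, the handler becomes (condensed from the diff below):

  /* idle_work_handler(): decide under wakeref.lock, park outside it */
  cancel_delayed_work_sync(&i915->gem.retire_work);
  mutex_lock(&i915->drm.struct_mutex);

  intel_wakeref_lock(&i915->gt.wakeref);
  park = !intel_wakeref_active(&i915->gt.wakeref) && !work_pending(work);
  intel_wakeref_unlock(&i915->gt.wakeref);

  if (park)
          i915_gem_park(i915);    /* under struct_mutex alone */

  mutex_unlock(&i915->drm.struct_mutex);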

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_pm.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
index 6e75702c5671..a33f69610d6f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
@@ -30,23 +30,22 @@ static void idle_work_handler(struct work_struct *work)
 {
 	struct drm_i915_private *i915 =
 		container_of(work, typeof(*i915), gem.idle_work);
-	bool restart = true;
+	bool park;
 
-	cancel_delayed_work(&i915->gem.retire_work);
+	cancel_delayed_work_sync(&i915->gem.retire_work);
 	mutex_lock(&i915->drm.struct_mutex);
 
 	intel_wakeref_lock(&i915->gt.wakeref);
-	if (!intel_wakeref_active(&i915->gt.wakeref) && !work_pending(work)) {
-		i915_gem_park(i915);
-		restart = false;
-	}
+	park = !intel_wakeref_active(&i915->gt.wakeref) && !work_pending(work);
 	intel_wakeref_unlock(&i915->gt.wakeref);
-
-	mutex_unlock(&i915->drm.struct_mutex);
-	if (restart)
+	if (park)
+		i915_gem_park(i915);
+	else
 		queue_delayed_work(i915->wq,
 				   &i915->gem.retire_work,
 				   round_jiffies_up_relative(HZ));
+
+	mutex_unlock(&i915->drm.struct_mutex);
 }
 
 static void retire_work_handler(struct work_struct *work)
-- 
2.20.1


* [PATCH 04/39] drm/i915: Enable refcount debugging for default debug levels
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (2 preceding siblings ...)
  2019-06-14  7:09 ` [PATCH 03/39] drm/i915: Avoid tainting i915_gem_park() with wakeref.lock Chris Wilson
@ 2019-06-14  7:09 ` Chris Wilson
  2019-06-14 15:11   ` Mika Kuoppala
  2019-06-14 15:11   ` Mika Kuoppala
  2019-06-14  7:09 ` [PATCH 05/39] drm/i915: Keep contexts pinned until after the next kernel context switch Chris Wilson
                   ` (36 subsequent siblings)
  40 siblings, 2 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:09 UTC (permalink / raw)
  To: intel-gfx

refcount_t is our first line of defence against use-after-free, so let's
enable it for debugging.
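
For context, a generic sketch (not i915 code) of the refcount_t pattern
that REFCOUNT_FULL verifies:

  #include <linux/refcount.h>

  struct foo {
          refcount_t ref;
  };

  /* With REFCOUNT_FULL, the refcount_t ops are fully checked: e.g.
   * incrementing from zero (resurrecting a freed object) or
   * overflowing the count WARNs instead of silently corrupting the
   * count and setting up a use-after-free.
   */
  static bool foo_get(struct foo *f)
  {
          return refcount_inc_not_zero(&f->ref);
  }

  static bool foo_put(struct foo *f)
  {
          return refcount_dec_and_test(&f->ref); /* true: caller frees */
  }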

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/Kconfig.debug | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/Kconfig.debug b/drivers/gpu/drm/i915/Kconfig.debug
index 09aa0f4c8bf1..8d922bb4d953 100644
--- a/drivers/gpu/drm/i915/Kconfig.debug
+++ b/drivers/gpu/drm/i915/Kconfig.debug
@@ -21,6 +21,7 @@ config DRM_I915_DEBUG
         depends on DRM_I915
         select DEBUG_FS
         select PREEMPT_COUNT
+        select REFCOUNT_FULL
         select I2C_CHARDEV
         select STACKDEPOT
         select DRM_DP_AUX_CHARDEV
-- 
2.20.1


* [PATCH 05/39] drm/i915: Keep contexts pinned until after the next kernel context switch
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (3 preceding siblings ...)
  2019-06-14  7:09 ` [PATCH 04/39] drm/i915: Enable refcount debugging for default debug levels Chris Wilson
@ 2019-06-14  7:09 ` Chris Wilson
  2019-06-14  7:09 ` [PATCH 06/39] drm/i915: Stop retiring along engine Chris Wilson
                   ` (35 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:09 UTC (permalink / raw)
  To: intel-gfx

We need to keep the context image pinned in memory until after the GPU
has finished writing into it. Since it continues to write as we signal
the final breadcrumb, we need to keep it pinned until the request after
it is complete. Currently we know the order in which requests execute on
each engine, and so to remove that presumption we need to identify a
request/context-switch we know must occur after our completion. Any
request queued after the signal must imply a context switch; for
simplicity we use a fresh request from the kernel context.

The sequence of operations for keeping the context pinned until saved is:

 - On context activation, we preallocate a node for each physical engine
   the context may operate on. This is to avoid allocations during
   unpinning, which may be from inside FS_RECLAIM context (aka the
   shrinker)

 - On context deactivation, i.e. on retirement of the last active
   request (which is before we know the context has been saved), we add
   the preallocated node onto a barrier list on each engine

 - On engine idling, we emit a switch to kernel context. When this
   switch completes, we know that all previous contexts must have been
   saved, and so on retiring this request we can finally unpin all the
   contexts that were marked as deactivated prior to the switch.

We can enhance this in future by flushing all the idle contexts on a
regular heartbeat pulse of a switch to kernel context, which will also
be used to check for hung engines.
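
Condensed to its lifecycle, using the helpers this patch introduces (a
sketch with error handling elided):

  /* 1. On context activation: preallocate one barrier node per
   *    physical engine (may allocate, so done outside the shrinker).
   */
  err = i915_active_acquire_preallocate_barrier(&ce->active, ce->engine);

  /* 2. On final unpin: move the preallocated nodes onto each engine's
   *    barrier_tasks list; no allocation here, safe under FS_RECLAIM.
   */
  i915_active_acquire_barrier(&ce->active);

  /* 3. When the engine idles, the switch to the kernel context adopts
   *    the barriers, so retiring that request drops the context pins.
   */
  i915_request_add_barriers(rq);  /* rq: the kernel context request */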

v2: intel_context_active_acquire/_release

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 24 ++----
 drivers/gpu/drm/i915/gem/i915_gem_context.h   |  1 -
 drivers/gpu/drm/i915/gem/i915_gem_pm.c        | 20 ++++-
 drivers/gpu/drm/i915/gt/intel_context.c       | 80 ++++++++++++++++---
 drivers/gpu/drm/i915/gt/intel_context.h       |  3 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |  6 +-
 drivers/gpu/drm/i915/gt/intel_engine.h        |  2 -
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 23 +-----
 drivers/gpu/drm/i915/gt/intel_engine_pm.c     |  2 +
 drivers/gpu/drm/i915/gt/intel_engine_types.h  | 13 +--
 drivers/gpu/drm/i915/gt/intel_lrc.c           | 62 ++------------
 drivers/gpu/drm/i915/gt/intel_ringbuffer.c    | 44 +---------
 drivers/gpu/drm/i915/gt/mock_engine.c         | 11 +--
 drivers/gpu/drm/i915/i915_active.c            | 78 ++++++++++++++++++
 drivers/gpu/drm/i915/i915_active.h            |  5 ++
 drivers/gpu/drm/i915/i915_active_types.h      |  3 +
 drivers/gpu/drm/i915/i915_gem.c               |  4 -
 drivers/gpu/drm/i915/i915_request.c           | 15 ----
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  1 -
 19 files changed, 213 insertions(+), 184 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index c86ca9f21532..6200060aef05 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -692,17 +692,6 @@ int i915_gem_contexts_init(struct drm_i915_private *dev_priv)
 	return 0;
 }
 
-void i915_gem_contexts_lost(struct drm_i915_private *dev_priv)
-{
-	struct intel_engine_cs *engine;
-	enum intel_engine_id id;
-
-	lockdep_assert_held(&dev_priv->drm.struct_mutex);
-
-	for_each_engine(engine, dev_priv, id)
-		intel_engine_lost_context(engine);
-}
-
 void i915_gem_contexts_fini(struct drm_i915_private *i915)
 {
 	lockdep_assert_held(&i915->drm.struct_mutex);
@@ -1203,10 +1192,6 @@ gen8_modify_rpcs(struct intel_context *ce, struct intel_sseu sseu)
 	if (ret)
 		goto out_add;
 
-	ret = gen8_emit_rpcs_config(rq, ce, sseu);
-	if (ret)
-		goto out_add;
-
 	/*
 	 * Guarantee context image and the timeline remains pinned until the
 	 * modifying request is retired by setting the ce activity tracker.
@@ -1214,9 +1199,12 @@ gen8_modify_rpcs(struct intel_context *ce, struct intel_sseu sseu)
 	 * But we only need to take one pin on the account of it. Or in other
 	 * words transfer the pinned ce object to tracked active request.
 	 */
-	if (!i915_active_request_isset(&ce->active_tracker))
-		__intel_context_pin(ce);
-	__i915_active_request_set(&ce->active_tracker, rq);
+	GEM_BUG_ON(i915_active_is_idle(&ce->active));
+	ret = i915_active_ref(&ce->active, rq->fence.context, rq);
+	if (ret)
+		goto out_add;
+
+	ret = gen8_emit_rpcs_config(rq, ce, sseu);
 
 out_add:
 	i915_request_add(rq);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h
index 630392c77e48..9691dd062f72 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
@@ -134,7 +134,6 @@ static inline bool i915_gem_context_is_kernel(struct i915_gem_context *ctx)
 
 /* i915_gem_context.c */
 int __must_check i915_gem_contexts_init(struct drm_i915_private *dev_priv);
-void i915_gem_contexts_lost(struct drm_i915_private *dev_priv);
 void i915_gem_contexts_fini(struct drm_i915_private *dev_priv);
 
 int i915_gem_context_open(struct drm_i915_private *i915,
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
index a33f69610d6f..05011d4a3b88 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
@@ -10,6 +10,22 @@
 #include "i915_drv.h"
 #include "i915_globals.h"
 
+static void call_idle_barriers(struct intel_engine_cs *engine)
+{
+	struct llist_node *node, *next;
+
+	llist_for_each_safe(node, next, llist_del_all(&engine->barrier_tasks)) {
+		struct i915_active_request *active =
+			container_of((struct list_head *)node,
+				     typeof(*active), link);
+
+		INIT_LIST_HEAD(&active->link);
+		RCU_INIT_POINTER(active->request, NULL);
+
+		active->retire(active, NULL);
+	}
+}
+
 static void i915_gem_park(struct drm_i915_private *i915)
 {
 	struct intel_engine_cs *engine;
@@ -17,8 +33,10 @@ static void i915_gem_park(struct drm_i915_private *i915)
 
 	lockdep_assert_held(&i915->drm.struct_mutex);
 
-	for_each_engine(engine, i915, id)
+	for_each_engine(engine, i915, id) {
+		call_idle_barriers(engine); /* cleanup after wedging */
 		i915_gem_batch_pool_fini(&engine->batch_pool);
+	}
 
 	i915_timelines_park(i915);
 	i915_vma_parked(i915);
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index c78ec0b58e77..8e299c631575 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -61,7 +61,6 @@ int __intel_context_do_pin(struct intel_context *ce)
 
 		i915_gem_context_get(ce->gem_context); /* for ctx->ppgtt */
 
-		intel_context_get(ce);
 		smp_mb__before_atomic(); /* flush pin before it is visible */
 	}
 
@@ -89,20 +88,45 @@ void intel_context_unpin(struct intel_context *ce)
 		ce->ops->unpin(ce);
 
 		i915_gem_context_put(ce->gem_context);
-		intel_context_put(ce);
+		intel_context_active_release(ce);
 	}
 
 	mutex_unlock(&ce->pin_mutex);
 	intel_context_put(ce);
 }
 
-static void intel_context_retire(struct i915_active_request *active,
-				 struct i915_request *rq)
+static int __context_pin_state(struct i915_vma *vma, unsigned long flags)
 {
-	struct intel_context *ce =
-		container_of(active, typeof(*ce), active_tracker);
+	int err;
 
-	intel_context_unpin(ce);
+	err = i915_vma_pin(vma, 0, 0, flags | PIN_GLOBAL);
+	if (err)
+		return err;
+
+	/*
+	 * And mark it as a globally pinned object to let the shrinker know
+	 * it cannot reclaim the object until we release it.
+	 */
+	vma->obj->pin_global++;
+	vma->obj->mm.dirty = true;
+
+	return 0;
+}
+
+static void __context_unpin_state(struct i915_vma *vma)
+{
+	vma->obj->pin_global--;
+	__i915_vma_unpin(vma);
+}
+
+static void intel_context_retire(struct i915_active *active)
+{
+	struct intel_context *ce = container_of(active, typeof(*ce), active);
+
+	if (ce->state)
+		__context_unpin_state(ce->state);
+
+	intel_context_put(ce);
 }
 
 void
@@ -125,8 +149,46 @@ intel_context_init(struct intel_context *ce,
 
 	mutex_init(&ce->pin_mutex);
 
-	i915_active_request_init(&ce->active_tracker,
-				 NULL, intel_context_retire);
+	i915_active_init(ctx->i915, &ce->active, intel_context_retire);
+}
+
+int intel_context_active_acquire(struct intel_context *ce, unsigned long flags)
+{
+	int err;
+
+	if (!i915_active_acquire(&ce->active))
+		return 0;
+
+	intel_context_get(ce);
+
+	if (!ce->state)
+		return 0;
+
+	err = __context_pin_state(ce->state, flags);
+	if (err) {
+		i915_active_cancel(&ce->active);
+		intel_context_put(ce);
+		return err;
+	}
+
+	/* Preallocate tracking nodes */
+	if (!i915_gem_context_is_kernel(ce->gem_context)) {
+		err = i915_active_acquire_preallocate_barrier(&ce->active,
+							      ce->engine);
+		if (err) {
+			i915_active_release(&ce->active);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+void intel_context_active_release(struct intel_context *ce)
+{
+	/* Nodes preallocated in intel_context_active() */
+	i915_active_acquire_barrier(&ce->active);
+	i915_active_release(&ce->active);
 }
 
 static void i915_global_context_shrink(void)
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index 6d5453ba2c1e..a47275bc4f01 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -102,6 +102,9 @@ static inline void intel_context_exit(struct intel_context *ce)
 		ce->ops->exit(ce);
 }
 
+int intel_context_active_acquire(struct intel_context *ce, unsigned long flags);
+void intel_context_active_release(struct intel_context *ce);
+
 static inline struct intel_context *intel_context_get(struct intel_context *ce)
 {
 	kref_get(&ce->ref);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 825fcf0ac9c4..e95be4be9612 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -56,10 +56,10 @@ struct intel_context {
 	intel_engine_mask_t saturated; /* submitting semaphores too late? */
 
 	/**
-	 * active_tracker: Active tracker for the external rq activity
-	 * on this intel_context object.
+	 * active: Active tracker for the rq activity (inc. external) on this
+	 * intel_context object.
 	 */
-	struct i915_active_request active_tracker;
+	struct i915_active active;
 
 	const struct intel_context_ops *ops;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index 1439fa4093ac..3b5a6d152997 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -467,8 +467,6 @@ static inline void intel_engine_reset(struct intel_engine_cs *engine,
 bool intel_engine_is_idle(struct intel_engine_cs *engine);
 bool intel_engines_are_idle(struct drm_i915_private *dev_priv);
 
-void intel_engine_lost_context(struct intel_engine_cs *engine);
-
 void intel_engines_reset_default_submission(struct drm_i915_private *i915);
 unsigned int intel_engines_has_context_isolation(struct drm_i915_private *i915);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index c0d986db5a75..5a08036ae774 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -611,6 +611,8 @@ static int intel_engine_setup_common(struct intel_engine_cs *engine)
 {
 	int err;
 
+	init_llist_head(&engine->barrier_tasks);
+
 	err = init_status_page(engine);
 	if (err)
 		return err;
@@ -870,6 +872,7 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine)
 	if (engine->preempt_context)
 		intel_context_unpin(engine->preempt_context);
 	intel_context_unpin(engine->kernel_context);
+	GEM_BUG_ON(!llist_empty(&engine->barrier_tasks));
 
 	i915_timeline_fini(&engine->timeline);
 
@@ -1201,26 +1204,6 @@ void intel_engines_reset_default_submission(struct drm_i915_private *i915)
 		engine->set_default_submission(engine);
 }
 
-/**
- * intel_engine_lost_context: called when the GPU is reset into unknown state
- * @engine: the engine
- *
- * We have either reset the GPU or otherwise about to lose state tracking of
- * the current GPU logical state (e.g. suspend). On next use, it is therefore
- * imperative that we make no presumptions about the current state and load
- * from scratch.
- */
-void intel_engine_lost_context(struct intel_engine_cs *engine)
-{
-	struct intel_context *ce;
-
-	lockdep_assert_held(&engine->i915->drm.struct_mutex);
-
-	ce = fetch_and_zero(&engine->last_retired_context);
-	if (ce)
-		intel_context_unpin(ce);
-}
-
 bool intel_engine_can_store_dword(struct intel_engine_cs *engine)
 {
 	switch (INTEL_GEN(engine->i915)) {
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index ccf034764741..3c448a061abd 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -88,6 +88,8 @@ static bool switch_to_kernel_context(struct intel_engine_cs *engine)
 
 	/* Check again on the next retirement. */
 	engine->wakeref_serial = engine->serial + 1;
+
+	i915_request_add_barriers(rq);
 	__i915_request_commit(rq);
 
 	return false;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 01223864237a..33a31aa2d2ae 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -11,6 +11,7 @@
 #include <linux/irq_work.h>
 #include <linux/kref.h>
 #include <linux/list.h>
+#include <linux/llist.h>
 #include <linux/types.h>
 
 #include "i915_gem.h"
@@ -288,6 +289,7 @@ struct intel_engine_cs {
 	struct intel_ring *buffer;
 
 	struct i915_timeline timeline;
+	struct llist_head barrier_tasks;
 
 	struct intel_context *kernel_context; /* pinned */
 	struct intel_context *preempt_context; /* pinned; optional */
@@ -435,17 +437,6 @@ struct intel_engine_cs {
 
 	struct intel_engine_execlists execlists;
 
-	/* Contexts are pinned whilst they are active on the GPU. The last
-	 * context executed remains active whilst the GPU is idle - the
-	 * switch away and write to the context object only occurs on the
-	 * next execution.  Contexts are only unpinned on retirement of the
-	 * following request ensuring that we can always write to the object
-	 * on the context switch even after idling. Across suspend, we switch
-	 * to the kernel context and trash it as the save may not happen
-	 * before the hardware is powered down.
-	 */
-	struct intel_context *last_retired_context;
-
 	/* status_notifier: list of callbacks for context-switch changes */
 	struct atomic_notifier_head context_status_notifier;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index b8f5592da18f..d0a51752386f 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1422,60 +1422,11 @@ static void execlists_context_destroy(struct kref *kref)
 	intel_context_free(ce);
 }
 
-static int __context_pin(struct i915_vma *vma)
-{
-	unsigned int flags;
-	int err;
-
-	flags = PIN_GLOBAL | PIN_HIGH;
-	flags |= PIN_OFFSET_BIAS | i915_ggtt_pin_bias(vma);
-
-	err = i915_vma_pin(vma, 0, 0, flags);
-	if (err)
-		return err;
-
-	vma->obj->pin_global++;
-	vma->obj->mm.dirty = true;
-
-	return 0;
-}
-
-static void __context_unpin(struct i915_vma *vma)
-{
-	vma->obj->pin_global--;
-	__i915_vma_unpin(vma);
-}
-
 static void execlists_context_unpin(struct intel_context *ce)
 {
-	struct intel_engine_cs *engine;
-
-	/*
-	 * The tasklet may still be using a pointer to our state, via an
-	 * old request. However, since we know we only unpin the context
-	 * on retirement of the following request, we know that the last
-	 * request referencing us will have had a completion CS interrupt.
-	 * If we see that it is still active, it means that the tasklet hasn't
-	 * had the chance to run yet; let it run before we teardown the
-	 * reference it may use.
-	 */
-	engine = READ_ONCE(ce->inflight);
-	if (unlikely(engine)) {
-		unsigned long flags;
-
-		spin_lock_irqsave(&engine->timeline.lock, flags);
-		process_csb(engine);
-		spin_unlock_irqrestore(&engine->timeline.lock, flags);
-
-		GEM_BUG_ON(READ_ONCE(ce->inflight));
-	}
-
 	i915_gem_context_unpin_hw_id(ce->gem_context);
-
-	intel_ring_unpin(ce->ring);
-
 	i915_gem_object_unpin_map(ce->state->obj);
-	__context_unpin(ce->state);
+	intel_ring_unpin(ce->ring);
 }
 
 static void
@@ -1512,7 +1463,10 @@ __execlists_context_pin(struct intel_context *ce,
 		goto err;
 	GEM_BUG_ON(!ce->state);
 
-	ret = __context_pin(ce->state);
+	ret = intel_context_active_acquire(ce,
+					   engine->i915->ggtt.pin_bias |
+					   PIN_OFFSET_BIAS |
+					   PIN_HIGH);
 	if (ret)
 		goto err;
 
@@ -1521,7 +1475,7 @@ __execlists_context_pin(struct intel_context *ce,
 					I915_MAP_OVERRIDE);
 	if (IS_ERR(vaddr)) {
 		ret = PTR_ERR(vaddr);
-		goto unpin_vma;
+		goto unpin_active;
 	}
 
 	ret = intel_ring_pin(ce->ring);
@@ -1542,8 +1496,8 @@ __execlists_context_pin(struct intel_context *ce,
 	intel_ring_unpin(ce->ring);
 unpin_map:
 	i915_gem_object_unpin_map(ce->state->obj);
-unpin_vma:
-	__context_unpin(ce->state);
+unpin_active:
+	intel_context_active_release(ce);
 err:
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
index b3bf47e8162f..cc901edec09a 100644
--- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
@@ -1349,45 +1349,9 @@ static void __context_unpin_ppgtt(struct i915_gem_context *ctx)
 		gen6_ppgtt_unpin(i915_vm_to_ppgtt(vm));
 }
 
-static int __context_pin(struct intel_context *ce)
-{
-	struct i915_vma *vma;
-	int err;
-
-	vma = ce->state;
-	if (!vma)
-		return 0;
-
-	err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL | PIN_HIGH);
-	if (err)
-		return err;
-
-	/*
-	 * And mark is as a globally pinned object to let the shrinker know
-	 * it cannot reclaim the object until we release it.
-	 */
-	vma->obj->pin_global++;
-	vma->obj->mm.dirty = true;
-
-	return 0;
-}
-
-static void __context_unpin(struct intel_context *ce)
-{
-	struct i915_vma *vma;
-
-	vma = ce->state;
-	if (!vma)
-		return;
-
-	vma->obj->pin_global--;
-	i915_vma_unpin(vma);
-}
-
 static void ring_context_unpin(struct intel_context *ce)
 {
 	__context_unpin_ppgtt(ce->gem_context);
-	__context_unpin(ce);
 }
 
 static struct i915_vma *
@@ -1477,18 +1441,18 @@ static int ring_context_pin(struct intel_context *ce)
 		ce->state = vma;
 	}
 
-	err = __context_pin(ce);
+	err = intel_context_active_acquire(ce, PIN_HIGH);
 	if (err)
 		return err;
 
 	err = __context_pin_ppgtt(ce->gem_context);
 	if (err)
-		goto err_unpin;
+		goto err_active;
 
 	return 0;
 
-err_unpin:
-	__context_unpin(ce);
+err_active:
+	intel_context_active_release(ce);
 	return err;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index 6d7562769eb2..d1ef515bac8d 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -146,12 +146,18 @@ static void mock_context_destroy(struct kref *ref)
 
 static int mock_context_pin(struct intel_context *ce)
 {
+	int ret;
+
 	if (!ce->ring) {
 		ce->ring = mock_ring(ce->engine);
 		if (!ce->ring)
 			return -ENOMEM;
 	}
 
+	ret = intel_context_active_acquire(ce, PIN_HIGH);
+	if (ret)
+		return ret;
+
 	mock_timeline_pin(ce->ring->timeline);
 	return 0;
 }
@@ -328,14 +334,9 @@ void mock_engine_free(struct intel_engine_cs *engine)
 {
 	struct mock_engine *mock =
 		container_of(engine, typeof(*mock), base);
-	struct intel_context *ce;
 
 	GEM_BUG_ON(timer_pending(&mock->hw_delay));
 
-	ce = fetch_and_zero(&engine->last_retired_context);
-	if (ce)
-		intel_context_unpin(ce);
-
 	intel_context_unpin(engine->kernel_context);
 
 	intel_engine_fini_breadcrumbs(engine);
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index 863ae12707ba..2d019ac6db20 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -157,6 +157,7 @@ void i915_active_init(struct drm_i915_private *i915,
 	ref->retire = retire;
 	ref->tree = RB_ROOT;
 	i915_active_request_init(&ref->last, NULL, last_retire);
+	init_llist_head(&ref->barriers);
 	ref->count = 0;
 }
 
@@ -263,6 +264,83 @@ void i915_active_fini(struct i915_active *ref)
 }
 #endif
 
+int i915_active_acquire_preallocate_barrier(struct i915_active *ref,
+					    struct intel_engine_cs *engine)
+{
+	struct drm_i915_private *i915 = engine->i915;
+	unsigned long tmp;
+	int err = 0;
+
+	GEM_BUG_ON(!engine->mask);
+	for_each_engine_masked(engine, i915, engine->mask, tmp) {
+		struct intel_context *kctx = engine->kernel_context;
+		struct active_node *node;
+
+		node = kmem_cache_alloc(global.slab_cache, GFP_KERNEL);
+		if (unlikely(!node)) {
+			err = -ENOMEM;
+			break;
+		}
+
+		i915_active_request_init(&node->base,
+					 (void *)engine, node_retire);
+		node->timeline = kctx->ring->timeline->fence_context;
+		node->ref = ref;
+		ref->count++;
+
+		llist_add((struct llist_node *)&node->base.link,
+			  &ref->barriers);
+	}
+
+	return err;
+}
+
+void i915_active_acquire_barrier(struct i915_active *ref)
+{
+	struct llist_node *pos, *next;
+
+	i915_active_acquire(ref);
+
+	llist_for_each_safe(pos, next, llist_del_all(&ref->barriers)) {
+		struct intel_engine_cs *engine;
+		struct active_node *node;
+		struct rb_node **p, *parent;
+
+		node = container_of((struct list_head *)pos,
+				    typeof(*node), base.link);
+
+		engine = (void *)rcu_access_pointer(node->base.request);
+		RCU_INIT_POINTER(node->base.request, ERR_PTR(-EAGAIN));
+
+		parent = NULL;
+		p = &ref->tree.rb_node;
+		while (*p) {
+			parent = *p;
+			if (rb_entry(parent,
+				     struct active_node,
+				     node)->timeline < node->timeline)
+				p = &parent->rb_right;
+			else
+				p = &parent->rb_left;
+		}
+		rb_link_node(&node->node, parent, p);
+		rb_insert_color(&node->node, &ref->tree);
+
+		llist_add((struct llist_node *)&node->base.link,
+			  &engine->barrier_tasks);
+	}
+	i915_active_release(ref);
+}
+
+void i915_request_add_barriers(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine = rq->engine;
+	struct llist_node *node, *next;
+
+	llist_for_each_safe(node, next, llist_del_all(&engine->barrier_tasks))
+		list_add_tail((struct list_head *)node, &rq->active_list);
+}
+
 int i915_active_request_set(struct i915_active_request *active,
 			    struct i915_request *rq)
 {
diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
index 7d758719ce39..d55d37673944 100644
--- a/drivers/gpu/drm/i915/i915_active.h
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -406,4 +406,9 @@ void i915_active_fini(struct i915_active *ref);
 static inline void i915_active_fini(struct i915_active *ref) { }
 #endif
 
+int i915_active_acquire_preallocate_barrier(struct i915_active *ref,
+					    struct intel_engine_cs *engine);
+void i915_active_acquire_barrier(struct i915_active *ref);
+void i915_request_add_barriers(struct i915_request *rq);
+
 #endif /* _I915_ACTIVE_H_ */
diff --git a/drivers/gpu/drm/i915/i915_active_types.h b/drivers/gpu/drm/i915/i915_active_types.h
index b679253b53a5..c025991b9233 100644
--- a/drivers/gpu/drm/i915/i915_active_types.h
+++ b/drivers/gpu/drm/i915/i915_active_types.h
@@ -7,6 +7,7 @@
 #ifndef _I915_ACTIVE_TYPES_H_
 #define _I915_ACTIVE_TYPES_H_
 
+#include <linux/llist.h>
 #include <linux/rbtree.h>
 #include <linux/rcupdate.h>
 
@@ -31,6 +32,8 @@ struct i915_active {
 	unsigned int count;
 
 	void (*retire)(struct i915_active *ref);
+
+	struct llist_head barriers;
 };
 
 #endif /* _I915_ACTIVE_TYPES_H_ */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7232361973fd..314c532490df 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1198,10 +1198,6 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
 
 	intel_uncore_forcewake_put(&i915->uncore, FORCEWAKE_ALL);
 	intel_runtime_pm_put(i915, wakeref);
-
-	mutex_lock(&i915->drm.struct_mutex);
-	i915_gem_contexts_lost(i915);
-	mutex_unlock(&i915->drm.struct_mutex);
 }
 
 void i915_gem_init_swizzling(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 5311286578b7..af757a958553 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -213,18 +213,6 @@ static void __retire_engine_request(struct intel_engine_cs *engine,
 	spin_unlock(&rq->lock);
 
 	local_irq_enable();
-
-	/*
-	 * The backing object for the context is done after switching to the
-	 * *next* context. Therefore we cannot retire the previous context until
-	 * the next context has already started running. However, since we
-	 * cannot take the required locks at i915_request_submit() we
-	 * defer the unpinning of the active context to now, retirement of
-	 * the subsequent request.
-	 */
-	if (engine->last_retired_context)
-		intel_context_unpin(engine->last_retired_context);
-	engine->last_retired_context = rq->hw_context;
 }
 
 static void __retire_engine_upto(struct intel_engine_cs *engine,
@@ -759,9 +747,6 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
 
 	rq->infix = rq->ring->emit; /* end of header; start of user payload */
 
-	/* Keep a second pin for the dual retirement along engine and ring */
-	__intel_context_pin(ce);
-
 	intel_context_mark_active(ce);
 	return rq;
 
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index b7f3fbb4ae89..a96d0c012d46 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -56,7 +56,6 @@ static void mock_device_release(struct drm_device *dev)
 
 	mutex_lock(&i915->drm.struct_mutex);
 	mock_device_flush(i915);
-	i915_gem_contexts_lost(i915);
 	mutex_unlock(&i915->drm.struct_mutex);
 
 	flush_work(&i915->gem.idle_work);
-- 
2.20.1


* [PATCH 06/39] drm/i915: Stop retiring along engine
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (4 preceding siblings ...)
  2019-06-14  7:09 ` [PATCH 05/39] drm/i915: Keep contexts pinned until after the next kernel context switch Chris Wilson
@ 2019-06-14  7:09 ` Chris Wilson
  2019-06-14  7:09 ` [PATCH 07/39] drm/i915: Replace engine->timeline with a plain list Chris Wilson
                   ` (34 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:09 UTC (permalink / raw)
  To: intel-gfx

We no longer track the execution order along the engine, and so no
longer need to enforce the ordering of retirement along the engine.
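
That is, retirement now walks each ring's request list in order
(condensed from ring_retire_requests() in the diff below;
i915_request_retire() now returns false on the first incomplete
request):

  static void ring_retire_requests(struct intel_ring *ring)
  {
          struct i915_request *rq, *rn;

          list_for_each_entry_safe(rq, rn, &ring->request_list, ring_link)
                  if (!i915_request_retire(rq))
                          break;
  }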

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_request.c | 128 +++++++++++-----------------
 1 file changed, 52 insertions(+), 76 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index af757a958553..14a9c5a969e9 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -183,72 +183,23 @@ static void free_capture_list(struct i915_request *request)
 	}
 }
 
-static void __retire_engine_request(struct intel_engine_cs *engine,
-				    struct i915_request *rq)
-{
-	GEM_TRACE("%s(%s) fence %llx:%lld, current %d\n",
-		  __func__, engine->name,
-		  rq->fence.context, rq->fence.seqno,
-		  hwsp_seqno(rq));
-
-	GEM_BUG_ON(!i915_request_completed(rq));
-
-	local_irq_disable();
-
-	spin_lock(&engine->timeline.lock);
-	GEM_BUG_ON(!list_is_first(&rq->link, &engine->timeline.requests));
-	list_del_init(&rq->link);
-	spin_unlock(&engine->timeline.lock);
-
-	spin_lock(&rq->lock);
-	i915_request_mark_complete(rq);
-	if (!i915_request_signaled(rq))
-		dma_fence_signal_locked(&rq->fence);
-	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags))
-		i915_request_cancel_breadcrumb(rq);
-	if (rq->waitboost) {
-		GEM_BUG_ON(!atomic_read(&rq->i915->gt_pm.rps.num_waiters));
-		atomic_dec(&rq->i915->gt_pm.rps.num_waiters);
-	}
-	spin_unlock(&rq->lock);
-
-	local_irq_enable();
-}
-
-static void __retire_engine_upto(struct intel_engine_cs *engine,
-				 struct i915_request *rq)
-{
-	struct i915_request *tmp;
-
-	if (list_empty(&rq->link))
-		return;
-
-	do {
-		tmp = list_first_entry(&engine->timeline.requests,
-				       typeof(*tmp), link);
-
-		GEM_BUG_ON(tmp->engine != engine);
-		__retire_engine_request(engine, tmp);
-	} while (tmp != rq);
-}
-
-static void i915_request_retire(struct i915_request *request)
+static bool i915_request_retire(struct i915_request *rq)
 {
 	struct i915_active_request *active, *next;
 
-	GEM_TRACE("%s fence %llx:%lld, current %d\n",
-		  request->engine->name,
-		  request->fence.context, request->fence.seqno,
-		  hwsp_seqno(request));
+	lockdep_assert_held(&rq->i915->drm.struct_mutex);
+	if (!i915_request_completed(rq))
+		return false;
 
-	lockdep_assert_held(&request->i915->drm.struct_mutex);
-	GEM_BUG_ON(!i915_sw_fence_signaled(&request->submit));
-	GEM_BUG_ON(!i915_request_completed(request));
+	GEM_TRACE("%s fence %llx:%lld, current %d\n",
+		  rq->engine->name,
+		  rq->fence.context, rq->fence.seqno,
+		  hwsp_seqno(rq));
 
-	trace_i915_request_retire(request);
+	GEM_BUG_ON(!i915_sw_fence_signaled(&rq->submit));
+	trace_i915_request_retire(rq);
 
-	advance_ring(request);
-	free_capture_list(request);
+	advance_ring(rq);
 
 	/*
 	 * Walk through the active list, calling retire on each. This allows
@@ -260,7 +211,7 @@ static void i915_request_retire(struct i915_request *request)
 	 * pass along the auxiliary information (to avoid dereferencing
 	 * the node after the callback).
 	 */
-	list_for_each_entry_safe(active, next, &request->active_list, link) {
+	list_for_each_entry_safe(active, next, &rq->active_list, link) {
 		/*
 		 * In microbenchmarks or focusing upon time inside the kernel,
 		 * we may spend an inordinate amount of time simply handling
@@ -276,18 +227,39 @@ static void i915_request_retire(struct i915_request *request)
 		INIT_LIST_HEAD(&active->link);
 		RCU_INIT_POINTER(active->request, NULL);
 
-		active->retire(active, request);
+		active->retire(active, rq);
+	}
+
+	local_irq_disable();
+
+	spin_lock(&rq->engine->timeline.lock);
+	list_del(&rq->link);
+	spin_unlock(&rq->engine->timeline.lock);
+
+	spin_lock(&rq->lock);
+	i915_request_mark_complete(rq);
+	if (!i915_request_signaled(rq))
+		dma_fence_signal_locked(&rq->fence);
+	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags))
+		i915_request_cancel_breadcrumb(rq);
+	if (rq->waitboost) {
+		GEM_BUG_ON(!atomic_read(&rq->i915->gt_pm.rps.num_waiters));
+		atomic_dec(&rq->i915->gt_pm.rps.num_waiters);
 	}
+	spin_unlock(&rq->lock);
+
+	local_irq_enable();
 
-	i915_request_remove_from_client(request);
+	intel_context_exit(rq->hw_context);
+	intel_context_unpin(rq->hw_context);
 
-	__retire_engine_upto(request->engine, request);
+	i915_request_remove_from_client(rq);
 
-	intel_context_exit(request->hw_context);
-	intel_context_unpin(request->hw_context);
+	free_capture_list(rq);
+	i915_sched_node_fini(&rq->sched);
+	i915_request_put(rq);
 
-	i915_sched_node_fini(&request->sched);
-	i915_request_put(request);
+	return true;
 }
 
 void i915_request_retire_upto(struct i915_request *rq)
@@ -309,9 +281,7 @@ void i915_request_retire_upto(struct i915_request *rq)
 	do {
 		tmp = list_first_entry(&ring->request_list,
 				       typeof(*tmp), ring_link);
-
-		i915_request_retire(tmp);
-	} while (tmp != rq);
+	} while (i915_request_retire(tmp) && tmp != rq);
 }
 
 static void irq_execute_cb(struct irq_work *wrk)
@@ -600,12 +570,9 @@ static void ring_retire_requests(struct intel_ring *ring)
 {
 	struct i915_request *rq, *rn;
 
-	list_for_each_entry_safe(rq, rn, &ring->request_list, ring_link) {
-		if (!i915_request_completed(rq))
+	list_for_each_entry_safe(rq, rn, &ring->request_list, ring_link)
+		if (!i915_request_retire(rq))
 			break;
-
-		i915_request_retire(rq);
-	}
 }
 
 static noinline struct i915_request *
@@ -620,6 +587,15 @@ request_alloc_slow(struct intel_context *ce, gfp_t gfp)
 	if (!gfpflags_allow_blocking(gfp))
 		goto out;
 
+	/* Move our oldest request to the slab-cache (if not in use!) */
+	rq = list_first_entry(&ring->request_list, typeof(*rq), ring_link);
+	i915_request_retire(rq);
+
+	rq = kmem_cache_alloc(global.slab_requests,
+			      gfp | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
+	if (rq)
+		return rq;
+
 	/* Ratelimit ourselves to prevent oom from malicious clients */
 	rq = list_last_entry(&ring->request_list, typeof(*rq), ring_link);
 	cond_synchronize_rcu(rq->rcustate);
-- 
2.20.1


* [PATCH 07/39] drm/i915: Replace engine->timeline with a plain list
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (5 preceding siblings ...)
  2019-06-14  7:09 ` [PATCH 06/39] drm/i915: Stop retiring along engine Chris Wilson
@ 2019-06-14  7:09 ` Chris Wilson
  2019-06-14  7:09 ` [PATCH 08/39] drm/i915: Flush the execution-callbacks on retiring Chris Wilson
                   ` (33 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:09 UTC (permalink / raw)
  To: intel-gfx

To continue the onslaught of removing the assumption of a global
execution ordering, another casualty is the engine->timeline. Without an
actual timeline to track, it is overkill and we can replace it with a
much less grand plain list. We still need a list of requests inflight,
for the simple purpose of finding inflight requests (for retiring,
resetting, preemption etc).
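
The per-engine tracking thus reduces to a lock plus a list (a sketch of
the structure and the lookup pattern this patch introduces):

  struct intel_engine_cs {
          /* ... */
          struct {
                  spinlock_t lock;
                  struct list_head requests;      /* submission order */
          } active;
          /* ... */
  };

  /* Finding an inflight request becomes a simple walk: */
  spin_lock_irqsave(&engine->active.lock, flags);
  list_for_each_entry(rq, &engine->active.requests, sched.link) {
          /* inspect rq for retiring/resetting/preemption */
  }
  spin_unlock_irqrestore(&engine->active.lock, flags);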

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_engine.h        |  6 ++
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 62 ++++++------
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  6 +-
 drivers/gpu/drm/i915/gt/intel_lrc.c           | 95 ++++++++++---------
 drivers/gpu/drm/i915/gt/intel_reset.c         | 10 +-
 drivers/gpu/drm/i915/gt/intel_ringbuffer.c    | 15 ++-
 drivers/gpu/drm/i915/gt/mock_engine.c         | 18 ++--
 drivers/gpu/drm/i915/i915_gpu_error.c         |  5 +-
 drivers/gpu/drm/i915/i915_request.c           | 43 +++------
 drivers/gpu/drm/i915/i915_request.h           |  2 +-
 drivers/gpu/drm/i915/i915_scheduler.c         | 38 ++++----
 drivers/gpu/drm/i915/i915_timeline.c          |  1 -
 drivers/gpu/drm/i915/i915_timeline.h          | 19 ----
 drivers/gpu/drm/i915/i915_timeline_types.h    |  4 -
 drivers/gpu/drm/i915/intel_guc_submission.c   | 16 ++--
 .../gpu/drm/i915/selftests/mock_timeline.c    |  1 -
 16 files changed, 153 insertions(+), 188 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index 3b5a6d152997..2f1c6871ee95 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -565,4 +565,10 @@ static inline bool inject_preempt_hang(struct intel_engine_execlists *execlists)
 
 #endif
 
+void intel_engine_init_active(struct intel_engine_cs *engine,
+			      unsigned int subclass);
+#define ENGINE_PHYSICAL	0
+#define ENGINE_MOCK	1
+#define ENGINE_VIRTUAL	2
+
 #endif /* _INTEL_RINGBUFFER_H_ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 5a08036ae774..01f50cfd517c 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -617,14 +617,7 @@ static int intel_engine_setup_common(struct intel_engine_cs *engine)
 	if (err)
 		return err;
 
-	err = i915_timeline_init(engine->i915,
-				 &engine->timeline,
-				 engine->status_page.vma);
-	if (err)
-		goto err_hwsp;
-
-	i915_timeline_set_subclass(&engine->timeline, TIMELINE_ENGINE);
-
+	intel_engine_init_active(engine, ENGINE_PHYSICAL);
 	intel_engine_init_breadcrumbs(engine);
 	intel_engine_init_execlists(engine);
 	intel_engine_init_hangcheck(engine);
@@ -637,10 +630,6 @@ static int intel_engine_setup_common(struct intel_engine_cs *engine)
 		intel_sseu_from_device_info(&RUNTIME_INFO(engine->i915)->sseu);
 
 	return 0;
-
-err_hwsp:
-	cleanup_status_page(engine);
-	return err;
 }
 
 /**
@@ -797,6 +786,27 @@ static int pin_context(struct i915_gem_context *ctx,
 	return 0;
 }
 
+void
+intel_engine_init_active(struct intel_engine_cs *engine, unsigned int subclass)
+{
+	INIT_LIST_HEAD(&engine->active.requests);
+
+	spin_lock_init(&engine->active.lock);
+	lockdep_set_subclass(&engine->active.lock, subclass);
+
+	/*
+	 * Due to an interesting quirk in lockdep's internal debug tracking,
+	 * after setting a subclass we must ensure the lock is used. Otherwise,
+	 * nr_unused_locks is incremented once too often.
+	 */
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	local_irq_disable();
+	lock_map_acquire(&engine->active.lock.dep_map);
+	lock_map_release(&engine->active.lock.dep_map);
+	local_irq_enable();
+#endif
+}
+
 /**
  * intel_engines_init_common - initialize engine state which might require hw access
  * @engine: Engine to initialize.
@@ -860,6 +870,8 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
  */
 void intel_engine_cleanup_common(struct intel_engine_cs *engine)
 {
+	GEM_BUG_ON(!list_empty(&engine->active.requests));
+
 	cleanup_status_page(engine);
 
 	intel_engine_fini_breadcrumbs(engine);
@@ -874,8 +886,6 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine)
 	intel_context_unpin(engine->kernel_context);
 	GEM_BUG_ON(!llist_empty(&engine->barrier_tasks));
 
-	i915_timeline_fini(&engine->timeline);
-
 	intel_wa_list_free(&engine->ctx_wa_list);
 	intel_wa_list_free(&engine->wa_list);
 	intel_wa_list_free(&engine->whitelist);
@@ -1482,16 +1492,6 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 
 	drm_printf(m, "\tRequests:\n");
 
-	rq = list_first_entry(&engine->timeline.requests,
-			      struct i915_request, link);
-	if (&rq->link != &engine->timeline.requests)
-		print_request(m, rq, "\t\tfirst  ");
-
-	rq = list_last_entry(&engine->timeline.requests,
-			     struct i915_request, link);
-	if (&rq->link != &engine->timeline.requests)
-		print_request(m, rq, "\t\tlast   ");
-
 	rq = intel_engine_find_active_request(engine);
 	if (rq) {
 		print_request(m, rq, "\t\tactive ");
@@ -1572,7 +1572,7 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine)
 	if (!intel_engine_supports_stats(engine))
 		return -ENODEV;
 
-	spin_lock_irqsave(&engine->timeline.lock, flags);
+	spin_lock_irqsave(&engine->active.lock, flags);
 	write_seqlock(&engine->stats.lock);
 
 	if (unlikely(engine->stats.enabled == ~0)) {
@@ -1598,7 +1598,7 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine)
 
 unlock:
 	write_sequnlock(&engine->stats.lock);
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+	spin_unlock_irqrestore(&engine->active.lock, flags);
 
 	return err;
 }
@@ -1683,22 +1683,22 @@ intel_engine_find_active_request(struct intel_engine_cs *engine)
 	 * At all other times, we must assume the GPU is still running, but
 	 * we only care about the snapshot of this moment.
 	 */
-	spin_lock_irqsave(&engine->timeline.lock, flags);
-	list_for_each_entry(request, &engine->timeline.requests, link) {
+	spin_lock_irqsave(&engine->active.lock, flags);
+	list_for_each_entry(request, &engine->active.requests, sched.link) {
 		if (i915_request_completed(request))
 			continue;
 
 		if (!i915_request_started(request))
-			break;
+			continue;
 
 		/* More than one preemptible request may match! */
 		if (!match_ring(request))
-			break;
+			continue;
 
 		active = request;
 		break;
 	}
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+	spin_unlock_irqrestore(&engine->active.lock, flags);
 
 	return active;
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 33a31aa2d2ae..b2faca8e5dec 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -288,7 +288,11 @@ struct intel_engine_cs {
 
 	struct intel_ring *buffer;
 
-	struct i915_timeline timeline;
+	struct {
+		spinlock_t lock;
+		struct list_head requests;
+	} active;
+
 	struct llist_head barrier_tasks;
 
 	struct intel_context *kernel_context; /* pinned */
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index d0a51752386f..c400c66d0ee5 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -298,8 +298,8 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
 	 * Check against the first request in ELSP[1], it will, thanks to the
 	 * power of PI, be the highest priority of that context.
 	 */
-	if (!list_is_last(&rq->link, &engine->timeline.requests) &&
-	    rq_prio(list_next_entry(rq, link)) > last_prio)
+	if (!list_is_last(&rq->sched.link, &engine->active.requests) &&
+	    rq_prio(list_next_entry(rq, sched.link)) > last_prio)
 		return true;
 
 	if (rb) {
@@ -434,11 +434,11 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 	struct list_head *uninitialized_var(pl);
 	int prio = I915_PRIORITY_INVALID;
 
-	lockdep_assert_held(&engine->timeline.lock);
+	lockdep_assert_held(&engine->active.lock);
 
 	list_for_each_entry_safe_reverse(rq, rn,
-					 &engine->timeline.requests,
-					 link) {
+					 &engine->active.requests,
+					 sched.link) {
 		struct intel_engine_cs *owner;
 
 		if (i915_request_completed(rq))
@@ -465,7 +465,7 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 			}
 			GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
 
-			list_add(&rq->sched.link, pl);
+			list_move(&rq->sched.link, pl);
 			active = rq;
 		} else {
 			rq->engine = owner;
@@ -933,11 +933,11 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
 		struct i915_request *rq;
 
-		spin_lock(&ve->base.timeline.lock);
+		spin_lock(&ve->base.active.lock);
 
 		rq = ve->request;
 		if (unlikely(!rq)) { /* lost the race to a sibling */
-			spin_unlock(&ve->base.timeline.lock);
+			spin_unlock(&ve->base.active.lock);
 			rb_erase_cached(rb, &execlists->virtual);
 			RB_CLEAR_NODE(rb);
 			rb = rb_first_cached(&execlists->virtual);
@@ -950,13 +950,13 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 
 		if (rq_prio(rq) >= queue_prio(execlists)) {
 			if (!virtual_matches(ve, rq, engine)) {
-				spin_unlock(&ve->base.timeline.lock);
+				spin_unlock(&ve->base.active.lock);
 				rb = rb_next(rb);
 				continue;
 			}
 
 			if (last && !can_merge_rq(last, rq)) {
-				spin_unlock(&ve->base.timeline.lock);
+				spin_unlock(&ve->base.active.lock);
 				return; /* leave this rq for another engine */
 			}
 
@@ -1011,7 +1011,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			last = rq;
 		}
 
-		spin_unlock(&ve->base.timeline.lock);
+		spin_unlock(&ve->base.active.lock);
 		break;
 	}
 
@@ -1068,8 +1068,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 				GEM_BUG_ON(port_isset(port));
 			}
 
-			list_del_init(&rq->sched.link);
-
 			__i915_request_submit(rq);
 			trace_i915_request_in(rq, port_index(port, execlists));
 
@@ -1170,7 +1168,7 @@ static void process_csb(struct intel_engine_cs *engine)
 	const u8 num_entries = execlists->csb_size;
 	u8 head, tail;
 
-	lockdep_assert_held(&engine->timeline.lock);
+	lockdep_assert_held(&engine->active.lock);
 
 	/*
 	 * Note that csb_write, csb_status may be either in HWSP or mmio.
@@ -1330,7 +1328,7 @@ static void process_csb(struct intel_engine_cs *engine)
 
 static void __execlists_submission_tasklet(struct intel_engine_cs *const engine)
 {
-	lockdep_assert_held(&engine->timeline.lock);
+	lockdep_assert_held(&engine->active.lock);
 
 	process_csb(engine);
 	if (!execlists_is_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT))
@@ -1351,15 +1349,16 @@ static void execlists_submission_tasklet(unsigned long data)
 		  !!intel_wakeref_active(&engine->wakeref),
 		  engine->execlists.active);
 
-	spin_lock_irqsave(&engine->timeline.lock, flags);
+	spin_lock_irqsave(&engine->active.lock, flags);
 	__execlists_submission_tasklet(engine);
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
 static void queue_request(struct intel_engine_cs *engine,
 			  struct i915_sched_node *node,
 			  int prio)
 {
+	GEM_BUG_ON(!list_empty(&node->link));
 	list_add_tail(&node->link, i915_sched_lookup_priolist(engine, prio));
 }
 
@@ -1390,7 +1389,7 @@ static void execlists_submit_request(struct i915_request *request)
 	unsigned long flags;
 
 	/* Will be called from irq-context when using foreign fences. */
-	spin_lock_irqsave(&engine->timeline.lock, flags);
+	spin_lock_irqsave(&engine->active.lock, flags);
 
 	queue_request(engine, &request->sched, rq_prio(request));
 
@@ -1399,7 +1398,7 @@ static void execlists_submit_request(struct i915_request *request)
 
 	submit_queue(engine, rq_prio(request));
 
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
 static void __execlists_context_fini(struct intel_context *ce)
@@ -2050,8 +2049,8 @@ static void execlists_reset_prepare(struct intel_engine_cs *engine)
 	intel_engine_stop_cs(engine);
 
 	/* And flush any current direct submission. */
-	spin_lock_irqsave(&engine->timeline.lock, flags);
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+	spin_lock_irqsave(&engine->active.lock, flags);
+	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
 static bool lrc_regs_ok(const struct i915_request *rq)
@@ -2094,11 +2093,11 @@ static void reset_csb_pointers(struct intel_engine_execlists *execlists)
 
 static struct i915_request *active_request(struct i915_request *rq)
 {
-	const struct list_head * const list = &rq->engine->timeline.requests;
+	const struct list_head * const list = &rq->engine->active.requests;
 	const struct intel_context * const context = rq->hw_context;
 	struct i915_request *active = NULL;
 
-	list_for_each_entry_from_reverse(rq, list, link) {
+	list_for_each_entry_from_reverse(rq, list, sched.link) {
 		if (i915_request_completed(rq))
 			break;
 
@@ -2215,11 +2214,11 @@ static void execlists_reset(struct intel_engine_cs *engine, bool stalled)
 
 	GEM_TRACE("%s\n", engine->name);
 
-	spin_lock_irqsave(&engine->timeline.lock, flags);
+	spin_lock_irqsave(&engine->active.lock, flags);
 
 	__execlists_reset(engine, stalled);
 
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
 static void nop_submission_tasklet(unsigned long data)
@@ -2250,12 +2249,12 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
 	 * submission's irq state, we also wish to remind ourselves that
 	 * it is irq state.)
 	 */
-	spin_lock_irqsave(&engine->timeline.lock, flags);
+	spin_lock_irqsave(&engine->active.lock, flags);
 
 	__execlists_reset(engine, true);
 
 	/* Mark all executing requests as skipped. */
-	list_for_each_entry(rq, &engine->timeline.requests, link) {
+	list_for_each_entry(rq, &engine->active.requests, sched.link) {
 		if (!i915_request_signaled(rq))
 			dma_fence_set_error(&rq->fence, -EIO);
 
@@ -2286,7 +2285,7 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
 		rb_erase_cached(rb, &execlists->virtual);
 		RB_CLEAR_NODE(rb);
 
-		spin_lock(&ve->base.timeline.lock);
+		spin_lock(&ve->base.active.lock);
 		if (ve->request) {
 			ve->request->engine = engine;
 			__i915_request_submit(ve->request);
@@ -2295,7 +2294,7 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
 			ve->base.execlists.queue_priority_hint = INT_MIN;
 			ve->request = NULL;
 		}
-		spin_unlock(&ve->base.timeline.lock);
+		spin_unlock(&ve->base.active.lock);
 	}
 
 	/* Remaining _unready_ requests will be nop'ed when submitted */
@@ -2307,7 +2306,7 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
 	GEM_BUG_ON(__tasklet_is_enabled(&execlists->tasklet));
 	execlists->tasklet.func = nop_submission_tasklet;
 
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
 static void execlists_reset_finish(struct intel_engine_cs *engine)
@@ -3010,12 +3009,18 @@ static int execlists_context_deferred_alloc(struct intel_context *ce,
 	return ret;
 }
 
+static struct list_head *virtual_queue(struct virtual_engine *ve)
+{
+	return &ve->base.execlists.default_priolist.requests[0];
+}
+
 static void virtual_context_destroy(struct kref *kref)
 {
 	struct virtual_engine *ve =
 		container_of(kref, typeof(*ve), context.ref);
 	unsigned int n;
 
+	GEM_BUG_ON(!list_empty(virtual_queue(ve)));
 	GEM_BUG_ON(ve->request);
 	GEM_BUG_ON(ve->context.inflight);
 
@@ -3026,13 +3031,13 @@ static void virtual_context_destroy(struct kref *kref)
 		if (RB_EMPTY_NODE(node))
 			continue;
 
-		spin_lock_irq(&sibling->timeline.lock);
+		spin_lock_irq(&sibling->active.lock);
 
 		/* Detachment is lazily performed in the execlists tasklet */
 		if (!RB_EMPTY_NODE(node))
 			rb_erase_cached(node, &sibling->execlists.virtual);
 
-		spin_unlock_irq(&sibling->timeline.lock);
+		spin_unlock_irq(&sibling->active.lock);
 	}
 	GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet));
 
@@ -3040,8 +3045,6 @@ static void virtual_context_destroy(struct kref *kref)
 		__execlists_context_fini(&ve->context);
 
 	kfree(ve->bonds);
-
-	i915_timeline_fini(&ve->base.timeline);
 	kfree(ve);
 }
 
@@ -3161,16 +3164,16 @@ static void virtual_submission_tasklet(unsigned long data)
 
 		if (unlikely(!(mask & sibling->mask))) {
 			if (!RB_EMPTY_NODE(&node->rb)) {
-				spin_lock(&sibling->timeline.lock);
+				spin_lock(&sibling->active.lock);
 				rb_erase_cached(&node->rb,
 						&sibling->execlists.virtual);
 				RB_CLEAR_NODE(&node->rb);
-				spin_unlock(&sibling->timeline.lock);
+				spin_unlock(&sibling->active.lock);
 			}
 			continue;
 		}
 
-		spin_lock(&sibling->timeline.lock);
+		spin_lock(&sibling->active.lock);
 
 		if (!RB_EMPTY_NODE(&node->rb)) {
 			/*
@@ -3214,7 +3217,7 @@ static void virtual_submission_tasklet(unsigned long data)
 			tasklet_hi_schedule(&sibling->execlists.tasklet);
 		}
 
-		spin_unlock(&sibling->timeline.lock);
+		spin_unlock(&sibling->active.lock);
 	}
 	local_irq_enable();
 }
@@ -3231,9 +3234,13 @@ static void virtual_submit_request(struct i915_request *rq)
 	GEM_BUG_ON(ve->base.submit_request != virtual_submit_request);
 
 	GEM_BUG_ON(ve->request);
+	GEM_BUG_ON(!list_empty(virtual_queue(ve)));
+
 	ve->base.execlists.queue_priority_hint = rq_prio(rq);
 	WRITE_ONCE(ve->request, rq);
 
+	list_move_tail(&rq->sched.link, virtual_queue(ve));
+
 	tasklet_schedule(&ve->base.execlists.tasklet);
 }
 
@@ -3297,10 +3304,7 @@ intel_execlists_create_virtual(struct i915_gem_context *ctx,
 
 	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
 
-	err = i915_timeline_init(ctx->i915, &ve->base.timeline, NULL);
-	if (err)
-		goto err_put;
-	i915_timeline_set_subclass(&ve->base.timeline, TIMELINE_VIRTUAL);
+	intel_engine_init_active(&ve->base, ENGINE_VIRTUAL);
 
 	intel_engine_init_execlists(&ve->base);
 
@@ -3311,6 +3315,7 @@ intel_execlists_create_virtual(struct i915_gem_context *ctx,
 	ve->base.submit_request = virtual_submit_request;
 	ve->base.bond_execute = virtual_bond_execute;
 
+	INIT_LIST_HEAD(virtual_queue(ve));
 	ve->base.execlists.queue_priority_hint = INT_MIN;
 	tasklet_init(&ve->base.execlists.tasklet,
 		     virtual_submission_tasklet,
@@ -3465,11 +3470,11 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
 	unsigned int count;
 	struct rb_node *rb;
 
-	spin_lock_irqsave(&engine->timeline.lock, flags);
+	spin_lock_irqsave(&engine->active.lock, flags);
 
 	last = NULL;
 	count = 0;
-	list_for_each_entry(rq, &engine->timeline.requests, link) {
+	list_for_each_entry(rq, &engine->active.requests, sched.link) {
 		if (count++ < max - 1)
 			show_request(m, rq, "\t\tE ");
 		else
@@ -3532,7 +3537,7 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
 		show_request(m, last, "\t\tV ");
 	}
 
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
 void intel_lr_context_reset(struct intel_engine_cs *engine,
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 41a294f5cc19..b779984a5620 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -49,12 +49,12 @@ static void engine_skip_context(struct i915_request *rq)
 	struct intel_engine_cs *engine = rq->engine;
 	struct i915_gem_context *hung_ctx = rq->gem_context;
 
-	lockdep_assert_held(&engine->timeline.lock);
+	lockdep_assert_held(&engine->active.lock);
 
 	if (!i915_request_is_active(rq))
 		return;
 
-	list_for_each_entry_continue(rq, &engine->timeline.requests, link)
+	list_for_each_entry_continue(rq, &engine->active.requests, sched.link)
 		if (rq->gem_context == hung_ctx)
 			i915_request_skip(rq, -EIO);
 }
@@ -130,7 +130,7 @@ void i915_reset_request(struct i915_request *rq, bool guilty)
 		  rq->fence.seqno,
 		  yesno(guilty));
 
-	lockdep_assert_held(&rq->engine->timeline.lock);
+	lockdep_assert_held(&rq->engine->active.lock);
 	GEM_BUG_ON(i915_request_completed(rq));
 
 	if (guilty) {
@@ -785,10 +785,10 @@ static void nop_submit_request(struct i915_request *request)
 		  engine->name, request->fence.context, request->fence.seqno);
 	dma_fence_set_error(&request->fence, -EIO);
 
-	spin_lock_irqsave(&engine->timeline.lock, flags);
+	spin_lock_irqsave(&engine->active.lock, flags);
 	__i915_request_submit(request);
 	i915_request_mark_complete(request);
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+	spin_unlock_irqrestore(&engine->active.lock, flags);
 
 	intel_engine_queue_breadcrumbs(engine);
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
index cc901edec09a..019bf039f616 100644
--- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
@@ -730,14 +730,13 @@ static void reset_prepare(struct intel_engine_cs *engine)
 
 static void reset_ring(struct intel_engine_cs *engine, bool stalled)
 {
-	struct i915_timeline *tl = &engine->timeline;
 	struct i915_request *pos, *rq;
 	unsigned long flags;
 	u32 head;
 
 	rq = NULL;
-	spin_lock_irqsave(&tl->lock, flags);
-	list_for_each_entry(pos, &tl->requests, link) {
+	spin_lock_irqsave(&engine->active.lock, flags);
+	list_for_each_entry(pos, &engine->active.requests, sched.link) {
 		if (!i915_request_completed(pos)) {
 			rq = pos;
 			break;
@@ -791,7 +790,7 @@ static void reset_ring(struct intel_engine_cs *engine, bool stalled)
 	}
 	engine->buffer->head = intel_ring_wrap(engine->buffer, head);
 
-	spin_unlock_irqrestore(&tl->lock, flags);
+	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
 static void reset_finish(struct intel_engine_cs *engine)
@@ -877,10 +876,10 @@ static void cancel_requests(struct intel_engine_cs *engine)
 	struct i915_request *request;
 	unsigned long flags;
 
-	spin_lock_irqsave(&engine->timeline.lock, flags);
+	spin_lock_irqsave(&engine->active.lock, flags);
 
 	/* Mark all submitted requests as skipped. */
-	list_for_each_entry(request, &engine->timeline.requests, link) {
+	list_for_each_entry(request, &engine->active.requests, sched.link) {
 		if (!i915_request_signaled(request))
 			dma_fence_set_error(&request->fence, -EIO);
 
@@ -889,7 +888,7 @@ static void cancel_requests(struct intel_engine_cs *engine)
 
 	/* Remaining _unready_ requests will be nop'ed when submitted */
 
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
 static void i9xx_submit_request(struct i915_request *request)
@@ -1267,8 +1266,6 @@ intel_engine_create_ring(struct intel_engine_cs *engine,
 
 	GEM_BUG_ON(!is_power_of_2(size));
 	GEM_BUG_ON(RING_CTL_SIZE(size) & ~RING_NR_PAGES);
-	GEM_BUG_ON(timeline == &engine->timeline);
-	lockdep_assert_held(&engine->i915->drm.struct_mutex);
 
 	ring = kzalloc(sizeof(*ring), GFP_KERNEL);
 	if (!ring)
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index d1ef515bac8d..086801b51441 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -229,17 +229,17 @@ static void mock_cancel_requests(struct intel_engine_cs *engine)
 	struct i915_request *request;
 	unsigned long flags;
 
-	spin_lock_irqsave(&engine->timeline.lock, flags);
+	spin_lock_irqsave(&engine->active.lock, flags);
 
 	/* Mark all submitted requests as skipped. */
-	list_for_each_entry(request, &engine->timeline.requests, sched.link) {
+	list_for_each_entry(request, &engine->active.requests, sched.link) {
 		if (!i915_request_signaled(request))
 			dma_fence_set_error(&request->fence, -EIO);
 
 		i915_request_mark_complete(request);
 	}
 
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
 struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
@@ -285,28 +285,23 @@ int mock_engine_init(struct intel_engine_cs *engine)
 	struct drm_i915_private *i915 = engine->i915;
 	int err;
 
+	intel_engine_init_active(engine, ENGINE_MOCK);
 	intel_engine_init_breadcrumbs(engine);
 	intel_engine_init_execlists(engine);
 	intel_engine_init__pm(engine);
 
-	if (i915_timeline_init(i915, &engine->timeline, NULL))
-		goto err_breadcrumbs;
-	i915_timeline_set_subclass(&engine->timeline, TIMELINE_ENGINE);
-
 	engine->kernel_context =
 		i915_gem_context_get_engine(i915->kernel_context, engine->id);
 	if (IS_ERR(engine->kernel_context))
-		goto err_timeline;
+		goto err_breadcrumbs;
 
 	err = intel_context_pin(engine->kernel_context);
 	intel_context_put(engine->kernel_context);
 	if (err)
-		goto err_timeline;
+		goto err_breadcrumbs;
 
 	return 0;
 
-err_timeline:
-	i915_timeline_fini(&engine->timeline);
 err_breadcrumbs:
 	intel_engine_fini_breadcrumbs(engine);
 	return -ENOMEM;
@@ -340,7 +335,6 @@ void mock_engine_free(struct intel_engine_cs *engine)
 	intel_context_unpin(engine->kernel_context);
 
 	intel_engine_fini_breadcrumbs(engine);
-	i915_timeline_fini(&engine->timeline);
 
 	kfree(engine);
 }
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 26c9c0595bdf..f411e3244208 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1275,7 +1275,7 @@ static void engine_record_requests(struct intel_engine_cs *engine,
 
 	count = 0;
 	request = first;
-	list_for_each_entry_from(request, &engine->timeline.requests, link)
+	list_for_each_entry_from(request, &engine->active.requests, sched.link)
 		count++;
 	if (!count)
 		return;
@@ -1288,7 +1288,8 @@ static void engine_record_requests(struct intel_engine_cs *engine,
 
 	count = 0;
 	request = first;
-	list_for_each_entry_from(request, &engine->timeline.requests, link) {
+	list_for_each_entry_from(request,
+				 &engine->active.requests, sched.link) {
 		if (count >= ee->num_requests) {
 			/*
 			 * If the ring request list was changed in
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 14a9c5a969e9..a619fafd9b86 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -232,9 +232,9 @@ static bool i915_request_retire(struct i915_request *rq)
 
 	local_irq_disable();
 
-	spin_lock(&rq->engine->timeline.lock);
-	list_del(&rq->link);
-	spin_unlock(&rq->engine->timeline.lock);
+	spin_lock(&rq->engine->active.lock);
+	list_del(&rq->sched.link);
+	spin_unlock(&rq->engine->active.lock);
 
 	spin_lock(&rq->lock);
 	i915_request_mark_complete(rq);
@@ -254,6 +254,7 @@ static bool i915_request_retire(struct i915_request *rq)
 	intel_context_unpin(rq->hw_context);
 
 	i915_request_remove_from_client(rq);
+	list_del(&rq->link);
 
 	free_capture_list(rq);
 	i915_sched_node_fini(&rq->sched);
@@ -373,28 +374,17 @@ __i915_request_await_execution(struct i915_request *rq,
 	return 0;
 }
 
-static void move_to_timeline(struct i915_request *request,
-			     struct i915_timeline *timeline)
-{
-	GEM_BUG_ON(request->timeline == &request->engine->timeline);
-	lockdep_assert_held(&request->engine->timeline.lock);
-
-	spin_lock(&request->timeline->lock);
-	list_move_tail(&request->link, &timeline->requests);
-	spin_unlock(&request->timeline->lock);
-}
-
 void __i915_request_submit(struct i915_request *request)
 {
 	struct intel_engine_cs *engine = request->engine;
 
-	GEM_TRACE("%s fence %llx:%lld -> current %d\n",
+	GEM_TRACE("%s fence %llx:%lld, current %d\n",
 		  engine->name,
 		  request->fence.context, request->fence.seqno,
 		  hwsp_seqno(request));
 
 	GEM_BUG_ON(!irqs_disabled());
-	lockdep_assert_held(&engine->timeline.lock);
+	lockdep_assert_held(&engine->active.lock);
 
 	if (i915_gem_context_is_banned(request->gem_context))
 		i915_request_skip(request, -EIO);
@@ -422,6 +412,8 @@ void __i915_request_submit(struct i915_request *request)
 	/* We may be recursing from the signal callback of another i915 fence */
 	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
 
+	list_move_tail(&request->sched.link, &engine->active.requests);
+
 	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
 	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
 
@@ -437,9 +429,6 @@ void __i915_request_submit(struct i915_request *request)
 	engine->emit_fini_breadcrumb(request,
 				     request->ring->vaddr + request->postfix);
 
-	/* Transfer from per-context onto the global per-engine timeline */
-	move_to_timeline(request, &engine->timeline);
-
 	engine->serial++;
 
 	trace_i915_request_execute(request);
@@ -451,11 +440,11 @@ void i915_request_submit(struct i915_request *request)
 	unsigned long flags;
 
 	/* Will be called from irq-context when using foreign fences. */
-	spin_lock_irqsave(&engine->timeline.lock, flags);
+	spin_lock_irqsave(&engine->active.lock, flags);
 
 	__i915_request_submit(request);
 
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
 void __i915_request_unsubmit(struct i915_request *request)
@@ -468,7 +457,7 @@ void __i915_request_unsubmit(struct i915_request *request)
 		  hwsp_seqno(request));
 
 	GEM_BUG_ON(!irqs_disabled());
-	lockdep_assert_held(&engine->timeline.lock);
+	lockdep_assert_held(&engine->active.lock);
 
 	/*
 	 * Only unwind in reverse order, required so that the per-context list
@@ -486,9 +475,6 @@ void __i915_request_unsubmit(struct i915_request *request)
 
 	spin_unlock(&request->lock);
 
-	/* Transfer back from the global per-engine timeline to per-context */
-	move_to_timeline(request, request->timeline);
-
 	/* We've already spun, don't charge on resubmitting. */
 	if (request->sched.semaphores && i915_request_started(request)) {
 		request->sched.attr.priority |= I915_PRIORITY_NOSEMAPHORE;
@@ -510,11 +496,11 @@ void i915_request_unsubmit(struct i915_request *request)
 	unsigned long flags;
 
 	/* Will be called from irq-context when using foreign fences. */
-	spin_lock_irqsave(&engine->timeline.lock, flags);
+	spin_lock_irqsave(&engine->active.lock, flags);
 
 	__i915_request_unsubmit(request);
 
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
 static int __i915_sw_fence_call
@@ -669,7 +655,6 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
 	rq->engine = ce->engine;
 	rq->ring = ce->ring;
 	rq->timeline = tl;
-	GEM_BUG_ON(rq->timeline == &ce->engine->timeline);
 	rq->hwsp_seqno = tl->hwsp_seqno;
 	rq->hwsp_cacheline = tl->hwsp_cacheline;
 	rq->rcustate = get_state_synchronize_rcu(); /* acts as smp_mb() */
@@ -1137,9 +1122,7 @@ __i915_request_add_to_timeline(struct i915_request *rq)
 							 0);
 	}
 
-	spin_lock_irq(&timeline->lock);
 	list_add_tail(&rq->link, &timeline->requests);
-	spin_unlock_irq(&timeline->lock);
 
 	/*
 	 * Make sure that no request gazumped us - if it was allocated after
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index c9f7d07991c8..edbbdfec24ab 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -217,7 +217,7 @@ struct i915_request {
 
 	bool waitboost;
 
-	/** engine->request_list entry for this request */
+	/** timeline->request entry for this request */
 	struct list_head link;
 
 	/** ring->request_list entry for this request */
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 78ceb56d7801..2e9b38bdc33c 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -77,7 +77,7 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
 	bool first = true;
 	int idx, i;
 
-	lockdep_assert_held(&engine->timeline.lock);
+	lockdep_assert_held(&engine->active.lock);
 	assert_priolists(execlists);
 
 	/* buckets sorted from highest [in slot 0] to lowest priority */
@@ -162,9 +162,9 @@ sched_lock_engine(const struct i915_sched_node *node,
 	 * check that the rq still belongs to the newly locked engine.
 	 */
 	while (locked != (engine = READ_ONCE(rq->engine))) {
-		spin_unlock(&locked->timeline.lock);
+		spin_unlock(&locked->active.lock);
 		memset(cache, 0, sizeof(*cache));
-		spin_lock(&engine->timeline.lock);
+		spin_lock(&engine->active.lock);
 		locked = engine;
 	}
 
@@ -189,7 +189,7 @@ static void kick_submission(struct intel_engine_cs *engine, int prio)
 	 * tasklet, i.e. we have not change the priority queue
 	 * sufficiently to oust the running context.
 	 */
-	if (inflight && !i915_scheduler_need_preempt(prio, rq_prio(inflight)))
+	if (!inflight || !i915_scheduler_need_preempt(prio, rq_prio(inflight)))
 		return;
 
 	tasklet_hi_schedule(&engine->execlists.tasklet);
@@ -278,7 +278,7 @@ static void __i915_schedule(struct i915_sched_node *node,
 
 	memset(&cache, 0, sizeof(cache));
 	engine = node_to_request(node)->engine;
-	spin_lock(&engine->timeline.lock);
+	spin_lock(&engine->active.lock);
 
 	/* Fifo and depth-first replacement ensure our deps execute before us */
 	engine = sched_lock_engine(node, engine, &cache);
@@ -287,7 +287,7 @@ static void __i915_schedule(struct i915_sched_node *node,
 
 		node = dep->signaler;
 		engine = sched_lock_engine(node, engine, &cache);
-		lockdep_assert_held(&engine->timeline.lock);
+		lockdep_assert_held(&engine->active.lock);
 
 		/* Recheck after acquiring the engine->timeline.lock */
 		if (prio <= node->attr.priority || node_signaled(node))
@@ -296,14 +296,8 @@ static void __i915_schedule(struct i915_sched_node *node,
 		GEM_BUG_ON(node_to_request(node)->engine != engine);
 
 		node->attr.priority = prio;
-		if (!list_empty(&node->link)) {
-			GEM_BUG_ON(intel_engine_is_virtual(engine));
-			if (!cache.priolist)
-				cache.priolist =
-					i915_sched_lookup_priolist(engine,
-								   prio);
-			list_move_tail(&node->link, cache.priolist);
-		} else {
+
+		if (list_empty(&node->link)) {
 			/*
 			 * If the request is not in the priolist queue because
 			 * it is not yet runnable, then it doesn't contribute
@@ -312,8 +306,16 @@ static void __i915_schedule(struct i915_sched_node *node,
 			 * queue; but in that case we may still need to reorder
 			 * the inflight requests.
 			 */
-			if (!i915_sw_fence_done(&node_to_request(node)->submit))
-				continue;
+			continue;
+		}
+
+		if (!intel_engine_is_virtual(engine) &&
+		    !i915_request_is_active(node_to_request(node))) {
+			if (!cache.priolist)
+				cache.priolist =
+					i915_sched_lookup_priolist(engine,
+								   prio);
+			list_move_tail(&node->link, cache.priolist);
 		}
 
 		if (prio <= engine->execlists.queue_priority_hint)
@@ -325,7 +327,7 @@ static void __i915_schedule(struct i915_sched_node *node,
 		kick_submission(engine, prio);
 	}
 
-	spin_unlock(&engine->timeline.lock);
+	spin_unlock(&engine->active.lock);
 }
 
 void i915_schedule(struct i915_request *rq, const struct i915_sched_attr *attr)
@@ -439,8 +441,6 @@ void i915_sched_node_fini(struct i915_sched_node *node)
 {
 	struct i915_dependency *dep, *tmp;
 
-	GEM_BUG_ON(!list_empty(&node->link));
-
 	spin_lock_irq(&schedule_lock);
 
 	/*
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index 000e1a9b6750..c311ce9c6f9d 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -251,7 +251,6 @@ int i915_timeline_init(struct drm_i915_private *i915,
 
 	timeline->fence_context = dma_fence_context_alloc(1);
 
-	spin_lock_init(&timeline->lock);
 	mutex_init(&timeline->mutex);
 
 	INIT_ACTIVE_REQUEST(&timeline->last_request);
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index 27668a1a69a3..36e5e5a65155 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -36,25 +36,6 @@ int i915_timeline_init(struct drm_i915_private *i915,
 		       struct i915_vma *hwsp);
 void i915_timeline_fini(struct i915_timeline *tl);
 
-static inline void
-i915_timeline_set_subclass(struct i915_timeline *timeline,
-			   unsigned int subclass)
-{
-	lockdep_set_subclass(&timeline->lock, subclass);
-
-	/*
-	 * Due to an interesting quirk in lockdep's internal debug tracking,
-	 * after setting a subclass we must ensure the lock is used. Otherwise,
-	 * nr_unused_locks is incremented once too often.
-	 */
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-	local_irq_disable();
-	lock_map_acquire(&timeline->lock.dep_map);
-	lock_map_release(&timeline->lock.dep_map);
-	local_irq_enable();
-#endif
-}
-
 struct i915_timeline *
 i915_timeline_create(struct drm_i915_private *i915,
 		     struct i915_vma *global_hwsp);
diff --git a/drivers/gpu/drm/i915/i915_timeline_types.h b/drivers/gpu/drm/i915/i915_timeline_types.h
index 1688705f4a2b..fce5cb4f1090 100644
--- a/drivers/gpu/drm/i915/i915_timeline_types.h
+++ b/drivers/gpu/drm/i915/i915_timeline_types.h
@@ -23,10 +23,6 @@ struct i915_timeline {
 	u64 fence_context;
 	u32 seqno;
 
-	spinlock_t lock;
-#define TIMELINE_CLIENT 0 /* default subclass */
-#define TIMELINE_ENGINE 1
-#define TIMELINE_VIRTUAL 2
 	struct mutex mutex; /* protects the flow of requests */
 
 	unsigned int pin_count;
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 97f6970d8da8..db531ebc7704 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -740,7 +740,7 @@ static bool __guc_dequeue(struct intel_engine_cs *engine)
 	bool submit = false;
 	struct rb_node *rb;
 
-	lockdep_assert_held(&engine->timeline.lock);
+	lockdep_assert_held(&engine->active.lock);
 
 	if (port_isset(port)) {
 		if (intel_engine_has_preemption(engine)) {
@@ -822,7 +822,7 @@ static void guc_submission_tasklet(unsigned long data)
 	struct i915_request *rq;
 	unsigned long flags;
 
-	spin_lock_irqsave(&engine->timeline.lock, flags);
+	spin_lock_irqsave(&engine->active.lock, flags);
 
 	rq = port_request(port);
 	while (rq && i915_request_completed(rq)) {
@@ -847,7 +847,7 @@ static void guc_submission_tasklet(unsigned long data)
 	if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT))
 		guc_dequeue(engine);
 
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
 static void guc_reset_prepare(struct intel_engine_cs *engine)
@@ -884,7 +884,7 @@ static void guc_reset(struct intel_engine_cs *engine, bool stalled)
 	struct i915_request *rq;
 	unsigned long flags;
 
-	spin_lock_irqsave(&engine->timeline.lock, flags);
+	spin_lock_irqsave(&engine->active.lock, flags);
 
 	execlists_cancel_port_requests(execlists);
 
@@ -900,7 +900,7 @@ static void guc_reset(struct intel_engine_cs *engine, bool stalled)
 	intel_lr_context_reset(engine, rq->hw_context, rq->head, stalled);
 
 out_unlock:
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
 static void guc_cancel_requests(struct intel_engine_cs *engine)
@@ -926,13 +926,13 @@ static void guc_cancel_requests(struct intel_engine_cs *engine)
 	 * submission's irq state, we also wish to remind ourselves that
 	 * it is irq state.)
 	 */
-	spin_lock_irqsave(&engine->timeline.lock, flags);
+	spin_lock_irqsave(&engine->active.lock, flags);
 
 	/* Cancel the requests on the HW and clear the ELSP tracker. */
 	execlists_cancel_port_requests(execlists);
 
 	/* Mark all executing requests as skipped. */
-	list_for_each_entry(rq, &engine->timeline.requests, link) {
+	list_for_each_entry(rq, &engine->active.requests, sched.link) {
 		if (!i915_request_signaled(rq))
 			dma_fence_set_error(&rq->fence, -EIO);
 
@@ -961,7 +961,7 @@ static void guc_cancel_requests(struct intel_engine_cs *engine)
 	execlists->queue = RB_ROOT_CACHED;
 	GEM_BUG_ON(port_isset(execlists->port));
 
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
 static void guc_reset_finish(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/selftests/mock_timeline.c b/drivers/gpu/drm/i915/selftests/mock_timeline.c
index e084476469ef..65b52be23d42 100644
--- a/drivers/gpu/drm/i915/selftests/mock_timeline.c
+++ b/drivers/gpu/drm/i915/selftests/mock_timeline.c
@@ -13,7 +13,6 @@ void mock_timeline_init(struct i915_timeline *timeline, u64 context)
 	timeline->i915 = NULL;
 	timeline->fence_context = context;
 
-	spin_lock_init(&timeline->lock);
 	mutex_init(&timeline->mutex);
 
 	INIT_ACTIVE_REQUEST(&timeline->last_request);
-- 
2.20.1

* [PATCH 08/39] drm/i915: Flush the execution-callbacks on retiring
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (6 preceding siblings ...)
  2019-06-14  7:09 ` [PATCH 07/39] drm/i915: Replace engine->timeline with a plain list Chris Wilson
@ 2019-06-14  7:09 ` Chris Wilson
  2019-06-14  7:09 ` [PATCH 09/39] drm/i915/execlists: Preempt-to-busy Chris Wilson
                   ` (32 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:09 UTC (permalink / raw)
  To: intel-gfx

In the unlikely case that the request completes while we regard it as not
even executing on the GPU (see the next patch!), we have to flush any
pending execution callbacks at retirement and ensure that no more can be
added afterwards.
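
A condensed sketch of the retirement-time flush (the helper name is
hypothetical; in the patch itself this is inlined into
i915_request_retire() under rq->lock, as the hunk below shows):

static void sketch_flush_execute_cb(struct i915_request *rq)
{
	lockdep_assert_held(&rq->lock);

	/*
	 * If the request completed without ever being marked as executing,
	 * mark it active now so that the pending execution callbacks are
	 * run (each queued as irq_work) instead of being leaked, and so
	 * that no further callbacks can be added afterwards.
	 */
	if (!test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags)) {
		set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
		__notify_execute_cb(rq);
	}
	GEM_BUG_ON(!list_empty(&rq->execute_cb));
}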

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_request.c | 93 +++++++++++++++--------------
 1 file changed, 49 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index a619fafd9b86..6d71eefd6c9f 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -119,6 +119,50 @@ const struct dma_fence_ops i915_fence_ops = {
 	.release = i915_fence_release,
 };
 
+static void irq_execute_cb(struct irq_work *wrk)
+{
+	struct execute_cb *cb = container_of(wrk, typeof(*cb), work);
+
+	i915_sw_fence_complete(cb->fence);
+	kmem_cache_free(global.slab_execute_cbs, cb);
+}
+
+static void irq_execute_cb_hook(struct irq_work *wrk)
+{
+	struct execute_cb *cb = container_of(wrk, typeof(*cb), work);
+
+	cb->hook(container_of(cb->fence, struct i915_request, submit),
+		 &cb->signal->fence);
+	i915_request_put(cb->signal);
+
+	irq_execute_cb(wrk);
+}
+
+static void __notify_execute_cb(struct i915_request *rq)
+{
+	struct execute_cb *cb;
+
+	lockdep_assert_held(&rq->lock);
+
+	if (list_empty(&rq->execute_cb))
+		return;
+
+	list_for_each_entry(cb, &rq->execute_cb, link)
+		irq_work_queue(&cb->work);
+
+	/*
+	 * XXX Rollback on __i915_request_unsubmit()
+	 *
+	 * In the future, perhaps when we have an active time-slicing scheduler,
+	 * it will be interesting to unsubmit parallel execution and remove
+	 * busywaits from the GPU until their master is restarted. This is
+	 * quite hairy, we have to carefully rollback the fence and do a
+	 * preempt-to-idle cycle on the target engine, all the while the
+	 * master execute_cb may refire.
+	 */
+	INIT_LIST_HEAD(&rq->execute_cb);
+}
+
 static inline void
 i915_request_remove_from_client(struct i915_request *request)
 {
@@ -246,6 +290,11 @@ static bool i915_request_retire(struct i915_request *rq)
 		GEM_BUG_ON(!atomic_read(&rq->i915->gt_pm.rps.num_waiters));
 		atomic_dec(&rq->i915->gt_pm.rps.num_waiters);
 	}
+	if (!test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags)) {
+		set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+		__notify_execute_cb(rq);
+	}
+	GEM_BUG_ON(!list_empty(&rq->execute_cb));
 	spin_unlock(&rq->lock);
 
 	local_irq_enable();
@@ -285,50 +334,6 @@ void i915_request_retire_upto(struct i915_request *rq)
 	} while (i915_request_retire(tmp) && tmp != rq);
 }
 
-static void irq_execute_cb(struct irq_work *wrk)
-{
-	struct execute_cb *cb = container_of(wrk, typeof(*cb), work);
-
-	i915_sw_fence_complete(cb->fence);
-	kmem_cache_free(global.slab_execute_cbs, cb);
-}
-
-static void irq_execute_cb_hook(struct irq_work *wrk)
-{
-	struct execute_cb *cb = container_of(wrk, typeof(*cb), work);
-
-	cb->hook(container_of(cb->fence, struct i915_request, submit),
-		 &cb->signal->fence);
-	i915_request_put(cb->signal);
-
-	irq_execute_cb(wrk);
-}
-
-static void __notify_execute_cb(struct i915_request *rq)
-{
-	struct execute_cb *cb;
-
-	lockdep_assert_held(&rq->lock);
-
-	if (list_empty(&rq->execute_cb))
-		return;
-
-	list_for_each_entry(cb, &rq->execute_cb, link)
-		irq_work_queue(&cb->work);
-
-	/*
-	 * XXX Rollback on __i915_request_unsubmit()
-	 *
-	 * In the future, perhaps when we have an active time-slicing scheduler,
-	 * it will be interesting to unsubmit parallel execution and remove
-	 * busywaits from the GPU until their master is restarted. This is
-	 * quite hairy, we have to carefully rollback the fence and do a
-	 * preempt-to-idle cycle on the target engine, all the while the
-	 * master execute_cb may refire.
-	 */
-	INIT_LIST_HEAD(&rq->execute_cb);
-}
-
 static int
 __i915_request_await_execution(struct i915_request *rq,
 			       struct i915_request *signal,
-- 
2.20.1

* [PATCH 09/39] drm/i915/execlists: Preempt-to-busy
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (7 preceding siblings ...)
  2019-06-14  7:09 ` [PATCH 08/39] drm/i915: Flush the execution-callbacks on retiring Chris Wilson
@ 2019-06-14  7:09 ` Chris Wilson
  2019-06-14  7:09 ` [PATCH 10/39] drm/i915/execlists: Minimalistic timeslicing Chris Wilson
                   ` (31 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:09 UTC (permalink / raw)
  To: intel-gfx

When using a global seqno, we required a precise stop-the-world event to
handle preemption and unwind the global seqno counter. To accomplish
this, we would preempt to a special out-of-band context and wait for the
machine to report that it was idle. Given an idle machine, we could very
precisely see which requests had completed and which we needed to feed
back into the run queue.

However, now that we have scrapped the global seqno, we no longer need
to precisely unwind the global counter and only track requests by their
per-context seqno. This allows us to loosely unwind inflight requests
while scheduling a preemption, with the enormous caveat that the
requests we put back on the run queue are still _inflight_ (until the
preemption request is complete). This makes request tracking much more
messy, as at any point then we can see a completed request that we
believe is not currently scheduled for execution. We also have to be
careful not to rewind RING_TAIL past RING_HEAD on preempting to the
running context, and for this we use a semaphore to prevent completion
of the request before continuing.
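
The semaphore is just a dword in the engine's HWSP (the ring_pause()
define appears in the diff below); a rough sketch of its use, with
hypothetical helper names and heavily simplified against the real
tasklet flow:

/* From the diff: the per-engine pause flag lives in the status page */
#define ring_pause(E) ((E)->status_page.addr[I915_GEM_HWS_PREEMPT])

/*
 * Hypothetical helpers: while RING_TAIL is being rewritten for a
 * context that is still executing, its final breadcrumb busywaits on
 * this dword, so the request cannot complete and retire underneath us.
 */
static void sketch_pause_ring(struct intel_engine_cs *engine)
{
	WRITE_ONCE(ring_pause(engine), 1);
}

static void sketch_resume_ring(struct intel_engine_cs *engine)
{
	WRITE_ONCE(ring_pause(engine), 0);
}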

To accomplish this feat, we change how we track requests scheduled to
the HW. Instead of appending our requests onto a single list as we
submit, we track each submission to ELSP as its own block. Then upon
receiving the CS preemption event, we promote the pending block to the
inflight block (discarding what was previously being tracked). As normal
CS completion events arrive, we then remove stale entries from the
inflight tracker.
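
A simplified illustration of the new tracking (the field names match
the diff; the two helpers are hypothetical condensations of what
process_csb() does on the corresponding events):

struct sketch_execlists {
	/* NULL-terminated blocks of requests, one entry per ELSP port */
	struct i915_request *inflight[EXECLIST_MAX_PORTS + 1 /* sentinel */];
	struct i915_request *pending[EXECLIST_MAX_PORTS + 1];

	/* cursor into inflight[]; *active is the currently running request */
	struct i915_request * const *active;
};

/* On the CSB event acknowledging the submission: promote the block */
static void sketch_promote_pending(struct sketch_execlists *el)
{
	memcpy(el->inflight, el->pending, sizeof(el->pending));
	el->active = el->inflight;
	memset(el->pending, 0, sizeof(el->pending));
}

/* On a normal CSB completion event: drop the stale entry */
static void sketch_retire_active(struct sketch_execlists *el)
{
	el->active++;
}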

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   2 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |   5 +
 drivers/gpu/drm/i915/gt/intel_engine.h        |  61 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  61 +-
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  52 +-
 drivers/gpu/drm/i915/gt/intel_lrc.c           | 671 ++++++++----------
 drivers/gpu/drm/i915/i915_gpu_error.c         |  19 +-
 drivers/gpu/drm/i915/i915_request.c           |   6 +
 drivers/gpu/drm/i915/i915_request.h           |   1 +
 drivers/gpu/drm/i915/i915_scheduler.c         |   3 +-
 drivers/gpu/drm/i915/i915_utils.h             |  12 +
 drivers/gpu/drm/i915/intel_guc_submission.c   | 175 ++---
 drivers/gpu/drm/i915/selftests/i915_request.c |   8 +-
 13 files changed, 465 insertions(+), 611 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 6200060aef05..1ce122f4ed25 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -646,7 +646,7 @@ static void init_contexts(struct drm_i915_private *i915)
 
 static bool needs_preempt_context(struct drm_i915_private *i915)
 {
-	return HAS_EXECLISTS(i915);
+	return USES_GUC_SUBMISSION(i915);
 }
 
 int i915_gem_contexts_init(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index e95be4be9612..b565c3ff4378 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -13,6 +13,7 @@
 #include <linux/types.h>
 
 #include "i915_active_types.h"
+#include "i915_utils.h"
 #include "intel_engine_types.h"
 #include "intel_sseu.h"
 
@@ -38,6 +39,10 @@ struct intel_context {
 	struct i915_gem_context *gem_context;
 	struct intel_engine_cs *engine;
 	struct intel_engine_cs *inflight;
+#define intel_context_inflight(ce) ptr_mask_bits((ce)->inflight, 2)
+#define intel_context_inflight_count(ce)  ptr_unmask_bits((ce)->inflight, 2)
+#define intel_context_inflight_inc(ce) ptr_count_inc(&(ce)->inflight)
+#define intel_context_inflight_dec(ce) ptr_count_dec(&(ce)->inflight)
 
 	struct list_head signal_link;
 	struct list_head signals;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index 2f1c6871ee95..9bb6ff76680e 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -125,71 +125,26 @@ hangcheck_action_to_str(const enum intel_engine_hangcheck_action a)
 
 void intel_engines_set_scheduler_caps(struct drm_i915_private *i915);
 
-static inline void
-execlists_set_active(struct intel_engine_execlists *execlists,
-		     unsigned int bit)
-{
-	__set_bit(bit, (unsigned long *)&execlists->active);
-}
-
-static inline bool
-execlists_set_active_once(struct intel_engine_execlists *execlists,
-			  unsigned int bit)
-{
-	return !__test_and_set_bit(bit, (unsigned long *)&execlists->active);
-}
-
-static inline void
-execlists_clear_active(struct intel_engine_execlists *execlists,
-		       unsigned int bit)
-{
-	__clear_bit(bit, (unsigned long *)&execlists->active);
-}
-
-static inline void
-execlists_clear_all_active(struct intel_engine_execlists *execlists)
+static inline unsigned int
+execlists_num_ports(const struct intel_engine_execlists * const execlists)
 {
-	execlists->active = 0;
+	return execlists->port_mask + 1;
 }
 
-static inline bool
-execlists_is_active(const struct intel_engine_execlists *execlists,
-		    unsigned int bit)
+static inline struct i915_request *
+execlists_active(const struct intel_engine_execlists *execlists)
 {
-	return test_bit(bit, (unsigned long *)&execlists->active);
+	GEM_BUG_ON(execlists->active - execlists->inflight >
+		   execlists_num_ports(execlists));
+	return READ_ONCE(*execlists->active);
 }
 
-void execlists_user_begin(struct intel_engine_execlists *execlists,
-			  const struct execlist_port *port);
-void execlists_user_end(struct intel_engine_execlists *execlists);
-
 void
 execlists_cancel_port_requests(struct intel_engine_execlists * const execlists);
 
 struct i915_request *
 execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists);
 
-static inline unsigned int
-execlists_num_ports(const struct intel_engine_execlists * const execlists)
-{
-	return execlists->port_mask + 1;
-}
-
-static inline struct execlist_port *
-execlists_port_complete(struct intel_engine_execlists * const execlists,
-			struct execlist_port * const port)
-{
-	const unsigned int m = execlists->port_mask;
-
-	GEM_BUG_ON(port_index(port, execlists) != 0);
-	GEM_BUG_ON(!execlists_is_active(execlists, EXECLISTS_ACTIVE_USER));
-
-	memmove(port, port + 1, m * sizeof(struct execlist_port));
-	memset(port + m, 0, sizeof(struct execlist_port));
-
-	return port;
-}
-
 static inline u32
 intel_read_status_page(const struct intel_engine_cs *engine, int reg)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 01f50cfd517c..01357280449a 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -508,6 +508,10 @@ void intel_engine_init_execlists(struct intel_engine_cs *engine)
 	GEM_BUG_ON(!is_power_of_2(execlists_num_ports(execlists)));
 	GEM_BUG_ON(execlists_num_ports(execlists) > EXECLIST_MAX_PORTS);
 
+	memset(execlists->pending, 0, sizeof(execlists->pending));
+	execlists->active =
+		memset(execlists->inflight, 0, sizeof(execlists->inflight));
+
 	execlists->queue_priority_hint = INT_MIN;
 	execlists->queue = RB_ROOT_CACHED;
 }
@@ -1152,7 +1156,7 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine)
 		return true;
 
 	/* Waiting to drain ELSP? */
-	if (READ_ONCE(engine->execlists.active)) {
+	if (execlists_active(&engine->execlists)) {
 		struct tasklet_struct *t = &engine->execlists.tasklet;
 
 		synchronize_hardirq(engine->i915->drm.irq);
@@ -1169,7 +1173,7 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine)
 		/* Otherwise flush the tasklet if it was on another cpu */
 		tasklet_unlock_wait(t);
 
-		if (READ_ONCE(engine->execlists.active))
+		if (execlists_active(&engine->execlists))
 			return false;
 	}
 
@@ -1366,6 +1370,7 @@ static void intel_engine_print_registers(const struct intel_engine_cs *engine,
 	}
 
 	if (HAS_EXECLISTS(dev_priv)) {
+		struct i915_request * const *port, *rq;
 		const u32 *hws =
 			&engine->status_page.addr[I915_HWS_CSB_BUF0_INDEX];
 		const u8 num_entries = execlists->csb_size;
@@ -1398,26 +1403,28 @@ static void intel_engine_print_registers(const struct intel_engine_cs *engine,
 		}
 
 		rcu_read_lock();
-		for (idx = 0; idx < execlists_num_ports(execlists); idx++) {
-			struct i915_request *rq;
-			unsigned int count;
-
-			rq = port_unpack(&execlists->port[idx], &count);
-			if (rq) {
-				char hdr[80];
-
-				snprintf(hdr, sizeof(hdr),
-					 "\t\tELSP[%d] count=%d, ring:{start:%08x, hwsp:%08x, seqno:%08x}, rq: ",
-					 idx, count,
-					 i915_ggtt_offset(rq->ring->vma),
-					 rq->timeline->hwsp_offset,
-					 hwsp_seqno(rq));
-				print_request(m, rq, hdr);
-			} else {
-				drm_printf(m, "\t\tELSP[%d] idle\n", idx);
-			}
+		for (port = execlists->active; (rq = *port); port++) {
+			char hdr[80];
+
+			snprintf(hdr, sizeof(hdr),
+				 "\t\tActive[%d] ring:{start:%08x, hwsp:%08x, seqno:%08x}, rq: ",
+				 (int)(port - execlists->active),
+				 i915_ggtt_offset(rq->ring->vma),
+				 rq->timeline->hwsp_offset,
+				 hwsp_seqno(rq));
+			print_request(m, rq, hdr);
+		}
+		for (port = execlists->pending; (rq = *port); port++) {
+			char hdr[80];
+
+			snprintf(hdr, sizeof(hdr),
+				 "\t\tPending[%d] ring:{start:%08x, hwsp:%08x, seqno:%08x}, rq: ",
+				 (int)(port - execlists->pending),
+				 i915_ggtt_offset(rq->ring->vma),
+				 rq->timeline->hwsp_offset,
+				 hwsp_seqno(rq));
+			print_request(m, rq, hdr);
 		}
-		drm_printf(m, "\t\tHW active? 0x%x\n", execlists->active);
 		rcu_read_unlock();
 	} else if (INTEL_GEN(dev_priv) > 6) {
 		drm_printf(m, "\tPP_DIR_BASE: 0x%08x\n",
@@ -1581,15 +1588,19 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine)
 	}
 
 	if (engine->stats.enabled++ == 0) {
-		const struct execlist_port *port = execlists->port;
-		unsigned int num_ports = execlists_num_ports(execlists);
+		struct i915_request * const *port;
+		struct i915_request *rq;
 
 		engine->stats.enabled_at = ktime_get();
 
 		/* XXX submission method oblivious? */
-		while (num_ports-- && port_isset(port)) {
+		for (port = execlists->active; (rq = *port); port++)
 			engine->stats.active++;
-			port++;
+
+		for (port = execlists->pending; (rq = *port); port++) {
+			/* Exclude any contexts already counted in active */
+			if (intel_context_inflight_count(rq->hw_context) == 1)
+				engine->stats.active++;
 		}
 
 		if (engine->stats.active)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index b2faca8e5dec..dd0082df42cc 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -160,51 +160,10 @@ struct intel_engine_execlists {
 	 */
 	u32 __iomem *ctrl_reg;
 
-	/**
-	 * @port: execlist port states
-	 *
-	 * For each hardware ELSP (ExecList Submission Port) we keep
-	 * track of the last request and the number of times we submitted
-	 * that port to hw. We then count the number of times the hw reports
-	 * a context completion or preemption. As only one context can
-	 * be active on hw, we limit resubmission of context to port[0]. This
-	 * is called Lite Restore, of the context.
-	 */
-	struct execlist_port {
-		/**
-		 * @request_count: combined request and submission count
-		 */
-		struct i915_request *request_count;
-#define EXECLIST_COUNT_BITS 2
-#define port_request(p) ptr_mask_bits((p)->request_count, EXECLIST_COUNT_BITS)
-#define port_count(p) ptr_unmask_bits((p)->request_count, EXECLIST_COUNT_BITS)
-#define port_pack(rq, count) ptr_pack_bits(rq, count, EXECLIST_COUNT_BITS)
-#define port_unpack(p, count) ptr_unpack_bits((p)->request_count, count, EXECLIST_COUNT_BITS)
-#define port_set(p, packed) ((p)->request_count = (packed))
-#define port_isset(p) ((p)->request_count)
-#define port_index(p, execlists) ((p) - (execlists)->port)
-
-		/**
-		 * @context_id: context ID for port
-		 */
-		GEM_DEBUG_DECL(u32 context_id);
-
 #define EXECLIST_MAX_PORTS 2
-	} port[EXECLIST_MAX_PORTS];
-
-	/**
-	 * @active: is the HW active? We consider the HW as active after
-	 * submitting any context for execution and until we have seen the
-	 * last context completion event. After that, we do not expect any
-	 * more events until we submit, and so can park the HW.
-	 *
-	 * As we have a small number of different sources from which we feed
-	 * the HW, we track the state of each inside a single bitfield.
-	 */
-	unsigned int active;
-#define EXECLISTS_ACTIVE_USER 0
-#define EXECLISTS_ACTIVE_PREEMPT 1
-#define EXECLISTS_ACTIVE_HWACK 2
+	struct i915_request * const *active;
+	struct i915_request *inflight[EXECLIST_MAX_PORTS + 1 /* sentinel */];
+	struct i915_request *pending[EXECLIST_MAX_PORTS + 1];
 
 	/**
 	 * @port_mask: number of execlist ports - 1
@@ -245,11 +204,6 @@ struct intel_engine_execlists {
 	 */
 	u32 *csb_status;
 
-	/**
-	 * @preempt_complete_status: expected CSB upon completing preemption
-	 */
-	u32 preempt_complete_status;
-
 	/**
 	 * @csb_size: context status buffer FIFO size
 	 */
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index c400c66d0ee5..65c91b7db59d 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -161,6 +161,8 @@
 #define GEN8_CTX_STATUS_COMPLETED_MASK \
 	 (GEN8_CTX_STATUS_COMPLETE | GEN8_CTX_STATUS_PREEMPTED)
 
+#define CTX_DESC_FORCE_RESTORE BIT_ULL(2)
+
 /* Typical size of the average request (2 pipecontrols and a MI_BB) */
 #define EXECLISTS_REQUEST_SIZE 64 /* bytes */
 #define WA_TAIL_DWORDS 2
@@ -221,6 +223,14 @@ static void execlists_init_reg_state(u32 *reg_state,
 				     struct intel_engine_cs *engine,
 				     struct intel_ring *ring);
 
+static inline u32 intel_hws_preempt_address(struct intel_engine_cs *engine)
+{
+	return (i915_ggtt_offset(engine->status_page.vma) +
+		I915_GEM_HWS_PREEMPT_ADDR);
+}
+
+#define ring_pause(E) ((E)->status_page.addr[I915_GEM_HWS_PREEMPT])
+
 static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 {
 	return rb_entry(rb, struct i915_priolist, node);
@@ -271,12 +281,6 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
 {
 	int last_prio;
 
-	if (!engine->preempt_context)
-		return false;
-
-	if (i915_request_completed(rq))
-		return false;
-
 	/*
 	 * Check if the current priority hint merits a preemption attempt.
 	 *
@@ -338,9 +342,6 @@ __maybe_unused static inline bool
 assert_priority_queue(const struct i915_request *prev,
 		      const struct i915_request *next)
 {
-	const struct intel_engine_execlists *execlists =
-		&prev->engine->execlists;
-
 	/*
 	 * Without preemption, the prev may refer to the still active element
 	 * which we refuse to let go.
@@ -348,7 +349,7 @@ assert_priority_queue(const struct i915_request *prev,
 	 * Even with preemption, there are times when we think it is better not
 	 * to preempt and leave an ostensibly lower priority request in flight.
 	 */
-	if (port_request(execlists->port) == prev)
+	if (i915_request_is_active(prev))
 		return true;
 
 	return rq_prio(prev) >= rq_prio(next);
@@ -442,13 +443,11 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 		struct intel_engine_cs *owner;
 
 		if (i915_request_completed(rq))
-			break;
+			continue; /* XXX */
 
 		__i915_request_unsubmit(rq);
 		unwind_wa_tail(rq);
 
-		GEM_BUG_ON(rq->hw_context->inflight);
-
 		/*
 		 * Push the request back into the queue for later resubmission.
 		 * If this request is not native to this physical engine (i.e.
@@ -500,32 +499,32 @@ execlists_context_status_change(struct i915_request *rq, unsigned long status)
 				   status, rq);
 }
 
-inline void
-execlists_user_begin(struct intel_engine_execlists *execlists,
-		     const struct execlist_port *port)
+static inline struct i915_request *
+execlists_schedule_in(struct i915_request *rq, int idx)
 {
-	execlists_set_active_once(execlists, EXECLISTS_ACTIVE_USER);
-}
+	struct intel_context *ce = rq->hw_context;
+	int count;
 
-inline void
-execlists_user_end(struct intel_engine_execlists *execlists)
-{
-	execlists_clear_active(execlists, EXECLISTS_ACTIVE_USER);
-}
+	trace_i915_request_in(rq, idx);
 
-static inline void
-execlists_context_schedule_in(struct i915_request *rq)
-{
-	GEM_BUG_ON(rq->hw_context->inflight);
+	count = intel_context_inflight_count(ce);
+	if (!count) {
+		intel_context_get(ce);
+		ce->inflight = rq->engine;
+
+		execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN);
+		intel_engine_context_in(ce->inflight);
+	}
+
+	intel_context_inflight_inc(ce);
+	GEM_BUG_ON(intel_context_inflight(ce) != rq->engine);
 
-	execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN);
-	intel_engine_context_in(rq->engine);
-	rq->hw_context->inflight = rq->engine;
+	return i915_request_get(rq);
 }
 
-static void kick_siblings(struct i915_request *rq)
+static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
 {
-	struct virtual_engine *ve = to_virtual_engine(rq->hw_context->engine);
+	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
 	struct i915_request *next = READ_ONCE(ve->request);
 
 	if (next && next->execution_mask & ~rq->execution_mask)
@@ -533,29 +532,42 @@ static void kick_siblings(struct i915_request *rq)
 }
 
 static inline void
-execlists_context_schedule_out(struct i915_request *rq, unsigned long status)
+execlists_schedule_out(struct i915_request *rq)
 {
-	rq->hw_context->inflight = NULL;
-	intel_engine_context_out(rq->engine);
-	execlists_context_status_change(rq, status);
+	struct intel_context *ce = rq->hw_context;
+
+	GEM_BUG_ON(!intel_context_inflight_count(ce));
+
 	trace_i915_request_out(rq);
 
-	/*
-	 * If this is part of a virtual engine, its next request may have
-	 * been blocked waiting for access to the active context. We have
-	 * to kick all the siblings again in case we need to switch (e.g.
-	 * the next request is not runnable on this engine). Hopefully,
-	 * we will already have submitted the next request before the
-	 * tasklet runs and do not need to rebuild each virtual tree
-	 * and kick everyone again.
-	 */
-	if (rq->engine != rq->hw_context->engine)
-		kick_siblings(rq);
+	intel_context_inflight_dec(ce);
+	if (!intel_context_inflight_count(ce)) {
+		intel_engine_context_out(ce->inflight);
+		execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT);
+
+		ce->inflight = NULL;
+		intel_context_put(ce);
+
+		/*
+		 * If this is part of a virtual engine, its next request may
+		 * have been blocked waiting for access to the active context.
+		 * We have to kick all the siblings again in case we need to
+		 * switch (e.g. the next request is not runnable on this
+		 * engine). Hopefully, we will already have submitted the next
+		 * request before the tasklet runs and do not need to rebuild
+		 * each virtual tree and kick everyone again.
+		 */
+		if (rq->engine != ce->engine)
+			kick_siblings(rq, ce);
+	}
+
+	i915_request_put(rq);
 }
 
-static u64 execlists_update_context(struct i915_request *rq)
+static u64 execlists_update_context(const struct i915_request *rq)
 {
 	struct intel_context *ce = rq->hw_context;
+	u64 desc;
 
 	ce->lrc_reg_state[CTX_RING_TAIL + 1] =
 		intel_ring_set_tail(rq->ring, rq->tail);
@@ -576,7 +588,11 @@ static u64 execlists_update_context(struct i915_request *rq)
 	 * wmb).
 	 */
 	mb();
-	return ce->lrc_desc;
+
+	desc = ce->lrc_desc;
+	ce->lrc_desc &= ~CTX_DESC_FORCE_RESTORE;
+
+	return desc;
 }
 
 static inline void write_desc(struct intel_engine_execlists *execlists, u64 desc, u32 port)
@@ -590,12 +606,54 @@ static inline void write_desc(struct intel_engine_execlists *execlists, u64 desc
 	}
 }
 
+static __maybe_unused void
+trace_ports(const struct intel_engine_execlists *execlists,
+	    const char *msg,
+	    struct i915_request * const *ports)
+{
+	const struct intel_engine_cs *engine =
+		container_of(execlists, typeof(*engine), execlists);
+
+	GEM_TRACE("%s: %s { %llx:%lld%s, %llx:%lld }\n",
+		  engine->name, msg,
+		  ports[0]->fence.context,
+		  ports[0]->fence.seqno,
+		  i915_request_completed(ports[0]) ? "!" :
+		  i915_request_started(ports[0]) ? "*" :
+		  "",
+		  ports[1] ? ports[1]->fence.context : 0,
+		  ports[1] ? ports[1]->fence.seqno : 0);
+}
+
+static __maybe_unused bool
+assert_pending_valid(const struct intel_engine_execlists *execlists,
+		     const char *msg)
+{
+	struct i915_request * const *port, *rq;
+	struct intel_context *ce = NULL;
+
+	trace_ports(execlists, msg, execlists->pending);
+
+	if (execlists->pending[execlists_num_ports(execlists)])
+		return false;
+
+	for (port = execlists->pending; (rq = *port); port++) {
+		if (ce == rq->hw_context)
+			return false;
+
+		ce = rq->hw_context;
+	}
+
+	return ce;
+}
+
 static void execlists_submit_ports(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists *execlists = &engine->execlists;
-	struct execlist_port *port = execlists->port;
 	unsigned int n;
 
+	GEM_BUG_ON(!assert_pending_valid(execlists, "submit"));
+
 	/*
 	 * We can skip acquiring intel_runtime_pm_get() here as it was taken
 	 * on our behalf by the request (see i915_gem_mark_busy()) and it will
@@ -613,38 +671,16 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
 	 * of elsq entries, keep this in mind before changing the loop below.
 	 */
 	for (n = execlists_num_ports(execlists); n--; ) {
-		struct i915_request *rq;
-		unsigned int count;
-		u64 desc;
+		struct i915_request *rq = execlists->pending[n];
 
-		rq = port_unpack(&port[n], &count);
-		if (rq) {
-			GEM_BUG_ON(count > !n);
-			if (!count++)
-				execlists_context_schedule_in(rq);
-			port_set(&port[n], port_pack(rq, count));
-			desc = execlists_update_context(rq);
-			GEM_DEBUG_EXEC(port[n].context_id = upper_32_bits(desc));
-
-			GEM_TRACE("%s in[%d]:  ctx=%d.%d, fence %llx:%lld (current %d), prio=%d\n",
-				  engine->name, n,
-				  port[n].context_id, count,
-				  rq->fence.context, rq->fence.seqno,
-				  hwsp_seqno(rq),
-				  rq_prio(rq));
-		} else {
-			GEM_BUG_ON(!n);
-			desc = 0;
-		}
-
-		write_desc(execlists, desc, n);
+		write_desc(execlists,
+			   rq ? execlists_update_context(rq) : 0,
+			   n);
 	}
 
 	/* we need to manually load the submit queue */
 	if (execlists->ctrl_reg)
 		writel(EL_CTRL_LOAD, execlists->ctrl_reg);
-
-	execlists_clear_active(execlists, EXECLISTS_ACTIVE_HWACK);
 }
 
 static bool ctx_single_port_submission(const struct intel_context *ce)
@@ -668,6 +704,7 @@ static bool can_merge_ctx(const struct intel_context *prev,
 static bool can_merge_rq(const struct i915_request *prev,
 			 const struct i915_request *next)
 {
+	GEM_BUG_ON(prev == next);
 	GEM_BUG_ON(!assert_priority_queue(prev, next));
 
 	if (!can_merge_ctx(prev->hw_context, next->hw_context))
@@ -676,58 +713,6 @@ static bool can_merge_rq(const struct i915_request *prev,
 	return true;
 }
 
-static void port_assign(struct execlist_port *port, struct i915_request *rq)
-{
-	GEM_BUG_ON(rq == port_request(port));
-
-	if (port_isset(port))
-		i915_request_put(port_request(port));
-
-	port_set(port, port_pack(i915_request_get(rq), port_count(port)));
-}
-
-static void inject_preempt_context(struct intel_engine_cs *engine)
-{
-	struct intel_engine_execlists *execlists = &engine->execlists;
-	struct intel_context *ce = engine->preempt_context;
-	unsigned int n;
-
-	GEM_BUG_ON(execlists->preempt_complete_status !=
-		   upper_32_bits(ce->lrc_desc));
-
-	/*
-	 * Switch to our empty preempt context so
-	 * the state of the GPU is known (idle).
-	 */
-	GEM_TRACE("%s\n", engine->name);
-	for (n = execlists_num_ports(execlists); --n; )
-		write_desc(execlists, 0, n);
-
-	write_desc(execlists, ce->lrc_desc, n);
-
-	/* we need to manually load the submit queue */
-	if (execlists->ctrl_reg)
-		writel(EL_CTRL_LOAD, execlists->ctrl_reg);
-
-	execlists_clear_active(execlists, EXECLISTS_ACTIVE_HWACK);
-	execlists_set_active(execlists, EXECLISTS_ACTIVE_PREEMPT);
-
-	(void)I915_SELFTEST_ONLY(execlists->preempt_hang.count++);
-}
-
-static void complete_preempt_context(struct intel_engine_execlists *execlists)
-{
-	GEM_BUG_ON(!execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT));
-
-	if (inject_preempt_hang(execlists))
-		return;
-
-	execlists_cancel_port_requests(execlists);
-	__unwind_incomplete_requests(container_of(execlists,
-						  struct intel_engine_cs,
-						  execlists));
-}
-
 static void virtual_update_register_offsets(u32 *regs,
 					    struct intel_engine_cs *engine)
 {
@@ -792,7 +777,7 @@ static bool virtual_matches(const struct virtual_engine *ve,
 	 * we reuse the register offsets). This is a very small
 	 * hysteresis on the greedy selection algorithm.
 	 */
-	inflight = READ_ONCE(ve->context.inflight);
+	inflight = intel_context_inflight(&ve->context);
 	if (inflight && inflight != engine)
 		return false;
 
@@ -815,13 +800,23 @@ static void virtual_xfer_breadcrumbs(struct virtual_engine *ve,
 	spin_unlock(&old->breadcrumbs.irq_lock);
 }
 
+static struct i915_request *
+last_active(const struct intel_engine_execlists *execlists)
+{
+	struct i915_request * const *last = execlists->active;
+
+	while (*last && i915_request_completed(*last))
+		last++;
+
+	return *last;
+}
+
 static void execlists_dequeue(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct execlist_port *port = execlists->port;
-	const struct execlist_port * const last_port =
-		&execlists->port[execlists->port_mask];
-	struct i915_request *last = port_request(port);
+	struct i915_request **port = execlists->pending;
+	struct i915_request ** const last_port = port + execlists->port_mask;
+	struct i915_request *last;
 	struct rb_node *rb;
 	bool submit = false;
 
@@ -867,65 +862,72 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		break;
 	}
 
+	/*
+	 * If the queue is higher priority than the last
+	 * request in the currently active context, submit afresh.
+	 * We will resubmit again afterwards in case we need to split
+	 * the active context to interject the preemption request,
+	 * i.e. we will retrigger preemption following the ack in case
+	 * of trouble.
+	 */
+	last = last_active(execlists);
 	if (last) {
-		/*
-		 * Don't resubmit or switch until all outstanding
-		 * preemptions (lite-restore) are seen. Then we
-		 * know the next preemption status we see corresponds
-		 * to this ELSP update.
-		 */
-		GEM_BUG_ON(!execlists_is_active(execlists,
-						EXECLISTS_ACTIVE_USER));
-		GEM_BUG_ON(!port_count(&port[0]));
-
-		/*
-		 * If we write to ELSP a second time before the HW has had
-		 * a chance to respond to the previous write, we can confuse
-		 * the HW and hit "undefined behaviour". After writing to ELSP,
-		 * we must then wait until we see a context-switch event from
-		 * the HW to indicate that it has had a chance to respond.
-		 */
-		if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK))
-			return;
-
 		if (need_preempt(engine, last, rb)) {
-			inject_preempt_context(engine);
-			return;
-		}
+			GEM_TRACE("%s: preempting last=%llx:%lld, prio=%d, hint=%d\n",
+				  engine->name,
+				  last->fence.context,
+				  last->fence.seqno,
+				  last->sched.attr.priority,
+				  execlists->queue_priority_hint);
+			/*
+			 * Don't let the RING_HEAD advance past the breadcrumb
+			 * as we unwind (and until we resubmit) so that we do
+			 * not accidentally tell it to go backwards.
+			 */
+			ring_pause(engine) = 1;
 
-		/*
-		 * In theory, we could coalesce more requests onto
-		 * the second port (the first port is active, with
-		 * no preemptions pending). However, that means we
-		 * then have to deal with the possible lite-restore
-		 * of the second port (as we submit the ELSP, there
-		 * may be a context-switch) but also we may complete
-		 * the resubmission before the context-switch. Ergo,
-		 * coalescing onto the second port will cause a
-		 * preemption event, but we cannot predict whether
-		 * that will affect port[0] or port[1].
-		 *
-		 * If the second port is already active, we can wait
-		 * until the next context-switch before contemplating
-		 * new requests. The GPU will be busy and we should be
-		 * able to resubmit the new ELSP before it idles,
-		 * avoiding pipeline bubbles (momentary pauses where
-		 * the driver is unable to keep up the supply of new
-		 * work). However, we have to double check that the
-		 * priorities of the ports haven't been switch.
-		 */
-		if (port_count(&port[1]))
-			return;
+			/*
+			 * Note that we have not stopped the GPU at this point,
+			 * so we are unwinding the incomplete requests as they
+			 * remain inflight and so by the time we do complete
+			 * the preemption, some of the unwound requests may
+			 * complete!
+			 */
+			__unwind_incomplete_requests(engine);
 
-		/*
-		 * WaIdleLiteRestore:bdw,skl
-		 * Apply the wa NOOPs to prevent
-		 * ring:HEAD == rq:TAIL as we resubmit the
-		 * request. See gen8_emit_fini_breadcrumb() for
-		 * where we prepare the padding after the
-		 * end of the request.
-		 */
-		last->tail = last->wa_tail;
+			/*
+			 * If we need to return to the preempted context, we
+			 * need to skip the lite-restore and force it to
+			 * reload the RING_TAIL. Otherwise, the HW has a
+			 * tendency to ignore us rewinding the TAIL to the
+			 * end of an earlier request.
+			 */
+			last->hw_context->lrc_desc |= CTX_DESC_FORCE_RESTORE;
+			last = NULL;
+		} else {
+			/*
+			 * Otherwise if we already have a request pending
+			 * for execution after the current one, we can
+			 * just wait until the next CS event before
+			 * queuing more. In either case we will force a
+			 * lite-restore preemption event, but if we wait
+			 * we hopefully coalesce several updates into a single
+			 * submission.
+			 */
+			if (!list_is_last(&last->sched.link,
+					  &engine->active.requests))
+				return;
+
+			/*
+			 * WaIdleLiteRestore:bdw,skl
+			 * Apply the wa NOOPs to prevent
+			 * ring:HEAD == rq:TAIL as we resubmit the
+			 * request. See gen8_emit_fini_breadcrumb() for
+			 * where we prepare the padding after the
+			 * end of the request.
+			 */
+			last->tail = last->wa_tail;
+		}
 	}
 
 	while (rb) { /* XXX virtual is always taking precedence */
@@ -955,9 +957,24 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 				continue;
 			}
 
+			if (i915_request_completed(rq)) {
+				ve->request = NULL;
+				ve->base.execlists.queue_priority_hint = INT_MIN;
+				rb_erase_cached(rb, &execlists->virtual);
+				RB_CLEAR_NODE(rb);
+
+				rq->engine = engine;
+				__i915_request_submit(rq);
+
+				spin_unlock(&ve->base.active.lock);
+
+				rb = rb_first_cached(&execlists->virtual);
+				continue;
+			}
+
 			if (last && !can_merge_rq(last, rq)) {
 				spin_unlock(&ve->base.active.lock);
-				return; /* leave this rq for another engine */
+				return; /* leave this for another */
 			}
 
 			GEM_TRACE("%s: virtual rq=%llx:%lld%s, new engine? %s\n",
@@ -1006,9 +1023,10 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			}
 
 			__i915_request_submit(rq);
-			trace_i915_request_in(rq, port_index(port, execlists));
-			submit = true;
-			last = rq;
+			if (!i915_request_completed(rq)) {
+				submit = true;
+				last = rq;
+			}
 		}
 
 		spin_unlock(&ve->base.active.lock);
@@ -1021,6 +1039,9 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		int i;
 
 		priolist_for_each_request_consume(rq, rn, p, i) {
+			if (i915_request_completed(rq))
+				goto skip;
+
 			/*
 			 * Can we combine this request with the current port?
 			 * It has to be the same context/ringbuffer and not
@@ -1060,19 +1081,14 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 				    ctx_single_port_submission(rq->hw_context))
 					goto done;
 
-
-				if (submit)
-					port_assign(port, last);
+				*port = execlists_schedule_in(last, port - execlists->pending);
 				port++;
-
-				GEM_BUG_ON(port_isset(port));
 			}
 
-			__i915_request_submit(rq);
-			trace_i915_request_in(rq, port_index(port, execlists));
-
 			last = rq;
 			submit = true;
+skip:
+			__i915_request_submit(rq);
 		}
 
 		rb_erase_cached(&p->node, &execlists->queue);
@@ -1097,54 +1113,30 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * interrupt for secondary ports).
 	 */
 	execlists->queue_priority_hint = queue_prio(execlists);
+	GEM_TRACE("%s: queue_priority_hint:%d, submit:%s\n",
+		  engine->name, execlists->queue_priority_hint,
+		  yesno(submit));
 
 	if (submit) {
-		port_assign(port, last);
+		*port = execlists_schedule_in(last, port - execlists->pending);
+		memset(port + 1, 0, (last_port - port) * sizeof(*port));
 		execlists_submit_ports(engine);
 	}
-
-	/* We must always keep the beast fed if we have work piled up */
-	GEM_BUG_ON(rb_first_cached(&execlists->queue) &&
-		   !port_isset(execlists->port));
-
-	/* Re-evaluate the executing context setup after each preemptive kick */
-	if (last)
-		execlists_user_begin(execlists, execlists->port);
-
-	/* If the engine is now idle, so should be the flag; and vice versa. */
-	GEM_BUG_ON(execlists_is_active(&engine->execlists,
-				       EXECLISTS_ACTIVE_USER) ==
-		   !port_isset(engine->execlists.port));
 }
 
 void
 execlists_cancel_port_requests(struct intel_engine_execlists * const execlists)
 {
-	struct execlist_port *port = execlists->port;
-	unsigned int num_ports = execlists_num_ports(execlists);
-
-	while (num_ports-- && port_isset(port)) {
-		struct i915_request *rq = port_request(port);
-
-		GEM_TRACE("%s:port%u fence %llx:%lld, (current %d)\n",
-			  rq->engine->name,
-			  (unsigned int)(port - execlists->port),
-			  rq->fence.context, rq->fence.seqno,
-			  hwsp_seqno(rq));
+	struct i915_request * const *port, *rq;
 
-		GEM_BUG_ON(!execlists->active);
-		execlists_context_schedule_out(rq,
-					       i915_request_completed(rq) ?
-					       INTEL_CONTEXT_SCHEDULE_OUT :
-					       INTEL_CONTEXT_SCHEDULE_PREEMPTED);
+	for (port = execlists->pending; (rq = *port); port++)
+		execlists_schedule_out(rq);
+	memset(execlists->pending, 0, sizeof(execlists->pending));
 
-		i915_request_put(rq);
-
-		memset(port, 0, sizeof(*port));
-		port++;
-	}
-
-	execlists_clear_all_active(execlists);
+	for (port = execlists->active; (rq = *port); port++)
+		execlists_schedule_out(rq);
+	execlists->active =
+		memset(execlists->inflight, 0, sizeof(execlists->inflight));
 }
 
 static inline void
@@ -1163,7 +1155,6 @@ reset_in_progress(const struct intel_engine_execlists *execlists)
 static void process_csb(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct execlist_port *port = execlists->port;
 	const u32 * const buf = execlists->csb_status;
 	const u8 num_entries = execlists->csb_size;
 	u8 head, tail;
@@ -1197,9 +1188,7 @@ static void process_csb(struct intel_engine_cs *engine)
 	rmb();
 
 	do {
-		struct i915_request *rq;
 		unsigned int status;
-		unsigned int count;
 
 		if (++head == num_entries)
 			head = 0;
@@ -1222,68 +1211,37 @@ static void process_csb(struct intel_engine_cs *engine)
 		 * status notifier.
 		 */
 
-		GEM_TRACE("%s csb[%d]: status=0x%08x:0x%08x, active=0x%x\n",
+		GEM_TRACE("%s csb[%d]: status=0x%08x:0x%08x\n",
 			  engine->name, head,
-			  buf[2 * head + 0], buf[2 * head + 1],
-			  execlists->active);
+			  buf[2 * head + 0], buf[2 * head + 1]);
 
 		status = buf[2 * head];
-		if (status & (GEN8_CTX_STATUS_IDLE_ACTIVE |
-			      GEN8_CTX_STATUS_PREEMPTED))
-			execlists_set_active(execlists,
-					     EXECLISTS_ACTIVE_HWACK);
-		if (status & GEN8_CTX_STATUS_ACTIVE_IDLE)
-			execlists_clear_active(execlists,
-					       EXECLISTS_ACTIVE_HWACK);
-
-		if (!(status & GEN8_CTX_STATUS_COMPLETED_MASK))
-			continue;
-
-		/* We should never get a COMPLETED | IDLE_ACTIVE! */
-		GEM_BUG_ON(status & GEN8_CTX_STATUS_IDLE_ACTIVE);
+		if (status & GEN8_CTX_STATUS_IDLE_ACTIVE) {
+promote:
+			GEM_BUG_ON(!assert_pending_valid(execlists, "promote"));
+			execlists->active =
+				memcpy(execlists->inflight,
+				       execlists->pending,
+				       execlists_num_ports(execlists) *
+				       sizeof(*execlists->pending));
+			execlists->pending[0] = NULL;
 
-		if (status & GEN8_CTX_STATUS_COMPLETE &&
-		    buf[2*head + 1] == execlists->preempt_complete_status) {
-			GEM_TRACE("%s preempt-idle\n", engine->name);
-			complete_preempt_context(execlists);
-			continue;
-		}
+			if (!inject_preempt_hang(execlists))
+				ring_pause(engine) = 0;
+		} else if (status & GEN8_CTX_STATUS_PREEMPTED) {
+			struct i915_request * const *port = execlists->active;
 
-		if (status & GEN8_CTX_STATUS_PREEMPTED &&
-		    execlists_is_active(execlists,
-					EXECLISTS_ACTIVE_PREEMPT))
-			continue;
+			trace_ports(execlists, "preempted", execlists->active);
 
-		GEM_BUG_ON(!execlists_is_active(execlists,
-						EXECLISTS_ACTIVE_USER));
+			while (*port)
+				execlists_schedule_out(*port++);
 
-		rq = port_unpack(port, &count);
-		GEM_TRACE("%s out[0]: ctx=%d.%d, fence %llx:%lld (current %d), prio=%d\n",
-			  engine->name,
-			  port->context_id, count,
-			  rq ? rq->fence.context : 0,
-			  rq ? rq->fence.seqno : 0,
-			  rq ? hwsp_seqno(rq) : 0,
-			  rq ? rq_prio(rq) : 0);
+			goto promote;
+		} else if (*execlists->active) {
+			struct i915_request *rq = *execlists->active++;
 
-		/* Check the context/desc id for this event matches */
-		GEM_DEBUG_BUG_ON(buf[2 * head + 1] != port->context_id);
-
-		GEM_BUG_ON(count == 0);
-		if (--count == 0) {
-			/*
-			 * On the final event corresponding to the
-			 * submission of this context, we expect either
-			 * an element-switch event or a completion
-			 * event (and on completion, the active-idle
-			 * marker). No more preemptions, lite-restore
-			 * or otherwise.
-			 */
-			GEM_BUG_ON(status & GEN8_CTX_STATUS_PREEMPTED);
-			GEM_BUG_ON(port_isset(&port[1]) &&
-				   !(status & GEN8_CTX_STATUS_ELEMENT_SWITCH));
-			GEM_BUG_ON(!port_isset(&port[1]) &&
-				   !(status & GEN8_CTX_STATUS_ACTIVE_IDLE));
+			trace_ports(execlists, "completed",
+				    execlists->active - 1);
 
 			/*
 			 * We rely on the hardware being strongly
@@ -1292,21 +1250,10 @@ static void process_csb(struct intel_engine_cs *engine)
 			 * user interrupt and CSB is processed.
 			 */
 			GEM_BUG_ON(!i915_request_completed(rq));
+			execlists_schedule_out(rq);
 
-			execlists_context_schedule_out(rq,
-						       INTEL_CONTEXT_SCHEDULE_OUT);
-			i915_request_put(rq);
-
-			GEM_TRACE("%s completed ctx=%d\n",
-				  engine->name, port->context_id);
-
-			port = execlists_port_complete(execlists, port);
-			if (port_isset(port))
-				execlists_user_begin(execlists, port);
-			else
-				execlists_user_end(execlists);
-		} else {
-			port_set(port, port_pack(rq, count));
+			GEM_BUG_ON(execlists->active - execlists->inflight >
+				   execlists_num_ports(execlists));
 		}
 	} while (head != tail);
 
@@ -1331,7 +1278,7 @@ static void __execlists_submission_tasklet(struct intel_engine_cs *const engine)
 	lockdep_assert_held(&engine->active.lock);
 
 	process_csb(engine);
-	if (!execlists_is_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT))
+	if (!engine->execlists.pending[0])
 		execlists_dequeue(engine);
 }
 
@@ -1344,11 +1291,6 @@ static void execlists_submission_tasklet(unsigned long data)
 	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
 	unsigned long flags;
 
-	GEM_TRACE("%s awake?=%d, active=%x\n",
-		  engine->name,
-		  !!intel_wakeref_active(&engine->wakeref),
-		  engine->execlists.active);
-
 	spin_lock_irqsave(&engine->active.lock, flags);
 	__execlists_submission_tasklet(engine);
 	spin_unlock_irqrestore(&engine->active.lock, flags);
@@ -1375,12 +1317,16 @@ static void __submit_queue_imm(struct intel_engine_cs *engine)
 		tasklet_hi_schedule(&execlists->tasklet);
 }
 
-static void submit_queue(struct intel_engine_cs *engine, int prio)
+static void submit_queue(struct intel_engine_cs *engine,
+			 const struct i915_request *rq)
 {
-	if (prio > engine->execlists.queue_priority_hint) {
-		engine->execlists.queue_priority_hint = prio;
-		__submit_queue_imm(engine);
-	}
+	struct intel_engine_execlists *execlists = &engine->execlists;
+
+	if (rq_prio(rq) <= execlists->queue_priority_hint)
+		return;
+
+	execlists->queue_priority_hint = rq_prio(rq);
+	__submit_queue_imm(engine);
 }
 
 static void execlists_submit_request(struct i915_request *request)
@@ -1396,7 +1342,7 @@ static void execlists_submit_request(struct i915_request *request)
 	GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
 	GEM_BUG_ON(list_empty(&request->sched.link));
 
-	submit_queue(engine, rq_prio(request));
+	submit_queue(engine, request);
 
 	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
@@ -2053,27 +1999,13 @@ static void execlists_reset_prepare(struct intel_engine_cs *engine)
 	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
-static bool lrc_regs_ok(const struct i915_request *rq)
-{
-	const struct intel_ring *ring = rq->ring;
-	const u32 *regs = rq->hw_context->lrc_reg_state;
-
-	/* Quick spot check for the common signs of context corruption */
-
-	if (regs[CTX_RING_BUFFER_CONTROL + 1] !=
-	    (RING_CTL_SIZE(ring->size) | RING_VALID))
-		return false;
-
-	if (regs[CTX_RING_BUFFER_START + 1] != i915_ggtt_offset(ring->vma))
-		return false;
-
-	return true;
-}
-
-static void reset_csb_pointers(struct intel_engine_execlists *execlists)
+static void reset_csb_pointers(struct intel_engine_cs *engine)
 {
+	struct intel_engine_execlists * const execlists = &engine->execlists;
 	const unsigned int reset_value = execlists->csb_size - 1;
 
+	ring_pause(engine) = 0;
+
 	/*
 	 * After a reset, the HW starts writing into CSB entry [0]. We
 	 * therefore have to set our HEAD pointer back one entry so that
@@ -2120,18 +2052,19 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
 	process_csb(engine); /* drain preemption events */
 
 	/* Following the reset, we need to reload the CSB read/write pointers */
-	reset_csb_pointers(&engine->execlists);
+	reset_csb_pointers(engine);
 
 	/*
 	 * Save the currently executing context, even if we completed
 	 * its request, it was still running at the time of the
 	 * reset and will have been clobbered.
 	 */
-	if (!port_isset(execlists->port))
-		goto out_clear;
+	rq = execlists_active(execlists);
+	if (!rq)
+		return;
 
-	rq = port_request(execlists->port);
 	ce = rq->hw_context;
+	rq = active_request(rq);
 
 	/*
 	 * Catch up with any missed context-switch interrupts.
@@ -2144,9 +2077,12 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
 	 */
 	execlists_cancel_port_requests(execlists);
 
-	rq = active_request(rq);
-	if (!rq)
+	if (!rq) {
+		ce->ring->head = ce->ring->tail;
 		goto out_replay;
+	}
+
+	ce->ring->head = intel_ring_wrap(ce->ring, rq->head);
 
 	/*
 	 * If this request hasn't started yet, e.g. it is waiting on a
@@ -2160,7 +2096,7 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
 	 * Otherwise, if we have not started yet, the request should replay
 	 * perfectly and we do not need to flag the result as being erroneous.
 	 */
-	if (!i915_request_started(rq) && lrc_regs_ok(rq))
+	if (!i915_request_started(rq))
 		goto out_replay;
 
 	/*
@@ -2175,7 +2111,7 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
 	 * image back to the expected values to skip over the guilty request.
 	 */
 	i915_reset_request(rq, stalled);
-	if (!stalled && lrc_regs_ok(rq))
+	if (!stalled)
 		goto out_replay;
 
 	/*
@@ -2195,17 +2131,13 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
 	execlists_init_reg_state(regs, ce, engine, ce->ring);
 
 out_replay:
-	/* Rerun the request; its payload has been neutered (if guilty). */
-	ce->ring->head =
-		rq ? intel_ring_wrap(ce->ring, rq->head) : ce->ring->tail;
+	GEM_TRACE("%s replay {head:%04x, tail:%04x\n",
+		  engine->name, ce->ring->head, ce->ring->tail);
 	intel_ring_update_space(ce->ring);
 	__execlists_update_reg_state(ce, engine);
 
 	/* Push back any incomplete requests for replay after the reset. */
 	__unwind_incomplete_requests(engine);
-
-out_clear:
-	execlists_clear_all_active(execlists);
 }
 
 static void execlists_reset(struct intel_engine_cs *engine, bool stalled)
@@ -2301,7 +2233,6 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
 
 	execlists->queue_priority_hint = INT_MIN;
 	execlists->queue = RB_ROOT_CACHED;
-	GEM_BUG_ON(port_isset(execlists->port));
 
 	GEM_BUG_ON(__tasklet_is_enabled(&execlists->tasklet));
 	execlists->tasklet.func = nop_submission_tasklet;
@@ -2519,15 +2450,29 @@ static u32 *gen8_emit_wa_tail(struct i915_request *request, u32 *cs)
 	return cs;
 }
 
+static u32 *emit_preempt_busywait(struct i915_request *request, u32 *cs)
+{
+	*cs++ = MI_SEMAPHORE_WAIT |
+		MI_SEMAPHORE_GLOBAL_GTT |
+		MI_SEMAPHORE_POLL |
+		MI_SEMAPHORE_SAD_EQ_SDD;
+	*cs++ = 0;
+	*cs++ = intel_hws_preempt_address(request->engine);
+	*cs++ = 0;
+
+	return cs;
+}
+
 static u32 *gen8_emit_fini_breadcrumb(struct i915_request *request, u32 *cs)
 {
 	cs = gen8_emit_ggtt_write(cs,
 				  request->fence.seqno,
 				  request->timeline->hwsp_offset,
 				  0);
-
 	*cs++ = MI_USER_INTERRUPT;
+
 	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+	cs = emit_preempt_busywait(request, cs);
 
 	request->tail = intel_ring_offset(request, cs);
 	assert_ring_tail_valid(request->ring, request->tail);
@@ -2548,9 +2493,10 @@ static u32 *gen8_emit_fini_breadcrumb_rcs(struct i915_request *request, u32 *cs)
 				    PIPE_CONTROL_FLUSH_ENABLE |
 				    PIPE_CONTROL_CS_STALL,
 				    0);
-
 	*cs++ = MI_USER_INTERRUPT;
+
 	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+	cs = emit_preempt_busywait(request, cs);
 
 	request->tail = intel_ring_offset(request, cs);
 	assert_ring_tail_valid(request->ring, request->tail);
@@ -2599,8 +2545,7 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
 	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
 	if (!intel_vgpu_active(engine->i915))
 		engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
-	if (engine->preempt_context &&
-	    HAS_LOGICAL_RING_PREEMPTION(engine->i915))
+	if (HAS_LOGICAL_RING_PREEMPTION(engine->i915))
 		engine->flags |= I915_ENGINE_HAS_PREEMPTION;
 }
 
@@ -2723,11 +2668,6 @@ int intel_execlists_submission_init(struct intel_engine_cs *engine)
 			i915_mmio_reg_offset(RING_ELSP(base));
 	}
 
-	execlists->preempt_complete_status = ~0u;
-	if (engine->preempt_context)
-		execlists->preempt_complete_status =
-			upper_32_bits(engine->preempt_context->lrc_desc);
-
 	execlists->csb_status =
 		&engine->status_page.addr[I915_HWS_CSB_BUF0_INDEX];
 
@@ -2739,7 +2679,7 @@ int intel_execlists_submission_init(struct intel_engine_cs *engine)
 	else
 		execlists->csb_size = GEN11_CSB_ENTRIES;
 
-	reset_csb_pointers(execlists);
+	reset_csb_pointers(engine);
 
 	return 0;
 }
@@ -2922,11 +2862,6 @@ populate_lr_context(struct intel_context *ce,
 	if (!engine->default_state)
 		regs[CTX_CONTEXT_CONTROL + 1] |=
 			_MASKED_BIT_ENABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT);
-	if (ce->gem_context == engine->i915->preempt_context &&
-	    INTEL_GEN(engine->i915) < 11)
-		regs[CTX_CONTEXT_CONTROL + 1] |=
-			_MASKED_BIT_ENABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT |
-					   CTX_CTRL_ENGINE_CTX_SAVE_INHIBIT);
 
 	ret = 0;
 err_unpin_ctx:
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index f411e3244208..01e1af2a651f 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1247,10 +1247,10 @@ static void error_record_engine_registers(struct i915_gpu_state *error,
 	}
 }
 
-static void record_request(struct i915_request *request,
+static void record_request(const struct i915_request *request,
 			   struct drm_i915_error_request *erq)
 {
-	struct i915_gem_context *ctx = request->gem_context;
+	const struct i915_gem_context *ctx = request->gem_context;
 
 	erq->flags = request->fence.flags;
 	erq->context = request->fence.context;
@@ -1314,20 +1314,15 @@ static void engine_record_requests(struct intel_engine_cs *engine,
 	ee->num_requests = count;
 }
 
-static void error_record_engine_execlists(struct intel_engine_cs *engine,
+static void error_record_engine_execlists(const struct intel_engine_cs *engine,
 					  struct drm_i915_error_engine *ee)
 {
 	const struct intel_engine_execlists * const execlists = &engine->execlists;
-	unsigned int n;
+	struct i915_request * const *port = execlists->active;
+	unsigned int n = 0;
 
-	for (n = 0; n < execlists_num_ports(execlists); n++) {
-		struct i915_request *rq = port_request(&execlists->port[n]);
-
-		if (!rq)
-			break;
-
-		record_request(rq, &ee->execlist[n]);
-	}
+	while (*port)
+		record_request(*port++, &ee->execlist[n++]);
 
 	ee->num_ports = n;
 }
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 6d71eefd6c9f..ead68a2ffeac 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -276,6 +276,12 @@ static bool i915_request_retire(struct i915_request *rq)
 
 	local_irq_disable();
 
+	/*
+	 * We only loosely track inflight requests across preemption,
+	 * and so we may find ourselves attempting to retire a _completed_
+	 * request that we have removed from the HW and put back on a run
+	 * queue.
+	 */
 	spin_lock(&rq->engine->active.lock);
 	list_del(&rq->sched.link);
 	spin_unlock(&rq->engine->active.lock);
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index edbbdfec24ab..bebc1e9b4a5e 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -28,6 +28,7 @@
 #include <linux/dma-fence.h>
 #include <linux/lockdep.h>
 
+#include "gt/intel_context_types.h"
 #include "gt/intel_engine_types.h"
 
 #include "i915_gem.h"
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 2e9b38bdc33c..b1ba3e65cd52 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -179,8 +179,7 @@ static inline int rq_prio(const struct i915_request *rq)
 
 static void kick_submission(struct intel_engine_cs *engine, int prio)
 {
-	const struct i915_request *inflight =
-		port_request(engine->execlists.port);
+	const struct i915_request *inflight = *engine->execlists.active;
 
 	/*
 	 * If we are already the currently executing context, don't
diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index 2987219a6300..4920ff9aba62 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -131,6 +131,18 @@ __check_struct_size(size_t base, size_t arr, size_t count, size_t *size)
 	((typeof(ptr))((unsigned long)(ptr) | __bits));			\
 })
 
+#define ptr_count_dec(p_ptr) do {					\
+	typeof(p_ptr) __p = (p_ptr);					\
+	unsigned long __v = (unsigned long)(*__p);			\
+	*__p = (typeof(*p_ptr))(--__v);					\
+} while (0)
+
+#define ptr_count_inc(p_ptr) do {					\
+	typeof(p_ptr) __p = (p_ptr);					\
+	unsigned long __v = (unsigned long)(*__p);			\
+	*__p = (typeof(*p_ptr))(++__v);					\
+} while (0)
+
 #define page_mask_bits(ptr) ptr_mask_bits(ptr, PAGE_SHIFT)
 #define page_unmask_bits(ptr) ptr_unmask_bits(ptr, PAGE_SHIFT)
 #define page_pack_bits(ptr, bits) ptr_pack_bits(ptr, bits, PAGE_SHIFT)
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index db531ebc7704..12c22359fdac 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -32,7 +32,11 @@
 #include "intel_guc_submission.h"
 #include "i915_drv.h"
 
-#define GUC_PREEMPT_FINISHED		0x1
+enum {
+	GUC_PREEMPT_NONE = 0,
+	GUC_PREEMPT_INPROGRESS,
+	GUC_PREEMPT_FINISHED,
+};
 #define GUC_PREEMPT_BREADCRUMB_DWORDS	0x8
 #define GUC_PREEMPT_BREADCRUMB_BYTES	\
 	(sizeof(u32) * GUC_PREEMPT_BREADCRUMB_DWORDS)
@@ -537,15 +541,11 @@ static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	u32 ctx_desc = lower_32_bits(rq->hw_context->lrc_desc);
 	u32 ring_tail = intel_ring_set_tail(rq->ring, rq->tail) / sizeof(u64);
 
-	spin_lock(&client->wq_lock);
-
 	guc_wq_item_append(client, engine->guc_id, ctx_desc,
 			   ring_tail, rq->fence.seqno);
 	guc_ring_doorbell(client);
 
 	client->submissions[engine->id] += 1;
-
-	spin_unlock(&client->wq_lock);
 }
 
 /*
@@ -631,8 +631,9 @@ static void inject_preempt_context(struct work_struct *work)
 	data[6] = intel_guc_ggtt_offset(guc, guc->shared_data);
 
 	if (WARN_ON(intel_guc_send(guc, data, ARRAY_SIZE(data)))) {
-		execlists_clear_active(&engine->execlists,
-				       EXECLISTS_ACTIVE_PREEMPT);
+		intel_write_status_page(engine,
+					I915_GEM_HWS_PREEMPT,
+					GUC_PREEMPT_NONE);
 		tasklet_schedule(&engine->execlists.tasklet);
 	}
 
@@ -672,8 +673,6 @@ static void complete_preempt_context(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists *execlists = &engine->execlists;
 
-	GEM_BUG_ON(!execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT));
-
 	if (inject_preempt_hang(execlists))
 		return;
 
@@ -681,89 +680,90 @@ static void complete_preempt_context(struct intel_engine_cs *engine)
 	execlists_unwind_incomplete_requests(execlists);
 
 	wait_for_guc_preempt_report(engine);
-	intel_write_status_page(engine, I915_GEM_HWS_PREEMPT, 0);
+	intel_write_status_page(engine, I915_GEM_HWS_PREEMPT, GUC_PREEMPT_NONE);
 }
 
-/**
- * guc_submit() - Submit commands through GuC
- * @engine: engine associated with the commands
- *
- * The only error here arises if the doorbell hardware isn't functioning
- * as expected, which really shouln't happen.
- */
-static void guc_submit(struct intel_engine_cs *engine)
+static void guc_submit(struct intel_engine_cs *engine,
+		       struct i915_request **out,
+		       struct i915_request **end)
 {
 	struct intel_guc *guc = &engine->i915->guc;
-	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct execlist_port *port = execlists->port;
-	unsigned int n;
+	struct intel_guc_client *client = guc->execbuf_client;
 
-	for (n = 0; n < execlists_num_ports(execlists); n++) {
-		struct i915_request *rq;
-		unsigned int count;
+	spin_lock(&client->wq_lock);
 
-		rq = port_unpack(&port[n], &count);
-		if (rq && count == 0) {
-			port_set(&port[n], port_pack(rq, ++count));
+	do {
+		struct i915_request *rq = *out++;
 
-			flush_ggtt_writes(rq->ring->vma);
+		flush_ggtt_writes(rq->ring->vma);
+		guc_add_request(guc, rq);
+	} while (out != end);
 
-			guc_add_request(guc, rq);
-		}
-	}
+	spin_unlock(&client->wq_lock);
 }
 
-static void port_assign(struct execlist_port *port, struct i915_request *rq)
+static inline int rq_prio(const struct i915_request *rq)
 {
-	GEM_BUG_ON(port_isset(port));
-
-	port_set(port, i915_request_get(rq));
+	return rq->sched.attr.priority | __NO_PREEMPTION;
 }
 
-static inline int rq_prio(const struct i915_request *rq)
+static struct i915_request *schedule_in(struct i915_request *rq, int idx)
 {
-	return rq->sched.attr.priority;
+	trace_i915_request_in(rq, idx);
+
+	if (!rq->hw_context->inflight)
+		rq->hw_context->inflight = rq->engine;
+	intel_context_inflight_inc(rq->hw_context);
+
+	return i915_request_get(rq);
 }
 
-static inline int port_prio(const struct execlist_port *port)
+static void schedule_out(struct i915_request *rq)
 {
-	return rq_prio(port_request(port)) | __NO_PREEMPTION;
+	trace_i915_request_out(rq);
+
+	intel_context_inflight_dec(rq->hw_context);
+	if (!intel_context_inflight_count(rq->hw_context))
+		rq->hw_context->inflight = NULL;
+
+	i915_request_put(rq);
 }
 
-static bool __guc_dequeue(struct intel_engine_cs *engine)
+static void __guc_dequeue(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct execlist_port *port = execlists->port;
-	struct i915_request *last = NULL;
-	const struct execlist_port * const last_port =
-		&execlists->port[execlists->port_mask];
+	struct i915_request **first = execlists->inflight;
+	struct i915_request ** const last_port = first + execlists->port_mask;
+	struct i915_request *last = first[0];
+	struct i915_request **port;
 	bool submit = false;
 	struct rb_node *rb;
 
 	lockdep_assert_held(&engine->active.lock);
 
-	if (port_isset(port)) {
+	if (last) {
 		if (intel_engine_has_preemption(engine)) {
 			struct guc_preempt_work *preempt_work =
 				&engine->i915->guc.preempt_work[engine->id];
 			int prio = execlists->queue_priority_hint;
 
-			if (i915_scheduler_need_preempt(prio,
-							port_prio(port))) {
-				execlists_set_active(execlists,
-						     EXECLISTS_ACTIVE_PREEMPT);
+			if (i915_scheduler_need_preempt(prio, rq_prio(last))) {
+				intel_write_status_page(engine,
+							I915_GEM_HWS_PREEMPT,
+							GUC_PREEMPT_INPROGRESS);
 				queue_work(engine->i915->guc.preempt_wq,
 					   &preempt_work->work);
-				return false;
+				return;
 			}
 		}
 
-		port++;
-		if (port_isset(port))
-			return false;
+		if (*++first)
+			return;
+
+		last = NULL;
 	}
-	GEM_BUG_ON(port_isset(port));
 
+	port = first;
 	while ((rb = rb_first_cached(&execlists->queue))) {
 		struct i915_priolist *p = to_priolist(rb);
 		struct i915_request *rq, *rn;
@@ -774,18 +774,15 @@ static bool __guc_dequeue(struct intel_engine_cs *engine)
 				if (port == last_port)
 					goto done;
 
-				if (submit)
-					port_assign(port, last);
+				*port = schedule_in(last,
+						    port - execlists->inflight);
 				port++;
 			}
 
 			list_del_init(&rq->sched.link);
-
 			__i915_request_submit(rq);
-			trace_i915_request_in(rq, port_index(port, execlists));
-
-			last = rq;
 			submit = true;
+			last = rq;
 		}
 
 		rb_erase_cached(&p->node, &execlists->queue);
@@ -794,58 +791,41 @@ static bool __guc_dequeue(struct intel_engine_cs *engine)
 done:
 	execlists->queue_priority_hint =
 		rb ? to_priolist(rb)->priority : INT_MIN;
-	if (submit)
-		port_assign(port, last);
-	if (last)
-		execlists_user_begin(execlists, execlists->port);
-
-	/* We must always keep the beast fed if we have work piled up */
-	GEM_BUG_ON(port_isset(execlists->port) &&
-		   !execlists_is_active(execlists, EXECLISTS_ACTIVE_USER));
-	GEM_BUG_ON(rb_first_cached(&execlists->queue) &&
-		   !port_isset(execlists->port));
-
-	return submit;
-}
-
-static void guc_dequeue(struct intel_engine_cs *engine)
-{
-	if (__guc_dequeue(engine))
-		guc_submit(engine);
+	if (submit) {
+		*port = schedule_in(last, port - execlists->inflight);
+		*++port = NULL;
+		guc_submit(engine, first, port);
+	}
+	execlists->active = execlists->inflight;
 }
 
 static void guc_submission_tasklet(unsigned long data)
 {
 	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
 	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct execlist_port *port = execlists->port;
-	struct i915_request *rq;
+	struct i915_request **port, *rq;
 	unsigned long flags;
 
 	spin_lock_irqsave(&engine->active.lock, flags);
 
-	rq = port_request(port);
-	while (rq && i915_request_completed(rq)) {
-		trace_i915_request_out(rq);
-		i915_request_put(rq);
+	for (port = execlists->inflight; (rq = *port); port++) {
+		if (!i915_request_completed(rq))
+			break;
 
-		port = execlists_port_complete(execlists, port);
-		if (port_isset(port)) {
-			execlists_user_begin(execlists, port);
-			rq = port_request(port);
-		} else {
-			execlists_user_end(execlists);
-			rq = NULL;
-		}
+		schedule_out(rq);
+	}
+	if (port != execlists->inflight) {
+		int idx = port - execlists->inflight;
+		int rem = ARRAY_SIZE(execlists->inflight) - idx;
+		memmove(execlists->inflight, port, rem * sizeof(*port));
 	}
 
-	if (execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT) &&
-	    intel_read_status_page(engine, I915_GEM_HWS_PREEMPT) ==
+	if (intel_read_status_page(engine, I915_GEM_HWS_PREEMPT) ==
 	    GUC_PREEMPT_FINISHED)
 		complete_preempt_context(engine);
 
-	if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT))
-		guc_dequeue(engine);
+	if (!intel_read_status_page(engine, I915_GEM_HWS_PREEMPT))
+		__guc_dequeue(engine);
 
 	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
@@ -959,7 +939,6 @@ static void guc_cancel_requests(struct intel_engine_cs *engine)
 
 	execlists->queue_priority_hint = INT_MIN;
 	execlists->queue = RB_ROOT_CACHED;
-	GEM_BUG_ON(port_isset(execlists->port));
 
 	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
@@ -1422,7 +1401,7 @@ int intel_guc_submission_enable(struct intel_guc *guc)
 	 * and it is guaranteed that it will remove the work item from the
 	 * queue before our request is completed.
 	 */
-	BUILD_BUG_ON(ARRAY_SIZE(engine->execlists.port) *
+	BUILD_BUG_ON(ARRAY_SIZE(engine->execlists.inflight) *
 		     sizeof(struct guc_wq_item) *
 		     I915_NUM_ENGINES > GUC_WQ_SIZE);
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index 11278bac3a24..1bd1ee95cf8c 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -366,13 +366,15 @@ static int __igt_breadcrumbs_smoketest(void *arg)
 
 		if (!wait_event_timeout(wait->wait,
 					i915_sw_fence_done(wait),
-					HZ / 2)) {
+					5 * HZ)) {
 			struct i915_request *rq = requests[count - 1];
 
-			pr_err("waiting for %d fences (last %llx:%lld) on %s timed out!\n",
-			       count,
+			pr_err("waiting for %d/%d fences (last %llx:%lld) on %s timed out!\n",
+			       atomic_read(&wait->pending), count,
 			       rq->fence.context, rq->fence.seqno,
 			       t->engine->name);
+			GEM_TRACE_DUMP();
+
 			i915_gem_set_wedged(t->engine->i915);
 			GEM_BUG_ON(!i915_request_completed(rq));
 			i915_sw_fence_wait(wait);
-- 
2.20.1
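
As a rough mental model of the pending[]/inflight[] bookkeeping
introduced above: submission stages requests into pending[], and only
once the CSB acknowledges the context switch are they promoted into
inflight[], with active then walking forward as completion events
retire each head. The standalone sketch below mirrors just that
promotion rule; the toy_rq type, MAX_PORTS value and function names
are invented for illustration, and none of the driver's hardware
access or locking appears:

  #include <stdio.h>
  #include <string.h>

  #define MAX_PORTS 2

  struct toy_rq {
          const char *name;
  };

  /* Both arrays keep a trailing NULL sentinel, as in the driver. */
  static struct toy_rq *inflight[MAX_PORTS + 1];
  static struct toy_rq *pending[MAX_PORTS + 1];
  static struct toy_rq **active = inflight;

  /* ELSP write: stage the next pair of requests. */
  static void submit(struct toy_rq *rq0, struct toy_rq *rq1)
  {
          pending[0] = rq0;
          pending[1] = rq1;
          pending[2] = NULL;
  }

  /* CSB idle->active event: HW switched, promote pending to inflight. */
  static void csb_promote(void)
  {
          memcpy(inflight, pending, sizeof(pending));
          pending[0] = NULL;
          active = inflight;
  }

  /* CSB completion event: retire the head of the active array. */
  static void csb_complete(void)
  {
          if (*active)
                  printf("completed %s\n", (*active++)->name);
  }

  int main(void)
  {
          struct toy_rq a = { "rq-A" }, b = { "rq-B" };

          submit(&a, &b);
          csb_promote();
          csb_complete();         /* rq-A */
          csb_complete();         /* rq-B */
          return 0;
  }

The NULL sentinel is what lets the driver walk either array with a
plain while (*port) loop, as in execlists_cancel_port_requests().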

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 10/39] drm/i915/execlists: Minimalistic timeslicing
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (8 preceding siblings ...)
  2019-06-14  7:09 ` [PATCH 09/39] drm/i915/execlists: Preempt-to-busy Chris Wilson
@ 2019-06-14  7:09 ` Chris Wilson
  2019-06-14  7:09 ` [PATCH 11/39] drm/i915/execlists: Force preemption Chris Wilson
                   ` (30 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:09 UTC (permalink / raw)
  To: intel-gfx

If we have multiple contexts of equal priority pending execution,
activate a timer to demote the currently executing context in favour of
the next in the queue when that timeslice expires. This enforces
fairness between contexts (so long as they allow preemption -- forced
preemption, in the future, will kick those who do not obey) and
prevents userspace from blocking forward progress with, e.g., an
unbounded MI_SEMAPHORE_WAIT.

For the starting point here, we use the jiffy as our timeslice so that
we should be reasonably efficient wrt frequent CPU wakeups.
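
Concretely, the rule for arming the timer reduces to: kick the active
context when the next request on the engine, or the queue's priority
hint, is of equal or higher priority. Below is a minimal userspace
sketch of just that comparison; the fake_rq type and priority values
are made up for illustration and stand in for i915_request and its
scheduler attributes, with none of the driver's lists or locking:

  /*
   * Toy model of need_timeslice(): a request earns a timeslice kick
   * only if an equal-or-higher priority peer is waiting behind it.
   */
  #include <stdbool.h>
  #include <stdio.h>

  struct fake_rq {
          int prio;               /* larger value == more important */
  };

  static bool need_timeslice(const struct fake_rq *active,
                             const struct fake_rq *next,
                             int queue_priority_hint)
  {
          int hint;

          if (!next)              /* nothing else queued on this engine */
                  return false;

          hint = next->prio > queue_priority_hint ?
                 next->prio : queue_priority_hint;

          return hint >= active->prio;
  }

  int main(void)
  {
          struct fake_rq a = { .prio = 0 };
          struct fake_rq b = { .prio = 0 };

          /* Equal priority: B earns a slice against A. */
          printf("equal prio: %s\n",
                 need_timeslice(&a, &b, -1000) ? "kick" : "leave");

          /* Strictly lower priority: A runs on undisturbed. */
          b.prio = -1;
          printf("lower prio: %s\n",
                 need_timeslice(&a, &b, -1000) ? "kick" : "leave");
          return 0;
  }

On expiry we do not preempt to idle; the tasklet merely rewinds the
active context and moves it, together with its on-engine waiters, to
the tail of its priority level (defer_active() below) so that a peer
is resubmitted first.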

Testcase: igt/gem_exec_scheduler/semaphore-resolve
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_engine_types.h |   6 +
 drivers/gpu/drm/i915/gt/intel_lrc.c          | 111 +++++++++
 drivers/gpu/drm/i915/gt/selftest_lrc.c       | 223 +++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.c        |   1 +
 drivers/gpu/drm/i915/i915_scheduler_types.h  |   1 +
 5 files changed, 342 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index dd0082df42cc..11a25f060fed 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -12,6 +12,7 @@
 #include <linux/kref.h>
 #include <linux/list.h>
 #include <linux/llist.h>
+#include <linux/timer.h>
 #include <linux/types.h>
 
 #include "i915_gem.h"
@@ -137,6 +138,11 @@ struct intel_engine_execlists {
 	 */
 	struct tasklet_struct tasklet;
 
+	/**
+	 * @timer: kick the current context if its timeslice expires
+	 */
+	struct timer_list timer;
+
 	/**
 	 * @default_priolist: priority list for I915_PRIORITY_NORMAL
 	 */
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 65c91b7db59d..cea08d665ef5 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -255,6 +255,7 @@ static int effective_prio(const struct i915_request *rq)
 		prio |= I915_PRIORITY_NOSEMAPHORE;
 
 	/* Restrict mere WAIT boosts from triggering preemption */
+	BUILD_BUG_ON(__NO_PREEMPTION & ~I915_PRIORITY_MASK); /* only internal */
 	return prio | __NO_PREEMPTION;
 }
 
@@ -811,6 +812,81 @@ last_active(const struct intel_engine_execlists *execlists)
 	return *last;
 }
 
+static void
+defer_request(struct i915_request * const rq, struct list_head * const pl)
+{
+	struct i915_dependency *p;
+
+	/*
+	 * We want to move the interrupted request to the back of
+	 * the round-robin list (i.e. its priority level), but
+	 * in doing so, we must then move all requests that were in
+	 * flight and were waiting for the interrupted request to
+	 * be run after it again.
+	 */
+	list_move_tail(&rq->sched.link, pl);
+
+	list_for_each_entry(p, &rq->sched.waiters_list, wait_link) {
+		struct i915_request *w =
+			container_of(p->waiter, typeof(*w), sched);
+
+		/* Leave semaphores spinning on the other engines */
+		if (w->engine != rq->engine)
+			continue;
+
+		/* No waiter should start before the active request completed */
+		GEM_BUG_ON(i915_request_started(w));
+
+		GEM_BUG_ON(rq_prio(w) > rq_prio(rq));
+		if (rq_prio(w) < rq_prio(rq))
+			continue;
+
+		if (list_empty(&w->sched.link))
+			continue; /* Not yet submitted; unready */
+
+		/*
+		 * This should be very shallow as it is limited by the
+		 * number of requests that can fit in a ring (<64) and
+		 * the number of contexts that can be in flight on this
+		 * engine.
+		 */
+		defer_request(w, pl);
+	}
+}
+
+static void defer_active(struct intel_engine_cs *engine)
+{
+	struct i915_request *rq;
+
+	rq = __unwind_incomplete_requests(engine);
+	if (!rq)
+		return;
+
+	defer_request(rq, i915_sched_lookup_priolist(engine, rq_prio(rq)));
+}
+
+static bool
+need_timeslice(struct intel_engine_cs *engine, const struct i915_request *rq)
+{
+	int hint;
+
+	if (list_is_last(&rq->sched.link, &engine->active.requests))
+		return false;
+
+	hint = max(rq_prio(list_next_entry(rq, sched.link)),
+		   engine->execlists.queue_priority_hint);
+
+	return hint >= rq_prio(rq);
+}
+
+static bool
+enable_timeslice(struct intel_engine_cs *engine)
+{
+	struct i915_request *last = last_active(&engine->execlists);
+
+	return last && need_timeslice(engine, last);
+}
+
 static void execlists_dequeue(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
@@ -904,6 +980,27 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			 */
 			last->hw_context->lrc_desc |= CTX_DESC_FORCE_RESTORE;
 			last = NULL;
+		} else if (need_timeslice(engine, last) &&
+			   !timer_pending(&engine->execlists.timer)) {
+			GEM_TRACE("%s: expired last=%llx:%lld, prio=%d, hint=%d\n",
+				  engine->name,
+				  last->fence.context,
+				  last->fence.seqno,
+				  last->sched.attr.priority,
+				  execlists->queue_priority_hint);
+
+			ring_pause(engine) = 1;
+			defer_active(engine);
+
+			/*
+			 * Unlike for preemption, if we rewind and continue
+			 * executing the same context as previously active,
+			 * the order of execution will remain the same and
+			 * the tail will only advance. We do not need to
+			 * force a full context restore, as a lite-restore
+			 * is sufficient to resample the monotonic TAIL.
+			 */
+			last = NULL;
 		} else {
 			/*
 			 * Otherwise if we already have a request pending
@@ -1226,6 +1323,9 @@ static void process_csb(struct intel_engine_cs *engine)
 				       sizeof(*execlists->pending));
 			execlists->pending[0] = NULL;
 
+			if (enable_timeslice(engine))
+				mod_timer(&execlists->timer, jiffies + 1);
+
 			if (!inject_preempt_hang(execlists))
 				ring_pause(engine) = 0;
 		} else if (status & GEN8_CTX_STATUS_PREEMPTED) {
@@ -1296,6 +1396,15 @@ static void execlists_submission_tasklet(unsigned long data)
 	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
+static void execlists_submission_timer(struct timer_list *timer)
+{
+	struct intel_engine_cs *engine =
+		from_timer(engine, timer, execlists.timer);
+
+	/* Kick the tasklet for some interrupt coalescing and reset handling */
+	tasklet_hi_schedule(&engine->execlists.tasklet);
+}
+
 static void queue_request(struct intel_engine_cs *engine,
 			  struct i915_sched_node *node,
 			  int prio)
@@ -2525,6 +2634,7 @@ static int gen8_init_rcs_context(struct i915_request *rq)
 
 static void execlists_park(struct intel_engine_cs *engine)
 {
+	del_timer_sync(&engine->execlists.timer);
 	intel_engine_park(engine);
 }
 
@@ -2622,6 +2732,7 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine)
 
 	tasklet_init(&engine->execlists.tasklet,
 		     execlists_submission_tasklet, (unsigned long)engine);
+	timer_setup(&engine->execlists.timer, execlists_submission_timer, 0);
 
 	logical_ring_default_vfuncs(engine);
 	logical_ring_default_irqs(engine);
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index f0ca2a09dabd..9ba6bffff3e3 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -79,6 +79,225 @@ static int live_sanitycheck(void *arg)
 	return err;
 }
 
+static int
+emit_semaphore_chain(struct i915_request *rq, struct i915_vma *vma, int idx)
+{
+	u32 *cs;
+
+	cs = intel_ring_begin(rq, 10);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+
+	*cs++ = MI_SEMAPHORE_WAIT |
+		MI_SEMAPHORE_GLOBAL_GTT |
+		MI_SEMAPHORE_POLL |
+		MI_SEMAPHORE_SAD_NEQ_SDD;
+	*cs++ = 0;
+	*cs++ = i915_ggtt_offset(vma) + 4 * idx;
+	*cs++ = 0;
+
+	if (idx > 0) {
+		*cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
+		*cs++ = i915_ggtt_offset(vma) + 4 * (idx - 1);
+		*cs++ = 0;
+		*cs++ = 1;
+	} else {
+		*cs++ = MI_NOOP;
+		*cs++ = MI_NOOP;
+		*cs++ = MI_NOOP;
+		*cs++ = MI_NOOP;
+	}
+
+	*cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
+
+	intel_ring_advance(rq, cs);
+	return 0;
+}
+
+static struct i915_request *
+semaphore_queue(struct intel_engine_cs *engine, struct i915_vma *vma, int idx)
+{
+	struct i915_gem_context *ctx;
+	struct i915_request *rq;
+	int err;
+
+	ctx = kernel_context(engine->i915);
+	if (!ctx)
+		return ERR_PTR(-ENOMEM);
+
+	rq = igt_request_alloc(ctx, engine);
+	if (IS_ERR(rq))
+		goto out_ctx;
+
+	err = emit_semaphore_chain(rq, vma, idx);
+	i915_request_add(rq);
+	if (err)
+		rq = ERR_PTR(err);
+
+out_ctx:
+	kernel_context_close(ctx);
+	return rq;
+}
+
+static int
+release_queue(struct intel_engine_cs *engine,
+	      struct i915_vma *vma,
+	      int idx)
+{
+	struct i915_sched_attr attr = {
+		.priority = I915_USER_PRIORITY(I915_PRIORITY_MAX),
+	};
+	struct i915_request *rq;
+	u32 *cs;
+
+	rq = i915_request_create(engine->kernel_context);
+	if (IS_ERR(rq))
+		return PTR_ERR(rq);
+
+	cs = intel_ring_begin(rq, 4);
+	if (IS_ERR(cs)) {
+		i915_request_add(rq);
+		return PTR_ERR(cs);
+	}
+
+	*cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
+	*cs++ = i915_ggtt_offset(vma) + 4 * (idx - 1);
+	*cs++ = 0;
+	*cs++ = 1;
+
+	intel_ring_advance(rq, cs);
+	i915_request_add(rq);
+
+	engine->schedule(rq, &attr);
+
+	return 0;
+}
+
+static int
+slice_semaphore_queue(struct intel_engine_cs *outer,
+		      struct i915_vma *vma,
+		      int count)
+{
+	struct intel_engine_cs *engine;
+	struct i915_request *head;
+	enum intel_engine_id id;
+	int err, i, n = 0;
+
+	head = semaphore_queue(outer, vma, n++);
+	if (IS_ERR(head))
+		return PTR_ERR(head);
+
+	i915_request_get(head);
+	for_each_engine(engine, outer->i915, id) {
+		for (i = 0; i < count; i++) {
+			struct i915_request *rq;
+
+			rq = semaphore_queue(engine, vma, n++);
+			if (IS_ERR(rq)) {
+				err = PTR_ERR(rq);
+				goto out;
+			}
+		}
+	}
+
+	err = release_queue(outer, vma, n);
+	if (err)
+		goto out;
+
+	if (i915_request_wait(head,
+			      I915_WAIT_LOCKED,
+			      2 * RUNTIME_INFO(outer->i915)->num_engines * (count + 2) * (count + 3)) < 0) {
+		pr_err("Failed to slice along semaphore chain of length (%d, %d)!\n",
+		       count, n);
+		GEM_TRACE_DUMP();
+		i915_gem_set_wedged(outer->i915);
+		err = -EIO;
+	}
+
+out:
+	i915_request_put(head);
+	return err;
+}
+
+static int live_timeslice_preempt(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct drm_i915_gem_object *obj;
+	intel_wakeref_t wakeref;
+	struct i915_vma *vma;
+	void *vaddr;
+	int err = 0;
+	int count;
+
+	/*
+	 * If a request takes too long, we would like to give other users
+	 * a fair go on the GPU. In particular, users may create batches
+	 * that wait upon external input, where that input may even be
+	 * supplied by another GPU job. To avoid blocking forever, we
+	 * need to preempt the current task and replace it with another
+	 * ready task.
+	 */
+
+	mutex_lock(&i915->drm.struct_mutex);
+	wakeref = intel_runtime_pm_get(i915);
+
+	obj = i915_gem_object_create_internal(i915, PAGE_SIZE);
+	if (IS_ERR(obj)) {
+		err = PTR_ERR(obj);
+		goto err_unlock;
+	}
+
+	vma = i915_vma_instance(obj, &i915->ggtt.vm, NULL);
+	if (IS_ERR(vma)) {
+		err = PTR_ERR(vma);
+		goto err_obj;
+	}
+
+	vaddr = i915_gem_object_pin_map(obj, I915_MAP_WC);
+	if (IS_ERR(vaddr)) {
+		err = PTR_ERR(vaddr);
+		goto err_obj;
+	}
+
+	err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL);
+	if (err)
+		goto err_map;
+
+	for_each_prime_number_from(count, 1, 16) {
+		struct intel_engine_cs *engine;
+		enum intel_engine_id id;
+
+		for_each_engine(engine, i915, id) {
+			memset(vaddr, 0, PAGE_SIZE);
+
+			err = slice_semaphore_queue(engine, vma, count);
+			if (err)
+				goto err_pin;
+
+			if (igt_flush_test(i915, I915_WAIT_LOCKED)) {
+				err = -EIO;
+				goto err_pin;
+			}
+		}
+	}
+
+err_pin:
+	i915_vma_unpin(vma);
+err_map:
+	i915_gem_object_unpin_map(obj);
+err_obj:
+	i915_gem_object_put(obj);
+err_unlock:
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+	intel_runtime_pm_put(i915, wakeref);
+	mutex_unlock(&i915->drm.struct_mutex);
+
+	return err;
+}
+
 static int live_busywait_preempt(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
@@ -398,6 +617,9 @@ static int live_late_preempt(void *arg)
 	if (!ctx_lo)
 		goto err_ctx_hi;
 
+	/* Make sure ctx_lo stays before ctx_hi until we trigger preemption. */
+	ctx_lo->sched.priority = I915_USER_PRIORITY(1);
+
 	for_each_engine(engine, i915, id) {
 		struct igt_live_test t;
 		struct i915_request *rq;
@@ -1818,6 +2040,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 {
 	static const struct i915_subtest tests[] = {
 		SUBTEST(live_sanitycheck),
+		SUBTEST(live_timeslice_preempt),
 		SUBTEST(live_busywait_preempt),
 		SUBTEST(live_preempt),
 		SUBTEST(live_late_preempt),
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index b1ba3e65cd52..0bd452e851d8 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -394,6 +394,7 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
 		list_add(&dep->wait_link, &signal->waiters_list);
 		list_add(&dep->signal_link, &node->signalers_list);
 		dep->signaler = signal;
+		dep->waiter = node;
 		dep->flags = flags;
 
 		/* Keep track of whether anyone on this chain has a semaphore */
diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
index 3e309631bd0b..aad81acba9dc 100644
--- a/drivers/gpu/drm/i915/i915_scheduler_types.h
+++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
@@ -62,6 +62,7 @@ struct i915_sched_node {
 
 struct i915_dependency {
 	struct i915_sched_node *signaler;
+	struct i915_sched_node *waiter;
 	struct list_head signal_link;
 	struct list_head wait_link;
 	struct list_head dfs_link;
-- 
2.20.1


* [PATCH 11/39] drm/i915/execlists: Force preemption
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (9 preceding siblings ...)
  2019-06-14  7:09 ` [PATCH 10/39] drm/i915/execlists: Minimalistic timeslicing Chris Wilson
@ 2019-06-14  7:09 ` Chris Wilson
  2019-06-14  7:09 ` [PATCH 12/39] dma-fence: Propagate errors to dma-fence-array container Chris Wilson
                   ` (29 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:09 UTC (permalink / raw)
  To: intel-gfx

If the preempted context takes too long to relinquish control, e.g. it
is stuck inside a shader with arbitration disabled, evict that context
with an engine reset. This ensures that preemptions are reasonably
responsive, providing a tighter QoS for the more important context at
the cost of flagging unresponsive contexts more frequently (i.e. instead
of using an ~10s hangcheck, we now evict at ~10ms). The challenge lies
in picking a timeout that can be reasonably serviced by HW for
typical workloads, balancing the existing clients against the needs for
responsiveness.
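
As a sketch of the resulting flow (illustrative only; the function name
submission_tasklet_sketch is invented, the logic follows this patch):

	static void submission_tasklet_sketch(struct intel_engine_cs *engine)
	{
		process_csb(engine);
		if (!engine->execlists.pending[0])
			execlists_dequeue(engine);	/* preemption completed */
		else if (!timer_pending(&engine->execlists.timer))
			preempt_reset(engine);	/* timed out, evict via reset */
	}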

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/Kconfig.profile | 12 ++++++
 drivers/gpu/drm/i915/gt/intel_lrc.c  | 56 ++++++++++++++++++++++++++--
 2 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
index 48df8889a88a..8273d3baafe4 100644
--- a/drivers/gpu/drm/i915/Kconfig.profile
+++ b/drivers/gpu/drm/i915/Kconfig.profile
@@ -25,3 +25,15 @@ config DRM_I915_SPIN_REQUEST
 	  May be 0 to disable the initial spin. In practice, we estimate
 	  the cost of enabling the interrupt (if currently disabled) to be
 	  a few microseconds.
+
+config DRM_I915_PREEMPT_TIMEOUT
+	int "Preempt timeout (ms)"
+	default 10 # milliseconds
+	help
+	  How long to wait (in milliseconds) for a preemption event to occur
+	  when submitting a new context via execlists. If the current context
+	  does not hit an arbitration point and yield to HW before the timer
+	  expires, the HW will be reset to allow the more important context
+	  to execute.
+
+	  May be 0 to disable the timeout.
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index cea08d665ef5..0563fe8398c5 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -887,6 +887,15 @@ enable_timeslice(struct intel_engine_cs *engine)
 	return last && need_timeslice(engine, last);
 }
 
+static unsigned long preempt_expires(void)
+{
+	unsigned long timeout =
+		msecs_to_jiffies_timeout(CONFIG_DRM_I915_PREEMPT_TIMEOUT);
+
+	barrier();
+	return jiffies + timeout;
+}
+
 static void execlists_dequeue(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
@@ -1218,6 +1227,9 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		*port = execlists_schedule_in(last, port - execlists->pending);
 		memset(port + 1, 0, (last_port - port) * sizeof(*port));
 		execlists_submit_ports(engine);
+
+		if (CONFIG_DRM_I915_PREEMPT_TIMEOUT)
+			mod_timer(&execlists->timer, preempt_expires());
 	}
 }
 
@@ -1373,13 +1385,48 @@ static void process_csb(struct intel_engine_cs *engine)
 	invalidate_csb_entries(&buf[0], &buf[num_entries - 1]);
 }
 
-static void __execlists_submission_tasklet(struct intel_engine_cs *const engine)
+static bool __execlists_submission_tasklet(struct intel_engine_cs *const engine)
 {
 	lockdep_assert_held(&engine->active.lock);
 
 	process_csb(engine);
-	if (!engine->execlists.pending[0])
+	if (!engine->execlists.pending[0]) {
 		execlists_dequeue(engine);
+		return true;
+	}
+
+	return false;
+}
+
+static void preempt_reset(struct intel_engine_cs *engine)
+{
+	const unsigned int bit = I915_RESET_ENGINE + engine->id;
+	unsigned long *lock = &engine->i915->gpu_error.flags;
+
+	if (test_and_set_bit(bit, lock))
+		return;
+
+	tasklet_disable_nosync(&engine->execlists.tasklet);
+	spin_unlock(&engine->active.lock);
+
+	i915_reset_engine(engine, "preemption time out");
+
+	spin_lock(&engine->active.lock);
+	tasklet_enable(&engine->execlists.tasklet);
+
+	clear_bit(bit, lock);
+	wake_up_bit(lock, bit);
+}
+
+static bool preempt_timeout(struct intel_engine_cs *const engine)
+{
+	if (!CONFIG_DRM_I915_PREEMPT_TIMEOUT)
+		return false;
+
+	if (!intel_engine_has_preemption(engine))
+		return false;
+
+	return !timer_pending(&engine->execlists.timer);
 }
 
 /*
@@ -1392,7 +1439,10 @@ static void execlists_submission_tasklet(unsigned long data)
 	unsigned long flags;
 
 	spin_lock_irqsave(&engine->active.lock, flags);
-	__execlists_submission_tasklet(engine);
+
+	if (!__execlists_submission_tasklet(engine) && preempt_timeout(engine))
+		preempt_reset(engine);
+
 	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
-- 
2.20.1


* [PATCH 12/39] dma-fence: Propagate errors to dma-fence-array container
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (10 preceding siblings ...)
  2019-06-14  7:09 ` [PATCH 11/39] drm/i915/execlists: Force preemption Chris Wilson
@ 2019-06-14  7:09 ` Chris Wilson
  2019-06-14  7:09 ` [PATCH 13/39] dma-fence: Report the composite sync_file status Chris Wilson
                   ` (28 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:09 UTC (permalink / raw)
  To: intel-gfx; +Cc: Gustavo Padovan, Sumit Semwal

When one of the fences in the array is signaled, propagate its error to
the parent fence-array (keeping the first error raised).
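
For illustration only (not part of this patch; assumes fences[] holds
two unsignaled fences sharing one timeline context), the composite now
reports the first component error once everything has signaled:

	array = dma_fence_array_create(2, fences, context, 1, false);
	dma_fence_enable_sw_signaling(&array->base); /* hook up the callbacks */

	dma_fence_set_error(fences[0], -EIO);
	dma_fence_signal(fences[0]);
	dma_fence_signal(fences[1]);

	/* the composite inherits the first error raised */
	WARN_ON(dma_fence_get_status(&array->base) != -EIO);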

v2: Opencode cmpxchg_local to avoid compiler freakout.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
---
 drivers/dma-buf/dma-fence-array.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
index 12c6f64c0bc2..d90675bb4fcc 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -13,6 +13,12 @@
 #include <linux/slab.h>
 #include <linux/dma-fence-array.h>
 
+static void fence_set_error_once(struct dma_fence *fence, int error)
+{
+	if (!fence->error && error)
+		dma_fence_set_error(fence, error);
+}
+
 static const char *dma_fence_array_get_driver_name(struct dma_fence *fence)
 {
 	return "dma_fence_array";
@@ -38,6 +44,13 @@ static void dma_fence_array_cb_func(struct dma_fence *f,
 		container_of(cb, struct dma_fence_array_cb, cb);
 	struct dma_fence_array *array = array_cb->array;
 
+	/*
+	 * Propagate the first error reported by any of our fences, but only
+	 * before we ourselves are signaled.
+	 */
+	if (atomic_read(&array->num_pending) > 0)
+		fence_set_error_once(&array->base, f->error);
+
 	if (atomic_dec_and_test(&array->num_pending))
 		irq_work_queue(&array->work);
 	else
@@ -63,6 +76,8 @@ static bool dma_fence_array_enable_signaling(struct dma_fence *fence)
 		dma_fence_get(&array->base);
 		if (dma_fence_add_callback(array->fences[i], &cb[i].cb,
 					   dma_fence_array_cb_func)) {
+			fence_set_error_once(&array->base,
+					     array->fences[i]->error);
 			dma_fence_put(&array->base);
 			if (atomic_dec_and_test(&array->num_pending))
 				return false;
-- 
2.20.1


* [PATCH 13/39] dma-fence: Report the composite sync_file status
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (11 preceding siblings ...)
  2019-06-14  7:09 ` [PATCH 12/39] dma-fence: Propagate errors to dma-fence-array container Chris Wilson
@ 2019-06-14  7:09 ` Chris Wilson
  2019-06-14  7:09 ` [PATCH 14/39] dma-fence: Refactor signaling for manual invocation Chris Wilson
                   ` (27 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:09 UTC (permalink / raw)
  To: intel-gfx; +Cc: Gustavo Padovan, Sumit Semwal

Same as for the individual fences, we want to report the actual status
of the fence when queried.
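
With this, sync_file's ioctl distinguishes three cases instead of a
boolean: 0 for active, 1 for signaled, and a negative errno for a fence
that completed in error. A hedged userspace sketch (helper name
invented):

	#include <errno.h>
	#include <sys/ioctl.h>
	#include <linux/sync_file.h>

	/* fd: an open sync_file, e.g. an execbuf out-fence */
	static int sync_file_status(int fd)
	{
		struct sync_file_info info = { .num_fences = 0 };

		if (ioctl(fd, SYNC_IOC_FILE_INFO, &info))
			return -errno;

		/* previously only 0 or 1; now may also be a negative error */
		return info.status;
	}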

Reported-by: Petri Latvala <petri.latvala@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Petri Latvala <petri.latvala@intel.com>
---
 drivers/dma-buf/sync_file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
index ee4d1a96d779..25c5c071645b 100644
--- a/drivers/dma-buf/sync_file.c
+++ b/drivers/dma-buf/sync_file.c
@@ -419,7 +419,7 @@ static long sync_file_ioctl_fence_info(struct sync_file *sync_file,
 	 * info->num_fences.
 	 */
 	if (!info.num_fences) {
-		info.status = dma_fence_is_signaled(sync_file->fence);
+		info.status = dma_fence_get_status(sync_file->fence);
 		goto no_fences;
 	} else {
 		info.status = 1;
-- 
2.20.1


* [PATCH 14/39] dma-fence: Refactor signaling for manual invocation
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (12 preceding siblings ...)
  2019-06-14  7:09 ` [PATCH 13/39] dma-fence: Report the composite sync_file status Chris Wilson
@ 2019-06-14  7:09 ` Chris Wilson
  2019-06-14  7:09 ` [PATCH 15/39] dma-fence: Always execute signal callbacks Chris Wilson
                   ` (26 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:09 UTC (permalink / raw)
  To: intel-gfx

Move the duplicated code within dma-fence.c into the header for wider
reuse. In the process apply a small micro-optimisation to only prune the
fence->cb_list once rather than use list_del on every entry.
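
A sketch of the intended reuse (not part of this patch): a driver-side
signaler that already holds fence->lock with irqs disabled can now run
the same three phases as dma_fence_signal() itself:

	if (__dma_fence_signal(fence)) {
		__dma_fence_signal__timestamp(fence, ktime_get());
		__dma_fence_signal__notify(fence);
	}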

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/dma-buf/Makefile                    |  10 +-
 drivers/dma-buf/dma-fence-trace.c           |  28 +++
 drivers/dma-buf/dma-fence.c                 |  33 +--
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c |  32 +--
 include/linux/dma-fence-impl.h              |  84 +++++++
 include/linux/dma-fence-types.h             | 252 ++++++++++++++++++++
 include/linux/dma-fence.h                   | 222 +----------------
 7 files changed, 381 insertions(+), 280 deletions(-)
 create mode 100644 drivers/dma-buf/dma-fence-trace.c
 create mode 100644 include/linux/dma-fence-impl.h
 create mode 100644 include/linux/dma-fence-types.h

diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
index e8c7310cb800..65c43778e571 100644
--- a/drivers/dma-buf/Makefile
+++ b/drivers/dma-buf/Makefile
@@ -1,6 +1,12 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
-	 reservation.o seqno-fence.o
+obj-y := \
+	dma-buf.o \
+	dma-fence.o \
+	dma-fence-array.o \
+	dma-fence-chain.o \
+	dma-fence-trace.o \
+	reservation.o \
+	seqno-fence.o
 obj-$(CONFIG_SYNC_FILE)		+= sync_file.o
 obj-$(CONFIG_SW_SYNC)		+= sw_sync.o sync_debug.o
 obj-$(CONFIG_UDMABUF)		+= udmabuf.o
diff --git a/drivers/dma-buf/dma-fence-trace.c b/drivers/dma-buf/dma-fence-trace.c
new file mode 100644
index 000000000000..eb6f282be4c0
--- /dev/null
+++ b/drivers/dma-buf/dma-fence-trace.c
@@ -0,0 +1,28 @@
+/*
+ * Fence mechanism for dma-buf and to allow for asynchronous dma access
+ *
+ * Copyright (C) 2012 Canonical Ltd
+ * Copyright (C) 2012 Texas Instruments
+ *
+ * Authors:
+ * Rob Clark <robdclark@gmail.com>
+ * Maarten Lankhorst <maarten.lankhorst@canonical.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/dma-fence-types.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/dma_fence.h>
+
+EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit);
+EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal);
+EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 59ac96ec7ba8..027a6a894abd 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -14,15 +14,9 @@
 #include <linux/export.h>
 #include <linux/atomic.h>
 #include <linux/dma-fence.h>
+#include <linux/dma-fence-impl.h>
 #include <linux/sched/signal.h>
 
-#define CREATE_TRACE_POINTS
-#include <trace/events/dma_fence.h>
-
-EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit);
-EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal);
-EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
-
 static DEFINE_SPINLOCK(dma_fence_stub_lock);
 static struct dma_fence dma_fence_stub;
 
@@ -128,7 +122,6 @@ EXPORT_SYMBOL(dma_fence_context_alloc);
  */
 int dma_fence_signal_locked(struct dma_fence *fence)
 {
-	struct dma_fence_cb *cur, *tmp;
 	int ret = 0;
 
 	lockdep_assert_held(fence->lock);
@@ -136,7 +129,7 @@ int dma_fence_signal_locked(struct dma_fence *fence)
 	if (WARN_ON(!fence))
 		return -EINVAL;
 
-	if (test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
+	if (!__dma_fence_signal(fence)) {
 		ret = -EINVAL;
 
 		/*
@@ -144,15 +137,10 @@ int dma_fence_signal_locked(struct dma_fence *fence)
 		 * still run through all callbacks
 		 */
 	} else {
-		fence->timestamp = ktime_get();
-		set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
-		trace_dma_fence_signaled(fence);
+		__dma_fence_signal__timestamp(fence, ktime_get());
 	}
 
-	list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
-		list_del_init(&cur->node);
-		cur->func(fence, cur);
-	}
+	__dma_fence_signal__notify(fence);
 	return ret;
 }
 EXPORT_SYMBOL(dma_fence_signal_locked);
@@ -177,21 +165,14 @@ int dma_fence_signal(struct dma_fence *fence)
 	if (!fence)
 		return -EINVAL;
 
-	if (test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
+	if (!__dma_fence_signal(fence))
 		return -EINVAL;
 
-	fence->timestamp = ktime_get();
-	set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
-	trace_dma_fence_signaled(fence);
+	__dma_fence_signal__timestamp(fence, ktime_get());
 
 	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags)) {
-		struct dma_fence_cb *cur, *tmp;
-
 		spin_lock_irqsave(fence->lock, flags);
-		list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
-			list_del_init(&cur->node);
-			cur->func(fence, cur);
-		}
+		__dma_fence_signal__notify(fence);
 		spin_unlock_irqrestore(fence->lock, flags);
 	}
 	return 0;
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index c092bdf5f0bf..f9ba43a0f4d8 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -22,8 +22,7 @@
  *
  */
 
-#include <linux/kthread.h>
-#include <trace/events/dma_fence.h>
+#include <linux/dma-fence-impl.h>
 #include <uapi/linux/sched/types.h>
 
 #include "i915_drv.h"
@@ -97,35 +96,6 @@ check_signal_order(struct intel_context *ce, struct i915_request *rq)
 	return true;
 }
 
-static bool
-__dma_fence_signal(struct dma_fence *fence)
-{
-	return !test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
-}
-
-static void
-__dma_fence_signal__timestamp(struct dma_fence *fence, ktime_t timestamp)
-{
-	fence->timestamp = timestamp;
-	set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
-	trace_dma_fence_signaled(fence);
-}
-
-static void
-__dma_fence_signal__notify(struct dma_fence *fence)
-{
-	struct dma_fence_cb *cur, *tmp;
-
-	lockdep_assert_held(fence->lock);
-	lockdep_assert_irqs_disabled();
-
-	list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
-		INIT_LIST_HEAD(&cur->node);
-		cur->func(fence, cur);
-	}
-	INIT_LIST_HEAD(&fence->cb_list);
-}
-
 void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
diff --git a/include/linux/dma-fence-impl.h b/include/linux/dma-fence-impl.h
new file mode 100644
index 000000000000..aa44eb031693
--- /dev/null
+++ b/include/linux/dma-fence-impl.h
@@ -0,0 +1,84 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Fence mechanism for dma-buf to allow for asynchronous dma access
+ *
+ * Copyright (C) 2012 Canonical Ltd
+ * Copyright (C) 2012 Texas Instruments
+ *
+ * Authors:
+ * Rob Clark <robdclark@gmail.com>
+ * Maarten Lankhorst <maarten.lankhorst@canonical.com>
+ */
+
+#ifndef __LINUX_DMA_FENCE_IMPL_H
+#define __LINUX_DMA_FENCE_IMPL_H
+
+#include <linux/dma-fence.h>
+#include <linux/lockdep.h>
+#include <linux/list.h>
+#include <linux/ktime.h>
+
+#include <trace/events/dma_fence.h>
+
+/**
+ * __dma_fence_signal: Mark a fence as signaled
+ * @fence: the dma fence to mark
+ *
+ * The first step of the dma_fence_signal() implementation is to atomically
+ * mark the fence as signaled.
+ *
+ * Returns: true if the fence was not previously signaled, false if it was
+ * already signaled.
+ */
+static inline bool
+__dma_fence_signal(struct dma_fence *fence)
+{
+	return !test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
+}
+
+/**
+ * __dma_fence_signal__timestamp: sets the signaling timestamp
+ * @fence: the dma fence
+ * @timestamp: the monotonic timestamp (e.g. from ktime_get())
+ *
+ * The second step of the dma_fence_signal() implementation is to record
+ * the signaling timestamp.
+ *
+ * The dma-fence stores a timestamp of when it was signaled for inspection
+ * by userspace. This timestamp is typically the CPU time at which the
+ * signal was raised, but could be a HW timestamp generated by the event
+ * itself. Either way, it must be set on the signaled fence before
+ * callbacks are notified.
+ */
+static inline void
+__dma_fence_signal__timestamp(struct dma_fence *fence, ktime_t timestamp)
+{
+	fence->timestamp = timestamp;
+	set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
+	trace_dma_fence_signaled(fence);
+}
+
+/**
+ * __dma_fence_signal__notify: notify observers of the signal event
+ * @fence: the dma fence
+ *
+ * The final step of the dma_fence_signal() implementation is to notify
+ * all observers (dma_fence_add_callback()) of the signal event. This must
+ * be called with the fence->lock already held and irqs disabled.
+ */
+static inline void
+__dma_fence_signal__notify(struct dma_fence *fence)
+{
+	struct dma_fence_cb *cur, *tmp;
+
+	lockdep_assert_held(fence->lock);
+	lockdep_assert_irqs_disabled();
+
+	list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
+		INIT_LIST_HEAD(&cur->node);
+		cur->func(fence, cur);
+	}
+	INIT_LIST_HEAD(&fence->cb_list);
+}
+
+#endif /* __LINUX_DMA_FENCE_IMPL_H */
diff --git a/include/linux/dma-fence-types.h b/include/linux/dma-fence-types.h
new file mode 100644
index 000000000000..52f48decf23d
--- /dev/null
+++ b/include/linux/dma-fence-types.h
@@ -0,0 +1,252 @@
+/*
+ * Fence mechanism for dma-buf to allow for asynchronous dma access
+ *
+ * Copyright (C) 2012 Canonical Ltd
+ * Copyright (C) 2012 Texas Instruments
+ *
+ * Authors:
+ * Rob Clark <robdclark@gmail.com>
+ * Maarten Lankhorst <maarten.lankhorst@canonical.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#ifndef __LINUX_DMA_FENCE_TYPES_H
+#define __LINUX_DMA_FENCE_TYPES_H
+
+#include <linux/list.h>
+#include <linux/kref.h>
+#include <linux/ktime.h>
+#include <linux/rcupdate.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+
+struct dma_fence;
+struct dma_fence_ops;
+struct dma_fence_cb;
+
+/**
+ * struct dma_fence - software synchronization primitive
+ * @refcount: refcount for this fence
+ * @ops: dma_fence_ops associated with this fence
+ * @rcu: used for releasing fence with kfree_rcu
+ * @cb_list: list of all callbacks to call
+ * @lock: spin_lock_irqsave used for locking
+ * @context: execution context this fence belongs to, returned by
+ *           dma_fence_context_alloc()
+ * @seqno: the sequence number of this fence inside the execution context,
+ * can be compared to decide which fence would be signaled later.
+ * @flags: A mask of DMA_FENCE_FLAG_* defined below
+ * @timestamp: Timestamp when the fence was signaled.
+ * @error: Optional, only valid if < 0, must be set before calling
+ * dma_fence_signal, indicates that the fence has completed with an error.
+ *
+ * the flags member must be manipulated and read using the appropriate
+ * atomic ops (bit_*), so taking the spinlock will not be needed most
+ * of the time.
+ *
+ * DMA_FENCE_FLAG_SIGNALED_BIT - fence is already signaled
+ * DMA_FENCE_FLAG_TIMESTAMP_BIT - timestamp recorded for fence signaling
+ * DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT - enable_signaling might have been called
+ * DMA_FENCE_FLAG_USER_BITS - start of the unused bits, can be used by the
+ * implementer of the fence for its own purposes. Can be used in different
+ * ways by different fence implementers, so do not rely on this.
+ *
+ * Since atomic bitops are used, this is not guaranteed to be the case.
+ * Particularly, if the bit was set, but dma_fence_signal was called right
+ * before this bit was set, it would have been able to set the
+ * DMA_FENCE_FLAG_SIGNALED_BIT, before enable_signaling was called.
+ * Adding a check for DMA_FENCE_FLAG_SIGNALED_BIT after setting
+ * DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT closes this race, and makes sure that
+ * after dma_fence_signal was called, any enable_signaling call will have either
+ * been completed, or never called at all.
+ */
+struct dma_fence {
+	struct kref refcount;
+	const struct dma_fence_ops *ops;
+	struct rcu_head rcu;
+	struct list_head cb_list;
+	spinlock_t *lock;
+	u64 context;
+	u64 seqno;
+	unsigned long flags;
+	ktime_t timestamp;
+	int error;
+};
+
+enum dma_fence_flag_bits {
+	DMA_FENCE_FLAG_SIGNALED_BIT,
+	DMA_FENCE_FLAG_TIMESTAMP_BIT,
+	DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
+	DMA_FENCE_FLAG_USER_BITS, /* must always be last member */
+};
+
+typedef void (*dma_fence_func_t)(struct dma_fence *fence,
+				 struct dma_fence_cb *cb);
+
+/**
+ * struct dma_fence_cb - callback for dma_fence_add_callback()
+ * @node: used by dma_fence_add_callback() to append this struct to fence::cb_list
+ * @func: dma_fence_func_t to call
+ *
+ * This struct will be initialized by dma_fence_add_callback(), additional
+ * data can be passed along by embedding dma_fence_cb in another struct.
+ */
+struct dma_fence_cb {
+	struct list_head node;
+	dma_fence_func_t func;
+};
+
+/**
+ * struct dma_fence_ops - operations implemented for fence
+ *
+ */
+struct dma_fence_ops {
+	/**
+	 * @use_64bit_seqno:
+	 *
+	 * True if this dma_fence implementation uses 64bit seqno, false
+	 * otherwise.
+	 */
+	bool use_64bit_seqno;
+
+	/**
+	 * @get_driver_name:
+	 *
+	 * Returns the driver name. This is a callback to allow drivers to
+	 * compute the name at runtime, without having it to store permanently
+	 * for each fence, or build a cache of some sort.
+	 *
+	 * This callback is mandatory.
+	 */
+	const char * (*get_driver_name)(struct dma_fence *fence);
+
+	/**
+	 * @get_timeline_name:
+	 *
+	 * Return the name of the context this fence belongs to. This is a
+	 * callback to allow drivers to compute the name at runtime, without
+	 * having it to store permanently for each fence, or build a cache of
+	 * some sort.
+	 *
+	 * This callback is mandatory.
+	 */
+	const char * (*get_timeline_name)(struct dma_fence *fence);
+
+	/**
+	 * @enable_signaling:
+	 *
+	 * Enable software signaling of fence.
+	 *
+	 * For fence implementations that have the capability for hw->hw
+	 * signaling, they can implement this op to enable the necessary
+	 * interrupts, or insert commands into cmdstream, etc, to avoid these
+	 * costly operations for the common case where only hw->hw
+	 * synchronization is required.  This is called in the first
+	 * dma_fence_wait() or dma_fence_add_callback() path to let the fence
+	 * implementation know that there is another driver waiting on the
+	 * signal (ie. hw->sw case).
+	 *
+	 * This function can be called from atomic context, but not
+	 * from irq context, so normal spinlocks can be used.
+	 *
+	 * A return value of false indicates the fence already passed,
+	 * or some failure occurred that made it impossible to enable
+	 * signaling. True indicates successful enabling.
+	 *
+	 * &dma_fence.error may be set in enable_signaling, but only when false
+	 * is returned.
+	 *
+	 * Since many implementations can call dma_fence_signal() even before
+	 * @enable_signaling has been called there's a race window, where the
+	 * dma_fence_signal() might result in the final fence reference being
+	 * released and its memory freed. To avoid this, implementations of this
+	 * callback should grab their own reference using dma_fence_get(), to be
+	 * released when the fence is signalled (through e.g. the interrupt
+	 * handler).
+	 *
+	 * This callback is optional. If this callback is not present, then the
+	 * driver must always have signaling enabled.
+	 */
+	bool (*enable_signaling)(struct dma_fence *fence);
+
+	/**
+	 * @signaled:
+	 *
+	 * Peek whether the fence is signaled, as a fastpath optimization for
+	 * e.g. dma_fence_wait() or dma_fence_add_callback(). Note that this
+	 * callback does not need to make any guarantees beyond that a fence
+	 * once indicates as signalled must always return true from this
+	 * callback. This callback may return false even if the fence has
+	 * completed already; in this case information hasn't propagated through
+	 * the system yet. See also dma_fence_is_signaled().
+	 *
+	 * May set &dma_fence.error if returning true.
+	 *
+	 * This callback is optional.
+	 */
+	bool (*signaled)(struct dma_fence *fence);
+
+	/**
+	 * @wait:
+	 *
+	 * Custom wait implementation, defaults to dma_fence_default_wait() if
+	 * not set.
+	 *
+	 * The dma_fence_default_wait implementation should work for any fence,
+	 * as long as @enable_signaling works correctly. This hook allows
+	 * drivers to have an optimized version for the case where a process
+	 * context is already available, e.g. if @enable_signaling for the
+	 * general case needs to set up a worker thread.
+	 *
+	 * Must return -ERESTARTSYS if the wait is intr = true and the wait was
+	 * interrupted, and remaining jiffies if fence has signaled, or 0 if
+	 * wait timed out. Can also return other error values on custom
+	 * implementations, which should be treated as if the fence is signaled.
+	 * For example a hardware lockup could be reported like that.
+	 *
+	 * This callback is optional.
+	 */
+	signed long (*wait)(struct dma_fence *fence,
+			    bool intr, signed long timeout);
+
+	/**
+	 * @release:
+	 *
+	 * Called on destruction of fence to release additional resources.
+	 * Can be called from irq context.  This callback is optional. If it is
+	 * NULL, then dma_fence_free() is instead called as the default
+	 * implementation.
+	 */
+	void (*release)(struct dma_fence *fence);
+
+	/**
+	 * @fence_value_str:
+	 *
+	 * Callback to fill in free-form debug info specific to this fence, like
+	 * the sequence number.
+	 *
+	 * This callback is optional.
+	 */
+	void (*fence_value_str)(struct dma_fence *fence, char *str, int size);
+
+	/**
+	 * @timeline_value_str:
+	 *
+	 * Fills in the current value of the timeline as a string, like the
+	 * sequence number. Note that the specific fence passed to this function
+	 * should not matter, drivers should only use it to look up the
+	 * corresponding timeline structures.
+	 */
+	void (*timeline_value_str)(struct dma_fence *fence,
+				   char *str, int size);
+};
+
+#endif /* __LINUX_DMA_FENCE_TYPES_H */
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 05d29dbc7e62..1c8dd1fbafae 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -13,6 +13,7 @@
 #ifndef __LINUX_DMA_FENCE_H
 #define __LINUX_DMA_FENCE_H
 
+#include <linux/dma-fence-types.h>
 #include <linux/err.h>
 #include <linux/wait.h>
 #include <linux/list.h>
@@ -22,227 +23,6 @@
 #include <linux/printk.h>
 #include <linux/rcupdate.h>
 
-struct dma_fence;
-struct dma_fence_ops;
-struct dma_fence_cb;
-
-/**
- * struct dma_fence - software synchronization primitive
- * @refcount: refcount for this fence
- * @ops: dma_fence_ops associated with this fence
- * @rcu: used for releasing fence with kfree_rcu
- * @cb_list: list of all callbacks to call
- * @lock: spin_lock_irqsave used for locking
- * @context: execution context this fence belongs to, returned by
- *           dma_fence_context_alloc()
- * @seqno: the sequence number of this fence inside the execution context,
- * can be compared to decide which fence would be signaled later.
- * @flags: A mask of DMA_FENCE_FLAG_* defined below
- * @timestamp: Timestamp when the fence was signaled.
- * @error: Optional, only valid if < 0, must be set before calling
- * dma_fence_signal, indicates that the fence has completed with an error.
- *
- * the flags member must be manipulated and read using the appropriate
- * atomic ops (bit_*), so taking the spinlock will not be needed most
- * of the time.
- *
- * DMA_FENCE_FLAG_SIGNALED_BIT - fence is already signaled
- * DMA_FENCE_FLAG_TIMESTAMP_BIT - timestamp recorded for fence signaling
- * DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT - enable_signaling might have been called
- * DMA_FENCE_FLAG_USER_BITS - start of the unused bits, can be used by the
- * implementer of the fence for its own purposes. Can be used in different
- * ways by different fence implementers, so do not rely on this.
- *
- * Since atomic bitops are used, this is not guaranteed to be the case.
- * Particularly, if the bit was set, but dma_fence_signal was called right
- * before this bit was set, it would have been able to set the
- * DMA_FENCE_FLAG_SIGNALED_BIT, before enable_signaling was called.
- * Adding a check for DMA_FENCE_FLAG_SIGNALED_BIT after setting
- * DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT closes this race, and makes sure that
- * after dma_fence_signal was called, any enable_signaling call will have either
- * been completed, or never called at all.
- */
-struct dma_fence {
-	struct kref refcount;
-	const struct dma_fence_ops *ops;
-	struct rcu_head rcu;
-	struct list_head cb_list;
-	spinlock_t *lock;
-	u64 context;
-	u64 seqno;
-	unsigned long flags;
-	ktime_t timestamp;
-	int error;
-};
-
-enum dma_fence_flag_bits {
-	DMA_FENCE_FLAG_SIGNALED_BIT,
-	DMA_FENCE_FLAG_TIMESTAMP_BIT,
-	DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
-	DMA_FENCE_FLAG_USER_BITS, /* must always be last member */
-};
-
-typedef void (*dma_fence_func_t)(struct dma_fence *fence,
-				 struct dma_fence_cb *cb);
-
-/**
- * struct dma_fence_cb - callback for dma_fence_add_callback()
- * @node: used by dma_fence_add_callback() to append this struct to fence::cb_list
- * @func: dma_fence_func_t to call
- *
- * This struct will be initialized by dma_fence_add_callback(), additional
- * data can be passed along by embedding dma_fence_cb in another struct.
- */
-struct dma_fence_cb {
-	struct list_head node;
-	dma_fence_func_t func;
-};
-
-/**
- * struct dma_fence_ops - operations implemented for fence
- *
- */
-struct dma_fence_ops {
-	/**
-	 * @use_64bit_seqno:
-	 *
-	 * True if this dma_fence implementation uses 64bit seqno, false
-	 * otherwise.
-	 */
-	bool use_64bit_seqno;
-
-	/**
-	 * @get_driver_name:
-	 *
-	 * Returns the driver name. This is a callback to allow drivers to
-	 * compute the name at runtime, without having it to store permanently
-	 * for each fence, or build a cache of some sort.
-	 *
-	 * This callback is mandatory.
-	 */
-	const char * (*get_driver_name)(struct dma_fence *fence);
-
-	/**
-	 * @get_timeline_name:
-	 *
-	 * Return the name of the context this fence belongs to. This is a
-	 * callback to allow drivers to compute the name at runtime, without
-	 * having it to store permanently for each fence, or build a cache of
-	 * some sort.
-	 *
-	 * This callback is mandatory.
-	 */
-	const char * (*get_timeline_name)(struct dma_fence *fence);
-
-	/**
-	 * @enable_signaling:
-	 *
-	 * Enable software signaling of fence.
-	 *
-	 * For fence implementations that have the capability for hw->hw
-	 * signaling, they can implement this op to enable the necessary
-	 * interrupts, or insert commands into cmdstream, etc, to avoid these
-	 * costly operations for the common case where only hw->hw
-	 * synchronization is required.  This is called in the first
-	 * dma_fence_wait() or dma_fence_add_callback() path to let the fence
-	 * implementation know that there is another driver waiting on the
-	 * signal (ie. hw->sw case).
-	 *
-	 * This function can be called from atomic context, but not
-	 * from irq context, so normal spinlocks can be used.
-	 *
-	 * A return value of false indicates the fence already passed,
-	 * or some failure occurred that made it impossible to enable
-	 * signaling. True indicates successful enabling.
-	 *
-	 * &dma_fence.error may be set in enable_signaling, but only when false
-	 * is returned.
-	 *
-	 * Since many implementations can call dma_fence_signal() even when before
-	 * @enable_signaling has been called there's a race window, where the
-	 * dma_fence_signal() might result in the final fence reference being
-	 * released and its memory freed. To avoid this, implementations of this
-	 * callback should grab their own reference using dma_fence_get(), to be
-	 * released when the fence is signalled (through e.g. the interrupt
-	 * handler).
-	 *
-	 * This callback is optional. If this callback is not present, then the
-	 * driver must always have signaling enabled.
-	 */
-	bool (*enable_signaling)(struct dma_fence *fence);
-
-	/**
-	 * @signaled:
-	 *
-	 * Peek whether the fence is signaled, as a fastpath optimization for
-	 * e.g. dma_fence_wait() or dma_fence_add_callback(). Note that this
-	 * callback does not need to make any guarantees beyond that a fence
-	 * once indicates as signalled must always return true from this
-	 * callback. This callback may return false even if the fence has
-	 * completed already, in this case information hasn't propogated throug
-	 * the system yet. See also dma_fence_is_signaled().
-	 *
-	 * May set &dma_fence.error if returning true.
-	 *
-	 * This callback is optional.
-	 */
-	bool (*signaled)(struct dma_fence *fence);
-
-	/**
-	 * @wait:
-	 *
-	 * Custom wait implementation, defaults to dma_fence_default_wait() if
-	 * not set.
-	 *
-	 * The dma_fence_default_wait implementation should work for any fence, as long
-	 * as @enable_signaling works correctly. This hook allows drivers to
-	 * have an optimized version for the case where a process context is
-	 * already available, e.g. if @enable_signaling for the general case
-	 * needs to set up a worker thread.
-	 *
-	 * Must return -ERESTARTSYS if the wait is intr = true and the wait was
-	 * interrupted, and remaining jiffies if fence has signaled, or 0 if wait
-	 * timed out. Can also return other error values on custom implementations,
-	 * which should be treated as if the fence is signaled. For example a hardware
-	 * lockup could be reported like that.
-	 *
-	 * This callback is optional.
-	 */
-	signed long (*wait)(struct dma_fence *fence,
-			    bool intr, signed long timeout);
-
-	/**
-	 * @release:
-	 *
-	 * Called on destruction of fence to release additional resources.
-	 * Can be called from irq context.  This callback is optional. If it is
-	 * NULL, then dma_fence_free() is instead called as the default
-	 * implementation.
-	 */
-	void (*release)(struct dma_fence *fence);
-
-	/**
-	 * @fence_value_str:
-	 *
-	 * Callback to fill in free-form debug info specific to this fence, like
-	 * the sequence number.
-	 *
-	 * This callback is optional.
-	 */
-	void (*fence_value_str)(struct dma_fence *fence, char *str, int size);
-
-	/**
-	 * @timeline_value_str:
-	 *
-	 * Fills in the current value of the timeline as a string, like the
-	 * sequence number. Note that the specific fence passed to this function
-	 * should not matter, drivers should only use it to look up the
-	 * corresponding timeline structures.
-	 */
-	void (*timeline_value_str)(struct dma_fence *fence,
-				   char *str, int size);
-};
-
 void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
 		    spinlock_t *lock, u64 context, u64 seqno);
 
-- 
2.20.1


* [PATCH 15/39] dma-fence: Always execute signal callbacks
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (13 preceding siblings ...)
  2019-06-14  7:09 ` [PATCH 14/39] dma-fence: Refactor signaling for manual invocation Chris Wilson
@ 2019-06-14  7:09 ` Chris Wilson
  2019-06-14  7:10 ` [PATCH 16/39] drm/i915: Execute signal callbacks from no-op i915_request_wait Chris Wilson
                   ` (25 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:09 UTC (permalink / raw)
  To: intel-gfx

Allow some users to surreptitiously insert lazy signal callbacks that
do not depend on enabling the signaling mechanism around every fence.
This means that we may have a cb_list even if the signaling bit is not
enabled, so always notify the callbacks.

The cost is that dma_fence_signal() must always acquire the spinlock to
ensure that the cb_list is flushed.
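
A sketch of such a lazy callback (assumed usage, not code from this
patch): the caller attaches directly to fence->cb_list under fence->lock,
bypassing dma_fence_add_callback() and so never setting the
ENABLE_SIGNAL bit:

	spin_lock_irqsave(fence->lock, flags);
	if (!dma_fence_is_signaled_locked(fence))
		list_add_tail(&cb->node, &fence->cb_list); /* cb->func preset */
	else
		run_now = true; /* already signaled, invoke cb->func directly */
	spin_unlock_irqrestore(fence->lock, flags);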

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/dma-buf/dma-fence.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 027a6a894abd..ab4a456bba04 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -170,11 +170,9 @@ int dma_fence_signal(struct dma_fence *fence)
 
 	__dma_fence_signal__timestamp(fence, ktime_get());
 
-	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags)) {
-		spin_lock_irqsave(fence->lock, flags);
-		__dma_fence_signal__notify(fence);
-		spin_unlock_irqrestore(fence->lock, flags);
-	}
+	spin_lock_irqsave(fence->lock, flags);
+	__dma_fence_signal__notify(fence);
+	spin_unlock_irqrestore(fence->lock, flags);
 	return 0;
 }
 EXPORT_SYMBOL(dma_fence_signal);
-- 
2.20.1


* [PATCH 16/39] drm/i915: Execute signal callbacks from no-op i915_request_wait
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (14 preceding siblings ...)
  2019-06-14  7:09 ` [PATCH 15/39] dma-fence: Always execute signal callbacks Chris Wilson
@ 2019-06-14  7:10 ` Chris Wilson
  2019-06-14 10:45   ` Tvrtko Ursulin
  2019-06-14  7:10 ` [PATCH 17/39] drm/i915: Make the semaphore saturation mask global Chris Wilson
                   ` (24 subsequent siblings)
  40 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:10 UTC (permalink / raw)
  To: intel-gfx

If we enter i915_request_wait() with an already completed request but an
unsignaled dma-fence, signal the fence before returning. This allows us
to execute any of the signal callbacks at the earliest opportunity.
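
This works because dma_fence_is_signaled() is more than a peek; the
upstream inline (reproduced roughly from include/linux/dma-fence.h)
opportunistically signals the fence, which now flushes the cb_list:

	static inline bool dma_fence_is_signaled(struct dma_fence *fence)
	{
		if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
			return true;

		if (fence->ops->signaled && fence->ops->signaled(fence)) {
			dma_fence_signal(fence); /* runs the callbacks */
			return true;
		}

		return false;
	}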

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_request.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index ead68a2ffeac..a82f3ff82b39 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1392,7 +1392,7 @@ long i915_request_wait(struct i915_request *rq,
 	might_sleep();
 	GEM_BUG_ON(timeout < 0);
 
-	if (i915_request_completed(rq))
+	if (dma_fence_is_signaled(&rq->fence))
 		return timeout;
 
 	if (!timeout)
-- 
2.20.1


* [PATCH 17/39] drm/i915: Make the semaphore saturation mask global
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (15 preceding siblings ...)
  2019-06-14  7:10 ` [PATCH 16/39] drm/i915: Execute signal callbacks from no-op i915_request_wait Chris Wilson
@ 2019-06-14  7:10 ` Chris Wilson
  2019-06-14  7:10 ` [PATCH 18/39] drm/i915: Throw away the active object retirement complexity Chris Wilson
                   ` (23 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:10 UTC (permalink / raw)
  To: intel-gfx; +Cc: Dmitry Ermilov

The idea behind keeping the saturation mask local to a context backfired
spectacularly. The premise with the local mask was that we would be more
proactive in attempting to use semaphores after each time the context
idled, and that all new contexts would attempt to use semaphores
ignoring the current state of the system. This turns out to be horribly
optimistic. If the system state is still oversaturated and the existing
workloads have all stopped using semaphores, the new workloads would
attempt to use semaphores and be deprioritised behind real work. The
new contexts would not switch off using semaphores until their initial
batch of low priority work had completed. Given a sufficient backlog of
equal user priority work, this would completely starve the new work of any
GPU time.

To compensate, remove the local tracking in favour of keeping it as
global state on the engine -- once the system is saturated and
semaphores are disabled, everyone stops attempting to use semaphores
until the system is idle again. One of the reasons for preferring local
context tracking was that it worked with virtual engines, so when
switching to global state we could either do a complete check of all the
virtual siblings or simply disable semaphores for those requests. This
takes the simpler approach of disabling semaphores on virtual engines.

The downside is that the decision that the engine is saturated is a
local measure -- we are only checking whether or not this context was
scheduled in a timely fashion; it may be legitimately delayed due to user
priorities. We still have the same dilemma though, that we do not want
to employ the semaphore poll unless it will be used.
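
The mechanism itself stays tiny (sketch using the names from this
patch): flag the engine when a semaphore wait had already completed by
the time we reached submission, and forget that verdict only when the
engine parks:

	/* in __i915_request_submit(): the busywait was pure waste */
	if (request->sched.semaphores &&
	    i915_sw_fence_signaled(&request->semaphore))
		engine->saturated |= request->sched.semaphores;

	/* in __engine_park(): idle again, semaphores may be worth a retry */
	engine->saturated = 0;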

Fixes: ca6e56f654e7 ("drm/i915: Disable semaphore busywaits on saturated systems")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       | 2 --
 drivers/gpu/drm/i915/gt/intel_context_types.h | 2 --
 drivers/gpu/drm/i915/gt/intel_engine_pm.c     | 2 ++
 drivers/gpu/drm/i915/gt/intel_engine_types.h  | 2 ++
 drivers/gpu/drm/i915/gt/intel_lrc.c           | 2 +-
 drivers/gpu/drm/i915/i915_request.c           | 4 ++--
 6 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 8e299c631575..3d660c8ce414 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -142,7 +142,6 @@ intel_context_init(struct intel_context *ce,
 	ce->engine = engine;
 	ce->ops = engine->cops;
 	ce->sseu = engine->sseu;
-	ce->saturated = 0;
 
 	INIT_LIST_HEAD(&ce->signal_link);
 	INIT_LIST_HEAD(&ce->signals);
@@ -223,7 +222,6 @@ void intel_context_enter_engine(struct intel_context *ce)
 
 void intel_context_exit_engine(struct intel_context *ce)
 {
-	ce->saturated = 0;
 	intel_engine_pm_put(ce->engine);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index b565c3ff4378..4c0e211c715d 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -58,8 +58,6 @@ struct intel_context {
 	atomic_t pin_count;
 	struct mutex pin_mutex; /* guards pinning and associated on-gpuing */
 
-	intel_engine_mask_t saturated; /* submitting semaphores too late? */
-
 	/**
 	 * active: Active tracker for the rq activity (inc. external) on this
 	 * intel_context object.
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index 3c448a061abd..25287140fb61 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -100,6 +100,8 @@ static int __engine_park(struct intel_wakeref *wf)
 	struct intel_engine_cs *engine =
 		container_of(wf, typeof(*engine), wakeref);
 
+	engine->saturated = 0;
+
 	/*
 	 * If one and only one request is completed between pm events,
 	 * we know that we are inside the kernel context and it is
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 11a25f060fed..1cbe10a0fec7 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -258,6 +258,8 @@ struct intel_engine_cs {
 	struct intel_context *kernel_context; /* pinned */
 	struct intel_context *preempt_context; /* pinned; optional */
 
+	intel_engine_mask_t saturated; /* submitting semaphores too late? */
+
 	unsigned long serial;
 
 	unsigned long wakeref_serial;
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 0563fe8398c5..bbbdc63906c6 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -3198,7 +3198,6 @@ static void virtual_context_exit(struct intel_context *ce)
 	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
 	unsigned int n;
 
-	ce->saturated = 0;
 	for (n = 0; n < ve->num_siblings; n++)
 		intel_engine_pm_put(ve->siblings[n]);
 }
@@ -3397,6 +3396,7 @@ intel_execlists_create_virtual(struct i915_gem_context *ctx,
 	ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
 	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
 	ve->base.flags = I915_ENGINE_IS_VIRTUAL;
+	ve->base.saturated = ALL_ENGINES;
 
 	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index a82f3ff82b39..865b49928ff9 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -418,7 +418,7 @@ void __i915_request_submit(struct i915_request *request)
 	 */
 	if (request->sched.semaphores &&
 	    i915_sw_fence_signaled(&request->semaphore))
-		request->hw_context->saturated |= request->sched.semaphores;
+		engine->saturated |= request->sched.semaphores;
 
 	/* We may be recursing from the signal callback of another i915 fence */
 	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
@@ -799,7 +799,7 @@ already_busywaiting(struct i915_request *rq)
 	 *
 	 * See the are-we-too-late? check in __i915_request_submit().
 	 */
-	return rq->sched.semaphores | rq->hw_context->saturated;
+	return rq->sched.semaphores | rq->engine->saturated;
 }
 
 static int
-- 
2.20.1


* [PATCH 18/39] drm/i915: Throw away the active object retirement complexity
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (16 preceding siblings ...)
  2019-06-14  7:10 ` [PATCH 17/39] drm/i915: Make the semaphore saturation mask global Chris Wilson
@ 2019-06-14  7:10 ` Chris Wilson
  2019-06-14  7:10 ` [PATCH 19/39] drm/i915: Provide an i915_active.acquire callback Chris Wilson
                   ` (22 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:10 UTC (permalink / raw)
  To: intel-gfx

Remove the accumulated optimisations that we have for i915_vma_retire
and reduce it to the bare essential of tracking the active object
reference. This allows us to use only atomic operations, and so to avoid
the struct_mutex requirement.

The principal loss here is the shrinker MRU bumping, so now if we have
to shrink, we will do so in a much more random order and are more likely
to shrink recently used objects. That is a nuisance, but shrinking
active objects is a second step we try to avoid and will always be a
system-wide performance issue.

The other loss here is in the automatic pruning of the
reservation_object when idling. This is not as large an issue as it was
upon reservation_object's introduction, as adding new fences into the
object now replaces already signaled fences, keeping the array compact.
But we do
lose the auto-expiration of stale fences and unused arrays. That may be
a noticeable problem for which we need to re-implement autopruning.
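
With obj->active_count gone, "is this object busy?" collapses into a
query on the reservation object; a hypothetical helper (name invented
for illustration) would be:

	static bool i915_gem_object_is_idle(struct drm_i915_gem_object *obj)
	{
		/* idle once the exclusive and all shared fences have signaled */
		return reservation_object_test_signaled_rcu(obj->resv, true);
	}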

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |  1 -
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |  6 ---
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  1 -
 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c  |  5 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c    |  9 ----
 drivers/gpu/drm/i915/gt/intel_lrc.c           |  4 +-
 drivers/gpu/drm/i915/gt/intel_ringbuffer.c    |  1 -
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  | 32 +++++------
 drivers/gpu/drm/i915/i915_debugfs.c           |  8 +--
 drivers/gpu/drm/i915/i915_gem_batch_pool.c    | 42 ++++++---------
 drivers/gpu/drm/i915/i915_vma.c               | 54 ++++---------------
 11 files changed, 47 insertions(+), 116 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 36b76c6a0a9d..01528553d4cc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -189,7 +189,6 @@ static void __i915_gem_free_objects(struct drm_i915_private *i915,
 
 		mutex_lock(&i915->drm.struct_mutex);
 
-		GEM_BUG_ON(i915_gem_object_is_active(obj));
 		list_for_each_entry_safe(vma, vn, &obj->vma.list, obj_link) {
 			GEM_BUG_ON(i915_vma_is_active(vma));
 			vma->flags &= ~I915_VMA_PIN_MASK;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 7cb1871d7128..454bfb498001 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -158,12 +158,6 @@ i915_gem_object_needs_async_cancel(const struct drm_i915_gem_object *obj)
 	return obj->ops->flags & I915_GEM_OBJECT_ASYNC_CANCEL;
 }
 
-static inline bool
-i915_gem_object_is_active(const struct drm_i915_gem_object *obj)
-{
-	return READ_ONCE(obj->active_count);
-}
-
 static inline bool
 i915_gem_object_is_framebuffer(const struct drm_i915_gem_object *obj)
 {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 5b05698619ce..c299fed2c6b1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -156,7 +156,6 @@ struct drm_i915_gem_object {
 
 	/** Count of VMA actually bound by this object */
 	atomic_t bind_count;
-	unsigned int active_count;
 	/** Count of how many global VMA are currently pinned for use by HW */
 	unsigned int pin_global;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
index e15f37bef36a..7445e6f662ac 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
@@ -235,8 +235,9 @@ i915_gem_shrink(struct drm_i915_private *i915,
 				continue;
 
 			if (!(shrink & I915_SHRINK_ACTIVE) &&
-			    (i915_gem_object_is_active(obj) ||
-			     i915_gem_object_is_framebuffer(obj)))
+			    (i915_gem_object_is_framebuffer(obj) ||
+			     reservation_object_test_signaled_rcu(obj->resv,
+								  true)))
 				continue;
 
 			if (!(shrink & I915_SHRINK_BOUND) &&
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index b92809418729..aac093b1b0b2 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -474,15 +474,6 @@ static int igt_mmap_offset_exhaustion(void *arg)
 			pr_err("[loop %d] Failed to busy the object\n", loop);
 			goto err_obj;
 		}
-
-		/* NB we rely on the _active_ reference to access obj now */
-		GEM_BUG_ON(!i915_gem_object_is_active(obj));
-		err = create_mmap_offset(obj);
-		if (err) {
-			pr_err("[loop %d] create_mmap_offset failed with err=%d\n",
-			       loop, err);
-			goto out;
-		}
 	}
 
 out:
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index bbbdc63906c6..cd4cf4d0b30c 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1509,9 +1509,7 @@ static void execlists_submit_request(struct i915_request *request)
 static void __execlists_context_fini(struct intel_context *ce)
 {
 	intel_ring_put(ce->ring);
-
-	GEM_BUG_ON(i915_gem_object_is_active(ce->state->obj));
-	i915_gem_object_put(ce->state->obj);
+	i915_vma_put(ce->state);
 }
 
 static void execlists_context_destroy(struct kref *kref)
diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
index 019bf039f616..5ccf61806d9e 100644
--- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
@@ -1309,7 +1309,6 @@ void intel_ring_free(struct kref *ref)
 
 static void __ring_context_fini(struct intel_context *ce)
 {
-	GEM_BUG_ON(i915_gem_object_is_active(ce->state->obj));
 	i915_gem_object_put(ce->state->obj);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index 45379a63e013..8f2f07cae7bd 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -129,33 +129,29 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = h->i915;
 	struct i915_address_space *vm = h->ctx->vm ?: &i915->ggtt.vm;
+	struct drm_i915_gem_object *obj;
 	struct i915_request *rq = NULL;
 	struct i915_vma *hws, *vma;
 	unsigned int flags;
+	void *vaddr;
 	u32 *batch;
 	int err;
 
-	if (i915_gem_object_is_active(h->obj)) {
-		struct drm_i915_gem_object *obj;
-		void *vaddr;
-
-		obj = i915_gem_object_create_internal(h->i915, PAGE_SIZE);
-		if (IS_ERR(obj))
-			return ERR_CAST(obj);
+	obj = i915_gem_object_create_internal(h->i915, PAGE_SIZE);
+	if (IS_ERR(obj))
+		return ERR_CAST(obj);
 
-		vaddr = i915_gem_object_pin_map(obj,
-						i915_coherent_map_type(h->i915));
-		if (IS_ERR(vaddr)) {
-			i915_gem_object_put(obj);
-			return ERR_CAST(vaddr);
-		}
+	vaddr = i915_gem_object_pin_map(obj, i915_coherent_map_type(h->i915));
+	if (IS_ERR(vaddr)) {
+		i915_gem_object_put(obj);
+		return ERR_CAST(vaddr);
+	}
 
-		i915_gem_object_unpin_map(h->obj);
-		i915_gem_object_put(h->obj);
+	i915_gem_object_unpin_map(h->obj);
+	i915_gem_object_put(h->obj);
 
-		h->obj = obj;
-		h->batch = vaddr;
-	}
+	h->obj = obj;
+	h->batch = vaddr;
 
 	vma = i915_vma_instance(h->obj, vm, NULL);
 	if (IS_ERR(vma))
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 323863504111..d8fe75f5e0d6 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -74,11 +74,6 @@ static int i915_capabilities(struct seq_file *m, void *data)
 	return 0;
 }
 
-static char get_active_flag(struct drm_i915_gem_object *obj)
-{
-	return i915_gem_object_is_active(obj) ? '*' : ' ';
-}
-
 static char get_pin_flag(struct drm_i915_gem_object *obj)
 {
 	return obj->pin_global ? 'p' : ' ';
@@ -143,9 +138,8 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 	unsigned int frontbuffer_bits;
 	int pin_count = 0;
 
-	seq_printf(m, "%pK: %c%c%c%c%c %8zdKiB %02x %02x %s%s%s",
+	seq_printf(m, "%pK: %c%c%c%c %8zdKiB %02x %02x %s%s%s",
 		   &obj->base,
-		   get_active_flag(obj),
 		   get_pin_flag(obj),
 		   get_tiling_flag(obj),
 		   get_global_flag(obj),
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index 56adfdcaed3e..1b7595e2ac21 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -94,34 +94,26 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 	list = &pool->cache_list[n];
 
 	list_for_each_entry(obj, list, batch_pool_link) {
+		struct reservation_object *resv = obj->resv;
+
 		/* The batches are strictly LRU ordered */
-		if (i915_gem_object_is_active(obj)) {
-			struct reservation_object *resv = obj->resv;
-
-			if (!reservation_object_test_signaled_rcu(resv, true))
-				break;
-
-			i915_retire_requests(pool->engine->i915);
-			GEM_BUG_ON(i915_gem_object_is_active(obj));
-
-			/*
-			 * The object is now idle, clear the array of shared
-			 * fences before we add a new request. Although, we
-			 * remain on the same engine, we may be on a different
-			 * timeline and so may continually grow the array,
-			 * trapping a reference to all the old fences, rather
-			 * than replace the existing fence.
-			 */
-			if (rcu_access_pointer(resv->fence)) {
-				reservation_object_lock(resv, NULL);
-				reservation_object_add_excl_fence(resv, NULL);
-				reservation_object_unlock(resv);
-			}
+		if (!reservation_object_test_signaled_rcu(resv, true))
+			break;
+
+		/*
+		 * The object is now idle, clear the array of shared
+		 * fences before we add a new request. Although, we
+		 * remain on the same engine, we may be on a different
+		 * timeline and so may continually grow the array,
+		 * trapping a reference to all the old fences, rather
+		 * than replace the existing fence.
+		 */
+		if (rcu_access_pointer(resv->fence)) {
+			reservation_object_lock(resv, NULL);
+			reservation_object_add_excl_fence(resv, NULL);
+			reservation_object_unlock(resv);
 		}
 
-		GEM_BUG_ON(!reservation_object_test_signaled_rcu(obj->resv,
-								 true));
-
 		if (obj->base.size >= size)
 			goto found;
 	}
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index cb341e4acf99..f782919ff19b 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -77,43 +77,11 @@ static void vma_print_allocator(struct i915_vma *vma, const char *reason)
 
 #endif
 
-static void obj_bump_mru(struct drm_i915_gem_object *obj)
-{
-	struct drm_i915_private *i915 = to_i915(obj->base.dev);
-	unsigned long flags;
-
-	spin_lock_irqsave(&i915->mm.obj_lock, flags);
-	list_move_tail(&obj->mm.link, &i915->mm.shrink_list);
-	spin_unlock_irqrestore(&i915->mm.obj_lock, flags);
-
-	obj->mm.dirty = true; /* be paranoid  */
-}
-
 static void __i915_vma_retire(struct i915_active *ref)
 {
 	struct i915_vma *vma = container_of(ref, typeof(*vma), active);
-	struct drm_i915_gem_object *obj = vma->obj;
-
-	GEM_BUG_ON(!i915_gem_object_is_active(obj));
-	if (--obj->active_count)
-		return;
-
-	/* Prune the shared fence arrays iff completely idle (inc. external) */
-	if (reservation_object_trylock(obj->resv)) {
-		if (reservation_object_test_signaled_rcu(obj->resv, true))
-			reservation_object_add_excl_fence(obj->resv, NULL);
-		reservation_object_unlock(obj->resv);
-	}
 
-	/*
-	 * Bump our place on the bound list to keep it roughly in LRU order
-	 * so that we don't steal from recently used but inactive objects
-	 * (unless we are forced to ofc!)
-	 */
-	if (i915_gem_object_is_shrinkable(obj))
-		obj_bump_mru(obj);
-
-	i915_gem_object_put(obj); /* and drop the active reference */
+	i915_vma_put(vma);
 }
 
 static struct i915_vma *
@@ -921,6 +889,7 @@ int i915_vma_move_to_active(struct i915_vma *vma,
 			    unsigned int flags)
 {
 	struct drm_i915_gem_object *obj = vma->obj;
+	int err;
 
 	assert_vma_held(vma);
 	assert_object_held(obj);
@@ -934,17 +903,13 @@ int i915_vma_move_to_active(struct i915_vma *vma,
 	 * add the active reference first and queue for it to be dropped
 	 * *last*.
 	 */
-	if (!vma->active.count && !obj->active_count++)
-		i915_gem_object_get(obj); /* once more for the active ref */
-
-	if (unlikely(i915_active_ref(&vma->active, rq->fence.context, rq))) {
-		if (!vma->active.count && !--obj->active_count)
-			i915_gem_object_put(obj);
-		return -ENOMEM;
-	}
+	if (i915_active_acquire(&vma->active))
+		i915_vma_get(vma);
 
-	GEM_BUG_ON(!i915_vma_is_active(vma));
-	GEM_BUG_ON(!obj->active_count);
+	err = i915_active_ref(&vma->active, rq->fence.context, rq);
+	i915_active_release(&vma->active);
+	if (unlikely(err))
+		return err;
 
 	obj->write_domain = 0;
 	if (flags & EXEC_OBJECT_WRITE) {
@@ -956,11 +921,14 @@ int i915_vma_move_to_active(struct i915_vma *vma,
 		obj->read_domains = 0;
 	}
 	obj->read_domains |= I915_GEM_GPU_DOMAINS;
+	obj->mm.dirty = true;
 
 	if (flags & EXEC_OBJECT_NEEDS_FENCE)
 		__i915_active_request_set(&vma->last_fence, rq);
 
 	export_fence(vma, rq, flags);
+
+	GEM_BUG_ON(!i915_vma_is_active(vma));
 	return 0;
 }
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH 19/39] drm/i915: Provide an i915_active.acquire callback
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (17 preceding siblings ...)
  2019-06-14  7:10 ` [PATCH 18/39] drm/i915: Throw away the active object retirement complexity Chris Wilson
@ 2019-06-14  7:10 ` Chris Wilson
  2019-06-14  7:10 ` [PATCH 20/39] drm/i915: Push the i915_active.retire into a worker Chris Wilson
                   ` (21 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:10 UTC (permalink / raw)
  To: intel-gfx

If we introduce a callback for i915_active that is called only the
first time the i915_active is used, and that is symmetrically paired
with the i915_active.retire callback, we can replace the open-coded
and non-atomic implementations -- which would be very fragile
(i.e. broken) once the struct_mutex serialisation is removed.
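
To sketch the intended pairing (not code from this patch; "foo" stands
in for any hypothetical user of i915_active): the active callback runs
on the first acquire, i.e. the idle -> busy transition, and retire runs
once the last tracked activity completes:

    i915_active_init(i915, &foo->active, foo_active, foo_retire);

    err = i915_active_acquire(&foo->active); /* foo_active() if idle */
    if (err)
            return err;

    err = i915_active_ref(&foo->active, rq->fence.context, rq);

    i915_active_release(&foo->active); /* foo_retire() once all requests retire */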

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c  |   8 +-
 drivers/gpu/drm/i915/gt/intel_context.c      |  82 ++++---
 drivers/gpu/drm/i915/gt/intel_context.h      |  14 +-
 drivers/gpu/drm/i915/gt/intel_lrc.c          |   6 +-
 drivers/gpu/drm/i915/gt/intel_ringbuffer.c   |   2 +-
 drivers/gpu/drm/i915/gt/mock_engine.c        |   2 +-
 drivers/gpu/drm/i915/i915_active.c           | 217 ++++++++++---------
 drivers/gpu/drm/i915/i915_active.h           |  25 ++-
 drivers/gpu/drm/i915/i915_active_types.h     |  10 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c          |   2 +-
 drivers/gpu/drm/i915/i915_timeline.c         |  16 +-
 drivers/gpu/drm/i915/i915_vma.c              |  22 +-
 drivers/gpu/drm/i915/selftests/i915_active.c |  12 +-
 13 files changed, 225 insertions(+), 193 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 1ce122f4ed25..9262a1d4f763 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -923,8 +923,12 @@ static int context_barrier_task(struct i915_gem_context *ctx,
 	if (!cb)
 		return -ENOMEM;
 
-	i915_active_init(i915, &cb->base, cb_retire);
-	i915_active_acquire(&cb->base);
+	i915_active_init(i915, &cb->base, NULL, cb_retire);
+	err = i915_active_acquire(&cb->base);
+	if (err) {
+		kfree(cb);
+		return err;
+	}
 
 	for_each_gem_engine(ce, i915_gem_context_lock_engines(ctx), it) {
 		struct i915_request *rq;
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 3d660c8ce414..2ea05dcdb99f 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -95,11 +95,14 @@ void intel_context_unpin(struct intel_context *ce)
 	intel_context_put(ce);
 }
 
-static int __context_pin_state(struct i915_vma *vma, unsigned long flags)
+static int __context_pin_state(struct i915_vma *vma)
 {
+	u64 flags;
 	int err;
 
-	err = i915_vma_pin(vma, 0, 0, flags | PIN_GLOBAL);
+	flags = PIN_HIGH | PIN_GLOBAL;
+	flags |= i915_vm_to_ggtt(vma->vm)->pin_bias | PIN_OFFSET_BIAS;
+	err = i915_vma_pin(vma, 0, 0, flags);
 	if (err)
 		return err;
 
@@ -119,7 +122,7 @@ static void __context_unpin_state(struct i915_vma *vma)
 	__i915_vma_unpin(vma);
 }
 
-static void intel_context_retire(struct i915_active *active)
+static void __intel_context_retire(struct i915_active *active)
 {
 	struct intel_context *ce = container_of(active, typeof(*ce), active);
 
@@ -129,65 +132,58 @@ static void intel_context_retire(struct i915_active *active)
 	intel_context_put(ce);
 }
 
-void
-intel_context_init(struct intel_context *ce,
-		   struct i915_gem_context *ctx,
-		   struct intel_engine_cs *engine)
-{
-	GEM_BUG_ON(!engine->cops);
-
-	kref_init(&ce->ref);
-
-	ce->gem_context = ctx;
-	ce->engine = engine;
-	ce->ops = engine->cops;
-	ce->sseu = engine->sseu;
-
-	INIT_LIST_HEAD(&ce->signal_link);
-	INIT_LIST_HEAD(&ce->signals);
-
-	mutex_init(&ce->pin_mutex);
-
-	i915_active_init(ctx->i915, &ce->active, intel_context_retire);
-}
-
-int intel_context_active_acquire(struct intel_context *ce, unsigned long flags)
+static int __intel_context_active(struct i915_active *active)
 {
+	struct intel_context *ce = container_of(active, typeof(*ce), active);
 	int err;
 
-	if (!i915_active_acquire(&ce->active))
-		return 0;
-
 	intel_context_get(ce);
 
 	if (!ce->state)
 		return 0;
 
-	err = __context_pin_state(ce->state, flags);
-	if (err) {
-		i915_active_cancel(&ce->active);
-		intel_context_put(ce);
-		return err;
-	}
+	err = __context_pin_state(ce->state);
+	if (err)
+		goto err_put;
 
 	/* Preallocate tracking nodes */
 	if (!i915_gem_context_is_kernel(ce->gem_context)) {
 		err = i915_active_acquire_preallocate_barrier(&ce->active,
 							      ce->engine);
-		if (err) {
-			i915_active_release(&ce->active);
-			return err;
-		}
+		if (err)
+			goto err_unpin;
 	}
 
 	return 0;
+
+err_unpin:
+	__context_unpin_state(ce->state);
+err_put:
+	intel_context_put(ce);
+	return err;
 }
 
-void intel_context_active_release(struct intel_context *ce)
+void
+intel_context_init(struct intel_context *ce,
+		   struct i915_gem_context *ctx,
+		   struct intel_engine_cs *engine)
 {
-	/* Nodes preallocated in intel_context_active() */
-	i915_active_acquire_barrier(&ce->active);
-	i915_active_release(&ce->active);
+	GEM_BUG_ON(!engine->cops);
+
+	kref_init(&ce->ref);
+
+	ce->gem_context = ctx;
+	ce->engine = engine;
+	ce->ops = engine->cops;
+	ce->sseu = engine->sseu;
+
+	INIT_LIST_HEAD(&ce->signal_link);
+	INIT_LIST_HEAD(&ce->signals);
+
+	mutex_init(&ce->pin_mutex);
+
+	i915_active_init(ctx->i915, &ce->active,
+			 __intel_context_active, __intel_context_retire);
 }
 
 static void i915_global_context_shrink(void)
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index a47275bc4f01..40cd8320fcc3 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -9,6 +9,7 @@
 
 #include <linux/lockdep.h>
 
+#include "i915_active.h"
 #include "intel_context_types.h"
 #include "intel_engine_types.h"
 
@@ -102,8 +103,17 @@ static inline void intel_context_exit(struct intel_context *ce)
 		ce->ops->exit(ce);
 }
 
-int intel_context_active_acquire(struct intel_context *ce, unsigned long flags);
-void intel_context_active_release(struct intel_context *ce);
+static inline int intel_context_active_acquire(struct intel_context *ce)
+{
+	return i915_active_acquire(&ce->active);
+}
+
+static inline void intel_context_active_release(struct intel_context *ce)
+{
+	/* Nodes preallocated in intel_context_active() */
+	i915_active_acquire_barrier(&ce->active);
+	i915_active_release(&ce->active);
+}
 
 static inline struct intel_context *intel_context_get(struct intel_context *ce)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index cd4cf4d0b30c..5dbc43c70496 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1565,12 +1565,10 @@ __execlists_context_pin(struct intel_context *ce,
 		goto err;
 	GEM_BUG_ON(!ce->state);
 
-	ret = intel_context_active_acquire(ce,
-					   engine->i915->ggtt.pin_bias |
-					   PIN_OFFSET_BIAS |
-					   PIN_HIGH);
+	ret = intel_context_active_acquire(ce);
 	if (ret)
 		goto err;
+	GEM_BUG_ON(!i915_vma_is_pinned(ce->state));
 
 	vaddr = i915_gem_object_pin_map(ce->state->obj,
 					i915_coherent_map_type(engine->i915) |
diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
index 5ccf61806d9e..bead5afaeb13 100644
--- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
@@ -1437,7 +1437,7 @@ static int ring_context_pin(struct intel_context *ce)
 		ce->state = vma;
 	}
 
-	err = intel_context_active_acquire(ce, PIN_HIGH);
+	err = intel_context_active_acquire(ce);
 	if (err)
 		return err;
 
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index 086801b51441..b9c2764beca3 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -154,7 +154,7 @@ static int mock_context_pin(struct intel_context *ce)
 			return -ENOMEM;
 	}
 
-	ret = intel_context_active_acquire(ce, PIN_HIGH);
+	ret = intel_context_active_acquire(ce);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index 2d019ac6db20..6a9f8d37f415 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -30,48 +30,50 @@ struct active_node {
 };
 
 static void
-__active_park(struct i915_active *ref)
+active_retire(struct i915_active *ref)
 {
 	struct active_node *it, *n;
+	struct rb_root root;
+	bool retire = false;
 
-	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
-		GEM_BUG_ON(i915_active_request_isset(&it->base));
-		kmem_cache_free(global.slab_cache, it);
+	GEM_BUG_ON(!atomic_read(&ref->count));
+	if (atomic_add_unless(&ref->count, -1, 1))
+		return;
+
+	/* One active may be flushed from inside the acquire of another */
+	mutex_lock_nested(&ref->mutex, SINGLE_DEPTH_NESTING);
+
+	/* return the unused nodes to our slabcache -- flushing the allocator */
+	if (atomic_dec_and_test(&ref->count)) {
+		root = ref->tree;
+		ref->tree = RB_ROOT;
+		ref->cache = NULL;
+		retire = true;
 	}
-	ref->tree = RB_ROOT;
-}
 
-static void
-__active_retire(struct i915_active *ref)
-{
-	GEM_BUG_ON(!ref->count);
-	if (--ref->count)
+	mutex_unlock(&ref->mutex);
+	if (!retire)
 		return;
 
-	/* return the unused nodes to our slabcache */
-	__active_park(ref);
-
 	ref->retire(ref);
-}
 
-static void
-node_retire(struct i915_active_request *base, struct i915_request *rq)
-{
-	__active_retire(container_of(base, struct active_node, base)->ref);
+	rbtree_postorder_for_each_entry_safe(it, n, &root, node) {
+		GEM_BUG_ON(i915_active_request_isset(&it->base));
+		kmem_cache_free(global.slab_cache, it);
+	}
 }
 
 static void
-last_retire(struct i915_active_request *base, struct i915_request *rq)
+node_retire(struct i915_active_request *base, struct i915_request *rq)
 {
-	__active_retire(container_of(base, struct i915_active, last));
+	active_retire(container_of(base, struct active_node, base)->ref);
 }
 
 static struct i915_active_request *
 active_instance(struct i915_active *ref, u64 idx)
 {
-	struct active_node *node;
+	struct active_node *node, *prealloc;
 	struct rb_node **p, *parent;
-	struct i915_request *old;
 
 	/*
 	 * We track the most recently used timeline to skip a rbtree search
@@ -79,20 +81,18 @@ active_instance(struct i915_active *ref, u64 idx)
 	 * at all. We can reuse the last slot if it is empty, that is
 	 * after the previous activity has been retired, or if it matches the
 	 * current timeline.
-	 *
-	 * Note that we allow the timeline to be active simultaneously in
-	 * the rbtree and the last cache. We do this to avoid having
-	 * to search and replace the rbtree element for a new timeline, with
-	 * the cost being that we must be aware that the ref may be retired
-	 * twice for the same timeline (as the older rbtree element will be
-	 * retired before the new request added to last).
 	 */
-	old = i915_active_request_raw(&ref->last, BKL(ref));
-	if (!old || old->fence.context == idx)
-		goto out;
+	node = READ_ONCE(ref->cache);
+	if (node && node->timeline == idx)
+		return &node->base;
 
-	/* Move the currently active fence into the rbtree */
-	idx = old->fence.context;
+	/* Preallocate a replacement, just in case */
+	prealloc = kmem_cache_alloc(global.slab_cache, GFP_KERNEL);
+	if (!prealloc)
+		return NULL;
+
+	mutex_lock(&ref->mutex);
+	GEM_BUG_ON(i915_active_is_idle(ref));
 
 	parent = NULL;
 	p = &ref->tree.rb_node;
@@ -100,8 +100,10 @@ active_instance(struct i915_active *ref, u64 idx)
 		parent = *p;
 
 		node = rb_entry(parent, struct active_node, node);
-		if (node->timeline == idx)
-			goto replace;
+		if (node->timeline == idx) {
+			kmem_cache_free(global.slab_cache, prealloc);
+			goto out;
+		}
 
 		if (node->timeline < idx)
 			p = &parent->rb_right;
@@ -109,17 +111,7 @@ active_instance(struct i915_active *ref, u64 idx)
 			p = &parent->rb_left;
 	}
 
-	node = kmem_cache_alloc(global.slab_cache, GFP_KERNEL);
-
-	/* kmalloc may retire the ref->last (thanks shrinker)! */
-	if (unlikely(!i915_active_request_raw(&ref->last, BKL(ref)))) {
-		kmem_cache_free(global.slab_cache, node);
-		goto out;
-	}
-
-	if (unlikely(!node))
-		return ERR_PTR(-ENOMEM);
-
+	node = prealloc;
 	i915_active_request_init(&node->base, NULL, node_retire);
 	node->ref = ref;
 	node->timeline = idx;
@@ -127,38 +119,27 @@ active_instance(struct i915_active *ref, u64 idx)
 	rb_link_node(&node->node, parent, p);
 	rb_insert_color(&node->node, &ref->tree);
 
-replace:
-	/*
-	 * Overwrite the previous active slot in the rbtree with last,
-	 * leaving last zeroed. If the previous slot is still active,
-	 * we must be careful as we now only expect to receive one retire
-	 * callback not two, and so much undo the active counting for the
-	 * overwritten slot.
-	 */
-	if (i915_active_request_isset(&node->base)) {
-		/* Retire ourselves from the old rq->active_list */
-		__list_del_entry(&node->base.link);
-		ref->count--;
-		GEM_BUG_ON(!ref->count);
-	}
-	GEM_BUG_ON(list_empty(&ref->last.link));
-	list_replace_init(&ref->last.link, &node->base.link);
-	node->base.request = fetch_and_zero(&ref->last.request);
-
 out:
-	return &ref->last;
+	ref->cache = node;
+	mutex_unlock(&ref->mutex);
+
+	return &node->base;
 }
 
-void i915_active_init(struct drm_i915_private *i915,
-		      struct i915_active *ref,
-		      void (*retire)(struct i915_active *ref))
+void __i915_active_init(struct drm_i915_private *i915,
+			struct i915_active *ref,
+			int (*active)(struct i915_active *ref),
+			void (*retire)(struct i915_active *ref),
+			struct lock_class_key *key)
 {
 	ref->i915 = i915;
+	ref->active = active;
 	ref->retire = retire;
 	ref->tree = RB_ROOT;
-	i915_active_request_init(&ref->last, NULL, last_retire);
+	ref->cache = NULL;
 	init_llist_head(&ref->barriers);
-	ref->count = 0;
+	atomic_set(&ref->count, 0);
+	__mutex_init(&ref->mutex, "i915_active", key);
 }
 
 int i915_active_ref(struct i915_active *ref,
@@ -166,60 +147,80 @@ int i915_active_ref(struct i915_active *ref,
 		    struct i915_request *rq)
 {
 	struct i915_active_request *active;
-	int err = 0;
+	int err;
 
 	/* Prevent reaping in case we malloc/wait while building the tree */
-	i915_active_acquire(ref);
+	err = i915_active_acquire(ref);
+	if (err)
+		return err;
 
 	active = active_instance(ref, timeline);
-	if (IS_ERR(active)) {
-		err = PTR_ERR(active);
+	if (!active) {
+		err = -ENOMEM;
 		goto out;
 	}
 
 	if (!i915_active_request_isset(active))
-		ref->count++;
+		atomic_inc(&ref->count);
 	__i915_active_request_set(active, rq);
 
-	GEM_BUG_ON(!ref->count);
 out:
 	i915_active_release(ref);
 	return err;
 }
 
-bool i915_active_acquire(struct i915_active *ref)
+int i915_active_acquire(struct i915_active *ref)
 {
-	lockdep_assert_held(BKL(ref));
-	return !ref->count++;
+	int err;
+
+	if (atomic_add_unless(&ref->count, 1, 0))
+		return 0;
+
+	err = mutex_lock_interruptible(&ref->mutex);
+	if (err)
+		return err;
+
+	if (!atomic_read(&ref->count) && ref->active)
+		err = ref->active(ref);
+	if (!err)
+		atomic_inc(&ref->count);
+
+	mutex_unlock(&ref->mutex);
+
+	return err;
 }
 
 void i915_active_release(struct i915_active *ref)
 {
-	lockdep_assert_held(BKL(ref));
-	__active_retire(ref);
+	active_retire(ref);
 }
 
 int i915_active_wait(struct i915_active *ref)
 {
 	struct active_node *it, *n;
-	int ret = 0;
+	int err;
 
-	if (i915_active_acquire(ref))
-		goto out_release;
+	might_sleep();
+	if (RB_EMPTY_ROOT(&ref->tree))
+		return 0;
 
-	ret = i915_active_request_retire(&ref->last, BKL(ref));
-	if (ret)
-		goto out_release;
+	err = mutex_lock_interruptible(&ref->mutex);
+	if (err)
+		return err;
+
+	if (!atomic_add_unless(&ref->count, 1, 0))
+		goto unlock;
 
 	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
-		ret = i915_active_request_retire(&it->base, BKL(ref));
-		if (ret)
+		err = i915_active_request_retire(&it->base, BKL(ref));
+		if (err)
 			break;
 	}
 
-out_release:
-	i915_active_release(ref);
-	return ret;
+	active_retire(ref);
+unlock:
+	mutex_unlock(&ref->mutex);
+	return err;
 }
 
 int i915_request_await_active_request(struct i915_request *rq,
@@ -234,23 +235,24 @@ int i915_request_await_active_request(struct i915_request *rq,
 int i915_request_await_active(struct i915_request *rq, struct i915_active *ref)
 {
 	struct active_node *it, *n;
-	int err = 0;
+	int err;
 
-	/* await allocates and so we need to avoid hitting the shrinker */
-	if (i915_active_acquire(ref))
-		goto out; /* was idle */
+	if (RB_EMPTY_ROOT(&ref->tree))
+		return 0;
 
-	err = i915_request_await_active_request(rq, &ref->last);
+	/* await allocates and so we need to avoid hitting the shrinker */
+	err = i915_active_acquire(ref);
 	if (err)
-		goto out;
+		return err;
 
+	mutex_lock(&ref->mutex);
 	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
 		err = i915_request_await_active_request(rq, &it->base);
 		if (err)
-			goto out;
+			break;
 	}
+	mutex_unlock(&ref->mutex);
 
-out:
 	i915_active_release(ref);
 	return err;
 }
@@ -258,9 +260,9 @@ int i915_request_await_active(struct i915_request *rq, struct i915_active *ref)
 #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
 void i915_active_fini(struct i915_active *ref)
 {
-	GEM_BUG_ON(i915_active_request_isset(&ref->last));
 	GEM_BUG_ON(!RB_EMPTY_ROOT(&ref->tree));
-	GEM_BUG_ON(ref->count);
+	GEM_BUG_ON(atomic_read(&ref->count));
+	mutex_destroy(&ref->mutex);
 }
 #endif
 
@@ -286,7 +288,7 @@ int i915_active_acquire_preallocate_barrier(struct i915_active *ref,
 					 (void *)engine, node_retire);
 		node->timeline = kctx->ring->timeline->fence_context;
 		node->ref = ref;
-		ref->count++;
+		atomic_inc(&ref->count);
 
 		llist_add((struct llist_node *)&node->base.link,
 			  &ref->barriers);
@@ -299,8 +301,9 @@ void i915_active_acquire_barrier(struct i915_active *ref)
 {
 	struct llist_node *pos, *next;
 
-	i915_active_acquire(ref);
+	GEM_BUG_ON(i915_active_is_idle(ref));
 
+	mutex_lock_nested(&ref->mutex, SINGLE_DEPTH_NESTING);
 	llist_for_each_safe(pos, next, llist_del_all(&ref->barriers)) {
 		struct intel_engine_cs *engine;
 		struct active_node *node;
@@ -329,7 +332,7 @@ void i915_active_acquire_barrier(struct i915_active *ref)
 		llist_add((struct llist_node *)&node->base.link,
 			  &engine->barrier_tasks);
 	}
-	i915_active_release(ref);
+	mutex_unlock(&ref->mutex);
 }
 
 void i915_request_add_barriers(struct i915_request *rq)
diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
index d55d37673944..6fa9263ca407 100644
--- a/drivers/gpu/drm/i915/i915_active.h
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -369,9 +369,16 @@ i915_active_request_retire(struct i915_active_request *active,
  * synchronisation.
  */
 
-void i915_active_init(struct drm_i915_private *i915,
-		      struct i915_active *ref,
-		      void (*retire)(struct i915_active *ref));
+void __i915_active_init(struct drm_i915_private *i915,
+			struct i915_active *ref,
+			int (*active)(struct i915_active *ref),
+			void (*retire)(struct i915_active *ref),
+			struct lock_class_key *key);
+#define i915_active_init(i915, ref, active, retire) do {		\
+	static struct lock_class_key __key;				\
+									\
+	__i915_active_init(i915, ref, active, retire, &__key);		\
+} while (0)
 
 int i915_active_ref(struct i915_active *ref,
 		    u64 timeline,
@@ -384,20 +391,14 @@ int i915_request_await_active(struct i915_request *rq,
 int i915_request_await_active_request(struct i915_request *rq,
 				      struct i915_active_request *active);
 
-bool i915_active_acquire(struct i915_active *ref);
-
-static inline void i915_active_cancel(struct i915_active *ref)
-{
-	GEM_BUG_ON(ref->count != 1);
-	ref->count = 0;
-}
-
+int i915_active_acquire(struct i915_active *ref);
 void i915_active_release(struct i915_active *ref);
+void __i915_active_release_nested(struct i915_active *ref, int subclass);
 
 static inline bool
 i915_active_is_idle(const struct i915_active *ref)
 {
-	return !ref->count;
+	return !atomic_read(&ref->count);
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
diff --git a/drivers/gpu/drm/i915/i915_active_types.h b/drivers/gpu/drm/i915/i915_active_types.h
index c025991b9233..5b0a3024ce24 100644
--- a/drivers/gpu/drm/i915/i915_active_types.h
+++ b/drivers/gpu/drm/i915/i915_active_types.h
@@ -7,7 +7,9 @@
 #ifndef _I915_ACTIVE_TYPES_H_
 #define _I915_ACTIVE_TYPES_H_
 
+#include <linux/atomic.h>
 #include <linux/llist.h>
+#include <linux/mutex.h>
 #include <linux/rbtree.h>
 #include <linux/rcupdate.h>
 
@@ -24,13 +26,17 @@ struct i915_active_request {
 	i915_active_retire_fn retire;
 };
 
+struct active_node;
+
 struct i915_active {
 	struct drm_i915_private *i915;
 
+	struct active_node *cache;
 	struct rb_root tree;
-	struct i915_active_request last;
-	unsigned int count;
+	struct mutex mutex;
+	atomic_t count;
 
+	int (*active)(struct i915_active *ref);
 	void (*retire)(struct i915_active *ref);
 
 	struct llist_head barriers;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 7be72388b052..611ced79fd09 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2061,7 +2061,7 @@ static struct i915_vma *pd_vma_create(struct gen6_ppgtt *ppgtt, int size)
 	if (!vma)
 		return ERR_PTR(-ENOMEM);
 
-	i915_active_init(i915, &vma->active, NULL);
+	i915_active_init(i915, &vma->active, NULL, NULL);
 	INIT_ACTIVE_REQUEST(&vma->last_fence);
 
 	vma->vm = &ggtt->vm;
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index c311ce9c6f9d..e2f9336bab83 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -148,6 +148,15 @@ static void __cacheline_retire(struct i915_active *active)
 		__idle_cacheline_free(cl);
 }
 
+static int __cacheline_active(struct i915_active *active)
+{
+	struct i915_timeline_cacheline *cl =
+		container_of(active, typeof(*cl), active);
+
+	__i915_vma_pin(cl->hwsp->vma);
+	return 0;
+}
+
 static struct i915_timeline_cacheline *
 cacheline_alloc(struct i915_timeline_hwsp *hwsp, unsigned int cacheline)
 {
@@ -170,15 +179,16 @@ cacheline_alloc(struct i915_timeline_hwsp *hwsp, unsigned int cacheline)
 	cl->hwsp = hwsp;
 	cl->vaddr = page_pack_bits(vaddr, cacheline);
 
-	i915_active_init(hwsp_to_i915(hwsp), &cl->active, __cacheline_retire);
+	i915_active_init(hwsp_to_i915(hwsp), &cl->active,
+			 __cacheline_active, __cacheline_retire);
 
 	return cl;
 }
 
 static void cacheline_acquire(struct i915_timeline_cacheline *cl)
 {
-	if (cl && i915_active_acquire(&cl->active))
-		__i915_vma_pin(cl->hwsp->vma);
+	if (cl)
+		i915_active_acquire(&cl->active);
 }
 
 static void cacheline_release(struct i915_timeline_cacheline *cl)
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index f782919ff19b..f02128658055 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -77,11 +77,20 @@ static void vma_print_allocator(struct i915_vma *vma, const char *reason)
 
 #endif
 
-static void __i915_vma_retire(struct i915_active *ref)
+static inline struct i915_vma *active_to_vma(struct i915_active *ref)
 {
-	struct i915_vma *vma = container_of(ref, typeof(*vma), active);
+	return container_of(ref, typeof(struct i915_vma), active);
+}
 
-	i915_vma_put(vma);
+static int __i915_vma_active(struct i915_active *ref)
+{
+	i915_vma_get(active_to_vma(ref));
+	return 0;
+}
+
+static void __i915_vma_retire(struct i915_active *ref)
+{
+	i915_vma_put(active_to_vma(ref));
 }
 
 static struct i915_vma *
@@ -106,7 +115,8 @@ vma_create(struct drm_i915_gem_object *obj,
 	vma->size = obj->base.size;
 	vma->display_alignment = I915_GTT_MIN_ALIGNMENT;
 
-	i915_active_init(vm->i915, &vma->active, __i915_vma_retire);
+	i915_active_init(vm->i915, &vma->active,
+			 __i915_vma_active, __i915_vma_retire);
 	INIT_ACTIVE_REQUEST(&vma->last_fence);
 
 	INIT_LIST_HEAD(&vma->closed_link);
@@ -903,11 +913,7 @@ int i915_vma_move_to_active(struct i915_vma *vma,
 	 * add the active reference first and queue for it to be dropped
 	 * *last*.
 	 */
-	if (i915_active_acquire(&vma->active))
-		i915_vma_get(vma);
-
 	err = i915_active_ref(&vma->active, rq->fence.context, rq);
-	i915_active_release(&vma->active);
 	if (unlikely(err))
 		return err;
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_active.c b/drivers/gpu/drm/i915/selftests/i915_active.c
index cc1ca4be1a00..3b3ca5658122 100644
--- a/drivers/gpu/drm/i915/selftests/i915_active.c
+++ b/drivers/gpu/drm/i915/selftests/i915_active.c
@@ -36,14 +36,12 @@ static int __live_active_setup(struct drm_i915_private *i915,
 	if (!submit)
 		return -ENOMEM;
 
-	i915_active_init(i915, &active->base, __live_active_retire);
+	i915_active_init(i915, &active->base, NULL, __live_active_retire);
 	active->retired = false;
 
-	if (!i915_active_acquire(&active->base)) {
-		pr_err("First i915_active_acquire should report being idle\n");
-		err = -EINVAL;
+	err = i915_active_acquire(&active->base);
+	if (err)
 		goto out;
-	}
 
 	for_each_engine(engine, i915, id) {
 		struct i915_request *rq;
@@ -74,9 +72,9 @@ static int __live_active_setup(struct drm_i915_private *i915,
 		pr_err("i915_active retired before submission!\n");
 		err = -EINVAL;
 	}
-	if (active->base.count != count) {
+	if (atomic_read(&active->base.count) != count) {
 		pr_err("i915_active not tracking all requests, found %d, expected %d\n",
-		       active->base.count, count);
+		       atomic_read(&active->base.count), count);
 		err = -EINVAL;
 	}
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH 20/39] drm/i915: Push the i915_active.retire into a worker
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (18 preceding siblings ...)
  2019-06-14  7:10 ` [PATCH 19/39] drm/i915: Provide an i915_active.acquire callback Chris Wilson
@ 2019-06-14  7:10 ` Chris Wilson
  2019-06-14  7:10 ` [PATCH 21/39] drm/i915/overlay: Switch to using i915_active tracking Chris Wilson
                   ` (20 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:10 UTC (permalink / raw)
  To: intel-gfx

As we need to use a mutex to serialise i915_active activation (because
we want to allow the activation callback to sleep), we need to push the
i915_active.retire into a worker callback in case we need to retire
from an atomic context.
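
To make the constraint concrete (the function below is real, but the
comment is editorial annotation rather than part of the patch):

    static void
    node_retire(struct i915_active_request *base, struct i915_request *rq)
    {
            /*
             * This can be invoked from the dma-fence signaling path,
             * i.e. under an irq-safe spinlock where sleeping is
             * forbidden, so active_retire() must either mutex_trylock()
             * or punt the final retirement to system_unbound_wq.
             */
            active_retire(container_of(base, struct active_node, base)->ref);
    }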

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_active.c       | 54 ++++++++++++++++++------
 drivers/gpu/drm/i915/i915_active_types.h |  3 ++
 2 files changed, 44 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index 6a9f8d37f415..f76a2b4ef95d 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -30,18 +30,14 @@ struct active_node {
 };
 
 static void
-active_retire(struct i915_active *ref)
+__active_retire(struct i915_active *ref)
 {
 	struct active_node *it, *n;
 	struct rb_root root;
 	bool retire = false;
 
-	GEM_BUG_ON(!atomic_read(&ref->count));
-	if (atomic_add_unless(&ref->count, -1, 1))
-		return;
-
-	/* One active may be flushed from inside the acquire of another */
-	mutex_lock_nested(&ref->mutex, SINGLE_DEPTH_NESTING);
+	lockdep_assert_held(&ref->mutex);
+	GEM_BUG_ON(i915_active_is_idle(ref));
 
 	/* return the unused nodes to our slabcache -- flushing the allocator */
 	if (atomic_dec_and_test(&ref->count)) {
@@ -63,6 +59,35 @@ active_retire(struct i915_active *ref)
 	}
 }
 
+static void
+active_work(struct work_struct *wrk)
+{
+	struct i915_active *ref = container_of(wrk, typeof(*ref), work);
+
+	GEM_BUG_ON(!atomic_read(&ref->count));
+	if (atomic_add_unless(&ref->count, -1, 1))
+		return;
+
+	mutex_lock(&ref->mutex);
+	__active_retire(ref);
+}
+
+static void
+active_retire(struct i915_active *ref)
+{
+	GEM_BUG_ON(!atomic_read(&ref->count));
+	if (atomic_add_unless(&ref->count, -1, 1))
+		return;
+
+	/* If we are inside interrupt context (fence signaling), defer */
+	if (!mutex_trylock(&ref->mutex)) {
+		queue_work(system_unbound_wq, &ref->work);
+		return;
+	}
+
+	__active_retire(ref);
+}
+
 static void
 node_retire(struct i915_active_request *base, struct i915_request *rq)
 {
@@ -140,6 +165,7 @@ void __i915_active_init(struct drm_i915_private *i915,
 	init_llist_head(&ref->barriers);
 	atomic_set(&ref->count, 0);
 	__mutex_init(&ref->mutex, "i915_active", key);
+	INIT_WORK(&ref->work, active_work);
 }
 
 int i915_active_ref(struct i915_active *ref,
@@ -208,8 +234,10 @@ int i915_active_wait(struct i915_active *ref)
 	if (err)
 		return err;
 
-	if (!atomic_add_unless(&ref->count, 1, 0))
-		goto unlock;
+	if (!atomic_add_unless(&ref->count, 1, 0)) {
+		mutex_unlock(&ref->mutex);
+		return 0;
+	}
 
 	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
 		err = i915_active_request_retire(&it->base, BKL(ref));
@@ -217,9 +245,8 @@ int i915_active_wait(struct i915_active *ref)
 			break;
 	}
 
-	active_retire(ref);
-unlock:
-	mutex_unlock(&ref->mutex);
+	__active_retire(ref);
+	flush_work(&ref->work);
 	return err;
 }
 
@@ -260,8 +287,9 @@ int i915_request_await_active(struct i915_request *rq, struct i915_active *ref)
 #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
 void i915_active_fini(struct i915_active *ref)
 {
-	GEM_BUG_ON(!RB_EMPTY_ROOT(&ref->tree));
 	GEM_BUG_ON(atomic_read(&ref->count));
+	GEM_BUG_ON(work_pending(&ref->work));
+	GEM_BUG_ON(!RB_EMPTY_ROOT(&ref->tree));
 	mutex_destroy(&ref->mutex);
 }
 #endif
diff --git a/drivers/gpu/drm/i915/i915_active_types.h b/drivers/gpu/drm/i915/i915_active_types.h
index 5b0a3024ce24..06acdffe0f6d 100644
--- a/drivers/gpu/drm/i915/i915_active_types.h
+++ b/drivers/gpu/drm/i915/i915_active_types.h
@@ -12,6 +12,7 @@
 #include <linux/mutex.h>
 #include <linux/rbtree.h>
 #include <linux/rcupdate.h>
+#include <linux/workqueue.h>
 
 struct drm_i915_private;
 struct i915_active_request;
@@ -39,6 +40,8 @@ struct i915_active {
 	int (*active)(struct i915_active *ref);
 	void (*retire)(struct i915_active *ref);
 
+	struct work_struct work;
+
 	struct llist_head barriers;
 };
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH 21/39] drm/i915/overlay: Switch to using i915_active tracking
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (19 preceding siblings ...)
  2019-06-14  7:10 ` [PATCH 20/39] drm/i915: Push the i915_active.retire into a worker Chris Wilson
@ 2019-06-14  7:10 ` Chris Wilson
  2019-06-14  7:10 ` [PATCH 22/39] drm/i915: Forgo last_fence active request tracking Chris Wilson
                   ` (19 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:10 UTC (permalink / raw)
  To: intel-gfx

Remove the raw i915_active_request tracking in favour of the
higher-level i915_active tracking, for the sole purpose of making the
lockless transition easier in later patches.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_active.h   |  19 ----
 drivers/gpu/drm/i915/intel_overlay.c | 130 +++++++++++++--------------
 2 files changed, 64 insertions(+), 85 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
index 6fa9263ca407..bdec4f81b6e8 100644
--- a/drivers/gpu/drm/i915/i915_active.h
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -89,25 +89,6 @@ int __must_check
 i915_active_request_set(struct i915_active_request *active,
 			struct i915_request *rq);
 
-/**
- * i915_active_request_set_retire_fn - updates the retirement callback
- * @active - the active tracker
- * @fn - the routine called when the request is retired
- * @mutex - struct_mutex used to guard retirements
- *
- * i915_active_request_set_retire_fn() updates the function pointer that
- * is called when the final request associated with the @active tracker
- * is retired.
- */
-static inline void
-i915_active_request_set_retire_fn(struct i915_active_request *active,
-				  i915_active_retire_fn fn,
-				  struct mutex *mutex)
-{
-	lockdep_assert_held(mutex);
-	active->retire = fn ?: i915_active_retire_noop;
-}
-
 /**
  * i915_active_request_raw - return the active request
  * @active - the active tracker
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index 21339b7f6a3e..c7d2d980df8c 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -190,7 +190,8 @@ struct intel_overlay {
 	struct overlay_registers __iomem *regs;
 	u32 flip_addr;
 	/* flip handling */
-	struct i915_active_request last_flip;
+	struct i915_active last_flip;
+	void (*flip_complete)(struct intel_overlay *ovl);
 };
 
 static void i830_overlay_clock_gating(struct drm_i915_private *dev_priv,
@@ -216,32 +217,26 @@ static void i830_overlay_clock_gating(struct drm_i915_private *dev_priv,
 				  PCI_DEVFN(0, 0), I830_CLOCK_GATE, val);
 }
 
-static void intel_overlay_submit_request(struct intel_overlay *overlay,
-					 struct i915_request *rq,
-					 i915_active_retire_fn retire)
+static struct i915_request *
+alloc_request(struct intel_overlay *overlay, void (*fn)(struct intel_overlay *))
 {
-	GEM_BUG_ON(i915_active_request_peek(&overlay->last_flip,
-					    &overlay->i915->drm.struct_mutex));
-	i915_active_request_set_retire_fn(&overlay->last_flip, retire,
-					  &overlay->i915->drm.struct_mutex);
-	__i915_active_request_set(&overlay->last_flip, rq);
-	i915_request_add(rq);
-}
+	struct intel_engine_cs *engine = overlay->i915->engine[RCS0];
+	struct i915_request *rq;
+	int err;
 
-static int intel_overlay_do_wait_request(struct intel_overlay *overlay,
-					 struct i915_request *rq,
-					 i915_active_retire_fn retire)
-{
-	intel_overlay_submit_request(overlay, rq, retire);
-	return i915_active_request_retire(&overlay->last_flip,
-					  &overlay->i915->drm.struct_mutex);
-}
+	overlay->flip_complete = fn;
 
-static struct i915_request *alloc_request(struct intel_overlay *overlay)
-{
-	struct intel_engine_cs *engine = overlay->i915->engine[RCS0];
+	rq = i915_request_create(engine->kernel_context);
+	if (IS_ERR(rq))
+		return rq;
+
+	err = i915_active_ref(&overlay->last_flip, rq->fence.context, rq);
+	if (err) {
+		i915_request_add(rq);
+		return ERR_PTR(err);
+	}
 
-	return i915_request_create(engine->kernel_context);
+	return rq;
 }
 
 /* overlay needs to be disable in OCMD reg */
@@ -253,7 +248,7 @@ static int intel_overlay_on(struct intel_overlay *overlay)
 
 	WARN_ON(overlay->active);
 
-	rq = alloc_request(overlay);
+	rq = alloc_request(overlay, NULL);
 	if (IS_ERR(rq))
 		return PTR_ERR(rq);
 
@@ -274,7 +269,9 @@ static int intel_overlay_on(struct intel_overlay *overlay)
 	*cs++ = MI_NOOP;
 	intel_ring_advance(rq, cs);
 
-	return intel_overlay_do_wait_request(overlay, rq, NULL);
+	i915_request_add(rq);
+
+	return i915_active_wait(&overlay->last_flip);
 }
 
 static void intel_overlay_flip_prepare(struct intel_overlay *overlay,
@@ -318,7 +315,7 @@ static int intel_overlay_continue(struct intel_overlay *overlay,
 	if (tmp & (1 << 17))
 		DRM_DEBUG("overlay underrun, DOVSTA: %x\n", tmp);
 
-	rq = alloc_request(overlay);
+	rq = alloc_request(overlay, NULL);
 	if (IS_ERR(rq))
 		return PTR_ERR(rq);
 
@@ -333,8 +330,7 @@ static int intel_overlay_continue(struct intel_overlay *overlay,
 	intel_ring_advance(rq, cs);
 
 	intel_overlay_flip_prepare(overlay, vma);
-
-	intel_overlay_submit_request(overlay, rq, NULL);
+	i915_request_add(rq);
 
 	return 0;
 }
@@ -355,20 +351,13 @@ static void intel_overlay_release_old_vma(struct intel_overlay *overlay)
 }
 
 static void
-intel_overlay_release_old_vid_tail(struct i915_active_request *active,
-				   struct i915_request *rq)
+intel_overlay_release_old_vid_tail(struct intel_overlay *overlay)
 {
-	struct intel_overlay *overlay =
-		container_of(active, typeof(*overlay), last_flip);
-
 	intel_overlay_release_old_vma(overlay);
 }
 
-static void intel_overlay_off_tail(struct i915_active_request *active,
-				   struct i915_request *rq)
+static void intel_overlay_off_tail(struct intel_overlay *overlay)
 {
-	struct intel_overlay *overlay =
-		container_of(active, typeof(*overlay), last_flip);
 	struct drm_i915_private *dev_priv = overlay->i915;
 
 	intel_overlay_release_old_vma(overlay);
@@ -381,6 +370,16 @@ static void intel_overlay_off_tail(struct i915_active_request *active,
 		i830_overlay_clock_gating(dev_priv, true);
 }
 
+static void
+intel_overlay_last_flip_retire(struct i915_active *active)
+{
+	struct intel_overlay *overlay =
+		container_of(active, typeof(*overlay), last_flip);
+
+	if (overlay->flip_complete)
+		overlay->flip_complete(overlay);
+}
+
 /* overlay needs to be disabled in OCMD reg */
 static int intel_overlay_off(struct intel_overlay *overlay)
 {
@@ -395,7 +394,7 @@ static int intel_overlay_off(struct intel_overlay *overlay)
 	 * of the hw. Do it in both cases */
 	flip_addr |= OFC_UPDATE;
 
-	rq = alloc_request(overlay);
+	rq = alloc_request(overlay, intel_overlay_off_tail);
 	if (IS_ERR(rq))
 		return PTR_ERR(rq);
 
@@ -418,17 +417,16 @@ static int intel_overlay_off(struct intel_overlay *overlay)
 	intel_ring_advance(rq, cs);
 
 	intel_overlay_flip_prepare(overlay, NULL);
+	i915_request_add(rq);
 
-	return intel_overlay_do_wait_request(overlay, rq,
-					     intel_overlay_off_tail);
+	return i915_active_wait(&overlay->last_flip);
 }
 
 /* recover from an interruption due to a signal
  * We have to be careful not to repeat work forever an make forward progess. */
 static int intel_overlay_recover_from_interrupt(struct intel_overlay *overlay)
 {
-	return i915_active_request_retire(&overlay->last_flip,
-					  &overlay->i915->drm.struct_mutex);
+	return i915_active_wait(&overlay->last_flip);
 }
 
 /* Wait for pending overlay flip and release old frame.
@@ -438,43 +436,40 @@ static int intel_overlay_recover_from_interrupt(struct intel_overlay *overlay)
 static int intel_overlay_release_old_vid(struct intel_overlay *overlay)
 {
 	struct drm_i915_private *dev_priv = overlay->i915;
+	struct i915_request *rq;
 	u32 *cs;
-	int ret;
 
 	lockdep_assert_held(&dev_priv->drm.struct_mutex);
 
-	/* Only wait if there is actually an old frame to release to
+	/*
+	 * Only wait if there is actually an old frame to release to
 	 * guarantee forward progress.
 	 */
 	if (!overlay->old_vma)
 		return 0;
 
-	if (I915_READ(GEN2_ISR) & I915_OVERLAY_PLANE_FLIP_PENDING_INTERRUPT) {
-		/* synchronous slowpath */
-		struct i915_request *rq;
+	if (!(I915_READ(GEN2_ISR) & I915_OVERLAY_PLANE_FLIP_PENDING_INTERRUPT)) {
+		intel_overlay_release_old_vid_tail(overlay);
+		return 0;
+	}
 
-		rq = alloc_request(overlay);
-		if (IS_ERR(rq))
-			return PTR_ERR(rq);
+	rq = alloc_request(overlay, intel_overlay_release_old_vid_tail);
+	if (IS_ERR(rq))
+		return PTR_ERR(rq);
 
-		cs = intel_ring_begin(rq, 2);
-		if (IS_ERR(cs)) {
-			i915_request_add(rq);
-			return PTR_ERR(cs);
-		}
+	cs = intel_ring_begin(rq, 2);
+	if (IS_ERR(cs)) {
+		i915_request_add(rq);
+		return PTR_ERR(cs);
+	}
 
-		*cs++ = MI_WAIT_FOR_EVENT | MI_WAIT_FOR_OVERLAY_FLIP;
-		*cs++ = MI_NOOP;
-		intel_ring_advance(rq, cs);
+	*cs++ = MI_WAIT_FOR_EVENT | MI_WAIT_FOR_OVERLAY_FLIP;
+	*cs++ = MI_NOOP;
+	intel_ring_advance(rq, cs);
 
-		ret = intel_overlay_do_wait_request(overlay, rq,
-						    intel_overlay_release_old_vid_tail);
-		if (ret)
-			return ret;
-	} else
-		intel_overlay_release_old_vid_tail(&overlay->last_flip, NULL);
+	i915_request_add(rq);
 
-	return 0;
+	return i915_active_wait(&overlay->last_flip);
 }
 
 void intel_overlay_reset(struct drm_i915_private *dev_priv)
@@ -1371,7 +1366,9 @@ void intel_overlay_setup(struct drm_i915_private *dev_priv)
 	overlay->contrast = 75;
 	overlay->saturation = 146;
 
-	INIT_ACTIVE_REQUEST(&overlay->last_flip);
+	i915_active_init(dev_priv,
+			 &overlay->last_flip,
+			 NULL, intel_overlay_last_flip_retire);
 
 	ret = get_registers(overlay, OVERLAY_NEEDS_PHYSICAL(dev_priv));
 	if (ret)
@@ -1405,6 +1402,7 @@ void intel_overlay_cleanup(struct drm_i915_private *dev_priv)
 	WARN_ON(overlay->active);
 
 	i915_gem_object_put(overlay->reg_bo);
+	i915_active_fini(&overlay->last_flip);
 
 	kfree(overlay);
 }
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH 22/39] drm/i915: Forgo last_fence active request tracking
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (20 preceding siblings ...)
  2019-06-14  7:10 ` [PATCH 21/39] drm/i915/overlay: Switch to using i915_active tracking Chris Wilson
@ 2019-06-14  7:10 ` Chris Wilson
  2019-06-14  7:10 ` [PATCH 23/39] drm/i915: Extract intel_frontbuffer active tracking Chris Wilson
                   ` (18 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:10 UTC (permalink / raw)
  To: intel-gfx

We were using last_fence to track the last request that used this vma
in a way that might be interpreted by a fence register, and we forced
ourselves to wait for that request before modifying any fence register
that overlapped our vma. Due to the requirement that we track any
XY_BLT command, linear or tiled, this in effect meant that we had to
track the vma for its whole active lifespan anyway, so we can forgo
the explicit last_fence tracking and just use the whole vma->active.

Another solution would be to pipeline the register updates, which would
also help resolve some long-running stalls on gen3 (but this only
applies to gen2 and gen3!).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c       |  4 +---
 drivers/gpu/drm/i915/i915_gem_fence_reg.c |  6 ++----
 drivers/gpu/drm/i915/i915_gem_gtt.c       |  1 -
 drivers/gpu/drm/i915/i915_vma.c           | 13 -------------
 drivers/gpu/drm/i915/i915_vma.h           |  1 -
 5 files changed, 3 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index d8fe75f5e0d6..85dd2448ef81 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -209,9 +209,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 			}
 		}
 		if (vma->fence)
-			seq_printf(m, " , fence: %d%s",
-				   vma->fence->id,
-				   i915_active_request_isset(&vma->last_fence) ? "*" : "");
+			seq_printf(m, " , fence: %d", vma->fence->id);
 		seq_puts(m, ")");
 
 		spin_lock(&obj->vma.lock);
diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
index 1c9466676caf..6378c330809f 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
@@ -230,16 +230,14 @@ static int fence_update(struct i915_fence_reg *fence,
 			 i915_gem_object_get_tiling(vma->obj)))
 			return -EINVAL;
 
-		ret = i915_active_request_retire(&vma->last_fence,
-					     &vma->obj->base.dev->struct_mutex);
+		ret = i915_active_wait(&vma->active);
 		if (ret)
 			return ret;
 	}
 
 	old = xchg(&fence->vma, NULL);
 	if (old) {
-		ret = i915_active_request_retire(&old->last_fence,
-					     &old->obj->base.dev->struct_mutex);
+		ret = i915_active_wait(&old->active);
 		if (ret) {
 			fence->vma = old;
 			return ret;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 611ced79fd09..7195c3c09656 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2062,7 +2062,6 @@ static struct i915_vma *pd_vma_create(struct gen6_ppgtt *ppgtt, int size)
 		return ERR_PTR(-ENOMEM);
 
 	i915_active_init(i915, &vma->active, NULL, NULL);
-	INIT_ACTIVE_REQUEST(&vma->last_fence);
 
 	vma->vm = &ggtt->vm;
 	vma->ops = &pd_vma_ops;
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index f02128658055..922179451caa 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -117,7 +117,6 @@ vma_create(struct drm_i915_gem_object *obj,
 
 	i915_active_init(vm->i915, &vma->active,
 			 __i915_vma_active, __i915_vma_retire);
-	INIT_ACTIVE_REQUEST(&vma->last_fence);
 
 	INIT_LIST_HEAD(&vma->closed_link);
 
@@ -792,8 +791,6 @@ static void __i915_vma_destroy(struct i915_vma *vma)
 	GEM_BUG_ON(vma->node.allocated);
 	GEM_BUG_ON(vma->fence);
 
-	GEM_BUG_ON(i915_active_request_isset(&vma->last_fence));
-
 	mutex_lock(&vma->vm->mutex);
 	list_del(&vma->vm_link);
 	mutex_unlock(&vma->vm->mutex);
@@ -929,9 +926,6 @@ int i915_vma_move_to_active(struct i915_vma *vma,
 	obj->read_domains |= I915_GEM_GPU_DOMAINS;
 	obj->mm.dirty = true;
 
-	if (flags & EXEC_OBJECT_NEEDS_FENCE)
-		__i915_active_request_set(&vma->last_fence, rq);
-
 	export_fence(vma, rq, flags);
 
 	GEM_BUG_ON(!i915_vma_is_active(vma));
@@ -964,14 +958,7 @@ int i915_vma_unbind(struct i915_vma *vma)
 		 * before we are finished).
 		 */
 		__i915_vma_pin(vma);
-
 		ret = i915_active_wait(&vma->active);
-		if (ret)
-			goto unpin;
-
-		ret = i915_active_request_retire(&vma->last_fence,
-					      &vma->vm->i915->drm.struct_mutex);
-unpin:
 		__i915_vma_unpin(vma);
 		if (ret)
 			return ret;
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 4b769db649bf..17e6a038bd36 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -111,7 +111,6 @@ struct i915_vma {
 #define I915_VMA_GGTT_WRITE	BIT(14)
 
 	struct i915_active active;
-	struct i915_active_request last_fence;
 
 	/**
 	 * Support different GGTT views into the same object.
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 23/39] drm/i915: Extract intel_frontbuffer active tracking
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (21 preceding siblings ...)
  2019-06-14  7:10 ` [PATCH 22/39] drm/i915: Forgo last_fence active request tracking Chris Wilson
@ 2019-06-14  7:10 ` Chris Wilson
  2019-06-14  7:10 ` [PATCH 24/39] drm/i915: Coordinate i915_active with its own mutex Chris Wilson
                   ` (17 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:10 UTC (permalink / raw)
  To: intel-gfx

Move the active tracking for the frontbuffer operations out of the
i915_gem_object and into its own first-class (refcounted) object. In the
process of disentangling, we switch from low-level request tracking to
the easier i915_active -- with the intent that this avoids any potential
atomic callbacks, as the frontbuffer tracking wishes to sleep as it
flushes.
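
For reference, the caller-side contract is small enough to sketch here
(a condensed sketch of the scheme as used by intel_framebuffer_init()
and intel_user_framebuffer_destroy() below; error paths elided):

	struct intel_frontbuffer *front;

	/* Lazily create (or reuse) the tracker; takes a reference. */
	front = intel_frontbuffer_get(obj);
	if (!front)
		return -ENOMEM;

	/* Frontbuffer bits migrate between buffers across flips. */
	intel_frontbuffer_track(old_front, front, plane->frontbuffer_bit);

	/* The final put detaches obj->frontbuffer under fb_tracking.lock. */
	intel_frontbuffer_put(front);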

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_clflush.c   |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_domain.c    |  14 +-
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |   4 -
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |  28 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |   2 +-
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   8 +-
 drivers/gpu/drm/i915/i915_debugfs.c           |   5 -
 drivers/gpu/drm/i915/i915_drv.h               |   4 -
 drivers/gpu/drm/i915/i915_gem.c               |  47 +---
 drivers/gpu/drm/i915/i915_vma.c               |   6 +-
 drivers/gpu/drm/i915/intel_display.c          |  67 +++--
 drivers/gpu/drm/i915/intel_drv.h              |   1 +
 drivers/gpu/drm/i915/intel_fbdev.c            |  22 +-
 drivers/gpu/drm/i915/intel_frontbuffer.c      | 244 +++++++++++++-----
 drivers/gpu/drm/i915/intel_frontbuffer.h      |  70 +++--
 drivers/gpu/drm/i915/intel_overlay.c          |   8 +-
 16 files changed, 294 insertions(+), 238 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c
index 537aa2337cc8..54b447539430 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c
@@ -47,7 +47,7 @@ static void __i915_do_clflush(struct drm_i915_gem_object *obj)
 {
 	GEM_BUG_ON(!i915_gem_object_has_pages(obj));
 	drm_clflush_sg(obj->mm.pages);
-	intel_fb_obj_flush(obj, ORIGIN_CPU);
+	intel_frontbuffer_flush(obj->frontbuffer, ORIGIN_CPU);
 }
 
 static void i915_clflush_work(struct work_struct *work)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
index bd180ef46aeb..7e6ed767348c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
@@ -550,13 +550,6 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
 	return 0;
 }
 
-static inline enum fb_op_origin
-fb_write_origin(struct drm_i915_gem_object *obj, unsigned int domain)
-{
-	return (domain == I915_GEM_DOMAIN_GTT ?
-		obj->frontbuffer_ggtt_origin : ORIGIN_CPU);
-}
-
 /**
  * Called when user space prepares to use an object with the CPU, either
  * through the mmap ioctl's mapping or a GTT mapping.
@@ -660,9 +653,8 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
 
 	i915_gem_object_unlock(obj);
 
-	if (write_domain != 0)
-		intel_fb_obj_invalidate(obj,
-					fb_write_origin(obj, write_domain));
+	if (write_domain)
+		intel_frontbuffer_invalidate(obj->frontbuffer, ORIGIN_CPU);
 
 out_unpin:
 	i915_gem_object_unpin_pages(obj);
@@ -782,7 +774,7 @@ int i915_gem_object_prepare_write(struct drm_i915_gem_object *obj,
 	}
 
 out:
-	intel_fb_obj_invalidate(obj, ORIGIN_CPU);
+	intel_frontbuffer_invalidate(obj->frontbuffer, ORIGIN_CPU);
 	obj->mm.dirty = true;
 	/* return with the pages pinned */
 	return 0;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index a8b8b9c281f1..49cf9ad97bfc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -99,9 +99,6 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
 		up_write(&mm->mmap_sem);
 		if (IS_ERR_VALUE(addr))
 			goto err;
-
-		/* This may race, but that's ok, it only gets set */
-		WRITE_ONCE(obj->frontbuffer_ggtt_origin, ORIGIN_CPU);
 	}
 	i915_gem_object_put(obj);
 
@@ -280,7 +277,6 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 		 * Userspace is now writing through an untracked VMA, abandon
 		 * all hope that the hardware is able to track future writes.
 		 */
-		obj->frontbuffer_ggtt_origin = ORIGIN_CPU;
 
 		vma = i915_gem_object_ggtt_pin(obj, &view, 0, 0, flags);
 		if (IS_ERR(vma) && !view.type) {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 01528553d4cc..ee81acb04d84 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -44,16 +44,6 @@ void i915_gem_object_free(struct drm_i915_gem_object *obj)
 	return kmem_cache_free(global.slab_objects, obj);
 }
 
-static void
-frontbuffer_retire(struct i915_active_request *active,
-		   struct i915_request *request)
-{
-	struct drm_i915_gem_object *obj =
-		container_of(active, typeof(*obj), frontbuffer_write);
-
-	intel_fb_obj_flush(obj, ORIGIN_CS);
-}
-
 void i915_gem_object_init(struct drm_i915_gem_object *obj,
 			  const struct drm_i915_gem_object_ops *ops)
 {
@@ -72,10 +62,6 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
 	reservation_object_init(&obj->__builtin_resv);
 	obj->resv = &obj->__builtin_resv;
 
-	obj->frontbuffer_ggtt_origin = ORIGIN_GTT;
-	i915_active_request_init(&obj->frontbuffer_write,
-				 NULL, frontbuffer_retire);
-
 	obj->mm.madv = I915_MADV_WILLNEED;
 	INIT_RADIX_TREE(&obj->mm.get_page.radix, GFP_KERNEL | __GFP_NOWARN);
 	mutex_init(&obj->mm.get_page.lock);
@@ -217,7 +203,6 @@ static void __i915_gem_free_objects(struct drm_i915_private *i915,
 
 		GEM_BUG_ON(atomic_read(&obj->bind_count));
 		GEM_BUG_ON(obj->userfault_count);
-		GEM_BUG_ON(atomic_read(&obj->frontbuffer_bits));
 		GEM_BUG_ON(!list_empty(&obj->lut_list));
 
 		if (obj->ops->release)
@@ -322,6 +307,8 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
 {
 	struct drm_i915_gem_object *obj = to_intel_bo(gem_obj);
 
+	GEM_BUG_ON(i915_gem_object_is_framebuffer(obj));
+
 	if (obj->mm.quirked)
 		__i915_gem_object_unpin_pages(obj);
 
@@ -349,13 +336,6 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
 	call_rcu(&obj->rcu, __i915_gem_free_object_rcu);
 }
 
-static inline enum fb_op_origin
-fb_write_origin(struct drm_i915_gem_object *obj, unsigned int domain)
-{
-	return (domain == I915_GEM_DOMAIN_GTT ?
-		obj->frontbuffer_ggtt_origin : ORIGIN_CPU);
-}
-
 static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
 {
 	return !(obj->cache_level == I915_CACHE_NONE ||
@@ -377,9 +357,7 @@ i915_gem_object_flush_write_domain(struct drm_i915_gem_object *obj,
 	switch (obj->write_domain) {
 	case I915_GEM_DOMAIN_GTT:
 		i915_gem_flush_ggtt_writes(dev_priv);
-
-		intel_fb_obj_flush(obj,
-				   fb_write_origin(obj, I915_GEM_DOMAIN_GTT));
+		intel_frontbuffer_flush(obj->frontbuffer, ORIGIN_CPU);
 
 		for_each_ggtt_vma(vma, obj) {
 			if (vma->iomap)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 454bfb498001..67d70d144bd9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -161,7 +161,7 @@ i915_gem_object_needs_async_cancel(const struct drm_i915_gem_object *obj)
 static inline bool
 i915_gem_object_is_framebuffer(const struct drm_i915_gem_object *obj)
 {
-	return READ_ONCE(obj->framebuffer_references);
+	return READ_ONCE(obj->frontbuffer);
 }
 
 static inline unsigned int
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index c299fed2c6b1..21bfb7bd0f57 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -15,6 +15,7 @@
 #include "i915_selftest.h"
 
 struct drm_i915_gem_object;
+struct intel_frontbuffer;
 
 /*
  * struct i915_lut_handle tracks the fast lookups from handle to vma used
@@ -144,9 +145,7 @@ struct drm_i915_gem_object {
 	 */
 	u16 write_domain;
 
-	atomic_t frontbuffer_bits;
-	unsigned int frontbuffer_ggtt_origin; /* write once */
-	struct i915_active_request frontbuffer_write;
+	struct intel_frontbuffer *frontbuffer;
 
 	/** Current tiling stride for the object, if it's tiled. */
 	unsigned int tiling_and_stride;
@@ -239,9 +238,6 @@ struct drm_i915_gem_object {
 	 */
 	struct reservation_object *resv;
 
-	/** References from framebuffers, locks out tiling changes. */
-	unsigned int framebuffer_references;
-
 	/** Record of address bit 17 of each page at last unbind. */
 	unsigned long *bit_17;
 
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 85dd2448ef81..a3f1aff3c5c2 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -135,7 +135,6 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
 	struct intel_engine_cs *engine;
 	struct i915_vma *vma;
-	unsigned int frontbuffer_bits;
 	int pin_count = 0;
 
 	seq_printf(m, "%pK: %c%c%c%c %8zdKiB %02x %02x %s%s%s",
@@ -225,10 +224,6 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 	engine = i915_gem_object_last_write_engine(obj);
 	if (engine)
 		seq_printf(m, " (%s)", engine->name);
-
-	frontbuffer_bits = atomic_read(&obj->frontbuffer_bits);
-	if (frontbuffer_bits)
-		seq_printf(m, " (frontbuffer: 0x%03x)", frontbuffer_bits);
 }
 
 struct file_stats {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3683ef6d4c28..3c9317592729 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2583,10 +2583,6 @@ int i915_gem_mmap_gtt(struct drm_file *file_priv, struct drm_device *dev,
 		      u32 handle, u64 *offset);
 int i915_gem_mmap_gtt_version(void);
 
-void i915_gem_track_fb(struct drm_i915_gem_object *old,
-		       struct drm_i915_gem_object *new,
-		       unsigned frontbuffer_bits);
-
 int __must_check i915_gem_set_global_seqno(struct drm_device *dev, u32 seqno);
 
 static inline bool __i915_wedged(struct i915_gpu_error *error)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 314c532490df..9422821c9d58 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -133,17 +133,19 @@ i915_gem_phys_pwrite(struct drm_i915_gem_object *obj,
 	void *vaddr = obj->phys_handle->vaddr + args->offset;
 	char __user *user_data = u64_to_user_ptr(args->data_ptr);
 
-	/* We manually control the domain here and pretend that it
+	/*
+	 * We manually control the domain here and pretend that it
 	 * remains coherent i.e. in the GTT domain, like shmem_pwrite.
 	 */
-	intel_fb_obj_invalidate(obj, ORIGIN_CPU);
+	intel_frontbuffer_invalidate(obj->frontbuffer, ORIGIN_CPU);
+
 	if (copy_from_user(vaddr, user_data, args->size))
 		return -EFAULT;
 
 	drm_clflush_virt_range(vaddr, args->size);
 	i915_gem_chipset_flush(to_i915(obj->base.dev));
 
-	intel_fb_obj_flush(obj, ORIGIN_CPU);
+	intel_frontbuffer_flush(obj->frontbuffer, ORIGIN_CPU);
 	return 0;
 }
 
@@ -630,7 +632,7 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
 		goto out_unpin;
 	}
 
-	intel_fb_obj_invalidate(obj, ORIGIN_CPU);
+	intel_frontbuffer_invalidate(obj->frontbuffer, ORIGIN_CPU);
 
 	user_data = u64_to_user_ptr(args->data_ptr);
 	offset = args->offset;
@@ -671,7 +673,7 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
 		user_data += page_length;
 		offset += page_length;
 	}
-	intel_fb_obj_flush(obj, ORIGIN_CPU);
+	intel_frontbuffer_flush(obj->frontbuffer, ORIGIN_CPU);
 
 	i915_gem_object_unlock_fence(obj, fence);
 out_unpin:
@@ -764,7 +766,7 @@ i915_gem_shmem_pwrite(struct drm_i915_gem_object *obj,
 		offset = 0;
 	}
 
-	intel_fb_obj_flush(obj, ORIGIN_CPU);
+	intel_frontbuffer_flush(obj->frontbuffer, ORIGIN_CPU);
 	i915_gem_object_unlock_fence(obj, fence);
 
 	return ret;
@@ -1870,39 +1872,6 @@ int i915_gem_open(struct drm_i915_private *i915, struct drm_file *file)
 	return ret;
 }
 
-/**
- * i915_gem_track_fb - update frontbuffer tracking
- * @old: current GEM buffer for the frontbuffer slots
- * @new: new GEM buffer for the frontbuffer slots
- * @frontbuffer_bits: bitmask of frontbuffer slots
- *
- * This updates the frontbuffer tracking bits @frontbuffer_bits by clearing them
- * from @old and setting them in @new. Both @old and @new can be NULL.
- */
-void i915_gem_track_fb(struct drm_i915_gem_object *old,
-		       struct drm_i915_gem_object *new,
-		       unsigned frontbuffer_bits)
-{
-	/* Control of individual bits within the mask are guarded by
-	 * the owning plane->mutex, i.e. we can never see concurrent
-	 * manipulation of individual bits. But since the bitfield as a whole
-	 * is updated using RMW, we need to use atomics in order to update
-	 * the bits.
-	 */
-	BUILD_BUG_ON(INTEL_FRONTBUFFER_BITS_PER_PIPE * I915_MAX_PIPES >
-		     BITS_PER_TYPE(atomic_t));
-
-	if (old) {
-		WARN_ON(!(atomic_read(&old->frontbuffer_bits) & frontbuffer_bits));
-		atomic_andnot(frontbuffer_bits, &old->frontbuffer_bits);
-	}
-
-	if (new) {
-		WARN_ON(atomic_read(&new->frontbuffer_bits) & frontbuffer_bits);
-		atomic_or(frontbuffer_bits, &new->frontbuffer_bits);
-	}
-}
-
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/mock_gem_device.c"
 #include "selftests/i915_gem.c"
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 922179451caa..041638a50006 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -918,8 +918,10 @@ int i915_vma_move_to_active(struct i915_vma *vma,
 	if (flags & EXEC_OBJECT_WRITE) {
 		obj->write_domain = I915_GEM_DOMAIN_RENDER;
 
-		if (intel_fb_obj_invalidate(obj, ORIGIN_CS))
-			__i915_active_request_set(&obj->frontbuffer_write, rq);
+		if (intel_frontbuffer_invalidate(obj->frontbuffer, ORIGIN_CS))
+			i915_active_ref(&obj->frontbuffer->write,
+					rq->fence.context,
+					rq);
 
 		obj->read_domains = 0;
 	}
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index e681ed99cdf2..72e2c04b6e11 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3047,12 +3047,13 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
 {
 	struct drm_device *dev = crtc->base.dev;
 	struct drm_i915_private *dev_priv = to_i915(dev);
-	struct drm_i915_gem_object *obj = NULL;
 	struct drm_mode_fb_cmd2 mode_cmd = { 0 };
 	struct drm_framebuffer *fb = &plane_config->fb->base;
 	u32 base_aligned = round_down(plane_config->base, PAGE_SIZE);
 	u32 size_aligned = round_up(plane_config->base + plane_config->size,
 				    PAGE_SIZE);
+	struct drm_i915_gem_object *obj;
+	bool ret = false;
 
 	size_aligned -= base_aligned;
 
@@ -3094,7 +3095,7 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
 		break;
 	default:
 		MISSING_CASE(plane_config->tiling);
-		return false;
+		goto out;
 	}
 
 	mode_cmd.pixel_format = fb->format->format;
@@ -3106,16 +3107,15 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
 
 	if (intel_framebuffer_init(to_intel_framebuffer(fb), obj, &mode_cmd)) {
 		DRM_DEBUG_KMS("intel fb init failed\n");
-		goto out_unref_obj;
+		goto out;
 	}
 
 
 	DRM_DEBUG_KMS("initial plane fb obj %p\n", obj);
-	return true;
-
-out_unref_obj:
+	ret = true;
+out:
 	i915_gem_object_put(obj);
-	return false;
+	return ret;
 }
 
 static void
@@ -3172,6 +3172,12 @@ static void intel_plane_disable_noatomic(struct intel_crtc *crtc,
 	intel_disable_plane(plane, crtc_state);
 }
 
+static struct intel_frontbuffer *
+to_intel_frontbuffer(struct drm_framebuffer *fb)
+{
+	return fb ? to_intel_framebuffer(fb)->frontbuffer : NULL;
+}
+
 static void
 intel_find_initial_plane_obj(struct intel_crtc *intel_crtc,
 			     struct intel_initial_plane_config *plane_config)
@@ -3179,7 +3185,6 @@ intel_find_initial_plane_obj(struct intel_crtc *intel_crtc,
 	struct drm_device *dev = intel_crtc->base.dev;
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct drm_crtc *c;
-	struct drm_i915_gem_object *obj;
 	struct drm_plane *primary = intel_crtc->base.primary;
 	struct drm_plane_state *plane_state = primary->state;
 	struct intel_plane *intel_plane = to_intel_plane(primary);
@@ -3255,8 +3260,7 @@ intel_find_initial_plane_obj(struct intel_crtc *intel_crtc,
 		return;
 	}
 
-	obj = intel_fb_obj(fb);
-	intel_fb_obj_flush(obj, ORIGIN_DIRTYFB);
+	intel_frontbuffer_flush(to_intel_frontbuffer(fb), ORIGIN_DIRTYFB);
 
 	plane_state->src_x = 0;
 	plane_state->src_y = 0;
@@ -3271,14 +3275,14 @@ intel_find_initial_plane_obj(struct intel_crtc *intel_crtc,
 	intel_state->base.src = drm_plane_state_src(plane_state);
 	intel_state->base.dst = drm_plane_state_dest(plane_state);
 
-	if (i915_gem_object_is_tiled(obj))
+	if (plane_config->tiling)
 		dev_priv->preserve_bios_swizzle = true;
 
 	plane_state->fb = fb;
 	plane_state->crtc = &intel_crtc->base;
 
 	atomic_or(to_intel_plane(primary)->frontbuffer_bit,
-		  &obj->frontbuffer_bits);
+		  &to_intel_frontbuffer(fb)->bits);
 }
 
 static int skl_max_plane_width(const struct drm_framebuffer *fb,
@@ -13981,9 +13985,9 @@ static void intel_atomic_track_fbs(struct drm_atomic_state *state)
 	int i;
 
 	for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i)
-		i915_gem_track_fb(intel_fb_obj(old_plane_state->fb),
-				  intel_fb_obj(new_plane_state->fb),
-				  to_intel_plane(plane)->frontbuffer_bit);
+		intel_frontbuffer_track(to_intel_frontbuffer(old_plane_state->fb),
+					to_intel_frontbuffer(new_plane_state->fb),
+					to_intel_plane(plane)->frontbuffer_bit);
 }
 
 /**
@@ -14293,7 +14297,7 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 		return ret;
 
 	fb_obj_bump_render_priority(obj);
-	intel_fb_obj_flush(obj, ORIGIN_DIRTYFB);
+	intel_frontbuffer_flush(obj->frontbuffer, ORIGIN_DIRTYFB);
 
 	if (!new_state->fence) { /* implicit fencing */
 		struct dma_fence *fence;
@@ -14560,13 +14564,12 @@ intel_legacy_cursor_update(struct drm_plane *plane,
 			   struct drm_modeset_acquire_ctx *ctx)
 {
 	struct drm_i915_private *dev_priv = to_i915(crtc->dev);
-	int ret;
 	struct drm_plane_state *old_plane_state, *new_plane_state;
 	struct intel_plane *intel_plane = to_intel_plane(plane);
-	struct drm_framebuffer *old_fb;
 	struct intel_crtc_state *crtc_state =
 		to_intel_crtc_state(crtc->state);
 	struct intel_crtc_state *new_crtc_state;
+	int ret;
 
 	/*
 	 * When crtc is inactive or there is a modeset pending,
@@ -14634,11 +14637,10 @@ intel_legacy_cursor_update(struct drm_plane *plane,
 	if (ret)
 		goto out_unlock;
 
-	intel_fb_obj_flush(intel_fb_obj(fb), ORIGIN_FLIP);
-
-	old_fb = old_plane_state->fb;
-	i915_gem_track_fb(intel_fb_obj(old_fb), intel_fb_obj(fb),
-			  intel_plane->frontbuffer_bit);
+	intel_frontbuffer_flush(to_intel_frontbuffer(fb), ORIGIN_FLIP);
+	intel_frontbuffer_track(to_intel_frontbuffer(old_plane_state->fb),
+				to_intel_frontbuffer(fb),
+				intel_plane->frontbuffer_bit);
 
 	/* Swap plane state */
 	plane->state = new_plane_state;
@@ -15328,15 +15330,9 @@ static void intel_setup_outputs(struct drm_i915_private *dev_priv)
 static void intel_user_framebuffer_destroy(struct drm_framebuffer *fb)
 {
 	struct intel_framebuffer *intel_fb = to_intel_framebuffer(fb);
-	struct drm_i915_gem_object *obj = intel_fb_obj(fb);
 
 	drm_framebuffer_cleanup(fb);
-
-	i915_gem_object_lock(obj);
-	WARN_ON(!obj->framebuffer_references--);
-	i915_gem_object_unlock(obj);
-
-	i915_gem_object_put(obj);
+	intel_frontbuffer_put(intel_fb->frontbuffer);
 
 	kfree(intel_fb);
 }
@@ -15364,7 +15360,7 @@ static int intel_user_framebuffer_dirty(struct drm_framebuffer *fb,
 	struct drm_i915_gem_object *obj = intel_fb_obj(fb);
 
 	i915_gem_object_flush_if_display(obj);
-	intel_fb_obj_flush(obj, ORIGIN_DIRTYFB);
+	intel_frontbuffer_flush(to_intel_frontbuffer(fb), ORIGIN_DIRTYFB);
 
 	return 0;
 }
@@ -15386,8 +15382,11 @@ static int intel_framebuffer_init(struct intel_framebuffer *intel_fb,
 	int ret = -EINVAL;
 	int i;
 
+	intel_fb->frontbuffer = intel_frontbuffer_get(obj);
+	if (!intel_fb->frontbuffer)
+		return -ENOMEM;
+
 	i915_gem_object_lock(obj);
-	obj->framebuffer_references++;
 	tiling = i915_gem_object_get_tiling(obj);
 	stride = i915_gem_object_get_stride(obj);
 	i915_gem_object_unlock(obj);
@@ -15504,9 +15503,7 @@ static int intel_framebuffer_init(struct intel_framebuffer *intel_fb,
 	return 0;
 
 err:
-	i915_gem_object_lock(obj);
-	obj->framebuffer_references--;
-	i915_gem_object_unlock(obj);
+	intel_frontbuffer_put(intel_fb->frontbuffer);
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 3e337317f77e..a5253fd838bc 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -69,6 +69,7 @@ enum intel_output_type {
 
 struct intel_framebuffer {
 	struct drm_framebuffer base;
+	struct intel_frontbuffer *frontbuffer;
 	struct intel_rotation_info rot_info;
 
 	/* for each plane in the normal GTT view */
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index 0d3a6fa674e6..c7c11d1842af 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -47,13 +47,14 @@
 #include "intel_fbdev.h"
 #include "intel_frontbuffer.h"
 
-static void intel_fbdev_invalidate(struct intel_fbdev *ifbdev)
+static struct intel_frontbuffer *to_frontbuffer(struct intel_fbdev *ifbdev)
 {
-	struct drm_i915_gem_object *obj = intel_fb_obj(&ifbdev->fb->base);
-	unsigned int origin =
-		ifbdev->vma_flags & PLANE_HAS_FENCE ? ORIGIN_GTT : ORIGIN_CPU;
+	return ifbdev->fb->frontbuffer;
+}
 
-	intel_fb_obj_invalidate(obj, origin);
+static void intel_fbdev_invalidate(struct intel_fbdev *ifbdev)
+{
+	intel_frontbuffer_invalidate(to_frontbuffer(ifbdev), ORIGIN_CPU);
 }
 
 static int intel_fbdev_set_par(struct fb_info *info)
@@ -180,7 +181,6 @@ static int intelfb_create(struct drm_fb_helper *helper,
 	const struct i915_ggtt_view view = {
 		.type = I915_GGTT_VIEW_NORMAL,
 	};
-	struct drm_framebuffer *fb;
 	intel_wakeref_t wakeref;
 	struct fb_info *info;
 	struct i915_vma *vma;
@@ -226,8 +226,7 @@ static int intelfb_create(struct drm_fb_helper *helper,
 		goto out_unlock;
 	}
 
-	fb = &ifbdev->fb->base;
-	intel_fb_obj_flush(intel_fb_obj(fb), ORIGIN_DIRTYFB);
+	intel_frontbuffer_flush(to_frontbuffer(ifbdev), ORIGIN_DIRTYFB);
 
 	info = drm_fb_helper_alloc_fbi(helper);
 	if (IS_ERR(info)) {
@@ -236,7 +235,7 @@ static int intelfb_create(struct drm_fb_helper *helper,
 		goto out_unpin;
 	}
 
-	ifbdev->helper.fb = fb;
+	ifbdev->helper.fb = &ifbdev->fb->base;
 
 	info->fbops = &intelfb_ops;
 
@@ -262,13 +261,14 @@ static int intelfb_create(struct drm_fb_helper *helper,
 	 * If the object is stolen however, it will be full of whatever
 	 * garbage was left in there.
 	 */
-	if (intel_fb_obj(fb)->stolen && !prealloc)
+	if (vma->obj->stolen && !prealloc)
 		memset_io(info->screen_base, 0, info->screen_size);
 
 	/* Use default scratch pixmap (info->pixmap.flags = FB_PIXMAP_SYSTEM) */
 
 	DRM_DEBUG_KMS("allocated %dx%d fb: 0x%08x\n",
-		      fb->width, fb->height, i915_ggtt_offset(vma));
+		      ifbdev->fb->base.width, ifbdev->fb->base.height,
+		      i915_ggtt_offset(vma));
 	ifbdev->vma = vma;
 	ifbdev->vma_flags = flags;
 
diff --git a/drivers/gpu/drm/i915/intel_frontbuffer.c b/drivers/gpu/drm/i915/intel_frontbuffer.c
index d6036b9ad16a..ecbed110aec9 100644
--- a/drivers/gpu/drm/i915/intel_frontbuffer.c
+++ b/drivers/gpu/drm/i915/intel_frontbuffer.c
@@ -62,28 +62,9 @@
 #include "intel_frontbuffer.h"
 #include "intel_psr.h"
 
-void __intel_fb_obj_invalidate(struct drm_i915_gem_object *obj,
-			       enum fb_op_origin origin,
-			       unsigned int frontbuffer_bits)
-{
-	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
-
-	if (origin == ORIGIN_CS) {
-		spin_lock(&dev_priv->fb_tracking.lock);
-		dev_priv->fb_tracking.busy_bits |= frontbuffer_bits;
-		dev_priv->fb_tracking.flip_bits &= ~frontbuffer_bits;
-		spin_unlock(&dev_priv->fb_tracking.lock);
-	}
-
-	might_sleep();
-	intel_psr_invalidate(dev_priv, frontbuffer_bits, origin);
-	intel_edp_drrs_invalidate(dev_priv, frontbuffer_bits);
-	intel_fbc_invalidate(dev_priv, frontbuffer_bits, origin);
-}
-
 /**
- * intel_frontbuffer_flush - flush frontbuffer
- * @dev_priv: i915 device
+ * frontbuffer_flush - flush frontbuffer
+ * @i915: i915 device
  * @frontbuffer_bits: frontbuffer plane tracking bits
  * @origin: which operation caused the flush
  *
@@ -93,45 +74,27 @@ void __intel_fb_obj_invalidate(struct drm_i915_gem_object *obj,
  *
  * Can be called without any locks held.
  */
-static void intel_frontbuffer_flush(struct drm_i915_private *dev_priv,
-				    unsigned frontbuffer_bits,
-				    enum fb_op_origin origin)
+static void frontbuffer_flush(struct drm_i915_private *i915,
+			      unsigned int frontbuffer_bits,
+			      enum fb_op_origin origin)
 {
 	/* Delay flushing when rings are still busy.*/
-	spin_lock(&dev_priv->fb_tracking.lock);
-	frontbuffer_bits &= ~dev_priv->fb_tracking.busy_bits;
-	spin_unlock(&dev_priv->fb_tracking.lock);
+	spin_lock(&i915->fb_tracking.lock);
+	frontbuffer_bits &= ~i915->fb_tracking.busy_bits;
+	spin_unlock(&i915->fb_tracking.lock);
 
 	if (!frontbuffer_bits)
 		return;
 
 	might_sleep();
-	intel_edp_drrs_flush(dev_priv, frontbuffer_bits);
-	intel_psr_flush(dev_priv, frontbuffer_bits, origin);
-	intel_fbc_flush(dev_priv, frontbuffer_bits, origin);
-}
-
-void __intel_fb_obj_flush(struct drm_i915_gem_object *obj,
-			  enum fb_op_origin origin,
-			  unsigned int frontbuffer_bits)
-{
-	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
-
-	if (origin == ORIGIN_CS) {
-		spin_lock(&dev_priv->fb_tracking.lock);
-		/* Filter out new bits since rendering started. */
-		frontbuffer_bits &= dev_priv->fb_tracking.busy_bits;
-		dev_priv->fb_tracking.busy_bits &= ~frontbuffer_bits;
-		spin_unlock(&dev_priv->fb_tracking.lock);
-	}
-
-	if (frontbuffer_bits)
-		intel_frontbuffer_flush(dev_priv, frontbuffer_bits, origin);
+	intel_edp_drrs_flush(i915, frontbuffer_bits);
+	intel_psr_flush(i915, frontbuffer_bits, origin);
+	intel_fbc_flush(i915, frontbuffer_bits, origin);
 }
 
 /**
  * intel_frontbuffer_flip_prepare - prepare asynchronous frontbuffer flip
- * @dev_priv: i915 device
+ * @i915: i915 device
  * @frontbuffer_bits: frontbuffer plane tracking bits
  *
  * This function gets called after scheduling a flip on @obj. The actual
@@ -141,19 +104,19 @@ void __intel_fb_obj_flush(struct drm_i915_gem_object *obj,
  *
  * Can be called without any locks held.
  */
-void intel_frontbuffer_flip_prepare(struct drm_i915_private *dev_priv,
+void intel_frontbuffer_flip_prepare(struct drm_i915_private *i915,
 				    unsigned frontbuffer_bits)
 {
-	spin_lock(&dev_priv->fb_tracking.lock);
-	dev_priv->fb_tracking.flip_bits |= frontbuffer_bits;
+	spin_lock(&i915->fb_tracking.lock);
+	i915->fb_tracking.flip_bits |= frontbuffer_bits;
 	/* Remove stale busy bits due to the old buffer. */
-	dev_priv->fb_tracking.busy_bits &= ~frontbuffer_bits;
-	spin_unlock(&dev_priv->fb_tracking.lock);
+	i915->fb_tracking.busy_bits &= ~frontbuffer_bits;
+	spin_unlock(&i915->fb_tracking.lock);
 }
 
 /**
  * intel_frontbuffer_flip_complete - complete asynchronous frontbuffer flip
- * @dev_priv: i915 device
+ * @i915: i915 device
  * @frontbuffer_bits: frontbuffer plane tracking bits
  *
  * This function gets called after the flip has been latched and will complete
@@ -161,23 +124,22 @@ void intel_frontbuffer_flip_prepare(struct drm_i915_private *dev_priv,
  *
  * Can be called without any locks held.
  */
-void intel_frontbuffer_flip_complete(struct drm_i915_private *dev_priv,
+void intel_frontbuffer_flip_complete(struct drm_i915_private *i915,
 				     unsigned frontbuffer_bits)
 {
-	spin_lock(&dev_priv->fb_tracking.lock);
+	spin_lock(&i915->fb_tracking.lock);
 	/* Mask any cancelled flips. */
-	frontbuffer_bits &= dev_priv->fb_tracking.flip_bits;
-	dev_priv->fb_tracking.flip_bits &= ~frontbuffer_bits;
-	spin_unlock(&dev_priv->fb_tracking.lock);
+	frontbuffer_bits &= i915->fb_tracking.flip_bits;
+	i915->fb_tracking.flip_bits &= ~frontbuffer_bits;
+	spin_unlock(&i915->fb_tracking.lock);
 
 	if (frontbuffer_bits)
-		intel_frontbuffer_flush(dev_priv,
-					frontbuffer_bits, ORIGIN_FLIP);
+		frontbuffer_flush(i915, frontbuffer_bits, ORIGIN_FLIP);
 }
 
 /**
  * intel_frontbuffer_flip - synchronous frontbuffer flip
- * @dev_priv: i915 device
+ * @i915: i915 device
  * @frontbuffer_bits: frontbuffer plane tracking bits
  *
  * This function gets called after scheduling a flip on @obj. This is for
@@ -186,13 +148,159 @@ void intel_frontbuffer_flip_complete(struct drm_i915_private *dev_priv,
  *
  * Can be called without any locks held.
  */
-void intel_frontbuffer_flip(struct drm_i915_private *dev_priv,
+void intel_frontbuffer_flip(struct drm_i915_private *i915,
 			    unsigned frontbuffer_bits)
 {
-	spin_lock(&dev_priv->fb_tracking.lock);
+	spin_lock(&i915->fb_tracking.lock);
 	/* Remove stale busy bits due to the old buffer. */
-	dev_priv->fb_tracking.busy_bits &= ~frontbuffer_bits;
-	spin_unlock(&dev_priv->fb_tracking.lock);
+	i915->fb_tracking.busy_bits &= ~frontbuffer_bits;
+	spin_unlock(&i915->fb_tracking.lock);
 
-	intel_frontbuffer_flush(dev_priv, frontbuffer_bits, ORIGIN_FLIP);
+	frontbuffer_flush(i915, frontbuffer_bits, ORIGIN_FLIP);
+}
+
+void __intel_fb_invalidate(struct intel_frontbuffer *front,
+			   enum fb_op_origin origin,
+			   unsigned int frontbuffer_bits)
+{
+	struct drm_i915_private *i915 = to_i915(front->obj->base.dev);
+
+	if (origin == ORIGIN_CS) {
+		spin_lock(&i915->fb_tracking.lock);
+		i915->fb_tracking.busy_bits |= frontbuffer_bits;
+		i915->fb_tracking.flip_bits &= ~frontbuffer_bits;
+		spin_unlock(&i915->fb_tracking.lock);
+	}
+
+	might_sleep();
+	intel_psr_invalidate(i915, frontbuffer_bits, origin);
+	intel_edp_drrs_invalidate(i915, frontbuffer_bits);
+	intel_fbc_invalidate(i915, frontbuffer_bits, origin);
+}
+
+void __intel_fb_flush(struct intel_frontbuffer *front,
+		      enum fb_op_origin origin,
+		      unsigned int frontbuffer_bits)
+{
+	struct drm_i915_private *i915 = to_i915(front->obj->base.dev);
+
+	if (origin == ORIGIN_CS) {
+		spin_lock(&i915->fb_tracking.lock);
+		/* Filter out new bits since rendering started. */
+		frontbuffer_bits &= i915->fb_tracking.busy_bits;
+		i915->fb_tracking.busy_bits &= ~frontbuffer_bits;
+		spin_unlock(&i915->fb_tracking.lock);
+	}
+
+	if (frontbuffer_bits)
+		frontbuffer_flush(i915, frontbuffer_bits, origin);
+}
+
+static int frontbuffer_active(struct i915_active *ref)
+{
+	struct intel_frontbuffer *front =
+		container_of(ref, typeof(*front), write);
+
+	kref_get(&front->ref);
+	return 0;
+}
+
+static void frontbuffer_retire(struct i915_active *ref)
+{
+	struct intel_frontbuffer *front =
+		container_of(ref, typeof(*front), write);
+
+	intel_frontbuffer_flush(front, ORIGIN_CS);
+	intel_frontbuffer_put(front);
+}
+
+static void frontbuffer_release(struct kref *ref)
+{
+	struct intel_frontbuffer *front =
+		container_of(ref, typeof(*front), ref);
+
+	front->obj->frontbuffer = NULL;
+	spin_unlock(&to_i915(front->obj->base.dev)->fb_tracking.lock);
+
+	i915_gem_object_put(front->obj);
+	kfree(front);
+}
+
+struct intel_frontbuffer *
+intel_frontbuffer_get(struct drm_i915_gem_object *obj)
+{
+	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+	struct intel_frontbuffer *front;
+
+	spin_lock(&i915->fb_tracking.lock);
+	front = obj->frontbuffer;
+	if (front)
+		kref_get(&front->ref);
+	spin_unlock(&i915->fb_tracking.lock);
+	if (front)
+		return front;
+
+	front = kmalloc(sizeof(*front), GFP_KERNEL);
+	if (!front)
+		return NULL;
+
+	front->obj = obj;
+	kref_init(&front->ref);
+	atomic_set(&front->bits, 0);
+	i915_active_init(i915, &front->write,
+			 frontbuffer_active, frontbuffer_retire);
+
+	spin_lock(&i915->fb_tracking.lock);
+	if (obj->frontbuffer) {
+		kfree(front);
+		front = obj->frontbuffer;
+		kref_get(&front->ref);
+	} else {
+		i915_gem_object_get(obj);
+		obj->frontbuffer = front;
+	}
+	spin_unlock(&i915->fb_tracking.lock);
+
+	return front;
+}
+
+void intel_frontbuffer_put(struct intel_frontbuffer *front)
+{
+	kref_put_lock(&front->ref,
+		      frontbuffer_release,
+		      &to_i915(front->obj->base.dev)->fb_tracking.lock);
+}
+
+/**
+ * intel_frontbuffer_track - update frontbuffer tracking
+ * @old: current GEM buffer for the frontbuffer slots
+ * @new: new GEM buffer for the frontbuffer slots
+ * @frontbuffer_bits: bitmask of frontbuffer slots
+ *
+ * This updates the frontbuffer tracking bits @frontbuffer_bits by clearing them
+ * from @old and setting them in @new. Both @old and @new can be NULL.
+ */
+void intel_frontbuffer_track(struct intel_frontbuffer *old,
+			     struct intel_frontbuffer *new,
+			     unsigned int frontbuffer_bits)
+{
+	/*
+	 * Control of individual bits within the mask are guarded by
+	 * the owning plane->mutex, i.e. we can never see concurrent
+	 * manipulation of individual bits. But since the bitfield as a whole
+	 * is updated using RMW, we need to use atomics in order to update
+	 * the bits.
+	 */
+	BUILD_BUG_ON(INTEL_FRONTBUFFER_BITS_PER_PIPE * I915_MAX_PIPES >
+		     BITS_PER_TYPE(atomic_t));
+
+	if (old) {
+		WARN_ON(!(atomic_read(&old->bits) & frontbuffer_bits));
+		atomic_andnot(frontbuffer_bits, &old->bits);
+	}
+
+	if (new) {
+		WARN_ON(atomic_read(&new->bits) & frontbuffer_bits);
+		atomic_or(frontbuffer_bits, &new->bits);
+	}
 }
diff --git a/drivers/gpu/drm/i915/intel_frontbuffer.h b/drivers/gpu/drm/i915/intel_frontbuffer.h
index 5727320c8084..adc64d61a4a5 100644
--- a/drivers/gpu/drm/i915/intel_frontbuffer.h
+++ b/drivers/gpu/drm/i915/intel_frontbuffer.h
@@ -24,7 +24,10 @@
 #ifndef __INTEL_FRONTBUFFER_H__
 #define __INTEL_FRONTBUFFER_H__
 
-#include "gem/i915_gem_object.h"
+#include <linux/atomic.h>
+#include <linux/kref.h>
+
+#include "i915_active.h"
 
 struct drm_i915_private;
 struct drm_i915_gem_object;
@@ -37,23 +40,30 @@ enum fb_op_origin {
 	ORIGIN_DIRTYFB,
 };
 
-void intel_frontbuffer_flip_prepare(struct drm_i915_private *dev_priv,
+struct intel_frontbuffer {
+	struct kref ref;
+	atomic_t bits;
+	struct i915_active write;
+	struct drm_i915_gem_object *obj;
+};
+
+void intel_frontbuffer_flip_prepare(struct drm_i915_private *i915,
 				    unsigned frontbuffer_bits);
-void intel_frontbuffer_flip_complete(struct drm_i915_private *dev_priv,
+void intel_frontbuffer_flip_complete(struct drm_i915_private *i915,
 				     unsigned frontbuffer_bits);
-void intel_frontbuffer_flip(struct drm_i915_private *dev_priv,
+void intel_frontbuffer_flip(struct drm_i915_private *i915,
 			    unsigned frontbuffer_bits);
 
-void __intel_fb_obj_invalidate(struct drm_i915_gem_object *obj,
-			       enum fb_op_origin origin,
-			       unsigned int frontbuffer_bits);
-void __intel_fb_obj_flush(struct drm_i915_gem_object *obj,
-			  enum fb_op_origin origin,
-			  unsigned int frontbuffer_bits);
+struct intel_frontbuffer *
+intel_frontbuffer_get(struct drm_i915_gem_object *obj);
+
+void __intel_fb_invalidate(struct intel_frontbuffer *front,
+			   enum fb_op_origin origin,
+			   unsigned int frontbuffer_bits);
 
 /**
- * intel_fb_obj_invalidate - invalidate frontbuffer object
- * @obj: GEM object to invalidate
+ * intel_frontbuffer_invalidate - invalidate frontbuffer object
+ * @front: frontbuffer to invalidate
  * @origin: which operation caused the invalidation
  *
  * This function gets called every time rendering on the given object starts and
@@ -62,37 +72,53 @@ void __intel_fb_obj_flush(struct drm_i915_gem_object *obj,
  * until the rendering completes or a flip on this frontbuffer plane is
  * scheduled.
  */
-static inline bool intel_fb_obj_invalidate(struct drm_i915_gem_object *obj,
-					   enum fb_op_origin origin)
+static inline bool intel_frontbuffer_invalidate(struct intel_frontbuffer *front,
+						enum fb_op_origin origin)
 {
 	unsigned int frontbuffer_bits;
 
-	frontbuffer_bits = atomic_read(&obj->frontbuffer_bits);
+	if (!front)
+		return false;
+
+	frontbuffer_bits = atomic_read(&front->bits);
 	if (!frontbuffer_bits)
 		return false;
 
-	__intel_fb_obj_invalidate(obj, origin, frontbuffer_bits);
+	__intel_fb_invalidate(front, origin, frontbuffer_bits);
 	return true;
 }
 
+void __intel_fb_flush(struct intel_frontbuffer *front,
+		      enum fb_op_origin origin,
+		      unsigned int frontbuffer_bits);
+
 /**
- * intel_fb_obj_flush - flush frontbuffer object
- * @obj: GEM object to flush
+ * intel_frontbuffer_flush - flush frontbuffer object
+ * @front: frontbuffer to flush
  * @origin: which operation caused the flush
  *
  * This function gets called every time rendering on the given object has
  * completed and frontbuffer caching can be started again.
  */
-static inline void intel_fb_obj_flush(struct drm_i915_gem_object *obj,
-				      enum fb_op_origin origin)
+static inline void intel_frontbuffer_flush(struct intel_frontbuffer *front,
+					   enum fb_op_origin origin)
 {
 	unsigned int frontbuffer_bits;
 
-	frontbuffer_bits = atomic_read(&obj->frontbuffer_bits);
+	if (!front)
+		return;
+
+	frontbuffer_bits = atomic_read(&front->bits);
 	if (!frontbuffer_bits)
 		return;
 
-	__intel_fb_obj_flush(obj, origin, frontbuffer_bits);
+	__intel_fb_flush(front, origin, frontbuffer_bits);
 }
 
+void intel_frontbuffer_track(struct intel_frontbuffer *old,
+			     struct intel_frontbuffer *new,
+			     unsigned int frontbuffer_bits);
+
+void intel_frontbuffer_put(struct intel_frontbuffer *front);
+
 #endif /* __INTEL_FRONTBUFFER_H__ */
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index c7d2d980df8c..f0743af79fcb 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -281,9 +281,9 @@ static void intel_overlay_flip_prepare(struct intel_overlay *overlay,
 
 	WARN_ON(overlay->old_vma);
 
-	i915_gem_track_fb(overlay->vma ? overlay->vma->obj : NULL,
-			  vma ? vma->obj : NULL,
-			  INTEL_FRONTBUFFER_OVERLAY(pipe));
+	intel_frontbuffer_track(overlay->vma ? overlay->vma->obj->frontbuffer : NULL,
+				vma ? vma->obj->frontbuffer : NULL,
+				INTEL_FRONTBUFFER_OVERLAY(pipe));
 
 	intel_frontbuffer_flip_prepare(overlay->i915,
 				       INTEL_FRONTBUFFER_OVERLAY(pipe));
@@ -768,7 +768,7 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 		ret = PTR_ERR(vma);
 		goto out_pin_section;
 	}
-	intel_fb_obj_flush(new_bo, ORIGIN_DIRTYFB);
+	intel_frontbuffer_flush(new_bo->frontbuffer, ORIGIN_DIRTYFB);
 
 	ret = i915_vma_put_fence(vma);
 	if (ret)
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 24/39] drm/i915: Coordinate i915_active with its own mutex
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (22 preceding siblings ...)
  2019-06-14  7:10 ` [PATCH 23/39] drm/i915: Extract intel_frontbuffer active tracking Chris Wilson
@ 2019-06-14  7:10 ` Chris Wilson
  2019-06-14  7:10 ` [PATCH 25/39] drm/i915: Track ggtt fence reservations under " Chris Wilson
                   ` (16 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:10 UTC (permalink / raw)
  To: intel-gfx

Forgo the struct_mutex serialisation for i915_active, and interpose its
own mutex handling for active/retire.

This is a multi-layered sleight-of-hand. First, we had to ensure that no
active/retire callbacks accidentally inverted the mutex ordering rules,
nor assumed that they were themselves serialised by struct_mutex. More
challenging, though, is the rule for updating elements of the active
rbtree. Instead of the whole i915_active now being serialised by
struct_mutex, allocations/rotations of the tree are serialised by the
i915_active.mutex and individual nodes are serialised by the caller
using the i915_timeline.mutex (we need to use nested spinlocks to
interact with the dma_fence callback lists).

The pain point here is that instead of a single mutex around execbuf, we
now have to take a mutex for each active tracker (one for each vma,
context, etc) and a couple of spinlocks for each fence update. The
improvement in fine-grained locking, allowing for multiple concurrent
clients (eventually!), should be worth it in typical loads.
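
The core update rule is small enough to sketch here (condensed from
__i915_active_fence_set() in this patch; the signalled-fence check and
error handling are elided):

	/* Updates are serialised by the caller's (outer) timeline mutex. */
	spin_lock_irqsave(fence->lock, flags);
	prev = rcu_dereference_protected(active->fence, 1 /* timeline mutex */);
	if (prev) {
		/* Unhook from the old fence; nests the two fence spinlocks. */
		spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING);
		__list_del_entry(&active->cb.node);
		spin_unlock(prev->lock);
	}
	rcu_assign_pointer(active->fence, fence);
	list_add_tail(&active->cb.node, &fence->cb_list);
	spin_unlock_irqrestore(fence->lock, flags);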

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |  12 +-
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   2 +-
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   2 +
 drivers/gpu/drm/i915/gem/i915_gem_pm.c        |   9 +-
 drivers/gpu/drm/i915/gt/intel_context.c       |   2 +-
 drivers/gpu/drm/i915/gt/intel_reset.c         |  10 +-
 drivers/gpu/drm/i915/gt/selftest_lrc.c        |  10 +-
 drivers/gpu/drm/i915/gvt/scheduler.c          |   3 -
 drivers/gpu/drm/i915/i915_active.c            | 148 +++++----
 drivers/gpu/drm/i915/i915_active.h            | 305 ++++--------------
 drivers/gpu/drm/i915/i915_active_types.h      |  17 +-
 drivers/gpu/drm/i915/i915_gem.c               |  42 ++-
 drivers/gpu/drm/i915/i915_gem_gtt.c           |   2 +-
 drivers/gpu/drm/i915/i915_request.c           |  53 +--
 drivers/gpu/drm/i915/i915_request.h           |   1 -
 drivers/gpu/drm/i915/i915_timeline.c          |   9 +-
 drivers/gpu/drm/i915/i915_timeline_types.h    |   2 +-
 drivers/gpu/drm/i915/i915_vma.c               |  37 +--
 drivers/gpu/drm/i915/intel_frontbuffer.c      |   3 +-
 drivers/gpu/drm/i915/intel_overlay.c          |   6 +-
 drivers/gpu/drm/i915/selftests/i915_active.c  |  22 +-
 .../gpu/drm/i915/selftests/mock_timeline.c    |   2 +-
 22 files changed, 225 insertions(+), 474 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 9262a1d4f763..2e3e47bb1d11 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -910,20 +910,18 @@ static int context_barrier_task(struct i915_gem_context *ctx,
 				void (*task)(void *data),
 				void *data)
 {
-	struct drm_i915_private *i915 = ctx->i915;
 	struct context_barrier_task *cb;
 	struct i915_gem_engines_iter it;
 	struct intel_context *ce;
 	int err = 0;
 
-	lockdep_assert_held(&i915->drm.struct_mutex);
 	GEM_BUG_ON(!task);
 
 	cb = kmalloc(sizeof(*cb), GFP_KERNEL);
 	if (!cb)
 		return -ENOMEM;
 
-	i915_active_init(i915, &cb->base, NULL, cb_retire);
+	i915_active_init(&cb->base, NULL, cb_retire);
 	err = i915_active_acquire(&cb->base);
 	if (err) {
 		kfree(cb);
@@ -955,7 +953,9 @@ static int context_barrier_task(struct i915_gem_context *ctx,
 		if (emit)
 			err = emit(rq, data);
 		if (err == 0)
-			err = i915_active_ref(&cb->base, rq->fence.context, rq);
+			err = i915_active_ref(&cb->base,
+					      rq->fence.context,
+					      &rq->fence);
 
 		i915_request_add(rq);
 		if (err)
@@ -1192,7 +1192,7 @@ gen8_modify_rpcs(struct intel_context *ce, struct intel_sseu sseu)
 		return PTR_ERR(rq);
 
 	/* Queue this switch after all other activity by this context. */
-	ret = i915_active_request_set(&ce->ring->timeline->last_request, rq);
+	ret = i915_active_fence_set(&ce->ring->timeline->last_request, rq);
 	if (ret)
 		goto out_add;
 
@@ -1204,7 +1204,7 @@ gen8_modify_rpcs(struct intel_context *ce, struct intel_sseu sseu)
 	 * words transfer the pinned ce object to tracked active request.
 	 */
 	GEM_BUG_ON(i915_active_is_idle(&ce->active));
-	ret = i915_active_ref(&ce->active, rq->fence.context, rq);
+	ret = i915_active_ref(&ce->active, rq->fence.context, &rq->fence);
 	if (ret)
 		goto out_add;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 528eea44dccf..eeb4c6cdb01d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1316,7 +1316,7 @@ relocate_entry(struct i915_vma *vma,
 
 	if (!eb->reloc_cache.vaddr &&
 	    (DBG_FORCE_RELOC == FORCE_GPU_RELOC ||
-	     !reservation_object_test_signaled_rcu(vma->resv, true))) {
+	     i915_vma_is_active(vma))) {
 		const unsigned int gen = eb->reloc_cache.gen;
 		unsigned int len;
 		u32 *batch;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 21bfb7bd0f57..e87fca4d8194 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -11,6 +11,8 @@
 
 #include <drm/drm_gem.h>
 
+#include <uapi/drm/i915_drm.h>
+
 #include "i915_active.h"
 #include "i915_selftest.h"
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
index 05011d4a3b88..35b73c347887 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
@@ -15,14 +15,11 @@ static void call_idle_barriers(struct intel_engine_cs *engine)
 	struct llist_node *node, *next;
 
 	llist_for_each_safe(node, next, llist_del_all(&engine->barrier_tasks)) {
-		struct i915_active_request *active =
+		struct dma_fence_cb *cb =
 			container_of((struct list_head *)node,
-				     typeof(*active), link);
+				     typeof(*cb), node);
 
-		INIT_LIST_HEAD(&active->link);
-		RCU_INIT_POINTER(active->request, NULL);
-
-		active->retire(active, NULL);
+		cb->func(NULL, cb);
 	}
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 2ea05dcdb99f..a109904faa57 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -182,7 +182,7 @@ intel_context_init(struct intel_context *ce,
 
 	mutex_init(&ce->pin_mutex);
 
-	i915_active_init(ctx->i915, &ce->active,
+	i915_active_init(&ce->active,
 			 __intel_context_active, __intel_context_retire);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index b779984a5620..110a5936ea0e 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -881,10 +881,10 @@ static bool __i915_gem_unset_wedged(struct drm_i915_private *i915)
 	 */
 	mutex_lock(&i915->gt.timelines.mutex);
 	list_for_each_entry(tl, &i915->gt.timelines.active_list, link) {
-		struct i915_request *rq;
+		struct dma_fence *fence;
 
-		rq = i915_active_request_get_unlocked(&tl->last_request);
-		if (!rq)
+		fence = i915_active_fence_get(&tl->last_request);
+		if (!fence)
 			continue;
 
 		/*
@@ -894,8 +894,8 @@ static bool __i915_gem_unset_wedged(struct drm_i915_private *i915)
 		 * (I915_FENCE_TIMEOUT) so this wait should not be unbounded
 		 * in the worst case.
 		 */
-		dma_fence_default_wait(&rq->fence, false, MAX_SCHEDULE_TIMEOUT);
-		i915_request_put(rq);
+		dma_fence_default_wait(fence, false, MAX_SCHEDULE_TIMEOUT);
+		dma_fence_put(fence);
 	}
 	mutex_unlock(&i915->gt.timelines.mutex);
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 9ba6bffff3e3..bfba2fe9ec4d 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -854,9 +854,13 @@ static struct i915_request *dummy_request(struct intel_engine_cs *engine)
 	if (!rq)
 		return NULL;
 
-	INIT_LIST_HEAD(&rq->active_list);
 	rq->engine = engine;
 
+	spin_lock_init(&rq->lock);
+	INIT_LIST_HEAD(&rq->fence.cb_list);
+	rq->fence.lock = &rq->lock;
+	rq->fence.ops = &i915_fence_ops;
+
 	i915_sched_node_init(&rq->sched);
 
 	/* mark this request as permanently incomplete */
@@ -945,8 +949,8 @@ static int live_suppress_wait_preempt(void *arg)
 				}
 
 				/* Disable NEWCLIENT promotion */
-				__i915_active_request_set(&rq[i]->timeline->last_request,
-							  dummy);
+				__i915_active_fence_set(&rq[i]->timeline->last_request,
+							&dummy->fence);
 				i915_request_add(rq[i]);
 			}
 
diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
index e301efb18d45..c36f904a6d54 100644
--- a/drivers/gpu/drm/i915/gvt/scheduler.c
+++ b/drivers/gpu/drm/i915/gvt/scheduler.c
@@ -391,11 +391,8 @@ intel_gvt_workload_req_alloc(struct intel_vgpu_workload *workload)
 {
 	struct intel_vgpu *vgpu = workload->vgpu;
 	struct intel_vgpu_submission *s = &vgpu->submission;
-	struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv;
 	struct i915_request *rq;
 
-	lockdep_assert_held(&dev_priv->drm.struct_mutex);
-
 	if (workload->req)
 		return 0;
 
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index f76a2b4ef95d..38f16b69fde1 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -8,8 +8,6 @@
 #include "i915_active.h"
 #include "i915_globals.h"
 
-#define BKL(ref) (&(ref)->i915->drm.struct_mutex)
-
 /*
  * Active refs memory management
  *
@@ -23,7 +21,7 @@ static struct i915_global_active {
 } global;
 
 struct active_node {
-	struct i915_active_request base;
+	struct i915_active_fence base;
 	struct i915_active *ref;
 	struct rb_node node;
 	u64 timeline;
@@ -54,7 +52,7 @@ __active_retire(struct i915_active *ref)
 	ref->retire(ref);
 
 	rbtree_postorder_for_each_entry_safe(it, n, &root, node) {
-		GEM_BUG_ON(i915_active_request_isset(&it->base));
+		GEM_BUG_ON(i915_active_fence_isset(&it->base));
 		kmem_cache_free(global.slab_cache, it);
 	}
 }
@@ -89,12 +87,13 @@ active_retire(struct i915_active *ref)
 }
 
 static void
-node_retire(struct i915_active_request *base, struct i915_request *rq)
+node_retire(struct dma_fence *fence, struct dma_fence_cb *cb)
 {
-	active_retire(container_of(base, struct active_node, base)->ref);
+	i915_active_fence_cb(fence, cb);
+	active_retire(container_of(cb, struct active_node, base.cb)->ref);
 }
 
-static struct i915_active_request *
+static struct i915_active_fence *
 active_instance(struct i915_active *ref, u64 idx)
 {
 	struct active_node *node, *prealloc;
@@ -137,7 +136,7 @@ active_instance(struct i915_active *ref, u64 idx)
 	}
 
 	node = prealloc;
-	i915_active_request_init(&node->base, NULL, node_retire);
+	__i915_active_fence_init(&node->base, NULL, node_retire);
 	node->ref = ref;
 	node->timeline = idx;
 
@@ -151,13 +150,11 @@ active_instance(struct i915_active *ref, u64 idx)
 	return &node->base;
 }
 
-void __i915_active_init(struct drm_i915_private *i915,
-			struct i915_active *ref,
+void __i915_active_init(struct i915_active *ref,
 			int (*active)(struct i915_active *ref),
 			void (*retire)(struct i915_active *ref),
 			struct lock_class_key *key)
 {
-	ref->i915 = i915;
 	ref->active = active;
 	ref->retire = retire;
 	ref->tree = RB_ROOT;
@@ -170,9 +167,9 @@ void __i915_active_init(struct drm_i915_private *i915,
 
 int i915_active_ref(struct i915_active *ref,
 		    u64 timeline,
-		    struct i915_request *rq)
+		    struct dma_fence *fence)
 {
-	struct i915_active_request *active;
+	struct i915_active_fence *active;
 	int err;
 
 	/* Prevent reaping in case we malloc/wait while building the tree */
@@ -186,9 +183,9 @@ int i915_active_ref(struct i915_active *ref,
 		goto out;
 	}
 
-	if (!i915_active_request_isset(active))
+	GEM_BUG_ON(!atomic_read(&ref->count));
+	if (!__i915_active_fence_set(active, fence))
 		atomic_inc(&ref->count);
-	__i915_active_request_set(active, rq);
 
 out:
 	i915_active_release(ref);
@@ -240,47 +237,20 @@ int i915_active_wait(struct i915_active *ref)
 	}
 
 	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
-		err = i915_active_request_retire(&it->base, BKL(ref));
-		if (err)
-			break;
-	}
-
-	__active_retire(ref);
-	flush_work(&ref->work);
-	return err;
-}
-
-int i915_request_await_active_request(struct i915_request *rq,
-				      struct i915_active_request *active)
-{
-	struct i915_request *barrier =
-		i915_active_request_raw(active, &rq->i915->drm.struct_mutex);
-
-	return barrier ? i915_request_await_dma_fence(rq, &barrier->fence) : 0;
-}
+		struct dma_fence *fence;
 
-int i915_request_await_active(struct i915_request *rq, struct i915_active *ref)
-{
-	struct active_node *it, *n;
-	int err;
+		fence = i915_active_fence_get(&it->base);
+		if (!fence)
+			continue;
 
-	if (RB_EMPTY_ROOT(&ref->tree))
-		return 0;
-
-	/* await allocates and so we need to avoid hitting the shrinker */
-	err = i915_active_acquire(ref);
-	if (err)
-		return err;
-
-	mutex_lock(&ref->mutex);
-	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
-		err = i915_request_await_active_request(rq, &it->base);
+		err = dma_fence_wait(fence, true);
+		dma_fence_put(fence);
 		if (err)
 			break;
 	}
-	mutex_unlock(&ref->mutex);
 
-	i915_active_release(ref);
+	__active_retire(ref);
+	flush_work(&ref->work);
 	return err;
 }
 
@@ -312,13 +282,12 @@ int i915_active_acquire_preallocate_barrier(struct i915_active *ref,
 			break;
 		}
 
-		i915_active_request_init(&node->base,
-					 (void *)engine, node_retire);
+		__i915_active_fence_init(&node->base, engine, node_retire);
 		node->timeline = kctx->ring->timeline->fence_context;
 		node->ref = ref;
 		atomic_inc(&ref->count);
 
-		llist_add((struct llist_node *)&node->base.link,
+		llist_add((struct llist_node *)&node->base.cb.node,
 			  &ref->barriers);
 	}
 
@@ -338,10 +307,10 @@ void i915_active_acquire_barrier(struct i915_active *ref)
 		struct rb_node **p, *parent;
 
 		node = container_of((struct list_head *)pos,
-				    typeof(*node), base.link);
+				    typeof(*node), base.cb.node);
 
-		engine = (void *)rcu_access_pointer(node->base.request);
-		RCU_INIT_POINTER(node->base.request, ERR_PTR(-EAGAIN));
+		engine = (void *)rcu_access_pointer(node->base.fence);
+		RCU_INIT_POINTER(node->base.fence, ERR_PTR(-EAGAIN));
 
 		parent = NULL;
 		p = &ref->tree.rb_node;
@@ -357,7 +326,7 @@ void i915_active_acquire_barrier(struct i915_active *ref)
 		rb_link_node(&node->node, parent, p);
 		rb_insert_color(&node->node, &ref->tree);
 
-		llist_add((struct llist_node *)&node->base.link,
+		llist_add((struct llist_node *)&node->base.cb.node,
 			  &engine->barrier_tasks);
 	}
 	mutex_unlock(&ref->mutex);
@@ -367,29 +336,72 @@ void i915_request_add_barriers(struct i915_request *rq)
 {
 	struct intel_engine_cs *engine = rq->engine;
 	struct llist_node *node, *next;
+	unsigned long flags;
 
-	llist_for_each_safe(node, next, llist_del_all(&engine->barrier_tasks))
-		list_add_tail((struct list_head *)node, &rq->active_list);
+	GEM_BUG_ON(intel_engine_is_virtual(engine));
+	node = llist_del_all(&engine->barrier_tasks);
+	if (!node)
+		return;
+
+	spin_lock_irqsave(&rq->lock, flags);
+	llist_for_each_safe(node, next, node)
+		list_add_tail((struct list_head *)node, &rq->fence.cb_list);
+	spin_unlock_irqrestore(&rq->lock, flags);
 }
 
-int i915_active_request_set(struct i915_active_request *active,
-			    struct i915_request *rq)
+struct dma_fence *
+__i915_active_fence_set(struct i915_active_fence *active,
+			struct dma_fence *fence)
 {
+	struct dma_fence *prev;
+	unsigned long flags;
+
+	/* NB: updates must be serialised by an outer timeline mutex */
+	spin_lock_irqsave(fence->lock, flags);
+	GEM_BUG_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags));
+
+	prev = rcu_dereference_protected(active->fence, 1 /* timeline mutex */);
+	if (prev) {
+		spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING);
+		__list_del_entry(&active->cb.node);
+		spin_unlock(prev->lock);
+		prev = rcu_access_pointer(active->fence);
+	}
+
+	rcu_assign_pointer(active->fence, fence);
+	list_add_tail(&active->cb.node, &fence->cb_list);
+
+	spin_unlock_irqrestore(fence->lock, flags);
+
+	return prev;
+}
+
+int i915_active_fence_set(struct i915_active_fence *active,
+			  struct i915_request *rq)
+{
+	struct dma_fence *fence;
 	int err;
 
 	/* Must maintain ordering wrt previous active requests */
-	err = i915_request_await_active_request(rq, active);
-	if (err)
-		return err;
+	rcu_read_lock();
+	fence = __i915_active_fence_set(active, &rq->fence);
+	if (fence)
+		fence = dma_fence_get_rcu(fence);
+	rcu_read_unlock();
+
+	if (fence) {
+		err = i915_request_await_dma_fence(rq, fence);
+		dma_fence_put(fence);
+		if (err)
+			return err;
+	}
 
-	__i915_active_request_set(active, rq);
 	return 0;
 }
 
-void i915_active_retire_noop(struct i915_active_request *active,
-			     struct i915_request *request)
+void i915_active_noop(struct dma_fence *fence, struct dma_fence_cb *cb)
 {
-	/* Space left intentionally blank */
+	i915_active_fence_cb(fence, cb);
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
index bdec4f81b6e8..fdf87bad76d5 100644
--- a/drivers/gpu/drm/i915/i915_active.h
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -10,7 +10,9 @@
 #include <linux/lockdep.h>
 
 #include "i915_active_types.h"
-#include "i915_request.h"
+
+struct i915_request;
+struct intel_engine_cs;
 
 /*
  * We treat requests as fences. This is not to be confused with our
@@ -28,300 +30,103 @@
  * write access so that we can perform concurrent read operations between
  * the CPU and GPU engines, as well as waiting for all rendering to
  * complete, or waiting for the last GPU user of a "fence register". The
- * object then embeds a #i915_active_request to track the most recent (in
+ * object then embeds a #i915_active_fence to track the most recent (in
  * retirement order) request relevant for the desired mode of access.
- * The #i915_active_request is updated with i915_active_request_set() to
+ * The #i915_active_fence is updated with i915_active_fence_set() to
  * track the most recent fence request, typically this is done as part of
  * i915_vma_move_to_active().
  *
- * When the #i915_active_request completes (is retired), it will
+ * When the #i915_active_fence completes (is retired), it will
  * signal its completion to the owner through a callback as well as mark
- * itself as idle (i915_active_request.request == NULL). The owner
+ * itself as idle (i915_active_fence.fence == NULL). The owner
  * can then perform any action, such as delayed freeing of an active
  * resource including itself.
  */
 
-void i915_active_retire_noop(struct i915_active_request *active,
-			     struct i915_request *request);
+void i915_active_noop(struct dma_fence *fence, struct dma_fence_cb *cb);
 
 /**
- * i915_active_request_init - prepares the activity tracker for use
+ * __i915_active_fence_init - prepares the activity tracker for use
  * @active - the active tracker
- * @rq - initial request to track, can be NULL
+ * @fence - initial fence to track, can be NULL
  * @func - a callback when the tracker is retired (becomes idle),
  *         can be NULL
  *
- * i915_active_request_init() prepares the embedded @active struct for use as
- * an activity tracker, that is for tracking the last known active request
- * associated with it. When the last request becomes idle, when it is retired
+ * __i915_active_fence_init() prepares the embedded @active struct for use as
+ * an activity tracker, that is for tracking the last known active fence
+ * associated with it. When the last fence becomes idle, when it is retired
  * after completion, the optional callback @func is invoked.
  */
 static inline void
-i915_active_request_init(struct i915_active_request *active,
-			 struct i915_request *rq,
-			 i915_active_retire_fn retire)
+__i915_active_fence_init(struct i915_active_fence *active,
+			 void *fence,
+			 dma_fence_func_t fn)
 {
-	RCU_INIT_POINTER(active->request, rq);
-	INIT_LIST_HEAD(&active->link);
-	active->retire = retire ?: i915_active_retire_noop;
+	RCU_INIT_POINTER(active->fence, fence);
+	active->cb.func = fn ?: i915_active_noop;
 }
 
-#define INIT_ACTIVE_REQUEST(name) i915_active_request_init((name), NULL, NULL)
-
-/**
- * i915_active_request_set - updates the tracker to watch the current request
- * @active - the active tracker
- * @request - the request to watch
- *
- * __i915_active_request_set() watches the given @request for completion. Whilst
- * that @request is busy, the @active reports busy. When that @request is
- * retired, the @active tracker is updated to report idle.
- */
-static inline void
-__i915_active_request_set(struct i915_active_request *active,
-			  struct i915_request *request)
-{
-	list_move(&active->link, &request->active_list);
-	rcu_assign_pointer(active->request, request);
-}
-
-int __must_check
-i915_active_request_set(struct i915_active_request *active,
-			struct i915_request *rq);
-
-/**
- * i915_active_request_raw - return the active request
- * @active - the active tracker
- *
- * i915_active_request_raw() returns the current request being tracked, or NULL.
- * It does not obtain a reference on the request for the caller, so the caller
- * must hold struct_mutex.
- */
-static inline struct i915_request *
-i915_active_request_raw(const struct i915_active_request *active,
-			struct mutex *mutex)
-{
-	return rcu_dereference_protected(active->request,
-					 lockdep_is_held(mutex));
-}
-
-/**
- * i915_active_request_peek - report the active request being monitored
- * @active - the active tracker
- *
- * i915_active_request_peek() returns the current request being tracked if
- * still active, or NULL. It does not obtain a reference on the request
- * for the caller, so the caller must hold struct_mutex.
- */
-static inline struct i915_request *
-i915_active_request_peek(const struct i915_active_request *active,
-			 struct mutex *mutex)
-{
-	struct i915_request *request;
-
-	request = i915_active_request_raw(active, mutex);
-	if (!request || i915_request_completed(request))
-		return NULL;
+#define INIT_ACTIVE_FENCE(A) __i915_active_fence_init(A, NULL, NULL)
 
-	return request;
-}
+struct dma_fence *
+__i915_active_fence_set(struct i915_active_fence *active,
+			struct dma_fence *fence);
 
 /**
- * i915_active_request_get - return a reference to the active request
+ * i915_active_fence_set - updates the tracker to watch the current fence
  * @active - the active tracker
+ * @rq - the request to watch
  *
- * i915_active_request_get() returns a reference to the active request, or NULL
- * if the active tracker is idle. The caller must hold struct_mutex.
+ * i915_active_fence_set() watches the given @rq for completion. While
+ * that @rq is busy, the @active reports busy. When that @rq is signaled
+ * (or else retired), the @active tracker is updated to report idle.
  */
-static inline struct i915_request *
-i915_active_request_get(const struct i915_active_request *active,
-			struct mutex *mutex)
-{
-	return i915_request_get(i915_active_request_peek(active, mutex));
-}
-
-/**
- * __i915_active_request_get_rcu - return a reference to the active request
- * @active - the active tracker
- *
- * __i915_active_request_get() returns a reference to the active request,
- * or NULL if the active tracker is idle. The caller must hold the RCU read
- * lock, but the returned pointer is safe to use outside of RCU.
- */
-static inline struct i915_request *
-__i915_active_request_get_rcu(const struct i915_active_request *active)
-{
-	/*
-	 * Performing a lockless retrieval of the active request is super
-	 * tricky. SLAB_TYPESAFE_BY_RCU merely guarantees that the backing
-	 * slab of request objects will not be freed whilst we hold the
-	 * RCU read lock. It does not guarantee that the request itself
-	 * will not be freed and then *reused*. Viz,
-	 *
-	 * Thread A			Thread B
-	 *
-	 * rq = active.request
-	 *				retire(rq) -> free(rq);
-	 *				(rq is now first on the slab freelist)
-	 *				active.request = NULL
-	 *
-	 *				rq = new submission on a new object
-	 * ref(rq)
-	 *
-	 * To prevent the request from being reused whilst the caller
-	 * uses it, we take a reference like normal. Whilst acquiring
-	 * the reference we check that it is not in a destroyed state
-	 * (refcnt == 0). That prevents the request being reallocated
-	 * whilst the caller holds on to it. To check that the request
-	 * was not reallocated as we acquired the reference we have to
-	 * check that our request remains the active request across
-	 * the lookup, in the same manner as a seqlock. The visibility
-	 * of the pointer versus the reference counting is controlled
-	 * by using RCU barriers (rcu_dereference and rcu_assign_pointer).
-	 *
-	 * In the middle of all that, we inspect whether the request is
-	 * complete. Retiring is lazy so the request may be completed long
-	 * before the active tracker is updated. Querying whether the
-	 * request is complete is far cheaper (as it involves no locked
-	 * instructions setting cachelines to exclusive) than acquiring
-	 * the reference, so we do it first. The RCU read lock ensures the
-	 * pointer dereference is valid, but does not ensure that the
-	 * seqno nor HWS is the right one! However, if the request was
-	 * reallocated, that means the active tracker's request was complete.
-	 * If the new request is also complete, then both are and we can
-	 * just report the active tracker is idle. If the new request is
-	 * incomplete, then we acquire a reference on it and check that
-	 * it remained the active request.
-	 *
-	 * It is then imperative that we do not zero the request on
-	 * reallocation, so that we can chase the dangling pointers!
-	 * See i915_request_alloc().
-	 */
-	do {
-		struct i915_request *request;
-
-		request = rcu_dereference(active->request);
-		if (!request || i915_request_completed(request))
-			return NULL;
-
-		/*
-		 * An especially silly compiler could decide to recompute the
-		 * result of i915_request_completed, more specifically
-		 * re-emit the load for request->fence.seqno. A race would catch
-		 * a later seqno value, which could flip the result from true to
-		 * false. Which means part of the instructions below might not
-		 * be executed, while later on instructions are executed. Due to
-		 * barriers within the refcounting the inconsistency can't reach
-		 * past the call to i915_request_get_rcu, but not executing
-		 * that while still executing i915_request_put() creates
-		 * havoc enough.  Prevent this with a compiler barrier.
-		 */
-		barrier();
-
-		request = i915_request_get_rcu(request);
-
-		/*
-		 * What stops the following rcu_access_pointer() from occurring
-		 * before the above i915_request_get_rcu()? If we were
-		 * to read the value before pausing to get the reference to
-		 * the request, we may not notice a change in the active
-		 * tracker.
-		 *
-		 * The rcu_access_pointer() is a mere compiler barrier, which
-		 * means both the CPU and compiler are free to perform the
-		 * memory read without constraint. The compiler only has to
-		 * ensure that any operations after the rcu_access_pointer()
-		 * occur afterwards in program order. This means the read may
-		 * be performed earlier by an out-of-order CPU, or adventurous
-		 * compiler.
-		 *
-		 * The atomic operation at the heart of
-		 * i915_request_get_rcu(), see dma_fence_get_rcu(), is
-		 * atomic_inc_not_zero() which is only a full memory barrier
-		 * when successful. That is, if i915_request_get_rcu()
-		 * returns the request (and so with the reference counted
-		 * incremented) then the following read for rcu_access_pointer()
-		 * must occur after the atomic operation and so confirm
-		 * that this request is the one currently being tracked.
-		 *
-		 * The corresponding write barrier is part of
-		 * rcu_assign_pointer().
-		 */
-		if (!request || request == rcu_access_pointer(active->request))
-			return rcu_pointer_handoff(request);
-
-		i915_request_put(request);
-	} while (1);
-}
-
+int __must_check
+i915_active_fence_set(struct i915_active_fence *active,
+		      struct i915_request *rq);
 /**
- * i915_active_request_get_unlocked - return a reference to the active request
+ * i915_active_fence_get - return a reference to the active fence
  * @active - the active tracker
  *
- * i915_active_request_get_unlocked() returns a reference to the active request,
+ * i915_active_fence_get() returns a reference to the active fence,
  * or NULL if the active tracker is idle. The reference is obtained under RCU,
  * so no locking is required by the caller.
  *
- * The reference should be freed with i915_request_put().
+ * The reference should be freed with dma_fence_put().
  */
-static inline struct i915_request *
-i915_active_request_get_unlocked(const struct i915_active_request *active)
+static inline struct dma_fence *
+i915_active_fence_get(struct i915_active_fence *active)
 {
-	struct i915_request *request;
+	struct dma_fence *fence;
 
 	rcu_read_lock();
-	request = __i915_active_request_get_rcu(active);
+	fence = dma_fence_get_rcu_safe(&active->fence);
 	rcu_read_unlock();
 
-	return request;
+	return fence;
 }
 
 /**
- * i915_active_request_isset - report whether the active tracker is assigned
+ * i915_active_fence_isset - report whether the active tracker is assigned
  * @active - the active tracker
  *
- * i915_active_request_isset() returns true if the active tracker is currently
- * assigned to a request. Due to the lazy retiring, that request may be idle
+ * i915_active_fence_isset() returns true if the active tracker is currently
+ * assigned to a fence. Due to the lazy retiring, that fence may be idle
  * and this may report stale information.
  */
 static inline bool
-i915_active_request_isset(const struct i915_active_request *active)
+i915_active_fence_isset(const struct i915_active_fence *active)
 {
-	return rcu_access_pointer(active->request);
+	return rcu_access_pointer(active->fence);
 }
 
-/**
- * i915_active_request_retire - waits until the request is retired
- * @active - the active request on which to wait
- *
- * i915_active_request_retire() waits until the request is completed,
- * and then ensures that at least the retirement handler for this
- * @active tracker is called before returning. If the @active
- * tracker is idle, the function returns immediately.
- */
-static inline int __must_check
-i915_active_request_retire(struct i915_active_request *active,
-			   struct mutex *mutex)
+static inline void
+i915_active_fence_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
 {
-	struct i915_request *request;
-	long ret;
-
-	request = i915_active_request_raw(active, mutex);
-	if (!request)
-		return 0;
-
-	ret = i915_request_wait(request,
-				I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
-				MAX_SCHEDULE_TIMEOUT);
-	if (ret < 0)
-		return ret;
-
-	list_del_init(&active->link);
-	RCU_INIT_POINTER(active->request, NULL);
-
-	active->retire(active, request);
+	struct i915_active_fence *active =
+		container_of(cb, typeof(*active), cb);
 
-	return 0;
+	RCU_INIT_POINTER(active->fence, NULL);
 }
 
 /*
@@ -350,31 +155,29 @@ i915_active_request_retire(struct i915_active_request *active,
  * synchronisation.
  */
 
-void __i915_active_init(struct drm_i915_private *i915,
-			struct i915_active *ref,
+void __i915_active_init(struct i915_active *ref,
 			int (*active)(struct i915_active *ref),
 			void (*retire)(struct i915_active *ref),
 			struct lock_class_key *key);
-#define i915_active_init(i915, ref, active, retire) do {		\
+#define i915_active_init(ref, active, retire) do {		\
 	static struct lock_class_key __key;				\
 									\
-	__i915_active_init(i915, ref, active, retire, &__key);		\
+	__i915_active_init(ref, active, retire, &__key);		\
 } while (0)
 
 int i915_active_ref(struct i915_active *ref,
 		    u64 timeline,
-		    struct i915_request *rq);
+		    struct dma_fence *fence);
 
 int i915_active_wait(struct i915_active *ref);
 
 int i915_request_await_active(struct i915_request *rq,
 			      struct i915_active *ref);
-int i915_request_await_active_request(struct i915_request *rq,
-				      struct i915_active_request *active);
+int i915_request_await_active_fence(struct i915_request *rq,
+				    struct i915_active_fence *active);
 
 int i915_active_acquire(struct i915_active *ref);
 void i915_active_release(struct i915_active *ref);
-void __i915_active_release_nested(struct i915_active *ref, int subclass);
 
 static inline bool
 i915_active_is_idle(const struct i915_active *ref)
diff --git a/drivers/gpu/drm/i915/i915_active_types.h b/drivers/gpu/drm/i915/i915_active_types.h
index 06acdffe0f6d..9519b6523801 100644
--- a/drivers/gpu/drm/i915/i915_active_types.h
+++ b/drivers/gpu/drm/i915/i915_active_types.h
@@ -8,30 +8,21 @@
 #define _I915_ACTIVE_TYPES_H_
 
 #include <linux/atomic.h>
+#include <linux/dma-fence.h>
 #include <linux/llist.h>
 #include <linux/mutex.h>
 #include <linux/rbtree.h>
 #include <linux/rcupdate.h>
 #include <linux/workqueue.h>
 
-struct drm_i915_private;
-struct i915_active_request;
-struct i915_request;
-
-typedef void (*i915_active_retire_fn)(struct i915_active_request *,
-				      struct i915_request *);
-
-struct i915_active_request {
-	struct i915_request __rcu *request;
-	struct list_head link;
-	i915_active_retire_fn retire;
+struct i915_active_fence {
+	struct dma_fence __rcu *fence;
+	struct dma_fence_cb cb;
 };
 
 struct active_node;
 
 struct i915_active {
-	struct drm_i915_private *i915;
-
 	struct active_node *cache;
 	struct rb_root tree;
 	struct mutex mutex;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9422821c9d58..94901d2d6573 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -952,28 +952,38 @@ wait_for_timelines(struct drm_i915_private *i915,
 
 	mutex_lock(&gt->mutex);
 	list_for_each_entry(tl, &gt->active_list, link) {
-		struct i915_request *rq;
+		struct dma_fence *fence;
 
-		rq = i915_active_request_get_unlocked(&tl->last_request);
-		if (!rq)
+		fence = i915_active_fence_get(&tl->last_request);
+		if (!fence)
 			continue;
 
 		mutex_unlock(&gt->mutex);
 
-		/*
-		 * "Race-to-idle".
-		 *
-		 * Switching to the kernel context is often used a synchronous
-		 * step prior to idling, e.g. in suspend for flushing all
-		 * current operations to memory before sleeping. These we
-		 * want to complete as quickly as possible to avoid prolonged
-		 * stalls, so allow the gpu to boost to maximum clocks.
-		 */
-		if (flags & I915_WAIT_FOR_IDLE_BOOST)
-			gen6_rps_boost(rq);
+		if (!dma_fence_is_i915(fence)) {
+			timeout = dma_fence_wait_timeout(fence,
+							 flags & I915_WAIT_INTERRUPTIBLE,
+							 timeout);
+		} else {
+			struct i915_request *rq = to_request(fence);
+
+			/*
+			 * "Race-to-idle".
+			 *
+			 * Switching to the kernel context is often used as
+			 * a synchronous step prior to idling, e.g. in suspend
+			 * for flushing all current operations to memory before
+			 * sleeping. These we want to complete as quickly as
+			 * possible to avoid prolonged stalls, so allow the gpu
+			 * to boost to maximum clocks.
+			 */
+			if (flags & I915_WAIT_FOR_IDLE_BOOST)
+				gen6_rps_boost(rq);
+
+			timeout = i915_request_wait(rq, flags, timeout);
+		}
 
-		timeout = i915_request_wait(rq, flags, timeout);
-		i915_request_put(rq);
+		dma_fence_put(fence);
 		if (timeout < 0)
 			return timeout;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 7195c3c09656..7c5970f51523 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2061,7 +2061,7 @@ static struct i915_vma *pd_vma_create(struct gen6_ppgtt *ppgtt, int size)
 	if (!vma)
 		return ERR_PTR(-ENOMEM);
 
-	i915_active_init(i915, &vma->active, NULL, NULL);
+	i915_active_init(&vma->active, NULL, NULL);
 
 	vma->vm = &ggtt->vm;
 	vma->ops = &pd_vma_ops;
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 865b49928ff9..d43f1d3d50ba 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -229,9 +229,8 @@ static void free_capture_list(struct i915_request *request)
 
 static bool i915_request_retire(struct i915_request *rq)
 {
-	struct i915_active_request *active, *next;
+	lockdep_assert_held(&rq->timeline->mutex);
 
-	lockdep_assert_held(&rq->i915->drm.struct_mutex);
 	if (!i915_request_completed(rq))
 		return false;
 
@@ -245,35 +244,6 @@ static bool i915_request_retire(struct i915_request *rq)
 
 	advance_ring(rq);
 
-	/*
-	 * Walk through the active list, calling retire on each. This allows
-	 * objects to track their GPU activity and mark themselves as idle
-	 * when their *last* active request is completed (updating state
-	 * tracking lists for eviction, active references for GEM, etc).
-	 *
-	 * As the ->retire() may free the node, we decouple it first and
-	 * pass along the auxiliary information (to avoid dereferencing
-	 * the node after the callback).
-	 */
-	list_for_each_entry_safe(active, next, &rq->active_list, link) {
-		/*
-		 * In microbenchmarks or focusing upon time inside the kernel,
-		 * we may spend an inordinate amount of time simply handling
-		 * the retirement of requests and processing their callbacks.
-		 * Of which, this loop itself is particularly hot due to the
-		 * cache misses when jumping around the list of
-		 * i915_active_request.  So we try to keep this loop as
-		 * streamlined as possible and also prefetch the next
-		 * i915_active_request to try and hide the likely cache miss.
-		 */
-		prefetchw(next);
-
-		INIT_LIST_HEAD(&active->link);
-		RCU_INIT_POINTER(active->request, NULL);
-
-		active->retire(active, rq);
-	}
-
 	local_irq_disable();
 
 	/*
@@ -328,11 +298,9 @@ void i915_request_retire_upto(struct i915_request *rq)
 		  rq->fence.context, rq->fence.seqno,
 		  hwsp_seqno(rq));
 
-	lockdep_assert_held(&rq->i915->drm.struct_mutex);
+	lockdep_assert_held(&rq->timeline->mutex);
 	GEM_BUG_ON(!i915_request_completed(rq));
-
-	if (list_empty(&rq->ring_link))
-		return;
+	GEM_BUG_ON(list_empty(&rq->ring_link));
 
 	do {
 		tmp = list_first_entry(&ring->request_list,
@@ -567,6 +535,7 @@ static void ring_retire_requests(struct intel_ring *ring)
 {
 	struct i915_request *rq, *rn;
 
+	lockdep_assert_held(&ring->timeline->mutex);
 	list_for_each_entry_safe(rq, rn, &ring->request_list, ring_link)
 		if (!i915_request_retire(rq))
 			break;
@@ -687,7 +656,6 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
 	rq->waitboost = false;
 	rq->execution_mask = ALL_ENGINES;
 
-	INIT_LIST_HEAD(&rq->active_list);
 	INIT_LIST_HEAD(&rq->execute_cb);
 
 	/*
@@ -726,7 +694,6 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
 	ce->ring->emit = rq->head;
 
 	/* Make sure we didn't add ourselves to external state before freeing */
-	GEM_BUG_ON(!list_empty(&rq->active_list));
 	GEM_BUG_ON(!list_empty(&rq->sched.signalers_list));
 	GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
 
@@ -1116,7 +1083,8 @@ __i915_request_add_to_timeline(struct i915_request *rq)
 	 * precludes optimising to use semaphores serialisation of a single
 	 * timeline across engines.
 	 */
-	prev = rcu_dereference_protected(timeline->last_request.request, 1);
+	prev = to_request(__i915_active_fence_set(&timeline->last_request,
+						  &rq->fence));
 	if (prev && !i915_request_completed(prev)) {
 		if (is_power_of_2(prev->engine->mask | rq->engine->mask))
 			i915_sw_fence_await_sw_fence(&rq->submit,
@@ -1141,7 +1109,6 @@ __i915_request_add_to_timeline(struct i915_request *rq)
 	 * us, the timeline will hold its seqno which is later than ours.
 	 */
 	GEM_BUG_ON(timeline->seqno != rq->fence.seqno);
-	__i915_active_request_set(&timeline->last_request, rq);
 
 	return prev;
 }
@@ -1372,10 +1339,6 @@ static void request_wait_wake(struct dma_fence *fence, struct dma_fence_cb *cb)
  * maximum of @timeout jiffies (with MAX_SCHEDULE_TIMEOUT implying an
  * unbounded wait).
  *
- * If the caller holds the struct_mutex, the caller must pass I915_WAIT_LOCKED
- * in via the flags, and vice versa if the struct_mutex is not held, the caller
- * must not specify that the wait is locked.
- *
  * Returns the remaining time (in jiffies) if the request completed, which may
  * be zero or -ETIME if the request is unfinished after the timeout expires.
  * May return -EINTR is called with I915_WAIT_INTERRUPTIBLE and a signal is
@@ -1495,7 +1458,11 @@ bool i915_retire_requests(struct drm_i915_private *i915)
 	list_for_each_entry_safe(ring, tmp,
 				 &i915->gt.active_rings, active_link) {
 		intel_ring_get(ring); /* last rq holds reference! */
+		mutex_lock(&ring->timeline->mutex);
+
 		ring_retire_requests(ring);
+
+		mutex_unlock(&ring->timeline->mutex);
 		intel_ring_put(ring);
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index bebc1e9b4a5e..8277cff0df70 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -211,7 +211,6 @@ struct i915_request {
 	 * on the active_list (of their final request).
 	 */
 	struct i915_capture_list *capture_list;
-	struct list_head active_list;
 
 	/** Time at which this request was emitted, in jiffies. */
 	unsigned long emitted_jiffies;
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index e2f9336bab83..4ffb61991768 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -179,8 +179,7 @@ cacheline_alloc(struct i915_timeline_hwsp *hwsp, unsigned int cacheline)
 	cl->hwsp = hwsp;
 	cl->vaddr = page_pack_bits(vaddr, cacheline);
 
-	i915_active_init(hwsp_to_i915(hwsp), &cl->active,
-			 __cacheline_active, __cacheline_retire);
+	i915_active_init(&cl->active, __cacheline_active, __cacheline_retire);
 
 	return cl;
 }
@@ -263,7 +262,7 @@ int i915_timeline_init(struct drm_i915_private *i915,
 
 	mutex_init(&timeline->mutex);
 
-	INIT_ACTIVE_REQUEST(&timeline->last_request);
+	INIT_ACTIVE_FENCE(&timeline->last_request);
 	INIT_LIST_HEAD(&timeline->requests);
 
 	i915_syncmap_init(&timeline->sync);
@@ -463,7 +462,7 @@ __i915_timeline_get_seqno(struct i915_timeline *tl,
 	 * all writes into the cacheline from previous requests are complete.
 	 */
 	err = i915_active_ref(&tl->hwsp_cacheline->active,
-			      tl->fence_context, rq);
+			      tl->fence_context, &rq->fence);
 	if (err)
 		goto err_cacheline;
 
@@ -514,7 +513,7 @@ int i915_timeline_get_seqno(struct i915_timeline *tl,
 static int cacheline_ref(struct i915_timeline_cacheline *cl,
 			 struct i915_request *rq)
 {
-	return i915_active_ref(&cl->active, rq->fence.context, rq);
+	return i915_active_ref(&cl->active, rq->fence.context, &rq->fence);
 }
 
 int i915_timeline_read_hwsp(struct i915_request *from,
diff --git a/drivers/gpu/drm/i915/i915_timeline_types.h b/drivers/gpu/drm/i915/i915_timeline_types.h
index fce5cb4f1090..5af6d185d70c 100644
--- a/drivers/gpu/drm/i915/i915_timeline_types.h
+++ b/drivers/gpu/drm/i915/i915_timeline_types.h
@@ -45,7 +45,7 @@ struct i915_timeline {
 	 * the request using i915_active_request_get_request_rcu(), or hold the
 	 * struct_mutex.
 	 */
-	struct i915_active_request last_request;
+	struct i915_active_fence last_request;
 
 	/**
 	 * We track the most recent seqno that we wait on in every context so
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 041638a50006..311eb9837699 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -115,8 +115,7 @@ vma_create(struct drm_i915_gem_object *obj,
 	vma->size = obj->base.size;
 	vma->display_alignment = I915_GTT_MIN_ALIGNMENT;
 
-	i915_active_init(vm->i915, &vma->active,
-			 __i915_vma_active, __i915_vma_retire);
+	i915_active_init(&vma->active, __i915_vma_active, __i915_vma_retire);
 
 	INIT_LIST_HEAD(&vma->closed_link);
 
@@ -910,7 +909,7 @@ int i915_vma_move_to_active(struct i915_vma *vma,
 	 * add the active reference first and queue for it to be dropped
 	 * *last*.
 	 */
-	err = i915_active_ref(&vma->active, rq->fence.context, rq);
+	err = i915_active_ref(&vma->active, rq->fence.context, &rq->fence);
 	if (unlikely(err))
 		return err;
 
@@ -921,7 +920,7 @@ int i915_vma_move_to_active(struct i915_vma *vma,
 		if (intel_frontbuffer_invalidate(obj->frontbuffer, ORIGIN_CS))
 			i915_active_ref(&obj->frontbuffer->write,
 					rq->fence.context,
-					rq);
+					&rq->fence);
 
 		obj->read_domains = 0;
 	}
@@ -940,33 +939,11 @@ int i915_vma_unbind(struct i915_vma *vma)
 
 	lockdep_assert_held(&vma->vm->i915->drm.struct_mutex);
 
-	/*
-	 * First wait upon any activity as retiring the request may
-	 * have side-effects such as unpinning or even unbinding this vma.
-	 */
-	might_sleep();
-	if (i915_vma_is_active(vma)) {
-		/*
-		 * When a closed VMA is retired, it is unbound - eek.
-		 * In order to prevent it from being recursively closed,
-		 * take a pin on the vma so that the second unbind is
-		 * aborted.
-		 *
-		 * Even more scary is that the retire callback may free
-		 * the object (last active vma). To prevent the explosion
-		 * we defer the actual object free to a worker that can
-		 * only proceed once it acquires the struct_mutex (which
-		 * we currently hold, therefore it cannot free this object
-		 * before we are finished).
-		 */
-		__i915_vma_pin(vma);
-		ret = i915_active_wait(&vma->active);
-		__i915_vma_unpin(vma);
-		if (ret)
-			return ret;
-	}
-	GEM_BUG_ON(i915_vma_is_active(vma));
+	ret = i915_active_wait(&vma->active);
+	if (ret)
+		return ret;
 
+	GEM_BUG_ON(i915_vma_is_active(vma));
 	if (i915_vma_is_pinned(vma)) {
 		vma_print_allocator(vma, "is pinned");
 		return -EBUSY;
diff --git a/drivers/gpu/drm/i915/intel_frontbuffer.c b/drivers/gpu/drm/i915/intel_frontbuffer.c
index ecbed110aec9..1299bb341c53 100644
--- a/drivers/gpu/drm/i915/intel_frontbuffer.c
+++ b/drivers/gpu/drm/i915/intel_frontbuffer.c
@@ -247,8 +247,7 @@ intel_frontbuffer_get(struct drm_i915_gem_object *obj)
 	front->obj = obj;
 	kref_init(&front->ref);
 	atomic_set(&front->bits, 0);
-	i915_active_init(i915, &front->write,
-			 frontbuffer_active, frontbuffer_retire);
+	i915_active_init(&front->write, frontbuffer_active, frontbuffer_retire);
 
 	spin_lock(&i915->fb_tracking.lock);
 	if (obj->frontbuffer) {
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index f0743af79fcb..d50a0196b166 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -230,7 +230,8 @@ alloc_request(struct intel_overlay *overlay, void (*fn)(struct intel_overlay *))
 	if (IS_ERR(rq))
 		return rq;
 
-	err = i915_active_ref(&overlay->last_flip, rq->fence.context, rq);
+	err = i915_active_ref(&overlay->last_flip,
+			      rq->fence.context, &rq->fence);
 	if (err) {
 		i915_request_add(rq);
 		return ERR_PTR(err);
@@ -1366,8 +1367,7 @@ void intel_overlay_setup(struct drm_i915_private *dev_priv)
 	overlay->contrast = 75;
 	overlay->saturation = 146;
 
-	i915_active_init(dev_priv,
-			 &overlay->last_flip,
+	i915_active_init(&overlay->last_flip,
 			 NULL, intel_overlay_last_flip_retire);
 
 	ret = get_registers(overlay, OVERLAY_NEEDS_PHYSICAL(dev_priv));
diff --git a/drivers/gpu/drm/i915/selftests/i915_active.c b/drivers/gpu/drm/i915/selftests/i915_active.c
index 3b3ca5658122..0e8bf8bce146 100644
--- a/drivers/gpu/drm/i915/selftests/i915_active.c
+++ b/drivers/gpu/drm/i915/selftests/i915_active.c
@@ -36,7 +36,7 @@ static int __live_active_setup(struct drm_i915_private *i915,
 	if (!submit)
 		return -ENOMEM;
 
-	i915_active_init(i915, &active->base, NULL, __live_active_retire);
+	i915_active_init(&active->base, NULL, __live_active_retire);
 	active->retired = false;
 
 	err = i915_active_acquire(&active->base);
@@ -57,7 +57,8 @@ static int __live_active_setup(struct drm_i915_private *i915,
 						       GFP_KERNEL);
 		if (err >= 0)
 			err = i915_active_ref(&active->base,
-					      rq->fence.context, rq);
+					      rq->fence.context,
+					      &rq->fence);
 		i915_request_add(rq);
 		if (err) {
 			pr_err("Failed to track active ref!\n");
@@ -89,14 +90,10 @@ static int live_active_wait(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
 	struct live_active active;
-	intel_wakeref_t wakeref;
 	int err;
 
 	/* Check that we get a callback when requests retire upon waiting */
 
-	mutex_lock(&i915->drm.struct_mutex);
-	wakeref = intel_runtime_pm_get(i915);
-
 	err = __live_active_setup(i915, &active);
 
 	i915_active_wait(&active.base);
@@ -106,11 +103,12 @@ static int live_active_wait(void *arg)
 	}
 
 	i915_active_fini(&active.base);
+
+	mutex_lock(&i915->drm.struct_mutex);
 	if (igt_flush_test(i915, I915_WAIT_LOCKED))
 		err = -EIO;
-
-	intel_runtime_pm_put(i915, wakeref);
 	mutex_unlock(&i915->drm.struct_mutex);
+
 	return err;
 }
 
@@ -118,19 +116,17 @@ static int live_active_retire(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
 	struct live_active active;
-	intel_wakeref_t wakeref;
 	int err;
 
 	/* Check that we get a callback when requests are indirectly retired */
 
-	mutex_lock(&i915->drm.struct_mutex);
-	wakeref = intel_runtime_pm_get(i915);
-
 	err = __live_active_setup(i915, &active);
 
 	/* waits for & retires all requests */
+	mutex_lock(&i915->drm.struct_mutex);
 	if (igt_flush_test(i915, I915_WAIT_LOCKED))
 		err = -EIO;
+	mutex_unlock(&i915->drm.struct_mutex);
 
 	if (!active.retired) {
 		pr_err("i915_active not retired after flushing!\n");
@@ -138,8 +134,6 @@ static int live_active_retire(void *arg)
 	}
 
 	i915_active_fini(&active.base);
-	intel_runtime_pm_put(i915, wakeref);
-	mutex_unlock(&i915->drm.struct_mutex);
 	return err;
 }
 
diff --git a/drivers/gpu/drm/i915/selftests/mock_timeline.c b/drivers/gpu/drm/i915/selftests/mock_timeline.c
index 65b52be23d42..024de718f66f 100644
--- a/drivers/gpu/drm/i915/selftests/mock_timeline.c
+++ b/drivers/gpu/drm/i915/selftests/mock_timeline.c
@@ -15,7 +15,7 @@ void mock_timeline_init(struct i915_timeline *timeline, u64 context)
 
 	mutex_init(&timeline->mutex);
 
-	INIT_ACTIVE_REQUEST(&timeline->last_request);
+	INIT_ACTIVE_FENCE(&timeline->last_request);
 	INIT_LIST_HEAD(&timeline->requests);
 
 	i915_syncmap_init(&timeline->sync);
-- 
2.20.1


* [PATCH 25/39] drm/i915: Track ggtt fence reservations under its own mutex
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (23 preceding siblings ...)
  2019-06-14  7:10 ` [PATCH 24/39] drm/i915: Coordinate i915_active with its own mutex Chris Wilson
@ 2019-06-14  7:10 ` Chris Wilson
  2019-06-14  7:10 ` [PATCH 26/39] drm/i915: Only track bound elements of the GTT Chris Wilson
                   ` (15 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:10 UTC (permalink / raw)
  To: intel-gfx

We can reduce the locking for fence registers from the dev->struct_mutex
to a more local mutex. We could introduce a mutex for the sole purpose of
tracking the fence acquisition, except that there is a little overlap
with the fault tracking, so use the i915_ggtt.vm.mutex as it covers both.
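
A side effect worth spelling out: i915_vma_pin_fence() now takes
vma->vm->mutex itself and asserts that the vma is already pinned, so
callers must hold a pin on the vma (and the runtime pm wakeref) before
acquiring the fence. A minimal sketch of the resulting usage, following
the selftest_hangcheck fixup below (error unwind abbreviated, the fenced
access itself is hypothetical):

	err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL | PIN_MAPPABLE);
	if (err)
		return err;

	err = i915_vma_pin_fence(vma); /* locks vma->vm->mutex internally */
	if (err) {
		i915_vma_unpin(vma);
		return err;
	}

	/* ... access through the fenced (detiled) view ... */

	i915_vma_unpin_fence(vma); /* just atomic_dec(&fence->pin_count) */
	i915_vma_unpin(vma);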

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c |   7 ++
 drivers/gpu/drm/i915/gvt/aperture_gm.c       |  10 +-
 drivers/gpu/drm/i915/i915_debugfs.c          |   5 +-
 drivers/gpu/drm/i915/i915_gem_fence_reg.c    | 108 ++++++++++++-------
 drivers/gpu/drm/i915/i915_gem_fence_reg.h    |   2 +-
 drivers/gpu/drm/i915/i915_vma.h              |   4 +-
 6 files changed, 87 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index 8f2f07cae7bd..458c6559312d 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -1161,7 +1161,14 @@ static int evict_fence(void *data)
 		goto out_unlock;
 	}
 
+	err = i915_vma_pin(arg->vma, 0, 0, PIN_GLOBAL | PIN_MAPPABLE);
+	if (err) {
+		pr_err("Unable to pin vma for Y-tiled fence; err:%d\n", err);
+		goto out_unlock;
+	}
+
 	err = i915_vma_pin_fence(arg->vma);
+	i915_vma_unpin(arg->vma);
 	if (err) {
 		pr_err("Unable to pin Y-tiled fence; err:%d\n", err);
 		goto out_unlock;
diff --git a/drivers/gpu/drm/i915/gvt/aperture_gm.c b/drivers/gpu/drm/i915/gvt/aperture_gm.c
index 4098902bfaeb..2bd9dcebd48c 100644
--- a/drivers/gpu/drm/i915/gvt/aperture_gm.c
+++ b/drivers/gpu/drm/i915/gvt/aperture_gm.c
@@ -172,14 +172,14 @@ static void free_vgpu_fence(struct intel_vgpu *vgpu)
 
 	intel_runtime_pm_get(dev_priv);
 
-	mutex_lock(&dev_priv->drm.struct_mutex);
+	mutex_lock(&dev_priv->ggtt.vm.mutex);
 	_clear_vgpu_fence(vgpu);
 	for (i = 0; i < vgpu_fence_sz(vgpu); i++) {
 		reg = vgpu->fence.regs[i];
 		i915_unreserve_fence(reg);
 		vgpu->fence.regs[i] = NULL;
 	}
-	mutex_unlock(&dev_priv->drm.struct_mutex);
+	mutex_unlock(&dev_priv->ggtt.vm.mutex);
 
 	intel_runtime_pm_put_unchecked(dev_priv);
 }
@@ -194,7 +194,7 @@ static int alloc_vgpu_fence(struct intel_vgpu *vgpu)
 	intel_runtime_pm_get(dev_priv);
 
 	/* Request fences from host */
-	mutex_lock(&dev_priv->drm.struct_mutex);
+	mutex_lock(&dev_priv->ggtt.vm.mutex);
 
 	for (i = 0; i < vgpu_fence_sz(vgpu); i++) {
 		reg = i915_reserve_fence(dev_priv);
@@ -206,7 +206,7 @@ static int alloc_vgpu_fence(struct intel_vgpu *vgpu)
 
 	_clear_vgpu_fence(vgpu);
 
-	mutex_unlock(&dev_priv->drm.struct_mutex);
+	mutex_unlock(&dev_priv->ggtt.vm.mutex);
 	intel_runtime_pm_put_unchecked(dev_priv);
 	return 0;
 out_free_fence:
@@ -219,7 +219,7 @@ static int alloc_vgpu_fence(struct intel_vgpu *vgpu)
 		i915_unreserve_fence(reg);
 		vgpu->fence.regs[i] = NULL;
 	}
-	mutex_unlock(&dev_priv->drm.struct_mutex);
+	mutex_unlock(&dev_priv->ggtt.vm.mutex);
 	intel_runtime_pm_put_unchecked(dev_priv);
 	return -ENOSPC;
 }
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index a3f1aff3c5c2..b54f7b79a53c 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -697,10 +697,11 @@ static int i915_gem_fence_regs_info(struct seq_file *m, void *data)
 
 	rcu_read_lock();
 	for (i = 0; i < i915->ggtt.num_fences; i++) {
-		struct i915_vma *vma = i915->ggtt.fence_regs[i].vma;
+		struct i915_fence_reg *reg = &i915->ggtt.fence_regs[i];
+		struct i915_vma *vma = reg->vma;
 
 		seq_printf(m, "Fence %d, pin count = %d, object = ",
-			   i, i915->ggtt.fence_regs[i].pin_count);
+			   i, atomic_read(&reg->pin_count));
 		if (!vma)
 			seq_puts(m, "unused");
 		else
diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
index 6378c330809f..f24b626bf0f8 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
@@ -299,15 +299,24 @@ static int fence_update(struct i915_fence_reg *fence,
  */
 int i915_vma_put_fence(struct i915_vma *vma)
 {
+	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm);
 	struct i915_fence_reg *fence = vma->fence;
+	int err;
 
 	if (!fence)
 		return 0;
 
-	if (fence->pin_count)
+	if (atomic_read(&fence->pin_count))
 		return -EBUSY;
 
-	return fence_update(fence, NULL);
+	err = mutex_lock_interruptible(&ggtt->vm.mutex);
+	if (err)
+		return err;
+
+	err = fence_update(fence, NULL);
+	mutex_unlock(&ggtt->vm.mutex);
+
+	return err;
 }
 
 static struct i915_fence_reg *fence_find(struct drm_i915_private *i915)
@@ -317,7 +326,7 @@ static struct i915_fence_reg *fence_find(struct drm_i915_private *i915)
 	list_for_each_entry(fence, &i915->ggtt.fence_list, link) {
 		GEM_BUG_ON(fence->vma && fence->vma->fence != fence);
 
-		if (fence->pin_count)
+		if (atomic_read(&fence->pin_count))
 			continue;
 
 		return fence;
@@ -330,6 +339,48 @@ static struct i915_fence_reg *fence_find(struct drm_i915_private *i915)
 	return ERR_PTR(-EDEADLK);
 }
 
+static int __i915_vma_pin_fence(struct i915_vma *vma)
+{
+	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm);
+	struct i915_fence_reg *fence;
+	struct i915_vma *set = i915_gem_object_is_tiled(vma->obj) ? vma : NULL;
+	int err;
+
+	/* Just update our place in the LRU if our fence is getting reused. */
+	if (vma->fence) {
+		fence = vma->fence;
+		GEM_BUG_ON(fence->vma != vma);
+		atomic_inc(&fence->pin_count);
+		if (!fence->dirty) {
+			list_move_tail(&fence->link, &ggtt->fence_list);
+			return 0;
+		}
+	} else if (set) {
+		fence = fence_find(vma->vm->i915);
+		if (IS_ERR(fence))
+			return PTR_ERR(fence);
+
+		GEM_BUG_ON(atomic_read(&fence->pin_count));
+		atomic_inc(&fence->pin_count);
+	} else {
+		return 0;
+	}
+
+	err = fence_update(fence, set);
+	if (err)
+		goto out_unpin;
+
+	GEM_BUG_ON(fence->vma != set);
+	GEM_BUG_ON(vma->fence != (set ? fence : NULL));
+
+	if (set)
+		return 0;
+
+out_unpin:
+	atomic_dec(&fence->pin_count);
+	return err;
+}
+
 /**
  * i915_vma_pin_fence - set up fencing for a vma
  * @vma: vma to map through a fence reg
@@ -350,8 +401,6 @@ static struct i915_fence_reg *fence_find(struct drm_i915_private *i915)
  */
 int i915_vma_pin_fence(struct i915_vma *vma)
 {
-	struct i915_fence_reg *fence;
-	struct i915_vma *set = i915_gem_object_is_tiled(vma->obj) ? vma : NULL;
 	int err;
 
 	/*
@@ -359,39 +408,16 @@ int i915_vma_pin_fence(struct i915_vma *vma)
 	 * must keep the device awake whilst using the fence.
 	 */
 	assert_rpm_wakelock_held(vma->vm->i915);
+	GEM_BUG_ON(!i915_vma_is_pinned(vma));
+	GEM_BUG_ON(!i915_vma_is_ggtt(vma));
 
-	/* Just update our place in the LRU if our fence is getting reused. */
-	if (vma->fence) {
-		fence = vma->fence;
-		GEM_BUG_ON(fence->vma != vma);
-		fence->pin_count++;
-		if (!fence->dirty) {
-			list_move_tail(&fence->link,
-				       &fence->i915->ggtt.fence_list);
-			return 0;
-		}
-	} else if (set) {
-		fence = fence_find(vma->vm->i915);
-		if (IS_ERR(fence))
-			return PTR_ERR(fence);
-
-		GEM_BUG_ON(fence->pin_count);
-		fence->pin_count++;
-	} else
-		return 0;
-
-	err = fence_update(fence, set);
+	err = mutex_lock_interruptible(&vma->vm->mutex);
 	if (err)
-		goto out_unpin;
+		return err;
 
-	GEM_BUG_ON(fence->vma != set);
-	GEM_BUG_ON(vma->fence != (set ? fence : NULL));
-
-	if (set)
-		return 0;
+	err = __i915_vma_pin_fence(vma);
+	mutex_unlock(&vma->vm->mutex);
 
-out_unpin:
-	fence->pin_count--;
 	return err;
 }
 
@@ -404,16 +430,17 @@ int i915_vma_pin_fence(struct i915_vma *vma)
  */
 struct i915_fence_reg *i915_reserve_fence(struct drm_i915_private *i915)
 {
+	struct i915_ggtt *ggtt = &i915->ggtt;
 	struct i915_fence_reg *fence;
 	int count;
 	int ret;
 
-	lockdep_assert_held(&i915->drm.struct_mutex);
+	lockdep_assert_held(&ggtt->vm.mutex);
 
 	/* Keep at least one fence available for the display engine. */
 	count = 0;
-	list_for_each_entry(fence, &i915->ggtt.fence_list, link)
-		count += !fence->pin_count;
+	list_for_each_entry(fence, &ggtt->fence_list, link)
+		count += !atomic_read(&fence->pin_count);
 	if (count <= 1)
 		return ERR_PTR(-ENOSPC);
 
@@ -429,6 +456,7 @@ struct i915_fence_reg *i915_reserve_fence(struct drm_i915_private *i915)
 	}
 
 	list_del(&fence->link);
+
 	return fence;
 }
 
@@ -440,9 +468,11 @@ struct i915_fence_reg *i915_reserve_fence(struct drm_i915_private *i915)
  */
 void i915_unreserve_fence(struct i915_fence_reg *fence)
 {
-	lockdep_assert_held(&fence->i915->drm.struct_mutex);
+	struct i915_ggtt *ggtt = &fence->i915->ggtt;
+
+	lockdep_assert_held(&ggtt->vm.mutex);
 
-	list_add(&fence->link, &fence->i915->ggtt.fence_list);
+	list_add(&fence->link, &ggtt->fence_list);
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.h b/drivers/gpu/drm/i915/i915_gem_fence_reg.h
index d2da98828179..d7c6ebf789c1 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence_reg.h
+++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.h
@@ -40,7 +40,7 @@ struct i915_fence_reg {
 	struct list_head link;
 	struct drm_i915_private *i915;
 	struct i915_vma *vma;
-	int pin_count;
+	atomic_t pin_count;
 	int id;
 	/**
 	 * Whether the tiling parameters for the currently
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 17e6a038bd36..71088ff4ad59 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -418,8 +418,8 @@ int __must_check i915_vma_put_fence(struct i915_vma *vma);
 
 static inline void __i915_vma_unpin_fence(struct i915_vma *vma)
 {
-	GEM_BUG_ON(vma->fence->pin_count <= 0);
-	vma->fence->pin_count--;
+	GEM_BUG_ON(atomic_read(&vma->fence->pin_count) <= 0);
+	atomic_dec(&vma->fence->pin_count);
 }
 
 /**
-- 
2.20.1


* [PATCH 26/39] drm/i915: Only track bound elements of the GTT
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (24 preceding siblings ...)
  2019-06-14  7:10 ` [PATCH 25/39] drm/i915: Track ggtt fence reservations under " Chris Wilson
@ 2019-06-14  7:10 ` Chris Wilson
  2019-06-14  7:10 ` [PATCH 27/39] drm/i915: Explicitly cleanup initial_plane_config Chris Wilson
                   ` (14 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:10 UTC (permalink / raw)
  To: intel-gfx

The premise here is simply to avoid having to acquire the vm->mutex
inside vma create/destroy to update the vm->unbound_list, which would
otherwise lead to some nasty lock recursions later.
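
In other words, a vma now only appears on vm->bound_list between
inserting and removing its GTT node; a freshly created (or about to be
freed) vma is simply invisible to the vm. A sketch of the resulting
lifecycle, using the helpers touched in the diff below (not a literal
call chain, flags left unspecified):

	vma = i915_vma_instance(obj, vm, NULL); /* no vm->mutex, no list */

	err = i915_vma_pin(vma, 0, 0, flags);
	/* i915_vma_insert: list_add_tail onto vm->bound_list under vm->mutex */

	i915_vma_unpin(vma);
	err = i915_vma_unbind(vma);
	/* i915_vma_remove: list_del from vm->bound_list under vm->mutex */

	i915_vma_destroy(vma); /* again touches no vm state */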

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |  2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c           | 23 ++++---------------
 drivers/gpu/drm/i915/i915_gem_gtt.h           |  5 ----
 drivers/gpu/drm/i915/i915_vma.c               | 12 ++--------
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  2 +-
 5 files changed, 8 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index de1fab2058ec..e8bc9b46d4e8 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -686,7 +686,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *dev_priv
 	__i915_vma_set_map_and_fenceable(vma);
 
 	mutex_lock(&ggtt->vm.mutex);
-	list_move_tail(&vma->vm_link, &ggtt->vm.bound_list);
+	list_add_tail(&vma->vm_link, &ggtt->vm.bound_list);
 	mutex_unlock(&ggtt->vm.mutex);
 
 	GEM_BUG_ON(i915_gem_object_is_shrinkable(obj));
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 7c5970f51523..457126973cdd 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -500,7 +500,6 @@ static void i915_address_space_init(struct i915_address_space *vm, int subclass)
 
 	stash_init(&vm->free_pages);
 
-	INIT_LIST_HEAD(&vm->unbound_list);
 	INIT_LIST_HEAD(&vm->bound_list);
 }
 
@@ -2075,10 +2074,6 @@ static struct i915_vma *pd_vma_create(struct gen6_ppgtt *ppgtt, int size)
 	INIT_LIST_HEAD(&vma->obj_link);
 	INIT_LIST_HEAD(&vma->closed_link);
 
-	mutex_lock(&vma->vm->mutex);
-	list_add(&vma->vm_link, &vma->vm->unbound_list);
-	mutex_unlock(&vma->vm->mutex);
-
 	return vma;
 }
 
@@ -2255,19 +2250,12 @@ i915_ppgtt_create(struct drm_i915_private *i915)
 
 static void ppgtt_destroy_vma(struct i915_address_space *vm)
 {
-	struct list_head *phases[] = {
-		&vm->bound_list,
-		&vm->unbound_list,
-		NULL,
-	}, **phase;
+	struct i915_vma *vma, *vn;
 
 	vm->closed = true;
-	for (phase = phases; *phase; phase++) {
-		struct i915_vma *vma, *vn;
-
-		list_for_each_entry_safe(vma, vn, *phase, vm_link)
-			i915_vma_destroy(vma);
-	}
+	list_for_each_entry_safe(vma, vn, &vm->bound_list, vm_link)
+		i915_vma_destroy(vma);
+	GEM_BUG_ON(!list_empty(&vm->bound_list));
 }
 
 void i915_vm_release(struct kref *kref)
@@ -2280,9 +2268,6 @@ void i915_vm_release(struct kref *kref)
 
 	ppgtt_destroy_vma(vm);
 
-	GEM_BUG_ON(!list_empty(&vm->bound_list));
-	GEM_BUG_ON(!list_empty(&vm->unbound_list));
-
 	vm->cleanup(vm);
 	i915_address_space_fini(vm);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 63fa357c69de..889c4e88c598 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -328,11 +328,6 @@ struct i915_address_space {
 	 */
 	struct list_head bound_list;
 
-	/**
-	 * List of vma that are not unbound.
-	 */
-	struct list_head unbound_list;
-
 	struct pagestash free_pages;
 
 	/* Global GTT */
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 311eb9837699..14c6d104332a 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -208,10 +208,6 @@ vma_create(struct drm_i915_gem_object *obj,
 
 	spin_unlock(&obj->vma.lock);
 
-	mutex_lock(&vm->mutex);
-	list_add(&vma->vm_link, &vm->unbound_list);
-	mutex_unlock(&vm->mutex);
-
 	return vma;
 
 err_vma:
@@ -649,7 +645,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 	GEM_BUG_ON(!i915_gem_valid_gtt_space(vma, cache_level));
 
 	mutex_lock(&vma->vm->mutex);
-	list_move_tail(&vma->vm_link, &vma->vm->bound_list);
+	list_add_tail(&vma->vm_link, &vma->vm->bound_list);
 	mutex_unlock(&vma->vm->mutex);
 
 	if (vma->obj) {
@@ -677,7 +673,7 @@ i915_vma_remove(struct i915_vma *vma)
 
 	mutex_lock(&vma->vm->mutex);
 	drm_mm_remove_node(&vma->node);
-	list_move_tail(&vma->vm_link, &vma->vm->unbound_list);
+	list_del(&vma->vm_link);
 	mutex_unlock(&vma->vm->mutex);
 
 	/*
@@ -790,10 +786,6 @@ static void __i915_vma_destroy(struct i915_vma *vma)
 	GEM_BUG_ON(vma->node.allocated);
 	GEM_BUG_ON(vma->fence);
 
-	mutex_lock(&vma->vm->mutex);
-	list_del(&vma->vm_link);
-	mutex_unlock(&vma->vm->mutex);
-
 	if (vma->obj) {
 		struct drm_i915_gem_object *obj = vma->obj;
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 2093d08a7569..70c8bc626300 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -1239,7 +1239,7 @@ static void track_vma_bind(struct i915_vma *vma)
 	vma->pages = obj->mm.pages;
 
 	mutex_lock(&vma->vm->mutex);
-	list_move_tail(&vma->vm_link, &vma->vm->bound_list);
+	list_add_tail(&vma->vm_link, &vma->vm->bound_list);
 	mutex_unlock(&vma->vm->mutex);
 }
 
-- 
2.20.1


* [PATCH 27/39] drm/i915: Explicitly cleanup initial_plane_config
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (25 preceding siblings ...)
  2019-06-14  7:10 ` [PATCH 26/39] drm/i915: Only track bound elements of the GTT Chris Wilson
@ 2019-06-14  7:10 ` Chris Wilson
  2019-06-14  7:10 ` [PATCH 28/39] initial-plane-vma Chris Wilson
                   ` (13 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:10 UTC (permalink / raw)
  To: intel-gfx

I am about to stuff more objects into the plane_config and would like it
to clean up after itself. Move the current framebuffer release into a
common function so that it can later be extended to cover the new object
with relative ease.
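
The subtlety plane_config_fini() has to cope with is that the probe may
leave behind either a fully initialised framebuffer (refcounted once we
adopt it) or a bare kzalloc'ed stub that never received a reference. A
rough sketch of the intended call pattern in intel_modeset_init(),
matching the hunk below (getter named per the display vfuncs of this
era):

	struct intel_initial_plane_config plane_config = {};

	/* probe the BIOS framebuffer; may only fill in a stub fb */
	dev_priv->display.get_initial_plane_config(crtc, &plane_config);

	/* adopt it: takes its own drm_framebuffer_get() on success */
	intel_find_initial_plane_obj(crtc, &plane_config);

	/* put the real fb, or kfree() the stub that has no refcount */
	plane_config_fini(&plane_config);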

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_display.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 72e2c04b6e11..039536625bbf 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3200,8 +3200,6 @@ intel_find_initial_plane_obj(struct intel_crtc *intel_crtc,
 		goto valid_fb;
 	}
 
-	kfree(plane_config->fb);
-
 	/*
 	 * Failed to alloc the obj, check to see if we should share
 	 * an fb with another CRTC instead
@@ -3221,7 +3219,6 @@ intel_find_initial_plane_obj(struct intel_crtc *intel_crtc,
 
 		if (intel_plane_ggtt_offset(state) == plane_config->base) {
 			fb = state->base.fb;
-			drm_framebuffer_get(fb);
 			goto valid_fb;
 		}
 	}
@@ -3256,7 +3253,6 @@ intel_find_initial_plane_obj(struct intel_crtc *intel_crtc,
 			  intel_crtc->pipe, PTR_ERR(intel_state->vma));
 
 		intel_state->vma = NULL;
-		drm_framebuffer_put(fb);
 		return;
 	}
 
@@ -3278,8 +3274,9 @@ intel_find_initial_plane_obj(struct intel_crtc *intel_crtc,
 	if (plane_config->tiling)
 		dev_priv->preserve_bios_swizzle = true;
 
-	plane_state->fb = fb;
 	plane_state->crtc = &intel_crtc->base;
+	plane_state->fb = fb;
+	drm_framebuffer_get(fb);
 
 	atomic_or(to_intel_plane(primary)->frontbuffer_bit,
 		  &to_intel_frontbuffer(fb)->bits);
@@ -15910,6 +15907,19 @@ static int intel_initial_commit(struct drm_device *dev)
 	return ret;
 }
 
+static void plane_config_fini(struct intel_initial_plane_config *plane_config)
+{
+	if (plane_config->fb) {
+		struct drm_framebuffer *fb = &plane_config->fb->base;
+
+		/* We may only have the stub and not a full framebuffer */
+		if (drm_framebuffer_read_refcount(fb))
+			drm_framebuffer_put(fb);
+		else
+			kfree(fb);
+	}
+}
+
 int intel_modeset_init(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
@@ -16047,6 +16057,8 @@ int intel_modeset_init(struct drm_device *dev)
 		 * just get the first one.
 		 */
 		intel_find_initial_plane_obj(crtc, &plane_config);
+
+		plane_config_fini(&plane_config);
 	}
 
 	/*
-- 
2.20.1


* [PATCH 28/39] initial-plane-vma
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (26 preceding siblings ...)
  2019-06-14  7:10 ` [PATCH 27/39] drm/i915: Explicitly cleanup initial_plane_config Chris Wilson
@ 2019-06-14  7:10 ` Chris Wilson
  2019-06-14  7:10 ` [PATCH 29/39] drm/i915: Make i915_vma track its own kref Chris Wilson
                   ` (12 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:10 UTC (permalink / raw)
  To: intel-gfx

---
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c |  88 +++----------
 drivers/gpu/drm/i915/i915_drv.h            |   1 -
 drivers/gpu/drm/i915/intel_display.c       | 146 +++++++++++++--------
 drivers/gpu/drm/i915/intel_drv.h           |   1 +
 drivers/gpu/drm/i915/intel_pm.c            |   1 -
 5 files changed, 109 insertions(+), 128 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index e8bc9b46d4e8..7066044d63cf 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -604,29 +604,24 @@ i915_gem_object_create_stolen(struct drm_i915_private *dev_priv,
 }
 
 struct drm_i915_gem_object *
-i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *dev_priv,
+i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *i915,
 					       resource_size_t stolen_offset,
-					       resource_size_t gtt_offset,
 					       resource_size_t size)
 {
-	struct i915_ggtt *ggtt = &dev_priv->ggtt;
 	struct drm_i915_gem_object *obj;
 	struct drm_mm_node *stolen;
-	struct i915_vma *vma;
 	int ret;
 
-	if (!drm_mm_initialized(&dev_priv->mm.stolen))
+	if (!drm_mm_initialized(&i915->mm.stolen))
 		return NULL;
 
-	lockdep_assert_held(&dev_priv->drm.struct_mutex);
-
-	DRM_DEBUG_DRIVER("creating preallocated stolen object: stolen_offset=%pa, gtt_offset=%pa, size=%pa\n",
-			 &stolen_offset, &gtt_offset, &size);
+	DRM_DEBUG_DRIVER("creating preallocated stolen object: stolen_offset=%pa, size=%pa\n",
+			 &stolen_offset, &size);
 
 	/* KISS and expect everything to be page-aligned */
-	if (WARN_ON(size == 0) ||
-	    WARN_ON(!IS_ALIGNED(size, I915_GTT_PAGE_SIZE)) ||
-	    WARN_ON(!IS_ALIGNED(stolen_offset, I915_GTT_MIN_ALIGNMENT)))
+	if (GEM_WARN_ON(size == 0) ||
+	    GEM_WARN_ON(!IS_ALIGNED(size, I915_GTT_PAGE_SIZE)) ||
+	    GEM_WARN_ON(!IS_ALIGNED(stolen_offset, I915_GTT_MIN_ALIGNMENT)))
 		return NULL;
 
 	stolen = kzalloc(sizeof(*stolen), GFP_KERNEL);
@@ -635,68 +630,21 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *dev_priv
 
 	stolen->start = stolen_offset;
 	stolen->size = size;
-	mutex_lock(&dev_priv->mm.stolen_lock);
-	ret = drm_mm_reserve_node(&dev_priv->mm.stolen, stolen);
-	mutex_unlock(&dev_priv->mm.stolen_lock);
-	if (ret) {
-		DRM_DEBUG_DRIVER("failed to allocate stolen space\n");
-		kfree(stolen);
-		return NULL;
-	}
-
-	obj = _i915_gem_object_create_stolen(dev_priv, stolen);
-	if (obj == NULL) {
-		DRM_DEBUG_DRIVER("failed to allocate stolen object\n");
-		i915_gem_stolen_remove_node(dev_priv, stolen);
-		kfree(stolen);
-		return NULL;
-	}
-
-	/* Some objects just need physical mem from stolen space */
-	if (gtt_offset == I915_GTT_OFFSET_NONE)
-		return obj;
-
-	ret = i915_gem_object_pin_pages(obj);
+	mutex_lock(&i915->mm.stolen_lock);
+	ret = drm_mm_reserve_node(&i915->mm.stolen, stolen);
+	mutex_unlock(&i915->mm.stolen_lock);
 	if (ret)
-		goto err;
+		goto err_free;
 
-	vma = i915_vma_instance(obj, &ggtt->vm, NULL);
-	if (IS_ERR(vma)) {
-		ret = PTR_ERR(vma);
-		goto err_pages;
-	}
-
-	/* To simplify the initialisation sequence between KMS and GTT,
-	 * we allow construction of the stolen object prior to
-	 * setting up the GTT space. The actual reservation will occur
-	 * later.
-	 */
-	ret = i915_gem_gtt_reserve(&ggtt->vm, &vma->node,
-				   size, gtt_offset, obj->cache_level,
-				   0);
-	if (ret) {
-		DRM_DEBUG_DRIVER("failed to allocate stolen GTT space\n");
-		goto err_pages;
-	}
-
-	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
-
-	vma->pages = obj->mm.pages;
-	vma->flags |= I915_VMA_GLOBAL_BIND;
-	__i915_vma_set_map_and_fenceable(vma);
-
-	mutex_lock(&ggtt->vm.mutex);
-	list_add_tail(&vma->vm_link, &ggtt->vm.bound_list);
-	mutex_unlock(&ggtt->vm.mutex);
-
-	GEM_BUG_ON(i915_gem_object_is_shrinkable(obj));
-	atomic_inc(&obj->bind_count);
+	obj = _i915_gem_object_create_stolen(i915, stolen);
+	if (!obj)
+		goto err_stolen;
 
 	return obj;
 
-err_pages:
-	i915_gem_object_unpin_pages(obj);
-err:
-	i915_gem_object_put(obj);
+err_stolen:
+	i915_gem_stolen_remove_node(i915, stolen);
+err_free:
+	kfree(stolen);
 	return NULL;
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3c9317592729..9a1ec487e8b1 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2703,7 +2703,6 @@ i915_gem_object_create_stolen(struct drm_i915_private *dev_priv,
 struct drm_i915_gem_object *
 i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *dev_priv,
 					       resource_size_t stolen_offset,
-					       resource_size_t gtt_offset,
 					       resource_size_t size);
 
 /* i915_gem_internal.c */
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 039536625bbf..74fc6d696540 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3041,6 +3041,71 @@ int skl_format_to_fourcc(int format, bool rgb_order, bool alpha)
 	}
 }
 
+static struct i915_vma *
+initial_plane_vma(struct drm_i915_private *i915,
+		  struct intel_initial_plane_config *plane_config)
+{
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+	u32 base, size;
+	int err;
+
+	if (plane_config->size == 0)
+		return NULL;
+
+	base = round_down(plane_config->base, PAGE_SIZE);
+	size = round_up(plane_config->base + plane_config->size, PAGE_SIZE);
+	size -= base;
+
+	/*
+	 * If the FB is too big, just don't use it since fbdev is not very
+	 * important and we should probably use that space with FBC or other
+	 * features.
+	 */
+	if (size * 2 > i915->stolen_usable_size)
+		return NULL;
+
+	obj = i915_gem_object_create_stolen_for_preallocated(i915, base, size);
+	if (!obj)
+		return NULL;
+
+	switch (plane_config->tiling) {
+	case I915_TILING_NONE:
+		break;
+	case I915_TILING_X:
+	case I915_TILING_Y:
+		obj->tiling_and_stride =
+			plane_config->fb->base.pitches[0] |
+			plane_config->tiling;
+		break;
+	default:
+		MISSING_CASE(plane_config->tiling);
+		goto err_obj;
+	}
+
+	vma = i915_vma_instance(obj, &i915->ggtt.vm, NULL);
+	if (IS_ERR(vma))
+		goto err_obj;
+
+	mutex_lock(&i915->drm.struct_mutex);
+	err = i915_vma_pin(vma, 0, 0,
+			   PIN_GLOBAL | PIN_MAPPABLE |
+			   base | PIN_OFFSET_FIXED);
+	mutex_unlock(&i915->drm.struct_mutex);
+	if (err)
+		goto err_obj;
+
+	if (i915_gem_object_is_tiled(obj) &&
+	    !i915_vma_is_map_and_fenceable(vma))
+		goto err_obj;
+
+	return vma;
+
+err_obj:
+	i915_gem_object_put(obj);
+	return NULL;
+}
+
 static bool
 intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
 			      struct intel_initial_plane_config *plane_config)
@@ -3049,22 +3114,7 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct drm_mode_fb_cmd2 mode_cmd = { 0 };
 	struct drm_framebuffer *fb = &plane_config->fb->base;
-	u32 base_aligned = round_down(plane_config->base, PAGE_SIZE);
-	u32 size_aligned = round_up(plane_config->base + plane_config->size,
-				    PAGE_SIZE);
-	struct drm_i915_gem_object *obj;
-	bool ret = false;
-
-	size_aligned -= base_aligned;
-
-	if (plane_config->size == 0)
-		return false;
-
-	/* If the FB is too big, just don't use it since fbdev is not very
-	 * important and we should probably use that space with FBC or other
-	 * features. */
-	if (size_aligned * 2 > dev_priv->stolen_usable_size)
-		return false;
+	struct i915_vma *vma;
 
 	switch (fb->modifier) {
 	case DRM_FORMAT_MOD_LINEAR:
@@ -3077,27 +3127,10 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
 		return false;
 	}
 
-	mutex_lock(&dev->struct_mutex);
-	obj = i915_gem_object_create_stolen_for_preallocated(dev_priv,
-							     base_aligned,
-							     base_aligned,
-							     size_aligned);
-	mutex_unlock(&dev->struct_mutex);
-	if (!obj)
+	vma = initial_plane_vma(dev_priv, plane_config);
+	if (!vma)
 		return false;
 
-	switch (plane_config->tiling) {
-	case I915_TILING_NONE:
-		break;
-	case I915_TILING_X:
-	case I915_TILING_Y:
-		obj->tiling_and_stride = fb->pitches[0] | plane_config->tiling;
-		break;
-	default:
-		MISSING_CASE(plane_config->tiling);
-		goto out;
-	}
-
 	mode_cmd.pixel_format = fb->format->format;
 	mode_cmd.width = fb->width;
 	mode_cmd.height = fb->height;
@@ -3105,17 +3138,18 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
 	mode_cmd.modifier[0] = fb->modifier;
 	mode_cmd.flags = DRM_MODE_FB_MODIFIERS;
 
-	if (intel_framebuffer_init(to_intel_framebuffer(fb), obj, &mode_cmd)) {
+	if (intel_framebuffer_init(to_intel_framebuffer(fb),
+				   vma->obj, &mode_cmd)) {
 		DRM_DEBUG_KMS("intel fb init failed\n");
-		goto out;
+		goto err_vma;
 	}
 
+	plane_config->vma = vma;
+	return true;
 
-	DRM_DEBUG_KMS("initial plane fb obj %p\n", obj);
-	ret = true;
-out:
-	i915_gem_object_put(obj);
-	return ret;
+err_vma:
+	i915_vma_put(vma);
+	return false;
 }
 
 static void
@@ -3191,12 +3225,14 @@ intel_find_initial_plane_obj(struct intel_crtc *intel_crtc,
 	struct intel_plane_state *intel_state =
 		to_intel_plane_state(plane_state);
 	struct drm_framebuffer *fb;
+	struct i915_vma *vma;
 
 	if (!plane_config->fb)
 		return;
 
 	if (intel_alloc_initial_plane_obj(intel_crtc, plane_config)) {
 		fb = &plane_config->fb->base;
+		vma = plane_config->vma;
 		goto valid_fb;
 	}
 
@@ -3219,6 +3255,7 @@ intel_find_initial_plane_obj(struct intel_crtc *intel_crtc,
 
 		if (intel_plane_ggtt_offset(state) == plane_config->base) {
 			fb = state->base.fb;
+			vma = state->vma;
 			goto valid_fb;
 		}
 	}
@@ -3242,21 +3279,13 @@ intel_find_initial_plane_obj(struct intel_crtc *intel_crtc,
 		intel_fb_pitch(fb, 0, intel_state->base.rotation);
 
 	mutex_lock(&dev->struct_mutex);
-	intel_state->vma =
-		intel_pin_and_fence_fb_obj(fb,
-					   &intel_state->view,
-					   intel_plane_uses_fence(intel_state),
-					   &intel_state->flags);
+	intel_state->vma = i915_vma_get(plane_config->vma);
+	__i915_vma_pin(intel_state->vma);
+	if (intel_plane_uses_fence(intel_state) &&
+	    i915_vma_pin_fence(intel_state->vma) == 0)
+		intel_state->flags |= PLANE_HAS_FENCE;
+	vma->obj->pin_global++;
 	mutex_unlock(&dev->struct_mutex);
-	if (IS_ERR(intel_state->vma)) {
-		DRM_ERROR("failed to pin boot fb on pipe %d: %li\n",
-			  intel_crtc->pipe, PTR_ERR(intel_state->vma));
-
-		intel_state->vma = NULL;
-		return;
-	}
-
-	intel_frontbuffer_flush(to_intel_frontbuffer(fb), ORIGIN_DIRTYFB);
 
 	plane_state->src_x = 0;
 	plane_state->src_y = 0;
@@ -3278,6 +3307,8 @@ intel_find_initial_plane_obj(struct intel_crtc *intel_crtc,
 	plane_state->fb = fb;
 	drm_framebuffer_get(fb);
 
+	intel_frontbuffer_flush(to_intel_frontbuffer(fb), ORIGIN_DIRTYFB);
+
 	atomic_or(to_intel_plane(primary)->frontbuffer_bit,
 		  &to_intel_frontbuffer(fb)->bits);
 }
@@ -15918,6 +15949,9 @@ static void plane_config_fini(struct intel_initial_plane_config *plane_config)
 		else
 			kfree(fb);
 	}
+
+	if (plane_config->vma)
+		i915_vma_put(plane_config->vma);
 }
 
 int intel_modeset_init(struct drm_device *dev)
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index a5253fd838bc..3a119d0105b5 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -548,6 +548,7 @@ struct intel_plane_state {
 
 struct intel_initial_plane_config {
 	struct intel_framebuffer *fb;
+	struct i915_vma *vma;
 	unsigned int tiling;
 	int size;
 	u32 base;
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 2c7f3ebc0117..62416798d5fe 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7722,7 +7722,6 @@ static void valleyview_setup_pctx(struct drm_i915_private *dev_priv)
 		pcbr_offset = (pcbr & (~4095)) - dev_priv->dsm.start;
 		pctx = i915_gem_object_create_stolen_for_preallocated(dev_priv,
 								      pcbr_offset,
-								      I915_GTT_OFFSET_NONE,
 								      pctx_size);
 		goto out;
 	}
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 29/39] drm/i915: Make i915_vma track its own kref
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (27 preceding siblings ...)
  2019-06-14  7:10 ` [PATCH 28/39] initial-plane-vma Chris Wilson
@ 2019-06-14  7:10 ` Chris Wilson
  2019-06-14 11:15   ` Matthew Auld
  2019-06-14  7:10 ` [PATCH 30/39] drm/i915: Propagate fence errors Chris Wilson
                   ` (11 subsequent siblings)
  40 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:10 UTC (permalink / raw)
  To: intel-gfx

Throughout the code base we internally track vma (objects bound into
a particular GTT), with the objects themselves being the common backing
storage. By making the vma itself reference counted we can start
operating on the vma concurrently, moving work into async threads.

Just the conversion to make sure we keep track of the vma reference
counts is not particularly pleasant.
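
To make the intended lifetime rules explicit: i915_vma_instance() now
returns a new reference that the caller must drop with i915_vma_put(),
and pinning only keeps the vma bound, not alive. A minimal sketch of the
resulting pattern (use_vma_example is a made-up name; everything else is
as used in the hunks below):

	static int use_vma_example(struct drm_i915_gem_object *obj,
				   struct i915_address_space *vm)
	{
		struct i915_vma *vma;
		int err;

		vma = i915_vma_instance(obj, vm, NULL); /* returns a new reference */
		if (IS_ERR(vma))
			return PTR_ERR(vma);

		err = i915_vma_pin(vma, 0, 0, PIN_USER);
		if (err)
			goto err_put;

		/* ... the pin keeps the vma resident, the kref keeps it alive ... */

		i915_vma_unpin(vma);
	err_put:
		i915_vma_put(vma); /* kref_put(); __i915_vma_release() frees at zero */
		return err;
	}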
---
 .../gpu/drm/i915/gem/i915_gem_client_blt.c    |  6 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |  6 +-
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 19 ++---
 drivers/gpu/drm/i915/gem/i915_gem_object.c    | 16 +----
 .../gpu/drm/i915/gem/i915_gem_object_blt.c    |  4 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   | 28 ++------
 .../drm/i915/gem/selftests/i915_gem_context.c | 17 ++---
 .../drm/i915/gem/selftests/i915_gem_mman.c    | 15 ++--
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  7 +-
 drivers/gpu/drm/i915/gt/intel_lrc.c           | 28 ++++----
 drivers/gpu/drm/i915/gt/intel_ringbuffer.c    | 13 +---
 drivers/gpu/drm/i915/gt/intel_workarounds.c   | 16 ++---
 .../gpu/drm/i915/gt/selftest_workarounds.c    | 18 ++---
 drivers/gpu/drm/i915/gvt/scheduler.c          |  2 +-
 drivers/gpu/drm/i915/i915_gem.c               | 21 +++---
 drivers/gpu/drm/i915/i915_gem_gtt.c           | 18 +----
 drivers/gpu/drm/i915/i915_gem_render_state.c  |  2 +-
 drivers/gpu/drm/i915/i915_timeline.c          |  3 +-
 drivers/gpu/drm/i915/i915_vma.c               | 69 ++++++++++---------
 drivers/gpu/drm/i915/i915_vma.h               | 38 +++++-----
 drivers/gpu/drm/i915/intel_display.c          | 16 +++--
 drivers/gpu/drm/i915/intel_guc.c              |  9 +--
 drivers/gpu/drm/i915/intel_overlay.c          | 17 ++---
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 11 +--
 drivers/gpu/drm/i915/selftests/i915_vma.c     |  3 +-
 25 files changed, 178 insertions(+), 224 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
index f253ec5765ad..22a262dfd07b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
@@ -9,7 +9,6 @@
 
 struct i915_sleeve {
 	struct i915_vma *vma;
-	struct drm_i915_gem_object *obj;
 	struct sg_table *pages;
 	struct i915_page_sizes page_sizes;
 };
@@ -72,7 +71,6 @@ static struct i915_sleeve *create_sleeve(struct i915_address_space *vm,
 	vma->ops = &proxy_vma_ops;
 
 	sleeve->vma = vma;
-	sleeve->obj = i915_gem_object_get(obj);
 	sleeve->pages = pages;
 	sleeve->page_sizes = *page_sizes;
 
@@ -85,7 +83,7 @@ static struct i915_sleeve *create_sleeve(struct i915_address_space *vm,
 
 static void destroy_sleeve(struct i915_sleeve *sleeve)
 {
-	i915_gem_object_put(sleeve->obj);
+	i915_vma_put(sleeve->vma);
 	kfree(sleeve);
 }
 
@@ -155,7 +153,7 @@ static void clear_pages_worker(struct work_struct *work)
 {
 	struct clear_pages_work *w = container_of(work, typeof(*w), work);
 	struct drm_i915_private *i915 = w->ce->gem_context->i915;
-	struct drm_i915_gem_object *obj = w->sleeve->obj;
+	struct drm_i915_gem_object *obj = w->sleeve->vma->obj;
 	struct i915_vma *vma = w->sleeve->vma;
 	struct i915_request *rq;
 	int err = w->dma.error;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 2e3e47bb1d11..24a714853f85 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -127,10 +127,10 @@ static void lut_close(struct i915_gem_context *ctx)
 		if (&lut->obj_link != &obj->lut_list) {
 			i915_lut_handle_free(lut);
 			radix_tree_iter_delete(&ctx->handles_vma, &iter, slot);
-			if (atomic_dec_and_test(&vma->open_count) &&
-			    !i915_vma_is_ggtt(vma))
+			GEM_BUG_ON(!atomic_read(&vma->open_count));
+			if (atomic_dec_and_test(&vma->open_count))
 				i915_vma_close(vma);
-			i915_gem_object_put(obj);
+			i915_vma_put(vma);
 		}
 
 		i915_gem_object_put(obj);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index eeb4c6cdb01d..4354f4e85d41 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -797,8 +797,8 @@ static int eb_wait_for_ring(const struct i915_execbuffer *eb)
 static int eb_lookup_vmas(struct i915_execbuffer *eb)
 {
 	struct radix_tree_root *handles_vma = &eb->gem_context->handles_vma;
-	struct drm_i915_gem_object *obj;
 	unsigned int i, batch;
+	struct i915_vma *vma;
 	int err;
 
 	if (unlikely(i915_gem_context_is_banned(eb->gem_context)))
@@ -817,8 +817,8 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
 
 	for (i = 0; i < eb->buffer_count; i++) {
 		u32 handle = eb->exec[i].handle;
+		struct drm_i915_gem_object *obj;
 		struct i915_lut_handle *lut;
-		struct i915_vma *vma;
 
 		vma = radix_tree_lookup(handles_vma, handle);
 		if (likely(vma))
@@ -831,21 +831,22 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
 		}
 
 		vma = i915_vma_instance(obj, eb->vm, NULL);
+		i915_gem_object_put(obj);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
-			goto err_obj;
+			goto err_vma;
 		}
 
 		lut = i915_lut_handle_alloc();
 		if (unlikely(!lut)) {
 			err = -ENOMEM;
-			goto err_obj;
+			goto err_put;
 		}
 
 		err = radix_tree_insert(handles_vma, handle, vma);
 		if (unlikely(err)) {
 			i915_lut_handle_free(lut);
-			goto err_obj;
+			goto err_put;
 		}
 
 		/* transfer ref to lut */
@@ -874,8 +875,8 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
 	eb->args->flags |= __EXEC_VALIDATED;
 	return eb_reserve(eb);
 
-err_obj:
-	i915_gem_object_put(obj);
+err_put:
+	i915_vma_put(vma);
 err_vma:
 	eb->vma[i] = NULL;
 err_ctx:
@@ -1226,7 +1227,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 
 	err = i915_vma_pin(batch, 0, 0, PIN_USER | PIN_NONBLOCK);
 	if (err)
-		goto err_unmap;
+		goto err_batch;
 
 	rq = i915_request_create(eb->context);
 	if (IS_ERR(rq)) {
@@ -1267,6 +1268,8 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 	i915_request_add(rq);
 err_unpin:
 	i915_vma_unpin(batch);
+err_batch:
+	i915_vma_put(batch);
 err_unmap:
 	i915_gem_object_unpin_map(obj);
 	return err;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index ee81acb04d84..1ad53cddd425 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -122,15 +122,14 @@ void i915_gem_close_object(struct drm_gem_object *gem, struct drm_file *file)
 		if (vma) {
 			GEM_BUG_ON(vma->obj != obj);
 			GEM_BUG_ON(!atomic_read(&vma->open_count));
-			if (atomic_dec_and_test(&vma->open_count) &&
-			    !i915_vma_is_ggtt(vma))
+			if (atomic_dec_and_test(&vma->open_count))
 				i915_vma_close(vma);
+			i915_vma_put(vma);
 		}
 		mutex_unlock(&ctx->mutex);
 
 		i915_gem_context_put(lut->ctx);
 		i915_lut_handle_free(lut);
-		i915_gem_object_put(obj);
 	}
 }
 
@@ -169,17 +168,8 @@ static void __i915_gem_free_objects(struct drm_i915_private *i915,
 
 	wakeref = intel_runtime_pm_get(i915);
 	llist_for_each_entry_safe(obj, on, freed, freed) {
-		struct i915_vma *vma, *vn;
-
 		trace_i915_gem_object_destroy(obj);
 
-		mutex_lock(&i915->drm.struct_mutex);
-
-		list_for_each_entry_safe(vma, vn, &obj->vma.list, obj_link) {
-			GEM_BUG_ON(i915_vma_is_active(vma));
-			vma->flags &= ~I915_VMA_PIN_MASK;
-			i915_vma_destroy(vma);
-		}
 		GEM_BUG_ON(!list_empty(&obj->vma.list));
 		GEM_BUG_ON(!RB_EMPTY_ROOT(&obj->vma.tree));
 
@@ -199,8 +189,6 @@ static void __i915_gem_free_objects(struct drm_i915_private *i915,
 			spin_unlock_irqrestore(&i915->mm.obj_lock, flags);
 		}
 
-		mutex_unlock(&i915->drm.struct_mutex);
-
 		GEM_BUG_ON(atomic_read(&obj->bind_count));
 		GEM_BUG_ON(obj->userfault_count);
 		GEM_BUG_ON(!list_empty(&obj->lut_list));
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c b/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c
index cb42e3a312e2..0e7922aa4bd3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c
@@ -61,7 +61,7 @@ int i915_gem_object_fill_blt(struct drm_i915_gem_object *obj,
 
 	err = i915_vma_pin(vma, 0, 0, PIN_USER);
 	if (unlikely(err))
-		return err;
+		goto out_put;
 
 	if (obj->cache_dirty & ~obj->cache_coherent) {
 		i915_gem_object_lock(obj);
@@ -99,6 +99,8 @@ int i915_gem_object_fill_blt(struct drm_i915_gem_object *obj,
 	i915_request_add(rq);
 out_unpin:
 	i915_vma_unpin(vma);
+out_put:
+	i915_vma_put(vma);
 	return err;
 }
 
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index 73e667b31cc4..b03a29366bd1 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -413,7 +413,7 @@ static int igt_mock_exhaust_device_supported_pages(void *arg)
 
 			err = i915_vma_pin(vma, 0, 0, PIN_USER);
 			if (err)
-				goto out_close;
+				goto out_put;
 
 			err = igt_check_page_sizes(vma);
 
@@ -424,7 +424,7 @@ static int igt_mock_exhaust_device_supported_pages(void *arg)
 			}
 
 			i915_vma_unpin(vma);
-			i915_vma_close(vma);
+			i915_vma_put(vma);
 
 			i915_gem_object_put(obj);
 
@@ -435,8 +435,6 @@ static int igt_mock_exhaust_device_supported_pages(void *arg)
 
 	goto out_device;
 
-out_close:
-	i915_vma_close(vma);
 out_put:
 	i915_gem_object_put(obj);
 out_device:
@@ -498,11 +496,9 @@ static int igt_mock_ppgtt_misaligned_dma(void *arg)
 
 		err = i915_vma_pin(vma, 0, 0, flags);
 		if (err) {
-			i915_vma_close(vma);
 			goto out_unpin;
 		}
 
-
 		err = igt_check_page_sizes(vma);
 
 		if (vma->page_sizes.gtt != page_size) {
@@ -514,7 +510,6 @@ static int igt_mock_ppgtt_misaligned_dma(void *arg)
 		i915_vma_unpin(vma);
 
 		if (err) {
-			i915_vma_close(vma);
 			goto out_unpin;
 		}
 
@@ -526,13 +521,11 @@ static int igt_mock_ppgtt_misaligned_dma(void *arg)
 		for (offset = 4096; offset < page_size; offset += 4096) {
 			err = i915_vma_unbind(vma);
 			if (err) {
-				i915_vma_close(vma);
 				goto out_unpin;
 			}
 
 			err = i915_vma_pin(vma, 0, 0, flags | offset);
 			if (err) {
-				i915_vma_close(vma);
 				goto out_unpin;
 			}
 
@@ -547,7 +540,6 @@ static int igt_mock_ppgtt_misaligned_dma(void *arg)
 			i915_vma_unpin(vma);
 
 			if (err) {
-				i915_vma_close(vma);
 				goto out_unpin;
 			}
 
@@ -557,8 +549,6 @@ static int igt_mock_ppgtt_misaligned_dma(void *arg)
 				break;
 		}
 
-		i915_vma_close(vma);
-
 		i915_gem_object_unpin_pages(obj);
 		__i915_gem_object_put_pages(obj, I915_MM_NORMAL);
 		i915_gem_object_put(obj);
@@ -583,8 +573,9 @@ static void close_object_list(struct list_head *objects,
 		struct i915_vma *vma;
 
 		vma = i915_vma_instance(obj, &ppgtt->vm, NULL);
-		if (!IS_ERR(vma))
-			i915_vma_close(vma);
+		if (!IS_ERR(vma)) {
+			i915_vma_put(vma);
+		}
 
 		list_del(&obj->st_link);
 		i915_gem_object_unpin_pages(obj);
@@ -855,7 +846,6 @@ static int igt_mock_ppgtt_64K(void *arg)
 			}
 
 			i915_vma_unpin(vma);
-			i915_vma_close(vma);
 
 			i915_gem_object_unpin_pages(obj);
 			__i915_gem_object_put_pages(obj, I915_MM_NORMAL);
@@ -868,7 +858,6 @@ static int igt_mock_ppgtt_64K(void *arg)
 out_vma_unpin:
 	i915_vma_unpin(vma);
 out_vma_close:
-	i915_vma_close(vma);
 out_object_unpin:
 	i915_gem_object_unpin_pages(obj);
 out_object_put:
@@ -993,7 +982,6 @@ static int gpu_write(struct i915_vma *vma,
 	i915_request_add(rq);
 err_batch:
 	i915_vma_unpin(batch);
-	i915_vma_close(batch);
 	i915_vma_put(batch);
 
 	return err;
@@ -1082,7 +1070,7 @@ static int __igt_write_huge(struct i915_gem_context *ctx,
 out_vma_unpin:
 	i915_vma_unpin(vma);
 out_vma_close:
-	i915_vma_destroy(vma);
+	i915_vma_put(vma);
 
 	return err;
 }
@@ -1492,7 +1480,6 @@ static int igt_ppgtt_pin_update(void *arg)
 			goto out_unpin;
 
 		i915_vma_unpin(vma);
-		i915_vma_close(vma);
 
 		i915_gem_object_put(obj);
 	}
@@ -1527,7 +1514,6 @@ static int igt_ppgtt_pin_update(void *arg)
 out_unpin:
 	i915_vma_unpin(vma);
 out_close:
-	i915_vma_close(vma);
 out_put:
 	i915_gem_object_put(obj);
 
@@ -1583,7 +1569,6 @@ static int igt_tmpfs_fallback(void *arg)
 
 	i915_vma_unpin(vma);
 out_close:
-	i915_vma_close(vma);
 out_put:
 	i915_gem_object_put(obj);
 out_restore:
@@ -1667,7 +1652,6 @@ static int igt_shrink_thp(void *arg)
 out_unpin:
 	i915_vma_unpin(vma);
 out_close:
-	i915_vma_close(vma);
 out_put:
 	i915_gem_object_put(obj);
 
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index 9e2878a59023..b2870d2bd458 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -210,14 +210,15 @@ gpu_fill_dw(struct i915_vma *vma, u64 offset, unsigned long count, u32 value)
 	i915_gem_object_unpin_map(obj);
 
 	vma = i915_vma_instance(obj, vma->vm, NULL);
-	if (IS_ERR(vma)) {
-		err = PTR_ERR(vma);
-		goto err;
-	}
+	i915_gem_object_put(obj);
+	if (IS_ERR(vma))
+		return vma;
 
 	err = i915_vma_pin(vma, 0, 0, PIN_USER);
-	if (err)
-		goto err;
+	if (err) {
+		i915_vma_put(vma);
+		return ERR_PTR(err);
+	}
 
 	return vma;
 
@@ -314,7 +315,6 @@ static int gpu_fill(struct drm_i915_gem_object *obj,
 	i915_request_add(rq);
 
 	i915_vma_unpin(batch);
-	i915_vma_close(batch);
 	i915_vma_put(batch);
 
 	i915_vma_unpin(vma);
@@ -795,7 +795,6 @@ emit_rpcs_query(struct drm_i915_gem_object *obj,
 		goto skip_request;
 
 	i915_vma_unpin(batch);
-	i915_vma_close(batch);
 	i915_vma_put(batch);
 
 	i915_vma_unpin(vma);
@@ -1359,7 +1358,6 @@ static int write_to_scratch(struct i915_gem_context *ctx,
 		goto skip_request;
 
 	i915_vma_unpin(vma);
-	i915_vma_close(vma);
 	i915_vma_put(vma);
 
 	i915_request_add(rq);
@@ -1456,7 +1454,6 @@ static int read_from_scratch(struct i915_gem_context *ctx,
 		goto skip_request;
 
 	i915_vma_unpin(vma);
-	i915_vma_close(vma);
 
 	i915_request_add(rq);
 
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index aac093b1b0b2..8ba425f8061e 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -165,10 +165,10 @@ static int check_partial_mapping(struct drm_i915_gem_object *obj,
 		*cpu = 0;
 		drm_clflush_virt_range(cpu, sizeof(*cpu));
 		kunmap(p);
-		if (err)
-			return err;
 
 		i915_vma_destroy(vma);
+		if (err)
+			return err;
 	}
 
 	return 0;
@@ -337,12 +337,12 @@ static int make_obj_busy(struct drm_i915_gem_object *obj)
 
 	err = i915_vma_pin(vma, 0, 0, PIN_USER);
 	if (err)
-		return err;
+		goto out_put;
 
 	rq = i915_request_create(i915->engine[RCS0]->kernel_context);
 	if (IS_ERR(rq)) {
-		i915_vma_unpin(vma);
-		return PTR_ERR(rq);
+		err = PTR_ERR(rq);
+		goto out_unpin;
 	}
 
 	i915_vma_lock(vma);
@@ -351,9 +351,10 @@ static int make_obj_busy(struct drm_i915_gem_object *obj)
 
 	i915_request_add(rq);
 
+out_unpin:
 	i915_vma_unpin(vma);
-	i915_gem_object_put(obj); /* leave it only alive via its active ref */
-
+out_put:
+	i915_vma_put(vma); /* leave it only alive via its active ref */
 	return err;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 01357280449a..85eb5c99d657 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -531,7 +531,7 @@ static void cleanup_status_page(struct intel_engine_cs *engine)
 		i915_vma_unpin(vma);
 
 	i915_gem_object_unpin_map(vma->obj);
-	i915_gem_object_put(vma->obj);
+	i915_vma_put(vma);
 }
 
 static int pin_ggtt_status_page(struct intel_engine_cs *engine,
@@ -590,7 +590,7 @@ static int init_status_page(struct intel_engine_cs *engine)
 	vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
 	if (IS_ERR(vaddr)) {
 		ret = PTR_ERR(vaddr);
-		goto err;
+		goto err_vma;
 	}
 
 	engine->status_page.addr = memset(vaddr, 0, PAGE_SIZE);
@@ -602,10 +602,13 @@ static int init_status_page(struct intel_engine_cs *engine)
 			goto err_unpin;
 	}
 
+	i915_gem_object_put(obj);
 	return 0;
 
 err_unpin:
 	i915_gem_object_unpin_map(obj);
+err_vma:
+	i915_vma_put(vma);
 err:
 	i915_gem_object_put(obj);
 	return ret;
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 5dbc43c70496..5cd263dd3e23 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1981,10 +1981,9 @@ static int lrc_setup_wa_ctx(struct intel_engine_cs *engine)
 		return PTR_ERR(obj);
 
 	vma = i915_vma_instance(obj, &engine->i915->ggtt.vm, NULL);
-	if (IS_ERR(vma)) {
-		err = PTR_ERR(vma);
-		goto err;
-	}
+	i915_gem_object_put(obj);
+	if (IS_ERR(vma))
+		return PTR_ERR(vma);
 
 	err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL | PIN_HIGH);
 	if (err)
@@ -1994,7 +1993,7 @@ static int lrc_setup_wa_ctx(struct intel_engine_cs *engine)
 	return 0;
 
 err:
-	i915_gem_object_put(obj);
+	i915_vma_put(vma);
 	return err;
 }
 
@@ -3063,15 +3062,14 @@ static int execlists_context_deferred_alloc(struct intel_context *ce,
 		return PTR_ERR(ctx_obj);
 
 	vma = i915_vma_instance(ctx_obj, &engine->i915->ggtt.vm, NULL);
-	if (IS_ERR(vma)) {
-		ret = PTR_ERR(vma);
-		goto error_deref_obj;
-	}
+	i915_gem_object_put(ctx_obj);
+	if (IS_ERR(vma))
+		return PTR_ERR(vma);
 
 	timeline = get_timeline(ce->gem_context);
 	if (IS_ERR(timeline)) {
 		ret = PTR_ERR(timeline);
-		goto error_deref_obj;
+		goto err_vma;
 	}
 
 	ring = intel_engine_create_ring(engine,
@@ -3080,13 +3078,13 @@ static int execlists_context_deferred_alloc(struct intel_context *ce,
 	i915_timeline_put(timeline);
 	if (IS_ERR(ring)) {
 		ret = PTR_ERR(ring);
-		goto error_deref_obj;
+		goto err_vma;
 	}
 
 	ret = populate_lr_context(ce, ctx_obj, engine, ring);
 	if (ret) {
 		DRM_DEBUG_DRIVER("Failed to populate LRC: %d\n", ret);
-		goto error_ring_free;
+		goto err_ring;
 	}
 
 	ce->ring = ring;
@@ -3094,10 +3092,10 @@ static int execlists_context_deferred_alloc(struct intel_context *ce,
 
 	return 0;
 
-error_ring_free:
+err_ring:
 	intel_ring_put(ring);
-error_deref_obj:
-	i915_gem_object_put(ctx_obj);
+err_vma:
+	i915_vma_put(vma);
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
index bead5afaeb13..d3783beaf132 100644
--- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
@@ -1246,13 +1246,8 @@ intel_ring_create_vma(struct drm_i915_private *dev_priv, int size)
 		i915_gem_object_set_readonly(obj);
 
 	vma = i915_vma_instance(obj, vm, NULL);
-	if (IS_ERR(vma))
-		goto err;
-
-	return vma;
-
-err:
 	i915_gem_object_put(obj);
+
 	return vma;
 }
 
@@ -1300,7 +1295,6 @@ void intel_ring_free(struct kref *ref)
 {
 	struct intel_ring *ring = container_of(ref, typeof(*ring), ref);
 
-	i915_vma_close(ring->vma);
 	i915_vma_put(ring->vma);
 
 	i915_timeline_put(ring->timeline);
@@ -1404,10 +1398,7 @@ alloc_context_vma(struct intel_engine_cs *engine)
 	}
 
 	vma = i915_vma_instance(obj, &i915->ggtt.vm, NULL);
-	if (IS_ERR(vma)) {
-		err = PTR_ERR(vma);
-		goto err_obj;
-	}
+	i915_gem_object_put(obj);
 
 	return vma;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 165b0a45e009..2c42f66f9301 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -1310,20 +1310,19 @@ create_scratch(struct i915_address_space *vm, int count)
 	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
 
 	vma = i915_vma_instance(obj, vm, NULL);
-	if (IS_ERR(vma)) {
-		err = PTR_ERR(vma);
-		goto err_obj;
-	}
+	i915_gem_object_put(obj);
+	if (IS_ERR(vma))
+		return vma;
 
 	err = i915_vma_pin(vma, 0, 0,
 			   i915_vma_is_ggtt(vma) ? PIN_GLOBAL : PIN_USER);
 	if (err)
-		goto err_obj;
+		goto err_put;
 
 	return vma;
 
-err_obj:
-	i915_gem_object_put(obj);
+err_put:
+	i915_vma_put(vma);
 	return ERR_PTR(err);
 }
 
@@ -1403,8 +1402,7 @@ static int engine_wa_list_verify(struct intel_context *ce,
 	i915_gem_object_unpin_map(vma->obj);
 
 err_vma:
-	i915_vma_unpin(vma);
-	i915_vma_put(vma);
+	i915_vma_destroy(vma);
 	return err;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
index 93e9579b8a4f..180745517633 100644
--- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
@@ -110,7 +110,7 @@ read_nonprivs(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
 
 	err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL);
 	if (err)
-		goto err_obj;
+		goto err_vma;
 
 	rq = igt_request_alloc(ctx, engine);
 	if (IS_ERR(rq)) {
@@ -144,6 +144,7 @@ read_nonprivs(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
 
 	i915_request_add(rq);
 	i915_vma_unpin(vma);
+	i915_vma_put(vma);
 
 	return result;
 
@@ -151,6 +152,8 @@ read_nonprivs(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
 	i915_request_add(rq);
 err_pin:
 	i915_vma_unpin(vma);
+err_vma:
+	i915_vma_put(vma);
 err_obj:
 	i915_gem_object_put(result);
 	return ERR_PTR(err);
@@ -359,19 +362,18 @@ static struct i915_vma *create_batch(struct i915_gem_context *ctx)
 		return ERR_CAST(obj);
 
 	vma = i915_vma_instance(obj, ctx->vm, NULL);
-	if (IS_ERR(vma)) {
-		err = PTR_ERR(vma);
-		goto err_obj;
-	}
+	i915_gem_object_put(obj);
+	if (IS_ERR(vma))
+		return vma;
 
 	err = i915_vma_pin(vma, 0, 0, PIN_USER);
 	if (err)
-		goto err_obj;
+		goto err_put;
 
 	return vma;
 
-err_obj:
-	i915_gem_object_put(obj);
+err_put:
+	i915_vma_put(vma);
 	return ERR_PTR(err);
 }
 
diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
index c36f904a6d54..bbc852386b0a 100644
--- a/drivers/gpu/drm/i915/gvt/scheduler.c
+++ b/drivers/gpu/drm/i915/gvt/scheduler.c
@@ -595,7 +595,7 @@ static void release_shadow_batch_buffer(struct intel_vgpu_workload *workload)
 
 			if (bb->vma && !IS_ERR(bb->vma)) {
 				i915_vma_unpin(bb->vma);
-				i915_vma_close(bb->vma);
+				i915_vma_put(bb->vma);
 			}
 			i915_gem_object_put(bb->obj);
 		}
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 94901d2d6573..fcbca8a06f41 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -112,11 +112,14 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj)
 	while (!ret && (vma = list_first_entry_or_null(&obj->vma.list,
 						       struct i915_vma,
 						       obj_link))) {
+		i915_vma_get(vma);
+
 		list_move_tail(&vma->obj_link, &still_in_list);
 		spin_unlock(&obj->vma.lock);
 
 		ret = i915_vma_unbind(vma);
 
+		i915_vma_put(vma);
 		spin_lock(&obj->vma.lock);
 	}
 	list_splice(&still_in_list, &obj->vma.list);
@@ -1095,11 +1098,14 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 		     !!(flags & PIN_MAPPABLE),
 		     i915_vma_is_map_and_fenceable(vma));
 		ret = i915_vma_unbind(vma);
-		if (ret)
+		if (ret) {
+			i915_vma_put(vma);
 			return ERR_PTR(ret);
+		}
 	}
 
 	ret = i915_vma_pin(vma, size, alignment, flags | PIN_GLOBAL);
+	i915_vma_put(vma);
 	if (ret)
 		return ERR_PTR(ret);
 
@@ -1476,20 +1482,19 @@ i915_gem_init_scratch(struct drm_i915_private *i915, unsigned int size)
 	}
 
 	vma = i915_vma_instance(obj, &i915->ggtt.vm, NULL);
-	if (IS_ERR(vma)) {
-		ret = PTR_ERR(vma);
-		goto err_unref;
-	}
+	i915_gem_object_put(obj);
+	if (IS_ERR(vma))
+		return PTR_ERR(vma);
 
 	ret = i915_vma_pin(vma, 0, 0, PIN_GLOBAL | PIN_HIGH);
 	if (ret)
-		goto err_unref;
+		goto err_put;
 
 	i915->gt.scratch = vma;
 	return 0;
 
-err_unref:
-	i915_gem_object_put(obj);
+err_put:
+	i915_vma_put(vma);
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 457126973cdd..59d21fcbf18a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1931,12 +1931,9 @@ static void gen6_ppgtt_cleanup_work(struct work_struct *wrk)
 {
 	struct gen6_ppgtt_cleanup_work *work =
 		container_of(wrk, typeof(*work), base);
-	/* Side note, vma->vm is the GGTT not the ppgtt we just destroyed! */
-	struct drm_i915_private *i915 = work->vma->vm->i915;
 
-	mutex_lock(&i915->drm.struct_mutex);
+	/* Side note, vma->vm is the GGTT not the ppgtt we just destroyed! */
 	i915_vma_destroy(work->vma);
-	mutex_unlock(&i915->drm.struct_mutex);
 
 	kfree(work);
 }
@@ -2060,6 +2057,7 @@ static struct i915_vma *pd_vma_create(struct gen6_ppgtt *ppgtt, int size)
 	if (!vma)
 		return ERR_PTR(-ENOMEM);
 
+	kref_init(&vma->ref);
 	i915_active_init(&vma->active, NULL, NULL);
 
 	vma->vm = &ggtt->vm;
@@ -2248,16 +2246,6 @@ i915_ppgtt_create(struct drm_i915_private *i915)
 	return ppgtt;
 }
 
-static void ppgtt_destroy_vma(struct i915_address_space *vm)
-{
-	struct i915_vma *vma, *vn;
-
-	vm->closed = true;
-	list_for_each_entry_safe(vma, vn, &vm->bound_list, vm_link)
-		i915_vma_destroy(vma);
-	GEM_BUG_ON(!list_empty(&vm->bound_list));
-}
-
 void i915_vm_release(struct kref *kref)
 {
 	struct i915_address_space *vm =
@@ -2266,8 +2254,6 @@ void i915_vm_release(struct kref *kref)
 	GEM_BUG_ON(i915_is_ggtt(vm));
 	trace_i915_ppgtt_release(vm);
 
-	ppgtt_destroy_vma(vm);
-
 	vm->cleanup(vm);
 	i915_address_space_fini(vm);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index 4ee032072d4f..c59f0e22142f 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -228,7 +228,7 @@ int i915_gem_render_state_emit(struct i915_request *rq)
 err_unpin:
 	i915_vma_unpin(so.vma);
 err_vma:
-	i915_vma_close(so.vma);
+	i915_vma_put(so.vma);
 err_obj:
 	i915_gem_object_put(so.obj);
 	return err;
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index 4ffb61991768..b2bbb19eea90 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -46,8 +46,7 @@ static struct i915_vma *__hwsp_alloc(struct drm_i915_private *i915)
 	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
 
 	vma = i915_vma_instance(obj, &i915->ggtt.vm, NULL);
-	if (IS_ERR(vma))
-		i915_gem_object_put(obj);
+	i915_gem_object_put(obj);
 
 	return vma;
 }
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 14c6d104332a..b80d9e007293 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -108,9 +108,11 @@ vma_create(struct drm_i915_gem_object *obj,
 	if (vma == NULL)
 		return ERR_PTR(-ENOMEM);
 
-	vma->vm = vm;
+	kref_init(&vma->ref);
+
+	vma->vm = i915_vm_get(vm);
 	vma->ops = &vm->vma_ops;
-	vma->obj = obj;
+	vma->obj = i915_gem_object_get(obj);
 	vma->resv = obj->resv;
 	vma->size = obj->base.size;
 	vma->display_alignment = I915_GTT_MIN_ALIGNMENT;
@@ -182,6 +184,7 @@ vma_create(struct drm_i915_gem_object *obj,
 		 */
 		cmp = i915_vma_compare(pos, vm, view);
 		if (cmp == 0) {
+			i915_vma_get(pos);
 			spin_unlock(&obj->vma.lock);
 			i915_vma_free(vma);
 			return pos;
@@ -267,11 +270,14 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
 
 	spin_lock(&obj->vma.lock);
 	vma = vma_lookup(obj, vm, view);
+	if (likely(vma))
+		i915_vma_get(vma);
 	spin_unlock(&obj->vma.lock);
+	if (likely(vma))
+		return vma;
 
 	/* vma_create() will resolve the race if another creates the vma */
-	if (unlikely(!vma))
-		vma = vma_create(obj, vm, view);
+	vma = vma_create(obj, vm, view);
 
 	GEM_BUG_ON(!IS_ERR(vma) && i915_vma_compare(vma, vm, view));
 	return vma;
@@ -326,7 +332,10 @@ int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
 	if (ret)
 		return ret;
 
+	if (!vma_flags)
+		i915_vma_get(vma);
 	vma->flags |= bind_flags;
+
 	return 0;
 }
 
@@ -400,22 +409,16 @@ void i915_vma_unpin_iomap(struct i915_vma *vma)
 void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned int flags)
 {
 	struct i915_vma *vma;
-	struct drm_i915_gem_object *obj;
 
 	vma = fetch_and_zero(p_vma);
 	if (!vma)
 		return;
 
-	obj = vma->obj;
-	GEM_BUG_ON(!obj);
-
 	i915_vma_unpin(vma);
-	i915_vma_close(vma);
-
 	if (flags & I915_VMA_RELEASE_MAP)
-		i915_gem_object_unpin_map(obj);
+		i915_gem_object_unpin_map(vma->obj);
 
-	i915_gem_object_put(obj);
+	i915_vma_put(vma);
 }
 
 bool i915_vma_misplaced(const struct i915_vma *vma,
@@ -760,11 +763,11 @@ void i915_vma_close(struct i915_vma *vma)
 	 * of wasted work for the steady state.
 	 */
 	spin_lock_irqsave(&i915->gt.closed_lock, flags);
-	list_add(&vma->closed_link, &i915->gt.closed_vma);
+	list_add(&i915_vma_get(vma)->closed_link, &i915->gt.closed_vma);
 	spin_unlock_irqrestore(&i915->gt.closed_lock, flags);
 }
 
-static void __i915_vma_remove_closed(struct i915_vma *vma)
+void i915_vma_reopen(struct i915_vma *vma)
 {
 	struct drm_i915_private *i915 = vma->vm->i915;
 
@@ -774,15 +777,25 @@ static void __i915_vma_remove_closed(struct i915_vma *vma)
 	spin_lock_irq(&i915->gt.closed_lock);
 	list_del_init(&vma->closed_link);
 	spin_unlock_irq(&i915->gt.closed_lock);
+
+	i915_vma_put(vma);
 }
 
-void i915_vma_reopen(struct i915_vma *vma)
+void i915_vma_destroy(struct i915_vma *vma)
 {
-	__i915_vma_remove_closed(vma);
+	mutex_lock(&vma->vm->i915->drm.struct_mutex);
+	vma->flags &= ~I915_VMA_PIN_MASK;
+	WARN_ON(i915_vma_unbind(vma));
+	mutex_unlock(&vma->vm->i915->drm.struct_mutex);
+
+	i915_vma_put(vma);
 }
 
-static void __i915_vma_destroy(struct i915_vma *vma)
+void __i915_vma_release(struct kref *ref)
 {
+	struct i915_vma *vma = container_of(ref, typeof(*vma), ref);
+
+	GEM_BUG_ON(i915_vma_is_active(vma));
 	GEM_BUG_ON(vma->node.allocated);
 	GEM_BUG_ON(vma->fence);
 
@@ -791,29 +804,18 @@ static void __i915_vma_destroy(struct i915_vma *vma)
 
 		spin_lock(&obj->vma.lock);
 		list_del(&vma->obj_link);
-		rb_erase(&vma->obj_node, &vma->obj->vma.tree);
+		rb_erase(&vma->obj_node, &obj->vma.tree);
 		spin_unlock(&obj->vma.lock);
+
+		i915_gem_object_put(obj);
 	}
 
 	i915_active_fini(&vma->active);
 
+	i915_vm_put(vma->vm);
 	i915_vma_free(vma);
 }
 
-void i915_vma_destroy(struct i915_vma *vma)
-{
-	lockdep_assert_held(&vma->vm->i915->drm.struct_mutex);
-
-	GEM_BUG_ON(i915_vma_is_pinned(vma));
-
-	__i915_vma_remove_closed(vma);
-
-	WARN_ON(i915_vma_unbind(vma));
-	GEM_BUG_ON(i915_vma_is_active(vma));
-
-	__i915_vma_destroy(vma);
-}
-
 void i915_vma_parked(struct drm_i915_private *i915)
 {
 	struct i915_vma *vma, *next;
@@ -823,7 +825,7 @@ void i915_vma_parked(struct drm_i915_private *i915)
 		list_del_init(&vma->closed_link);
 		spin_unlock_irq(&i915->gt.closed_lock);
 
-		i915_vma_destroy(vma);
+		i915_vma_unbind(vma);
 
 		spin_lock_irq(&i915->gt.closed_lock);
 	}
@@ -975,6 +977,7 @@ int i915_vma_unbind(struct i915_vma *vma)
 	vma->flags &= ~(I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND);
 
 	i915_vma_remove(vma);
+	i915_vma_put(vma);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 71088ff4ad59..27a5993966a4 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -51,14 +51,19 @@ enum i915_cache_level;
  */
 struct i915_vma {
 	struct drm_mm_node node;
-	struct drm_i915_gem_object *obj;
-	struct i915_address_space *vm;
+
+	struct kref ref;
 	const struct i915_vma_ops *ops;
-	struct i915_fence_reg *fence;
-	struct reservation_object *resv; /** Alias of obj->resv */
+	struct i915_address_space *vm;
+	void *private; /* owned by creator */
+
 	struct sg_table *pages;
+	struct i915_fence_reg *fence;
 	void __iomem *iomap;
-	void *private; /* owned by creator */
+
+	struct drm_i915_gem_object *obj;
+	struct reservation_object *resv; /** Alias of obj->resv */
+
 	u64 size;
 	u64 display_alignment;
 	struct i915_page_sizes page_sizes;
@@ -66,11 +71,6 @@ struct i915_vma {
 	u32 fence_size;
 	u32 fence_alignment;
 
-	/**
-	 * Count of the number of times this vma has been opened by different
-	 * handles (but same file) for execbuf, i.e. the number of aliases
-	 * that exist in the ctx->handle_vmas LUT for this vma.
-	 */
 	atomic_t open_count;
 	unsigned long flags;
 	/**
@@ -206,6 +206,10 @@ static inline bool i915_vma_has_userfault(const struct i915_vma *vma)
 	return test_bit(I915_VMA_USERFAULT_BIT, &vma->flags);
 }
 
+void i915_vma_close(struct i915_vma *vma);
+void i915_vma_reopen(struct i915_vma *vma);
+void i915_vma_parked(struct drm_i915_private *i915);
+
 static inline bool i915_vma_is_closed(const struct i915_vma *vma)
 {
 	return !list_empty(&vma->closed_link);
@@ -227,15 +231,18 @@ static inline u32 i915_ggtt_pin_bias(struct i915_vma *vma)
 
 static inline struct i915_vma *i915_vma_get(struct i915_vma *vma)
 {
-	i915_gem_object_get(vma->obj);
+	kref_get(&vma->ref);
 	return vma;
 }
 
+void __i915_vma_release(struct kref *ref);
 static inline void i915_vma_put(struct i915_vma *vma)
 {
-	i915_gem_object_put(vma->obj);
+	kref_put(&vma->ref, __i915_vma_release);
 }
 
+void i915_vma_destroy(struct i915_vma *vma);
+
 static __always_inline ptrdiff_t ptrdiff(const void *a, const void *b)
 {
 	return a - b;
@@ -292,11 +299,8 @@ bool i915_vma_misplaced(const struct i915_vma *vma,
 			u64 size, u64 alignment, u64 flags);
 void __i915_vma_set_map_and_fenceable(struct i915_vma *vma);
 void i915_vma_revoke_mmap(struct i915_vma *vma);
-int __must_check i915_vma_unbind(struct i915_vma *vma);
+int i915_vma_unbind(struct i915_vma *vma);
 void i915_vma_unlink_ctx(struct i915_vma *vma);
-void i915_vma_close(struct i915_vma *vma);
-void i915_vma_reopen(struct i915_vma *vma);
-void i915_vma_destroy(struct i915_vma *vma);
 
 #define assert_vma_held(vma) reservation_object_assert_held((vma)->resv)
 
@@ -438,8 +442,6 @@ i915_vma_unpin_fence(struct i915_vma *vma)
 		__i915_vma_unpin_fence(vma);
 }
 
-void i915_vma_parked(struct drm_i915_private *i915);
-
 #define for_each_until(cond) if (cond) break; else
 
 /**
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 74fc6d696540..c3f44cbdb815 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3080,12 +3080,14 @@ initial_plane_vma(struct drm_i915_private *i915,
 		break;
 	default:
 		MISSING_CASE(plane_config->tiling);
-		goto err_obj;
+		i915_gem_object_put(obj);
+		return NULL;
 	}
 
 	vma = i915_vma_instance(obj, &i915->ggtt.vm, NULL);
+	i915_gem_object_put(obj);
 	if (IS_ERR(vma))
-		goto err_obj;
+		return NULL;
 
 	mutex_lock(&i915->drm.struct_mutex);
 	err = i915_vma_pin(vma, 0, 0,
@@ -3093,16 +3095,16 @@ initial_plane_vma(struct drm_i915_private *i915,
 			   base | PIN_OFFSET_FIXED);
 	mutex_unlock(&i915->drm.struct_mutex);
 	if (err)
-		goto err_obj;
+		goto err_vma;
 
-	if (i915_gem_object_is_tiled(obj) &&
+	if (i915_gem_object_is_tiled(vma->obj) &&
 	    !i915_vma_is_map_and_fenceable(vma))
-		goto err_obj;
+		goto err_vma;
 
 	return vma;
 
-err_obj:
-	i915_gem_object_put(obj);
+err_vma:
+	i915_vma_put(vma);
 	return NULL;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_guc.c b/drivers/gpu/drm/i915/intel_guc.c
index c40a6efdd33a..3b4e7686cd6c 100644
--- a/drivers/gpu/drm/i915/intel_guc.c
+++ b/drivers/gpu/drm/i915/intel_guc.c
@@ -669,19 +669,16 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size)
 		return ERR_CAST(obj);
 
 	vma = i915_vma_instance(obj, &dev_priv->ggtt.vm, NULL);
+	i915_gem_object_put(obj);
 	if (IS_ERR(vma))
-		goto err;
+		return vma;
 
 	flags = PIN_GLOBAL | PIN_OFFSET_BIAS | i915_ggtt_pin_bias(vma);
 	ret = i915_vma_pin(vma, 0, 0, flags);
 	if (ret) {
+		i915_vma_put(vma);
 		vma = ERR_PTR(ret);
-		goto err;
 	}
 
 	return vma;
-
-err:
-	i915_gem_object_put(obj);
-	return vma;
 }
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index d50a0196b166..009cddc09367 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -186,7 +186,7 @@ struct intel_overlay {
 	u32 brightness, contrast, saturation;
 	u32 old_xscale, old_yscale;
 	/* register access */
-	struct drm_i915_gem_object *reg_bo;
+	struct i915_vma *reg_vma;
 	struct overlay_registers __iomem *regs;
 	u32 flip_addr;
 	/* flip handling */
@@ -1319,13 +1319,14 @@ static int get_registers(struct intel_overlay *overlay, bool use_phys)
 	}
 
 	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, PIN_MAPPABLE);
+	i915_gem_object_put(obj);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
-		goto err_put_bo;
+		goto err_unlock;
 	}
 
 	if (use_phys)
-		overlay->flip_addr = sg_dma_address(obj->mm.pages->sgl);
+		overlay->flip_addr = sg_dma_address(vma->obj->mm.pages->sgl);
 	else
 		overlay->flip_addr = i915_ggtt_offset(vma);
 	overlay->regs = i915_vma_pin_iomap(vma);
@@ -1333,15 +1334,15 @@ static int get_registers(struct intel_overlay *overlay, bool use_phys)
 
 	if (IS_ERR(overlay->regs)) {
 		err = PTR_ERR(overlay->regs);
-		goto err_put_bo;
+		goto err_vma;
 	}
 
-	overlay->reg_bo = obj;
+	overlay->reg_vma = vma;
 	mutex_unlock(&i915->drm.struct_mutex);
 	return 0;
 
-err_put_bo:
-	i915_gem_object_put(obj);
+err_vma:
+	i915_vma_destroy(vma);
 err_unlock:
 	mutex_unlock(&i915->drm.struct_mutex);
 	return err;
@@ -1401,7 +1402,7 @@ void intel_overlay_cleanup(struct drm_i915_private *dev_priv)
 	 */
 	WARN_ON(overlay->active);
 
-	i915_gem_object_put(overlay->reg_bo);
+	i915_vma_destroy(overlay->reg_vma);
 	i915_active_fini(&overlay->last_flip);
 
 	kfree(overlay);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 70c8bc626300..664b7e3e1d7c 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -332,9 +332,8 @@ static void close_object_list(struct list_head *objects,
 		vma = i915_vma_instance(obj, vm, NULL);
 		if (!IS_ERR(vma))
 			ignored = i915_vma_unbind(vma);
-		/* Only ppgtt vma may be closed before the object is freed */
-		if (!IS_ERR(vma) && !i915_vma_is_ggtt(vma))
-			i915_vma_close(vma);
+		if (!IS_ERR(vma))
+			i915_vma_put(vma);
 
 		list_del(&obj->st_link);
 		i915_gem_object_put(obj);
@@ -624,8 +623,6 @@ static int walk_hole(struct drm_i915_private *i915,
 		}
 
 err_close:
-		if (!i915_vma_is_ggtt(vma))
-			i915_vma_close(vma);
 err_put:
 		i915_gem_object_put(obj);
 		if (err)
@@ -706,8 +703,6 @@ static int pot_hole(struct drm_i915_private *i915,
 	}
 
 err:
-	if (!i915_vma_is_ggtt(vma))
-		i915_vma_close(vma);
 err_obj:
 	i915_gem_object_put(obj);
 	return err;
@@ -809,8 +804,6 @@ static int drunk_hole(struct drm_i915_private *i915,
 		}
 
 err:
-		if (!i915_vma_is_ggtt(vma))
-			i915_vma_close(vma);
 err_obj:
 		i915_gem_object_put(obj);
 		kfree(order);
diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c
index a166d9405a94..56c1cac368cc 100644
--- a/drivers/gpu/drm/i915/selftests/i915_vma.c
+++ b/drivers/gpu/drm/i915/selftests/i915_vma.c
@@ -317,6 +317,7 @@ static int igt_vma_pin1(void *arg)
 		return PTR_ERR(obj);
 
 	vma = checked_vma_instance(obj, &ggtt->vm, NULL);
+	i915_gem_object_put(obj);
 	if (IS_ERR(vma))
 		goto out;
 
@@ -345,7 +346,7 @@ static int igt_vma_pin1(void *arg)
 
 	err = 0;
 out:
-	i915_gem_object_put(obj);
+	i915_vma_destroy(vma);
 	return err;
 }
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 30/39] drm/i915: Propagate fence errors
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (28 preceding siblings ...)
  2019-06-14  7:10 ` [PATCH 29/39] drm/i915: Make i915_vma track its own kref Chris Wilson
@ 2019-06-14  7:10 ` Chris Wilson
  2019-06-14  7:10 ` [PATCH 31/39] drm/i915: Allow page pinning to be in the background Chris Wilson
                   ` (10 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:10 UTC (permalink / raw)
  To: intel-gfx; +Cc: Matthew Auld

Errors spread like wildfire, and must eventually be returned to the
user. They need to be captured and passed along the flow of fences,
infecting each in turn with the existing error, until finally they fall
out as a user-visible result.
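
The building block is a set-once error slot on each software fence, so
the first failure wins and later completions cannot overwrite it; every
wake path then copies the signaler's error into the waiter before
completing it. In sketch form (this mirrors the
i915_sw_fence_set_error_once() helper added below):

	static inline void
	i915_sw_fence_set_error_once(struct i915_sw_fence *fence, int error)
	{
		cmpxchg(&fence->error, 0, error); /* only the first error sticks */
	}

	/* e.g. when a signaler completes, infect the waiting fence */
	i915_sw_fence_set_error_once(fence, signaler->error);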

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c           |  8 +++++++
 drivers/gpu/drm/i915/i915_sw_fence.c          | 23 +++++++++++++++----
 drivers/gpu/drm/i915/i915_sw_fence.h          |  7 ++++++
 drivers/gpu/drm/i915/selftests/lib_sw_fence.c |  1 +
 4 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index d43f1d3d50ba..2c2624d7e18e 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -491,6 +491,10 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 	switch (state) {
 	case FENCE_COMPLETE:
 		trace_i915_request_submit(request);
+
+		if (unlikely(fence->error))
+			i915_request_skip(request, fence->error);
+
 		/*
 		 * We need to serialize use of the submit_request() callback
 		 * with its hotplugging performed during an emergency
@@ -1044,6 +1048,9 @@ void i915_request_skip(struct i915_request *rq, int error)
 	GEM_BUG_ON(!IS_ERR_VALUE((long)error));
 	dma_fence_set_error(&rq->fence, error);
 
+	if (rq->infix == rq->postfix)
+		return;
+
 	/*
 	 * As this request likely depends on state from the lost
 	 * context, clear out all the user operations leaving the
@@ -1055,6 +1062,7 @@ void i915_request_skip(struct i915_request *rq, int error)
 		head = 0;
 	}
 	memset(vaddr + head, 0, rq->postfix - head);
+	rq->infix = rq->postfix;
 }
 
 static struct i915_request *
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index 5387aafd3424..dedacafc9442 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -157,8 +157,11 @@ static void __i915_sw_fence_wake_up_all(struct i915_sw_fence *fence,
 		LIST_HEAD(extra);
 
 		do {
-			list_for_each_entry_safe(pos, next, &x->head, entry)
-				pos->func(pos, TASK_NORMAL, 0, &extra);
+			list_for_each_entry_safe(pos, next, &x->head, entry) {
+				pos->func(pos,
+					  TASK_NORMAL, fence->error,
+					  &extra);
+			}
 
 			if (list_empty(&extra))
 				break;
@@ -219,6 +222,8 @@ void __i915_sw_fence_init(struct i915_sw_fence *fence,
 
 	__init_waitqueue_head(&fence->wait, name, key);
 	atomic_set(&fence->pending, 1);
+	fence->error = 0;
+
 	fence->flags = (unsigned long)fn;
 }
 
@@ -230,6 +235,8 @@ void i915_sw_fence_commit(struct i915_sw_fence *fence)
 
 static int i915_sw_fence_wake(wait_queue_entry_t *wq, unsigned mode, int flags, void *key)
 {
+	i915_sw_fence_set_error_once(wq->private, flags);
+
 	list_del(&wq->entry);
 	__i915_sw_fence_complete(wq->private, key);
 
@@ -302,8 +309,10 @@ static int __i915_sw_fence_await_sw_fence(struct i915_sw_fence *fence,
 	debug_fence_assert(fence);
 	might_sleep_if(gfpflags_allow_blocking(gfp));
 
-	if (i915_sw_fence_done(signaler))
+	if (i915_sw_fence_done(signaler)) {
+		i915_sw_fence_set_error_once(fence, signaler->error);
 		return 0;
+	}
 
 	debug_fence_assert(signaler);
 
@@ -319,6 +328,7 @@ static int __i915_sw_fence_await_sw_fence(struct i915_sw_fence *fence,
 				return -ENOMEM;
 
 			i915_sw_fence_wait(signaler);
+			i915_sw_fence_set_error_once(fence, signaler->error);
 			return 0;
 		}
 
@@ -337,7 +347,7 @@ static int __i915_sw_fence_await_sw_fence(struct i915_sw_fence *fence,
 		__add_wait_queue_entry_tail(&signaler->wait, wq);
 		pending = 1;
 	} else {
-		i915_sw_fence_wake(wq, 0, 0, NULL);
+		i915_sw_fence_wake(wq, 0, signaler->error, NULL);
 		pending = 0;
 	}
 	spin_unlock_irqrestore(&signaler->wait.lock, flags);
@@ -372,6 +382,7 @@ static void dma_i915_sw_fence_wake(struct dma_fence *dma,
 {
 	struct i915_sw_dma_fence_cb *cb = container_of(data, typeof(*cb), base);
 
+	i915_sw_fence_set_error_once(cb->fence, dma->error);
 	i915_sw_fence_complete(cb->fence);
 	kfree(cb);
 }
@@ -391,6 +402,7 @@ static void timer_i915_sw_fence_wake(struct timer_list *t)
 		  cb->dma->seqno,
 		  i915_sw_fence_debug_hint(fence));
 
+	i915_sw_fence_set_error_once(fence, -ETIMEDOUT);
 	i915_sw_fence_complete(fence);
 }
 
@@ -480,6 +492,7 @@ static void __dma_i915_sw_fence_wake(struct dma_fence *dma,
 {
 	struct i915_sw_dma_fence_cb *cb = container_of(data, typeof(*cb), base);
 
+	i915_sw_fence_set_error_once(cb->fence, dma->error);
 	i915_sw_fence_complete(cb->fence);
 }
 
@@ -501,7 +514,7 @@ int __i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
 	if (ret == 0) {
 		ret = 1;
 	} else {
-		i915_sw_fence_complete(fence);
+		__dma_i915_sw_fence_wake(dma, &cb->base);
 		if (ret == -ENOENT) /* fence already signaled */
 			ret = 0;
 	}
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
index 9cb5c3b307a6..518cbaad9bea 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence.h
@@ -22,6 +22,7 @@ struct i915_sw_fence {
 	wait_queue_head_t wait;
 	unsigned long flags;
 	atomic_t pending;
+	int error;
 };
 
 #define I915_SW_FENCE_CHECKED_BIT	0 /* used internally for DAG checking */
@@ -106,4 +107,10 @@ static inline void i915_sw_fence_wait(struct i915_sw_fence *fence)
 	wait_event(fence->wait, i915_sw_fence_done(fence));
 }
 
+static inline void
+i915_sw_fence_set_error_once(struct i915_sw_fence *fence, int error)
+{
+	cmpxchg(&fence->error, 0, error);
+}
+
 #endif /* _I915_SW_FENCE_H_ */
diff --git a/drivers/gpu/drm/i915/selftests/lib_sw_fence.c b/drivers/gpu/drm/i915/selftests/lib_sw_fence.c
index b976c12817c5..080b90b63d16 100644
--- a/drivers/gpu/drm/i915/selftests/lib_sw_fence.c
+++ b/drivers/gpu/drm/i915/selftests/lib_sw_fence.c
@@ -40,6 +40,7 @@ void __onstack_fence_init(struct i915_sw_fence *fence,
 
 	__init_waitqueue_head(&fence->wait, name, key);
 	atomic_set(&fence->pending, 1);
+	fence->error = 0;
 	fence->flags = (unsigned long)nop_fence_notify;
 }
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 31/39] drm/i915: Allow page pinning to be in the background
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (29 preceding siblings ...)
  2019-06-14  7:10 ` [PATCH 30/39] drm/i915: Propagate fence errors Chris Wilson
@ 2019-06-14  7:10 ` Chris Wilson
  2019-06-14  7:10 ` [PATCH 32/39] drm/i915: Allow vma binding to occur asynchronously Chris Wilson
                   ` (9 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:10 UTC (permalink / raw)
  To: intel-gfx; +Cc: Matthew Auld

Assume that pages may be pinned in a background task and use a
completion event to synchronise with callers that must access the pages
immediately.
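
As a minimal sketch of that pattern (illustrative only: struct obj and
fill_pages() are hypothetical stand-ins, not the driver code), the
background task publishes either the pages or an ERR_PTR() and then
fires the completion; callers that need the pages immediately block on
it and inspect whatever was published:

#include <linux/completion.h>
#include <linux/err.h>

struct obj {
        struct completion done; /* init_completion() at object creation */
        struct page **pages;    /* valid, or ERR_PTR(), once done fires */
};

static struct page **fill_pages(struct obj *obj); /* hypothetical */

/* Background task: publish the result, then wake all waiters. */
static void background_pin(struct obj *obj)
{
        obj->pages = fill_pages(obj);   /* may return ERR_PTR() */
        complete_all(&obj->done);
}

/* Synchronous caller: wait for the publish, then check the result. */
static int pin_pages_sync(struct obj *obj)
{
        int err;

        err = wait_for_completion_interruptible(&obj->done);
        if (err)
                return err;     /* interrupted: -ERESTARTSYS */

        return IS_ERR(obj->pages) ? PTR_ERR(obj->pages) : 0;
}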

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |  1 +
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |  7 +--
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 ++
 drivers/gpu/drm/i915/gem/i915_gem_pages.c     | 53 +++++++++++++++----
 4 files changed, 52 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 1ad53cddd425..588e0ca285e1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -65,6 +65,7 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
 	obj->mm.madv = I915_MADV_WILLNEED;
 	INIT_RADIX_TREE(&obj->mm.get_page.radix, GFP_KERNEL | __GFP_NOWARN);
 	mutex_init(&obj->mm.get_page.lock);
+	init_completion(&obj->mm.completion);
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 67d70d144bd9..7ea8013d108f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -234,7 +234,7 @@ int ____i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
 int __i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
 
 static inline int __must_check
-i915_gem_object_pin_pages(struct drm_i915_gem_object *obj)
+i915_gem_object_pin_pages_async(struct drm_i915_gem_object *obj)
 {
 	might_lock(&obj->mm.lock);
 
@@ -244,6 +244,9 @@ i915_gem_object_pin_pages(struct drm_i915_gem_object *obj)
 	return __i915_gem_object_get_pages(obj);
 }
 
+int __must_check
+i915_gem_object_pin_pages(struct drm_i915_gem_object *obj);
+
 static inline bool
 i915_gem_object_has_pages(struct drm_i915_gem_object *obj)
 {
@@ -267,9 +270,7 @@ i915_gem_object_has_pinned_pages(struct drm_i915_gem_object *obj)
 static inline void
 __i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj)
 {
-	GEM_BUG_ON(!i915_gem_object_has_pages(obj));
 	GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));
-
 	atomic_dec(&obj->mm.pages_pin_count);
 }
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index e87fca4d8194..8f61d7a93078 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -7,6 +7,7 @@
 #ifndef __I915_GEM_OBJECT_TYPES_H__
 #define __I915_GEM_OBJECT_TYPES_H__
 
+#include <linux/completion.h>
 #include <linux/reservation.h>
 
 #include <drm/drm_gem.h>
@@ -210,6 +211,8 @@ struct drm_i915_gem_object {
 		 */
 		struct list_head link;
 
+		struct completion completion;
+
 		/**
 		 * Advice: are the backing pages purgeable?
 		 */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index b36ad269f4ea..6bec301cee79 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -73,21 +73,18 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
 
 		spin_unlock_irqrestore(&i915->mm.obj_lock, flags);
 	}
+
+	complete_all(&obj->mm.completion);
 }
 
 int ____i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
 {
-	int err;
-
 	if (unlikely(obj->mm.madv != I915_MADV_WILLNEED)) {
 		DRM_DEBUG("Attempting to obtain a purgeable object\n");
 		return -EFAULT;
 	}
 
-	err = obj->ops->get_pages(obj);
-	GEM_BUG_ON(!err && !i915_gem_object_has_pages(obj));
-
-	return err;
+	return obj->ops->get_pages(obj);
 }
 
 /* Ensure that the associated pages are gathered from the backing storage
@@ -105,7 +102,7 @@ int __i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
 	if (err)
 		return err;
 
-	if (unlikely(!i915_gem_object_has_pages(obj))) {
+	if (!obj->mm.pages) {
 		GEM_BUG_ON(i915_gem_object_has_pinned_pages(obj));
 
 		err = ____i915_gem_object_get_pages(obj);
@@ -121,6 +118,32 @@ int __i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
 	return err;
 }
 
+int i915_gem_object_pin_pages(struct drm_i915_gem_object *obj)
+{
+	int err;
+
+	err = i915_gem_object_pin_pages_async(obj);
+	if (err)
+		return err;
+
+	err = wait_for_completion_interruptible(&obj->mm.completion);
+	if (err)
+		goto err_unpin;
+
+	if (IS_ERR(obj->mm.pages)) {
+		err = PTR_ERR(obj->mm.pages);
+		goto err_unpin;
+	}
+
+	GEM_BUG_ON(!i915_gem_object_has_pages(obj));
+	return 0;
+
+err_unpin:
+	GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));
+	atomic_dec(&obj->mm.pages_pin_count);
+	return err;
+}
+
 /* Immediately discard the backing storage */
 void i915_gem_object_truncate(struct drm_i915_gem_object *obj)
 {
@@ -201,6 +224,9 @@ int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj,
 
 	GEM_BUG_ON(atomic_read(&obj->bind_count));
 
+	if (obj->mm.pages == ERR_PTR(-EAGAIN))
+		wait_for_completion(&obj->mm.completion);
+
 	/* May be called by shrinker from within get_pages() (on another bo) */
 	mutex_lock_nested(&obj->mm.lock, subclass);
 	if (unlikely(atomic_read(&obj->mm.pages_pin_count))) {
@@ -227,6 +253,7 @@ int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj,
 	if (!IS_ERR(pages))
 		obj->ops->put_pages(obj, pages);
 
+	reinit_completion(&obj->mm.completion);
 	err = 0;
 unlock:
 	mutex_unlock(&obj->mm.lock);
@@ -304,7 +331,7 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
 	type &= ~I915_MAP_OVERRIDE;
 
 	if (!atomic_inc_not_zero(&obj->mm.pages_pin_count)) {
-		if (unlikely(!i915_gem_object_has_pages(obj))) {
+		if (!obj->mm.pages) {
 			GEM_BUG_ON(i915_gem_object_has_pinned_pages(obj));
 
 			err = ____i915_gem_object_get_pages(obj);
@@ -316,7 +343,6 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
 		atomic_inc(&obj->mm.pages_pin_count);
 		pinned = false;
 	}
-	GEM_BUG_ON(!i915_gem_object_has_pages(obj));
 
 	ptr = page_unpack_bits(obj->mm.mapping, &has_type);
 	if (ptr && has_type != type) {
@@ -334,6 +360,15 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
 	}
 
 	if (!ptr) {
+		err = wait_for_completion_interruptible(&obj->mm.completion);
+		if (err)
+			goto err_unpin;
+
+		if (IS_ERR(obj->mm.pages)) {
+			err = PTR_ERR(obj->mm.pages);
+			goto err_unpin;
+		}
+
 		ptr = i915_gem_object_map(obj, type);
 		if (!ptr) {
 			err = -ENOMEM;
-- 
2.20.1

* [PATCH 32/39] drm/i915: Allow vma binding to occur asynchronously
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (30 preceding siblings ...)
  2019-06-14  7:10 ` [PATCH 31/39] drm/i915: Allow page pinning to be in the background Chris Wilson
@ 2019-06-14  7:10 ` Chris Wilson
  2019-08-05 20:35   ` Kumar Valsan, Prathap
  2019-06-14  7:10 ` [PATCH 33/39] revoke-fence Chris Wilson
                   ` (8 subsequent siblings)
  40 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:10 UTC (permalink / raw)
  To: intel-gfx

If we let pages be allocated asynchronously, we then also want to push
the binding process into an asynchronous task. Make it so, utilising the
recent improvements to fence error tracking and struct_mutex reduction.
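
Condensed, the flow is: the bind is deferred to a worker that is kicked
when the pages fence signals, and any bind failure is latched into the
bind fence before it is signalled, so waiters observe the error. A
sketch of that shape (struct async_bind and do_bind() are illustrative
stand-ins for the structures in this patch):

#include <linux/dma-fence.h>
#include <linux/kernel.h>
#include <linux/workqueue.h>

struct async_bind {
        struct dma_fence *dma;  /* published as the exclusive fence */
        struct dma_fence_cb cb;
        struct work_struct work;
};

static int do_bind(struct async_bind *ab); /* allocate PTEs + insert */

static void bind_work(struct work_struct *work)
{
        struct async_bind *ab = container_of(work, typeof(*ab), work);
        int err;

        err = do_bind(ab);
        if (err)
                dma_fence_set_error(ab->dma, err); /* before signalling */
        dma_fence_signal(ab->dma);
}

/* Callback on the pages fence: queue the actual bind. */
static void pages_ready(struct dma_fence *f, struct dma_fence_cb *cb)
{
        struct async_bind *ab = container_of(cb, typeof(*ab), cb);

        queue_work(system_unbound_wq, &ab->work);
}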

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  14 +-
 .../drm/i915/gem/selftests/i915_gem_context.c |   4 +
 drivers/gpu/drm/i915/i915_gem_gtt.c           |  28 +++-
 drivers/gpu/drm/i915/i915_vma.c               | 158 +++++++++++++++---
 drivers/gpu/drm/i915/i915_vma.h               |  14 ++
 drivers/gpu/drm/i915/selftests/i915_vma.c     |   4 +-
 6 files changed, 187 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 4354f4e85d41..6905fd14361b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -580,6 +580,15 @@ static int eb_reserve_vma(const struct i915_execbuffer *eb,
 	u64 pin_flags;
 	int err;
 
+	/*
+	 * If we load the pages asynchronously, then the user *must*
+	 * obey the reservation_object and not bypass waiting on it.
+	 * On the positive side, if the vma is not yet bound (no pages!),
+	 * then it should not have any annoying implicit fences.
+	 */
+	if (exec_flags & EXEC_OBJECT_ASYNC && !vma->pages)
+		*vma->exec_flags &= ~EXEC_OBJECT_ASYNC;
+
 	pin_flags = PIN_USER | PIN_NONBLOCK;
 	if (exec_flags & EXEC_OBJECT_NEEDS_GTT)
 		pin_flags |= PIN_GLOBAL;
@@ -1246,8 +1255,9 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 		goto skip_request;
 
 	i915_vma_lock(batch);
-	GEM_BUG_ON(!reservation_object_test_signaled_rcu(batch->resv, true));
-	err = i915_vma_move_to_active(batch, rq, 0);
+	err = i915_request_await_object(rq, batch->obj, false);
+	if (err == 0)
+		err = i915_vma_move_to_active(batch, rq, 0);
 	i915_vma_unlock(batch);
 	if (err)
 		goto skip_request;
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index b2870d2bd458..6247d2880464 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -290,6 +290,10 @@ static int gpu_fill(struct drm_i915_gem_object *obj,
 		goto err_batch;
 	}
 
+	err = i915_request_await_object(rq, batch->obj, false);
+	if (err)
+		goto err_request;
+
 	flags = 0;
 	if (INTEL_GEN(vm->i915) <= 5)
 		flags |= I915_DISPATCH_SECURE;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 59d21fcbf18a..fb484685ff1b 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -30,6 +30,7 @@
 #include <linux/random.h>
 #include <linux/seq_file.h>
 #include <linux/stop_machine.h>
+#include <linux/workqueue.h>
 
 #include <asm/set_memory.h>
 
@@ -139,14 +140,14 @@ static inline void i915_ggtt_invalidate(struct drm_i915_private *i915)
 
 static int ppgtt_bind_vma(struct i915_vma *vma,
 			  enum i915_cache_level cache_level,
-			  u32 unused)
+			  u32 flags)
 {
+	struct i915_address_space *vm = vma->vm;
 	u32 pte_flags;
 	int err;
 
-	if (!(vma->flags & I915_VMA_LOCAL_BIND)) {
-		err = vma->vm->allocate_va_range(vma->vm,
-						 vma->node.start, vma->size);
+	if (flags & I915_VMA_ALLOC_BIND) {
+		err = vm->allocate_va_range(vm, vma->node.start, vma->size);
 		if (err)
 			return err;
 	}
@@ -156,20 +157,26 @@ static int ppgtt_bind_vma(struct i915_vma *vma,
 	if (i915_gem_object_is_readonly(vma->obj))
 		pte_flags |= PTE_READ_ONLY;
 
-	vma->vm->insert_entries(vma->vm, vma, cache_level, pte_flags);
+	vm->insert_entries(vm, vma, cache_level, pte_flags);
 
 	return 0;
 }
 
 static void ppgtt_unbind_vma(struct i915_vma *vma)
 {
-	vma->vm->clear_range(vma->vm, vma->node.start, vma->size);
+	struct i915_address_space *vm = vma->vm;
+
+	vm->clear_range(vm, vma->node.start, vma->size);
 }
 
 static int ppgtt_set_pages(struct i915_vma *vma)
 {
 	GEM_BUG_ON(vma->pages);
 
+	wait_for_completion(&vma->obj->mm.completion);
+	if (IS_ERR(vma->obj->mm.pages))
+		return PTR_ERR(vma->obj->mm.pages);
+
 	vma->pages = vma->obj->mm.pages;
 
 	vma->page_sizes = vma->obj->mm.page_sizes;
@@ -2629,7 +2636,7 @@ static int aliasing_gtt_bind_vma(struct i915_vma *vma,
 	if (flags & I915_VMA_LOCAL_BIND) {
 		struct i915_ppgtt *appgtt = i915->mm.aliasing_ppgtt;
 
-		if (!(vma->flags & I915_VMA_LOCAL_BIND)) {
+		if (flags & I915_VMA_ALLOC_BIND) {
 			ret = appgtt->vm.allocate_va_range(&appgtt->vm,
 							   vma->node.start,
 							   vma->size);
@@ -3847,13 +3854,18 @@ i915_get_ggtt_vma_pages(struct i915_vma *vma)
 {
 	int ret;
 
-	/* The vma->pages are only valid within the lifespan of the borrowed
+	/*
+	 * The vma->pages are only valid within the lifespan of the borrowed
 	 * obj->mm.pages. When the obj->mm.pages sg_table is regenerated, so
 	 * must be the vma->pages. A simple rule is that vma->pages must only
 	 * be accessed when the obj->mm.pages are pinned.
 	 */
 	GEM_BUG_ON(!i915_gem_object_has_pinned_pages(vma->obj));
 
+	wait_for_completion(&vma->obj->mm.completion);
+	if (IS_ERR(vma->obj->mm.pages))
+		return PTR_ERR(vma->obj->mm.pages);
+
 	switch (vma->ggtt_view.type) {
 	default:
 		GEM_BUG_ON(vma->ggtt_view.type);
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index b80d9e007293..33c6460eae8a 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -93,6 +93,70 @@ static void __i915_vma_retire(struct i915_active *ref)
 	i915_vma_put(active_to_vma(ref));
 }
 
+static int __vma_bind(struct i915_vma *vma,
+		      unsigned int cache_level,
+		      unsigned int flags)
+{
+	int err;
+
+	if (!vma->pages) {
+		err = vma->ops->set_pages(vma);
+		if (err)
+			return err;
+	}
+
+	return vma->ops->bind_vma(vma, cache_level, flags);
+}
+
+static void do_async_bind(struct work_struct *work)
+{
+	struct i915_vma *vma = container_of(work, typeof(*vma), async.work);
+	struct i915_address_space *vm = vma->vm;
+	int err;
+
+	if (vma->async.dma->error)
+		goto out;
+
+	err = __vma_bind(vma, vma->async.cache_level, vma->async.flags);
+	if (err)
+		dma_fence_set_error(vma->async.dma, err);
+
+out:
+	dma_fence_signal(vma->async.dma);
+
+	i915_vma_unpin(vma);
+	i915_vma_put(vma);
+	i915_vm_put(vm);
+}
+
+static void __queue_async_bind(struct dma_fence *f, struct dma_fence_cb *cb)
+{
+	struct i915_vma *vma = container_of(cb, typeof(*vma), async.cb);
+
+	if (f->error)
+		dma_fence_set_error(vma->async.dma, f->error);
+
+	INIT_WORK(&vma->async.work, do_async_bind);
+	queue_work(system_unbound_wq, &vma->async.work);
+}
+
+static const char *async_bind_driver_name(struct dma_fence *fence)
+{
+	return DRIVER_NAME;
+}
+
+static const char *async_bind_timeline_name(struct dma_fence *fence)
+{
+	return "bind";
+}
+
+static const struct dma_fence_ops async_bind_ops = {
+	.get_driver_name = async_bind_driver_name,
+	.get_timeline_name = async_bind_timeline_name,
+};
+
+static DEFINE_SPINLOCK(async_lock);
+
 static struct i915_vma *
 vma_create(struct drm_i915_gem_object *obj,
 	   struct i915_address_space *vm,
@@ -283,6 +347,54 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
 	return vma;
 }
 
+static int queue_async_bind(struct i915_vma *vma,
+			    enum i915_cache_level cache_level,
+			    u32 flags)
+{
+	bool ready = true;
+
+	/* We are not allowed to shrink inside vm->mutex! */
+	vma->async.dma = kmalloc(sizeof(*vma->async.dma),
+				 GFP_NOWAIT | __GFP_NOWARN);
+	if (!vma->async.dma)
+		return -ENOMEM;
+
+	dma_fence_init(vma->async.dma,
+		       &async_bind_ops,
+		       &async_lock,
+		       vma->vm->i915->mm.unordered_timeline,
+		       0);
+
+	/* XXX find and avoid allocations under reservation_object locks */
+	if (!i915_vma_trylock(vma)) {
+		kfree(fetch_and_zero(&vma->async.dma));
+		return -EAGAIN;
+	}
+
+	i915_vm_get(vma->vm);
+	i915_vma_get(vma);
+	__i915_vma_pin(vma); /* avoid being shrunk */
+
+	vma->async.cache_level = cache_level;
+	vma->async.flags = flags;
+
+	if (rcu_access_pointer(vma->resv->fence_excl)) { /* async pages */
+		struct dma_fence *f = reservation_object_get_excl(vma->resv);
+
+		if (!dma_fence_add_callback(f,
+					    &vma->async.cb,
+					    __queue_async_bind))
+			ready = false;
+	}
+	reservation_object_add_excl_fence(vma->resv, vma->async.dma);
+	i915_vma_unlock(vma);
+
+	if (ready)
+		__queue_async_bind(vma->async.dma, &vma->async.cb);
+
+	return 0;
+}
+
 /**
  * i915_vma_bind - Sets up PTEs for an VMA in it's corresponding address space.
  * @vma: VMA to map
@@ -300,17 +412,12 @@ int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
 	u32 vma_flags;
 	int ret;
 
+	GEM_BUG_ON(!flags);
 	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
 	GEM_BUG_ON(vma->size > vma->node.size);
-
-	if (GEM_DEBUG_WARN_ON(range_overflows(vma->node.start,
-					      vma->node.size,
-					      vma->vm->total)))
-		return -ENODEV;
-
-	if (GEM_DEBUG_WARN_ON(!flags))
-		return -EINVAL;
-
+	GEM_BUG_ON(range_overflows(vma->node.start,
+				   vma->node.size,
+				   vma->vm->total));
 	bind_flags = 0;
 	if (flags & PIN_GLOBAL)
 		bind_flags |= I915_VMA_GLOBAL_BIND;
@@ -325,16 +432,20 @@ int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
 	if (bind_flags == 0)
 		return 0;
 
-	GEM_BUG_ON(!vma->pages);
+	if ((bind_flags & ~vma_flags) & I915_VMA_LOCAL_BIND)
+		bind_flags |= I915_VMA_ALLOC_BIND;
 
 	trace_i915_vma_bind(vma, bind_flags);
-	ret = vma->ops->bind_vma(vma, cache_level, bind_flags);
+	if (bind_flags & I915_VMA_ALLOC_BIND)
+		ret = queue_async_bind(vma, cache_level, bind_flags);
+	else
+		ret = __vma_bind(vma, cache_level, bind_flags);
 	if (ret)
 		return ret;
 
 	if (!vma_flags)
 		i915_vma_get(vma);
-	vma->flags |= bind_flags;
+	vma->flags |= bind_flags & ~I915_VMA_ALLOC_BIND;
 
 	return 0;
 }
@@ -573,7 +684,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 	}
 
 	if (vma->obj) {
-		ret = i915_gem_object_pin_pages(vma->obj);
+		ret = i915_gem_object_pin_pages_async(vma->obj);
 		if (ret)
 			return ret;
 
@@ -582,25 +693,19 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 		cache_level = 0;
 	}
 
-	GEM_BUG_ON(vma->pages);
-
-	ret = vma->ops->set_pages(vma);
-	if (ret)
-		goto err_unpin;
-
 	if (flags & PIN_OFFSET_FIXED) {
 		u64 offset = flags & PIN_OFFSET_MASK;
 		if (!IS_ALIGNED(offset, alignment) ||
 		    range_overflows(offset, size, end)) {
 			ret = -EINVAL;
-			goto err_clear;
+			goto err_unpin;
 		}
 
 		ret = i915_gem_gtt_reserve(vma->vm, &vma->node,
 					   size, offset, cache_level,
 					   flags);
 		if (ret)
-			goto err_clear;
+			goto err_unpin;
 	} else {
 		/*
 		 * We only support huge gtt pages through the 48b PPGTT,
@@ -639,7 +744,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 					  size, alignment, cache_level,
 					  start, end, flags);
 		if (ret)
-			goto err_clear;
+			goto err_unpin;
 
 		GEM_BUG_ON(vma->node.start < start);
 		GEM_BUG_ON(vma->node.start + vma->node.size > end);
@@ -658,8 +763,6 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 
 	return 0;
 
-err_clear:
-	vma->ops->clear_pages(vma);
 err_unpin:
 	if (vma->obj)
 		i915_gem_object_unpin_pages(vma->obj);
@@ -933,6 +1036,11 @@ int i915_vma_unbind(struct i915_vma *vma)
 
 	lockdep_assert_held(&vma->vm->i915->drm.struct_mutex);
 
+	if (vma->async.dma &&
+	    dma_fence_wait_timeout(vma->async.dma, true,
+				   MAX_SCHEDULE_TIMEOUT) < 0)
+		return -EINTR;
+
 	ret = i915_active_wait(&vma->active);
 	if (ret)
 		return ret;
@@ -976,6 +1084,8 @@ int i915_vma_unbind(struct i915_vma *vma)
 	}
 	vma->flags &= ~(I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND);
 
+	dma_fence_put(fetch_and_zero(&vma->async.dma));
+
 	i915_vma_remove(vma);
 	i915_vma_put(vma);
 
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 27a5993966a4..e54ed0b380a3 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -103,6 +103,7 @@ struct i915_vma {
 #define I915_VMA_GLOBAL_BIND	BIT(9)
 #define I915_VMA_LOCAL_BIND	BIT(10)
 #define I915_VMA_BIND_MASK (I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND | I915_VMA_PIN_OVERFLOW)
+#define I915_VMA_ALLOC_BIND	I915_VMA_PIN_OVERFLOW /* not stored */
 
 #define I915_VMA_GGTT		BIT(11)
 #define I915_VMA_CAN_FENCE	BIT(12)
@@ -143,6 +144,14 @@ struct i915_vma {
 	unsigned int *exec_flags;
 	struct hlist_node exec_node;
 	u32 exec_handle;
+
+	struct i915_vma_async_bind {
+		struct dma_fence *dma;
+		struct dma_fence_cb cb;
+		struct work_struct work;
+		unsigned int cache_level;
+		unsigned int flags;
+	} async;
 };
 
 struct i915_vma *
@@ -309,6 +318,11 @@ static inline void i915_vma_lock(struct i915_vma *vma)
 	reservation_object_lock(vma->resv, NULL);
 }
 
+static inline bool i915_vma_trylock(struct i915_vma *vma)
+{
+	return reservation_object_trylock(vma->resv);
+}
+
 static inline void i915_vma_unlock(struct i915_vma *vma)
 {
 	reservation_object_unlock(vma->resv);
diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c
index 56c1cac368cc..30b831408b7b 100644
--- a/drivers/gpu/drm/i915/selftests/i915_vma.c
+++ b/drivers/gpu/drm/i915/selftests/i915_vma.c
@@ -204,8 +204,10 @@ static int igt_vma_create(void *arg)
 		mock_context_close(ctx);
 	}
 
-	list_for_each_entry_safe(obj, on, &objects, st_link)
+	list_for_each_entry_safe(obj, on, &objects, st_link) {
+		i915_gem_object_wait(obj, I915_WAIT_ALL, MAX_SCHEDULE_TIMEOUT);
 		i915_gem_object_put(obj);
+	}
 	return err;
 }
 
-- 
2.20.1

* [PATCH 33/39] revoke-fence
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (31 preceding siblings ...)
  2019-06-14  7:10 ` [PATCH 32/39] drm/i915: Allow vma binding to occur asynchronously Chris Wilson
@ 2019-06-14  7:10 ` Chris Wilson
  2019-06-14  7:10 ` [PATCH 34/39] drm/i915: Use vm->mutex for serialising GTT insertion Chris Wilson
                   ` (7 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:10 UTC (permalink / raw)
  To: intel-gfx

---
 drivers/gpu/drm/i915/gem/i915_gem_domain.c    | 16 ++++++--
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  9 ++---
 drivers/gpu/drm/i915/i915_gem.c               | 38 ++++++++-----------
 drivers/gpu/drm/i915/i915_gem_fence_reg.c     | 16 ++------
 drivers/gpu/drm/i915/i915_vma.c               |  4 +-
 drivers/gpu/drm/i915/i915_vma.h               |  4 +-
 drivers/gpu/drm/i915/intel_overlay.c          |  4 --
 7 files changed, 40 insertions(+), 51 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
index 7e6ed767348c..a62f7f0e2947 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
@@ -220,6 +220,8 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 	 * state and so involves less work.
 	 */
 	if (atomic_read(&obj->bind_count)) {
+		struct drm_i915_private *i915 = to_i915(obj->base.dev);
+
 		/* Before we change the PTE, the GPU must not be accessing it.
 		 * If we wait upon the object, we know that all the bound
 		 * VMA are no longer active.
@@ -231,8 +233,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 		if (ret)
 			return ret;
 
-		if (!HAS_LLC(to_i915(obj->base.dev)) &&
-		    cache_level != I915_CACHE_NONE) {
+		if (!HAS_LLC(i915) && cache_level != I915_CACHE_NONE) {
 			/* Access to snoopable pages through the GTT is
 			 * incoherent and on some machines causes a hard
 			 * lockup. Relinquish the CPU mmaping to force
@@ -240,6 +241,10 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 			 * then double check if the GTT mapping is still
 			 * valid for that pointer access.
 			 */
+			ret = mutex_lock_interruptible(&i915->ggtt.vm.mutex);
+			if (ret)
+				return ret;
+
 			i915_gem_object_release_mmap(obj);
 
 			/* As we no longer need a fence for GTT access,
@@ -250,10 +255,13 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 			 * supposed to be linear.
 			 */
 			for_each_ggtt_vma(vma, obj) {
-				ret = i915_vma_put_fence(vma);
+				ret = i915_vma_revoke_fence(vma);
 				if (ret)
-					return ret;
+					break;
 			}
+			mutex_unlock(&i915->ggtt.vm.mutex);
+			if (ret)
+				return ret;
 		} else {
 			/* We either have incoherent backing store and
 			 * so no GTT access or the architecture is fully
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 6905fd14361b..f12dc0f521c9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1091,6 +1091,9 @@ static void *reloc_iomap(struct drm_i915_gem_object *obj,
 		struct i915_vma *vma;
 		int err;
 
+		if (i915_gem_object_is_tiled(obj))
+			return ERR_PTR(-EINVAL);
+
 		if (use_cpu_reloc(cache, obj))
 			return NULL;
 
@@ -1114,12 +1117,6 @@ static void *reloc_iomap(struct drm_i915_gem_object *obj,
 			if (err) /* no inactive aperture space, use cpu reloc */
 				return NULL;
 		} else {
-			err = i915_vma_put_fence(vma);
-			if (err) {
-				i915_vma_unpin(vma);
-				return ERR_PTR(err);
-			}
-
 			cache->node.start = vma->node.start;
 			cache->node.mm = (void *)vma;
 		}
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fcbca8a06f41..fd448afae492 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -380,20 +380,17 @@ i915_gem_gtt_pread(struct drm_i915_gem_object *obj,
 		return ret;
 
 	wakeref = intel_runtime_pm_get(i915);
-	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
-				       PIN_MAPPABLE |
-				       PIN_NONFAULT |
-				       PIN_NONBLOCK);
+	vma = ERR_PTR(-ENODEV);
+	if (!i915_gem_object_is_tiled(obj)) {
+		vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
+					       PIN_MAPPABLE |
+					       PIN_NONFAULT |
+					       PIN_NONBLOCK);
+	}
 	if (!IS_ERR(vma)) {
 		node.start = i915_ggtt_offset(vma);
 		node.allocated = false;
-		ret = i915_vma_put_fence(vma);
-		if (ret) {
-			i915_vma_unpin(vma);
-			vma = ERR_PTR(ret);
-		}
-	}
-	if (IS_ERR(vma)) {
+	} else {
 		ret = insert_mappable_node(ggtt, &node, PAGE_SIZE);
 		if (ret)
 			goto out_unlock;
@@ -596,20 +593,17 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
 		wakeref = intel_runtime_pm_get(i915);
 	}
 
-	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
-				       PIN_MAPPABLE |
-				       PIN_NONFAULT |
-				       PIN_NONBLOCK);
+	vma = ERR_PTR(-ENODEV);
+	if (!i915_gem_object_is_tiled(obj)) {
+		vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
+					       PIN_MAPPABLE |
+					       PIN_NONFAULT |
+					       PIN_NONBLOCK);
+	}
 	if (!IS_ERR(vma)) {
 		node.start = i915_ggtt_offset(vma);
 		node.allocated = false;
-		ret = i915_vma_put_fence(vma);
-		if (ret) {
-			i915_vma_unpin(vma);
-			vma = ERR_PTR(ret);
-		}
-	}
-	if (IS_ERR(vma)) {
+	} else {
 		ret = insert_mappable_node(ggtt, &node, PAGE_SIZE);
 		if (ret)
 			goto out_rpm;
diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
index f24b626bf0f8..e596dd5ae1e2 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
@@ -287,7 +287,7 @@ static int fence_update(struct i915_fence_reg *fence,
 }
 
 /**
- * i915_vma_put_fence - force-remove fence for a VMA
+ * i915_vma_revoke_fence - force-remove fence for a VMA
  * @vma: vma to map linearly (not through a fence reg)
  *
  * This function force-removes any fence from the given object, which is useful
@@ -297,26 +297,18 @@ static int fence_update(struct i915_fence_reg *fence,
  *
  * 0 on success, negative error code on failure.
  */
-int i915_vma_put_fence(struct i915_vma *vma)
+int i915_vma_revoke_fence(struct i915_vma *vma)
 {
-	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm);
 	struct i915_fence_reg *fence = vma->fence;
-	int err;
 
+	lockdep_assert_held(&vma->vm->mutex);
 	if (!fence)
 		return 0;
 
 	if (atomic_read(&fence->pin_count))
 		return -EBUSY;
 
-	err = mutex_lock_interruptible(&ggtt->vm.mutex);
-	if (err)
-		return err;
-
-	err = fence_update(fence, NULL);
-	mutex_unlock(&ggtt->vm.mutex);
-
-	return err;
+	return fence_update(fence, NULL);
 }
 
 static struct i915_fence_reg *fence_find(struct drm_i915_private *i915)
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 33c6460eae8a..b5875db6f2c0 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -1065,7 +1065,9 @@ int i915_vma_unbind(struct i915_vma *vma)
 		GEM_BUG_ON(i915_vma_has_ggtt_write(vma));
 
 		/* release the fence reg _after_ flushing */
-		ret = i915_vma_put_fence(vma);
+		mutex_lock(&vma->vm->mutex);
+		ret = i915_vma_revoke_fence(vma);
+		mutex_unlock(&vma->vm->mutex);
 		if (ret)
 			return ret;
 
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index e54ed0b380a3..f9f4cbb99e62 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -431,8 +431,8 @@ static inline struct page *i915_vma_first_page(struct i915_vma *vma)
  *
  * True if the vma has a fence, false otherwise.
  */
-int i915_vma_pin_fence(struct i915_vma *vma);
-int __must_check i915_vma_put_fence(struct i915_vma *vma);
+int __must_check i915_vma_pin_fence(struct i915_vma *vma);
+int __must_check i915_vma_revoke_fence(struct i915_vma *vma);
 
 static inline void __i915_vma_unpin_fence(struct i915_vma *vma)
 {
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index 009cddc09367..cdc028220848 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -771,10 +771,6 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 	}
 	intel_frontbuffer_flush(new_bo->frontbuffer, ORIGIN_DIRTYFB);
 
-	ret = i915_vma_put_fence(vma);
-	if (ret)
-		goto out_unpin;
-
 	if (!overlay->active) {
 		u32 oconfig;
 
-- 
2.20.1

* [PATCH 34/39] drm/i915: Use vm->mutex for serialising GTT insertion
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (32 preceding siblings ...)
  2019-06-14  7:10 ` [PATCH 33/39] revoke-fence Chris Wilson
@ 2019-06-14  7:10 ` Chris Wilson
  2019-06-14 19:14   ` Matthew Auld
  2019-06-14  7:10 ` [PATCH 35/39] drm/i915: Pin pages before waiting Chris Wilson
                   ` (6 subsequent siblings)
  40 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:10 UTC (permalink / raw)
  To: intel-gfx

Serialising insertion into each of the global GTT and ppGTT accounts for
a large chunk of the current struct_mutex serialisation requirements.
(Note that it is not just the drm_mm / gtt management itself being
serialised, but the pin count and various flags.) Previously, the main
blocker for replacing this mutex was the reset handling, but with the
advent of "lockless" resets, we can freely take the vm->mutex and block
waiting for the GPU (without fear of deadlock if the GPU hangs). We also
proscribe allocations underneath vm->mutex, allowing us to take the
mutex inside the shrinker, avoiding the recursive struct_mutex we
previously used.
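
The shape of the new rule, as a sketch (reserve_node() is a hypothetical
caller; the locking is the point): all drm_mm manipulation for an
address space happens under that vm's own mutex, and no allocations are
attempted while it is held.

#include <drm/drm_mm.h>

#include "i915_gem_gtt.h"

static int reserve_node(struct i915_address_space *vm,
                        struct drm_mm_node *node, u64 size)
{
        int err;

        err = mutex_lock_interruptible(&vm->mutex);
        if (err)
                return err;

        /* No memory allocation may occur beyond this point. */
        err = drm_mm_insert_node_in_range(&vm->mm, node, size, 0,
                                          I915_COLOR_UNEVICTABLE,
                                          0, vm->total,
                                          DRM_MM_INSERT_BEST);
        mutex_unlock(&vm->mutex);
        return err;
}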

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_client_blt.c    |   7 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   5 +-
 drivers/gpu/drm/i915/gem/i915_gem_domain.c    |  33 +++---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   5 +-
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  27 ++---
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |   3 +-
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c  |  93 +--------------
 drivers/gpu/drm/i915/gem/i915_gem_tiling.c    |  60 +++++-----
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |  28 +----
 drivers/gpu/drm/i915/gt/intel_context.c       |   5 +-
 drivers/gpu/drm/i915/gt/intel_ringbuffer.c    |   4 +-
 drivers/gpu/drm/i915/gvt/aperture_gm.c        |  12 +-
 drivers/gpu/drm/i915/i915_debugfs.c           |  10 +-
 drivers/gpu/drm/i915/i915_gem.c               | 106 ++++++++++--------
 drivers/gpu/drm/i915/i915_gem_evict.c         |  28 +----
 drivers/gpu/drm/i915/i915_gem_fence_reg.c     |   2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c           |  76 +------------
 drivers/gpu/drm/i915/i915_gem_gtt.h           |   3 -
 drivers/gpu/drm/i915/i915_perf.c              |  20 +---
 drivers/gpu/drm/i915/i915_vma.c               |  83 ++++++++------
 drivers/gpu/drm/i915/i915_vma.h               |  61 +++-------
 drivers/gpu/drm/i915/intel_display.c          |  42 +------
 drivers/gpu/drm/i915/intel_fbdev.c            |   8 +-
 drivers/gpu/drm/i915/intel_guc.c              |   1 +
 drivers/gpu/drm/i915/intel_overlay.c          |  18 +--
 drivers/gpu/drm/i915/selftests/i915_vma.c     |   2 +
 27 files changed, 239 insertions(+), 505 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
index 22a262dfd07b..7c6fd310276e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
@@ -152,7 +152,6 @@ static void clear_pages_dma_fence_cb(struct dma_fence *fence,
 static void clear_pages_worker(struct work_struct *work)
 {
 	struct clear_pages_work *w = container_of(work, typeof(*w), work);
-	struct drm_i915_private *i915 = w->ce->gem_context->i915;
 	struct drm_i915_gem_object *obj = w->sleeve->vma->obj;
 	struct i915_vma *vma = w->sleeve->vma;
 	struct i915_request *rq;
@@ -168,11 +167,9 @@ static void clear_pages_worker(struct work_struct *work)
 		obj->cache_dirty = false;
 	}
 
-	/* XXX: we need to kill this */
-	mutex_lock(&i915->drm.struct_mutex);
 	err = i915_vma_pin(vma, 0, 0, PIN_USER);
 	if (unlikely(err))
-		goto out_unlock;
+		goto out_signal;
 
 	rq = i915_request_create(w->ce);
 	if (IS_ERR(rq)) {
@@ -208,8 +205,6 @@ static void clear_pages_worker(struct work_struct *work)
 	i915_request_add(rq);
 out_unpin:
 	i915_vma_unpin(vma);
-out_unlock:
-	mutex_unlock(&i915->drm.struct_mutex);
 out_signal:
 	if (unlikely(err)) {
 		dma_fence_set_error(&w->dma, err);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 24a714853f85..da6bf27697e4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -99,6 +99,7 @@ static void lut_close(struct i915_gem_context *ctx)
 	void __rcu **slot;
 
 	lockdep_assert_held(&ctx->mutex);
+	mutex_lock(&ctx->vm->mutex);
 
 	rcu_read_lock();
 	radix_tree_for_each_slot(slot, &ctx->handles_vma, &iter, 0) {
@@ -129,13 +130,15 @@ static void lut_close(struct i915_gem_context *ctx)
 			radix_tree_iter_delete(&ctx->handles_vma, &iter, slot);
 			GEM_BUG_ON(!atomic_read(&vma->open_count));
 			if (atomic_dec_and_test(&vma->open_count))
-				i915_vma_close(vma);
+				i915_vma_unbind(vma);
 			i915_vma_put(vma);
 		}
 
 		i915_gem_object_put(obj);
 	}
 	rcu_read_unlock();
+
+	mutex_unlock(&ctx->vm->mutex);
 }
 
 static struct intel_context *
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
index a62f7f0e2947..f9044bbdd429 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
@@ -26,7 +26,7 @@ static void __i915_gem_object_flush_for_display(struct drm_i915_gem_object *obj)
 
 void i915_gem_object_flush_if_display(struct drm_i915_gem_object *obj)
 {
-	if (!READ_ONCE(obj->pin_global))
+	if (!atomic_read(&obj->pin_global))
 		return;
 
 	i915_gem_object_lock(obj);
@@ -377,16 +377,11 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		goto out;
 
-	ret = mutex_lock_interruptible(&i915->drm.struct_mutex);
-	if (ret)
-		goto out;
-
 	ret = i915_gem_object_lock_interruptible(obj);
 	if (ret == 0) {
 		ret = i915_gem_object_set_cache_level(obj, level);
 		i915_gem_object_unlock(obj);
 	}
-	mutex_unlock(&i915->drm.struct_mutex);
 
 out:
 	i915_gem_object_put(obj);
@@ -413,7 +408,7 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 	/* Mark the global pin early so that we account for the
 	 * display coherency whilst setting up the cache domains.
 	 */
-	obj->pin_global++;
+	atomic_inc(&obj->pin_global);
 
 	/* The display engine is not coherent with the LLC cache on gen6.  As
 	 * a result, we make sure that the pinning that is about to occur is
@@ -463,25 +458,27 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 	return vma;
 
 err_unpin_global:
-	obj->pin_global--;
+	atomic_dec(&obj->pin_global);
 	return vma;
 }
 
 static void i915_gem_object_bump_inactive_ggtt(struct drm_i915_gem_object *obj)
 {
 	struct drm_i915_private *i915 = to_i915(obj->base.dev);
-	struct i915_vma *vma;
 
 	GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));
 
-	mutex_lock(&i915->ggtt.vm.mutex);
-	for_each_ggtt_vma(vma, obj) {
-		if (!drm_mm_node_allocated(&vma->node))
-			continue;
+	if (mutex_trylock(&i915->ggtt.vm.mutex)) {
+		struct i915_vma *vma;
 
-		list_move_tail(&vma->vm_link, &vma->vm->bound_list);
+		for_each_ggtt_vma(vma, obj) {
+			if (!drm_mm_node_allocated(&vma->node))
+				continue;
+
+			list_move_tail(&vma->vm_link, &vma->vm->bound_list);
+		}
+		mutex_unlock(&i915->ggtt.vm.mutex);
 	}
-	mutex_unlock(&i915->ggtt.vm.mutex);
 
 	if (i915_gem_object_is_shrinkable(obj)) {
 		unsigned long flags;
@@ -500,12 +497,10 @@ i915_gem_object_unpin_from_display_plane(struct i915_vma *vma)
 {
 	struct drm_i915_gem_object *obj = vma->obj;
 
-	assert_object_held(obj);
-
-	if (WARN_ON(obj->pin_global == 0))
+	if (GEM_WARN_ON(!atomic_read(&obj->pin_global)))
 		return;
 
-	if (--obj->pin_global == 0)
+	if (atomic_dec_and_test(&obj->pin_global))
 		vma->display_alignment = I915_GTT_MIN_ALIGNMENT;
 
 	/* Bump the LRU to try and avoid premature eviction whilst flipping  */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index f12dc0f521c9..44bcb681c168 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -547,8 +547,11 @@ eb_add_vma(struct i915_execbuffer *eb,
 		eb_unreserve_vma(vma, vma->exec_flags);
 
 		list_add_tail(&vma->exec_link, &eb->unbound);
-		if (drm_mm_node_allocated(&vma->node))
+		if (drm_mm_node_allocated(&vma->node)) {
+			mutex_lock(&vma->vm->mutex);
 			err = i915_vma_unbind(vma);
+			mutex_unlock(&vma->vm->mutex);
+		}
 		if (unlikely(err))
 			vma->exec_flags = NULL;
 	}
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index 49cf9ad97bfc..9fa0569afb50 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -248,16 +248,6 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 		goto err_rpm;
 	}
 
-	ret = i915_mutex_lock_interruptible(dev);
-	if (ret)
-		goto err_reset;
-
-	/* Access to snoopable pages through the GTT is incoherent. */
-	if (obj->cache_level != I915_CACHE_NONE && !HAS_LLC(i915)) {
-		ret = -EFAULT;
-		goto err_unlock;
-	}
-
 	/* Now pin it into the GTT as needed */
 	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
 				       PIN_MAPPABLE |
@@ -287,13 +277,19 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 	}
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
-		goto err_unlock;
+		goto err_reset;
 	}
 
-	ret = i915_vma_pin_fence(vma);
+	assert_rpm_wakelock_held(i915);
+
+	ret = mutex_lock_interruptible(&vma->vm->mutex);
 	if (ret)
 		goto err_unpin;
 
+	ret = __i915_vma_pin_fence(vma);
+	if (ret)
+		goto err_unlock;
+
 	/* Finally, remap it using the new GTT offset */
 	ret = remap_io_mapping(area,
 			       area->vm_start + (vma->ggtt_view.partial.offset << PAGE_SHIFT),
@@ -304,7 +300,6 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 		goto err_fence;
 
 	/* Mark as being mmapped into userspace for later revocation */
-	assert_rpm_wakelock_held(i915);
 	if (!i915_vma_set_userfault(vma) && !obj->userfault_count++)
 		list_add(&obj->userfault_link, &i915->ggtt.userfault_list);
 	if (CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND)
@@ -316,10 +311,10 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 
 err_fence:
 	i915_vma_unpin_fence(vma);
+err_unlock:
+	mutex_unlock(&vma->vm->mutex);
 err_unpin:
 	__i915_vma_unpin(vma);
-err_unlock:
-	mutex_unlock(&dev->struct_mutex);
 err_reset:
 	i915_reset_unlock(i915, srcu);
 err_rpm:
@@ -405,7 +400,7 @@ void i915_gem_object_release_mmap(struct drm_i915_gem_object *obj)
 	 * requirement that operations to the GGTT be made holding the RPM
 	 * wakeref.
 	 */
-	lockdep_assert_held(&i915->drm.struct_mutex);
+	lockdep_assert_held(&i915->ggtt.vm.mutex);
 	wakeref = intel_runtime_pm_get(i915);
 
 	if (!obj->userfault_count)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 7ea8013d108f..0fc54924b33a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -403,7 +403,8 @@ static inline bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
 	if (!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE))
 		return true;
 
-	return obj->pin_global; /* currently in use by HW, keep flushed */
+	/* Currently in use by HW? Keep flushed. */
+	return atomic_read(&obj->pin_global);
 }
 
 static inline void __start_cpu_write(struct drm_i915_gem_object *obj)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 8f61d7a93078..f792953b8a71 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -159,7 +159,7 @@ struct drm_i915_gem_object {
 	/** Count of VMA actually bound by this object */
 	atomic_t bind_count;
 	/** Count of how many global VMA are currently pinned for use by HW */
-	unsigned int pin_global;
+	atomic_t pin_global;
 
 	struct {
 		struct mutex lock; /* protects the pages and their use */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
index 7445e6f662ac..f833e1bd7c2f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
@@ -16,40 +16,6 @@
 
 #include "i915_trace.h"
 
-static bool shrinker_lock(struct drm_i915_private *i915,
-			  unsigned int flags,
-			  bool *unlock)
-{
-	struct mutex *m = &i915->drm.struct_mutex;
-
-	switch (mutex_trylock_recursive(m)) {
-	case MUTEX_TRYLOCK_RECURSIVE:
-		*unlock = false;
-		return true;
-
-	case MUTEX_TRYLOCK_FAILED:
-		*unlock = false;
-		if (flags & I915_SHRINK_ACTIVE &&
-		    mutex_lock_killable_nested(m, I915_MM_SHRINKER) == 0)
-			*unlock = true;
-		return *unlock;
-
-	case MUTEX_TRYLOCK_SUCCESS:
-		*unlock = true;
-		return true;
-	}
-
-	BUG();
-}
-
-static void shrinker_unlock(struct drm_i915_private *i915, bool unlock)
-{
-	if (!unlock)
-		return;
-
-	mutex_unlock(&i915->drm.struct_mutex);
-}
-
 static bool swap_available(void)
 {
 	return get_nr_swap_pages() > 0;
@@ -78,7 +44,7 @@ static bool can_release_pages(struct drm_i915_gem_object *obj)
 	 * To simplify the scan, and to avoid walking the list of vma under the
 	 * object, we just check the count of its permanently pinned.
 	 */
-	if (READ_ONCE(obj->pin_global))
+	if (atomic_read(&obj->pin_global))
 		return false;
 
 	/* We can only return physical pages to the system if we can either
@@ -154,10 +120,6 @@ i915_gem_shrink(struct drm_i915_private *i915,
 	intel_wakeref_t wakeref = 0;
 	unsigned long count = 0;
 	unsigned long scanned = 0;
-	bool unlock;
-
-	if (!shrinker_lock(i915, shrink, &unlock))
-		return 0;
 
 	/*
 	 * When shrinking the active list, also consider active contexts.
@@ -174,7 +136,6 @@ i915_gem_shrink(struct drm_i915_private *i915,
 				       MAX_SCHEDULE_TIMEOUT);
 
 	trace_i915_gem_shrink(i915, target, shrink);
-	i915_retire_requests(i915);
 
 	/*
 	 * Unbinding of objects will require HW access; Let us not wake the
@@ -216,13 +177,6 @@ i915_gem_shrink(struct drm_i915_private *i915,
 
 		INIT_LIST_HEAD(&still_in_list);
 
-		/*
-		 * We serialize our access to unreferenced objects through
-		 * the use of the struct_mutex. While the objects are not
-		 * yet freed (due to RCU then a workqueue) we still want
-		 * to be able to shrink their pages, so they remain on
-		 * the unbound/bound list until actually freed.
-		 */
 		spin_lock_irqsave(&i915->mm.obj_lock, flags);
 		while (count < target &&
 		       (obj = list_first_entry_or_null(phase->list,
@@ -270,10 +224,6 @@ i915_gem_shrink(struct drm_i915_private *i915,
 	if (shrink & I915_SHRINK_BOUND)
 		intel_runtime_pm_put(i915, wakeref);
 
-	i915_retire_requests(i915);
-
-	shrinker_unlock(i915, unlock);
-
 	if (nr_scanned)
 		*nr_scanned += scanned;
 	return count;
@@ -343,13 +293,9 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
 	struct drm_i915_private *i915 =
 		container_of(shrinker, struct drm_i915_private, mm.shrinker);
 	unsigned long freed;
-	bool unlock;
 
 	sc->nr_scanned = 0;
 
-	if (!shrinker_lock(i915, 0, &unlock))
-		return SHRINK_STOP;
-
 	freed = i915_gem_shrink(i915,
 				sc->nr_to_scan,
 				&sc->nr_scanned,
@@ -370,8 +316,6 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
 		}
 	}
 
-	shrinker_unlock(i915, unlock);
-
 	return sc->nr_scanned ? freed : SHRINK_STOP;
 }
 
@@ -423,16 +367,12 @@ i915_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr
 	struct i915_vma *vma, *next;
 	unsigned long freed_pages = 0;
 	intel_wakeref_t wakeref;
-	bool unlock;
-
-	if (!shrinker_lock(i915, 0, &unlock))
-		return NOTIFY_DONE;
 
 	/* Force everything onto the inactive lists */
 	if (i915_gem_wait_for_idle(i915,
 				   I915_WAIT_LOCKED,
 				   MAX_SCHEDULE_TIMEOUT))
-		goto out;
+		return NOTIFY_DONE;
 
 	with_intel_runtime_pm(i915, wakeref)
 		freed_pages += i915_gem_shrink(i915, -1UL, NULL,
@@ -449,16 +389,11 @@ i915_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr
 		if (!vma->iomap || i915_vma_is_active(vma))
 			continue;
 
-		mutex_unlock(&i915->ggtt.vm.mutex);
 		if (i915_vma_unbind(vma) == 0)
 			freed_pages += count;
-		mutex_lock(&i915->ggtt.vm.mutex);
 	}
 	mutex_unlock(&i915->ggtt.vm.mutex);
 
-out:
-	shrinker_unlock(i915, unlock);
-
 	*(unsigned long *)ptr += freed_pages;
 	return NOTIFY_DONE;
 }
@@ -500,37 +435,13 @@ void i915_gem_shrinker_unregister(struct drm_i915_private *i915)
 void i915_gem_shrinker_taints_mutex(struct drm_i915_private *i915,
 				    struct mutex *mutex)
 {
-	bool unlock = false;
-
 	if (!IS_ENABLED(CONFIG_LOCKDEP))
 		return;
 
-	if (!lockdep_is_held_type(&i915->drm.struct_mutex, -1)) {
-		mutex_acquire(&i915->drm.struct_mutex.dep_map,
-			      I915_MM_NORMAL, 0, _RET_IP_);
-		unlock = true;
-	}
-
 	fs_reclaim_acquire(GFP_KERNEL);
 
-	/*
-	 * As we invariably rely on the struct_mutex within the shrinker,
-	 * but have a complicated recursion dance, taint all the mutexes used
-	 * within the shrinker with the struct_mutex. For completeness, we
-	 * taint with all subclass of struct_mutex, even though we should
-	 * only need tainting by I915_MM_NORMAL to catch possible ABBA
-	 * deadlocks from using struct_mutex inside @mutex.
-	 */
-	mutex_acquire(&i915->drm.struct_mutex.dep_map,
-		      I915_MM_SHRINKER, 0, _RET_IP_);
-
 	mutex_acquire(&mutex->dep_map, 0, 0, _RET_IP_);
 	mutex_release(&mutex->dep_map, 0, _RET_IP_);
 
-	mutex_release(&i915->drm.struct_mutex.dep_map, 0, _RET_IP_);
-
 	fs_reclaim_release(GFP_KERNEL);
-
-	if (unlock)
-		mutex_release(&i915->drm.struct_mutex.dep_map, 0, _RET_IP_);
 }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_tiling.c b/drivers/gpu/drm/i915/gem/i915_gem_tiling.c
index ca0c2f451742..f919dbb36edb 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_tiling.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_tiling.c
@@ -212,15 +212,18 @@ i915_gem_object_set_tiling(struct drm_i915_gem_object *obj,
 
 	GEM_BUG_ON(!i915_tiling_ok(obj, tiling, stride));
 	GEM_BUG_ON(!stride ^ (tiling == I915_TILING_NONE));
-	lockdep_assert_held(&i915->drm.struct_mutex);
 
 	if ((tiling | stride) == obj->tiling_and_stride)
 		return 0;
 
-	if (i915_gem_object_is_framebuffer(obj))
+	i915_gem_object_lock(obj);
+	if (i915_gem_object_is_framebuffer(obj)) {
+		i915_gem_object_unlock(obj);
 		return -EBUSY;
+	}
 
-	/* We need to rebind the object if its current allocation
+	/*
+	 * We need to rebind the object if its current allocation
 	 * no longer meets the alignment restrictions for its new
 	 * tiling mode. Otherwise we can just leave it alone, but
 	 * need to ensure that any fence register is updated before
@@ -233,17 +236,37 @@ i915_gem_object_set_tiling(struct drm_i915_gem_object *obj,
 	 * whilst executing a fenced command for an untiled object.
 	 */
 
-	err = i915_gem_object_fence_prepare(obj, tiling, stride);
-	if (err)
+	err = mutex_lock_interruptible(&i915->ggtt.vm.mutex);
+	if (err) {
+		i915_gem_object_unlock(obj);
 		return err;
+	}
 
-	i915_gem_object_lock(obj);
-	if (i915_gem_object_is_framebuffer(obj)) {
+	err = i915_gem_object_fence_prepare(obj, tiling, stride);
+	if (err) {
+		mutex_unlock(&i915->ggtt.vm.mutex);
 		i915_gem_object_unlock(obj);
-		return -EBUSY;
+		return err;
 	}
 
-	/* If the memory has unknown (i.e. varying) swizzling, we pin the
+	for_each_ggtt_vma(vma, obj) {
+		vma->fence_size =
+			i915_gem_fence_size(i915, vma->size, tiling, stride);
+		vma->fence_alignment =
+			i915_gem_fence_alignment(i915,
+						 vma->size, tiling, stride);
+
+		if (vma->fence)
+			vma->fence->dirty = true;
+	}
+
+	/* Force the fence to be reacquired for GTT access */
+	i915_gem_object_release_mmap(obj);
+
+	mutex_unlock(&i915->ggtt.vm.mutex);
+
+	/*
+	 * If the memory has unknown (i.e. varying) swizzling, we pin the
 	 * pages to prevent them being swapped out and causing corruption
 	 * due to the change in swizzling.
 	 */
@@ -264,23 +287,9 @@ i915_gem_object_set_tiling(struct drm_i915_gem_object *obj,
 	}
 	mutex_unlock(&obj->mm.lock);
 
-	for_each_ggtt_vma(vma, obj) {
-		vma->fence_size =
-			i915_gem_fence_size(i915, vma->size, tiling, stride);
-		vma->fence_alignment =
-			i915_gem_fence_alignment(i915,
-						 vma->size, tiling, stride);
-
-		if (vma->fence)
-			vma->fence->dirty = true;
-	}
-
 	obj->tiling_and_stride = tiling | stride;
 	i915_gem_object_unlock(obj);
 
-	/* Force the fence to be reacquired for GTT access */
-	i915_gem_object_release_mmap(obj);
-
 	/* Try to preallocate memory required to save swizzling on put-pages */
 	if (i915_gem_object_needs_bit17_swizzle(obj)) {
 		if (!obj->bit_17) {
@@ -364,12 +373,7 @@ i915_gem_set_tiling_ioctl(struct drm_device *dev, void *data,
 		}
 	}
 
-	err = mutex_lock_interruptible(&dev->struct_mutex);
-	if (err)
-		goto err;
-
 	err = i915_gem_object_set_tiling(obj, args->tiling_mode, args->stride);
-	mutex_unlock(&dev->struct_mutex);
 
 	/* We have to maintain this existing ABI... */
 	args->stride = i915_gem_object_get_stride(obj);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 528b61678334..f093deaeb5c0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -93,7 +93,6 @@ userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
 	struct i915_mmu_notifier *mn =
 		container_of(_mn, struct i915_mmu_notifier, mn);
 	struct interval_tree_node *it;
-	struct mutex *unlock = NULL;
 	unsigned long end;
 	int ret = 0;
 
@@ -130,32 +129,12 @@ userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
 		}
 		spin_unlock(&mn->lock);
 
-		if (!unlock) {
-			unlock = &mn->mm->i915->drm.struct_mutex;
-
-			switch (mutex_trylock_recursive(unlock)) {
-			default:
-			case MUTEX_TRYLOCK_FAILED:
-				if (mutex_lock_killable_nested(unlock, I915_MM_SHRINKER)) {
-					i915_gem_object_put(obj);
-					return -EINTR;
-				}
-				/* fall through */
-			case MUTEX_TRYLOCK_SUCCESS:
-				break;
-
-			case MUTEX_TRYLOCK_RECURSIVE:
-				unlock = ERR_PTR(-EEXIST);
-				break;
-			}
-		}
-
 		ret = i915_gem_object_unbind(obj);
 		if (ret == 0)
 			ret = __i915_gem_object_put_pages(obj, I915_MM_SHRINKER);
 		i915_gem_object_put(obj);
 		if (ret)
-			goto unlock;
+			return ret;
 
 		spin_lock(&mn->lock);
 
@@ -168,12 +147,7 @@ userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
 	}
 	spin_unlock(&mn->lock);
 
-unlock:
-	if (!IS_ERR_OR_NULL(unlock))
-		mutex_unlock(unlock);
-
 	return ret;
-
 }
 
 static const struct mmu_notifier_ops i915_gem_userptr_notifier = {
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index a109904faa57..967256d7e8a6 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -110,7 +110,7 @@ static int __context_pin_state(struct i915_vma *vma)
 	 * And mark it as a globally pinned object to let the shrinker know
 	 * it cannot reclaim the object until we release it.
 	 */
-	vma->obj->pin_global++;
+	atomic_inc(&vma->obj->pin_global);
 	vma->obj->mm.dirty = true;
 
 	return 0;
@@ -118,7 +118,8 @@ static int __context_pin_state(struct i915_vma *vma)
 
 static void __context_unpin_state(struct i915_vma *vma)
 {
-	vma->obj->pin_global--;
+	GEM_BUG_ON(!atomic_read(&vma->obj->pin_global));
+	atomic_dec(&vma->obj->pin_global);
 	__i915_vma_unpin(vma);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
index d3783beaf132..95d9adc7e2e9 100644
--- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
@@ -1183,7 +1183,7 @@ int intel_ring_pin(struct intel_ring *ring)
 		goto unpin_ring;
 	}
 
-	vma->obj->pin_global++;
+	atomic_inc(&vma->obj->pin_global);
 
 	ring->vaddr = addr;
 	return 0;
@@ -1219,7 +1219,7 @@ void intel_ring_unpin(struct intel_ring *ring)
 		i915_gem_object_unpin_map(ring->vma->obj);
 	ring->vaddr = NULL;
 
-	ring->vma->obj->pin_global--;
+	atomic_dec(&ring->vma->obj->pin_global);
 	i915_vma_unpin(ring->vma);
 
 	i915_timeline_unpin(ring->timeline);
diff --git a/drivers/gpu/drm/i915/gvt/aperture_gm.c b/drivers/gpu/drm/i915/gvt/aperture_gm.c
index 2bd9dcebd48c..a2d7a1df820f 100644
--- a/drivers/gpu/drm/i915/gvt/aperture_gm.c
+++ b/drivers/gpu/drm/i915/gvt/aperture_gm.c
@@ -61,14 +61,14 @@ static int alloc_gm(struct intel_vgpu *vgpu, bool high_gm)
 		flags = PIN_MAPPABLE;
 	}
 
-	mutex_lock(&dev_priv->drm.struct_mutex);
+	mutex_lock(&dev_priv->ggtt.vm.mutex);
 	mmio_hw_access_pre(dev_priv);
 	ret = i915_gem_gtt_insert(&dev_priv->ggtt.vm, node,
 				  size, I915_GTT_PAGE_SIZE,
 				  I915_COLOR_UNEVICTABLE,
 				  start, end, flags);
 	mmio_hw_access_post(dev_priv);
-	mutex_unlock(&dev_priv->drm.struct_mutex);
+	mutex_unlock(&dev_priv->ggtt.vm.mutex);
 	if (ret)
 		gvt_err("fail to alloc %s gm space from host\n",
 			high_gm ? "high" : "low");
@@ -98,9 +98,9 @@ static int alloc_vgpu_gm(struct intel_vgpu *vgpu)
 
 	return 0;
 out_free_aperture:
-	mutex_lock(&dev_priv->drm.struct_mutex);
+	mutex_lock(&dev_priv->ggtt.vm.mutex);
 	drm_mm_remove_node(&vgpu->gm.low_gm_node);
-	mutex_unlock(&dev_priv->drm.struct_mutex);
+	mutex_unlock(&dev_priv->ggtt.vm.mutex);
 	return ret;
 }
 
@@ -108,10 +108,10 @@ static void free_vgpu_gm(struct intel_vgpu *vgpu)
 {
 	struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv;
 
-	mutex_lock(&dev_priv->drm.struct_mutex);
+	mutex_lock(&dev_priv->ggtt.vm.mutex);
 	drm_mm_remove_node(&vgpu->gm.low_gm_node);
 	drm_mm_remove_node(&vgpu->gm.high_gm_node);
-	mutex_unlock(&dev_priv->drm.struct_mutex);
+	mutex_unlock(&dev_priv->ggtt.vm.mutex);
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index b54f7b79a53c..cbeaa53117af 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -76,7 +76,7 @@ static int i915_capabilities(struct seq_file *m, void *data)
 
 static char get_pin_flag(struct drm_i915_gem_object *obj)
 {
-	return obj->pin_global ? 'p' : ' ';
+	return atomic_read(&obj->pin_global) ? 'p' : ' ';
 }
 
 static char get_tiling_flag(struct drm_i915_gem_object *obj)
@@ -218,7 +218,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 	seq_printf(m, " (pinned x %d)", pin_count);
 	if (obj->stolen)
 		seq_printf(m, " (stolen: %08llx)", obj->stolen->start);
-	if (obj->pin_global)
+	if (atomic_read(&obj->pin_global))
 		seq_printf(m, " (global)");
 
 	engine = i915_gem_object_last_write_engine(obj);
@@ -1604,11 +1604,6 @@ static int i915_gem_framebuffer_info(struct seq_file *m, void *data)
 	struct drm_device *dev = &dev_priv->drm;
 	struct intel_framebuffer *fbdev_fb = NULL;
 	struct drm_framebuffer *drm_fb;
-	int ret;
-
-	ret = mutex_lock_interruptible(&dev->struct_mutex);
-	if (ret)
-		return ret;
 
 #ifdef CONFIG_DRM_FBDEV_EMULATION
 	if (dev_priv->fbdev && dev_priv->fbdev->helper.fb) {
@@ -1643,7 +1638,6 @@ static int i915_gem_framebuffer_info(struct seq_file *m, void *data)
 		seq_putc(m, '\n');
 	}
 	mutex_unlock(&dev->mode_config.fb_lock);
-	mutex_unlock(&dev->struct_mutex);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fd448afae492..9e835355eb93 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -60,20 +60,31 @@
 #include "intel_pm.h"
 
 static int
-insert_mappable_node(struct i915_ggtt *ggtt,
-                     struct drm_mm_node *node, u32 size)
+insert_mappable_node(struct i915_ggtt *ggtt, struct drm_mm_node *node, u32 size)
 {
+	int ret;
+
 	memset(node, 0, sizeof(*node));
-	return drm_mm_insert_node_in_range(&ggtt->vm.mm, node,
-					   size, 0, I915_COLOR_UNEVICTABLE,
-					   0, ggtt->mappable_end,
-					   DRM_MM_INSERT_LOW);
+
+	ret = mutex_lock_interruptible(&ggtt->vm.mutex);
+	if (ret)
+		return ret;
+
+	ret = drm_mm_insert_node_in_range(&ggtt->vm.mm, node,
+					  size, 0, I915_COLOR_UNEVICTABLE,
+					  0, ggtt->mappable_end,
+					  DRM_MM_INSERT_LOW);
+	mutex_unlock(&ggtt->vm.mutex);
+
+	return ret;
 }
 
 static void
-remove_mappable_node(struct drm_mm_node *node)
+remove_mappable_node(struct i915_ggtt *ggtt, struct drm_mm_node *node)
 {
+	mutex_lock(&ggtt->vm.mutex);
 	drm_mm_remove_node(node);
+	mutex_unlock(&ggtt->vm.mutex);
 }
 
 int
@@ -84,8 +95,11 @@ i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
 	struct drm_i915_gem_get_aperture *args = data;
 	struct i915_vma *vma;
 	u64 pinned;
+	int ret;
 
-	mutex_lock(&ggtt->vm.mutex);
+	ret = mutex_lock_interruptible(&ggtt->vm.mutex);
+	if (ret)
+		return ret;
 
 	pinned = ggtt->vm.reserved;
 	list_for_each_entry(vma, &ggtt->vm.bound_list, vm_link)
@@ -106,8 +120,6 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj)
 	LIST_HEAD(still_in_list);
 	int ret = 0;
 
-	lockdep_assert_held(&obj->base.dev->struct_mutex);
-
 	spin_lock(&obj->vma.lock);
 	while (!ret && (vma = list_first_entry_or_null(&obj->vma.list,
 						       struct i915_vma,
@@ -117,7 +129,11 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj)
 		list_move_tail(&vma->obj_link, &still_in_list);
 		spin_unlock(&obj->vma.lock);
 
-		ret = i915_vma_unbind(vma);
+		ret = mutex_lock_interruptible(&vma->vm->mutex);
+		if (!ret) {
+			ret = i915_vma_unbind(vma);
+			mutex_unlock(&vma->vm->mutex);
+		}
 
 		i915_vma_put(vma);
 		spin_lock(&obj->vma.lock);
@@ -375,10 +391,6 @@ i915_gem_gtt_pread(struct drm_i915_gem_object *obj,
 	u64 remain, offset;
 	int ret;
 
-	ret = mutex_lock_interruptible(&i915->drm.struct_mutex);
-	if (ret)
-		return ret;
-
 	wakeref = intel_runtime_pm_get(i915);
 	vma = ERR_PTR(-ENODEV);
 	if (!i915_gem_object_is_tiled(obj)) {
@@ -393,12 +405,10 @@ i915_gem_gtt_pread(struct drm_i915_gem_object *obj,
 	} else {
 		ret = insert_mappable_node(ggtt, &node, PAGE_SIZE);
 		if (ret)
-			goto out_unlock;
+			goto out_rpm;
 		GEM_BUG_ON(!node.allocated);
 	}
 
-	mutex_unlock(&i915->drm.struct_mutex);
-
 	ret = i915_gem_object_lock_interruptible(obj);
 	if (ret)
 		goto out_unpin;
@@ -454,17 +464,15 @@ i915_gem_gtt_pread(struct drm_i915_gem_object *obj,
 
 	i915_gem_object_unlock_fence(obj, fence);
 out_unpin:
-	mutex_lock(&i915->drm.struct_mutex);
 	if (node.allocated) {
 		wmb();
 		ggtt->vm.clear_range(&ggtt->vm, node.start, node.size);
-		remove_mappable_node(&node);
+		remove_mappable_node(ggtt, &node);
 	} else {
 		i915_vma_unpin(vma);
 	}
-out_unlock:
+out_rpm:
 	intel_runtime_pm_put(i915, wakeref);
-	mutex_unlock(&i915->drm.struct_mutex);
 
 	return ret;
 }
@@ -571,10 +579,6 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
 	void __user *user_data;
 	int ret;
 
-	ret = mutex_lock_interruptible(&i915->drm.struct_mutex);
-	if (ret)
-		return ret;
-
 	if (i915_gem_object_has_struct_page(obj)) {
 		/*
 		 * Avoid waking the device up if we can fallback, as
@@ -584,10 +588,8 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
 		 * using the cache bypass of indirect GGTT access.
 		 */
 		wakeref = intel_runtime_pm_get_if_in_use(i915);
-		if (!wakeref) {
-			ret = -EFAULT;
-			goto out_unlock;
-		}
+		if (!wakeref)
+			return -EFAULT;
 	} else {
 		/* No backing pages, no fallback, we must force GGTT access */
 		wakeref = intel_runtime_pm_get(i915);
@@ -610,8 +612,6 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
 		GEM_BUG_ON(!node.allocated);
 	}
 
-	mutex_unlock(&i915->drm.struct_mutex);
-
 	ret = i915_gem_object_lock_interruptible(obj);
 	if (ret)
 		goto out_unpin;
@@ -674,18 +674,15 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
 
 	i915_gem_object_unlock_fence(obj, fence);
 out_unpin:
-	mutex_lock(&i915->drm.struct_mutex);
 	if (node.allocated) {
 		wmb();
 		ggtt->vm.clear_range(&ggtt->vm, node.start, node.size);
-		remove_mappable_node(&node);
+		remove_mappable_node(ggtt, &node);
 	} else {
 		i915_vma_unpin(vma);
 	}
 out_rpm:
 	intel_runtime_pm_put(i915, wakeref);
-out_unlock:
-	mutex_unlock(&i915->drm.struct_mutex);
 	return ret;
 }
 
@@ -1036,8 +1033,6 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 	struct i915_vma *vma;
 	int ret;
 
-	lockdep_assert_held(&obj->base.dev->struct_mutex);
-
 	if (flags & PIN_MAPPABLE &&
 	    (!view || view->type == I915_GGTT_VIEW_NORMAL)) {
 		/* If the required space is larger than the available
@@ -1074,14 +1069,28 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 	if (IS_ERR(vma))
 		return vma;
 
+	ret = i915_gem_object_pin_pages(obj);
+	if (ret)
+		return ERR_PTR(ret);
+
+	ret = mutex_lock_interruptible(&vm->mutex);
+	if (ret) {
+		vma = ERR_PTR(ret);
+		goto unpin;
+	}
+
 	if (i915_vma_misplaced(vma, size, alignment, flags)) {
 		if (flags & PIN_NONBLOCK) {
-			if (i915_vma_is_pinned(vma) || i915_vma_is_active(vma))
-				return ERR_PTR(-ENOSPC);
+			if (i915_vma_is_pinned(vma) || i915_vma_is_active(vma)) {
+				vma = ERR_PTR(-ENOSPC);
+				goto unlock;
+			}
 
 			if (flags & PIN_MAPPABLE &&
-			    vma->fence_size > dev_priv->ggtt.mappable_end / 2)
-				return ERR_PTR(-ENOSPC);
+			    vma->fence_size > dev_priv->ggtt.mappable_end / 2) {
+				vma = ERR_PTR(-ENOSPC);
+				goto unlock;
+			}
 		}
 
 		WARN(i915_vma_is_pinned(vma),
@@ -1094,15 +1103,20 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 		ret = i915_vma_unbind(vma);
 		if (ret) {
 			i915_vma_put(vma);
-			return ERR_PTR(ret);
+			vma = ERR_PTR(ret);
+			goto unlock;
 		}
 	}
 
-	ret = i915_vma_pin(vma, size, alignment, flags | PIN_GLOBAL);
+	ret = __i915_vma_do_pin(vma, size, alignment, flags | PIN_GLOBAL);
 	i915_vma_put(vma);
 	if (ret)
-		return ERR_PTR(ret);
+		vma = ERR_PTR(ret);
 
+unlock:
+	mutex_unlock(&vm->mutex);
+unpin:
+	i915_gem_object_unpin_pages(obj);
 	return vma;
 }
 
@@ -1401,7 +1415,9 @@ static int __intel_engines_record_defaults(struct drm_i915_private *i915)
 		 * from the GTT to prevent such accidents and reclaim the
 		 * space.
 		 */
+		mutex_lock(&state->vm->mutex);
 		err = i915_vma_unbind(state);
+		mutex_unlock(&state->vm->mutex);
 		if (err)
 			goto err_active;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index a5783c4cb98b..0202f58d8bb5 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -48,8 +48,7 @@ static int ggtt_flush(struct drm_i915_private *i915)
 	 * bound by their active reference.
 	 */
 	return i915_gem_wait_for_idle(i915,
-				      I915_WAIT_INTERRUPTIBLE |
-				      I915_WAIT_LOCKED,
+				      I915_WAIT_INTERRUPTIBLE,
 				      MAX_SCHEDULE_TIMEOUT);
 }
 
@@ -108,7 +107,7 @@ i915_gem_evict_something(struct i915_address_space *vm,
 	struct i915_vma *active;
 	int ret;
 
-	lockdep_assert_held(&vm->i915->drm.struct_mutex);
+	lockdep_assert_held(&vm->mutex);
 	trace_i915_gem_evict(vm, min_size, alignment, flags);
 
 	/*
@@ -131,15 +130,6 @@ i915_gem_evict_something(struct i915_address_space *vm,
 				    min_size, alignment, cache_level,
 				    start, end, mode);
 
-	/*
-	 * Retire before we search the active list. Although we have
-	 * reasonable accuracy in our retirement lists, we may have
-	 * a stray pin (preventing eviction) that can only be resolved by
-	 * retiring.
-	 */
-	if (!(flags & PIN_NONBLOCK))
-		i915_retire_requests(dev_priv);
-
 search_again:
 	active = NULL;
 	INIT_LIST_HEAD(&eviction_list);
@@ -273,20 +263,12 @@ int i915_gem_evict_for_node(struct i915_address_space *vm,
 	bool check_color;
 	int ret = 0;
 
-	lockdep_assert_held(&vm->i915->drm.struct_mutex);
+	lockdep_assert_held(&vm->mutex);
 	GEM_BUG_ON(!IS_ALIGNED(start, I915_GTT_PAGE_SIZE));
 	GEM_BUG_ON(!IS_ALIGNED(end, I915_GTT_PAGE_SIZE));
 
 	trace_i915_gem_evict_node(vm, target, flags);
 
-	/* Retire before we search the active list. Although we have
-	 * reasonable accuracy in our retirement lists, we may have
-	 * a stray pin (preventing eviction) that can only be resolved by
-	 * retiring.
-	 */
-	if (!(flags & PIN_NONBLOCK))
-		i915_retire_requests(vm->i915);
-
 	check_color = vm->mm.color_adjust;
 	if (check_color) {
 		/* Expand search to cover neighbouring guard pages (or lack!) */
@@ -384,7 +366,7 @@ int i915_gem_evict_vm(struct i915_address_space *vm)
 	struct i915_vma *vma, *next;
 	int ret;
 
-	lockdep_assert_held(&vm->i915->drm.struct_mutex);
+	lockdep_assert_held(&vm->mutex);
 	trace_i915_gem_evict_vm(vm);
 
 	/* Switch back to the default context in order to unpin
@@ -399,7 +381,6 @@ int i915_gem_evict_vm(struct i915_address_space *vm)
 	}
 
 	INIT_LIST_HEAD(&eviction_list);
-	mutex_lock(&vm->mutex);
 	list_for_each_entry(vma, &vm->bound_list, vm_link) {
 		if (i915_vma_is_pinned(vma))
 			continue;
@@ -407,7 +388,6 @@ int i915_gem_evict_vm(struct i915_address_space *vm)
 		__i915_vma_pin(vma);
 		list_add(&vma->evict_link, &eviction_list);
 	}
-	mutex_unlock(&vm->mutex);
 
 	ret = 0;
 	list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
index e596dd5ae1e2..17215de2a8b8 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
@@ -331,7 +331,7 @@ static struct i915_fence_reg *fence_find(struct drm_i915_private *i915)
 	return ERR_PTR(-EDEADLK);
 }
 
-static int __i915_vma_pin_fence(struct i915_vma *vma)
+int __i915_vma_pin_fence(struct i915_vma *vma)
 {
 	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm);
 	struct i915_fence_reg *fence;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index fb484685ff1b..c7d7959f761f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1929,59 +1929,11 @@ static void gen6_ppgtt_free_pd(struct gen6_ppgtt *ppgtt)
 			free_pt(&ppgtt->base.vm, pt);
 }
 
-struct gen6_ppgtt_cleanup_work {
-	struct work_struct base;
-	struct i915_vma *vma;
-};
-
-static void gen6_ppgtt_cleanup_work(struct work_struct *wrk)
-{
-	struct gen6_ppgtt_cleanup_work *work =
-		container_of(wrk, typeof(*work), base);
-
-	/* Side note, vma->vm is the GGTT not the ppgtt we just destroyed! */
-	i915_vma_destroy(work->vma);
-
-	kfree(work);
-}
-
-static int nop_set_pages(struct i915_vma *vma)
-{
-	return -ENODEV;
-}
-
-static void nop_clear_pages(struct i915_vma *vma)
-{
-}
-
-static int nop_bind(struct i915_vma *vma,
-		    enum i915_cache_level cache_level,
-		    u32 unused)
-{
-	return -ENODEV;
-}
-
-static void nop_unbind(struct i915_vma *vma)
-{
-}
-
-static const struct i915_vma_ops nop_vma_ops = {
-	.set_pages = nop_set_pages,
-	.clear_pages = nop_clear_pages,
-	.bind_vma = nop_bind,
-	.unbind_vma = nop_unbind,
-};
-
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 {
 	struct gen6_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
-	struct gen6_ppgtt_cleanup_work *work = ppgtt->work;
 
-	/* FIXME remove the struct_mutex to bring the locking under control */
-	INIT_WORK(&work->base, gen6_ppgtt_cleanup_work);
-	work->vma = ppgtt->vma;
-	work->vma->ops = &nop_vma_ops;
-	schedule_work(&work->base);
+	i915_vma_destroy(ppgtt->vma);
 
 	gen6_ppgtt_free_pd(ppgtt);
 	gen6_ppgtt_free_scratch(vm);
@@ -2157,15 +2109,9 @@ static struct i915_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 
 	ppgtt->base.vm.pte_encode = ggtt->vm.pte_encode;
 
-	ppgtt->work = kmalloc(sizeof(*ppgtt->work), GFP_KERNEL);
-	if (!ppgtt->work) {
-		err = -ENOMEM;
-		goto err_free;
-	}
-
 	err = gen6_ppgtt_init_scratch(ppgtt);
 	if (err)
-		goto err_work;
+		goto err_free;
 
 	ppgtt->vma = pd_vma_create(ppgtt, GEN6_PD_SIZE);
 	if (IS_ERR(ppgtt->vma)) {
@@ -2177,8 +2123,6 @@ static struct i915_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 
 err_scratch:
 	gen6_ppgtt_free_scratch(&ppgtt->base.vm);
-err_work:
-	kfree(ppgtt->work);
 err_free:
 	kfree(ppgtt);
 	return ERR_PTR(err);
@@ -2895,11 +2839,12 @@ void i915_ggtt_cleanup_hw(struct drm_i915_private *dev_priv)
 
 	ggtt->vm.closed = true;
 
-	mutex_lock(&dev_priv->drm.struct_mutex);
 	fini_aliasing_ppgtt(dev_priv);
 
+	mutex_lock(&ggtt->vm.mutex);
 	list_for_each_entry_safe(vma, vn, &ggtt->vm.bound_list, vm_link)
 		WARN_ON(i915_vma_unbind(vma));
+	mutex_unlock(&ggtt->vm.mutex);
 
 	if (drm_mm_node_allocated(&ggtt->error_capture))
 		drm_mm_remove_node(&ggtt->error_capture);
@@ -2919,8 +2864,6 @@ void i915_ggtt_cleanup_hw(struct drm_i915_private *dev_priv)
 		__pagevec_release(pvec);
 	}
 
-	mutex_unlock(&dev_priv->drm.struct_mutex);
-
 	arch_phys_wc_del(ggtt->mtrr);
 	io_mapping_fini(&ggtt->iomap);
 
@@ -3523,7 +3466,6 @@ int i915_ggtt_init_hw(struct drm_i915_private *dev_priv)
 	 * beyond the end of the batch buffer, across the page boundary,
 	 * and beyond the end of the GTT if we do not provide a guard.
 	 */
-	mutex_lock(&dev_priv->drm.struct_mutex);
 	i915_address_space_init(&ggtt->vm, VM_CLASS_GGTT);
 
 	ggtt->vm.is_ggtt = true;
@@ -3533,7 +3475,6 @@ int i915_ggtt_init_hw(struct drm_i915_private *dev_priv)
 
 	if (!HAS_LLC(dev_priv) && !HAS_PPGTT(dev_priv))
 		ggtt->vm.mm.color_adjust = i915_gtt_color_adjust;
-	mutex_unlock(&dev_priv->drm.struct_mutex);
 
 	if (!io_mapping_init_wc(&dev_priv->ggtt.iomap,
 				dev_priv->ggtt.gmadr.start,
@@ -3612,10 +3553,8 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
 		if (!(vma->flags & I915_VMA_GLOBAL_BIND))
 			continue;
 
-		mutex_unlock(&ggtt->vm.mutex);
-
 		if (!i915_vma_unbind(vma))
-			goto lock;
+			continue;
 
 		WARN_ON(i915_vma_bind(vma,
 				      obj ? obj->cache_level : 0,
@@ -3625,9 +3564,6 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
 			WARN_ON(i915_gem_object_set_to_gtt_domain(obj, false));
 			i915_gem_object_unlock(obj);
 		}
-
-lock:
-		mutex_lock(&ggtt->vm.mutex);
 	}
 
 	ggtt->vm.closed = false;
@@ -4024,7 +3960,7 @@ int i915_gem_gtt_insert(struct i915_address_space *vm,
 	u64 offset;
 	int err;
 
-	lockdep_assert_held(&vm->i915->drm.struct_mutex);
+	lockdep_assert_held(&vm->mutex);
 	GEM_BUG_ON(!size);
 	GEM_BUG_ON(!IS_ALIGNED(size, I915_GTT_PAGE_SIZE));
 	GEM_BUG_ON(alignment && !is_power_of_2(alignment));
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 889c4e88c598..d957f1a5b6a3 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -438,8 +438,6 @@ struct gen6_ppgtt {
 
 	unsigned int pin_count;
 	bool scan_for_unused_pt;
-
-	struct gen6_ppgtt_cleanup_work *work;
 };
 
 #define __to_gen6_ppgtt(base) container_of(base, struct gen6_ppgtt, base)
@@ -692,7 +690,6 @@ int i915_gem_gtt_insert(struct i915_address_space *vm,
 #define PIN_OFFSET_BIAS		BIT_ULL(6)
 #define PIN_OFFSET_FIXED	BIT_ULL(7)
 
-#define PIN_MBZ			BIT_ULL(8) /* I915_VMA_PIN_OVERFLOW */
 #define PIN_GLOBAL		BIT_ULL(9) /* I915_VMA_GLOBAL_BIND */
 #define PIN_USER		BIT_ULL(10) /* I915_VMA_LOCAL_BIND */
 #define PIN_UPDATE		BIT_ULL(11)
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index d92ddfada262..891b60f49843 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1347,13 +1347,9 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)
 static void
 free_oa_buffer(struct drm_i915_private *i915)
 {
-	mutex_lock(&i915->drm.struct_mutex);
-
 	i915_vma_unpin_and_release(&i915->perf.oa.oa_buffer.vma,
 				   I915_VMA_RELEASE_MAP);
 
-	mutex_unlock(&i915->drm.struct_mutex);
-
 	i915->perf.oa.oa_buffer.vaddr = NULL;
 }
 
@@ -1505,19 +1501,12 @@ static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
 	if (WARN_ON(dev_priv->perf.oa.oa_buffer.vma))
 		return -ENODEV;
 
-	ret = i915_mutex_lock_interruptible(&dev_priv->drm);
-	if (ret)
-		return ret;
-
 	BUILD_BUG_ON_NOT_POWER_OF_2(OA_BUFFER_SIZE);
 	BUILD_BUG_ON(OA_BUFFER_SIZE < SZ_128K || OA_BUFFER_SIZE > SZ_16M);
 
 	bo = i915_gem_object_create_shmem(dev_priv, OA_BUFFER_SIZE);
-	if (IS_ERR(bo)) {
-		DRM_ERROR("Failed to allocate OA buffer\n");
-		ret = PTR_ERR(bo);
-		goto unlock;
-	}
+	if (IS_ERR(bo))
+		return PTR_ERR(bo);
 
 	i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC);
 
@@ -1539,8 +1528,7 @@ static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
 	DRM_DEBUG_DRIVER("OA Buffer initialized, gtt offset = 0x%x, vaddr = %p\n",
 			 i915_ggtt_offset(dev_priv->perf.oa.oa_buffer.vma),
 			 dev_priv->perf.oa.oa_buffer.vaddr);
-
-	goto unlock;
+	return 0;
 
 err_unpin:
 	__i915_vma_unpin(vma);
@@ -1551,8 +1539,6 @@ static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
 	dev_priv->perf.oa.oa_buffer.vaddr = NULL;
 	dev_priv->perf.oa.oa_buffer.vma = NULL;
 
-unlock:
-	mutex_unlock(&dev_priv->drm.struct_mutex);
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index b5875db6f2c0..b06e4f5db6a8 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -318,8 +318,6 @@ vma_lookup(struct drm_i915_gem_object *obj,
  * Once created, the VMA is kept until either the object is freed, or the
  * address space is closed.
  *
- * Must be called with struct_mutex held.
- *
  * Returns the vma, or an error pointer.
  */
 struct i915_vma *
@@ -458,7 +456,8 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma)
 	/* Access through the GTT requires the device to be awake. */
 	assert_rpm_wakelock_held(vma->vm->i915);
 
-	lockdep_assert_held(&vma->vm->i915->drm.struct_mutex);
+	mutex_lock(&vma->vm->mutex);
+
 	if (WARN_ON(!i915_vma_is_map_and_fenceable(vma))) {
 		err = -ENODEV;
 		goto err;
@@ -482,16 +481,19 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma)
 
 	__i915_vma_pin(vma);
 
-	err = i915_vma_pin_fence(vma);
+	err = __i915_vma_pin_fence(vma);
 	if (err)
 		goto err_unpin;
 
 	i915_vma_set_ggtt_write(vma);
+	mutex_unlock(&vma->vm->mutex);
+
 	return ptr;
 
 err_unpin:
 	__i915_vma_unpin(vma);
 err:
+	mutex_unlock(&vma->vm->mutex);
 	return IO_ERR_PTR(err);
 }
 
@@ -507,8 +509,6 @@ void i915_vma_flush_writes(struct i915_vma *vma)
 
 void i915_vma_unpin_iomap(struct i915_vma *vma)
 {
-	lockdep_assert_held(&vma->vm->i915->drm.struct_mutex);
-
 	GEM_BUG_ON(vma->iomap == NULL);
 
 	i915_vma_flush_writes(vma);
@@ -684,10 +684,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 	}
 
 	if (vma->obj) {
-		ret = i915_gem_object_pin_pages_async(vma->obj);
-		if (ret)
-			return ret;
-
+		__i915_gem_object_pin_pages(vma->obj);
 		cache_level = vma->obj->cache_level;
 	} else {
 		cache_level = 0;
@@ -752,9 +749,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
 	GEM_BUG_ON(!i915_gem_valid_gtt_space(vma, cache_level));
 
-	mutex_lock(&vma->vm->mutex);
 	list_add_tail(&vma->vm_link, &vma->vm->bound_list);
-	mutex_unlock(&vma->vm->mutex);
 
 	if (vma->obj) {
 		atomic_inc(&vma->obj->bind_count);
@@ -777,10 +772,8 @@ i915_vma_remove(struct i915_vma *vma)
 
 	vma->ops->clear_pages(vma);
 
-	mutex_lock(&vma->vm->mutex);
 	drm_mm_remove_node(&vma->node);
 	list_del(&vma->vm_link);
-	mutex_unlock(&vma->vm->mutex);
 
 	/*
 	 * Since the unbound list is global, only move to that list if
@@ -801,25 +794,20 @@ i915_vma_remove(struct i915_vma *vma)
 	}
 }
 
-int __i915_vma_do_pin(struct i915_vma *vma,
-		      u64 size, u64 alignment, u64 flags)
+int __i915_vma_do_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 {
 	const unsigned int bound = vma->flags;
 	int ret;
 
-	lockdep_assert_held(&vma->vm->i915->drm.struct_mutex);
-	GEM_BUG_ON((flags & (PIN_GLOBAL | PIN_USER)) == 0);
-	GEM_BUG_ON((flags & PIN_GLOBAL) && !i915_vma_is_ggtt(vma));
+	lockdep_assert_held(&vma->vm->mutex);
 
-	if (WARN_ON(bound & I915_VMA_PIN_OVERFLOW)) {
-		ret = -EBUSY;
-		goto err_unpin;
-	}
+	if (atomic_read(&vma->pin_count))
+		goto out;
 
 	if ((bound & I915_VMA_BIND_MASK) == 0) {
 		ret = i915_vma_insert(vma, size, alignment, flags);
 		if (ret)
-			goto err_unpin;
+			return ret;
 	}
 	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
 
@@ -833,6 +821,9 @@ int __i915_vma_do_pin(struct i915_vma *vma,
 		__i915_vma_set_map_and_fenceable(vma);
 
 	GEM_BUG_ON(i915_vma_misplaced(vma, size, alignment, flags));
+
+out:
+	atomic_inc(&vma->pin_count);
 	return 0;
 
 err_remove:
@@ -841,11 +832,34 @@ int __i915_vma_do_pin(struct i915_vma *vma,
 		GEM_BUG_ON(vma->pages);
 		GEM_BUG_ON(vma->flags & I915_VMA_BIND_MASK);
 	}
-err_unpin:
-	__i915_vma_unpin(vma);
 	return ret;
 }
 
+int i915_vma_do_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
+{
+	int err;
+
+	GEM_BUG_ON((flags & (PIN_GLOBAL | PIN_USER)) == 0);
+	GEM_BUG_ON((flags & PIN_GLOBAL) && !i915_vma_is_ggtt(vma));
+
+	if (vma->obj) {
+		err = i915_gem_object_pin_pages_async(vma->obj);
+		if (err)
+			return err;
+	}
+
+	err = mutex_lock_interruptible(&vma->vm->mutex);
+	if (!err) {
+		err = __i915_vma_do_pin(vma, size, alignment, flags);
+		mutex_unlock(&vma->vm->mutex);
+	}
+
+	if (vma->obj)
+		i915_gem_object_unpin_pages(vma->obj);
+
+	return err;
+}
+
 void i915_vma_close(struct i915_vma *vma)
 {
 	struct drm_i915_private *i915 = vma->vm->i915;
@@ -886,10 +900,10 @@ void i915_vma_reopen(struct i915_vma *vma)
 
 void i915_vma_destroy(struct i915_vma *vma)
 {
-	mutex_lock(&vma->vm->i915->drm.struct_mutex);
-	vma->flags &= ~I915_VMA_PIN_MASK;
+	mutex_lock(&vma->vm->mutex);
+	atomic_set(&vma->pin_count, 0);
 	WARN_ON(i915_vma_unbind(vma));
-	mutex_unlock(&vma->vm->i915->drm.struct_mutex);
+	mutex_unlock(&vma->vm->mutex);
 
 	i915_vma_put(vma);
 }
@@ -925,11 +939,15 @@ void i915_vma_parked(struct drm_i915_private *i915)
 
 	spin_lock_irq(&i915->gt.closed_lock);
 	list_for_each_entry_safe(vma, next, &i915->gt.closed_vma, closed_link) {
+		i915_vma_get(vma);
 		list_del_init(&vma->closed_link);
 		spin_unlock_irq(&i915->gt.closed_lock);
 
+		mutex_lock(&vma->vm->mutex);
 		i915_vma_unbind(vma);
+		mutex_unlock(&vma->vm->mutex);
 
+		i915_vma_put(vma);
 		spin_lock_irq(&i915->gt.closed_lock);
 	}
 	spin_unlock_irq(&i915->gt.closed_lock);
@@ -951,7 +969,8 @@ void i915_vma_revoke_mmap(struct i915_vma *vma)
 	struct drm_vma_offset_node *node = &vma->obj->base.vma_node;
 	u64 vma_offset;
 
-	lockdep_assert_held(&vma->vm->i915->drm.struct_mutex);
+	lockdep_assert_held(&vma->vm->mutex);
+	GEM_BUG_ON(!i915_vma_is_ggtt(vma));
 
 	if (!i915_vma_has_userfault(vma))
 		return;
@@ -1034,7 +1053,7 @@ int i915_vma_unbind(struct i915_vma *vma)
 {
 	int ret;
 
-	lockdep_assert_held(&vma->vm->i915->drm.struct_mutex);
+	lockdep_assert_held(&vma->vm->mutex);
 
 	if (vma->async.dma &&
 	    dma_fence_wait_timeout(vma->async.dma, true,
@@ -1065,9 +1084,7 @@ int i915_vma_unbind(struct i915_vma *vma)
 		GEM_BUG_ON(i915_vma_has_ggtt_write(vma));
 
 		/* release the fence reg _after_ flushing */
-		mutex_lock(&vma->vm->mutex);
 		ret = i915_vma_revoke_fence(vma);
-		mutex_unlock(&vma->vm->mutex);
 		if (ret)
 			return ret;
 
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index f9f4cbb99e62..140d12f67cf8 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -71,39 +71,15 @@ struct i915_vma {
 	u32 fence_size;
 	u32 fence_alignment;
 
+	atomic_t pin_count;
 	atomic_t open_count;
 	unsigned long flags;
-	/**
-	 * How many users have pinned this object in GTT space.
-	 *
-	 * This is a tightly bound, fairly small number of users, so we
-	 * stuff inside the flags field so that we can both check for overflow
-	 * and detect a no-op i915_vma_pin() in a single check, while also
-	 * pinning the vma.
-	 *
-	 * The worst case display setup would have the same vma pinned for
-	 * use on each plane on each crtc, while also building the next atomic
-	 * state and holding a pin for the length of the cleanup queue. In the
-	 * future, the flip queue may be increased from 1.
-	 * Estimated worst case: 3 [qlen] * 4 [max crtcs] * 7 [max planes] = 84
-	 *
-	 * For GEM, the number of concurrent users for pwrite/pread is
-	 * unbounded. For execbuffer, it is currently one but will in future
-	 * be extended to allow multiple clients to pin vma concurrently.
-	 *
-	 * We also use suballocated pages, with each suballocation claiming
-	 * its own pin on the shared vma. At present, this is limited to
-	 * exclusive cachelines of a single page, so a maximum of 64 possible
-	 * users.
-	 */
-#define I915_VMA_PIN_MASK 0xff
-#define I915_VMA_PIN_OVERFLOW	BIT(8)
 
 	/** Flags and address space this VMA is bound to */
 #define I915_VMA_GLOBAL_BIND	BIT(9)
 #define I915_VMA_LOCAL_BIND	BIT(10)
-#define I915_VMA_BIND_MASK (I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND | I915_VMA_PIN_OVERFLOW)
-#define I915_VMA_ALLOC_BIND	I915_VMA_PIN_OVERFLOW /* not stored */
+#define I915_VMA_BIND_MASK	(I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND)
+#define I915_VMA_ALLOC_BIND	BIT(0) /* not stored */
 
 #define I915_VMA_GGTT		BIT(11)
 #define I915_VMA_CAN_FENCE	BIT(12)
@@ -328,30 +304,27 @@ static inline void i915_vma_unlock(struct i915_vma *vma)
 	reservation_object_unlock(vma->resv);
 }
 
-int __i915_vma_do_pin(struct i915_vma *vma,
-		      u64 size, u64 alignment, u64 flags);
+int __i915_vma_do_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags);
+int i915_vma_do_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags);
+
 static inline int __must_check
 i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 {
-	BUILD_BUG_ON(PIN_MBZ != I915_VMA_PIN_OVERFLOW);
 	BUILD_BUG_ON(PIN_GLOBAL != I915_VMA_GLOBAL_BIND);
 	BUILD_BUG_ON(PIN_USER != I915_VMA_LOCAL_BIND);
 
-	/* Pin early to prevent the shrinker/eviction logic from destroying
-	 * our vma as we insert and bind.
-	 */
-	if (likely(((++vma->flags ^ flags) & I915_VMA_BIND_MASK) == 0)) {
+	if (atomic_add_unless(&vma->pin_count, 1, 0)) {
 		GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
 		GEM_BUG_ON(i915_vma_misplaced(vma, size, alignment, flags));
 		return 0;
 	}
 
-	return __i915_vma_do_pin(vma, size, alignment, flags);
+	return i915_vma_do_pin(vma, size, alignment, flags);
 }
 
 static inline int i915_vma_pin_count(const struct i915_vma *vma)
 {
-	return vma->flags & I915_VMA_PIN_MASK;
+	return atomic_read(&vma->pin_count);
 }
 
 static inline bool i915_vma_is_pinned(const struct i915_vma *vma)
@@ -361,19 +334,18 @@ static inline bool i915_vma_is_pinned(const struct i915_vma *vma)
 
 static inline void __i915_vma_pin(struct i915_vma *vma)
 {
-	vma->flags++;
-	GEM_BUG_ON(vma->flags & I915_VMA_PIN_OVERFLOW);
+	atomic_inc(&vma->pin_count);
 }
 
 static inline void __i915_vma_unpin(struct i915_vma *vma)
 {
-	vma->flags--;
+	atomic_dec(&vma->pin_count);
 }
 
 static inline void i915_vma_unpin(struct i915_vma *vma)
 {
-	GEM_BUG_ON(!i915_vma_is_pinned(vma));
 	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
+	GEM_BUG_ON(!atomic_read(&vma->pin_count));
 	__i915_vma_unpin(vma);
 }
 
@@ -392,8 +364,6 @@ static inline bool i915_vma_is_bound(const struct i915_vma *vma,
  * the caller must call i915_vma_unpin_iomap to relinquish the pinning
  * after the iomapping is no longer required.
  *
- * Callers must hold the struct_mutex.
- *
  * Returns a valid iomapped pointer or ERR_PTR.
  */
 void __iomem *i915_vma_pin_iomap(struct i915_vma *vma);
@@ -405,8 +375,8 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma);
  *
  * Unpins the previously iomapped VMA from i915_vma_pin_iomap().
  *
- * Callers must hold the struct_mutex. This function is only valid to be
- * called on a VMA previously iomapped by the caller with i915_vma_pin_iomap().
+ * This function is only valid to be called on a VMA previously iomapped
+ * by the caller with i915_vma_pin_iomap().
  */
 void i915_vma_unpin_iomap(struct i915_vma *vma);
 
@@ -416,6 +386,8 @@ static inline struct page *i915_vma_first_page(struct i915_vma *vma)
 	return sg_page(vma->pages->sgl);
 }
 
+int __i915_vma_pin_fence(struct i915_vma *vma);
+
 /**
  * i915_vma_pin_fence - pin fencing state
  * @vma: vma to pin fencing for
@@ -451,7 +423,6 @@ static inline void __i915_vma_unpin_fence(struct i915_vma *vma)
 static inline void
 i915_vma_unpin_fence(struct i915_vma *vma)
 {
-	/* lockdep_assert_held(&vma->vm->i915->drm.struct_mutex); */
 	if (vma->fence)
 		__i915_vma_unpin_fence(vma);
 }
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index c3f44cbdb815..739c8df3f89d 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -2093,8 +2093,6 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb,
 	unsigned int pinctl;
 	u32 alignment;
 
-	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
-
 	alignment = intel_surf_alignment(fb, 0);
 
 	/* Note that the w/a also requires 64 PTE of padding following the
@@ -2175,13 +2173,9 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb,
 
 void intel_unpin_fb_vma(struct i915_vma *vma, unsigned long flags)
 {
-	lockdep_assert_held(&vma->vm->i915->drm.struct_mutex);
-
-	i915_gem_object_lock(vma->obj);
 	if (flags & PLANE_HAS_FENCE)
 		i915_vma_unpin_fence(vma);
 	i915_gem_object_unpin_from_display_plane(vma);
-	i915_gem_object_unlock(vma->obj);
 
 	i915_vma_put(vma);
 }
@@ -3048,7 +3042,6 @@ initial_plane_vma(struct drm_i915_private *i915,
 	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
 	u32 base, size;
-	int err;
 
 	if (plane_config->size == 0)
 		return NULL;
@@ -3089,12 +3082,9 @@ initial_plane_vma(struct drm_i915_private *i915,
 	if (IS_ERR(vma))
 		return NULL;
 
-	mutex_lock(&i915->drm.struct_mutex);
-	err = i915_vma_pin(vma, 0, 0,
-			   PIN_GLOBAL | PIN_MAPPABLE |
-			   base | PIN_OFFSET_FIXED);
-	mutex_unlock(&i915->drm.struct_mutex);
-	if (err)
+	if (i915_vma_pin(vma, 0, 0,
+			 PIN_GLOBAL | PIN_MAPPABLE |
+			 base | PIN_OFFSET_FIXED))
 		goto err_vma;
 
 	if (i915_gem_object_is_tiled(vma->obj) &&
@@ -3280,14 +3270,12 @@ intel_find_initial_plane_obj(struct intel_crtc *intel_crtc,
 	intel_state->color_plane[0].stride =
 		intel_fb_pitch(fb, 0, intel_state->base.rotation);
 
-	mutex_lock(&dev->struct_mutex);
 	intel_state->vma = i915_vma_get(plane_config->vma);
 	__i915_vma_pin(intel_state->vma);
 	if (intel_plane_uses_fence(intel_state) &&
 	    i915_vma_pin_fence(intel_state->vma) == 0)
 		intel_state->flags |= PLANE_HAS_FENCE;
-	vma->obj->pin_global++;
-	mutex_unlock(&dev->struct_mutex);
+	atomic_inc(&vma->obj->pin_global);
 
 	plane_state->src_x = 0;
 	plane_state->src_y = 0;
@@ -14255,8 +14243,6 @@ static void fb_obj_bump_render_priority(struct drm_i915_gem_object *obj)
  * bits.  Some older platforms need special physical address handling for
  * cursor planes.
  *
- * Must be called with struct_mutex held.
- *
  * Returns 0 on success, negative error code on failure.
  */
 int
@@ -14313,15 +14299,8 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 	if (ret)
 		return ret;
 
-	ret = mutex_lock_interruptible(&dev_priv->drm.struct_mutex);
-	if (ret) {
-		i915_gem_object_unpin_pages(obj);
-		return ret;
-	}
-
 	ret = intel_plane_pin_fb(to_intel_plane_state(new_state));
 
-	mutex_unlock(&dev_priv->drm.struct_mutex);
 	i915_gem_object_unpin_pages(obj);
 	if (ret)
 		return ret;
@@ -14370,8 +14349,6 @@ intel_prepare_plane_fb(struct drm_plane *plane,
  * @old_state: the state from the previous modeset
  *
  * Cleans up a framebuffer that has just been removed from a plane.
- *
- * Must be called with struct_mutex held.
  */
 void
 intel_cleanup_plane_fb(struct drm_plane *plane,
@@ -14387,9 +14364,7 @@ intel_cleanup_plane_fb(struct drm_plane *plane,
 	}
 
 	/* Should only be called after a successful intel_prepare_plane_fb()! */
-	mutex_lock(&dev_priv->drm.struct_mutex);
 	intel_plane_unpin_fb(to_intel_plane_state(old_state));
-	mutex_unlock(&dev_priv->drm.struct_mutex);
 }
 
 int
@@ -14593,7 +14568,6 @@ intel_legacy_cursor_update(struct drm_plane *plane,
 			   u32 src_w, u32 src_h,
 			   struct drm_modeset_acquire_ctx *ctx)
 {
-	struct drm_i915_private *dev_priv = to_i915(crtc->dev);
 	struct drm_plane_state *old_plane_state, *new_plane_state;
 	struct intel_plane *intel_plane = to_intel_plane(plane);
 	struct intel_crtc_state *crtc_state =
@@ -14659,13 +14633,9 @@ intel_legacy_cursor_update(struct drm_plane *plane,
 	if (ret)
 		goto out_free;
 
-	ret = mutex_lock_interruptible(&dev_priv->drm.struct_mutex);
-	if (ret)
-		goto out_free;
-
 	ret = intel_plane_pin_fb(to_intel_plane_state(new_plane_state));
 	if (ret)
-		goto out_unlock;
+		goto out_free;
 
 	intel_frontbuffer_flush(to_intel_frontbuffer(fb), ORIGIN_FLIP);
 	intel_frontbuffer_track(to_intel_frontbuffer(old_plane_state->fb),
@@ -14695,8 +14665,6 @@ intel_legacy_cursor_update(struct drm_plane *plane,
 
 	intel_plane_unpin_fb(to_intel_plane_state(old_plane_state));
 
-out_unlock:
-	mutex_unlock(&dev_priv->drm.struct_mutex);
 out_free:
 	if (new_crtc_state)
 		intel_crtc_destroy_state(crtc, &new_crtc_state->base);
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index c7c11d1842af..1464157d0ffe 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -212,7 +212,6 @@ static int intelfb_create(struct drm_fb_helper *helper,
 		sizes->fb_height = intel_fb->base.height;
 	}
 
-	mutex_lock(&dev->struct_mutex);
 	wakeref = intel_runtime_pm_get(dev_priv);
 
 	/* Pin the GGTT vma for our access via info->screen_base.
@@ -273,7 +272,6 @@ static int intelfb_create(struct drm_fb_helper *helper,
 	ifbdev->vma_flags = flags;
 
 	intel_runtime_pm_put(dev_priv, wakeref);
-	mutex_unlock(&dev->struct_mutex);
 	vga_switcheroo_client_fb_set(pdev, info);
 	return 0;
 
@@ -281,7 +279,6 @@ static int intelfb_create(struct drm_fb_helper *helper,
 	intel_unpin_fb_vma(vma, flags);
 out_unlock:
 	intel_runtime_pm_put(dev_priv, wakeref);
-	mutex_unlock(&dev->struct_mutex);
 	return ret;
 }
 
@@ -298,11 +295,8 @@ static void intel_fbdev_destroy(struct intel_fbdev *ifbdev)
 
 	drm_fb_helper_fini(&ifbdev->helper);
 
-	if (ifbdev->vma) {
-		mutex_lock(&ifbdev->helper.dev->struct_mutex);
+	if (ifbdev->vma)
 		intel_unpin_fb_vma(ifbdev->vma, ifbdev->vma_flags);
-		mutex_unlock(&ifbdev->helper.dev->struct_mutex);
-	}
 
 	if (ifbdev->fb)
 		drm_framebuffer_remove(&ifbdev->fb->base);
diff --git a/drivers/gpu/drm/i915/intel_guc.c b/drivers/gpu/drm/i915/intel_guc.c
index 3b4e7686cd6c..184de18995e7 100644
--- a/drivers/gpu/drm/i915/intel_guc.c
+++ b/drivers/gpu/drm/i915/intel_guc.c
@@ -674,6 +674,7 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size)
 		return vma;
 
 	flags = PIN_GLOBAL | PIN_OFFSET_BIAS | i915_ggtt_pin_bias(vma);
+
 	ret = i915_vma_pin(vma, 0, 0, flags);
 	if (ret) {
 		i915_vma_put(vma);
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index cdc028220848..b969eef3331b 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -752,7 +752,6 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 	struct i915_vma *vma;
 	int ret, tmp_width;
 
-	lockdep_assert_held(&dev_priv->drm.struct_mutex);
 	WARN_ON(!drm_modeset_is_locked(&dev_priv->drm.mode_config.connection_mutex));
 
 	ret = intel_overlay_release_old_vid(overlay);
@@ -1304,22 +1303,16 @@ static int get_registers(struct intel_overlay *overlay, bool use_phys)
 	struct i915_vma *vma;
 	int err;
 
-	mutex_lock(&i915->drm.struct_mutex);
-
 	obj = i915_gem_object_create_stolen(i915, PAGE_SIZE);
 	if (obj == NULL)
 		obj = i915_gem_object_create_internal(i915, PAGE_SIZE);
-	if (IS_ERR(obj)) {
-		err = PTR_ERR(obj);
-		goto err_unlock;
-	}
+	if (IS_ERR(obj))
+		return PTR_ERR(obj);
 
 	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, PIN_MAPPABLE);
 	i915_gem_object_put(obj);
-	if (IS_ERR(vma)) {
-		err = PTR_ERR(vma);
-		goto err_unlock;
-	}
+	if (IS_ERR(vma))
+		return PTR_ERR(vma);
 
 	if (use_phys)
 		overlay->flip_addr = sg_dma_address(vma->obj->mm.pages->sgl);
@@ -1334,13 +1327,10 @@ static int get_registers(struct intel_overlay *overlay, bool use_phys)
 	}
 
 	overlay->reg_vma = vma;
-	mutex_unlock(&i915->drm.struct_mutex);
 	return 0;
 
 err_vma:
 	i915_vma_destroy(vma);
-err_unlock:
-	mutex_unlock(&i915->drm.struct_mutex);
 	return err;
 }
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c
index 30b831408b7b..332ee5cb9120 100644
--- a/drivers/gpu/drm/i915/selftests/i915_vma.c
+++ b/drivers/gpu/drm/i915/selftests/i915_vma.c
@@ -338,7 +338,9 @@ static int igt_vma_pin1(void *arg)
 
 		if (!err) {
 			i915_vma_unpin(vma);
+			mutex_lock(&ggtt->vm.mutex);
 			err = i915_vma_unbind(vma);
+			mutex_unlock(&ggtt->vm.mutex);
 			if (err) {
 				pr_err("Failed to unbind single page from GGTT, err=%d\n", err);
 				goto out;
-- 
2.20.1


* [PATCH 35/39] drm/i915: Pin pages before waiting
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (33 preceding siblings ...)
  2019-06-14  7:10 ` [PATCH 34/39] drm/i915: Use vm->mutex for serialising GTT insertion Chris Wilson
@ 2019-06-14  7:10 ` Chris Wilson
  2019-06-14 19:53   ` Matthew Auld
  2019-06-14  7:10 ` [PATCH 36/39] drm/i915: Use reservation_object to coordinate userptr get_pages() Chris Wilson
                   ` (5 subsequent siblings)
  40 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:10 UTC (permalink / raw)
  To: intel-gfx

In order to allow for asynchronous gathering of pages tracked by the
obj->resv, pin the pages before waiting on the reservation, and where
possible do an early wait before acquiring the object lock (with a
follow-up locked wait to ensure we have exclusive access where
necessary).
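
As a rough sketch of the resulting ordering (illustrative only: the
helper calls match those used in the diff below, but the function
itself is hypothetical):

static int example_prepare_access(struct drm_i915_gem_object *obj)
{
	int err;

	/*
	 * Pin first: page gathering may now proceed asynchronously,
	 * overlapping with the wait for outstanding GPU work instead
	 * of serialising ahead of it.
	 */
	err = i915_gem_object_pin_pages_async(obj);
	if (err)
		return err;

	/* Early wait, without holding the object lock. */
	err = i915_gem_object_wait(obj,
				   I915_WAIT_INTERRUPTIBLE,
				   MAX_SCHEDULE_TIMEOUT);
	if (err)
		goto out_unpin;

	err = i915_gem_object_lock_interruptible(obj);
	if (err)
		goto out_unpin;

	/*
	 * ... repeat the wait under the lock where exclusive access
	 * is required, then operate on the pages ...
	 */

	i915_gem_object_unlock(obj);
out_unpin:
	i915_gem_object_unpin_pages(obj);
	return err;
}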

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |   4 +-
 drivers/gpu/drm/i915/gem/i915_gem_domain.c | 104 +++++++++++----------
 drivers/gpu/drm/i915/i915_gem.c            |  22 +++--
 3 files changed, 68 insertions(+), 62 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index a93e233cfaa9..84992d590da5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -154,7 +154,7 @@ static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_dire
 	bool write = (direction == DMA_BIDIRECTIONAL || direction == DMA_TO_DEVICE);
 	int err;
 
-	err = i915_gem_object_pin_pages(obj);
+	err = i915_gem_object_pin_pages_async(obj);
 	if (err)
 		return err;
 
@@ -175,7 +175,7 @@ static int i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direct
 	struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
 	int err;
 
-	err = i915_gem_object_pin_pages(obj);
+	err = i915_gem_object_pin_pages_async(obj);
 	if (err)
 		return err;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
index f9044bbdd429..bda990113124 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
@@ -49,17 +49,11 @@ i915_gem_object_set_to_wc_domain(struct drm_i915_gem_object *obj, bool write)
 
 	assert_object_held(obj);
 
-	ret = i915_gem_object_wait(obj,
-				   I915_WAIT_INTERRUPTIBLE |
-				   (write ? I915_WAIT_ALL : 0),
-				   MAX_SCHEDULE_TIMEOUT);
-	if (ret)
-		return ret;
-
 	if (obj->write_domain == I915_GEM_DOMAIN_WC)
 		return 0;
 
-	/* Flush and acquire obj->pages so that we are coherent through
+	/*
+	 * Flush and acquire obj->pages so that we are coherent through
 	 * direct access in memory with previous cached writes through
 	 * shmemfs and that our cache domain tracking remains valid.
 	 * For example, if the obj->filp was moved to swap without us
@@ -67,10 +61,17 @@ i915_gem_object_set_to_wc_domain(struct drm_i915_gem_object *obj, bool write)
 	 * continue to assume that the obj remained out of the CPU cached
 	 * domain.
 	 */
-	ret = i915_gem_object_pin_pages(obj);
+	ret = i915_gem_object_pin_pages_async(obj);
 	if (ret)
 		return ret;
 
+	ret = i915_gem_object_wait(obj,
+				   I915_WAIT_INTERRUPTIBLE |
+				   (write ? I915_WAIT_ALL : 0),
+				   MAX_SCHEDULE_TIMEOUT);
+	if (ret)
+		goto out_unpin;
+
 	i915_gem_object_flush_write_domain(obj, ~I915_GEM_DOMAIN_WC);
 
 	/* Serialise direct access to this object with the barriers for
@@ -91,8 +92,9 @@ i915_gem_object_set_to_wc_domain(struct drm_i915_gem_object *obj, bool write)
 		obj->mm.dirty = true;
 	}
 
+out_unpin:
 	i915_gem_object_unpin_pages(obj);
-	return 0;
+	return ret;
 }
 
 /**
@@ -110,17 +112,11 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 
 	assert_object_held(obj);
 
-	ret = i915_gem_object_wait(obj,
-				   I915_WAIT_INTERRUPTIBLE |
-				   (write ? I915_WAIT_ALL : 0),
-				   MAX_SCHEDULE_TIMEOUT);
-	if (ret)
-		return ret;
-
 	if (obj->write_domain == I915_GEM_DOMAIN_GTT)
 		return 0;
 
-	/* Flush and acquire obj->pages so that we are coherent through
+	/*
+	 * Flush and acquire obj->pages so that we are coherent through
 	 * direct access in memory with previous cached writes through
 	 * shmemfs and that our cache domain tracking remains valid.
 	 * For example, if the obj->filp was moved to swap without us
@@ -128,10 +124,17 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 	 * continue to assume that the obj remained out of the CPU cached
 	 * domain.
 	 */
-	ret = i915_gem_object_pin_pages(obj);
+	ret = i915_gem_object_pin_pages_async(obj);
 	if (ret)
 		return ret;
 
+	ret = i915_gem_object_wait(obj,
+				   I915_WAIT_INTERRUPTIBLE |
+				   (write ? I915_WAIT_ALL : 0),
+				   MAX_SCHEDULE_TIMEOUT);
+	if (ret)
+		goto out_unpin;
+
 	i915_gem_object_flush_write_domain(obj, ~I915_GEM_DOMAIN_GTT);
 
 	/* Serialise direct access to this object with the barriers for
@@ -152,8 +155,9 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 		obj->mm.dirty = true;
 	}
 
+out_unpin:
 	i915_gem_object_unpin_pages(obj);
-	return 0;
+	return ret;
 }
 
 /**
@@ -603,19 +607,6 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
 		goto out;
 	}
 
-	/*
-	 * Try to flush the object off the GPU without holding the lock.
-	 * We will repeat the flush holding the lock in the normal manner
-	 * to catch cases where we are gazumped.
-	 */
-	err = i915_gem_object_wait(obj,
-				   I915_WAIT_INTERRUPTIBLE |
-				   I915_WAIT_PRIORITY |
-				   (write_domain ? I915_WAIT_ALL : 0),
-				   MAX_SCHEDULE_TIMEOUT);
-	if (err)
-		goto out;
-
 	/*
 	 * Proxy objects do not control access to the backing storage, ergo
 	 * they cannot be used as a means to manipulate the cache domain
@@ -636,10 +627,23 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
 	 * continue to assume that the obj remained out of the CPU cached
 	 * domain.
 	 */
-	err = i915_gem_object_pin_pages(obj);
+	err = i915_gem_object_pin_pages_async(obj);
 	if (err)
 		goto out;
 
+	/*
+	 * Try to flush the object off the GPU without holding the lock.
+	 * We will repeat the flush holding the lock in the normal manner
+	 * to catch cases where we are gazumped.
+	 */
+	err = i915_gem_object_wait(obj,
+				   I915_WAIT_INTERRUPTIBLE |
+				   I915_WAIT_PRIORITY |
+				   (write_domain ? I915_WAIT_ALL : 0),
+				   MAX_SCHEDULE_TIMEOUT);
+	if (err)
+		goto out_unpin;
+
 	err = i915_gem_object_lock_interruptible(obj);
 	if (err)
 		goto out_unpin;
@@ -680,25 +684,25 @@ int i915_gem_object_prepare_read(struct drm_i915_gem_object *obj,
 	if (!i915_gem_object_has_struct_page(obj))
 		return -ENODEV;
 
-	ret = i915_gem_object_lock_interruptible(obj);
+	ret = i915_gem_object_pin_pages_async(obj);
 	if (ret)
 		return ret;
 
+	ret = i915_gem_object_lock_interruptible(obj);
+	if (ret)
+		goto err_unpin;
+
 	ret = i915_gem_object_wait(obj,
 				   I915_WAIT_INTERRUPTIBLE,
 				   MAX_SCHEDULE_TIMEOUT);
 	if (ret)
 		goto err_unlock;
 
-	ret = i915_gem_object_pin_pages(obj);
-	if (ret)
-		goto err_unlock;
-
 	if (obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_READ ||
 	    !static_cpu_has(X86_FEATURE_CLFLUSH)) {
 		ret = i915_gem_object_set_to_cpu_domain(obj, false);
 		if (ret)
-			goto err_unpin;
+			goto err_unlock;
 		else
 			goto out;
 	}
@@ -718,10 +722,10 @@ int i915_gem_object_prepare_read(struct drm_i915_gem_object *obj,
 	/* return with the pages pinned */
 	return 0;
 
-err_unpin:
-	i915_gem_object_unpin_pages(obj);
 err_unlock:
 	i915_gem_object_unlock(obj);
+err_unpin:
+	i915_gem_object_unpin_pages(obj);
 	return ret;
 }
 
@@ -734,10 +738,14 @@ int i915_gem_object_prepare_write(struct drm_i915_gem_object *obj,
 	if (!i915_gem_object_has_struct_page(obj))
 		return -ENODEV;
 
-	ret = i915_gem_object_lock_interruptible(obj);
+	ret = i915_gem_object_pin_pages_async(obj);
 	if (ret)
 		return ret;
 
+	ret = i915_gem_object_lock_interruptible(obj);
+	if (ret)
+		goto err_unpin;
+
 	ret = i915_gem_object_wait(obj,
 				   I915_WAIT_INTERRUPTIBLE |
 				   I915_WAIT_ALL,
@@ -745,15 +753,11 @@ int i915_gem_object_prepare_write(struct drm_i915_gem_object *obj,
 	if (ret)
 		goto err_unlock;
 
-	ret = i915_gem_object_pin_pages(obj);
-	if (ret)
-		goto err_unlock;
-
 	if (obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE ||
 	    !static_cpu_has(X86_FEATURE_CLFLUSH)) {
 		ret = i915_gem_object_set_to_cpu_domain(obj, true);
 		if (ret)
-			goto err_unpin;
+			goto err_unlock;
 		else
 			goto out;
 	}
@@ -782,9 +786,9 @@ int i915_gem_object_prepare_write(struct drm_i915_gem_object *obj,
 	/* return with the pages pinned */
 	return 0;
 
-err_unpin:
-	i915_gem_object_unpin_pages(obj);
 err_unlock:
 	i915_gem_object_unlock(obj);
+err_unpin:
+	i915_gem_object_unpin_pages(obj);
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9e835355eb93..21416b87905e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -512,20 +512,21 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
 
 	trace_i915_gem_object_pread(obj, args->offset, args->size);
 
-	ret = i915_gem_object_wait(obj,
-				   I915_WAIT_INTERRUPTIBLE,
-				   MAX_SCHEDULE_TIMEOUT);
+	ret = i915_gem_object_pin_pages_async(obj);
 	if (ret)
 		goto out;
 
-	ret = i915_gem_object_pin_pages(obj);
+	ret = i915_gem_object_wait(obj,
+				   I915_WAIT_INTERRUPTIBLE,
+				   MAX_SCHEDULE_TIMEOUT);
 	if (ret)
-		goto out;
+		goto out_unpin;
 
 	ret = i915_gem_shmem_pread(obj, args);
 	if (ret == -EFAULT || ret == -ENODEV)
 		ret = i915_gem_gtt_pread(obj, args);
 
+out_unpin:
 	i915_gem_object_unpin_pages(obj);
 out:
 	i915_gem_object_put(obj);
@@ -812,16 +813,16 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 	if (ret != -ENODEV)
 		goto err;
 
+	ret = i915_gem_object_pin_pages_async(obj);
+	if (ret)
+		goto err;
+
 	ret = i915_gem_object_wait(obj,
 				   I915_WAIT_INTERRUPTIBLE |
 				   I915_WAIT_ALL,
 				   MAX_SCHEDULE_TIMEOUT);
 	if (ret)
-		goto err;
-
-	ret = i915_gem_object_pin_pages(obj);
-	if (ret)
-		goto err;
+		goto err_unpin;
 
 	ret = -EFAULT;
 	/* We can only do the GTT pwrite on untiled buffers, as otherwise
@@ -845,6 +846,7 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 			ret = i915_gem_shmem_pwrite(obj, args);
 	}
 
+err_unpin:
 	i915_gem_object_unpin_pages(obj);
 err:
 	i915_gem_object_put(obj);
-- 
2.20.1


* [PATCH 36/39] drm/i915: Use reservation_object to coordinate userptr get_pages()
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (34 preceding siblings ...)
  2019-06-14  7:10 ` [PATCH 35/39] drm/i915: Pin pages before waiting Chris Wilson
@ 2019-06-14  7:10 ` Chris Wilson
  2019-08-07 20:49   ` Daniel Vetter
  2019-06-14  7:10 ` [PATCH 37/39] drm/i915: Use forced preemptions in preference over hangcheck Chris Wilson
                   ` (4 subsequent siblings)
  40 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:10 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |  27 +--
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   3 -
 drivers/gpu/drm/i915/gem/i915_gem_internal.c  |  30 +--
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  11 +-
 drivers/gpu/drm/i915/gem/i915_gem_pages.c     | 137 ++++++++++-
 drivers/gpu/drm/i915/gem/i915_gem_phys.c      |  33 ++-
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c     |  30 +--
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |  25 +-
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c   | 221 ++++--------------
 .../drm/i915/gem/selftests/huge_gem_object.c  |  17 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  45 ++--
 drivers/gpu/drm/i915/gvt/dmabuf.c             |  17 +-
 drivers/gpu/drm/i915/i915_drv.h               |   9 +-
 drivers/gpu/drm/i915/i915_gem.c               |   5 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  14 +-
 15 files changed, 300 insertions(+), 324 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index 84992d590da5..a44d6d2ef7ed 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -225,33 +225,24 @@ struct dma_buf *i915_gem_prime_export(struct drm_device *dev,
 	return drm_gem_dmabuf_export(dev, &exp_info);
 }
 
-static int i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
+static struct sg_table *
+dmabuf_get_pages(struct i915_gem_object_get_pages_context *ctx,
+		 unsigned int *sizes)
 {
-	struct sg_table *pages;
-	unsigned int sg_page_sizes;
-
-	pages = dma_buf_map_attachment(obj->base.import_attach,
-				       DMA_BIDIRECTIONAL);
-	if (IS_ERR(pages))
-		return PTR_ERR(pages);
-
-	sg_page_sizes = i915_sg_page_sizes(pages->sgl);
-
-	__i915_gem_object_set_pages(obj, pages, sg_page_sizes);
-
-	return 0;
+	return dma_buf_map_attachment(ctx->object->base.import_attach,
+				      DMA_BIDIRECTIONAL);
 }
 
-static void i915_gem_object_put_pages_dmabuf(struct drm_i915_gem_object *obj,
-					     struct sg_table *pages)
+static void dmabuf_put_pages(struct drm_i915_gem_object *obj,
+			     struct sg_table *pages)
 {
 	dma_buf_unmap_attachment(obj->base.import_attach, pages,
 				 DMA_BIDIRECTIONAL);
 }
 
 static const struct drm_i915_gem_object_ops i915_gem_object_dmabuf_ops = {
-	.get_pages = i915_gem_object_get_pages_dmabuf,
-	.put_pages = i915_gem_object_put_pages_dmabuf,
+	.get_pages = dmabuf_get_pages,
+	.put_pages = dmabuf_put_pages,
 };
 
 struct drm_gem_object *i915_gem_prime_import(struct drm_device *dev,
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 44bcb681c168..68faf1a71c97 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1784,9 +1784,6 @@ static noinline int eb_relocate_slow(struct i915_execbuffer *eb)
 		goto out;
 	}
 
-	/* A frequent cause for EAGAIN are currently unavailable client pages */
-	flush_workqueue(eb->i915->mm.userptr_wq);
-
 	err = i915_mutex_lock_interruptible(dev);
 	if (err) {
 		mutex_lock(&dev->struct_mutex);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
index 0c41e04ab8fa..aa0bd5de313b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
@@ -32,11 +32,14 @@ static void internal_free_pages(struct sg_table *st)
 	kfree(st);
 }
 
-static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj)
+static struct sg_table *
+internal_get_pages(struct i915_gem_object_get_pages_context *ctx,
+		   unsigned int *sizes)
 {
+	struct drm_i915_gem_object *obj = ctx->object;
 	struct drm_i915_private *i915 = to_i915(obj->base.dev);
-	struct sg_table *st;
 	struct scatterlist *sg;
+	struct sg_table *st;
 	unsigned int sg_page_sizes;
 	unsigned int npages;
 	int max_order;
@@ -66,12 +69,12 @@ static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj)
 create_st:
 	st = kmalloc(sizeof(*st), GFP_KERNEL);
 	if (!st)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
 	npages = obj->base.size / PAGE_SIZE;
 	if (sg_alloc_table(st, npages, GFP_KERNEL)) {
 		kfree(st);
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 	}
 
 	sg = st->sgl;
@@ -117,27 +120,26 @@ static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj)
 		goto err;
 	}
 
-	/* Mark the pages as dontneed whilst they are still pinned. As soon
+	/*
+	 * Mark the pages as dontneed whilst they are still pinned. As soon
 	 * as they are unpinned they are allowed to be reaped by the shrinker,
 	 * and the caller is expected to repopulate - the contents of this
 	 * object are only valid whilst active and pinned.
 	 */
 	obj->mm.madv = I915_MADV_DONTNEED;
 
-	__i915_gem_object_set_pages(obj, st, sg_page_sizes);
-
-	return 0;
+	*sizes = sg_page_sizes;
+	return st;
 
 err:
 	sg_set_page(sg, NULL, 0, 0);
 	sg_mark_end(sg);
 	internal_free_pages(st);
-
-	return -ENOMEM;
+	return ERR_PTR(-ENOMEM);
 }
 
-static void i915_gem_object_put_pages_internal(struct drm_i915_gem_object *obj,
-					       struct sg_table *pages)
+static void internal_put_pages(struct drm_i915_gem_object *obj,
+			       struct sg_table *pages)
 {
 	i915_gem_gtt_finish_pages(obj, pages);
 	internal_free_pages(pages);
@@ -149,8 +151,8 @@ static void i915_gem_object_put_pages_internal(struct drm_i915_gem_object *obj,
 static const struct drm_i915_gem_object_ops i915_gem_object_internal_ops = {
 	.flags = I915_GEM_OBJECT_HAS_STRUCT_PAGE |
 		 I915_GEM_OBJECT_IS_SHRINKABLE,
-	.get_pages = i915_gem_object_get_pages_internal,
-	.put_pages = i915_gem_object_put_pages_internal,
+	.get_pages = internal_get_pages,
+	.put_pages = internal_put_pages,
 };
 
 /**
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index f792953b8a71..0ea404cfbc1c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -19,6 +19,7 @@
 
 struct drm_i915_gem_object;
 struct intel_fronbuffer;
+struct task_struct;
 
 /*
  * struct i915_lut_handle tracks the fast lookups from handle to vma used
@@ -32,6 +33,11 @@ struct i915_lut_handle {
 	u32 handle;
 };
 
+struct i915_gem_object_get_pages_context {
+	struct drm_i915_gem_object *object;
+	struct task_struct *task;
+};
+
 struct drm_i915_gem_object_ops {
 	unsigned int flags;
 #define I915_GEM_OBJECT_HAS_STRUCT_PAGE	BIT(0)
@@ -52,9 +58,11 @@ struct drm_i915_gem_object_ops {
 	 * being released or under memory pressure (where we attempt to
 	 * reap pages for the shrinker).
 	 */
-	int (*get_pages)(struct drm_i915_gem_object *obj);
+	struct sg_table *(*get_pages)(struct i915_gem_object_get_pages_context *ctx,
+				      unsigned int *sizes);
 	void (*put_pages)(struct drm_i915_gem_object *obj,
 			  struct sg_table *pages);
+
 	void (*truncate)(struct drm_i915_gem_object *obj);
 	void (*writeback)(struct drm_i915_gem_object *obj);
 
@@ -252,7 +260,6 @@ struct drm_i915_gem_object {
 
 			struct i915_mm_struct *mm;
 			struct i915_mmu_object *mmu_object;
-			struct work_struct *work;
 		} userptr;
 
 		unsigned long scratch;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 6bec301cee79..f65a983248c6 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -8,16 +8,49 @@
 #include "i915_gem_object.h"
 #include "i915_scatterlist.h"
 
-void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
-				 struct sg_table *pages,
-				 unsigned int sg_page_sizes)
+static DEFINE_SPINLOCK(fence_lock);
+
+struct get_pages_work {
+	struct dma_fence dma; /* Must be first for dma_fence_free() */
+	struct i915_sw_fence wait;
+	struct work_struct work;
+	struct i915_gem_object_get_pages_context ctx;
+};
+
+static const char *get_pages_work_driver_name(struct dma_fence *fence)
+{
+	return DRIVER_NAME;
+}
+
+static const char *get_pages_work_timeline_name(struct dma_fence *fence)
+{
+	return "allocation";
+}
+
+static void get_pages_work_release(struct dma_fence *fence)
+{
+	struct get_pages_work *w = container_of(fence, typeof(*w), dma);
+
+	i915_sw_fence_fini(&w->wait);
+
+	BUILD_BUG_ON(offsetof(typeof(*w), dma));
+	dma_fence_free(&w->dma);
+}
+
+static const struct dma_fence_ops get_pages_work_ops = {
+	.get_driver_name = get_pages_work_driver_name,
+	.get_timeline_name = get_pages_work_timeline_name,
+	.release = get_pages_work_release,
+};
+
+static void __set_pages(struct drm_i915_gem_object *obj,
+			struct sg_table *pages,
+			unsigned int sg_page_sizes)
 {
 	struct drm_i915_private *i915 = to_i915(obj->base.dev);
 	unsigned long supported = INTEL_INFO(i915)->page_sizes;
 	int i;
 
-	lockdep_assert_held(&obj->mm.lock);
-
 	/* Make the pages coherent with the GPU (flushing any swapin). */
 	if (obj->cache_dirty) {
 		obj->write_domain = 0;
@@ -29,8 +62,6 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
 	obj->mm.get_page.sg_pos = pages->sgl;
 	obj->mm.get_page.sg_idx = 0;
 
-	obj->mm.pages = pages;
-
 	if (i915_gem_object_is_tiled(obj) &&
 	    i915->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
 		GEM_BUG_ON(obj->mm.quirked);
@@ -38,7 +69,8 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
 		obj->mm.quirked = true;
 	}
 
-	GEM_BUG_ON(!sg_page_sizes);
+	if (!sg_page_sizes)
+		sg_page_sizes = i915_sg_page_sizes(pages->sgl);
 	obj->mm.page_sizes.phys = sg_page_sizes;
 
 	/*
@@ -73,18 +105,105 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
 
 		spin_unlock_irqrestore(&i915->mm.obj_lock, flags);
 	}
+}
 
+static void
+get_pages_worker(struct work_struct *_work)
+{
+	struct get_pages_work *work = container_of(_work, typeof(*work), work);
+	struct drm_i915_gem_object *obj = work->ctx.object;
+	struct sg_table *pages;
+	unsigned int sizes = 0;
+
+	if (!work->dma.error) {
+		pages = obj->ops->get_pages(&work->ctx, &sizes);
+		if (!IS_ERR(pages))
+			__set_pages(obj, pages, sizes);
+		else
+			dma_fence_set_error(&work->dma, PTR_ERR(pages));
+	} else {
+		pages = ERR_PTR(work->dma.error);
+	}
+
+	obj->mm.pages = pages;
 	complete_all(&obj->mm.completion);
+	atomic_dec(&obj->mm.pages_pin_count);
+
+	i915_gem_object_put(obj);
+	put_task_struct(work->ctx.task);
+
+	dma_fence_signal(&work->dma);
+	dma_fence_put(&work->dma);
+}
+
+static int __i915_sw_fence_call
+get_pages_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
+{
+	struct get_pages_work *w = container_of(fence, typeof(*w), wait);
+
+	switch (state) {
+	case FENCE_COMPLETE:
+		if (fence->error)
+			dma_fence_set_error(&w->dma, fence->error);
+		queue_work(system_unbound_wq, &w->work);
+		break;
+
+	case FENCE_FREE:
+		dma_fence_put(&w->dma);
+		break;
+	}
+
+	return NOTIFY_DONE;
 }
 
 int ____i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
 {
+	struct get_pages_work *work;
+	int err;
+
 	if (unlikely(obj->mm.madv != I915_MADV_WILLNEED)) {
 		DRM_DEBUG("Attempting to obtain a purgeable object\n");
 		return -EFAULT;
 	}
 
-	return obj->ops->get_pages(obj);
+	/* XXX inline? */
+
+	work = kmalloc(sizeof(*work), GFP_KERNEL);
+	if (!work)
+		return -ENOMEM;
+
+	dma_fence_init(&work->dma,
+		       &get_pages_work_ops,
+		       &fence_lock,
+		       to_i915(obj->base.dev)->mm.unordered_timeline,
+		       0);
+	i915_sw_fence_init(&work->wait, get_pages_notify);
+
+	work->ctx.object = i915_gem_object_get(obj);
+
+	work->ctx.task = current;
+	get_task_struct(work->ctx.task);
+
+	INIT_WORK(&work->work, get_pages_worker);
+
+	i915_gem_object_lock(obj);
+	GEM_BUG_ON(!reservation_object_test_signaled_rcu(obj->resv, true));
+	err = i915_sw_fence_await_reservation(&work->wait,
+					      obj->resv, NULL,
+					      true, I915_FENCE_TIMEOUT,
+					      I915_FENCE_GFP);
+	if (err == 0) {
+		reservation_object_add_excl_fence(obj->resv, &work->dma);
+		atomic_inc(&obj->mm.pages_pin_count);
+	} else {
+		dma_fence_set_error(&work->dma, err);
+	}
+	i915_gem_object_unlock(obj);
+
+	dma_fence_get(&work->dma);
+	i915_sw_fence_commit(&work->wait);
+
+	return err;
 }
 
 /* Ensure that the associated pages are gathered from the backing storage
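
Taken together, the queueing in ____i915_gem_object_get_pages() reduces to a
fixed ordering: serialise behind any fences already on the object, publish
the allocation as the new exclusive fence, then commit the sw_fence so the
worker may run. Condensed (error handling elided; a reading aid, not a
drop-in replacement):

	i915_gem_object_lock(obj);
	err = i915_sw_fence_await_reservation(&work->wait, obj->resv, NULL,
					      true, I915_FENCE_TIMEOUT,
					      I915_FENCE_GFP);
	reservation_object_add_excl_fence(obj->resv, &work->dma);
	atomic_inc(&obj->mm.pages_pin_count); /* dropped by the worker */
	i915_gem_object_unlock(obj);

	dma_fence_get(&work->dma); /* the worker's reference */
	i915_sw_fence_commit(&work->wait); /* queues get_pages_worker() */

Anyone who must observe obj->mm.pages can therefore either wait on the
reservation object like any other reader, or wait on obj->mm.completion
for the synchronous path.
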
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_phys.c b/drivers/gpu/drm/i915/gem/i915_gem_phys.c
index 2deac933cf59..6b4a5fb52055 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_phys.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_phys.c
@@ -17,18 +17,21 @@
 #include "i915_gem_object.h"
 #include "i915_scatterlist.h"
 
-static int i915_gem_object_get_pages_phys(struct drm_i915_gem_object *obj)
+static struct sg_table *
+phys_get_pages(struct i915_gem_object_get_pages_context *ctx,
+	       unsigned int *sizes)
 {
+	struct drm_i915_gem_object *obj = ctx->object;
 	struct address_space *mapping = obj->base.filp->f_mapping;
 	struct drm_dma_handle *phys;
-	struct sg_table *st;
 	struct scatterlist *sg;
+	struct sg_table *st;
 	char *vaddr;
 	int i;
 	int err;
 
 	if (WARN_ON(i915_gem_object_needs_bit17_swizzle(obj)))
-		return -EINVAL;
+		return ERR_PTR(-EINVAL);
 
 	/* Always aligning to the object size, allows a single allocation
 	 * to handle all possible callers, and given typical object sizes,
@@ -38,7 +41,7 @@ static int i915_gem_object_get_pages_phys(struct drm_i915_gem_object *obj)
 			     roundup_pow_of_two(obj->base.size),
 			     roundup_pow_of_two(obj->base.size));
 	if (!phys)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
 	vaddr = phys->vaddr;
 	for (i = 0; i < obj->base.size / PAGE_SIZE; i++) {
@@ -83,19 +86,16 @@ static int i915_gem_object_get_pages_phys(struct drm_i915_gem_object *obj)
 
 	obj->phys_handle = phys;
 
-	__i915_gem_object_set_pages(obj, st, sg->length);
-
-	return 0;
+	*sizes = sg->length;
+	return st;
 
 err_phys:
 	drm_pci_free(obj->base.dev, phys);
-
-	return err;
+	return ERR_PTR(err);
 }
 
 static void
-i915_gem_object_put_pages_phys(struct drm_i915_gem_object *obj,
-			       struct sg_table *pages)
+phys_put_pages(struct drm_i915_gem_object *obj, struct sg_table *pages)
 {
 	__i915_gem_object_release_shmem(obj, pages, false);
 
@@ -139,8 +139,8 @@ i915_gem_object_release_phys(struct drm_i915_gem_object *obj)
 }
 
 static const struct drm_i915_gem_object_ops i915_gem_phys_ops = {
-	.get_pages = i915_gem_object_get_pages_phys,
-	.put_pages = i915_gem_object_put_pages_phys,
+	.get_pages = phys_get_pages,
+	.put_pages = phys_put_pages,
 	.release = i915_gem_object_release_phys,
 };
 
@@ -193,15 +193,12 @@ int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj, int align)
 	if (!IS_ERR_OR_NULL(pages))
 		i915_gem_shmem_ops.put_pages(obj, pages);
 	mutex_unlock(&obj->mm.lock);
+
+	wait_for_completion(&obj->mm.completion);
 	return 0;
 
 err_xfer:
 	obj->ops = &i915_gem_shmem_ops;
-	if (!IS_ERR_OR_NULL(pages)) {
-		unsigned int sg_page_sizes = i915_sg_page_sizes(pages->sgl);
-
-		__i915_gem_object_set_pages(obj, pages, sg_page_sizes);
-	}
 err_unlock:
 	mutex_unlock(&obj->mm.lock);
 	return err;
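
Note the wait_for_completion() added above: get_pages may still be running
in the worker when attach_phys swaps obj->ops, and obj->mm.pages (or its
ERR_PTR) is only stable once the completion fires. A minimal sketch of that
settling step, using a hypothetical helper name:

/* Hypothetical helper: synchronously settle an object's async get_pages. */
static int i915_gem_object_wait_for_pages(struct drm_i915_gem_object *obj)
{
	wait_for_completion(&obj->mm.completion);
	return PTR_ERR_OR_ZERO(obj->mm.pages); /* error if allocation failed */
}
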
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 19d9ecdb2894..c43304e3bada 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -22,11 +22,13 @@ static void check_release_pagevec(struct pagevec *pvec)
 	cond_resched();
 }
 
-static int shmem_get_pages(struct drm_i915_gem_object *obj)
+static struct sg_table *
+shmem_get_pages(struct i915_gem_object_get_pages_context *ctx,
+		unsigned int *sizes)
 {
+	struct drm_i915_gem_object *obj = ctx->object;
 	struct drm_i915_private *i915 = to_i915(obj->base.dev);
 	const unsigned long page_count = obj->base.size / PAGE_SIZE;
-	unsigned long i;
 	struct address_space *mapping;
 	struct sg_table *st;
 	struct scatterlist *sg;
@@ -37,31 +39,24 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj)
 	unsigned int sg_page_sizes;
 	struct pagevec pvec;
 	gfp_t noreclaim;
+	unsigned long i;
 	int ret;
 
-	/*
-	 * Assert that the object is not currently in any GPU domain. As it
-	 * wasn't in the GTT, there shouldn't be any way it could have been in
-	 * a GPU cache
-	 */
-	GEM_BUG_ON(obj->read_domains & I915_GEM_GPU_DOMAINS);
-	GEM_BUG_ON(obj->write_domain & I915_GEM_GPU_DOMAINS);
-
 	/*
 	 * If there's no chance of allocating enough pages for the whole
 	 * object, bail early.
 	 */
 	if (page_count > totalram_pages())
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
 	st = kmalloc(sizeof(*st), GFP_KERNEL);
 	if (!st)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
 rebuild_st:
 	if (sg_alloc_table(st, page_count, GFP_KERNEL)) {
 		kfree(st);
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 	}
 
 	/*
@@ -179,9 +174,8 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj)
 	if (i915_gem_object_needs_bit17_swizzle(obj))
 		i915_gem_object_do_bit_17_swizzle(obj, st);
 
-	__i915_gem_object_set_pages(obj, st, sg_page_sizes);
-
-	return 0;
+	*sizes = sg_page_sizes;
+	return st;
 
 err_sg:
 	sg_mark_end(sg);
@@ -209,7 +203,7 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj)
 	if (ret == -ENOSPC)
 		ret = -ENOMEM;
 
-	return ret;
+	return ERR_PTR(ret);
 }
 
 static void
@@ -276,8 +270,6 @@ __i915_gem_object_release_shmem(struct drm_i915_gem_object *obj,
 				struct sg_table *pages,
 				bool needs_clflush)
 {
-	GEM_BUG_ON(obj->mm.madv == __I915_MADV_PURGED);
-
 	if (obj->mm.madv == I915_MADV_DONTNEED)
 		obj->mm.dirty = false;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index 7066044d63cf..28ea06e667cd 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -499,22 +499,19 @@ i915_pages_create_for_stolen(struct drm_device *dev,
 	return st;
 }
 
-static int i915_gem_object_get_pages_stolen(struct drm_i915_gem_object *obj)
+static struct sg_table *
+stolen_get_pages(struct i915_gem_object_get_pages_context *ctx,
+		 unsigned int *sizes)
 {
-	struct sg_table *pages =
-		i915_pages_create_for_stolen(obj->base.dev,
-					     obj->stolen->start,
-					     obj->stolen->size);
-	if (IS_ERR(pages))
-		return PTR_ERR(pages);
-
-	__i915_gem_object_set_pages(obj, pages, obj->stolen->size);
+	struct drm_i915_gem_object *obj = ctx->object;
 
-	return 0;
+	return i915_pages_create_for_stolen(obj->base.dev,
+					    obj->stolen->start,
+					    obj->stolen->size);
 }
 
-static void i915_gem_object_put_pages_stolen(struct drm_i915_gem_object *obj,
-					     struct sg_table *pages)
+static void
+stolen_put_pages(struct drm_i915_gem_object *obj, struct sg_table *pages)
 {
 	/* Should only be called from i915_gem_object_release_stolen() */
 	sg_free_table(pages);
@@ -536,8 +533,8 @@ i915_gem_object_release_stolen(struct drm_i915_gem_object *obj)
 }
 
 static const struct drm_i915_gem_object_ops i915_gem_object_stolen_ops = {
-	.get_pages = i915_gem_object_get_pages_stolen,
-	.put_pages = i915_gem_object_put_pages_stolen,
+	.get_pages = stolen_get_pages,
+	.put_pages = stolen_put_pages,
 	.release = i915_gem_object_release_stolen,
 };
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index f093deaeb5c0..6748e15bf89a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -378,7 +378,7 @@ __i915_mm_struct_free(struct kref *kref)
 	mutex_unlock(&mm->i915->mm_lock);
 
 	INIT_WORK(&mm->work, __i915_mm_struct_free__worker);
-	queue_work(mm->i915->mm.userptr_wq, &mm->work);
+	schedule_work(&mm->work);
 }
 
 static void
@@ -393,19 +393,12 @@ i915_gem_userptr_release__mm_struct(struct drm_i915_gem_object *obj)
 	obj->userptr.mm = NULL;
 }
 
-struct get_pages_work {
-	struct work_struct work;
-	struct drm_i915_gem_object *obj;
-	struct task_struct *task;
-};
-
 static struct sg_table *
 __i915_gem_userptr_alloc_pages(struct drm_i915_gem_object *obj,
 			       struct page **pvec, int num_pages)
 {
 	unsigned int max_segment = i915_sg_segment_size();
 	struct sg_table *st;
-	unsigned int sg_page_sizes;
 	int ret;
 
 	st = kmalloc(sizeof(*st), GFP_KERNEL);
@@ -435,131 +428,23 @@ __i915_gem_userptr_alloc_pages(struct drm_i915_gem_object *obj,
 		return ERR_PTR(ret);
 	}
 
-	sg_page_sizes = i915_sg_page_sizes(st->sgl);
-
-	__i915_gem_object_set_pages(obj, st, sg_page_sizes);
-
 	return st;
 }
 
-static void
-__i915_gem_userptr_get_pages_worker(struct work_struct *_work)
-{
-	struct get_pages_work *work = container_of(_work, typeof(*work), work);
-	struct drm_i915_gem_object *obj = work->obj;
-	const int npages = obj->base.size >> PAGE_SHIFT;
-	struct page **pvec;
-	int pinned, ret;
-
-	ret = -ENOMEM;
-	pinned = 0;
-
-	pvec = kvmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
-	if (pvec != NULL) {
-		struct mm_struct *mm = obj->userptr.mm->mm;
-		unsigned int flags = 0;
-
-		if (!i915_gem_object_is_readonly(obj))
-			flags |= FOLL_WRITE;
-
-		ret = -EFAULT;
-		if (mmget_not_zero(mm)) {
-			down_read(&mm->mmap_sem);
-			while (pinned < npages) {
-				ret = get_user_pages_remote
-					(work->task, mm,
-					 obj->userptr.ptr + pinned * PAGE_SIZE,
-					 npages - pinned,
-					 flags,
-					 pvec + pinned, NULL, NULL);
-				if (ret < 0)
-					break;
-
-				pinned += ret;
-			}
-			up_read(&mm->mmap_sem);
-			mmput(mm);
-		}
-	}
-
-	mutex_lock(&obj->mm.lock);
-	if (obj->userptr.work == &work->work) {
-		struct sg_table *pages = ERR_PTR(ret);
-
-		if (pinned == npages) {
-			pages = __i915_gem_userptr_alloc_pages(obj, pvec,
-							       npages);
-			if (!IS_ERR(pages)) {
-				pinned = 0;
-				pages = NULL;
-			}
-		}
-
-		obj->userptr.work = ERR_CAST(pages);
-		if (IS_ERR(pages))
-			__i915_gem_userptr_set_active(obj, false);
-	}
-	mutex_unlock(&obj->mm.lock);
-
-	release_pages(pvec, pinned);
-	kvfree(pvec);
-
-	i915_gem_object_put(obj);
-	put_task_struct(work->task);
-	kfree(work);
-}
-
 static struct sg_table *
-__i915_gem_userptr_get_pages_schedule(struct drm_i915_gem_object *obj)
-{
-	struct get_pages_work *work;
-
-	/* Spawn a worker so that we can acquire the
-	 * user pages without holding our mutex. Access
-	 * to the user pages requires mmap_sem, and we have
-	 * a strict lock ordering of mmap_sem, struct_mutex -
-	 * we already hold struct_mutex here and so cannot
-	 * call gup without encountering a lock inversion.
-	 *
-	 * Userspace will keep on repeating the operation
-	 * (thanks to EAGAIN) until either we hit the fast
-	 * path or the worker completes. If the worker is
-	 * cancelled or superseded, the task is still run
-	 * but the results ignored. (This leads to
-	 * complications that we may have a stray object
-	 * refcount that we need to be wary of when
-	 * checking for existing objects during creation.)
-	 * If the worker encounters an error, it reports
-	 * that error back to this function through
-	 * obj->userptr.work = ERR_PTR.
-	 */
-	work = kmalloc(sizeof(*work), GFP_KERNEL);
-	if (work == NULL)
-		return ERR_PTR(-ENOMEM);
-
-	obj->userptr.work = &work->work;
-
-	work->obj = i915_gem_object_get(obj);
-
-	work->task = current;
-	get_task_struct(work->task);
-
-	INIT_WORK(&work->work, __i915_gem_userptr_get_pages_worker);
-	queue_work(to_i915(obj->base.dev)->mm.userptr_wq, &work->work);
-
-	return ERR_PTR(-EAGAIN);
-}
-
-static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
+userptr_get_pages(struct i915_gem_object_get_pages_context *ctx,
+		  unsigned int *sizes)
 {
+	struct drm_i915_gem_object *obj = ctx->object;
 	const int num_pages = obj->base.size >> PAGE_SHIFT;
 	struct mm_struct *mm = obj->userptr.mm->mm;
-	struct page **pvec;
 	struct sg_table *pages;
-	bool active;
+	struct page **pvec;
 	int pinned;
+	int ret;
 
-	/* If userspace should engineer that these pages are replaced in
+	/*
+	 * If userspace should engineer that these pages are replaced in
 	 * the vma between us binding this page into the GTT and completion
 	 * of rendering... Their loss. If they change the mapping of their
 	 * pages they need to create a new bo to point to the new vma.
@@ -576,59 +461,58 @@ static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
 	 * egregious cases from causing harm.
 	 */
 
-	if (obj->userptr.work) {
-		/* active flag should still be held for the pending work */
-		if (IS_ERR(obj->userptr.work))
-			return PTR_ERR(obj->userptr.work);
-		else
-			return -EAGAIN;
-	}
+	pvec = kvmalloc_array(num_pages, sizeof(struct page *),
+			      GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
+	if (!pvec)
+		return ERR_PTR(-ENOMEM);
+
+	__i915_gem_userptr_set_active(obj, true);
 
-	pvec = NULL;
 	pinned = 0;
+	ret = -EFAULT;
+	if (mmget_not_zero(mm)) {
+		unsigned int flags;
 
-	if (mm == current->mm) {
-		pvec = kvmalloc_array(num_pages, sizeof(struct page *),
-				      GFP_KERNEL |
-				      __GFP_NORETRY |
-				      __GFP_NOWARN);
-		if (pvec) /* defer to worker if malloc fails */
-			pinned = __get_user_pages_fast(obj->userptr.ptr,
-						       num_pages,
-						       !i915_gem_object_is_readonly(obj),
-						       pvec);
-	}
+		flags = 0;
+		if (!i915_gem_object_is_readonly(obj))
+			flags |= FOLL_WRITE;
 
-	active = false;
-	if (pinned < 0) {
-		pages = ERR_PTR(pinned);
-		pinned = 0;
-	} else if (pinned < num_pages) {
-		pages = __i915_gem_userptr_get_pages_schedule(obj);
-		active = pages == ERR_PTR(-EAGAIN);
-	} else {
-		pages = __i915_gem_userptr_alloc_pages(obj, pvec, num_pages);
-		active = !IS_ERR(pages);
+		down_read(&mm->mmap_sem);
+		while (pinned < num_pages) {
+			ret = get_user_pages_remote
+				(ctx->task, mm,
+				 obj->userptr.ptr + pinned * PAGE_SIZE,
+				 num_pages - pinned,
+				 flags,
+				 pvec + pinned, NULL, NULL);
+			if (ret < 0)
+				break;
+
+			pinned += ret;
+		}
+		up_read(&mm->mmap_sem);
+		mmput(mm);
 	}
-	if (active)
-		__i915_gem_userptr_set_active(obj, true);
 
-	if (IS_ERR(pages))
+	if (ret)
+		pages = ERR_PTR(ret);
+	else
+		pages = __i915_gem_userptr_alloc_pages(obj, pvec, num_pages);
+	if (IS_ERR(pages)) {
 		release_pages(pvec, pinned);
+		__i915_gem_userptr_set_active(obj, false);
+	}
 	kvfree(pvec);
 
-	return PTR_ERR_OR_ZERO(pages);
+	return pages;
 }
 
 static void
-i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj,
-			   struct sg_table *pages)
+userptr_put_pages(struct drm_i915_gem_object *obj, struct sg_table *pages)
 {
 	struct sgt_iter sgt_iter;
 	struct page *page;
 
-	/* Cancel any inflight work and force them to restart their gup */
-	obj->userptr.work = NULL;
 	__i915_gem_userptr_set_active(obj, false);
 	if (!pages)
 		return;
@@ -669,8 +553,8 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
 	.flags = I915_GEM_OBJECT_HAS_STRUCT_PAGE |
 		 I915_GEM_OBJECT_IS_SHRINKABLE |
 		 I915_GEM_OBJECT_ASYNC_CANCEL,
-	.get_pages = i915_gem_userptr_get_pages,
-	.put_pages = i915_gem_userptr_put_pages,
+	.get_pages = userptr_get_pages,
+	.put_pages = userptr_put_pages,
 	.dmabuf_export = i915_gem_userptr_dmabuf_export,
 	.release = i915_gem_userptr_release,
 };
@@ -786,22 +670,13 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
 	return 0;
 }
 
-int i915_gem_init_userptr(struct drm_i915_private *dev_priv)
+void i915_gem_init_userptr(struct drm_i915_private *dev_priv)
 {
 	mutex_init(&dev_priv->mm_lock);
 	hash_init(dev_priv->mm_structs);
-
-	dev_priv->mm.userptr_wq =
-		alloc_workqueue("i915-userptr-acquire",
-				WQ_HIGHPRI | WQ_UNBOUND,
-				0);
-	if (!dev_priv->mm.userptr_wq)
-		return -ENOMEM;
-
-	return 0;
 }
 
 void i915_gem_cleanup_userptr(struct drm_i915_private *dev_priv)
 {
-	destroy_workqueue(dev_priv->mm.userptr_wq);
+	mutex_destroy(&dev_priv->mm_lock);
 }
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
index 3c5d17b2b670..02e6edce715e 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
@@ -21,9 +21,12 @@ static void huge_free_pages(struct drm_i915_gem_object *obj,
 	kfree(pages);
 }
 
-static int huge_get_pages(struct drm_i915_gem_object *obj)
+static struct sg_table *
+huge_get_pages(struct i915_gem_object_get_pages_context *ctx,
+	       unsigned int *sizes)
 {
 #define GFP (GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY)
+	struct drm_i915_gem_object *obj = ctx->object;
 	const unsigned long nreal = obj->scratch / PAGE_SIZE;
 	const unsigned long npages = obj->base.size / PAGE_SIZE;
 	struct scatterlist *sg, *src, *end;
@@ -32,11 +35,11 @@ static int huge_get_pages(struct drm_i915_gem_object *obj)
 
 	pages = kmalloc(sizeof(*pages), GFP);
 	if (!pages)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
 	if (sg_alloc_table(pages, npages, GFP)) {
 		kfree(pages);
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 	}
 
 	sg = pages->sgl;
@@ -64,14 +67,12 @@ static int huge_get_pages(struct drm_i915_gem_object *obj)
 	if (i915_gem_gtt_prepare_pages(obj, pages))
 		goto err;
 
-	__i915_gem_object_set_pages(obj, pages, PAGE_SIZE);
-
-	return 0;
+	*sizes = PAGE_SIZE;
+	return pages;
 
 err:
 	huge_free_pages(obj, pages);
-
-	return -ENOMEM;
+	return ERR_PTR(-ENOMEM);
 #undef GFP
 }
 
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index b03a29366bd1..3b93a7337f27 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -51,9 +51,12 @@ static void huge_pages_free_pages(struct sg_table *st)
 	kfree(st);
 }
 
-static int get_huge_pages(struct drm_i915_gem_object *obj)
+static struct sg_table *
+get_huge_pages(struct i915_gem_object_get_pages_context *ctx,
+	       unsigned int *sizes)
 {
 #define GFP (GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY)
+	struct drm_i915_gem_object *obj = ctx->object;
 	unsigned int page_mask = obj->mm.page_mask;
 	struct sg_table *st;
 	struct scatterlist *sg;
@@ -62,11 +65,11 @@ static int get_huge_pages(struct drm_i915_gem_object *obj)
 
 	st = kmalloc(sizeof(*st), GFP);
 	if (!st)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
 	if (sg_alloc_table(st, obj->base.size >> PAGE_SHIFT, GFP)) {
 		kfree(st);
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 	}
 
 	rem = obj->base.size;
@@ -114,16 +117,14 @@ static int get_huge_pages(struct drm_i915_gem_object *obj)
 	obj->mm.madv = I915_MADV_DONTNEED;
 
 	GEM_BUG_ON(sg_page_sizes != obj->mm.page_mask);
-	__i915_gem_object_set_pages(obj, st, sg_page_sizes);
-
-	return 0;
+	*sizes = sg_page_sizes;
+	return st;
 
 err:
 	sg_set_page(sg, NULL, 0, 0);
 	sg_mark_end(sg);
 	huge_pages_free_pages(st);
-
-	return -ENOMEM;
+	return ERR_PTR(-ENOMEM);
 }
 
 static void put_huge_pages(struct drm_i915_gem_object *obj,
@@ -175,8 +176,11 @@ huge_pages_object(struct drm_i915_private *i915,
 	return obj;
 }
 
-static int fake_get_huge_pages(struct drm_i915_gem_object *obj)
+static struct sg_table *
+fake_get_huge_pages(struct i915_gem_object_get_pages_context *ctx,
+		    unsigned int *sizes)
 {
+	struct drm_i915_gem_object *obj = ctx->object;
 	struct drm_i915_private *i915 = to_i915(obj->base.dev);
 	const u64 max_len = rounddown_pow_of_two(UINT_MAX);
 	struct sg_table *st;
@@ -186,11 +190,11 @@ static int fake_get_huge_pages(struct drm_i915_gem_object *obj)
 
 	st = kmalloc(sizeof(*st), GFP);
 	if (!st)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
 	if (sg_alloc_table(st, obj->base.size >> PAGE_SHIFT, GFP)) {
 		kfree(st);
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 	}
 
 	/* Use optimal page sized chunks to fill in the sg table */
@@ -227,13 +231,15 @@ static int fake_get_huge_pages(struct drm_i915_gem_object *obj)
 
 	obj->mm.madv = I915_MADV_DONTNEED;
 
-	__i915_gem_object_set_pages(obj, st, sg_page_sizes);
-
-	return 0;
+	*sizes = sg_page_sizes;
+	return st;
 }
 
-static int fake_get_huge_pages_single(struct drm_i915_gem_object *obj)
+static struct sg_table *
+fake_get_huge_pages_single(struct i915_gem_object_get_pages_context *ctx,
+			   unsigned int *sizes)
 {
+	struct drm_i915_gem_object *obj = ctx->object;
 	struct drm_i915_private *i915 = to_i915(obj->base.dev);
 	struct sg_table *st;
 	struct scatterlist *sg;
@@ -241,11 +247,11 @@ static int fake_get_huge_pages_single(struct drm_i915_gem_object *obj)
 
 	st = kmalloc(sizeof(*st), GFP);
 	if (!st)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
 	if (sg_alloc_table(st, 1, GFP)) {
 		kfree(st);
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 	}
 
 	sg = st->sgl;
@@ -261,9 +267,8 @@ static int fake_get_huge_pages_single(struct drm_i915_gem_object *obj)
 
 	obj->mm.madv = I915_MADV_DONTNEED;
 
-	__i915_gem_object_set_pages(obj, st, sg->length);
-
-	return 0;
+	*sizes = sg->length;
+	return st;
 #undef GFP
 }
 
diff --git a/drivers/gpu/drm/i915/gvt/dmabuf.c b/drivers/gpu/drm/i915/gvt/dmabuf.c
index 41c8ebc60c63..c2a81939f698 100644
--- a/drivers/gpu/drm/i915/gvt/dmabuf.c
+++ b/drivers/gpu/drm/i915/gvt/dmabuf.c
@@ -36,9 +36,11 @@
 
 #define GEN8_DECODE_PTE(pte) (pte & GENMASK_ULL(63, 12))
 
-static int vgpu_gem_get_pages(
-		struct drm_i915_gem_object *obj)
+static struct sg_table *
+vgpu_gem_get_pages(struct i915_gem_object_get_pages_context *ctx,
+		   unsigned int *sizes)
 {
+	struct drm_i915_gem_object *obj = ctx->object;
 	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
 	struct sg_table *st;
 	struct scatterlist *sg;
@@ -49,17 +51,17 @@ static int vgpu_gem_get_pages(
 
 	fb_info = (struct intel_vgpu_fb_info *)obj->gvt_info;
 	if (WARN_ON(!fb_info))
-		return -ENODEV;
+		return ERR_PTR(-ENODEV);
 
 	st = kmalloc(sizeof(*st), GFP_KERNEL);
 	if (unlikely(!st))
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
 	page_num = obj->base.size >> PAGE_SHIFT;
 	ret = sg_alloc_table(st, page_num, GFP_KERNEL);
 	if (ret) {
 		kfree(st);
-		return ret;
+		return ERR_PTR(ret);
 	}
 	gtt_entries = (gen8_pte_t __iomem *)dev_priv->ggtt.gsm +
 		(fb_info->start >> PAGE_SHIFT);
@@ -71,9 +73,8 @@ static int vgpu_gem_get_pages(
 		sg_dma_len(sg) = PAGE_SIZE;
 	}
 
-	__i915_gem_object_set_pages(obj, st, PAGE_SIZE);
-
-	return 0;
+	*sizes = PAGE_SIZE;
+	return st;
 }
 
 static void vgpu_gem_put_pages(struct drm_i915_gem_object *obj,
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9a1ec487e8b1..ef9e4cb49b4a 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -786,13 +786,6 @@ struct i915_gem_mm {
 	struct notifier_block vmap_notifier;
 	struct shrinker shrinker;
 
-	/**
-	 * Workqueue to fault in userptr pages, flushed by the execbuf
-	 * when required but otherwise left to userspace to try again
-	 * on EAGAIN.
-	 */
-	struct workqueue_struct *userptr_wq;
-
 	u64 unordered_timeline;
 
 	/* the indicator for dispatch video commands on two BSD rings */
@@ -2514,7 +2507,7 @@ static inline bool intel_vgpu_active(struct drm_i915_private *dev_priv)
 }
 
 /* i915_gem.c */
-int i915_gem_init_userptr(struct drm_i915_private *dev_priv);
+void i915_gem_init_userptr(struct drm_i915_private *dev_priv);
 void i915_gem_cleanup_userptr(struct drm_i915_private *dev_priv);
 void i915_gem_sanitize(struct drm_i915_private *i915);
 int i915_gem_init_early(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 21416b87905e..1571c707ad15 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1544,10 +1544,7 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 	dev_priv->mm.unordered_timeline = dma_fence_context_alloc(1);
 
 	i915_timelines_init(dev_priv);
-
-	ret = i915_gem_init_userptr(dev_priv);
-	if (ret)
-		return ret;
+	i915_gem_init_userptr(dev_priv);
 
 	ret = intel_uc_init_misc(dev_priv);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 664b7e3e1d7c..dae1f8634a38 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -54,10 +54,13 @@ static void fake_free_pages(struct drm_i915_gem_object *obj,
 	kfree(pages);
 }
 
-static int fake_get_pages(struct drm_i915_gem_object *obj)
+static struct sg_table *
+fake_get_pages(struct i915_gem_object_get_pages_context *ctx,
+	       unsigned int *sizes)
 {
 #define GFP (GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY)
 #define PFN_BIAS 0x1000
+	struct drm_i915_gem_object *obj = ctx->object;
 	struct sg_table *pages;
 	struct scatterlist *sg;
 	unsigned int sg_page_sizes;
@@ -65,12 +68,12 @@ static int fake_get_pages(struct drm_i915_gem_object *obj)
 
 	pages = kmalloc(sizeof(*pages), GFP);
 	if (!pages)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
 	rem = round_up(obj->base.size, BIT(31)) >> 31;
 	if (sg_alloc_table(pages, rem, GFP)) {
 		kfree(pages);
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 	}
 
 	sg_page_sizes = 0;
@@ -90,9 +93,8 @@ static int fake_get_pages(struct drm_i915_gem_object *obj)
 
 	obj->mm.madv = I915_MADV_DONTNEED;
 
-	__i915_gem_object_set_pages(obj, pages, sg_page_sizes);
-
-	return 0;
+	*sizes = sg_page_sizes;
+	return pages;
 #undef GFP
 }
 
-- 
2.20.1


* [PATCH 37/39] drm/i915: Use forced preemptions in preference over hangcheck
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (35 preceding siblings ...)
  2019-06-14  7:10 ` [PATCH 36/39] drm/i915: Use reservation_object to coordinate userptr get_pages() Chris Wilson
@ 2019-06-14  7:10 ` Chris Wilson
  2019-06-14  7:10 ` [PATCH 38/39] drm/i915: Remove logical HW ID Chris Wilson
                   ` (3 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:10 UTC (permalink / raw)
  To: intel-gfx

How well does this work in practice? It means that unless someone else
is attempting to run, we do not reset infinite loops. Maybe that is a
good thing.

Opens:

* This sacrifices error capture. Maybe make that an opt-in with a
watchdog.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_pm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index ae7155f0e063..9a4e58d637ee 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -43,7 +43,8 @@ static int intel_gt_unpark(struct intel_wakeref *wf)
 
 	i915_pmu_gt_unparked(i915);
 
-	i915_queue_hangcheck(i915);
+	if (!(i915->caps.scheduler & I915_SCHEDULER_CAP_PREEMPTION))
+		i915_queue_hangcheck(i915);
 
 	pm_notify(i915, INTEL_GT_UNPARK);
 
-- 
2.20.1


* [PATCH 38/39] drm/i915: Remove logical HW ID
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (36 preceding siblings ...)
  2019-06-14  7:10 ` [PATCH 37/39] drm/i915: Use forced preemptions in preference over hangcheck Chris Wilson
@ 2019-06-14  7:10 ` Chris Wilson
  2019-06-14  7:10 ` [PATCH 39/39] active-rings Chris Wilson
                   ` (2 subsequent siblings)
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:10 UTC (permalink / raw)
  To: intel-gfx

We only need to keep a unique tag for the active lifetime of the
context, and for as long as we need to identify that context. The HW
uses the tag to determine if it should use a lite-restore (why not the
LRCA?) and passes the tag back for various status identifiers. The only
status we need to track is for OA, so when using perf, we assign the
specific context a unique tag.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 157 ------------------
 drivers/gpu/drm/i915/gem/i915_gem_context.h   |  15 --
 .../gpu/drm/i915/gem/i915_gem_context_types.h |  18 --
 .../drm/i915/gem/selftests/i915_gem_context.c |  13 +-
 .../gpu/drm/i915/gem/selftests/mock_context.c |   8 -
 drivers/gpu/drm/i915/gt/intel_context_types.h |   1 +
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   5 +-
 drivers/gpu/drm/i915/gt/intel_lrc.c           |  28 +---
 drivers/gpu/drm/i915/gvt/kvmgt.c              |  17 --
 drivers/gpu/drm/i915/i915_debugfs.c           |   3 -
 drivers/gpu/drm/i915/i915_drv.h               |  10 --
 drivers/gpu/drm/i915/i915_gpu_error.c         |   7 +-
 drivers/gpu/drm/i915/i915_gpu_error.h         |   1 -
 drivers/gpu/drm/i915/i915_perf.c              |  28 ++--
 drivers/gpu/drm/i915/i915_trace.h             |  38 ++---
 .../gpu/drm/i915/selftests/i915_gem_evict.c   |   4 +-
 drivers/gpu/drm/i915/selftests/i915_vma.c     |   2 +-
 17 files changed, 51 insertions(+), 304 deletions(-)
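
The replacement scheme is easiest to see in execlists_schedule_in() below: a
context either carries a sticky tag (assigned by perf for OA filtering) or
borrows a rolling per-engine tag at submission. Restating that hunk with the
reasoning as comments:

	if (ce->hw_tag) {
		/* Sticky tag, e.g. pinned by perf so OA reports can be
		 * filtered to this context. */
		ce->lrc_desc |= (u64)ce->hw_tag << 32;
	} else {
		/* Borrow a rolling tag: unique among the NUM_HW_TAG (256)
		 * most recent submissions on this engine, which is all the
		 * HW needs to spot a lite-restore. Bits 37-47 are the gen11
		 * SW context id. */
		ce->lrc_desc &= ~GENMASK_ULL(47, 37);
		ce->lrc_desc |= (u64)(rq->engine->hw_tag++ % NUM_HW_TAG) <<
				GEN11_SW_CTX_ID_SHIFT;
	}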

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index da6bf27697e4..14407a8bbe60 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -169,95 +169,6 @@ lookup_user_engine(struct i915_gem_context *ctx,
 	return i915_gem_context_get_engine(ctx, idx);
 }
 
-static inline int new_hw_id(struct drm_i915_private *i915, gfp_t gfp)
-{
-	unsigned int max;
-
-	lockdep_assert_held(&i915->contexts.mutex);
-
-	if (INTEL_GEN(i915) >= 11)
-		max = GEN11_MAX_CONTEXT_HW_ID;
-	else if (USES_GUC_SUBMISSION(i915))
-		/*
-		 * When using GuC in proxy submission, GuC consumes the
-		 * highest bit in the context id to indicate proxy submission.
-		 */
-		max = MAX_GUC_CONTEXT_HW_ID;
-	else
-		max = MAX_CONTEXT_HW_ID;
-
-	return ida_simple_get(&i915->contexts.hw_ida, 0, max, gfp);
-}
-
-static int steal_hw_id(struct drm_i915_private *i915)
-{
-	struct i915_gem_context *ctx, *cn;
-	LIST_HEAD(pinned);
-	int id = -ENOSPC;
-
-	lockdep_assert_held(&i915->contexts.mutex);
-
-	list_for_each_entry_safe(ctx, cn,
-				 &i915->contexts.hw_id_list, hw_id_link) {
-		if (atomic_read(&ctx->hw_id_pin_count)) {
-			list_move_tail(&ctx->hw_id_link, &pinned);
-			continue;
-		}
-
-		GEM_BUG_ON(!ctx->hw_id); /* perma-pinned kernel context */
-		list_del_init(&ctx->hw_id_link);
-		id = ctx->hw_id;
-		break;
-	}
-
-	/*
-	 * Remember how far we got up on the last repossesion scan, so the
-	 * list is kept in a "least recently scanned" order.
-	 */
-	list_splice_tail(&pinned, &i915->contexts.hw_id_list);
-	return id;
-}
-
-static int assign_hw_id(struct drm_i915_private *i915, unsigned int *out)
-{
-	int ret;
-
-	lockdep_assert_held(&i915->contexts.mutex);
-
-	/*
-	 * We prefer to steal/stall ourselves and our users over that of the
-	 * entire system. That may be a little unfair to our users, and
-	 * even hurt high priority clients. The choice is whether to oomkill
-	 * something else, or steal a context id.
-	 */
-	ret = new_hw_id(i915, GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
-	if (unlikely(ret < 0)) {
-		ret = steal_hw_id(i915);
-		if (ret < 0) /* once again for the correct errno code */
-			ret = new_hw_id(i915, GFP_KERNEL);
-		if (ret < 0)
-			return ret;
-	}
-
-	*out = ret;
-	return 0;
-}
-
-static void release_hw_id(struct i915_gem_context *ctx)
-{
-	struct drm_i915_private *i915 = ctx->i915;
-
-	if (list_empty(&ctx->hw_id_link))
-		return;
-
-	mutex_lock(&i915->contexts.mutex);
-	if (!list_empty(&ctx->hw_id_link)) {
-		ida_simple_remove(&i915->contexts.hw_ida, ctx->hw_id);
-		list_del_init(&ctx->hw_id_link);
-	}
-	mutex_unlock(&i915->contexts.mutex);
-}
-
 static void __free_engines(struct i915_gem_engines *e, unsigned int count)
 {
 	while (count--) {
@@ -311,7 +222,6 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
 	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
 	GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
 
-	release_hw_id(ctx);
 	if (ctx->vm)
 		i915_vm_put(ctx->vm);
 
@@ -383,12 +293,6 @@ static void context_close(struct i915_gem_context *ctx)
 	i915_gem_context_set_closed(ctx);
 	ctx->file_priv = ERR_PTR(-EBADF);
 
-	/*
-	 * This context will never again be assinged to HW, so we can
-	 * reuse its ID for the next context.
-	 */
-	release_hw_id(ctx);
-
 	/*
 	 * The LUT uses the VMA as a backpointer to unref the object,
 	 * so we need to clear the LUT before we close all the VMA (inside
@@ -451,7 +355,6 @@ __create_context(struct drm_i915_private *i915)
 	RCU_INIT_POINTER(ctx->engines, e);
 
 	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
-	INIT_LIST_HEAD(&ctx->hw_id_link);
 
 	/* NB: Mark all slices as needing a remap so that when the context first
 	 * loads it will restore whatever remap state already exists. If there
@@ -574,13 +477,6 @@ i915_gem_context_create_gvt(struct drm_device *dev)
 	if (IS_ERR(ctx))
 		goto out;
 
-	ret = i915_gem_context_pin_hw_id(ctx);
-	if (ret) {
-		context_close(ctx);
-		ctx = ERR_PTR(ret);
-		goto out;
-	}
-
 	ctx->file_priv = ERR_PTR(-EBADF);
 	i915_gem_context_set_closed(ctx); /* not user accessible */
 	i915_gem_context_clear_bannable(ctx);
@@ -611,18 +507,11 @@ struct i915_gem_context *
 i915_gem_context_create_kernel(struct drm_i915_private *i915, int prio)
 {
 	struct i915_gem_context *ctx;
-	int err;
 
 	ctx = i915_gem_create_context(i915, 0);
 	if (IS_ERR(ctx))
 		return ctx;
 
-	err = i915_gem_context_pin_hw_id(ctx);
-	if (err) {
-		destroy_kernel_context(&ctx);
-		return ERR_PTR(err);
-	}
-
 	i915_gem_context_clear_bannable(ctx);
 	ctx->sched.priority = I915_USER_PRIORITY(prio);
 	ctx->ring_size = PAGE_SIZE;
@@ -637,12 +526,6 @@ static void init_contexts(struct drm_i915_private *i915)
 	mutex_init(&i915->contexts.mutex);
 	INIT_LIST_HEAD(&i915->contexts.list);
 
-	/* Using the simple ida interface, the max is limited by sizeof(int) */
-	BUILD_BUG_ON(MAX_CONTEXT_HW_ID > INT_MAX);
-	BUILD_BUG_ON(GEN11_MAX_CONTEXT_HW_ID > INT_MAX);
-	ida_init(&i915->contexts.hw_ida);
-	INIT_LIST_HEAD(&i915->contexts.hw_id_list);
-
 	INIT_WORK(&i915->contexts.free_work, contexts_free_worker);
 	init_llist_head(&i915->contexts.free_list);
 }
@@ -669,15 +552,6 @@ int i915_gem_contexts_init(struct drm_i915_private *dev_priv)
 		DRM_ERROR("Failed to create default global context\n");
 		return PTR_ERR(ctx);
 	}
-	/*
-	 * For easy recognisablity, we want the kernel context to be 0 and then
-	 * all user contexts will have non-zero hw_id. Kernel contexts are
-	 * permanently pinned, so that we never suffer a stall and can
-	 * use them from any allocation context (e.g. for evicting other
-	 * contexts and from inside the shrinker).
-	 */
-	GEM_BUG_ON(ctx->hw_id);
-	GEM_BUG_ON(!atomic_read(&ctx->hw_id_pin_count));
 	dev_priv->kernel_context = ctx;
 
 	/* highest priority; preempting task */
@@ -702,10 +576,6 @@ void i915_gem_contexts_fini(struct drm_i915_private *i915)
 	if (i915->preempt_context)
 		destroy_kernel_context(&i915->preempt_context);
 	destroy_kernel_context(&i915->kernel_context);
-
-	/* Must free all deferred contexts (via flush_workqueue) first */
-	GEM_BUG_ON(!list_empty(&i915->contexts.hw_id_list));
-	ida_destroy(&i915->contexts.hw_ida);
 }
 
 static int context_idr_cleanup(int id, void *p, void *data)
@@ -2398,33 +2268,6 @@ int i915_gem_context_reset_stats_ioctl(struct drm_device *dev,
 	return ret;
 }
 
-int __i915_gem_context_pin_hw_id(struct i915_gem_context *ctx)
-{
-	struct drm_i915_private *i915 = ctx->i915;
-	int err = 0;
-
-	mutex_lock(&i915->contexts.mutex);
-
-	GEM_BUG_ON(i915_gem_context_is_closed(ctx));
-
-	if (list_empty(&ctx->hw_id_link)) {
-		GEM_BUG_ON(atomic_read(&ctx->hw_id_pin_count));
-
-		err = assign_hw_id(i915, &ctx->hw_id);
-		if (err)
-			goto out_unlock;
-
-		list_add_tail(&ctx->hw_id_link, &i915->contexts.hw_id_list);
-	}
-
-	GEM_BUG_ON(atomic_read(&ctx->hw_id_pin_count) == ~0u);
-	atomic_inc(&ctx->hw_id_pin_count);
-
-out_unlock:
-	mutex_unlock(&i915->contexts.mutex);
-	return err;
-}
-
 /* GEM context-engines iterator: for_each_gem_engine() */
 struct intel_context *
 i915_gem_engines_iter_next(struct i915_gem_engines_iter *it)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h
index 9691dd062f72..6f516ff951b4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
@@ -112,21 +112,6 @@ i915_gem_context_clear_user_engines(struct i915_gem_context *ctx)
 	clear_bit(CONTEXT_USER_ENGINES, &ctx->flags);
 }
 
-int __i915_gem_context_pin_hw_id(struct i915_gem_context *ctx);
-static inline int i915_gem_context_pin_hw_id(struct i915_gem_context *ctx)
-{
-	if (atomic_inc_not_zero(&ctx->hw_id_pin_count))
-		return 0;
-
-	return __i915_gem_context_pin_hw_id(ctx);
-}
-
-static inline void i915_gem_context_unpin_hw_id(struct i915_gem_context *ctx)
-{
-	GEM_BUG_ON(atomic_read(&ctx->hw_id_pin_count) == 0u);
-	atomic_dec(&ctx->hw_id_pin_count);
-}
-
 static inline bool i915_gem_context_is_kernel(struct i915_gem_context *ctx)
 {
 	return !ctx->file_priv;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index cc513410eeef..372e20b3f4bd 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -147,24 +147,6 @@ struct i915_gem_context {
 #define CONTEXT_FORCE_SINGLE_SUBMISSION	2
 #define CONTEXT_USER_ENGINES		3
 
-	/**
-	 * @hw_id: - unique identifier for the context
-	 *
-	 * The hardware needs to uniquely identify the context for a few
-	 * functions like fault reporting, PASID, scheduling. The
-	 * &drm_i915_private.context_hw_ida is used to assign a unqiue
-	 * id for the lifetime of the context.
-	 *
-	 * @hw_id_pin_count: - number of times this context had been pinned
-	 * for use (should be, at most, once per engine).
-	 *
-	 * @hw_id_link: - all contexts with an assigned id are tracked
-	 * for possible repossession.
-	 */
-	unsigned int hw_id;
-	atomic_t hw_id_pin_count;
-	struct list_head hw_id_link;
-
 	struct mutex mutex;
 
 	struct i915_sched_attr sched;
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index 6247d2880464..dbce23d9cf91 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -534,9 +534,9 @@ static int igt_ctx_exec(void *arg)
 			with_intel_runtime_pm(i915, wakeref)
 				err = gpu_fill(obj, ctx, engine, dw);
 			if (err) {
-				pr_err("Failed to fill dword %lu [%lu/%lu] with gpu (%s) in ctx %u [full-ppgtt? %s], err=%d\n",
+				pr_err("Failed to fill dword %lu [%lu/%lu] with gpu (%s) [full-ppgtt? %s], err=%d\n",
 				       ndwords, dw, max_dwords(obj),
-				       engine->name, ctx->hw_id,
+				       engine->name,
 				       yesno(!!ctx->vm), err);
 				goto out_unlock;
 			}
@@ -654,9 +654,9 @@ static int igt_shared_ctx_exec(void *arg)
 			with_intel_runtime_pm(i915, wakeref)
 				err = gpu_fill(obj, ctx, engine, dw);
 			if (err) {
-				pr_err("Failed to fill dword %lu [%lu/%lu] with gpu (%s) in ctx %u [full-ppgtt? %s], err=%d\n",
+				pr_err("Failed to fill dword %lu [%lu/%lu] with gpu (%s) [full-ppgtt? %s], err=%d\n",
 				       ndwords, dw, max_dwords(obj),
-				       engine->name, ctx->hw_id,
+				       engine->name,
 				       yesno(!!ctx->vm), err);
 				kernel_context_close(ctx);
 				goto out_test;
@@ -1236,10 +1236,9 @@ static int igt_ctx_readonly(void *arg)
 			with_intel_runtime_pm(i915, wakeref)
 				err = gpu_fill(obj, ctx, engine, dw);
 			if (err) {
-				pr_err("Failed to fill dword %lu [%lu/%lu] with gpu (%s) in ctx %u [full-ppgtt? %s], err=%d\n",
+				pr_err("Failed to fill dword %lu [%lu/%lu] with gpu (%s) [full-ppgtt? %s], err=%d\n",
 				       ndwords, dw, max_dwords(obj),
-				       engine->name, ctx->hw_id,
-				       yesno(!!ctx->vm), err);
+				       engine->name, yesno(!!ctx->vm), err);
 				goto out_unlock;
 			}
 
diff --git a/drivers/gpu/drm/i915/gem/selftests/mock_context.c b/drivers/gpu/drm/i915/gem/selftests/mock_context.c
index be8974ccff24..0104f16b1327 100644
--- a/drivers/gpu/drm/i915/gem/selftests/mock_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/mock_context.c
@@ -13,7 +13,6 @@ mock_context(struct drm_i915_private *i915,
 {
 	struct i915_gem_context *ctx;
 	struct i915_gem_engines *e;
-	int ret;
 
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
 	if (!ctx)
@@ -30,13 +29,8 @@ mock_context(struct drm_i915_private *i915,
 	RCU_INIT_POINTER(ctx->engines, e);
 
 	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
-	INIT_LIST_HEAD(&ctx->hw_id_link);
 	mutex_init(&ctx->mutex);
 
-	ret = i915_gem_context_pin_hw_id(ctx);
-	if (ret < 0)
-		goto err_engines;
-
 	if (name) {
 		struct i915_ppgtt *ppgtt;
 
@@ -54,8 +48,6 @@ mock_context(struct drm_i915_private *i915,
 
 	return ctx;
 
-err_engines:
-	free_engines(rcu_access_pointer(ctx->engines));
 err_free:
 	kfree(ctx);
 	return NULL;
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 4c0e211c715d..a70ec82c4575 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -52,6 +52,7 @@ struct intel_context {
 
 	u32 *lrc_reg_state;
 	u64 lrc_desc;
+	u32 hw_tag;
 
 	unsigned int active_count; /* notionally protected by timeline->mutex */
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 1cbe10a0fec7..96994a55804a 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -239,10 +239,13 @@ struct intel_engine_cs {
 
 	u8 class;
 	u8 instance;
+
+	u32 uabi_capabilities;
 	u32 context_size;
 	u32 mmio_base;
 
-	u32 uabi_capabilities;
+	unsigned int hw_tag;
+#define NUM_HW_TAG (256)
 
 	struct intel_sseu sseu;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 5cd263dd3e23..721be5cec6a6 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -388,9 +388,6 @@ lrc_descriptor(struct intel_context *ce, struct intel_engine_cs *engine)
 	struct i915_gem_context *ctx = ce->gem_context;
 	u64 desc;
 
-	BUILD_BUG_ON(MAX_CONTEXT_HW_ID > (BIT(GEN8_CTX_ID_WIDTH)));
-	BUILD_BUG_ON(GEN11_MAX_CONTEXT_HW_ID > (BIT(GEN11_SW_CTX_ID_WIDTH)));
-
 	desc = ctx->desc_template;				/* bits  0-11 */
 	GEM_BUG_ON(desc & GENMASK_ULL(63, 12));
 
@@ -404,20 +401,11 @@ lrc_descriptor(struct intel_context *ce, struct intel_engine_cs *engine)
 	 * anything below.
 	 */
 	if (INTEL_GEN(engine->i915) >= 11) {
-		GEM_BUG_ON(ctx->hw_id >= BIT(GEN11_SW_CTX_ID_WIDTH));
-		desc |= (u64)ctx->hw_id << GEN11_SW_CTX_ID_SHIFT;
-								/* bits 37-47 */
-
 		desc |= (u64)engine->instance << GEN11_ENGINE_INSTANCE_SHIFT;
 								/* bits 48-53 */
 
-		/* TODO: decide what to do with SW counter (bits 55-60) */
-
 		desc |= (u64)engine->class << GEN11_ENGINE_CLASS_SHIFT;
 								/* bits 61-63 */
-	} else {
-		GEM_BUG_ON(ctx->hw_id >= BIT(GEN8_CTX_ID_WIDTH));
-		desc |= (u64)ctx->hw_id << GEN8_CTX_ID_SHIFT;	/* bits 32-52 */
 	}
 
 	return desc;
@@ -513,6 +501,15 @@ execlists_schedule_in(struct i915_request *rq, int idx)
 		intel_context_get(ce);
 		ce->inflight = rq->engine;
 
+		if (ce->hw_tag) {
+			ce->lrc_desc |= (u64)ce->hw_tag << 32;
+		} else {
+			ce->lrc_desc &= ~GENMASK_ULL(47, 37);
+			ce->lrc_desc |=
+				(u64)(rq->engine->hw_tag++ % NUM_HW_TAG) <<
+				GEN11_SW_CTX_ID_SHIFT;
+		}
+
 		execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN);
 		intel_engine_context_in(ce->inflight);
 	}
@@ -1526,7 +1523,6 @@ static void execlists_context_destroy(struct kref *kref)
 
 static void execlists_context_unpin(struct intel_context *ce)
 {
-	i915_gem_context_unpin_hw_id(ce->gem_context);
 	i915_gem_object_unpin_map(ce->state->obj);
 	intel_ring_unpin(ce->ring);
 }
@@ -1582,18 +1578,12 @@ __execlists_context_pin(struct intel_context *ce,
 	if (ret)
 		goto unpin_map;
 
-	ret = i915_gem_context_pin_hw_id(ce->gem_context);
-	if (ret)
-		goto unpin_ring;
-
 	ce->lrc_desc = lrc_descriptor(ce, engine);
 	ce->lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
 	__execlists_update_reg_state(ce, engine);
 
 	return 0;
 
-unpin_ring:
-	intel_ring_unpin(ce->ring);
 unpin_map:
 	i915_gem_object_unpin_map(ce->state->obj);
 unpin_active:
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 144301b778df..941d0f08b047 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1566,27 +1566,10 @@ vgpu_id_show(struct device *dev, struct device_attribute *attr,
 	return sprintf(buf, "\n");
 }
 
-static ssize_t
-hw_id_show(struct device *dev, struct device_attribute *attr,
-	   char *buf)
-{
-	struct mdev_device *mdev = mdev_from_dev(dev);
-
-	if (mdev) {
-		struct intel_vgpu *vgpu = (struct intel_vgpu *)
-			mdev_get_drvdata(mdev);
-		return sprintf(buf, "%u\n",
-			       vgpu->submission.shadow[0]->gem_context->hw_id);
-	}
-	return sprintf(buf, "\n");
-}
-
 static DEVICE_ATTR_RO(vgpu_id);
-static DEVICE_ATTR_RO(hw_id);
 
 static struct attribute *intel_vgpu_attrs[] = {
 	&dev_attr_vgpu_id.attr,
-	&dev_attr_hw_id.attr,
 	NULL
 };
 
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index cbeaa53117af..657e8fb18e38 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1664,9 +1664,6 @@ static int i915_context_status(struct seq_file *m, void *unused)
 		struct intel_context *ce;
 
 		seq_puts(m, "HW context ");
-		if (!list_empty(&ctx->hw_id_link))
-			seq_printf(m, "%x [pin %u]", ctx->hw_id,
-				   atomic_read(&ctx->hw_id_pin_count));
 		if (ctx->pid) {
 			struct task_struct *task;
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ef9e4cb49b4a..a0ad9522425e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1616,16 +1616,6 @@ struct drm_i915_private {
 		struct list_head list;
 		struct llist_head free_list;
 		struct work_struct free_work;
-
-		/* The hw wants to have a stable context identifier for the
-		 * lifetime of the context (for OA, PASID, faults, etc).
-		 * This is limited in execlists to 21 bits.
-		 */
-		struct ida hw_ida;
-#define MAX_CONTEXT_HW_ID (1<<21) /* exclusive */
-#define MAX_GUC_CONTEXT_HW_ID (1 << 20) /* exclusive */
-#define GEN11_MAX_CONTEXT_HW_ID (1<<11) /* exclusive */
-		struct list_head hw_id_list;
 	} contexts;
 
 	u32 fdi_rx_config;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 01e1af2a651f..1190a1513612 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -460,9 +460,9 @@ static void error_print_context(struct drm_i915_error_state_buf *m,
 				const char *header,
 				const struct drm_i915_error_context *ctx)
 {
-	err_printf(m, "%s%s[%d] hw_id %d, prio %d, guilty %d active %d\n",
-		   header, ctx->comm, ctx->pid, ctx->hw_id,
-		   ctx->sched_attr.priority, ctx->guilty, ctx->active);
+	err_printf(m, "%s%s[%d] prio %d, guilty %d active %d\n",
+		   header, ctx->comm, ctx->pid, ctx->sched_attr.priority,
+		   ctx->guilty, ctx->active);
 }
 
 static void error_print_engine(struct drm_i915_error_state_buf *m,
@@ -1342,7 +1342,6 @@ static void record_context(struct drm_i915_error_context *e,
 		rcu_read_unlock();
 	}
 
-	e->hw_id = ctx->hw_id;
 	e->sched_attr = ctx->sched;
 	e->guilty = atomic_read(&ctx->guilty_count);
 	e->active = atomic_read(&ctx->active_count);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index 2ecd0c6a1c94..aa2c056df4ed 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -117,7 +117,6 @@ struct i915_gpu_state {
 		struct drm_i915_error_context {
 			char comm[TASK_COMM_LEN];
 			pid_t pid;
-			u32 hw_id;
 			int active;
 			int guilty;
 			struct i915_sched_attr sched_attr;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 891b60f49843..7ba83488d2e7 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1293,19 +1293,14 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
 			i915->perf.oa.specific_ctx_id_mask =
 				(1U << GEN8_CTX_ID_WIDTH) - 1;
 			i915->perf.oa.specific_ctx_id =
-				upper_32_bits(ce->lrc_desc);
-			i915->perf.oa.specific_ctx_id &=
 				i915->perf.oa.specific_ctx_id_mask;
 		}
 		break;
 
 	case 11: {
 		i915->perf.oa.specific_ctx_id_mask =
-			((1U << GEN11_SW_CTX_ID_WIDTH) - 1) << (GEN11_SW_CTX_ID_SHIFT - 32) |
-			((1U << GEN11_ENGINE_INSTANCE_WIDTH) - 1) << (GEN11_ENGINE_INSTANCE_SHIFT - 32) |
-			((1 << GEN11_ENGINE_CLASS_WIDTH) - 1) << (GEN11_ENGINE_CLASS_SHIFT - 32);
-		i915->perf.oa.specific_ctx_id = upper_32_bits(ce->lrc_desc);
-		i915->perf.oa.specific_ctx_id &=
+			((1U << GEN11_SW_CTX_ID_WIDTH) - 1) << (GEN11_SW_CTX_ID_SHIFT - 32);
+		i915->perf.oa.specific_ctx_id =
 			i915->perf.oa.specific_ctx_id_mask;
 		break;
 	}
@@ -1314,6 +1309,8 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
 		MISSING_CASE(INTEL_GEN(i915));
 	}
 
+	ce->hw_tag = i915->perf.oa.specific_ctx_id_mask;
+
 	DRM_DEBUG_DRIVER("filtering on ctx_id=0x%x ctx_id_mask=0x%x\n",
 			 i915->perf.oa.specific_ctx_id,
 			 i915->perf.oa.specific_ctx_id_mask);
@@ -1330,18 +1327,19 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
  */
 static void oa_put_render_ctx_id(struct i915_perf_stream *stream)
 {
-	struct drm_i915_private *dev_priv = stream->dev_priv;
+	struct drm_i915_private *i915 = stream->dev_priv;
 	struct intel_context *ce;
 
-	dev_priv->perf.oa.specific_ctx_id = INVALID_CTX_ID;
-	dev_priv->perf.oa.specific_ctx_id_mask = 0;
-
-	ce = fetch_and_zero(&dev_priv->perf.oa.pinned_ctx);
-	if (ce) {
-		mutex_lock(&dev_priv->drm.struct_mutex);
+	ce = fetch_and_zero(&i915->perf.oa.pinned_ctx);
+	if (ce) {
+		mutex_lock(&i915->drm.struct_mutex);
+		ce->hw_tag = 0;
 		intel_context_unpin(ce);
-		mutex_unlock(&dev_priv->drm.struct_mutex);
+		mutex_unlock(&i915->drm.struct_mutex);
 	}
+
+	i915->perf.oa.specific_ctx_id = INVALID_CTX_ID;
+	i915->perf.oa.specific_ctx_id_mask = 0;
 }
 
 static void
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 5c8cfaa70d72..e6f855323ffc 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -675,7 +675,6 @@ TRACE_EVENT(i915_request_queue,
 
 	    TP_STRUCT__entry(
 			     __field(u32, dev)
-			     __field(u32, hw_id)
 			     __field(u64, ctx)
 			     __field(u16, class)
 			     __field(u16, instance)
@@ -685,7 +684,6 @@ TRACE_EVENT(i915_request_queue,
 
 	    TP_fast_assign(
 			   __entry->dev = rq->i915->drm.primary->index;
-			   __entry->hw_id = rq->gem_context->hw_id;
 			   __entry->class = rq->engine->uabi_class;
 			   __entry->instance = rq->engine->instance;
 			   __entry->ctx = rq->fence.context;
@@ -693,10 +691,9 @@ TRACE_EVENT(i915_request_queue,
 			   __entry->flags = flags;
 			   ),
 
-	    TP_printk("dev=%u, engine=%u:%u, hw_id=%u, ctx=%llu, seqno=%u, flags=0x%x",
+	    TP_printk("dev=%u, engine=%u:%u, ctx=%llu, seqno=%u, flags=0x%x",
 		      __entry->dev, __entry->class, __entry->instance,
-		      __entry->hw_id, __entry->ctx, __entry->seqno,
-		      __entry->flags)
+		      __entry->ctx, __entry->seqno, __entry->flags)
 );
 
 DECLARE_EVENT_CLASS(i915_request,
@@ -705,7 +702,6 @@ DECLARE_EVENT_CLASS(i915_request,
 
 	    TP_STRUCT__entry(
 			     __field(u32, dev)
-			     __field(u32, hw_id)
 			     __field(u64, ctx)
 			     __field(u16, class)
 			     __field(u16, instance)
@@ -714,16 +710,15 @@ DECLARE_EVENT_CLASS(i915_request,
 
 	    TP_fast_assign(
 			   __entry->dev = rq->i915->drm.primary->index;
-			   __entry->hw_id = rq->gem_context->hw_id;
 			   __entry->class = rq->engine->uabi_class;
 			   __entry->instance = rq->engine->instance;
 			   __entry->ctx = rq->fence.context;
 			   __entry->seqno = rq->fence.seqno;
 			   ),
 
-	    TP_printk("dev=%u, engine=%u:%u, hw_id=%u, ctx=%llu, seqno=%u",
+	    TP_printk("dev=%u, engine=%u:%u, ctx=%llu, seqno=%u",
 		      __entry->dev, __entry->class, __entry->instance,
-		      __entry->hw_id, __entry->ctx, __entry->seqno)
+		      __entry->ctx, __entry->seqno)
 );
 
 DEFINE_EVENT(i915_request, i915_request_add,
@@ -748,7 +743,6 @@ TRACE_EVENT(i915_request_in,
 
 	    TP_STRUCT__entry(
 			     __field(u32, dev)
-			     __field(u32, hw_id)
 			     __field(u64, ctx)
 			     __field(u16, class)
 			     __field(u16, instance)
@@ -759,7 +753,6 @@ TRACE_EVENT(i915_request_in,
 
 	    TP_fast_assign(
 			   __entry->dev = rq->i915->drm.primary->index;
-			   __entry->hw_id = rq->gem_context->hw_id;
 			   __entry->class = rq->engine->uabi_class;
 			   __entry->instance = rq->engine->instance;
 			   __entry->ctx = rq->fence.context;
@@ -768,9 +761,9 @@ TRACE_EVENT(i915_request_in,
 			   __entry->port = port;
 			   ),
 
-	    TP_printk("dev=%u, engine=%u:%u, hw_id=%u, ctx=%llu, seqno=%u, prio=%u, port=%u",
+	    TP_printk("dev=%u, engine=%u:%u, ctx=%llu, seqno=%u, prio=%u, port=%u",
 		      __entry->dev, __entry->class, __entry->instance,
-		      __entry->hw_id, __entry->ctx, __entry->seqno,
+		      __entry->ctx, __entry->seqno,
 		      __entry->prio, __entry->port)
 );
 
@@ -780,7 +773,6 @@ TRACE_EVENT(i915_request_out,
 
 	    TP_STRUCT__entry(
 			     __field(u32, dev)
-			     __field(u32, hw_id)
 			     __field(u64, ctx)
 			     __field(u16, class)
 			     __field(u16, instance)
@@ -790,7 +782,6 @@ TRACE_EVENT(i915_request_out,
 
 	    TP_fast_assign(
 			   __entry->dev = rq->i915->drm.primary->index;
-			   __entry->hw_id = rq->gem_context->hw_id;
 			   __entry->class = rq->engine->uabi_class;
 			   __entry->instance = rq->engine->instance;
 			   __entry->ctx = rq->fence.context;
@@ -798,10 +789,9 @@ TRACE_EVENT(i915_request_out,
 			   __entry->completed = i915_request_completed(rq);
 			   ),
 
-		    TP_printk("dev=%u, engine=%u:%u, hw_id=%u, ctx=%llu, seqno=%u, completed?=%u",
+		    TP_printk("dev=%u, engine=%u:%u, ctx=%llu, seqno=%u, completed?=%u",
 			      __entry->dev, __entry->class, __entry->instance,
-			      __entry->hw_id, __entry->ctx, __entry->seqno,
-			      __entry->completed)
+			      __entry->ctx, __entry->seqno, __entry->completed)
 );
 
 #else
@@ -839,7 +829,6 @@ TRACE_EVENT(i915_request_wait_begin,
 
 	    TP_STRUCT__entry(
 			     __field(u32, dev)
-			     __field(u32, hw_id)
 			     __field(u64, ctx)
 			     __field(u16, class)
 			     __field(u16, instance)
@@ -855,7 +844,6 @@ TRACE_EVENT(i915_request_wait_begin,
 	     */
 	    TP_fast_assign(
 			   __entry->dev = rq->i915->drm.primary->index;
-			   __entry->hw_id = rq->gem_context->hw_id;
 			   __entry->class = rq->engine->uabi_class;
 			   __entry->instance = rq->engine->instance;
 			   __entry->ctx = rq->fence.context;
@@ -863,9 +851,9 @@ TRACE_EVENT(i915_request_wait_begin,
 			   __entry->flags = flags;
 			   ),
 
-	    TP_printk("dev=%u, engine=%u:%u, hw_id=%u, ctx=%llu, seqno=%u, blocking=%u, flags=0x%x",
+	    TP_printk("dev=%u, engine=%u:%u, ctx=%llu, seqno=%u, blocking=%u, flags=0x%x",
 		      __entry->dev, __entry->class, __entry->instance,
-		      __entry->hw_id, __entry->ctx, __entry->seqno,
+		      __entry->ctx, __entry->seqno,
 		      !!(__entry->flags & I915_WAIT_LOCKED),
 		      __entry->flags)
 );
@@ -969,19 +957,17 @@ DECLARE_EVENT_CLASS(i915_context,
 	TP_STRUCT__entry(
 			__field(u32, dev)
 			__field(struct i915_gem_context *, ctx)
-			__field(u32, hw_id)
 			__field(struct i915_address_space *, vm)
 	),
 
 	TP_fast_assign(
 			__entry->dev = ctx->i915->drm.primary->index;
 			__entry->ctx = ctx;
-			__entry->hw_id = ctx->hw_id;
 			__entry->vm = ctx->vm;
 	),
 
-	TP_printk("dev=%u, ctx=%p, ctx_vm=%p, hw_id=%u",
-		  __entry->dev, __entry->ctx, __entry->vm, __entry->hw_id)
+	TP_printk("dev=%u, ctx=%p, ctx_vm=%p",
+		  __entry->dev, __entry->ctx, __entry->vm)
 )
 
 DEFINE_EVENT(i915_context, i915_context_create,
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
index 71c1363ad536..4247163cb80f 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
@@ -473,8 +473,8 @@ static int igt_evict_contexts(void *arg)
 			if (IS_ERR(rq)) {
 				/* When full, fail_if_busy will trigger EBUSY */
 				if (PTR_ERR(rq) != -EBUSY) {
-					pr_err("Unexpected error from request alloc (ctx hw id %u, on %s): %d\n",
-					       ctx->hw_id, engine->name,
+					pr_err("Unexpected error from request alloc (on %s): %d\n",
+					       engine->name,
 					       (int)PTR_ERR(rq));
 					err = PTR_ERR(rq);
 				}
diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c
index 332ee5cb9120..d223f3a6f608 100644
--- a/drivers/gpu/drm/i915/selftests/i915_vma.c
+++ b/drivers/gpu/drm/i915/selftests/i915_vma.c
@@ -170,7 +170,7 @@ static int igt_vma_create(void *arg)
 		}
 
 		nc = 0;
-		for_each_prime_number(num_ctx, MAX_CONTEXT_HW_ID) {
+		for_each_prime_number(num_ctx, NUM_HW_TAG) {
 			for (; nc < num_ctx; nc++) {
 				ctx = mock_context(i915, "mock");
 				if (!ctx)
-- 
2.20.1


* [PATCH 39/39] active-rings
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (37 preceding siblings ...)
  2019-06-14  7:10 ` [PATCH 38/39] drm/i915: Remove logical HW ID Chris Wilson
@ 2019-06-14  7:10 ` Chris Wilson
  2019-06-14  7:22 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/39] drm/i915: Discard some redundant cache domain flushes Patchwork
  2019-06-14  7:35 ` ✗ Fi.CI.SPARSE: " Patchwork
  40 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14  7:10 UTC (permalink / raw)
  To: intel-gfx

---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 38 ++++----
 drivers/gpu/drm/i915/gem/i915_gem_pm.c        |  6 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  1 -
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  2 -
 drivers/gpu/drm/i915/gt/intel_ringbuffer.c    | 12 ++-
 drivers/gpu/drm/i915/gt/mock_engine.c         |  1 -
 drivers/gpu/drm/i915/i915_debugfs.c           |  6 +-
 drivers/gpu/drm/i915/i915_drv.h               |  2 -
 drivers/gpu/drm/i915/i915_gem.c               |  4 +-
 drivers/gpu/drm/i915/i915_request.c           | 89 ++++++++-----------
 drivers/gpu/drm/i915/i915_request.h           |  3 -
 .../gpu/drm/i915/selftests/igt_flush_test.c   |  5 +-
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 -
 13 files changed, 74 insertions(+), 98 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 68faf1a71c97..bd01b7d95d0e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -751,8 +751,11 @@ static int eb_select_context(struct i915_execbuffer *eb)
 
 static struct i915_request *__eb_wait_for_ring(struct intel_ring *ring)
 {
+	struct list_head *requests = &ring->timeline->requests;
 	struct i915_request *rq;
 
+	lockdep_assert_held(&ring->timeline->mutex);
+
 	/*
 	 * Completely unscientific finger-in-the-air estimates for suitable
 	 * maximum user request size (to avoid blocking) and then backoff.
@@ -767,12 +770,15 @@ static struct i915_request *__eb_wait_for_ring(struct intel_ring *ring)
 	 * claiming our resources, but not so long that the ring completely
 	 * drains before we can submit our next request.
 	 */
-	list_for_each_entry(rq, &ring->request_list, ring_link) {
+	list_for_each_entry(rq, requests, link) {
+		if (rq->ring != ring)
+			continue;
+
 		if (__intel_ring_space(rq->postfix,
 				       ring->emit, ring->size) > ring->size / 2)
 			break;
 	}
-	if (&rq->ring_link == &ring->request_list)
+	if (&rq->link == requests)
 		return NULL; /* weird, we will check again later for real */
 
 	return i915_request_get(rq);
@@ -780,8 +786,12 @@ static struct i915_request *__eb_wait_for_ring(struct intel_ring *ring)
 
 static int eb_wait_for_ring(const struct i915_execbuffer *eb)
 {
+	struct intel_ring *ring = eb->context->ring;
 	struct i915_request *rq;
-	int ret = 0;
+	int ret;
+
+	if (READ_ONCE(ring->space) >= PAGE_SIZE)
+		return 0;
 
 	/*
 	 * Apply a light amount of backpressure to prevent excessive hogs
@@ -789,18 +799,20 @@ static int eb_wait_for_ring(const struct i915_execbuffer *eb)
 	 * keeping all of their resources pinned.
 	 */
 
-	rq = __eb_wait_for_ring(eb->context->ring);
-	if (rq) {
-		mutex_unlock(&eb->i915->drm.struct_mutex);
+	ret = mutex_lock_interruptible(&ring->timeline->mutex);
+	if (ret)
+		return ret;
 
+	rq = __eb_wait_for_ring(ring);
+	mutex_unlock(&ring->timeline->mutex);
+
+	if (rq) {
 		if (i915_request_wait(rq,
 				      I915_WAIT_INTERRUPTIBLE,
 				      MAX_SCHEDULE_TIMEOUT) < 0)
 			ret = -EINTR;
 
 		i915_request_put(rq);
-
-		mutex_lock(&eb->i915->drm.struct_mutex);
 	}
 
 	return ret;
@@ -2447,15 +2459,11 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	 */
 	intel_gt_pm_get(eb.i915);
 
-	err = i915_mutex_lock_interruptible(dev);
-	if (err)
-		goto err_rpm;
-
 	err = eb_select_engine(&eb, file, args);
 	if (unlikely(err))
-		goto err_unlock;
+		goto err_rpm;
 
-	err = eb_wait_for_ring(&eb); /* may temporarily drop struct_mutex */
+	err = eb_wait_for_ring(&eb);
 	if (unlikely(err))
 		goto err_engine;
 
@@ -2612,8 +2620,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		eb_release_vmas(&eb);
 err_engine:
 	eb_unpin_context(&eb);
-err_unlock:
-	mutex_unlock(&dev->struct_mutex);
 err_rpm:
 	intel_gt_pm_put(eb.i915);
 	i915_gem_context_put(eb.gem_context);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
index 35b73c347887..e7134cb38425 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
@@ -68,11 +68,7 @@ static void retire_work_handler(struct work_struct *work)
 	struct drm_i915_private *i915 =
 		container_of(work, typeof(*i915), gem.retire_work.work);
 
-	/* Come back later if the device is busy... */
-	if (mutex_trylock(&i915->drm.struct_mutex)) {
-		i915_retire_requests(i915);
-		mutex_unlock(&i915->drm.struct_mutex);
-	}
+	i915_retire_requests(i915);
 
 	queue_delayed_work(i915->wq,
 			   &i915->gem.retire_work,
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 85eb5c99d657..85bbe2055ea0 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -745,7 +745,6 @@ static int measure_breadcrumb_dw(struct intel_engine_cs *engine)
 			       engine->status_page.vma))
 		goto out_frame;
 
-	INIT_LIST_HEAD(&frame->ring.request_list);
 	frame->ring.timeline = &frame->timeline;
 	frame->ring.vaddr = frame->cs;
 	frame->ring.size = sizeof(frame->cs);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 96994a55804a..367e687b5601 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -68,8 +68,6 @@ struct intel_ring {
 	void *vaddr;
 
 	struct i915_timeline *timeline;
-	struct list_head request_list;
-	struct list_head active_link;
 
 	u32 head;
 	u32 tail;
diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
index 95d9adc7e2e9..546f27b81b4f 100644
--- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
@@ -1267,7 +1267,6 @@ intel_engine_create_ring(struct intel_engine_cs *engine,
 		return ERR_PTR(-ENOMEM);
 
 	kref_init(&ring->ref);
-	INIT_LIST_HEAD(&ring->request_list);
 	ring->timeline = i915_timeline_get(timeline);
 
 	ring->size = size;
@@ -1787,21 +1786,26 @@ static int ring_request_alloc(struct i915_request *request)
 
 static noinline int wait_for_space(struct intel_ring *ring, unsigned int bytes)
 {
+	struct list_head *requests = &ring->timeline->requests;
 	struct i915_request *target;
 	long timeout;
 
+	lockdep_assert_held(&ring->timeline->mutex);
 	if (intel_ring_update_space(ring) >= bytes)
 		return 0;
 
-	GEM_BUG_ON(list_empty(&ring->request_list));
-	list_for_each_entry(target, &ring->request_list, ring_link) {
+	GEM_BUG_ON(list_empty(requests));
+	list_for_each_entry(target, requests, link) {
+		if (target->ring != ring)
+			continue;
+
 		/* Would completion of this request free enough space? */
 		if (bytes <= __intel_ring_space(target->postfix,
 						ring->emit, ring->size))
 			break;
 	}
 
-	if (WARN_ON(&target->ring_link == &ring->request_list))
+	if (GEM_WARN_ON(&target->link == requests))
 		return -ENOSPC;
 
 	timeout = i915_request_wait(target,
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index b9c2764beca3..fb19a5290abc 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -67,7 +67,6 @@ static struct intel_ring *mock_ring(struct intel_engine_cs *engine)
 	ring->base.vaddr = (void *)(ring + 1);
 	ring->base.timeline = &ring->timeline;
 
-	INIT_LIST_HEAD(&ring->base.request_list);
 	intel_ring_update_space(&ring->base);
 
 	return &ring->base;
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 657e8fb18e38..a8a114c6a839 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3695,12 +3695,12 @@ i915_drop_caches_set(void *data, u64 val)
 						     I915_WAIT_LOCKED,
 						     MAX_SCHEDULE_TIMEOUT);
 
-		if (val & DROP_RETIRE)
-			i915_retire_requests(i915);
-
 		mutex_unlock(&i915->drm.struct_mutex);
 	}
 
+	if (val & DROP_RETIRE)
+		i915_retire_requests(i915);
+
 	if (val & DROP_RESET_ACTIVE && i915_terminally_wedged(i915))
 		i915_handle_error(i915, ALL_ENGINES, 0, NULL);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a0ad9522425e..621f13191e01 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1863,8 +1863,6 @@ struct drm_i915_private {
 			struct list_head hwsp_free_list;
 		} timelines;
 
-		struct list_head active_rings;
-
 		struct intel_wakeref wakeref;
 
 		struct list_head closed_vma;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1571c707ad15..91a21672461f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1017,9 +1017,10 @@ int i915_gem_wait_for_idle(struct drm_i915_private *i915,
 		if (err)
 			return err;
 
-		i915_retire_requests(i915);
 	}
 
+	i915_retire_requests(i915);
+
 	return 0;
 }
 
@@ -1772,7 +1773,6 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
 
 	intel_gt_pm_init(dev_priv);
 
-	INIT_LIST_HEAD(&dev_priv->gt.active_rings);
 	INIT_LIST_HEAD(&dev_priv->gt.closed_vma);
 	spin_lock_init(&dev_priv->gt.closed_lock);
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 2c2624d7e18e..7e1c699d21c9 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -182,9 +182,6 @@ i915_request_remove_from_client(struct i915_request *request)
 
 static void advance_ring(struct i915_request *request)
 {
-	struct intel_ring *ring = request->ring;
-	unsigned int tail;
-
 	/*
 	 * We know the GPU must have read the request to have
 	 * sent us the seqno + interrupt, so use the position
@@ -194,24 +191,7 @@ static void advance_ring(struct i915_request *request)
 	 * Note this requires that we are always called in request
 	 * completion order.
 	 */
-	GEM_BUG_ON(!list_is_first(&request->ring_link, &ring->request_list));
-	if (list_is_last(&request->ring_link, &ring->request_list)) {
-		/*
-		 * We may race here with execlists resubmitting this request
-		 * as we retire it. The resubmission will move the ring->tail
-		 * forwards (to request->wa_tail). We either read the
-		 * current value that was written to hw, or the value that
-		 * is just about to be. Either works, if we miss the last two
-		 * noops - they are safe to be replayed on a reset.
-		 */
-		tail = READ_ONCE(request->tail);
-		list_del(&ring->active_link);
-	} else {
-		tail = request->postfix;
-	}
-	list_del_init(&request->ring_link);
-
-	ring->head = tail;
+	request->ring->head = request->postfix;
 }
 
 static void free_capture_list(struct i915_request *request)
@@ -290,7 +270,7 @@ static bool i915_request_retire(struct i915_request *rq)
 
 void i915_request_retire_upto(struct i915_request *rq)
 {
-	struct intel_ring *ring = rq->ring;
+	struct list_head *requests = &rq->timeline->requests;
 	struct i915_request *tmp;
 
 	GEM_TRACE("%s fence %llx:%lld, current %d\n",
@@ -300,11 +280,9 @@ void i915_request_retire_upto(struct i915_request *rq)
 
 	lockdep_assert_held(&rq->timeline->mutex);
 	GEM_BUG_ON(!i915_request_completed(rq));
-	GEM_BUG_ON(list_empty(&rq->ring_link));
 
 	do {
-		tmp = list_first_entry(&ring->request_list,
-				       typeof(*tmp), ring_link);
+		tmp = list_first_entry(requests, typeof(*tmp), link);
 	} while (i915_request_retire(tmp) && tmp != rq);
 }
 
@@ -535,12 +513,12 @@ semaphore_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 	return NOTIFY_DONE;
 }
 
-static void ring_retire_requests(struct intel_ring *ring)
+static void retire_requests(struct i915_timeline *tl)
 {
 	struct i915_request *rq, *rn;
 
-	lockdep_assert_held(&ring->timeline->mutex);
-	list_for_each_entry_safe(rq, rn, &ring->request_list, ring_link)
+	lockdep_assert_held(&tl->mutex);
+	list_for_each_entry_safe(rq, rn, &tl->requests, link)
 		if (!i915_request_retire(rq))
 			break;
 }
@@ -548,17 +526,17 @@ static void ring_retire_requests(struct intel_ring *ring)
 static noinline struct i915_request *
 request_alloc_slow(struct intel_context *ce, gfp_t gfp)
 {
-	struct intel_ring *ring = ce->ring;
+	struct i915_timeline *tl = ce->ring->timeline;
 	struct i915_request *rq;
 
-	if (list_empty(&ring->request_list))
+	if (list_empty(&tl->requests))
 		goto out;
 
 	if (!gfpflags_allow_blocking(gfp))
 		goto out;
 
 	/* Move our oldest request to the slab-cache (if not in use!) */
-	rq = list_first_entry(&ring->request_list, typeof(*rq), ring_link);
+	rq = list_first_entry(&tl->requests, typeof(*rq), link);
 	i915_request_retire(rq);
 
 	rq = kmem_cache_alloc(global.slab_requests,
@@ -567,11 +545,11 @@ request_alloc_slow(struct intel_context *ce, gfp_t gfp)
 		return rq;
 
 	/* Ratelimit ourselves to prevent oom from malicious clients */
-	rq = list_last_entry(&ring->request_list, typeof(*rq), ring_link);
+	rq = list_last_entry(&tl->requests, typeof(*rq), link);
 	cond_synchronize_rcu(rq->rcustate);
 
 	/* Retire our old requests in the hope that we free some */
-	ring_retire_requests(ring);
+	retire_requests(tl);
 
 out:
 	return kmem_cache_alloc(global.slab_requests, gfp);
@@ -711,6 +689,7 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
 struct i915_request *
 i915_request_create(struct intel_context *ce)
 {
+	struct i915_timeline *tl = ce->ring->timeline;
 	struct i915_request *rq;
 	int err;
 
@@ -719,8 +698,8 @@ i915_request_create(struct intel_context *ce)
 		return ERR_PTR(err);
 
 	/* Move our oldest request to the slab-cache (if not in use!) */
-	rq = list_first_entry(&ce->ring->request_list, typeof(*rq), ring_link);
-	if (!list_is_last(&rq->ring_link, &ce->ring->request_list) &&
+	rq = list_first_entry(&tl->requests, typeof(*rq), link);
+	if (!list_is_last(&rq->link, &tl->requests) &&
 	    i915_request_completed(rq))
 		i915_request_retire(rq);
 
@@ -731,7 +710,7 @@ i915_request_create(struct intel_context *ce)
 		goto err_unlock;
 
 	/* Check that we do not interrupt ourselves with a new request */
-	rq->cookie = lockdep_pin_lock(&ce->ring->timeline->mutex);
+	rq->cookie = lockdep_pin_lock(&tl->mutex);
 
 	return rq;
 
@@ -743,10 +722,10 @@ i915_request_create(struct intel_context *ce)
 static int
 i915_request_await_start(struct i915_request *rq, struct i915_request *signal)
 {
-	if (list_is_first(&signal->ring_link, &signal->ring->request_list))
+	if (list_is_first(&signal->link, &signal->ring->timeline->requests))
 		return 0;
 
-	signal = list_prev_entry(signal, ring_link);
+	signal = list_prev_entry(signal, link);
 	if (i915_timeline_sync_is_later(rq->timeline, &signal->fence))
 		return 0;
 
@@ -1143,6 +1122,7 @@ struct i915_request *__i915_request_commit(struct i915_request *rq)
 	 */
 	GEM_BUG_ON(rq->reserved_space > ring->space);
 	rq->reserved_space = 0;
+	rq->emitted_jiffies = jiffies;
 
 	/*
 	 * Record the position of the start of the breadcrumb so that
@@ -1156,11 +1136,6 @@ struct i915_request *__i915_request_commit(struct i915_request *rq)
 
 	prev = __i915_request_add_to_timeline(rq);
 
-	list_add_tail(&rq->ring_link, &ring->request_list);
-	if (list_is_first(&rq->ring_link, &ring->request_list))
-		list_add(&ring->active_link, &rq->i915->gt.active_rings);
-	rq->emitted_jiffies = jiffies;
-
 	/*
 	 * Let the backend know a new request has arrived that may need
 	 * to adjust the existing execution schedule due to a high priority
@@ -1459,22 +1434,30 @@ long i915_request_wait(struct i915_request *rq,
 
 bool i915_retire_requests(struct drm_i915_private *i915)
 {
-	struct intel_ring *ring, *tmp;
+	struct i915_gt_timelines *gt = &i915->gt.timelines;
+	struct i915_timeline *tl, *tn;
+
+	mutex_lock(&gt->mutex);
+	list_for_each_entry_safe(tl, tn, &gt->active_list, link) {
+		if (!i915_active_fence_isset(&tl->last_request))
+			continue;
 
-	lockdep_assert_held(&i915->drm.struct_mutex);
+		i915_timeline_get(tl);
 
-	list_for_each_entry_safe(ring, tmp,
-				 &i915->gt.active_rings, active_link) {
-		intel_ring_get(ring); /* last rq holds reference! */
-		mutex_lock(&ring->timeline->mutex);
+		mutex_unlock(&gt->mutex);
+		if (!mutex_trylock(&tl->mutex))
+			goto relock;
 
-		ring_retire_requests(ring);
+		retire_requests(tl);
 
-		mutex_unlock(&ring->timeline->mutex);
-		intel_ring_put(ring);
+		mutex_unlock(&tl->mutex);
+relock:
+		mutex_lock(&gt->mutex);
+		i915_timeline_put(tl);
 	}
+	mutex_unlock(&gt->mutex);
 
-	return !list_empty(&i915->gt.active_rings);
+	return !list_empty(&gt->active_list);
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 8277cff0df70..d63b7f69cd87 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -220,9 +220,6 @@ struct i915_request {
 	/** timeline->request entry for this request */
 	struct list_head link;
 
-	/** ring->request_list entry for this request */
-	struct list_head ring_link;
-
 	struct drm_i915_file_private *file_priv;
 	/** file_priv list entry for this request */
 	struct list_head client_link;
diff --git a/drivers/gpu/drm/i915/selftests/igt_flush_test.c b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
index 5bfd1b2626a2..0a10351c3c64 100644
--- a/drivers/gpu/drm/i915/selftests/igt_flush_test.c
+++ b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
@@ -14,7 +14,7 @@
 int igt_flush_test(struct drm_i915_private *i915, unsigned int flags)
 {
 	int ret = i915_terminally_wedged(i915) ? -EIO : 0;
-	int repeat = !!(flags & I915_WAIT_LOCKED);
+	int repeat = 1;
 
 	cond_resched();
 
@@ -33,8 +33,7 @@ int igt_flush_test(struct drm_i915_private *i915, unsigned int flags)
 		}
 
 		/* Ensure we also flush after wedging. */
-		if (flags & I915_WAIT_LOCKED)
-			i915_retire_requests(i915);
+		i915_retire_requests(i915);
 	} while (repeat--);
 
 	return ret;
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index a96d0c012d46..1b66f0d95a0f 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -40,8 +40,6 @@ void mock_device_flush(struct drm_i915_private *i915)
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
-	lockdep_assert_held(&i915->drm.struct_mutex);
-
 	do {
 		for_each_engine(engine, i915, id)
 			mock_engine_flush(engine);
@@ -200,7 +198,6 @@ struct drm_i915_private *mock_gem_device(void)
 
 	i915_timelines_init(i915);
 
-	INIT_LIST_HEAD(&i915->gt.active_rings);
 	INIT_LIST_HEAD(&i915->gt.closed_vma);
 	spin_lock_init(&i915->gt.closed_lock);
 
-- 
2.20.1


* ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/39] drm/i915: Discard some redundant cache domain flushes
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (38 preceding siblings ...)
  2019-06-14  7:10 ` [PATCH 39/39] active-rings Chris Wilson
@ 2019-06-14  7:22 ` Patchwork
  2019-06-14  7:35 ` ✗ Fi.CI.SPARSE: " Patchwork
  40 siblings, 0 replies; 61+ messages in thread
From: Patchwork @ 2019-06-14  7:22 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/39] drm/i915: Discard some redundant cache domain flushes
URL   : https://patchwork.freedesktop.org/series/62085/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
6eb3b4d8d4bd drm/i915: Discard some redundant cache domain flushes
a145d1b010f1 drm/i915: Refine i915_reset.lock_map
18868d2d5f4a drm/i915: Avoid tainting i915_gem_park() with wakeref.lock
3dd6eec4d204 drm/i915: Enable refcount debugging for default debug levels
8c062ac906be drm/i915: Keep contexts pinned until after the next kernel context switch
b049d2db266c drm/i915: Stop retiring along engine
98f655113122 drm/i915: Replace engine->timeline with a plain list
-:180: CHECK:UNCOMMENTED_DEFINITION: spinlock_t definition without comment
#180: FILE: drivers/gpu/drm/i915/gt/intel_engine_types.h:292:
+		spinlock_t lock;

total: 0 errors, 0 warnings, 1 checks, 968 lines checked
7f8c5c0d9568 drm/i915: Flush the execution-callbacks on retiring
7960c6f7e24e drm/i915/execlists: Preempt-to-busy
-:1494: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'p_ptr' - possible side-effects?
#1494: FILE: drivers/gpu/drm/i915/i915_utils.h:134:
+#define ptr_count_dec(p_ptr) do {					\
+	typeof(p_ptr) __p = (p_ptr);					\
+	unsigned long __v = (unsigned long)(*__p);			\
+	*__p = (typeof(*p_ptr))(--__v);					\
+} while (0)

-:1500: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'p_ptr' - possible side-effects?
#1500: FILE: drivers/gpu/drm/i915/i915_utils.h:140:
+#define ptr_count_inc(p_ptr) do {					\
+	typeof(p_ptr) __p = (p_ptr);					\
+	unsigned long __v = (unsigned long)(*__p);			\
+	*__p = (typeof(*p_ptr))(++__v);					\
+} while (0)

-:1783: WARNING:LINE_SPACING: Missing a blank line after declarations
#1783: FILE: drivers/gpu/drm/i915/intel_guc_submission.c:820:
+		int rem = ARRAY_SIZE(execlists->inflight) - idx;
+		memmove(execlists->inflight, port, rem * sizeof(*port));

total: 0 errors, 1 warnings, 2 checks, 1682 lines checked
4362ebfc9e11 drm/i915/execlists: Minimalistic timeslicing
-:345: WARNING:LONG_LINE: line over 100 characters
#345: FILE: drivers/gpu/drm/i915/gt/selftest_lrc.c:211:
+			      2 * RUNTIME_INFO(outer->i915)->num_engines * (count + 2) * (count + 3)) < 0) {

total: 0 errors, 1 warnings, 0 checks, 426 lines checked
ee3f9dbd5dad drm/i915/execlists: Force preemption
44c2f89f62fa dma-fence: Propagate errors to dma-fence-array container
e7d5c4f25cec dma-fence: Report the composite sync_file status
2cdad69a45fb dma-fence: Refactor signaling for manual invocation
-:33: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#33: 
new file mode 100644

-:38: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#38: FILE: drivers/dma-buf/dma-fence-trace.c:1:
+/*

-:293: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#293: FILE: include/linux/dma-fence-types.h:1:
+/*

-:368: CHECK:UNCOMMENTED_DEFINITION: spinlock_t definition without comment
#368: FILE: include/linux/dma-fence-types.h:76:
+	spinlock_t *lock;

total: 0 errors, 3 warnings, 1 checks, 728 lines checked
e171ea0e2acb dma-fence: Always execute signal callbacks
47a81f170456 drm/i915: Execute signal callbacks from no-op i915_request_wait
1cedd186a982 drm/i915: Make the semaphore saturation mask global
5da6a4163753 drm/i915: Throw away the active object retirement complexity
15e64af9e592 drm/i915: Provide an i915_active.acquire callback
-:684: CHECK:UNCOMMENTED_DEFINITION: struct mutex definition without comment
#684: FILE: drivers/gpu/drm/i915/i915_active_types.h:36:
+	struct mutex mutex;

total: 0 errors, 0 warnings, 1 checks, 730 lines checked
0dbf4afec759 drm/i915: Push the i915_active.retire into a worker
655970ed13db drm/i915/overlay: Switch to using i915_active tracking
b03c66bab792 drm/i915: Forgo last_fence active request tracking
55e795c76331 drm/i915: Extract intel_frontbuffer active tracking
9ca5f25a8498 drm/i915: Coordinate i915_active with its own mutex
e7f6cb7712a2 drm/i915: Track ggtt fence reservations under its own mutex
2e2f11bbd4d9 drm/i915: Only track bound elements of the GTT
3576e88429ca drm/i915: Explicitly cleanup initial_plane_config
fb28ad5a502c initial-plane-vma
-:8: WARNING:COMMIT_MESSAGE: Missing commit description - Add an appropriate one

-:389: ERROR:MISSING_SIGN_OFF: Missing Signed-off-by: line(s)

total: 1 errors, 1 warnings, 0 checks, 349 lines checked
ce0da27c91c2 drm/i915: Make i915_vma track its own kref
-:308: WARNING:BRACES: braces {} are not necessary for single statement blocks
#308: FILE: drivers/gpu/drm/i915/gem/selftests/huge_pages.c:576:
+		if (!IS_ERR(vma)) {
+			i915_vma_put(vma);
+		}

-:1346: ERROR:MISSING_SIGN_OFF: Missing Signed-off-by: line(s)

total: 1 errors, 1 warnings, 0 checks, 1136 lines checked
273465acf85c drm/i915: Propagate fence errors
e73a45e0f5a1 drm/i915: Allow page pinning to be in the background
94c97e5b13b7 drm/i915: Allow vma binding to occur asynchronously
07310921ddea revoke-fence
-:8: WARNING:COMMIT_MESSAGE: Missing commit description - Add an appropriate one

-:233: ERROR:MISSING_SIGN_OFF: Missing Signed-off-by: line(s)

total: 1 errors, 1 warnings, 0 checks, 186 lines checked
95f33c091ed7 drm/i915: Use vm->mutex for serialising GTT insertion
1174540f83b8 drm/i915: Pin pages before waiting
cf576994e16e drm/i915: Use reservation_object to coordinate userptr get_pages()
-:8: WARNING:COMMIT_MESSAGE: Missing commit description - Add an appropriate one

total: 0 errors, 1 warnings, 0 checks, 1104 lines checked
2c181cb942b4 drm/i915: Use forced preemptions in preference over hangcheck
6a167522e758 drm/i915: Remove logical HW ID
e676c08b2ebc active-rings
-:8: WARNING:COMMIT_MESSAGE: Missing commit description - Add an appropriate one

-:533: ERROR:MISSING_SIGN_OFF: Missing Signed-off-by: line(s)

total: 1 errors, 1 warnings, 0 checks, 440 lines checked


* ✗ Fi.CI.SPARSE: warning for series starting with [01/39] drm/i915: Discard some redundant cache domain flushes
  2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
                   ` (39 preceding siblings ...)
  2019-06-14  7:22 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/39] drm/i915: Discard some redundant cache domain flushes Patchwork
@ 2019-06-14  7:35 ` Patchwork
  40 siblings, 0 replies; 61+ messages in thread
From: Patchwork @ 2019-06-14  7:35 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/39] drm/i915: Discard some redundant cache domain flushes
URL   : https://patchwork.freedesktop.org/series/62085/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Sparse version: v0.5.2
Commit: drm/i915: Discard some redundant cache domain flushes
Okay!

Commit: drm/i915: Refine i915_reset.lock_map
Okay!

Commit: drm/i915: Avoid tainting i915_gem_park() with wakeref.lock
Okay!

Commit: drm/i915: Enable refcount debugging for default debug levels
Okay!

Commit: drm/i915: Keep contexts pinned until after the next kernel context switch
Okay!

Commit: drm/i915: Stop retiring along engine
Okay!

Commit: drm/i915: Replace engine->timeline with a plain list
Okay!

Commit: drm/i915: Flush the execution-callbacks on retiring
Okay!

Commit: drm/i915/execlists: Preempt-to-busy
-drivers/gpu/drm/i915/selftests/../i915_utils.h:220:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_utils.h:232:16: warning: expression using sizeof(void)

Commit: drm/i915/execlists: Minimalistic timeslicing
+drivers/gpu/drm/i915/gt/intel_lrc.c:876:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/gt/intel_lrc.c:876:16: warning: expression using sizeof(void)

Commit: drm/i915/execlists: Force preemption
+
+drivers/gpu/drm/i915/i915_utils.h:232:16: warning: expression using sizeof(void)
+Error in reading or end of file.

Commit: dma-fence: Propagate errors to dma-fence-array container
Okay!

Commit: dma-fence: Report the composite sync_file status
Okay!

Commit: dma-fence: Refactor signaling for manual invocation
Okay!

Commit: dma-fence: Always execute signal callbacks
Okay!

Commit: drm/i915: Execute signal callbacks from no-op i915_request_wait
Okay!

Commit: drm/i915: Make the semaphore saturation mask global
Okay!

Commit: drm/i915: Throw away the active object retirement complexity
Okay!

Commit: drm/i915: Provide an i915_active.acquire callback
Okay!

Commit: drm/i915: Push the i915_active.retire into a worker
Okay!

Commit: drm/i915/overlay: Switch to using i915_active tracking
Okay!

Commit: drm/i915: Forgo last_fence active request tracking
Okay!

Commit: drm/i915: Extract intel_frontbuffer active tracking
+drivers/gpu/drm/i915/intel_frontbuffer.c:217:13: warning: context imbalance in 'frontbuffer_release' - unexpected unlock

Commit: drm/i915: Coordinate i915_active with its own mutex
-./include/uapi/linux/perf_event.h:147:56: warning: cast truncates bits from constant value (8000000000000000 becomes 0)
-./include/uapi/linux/perf_event.h:147:56: warning: cast truncates bits from constant value (8000000000000000 becomes 0)
-./include/uapi/linux/perf_event.h:147:56: warning: cast truncates bits from constant value (8000000000000000 becomes 0)
-./include/uapi/linux/perf_event.h:147:56: warning: cast truncates bits from constant value (8000000000000000 becomes 0)

Commit: drm/i915: Track ggtt fence reservations under its own mutex
Okay!

Commit: drm/i915: Only track bound elements of the GTT
Okay!


* Re: [PATCH 01/39] drm/i915: Discard some redundant cache domain flushes
  2019-06-14  7:09 ` [PATCH 01/39] drm/i915: Discard some redundant cache domain flushes Chris Wilson
@ 2019-06-14  9:12   ` Matthew Auld
  0 siblings, 0 replies; 61+ messages in thread
From: Matthew Auld @ 2019-06-14  9:12 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

On Fri, 14 Jun 2019 at 08:11, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> Since commit a679f58d0510 ("drm/i915: Flush pages on acquisition"), we
> flush objects on acquire their pages and as such when we create an

acquiring

> object for the purpose of writing into it, we do not need to manually
> flush.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>

* Re: [PATCH 16/39] drm/i915: Execute signal callbacks from no-op i915_request_wait
  2019-06-14  7:10 ` [PATCH 16/39] drm/i915: Execute signal callbacks from no-op i915_request_wait Chris Wilson
@ 2019-06-14 10:45   ` Tvrtko Ursulin
  0 siblings, 0 replies; 61+ messages in thread
From: Tvrtko Ursulin @ 2019-06-14 10:45 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 14/06/2019 08:10, Chris Wilson wrote:
> If we enter i915_request_wait() with an already completed request, but
> an unsignaled dma-fence, signal the fence before returning. This allows us
> to execute any of the signal callbacks at the earliest opportunity.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_request.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index ead68a2ffeac..a82f3ff82b39 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1392,7 +1392,7 @@ long i915_request_wait(struct i915_request *rq,
>   	might_sleep();
>   	GEM_BUG_ON(timeout < 0);
>   
> -	if (i915_request_completed(rq))
> +	if (dma_fence_is_signaled(&rq->fence))
>   		return timeout;
>   
>   	if (!timeout)
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* Re: [PATCH 29/39] drm/i915: Make i915_vma track its own kref
  2019-06-14  7:10 ` [PATCH 29/39] drm/i915: Make i915_vma track its own kref Chris Wilson
@ 2019-06-14 11:15   ` Matthew Auld
  2019-06-14 11:18     ` Chris Wilson
  0 siblings, 1 reply; 61+ messages in thread
From: Matthew Auld @ 2019-06-14 11:15 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

On Fri, 14 Jun 2019 at 08:10, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> Throughout the code base we internally track vma (objects bound into
> a particular GTT), with the objects themselves being the common backing
> storage. By making the vma itself reference counted we can start
> operating on the vma concurrently, moving work into async threads.
>
> Just the conversion to make sure we keep track of the vma reference
> counts is not particularly pleasant.
> ---

[snip]

> @@ -2060,6 +2057,7 @@ static struct i915_vma *pd_vma_create(struct gen6_ppgtt *ppgtt, int size)
>         if (!vma)
>                 return ERR_PTR(-ENOMEM);
>
> +       kref_init(&vma->ref);
>         i915_active_init(&vma->active, NULL, NULL);
>
>         vma->vm = &ggtt->vm;

Just a first pass.  Do we need i915_vm_get(&ggtt->vm); so we match the
i915_vm_put() in __i915_vma_release()?
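
(For context, the put side I mean — a rough paraphrase, not the exact
code from this series:

static void __i915_vma_release(struct kref *ref)
{
	struct i915_vma *vma = container_of(ref, typeof(*vma), ref);

	/* ... teardown ... */
	i915_vm_put(vma->vm); /* expects a matching i915_vm_get() */
	i915_vma_free(vma);
}

so taking a bare &ggtt->vm pointer in pd_vma_create() would leave the
vm reference count unbalanced once the vma is released.)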

* Re: [PATCH 29/39] drm/i915: Make i915_vma track its own kref
  2019-06-14 11:15   ` Matthew Auld
@ 2019-06-14 11:18     ` Chris Wilson
  0 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14 11:18 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Intel Graphics Development

Quoting Matthew Auld (2019-06-14 12:15:27)
> On Fri, 14 Jun 2019 at 08:10, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> >
> > Throughout the code base we internally track vma (objects bound into
> > a particular GTT), with the objects themselves being the common backing
> > storage. By making the vma itself reference counted we can start
> > operating on the vma concurrently, moving work into async threads.
> >
> > Just the conversion to make sure we keep track of the vma reference
> > counts is not particularly pleasant.
> > ---
> 
> [snip]
> 
> > @@ -2060,6 +2057,7 @@ static struct i915_vma *pd_vma_create(struct gen6_ppgtt *ppgtt, int size)
> >         if (!vma)
> >                 return ERR_PTR(-ENOMEM);
> >
> > +       kref_init(&vma->ref);
> >         i915_active_init(&vma->active, NULL, NULL);
> >
> >         vma->vm = &ggtt->vm;
> 
> Just a first pass.  Do we need i915_vm_get(&ggtt->vm); so we match the
> i915_vm_put() in __i915_vma_release()?

Yup. That would have an interesting explosion at some point. Tempted to
leave it to make sure CI does explode :)

Now that I mention it... I think I'll leave a comment with the fix.
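
Something along these lines, I expect (sketch only, untested):

	/*
	 * __i915_vma_release() calls i915_vm_put(), so take a reference
	 * here to keep the pairing balanced.
	 */
	vma->vm = i915_vm_get(&ggtt->vm);
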
-Chris

* Re: [PATCH 02/39] drm/i915: Refine i915_reset.lock_map
  2019-06-14  7:09 ` [PATCH 02/39] drm/i915: Refine i915_reset.lock_map Chris Wilson
@ 2019-06-14 14:10   ` Mika Kuoppala
  2019-06-14 14:17     ` Chris Wilson
  0 siblings, 1 reply; 61+ messages in thread
From: Mika Kuoppala @ 2019-06-14 14:10 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> We already use a mutex to serialise i915_reset() and wedging, so all we
> need is to link that into i915_request_wait() and we have our lock cycle
> detection.
>
> v2.5: Take error mutex for selftests
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_reset.c            |  6 ++----
>  drivers/gpu/drm/i915/i915_drv.h                  |  8 --------
>  drivers/gpu/drm/i915/i915_gem.c                  |  3 ---
>  drivers/gpu/drm/i915/i915_request.c              | 12 ++++++++++--
>  drivers/gpu/drm/i915/selftests/mock_gem_device.c |  2 --
>  5 files changed, 12 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index 8ba7af8b7ced..41a294f5cc19 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -978,7 +978,7 @@ void i915_reset(struct drm_i915_private *i915,
>  
>  	might_sleep();
>  	GEM_BUG_ON(!test_bit(I915_RESET_BACKOFF, &error->flags));
> -	lock_map_acquire(&i915->gt.reset_lockmap);
> +	mutex_lock(&error->wedge_mutex);
>  
>  	/* Clear any previous failed attempts at recovery. Time to try again. */
>  	if (!__i915_gem_unset_wedged(i915))
> @@ -1031,7 +1031,7 @@ void i915_reset(struct drm_i915_private *i915,
>  finish:
>  	reset_finish(i915);
>  unlock:
> -	lock_map_release(&i915->gt.reset_lockmap);
> +	mutex_unlock(&error->wedge_mutex);
>  	return;
>  
>  taint:
> @@ -1147,9 +1147,7 @@ static void i915_reset_device(struct drm_i915_private *i915,
>  		/* Flush everyone using a resource about to be clobbered */
>  		synchronize_srcu_expedited(&error->reset_backoff_srcu);
>  
> -		mutex_lock(&error->wedge_mutex);
>  		i915_reset(i915, engine_mask, reason);
> -		mutex_unlock(&error->wedge_mutex);
>  
>  		intel_finish_reset(i915);
>  	}
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 90d94d904e65..3683ef6d4c28 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1901,14 +1901,6 @@ struct drm_i915_private {
>  		ktime_t last_init_time;
>  
>  		struct i915_vma *scratch;
> -
> -		/*
> -		 * We must never wait on the GPU while holding a lock as we
> -		 * may need to perform a GPU reset. So while we don't need to
> -		 * serialise wait/reset with an explicit lock, we do want
> -		 * lockdep to detect potential dependency cycles.
> -		 */
> -		struct lockdep_map reset_lockmap;
>  	} gt;
>  
>  	struct {
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 4bbded4aa936..7232361973fd 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1746,7 +1746,6 @@ static void i915_gem_init__mm(struct drm_i915_private *i915)
>  
>  int i915_gem_init_early(struct drm_i915_private *dev_priv)
>  {
> -	static struct lock_class_key reset_key;
>  	int err;
>  
>  	intel_gt_pm_init(dev_priv);
> @@ -1754,8 +1753,6 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
>  	INIT_LIST_HEAD(&dev_priv->gt.active_rings);
>  	INIT_LIST_HEAD(&dev_priv->gt.closed_vma);
>  	spin_lock_init(&dev_priv->gt.closed_lock);
> -	lockdep_init_map(&dev_priv->gt.reset_lockmap,
> -			 "i915.reset", &reset_key, 0);
>  
>  	i915_gem_init__mm(dev_priv);
>  	i915_gem_init__pm(dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 1cbc3ef4fc27..5311286578b7 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1444,7 +1444,15 @@ long i915_request_wait(struct i915_request *rq,
>  		return -ETIME;
>  
>  	trace_i915_request_wait_begin(rq, flags);
> -	lock_map_acquire(&rq->i915->gt.reset_lockmap);
> +
> +	/*
> +	 * We must never wait on the GPU while holding a lock as we
> +	 * may need to perform a GPU reset. So while we don't need to
> +	 * serialise wait/reset with an explicit lock, we do want
> +	 * lockdep to detect potential dependency cycles.
> +	 */
> +	mutex_acquire(&rq->i915->gpu_error.wedge_mutex.dep_map,
> +		      0, 0, _THIS_IP_);

Seems to translate to exclusive lock with full checking.

There was of course a slight possibility that the previous reviewer did
read all of lockdep.h, looked at the wedge mutex and connected the
dots. Well, it is obvious now.
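
(For reference, the lockdep.h definitions boil down to — paraphrasing:

#define mutex_acquire(l, s, t, i)  lock_acquire_exclusive(l, s, t, NULL, i)
#define lock_acquire_exclusive(l, s, t, n, i)  lock_acquire(l, s, t, 0, 1, n, i)

so the "0, 0" arguments here are subclass 0 and trylock 0, giving a
plain exclusive acquisition with full dependency checking.)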

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

>  
>  	/*
>  	 * Optimistic spin before touching IRQs.
> @@ -1518,7 +1526,7 @@ long i915_request_wait(struct i915_request *rq,
>  	dma_fence_remove_callback(&rq->fence, &wait.cb);
>  
>  out:
> -	lock_map_release(&rq->i915->gt.reset_lockmap);
> +	mutex_release(&rq->i915->gpu_error.wedge_mutex.dep_map, 0, _THIS_IP_);
>  	trace_i915_request_wait_end(rq);
>  	return timeout;
>  }
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> index 1e9ffced78c1..b7f3fbb4ae89 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> @@ -130,7 +130,6 @@ static struct dev_pm_domain pm_domain = {
>  
>  struct drm_i915_private *mock_gem_device(void)
>  {
> -	static struct lock_class_key reset_key;
>  	struct drm_i915_private *i915;
>  	struct pci_dev *pdev;
>  	int err;
> @@ -205,7 +204,6 @@ struct drm_i915_private *mock_gem_device(void)
>  	INIT_LIST_HEAD(&i915->gt.active_rings);
>  	INIT_LIST_HEAD(&i915->gt.closed_vma);
>  	spin_lock_init(&i915->gt.closed_lock);
> -	lockdep_init_map(&i915->gt.reset_lockmap, "i915.reset", &reset_key, 0);
>  
>  	mutex_lock(&i915->drm.struct_mutex);
>  
> -- 
> 2.20.1

* Re: [PATCH 02/39] drm/i915: Refine i915_reset.lock_map
  2019-06-14 14:10   ` Mika Kuoppala
@ 2019-06-14 14:17     ` Chris Wilson
  0 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14 14:17 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2019-06-14 15:10:08)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 1cbc3ef4fc27..5311286578b7 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -1444,7 +1444,15 @@ long i915_request_wait(struct i915_request *rq,
> >               return -ETIME;
> >  
> >       trace_i915_request_wait_begin(rq, flags);
> > -     lock_map_acquire(&rq->i915->gt.reset_lockmap);
> > +
> > +     /*
> > +      * We must never wait on the GPU while holding a lock as we
> > +      * may need to perform a GPU reset. So while we don't need to
> > +      * serialise wait/reset with an explicit lock, we do want
> > +      * lockdep to detect potential dependency cycles.
> > +      */
> > +     mutex_acquire(&rq->i915->gpu_error.wedge_mutex.dep_map,
> > +                   0, 0, _THIS_IP_);
> 
> Seems to translate to exclusive lock with full checking.
> 
> There was of course a slight possibility that the previous reviewer did
> read all of lockdep.h, looked at the wedge mutex and connected the
> dots. Well, it is obvious now.

Hah, I had forgotten all about wedge_mutex :-p

Hopefully, this keeps our reset handling robust. First I have to fix the
mistakes I've recently made...

I just need to find a reviewer for struct_mutex removal :)
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/39] drm/i915: Avoid tainting i915_gem_park() with wakeref.lock
  2019-06-14  7:09 ` [PATCH 03/39] drm/i915: Avoid tainting i915_gem_park() with wakeref.lock Chris Wilson
@ 2019-06-14 14:47   ` Mika Kuoppala
  2019-06-14 14:51     ` Chris Wilson
  0 siblings, 1 reply; 61+ messages in thread
From: Mika Kuoppala @ 2019-06-14 14:47 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> While we need to flush the wakeref before parking, we do not need to
> perform the i915_gem_park() itself underneath the wakeref lock, merely
> the struct_mutex. If we rearrange the locks, we can avoid the unnecessary
> tainting.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_pm.c | 17 ++++++++---------
>  1 file changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
> index 6e75702c5671..a33f69610d6f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
> @@ -30,23 +30,22 @@ static void idle_work_handler(struct work_struct *work)
>  {
>  	struct drm_i915_private *i915 =
>  		container_of(work, typeof(*i915), gem.idle_work);
> -	bool restart = true;
> +	bool park;
>

can_park...meh.

> -	cancel_delayed_work(&i915->gem.retire_work);
> +	cancel_delayed_work_sync(&i915->gem.retire_work);
>  	mutex_lock(&i915->drm.struct_mutex);
>  
>  	intel_wakeref_lock(&i915->gt.wakeref);
> -	if (!intel_wakeref_active(&i915->gt.wakeref) && !work_pending(work)) {
> -		i915_gem_park(i915);
> -		restart = false;
> -	}
> +	park = !intel_wakeref_active(&i915->gt.wakeref) && !work_pending(work);
>  	intel_wakeref_unlock(&i915->gt.wakeref);
> -
> -	mutex_unlock(&i915->drm.struct_mutex);
> -	if (restart)
> +	if (park)
> +		i915_gem_park(i915);

Did not find anything beneath gem park that would need wakeref.

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> +	else
>  		queue_delayed_work(i915->wq,
>  				   &i915->gem.retire_work,
>  				   round_jiffies_up_relative(HZ));
> +
> +	mutex_unlock(&i915->drm.struct_mutex);
>  }
>  
>  static void retire_work_handler(struct work_struct *work)
> -- 
> 2.20.1
>

* Re: [PATCH 03/39] drm/i915: Avoid tainting i915_gem_park() with wakeref.lock
  2019-06-14 14:47   ` Mika Kuoppala
@ 2019-06-14 14:51     ` Chris Wilson
  0 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14 14:51 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2019-06-14 15:47:46)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > While we need to flush the wakeref before parking, we do not need to
> > perform the i915_gem_park() itself underneath the wakeref lock, merely
> > the struct_mutex. If we rearrange the locks, we can avoid the unnecessary
> > tainting.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_pm.c | 17 ++++++++---------
> >  1 file changed, 8 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
> > index 6e75702c5671..a33f69610d6f 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
> > @@ -30,23 +30,22 @@ static void idle_work_handler(struct work_struct *work)
> >  {
> >       struct drm_i915_private *i915 =
> >               container_of(work, typeof(*i915), gem.idle_work);
> > -     bool restart = true;
> > +     bool park;
> >
> 
> can_park...meh.
> 
> > -     cancel_delayed_work(&i915->gem.retire_work);
> > +     cancel_delayed_work_sync(&i915->gem.retire_work);
> >       mutex_lock(&i915->drm.struct_mutex);
> >  
> >       intel_wakeref_lock(&i915->gt.wakeref);
> > -     if (!intel_wakeref_active(&i915->gt.wakeref) && !work_pending(work)) {
> > -             i915_gem_park(i915);
> > -             restart = false;
> > -     }
> > +     park = !intel_wakeref_active(&i915->gt.wakeref) && !work_pending(work);
> >       intel_wakeref_unlock(&i915->gt.wakeref);
> > -
> > -     mutex_unlock(&i915->drm.struct_mutex);
> > -     if (restart)
> > +     if (park)
> > +             i915_gem_park(i915);
> 
> Did not find anything beneath gem park that would need wakeref.

In fact, mostly planning to move this to a timer and not use the global
idle. Certainly for i915_vma_parked(), which is just supposed to be a
cache of userspace vma and is causing me some pain with i915_vma.kref :)
-Chris

* Re: [PATCH 04/39] drm/i915: Enable refcount debugging for default debug levels
  2019-06-14  7:09 ` [PATCH 04/39] drm/i915: Enable refcount debugging for default debug levels Chris Wilson
@ 2019-06-14 15:11   ` Mika Kuoppala
  2019-06-14 15:11   ` Mika Kuoppala
  1 sibling, 0 replies; 61+ messages in thread
From: Mika Kuoppala @ 2019-06-14 15:11 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> refcount_t is our first line of defence against use-after-free, so let's
> enable it for debugging.

Sure. Only wakerefs currently seem to use this, though.

Should we consider changing some atomic_t's to refcounts?

And add our own assertions? We might find places where we
could warn before hitting UINT_MAX on refcount.
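
Something like the usual pattern, purely illustrative (obj/free_object
made up, not from this series):

	refcount_set(&obj->ref, 1);	/* on creation */

	refcount_inc(&obj->ref);	/* with REFCOUNT_FULL, WARNs if the
					 * count was zero (use-after-free)
					 * and saturates instead of wrapping */

	if (refcount_dec_and_test(&obj->ref))	/* WARNs on underflow */
		free_object(obj);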

-Mika

>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/Kconfig.debug | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/i915/Kconfig.debug b/drivers/gpu/drm/i915/Kconfig.debug
> index 09aa0f4c8bf1..8d922bb4d953 100644
> --- a/drivers/gpu/drm/i915/Kconfig.debug
> +++ b/drivers/gpu/drm/i915/Kconfig.debug
> @@ -21,6 +21,7 @@ config DRM_I915_DEBUG
>          depends on DRM_I915
>          select DEBUG_FS
>          select PREEMPT_COUNT
> +        select REFCOUNT_FULL
>          select I2C_CHARDEV
>          select STACKDEPOT
>          select DRM_DP_AUX_CHARDEV
> -- 
> 2.20.1
>

* Re: [PATCH 04/39] drm/i915: Enable refcount debugging for default debug levels
  2019-06-14  7:09 ` [PATCH 04/39] drm/i915: Enable refcount debugging for default debug levels Chris Wilson
  2019-06-14 15:11   ` Mika Kuoppala
@ 2019-06-14 15:11   ` Mika Kuoppala
  1 sibling, 0 replies; 61+ messages in thread
From: Mika Kuoppala @ 2019-06-14 15:11 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> refcount_t is our first line of defence against use-after-free, so let's
> enable it for debugging.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Replied but forgot to stamp.
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/Kconfig.debug | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/i915/Kconfig.debug b/drivers/gpu/drm/i915/Kconfig.debug
> index 09aa0f4c8bf1..8d922bb4d953 100644
> --- a/drivers/gpu/drm/i915/Kconfig.debug
> +++ b/drivers/gpu/drm/i915/Kconfig.debug
> @@ -21,6 +21,7 @@ config DRM_I915_DEBUG
>          depends on DRM_I915
>          select DEBUG_FS
>          select PREEMPT_COUNT
> +        select REFCOUNT_FULL
>          select I2C_CHARDEV
>          select STACKDEPOT
>          select DRM_DP_AUX_CHARDEV
> -- 
> 2.20.1
>

* Re: [PATCH 34/39] drm/i915: Use vm->mutex for serialising GTT insertion
  2019-06-14  7:10 ` [PATCH 34/39] drm/i915: Use vm->mutex for serialising GTT insertion Chris Wilson
@ 2019-06-14 19:14   ` Matthew Auld
  2019-06-14 19:20     ` Chris Wilson
  0 siblings, 1 reply; 61+ messages in thread
From: Matthew Auld @ 2019-06-14 19:14 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

On Fri, 14 Jun 2019 at 08:11, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> Serialising insertion into each of the global GTT and ppGTT accounts for
> > a large chunk of the current struct_mutex serialisation requirements.
> (Note that it is not just the drm_mm / gtt management itself being
> serialised, but the pin count and various flags.) Previously, the main
> > blocker for replacing this mutex was the reset handling, but with the
> advent of "lockless" resets, we can freely take the vm->mutex and block
> waiting for the GPU (without fear of deadlock if the GPU hangs). We also
> proscribe allocations underneath vm->mutex, allowing us to take the
> mutex inside the shrinker, avoiding the recursive struct_mutex we
> previously used.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---

[snip]

> @@ -248,16 +248,6 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
>                 goto err_rpm;
>         }
>
> -       ret = i915_mutex_lock_interruptible(dev);
> -       if (ret)
> -               goto err_reset;
> -
> -       /* Access to snoopable pages through the GTT is incoherent. */
> -       if (obj->cache_level != I915_CACHE_NONE && !HAS_LLC(i915)) {
> -               ret = -EFAULT;
> -               goto err_unlock;
> -       }

Intentional? Perhaps harder to enforce now?

* Re: [PATCH 34/39] drm/i915: Use vm->mutex for serialising GTT insertion
  2019-06-14 19:14   ` Matthew Auld
@ 2019-06-14 19:20     ` Chris Wilson
  0 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2019-06-14 19:20 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Intel Graphics Development

Quoting Matthew Auld (2019-06-14 20:14:27)
> On Fri, 14 Jun 2019 at 08:11, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> >
> > Serialising insertion into each of the global GTT and ppGTT accounts for
> > > a large chunk of the current struct_mutex serialisation requirements.
> > (Note that it is not just the drm_mm / gtt management itself being
> > serialised, but the pin count and various flags.) Previously, the main
> > > blocker for replacing this mutex was the reset handling, but with the
> > advent of "lockless" resets, we can freely take the vm->mutex and block
> > waiting for the GPU (without fear of deadlock if the GPU hangs). We also
> > proscribe allocations underneath vm->mutex, allowing us to take the
> > mutex inside the shrinker, avoiding the recursive struct_mutex we
> > previously used.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> 
> [snip]
> 
> > @@ -248,16 +248,6 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
> >                 goto err_rpm;
> >         }
> >
> > -       ret = i915_mutex_lock_interruptible(dev);
> > -       if (ret)
> > -               goto err_reset;
> > -
> > -       /* Access to snoopable pages through the GTT is incoherent. */
> > -       if (obj->cache_level != I915_CACHE_NONE && !HAS_LLC(i915)) {
> > -               ret = -EFAULT;
> > -               goto err_unlock;
> > -       }
> 
> Intentional? Perhaps harder to enforce now?

I removed it, then thought about how to lock it correctly. I was
chickening out at set_cache_level(), but all it really needs is the
object_lock, although there's some scope for moving the cache level onto
the vma, and then we would only need to check after pinning the vma (so
long as we then allow waiting on a pinned vma for set_cache_level()). I
thought I might have been able to justify dropping the check, but on
second thought I think it is a machine killer for older gen iirc, so it
is probably best not left open.
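
For reference, keeping the check without struct_mutex would look roughly
like this (sketch only; err_rpm as in the existing unwind path):

	/* Access to snoopable pages through the GTT is incoherent. */
	if (obj->cache_level != I915_CACHE_NONE && !HAS_LLC(i915)) {
		ret = -EFAULT;
		goto err_rpm;
	}
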
-Chris

* Re: [PATCH 35/39] drm/i915: Pin pages before waiting
  2019-06-14  7:10 ` [PATCH 35/39] drm/i915: Pin pages before waiting Chris Wilson
@ 2019-06-14 19:53   ` Matthew Auld
  2019-06-14 19:58     ` Chris Wilson
  0 siblings, 1 reply; 61+ messages in thread
From: Matthew Auld @ 2019-06-14 19:53 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

On Fri, 14 Jun 2019 at 08:11, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> In order to allow for asynchronous gathering of pages tracked by the
> obj->resv, we take advantage of pinning the pages before waiting on
> the reservation, and where possible do an early wait before acquiring
> the object lock (with a follow-up locked wait to ensure we have
> exclusive access where necessary).
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |   4 +-
>  drivers/gpu/drm/i915/gem/i915_gem_domain.c | 104 +++++++++++----------
>  drivers/gpu/drm/i915/i915_gem.c            |  22 +++--
>  3 files changed, 68 insertions(+), 62 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> index a93e233cfaa9..84992d590da5 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> @@ -154,7 +154,7 @@ static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_dire
>         bool write = (direction == DMA_BIDIRECTIONAL || direction == DMA_TO_DEVICE);
>         int err;
>
> -       err = i915_gem_object_pin_pages(obj);
> +       err = i915_gem_object_pin_pages_async(obj);
>         if (err)
>                 return err;
>
> @@ -175,7 +175,7 @@ static int i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direct
>         struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
>         int err;
>
> -       err = i915_gem_object_pin_pages(obj);
> +       err = i915_gem_object_pin_pages_async(obj);
>         if (err)
>                 return err;
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> index f9044bbdd429..bda990113124 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> @@ -49,17 +49,11 @@ i915_gem_object_set_to_wc_domain(struct drm_i915_gem_object *obj, bool write)
>
>         assert_object_held(obj);
>
> -       ret = i915_gem_object_wait(obj,
> -                                  I915_WAIT_INTERRUPTIBLE |
> -                                  (write ? I915_WAIT_ALL : 0),
> -                                  MAX_SCHEDULE_TIMEOUT);
> -       if (ret)
> -               return ret;
> -
>         if (obj->write_domain == I915_GEM_DOMAIN_WC)
>                 return 0;
>
> -       /* Flush and acquire obj->pages so that we are coherent through
> +       /*
> +        * Flush and acquire obj->pages so that we are coherent through
>          * direct access in memory with previous cached writes through
>          * shmemfs and that our cache domain tracking remains valid.
>          * For example, if the obj->filp was moved to swap without us
> @@ -67,10 +61,17 @@ i915_gem_object_set_to_wc_domain(struct drm_i915_gem_object *obj, bool write)
>          * continue to assume that the obj remained out of the CPU cached
>          * domain.
>          */
> -       ret = i915_gem_object_pin_pages(obj);
> +       ret = i915_gem_object_pin_pages_async(obj);
>         if (ret)
>                 return ret;
>
> +       ret = i915_gem_object_wait(obj,
> +                                  I915_WAIT_INTERRUPTIBLE |
> +                                  (write ? I915_WAIT_ALL : 0),
> +                                  MAX_SCHEDULE_TIMEOUT);
> +       if (ret)
> +               goto out_unpin;
> +

Do we somehow propagate a potential error from a worker to the
object_wait()? Or should we be looking at obj->mm.pages here?

* Re: [PATCH 35/39] drm/i915: Pin pages before waiting
  2019-06-14 19:53   ` Matthew Auld
@ 2019-06-14 19:58     ` Chris Wilson
  2019-06-14 20:13       ` Chris Wilson
  0 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2019-06-14 19:58 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Intel Graphics Development

Quoting Matthew Auld (2019-06-14 20:53:26)
> On Fri, 14 Jun 2019 at 08:11, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > @@ -67,10 +61,17 @@ i915_gem_object_set_to_wc_domain(struct drm_i915_gem_object *obj, bool write)
> >          * continue to assume that the obj remained out of the CPU cached
> >          * domain.
> >          */
> > -       ret = i915_gem_object_pin_pages(obj);
> > +       ret = i915_gem_object_pin_pages_async(obj);
> >         if (ret)
> >                 return ret;
> >
> > +       ret = i915_gem_object_wait(obj,
> > +                                  I915_WAIT_INTERRUPTIBLE |
> > +                                  (write ? I915_WAIT_ALL : 0),
> > +                                  MAX_SCHEDULE_TIMEOUT);
> > +       if (ret)
> > +               goto out_unpin;
> > +
> 
> Do we somehow propagate a potential error from a worker to the
> object_wait()? Or should we be looking at obj->mm.pages here?

Yeah, I've propagated such errors elsewhere (principally along the
fences). What you are suggesting is tantamount to making
i915_gem_object_wait() report an error, and I have bad memories of all
the unhandled -EIO in the past. However, that feels like the natural
thing to do, so let's give it a whirl.
-Chris

* Re: [PATCH 35/39] drm/i915: Pin pages before waiting
  2019-06-14 19:58     ` Chris Wilson
@ 2019-06-14 20:13       ` Chris Wilson
  2019-06-14 20:18         ` Matthew Auld
  0 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2019-06-14 20:13 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Intel Graphics Development

Quoting Chris Wilson (2019-06-14 20:58:09)
> Quoting Matthew Auld (2019-06-14 20:53:26)
> > On Fri, 14 Jun 2019 at 08:11, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > @@ -67,10 +61,17 @@ i915_gem_object_set_to_wc_domain(struct drm_i915_gem_object *obj, bool write)
> > >          * continue to assume that the obj remained out of the CPU cached
> > >          * domain.
> > >          */
> > > -       ret = i915_gem_object_pin_pages(obj);
> > > +       ret = i915_gem_object_pin_pages_async(obj);
> > >         if (ret)
> > >                 return ret;
> > >
> > > +       ret = i915_gem_object_wait(obj,
> > > +                                  I915_WAIT_INTERRUPTIBLE |
> > > +                                  (write ? I915_WAIT_ALL : 0),
> > > +                                  MAX_SCHEDULE_TIMEOUT);
> > > +       if (ret)
> > > +               goto out_unpin;
> > > +
> > 
> > Do we somehow propagate a potential error from a worker to the
> > object_wait()? Or should we be looking at obj->mm.pages here?
> 
> Yeah, I've propagated such errors elsewhere (principally along the
> fences). What you are suggesting is tantamount to making
> i915_gem_object_wait() report an error, and I have bad memories from all
> the unhandled -EIO in the past. However, that feels the natural thing to
> do, so lets give it a whirl.

So we need to check for error pages anyway, because we can't rule out a
race between pin_pages_async() and i915_gem_object_wait().

There's plenty of duplicated pin_pages_async, object_wait, check-pages
code, so should I refactor that into a variant,
i915_gem_object_pin_pages_wait()?
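
Rough sketch of that variant (assuming the async worker stashes its
error in obj->mm.pages, as later in this series; the helper itself does
not exist yet):

static int
i915_gem_object_pin_pages_wait(struct drm_i915_gem_object *obj,
			       unsigned int wait_flags)
{
	int err;

	err = i915_gem_object_pin_pages_async(obj);
	if (err)
		return err;

	err = i915_gem_object_wait(obj, wait_flags, MAX_SCHEDULE_TIMEOUT);
	if (err)
		goto err_unpin;

	/* the worker may have completed with an error after our wait */
	if (IS_ERR(obj->mm.pages)) {
		err = PTR_ERR(obj->mm.pages);
		goto err_unpin;
	}

	return 0;

err_unpin:
	i915_gem_object_unpin_pages(obj);
	return err;
}
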
-Chris

* Re: [PATCH 35/39] drm/i915: Pin pages before waiting
  2019-06-14 20:13       ` Chris Wilson
@ 2019-06-14 20:18         ` Matthew Auld
  0 siblings, 0 replies; 61+ messages in thread
From: Matthew Auld @ 2019-06-14 20:18 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

On Fri, 14 Jun 2019 at 21:13, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> Quoting Chris Wilson (2019-06-14 20:58:09)
> > Quoting Matthew Auld (2019-06-14 20:53:26)
> > > On Fri, 14 Jun 2019 at 08:11, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > > @@ -67,10 +61,17 @@ i915_gem_object_set_to_wc_domain(struct drm_i915_gem_object *obj, bool write)
> > > >          * continue to assume that the obj remained out of the CPU cached
> > > >          * domain.
> > > >          */
> > > > -       ret = i915_gem_object_pin_pages(obj);
> > > > +       ret = i915_gem_object_pin_pages_async(obj);
> > > >         if (ret)
> > > >                 return ret;
> > > >
> > > > +       ret = i915_gem_object_wait(obj,
> > > > +                                  I915_WAIT_INTERRUPTIBLE |
> > > > +                                  (write ? I915_WAIT_ALL : 0),
> > > > +                                  MAX_SCHEDULE_TIMEOUT);
> > > > +       if (ret)
> > > > +               goto out_unpin;
> > > > +
> > >
> > > Do we somehow propagate a potential error from a worker to the
> > > object_wait()? Or should we be looking at obj->mm.pages here?
> >
> > Yeah, I've propagated such errors elsewhere (principally along the
> > fences). What you are suggesting is tantamount to making
> > i915_gem_object_wait() report an error, and I have bad memories from all
> > the unhandled -EIO in the past. However, that feels the natural thing to
> > do, so lets give it a whirl.
>
> So we need to check for error pages anyway, because we can't rule out a
> race between the pin_pages_async and i915_gem_object_wait.
>
> There's plenty of duplicated code for pin_pages_async, object_wait,
> check pages so I should refactor that into a variant,
> i915_gem_object_pin_pages_wait() ?

Yeah, makes sense to me.

* Re: [PATCH 32/39] drm/i915: Allow vma binding to occur asynchronously
  2019-06-14  7:10 ` [PATCH 32/39] drm/i915: Allow vma binding to occur asynchronously Chris Wilson
@ 2019-08-05 20:35   ` Kumar Valsan, Prathap
  0 siblings, 0 replies; 61+ messages in thread
From: Kumar Valsan, Prathap @ 2019-08-05 20:35 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Jun 14, 2019 at 08:10:16AM +0100, Chris Wilson wrote:
> If we let pages be allocated asynchronously, we also then want to push
> the binding process into an asynchronous task. Make it so, utilising the
> recent improvements to fence error tracking and struct_mutex reduction.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
[snip]
>  /**
>   * i915_vma_bind - Sets up PTEs for an VMA in it's corresponding address space.
>   * @vma: VMA to map
> @@ -300,17 +412,12 @@ int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
>  	u32 vma_flags;
>  	int ret;
>  
> +	GEM_BUG_ON(!flags);
>  	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
>  	GEM_BUG_ON(vma->size > vma->node.size);
> -
> -	if (GEM_DEBUG_WARN_ON(range_overflows(vma->node.start,
> -					      vma->node.size,
> -					      vma->vm->total)))
> -		return -ENODEV;
> -
> -	if (GEM_DEBUG_WARN_ON(!flags))
> -		return -EINVAL;
> -
> +	GEM_BUG_ON(range_overflows(vma->node.start,
> +				   vma->node.size,
> +				   vma->vm->total));
>  	bind_flags = 0;
>  	if (flags & PIN_GLOBAL)
>  		bind_flags |= I915_VMA_GLOBAL_BIND;
> @@ -325,16 +432,20 @@ int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
>  	if (bind_flags == 0)
>  		return 0;
>  
> -	GEM_BUG_ON(!vma->pages);
> +	if ((bind_flags & ~vma_flags) & I915_VMA_LOCAL_BIND)
> +		bind_flags |= I915_VMA_ALLOC_BIND;
>  
>  	trace_i915_vma_bind(vma, bind_flags);
> -	ret = vma->ops->bind_vma(vma, cache_level, bind_flags);
> +	if (bind_flags & I915_VMA_ALLOC_BIND)
> +		ret = queue_async_bind(vma, cache_level, bind_flags);
> +	else
> +		ret = __vma_bind(vma, cache_level, bind_flags);
>  	if (ret)
>  		return ret;

i915_vma_remove() expects the vma to have its pages set. That is no
longer true with async get_pages. Shouldn't clear_pages() be called only
if the pages are set?
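
i.e. something like this (sketch):

	/* in i915_vma_remove(), only release pages that were ever set */
	if (vma->pages)
		vma->ops->clear_pages(vma);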

[snip]
>  static inline void i915_vma_unlock(struct i915_vma *vma)
>  {
>  	reservation_object_unlock(vma->resv);
> diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c
> index 56c1cac368cc..30b831408b7b 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_vma.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_vma.c
> @@ -204,8 +204,10 @@ static int igt_vma_create(void *arg)
>  		mock_context_close(ctx);
>  	}
>  
> -	list_for_each_entry_safe(obj, on, &objects, st_link)
> +	list_for_each_entry_safe(obj, on, &objects, st_link) {
> +		i915_gem_object_wait(obj, I915_WAIT_ALL, MAX_SCHEDULE_TIMEOUT);
>  		i915_gem_object_put(obj);
> +	}
>  	return err;
>  }
>  
> -- 
> 2.20.1
> 

* Re: [PATCH 36/39] drm/i915: Use reservation_object to coordinate userptr get_pages()
  2019-06-14  7:10 ` [PATCH 36/39] drm/i915: Use reservation_object to coordinate userptr get_pages() Chris Wilson
@ 2019-08-07 20:49   ` Daniel Vetter
  2019-08-08  7:30     ` [Intel-gfx] " Daniel Vetter
  0 siblings, 1 reply; 61+ messages in thread
From: Daniel Vetter @ 2019-08-07 20:49 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Jun 14, 2019 at 08:10:20AM +0100, Chris Wilson wrote:
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Not sure this works; two thoughts way down ...

> ---
>  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |  27 +--
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   3 -
>  drivers/gpu/drm/i915/gem/i915_gem_internal.c  |  30 +--
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  11 +-
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c     | 137 ++++++++++-
>  drivers/gpu/drm/i915/gem/i915_gem_phys.c      |  33 ++-
>  drivers/gpu/drm/i915/gem/i915_gem_shmem.c     |  30 +--
>  drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |  25 +-
>  drivers/gpu/drm/i915/gem/i915_gem_userptr.c   | 221 ++++--------------
>  .../drm/i915/gem/selftests/huge_gem_object.c  |  17 +-
>  .../gpu/drm/i915/gem/selftests/huge_pages.c   |  45 ++--
>  drivers/gpu/drm/i915/gvt/dmabuf.c             |  17 +-
>  drivers/gpu/drm/i915/i915_drv.h               |   9 +-
>  drivers/gpu/drm/i915/i915_gem.c               |   5 +-
>  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  14 +-
>  15 files changed, 300 insertions(+), 324 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> index 84992d590da5..a44d6d2ef7ed 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> @@ -225,33 +225,24 @@ struct dma_buf *i915_gem_prime_export(struct drm_device *dev,
>  	return drm_gem_dmabuf_export(dev, &exp_info);
>  }
>  
> -static int i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
> +static struct sg_table *
> +dmabuf_get_pages(struct i915_gem_object_get_pages_context *ctx,
> +		 unsigned int *sizes)
>  {
> -	struct sg_table *pages;
> -	unsigned int sg_page_sizes;
> -
> -	pages = dma_buf_map_attachment(obj->base.import_attach,
> -				       DMA_BIDIRECTIONAL);
> -	if (IS_ERR(pages))
> -		return PTR_ERR(pages);
> -
> -	sg_page_sizes = i915_sg_page_sizes(pages->sgl);
> -
> -	__i915_gem_object_set_pages(obj, pages, sg_page_sizes);
> -
> -	return 0;
> +	return dma_buf_map_attachment(ctx->object->base.import_attach,
> +				      DMA_BIDIRECTIONAL);
>  }
>  
> -static void i915_gem_object_put_pages_dmabuf(struct drm_i915_gem_object *obj,
> -					     struct sg_table *pages)
> +static void dmabuf_put_pages(struct drm_i915_gem_object *obj,
> +			     struct sg_table *pages)
>  {
>  	dma_buf_unmap_attachment(obj->base.import_attach, pages,
>  				 DMA_BIDIRECTIONAL);
>  }
>  
>  static const struct drm_i915_gem_object_ops i915_gem_object_dmabuf_ops = {
> -	.get_pages = i915_gem_object_get_pages_dmabuf,
> -	.put_pages = i915_gem_object_put_pages_dmabuf,
> +	.get_pages = dmabuf_get_pages,
> +	.put_pages = dmabuf_put_pages,
>  };
>  
>  struct drm_gem_object *i915_gem_prime_import(struct drm_device *dev,
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 44bcb681c168..68faf1a71c97 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -1784,9 +1784,6 @@ static noinline int eb_relocate_slow(struct i915_execbuffer *eb)
>  		goto out;
>  	}
>  
> -	/* A frequent cause for EAGAIN are currently unavailable client pages */
> -	flush_workqueue(eb->i915->mm.userptr_wq);
> -
>  	err = i915_mutex_lock_interruptible(dev);
>  	if (err) {
>  		mutex_lock(&dev->struct_mutex);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> index 0c41e04ab8fa..aa0bd5de313b 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> @@ -32,11 +32,14 @@ static void internal_free_pages(struct sg_table *st)
>  	kfree(st);
>  }
>  
> -static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj)
> +static struct sg_table *
> +internal_get_pages(struct i915_gem_object_get_pages_context *ctx,
> +		   unsigned int *sizes)
>  {
> +	struct drm_i915_gem_object *obj = ctx->object;
>  	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> -	struct sg_table *st;
>  	struct scatterlist *sg;
> +	struct sg_table *st;
>  	unsigned int sg_page_sizes;
>  	unsigned int npages;
>  	int max_order;
> @@ -66,12 +69,12 @@ static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj)
>  create_st:
>  	st = kmalloc(sizeof(*st), GFP_KERNEL);
>  	if (!st)
> -		return -ENOMEM;
> +		return ERR_PTR(-ENOMEM);
>  
>  	npages = obj->base.size / PAGE_SIZE;
>  	if (sg_alloc_table(st, npages, GFP_KERNEL)) {
>  		kfree(st);
> -		return -ENOMEM;
> +		return ERR_PTR(-ENOMEM);
>  	}
>  
>  	sg = st->sgl;
> @@ -117,27 +120,26 @@ static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj)
>  		goto err;
>  	}
>  
> -	/* Mark the pages as dontneed whilst they are still pinned. As soon
> +	/*
> +	 * Mark the pages as dontneed whilst they are still pinned. As soon
>  	 * as they are unpinned they are allowed to be reaped by the shrinker,
>  	 * and the caller is expected to repopulate - the contents of this
>  	 * object are only valid whilst active and pinned.
>  	 */
>  	obj->mm.madv = I915_MADV_DONTNEED;
>  
> -	__i915_gem_object_set_pages(obj, st, sg_page_sizes);
> -
> -	return 0;
> +	*sizes = sg_page_sizes;
> +	return st;
>  
>  err:
>  	sg_set_page(sg, NULL, 0, 0);
>  	sg_mark_end(sg);
>  	internal_free_pages(st);
> -
> -	return -ENOMEM;
> +	return ERR_PTR(-ENOMEM);
>  }
>  
> -static void i915_gem_object_put_pages_internal(struct drm_i915_gem_object *obj,
> -					       struct sg_table *pages)
> +static void internal_put_pages(struct drm_i915_gem_object *obj,
> +			       struct sg_table *pages)
>  {
>  	i915_gem_gtt_finish_pages(obj, pages);
>  	internal_free_pages(pages);
> @@ -149,8 +151,8 @@ static void i915_gem_object_put_pages_internal(struct drm_i915_gem_object *obj,
>  static const struct drm_i915_gem_object_ops i915_gem_object_internal_ops = {
>  	.flags = I915_GEM_OBJECT_HAS_STRUCT_PAGE |
>  		 I915_GEM_OBJECT_IS_SHRINKABLE,
> -	.get_pages = i915_gem_object_get_pages_internal,
> -	.put_pages = i915_gem_object_put_pages_internal,
> +	.get_pages = internal_get_pages,
> +	.put_pages = internal_put_pages,
>  };
>  
>  /**
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index f792953b8a71..0ea404cfbc1c 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -19,6 +19,7 @@
>  
>  struct drm_i915_gem_object;
>  struct intel_fronbuffer;
> +struct task_struct;
>  
>  /*
>   * struct i915_lut_handle tracks the fast lookups from handle to vma used
> @@ -32,6 +33,11 @@ struct i915_lut_handle {
>  	u32 handle;
>  };
>  
> +struct i915_gem_object_get_pages_context  {
> +	struct drm_i915_gem_object *object;
> +	struct task_struct *task;
> +};
> +
>  struct drm_i915_gem_object_ops {
>  	unsigned int flags;
>  #define I915_GEM_OBJECT_HAS_STRUCT_PAGE	BIT(0)
> @@ -52,9 +58,11 @@ struct drm_i915_gem_object_ops {
>  	 * being released or under memory pressure (where we attempt to
>  	 * reap pages for the shrinker).
>  	 */
> -	int (*get_pages)(struct drm_i915_gem_object *obj);
> +	struct sg_table *(*get_pages)(struct i915_gem_object_get_pages_context *ctx,
> +				      unsigned int *sizes);
>  	void (*put_pages)(struct drm_i915_gem_object *obj,
>  			  struct sg_table *pages);
> +
>  	void (*truncate)(struct drm_i915_gem_object *obj);
>  	void (*writeback)(struct drm_i915_gem_object *obj);
>  
> @@ -252,7 +260,6 @@ struct drm_i915_gem_object {
>  
>  			struct i915_mm_struct *mm;
>  			struct i915_mmu_object *mmu_object;
> -			struct work_struct *work;
>  		} userptr;
>  
>  		unsigned long scratch;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index 6bec301cee79..f65a983248c6 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -8,16 +8,49 @@
>  #include "i915_gem_object.h"
>  #include "i915_scatterlist.h"
>  
> -void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
> -				 struct sg_table *pages,
> -				 unsigned int sg_page_sizes)
> +static DEFINE_SPINLOCK(fence_lock);
> +
> +struct get_pages_work {
> +	struct dma_fence dma; /* Must be first for dma_fence_free() */
> +	struct i915_sw_fence wait;
> +	struct work_struct work;
> +	struct i915_gem_object_get_pages_context ctx;
> +};
> +
> +static const char *get_pages_work_driver_name(struct dma_fence *fence)
> +{
> +	return DRIVER_NAME;
> +}
> +
> +static const char *get_pages_work_timeline_name(struct dma_fence *fence)
> +{
> +	return "allocation";
> +}
> +
> +static void get_pages_work_release(struct dma_fence *fence)
> +{
> +	struct get_pages_work *w = container_of(fence, typeof(*w), dma);
> +
> +	i915_sw_fence_fini(&w->wait);
> +
> +	BUILD_BUG_ON(offsetof(typeof(*w), dma));
> +	dma_fence_free(&w->dma);
> +}
> +
> +static const struct dma_fence_ops get_pages_work_ops = {
> +	.get_driver_name = get_pages_work_driver_name,
> +	.get_timeline_name = get_pages_work_timeline_name,
> +	.release = get_pages_work_release,
> +};
> +
> +static void __set_pages(struct drm_i915_gem_object *obj,
> +			struct sg_table *pages,
> +			unsigned int sg_page_sizes)
>  {
>  	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>  	unsigned long supported = INTEL_INFO(i915)->page_sizes;
>  	int i;
>  
> -	lockdep_assert_held(&obj->mm.lock);
> -
>  	/* Make the pages coherent with the GPU (flushing any swapin). */
>  	if (obj->cache_dirty) {
>  		obj->write_domain = 0;
> @@ -29,8 +62,6 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
>  	obj->mm.get_page.sg_pos = pages->sgl;
>  	obj->mm.get_page.sg_idx = 0;
>  
> -	obj->mm.pages = pages;
> -
>  	if (i915_gem_object_is_tiled(obj) &&
>  	    i915->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
>  		GEM_BUG_ON(obj->mm.quirked);
> @@ -38,7 +69,8 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
>  		obj->mm.quirked = true;
>  	}
>  
> -	GEM_BUG_ON(!sg_page_sizes);
> +	if (!sg_page_sizes)
> +		sg_page_sizes = i915_sg_page_sizes(pages->sgl);
>  	obj->mm.page_sizes.phys = sg_page_sizes;
>  
>  	/*
> @@ -73,18 +105,105 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
>  
>  		spin_unlock_irqrestore(&i915->mm.obj_lock, flags);
>  	}
> +}
>  
> +static void
> +get_pages_worker(struct work_struct *_work)
> +{
> +	struct get_pages_work *work = container_of(_work, typeof(*work), work);
> +	struct drm_i915_gem_object *obj = work->ctx.object;
> +	struct sg_table *pages;
> +	unsigned int sizes = 0;
> +
> +	if (!work->dma.error) {
> +		pages = obj->ops->get_pages(&work->ctx, &sizes);
> +		if (!IS_ERR(pages))
> +			__set_pages(obj, pages, sizes);
> +		else
> +			dma_fence_set_error(&work->dma, PTR_ERR(pages));
> +	} else {
> +		pages = ERR_PTR(work->dma.error);
> +	}
> +
> +	obj->mm.pages = pages;
>  	complete_all(&obj->mm.completion);
> +	atomic_dec(&obj->mm.pages_pin_count);
> +
> +	i915_gem_object_put(obj);
> +	put_task_struct(work->ctx.task);
> +
> +	dma_fence_signal(&work->dma);
> +	dma_fence_put(&work->dma);
> +}
> +
> +static int __i915_sw_fence_call
> +get_pages_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> +{
> +	struct get_pages_work *w = container_of(fence, typeof(*w), wait);
> +
> +	switch (state) {
> +	case FENCE_COMPLETE:
> +		if (fence->error)
> +			dma_fence_set_error(&w->dma, fence->error);
> +		queue_work(system_unbound_wq, &w->work);
> +		break;
> +
> +	case FENCE_FREE:
> +		dma_fence_put(&w->dma);
> +		break;
> +	}
> +
> +	return NOTIFY_DONE;
>  }
>  
>  int ____i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
>  {
> +	struct get_pages_work *work;
> +	int err;
> +
>  	if (unlikely(obj->mm.madv != I915_MADV_WILLNEED)) {
>  		DRM_DEBUG("Attempting to obtain a purgeable object\n");
>  		return -EFAULT;
>  	}
>  
> -	return obj->ops->get_pages(obj);
> +	/* XXX inline? */
> +
> +	work = kmalloc(sizeof(*work), GFP_KERNEL);
> +	if (!work)
> +		return -ENOMEM;
> +
> +	dma_fence_init(&work->dma,
> +		       &get_pages_work_ops,
> +		       &fence_lock,
> +		       to_i915(obj->base.dev)->mm.unordered_timeline,
> +		       0);

With all the shrinkers we have (across the tree, not just i915) that
just blindly assume you can wait for a fence (at least if you manage to
acquire the reservation lock), I expect this to deadlock. I think the
rule is that no published fence can depend on any memory allocation
anywhere.

Yes, it sucks real hard.
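
Spelling the hazard out against the pattern in this patch (illustrative
fragment only, comments mine):

	/* 1. fence published to the world while still unsignaled */
	reservation_object_add_excl_fence(obj->resv, &work->dma);

	/* 2. the worker then allocates on the path to signalling */
	st = kmalloc(sizeof(*st), GFP_KERNEL);
	/*
	 * kmalloc may enter direct reclaim, where a shrinker (any
	 * driver's) may wait on obj->resv, i.e. on work->dma itself,
	 * before the allocation can succeed: circular wait, deadlock.
	 */

	/* 3. only now would the fence have signalled */
	dma_fence_signal(&work->dma);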

> +	i915_sw_fence_init(&work->wait, get_pages_notify);
> +
> +	work->ctx.object = i915_gem_object_get(obj);
> +
> +	work->ctx.task = current;
> +	get_task_struct(work->ctx.task);
> +
> +	INIT_WORK(&work->work, get_pages_worker);
> +
> +	i915_gem_object_lock(obj);

The other bit is nesting the reservation_object lock within obj->mm.lock.

obj->mm.lock is a very neat & tidy answer to dealing with bo alloc vs.
shrinker madness. But looking at where dma-buf is headed we'll need to
have the reservation_object nest outside of the obj->mm.lock (because the
importer is going to acquire that for us, before calling down into
exporter code).

Of course on the importer side we're currently nesting the other way
round, and Christian König already set CI on fire figuring that out.

I think what could work is pulling the reservation_object out of
obj->mm.lock, and entirely bypassing it for imported dma-buf (since in
that case dealing with the shrinker lolz isn't our problem but the
exporter's, so we don't need the special shrinker lock).

The 3rd part is just me irrationally freaking out when we hide userptr
behind a wait_completion; that didn't work out well last time around
because lockdep fails us. But that's really just an aside.
-Daniel

> +	GEM_BUG_ON(!reservation_object_test_signaled_rcu(obj->resv, true));
> +	err = i915_sw_fence_await_reservation(&work->wait,
> +					      obj->resv, NULL,
> +					      true, I915_FENCE_TIMEOUT,
> +					      I915_FENCE_GFP);
> +	if (err == 0) {
> +		reservation_object_add_excl_fence(obj->resv, &work->dma);
> +		atomic_inc(&obj->mm.pages_pin_count);
> +	} else {
> +		dma_fence_set_error(&work->dma, err);
> +	}
> +	i915_gem_object_unlock(obj);
> +
> +	dma_fence_get(&work->dma);
> +	i915_sw_fence_commit(&work->wait);
> +
> +	return err;
>  }
>  
>  /* Ensure that the associated pages are gathered from the backing storage
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_phys.c b/drivers/gpu/drm/i915/gem/i915_gem_phys.c
> index 2deac933cf59..6b4a5fb52055 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_phys.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_phys.c
> @@ -17,18 +17,21 @@
>  #include "i915_gem_object.h"
>  #include "i915_scatterlist.h"
>  
> -static int i915_gem_object_get_pages_phys(struct drm_i915_gem_object *obj)
> +static struct sg_table *
> +phys_get_pages(struct i915_gem_object_get_pages_context *ctx,
> +	       unsigned int *sizes)
>  {
> +	struct drm_i915_gem_object *obj = ctx->object;
>  	struct address_space *mapping = obj->base.filp->f_mapping;
>  	struct drm_dma_handle *phys;
> -	struct sg_table *st;
>  	struct scatterlist *sg;
> +	struct sg_table *st;
>  	char *vaddr;
>  	int i;
>  	int err;
>  
>  	if (WARN_ON(i915_gem_object_needs_bit17_swizzle(obj)))
> -		return -EINVAL;
> +		return ERR_PTR(-EINVAL);
>  
>  	/* Always aligning to the object size, allows a single allocation
>  	 * to handle all possible callers, and given typical object sizes,
> @@ -38,7 +41,7 @@ static int i915_gem_object_get_pages_phys(struct drm_i915_gem_object *obj)
>  			     roundup_pow_of_two(obj->base.size),
>  			     roundup_pow_of_two(obj->base.size));
>  	if (!phys)
> -		return -ENOMEM;
> +		return ERR_PTR(-ENOMEM);
>  
>  	vaddr = phys->vaddr;
>  	for (i = 0; i < obj->base.size / PAGE_SIZE; i++) {
> @@ -83,19 +86,16 @@ static int i915_gem_object_get_pages_phys(struct drm_i915_gem_object *obj)
>  
>  	obj->phys_handle = phys;
>  
> -	__i915_gem_object_set_pages(obj, st, sg->length);
> -
> -	return 0;
> +	*sizes = sg->length;
> +	return st;
>  
>  err_phys:
>  	drm_pci_free(obj->base.dev, phys);
> -
> -	return err;
> +	return ERR_PTR(err);
>  }
>  
>  static void
> -i915_gem_object_put_pages_phys(struct drm_i915_gem_object *obj,
> -			       struct sg_table *pages)
> +phys_put_pages(struct drm_i915_gem_object *obj, struct sg_table *pages)
>  {
>  	__i915_gem_object_release_shmem(obj, pages, false);
>  
> @@ -139,8 +139,8 @@ i915_gem_object_release_phys(struct drm_i915_gem_object *obj)
>  }
>  
>  static const struct drm_i915_gem_object_ops i915_gem_phys_ops = {
> -	.get_pages = i915_gem_object_get_pages_phys,
> -	.put_pages = i915_gem_object_put_pages_phys,
> +	.get_pages = phys_get_pages,
> +	.put_pages = phys_put_pages,
>  	.release = i915_gem_object_release_phys,
>  };
>  
> @@ -193,15 +193,12 @@ int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj, int align)
>  	if (!IS_ERR_OR_NULL(pages))
>  		i915_gem_shmem_ops.put_pages(obj, pages);
>  	mutex_unlock(&obj->mm.lock);
> +
> +	wait_for_completion(&obj->mm.completion);
>  	return 0;
>  
>  err_xfer:
>  	obj->ops = &i915_gem_shmem_ops;
> -	if (!IS_ERR_OR_NULL(pages)) {
> -		unsigned int sg_page_sizes = i915_sg_page_sizes(pages->sgl);
> -
> -		__i915_gem_object_set_pages(obj, pages, sg_page_sizes);
> -	}
>  err_unlock:
>  	mutex_unlock(&obj->mm.lock);
>  	return err;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> index 19d9ecdb2894..c43304e3bada 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> @@ -22,11 +22,13 @@ static void check_release_pagevec(struct pagevec *pvec)
>  	cond_resched();
>  }
>  
> -static int shmem_get_pages(struct drm_i915_gem_object *obj)
> +static struct sg_table *
> +shmem_get_pages(struct i915_gem_object_get_pages_context *ctx,
> +		unsigned int *sizes)
>  {
> +	struct drm_i915_gem_object *obj = ctx->object;
>  	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>  	const unsigned long page_count = obj->base.size / PAGE_SIZE;
> -	unsigned long i;
>  	struct address_space *mapping;
>  	struct sg_table *st;
>  	struct scatterlist *sg;
> @@ -37,31 +39,24 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj)
>  	unsigned int sg_page_sizes;
>  	struct pagevec pvec;
>  	gfp_t noreclaim;
> +	unsigned long i;
>  	int ret;
>  
> -	/*
> -	 * Assert that the object is not currently in any GPU domain. As it
> -	 * wasn't in the GTT, there shouldn't be any way it could have been in
> -	 * a GPU cache
> -	 */
> -	GEM_BUG_ON(obj->read_domains & I915_GEM_GPU_DOMAINS);
> -	GEM_BUG_ON(obj->write_domain & I915_GEM_GPU_DOMAINS);
> -
>  	/*
>  	 * If there's no chance of allocating enough pages for the whole
>  	 * object, bail early.
>  	 */
>  	if (page_count > totalram_pages())
> -		return -ENOMEM;
> +		return ERR_PTR(-ENOMEM);
>  
>  	st = kmalloc(sizeof(*st), GFP_KERNEL);
>  	if (!st)
> -		return -ENOMEM;
> +		return ERR_PTR(-ENOMEM);
>  
>  rebuild_st:
>  	if (sg_alloc_table(st, page_count, GFP_KERNEL)) {
>  		kfree(st);
> -		return -ENOMEM;
> +		return ERR_PTR(-ENOMEM);
>  	}
>  
>  	/*
> @@ -179,9 +174,8 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj)
>  	if (i915_gem_object_needs_bit17_swizzle(obj))
>  		i915_gem_object_do_bit_17_swizzle(obj, st);
>  
> -	__i915_gem_object_set_pages(obj, st, sg_page_sizes);
> -
> -	return 0;
> +	*sizes = sg_page_sizes;
> +	return st;
>  
>  err_sg:
>  	sg_mark_end(sg);
> @@ -209,7 +203,7 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj)
>  	if (ret == -ENOSPC)
>  		ret = -ENOMEM;
>  
> -	return ret;
> +	return ERR_PTR(ret);
>  }
>  
>  static void
> @@ -276,8 +270,6 @@ __i915_gem_object_release_shmem(struct drm_i915_gem_object *obj,
>  				struct sg_table *pages,
>  				bool needs_clflush)
>  {
> -	GEM_BUG_ON(obj->mm.madv == __I915_MADV_PURGED);
> -
>  	if (obj->mm.madv == I915_MADV_DONTNEED)
>  		obj->mm.dirty = false;
>  
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> index 7066044d63cf..28ea06e667cd 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> @@ -499,22 +499,19 @@ i915_pages_create_for_stolen(struct drm_device *dev,
>  	return st;
>  }
>  
> -static int i915_gem_object_get_pages_stolen(struct drm_i915_gem_object *obj)
> +static struct sg_table *
> +stolen_get_pages(struct i915_gem_object_get_pages_context *ctx,
> +		 unsigned int *sizes)
>  {
> -	struct sg_table *pages =
> -		i915_pages_create_for_stolen(obj->base.dev,
> -					     obj->stolen->start,
> -					     obj->stolen->size);
> -	if (IS_ERR(pages))
> -		return PTR_ERR(pages);
> -
> -	__i915_gem_object_set_pages(obj, pages, obj->stolen->size);
> +	struct drm_i915_gem_object *obj = ctx->object;
>  
> -	return 0;
> +	return i915_pages_create_for_stolen(obj->base.dev,
> +					    obj->stolen->start,
> +					    obj->stolen->size);
>  }
>  
> -static void i915_gem_object_put_pages_stolen(struct drm_i915_gem_object *obj,
> -					     struct sg_table *pages)
> +static void
> +stolen_put_pages(struct drm_i915_gem_object *obj, struct sg_table *pages)
>  {
>  	/* Should only be called from i915_gem_object_release_stolen() */
>  	sg_free_table(pages);
> @@ -536,8 +533,8 @@ i915_gem_object_release_stolen(struct drm_i915_gem_object *obj)
>  }
>  
>  static const struct drm_i915_gem_object_ops i915_gem_object_stolen_ops = {
> -	.get_pages = i915_gem_object_get_pages_stolen,
> -	.put_pages = i915_gem_object_put_pages_stolen,
> +	.get_pages = stolen_get_pages,
> +	.put_pages = stolen_put_pages,
>  	.release = i915_gem_object_release_stolen,
>  };
>  
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> index f093deaeb5c0..6748e15bf89a 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> @@ -378,7 +378,7 @@ __i915_mm_struct_free(struct kref *kref)
>  	mutex_unlock(&mm->i915->mm_lock);
>  
>  	INIT_WORK(&mm->work, __i915_mm_struct_free__worker);
> -	queue_work(mm->i915->mm.userptr_wq, &mm->work);
> +	schedule_work(&mm->work);
>  }
>  
>  static void
> @@ -393,19 +393,12 @@ i915_gem_userptr_release__mm_struct(struct drm_i915_gem_object *obj)
>  	obj->userptr.mm = NULL;
>  }
>  
> -struct get_pages_work {
> -	struct work_struct work;
> -	struct drm_i915_gem_object *obj;
> -	struct task_struct *task;
> -};
> -
>  static struct sg_table *
>  __i915_gem_userptr_alloc_pages(struct drm_i915_gem_object *obj,
>  			       struct page **pvec, int num_pages)
>  {
>  	unsigned int max_segment = i915_sg_segment_size();
>  	struct sg_table *st;
> -	unsigned int sg_page_sizes;
>  	int ret;
>  
>  	st = kmalloc(sizeof(*st), GFP_KERNEL);
> @@ -435,131 +428,23 @@ __i915_gem_userptr_alloc_pages(struct drm_i915_gem_object *obj,
>  		return ERR_PTR(ret);
>  	}
>  
> -	sg_page_sizes = i915_sg_page_sizes(st->sgl);
> -
> -	__i915_gem_object_set_pages(obj, st, sg_page_sizes);
> -
>  	return st;
>  }
>  
> -static void
> -__i915_gem_userptr_get_pages_worker(struct work_struct *_work)
> -{
> -	struct get_pages_work *work = container_of(_work, typeof(*work), work);
> -	struct drm_i915_gem_object *obj = work->obj;
> -	const int npages = obj->base.size >> PAGE_SHIFT;
> -	struct page **pvec;
> -	int pinned, ret;
> -
> -	ret = -ENOMEM;
> -	pinned = 0;
> -
> -	pvec = kvmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
> -	if (pvec != NULL) {
> -		struct mm_struct *mm = obj->userptr.mm->mm;
> -		unsigned int flags = 0;
> -
> -		if (!i915_gem_object_is_readonly(obj))
> -			flags |= FOLL_WRITE;
> -
> -		ret = -EFAULT;
> -		if (mmget_not_zero(mm)) {
> -			down_read(&mm->mmap_sem);
> -			while (pinned < npages) {
> -				ret = get_user_pages_remote
> -					(work->task, mm,
> -					 obj->userptr.ptr + pinned * PAGE_SIZE,
> -					 npages - pinned,
> -					 flags,
> -					 pvec + pinned, NULL, NULL);
> -				if (ret < 0)
> -					break;
> -
> -				pinned += ret;
> -			}
> -			up_read(&mm->mmap_sem);
> -			mmput(mm);
> -		}
> -	}
> -
> -	mutex_lock(&obj->mm.lock);
> -	if (obj->userptr.work == &work->work) {
> -		struct sg_table *pages = ERR_PTR(ret);
> -
> -		if (pinned == npages) {
> -			pages = __i915_gem_userptr_alloc_pages(obj, pvec,
> -							       npages);
> -			if (!IS_ERR(pages)) {
> -				pinned = 0;
> -				pages = NULL;
> -			}
> -		}
> -
> -		obj->userptr.work = ERR_CAST(pages);
> -		if (IS_ERR(pages))
> -			__i915_gem_userptr_set_active(obj, false);
> -	}
> -	mutex_unlock(&obj->mm.lock);
> -
> -	release_pages(pvec, pinned);
> -	kvfree(pvec);
> -
> -	i915_gem_object_put(obj);
> -	put_task_struct(work->task);
> -	kfree(work);
> -}
> -
>  static struct sg_table *
> -__i915_gem_userptr_get_pages_schedule(struct drm_i915_gem_object *obj)
> -{
> -	struct get_pages_work *work;
> -
> -	/* Spawn a worker so that we can acquire the
> -	 * user pages without holding our mutex. Access
> -	 * to the user pages requires mmap_sem, and we have
> -	 * a strict lock ordering of mmap_sem, struct_mutex -
> -	 * we already hold struct_mutex here and so cannot
> -	 * call gup without encountering a lock inversion.
> -	 *
> -	 * Userspace will keep on repeating the operation
> -	 * (thanks to EAGAIN) until either we hit the fast
> -	 * path or the worker completes. If the worker is
> -	 * cancelled or superseded, the task is still run
> -	 * but the results ignored. (This leads to
> -	 * complications that we may have a stray object
> -	 * refcount that we need to be wary of when
> -	 * checking for existing objects during creation.)
> -	 * If the worker encounters an error, it reports
> -	 * that error back to this function through
> -	 * obj->userptr.work = ERR_PTR.
> -	 */
> -	work = kmalloc(sizeof(*work), GFP_KERNEL);
> -	if (work == NULL)
> -		return ERR_PTR(-ENOMEM);
> -
> -	obj->userptr.work = &work->work;
> -
> -	work->obj = i915_gem_object_get(obj);
> -
> -	work->task = current;
> -	get_task_struct(work->task);
> -
> -	INIT_WORK(&work->work, __i915_gem_userptr_get_pages_worker);
> -	queue_work(to_i915(obj->base.dev)->mm.userptr_wq, &work->work);
> -
> -	return ERR_PTR(-EAGAIN);
> -}
> -
> -static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
> +userptr_get_pages(struct i915_gem_object_get_pages_context *ctx,
> +		  unsigned int *sizes)
>  {
> +	struct drm_i915_gem_object *obj = ctx->object;
>  	const int num_pages = obj->base.size >> PAGE_SHIFT;
>  	struct mm_struct *mm = obj->userptr.mm->mm;
> -	struct page **pvec;
>  	struct sg_table *pages;
> -	bool active;
> +	struct page **pvec;
>  	int pinned;
> +	int ret;
>  
> -	/* If userspace should engineer that these pages are replaced in
> +	/*
> +	 * If userspace should engineer that these pages are replaced in
>  	 * the vma between us binding this page into the GTT and completion
>  	 * of rendering... Their loss. If they change the mapping of their
>  	 * pages they need to create a new bo to point to the new vma.
> @@ -576,59 +461,58 @@ static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
>  	 * egregious cases from causing harm.
>  	 */
>  
> -	if (obj->userptr.work) {
> -		/* active flag should still be held for the pending work */
> -		if (IS_ERR(obj->userptr.work))
> -			return PTR_ERR(obj->userptr.work);
> -		else
> -			return -EAGAIN;
> -	}
> +	pvec = kvmalloc_array(num_pages, sizeof(struct page *),
> +			      GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
> +	if (!pvec)
> +		return ERR_PTR(-ENOMEM);
> +
> +	__i915_gem_userptr_set_active(obj, true);
>  
> -	pvec = NULL;
>  	pinned = 0;
> +	ret = -EFAULT;
> +	if (mmget_not_zero(mm)) {
> +		unsigned int flags;
>  
> -	if (mm == current->mm) {
> -		pvec = kvmalloc_array(num_pages, sizeof(struct page *),
> -				      GFP_KERNEL |
> -				      __GFP_NORETRY |
> -				      __GFP_NOWARN);
> -		if (pvec) /* defer to worker if malloc fails */
> -			pinned = __get_user_pages_fast(obj->userptr.ptr,
> -						       num_pages,
> -						       !i915_gem_object_is_readonly(obj),
> -						       pvec);
> -	}
> +		flags = 0;
> +		if (!i915_gem_object_is_readonly(obj))
> +			flags |= FOLL_WRITE;
>  
> -	active = false;
> -	if (pinned < 0) {
> -		pages = ERR_PTR(pinned);
> -		pinned = 0;
> -	} else if (pinned < num_pages) {
> -		pages = __i915_gem_userptr_get_pages_schedule(obj);
> -		active = pages == ERR_PTR(-EAGAIN);
> -	} else {
> -		pages = __i915_gem_userptr_alloc_pages(obj, pvec, num_pages);
> -		active = !IS_ERR(pages);
> +		down_read(&mm->mmap_sem);
> +		while (pinned < num_pages) {
> +			ret = get_user_pages_remote
> +				(ctx->task, mm,
> +				 obj->userptr.ptr + pinned * PAGE_SIZE,
> +				 num_pages - pinned,
> +				 flags,
> +				 pvec + pinned, NULL, NULL);
> +			if (ret < 0)
> +				break;
> +
> +			pinned += ret;
> +		}
> +		up_read(&mm->mmap_sem);
> +		mmput(mm);
>  	}
> -	if (active)
> -		__i915_gem_userptr_set_active(obj, true);
>  
> -	if (IS_ERR(pages))
> +	if (ret)
> +		pages = ERR_PTR(ret);
> +	else
> +		pages = __i915_gem_userptr_alloc_pages(obj, pvec, num_pages);
> +	if (IS_ERR(pages)) {
>  		release_pages(pvec, pinned);
> +		__i915_gem_userptr_set_active(obj, false);
> +	}
>  	kvfree(pvec);
>  
> -	return PTR_ERR_OR_ZERO(pages);
> +	return pages;
>  }
>  
>  static void
> -i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj,
> -			   struct sg_table *pages)
> +userptr_put_pages(struct drm_i915_gem_object *obj, struct sg_table *pages)
>  {
>  	struct sgt_iter sgt_iter;
>  	struct page *page;
>  
> -	/* Cancel any inflight work and force them to restart their gup */
> -	obj->userptr.work = NULL;
>  	__i915_gem_userptr_set_active(obj, false);
>  	if (!pages)
>  		return;
> @@ -669,8 +553,8 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
>  	.flags = I915_GEM_OBJECT_HAS_STRUCT_PAGE |
>  		 I915_GEM_OBJECT_IS_SHRINKABLE |
>  		 I915_GEM_OBJECT_ASYNC_CANCEL,
> -	.get_pages = i915_gem_userptr_get_pages,
> -	.put_pages = i915_gem_userptr_put_pages,
> +	.get_pages = userptr_get_pages,
> +	.put_pages = userptr_put_pages,
>  	.dmabuf_export = i915_gem_userptr_dmabuf_export,
>  	.release = i915_gem_userptr_release,
>  };
> @@ -786,22 +670,13 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
>  	return 0;
>  }
>  
> -int i915_gem_init_userptr(struct drm_i915_private *dev_priv)
> +void i915_gem_init_userptr(struct drm_i915_private *dev_priv)
>  {
>  	mutex_init(&dev_priv->mm_lock);
>  	hash_init(dev_priv->mm_structs);
> -
> -	dev_priv->mm.userptr_wq =
> -		alloc_workqueue("i915-userptr-acquire",
> -				WQ_HIGHPRI | WQ_UNBOUND,
> -				0);
> -	if (!dev_priv->mm.userptr_wq)
> -		return -ENOMEM;
> -
> -	return 0;
>  }
>  
>  void i915_gem_cleanup_userptr(struct drm_i915_private *dev_priv)
>  {
> -	destroy_workqueue(dev_priv->mm.userptr_wq);
> +	mutex_destroy(&dev_priv->mm_lock);
>  }
> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
> index 3c5d17b2b670..02e6edce715e 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
> @@ -21,9 +21,12 @@ static void huge_free_pages(struct drm_i915_gem_object *obj,
>  	kfree(pages);
>  }
>  
> -static int huge_get_pages(struct drm_i915_gem_object *obj)
> +static struct sg_table *
> +huge_get_pages(struct i915_gem_object_get_pages_context *ctx,
> +	       unsigned int *sizes)
>  {
>  #define GFP (GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY)
> +	struct drm_i915_gem_object *obj = ctx->object;
>  	const unsigned long nreal = obj->scratch / PAGE_SIZE;
>  	const unsigned long npages = obj->base.size / PAGE_SIZE;
>  	struct scatterlist *sg, *src, *end;
> @@ -32,11 +35,11 @@ static int huge_get_pages(struct drm_i915_gem_object *obj)
>  
>  	pages = kmalloc(sizeof(*pages), GFP);
>  	if (!pages)
> -		return -ENOMEM;
> +		return ERR_PTR(-ENOMEM);
>  
>  	if (sg_alloc_table(pages, npages, GFP)) {
>  		kfree(pages);
> -		return -ENOMEM;
> +		return ERR_PTR(-ENOMEM);
>  	}
>  
>  	sg = pages->sgl;
> @@ -64,14 +67,12 @@ static int huge_get_pages(struct drm_i915_gem_object *obj)
>  	if (i915_gem_gtt_prepare_pages(obj, pages))
>  		goto err;
>  
> -	__i915_gem_object_set_pages(obj, pages, PAGE_SIZE);
> -
> -	return 0;
> +	*sizes = PAGE_SIZE;
> +	return pages;
>  
>  err:
>  	huge_free_pages(obj, pages);
> -
> -	return -ENOMEM;
> +	return ERR_PTR(-ENOMEM);
>  #undef GFP
>  }
>  
> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> index b03a29366bd1..3b93a7337f27 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> @@ -51,9 +51,12 @@ static void huge_pages_free_pages(struct sg_table *st)
>  	kfree(st);
>  }
>  
> -static int get_huge_pages(struct drm_i915_gem_object *obj)
> +static struct sg_table *
> +get_huge_pages(struct i915_gem_object_get_pages_context *ctx,
> +	       unsigned int *sizes)
>  {
>  #define GFP (GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY)
> +	struct drm_i915_gem_object *obj = ctx->object;
>  	unsigned int page_mask = obj->mm.page_mask;
>  	struct sg_table *st;
>  	struct scatterlist *sg;
> @@ -62,11 +65,11 @@ static int get_huge_pages(struct drm_i915_gem_object *obj)
>  
>  	st = kmalloc(sizeof(*st), GFP);
>  	if (!st)
> -		return -ENOMEM;
> +		return ERR_PTR(-ENOMEM);
>  
>  	if (sg_alloc_table(st, obj->base.size >> PAGE_SHIFT, GFP)) {
>  		kfree(st);
> -		return -ENOMEM;
> +		return ERR_PTR(-ENOMEM);
>  	}
>  
>  	rem = obj->base.size;
> @@ -114,16 +117,14 @@ static int get_huge_pages(struct drm_i915_gem_object *obj)
>  	obj->mm.madv = I915_MADV_DONTNEED;
>  
>  	GEM_BUG_ON(sg_page_sizes != obj->mm.page_mask);
> -	__i915_gem_object_set_pages(obj, st, sg_page_sizes);
> -
> -	return 0;
> +	*sizes = sg_page_sizes;
> +	return st;
>  
>  err:
>  	sg_set_page(sg, NULL, 0, 0);
>  	sg_mark_end(sg);
>  	huge_pages_free_pages(st);
> -
> -	return -ENOMEM;
> +	return ERR_PTR(-ENOMEM);
>  }
>  
>  static void put_huge_pages(struct drm_i915_gem_object *obj,
> @@ -175,8 +176,11 @@ huge_pages_object(struct drm_i915_private *i915,
>  	return obj;
>  }
>  
> -static int fake_get_huge_pages(struct drm_i915_gem_object *obj)
> +static struct sg_table *
> +fake_get_huge_pages(struct i915_gem_object_get_pages_context *ctx,
> +		    unsigned int *sizes)
>  {
> +	struct drm_i915_gem_object *obj = ctx->object;
>  	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>  	const u64 max_len = rounddown_pow_of_two(UINT_MAX);
>  	struct sg_table *st;
> @@ -186,11 +190,11 @@ static int fake_get_huge_pages(struct drm_i915_gem_object *obj)
>  
>  	st = kmalloc(sizeof(*st), GFP);
>  	if (!st)
> -		return -ENOMEM;
> +		return ERR_PTR(-ENOMEM);
>  
>  	if (sg_alloc_table(st, obj->base.size >> PAGE_SHIFT, GFP)) {
>  		kfree(st);
> -		return -ENOMEM;
> +		return ERR_PTR(-ENOMEM);
>  	}
>  
>  	/* Use optimal page sized chunks to fill in the sg table */
> @@ -227,13 +231,15 @@ static int fake_get_huge_pages(struct drm_i915_gem_object *obj)
>  
>  	obj->mm.madv = I915_MADV_DONTNEED;
>  
> -	__i915_gem_object_set_pages(obj, st, sg_page_sizes);
> -
> -	return 0;
> +	*sizes = sg_page_sizes;
> +	return st;
>  }
>  
> -static int fake_get_huge_pages_single(struct drm_i915_gem_object *obj)
> +static struct sg_table *
> +fake_get_huge_pages_single(struct i915_gem_object_get_pages_context *ctx,
> +			   unsigned int *sizes)
>  {
> +	struct drm_i915_gem_object *obj = ctx->object;
>  	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>  	struct sg_table *st;
>  	struct scatterlist *sg;
> @@ -241,11 +247,11 @@ static int fake_get_huge_pages_single(struct drm_i915_gem_object *obj)
>  
>  	st = kmalloc(sizeof(*st), GFP);
>  	if (!st)
> -		return -ENOMEM;
> +		return ERR_PTR(-ENOMEM);
>  
>  	if (sg_alloc_table(st, 1, GFP)) {
>  		kfree(st);
> -		return -ENOMEM;
> +		return ERR_PTR(-ENOMEM);
>  	}
>  
>  	sg = st->sgl;
> @@ -261,9 +267,8 @@ static int fake_get_huge_pages_single(struct drm_i915_gem_object *obj)
>  
>  	obj->mm.madv = I915_MADV_DONTNEED;
>  
> -	__i915_gem_object_set_pages(obj, st, sg->length);
> -
> -	return 0;
> +	*sizes = sg->length;
> +	return st;
>  #undef GFP
>  }
>  
> diff --git a/drivers/gpu/drm/i915/gvt/dmabuf.c b/drivers/gpu/drm/i915/gvt/dmabuf.c
> index 41c8ebc60c63..c2a81939f698 100644
> --- a/drivers/gpu/drm/i915/gvt/dmabuf.c
> +++ b/drivers/gpu/drm/i915/gvt/dmabuf.c
> @@ -36,9 +36,11 @@
>  
>  #define GEN8_DECODE_PTE(pte) (pte & GENMASK_ULL(63, 12))
>  
> -static int vgpu_gem_get_pages(
> -		struct drm_i915_gem_object *obj)
> +static struct sg_table *
> +vgpu_gem_get_pages(struct i915_gem_object_get_pages_context *ctx,
> +		   unsigned int *sizes)
>  {
> +	struct drm_i915_gem_object *obj = ctx->object;
>  	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
>  	struct sg_table *st;
>  	struct scatterlist *sg;
> @@ -49,17 +51,17 @@ static int vgpu_gem_get_pages(
>  
>  	fb_info = (struct intel_vgpu_fb_info *)obj->gvt_info;
>  	if (WARN_ON(!fb_info))
> -		return -ENODEV;
> +		return ERR_PTR(-ENODEV);
>  
>  	st = kmalloc(sizeof(*st), GFP_KERNEL);
>  	if (unlikely(!st))
> -		return -ENOMEM;
> +		return ERR_PTR(-ENOMEM);
>  
>  	page_num = obj->base.size >> PAGE_SHIFT;
>  	ret = sg_alloc_table(st, page_num, GFP_KERNEL);
>  	if (ret) {
>  		kfree(st);
> -		return ret;
> +		return ERR_PTR(ret);
>  	}
>  	gtt_entries = (gen8_pte_t __iomem *)dev_priv->ggtt.gsm +
>  		(fb_info->start >> PAGE_SHIFT);
> @@ -71,9 +73,8 @@ static int vgpu_gem_get_pages(
>  		sg_dma_len(sg) = PAGE_SIZE;
>  	}
>  
> -	__i915_gem_object_set_pages(obj, st, PAGE_SIZE);
> -
> -	return 0;
> +	*sizes = PAGE_SIZE;
> +	return st;
>  }
>  
>  static void vgpu_gem_put_pages(struct drm_i915_gem_object *obj,
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 9a1ec487e8b1..ef9e4cb49b4a 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -786,13 +786,6 @@ struct i915_gem_mm {
>  	struct notifier_block vmap_notifier;
>  	struct shrinker shrinker;
>  
> -	/**
> -	 * Workqueue to fault in userptr pages, flushed by the execbuf
> -	 * when required but otherwise left to userspace to try again
> -	 * on EAGAIN.
> -	 */
> -	struct workqueue_struct *userptr_wq;
> -
>  	u64 unordered_timeline;
>  
>  	/* the indicator for dispatch video commands on two BSD rings */
> @@ -2514,7 +2507,7 @@ static inline bool intel_vgpu_active(struct drm_i915_private *dev_priv)
>  }
>  
>  /* i915_gem.c */
> -int i915_gem_init_userptr(struct drm_i915_private *dev_priv);
> +void i915_gem_init_userptr(struct drm_i915_private *dev_priv);
>  void i915_gem_cleanup_userptr(struct drm_i915_private *dev_priv);
>  void i915_gem_sanitize(struct drm_i915_private *i915);
>  int i915_gem_init_early(struct drm_i915_private *dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 21416b87905e..1571c707ad15 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1544,10 +1544,7 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
>  	dev_priv->mm.unordered_timeline = dma_fence_context_alloc(1);
>  
>  	i915_timelines_init(dev_priv);
> -
> -	ret = i915_gem_init_userptr(dev_priv);
> -	if (ret)
> -		return ret;
> +	i915_gem_init_userptr(dev_priv);
>  
>  	ret = intel_uc_init_misc(dev_priv);
>  	if (ret)
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> index 664b7e3e1d7c..dae1f8634a38 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> @@ -54,10 +54,13 @@ static void fake_free_pages(struct drm_i915_gem_object *obj,
>  	kfree(pages);
>  }
>  
> -static int fake_get_pages(struct drm_i915_gem_object *obj)
> +static struct sg_table *
> +fake_get_pages(struct i915_gem_object_get_pages_context *ctx,
> +	       unsigned int *sizes)
>  {
>  #define GFP (GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY)
>  #define PFN_BIAS 0x1000
> +	struct drm_i915_gem_object *obj = ctx->object;
>  	struct sg_table *pages;
>  	struct scatterlist *sg;
>  	unsigned int sg_page_sizes;
> @@ -65,12 +68,12 @@ static int fake_get_pages(struct drm_i915_gem_object *obj)
>  
>  	pages = kmalloc(sizeof(*pages), GFP);
>  	if (!pages)
> -		return -ENOMEM;
> +		return ERR_PTR(-ENOMEM);
>  
>  	rem = round_up(obj->base.size, BIT(31)) >> 31;
>  	if (sg_alloc_table(pages, rem, GFP)) {
>  		kfree(pages);
> -		return -ENOMEM;
> +		return ERR_PTR(-ENOMEM);
>  	}
>  
>  	sg_page_sizes = 0;
> @@ -90,9 +93,8 @@ static int fake_get_pages(struct drm_i915_gem_object *obj)
>  
>  	obj->mm.madv = I915_MADV_DONTNEED;
>  
> -	__i915_gem_object_set_pages(obj, pages, sg_page_sizes);
> -
> -	return 0;
> +	*sizes = sg_page_sizes;
> +	return pages;
>  #undef GFP
>  }
>  
> -- 
> 2.20.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Intel-gfx] [PATCH 36/39] drm/i915: Use reservation_object to coordinate userptr get_pages()
  2019-08-07 20:49   ` Daniel Vetter
@ 2019-08-08  7:30     ` Daniel Vetter
  0 siblings, 0 replies; 61+ messages in thread
From: Daniel Vetter @ 2019-08-08  7:30 UTC (permalink / raw)
  To: Chris Wilson, Christian König, Matthew Auld, Tvrtko Ursulin,
	Joonas Lahtinen, CQ Tang
  Cc: intel-gfx, dri-devel

Adding a pile of people since there are lots of discussions going on around this.
-Daniel

On Wed, Aug 7, 2019 at 10:50 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Fri, Jun 14, 2019 at 08:10:20AM +0100, Chris Wilson wrote:
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>
> Not sure this works, 2 thoughts way down ...
>
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |  27 +--
> >  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   3 -
> >  drivers/gpu/drm/i915/gem/i915_gem_internal.c  |  30 +--
> >  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  11 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_pages.c     | 137 ++++++++++-
> >  drivers/gpu/drm/i915/gem/i915_gem_phys.c      |  33 ++-
> >  drivers/gpu/drm/i915/gem/i915_gem_shmem.c     |  30 +--
> >  drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |  25 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_userptr.c   | 221 ++++--------------
> >  .../drm/i915/gem/selftests/huge_gem_object.c  |  17 +-
> >  .../gpu/drm/i915/gem/selftests/huge_pages.c   |  45 ++--
> >  drivers/gpu/drm/i915/gvt/dmabuf.c             |  17 +-
> >  drivers/gpu/drm/i915/i915_drv.h               |   9 +-
> >  drivers/gpu/drm/i915/i915_gem.c               |   5 +-
> >  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  14 +-
> >  15 files changed, 300 insertions(+), 324 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > index 84992d590da5..a44d6d2ef7ed 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > @@ -225,33 +225,24 @@ struct dma_buf *i915_gem_prime_export(struct drm_device *dev,
> >       return drm_gem_dmabuf_export(dev, &exp_info);
> >  }
> >
> > -static int i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
> > +static struct sg_table *
> > +dmabuf_get_pages(struct i915_gem_object_get_pages_context *ctx,
> > +              unsigned int *sizes)
> >  {
> > -     struct sg_table *pages;
> > -     unsigned int sg_page_sizes;
> > -
> > -     pages = dma_buf_map_attachment(obj->base.import_attach,
> > -                                    DMA_BIDIRECTIONAL);
> > -     if (IS_ERR(pages))
> > -             return PTR_ERR(pages);
> > -
> > -     sg_page_sizes = i915_sg_page_sizes(pages->sgl);
> > -
> > -     __i915_gem_object_set_pages(obj, pages, sg_page_sizes);
> > -
> > -     return 0;
> > +     return dma_buf_map_attachment(ctx->object->base.import_attach,
> > +                                   DMA_BIDIRECTIONAL);
> >  }
> >
> > -static void i915_gem_object_put_pages_dmabuf(struct drm_i915_gem_object *obj,
> > -                                          struct sg_table *pages)
> > +static void dmabuf_put_pages(struct drm_i915_gem_object *obj,
> > +                          struct sg_table *pages)
> >  {
> >       dma_buf_unmap_attachment(obj->base.import_attach, pages,
> >                                DMA_BIDIRECTIONAL);
> >  }
> >
> >  static const struct drm_i915_gem_object_ops i915_gem_object_dmabuf_ops = {
> > -     .get_pages = i915_gem_object_get_pages_dmabuf,
> > -     .put_pages = i915_gem_object_put_pages_dmabuf,
> > +     .get_pages = dmabuf_get_pages,
> > +     .put_pages = dmabuf_put_pages,
> >  };
> >
> >  struct drm_gem_object *i915_gem_prime_import(struct drm_device *dev,
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > index 44bcb681c168..68faf1a71c97 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > @@ -1784,9 +1784,6 @@ static noinline int eb_relocate_slow(struct i915_execbuffer *eb)
> >               goto out;
> >       }
> >
> > -     /* A frequent cause for EAGAIN are currently unavailable client pages */
> > -     flush_workqueue(eb->i915->mm.userptr_wq);
> > -
> >       err = i915_mutex_lock_interruptible(dev);
> >       if (err) {
> >               mutex_lock(&dev->struct_mutex);
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> > index 0c41e04ab8fa..aa0bd5de313b 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> > @@ -32,11 +32,14 @@ static void internal_free_pages(struct sg_table *st)
> >       kfree(st);
> >  }
> >
> > -static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj)
> > +static struct sg_table *
> > +internal_get_pages(struct i915_gem_object_get_pages_context *ctx,
> > +                unsigned int *sizes)
> >  {
> > +     struct drm_i915_gem_object *obj = ctx->object;
> >       struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > -     struct sg_table *st;
> >       struct scatterlist *sg;
> > +     struct sg_table *st;
> >       unsigned int sg_page_sizes;
> >       unsigned int npages;
> >       int max_order;
> > @@ -66,12 +69,12 @@ static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj)
> >  create_st:
> >       st = kmalloc(sizeof(*st), GFP_KERNEL);
> >       if (!st)
> > -             return -ENOMEM;
> > +             return ERR_PTR(-ENOMEM);
> >
> >       npages = obj->base.size / PAGE_SIZE;
> >       if (sg_alloc_table(st, npages, GFP_KERNEL)) {
> >               kfree(st);
> > -             return -ENOMEM;
> > +             return ERR_PTR(-ENOMEM);
> >       }
> >
> >       sg = st->sgl;
> > @@ -117,27 +120,26 @@ static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj)
> >               goto err;
> >       }
> >
> > -     /* Mark the pages as dontneed whilst they are still pinned. As soon
> > +     /*
> > +      * Mark the pages as dontneed whilst they are still pinned. As soon
> >        * as they are unpinned they are allowed to be reaped by the shrinker,
> >        * and the caller is expected to repopulate - the contents of this
> >        * object are only valid whilst active and pinned.
> >        */
> >       obj->mm.madv = I915_MADV_DONTNEED;
> >
> > -     __i915_gem_object_set_pages(obj, st, sg_page_sizes);
> > -
> > -     return 0;
> > +     *sizes = sg_page_sizes;
> > +     return st;
> >
> >  err:
> >       sg_set_page(sg, NULL, 0, 0);
> >       sg_mark_end(sg);
> >       internal_free_pages(st);
> > -
> > -     return -ENOMEM;
> > +     return ERR_PTR(-ENOMEM);
> >  }
> >
> > -static void i915_gem_object_put_pages_internal(struct drm_i915_gem_object *obj,
> > -                                            struct sg_table *pages)
> > +static void internal_put_pages(struct drm_i915_gem_object *obj,
> > +                            struct sg_table *pages)
> >  {
> >       i915_gem_gtt_finish_pages(obj, pages);
> >       internal_free_pages(pages);
> > @@ -149,8 +151,8 @@ static void i915_gem_object_put_pages_internal(struct drm_i915_gem_object *obj,
> >  static const struct drm_i915_gem_object_ops i915_gem_object_internal_ops = {
> >       .flags = I915_GEM_OBJECT_HAS_STRUCT_PAGE |
> >                I915_GEM_OBJECT_IS_SHRINKABLE,
> > -     .get_pages = i915_gem_object_get_pages_internal,
> > -     .put_pages = i915_gem_object_put_pages_internal,
> > +     .get_pages = internal_get_pages,
> > +     .put_pages = internal_put_pages,
> >  };
> >
> >  /**
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > index f792953b8a71..0ea404cfbc1c 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > @@ -19,6 +19,7 @@
> >
> >  struct drm_i915_gem_object;
> >  struct intel_fronbuffer;
> > +struct task_struct;
> >
> >  /*
> >   * struct i915_lut_handle tracks the fast lookups from handle to vma used
> > @@ -32,6 +33,11 @@ struct i915_lut_handle {
> >       u32 handle;
> >  };
> >
> > +struct i915_gem_object_get_pages_context  {
> > +     struct drm_i915_gem_object *object;
> > +     struct task_struct *task;
> > +};
> > +
> >  struct drm_i915_gem_object_ops {
> >       unsigned int flags;
> >  #define I915_GEM_OBJECT_HAS_STRUCT_PAGE      BIT(0)
> > @@ -52,9 +58,11 @@ struct drm_i915_gem_object_ops {
> >        * being released or under memory pressure (where we attempt to
> >        * reap pages for the shrinker).
> >        */
> > -     int (*get_pages)(struct drm_i915_gem_object *obj);
> > +     struct sg_table *(*get_pages)(struct i915_gem_object_get_pages_context *ctx,
> > +                                   unsigned int *sizes);
> >       void (*put_pages)(struct drm_i915_gem_object *obj,
> >                         struct sg_table *pages);
> > +
> >       void (*truncate)(struct drm_i915_gem_object *obj);
> >       void (*writeback)(struct drm_i915_gem_object *obj);
> >
> > @@ -252,7 +260,6 @@ struct drm_i915_gem_object {
> >
> >                       struct i915_mm_struct *mm;
> >                       struct i915_mmu_object *mmu_object;
> > -                     struct work_struct *work;
> >               } userptr;
> >
> >               unsigned long scratch;
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> > index 6bec301cee79..f65a983248c6 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> > @@ -8,16 +8,49 @@
> >  #include "i915_gem_object.h"
> >  #include "i915_scatterlist.h"
> >
> > -void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
> > -                              struct sg_table *pages,
> > -                              unsigned int sg_page_sizes)
> > +static DEFINE_SPINLOCK(fence_lock);
> > +
> > +struct get_pages_work {
> > +     struct dma_fence dma; /* Must be first for dma_fence_free() */
> > +     struct i915_sw_fence wait;
> > +     struct work_struct work;
> > +     struct i915_gem_object_get_pages_context ctx;
> > +};
> > +
> > +static const char *get_pages_work_driver_name(struct dma_fence *fence)
> > +{
> > +     return DRIVER_NAME;
> > +}
> > +
> > +static const char *get_pages_work_timeline_name(struct dma_fence *fence)
> > +{
> > +     return "allocation";
> > +}
> > +
> > +static void get_pages_work_release(struct dma_fence *fence)
> > +{
> > +     struct get_pages_work *w = container_of(fence, typeof(*w), dma);
> > +
> > +     i915_sw_fence_fini(&w->wait);
> > +
> > +     BUILD_BUG_ON(offsetof(typeof(*w), dma));
> > +     dma_fence_free(&w->dma);
> > +}
> > +
> > +static const struct dma_fence_ops get_pages_work_ops = {
> > +     .get_driver_name = get_pages_work_driver_name,
> > +     .get_timeline_name = get_pages_work_timeline_name,
> > +     .release = get_pages_work_release,
> > +};
> > +
> > +static void __set_pages(struct drm_i915_gem_object *obj,
> > +                     struct sg_table *pages,
> > +                     unsigned int sg_page_sizes)
> >  {
> >       struct drm_i915_private *i915 = to_i915(obj->base.dev);
> >       unsigned long supported = INTEL_INFO(i915)->page_sizes;
> >       int i;
> >
> > -     lockdep_assert_held(&obj->mm.lock);
> > -
> >       /* Make the pages coherent with the GPU (flushing any swapin). */
> >       if (obj->cache_dirty) {
> >               obj->write_domain = 0;
> > @@ -29,8 +62,6 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
> >       obj->mm.get_page.sg_pos = pages->sgl;
> >       obj->mm.get_page.sg_idx = 0;
> >
> > -     obj->mm.pages = pages;
> > -
> >       if (i915_gem_object_is_tiled(obj) &&
> >           i915->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
> >               GEM_BUG_ON(obj->mm.quirked);
> > @@ -38,7 +69,8 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
> >               obj->mm.quirked = true;
> >       }
> >
> > -     GEM_BUG_ON(!sg_page_sizes);
> > +     if (!sg_page_sizes)
> > +             sg_page_sizes = i915_sg_page_sizes(pages->sgl);
> >       obj->mm.page_sizes.phys = sg_page_sizes;
> >
> >       /*
> > @@ -73,18 +105,105 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
> >
> >               spin_unlock_irqrestore(&i915->mm.obj_lock, flags);
> >       }
> > +}
> >
> > +static void
> > +get_pages_worker(struct work_struct *_work)
> > +{
> > +     struct get_pages_work *work = container_of(_work, typeof(*work), work);
> > +     struct drm_i915_gem_object *obj = work->ctx.object;
> > +     struct sg_table *pages;
> > +     unsigned int sizes = 0;
> > +
> > +     if (!work->dma.error) {
> > +             pages = obj->ops->get_pages(&work->ctx, &sizes);
> > +             if (!IS_ERR(pages))
> > +                     __set_pages(obj, pages, sizes);
> > +             else
> > +                     dma_fence_set_error(&work->dma, PTR_ERR(pages));
> > +     } else {
> > +             pages = ERR_PTR(work->dma.error);
> > +     }
> > +
> > +     obj->mm.pages = pages;
> >       complete_all(&obj->mm.completion);
> > +     atomic_dec(&obj->mm.pages_pin_count);
> > +
> > +     i915_gem_object_put(obj);
> > +     put_task_struct(work->ctx.task);
> > +
> > +     dma_fence_signal(&work->dma);
> > +     dma_fence_put(&work->dma);
> > +}
> > +
> > +static int __i915_sw_fence_call
> > +get_pages_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> > +{
> > +     struct get_pages_work *w = container_of(fence, typeof(*w), wait);
> > +
> > +     switch (state) {
> > +     case FENCE_COMPLETE:
> > +             if (fence->error)
> > +                     dma_fence_set_error(&w->dma, fence->error);
> > +             queue_work(system_unbound_wq, &w->work);
> > +             break;
> > +
> > +     case FENCE_FREE:
> > +             dma_fence_put(&w->dma);
> > +             break;
> > +     }
> > +
> > +     return NOTIFY_DONE;
> >  }
> >
> >  int ____i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
> >  {
> > +     struct get_pages_work *work;
> > +     int err;
> > +
> >       if (unlikely(obj->mm.madv != I915_MADV_WILLNEED)) {
> >               DRM_DEBUG("Attempting to obtain a purgeable object\n");
> >               return -EFAULT;
> >       }
> >
> > -     return obj->ops->get_pages(obj);
> > +     /* XXX inline? */
> > +
> > +     work = kmalloc(sizeof(*work), GFP_KERNEL);
> > +     if (!work)
> > +             return -ENOMEM;
> > +
> > +     dma_fence_init(&work->dma,
> > +                    &get_pages_work_ops,
> > +                    &fence_lock,
> > +                    to_i915(obj->base.dev)->mm.unordered_timeline,
> > +                    0);
>
> With all the shrinkers we have (across the tree, not just i915) that
> just blindly assume you can wait for a fence (at least once you manage
> to acquire the reservation_object lock), I expect this to deadlock. I
> think the rule is that no published fence can depend on any memory
> allocation anywhere.
>
> Yes it sucks real hard.
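>
> To make the inversion concrete, a minimal sketch (the callchain and
> names are illustrative, not the actual code):
>
> 	/* Shrinker side, entered from direct reclaim: */
> 	static void shrink(struct drm_i915_gem_object *obj)
> 	{
> 		/* Blocks on the published get_pages fence. */
> 		reservation_object_wait_timeout_rcu(obj->resv, true, false,
> 						    MAX_SCHEDULE_TIMEOUT);
> 	}
>
> 	/* Worker side, the only path that signals that fence: */
> 	static void worker(struct get_pages_work *w)
> 	{
> 		/*
> 		 * GFP_KERNEL may enter direct reclaim and call shrink()
> 		 * above, which then waits on w->dma before we ever get
> 		 * to signal it below: neither side makes progress.
> 		 */
> 		struct sg_table *st = kmalloc(sizeof(*st), GFP_KERNEL);
>
> 		w->ctx.object->mm.pages = st;
> 		dma_fence_signal(&w->dma);
> 	}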
>
> > +     i915_sw_fence_init(&work->wait, get_pages_notify);
> > +
> > +     work->ctx.object = i915_gem_object_get(obj);
> > +
> > +     work->ctx.task = current;
> > +     get_task_struct(work->ctx.task);
> > +
> > +     INIT_WORK(&work->work, get_pages_worker);
> > +
> > +     i915_gem_object_lock(obj);
>
> Other bit is nesting reservation_object lock within obj->mm.lock.
>
> obj->mm.lock is a very neat & tidy answer to dealing with bo alloc vs.
> shrinker madness. But looking at where dma-buf is headed, we'll need
> to have the reservation_object nest outside of obj->mm.lock (because
> the importer is going to acquire that for us before calling down into
> exporter code).
>
> Of course on the importer side we're currently nesting the other way
> round, and Christian König already set CI on fire figuring that out.
>
> I think what could work is pulling the reservation_object out of
> obj->mm.lock, and entirely bypassing it for imported dma-buf (since in
> that case dealing with shrinker lolz isn't our problem but the
> exporter's, so we don't need the special shrinker lock).
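>
> Roughly the nesting I'd aim for (a sketch, assuming the resv is pulled
> out from under obj->mm.lock):
>
> 	i915_gem_object_lock(obj);	/* obj->resv ww_mutex, outermost */
> 	mutex_lock(&obj->mm.lock);	/* backing store, nests inside */
> 	/* ... allocate and set the pages ... */
> 	mutex_unlock(&obj->mm.lock);
> 	i915_gem_object_unlock(obj);
>
> with imported dma-buf skipping obj->mm.lock entirely and leaving the
> shrinker dance to the exporter.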
>
> The 3rd part is just me irrationally freaking out when we hide userptr
> behind a wait_completion; that didn't work out well last time around
> because lockdep fails us. But that's really just an aside.
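>
> (The lockdep trouble in a nutshell, as a sketch: a completion carries
> no lock class, so a chain like
>
> 	mutex_lock(&obj->mm.lock);
> 	wait_for_completion(&obj->mm.completion);	/* invisible edge */
>
> against a worker that completes it while allocating memory never shows
> up as a cycle. A dma_fence at least gives us something we can
> annotate.)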
> -Daniel
>
> > +     GEM_BUG_ON(!reservation_object_test_signaled_rcu(obj->resv, true));
> > +     err = i915_sw_fence_await_reservation(&work->wait,
> > +                                           obj->resv, NULL,
> > +                                           true, I915_FENCE_TIMEOUT,
> > +                                           I915_FENCE_GFP);
> > +     if (err == 0) {
> > +             reservation_object_add_excl_fence(obj->resv, &work->dma);
> > +             atomic_inc(&obj->mm.pages_pin_count);
> > +     } else {
> > +             dma_fence_set_error(&work->dma, err);
> > +     }
> > +     i915_gem_object_unlock(obj);
> > +
> > +     dma_fence_get(&work->dma);
> > +     i915_sw_fence_commit(&work->wait);
> > +
> > +     return err;
> >  }
> >
> >  /* Ensure that the associated pages are gathered from the backing storage
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_phys.c b/drivers/gpu/drm/i915/gem/i915_gem_phys.c
> > index 2deac933cf59..6b4a5fb52055 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_phys.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_phys.c
> > @@ -17,18 +17,21 @@
> >  #include "i915_gem_object.h"
> >  #include "i915_scatterlist.h"
> >
> > -static int i915_gem_object_get_pages_phys(struct drm_i915_gem_object *obj)
> > +static struct sg_table *
> > +phys_get_pages(struct i915_gem_object_get_pages_context *ctx,
> > +            unsigned int *sizes)
> >  {
> > +     struct drm_i915_gem_object *obj = ctx->object;
> >       struct address_space *mapping = obj->base.filp->f_mapping;
> >       struct drm_dma_handle *phys;
> > -     struct sg_table *st;
> >       struct scatterlist *sg;
> > +     struct sg_table *st;
> >       char *vaddr;
> >       int i;
> >       int err;
> >
> >       if (WARN_ON(i915_gem_object_needs_bit17_swizzle(obj)))
> > -             return -EINVAL;
> > +             return ERR_PTR(-EINVAL);
> >
> >       /* Always aligning to the object size, allows a single allocation
> >        * to handle all possible callers, and given typical object sizes,
> > @@ -38,7 +41,7 @@ static int i915_gem_object_get_pages_phys(struct drm_i915_gem_object *obj)
> >                            roundup_pow_of_two(obj->base.size),
> >                            roundup_pow_of_two(obj->base.size));
> >       if (!phys)
> > -             return -ENOMEM;
> > +             return ERR_PTR(-ENOMEM);
> >
> >       vaddr = phys->vaddr;
> >       for (i = 0; i < obj->base.size / PAGE_SIZE; i++) {
> > @@ -83,19 +86,16 @@ static int i915_gem_object_get_pages_phys(struct drm_i915_gem_object *obj)
> >
> >       obj->phys_handle = phys;
> >
> > -     __i915_gem_object_set_pages(obj, st, sg->length);
> > -
> > -     return 0;
> > +     *sizes = sg->length;
> > +     return st;
> >
> >  err_phys:
> >       drm_pci_free(obj->base.dev, phys);
> > -
> > -     return err;
> > +     return ERR_PTR(err);
> >  }
> >
> >  static void
> > -i915_gem_object_put_pages_phys(struct drm_i915_gem_object *obj,
> > -                            struct sg_table *pages)
> > +phys_put_pages(struct drm_i915_gem_object *obj, struct sg_table *pages)
> >  {
> >       __i915_gem_object_release_shmem(obj, pages, false);
> >
> > @@ -139,8 +139,8 @@ i915_gem_object_release_phys(struct drm_i915_gem_object *obj)
> >  }
> >
> >  static const struct drm_i915_gem_object_ops i915_gem_phys_ops = {
> > -     .get_pages = i915_gem_object_get_pages_phys,
> > -     .put_pages = i915_gem_object_put_pages_phys,
> > +     .get_pages = phys_get_pages,
> > +     .put_pages = phys_put_pages,
> >       .release = i915_gem_object_release_phys,
> >  };
> >
> > @@ -193,15 +193,12 @@ int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj, int align)
> >       if (!IS_ERR_OR_NULL(pages))
> >               i915_gem_shmem_ops.put_pages(obj, pages);
> >       mutex_unlock(&obj->mm.lock);
> > +
> > +     wait_for_completion(&obj->mm.completion);
> >       return 0;
> >
> >  err_xfer:
> >       obj->ops = &i915_gem_shmem_ops;
> > -     if (!IS_ERR_OR_NULL(pages)) {
> > -             unsigned int sg_page_sizes = i915_sg_page_sizes(pages->sgl);
> > -
> > -             __i915_gem_object_set_pages(obj, pages, sg_page_sizes);
> > -     }
> >  err_unlock:
> >       mutex_unlock(&obj->mm.lock);
> >       return err;
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> > index 19d9ecdb2894..c43304e3bada 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> > @@ -22,11 +22,13 @@ static void check_release_pagevec(struct pagevec *pvec)
> >       cond_resched();
> >  }
> >
> > -static int shmem_get_pages(struct drm_i915_gem_object *obj)
> > +static struct sg_table *
> > +shmem_get_pages(struct i915_gem_object_get_pages_context *ctx,
> > +             unsigned int *sizes)
> >  {
> > +     struct drm_i915_gem_object *obj = ctx->object;
> >       struct drm_i915_private *i915 = to_i915(obj->base.dev);
> >       const unsigned long page_count = obj->base.size / PAGE_SIZE;
> > -     unsigned long i;
> >       struct address_space *mapping;
> >       struct sg_table *st;
> >       struct scatterlist *sg;
> > @@ -37,31 +39,24 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj)
> >       unsigned int sg_page_sizes;
> >       struct pagevec pvec;
> >       gfp_t noreclaim;
> > +     unsigned long i;
> >       int ret;
> >
> > -     /*
> > -      * Assert that the object is not currently in any GPU domain. As it
> > -      * wasn't in the GTT, there shouldn't be any way it could have been in
> > -      * a GPU cache
> > -      */
> > -     GEM_BUG_ON(obj->read_domains & I915_GEM_GPU_DOMAINS);
> > -     GEM_BUG_ON(obj->write_domain & I915_GEM_GPU_DOMAINS);
> > -
> >       /*
> >        * If there's no chance of allocating enough pages for the whole
> >        * object, bail early.
> >        */
> >       if (page_count > totalram_pages())
> > -             return -ENOMEM;
> > +             return ERR_PTR(-ENOMEM);
> >
> >       st = kmalloc(sizeof(*st), GFP_KERNEL);
> >       if (!st)
> > -             return -ENOMEM;
> > +             return ERR_PTR(-ENOMEM);
> >
> >  rebuild_st:
> >       if (sg_alloc_table(st, page_count, GFP_KERNEL)) {
> >               kfree(st);
> > -             return -ENOMEM;
> > +             return ERR_PTR(-ENOMEM);
> >       }
> >
> >       /*
> > @@ -179,9 +174,8 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj)
> >       if (i915_gem_object_needs_bit17_swizzle(obj))
> >               i915_gem_object_do_bit_17_swizzle(obj, st);
> >
> > -     __i915_gem_object_set_pages(obj, st, sg_page_sizes);
> > -
> > -     return 0;
> > +     *sizes = sg_page_sizes;
> > +     return st;
> >
> >  err_sg:
> >       sg_mark_end(sg);
> > @@ -209,7 +203,7 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj)
> >       if (ret == -ENOSPC)
> >               ret = -ENOMEM;
> >
> > -     return ret;
> > +     return ERR_PTR(ret);
> >  }
> >
> >  static void
> > @@ -276,8 +270,6 @@ __i915_gem_object_release_shmem(struct drm_i915_gem_object *obj,
> >                               struct sg_table *pages,
> >                               bool needs_clflush)
> >  {
> > -     GEM_BUG_ON(obj->mm.madv == __I915_MADV_PURGED);
> > -
> >       if (obj->mm.madv == I915_MADV_DONTNEED)
> >               obj->mm.dirty = false;
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> > index 7066044d63cf..28ea06e667cd 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> > @@ -499,22 +499,19 @@ i915_pages_create_for_stolen(struct drm_device *dev,
> >       return st;
> >  }
> >
> > -static int i915_gem_object_get_pages_stolen(struct drm_i915_gem_object *obj)
> > +static struct sg_table *
> > +stolen_get_pages(struct i915_gem_object_get_pages_context *ctx,
> > +              unsigned int *sizes)
> >  {
> > -     struct sg_table *pages =
> > -             i915_pages_create_for_stolen(obj->base.dev,
> > -                                          obj->stolen->start,
> > -                                          obj->stolen->size);
> > -     if (IS_ERR(pages))
> > -             return PTR_ERR(pages);
> > -
> > -     __i915_gem_object_set_pages(obj, pages, obj->stolen->size);
> > +     struct drm_i915_gem_object *obj = ctx->object;
> >
> > -     return 0;
> > +     return i915_pages_create_for_stolen(obj->base.dev,
> > +                                         obj->stolen->start,
> > +                                         obj->stolen->size);
> >  }
> >
> > -static void i915_gem_object_put_pages_stolen(struct drm_i915_gem_object *obj,
> > -                                          struct sg_table *pages)
> > +static void
> > +stolen_put_pages(struct drm_i915_gem_object *obj, struct sg_table *pages)
> >  {
> >       /* Should only be called from i915_gem_object_release_stolen() */
> >       sg_free_table(pages);
> > @@ -536,8 +533,8 @@ i915_gem_object_release_stolen(struct drm_i915_gem_object *obj)
> >  }
> >
> >  static const struct drm_i915_gem_object_ops i915_gem_object_stolen_ops = {
> > -     .get_pages = i915_gem_object_get_pages_stolen,
> > -     .put_pages = i915_gem_object_put_pages_stolen,
> > +     .get_pages = stolen_get_pages,
> > +     .put_pages = stolen_put_pages,
> >       .release = i915_gem_object_release_stolen,
> >  };
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > index f093deaeb5c0..6748e15bf89a 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > @@ -378,7 +378,7 @@ __i915_mm_struct_free(struct kref *kref)
> >       mutex_unlock(&mm->i915->mm_lock);
> >
> >       INIT_WORK(&mm->work, __i915_mm_struct_free__worker);
> > -     queue_work(mm->i915->mm.userptr_wq, &mm->work);
> > +     schedule_work(&mm->work);
> >  }
> >
> >  static void
> > @@ -393,19 +393,12 @@ i915_gem_userptr_release__mm_struct(struct drm_i915_gem_object *obj)
> >       obj->userptr.mm = NULL;
> >  }
> >
> > -struct get_pages_work {
> > -     struct work_struct work;
> > -     struct drm_i915_gem_object *obj;
> > -     struct task_struct *task;
> > -};
> > -
> >  static struct sg_table *
> >  __i915_gem_userptr_alloc_pages(struct drm_i915_gem_object *obj,
> >                              struct page **pvec, int num_pages)
> >  {
> >       unsigned int max_segment = i915_sg_segment_size();
> >       struct sg_table *st;
> > -     unsigned int sg_page_sizes;
> >       int ret;
> >
> >       st = kmalloc(sizeof(*st), GFP_KERNEL);
> > @@ -435,131 +428,23 @@ __i915_gem_userptr_alloc_pages(struct drm_i915_gem_object *obj,
> >               return ERR_PTR(ret);
> >       }
> >
> > -     sg_page_sizes = i915_sg_page_sizes(st->sgl);
> > -
> > -     __i915_gem_object_set_pages(obj, st, sg_page_sizes);
> > -
> >       return st;
> >  }
> >
> > -static void
> > -__i915_gem_userptr_get_pages_worker(struct work_struct *_work)
> > -{
> > -     struct get_pages_work *work = container_of(_work, typeof(*work), work);
> > -     struct drm_i915_gem_object *obj = work->obj;
> > -     const int npages = obj->base.size >> PAGE_SHIFT;
> > -     struct page **pvec;
> > -     int pinned, ret;
> > -
> > -     ret = -ENOMEM;
> > -     pinned = 0;
> > -
> > -     pvec = kvmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
> > -     if (pvec != NULL) {
> > -             struct mm_struct *mm = obj->userptr.mm->mm;
> > -             unsigned int flags = 0;
> > -
> > -             if (!i915_gem_object_is_readonly(obj))
> > -                     flags |= FOLL_WRITE;
> > -
> > -             ret = -EFAULT;
> > -             if (mmget_not_zero(mm)) {
> > -                     down_read(&mm->mmap_sem);
> > -                     while (pinned < npages) {
> > -                             ret = get_user_pages_remote
> > -                                     (work->task, mm,
> > -                                      obj->userptr.ptr + pinned * PAGE_SIZE,
> > -                                      npages - pinned,
> > -                                      flags,
> > -                                      pvec + pinned, NULL, NULL);
> > -                             if (ret < 0)
> > -                                     break;
> > -
> > -                             pinned += ret;
> > -                     }
> > -                     up_read(&mm->mmap_sem);
> > -                     mmput(mm);
> > -             }
> > -     }
> > -
> > -     mutex_lock(&obj->mm.lock);
> > -     if (obj->userptr.work == &work->work) {
> > -             struct sg_table *pages = ERR_PTR(ret);
> > -
> > -             if (pinned == npages) {
> > -                     pages = __i915_gem_userptr_alloc_pages(obj, pvec,
> > -                                                            npages);
> > -                     if (!IS_ERR(pages)) {
> > -                             pinned = 0;
> > -                             pages = NULL;
> > -                     }
> > -             }
> > -
> > -             obj->userptr.work = ERR_CAST(pages);
> > -             if (IS_ERR(pages))
> > -                     __i915_gem_userptr_set_active(obj, false);
> > -     }
> > -     mutex_unlock(&obj->mm.lock);
> > -
> > -     release_pages(pvec, pinned);
> > -     kvfree(pvec);
> > -
> > -     i915_gem_object_put(obj);
> > -     put_task_struct(work->task);
> > -     kfree(work);
> > -}
> > -
> >  static struct sg_table *
> > -__i915_gem_userptr_get_pages_schedule(struct drm_i915_gem_object *obj)
> > -{
> > -     struct get_pages_work *work;
> > -
> > -     /* Spawn a worker so that we can acquire the
> > -      * user pages without holding our mutex. Access
> > -      * to the user pages requires mmap_sem, and we have
> > -      * a strict lock ordering of mmap_sem, struct_mutex -
> > -      * we already hold struct_mutex here and so cannot
> > -      * call gup without encountering a lock inversion.
> > -      *
> > -      * Userspace will keep on repeating the operation
> > -      * (thanks to EAGAIN) until either we hit the fast
> > -      * path or the worker completes. If the worker is
> > -      * cancelled or superseded, the task is still run
> > -      * but the results ignored. (This leads to
> > -      * complications that we may have a stray object
> > -      * refcount that we need to be wary of when
> > -      * checking for existing objects during creation.)
> > -      * If the worker encounters an error, it reports
> > -      * that error back to this function through
> > -      * obj->userptr.work = ERR_PTR.
> > -      */
> > -     work = kmalloc(sizeof(*work), GFP_KERNEL);
> > -     if (work == NULL)
> > -             return ERR_PTR(-ENOMEM);
> > -
> > -     obj->userptr.work = &work->work;
> > -
> > -     work->obj = i915_gem_object_get(obj);
> > -
> > -     work->task = current;
> > -     get_task_struct(work->task);
> > -
> > -     INIT_WORK(&work->work, __i915_gem_userptr_get_pages_worker);
> > -     queue_work(to_i915(obj->base.dev)->mm.userptr_wq, &work->work);
> > -
> > -     return ERR_PTR(-EAGAIN);
> > -}
> > -
> > -static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
> > +userptr_get_pages(struct i915_gem_object_get_pages_context *ctx,
> > +               unsigned int *sizes)
> >  {
> > +     struct drm_i915_gem_object *obj = ctx->object;
> >       const int num_pages = obj->base.size >> PAGE_SHIFT;
> >       struct mm_struct *mm = obj->userptr.mm->mm;
> > -     struct page **pvec;
> >       struct sg_table *pages;
> > -     bool active;
> > +     struct page **pvec;
> >       int pinned;
> > +     int ret;
> >
> > -     /* If userspace should engineer that these pages are replaced in
> > +     /*
> > +      * If userspace should engineer that these pages are replaced in
> >        * the vma between us binding this page into the GTT and completion
> >        * of rendering... Their loss. If they change the mapping of their
> >        * pages they need to create a new bo to point to the new vma.
> > @@ -576,59 +461,58 @@ static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
> >        * egregious cases from causing harm.
> >        */
> >
> > -     if (obj->userptr.work) {
> > -             /* active flag should still be held for the pending work */
> > -             if (IS_ERR(obj->userptr.work))
> > -                     return PTR_ERR(obj->userptr.work);
> > -             else
> > -                     return -EAGAIN;
> > -     }
> > +     pvec = kvmalloc_array(num_pages, sizeof(struct page *),
> > +                           GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
> > +     if (!pvec)
> > +             return ERR_PTR(-ENOMEM);
> > +
> > +     __i915_gem_userptr_set_active(obj, true);
> >
> > -     pvec = NULL;
> >       pinned = 0;
> > +     ret = -EFAULT;
> > +     if (mmget_not_zero(mm)) {
> > +             unsigned int flags;
> >
> > -     if (mm == current->mm) {
> > -             pvec = kvmalloc_array(num_pages, sizeof(struct page *),
> > -                                   GFP_KERNEL |
> > -                                   __GFP_NORETRY |
> > -                                   __GFP_NOWARN);
> > -             if (pvec) /* defer to worker if malloc fails */
> > -                     pinned = __get_user_pages_fast(obj->userptr.ptr,
> > -                                                    num_pages,
> > -                                                    !i915_gem_object_is_readonly(obj),
> > -                                                    pvec);
> > -     }
> > +             flags = 0;
> > +             if (!i915_gem_object_is_readonly(obj))
> > +                     flags |= FOLL_WRITE;
> >
> > -     active = false;
> > -     if (pinned < 0) {
> > -             pages = ERR_PTR(pinned);
> > -             pinned = 0;
> > -     } else if (pinned < num_pages) {
> > -             pages = __i915_gem_userptr_get_pages_schedule(obj);
> > -             active = pages == ERR_PTR(-EAGAIN);
> > -     } else {
> > -             pages = __i915_gem_userptr_alloc_pages(obj, pvec, num_pages);
> > -             active = !IS_ERR(pages);
> > +             down_read(&mm->mmap_sem);
> > +             while (pinned < num_pages) {
> > +                     ret = get_user_pages_remote
> > +                             (ctx->task, mm,
> > +                              obj->userptr.ptr + pinned * PAGE_SIZE,
> > +                              num_pages - pinned,
> > +                              flags,
> > +                              pvec + pinned, NULL, NULL);
> > +                     if (ret < 0)
> > +                             break;
> > +
> > +                     pinned += ret;
> > +             }
> > +             up_read(&mm->mmap_sem);
> > +             mmput(mm);
> >       }
> > -     if (active)
> > -             __i915_gem_userptr_set_active(obj, true);
> >
> > -     if (IS_ERR(pages))
> > +     if (ret)
> > +             pages = ERR_PTR(ret);
> > +     else
> > +             pages = __i915_gem_userptr_alloc_pages(obj, pvec, num_pages);
> > +     if (IS_ERR(pages)) {
> >               release_pages(pvec, pinned);
> > +             __i915_gem_userptr_set_active(obj, false);
> > +     }
> >       kvfree(pvec);
> >
> > -     return PTR_ERR_OR_ZERO(pages);
> > +     return pages;
> >  }
> >
> >  static void
> > -i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj,
> > -                        struct sg_table *pages)
> > +userptr_put_pages(struct drm_i915_gem_object *obj, struct sg_table *pages)
> >  {
> >       struct sgt_iter sgt_iter;
> >       struct page *page;
> >
> > -     /* Cancel any inflight work and force them to restart their gup */
> > -     obj->userptr.work = NULL;
> >       __i915_gem_userptr_set_active(obj, false);
> >       if (!pages)
> >               return;
> > @@ -669,8 +553,8 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
> >       .flags = I915_GEM_OBJECT_HAS_STRUCT_PAGE |
> >                I915_GEM_OBJECT_IS_SHRINKABLE |
> >                I915_GEM_OBJECT_ASYNC_CANCEL,
> > -     .get_pages = i915_gem_userptr_get_pages,
> > -     .put_pages = i915_gem_userptr_put_pages,
> > +     .get_pages = userptr_get_pages,
> > +     .put_pages = userptr_put_pages,
> >       .dmabuf_export = i915_gem_userptr_dmabuf_export,
> >       .release = i915_gem_userptr_release,
> >  };
> > @@ -786,22 +670,13 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> >       return 0;
> >  }
> >
> > -int i915_gem_init_userptr(struct drm_i915_private *dev_priv)
> > +void i915_gem_init_userptr(struct drm_i915_private *dev_priv)
> >  {
> >       mutex_init(&dev_priv->mm_lock);
> >       hash_init(dev_priv->mm_structs);
> > -
> > -     dev_priv->mm.userptr_wq =
> > -             alloc_workqueue("i915-userptr-acquire",
> > -                             WQ_HIGHPRI | WQ_UNBOUND,
> > -                             0);
> > -     if (!dev_priv->mm.userptr_wq)
> > -             return -ENOMEM;
> > -
> > -     return 0;
> >  }
> >
> >  void i915_gem_cleanup_userptr(struct drm_i915_private *dev_priv)
> >  {
> > -     destroy_workqueue(dev_priv->mm.userptr_wq);
> > +     mutex_destroy(&dev_priv->mm_lock);
> >  }
> > diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
> > index 3c5d17b2b670..02e6edce715e 100644
> > --- a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
> > +++ b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c
> > @@ -21,9 +21,12 @@ static void huge_free_pages(struct drm_i915_gem_object *obj,
> >       kfree(pages);
> >  }
> >
> > -static int huge_get_pages(struct drm_i915_gem_object *obj)
> > +static struct sg_table *
> > +huge_get_pages(struct i915_gem_object_get_pages_context *ctx,
> > +            unsigned int *sizes)
> >  {
> >  #define GFP (GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY)
> > +     struct drm_i915_gem_object *obj = ctx->object;
> >       const unsigned long nreal = obj->scratch / PAGE_SIZE;
> >       const unsigned long npages = obj->base.size / PAGE_SIZE;
> >       struct scatterlist *sg, *src, *end;
> > @@ -32,11 +35,11 @@ static int huge_get_pages(struct drm_i915_gem_object *obj)
> >
> >       pages = kmalloc(sizeof(*pages), GFP);
> >       if (!pages)
> > -             return -ENOMEM;
> > +             return ERR_PTR(-ENOMEM);
> >
> >       if (sg_alloc_table(pages, npages, GFP)) {
> >               kfree(pages);
> > -             return -ENOMEM;
> > +             return ERR_PTR(-ENOMEM);
> >       }
> >
> >       sg = pages->sgl;
> > @@ -64,14 +67,12 @@ static int huge_get_pages(struct drm_i915_gem_object *obj)
> >       if (i915_gem_gtt_prepare_pages(obj, pages))
> >               goto err;
> >
> > -     __i915_gem_object_set_pages(obj, pages, PAGE_SIZE);
> > -
> > -     return 0;
> > +     *sizes = PAGE_SIZE;
> > +     return pages;
> >
> >  err:
> >       huge_free_pages(obj, pages);
> > -
> > -     return -ENOMEM;
> > +     return ERR_PTR(-ENOMEM);
> >  #undef GFP
> >  }
> >
> > diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> > index b03a29366bd1..3b93a7337f27 100644
> > --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> > +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> > @@ -51,9 +51,12 @@ static void huge_pages_free_pages(struct sg_table *st)
> >       kfree(st);
> >  }
> >
> > -static int get_huge_pages(struct drm_i915_gem_object *obj)
> > +static struct sg_table *
> > +get_huge_pages(struct i915_gem_object_get_pages_context *ctx,
> > +            unsigned int *sizes)
> >  {
> >  #define GFP (GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY)
> > +     struct drm_i915_gem_object *obj = ctx->object;
> >       unsigned int page_mask = obj->mm.page_mask;
> >       struct sg_table *st;
> >       struct scatterlist *sg;
> > @@ -62,11 +65,11 @@ static int get_huge_pages(struct drm_i915_gem_object *obj)
> >
> >       st = kmalloc(sizeof(*st), GFP);
> >       if (!st)
> > -             return -ENOMEM;
> > +             return ERR_PTR(-ENOMEM);
> >
> >       if (sg_alloc_table(st, obj->base.size >> PAGE_SHIFT, GFP)) {
> >               kfree(st);
> > -             return -ENOMEM;
> > +             return ERR_PTR(-ENOMEM);
> >       }
> >
> >       rem = obj->base.size;
> > @@ -114,16 +117,14 @@ static int get_huge_pages(struct drm_i915_gem_object *obj)
> >       obj->mm.madv = I915_MADV_DONTNEED;
> >
> >       GEM_BUG_ON(sg_page_sizes != obj->mm.page_mask);
> > -     __i915_gem_object_set_pages(obj, st, sg_page_sizes);
> > -
> > -     return 0;
> > +     *sizes = sg_page_sizes;
> > +     return st;
> >
> >  err:
> >       sg_set_page(sg, NULL, 0, 0);
> >       sg_mark_end(sg);
> >       huge_pages_free_pages(st);
> > -
> > -     return -ENOMEM;
> > +     return ERR_PTR(-ENOMEM);
> >  }
> >
> >  static void put_huge_pages(struct drm_i915_gem_object *obj,
> > @@ -175,8 +176,11 @@ huge_pages_object(struct drm_i915_private *i915,
> >       return obj;
> >  }
> >
> > -static int fake_get_huge_pages(struct drm_i915_gem_object *obj)
> > +static struct sg_table *
> > +fake_get_huge_pages(struct i915_gem_object_get_pages_context *ctx,
> > +                 unsigned int *sizes)
> >  {
> > +     struct drm_i915_gem_object *obj = ctx->object;
> >       struct drm_i915_private *i915 = to_i915(obj->base.dev);
> >       const u64 max_len = rounddown_pow_of_two(UINT_MAX);
> >       struct sg_table *st;
> > @@ -186,11 +190,11 @@ static int fake_get_huge_pages(struct drm_i915_gem_object *obj)
> >
> >       st = kmalloc(sizeof(*st), GFP);
> >       if (!st)
> > -             return -ENOMEM;
> > +             return ERR_PTR(-ENOMEM);
> >
> >       if (sg_alloc_table(st, obj->base.size >> PAGE_SHIFT, GFP)) {
> >               kfree(st);
> > -             return -ENOMEM;
> > +             return ERR_PTR(-ENOMEM);
> >       }
> >
> >       /* Use optimal page sized chunks to fill in the sg table */
> > @@ -227,13 +231,15 @@ static int fake_get_huge_pages(struct drm_i915_gem_object *obj)
> >
> >       obj->mm.madv = I915_MADV_DONTNEED;
> >
> > -     __i915_gem_object_set_pages(obj, st, sg_page_sizes);
> > -
> > -     return 0;
> > +     *sizes = sg_page_sizes;
> > +     return st;
> >  }
> >
> > -static int fake_get_huge_pages_single(struct drm_i915_gem_object *obj)
> > +static struct sg_table *
> > +fake_get_huge_pages_single(struct i915_gem_object_get_pages_context *ctx,
> > +                        unsigned int *sizes)
> >  {
> > +     struct drm_i915_gem_object *obj = ctx->object;
> >       struct drm_i915_private *i915 = to_i915(obj->base.dev);
> >       struct sg_table *st;
> >       struct scatterlist *sg;
> > @@ -241,11 +247,11 @@ static int fake_get_huge_pages_single(struct drm_i915_gem_object *obj)
> >
> >       st = kmalloc(sizeof(*st), GFP);
> >       if (!st)
> > -             return -ENOMEM;
> > +             return ERR_PTR(-ENOMEM);
> >
> >       if (sg_alloc_table(st, 1, GFP)) {
> >               kfree(st);
> > -             return -ENOMEM;
> > +             return ERR_PTR(-ENOMEM);
> >       }
> >
> >       sg = st->sgl;
> > @@ -261,9 +267,8 @@ static int fake_get_huge_pages_single(struct drm_i915_gem_object *obj)
> >
> >       obj->mm.madv = I915_MADV_DONTNEED;
> >
> > -     __i915_gem_object_set_pages(obj, st, sg->length);
> > -
> > -     return 0;
> > +     *sizes = sg->length;
> > +     return st;
> >  #undef GFP
> >  }
> >
> > diff --git a/drivers/gpu/drm/i915/gvt/dmabuf.c b/drivers/gpu/drm/i915/gvt/dmabuf.c
> > index 41c8ebc60c63..c2a81939f698 100644
> > --- a/drivers/gpu/drm/i915/gvt/dmabuf.c
> > +++ b/drivers/gpu/drm/i915/gvt/dmabuf.c
> > @@ -36,9 +36,11 @@
> >
> >  #define GEN8_DECODE_PTE(pte) (pte & GENMASK_ULL(63, 12))
> >
> > -static int vgpu_gem_get_pages(
> > -             struct drm_i915_gem_object *obj)
> > +static struct sg_table *
> > +vgpu_gem_get_pages(struct i915_gem_object_get_pages_context *ctx,
> > +                unsigned int *sizes)
> >  {
> > +     struct drm_i915_gem_object *obj = ctx->object;
> >       struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
> >       struct sg_table *st;
> >       struct scatterlist *sg;
> > @@ -49,17 +51,17 @@ static int vgpu_gem_get_pages(
> >
> >       fb_info = (struct intel_vgpu_fb_info *)obj->gvt_info;
> >       if (WARN_ON(!fb_info))
> > -             return -ENODEV;
> > +             return ERR_PTR(-ENODEV);
> >
> >       st = kmalloc(sizeof(*st), GFP_KERNEL);
> >       if (unlikely(!st))
> > -             return -ENOMEM;
> > +             return ERR_PTR(-ENOMEM);
> >
> >       page_num = obj->base.size >> PAGE_SHIFT;
> >       ret = sg_alloc_table(st, page_num, GFP_KERNEL);
> >       if (ret) {
> >               kfree(st);
> > -             return ret;
> > +             return ERR_PTR(ret);
> >       }
> >       gtt_entries = (gen8_pte_t __iomem *)dev_priv->ggtt.gsm +
> >               (fb_info->start >> PAGE_SHIFT);
> > @@ -71,9 +73,8 @@ static int vgpu_gem_get_pages(
> >               sg_dma_len(sg) = PAGE_SIZE;
> >       }
> >
> > -     __i915_gem_object_set_pages(obj, st, PAGE_SIZE);
> > -
> > -     return 0;
> > +     *sizes = PAGE_SIZE;
> > +     return st;
> >  }
> >
> >  static void vgpu_gem_put_pages(struct drm_i915_gem_object *obj,
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 9a1ec487e8b1..ef9e4cb49b4a 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -786,13 +786,6 @@ struct i915_gem_mm {
> >       struct notifier_block vmap_notifier;
> >       struct shrinker shrinker;
> >
> > -     /**
> > -      * Workqueue to fault in userptr pages, flushed by the execbuf
> > -      * when required but otherwise left to userspace to try again
> > -      * on EAGAIN.
> > -      */
> > -     struct workqueue_struct *userptr_wq;
> > -
> >       u64 unordered_timeline;
> >
> >       /* the indicator for dispatch video commands on two BSD rings */
> > @@ -2514,7 +2507,7 @@ static inline bool intel_vgpu_active(struct drm_i915_private *dev_priv)
> >  }
> >
> >  /* i915_gem.c */
> > -int i915_gem_init_userptr(struct drm_i915_private *dev_priv);
> > +void i915_gem_init_userptr(struct drm_i915_private *dev_priv);
> >  void i915_gem_cleanup_userptr(struct drm_i915_private *dev_priv);
> >  void i915_gem_sanitize(struct drm_i915_private *i915);
> >  int i915_gem_init_early(struct drm_i915_private *dev_priv);
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 21416b87905e..1571c707ad15 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -1544,10 +1544,7 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
> >       dev_priv->mm.unordered_timeline = dma_fence_context_alloc(1);
> >
> >       i915_timelines_init(dev_priv);
> > -
> > -     ret = i915_gem_init_userptr(dev_priv);
> > -     if (ret)
> > -             return ret;
> > +     i915_gem_init_userptr(dev_priv);
> >
> >       ret = intel_uc_init_misc(dev_priv);
> >       if (ret)
> > diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> > index 664b7e3e1d7c..dae1f8634a38 100644
> > --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> > +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> > @@ -54,10 +54,13 @@ static void fake_free_pages(struct drm_i915_gem_object *obj,
> >       kfree(pages);
> >  }
> >
> > -static int fake_get_pages(struct drm_i915_gem_object *obj)
> > +static struct sg_table *
> > +fake_get_pages(struct i915_gem_object_get_pages_context *ctx,
> > +            unsigned int *sizes)
> >  {
> >  #define GFP (GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY)
> >  #define PFN_BIAS 0x1000
> > +     struct drm_i915_gem_object *obj = ctx->object;
> >       struct sg_table *pages;
> >       struct scatterlist *sg;
> >       unsigned int sg_page_sizes;
> > @@ -65,12 +68,12 @@ static int fake_get_pages(struct drm_i915_gem_object *obj)
> >
> >       pages = kmalloc(sizeof(*pages), GFP);
> >       if (!pages)
> > -             return -ENOMEM;
> > +             return ERR_PTR(-ENOMEM);
> >
> >       rem = round_up(obj->base.size, BIT(31)) >> 31;
> >       if (sg_alloc_table(pages, rem, GFP)) {
> >               kfree(pages);
> > -             return -ENOMEM;
> > +             return ERR_PTR(-ENOMEM);
> >       }
> >
> >       sg_page_sizes = 0;
> > @@ -90,9 +93,8 @@ static int fake_get_pages(struct drm_i915_gem_object *obj)
> >
> >       obj->mm.madv = I915_MADV_DONTNEED;
> >
> > -     __i915_gem_object_set_pages(obj, pages, sg_page_sizes);
> > -
> > -     return 0;
> > +     *sizes = sg_page_sizes;
> > +     return pages;
> >  #undef GFP
> >  }
> >
> > --
> > 2.20.1
> >
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch



--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread

Thread overview: 61+ messages
2019-06-14  7:09 WIP glance towards struct_mutex removal Chris Wilson
2019-06-14  7:09 ` [PATCH 01/39] drm/i915: Discard some redundant cache domain flushes Chris Wilson
2019-06-14  9:12   ` Matthew Auld
2019-06-14  7:09 ` [PATCH 02/39] drm/i915: Refine i915_reset.lock_map Chris Wilson
2019-06-14 14:10   ` Mika Kuoppala
2019-06-14 14:17     ` Chris Wilson
2019-06-14  7:09 ` [PATCH 03/39] drm/i915: Avoid tainting i915_gem_park() with wakeref.lock Chris Wilson
2019-06-14 14:47   ` Mika Kuoppala
2019-06-14 14:51     ` Chris Wilson
2019-06-14  7:09 ` [PATCH 04/39] drm/i915: Enable refcount debugging for default debug levels Chris Wilson
2019-06-14 15:11   ` Mika Kuoppala
2019-06-14 15:11   ` Mika Kuoppala
2019-06-14  7:09 ` [PATCH 05/39] drm/i915: Keep contexts pinned until after the next kernel context switch Chris Wilson
2019-06-14  7:09 ` [PATCH 06/39] drm/i915: Stop retiring along engine Chris Wilson
2019-06-14  7:09 ` [PATCH 07/39] drm/i915: Replace engine->timeline with a plain list Chris Wilson
2019-06-14  7:09 ` [PATCH 08/39] drm/i915: Flush the execution-callbacks on retiring Chris Wilson
2019-06-14  7:09 ` [PATCH 09/39] drm/i915/execlists: Preempt-to-busy Chris Wilson
2019-06-14  7:09 ` [PATCH 10/39] drm/i915/execlists: Minimalistic timeslicing Chris Wilson
2019-06-14  7:09 ` [PATCH 11/39] drm/i915/execlists: Force preemption Chris Wilson
2019-06-14  7:09 ` [PATCH 12/39] dma-fence: Propagate errors to dma-fence-array container Chris Wilson
2019-06-14  7:09 ` [PATCH 13/39] dma-fence: Report the composite sync_file status Chris Wilson
2019-06-14  7:09 ` [PATCH 14/39] dma-fence: Refactor signaling for manual invocation Chris Wilson
2019-06-14  7:09 ` [PATCH 15/39] dma-fence: Always execute signal callbacks Chris Wilson
2019-06-14  7:10 ` [PATCH 16/39] drm/i915: Execute signal callbacks from no-op i915_request_wait Chris Wilson
2019-06-14 10:45   ` Tvrtko Ursulin
2019-06-14  7:10 ` [PATCH 17/39] drm/i915: Make the semaphore saturation mask global Chris Wilson
2019-06-14  7:10 ` [PATCH 18/39] drm/i915: Throw away the active object retirement complexity Chris Wilson
2019-06-14  7:10 ` [PATCH 19/39] drm/i915: Provide an i915_active.acquire callback Chris Wilson
2019-06-14  7:10 ` [PATCH 20/39] drm/i915: Push the i915_active.retire into a worker Chris Wilson
2019-06-14  7:10 ` [PATCH 21/39] drm/i915/overlay: Switch to using i915_active tracking Chris Wilson
2019-06-14  7:10 ` [PATCH 22/39] drm/i915: Forgo last_fence active request tracking Chris Wilson
2019-06-14  7:10 ` [PATCH 23/39] drm/i915: Extract intel_frontbuffer active tracking Chris Wilson
2019-06-14  7:10 ` [PATCH 24/39] drm/i915: Coordinate i915_active with its own mutex Chris Wilson
2019-06-14  7:10 ` [PATCH 25/39] drm/i915: Track ggtt fence reservations under " Chris Wilson
2019-06-14  7:10 ` [PATCH 26/39] drm/i915: Only track bound elements of the GTT Chris Wilson
2019-06-14  7:10 ` [PATCH 27/39] drm/i915: Explicitly cleanup initial_plane_config Chris Wilson
2019-06-14  7:10 ` [PATCH 28/39] initial-plane-vma Chris Wilson
2019-06-14  7:10 ` [PATCH 29/39] drm/i915: Make i915_vma track its own kref Chris Wilson
2019-06-14 11:15   ` Matthew Auld
2019-06-14 11:18     ` Chris Wilson
2019-06-14  7:10 ` [PATCH 30/39] drm/i915: Propagate fence errors Chris Wilson
2019-06-14  7:10 ` [PATCH 31/39] drm/i915: Allow page pinning to be in the background Chris Wilson
2019-06-14  7:10 ` [PATCH 32/39] drm/i915: Allow vma binding to occur asynchronously Chris Wilson
2019-08-05 20:35   ` Kumar Valsan, Prathap
2019-06-14  7:10 ` [PATCH 33/39] revoke-fence Chris Wilson
2019-06-14  7:10 ` [PATCH 34/39] drm/i915: Use vm->mutex for serialising GTT insertion Chris Wilson
2019-06-14 19:14   ` Matthew Auld
2019-06-14 19:20     ` Chris Wilson
2019-06-14  7:10 ` [PATCH 35/39] drm/i915: Pin pages before waiting Chris Wilson
2019-06-14 19:53   ` Matthew Auld
2019-06-14 19:58     ` Chris Wilson
2019-06-14 20:13       ` Chris Wilson
2019-06-14 20:18         ` Matthew Auld
2019-06-14  7:10 ` [PATCH 36/39] drm/i915: Use reservation_object to coordinate userptr get_pages() Chris Wilson
2019-08-07 20:49   ` Daniel Vetter
2019-08-08  7:30     ` [Intel-gfx] " Daniel Vetter
2019-06-14  7:10 ` [PATCH 37/39] drm/i915: Use forced preemptions in preference over hangcheck Chris Wilson
2019-06-14  7:10 ` [PATCH 38/39] drm/i915: Remove logical HW ID Chris Wilson
2019-06-14  7:10 ` [PATCH 39/39] active-rings Chris Wilson
2019-06-14  7:22 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/39] drm/i915: Discard some redundant cache domain flushes Patchwork
2019-06-14  7:35 ` ✗ Fi.CI.SPARSE: " Patchwork
