* Tracking multiple timelines (full-ppgtt)
From: Chris Wilson @ 2016-09-14  6:52 UTC
  To: intel-gfx

Here's how I think we can handle multiple timelines whilst preserving
efficient retirement ordering of requests and the single-waiter
property.
-Chris


* [PATCH 01/18] drm/i915: Support asynchronous waits on struct fence from i915_gem_request
From: Chris Wilson @ 2016-09-14  6:52 UTC
  To: intel-gfx

We will need to wait on DMA completion (as signaled via struct fence)
before executing our i915_gem_request. Therefore we want to expose a
method for adding an await on an arbitrary fence to the request itself.
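
As an illustration only (not part of this patch), a caller holding an
imported dma-buf's reservation_object could order a request after its
exclusive (write) fence like so; await_dmabuf_write() is a hypothetical
helper:

static int await_dmabuf_write(struct drm_i915_gem_request *req,
                              struct reservation_object *resv)
{
        struct fence *excl;
        int ret = 0;

        /* Pick up the current exclusive fence, if any... */
        excl = reservation_object_get_excl_rcu(resv);
        if (excl) {
                /* ...and gate execution of req upon its completion. */
                ret = i915_gem_request_await_fence(req, excl);
                fence_put(excl);
        }

        return ret;
}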

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_request.c | 40 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_request.h |  2 ++
 2 files changed, 42 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 40978bc12ceb..624d71bf4518 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -23,6 +23,7 @@
  */
 
 #include <linux/prefetch.h>
+#include <linux/fence-array.h>
 
 #include "i915_drv.h"
 
@@ -494,6 +495,45 @@ i915_gem_request_await_request(struct drm_i915_gem_request *to,
 	return 0;
 }
 
+int
+i915_gem_request_await_fence(struct drm_i915_gem_request *req,
+			     struct fence *fence)
+{
+	struct fence_array *array;
+	int ret;
+	int i;
+
+	if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
+		return 0;
+
+	if (fence_is_i915(fence))
+		return i915_gem_request_await_request(req, to_request(fence));
+
+	if (!fence_is_array(fence)) {
+		ret = i915_sw_fence_await_dma_fence(&req->submit,
+						    fence, 10*HZ,
+						    GFP_KERNEL);
+		return ret < 0 ? ret : 0;
+	}
+
+	array = to_fence_array(fence);
+	for (i = 0; i < array->num_fences; i++) {
+		struct fence *child = array->fences[i];
+
+		if (fence_is_i915(child))
+			ret = i915_gem_request_await_request(req,
+							     to_request(child));
+		else
+			ret = i915_sw_fence_await_dma_fence(&req->submit,
+							    child, 10*HZ,
+							    GFP_KERNEL);
+		if (ret < 0)
+			return ret;
+	}
+
+	return 0;
+}
+
 /**
  * i915_gem_request_await_object - set this request to (async) wait upon a bo
  *
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 974bd7bcc801..c85a3d82febf 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -214,6 +214,8 @@ int
 i915_gem_request_await_object(struct drm_i915_gem_request *to,
 			      struct drm_i915_gem_object *obj,
 			      bool write);
+int i915_gem_request_await_fence(struct drm_i915_gem_request *req,
+				 struct fence *fence);
 
 void __i915_add_request(struct drm_i915_gem_request *req, bool flush_caches);
 #define i915_add_request(req) \
-- 
2.9.3


* [PATCH 02/18] drm/i915: Allow i915_sw_fence_await_sw_fence() to allocate
From: Chris Wilson @ 2016-09-14  6:52 UTC
  To: intel-gfx

In forthcoming patches, we want to be able to dynamically allocate the
wait_queue_t used whilst awaiting. This is more convenient if we extend
i915_sw_fence_await_sw_fence() to perform the allocation for us when we
pass in a gfp mask rather than a preallocated struct.
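
For illustration, the two calling modes after this change look roughly
like the snippet below (the gfp flags are arbitrary and "other" is a
stand-in request):

        /* Caller supplies the wait_queue_t; nothing is allocated. */
        err = i915_sw_fence_await_sw_fence(&request->submit, &prev->submit,
                                           &request->submitq, GFP_NOWAIT);

        /* Caller passes NULL; the wait_queue_t is kmalloc'ed with the
         * given gfp mask and freed from the wake callback. Should a
         * blocking allocation fail, we fall back to synchronously
         * waiting on the signaler instead.
         */
        err = i915_sw_fence_await_sw_fence(&request->submit, &other->submit,
                                           NULL, GFP_KERNEL);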

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_request.c |  2 +-
 drivers/gpu/drm/i915/i915_sw_fence.c    | 20 ++++++++++++++++++--
 drivers/gpu/drm/i915/i915_sw_fence.h    |  7 ++++++-
 3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 624d71bf4518..7496661b295e 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -678,7 +678,7 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
 				   &request->i915->drm.struct_mutex);
 	if (prev)
 		i915_sw_fence_await_sw_fence(&request->submit, &prev->submit,
-					     &request->submitq);
+					     &request->submitq, GFP_NOWAIT);
 
 	request->emitted_jiffies = jiffies;
 	request->previous_seqno = engine->last_submitted_seqno;
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index 1e5cbc585ca2..0fe8b446ae84 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -135,6 +135,8 @@ static int i915_sw_fence_wake(wait_queue_t *wq, unsigned mode, int flags, void *
 	list_del(&wq->task_list);
 	__i915_sw_fence_complete(wq->private, key);
 	i915_sw_fence_put(wq->private);
+	if (wq->flags)
+		kfree(wq);
 	return 0;
 }
 
@@ -194,7 +196,7 @@ static bool i915_sw_fence_check_if_after(struct i915_sw_fence *fence,
 
 int i915_sw_fence_await_sw_fence(struct i915_sw_fence *fence,
 				 struct i915_sw_fence *signaler,
-				 wait_queue_t *wq)
+				 wait_queue_t *wq, gfp_t gfp)
 {
 	unsigned long flags;
 	int pending;
@@ -206,8 +208,22 @@ int i915_sw_fence_await_sw_fence(struct i915_sw_fence *fence,
 	if (unlikely(i915_sw_fence_check_if_after(fence, signaler)))
 		return -EINVAL;
 
+	pending = 0;
+	if (!wq) {
+		wq = kmalloc(sizeof(*wq), gfp);
+		if (!wq) {
+			if (!gfpflags_allow_blocking(gfp))
+				return -ENOMEM;
+
+			i915_sw_fence_wait(signaler);
+			return 0;
+		}
+
+		pending = 1;
+	}
+
 	INIT_LIST_HEAD(&wq->task_list);
-	wq->flags = 0;
+	wq->flags = pending;
 	wq->func = i915_sw_fence_wake;
 	wq->private = i915_sw_fence_get(fence);
 
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
index 373141602ca4..d221df381ec2 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence.h
@@ -45,7 +45,7 @@ void i915_sw_fence_commit(struct i915_sw_fence *fence);
 
 int i915_sw_fence_await_sw_fence(struct i915_sw_fence *fence,
 				 struct i915_sw_fence *after,
-				 wait_queue_t *wq);
+				 wait_queue_t *wq, gfp_t gfp);
 int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
 				  struct fence *dma,
 				  unsigned long timeout,
@@ -62,4 +62,9 @@ static inline bool i915_sw_fence_done(const struct i915_sw_fence *fence)
 	return atomic_read(&fence->pending) < 0;
 }
 
+static inline void i915_sw_fence_wait(struct i915_sw_fence *fence)
+{
+	wait_event(fence->wait, i915_sw_fence_done(fence));
+}
+
 #endif /* _I915_SW_FENCE_H_ */
-- 
2.9.3


* [PATCH 03/18] drm/i915: Rearrange i915_wait_request() accounting with callers
From: Chris Wilson @ 2016-09-14  6:52 UTC
  To: intel-gfx

Our low-level wait routine has evolved from our generic wait interface
that handled unlocked waits, RPS boosting, and time tracking. If we
push our GEM fence tracking to use reservation_objects (required for
handling multiple timelines), we lose the ability to pass the required
information down to i915_wait_request(). However, if we push the extra
functionality from i915_wait_request() out to the individual callsites
(i915_gem_object_wait_rendering and i915_gem_wait_ioctl) that make use
of those extras, we can both simplify our low-level wait and prepare for
extending the GEM interface for use of reservation_objects.

* This causes a temporary regression in the use of wait-ioctl as a busy
query -- it will no longer report immediately, but instead enter
i915_wait_request() before timing out.
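
For reference, a sketch of the new convention (illustrative, not taken
from the patch): i915_wait_request() now takes a timeout in jiffies and
returns the remaining jiffies, 0 on timeout, or a negative error code:

        long timeout;

        timeout = i915_wait_request(req,
                                    I915_WAIT_INTERRUPTIBLE,
                                    msecs_to_jiffies(100));
        if (timeout < 0)
                return timeout; /* e.g. -ERESTARTSYS if interrupted */
        if (!timeout)
                return -ETIME; /* the full 100ms elapsed */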

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h         |   7 +-
 drivers/gpu/drm/i915/i915_gem.c         | 300 +++++++++++++++++++++++---------
 drivers/gpu/drm/i915/i915_gem_request.c | 117 ++-----------
 drivers/gpu/drm/i915/i915_gem_request.h |  32 ++--
 drivers/gpu/drm/i915/i915_gem_userptr.c |  12 +-
 drivers/gpu/drm/i915/intel_display.c    |  34 ++--
 drivers/gpu/drm/i915/intel_ringbuffer.c |  14 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h |   3 +-
 8 files changed, 283 insertions(+), 236 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1e2dda88a483..af0212032b64 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3274,9 +3274,10 @@ int __must_check i915_gem_wait_for_idle(struct drm_i915_private *dev_priv,
 int __must_check i915_gem_suspend(struct drm_device *dev);
 void i915_gem_resume(struct drm_device *dev);
 int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
-int __must_check
-i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
-			       bool readonly);
+int i915_gem_object_wait(struct drm_i915_gem_object *obj,
+			 unsigned int flags,
+			 long timeout,
+			 struct intel_rps_client *rps);
 int __must_check
 i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj,
 				  bool write);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c8bd02277b7d..d9214c9d31d2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -292,7 +292,12 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj)
 	 * must wait for all rendering to complete to the object (as unbinding
 	 * must anyway), and retire the requests.
 	 */
-	ret = i915_gem_object_wait_rendering(obj, false);
+	ret = i915_gem_object_wait(obj,
+				   I915_WAIT_INTERRUPTIBLE |
+				   I915_WAIT_LOCKED |
+				   I915_WAIT_ALL,
+				   MAX_SCHEDULE_TIMEOUT,
+				   NULL);
 	if (ret)
 		return ret;
 
@@ -311,88 +316,171 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj)
 	return ret;
 }
 
-/**
- * Ensures that all rendering to the object has completed and the object is
- * safe to unbind from the GTT or access from the CPU.
- * @obj: i915 gem object
- * @readonly: waiting for just read access or read-write access
- */
-int
-i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
-			       bool readonly)
+static long
+i915_gem_object_wait_fence(struct fence *fence,
+			   unsigned int flags,
+			   long timeout,
+			   struct intel_rps_client *rps)
 {
-	struct reservation_object *resv;
-	struct i915_gem_active *active;
-	unsigned long active_mask;
-	int idx;
+	struct drm_i915_gem_request *rq;
 
-	lockdep_assert_held(&obj->base.dev->struct_mutex);
+	BUILD_BUG_ON(I915_WAIT_INTERRUPTIBLE != 0x1);
 
-	if (!readonly) {
-		active = obj->last_read;
-		active_mask = i915_gem_object_get_active(obj);
-	} else {
-		active_mask = 1;
-		active = &obj->last_write;
+	if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
+		return timeout;
+
+	if (!fence_is_i915(fence))
+		return fence_wait_timeout(fence,
+					  flags & I915_WAIT_INTERRUPTIBLE,
+					  timeout);
+
+	rq = to_request(fence);
+	if (i915_gem_request_completed(rq))
+		goto out;
+
+	/* This client is about to stall waiting for the GPU. In many cases
+	 * this is undesirable and limits the throughput of the system, as
+	 * many clients cannot continue processing user input/output whilst
+	 * blocked. RPS autotuning may take tens of milliseconds to respond
+	 * to the GPU load and thus incurs additional latency for the client.
+	 * We can circumvent that by promoting the GPU frequency to maximum
+	 * before we wait. This makes the GPU throttle up much more quickly
+	 * (good for benchmarks and user experience, e.g. window animations),
+	 * but at a cost of spending more power processing the workload
+	 * (bad for battery). Not all clients even want their results
+	 * immediately and for them we should just let the GPU select its own
+	 * frequency to maximise efficiency. To prevent a single client from
+	 * forcing the clocks too high for the whole system, we only allow
+	 * each client to waitboost once in a busy period.
+	 */
+	if (rps) {
+		if (INTEL_GEN(rq->i915) >= 6)
+			gen6_rps_boost(rq->i915, rps, rq->emitted_jiffies);
+		else
+			rps = NULL;
 	}
 
-	for_each_active(active_mask, idx) {
+	timeout = i915_wait_request(rq, flags, timeout);
+
+out:
+	if (flags & I915_WAIT_LOCKED && i915_gem_request_completed(rq))
+		i915_gem_request_retire_upto(rq);
+
+	if (rps && rq->fence.seqno == rq->engine->last_submitted_seqno) {
+		/* The GPU is now idle and this client has stalled.
+		 * Since no other client has submitted a request in the
+		 * meantime, assume that this client is the only one
+		 * supplying work to the GPU but is unable to keep that
+		 * work supplied because it is waiting. Since the GPU is
+		 * then never kept fully busy, RPS autoclocking will
+		 * keep the clocks relatively low, causing further delays.
+		 * Compensate by giving the synchronous client credit for
+		 * a waitboost next time.
+		 */
+		spin_lock(&rq->i915->rps.client_lock);
+		list_del_init(&rps->link);
+		spin_unlock(&rq->i915->rps.client_lock);
+	}
+
+	return timeout;
+}
+
+static long
+i915_gem_object_wait_reservation(struct reservation_object *resv,
+				 unsigned int flags,
+				 long timeout,
+				 struct intel_rps_client *rps)
+{
+	struct fence *excl;
+
+	if (flags & I915_WAIT_ALL) {
+		struct fence **shared;
+		unsigned int count, i;
 		int ret;
 
-		ret = i915_gem_active_wait(&active[idx],
-					   &obj->base.dev->struct_mutex);
+		ret = reservation_object_get_fences_rcu(resv,
+							&excl, &count, &shared);
 		if (ret)
 			return ret;
-	}
 
-	resv = i915_gem_object_get_dmabuf_resv(obj);
-	if (resv) {
-		long err;
+		for (i = 0; i < count; i++) {
+			timeout = i915_gem_object_wait_fence(shared[i],
+							     flags, timeout,
+							     rps);
+			if (timeout <= 0)
+				break;
 
-		err = reservation_object_wait_timeout_rcu(resv, !readonly, true,
-							  MAX_SCHEDULE_TIMEOUT);
-		if (err < 0)
-			return err;
+			fence_put(shared[i]);
+		}
+
+		for (; i < count; i++)
+			fence_put(shared[i]);
+		kfree(shared);
+	} else {
+		excl = reservation_object_get_excl_rcu(resv);
 	}
 
-	return 0;
+	if (excl && timeout > 0)
+		timeout = i915_gem_object_wait_fence(excl, flags, timeout, rps);
+
+	fence_put(excl);
+
+	return timeout;
 }
 
-/* A nonblocking variant of the above wait. Must be called prior to
- * acquiring the mutex for the object, as the object state may change
- * during this call. A reference must be held by the caller for the object.
+/**
+ * Waits for rendering to the object to be completed
+ * @obj: i915 gem object
+ * @flags: how to wait (under a lock, for all rendering or just for writes etc)
+ * @timeout: how long to wait
+ * @rps: client (user process) to charge for any waitboosting
  */
-static __must_check int
-__unsafe_wait_rendering(struct drm_i915_gem_object *obj,
-			struct intel_rps_client *rps,
-			bool readonly)
+int
+i915_gem_object_wait(struct drm_i915_gem_object *obj,
+		     unsigned int flags,
+		     long timeout,
+		     struct intel_rps_client *rps)
 {
+	struct reservation_object *resv;
 	struct i915_gem_active *active;
 	unsigned long active_mask;
 	int idx;
 
-	active_mask = __I915_BO_ACTIVE(obj);
-	if (!active_mask)
-		return 0;
+	might_sleep();
+#if IS_ENABLED(CONFIG_LOCKDEP)
+	GEM_BUG_ON(!!lockdep_is_held(&obj->base.dev->struct_mutex) !=
+		   !!(flags & I915_WAIT_LOCKED));
+#endif
+	GEM_BUG_ON(timeout < 0);
 
-	if (!readonly) {
+	if (flags & I915_WAIT_ALL) {
 		active = obj->last_read;
+		active_mask = i915_gem_object_get_active(obj);
 	} else {
 		active_mask = 1;
 		active = &obj->last_write;
 	}
 
 	for_each_active(active_mask, idx) {
-		int ret;
-
-		ret = i915_gem_active_wait_unlocked(&active[idx],
-						    I915_WAIT_INTERRUPTIBLE,
-						    NULL, rps);
-		if (ret)
-			return ret;
+		struct drm_i915_gem_request *request;
+
+		request = i915_gem_active_get_unlocked(&active[idx]);
+		if (request) {
+			timeout = i915_gem_object_wait_fence(&request->fence,
+							     flags, timeout,
+							     rps);
+			i915_gem_request_put(request);
+		}
+		if (timeout < 0)
+			return timeout;
 	}
 
-	return 0;
+	resv = i915_gem_object_get_dmabuf_resv(obj);
+	if (resv)
+		timeout = i915_gem_object_wait_reservation(resv,
+							   flags, timeout,
+							   rps);
+	return timeout < 0 ? timeout : timeout > 0 ? 0 : -ETIME;
 }
 
 static struct intel_rps_client *to_rps_client(struct drm_file *file)
@@ -454,7 +542,13 @@ i915_gem_phys_pwrite(struct drm_i915_gem_object *obj,
 	/* We manually control the domain here and pretend that it
 	 * remains coherent i.e. in the GTT domain, like shmem_pwrite.
 	 */
-	ret = i915_gem_object_wait_rendering(obj, false);
+	lockdep_assert_held(&obj->base.dev->struct_mutex);
+	ret = i915_gem_object_wait(obj,
+				   I915_WAIT_INTERRUPTIBLE |
+				   I915_WAIT_LOCKED |
+				   I915_WAIT_ALL,
+				   MAX_SCHEDULE_TIMEOUT,
+				   to_rps_client(file_priv));
 	if (ret)
 		return ret;
 
@@ -614,12 +708,17 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
 {
 	int ret;
 
-	*needs_clflush = 0;
+	lockdep_assert_held(&obj->base.dev->struct_mutex);
 
+	*needs_clflush = 0;
 	if (!i915_gem_object_has_struct_page(obj))
 		return -ENODEV;
 
-	ret = i915_gem_object_wait_rendering(obj, true);
+	ret = i915_gem_object_wait(obj,
+				   I915_WAIT_INTERRUPTIBLE |
+				   I915_WAIT_LOCKED,
+				   MAX_SCHEDULE_TIMEOUT,
+				   NULL);
 	if (ret)
 		return ret;
 
@@ -661,11 +760,18 @@ int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
 {
 	int ret;
 
+	lockdep_assert_held(&obj->base.dev->struct_mutex);
+
 	*needs_clflush = 0;
 	if (!i915_gem_object_has_struct_page(obj))
 		return -ENODEV;
 
-	ret = i915_gem_object_wait_rendering(obj, false);
+	ret = i915_gem_object_wait(obj,
+				   I915_WAIT_INTERRUPTIBLE |
+				   I915_WAIT_LOCKED |
+				   I915_WAIT_ALL,
+				   MAX_SCHEDULE_TIMEOUT,
+				   NULL);
 	if (ret)
 		return ret;
 
@@ -1050,7 +1156,10 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
 
 	trace_i915_gem_object_pread(obj, args->offset, args->size);
 
-	ret = __unsafe_wait_rendering(obj, to_rps_client(file), true);
+	ret = i915_gem_object_wait(obj,
+				   I915_WAIT_INTERRUPTIBLE,
+				   MAX_SCHEDULE_TIMEOUT,
+				   to_rps_client(file));
 	if (ret)
 		goto err;
 
@@ -1450,7 +1559,11 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 
 	trace_i915_gem_object_pwrite(obj, args->offset, args->size);
 
-	ret = __unsafe_wait_rendering(obj, to_rps_client(file), false);
+	ret = i915_gem_object_wait(obj,
+				   I915_WAIT_INTERRUPTIBLE |
+				   I915_WAIT_ALL,
+				   MAX_SCHEDULE_TIMEOUT,
+				   to_rps_client(file));
 	if (ret)
 		goto err;
 
@@ -1537,7 +1650,11 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
 	 * We will repeat the flush holding the lock in the normal manner
 	 * to catch cases where we are gazumped.
 	 */
-	ret = __unsafe_wait_rendering(obj, to_rps_client(file), !write_domain);
+	ret = i915_gem_object_wait(obj,
+				   I915_WAIT_INTERRUPTIBLE |
+				   (write_domain ? I915_WAIT_ALL : 0),
+				   MAX_SCHEDULE_TIMEOUT,
+				   to_rps_client(file));
 	if (ret)
 		goto err;
 
@@ -1773,7 +1890,10 @@ int i915_gem_fault(struct vm_area_struct *area, struct vm_fault *vmf)
 	 * repeat the flush holding the lock in the normal manner to catch cases
 	 * where we are gazumped.
 	 */
-	ret = __unsafe_wait_rendering(obj, NULL, !write);
+	ret = i915_gem_object_wait(obj,
+				   I915_WAIT_INTERRUPTIBLE,
+				   MAX_SCHEDULE_TIMEOUT,
+				   NULL);
 	if (ret)
 		goto err;
 
@@ -2792,10 +2912,9 @@ int
 i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
 	struct drm_i915_gem_wait *args = data;
-	struct intel_rps_client *rps = to_rps_client(file);
 	struct drm_i915_gem_object *obj;
-	unsigned long active;
-	int idx, ret = 0;
+	ktime_t start;
+	long ret;
 
 	if (args->flags != 0)
 		return -EINVAL;
@@ -2804,17 +2923,21 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	if (!obj)
 		return -ENOENT;
 
-	active = __I915_BO_ACTIVE(obj);
-	for_each_active(active, idx) {
-		s64 *timeout = args->timeout_ns >= 0 ? &args->timeout_ns : NULL;
-		ret = i915_gem_active_wait_unlocked(&obj->last_read[idx],
-						    I915_WAIT_INTERRUPTIBLE,
-						    timeout, rps);
-		if (ret)
-			break;
+	start = ktime_get();
+
+	ret = i915_gem_object_wait(obj,
+				   I915_WAIT_INTERRUPTIBLE | I915_WAIT_ALL,
+				   args->timeout_ns < 0 ? MAX_SCHEDULE_TIMEOUT : nsecs_to_jiffies(args->timeout_ns),
+				   to_rps_client(file));
+
+	if (args->timeout_ns > 0) {
+		args->timeout_ns -= ktime_to_ns(ktime_sub(ktime_get(), start));
+		if (args->timeout_ns < 0)
+			args->timeout_ns = 0;
 	}
 
 	i915_gem_object_put_unlocked(obj);
+
 	return ret;
 }
 
@@ -3226,7 +3349,13 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 	uint32_t old_write_domain, old_read_domains;
 	int ret;
 
-	ret = i915_gem_object_wait_rendering(obj, !write);
+	lockdep_assert_held(&obj->base.dev->struct_mutex);
+	ret = i915_gem_object_wait(obj,
+				   I915_WAIT_INTERRUPTIBLE |
+				   I915_WAIT_LOCKED |
+				   (write ? I915_WAIT_ALL : 0),
+				   MAX_SCHEDULE_TIMEOUT,
+				   NULL);
 	if (ret)
 		return ret;
 
@@ -3343,7 +3472,12 @@ restart:
 		 * If we wait upon the object, we know that all the bound
 		 * VMA are no longer active.
 		 */
-		ret = i915_gem_object_wait_rendering(obj, false);
+		ret = i915_gem_object_wait(obj,
+					   I915_WAIT_INTERRUPTIBLE |
+					   I915_WAIT_LOCKED |
+					   I915_WAIT_ALL,
+					   MAX_SCHEDULE_TIMEOUT,
+					   NULL);
 		if (ret)
 			return ret;
 
@@ -3598,7 +3732,13 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
 	uint32_t old_write_domain, old_read_domains;
 	int ret;
 
-	ret = i915_gem_object_wait_rendering(obj, !write);
+	lockdep_assert_held(&obj->base.dev->struct_mutex);
+	ret = i915_gem_object_wait(obj,
+				   I915_WAIT_INTERRUPTIBLE |
+				   I915_WAIT_LOCKED |
+				   (write ? I915_WAIT_ALL : 0),
+				   MAX_SCHEDULE_TIMEOUT,
+				   NULL);
 	if (ret)
 		return ret;
 
@@ -3654,11 +3794,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 	unsigned long recent_enough = jiffies - DRM_I915_THROTTLE_JIFFIES;
 	struct drm_i915_gem_request *request, *target = NULL;
-	int ret;
-
-	ret = i915_gem_wait_for_error(&dev_priv->gpu_error);
-	if (ret)
-		return ret;
+	long ret;
 
 	/* ABI: return -EIO if already wedged */
 	if (i915_terminally_wedged(&dev_priv->gpu_error))
@@ -3685,10 +3821,12 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	if (target == NULL)
 		return 0;
 
-	ret = i915_wait_request(target, I915_WAIT_INTERRUPTIBLE, NULL, NULL);
+	ret = i915_wait_request(target,
+				I915_WAIT_INTERRUPTIBLE,
+				MAX_SCHEDULE_TIMEOUT);
 	i915_gem_request_put(target);
 
-	return ret;
+	return ret < 0 ? ret : 0;
 }
 
 static bool
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 7496661b295e..687537d91be8 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -59,31 +59,9 @@ static bool i915_fence_enable_signaling(struct fence *fence)
 
 static signed long i915_fence_wait(struct fence *fence,
 				   bool interruptible,
-				   signed long timeout_jiffies)
+				   signed long timeout)
 {
-	s64 timeout_ns, *timeout;
-	int ret;
-
-	if (timeout_jiffies != MAX_SCHEDULE_TIMEOUT) {
-		timeout_ns = jiffies_to_nsecs(timeout_jiffies);
-		timeout = &timeout_ns;
-	} else {
-		timeout = NULL;
-	}
-
-	ret = i915_wait_request(to_request(fence),
-				interruptible, timeout,
-				NO_WAITBOOST);
-	if (ret == -ETIME)
-		return 0;
-
-	if (ret < 0)
-		return ret;
-
-	if (timeout_jiffies != MAX_SCHEDULE_TIMEOUT)
-		timeout_jiffies = nsecs_to_jiffies(timeout_ns);
-
-	return timeout_jiffies;
+	return i915_wait_request(to_request(fence), interruptible, timeout);
 }
 
 static void i915_fence_value_str(struct fence *fence, char *str, int size)
@@ -166,7 +144,7 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 	struct i915_gem_active *active, *next;
 
 	trace_i915_gem_request_retire(request);
-	list_del(&request->link);
+	list_del_init(&request->link);
 
 	/* We know the GPU must have read the request to have
 	 * sent us the seqno + interrupt, so use the position
@@ -224,7 +202,8 @@ void i915_gem_request_retire_upto(struct drm_i915_gem_request *req)
 	struct drm_i915_gem_request *tmp;
 
 	lockdep_assert_held(&req->i915->drm.struct_mutex);
-	GEM_BUG_ON(list_empty(&req->link));
+	if (list_empty(&req->link))
+		return;
 
 	do {
 		tmp = list_first_entry(&engine->request_list,
@@ -785,60 +764,27 @@ bool __i915_spin_request(const struct drm_i915_gem_request *req,
  * Returns 0 if the request was found within the alloted time. Else returns the
  * errno with remaining time filled in timeout argument.
  */
-int i915_wait_request(struct drm_i915_gem_request *req,
-		      unsigned int flags,
-		      s64 *timeout,
-		      struct intel_rps_client *rps)
+long i915_wait_request(struct drm_i915_gem_request *req,
+		       unsigned int flags,
+		       long timeout)
 {
 	const int state = flags & I915_WAIT_INTERRUPTIBLE ?
 		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
 	DEFINE_WAIT(reset);
 	struct intel_wait wait;
-	unsigned long timeout_remain;
-	int ret = 0;
 
 	might_sleep();
 #if IS_ENABLED(CONFIG_LOCKDEP)
 	GEM_BUG_ON(!!lockdep_is_held(&req->i915->drm.struct_mutex) !=
 		   !!(flags & I915_WAIT_LOCKED));
 #endif
+	GEM_BUG_ON(timeout < 0);
 
 	if (i915_gem_request_completed(req))
-		return 0;
-
-	timeout_remain = MAX_SCHEDULE_TIMEOUT;
-	if (timeout) {
-		if (WARN_ON(*timeout < 0))
-			return -EINVAL;
-
-		if (*timeout == 0)
-			return -ETIME;
-
-		/* Record current time in case interrupted, or wedged */
-		timeout_remain = nsecs_to_jiffies_timeout(*timeout);
-		*timeout += ktime_get_raw_ns();
-	}
+		return timeout;
 
 	trace_i915_gem_request_wait_begin(req);
 
-	/* This client is about to stall waiting for the GPU. In many cases
-	 * this is undesirable and limits the throughput of the system, as
-	 * many clients cannot continue processing user input/output whilst
-	 * blocked. RPS autotuning may take tens of milliseconds to respond
-	 * to the GPU load and thus incurs additional latency for the client.
-	 * We can circumvent that by promoting the GPU frequency to maximum
-	 * before we wait. This makes the GPU throttle up much more quickly
-	 * (good for benchmarks and user experience, e.g. window animations),
-	 * but at a cost of spending more power processing the workload
-	 * (bad for battery). Not all clients even want their results
-	 * immediately and for them we should just let the GPU select its own
-	 * frequency to maximise efficiency. To prevent a single client from
-	 * forcing the clocks too high for the whole system, we only allow
-	 * each client to waitboost once in a busy period.
-	 */
-	if (IS_RPS_CLIENT(rps) && INTEL_GEN(req->i915) >= 6)
-		gen6_rps_boost(req->i915, rps, req->emitted_jiffies);
-
 	/* Optimistic short spin before touching IRQs */
 	if (i915_spin_request(req, state, 5))
 		goto complete;
@@ -857,15 +803,13 @@ int i915_wait_request(struct drm_i915_gem_request *req,
 
 	for (;;) {
 		if (signal_pending_state(state, current)) {
-			ret = -ERESTARTSYS;
+			timeout = -ERESTARTSYS;
 			break;
 		}
 
-		timeout_remain = io_schedule_timeout(timeout_remain);
-		if (timeout_remain == 0) {
-			ret = -ETIME;
+		timeout = io_schedule_timeout(timeout);
+		if (!timeout)
 			break;
-		}
 
 		if (intel_wait_complete(&wait))
 			break;
@@ -913,40 +857,7 @@ wakeup:
 complete:
 	trace_i915_gem_request_wait_end(req);
 
-	if (timeout) {
-		*timeout -= ktime_get_raw_ns();
-		if (*timeout < 0)
-			*timeout = 0;
-
-		/*
-		 * Apparently ktime isn't accurate enough and occasionally has a
-		 * bit of mismatch in the jiffies<->nsecs<->ktime loop. So patch
-		 * things up to make the test happy. We allow up to 1 jiffy.
-		 *
-		 * This is a regrssion from the timespec->ktime conversion.
-		 */
-		if (ret == -ETIME && *timeout < jiffies_to_usecs(1)*1000)
-			*timeout = 0;
-	}
-
-	if (IS_RPS_USER(rps) &&
-	    req->fence.seqno == req->engine->last_submitted_seqno) {
-		/* The GPU is now idle and this client has stalled.
-		 * Since no other client has submitted a request in the
-		 * meantime, assume that this client is the only one
-		 * supplying work to the GPU but is unable to keep that
-		 * work supplied because it is waiting. Since the GPU is
-		 * then never kept fully busy, RPS autoclocking will
-		 * keep the clocks relatively low, causing further delays.
-		 * Compensate by giving the synchronous client credit for
-		 * a waitboost next time.
-		 */
-		spin_lock(&req->i915->rps.client_lock);
-		list_del_init(&rps->link);
-		spin_unlock(&req->i915->rps.client_lock);
-	}
-
-	return ret;
+	return timeout;
 }
 
 static bool engine_retire_requests(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index c85a3d82febf..f1ce390c9244 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -228,13 +228,13 @@ struct intel_rps_client;
 #define IS_RPS_CLIENT(p) (!IS_ERR(p))
 #define IS_RPS_USER(p) (!IS_ERR_OR_NULL(p))
 
-int i915_wait_request(struct drm_i915_gem_request *req,
-		      unsigned int flags,
-		      s64 *timeout,
-		      struct intel_rps_client *rps)
+long i915_wait_request(struct drm_i915_gem_request *req,
+		       unsigned int flags,
+		       long timeout)
 	__attribute__((nonnull(1)));
 #define I915_WAIT_INTERRUPTIBLE	BIT(0)
 #define I915_WAIT_LOCKED	BIT(1) /* struct_mutex held, handle GPU reset */
+#define I915_WAIT_ALL		BIT(2) /* used by i915_gem_object_wait() */
 
 static inline u32 intel_engine_get_seqno(struct intel_engine_cs *engine);
 
@@ -583,14 +583,16 @@ static inline int __must_check
 i915_gem_active_wait(const struct i915_gem_active *active, struct mutex *mutex)
 {
 	struct drm_i915_gem_request *request;
+	long ret;
 
 	request = i915_gem_active_peek(active, mutex);
 	if (!request)
 		return 0;
 
-	return i915_wait_request(request,
-				 I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
-				 NULL, NULL);
+	ret = i915_wait_request(request,
+				I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
+				MAX_SCHEDULE_TIMEOUT);
+	return ret < 0 ? ret : 0;
 }
 
 /**
@@ -617,20 +619,18 @@ i915_gem_active_wait(const struct i915_gem_active *active, struct mutex *mutex)
  */
 static inline int
 i915_gem_active_wait_unlocked(const struct i915_gem_active *active,
-			      unsigned int flags,
-			      s64 *timeout,
-			      struct intel_rps_client *rps)
+			      unsigned int flags)
 {
 	struct drm_i915_gem_request *request;
-	int ret = 0;
+	long ret = 0;
 
 	request = i915_gem_active_get_unlocked(active);
 	if (request) {
-		ret = i915_wait_request(request, flags, timeout, rps);
+		ret = i915_wait_request(request, flags, MAX_SCHEDULE_TIMEOUT);
 		i915_gem_request_put(request);
 	}
 
-	return ret;
+	return ret < 0 ? ret : 0;
 }
 
 /**
@@ -647,7 +647,7 @@ i915_gem_active_retire(struct i915_gem_active *active,
 		       struct mutex *mutex)
 {
 	struct drm_i915_gem_request *request;
-	int ret;
+	long ret;
 
 	request = i915_gem_active_raw(active, mutex);
 	if (!request)
@@ -655,8 +655,8 @@ i915_gem_active_retire(struct i915_gem_active *active,
 
 	ret = i915_wait_request(request,
 				I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
-				NULL, NULL);
-	if (ret)
+				MAX_SCHEDULE_TIMEOUT);
+	if (ret < 0)
 		return ret;
 
 	list_del_init(&active->link);
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index e537930c64b5..1c891b92ac80 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -61,23 +61,13 @@ struct i915_mmu_object {
 	bool attached;
 };
 
-static void wait_rendering(struct drm_i915_gem_object *obj)
-{
-	unsigned long active = __I915_BO_ACTIVE(obj);
-	int idx;
-
-	for_each_active(active, idx)
-		i915_gem_active_wait_unlocked(&obj->last_read[idx],
-					      0, NULL, NULL);
-}
-
 static void cancel_userptr(struct work_struct *work)
 {
 	struct i915_mmu_object *mo = container_of(work, typeof(*mo), work);
 	struct drm_i915_gem_object *obj = mo->obj;
 	struct drm_device *dev = obj->base.dev;
 
-	wait_rendering(obj);
+	i915_gem_object_wait(obj, I915_WAIT_ALL, MAX_SCHEDULE_TIMEOUT, NULL);
 
 	mutex_lock(&dev->struct_mutex);
 	/* Cancel any active worker and force us to re-evaluate gup */
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 497d99b88468..a5d61f0eea1d 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -12022,7 +12022,7 @@ static void intel_mmio_flip_work_func(struct work_struct *w)
 
 	if (work->flip_queued_req)
 		WARN_ON(i915_wait_request(work->flip_queued_req,
-					  0, NULL, NO_WAITBOOST));
+					  0, MAX_SCHEDULE_TIMEOUT) < 0);
 
 	/* For framebuffer backed by dmabuf, wait for fence */
 	resv = i915_gem_object_get_dmabuf_resv(obj);
@@ -14069,19 +14069,21 @@ static int intel_atomic_prepare_commit(struct drm_device *dev,
 		for_each_plane_in_state(state, plane, plane_state, i) {
 			struct intel_plane_state *intel_plane_state =
 				to_intel_plane_state(plane_state);
+			long timeout;
 
 			if (!intel_plane_state->wait_req)
 				continue;
 
-			ret = i915_wait_request(intel_plane_state->wait_req,
-						I915_WAIT_INTERRUPTIBLE,
-						NULL, NULL);
-			if (ret) {
+			timeout = i915_wait_request(intel_plane_state->wait_req,
+						    I915_WAIT_INTERRUPTIBLE,
+						    MAX_SCHEDULE_TIMEOUT);
+			if (timeout < 0) {
 				/* Any hang should be swallowed by the wait */
-				WARN_ON(ret == -EIO);
+				WARN_ON(timeout == -EIO);
 				mutex_lock(&dev->struct_mutex);
 				drm_atomic_helper_cleanup_planes(dev, state);
 				mutex_unlock(&dev->struct_mutex);
+				ret = timeout;
 				break;
 			}
 		}
@@ -14283,7 +14285,7 @@ static void intel_atomic_commit_tail(struct drm_atomic_state *state)
 	bool hw_check = intel_state->modeset;
 	unsigned long put_domains[I915_MAX_PIPES] = {};
 	unsigned crtc_vblank_mask = 0;
-	int i, ret;
+	int i;
 
 	for_each_plane_in_state(state, plane, plane_state, i) {
 		struct intel_plane_state *intel_plane_state =
@@ -14292,11 +14294,10 @@ static void intel_atomic_commit_tail(struct drm_atomic_state *state)
 		if (!intel_plane_state->wait_req)
 			continue;
 
-		ret = i915_wait_request(intel_plane_state->wait_req,
-					0, NULL, NULL);
 		/* EIO should be eaten, and we can't get interrupted in the
 		 * worker, and blocking commits have waited already. */
-		WARN_ON(ret);
+		WARN_ON(i915_wait_request(intel_plane_state->wait_req,
+					  0, MAX_SCHEDULE_TIMEOUT) < 0);
 	}
 
 	drm_atomic_helper_wait_for_dependencies(state);
@@ -14647,6 +14648,7 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 	if (old_obj) {
 		struct drm_crtc_state *crtc_state =
 			drm_atomic_get_existing_crtc_state(new_state->state, plane->state->crtc);
+		long timeout = 0;
 
 		/* Big Hammer, we also need to ensure that any pending
 		 * MI_WAIT_FOR_EVENT inside a user batch buffer on the
@@ -14660,11 +14662,15 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 		 * can safely continue.
 		 */
 		if (needs_modeset(crtc_state))
-			ret = i915_gem_object_wait_rendering(old_obj, true);
-		if (ret) {
+			timeout = i915_gem_object_wait(old_obj,
+						       I915_WAIT_INTERRUPTIBLE |
+						       I915_WAIT_LOCKED,
+						       MAX_SCHEDULE_TIMEOUT,
+						       NULL);
+		if (timeout < 0) {
 			/* GPU hangs should have been swallowed by the wait */
-			WARN_ON(ret == -EIO);
-			return ret;
+			WARN_ON(timeout == -EIO);
+			return timeout;
 		}
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 7a74750076c5..4bf6bd056b26 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2211,7 +2211,9 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
 {
 	struct intel_ring *ring = req->ring;
 	struct drm_i915_gem_request *target;
-	int ret;
+	long timeout;
+
+	lockdep_assert_held(&req->i915->drm.struct_mutex);
 
 	intel_ring_update_space(ring);
 	if (ring->space >= bytes)
@@ -2241,11 +2243,11 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
 	if (WARN_ON(&target->ring_link == &ring->request_list))
 		return -ENOSPC;
 
-	ret = i915_wait_request(target,
-				I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
-				NULL, NO_WAITBOOST);
-	if (ret)
-		return ret;
+	timeout = i915_wait_request(target,
+				    I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
+				    MAX_SCHEDULE_TIMEOUT);
+	if (timeout < 0)
+		return timeout;
 
 	i915_gem_request_retire_upto(target);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 7f64d611159b..88ed21f1135b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -503,8 +503,7 @@ static inline int intel_engine_idle(struct intel_engine_cs *engine,
 				    unsigned int flags)
 {
 	/* Wait upon the last request to be completed */
-	return i915_gem_active_wait_unlocked(&engine->last_request,
-					     flags, NULL, NULL);
+	return i915_gem_active_wait_unlocked(&engine->last_request, flags);
 }
 
 int intel_init_render_ring_buffer(struct intel_engine_cs *engine);
-- 
2.9.3


* [PATCH 04/18] drm/i915: Remove unused i915_gem_active_wait() in favour of _unlocked()
From: Chris Wilson @ 2016-09-14  6:52 UTC
  To: intel-gfx

Since we only use the more generic unlocked variant, just rename it to
the plain i915_gem_active_wait(). The temporary cost is that we must
always acquire the reference in an RCU-safe manner, but the benefit is
that we will combine the common paths.
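
The RCU-safe pattern now used unconditionally (this is the shape of the
rewritten helper in this patch) pins the request before waiting on it:

        struct drm_i915_gem_request *request;
        long ret = 0;

        /* Safe against SLAB_DESTROY_BY_RCU reuse: either takes a
         * reference on the request or returns NULL if it has already
         * been retired.
         */
        request = i915_gem_active_get_unlocked(active);
        if (request) {
                ret = i915_wait_request(request, flags, MAX_SCHEDULE_TIMEOUT);
                i915_gem_request_put(request);
        }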

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_request.h | 34 +++------------------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 2 files changed, 4 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index f1ce390c9244..45998eedda2c 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -569,40 +569,13 @@ i915_gem_active_is_idle(const struct i915_gem_active *active,
 }
 
 /**
- * i915_gem_active_wait - waits until the request is completed
- * @active - the active request on which to wait
- *
- * i915_gem_active_wait() waits until the request is completed before
- * returning. Note that it does not guarantee that the request is
- * retired first, see i915_gem_active_retire().
- *
- * i915_gem_active_wait() returns immediately if the active
- * request is already complete.
- */
-static inline int __must_check
-i915_gem_active_wait(const struct i915_gem_active *active, struct mutex *mutex)
-{
-	struct drm_i915_gem_request *request;
-	long ret;
-
-	request = i915_gem_active_peek(active, mutex);
-	if (!request)
-		return 0;
-
-	ret = i915_wait_request(request,
-				I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
-				MAX_SCHEDULE_TIMEOUT);
-	return ret < 0 ? ret : 0;
-}
-
-/**
- * i915_gem_active_wait_unlocked - waits until the request is completed
+ * i915_gem_active_wait - waits until the request is completed
  * @active - the active request on which to wait
  * @flags - how to wait
  * @timeout - how long to wait at most
  * @rps - userspace client to charge for a waitboost
  *
- * i915_gem_active_wait_unlocked() waits until the request is completed before
+ * i915_gem_active_wait() waits until the request is completed before
  * returning, without requiring any locks to be held. Note that it does not
  * retire any requests before returning.
  *
@@ -618,8 +591,7 @@ i915_gem_active_wait(const struct i915_gem_active *active, struct mutex *mutex)
  * Returns 0 if successful, or a negative error code.
  */
 static inline int
-i915_gem_active_wait_unlocked(const struct i915_gem_active *active,
-			      unsigned int flags)
+i915_gem_active_wait(const struct i915_gem_active *active, unsigned int flags)
 {
 	struct drm_i915_gem_request *request;
 	long ret = 0;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 88ed21f1135b..80f2db3d63d9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -503,7 +503,7 @@ static inline int intel_engine_idle(struct intel_engine_cs *engine,
 				    unsigned int flags)
 {
 	/* Wait upon the last request to be completed */
-	return i915_gem_active_wait_unlocked(&engine->last_request, flags);
+	return i915_gem_active_wait(&engine->last_request, flags);
 }
 
 int intel_init_render_ring_buffer(struct intel_engine_cs *engine);
-- 
2.9.3


* [PATCH 05/18] drm/i915: Move GEM activity tracking into a common struct reservation_object
From: Chris Wilson @ 2016-09-14  6:52 UTC
  To: intel-gfx

In preparation for supporting many distinct timelines, we need to expand
the activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)

Caveats:

 * busy-ioctl: the modern interface is completely thrown away,
regressing us back to before

commit e9808edd98679680804dfbc42c5ee8f1aa91f617 [v3.10]
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jul 4 12:25:08 2012 +0100

    drm/i915: Return a mask of the active rings in the high word of busy_ioctl

 * local engine selection and semaphore avoidance is lost for CS flips

 * non-blocking atomic modesets take a step backwards as the wait for
render completion now blocks the ioctl. A subsequent patch fixes this by
awaiting the rendering through a fence instead.

 * dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 72s)

 * loss of object-level retirement callbacks

 * minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
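
As a sketch of what the common reservation_object buys us: a lockless
idle check on an object reduces to a single fence query (this is the
shape of the new wait-ioctl fast path added below):

        /* Test all fences: shared readers and the exclusive writer. */
        if (reservation_object_test_signaled_rcu(obj->resv, true))
                return 0; /* already idle, nothing to wait for */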

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |  18 +--
 drivers/gpu/drm/i915/i915_drv.h            |  43 +----
 drivers/gpu/drm/i915/i915_gem.c            | 252 +++--------------------------
 drivers/gpu/drm/i915/i915_gem_batch_pool.c |   3 +-
 drivers/gpu/drm/i915/i915_gem_dmabuf.c     |  46 +-----
 drivers/gpu/drm/i915/i915_gem_dmabuf.h     |  45 ------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  71 ++------
 drivers/gpu/drm/i915/i915_gem_gtt.c        |  29 ++++
 drivers/gpu/drm/i915/i915_gem_gtt.h        |   1 +
 drivers/gpu/drm/i915/i915_gem_request.c    |  48 +++---
 drivers/gpu/drm/i915/i915_gem_request.h    |  35 +---
 drivers/gpu/drm/i915/i915_gpu_error.c      |   6 +-
 drivers/gpu/drm/i915/intel_atomic_plane.c  |   2 -
 drivers/gpu/drm/i915/intel_display.c       | 122 +++-----------
 drivers/gpu/drm/i915/intel_drv.h           |   3 -
 include/uapi/drm/i915_drm.h                |  33 ----
 16 files changed, 127 insertions(+), 630 deletions(-)
 delete mode 100644 drivers/gpu/drm/i915/i915_gem_dmabuf.h

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 64702cc68e3a..c4e7532c5b6a 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -134,15 +134,13 @@ static void
 describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 {
 	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
-	struct intel_engine_cs *engine;
 	struct i915_vma *vma;
 	unsigned int frontbuffer_bits;
 	int pin_count = 0;
-	enum intel_engine_id id;
 
 	lockdep_assert_held(&obj->base.dev->struct_mutex);
 
-	seq_printf(m, "%pK: %c%c%c%c%c %8zdKiB %02x %02x [ ",
+	seq_printf(m, "%pK: %c%c%c%c%c %8zdKiB %02x %02x %s%s%s",
 		   &obj->base,
 		   get_active_flag(obj),
 		   get_pin_flag(obj),
@@ -151,14 +149,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		   get_pin_mapped_flag(obj),
 		   obj->base.size / 1024,
 		   obj->base.read_domains,
-		   obj->base.write_domain);
-	for_each_engine_id(engine, dev_priv, id)
-		seq_printf(m, "%x ",
-			   i915_gem_active_get_seqno(&obj->last_read[id],
-						     &obj->base.dev->struct_mutex));
-	seq_printf(m, "] %x %s%s%s",
-		   i915_gem_active_get_seqno(&obj->last_write,
-					     &obj->base.dev->struct_mutex),
+		   obj->base.write_domain,
 		   i915_cache_level_str(dev_priv, obj->cache_level),
 		   obj->dirty ? " dirty" : "",
 		   obj->madv == I915_MADV_DONTNEED ? " purgeable" : "");
@@ -198,11 +189,6 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		seq_printf(m, " (%s mappable)", s);
 	}
 
-	engine = i915_gem_active_get_engine(&obj->last_write,
-					    &dev_priv->drm.struct_mutex);
-	if (engine)
-		seq_printf(m, " (%s)", engine->name);
-
 	frontbuffer_bits = atomic_read(&obj->frontbuffer_bits);
 	if (frontbuffer_bits)
 		seq_printf(m, " (frontbuffer: 0x%03x)", frontbuffer_bits);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index af0212032b64..2bcab3087e8c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -41,6 +41,7 @@
 #include <linux/intel-iommu.h>
 #include <linux/kref.h>
 #include <linux/pm_qos.h>
+#include <linux/reservation.h>
 #include <linux/shmem_fs.h>
 
 #include <drm/drmP.h>
@@ -2198,15 +2199,6 @@ struct drm_i915_gem_object {
 	struct list_head batch_pool_link;
 
 	unsigned long flags;
-	/**
-	 * This is set if the object is on the active lists (has pending
-	 * rendering and so a non-zero seqno), and is not set if it i s on
-	 * inactive (ready to be unbound) list.
-	 */
-#define I915_BO_ACTIVE_SHIFT 0
-#define I915_BO_ACTIVE_MASK ((1 << I915_NUM_ENGINES) - 1)
-#define __I915_BO_ACTIVE(bo) \
-	((READ_ONCE((bo)->flags) >> I915_BO_ACTIVE_SHIFT) & I915_BO_ACTIVE_MASK)
 
 	/**
 	 * This is set if the object has been written to since last bound
@@ -2245,6 +2237,7 @@ struct drm_i915_gem_object {
 
 	/** Count of VMA actually bound by this object */
 	unsigned int bind_count;
+	unsigned int active_count;
 	unsigned int pin_display;
 
 	struct sg_table *pages;
@@ -2265,8 +2258,7 @@ struct drm_i915_gem_object {
 	 * read request. This allows for the CPU to read from an active
 	 * buffer by only waiting for the write to complete.
 	 */
-	struct i915_gem_active last_read[I915_NUM_ENGINES];
-	struct i915_gem_active last_write;
+	struct reservation_object *resv;
 
 	/** References from framebuffers, locks out tiling changes. */
 	unsigned long framebuffer_references;
@@ -2289,6 +2281,8 @@ struct drm_i915_gem_object {
 			struct work_struct *work;
 		} userptr;
 	};
+
+	struct reservation_object __builtin_resv;
 };
 
 static inline struct drm_i915_gem_object *
@@ -2347,35 +2341,10 @@ i915_gem_object_has_struct_page(const struct drm_i915_gem_object *obj)
 	return obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE;
 }
 
-static inline unsigned long
-i915_gem_object_get_active(const struct drm_i915_gem_object *obj)
-{
-	return (obj->flags >> I915_BO_ACTIVE_SHIFT) & I915_BO_ACTIVE_MASK;
-}
-
 static inline bool
 i915_gem_object_is_active(const struct drm_i915_gem_object *obj)
 {
-	return i915_gem_object_get_active(obj);
-}
-
-static inline void
-i915_gem_object_set_active(struct drm_i915_gem_object *obj, int engine)
-{
-	obj->flags |= BIT(engine + I915_BO_ACTIVE_SHIFT);
-}
-
-static inline void
-i915_gem_object_clear_active(struct drm_i915_gem_object *obj, int engine)
-{
-	obj->flags &= ~BIT(engine + I915_BO_ACTIVE_SHIFT);
-}
-
-static inline bool
-i915_gem_object_has_active_engine(const struct drm_i915_gem_object *obj,
-				  int engine)
-{
-	return obj->flags & BIT(engine + I915_BO_ACTIVE_SHIFT);
+	return obj->active_count;
 }
 
 static inline unsigned int
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d9214c9d31d2..ab8d10388581 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -29,7 +29,6 @@
 #include <drm/drm_vma_manager.h>
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
-#include "i915_gem_dmabuf.h"
 #include "i915_vgpu.h"
 #include "i915_trace.h"
 #include "intel_drv.h"
@@ -441,11 +440,6 @@ i915_gem_object_wait(struct drm_i915_gem_object *obj,
 		     long timeout,
 		     struct intel_rps_client *rps)
 {
-	struct reservation_object *resv;
-	struct i915_gem_active *active;
-	unsigned long active_mask;
-	int idx;
-
 	might_sleep();
 #if IS_ENABLED(CONFIG_LOCKDEP)
 	GEM_BUG_ON(!!lockdep_is_held(&obj->base.dev->struct_mutex) !=
@@ -453,33 +447,9 @@ i915_gem_object_wait(struct drm_i915_gem_object *obj,
 #endif
 	GEM_BUG_ON(timeout < 0);
 
-	if (flags & I915_WAIT_ALL) {
-		active = obj->last_read;
-		active_mask = i915_gem_object_get_active(obj);
-	} else {
-		active_mask = 1;
-		active = &obj->last_write;
-	}
-
-	for_each_active(active_mask, idx) {
-		struct drm_i915_gem_request *request;
-
-		request = i915_gem_active_get_unlocked(&active[idx]);
-		if (request) {
-			timeout = i915_gem_object_wait_fence(&request->fence,
-							     flags, timeout,
-							     rps);
-			i915_gem_request_put(request);
-		}
-		if (timeout < 0)
-			return timeout;
-	}
-
-	resv = i915_gem_object_get_dmabuf_resv(obj);
-	if (resv)
-		timeout = i915_gem_object_wait_reservation(resv,
-							   flags, timeout,
-							   rps);
+	timeout = i915_gem_object_wait_reservation(obj->resv,
+						   flags, timeout,
+						   rps);
 	return timeout < 0 ? timeout : timeout > 0 ? 0 : -ETIME;
 }
 
@@ -2586,41 +2556,6 @@ err:
 	return ERR_PTR(ret);
 }
 
-static void
-i915_gem_object_retire__write(struct i915_gem_active *active,
-			      struct drm_i915_gem_request *request)
-{
-	struct drm_i915_gem_object *obj =
-		container_of(active, struct drm_i915_gem_object, last_write);
-
-	intel_fb_obj_flush(obj, true, ORIGIN_CS);
-}
-
-static void
-i915_gem_object_retire__read(struct i915_gem_active *active,
-			     struct drm_i915_gem_request *request)
-{
-	int idx = request->engine->id;
-	struct drm_i915_gem_object *obj =
-		container_of(active, struct drm_i915_gem_object, last_read[idx]);
-
-	GEM_BUG_ON(!i915_gem_object_has_active_engine(obj, idx));
-
-	i915_gem_object_clear_active(obj, idx);
-	if (i915_gem_object_is_active(obj))
-		return;
-
-	/* Bump our place on the bound list to keep it roughly in LRU order
-	 * so that we don't steal from recently used but inactive objects
-	 * (unless we are forced to ofc!)
-	 */
-	if (obj->bind_count)
-		list_move_tail(&obj->global_list,
-			       &request->i915->mm.bound_list);
-
-	i915_gem_object_put(obj);
-}
-
 static bool i915_context_is_banned(const struct i915_gem_context *ctx)
 {
 	unsigned long elapsed;
@@ -2923,6 +2858,16 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	if (!obj)
 		return -ENOENT;
 
+	if (reservation_object_test_signaled_rcu(obj->resv, true)) {
+		ret = 0;
+		goto out;
+	}
+
+	if (!args->timeout_ns) {
+		ret = -ETIME;
+		goto out;
+	}
+
 	start = ktime_get();
 
 	ret = i915_gem_object_wait(obj,
@@ -2936,8 +2881,8 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 			args->timeout_ns = 0;
 	}
 
+out:
 	i915_gem_object_put_unlocked(obj);
-
 	return ret;
 }
 
@@ -3956,173 +3901,18 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 	return vma;
 }
 
-static __always_inline unsigned int __busy_read_flag(unsigned int id)
-{
-	/* Note that we could alias engines in the execbuf API, but
-	 * that would be very unwise as it prevents userspace from
-	 * fine control over engine selection. Ahem.
-	 *
-	 * This should be something like EXEC_MAX_ENGINE instead of
-	 * I915_NUM_ENGINES.
-	 */
-	BUILD_BUG_ON(I915_NUM_ENGINES > 16);
-	return 0x10000 << id;
-}
-
-static __always_inline unsigned int __busy_write_id(unsigned int id)
-{
-	/* The uABI guarantees an active writer is also amongst the read
-	 * engines. This would be true if we accessed the activity tracking
-	 * under the lock, but as we perform the lookup of the object and
-	 * its activity locklessly we can not guarantee that the last_write
-	 * being active implies that we have set the same engine flag from
-	 * last_read - hence we always set both read and write busy for
-	 * last_write.
-	 */
-	return id | __busy_read_flag(id);
-}
-
-static __always_inline unsigned int
-__busy_set_if_active(const struct i915_gem_active *active,
-		     unsigned int (*flag)(unsigned int id))
-{
-	struct drm_i915_gem_request *request;
-
-	request = rcu_dereference(active->request);
-	if (!request || i915_gem_request_completed(request))
-		return 0;
-
-	/* This is racy. See __i915_gem_active_get_rcu() for an in detail
-	 * discussion of how to handle the race correctly, but for reporting
-	 * the busy state we err on the side of potentially reporting the
-	 * wrong engine as being busy (but we guarantee that the result
-	 * is at least self-consistent).
-	 *
-	 * As we use SLAB_DESTROY_BY_RCU, the request may be reallocated
-	 * whilst we are inspecting it, even under the RCU read lock as we are.
-	 * This means that there is a small window for the engine and/or the
-	 * seqno to have been overwritten. The seqno will always be in the
-	 * future compared to the intended, and so we know that if that
-	 * seqno is idle (on whatever engine) our request is idle and the
-	 * return 0 above is correct.
-	 *
-	 * The issue is that if the engine is switched, it is just as likely
-	 * to report that it is busy (but since the switch happened, we know
-	 * the request should be idle). So there is a small chance that a busy
-	 * result is actually the wrong engine.
-	 *
-	 * So why don't we care?
-	 *
-	 * For starters, the busy ioctl is a heuristic that is by definition
-	 * racy. Even with perfect serialisation in the driver, the hardware
-	 * state is constantly advancing - the state we report to the user
-	 * is stale.
-	 *
-	 * The critical information for the busy-ioctl is whether the object
-	 * is idle as userspace relies on that to detect whether its next
-	 * access will stall, or if it has missed submitting commands to
-	 * the hardware allowing the GPU to stall. We never generate a
-	 * false-positive for idleness, thus busy-ioctl is reliable at the
-	 * most fundamental level, and we maintain the guarantee that a
-	 * busy object left to itself will eventually become idle (and stay
-	 * idle!).
-	 *
-	 * We allow ourselves the leeway of potentially misreporting the busy
-	 * state because that is an optimisation heuristic that is constantly
-	 * in flux. Being quickly able to detect the busy/idle state is much
-	 * more important than accurate logging of exactly which engines were
-	 * busy.
-	 *
-	 * For accuracy in reporting the engine, we could use
-	 *
-	 *	result = 0;
-	 *	request = __i915_gem_active_get_rcu(active);
-	 *	if (request) {
-	 *		if (!i915_gem_request_completed(request))
-	 *			result = flag(request->engine->exec_id);
-	 *		i915_gem_request_put(request);
-	 *	}
-	 *
-	 * but that still remains susceptible to both hardware and userspace
-	 * races. So we accept making the result of that race slightly worse,
-	 * given the rarity of the race and its low impact on the result.
-	 */
-	return flag(READ_ONCE(request->engine->exec_id));
-}
-
-static __always_inline unsigned int
-busy_check_reader(const struct i915_gem_active *active)
-{
-	return __busy_set_if_active(active, __busy_read_flag);
-}
-
-static __always_inline unsigned int
-busy_check_writer(const struct i915_gem_active *active)
-{
-	return __busy_set_if_active(active, __busy_write_id);
-}
-
 int
 i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 		    struct drm_file *file)
 {
 	struct drm_i915_gem_busy *args = data;
 	struct drm_i915_gem_object *obj;
-	unsigned long active;
 
 	obj = i915_gem_object_lookup(file, args->handle);
 	if (!obj)
 		return -ENOENT;
 
-	args->busy = 0;
-	active = __I915_BO_ACTIVE(obj);
-	if (active) {
-		int idx;
-
-		/* Yes, the lookups are intentionally racy.
-		 *
-		 * First, we cannot simply rely on __I915_BO_ACTIVE. We have
-		 * to regard the value as stale and as our ABI guarantees
-		 * forward progress, we confirm the status of each active
-		 * request with the hardware.
-		 *
-		 * Even though we guard the pointer lookup by RCU, that only
-		 * guarantees that the pointer and its contents remain
-		 * dereferencable and does *not* mean that the request we
-		 * have is the same as the one being tracked by the object.
-		 *
-		 * Consider that we lookup the request just as it is being
-		 * retired and freed. We take a local copy of the pointer,
-		 * but before we add its engine into the busy set, the other
-		 * thread reallocates it and assigns it to a task on another
-		 * engine with a fresh and incomplete seqno. Guarding against
-		 * that requires careful serialisation and reference counting,
-		 * i.e. using __i915_gem_active_get_request_rcu(). We don't,
-		 * instead we expect that if the result is busy, which engines
-		 * are busy is not completely reliable - we only guarantee
-		 * that the object was busy.
-		 */
-		rcu_read_lock();
-
-		for_each_active(active, idx)
-			args->busy |= busy_check_reader(&obj->last_read[idx]);
-
-		/* For ABI sanity, we only care that the write engine is in
-		 * the set of read engines. This should be ensured by the
-		 * ordering of setting last_read/last_write in
-		 * i915_vma_move_to_active(), and then in reverse in retire.
-		 * However, for good measure, we always report the last_write
-		 * request as a busy read as well as being a busy write.
-		 *
-		 * We don't care that the set of active read/write engines
-		 * may change during construction of the result, as it is
-		 * equally liable to change before userspace can inspect
-		 * the result.
-		 */
-		args->busy |= busy_check_writer(&obj->last_write);
-
-		rcu_read_unlock();
-	}
+	args->busy = !reservation_object_test_signaled_rcu(obj->resv, true);
 
 	i915_gem_object_put_unlocked(obj);
 	return 0;
@@ -4189,20 +3979,16 @@ unlock:
 void i915_gem_object_init(struct drm_i915_gem_object *obj,
 			  const struct drm_i915_gem_object_ops *ops)
 {
-	int i;
-
 	INIT_LIST_HEAD(&obj->global_list);
-	for (i = 0; i < I915_NUM_ENGINES; i++)
-		init_request_active(&obj->last_read[i],
-				    i915_gem_object_retire__read);
-	init_request_active(&obj->last_write,
-			    i915_gem_object_retire__write);
 	INIT_LIST_HEAD(&obj->obj_exec_link);
 	INIT_LIST_HEAD(&obj->vma_list);
 	INIT_LIST_HEAD(&obj->batch_pool_link);
 
 	obj->ops = ops;
 
+	reservation_object_init(&obj->__builtin_resv);
+	obj->resv = &obj->__builtin_resv;
+
 	obj->frontbuffer_ggtt_origin = ORIGIN_GTT;
 	obj->madv = I915_MADV_WILLNEED;
 
@@ -4349,6 +4135,8 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
 	if (obj->ops->release)
 		obj->ops->release(obj);
 
+	reservation_object_fini(&obj->__builtin_resv);
+
 	drm_gem_object_release(&obj->base);
 	i915_gem_info_remove_obj(dev_priv, obj->base.size);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index ed989596d9a3..bbfea965c593 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -114,8 +114,7 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 
 	list_for_each_entry_safe(tmp, next, list, batch_pool_link) {
 		/* The batches are strictly LRU ordered */
-		if (!i915_gem_active_is_idle(&tmp->last_read[pool->engine->id],
-					     &tmp->base.dev->struct_mutex))
+		if (i915_gem_object_is_active(tmp))
 			break;
 
 		/* While we're looping, do some clean up */
diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index 10265bb35604..b0a429aeafd9 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -222,49 +222,6 @@ static const struct dma_buf_ops i915_dmabuf_ops =  {
 	.end_cpu_access = i915_gem_end_cpu_access,
 };
 
-static void export_fences(struct drm_i915_gem_object *obj,
-			  struct dma_buf *dma_buf)
-{
-	struct reservation_object *resv = dma_buf->resv;
-	struct drm_i915_gem_request *req;
-	unsigned long active;
-	int idx;
-
-	active = __I915_BO_ACTIVE(obj);
-	if (!active)
-		return;
-
-	/* Serialise with execbuf to prevent concurrent fence-loops */
-	mutex_lock(&obj->base.dev->struct_mutex);
-
-	/* Mark the object for future fences before racily adding old fences */
-	obj->base.dma_buf = dma_buf;
-
-	ww_mutex_lock(&resv->lock, NULL);
-
-	for_each_active(active, idx) {
-		req = i915_gem_active_get(&obj->last_read[idx],
-					  &obj->base.dev->struct_mutex);
-		if (!req)
-			continue;
-
-		if (reservation_object_reserve_shared(resv) == 0)
-			reservation_object_add_shared_fence(resv, &req->fence);
-
-		i915_gem_request_put(req);
-	}
-
-	req = i915_gem_active_get(&obj->last_write,
-				  &obj->base.dev->struct_mutex);
-	if (req) {
-		reservation_object_add_excl_fence(resv, &req->fence);
-		i915_gem_request_put(req);
-	}
-
-	ww_mutex_unlock(&resv->lock);
-	mutex_unlock(&obj->base.dev->struct_mutex);
-}
-
 struct dma_buf *i915_gem_prime_export(struct drm_device *dev,
 				      struct drm_gem_object *gem_obj, int flags)
 {
@@ -276,6 +233,7 @@ struct dma_buf *i915_gem_prime_export(struct drm_device *dev,
 	exp_info.size = gem_obj->size;
 	exp_info.flags = flags;
 	exp_info.priv = gem_obj;
+	exp_info.resv = obj->resv;
 
 	if (obj->ops->dmabuf_export) {
 		int ret = obj->ops->dmabuf_export(obj);
@@ -287,7 +245,6 @@ struct dma_buf *i915_gem_prime_export(struct drm_device *dev,
 	if (IS_ERR(dma_buf))
 		return dma_buf;
 
-	export_fences(obj, dma_buf);
 	return dma_buf;
 }
 
@@ -350,6 +307,7 @@ struct drm_gem_object *i915_gem_prime_import(struct drm_device *dev,
 	drm_gem_private_object_init(dev, &obj->base, dma_buf->size);
 	i915_gem_object_init(obj, &i915_gem_object_dmabuf_ops);
 	obj->base.import_attach = attach;
+	obj->resv = dma_buf->resv;
 
 	/* We use GTT as shorthand for a coherent domain, one that is
 	 * neither in the GPU cache nor in the CPU cache, where all
diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.h b/drivers/gpu/drm/i915/i915_gem_dmabuf.h
deleted file mode 100644
index 91315557e421..000000000000
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.h
+++ /dev/null
@@ -1,45 +0,0 @@
-/*
- * Copyright 2016 Intel Corporation
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice (including the next
- * paragraph) shall be included in all copies or substantial portions of the
- * Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
- * DEALINGS IN THE SOFTWARE.
- *
- */
-
-#ifndef _I915_GEM_DMABUF_H_
-#define _I915_GEM_DMABUF_H_
-
-#include <linux/dma-buf.h>
-
-static inline struct reservation_object *
-i915_gem_object_get_dmabuf_resv(struct drm_i915_gem_object *obj)
-{
-	struct dma_buf *dma_buf;
-
-	if (obj->base.dma_buf)
-		dma_buf = obj->base.dma_buf;
-	else if (obj->base.import_attach)
-		dma_buf = obj->base.import_attach->dmabuf;
-	else
-		return NULL;
-
-	return dma_buf->resv;
-}
-
-#endif
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 33c85227643d..6b5175ee824c 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -34,7 +34,6 @@
 #include <drm/i915_drm.h>
 
 #include "i915_drv.h"
-#include "i915_gem_dmabuf.h"
 #include "i915_trace.h"
 #include "intel_drv.h"
 #include "intel_frontbuffer.h"
@@ -552,20 +551,6 @@ repeat:
 	return 0;
 }
 
-static bool object_is_idle(struct drm_i915_gem_object *obj)
-{
-	unsigned long active = i915_gem_object_get_active(obj);
-	int idx;
-
-	for_each_active(active, idx) {
-		if (!i915_gem_active_is_idle(&obj->last_read[idx],
-					     &obj->base.dev->struct_mutex))
-			return false;
-	}
-
-	return true;
-}
-
 static int
 i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 				   struct eb_vmas *eb,
@@ -650,7 +635,8 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 	}
 
 	/* We can't wait for rendering with pagefaults disabled */
-	if (pagefault_disabled() && !object_is_idle(obj))
+	if (pagefault_disabled() &&
+	    !reservation_object_test_signaled_rcu(obj->resv, true))
 		return -EFAULT;
 
 	ret = relocate_entry(obj, reloc, cache, target_offset);
@@ -1111,44 +1097,20 @@ err:
 	return ret;
 }
 
-static unsigned int eb_other_engines(struct drm_i915_gem_request *req)
-{
-	unsigned int mask;
-
-	mask = ~intel_engine_flag(req->engine) & I915_BO_ACTIVE_MASK;
-	mask <<= I915_BO_ACTIVE_SHIFT;
-
-	return mask;
-}
-
 static int
 i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
 				struct list_head *vmas)
 {
-	const unsigned int other_rings = eb_other_engines(req);
 	struct i915_vma *vma;
 	int ret;
 
 	list_for_each_entry(vma, vmas, exec_list) {
 		struct drm_i915_gem_object *obj = vma->obj;
-		struct reservation_object *resv;
-
-		if (obj->flags & other_rings) {
-			ret = i915_gem_request_await_object
-				(req, obj, obj->base.pending_write_domain);
-			if (ret)
-				return ret;
-		}
 
-		resv = i915_gem_object_get_dmabuf_resv(obj);
-		if (resv) {
-			ret = i915_sw_fence_await_reservation
-				(&req->submit, resv, &i915_fence_ops,
-				 obj->base.pending_write_domain, 10*HZ,
-				 GFP_KERNEL | __GFP_NOWARN);
-			if (ret < 0)
-				return ret;
-		}
+		ret = i915_gem_request_await_object
+			(req, obj, obj->base.pending_write_domain);
+		if (ret)
+			return ret;
 
 		if (obj->base.write_domain & I915_GEM_DOMAIN_CPU)
 			i915_gem_clflush_object(obj, false);
@@ -1290,8 +1252,6 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 
 	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
 
-	obj->dirty = 1; /* be paranoid  */
-
 	/* Add a reference if we're newly entering the active list.
 	 * The order in which we add operations to the retirement queue is
 	 * vital here: mark_active adds to the start of the callback list,
@@ -1299,13 +1259,14 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 	 * add the active reference first and queue for it to be dropped
 	 * *last*.
 	 */
-	if (!i915_gem_object_is_active(obj))
+	if (!i915_vma_is_active(vma) && !obj->active_count++)
 		i915_gem_object_get(obj);
-	i915_gem_object_set_active(obj, idx);
-	i915_gem_active_set(&obj->last_read[idx], req);
+	i915_vma_set_active(vma, idx);
+	i915_gem_active_set(&vma->last_read[idx], req);
+	list_move_tail(&vma->vm_link, &vma->vm->active_list);
 
 	if (flags & EXEC_OBJECT_WRITE) {
-		i915_gem_active_set(&obj->last_write, req);
+		i915_gem_active_set(&vma->last_write, req);
 
 		intel_fb_obj_invalidate(obj, ORIGIN_CS);
 
@@ -1315,21 +1276,13 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 
 	if (flags & EXEC_OBJECT_NEEDS_FENCE)
 		i915_gem_active_set(&vma->last_fence, req);
-
-	i915_vma_set_active(vma, idx);
-	i915_gem_active_set(&vma->last_read[idx], req);
-	list_move_tail(&vma->vm_link, &vma->vm->active_list);
 }
 
 static void eb_export_fence(struct drm_i915_gem_object *obj,
 			    struct drm_i915_gem_request *req,
 			    unsigned int flags)
 {
-	struct reservation_object *resv;
-
-	resv = i915_gem_object_get_dmabuf_resv(obj);
-	if (!resv)
-		return;
+	struct reservation_object *resv = obj->resv;
 
 	/* Ignore errors from failing to allocate the new fence, we can't
 	 * handle an error right now. Worst case should be missed
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0bb4232f66bc..92b36ab79771 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -31,6 +31,7 @@
 #include "i915_vgpu.h"
 #include "i915_trace.h"
 #include "intel_drv.h"
+#include "intel_frontbuffer.h"
 
 #define I915_GFP_DMA (GFP_KERNEL | __GFP_HIGHMEM)
 
@@ -3303,6 +3304,7 @@ i915_vma_retire(struct i915_gem_active *active,
 	const unsigned int idx = rq->engine->id;
 	struct i915_vma *vma =
 		container_of(active, struct i915_vma, last_read[idx]);
+	struct drm_i915_gem_object *obj = vma->obj;
 
 	GEM_BUG_ON(!i915_vma_has_active_engine(vma, idx));
 
@@ -3313,6 +3315,31 @@ i915_vma_retire(struct i915_gem_active *active,
 	list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
 	if (unlikely(i915_vma_is_closed(vma) && !i915_vma_is_pinned(vma)))
 		WARN_ON(i915_vma_unbind(vma));
+
+	GEM_BUG_ON(!i915_gem_object_is_active(obj));
+	if (--obj->active_count)
+		return;
+
+	/* Bump our place on the bound list to keep it roughly in LRU order
+	 * so that we don't steal from recently used but inactive objects
+	 * (unless we are forced to ofc!)
+	 */
+	if (obj->bind_count)
+		list_move_tail(&obj->global_list, &rq->i915->mm.bound_list);
+
+	obj->dirty = 1; /* be paranoid  */
+
+	i915_gem_object_put(obj);
+}
+
+static void
+i915_ggtt_retire__write(struct i915_gem_active *active,
+			struct drm_i915_gem_request *request)
+{
+	struct i915_vma *vma =
+		container_of(active, struct i915_vma, last_write);
+
+	intel_fb_obj_flush(vma->obj, true, ORIGIN_CS);
 }
 
 void i915_vma_destroy(struct i915_vma *vma)
@@ -3356,6 +3383,8 @@ __i915_vma_create(struct drm_i915_gem_object *obj,
 	INIT_LIST_HEAD(&vma->exec_list);
 	for (i = 0; i < ARRAY_SIZE(vma->last_read); i++)
 		init_request_active(&vma->last_read[i], i915_vma_retire);
+	init_request_active(&vma->last_write,
+			    i915_is_ggtt(vm) ? i915_ggtt_retire__write : NULL);
 	init_request_active(&vma->last_fence, NULL);
 	list_add(&vma->vm_link, &vm->unbound_list);
 	vma->vm = vm;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index ec78be2f8c77..21f5d9657271 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -211,6 +211,7 @@ struct i915_vma {
 
 	unsigned int active;
 	struct i915_gem_active last_read[I915_NUM_ENGINES];
+	struct i915_gem_active last_write;
 	struct i915_gem_active last_fence;
 
 	/**
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 687537d91be8..bc27c80be1e4 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -193,6 +193,8 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 	}
 
 	i915_gem_context_put(request->ctx);
+
+	fence_signal(&request->fence);
 	i915_gem_request_put(request);
 }
 
@@ -538,33 +540,41 @@ i915_gem_request_await_object(struct drm_i915_gem_request *to,
 			      struct drm_i915_gem_object *obj,
 			      bool write)
 {
-	struct i915_gem_active *active;
-	unsigned long active_mask;
-	int idx;
+	struct fence *excl;
+	int ret = 0;
 
 	if (write) {
-		active_mask = i915_gem_object_get_active(obj);
-		active = obj->last_read;
+		struct fence **shared;
+		unsigned int count, i;
+
+		ret = reservation_object_get_fences_rcu(obj->resv,
+							&excl, &count, &shared);
+		if (ret)
+			return ret;
+
+		for (i = 0; i < count; i++) {
+			ret = i915_gem_request_await_fence(to, shared[i]);
+			if (ret)
+				break;
+
+			fence_put(shared[i]);
+		}
+
+		for (; i < count; i++)
+			fence_put(shared[i]);
+		kfree(shared);
 	} else {
-		active_mask = 1;
-		active = &obj->last_write;
+		excl = reservation_object_get_excl_rcu(obj->resv);
 	}
 
-	for_each_active(active_mask, idx) {
-		struct drm_i915_gem_request *request;
-		int ret;
-
-		request = i915_gem_active_peek(&active[idx],
-					       &obj->base.dev->struct_mutex);
-		if (!request)
-			continue;
+	if (excl) {
+		if (ret == 0)
+			ret = i915_gem_request_await_fence(to, excl);
 
-		ret = i915_gem_request_await_request(to, request);
-		if (ret)
-			return ret;
+		fence_put(excl);
 	}
 
-	return 0;
+	return ret;
 }
 
 static void i915_gem_mark_busy(const struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 45998eedda2c..fe75026c60e0 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -554,22 +554,7 @@ i915_gem_active_isset(const struct i915_gem_active *active)
 }
 
 /**
- * i915_gem_active_is_idle - report whether the active tracker is idle
- * @active - the active tracker
- *
- * i915_gem_active_is_idle() returns true if the active tracker is currently
- * unassigned or if the request is complete (but not yet retired). Requires
- * the caller to hold struct_mutex (but that can be relaxed if desired).
- */
-static inline bool
-i915_gem_active_is_idle(const struct i915_gem_active *active,
-			struct mutex *mutex)
-{
-	return !i915_gem_active_peek(active, mutex);
-}
-
-/**
- * i915_gem_active_wait- waits until the request is completed
+ * i915_gem_active_wait - waits until the request is completed
  * @active - the active request on which to wait
  * @flags - how to wait
  * @timeout - how long to wait at most
@@ -639,24 +624,6 @@ i915_gem_active_retire(struct i915_gem_active *active,
 	return 0;
 }
 
-/* Convenience functions for peeking at state inside active's request whilst
- * guarded by the struct_mutex.
- */
-
-static inline uint32_t
-i915_gem_active_get_seqno(const struct i915_gem_active *active,
-			  struct mutex *mutex)
-{
-	return i915_gem_request_get_seqno(i915_gem_active_peek(active, mutex));
-}
-
-static inline struct intel_engine_cs *
-i915_gem_active_get_engine(const struct i915_gem_active *active,
-			   struct mutex *mutex)
-{
-	return i915_gem_request_get_engine(i915_gem_active_peek(active, mutex));
-}
-
 #define for_each_active(mask, idx) \
 	for (; mask ? idx = ffs(mask) - 1, 1 : 0; mask &= ~BIT(idx))
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 334f15df7c8d..3d3f410d9aa9 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -795,9 +795,9 @@ static void capture_bo(struct drm_i915_error_buffer *err,
 	err->name = obj->base.name;
 
 	for (i = 0; i < I915_NUM_ENGINES; i++)
-		err->rseqno[i] = __active_get_seqno(&obj->last_read[i]);
-	err->wseqno = __active_get_seqno(&obj->last_write);
-	err->engine = __active_get_engine_id(&obj->last_write);
+		err->rseqno[i] = __active_get_seqno(&vma->last_read[i]);
+	err->wseqno = __active_get_seqno(&vma->last_write);
+	err->engine = __active_get_engine_id(&vma->last_write);
 
 	err->gtt_offset = vma->node.start;
 	err->read_domains = obj->base.read_domains;
diff --git a/drivers/gpu/drm/i915/intel_atomic_plane.c b/drivers/gpu/drm/i915/intel_atomic_plane.c
index b82de3072d4f..a8927929c740 100644
--- a/drivers/gpu/drm/i915/intel_atomic_plane.c
+++ b/drivers/gpu/drm/i915/intel_atomic_plane.c
@@ -84,7 +84,6 @@ intel_plane_duplicate_state(struct drm_plane *plane)
 	state = &intel_state->base;
 
 	__drm_atomic_helper_plane_duplicate_state(plane, state);
-	intel_state->wait_req = NULL;
 
 	return state;
 }
@@ -101,7 +100,6 @@ void
 intel_plane_destroy_state(struct drm_plane *plane,
 			  struct drm_plane_state *state)
 {
-	WARN_ON(state && to_intel_plane_state(state)->wait_req);
 	drm_atomic_helper_plane_destroy_state(plane, state);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index a5d61f0eea1d..a3dbeb50c41d 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -37,7 +37,6 @@
 #include "intel_frontbuffer.h"
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
-#include "i915_gem_dmabuf.h"
 #include "intel_dsi.h"
 #include "i915_trace.h"
 #include <drm/drm_atomic.h>
@@ -11917,8 +11916,6 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
 static bool use_mmio_flip(struct intel_engine_cs *engine,
 			  struct drm_i915_gem_object *obj)
 {
-	struct reservation_object *resv;
-
 	/*
 	 * This is not being used for older platforms, because
 	 * non-availability of flip done interrupt forces us to use
@@ -11935,17 +11932,8 @@ static bool use_mmio_flip(struct intel_engine_cs *engine,
 
 	if (i915.use_mmio_flip < 0)
 		return false;
-	else if (i915.use_mmio_flip > 0)
-		return true;
-	else if (i915.enable_execlists)
-		return true;
 
-	resv = i915_gem_object_get_dmabuf_resv(obj);
-	if (resv && !reservation_object_test_signaled_rcu(resv, false))
-		return true;
-
-	return engine != i915_gem_active_get_engine(&obj->last_write,
-						    &obj->base.dev->struct_mutex);
+	return true;
 }
 
 static void skl_do_mmio_flip(struct intel_crtc *intel_crtc,
@@ -12018,17 +12006,8 @@ static void intel_mmio_flip_work_func(struct work_struct *w)
 	struct intel_framebuffer *intel_fb =
 		to_intel_framebuffer(crtc->base.primary->fb);
 	struct drm_i915_gem_object *obj = intel_fb->obj;
-	struct reservation_object *resv;
 
-	if (work->flip_queued_req)
-		WARN_ON(i915_wait_request(work->flip_queued_req,
-					  0, MAX_SCHEDULE_TIMEOUT) < 0);
-
-	/* For framebuffer backed by dmabuf, wait for fence */
-	resv = i915_gem_object_get_dmabuf_resv(obj);
-	if (resv)
-		WARN_ON(reservation_object_wait_timeout_rcu(resv, false, false,
-							    MAX_SCHEDULE_TIMEOUT) < 0);
+	WARN_ON(i915_gem_object_wait(obj, 0, MAX_SCHEDULE_TIMEOUT, NULL) < 0);
 
 	intel_pipe_update_start(crtc);
 
@@ -12226,13 +12205,8 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 		if (fb->modifier[0] != old_fb->modifier[0])
 			/* vlv: DISPLAY_FLIP fails to change tiling */
 			engine = NULL;
-	} else if (IS_IVYBRIDGE(dev) || IS_HASWELL(dev)) {
-		engine = &dev_priv->engine[BCS];
 	} else if (INTEL_INFO(dev)->gen >= 7) {
-		engine = i915_gem_active_get_engine(&obj->last_write,
-						    &obj->base.dev->struct_mutex);
-		if (engine == NULL || engine->id != RCS)
-			engine = &dev_priv->engine[BCS];
+		engine = &dev_priv->engine[BCS];
 	} else {
 		engine = &dev_priv->engine[RCS];
 	}
@@ -12262,9 +12236,6 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 
 	if (mmio_flip) {
 		INIT_WORK(&work->mmio_work, intel_mmio_flip_work_func);
-
-		work->flip_queued_req = i915_gem_active_get(&obj->last_write,
-							    &obj->base.dev->struct_mutex);
 		schedule_work(&work->mmio_work);
 	} else {
 		request = i915_gem_request_alloc(engine, engine->last_context);
@@ -14036,13 +14007,10 @@ static int intel_atomic_check(struct drm_device *dev,
 }
 
 static int intel_atomic_prepare_commit(struct drm_device *dev,
-				       struct drm_atomic_state *state,
-				       bool nonblock)
+				       struct drm_atomic_state *state)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
-	struct drm_plane_state *plane_state;
 	struct drm_crtc_state *crtc_state;
-	struct drm_plane *plane;
 	struct drm_crtc *crtc;
 	int i, ret;
 
@@ -14065,30 +14033,6 @@ static int intel_atomic_prepare_commit(struct drm_device *dev,
 	ret = drm_atomic_helper_prepare_planes(dev, state);
 	mutex_unlock(&dev->struct_mutex);
 
-	if (!ret && !nonblock) {
-		for_each_plane_in_state(state, plane, plane_state, i) {
-			struct intel_plane_state *intel_plane_state =
-				to_intel_plane_state(plane_state);
-			long timeout;
-
-			if (!intel_plane_state->wait_req)
-				continue;
-
-			timeout = i915_wait_request(intel_plane_state->wait_req,
-						    I915_WAIT_INTERRUPTIBLE,
-						    MAX_SCHEDULE_TIMEOUT);
-			if (timeout < 0) {
-				/* Any hang should be swallowed by the wait */
-				WARN_ON(timeout == -EIO);
-				mutex_lock(&dev->struct_mutex);
-				drm_atomic_helper_cleanup_planes(dev, state);
-				mutex_unlock(&dev->struct_mutex);
-				ret = timeout;
-				break;
-			}
-		}
-	}
-
 	return ret;
 }
 
@@ -14280,26 +14224,11 @@ static void intel_atomic_commit_tail(struct drm_atomic_state *state)
 	struct drm_crtc_state *old_crtc_state;
 	struct drm_crtc *crtc;
 	struct intel_crtc_state *intel_cstate;
-	struct drm_plane *plane;
-	struct drm_plane_state *plane_state;
 	bool hw_check = intel_state->modeset;
 	unsigned long put_domains[I915_MAX_PIPES] = {};
 	unsigned crtc_vblank_mask = 0;
 	int i;
 
-	for_each_plane_in_state(state, plane, plane_state, i) {
-		struct intel_plane_state *intel_plane_state =
-			to_intel_plane_state(plane_state);
-
-		if (!intel_plane_state->wait_req)
-			continue;
-
-		/* EIO should be eaten, and we can't get interrupted in the
-		 * worker, and blocking commits have waited already. */
-		WARN_ON(i915_wait_request(intel_plane_state->wait_req,
-					  0, MAX_SCHEDULE_TIMEOUT) < 0);
-	}
-
 	drm_atomic_helper_wait_for_dependencies(state);
 
 	if (intel_state->modeset) {
@@ -14507,7 +14436,7 @@ static int intel_atomic_commit(struct drm_device *dev,
 
 	INIT_WORK(&state->commit_work, intel_atomic_commit_work);
 
-	ret = intel_atomic_prepare_commit(dev, state, nonblock);
+	ret = intel_atomic_prepare_commit(dev, state);
 	if (ret) {
 		DRM_DEBUG_ATOMIC("Preparing state failed with %i\n", ret);
 		return ret;
@@ -14639,7 +14568,7 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 	struct drm_framebuffer *fb = new_state->fb;
 	struct drm_i915_gem_object *obj = intel_fb_obj(fb);
 	struct drm_i915_gem_object *old_obj = intel_fb_obj(plane->state->fb);
-	struct reservation_object *resv;
+	long lret;
 	int ret = 0;
 
 	if (!obj && !old_obj)
@@ -14678,39 +14607,34 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 		return 0;
 
 	/* For framebuffer backed by dmabuf, wait for fence */
-	resv = i915_gem_object_get_dmabuf_resv(obj);
-	if (resv) {
-		long lret;
-
-		lret = reservation_object_wait_timeout_rcu(resv, false, true,
-							   MAX_SCHEDULE_TIMEOUT);
-		if (lret == -ERESTARTSYS)
-			return lret;
+	lret = i915_gem_object_wait(obj,
+				    I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
+				    MAX_SCHEDULE_TIMEOUT,
+				    NULL);
+	if (lret == -ERESTARTSYS)
+		return lret;
 
-		WARN(lret < 0, "waiting returns %li\n", lret);
-	}
+	WARN(lret < 0, "waiting returns %li\n", lret);
 
 	if (plane->type == DRM_PLANE_TYPE_CURSOR &&
 	    INTEL_INFO(dev)->cursor_needs_physical) {
 		int align = IS_I830(dev) ? 16 * 1024 : 256;
 		ret = i915_gem_object_attach_phys(obj, align);
-		if (ret)
+		if (ret) {
 			DRM_DEBUG_KMS("failed to attach phys object\n");
+			return ret;
+		}
 	} else {
 		struct i915_vma *vma;
 
 		vma = intel_pin_and_fence_fb_obj(fb, new_state->rotation);
-		if (IS_ERR(vma))
-			ret = PTR_ERR(vma);
-	}
-
-	if (ret == 0) {
-		to_intel_plane_state(new_state)->wait_req =
-			i915_gem_active_get(&obj->last_write,
-					    &obj->base.dev->struct_mutex);
+		if (IS_ERR(vma)) {
+			DRM_DEBUG_KMS("failed to pin object\n");
+			return PTR_ERR(vma);
+		}
 	}
 
-	return ret;
+	return 0;
 }
 
 /**
@@ -14728,7 +14652,6 @@ intel_cleanup_plane_fb(struct drm_plane *plane,
 {
 	struct drm_device *dev = plane->dev;
 	struct intel_plane_state *old_intel_state;
-	struct intel_plane_state *intel_state = to_intel_plane_state(plane->state);
 	struct drm_i915_gem_object *old_obj = intel_fb_obj(old_state->fb);
 	struct drm_i915_gem_object *obj = intel_fb_obj(plane->state->fb);
 
@@ -14740,9 +14663,6 @@ intel_cleanup_plane_fb(struct drm_plane *plane,
 	if (old_obj && (plane->type != DRM_PLANE_TYPE_CURSOR ||
 	    !INTEL_INFO(dev)->cursor_needs_physical))
 		intel_unpin_fb_obj(old_state->fb, old_state->rotation);
-
-	i915_gem_request_assign(&intel_state->wait_req, NULL);
-	i915_gem_request_assign(&old_intel_state->wait_req, NULL);
 }
 
 int
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index abe7a4df2e43..15e3050edeb9 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -393,9 +393,6 @@ struct intel_plane_state {
 	int scaler_id;
 
 	struct drm_intel_sprite_colorkey ckey;
-
-	/* async flip related structures */
-	struct drm_i915_gem_request *wait_req;
 };
 
 struct intel_initial_plane_config {
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 03725fe89859..93ad6f9d2496 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -860,39 +860,6 @@ struct drm_i915_gem_busy {
 	 * long as no new GPU commands are executed upon it). Due to the
 	 * asynchronous nature of the hardware, an object reported
 	 * as busy may become idle before the ioctl is completed.
-	 *
-	 * Furthermore, if the object is busy, which engine is busy is only
-	 * provided as a guide. There are race conditions which prevent the
-	 * report of which engines are busy from being always accurate.
-	 * However, the converse is not true. If the object is idle, the
-	 * result of the ioctl, that all engines are idle, is accurate.
-	 *
-	 * The returned dword is split into two fields to indicate both
-	 * the engines on which the object is being read, and the
-	 * engine on which it is currently being written (if any).
-	 *
-	 * The low word (bits 0:15) indicate if the object is being written
-	 * to by any engine (there can only be one, as the GEM implicit
-	 * synchronisation rules force writes to be serialised). Only the
-	 * engine for the last write is reported.
-	 *
-	 * The high word (bits 16:31) are a bitmask of which engines are
-	 * currently reading from the object. Multiple engines may be
-	 * reading from the object simultaneously.
-	 *
-	 * The value of each engine is the same as specified in the
-	 * EXECBUFFER2 ioctl, i.e. I915_EXEC_RENDER, I915_EXEC_BSD etc.
-	 * Note I915_EXEC_DEFAULT is a symbolic value and is mapped to
-	 * the I915_EXEC_RENDER engine for execution, and so it is never
-	 * reported as active itself. Some hardware may have parallel
-	 * execution engines, e.g. multiple media engines, which are
-	 * mapped to the same identifier in the EXECBUFFER2 ioctl and
-	 * so are not separately reported for busyness.
-	 *
-	 * Caveat emptor:
-	 * Only the boolean result of this query is reliable; that is whether
-	 * the object is idle or busy. The report of which engines are busy
-	 * should be only used as a heuristic.
 	 */
 	__u32 busy;
 };
-- 
2.9.3


* [PATCH 06/18] drm: Add reference counting to drm_atomic_state
  2016-09-14  6:52 Tracking multiple timelines (full-ppgtt) Chris Wilson
                   ` (4 preceding siblings ...)
  2016-09-14  6:52 ` [PATCH 05/18] drm/i915: Move GEM activity tracking into a common struct reservation_object Chris Wilson
@ 2016-09-14  6:52 ` Chris Wilson
  2016-09-14  6:52 ` [PATCH 07/18] drm/i915: Restore nonblocking awaits for modesetting Chris Wilson
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 51+ messages in thread
From: Chris Wilson @ 2016-09-14  6:52 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, joonas.lahtinen, dri-devel, mika.kahola

drm_atomic_state has a complicated single-owner model that tracks its
sole reference from allocation through to destruction on another
thread - or perhaps down a local error path. We can simplify this
tracking by using reference counting (at the cost of a few more atomic
operations). This is even more beneficial when the lifetime of the
state becomes more convoluted than being passed to a single worker
thread for the commit.
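
For illustration only, here is a minimal userspace sketch of the
ownership model this patch moves to (the struct and helper names below
are invented for the sketch; the kernel side uses kref_init, kref_get
and kref_put): each holder takes its own reference, and whichever
thread or error path drops the last reference frees the state.

#include <stdatomic.h>
#include <stdlib.h>

struct state {
	atomic_int ref;			/* stands in for struct kref */
};

static struct state *state_alloc(void)
{
	struct state *s = calloc(1, sizeof(*s));

	if (!s)
		abort();
	atomic_init(&s->ref, 1);	/* kref_init(): allocator holds one ref */
	return s;
}

static struct state *state_get(struct state *s)
{
	atomic_fetch_add(&s->ref, 1);	/* kref_get() */
	return s;
}

static void state_put(struct state *s)
{
	if (atomic_fetch_sub(&s->ref, 1) == 1)	/* kref_put() hit zero */
		free(s);
}

int main(void)
{
	struct state *s = state_alloc();

	state_get(s);	/* hand a second reference to the commit worker */
	state_put(s);	/* the worker finishes and drops its reference */
	state_put(s);	/* the caller drops the last reference: freed here */
	return 0;
}

With that model every exit path, success or failure, blocking or
nonblocking, simply calls put, which is what lets the error handling
in the conversions below collapse.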

v2: Double check non-intel atomic_commit functions for missing gets
v3: Update kerneldocs

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
---
 drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c |  3 +-
 drivers/gpu/drm/drm_atomic.c                 | 25 +++----
 drivers/gpu/drm/drm_atomic_helper.c          | 98 +++++++---------------------
 drivers/gpu/drm/drm_fb_helper.c              |  9 +--
 drivers/gpu/drm/exynos/exynos_drm_drv.c      |  3 +-
 drivers/gpu/drm/i915/i915_debugfs.c          |  3 +-
 drivers/gpu/drm/i915/intel_display.c         | 31 +++++----
 drivers/gpu/drm/i915/intel_sprite.c          |  4 +-
 drivers/gpu/drm/mediatek/mtk_drm_drv.c       |  3 +-
 drivers/gpu/drm/msm/msm_atomic.c             |  3 +-
 drivers/gpu/drm/omapdrm/omap_drv.c           |  3 +-
 drivers/gpu/drm/rcar-du/rcar_du_kms.c        |  3 +-
 drivers/gpu/drm/sti/sti_drv.c                |  3 +-
 drivers/gpu/drm/tegra/drm.c                  |  3 +-
 drivers/gpu/drm/tilcdc/tilcdc_drv.c          |  2 -
 drivers/gpu/drm/vc4/vc4_kms.c                |  3 +-
 include/drm/drm_atomic.h                     | 28 +++++++-
 include/drm/drm_crtc.h                       |  3 +
 18 files changed, 101 insertions(+), 129 deletions(-)

diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
index 8e7483d90c47..c9e645fa1233 100644
--- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
+++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
@@ -464,7 +464,7 @@ atmel_hlcdc_dc_atomic_complete(struct atmel_hlcdc_dc_commit *commit)
 
 	drm_atomic_helper_cleanup_planes(dev, old_state);
 
-	drm_atomic_state_free(old_state);
+	drm_atomic_state_put(old_state);
 
 	/* Complete the commit, wake up any waiter. */
 	spin_lock(&dc->commit.wait.lock);
@@ -521,6 +521,7 @@ static int atmel_hlcdc_dc_atomic_commit(struct drm_device *dev,
 	/* Swap the state, this is the point of no return. */
 	drm_atomic_helper_swap_state(state, true);
 
+	drm_atomic_state_get(state);
 	if (async)
 		queue_work(dc->wq, &commit->work);
 	else
diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index 23739609427d..5dd70540219c 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -74,6 +74,8 @@ EXPORT_SYMBOL(drm_atomic_state_default_release);
 int
 drm_atomic_state_init(struct drm_device *dev, struct drm_atomic_state *state)
 {
+	kref_init(&state->ref);
+
 	/* TODO legacy paths should maybe do a better job about
 	 * setting this appropriately?
 	 */
@@ -215,22 +217,16 @@ void drm_atomic_state_clear(struct drm_atomic_state *state)
 EXPORT_SYMBOL(drm_atomic_state_clear);
 
 /**
- * drm_atomic_state_free - free all memory for an atomic state
- * @state: atomic state to deallocate
+ * __drm_atomic_state_free - free all memory for an atomic state
+ * @ref: This atomic state to deallocate
  *
  * This frees all memory associated with an atomic state, including all the
  * per-object state for planes, crtcs and connectors.
  */
-void drm_atomic_state_free(struct drm_atomic_state *state)
+void __drm_atomic_state_free(struct kref *ref)
 {
-	struct drm_device *dev;
-	struct drm_mode_config *config;
-
-	if (!state)
-		return;
-
-	dev = state->dev;
-	config = &dev->mode_config;
+	struct drm_atomic_state *state = container_of(ref, typeof(*state), ref);
+	struct drm_mode_config *config = &state->dev->mode_config;
 
 	drm_atomic_state_clear(state);
 
@@ -243,7 +239,7 @@ void drm_atomic_state_free(struct drm_atomic_state *state)
 		kfree(state);
 	}
 }
-EXPORT_SYMBOL(drm_atomic_state_free);
+EXPORT_SYMBOL(__drm_atomic_state_free);
 
 /**
  * drm_atomic_get_crtc_state - get crtc state
@@ -1742,7 +1738,7 @@ retry:
 	if (arg->flags & DRM_MODE_ATOMIC_TEST_ONLY) {
 		/*
 		 * Unlike commit, check_only does not clean up state.
-		 * Below we call drm_atomic_state_free for it.
+		 * Below we call drm_atomic_state_put for it.
 		 */
 		ret = drm_atomic_check_only(state);
 	} else if (arg->flags & DRM_MODE_ATOMIC_NONBLOCK) {
@@ -1775,8 +1771,7 @@ out:
 		goto retry;
 	}
 
-	if (ret || arg->flags & DRM_MODE_ATOMIC_TEST_ONLY)
-		drm_atomic_state_free(state);
+	drm_atomic_state_put(state);
 
 	drm_modeset_drop_locks(&ctx);
 	drm_modeset_acquire_fini(&ctx);
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index ea78d70de9f3..d1ec053d36d5 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1207,7 +1207,7 @@ static void commit_tail(struct drm_atomic_state *state)
 
 	drm_atomic_helper_commit_cleanup_done(state);
 
-	drm_atomic_state_free(state);
+	drm_atomic_state_put(state);
 }
 
 static void commit_work(struct work_struct *work)
@@ -1289,6 +1289,7 @@ int drm_atomic_helper_commit(struct drm_device *dev,
 	 * make sure work items don't artifically stall on each another.
 	 */
 
+	drm_atomic_state_get(state);
 	if (nonblock)
 		queue_work(system_unbound_wq, &state->commit_work);
 	else
@@ -1590,7 +1591,7 @@ EXPORT_SYMBOL(drm_atomic_helper_commit_hw_done);
  *
  * This signals completion of the atomic update @state, including any cleanup
  * work. If used, it must be called right before calling
- * drm_atomic_state_free().
+ * drm_atomic_state_put().
  *
  * This is part of the atomic helper support for nonblocking commits, see
  * drm_atomic_helper_setup_commit() for an overview.
@@ -2113,18 +2114,13 @@ retry:
 		state->legacy_cursor_update = true;
 
 	ret = drm_atomic_commit(state);
-	if (ret != 0)
-		goto fail;
-
-	/* Driver takes ownership of state on successful commit. */
-	return 0;
 fail:
 	if (ret == -EDEADLK)
 		goto backoff;
 
-	drm_atomic_state_free(state);
-
+	drm_atomic_state_put(state);
 	return ret;
+
 backoff:
 	drm_atomic_state_clear(state);
 	drm_atomic_legacy_backoff(state);
@@ -2186,18 +2182,13 @@ retry:
 		goto fail;
 
 	ret = drm_atomic_commit(state);
-	if (ret != 0)
-		goto fail;
-
-	/* Driver takes ownership of state on successful commit. */
-	return 0;
 fail:
 	if (ret == -EDEADLK)
 		goto backoff;
 
-	drm_atomic_state_free(state);
-
+	drm_atomic_state_put(state);
 	return ret;
+
 backoff:
 	drm_atomic_state_clear(state);
 	drm_atomic_legacy_backoff(state);
@@ -2326,18 +2317,13 @@ retry:
 		goto fail;
 
 	ret = drm_atomic_commit(state);
-	if (ret != 0)
-		goto fail;
-
-	/* Driver takes ownership of state on successful commit. */
-	return 0;
 fail:
 	if (ret == -EDEADLK)
 		goto backoff;
 
-	drm_atomic_state_free(state);
-
+	drm_atomic_state_put(state);
 	return ret;
+
 backoff:
 	drm_atomic_state_clear(state);
 	drm_atomic_legacy_backoff(state);
@@ -2479,11 +2465,8 @@ int drm_atomic_helper_disable_all(struct drm_device *dev,
 	}
 
 	err = drm_atomic_commit(state);
-
 free:
-	if (err < 0)
-		drm_atomic_state_free(state);
-
+	drm_atomic_state_put(state);
 	return err;
 }
 EXPORT_SYMBOL(drm_atomic_helper_disable_all);
@@ -2534,7 +2517,7 @@ retry:
 
 	err = drm_atomic_helper_disable_all(dev, &ctx);
 	if (err < 0) {
-		drm_atomic_state_free(state);
+		drm_atomic_state_put(state);
 		state = ERR_PTR(err);
 		goto unlock;
 	}
@@ -2623,18 +2606,13 @@ retry:
 		goto fail;
 
 	ret = drm_atomic_commit(state);
-	if (ret != 0)
-		goto fail;
-
-	/* Driver takes ownership of state on successful commit. */
-	return 0;
 fail:
 	if (ret == -EDEADLK)
 		goto backoff;
 
-	drm_atomic_state_free(state);
-
+	drm_atomic_state_put(state);
 	return ret;
+
 backoff:
 	drm_atomic_state_clear(state);
 	drm_atomic_legacy_backoff(state);
@@ -2683,18 +2661,13 @@ retry:
 		goto fail;
 
 	ret = drm_atomic_commit(state);
-	if (ret != 0)
-		goto fail;
-
-	/* Driver takes ownership of state on successful commit. */
-	return 0;
 fail:
 	if (ret == -EDEADLK)
 		goto backoff;
 
-	drm_atomic_state_free(state);
-
+	drm_atomic_state_put(state);
 	return ret;
+
 backoff:
 	drm_atomic_state_clear(state);
 	drm_atomic_legacy_backoff(state);
@@ -2743,18 +2716,13 @@ retry:
 		goto fail;
 
 	ret = drm_atomic_commit(state);
-	if (ret != 0)
-		goto fail;
-
-	/* Driver takes ownership of state on successful commit. */
-	return 0;
 fail:
 	if (ret == -EDEADLK)
 		goto backoff;
 
-	drm_atomic_state_free(state);
-
+	drm_atomic_state_put(state);
 	return ret;
+
 backoff:
 	drm_atomic_state_clear(state);
 	drm_atomic_legacy_backoff(state);
@@ -2827,18 +2795,13 @@ retry:
 	}
 
 	ret = drm_atomic_nonblocking_commit(state);
-	if (ret != 0)
-		goto fail;
-
-	/* Driver takes ownership of state on successful commit. */
-	return 0;
 fail:
 	if (ret == -EDEADLK)
 		goto backoff;
 
-	drm_atomic_state_free(state);
-
+	drm_atomic_state_put(state);
 	return ret;
+
 backoff:
 	drm_atomic_state_clear(state);
 	drm_atomic_legacy_backoff(state);
@@ -2914,19 +2877,14 @@ retry:
 	crtc_state->active = active;
 
 	ret = drm_atomic_commit(state);
-	if (ret != 0)
-		goto fail;
-
-	/* Driver takes ownership of state on successful commit. */
-	return 0;
 fail:
 	if (ret == -EDEADLK)
 		goto backoff;
 
 	connector->dpms = old_mode;
-	drm_atomic_state_free(state);
-
+	drm_atomic_state_put(state);
 	return ret;
+
 backoff:
 	drm_atomic_state_clear(state);
 	drm_atomic_legacy_backoff(state);
@@ -3333,7 +3291,7 @@ drm_atomic_helper_duplicate_state(struct drm_device *dev,
 
 free:
 	if (err < 0) {
-		drm_atomic_state_free(state);
+		drm_atomic_state_put(state);
 		state = ERR_PTR(err);
 	}
 
@@ -3448,22 +3406,14 @@ retry:
 		goto fail;
 
 	ret = drm_atomic_commit(state);
-	if (ret)
-		goto fail;
-
-	/* Driver takes ownership of state on successful commit. */
-
-	drm_property_unreference_blob(blob);
-
-	return 0;
 fail:
 	if (ret == -EDEADLK)
 		goto backoff;
 
-	drm_atomic_state_free(state);
+	drm_atomic_state_put(state);
 	drm_property_unreference_blob(blob);
-
 	return ret;
+
 backoff:
 	drm_atomic_state_clear(state);
 	drm_atomic_legacy_backoff(state);
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
index dd8e3b68fd53..05a711ecf97a 100644
--- a/drivers/gpu/drm/drm_fb_helper.c
+++ b/drivers/gpu/drm/drm_fb_helper.c
@@ -365,9 +365,7 @@ fail:
 	if (ret == -EDEADLK)
 		goto backoff;
 
-	if (ret != 0)
-		drm_atomic_state_free(state);
-
+	drm_atomic_state_put(state);
 	return ret;
 
 backoff:
@@ -1359,16 +1357,13 @@ retry:
 	info->var.xoffset = var->xoffset;
 	info->var.yoffset = var->yoffset;
 
-
 fail:
 	drm_atomic_clean_old_fb(dev, plane_mask, ret);
 
 	if (ret == -EDEADLK)
 		goto backoff;
 
-	if (ret != 0)
-		drm_atomic_state_free(state);
-
+	drm_atomic_state_put(state);
 	return ret;
 
 backoff:
diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.c b/drivers/gpu/drm/exynos/exynos_drm_drv.c
index 486943e70f70..f11ba20781ce 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_drv.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_drv.c
@@ -111,7 +111,7 @@ static void exynos_atomic_commit_complete(struct exynos_atomic_commit *commit)
 
 	drm_atomic_helper_cleanup_planes(dev, state);
 
-	drm_atomic_state_free(state);
+	drm_atomic_state_put(state);
 
 	spin_lock(&priv->lock);
 	priv->pending &= ~commit->crtcs;
@@ -296,6 +296,7 @@ int exynos_atomic_commit(struct drm_device *dev, struct drm_atomic_state *state,
 
 	drm_atomic_helper_swap_state(state, true);
 
+	drm_atomic_state_get(state);
 	if (nonblock)
 		schedule_work(&commit->work);
 	else
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index c4e7532c5b6a..2e02304c7447 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3926,8 +3926,7 @@ static void hsw_trans_edp_pipe_A_crc_wa(struct drm_i915_private *dev_priv,
 out:
 	drm_modeset_unlock_all(dev);
 	WARN(ret, "Toggling workaround to %i returns %i\n", enable, ret);
-	if (ret)
-		drm_atomic_state_free(state);
+	drm_atomic_state_put(state);
 }
 
 static int ivb_pipe_crc_ctl_reg(struct drm_i915_private *dev_priv,
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index a3dbeb50c41d..5c0ea08dddb3 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3581,7 +3581,7 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv)
 	return;
 
 err:
-	drm_atomic_state_free(state);
+	drm_atomic_state_put(state);
 }
 
 void intel_finish_reset(struct drm_i915_private *dev_priv)
@@ -3640,6 +3640,8 @@ void intel_finish_reset(struct drm_i915_private *dev_priv)
 		intel_hpd_init(dev_priv);
 	}
 
+	if (state)
+		drm_atomic_state_put(state);
 	drm_modeset_drop_locks(ctx);
 	drm_modeset_acquire_fini(ctx);
 	mutex_unlock(&dev->mode_config.mutex);
@@ -6879,7 +6881,7 @@ static void intel_crtc_disable_noatomic(struct drm_crtc *crtc)
 
 	dev_priv->display.crtc_disable(crtc_state, state);
 
-	drm_atomic_state_free(state);
+	drm_atomic_state_put(state);
 
 	DRM_DEBUG_KMS("[CRTC:%d:%s] hw state adjusted, was enabled, now disabled\n",
 		      crtc->base.id, crtc->name);
@@ -11246,8 +11248,8 @@ found:
 	return true;
 
 fail:
-	drm_atomic_state_free(state);
-	drm_atomic_state_free(restore_state);
+	drm_atomic_state_put(state);
+	drm_atomic_state_put(restore_state);
 	restore_state = state = NULL;
 
 	if (ret == -EDEADLK) {
@@ -11276,10 +11278,9 @@ void intel_release_load_detect_pipe(struct drm_connector *connector,
 		return;
 
 	ret = drm_atomic_commit(state);
-	if (ret) {
+	if (ret)
 		DRM_DEBUG_KMS("Couldn't release load detect pipe: %i\n", ret);
-		drm_atomic_state_free(state);
-	}
+	drm_atomic_state_put(state);
 }
 
 static int i9xx_pll_refclk(struct drm_device *dev,
@@ -12319,8 +12320,7 @@ retry:
 			goto retry;
 		}
 
-		if (ret)
-			drm_atomic_state_free(state);
+		drm_atomic_state_put(state);
 
 		if (ret == 0 && event) {
 			spin_lock_irq(&dev->event_lock);
@@ -14365,7 +14365,7 @@ static void intel_atomic_commit_tail(struct drm_atomic_state *state)
 
 	drm_atomic_helper_commit_cleanup_done(state);
 
-	drm_atomic_state_free(state);
+	drm_atomic_state_put(state);
 
 	/* As one of the primary mmio accessors, KMS has a high likelihood
 	 * of triggering bugs in unclaimed access. After we finish
@@ -14448,6 +14448,7 @@ static int intel_atomic_commit(struct drm_device *dev,
 	intel_shared_dpll_commit(state);
 	intel_atomic_track_fbs(state);
 
+	drm_atomic_state_get(state);
 	if (nonblock)
 		queue_work(system_unbound_wq, &state->commit_work);
 	else
@@ -14489,9 +14490,8 @@ retry:
 		goto retry;
 	}
 
-	if (ret)
 out:
-		drm_atomic_state_free(state);
+	drm_atomic_state_put(state);
 }
 
 /*
@@ -16229,8 +16229,8 @@ retry:
 		dev_priv->display.optimize_watermarks(cs);
 	}
 
-	drm_atomic_state_free(state);
 fail:
+	drm_atomic_state_put(state);
 	drm_modeset_drop_locks(&ctx);
 	drm_modeset_acquire_fini(&ctx);
 }
@@ -16867,10 +16867,9 @@ void intel_display_resume(struct drm_device *dev)
 	drm_modeset_acquire_fini(&ctx);
 	mutex_unlock(&dev->mode_config.mutex);
 
-	if (ret) {
+	if (ret)
 		DRM_ERROR("Restoring old state failed with %i\n", ret);
-		drm_atomic_state_free(state);
-	}
+	drm_atomic_state_put(state);
 }
 
 void intel_modeset_gem_init(struct drm_device *dev)
diff --git a/drivers/gpu/drm/i915/intel_sprite.c b/drivers/gpu/drm/i915/intel_sprite.c
index 73a521fdf1bd..be3e04623e2a 100644
--- a/drivers/gpu/drm/i915/intel_sprite.c
+++ b/drivers/gpu/drm/i915/intel_sprite.c
@@ -987,9 +987,7 @@ int intel_sprite_set_colorkey(struct drm_device *dev, void *data,
 		drm_modeset_backoff(&ctx);
 	}
 
-	if (ret)
-		drm_atomic_state_free(state);
-
+	drm_atomic_state_put(state);
 out:
 	drm_modeset_drop_locks(&ctx);
 	drm_modeset_acquire_fini(&ctx);
diff --git a/drivers/gpu/drm/mediatek/mtk_drm_drv.c b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
index 72c1ae4e02d4..5ab6017575ea 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
@@ -83,7 +83,7 @@ static void mtk_atomic_complete(struct mtk_drm_private *private,
 	drm_atomic_helper_wait_for_vblanks(drm, state);
 
 	drm_atomic_helper_cleanup_planes(drm, state);
-	drm_atomic_state_free(state);
+	drm_atomic_state_put(state);
 }
 
 static void mtk_atomic_work(struct work_struct *work)
@@ -110,6 +110,7 @@ static int mtk_atomic_commit(struct drm_device *drm,
 
 	drm_atomic_helper_swap_state(state, true);
 
+	drm_atomic_state_get(state);
 	if (async)
 		mtk_atomic_schedule(private, state);
 	else
diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
index 73bae382eac3..db193f835298 100644
--- a/drivers/gpu/drm/msm/msm_atomic.c
+++ b/drivers/gpu/drm/msm/msm_atomic.c
@@ -141,7 +141,7 @@ static void complete_commit(struct msm_commit *c, bool async)
 
 	kms->funcs->complete_commit(kms, state);
 
-	drm_atomic_state_free(state);
+	drm_atomic_state_put(state);
 
 	commit_destroy(c);
 }
@@ -256,6 +256,7 @@ int msm_atomic_commit(struct drm_device *dev,
 	 * current layout.
 	 */
 
+	drm_atomic_state_get(state);
 	if (nonblock) {
 		queue_work(priv->atomic_wq, &c->work);
 		return 0;
diff --git a/drivers/gpu/drm/omapdrm/omap_drv.c b/drivers/gpu/drm/omapdrm/omap_drv.c
index e1cfba51cff6..1735c7accf72 100644
--- a/drivers/gpu/drm/omapdrm/omap_drv.c
+++ b/drivers/gpu/drm/omapdrm/omap_drv.c
@@ -105,7 +105,7 @@ static void omap_atomic_complete(struct omap_atomic_state_commit *commit)
 
 	dispc_runtime_put();
 
-	drm_atomic_state_free(old_state);
+	drm_atomic_state_put(old_state);
 
 	/* Complete the commit, wake up any waiter. */
 	spin_lock(&priv->commit.lock);
@@ -176,6 +176,7 @@ static int omap_atomic_commit(struct drm_device *dev,
 	/* Swap the state, this is the point of no return. */
 	drm_atomic_helper_swap_state(state, true);
 
+	drm_atomic_state_get(state);
 	if (nonblock)
 		schedule_work(&commit->work);
 	else
diff --git a/drivers/gpu/drm/rcar-du/rcar_du_kms.c b/drivers/gpu/drm/rcar-du/rcar_du_kms.c
index bd9c3bb9252c..c76ed9ee6a01 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_kms.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_kms.c
@@ -264,7 +264,7 @@ static void rcar_du_atomic_complete(struct rcar_du_commit *commit)
 
 	drm_atomic_helper_cleanup_planes(dev, old_state);
 
-	drm_atomic_state_free(old_state);
+	drm_atomic_state_put(old_state);
 
 	/* Complete the commit, wake up any waiter. */
 	spin_lock(&rcdu->commit.wait.lock);
@@ -330,6 +330,7 @@ static int rcar_du_atomic_commit(struct drm_device *dev,
 	/* Swap the state, this is the point of no return. */
 	drm_atomic_helper_swap_state(state, true);
 
+	drm_atomic_state_get(state);
 	if (nonblock)
 		schedule_work(&commit->work);
 	else
diff --git a/drivers/gpu/drm/sti/sti_drv.c b/drivers/gpu/drm/sti/sti_drv.c
index 7cd3804c6dee..7ad8d42dba28 100644
--- a/drivers/gpu/drm/sti/sti_drv.c
+++ b/drivers/gpu/drm/sti/sti_drv.c
@@ -184,7 +184,7 @@ static void sti_atomic_complete(struct sti_private *private,
 	drm_atomic_helper_wait_for_vblanks(drm, state);
 
 	drm_atomic_helper_cleanup_planes(drm, state);
-	drm_atomic_state_free(state);
+	drm_atomic_state_put(state);
 }
 
 static void sti_atomic_work(struct work_struct *work)
@@ -217,6 +217,7 @@ static int sti_atomic_commit(struct drm_device *drm,
 
 	drm_atomic_helper_swap_state(state, true);
 
+	drm_atomic_state_get(state);
 	if (nonblock)
 		sti_atomic_schedule(private, state);
 	else
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 4b9f1c79cd7b..4ca8f2f3c46b 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -63,7 +63,7 @@ static void tegra_atomic_complete(struct tegra_drm *tegra,
 	drm_atomic_helper_wait_for_vblanks(drm, state);
 
 	drm_atomic_helper_cleanup_planes(drm, state);
-	drm_atomic_state_free(state);
+	drm_atomic_state_put(state);
 }
 
 static void tegra_atomic_work(struct work_struct *work)
@@ -96,6 +96,7 @@ static int tegra_atomic_commit(struct drm_device *drm,
 
 	drm_atomic_helper_swap_state(state, true);
 
+	drm_atomic_state_get(state);
 	if (nonblock)
 		tegra_atomic_schedule(tegra, state);
 	else
diff --git a/drivers/gpu/drm/tilcdc/tilcdc_drv.c b/drivers/gpu/drm/tilcdc/tilcdc_drv.c
index f8892e9ad169..19abf65f0ff3 100644
--- a/drivers/gpu/drm/tilcdc/tilcdc_drv.c
+++ b/drivers/gpu/drm/tilcdc/tilcdc_drv.c
@@ -143,8 +143,6 @@ static int tilcdc_commit(struct drm_device *dev,
 
 	drm_atomic_helper_cleanup_planes(dev, state);
 
-	drm_atomic_state_free(state);
-
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c
index c1f65c6c8e60..f31f72af8551 100644
--- a/drivers/gpu/drm/vc4/vc4_kms.c
+++ b/drivers/gpu/drm/vc4/vc4_kms.c
@@ -61,7 +61,7 @@ vc4_atomic_complete_commit(struct vc4_commit *c)
 
 	drm_atomic_helper_cleanup_planes(dev, state);
 
-	drm_atomic_state_free(state);
+	drm_atomic_state_put(state);
 
 	up(&vc4->async_modeset);
 
@@ -173,6 +173,7 @@ static int vc4_atomic_commit(struct drm_device *dev,
 	 * current layout.
 	 */
 
+	drm_atomic_state_get(state);
 	if (nonblock) {
 		vc4_queue_seqno_cb(dev, &c->cb, wait_seqno,
 				   vc4_atomic_complete_commit_seqno_cb);
diff --git a/include/drm/drm_atomic.h b/include/drm/drm_atomic.h
index 856a9c85a838..1c6210e7a791 100644
--- a/include/drm/drm_atomic.h
+++ b/include/drm/drm_atomic.h
@@ -39,7 +39,33 @@ static inline void drm_crtc_commit_get(struct drm_crtc_commit *commit)
 struct drm_atomic_state * __must_check
 drm_atomic_state_alloc(struct drm_device *dev);
 void drm_atomic_state_clear(struct drm_atomic_state *state);
-void drm_atomic_state_free(struct drm_atomic_state *state);
+
+/**
+ * drm_atomic_state_get - acquire a reference to the atomic state
+ * @state: The atomic state
+ *
+ * Returns a new reference to the @state
+ */
+static inline struct drm_atomic_state *
+drm_atomic_state_get(struct drm_atomic_state *state)
+{
+	kref_get(&state->ref);
+	return state;
+}
+
+void __drm_atomic_state_free(struct kref *ref);
+
+/**
+ * drm_atomic_state_put - release a reference to the atomic state
+ * @state: The atomic state
+ *
+ * This releases a reference to @state which is freed after removing the
+ * final reference. No locking required and callable from any context.
+ */
+static inline void drm_atomic_state_put(struct drm_atomic_state *state)
+{
+	kref_put(&state->ref, __drm_atomic_state_free);
+}
 
 int  __must_check
 drm_atomic_state_init(struct drm_device *dev, struct drm_atomic_state *state);
diff --git a/include/drm/drm_crtc.h b/include/drm/drm_crtc.h
index 8ca71d66282b..b6e1853d2355 100644
--- a/include/drm/drm_crtc.h
+++ b/include/drm/drm_crtc.h
@@ -1309,6 +1309,7 @@ struct __drm_connnectors_state {
 
 /**
  * struct drm_atomic_state - the global state object for atomic updates
+ * @ref: count of all references to this state
  * @dev: parent DRM device
  * @allow_modeset: allow full modeset
  * @legacy_cursor_update: hint to enforce legacy cursor IOCTL semantics
@@ -1320,6 +1321,8 @@ struct __drm_connnectors_state {
  * @acquire_ctx: acquire context for this atomic modeset state update
  */
 struct drm_atomic_state {
+	struct kref ref;
+
 	struct drm_device *dev;
 	bool allow_modeset : 1;
 	bool legacy_cursor_update : 1;
-- 
2.9.3
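
The get/put pairing above is the whole pattern: every consumer of the
atomic state (the caller, the nonblocking worker, later a fence callback)
holds its own reference, and the final put frees the state. A minimal
standalone C model of that lifecycle, with a plain atomic standing in for
struct kref and every name below invented for illustration:

#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative stand-in for struct drm_atomic_state + struct kref. */
struct state {
	atomic_int ref;
};

static struct state *state_get(struct state *s)
{
	atomic_fetch_add(&s->ref, 1);		/* models kref_get() */
	return s;
}

static void state_put(struct state *s)
{
	if (atomic_fetch_sub(&s->ref, 1) == 1) {	/* last reference? */
		printf("state freed\n");
		free(s);
	}
}

/* The (possibly asynchronous) tail of a commit consumes one reference,
 * just as the drivers above now call drm_atomic_state_put() in their
 * complete/tail functions instead of drm_atomic_state_free(). */
static void complete_commit(struct state *s)
{
	/* ... wait for vblanks, clean up planes ... */
	state_put(s);
}

int main(void)
{
	struct state *s = calloc(1, sizeof(*s));

	atomic_init(&s->ref, 1);	/* initial reference, owned by caller */
	state_get(s);			/* extra reference for the worker */
	complete_commit(s);		/* worker drops its reference */
	state_put(s);			/* caller drops the last one: freed */
	return 0;
}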


* [PATCH 07/18] drm/i915: Restore nonblocking awaits for modesetting
  2016-09-14  6:52 Tracking multiple timelines (full-ppgtt) Chris Wilson
                   ` (5 preceding siblings ...)
  2016-09-14  6:52 ` [PATCH 06/18] drm: Add reference counting to drm_atomic_state Chris Wilson
@ 2016-09-14  6:52 ` Chris Wilson
  2016-09-19 16:01   ` Joonas Lahtinen
  2016-09-14  6:52 ` [PATCH 08/18] drm/i915: Combine seqno + tracking into a global timeline struct Chris Wilson
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 51+ messages in thread
From: Chris Wilson @ 2016-09-14  6:52 UTC (permalink / raw)
  To: intel-gfx

After combining the dma-buf reservation object and the GEM reservation
object, we lost the ability to do a nonblocking wait on the i915 request
(as we blocked upon the reservation object during prepare_fb). We can
instead convert the reservation object into a fence upon which we can
asynchronously wait (including a forced timeout in case the DMA fence is
never signaled).
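
The shape of that change, reduced to its essentials: prepare_fb no longer
blocks, it merely registers the object's outstanding fences against a
per-commit completion counter, and the commit tail runs (inline or from a
worker) only once the counter drains. A rough standalone model of that
counting, with every name below invented for illustration (the real driver
uses i915_sw_fence for this):

#include <stdio.h>

/* Toy model of a software fence: a count of outstanding prerequisites
 * plus an action to run when the last one signals. */
struct sw_fence {
	int pending;
	void (*ready)(void *data);
	void *data;
};

static void sw_fence_await(struct sw_fence *f)
{
	f->pending++;			/* one more DMA fence to wait for */
}

static void sw_fence_signal(struct sw_fence *f)
{
	if (--f->pending == 0)
		f->ready(f->data);	/* all prerequisites done: run tail */
}

static void commit_tail(void *data)
{
	printf("commit tail for %s\n", (const char *)data);
}

int main(void)
{
	struct sw_fence commit_ready = {
		.pending = 1,		/* bias held until setup completes */
		.ready = commit_tail,
		.data = "state",
	};

	sw_fence_await(&commit_ready);	/* prepare_fb: await old_obj fences */
	sw_fence_await(&commit_ready);	/* prepare_fb: await new obj fences */

	sw_fence_signal(&commit_ready);	/* drop the setup bias (the commit) */
	sw_fence_signal(&commit_ready);	/* GPU finished with old_obj */
	sw_fence_signal(&commit_ready);	/* GPU finished with new obj: tail */
	return 0;
}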

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_display.c | 78 ++++++++++++++++++++++--------------
 drivers/gpu/drm/i915/intel_drv.h     |  2 +
 2 files changed, 51 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 5c0ea08dddb3..294df122f7c6 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -14383,12 +14383,33 @@ static void intel_atomic_commit_tail(struct drm_atomic_state *state)
 
 static void intel_atomic_commit_work(struct work_struct *work)
 {
-	struct drm_atomic_state *state = container_of(work,
-						      struct drm_atomic_state,
-						      commit_work);
+	struct drm_atomic_state *state =
+		container_of(work, struct drm_atomic_state, commit_work);
+
 	intel_atomic_commit_tail(state);
 }
 
+static int __i915_sw_fence_call
+intel_atomic_commit_ready(struct i915_sw_fence *fence,
+			  enum i915_sw_fence_notify notify)
+{
+	struct intel_atomic_state *state =
+		container_of(fence, struct intel_atomic_state, commit_ready);
+
+	switch (notify) {
+	case FENCE_COMPLETE:
+		if (state->base.commit_work.func)
+			queue_work(system_unbound_wq, &state->base.commit_work);
+		break;
+
+	case FENCE_FREE:
+		drm_atomic_state_put(&state->base);
+		break;
+	}
+
+	return NOTIFY_DONE;
+}
+
 static void intel_atomic_track_fbs(struct drm_atomic_state *state)
 {
 	struct drm_plane_state *old_plane_state;
@@ -14434,11 +14455,14 @@ static int intel_atomic_commit(struct drm_device *dev,
 	if (ret)
 		return ret;
 
-	INIT_WORK(&state->commit_work, intel_atomic_commit_work);
+	drm_atomic_state_get(state);
+	i915_sw_fence_init(&intel_state->commit_ready,
+			   intel_atomic_commit_ready);
 
 	ret = intel_atomic_prepare_commit(dev, state);
 	if (ret) {
 		DRM_DEBUG_ATOMIC("Preparing state failed with %i\n", ret);
+		i915_sw_fence_commit(&intel_state->commit_ready);
 		return ret;
 	}
 
@@ -14449,10 +14473,14 @@ static int intel_atomic_commit(struct drm_device *dev,
 	intel_atomic_track_fbs(state);
 
 	drm_atomic_state_get(state);
-	if (nonblock)
-		queue_work(system_unbound_wq, &state->commit_work);
-	else
+	INIT_WORK(&state->commit_work,
+		  nonblock ? intel_atomic_commit_work : NULL);
+
+	i915_sw_fence_commit(&intel_state->commit_ready);
+	if (!nonblock) {
+		i915_sw_fence_wait(&intel_state->commit_ready);
 		intel_atomic_commit_tail(state);
+	}
 
 	return 0;
 }
@@ -14568,8 +14596,7 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 	struct drm_framebuffer *fb = new_state->fb;
 	struct drm_i915_gem_object *obj = intel_fb_obj(fb);
 	struct drm_i915_gem_object *old_obj = intel_fb_obj(plane->state->fb);
-	long lret;
-	int ret = 0;
+	int ret;
 
 	if (!obj && !old_obj)
 		return 0;
@@ -14577,7 +14604,6 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 	if (old_obj) {
 		struct drm_crtc_state *crtc_state =
 			drm_atomic_get_existing_crtc_state(new_state->state, plane->state->crtc);
-		long timeout = 0;
 
 		/* Big Hammer, we also need to ensure that any pending
 		 * MI_WAIT_FOR_EVENT inside a user batch buffer on the
@@ -14590,31 +14616,25 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 		 * This should only fail upon a hung GPU, in which case we
 		 * can safely continue.
 		 */
-		if (needs_modeset(crtc_state))
-			timeout = i915_gem_object_wait(old_obj,
-						       I915_WAIT_INTERRUPTIBLE |
-						       I915_WAIT_LOCKED,
-						       MAX_SCHEDULE_TIMEOUT,
-						       NULL);
-		if (timeout < 0) {
-			/* GPU hangs should have been swallowed by the wait */
-			WARN_ON(timeout == -EIO);
-			return timeout;
+		if (needs_modeset(crtc_state)) {
+			ret = i915_sw_fence_await_reservation(&to_intel_atomic_state(new_state->state)->commit_ready,
+							      old_obj->resv, NULL,
+							      false, 0,
+							      GFP_KERNEL);
+			if (ret < 0)
+				return ret;
 		}
 	}
 
 	if (!obj)
 		return 0;
 
-	/* For framebuffer backed by dmabuf, wait for fence */
-	lret = i915_gem_object_wait(obj,
-				    I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
-				    MAX_SCHEDULE_TIMEOUT,
-				    NULL);
-	if (lret == -ERESTARTSYS)
-		return lret;
-
-	WARN(lret < 0, "waiting returns %li\n", lret);
+	ret = i915_sw_fence_await_reservation(&to_intel_atomic_state(new_state->state)->commit_ready,
+					      obj->resv, NULL,
+					      false, 10*HZ,
+					      GFP_KERNEL);
+	if (ret < 0)
+		return ret;
 
 	if (plane->type == DRM_PLANE_TYPE_CURSOR &&
 	    INTEL_INFO(dev)->cursor_needs_physical) {
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 15e3050edeb9..cb99a2540863 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -357,6 +357,8 @@ struct intel_atomic_state {
 
 	/* Gen9+ only */
 	struct skl_wm_values wm_results;
+
+	struct i915_sw_fence commit_ready;
 };
 
 struct intel_plane_state {
-- 
2.9.3


* [PATCH 08/18] drm/i915: Combine seqno + tracking into a global timeline struct
  2016-09-14  6:52 Tracking multiple timelines (full-ppgtt) Chris Wilson
                   ` (6 preceding siblings ...)
  2016-09-14  6:52 ` [PATCH 07/18] drm/i915: Restore nonblocking awaits for modesetting Chris Wilson
@ 2016-09-14  6:52 ` Chris Wilson
  2016-09-14 15:42   ` Joonas Lahtinen
  2016-09-14  6:52 ` [PATCH 09/18] drm/i915: Wait first for submission, before waiting for request completion Chris Wilson
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 51+ messages in thread
From: Chris Wilson @ 2016-09-14  6:52 UTC (permalink / raw)
  To: intel-gfx

Our timelines are more than just a seqno. They also provide an ordered
list of requests to be executed. Since execution is tracked at the
granularity of an address space, we are limited to one timeline per
address space, but within it we use a separate fence context per engine.

Our first step towards introducing independent timelines per context (i.e. to
allow each context to have a queue of requests to execute that have a
defined set of dependencies on other requests) is to provide a timeline
abstraction for the global execution queue.
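
Concretely, the new struct nests one per-engine timeline inside each
address-space timeline, and each per-engine slice gets its own fence
context. A standalone sketch of just that allocation scheme (names
invented for illustration; fence_context_alloc() here is a toy
reimplementation, not the kernel's):

#include <stdint.h>
#include <stdio.h>

#define NUM_ENGINES 5

/* Per-engine slice of a timeline: its own fence context and seqno. */
struct engine_timeline {
	uint64_t fence_context;
	uint32_t last_submitted_seqno;
};

/* One timeline per address space, with a slice per engine. */
struct gem_timeline {
	const char *name;
	uint32_t next_seqno;
	struct engine_timeline engine[NUM_ENGINES];
};

static uint64_t next_context = 1;

/* Hand out a contiguous block of fence context ids. */
static uint64_t fence_context_alloc(unsigned int num)
{
	uint64_t base = next_context;

	next_context += num;
	return base;
}

static void timeline_init(struct gem_timeline *tl, const char *name)
{
	uint64_t base = fence_context_alloc(NUM_ENGINES);
	int i;

	tl->name = name;
	tl->next_seqno = 1;
	for (i = 0; i < NUM_ENGINES; i++)
		tl->engine[i].fence_context = base + i;
}

int main(void)
{
	struct gem_timeline global;
	int i;

	timeline_init(&global, "[execution]");
	for (i = 0; i < NUM_ENGINES; i++)
		printf("%s engine %d -> fence context %llu\n",
		       global.name, i,
		       (unsigned long long)global.engine[i].fence_context);
	return 0;
}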

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/Makefile              |  1 +
 drivers/gpu/drm/i915/i915_debugfs.c        | 19 +++----
 drivers/gpu/drm/i915/i915_drv.c            |  6 ++-
 drivers/gpu/drm/i915/i915_drv.h            |  9 ++--
 drivers/gpu/drm/i915/i915_gem.c            | 80 ++++++++++++++++++++++--------
 drivers/gpu/drm/i915/i915_gem.h            |  2 +
 drivers/gpu/drm/i915/i915_gem_request.c    | 74 ++++++++++++++++-----------
 drivers/gpu/drm/i915/i915_gem_request.h    |  1 +
 drivers/gpu/drm/i915/i915_gem_timeline.c   | 63 +++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_timeline.h   | 69 ++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gpu_error.c      |  6 +--
 drivers/gpu/drm/i915/i915_guc_submission.c |  3 +-
 drivers/gpu/drm/i915/i915_irq.c            |  2 +-
 drivers/gpu/drm/i915/intel_engine_cs.c     |  9 ++--
 drivers/gpu/drm/i915/intel_ringbuffer.h    | 35 ++-----------
 15 files changed, 268 insertions(+), 111 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_timeline.c
 create mode 100644 drivers/gpu/drm/i915/i915_gem_timeline.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index a998c2bce70a..1d2f3ce2ecfd 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -41,6 +41,7 @@ i915-y += i915_cmd_parser.o \
 	  i915_gem_shrinker.o \
 	  i915_gem_stolen.o \
 	  i915_gem_tiling.o \
+	  i915_gem_timeline.o \
 	  i915_gem_userptr.o \
 	  i915_gpu_error.o \
 	  i915_trace_points.o \
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 2e02304c7447..d1752b0fee37 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -557,7 +557,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 				seq_printf(m, "Flip queued on %s at seqno %x, next seqno %x [current breadcrumb %x], completed? %d\n",
 					   engine->name,
 					   i915_gem_request_get_seqno(work->flip_queued_req),
-					   dev_priv->next_seqno,
+					   dev_priv->gt.global_timeline.next_seqno,
 					   intel_engine_get_seqno(engine),
 					   i915_gem_request_completed(work->flip_queued_req));
 			} else
@@ -648,13 +648,13 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
 		int count;
 
 		count = 0;
-		list_for_each_entry(req, &engine->request_list, link)
+		list_for_each_entry(req, &engine->timeline->requests, link)
 			count++;
 		if (count == 0)
 			continue;
 
 		seq_printf(m, "%s requests: %d\n", engine->name, count);
-		list_for_each_entry(req, &engine->request_list, link) {
+		list_for_each_entry(req, &engine->timeline->requests, link) {
 			struct pid *pid = req->ctx->pid;
 			struct task_struct *task;
 
@@ -1028,15 +1028,8 @@ static int
 i915_next_seqno_get(void *data, u64 *val)
 {
 	struct drm_i915_private *dev_priv = data;
-	int ret;
-
-	ret = mutex_lock_interruptible(&dev_priv->drm.struct_mutex);
-	if (ret)
-		return ret;
-
-	*val = dev_priv->next_seqno;
-	mutex_unlock(&dev_priv->drm.struct_mutex);
 
+	*val = dev_priv->gt.global_timeline.next_seqno;
 	return 0;
 }
 
@@ -1051,7 +1044,7 @@ i915_next_seqno_set(void *data, u64 val)
 	if (ret)
 		return ret;
 
-	ret = i915_gem_set_seqno(dev, val);
+	ret = i915_gem_set_global_seqno(dev, val);
 	mutex_unlock(&dev->struct_mutex);
 
 	return ret;
@@ -1310,7 +1303,7 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 		seq_printf(m, "\tseqno = %x [current %x, last %x]\n",
 			   engine->hangcheck.seqno,
 			   seqno[id],
-			   engine->last_submitted_seqno);
+			   engine->timeline->last_submitted_seqno);
 		seq_printf(m, "\twaiters? %s, fake irq active? %s\n",
 			   yesno(intel_engine_has_waiter(engine)),
 			   yesno(test_bit(engine->id,
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 7f4e8adec8a8..5719deefd484 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -832,7 +832,9 @@ static int i915_driver_init_early(struct drm_i915_private *dev_priv,
 	intel_init_display_hooks(dev_priv);
 	intel_init_clock_gating_hooks(dev_priv);
 	intel_init_audio_hooks(dev_priv);
-	i915_gem_load_init(&dev_priv->drm);
+	ret = i915_gem_load_init(&dev_priv->drm);
+	if (ret < 0)
+		goto err_gvt;
 
 	intel_display_crc_init(dev_priv);
 
@@ -848,6 +850,8 @@ static int i915_driver_init_early(struct drm_i915_private *dev_priv,
 
 	return 0;
 
+err_gvt:
+	intel_gvt_cleanup(dev_priv);
 err_workqueues:
 	i915_workqueues_cleanup(dev_priv);
 	return ret;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2bcab3087e8c..0ee9f8534826 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -63,6 +63,7 @@
 #include "i915_gem_gtt.h"
 #include "i915_gem_render_state.h"
 #include "i915_gem_request.h"
+#include "i915_gem_timeline.h"
 
 #include "intel_gvt.h"
 
@@ -1788,7 +1789,6 @@ struct drm_i915_private {
 	struct i915_gem_context *kernel_context;
 	struct intel_engine_cs engine[I915_NUM_ENGINES];
 	struct i915_vma *semaphore;
-	u32 next_seqno;
 
 	struct drm_dma_handle *status_page_dmah;
 	struct resource mch_res;
@@ -2046,6 +2046,9 @@ struct drm_i915_private {
 		void (*resume)(struct drm_i915_private *);
 		void (*cleanup_engine)(struct intel_engine_cs *engine);
 
+		struct list_head timelines;
+		struct i915_gem_timeline global_timeline;
+
 		/**
 		 * Is the GPU currently considered idle, or busy executing
 		 * userspace requests? Whilst idle, we allow runtime power
@@ -3040,7 +3043,7 @@ int i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
 				struct drm_file *file_priv);
 int i915_gem_wait_ioctl(struct drm_device *dev, void *data,
 			struct drm_file *file_priv);
-void i915_gem_load_init(struct drm_device *dev);
+int i915_gem_load_init(struct drm_device *dev);
 void i915_gem_load_cleanup(struct drm_device *dev);
 void i915_gem_load_init_fences(struct drm_i915_private *dev_priv);
 int i915_gem_freeze_late(struct drm_i915_private *dev_priv);
@@ -3204,7 +3207,7 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
 		       struct drm_i915_gem_object *new,
 		       unsigned frontbuffer_bits);
 
-int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
+int __must_check i915_gem_set_global_seqno(struct drm_device *dev, u32 seqno);
 
 struct drm_i915_gem_request *
 i915_gem_find_active_request(struct intel_engine_cs *engine);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ab8d10388581..dcf4947026fa 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -365,7 +365,7 @@ out:
 	if (flags & I915_WAIT_LOCKED && i915_gem_request_completed(rq))
 		i915_gem_request_retire_upto(rq);
 
-	if (rps && rq->fence.seqno == rq->engine->last_submitted_seqno) {
+	if (rps && rq->fence.seqno == rq->timeline->last_submitted_seqno) {
 		/* The GPU is now idle and this client has stalled.
 		 * Since no other client has submitted a request in the
 		 * meantime, assume that this client is the only one
@@ -2600,7 +2600,7 @@ i915_gem_find_active_request(struct intel_engine_cs *engine)
 	 * extra delay for a recent interrupt is pointless. Hence, we do
 	 * not need an engine->irq_seqno_barrier() before the seqno reads.
 	 */
-	list_for_each_entry(request, &engine->request_list, link) {
+	list_for_each_entry(request, &engine->timeline->requests, link) {
 		if (i915_gem_request_completed(request))
 			continue;
 
@@ -2668,7 +2668,7 @@ static void i915_gem_reset_engine(struct intel_engine_cs *engine)
 	if (i915_gem_context_is_default(incomplete_ctx))
 		return;
 
-	list_for_each_entry_continue(request, &engine->request_list, link)
+	list_for_each_entry_continue(request, &engine->timeline->requests, link)
 		if (request->ctx == incomplete_ctx)
 			reset_request(request);
 }
@@ -2697,7 +2697,8 @@ static void i915_gem_cleanup_engine(struct intel_engine_cs *engine)
 	 * (lockless) lookup doesn't try and wait upon the request as we
 	 * reset it.
 	 */
-	intel_engine_init_seqno(engine, engine->last_submitted_seqno);
+	intel_engine_init_global_seqno(engine,
+				       engine->timeline->last_submitted_seqno);
 
 	/*
 	 * Clear the execlists queue up before freeing the requests, as those
@@ -2988,17 +2989,26 @@ destroy:
 	return 0;
 }
 
-int i915_gem_wait_for_idle(struct drm_i915_private *dev_priv,
-			   unsigned int flags)
+static int wait_for_timeline(struct i915_gem_timeline *tl, unsigned int flags)
 {
-	struct intel_engine_cs *engine;
-	int ret;
+	int ret, i;
 
-	for_each_engine(engine, dev_priv) {
-		if (engine->last_context == NULL)
-			continue;
+	for (i = 0; i < ARRAY_SIZE(tl->engine); i++) {
+		ret = i915_gem_active_wait(&tl->engine[i].last_request, flags);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
 
-		ret = intel_engine_idle(engine, flags);
+int i915_gem_wait_for_idle(struct drm_i915_private *i915, unsigned int flags)
+{
+	struct i915_gem_timeline *tl;
+	int ret;
+
+	list_for_each_entry(tl, &i915->gt.timelines, link) {
+		ret = wait_for_timeline(tl, flags);
 		if (ret)
 			return ret;
 	}
@@ -4414,12 +4424,6 @@ i915_gem_cleanup_engines(struct drm_device *dev)
 		dev_priv->gt.cleanup_engine(engine);
 }
 
-static void
-init_engine_lists(struct intel_engine_cs *engine)
-{
-	INIT_LIST_HEAD(&engine->request_list);
-}
-
 void
 i915_gem_load_init_fences(struct drm_i915_private *dev_priv)
 {
@@ -4452,22 +4456,32 @@ i915_gem_load_init_fences(struct drm_i915_private *dev_priv)
 	i915_gem_detect_bit_6_swizzle(dev);
 }
 
-void
+int
 i915_gem_load_init(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
-	int i;
+	int err;
 
 	dev_priv->objects =
 		kmem_cache_create("i915_gem_object",
 				  sizeof(struct drm_i915_gem_object), 0,
 				  SLAB_HWCACHE_ALIGN,
 				  NULL);
+	if (!dev_priv->objects) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+
 	dev_priv->vmas =
 		kmem_cache_create("i915_gem_vma",
 				  sizeof(struct i915_vma), 0,
 				  SLAB_HWCACHE_ALIGN,
 				  NULL);
+	if (!dev_priv->vmas) {
+		err = -ENOMEM;
+		goto err_objects;
+	}
+
 	dev_priv->requests =
 		kmem_cache_create("i915_gem_request",
 				  sizeof(struct drm_i915_gem_request), 0,
@@ -4475,13 +4489,24 @@ i915_gem_load_init(struct drm_device *dev)
 				  SLAB_RECLAIM_ACCOUNT |
 				  SLAB_DESTROY_BY_RCU,
 				  NULL);
+	if (!dev_priv->requests) {
+		err = -ENOMEM;
+		goto err_vmas;
+	}
+
+	mutex_lock(&dev_priv->drm.struct_mutex);
+	INIT_LIST_HEAD(&dev_priv->gt.timelines);
+	err = i915_gem_timeline_init(dev_priv,
+				     &dev_priv->gt.global_timeline,
+				     "[execution]");
+	mutex_unlock(&dev_priv->drm.struct_mutex);
+	if (err)
+		goto err_requests;
 
 	INIT_LIST_HEAD(&dev_priv->context_list);
 	INIT_LIST_HEAD(&dev_priv->mm.unbound_list);
 	INIT_LIST_HEAD(&dev_priv->mm.bound_list);
 	INIT_LIST_HEAD(&dev_priv->mm.fence_list);
-	for (i = 0; i < I915_NUM_ENGINES; i++)
-		init_engine_lists(&dev_priv->engine[i]);
 	INIT_DELAYED_WORK(&dev_priv->gt.retire_work,
 			  i915_gem_retire_work_handler);
 	INIT_DELAYED_WORK(&dev_priv->gt.idle_work,
@@ -4498,6 +4523,17 @@ i915_gem_load_init(struct drm_device *dev)
 	atomic_set(&dev_priv->mm.bsd_engine_dispatch_index, 0);
 
 	spin_lock_init(&dev_priv->fb_tracking.lock);
+
+	return 0;
+
+err_requests:
+	kmem_cache_destroy(dev_priv->requests);
+err_vmas:
+	kmem_cache_destroy(dev_priv->vmas);
+err_objects:
+	kmem_cache_destroy(dev_priv->objects);
+err_out:
+	return err;
 }
 
 void i915_gem_load_cleanup(struct drm_device *dev)
diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
index 8292e797d9b5..735580d72eb1 100644
--- a/drivers/gpu/drm/i915/i915_gem.h
+++ b/drivers/gpu/drm/i915/i915_gem.h
@@ -31,4 +31,6 @@
 #define GEM_BUG_ON(expr)
 #endif
 
+#define I915_NUM_ENGINES 5
+
 #endif /* __I915_GEM_H__ */
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index bc27c80be1e4..5817ec55849f 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -40,7 +40,7 @@ static const char *i915_fence_get_timeline_name(struct fence *fence)
 	 * multiple execution contexts (fence contexts) as we allow
 	 * engines within a single timeline to execute in parallel.
 	 */
-	return "global";
+	return to_request(fence)->timeline->common->name;
 }
 
 static bool i915_fence_signaled(struct fence *fence)
@@ -208,7 +208,7 @@ void i915_gem_request_retire_upto(struct drm_i915_gem_request *req)
 		return;
 
 	do {
-		tmp = list_first_entry(&engine->request_list,
+		tmp = list_first_entry(&engine->timeline->requests,
 				       typeof(*tmp), link);
 
 		i915_gem_request_retire(tmp);
@@ -235,23 +235,24 @@ static int i915_gem_check_wedge(struct drm_i915_private *dev_priv)
 	return 0;
 }
 
-static int i915_gem_init_seqno(struct drm_i915_private *dev_priv, u32 seqno)
+static int i915_gem_init_global_seqno(struct drm_i915_private *dev_priv,
+				      u32 seqno)
 {
 	struct intel_engine_cs *engine;
 	int ret;
 
 	/* Carefully retire all requests without writing to the rings */
-	for_each_engine(engine, dev_priv) {
-		ret = intel_engine_idle(engine,
-					I915_WAIT_INTERRUPTIBLE |
-					I915_WAIT_LOCKED);
-		if (ret)
-			return ret;
-	}
+	ret = i915_gem_wait_for_idle(dev_priv,
+				     I915_WAIT_INTERRUPTIBLE |
+				     I915_WAIT_LOCKED);
+	if (ret)
+		return ret;
+
 	i915_gem_retire_requests(dev_priv);
 
 	/* If the seqno wraps around, we need to clear the breadcrumb rbtree */
-	if (!i915_seqno_passed(seqno, dev_priv->next_seqno)) {
+	if (!i915_seqno_passed(seqno,
+			       dev_priv->gt.global_timeline.next_seqno)) {
 		while (intel_kick_waiters(dev_priv) ||
 		       intel_kick_signalers(dev_priv))
 			yield();
@@ -259,12 +260,12 @@ static int i915_gem_init_seqno(struct drm_i915_private *dev_priv, u32 seqno)
 
 	/* Finally reset hw state */
 	for_each_engine(engine, dev_priv)
-		intel_engine_init_seqno(engine, seqno);
+		intel_engine_init_global_seqno(engine, seqno);
 
 	return 0;
 }
 
-int i915_gem_set_seqno(struct drm_device *dev, u32 seqno)
+int i915_gem_set_global_seqno(struct drm_device *dev, u32 seqno)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	int ret;
@@ -275,28 +276,31 @@ int i915_gem_set_seqno(struct drm_device *dev, u32 seqno)
 	/* HWS page needs to be set less than what we
 	 * will inject to ring
 	 */
-	ret = i915_gem_init_seqno(dev_priv, seqno - 1);
+	ret = i915_gem_init_global_seqno(dev_priv, seqno - 1);
 	if (ret)
 		return ret;
 
-	dev_priv->next_seqno = seqno;
+	dev_priv->gt.global_timeline.next_seqno = seqno;
 	return 0;
 }
 
-static int i915_gem_get_seqno(struct drm_i915_private *dev_priv, u32 *seqno)
+static int i915_gem_get_global_seqno(struct drm_i915_private *dev_priv,
+				     u32 *seqno)
 {
+	struct i915_gem_timeline *tl = &dev_priv->gt.global_timeline;
+
 	/* reserve 0 for non-seqno */
-	if (unlikely(dev_priv->next_seqno == 0)) {
+	if (unlikely(tl->next_seqno == 0)) {
 		int ret;
 
-		ret = i915_gem_init_seqno(dev_priv, 0);
+		ret = i915_gem_init_global_seqno(dev_priv, 0);
 		if (ret)
 			return ret;
 
-		dev_priv->next_seqno = 1;
+		tl->next_seqno = 1;
 	}
 
-	*seqno = dev_priv->next_seqno++;
+	*seqno = tl->next_seqno++;
 	return 0;
 }
 
@@ -350,7 +354,7 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
 		return ERR_PTR(ret);
 
 	/* Move the oldest request to the slab-cache (if not in use!) */
-	req = list_first_entry_or_null(&engine->request_list,
+	req = list_first_entry_or_null(&engine->timeline->requests,
 				       typeof(*req), link);
 	if (req && i915_gem_request_completed(req))
 		i915_gem_request_retire(req);
@@ -387,15 +391,17 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
 	if (!req)
 		return ERR_PTR(-ENOMEM);
 
-	ret = i915_gem_get_seqno(dev_priv, &seqno);
+	ret = i915_gem_get_global_seqno(dev_priv, &seqno);
 	if (ret)
 		goto err;
 
+	req->timeline = engine->timeline;
+
 	spin_lock_init(&req->lock);
 	fence_init(&req->fence,
 		   &i915_fence_ops,
 		   &req->lock,
-		   engine->fence_context,
+		   req->timeline->fence_context,
 		   seqno);
 
 	i915_sw_fence_init(&req->submit, submit_notify);
@@ -450,9 +456,15 @@ i915_gem_request_await_request(struct drm_i915_gem_request *to,
 
 	GEM_BUG_ON(to == from);
 
-	if (to->engine == from->engine)
+	if (to->timeline == from->timeline)
 		return 0;
 
+	if (to->engine == from->engine) {
+		ret = i915_sw_fence_await_sw_fence(&to->submit, &from->submit,
+						   NULL, GFP_KERNEL);
+		return ret < 0 ? ret : 0;
+	}
+
 	idx = intel_engine_sync_index(from->engine, to->engine);
 	if (from->fence.seqno <= from->engine->semaphore.sync_seqno[idx])
 		return 0;
@@ -607,6 +619,7 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
 {
 	struct intel_engine_cs *engine = request->engine;
 	struct intel_ring *ring = request->ring;
+	struct intel_timeline *timeline = request->timeline;
 	struct drm_i915_gem_request *prev;
 	u32 request_start;
 	u32 reserved_tail;
@@ -663,17 +676,17 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
 	 * see a more recent value in the hws than we are tracking.
 	 */
 
-	prev = i915_gem_active_raw(&engine->last_request,
+	prev = i915_gem_active_raw(&timeline->last_request,
 				   &request->i915->drm.struct_mutex);
 	if (prev)
 		i915_sw_fence_await_sw_fence(&request->submit, &prev->submit,
 					     &request->submitq, GFP_NOWAIT);
 
 	request->emitted_jiffies = jiffies;
-	request->previous_seqno = engine->last_submitted_seqno;
-	engine->last_submitted_seqno = request->fence.seqno;
-	i915_gem_active_set(&engine->last_request, request);
-	list_add_tail(&request->link, &engine->request_list);
+	request->previous_seqno = timeline->last_submitted_seqno;
+	timeline->last_submitted_seqno = request->fence.seqno;
+	i915_gem_active_set(&timeline->last_request, request);
+	list_add_tail(&request->link, &timeline->requests);
 	list_add_tail(&request->ring_link, &ring->request_list);
 
 	i915_gem_mark_busy(engine);
@@ -874,7 +887,8 @@ static bool engine_retire_requests(struct intel_engine_cs *engine)
 {
 	struct drm_i915_gem_request *request, *next;
 
-	list_for_each_entry_safe(request, next, &engine->request_list, link) {
+	list_for_each_entry_safe(request, next,
+				 &engine->timeline->requests, link) {
 		if (!i915_gem_request_completed(request))
 			return false;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index fe75026c60e0..6c78ca0d85a9 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -81,6 +81,7 @@ struct drm_i915_gem_request {
 	struct i915_gem_context *ctx;
 	struct intel_engine_cs *engine;
 	struct intel_ring *ring;
+	struct intel_timeline *timeline;
 	struct intel_signal_node signaling;
 
 	struct i915_sw_fence submit;
diff --git a/drivers/gpu/drm/i915/i915_gem_timeline.c b/drivers/gpu/drm/i915/i915_gem_timeline.c
new file mode 100644
index 000000000000..da2e0d0c232e
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_gem_timeline.c
@@ -0,0 +1,63 @@
+/*
+ * Copyright © 2016 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include "i915_drv.h"
+
+int i915_gem_timeline_init(struct drm_i915_private *i915,
+			   struct i915_gem_timeline *timeline,
+			   const char *name)
+{
+	u64 fences;
+	int i;
+
+	lockdep_assert_held(&i915->drm.struct_mutex);
+
+	timeline->i915 = i915;
+	timeline->name = kstrdup(name ?: "[kernel]", GFP_KERNEL);
+	if (!timeline->name)
+		return -ENOMEM;
+
+	list_add(&timeline->link, &i915->gt.timelines);
+
+	fences = fence_context_alloc(ARRAY_SIZE(timeline->engine));
+	for (i = 0; i < ARRAY_SIZE(timeline->engine); i++) {
+		struct intel_timeline *tl = &timeline->engine[i];
+
+		tl->fence_context = fences + i;
+		tl->common = timeline;
+
+		init_request_active(&tl->last_request, NULL);
+		INIT_LIST_HEAD(&tl->requests);
+	}
+
+	return 0;
+}
+
+void i915_gem_timeline_fini(struct i915_gem_timeline *tl)
+{
+	lockdep_assert_held(&tl->i915->drm.struct_mutex);
+
+	list_del(&tl->link);
+	kfree(tl->name);
+}
diff --git a/drivers/gpu/drm/i915/i915_gem_timeline.h b/drivers/gpu/drm/i915/i915_gem_timeline.h
new file mode 100644
index 000000000000..c3cbaf28e959
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_gem_timeline.h
@@ -0,0 +1,69 @@
+/*
+ * Copyright © 2016 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#ifndef I915_GEM_TIMELINE_H
+#define I915_GEM_TIMELINE_H
+
+#include <linux/list.h>
+
+#include "i915_gem_request.h"
+
+struct i915_gem_timeline;
+
+struct intel_timeline {
+	u64 fence_context;
+	u32 last_submitted_seqno;
+
+	/**
+	 * List of breadcrumbs associated with GPU requests currently
+	 * outstanding.
+	 */
+	struct list_head requests;
+
+	/* An RCU guarded pointer to the last request. No reference is
+	 * held to the request, users must carefully acquire a reference to
+	 * the request using i915_gem_active_get_request_rcu(), or hold the
+	 * struct_mutex.
+	 */
+	struct i915_gem_active last_request;
+
+	struct i915_gem_timeline *common;
+};
+
+struct i915_gem_timeline {
+	struct list_head link;
+	u32 next_seqno;
+
+	struct drm_i915_private *i915;
+	const char *name;
+
+	struct intel_timeline engine[I915_NUM_ENGINES];
+};
+
+int i915_gem_timeline_init(struct drm_i915_private *i915,
+			   struct i915_gem_timeline *tl,
+			   const char *name);
+void i915_gem_timeline_fini(struct i915_gem_timeline *tl);
+
+#endif
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 3d3f410d9aa9..1e271008f1ef 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1017,7 +1017,7 @@ static void error_record_engine_registers(struct drm_i915_error_state *error,
 	ee->instpm = I915_READ(RING_INSTPM(engine->mmio_base));
 	ee->acthd = intel_engine_get_active_head(engine);
 	ee->seqno = intel_engine_get_seqno(engine);
-	ee->last_seqno = engine->last_submitted_seqno;
+	ee->last_seqno = engine->timeline->last_submitted_seqno;
 	ee->start = I915_READ_START(engine);
 	ee->head = I915_READ_HEAD(engine);
 	ee->tail = I915_READ_TAIL(engine);
@@ -1088,7 +1088,7 @@ static void engine_record_requests(struct intel_engine_cs *engine,
 
 	count = 0;
 	request = first;
-	list_for_each_entry_from(request, &engine->request_list, link)
+	list_for_each_entry_from(request, &engine->timeline->requests, link)
 		count++;
 	if (!count)
 		return;
@@ -1101,7 +1101,7 @@ static void engine_record_requests(struct intel_engine_cs *engine,
 
 	count = 0;
 	request = first;
-	list_for_each_entry_from(request, &engine->request_list, link) {
+	list_for_each_entry_from(request, &engine->timeline->requests, link) {
 		struct drm_i915_error_request *erq;
 
 		if (count >= ee->num_requests) {
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 279a4d060288..b73e9020cbd2 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -1024,7 +1024,8 @@ int i915_guc_submission_enable(struct drm_i915_private *dev_priv)
 		engine->submit_request = i915_guc_submit;
 
 		/* Replay the current set of previously submitted requests */
-		list_for_each_entry(request, &engine->request_list, link) {
+		list_for_each_entry(request,
+				    &engine->timeline->requests, link) {
 			client->wq_rsvd += sizeof(struct guc_wq_item);
 			if (i915_sw_fence_done(&request->submit))
 				i915_guc_submit(request);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 8462817a7dae..ad6b1459a1a9 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3094,7 +3094,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 
 		acthd = intel_engine_get_active_head(engine);
 		seqno = intel_engine_get_seqno(engine);
-		submit = READ_ONCE(engine->last_submitted_seqno);
+		submit = READ_ONCE(engine->timeline->last_submitted_seqno);
 
 		if (engine->hangcheck.seqno == seqno) {
 			if (i915_seqno_passed(seqno, submit)) {
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index e405f1080296..87feaeff4304 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -161,7 +161,7 @@ cleanup:
 	return ret;
 }
 
-void intel_engine_init_seqno(struct intel_engine_cs *engine, u32 seqno)
+void intel_engine_init_global_seqno(struct intel_engine_cs *engine, u32 seqno)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
 
@@ -197,7 +197,7 @@ void intel_engine_init_seqno(struct intel_engine_cs *engine, u32 seqno)
 	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
 	if (engine->irq_seqno_barrier)
 		engine->irq_seqno_barrier(engine);
-	engine->last_submitted_seqno = seqno;
+	engine->timeline->last_submitted_seqno = seqno;
 
 	engine->hangcheck.seqno = seqno;
 
@@ -217,8 +217,7 @@ void intel_engine_init_hangcheck(struct intel_engine_cs *engine)
 
 static void intel_engine_init_requests(struct intel_engine_cs *engine)
 {
-	init_request_active(&engine->last_request, NULL);
-	INIT_LIST_HEAD(&engine->request_list);
+	engine->timeline = &engine->i915->gt.global_timeline.engine[engine->id];
 }
 
 /**
@@ -235,8 +234,6 @@ void intel_engine_setup_common(struct intel_engine_cs *engine)
 	INIT_LIST_HEAD(&engine->execlist_queue);
 	spin_lock_init(&engine->execlist_lock);
 
-	engine->fence_context = fence_context_alloc(1);
-
 	intel_engine_init_requests(engine);
 	intel_engine_init_hangcheck(engine);
 	i915_gem_batch_pool_init(engine, &engine->batch_pool);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 80f2db3d63d9..d6f80b3d66b4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -4,6 +4,7 @@
 #include <linux/hashtable.h>
 #include "i915_gem_batch_pool.h"
 #include "i915_gem_request.h"
+#include "i915_gem_timeline.h"
 
 #define I915_CMD_HASH_ORDER 9
 
@@ -141,7 +142,6 @@ struct intel_engine_cs {
 		VCS2,	/* Keep instances of the same type engine together. */
 		VECS
 	} id;
-#define I915_NUM_ENGINES 5
 #define _VCS(n) (VCS + (n))
 	unsigned int exec_id;
 	enum intel_engine_hw_id {
@@ -152,10 +152,10 @@ struct intel_engine_cs {
 		VCS2_HW
 	} hw_id;
 	enum intel_engine_hw_id guc_id; /* XXX same as hw_id? */
-	u64 fence_context;
 	u32		mmio_base;
 	unsigned int irq_shift;
 	struct intel_ring *buffer;
+	struct intel_timeline *timeline;
 
 	/* Rather than have every client wait upon all user interrupts,
 	 * with the herd waking after every interrupt and each doing the
@@ -316,26 +316,6 @@ struct intel_engine_cs {
 	bool preempt_wa;
 	u32 ctx_desc_template;
 
-	/**
-	 * List of breadcrumbs associated with GPU requests currently
-	 * outstanding.
-	 */
-	struct list_head request_list;
-
-	/**
-	 * Seqno of request most recently submitted to request_list.
-	 * Used exclusively by hang checker to avoid grabbing lock while
-	 * inspecting request list.
-	 */
-	u32 last_submitted_seqno;
-
-	/* An RCU guarded pointer to the last request. No reference is
-	 * held to the request, users must carefully acquire a reference to
-	 * the request using i915_gem_active_get_rcu(), or hold the
-	 * struct_mutex.
-	 */
-	struct i915_gem_active last_request;
-
 	struct i915_gem_context *last_context;
 
 	struct intel_engine_hangcheck hangcheck;
@@ -491,7 +471,7 @@ static inline u32 intel_ring_offset(struct intel_ring *ring, u32 value)
 int __intel_ring_space(int head, int tail, int size);
 void intel_ring_update_space(struct intel_ring *ring);
 
-void intel_engine_init_seqno(struct intel_engine_cs *engine, u32 seqno);
+void intel_engine_init_global_seqno(struct intel_engine_cs *engine, u32 seqno);
 void intel_engine_reset_irq(struct intel_engine_cs *engine);
 
 void intel_engine_setup_common(struct intel_engine_cs *engine);
@@ -499,13 +479,6 @@ int intel_engine_init_common(struct intel_engine_cs *engine);
 int intel_engine_create_scratch(struct intel_engine_cs *engine, int size);
 void intel_engine_cleanup_common(struct intel_engine_cs *engine);
 
-static inline int intel_engine_idle(struct intel_engine_cs *engine,
-				    unsigned int flags)
-{
-	/* Wait upon the last request to be completed */
-	return i915_gem_active_wait(&engine->last_request, flags);
-}
-
 int intel_init_render_ring_buffer(struct intel_engine_cs *engine);
 int intel_init_bsd_ring_buffer(struct intel_engine_cs *engine);
 int intel_init_bsd2_ring_buffer(struct intel_engine_cs *engine);
@@ -589,7 +562,7 @@ unsigned int intel_kick_signalers(struct drm_i915_private *i915);
 
 static inline bool intel_engine_is_active(struct intel_engine_cs *engine)
 {
-	return i915_gem_active_isset(&engine->last_request);
+	return i915_gem_active_isset(&engine->timeline->last_request);
 }
 
 #endif /* _INTEL_RINGBUFFER_H_ */
-- 
2.9.3


* [PATCH 09/18] drm/i915: Wait first for submission, before waiting for request completion
  2016-09-14  6:52 Tracking multiple timelines (full-ppgtt) Chris Wilson
                   ` (7 preceding siblings ...)
  2016-09-14  6:52 ` [PATCH 08/18] drm/i915: Combine seqno + tracking into a global timeline struct Chris Wilson
@ 2016-09-14  6:52 ` Chris Wilson
  2016-09-19  8:59   ` Joonas Lahtinen
  2016-09-14  6:52 ` [PATCH 10/18] drm/i915: Introduce a global_seqno for each request Chris Wilson
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 51+ messages in thread
From: Chris Wilson @ 2016-09-14  6:52 UTC (permalink / raw)
  To: intel-gfx

In future patches, we will no longer be able to wait on a static global
seqno and will instead have to break our wait up into two phases. First we wait
for the global seqno assignment (upon submission to hardware), and once
submitted we wait for the hardware to complete.
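
Schematically, the wait becomes: sleep until the request's submit fence
signals (at which point it owns a seqno the hardware will write), then
fall through to the existing seqno wait. A toy single-threaded model of
that ordering (all names invented; timeouts modelled as loop counters
rather than jiffies):

#include <stdbool.h>
#include <stdio.h>

struct request {
	bool submitted;		/* models i915_sw_fence_done(&req->submit) */
	unsigned int seqno;	/* hardware breadcrumb to wait for */
};

static unsigned int hw_seqno;	/* what the hardware has completed */

/* Phase 1: wait until the request has been handed to the hardware. */
static long wait_for_submit(struct request *rq, long timeout)
{
	while (!rq->submitted && timeout)
		timeout--;	/* stand-in for io_schedule_timeout() */
	return timeout;
}

/* Phase 2: the pre-existing wait on the breadcrumb seqno. */
static long wait_for_complete(struct request *rq, long timeout)
{
	while (hw_seqno < rq->seqno && timeout)
		timeout--;
	return timeout;
}

static long wait_request(struct request *rq, long timeout)
{
	if (!rq->submitted) {
		timeout = wait_for_submit(rq, timeout);
		if (timeout <= 0)
			return timeout;	/* timed out before submission */
	}
	return wait_for_complete(rq, timeout);
}

int main(void)
{
	struct request rq = { .submitted = false, .seqno = 1 };

	rq.submitted = true;	/* submission fence signals */
	hw_seqno = 1;		/* hardware retires the request */
	printf("remaining timeout: %ld\n", wait_request(&rq, 100));
	return 0;
}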

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_request.c | 49 +++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 5817ec55849f..1b8463a474bb 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -770,6 +770,49 @@ bool __i915_spin_request(const struct drm_i915_gem_request *req,
 	return false;
 }
 
+static long
+__i915_request_wait_for_submit(struct drm_i915_gem_request *request,
+			       unsigned int flags,
+			       long timeout)
+{
+	const int state = flags & I915_WAIT_INTERRUPTIBLE ?
+		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
+	wait_queue_head_t *q = &request->i915->gpu_error.wait_queue;
+	DEFINE_WAIT(reset);
+	DEFINE_WAIT(wait);
+
+	if (flags & I915_WAIT_LOCKED)
+		add_wait_queue(q, &reset);
+
+	do {
+		prepare_to_wait(&request->submit.wait, &wait, state);
+
+		if (i915_sw_fence_done(&request->submit))
+			break;
+
+		if (flags & I915_WAIT_LOCKED &&
+		    i915_reset_in_progress(&request->i915->gpu_error)) {
+			__set_current_state(TASK_RUNNING);
+			i915_reset(request->i915);
+			reset_wait_queue(q, &reset);
+			continue;
+		}
+
+		if (signal_pending_state(state, current)) {
+			timeout = -ERESTARTSYS;
+			break;
+		}
+
+		timeout = io_schedule_timeout(timeout);
+	} while (timeout);
+	finish_wait(&request->submit.wait, &wait);
+
+	if (flags & I915_WAIT_LOCKED)
+		remove_wait_queue(q, &reset);
+
+	return timeout;
+}
+
 /**
  * i915_wait_request - wait until execution of request has finished
  * @req: duh!
@@ -808,6 +851,12 @@ long i915_wait_request(struct drm_i915_gem_request *req,
 
 	trace_i915_gem_request_wait_begin(req);
 
+	if (!i915_sw_fence_done(&req->submit)) {
+		timeout = __i915_request_wait_for_submit(req, flags, timeout);
+		if (timeout < 0)
+			goto complete;
+	}
+
 	/* Optimistic short spin before touching IRQs */
 	if (i915_spin_request(req, state, 5))
 		goto complete;
-- 
2.9.3


* [PATCH 10/18] drm/i915: Introduce a global_seqno for each request
  2016-09-14  6:52 Tracking multiple timelines (full-ppgtt) Chris Wilson
                   ` (8 preceding siblings ...)
  2016-09-14  6:52 ` [PATCH 09/18] drm/i915: Wait first for submission, before waiting for request completion Chris Wilson
@ 2016-09-14  6:52 ` Chris Wilson
  2016-09-19 10:36   ` Joonas Lahtinen
  2016-09-14  6:52 ` [PATCH 11/18] drm/i915: Record space required for request emission Chris Wilson
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 51+ messages in thread
From: Chris Wilson @ 2016-09-14  6:52 UTC (permalink / raw)
  To: intel-gfx

Though we will have multiple timelines, we still have a single timeline
of execution. This we can use to provide an execution and retirement order
of requests. This keeps tracking the execution of requests simple, and is
vital for preserving a single waiter (i.e. so that we can order the waiters
such that only the earliest to wake up need be woken). To accomplish this we
distinguish the seqno used to order requests per-context (external) and
that used internally for execution.
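
The invariant this establishes: request->global_seqno stays zero until
the request is actually submitted to the hardware, so every seqno-based
query has to check for assignment first. A standalone sketch of the
completed() split (names abbreviated from the patch; the wrap-safe
comparison mirrors i915_seqno_passed()):

#include <stdbool.h>
#include <stdio.h>

struct request {
	unsigned int fence_seqno;	/* per-timeline, set at allocation */
	unsigned int global_seqno;	/* 0 until submitted to hardware */
};

static unsigned int engine_seqno;	/* breadcrumb the engine last wrote */

static bool seqno_passed(unsigned int a, unsigned int b)
{
	return (int)(a - b) >= 0;	/* wrap-safe comparison */
}

/* Completion against the execution order; only valid once submitted. */
static bool __request_completed(const struct request *rq)
{
	return seqno_passed(engine_seqno, rq->global_seqno);
}

static bool request_completed(const struct request *rq)
{
	if (!rq->global_seqno)	/* not yet submitted: cannot be complete */
		return false;
	return __request_completed(rq);
}

int main(void)
{
	struct request rq = { .fence_seqno = 42, .global_seqno = 0 };

	printf("before submit: %d\n", request_completed(&rq));
	rq.global_seqno = 7;	/* assigned in execution order at submit */
	engine_seqno = 7;	/* hardware writes the breadcrumb */
	printf("after retire: %d\n", request_completed(&rq));
	return 0;
}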

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |  2 +-
 drivers/gpu/drm/i915/i915_drv.h            |  4 ++--
 drivers/gpu/drm/i915/i915_gem.c            |  4 ++--
 drivers/gpu/drm/i915/i915_gem_request.c    | 23 ++++++++++++++++-------
 drivers/gpu/drm/i915/i915_gem_request.h    | 30 +++++++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_gpu_error.c      |  2 +-
 drivers/gpu/drm/i915/i915_guc_submission.c |  4 ++--
 drivers/gpu/drm/i915/i915_trace.h          |  8 ++++----
 drivers/gpu/drm/i915/intel_breadcrumbs.c   |  8 +++++---
 drivers/gpu/drm/i915/intel_lrc.c           |  4 ++--
 drivers/gpu/drm/i915/intel_ringbuffer.c    | 14 +++++++-------
 11 files changed, 67 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index d1752b0fee37..b325ec3b5a09 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -661,7 +661,7 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
 			rcu_read_lock();
 			task = pid ? pid_task(pid, PIDTYPE_PID) : NULL;
 			seq_printf(m, "    %x @ %d: %s [%d]\n",
-				   req->fence.seqno,
+				   req->global_seqno,
 				   (int) (jiffies - req->emitted_jiffies),
 				   task ? task->comm : "<unknown>",
 				   task ? task->pid : -1);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0ee9f8534826..bbb8a5e38ecf 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3844,7 +3844,7 @@ __i915_request_irq_complete(struct drm_i915_gem_request *req)
 	/* Before we do the heavier coherent read of the seqno,
 	 * check the value (hopefully) in the CPU cacheline.
 	 */
-	if (i915_gem_request_completed(req))
+	if (__i915_gem_request_completed(req))
 		return true;
 
 	/* Ensure our read of the seqno is coherent so that we
@@ -3895,7 +3895,7 @@ __i915_request_irq_complete(struct drm_i915_gem_request *req)
 			wake_up_process(tsk);
 		rcu_read_unlock();
 
-		if (i915_gem_request_completed(req))
+		if (__i915_gem_request_completed(req))
 			return true;
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index dcf4947026fa..76d6fc146adf 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2601,7 +2601,7 @@ i915_gem_find_active_request(struct intel_engine_cs *engine)
 	 * not need an engine->irq_seqno_barrier() before the seqno reads.
 	 */
 	list_for_each_entry(request, &engine->timeline->requests, link) {
-		if (i915_gem_request_completed(request))
+		if (__i915_gem_request_completed(request))
 			continue;
 
 		if (!i915_sw_fence_done(&request->submit))
@@ -2651,7 +2651,7 @@ static void i915_gem_reset_engine(struct intel_engine_cs *engine)
 		return;
 
 	DRM_DEBUG_DRIVER("resetting %s to restart from tail of request 0x%x\n",
-			 engine->name, request->fence.seqno);
+			 engine->name, request->global_seqno);
 
 	/* Setup the CS to resume from the breadcrumb of the hung request */
 	engine->reset_hw(engine, request);
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 1b8463a474bb..ebc2feba3e50 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -356,7 +356,7 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
 	/* Move the oldest request to the slab-cache (if not in use!) */
 	req = list_first_entry_or_null(&engine->timeline->requests,
 				       typeof(*req), link);
-	if (req && i915_gem_request_completed(req))
+	if (req && __i915_gem_request_completed(req))
 		i915_gem_request_retire(req);
 
 	/* Beware: Dragons be flying overhead.
@@ -367,7 +367,7 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
 	 * of being read by __i915_gem_active_get_rcu(). As such,
 	 * we have to be very careful when overwriting the contents. During
 	 * the RCU lookup, we change chase the request->engine pointer,
-	 * read the request->fence.seqno and increment the reference count.
+	 * read the request->global_seqno and increment the reference count.
 	 *
 	 * The reference count is incremented atomically. If it is zero,
 	 * the lookup knows the request is unallocated and complete. Otherwise,
@@ -409,6 +409,7 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
 	INIT_LIST_HEAD(&req->active_list);
 	req->i915 = dev_priv;
 	req->engine = engine;
+	req->global_seqno = seqno;
 	req->ctx = i915_gem_context_get(ctx);
 
 	/* No zalloc, must clear what we need by hand */
@@ -465,8 +466,15 @@ i915_gem_request_await_request(struct drm_i915_gem_request *to,
 		return ret < 0 ? ret : 0;
 	}
 
+	if (from->global_seqno == 0) {
+		ret = i915_sw_fence_await_dma_fence(&to->submit,
+						    &from->fence, 0,
+						    GFP_KERNEL);
+		return ret < 0 ? ret : 0;
+	}
+
 	idx = intel_engine_sync_index(from->engine, to->engine);
-	if (from->fence.seqno <= from->engine->semaphore.sync_seqno[idx])
+	if (from->global_seqno <= from->engine->semaphore.sync_seqno[idx])
 		return 0;
 
 	trace_i915_gem_ring_sync_to(to, from);
@@ -484,7 +492,7 @@ i915_gem_request_await_request(struct drm_i915_gem_request *to,
 			return ret;
 	}
 
-	from->engine->semaphore.sync_seqno[idx] = from->fence.seqno;
+	from->engine->semaphore.sync_seqno[idx] = from->global_seqno;
 	return 0;
 }
 
@@ -755,7 +763,7 @@ bool __i915_spin_request(const struct drm_i915_gem_request *req,
 
 	timeout_us += local_clock_us(&cpu);
 	do {
-		if (i915_gem_request_completed(req))
+		if (__i915_gem_request_completed(req))
 			return true;
 
 		if (signal_pending_state(state, current))
@@ -856,6 +864,7 @@ long i915_wait_request(struct drm_i915_gem_request *req,
 		if (timeout < 0)
 			goto complete;
 	}
+	GEM_BUG_ON(!req->global_seqno);
 
 	/* Optimistic short spin before touching IRQs */
 	if (i915_spin_request(req, state, 5))
@@ -865,7 +874,7 @@ long i915_wait_request(struct drm_i915_gem_request *req,
 	if (flags & I915_WAIT_LOCKED)
 		add_wait_queue(&req->i915->gpu_error.wait_queue, &reset);
 
-	intel_wait_init(&wait, req->fence.seqno);
+	intel_wait_init(&wait, req->global_seqno);
 	if (intel_engine_add_wait(req->engine, &wait))
 		/* In order to check that we haven't missed the interrupt
 		 * as we enabled it, we need to kick ourselves to do a
@@ -938,7 +947,7 @@ static bool engine_retire_requests(struct intel_engine_cs *engine)
 
 	list_for_each_entry_safe(request, next,
 				 &engine->timeline->requests, link) {
-		if (!i915_gem_request_completed(request))
+		if (!__i915_gem_request_completed(request))
 			return false;
 
 		i915_gem_request_retire(request);
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 6c78ca0d85a9..22dd7ac270e6 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -87,6 +87,8 @@ struct drm_i915_gem_request {
 	struct i915_sw_fence submit;
 	wait_queue_t submitq;
 
+	u32 global_seqno;
+
 	/** GEM sequence number associated with the previous request,
 	 * when the HWS breadcrumb is equal to this the GPU is processing
 	 * this request.
@@ -163,7 +165,7 @@ void i915_gem_request_retire_upto(struct drm_i915_gem_request *req);
 static inline u32
 i915_gem_request_get_seqno(struct drm_i915_gem_request *req)
 {
-	return req ? req->fence.seqno : 0;
+	return req ? req->global_seqno : 0;
 }
 
 static inline struct intel_engine_cs *
@@ -248,17 +250,35 @@ static inline bool i915_seqno_passed(u32 seq1, u32 seq2)
 }
 
 static inline bool
-i915_gem_request_started(const struct drm_i915_gem_request *req)
+__i915_gem_request_started(const struct drm_i915_gem_request *req)
 {
 	return i915_seqno_passed(intel_engine_get_seqno(req->engine),
 				 req->previous_seqno);
 }
 
 static inline bool
-i915_gem_request_completed(const struct drm_i915_gem_request *req)
+i915_gem_request_started(const struct drm_i915_gem_request *req)
+{
+	if (!req->global_seqno)
+		return false;
+
+	return __i915_gem_request_started(req);
+}
+
+static inline bool
+__i915_gem_request_completed(const struct drm_i915_gem_request *req)
 {
 	return i915_seqno_passed(intel_engine_get_seqno(req->engine),
-				 req->fence.seqno);
+				 req->global_seqno);
+}
+
+static inline bool
+i915_gem_request_completed(const struct drm_i915_gem_request *req)
+{
+	if (!req->global_seqno)
+		return false;
+
+	return __i915_gem_request_completed(req);
 }
 
 bool __i915_spin_request(const struct drm_i915_gem_request *request,
@@ -266,7 +286,7 @@ bool __i915_spin_request(const struct drm_i915_gem_request *request,
 static inline bool i915_spin_request(const struct drm_i915_gem_request *request,
 				     int state, unsigned long timeout_us)
 {
-	return (i915_gem_request_started(request) &&
+	return (__i915_gem_request_started(request) &&
 		__i915_spin_request(request, state, timeout_us));
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 1e271008f1ef..65761c16ac48 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1124,7 +1124,7 @@ static void engine_record_requests(struct intel_engine_cs *engine,
 		}
 
 		erq = &ee->requests[count++];
-		erq->seqno = request->fence.seqno;
+		erq->seqno = request->global_seqno;
 		erq->jiffies = request->emitted_jiffies;
 		erq->head = request->head;
 		erq->tail = request->tail;
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index b73e9020cbd2..d7b44e7afe59 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -508,7 +508,7 @@ static void guc_add_workqueue_item(struct i915_guc_client *gc,
 	wqi->context_desc = (u32)intel_lr_context_descriptor(rq->ctx, engine);
 
 	wqi->ring_tail = tail << WQ_RING_TAIL_SHIFT;
-	wqi->fence_id = rq->fence.seqno;
+	wqi->fence_id = rq->global_seqno;
 
 	kunmap_atomic(base);
 }
@@ -604,7 +604,7 @@ static void i915_guc_submit(struct drm_i915_gem_request *rq)
 		client->b_fail += 1;
 
 	guc->submissions[engine_id] += 1;
-	guc->last_seqno[engine_id] = rq->fence.seqno;
+	guc->last_seqno[engine_id] = rq->global_seqno;
 	spin_unlock(&client->wq_lock);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 178798002a73..4c46f7c00323 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -466,7 +466,7 @@ TRACE_EVENT(i915_gem_ring_sync_to,
 			   __entry->dev = from->i915->drm.primary->index;
 			   __entry->sync_from = from->engine->id;
 			   __entry->sync_to = to->engine->id;
-			   __entry->seqno = from->fence.seqno;
+			   __entry->seqno = from->global_seqno;
 			   ),
 
 	    TP_printk("dev=%u, sync-from=%u, sync-to=%u, seqno=%u",
@@ -489,7 +489,7 @@ TRACE_EVENT(i915_gem_ring_dispatch,
 	    TP_fast_assign(
 			   __entry->dev = req->i915->drm.primary->index;
 			   __entry->ring = req->engine->id;
-			   __entry->seqno = req->fence.seqno;
+			   __entry->seqno = req->global_seqno;
 			   __entry->flags = flags;
 			   fence_enable_sw_signaling(&req->fence);
 			   ),
@@ -534,7 +534,7 @@ DECLARE_EVENT_CLASS(i915_gem_request,
 	    TP_fast_assign(
 			   __entry->dev = req->i915->drm.primary->index;
 			   __entry->ring = req->engine->id;
-			   __entry->seqno = req->fence.seqno;
+			   __entry->seqno = req->global_seqno;
 			   ),
 
 	    TP_printk("dev=%u, ring=%u, seqno=%u",
@@ -596,7 +596,7 @@ TRACE_EVENT(i915_gem_request_wait_begin,
 	    TP_fast_assign(
 			   __entry->dev = req->i915->drm.primary->index;
 			   __entry->ring = req->engine->id;
-			   __entry->seqno = req->fence.seqno;
+			   __entry->seqno = req->global_seqno;
 			   __entry->blocking =
 				     mutex_is_locked(&req->i915->drm.struct_mutex);
 			   ),
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 9bad14d22c95..9ad1028681cf 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -504,9 +504,11 @@ void intel_engine_enable_signaling(struct drm_i915_gem_request *request)
 
 	/* locked by fence_enable_sw_signaling() */
 	assert_spin_locked(&request->lock);
+	if (!request->global_seqno)
+		return;
 
 	request->signaling.wait.tsk = b->signaler;
-	request->signaling.wait.seqno = request->fence.seqno;
+	request->signaling.wait.seqno = request->global_seqno;
 	i915_gem_request_get(request);
 
 	spin_lock(&b->lock);
@@ -530,8 +532,8 @@ void intel_engine_enable_signaling(struct drm_i915_gem_request *request)
 	p = &b->signals.rb_node;
 	while (*p) {
 		parent = *p;
-		if (i915_seqno_passed(request->fence.seqno,
-				      to_signaler(parent)->fence.seqno)) {
+		if (i915_seqno_passed(request->global_seqno,
+				      to_signaler(parent)->global_seqno)) {
 			p = &parent->rb_right;
 			first = false;
 		} else {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 16d7cdd11082..00fcf36ba919 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1566,7 +1566,7 @@ static int gen8_emit_request(struct drm_i915_gem_request *request)
 			intel_hws_seqno_address(request->engine) |
 			MI_FLUSH_DW_USE_GTT);
 	intel_ring_emit(ring, 0);
-	intel_ring_emit(ring, request->fence.seqno);
+	intel_ring_emit(ring, request->global_seqno);
 	intel_ring_emit(ring, MI_USER_INTERRUPT);
 	intel_ring_emit(ring, MI_NOOP);
 	return intel_logical_ring_advance(request);
@@ -1595,7 +1595,7 @@ static int gen8_emit_request_render(struct drm_i915_gem_request *request)
 			 PIPE_CONTROL_QW_WRITE));
 	intel_ring_emit(ring, intel_hws_seqno_address(request->engine));
 	intel_ring_emit(ring, 0);
-	intel_ring_emit(ring, i915_gem_request_get_seqno(request));
+	intel_ring_emit(ring, request->global_seqno);
 	/* We're thrashing one dword of HWS. */
 	intel_ring_emit(ring, 0);
 	intel_ring_emit(ring, MI_USER_INTERRUPT);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 4bf6bd056b26..597e35c9b699 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1296,7 +1296,7 @@ static int gen8_rcs_signal(struct drm_i915_gem_request *req)
 				PIPE_CONTROL_CS_STALL);
 		intel_ring_emit(ring, lower_32_bits(gtt_offset));
 		intel_ring_emit(ring, upper_32_bits(gtt_offset));
-		intel_ring_emit(ring, req->fence.seqno);
+		intel_ring_emit(ring, req->global_seqno);
 		intel_ring_emit(ring, 0);
 		intel_ring_emit(ring,
 				MI_SEMAPHORE_SIGNAL |
@@ -1332,7 +1332,7 @@ static int gen8_xcs_signal(struct drm_i915_gem_request *req)
 				lower_32_bits(gtt_offset) |
 				MI_FLUSH_DW_USE_GTT);
 		intel_ring_emit(ring, upper_32_bits(gtt_offset));
-		intel_ring_emit(ring, req->fence.seqno);
+		intel_ring_emit(ring, req->global_seqno);
 		intel_ring_emit(ring,
 				MI_SEMAPHORE_SIGNAL |
 				MI_SEMAPHORE_TARGET(waiter->hw_id));
@@ -1365,7 +1365,7 @@ static int gen6_signal(struct drm_i915_gem_request *req)
 		if (i915_mmio_reg_valid(mbox_reg)) {
 			intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 			intel_ring_emit_reg(ring, mbox_reg);
-			intel_ring_emit(ring, req->fence.seqno);
+			intel_ring_emit(ring, req->global_seqno);
 		}
 	}
 
@@ -1396,7 +1396,7 @@ static int i9xx_emit_request(struct drm_i915_gem_request *req)
 
 	intel_ring_emit(ring, MI_STORE_DWORD_INDEX);
 	intel_ring_emit(ring, I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
-	intel_ring_emit(ring, req->fence.seqno);
+	intel_ring_emit(ring, req->global_seqno);
 	intel_ring_emit(ring, MI_USER_INTERRUPT);
 	intel_ring_advance(ring);
 
@@ -1446,7 +1446,7 @@ static int gen8_render_emit_request(struct drm_i915_gem_request *req)
 			       PIPE_CONTROL_QW_WRITE));
 	intel_ring_emit(ring, intel_hws_seqno_address(engine));
 	intel_ring_emit(ring, 0);
-	intel_ring_emit(ring, i915_gem_request_get_seqno(req));
+	intel_ring_emit(ring, req->global_seqno);
 	/* We're thrashing one dword of HWS. */
 	intel_ring_emit(ring, 0);
 	intel_ring_emit(ring, MI_USER_INTERRUPT);
@@ -1484,7 +1484,7 @@ gen8_ring_sync_to(struct drm_i915_gem_request *req,
 			MI_SEMAPHORE_WAIT |
 			MI_SEMAPHORE_GLOBAL_GTT |
 			MI_SEMAPHORE_SAD_GTE_SDD);
-	intel_ring_emit(ring, signal->fence.seqno);
+	intel_ring_emit(ring, signal->global_seqno);
 	intel_ring_emit(ring, lower_32_bits(offset));
 	intel_ring_emit(ring, upper_32_bits(offset));
 	intel_ring_advance(ring);
@@ -1522,7 +1522,7 @@ gen6_ring_sync_to(struct drm_i915_gem_request *req,
 	 * seqno is >= the last seqno executed. However for hardware the
 	 * comparison is strictly greater than.
 	 */
-	intel_ring_emit(ring, signal->fence.seqno - 1);
+	intel_ring_emit(ring, signal->global_seqno - 1);
 	intel_ring_emit(ring, 0);
 	intel_ring_emit(ring, MI_NOOP);
 	intel_ring_advance(ring);
-- 
2.9.3

* [PATCH 11/18] drm/i915: Record space required for request emission
  2016-09-14  6:52 Tracking multiple timelines (full-ppgtt) Chris Wilson
                   ` (9 preceding siblings ...)
  2016-09-14  6:52 ` [PATCH 10/18] drm/i915: Introduce a global_seqno for each request Chris Wilson
@ 2016-09-14  6:52 ` Chris Wilson
  2016-09-14 13:30   ` Tvrtko Ursulin
  2016-09-19 10:47   ` Joonas Lahtinen
  2016-09-14  6:52 ` [PATCH 12/18] drm/i915: Defer " Chris Wilson
                   ` (7 subsequent siblings)
  18 siblings, 2 replies; 51+ messages in thread
From: Chris Wilson @ 2016-09-14  6:52 UTC (permalink / raw)
  To: intel-gfx

In the next patch, we will use deferred request emission. That requires
reserving sufficient space in the ringbuffer to emit the request, which
in turn requires us to know how large the request is.
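
As an illustrative sketch (not part of the patch), the size reserved
for a gen6/7 engine with semaphores enabled is derived as below; the
constants mirror the intel_ring_default_vfuncs() hunk that follows:

	/* Sketch only: dwords to reserve for the closing breadcrumb on a
	 * gen6/7 engine signalling its mailbox semaphores.
	 */
	static int example_emit_request_sz(struct drm_i915_private *dev_priv)
	{
		int num_rings = hweight32(INTEL_INFO(dev_priv)->ring_mask) - 1;
		int sz = 4;		/* i9xx_emit_request_sz */

		sz += num_rings * 3;	/* MI_LOAD_REGISTER_IMM per mailbox */
		if (num_rings & 1)	/* pad the command stream to a qword */
			sz++;

		return sz;
	}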

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_request.c |  1 +
 drivers/gpu/drm/i915/intel_lrc.c        |  6 ++++++
 drivers/gpu/drm/i915/intel_ringbuffer.c | 29 +++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
 4 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index ebc2feba3e50..9a735e2d5aea 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -425,6 +425,7 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
 	 * away, e.g. because a GPU scheduler has deferred it.
 	 */
 	req->reserved_space = MIN_SPACE_FOR_ADD_REQUEST;
+	GEM_BUG_ON(req->reserved_space < engine->emit_request_sz);
 
 	if (i915.enable_execlists)
 		ret = intel_logical_ring_alloc_request_extras(req);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 00fcf36ba919..7e944a0acc10 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1572,6 +1572,8 @@ static int gen8_emit_request(struct drm_i915_gem_request *request)
 	return intel_logical_ring_advance(request);
 }
 
+static const int gen8_emit_request_sz = 6 + WA_TAIL_DWORDS;
+
 static int gen8_emit_request_render(struct drm_i915_gem_request *request)
 {
 	struct intel_ring *ring = request->ring;
@@ -1603,6 +1605,8 @@ static int gen8_emit_request_render(struct drm_i915_gem_request *request)
 	return intel_logical_ring_advance(request);
 }
 
+static const int gen8_emit_request_render_sz = 8 + WA_TAIL_DWORDS;
+
 static int gen8_init_rcs_context(struct drm_i915_gem_request *req)
 {
 	int ret;
@@ -1677,6 +1681,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->reset_hw = reset_common_ring;
 	engine->emit_flush = gen8_emit_flush;
 	engine->emit_request = gen8_emit_request;
+	engine->emit_request_sz = gen8_emit_request_sz;
 	engine->submit_request = execlists_submit_request;
 
 	engine->irq_enable = gen8_logical_ring_enable_irq;
@@ -1799,6 +1804,7 @@ int logical_render_ring_init(struct intel_engine_cs *engine)
 	engine->init_context = gen8_init_rcs_context;
 	engine->emit_flush = gen8_emit_flush_render;
 	engine->emit_request = gen8_emit_request_render;
+	engine->emit_request_sz = gen8_emit_request_render_sz;
 
 	ret = intel_engine_create_scratch(engine, 4096);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 597e35c9b699..c8c9ad40fd93 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1405,6 +1405,8 @@ static int i9xx_emit_request(struct drm_i915_gem_request *req)
 	return 0;
 }
 
+static const int i9xx_emit_request_sz = 4;
+
 /**
  * gen6_sema_emit_request - Update the semaphore mailbox registers
  *
@@ -1458,6 +1460,8 @@ static int gen8_render_emit_request(struct drm_i915_gem_request *req)
 	return 0;
 }
 
+static const int gen8_render_emit_request_sz = 8;
+
 /**
  * intel_ring_sync - sync the waiter to the signaller on seqno
  *
@@ -2677,8 +2681,21 @@ static void intel_ring_default_vfuncs(struct drm_i915_private *dev_priv,
 	engine->reset_hw = reset_ring_common;
 
 	engine->emit_request = i9xx_emit_request;
-	if (i915.semaphores)
+	engine->emit_request_sz = i9xx_emit_request_sz;
+	if (i915.semaphores) {
+		int num_rings;
+
 		engine->emit_request = gen6_sema_emit_request;
+
+		num_rings = hweight32(INTEL_INFO(dev_priv)->ring_mask) - 1;
+		if (INTEL_GEN(dev_priv) >= 8) {
+			engine->emit_request_sz += num_rings * 6;
+		} else {
+			engine->emit_request_sz += num_rings * 3;
+			if (num_rings & 1)
+				engine->emit_request_sz++;
+		}
+	}
 	engine->submit_request = i9xx_submit_request;
 
 	if (INTEL_GEN(dev_priv) >= 8)
@@ -2706,9 +2723,17 @@ int intel_init_render_ring_buffer(struct intel_engine_cs *engine)
 	if (INTEL_GEN(dev_priv) >= 8) {
 		engine->init_context = intel_rcs_ctx_init;
 		engine->emit_request = gen8_render_emit_request;
+		engine->emit_request_sz = gen8_render_emit_request_sz;
 		engine->emit_flush = gen8_render_ring_flush;
-		if (i915.semaphores)
+		if (i915.semaphores) {
+			int num_rings;
+
 			engine->semaphore.signal = gen8_rcs_signal;
+
+			num_rings =
+				hweight32(INTEL_INFO(dev_priv)->ring_mask) - 1;
+			engine->emit_request_sz += num_rings * 6;
+		}
 	} else if (INTEL_GEN(dev_priv) >= 6) {
 		engine->init_context = intel_rcs_ctx_init;
 		engine->emit_flush = gen7_render_ring_flush;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index d6f80b3d66b4..47ad19f5441d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -226,6 +226,7 @@ struct intel_engine_cs {
 #define I915_DISPATCH_PINNED BIT(1)
 #define I915_DISPATCH_RS     BIT(2)
 	int		(*emit_request)(struct drm_i915_gem_request *req);
+	int		emit_request_sz;
 
 	/* Pass the request to the hardware queue (e.g. directly into
 	 * the legacy ringbuffer or to the end of an execlist).
-- 
2.9.3

* [PATCH 12/18] drm/i915: Defer request emission
  2016-09-14  6:52 Tracking multiple timelines (full-ppgtt) Chris Wilson
                   ` (10 preceding siblings ...)
  2016-09-14  6:52 ` [PATCH 11/18] drm/i915: Record space required for request emission Chris Wilson
@ 2016-09-14  6:52 ` Chris Wilson
  2016-09-19 12:06   ` Joonas Lahtinen
  2016-09-14  6:52 ` [PATCH 13/18] drm/i915: Move the global sync optimisation to the timeline Chris Wilson
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 51+ messages in thread
From: Chris Wilson @ 2016-09-14  6:52 UTC (permalink / raw)
  To: intel-gfx

Move the actual emission of the request (the closing breadcrumb) from
i915_add_request() to the submit callback. (It can be moved later when
required.) This allows us to defer the allocation of the global_seqno
from request construction to actual submission, and so to emit requests
out of order wrt their construction; they will still only be executed
once all of their dependencies are resolved, including that all earlier
requests on their timeline have been submitted. We have to specialise
how we then emit the request in order to write into the preallocated
space, rather than at the tail of the ringbuffer (which will have been
advanced by the addition of new requests).
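
Condensed from the __i915_add_request() hunk below (error handling
elided), the new contract is that the caller reserves exactly
engine->emit_request_sz dwords and the engine then writes its
breadcrumb through a plain u32 pointer into that preallocated space:

	err = intel_ring_begin(request, engine->emit_request_sz);
	GEM_BUG_ON(err);
	request->postfix = ring->tail;

	/* Fill the reserved space directly; the ring tail may since
	 * have been advanced by the addition of newer requests.
	 */
	engine->emit_request(request, ring->vaddr + request->postfix);
	ring->tail += engine->emit_request_sz * sizeof(u32);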

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_request.c |  26 ++---
 drivers/gpu/drm/i915/intel_lrc.c        | 121 ++++++++---------------
 drivers/gpu/drm/i915/intel_ringbuffer.c | 168 +++++++++++---------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  10 +-
 4 files changed, 112 insertions(+), 213 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 9a735e2d5aea..f2809e364fd7 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -630,9 +630,7 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
 	struct intel_ring *ring = request->ring;
 	struct intel_timeline *timeline = request->timeline;
 	struct drm_i915_gem_request *prev;
-	u32 request_start;
-	u32 reserved_tail;
-	int ret;
+	int err;
 
 	trace_i915_gem_request_add(request);
 
@@ -641,8 +639,6 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
 	 * should already have been reserved in the ring buffer. Let the ring
 	 * know that it is time to use that space up.
 	 */
-	request_start = ring->tail;
-	reserved_tail = request->reserved_space;
 	request->reserved_space = 0;
 
 	/*
@@ -653,10 +649,10 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
 	 * what.
 	 */
 	if (flush_caches) {
-		ret = engine->emit_flush(request, EMIT_FLUSH);
+		err = engine->emit_flush(request, EMIT_FLUSH);
 
 		/* Not allowed to fail! */
-		WARN(ret, "engine->emit_flush() failed: %d!\n", ret);
+		WARN(err, "engine->emit_flush() failed: %d!\n", err);
 	}
 
 	/* Record the position of the start of the breadcrumb so that
@@ -664,20 +660,12 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
 	 * GPU processing the request, we never over-estimate the
 	 * position of the ring's HEAD.
 	 */
+	err = intel_ring_begin(request, engine->emit_request_sz);
+	GEM_BUG_ON(err);
 	request->postfix = ring->tail;
 
-	/* Not allowed to fail! */
-	ret = engine->emit_request(request);
-	WARN(ret, "(%s)->emit_request failed: %d!\n", engine->name, ret);
-
-	/* Sanity check that the reserved size was large enough. */
-	ret = ring->tail - request_start;
-	if (ret < 0)
-		ret += ring->size;
-	WARN_ONCE(ret > reserved_tail,
-		  "Not enough space reserved (%d bytes) "
-		  "for adding the request (%d bytes)\n",
-		  reserved_tail, ret);
+	engine->emit_request(request, request->ring->vaddr + request->postfix);
+	ring->tail += engine->emit_request_sz * sizeof(u32);
 
 	/* Seal the request and mark it as pending execution. Note that
 	 * we may inspect this state, without holding any locks, during
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 7e944a0acc10..ff081ce4a2aa 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -360,7 +360,7 @@ static u64 execlists_update_context(struct drm_i915_gem_request *rq)
 	struct i915_hw_ppgtt *ppgtt = rq->ctx->ppgtt;
 	u32 *reg_state = ce->lrc_reg_state;
 
-	reg_state[CTX_RING_TAIL+1] = intel_ring_offset(rq->ring, rq->tail);
+	reg_state[CTX_RING_TAIL+1] = rq->tail;
 
 	/* True 32b PPGTT with dynamic page allocation: update PDP
 	 * registers and point the unallocated PDPs to scratch page.
@@ -594,6 +594,15 @@ static void execlists_submit_request(struct drm_i915_gem_request *request)
 
 	spin_lock_irqsave(&engine->execlist_lock, flags);
 
+	/* We keep the previous context alive until we retire the following
+	 * request. This ensures that the context object is still pinned
+	 * for any residual writes the HW makes into it on the context switch
+	 * into the next object following the breadcrumb. Otherwise, we may
+	 * retire the context too early.
+	 */
+	request->previous_context = engine->last_context;
+	engine->last_context = request->ctx;
+
 	list_add_tail(&request->execlist_link, &engine->execlist_queue);
 	if (execlists_elsp_idle(engine))
 		tasklet_hi_schedule(&engine->irq_tasklet);
@@ -663,46 +672,6 @@ err_unpin:
 	return ret;
 }
 
-/*
- * intel_logical_ring_advance() - advance the tail and prepare for submission
- * @request: Request to advance the logical ringbuffer of.
- *
- * The tail is updated in our logical ringbuffer struct, not in the actual context. What
- * really happens during submission is that the context and current tail will be placed
- * on a queue waiting for the ELSP to be ready to accept a new context submission. At that
- * point, the tail *inside* the context is updated and the ELSP written to.
- */
-static int
-intel_logical_ring_advance(struct drm_i915_gem_request *request)
-{
-	struct intel_ring *ring = request->ring;
-	struct intel_engine_cs *engine = request->engine;
-
-	intel_ring_advance(ring);
-	request->tail = ring->tail;
-
-	/*
-	 * Here we add two extra NOOPs as padding to avoid
-	 * lite restore of a context with HEAD==TAIL.
-	 *
-	 * Caller must reserve WA_TAIL_DWORDS for us!
-	 */
-	intel_ring_emit(ring, MI_NOOP);
-	intel_ring_emit(ring, MI_NOOP);
-	intel_ring_advance(ring);
-	request->wa_tail = ring->tail;
-
-	/* We keep the previous context alive until we retire the following
-	 * request. This ensures that any the context object is still pinned
-	 * for any residual writes the HW makes into it on the context switch
-	 * into the next object following the breadcrumb. Otherwise, we may
-	 * retire the context too early.
-	 */
-	request->previous_context = engine->last_context;
-	engine->last_context = request->ctx;
-	return 0;
-}
-
 static int intel_lr_context_pin(struct i915_gem_context *ctx,
 				struct intel_engine_cs *engine)
 {
@@ -1548,41 +1517,36 @@ static void bxt_a_seqno_barrier(struct intel_engine_cs *engine)
  * restore with HEAD==TAIL (WaIdleLiteRestore).
  */
 #define WA_TAIL_DWORDS 2
-
-static int gen8_emit_request(struct drm_i915_gem_request *request)
+static void gen8_emit_wa_tail(struct drm_i915_gem_request *request, u32 *out)
 {
-	struct intel_ring *ring = request->ring;
-	int ret;
-
-	ret = intel_ring_begin(request, 6 + WA_TAIL_DWORDS);
-	if (ret)
-		return ret;
+	out[0] = MI_NOOP;
+	out[1] = MI_NOOP;
+	request->wa_tail = intel_ring_offset(request->ring,
+					     out + WA_TAIL_DWORDS);
+}
 
+static void gen8_emit_request(struct drm_i915_gem_request *request,
+			      u32 *out)
+{
 	/* w/a: bit 5 needs to be zero for MI_FLUSH_DW address. */
 	BUILD_BUG_ON(I915_GEM_HWS_INDEX_ADDR & (1 << 5));
 
-	intel_ring_emit(ring, (MI_FLUSH_DW + 1) | MI_FLUSH_DW_OP_STOREDW);
-	intel_ring_emit(ring,
-			intel_hws_seqno_address(request->engine) |
-			MI_FLUSH_DW_USE_GTT);
-	intel_ring_emit(ring, 0);
-	intel_ring_emit(ring, request->global_seqno);
-	intel_ring_emit(ring, MI_USER_INTERRUPT);
-	intel_ring_emit(ring, MI_NOOP);
-	return intel_logical_ring_advance(request);
+	out[0] = (MI_FLUSH_DW + 1) | MI_FLUSH_DW_OP_STOREDW;
+	out[1] = intel_hws_seqno_address(request->engine) | MI_FLUSH_DW_USE_GTT;
+	out[2] = 0;
+	out[3] = request->global_seqno;
+	out[4] = MI_USER_INTERRUPT;
+	out[5] = MI_NOOP;
+	request->tail = intel_ring_offset(request->ring, out + 6);
+
+	gen8_emit_wa_tail(request, out + 6);
 }
 
 static const int gen8_emit_request_sz = 6 + WA_TAIL_DWORDS;
 
-static int gen8_emit_request_render(struct drm_i915_gem_request *request)
+static void gen8_emit_request_render(struct drm_i915_gem_request *request,
+				     u32 *out)
 {
-	struct intel_ring *ring = request->ring;
-	int ret;
-
-	ret = intel_ring_begin(request, 8 + WA_TAIL_DWORDS);
-	if (ret)
-		return ret;
-
 	/* We're using qword write, seqno should be aligned to 8 bytes. */
 	BUILD_BUG_ON(I915_GEM_HWS_INDEX & 1);
 
@@ -1590,19 +1554,20 @@ static int gen8_emit_request_render(struct drm_i915_gem_request *request)
 	 * need a prior CS_STALL, which is emitted by the flush
 	 * following the batch.
 	 */
-	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(6));
-	intel_ring_emit(ring,
-			(PIPE_CONTROL_GLOBAL_GTT_IVB |
-			 PIPE_CONTROL_CS_STALL |
-			 PIPE_CONTROL_QW_WRITE));
-	intel_ring_emit(ring, intel_hws_seqno_address(request->engine));
-	intel_ring_emit(ring, 0);
-	intel_ring_emit(ring, request->global_seqno);
+	out[0] = GFX_OP_PIPE_CONTROL(6);
+	out[1] = (PIPE_CONTROL_GLOBAL_GTT_IVB |
+		  PIPE_CONTROL_CS_STALL |
+		  PIPE_CONTROL_QW_WRITE);
+	out[2] = intel_hws_seqno_address(request->engine);
+	out[3] = 0;
+	out[4] = request->global_seqno;
 	/* We're thrashing one dword of HWS. */
-	intel_ring_emit(ring, 0);
-	intel_ring_emit(ring, MI_USER_INTERRUPT);
-	intel_ring_emit(ring, MI_NOOP);
-	return intel_logical_ring_advance(request);
+	out[5] = 0;
+	out[6] = MI_USER_INTERRUPT;
+	out[7] = MI_NOOP;
+	request->tail = intel_ring_offset(request->ring, out + 8);
+
+	gen8_emit_wa_tail(request, out + 8);
 }
 
 static const int gen8_emit_request_render_sz = 8 + WA_TAIL_DWORDS;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index c8c9ad40fd93..83d2a3f72ce1 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1271,89 +1271,61 @@ static void render_ring_cleanup(struct intel_engine_cs *engine)
 	i915_vma_unpin_and_release(&dev_priv->semaphore);
 }
 
-static int gen8_rcs_signal(struct drm_i915_gem_request *req)
+static u32 *gen8_rcs_signal(struct drm_i915_gem_request *req, u32 *out)
 {
-	struct intel_ring *ring = req->ring;
 	struct drm_i915_private *dev_priv = req->i915;
 	struct intel_engine_cs *waiter;
 	enum intel_engine_id id;
-	int ret, num_rings;
-
-	num_rings = INTEL_INFO(dev_priv)->num_rings;
-	ret = intel_ring_begin(req, (num_rings-1) * 8);
-	if (ret)
-		return ret;
 
 	for_each_engine_id(waiter, dev_priv, id) {
 		u64 gtt_offset = req->engine->semaphore.signal_ggtt[id];
 		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
 			continue;
 
-		intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(6));
-		intel_ring_emit(ring,
-				PIPE_CONTROL_GLOBAL_GTT_IVB |
-				PIPE_CONTROL_QW_WRITE |
-				PIPE_CONTROL_CS_STALL);
-		intel_ring_emit(ring, lower_32_bits(gtt_offset));
-		intel_ring_emit(ring, upper_32_bits(gtt_offset));
-		intel_ring_emit(ring, req->global_seqno);
-		intel_ring_emit(ring, 0);
-		intel_ring_emit(ring,
-				MI_SEMAPHORE_SIGNAL |
-				MI_SEMAPHORE_TARGET(waiter->hw_id));
-		intel_ring_emit(ring, 0);
+		*out++ = GFX_OP_PIPE_CONTROL(6);
+		*out++ = (PIPE_CONTROL_GLOBAL_GTT_IVB |
+			  PIPE_CONTROL_QW_WRITE |
+			  PIPE_CONTROL_CS_STALL);
+		*out++ = lower_32_bits(gtt_offset);
+		*out++ = upper_32_bits(gtt_offset);
+		*out++ = req->global_seqno;
+		*out++ = 0;
+		*out++ = (MI_SEMAPHORE_SIGNAL |
+			  MI_SEMAPHORE_TARGET(waiter->hw_id));
+		*out++ = 0;
 	}
-	intel_ring_advance(ring);
 
-	return 0;
+	return out;
 }
 
-static int gen8_xcs_signal(struct drm_i915_gem_request *req)
+static u32 *gen8_xcs_signal(struct drm_i915_gem_request *req, u32 *out)
 {
-	struct intel_ring *ring = req->ring;
 	struct drm_i915_private *dev_priv = req->i915;
 	struct intel_engine_cs *waiter;
 	enum intel_engine_id id;
-	int ret, num_rings;
-
-	num_rings = INTEL_INFO(dev_priv)->num_rings;
-	ret = intel_ring_begin(req, (num_rings-1) * 6);
-	if (ret)
-		return ret;
 
 	for_each_engine_id(waiter, dev_priv, id) {
 		u64 gtt_offset = req->engine->semaphore.signal_ggtt[id];
 		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
 			continue;
 
-		intel_ring_emit(ring,
-				(MI_FLUSH_DW + 1) | MI_FLUSH_DW_OP_STOREDW);
-		intel_ring_emit(ring,
-				lower_32_bits(gtt_offset) |
-				MI_FLUSH_DW_USE_GTT);
-		intel_ring_emit(ring, upper_32_bits(gtt_offset));
-		intel_ring_emit(ring, req->global_seqno);
-		intel_ring_emit(ring,
-				MI_SEMAPHORE_SIGNAL |
-				MI_SEMAPHORE_TARGET(waiter->hw_id));
-		intel_ring_emit(ring, 0);
+		*out++ = (MI_FLUSH_DW + 1) | MI_FLUSH_DW_OP_STOREDW;
+		*out++ = lower_32_bits(gtt_offset) | MI_FLUSH_DW_USE_GTT;
+		*out++ = upper_32_bits(gtt_offset);
+		*out++ = req->global_seqno;
+		*out++ = (MI_SEMAPHORE_SIGNAL |
+			  MI_SEMAPHORE_TARGET(waiter->hw_id));
+		*out++ = 0;
 	}
-	intel_ring_advance(ring);
 
-	return 0;
+	return out;
 }
 
-static int gen6_signal(struct drm_i915_gem_request *req)
+static u32 *gen6_signal(struct drm_i915_gem_request *req, u32 *out)
 {
-	struct intel_ring *ring = req->ring;
 	struct drm_i915_private *dev_priv = req->i915;
 	struct intel_engine_cs *engine;
-	int ret, num_rings;
-
-	num_rings = INTEL_INFO(dev_priv)->num_rings;
-	ret = intel_ring_begin(req, round_up((num_rings-1) * 3, 2));
-	if (ret)
-		return ret;
+	int num_rings = 0;
 
 	for_each_engine(engine, dev_priv) {
 		i915_reg_t mbox_reg;
@@ -1363,46 +1335,34 @@ static int gen6_signal(struct drm_i915_gem_request *req)
 
 		mbox_reg = req->engine->semaphore.mbox.signal[engine->hw_id];
 		if (i915_mmio_reg_valid(mbox_reg)) {
-			intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
-			intel_ring_emit_reg(ring, mbox_reg);
-			intel_ring_emit(ring, req->global_seqno);
+			*out++ = MI_LOAD_REGISTER_IMM(1);
+			*out++ = i915_mmio_reg_offset(mbox_reg);
+			*out++ = req->global_seqno;
+			num_rings++;
 		}
 	}
+	if (num_rings & 1)
+		*out++ = MI_NOOP;
 
-	/* If num_dwords was rounded, make sure the tail pointer is correct */
-	if (num_rings % 2 == 0)
-		intel_ring_emit(ring, MI_NOOP);
-	intel_ring_advance(ring);
-
-	return 0;
+	return out;
 }
 
 static void i9xx_submit_request(struct drm_i915_gem_request *request)
 {
 	struct drm_i915_private *dev_priv = request->i915;
 
-	I915_WRITE_TAIL(request->engine,
-			intel_ring_offset(request->ring, request->tail));
+	I915_WRITE_TAIL(request->engine, request->tail);
 }
 
-static int i9xx_emit_request(struct drm_i915_gem_request *req)
+static void i9xx_emit_request(struct drm_i915_gem_request *req,
+			      u32 *out)
 {
-	struct intel_ring *ring = req->ring;
-	int ret;
-
-	ret = intel_ring_begin(req, 4);
-	if (ret)
-		return ret;
-
-	intel_ring_emit(ring, MI_STORE_DWORD_INDEX);
-	intel_ring_emit(ring, I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
-	intel_ring_emit(ring, req->global_seqno);
-	intel_ring_emit(ring, MI_USER_INTERRUPT);
-	intel_ring_advance(ring);
-
-	req->tail = ring->tail;
+	out[0] = MI_STORE_DWORD_INDEX;
+	out[1] = I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT;
+	out[2] = req->global_seqno;
+	out[3] = MI_USER_INTERRUPT;
 
-	return 0;
+	req->tail = intel_ring_offset(req->ring, out + 4);
 }
 
 static const int i9xx_emit_request_sz = 4;
@@ -1415,49 +1375,33 @@ static const int i9xx_emit_request_sz = 4;
  * Update the mailbox registers in the *other* rings with the current seqno.
  * This acts like a signal in the canonical semaphore.
  */
-static int gen6_sema_emit_request(struct drm_i915_gem_request *req)
+static void gen6_sema_emit_request(struct drm_i915_gem_request *req,
+				   u32 *out)
 {
-	int ret;
-
-	ret = req->engine->semaphore.signal(req);
-	if (ret)
-		return ret;
-
-	return i9xx_emit_request(req);
+	return i9xx_emit_request(req, req->engine->semaphore.signal(req, out));
 }
 
-static int gen8_render_emit_request(struct drm_i915_gem_request *req)
+static void gen8_render_emit_request(struct drm_i915_gem_request *req,
+				     u32 *out)
 {
 	struct intel_engine_cs *engine = req->engine;
-	struct intel_ring *ring = req->ring;
-	int ret;
 
-	if (engine->semaphore.signal) {
-		ret = engine->semaphore.signal(req);
-		if (ret)
-			return ret;
-	}
-
-	ret = intel_ring_begin(req, 8);
-	if (ret)
-		return ret;
+	if (engine->semaphore.signal)
+		out = engine->semaphore.signal(req, out);
 
-	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(6));
-	intel_ring_emit(ring, (PIPE_CONTROL_GLOBAL_GTT_IVB |
+	out[0] = GFX_OP_PIPE_CONTROL(6);
+	out[1] = (PIPE_CONTROL_GLOBAL_GTT_IVB |
 			       PIPE_CONTROL_CS_STALL |
-			       PIPE_CONTROL_QW_WRITE));
-	intel_ring_emit(ring, intel_hws_seqno_address(engine));
-	intel_ring_emit(ring, 0);
-	intel_ring_emit(ring, req->global_seqno);
+			       PIPE_CONTROL_QW_WRITE);
+	out[2] = intel_hws_seqno_address(engine);
+	out[3] = 0;
+	out[4] = req->global_seqno;
 	/* We're thrashing one dword of HWS. */
-	intel_ring_emit(ring, 0);
-	intel_ring_emit(ring, MI_USER_INTERRUPT);
-	intel_ring_emit(ring, MI_NOOP);
-	intel_ring_advance(ring);
-
-	req->tail = ring->tail;
+	out[5] = 0;
+	out[6] = MI_USER_INTERRUPT;
+	out[7] = MI_NOOP;
 
-	return 0;
+	req->tail = intel_ring_offset(req->ring, out + 8);
 }
 
 static const int gen8_render_emit_request_sz = 8;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 47ad19f5441d..a2f9c7d60381 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -225,7 +225,8 @@ struct intel_engine_cs {
 #define I915_DISPATCH_SECURE BIT(0)
 #define I915_DISPATCH_PINNED BIT(1)
 #define I915_DISPATCH_RS     BIT(2)
-	int		(*emit_request)(struct drm_i915_gem_request *req);
+	void		(*emit_request)(struct drm_i915_gem_request *req,
+					u32 *out);
 	int		emit_request_sz;
 
 	/* Pass the request to the hardware queue (e.g. directly into
@@ -301,7 +302,7 @@ struct intel_engine_cs {
 		/* AKA wait() */
 		int	(*sync_to)(struct drm_i915_gem_request *req,
 				   struct drm_i915_gem_request *signal);
-		int	(*signal)(struct drm_i915_gem_request *req);
+		u32	*(*signal)(struct drm_i915_gem_request *req, u32 *out);
 	} semaphore;
 
 	/* Execlists */
@@ -463,10 +464,11 @@ static inline void intel_ring_advance(struct intel_ring *ring)
 	 */
 }
 
-static inline u32 intel_ring_offset(struct intel_ring *ring, u32 value)
+static inline u32 intel_ring_offset(struct intel_ring *ring, void *addr)
 {
 	/* Don't write ring->size (equivalent to 0) as that hangs some GPUs. */
-	return value & (ring->size - 1);
+	u32 offset = addr - ring->vaddr;
+	return offset & (ring->size - 1);
 }
 
 int __intel_ring_space(int head, int tail, int size);
-- 
2.9.3

* [PATCH 13/18] drm/i915: Move the global sync optimisation to the timeline
  2016-09-14  6:52 Tracking multiple timelines (full-ppgtt) Chris Wilson
                   ` (11 preceding siblings ...)
  2016-09-14  6:52 ` [PATCH 12/18] drm/i915: Defer " Chris Wilson
@ 2016-09-14  6:52 ` Chris Wilson
  2016-09-19 13:16   ` Joonas Lahtinen
  2016-09-14  6:52 ` [PATCH 14/18] drm/i915: Create a unique name for the context Chris Wilson
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 51+ messages in thread
From: Chris Wilson @ 2016-09-14  6:52 UTC (permalink / raw)
  To: intel-gfx

Currently we try to reduce the number of synchronisations (now the
number of requests we need to wait upon) by noting that if we have
earlier waited upon a request, all subsequent requests in the timeline
will be after the wait. This only applies to requests in this timeline,
as other timelines will not be ordered by that waiter. Hence, move the
bookkeeping of the last synchronised seqno from the engine (where it
was shared by all timelines) to the individual timeline.
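
In sketch form (names as in the i915_gem_request_await_request() hunk
below), the filter and its bookkeeping are now keyed by the waiting
request's timeline rather than by the engine:

	/* Skip the wait if this timeline has already synchronised with
	 * an equal or later request from that engine.
	 */
	if (from->global_seqno <= to->timeline->sync_seqno[from->engine->id])
		return 0;

	/* ... emit the semaphore or software wait ... */

	to->timeline->sync_seqno[from->engine->id] = from->global_seqno;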

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c      |  9 ---------
 drivers/gpu/drm/i915/i915_drv.h          |  1 -
 drivers/gpu/drm/i915/i915_gem_request.c  | 16 +++++++++++-----
 drivers/gpu/drm/i915/i915_gem_timeline.h |  1 +
 drivers/gpu/drm/i915/i915_gpu_error.c    | 23 +++++++----------------
 drivers/gpu/drm/i915/intel_engine_cs.c   |  2 --
 drivers/gpu/drm/i915/intel_ringbuffer.c  |  3 ---
 drivers/gpu/drm/i915/intel_ringbuffer.h  |  2 --
 8 files changed, 19 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index b325ec3b5a09..f7c5e09122ab 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3155,15 +3155,6 @@ static int i915_semaphore_status(struct seq_file *m, void *unused)
 		seq_putc(m, '\n');
 	}
 
-	seq_puts(m, "\nSync seqno:\n");
-	for_each_engine(engine, dev_priv) {
-		for (j = 0; j < num_rings; j++)
-			seq_printf(m, "  0x%08x ",
-				   engine->semaphore.sync_seqno[j]);
-		seq_putc(m, '\n');
-	}
-	seq_putc(m, '\n');
-
 	intel_runtime_pm_put(dev_priv);
 	mutex_unlock(&dev->struct_mutex);
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index bbb8a5e38ecf..f60d2e17033d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -781,7 +781,6 @@ struct drm_i915_error_state {
 		u32 cpu_ring_tail;
 
 		u32 last_seqno;
-		u32 semaphore_seqno[I915_NUM_ENGINES - 1];
 
 		/* Register state */
 		u32 start;
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index f2809e364fd7..74e1a875988c 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -239,7 +239,8 @@ static int i915_gem_init_global_seqno(struct drm_i915_private *dev_priv,
 				      u32 seqno)
 {
 	struct intel_engine_cs *engine;
-	int ret;
+	struct i915_gem_timeline *tl;
+	int ret, i;
 
 	/* Carefully retire all requests without writing to the rings */
 	ret = i915_gem_wait_for_idle(dev_priv,
@@ -262,6 +263,12 @@ static int i915_gem_init_global_seqno(struct drm_i915_private *dev_priv,
 	for_each_engine(engine, dev_priv)
 		intel_engine_init_global_seqno(engine, seqno);
 
+	list_for_each_entry(tl, &dev_priv->gt.timelines, link) {
+		for (i = 0; i < ARRAY_SIZE(tl->engine); i++)
+			memset(tl->engine[i].sync_seqno, 0,
+			       sizeof(tl->engine[i].sync_seqno));
+	}
+
 	return 0;
 }
 
@@ -454,7 +461,7 @@ static int
 i915_gem_request_await_request(struct drm_i915_gem_request *to,
 			       struct drm_i915_gem_request *from)
 {
-	int idx, ret;
+	int ret;
 
 	GEM_BUG_ON(to == from);
 
@@ -474,8 +481,7 @@ i915_gem_request_await_request(struct drm_i915_gem_request *to,
 		return ret < 0 ? ret : 0;
 	}
 
-	idx = intel_engine_sync_index(from->engine, to->engine);
-	if (from->global_seqno <= from->engine->semaphore.sync_seqno[idx])
+	if (from->global_seqno <= to->timeline->sync_seqno[from->engine->id])
 		return 0;
 
 	trace_i915_gem_ring_sync_to(to, from);
@@ -493,7 +499,7 @@ i915_gem_request_await_request(struct drm_i915_gem_request *to,
 			return ret;
 	}
 
-	from->engine->semaphore.sync_seqno[idx] = from->global_seqno;
+	to->timeline->sync_seqno[from->engine->id] = from->global_seqno;
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_timeline.h b/drivers/gpu/drm/i915/i915_gem_timeline.h
index c3cbaf28e959..96c765ba3abc 100644
--- a/drivers/gpu/drm/i915/i915_gem_timeline.h
+++ b/drivers/gpu/drm/i915/i915_gem_timeline.h
@@ -47,6 +47,7 @@ struct intel_timeline {
 	 * struct_mutex.
 	 */
 	struct i915_gem_active last_request;
+	u32 sync_seqno[I915_NUM_ENGINES];
 
 	struct i915_gem_timeline *common;
 };
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 65761c16ac48..fc62bd0ce17d 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -263,16 +263,13 @@ static void error_print_engine(struct drm_i915_error_state_buf *m,
 	if (INTEL_GEN(m->i915) >= 6) {
 		err_printf(m, "  RC PSMI: 0x%08x\n", ee->rc_psmi);
 		err_printf(m, "  FAULT_REG: 0x%08x\n", ee->fault_reg);
-		err_printf(m, "  SYNC_0: 0x%08x [last synced 0x%08x]\n",
-			   ee->semaphore_mboxes[0],
-			   ee->semaphore_seqno[0]);
-		err_printf(m, "  SYNC_1: 0x%08x [last synced 0x%08x]\n",
-			   ee->semaphore_mboxes[1],
-			   ee->semaphore_seqno[1]);
+		err_printf(m, "  SYNC_0: 0x%08x\n",
+			   ee->semaphore_mboxes[0]);
+		err_printf(m, "  SYNC_1: 0x%08x\n",
+			   ee->semaphore_mboxes[1]);
 		if (HAS_VEBOX(m->i915)) {
-			err_printf(m, "  SYNC_2: 0x%08x [last synced 0x%08x]\n",
-				   ee->semaphore_mboxes[2],
-				   ee->semaphore_seqno[2]);
+			err_printf(m, "  SYNC_2: 0x%08x\n",
+				   ee->semaphore_mboxes[2]);
 		}
 	}
 	if (USES_PPGTT(m->i915)) {
@@ -905,7 +902,6 @@ static void gen8_record_semaphore_state(struct drm_i915_error_state *error,
 		idx = intel_engine_sync_index(engine, to);
 
 		ee->semaphore_mboxes[idx] = tmp[signal_offset];
-		ee->semaphore_seqno[idx] = engine->semaphore.sync_seqno[idx];
 	}
 }
 
@@ -916,14 +912,9 @@ static void gen6_record_semaphore_state(struct intel_engine_cs *engine,
 
 	ee->semaphore_mboxes[0] = I915_READ(RING_SYNC_0(engine->mmio_base));
 	ee->semaphore_mboxes[1] = I915_READ(RING_SYNC_1(engine->mmio_base));
-	ee->semaphore_seqno[0] = engine->semaphore.sync_seqno[0];
-	ee->semaphore_seqno[1] = engine->semaphore.sync_seqno[1];
-
-	if (HAS_VEBOX(dev_priv)) {
+	if (HAS_VEBOX(dev_priv))
 		ee->semaphore_mboxes[2] =
 			I915_READ(RING_SYNC_2(engine->mmio_base));
-		ee->semaphore_seqno[2] = engine->semaphore.sync_seqno[2];
-	}
 }
 
 static void error_record_engine_waiters(struct intel_engine_cs *engine,
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 87feaeff4304..ac8ada613211 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -191,8 +191,6 @@ void intel_engine_init_global_seqno(struct intel_engine_cs *engine, u32 seqno)
 				       I915_NUM_ENGINES * gen8_semaphore_seqno_size);
 		kunmap(page);
 	}
-	memset(engine->semaphore.sync_seqno, 0,
-	       sizeof(engine->semaphore.sync_seqno));
 
 	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
 	if (engine->irq_seqno_barrier)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 83d2a3f72ce1..9ce7b5bbe670 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2041,9 +2041,6 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
 
 	intel_engine_setup_common(engine);
 
-	memset(engine->semaphore.sync_seqno, 0,
-	       sizeof(engine->semaphore.sync_seqno));
-
 	ret = intel_engine_init_common(engine);
 	if (ret)
 		goto error;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index a2f9c7d60381..7114ef149343 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -284,8 +284,6 @@ struct intel_engine_cs {
 	 *  ie. transpose of f(x, y)
 	 */
 	struct {
-		u32	sync_seqno[I915_NUM_ENGINES-1];
-
 		union {
 #define GEN6_SEMAPHORE_LAST	VECS_HW
 #define GEN6_NUM_SEMAPHORES	(GEN6_SEMAPHORE_LAST + 1)
-- 
2.9.3

* [PATCH 14/18] drm/i915: Create a unique name for the context
  2016-09-14  6:52 Tracking multiple timelines (full-ppgtt) Chris Wilson
                   ` (12 preceding siblings ...)
  2016-09-14  6:52 ` [PATCH 13/18] drm/i915: Move the global sync optimisation to the timeline Chris Wilson
@ 2016-09-14  6:52 ` Chris Wilson
  2016-09-19 13:23   ` Joonas Lahtinen
  2016-09-14  6:52 ` [PATCH 15/18] drm/i915: Reserve space in the global seqno during request allocation Chris Wilson
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 51+ messages in thread
From: Chris Wilson @ 2016-09-14  6:52 UTC (permalink / raw)
  To: intel-gfx

This will be used for communicating issues with this context to
userspace, so we want to identify the parent process and the individual
context. Note that the name isn't quite unique: it presumes that there
is only a single device fd per process.
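
For example (a condensed view of the __create_hw_context() hunk below),
a context created by Xorg would be named along the lines of
"Xorg[1234]/1" (the pid and handle here are hypothetical):

	ctx->name = kasprintf(GFP_KERNEL, "%s[%d]/%x",
			      current->comm,	 /* e.g. "Xorg" */
			      pid_nr(ctx->pid),	 /* e.g. 1234 */
			      ctx->user_handle); /* per-fd handle, in hex */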

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h         |  1 +
 drivers/gpu/drm/i915/i915_gem_context.c | 17 +++++++++++++++--
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f60d2e17033d..cc5fa593e4c4 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -914,6 +914,7 @@ struct i915_gem_context {
 	struct drm_i915_file_private *file_priv;
 	struct i915_hw_ppgtt *ppgtt;
 	struct pid *pid;
+	const char *name;
 
 	struct i915_ctx_hang_stats hang_stats;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index df10f4e95736..edcfebd3c6ef 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -158,6 +158,7 @@ void i915_gem_context_free(struct kref *ctx_ref)
 		i915_vma_put(ce->state);
 	}
 
+	kfree(ctx->name);
 	put_pid(ctx->pid);
 	list_del(&ctx->link);
 
@@ -310,12 +311,21 @@ __create_hw_context(struct drm_device *dev,
 			goto err_out;
 	} else
 		ret = DEFAULT_CONTEXT_HANDLE;
+	ctx->user_handle = ret;
 
 	ctx->file_priv = file_priv;
-	if (file_priv)
+	if (file_priv) {
 		ctx->pid = get_task_pid(current, PIDTYPE_PID);
+		ctx->name = kasprintf(GFP_KERNEL, "%s[%d]/%x",
+				      current->comm,
+				      pid_nr(ctx->pid),
+				      ctx->user_handle);
+		if (!ctx->name) {
+			ret = -ENOMEM;
+			goto err_pid;
+		}
+	}
 
-	ctx->user_handle = ret;
 	/* NB: Mark all slices as needing a remap so that when the context first
 	 * loads it will restore whatever remap state already exists. If there
 	 * is no remap info, it will be a NOP. */
@@ -329,6 +339,9 @@ __create_hw_context(struct drm_device *dev,
 
 	return ctx;
 
+err_pid:
+	put_pid(ctx->pid);
+	idr_remove(&file_priv->context_idr, ctx->user_handle);
 err_out:
 	context_close(ctx);
 	return ERR_PTR(ret);
-- 
2.9.3

* [PATCH 15/18] drm/i915: Reserve space in the global seqno during request allocation
  2016-09-14  6:52 Tracking multiple timelines (full-ppgtt) Chris Wilson
                   ` (13 preceding siblings ...)
  2016-09-14  6:52 ` [PATCH 14/18] drm/i915: Create a unique name for the context Chris Wilson
@ 2016-09-14  6:52 ` Chris Wilson
  2016-09-19 13:47   ` Joonas Lahtinen
  2016-09-14  6:52 ` [PATCH 16/18] drm/i915: Enable multiple timelines Chris Wilson
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 51+ messages in thread
From: Chris Wilson @ 2016-09-14  6:52 UTC (permalink / raw)
  To: intel-gfx

A restriction on our global seqnos is that they cannot wrap and that we
cannot use the value 0. This allows us to detect when a request has not
yet been submitted (its global seqno is still 0), and ensures that
hardware semaphores are monotonic, as required by older hardware. To
meet these restrictions now that we defer the assignment of the global
seqno, we must check during request construction that we have an
available slot in the global seqno space. If that check fails, we wait
for all requests to complete and reset the hardware back to 0.
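
The availability check relies on unsigned wraparound: one seqno is
reserved per active request, and the test (sketched in isolation below,
assuming at least one reservation as in reserve_global_seqno()) is true
exactly when fewer seqnos remain before the wrap than we have
outstanding reservations:

	/* Sketch in isolation: with active >= 1, unsigned overflow makes
	 * next + active <= next precisely when the remaining seqno space
	 * cannot accommodate all active requests without reusing 0.
	 */
	static bool seqno_space_exhausted(u32 next, u32 active)
	{
		return next + active <= next;
	}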

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c      | 10 ++---
 drivers/gpu/drm/i915/i915_drv.h          |  2 +-
 drivers/gpu/drm/i915/i915_gem.c          |  6 +--
 drivers/gpu/drm/i915/i915_gem_request.c  | 74 +++++++++++++++-----------------
 drivers/gpu/drm/i915/i915_gem_timeline.h |  2 +-
 5 files changed, 44 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index f7c5e09122ab..d23047bb2fe9 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -557,7 +557,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 				seq_printf(m, "Flip queued on %s at seqno %x, next seqno %x [current breadcrumb %x], completed? %d\n",
 					   engine->name,
 					   i915_gem_request_get_seqno(work->flip_queued_req),
-					   dev_priv->gt.global_timeline.next_seqno,
+					   atomic_read(&dev_priv->gt.global_timeline.next_seqno),
 					   intel_engine_get_seqno(engine),
 					   i915_gem_request_completed(work->flip_queued_req));
 			} else
@@ -1029,7 +1029,7 @@ i915_next_seqno_get(void *data, u64 *val)
 {
 	struct drm_i915_private *dev_priv = data;
 
-	*val = dev_priv->gt.global_timeline.next_seqno;
+	*val = atomic_read(&dev_priv->gt.global_timeline.next_seqno);
 	return 0;
 }
 
@@ -2304,8 +2304,8 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
 	struct drm_file *file;
 
 	seq_printf(m, "RPS enabled? %d\n", dev_priv->rps.enabled);
-	seq_printf(m, "GPU busy? %s [%x]\n",
-		   yesno(dev_priv->gt.awake), dev_priv->gt.active_engines);
+	seq_printf(m, "GPU busy? %s [%d requests]\n",
+		   yesno(dev_priv->gt.awake), dev_priv->gt.active_requests);
 	seq_printf(m, "CPU waiting? %d\n", count_irq_waiters(dev_priv));
 	seq_printf(m, "Frequency requested %d\n",
 		   intel_gpu_freq(dev_priv, dev_priv->rps.cur_freq));
@@ -2340,7 +2340,7 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
 
 	if (INTEL_GEN(dev_priv) >= 6 &&
 	    dev_priv->rps.enabled &&
-	    dev_priv->gt.active_engines) {
+	    dev_priv->gt.active_requests) {
 		u32 rpup, rpupei;
 		u32 rpdown, rpdownei;
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index cc5fa593e4c4..45fd20af0c6d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2048,6 +2048,7 @@ struct drm_i915_private {
 
 		struct list_head timelines;
 		struct i915_gem_timeline global_timeline;
+		u32 active_requests;
 
 		/**
 		 * Is the GPU currently considered idle, or busy executing
@@ -2056,7 +2057,6 @@ struct drm_i915_private {
 		 * In order to reduce the effect on performance, there
 		 * is a slight delay before we do so.
 		 */
-		unsigned int active_engines;
 		bool awake;
 
 		/**
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 76d6fc146adf..cd9d3a2f72e7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2714,8 +2714,6 @@ static void i915_gem_cleanup_engine(struct intel_engine_cs *engine)
 		memset(engine->execlist_port, 0, sizeof(engine->execlist_port));
 		spin_unlock(&engine->execlist_lock);
 	}
-
-	engine->i915->gt.active_engines &= ~intel_engine_flag(engine);
 }
 
 void i915_gem_set_wedged(struct drm_i915_private *dev_priv)
@@ -2770,7 +2768,7 @@ i915_gem_idle_work_handler(struct work_struct *work)
 	if (!READ_ONCE(dev_priv->gt.awake))
 		return;
 
-	if (READ_ONCE(dev_priv->gt.active_engines))
+	if (READ_ONCE(dev_priv->gt.active_requests))
 		return;
 
 	rearm_hangcheck =
@@ -2784,7 +2782,7 @@ i915_gem_idle_work_handler(struct work_struct *work)
 		goto out_rearm;
 	}
 
-	if (dev_priv->gt.active_engines)
+	if (dev_priv->gt.active_requests)
 		goto out_unlock;
 
 	for_each_engine(engine, dev_priv)
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 74e1a875988c..bd84bb757e39 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -156,6 +156,7 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 	 */
 	list_del(&request->ring_link);
 	request->ring->last_retired_head = request->postfix;
+	request->i915->gt.active_requests--;
 
 	/* Walk through the active list, calling retire on each. This allows
 	 * objects to track their GPU activity and mark themselves as idle
@@ -253,7 +254,7 @@ static int i915_gem_init_global_seqno(struct drm_i915_private *dev_priv,
 
 	/* If the seqno wraps around, we need to clear the breadcrumb rbtree */
 	if (!i915_seqno_passed(seqno,
-			       dev_priv->gt.global_timeline.next_seqno)) {
+			       atomic_read(&dev_priv->gt.global_timeline.next_seqno))) {
 		while (intel_kick_waiters(dev_priv) ||
 		       intel_kick_signalers(dev_priv))
 			yield();
@@ -269,13 +270,13 @@ static int i915_gem_init_global_seqno(struct drm_i915_private *dev_priv,
 			       sizeof(tl->engine[i].sync_seqno));
 	}
 
+	atomic_set(&dev_priv->gt.global_timeline.next_seqno, seqno);
 	return 0;
 }
 
 int i915_gem_set_global_seqno(struct drm_device *dev, u32 seqno)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
-	int ret;
 
 	if (seqno == 0)
 		return -EINVAL;
@@ -283,34 +284,32 @@ int i915_gem_set_global_seqno(struct drm_device *dev, u32 seqno)
 	/* HWS page needs to be set less than what we
 	 * will inject to ring
 	 */
-	ret = i915_gem_init_global_seqno(dev_priv, seqno - 1);
-	if (ret)
-		return ret;
-
-	dev_priv->gt.global_timeline.next_seqno = seqno;
-	return 0;
+	return i915_gem_init_global_seqno(dev_priv, seqno - 1);
 }
 
-static int i915_gem_get_global_seqno(struct drm_i915_private *dev_priv,
-				     u32 *seqno)
+static int reserve_global_seqno(struct drm_i915_private *i915)
 {
-	struct i915_gem_timeline *tl = &dev_priv->gt.global_timeline;
+	struct i915_gem_timeline *tl = &i915->gt.global_timeline;
+	u32 next_seqno = atomic_read(&tl->next_seqno);
 
-	/* reserve 0 for non-seqno */
-	if (unlikely(tl->next_seqno == 0)) {
+	if (unlikely(next_seqno + ++i915->gt.active_requests <= next_seqno)) {
 		int ret;
 
-		ret = i915_gem_init_global_seqno(dev_priv, 0);
-		if (ret)
+		ret = i915_gem_init_global_seqno(i915, 0);
+		if (ret) {
+			i915->gt.active_requests--;
 			return ret;
-
-		tl->next_seqno = 1;
+		}
 	}
 
-	*seqno = tl->next_seqno++;
 	return 0;
 }
 
+static u32 timeline_get_seqno(struct i915_gem_timeline *tl)
+{
+	return atomic_inc_return(&tl->next_seqno);
+}
+
 static int __i915_sw_fence_call
 submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 {
@@ -349,7 +348,6 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
 {
 	struct drm_i915_private *dev_priv = engine->i915;
 	struct drm_i915_gem_request *req;
-	u32 seqno;
 	int ret;
 
 	/* ABI: Before userspace accesses the GPU (e.g. execbuffer), report
@@ -360,6 +358,10 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
 	if (ret)
 		return ERR_PTR(ret);
 
+	ret = reserve_global_seqno(dev_priv);
+	if (ret)
+		return ERR_PTR(ret);
+
 	/* Move the oldest request to the slab-cache (if not in use!) */
 	req = list_first_entry_or_null(&engine->timeline->requests,
 				       typeof(*req), link);
@@ -395,12 +397,10 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
 	 * Do not use kmem_cache_zalloc() here!
 	 */
 	req = kmem_cache_alloc(dev_priv->requests, GFP_KERNEL);
-	if (!req)
-		return ERR_PTR(-ENOMEM);
-
-	ret = i915_gem_get_global_seqno(dev_priv, &seqno);
-	if (ret)
-		goto err;
+	if (!req) {
+		ret = -ENOMEM;
+		goto err_unreserve;
+	}
 
 	req->timeline = engine->timeline;
 
@@ -409,14 +409,14 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
 		   &i915_fence_ops,
 		   &req->lock,
 		   req->timeline->fence_context,
-		   seqno);
+		   timeline_get_seqno(req->timeline->common));
 
 	i915_sw_fence_init(&req->submit, submit_notify);
 
 	INIT_LIST_HEAD(&req->active_list);
 	req->i915 = dev_priv;
 	req->engine = engine;
-	req->global_seqno = seqno;
+	req->global_seqno = req->fence.seqno;
 	req->ctx = i915_gem_context_get(ctx);
 
 	/* No zalloc, must clear what we need by hand */
@@ -452,8 +452,9 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
 
 err_ctx:
 	i915_gem_context_put(ctx);
-err:
 	kmem_cache_free(dev_priv->requests, req);
+err_unreserve:
+	dev_priv->gt.active_requests--;
 	return ERR_PTR(ret);
 }
 
@@ -608,7 +609,6 @@ static void i915_gem_mark_busy(const struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
 
-	dev_priv->gt.active_engines |= intel_engine_flag(engine);
 	if (dev_priv->gt.awake)
 		return;
 
@@ -936,38 +936,34 @@ complete:
 	return timeout;
 }
 
-static bool engine_retire_requests(struct intel_engine_cs *engine)
+static void engine_retire_requests(struct intel_engine_cs *engine)
 {
 	struct drm_i915_gem_request *request, *next;
 
 	list_for_each_entry_safe(request, next,
 				 &engine->timeline->requests, link) {
 		if (!__i915_gem_request_completed(request))
-			return false;
+			return;
 
 		i915_gem_request_retire(request);
 	}
-
-	return true;
 }
 
 void i915_gem_retire_requests(struct drm_i915_private *dev_priv)
 {
 	struct intel_engine_cs *engine;
-	unsigned int tmp;
 
 	lockdep_assert_held(&dev_priv->drm.struct_mutex);
 
-	if (dev_priv->gt.active_engines == 0)
+	if (!dev_priv->gt.active_requests)
 		return;
 
 	GEM_BUG_ON(!dev_priv->gt.awake);
 
-	for_each_engine_masked(engine, dev_priv, dev_priv->gt.active_engines, tmp)
-		if (engine_retire_requests(engine))
-			dev_priv->gt.active_engines &= ~intel_engine_flag(engine);
+	for_each_engine(engine, dev_priv)
+		engine_retire_requests(engine);
 
-	if (dev_priv->gt.active_engines == 0)
+	if (!dev_priv->gt.active_requests)
 		queue_delayed_work(dev_priv->wq,
 				   &dev_priv->gt.idle_work,
 				   msecs_to_jiffies(100));
diff --git a/drivers/gpu/drm/i915/i915_gem_timeline.h b/drivers/gpu/drm/i915/i915_gem_timeline.h
index 96c765ba3abc..0fbcf8a37076 100644
--- a/drivers/gpu/drm/i915/i915_gem_timeline.h
+++ b/drivers/gpu/drm/i915/i915_gem_timeline.h
@@ -54,7 +54,7 @@ struct intel_timeline {
 
 struct i915_gem_timeline {
 	struct list_head link;
-	u32 next_seqno;
+	atomic_t next_seqno;
 
 	struct drm_i915_private *i915;
 	const char *name;
-- 
2.9.3

* [PATCH 16/18] drm/i915: Enable multiple timelines
  2016-09-14  6:52 Tracking multiple timelines (full-ppgtt) Chris Wilson
                   ` (14 preceding siblings ...)
  2016-09-14  6:52 ` [PATCH 15/18] drm/i915: Reserve space in the global seqno during request allocation Chris Wilson
@ 2016-09-14  6:52 ` Chris Wilson
  2016-09-19 15:52   ` Joonas Lahtinen
  2016-09-14  6:52 ` [PATCH 17/18] drm/i915: Enable userspace to opt-out of implicit fencing Chris Wilson
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 51+ messages in thread
From: Chris Wilson @ 2016-09-14  6:52 UTC (permalink / raw)
  To: intel-gfx

With the infrastructure converted over to tracking multiple timelines in
the GEM API whilst preserving the efficiency of using a single execution
timeline internally, we can now assign a separate timeline to every
context with full-ppgtt.
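
A sketch of the lookup this enables (taken from the i915_drv.h hunk
below): a full-ppgtt context resolves to the per-engine timeline of its
own address space, whilst contexts without a private ppgtt fall back to
the shared GGTT timeline:

	static inline struct intel_timeline *
	i915_gem_context_lookup_timeline(struct i915_gem_context *ctx,
					 struct intel_engine_cs *engine)
	{
		struct i915_address_space *vm;

		vm = ctx->ppgtt ? &ctx->ppgtt->base : &ctx->i915->ggtt.base;
		return &vm->timeline.engine[engine->id];
	}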

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h          | 10 ++++
 drivers/gpu/drm/i915/i915_gem.c          |  5 ++
 drivers/gpu/drm/i915/i915_gem_context.c  |  4 +-
 drivers/gpu/drm/i915/i915_gem_evict.c    | 11 +++--
 drivers/gpu/drm/i915/i915_gem_gtt.c      | 19 ++++---
 drivers/gpu/drm/i915/i915_gem_gtt.h      |  4 +-
 drivers/gpu/drm/i915/i915_gem_request.c  | 85 +++++++++++++++++++-------------
 drivers/gpu/drm/i915/i915_gem_timeline.c |  1 +
 drivers/gpu/drm/i915/i915_gem_timeline.h |  2 +
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 12 +++--
 drivers/gpu/drm/i915/intel_ringbuffer.h  |  5 --
 11 files changed, 101 insertions(+), 57 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 45fd20af0c6d..9a8a81d5643b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3402,6 +3402,16 @@ static inline void i915_gem_context_put(struct i915_gem_context *ctx)
 	kref_put(&ctx->ref, i915_gem_context_free);
 }
 
+static inline struct intel_timeline *
+i915_gem_context_lookup_timeline(struct i915_gem_context *ctx,
+				 struct intel_engine_cs *engine)
+{
+	struct i915_address_space *vm;
+
+	vm = ctx->ppgtt ? &ctx->ppgtt->base : &ctx->i915->ggtt.base;
+	return &vm->timeline.engine[engine->id];
+}
+
 static inline bool i915_gem_context_is_default(const struct i915_gem_context *c)
 {
 	return c->user_handle == DEFAULT_CONTEXT_HANDLE;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index cd9d3a2f72e7..e6cc61c391a0 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2634,6 +2634,7 @@ static void i915_gem_reset_engine(struct intel_engine_cs *engine)
 {
 	struct drm_i915_gem_request *request;
 	struct i915_gem_context *incomplete_ctx;
+	struct intel_timeline *timeline;
 	bool ring_hung;
 
 	/* Ensure irq handler finishes, and not run again. */
@@ -2671,6 +2672,10 @@ static void i915_gem_reset_engine(struct intel_engine_cs *engine)
 	list_for_each_entry_continue(request, &engine->timeline->requests, link)
 		if (request->ctx == incomplete_ctx)
 			reset_request(request);
+
+	timeline = i915_gem_context_lookup_timeline(incomplete_ctx, engine);
+	list_for_each_entry(request, &timeline->requests, link)
+		reset_request(request);
 }
 
 void i915_gem_reset(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index edcfebd3c6ef..d76456fd712e 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -365,9 +365,9 @@ i915_gem_create_context(struct drm_device *dev,
 		return ctx;
 
 	if (USES_FULL_PPGTT(dev)) {
-		struct i915_hw_ppgtt *ppgtt =
-			i915_ppgtt_create(to_i915(dev), file_priv);
+		struct i915_hw_ppgtt *ppgtt;
 
+		ppgtt = i915_ppgtt_create(to_i915(dev), file_priv, ctx->name);
 		if (IS_ERR(ppgtt)) {
 			DRM_DEBUG_DRIVER("PPGTT setup failed (%ld)\n",
 					 PTR_ERR(ppgtt));
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 5b6f81c1dbca..aadbf1f83351 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -33,13 +33,16 @@
 #include "intel_drv.h"
 #include "i915_trace.h"
 
-static bool
-gpu_is_idle(struct drm_i915_private *dev_priv)
+static bool ggtt_is_idle(struct drm_i915_private *dev_priv)
 {
+	struct i915_ggtt *ggtt = &dev_priv->ggtt;
 	struct intel_engine_cs *engine;
 
 	for_each_engine(engine, dev_priv) {
-		if (intel_engine_is_active(engine))
+		struct intel_timeline *tl;
+
+		tl = &ggtt->base.timeline.engine[engine->id];
+		if (i915_gem_active_isset(&tl->last_request))
 			return false;
 	}
 
@@ -152,7 +155,7 @@ search_again:
 	if (!i915_is_ggtt(vm) || flags & PIN_NONBLOCK)
 		return -ENOSPC;
 
-	if (gpu_is_idle(dev_priv)) {
+	if (ggtt_is_idle(dev_priv)) {
 		/* If we still have pending pageflip completions, drop
 		 * back to userspace to give our workqueues time to
 		 * acquire our locks and unpin the old scanouts.
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 92b36ab79771..6fca32a73770 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2112,8 +2112,10 @@ static int __hw_ppgtt_init(struct i915_hw_ppgtt *ppgtt,
 }
 
 static void i915_address_space_init(struct i915_address_space *vm,
-				    struct drm_i915_private *dev_priv)
+				    struct drm_i915_private *dev_priv,
+				    const char *name)
 {
+	i915_gem_timeline_init(dev_priv, &vm->timeline, name);
 	drm_mm_init(&vm->mm, vm->start, vm->total);
 	INIT_LIST_HEAD(&vm->active_list);
 	INIT_LIST_HEAD(&vm->inactive_list);
@@ -2142,14 +2144,15 @@ static void gtt_write_workarounds(struct drm_device *dev)
 
 static int i915_ppgtt_init(struct i915_hw_ppgtt *ppgtt,
 			   struct drm_i915_private *dev_priv,
-			   struct drm_i915_file_private *file_priv)
+			   struct drm_i915_file_private *file_priv,
+			   const char *name)
 {
 	int ret;
 
 	ret = __hw_ppgtt_init(ppgtt, dev_priv);
 	if (ret == 0) {
 		kref_init(&ppgtt->ref);
-		i915_address_space_init(&ppgtt->base, dev_priv);
+		i915_address_space_init(&ppgtt->base, dev_priv, name);
 		ppgtt->base.file = file_priv;
 	}
 
@@ -2183,7 +2186,8 @@ int i915_ppgtt_init_hw(struct drm_device *dev)
 
 struct i915_hw_ppgtt *
 i915_ppgtt_create(struct drm_i915_private *dev_priv,
-		  struct drm_i915_file_private *fpriv)
+		  struct drm_i915_file_private *fpriv,
+		  const char *name)
 {
 	struct i915_hw_ppgtt *ppgtt;
 	int ret;
@@ -2192,7 +2196,7 @@ i915_ppgtt_create(struct drm_i915_private *dev_priv,
 	if (!ppgtt)
 		return ERR_PTR(-ENOMEM);
 
-	ret = i915_ppgtt_init(ppgtt, dev_priv, fpriv);
+	ret = i915_ppgtt_init(ppgtt, dev_priv, fpriv, name);
 	if (ret) {
 		kfree(ppgtt);
 		return ERR_PTR(ret);
@@ -2215,6 +2219,7 @@ void  i915_ppgtt_release(struct kref *kref)
 	WARN_ON(!list_empty(&ppgtt->base.inactive_list));
 	WARN_ON(!list_empty(&ppgtt->base.unbound_list));
 
+	i915_gem_timeline_fini(&ppgtt->base.timeline);
 	list_del(&ppgtt->base.global_link);
 	drm_mm_takedown(&ppgtt->base.mm);
 
@@ -3191,11 +3196,13 @@ int i915_ggtt_init_hw(struct drm_i915_private *dev_priv)
 	/* Subtract the guard page before address space initialization to
 	 * shrink the range used by drm_mm.
 	 */
+	mutex_lock(&dev_priv->drm.struct_mutex);
 	ggtt->base.total -= PAGE_SIZE;
-	i915_address_space_init(&ggtt->base, dev_priv);
+	i915_address_space_init(&ggtt->base, dev_priv, "[global]");
 	ggtt->base.total += PAGE_SIZE;
 	if (!HAS_LLC(dev_priv))
 		ggtt->base.mm.color_adjust = i915_gtt_color_adjust;
+	mutex_unlock(&dev_priv->drm.struct_mutex);
 
 	if (!io_mapping_init_wc(&dev_priv->ggtt.mappable,
 				dev_priv->ggtt.mappable_base,
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 21f5d9657271..1b2e633f22a3 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -342,6 +342,7 @@ struct i915_pml4 {
 
 struct i915_address_space {
 	struct drm_mm mm;
+	struct i915_gem_timeline timeline;
 	struct drm_device *dev;
 	/* Every address space belongs to a struct file - except for the global
 	 * GTT that is owned by the driver (and so @file is set to NULL). In
@@ -612,7 +613,8 @@ void i915_ggtt_cleanup_hw(struct drm_i915_private *dev_priv);
 int i915_ppgtt_init_hw(struct drm_device *dev);
 void i915_ppgtt_release(struct kref *kref);
 struct i915_hw_ppgtt *i915_ppgtt_create(struct drm_i915_private *dev_priv,
-					struct drm_i915_file_private *fpriv);
+					struct drm_i915_file_private *fpriv,
+					const char *name);
 static inline void i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt)
 {
 	if (ppgtt)
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index bd84bb757e39..2dc231e58593 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -34,12 +34,6 @@ static const char *i915_fence_get_driver_name(struct fence *fence)
 
 static const char *i915_fence_get_timeline_name(struct fence *fence)
 {
-	/* Timelines are bound by eviction to a VM. However, since
-	 * we only have a global seqno at the moment, we only have
-	 * a single timeline. Note that each timeline will have
-	 * multiple execution contexts (fence contexts) as we allow
-	 * engines within a single timeline to execute in parallel.
-	 */
 	return to_request(fence)->timeline->common->name;
 }
 
@@ -64,18 +58,6 @@ static signed long i915_fence_wait(struct fence *fence,
 	return i915_wait_request(to_request(fence), interruptible, timeout);
 }
 
-static void i915_fence_value_str(struct fence *fence, char *str, int size)
-{
-	snprintf(str, size, "%u", fence->seqno);
-}
-
-static void i915_fence_timeline_value_str(struct fence *fence, char *str,
-					  int size)
-{
-	snprintf(str, size, "%u",
-		 intel_engine_get_seqno(to_request(fence)->engine));
-}
-
 static void i915_fence_release(struct fence *fence)
 {
 	struct drm_i915_gem_request *req = to_request(fence);
@@ -90,8 +72,6 @@ const struct fence_ops i915_fence_ops = {
 	.signaled = i915_fence_signaled,
 	.wait = i915_fence_wait,
 	.release = i915_fence_release,
-	.fence_value_str = i915_fence_value_str,
-	.timeline_value_str = i915_fence_timeline_value_str,
 };
 
 int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
@@ -144,7 +124,10 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 	struct i915_gem_active *active, *next;
 
 	trace_i915_gem_request_retire(request);
+
+	spin_lock_irq(&request->engine->timeline->lock);
 	list_del_init(&request->link);
+	spin_unlock_irq(&request->engine->timeline->lock);
 
 	/* We know the GPU must have read the request to have
 	 * sent us the seqno + interrupt, so use the position
@@ -305,6 +288,12 @@ static int reserve_global_seqno(struct drm_i915_private *i915)
 	return 0;
 }
 
+static u32 __timeline_get_seqno(struct i915_gem_timeline *tl)
+{
+	/* next_seqno only incremented under a mutex */
+	return tl->next_seqno.counter++;
+}
+
 static u32 timeline_get_seqno(struct i915_gem_timeline *tl)
 {
 	return atomic_inc_return(&tl->next_seqno);
@@ -315,17 +304,42 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 {
 	struct drm_i915_gem_request *request =
 		container_of(fence, typeof(*request), submit);
+	struct intel_timeline *timeline;
+	struct intel_engine_cs *engine = request->engine;
+	unsigned long flags;
+	u32 seqno;
 
 	/* Will be called from irq-context when using foreign DMA fences */
 
-	switch (state) {
-	case FENCE_COMPLETE:
-		request->engine->submit_request(request);
-		break;
+	if (state != FENCE_COMPLETE)
+		return NOTIFY_DONE;
 
-	case FENCE_FREE:
-		break;
-	}
+	timeline = engine->timeline;
+	GEM_BUG_ON(timeline == request->timeline);
+
+	spin_lock_irqsave(&timeline->lock, flags);
+
+	seqno = timeline_get_seqno(&request->i915->gt.global_timeline);
+	GEM_BUG_ON(seqno == 0);
+
+	request->previous_seqno = timeline->last_submitted_seqno;
+	timeline->last_submitted_seqno = seqno;
+
+	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
+	request->global_seqno = seqno;
+	if (test_bit(FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags))
+		intel_engine_enable_signaling(request);
+	spin_unlock(&request->lock);
+
+	engine->emit_request(request, request->ring->vaddr + request->postfix);
+
+	spin_lock_nested(&request->timeline->lock, SINGLE_DEPTH_NESTING);
+	list_move_tail(&request->link, &timeline->requests);
+	spin_unlock(&request->timeline->lock);
+
+	engine->submit_request(request);
+
+	spin_unlock_irqrestore(&timeline->lock, flags);
 
 	return NOTIFY_DONE;
 }
@@ -402,24 +416,24 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
 		goto err_unreserve;
 	}
 
-	req->timeline = engine->timeline;
+	req->timeline = i915_gem_context_lookup_timeline(ctx, engine);
 
 	spin_lock_init(&req->lock);
 	fence_init(&req->fence,
 		   &i915_fence_ops,
 		   &req->lock,
 		   req->timeline->fence_context,
-		   timeline_get_seqno(req->timeline->common));
+		   __timeline_get_seqno(req->timeline->common));
 
 	i915_sw_fence_init(&req->submit, submit_notify);
 
 	INIT_LIST_HEAD(&req->active_list);
 	req->i915 = dev_priv;
 	req->engine = engine;
-	req->global_seqno = req->fence.seqno;
 	req->ctx = i915_gem_context_get(ctx);
 
 	/* No zalloc, must clear what we need by hand */
+	req->global_seqno = 0;
 	req->previous_context = NULL;
 	req->file_priv = NULL;
 	req->batch = NULL;
@@ -669,8 +683,6 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
 	err = intel_ring_begin(request, engine->emit_request_sz);
 	GEM_BUG_ON(err);
 	request->postfix = ring->tail;
-
-	engine->emit_request(request, request->ring->vaddr + request->postfix);
 	ring->tail += engine->emit_request_sz * sizeof(u32);
 
 	/* Seal the request and mark it as pending execution. Note that
@@ -685,12 +697,15 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
 		i915_sw_fence_await_sw_fence(&request->submit, &prev->submit,
 					     &request->submitq, GFP_NOWAIT);
 
-	request->emitted_jiffies = jiffies;
-	request->previous_seqno = timeline->last_submitted_seqno;
+	spin_lock_irq(&timeline->lock);
+	list_add_tail(&request->link, &timeline->requests);
+	spin_unlock_irq(&timeline->lock);
+
 	timeline->last_submitted_seqno = request->fence.seqno;
 	i915_gem_active_set(&timeline->last_request, request);
-	list_add_tail(&request->link, &timeline->requests);
+
 	list_add_tail(&request->ring_link, &ring->request_list);
+	request->emitted_jiffies = jiffies;
 
 	i915_gem_mark_busy(engine);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_timeline.c b/drivers/gpu/drm/i915/i915_gem_timeline.c
index da2e0d0c232e..22ca86335d86 100644
--- a/drivers/gpu/drm/i915/i915_gem_timeline.c
+++ b/drivers/gpu/drm/i915/i915_gem_timeline.c
@@ -47,6 +47,7 @@ int i915_gem_timeline_init(struct drm_i915_private *i915,
 		tl->fence_context = fences + i;
 		tl->common = timeline;
 
+		spin_lock_init(&tl->lock);
 		init_request_active(&tl->last_request, NULL);
 		INIT_LIST_HEAD(&tl->requests);
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_timeline.h b/drivers/gpu/drm/i915/i915_gem_timeline.h
index 0fbcf8a37076..0302be2dfa13 100644
--- a/drivers/gpu/drm/i915/i915_gem_timeline.h
+++ b/drivers/gpu/drm/i915/i915_gem_timeline.h
@@ -35,6 +35,8 @@ struct intel_timeline {
 	u64 fence_context;
 	u32 last_submitted_seqno;
 
+	spinlock_t lock;
+
 	/**
 	 * List of breadcrumbs associated with GPU requests currently
 	 * outstanding.
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 9ad1028681cf..9dba4971fb1e 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -77,22 +77,26 @@ static void intel_breadcrumbs_fake_irq(unsigned long data)
 
 static void irq_enable(struct intel_engine_cs *engine)
 {
+	unsigned long flags;
+
 	/* Enabling the IRQ may miss the generation of the interrupt, but
 	 * we still need to force the barrier before reading the seqno,
 	 * just in case.
 	 */
 	engine->breadcrumbs.irq_posted = true;
 
-	spin_lock_irq(&engine->i915->irq_lock);
+	spin_lock_irqsave(&engine->i915->irq_lock, flags);
 	engine->irq_enable(engine);
-	spin_unlock_irq(&engine->i915->irq_lock);
+	spin_unlock_irqrestore(&engine->i915->irq_lock, flags);
 }
 
 static void irq_disable(struct intel_engine_cs *engine)
 {
-	spin_lock_irq(&engine->i915->irq_lock);
+	unsigned long flags;
+
+	spin_lock_irqsave(&engine->i915->irq_lock, flags);
 	engine->irq_disable(engine);
-	spin_unlock_irq(&engine->i915->irq_lock);
+	spin_unlock_irqrestore(&engine->i915->irq_lock, flags);
 
 	engine->breadcrumbs.irq_posted = false;
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 7114ef149343..f6637568bad4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -561,9 +561,4 @@ void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
 unsigned int intel_kick_waiters(struct drm_i915_private *i915);
 unsigned int intel_kick_signalers(struct drm_i915_private *i915);
 
-static inline bool intel_engine_is_active(struct intel_engine_cs *engine)
-{
-	return i915_gem_active_isset(&engine->timeline->last_request);
-}
-
 #endif /* _INTEL_RINGBUFFER_H_ */
-- 
2.9.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 17/18] drm/i915: Enable userspace to opt-out of implicit fencing
  2016-09-14  6:52 Tracking multiple timelines (full-ppgtt) Chris Wilson
                   ` (15 preceding siblings ...)
  2016-09-14  6:52 ` [PATCH 16/18] drm/i915: Enable multiple timelines Chris Wilson
@ 2016-09-14  6:52 ` Chris Wilson
  2016-09-14  6:52 ` [PATCH 18/18] drm/i915: Support explicit fencing for execbuf Chris Wilson
  2016-09-14  9:16 ` ✗ Fi.CI.BAT: failure for series starting with [01/18] drm/i915: Support asynchronous waits on struct fence from i915_gem_request Patchwork
  18 siblings, 0 replies; 51+ messages in thread
From: Chris Wilson @ 2016-09-14  6:52 UTC (permalink / raw)
  To: intel-gfx

Userspace is faced with a dilemma. The kernel requires implicit fencing
to manage resource usage (we must always wait for the GPU to finish
before releasing its PTE) and on behalf of third parties. However,
userspace may wish to avoid this serialisation if it is using explicit
fencing between parties and wants more fine-grained access to buffers
(e.g. it
may partition the buffer between uses and track fences on ranges rather
than the implicit fences tracking the whole object). It follows that
userspace needs a mechanism to avoid the kernel's serialisation on its
implicit fences before execbuf execution.

The next question is whether this is an object, execbuf or context flag.
Hybrid users (such as using explicit EGL_ANDROID_native_sync fencing on
shared winsys buffers, but implicit fencing on internal surfaces)
require a per-object level flag. Given that this flag needs to be set
only once for the lifetime of the object, this reduces the convenience of
having an execbuf or context level flag (and avoids having multiple
pieces of uABI controlling the same feature).

Incorrect use of this flag will result in rendering corruption and GPU
hangs - but will not result in use-after-free or similar resource
tracking issues.

Serious caveat: write ordering is not strictly correct after setting
this flag on a render target on multiple engines. This affects all
subsequent GEM operations (execbuf, set-domain, pread) and shared
dma-buf operations. A fix is possible - but costly (both in terms of
further ABI changes and runtime overhead).
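
For reference, usage from userspace looks something like this
(hypothetical snippet; buffer names invented, error handling elided):

	struct drm_i915_gem_exec_object2 exec[2];
	struct drm_i915_gem_execbuffer2 execbuf;

	memset(exec, 0, sizeof(exec));
	exec[0].handle = surface_bo;	   /* suballocated, explicitly fenced */
	exec[0].flags = EXEC_OBJECT_ASYNC; /* skip implicit sync on this bo */
	exec[1].handle = batch_bo;	   /* batch buffer goes last */

	memset(&execbuf, 0, sizeof(execbuf));
	execbuf.buffers_ptr = (uintptr_t)exec;
	execbuf.buffer_count = 2;
	drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf);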

Testcase: igt/gem_exec_async
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c            |  1 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +++
 include/uapi/drm/i915_drm.h                | 27 ++++++++++++++++++++++++++-
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 5719deefd484..87748c8c091e 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -332,6 +332,7 @@ static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_EXEC_HANDLE_LUT:
 	case I915_PARAM_HAS_COHERENT_PHYS_GTT:
 	case I915_PARAM_HAS_EXEC_SOFTPIN:
+	case I915_PARAM_HAS_EXEC_ASYNC:
 		/* For the time being all of these are always true;
 		 * if some supported hardware does not have one of these
 		 * features this value needs to be provided from
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 6b5175ee824c..7a20dd2e5b55 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1107,6 +1107,9 @@ i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
 	list_for_each_entry(vma, vmas, exec_list) {
 		struct drm_i915_gem_object *obj = vma->obj;
 
+		if (vma->exec_entry->flags & EXEC_OBJECT_ASYNC)
+			continue;
+
 		ret = i915_gem_request_await_object
 			(req, obj, obj->base.pending_write_domain);
 		if (ret)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 93ad6f9d2496..66ee7d30329f 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -388,6 +388,10 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_POOLED_EU	 38
 #define I915_PARAM_MIN_EU_IN_POOL	 39
 #define I915_PARAM_MMAP_GTT_VERSION	 40
+/* Query whether DRM_I915_GEM_EXECBUFFER2 supports the ability to opt-out of
+ * synchronisation with implicit fencing on individual objects.
+ */
+#define I915_PARAM_HAS_EXEC_ASYNC	 41
 
 typedef struct drm_i915_getparam {
 	__s32 param;
@@ -729,8 +733,29 @@ struct drm_i915_gem_exec_object2 {
 #define EXEC_OBJECT_SUPPORTS_48B_ADDRESS (1<<3)
 #define EXEC_OBJECT_PINNED		 (1<<4)
 #define EXEC_OBJECT_PAD_TO_SIZE		 (1<<5)
+/* The kernel implicitly tracks GPU activity on all GEM objects, and
+ * synchronises operations with outstanding rendering. This includes
+ * rendering on other devices if exported via dma-buf. However, sometimes
+ * this tracking is too coarse and the user knows better. For example,
+ * if the object is split into non-overlapping ranges shared between different
+ * clients or engines (i.e. suballocating objects), the implicit tracking
+ * by kernel assumes that each operation affects the whole object rather
+ * than an individual range, causing needless synchronisation between clients.
+ * The kernel will also forgo any CPU cache flushes prior to rendering from
+ * the object as the client is expected to be also handling such domain
+ * tracking.
+ *
+ * The kernel maintains the implicit tracking in order to manage resources
+ * used by the GPU - this flag only disables the synchronisation prior to
+ * rendering with this object in this execbuf.
+ *
+ * Opting out of implicit synchronisation requires the user to do its own
+ * explicit tracking to avoid rendering corruption. See, for example,
+ * I915_PARAM_HAS_EXEC_FENCE to order execbufs and execute them asynchronously.
+ */
+#define EXEC_OBJECT_ASYNC		 (1<<6)
 /* All remaining bits are MBZ and RESERVED FOR FUTURE USE */
-#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_PAD_TO_SIZE<<1)
+#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_ASYNC<<1)
 	__u64 flags;
 
 	union {
-- 
2.9.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 18/18] drm/i915: Support explicit fencing for execbuf
  2016-09-14  6:52 Tracking multiple timelines (full-ppgtt) Chris Wilson
                   ` (16 preceding siblings ...)
  2016-09-14  6:52 ` [PATCH 17/18] drm/i915: Enable userspace to opt-out of implicit fencing Chris Wilson
@ 2016-09-14  6:52 ` Chris Wilson
  2016-09-14  9:16 ` ✗ Fi.CI.BAT: failure for series starting with [01/18] drm/i915: Support asynchronous waits on struct fence from i915_gem_request Patchwork
  18 siblings, 0 replies; 51+ messages in thread
From: Chris Wilson @ 2016-09-14  6:52 UTC (permalink / raw)
  To: intel-gfx

Now that the user can opt-out of implicit fencing, we need to give them
back control over the fencing. We employ sync_file to wrap our
drm_i915_gem_request and provide an fd that userspace can merge with
other sync_file fds and pass back to the kernel to wait upon before
future execution.
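
For reference, the flow from userspace looks roughly like this
(hypothetical snippet; setup of the remaining execbuf fields and error
handling elided):

	struct drm_i915_gem_execbuffer2 eb = { /* buffers, batch, ... */ };
	int fence_fd;

	/* first batch: request an out-fence */
	eb.flags |= I915_EXEC_FENCE_OUT;
	drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2_WR, &eb);
	fence_fd = (int)(eb.rsvd2 >> 32);	/* upper 32 bits on return */

	/* second batch: wait for the first before executing */
	eb.flags &= ~I915_EXEC_FENCE_OUT;
	eb.flags |= I915_EXEC_FENCE_IN;
	eb.rsvd2 = (__u32)fence_fd;		/* lower 32 bits on input */
	drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &eb);
	close(fence_fd);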

Testcase: igt/gem_exec_fence
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/Kconfig               |  1 +
 drivers/gpu/drm/i915/i915_drv.c            |  3 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 53 +++++++++++++++++++++++++++---
 include/uapi/drm/i915_drm.h                | 36 +++++++++++++++++++-
 4 files changed, 86 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 7769e469118f..319ca27ea719 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -18,6 +18,7 @@ config DRM_I915
 	select INPUT if ACPI
 	select ACPI_VIDEO if ACPI
 	select ACPI_BUTTON if ACPI
+	select SYNC_FILE
 	help
 	  Choose this option if you have a system that has "Intel Graphics
 	  Media Accelerator" or "HD Graphics" integrated graphics,
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 87748c8c091e..af43c968fc5a 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -333,6 +333,7 @@ static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_COHERENT_PHYS_GTT:
 	case I915_PARAM_HAS_EXEC_SOFTPIN:
 	case I915_PARAM_HAS_EXEC_ASYNC:
+	case I915_PARAM_HAS_EXEC_FENCE:
 		/* For the time being all of these are always true;
 		 * if some supported hardware does not have one of these
 		 * features this value needs to be provided from
@@ -2532,7 +2533,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_HWS_ADDR, drm_noop, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
 	DRM_IOCTL_DEF_DRV(I915_GEM_INIT, drm_noop, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
 	DRM_IOCTL_DEF_DRV(I915_GEM_EXECBUFFER, i915_gem_execbuffer, DRM_AUTH),
-	DRM_IOCTL_DEF_DRV(I915_GEM_EXECBUFFER2, i915_gem_execbuffer2, DRM_AUTH|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_EXECBUFFER2_WR, i915_gem_execbuffer2, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_PIN, i915_gem_reject_pin_ioctl, DRM_AUTH|DRM_ROOT_ONLY),
 	DRM_IOCTL_DEF_DRV(I915_GEM_UNPIN, i915_gem_reject_pin_ioctl, DRM_AUTH|DRM_ROOT_ONLY),
 	DRM_IOCTL_DEF_DRV(I915_GEM_BUSY, i915_gem_busy_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 7a20dd2e5b55..b57f0bb9ba51 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -28,6 +28,7 @@
 
 #include <linux/dma_remapping.h>
 #include <linux/reservation.h>
+#include <linux/sync_file.h>
 #include <linux/uaccess.h>
 
 #include <drm/drmP.h>
@@ -1585,6 +1586,9 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	struct i915_execbuffer_params *params = &params_master;
 	const u32 ctx_id = i915_execbuffer2_get_context_id(*args);
 	u32 dispatch_flags;
+	struct fence *in_fence = NULL;
+	struct sync_file *out_fence = NULL;
+	int out_fence_fd = -1;
 	int ret;
 	bool need_relocs;
 
@@ -1628,6 +1632,23 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		dispatch_flags |= I915_DISPATCH_RS;
 	}
 
+	if (args->flags & I915_EXEC_FENCE_IN) {
+		in_fence = sync_file_get_fence(lower_32_bits(args->rsvd2));
+		if (!in_fence) {
+			ret = -EINVAL;
+			goto pre_mutex_err;
+		}
+	}
+
+	if (args->flags & I915_EXEC_FENCE_OUT) {
+		out_fence_fd = get_unused_fd_flags(O_CLOEXEC);
+		if (out_fence_fd < 0) {
+			ret = out_fence_fd;
+			out_fence_fd = -1;
+			goto pre_mutex_err;
+		}
+	}
+
 	/* Take a local wakeref for preparing to dispatch the execbuf as
 	 * we expect to access the hardware fairly frequently in the
 	 * process. Upon first dispatch, we acquire another prolonged
@@ -1772,6 +1793,20 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		goto err_batch_unpin;
 	}
 
+	if (in_fence) {
+		ret = i915_gem_request_await_fence(params->request, in_fence);
+		if (ret < 0)
+			goto err_request;
+	}
+
+	if (out_fence_fd != -1) {
+		out_fence = sync_file_create(fence_get(&params->request->fence));
+		if (!out_fence) {
+			ret = -ENOMEM;
+			goto err_request;
+		}
+	}
+
 	/* Whilst this request exists, batch_obj will be on the
 	 * active_list, and so will hold the active reference. Only when this
 	 * request is retired will the batch_obj be moved onto the
@@ -1799,6 +1834,16 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	ret = execbuf_submit(params, args, &eb->vmas);
 err_request:
 	__i915_add_request(params->request, ret == 0);
+	if (out_fence) {
+		if (ret == 0) {
+			fd_install(out_fence_fd, out_fence->file);
+			args->rsvd2 &= GENMASK_ULL(31, 0); /* keep in-fence */
+			args->rsvd2 |= (u64)out_fence_fd << 32;
+			out_fence_fd = -1;
+		} else {
+			fput(out_fence->file);
+		}
+	}
 
 err_batch_unpin:
 	/*
@@ -1820,6 +1865,9 @@ pre_mutex_err:
 	/* intel_gpu_busy should also get a ref, so it will free when the device
 	 * is really idle. */
 	intel_runtime_pm_put(dev_priv);
+	if (out_fence_fd != -1)
+		put_unused_fd(out_fence_fd);
+	fence_put(in_fence);
 	return ret;
 }
 
@@ -1927,11 +1975,6 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
-	if (args->rsvd2 != 0) {
-		DRM_DEBUG("dirty rvsd2 field\n");
-		return -EINVAL;
-	}
-
 	exec2_list = drm_malloc_gfp(args->buffer_count,
 				    sizeof(*exec2_list),
 				    GFP_TEMPORARY);
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 66ee7d30329f..b7db0a9c35b9 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -246,6 +246,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_OVERLAY_PUT_IMAGE	0x27
 #define DRM_I915_OVERLAY_ATTRS	0x28
 #define DRM_I915_GEM_EXECBUFFER2	0x29
+#define DRM_I915_GEM_EXECBUFFER2_WR	DRM_I915_GEM_EXECBUFFER2
 #define DRM_I915_GET_SPRITE_COLORKEY	0x2a
 #define DRM_I915_SET_SPRITE_COLORKEY	0x2b
 #define DRM_I915_GEM_WAIT	0x2c
@@ -279,6 +280,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_GEM_INIT		DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_INIT, struct drm_i915_gem_init)
 #define DRM_IOCTL_I915_GEM_EXECBUFFER	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER, struct drm_i915_gem_execbuffer)
 #define DRM_IOCTL_I915_GEM_EXECBUFFER2	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER2, struct drm_i915_gem_execbuffer2)
+#define DRM_IOCTL_I915_GEM_EXECBUFFER2_WR	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER2_WR, struct drm_i915_gem_execbuffer2)
 #define DRM_IOCTL_I915_GEM_PIN		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_PIN, struct drm_i915_gem_pin)
 #define DRM_IOCTL_I915_GEM_UNPIN	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_UNPIN, struct drm_i915_gem_unpin)
 #define DRM_IOCTL_I915_GEM_BUSY		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_BUSY, struct drm_i915_gem_busy)
@@ -393,6 +395,13 @@ typedef struct drm_i915_irq_wait {
  */
 #define I915_PARAM_HAS_EXEC_ASYNC	 41
 
+/* Query whether DRM_I915_GEM_EXECBUFFER2 supports explicit fence support -
+ * both being able to pass in a sync_file fd to wait upon before executing,
+ * and being able to return a new sync_file fd that is signaled when the
+ * current request is complete.
+ */
+#define I915_PARAM_HAS_EXEC_FENCE	 42
+
 typedef struct drm_i915_getparam {
 	__s32 param;
 	/*
@@ -845,7 +854,32 @@ struct drm_i915_gem_execbuffer2 {
  */
 #define I915_EXEC_RESOURCE_STREAMER     (1<<15)
 
-#define __I915_EXEC_UNKNOWN_FLAGS -(I915_EXEC_RESOURCE_STREAMER<<1)
+/* Setting I915_EXEC_FENCE_IN implies that lower_32_bits(rsvd2) represent
+ * a sync_file fd to wait upon (in a nonblocking manner) prior to executing
+ * the batch.
+ *
+ * Returns -EINVAL if the sync_file fd cannot be found.
+ */
+#define I915_EXEC_FENCE_IN		(1<<16)
+
+/* Setting I915_EXEC_FENCE_OUT causes the ioctl to return a sync_file fd
+ * in the upper_32_bits(rsvd2) upon success. Ownership of the fd is given
+ * to the caller, and it should be closed after use. (The fd is a regular
+ * file descriptor and will be cleaned up on process termination. It holds
+ * a reference to the request, but nothing else.)
+ *
+ * The sync_file fd can be combined with other sync_file fds and passed either
+ * to execbuf using I915_EXEC_FENCE_IN, to atomic KMS ioctls (so that a flip
+ * will only occur after this request completes), or to other devices.
+ *
+ * Using I915_EXEC_FENCE_OUT requires use of
+ * DRM_IOCTL_I915_GEM_EXECBUFFER2_WR ioctl so that the result is written
+ * back to userspace. Failure to do so will cause the out-fence to always
+ * be reported as zero, and the real fence fd to be leaked.
+ */
+#define I915_EXEC_FENCE_OUT		(1<<17)
+
+#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_OUT<<1))
 
 #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
 #define i915_execbuffer2_set_context_id(eb2, context) \
-- 
2.9.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 01/18] drm/i915: Support asynchronous waits on struct fence from i915_gem_request
  2016-09-14  6:52 ` [PATCH 01/18] drm/i915: Support asynchronous waits on struct fence from i915_gem_request Chris Wilson
@ 2016-09-14  7:37   ` Joonas Lahtinen
  2016-09-19 11:26     ` Chris Wilson
  0 siblings, 1 reply; 51+ messages in thread
From: Joonas Lahtinen @ 2016-09-14  7:37 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
> +int
> +i915_gem_request_await_fence(struct drm_i915_gem_request *req,
> +			     struct fence *fence)
> +{
> +	struct fence_array *array;
> +	int ret;
> +	int i;
> +
> +	if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> +		return 0;
> +
> +	if (fence_is_i915(fence))
> +		return i915_gem_request_await_request(req, to_request(fence));
> +
> +	if (!fence_is_array(fence)) {
> +		ret = i915_sw_fence_await_dma_fence(&req->submit,
> +						    fence, 10*HZ,

Magic 10*HZ; give it a define, it is not temporary.
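
Something like (name hypothetical):

	#define I915_FENCE_TIMEOUT (10 * HZ)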

> +						    GFP_KERNEL);
> +		return ret < 0 ? ret : 0;
> +	}
> +
> +	array = to_fence_array(fence);
> +	for (i = 0; i < array->num_fences; i++) {
> +		struct fence *child = array->fences[i];
> +
> +		if (fence_is_i915(child))
> +			ret = i915_gem_request_await_request(req,
> +							     to_request(child));
> +		else
> +			ret = i915_sw_fence_await_dma_fence(&req->submit,
> +							    child, 10*HZ,
> +							    GFP_KERNEL);

This chunk repeats from above, make it a small
__i915_gem_request_await_fence function.
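
Untested sketch (reusing the timeout define suggested above):

	static int
	__i915_gem_request_await_fence(struct drm_i915_gem_request *req,
				       struct fence *fence)
	{
		int ret;

		if (fence_is_i915(fence))
			return i915_gem_request_await_request(req,
							      to_request(fence));

		ret = i915_sw_fence_await_dma_fence(&req->submit, fence,
						    I915_FENCE_TIMEOUT,
						    GFP_KERNEL);
		return ret < 0 ? ret : 0;
	}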

> +		if (ret < 0)
> +			return ret;

How about the failure case when we're half bound, no need to tear down?

No uses in this patch so hard to evaluate.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 02/18] drm/i915: Allow i915_sw_fence_await_sw_fence() to allocate
  2016-09-14  6:52 ` [PATCH 02/18] drm/i915: Allow i915_sw_fence_await_sw_fence() to allocate Chris Wilson
@ 2016-09-14  7:51   ` Joonas Lahtinen
  2016-09-14  8:46     ` Chris Wilson
  0 siblings, 1 reply; 51+ messages in thread
From: Joonas Lahtinen @ 2016-09-14  7:51 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
> @@ -678,7 +678,7 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
>  				   &request->i915->drm.struct_mutex);
>  	if (prev)
>  		i915_sw_fence_await_sw_fence(&request->submit, &prev->submit,
> -					     &request->submitq);
> +					     &request->submitq, GFP_NOWAIT);

Wrt the commit message, why do we pass both here? If one were to run
static analysis, !wq is never true if you pass &foo here.

> @@ -135,6 +135,8 @@ static int i915_sw_fence_wake(wait_queue_t *wq, unsigned mode, int flags, void *
>  	list_del(&wq->task_list);
>  	__i915_sw_fence_complete(wq->private, key);
>  	i915_sw_fence_put(wq->private);
> +	if (wq->flags)
> +		kfree(wq);

This is confusing without a comment or proper flag #define.

>  	INIT_LIST_HEAD(&wq->task_list);
> -	wq->flags = 0;
> +	wq->flags = pending;

Why not make this look proper by using I915_SW_FENCE_FLAG_FOO name.
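
E.g. (hypothetical name):

	#define I915_SW_FENCE_FLAG_ALLOC BIT(0)

	wq->flags = pending ? I915_SW_FENCE_FLAG_ALLOC : 0;
	...
	if (wq->flags & I915_SW_FENCE_FLAG_ALLOC)
		kfree(wq);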

> +static inline void i915_sw_fence_wait(struct i915_sw_fence *fence)
> +{
> +	wait_event(fence->wait, i915_sw_fence_done(fence));
> +}
> +

This seems to be a lost-in-rebasing hunk.

Above addressed;

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 02/18] drm/i915: Allow i915_sw_fence_await_sw_fence() to allocate
  2016-09-14  7:51   ` Joonas Lahtinen
@ 2016-09-14  8:46     ` Chris Wilson
  0 siblings, 0 replies; 51+ messages in thread
From: Chris Wilson @ 2016-09-14  8:46 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Wed, Sep 14, 2016 at 10:51:12AM +0300, Joonas Lahtinen wrote:
> On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
> > @@ -678,7 +678,7 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
> >  				   &request->i915->drm.struct_mutex);
> >  	if (prev)
> >  		i915_sw_fence_await_sw_fence(&request->submit, &prev->submit,
> > -					     &request->submitq);
> > +					     &request->submitq, GFP_NOWAIT);
> 
> Wrt the commit message, why do we pass both here? If one were to run
> static analysis, !wq is never true if you pass &foo here.

Only because GFP_NOWAIT was descriptive, and cleaner than say
(__force gfp_t)0

> > @@ -135,6 +135,8 @@ static int i915_sw_fence_wake(wait_queue_t *wq, unsigned mode, int flags, void *
> >  	list_del(&wq->task_list);
> >  	__i915_sw_fence_complete(wq->private, key);
> >  	i915_sw_fence_put(wq->private);
> > +	if (wq->flags)
> > +		kfree(wq);
> 
> This is confusing without a comment or proper flag #define.
> 
> >  	INIT_LIST_HEAD(&wq->task_list);
> > -	wq->flags = 0;
> > +	wq->flags = pending;
> 
> Why not make this look proper by using I915_SW_FENCE_FLAG_FOO name.
> 
> > +static inline void i915_sw_fence_wait(struct i915_sw_fence *fence)
> > +{
> > +	wait_event(fence->wait, i915_sw_fence_done(fence));
> > +}
> > +
> 
> This seems to be a lost-in-rebasing hunk.

I snuck in a use along the oom path to justify it here (and avoid having
to magic it out of nowhere later).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/18] drm/i915: Rearrange i915_wait_request() accounting with callers
  2016-09-14  6:52 ` [PATCH 03/18] drm/i915: Rearrange i915_wait_request() accounting with callers Chris Wilson
@ 2016-09-14  8:47   ` Joonas Lahtinen
  0 siblings, 0 replies; 51+ messages in thread
From: Joonas Lahtinen @ 2016-09-14  8:47 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
> +int
> +i915_gem_object_wait(struct drm_i915_gem_object *obj,
> +		     unsigned int flags,
> +		     long timeout,
> +		     struct intel_rps_client *rps)
>  {
>  

[...]

> -	return 0;
> +	resv = i915_gem_object_get_dmabuf_resv(obj);
> +	if (resv)
> +		timeout = i915_gem_object_wait_reservation(resv,
> +							   flags, timeout,
> +							   rps);
> +	return timeout < 0 ? timeout : timeout > 0 ? 0 : -ETIME;

	Format this in a more readable manner eg.;

	return timeout == 0 ? -ETIME :
	       timeout < 0 ? timeout :
	       0;
		
>  
>  static struct intel_rps_client *to_rps_client(struct drm_file *file)
> @@ -454,7 +542,13 @@ i915_gem_phys_pwrite(struct drm_i915_gem_object *obj,
>  	/* We manually control the domain here and pretend that it
>  	 * remains coherent i.e. in the GTT domain, like shmem_pwrite.
>  	 */
> -	ret = i915_gem_object_wait_rendering(obj, false);
> +	lockdep_assert_held(&obj->base.dev->struct_mutex);

Bump this before the comment, to the beginning of the function, like
elsewhere.

> @@ -2804,17 +2923,21 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>  	if (!obj)
>  		return -ENOENT;
>  
> -	active = __I915_BO_ACTIVE(obj);
> -	for_each_active(active, idx) {
> -		s64 *timeout = args->timeout_ns >= 0 ? &args->timeout_ns : NULL;
> -		ret = i915_gem_active_wait_unlocked(&obj->last_read[idx],
> -						    I915_WAIT_INTERRUPTIBLE,
> -						    timeout, rps);
> -		if (ret)
> -			break;
> +	start = ktime_get();
> +
> +	ret = i915_gem_object_wait(obj,
> +				   I915_WAIT_INTERRUPTIBLE | I915_WAIT_ALL,
> +				   args->timeout_ns < 0 ? MAX_SCHEDULE_TIMEOUT : nsecs_to_jiffies(args->timeout_ns),

Do break this line, plz.

Maybe just have long timeout = MAX_SCHEDULE_TIMEOUT; at the beginning
of the function, then do the if (args->timeout_ns >= 0) check before
the call; it matches the if after the call nicely.
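
I.e. something like:

	long timeout = MAX_SCHEDULE_TIMEOUT;
	...
	if (args->timeout_ns >= 0)
		timeout = nsecs_to_jiffies(args->timeout_ns);

	ret = i915_gem_object_wait(obj,
				   I915_WAIT_INTERRUPTIBLE | I915_WAIT_ALL,
				   timeout, to_rps_client(file));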


> +				   to_rps_client(file));
> +
> +	if (args->timeout_ns > 0) {

And as we have this.

> +		args->timeout_ns -= ktime_to_ns(ktime_sub(ktime_get(), start));
> +		if (args->timeout_ns < 0)
> +			args->timeout_ns = 0;
>  	}
>  
>  	i915_gem_object_put_unlocked(obj);
> +
>  	return ret;
>  }
> 

<SNIP>

> 
> @@ -3598,7 +3732,13 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
>  	uint32_t old_write_domain, old_read_domains;
>  	int ret;
>  
> -	ret = i915_gem_object_wait_rendering(obj, !write);
> +	lockdep_assert_held(&obj->base.dev->struct_mutex);

I'd add a newline here like elsewhere.

> +	ret = i915_gem_object_wait(obj,
> +				   I915_WAIT_INTERRUPTIBLE |
> +				   I915_WAIT_LOCKED |
> +				   (write ? I915_WAIT_ALL : 0),
> +				   MAX_SCHEDULE_TIMEOUT,
> +				   NULL);
>  	if (ret)
>  		return ret;
>  
> @@ -3654,11 +3794,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
>  	struct drm_i915_file_private *file_priv = file->driver_priv;
>  	unsigned long recent_enough = jiffies - DRM_I915_THROTTLE_JIFFIES;
>  	struct drm_i915_gem_request *request, *target = NULL;
> -	int ret;
> -
> -	ret = i915_gem_wait_for_error(&dev_priv->gpu_error);
> -	if (ret)
> -		return ret;

Unsure how this is related to the changes; it needs explaining in the
commit message, or I nominate this as a lost hunk.

With those addressed,

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 04/18] drm/i915: Remove unused i915_gem_active_wait() in favour of _unlocked()
  2016-09-14  6:52 ` [PATCH 04/18] drm/i915: Remove unused i915_gem_active_wait() in favour of _unlocked() Chris Wilson
@ 2016-09-14  8:48   ` Joonas Lahtinen
  0 siblings, 0 replies; 51+ messages in thread
From: Joonas Lahtinen @ 2016-09-14  8:48 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
> Since we only use the more generic unlocked variant, just rename it as
> the normal i915_gem_active_wait(). The temporary cost is that we need to
> always acquire the reference in a RCU safe manner, but the benefit is
> that we will combine the common paths.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* ✗ Fi.CI.BAT: failure for series starting with [01/18] drm/i915: Support asynchronous waits on struct fence from i915_gem_request
  2016-09-14  6:52 Tracking multiple timelines (full-ppgtt) Chris Wilson
                   ` (17 preceding siblings ...)
  2016-09-14  6:52 ` [PATCH 18/18] drm/i915: Support explicit fencing for execbuf Chris Wilson
@ 2016-09-14  9:16 ` Patchwork
  18 siblings, 0 replies; 51+ messages in thread
From: Patchwork @ 2016-09-14  9:16 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/18] drm/i915: Support asynchronous waits on struct fence from i915_gem_request
URL   : https://patchwork.freedesktop.org/series/12433/
State : failure

== Summary ==

Series 12433v1 Series without cover letter
https://patchwork.freedesktop.org/api/1.0/series/12433/revisions/1/mbox/

Test kms_pipe_crc_basic:
        Subgroup hang-read-crc-pipe-c:
                skip       -> PASS       (fi-hsw-4770r)
        Subgroup suspend-read-crc-pipe-b:
                skip       -> INCOMPLETE (fi-ivb-3770)

fi-bsw-n3050     total:244  pass:202  dwarn:0   dfail:0   fail:0   skip:42 
fi-hsw-4770k     total:244  pass:226  dwarn:0   dfail:0   fail:0   skip:18 
fi-hsw-4770r     total:244  pass:222  dwarn:0   dfail:0   fail:0   skip:22 
fi-ilk-650       total:244  pass:183  dwarn:0   dfail:0   fail:1   skip:60 
fi-ivb-3520m     total:244  pass:219  dwarn:0   dfail:0   fail:0   skip:25 
fi-ivb-3770      total:202  pass:171  dwarn:0   dfail:0   fail:0   skip:30 
fi-skl-6260u     total:244  pass:230  dwarn:0   dfail:0   fail:0   skip:14 
fi-skl-6700hq    total:244  pass:221  dwarn:0   dfail:0   fail:1   skip:22 
fi-skl-6700k     total:244  pass:219  dwarn:1   dfail:0   fail:0   skip:24 
fi-snb-2520m     total:244  pass:208  dwarn:0   dfail:0   fail:0   skip:36 
fi-snb-2600      total:244  pass:207  dwarn:0   dfail:0   fail:0   skip:37 

Results at /archive/results/CI_IGT_test/Patchwork_2528/

208290026552464713d3897ab5d649f4445d5513 drm-intel-nightly: 2016y-09m-13d-14h-45m-32s UTC integration manifest
422c489 drm/i915: Support explicit fencing for execbuf
f78943a drm/i915: Enable userspace to opt-out of implicit fencing
93acb19 drm/i915: Enable multiple timelines
7b8232a drm/i915: Reserve space in the global seqno during request allocation
c972275 drm/i915: Create a unique name for the context
6c8fa4a drm/i915: Move the global sync optimisation to the timeline
27f5fd8 drm/i915: Defer request emission
e0e3346 drm/i915: Record space required for request emission
6cbf7e2 drm/i915: Introduce a global_seqno for each request
279611d drm/i915: Wait first for submission, before waiting for request completion
ec2ccb1 drm/i915: Combine seqno + tracking into a global timeline struct
c5ca7a3 drm/i915: Restore nonblocking awaits for modesetting
435dc25 drm: Add reference counting to drm_atomic_state
bc1da87 drm/i915: Move GEM activity tracking into a common struct reservation_object
354aa03 drm/i915: Remove unused i915_gem_active_wait() in favour of _unlocked()
39f214d drm/i915: Rearrange i915_wait_request() accounting with callers
459763e drm/i915: Allow i915_sw_fence_await_sw_fence() to allocate
83aaab6 drm/i915: Support asynchronous waits on struct fence from i915_gem_request

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 05/18] drm/i915: Move GEM activity tracking into a common struct reservation_object
  2016-09-14  6:52 ` [PATCH 05/18] drm/i915: Move GEM activity tracking into a common struct reservation_object Chris Wilson
@ 2016-09-14  9:44   ` Joonas Lahtinen
  2016-09-14 17:35     ` Chris Wilson
  2016-09-16 11:40   ` Chris Wilson
  1 sibling, 1 reply; 51+ messages in thread
From: Joonas Lahtinen @ 2016-09-14  9:44 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Daniel Vetter

On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
> In preparation to support many distinct timelines, we need to expand the
> activity tracking on the GEM object to handle more than just a request
> per engine. We already use the struct reservation_object on the dma-buf
> to handle many fence contexts, so integrating that into the GEM object
> itself is the preferred solution. (For example, we can now share the same
> reservation_object between every consumer/producer using this buffer and
> skip the manual import/export via dma-buf.)
> 
> Caveats:

I'd add comments noting which patch in the series addresses each
introduced problem, which are fixable in the future, and which are
taken as "a permanent hit" for achieving multiple timelines, with a
bit of reasoning for each (currently only a few points include some of
this).

>  static inline struct drm_i915_gem_object *
> @@ -2347,35 +2341,10 @@ i915_gem_object_has_struct_page(const struct drm_i915_gem_object *obj)
>  	return obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE;
>  }
>  
> -static inline unsigned long
> -i915_gem_object_get_active(const struct drm_i915_gem_object *obj)
> -{
> -	return (obj->flags >> I915_BO_ACTIVE_SHIFT) & I915_BO_ACTIVE_MASK;
> -}
> -
>  static inline bool
>  i915_gem_object_is_active(const struct drm_i915_gem_object *obj)
>  {
> -	return i915_gem_object_get_active(obj);
> -}
> -
> -static inline void
> -i915_gem_object_set_active(struct drm_i915_gem_object *obj, int engine)
> -{
> -	obj->flags |= BIT(engine + I915_BO_ACTIVE_SHIFT);
> -}
> -
> -static inline void
> -i915_gem_object_clear_active(struct drm_i915_gem_object *obj, int engine)
> -{
> -	obj->flags &= ~BIT(engine + I915_BO_ACTIVE_SHIFT);
> -}
> -
> -static inline bool
> -i915_gem_object_has_active_engine(const struct drm_i915_gem_object *obj,
> -				  int engine)
> -{
> -	return obj->flags & BIT(engine + I915_BO_ACTIVE_SHIFT);
> +	return obj->active_count;

our type is bool, so !!obj->active_count;

>  }
> 

<SNIP>

> 
>  static int
>  i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
>  				struct list_head *vmas)
>  {
> -	const unsigned int other_rings = eb_other_engines(req);
>  	struct i915_vma *vma;
>  	int ret;
>  
>  	list_for_each_entry(vma, vmas, exec_list) {
>  		struct drm_i915_gem_object *obj = vma->obj;
> -		struct reservation_object *resv;
> -
> -		if (obj->flags & other_rings) {
> -			ret = i915_gem_request_await_object
> -				(req, obj, obj->base.pending_write_domain);
> -			if (ret)
> -				return ret;
> -		}
>  
> -		resv = i915_gem_object_get_dmabuf_resv(obj);
> -		if (resv) {
> -			ret = i915_sw_fence_await_reservation
> -				(&req->submit, resv, &i915_fence_ops,
> -				 obj->base.pending_write_domain, 10*HZ,
> -				 GFP_KERNEL | __GFP_NOWARN);
> -			if (ret < 0)
> -				return ret;
> -		}
> +		ret = i915_gem_request_await_object
> +			(req, obj, obj->base.pending_write_domain);

I know it was previously like this, but I'm not sure I agree on this
style at all.

> @@ -11935,17 +11932,8 @@ static bool use_mmio_flip(struct intel_engine_cs *engine,
>  
>  	if (i915.use_mmio_flip < 0)
>  		return false;
> -	else if (i915.use_mmio_flip > 0)
> -		return true;
> -	else if (i915.enable_execlists)
> -		return true;
>  
> -	resv = i915_gem_object_get_dmabuf_resv(obj);
> -	if (resv && !reservation_object_test_signaled_rcu(resv, false))
> -		return true;
> -
> -	return engine != i915_gem_active_get_engine(&obj->last_write,
> -						    &obj->base.dev->struct_mutex);
> +	return true;

	return i915.use_mmio_flip >= 0; // ?

> @@ -860,39 +860,6 @@ struct drm_i915_gem_busy {
>  	 * long as no new GPU commands are executed upon it). Due to the
>  	 * asynchronous nature of the hardware, an object reported
>  	 * as busy may become idle before the ioctl is completed.
> -	 *
> -	 * Furthermore, if the object is busy, which engine is busy is only
> -	 * provided as a guide. There are race conditions which prevent the
> -	 * report of which engines are busy from being always accurate.
> -	 * However, the converse is not true. If the object is idle, the
> -	 * result of the ioctl, that all engines are idle, is accurate.
> -	 *
> -	 * The returned dword is split into two fields to indicate both
> -	 * the engines on which the object is being read, and the
> -	 * engine on which it is currently being written (if any).
> -	 *
> -	 * The low word (bits 0:15) indicate if the object is being written
> -	 * to by any engine (there can only be one, as the GEM implicit
> -	 * synchronisation rules force writes to be serialised). Only the
> -	 * engine for the last write is reported.
> -	 *
> -	 * The high word (bits 16:31) are a bitmask of which engines are
> -	 * currently reading from the object. Multiple engines may be
> -	 * reading from the object simultaneously.
> -	 *
> -	 * The value of each engine is the same as specified in the
> -	 * EXECBUFFER2 ioctl, i.e. I915_EXEC_RENDER, I915_EXEC_BSD etc.
> -	 * Note I915_EXEC_DEFAULT is a symbolic value and is mapped to
> -	 * the I915_EXEC_RENDER engine for execution, and so it is never
> -	 * reported as active itself. Some hardware may have parallel
> -	 * execution engines, e.g. multiple media engines, which are
> -	 * mapped to the same identifier in the EXECBUFFER2 ioctl and
> -	 * so are not separately reported for busyness.
> -	 *
> -	 * Caveat emptor:
> -	 * Only the boolean result of this query is reliable; that is whether
> -	 * the object is idle or busy. The report of which engines are busy
> -	 * should be only used as a heuristic.
>  	 */

Daniel to Ack this ABI change.

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Double check by somebody on the plane code would not hurt.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 11/18] drm/i915: Record space required for request emission
  2016-09-14  6:52 ` [PATCH 11/18] drm/i915: Record space required for request emission Chris Wilson
@ 2016-09-14 13:30   ` Tvrtko Ursulin
  2016-09-14 17:33     ` Chris Wilson
  2016-09-19 10:47   ` Joonas Lahtinen
  1 sibling, 1 reply; 51+ messages in thread
From: Tvrtko Ursulin @ 2016-09-14 13:30 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 14/09/2016 07:52, Chris Wilson wrote:
> In the next patch, we will use deferred request emission. That requires
> reserving sufficient space in the ringbuffer to emit the request, which
> first requires us to know how large the request is.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem_request.c |  1 +
>   drivers/gpu/drm/i915/intel_lrc.c        |  6 ++++++
>   drivers/gpu/drm/i915/intel_ringbuffer.c | 29 +++++++++++++++++++++++++++--
>   drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
>   4 files changed, 35 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index ebc2feba3e50..9a735e2d5aea 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -425,6 +425,7 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
>   	 * away, e.g. because a GPU scheduler has deferred it.
>   	 */
>   	req->reserved_space = MIN_SPACE_FOR_ADD_REQUEST;
> +	GEM_BUG_ON(req->reserved_space < engine->emit_request_sz);
>   
>   	if (i915.enable_execlists)
>   		ret = intel_logical_ring_alloc_request_extras(req);
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 00fcf36ba919..7e944a0acc10 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1572,6 +1572,8 @@ static int gen8_emit_request(struct drm_i915_gem_request *request)
>   	return intel_logical_ring_advance(request);
>   }
>   
> +static const int gen8_emit_request_sz = 6 + WA_TAIL_DWORDS;
> +
>   static int gen8_emit_request_render(struct drm_i915_gem_request *request)
>   {
>   	struct intel_ring *ring = request->ring;
> @@ -1603,6 +1605,8 @@ static int gen8_emit_request_render(struct drm_i915_gem_request *request)
>   	return intel_logical_ring_advance(request);
>   }
>   
> +static const int gen8_emit_request_render_sz = 8 + WA_TAIL_DWORDS;
> +
>   static int gen8_init_rcs_context(struct drm_i915_gem_request *req)
>   {
>   	int ret;
> @@ -1677,6 +1681,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>   	engine->reset_hw = reset_common_ring;
>   	engine->emit_flush = gen8_emit_flush;
>   	engine->emit_request = gen8_emit_request;
> +	engine->emit_request_sz = gen8_emit_request_sz;
>   	engine->submit_request = execlists_submit_request;
>   
>   	engine->irq_enable = gen8_logical_ring_enable_irq;
> @@ -1799,6 +1804,7 @@ int logical_render_ring_init(struct intel_engine_cs *engine)
>   	engine->init_context = gen8_init_rcs_context;
>   	engine->emit_flush = gen8_emit_flush_render;
>   	engine->emit_request = gen8_emit_request_render;
> +	engine->emit_request_sz = gen8_emit_request_render_sz;
>   
>   	ret = intel_engine_create_scratch(engine, 4096);
>   	if (ret)
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 597e35c9b699..c8c9ad40fd93 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1405,6 +1405,8 @@ static int i9xx_emit_request(struct drm_i915_gem_request *req)
>   	return 0;
>   }
>   
> +static const int i9xx_emit_request_sz = 4;
> +
>   /**
>    * gen6_sema_emit_request - Update the semaphore mailbox registers
>    *
> @@ -1458,6 +1460,8 @@ static int gen8_render_emit_request(struct drm_i915_gem_request *req)
>   	return 0;
>   }
>   
> +static const int gen8_render_emit_request_sz = 8;
> +
>   /**
>    * intel_ring_sync - sync the waiter to the signaller on seqno
>    *
> @@ -2677,8 +2681,21 @@ static void intel_ring_default_vfuncs(struct drm_i915_private *dev_priv,
>   	engine->reset_hw = reset_ring_common;
>   
>   	engine->emit_request = i9xx_emit_request;
> -	if (i915.semaphores)
> +	engine->emit_request_sz = i9xx_emit_request_sz;
> +	if (i915.semaphores) {
> +		int num_rings;
> +
>   		engine->emit_request = gen6_sema_emit_request;
> +
> +		num_rings = hweight32(INTEL_INFO(dev_priv)->ring_mask) - 1;

You can use INTEL_INFO(dev_priv)->num_rings instead of hweight32.
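
I.e.:

	num_rings = INTEL_INFO(dev_priv)->num_rings - 1;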

> +		if (INTEL_GEN(dev_priv) >= 8) {
> +			engine->emit_request_sz += num_rings * 6;
> +		} else {
> +			engine->emit_request_sz += num_rings * 3;
> +			if (num_rings & 1)
> +				engine->emit_request_sz++;
> +		}
> +	}
>   	engine->submit_request = i9xx_submit_request;
>   
>   	if (INTEL_GEN(dev_priv) >= 8)
> @@ -2706,9 +2723,17 @@ int intel_init_render_ring_buffer(struct intel_engine_cs *engine)
>   	if (INTEL_GEN(dev_priv) >= 8) {
>   		engine->init_context = intel_rcs_ctx_init;
>   		engine->emit_request = gen8_render_emit_request;
> +		engine->emit_request_sz = gen8_render_emit_request_sz;
>   		engine->emit_flush = gen8_render_ring_flush;
> -		if (i915.semaphores)
> +		if (i915.semaphores) {
> +			int num_rings;
> +
>   			engine->semaphore.signal = gen8_rcs_signal;
> +
> +			num_rings =
> +				hweight32(INTEL_INFO(dev_priv)->ring_mask) - 1;

Here as well.

> +			engine->emit_request_sz += num_rings * 6;
> +		}
>   	} else if (INTEL_GEN(dev_priv) >= 6) {
>   		engine->init_context = intel_rcs_ctx_init;
>   		engine->emit_flush = gen7_render_ring_flush;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index d6f80b3d66b4..47ad19f5441d 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -226,6 +226,7 @@ struct intel_engine_cs {
>   #define I915_DISPATCH_PINNED BIT(1)
>   #define I915_DISPATCH_RS     BIT(2)
>   	int		(*emit_request)(struct drm_i915_gem_request *req);
> +	int		emit_request_sz;
>   
>   	/* Pass the request to the hardware queue (e.g. directly into
>   	 * the legacy ringbuffer or to the end of an execlist).

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 08/18] drm/i915: Combine seqno + tracking into a global timeline struct
  2016-09-14  6:52 ` [PATCH 08/18] drm/i915: Combine seqno + tracking into a global timeline struct Chris Wilson
@ 2016-09-14 15:42   ` Joonas Lahtinen
  0 siblings, 0 replies; 51+ messages in thread
From: Joonas Lahtinen @ 2016-09-14 15:42 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
>  i915_next_seqno_get(void *data, u64 *val)
>  {
>  	struct drm_i915_private *dev_priv = data;
> -	int ret;
> -
> -	ret = mutex_lock_interruptible(&dev_priv->drm.struct_mutex);
> -	if (ret)
> -		return ret;
> -
> -	*val = dev_priv->next_seqno;
> -	mutex_unlock(&dev_priv->drm.struct_mutex);
>  
> +	*val = dev_priv->gt.global_timeline.next_seqno;

This should be marked as unlocked access somehow.
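For example, a READ_ONCE() would at least annotate the lockless read
(just a sketch):

	*val = READ_ONCE(dev_priv->gt.global_timeline.next_seqno);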

> +static int wait_for_timeline(struct i915_gem_timeline *tl, unsigned int flags)
>  {
> -	struct intel_engine_cs *engine;
> -	int ret;
> +	int ret, i;
>  
> -	for_each_engine(engine, dev_priv) {
> -		if (engine->last_context == NULL)
> -			continue;
> +	for (i = 0; i < ARRAY_SIZE(tl->engine); i++) {

Let's untangle the headers and not manually roll these loops?

> @@ -4475,13 +4489,24 @@ i915_gem_load_init(struct drm_device *dev)
>  				  SLAB_RECLAIM_ACCOUNT |
>  				  SLAB_DESTROY_BY_RCU,
>  				  NULL);
> +	if (!dev_priv->requests) {
> +		err = -ENOMEM;
> +		goto err_vmas;
> +	}
> +
> +	mutex_lock(&dev_priv->drm.struct_mutex);

Hngg, locking in the initializer; maybe we should have our own variant
of assert_held_struct_mutex (or, as you suggested, have no struct_mutex!)
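e.g. a trivial helper, just a sketch of what I mean (name made up):

static inline void assert_struct_mutex_held(struct drm_i915_private *i915)
{
	lockdep_assert_held(&i915->drm.struct_mutex);
}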

> +	INIT_LIST_HEAD(&dev_priv->gt.timelines);
> +	err = i915_gem_timeline_init(dev_priv,
> +				     &dev_priv->gt.global_timeline,
> +				     "[execution]");

"[global]" would not be clearer to give better mapping between
code/debug?

> 
> -static int i915_gem_get_seqno(struct drm_i915_private *dev_priv, u32 *seqno)
> +static int i915_gem_get_global_seqno(struct drm_i915_private *dev_priv,
> +				     u32 *seqno)
>  {
> +	struct i915_gem_timeline *tl = &dev_priv->gt.global_timeline;

*gtl? to indicate the global one.

> +struct intel_timeline {
> +	u64 fence_context;
> +	u32 last_submitted_seqno;
> +
> +	/**
> +	 * List of breadcrumbs associated with GPU requests currently
> +	 * outstanding.
> +	 */
> +	struct list_head requests;
> +
> +	/* An RCU guarded pointer to the last request. No reference is
> +	 * held to the request, users must carefully acquire a reference to
> +	 * the request using i915_gem_active_get_request_rcu(), or hold the
> +	 * struct_mutex.
> +	 */
> +	struct i915_gem_active last_request;

RCU guarded pointer sounds off when we're speaking of an object.
 
> 
> @@ -217,8 +217,7 @@ void intel_engine_init_hangcheck(struct intel_engine_cs *engine)
>  
>  static void intel_engine_init_requests(struct intel_engine_cs *engine)
>  {
> -	init_request_active(&engine->last_request, NULL);
> -	INIT_LIST_HEAD(&engine->request_list);
> +	engine->timeline = &engine->i915->gt.global_timeline.engine[engine->id];

The function name is no longer accurate, please update.

> @@ -141,7 +142,6 @@ struct intel_engine_cs {
>  		VCS2,	/* Keep instances of the same type engine together. */
>  		VECS
>  	} id;
> -#define I915_NUM_ENGINES 5
>  #define _VCS(n) (VCS + (n))
>  	unsigned int exec_id;
>  	enum intel_engine_hw_id {
> @@ -152,10 +152,10 @@ struct intel_engine_cs {
>  		VCS2_HW
>  	} hw_id;
>  	enum intel_engine_hw_id guc_id; /* XXX same as hw_id? */
> -	u64 fence_context;
>  	u32		mmio_base;
>  	unsigned int irq_shift;
>  	struct intel_ring *buffer;
> +	struct intel_timeline *timeline;

I don't see why this couldn't be a non-pointer member?

Again a rather big patch; with the above addressed:

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 11/18] drm/i915: Record space required for request emission
  2016-09-14 13:30   ` Tvrtko Ursulin
@ 2016-09-14 17:33     ` Chris Wilson
  2016-09-15  8:59       ` Tvrtko Ursulin
  0 siblings, 1 reply; 51+ messages in thread
From: Chris Wilson @ 2016-09-14 17:33 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Wed, Sep 14, 2016 at 02:30:20PM +0100, Tvrtko Ursulin wrote:
> 
> On 14/09/2016 07:52, Chris Wilson wrote:
> >In the next patch, we will use deferred request emission. That requires
> >reserving sufficient space in the ringbuffer to emit the request, which
> >first requires us to know how large the request is.
> >
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >---
> >  drivers/gpu/drm/i915/i915_gem_request.c |  1 +
> >  drivers/gpu/drm/i915/intel_lrc.c        |  6 ++++++
> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 29 +++++++++++++++++++++++++++--
> >  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
> >  4 files changed, 35 insertions(+), 2 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> >index ebc2feba3e50..9a735e2d5aea 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_request.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_request.c
> >@@ -425,6 +425,7 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
> >  	 * away, e.g. because a GPU scheduler has deferred it.
> >  	 */
> >  	req->reserved_space = MIN_SPACE_FOR_ADD_REQUEST;
> >+	GEM_BUG_ON(req->reserved_space < engine->emit_request_sz);
> >  	if (i915.enable_execlists)
> >  		ret = intel_logical_ring_alloc_request_extras(req);
> >diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> >index 00fcf36ba919..7e944a0acc10 100644
> >--- a/drivers/gpu/drm/i915/intel_lrc.c
> >+++ b/drivers/gpu/drm/i915/intel_lrc.c
> >@@ -1572,6 +1572,8 @@ static int gen8_emit_request(struct drm_i915_gem_request *request)
> >  	return intel_logical_ring_advance(request);
> >  }
> >+static const int gen8_emit_request_sz = 6 + WA_TAIL_DWORDS;
> >+
> >  static int gen8_emit_request_render(struct drm_i915_gem_request *request)
> >  {
> >  	struct intel_ring *ring = request->ring;
> >@@ -1603,6 +1605,8 @@ static int gen8_emit_request_render(struct drm_i915_gem_request *request)
> >  	return intel_logical_ring_advance(request);
> >  }
> >+static const int gen8_emit_request_render_sz = 8 + WA_TAIL_DWORDS;
> >+
> >  static int gen8_init_rcs_context(struct drm_i915_gem_request *req)
> >  {
> >  	int ret;
> >@@ -1677,6 +1681,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
> >  	engine->reset_hw = reset_common_ring;
> >  	engine->emit_flush = gen8_emit_flush;
> >  	engine->emit_request = gen8_emit_request;
> >+	engine->emit_request_sz = gen8_emit_request_sz;
> >  	engine->submit_request = execlists_submit_request;
> >  	engine->irq_enable = gen8_logical_ring_enable_irq;
> >@@ -1799,6 +1804,7 @@ int logical_render_ring_init(struct intel_engine_cs *engine)
> >  	engine->init_context = gen8_init_rcs_context;
> >  	engine->emit_flush = gen8_emit_flush_render;
> >  	engine->emit_request = gen8_emit_request_render;
> >+	engine->emit_request_sz = gen8_emit_request_render_sz;
> >  	ret = intel_engine_create_scratch(engine, 4096);
> >  	if (ret)
> >diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> >index 597e35c9b699..c8c9ad40fd93 100644
> >--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> >+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> >@@ -1405,6 +1405,8 @@ static int i9xx_emit_request(struct drm_i915_gem_request *req)
> >  	return 0;
> >  }
> >+static const int i9xx_emit_request_sz = 4;
> >+
> >  /**
> >   * gen6_sema_emit_request - Update the semaphore mailbox registers
> >   *
> >@@ -1458,6 +1460,8 @@ static int gen8_render_emit_request(struct drm_i915_gem_request *req)
> >  	return 0;
> >  }
> >+static const int gen8_render_emit_request_sz = 8;
> >+
> >  /**
> >   * intel_ring_sync - sync the waiter to the signaller on seqno
> >   *
> >@@ -2677,8 +2681,21 @@ static void intel_ring_default_vfuncs(struct drm_i915_private *dev_priv,
> >  	engine->reset_hw = reset_ring_common;
> >  	engine->emit_request = i9xx_emit_request;
> >-	if (i915.semaphores)
> >+	engine->emit_request_sz = i9xx_emit_request_sz;
> >+	if (i915.semaphores) {
> >+		int num_rings;
> >+
> >  		engine->emit_request = gen6_sema_emit_request;
> >+
> >+		num_rings = hweight32(INTEL_INFO(dev_priv)->ring_mask) - 1;
> 
> You can use INTEL_INFO(dev_priv)->num_rings instead of hweight32.

I thought info->num_rings was set afterwards? (And num_rings may be an
overestimate here :(
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 05/18] drm/i915: Move GEM activity tracking into a common struct reservation_object
  2016-09-14  9:44   ` Joonas Lahtinen
@ 2016-09-14 17:35     ` Chris Wilson
  2016-09-15  9:38       ` Dave Gordon
  0 siblings, 1 reply; 51+ messages in thread
From: Chris Wilson @ 2016-09-14 17:35 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx, Daniel Vetter

On Wed, Sep 14, 2016 at 12:44:04PM +0300, Joonas Lahtinen wrote:
> On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
> > -static inline bool
> > -i915_gem_object_has_active_engine(const struct drm_i915_gem_object *obj,
> > -				  int engine)
> > -{
> > -	return obj->flags & BIT(engine + I915_BO_ACTIVE_SHIFT);
> > +	return obj->active_count;
> 
> our type is bool, so !!obj->active_count;

That's the beauty of using bool: the !! is implied by the
integer-to-bool conversion.
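For illustration, these two end up with the same value:

	bool a = obj->active_count;   /* implicit conversion to 0 or 1 */
	bool b = !!obj->active_count; /* explicit double negation */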
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 11/18] drm/i915: Record space required for request emission
  2016-09-14 17:33     ` Chris Wilson
@ 2016-09-15  8:59       ` Tvrtko Ursulin
  0 siblings, 0 replies; 51+ messages in thread
From: Tvrtko Ursulin @ 2016-09-15  8:59 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 14/09/2016 18:33, Chris Wilson wrote:
> On Wed, Sep 14, 2016 at 02:30:20PM +0100, Tvrtko Ursulin wrote:
>> On 14/09/2016 07:52, Chris Wilson wrote:
>>> In the next patch, we will use deferred request emission. That requires
>>> reserving sufficient space in the ringbuffer to emit the request, which
>>> first requires us to know how large the request is.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> ---
>>>   drivers/gpu/drm/i915/i915_gem_request.c |  1 +
>>>   drivers/gpu/drm/i915/intel_lrc.c        |  6 ++++++
>>>   drivers/gpu/drm/i915/intel_ringbuffer.c | 29 +++++++++++++++++++++++++++--
>>>   drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
>>>   4 files changed, 35 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
>>> index ebc2feba3e50..9a735e2d5aea 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_request.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
>>> @@ -425,6 +425,7 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
>>>   	 * away, e.g. because a GPU scheduler has deferred it.
>>>   	 */
>>>   	req->reserved_space = MIN_SPACE_FOR_ADD_REQUEST;
>>> +	GEM_BUG_ON(req->reserved_space < engine->emit_request_sz);
>>>   	if (i915.enable_execlists)
>>>   		ret = intel_logical_ring_alloc_request_extras(req);
>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>> index 00fcf36ba919..7e944a0acc10 100644
>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>> @@ -1572,6 +1572,8 @@ static int gen8_emit_request(struct drm_i915_gem_request *request)
>>>   	return intel_logical_ring_advance(request);
>>>   }
>>> +static const int gen8_emit_request_sz = 6 + WA_TAIL_DWORDS;
>>> +
>>>   static int gen8_emit_request_render(struct drm_i915_gem_request *request)
>>>   {
>>>   	struct intel_ring *ring = request->ring;
>>> @@ -1603,6 +1605,8 @@ static int gen8_emit_request_render(struct drm_i915_gem_request *request)
>>>   	return intel_logical_ring_advance(request);
>>>   }
>>> +static const int gen8_emit_request_render_sz = 8 + WA_TAIL_DWORDS;
>>> +
>>>   static int gen8_init_rcs_context(struct drm_i915_gem_request *req)
>>>   {
>>>   	int ret;
>>> @@ -1677,6 +1681,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>>>   	engine->reset_hw = reset_common_ring;
>>>   	engine->emit_flush = gen8_emit_flush;
>>>   	engine->emit_request = gen8_emit_request;
>>> +	engine->emit_request_sz = gen8_emit_request_sz;
>>>   	engine->submit_request = execlists_submit_request;
>>>   	engine->irq_enable = gen8_logical_ring_enable_irq;
>>> @@ -1799,6 +1804,7 @@ int logical_render_ring_init(struct intel_engine_cs *engine)
>>>   	engine->init_context = gen8_init_rcs_context;
>>>   	engine->emit_flush = gen8_emit_flush_render;
>>>   	engine->emit_request = gen8_emit_request_render;
>>> +	engine->emit_request_sz = gen8_emit_request_render_sz;
>>>   	ret = intel_engine_create_scratch(engine, 4096);
>>>   	if (ret)
>>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> index 597e35c9b699..c8c9ad40fd93 100644
>>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> @@ -1405,6 +1405,8 @@ static int i9xx_emit_request(struct drm_i915_gem_request *req)
>>>   	return 0;
>>>   }
>>> +static const int i9xx_emit_request_sz = 4;
>>> +
>>>   /**
>>>    * gen6_sema_emit_request - Update the semaphore mailbox registers
>>>    *
>>> @@ -1458,6 +1460,8 @@ static int gen8_render_emit_request(struct drm_i915_gem_request *req)
>>>   	return 0;
>>>   }
>>> +static const int gen8_render_emit_request_sz = 8;
>>> +
>>>   /**
>>>    * intel_ring_sync - sync the waiter to the signaller on seqno
>>>    *
>>> @@ -2677,8 +2681,21 @@ static void intel_ring_default_vfuncs(struct drm_i915_private *dev_priv,
>>>   	engine->reset_hw = reset_ring_common;
>>>   	engine->emit_request = i9xx_emit_request;
>>> -	if (i915.semaphores)
>>> +	engine->emit_request_sz = i9xx_emit_request_sz;
>>> +	if (i915.semaphores) {
>>> +		int num_rings;
>>> +
>>>   		engine->emit_request = gen6_sema_emit_request;
>>> +
>>> +		num_rings = hweight32(INTEL_INFO(dev_priv)->ring_mask) - 1;
>> You can use INTEL_INFO(dev_priv)->num_rings instead of hweight32.
> I thought info->num_rings was set afterwards? (And num_rings may be an
> overestimate here :(

You are correct, my oversight. Maybe it should have been num_rings and 
num_initialized_rings.

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 05/18] drm/i915: Move GEM activity tracking into a common struct reservation_object
  2016-09-14 17:35     ` Chris Wilson
@ 2016-09-15  9:38       ` Dave Gordon
  2016-09-15  9:55         ` Jani Nikula
  0 siblings, 1 reply; 51+ messages in thread
From: Dave Gordon @ 2016-09-15  9:38 UTC (permalink / raw)
  To: Chris Wilson, Joonas Lahtinen, intel-gfx, Mika Kuoppala, Daniel Vetter

On 14/09/16 18:35, Chris Wilson wrote:
> On Wed, Sep 14, 2016 at 12:44:04PM +0300, Joonas Lahtinen wrote:
>> On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
>>> -static inline bool
>>> -i915_gem_object_has_active_engine(const struct drm_i915_gem_object *obj,
>>> -				  int engine)
>>> -{
>>> -	return obj->flags & BIT(engine + I915_BO_ACTIVE_SHIFT);
>>> +	return obj->active_count;
>>
>> our type is bool, so !!obj->active_count;
>
> That's the beauty of using bool, !! is implied on the (bool)integer
> conversion.
> -Chris

It's a gcc extension, though, so does it work on clang?

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 05/18] drm/i915: Move GEM activity tracking into a common struct reservation_object
  2016-09-15  9:38       ` Dave Gordon
@ 2016-09-15  9:55         ` Jani Nikula
  0 siblings, 0 replies; 51+ messages in thread
From: Jani Nikula @ 2016-09-15  9:55 UTC (permalink / raw)
  To: Dave Gordon, Chris Wilson, Joonas Lahtinen, intel-gfx,
	Mika Kuoppala, Daniel Vetter

On Thu, 15 Sep 2016, Dave Gordon <david.s.gordon@intel.com> wrote:
> On 14/09/16 18:35, Chris Wilson wrote:
>> On Wed, Sep 14, 2016 at 12:44:04PM +0300, Joonas Lahtinen wrote:
>>> On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
>>>> -static inline bool
>>>> -i915_gem_object_has_active_engine(const struct drm_i915_gem_object *obj,
>>>> -				  int engine)
>>>> -{
>>>> -	return obj->flags & BIT(engine + I915_BO_ACTIVE_SHIFT);
>>>> +	return obj->active_count;
>>>
>>> our type is bool, so !!obj->active_count;
>>
>> That's the beauty of using bool, !! is implied on the (bool)integer
>> conversion.
>> -Chris
>
> It's a gcc extension, though, so does it work on clang?

It's in the standard (C99 6.3.1.2):

"When any scalar value is converted to _Bool, the result is 0 if the
value compares equal to 0; otherwise, the result is 1."

BR,
Jani.


>
> .Dave.
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Jani Nikula, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 05/18] drm/i915: Move GEM activity tracking into a common struct reservation_object
  2016-09-14  6:52 ` [PATCH 05/18] drm/i915: Move GEM activity tracking into a common struct reservation_object Chris Wilson
  2016-09-14  9:44   ` Joonas Lahtinen
@ 2016-09-16 11:40   ` Chris Wilson
  1 sibling, 0 replies; 51+ messages in thread
From: Chris Wilson @ 2016-09-16 11:40 UTC (permalink / raw)
  To: intel-gfx

On Wed, Sep 14, 2016 at 07:52:37AM +0100, Chris Wilson wrote:
> In preparation to support many distinct timelines, we need to expand the
> activity tracking on the GEM object to handle more than just a request
> per engine. We already use the struct reservation_object on the dma-buf
> to handle many fence contexts, so integrating that into the GEM object
> itself is the preferred solution. (For example, we can now share the same
> reservation_object between every consumer/producer using this buffer and
> skip the manual import/export via dma-buf.)
> 
> Caveats:
> 
>  * busy-ioctl: the modern interface is completely thrown away,
> regressing us back to before
> 
> commit e9808edd98679680804dfbc42c5ee8f1aa91f617 [v3.10]
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Wed Jul 4 12:25:08 2012 +0100
> 
>     drm/i915: Return a mask of the active rings in the high word of busy_ioctl

There is an alternative here, as we can walk through the obj->resv and
convert *i915* fences to their respective read/write engines.

However, this means that busy-ioctl and wait-ioctl(.timeout=0) are no
longer equivalent, as busy-ioctl may report an object as idle that
still has external rendering (and so wait-ioctl, set-to-domain-ioctl,
pread, pwrite may stall).

Walking the resv object is also more expensive than doing a simple test.
Though that can be mitigated later by opencoding the test (once we can
do the whole busy-ioctl under only RCU protection, that makes more
sense). But it makes busy-ioctl slower than ideal, and busy-ioctl is
high frequency (so long as we keep avoiding struct_mutex, the difference
is likely only visible in the microbenchmarks).

Pro: we keep the finesse of being able to report which engines are busy
Con: not equivalent to wait-ioctl, polling the dmabuf, etc.

That we have alternative mechanisms to check may be considered a good
thing, i.e. we can migrate to using wait-ioctl for the general "am I
going to stall" test, and using busy-ioctl for engine selection. I think
I am leaning towards keeping the ABI more or less intact for the time
being. Though we are getting pretty close to its limit on NUM_ENGINES.

int
i915_gem_busy_ioctl(struct drm_device *dev, void *data,
                    struct drm_file *file)
{
        struct drm_i915_gem_busy *args = data;
        struct drm_i915_gem_object *obj;
        struct fence **shared, *excl;
        unsigned int count, i;
        int ret;

        ret = -ENOENT;
        obj = i915_gem_object_lookup(file, args->handle);
        if (obj) {
                ret = reservation_object_get_fences_rcu(obj->resv,
                                                        &excl, &count, &shared);
                i915_gem_object_put_unlocked(obj);
        }
        if (unlikely(ret))
                return ret;

         /* A discrepancy here is that we do not report the status of
          * non-i915 fences, i.e. even though we may report the object as idle,
          * a call to set-domain may still stall waiting for foreign rendering.
          * This also means that wait-ioctl may report an object as busy,
          * where busy-ioctl considers it idle.
          *
          * We trade the ability to warn of foreign fences to report on which
          * i915 engines are active for the object.
          *
          * Alternatively, we can trade that extra information on read/write
          * activity with
          *     args->busy =
          *             !reservation_object_test_signaled_rcu(obj->resv, true);
          * to report the overall busyness. This is what the wait-ioctl does.
          */
        args->busy = 0;

        /* Translate shared fences to READ set of engines */
        for (i = 0; i < count; i++) {
                args->busy |= busy_check_reader(shared[i]);
                fence_put(shared[i]);
        }
        kfree(shared);

        /* Translate the exclusive fence to the READ *and* WRITE engine */
        if (excl) {
                args->busy |= busy_check_writer(excl);
                fence_put(excl);
        }

        return 0;
}
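
(busy_check_reader/busy_check_writer above are hypothetical helpers,
not shown here; they might look something along these lines, mapping
an i915 fence onto the read/write halves of args->busy:)

static u32 __busy_read_flag(unsigned int id)
{
        /* active readers are reported in the high word */
        return 0x10000 << id;
}

static u32 busy_check_reader(struct fence *fence)
{
        /* a real version would also skip already-signaled fences */
        if (!fence_is_i915(fence))
                return 0; /* foreign fences are not reported */

        return __busy_read_flag(to_request(fence)->engine->exec_id);
}

static u32 busy_check_writer(struct fence *fence)
{
        if (!fence_is_i915(fence))
                return 0;

        /* an active writer is also reported as an active reader */
        return to_request(fence)->engine->exec_id |
               __busy_read_flag(to_request(fence)->engine->exec_id);
}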

-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 09/18] drm/i915: Wait first for submission, before waiting for request completion
  2016-09-14  6:52 ` [PATCH 09/18] drm/i915: Wait first for submission, before waiting for request completion Chris Wilson
@ 2016-09-19  8:59   ` Joonas Lahtinen
  0 siblings, 0 replies; 51+ messages in thread
From: Joonas Lahtinen @ 2016-09-19  8:59 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
> +static long
> +__i915_request_wait_for_submit(struct drm_i915_gem_request *request,
> +			       unsigned int flags,
> +			       long timeout)
> +{
> +	const int state = flags & I915_WAIT_INTERRUPTIBLE ?
> +		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;

const bool wait_locked = flags & I915_WAIT_LOCKED;

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 10/18] drm/i915: Introduce a global_seqno for each request
  2016-09-14  6:52 ` [PATCH 10/18] drm/i915: Introduce a global_seqno for each request Chris Wilson
@ 2016-09-19 10:36   ` Joonas Lahtinen
  0 siblings, 0 replies; 51+ messages in thread
From: Joonas Lahtinen @ 2016-09-19 10:36 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
> @@ -465,8 +466,15 @@ i915_gem_request_await_request(struct
> drm_i915_gem_request *to,
>  		return ret < 0 ? ret : 0;
>  	}
>  
> +	if (from->global_seqno == 0) {

Just use (!from->global_seqno) here too, for consistency.

Might be worth adding to the commit message that currently global_seqno
is still immediately assigned at creation.

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 11/18] drm/i915: Record space required for request emission
  2016-09-14  6:52 ` [PATCH 11/18] drm/i915: Record space required for request emission Chris Wilson
  2016-09-14 13:30   ` Tvrtko Ursulin
@ 2016-09-19 10:47   ` Joonas Lahtinen
  2016-09-19 11:32     ` Chris Wilson
  1 sibling, 1 reply; 51+ messages in thread
From: Joonas Lahtinen @ 2016-09-19 10:47 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
> @@ -1572,6 +1572,8 @@ static int gen8_emit_request(struct drm_i915_gem_request *request)
>  	return intel_logical_ring_advance(request);
>  }
>  
> +static const int gen8_emit_request_sz = 6 + WA_TAIL_DWORDS;

Could argue these to be #define by current convention.

> @@ -1677,6 +1681,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>  	engine->reset_hw = reset_common_ring;
>  	engine->emit_flush = gen8_emit_flush;
>  	engine->emit_request = gen8_emit_request;
> +	engine->emit_request_sz = gen8_emit_request_sz;

This assignment would then stand out better too, now it looks like a
bunch of function assignments.

> @@ -2677,8 +2681,21 @@ static void intel_ring_default_vfuncs(struct drm_i915_private *dev_priv,
>  	engine->reset_hw = reset_ring_common;
>  
>  	engine->emit_request = i9xx_emit_request;
> -	if (i915.semaphores)
> +	engine->emit_request_sz = i9xx_emit_request_sz;
> +	if (i915.semaphores) {
> +		int num_rings;

'initialized_rings' to differentiate as suggested by Tvrtko too.

> +
>  		engine->emit_request = gen6_sema_emit_request;
> +
> +		num_rings = hweight32(INTEL_INFO(dev_priv)->ring_mask) - 1;
> +		if (INTEL_GEN(dev_priv) >= 8) {
> +			engine->emit_request_sz += num_rings * 6;
> +		} else {
> +			engine->emit_request_sz += num_rings * 3;
> +			if (num_rings & 1)
> +				engine->emit_request_sz++;

Please do add a comment explaining this.
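e.g. something along these lines (my guess at the rationale):

	/* Each gen6/7 semaphore signal is 3 dwords; the ring tail
	 * must stay qword aligned, so pad the total up to an even
	 * number of dwords.
	 */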

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 01/18] drm/i915: Support asynchronous waits on struct fence from i915_gem_request
  2016-09-14  7:37   ` Joonas Lahtinen
@ 2016-09-19 11:26     ` Chris Wilson
  0 siblings, 0 replies; 51+ messages in thread
From: Chris Wilson @ 2016-09-19 11:26 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Wed, Sep 14, 2016 at 10:37:18AM +0300, Joonas Lahtinen wrote:
> On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
> > +	array = to_fence_array(fence);
> > +	for (i = 0; i < array->num_fences; i++) {
> > +		struct fence *child = array->fences[i];
> > +
> > +		if (fence_is_i915(child))
> > +			ret = i915_gem_request_await_request(req,
> > +							     to_request(child));
> > +		else
> > +			ret = i915_sw_fence_await_dma_fence(&req->submit,
> > +							    child, 10*HZ,
> > +							    GFP_KERNEL);
> 
> This chunk repeats from above, make it a small
> __i915_gem_request_await_fence function.

It's not a repeat. If it were, we could just recurse. Except we don't
know how deep the fence-array could go.

> > +		if (ret < 0)
> > +			return ret;
> 
> How about the failure case when we're half bound, no need to tear down?

We submit the partial request that doesn't execute anything more than
the context setup. The next request in the timeline will then inherit
the context setup from this request. Same rule as everything else
handling errors during request construction.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 11/18] drm/i915: Record space required for request emission
  2016-09-19 10:47   ` Joonas Lahtinen
@ 2016-09-19 11:32     ` Chris Wilson
  2016-09-19 16:09       ` Joonas Lahtinen
  0 siblings, 1 reply; 51+ messages in thread
From: Chris Wilson @ 2016-09-19 11:32 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Mon, Sep 19, 2016 at 01:47:30PM +0300, Joonas Lahtinen wrote:
> On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
> > @@ -2677,8 +2681,21 @@ static void intel_ring_default_vfuncs(struct drm_i915_private *dev_priv,
> >  	engine->reset_hw = reset_ring_common;
> >  
> >  	engine->emit_request = i9xx_emit_request;
> > -	if (i915.semaphores)
> > +	engine->emit_request_sz = i9xx_emit_request_sz;
> > +	if (i915.semaphores) {
> > +		int num_rings;
> 
> 'initialized_rings' to differentiate as suggested by Tvrtko too.

Tvrtko wasn't referring to this. The point is that we don't know
initialized_rings here, just the declared set of rings we think
the hardware is supposed to have.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 12/18] drm/i915: Defer request emission
  2016-09-14  6:52 ` [PATCH 12/18] drm/i915: Defer " Chris Wilson
@ 2016-09-19 12:06   ` Joonas Lahtinen
  0 siblings, 0 replies; 51+ messages in thread
From: Joonas Lahtinen @ 2016-09-19 12:06 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
> +static void gen8_emit_wa_tail(struct drm_i915_gem_request *request, u32 *out)
>  {
> -	struct intel_ring *ring = request->ring;
> -	int ret;
> -
> -	ret = intel_ring_begin(request, 6 + WA_TAIL_DWORDS);
> -	if (ret)
> -		return ret;
> +	out[0] = MI_NOOP;
> +	out[1] = MI_NOOP;

How about storing 'out' on the request and still using the emit
wrapper? I assume passing a separate variable might end up rather
fragile.
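e.g. something like this (a hypothetical sketch; a u32 *out cursor
stashed on the request does not exist yet):

static inline void request_emit(struct drm_i915_gem_request *req, u32 cmd)
{
	*req->out++ = cmd;
}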

> -static int gen8_rcs_signal(struct drm_i915_gem_request *req)
> +static u32 *gen8_rcs_signal(struct drm_i915_gem_request *req, u32 *out)
>  {
> -	struct intel_ring *ring = req->ring;
>  	struct drm_i915_private *dev_priv = req->i915;
>  	struct intel_engine_cs *waiter;
>  	enum intel_engine_id id;
> -	int ret, num_rings;
> -
> -	num_rings = INTEL_INFO(dev_priv)->num_rings;
> -	ret = intel_ring_begin(req, (num_rings-1) * 8);
> -	if (ret)
> -		return ret;
>  
>  	for_each_engine_id(waiter, dev_priv, id) {
>  		u64 gtt_offset = req->engine->semaphore.signal_ggtt[id];
>  		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
>  			continue;
>  
> -		intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(6));
> -		intel_ring_emit(ring,
> -				PIPE_CONTROL_GLOBAL_GTT_IVB |
> -				PIPE_CONTROL_QW_WRITE |
> -				PIPE_CONTROL_CS_STALL);
> -		intel_ring_emit(ring, lower_32_bits(gtt_offset));
> -		intel_ring_emit(ring, upper_32_bits(gtt_offset));
> -		intel_ring_emit(ring, req->global_seqno);
> -		intel_ring_emit(ring, 0);
> -		intel_ring_emit(ring,
> -				MI_SEMAPHORE_SIGNAL |
> -				MI_SEMAPHORE_TARGET(waiter->hw_id));
> -		intel_ring_emit(ring, 0);
> +		*out++ = GFX_OP_PIPE_CONTROL(6);
> +		*out++ = (PIPE_CONTROL_GLOBAL_GTT_IVB |
> +			  PIPE_CONTROL_QW_WRITE |
> +			  PIPE_CONTROL_CS_STALL);
> +		*out++ = lower_32_bits(gtt_offset);
> +		*out++ = upper_32_bits(gtt_offset);
> +		*out++ = req->global_seqno;
> +		*out++ = 0;
> +		*out++ = (MI_SEMAPHORE_SIGNAL |
> +			  MI_SEMAPHORE_TARGET(waiter->hw_id));
> +		*out++ = 0;

And this is mixing a different approach in... Pick one and convert
others to it. I like having the wrapper for easier grepping of emitted
commands, but YMMV.

> +static void gen6_sema_emit_request(struct drm_i915_gem_request *req,
> +				   u32 *out)
>  {
> -	int ret;
> -
> -	ret = req->engine->semaphore.signal(req);
> -	if (ret)
> -		return ret;
> -
> -	return i9xx_emit_request(req);
> +	return i9xx_emit_request(req, req->engine->semaphore.signal(req, out));

Not sure this chaining is beautiful; I'd split the signal onto its own
line of "out = ...".

Those addressed,

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 13/18] drm/i915: Move the global sync optimisation to the timeline
  2016-09-14  6:52 ` [PATCH 13/18] drm/i915: Move the global sync optimisation to the timeline Chris Wilson
@ 2016-09-19 13:16   ` Joonas Lahtinen
  0 siblings, 0 replies; 51+ messages in thread
From: Joonas Lahtinen @ 2016-09-19 13:16 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
> @@ -262,6 +263,12 @@ static int i915_gem_init_global_seqno(struct drm_i915_private *dev_priv,
>  	for_each_engine(engine, dev_priv)
>  		intel_engine_init_global_seqno(engine, seqno);
>  
> +	list_for_each_entry(tl, &dev_priv->gt.timelines, link) {
> +		for (i = 0; i < ARRAY_SIZE(tl->engine); i++)
> +			memset(tl->engine[i].sync_seqno, 0,
> +			       sizeof(tl->engine[i].sync_seqno));
> +	}
> +

I think we should have braces in both loops or neither; it looks
extremely odd otherwise.

> @@ -454,7 +461,7 @@ static int
>  i915_gem_request_await_request(struct drm_i915_gem_request *to,
> >  			       struct drm_i915_gem_request *from)
>  {
> > -	int idx, ret;
> > +	int ret;
>  
> >  	GEM_BUG_ON(to == from);
>  
> @@ -474,8 +481,7 @@ i915_gem_request_await_request(struct drm_i915_gem_request *to,
>  		return ret < 0 ? ret : 0;
>  	}
>  
> -	idx = intel_engine_sync_index(from->engine, to->engine);

You remove this here, and the only place it remains used
is i915_gpu_error.c, maybe make it static in there?

> @@ -263,16 +263,13 @@ static void error_print_engine(struct drm_i915_error_state_buf *m,
>  	if (INTEL_GEN(m->i915) >= 6) {
>  		err_printf(m, "  RC PSMI: 0x%08x\n", ee->rc_psmi);
>  		err_printf(m, "  FAULT_REG: 0x%08x\n", ee->fault_reg);
> -		err_printf(m, "  SYNC_0: 0x%08x [last synced 0x%08x]\n",
> -			   ee->semaphore_mboxes[0],
> -			   ee->semaphore_seqno[0]);
> -		err_printf(m, "  SYNC_1: 0x%08x [last synced 0x%08x]\n",
> -			   ee->semaphore_mboxes[1],
> -			   ee->semaphore_seqno[1]);
> +		err_printf(m, "  SYNC_0: 0x%08x\n",
> +			   ee->semaphore_mboxes[0]);
> +		err_printf(m, "  SYNC_1: 0x%08x\n",
> +			   ee->semaphore_mboxes[1]);
>  		if (HAS_VEBOX(m->i915)) {
> -			err_printf(m, "  SYNC_2: 0x%08x [last synced 0x%08x]\n",
> -				   ee->semaphore_mboxes[2],
> -				   ee->semaphore_seqno[2]);
> +			err_printf(m, "  SYNC_2: 0x%08x\n",
> +				   ee->semaphore_mboxes[2]);
>  		}

You can drop these braces too.

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 14/18] drm/i915: Create a unique name for the context
  2016-09-14  6:52 ` [PATCH 14/18] drm/i915: Create a unique name for the context Chris Wilson
@ 2016-09-19 13:23   ` Joonas Lahtinen
  0 siblings, 0 replies; 51+ messages in thread
From: Joonas Lahtinen @ 2016-09-19 13:23 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
> @@ -310,12 +311,21 @@ __create_hw_context(struct drm_device *dev,
>  			goto err_out;
>  	} else
>  		ret = DEFAULT_CONTEXT_HANDLE;

Confusing indentation, so add braces to the else above and a newline here.

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 15/18] drm/i915: Reserve space in the global seqno during request allocation
  2016-09-14  6:52 ` [PATCH 15/18] drm/i915: Reserve space in the global seqno during request allocation Chris Wilson
@ 2016-09-19 13:47   ` Joonas Lahtinen
  2016-09-19 15:35     ` Jani Nikula
  0 siblings, 1 reply; 51+ messages in thread
From: Joonas Lahtinen @ 2016-09-19 13:47 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
> +static int reserve_global_seqno(struct drm_i915_private *i915)
>  {
> -	struct i915_gem_timeline *tl = &dev_priv->gt.global_timeline;
> +	struct i915_gem_timeline *tl = &i915->gt.global_timeline;
> +	u32 next_seqno = atomic_read(&tl->next_seqno);
>  
> -	/* reserve 0 for non-seqno */
> -	if (unlikely(tl->next_seqno == 0)) {

Meh, do not hide the ++i915->gt.active_requests in if (unlikely(...)).

> +	if (unlikely(next_seqno + ++i915->gt.active_requests <= next_seqno)) {
>  		int ret;
>  

Why not if (likely(next_seqno + active_requests > next_seqno))
		return 0;

> -		ret = i915_gem_init_global_seqno(dev_priv, 0);
> -		if (ret)
> +		ret = i915_gem_init_global_seqno(i915, 0);
> +		if (ret) {
> +			i915->gt.active_requests--;
>  			return ret;

With above change the built-in teardown becomes OK. Otherwise split.

> -
> -		tl->next_seqno = 1;
> +		}
>  	}
>  
> -	*seqno = tl->next_seqno++;
>  	return 0;
>  }
>  
> +static u32 timeline_get_seqno(struct i915_gem_timeline *tl)
> +{
> +	return atomic_inc_return(&tl->next_seqno);
> +}
> +

How about u32 timeline_peek_seqno(), as you have quite a few
atomic_reads on it? Also, lockdep_assert_held would be useful sprinkled
in the functions.
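e.g. (a sketch):

static u32 timeline_peek_seqno(const struct i915_gem_timeline *tl)
{
	return atomic_read(&tl->next_seqno);
}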

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 15/18] drm/i915: Reserve space in the global seqno during request allocation
  2016-09-19 13:47   ` Joonas Lahtinen
@ 2016-09-19 15:35     ` Jani Nikula
  2016-09-19 16:07       ` Joonas Lahtinen
  0 siblings, 1 reply; 51+ messages in thread
From: Jani Nikula @ 2016-09-19 15:35 UTC (permalink / raw)
  To: Joonas Lahtinen, Chris Wilson, intel-gfx

On Mon, 19 Sep 2016, Joonas Lahtinen <joonas.lahtinen@linux.intel.com> wrote:
> On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
>> +static int reserve_global_seqno(struct drm_i915_private *i915)
>>  {
>> -	struct i915_gem_timeline *tl = &dev_priv->gt.global_timeline;
>> +	struct i915_gem_timeline *tl = &i915->gt.global_timeline;
>> +	u32 next_seqno = atomic_read(&tl->next_seqno);
>>  
>> -	/* reserve 0 for non-seqno */
>> -	if (unlikely(tl->next_seqno == 0)) {
>
> Meh, do not hide the ++i915->gt.active_requests in if (unlikely(...)).

*shudder* it doesn't get any better by removing unlikely!

BR,
Jani.

>
>> +	if (unlikely(next_seqno + ++i915->gt.active_requests <= next_seqno)) {
>>  		int ret;
>>  
>
> Why not if (likely(next_seqno + active_requests > next_seqno))
> 		return 0;
>
>> -		ret = i915_gem_init_global_seqno(dev_priv, 0);
>> -		if (ret)
>> +		ret = i915_gem_init_global_seqno(i915, 0);
>> +		if (ret) {
>> +			i915->gt.active_requests--;
>>  			return ret;
>
> With above change the built-in teardown becomes OK. Otherwise split.
>
>> -
>> -		tl->next_seqno = 1;
>> +		}
>>  	}
>>  
>> -	*seqno = tl->next_seqno++;
>>  	return 0;
>>  }
>>  
>> +static u32 timeline_get_seqno(struct i915_gem_timeline *tl)
>> +{
>> +	return atomic_inc_return(&tl->next_seqno);
>> +}
>> +
>
> How about u32 timeline_peek_seqno(), as you have quite a few
> atomic_reads on it. Also, lockdep_assert_held would be useful sprinkled
> ni the functions.
>
> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>
> Regards, Joonas

-- 
Jani Nikula, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 16/18] drm/i915: Enable multiple timelines
  2016-09-14  6:52 ` [PATCH 16/18] drm/i915: Enable multiple timelines Chris Wilson
@ 2016-09-19 15:52   ` Joonas Lahtinen
  2016-10-20 12:49     ` Chris Wilson
  0 siblings, 1 reply; 51+ messages in thread
From: Joonas Lahtinen @ 2016-09-19 15:52 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
> @@ -315,17 +304,42 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>  {
>  	struct drm_i915_gem_request *request =
>  		container_of(fence, typeof(*request), submit);
> +	struct intel_timeline *timeline;
> +	struct intel_engine_cs *engine = request->engine;
> +	unsigned long flags;
> +	u32 seqno;
>  
>  	/* Will be called from irq-context when using foreign DMA fences */
>  
> -	switch (state) {
> -	case FENCE_COMPLETE:
> -		request->engine->submit_request(request);
> -		break;
> +	if (state != FENCE_COMPLETE)
> +		return NOTIFY_DONE;
>  
> -	case FENCE_FREE:
> -		break;
> -	}
> +	timeline = engine->timeline;
> +	GEM_BUG_ON(timeline == request->timeline);

Umm, why this BUG_ON?

> +
> +	spin_lock_irqsave(&timeline->lock, flags);
> +
> +	seqno = timeline_get_seqno(&request->i915->gt.global_timeline);
> +	GEM_BUG_ON(seqno == 0);
> +
> +	request->previous_seqno = timeline->last_submitted_seqno;
> +	timeline->last_submitted_seqno = seqno;
> +
> +	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);

A comment might be in place for the nested locking.
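e.g. (my reading of the locking, as a sketch):

	/* The request lock is nested inside the irq-disabled engine
	 * timeline lock taken above, so annotate it for lockdep.
	 */
	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);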

> +	request->global_seqno = seqno;
> +	if (test_bit(FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags))
> +		intel_engine_enable_signaling(request);
> +	spin_unlock(&request->lock);
> +
> +	engine->emit_request(request, request->ring->vaddr + request->postfix);

Is it just me, or does this look very strange: we "emit the request"
at the "vaddr + request->postfix" offset? Makes no sense. Maybe
reconsider refreshing the vfunc name.

> +
> +	spin_lock_nested(&request->timeline->lock, SINGLE_DEPTH_NESTING);
> +	list_move_tail(&request->link, &timeline->requests);
> +	spin_unlock(&request->timeline->lock);
> +
> +	engine->submit_request(request);
> +
> +	spin_unlock_irqrestore(&timeline->lock, flags);
>  
>  	return NOTIFY_DONE;
>  }
> 

> @@ -685,12 +697,15 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
> 
>  		i915_sw_fence_await_sw_fence(&request->submit, &prev->submit,
>  					     &request->submitq, GFP_NOWAIT);
>  
> -	request->emitted_jiffies = jiffies;
> -	request->previous_seqno = timeline->last_submitted_seqno;
> +	spin_lock_irq(&timeline->lock);
> +	list_add_tail(&request->link, &timeline->requests);
> +	spin_unlock_irq(&timeline->lock);
> +
>  	timeline->last_submitted_seqno = request->fence.seqno;
>  	i915_gem_active_set(&timeline->last_request, request);
> -	list_add_tail(&request->link, &timeline->requests);
> +
>  	list_add_tail(&request->ring_link, &ring->request_list);
> +	request->emitted_jiffies = jiffies;

Is this assignment really worth delaying here?

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 07/18] drm/i915: Restore nonblocking awaits for modesetting
  2016-09-14  6:52 ` [PATCH 07/18] drm/i915: Restore nonblocking awaits for modesetting Chris Wilson
@ 2016-09-19 16:01   ` Joonas Lahtinen
  0 siblings, 0 replies; 51+ messages in thread
From: Joonas Lahtinen @ 2016-09-19 16:01 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Maarten, could you give this a look?

On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
> After combining the dma-buf reservation object and the GEM reservation
> object, we lost the ability to do a nonblocking wait on the i915 request
> (as we blocked upon the reservation object during prepare_fb). We can
> instead convert the reservation object into a fence upon which we can
> asynchronously wait (including a forced timeout in case the DMA fence is
> never signaled).
> 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/intel_display.c | 78 ++++++++++++++++++++++--------------
>  drivers/gpu/drm/i915/intel_drv.h     |  2 +
>  2 files changed, 51 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 5c0ea08dddb3..294df122f7c6 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -14383,12 +14383,33 @@ static void intel_atomic_commit_tail(struct drm_atomic_state *state)
>  
>  static void intel_atomic_commit_work(struct work_struct *work)
>  {
> > -	struct drm_atomic_state *state = container_of(work,
> > -						      struct drm_atomic_state,
> > -						      commit_work);
> > +	struct drm_atomic_state *state =
> > +		container_of(work, struct drm_atomic_state, commit_work);
> +
> >  	intel_atomic_commit_tail(state);
>  }
>  
> +static int __i915_sw_fence_call
> +intel_atomic_commit_ready(struct i915_sw_fence *fence,
> > +			  enum i915_sw_fence_notify notify)
> +{
> > +	struct intel_atomic_state *state =
> > +		container_of(fence, struct intel_atomic_state, commit_ready);
> +
> > +	switch (notify) {
> > +	case FENCE_COMPLETE:
> > +		if (state->base.commit_work.func)
> > +			queue_work(system_unbound_wq, &state->base.commit_work);
> > +		break;
> +
> > +	case FENCE_FREE:
> > +		drm_atomic_state_put(&state->base);
> > +		break;
> > +	}
> +
> > +	return NOTIFY_DONE;
> +}
> +
>  static void intel_atomic_track_fbs(struct drm_atomic_state *state)
>  {
> >  	struct drm_plane_state *old_plane_state;
> @@ -14434,11 +14455,14 @@ static int intel_atomic_commit(struct drm_device *dev,
> >  	if (ret)
> >  		return ret;
>  
> > -	INIT_WORK(&state->commit_work, intel_atomic_commit_work);
> > +	drm_atomic_state_get(state);
> > +	i915_sw_fence_init(&intel_state->commit_ready,
> > +			   intel_atomic_commit_ready);
>  
> >  	ret = intel_atomic_prepare_commit(dev, state);
> >  	if (ret) {
> >  		DRM_DEBUG_ATOMIC("Preparing state failed with %i\n", ret);
> > +		i915_sw_fence_commit(&intel_state->commit_ready);
> >  		return ret;
> >  	}
>  
> @@ -14449,10 +14473,14 @@ static int intel_atomic_commit(struct drm_device *dev,
> >  	intel_atomic_track_fbs(state);
>  
> >  	drm_atomic_state_get(state);
> > -	if (nonblock)
> > -		queue_work(system_unbound_wq, &state->commit_work);
> > -	else
> > +	INIT_WORK(&state->commit_work,
> > +		  nonblock ? intel_atomic_commit_work : NULL);
> +
> > +	i915_sw_fence_commit(&intel_state->commit_ready);
> > +	if (!nonblock) {
> > +		i915_sw_fence_wait(&intel_state->commit_ready);
> >  		intel_atomic_commit_tail(state);
> > +	}
>  
> >  	return 0;
>  }
> @@ -14568,8 +14596,7 @@ intel_prepare_plane_fb(struct drm_plane *plane,
> >  	struct drm_framebuffer *fb = new_state->fb;
> >  	struct drm_i915_gem_object *obj = intel_fb_obj(fb);
> >  	struct drm_i915_gem_object *old_obj = intel_fb_obj(plane->state->fb);
> > -	long lret;
> > -	int ret = 0;
> > +	int ret;
>  
> >  	if (!obj && !old_obj)
> >  		return 0;
> @@ -14577,7 +14604,6 @@ intel_prepare_plane_fb(struct drm_plane *plane,
> >  	if (old_obj) {
> >  		struct drm_crtc_state *crtc_state =
> >  			drm_atomic_get_existing_crtc_state(new_state->state, plane->state->crtc);
> > -		long timeout = 0;
>  
> >  		/* Big Hammer, we also need to ensure that any pending
> >  		 * MI_WAIT_FOR_EVENT inside a user batch buffer on the
> @@ -14590,31 +14616,25 @@ intel_prepare_plane_fb(struct drm_plane *plane,
> >  		 * This should only fail upon a hung GPU, in which case we
> >  		 * can safely continue.
> >  		 */
> > -		if (needs_modeset(crtc_state))
> > -			timeout = i915_gem_object_wait(old_obj,
> > -						       I915_WAIT_INTERRUPTIBLE |
> > -						       I915_WAIT_LOCKED,
> > -						       MAX_SCHEDULE_TIMEOUT,
> > -						       NULL);
> > -		if (timeout < 0) {
> > -			/* GPU hangs should have been swallowed by the wait */
> > -			WARN_ON(timeout == -EIO);
> > -			return timeout;
> > +		if (needs_modeset(crtc_state)) {
> > +			ret = i915_sw_fence_await_reservation(&to_intel_atomic_state(new_state->state)->commit_ready,
> > +							      old_obj->resv, NULL,
> > +							      false, 0,
> > +							      GFP_KERNEL);
> > +			if (ret < 0)
> > +				return ret;
> >  		}
> >  	}
>  
> >  	if (!obj)
> >  		return 0;
>  
> > -	/* For framebuffer backed by dmabuf, wait for fence */
> > -	lret = i915_gem_object_wait(obj,
> > -				    I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
> > -				    MAX_SCHEDULE_TIMEOUT,
> > -				    NULL);
> > -	if (lret == -ERESTARTSYS)
> > -		return lret;
> -
> > -	WARN(lret < 0, "waiting returns %li\n", lret);
> > +	ret = i915_sw_fence_await_reservation(&to_intel_atomic_state(new_state->state)->commit_ready,
> > +					      obj->resv, NULL,
> > +					      false, 10*HZ,
> > +					      GFP_KERNEL);
> > +	if (ret < 0)
> > +		return ret;
>  
> >  	if (plane->type == DRM_PLANE_TYPE_CURSOR &&
> >  	    INTEL_INFO(dev)->cursor_needs_physical) {
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 15e3050edeb9..cb99a2540863 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -357,6 +357,8 @@ struct intel_atomic_state {
>  
> >  	/* Gen9+ only */
> >  	struct skl_wm_values wm_results;
> +
> > +	struct i915_sw_fence commit_ready;
>  };
>  
>  struct intel_plane_state {
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 15/18] drm/i915: Reserve space in the global seqno during request allocation
  2016-09-19 15:35     ` Jani Nikula
@ 2016-09-19 16:07       ` Joonas Lahtinen
  0 siblings, 0 replies; 51+ messages in thread
From: Joonas Lahtinen @ 2016-09-19 16:07 UTC (permalink / raw)
  To: Jani Nikula, Chris Wilson, intel-gfx

On ma, 2016-09-19 at 18:35 +0300, Jani Nikula wrote:
> > On Mon, 19 Sep 2016, Joonas Lahtinen <joonas.lahtinen@linux.intel.com> wrote:
> > 
> > On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
> > > 
> > > +static int reserve_global_seqno(struct drm_i915_private *i915)
> > >  {
> > > -	struct i915_gem_timeline *tl = &dev_priv->gt.global_timeline;
> > > +	struct i915_gem_timeline *tl = &i915->gt.global_timeline;
> > > +	u32 next_seqno = atomic_read(&tl->next_seqno);
> > >  
> > > -	/* reserve 0 for non-seqno */
> > > -	if (unlikely(tl->next_seqno == 0)) {
> > 
> > Meh, do not hide the ++i915->gt.active_requests in if (unlikely(...)).
> 
> *shudder* it doesn't get any better by removing unlikely!
> 

Well, it gets twice as bad by adding it, though :P

My recommendation below would be preferred, with the increment lifted
before the if ().
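i.e. something like:

	i915->gt.active_requests++;
	if (likely(next_seqno + i915->gt.active_requests > next_seqno))
		return 0;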

> > > 
> > > +	if (unlikely(next_seqno + ++i915->gt.active_requests <= next_seqno)) {
> > >  		int ret;
> > >  
> > 
> > Why not if (likely(next_seqno + active_requests > next_seqno))
> > > > 		return 0;

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 11/18] drm/i915: Record space required for request emission
  2016-09-19 11:32     ` Chris Wilson
@ 2016-09-19 16:09       ` Joonas Lahtinen
  0 siblings, 0 replies; 51+ messages in thread
From: Joonas Lahtinen @ 2016-09-19 16:09 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On ma, 2016-09-19 at 12:32 +0100, Chris Wilson wrote:
> On Mon, Sep 19, 2016 at 01:47:30PM +0300, Joonas Lahtinen wrote:
> > On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
> > > @@ -2677,8 +2681,21 @@ static void intel_ring_default_vfuncs(struct drm_i915_private *dev_priv,
> > >     engine->reset_hw = reset_ring_common;
> > >  
> > >     engine->emit_request = i9xx_emit_request;
> > > -   if (i915.semaphores)
> > > +   engine->emit_request_sz = i9xx_emit_request_sz;
> > > +   if (i915.semaphores) {
> > > +           int num_rings;
> > 
> > 'initialized_rings' to differentiate as suggested by Tvrtko too.
> 
> Tvrtko wasn't referring to this. The point is that we don't know
> initialized_rings here, just the declared set of rings we think
> the hardware is supposed to have.

Can't we afford to calculate this into the info struct anyway?

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 16/18] drm/i915: Enable multiple timelines
  2016-09-19 15:52   ` Joonas Lahtinen
@ 2016-10-20 12:49     ` Chris Wilson
  2016-10-20 15:25       ` Joonas Lahtinen
  0 siblings, 1 reply; 51+ messages in thread
From: Chris Wilson @ 2016-10-20 12:49 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Mon, Sep 19, 2016 at 06:52:13PM +0300, Joonas Lahtinen wrote:
> On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
> > @@ -315,17 +304,42 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> >  {
> >  	struct drm_i915_gem_request *request =
> >  		container_of(fence, typeof(*request), submit);
> > +	struct intel_timeline *timeline;
> > +	struct intel_engine_cs *engine = request->engine;
> > +	unsigned long flags;
> > +	u32 seqno;
> >  
> >  	/* Will be called from irq-context when using foreign DMA fences */
> >  
> > -	switch (state) {
> > -	case FENCE_COMPLETE:
> > -		request->engine->submit_request(request);
> > -		break;
> > +	if (state != FENCE_COMPLETE)
> > +		return NOTIFY_DONE;
> >  
> > -	case FENCE_FREE:
> > -		break;
> > -	}
> > +	timeline = engine->timeline;
> > +	GEM_BUG_ON(timeline == request->timeline);
> 
> Umm, why this BUG_ON?

To document that the intent here is to move from the per-context
timeline onto the global per-engine timeline. If the request was
already on the engine->timeline, bad things would happen; at the very
simplest, a deadlock here.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 16/18] drm/i915: Enable multiple timelines
  2016-10-20 12:49     ` Chris Wilson
@ 2016-10-20 15:25       ` Joonas Lahtinen
  0 siblings, 0 replies; 51+ messages in thread
From: Joonas Lahtinen @ 2016-10-20 15:25 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On to, 2016-10-20 at 13:49 +0100, Chris Wilson wrote:
> On Mon, Sep 19, 2016 at 06:52:13PM +0300, Joonas Lahtinen wrote:
> > 
> > On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
> > > 
> > > @@ -315,17 +304,42 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> > >  {
> > >  	struct drm_i915_gem_request *request =
> > >  		container_of(fence, typeof(*request), submit);
> > > +	struct intel_timeline *timeline;
> > > +	struct intel_engine_cs *engine = request->engine;
> > > +	unsigned long flags;
> > > +	u32 seqno;
> > >  
> > >  	/* Will be called from irq-context when using foreign DMA fences */
> > >  
> > > -	switch (state) {
> > > -	case FENCE_COMPLETE:
> > > -		request->engine->submit_request(request);
> > > -		break;
> > > +	if (state != FENCE_COMPLETE)
> > > +		return NOTIFY_DONE;
> > >  
> > > -	case FENCE_FREE:
> > > -		break;
> > > -	}
> > > +	timeline = engine->timeline;
> > > +	GEM_BUG_ON(timeline == request->timeline);
> > 
> > Umm, why this BUG_ON?
> 
> To document that the intent here is to move from the per-context
> timeline onto the global per-engine timeline. If the request were
> already on the engine->timeline, bad things would happen; at the very
> simplest, we would deadlock right here.

I see. I would have lifted the assignment and the BUG_ON above the
actual code, like elsewhere. But I can live with this too.

The new patch seems to have got rid of the emit_request weirdness too,
so I'll add my R-b there.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* [PATCH 17/18] drm/i915: Enable userspace to opt-out of implicit fencing
  2016-08-30  8:17 Non-blocking fences, now GuC compatible Chris Wilson
@ 2016-08-30  8:18 ` Chris Wilson
  0 siblings, 0 replies; 51+ messages in thread
From: Chris Wilson @ 2016-08-30  8:18 UTC (permalink / raw)
  To: intel-gfx; +Cc: mika.kuoppala

Userspace is faced with a dilemma. The kernel requires implicit fencing
to manage resource usage (we must always wait for the GPU to finish
before releasing its PTE) and for third parties. However, userspace may
wish to avoid this serialisation if it is using explicit fencing
between parties and wants more fine-grained access to buffers (e.g. it
may partition the buffer between uses and track fences on ranges rather
than the implicit fences tracking the whole object). To wit, userspace
needs a mechanism to avoid the kernel's serialisation on its implicit
fences before execbuf execution.

The next question is whether this is an object, execbuf or context flag.
Hybrid users (such as using explicit EGL_ANDROID_native_sync fencing on
shared winsys buffers, but implicit fencing on internal surfaces)
require a per-object level flag. Given that this flag needs to be set
only once for the lifetime of the object, this reduces the convenience of
having an execbuf or context level flag (and avoids having multiple
pieces of uABI controlling the same feature).

Incorrect use of this flag will result in rendering corruption and GPU
hangs - but will not result in use-after-free or similar resource
tracking issues.

Serious caveat: write ordering is not strictly correct after setting
this flag on a render target on multiple engines. This affects all
subsequent GEM operations (execbuf, set-domain, pread) and shared
dma-buf operations. A fix is possible - but costly (both in terms of
further ABI changes and runtime overhead).
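
For reference, expected userspace usage is roughly (a sketch; fd,
bo_handle and batch setup are assumed, error handling omitted):

	int has_async = 0;
	struct drm_i915_getparam gp = {
		.param = I915_PARAM_HAS_EXEC_ASYNC,
		.value = &has_async,
	};
	struct drm_i915_gem_exec_object2 obj = {
		.handle = bo_handle,
	};
	struct drm_i915_gem_execbuffer2 execbuf = {
		.buffers_ptr = (uintptr_t)&obj,
		.buffer_count = 1,
	};

	drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp);
	if (has_async)
		/* ordering on this bo is tracked by explicit fences instead */
		obj.flags |= EXEC_OBJECT_ASYNC;

	drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf);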

Testcase: igt/gem_exec_async
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c            |  3 +++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 30 ++++++++++++++++--------------
 include/uapi/drm/i915_drm.h                | 24 +++++++++++++++++++++++-
 3 files changed, 42 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 449fcfa40319..437ab0695373 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -362,6 +362,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
 		 */
 		value = i915_gem_mmap_gtt_version();
 		break;
+	case I915_PARAM_HAS_EXEC_ASYNC:
+		value = 1;
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index b63f9327fc40..ca7675c9c1b7 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1199,21 +1199,23 @@ i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
 		struct drm_i915_gem_object *obj = vma->obj;
 		struct reservation_object *resv;
 
-		if (obj->flags & other_rings) {
-			ret = eb_await_bo(req, obj,
-					  obj->base.pending_write_domain);
-			if (ret)
-				return ret;
-		}
+		if ((vma->exec_entry->flags & EXEC_OBJECT_ASYNC) == 0) {
+			if (obj->flags & other_rings) {
+				ret = eb_await_bo(req, obj,
+						  obj->base.pending_write_domain);
+				if (ret)
+					return ret;
+			}
 
-		resv = i915_gem_object_get_dmabuf_resv(obj);
-		if (resv) {
-			ret = i915_sw_fence_await_reservation
-				(&req->submit, resv, &i915_fence_ops,
-				 obj->base.pending_write_domain,
-				 GFP_KERNEL | __GFP_NOWARN);
-			if (ret < 0)
-				return ret;
+			resv = i915_gem_object_get_dmabuf_resv(obj);
+			if (resv) {
+				ret = i915_sw_fence_await_reservation
+					(&req->submit, resv, &i915_fence_ops,
+					 obj->base.pending_write_domain,
+					 GFP_KERNEL | __GFP_NOWARN);
+				if (ret < 0)
+					return ret;
+			}
 		}
 
 		if (obj->base.write_domain & I915_GEM_DOMAIN_CPU)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 03725fe89859..730cba23f982 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -388,6 +388,10 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_POOLED_EU	 38
 #define I915_PARAM_MIN_EU_IN_POOL	 39
 #define I915_PARAM_MMAP_GTT_VERSION	 40
+/* Query whether DRM_I915_GEM_EXECBUFFER2 supports the ability to opt-out of
+ * synchronisation with implicit fencing on individual objects.
+ */
+#define I915_PARAM_HAS_EXEC_ASYNC	 41
 
 typedef struct drm_i915_getparam {
 	__s32 param;
@@ -729,8 +733,26 @@ struct drm_i915_gem_exec_object2 {
 #define EXEC_OBJECT_SUPPORTS_48B_ADDRESS (1<<3)
 #define EXEC_OBJECT_PINNED		 (1<<4)
 #define EXEC_OBJECT_PAD_TO_SIZE		 (1<<5)
+/* The kernel implicitly tracks GPU activity on all GEM objects, and
+ * synchronises operations with outstanding rendering. This includes
+ * rendering on other devices if exported via dma-buf. However, sometimes
+ * this tracking is too coarse and the user knows better. For example,
+ * if the object is split into non-overlapping ranges shared between different
+ * clients or engines (i.e. suballocating objects), the implicit tracking
+ * by the kernel assumes that each operation affects the whole object rather
+ * than an individual range, causing needless synchronisation between clients.
+ *
+ * The kernel maintains the implicit tracking in order to manage resources
+ * used by the GPU - this flag only disables the synchronisation prior to
+ * rendering with this object in this execbuf.
+ *
+ * Opting out of implicit synchronisation requires the user to do its own
+ * explicit tracking to avoid rendering corruption. See, for example,
+ * I915_PARAM_HAS_EXEC_FENCE to order execbufs and execute them asynchronously.
+ */
+#define EXEC_OBJECT_ASYNC		 (1<<6)
 /* All remaining bits are MBZ and RESERVED FOR FUTURE USE */
-#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_PAD_TO_SIZE<<1)
+#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_ASYNC<<1)
 	__u64 flags;
 
 	union {
-- 
2.9.3

end of thread, other threads:[~2016-10-20 15:25 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-14  6:52 Tracking multiple timelines (full-ppgtt) Chris Wilson
2016-09-14  6:52 ` [PATCH 01/18] drm/i915: Support asynchronous waits on struct fence from i915_gem_request Chris Wilson
2016-09-14  7:37   ` Joonas Lahtinen
2016-09-19 11:26     ` Chris Wilson
2016-09-14  6:52 ` [PATCH 02/18] drm/i915: Allow i915_sw_fence_await_sw_fence() to allocate Chris Wilson
2016-09-14  7:51   ` Joonas Lahtinen
2016-09-14  8:46     ` Chris Wilson
2016-09-14  6:52 ` [PATCH 03/18] drm/i915: Rearrange i915_wait_request() accounting with callers Chris Wilson
2016-09-14  8:47   ` Joonas Lahtinen
2016-09-14  6:52 ` [PATCH 04/18] drm/i915: Remove unused i915_gem_active_wait() in favour of _unlocked() Chris Wilson
2016-09-14  8:48   ` Joonas Lahtinen
2016-09-14  6:52 ` [PATCH 05/18] drm/i915: Move GEM activity tracking into a common struct reservation_object Chris Wilson
2016-09-14  9:44   ` Joonas Lahtinen
2016-09-14 17:35     ` Chris Wilson
2016-09-15  9:38       ` Dave Gordon
2016-09-15  9:55         ` Jani Nikula
2016-09-16 11:40   ` Chris Wilson
2016-09-14  6:52 ` [PATCH 06/18] drm: Add reference counting to drm_atomic_state Chris Wilson
2016-09-14  6:52 ` [PATCH 07/18] drm/i915: Restore nonblocking awaits for modesetting Chris Wilson
2016-09-19 16:01   ` Joonas Lahtinen
2016-09-14  6:52 ` [PATCH 08/18] drm/i915: Combine seqno + tracking into a global timeline struct Chris Wilson
2016-09-14 15:42   ` Joonas Lahtinen
2016-09-14  6:52 ` [PATCH 09/18] drm/i915: Wait first for submission, before waiting for request completion Chris Wilson
2016-09-19  8:59   ` Joonas Lahtinen
2016-09-14  6:52 ` [PATCH 10/18] drm/i915: Introduce a global_seqno for each request Chris Wilson
2016-09-19 10:36   ` Joonas Lahtinen
2016-09-14  6:52 ` [PATCH 11/18] drm/i915: Record space required for request emission Chris Wilson
2016-09-14 13:30   ` Tvrtko Ursulin
2016-09-14 17:33     ` Chris Wilson
2016-09-15  8:59       ` Tvrtko Ursulin
2016-09-19 10:47   ` Joonas Lahtinen
2016-09-19 11:32     ` Chris Wilson
2016-09-19 16:09       ` Joonas Lahtinen
2016-09-14  6:52 ` [PATCH 12/18] drm/i915: Defer " Chris Wilson
2016-09-19 12:06   ` Joonas Lahtinen
2016-09-14  6:52 ` [PATCH 13/18] drm/i915: Move the global sync optimisation to the timeline Chris Wilson
2016-09-19 13:16   ` Joonas Lahtinen
2016-09-14  6:52 ` [PATCH 14/18] drm/i915: Create a unique name for the context Chris Wilson
2016-09-19 13:23   ` Joonas Lahtinen
2016-09-14  6:52 ` [PATCH 15/18] drm/i915: Reserve space in the global seqno during request allocation Chris Wilson
2016-09-19 13:47   ` Joonas Lahtinen
2016-09-19 15:35     ` Jani Nikula
2016-09-19 16:07       ` Joonas Lahtinen
2016-09-14  6:52 ` [PATCH 16/18] drm/i915: Enable multiple timelines Chris Wilson
2016-09-19 15:52   ` Joonas Lahtinen
2016-10-20 12:49     ` Chris Wilson
2016-10-20 15:25       ` Joonas Lahtinen
2016-09-14  6:52 ` [PATCH 17/18] drm/i915: Enable userspace to opt-out of implicit fencing Chris Wilson
2016-09-14  6:52 ` [PATCH 18/18] drm/i915: Support explicit fencing for execbuf Chris Wilson
2016-09-14  9:16 ` ✗ Fi.CI.BAT: failure for series starting with [01/18] drm/i915: Support asynchronous waits on struct fence from i915_gem_request Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2016-08-30  8:17 Non-blocking fences, now GuC compatible Chris Wilson
2016-08-30  8:18 ` [PATCH 17/18] drm/i915: Enable userspace to opt-out of implicit fencing Chris Wilson
