All of lore.kernel.org
 help / color / mirror / Atom feed
* Trivial scheduler
@ 2016-10-27  0:03 Chris Wilson
  2016-10-27  0:03 ` [PATCH 1/6] drm/i915: Show the execlist queue in debugfs/i915_engine_info Chris Wilson
                   ` (7 more replies)
  0 siblings, 8 replies; 21+ messages in thread
From: Chris Wilson @ 2016-10-27  0:03 UTC (permalink / raw)
  To: intel-gfx

Started out as what intended to be a quick look at how to extend the
fences to support the priority inheritance ended up completing the final
few steps to implement the most trivial priority based scheduler. A lot
of the policy decisions have been left out (such as e.g. only submitting
one request from each context each interrupt) so that it is not very
fine-grained, nor does it implement time-slicing and fair load
balancing, and continues to live in blissful ignorance of the challenge
of preemption. It is also execlists only. It purely serves as fodder to
take the next steps towards a real scheduler, and in the meantime offers
one benefit to current desktops, giving uber priority to the rendering
of the pageflip.

This adds the dependency tracking of the requests required to do
inflight reordering (as distinct from the dependency tracking required
to decide when we can submit the request to hardware) and implements a
priority sorted rbtree of the ready-to-execute requests to be submitted
on the next context-switch.
-Chris

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/6] drm/i915: Show the execlist queue in debugfs/i915_engine_info
  2016-10-27  0:03 Trivial scheduler Chris Wilson
@ 2016-10-27  0:03 ` Chris Wilson
  2016-11-02 13:27   ` Joonas Lahtinen
  2016-10-27  0:03 ` [PATCH 2/6] drm/i915/scheduler: Signal the arrival of a new request Chris Wilson
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Chris Wilson @ 2016-10-27  0:03 UTC (permalink / raw)
  To: intel-gfx

When looking at freezes whilst working on execlists, knowing the order
of the pending requests in the driver is useful.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 925aaedd7fd9..e9382ea2e626 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3256,6 +3256,12 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 			else
 				seq_printf(m, "\t\tELSP[1] idle\n");
 			rcu_read_unlock();
+
+			spin_lock_irq(&engine->execlist_lock);
+			list_for_each_entry(rq, &engine->execlist_queue, execlist_link) {
+				print_request(m, rq, "\t\tQ ");
+			}
+			spin_unlock_irq(&engine->execlist_lock);
 		} else if (INTEL_GEN(dev_priv) > 6) {
 			seq_printf(m, "\tPP_DIR_BASE: 0x%08x\n",
 				   I915_READ(RING_PP_DIR_BASE(engine)));
-- 
2.10.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/6] drm/i915/scheduler: Signal the arrival of a new request
  2016-10-27  0:03 Trivial scheduler Chris Wilson
  2016-10-27  0:03 ` [PATCH 1/6] drm/i915: Show the execlist queue in debugfs/i915_engine_info Chris Wilson
@ 2016-10-27  0:03 ` Chris Wilson
  2016-10-27  0:03 ` [PATCH 3/6] drm/i915/scheduler: Record all dependencies upon request construction Chris Wilson
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2016-10-27  0:03 UTC (permalink / raw)
  To: intel-gfx

The start of the scheduler, add a hook into request submission for the
scheduler to see the arrival of new requests and prepare its runqueues.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c         |  4 ++++
 drivers/gpu/drm/i915/i915_gem_request.c | 13 +++++++++++++
 drivers/gpu/drm/i915/intel_engine_cs.c  |  3 +++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  9 +++++++++
 include/uapi/drm/i915_drm.h             |  5 +++++
 5 files changed, 34 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index dd02ad69eb7b..11b9f6c38699 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -323,6 +323,10 @@ static int i915_getparam(struct drm_device *dev, void *data,
 		 */
 		value = i915_gem_mmap_gtt_version();
 		break;
+	case I915_PARAM_HAS_SCHEDULER:
+		value = dev_priv->engine[RCS] &&
+			dev_priv->engine[RCS]->schedule;
+		break;
 	case I915_PARAM_MMAP_VERSION:
 		/* Remember to bump this if the version changes! */
 	case I915_PARAM_HAS_GEM:
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 2232f86b892d..864ab686af33 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -737,6 +737,19 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
 
 	i915_gem_mark_busy(engine);
 
+	/* Let the backend know a new request has arrived that may need
+	 * to adjust the existing execution schedule due to a high priority
+	 * request - i.e. we may want to preempt the current request in order
+	 * to run a high priority dependency chain *before* we can execute this
+	 * request.
+	 *
+	 * This is called before the request is ready to run so that we can
+	 * decide whether to preempt the entire chain so that it is ready to
+	 * run at the earliest possible convenience.
+	 */
+	if (engine->schedule)
+		engine->schedule(request, 0);
+
 	local_bh_disable();
 	i915_sw_fence_commit(&request->submit);
 	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 94de3d66733d..3e8ecbd9b95d 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -102,6 +102,9 @@ intel_engine_setup(struct drm_i915_private *dev_priv,
 	engine->mmio_base = info->mmio_base;
 	engine->irq_shift = info->irq_shift;
 
+	/* Nothing to do here, execute in order of dependencies */
+	engine->schedule = NULL;
+
 	dev_priv->engine[id] = engine;
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index dcb145c67e89..97e162efe557 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -267,6 +267,15 @@ struct intel_engine_cs {
 	 */
 	void		(*submit_request)(struct drm_i915_gem_request *req);
 
+	/* Call when the priority on a request has changed and it and its
+	 * dependencies may need rescheduling. Note the request itself may
+	 * not be ready to run!
+	 *
+	 * Called under the struct_mutex.
+	 */
+	void		(*schedule)(struct drm_i915_gem_request *request,
+				    int priority);
+
 	/* Some chipsets are not quite as coherent as advertised and need
 	 * an expensive kick to force a true read of the up-to-date seqno.
 	 * However, the up-to-date seqno is not always required and the last
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index a1d04d8bc80a..f51d429feaae 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -402,6 +402,11 @@ typedef struct drm_i915_irq_wait {
  */
 #define I915_PARAM_HAS_EXEC_FENCE	 42
 
+/* Query whether DRM_I915_GEM_EXECBUFFER2 supports user defined execution
+ * priorities and the driver will attempt to execute batches in priority order.
+ */
+#define I915_PARAM_HAS_SCHEDULER	 43
+
 typedef struct drm_i915_getparam {
 	__s32 param;
 	/*
-- 
2.10.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/6] drm/i915/scheduler: Record all dependencies upon request construction
  2016-10-27  0:03 Trivial scheduler Chris Wilson
  2016-10-27  0:03 ` [PATCH 1/6] drm/i915: Show the execlist queue in debugfs/i915_engine_info Chris Wilson
  2016-10-27  0:03 ` [PATCH 2/6] drm/i915/scheduler: Signal the arrival of a new request Chris Wilson
@ 2016-10-27  0:03 ` Chris Wilson
  2016-10-27  0:03 ` [PATCH 4/6] drm/i915/scheduler: Execute requests in order of priorities Chris Wilson
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2016-10-27  0:03 UTC (permalink / raw)
  To: intel-gfx

The scheduler needs to know the dependencies of each request for the
lifetime of the request, as it may choose to reschedule the requests at
any time and must ensure the dependency tree is not broken. This is in
additional to using the fence to only allow execution after all
dependencies have been completed.

One option was to extend the fence to support the bidirectional
dependency tracking required by the scheduler. However the mismatch in
lifetimes between the submit fence and the request essentially meant
that we had to build a completely separate struct (and we could not
simply reuse the existing waitqueue in the fence for one half of the
dependency tracking). The extra dependency tracking simply did not mesh
well with the fence, and keeping it separate both keeps the fence
implementation simpler and allows us to extend the dependency tracking
into a priority tree (whilst maintaining support for reordering the
tree).

To avoid the additional allocations and list manipulations, the use of
the priotree is disabled when there are no schedulers to use it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_request.c | 71 ++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_request.h | 23 +++++++++++
 2 files changed, 93 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 864ab686af33..cf06b036b47a 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -113,6 +113,59 @@ i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
 	spin_unlock(&file_priv->mm.lock);
 }
 
+static int
+i915_priotree_add_dependency(struct i915_priotree *pt,
+			     struct i915_priotree *signal,
+			     struct i915_dependency *dep)
+{
+	unsigned long flags = 0;
+
+	if (!dep) {
+		dep = kmalloc(sizeof(*dep), GFP_KERNEL);
+		if (!dep)
+			return -ENOMEM;
+
+		flags |= I915_DEPENDENCY_ALLOC;
+	}
+
+	dep->signal = signal;
+	dep->flags = flags;
+
+	list_add(&dep->post_link, &signal->post_list);
+	list_add(&dep->pre_link, &pt->pre_list);
+	return 0;
+}
+
+static void
+i915_priotree_fini(struct i915_priotree *pt)
+{
+	struct i915_dependency *dep, *next;
+
+	/* Everyone we depended upon should retire before us and remove
+	 * themselves from our list. However, retirement is run independently
+	 * on each timeline and so we may be called out-of-order.
+	 */
+	list_for_each_entry_safe(dep, next, &pt->pre_list, pre_link) {
+		list_del(&dep->post_link);
+		if (dep->flags & I915_DEPENDENCY_ALLOC)
+			kfree(dep);
+	}
+
+	/* Remove ourselves from everyone who depends upon us */
+	list_for_each_entry_safe(dep, next, &pt->post_list, post_link) {
+		list_del(&dep->pre_link);
+		if (dep->flags & I915_DEPENDENCY_ALLOC)
+			kfree(dep);
+	}
+}
+
+static void
+i915_priotree_init(struct i915_priotree *pt)
+{
+	INIT_LIST_HEAD(&pt->pre_list);
+	INIT_LIST_HEAD(&pt->post_list);
+}
+
 void i915_gem_retire_noop(struct i915_gem_active *active,
 			  struct drm_i915_gem_request *request)
 {
@@ -182,6 +235,8 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 	i915_gem_context_put(request->ctx);
 
 	dma_fence_signal(&request->fence);
+
+	i915_priotree_fini(&request->priotree);
 	i915_gem_request_put(request);
 }
 
@@ -441,6 +496,7 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
 		       __timeline_get_seqno(req->timeline->common));
 
 	i915_sw_fence_init(&req->submit, submit_notify);
+	i915_priotree_init(&req->priotree);
 
 	INIT_LIST_HEAD(&req->active_list);
 	req->i915 = dev_priv;
@@ -495,6 +551,14 @@ i915_gem_request_await_request(struct drm_i915_gem_request *to,
 
 	GEM_BUG_ON(to == from);
 
+	if (to->engine->schedule) {
+		ret = i915_priotree_add_dependency(&to->priotree,
+						   &from->priotree,
+						   NULL);
+		if (ret < 0)
+			return ret;
+	}
+
 	if (to->timeline == from->timeline)
 		return 0;
 
@@ -718,9 +782,14 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
 
 	prev = i915_gem_active_raw(&timeline->last_request,
 				   &request->i915->drm.struct_mutex);
-	if (prev)
+	if (prev) {
 		i915_sw_fence_await_sw_fence(&request->submit, &prev->submit,
 					     &request->submitq);
+		if (engine->schedule)
+			i915_priotree_add_dependency(&request->priotree,
+						     &prev->priotree,
+						     &request->depq);
+	}
 
 	spin_lock_irq(&timeline->lock);
 	list_add_tail(&request->link, &timeline->requests);
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 75f8360b3421..5841bbfa39ad 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -41,6 +41,18 @@ struct intel_signal_node {
 	struct intel_wait wait;
 };
 
+struct i915_dependency {
+	struct i915_priotree *signal;
+	struct list_head pre_link, post_link;
+	unsigned long flags;
+#define I915_DEPENDENCY_ALLOC BIT(0)
+};
+
+struct i915_priotree {
+	struct list_head pre_list; /* who is before us, we depend upon */
+	struct list_head post_list; /* who is after us, they depend upon us */
+};
+
 /**
  * Request queue structure.
  *
@@ -87,6 +99,17 @@ struct drm_i915_gem_request {
 	struct i915_sw_fence submit;
 	wait_queue_t submitq;
 
+	/* A list of everyone we wait upon, and everyone who waits upon us.
+	 * Even though we will not be submitted to the hardware before the
+	 * submit fence is signaled (it waits for all external events as well
+	 * as our own requests), the scheduler still needs to know the
+	 * dependency tree for the lifetime of the request (from execbuf
+	 * to retirement), i.e. bidirectional dependency information for the
+	 * request not tied to individual fences.
+	 */
+	struct i915_priotree priotree;
+	struct i915_dependency depq;
+
 	u32 global_seqno;
 
 	/** GEM sequence number associated with the previous request,
-- 
2.10.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/6] drm/i915/scheduler: Execute requests in order of priorities
  2016-10-27  0:03 Trivial scheduler Chris Wilson
                   ` (2 preceding siblings ...)
  2016-10-27  0:03 ` [PATCH 3/6] drm/i915/scheduler: Record all dependencies upon request construction Chris Wilson
@ 2016-10-27  0:03 ` Chris Wilson
  2016-10-27  7:45   ` Chris Wilson
  2016-10-27  0:03 ` [PATCH 5/6] drm/i915/scheduler: Boost priorities for flips Chris Wilson
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Chris Wilson @ 2016-10-27  0:03 UTC (permalink / raw)
  To: intel-gfx

Track the priority of each request and use it to determine the order in
which we submit requests to the hardware via execlists.

The priority of the request is determined by the user (eventually via
the context) but may be overridden at any time by the driver. When we set
the priority of the request, we bump the priority of all of its
dependencies to match - so that a high priority drawing operation is not
stuck behind a background task.

When the request is ready to execute (i.e. we have signaled the submit
fence following completion of all its dependencies, including third
party fences), we put the request into a priority sorted rbtree to be
submitted to the hardware. If the request is higher priority than all
pending requests, it will be submitted on the next context-switch
interrupt as soon as the hardware has completed the current request. We
do not currently preempt any current execution to immediately run a very
high priority request, at least not yet.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |  7 ++-
 drivers/gpu/drm/i915/i915_gem.c            |  3 +-
 drivers/gpu/drm/i915/i915_gem_request.c    |  4 ++
 drivers/gpu/drm/i915/i915_gem_request.h    |  5 +-
 drivers/gpu/drm/i915/i915_guc_submission.c |  1 +
 drivers/gpu/drm/i915/intel_engine_cs.c     |  3 +-
 drivers/gpu/drm/i915/intel_lrc.c           | 93 +++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/intel_ringbuffer.h    |  3 +-
 8 files changed, 103 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index e9382ea2e626..fcf171e468c6 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -631,8 +631,9 @@ static void print_request(struct seq_file *m,
 			  struct drm_i915_gem_request *rq,
 			  const char *prefix)
 {
-	seq_printf(m, "%s%x [%x:%x] @ %d: %s\n", prefix,
+	seq_printf(m, "%s%x [%x:%x] prio=%d @ %dms: %s\n", prefix,
 		   rq->global_seqno, rq->ctx->hw_id, rq->fence.seqno,
+		   rq->priotree.priority,
 		   jiffies_to_msecs(jiffies - rq->emitted_jiffies),
 		   rq->timeline->common->name);
 }
@@ -3219,6 +3220,7 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 
 		if (i915.enable_execlists) {
 			u32 ptr, read, write;
+			struct rb_node *rb;
 
 			seq_printf(m, "\tExeclist status: 0x%08x %08x\n",
 				   I915_READ(RING_EXECLIST_STATUS_LO(engine)),
@@ -3258,7 +3260,8 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 			rcu_read_unlock();
 
 			spin_lock_irq(&engine->execlist_lock);
-			list_for_each_entry(rq, &engine->execlist_queue, execlist_link) {
+			for (rb = engine->execlist_first; rb; rb = rb_next(rb)) {
+				rq = rb_entry(rb, typeof(*rq), priotree.node);
 				print_request(m, rq, "\t\tQ ");
 			}
 			spin_unlock_irq(&engine->execlist_lock);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index bdc18c680afe..5a29b2bf9750 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2683,10 +2683,11 @@ static void i915_gem_cleanup_engine(struct intel_engine_cs *engine)
 
 	if (i915.enable_execlists) {
 		spin_lock(&engine->execlist_lock);
-		INIT_LIST_HEAD(&engine->execlist_queue);
 		i915_gem_request_put(engine->execlist_port[0].request);
 		i915_gem_request_put(engine->execlist_port[1].request);
 		memset(engine->execlist_port, 0, sizeof(engine->execlist_port));
+		engine->execlist_queue = RB_ROOT;
+		engine->execlist_first = NULL;
 		spin_unlock(&engine->execlist_lock);
 	}
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index cf06b036b47a..a62911a6a5fa 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -141,6 +141,8 @@ i915_priotree_fini(struct i915_priotree *pt)
 {
 	struct i915_dependency *dep, *next;
 
+	GEM_BUG_ON(!RB_EMPTY_NODE(&pt->node));
+
 	/* Everyone we depended upon should retire before us and remove
 	 * themselves from our list. However, retirement is run independently
 	 * on each timeline and so we may be called out-of-order.
@@ -164,6 +166,8 @@ i915_priotree_init(struct i915_priotree *pt)
 {
 	INIT_LIST_HEAD(&pt->pre_list);
 	INIT_LIST_HEAD(&pt->post_list);
+	RB_CLEAR_NODE(&pt->node);
+	pt->priority = INT_MIN;
 }
 
 void i915_gem_retire_noop(struct i915_gem_active *active,
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 5841bbfa39ad..d182a772136c 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -51,6 +51,8 @@ struct i915_dependency {
 struct i915_priotree {
 	struct list_head pre_list; /* who is before us, we depend upon */
 	struct list_head post_list; /* who is after us, they depend upon us */
+	struct rb_node node;
+	int priority;
 };
 
 /**
@@ -166,9 +168,6 @@ struct drm_i915_gem_request {
 	struct drm_i915_file_private *file_priv;
 	/** file_priv list entry for this request */
 	struct list_head client_list;
-
-	/** Link in the execlist submission queue, guarded by execlist_lock. */
-	struct list_head execlist_link;
 };
 
 extern const struct dma_fence_ops i915_fence_ops;
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 857ef914cae7..8ef61bbe2dba 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -1520,6 +1520,7 @@ int i915_guc_submission_enable(struct drm_i915_private *dev_priv)
 	/* Take over from manual control of ELSP (execlists) */
 	for_each_engine(engine, dev_priv, id) {
 		engine->submit_request = i915_guc_submit;
+		engine->schedule = NULL;
 
 		/* Replay the current set of previously submitted requests */
 		list_for_each_entry(request,
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 3e8ecbd9b95d..ddcd8198b0d2 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -244,8 +244,9 @@ static void intel_engine_init_timeline(struct intel_engine_cs *engine)
  */
 void intel_engine_setup_common(struct intel_engine_cs *engine)
 {
-	INIT_LIST_HEAD(&engine->execlist_queue);
 	spin_lock_init(&engine->execlist_lock);
+	engine->execlist_queue = RB_ROOT;
+	engine->execlist_first = NULL;
 
 	intel_engine_init_timeline(engine);
 	intel_engine_init_hangcheck(engine);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index fa3012c342cc..eafd20b5219a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -432,8 +432,9 @@ static bool can_merge_ctx(const struct i915_gem_context *prev,
 
 static void execlists_dequeue(struct intel_engine_cs *engine)
 {
-	struct drm_i915_gem_request *cursor, *last;
+	struct drm_i915_gem_request *last;
 	struct execlist_port *port = engine->execlist_port;
+	struct rb_node *rb;
 	bool submit = false;
 
 	last = port->request;
@@ -470,7 +471,11 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 */
 
 	spin_lock(&engine->execlist_lock);
-	list_for_each_entry(cursor, &engine->execlist_queue, execlist_link) {
+	rb = engine->execlist_first;
+	while (rb) {
+		struct drm_i915_gem_request *cursor =
+			rb_entry(rb, typeof(*cursor), priotree.node);
+
 		/* Can we combine this request with the current port? It has to
 		 * be the same context/ringbuffer and not have any exceptions
 		 * (e.g. GVT saying never to combine contexts).
@@ -501,15 +506,17 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			i915_gem_request_assign(&port->request, last);
 			port++;
 		}
+
+		rb = rb_next(rb);
+		rb_erase(&cursor->priotree.node, &engine->execlist_queue);
+		RB_CLEAR_NODE(&cursor->priotree.node);
+
 		last = cursor;
 		submit = true;
 	}
 	if (submit) {
-		/* Decouple all the requests submitted from the queue */
-		engine->execlist_queue.next = &cursor->execlist_link;
-		cursor->execlist_link.prev = &engine->execlist_queue;
-
 		i915_gem_request_assign(&port->request, last);
+		engine->execlist_first = rb;
 	}
 	spin_unlock(&engine->execlist_lock);
 
@@ -592,6 +599,32 @@ static void intel_lrc_irq_handler(unsigned long data)
 	intel_uncore_forcewake_put(dev_priv, engine->fw_domains);
 }
 
+static bool insert_request(struct i915_priotree *pt, struct rb_root *root)
+{
+	struct rb_node **p, *rb;
+	bool first = true;
+
+	/* more positive priorities are scheduled first */
+	rb = NULL;
+	p = &root->rb_node;
+	while (*p) {
+		struct i915_priotree *pos;
+
+		rb = *p;
+		pos = rb_entry(rb, typeof(*pos), node);
+		if (pos->priority >= pt->priority) { /* fifo */
+			p = &rb->rb_right;
+			first = false;
+		} else {
+			p = &rb->rb_left;
+		}
+	}
+	rb_link_node(&pt->node, rb, p);
+	rb_insert_color(&pt->node, root);
+
+	return first;
+}
+
 static void execlists_submit_request(struct drm_i915_gem_request *request)
 {
 	struct intel_engine_cs *engine = request->engine;
@@ -608,13 +641,54 @@ static void execlists_submit_request(struct drm_i915_gem_request *request)
 	request->previous_context = engine->last_context;
 	engine->last_context = request->ctx;
 
-	list_add_tail(&request->execlist_link, &engine->execlist_queue);
+	if (insert_request(&request->priotree, &engine->execlist_queue))
+		engine->execlist_first = &request->priotree.node;
 	if (execlists_elsp_idle(engine))
 		tasklet_hi_schedule(&engine->irq_tasklet);
 
 	spin_unlock_irqrestore(&engine->execlist_lock, flags);
 }
 
+static void update_priorities(struct i915_priotree *pt, int prio)
+{
+	struct drm_i915_gem_request *request =
+		container_of(pt, struct drm_i915_gem_request, priotree);
+	struct i915_dependency *dep;
+
+	if (prio <= pt->priority)
+		return;
+
+	if (i915_gem_request_completed(request))
+		return;
+
+	pt->priority = prio;
+
+	/* Recursively bump all dependent priorities to match the new request */
+	list_for_each_entry(dep, &pt->pre_list, pre_link)
+		update_priorities(dep->signal, prio);
+
+	/* Ensure the fifo tree remains correctly ordered wrt to deps */
+	if (!RB_EMPTY_NODE(&pt->node)) {
+		struct intel_engine_cs *engine = request->engine;
+
+		spin_lock_irq(&engine->execlist_lock);
+		if (!RB_EMPTY_NODE(&pt->node)) {
+			rb_erase(&pt->node, &engine->execlist_queue);
+			if (insert_request(pt, &engine->execlist_queue))
+				engine->execlist_first = &pt->node;
+		}
+		spin_unlock_irq(&engine->execlist_lock);
+	}
+}
+
+static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
+{
+	/* Make sure that our entire dependency chain shares our prio */
+	update_priorities(&request->priotree, prio);
+
+	/* XXX Do we need to preempt to make room for us and our deps? */
+}
+
 int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request)
 {
 	struct intel_engine_cs *engine = request->engine;
@@ -1651,8 +1725,10 @@ void intel_execlists_enable_submission(struct drm_i915_private *dev_priv)
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
-	for_each_engine(engine, dev_priv, id)
+	for_each_engine(engine, dev_priv, id) {
 		engine->submit_request = execlists_submit_request;
+		engine->schedule = execlists_schedule;
+	}
 }
 
 static void
@@ -1665,6 +1741,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->emit_breadcrumb = gen8_emit_breadcrumb;
 	engine->emit_breadcrumb_sz = gen8_emit_breadcrumb_sz;
 	engine->submit_request = execlists_submit_request;
+	engine->schedule = execlists_schedule;
 
 	engine->irq_enable = gen8_logical_ring_enable_irq;
 	engine->irq_disable = gen8_logical_ring_disable_irq;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 97e162efe557..a24a7e2e6c4c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -349,7 +349,8 @@ struct intel_engine_cs {
 		struct drm_i915_gem_request *request;
 		unsigned int count;
 	} execlist_port[2];
-	struct list_head execlist_queue;
+	struct rb_root execlist_queue;
+	struct rb_node *execlist_first;
 	unsigned int fw_domains;
 	bool disable_lite_restore_wa;
 	bool preempt_wa;
-- 
2.10.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5/6] drm/i915/scheduler: Boost priorities for flips
  2016-10-27  0:03 Trivial scheduler Chris Wilson
                   ` (3 preceding siblings ...)
  2016-10-27  0:03 ` [PATCH 4/6] drm/i915/scheduler: Execute requests in order of priorities Chris Wilson
@ 2016-10-27  0:03 ` Chris Wilson
  2016-10-27  0:03 ` [PATCH 6/6] drm/i915/scheduler: Support user-defined priorities Chris Wilson
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2016-10-27  0:03 UTC (permalink / raw)
  To: intel-gfx

Boost the priority of any rendering required to show the next pageflip
as we want to avoid missing the vblank by being delayed by invisible
workload. We prioritise avoiding jank and jitter in the GUI over
starving background tasks.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h      |  5 ++++
 drivers/gpu/drm/i915/i915_gem.c      | 54 ++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_display.c |  2 ++
 3 files changed, 61 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a53c80172752..075ce2b503ed 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3386,6 +3386,11 @@ int i915_gem_object_wait(struct drm_i915_gem_object *obj,
 			 unsigned int flags,
 			 long timeout,
 			 struct intel_rps_client *rps);
+int i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
+				  unsigned int flags,
+				  int priority);
+#define I915_PRIORITY_DISPLAY 1024
+
 int __must_check
 i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj,
 				  bool write);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5a29b2bf9750..055c9e48ebe6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -433,6 +433,60 @@ i915_gem_object_wait_reservation(struct reservation_object *resv,
 	return timeout;
 }
 
+static void
+i915_gem_object_boost_fence(struct dma_fence *fence, int boost)
+{
+	struct drm_i915_gem_request *rq;
+	struct intel_engine_cs *engine;
+
+	if (!dma_fence_is_i915(fence))
+		return;
+
+	rq = to_request(fence);
+	if (i915_gem_request_completed(rq))
+		return;
+
+	engine = rq->engine;
+	if (!engine->schedule)
+		return;
+
+	engine->schedule(rq, rq->priotree.priority + boost);
+}
+
+int
+i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
+			      unsigned int flags,
+			      int boost)
+{
+	struct dma_fence *excl;
+
+	if (flags & I915_WAIT_ALL) {
+		struct dma_fence **shared;
+		unsigned int count, i;
+		int ret;
+
+		ret = reservation_object_get_fences_rcu(obj->resv,
+							&excl, &count, &shared);
+		if (ret)
+			return ret;
+
+		for (i = 0; i < count; i++) {
+			i915_gem_object_boost_fence(shared[i], boost);
+			dma_fence_put(shared[i]);
+		}
+
+		kfree(shared);
+	} else {
+		excl = reservation_object_get_excl_rcu(obj->resv);
+	}
+
+	if (excl) {
+		i915_gem_object_boost_fence(excl, boost);
+		dma_fence_put(excl);
+	}
+	return 0;
+}
+
 /**
  * Waits for rendering to the object to be completed
  * @obj: i915 gem object
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 39cda13b6267..45d21421a2be 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -14213,6 +14213,8 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 		}
 	}
 
+	i915_gem_object_wait_priority(obj, 0, I915_PRIORITY_DISPLAY);
+
 	return 0;
 }
 
-- 
2.10.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 6/6] drm/i915/scheduler: Support user-defined priorities
  2016-10-27  0:03 Trivial scheduler Chris Wilson
                   ` (4 preceding siblings ...)
  2016-10-27  0:03 ` [PATCH 5/6] drm/i915/scheduler: Boost priorities for flips Chris Wilson
@ 2016-10-27  0:03 ` Chris Wilson
  2016-10-27  0:46 ` ✓ Fi.CI.BAT: success for series starting with [1/6] drm/i915: Show the execlist queue in debugfs/i915_engine_info Patchwork
  2016-10-27 14:05 ` [PATCH v2 1/9] " Chris Wilson
  7 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2016-10-27  0:03 UTC (permalink / raw)
  To: intel-gfx

Use a priority stored in the context as the initial value when
submitting a request. This allows us to change the default priority on a
per-context basis, allowing different contexts to be favoured with GPU
time at the expense of lower importance work. The user can adjust the
context's priority via I915_CONTEXT_PARAM_PRIORITY, with more positive
values being higher priority (they will be serviced earlier, after their
dependencies have been resolved). Any prerequisite work for an execbuf
will have its priority raised to match the new request as required.

Normal users can specify any value in the range of -1023 to 0 [default],
i.e. they can reduce the priority of their workloads (and temporarily
boost it back to normal if so desired).

Privileged users can specify any value in the range of -1023 to 1023,
[default is 0], i.e. they can raise their priority above all overs and
so potentially starve the system.

Note that the existing schedulers are not fair, nor load balancing, the
execution is strictly by priority on a first-come, first-served basis,
and the driver may choose to boost some requests above the range
available to users.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h         |  1 +
 drivers/gpu/drm/i915/i915_gem_context.c | 19 +++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_request.c |  2 +-
 include/uapi/drm/i915_drm.h             |  3 +++
 4 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 075ce2b503ed..b909a9ba6722 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -948,6 +948,7 @@ struct i915_gem_context {
 	/* Unique identifier for this context, used by the hw for tracking */
 	unsigned int hw_id;
 	u32 user_handle;
+	int priority; /* greater priorities are serviced first */
 
 	u32 ggtt_alignment;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 34daef828dca..f688807b1c06 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1095,6 +1095,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
 		args->value = !!(ctx->flags & CONTEXT_NO_ERROR_CAPTURE);
 		break;
+	case I915_CONTEXT_PARAM_PRIORITY:
+		args->value = ctx->priority;
+		break;
 	default:
 		ret = -EINVAL;
 		break;
@@ -1150,6 +1153,22 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 				ctx->flags &= ~CONTEXT_NO_ERROR_CAPTURE;
 		}
 		break;
+
+	case I915_CONTEXT_PARAM_PRIORITY:
+		{
+			int priority = args->value;
+
+			if (args->size)
+				ret = -EINVAL;
+			else if (priority >= 1024 || priority <= -1024)
+				ret = -EINVAL;
+			else if (priority > 0 && !capable(CAP_SYS_ADMIN))
+				ret = -EPERM;
+			else
+				ctx->priority = priority;
+		}
+		break;
+
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index a62911a6a5fa..cab40e3d044a 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -821,7 +821,7 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
 	 * run at the earliest possible convenience.
 	 */
 	if (engine->schedule)
-		engine->schedule(request, 0);
+		engine->schedule(request, request->ctx->priority);
 
 	local_bh_disable();
 	i915_sw_fence_commit(&request->submit);
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index f51d429feaae..9fa5eee64f6b 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -404,6 +404,8 @@ typedef struct drm_i915_irq_wait {
 
 /* Query whether DRM_I915_GEM_EXECBUFFER2 supports user defined execution
  * priorities and the driver will attempt to execute batches in priority order.
+ * The initial priority for each batch is supplied by the context and is
+ * controlled via I915_CONTEXT_PARAM_PRIORITY.
  */
 #define I915_PARAM_HAS_SCHEDULER	 43
 
@@ -1283,6 +1285,7 @@ struct drm_i915_gem_context_param {
 #define I915_CONTEXT_PARAM_NO_ZEROMAP	0x2
 #define I915_CONTEXT_PARAM_GTT_SIZE	0x3
 #define I915_CONTEXT_PARAM_NO_ERROR_CAPTURE	0x4
+#define I915_CONTEXT_PARAM_PRIORITY	0x5
 	__u64 value;
 };
 
-- 
2.10.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* ✓ Fi.CI.BAT: success for series starting with [1/6] drm/i915: Show the execlist queue in debugfs/i915_engine_info
  2016-10-27  0:03 Trivial scheduler Chris Wilson
                   ` (5 preceding siblings ...)
  2016-10-27  0:03 ` [PATCH 6/6] drm/i915/scheduler: Support user-defined priorities Chris Wilson
@ 2016-10-27  0:46 ` Patchwork
  2016-10-27 14:05 ` [PATCH v2 1/9] " Chris Wilson
  7 siblings, 0 replies; 21+ messages in thread
From: Patchwork @ 2016-10-27  0:46 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/6] drm/i915: Show the execlist queue in debugfs/i915_engine_info
URL   : https://patchwork.freedesktop.org/series/14444/
State : success

== Summary ==

Series 14444v1 Series without cover letter
https://patchwork.freedesktop.org/api/1.0/series/14444/revisions/1/mbox/


fi-bdw-5557u     total:246  pass:231  dwarn:0   dfail:0   fail:0   skip:15 
fi-bsw-n3050     total:246  pass:204  dwarn:0   dfail:0   fail:0   skip:42 
fi-bxt-t5700     total:246  pass:216  dwarn:0   dfail:0   fail:0   skip:30 
fi-byt-j1900     total:246  pass:215  dwarn:0   dfail:0   fail:0   skip:31 
fi-byt-n2820     total:246  pass:211  dwarn:0   dfail:0   fail:0   skip:35 
fi-hsw-4770      total:246  pass:224  dwarn:0   dfail:0   fail:0   skip:22 
fi-hsw-4770r     total:246  pass:223  dwarn:0   dfail:0   fail:0   skip:23 
fi-ilk-650       total:246  pass:185  dwarn:0   dfail:0   fail:0   skip:61 
fi-ivb-3520m     total:246  pass:220  dwarn:0   dfail:0   fail:0   skip:26 
fi-ivb-3770      total:246  pass:220  dwarn:0   dfail:0   fail:0   skip:26 
fi-kbl-7200u     total:246  pass:222  dwarn:0   dfail:0   fail:0   skip:24 
fi-skl-6260u     total:246  pass:232  dwarn:0   dfail:0   fail:0   skip:14 
fi-skl-6700hq    total:246  pass:223  dwarn:0   dfail:0   fail:0   skip:23 
fi-skl-6700k     total:246  pass:222  dwarn:1   dfail:0   fail:0   skip:23 
fi-skl-6770hq    total:246  pass:232  dwarn:0   dfail:0   fail:0   skip:14 
fi-snb-2520m     total:246  pass:209  dwarn:0   dfail:0   fail:0   skip:37 
fi-snb-2600      total:246  pass:208  dwarn:0   dfail:0   fail:0   skip:38 

5854c608e6354948f1f5e6be53132c897d0d4902 drm-intel-nightly: 2016y-10m-26d-21h-22m-02s UTC integration manifest
44515eb drm/i915: Show the execlist queue in debugfs/i915_engine_info

Full results at https://intel-gfx-ci.01.org/CI/Patchwork_2835/

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_2835/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/6] drm/i915/scheduler: Execute requests in order of priorities
  2016-10-27  0:03 ` [PATCH 4/6] drm/i915/scheduler: Execute requests in order of priorities Chris Wilson
@ 2016-10-27  7:45   ` Chris Wilson
  0 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2016-10-27  7:45 UTC (permalink / raw)
  To: intel-gfx

On Thu, Oct 27, 2016 at 01:03:46AM +0100, Chris Wilson wrote:
>  	spin_lock(&engine->execlist_lock);
> -	list_for_each_entry(cursor, &engine->execlist_queue, execlist_link) {
> +	rb = engine->execlist_first;
> +	while (rb) {
> +		struct drm_i915_gem_request *cursor =
> +			rb_entry(rb, typeof(*cursor), priotree.node);
> +
>  		/* Can we combine this request with the current port? It has to
>  		 * be the same context/ringbuffer and not have any exceptions
>  		 * (e.g. GVT saying never to combine contexts).
> @@ -501,15 +506,17 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  			i915_gem_request_assign(&port->request, last);
>  			port++;
>  		}
> +
> +		rb = rb_next(rb);
> +		rb_erase(&cursor->priotree.node, &engine->execlist_queue);
> +		RB_CLEAR_NODE(&cursor->priotree.node);

Whoopsie, forgot to maintain the retirement order of the timeline...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 1/9] drm/i915: Show the execlist queue in debugfs/i915_engine_info
  2016-10-27  0:03 Trivial scheduler Chris Wilson
                   ` (6 preceding siblings ...)
  2016-10-27  0:46 ` ✓ Fi.CI.BAT: success for series starting with [1/6] drm/i915: Show the execlist queue in debugfs/i915_engine_info Patchwork
@ 2016-10-27 14:05 ` Chris Wilson
  2016-10-27 14:05   ` [PATCH v2 2/9] drm/i915: Split request submit/execute phase into two Chris Wilson
                     ` (7 more replies)
  7 siblings, 8 replies; 21+ messages in thread
From: Chris Wilson @ 2016-10-27 14:05 UTC (permalink / raw)
  To: intel-gfx

When looking at freezes whilst working on execlists, knowing the order
of the pending requests in the driver is useful.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 8d3cb8f5ff41..6ccd683a4fbc 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3256,6 +3256,12 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 			else
 				seq_printf(m, "\t\tELSP[1] idle\n");
 			rcu_read_unlock();
+
+			spin_lock_irq(&engine->execlist_lock);
+			list_for_each_entry(rq, &engine->execlist_queue, execlist_link) {
+				print_request(m, rq, "\t\tQ ");
+			}
+			spin_unlock_irq(&engine->execlist_lock);
 		} else if (INTEL_GEN(dev_priv) > 6) {
 			seq_printf(m, "\tPP_DIR_BASE: 0x%08x\n",
 				   I915_READ(RING_PP_DIR_BASE(engine)));
-- 
2.10.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 2/9] drm/i915: Split request submit/execute phase into two
  2016-10-27 14:05 ` [PATCH v2 1/9] " Chris Wilson
@ 2016-10-27 14:05   ` Chris Wilson
  2016-10-27 14:05   ` [PATCH v2 3/9] drm/i915: Defer transfer onto execution timeline to actual hw submission Chris Wilson
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2016-10-27 14:05 UTC (permalink / raw)
  To: intel-gfx

In order to support deferred scheduling, we need to differentiate
between when the request is ready to run (i.e. the submit fence is
signaled) and when the request is actually run (a new execute fence).
This is typically split between the request itself wanting to wait upon
others (for which we use the submit fence) and the CPU wanting to wait
upon the request, for which we use the execute fence to be sure the
hardware is ready to signal completion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_request.c | 33 ++++++++++++++++++++++++---------
 drivers/gpu/drm/i915/i915_gem_request.h |  2 ++
 2 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 79b0046d9a57..1ae5a2f8953f 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -351,11 +351,19 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 	list_move_tail(&request->link, &timeline->requests);
 	spin_unlock(&request->timeline->lock);
 
+	i915_sw_fence_commit(&request->execute);
+
 	spin_unlock_irqrestore(&timeline->lock, flags);
 
 	return NOTIFY_DONE;
 }
 
+static int __i915_sw_fence_call
+execute_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
+{
+	return NOTIFY_DONE;
+}
+
 /**
  * i915_gem_request_alloc - allocate a request structure
  *
@@ -441,6 +449,12 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
 		       __timeline_get_seqno(req->timeline->common));
 
 	i915_sw_fence_init(&req->submit, submit_notify);
+	i915_sw_fence_init(&req->execute, execute_notify);
+	/* Ensure that the execute fence completes after the submit fence -
+	 * as we complete the execute fence from within the submit fence
+	 * callback, its completion would otherwise be visible first.
+	 */
+	i915_sw_fence_await_sw_fence(&req->execute, &req->submit, &req->execq);
 
 	INIT_LIST_HEAD(&req->active_list);
 	req->i915 = dev_priv;
@@ -817,9 +831,9 @@ bool __i915_spin_request(const struct drm_i915_gem_request *req,
 }
 
 static long
-__i915_request_wait_for_submit(struct drm_i915_gem_request *request,
-			       unsigned int flags,
-			       long timeout)
+__i915_request_wait_for_execute(struct drm_i915_gem_request *request,
+				unsigned int flags,
+				long timeout)
 {
 	const int state = flags & I915_WAIT_INTERRUPTIBLE ?
 		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
@@ -831,9 +845,9 @@ __i915_request_wait_for_submit(struct drm_i915_gem_request *request,
 		add_wait_queue(q, &reset);
 
 	do {
-		prepare_to_wait(&request->submit.wait, &wait, state);
+		prepare_to_wait(&request->execute.wait, &wait, state);
 
-		if (i915_sw_fence_done(&request->submit))
+		if (i915_sw_fence_done(&request->execute))
 			break;
 
 		if (flags & I915_WAIT_LOCKED &&
@@ -851,7 +865,7 @@ __i915_request_wait_for_submit(struct drm_i915_gem_request *request,
 
 		timeout = io_schedule_timeout(timeout);
 	} while (timeout);
-	finish_wait(&request->submit.wait, &wait);
+	finish_wait(&request->execute.wait, &wait);
 
 	if (flags & I915_WAIT_LOCKED)
 		remove_wait_queue(q, &reset);
@@ -903,13 +917,14 @@ long i915_wait_request(struct drm_i915_gem_request *req,
 
 	trace_i915_gem_request_wait_begin(req);
 
-	if (!i915_sw_fence_done(&req->submit)) {
-		timeout = __i915_request_wait_for_submit(req, flags, timeout);
+	if (!i915_sw_fence_done(&req->execute)) {
+		timeout = __i915_request_wait_for_execute(req, flags, timeout);
 		if (timeout < 0)
 			goto complete;
 
-		GEM_BUG_ON(!i915_sw_fence_done(&req->submit));
+		GEM_BUG_ON(!i915_sw_fence_done(&req->execute));
 	}
+	GEM_BUG_ON(!i915_sw_fence_done(&req->submit));
 	GEM_BUG_ON(!req->global_seqno);
 
 	/* Optimistic short spin before touching IRQs */
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 75f8360b3421..ed13f37fea0f 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -85,7 +85,9 @@ struct drm_i915_gem_request {
 	struct intel_signal_node signaling;
 
 	struct i915_sw_fence submit;
+	struct i915_sw_fence execute;
 	wait_queue_t submitq;
+	wait_queue_t execq;
 
 	u32 global_seqno;
 
-- 
2.10.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 3/9] drm/i915: Defer transfer onto execution timeline to actual hw submission
  2016-10-27 14:05 ` [PATCH v2 1/9] " Chris Wilson
  2016-10-27 14:05   ` [PATCH v2 2/9] drm/i915: Split request submit/execute phase into two Chris Wilson
@ 2016-10-27 14:05   ` Chris Wilson
  2016-10-27 14:05   ` [PATCH v2 4/9] drm/i915: Remove engine->execlist_lock Chris Wilson
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2016-10-27 14:05 UTC (permalink / raw)
  To: intel-gfx

Defer the transfer from the client's timeline onto the execution
timeline from the point of readiness to the point of actual submission.
For example, in execlists, a request is finally submitted to hardware
when the hardware is ready, and only put onto the hardware queue when
the request is ready. By deferring the transfer, we ensure that the
timeline is maintained in retirement order if we decide to queue the
requests onto the hardware in a different order than fifo.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_request.c    | 34 ++++++++++++++++++------------
 drivers/gpu/drm/i915/i915_gem_request.h    |  2 ++
 drivers/gpu/drm/i915/i915_guc_submission.c |  2 ++
 drivers/gpu/drm/i915/intel_lrc.c           |  5 +++++
 drivers/gpu/drm/i915/intel_ringbuffer.c    |  2 ++
 5 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 1ae5a2f8953f..6fadabd6ad25 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -307,25 +307,16 @@ static u32 timeline_get_seqno(struct i915_gem_timeline *tl)
 	return atomic_inc_return(&tl->next_seqno);
 }
 
-static int __i915_sw_fence_call
-submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
+void __i915_gem_request_submit(struct drm_i915_gem_request *request)
 {
-	struct drm_i915_gem_request *request =
-		container_of(fence, typeof(*request), submit);
 	struct intel_engine_cs *engine = request->engine;
 	struct intel_timeline *timeline;
-	unsigned long flags;
 	u32 seqno;
 
-	if (state != FENCE_COMPLETE)
-		return NOTIFY_DONE;
-
 	/* Transfer from per-context onto the global per-engine timeline */
 	timeline = engine->timeline;
 	GEM_BUG_ON(timeline == request->timeline);
-
-	/* Will be called from irq-context when using foreign DMA fences */
-	spin_lock_irqsave(&timeline->lock, flags);
+	assert_spin_locked(&timeline->lock);
 
 	seqno = timeline_get_seqno(timeline->common);
 	GEM_BUG_ON(!seqno);
@@ -345,15 +336,32 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 	GEM_BUG_ON(!request->global_seqno);
 	engine->emit_breadcrumb(request,
 				request->ring->vaddr + request->postfix);
-	engine->submit_request(request);
 
 	spin_lock_nested(&request->timeline->lock, SINGLE_DEPTH_NESTING);
 	list_move_tail(&request->link, &timeline->requests);
 	spin_unlock(&request->timeline->lock);
 
 	i915_sw_fence_commit(&request->execute);
+}
 
-	spin_unlock_irqrestore(&timeline->lock, flags);
+static int __i915_sw_fence_call
+submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
+{
+	if (state == FENCE_COMPLETE) {
+		struct drm_i915_gem_request *request =
+			container_of(fence, typeof(*request), submit);
+		struct intel_engine_cs *engine = request->engine;
+		unsigned long flags;
+
+		/* Will be called from irq-context when using foreign fences,
+		 * and may be called from the execution callback of another
+		 * engine.
+		 */
+		spin_lock_irqsave_nested(&engine->timeline->lock, flags,
+					 SINGLE_DEPTH_NESTING);
+		engine->submit_request(request);
+		spin_unlock_irqrestore(&engine->timeline->lock, flags);
+	}
 
 	return NOTIFY_DONE;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index ed13f37fea0f..725d04128eaf 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -228,6 +228,8 @@ void __i915_add_request(struct drm_i915_gem_request *req, bool flush_caches);
 #define i915_add_request_no_flush(req) \
 	__i915_add_request(req, false)
 
+void __i915_gem_request_submit(struct drm_i915_gem_request *request);
+
 struct intel_rps_client;
 #define NO_WAITBOOST ERR_PTR(-1)
 #define IS_RPS_CLIENT(p) (!IS_ERR(p))
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 857ef914cae7..ddb574d5ceda 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -640,6 +640,8 @@ static void i915_guc_submit(struct drm_i915_gem_request *rq)
 	struct i915_guc_client *client = guc->execbuf_client;
 	int b_ret;
 
+	__i915_gem_request_submit(rq);
+
 	spin_lock(&client->wq_lock);
 	guc_wq_item_append(client, rq);
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index fa3012c342cc..b9aba16851ac 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -434,6 +434,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 {
 	struct drm_i915_gem_request *cursor, *last;
 	struct execlist_port *port = engine->execlist_port;
+	unsigned long flags;
 	bool submit = false;
 
 	last = port->request;
@@ -469,6 +470,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * and context switches) submission.
 	 */
 
+	spin_lock_irqsave(&engine->timeline->lock, flags);
 	spin_lock(&engine->execlist_lock);
 	list_for_each_entry(cursor, &engine->execlist_queue, execlist_link) {
 		/* Can we combine this request with the current port? It has to
@@ -501,6 +503,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			i915_gem_request_assign(&port->request, last);
 			port++;
 		}
+
+		__i915_gem_request_submit(cursor);
 		last = cursor;
 		submit = true;
 	}
@@ -512,6 +516,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		i915_gem_request_assign(&port->request, last);
 	}
 	spin_unlock(&engine->execlist_lock);
+	spin_unlock_irqrestore(&engine->timeline->lock, flags);
 
 	if (submit)
 		execlists_submit_ports(engine);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 188fdec5fa6b..d450d5a5326c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1294,6 +1294,8 @@ static void i9xx_submit_request(struct drm_i915_gem_request *request)
 {
 	struct drm_i915_private *dev_priv = request->i915;
 
+	__i915_gem_request_submit(request);
+
 	I915_WRITE_TAIL(request->engine, request->tail);
 }
 
-- 
2.10.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 4/9] drm/i915: Remove engine->execlist_lock
  2016-10-27 14:05 ` [PATCH v2 1/9] " Chris Wilson
  2016-10-27 14:05   ` [PATCH v2 2/9] drm/i915: Split request submit/execute phase into two Chris Wilson
  2016-10-27 14:05   ` [PATCH v2 3/9] drm/i915: Defer transfer onto execution timeline to actual hw submission Chris Wilson
@ 2016-10-27 14:05   ` Chris Wilson
  2016-10-27 14:05   ` [PATCH v2 5/9] drm/i915/scheduler: Signal the arrival of a new request Chris Wilson
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2016-10-27 14:05 UTC (permalink / raw)
  To: intel-gfx

The execlist_lock is now completely subsumed by the engine->timeline->lock,
and so we can remove the redundant layer of locking.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c     | 4 ++--
 drivers/gpu/drm/i915/i915_gem.c         | 4 ++--
 drivers/gpu/drm/i915/intel_engine_cs.c  | 1 -
 drivers/gpu/drm/i915/intel_lrc.c        | 7 +------
 drivers/gpu/drm/i915/intel_ringbuffer.h | 1 -
 5 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 6ccd683a4fbc..2a909e89728d 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3257,11 +3257,11 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 				seq_printf(m, "\t\tELSP[1] idle\n");
 			rcu_read_unlock();
 
-			spin_lock_irq(&engine->execlist_lock);
+			spin_lock_irq(&engine->timeline->lock);
 			list_for_each_entry(rq, &engine->execlist_queue, execlist_link) {
 				print_request(m, rq, "\t\tQ ");
 			}
-			spin_unlock_irq(&engine->execlist_lock);
+			spin_unlock_irq(&engine->timeline->lock);
 		} else if (INTEL_GEN(dev_priv) > 6) {
 			seq_printf(m, "\tPP_DIR_BASE: 0x%08x\n",
 				   I915_READ(RING_PP_DIR_BASE(engine)));
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3d0ee89dc57e..2a28501ec10e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2683,12 +2683,12 @@ static void i915_gem_cleanup_engine(struct intel_engine_cs *engine)
 	 */
 
 	if (i915.enable_execlists) {
-		spin_lock(&engine->execlist_lock);
+		spin_lock_irq(&engine->timeline->lock);
 		INIT_LIST_HEAD(&engine->execlist_queue);
 		i915_gem_request_put(engine->execlist_port[0].request);
 		i915_gem_request_put(engine->execlist_port[1].request);
 		memset(engine->execlist_port, 0, sizeof(engine->execlist_port));
-		spin_unlock(&engine->execlist_lock);
+		spin_unlock_irq(&engine->timeline->lock);
 	}
 }
 
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 94de3d66733d..33a5b6d61d63 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -242,7 +242,6 @@ static void intel_engine_init_timeline(struct intel_engine_cs *engine)
 void intel_engine_setup_common(struct intel_engine_cs *engine)
 {
 	INIT_LIST_HEAD(&engine->execlist_queue);
-	spin_lock_init(&engine->execlist_lock);
 
 	intel_engine_init_timeline(engine);
 	intel_engine_init_hangcheck(engine);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b9aba16851ac..186c3ccb2c34 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -471,7 +471,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 */
 
 	spin_lock_irqsave(&engine->timeline->lock, flags);
-	spin_lock(&engine->execlist_lock);
 	list_for_each_entry(cursor, &engine->execlist_queue, execlist_link) {
 		/* Can we combine this request with the current port? It has to
 		 * be the same context/ringbuffer and not have any exceptions
@@ -515,7 +514,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 
 		i915_gem_request_assign(&port->request, last);
 	}
-	spin_unlock(&engine->execlist_lock);
 	spin_unlock_irqrestore(&engine->timeline->lock, flags);
 
 	if (submit)
@@ -600,9 +598,8 @@ static void intel_lrc_irq_handler(unsigned long data)
 static void execlists_submit_request(struct drm_i915_gem_request *request)
 {
 	struct intel_engine_cs *engine = request->engine;
-	unsigned long flags;
 
-	spin_lock_irqsave(&engine->execlist_lock, flags);
+	assert_spin_locked(&engine->timeline->lock);
 
 	/* We keep the previous context alive until we retire the following
 	 * request. This ensures that any the context object is still pinned
@@ -616,8 +613,6 @@ static void execlists_submit_request(struct drm_i915_gem_request *request)
 	list_add_tail(&request->execlist_link, &engine->execlist_queue);
 	if (execlists_elsp_idle(engine))
 		tasklet_hi_schedule(&engine->irq_tasklet);
-
-	spin_unlock_irqrestore(&engine->execlist_lock, flags);
 }
 
 int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index dcb145c67e89..8aa691661dbb 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -335,7 +335,6 @@ struct intel_engine_cs {
 
 	/* Execlists */
 	struct tasklet_struct irq_tasklet;
-	spinlock_t execlist_lock; /* used inside tasklet, use spin_lock_bh */
 	struct execlist_port {
 		struct drm_i915_gem_request *request;
 		unsigned int count;
-- 
2.10.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 5/9] drm/i915/scheduler: Signal the arrival of a new request
  2016-10-27 14:05 ` [PATCH v2 1/9] " Chris Wilson
                     ` (2 preceding siblings ...)
  2016-10-27 14:05   ` [PATCH v2 4/9] drm/i915: Remove engine->execlist_lock Chris Wilson
@ 2016-10-27 14:05   ` Chris Wilson
  2016-10-27 14:05   ` [PATCH v2 6/9] drm/i915/scheduler: Record all dependencies upon request construction Chris Wilson
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2016-10-27 14:05 UTC (permalink / raw)
  To: intel-gfx

The start of the scheduler, add a hook into request submission for the
scheduler to see the arrival of new requests and prepare its runqueues.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c         |  4 ++++
 drivers/gpu/drm/i915/i915_gem_request.c | 13 +++++++++++++
 drivers/gpu/drm/i915/intel_engine_cs.c  |  3 +++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  9 +++++++++
 include/uapi/drm/i915_drm.h             |  5 +++++
 5 files changed, 34 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index dd02ad69eb7b..11b9f6c38699 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -323,6 +323,10 @@ static int i915_getparam(struct drm_device *dev, void *data,
 		 */
 		value = i915_gem_mmap_gtt_version();
 		break;
+	case I915_PARAM_HAS_SCHEDULER:
+		value = dev_priv->engine[RCS] &&
+			dev_priv->engine[RCS]->schedule;
+		break;
 	case I915_PARAM_MMAP_VERSION:
 		/* Remember to bump this if the version changes! */
 	case I915_PARAM_HAS_GEM:
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 6fadabd6ad25..3d63797ba49a 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -759,6 +759,19 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
 
 	i915_gem_mark_busy(engine);
 
+	/* Let the backend know a new request has arrived that may need
+	 * to adjust the existing execution schedule due to a high priority
+	 * request - i.e. we may want to preempt the current request in order
+	 * to run a high priority dependency chain *before* we can execute this
+	 * request.
+	 *
+	 * This is called before the request is ready to run so that we can
+	 * decide whether to preempt the entire chain so that it is ready to
+	 * run at the earliest possible convenience.
+	 */
+	if (engine->schedule)
+		engine->schedule(request, 0);
+
 	local_bh_disable();
 	i915_sw_fence_commit(&request->submit);
 	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 33a5b6d61d63..ed05151bb023 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -102,6 +102,9 @@ intel_engine_setup(struct drm_i915_private *dev_priv,
 	engine->mmio_base = info->mmio_base;
 	engine->irq_shift = info->irq_shift;
 
+	/* Nothing to do here, execute in order of dependencies */
+	engine->schedule = NULL;
+
 	dev_priv->engine[id] = engine;
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 8aa691661dbb..380b83eff25e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -267,6 +267,15 @@ struct intel_engine_cs {
 	 */
 	void		(*submit_request)(struct drm_i915_gem_request *req);
 
+	/* Call when the priority on a request has changed and it and its
+	 * dependencies may need rescheduling. Note the request itself may
+	 * not be ready to run!
+	 *
+	 * Called under the struct_mutex.
+	 */
+	void		(*schedule)(struct drm_i915_gem_request *request,
+				    int priority);
+
 	/* Some chipsets are not quite as coherent as advertised and need
 	 * an expensive kick to force a true read of the up-to-date seqno.
 	 * However, the up-to-date seqno is not always required and the last
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index a1d04d8bc80a..f51d429feaae 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -402,6 +402,11 @@ typedef struct drm_i915_irq_wait {
  */
 #define I915_PARAM_HAS_EXEC_FENCE	 42
 
+/* Query whether DRM_I915_GEM_EXECBUFFER2 supports user defined execution
+ * priorities and the driver will attempt to execute batches in priority order.
+ */
+#define I915_PARAM_HAS_SCHEDULER	 43
+
 typedef struct drm_i915_getparam {
 	__s32 param;
 	/*
-- 
2.10.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 6/9] drm/i915/scheduler: Record all dependencies upon request construction
  2016-10-27 14:05 ` [PATCH v2 1/9] " Chris Wilson
                     ` (3 preceding siblings ...)
  2016-10-27 14:05   ` [PATCH v2 5/9] drm/i915/scheduler: Signal the arrival of a new request Chris Wilson
@ 2016-10-27 14:05   ` Chris Wilson
  2016-10-27 14:05   ` [PATCH v2 7/9] drm/i915/scheduler: Execute requests in order of priorities Chris Wilson
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2016-10-27 14:05 UTC (permalink / raw)
  To: intel-gfx

The scheduler needs to know the dependencies of each request for the
lifetime of the request, as it may choose to reschedule the requests at
any time and must ensure the dependency tree is not broken. This is in
additional to using the fence to only allow execution after all
dependencies have been completed.

One option was to extend the fence to support the bidirectional
dependency tracking required by the scheduler. However the mismatch in
lifetimes between the submit fence and the request essentially meant
that we had to build a completely separate struct (and we could not
simply reuse the existing waitqueue in the fence for one half of the
dependency tracking). The extra dependency tracking simply did not mesh
well with the fence, and keeping it separate both keeps the fence
implementation simpler and allows us to extend the dependency tracking
into a priority tree (whilst maintaining support for reordering the
tree).

To avoid the additional allocations and list manipulations, the use of
the priotree is disabled when there are no schedulers to use it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_request.c | 72 ++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_request.h | 23 +++++++++++
 2 files changed, 94 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 3d63797ba49a..1660a12d3f19 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -113,6 +113,59 @@ i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
 	spin_unlock(&file_priv->mm.lock);
 }
 
+static int
+i915_priotree_add_dependency(struct i915_priotree *pt,
+			     struct i915_priotree *signal,
+			     struct i915_dependency *dep)
+{
+	unsigned long flags = 0;
+
+	if (!dep) {
+		dep = kmalloc(sizeof(*dep), GFP_KERNEL);
+		if (!dep)
+			return -ENOMEM;
+
+		flags |= I915_DEPENDENCY_ALLOC;
+	}
+
+	dep->signal = signal;
+	dep->flags = flags;
+
+	list_add(&dep->post_link, &signal->post_list);
+	list_add(&dep->pre_link, &pt->pre_list);
+	return 0;
+}
+
+static void
+i915_priotree_fini(struct i915_priotree *pt)
+{
+	struct i915_dependency *dep, *next;
+
+	/* Everyone we depended upon should retire before us and remove
+	 * themselves from our list. However, retirement is run independently
+	 * on each timeline and so we may be called out-of-order.
+	 */
+	list_for_each_entry_safe(dep, next, &pt->pre_list, pre_link) {
+		list_del(&dep->post_link);
+		if (dep->flags & I915_DEPENDENCY_ALLOC)
+			kfree(dep);
+	}
+
+	/* Remove ourselves from everyone who depends upon us */
+	list_for_each_entry_safe(dep, next, &pt->post_list, post_link) {
+		list_del(&dep->pre_link);
+		if (dep->flags & I915_DEPENDENCY_ALLOC)
+			kfree(dep);
+	}
+}
+
+static void
+i915_priotree_init(struct i915_priotree *pt)
+{
+	INIT_LIST_HEAD(&pt->pre_list);
+	INIT_LIST_HEAD(&pt->post_list);
+}
+
 void i915_gem_retire_noop(struct i915_gem_active *active,
 			  struct drm_i915_gem_request *request)
 {
@@ -182,6 +235,8 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 	i915_gem_context_put(request->ctx);
 
 	dma_fence_signal(&request->fence);
+
+	i915_priotree_fini(&request->priotree);
 	i915_gem_request_put(request);
 }
 
@@ -464,6 +519,8 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
 	 */
 	i915_sw_fence_await_sw_fence(&req->execute, &req->submit, &req->execq);
 
+	i915_priotree_init(&req->priotree);
+
 	INIT_LIST_HEAD(&req->active_list);
 	req->i915 = dev_priv;
 	req->engine = engine;
@@ -517,6 +574,14 @@ i915_gem_request_await_request(struct drm_i915_gem_request *to,
 
 	GEM_BUG_ON(to == from);
 
+	if (to->engine->schedule) {
+		ret = i915_priotree_add_dependency(&to->priotree,
+						   &from->priotree,
+						   NULL);
+		if (ret < 0)
+			return ret;
+	}
+
 	if (to->timeline == from->timeline)
 		return 0;
 
@@ -740,9 +805,14 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
 
 	prev = i915_gem_active_raw(&timeline->last_request,
 				   &request->i915->drm.struct_mutex);
-	if (prev)
+	if (prev) {
 		i915_sw_fence_await_sw_fence(&request->submit, &prev->submit,
 					     &request->submitq);
+		if (engine->schedule)
+			i915_priotree_add_dependency(&request->priotree,
+						     &prev->priotree,
+						     &request->depq);
+	}
 
 	spin_lock_irq(&timeline->lock);
 	list_add_tail(&request->link, &timeline->requests);
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 725d04128eaf..0cb2ba546062 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -41,6 +41,18 @@ struct intel_signal_node {
 	struct intel_wait wait;
 };
 
+struct i915_dependency {
+	struct i915_priotree *signal;
+	struct list_head pre_link, post_link;
+	unsigned long flags;
+#define I915_DEPENDENCY_ALLOC BIT(0)
+};
+
+struct i915_priotree {
+	struct list_head pre_list; /* who is before us, we depend upon */
+	struct list_head post_list; /* who is after us, they depend upon us */
+};
+
 /**
  * Request queue structure.
  *
@@ -89,6 +101,17 @@ struct drm_i915_gem_request {
 	wait_queue_t submitq;
 	wait_queue_t execq;
 
+	/* A list of everyone we wait upon, and everyone who waits upon us.
+	 * Even though we will not be submitted to the hardware before the
+	 * submit fence is signaled (it waits for all external events as well
+	 * as our own requests), the scheduler still needs to know the
+	 * dependency tree for the lifetime of the request (from execbuf
+	 * to retirement), i.e. bidirectional dependency information for the
+	 * request not tied to individual fences.
+	 */
+	struct i915_priotree priotree;
+	struct i915_dependency depq;
+
 	u32 global_seqno;
 
 	/** GEM sequence number associated with the previous request,
-- 
2.10.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 7/9] drm/i915/scheduler: Execute requests in order of priorities
  2016-10-27 14:05 ` [PATCH v2 1/9] " Chris Wilson
                     ` (4 preceding siblings ...)
  2016-10-27 14:05   ` [PATCH v2 6/9] drm/i915/scheduler: Record all dependencies upon request construction Chris Wilson
@ 2016-10-27 14:05   ` Chris Wilson
  2016-10-27 14:05   ` [PATCH v2 8/9] drm/i915/scheduler: Boost priorities for flips Chris Wilson
  2016-10-27 14:05   ` [PATCH v2 9/9] drm/i915/scheduler: Support user-defined priorities Chris Wilson
  7 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2016-10-27 14:05 UTC (permalink / raw)
  To: intel-gfx

Track the priority of each request and use it to determine the order in
which we submit requests to the hardware via execlists.

The priority of the request is determined by the user (eventually via
the context) but may be overridden at any time by the driver. When we set
the priority of the request, we bump the priority of all of its
dependencies to match - so that a high priority drawing operation is not
stuck behind a background task.

When the request is ready to execute (i.e. we have signaled the submit
fence following completion of all its dependencies, including third
party fences), we put the request into a priority sorted rbtree to be
submitted to the hardware. If the request is higher priority than all
pending requests, it will be submitted on the next context-switch
interrupt as soon as the hardware has completed the current request. We
do not currently preempt any current execution to immediately run a very
high priority request, at least not yet.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |  7 ++-
 drivers/gpu/drm/i915/i915_gem.c            |  3 +-
 drivers/gpu/drm/i915/i915_gem_request.c    |  4 ++
 drivers/gpu/drm/i915/i915_gem_request.h    |  6 +--
 drivers/gpu/drm/i915/i915_guc_submission.c |  1 +
 drivers/gpu/drm/i915/intel_engine_cs.c     |  3 +-
 drivers/gpu/drm/i915/intel_lrc.c           | 86 +++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/intel_ringbuffer.h    |  3 +-
 8 files changed, 97 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 2a909e89728d..f1c2d3f435fb 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -631,8 +631,9 @@ static void print_request(struct seq_file *m,
 			  struct drm_i915_gem_request *rq,
 			  const char *prefix)
 {
-	seq_printf(m, "%s%x [%x:%x] @ %d: %s\n", prefix,
+	seq_printf(m, "%s%x [%x:%x] prio=%d @ %dms: %s\n", prefix,
 		   rq->global_seqno, rq->ctx->hw_id, rq->fence.seqno,
+		   rq->priotree.priority,
 		   jiffies_to_msecs(jiffies - rq->emitted_jiffies),
 		   rq->timeline->common->name);
 }
@@ -3219,6 +3220,7 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 
 		if (i915.enable_execlists) {
 			u32 ptr, read, write;
+			struct rb_node *rb;
 
 			seq_printf(m, "\tExeclist status: 0x%08x %08x\n",
 				   I915_READ(RING_EXECLIST_STATUS_LO(engine)),
@@ -3258,7 +3260,8 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 			rcu_read_unlock();
 
 			spin_lock_irq(&engine->timeline->lock);
-			list_for_each_entry(rq, &engine->execlist_queue, execlist_link) {
+			for (rb = engine->execlist_first; rb; rb = rb_next(rb)) {
+				rq = rb_entry(rb, typeof(*rq), priotree.node);
 				print_request(m, rq, "\t\tQ ");
 			}
 			spin_unlock_irq(&engine->timeline->lock);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 2a28501ec10e..82ce5e4d7fdd 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2684,10 +2684,11 @@ static void i915_gem_cleanup_engine(struct intel_engine_cs *engine)
 
 	if (i915.enable_execlists) {
 		spin_lock_irq(&engine->timeline->lock);
-		INIT_LIST_HEAD(&engine->execlist_queue);
 		i915_gem_request_put(engine->execlist_port[0].request);
 		i915_gem_request_put(engine->execlist_port[1].request);
 		memset(engine->execlist_port, 0, sizeof(engine->execlist_port));
+		engine->execlist_queue = RB_ROOT;
+		engine->execlist_first = NULL;
 		spin_unlock_irq(&engine->timeline->lock);
 	}
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 1660a12d3f19..fabd29fde175 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -141,6 +141,8 @@ i915_priotree_fini(struct i915_priotree *pt)
 {
 	struct i915_dependency *dep, *next;
 
+	GEM_BUG_ON(!RB_EMPTY_NODE(&pt->node));
+
 	/* Everyone we depended upon should retire before us and remove
 	 * themselves from our list. However, retirement is run independently
 	 * on each timeline and so we may be called out-of-order.
@@ -164,6 +166,8 @@ i915_priotree_init(struct i915_priotree *pt)
 {
 	INIT_LIST_HEAD(&pt->pre_list);
 	INIT_LIST_HEAD(&pt->post_list);
+	RB_CLEAR_NODE(&pt->node);
+	pt->priority = INT_MIN;
 }
 
 void i915_gem_retire_noop(struct i915_gem_active *active,
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 0cb2ba546062..4288f512ac51 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -51,6 +51,9 @@ struct i915_dependency {
 struct i915_priotree {
 	struct list_head pre_list; /* who is before us, we depend upon */
 	struct list_head post_list; /* who is after us, they depend upon us */
+	struct rb_node node;
+	int priority;
+#define I915_PRIORITY_MAX 1024
 };
 
 /**
@@ -168,9 +171,6 @@ struct drm_i915_gem_request {
 	struct drm_i915_file_private *file_priv;
 	/** file_priv list entry for this request */
 	struct list_head client_list;
-
-	/** Link in the execlist submission queue, guarded by execlist_lock. */
-	struct list_head execlist_link;
 };
 
 extern const struct dma_fence_ops i915_fence_ops;
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index ddb574d5ceda..65b02737f756 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -1522,6 +1522,7 @@ int i915_guc_submission_enable(struct drm_i915_private *dev_priv)
 	/* Take over from manual control of ELSP (execlists) */
 	for_each_engine(engine, dev_priv, id) {
 		engine->submit_request = i915_guc_submit;
+		engine->schedule = NULL;
 
 		/* Replay the current set of previously submitted requests */
 		list_for_each_entry(request,
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index ed05151bb023..6b726c199775 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -244,7 +244,8 @@ static void intel_engine_init_timeline(struct intel_engine_cs *engine)
  */
 void intel_engine_setup_common(struct intel_engine_cs *engine)
 {
-	INIT_LIST_HEAD(&engine->execlist_queue);
+	engine->execlist_queue = RB_ROOT;
+	engine->execlist_first = NULL;
 
 	intel_engine_init_timeline(engine);
 	intel_engine_init_hangcheck(engine);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 186c3ccb2c34..5ff6ac6fea1e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -432,9 +432,10 @@ static bool can_merge_ctx(const struct i915_gem_context *prev,
 
 static void execlists_dequeue(struct intel_engine_cs *engine)
 {
-	struct drm_i915_gem_request *cursor, *last;
+	struct drm_i915_gem_request *last;
 	struct execlist_port *port = engine->execlist_port;
 	unsigned long flags;
+	struct rb_node *rb;
 	bool submit = false;
 
 	last = port->request;
@@ -471,7 +472,11 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 */
 
 	spin_lock_irqsave(&engine->timeline->lock, flags);
-	list_for_each_entry(cursor, &engine->execlist_queue, execlist_link) {
+	rb = engine->execlist_first;
+	while (rb) {
+		struct drm_i915_gem_request *cursor =
+			rb_entry(rb, typeof(*cursor), priotree.node);
+
 		/* Can we combine this request with the current port? It has to
 		 * be the same context/ringbuffer and not have any exceptions
 		 * (e.g. GVT saying never to combine contexts).
@@ -503,16 +508,18 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			port++;
 		}
 
+		rb = rb_next(rb);
+		rb_erase(&cursor->priotree.node, &engine->execlist_queue);
+		RB_CLEAR_NODE(&cursor->priotree.node);
+		cursor->priotree.priority = INT_MAX;
+
 		__i915_gem_request_submit(cursor);
 		last = cursor;
 		submit = true;
 	}
 	if (submit) {
-		/* Decouple all the requests submitted from the queue */
-		engine->execlist_queue.next = &cursor->execlist_link;
-		cursor->execlist_link.prev = &engine->execlist_queue;
-
 		i915_gem_request_assign(&port->request, last);
+		engine->execlist_first = rb;
 	}
 	spin_unlock_irqrestore(&engine->timeline->lock, flags);
 
@@ -595,6 +602,32 @@ static void intel_lrc_irq_handler(unsigned long data)
 	intel_uncore_forcewake_put(dev_priv, engine->fw_domains);
 }
 
+static bool insert_request(struct i915_priotree *pt, struct rb_root *root)
+{
+	struct rb_node **p, *rb;
+	bool first = true;
+
+	/* most positive priority is scheduled first, equal priorities fifo */
+	rb = NULL;
+	p = &root->rb_node;
+	while (*p) {
+		struct i915_priotree *pos;
+
+		rb = *p;
+		pos = rb_entry(rb, typeof(*pos), node);
+		if (pt->priority > pos->priority) {
+			p = &rb->rb_left;
+		} else {
+			p = &rb->rb_right;
+			first = false;
+		}
+	}
+	rb_link_node(&pt->node, rb, p);
+	rb_insert_color(&pt->node, root);
+
+	return first;
+}
+
 static void execlists_submit_request(struct drm_i915_gem_request *request)
 {
 	struct intel_engine_cs *engine = request->engine;
@@ -610,11 +643,45 @@ static void execlists_submit_request(struct drm_i915_gem_request *request)
 	request->previous_context = engine->last_context;
 	engine->last_context = request->ctx;
 
-	list_add_tail(&request->execlist_link, &engine->execlist_queue);
+	if (insert_request(&request->priotree, &engine->execlist_queue))
+		engine->execlist_first = &request->priotree.node;
 	if (execlists_elsp_idle(engine))
 		tasklet_hi_schedule(&engine->irq_tasklet);
 }
 
+static void update_priorities(struct i915_priotree *pt, int prio)
+{
+	struct drm_i915_gem_request *request =
+		container_of(pt, struct drm_i915_gem_request, priotree);
+	struct intel_engine_cs *engine = request->engine;
+	struct i915_dependency *dep;
+
+	if (prio <= READ_ONCE(pt->priority))
+		return;
+
+	/* Recursively bump all dependent priorities to match the new request */
+	list_for_each_entry(dep, &pt->pre_list, pre_link)
+		update_priorities(dep->signal, prio);
+
+	/* Fifo and depth-first replacement executes our deps before us */
+	spin_lock_irq(&engine->timeline->lock);
+	pt->priority = prio;
+	if (!RB_EMPTY_NODE(&pt->node)) {
+		rb_erase(&pt->node, &engine->execlist_queue);
+		if (insert_request(pt, &engine->execlist_queue))
+			engine->execlist_first = &pt->node;
+	}
+	spin_unlock_irq(&engine->timeline->lock);
+}
+
+static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
+{
+	/* Make sure that our entire dependency chain shares our prio */
+	update_priorities(&request->priotree, prio);
+
+	/* XXX Do we need to preempt to make room for us and our deps? */
+}
+
 int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request)
 {
 	struct intel_engine_cs *engine = request->engine;
@@ -1651,8 +1718,10 @@ void intel_execlists_enable_submission(struct drm_i915_private *dev_priv)
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
-	for_each_engine(engine, dev_priv, id)
+	for_each_engine(engine, dev_priv, id) {
 		engine->submit_request = execlists_submit_request;
+		engine->schedule = execlists_schedule;
+	}
 }
 
 static void
@@ -1665,6 +1734,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->emit_breadcrumb = gen8_emit_breadcrumb;
 	engine->emit_breadcrumb_sz = gen8_emit_breadcrumb_sz;
 	engine->submit_request = execlists_submit_request;
+	engine->schedule = execlists_schedule;
 
 	engine->irq_enable = gen8_logical_ring_enable_irq;
 	engine->irq_disable = gen8_logical_ring_disable_irq;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 380b83eff25e..ea7ab5de2f31 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -348,7 +348,8 @@ struct intel_engine_cs {
 		struct drm_i915_gem_request *request;
 		unsigned int count;
 	} execlist_port[2];
-	struct list_head execlist_queue;
+	struct rb_root execlist_queue;
+	struct rb_node *execlist_first;
 	unsigned int fw_domains;
 	bool disable_lite_restore_wa;
 	bool preempt_wa;
-- 
2.10.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 8/9] drm/i915/scheduler: Boost priorities for flips
  2016-10-27 14:05 ` [PATCH v2 1/9] " Chris Wilson
                     ` (5 preceding siblings ...)
  2016-10-27 14:05   ` [PATCH v2 7/9] drm/i915/scheduler: Execute requests in order of priorities Chris Wilson
@ 2016-10-27 14:05   ` Chris Wilson
  2016-10-27 14:05   ` [PATCH v2 9/9] drm/i915/scheduler: Support user-defined priorities Chris Wilson
  7 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2016-10-27 14:05 UTC (permalink / raw)
  To: intel-gfx

Boost the priority of any rendering required to show the next pageflip
as we want to avoid missing the vblank by being delayed by invisible
workload. We prioritise avoiding jank and jitter in the GUI over
starving background tasks.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h      |  5 ++++
 drivers/gpu/drm/i915/i915_gem.c      | 50 ++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_display.c |  2 ++
 3 files changed, 57 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a53c80172752..e3b4412d0241 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3386,6 +3386,11 @@ int i915_gem_object_wait(struct drm_i915_gem_object *obj,
 			 unsigned int flags,
 			 long timeout,
 			 struct intel_rps_client *rps);
+int i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
+				  unsigned int flags,
+				  int priority);
+#define I915_PRIORITY_DISPLAY I915_PRIORITY_MAX
+
 int __must_check
 i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj,
 				  bool write);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 82ce5e4d7fdd..33c4a57fbeb6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -433,6 +433,56 @@ i915_gem_object_wait_reservation(struct reservation_object *resv,
 	return timeout;
 }
 
+static void fence_set_priority(struct dma_fence *fence, int prio)
+{
+	struct drm_i915_gem_request *rq;
+	struct intel_engine_cs *engine;
+
+	if (!dma_fence_is_i915(fence))
+		return;
+
+	rq = to_request(fence);
+	engine = rq->engine;
+	if (!engine->schedule)
+		return;
+
+	engine->schedule(rq, prio);
+}
+
+int
+i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
+			      unsigned int flags,
+			      int prio)
+{
+	struct dma_fence *excl;
+
+	if (flags & I915_WAIT_ALL) {
+		struct dma_fence **shared;
+		unsigned int count, i;
+		int ret;
+
+		ret = reservation_object_get_fences_rcu(obj->resv,
+							&excl, &count, &shared);
+		if (ret)
+			return ret;
+
+		for (i = 0; i < count; i++) {
+			fence_set_priority(shared[i], prio);
+			dma_fence_put(shared[i]);
+		}
+
+		kfree(shared);
+	} else {
+		excl = reservation_object_get_excl_rcu(obj->resv);
+	}
+
+	if (excl) {
+		fence_set_priority(excl, prio);
+		dma_fence_put(excl);
+	}
+	return 0;
+}
+
 /**
  * Waits for rendering to the object to be completed
  * @obj: i915 gem object
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 39cda13b6267..45d21421a2be 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -14213,6 +14213,8 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 		}
 	}
 
+	i915_gem_object_wait_priority(obj, 0, I915_PRIORITY_DISPLAY);
+
 	return 0;
 }
 
-- 
2.10.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 9/9] drm/i915/scheduler: Support user-defined priorities
  2016-10-27 14:05 ` [PATCH v2 1/9] " Chris Wilson
                     ` (6 preceding siblings ...)
  2016-10-27 14:05   ` [PATCH v2 8/9] drm/i915/scheduler: Boost priorities for flips Chris Wilson
@ 2016-10-27 14:05   ` Chris Wilson
  2016-10-27 15:22     ` Chris Wilson
  7 siblings, 1 reply; 21+ messages in thread
From: Chris Wilson @ 2016-10-27 14:05 UTC (permalink / raw)
  To: intel-gfx

Use a priority stored in the context as the initial value when
submitting a request. This allows us to change the default priority on a
per-context basis, allowing different contexts to be favoured with GPU
time at the expense of lower importance work. The user can adjust the
context's priority via I915_CONTEXT_PARAM_PRIORITY, with more positive
values being higher priority (they will be serviced earlier, after their
dependencies have been resolved). Any prerequisite work for an execbuf
will have its priority raised to match the new request as required.

Normal users can specify any value in the range of -1023 to 0 [default],
i.e. they can reduce the priority of their workloads (and temporarily
boost it back to normal if so desired).

Privileged users can specify any value in the range of -1023 to 1023,
[default is 0], i.e. they can raise their priority above all overs and
so potentially starve the system.

Note that the existing schedulers are not fair, nor load balancing, the
execution is strictly by priority on a first-come, first-served basis,
and the driver may choose to boost some requests above the range
available to users.

This priority was originally based around nice(2), but evolved to allow
clients to adjust their priority within a small range, and allow for a
privileged high priority range.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h         |  1 +
 drivers/gpu/drm/i915/i915_gem_context.c | 21 +++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_request.c |  2 +-
 include/uapi/drm/i915_drm.h             |  3 +++
 4 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e3b4412d0241..2da5ecad1f35 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -948,6 +948,7 @@ struct i915_gem_context {
 	/* Unique identifier for this context, used by the hw for tracking */
 	unsigned int hw_id;
 	u32 user_handle;
+	int priority; /* greater priorities are serviced first */
 
 	u32 ggtt_alignment;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 34daef828dca..8df216eba490 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -476,6 +476,7 @@ int i915_gem_context_init(struct drm_device *dev)
 		return PTR_ERR(ctx);
 	}
 
+	ctx->priority = -I915_PRIORITY_MAX; /* lowest priority; idle task */
 	dev_priv->kernel_context = ctx;
 
 	DRM_DEBUG_DRIVER("%s context support initialized\n",
@@ -1095,6 +1096,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
 		args->value = !!(ctx->flags & CONTEXT_NO_ERROR_CAPTURE);
 		break;
+	case I915_CONTEXT_PARAM_PRIORITY:
+		args->value = ctx->priority;
+		break;
 	default:
 		ret = -EINVAL;
 		break;
@@ -1150,6 +1154,23 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 				ctx->flags &= ~CONTEXT_NO_ERROR_CAPTURE;
 		}
 		break;
+
+	case I915_CONTEXT_PARAM_PRIORITY:
+		{
+			int priority = args->value;
+
+			if (args->size)
+				ret = -EINVAL;
+			else if (priority >= I915_PRIORITY_MAX ||
+				 priority <= -I915_PRIORITY_MAX)
+				ret = -EINVAL;
+			else if (priority > 0 && !capable(CAP_SYS_ADMIN))
+				ret = -EPERM;
+			else
+				ctx->priority = priority;
+		}
+		break;
+
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index fabd29fde175..4bfccb3d0c2c 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -844,7 +844,7 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
 	 * run at the earliest possible convenience.
 	 */
 	if (engine->schedule)
-		engine->schedule(request, 0);
+		engine->schedule(request, request->ctx->priority);
 
 	local_bh_disable();
 	i915_sw_fence_commit(&request->submit);
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index f51d429feaae..9fa5eee64f6b 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -404,6 +404,8 @@ typedef struct drm_i915_irq_wait {
 
 /* Query whether DRM_I915_GEM_EXECBUFFER2 supports user defined execution
  * priorities and the driver will attempt to execute batches in priority order.
+ * The initial priority for each batch is supplied by the context and is
+ * controlled via I915_CONTEXT_PARAM_PRIORITY.
  */
 #define I915_PARAM_HAS_SCHEDULER	 43
 
@@ -1283,6 +1285,7 @@ struct drm_i915_gem_context_param {
 #define I915_CONTEXT_PARAM_NO_ZEROMAP	0x2
 #define I915_CONTEXT_PARAM_GTT_SIZE	0x3
 #define I915_CONTEXT_PARAM_NO_ERROR_CAPTURE	0x4
+#define I915_CONTEXT_PARAM_PRIORITY	0x5
 	__u64 value;
 };
 
-- 
2.10.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 9/9] drm/i915/scheduler: Support user-defined priorities
  2016-10-27 14:05   ` [PATCH v2 9/9] drm/i915/scheduler: Support user-defined priorities Chris Wilson
@ 2016-10-27 15:22     ` Chris Wilson
  2016-10-28 11:21       ` Chris Wilson
  0 siblings, 1 reply; 21+ messages in thread
From: Chris Wilson @ 2016-10-27 15:22 UTC (permalink / raw)
  To: intel-gfx

On Thu, Oct 27, 2016 at 03:05:16PM +0100, Chris Wilson wrote:
> Use a priority stored in the context as the initial value when
> submitting a request. This allows us to change the default priority on a
> per-context basis, allowing different contexts to be favoured with GPU
> time at the expense of lower importance work. The user can adjust the
> context's priority via I915_CONTEXT_PARAM_PRIORITY, with more positive
> values being higher priority (they will be serviced earlier, after their
> dependencies have been resolved). Any prerequisite work for an execbuf
> will have its priority raised to match the new request as required.

Not in mesa yet, but use case would in supporting EGL_IMG_context_priority
https://www.khronos.org/registry/egl/extensions/IMG/EGL_IMG_context_priority.txt

	EGL_CONTEXT_PRIORITY_LEVEL_IMG determines the priority level of
        the context to be created. This attribute is a hint, as an
        implementation may not support multiple contexts at some
        priority levels and system policy may limit access to high
        priority contexts to appropriate system privilege level. The
        default value for EGL_CONTEXT_PRIORITY_LEVEL_IMG is
        EGL_CONTEXT_PRIORITY_MEDIUM_IMG."

i.e.
PRIORITY_HIGH -> 1023 [privileged, will failback to 0]
PRIORITY_MED -> 0 [default]
PRIORITY_LOW -> -1023

Does anyone has a real use for this? On the transcode side?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 9/9] drm/i915/scheduler: Support user-defined priorities
  2016-10-27 15:22     ` Chris Wilson
@ 2016-10-28 11:21       ` Chris Wilson
  0 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2016-10-28 11:21 UTC (permalink / raw)
  To: intel-gfx

On Thu, Oct 27, 2016 at 04:22:31PM +0100, Chris Wilson wrote:
> On Thu, Oct 27, 2016 at 03:05:16PM +0100, Chris Wilson wrote:
> > Use a priority stored in the context as the initial value when
> > submitting a request. This allows us to change the default priority on a
> > per-context basis, allowing different contexts to be favoured with GPU
> > time at the expense of lower importance work. The user can adjust the
> > context's priority via I915_CONTEXT_PARAM_PRIORITY, with more positive
> > values being higher priority (they will be serviced earlier, after their
> > dependencies have been resolved). Any prerequisite work for an execbuf
> > will have its priority raised to match the new request as required.
> 
> Not in mesa yet, but use case would in supporting EGL_IMG_context_priority
> https://www.khronos.org/registry/egl/extensions/IMG/EGL_IMG_context_priority.txt
> 
> 	EGL_CONTEXT_PRIORITY_LEVEL_IMG determines the priority level of
>         the context to be created. This attribute is a hint, as an
>         implementation may not support multiple contexts at some
>         priority levels and system policy may limit access to high
>         priority contexts to appropriate system privilege level. The
>         default value for EGL_CONTEXT_PRIORITY_LEVEL_IMG is
>         EGL_CONTEXT_PRIORITY_MEDIUM_IMG."
> 
> i.e.
> PRIORITY_HIGH -> 1023 [privileged, will failback to 0]
> PRIORITY_MED -> 0 [default]
> PRIORITY_LOW -> -1023
> 
> Does anyone has a real use for this? On the transcode side?

Also applies to VkQueue as each can have a distinct priority.
Queues need a bit more work in the anv implmentation to marry the spec
to a kernel context, mostly working out the details of QueueFamilies and
the like and seeing if we require new abi.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/6] drm/i915: Show the execlist queue in debugfs/i915_engine_info
  2016-10-27  0:03 ` [PATCH 1/6] drm/i915: Show the execlist queue in debugfs/i915_engine_info Chris Wilson
@ 2016-11-02 13:27   ` Joonas Lahtinen
  0 siblings, 0 replies; 21+ messages in thread
From: Joonas Lahtinen @ 2016-11-02 13:27 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On to, 2016-10-27 at 01:03 +0100, Chris Wilson wrote:
> When looking at freezes whilst working on execlists, knowing the order
> of the pending requests in the driver is useful.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

I think this can be merged already.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2016-11-02 13:27 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-27  0:03 Trivial scheduler Chris Wilson
2016-10-27  0:03 ` [PATCH 1/6] drm/i915: Show the execlist queue in debugfs/i915_engine_info Chris Wilson
2016-11-02 13:27   ` Joonas Lahtinen
2016-10-27  0:03 ` [PATCH 2/6] drm/i915/scheduler: Signal the arrival of a new request Chris Wilson
2016-10-27  0:03 ` [PATCH 3/6] drm/i915/scheduler: Record all dependencies upon request construction Chris Wilson
2016-10-27  0:03 ` [PATCH 4/6] drm/i915/scheduler: Execute requests in order of priorities Chris Wilson
2016-10-27  7:45   ` Chris Wilson
2016-10-27  0:03 ` [PATCH 5/6] drm/i915/scheduler: Boost priorities for flips Chris Wilson
2016-10-27  0:03 ` [PATCH 6/6] drm/i915/scheduler: Support user-defined priorities Chris Wilson
2016-10-27  0:46 ` ✓ Fi.CI.BAT: success for series starting with [1/6] drm/i915: Show the execlist queue in debugfs/i915_engine_info Patchwork
2016-10-27 14:05 ` [PATCH v2 1/9] " Chris Wilson
2016-10-27 14:05   ` [PATCH v2 2/9] drm/i915: Split request submit/execute phase into two Chris Wilson
2016-10-27 14:05   ` [PATCH v2 3/9] drm/i915: Defer transfer onto execution timeline to actual hw submission Chris Wilson
2016-10-27 14:05   ` [PATCH v2 4/9] drm/i915: Remove engine->execlist_lock Chris Wilson
2016-10-27 14:05   ` [PATCH v2 5/9] drm/i915/scheduler: Signal the arrival of a new request Chris Wilson
2016-10-27 14:05   ` [PATCH v2 6/9] drm/i915/scheduler: Record all dependencies upon request construction Chris Wilson
2016-10-27 14:05   ` [PATCH v2 7/9] drm/i915/scheduler: Execute requests in order of priorities Chris Wilson
2016-10-27 14:05   ` [PATCH v2 8/9] drm/i915/scheduler: Boost priorities for flips Chris Wilson
2016-10-27 14:05   ` [PATCH v2 9/9] drm/i915/scheduler: Support user-defined priorities Chris Wilson
2016-10-27 15:22     ` Chris Wilson
2016-10-28 11:21       ` Chris Wilson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.